Lecture Notes in Computer Science Edited by G. Goos, J. Hartmanis and J. van Leeuwen
1381
Chris Hankin (Ed.)
Programming Languages and Systems 7th European Symposium on Programming, ESOP'98 Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS' 98 Lisbon, Portugal, March 28 - April 4, 1998 Proceedings
Springer
Series Editors Gerhard Goos, Karlsruhe University, Germany Juris Hartmanis, Cornell University, NY, USA Jan van Leeuwen, Utrecht University, The Netherlands
Volume Editor Chris Hankin The Imperial College of Science and Technology, Department of Computing 180 Queen's Gate, London SW7 2BZ, UK E-mail: [email protected]
Cataloging-in-Publication data applied for
Die Deutsche Bibliothek - CIP-Einheitsaufnahme Programming languages and systems : proceedings / 7th European Symposium on Programming, ESOP '98, held as part of the Joint European Conferences on Theory and Practice of Software, ETAPS '98, Lisbon, Portugal, March 28 - April 4, 1998. Chris Hankin (ed.). Berlin ; Heidelberg ; New York ; Barcelona ; Budapest ; Hong Kong ; London ; Milan ; Paris ; Santa Clara ; Singapore ; Tokyo : Springer, 1998 (Lecture notes in computer science ; Vol. 1381) ISBN 3-540-64302-8
CR Subject Classification (1991): D.3, F.3, F.4, D.1-2

ISSN 0302-9743
ISBN 3-540-64302-8 Springer-Verlag Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law.

© Springer-Verlag Berlin Heidelberg 1998
Printed in Germany

Typesetting: Camera-ready by author
SPIN 10631992 06/3142 - 5 4 3 2 1 0
Printed on acid-free paper
Foreword

The European conference situation in the general area of software science has long been considered unsatisfactory. A fairly large number of small and medium-sized conferences and workshops take place on an irregular basis, competing for high-quality contributions and for enough attendees to make them financially viable. Discussions aiming at a consolidation have been underway since at least 1992, with concrete planning beginning in summer 1994 and culminating in a public meeting at TAPSOFT'95 in Aarhus. On the basis of a broad consensus, it was decided to establish a single annual federated spring conference in the slot that was then occupied by TAPSOFT and CAAP/ESOP/CC, comprising a number of existing and new conferences and covering a spectrum from theory to practice.

ETAPS'98, the first instance of the European Joint Conferences on Theory and Practice of Software, is taking place this year in Lisbon. It comprises five conferences (FoSSaCS, FASE, ESOP, CC, TACAS), four workshops (ACoS, VISUAL, WADT, CMCS), seven invited lectures, and nine tutorials. The events that comprise ETAPS address various aspects of the system development process, including specification, design, implementation, analysis and improvement. The languages, methodologies and tools which support these activities are all well within its scope. Different blends of theory and practice are represented, with an inclination towards theory with a practical motivation on one hand and soundly-based practice on the other. Many of the issues involved in software design apply to systems in general, including hardware systems, and the emphasis on software is not intended to be exclusive.

ETAPS is a natural development from its predecessors. It is a loose confederation in which each event retains its own identity, with a separate programme committee and independent proceedings. Its format is open-ended, allowing it to grow and evolve as time goes by. Contributed talks and system demonstrations are in synchronized parallel sessions, with invited lectures in plenary sessions. Two of the invited lectures are reserved for "unifying" talks on topics of interest to the whole range of ETAPS attendees. The aim of cramming all this activity into a single one-week meeting is to create a strong magnet for academic and industrial researchers working on topics within its scope, giving them the opportunity to learn about research in related areas, and thereby to foster new and existing links between work in areas that have hitherto been addressed in separate meetings.

ETAPS'98 has been superbly organized by José Luís Fiadeiro and his team at the Department of Informatics of the University of Lisbon. The ETAPS steering committee has put considerable energy into planning for ETAPS'98 and its successors. Its current membership is: André Arnold (Bordeaux), Egidio Astesiano (Genova), Jan Bergstra (Amsterdam), Ed Brinksma (Enschede), Rance Cleaveland (Raleigh), Pierpaolo Degano (Pisa), Hartmut Ehrig (Berlin), José Fiadeiro (Lisbon), Jean-Pierre Finance (Nancy), Marie-Claude Gaudel (Paris), Tibor Gyimóthy (Szeged), Chris Hankin (London), Stefan Jähnichen (Berlin), Uwe Kastens (Paderborn), Paul Klint (Amsterdam), Kai Koskimies (Tampere), Tom Maibaum (London), Hanne Riis Nielson (Aarhus), Fernando Orejas (Barcelona), Don Sannella (Edinburgh, chair), Bernhard Steffen (Dortmund), Doaitse Swierstra (Utrecht), Wolfgang Thomas (Kiel). Other people were influential in the early stages of planning, including Peter Mosses (Aarhus) and Reinhard Wilhelm (Saarbrücken).

ETAPS'98 has received generous sponsorship from: Portugal Telecom, TAP Air Portugal, the Luso-American Development Foundation, the British Council, the EU programme "Training and Mobility of Researchers", the University of Lisbon, the European Association for Theoretical Computer Science, the European Association for Programming Languages and Systems, and the Gulbenkian Foundation. I would like to express my sincere gratitude to all of these people and organizations, and to José in particular, as well as to Springer-Verlag for agreeing to publish the ETAPS proceedings.

Edinburgh, January 1998
Donald Sannella ETAPS Steering Committee chairman
7th European Symposium on Programming - ESOP'98
ESOP is devoted to fundamental issues in the specification, analysis, and implementation of programming languages and systems. The emphasis is on research which bridges the gap between theory and practice. Traditionally, this has included the following non-exhaustive list of topics: software specification and verification (including algebraic techniques and model checking), programming paradigms and their integration (including functional, logic, concurrent and object-oriented), semantics facilitating the formal development and implementation of programming languages and systems, advanced type systems (including polymorphism and subtyping), program analysis (including abstract interpretation and constraint systems), program transformation (including partial evaluation and term rewriting), and implementation techniques (including compilation). The programme committee received 59 submissions from which 17 papers were accepted. This proceedings volume also includes an invited contribution by Gert Smolka.
Programme Committee:
J. de Bakker (The Netherlands) L. Cardelli (UK) A. Deutsch (France) R. Giegerich (Germany) R. Glück (Denmark) R. Gorrieri (Italy) C. Hankin (UK, chair) P. Hartel (UK)
P. Lee (USA) H.R. Nielson (Denmark) M. Odersky (Australia) A. Pettorossi (Italy) A. Porto (Portugal) D. Sands (Sweden) D. Schmidt (USA)
I would like to thank the members of the programme committee and their sub-referees; the sub-referees are listed below. I also wish to express my gratitude to Hanne Riis Nielson, for passing on the wisdom she gained in chairing ESOP'96, and Don Sannella and José Luís Fiadeiro for the excellent job that they have done in organising ETAPS'98. ESOP'98 was organised in cooperation with ACM SIGPLAN.
London, January 1998
Chris Hankin
Referees for ESOP'98
S. Abramov J. Alferes S.M. All L. Augustsson M. Beemster W. Bibel B. Blanchet F.S. de Boer E. Boerger M. Bonsangue A. Bossi M. Bugliesi M. Bulyonkov M. Butler L. Caires M. Carlsson S. Cimato K. Claessen C. Colby J.C. Cunha M. Dam L. Dami R. Davies S. Etalle M.C.F. Ferreira
R. Focardi N. Francez M. Gaspari G. Gonthier A. Gordon A. Gravell C. Haack T. Hallgren M. Hanus J. Hatcliff N. Heintze J. Hughes M. Huth H. Ibraheem B. Jacobs J. Jeuring T. Johnsson J. Jørgensen B. Jung A. Klimov O. Kullmann E. Lamma C. Laneve M. Leuschel S. Maclean
G. Malcolm J. Maraist L. Margara C. Mascolo E. Meijer A. Middeldorp D. Miller T. Mogensen A. Montresor L. Moreau J.J. Moreno Navarro J. Mountjoy M. Muench A.P. Nemytykh F. Nielson R. O'Callahan E. Ohlebusch C. Okasaki C. Palamidessi F. Pfenning D. Plump M. Proietti P. Purdom S. Renault J.G. Riecke S. Romanenko A. Sabelfeld E. Scholz J.P. Secher R. Segala E. Sekerinski P. Sestoft O. Shivers H. Søndergaard C. Stone A. Stoughton C. Szyperski A. Takano P. Thiemann B. Thompson M.G.J. van den Brand W. van Oortmerssen W. Vree H.R. Walters J. Warners M. Wermelinger A. Werner P. Wickline G. Zavattaro
Table of Contents

Invited Papers

Concurrent Constraint Programming Based on Functional Programming (Extended Abstract) .......... 1
G. Smolka

Regular Papers

A Bisimulation Method for Cryptographic Protocols .......... 12
M. Abadi and A. D. Gordon

A Polyvariant Binding-Time Analysis for Off-line Partial Deduction .......... 27
M. Bruynooghe, M. Leuschel and K. Sagonas

Verifiable and Executable Logic Specifications of Concurrent Objects in Lπ .......... 42
L. Caires and L. Monteiro

Complexity of Concrete Type-Inference in the Presence of Exceptions .......... 57
R. Chatterjee, B. G. Ryder and W. A. Landi

Synchronisation Analysis to Stop Tupling .......... 75
W.-N. Chin, S.-C. Khoo and T.-W. Lee

Propagating Differences: An Efficient New Fixpoint Algorithm for Distributive Constraint Systems .......... 90
C. Fecht and H. Seidl

Reasoning about Classes in Object-Oriented Languages: Logical Models and Tools .......... 105
U. Hensel, M. Huisman, B. Jacobs and H. Tews

Language Primitives and Type Discipline for Structured Communication-Based Programming .......... 122
K. Honda, V. T. Vasconcelos and M. Kubo

The Functional Imperative: Shape! .......... 139
C. B. Jay and P. A. Steckler

Code Motion and Code Placement: Just Synonyms? .......... 154
J. Knoop, O. Rüthing and B. Steffen

Recursive Object Types in a Logic of Object-Oriented Programs .......... 170
K. R. M. Leino

Mode-Automata: About Modes and States for Reactive Systems .......... 185
F. Maraninchi and Y. Rémond

From Classes to Objects via Subtyping .......... 200
D. Rémy

Building a Bridge between Pointer Aliases and Program Dependences .......... 221
J. L. Ross and M. Sagiv

A Complete Declarative Debugger of Missing Answers .......... 236
S. Ruggieri

Systematic Change of Data Representation: Program Manipulations and a Case Study .......... 252
W. L. Scherlis

A Generic Framework for Specialization (Abridged Version) .......... 267
P. Thiemann

Author Index .......... 283
Concurrent Constraint Programming Based on Functional Programming (Extended Abstract)

Gert Smolka
Programming Systems Lab, DFKI and Universität des Saarlandes
Postfach 15 11 50, D-66041 Saarbrücken, Germany
[email protected], http://www.ps.uni-sb.de/~smolka/
1 Introduction
We will show how the operational features of logic programming can be added as conservative extensions to a functional base language with call by value semantics. We will address both concurrent and constraint logic programming [9, 2, 18]. As base language we will use a dynamically typed language that is obtained from SML by eliminating type declarations and static type checking. Our approach can be extended to cover all features of Oz [6, 15]. The experience with the development of Oz tells us that the outlined approach is the right base for the practical development of concurrent constraint programming languages. It avoids unnecessary duplication of concepts by reusing functional programming as core technology. Of course, it does not unify the partly incompatible theories behind functional and logic programming. They both contribute at a higher level of abstraction to the understanding of different aspects of the class of programming languages proposed here.

2 The Base Language DML
As base language we choose a dynamically typed language DML that is obtained from SML by eliminating type declarations and static type checking. Given the fact that SML is a strictly statically typed language this surgery is straightforward. To some extent it is already carried out for the definition of the dynamic semantics in the definition of SML [4]. There are several reasons for choosing SML. For the extensions to come it is essential that the language has call by value semantics, sequential execution order, and assignable references. Moreover, variants, records and exceptions are important. Finally, SML has a compact formal definition and is well-known. Since SML does not make a lexical distinction between constructors and variables, we need to retain constructor declarations. Modules lose their special status since they can be easily expressed with records and functions. Since DML is dynamically typed, the primitive operations of the language will raise suitable exceptions if some of their arguments are ill-typed. Equality in DML is defined for all values, where equality of functions is defined analogously to references. Every SML program can be translated into a DML program that produces exactly the same results. Hence we can see SML as a statically typed language that is defined on top of DML. There is the interesting possibility of a programming system that, based on DML, offers a variety of type checking disciplines. Code checked with different type disciplines can be freely combined. To provide for the extensions to come, we will organize the operational semantics of DML in a style that is rather different from the style used in the definition of SML [4].
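The following minimal sketch illustrates the dynamic checking just described; the commented results are our reading of the definitions above, not output of an actual DML implementation:

    fun double x = x + x;   (* no type declaration and no static check *)
    double 3;               (* evaluates to 6 *)
    double "a";             (* raises a suitable run-time exception: + applied to a string *)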
3 Values
We distinguish between primitive values and compound values. Primitive values include numbers, nullary constructors, and names. Names represent primitive operations, reference cells, and functions. Compound values are obtained by record and variant construction. We organize values into a first-order structure over which we obtain sufficiently rich first-order formulas. This set-up gives us a relation α ⊨ φ that holds if an assignment α satisfies a formula φ. An assignment is a mapping from variables to values. The details of such a construction can be found in [17].
4 States
A state is a finite function σ mapping addresses a to so-called units u. Units are either primitive values other than names or representations of records, variants, reference cells, functions, and primitive operations:

    {l1 = a1, ..., lk = ak}    c(a)    ref(a)    fun(Match, Env)    primop

A match Match is a sequence of clauses (p1 => e1 | ... | pk => ek). An environment Env is a finite function from program variables to addresses. For technical convenience we will identify addresses, names and variables occurring in formulas. We say that an assignment α satisfies a state σ and write α ⊨ σ if for every address a in the domain of σ the following holds:

1. If σ(a) is a primitive value v, then α(a) = v.
2. If σ(a) is a reference cell, a function, or primop, then α(a) = a.
3. If σ(a) = {l1 = a1, ..., lk = ak}, then α(a) = {l1 = α(a1), ..., lk = α(ak)}.
4. If σ(a) = c(a'), then α(a) = c(α(a')).

Note that every state is satisfiable and that all assignments satisfying a state σ agree on the domain of σ. Our states and environments are different from the corresponding notions in the definition of SML. There environments map program variables to values and states map addresses to values. In our set-up environments map program variables to addresses and states map addresses to units. This means that our states represent both stateful and stateless information. Moreover, our states make structure sharing explicit. The sharing information is lost when we move from a state to a satisfying assignment. Given a state σ, the unique restriction of an assignment satisfying σ to the domain of σ is an environment in the sense of the definition of SML. The relation σ ⊨ φ is defined to hold if and only if every assignment that satisfies σ also satisfies φ. If the free variables of φ are all in the domain of σ, then we have either σ ⊨ φ or σ ⊨ ¬φ. This means that a state has complete information about the values of its addresses. We require that the first-order language be rich enough so that for every σ there exists a formula φσ such that

    α ⊨ σ  if and only if  α ⊨ φσ

for all assignments α. Note that σ determines φσ up to logical equivalence. We use φσ to denote a constraint with the above property and say that φσ represents the logic content of σ.
5 Thread and Store
The operational semantics of DML distinguishes between a thread and a store. The thread is a functional evaluator that operates on the store. The states of the store are the states defined above. The store should be thought of as an abstract data type that is accessed by the thread only through a number of predefined operations. An example of such an operation is record selection, which takes an address a and a label l and returns an address or an exception packet. An interesting primitive operation is the equality test a1 = a2. It returns true if σ ⊨ a1 = a2 and false if σ ⊨ ¬(a1 = a2). Note that this definition yields structural equality for records and variants. We write σ → σ' to say that there is an operation on the store that will replace the state σ with a state σ'. The following monotonicity property holds for DML and all extensions we will consider:

    if σ → σ', then σ ⊨ φ implies σ' ⊨ φ

provided all free variables of φ are in the domain of σ.
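To illustrate the equality test just defined, here is a small sketch; the commented outcomes follow from the definitions above (structural equality for records, identity for cells and functions) and are our reading, not system output:

    {a = 1, b = 2} = {a = 1, b = 2};   (* true: records are compared structurally *)
    ref 1 = ref 1;                     (* false: two distinct reference cells (names) *)
    let val r = ref 1 in r = r end;    (* true: the same cell *)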
6 Logic Variables
We now extend DML with logic variables, one of the essentials of logic programming. Logic variables are a means to represent in a state partial information about the values of addresses. Logic variables are modelled with a new unit lvar. The definition of states is extended so that a state may map an address also to lvar or an address. We only admit states σ whose dereference relation a →σ a' is terminating, where a →σ a' holds iff σ(a) = a'. The case σ(a) = a' may appear when a logic variable is bound. We use σ*(a) to denote the unique normal form of a with respect to the dereference relation →σ. The definition of the relation α ⊨ σ is extended as follows:

1. If σ(a) = lvar, then there is no constraint on α(a).
2. If σ(a) = a', then α(a) = α(a').

States can now represent partial information about the values of their addresses. This means that σ does not necessarily determine the truth value of a formula φ whose free variables are in the domain of σ. Our states contain more information than necessary. For instance, if σ(a) = a1 and σ(a1) = a2, then the difference between σ and σ[a2/a] cannot be observed at the level of the programming language. In general, we impose the semantic requirement that for a state σ and addresses a1 and a2 such that σ ⊨ a1 = a2 the difference between a1 and a2 must not be observable. As it comes to space complexity, it is nevertheless important to model structure sharing.

The existing operations of DML are extended to the new states as follows. If an operation needs more information about its arguments than the state provides, then the operation returns the control value blocked. If there is only one thread, then computation will terminate. If there are several threads, the thread will retry the operation in the hope that other threads have contributed the missing information (see next section). A match (e.g., (x::xr => e1 | nil => e2)) blocks until the store contains enough information to commit to one of the clauses or to know that none applies. Of particular interest is the equality test a1 = a2, which we defined to return true if σ ⊨ a1 = a2 and false if σ ⊨ ¬(a1 = a2). Since in the presence of logic variables σ may entail neither a1 = a2 nor ¬(a1 = a2), the equality test a1 = a2 may block. There is an efficient incremental algorithm [17] that checks for entailment and disentailment of equations a1 = a2. We say that an operation is granted by σ if it does not block on σ. All extensions we will consider will satisfy the generalized monotonicity condition: if σ grants o and σ → σ', then σ' grants o. The operation
    ivar: unit -> 'a

creates a fresh logic variable and returns its address. The operation
    isvar: 'a -> bool
returns true if its argument a dereferences to a logic variable (i.e., σ(σ*(a)) = lvar) and false otherwise. The variable binding operation
    <- : 'a * 'a -> 'a
expects that its left argument is a logic variable, binds it to its right argument, and returns the right argument. More precisely, if <- is applied to (a1, a2) and σ*(a1) = a3, σ(a3) = lvar and σ*(a2) = a4, we distinguish two cases:

1. If a3 = a4, then there is no side effect and a4 is returned.
2. If a3 ≠ a4, then the store is updated to σ[a4/a3] and a4 is returned.

The operation
    wait: 'a -> 'a
is an identity function that blocks until its argument is bound to a nonvariable unit. This operation is useful for concurrent programming.
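A single-threaded sketch of the first three operations; the commented results are our reading of the specifications above:

    let
      val x = ivar()         (* fresh logic variable *)
      val b = isvar(x)       (* true: x is still unbound *)
    in
      x <- 7;                (* bind x to 7; returns 7 *)
      (b, isvar(x), x + 1)   (* (true, false, 8): x now dereferences to 7 *)
    end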
7 Multiple Threads
It is straightforward to extend DML with multiple threads. We use interleaving semantics, that is, the operations threads perform on the store do not overlap in time. Threads can be created with the expression

    spawn e

which spawns a new thread evaluating e and returns (). Often it is convenient to use the derived form

    thread e

which expands to

    let val x = ivar() in (spawn x <- e); x end

where x is a program variable that does not occur free in e. For instance, if we want to evaluate the constituents of the application e(e1, e2) concurrently, we can simply write

    (thread e) (thread e1, thread e2)

since the necessary synchronization comes for free. The combination of logic variables and reference cells provides for powerful synchronization techniques. For this we need an operation

    exchange: 'a ref * 'a -> 'a
which updates the reference cell given as first argument to hold the second argument and returns the previous content of the cell. Now a function

    mutex: (unit -> 'a) -> 'a

that applies the function given as argument under mutual exclusion can be written as follows:
    local
      val r = ref()                          (* holds the release token of the last entrant *)
    in
      fun mutex(a) =
        let val c = ivar()                   (* fresh token for this entrant *)
        in
          wait(exchange(r, c));              (* wait until the previous entrant releases *)
          let val v = a() in c <- (); v end  (* run a(), then release by binding c *)
        end
    end;
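A hypothetical usage sketch (the names counter and incr are ours, and we assume SML's := and ! on reference cells carry over to DML): serialising updates of a shared cell with mutex.

    val counter = ref 0
    fun incr() = mutex(fn () => (counter := !counter + 1; !counter))
    (* spawn incr(); spawn incr();  -- the two updates never interleave *)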
A function

    channel: unit -> {put: 'a -> unit, get: unit -> 'a}

that returns an asynchronous channel (i.e., a concurrent queue) can be written as follows:
    fun channel() =
      let
        val init = ivar()
        val putr = ref init          (* points to the unbound tail of the queue *)
        val getr = ref init          (* points to the unread head of the queue *)
        fun put(x) =
          let val new = ivar()
              val old = exchange(putr, new)
          in old <- x::new; () end
        fun get() =
          let val new = ivar()
              val x::c = exchange(getr, new)
          in new <- c; x end
      in
        {put = put, get = get}
      end

The put function puts items on the channel and the get function gets items from the channel. The get function blocks until there is an item on the channel. The blocking is caused by the match

    val x::c = exchange(getr, new)

To obtain fairness, the simple requirement that every thread that is not blocked will eventually advance suffices. In the two examples above starvation is excluded since the blocked threads are implicitly queued by means of logic variables. Note that both example functions encapsulate the logic variables they introduce. Our simple fairness requirement rests on the generalized monotonicity condition stated above (i.e., the property that a thread can advance cannot be invalidated by the operations performed by other threads). Languages that take channels as concurrency primitive (e.g., Pict [7]) require the more complicated fairness condition that our channels implement with logic variables. The outlined style of concurrent programming originated with Oz and is explored in [15, 1]. The paper [15] relates to a previous version of Oz that did not have sequential composition. The book [1] is based on the current version of Oz and explores concurrent programming with object-oriented abstractions. The interested reader may also consult [19], which outlines a distributed version of Oz currently under development.
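A hypothetical usage sketch of the channel (the names c and x are ours): the producer runs in its own thread, and get blocks until the first item has arrived.

    val c = channel()
    val _ = spawn (#put c 1; #put c 2)   (* producer thread *)
    val x = #get c ()                    (* consumer: yields 1, blocking until put has run *)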
8 Unification

Next we define unification. We say that σ' is obtained from σ by a narrowing step if there are addresses a and a' in the domain of σ such that σ(a) = lvar, σ*(a') ≠ a, and σ' = σ[a'/a]. Note that the variable binding operation
    <- : 'a * 'a -> 'a

expects that its two arguments a1 and a2 be unifiable. If this is the case, it narrows the state accordingly and returns a2. Otherwise, it returns an exception packet. Our states combine first-order constraints with higher-order functions and reference cells. Unification only concerns the part of a state that represents first-order constraints. Investigations of unification and constraint solving that relate to the unification defined here can be found in [3, 17].
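For instance, unification can bind variables on both sides of compound values; in the following sketch the commented results are our reading of the narrowing semantics above:

    let
      val x = ivar()
      val y = ivar()
    in
      {a = x, b = 2} <- {a = 1, b = y};   (* narrows the state: x is bound to 1, y to 2 *)
      (x, y)                              (* equivalent to (1, 2) *)
    end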
9 Choices

An essential feature of logic programming is a built-in mechanism for search. To add this feature to DML, we introduce choice expressions of the form

    choice e1 | ... | ek

A choice is evaluated by replacing it with one of its alternatives ei. To make this practical, the choices are tried from left to right employing chronological backtracking as in Prolog. We arrange things such that a speculative computation terminates with failure if a unification operation fails. If there is only one thread, this gives us the search mechanism of pure Prolog. If there are multiple threads, we require that a choice is only committed once all other threads are either blocked or can only advance by committing a choice.
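A minimal sketch of this search mechanism (the function select and the failure-driven backtracking reading are our illustration, not from the paper):

    fun select(x::xr) = choice x | select(xr)

    let val z = select [1, 2, 3]
    in z <- 2; z end
    (* the alternative z = 1 makes the unification 1 <- 2 fail; backtracking
       then tries z = 2, which succeeds, so the expression is equivalent to 2 *)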
10 Spaces

The outlined Prolog-like search is not satisfactory in a concurrent setting since search is done at the top level and cannot be encapsulated into concurrent agents. It also fails to provide means for programming search engines like all solution search. This long standing problem of logic programming is solved by Oz with a new concept called spaces. A space is a box consisting of a store and threads. Computation in a space is speculative and does not have a direct effect outside. Computation in a space proceeds until the space becomes either failed or stable. Stability means that no thread can advance except by committing a choice. There is an operation that blocks until a space is failed or stable and then reports the result. For stable spaces there are two possibilities: either there is a pending choice or not. If there is no pending choice, the space can be merged with the parent space to obtain the result of the speculative computation. If there is a pending choice, the space can be cloned and be committed to the respective alternatives. Spaces turn out to be a simple and flexible means for programming search engines. A first version is described in [11, 14]. A recent paper on spaces and their use is [10].
11 Finite Domain Constraints

Finite domain constraints are constraints over integers that in conjunction with constraint programming yield a powerful tool for solving combinatorial problems like scheduling [2, 18, 12]. To include them in our framework, we introduce a new unit lvar(D) that represents a logic variable that is constrained to take a value in D, where D must be a finite set of integers. Variable binding and unification are adapted so that they respect finite domain constraints. The primitive operations of DML treat finite domain variables like unconstrained variables. There is a new primitive operation

    fdvar: findom -> int
that returns a fresh logic variable constrained to the finite domain given as argument. Unification is extended to handle constrained logic variables according to their logical meaning. For instance, the expression

    let val x = fdvar [1,2] val y = fdvar [0,2] in x == y end

is equivalent to the expression 2. More expressive constraints like 2·x = y are realized with concurrent agents called propagators. For instance, if the store knows that x ∈ {1, ..., 10} and y ∈ {1, ..., 9}, a propagator for the constraint 2·x = y can narrow the domains of x and y to x ∈ {1,2,3,4} and y ∈ {2,4,6,8}. This form of inference is called constraint propagation. In general, there will be many propagators that communicate through the store. The power of a constraint programming system depends on the class of propagators it offers. Depending on the constraints they
realize, propagators often use nontrivial algorithms. A ubiquitous constraint is "x1, ..., xk are all different". For instance, if the store knows

    x ∈ {1,2,3}   y ∈ {1,2,3}   z ∈ {1,2,3}   u ∈ {1,2,3,4,5}   v ∈ {1,3,4}

a propagator for "x, y, z, u, v are all different" can narrow the domains to

    x ∈ {1,2,3}   y ∈ {1,2,3}   z ∈ {1,2,3}   u ∈ {5}   v ∈ {4}

which determines the values of u and v. There is a complete propagation algorithm for the all different constraint that has quadratic complexity in the number of variables and possible values [8].
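Mirroring the earlier fdvar example, a sketch using the binding/unification operation <- from Section 8; the concrete narrowing is our reading of the adapted unification:

    let
      val x = fdvar [2,3,4]
      val y = fdvar [4,5]
    in
      x <- y    (* unification intersects the two domains, determining 4 *)
    end
    (* equivalent to the expression 4 *)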
12 Feature Constraints
Feature constraints are constraints over records that have applications in computational linguistics and knowledge representation. There is a parameterized primitive operation

    tellfeature#label: record * 'a -> unit

that constrains its first argument to be a record that has a field with the label label and with the value that is given as second argument. For instance, tellfeature#age(x, y) posts the constraint that x is a record of the form {age=y, ...}. To accommodate feature constraints, we use units of the form
    lvar(w, {l1 = a1, ..., lk = ak})

that represent logic variables that are constrained to records as specified. A record satisfies the above specification if it has at most w fields and at least a field for every label li with a value that satisfies the constraints for the address ai. The metavariable w stands for a nonnegative integer or ∞, where k ≤ w. There is also a primitive operation

    tellwidth: record * int -> unit
that constrains its first argument to be a record with as many fields as specified by the second argument. The operation tellfeature is in fact a unification operation. It narrows the store in a minimal way so that the logic content of the new state is equivalent to the logic content of the old state conjoined with the feature constraint told. For instance, the expression

    let val x = ivar() in tellwidth(x,1); tellfeature#a(x,7); x end

is equivalent to the expression {a=7}.
Feature constraints and the respective unification algorithms are the subject of [17]. Feature constraints are related to Ohori's [5] inference algorithm for polymorphic record types.
13 Conclusion
The main point of the paper is the insight that logic and concurrent constraint languages can be profitably based on functional core languages with call by value semantics. This avoids unnecessary duplication of concepts. SML wins over Scheme since it has richer data structures and factored out reference cells. Our approach does not unify the theories behind functional and logic programming. It treats the extensions necessary for concurrent constraint programming at an abstract implementation level. To understand and analyse concurrent constraint programming, more abstract models are needed (e.g., [9, 2, 13, 15, 16]). It seems feasible to extend the SML type system to logic variables and constraints. Such an extension would treat logic variables similar to reference cells. Feature constraints could possibly be treated with Ohori's polymorphic record types [5]. The approach presented here is an outcome of the Oz project. The development of Oz started in 1991 from logic programming and took several turns. Oz subsumes all concepts in this paper but has its own syntax and is based on a relational rather than a functional core. The relational core makes Oz more complicated than necessary. The insights formulated in this paper can be used to design a new and considerably simplified version of Oz. Such a new Oz would be more accessible to programmers experienced with SML and would be a good vehicle for teaching concurrent constraint programming.
Acknowledgments. Thanks to Martin Müller and Christian Schulte for comments on a draft of the paper, and thanks to all Oz developers for inspiration.
References

1. M. Henz. Objects for Concurrent Constraint Programming. Kluwer Academic Publishers, Boston, Nov. 1997.
2. J. Jaffar and M. J. Maher. Constraint logic programming: A survey. The Journal of Logic Programming, 19/20:503-582, May-July 1994.
3. J.-L. Lassez, M. J. Maher, and K. Marriott. Unification revisited. In J. Minker, editor, Foundations of Deductive Databases and Logic Programming. Morgan Kaufmann Publishers, San Mateo, CA, USA, 1988.
4. R. Milner, M. Tofte, R. Harper, and D. MacQueen. The Definition of Standard ML (Revised). The MIT Press, Cambridge, MA, 1997.
5. A. Ohori. A polymorphic record calculus and its compilation. ACM Trans. Prog. Lang. Syst., 17(6):844-895, 1995.
6. Oz. The Oz Programming System. Programming Systems Lab, DFKI and Universität des Saarlandes: http://www.ps.uni-sb.de/oz/.
7. B. C. Pierce and D. N. Turner. Pict: A programming language based on the pi-calculus. In Proof, Language and Interaction: Essays in Honour of Robin Milner. The MIT Press, Cambridge, MA, 1997.
8. J.-C. Régin. A filtering algorithm for constraints of difference in CSPs. In Proceedings of the National Conference on Artificial Intelligence, pages 362-367, 1994.
9. V. A. Saraswat. Concurrent Constraint Programming. The MIT Press, Cambridge, MA, 1993.
10. C. Schulte. Programming constraint inference engines. In G. Smolka, editor, Proceedings of the 3rd International Conference on Principles and Practice of Constraint Programming, volume 1330 of Lecture Notes in Computer Science, pages 519-533, Schloss Hagenberg, Linz, Austria, Oct. 1997. Springer-Verlag.
11. C. Schulte and G. Smolka. Encapsulated search in higher-order concurrent constraint programming. In M. Bruynooghe, editor, Proceedings of the International Logic Programming Symposium, pages 505-520, Ithaca, New York, USA, Nov. 1994. The MIT Press, Cambridge, MA.
12. C. Schulte, G. Smolka, and J. Würtz. Finite domain constraint programming in Oz, a tutorial, 1998. Programming Systems Lab, DFKI and Universität des Saarlandes: ftp://ftp.ps.uni-sb.de/oz/documentation/FDTutorial.ps.gz.
13. G. Smolka. A foundation for concurrent constraint programming. In J.-P. Jouannaud, editor, Constraints in Computational Logics, volume 845 of Lecture Notes in Computer Science, pages 50-72. Springer-Verlag, Berlin, Sept. 1994.
14. G. Smolka. The definition of Kernel Oz. In A. Podelski, editor, Constraints: Basics and Trends, volume 910 of Lecture Notes in Computer Science, pages 251-292. Springer-Verlag, Berlin, 1995.
15. G. Smolka. The Oz Programming Model. In J. van Leeuwen, editor, Computer Science Today, volume 1000 of Lecture Notes in Computer Science, pages 324-343. Springer-Verlag, Berlin, 1995.
16. G. Smolka. Problem solving with constraints and programming. ACM Computing Surveys, 28(4), Dec. 1996. Electronic Section.
17. G. Smolka and R. Treinen. Records for logic programming. The Journal of Logic Programming, 18(3):229-258, Apr. 1994.
18. P. Van Hentenryck, V. Saraswat, et al. Strategic directions in constraint programming. ACM Computing Surveys, 28(4):701-726, Dec. 1997. ACM 50th Anniversary Issue. Strategic Directions in Computing Research.
19. P. Van Roy, S. Haridi, P. Brand, G. Smolka, M. Mehl, and R. Scheidhauer. Mobile objects in Distributed Oz. ACM Transactions on Programming Languages and Systems, 19(5), Sept. 1997.
A Bisimulation Method for Cryptographic Protocols

Martin Abadi (1) and Andrew D. Gordon (2)
(1) Digital Equipment Corporation, Systems Research Center
(2) University of Cambridge, Computer Laboratory
Abstract. We introduce a definition of bisimulation for cryptographic protocols. The definition includes a simple and precise model of the knowledge of the environment with which a protocol interacts. Bisimulation is the basis of an effective proof technique, which yields proofs of classical security properties of protocols and also justifies certain protocol optimisations. The setting for our work is the spi calculus, an extension of the pi calculus with cryptographic primitives. We prove the soundness of the bisimulation proof technique within the spi calculus.
1 Introduction
In reasoning about a reactive system, it is necessary to consider not only the steps taken by the system but also the steps taken by its environment. In the case where the reactive system is a cryptographic protocol, the environment may well be hostile, so little can be assumed about its behaviour. Therefore, the environment may be modelled as a nondeterministic process capable of intercepting messages and of sending any message that it can construct at any point. This approach to describing the environment is fraught with difficulties; the resulting model can be somewhat arbitrary, hard to understand, and hard to reason about. Bisimulation techniques [Par81,Mil89] provide an alternative approach. Basically, using bisimulation techniques, we can equate two systems whenever we can establish a correspondence between their steps. We do not need to describe the environment explicitly, or to analyse its possible internal computations. Bisimulation techniques have been applied in a variety of areas and under many guises. Their application to cryptographic protocols, however, presents new challenges.

- Consider, for example, a secure communication protocol P(M), where some cleartext M is transmitted encrypted under a session key. We may like to argue that P(M) preserves the secrecy of M, and may want to express this secrecy property by saying that P(M) and P(N) are equivalent, for every M and N. This equivalence may be sensible because, although P(M) and P(N) send different messages, an attacker that does not have the session key cannot identify the cleartext. Unfortunately, a standard notion of bisimulation would require that P(M) and P(N) send identical messages. So we should relax the definition of bisimulation to permit indistinguishable messages.
- In reasoning about a protocol, we need to consider its behaviour in reaction to inputs from the environment. These inputs are not entirely arbitrary. For example, consider a system P(M) which discloses M when it receives a certain password. Assuming that the password remains secret, P(M) should be equivalent to P(N). In order to argue this, we need to characterise the set of possible inputs from the environment, and to show that it cannot include the password.
- Two messages that are indistinguishable at one point in time may become distinguishable later on. In particular, the keys under which they are encrypted may be disclosed to the environment, which may then inspect the cleartext that these keys were concealing. Thus, the notion of indistinguishability should be sensitive to future events. Conversely, the set of possible inputs from the environment grows with time, as the environment intercepts messages and learns values that were previously secret.
In short, a definition of bisimulation for cryptographic protocols should explain what outputs are indistinguishable for the environment, and what inputs the environment can generate at any point in time. In this paper we introduce a definition of bisimulation that provides the necessary account of the knowledge of the environment. As we show, bisimulation can be used for reasoning about examples like those sketched informally above. More generally, bisimulation can be used for proving authenticity and secrecy properties of protocols, and also for justifying certain protocol optimisations. We develop our bisimulation proof technique in the context of the spi calculus [AG97a,AG97b,AG97c,Aba97], an extension of the pi calculus [MPW92] with cryptographic primitives. For simplicity, we consider only shared-key cryptography, although we believe that public-key cryptography could be treated through similar methods. Within the spi calculus, we prove the soundness of our technique. More precisely, we prove that bisimulation yields a sufficient condition for testing equivalence, the relation that we commonly use for specifications in the spi calculus. We have developed other proof techniques for the spi calculus in earlier work. The one presented in this paper is a useful addition to our set of tools. Its distinguishing characteristic is its kinship to bisimulation proof techniques for other classes of systems. In particular, bisimulation proofs can often be done without creativity, essentially by state-space exploration. The next section is a review of the spi calculus. Section 3 describes our proof method and Section 4 illustrates its use through some small examples. Section 5 discusses related work. Section 6 concludes.
2 The Spi Calculus (Review)
This section reviews the spi calculus, borrowing from earlier presentations. It gives the syntax and informal semantics of the spi calculus, introduces the main notations for its operational semantics, and finally defines testing equivalence.
2.1 Syntax
We assume an infinite set of names and an infinite set of variables. We let c, m, n, p, q, and r range over names, and let w, x, y, and z range over variables. When they represent keys, we let k and K range over names too. The set of terms is defined by the grammar:
    L, M, N ::=        terms
      n                name
      (M, N)           pair
      0                zero
      suc(M)           successor
      {M}N             encryption
      x                variable
Intuitively, {M}N represents the ciphertext obtained by encrypting the term M under the key N using a shared-key cryptosystem such as DES [DES77]. In examples, we write 1 as a shorthand for the term suc(0). The set of processes is defined by the grammar:
    P, Q, R ::=                        processes
      M̄⟨N⟩.P                           output
      M(x).P                           input (scope of x is P)
      P | Q                            composition
      (νn)P                            restriction (scope of n is P)
      !P                               replication
      [M is N] P                       match
      0                                nil
      let (x, y) = M in P              pair splitting (scope of x, y is P)
      case M of 0 : P suc(x) : Q       integer case (scope of x is Q)
      case L of {x}N in P              decryption (scope of x is P)
We abbreviate M̄⟨N⟩.0 to M̄⟨N⟩. We write P[M/x] for the outcome of replacing each free occurrence of x in process P with the term M, and identify processes up to renaming of bound variables and names. Intuitively, processes have the following meanings:

- An output process M̄⟨N⟩.P is ready to output N on M, and then to behave as P. The output happens only when there is a process ready to input from M.
- An input process M(x).Q is ready to input from M, and then to behave as Q[N/x], where N is the input received.
- A composition P | Q behaves as P and Q running in parallel.
- A restriction (νn)P is a process that makes a new, private name n, which may occur in P, and then behaves as P.
- A replication !P behaves as infinitely many replicas of P running in parallel.
- A match [M is N] P behaves as P provided that M and N are the same term; otherwise it is stuck, that is, it does nothing.
- The nil process 0 does nothing.
- A pair splitting process let (x, y) = M in P behaves as P[N/x][L/y] if M is a pair (N, L), and it is stuck if M is not a pair.
- An integer case process case M of 0 : P suc(x) : Q behaves as P if M is 0, as Q[N/x] if M is suc(N) for some N, and otherwise is stuck.
- A decryption process case L of {x}N in P attempts to decrypt L with the key N. If L has the form {M}N, then the process behaves as P[M/x]. Otherwise the process is stuck.

For example, P ≜ m(x).case x of {y}K in m̄⟨{0}y⟩ is a process that is ready to receive a message x on the channel m. When the message is a ciphertext of the form {y}K, process P sends 0 encrypted under y on the channel m. This process may be put in parallel with a process Q ≜ m̄⟨{K'}K⟩, which sends the name K' encrypted under K on the channel m. In order to restrict the use of K to P and Q, we may form (νK)(P | Q). The environment of (νK)(P | Q) will not be able to construct any message of the form {y}K, since K is bound. Therefore, the component P of (νK)(P | Q) may output {0}K', but not {0}z for any z different from K'. Alternatively, the component P of (νK)(P | Q) may produce no output: for example, were it to receive 0 on the channel m, it would get stuck. We write fn(M) and fn(P) for the sets of names free in term M and process P respectively, and write fv(M) and fv(P) for the sets of variables free in M and P respectively. A term or process is closed if it has no free variables.
2.2 Operational Semantics
An abstraction is an expression of the form (x)P, where x is a bound variable and P is a process. Intuitively, (x)P is like the process p(x).P minus the name p. A concretion is an expression of the form (νm1, ..., mk)⟨M⟩P, where M is a term, P is a process, k ≥ 0, and the names m1, ..., mk are bound in M and P. Intuitively, (νm1, ..., mk)⟨M⟩P is like the process (νm1)···(νmk)p̄⟨M⟩.P minus the name p, provided p is not one of m1, ..., mk. We often write concretions as (ν m̃)⟨M⟩P, where m̃ = m1, ..., mk, or simply (ν)⟨M⟩P if k = 0. Finally, an agent is an abstraction, a process, or a concretion. We use the metavariables A and B to stand for arbitrary agents, and let fv(A) and fn(A) be the sets of free variables and free names of an agent A, respectively. A barb is a name m (representing input) or a co-name m̄ (representing output). An action is a barb or the distinguished silent action τ. The commitment relation is written P -α-> A, where P is a closed process, α is an action, and A is a closed agent. The exact definition of commitment appears in earlier papers on the spi calculus [AG97b,AG97c]; informally, the definition says:

- P -τ-> Q means that P becomes Q in one silent step (a τ step).
- P -m-> (x)Q means that, in one step, P is ready to receive an input x on m and then to become Q.
- P -m̄-> (νm1, ..., mk)⟨M⟩Q means that, in one step, P is ready to create the new names m1, ..., mk, to send M on m, and then to become Q.
Testing
Equivalence
We say that two closed processes P and Q are testing equivalent, and write P ~_ Q, when for every closed process R and every barb/~, if
for some P' and A, then for some Q' and B, and vice versa. For example, the processes ( v K ) ~ ( { 0 } K ) and ( v K ) ~ ( { 1 } K ) are testing equivalent. We may interpret this equivalence as a security property, namely that the process ( v K ) ~ ( { X } K ) does not reveal to its environment whether x is 0 or 1. In the examples contained in Section 4 and in earlier papers, various other properties (including, in particular, secrecy properties) are formulated in terms of testing equivalence. In this paper we develop a sound technique for proving testing equivalence: we introduce a definition of bisimulation and show that if two closed processes are in one of our bisimulation relations then they are testing equivalent.
3 Framed Bisimulation

This section defines our notion of bisimulation, which we call framed bisimulation.
3.1 Frames and Theories
Our definition of bisimulation is based on the notions of a frame and of a theory. A bisimulation does not simply relate two processes P and Q, but instead relates two processes P and Q in the context of a frame and a theory. The frame and the theory represent the knowledge of the environment of P and Q.

- A frame is a finite set of names. Intuitively, a frame is a set of names available to the environment of the processes P and Q. We use fr to range over frames.
- A theory is a finite set of pairs of terms. Intuitively, a theory that includes a pair (M, N) indicates that the environment cannot distinguish the data M coming from process P and the data N coming from process Q. We use th to range over theories.
Next we define the predicate (fr, th) ⊢ M ↔ N inductively, by a set of rules. Intuitively, this predicate means that the environment cannot distinguish M coming from P and N coming from Q, and that the environment has (or can construct) M in interaction with P and N in interaction with Q.

    (Eq Frame)    if n ∈ fr, then (fr, th) ⊢ n ↔ n
    (Eq Theory)   if (M, N) ∈ th, then (fr, th) ⊢ M ↔ N
    (Eq Variable) (fr, th) ⊢ x ↔ x
    (Eq Pair)     if (fr, th) ⊢ M ↔ M' and (fr, th) ⊢ N ↔ N', then (fr, th) ⊢ (M, N) ↔ (M', N')
    (Eq Zero)     (fr, th) ⊢ 0 ↔ 0
    (Eq Suc)      if (fr, th) ⊢ M ↔ M', then (fr, th) ⊢ suc(M) ↔ suc(M')
    (Eq Encrypt)  if (fr, th) ⊢ M ↔ M' and (fr, th) ⊢ N ↔ N', then (fr, th) ⊢ {M}N ↔ {M'}N'

For example, if fr = {n} and th = {({0}K, {n}K)}, where n and K are distinct names, then we have (fr, th) ⊢ n ↔ n and (fr, th) ⊢ {0}K ↔ {n}K, and also (fr, th) ⊢ (n, {0}K) ↔ (n, {n}K), but we have neither (fr, th) ⊢ K ↔ K nor (fr, th) ⊢ {n}K ↔ {0}K. We say that the pair (fr, th) is ok, and write (fr, th) ⊢ ok, if two conditions hold:

(1) whenever (M, N) ∈ th:
    - M is closed and there are terms M1 and M2 such that M = {M1}M2 and there is no N2 such that (fr, th) ⊢ M2 ↔ N2;
    - N is closed and there are terms N1 and N2 such that N = {N1}N2 and there is no M2 such that (fr, th) ⊢ M2 ↔ N2;
(2) whenever (M, N) ∈ th and (M', N') ∈ th, M = M' if and only if N = N'.

Intuitively, the first condition requires that each term in a pair (M, N) in a theory be formed by ciphertexts that the environment cannot decrypt. For example, the requirement that there be no N2 such that (fr, th) ⊢ M2 ↔ N2 means that the environment cannot construct M2 for decrypting M. The second condition guarantees that no ciphertext is equated to two other ciphertexts. This condition is essential because the environment can compare ciphertexts even when it cannot decrypt them (see Example 2 of Section 4).
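To make the rules concrete, here is the derivation of the third judgement in the example above, written out as a proof tree (our reconstruction from the rules (Eq Frame), (Eq Theory), and (Eq Pair)):

\[
\frac{\dfrac{n \in fr}{(fr, th) \vdash n \leftrightarrow n}\ \text{(Eq Frame)}
\qquad
\dfrac{(\{0\}_K, \{n\}_K) \in th}{(fr, th) \vdash \{0\}_K \leftrightarrow \{n\}_K}\ \text{(Eq Theory)}}
{(fr, th) \vdash (n, \{0\}_K) \leftrightarrow (n, \{n\}_K)}\ \text{(Eq Pair)}
\]

By contrast, (fr, th) ⊢ K ↔ K has no derivation, since K ∉ fr and K appears in no pair of th.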
3.2 Ordering Frame-Theory Pairs
We define an ordering between pairs of frames and theories as follows: we let (fr, th) ≤ (fr', th') if and only if for all M and N, (fr, th) ⊢ M ↔ N implies (fr', th') ⊢ M ↔ N. This relation is reflexive and transitive. It is not the same as the pairwise ordering induced by subset inclusion; fr ⊆ fr' and th ⊆ th' imply (fr, th) ≤ (fr', th'), but the converse implication does not hold.

Proposition 1. Suppose (fr', th') ⊢ ok. Then (fr, th) ≤ (fr', th') if and only if fr ⊆ fr' and (fr', th') ⊢ M ↔ N for each (M, N) ∈ th.

As indicated above, we view a pair (fr, th) as a representation for the knowledge of an environment. With this view, and assuming that (fr', th') ⊢ ok, the relation (fr, th) ≤ (fr', th') means that the environment may go from the knowledge represented in (fr, th) to the knowledge represented in (fr', th'). The definition of (fr, th) ≤ (fr', th') implies that the set of names and terms that the environment has (or can construct) grows in this transition. It also implies that any indistinguishable pair of terms remains indistinguishable after the transition. So, if ever we assert that (fr, th) characterises an environment, we should take care that (fr, th) does not imply that the environment cannot distinguish two terms M and N if later information would allow the environment to distinguish these terms. For example, if fr' includes the name n, then th should not contain ({0}n, {1}n). Intuitively, the acquisition of the name n would allow the environment to distinguish {0}n and {1}n, so (fr', th') ⊢ {0}n ↔ {1}n would not hold. On the other hand, th may contain ({0}n, {0}n); in that case, th' could not contain ({0}n, {0}n), but we would at least have (fr', th') ⊢ {0}n ↔ {0}n.
3.3 Framed Relations and Bisimulations
For a theory th, we let fn(th) = ∪{fn(M) ∪ fn(N) | (M, N) ∈ th}. We let π1(th) = {M | (M, N) ∈ th} and π2(th) = {N | (M, N) ∈ th}, and write fn(π1(th)) and fn(π2(th)) for the sets of names ∪{fn(M) | M ∈ π1(th)} and ∪{fn(N) | N ∈ π2(th)} respectively. A framed process pair is a quadruple (fr, th, P, Q) such that P and Q are closed processes, fr is a frame, and th is a theory. When R is a set of framed process pairs, we write (fr, th) ⊢ P R Q to mean (fr, th, P, Q) ∈ R. A framed relation is a set R of framed process pairs such that (fr, th) ⊢ ok whenever (fr, th) ⊢ P R Q. A framed simulation is a framed relation S such that, whenever (fr, th) ⊢ P S Q, the following three conditions hold.

- If P -τ-> P', then there is a process Q' with Q -τ-> Q' and (fr, th) ⊢ P' S Q'.
- If P -c-> (x)P' and c ∈ fr, then there is an abstraction (x)Q' with Q -c-> (x)Q' and, for all sets {ñ} disjoint from fn(P) ∪ fn(Q) ∪ fr ∪ fn(th) and all closed M and N, if (fr ∪ {ñ}, th) ⊢ M ↔ N then (fr ∪ {ñ}, th) ⊢ P'[M/x] S Q'[N/x].
- If P -c̄-> (ν m̃)⟨M⟩P', c ∈ fr, and the set {m̃} is disjoint from fn(P) ∪ fn(π1(th)) ∪ fr, then there is a concretion (ν ñ)⟨N⟩Q' with Q -c̄-> (ν ñ)⟨N⟩Q' and the set {ñ} is disjoint from fn(Q) ∪ fn(π2(th)) ∪ fr, and there is a frame-theory pair (fr', th') such that (fr, th) ≤ (fr', th'), (fr', th') ⊢ M ↔ N, and (fr', th') ⊢ P' S Q'.

We may explain these conditions as follows.

- The first condition simply requires that if P can take a τ step then Q can match this step.
- The second condition concerns input steps where the channel c on which the input happens is in fr (that is, it is known to the environment). In this case, we must consider the possible inputs M from the environment to (x)P', namely the terms M that the environment can construct according to (fr ∪ {ñ}, th). The names in ñ are fresh names, intuitively names just generated by the environment. Correspondingly, we consider the possible inputs N for (x)Q', for an appropriate (x)Q' obtained from Q. We then require that giving these inputs to (x)P' and (x)Q', respectively, yields related processes P'[M/x] and Q'[N/x]. The choice of (x)Q' is independent of the choices of M and N. So, in the technical jargon, we may say that S is a late framed simulation.
- The third condition concerns output steps where the channel c on which the output happens is in fr (that is, it is known to the environment). In this case, P outputs the term M while creating the names m̃. The condition requires that Q can output a corresponding term N while creating some names ñ. It also constrains M and N, and the resulting processes P' and Q'. The constraints concern a new frame-theory pair (fr', th'). Intuitively, this pair represents the knowledge of the environment after the output step. The requirement that (fr', th') ⊢ M ↔ N means that the environment obtains M in interaction with P and N in interaction with Q, and that it should not be able to distinguish them from one another. Because we do not impose a minimality requirement on (fr', th'), this pair may attribute "too much" knowledge to the environment. For example, fr' may contain names that are neither in fr nor in M or N, so intuitively the environment would not be expected to know these names. On the other hand, the omission of a minimality requirement results in simpler definitions, and does not compromise soundness.

A framed bisimulation is a framed relation S such that both S and S⁻¹ are framed simulations. Framed bisimilarity (written ∼f) is the greatest framed bisimulation. By the Knaster-Tarski fixpoint theorem, since the set of framed relations ordered by subset inclusion forms a complete lattice, framed bisimilarity exists, and equals the union of all framed bisimulations. Our intent is that our definition of framed bisimulation may serve as the basis for an algorithm, at least for finite-state processes. Unfortunately, the definition contains several levels of quantification. The universal quantifiers present a serious obstacle to any algorithm for constructing framed bisimulations. In particular, the condition for input steps concerns all possible inputs M and N; these inputs are of unbounded size, and may contain an arbitrary number of fresh names. However, we conjecture that the inputs can be classified according to a finite number of patterns--intuitively, because the behaviour of any finite-state process can depend on at most a finite portion of its inputs. An algorithm for constructing framed bisimulations might consider all inputs that match the same pattern at once. We leave the invention of such an algorithm for future work.
Soundness
(Summary)
Our main soundness theorem about framed bisimulation is that it is a sufficient condition for testing equivalence. T h e o r e m 2. Consider any closed processes P and Q, and any name n f[ f~(P)U fn(Q). Suppose that (fn(P) U fn(Q) u {n}, O) ~- P ~ f Q. Then P "~ Q.
20 This theorem implies that if we want to prove that two processes are testing equivalent, then we may construct a framed bisimulation S such that (fn(P) U fn(Q) u {n}, 0) ~- P S Q where n is a single, arbitrary new name. (The addition of the name n is technically convenient, but may not be necessary.) The next section illustrates this approach through several examples. The proof of this theorem requires a number of auxiliary notations, definitions, and lemmas. We omit the details of the proof, and only indicate its main idea. In the course of the proof, we extend the relation ~-~ to processes: we define the predicate (fr, th) F- P ++ Q by the following rules. (EQ Out) (fr, th) F- M ~ M' (fr, th) F N ~ g ' (fr, th) F P ++ P' (fr, th) F M ( N ) . P ~ M ' ( N ' ) . P ' (Eq In) (fr, th) I- M ~ M' (fr, th) ~- P ++ P' (fr, th) F- M ( x ) . P ~ M'(x).P' (Eq Par) (fr, th) ~- P ~ P' (fr, th) F- Q ++ Q' ( ~ , t h ) F- P I Q ~ P ' I Q '
(Eq Repl) (fr, th) I- P ++ P' (fr, th) F- !P ~ !P'
(Eq Res) (where n • fr U fn(th)) (fr U {n}, th) ~- P ++ P'
(fr, th) F (vn)P ~ (un)P'
(Eq Match) (fr, th) F M ~ M' (fr, th) ~- N ~-~ N' (fr, th) ~- P ++ P' (fr, th) F [M is N] P ++ [M' is N'] P' (Eq Nil)
(fr, th) ~- 0 ++ 0
(EQ Let) (fr, th) F M o M' (fr, th) F P + 4 P ' (fr, th) ~- let (x, y) = M in P ++ let (x, y) = M' in P'
(Eq IntCase) (fr, th) F M ++ M' (fr, th) t- P ~ P' (fr, th) t- Q o Q' (fr, th) ~- case M of 0 : P suc(x) : Q ~ case M' of 0 : P' suc(x) : Q' (Eq Decrypt) (fr, th) ~- M ++ M' (fr, th) F g ++ N' (fr, th) t- P ++ P' (fr, th) t- case M of {X}N in P ~ case M' of {x}g, in P' The core of the proof depends on a relation, $, defined so that P S Q if and only if there is a frame fr, a theory th, and processes P1, P2, Q1, Q2, such that
P = (vp-~(P1 ]P2)
Q -- (yq~(Q1 ]Q2)
and (fr, th) t- ok, (fr, th) ~- P1 ~ I Q~, and (fr, th) b P2 ++ Q2, where {p~ = (fn(P1) u/n(~rl (th))) - fr and {q~} = (fn(Q1) u fn(~r~(th))) - ft. By a detailed
21 case analysis, we may show that S satisfies the definition of a standard notion of bisimulation--a barbed bisimulation up to restriction and barbed equivalence. Given some auxiliary lemmas about testing equivalence, the theorem then follows easily. The construction of S also yields that framed bisimilarity is a sufficient condition for a strong equivalence called barbed congruence. The converse of soundness--completeness--does not hold. The failure of completeness follows from the fact that framed bisimilarity is a sufficient condition for barbed congruence. (Barbed congruence and a fortiori framed bisimilarity are sensitive to T steps and to branching structure, while testing equivalence is not.) Incompleteness may be somewhat unfortunate but, in our experience, it seems to be compatible with usefulness. 4
Examples
This section shows how bisimulations can be exploited in proofs through some small examples. These examples could not be handled by standard notions of bisimulation (like that of our earlier work [AG97b,AG97c]). We have worked through further examples, including some examples with more steps. In all cases, the proofs are rather straightforward. Throughout this section, c, K , K1, a n d / ( 2 are distinct names, and n is any name different from c. Moreover, M, M ~, M ' , M1,/1//2, N1, and N2 are closed terms; it is convenient to assume that no name occurs in them.
Example 1 As a first example, we show that the processes (vK)~({M}K) and (~K)-6({M'}K) are in a framed bisimulation, so they are testing equivalent. Intuitively, this means that these processes do not reveal M and M', respectively. For this example, we let S be the least relation such that: - ({c, n}, O) F- (vK)~({M}K) S (vK)~({M'}K) - ({c,n}, {({M}k, {M'}k)}) f- 0 S 0 for all names k ~t {c, n} Since ({c,n}, 0) ~- ok and ({c,n}, {({M}k, {M'}k)}) I- ok, 8 is a framed relation. Next we show that S is a framed simulation; symmetric reasoning establishes that 8 -1 is one too. Assuming that (fr, th) t- P S Q, we need to examine the commitments of P and Q. We consider two cases, which correspond to the two clauses of the definition of S. -
Suppose that P = (L,K)-6({M}K) and Q = (vK)-6({M'}K). In this case, we have f r = {c, n} and th = 0. Up to renaming of the bound name K , the only commitment of P is P ~ ~ (vK)({M}K)O. To establish that S is a framed simulation, we need only consider the case where K is renamed to some k r fn(P) U/n(Th(0)) U {c,n}, that is, k r {c,n}. By renaming, we have Q
"~ (~/k)({M'}k)O. We let th' = {({M}k, {M'}k)}. We have (fr, th) <_ (fr, th'), (fr, th') f- {M}k <-~ {M'}k, and (fr, th') F- 0 ~ O. Thus,
Q can match P's commitment.
22 - Suppose that P = 0 and Q = 0. This case is trivial, since 0 has no commitments. In short, ({c, n}, 0) t- (vK)-5({M}K) S (vK)-~({M'}K), and 8 is a framed bisimulation, as desired.
Example 2 As a small variant of the first example, we consider the processes (vK)-g(({M}K, {M}K)) and (vg)~(({M'}g, {M"}K)). When M ~ = M " , the argument of the first example works for this example too, with only trivial modifications. We define 8 as the least relation such that: - ({c, n}, O) ~- (uK)~(({M}K, {M}K)) S (uK)-6(({M'}K, {M"}K)) - ({c, n}, {({M}k, {M'}k), ({M}k, {M"}k)}) t- 0 S 0 for all names k ~ {c, n} This relation is a framed bisimulation when M ' = M " . On the other hand, it is not a framed bisimulation when M' # M " . In fact, in that case it is not even a framed relation, because ({c,n}, {({M}k, {M'}k), ({M}k, {M"}k)}) t- ok does not hold (because condition (2) of the definition of ok is not satisfied). The fact that S is not a framed bisimulation in this case should not be a concern. It is actually necessary: the processes (uK)-5(({M}K, {M}K)) and (vK)-~(({M'}K, {M"}K)) are not testing equivalent when M' # M". The environment c(z).let (x, y) -- z in [x is y] E(0) distinguishes them. Thus, this example illustrates that two ciphertexts that cannot be decrypted can still be compared, and justifies part of the definition of framed bisimulation.
Example 3 As a further variant, we study an example with nested encryption. We consider the processes (vK1)(vK2)e({M1, {M2}K2}K~).e(K1) and (vK1)(uK2) ~({N1, {N2}K2 }K~ ).~(K1). Each of these processes creates two keys K 1 a n d / ( 2 , sends a ciphertext, and then reveals K1. Anyone who receives K1 can partially decrypt the ciphertext. In order to analyse these processes, we let 8 be the least relation such that: - ({c,n},O) ~- (uK1)(vK2)-5<{M1, {M2}K2}Kl>.-5
S (vK1) (vK2)c<{N1, {N2 }K2 }K1 >.c - ({c, n, kl}, {({M2}k2, {N2}k2)}) F- ~(kl) S ~(kl> for all names kl, k2 with kl # k2 and {kl, k2} N {c, n} = 0
- ({c,n, kl},{({M2}k2,{N2}k2)}) ~- OS 0 for all names kl, k2 with kl # k2 and {kl, k2} N {c, n} = 0 Note how, between the first and the second clauses, the frame has been enlarged with kl, although the processes considered in the second clause have not yet sent kl; this simplifies the construction of S and is permitted by the definitions of Section 3. The assumptions guarantee that ({c, n, kl}, {({M2}k2, {g2}k2)}) I- ok and hence that S is a framed relation. Moreover, S is a framed bisimulation if and only if the following condition holds:
((c, n,
f-
{M2}k
(N1,
23 In turn, this condition holds if and only if M1 -- N1. Intuitively, the equality M1 = N1 becomes necessary only when the two processes send the key kl, since M1 and N~ are not visible in the first message. Our definition of _< guarantees that the necessity of M1 = N1 is propagated correctly.
Example 4 While all the examples above concern the secrecy of certain outputs, this one concerns the impossibility of certain inputs. We consider the processes (vK)'~({O}K).c(x).[x is K] ~({0}g) and (vg)-5({O}K>.C(x).O. The former process creates a key K , sends {0}g, listens for an input, and if it receives K then it sends {0}g again. However, we would expect that K will never arrive as an input to this process, since the process never discloses K (but only {0}g, from which K itself cannot be deduced). Therefore, we would expect this process to be equivalent to the latter process, which simply stops upon receipt of a message. For this example, we let S be the least relation such that: - ({c, n}, 0)~- ((vK)-5({O}g).c(x).[x is K] ~({0}K))S ((vK)-~<{O}g).c(x).O) -
((c, n}, (((0)k,
(c(x).[x is k]
S (e(x).0)
for all names k with k ~ {c, n} - ( ( c , n , ~ } , (((0}k, (0}k)}) }- (IN is k] ~((0}k)) S 0 for all names k with k ~ {c, n}, for all sets {~} disjoint from {c, n, k}, and all closed terms Y and g ' with ({c,n, fft},{({O}k,{O}k)}) ~ N <--+g ' (We are not assuming that no names occur in the closed terms N and N'.) Since ({c,n, r5}, {({0}k, {0}k)}) ~- N ~ N' and k ~ {c,n, rh}, the term Y is not k, so [N is k] ~{{0}k) ~> A is not true for any a and A. In other words, the process [N is k] e({0}k) is stuck. It follows easily that S is a framed bisimulation.
Example 5 In cryptographic protocols, some keys are generated by consulting sources of randomness, but it is also common to generate keys by applying one-way functions to other keys. (Whereas many one-way functions are quite efficient, randomness and key agreement can be relatively expensive [Sch96b].) As a final example, we consider a simple protocol transformation inspired by a common method for generating keys. We compare the process (vK1)(vK2) ~({M1 }K1)-~{{M2 }K2), which generates and uses two keys, with the process (vK) ~({N1}{o}~).~({N2}{I}K), which generates the master key K and then uses the derived keys {0}K and {1}K. In order to show that these processes are testing equivalent, we construct once more a framed bisimulation. We let S be the least relation such that: - ({c, n}, O) t- (vK1)(vK2)-5({M1}KI).-5({M2}K2) S
(.K)~<{N1}{o}K>.U({N2}{I}K)
- ({c,n}, {({M1}k,, {N1}{0},)}) ~- (vK2)~{{M2}K2> S c<{/'V2}{1}~> for all names k and kl with {k, kl} C1{c, n} -- 0 - ({c,n}, { ( { / 1 } k , , {N1}{o}~), ( { / 2 } k : , {N2}{1}~)}) t- 0 S 0 for all names k and kl with {k, kl} A {c,n} = O and all names k2 with k2 ~ {c,n, kl} It is somewhat laborious but not difficult to check that S is a framed bisimulation, much as in the examples above.
24 5
Related
Work
Park [Par81] first suggested the bisimulation proof technique, in the context of Milner's CCS. After Park's work, bisimulation became a cornerstone of the theory of CCS [Mil89]. Milner, Parrow, and Walker [MPW92] extensively studied a variety of forms of bisimulation for the pi calculus, their generalisation of CCS with name-passing and mobile restrictions. Our definition of framed bisimulation generalises (and relaxes) the definition of strong bisimulation from earlier work on the spi calculus [AG97b,AG97c]. We can show that if processes P and Q are strongly bisimilar, then, for all frames fr, (fr, 0) ~- P "-'f Q. The converse implication fails. According to most other definitions, a bisimulation is a set of pairs of processes. According to our definition, a framed bisimulation is a set of quadruples consisting of a pair of processes grouped with a frame and a theory. According to a definition of Pierce and Sangiorgi [PS96] for a typed pi calculus, a bisimulation is a set of pairs of processes indexed by a type assumption that binds types to channel names. Our use of a frame is a little like their use of typing assumptions, in that both a frame and a typing assumption delimit the channels on which the environment may observe the processes in the bisimulation. On the other hand, the spi calculus is untyped, and our use of a theory to represent compound terms possessed but not decomposable by the environment seems to be new. There are also parallels with the work of Pitts and Stark [PS93] on the ~,calculus, a simply-typed A-calculus enriched with dynamic allocation of names. Pitts and Stark define a logical relation on programs, parameterised by a partial bijection on the names free in related programs. Their logical relation is sound for proving observational equivalence; it is incomplete but more generous than the usual notion of applicative bisimilarity for the v-calculus. A logical relation is one in which two abstractions are related if and only if they send related arguments to related results. Given the clause for inputs in the definition of framed bisimulation, which requires the bodies of two abstractions ( x ) P ~ and (x)Q ~ to be related on all related terms M and N, we may say that the relations (fr, th) F- M ~ N and (fr, th) F- P ~ / Q form a parametric logical relation on the terms and processes of the spi calculus. Like Pitts and Stark's logical relation, our logical relation is sound for proving testing equivalence. Further, it is incomplete but more generous than the usual notion of strong bisimilarity; it has parameters (the frame and the theory) that serve to identify certain processes that are distinguished by the usual relation of strong bisimilarity. However, the analogy with Pitts and Stark's work is not perfect; in particular, their use of partial bijections on names is different from our use of frames and theories. In the last few years, several methods for analysing cryptographic protocols have been developed within action-based or state-based models (see for example [MCF87,Mi195,Kem89,Mea92,GM95,Low96,Sch96a,Bo196,Pau97]). Some of these models are presented as process algebras, others in logical forms. Often, the analysis of a protocol requires defining a particular attacker (an environment) for the protocol; recently, there has been promising progress towards automating the construction of this attacker. Bisimulation techniques do appear in the
25 security literature (as in the work of Focardi and Gorrieri [FG95]), but rarely, and without special tailoring to cryptographic applications.
6
Conclusions
When reasoning about a cryptographic protocol, we must take into account the knowledge of the environment with which the protocol interacts. In our definition of bisimulation, this knowledge is represented precisely as a set of names that the environment has obtained, and as a set of pairs of ciphertexts that the environment has received but cannot distinguish. This precise representation of the knowledge of the environment is the basis for an effective and sound proof technique. Using this technique, we can construct proofs for small but subtle cryptographic protocols. The proofs are fairly concise and do not require much creativity. Therefore, although we have not yet attempted to mechanise our proofs, we believe that such a mechanisation is possible, and that it may enable the automatic verification of substantial examples.
Acknowledgements Discussions with Davide Sangiorgi were helpful at the start of this work. Andy Pitts made useful comments. Gordon held a Royal Society University Research Fellowship at the University of Cambridge Computer Laboratory for most of the time we worked on this paper. He now holds a position at Microsoft Research.
References M. Abadi. Secrecy by typing in security protocols. In Theoretical Aspects of Computer Software, volume 1281 of Lecture Notes in Computer Science, pages 611-638. Springer-Verlag, 1997. [AG97a] M. Abadi and A. D. Gordon. A calculus for cryptographic protocols: The spi calculus. In Proceedings of the Fourth A CM Conference on Computer and Communicatwns Security, pages 36-47, 1997. [AG97b] M. Abadi and A. D. Gordon. A calculus for cryptographic protocols: The spi calculus. Technical Report 414, University of Cambridge Computer Laboratory, January 1997. [AG97c] M. Abadi and A. D. Gordon. Reasoning about cryptographic protocols in the spi calculus. In CONCUR'97: Concurrency Theory, volume 1243 of Lecture Notes in Computer Science, pages 59-73. Springer-Verlag, 1997. [Bo196] D. Bolignano. An approach to the formal verification of cryptographic protocols. In 3rd A CM Conference on Computer and Communications Security, pages 106-118, March 1996. IDES77] Data encryption standard. Fed. Inform. Processing Standards Pub. 46, National Bureau of Standards, Washington DC, January 1977. [FG95] R. Focardi and R. Gorrieri. A classification of security properties. Journal of Computer Security, 3(1), 1995. [Aba97]
26 [GM951
[Kem89] [Low96]
[MCF87]
[Mea92] [Mi189] [Mil95] [MPW92]
[Par81]
[Pau97]
[PS93]
[PS96] [Sch96a] [Sch96b]
J. Gray and J. McLean. Using temporal logic to specify and verify cryptographic protocols (progress report). In Proceedings of the 8th IEEE Computer Security Foundations Workshop, pages 108-116, 1995. R. A. Kemmerer. Analyzing encryption protocols using formal verification techniques. IEEE Journal on Selected Areas in Communications, 7, 1989. G. Lowe. Breaking and fixing the Needham-Schroeder public-key protocol using FDR. In Tools and Algorithms for the Construction and Analysis of Systems, volume 1055 of Lecture Notes in Computer Science, pages 147-166. Springer-Verlag, 1996. J. K. Millen, S. C. Clark, and S. B. Freedman. The Interrogator: Protocol security analysis. IEEE Transactions on Software Engineering, SE-13(2):274288, February 1987. C. Meadows. Applying formal methods to the analysis of a key management protocol. Journal of Computer Security, 1(1):5-36, 1992. R. Milner. Communication and Concurrency. Prentice-Hall International, 1989. J. K. Millen. The Interrogator model. In IEEE Symposium on Security and Privacy, pages 251-260, 1995. R. Milner, J. Parrow, and D. Walker. A calculus of mobile processes, parts I and II. Information and Computation, pages 1-40 and 41-77, September 1992. D. Park. Concurrency and automata on infinite sequences. In P. Deussen, editor, Theoretical Computer Science: 5th GI-Conference, Karlsruhe, volume 104 of Lecture Notes in Computer Science, pages 167-183. Springer-Verlag, March 1981. L. Paulson. Proving properties of security protocols by induction. In Proceedings of the lOth IEEE Computer Security Foundations Workshop, pages 70-83, 1997. A. M. Pitts and I. D. B. Stark. Observable properties of higher order functions that dynamically create local names, or: What's new? In Mathematical Foundations of Computer Science, Proe. 18th Int. Syrup., Gdadsk, 1993, volume 711 of Lecture Notes in Computer Science, pages 122-141. SpringerVerlag, 1993. B. Pierce and D. Sangiorgi. Typing and subtyping for mobile processes. Mathematical Structures in Computer Science, 6(5):409-453, October 1996. S. Schneider. Security properties and CSP. In IEEE Symposium on Security and Privacy, pages 174-187, 1996. B. Schneier. Applied Cryptography: Protocols, Algorithms, and Source Code in C. John Wiley & Sons, Inc., second edition, 1996.
A Polyvariant Binding-Time Analysis for Off-line Partial Deduction Maurice Bruynooghe, Michael Leuschel, and Konstantinos Sagonas Katholieke Universiteit Leuven, Department of Computer Science Celestijnenlaan 200A, B-3001 Heverlee, Belgium e-ma~: {maurice ,michael ,kostis}@cs .kuleuven.ac.be A b s t r a c t . We study the notion of binding-time analysis for logic programs. We formalise the unfolding aspect of an on-line partial deduction system as a Prolog program. Using abstract interpretation, we collect information about the run-time behaviour of the program. We use this information to make the control decisions about the unfolding at analysis time and to turn the on-line system into an off-line system. We report on some initial experiments.
1
Introduction
Partial evaluation and partial deduction are well-known techniques for specialising respectively functional and logic programs. While both depart from the same basic concept, there is quite a divergence between their application and overall approach. In functional programming, the most widespread approach is to use off-li~e specialisers. These are typically very simple and fast specialisers which take (almost) no control decisions concerning the degree of specialisation. In thiscontext, the specialisation is performed as follows: First, a bi~di~g-~ime a~alysis (BTA) is performed on the program which annotates all its statements as either "reducible" or "non-reducible". The annotated program is then passed to the off-line specialiser, which executes the statements marked reducible and produces residual code for the statements marked non-reducible. In logic programming, the o~-li~e approach is almost the only one used. All work is done by a complex on-line specialiser which monitors the whole specialisation process and decides on the degree of specialisation while specialising the program. A few researchers have explored off-line specialisation, but lacking an appropriate notion of BTA, they worked with hand-annotated programs, something which is far from being practical. Until now, it was unclear how to perform BTA for logic programs. The current paper remedies this situation. It develops a BTA for logic programs, not by translating the corresponding notions from functional programming to logic programming, but by departing from first principles. Given a logic program to be specialised, we develop a logic program which performs its online specialisation. The behaviour of this program is analysed and the results are used to take all decisions w.r.t, the degree of specialisation off-line. This turns the on-line specialiser into an off-line specialiser. A prototype has been built and the quality and speed of the off-line specialisation has been evaluated.
28 2
Background
2.1
Partial Deduction
In contrast to ordinary (full) evaluation, a partial evaluator receives a program P along with only part of its input, called the static input. The remaining part of the input, called the dynamic input, will only be known at some later point in time. Given the static input S, the partial evaluator then produces a specialised version Ps of P which, when given the dynamic input D, produces the same output as the original program P. The goal is to exploit the static input in order to derive a more efficient program. In the context of logic programming, full input to a program P consists of a goal G and evaluation corresponds to constructing a complete SLDNF-tree for P U (G}. The static input is given in the form of a partially instantiated goal G t (and the specialised program should be correct for all instances of GI). A technique which produces specialised programs is known under the name of partial deduction [18]. Its general idea is to construct a finite set of atoms A and a finite set of finite, but possibly incomplete SLDNF-trees (one for every 1 atom in A) which "cover" the possibly infinite SLDNF-tree for P U (G~}. The derivation steps in these SLDNF-trees correspond to the computation steps which have been performed beforehand and the specialised program is then extracted from these trees by constructing one specialised clause per non-failing branch. In partial deduction one usually distinguishes two levels of control: the global control, determining the set A, thus deciding which atoms are to be partially deduced, and the local control, guiding construction of the finite SLDNF-trees for each individual atom in A and thus determining what the definitions for the partially deduced atoms look like. 2.2
Off-llne vs. O n - l i n e C o n t r o l
The (global and local) control problems of partial evaluation and deduction in general have been tackled from two different angles: the so-called on-line versus off-line approaches. The on-line approach performs all the control decisions during the actual specialisation phase. The off-line approach on the other hand performs a (binding-time) analysis phase prior to the actual specialisation phase. This analysis starts from a description of which parts of the inputs will be "static" (i.e. sufficiently known) and provides binding-time annotation8 which encode the control decisions to be made by the specialiser, so that the specialiser becomes much more simple and efficient. Partial evaluation of functional programs [8, 15] has mainly stressed off-line approaches, while supercompilation of functional [32, 31] and partial deduction of logic programs [13, 3, 1, 30, 25, 20] have concentrated on on-line control. On-line methods, usually obtain better specialisation, because no control decisions have to be taken beforehand, i.e. at a point where the full specialisation 1 Formally, an SLDNF-tree is obtained from an atom or goal by what is called an
unfolding rule.
29 information is not yet available. The main reasons for using the off-line approach are to make specialisation itself more efficient and, due to a simpler specialiser algorithm, enable effective self-application (specialisation of the specialiser) [16]. Few authors discuss off-line specialisation in the context of logic programming [27, 17], mainly because so far no automated binding-time analysers have been developed. This paper aims to remedy this problem. 3 3.1
Towards
BTA
for partial
deduction
An on-line specialiser
The basic idea of BTA in functional programming is to model the flow of static input: the arguments of a function call flow to the function body, the result of a function flows back to the call expression. The expressions are annotated reducible when enough of their parameters are static, i.e. will be known at specialisation time, to allow the (partial) computation of the expression. Modelling the dataflow gives a system of inequalities over variables in a domain {static, dynamic} whose least solution yields the best annotation. This approach does not immediately translate to logic programs. Problems are that the dataflow in unification is bidirectional and that the degree of instantiation of a variable can change over its lifetime (see also [17]). We follow a different approach and reconstruct binding-time analysis from first principles. We start with a Prolog program which performs the unfolding decisions of an on-line specialiser. However, whereas real on-line specialisers base their unfolding decisions on the history of the specialisation phase, ours bases its decisions solely on the actual arguments of the call (which can be more easily approximated off-line). This is in agreement with the off-line specialisers for functional languages which base their decision to evaluate or residualise an expression on the availability of the parameters of that expression. The next step will be to analyse the behaviour of this program (the binding-time analysis) and to use the results to make the unfolding decisions at compile time. First we develop the on-line specialiser. Assuming that for each predicate p/m a test predicate unfold_p/m exists which decides whether to unfold a call or not, we obtain an on-line specialiser by replacing each call p(t) by
( unfold_p( )
memoise_p())
A call to memoise_p(t) informs the specialiser that the call p(t) has to be residualised. The specialiser has to check whether Ca generalisation of) p(t) has already been specialised - - i f not it has to initiate the specialisation of (a generalisation of) p(~)-- and has to perform appropriate renaming of predicates to ensure that residual code calls the proper specialised version of the predicate it calls.
Ezample 1 (Funny append). Consider the following on-line specialiser for a variant, funnyapp/3 of the append~3 predicate in which the first two arguments of the recursive call have been swapped:
30
funnyapp ( [] ,X ,X). funnyapp([X]U'J ,V,[X[W]) : (unfold_funnyapp(V,U,W) -> funnyapp(V,U,W) ; memoise_funnyapp (V,U,W) unfold_fu~myapp(X,Y,Z) : - ground(X).
.
Specialising this program for a query f u n n y a p p ( [ a , b ] , L, 1~) results in the specialised clause (the residual call is renamed as f u n n y a p p A ) funnyapp([a,b] ,L, [a[R1]) : - f u n n y a p p l ( L , [b] ,R1).
Specialising the funnyapp program for the residual call f u n n y a p p ( L , [hi ,I~1) gives (after renaming) the clauses funnyappA ( [], [b], [b] ). funnyapp_.l([XlU], [b] ,[X,blR]) :- funnyapp_2(U, [] ,R). Once more specialising, now for the residual call funnyapp (U, [ ] , R), gives funnyapp_2 ( [], [ ] , [] ). funnyapp_2 ( [][ [U], [], [X [U] ). This completes the specialisation. Note that the sequence of residual calls is terminating in this example. In general, infinite sequences are possible. They can be avoided by generalising some arguments of the residual calls before specialising. In the above example, instead of using ground(X) as condition of unfolding, one could also use the test n i l t e r m i n a t e d ( X ) . T h i s would allow to obtain the same level of specialisation for a query f u n n y a p p e n d ( [X, Y], L, It). This test is another example of a so called rigid or downward closed property: if it holds for a certain term, it holds also for all its instances. Such properties are well suited for analysis by means of abstract interpretation. 3.2
F r o m o n - l i n e t o off-line
"Ihrning the on-line specialiser into an off-line one requires to determine the u~fold_p/n predicates during a preceding analysis and to decide on whether to replace the (u~fold_p(~) --, p(~); memo/se(p(~))) construct either by p(~) or by memoise(p(~)). The decision has to be based on a safe estimate of the calls u~folg_p(~) which will occur during the specialisation. Computing such safe approximations is exactly the purpose of abstract interpretation [9].
:- { grud (L1) } "eapl(Ll,L2,R) {grad(L1)}. ~ap1 ( [] , x , x ) . fapl([XIO],V,[X[W]) : - {grnd(X,U)} (unf.~apl(V,U,W) {grnd(X,U, [I)} ->
{g~.d(Z,U, V)} ~ap2(V,U,W) {gr.~(Z,U, V,W)} ; {grnd(X,U))memo_fap3(V,U,W) {grad(X,U)} ) {grnd(X,U)}. unf_fapl(X,Y,Z) : - {grnd(u ground(X) {grnd(X, u memo_fap3 (X ,Y ,Z).
3] By adding to the code of Example 1 the fact memoise.=funnyapp(X,Y,Z)., and an appropriate handling of the Prolog built-in g r o u n d / I , one can run a goaldependent polyvariant groundness analysis (using e.g. PLAI coupled with the set-sharing domain) for a query where the first argument is ground and obtain the above annotated program. The annotated code for the version l a p 2 is omitted because it is irrelevant for us. Indeed, inspecting the annotations for u n f _ f a p l we see that the analysis cannot infer the groundness of its first argument. So we decide off-line not to unfold, we cancel the test and the then branch and simplify the code into:
:- {gr, d(L1)} fapl(L1,L2,R) {gr, d(Ll)}. ~apl ([] ,X,X). fapl([XlU] ,V, [XIW]) :- {grnd(X,U)} memo.=fap3(V,U,W)
:- {9r, d(L2)} fap3(L1,L2,R) {9r, d(L2)}. ~ap3([] ,X,X). fap3(I'X[U],V,[XIW]) :- {gr, d(F)} (unf_.fap2(V,U,W) {grnd(F)}-> {grnd(F)} fap4(V,g,W) {gr, d(V)} ; {grnd(V)} memo_fapS(V,U,W) {grnd(V)} ) {gr.d(V)}. unf.iap2(X,Y,Z) :- {grnd(X)} ground(X) {grnd(X)}. memo_.fap5 (X ,Y ,Z). This time, the annotations for unf_:fap2 show that the groundness test will definitely succeed. So we decide off-line always to unfold and only keep the then branch. Moreover, the f a p 4 call has the same call pattern as the original call to funnyapp, so we also rename it as f a p l . This yields the second code fragment:
:- {grud(L2)} fap3(L1,L2,R) {grnd(L2)}. fap3([] ,X,X). fap3([XIUJ,V,[X[W])
:-
{grnd(V)} fapl(V,U,W) {grnd(V)}
Applying the specialiser on these two code fragments for a query lap1 ( [ a , b ] , L ,R) gives the same specialised code as in Example 1. However, this time, no calls to u n f o l d f u n n y a p p have to be evaluated during specialisation. 3.3
Automation
To weave the step by step analysis sketched above in a single analysis, a special purpose tool has to be built. We implemented a system based on the abstract domain POS, also called P R O P [24]. It describes the state of the program variables by means of positive boolean formulas, i.e., formulas built from +-*, A and V. Its
32 most popular use is for groundness analysis. In that case, the formula X expresses that the program variable X is (definitely) bound to a ground term, X ~-~ Y expresses that X is bound to a ground t e r m if[ Y is, so an eventual binding of X to a ground term will be propagated to Y. This domain is extended with false as b o t t o m element and is ordered by boolean implication. Groundness analysis corresponds to checking the rigidity 2 of program variables w.r.t, the eermsize n o r m s and abstracts a unification such as X = [Y[Z] by the boolean formula X *-* Y A Z. However POS can also be used with other semi-linear norms [4]. In e.g. normalised programs, it only requires to redefine the abstraction of the unifications. For example, with the lisr norm 4, unification of X = [Y[Z] is abstracted as X *-* Z, and a formula X means that the p r o g r a m variable X is bound to a term with a bounded listlength, i.e. either the t e r m is a nil-terminated list, or has a main functor which is not a list constructor. The analyser has to decide the outcome of the unfold_p test and has to decide which branch to take for further analysis while doing the analysis. Also it has to launch the analysis of the generalisations of the memoised calls. The generalisation we employ is to replace an argument which is not rigid under the norm used in the analysis by the abstraction of a fresh variable. These requirements exclude the direct use of the abstract compilation technique in the way advocated by e.g. [5]. One problem of the scheme of [5] is that it handles constructs ( g r o u n d ( X ) ---, p(~); memoise(p(~))) too inaccurately. The boolean formula is represented as a truth table, i.e. a set of tuples, and the analyser processes the truth table a tuple at a time. Therefore it cannot infer in a p r o g r a m point t h a t X is true, i.e. that X is definitely ground, so it can never conclude that the else branch cannot be taken. The other problem is that the analyses launched for the m e m o i s e d calls should not interfere (i.e. output should not flow back) with the analysis of the clauses containing the memoised calls. Note that defining m e m o i s e as memoise_p( X1, . . . , X n ) : - copy(X1, Y1), . . . , copy( X,~, Y,~ ), and abstracting copy(X, Y ) as X *-~ Y does not work: The abstract success state of executing P ( Y 1 , . . . , Yn) will update the abstractions of X 1 , . . . , X n . Our prototype binding-time analyser currently consists of ~ 800 lines of Prolog code and uses XSB [29] as a generic tool for semantic-based program analysis [6]. The boolean formulas from the POS domain are represented by their truth tables. This representation enables abstract operations to have straightforward implementations based on the projection and equi-join operations of the relational algebra. The disadvantage is that the size of truth tables quickly increases with the number of variables in a clause. The use of dedicated d a t a structures like BDDs to represent the boolean formulas as in [33] often results in better performance but at the expense of substantial p r o g r a m m i n g efforts. The main part of the analyser can be seen as a source-to-source transformation (i.e. abstract compilation) that given the p r o g r a m P to be analysed, A term is rigid w.r.t, a norm if all its instances have the same size w.r.t, the norm. 3 The termsize norm [11] includes all subterms in the measure of the term. 
4 The listlength norm [11] includes only the tail of the list in the measure of the list (and measures other terms as 0); nil-terminated lists are rigid under this norm.
33 produces an abstract program p a with suitable annotations. The abstract program can be directly run under XSB (using tabling to ensure termination). The execution leaves the results of the analysis in the XSB tables. Each predicate p / n of P is abstracted to a predicate p a / 2 whose arguments carry input and output sets of tuples. The core part of setting up the analysis is then to define the code for the abstract interpretation of each call (at a program point PP# of interest):
(.nfold_p(X) --4 p(X); memoise_p(X) ).
(1)
is abstracted by the following code fragment: project(Args,TPPin, TC),
(unfold_p(TC)-> unfold(TC,PP#), pa(TC,TR) ; TR=TC, generalise(TC,TCG), memo(TCG,PP#), pa(TCG,_) ) , equi_join( hrgs , TPPin, TR, TPPout), Predicates unfold/2 and m e m o / 2 which abstract the behaviour of each call in the form of (1) above are tabled predicates which have no effect on the computation, but only record information containing the results of the analysis. Their arguments are the current abstraction and the current program point. This information is then dumped from the XSB tables and is fed to the off-line specialiser. The variable TPPi,~ holds the truth table which represents the abstraction of the program state in the point prior to the call. The call to project/3 projects the truth table on the positions h r g s of the variables X participating in the call. The result is TC (Tuples of the Call). The predicate unfold_p/1 (currently supplied by the user for each predicate pin to be analysed) inspects TC to decide whether there is sufficient information to unfold the call. If it succeeds the then branch is taken which analyses the effects of unfolding pin. This is done by executing p~/2 with TC as abstraction of the call state. The analysis returns TR as abstraction of the program state reached after unfolding pin. If the call to unfold_p~1 fails, the call is memoised, and the program state remains unchanged, so TR = TC. The generalisation of the memoised call also needs to be analysed; therefore the else branch first generalises the current state TC into TCG by erasing all dependencies for non-rigid arguments 5 and then calls p~/2 with TCG as initial state, but takes care not to use the returned abstract state as the bindings resulting from specialising memoised calls do not flow back. These actions effectively realise the intended functionality of memoise_p/1. Finally, the new program state TR over the variables X has to be propagated to the other program variables described by the initial state TPPin. This is achieved simply by taking the equi-join over the h r g s of TPPi,~ and TR. The new program state is described by TPPout. One of our examples (see Section 4.2) uses two different norms in the unfold tests: the term norm which tests for groundness, and the listlength norm which tests for the boundedness of lists (whether lists are nil-terminated). This does not pose a problem for our framework, we simply use a truth table which encodes two boolean formulas, one for the term norm and one for the listlength norm. ~A position is rigid if it has an "s" in each tuple e.g. generalise([p(s,,~,s), p(s,d,d)],TCG)yields TCG = [ p ( s , s , s ) , p ( s , s , d ) , p ( s , d , s ) , p ( s , d , d ) ] .
34 4
Some
Experiments
and Benchmarks
We first discuss the p a r s e r and l i f t s o l v e examples from [17]. 4.1
T h e parser e x a m p l e
A small generic parser for languages defined by grammars of the form S ::= aS]X (X is a placeholder for a terminal symbol as well as the first argument to nont/3; arguments 2 and 3 represent the string to be parsed as a difference list): nont(X,T,R) : - t ( a , T , V ) , n o n t ( X , V , R ) . nont(X,T,R) : - t ( X , T , R ) .
t(x, [X IEsl ,Es). A termination analysis can easily determine that calls to t / 3 always terminate and that calls to nont/3 terminate if their second argument is ground. One can therefore derive the following unfold predicates: u n f o l d _ t (X, S1, $2).
unfold_nont (X,T,R) : - g r o u n d ( T ) .
Performing our analysis for the entry point :- .[grnd(X)}n o n t (X . . . . ) we obtain the following annotated program (dynamic arguments [i.e. non-ground ones] and non-reducible predicates [i.e. memoised ones] are underlined.): nont(X,T_,_R) : - t ( a , T , V ) , nont(X,V,_R). nont(X,L_R) : - t(X,T,_R). t (X, [X[Es],Es).
Feeding this information into the off-line system LOGEN [17] and specialising obtain: nont__0([a[B],C) :- nont__0(B,C). nont__O([c[D],D). Analysing the same specialiser for :- {grnd(T)} nont (_,T,_) yields:
n o n t (c ,T,R), we
nont(_X,T,_R) : - t ( a , T , V ) , nont(X_,V,R). nont(X_,T,_R) : - t(X_,T,_R).
t(x_, Ix IEs] ,~s). Feeding this information into LOGEN and specialising nont (_X,[ a , a , c ] ,R) yields: nont__O(c,~). nont__O(a,[c]). nont__O(a,[a,c]).
4.2
T h e liftsolve e x a m p l e
The following program is a meta-interpreter for the ground representation, in which the goals are "lifted" to the non-ground representation for resolution. To perform the lifting, an accumulating parameter is used to keep track of the variables that have already been encountered and generated. The predicate rang and x_amg transform (a list of) ground terms (the first argument) into (a list of) non-ground terms (the second argument; the third and fourth arguments represent the incoming and outgoing accumulator respectively). The predicate
35 s o l v e uses these predicates to "lift" clauses of a program in ground representation
(its first argument) and then use them for resolution with a non-ground goal (its second argument) to be solved. s o l v e (GrP, [] ). s o l v e (GrP, [NsH INsT] ) : non_ground~aember (t erm(clause, [NsHINgBdy] ) ,GrP), s o l v e (GrP ,NgBdy), s o l v e (GrP ,NgT).
non_sroundJuember(ggx, [GrHI_GrT]) :- make_uon_sround(GrH,ggX). non_groundJuember (NgX, [_GrHIGrT] ) :- non_ground_~ember (NgX,GrT) . make_non_sround(G,NG) :- nmg(G ,NG, [] ,_Sub) . ran8 (vat (N) ,X, [ ] , [sub (N ,X)] ). ran8 (vat (N) ,X, [sub (N ,X)IT], [sub (N ,X)IT] ). rang(var(N),X,[sub(H,Y)[T],[sub(N,Y)]T1]) :- N \== M, mng(var(N),X,T,T1). nmg(term(F,Arss) , t e r m ( F , I l r g s ) ,InS,OutS) :- lmng(Args,IArgs,InS,OutS).
hnng( [], [3 ,Sub,Sub). ]:nng([HIT] , [IHIIT] ,InS,OutS) :mng(H,IH,InS,InS1), lmng(T,IT,InSl,0utS). The following unfold predicates can be derived by a termination analysis: unfold__lmng(Gs,NGs ,InSub,0utSub) :- ground(Gs), bounded_list (InSub). unfoldmmg(G,NG,InSub,OutSub) :- ground(G), bounded_list (InSub). unfold~aakemon_ground(G,NG) :- ground(G). unfold_non_groundJnember (NgX,L) :- ground(L). unfold_solve (GrP,Query) :- 8round(GrP).
Analysing the specialiser for the entry point solve(ground,_) we obtain: solve(GrP, D. solve(GrP,[-NgH[SgT]) :non_ground_member(term(clause,[NgH[NgBdy]), GrP), solve(GrP,SgBdy), solve(GrP,SgW). non_ground_member(NgX,[GrH[_GrW]) :- make_non_ground(GrH,NgX). non_ground_member(SgX,[_GrI-IIGrW]) :- non_ground_member(NgX,GrW). make_non_ground(G,NG) :- mng(G,NG, ~,_Sub).
mng(var(N),X,[],[sub(N,X)]). mng(var(N),X,[sub(N,X)IT], [sub(N,X)IT]). mng(var(N),X_,[sub(M,Y)IT], [sub(M,Y)IT1]) :-N \== M, mng(var(N),X,T,Tl). mng(term(F,Args),term(F,IArgs), InS,OutS) :-Imng(Args,IArgs, InS,OutS). Imng(~,~_,Sub,Sub). Imng([HIT],[IHIIT], InS,OutS) :-mng(H,IH,InS,InSl), Imngl(T,l _T, InSI,OutS) . lmngl(~,[],Sub,Sub). lmngl([HTT],[IH-[i~, InS,OutS) :- mngl(H,IH,InS,InS1), lmngl(T,IT, InSl,OutS). mngl(var(N),X,~,[sub(N,X)]). mngl(var(S),X,~sub(S,X)lT],[sub(N,X)lT]). mngl(var(S),X, [sub(M,Y)IT],[sub(M,Y)IT1]) :- S \ = = M, mngl(var(N),X,T,T1).
mngl(term(F,Args),terra(F,IArgs), InS,OutS) :-Imngl(Args,IArgs, InS,OutS). One can observe that the call lmagl(T,IT, InSl,0utS) has not been unfolded. Indeed, the third argument InS1 is considered to be dynamic (non-ground) and the call to unfold..l_am8 will thus not always succeed. However, based on the termination analysis, it is actually sufficient for termination if the third arguments
36 to mng and lmng are bounded lists (as the listlength norm can be used in the termination proof). If we use our prototype to also keep track of bounded lists we obtain the desired result: the call lnmgl (T,IT,InS1 ,OutS) can be unfolded as the first argument is ground and third argument can be inferred to be a bounded list. By feeding the so obtained annotations into LOGEN [17] we obtain a specialiser which removes (most of) the meta-interpretation overhead. E.g. specialising solve ( I t erm (clause, It erm (q, [vaz (1) ] ), term (p, [var (I) ] )] ), term(clause, [term(p, [term(a, [] )] )] )], G)
yields the following residual program: solve_0 ( [ ] ) . s o l v e _ O ( I t errs ( q , [B] ) I C] ) : - s o l v e _ O ( I t erm ( p , [B] ) ] ) , s o l v e _ O ( C ) . s o l v e _ O ( [ t e r m ( p , [ t e r m ( a , [ ] ) ] ) [D]) : - s o l v e _ O ( [ ] ) , s o l v e _ O ( D ) .
4.3
Some Benchmarks
We now study the efficiency and quality of our approach on a set of benchmarks. Except for the p a r s e r benchmark all benchmarks come from the DPPD benchm a r k library [19]. We ran our prototype analyser, BTA, that performs bindingtime analysis and fed the result into the off-line compiler generator LOGEN [17] in order to derive a specialiser for the task at hand. The ECCE on-line partial deduction system [19] has been used for comparison (settings are the same as for ECCE-X in [20], i.e. a mixtus like unfolding, a global control based upon characteristic trees but no use of conjunctive partial deduction). The interested reader can consult [20] to see how ECCE compares with other systems. All experiments were conducted on a Sun Ultra-1 running SunOS 5.5.1. ECCE and LOGEN were run using Prolog by BIM 4.1.0. BTA was run on XSB 1.7.2. Benchmark
ECCE-PD
depth.lam liftsolve.app liftsolve.app4 match.kmp parser regexp.rl
0.34 s 1.00 s 12.32 s
BTA
LOGEN
PD
[Ratio[
0.18 s
0.05 + 0.579 s 0.05 s 0.003 0.079* + 1.841 s 0.05 s 0.006 s " " 0.014 s 0.06 + 0.031 s 0.01 s 0.006
0.06 s 0.17 s
0.039 + 0.031 s 0.06 s 0.006
0.03 + 0.01 s
0.02 s 0.001
Table I. Analysis and SpecialisationTimes
In Table 1 one can see a s u m m a r y of the transformation times. The columns under BTA contain: the time to abstract and compile the p r o g r a m § the time for execution of the abstracted program (both under XSB). The column under LOGEN contains the time to generate the specialiser with LOGAN using the so obtained annotations. Observe, that for any given initial annotation, this has only to be performed o n c e : the so obtained specialiser can then be used over and over again for different specialisation tasks. E.g. the same specialiser was used for the liftsolve, app and liftsolve, app4 benchmark. The '*' for liftsolve, app indicates the time for the abstract compilation only producing code for the groundness analysis. The extra arguments and instructions for the
37 bounded list analysis were added by hand (but will be generated automatically in the next version of the prototype). The column under P D gives the time for the off-line specialisation. The last column of the table contains the ratio of running ECCE over running the specialisers generated by BTA -~- LOGEN. As can be seen, the specialisers produced by BTA --[- LOGEN run 28 - 880 times faster than ECCE. We conjecture that for larger programs (e.g liftsolve with a very big object program) this difference can get even bigger. Also, for 3 benchmarks the combined time of running BTA --~ LOGEN and then the so obtained specialiser was less than running ECCE, i.e. our off-line approach fares well even in "oneshot" situations. Of course, to arrive at a fully a u t o m a t i c (terminating) system one will still have to add the time for the termination analysis, needed to derive the "unfold" predicates. Benchmark
[[Original I ECCE ]BTA + LOGEN 0.06 s 0.08 s 0.00 s 1.33 32 1 0.01 s l i f t s o l v e .app 0.13 s 0.01 s 13 13 1 0.02 s liftsolve.app4 0.17 s 0.00 s 6.5 1 > 34 0.51 s mat ch. kmp 0.58 s 0.34 s 1.14 1 1.71 parser 0.20 s 0.12 s 0.12 s 1.74 1 1.74 0.20 s regexp.rl 0.29 s 0.10 s 1.5 2.9 1 flepth, lain
Table 2. Absolute Runtimes and Speedups Table 2 compares the efficiency of the specialised programs (for the run time queries see [19]; for the p a r s e r example we ran nont(c, [a17, c, b], [b]) 100 times). As was to be expected, the programs generated by the on-line specialiser ECCE outperform those generated by our off-line system. E.g. for the m a t c h , kmp benchm a r k EcCE is able to derive a Knuth-Morris-Pratt style searcher, while off-line systems (so far) are unable to achieve such a feat. However, one can see that the specialised programs generated by BTA + LOGEN are still very satisfactory. The most satisfactory application is l i ~ t s o l v e . a p p (as well as l i f t s o l v e . a p p 4 ) , where the specialiser generated by BTA + LOGEN runs 167 (resp. 880) times faster than ECCE while producing residual code of equal (resp. almost equal) efficiency. In fact, the specialiser compiled the append object program from the ground representation into the non-ground one in just 0.006 s (to be compared with e.g. the compilers generated by SAGE [14] which run in the order of minutes). Furthermore, the time to produce the residual program and then running it is less than the time needed to run the original program for the given set of runtime queries. This nicely illustrates the potential of our approach for applications such as runtime code generation, where the specialisation time is (also) of prime importance.
38
5
Discussion
We have formulated a binding-time analysis for logic programs, and have reported on a prototype implementation and on an evaluation of its effectiveness. To develop the binding-time analysis, we have followed an original approach: Given a program P to be analysed we transform it into an on-line specialiser p r o g r a m P ' , in which the unfolding decision are explicitly coded as calls to predicates unfold_p. The on-line specialiser is different from usual ones in the sense that it - - like off-line specialisers - - uses the availability of arguments to decide on the unfolding of calls. Next, we apply abstract interpretation - - a binding-time a n a l y s i s - - to gather information about the run-time behaviour of P ' . The information in the program points related to u n f o l d _ p allows to decide whether the test will definitely succeed - - i n which case the unfolding branch is r e t a i n e d - - or will possibly fail - - i n which case the branch yielding residual code is retained. The resulting program now behaves as an off-line specialiser as all unfolding decisions have been taken at analysis time. An issue to be discussed in more detail is the termination of the specialisation. First, a specialiser has a global control component. It must ensure that only a finite number of atoms are specialised. In our prototype, we generalise the residual calls before generating a specialised version: arguments which are not rigid s w.r.t, the norm used in the unfolding condition are replaced by fresh variables. This works well in practice but is not a sufficient condition for termination. In principle one could define the memoise_p predicates as:
memoise_p(X) :- copy_term(X,Y), generalise(Y,Z), p(Z). and then generalise such that quasi-termination [21] of the program, where calls to p are tabulated, can be proven. In practice, the built-in c o p y _ t e r m / 2 and the built-ins needed to implement g e n e r a l i s e / 2 will make this a non-trivial task. Secondly, there is the local control component. It must ensure that the unfolding of a particular a t o m terminates. This is decided by the code of the transformed program. Defining the u n f o l d _ p predicates by hand is error-prone and consequently not entirely reliable. In principle, one could replace the calls memoise_p by t r u e and apply off-the-shelf tools for proving termination of logic programs [22, 7]. Whether these will do well depends on how well they handle the i f - t h e n - else construct used in deciding on the unfolding and the builtins used in the rigidity test (e.g. the analysis has to infer that X is bounded and rigid w.r.t, the norm in the program point following a test ground(X)). It is likely that small extensions to these tools will suffice to apply t h e m successfully in proving termination of the unfolding ~, at least when the unfolding conditions are based on rigidity tests with respect to the norms used by those termination analysis tools. A more interesting approach for the local control problem is to automatically generate unfolding conditions by program analysis. Actually, one could apply a s I.e., "static" from functional programming becomes "rigid w.r.t, a given norm." 7 After a small extension by its author, the system of [22] could handle small examples. However, so far we have not done exhaustive testing.
39 more general scheme for handling the unfolding than the one used so far. Having for each predicate p / n the original clauses with head p / n and transformed clauses with head p t / n , the transformed clauses could be derived from the original by replacing each call q/m by: ( t e r m i n a t e s _ q ( ~ ) -> q(~) ; (unfold_q(~) -> qt(~) ; memoise_q(~)
) )
In [10], Decorte and De Schreye describe how the constraint-based termination analysis of [11] can be adapted to generate a finite set of "most general" termination conditions (e.g. for append/3 they would generate rigidity w.r.t, the listlength norm of the first argument and rigidity w.r.t, the listlength norm of the third argument as the two most general termination conditions; for our f u n n y a p p / 3 they would generate rigidity of the first and second argument w.r.t. the listlength norm as the most general termination condition.). These conditions can be used to define the t e r m i n a t e s _ q predicates. If they succeed, the call q(~) can be executed with the original code and is guaranteed to terminate. Moreover, as they are based on rigidity, they are very well suited to be approximated by our binding-time analysis. Actually, in all our benchmarks programs, we were using termination conditions for controlling the unfolding, so in fact we could have further improved the speed of the specialiser by not checking the condition on each iteration but using the above scheme. Generating unfold_q definitions is a harder problem. It is related to the generation of "safe" (i.e. termination ensuring) delay declarations in languages such as MU-Prolog and GSdel. This is a subtle problem as discussed in [28, 23]. For example, the condition (nonvar(X) ; n o n v a r ( Z ) ) is not safe for a call append(X,Y, Z); execution, and in our case unfolding, could go on infinitely for some non-linear calls (e.g. append( Eal L] ,Y,L)). Also the condition n o n v a r / 1 is not rigid. (For funnyapp/3 we had rigid conditions, however this is rather the exception than the rule.) A safe unfolding condition for append(X,Y,Z) is l i n e a r ( a p p e n d ( X , Y , Z ) ) , (nonvar(X); n o n v a r ( Z ) ) . Linearity is well suited for analysis (e.g. [2]), but a test nonvar(X) is not. Moreover, unless X is ground, the test is typically not invariant over the different iterations of a recursive predicate. A solution could be to switch to a hybrid specialiser: deciding the linearity test at analysis-time and the simple nonvar tests at run-time. But as said above, perhaps due to lack of a good application (for languages with delay, speed is more important than safety), there seems to be no work on generating such conditions. Another hybrid approach is taken in a recent work independent of ours [26]. This work also starts from the termination condition. When it is violated, the size of the term w.r.t, the norm used in the termination condition and the maximal reduction of the size in a single iteration is used to compute the number of unfolding steps. The program is transformed and calls to be unfolded are given an extra argument initialised with the allowed number of unfolding steps. An on-line test checks the value of the counter and the call is residualised when the counter reaches zero.
40 Acknowledgements M. Bruynooghe and M. Leuschel are supported by the Fund for Scientific Research - Flanders Belgium (FWO). K. Sagonas is supported by the Research Council of the K.U. Leuven. Some of the present ideas originated from discussions and joint work with Jesper JCrgensen, and from the PhD. work of Dirk Dussart [12], to both of whom we are very grateful. We thank Bart Demoen, Stefaan Decorte, Bern Martens, Danny De Schreye and Sandro Etalle for interesting discussions, ideas and comments.
References 1. R. Bol. Loop Checking in Partial Deduction. The Journal of Logic Programming, 16(1&2):25-46, May 1993. 2. M. Bruynooghe, M. Codish, and A. Mulkers. Abstracting unification: a key step in the design of logic program analyses. In Computer Science Today, pages 406-442. Springer-Verlag, LNCS Vol. 1000, 1995. 3. M. Bruynooghe, D. De Schreye, and B. Martens. A General Criterion for Avoiding Infinite Unfolding During Partial Deduction. New Generation Computing, 11(1):47-79, 1992. 4. M. Codish and B. Demoen. Deriving Polymorphic Type Dependencies for Logic Programs Using Multiple Incarnations of Prop. In B. Le Charlier, editor, Proceedings of the First International Symposium on Static Analysis, number 864 in LNCS, pages 281-297, Namur, Belgium, September 1994. Springer-Verlag. 5. M. Codish and B. Demoen. Analysing Logic Programs using "Prop"-ositional Logic Programs and a Magic Wand. Journal of Logic Programming, 25(3):249274, December 1996. 6. M. Codish, B. Demoen, and K. Sagonas. Semantic-Based Program Analysis for Logic-Based Languages using XSB. K.U. Leuven TR CW 245. December 1996. 7. M. Codish and C. Taboch. A Semantic Basis for Termination Analysis of Logic Programs and its Realization using Symbolic Norm Constraints. In Proceedings of the Sizth International Conference on Algebraic and Logic Programming, number 1298 in LNCS, pages 31-45. Springer-Verlag, September 1997. 8. C. Consel and O. Danvy. Tutorial Notes on Partial Evaluation. In Proceedings of the ACM Conference on Principles of Programming Languages, pages 493-501, Charleston, South Carolina, January 1993. ACM Press. 9. P. Cousot and R. Cousot. Abstract Interpretation: A Unified Lattice Model for Static Analysis of Programs by Construction or Approximation of Fixpoints. In Conference Record of the Fourth ACM Symposium on Principles of Programming Languages, pages 238-252, Los Angeles, California, January 1977. ACM. 10. S. Decorte and D. De Schreye. Termination Analysis: Some Practical Properties of the Norm and Level Mapping Space. TR, Dept. Comp. Science, K.U. Leuven. 11. S. Decorte and D. De Schreye. Demand-driven and Constraint-based Automatic Termination Analysis for Logic Programs. In L. Naish, editor, Proceedings of the Fourteenth International Conference on Logic Programming, pages 78-92, Leuven, Belgium, July 1997. The MIT Press. 12. D. Dussart. Topics in Program Specialisation and Analysis for Statically Typed Functional Languages. PhD thesis, Katholieke Universiteit Leaven, May 1997. 13. J. Gallagher and M. Bruynooghe. The Derivation of an Algorithm for Program Specialisation. Nen~ Generation Computing, 9(3,4):305-333, 1991. 14. C. A. Gurr. A Self-Applicable Partial Evaluator for the Logic Programming Language GJdel. PhD thesis, Department of Computer Science, University of Bristol.
15. N. D. Jones, C. K. Gomard, and P. Sestoft. Partial Evaluation and Automatic Program Generation. Prentice Hall International Series in Computer Science, 1993.
16. N. D. Jones, P. Sestoft, and H. Søndergaard. MIX: a Self-applicable Partial Evaluator for Experiments in Compiler Generation. LISP and Symbolic Computation, 2(1):9-50, 1989.
17. J. Jørgensen and M. Leuschel. Efficiently Generating Efficient Generating Extensions in Prolog. In O. Danvy, R. Glück, and P. Thiemann, editors, Proceedings of the 1996 Dagstuhl Seminar on Partial Evaluation, number 1110 in LNCS, pages 238-262, Schloß Dagstuhl, February 1996. Springer-Verlag.
18. J. Komorowski. Partial Evaluation as a Means for Inferencing Data Structures in an Applicative Language: A Theory and Implementation in the Case of Prolog. In Proceedings of the ACM Conference on Principles of Programming Languages, pages 255-267, Albuquerque, New Mexico, January 1982. ACM.
19. M. Leuschel. The ECCE partial deduction system and the DPPD library of benchmarks. Obtainable via http://www.cs.kuleuven.ac.be/~lpai, 1996.
20. M. Leuschel, B. Martens, and D. De Schreye. Controlling Generalisation and Polyvariance in Partial Deduction of Normal Logic Programs. ACM Trans. Prog. Lang. Syst., 20, 1998. To appear.
21. M. Leuschel, B. Martens, and K. Sagonas. Preserving Termination of Tabled Logic Programs While Unfolding. In N. Fuchs, editor, Proceedings of LOPSTR'97: Logic Program Synthesis and Transformation, LNCS, Leuven, Belgium, July 1997.
22. N. Lindenstrauss and Y. Sagiv. Automatic Termination Analysis of Logic Programs. In L. Naish, editor, Proceedings of the Fourteenth International Conference on Logic Programming, pages 63-77, Leuven, Belgium, July 1997. The MIT Press.
23. E. Marchiori and F. Teusink. Proving Termination of Logic Programs with Delay Declarations. In J. W. Lloyd, editor, Proceedings of the 1995 International Logic Programming Symposium, pages 447-461, Portland, Oregon, December 1995.
24. K. Marriott and H. Søndergaard. Precise and Efficient Groundness Analysis for Logic Programs. ACM Letters on Progr. Lang. and Syst., 2(1-4):181-196, 1993.
25. B. Martens and D. De Schreye. Automatic Finite Unfolding Using Well-Founded Measures. The Journal of Logic Programming, 28(2):89-146, August 1996.
26. J. Martin. Sonic Partial Deduction. Technical Report, Dept. Elec. and Comp. Sc., University of Southampton, January 1998.
27. T. Mogensen and A. Bondorf. Logimix: A self-applicable partial evaluator for Prolog. In K.-K. Lau and T. Clement, editors, Logic Program Synthesis and Transformation. Proceedings of LOPSTR'92, pages 214-227. Springer-Verlag, 1992.
28. L. Naish. Coroutining and the Construction of Terminating Logic Programs. Technical Report TR 92/5, Dept. Computer Science, University of Melbourne, 1992.
29. K. Sagonas, T. Swift, and D. S. Warren. XSB as an Efficient Deductive Database Engine. In Proceedings of the ACM SIGMOD International Conference on the Management of Data, pages 442-453, Minneapolis, Minnesota, May 1994. ACM.
30. D. Sahlin. Mixtus: An Automatic Partial Evaluator for Full Prolog. New Generation Computing, 12(1):7-51, 1993.
31. M. H. Sørensen and R. Glück. An Algorithm of Generalization in Positive Supercompilation. In J. W. Lloyd, editor, Proceedings of the 1995 International Logic Programming Symposium, pages 465-479, Portland, Oregon, December 1995.
32. V. F. Turchin. The Concept of a Supercompiler. ACM Trans. Prog. Lang. Syst., 8(3):292-325, July 1986.
33. P. Van Hentenryck, A. Cortesi, and B. Le Charlier. Evaluation of the Domain Prop. Journal of Logic Programming, 23(3):237-278, June 1995.
Verifiable and Executable Logic Specifications of Concurrent Objects in ℒπ
Luís Caires and Luís Monteiro
{lcaires,lm}@di.fct.unl.pt
Departamento de Informática - Universidade Nova de Lisboa

Abstract. We present the core-ℒπ fragment of ℒπ and its program logic. We illustrate the adequacy of ℒπ as a meta-language for jointly defining operational semantics and program logics of languages with concurrent and logic features, considering the case of a specification logic for concurrent objects addressing mobile features like creation of objects and channels in a simple way. Specifications are executable by a translation that assigns to every specification a model in the form of a core-ℒπ program. We also illustrate the usefulness of this framework in reasoning about systems and their components.
1 Introduction
The "proof-search as computation" paradigm has inspired the design of several programming and specification languages [6]. In [1] we presented s a simple language that supports the reduction and state-oriented style of specification of mobile process calculi without compromising the relational style of specification typical of logic programming. In particular, we have shown that both the 7rcalculus [10] and a version of the logic of hereditary Harrop formulas [7] can be adequately encoded into /:~. On the other hand, s can itself be encoded in classical linear logic, in such a way that successful computations correspond to proofs [3]. As illustrated in [2],/:~ suits itself well as a gpecification language of the operational semantics of programming languages with concurrent and logic features; we also proposed/:~ as a meta-language for the compositional definition of operational semantics and program verification logics of languages of such a kind. Herein, we pursue this general directions, picking the case of declarative object-oriented specification of concurrent systems. MOTIVATION. Several embeddings of object-oriented constructs into variants of r-calculus have been presented in the literature [12, 16, 14]. However [11], the 7r-calculus, while supporting the basic features of concurrency, mobility, abstraction and creation of processes and names, is a low level calculus, so the question arises about what kind of idioms are more suitable to model and reason about object systems and their components. In this setting, most of the current approaches focus On operational semantics or types, and the object languages have a definite operational character. In this paper, we develop a contribution in this general theme, stressing the instrumental use of the E~ framework in the spirit stated above. More specifically, we consider the case of a declarative specification language and program logic for concurrent objects that provides a unified scenario for specifying and reasoning about systems. This is achieved in the context
of a well-defined operational semantics that renders specifications executable and makes essential use of the reductive and deductive features of ℒπ. OVERVIEW. First, we define core-ℒπ, a small yet expressive fragment of ℒπ, and its program logic μℒπ. The logic μℒπ is based on a many-sorted first-order version of the μ-calculus enriched with connectives enabling the implicit expression of component location and a hiding quantifier. For instance, a property φ ⊗ ψ holds of a system with two components that satisfy respectively φ and ψ, and Σ_n φ holds of a system that satisfies φ in a context where a name n is private. Then, we present a specification logic for concurrent objects that actually reduces to an idiom of μℒπ. From the viewpoint of logical specification, our proposal extends existing ones (for instance, [15]) by addressing mobile features like a dynamically changing number of components and creation of private channels in an intrinsic way, while supporting both local and global reasoning by relying on the monoidal structure induced by ⊗. Moreover, specifications are executable by means of a syntax-directed encoding that assigns to each specification a model in the form of a core-ℒπ program. This translation relies on a combination of techniques from process calculi and proof-theoretic approaches to logic programming. We also illustrate the usefulness of this framework in reasoning about systems and their components by giving some examples. OBJECT MODEL. We consider a standard asynchronous model of concurrent objects as closed entities that exhibit computational behaviour by performing mutually exclusive actions over a private state. Actions are triggered by serving request messages from the environment, but are only allowed to occur under certain enabling conditions. Actions can change the object's internal state, cause the sending of action requests to other objects or even the creation of new objects or channels, and display internal non-determinism. Thus, any specification of the behaviour of an object must define the conditions under which actions can occur, and also what their effects are. SPECIFICATION LANGUAGE. Since [13], many authors have adopted subsets of some temporal logic (TL) as a specification language for concurrent systems. But unrestricted use of TL often leads to unimplementable specifications, and typical approaches pick some restricted class of formulas as dynamic axioms. For instance in [4], one finds safety formulas like φ ∧ A* ⇒ Xψ and liveness formulas like φ ⇒ FA* - where A* denotes the occurrence of action A, and X and F are respectively the next and eventually operators of (linear) TL. Concerning liveness properties, in our asynchronous model we will not be able to assert such a strong property, but just something like the weaker φ ⇒ X mA*, where mA* asserts the existence of a pending request for action A. Then, whether mA* implies FA* depends on fairness assumptions and on the actual enabling conditions of action A. In summary, we adopt the (branching time) logic inherited from μℒπ and consider specifications of object classes that enumerate instance attributes, and include dynamic axioms of the form φ ⇒ ⟨[A]⟩ψ and static axioms defining the meaning of non-logical symbols. STRUCTURE OF THE PAPER. In Sections 2-3 we introduce core-ℒπ and its program logic. In Section 4, we introduce the object specification language and
present its interpretation in ℒπ in Section 5. After discussing some properties of this interpretation in Section 6, we present in Section 7 an example of reasoning about a system.
2 The core-ℒπ Language
Here we introduce core-ℒπ, a small yet expressive fragment of ℒπ [1, 2]. SYNTAX. Given a set V of variables, the set T(V) of untyped terms over V is defined inductively as follows: variables are terms, and if x is a variable and t1,...,tn are terms then x(t1,...,tn) is also a term t (s.t. head(t) = x). In general, we will use x, y, z, i, j for variables, a, b, c, t, u, v for terms, x̃ for a list of distinct variables and t̃ for a sequence (t1,...,tn) or a multiset of terms, depending on context. Sometimes, we will write x.t for x(t), and we call variables names. Untyped agents of core-ℒπ are defined by

p, q ::= 0 | a | p|q | νx p | x̃ ⇐ ã[g]p | !x̃ ⇐ ã[g]p
Inaction 0 denotes the well-terminated process. An atom a can be seen either as an elementary message to be sent asynchronously or as a data element in the shared state. p|q stands for the parallel composition of p and q; we sometimes write m̃ for m1|···|mk when the mi are atoms. Restriction νx induces generation of private names; νx p binds x in p. To explain the behaviour of the input prefix x̃ ⇐ ã[g]p, we start by considering the particular case where the test g is 0, which we abbreviate by x̃ ⇐ ã[]p. The agent x̃ ⇐ ã[]p waits for a message b̃ (a multiset of terms) and then behaves like σ(p), if there is a substitution σ with domain x̃ such that σ(ã) ≡ b̃. In general, in addition to receiving the message, x̃ ⇐ ã[g]p must also perform successfully the test σ(g). Roughly, a test g succeeds if there is an atomically encapsulated computation sequence starting with g and reaching a success state (defined below). The replicable agent !x̃ ⇐ ã[g]p behaves just like its non-replicable version, except that it does not consume itself upon reduction. In input prefix agents, the input variables x̃ must have free occurrences in either ã or g, although not in the head of some ai. Note however that in a non-replicable input prefix, ã may be empty; this corresponds to the testing agent of [1]. In both forms, x̃ may be empty, in which case the prefix x̃ ⇐ may be omitted. For agents, we will normally use p, q, r, g. TYPING. Here we introduce a simple type system for core-ℒπ. Assume given a set Ω of primitive sorts, containing at least o, and a set Δ of primitive basic types containing say nat, bool. Basic types β and types τ are given by
β ::= δ (δ ∈ Δ)
τ ::= β | (τ1,...,τn)ς (ς ∈ Ω)
A type that is not basic is called a sort. The definition of types and sorts is stratified in such a way that no term of basic type δ can contain a subterm of some sort. The motivation for this stratification is to limit scope extrusion to terms of sort kind. Signatures are partial maps from variables to types. We use Σ, Γ, Θ for signatures, and assume that x does not occur in Σ when writing Σ, x : τ. Terms and
Σ ⊢ h : (τ1,...,τk)β   Σ ⊢ ti : τi (1 ≤ i ≤ k)   ⟹   Σ ⊢ h(t1,...,tk) : β
Σ, x : τ ⊢ p   ⟹   Σ ⊢ νx : τ p
Σ ⊢ h(t1,...,tk) : o   ⟹   Σ ⊢ h(t1,...,tk)
Σ ⊢ p   Σ ⊢ q   ⟹   Σ ⊢ p|q
Σ, x̃ : τ̃ ⊢ ai : o   Σ, x̃ : τ̃ ⊢ g   Σ, x̃ : τ̃ ⊢ p   ⟹   Σ ⊢ x̃ : τ̃ ⇐ ã[g]p   (likewise for !x̃ : τ̃ ⇐ ã[g]p)
Fig. 1. Typing of terms and agents.

We will denote by T_τ(Σ) the set of terms t such that Σ ⊢ t : τ is valid. Note that binding occurrences of variables in νx and x̃ ⇐ are now explicitly assigned a type. In the rule for restriction, τ must be a sort and, in the rules for input prefix agents, any variable x not occurring free in ã must be assigned a basic type. If Σ ⊢ p is valid, we will say that Σ; p is a well-typed agent, making the typing context explicit. For well-typed agents we will use P, Q, R, and write 𝒫 for the set of all well-typed agents. REDUCTION. The operational semantics of ℒπ is defined by a relation of reduction between agents. Following [8], we first define a structural congruence relation that abstracts away from irrelevant notational distinctions. ≡ is the smallest congruence relation over agents containing α-conversion and closed under the equations (i) p|q ≡ q|p, (ii) (p|q)|r ≡ p|(q|r), (iii) p|0 ≡ p, (iv) νx(p|q) ≡ νx p|q if x is not free in q, (v) νx 0 ≡ 0, (vi) !r ≡ !r|!r and (vii) !r | x̃ ⇐ ã[g]p ≡ !r | x̃ ⇐ ã[g|!r]p if x̃ are not free in !r, where !r stands for some replicable input prefix agent. Reduction is the least relation P → Q between well-typed agents P and Q such that
Σ; νx : τ p → Σ; νx : τ q        if Σ, x : τ; p → Σ, x : τ; q
Σ; m̃ | x̃ : τ̃ ⇐ ã[g]p | q → Σ; σ(p) | q        if Σ; σ(g) →* √ and m̃ ≡ σ(ã)
Σ; m̃ | !x̃ : τ̃ ⇐ ã[g]p | q → Σ; σ(p) | !x̃ : τ̃ ⇐ ã[g]p | q        if Σ; σ(g) →* √ and m̃ ≡ σ(ã)
Here σ assigns to each x : τ a term t ∈ T_τ(Σ), and √ (a success state) is any agent of the form Σ; νx̃ : τ̃(!p̃), where !p̃ is a composition of a (possibly null) number of replicating agents. As usual, we will denote by →* the transitive-reflexive closure of →. Subject reduction holds: if P ∈ 𝒫 and P → Q then also Q ∈ 𝒫. LABELLED TRANSITION SYSTEM. To capture the compositional behaviour of agents, an "open" version of the reduction relation must be defined that captures not only the behaviour of agents in isolation but also the behaviour they can exhibit with adequate cooperation of the environment. As usual, this is formalised by means of an action-labelled transition system (LTS), where actions may be output (↑a), input (↓a) or silent (τ). Several possibilities arise at this point; we must require at least soundness, that is, agreement of the silent transitions with reduction. Additionally, we can also impose a notion of relevance for 𝒫, more precisely, that whenever P ↑a→ Q there is P' ∈ 𝒫 such that P' ↓a→ Q', P|P' ∈ 𝒫 and P|P' → Q|Q', and likewise when P ↓a→ Q.
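As a small worked instance of the second reduction rule (hypothetical agents, assuming a sort ς for the name n and a basic type for x): the message n(5) is consumed, the trivial test 0 succeeds immediately, and the continuation is instantiated by the matching substitution:

\[
\Sigma;\ \nu n{:}\varsigma\,\big(\, n(5) \;\mid\; x \Leftarrow n(x)[\,]\,q(x) \,\big)
\;\rightarrow\;
\Sigma;\ \nu n{:}\varsigma\; q(5),
\qquad \sigma = \{x \mapsto 5\}
\]

The reduction happens under the restriction νn by the first rule, so the private name n never escapes.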
Fig. 2. LTS specification for core-ℒπ. (The figure gives one rule per construct: output of an atom, Σ; a ↑a→ Σ; 0 when head(a) ∉ Def, and silent unfolding by a definition when head(a) ∈ Def; input rules for input prefixes and replications, matching messages and running the test; a congruence rule for restriction νx : τ when x does not occur in the action; a scope extrusion rule turning ↑a into ↑(x : τ)a when x ∈ a and x ∉ head(a); a synchronisation rule combining complementary actions of p and q into a silent transition of p|q with the extruded names re-restricted, Σ; p|q τ→ Σ, Γ; νx̃ : τ̃(p'|q'); a parallel rule [s]; and a rule composing same-polarity actions of p and q with ⊗.)
ACTIONS. Σ-actions are objects of the forms ↑(x̃ : τ̃)ã!p̃ (output action) or ↓(x̃ : τ̃)ã!p̃ (input action), where x̃ : τ̃ is a possibly empty signature, each ai is an atom and each !pj a replication such that Σ, x̃ : τ̃ ⊢ a1|···|an|!p1|···|!pm is valid. To grasp the meaning of the binding signature, just consider it a generalisation of π-calculus bound input and bound output actions; in our notation, x(y) would be rendered by ↓x(y) and x̄(y) by ↑(y)x(y). When all components x̃, ã and !p̃ of an action are void, both ↑ and ↓ will be noted τ (the silent action). To highlight the typing context we may write Σ; a for Σ-actions a. The set of all actions will be noted Act, and individual actions by a, b, c. We will also define head((x̃ : τ̃)ã!h̃.p̃) as the set of heads of all ai and all hj. Actions a and b of the same polarity compose into another action a ⊗ b; the composition of (x̃ : τ̃)ã!p̃ with (ỹ : σ̃)b̃!q̃ is defined as (x̃ : τ̃, ỹ : σ̃)ãb̃!p̃q̃. Intuitively, a ⊗ b is the concurrent execution of a and b; ⊗ is commutative, associative and has τ for unit. LTS SPECIFICATION. In Figure 2, we present a LTS specification for core-ℒπ. Most rules have the form one expects. Substitution σ assigns to each x : τ a term in T_τ(Σ, ỹ : σ̃), and ρ a term in T_τ(Σ). When we are interested in computations starting from agents Γ; p with Γ a given global signature, we let Def ⊆ Γ and define two transitions for an agent Σ; a, to distinguish the case where a is to be substituted in place by its definition (when head(a) ∈ Def) from the case where a is to be sent to the environment. This specification defines a sound and relevant transition relation for all of 𝒫, and should be taken as the reference transition relation. Note however that when developing applications of core-ℒπ (for instance, when studying certain restricted idioms as in this paper) we might be interested just in some subset of 𝒫. Since the reference relation might not be relevant for that subset, additional provisos may be placed occasionally in some of the rules above, in order to obtain relevance for the fragment under consideration.
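As a small worked instance of action composition (hypothetical atoms a and b(n), assuming a sort ς for n), two outputs of the same polarity merge into a single output whose binding signature collects the extruded names:

\[
\uparrow a \;\otimes\; \uparrow(n{:}\varsigma)\,b(n) \;=\; \uparrow(n{:}\varsigma)\,a\,b(n)
\]

Composing either side with τ leaves it unchanged, since τ is the unit of ⊗.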
Σ; p ⊨_{I,v} P(t̃)        if t̃ ∈ I_Σ(P)
Σ; p ⊨_{I,v} t ≐ t
Σ; p ⊨_{I,v} ¬φ          if not Σ; p ⊨_{I,v} φ
Σ; p ⊨_{I,v} φ ∧ ψ       if Σ; p ⊨_{I,v} φ and Σ; p ⊨_{I,v} ψ
Σ; p ⊨_{I,v} ∀x : τ φ    if Σ; p ⊨_{I,v} φ{x/t} for all t ∈ T_τ(Σ)
Σ; p ⊨_{I,v} ⟨S⟩φ        if ∃(Σ; a) ∈ S ∃(Σ'; q) ∈ 𝒫 (Σ; p a→ Σ'; q and Σ'; q ⊨_{I,v} φ)
Σ; p ⊨_{I,v} X           if Σ; p ∈ v(X)
Σ; p ⊨_{I,v} μX.φ        if ∀S ⊆ 𝒫 (clos(S) ⇒ Σ; p ∈ S), where clos(S) is ∀(Γ; q) ∈ 𝒫 (Γ; q ⊨_{I,v[X↦S]} φ ⇒ (Γ; q) ∈ S)
Σ; p ⊨_{I,v} [a]         if p ≡ a
Σ; νx : τ p ⊨_{I,v} Σ_x φ   if Σ, x : τ; p ⊨_{I,v} φ
Σ; p ⊨_{I,v} φ ⊗ ψ       if p ≡ q|r and Σ; q ⊨_{I,v} φ and Σ; r ⊨_{I,v} ψ
Σ; 0 ⊨_{I,v} 1

Fig. 3. Satisfaction of μℒπ formulas.
3 A program logic for core-ℒπ
The program logic μℒπ is basically a many-sorted first-order version of the propositional μ-calculus (see [5]) extended with operators for handling private names and spatial distribution. Formulas of μℒπ are
φ, ψ ::= P(t̃) | t ≐ u | ¬φ | φ ∧ φ | ∀x : τ φ | ⟨S⟩φ | X | μX.φ | Σ_{x:τ} φ | φ ⊗ φ | [t] | 1
The formulas expressing conjunction (φ ∧ ψ), properties P(t̃), equality (t ≐ u), possibility (⟨S⟩φ), universal quantification (∀x : τ φ), negation (¬φ) and least fixed point (μX.φ) are to be understood in the usual way. In a fixed point formula μX.φ, any occurrence of X must be under an even number of negations and under no ⊗. The remaining forms are particular to μℒπ and deserve further comments. Σ_x φ asserts a property of a state embedded into a context where the name x is private. The formula φ ⊗ ψ holds in every state that can be partitioned into a component satisfying φ and a component satisfying ψ. 1 is the unit for ⊗. Finally, [t] is true of those states that essentially contain just the message t. Formulas are expected to be well-typed w.r.t. a signature Σ; for μℒπ formulas φ, we assume a corresponding judgement form Σ ⊢ φ defined as expected. If such a judgement is valid, φ will be called a Σ-formula. SEMANTICS. Truth of formulas of μℒπ is taken with respect to a structure that consists of a labelled transition system for core-ℒπ¹ together with a map I assigning to each signature Σ an interpretation I_Σ. Such interpretations assign to each predicate symbol of type (τ1,...,τn)o a subset of T_{τ1}(Σ) × ··· × T_{τn}(Σ). I is required to be monotonic, that is, Σ ⊆ Σ' implies that, for every predicate
¹ In general, the reference LTS; sometimes a suitable restriction may be required, cf. previous remarks on relevance.
symbol P, I_Σ(P) ⊆ I_{Σ'}(P). In Figure 3 we present the relation ⊨_{I,v} of satisfaction between Σ-terms and well-typed Σ-formulas of μℒπ. The definition is parametric on the interpretation map I and also on a valuation v. This valuation assigns sets of well-typed terms to propositional variables X, and is needed to interpret the fixed point operator. When interpreting formulas without free propositional variables, the valuation is irrelevant and could be omitted. Some specific comments about this definition are in order. First, core-ℒπ terms are taken modulo structural congruence ≡. In the clause for ⟨S⟩φ, S is a set of ℒπ actions. To follow usual notations we will write ⟨·⟩φ for ⟨S⟩φ when S is the set of all actions. In the clause for μX.φ, the property clos(S) holds of prefixed points of the mapping that sends sets of terms satisfying a property ψ to the set of terms satisfying the property φ{X/ψ}. The syntactic restriction on the occurrences of X in μX.φ causes this mapping to be monotonic; therefore the clause for μX.φ defines the least fixed point as the intersection of all prefixed points, cf. the Knaster-Tarski theorem. Other logical connectives (say, ∨, ∃, ⇒) are defined in the usual way according to standard tarskian semantics. The following usual abbreviations will also be used: [S]φ for ¬⟨S⟩¬φ (necessity), νX.φ for ¬μX.¬φ{X/¬X} (greatest fixed point), ⊥ for μX.X (false) and ⊤ for ¬⊥ (true). We will also write {φ} for Σ_x φ when no x occurs free in φ, and introduce the following derived operators: ⟨[S]⟩φ standing for ⟨S⟩⊤ ∧ [S]φ, inv_[S]φ standing for νX.φ ∧ [S]X (φ holds along any S-path), ev_[S]φ standing for μX.φ ∨ ⟨S⟩X (φ will eventually hold along some S-path) and evs_[S]φ for μX.φ ∨ (⟨S⟩⊤ ∧ [S]X) (φ will eventually hold along any S-path); inv_[·]φ will be abbreviated by inv φ and likewise for ev φ and evs φ. The use of an explicit signature is essential in the definition of satisfaction. For instance, note that Σ_x φ ⇒ ∀y Σ_x(φ ∧ ¬x ≐ y) is true in any interpretation. This formula asserts intuitively that if a state has some private name, then this name is distinct from every term in the current environment. The scoping behaviour of private names induces the need to consider the brackets {φ} even when the private names do not occur in φ. For instance, a : o; νb : o (b[]a) ⊨ {⟨·⟩⟨↑a⟩⊤} but a : o; νb : o (b[]a) ⊭ ⟨·⟩⟨↑a⟩⊤.
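As a small worked unfolding of the derived operators (for illustration only): the assertion that the output of a message a remains eventually possible along every path expands into nested fixed points,

\[
\mathsf{inv}\,\mathsf{evs}\,\langle\uparrow a\rangle\top
\;=\;
\nu X.\Big(\big(\mu Y.\,\langle\uparrow a\rangle\top \vee (\langle\cdot\rangle\top \wedge [\cdot]Y)\big)\wedge[\cdot]X\Big)
\]

which is the same shape as the enabledness condition inv evs ⟨↓b⟩⊤ used in the fairness definition below (Definition 2).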
4 A specification language for concurrent objects
The specification language we present here is an idiom of the program logic of Section 3. Essentially, we consider specifications of classes of objects that indicate, besides the instance state variables, for each possible action A a set of dynamic axioms of the form φ ⇒ ⟨[A]⟩ψ, where φ (the pre-state formula) is a proposition over the object's state and ψ (the post-state formula) is a proposition defining the new state of the object and other side effects induced by the execution of A. Possible side effects are the posting of action requests i.m (for other objects) and of creation requests νi : c(t̃) (for a new object i of class c(t̃)). Besides these components, a specification also comprises a post-state formula defining the initial state of objects of the class and a static theory (called so because it does not involve any dynamic modalities) defining the meaning of the non-logical
constants (predicate symbols) occurring elsewhere in the specification. At this stage, the reader may want to have a look at the sample specification in Fig. 4. We assume given a set of basic types and a set of basic sorts (containing at least o and act). The type (act)o will be abbreviated by obj. A system specification is a triple (Σ, C, S) where Σ is a signature, the set of class names C is a subset of Σ, and S is a mapping assigning to each c ∈ C a class specification S_c of type Σ(c) = (τ̃)o. A parametric Σ-class specification of type (τ1,...,τn)o is then a five-tuple (𝒜, ℒ, ℐ, ℛ, 𝒯) such that (1) 𝒜, ℒ and Σ are pairwise disjoint signatures, (2) 𝒜 is of the form a1 : τ1,..., an : τn, (3) ℐ is a post-state formula, ℛ is a finite set of dynamic axioms and 𝒯 is a finite set of static axioms. Post-state formulas and dynamic and static axioms will be defined shortly. PARAMETERS. Signature 𝒜 lists the parameters of the class specification. If 𝒜 is empty, the specification is non-parametric and therefore fully determined. Given a Σ-class specification (𝒜, ℒ, ℐ, ℛ, 𝒯) of type (τ1,...,τn)o and terms ti ∈ T_{τi}(Σ), we obtain a non-parametric concrete instance (∅, ℐ{t̃/ã}, ℛ{t̃/ã}, 𝒯{t̃/ã}). LOCAL SYMBOLS. Signature ℒ splits in two components: ℒ_attrs, containing the declarations of the class instance's attributes, and ℒ_preds, which assigns to each predicate symbol defined by the local static theory 𝒯 a predicate type (τ̃)o. DYNAMIC THEORY. Dynamic axioms have the form ∀x̃ : τ̃(S ∧ φ ⇒ ⟨[A]⟩ψ) where the action A is a term of type act, S is a state formula, φ is a static formula, ψ is a post-state formula and x̃ are some variables with free occurrences at least in S, φ or A - those that occur just in φ and ψ should have basic type. A formula of the form S ∧ φ will be called a pre-state formula. These kinds of formulas are defined by
(state formulas) S ::= ⊤ | a = t | S ∧ S
(static formulas) φ ::= P(t̃) | ⊤ | φ ∧ φ | φ ∨ φ
(post-state formulas) ψ ::= S | νi : c(t̃) ψ | i.m ⊗ ψ
where a ∈ ℒ_attrs, P ∈ ℒ_preds and c ∈ C. In the definition of post-state formulas, i is assumed of type obj and m is assumed of type act. INITIAL. ℐ is a post-state formula specifying the object's state at birth time. STATIC THEORY. 𝒯 is a set of static axioms of the form ∀x̃ : τ̃(φ ⇒ P(t̃)) where φ is a static formula. COMMENTS. All formulas in the class specification are well-typed w.r.t. the signature Σ𝒜ℒ, self : obj, except that self cannot occur in 𝒯. To avoid useless inconsistencies, no more than a single occurrence of a particular attribute symbol is allowed in pre- and post-state formulas, and the state formulas in the antecedents of dynamic axioms should be pairwise exclusive. SEMANTICS. The semantics of class specifications is given by translation into μℒπ formulas. To perform this, we must understand formulas in a specification as asserting properties of a prototypical object of the given class and make this arbitrary object explicit. More precisely, if i is a variable of type obj and φ a formula in a class specification, the formula i.φ asserts the property φ relativized to the object i. This relativization is essential also to enable global reasoning about systems composed of multiple interacting objects. Now, i.φ is obtained
from φ as follows: replace self by i, rename predicate symbols P as c.P² where c is the class name, replace actions A in dynamic axioms by ↓i(A) and replace equations a = t by i.a ≐ t. Moreover, understand νi : c(t̃)ψ as Σ_{i:obj}([c(i, t̃)] ⊗ ψ) and j.m as [j(m)]. Every expression of the form i.a ≐ t must also be translated to some formula of μℒπ. This translation depends on the actual representation of objects as agents of core-ℒπ. Several possibilities arise at this point; we just require invariant validity of the following principles:
P1 ∀t : τ(i.a ≐ t ⇒ [·](∃y : τ i.a ≐ y)) (Persistence of attributes)
P2 ∀t : τ ∀v : τ(i.a ≐ t ∧ i.a ≐ v ⇒ v ≐ t) (Attribute values are functional)
P3 ¬(∃x : τ(i.a ≐ x) ⊗ ∃x : τ(i.a ≐ x) ⊗ ⊤) (Unicity)
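As a worked instance of relativization (using the quit axiom of the thread class of Fig. 4, for a thread instance with parameters s and r), the axiom becomes

\[
\forall l{:}\mathit{li}\ \big(\, t.\mathit{up} \doteq \mathsf{T} \wedge t.\mathit{sl} \doteq l
\;\Rightarrow\;
\langle[\,\downarrow t(\mathit{quit})\,]\rangle\,( t.\mathit{up} \doteq \mathsf{F} \otimes [\,s(\mathit{done}(r,l))\,])\,\big)
\]

following the rules just given: the action quit becomes the input action ↓t(quit), and the request s.done(r, l) becomes the message formula [s(done(r, l))].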
SPECIFICATION FORMULAS AND MODELS. Let Σ^obj be obtained from Σ by changing the types assigned to class names in C from (τ̃)o to (obj, τ̃)o, and ℒ^c be obtained from ℒ by renaming each predicate symbol in ℒ_preds from P to c.P. Then, every Σ-class specification C = (𝒜, ℒ, ℐ, ℛ, 𝒯) yields the Σ^obj ℒ^c-formula

C*(i, x̃) = Σ_ṽ( ⋀_{a_l ∈ ℒ_attrs} i.a_l ≐ v_l ) ∧ ( ⋀_{D_j ∈ ℛ} i.D_j ) ∧ ( ⋀_{P_k ∈ 𝒯} c.P_k )
This formula expresses the full content of the specification and can be used to define a notion of model for a system specification.

Definition 1 A model for a system specification Ψ = (Σ, C, S) is an interpretation map I and an assignment of a definition !d_c to each c ∈ C such that for all c ∈ C

P_c ⊨_I ⟨!d_c⟩ (i.ℐ ∧ inv{S*_c(i, x̃) ⊗ ⊤})

where P_c ≡ Σ^obj ℒ^c_preds, i : obj; c(i, x̃) for some x̃, and satisfaction is taken w.r.t. some sound restriction of the reference LTS relevant to {Q | P_c →* Q for c ∈ C}.
5 From Specifications to Agents
In this section, we show how models for class specifications can be systematically constructed in a rather simple way. Basically, an object "located at" name i will be represented by an agent of the form νs : α_c(s(i, ṽ) | ν c.p̃ : ℒ_preds(R⟨ℛ⟩_is | T⟨c.𝒯⟩)) where R⟨ℛ⟩_is and T⟨c.𝒯⟩ are encodings of respectively the dynamic and static theory, and s(i, ṽ) is a "record" of the attribute values. We first introduce a canonic form for dynamic axioms in which all of the object's attributes occur both in pre- and post-state formulas. The motivation is that this simplifies the description of the translation and renders explicit the intended frame condition. CANONIC FORM. Let ∀x̃(S1 ∧ φ ⇒ ⟨[a]⟩ψ) be a dynamic axiom where S2 is the maximal state formula embedded into the post-state formula ψ. Then, its canonic form is ∀x̃ ∀ỹ(S1 ∧ S'1 ∧ φ ⇒ ⟨[a]⟩ (S'2 ∧ ψ)), where S'1 is a1 = y1 ∧ ··· ∧ ak = yk
² When φ is a static axiom, we also write c.φ for i.φ
for those attributes ai not occurring in S1, and S'2 is b1 = v1 ∧ ··· ∧ bl = vl where bi = vi is the (unique) equation in S1 ∧ S'1 for an attribute bi not occurring in S2. From now on, we fix a Σ-class specification S = (𝒜, ℒ, ℐ, ℛ, 𝒯) associated to c : (τ1,...,τn)o in some system specification (Σ, C, S), and consider its associated specification formula S*(i, x̃) for some i of type obj. W.l.o.g. we will take the canonic form of every axiom in ℛ. STATIC FORMULAS AND AXIOMS. These are encoded by the mapping T⟨−⟩, which sends ⊤ to the null agent 0, an atom c.P(t̃) to itself, a conjunction φ1 ∧ φ2 to the sequenced test [T⟨φ1⟩]T⟨φ2⟩, a disjunction φ1 ∨ φ2 to an internal choice implemented with a fresh name n, [νn : o(n | !n ⇐ [T⟨φ1⟩]0 | !n ⇐ [T⟨φ2⟩]0)]0, and each static axiom to a replicated clause agent:

T⟨∀x̃ : τ̃(φ ⇒ c.P(t̃))⟩ = !x̃ : τ̃ ⇐ c.P(t̃)[T⟨φ⟩]0
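As a worked instance (a sketch under the encoding just described, using the list-deletion theory of the thread class of Fig. 4; the class prefix thr is our naming assumption for the renaming P ↦ c.P), the two static axioms for rem translate to the replicated agents

\[
\begin{array}{l}
!\,x{:}i,\ l{:}\mathit{li}\ \Leftarrow\ \mathit{thr.rem}(x,\ x\bullet l,\ l)\,[\,0\,]\,0\\[2pt]
!\,x,y{:}i,\ l,r{:}\mathit{li}\ \Leftarrow\ \mathit{thr.rem}(x,\ y\bullet l,\ y\bullet r)\,[\,\mathit{thr.rem}(x,l,r)\,]\,0
\end{array}
\]

Each clause waits for a goal message rem(...) and, in the second case, runs the encapsulated test for the premise rem(x, l, r), mirroring resolution in a logic program.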
This translation assigns to the theory 𝒯 a well-typed core-ℒπ Γ-term T⟨c.𝒯⟩ defined as T⟨c.d1⟩ | ··· | T⟨c.dn⟩ where the di are the formulas in 𝒯. We have

Proposition 1 Let Σ be a signature, 𝒯 be a static theory and φ a goal formula over Σ. Then Σ; 𝒯 ⊢ φ is provable in classical logic iff Σ; T⟨c.𝒯⟩ | T⟨φ⟩ →* √.

POST-STATE FORMULAS. Let P⟨−⟩ be the translation map

P⟨a1 = t1 ∧ ··· ∧ am = tm⟩ = s(i, t1,..., tm)
P⟨νj : c(t̃) ψ⟩ = νj : obj(c(j, t̃) | P⟨ψ⟩)
P⟨j.m ⊗ ψ⟩ = j(m) | P⟨ψ⟩
P⟨−⟩_is is parametric on two variables i and s, respectively of type obj and α_c. This encoding represents the state of an object as a term s(i, ṽ), where i is the object name and ṽ are the current values of its attributes. We can now interpret in μℒπ attribute valuation formulas i.a_k ≐ t (where a_k is the k-th attribute in the sequence defined above) by the formula ∃x̃ : τ̃ [s(i, x1,..., x_{k-1}, t, x_{k+1},..., x_m)], where the τ̃ are the appropriate types and t occurs in the k-th argument position of s. We will see shortly that this interpretation fully satisfies principles P1-P3.

Proposition 2 Let ψ be a post-state formula in a class specification and S ≡ i.ψ. Then Σ; νs : α_c(P⟨ψ⟩_is) ⊨_I i.ψ is valid for every I.

DYNAMIC AXIOMS. Axioms of the form r = ∀x̃ : τ̃(S ∧ φ ⇒ ⟨[a]⟩ψ) are translated by the map R⟨−⟩_is given by R⟨r⟩_is = !x̃ : τ̃ ⇐ P⟨S⟩ i(a)[T⟨i.φ⟩]P⟨ψ⟩
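As a worked instance of P⟨−⟩ (a sketch based on the server class of Fig. 4, whose attributes are req and ch in that order), the initial post-state formula of Ser translates to

\[
P\langle\, \nu n{:}N\,(\mathit{req} = \mathit{nil} \wedge \mathit{ch} = n)\,\rangle_{is}
\;=\;
\nu n{:}\mathit{obj}\,\big(\, N(n) \;\mid\; s(i,\ \mathit{nil},\ n)\,\big)
\]

and, correspondingly, the valuation formula i.req ≐ nil is interpreted as ∃x [s(i, nil, x)], with nil in the first (req) argument position of the record.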
C⟨S⟩ = !i : obj, x̃ : τ̃ ⇐ c(i, x̃) Obs(i, x̃), where Obs(i, x̃) ≡ νs : α_c(P⟨ℐ⟩_is | ν c.p̃ : ℒ_preds(R⟨ℛ⟩_is | T⟨c.𝒯⟩)) is the representation of an object i of class c in a state satisfying ℐ. In general, an agent of
the form νs : α_c(s(i, ṽ) | ν c.p̃ : ℒ_preds(R⟨ℛ⟩_is | T⟨c.𝒯⟩)) will represent an object "located at" i. Finally, a system Ψ = (Σ, C, S) is encoded into the Σ^obj agent S⟨Ψ⟩ = |_{c∈C} C⟨S_c⟩ and we will take Def = C when defining the open operational semantics of a system of objects using the LTS of Section 2. Some remarks are needed before stating and proving the correctness of the above defined encoding. COMPUTATIONS AND CONFIGURATIONS. A computation in a system Ψ = (Σ, C, S) is a reduction sequence I → S1 → S2 ··· with I of the form Σ'; S⟨Ψ⟩|P and I ⊨ [c(i, t̃)] for some class c ∈ C. Every Si must have the form Σ'; S⟨Ψ⟩ | νx̃ : τ̃(Ob1 | ··· | Obn | m1 | ··· | mk) where the Obj are objects and the mj are either creation requests or messages i(a). Since the agent S⟨Ψ⟩ encoding Ψ is fixed once and for all, we can abstract from its presence and consider just agents like Σ'; νx̃ : τ̃(Ob1 | ··· | Obn | m1 | ··· | mk), assuming that for definition transitions the definition !d_c used is always the encoding of some class specification in S⟨Ψ⟩. Moreover, any component of a system also has this particular structure, so we can focus our concern just on well-typed agents of this form, to be called configurations. The following characterises the kind of actions configurations can perform with and without cooperation of the environment in a system reduction.

Proposition 3 Let C be a configuration of a system Ψ. If C c→ C' is a transition
occurring inside the derivation of a reduction between system configurations, then c is of the forms τ, ↑i.m, ↓i.m or ⇓p for some definition p = C⟨S⟩. Hence, for instance when proving a (safety) property like I ⇒ inv φ from I ⇒ φ and φ ⇒ [·]φ we can soundly restrict the universe of actions. Now, note that if Σ; p ↓i(a)→ Q for some a, then there is no configuration r such that Σ; r ↓i(b)→ r' and Σ; p|r is also a configuration, because an object i cannot be located both in p and r. Hence, the reference LTS can be refined to approximate relevance, by adding to rule [s] the proviso: "if not (c ≡ ↓i(a) and q ↓i(b)→ for some b)". The transition relation thus obtained is easily seen to be a subset of the reference transition relation and can be proven (still) sound w.r.t. the fragment of system configurations. Therefore, in the sequel, we will always refer to this restricted transition relation. We can now state the correctness of the translation.

Lemma 1 Let S = (𝒜, ℒ, ℐ, ℛ, 𝒯) be a Σ-class specification of a class c. Let I be any interpretation such that I_Σ(c.P) = {(t1,...,tn) | Σ; c.𝒯 ⊢ c.P(t̃)}. Then
Γ, i : obj; Obs(i, x̃) ⊨_I i.ℐ ∧ inv{S*(i, x̃) ⊗ ⊤}

where Γ is of the form Σ^obj ℒ^c_preds, Σ' for any Σ'. An immediate consequence of Lemma 1 is that the translation just presented provides a model for system specifications.

Theorem 1 Let Ψ = (Σ, C, S) be a system specification. Then
Σ^obj ℒ^c_preds, i : obj; c(i, x̃) ⊨_I ⟨C⟨S_c⟩⟩(i.ℐ ∧ inv{S*(i, x̃) ⊗ ⊤})
for every c ∈ C and x̃ of the appropriate types, where I assigns to each signature Σ the interpretation I_Σ(c.P) = {(v1,...,vk) | Σ; c.𝒯 ⊢ c.P(ṽ)} and satisfaction is taken w.r.t. the relevant LTS for system configurations.
6 Some General Properties of Systems
In this section, we state several properties found useful to reason about systems and that can be proven valid in any configuration. We start by stating correctness of the interpretation of attribute valuations w.r.t. principles P1-P3.
Proposition 4 For every configuration P of a system Ψ = (Σ, C, S), and every class c ∈ C such that a ∈ ℒ^c_attrs, we have P ⊨ P1 ∧ P2 ∧ P3.
We now introduce the following properties R1 and R2 that are true of all of 𝒫.
R1. PRIVACY. Σ_{x:τ}φ ⇒ ∀y : τ Σ_{x:τ}(φ ∧ ¬x ≐ y). If a state has some private name, then this name is distinct from every term in the current environment.
R2. TELESCOPING. Σ_{ṽ:τ̃}(ev_[S]φ ∧ inv_[U]⊥) ⇒ ev Σ_{ṽ:τ̃}φ where S is the set of actions without free occurrences of v ∈ ṽ and U is the set of actions extruding some v. Thus, we can move a restriction "forward" in a computation if the internal context does not extrude any of the restricted names.
The remaining properties are specific to object system configurations. We assume that i, j are generic object names.
R3. LOCALITY. (¬i ≐ j) ∧ (i.φ ⊗ ⊤) ⇒ [j.m](i.φ ⊗ ⊤) for every pre-state formula φ. Attribute values of an object can only be changed by the object's own actions.
R4. GROUPING. (¬i ≐ j) ∧ (i.φ ⊗ ⊤) ∧ (j.ψ ⊗ ⊤) ⇒ (i.φ ⊗ j.ψ ⊗ ⊤) allows the concatenation of assertions about disjoint components of a configuration.
R5. MERGING. (i.φ ⊗ ⊤) ∧ (i.ψ ⊗ ⊤) ⇒ ((i.φ ∧ i.ψ) ⊗ ⊤) allows the merging of assertions about the same component of a configuration.
R6. RELEVANCE I. ⟨↓i.a⟩⊤ ⇒ [↑i.m]⊥ expresses that if some object is located in the system then no messages directed to it can be sent to the environment.
R7. RELEVANCE II. S*_c(i, x̃) ⊗ φ ⇒ S*_c(i, x̃) ⊗ (φ ∧ [↑i.m]⊥). Messages directed to an object i cannot be received by the internal context surrounding i.
R8. SUPPORT. ∀x̃ : τ̃(⟨i.a⟩⊤ ⇒ i.φ) given that ∀x̃ : τ̃(i.φ ⇒ ⟨[i.a]⟩i.ψ) is the single dynamic axiom for the action a in S*_c(i, x̃).
R9. ACT. ([a] ⊗ inv(⊤ ⊗ (evs φ) ∧ (φ ⇒ ⟨↓a⟩ψ))) ⇒ ev_[S]{ψ ⊗ ⊤} where S is any action set containing τ. Together with a fairness assumption, this ensures that every request for a message that is not forever disabled will be attended.
R10. CREATION. For each class c of type (τ1,...,τn)o, the principle (c(i, t̃) ⊗ [i.m1] ⊗ ··· ⊗ [i.mk] ⊗ ⊤) ⇒ ev_[S](ℐ ∧ S*_c(i, t̃) ⊗ [i.m1] ⊗ ··· ⊗ [i.mk] ⊗ ⊤) where S can be anything not excluding ⇓d with d = C⟨S_c⟩ for some c ∈ C. Every request for a new object is fulfilled.
We now bring explicit a fairness assumption.

Definition 2 A computation P1 → P2 → ··· is fair if whenever Pi ≡ νx̃ : τ̃(b|Q) and Q ⊨ inv evs ⟨↓b⟩⊤ then for some n > i, Pn ≡ νx̃ : τ̃(b|Q'), Q' ↓b→ Q'', P_{n+1} ≡ νx̃ : τ̃ Q'' and Pn → P_{n+1}.
In a fair computation, a message request for an action that is only sometimes disabled will necessarily result in its execution. The important remark to make is that the weak eventualities asserted in the creation and synchronisation axioms above express in fact strong eventualities, if we restrict our concern just to fair computations.
Thr(s : obj, r : obj) is
  attrs up : bool, sl : li
  initial up = T ∧ sl = nil
  dynamic
    ∀l : li, x : i  up = T ∧ sl = l ⇒ ⟨[ins(x)]⟩ sl = x • l
    ∀l, u : li, x : i  up = T ∧ sl = l ∧ rem(x, l, u) ⇒ ⟨[del(x)]⟩ sl = u
    ∀l : li  up = T ∧ sl = l ⇒ ⟨[quit]⟩ (up = F ⊗ s.done(r, l))
  static
    ∀x : i, l : li  rem(x, x • l, l)
    ∀x, y : i, l, r : li  rem(x, l, r) ⇒ rem(x, y • l, y • r)
end

Ser() is
  attrs req : li, ch : obj
  initial νn : N (req = nil ∧ ch = n)
  dynamic
    ∀r, a : obj  ch = r ⇒ ⟨[open(a)]⟩ νt : Thr(self, r) a.ans(t)
    ∀s : li, r : obj, l : li  ch = r ∧ req = s ⇒ ⟨[done(r, l)]⟩ req = l • s
    ∀l : li, a : obj  req = l ⇒ ⟨[dump(a)]⟩ (req = nil ⊗ a.sl(l))
end

N() is end

Fig. 4. A sample specification
7
Reasoning
about
Systems
We now present a toy specification of a commercial server usable for instance at a bookseller network site, and prove a couple of very simple properties about it with the p r o g r a m logic. Servers can perform three types of actions: o p e n ( a ) , t h a t creates a new thread whose identity is returned to the user a by means of a message a . a n s ( t ) , d u m p ( a ) t h a t returns to a the current list of orders, clearing it afterwards, and d o n e ( r , l), explained shortly. Each thread can execute insertions i n s ( i ) and deletions d e l ( i ) of items i in the current shopping list at the user command, until q u i t is invoked. After this happens, the thread yields the collected shopping list (the order) to the server via a d o n e ( r , l) request, committing itseff to no further action. Obviously, only s . d o n e ( r , l) requests originated by threads should be served by the server. This is achieved in the proposed specification by the use of a private communication channel kept in the c h attribute of servers. The specification is presented in Figure 4, in a sugared syntax not hard to relate to the definitions in Section 4. So, let I = ~ n : o b j ( s . r e q = n i l A s . c h = n ) A i n v { S e r * ( s ) | T} characterise the initial state of a system. We now prove t h a t I ~ i n v Vr : obj, l : l i [ s , d o n e ( r , I)]_L t h a t is, the privacy assumption just mentioned. To t h a t end, we first show the safety property I ~ i n v P where P = Z n : o b j ( s . c h = n | T) A i n v { S e r * ( s ) | T } . Since I ~ P , we first argue informally t h a t P ~ [a]P for every action a. Since
55 attribute s.ch is never mentioned in s post-formulas, its value n never changes by the frame condition. So we must just prove that such private value is never extruded. The only possibility is in an action 1` (n : obj)s.done(n, l). But since say P ~ ($(u: o b j ) s . o p e n ( u ) ) T then P ~ [1" (n: obj)s.done(n,l)]_L by R6, and therefore P ~ [$(n)s.done(n, l)]P. Thus I =~ invP. We now prove in detail that P ~ VRL[s.done(R, L)]_L. 1. ~ n ( ( S e r * ( s ) A s.ch -- n) | T), hypothesis and R5. By def of Ser*(s) and R8 2. En(((Vs'r'l's.ch = r' A s.req = s' ~ (s.done(r', l')) T) A s.ch = n) | T) 4. Zn(((Vr'l'-~s.ch = r' ~ [s.done(r',/')]_L) A s.ch = n) @ T), by pure logic. 5. Z,~((Vr'l'-~n "- r' ~ [s.done(r',/')]_L) | T), by P2. 6. 3 R L (s.done(R, L)) T, hypothesis. 7. 2 R L ( ( s . d o n e ( R , L ) ) T A Zn((Vr'l'-~n & r' ~ [s.done(r',l')]2) | T), by 5,6. S. 3RL(Z,~((Vr'l'-~n - r' ~ [s.done(r', I')]_L A (s.done(R, L)) T A -~R - n) | T), by R1 and noting that n is not free in s.done(R, L). 9. 3 R L ( Z n ( ( [ s . d o n e ( R , L)]_L A (s.done(R, L)) T) | T), by pure logic. 10. VRL[s.done(R, L)]_L, by 9,6 contradiction. 11. P ~ VRL[s.done(R, L)]_L, by 1 discharge, and we are done. We now consider the following system-wide property ~ stating that, in certain conditions, after a thread executes a quit action, it should disable further actions and it's parent server must eventually acquire the shopping list produced by it. Then ~ is Z n ( S e r * ( s ) | (t.sl = l A T h r * ( t , s , n ) ) | T) ~ [t.quit](ev{3u(s.reqs = l 9 u) | T} A |nv[S]_L) where S is the set of all actions by t. We now sketch the proof of ~, highlighting just the main steps. Assume Zn (Ser* (s) | (t.sl = l A Thr* (t, s, n)) | T). This implies ~,~(Ser*(s) | (t.sl = 1A Thr*(t, s, n) A t.Ts(s, n)) | T), where Ts(s, n) is the dynamic axiom for quit in the specification for threads. Now, [t.quit]B where B =_ ~ n ( S e r * (s) | (t.up = F A T h r * (t, s, n) | s.done(n, l)), using t.Ta (s, n), and noting that n is not free in t.quit. Note that B ~ inv[S]_L, because B i n v ( t . u p = F| On the other hand, S ~ Zn(s.done(n, l ) | 1 7 4 T}). Now, s.done(n, l) | i n v { Ser* ( s) | implies s.done(n, l) | i n v ( e v s 3 r ( s.ch = n A s.reqs = r) A qr(s.ch = n A s.reqs = r) ~ ([s.done(n, l)]) qr(s.reqs = 1 * r) | T), and this last formula implies ev[un]{(3r(s.reqs = 1 9 r) | T)} by R9, where Un is the set of actions without free occurrences of n. Since ~nev[un] r ::~ eV~nr , by R2, we conclude ~. 8
8 Conclusions and Further Work
We presented a declarative executable specification language and program logic for concurrent objects providing a unified framework for specifying and reasoning about systems, thus demonstrating the usefulness of the ℒπ language as a meta-language for structuring the definition of operational semantics and program verification logics of languages with concurrent and logic features. An aspect not covered in the present paper is object types; we expect types also to be accommodated in this framework. The proposed model can be extended in several directions (for instance, modelling and reasoning about inheritance, transactions, or mobile code). An interesting observation about μℒπ is that the monoidal structure induced by ⊗ and 1, together with ≡ and Σ_{x:τ}, induces an Action Structure [9]. This fact may be explored in the definition of a more abstract characterisation of a verification logic for global/local properties of modular systems in the presence of private names. Acknowledgements To the anonymous referees for their useful comments and Project ESCOLA PRAXIS/2/2.1/MAT/46/94 for partially supporting this work.
References
1. L. Caires. A language for the logical specification of processes and relations. In Michael Hanus, editor, Proceedings of the Algebraic and Logic Programming International Conference, number 1139 in LNCS, pages 150-164, 1996.
2. L. Caires. A language for the logical specification of processes and relations. Technical Report 6.96, Universidade Nova de Lisboa, DI/FCT, 1996. http://www-ctp.di.fct.unl.pt/~lcaires/writings/lpi6.96.ps.gz.
3. L. Caires and L. Monteiro. Proof net semantics of proof search computation. In K. Meinke and M. Hanus, editors, Proceedings of the Algebraic and Logic Programming International Conference, number 1298 in LNCS, pages 194-208, 1997.
4. J. Fiadeiro and T. Maibaum. Temporal theories as modularisation units for concurrent system specification. Formal Aspects of Computing, 4(3):239-272, 1992.
5. D. Kozen. Results on the propositional μ-calculus. TCS, 27(3):333-354, 1983.
6. D. Miller. A survey of linear logic programming. Computational Logic, 2(2), 1995.
7. D. Miller, G. Nadathur, F. Pfenning, and A. Scedrov. Uniform proofs as a foundation for logic programming. Ann. of Pure and App. Logic, (51):125-157, 1991.
8. R. Milner. Functions as processes. Math. Struc. in Computer Sciences, 2(2):119-141, 1992.
9. R. Milner. Calculi for interaction. Acta Informatica, 33(8):707-737, 1996.
10. R. Milner, J. Parrow, and D. Walker. A calculus of mobile processes, Part I + II. Information and Computation, 100(1):1-77, 1992.
11. O. Nierstrasz, J-G. Schneider, and M. Lumpe. Formalizing composable software systems - A research agenda. In Proceedings 1st IFIP Workshop on Formal Methods for Open Object-based Distributed Systems FMOODS'96, pages 271-282. Chapman and Hall, 1996.
12. B. Pierce and D. Turner. Concurrent objects in a process calculus. In Takayasu Ito and Akinori Yonezawa, editors, Theory and Practice of Parallel Programming (TPPP), Sendai, Japan (Nov. 1994), number 907 in Lecture Notes in Computer Science, pages 187-215. Springer-Verlag, April 1995.
13. A. Pnueli. The temporal semantics of concurrent programs. In G. Kahn, editor, Semantics of Concurrent Computations, volume 70 of LNCS, pages 1-20, Evian, France, July 1979. Springer-Verlag, Berlin, Germany.
14. D. Sangiorgi. An interpretation of typed objects into the typed π-calculus. Technical report, INRIA Technical Report RR-3000, 1996.
15. A. Sernadas, C. Sernadas, and J. Costa. Object specification logic. Journal of Logic and Computation, 5(5):603-630, October 1995.
16. D. Walker. Objects in the π-calculus. Journal of Information and Computation, 116(2):253-271, 1995.
Complexity of Concrete Type-Inference in the Presence of Exceptions*
Ramkrishna Chatterjee¹, Barbara G. Ryder¹, William A. Landi²
¹ Department of Computer Science, Rutgers University, Piscataway, NJ 08855 USA, Fax: 732 445 0537, {ramkrish,ryder}@cs.rutgers.edu
² Siemens Corporate Research Inc, 755 College Rd. East, Princeton, NJ 08540 USA, [email protected]
Abstract. Concrete type-inference for statically typed object-oriented programming languages (e.g., Java, C++) determines at each program point, those objects to which a reference may refer or a pointer may point during execution. A precise compile-time solution for this problem requires a flow-sensitive analysis. Our new complexity results for concrete type-inference distinguish the difficulty of the intraprocedural and interprocedural problem for languages with combinations of single-level types³, exceptions with or without subtyping, and dynamic dispatch. Our results include:
- The first polynomial-time algorithm for concrete type-inference in the presence of exceptions, which handles Java without threads, and C++;
- Proofs that the above algorithm is always safe and provably precise on programs with single-level types, exceptions without subtyping, and without dynamic dispatch;
- Proof that the intraprocedural concrete type-inference problem with single-level types and exceptions with subtyping is PSPACE-complete, while the interprocedural problem without dynamic dispatch is PSPACE-hard.
Other complexity characterizations of concrete type-inference for programs without exceptions are also presented.
1 Introduction
Concrete type-inference (CTI from now on) for statically typed object-oriented programming languages (e.g., Java, C++) determines at each program point, those objects to which a reference may refer or a pointer may point during execution. This information is crucial for static resolution of dynamically dispatched calls, side-effect analysis, testing, program slicing and aggressive compiler optimization.
* The research reported here was supported, in part, by NSF grant GER-9023628 and the Hewlett-Packard Corporation.
³ These are types with data members only of primitive types.
The problem of CTI is both intraprocedurally and interprocedurally flow-sensitive. However, there are approaches with varying degrees of flow-sensitivity for this problem. Although some of these have been used for pointer analysis of C, they can be adapted for CTI of Java without exceptions and threads, or C++ without exceptions. At the one end of the spectrum are intraprocedurally and interprocedurally flow-insensitive approaches [Ste96, SH97, ZRL96, And94], which are the least expensive, but also the most imprecise. At the other end are intraprocedurally and interprocedurally flow-sensitive approaches [LR92, EGH94, WL95, CBC93, MLR+93, Ruf95], which are the most precise, but also the most expensive. Approaches like [PS91, PC94, Age95] are in between these two extremes. An intraprocedurally flow-insensitive algorithm does not distinguish between program points within a method; hence it reports the same solution for all program points within each method. In contrast, an intraprocedurally flow-sensitive algorithm tries to compute different solutions for distinct program points. An interprocedurally flow-sensitive (i.e. context-sensitive) algorithm considers (sometimes approximately) only interprocedurally realizable paths [RHS95, LR91]: paths along which calls and returns are properly matched, while an interprocedurally flow-insensitive (i.e. context-insensitive) algorithm does not make this distinction. For the rest of this paper, we will use the term flow-sensitive to refer to an intra- and interprocedurally flow-sensitive analysis. In this paper, we are interested in a flow-sensitive algorithm for CTI of a robust subset of Java with exceptions, but without threads (this subset is described in Section 2). The complexity of flow-sensitive CTI in the presence of exceptions has not been studied previously. None of the previous flow-sensitive pointer analysis algorithms [LR92, WL95, EGH94, PR96, Ruf95, CBC93, MLR+93] for C/C++ handle exceptions. However, unlike in C++, exceptions are frequently used in Java programs, making this an important problem for Java. The main contributions of this paper are:
- The first polynomial-time algorithm for CTI in the presence of exceptions that handles a robust subset of Java without threads, and C++⁴,
- Proofs that the above algorithm is always safe and provably precise on programs with single-level types, exceptions without subtyping, and without dynamic dispatch; thus this case is in P,
- Proof that intraprocedural CTI for programs with single-level types and exceptions with subtyping is PSPACE-complete, while the interprocedural problem (even) without dynamic dispatch is PSPACE-hard,
- New complexity characterizations of CTI in the absence of exceptions.
These results are summarized in Table 1, which also gives the sections of the paper containing these results. The rest of this paper is organized as follows. First, we present a flow-sensitive algorithm, called the basic algorithm, for CTI in the absence of exceptions, and discuss our results about the complexity of CTI in the absence of exceptions. Next,
4 In this paper, we present our algorithm only for Java.
results                               | section | single-level types | exceptions w/o subtypes | exceptions w/ subtypes | dynamic dispatch
interprocedural CTI: in P             | 4       | X                  | X                       |                        |
intraprocedural CTI: PSPACE-complete  | 4       | X                  |                         | X                      |
interprocedural CTI: PSPACE-hard      | 4       | X                  |                         | X                      |
interprocedural CTI: PSPACE-hard      | 3       |                    |                         |                        | X
interprocedural CTI: in P             | 3       | X                  |                         |                        | X
intraprocedural CTI: in NC            | 3       | X                  |                         |                        |

Table 1. Complexity results for CTI summarized
we extend the basic algorithm for CTI in the presence of exceptions, and discuss the complexity and correctness of the extended algorithm. Finally, we present PSPACE-hardness results about CTI in the presence of exceptions. Due to lack of space, we have omitted all proofs. These proofs and further details about the results in this paper are given in [CRL97]⁵.
2 Basic definitions
Program representation. Our algorithm operates on an interprocedural control flow graph or ICFG [LR91]. An ICFG contains a control flow graph (CFG) for each method in the program. Each statement in a method is represented by a node in the method's CFG. Each call site is represented using a pair of nodes: a call-node and a return-node. Information flows from a call-node to the entry-node of a target method and comes back from the exit-node of the target method to the return-node of the call-node. Due to dynamic dispatch, interprocedural edges are constructed iteratively during data-flow analysis as in [EGH94]. Details of this construction are shown in Figure 3. We will denote the entry-node of main by start-node in the rest of this paper.
Representation of dynamically created objects. All run-time objects (or arrays) created at a program point n are represented symbolically by object_n. No distinction is made between different elements of an array. Thus, if an array is created at n, object_n represents all elements of the array.

Precise solution for CTI. A reference variable is one of the following:
- a static variable (class variable) of reference⁶ type;
- a local variable of reference type;
- Av, where Av is V[t1]...[td], and
  - V is a static/local variable or V is an array object_n allocated at program point n, such that V is either a d-dimensional array of reference type or an array of any type having more than d dimensions, and
  - each ti is a non-negative integer; or
- V.s1...sk, where
  - V is either a static/local variable of reference type or V is Av or V is an object object_n created at program point n,
  - for 1 ≤ i ≤ k, each V.s1...s_{i-1} (V for i = 1) has the type of a reference to a class Ti and each si is a field of reference type of Ti, or si = fi[t_{i1}]...[t_{i r_i}] where fi is a field of Ti and fi is an array having at least ri dimensions and each tij is a non-negative integer, and
  - V.s1...sk is of reference type.
⁶ may refer to an instance of a class or an array.
Using these definitions, the precise solution for CTI can be defined as follows: given a reference variable RV and an object object_n, (RV, object_n) belongs to the precise solution at a program point n if and only if RV is visible at n and there exists an execution path from the start-node of the program to n such that if this path is followed, RV points to object_n at n (i.e., at the top of n). Unfortunately, all paths in a program are not necessarily executable, and determining which are executable is undecidable. Barth [Bar78] defined precise up to symbolic execution to be the precise solution under the assumption that all program paths are executable (i.e., the result of a test is independent of previous tests and all the branches are possible). In the rest of this paper we use precise to mean precise up to symbolic execution.

Points-to. A points-to has the form ⟨var, obj⟩, where var is one of the following: (1) a static variable of reference type, (2) a local variable of reference type, (3) object_m - an array object created at program point m, or (4) object_n.f - field f, of reference type, of an object created at a program point n; and obj is object_s - an object created at a program point s.

Single-level type. A single-level type is one of the following: (1) a primitive type defined in [GJS96] (e.g., int, float etc.), (2) a class that has all non-static data-members of primitive types (e.g., class A { int i,j; }) or (3) an array of a primitive type.

Subtype. We use Java's definition of subtyping: a class A is a subtype of another class B if A extends B, either directly or indirectly through inheritance.

Safe solution. An algorithm is said to compute a safe solution for CTI if and only if at each program point, the solution computed by the algorithm is a superset of the precise solution.
6] S u b s e t o f J a v a c o n s i d e r e d . We essentially consider a subset that excludes threads, but in some cases we may need to exclude three other features: finalize methods, static initializations and dynamically defined classes. Since finalize methods are called (non-deterministically) during garbage collection or unloading of classes, if a finalize method modifies a variable of reference type (extremely rare), it cannot be handled by our algorithm. Static initializations complicate analysis due to dynamic loading of classes. If static initializations can be done in program order, our algorithm can handle them. Otherwise, if they depend upon dynamic loading (extremely rare), our algorithm cannot handle them. Similarly, our algorithm cannot handle classes that are constructed on the fly and not known statically. We will refer to this subset as Java Wo Threads. Also, we have considered only exceptions generated by throw statements. Since run-time exceptions can be generated by almost any statement, we have ignored them. Our algorithm can handle run-time exceptions if the set of statements that can generate these exceptions is given as an input. If all statements that can potentially generate run-time exceptions are considered, we will get a safe solution; however, this may generate far more information than what is useful.
3
CTIin
the
absence
Our basic algorithm for ates on an ICFG and is analysis, but instead of extend this algorithm to
of exceptions
CTI is an iterative worklist algorithm [KU76]. It opersimilar to the Landi-Ryder algorithm [LR92] for alias aliases, it computes points-tos. In Section 4, we will handle exceptions.
L a t t i c e f o r d a t a - f l o w a n a l y s i s . In order to restrict data-flow only to realizable paths, points-tos are computed conditioned on assumed-points-tos (akin to reaching alias in [LIL92] [PR96]), which represent points-tos reaching the entry of a method, and approximate the calling context in which the method has been called (see the example in Appendix A). A points-to along with its assumedpoints-to is called a conditional-points-to. A conditional-points-to has the form (condition, points-to), where condition is an assumed-points-to or empty (meaning this points-to is applicable to all contexts). For simplicity, we will write (empty,points-to I as points-to. Also a special data-flow element reachable is used to check whether a node is reachable from the start-node through a realizable path. This ensures that only such reachable nodes are considered during dataflow analysis and only points-tos generated by them are put on the worklist for propagation. The lattice for data-flow analysis (associated with a program point) is a subset lattice consisting of sets of such conditional-points-tos and the data-flow element reachable. Q u e r y . Using these conditional-points-tos, a query for CTI is answered as follows. Given a reference variable V and a program point l, the conditionalpoints-tos with compatible assumed-points-tos computed at l are combined to
62 determine the possible values of V. Assumed-points-tos are compatible if and only if they do not imply different values for the same user defined variable. For example, if V is p.fl, and the solution computed at I contains (empty, (p, objl)), (z, (objl.fl, obj2)) and (u, (objl.fl, obj3)), then the possible values of Y are obj2 and obj3.
Algorithm description. Figure 1 contains a high-level description of the main loop of the basic algorithm, apply computes the effect of a statement on an incoming conditional-points-to. For example, suppose l labels the statement p.fl = q, ndf_clm (i.e. the points-to reaching the top of l) is (z, (p, object_s)) and (u, (q, object_n)) is present in the solution computed at I so far. Assuming z and u are compatible, apply generates (object_s.fl, object_n) under the condition that both z and u hold at the entry-node of the method containing I. Then either z or u is chosen as the condition for the generated data-flow element. For example, if u is chosen then (u, (object_s.fl, object_n)) will be generated. When a conjunction of conditions is associated with a points-to, any fixed-size subset of these conditions may be stored without affecting safety. At a program point where this data-flow element is used, if all the conjuncts are true then any subset of the conjuncts is also true. This may cause overestimation of solution at program points where only a proper subset of the conjuncts is true. At present, we store only the first member of the list of conditions, apply is defined in Appendix B. add_to_solution_and_worklist_if_nccded checks whether a data-flow element is present in the solution set (computed so far) of a node. If not, it adds the dataflow element to the solution set, and puts the node along with this data-flow element on the worklist. process_exiLnode propagates dat~-flow elements from the exit-node of a method to the return-node of a call site of this method. Suppose (z, u) holds at the exit-node of a method M. Consider a return-node R of a call site C of M. For each assumed-points-to m such that (z, t) is in the solution set at C and t implies z at the entry-node of M, (z, u) is propagated by process_exit_node to R. process_eziLnode is defined in Figure 2. process_caILnode propagates data-flow elements from a call site to the entrynode of a method called from this site. Due to dynamic dispatch, the set of methods invoked from a call site is iteratively computed during the data-flow analysis as in [EGH94]. Suppose (z, t) holds at a call site C which has a method M in its set of invocable methods computed so far. If t implies a points-to z at the entry-node of M (e.g., through an actual to formal binding), (z,z) is forwarded to the entry-node of M. proccss_calLnodc also remembers the association between z and z at C because this is used by proccss_ezit_node as described above, process_call_node is defined in Figures 3 and 4. Other functions used by the above routines are defined in Appendix B. Appendix A contains an example which illustrates the basic algorithm. Precision of the basic algorithm. By induction on the number of iterations needed to compute a data-flow element and the length of a path associated with a data-flow element, in [CRL97], we prove that the basic algorithm computes the
63 worklist. Each worklist node contains a data-flow element, which /•/ isinitialize a conditional-points-to or reachable, and an ICFG node. create a .orklist node containing the entry-node of main and reachable, and add it to the .orklist;
.hile ( .orklist is not emptyte)h{
eorklist ; WLnode = remove a node from ndf_elm = WLnode.data-flow-element; node = WLnode.node; if ( node ~
a call_node and node ~
exit_node of a method ) {
//compute the effect of the statement associated with node on nd/_elm. generated~lata_flo._elements
= apply( node, ndf_elm );
for ( each successor suet of node ) for ( each dr_elm in generated_data_flo,_elements ) add_to_solution_and_worklist_if_needed( dr_elm, succ );
} ~/endoliI if ( node is an exit.node of a method ) process_exit_node( node, ndf_elm ) ; if ( node is a call_node ) process_call_node( node, ndf_elm ) ;
} / / e n d of while
Fig. 1. High-level description of the basic algorithm
precise solution for programs with only single-level types and without dynamic dispatch, exceptions or threads. For programs of this form, C T I is distributive and a conditional-points-to at a program point can never require the simultaneous occurrence of multiple conditional-points-tos at (any of) its predecessors. Intuitively this is why the above proof works. The presence of general types, dynamically dispatched calls or exceptions with subtyping violate this condition and hence C T I is not polynomial-time solvable in the presence of these constructs. We also prove that the basic algorithm computes a safe solution for programs written in Java Wo Threads, but without exceptions. C o m p l e x i t y o f t h e basic a l g o r i t h m . The complexity of the basic algorithm for programs with only single-level types and without dynamic dispatch, exceptions or threads is O(nS), where n is approximately the number of statements in the input program. This an improvement over the O(n 7) worst-case bound achievable by applying previous approaches of [RHS95] and [LR91] to this case. Note that O(n 3) is a trivial worst-case lower bound for obtaining a precise solution for this case. For programs written in Java Wo Threads, but without exceptions, the basic algorithm is polynomial-time. O t h e r r e s u l t s on t h e c o m p l e x i t y o f C T I in the absence o f exceptions. In [C1~L97], we prove the following two theorems: T h e o r e m 1 Intraprocedural C T I for programs with only single-level types is in non-deterministic log-space and hence NC.
64 void process_exit_node( exit_node, ndf_olm ) { / / L e t M be the method containing the exit_node. if ( ndf_elm represents the value of a local variable )
/ / i t need not be forwarded to the successors (return-nodes of call sites for this method) because the local variable is not visible outside this method. return;
if ( ndf_elm is reachable ) { for ( each call site C in the current sot of call sites of M ){ if ( solution at C contains reachable ) { add_to_solution_and_.orklist_if_needed( ndf_elm, R ) ;
//R
is the return-node for C.
for ( each s in C.waiting_local_points_to_table
) {
/~/ conditional-points-tos representing values of local variables reaching //~/ C are not forwarded to R until it is found reachable. C.waiting_local_points_to_table contains such conditional-points-tos. f / S i n c e R has been found to be reachable
delete s from C.waiting_local_poinst_to_tablo; add_to_solution_and_.orklist_if_needed( s, R ) ;
} }
}
r)eturn;
add_to_table_of_condi$ions(
ndf_olm, exit_node ) ;
//This table is accessed from the call sites of M for expanding assumed-points-tos. for ( each call site C in the current set of call sites of M ) { Sor get_assumed_points_tos( C, ndf_olm.assumed-points_to, M ) ; ( each assumed_points_to Apt in S ) { CPT = new conditional-points-to( Apt, half_elm.points_to ); add_to_~olu$ion_and_worklist_if_neoded( CPT, It );
}1
/ / R is the return-node for C.
} / / e n d of process_exit_node
Fig. 2. Code for processing an exit-node Recall that non-deterministic log-space is the set of languages accepted by nondeterministic Turing machines using logarithmic space[Pap94] and NC is the class of efficiently parallelizable problems which contains non-deterministic logspace. Theorem 2 C T I for programs with only single-level types and dynamic dispatch is P S P A C E hard.
4
Algorithm for CTIin the presence of exceptions
In this section we extend the basic algorithm for C T I o f Java Wo Threads, and discuss the complexity and precision of this extended algorithm. Data-flow Elements: The data-flow elements propagated by this extended algorithm have one of the following forms:
65
void
//R
process_caILnode( C, ndf_elm ){ is the return-node for caILnode C.
if ( ndf_elm implies an increase in the set CM of methods invoked from this site ) {
//Recall that due to dynamic dispatch, the interprocedural //edges are constructed on the fly, as in [EGH94]. add t h i s new method ~
}
to CM;
for ( each dfelm in the solution set of C ) interprocedurally_propagate( dfelm, C, nM) ; //defined in Figure
if ( ndf_elm represents value of a local variable ) if ( solution set for R contains reachable )
/~/ Forward ndf_elm to the return-node because (unlike C++) a local variable cannot be modified by a call in Java. add_to_solution_and_worklist_if_needed( ndf_elm, R );
else
/ / Cannot forward till R is found to be reachable.
}
add ndf_elm to waiting_local_points_to_table;
for ( each method M in CM ) interprocedurally~ropagate(
ndf_elm, C, M );
Fig. 3. Code for processing a call-node
1. 2. 3. 4. 5. 6. 7.
(reachable), (label, reachable), (ezcp-type, reachable), (z, u>, (label, z, u), (ezcp-typc, z, u), (ezcp, z, obj).
Here z and u are points-tos. The lattice for data-flow analysis associated with a p r o g r a m point is a subset lattice consisting of sets of these data-flow elements. In the rest of this section, we present definitions of these data-flow elements and a brief description of how they are propagated. Further details are given in [CKL97]. First we describe how a throw statement is handled. Next, we describe propagation at a method exit-node. Finally, we describe how a finally statement is handled.
throw statement: In addition to the conditional-points-tos described previously, this algorithm uses another kind of conditional-points-tos, called ezceptionalconditional-points-tos, which capture propagation due to exceptions. The conditional part of these points-tos consists of an exception type and an assumed points-to (as before). Consider a throw statement l in a method Proc, which throws an object of type T (run-time type and not the declared type). Moreover let ((q, objl), (p, objil) be a conditional-points-to reaching the top of l. At the throw statement, this points-to is transformed to (T, (q, objl), (p, obj2)> and
66
interprocedurally_propagate( ndf_elm, C, M) { / / C ks a call_node, R is the return-node o] C and M is a method called from C. i f ( ndf_elm == reachable ) { add_t o_solut ion_and_~orklist_if_needed (ndf_elm, M. entrymode) ; if ( M.exit_node has reachable ) { add_to_solution_and_worklist_if_needed(ndf_elm, R) ; for ( each s in C.waiting_local_points_to_table ) {
/ / S i n c e R has been found to be reachable
d e l e t e s from C.waiting_loea1_poinst_to_table;
} }
}
add_to_solution_andJorklist_if_needed(
s, R ) ;
propagat e_conditional_points_t os_with_empty_condit ion(C ,M) ; return;
/ / g e t the points-tos implied by ndf_elm at the entry-node o/ M S = get_implied_conditional_points_tos
(ndf_elm,M, C) ;
for ( each s in S ) { add_to_solution_andJorklist_if_needed( s, M.engry_node ) ; add_t o_t able_of_assumed_point s_t os ( $. assumed_points_t o, ndf_elm.assumed_points_to, C ) ;
//S This table is accessed ~rom exit-nodes o/methods called from C for expanding assumed points-tos. i f ( ndf_elem.apt is a new apt f o r s . a p t ) { / / a p t stands ]:or assumed-points-to Pts = lookup_table_of_conditions(
s.assumed_points_to,
M. exit_node ) ;
//nd/_elm.assumed_points_to is an assumed-points-to for each element o/ Pts
}
}
for ( each pts in Pts ) { opt = n o v conditional_points_to( ndf_elm.assumed_points_to, add_to_solution_and_,orklist_ifaxeeded( cpt, R ) ;
pts ) ;
} }//end o/each s in S
Fig. 4. Code for interproeedurally_propagate
propagated to the exit-node of the corresponding try statement, if there is one. A precalculated catch-table at this node is checked to see if this exception (identified by its type T) can be caught by any of the corresponding catch statements. If so, this exceptional-conditional-points-to is forwarded to the entry-node of this catch statement, where it is changed back into an ordinary conditional-points-to ((q, obj 1), (p, obj2)). If not, this exceptional-conditional-points-to is forwarded to the entry-node of a finally statement (if any), or the exit-node of the innermost enclosing try, catch, finally or the method body. A throw statement also generates a data-flow element for the exception itself. Suppose the thrown object is obj and it is the thrown object under the assumed points-to (p, objl). Then (ezcp, (p, objl), obj) representing the exception is generated. Such data-flow elements are handled like exceptional-conditional-pointstos, described above. If such a data-flow element reaches the entry of a catch statement, it is used to instantiate the parameter of the catch statement.
67 In addition to propagating reachable (defined in section 3), this algorithm also propagates data-flow elements of the form (ezcp-type, reachable). When (reachable) reaches a ~hrow statement, it is transformed into (ezcp-type, reachable), where emcp-type is a run-time type of the exception thrown, which is then propagated like other exceptional-conditional-points-tos. If the throw is not directly contained in a try statement, then the dataflow elements generated by it are propagated to the exit-node of the innermost enclosing catch, finally or method body.
ezit-node of a method: At the exit-node of a method, a data-flow element of type 4,6 or 7 is forwarded (after replacing the assumed points-to as described in section 3) to the return-node of a call site of this method if and only if the assumed points-to of the data-flow element holds at the call site. At a returnnode, ordinary conditional-points-tos (type 4) are handled as before. However, a data-flow element of type 6 or 7 is handled as if it were generated by a throw at this return-node. finally statement: The semantics of exception handling in J a v a is more complicated than other languages like C ++ because of the finally statement. A try statement can optionally have a finally statement associated with it. It is executed no m a t t e r how the try statement terminates: normally or due to an exception. A finally statement is always entered with a reason, which could be an exception thrown in the corresponding try statement or one of the corresponding catch statements, or leaving the try statement or one of its catch clauses by a return, (labelled) break or (labelled) continue, or by falling through. This reason is rem e m b e r e d on entering a finally, and unless the finally statement itself creates its own reason to exit the finally, at the exit-node of the finally this reason is used to decide control flow. If the finally itself creates its own reason to exit itself (e.g., due to an exception), then this new reason o v e r r i d e s any previous reason for entering the finally. Also, nested finally statements cause reasons for entering t h e m to stack up. In order to correctly handle this involved semantics, for all data-flow elements entering a finally, the algorithm remembers the reason for entering it. For data-flow elements of type 3, 6 or 7 (enumerated above), the associated exception already represents this reason. A label is associated with data-flow elements of type 1 or 4, which represents the statement number to which control should go after exit from the finally. Thus the data-flow elements in a finally have one of the following forms:
1. 2. 3. 4. 5.
(label, reachable), ( ezcp-type, reachable), (label, z, u) , (ezcp-type, z, u), (ezcp, z, ob3).
When a labelled data-flow element reaches the labelled statement, the label is dropped and it is transformed into the corresponding unlabelled data-flow element.
68 Inside a finally, due to labels and exception types associated with data-flow elements, apply uses a different criterion for combining data-flow elements (at an assignment node) than the one given in section 3. Two data-flow elements (xl,yl,zl) and (x2,y2,zZ) can be combined if and only if both xl and xZ represent the same exception type or the same label, and y1 and y2 are compatible (as defined in section 3). At a call statement (inside a finally), if a data-flow element has a label or an exception type associated with it, it is treated as part of the context (assumed points-to) and not forwarded to the target node. It is put back when assumed points-tos are expanded at an exit-node of a method. For exceptionalconditional-points-tos or data-flow elements representing exceptions, the exceptions associated with them at the exit-node o v e r r i d e any label or exception type associated with their assumed points-tos at a corresponding call site. Data-flow elements of the form (label,reachable) or (excp-type, reachable) are propagated across a call if and only if (reachable) reaches the exit-node of one of the called methods. A mechanism similar to the one used for handling a call is used for handling a try statement nested inside a finally because it can cause labels and exceptions to stack up. Details of this are given in [CRL97]. If the finally generates a reason of its own for exiting itself, the previous label/exception-type associated with a data-flow element is discarded, and the new label/exception-type representing this reason for leaving the finally is associated with the data-flow element. E x a m p l e . The example in Figure 5 illustrates the above algorithm.
Precision o f t h e extended algorithm. In [CRL97], we prove that the extended algorithm described in Section 4 computes the precise solution for programs with only single-level types, exceptions without subtyping, and without dynamic dispatch. We also prove that this algorithm computes a safe solution for programs written in Java WoThreads.
Complexity o f the extended a l g o r i t h m . The the worst-case complexity of the extended algorithm for programs with only single-level types, exceptions without subtyping, and without dynamic dispatch is O(nT). Since we have proved that the algorithm is precise for this case, this shows that this case is in P. If we disallow trys nested inside a finally, the worst-case complexity is O(n6). For general programs written in Java WoThreads, the extended algorithm is polynomialtime. C o m p l e x i t y d u e t o e x c e p t i o n s w i t h s u b t y p i n g . In [CRL97], we prove the following theorem:
Theorem 3 Intraprocedural CTI for programs with only single-level types and
exceptions with subtyping is PSPACE-complete; while the interprocedural case (even) without dynamic dispatch is PSPACE-hard.
69
/ / N o t e : for simplicity only a part of the solution is shown class A {}; class .xcp_t extends Exception class base {
{};
public static A a; public static void method( A parma ) throws excp_t { sxcp_t unexp; a = parma; ii: u n e x p = n e w excp_t;
/r
empty, ( unexp, object_ll ) >,
( (p~m, obje~t_12 >, (~, objectS2> > ow unexp;
};}
/~/ (excp, empty, object_ll ), (excp_t, (param, object_f2 >, (a, object_12 > >
class test {
public static void test_method( ) { A local ; 12: local = new A; try { base.method(
}
local );
//<excp, empty, object_Zl >, (excp_t, empty, (a, object_Z2 > )
c a t c h ( excp_t parma ) {
~3/:(empty,
} finally {
// (l~,empty, (a, objectS2> )
<empty, >
};} Fig. 5. CTI in the presence of exceptions Theorems 2 and 3 show that in the presence of exceptions, among all the reasonable special cases that we have considered, programs with only single-level types, exceptions without subtyping, and without dynamic dispatch comprise the only natural special case that is in P. Note that just adding subtyping for exception types and allowing overloaded catch clauses increase complexity from P to PSPACE-hard.
5
Related work
As mentioned in the introduction, no previous algorithm for pointer analysis or C T I handles exceptions. This work takes state-of-the-art in pointer analysis one step further by handling exceptions. Our algorithm differs from other pointer analysis and C T I algorithms [EGH94, WL95, l~uf95, PC94, PS91, CBC93,
70 MLR+93] in the way it maintains context-sensitivity by associating assumedpoints-tos with each data-flow element, rather than using some approximation of the call stack. This way of handling context-sensitivity enables us to obtain precise solution for polynomial-time solvable cases, and handle exceptions. This way of maintaining context is similar to Landi-Ryder's[LR92] method of storing context using reaching aliases, except that our algorithm uses points-tos rather than aliases. Our algorithm also differs from approaches like [PS91, Age95] in being intraprocedurally flow-sensitive.
6
Conclusion
In this paper, we have studied the complexity CTI for a subset of Java, which includes exceptions. To the best of our knowledge, the complexity of CTIin the presence of exceptions has not been studied before. The following are the main contributions of this work (proofs are not presented in this paper, but appear in [CRL97]): 1. The first polynomial-time algorithm for CTI in the presence of exceptions which handles a robust subset of Java without threads, and C ++. 2. A proof that CTIfor programs with only single-level types, exceptions without subtyping, and without dynamic dispatch is in P and can be solved in O(n 7) time. 3. A proof that intraprocedural CTIfor programs with only single-level types, exceptions with subtyping, and without dynamic dispatch is P S P A C E complete, and the interprocedural case is PSPACE-hard. Additional contributions are: 1. A proof that CTI for programs with only single-level types, dynamic dispatch, and without exceptions is PSPACE-hard. 2. A proof that CTI for programs with only single-level types can be done in O(n 5) time. This is an improvement over the O(n 7) worst-case bound achievable by applying previous approaches of [RHS95] and [LlZ91] to this case.
3. A proof that intraprocedural CTI for programs with only single-level types is in non-deterministic log-space and hence NC.
References [Age95] [And94]
Ole Agesen. The cartesian product algorithm: Simple and precise type inference of parametric polymorphlsm. In Proceedings of European Conference on Object-oriented Programming (ECOOP '95), 1995. L. O. Andersen. Program Analysis and Specialization for ~he C Programming Language. PhD thesis, DIKU, University of Copenhagen, 1994. Also available as DIKU report 94/19.
71 J. M. Barth. A practical interprocedural data flow analysis algorithm. Communications of the ACM, 21(9):724-736, 1978. [CBC93] Jong-Deok Choi, Michael Burke, and Paul Carini. Efficient flow-sensitive interprocedural computation of pointer-induced aliases and side effects. In Proceedings of the ACM SIGPLAN/SIGACT Symposium on Principles of Programming Languages, pages 232-245, January 1993. [CRL97] Ramkrishna Chatterjee, Barbara Ryder, and William Landi. Complexity of concrete type-inference in the presence of exceptions. Technical Report DCS-TR-341, Dept of CS, Rutgers University, September 1997. [EGH94] Maryam Emami, Rakesh Ghiya, and Laurie J. Hendren. Context-sensitive interprocedural points-to analysis in the presence of function pointers. In Proceedings of the ACM SIGPLAN Conference on Programming language design and implementation, pages 242-256, 1994. [GJS96] James Gosling, Bill Joy, and Guy Steele. The Java Language Specification. Addison-Wesley, 1996. [KU76] J.B. Kam and J.D. Ullman. Global data flow analysis and iterative algorithms. Journal of ACM, 23(1):158-171, 1976. [LR91] W.A. Landi and Barbara G. Ryder. Pointer-induced allasing: A problem classification. In Proceedings of the A CM SIGPLAN/SIGA CT Symposium on Principles of Programming Languages, pages 93-103, January 1991. W.A. Landi and Barbara G. Ryder. A safe approximation algorithm for [LR92] interprocedural pointer allasing. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 235-248, June 1992. [MLR+93] T . J . Marlowe, W . A . Landi, B . G . Ryder, J. Choi, M. Burke, and P. Carini. Pointer-induced aliasing: A clarification. ACM SIGPLAN Notices, 28(9):67-70, September 1993. C. H. Papadimitriou. Computational Complexity. Addison-Wesley, 1994. [Pap94] J. Plevyak and A. Chien. Precise concrete type inference for object ori[PC94] ented languages. In Proceeding of Conference on Object-Oriented Programming Systems, Languages and Applications (OOPSLA '95}~pages 324-340, October 1994. [P1~96] Hemant Pande and Barbara G. Ryder. Data-flow-based virtual function resolution. In LNCS 1155, Proceedings of the Third International Symposium on Static Analysis, 1996. J. Palsberg and M. Schwartzbach. Object-oriented type inference. In [PSgl] Proceedings of Conference on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA '91), pages 146-161, October 1991. [RHS95] T. Reps, S. Horwitz, and M. Sagiv. Precise interprocedural dataflow analysis via graph teachability. In Proceedings of the ACM StGPLAN/SIGACT Symposium on Principles of Programming Languages, pages 49-61, 1995. [Ruf95] E. Ruf. Context-insensitive alias analysis reconsidered. In Proceedings of the ACM SIGPLAN Conference on Programming language design and implementation, pages 13-22, June 1995. [SI-I97] M. Shapiro and S. Horwitz. Fast and accurate flow-insensitive points-to analysis. In Proceedings of the A CM SIGPLAN/SIGA CT Symposium on Principles of Programming Languages, pages 1-t4, 1997. [Ste96] Bjarne Steensgaard. Points-to analysis in almost linear time. In Proceedings of the A CM SIGPLAN//SIGA CT Symposium on Principles of Programming Languages, pages 32-41, 1996. [Bar78]
72 [WL95]
[ZRL96]
A
Robert P. Wilson and Monica S. Lam. Efficient context-sensitive pointer analysis for c programs. In Proceedings of the A CM SIGPLAN Conference on Programming language design and implementation, pages 1-12, 1995. S. Zhang, B. G. Ryder, and W. Land<. Program decomposition for pointer aliasing: A step towards practical analyses. In Proceedings of the ~th Symposium on the Foundations of S o , ware Engineering, October 1996.
Example
for t h e basic a l g o r i t h m
/ / N o t e : due to lack of space only a part o~ the solution is shown class A {};
class B { public B fieldl;
};
class base { public static A a; public void method( ){ ii: a = n e w A;
class c { public B fieldl;
};
class derived extends base ( public void method( ) { //overrides base::method
12:
//<empty, >
a = new
A;
//<empty,
exit_~ode:
exitmods :
class c a l l e r { public static void call( base Param ) i{I
//((param, objecLl4 ), (param, object_14
//<<param, object J5 >, (param, objectS5 13:
/ / / / / /
};/
:
,aram.method() ;
param, object_14 ) = > base::method is called. param, object_15 ) => derived::method is called. ~.mpty, (a, object_ll > ) is changed to <param, objectd4 >, (a, object_ll }} because base"method.. ~s called only when <param, object_14). (empty, (a, object_12 ) > is changedto <(param, objectd5 >, because derived::method is is called only when <param, object_15 >.
}; // end of class caller class potpourri { public static void example( C param ) {
/ / L e t S = { ( (param, object_16 >, (param, object_f6 > >,
//((param, objectS7>, <param, objectS7> ), //((object~6.1~eldl, null ), >, //<(object~7.1~eld~, null ), >} //solution at this point is S local = param;
///Let $1 = S U {<(param, object_16 ), >'}
solution at this point is $1
18: l o c a l . f i e l d l =
new
B;
////solution =
///
s~ u { <<param, object_IS >,
<<param, objectJ7 I, ) }
objectJS > >>;
73 exit_node:
};};
class test { public void test1( ) { base p; 14: p = ne. base;
};
public void test2( ) { derived p; 15: p = he. derived;
//(empty, (p, o ~ e c t ~ ) ) 19: caller.call(p);
110: caller.call(p);
//(empty, (p, o~ect~4 ) ) . / / A t 19 (p, o~ectJ4 ) / / = > (empty, (a, o~ect~1 ) ). exit.node:
~ (empty, (p, object_IS) ) .
public void test3( ) { C q; 16: q = ne. C; potpouri.example( q );
};
};
B
At 110 (p, object_15 ) = >
(empty, (a, objectS2) ).
exit_node:
public void test4( ) {
};p o t p o u r i . e x a m p l e (
q );
Auxiliary functions
/~/ CPT stands for a conditional-points-to. DFE stands for a data-flow-element which could be a CPT or 'reachable'. set of data-flo.-elements
apply( node, rDFE ) {
/ / t h i s ]unction computes the effect of the statement (if any) associated / / w i t h node on rDFE, i.e. the resulting data-flow-elements at the bottom
//I oI node.
set of data-flo.-elements retVal = empty set; if ( node is a not an assignment node ) { add rDFE to retVal; return retVal ;
}
if
( rDFE == reachable ) { add rDFE t o r e t V a l ; if (node unconditionally generates a conditional-points-to)
{
/~/ e.g. l: p = new A; unconditionally generates <empty, (p, object_Z) ). add this conditional-points-to
to retVal;
else { rDFE
is a conditional-points-to
_set = combine compatible CPTs in the solution set computed so far (including rDFE) to generate the set of locations represented by the left-hana-side.
/ / N o t e : each element in lhs_set has a set of assumed / / p o i n t s - t o s associated with it' similarly compute rhs_sot for the right-hand-side. retVal = combine compatible elements from lhs_set and rhs-set.
/~// only one of the assumed points-tos associated with a resulting points-to is chosen as its assumed points-to. if ( rDFE is not killed by this node ) add rDFE to retVal;
74 return retVal ;
} //end
of apply
v o i d add_to_table_of_conditions( rCPT, e x i t - n o d e
) {
/~/ each exit-node has a table condTable associated with it, which stores for each assumed points-to, the points-tos which hold at the exit-node with this assumed points-to. This function stores r C P T in this table. } v o i d add_to_table_of_assumed_points_tos(s,
condition,
C) {
/~/~/ C is a call-node, condition is an assumed-points-to and s is a points-to passed to the entry-node of a method invoked from C. each call-node has a table asPtTtable associated with it, which stores for each points-to that is passed to the entry-node of a method invoked from this site, the assumed-points-tos which imply this point-to at the call site. This function stores condition with s in this table. }
l
s e t of p o i n t s - t o s get_assumed_points_tos( C, s , M ) { i~ ( s i s empty ) { if ( the solution set at C does not contain reachable ) return empty set ; else if C is a dynamically-dispatched-call site) return the set of assume~-points-tos for the values of receiver, which result in a call to method M; else return a set containing empty;
I
} } else { return the assumed-points-tos stored with s in C.asPtTtable;
}
}
//asPtTtable is defined in add_to_table_of_assumed_points_tos. lookup_table_of_conditions( c o n d i t i o n , e x i t - n o d e ) { returns the points-tos stored with condition in exit-node.condTable, which is defined in add_to_table_of_conditions.
s e t of p o i n t s - t o s
l/It
}
set of conditional-points-tos
get_implied_conditional_points_tos(rCPT,
l/l/It returns conditional-points.tos implied by rCPT.points-to at the entry-node of method M at call-node C. This means it also performs actual to formal binding. Note that it may return //an empty set. } void propagate_conditionaLpoints_tos_with_empty_condition( C, M) { if ( C is not a dynamicallyxlispatched_r site )
s -- { empty
};
else S = set of assumed-points-tos for the values of receiver, which result in a call to method M; Pts = lookup_table_of_conditions( empty, M.exit_node ) ; for ( each s in S ) { for ( each pts in Pts ) { cpt = new conditional_points_to( s, pts ); add_to_solution_and_worklist_if_needed( cpt, R ) ;
}
}
//R
is the return-node of C
M, C ) {
Synchronisation Analysis to Stop Tupling* Wei-Ngan Chin 1, Siau-Cheng Khoo 1, and Tat-Wee Lee 2 1 National University of Singapore 2 Singapore Police Academy
A b s t r a c t . Tupling transformation strategy can be used to merge loops together by combining recursive calls and also to eliminate redundant calls for a class of programs. In the latter case, this transformation can produce super-linear speedup. Existing works in deriving a safe and automatic tupling only apply to a very limited class of programs. In this paper, we present a novel parameter analysis, called synchronisation analysis, to solve the termination problem for tupling. With it, we c a n perform tupling on functions with multiple recursion and accumulative arguments without the risk of non-termination. This significantly widens the scope for tupling, and potentially enhances its usefulness. The analysis is shown to be of polynomial complexity; this makes tupling suitable as a compiler optimisation.
1
Introduction
Source-to-source transformation can achieve global optimisation through specialisation for recursive functions. Two well-known techniques are partial evaluation [9] and deforestation [20]. Both techniques have been extensively investigated [18, 8] to discover automatic algorithms and supporting analyses that can ensure correct and terminating program optimisations. Tupling is a lesser known but equally powerful transformation technique. The basic technique works by grouping calls with c o m m o n arguments together, so that their multiple results can be computed simultaneously. When successfully applied, redundant calls can be eliminated, and multiple traversals of data structures combined. As an example, consider the Tower of Hanoi function.
hanoi(O,a,b,c) = ~; hanoi(l q-n,a,b,c) = hanoi(n,a,e,b)-H-f(a,b)l-H-hanoi(n,c,b,a); Note that q-q- denotes list catenation. The call hanoi(n,a,b,c) returns a list of moves to transfer n discs from pole a to b, using c as a spare pole. The first parameter is a recursion parameter which strictly decreases, while the other three parameters are permuting parameters which are bounded in values. (A formal classification of parameters will be given later in Sec 2.) This definition contains redundant calls, which can be eliminated. By gathering each set of overlapping calls, which share c o m m o n recursion arguments into a tupled function, the tupling method introduces: * This work was done while Lee was a recipient of a NUS Graduate Scholarship.
76
ht2(n,a,c,b) = (hanoi(n,a,c,b), hanoi(n,c,b,a) ; ht3(n,a,b,c) = (hanoi(n,a,b,c), hanoi(n,b,c,a), hanoi(n,c,a,b)) ; and transforms hanoi to the following : hanoi(l+n,a,b,c) ht2(O,a,c,b) ht2(l+n,a,c,b)
= let (u,v)=ht2(n,a,e,b) in u~+[(a,b)]+~v, = (a,U); = let (u,v,w)=ht3(n,a,b,c) in (~+~[(a,c)]+~v,w~+[(c,b)]~+.) ;
ht3(O,a,b,c) ht3(l+n,a,b,c)
= (~,~,[~); = let (u,v,w)=ht3(n,a,c,b) in (u-H-[(a,b)]-H-v,w-H-[(b,c)]-t+u, v-Hr[(c,a)]+bw); Despite a significant loss in modularity and clarity, the resulting tupled function is desirable as their better performance can be mission critical. Sadly, manual and error-prone coding of such tupled functions are frequently practised by functional programmers. Though the benefits of tupling are clear, its wider adoption is presently hampered by the difficulties of ensuring that its transformation always terminates. This problem is crucial since it is possible for tupling to meet infinitely many different tuples of calls, which can cause infinite number of tuple functions to be introduced. Consider the knapsack definition below, where W(i) and V(i) return the weight and value of some i-th item. knap(O,w)
= o ;
~av(l+n,w)
= ~ w < W ( l + n ) then knav(n,w)
else max(knap (n, w),knap (n, w- W ( l + n ) ) + V(1 +n) ; Redundant calls exist but tupling fails to stop when it is performed on the above function. Specifically, tupling encounters the following growing tuples of calls. 1. (knap(n,w),knap(n,w-W(l+n))) 2. (knap(nl , w),knap(nl, w- W( l +nl )),knap(nl,w- W(2 +nl )), k~ap(n, , w- W(2+nl )- W(~ +n~ ))) 3. (k~aV(n2 , w),knap(n2, w- W (1+n~ ) ),knav(n~, ~- W ( 2 +n~ )- W ( l + ~ ) ), k~ap(.~ , w- W(2 +n~ ) ),k~ap(~ , . - W(3+n2 ) ),knap(~ , w- W(3+n~ )- W( l +n~ ) ),
knav(n2 , w- W(34-n2 )- W(2 +n2 )),knap(n2 , w- W(34-n2 )- W(Z +n2 )- W(1 +n2 ))) Why did tupling failed to stop in this case? It was because the calls of knap overlap, but their recursion parameter n did not synchronise with an accumulative parameter w. To avoid the need for parameter synchronisation, previous proposals in [2, 7] restrict tupling to only functions with a single recursion parameter, and without any accumulative parameters. However, this blanket restriction also rules out many useful functions with multiple reeursion a n d / o r accumulative parameters that could be tupled. Consider: repl(Leaf(n),xs) = Leaf(head(xs)); revl( Node(1,r),xs) = Node(revl(1,xs),revl(r, sdrov(1,xs) ) ) ; sdrop(Leaf(n),xs) ~ail(xs); sdrop(Node(l,r),xs) = sdrop(r, sch'op(l, xs)); Functions repl and sdrop are used to replace the contents of a tree by the items from another list, without any changes to the shape of the tree. Redundant sdrop calls exist, causing repl to have a time complexity of O(n*) where n is the size of =
77
the input tree. The two functions each has the first parameter being recursion and the second being accumulative. For the calls which overlap, the two parameters synchronise with each other (see Sac. 6 later). Hence, we can gather repl(l, xs) and sdrop(l, xs) to form the following function: ~st~pO, xs)
= (~eplO,~),~d~op(~,xs))
Applying tupling to rstup yields the following O(n) definition: rstup(Leaf(n),xs) = (Leaf(head(xs)),tail(xs)) ; rstup(node(1,r),xs) = let {(u,v)=rstup(1,xs); (a,b)=rstup(r,v)} in (Node(u,a),b) ; This paper proposes a novel parameter analysis, called s y n c h r o n i s a t i o n analysis, to solve the termination problem for tupling. With it, we can considerably widen the scope of tupling by selectively handling functions with multiple recursion (and/or accumulative) parameters. If our analysis shows that the multiple parameters synchronises for a given function, we guarantee that tupling will stop when it is applied to the function. However, if our analysis shows possible non-synchronisation, we must skip tupling. (This failure may be used to suggest more advanced but expensive techniques, such as vector-based [3] or list-based [14] memoisations. These other techniques are complimentary to tupling since they may be more widely applicable but yield more expensive codes.) Lastly, we provide a costing for our analysis, and show that it can be practically implemented. In See. 2, we lay the foundation for the discussion of tupling transformation and synchronisation analysis. See. 3 gives an overview of the tupling algorithm and the two obstacles towards terminating transformation. See. 4 provides a formal treatment of segments, which are used to determine synchronisation. In Sac. 5, we formulate prevention of indefinite unfolding via investigation of the behaviour of segment concatenation. In See. 6, we formally introduce synchronisation analysis, and state the conditions that ensure termination of tupling transformation. Sac. 7 describes related work, before a short conclusion in Sac. 8. Due to space constraint, we refer the reader to [4] for more in-depth discussion of related issues. 2
Language
and
Notations
We consider a strict first-order functional language. D e f i n i t i o n 1 (A S i m p l e L a n g u a g e ) . A program in our simple language consists of sets of mutual-recursive functions: (Program) P ::= [M,],"=0 (Set of Mut. Rec. Fns) M ::= [F,]~=0 (Set of Equations) F ::= {/(p,1 . . . . . P,n) = t,},~=o (Expression) t ::= v I C(tl . . . . , t , ) I f ( t l . . . . . t,) l if tl then t~ else t~ I Jet {p, = t,},"=0 in t (Pattern) p ::= v I C ( p l , . . . , p , )
78
We allow use of infix binary d a t a construction in this paper. Moreover, we shall construct any Peano integer as a d a t a constructed from 0 and l+n. Applying the constructor, (1+), k times to a variable m will be abbreviated as k+m. For clarity, we adopt the following multi-holes context notation : D e f i n i t i o n 2 ( M u l t i - h o l e s Context Notation). The RHS of an equation of a function ?can be expressed as Er[t~,... ,t,] where tl, ..., t~ are sub-expressions occurring in the RHS. [] Safety of tupling transformation relies on the ability to determine systematic change in the arguments of successive function calls. Such systematic change can be described with appropriate operators, as defined below: Definition 3 (Argument
Operators).
1. Each d a t a constructor C of arity n in the language is associated with n descend operators, which are d a t a destructors. Notation-wise, let C(tl,...,tn) be a d a t a structure, then any of its corresponding d a t a destructors is denoted by C - ' , and defined as C-' C(tl,...,tn) = t,. 2. For each constant (i.e. variable-free constructor subterm), denoted by c in our program, a constant operator c_always return that constant upon application. 3. An identity, id, is the unit under function composition. 4. For any n-tuple argument (al .... ,an), the i *h selector, ~,, is defined as fi, (al,..., an) :
a, .
5. An accumulative operator is any operator that is not defined above. For instance, a d a t a constructor, such as C described in item 1 above, is an accumulative operator; and so is the tuple constructor (opl .... ,opt). [] Composition of operators is defined by (f o g) x = f (g x). A composition of argument 6perators forms an operation path, denoted by op. It describes how an argument is changed from a caller to a callee through call unfolding. This can be determined by examining the relationship between the parameters and the call arguments appearing in the RHS of the equation. For instance, consider the equation g(x) = Eg[g(C(x,2))]. C(x,2) in the RHS can be constructed from p a r a m e t e r x via the operation p a t h : C o (id, 2). Changes in an n-tuple argument can be described by an n-tuple of operation paths, called a segment, and denoted by (opl,...,op,). For convenience, we overload the notion id to represent an identity operation as well as a segment containing tuple of identity operation paths. Segments can be used in a function graph to show how the function arguments are transformed: D e f i n i t i o n 4 ( L a b e l l e d C a l l G r a p h ) . The labelled call graph of a set of mutualrecursive functions F, denoted a s ( N F , E F ) , is a graph whereby each function name from F is a node in NF; and each caller-callee transition is represented by an arrow in EF, labelled with the segment information. []
79
((1+) - I
o
~i, 1~2, 94, f]3)
\/ I ano'l /\
((i+) - I 0 ~I, ~4, ~3, ~2)
Fig. 1. Labelled Call Graph of Hanoi defined in Sec. 1.
We use segments to characterise function parameters. The characterisation stems from the way the parameters are changed across labelled call graph of mutually recursive functions. Definition 5 (Characterising ParAmeters/Arguments). of the form f(pl ..... p,) = t,
Given an equation
1. A group of f s parameters are said to be bounded parameters if their corresponding arguments in each recursive call in t are derived via either constants, identity, or application of selectors to this group of parameters. 2. The i th p a r a m e t e r of f is said to be a recursion parameter if it is not bounded and the i th argument of each recursive call in t is derived by applying a series of either descend operators or identity to the i *h parameter. 3. Otherwise, the Fs p a r a m e t e r is said to be an accumulative parameter. [] Correspondingly, an argument to a function call is called a recursion/accumulative argument if it is located at the position of a recursion/accumulative parameter. We can partition a segment according to the kinds of parameters it has: D e f i n i t i o n 6 ( P r o j e c t i o n s o f S e g m e n t s ) . Given a segment, s, characterising the parameters of an equation, we write ~rR(s)/TrA(S)/rrB(s) to denote the (sub-)tuple of s, including only those operation paths which characterise the recursion/accumulative/bounded parameters. The sub-tuple preserves the original ordering of the operation paths in the segment. Furthermore, we write ~-K(s) to denote the sub-tuple of s excluding ~rB(s). [] As examples, the first parameter of function hanoi is a recursion parameter, whereas its other parameters are bounded. In the case of functions repl and sdrop (defined in Sec. 1 too), both their first parameters are recursion parameters, and their second parameters are accumulative. Our analysis of segments requires us to place certain restrictions on the parameters and its relationship with the arguments. This is described as follows:
80
Definition 7 ( R e s t r i c t i o n s o n P a r a m e t e r s ) . 1. Each set of mutual-recursive functions (incl. recursive auxiliary functions), M, has the same number of recursion/accumulative parameters but can have an arbitrary number of bounded parameters. 2. Given an equation from M, the i th recursion/accumulative argument of any recursive call in the RHS is constructed from the i th parameter of the equation. [] The second restriction above enables us to omit the selector operation from the operation paths derived for recursion/accumulative parameters. Though restrictive, these requirements can be selectively lifted by pre-processing transformation and/or improved analysis. Due to space constraint, the details are described in a companion technical report [4].
3
Tupling Algorithm
Redundant calls may arise during executing of a function / when two (or more) calls in f's RHS have overlapping recursion arguments. We define the notion of overlapping below:
Definition 8 (Call Overlapping). 1. Two recursion arguments are said to overlap each other if they share some common variables. 2. Two accumulative arguments are said to overlap each other if one is a substructure of the other. 3. Two calls overlap if all its corresponding recursion and accumulative arguments overlap. Otherwise, they are disjoint. [] For example, if two functions fl and f2 have only recursion arguments, then
fl(C1 (xl ,x2 ), C2 (x4, C1 (xs,x6 ))) and f2(C2 (x2,x3 ), C~ (x~,xT )) have overlapping recursion arguments, whereas fl ( C1 (xl ,x2 ), C2 (x4 , C1 (x~,x6 ) ) ) and f2(x~ ,xT ) are disjoint. If two calls overlap, the call graphs initiated from them may overlap, and thus contain redundancy. Hence, it is useful, during tupling transformation, to gather the overlapping calls into a common function body with the hope that redundant calls (if any) will eventually be detected and eliminated (via abstraction). Once these redundant calls have been eliminated, what's left behind in the RHS will be disjoint calls. Applying tupling on a function thus attempts to transform the function into one in which every pair of calls in the new RHS are disjoint. Fig. 2 gives an operational description of the tupling algorithmJ There are two types of unfolds in the tupling algorithm: In Step 4.2, calls are unfolded without instantiation; ie., a call is unfolded only when all its arguments matches the LHS of an equation. On the other hand, in Step 4.4.2.2, a call is selected and forced to unfold. This henceforth requires instantiation. Among the 1 Please refer to [10] for detail presentation.
81
1. Decide a set of recursive function calls to tuple, C. 2. Let E contains a set of equations (with C calls in their RHS) to transform. 3. Let D be empty. (D will be the set of tuple definitions.) 4. While E is not empty, 4.1.Remove an equation, say f ( p l . . . . , p , ) = t, from E. 4.2.Modify t=r tl by repeatedly unfolding without instantiation each function call to C in t. 4.3.Gather each subset of overlapping C calls, s,, and perform the following tuples abstraction steps: 4.3.1. For each tuple s,, abstract the common substructure, tw, occurring in the accumulative arguments of all calls in s~, and modify s , ~ let {v w = t ,,: lk' J~=1 in s,.t m ~k, in let {(v,,1,. . . , v .... ) = s[}~=i in tz. 4.3.2. Modify t l ~ let U,=l{v,,~ = t ,,~Jj=l 4.4.For each tuple s[ from {s,},=l, ' TM we transform to s," as follows : Let { v l , . . . , v,} = Vars(s[) where Vars(e) returns free variables of e 4.4.1. If s[ has < 1 function call, then s[' = s[. (No action taken.) 4.4.2. If there is no tuple definition for s[ in D, then 4.4.2.1. Add a new definition, g ( v l , . . . , v,) = s[ to D. 4.4.2.2. Instantiate g by unfolding (with instantiation) a call in s[ that has maximal recursion argument. This introduces several new equations for g, which are then added to E. 4.4.2.3. Set s[' = g ( v l , . . . , v , ) . 4.4.3. Otherwise, a tuple definition, of name say h, matching s[ is found in D. 4.4.3.1. Fold s[ against definition of h to yield a call to h, by setting
s:' = h(v,,..., v.). 4.5.Output a transformed equation m lk, in l e t { ( v , , l , .. . , v .... ) = s,,, },=1 ,n in tz. I(Pl ,.. .,Pn)=let U,=I {v,o = t wJ~=l 5. Halt. F i g . 2. Tupfing Algorithm, T
calls available, we choose to unfold a call h a v i n g m a x i m a l r e c u r s i o n a r g u m e n t s ; t h a t is, t h e recursion a r g u m e n t s , t r e a t e d as a tree-like d a t a s t r u c t u r e , is d e e p e s t in d e p t h a m o n g the calls. A l t h o u g h effective in e l i m i n a t i n g r e d u n d a n t calls, e x e c u t i o n of a l g o r i t h m 7m a y n o t t e r m i n a t e in general due to one of the following reasons: (1) S t e p 4.2 m a y unfold calls indefinitely; (2) Step 4.4.2 m a y i n t r o d u c e infinitely m a n y new t u p l e definitions as t u p l i n g encounters infinitely m a n y different tuples. W e a d d r e s s these two t e r m i n a t i o n issues in See. 5 a n d See. 6 respectively. B u t first, we give a f o r m a l t r e a t m e n t of a l g e b r a of segments.
4
Algebra
of S e g m e n t s
A set of s e g m e n t s f o r m s an a l g e b r a u n d e r c o n c a t e n a t i o n o p e r a t i o n .
82 D e f i n i t i o n 9 ( C o n c a t e n a t i o n o f O p e r a t i o n P a t h s ) . Concatenation of two operation paths opl and op2, denoted by opl ; op2, is defined as op2 o opl. []
Definition 10 (Concatenation o f S e g m e n t s ) . Concatenation of two segments sl and s2, denoted by sl;s2, is defined componentwise as follows: sl = (opl ..... opn ) &s2
(op'1 ..... o p ' ) = ~ s l ; s 2 = ( o p l ; o p ' l
..... o p , , o p , ) .
[3
A concatenated segment can be expressed more compactly by applying the following reduction rules: 1. For any operator O, id o 0 reduces to O, and O o i d reduces to O. 2. V 01, ..., On and O, (01 o 0 . . . . . On o O) reduces to ( 0 1 , . , On) o O. Applying the above reduction rules to a segment yields a compacted segment. Henceforth, we deal with compacted segments, unless we state otherwise. Lastly, concatenating a segment, s, n times is expressed as s n. Such repetition of segment leads to the notion of factors of a segment, as described below: D e f i n i t i o n 11 ( F a c t o r i s a t i o n o f S e g m e n t s ) . Given segments s and f. f is said to be a factor of s if (1) f i s a substring of s, and (2) 3n > 0. s = f n . We call n the power of s wrt f. [] For example, (C -2, id) is a factor of (C-2o C-2oC -2, id ), since (C -S, id )3 = (C-2oC-~oC -2, id ). It is trivial to see that every segment has at least one factor - itself. However, when a segment has exactly one factor, it is called a prime segment. An example of prime segment is (C-~loC~ 1, id ).
Lemma 12 (Uniqueness o f P r i m e F a c t o r i s a t i o n [ 1 0 ] ) . Let s
be a compacted segment, there exists a unique prime segment f such that s = f k for some k>O. []
5
Preventing
Indefinite
Unfolding
We now provide a simple condition that prevents tupling transformation from admitting indefinite unfolding at Step 4.2. This condition has been presented in [2]. Here we rephrase it using segment notation, and extend it to cover multiple recursion arguments. We first define a simple cycle as a simple loop (with no repeating node, except the first one) in a labelled call graph. The set of segments corresponding to the simple cycles in a labelled call graph (N,E) is denoted as SCycle(N,E). This can be computed in time of complexity O(INllEI 2) [10]. T h e o r e m 13 (Preventing Indefinite Unfolding). L e t F be a set of mutualrecursive functions, each of which has non-zero number of recursion parameters. I f V s 6 SCycle(NF,EF), rrR(s) ~ (id, .... id), then given a call to f E F with arguments of finzte size, there exists a number N>O such that the call can be successively unfolded (without instantiation) not more than N times. []
83 6
Synchronisation
Analysis
We now present analysis that prevents tupling from generating infinitely many new tuple functions at Step 4.4.2. The idea is to ensure the finiteness of syntactically different (modulo variable renaming) tupled calls. As a group of bounded arguments is obtained from itself by the application of either selectors, identity or constants operators, it can only have finitely many different structures. Consequently, bounded arguments do not cause tupling transformation to loop infinitely. Hence, we focus on determining the structure of recursion and accumulative arguments in this section. Specifically, we assume non-existence of bounded arguments in our treatment of segments. Since syntactic changes to call arguments are captured by series of segments, differences in call arguments can be characterised by the relationship between the corresponding segments. We discuss below a set of relationships between segments.
Definition 14 ( L e v e l s o f S y n c h r o n i s a t i o n ) . Two segments sl and s2 are said to be : t. level-1 synchronised, denoted by sl ~1 s2, if 3s[, s~.(sl;s~ = s2;s~). Otherwise, they are said to be level-O synchronised, or simply, unsynchronised. 2. level-2 synchronised (sl ~2 s2) if 3s~, s'~.((s~;st = s2;s':) A (st = id V s~ = id)). 3. level-3 synchronised (sl ~3 s2) if 3 s.3n, m > 0.(sl = s n A s2 = sin). 4. level-4 synchronised (sl -~4 s2) if sl = s2. [] Levels 1 to 4 of synchronisation form a strict hierarchy, with synchronisation at level i implying synchronisation at level j if i > j. Together with level-0, these can help identify the termination property of tupling transformation. Why does synchronisation play an important role in the termination of tupling transformation? Intuitively, if two sequences of Segments synchronise, then calls following these two sequences will have finite variants of argument structures. This thus enables folding (in Step 4.4.3) to take effect, and eventually terminates the transformation. In this section, we provide an informal account of some of the interesting findings pertaining to tupling termination, as implied by the different levels of synchronisation. F i n d i n g 1. Transforming two calls with identical arguments but following level-O synchronised segments will end up with disjoint arguments. 2
Example 15. Consider the following two equations for functions g1 and g2 respectively:
² Sometimes, two apparently level-0 synchronised segments may turn into synchronisation at other levels when they are prefixed with some initial segment. Initial segments may be introduced by the argument structures of the two initially overlapping calls. Such hidden synchronisation can be detected by extending the current technique to handle "rotate/shift synchronisation" [5].
g1(C1(x1,x2),C2(y1,y2)) = Eg1[g1(x1,y1)] ;
g2(C1(x1,x2),C2(y1,y2)) = Eg2[g2(x2,y2)] ;
The segment leading to call g1(x1,y1) is (C1⁻¹, C2⁻¹), whereas that leading to call g2(x2,y2) is (C1⁻², C2⁻²). These two segments are level-0 synchronised. Suppose that we have an expression containing two calls, g1(u,v) and g2(u,v), with identical arguments. Tupling-transforming these two calls causes the definition of a tuple function:
g_tup(u,v) = (g1(u,v), g2(u,v)) ;
which will then transform (through instantiation) to the following:
g_tup(C1(u1,u2),C2(v1,v2)) = (Eg1[g1(u1,v1)], Eg2[g2(u2,v2)]) ;
As the arguments of the two calls in the RHS above are now disjoint, tupling terminates. However, the effect of tupling is simply an unfolding of the calls. Thus, it is safe³ but not productive to transform two calls with identical arguments if these calls follow segments that are level-0 synchronised. □

Finding 2. Applying the tupling transformation on calls that follow level-2 synchronised segments may not terminate.
Example 16. Consider the binomial function, as defined below:
bin(0,k) = 1 ;
bin(1+n,0) = 1 ;
bin(1+n,1+k) = if k ≥ n then 1 else bin(n,k) + bin(n,1+k) ;
Redundant calls exist in executing the two overlapping calls bin(n,k) and bin(n,1+k). Notice that the segments leading to these calls, ((1+)⁻¹, (1+)⁻¹) and ((1+)⁻¹, id), are level-2 synchronised. Performing the tupling transformation on (bin(n,k), bin(n,1+k)) will keep generating new sets of overlapping calls at Step 4.4.2, as shown below:
1. (bin(n,k), bin(n,1+k))
2. (bin(1+n1,k), bin(n1,k), bin(n1,1+k))
3. (bin(n1,k1), bin(n1,1+k1), bin(n1,2+k1))
4. (bin(1+n2,k1), bin(n2,k1), bin(n2,1+k1), bin(n2,2+k1))
Hence, the tupling transformation fails to terminate. □
Non-termination of transforming functions such as bin can be predicted from the (non-)synchronisability of its two segments. Consider the segments s1 = ((1+)⁻¹, (1+)⁻¹) and s2 = ((1+)⁻¹, id). If two sequences are constructed using only s1 and s2 respectively, then it is impossible for the two sequences to be identical (though they overlap). However, if two segments are level-3 synchronised, then it is always possible to build from these two segments two sequences that are identical, thanks to the following Prop. 17(a) about level-3 synchronisation.
³ With respect to termination of the tupling transformation.
Property 17 (Properties of Level-3 Synchronisation). (a) Let f1, f2 be the prime factors of s1 and s2 respectively. Then s1 ≈₃ s2 ⟹ f1 = f2. (b) Level-3 synchronisation is an equivalence relation over segments (i.e., it is reflexive, symmetric, and transitive). □

This thus provides an opportunity for termination of the tupling transformation. Indeed, the following theorem highlights such an opportunity.
Theorem 18 (Termination Induced by Level-3 Synchronisation). Let F be a set of mutually recursive functions with S being the set of segments corresponding to the edges in (N_F,E_F). Let C be an initial set of overlapping F-calls to be tupled. If
1. ∀ s ∈ SCycle(N_F,E_F) . π_R(s) ≠ (id,...,id), and
2. ∀ s1, s2 ∈ S . π̄_B(s1) ≈₃ π̄_B(s2),
then performing the tupling transformation on C terminates. □
The notation π̄_B(s) was defined in Defn. 6. The first condition in Theorem 18 prevents an infinite number of unfoldings, whereas the level-3 synchronisation condition ensures that the number of different tuples generated during transformation is finite. The proof is available in [4].
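Condition 2 of Theorem 18 is mechanically checkable. The following hedged OCaml sketch reuses prime_factor from the sketch after Lemma 12 and assumes bounded arguments have already been projected away; the names are ours:

    (* Two segments are level-3 synchronised exactly when they have the
       same prime factor: Property 17(a) gives one direction, and equal
       prime factors trivially yield s1 = f^n and s2 = f^m. *)
    let level3_synchronised s1 s2 =
      fst (prime_factor s1) = fst (prime_factor s2)

    (* Condition 2 of Theorem 18 over all segments of the call graph. *)
    let all_level3 = function
      | [] -> true
      | s :: rest ->
          let f = fst (prime_factor s) in
          List.for_all (fun s' -> fst (prime_factor s') = f) rest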
Example 19. Consider the equation of f defined below:
f(2+n,4+m,y) = Ef[f(1+n,2+m,C(y)), f(n,m,C(C(y)))] ;
Although the recursion arguments in f(1+n,2+m,C(y)) and f(n,m,C(C(y))) are consumed at different rates, the argument consumption (and accumulation) patterns for both calls are level-3 synchronised. Subjecting the calls to the tupling transformation yields the following result:
f(2+n,4+m,y) = let {y1 = C(y)} in let {(u,v) = f_tup(n,m,y1)} in Ef[u,v] ;
f_tup(1+n,2+m,y) = let {y2 = C(y)} in let {(u,v) = f_tup(n,m,y2)} in (Ef[u,v], u) ;
□

Finally, since level-4 synchronisation implies level-3 synchronisation, Theorem 18 applies to segments of level-4 synchronisation as well. In situations where segments are not all level-3 synchronised with one another, we describe here a sufficient condition which guarantees termination of the tupling transformation. To begin with, we observe from Prop. 17(b) above that we can partition the set of segments S into disjoint level-3 sets of segments. Let Π_S = {[s1], ..., [sk]} be such a partition. By Prop. 17(a), all segments in a level-3 set [si] share a unique prime factor, s say, such that all segments in [si] can be expressed as powers s^p1, ..., s^p_ni of s. We then define HCF([si]) = s^gcd(p1,...,p_ni), where gcd computes the greatest common divisor. HCF([si]) is thus the highest common factor of the level-3 set [si].
Definition 20 (Set of Highest Common Factors). Let S be a set of segments. The set of highest common factors of S, HCFSet(S), is defined as
HCFSet(S) = { HCF([si]) | [si] ∈ Π_S }. □
The following theorem states a sufficient condition for preventing infinite definitions during the tupling transformation.⁴

Theorem 21 (Preventing Infinite Definition). Let F be a set of mutually recursive functions. Let S be the set of segments corresponding to the edges in (N_F,E_F). Let C be a set of overlapping calls occurring in the RHS of an equation in F. If ∀ s1, s2 ∈ HCFSet(S) . π̄_B(s1) ≈₀ π̄_B(s2), then performing the tupling transformation on C will generate a finite number of different tuples. □

A proof of the theorem is available in [10, 4].

A note on the complexity of this analysis: we notice from Theorem 21 that the main task of synchronisation analysis is to determine that all segments in HCFSet(S) are level-0 synchronised. This involves expressing each segment in S as its prime factorisation, partitioning S under level-3 synchronisation, computing the highest common factor for each partition, and lastly, determining if HCFSet(S) is level-0 synchronised. Conservatively, the complexity of synchronisation analysis is polynomial wrt the number of segments in S and the maximum length of these segments.

Theorem 22 summarises the results of Theorem 13 and Theorem 21.

Theorem 22 (Termination of Tupling Transformation). Let F be a set of mutually recursive functions, each of which has a non-zero number of recursion parameters. Let S be the set of segments corresponding to the edges in (N_F,E_F). Let C be a set of overlapping calls occurring in the RHS of an equation in F. If
1. ∀ s ∈ SCycle(N_F,E_F) . π_R(s) ≠ (id,...,id), and
2. ∀ s1, s2 ∈ HCFSet(S) . π̄_B(s1) ≈₀ π̄_B(s2),
then performing the tupling transformation on C will terminate.
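The pipeline named in the complexity note above can be sketched directly. The following OCaml fragment (our names, building on the earlier sketches) partitions the segments into level-3 classes keyed by prime factor and computes each class's highest common factor; the final pairwise level-0 test requires a semantic unsynchronisability check and is left abstract here:

    let rec gcd a b = if b = 0 then a else gcd b (a mod b)

    (* HCF([s_i]) = f^gcd(p1,...,pn), represented as the pair (f, gcd ps) *)
    let hcfset (segments : op list list list) : (op list list * int) list =
      let classes = Hashtbl.create 16 in
      List.iter
        (fun s ->
          let (f, p) = prime_factor s in
          let ps = try Hashtbl.find classes f with Not_found -> [] in
          Hashtbl.replace classes f (p :: ps))
        segments;
      Hashtbl.fold
        (fun f ps acc ->
          (f, List.fold_left gcd (List.hd ps) (List.tl ps)) :: acc)
        classes []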
7 Related Work
One of the earliest mechanisms for avoiding redundant calls is memo-functions [13]. Memo-functions are special functions which remember/store some or all of their previously computed calls in a memo-table, so that re-occurring calls can have their results retrieved from the memo-table rather than re-computed. Though general (with no analysis required), memo-functions are less practical since they rely on expensive run-time mechanisms.
⁴ It is possible to extend Theorem 21 further by relaxing its premises. In particular, we can show the prevention of infinite definitions in the presence of segments that are level-2 synchronised, provided each such segment can be broken down into two sub-segments, of which one is level-3 synchronised with some of the existing segments, and the other is level-0 synchronised [10].
Other transformation techniques (e.g. tupling and tabulation) may result in more efficient programs, but they usually require program analyses and may be restricted to sub-classes of programs. By focusing on restricted bi-linear self-recursive functions, Cohen [6] identified some algebraic properties, such as periodic commutative, common generator, and explicit descent relationships, to help predict redundancy patterns and corresponding tabulation schemes. Unfortunately, this approach is rather limited since the functions considered are restricted. In addition, the algebraic properties are difficult to detect, and yet limited in scope. (For example, the Tower-of-Hanoi function does not satisfy any of Cohen's algebraic properties, but can still be tupled by our method.)

Another approach is to perform direct search of the DG. Pettorossi [16] gave an informal heuristic to search the DG (dependency graph of calls) for eureka tuples. Later, Proietti & Pettorossi [17] proposed an Elimination Procedure, which combines fusion and tupling, to eliminate unnecessary intermediate variables from logic programs. To ensure termination, they only handled functions with a single recursion parameter, while the accumulative parameters are generalised whenever possible. No attempt is made to analyse the synchronisability of multiple recursion/accumulating parameters.

With the aim of deriving incremental programs, Liu and Teitelbaum [12, 11] presented a three-stage method to cache, incrementalize and prune user programs. The caching stage gathers all intermediate and auxiliary results which might be needed to incrementalize, while pruning removes unneeded results. While their method may be quite general, its power depends largely on the incrementalize stage, which often requires heuristics and deep intuitions. No guarantee on termination, or on the scope of this difficult stage, is presently offered.

Ensuring termination of transformers has been a central concern for many automatic transformation techniques. Though the problem of determining termination is in general undecidable, a variety of analyses can be applied to give meaningful results. In the case of deforestation, the proposals range from the simple pure treeless syntactic form [20] to a sophisticated constraint-based analysis [18]⁵ to stop the transformation. Likewise, earlier tupling works [2, 7] were based simply on restricted functions. In [2], the transformable functions can only have a single recursion parameter each, while accumulative parameters are forbidden. Similarly, in the calculational approach of [7], the allowed functions can only have a single recursion parameter each, while the other parameters are lambda-abstracted, as per [15]. When lambda abstractions are being tupled, they yield effective elimination of multiple traversals, but not effective elimination of redundant function-type calls. For example, if functions sdrop and repl are lambda-abstracted prior to tupling, then redundant calls will not be properly eliminated.

By engaging a more powerful synchronisation analysis for multiple parameters, we have managed to extend considerably the class of functions which can be tupled safely. Some initial ideas of our synchronisation analysis can be found in [5]. The earlier work is informal and incomplete since it did not cover accumulative
⁵ from which the title of this paper was inspired
parameters. The present proposal is believed to be comprehensive enough to be practical.
8 Conclusion
There is little doubt that tupled functions are extremely useful. Apart from the elimination of redundant calls and multiple traversals, tupled functions are often linear with respect to the common arguments (i.e., each now occurs only once in the RHS of the equation). This linearity property has a number of advantages, including:
- It can help avoid space leaks that are due to unsynchronised multiple traversals of large data structures, via a compilation technique described in [19].
- It can facilitate deforestation (and other transformations) that impose a linearity restriction [20], often for efficiency and/or termination reasons.
- It can improve the opportunity for uniqueness typing [1], which is good for storage overwriting and other optimisations.
Because of these nice performance attributes, functional programmers often go out of their way to write such tupled functions, despite their being more awkward, error-prone, and harder to write and read.

In this paper, we have shown the effectiveness and safeness of the tupling transformation when coupled with synchronisation analyses. Furthermore, not only do we generalise the analyses from the handling of a single recursion argument to that of multiple recursion arguments, we also bring together both recursion and accumulative arguments under one framework for analysis. These have considerably widened the scope of functions admissible for safe tupling. Consequently, the tupling algorithm T and the associated synchronisation analysis can now be used to improve run-time performance, whilst preserving the clarity/modularity of programs.

Acknowledgments. We thank the anonymous referees for their valuable comments and Peter Thiemann for his contribution to earlier work.

References
1. E. Barendsen and J.E.W. Smetsers. Conventional and uniqueness typing in graph rewrite systems. In 13th Conference on the Foundations of Software Technology and Theoretical Computer Science, pages 45-51, Bombay, India, December 1993.
2. Wei-Ngan Chin. Towards an automated tupling strategy. In ACM SIGPLAN Symposium on Partial Evaluation and Semantics-Based Program Manipulation, pages 119-132, Copenhagen, Denmark, June 1993. ACM Press.
3. Wei-Ngan Chin and Masami Hagiya. A bounds inference method for vector-based memoisation. In 2nd ACM SIGPLAN Intl. Conference on Functional Programming, pages 176-187, Amsterdam, Holland, June 1997. ACM Press.
4. W.N. Chin, S.C. Khoo, and T.W. Lee. Synchronisation analysis to stop tupling - extended abstract. Technical report, Dept of IS/CS, NUS, Dec 1997. http://www.iscs.nus.sg/~khoosc/paper/synTech.ps.gz.
5. W.N. Chin, S.C. Khoo, and P. Thiemann. Synchronisation analyses for multiple recursion parameters. In Intl Dagstuhl Seminar on Partial Evaluation (LNCS 1110), pages 33-53, Germany, February 1996.
6. Norman H. Cohen. Eliminating redundant recursive calls. ACM Trans. on Programming Languages and Systems, 5(3):265-299, July 1983.
7. Z. Hu, H. Iwasaki, M. Takeichi, and A. Takano. Tupling calculation eliminates multiple traversals. In 2nd ACM SIGPLAN International Conference on Functional Programming, pages 164-175, Amsterdam, Netherlands, June 1997. ACM Press.
8. N.D. Jones, C.K. Gomard, and P. Sestoft. Partial Evaluation and Automatic Program Generation. Prentice Hall, 1993.
9. N.D. Jones, P. Sestoft, and H. Sondergaard. An experiment in partial evaluation: the generation of a compiler generator. Journal of LISP and Symbolic Computation, 2(1):9-50, 1989.
10. Tat Wee Lee. Synchronisation analysis for tupling. Master's thesis, DISCS, National University of Singapore, 1997. http://www.iscs.nus.sg/~khoosc/paper/ltw_thesis.ps.gz.
11. Y.A. Liu, S.D. Stoller, and T. Teitelbaum. Discovering auxiliary information for incremental computation. In 23rd ACM Symposium on Principles of Programming Languages, pages 157-170, St. Petersburg, Florida, January 1996. ACM Press.
12. Y.A. Liu and T. Teitelbaum. Caching intermediate results for program improvement. In ACM SIGPLAN Symposium on Partial Evaluation and Semantics-Based Program Manipulation, pages 190-201, La Jolla, California, June 1995. ACM Press.
13. Donald Michie. Memo functions and machine learning. Nature, 218:19-22, April 1968.
14. A. Pettorossi and M. Proietti. Program derivation via list introduction. In IFIP TC 2 Working Conf. on Algorithmic Languages and Calculi, Le Bischenberg, France, February 1997. Chapman & Hall.
15. A. Pettorossi and A. Skowron. Higher order generalization in program derivation. In TAPSOFT 87, Pisa, Italy (LNCS, vol 250, pp. 306-325), March 1987.
16. Alberto Pettorossi. A powerful strategy for deriving programs by transformation. In 3rd ACM LISP and Functional Programming Conference, pages 273-281. ACM Press, 1984.
17. M. Proietti and A. Pettorossi. Unfolding - definition - folding, in this order for avoiding unnecessary variables in logic programs. In Proceedings of PLILP, Passau, Germany (LNCS, vol 528, pp. 347-358). Berlin Heidelberg New York: Springer, 1991.
18. H. Seidl and M.H. Sørensen. Constraints to stop higher-order deforestation. In 24th ACM Symposium on Principles of Programming Languages, Paris, France, January 1997. ACM Press.
19. Jan Sparud. How to avoid space-leak without a garbage collector. In ACM Conference on Functional Programming and Computer Architecture, pages 117-122, Copenhagen, Denmark, June 1993. ACM Press.
20. Phil Wadler. Deforestation: Transforming programs to eliminate trees. In European Symposium on Programming, Nancy, France (LNCS, vol 300, pp. 344-358), March 1988.
Propagating Differences: An Efficient New Fixpoint Algorithm for Distributive Constraint Systems

Christian Fecht¹ and Helmut Seidl²
¹ Universität des Saarlandes, Postfach 151150, D-66041 Saarbrücken, Germany, [email protected]
² Fachbereich IV - Informatik, Universität Trier, D-54286 Trier, Germany, [email protected]
Abstract. Integrating semi-naive fixpoint iteration from deductive databases [3, 2, 4] as well as continuations into worklist-based solvers, we derive a new application-independent local fixpoint algorithm for distributive constraint systems. Seemingly different efficient algorithms for abstract interpretation, like those for linear constant propagation for imperative languages [17] as well as for control-flow analysis for functional languages [13], turn out to be instances of our scheme. Besides this systematizing contribution, we also derive a new efficient algorithm for abstract OLDT-resolution as considered in [15, 16, 25] for Prolog.
1 Introduction
Efficient application-independent local solvers for general classes of constraint systems have been successfully used in program analyzers like GAIA [9, 6], PLAI [20] or GENA [10, 11] for Prolog and PAG [1] for imperative languages. The advantages of application independence are obvious: the algorithmic ideas can be pointed out more clearly and are not superseded by application-specific aspects. Correctness can therefore be proven more easily. Once proven correct, the solver can then be instantiated to different application domains, thus allowing for reusable implementations. For the overall correctness of every such application, it simply remains to check whether or not the constraint system correctly models the problem to be analyzed. Reasoning about the solution process itself can be totally abandoned.

In [12], we considered systems of equations of the form x = f_x (x a variable) and tried to minimize the number of evaluations of right-hand sides f_x during the solution process. Accordingly, we viewed these as (almost) atomic actions. In practical applications, however, like the abstract interpretation of Prolog programs, right-hand sides represent complicated functions. In this paper, we therefore try to minimize not just the number of evaluations but the overall work on right-hand sides. Clearly, improvements in this direction can no longer abstract completely from the algorithms implementing right-hand sides. Nonetheless, we aim at optimizations in as application-independent a setting as possible.

We start by observing that right-hand sides f_x of defining equations x = f_x often are of the form f_x = t1 ⊔ ... ⊔ tk where the ti represent independent contributions to the value of x. We take care of that by considering now systems of constraints of the form x ⊒ t. Having adapted standard worklist-based equation solvers to such constraint
systems, we investigate the impact of two further optimizations. First, we try to avoid identical subcomputations which would contribute nothing new to the next iteration. Thus, whenever a variable y accessed during the last evaluation of right-hand side t has changed its value, we try to avoid reevaluation of t as a whole. Instead, we resume evaluation just with the access to y. To do this in a clean way, we adapt the model of (generalized) computation trees. We argue that many common expression languages for right-hand sides can easily and automatically be translated into this model. This model has the advantage of making continuations, i.e., remaining parts of computations after returns from variable lookups, explicit. So far, continuations have not been used in connection with worklist-based solver algorithms. Only for the topdown solver TD of Le Charlier and Van Hentenryck [5, 12] has a related technique been suggested and practically applied to the analysis of Prolog, by Englebert et al. in [9].

In case, however, computation on larger values is much more expensive than on smaller ones, continuation-based worklist solvers can be further improved by calling continuations not with the complete new value of the modified variable but just its increment. This concept clearly is an instance of the very old idea of optimization through reduction in strength as considered, e.g., by Paige [22]. A similar idea has been considered for recursive query evaluation in deductive databases to avoid computing the same tuples again and again [3, 4]. Semi-naive iteration, therefore, propagates just those tuples to the respective next iteration which have been newly encountered. Originally, this optimization has been considered for rules of the form x ⊇ f y where x and y are mutually recursive relations and the unary operator f is distributive, i.e., f (s1 ∪ s2) = f s1 ∪ f s2. An extension to n-ary f is contained in [2]. A general combination, however, of this principle with continuations and local worklist solvers seems to be new. To make the combination work, we need an operator diff which, when applied to abstract values d1 and d2, determines the necessary part of d1 ⊔ d2, given d1, for which reevaluation should take place. We then provide a set of sufficient conditions guaranteeing the correctness of the resulting algorithm.

Propagating differences is orthogonal to the other optimizations of worklist solvers considered in [12]. Thus, we are free to add timestamps or just depth-first priorities to obtain even more competitive fixpoint algorithms (see [12]). We refrained from doing so to keep the exposition as simple as possible. We underline the generality as well as the usefulness of our new fixpoint algorithm by giving three important applications, namely, the distributive framework IDE for interprocedural analysis of imperative languages [17], control-flow analysis for higher-order functional languages [13], and abstract OLDT-resolution as considered in [15, 16] for Prolog. In the first two cases, we obtain similar complexity results as for known special-purpose algorithms. Completely new algorithms are obtained for abstract OLDT-resolution. Another effort to exhibit computational similarities, at least between control-flow analysis and certain interprocedural analyses, has been undertaken by Reps and his coworkers [18, 19]. It is based on the graph-theoretic notion of context-free language reachability.
This approach, however, is much more limited in its applicability than ours since it does not work for binary operators and lattices which are not of the form D = 2^A for some un-ordered base set A.

The paper is organized as follows. In sections 2 through 6 we introduce our basic concepts. Especially, we introduce the notions of computation trees and weak monotonicity of computation trees. In the following three sections, we successively derive the differential fixpoint algorithm WR_Δ. The conventional worklist solver WR is introduced in section 7. Continuations are added in section 8 to obtain solver WR_C, from which we obtain algorithm WR_Δ in section 9. The results of section 9 are sufficient to derive fast
algorithms for framework IDE (section 10) as well as control-flow analysis (section 11). Framework IDE has been proposed by Horwitz, Reps and Sagiv for interprocedural analysis of imperative programs and applied to interprocedural linear constant propagation [17]. A variant of control-flow analysis ("set-based analysis") has been proposed by Heintze for the analysis of ML [13]. Another variant, even more in the spirit of the methods used here, has been applied in [26] to a higher-order functional language with call-by-need semantics to obtain a termination analysis for deforestation. In section 12 we extend the applicability of algorithm WR_Δ further by introducing generalized computation trees. This generalization takes into account the independence of subcomputations. Especially, it allows us to derive new algorithms for abstract OLDT-resolution [15, 16, 25] (section 13). As an example implementation, we integrated an enhanced version of fixpoint algorithm WR_Δ into the program analyzer generator GENA for Prolog [10, 11] and practically evaluated the generated analyzers on our benchmark suite of large Prolog programs. The results are reported in section 14.
2 Constraint Systems
Assume D is a complete lattice of values. A constraint system S with set of variables V over lattice D consists of a set of constraints x ⊒ t where the left-hand side x ∈ V is a variable and t, the right-hand side, represents a function [t] : (V → D) → D from variable assignments to values. Le Charlier and Van Hentenryck in [5] and Fecht and Seidl in [12] presented their solvers in a setting which was (almost) independent of the implementation of right-hand sides. In this paper, we insist on a general formulation as well. As in [12] we assume that:
1. the set V of variables is always finite;
2. the complete lattice D has finite height;
3. evaluation of right-hand sides is always terminating.
In contrast to [5, 12], however, our new algorithms also take into account how right-hand sides are evaluated.
3 Computation Trees
Operationally, every evaluation of right-hand side t on variable assignment σ can be viewed as a sequence alternating between variable lookups and internal computations which, eventually, terminates to return the result. Formally, this type of evaluation can be represented as a D-branching computation tree (ct for short) of finite depth. The set T(V,D) of all computation trees is the least set T containing d ∈ D, x ∈ V together with all pairs (x, C) where x ∈ V and C : D → T. Given t ∈ T(V,D), the function [t] : (V → D) → D implemented by t and the set dep(t, _) of variables accessed during evaluation of t are given by:

[d] σ = d                  dep(d, σ) = ∅
[x] σ = σ x                dep(x, σ) = {x}
[(x,C)] σ = [C (σ x)] σ    dep((x,C), σ) = {x} ∪ dep(C (σ x), σ)
Clearly, ct x can be viewed as an abbreviation of ct (x, λd.d). Representations of equivalent computation trees can be obtained for various expression languages.
Example 1. Assume right-hand sides are given by
e ::= d | x | f x | g(x1,x2)
where d denotes an element in D, and f and g represent monotonic functions D → D and D² → D, respectively. For simplicity, we do not distinguish (notationally) between these symbols and their respective meanings. Standard intra-procedural dataflow analyses for imperative languages naturally introduce constraint systems of this simple type (mostly even without occurrences of binary operators g). The computation tree t for expression e can be chosen as e itself if e ∈ D ∪ V. For e = f x, we set t = (x, f), and for e = g(x1,x2), t = (x1,C) where C d = (x2,C_d) and C_d d' = g(d,d'). □

Further examples of useful expression languages together with their translations into (generalized) computation trees can be found in sections 11, 12, and 13. It should be emphasized that we do not advocate ct's as a specification language for right-hand sides. In the first place, ct's serve as an abstract notion of algorithm for right-hand sides. In the second place, however, ct's (resp. their generalization as considered in section 12) can be viewed as the conceptual intermediate representation for our fixpoint iterators to rely on, meaning that evaluation of right-hand sides should provide, for every access to a variable y, some representation C of the remaining part of the evaluation. As in Example 1, such C typically consists of a tuple of values together with the reached program point.
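To make the model concrete, here is a minimal sketch of computation trees in an ML-style language (the paper's implementations are in SML; we use OCaml syntax, and the names ct, eval and dep are ours):

    (* D-branching computation trees over variables 'v and values 'd *)
    type ('v, 'd) ct =
      | Val of 'd                          (* an element d of D *)
      | Var of 'v                          (* abbreviates Access (x, fun d -> Val d) *)
      | Access of 'v * ('d -> ('v, 'd) ct) (* look up x, continue with C *)

    (* [t] sigma : evaluate a tree under a variable assignment *)
    let rec eval (sigma : 'v -> 'd) (t : ('v, 'd) ct) : 'd =
      match t with
      | Val d -> d
      | Var x -> sigma x
      | Access (x, c) -> eval sigma (c (sigma x))

    (* dep(t, sigma) : variables accessed while evaluating t under sigma *)
    let rec dep (sigma : 'v -> 'd) (t : ('v, 'd) ct) : 'v list =
      match t with
      | Val _ -> []
      | Var x -> [x]
      | Access (x, c) -> x :: dep sigma (c (sigma x))

    (* Example 1 for e = g(x1,x2) *)
    let tree_g g x1 x2 =
      Access (x1, fun d1 -> Access (x2, fun d2 -> Val (g d1 d2)))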
4 Solutions
A variable assignment σ : V → D is called a solution for S if σ x ⊒ [t] σ for all constraints x ⊒ t in S. Every system S has at least one solution, namely the trivial one mapping every variable to ⊤, the top element of D. In general, we are interested in computing a "good" solution, i.e., one which is as small as possible or, at least, non-trivial. With system S we associate the function G_S : (V → D) → V → D defined by G_S σ x = ⊔{[t] σ | x ⊒ t ∈ S}. If we are lucky, all right-hand sides t represent monotonic functions. Then G_S is monotonic as well, and therefore has a least fixpoint which is also the least solution of S. As observed in [8, 12], the constraint systems for interprocedural analysis of (imperative or logic) languages often are not monotonic but just weakly monotonic.
5 Weak Monotonicity of Computation Trees
Assume we are given a partial ordering "≤" on variables. A variable assignment σ : V → D is called monotonic if for all x1 ≤ x2, σ x1 ⊑ σ x2. On computation trees we define a relation "≤" by:
- ⊥ ≤ t for every t; and d1 ≤ d2 if d1 ⊑ d2;
- x1 ≤ x2 as ct's if also x1 ≤ x2 as variables;
- (x1,C1) ≤ (x2,C2) if x1 ≤ x2 and C1 d1 ≤ C2 d2 for all d1 ⊑ d2.
Constraint system S is called weakly monotonic iff for every x1 ≤ x2 and constraint x1 ⊒ t1 in S, some constraint x2 ⊒ t2 ∈ S exists such that t1 ≤ t2. S is called monotonic if it is weakly monotonic for the variable ordering "=". We have:

Proposition 2. Assume σ1, σ2 are variable assignments where σ1 ⊑ σ2 and at least one of the σi is monotonic. Then t1 ≤ t2 implies:
1. dep(t1,σ1) ≤ dep(t2,σ2);
2. [t1] σ1 ⊑ [t2] σ2.
Here, s1 ≤ s2 for sets s1, s2 ⊆ V iff for all x1 ∈ s1, x1 ≤ x2 for some x2 ∈ s2. Semantic property 2 coincides with what was called "weakly monotonic" in [12], adapted to constraint systems. It is a derived property here since we started out from syntactic properties of computation trees (not just their meanings as in [12]). As in [12] we conclude:

Corollary 3. If S is weakly monotonic, then:
1. If σ is monotonic, then G_S σ is again monotonic;
2. S has a unique least solution μ given by μ = ⊔_{i≥0} G_S^i ⊥. Especially, this least solution μ is monotonic.
6 Local Fixpoint Computation
Assume the set V of variables is tremendously large while at the same time we are only interested in the values for a rather small subset X of variables. Then we should try to compute the values of a solution only for variables from X and all those variables y that "influence" values for variables in X. This is the idea of local fixpoint computation.

Evaluation of computation tree t on argument σ does not necessarily consult all values σ x, x ∈ V. Therefore, evaluation of t may succeed already for a partial σ : V ⇀ D. If evaluation of t on σ succeeds, we can define the set dep(t, σ) accessed during this evaluation in the same way as in section 3 for complete functions. Given a partial variable assignment σ : V ⇀ D, variable y directly influences (relative to σ) variable x if y ∈ dep(t, σ) for some right-hand side t of x. Let then "influencing" denote the reflexive and transitive closure of this relation. Partial variable assignment σ is called X-stable iff for every y ∈ V influencing some x ∈ X relative to σ, and every constraint y ⊒ t in S for y, [t] σ is defined with σ y ⊒ [t] σ. A solver, finally, computes for constraint system S and set X of variables an X-stable partial assignment σ; furthermore, if S is weakly monotonic and μ is its least solution, then σ y = μ y for all y influencing some variable in X (relative to σ).
7 The Worklist Solver with Recursion
The first solver WR we consider is a local worklist algorithm enhanced with recursive descent into new variables (Fig. 1). Solver WR is an adaptation of a simplification of solver WRT in [12] to constraint systems. Opposed to WRT, no timestamps are maintained. For every encountered variable x, algorithm WR (globally) maintains the current value σ x together with a set infl x of constraints z ⊒ t such that evaluation of t (on σ) may access value σ x. The set of constraints to be reevaluated is kept in a data structure W, called the worklist. Initially, W is empty. The algorithm starts by initializing all variables from set X by calling procedure Init. Procedure Init, when applied to variable x, first checks whether x has already been encountered, i.e., is contained in set dom. If this is not the case, x is added to dom, σ x is initialized to ⊥ and the set infl x of constraints potentially influenced by x is initialized to ∅. Then a first approximation for x is computed by evaluating all right-hand sides t for x and adding the results to σ x. If a value different from ⊥ has been obtained, all elements from set infl x have to be added to W. After that, set infl x is emptied.
    dom = ∅; W = ∅;
    forall (x ∈ X) Init(x);
    while (W ≠ ∅) {
        x ⊒ t = Extract(W);
        new = [t] (λy.Eval(x ⊒ t, y));
        if (new ⋢ σ x) {
            σ x = σ x ⊔ new;
            W = W ∪ infl x; infl x = ∅;
        }
    }

    void Init(V x) {
        if (x ∉ dom) {
            dom = dom ∪ {x}; σ x = ⊥; infl x = ∅;
            forall (x ⊒ t ∈ S) σ x = σ x ⊔ [t] (λy.Eval(x ⊒ t, y));
            if (σ x ≠ ⊥) { W = W ∪ infl x; infl x = ∅; }
        }
    }

    D Eval(Constraint r, V y) {
        Init(y);
        infl y = infl y ∪ {r};
        return σ y;
    }

Fig. 1. Algorithm WR.
As long as W is nonempty, the algorithm now iteratively extracts constraints x ⊒ t from W and evaluates the right-hand side t on the current partial variable assignment σ. If [t] σ is not subsumed by σ x, the value of σ for x is updated. Since the value for x has changed, the constraints r in infl x may no longer be satisfied by σ; therefore, they are added to W. Afterwards, infl x is reset to ∅. However, right-hand sides t of constraints r are not evaluated on σ directly. There are two reasons for this. First, σ may not be defined for all variables y the evaluation of t may access; second, we have to determine all y accessed by the evaluation of t on σ. Therefore, t is applied to the auxiliary function λy.Eval(r, y). When applied to constraint r and variable y, Eval first initializes y by calling Init. Then r is added to infl y, and the value of σ for y (which now is always defined) is returned.
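For concreteness, the following is a hedged, minimal OCaml sketch of WR, reusing the ct type and eval from the earlier sketch. Variables are ints, the lattice operations bot/join/leq and the constraint list are parameters; all names are ours, not the paper's:

    type 'd constr = { lhs : int; rhs : (int, 'd) ct }

    let solve_wr ~bot ~join ~leq (cs : 'd constr list) (xs : int list) =
      let sigma = Hashtbl.create 64          (* current values          *)
      and infl  = Hashtbl.create 64          (* influenced constraints  *)
      and dom   = Hashtbl.create 64
      and wl    = Queue.create () in
      let rhss x = List.filter (fun c -> c.lhs = x) cs in
      let get h x d = try Hashtbl.find h x with Not_found -> d in
      let rec init x =
        if not (Hashtbl.mem dom x) then begin
          Hashtbl.replace dom x (); Hashtbl.replace sigma x bot;
          List.iter (fun c ->
            let v = eval_ct c in
            Hashtbl.replace sigma x (join (get sigma x bot) v)) (rhss x);
          if not (leq (get sigma x bot) bot) then flush_infl x
        end
      and flush_infl x =
        List.iter (fun c -> Queue.add c wl) (get infl x []);
        Hashtbl.replace infl x []
      and eval_ct c =
        (* evaluate c.rhs, registering c in infl y for every accessed y *)
        let lookup y =
          init y;
          Hashtbl.replace infl y (c :: get infl y []);
          get sigma y bot in
        eval lookup c.rhs
      in
      List.iter init xs;
      while not (Queue.is_empty wl) do
        let c = Queue.pop wl in
        let v = eval_ct c in
        if not (leq v (get sigma c.lhs bot)) then begin
          Hashtbl.replace sigma c.lhs (join (get sigma c.lhs bot) v);
          flush_infl c.lhs
        end
      done;
      sigma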
8
4. A l g o r i t h m W R is a solver.
[2
The Continuation Solver
Solver W i t contains inefficiencies. Consider constraint x _~ t where, during evaluation of t, value a y has been accessed at subtree t ~ = (y, C) of t. Now assume a y obtains
96
new value new. Then reevaluation of complete right-hand side t is initiated. Instead of doing so, we would like to initiate reevaluation only of subtree C new. Function C in subtree t' can be interpreted as (syntactic representation of) the continuation where reevaluation of t has to proceed if the value of a for y has been returned. We modify solver W R therefore as follows: 9 Whenever during evaluation of right-hand side of constraint x ~ t, subtree t' --(y, C) is reached, we not just access value a y and apply continuation C but additionally add (x, C) to the infl-set of variable y. 9 Whenever a y has obtained a new value, we add to W all pairs (x, (y, C)), (x, C) E infl y, to initiate their later reevaluations. The infi-sets of resulting algorithm W R c now contain elements from V x C o n t where C o n t = D --~ T(V, D), whereas worklist W obtains elements from V x T(V, D). In order to have continuations explicitly available for insertion into infl-sets, we change the functionality of the argument of [.] (and hence also N ) by passing down the current continuation into the argument. We define:
~d] a' = d
~x] a' = or' (Ad.d) x
[(x, C)] a ' = ~C (a' C x)] a '
where a f C x --- a x. Using this modified definition of the semantics of computation trees we finally adapt the functionality of Eval. The new function Eval consumes three arguments, namely variables x and y together with continuation C. Here, variable x represents the left-hand side of the current constraint, y represents the variable whose value is being accessed, and C is the current continuation. Given these three arguments, Eval first calls Init for y to make sure that cry as well as infl y have already been initialized. Then it adds (x, C) to set infl y. Finally, it returns a y. We obtain: T h e o r e m 5. Algorithm W R c is a solver.
O
A similar optimization for topdown solver T D [5, 12] in the context of analysis of Prolog programs has been called clause prefix optimization [9]. As far we know, an application independent exposition for worklist based solvers has not considered before.
9
The Differential Fixpoint Algorithm
Assume variable y has changed its value. Instead of reevaluating all trees (y, C) from set infl of y with the new value, we may initiate reevaluation just for the increment. This optimization is especially helpful, if computation on "larger" values is much more expensive than computations on "smaller" ones. The increase of values is determined by some function diff : D • D --~ D satisfying dl 0 dill(d1, d2) -- dl 0 d2
Example 6. If D ----2A, A a set, we may choose for diff set difference. If D = M --~ R, M a set and R a complete lattice, ditT(fl,fe) can be defined as diff(fl, f2) v = _L if f2 v _ fl v and dill(f1, f2) v = f2 v otherwise. 0 To make our idea work, we have to impose further restrictions onto the structure of right-hand sides t. We call 8 distributive if S is weakly monotonic and for every subterm (x, C) of right-hand sides of 8, dl, d2, and d such that d = d l Od2 and arbitrary variable assignment a: dep(C dl, or) U dep(C d2, a) D dep(C d, a)
and
fC dl] a LJ ~C d2] a ~_ lie d] a
97
dom=0; W=0; f o r a l l (x E X ) Init(x); w h i l e (W • 0) {
(x, C, A) = Extract(W); new = [C A] (AC, y.Eval(x, C, y)); if (new [Z a x ) { A = diff(a x, new); (x x = dr x U n e w ;
foran ((x', c') 9 infl x) W = W U {(x', C', ~)}; } } v o i d Init(V x){ if(x~dom) { dom = dom U {x}; a x = / ; infl x = 0; foraU (x -1 t 9 S ) ) a x = a x U it] (AC, y.Eval(x,C,y)); if (a x ~: Z)
forall ((~', c') 9 i.fl ~) w = w u {(~', c ' , (o ~))}; } } D EvaI(V x, C o n t C, V y) { Init(y); infl y ----infl y U {(x,C)}; r e t u r n a y;
} F i g . 2. Algorithm W R . a .
In interesting applications, 3 is even monotonic and variable dependencies are "static", i.e., independent of a. Furthermore, equality holds in the second inclusion. This is especially the case if right-hand sides are given through expressions as in Example 1, where all operators f and g are distributive in each of their arguments. In order to propagate increments, we change solver W R . c as follows. Assume a y has obtained a new value which differs from the old one by A and ( x, C) E infl y. 9 Instead of adding (x, (y, C)) to W (as for W R c ) , we add (x, C, A). Thus, now worklist W contains elements from V x C o n t x D. 9 If we extract triple (x, C, A) from W, we evaluate ct C A to obtain a (possibly) new increment for x. In contrast, however, to W R c and W R , it is no longer safe to empty sets infl y after use. The resulting differential worklist algorithm with recursive descent into new variables ( W R ~ for short) is given in Figure 2.
Theorem
7. For distributive S, W R . a computes an X-stable partial least solution. []
98
Note that we did not claim algorithm W R ~ to be a solver: and indeed this is not the case. Opposed to solvers W R and W R c , algorithm W R . a may fail to compute the least solution for constraint systems which are not distributive.
10
The Distributive
Framework
IDE
As a first application, let us consider the distributive framework IDE for interprocedural analysis of imperative languages as suggested by Horwitz et al. [17] and applied, e.g., to linear constant propagation. Framework IDE assigns to program points elements from lattice D = M -+ L of program states, where M is some finite base set (e.g., the set of currently visible program variables), and L is a lattice of abstract values. The crucial point of program analysis in framework IDE consists in determining summary functions from D --+ D to describe effects of procedures. The lattice of possible transfer functions for statements as well as for summary functions for procedures in IDE is given by 5r ---- M 2 --~ T~ where ~ C L -+ L is assumed to be a lattice of distributive functions of (small) finite height (e.g., 4 for linear constant propagation) which contains Ax._l_ and is closed under composition. The transformer in D --+ D defined by f E 9r is given as
Clearly, If] is distributive, i.e., If] (~/1Uy~) = If] ~/1U If] Tie. Computing the summary functions for procedures in this framework boils down to solving a constraint system 8 over ~" where right-hand sides e are of the form:
e::=/Ixlfoxlx~o~l where f E Y. Since all functions in 9r are distributive, function composition o : ~-2 _~ ~is distributive in each argument. Thus, constraint system 8 is a special case of the constraint systems from example 1. Therefore, we can apply W R ~ to compute the least solution of S efficiently - provided operations "o" and "U" can be computed efficiently. Using a diff-function similar to the last one of Example 6, we obtain: T h e o r e m 8. If operations in T~ can be executed in time O(1), then the summary functions for program p according to interprocedural framework IDE can be computed by W R ~ in time C0([p[. [M[3). [] The complexity bound in Theorem 8 should be compared with O([p[. [M[ s) which can be derived for W R . By saving factor [M[ ~, we find the same complexity as has been obtained in [17] for a special purpose algorithm.
11
Control-Flow
Analysis
Control-flow analysis (cfa for short) is an analysis for higher-order functional languages possibly with recursive data types [13]. Cfa on program p tries to compute for every expression t occurring in p a superset of expressions into which t may evolve during program execution, see, e.g., [23, 24, 21, 26]. Let A denote the set of subexpressions of p and D - 2A. Then cfa for a lazy language as in [26] can be formalized through a constraint system 8 over domain D with set V of in variables y~, t E A, where righthand sides of constraints consist of expressions e of one of the following forms:
e ::= {a} Ix I (a e xl);x2
99 for a E A. Here, we view (a E xl);x2 as specification of ct (xl,C) where C d = 0 if a ~ d and C d = x2 otherwise. Let us assume set V of variables is just ordered by equality. Then S is not only monotonic but also distributive. As function dill, we simply choose set diffrence. With these definitions, algorithm W R n can be applied. Let us assume that the maximal cardinality of an occurring set is bounded by s _< IPl. Furthermore, let I denote the complexity of inserting a single element into a set of maximally s elements. In case, for example, we can represent sets as bit vectors, I = O(1). In case, the program is large but we nevertheless expect sets to be sparse we may use some suitable hashing scheme to achieve approximately the same effect. Otherwise, we may represent sets through balanced ordered trees giving extra cost I = O(log s). Cfa introduces O(IPl 2) constraints of the form y _D (a E xl); x2. Inorder to avoid creation of (a representation of) all these in advance, we introduce the following additional optimization. We start iteration with constraint system So lacking all constraints of this form. Instead, we introduce function r : V -+ D --+ 2. . . . traints which, depending on the value of variables, returns the set of constraints to be added to the present constraint system, r is given by:
r x d = {y D x2 la ~ d, y O_(a ~ x);x2 ~ 3} Thus especially, r x (dl Ud2) = (r x d l ) U (r x d2). Whenever variable x is incremented by A, we add all constraints from r x A to the current constraint system by inserting them into worklist W. For cfa, each application r x A can be evaluated in time O(IAI). Thus, if the cardinalities of all occurring sets is bounded by s, at most O(IPl . s) constraints of the form y D x are added to So. Each of these introduces an amount O(s. I) of work. Therefore, we obtain: T h e o r e m 9. If s is the maximal cardinality of occurring sets, the least solution of constraint system S for cfa on program p can be computed by the optimized W R n algorithm in time O(Ipl . 82 9 I ) . [] The idea of dynamic extension of constraint systems is especially appealing and clearly can also be cast in a more general setting. Here, it results in an efficient algorithm which is comparable to the one proposed by Heintze in [13].
12
G e n e r a l i z e d C o m p u t a t i o n Trees
In practical applications, certain subcomputations for right-hand sides t u r n out to be independent. For example, the values for a set of variables may be accessed in any order if it is just the least upper bound of returned values which matters. To describe such kinds of phenomena formally, we introduce set GT(V, D) of generalized computation trees (gct's for short). Gct's t are given by: t : : = d I x l S l (t,C) where S C GT(V, D) is finite and C : D --+ GT(V, D). Thus, we not only allow sets of computation trees but also (sets of) computation trees as selectors of computation trees. Given t e ~T(V, D), function It] : (V --+ D) --+ D implemented by t as well as set dep(t, _) of variables accessed during evaluation are defined by: [d] a ix I a
= d = a x
Is1 ~
= I I{Irt]l o It ~ s }
lit, C)] a = IV (It] a)] a
dep(d, a) dep(x, a) dep(S, a) dep((t, C), a)
= = -~ =
0 {x} U{dep(t, ~ ) l t e s } dep(t, a) U dep(C (It] a), a)
100
While sets of trees conveniently allow to make independence of subcomputations explicit (see our example application in section 13), nesting of trees into selectors eases the translation of deeper nesting of operator applications.
Example 10. Assume expressions e are given by the grammar: e
::= d l x
I fe t g(el,e2)
where d E D and f and g denote monotonic functions in D --~ D and D 2 --+ D, respectively. The gct t~ for e can then be constructed by: 9 t~ = e i f e E D U V ;
9 t~ = (te,,Ad.fd) i r e = re'; 9 te = (t~l, C) with C d l = (t~:, Ad2.g (dl, 42)) if e - g (el, e2).
[]
For partial ordering "<" on set V of variables, we define relation "_<" on gct's by: 9 _L _< t for every t; and dl < d2 if dl _ d2; 9 xl < x2 as gct's if also xl < x2 as variables; 9 $1 < $2 if for all tl E $1, tl _< t2 for some t2 E $2;
9 {tl,C1) < (t2,C2) if tl < t2 and for all dl E_ d2, C d l <_ Cd2. Now assume the right-hand sides of constraint system S all are given through gct's. Then S is called weakly monotonic iff for every xl < x2 and constraint Xl _~ tl in S some constraint x2 _~ t2 in ,9 exists such that tt < t2. With these extended definitions prop. 2, cor. 3 as well as Theorem 4 hold. Therefore, algorithm W R is also a solver for constraint systems where right-hand sides are represented by gct's. Function C in t = (t ~, C) can now only b e interpreted as a representation of the continuation where reevaluation of t has to start if the evaluation of subtree t ~ on a has terminated, t ~ again may be of the form (s, C ) . Consequently, we have to deal with lists 7 of continuations. Thus, whenever during evaluation of t an access to variable y occurs, we now have to add pairs (x, 7) to the infl-set of variable y. As in section 8, we therefore change the functionality of [.] by defining: [d Eo ' ~ / = d
IS] a ' ~
Ix] o' ~ = o' 7 x
[(~, c ) I o' 7 = i v (it] o' (c:~))] o'
= [ ~{[t]a " / [ t e S}
where 0 t 7 x = a x. The goal here is to avoid reevaluation of whole set S just because one of its elements has changed its value. Therefore, we propagate list .~/arriving at set S of trees immediately down to each of its elements. Now assume a y has changed its value by A. Then we add all triples (x,% A) to W where (x,'~) E infl y. Having extracted such a triple from the worklist, the new solver applies list ~ to the new value A. The iterative application process is implemented by: app ~ d 0' = d
app (C:7) d 0' = app 7 ( I t d] a ' -/) 0'
The resulting value then gives the new contribution to the value of a for x. As for W R a for ordinary evaluation trees, infl-sets cannot be emptied after use. Carrying the definition of distributivity from section 9 over to gct's, we obtain: T h e o r e m 11. For distributive ,9 with gct's as right-hand sides, algorithm W R ~ computes an X-stable partial least solution. [] As an advanced application of gct's, we consider abstract 0LDT-resolution [15, 16, 25].
13
A b s t r a c t 0LDT-Resolution
Given a Prolog program p, abstract OLDT-resolution tries to compute for every program point x the set of (abstract) values arriving at x. Let A denote the set of possible values. The lattice D to compute in is then given by D = 2^A. Coarse-grained analysis assigns to each predicate an abstract state transformer A → D, whereas fine-grained analysis additionally assigns transformers also to every program point [15]. Instead of considering transformers as a whole (as, e.g., in the algorithm for framework IDE in section 10), a transformer-valued variable x is replaced by a set of variables, namely x a, a ∈ A, where x a represents the return value of the transformer for x on input a. Thus, each variable x a potentially receives a value from D. The idea is that, in practice, the set A may be tremendously large, while at the same time each transformer is called only on a small number of inputs. Thus, in this application we explicitly rely on demand-driven exploration of the variable space. To every transformer variable x the analysis assigns a finite set of constraint schemes x • ⊒ e where • formally represents the argument to the transformer, and e is an expression built up according to the following grammar (see [25]):
e ::= s • | g f e
f ::= λa.s(g a) | x | λa.(x(g a) □ a)
Here, s : A → D denotes the singleton map defined by s a = {a}, and g : (A → D) → D → D denotes the usual extension function defined by g f d = ∪{f a | a ∈ d}. Thus, expressions e are built up from s • = {•} by successive applications of extensions g f for three possible types of functions f : A → D. Unary functions g : A → A are used to model basic computation steps, passing of actual parameters into procedures and returning of results, whereas binary operators □ : D × A → D conveniently allow us to model local states of procedures. They are used to combine the set of return values of procedures with the local state before the call [25]. In the case of fine-grained analysis, every scheme for right-hand sides contains at most two occurrences of "g", whereas coarse-grained analysis possibly introduces also deeper nesting. The set of constraints for variable x a, a ∈ A, is obtained from the set of constraint schemes for x by instantiating • with the actual parameter a. The resulting right-hand sides can be implemented through gct's. For d ∈ D, gct t_d is of the form t_d = d or t_d = (S_d, C) such that C d' returns some tree t'_d', and S_d = {s_a | a ∈ d}. The forms for the elements s_a of S_d correspond to the three possible forms for f in expressions g f e, i.e.,
s_a ::= s (g a) | x a | (x (g a), λd'.d' □ a)
Constraint system S for abstract OLDT-resolution then turns out to be monotonic as well as distributive. As operator diff, we choose diff(d1,d2) = d2\d1. Therefore, we can apply algorithm WR_Δ. Let us assume that applications g a and insertions into sets can be executed in time O(1), whereas evaluation of d □ a needs time O(#d). Then:

Theorem 12. Fine-grained OLDT-resolution for program p with abstract values from A can be executed by WR_Δ in time O(N · s²) where N ≤ |p| · #A is the number of considered variables, and s ≤ #A is the maximal cardinality of occurring sets. □

WR_Δ saves an extra factor O(s²) over solver WR. An algorithm with similar savings has been suggested by Horwitz et al. [18]. Their graph-based algorithm, however, is neither able to treat binary operators nor deeper nested right-hand sides as ours is. In the usual application to program analysis, A is equipped with some partial abstraction ordering "⊑", implying that a set d ⊆ A contains the same information as its lower closure d↓ = {a ∈ A | ∃a' ∈ d : a ⊑ a'}. In this case, we may decide to compute
with lower subsets of A right from the beginning [15, 25]. Here, a subset d ⊆ A is called lower iff d = d↓. If all occurring functions f as well as operators □ are monotonic, then we can represent lower sets d by their maximal elements and do all computations just with such anti-chains. The resulting constraint system then turns out to be no longer monotonic. However, it is still weakly monotonic w.r.t. the variable ordering "≤" given by x1 a1 ≤ x2 a2 iff x1 = x2 and a1 ⊑ a2. As function diff for anti-chains, we choose
diff(d1,d2) = d2\(d1↓) = {a2 ∈ d2 | ∀a1 ∈ d1 : a2 ⋢ a1}
Again, we can apply the differential worklist algorithm. Here, we found no comparable algorithm in the literature. WR_Δ beats conventional solvers for this application (see section 14). A simple estimation of the runtime complexity, however, is no longer possible, since even large sets may have succinct representations by short anti-chains.
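A hedged OCaml sketch of this diff for anti-chains, with the abstraction ordering passed in (names ours):

    (* diff for anti-chains: keep those elements of d2 that are not
       below (or equal to) any element of d1, i.e., d2 \ (d1 lowered). *)
    let diff_antichain ~leq (d1 : 'a list) (d2 : 'a list) : 'a list =
      List.filter
        (fun a2 -> not (List.exists (fun a1 -> leq a2 a1) d1))
        d2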
14 Practical Experiments
We have adapted the fastest general-purpose equation solver from [12], namely WDFS (for distinction called WDFS^Equ here), to constraint systems, giving the general-purpose constraint solver WDFS^Con. Solver WDFS^Con is similar to solver WR, but additionally maintains priorities on variables and, before returning from an update of σ for variable x, evaluates all constraints y ⊒ t from the worklist where y has a higher priority than the variable below x (cf. [12]). To solver WDFS^Con we added propagation of differences (the "Δ") in the same way as we added propagation of differences to solver WR in section 9, giving WDFS^Δ. All fixpoint algorithms have been integrated into GENA [10, 11]. GENA is a generator for Prolog program analyzers written in SML. We generated analyzers for abstract OLDT-resolution for PS+POS+LIN, which is Søndergaard's pair-sharing domain enhanced with POS for inferring groundness [27, 7]. Its abstract substitutions are pairs of bdd's and graphs over variables. Thus, we maintain anti-chains of such elements. The generated analyzers were run on large real-world programs. aqua-c (about 560KB) is the source code of an early version of Peter van Roy's Aquarius Prolog compiler. chat (about 170KB) is David H.D. Warren's chat-80 system. The numbers reported in Table 1 are the absolute runtimes in seconds (including system time) obtained for SML-NJ, version 109.29, on a Sun 20 with 64 MB main memory.

Comparing the three algorithms for OLDT-resolution, we find that all of them still have quite acceptable runtimes (perhaps with the exception of aqua-c), where algorithm WDFS^Δ almost always outperforms the others. Compared with equation solver WDFS^Equ, algorithm WDFS^Δ saves approximately 40% of the runtime, where usually less than half of the gain is obtained by maintaining constraints. The maximal relative gain of 48% could be observed for program readq, where no advantage at all could be drawn out of constraints. Opposed to that, for Stefan Diehl's interpreter for action semantics, action, propagation of differences did not give (significantly) better numbers than considering constraints alone; program ann even showed a (moderate) decrease in efficiency (factor 3.32). Also opposed to the general picture, constraints for aqua-c resulted in an improvement of 25% only, of which 9% was lost through propagation of differences! This slowdown is even more surprising since it could not be confirmed with analyzer runs on aqua-c for other abstract domains. Table 2 reports the runtimes found for aqua-c on domain CompCon. Abstract domain CompCon analyzes whether variables are bound to atoms or are composite. For CompCon, constraints introduced a slowdown of 18%, whereas propagation of differences resulted in a gain of efficiency of 38% over WDFS^Equ.
program          WDFS^Equ   WDFS^Con   WDFS^Con,∆
action.pl           32.97      19.37       19.35
ann.pl               1.36       1.62        4.52
aqua-c.pl         1618.00    1209.00     1361.00
b2.pl                2.41       2.14        1.82
chat.pl             77.7       67.94       53.14
chat-parser.pl      29.62      27.72       17.28
chess.pl             0.40       0.38        0.37
flatten.pl           0.36       0.34        0.26
nand.pl              0.47       0.38        0.32
readq.pl            14.96      15.09        7.85
sdda.pl              0.59       0.50        0.73
Table 1. Comparison of WDFS^Equ, WDFS^Con, and WDFS^Con,∆ with PS+POS+LIN.
program      WDFS^Equ   WDFS^Con   WDFS^Con,∆
aqua-c.pl      214.67     252.43      133.61
Table 2. Comparison of WDFS^Equ, WDFS^Con, and WDFS^Con,∆ with CompCon.

15 Conclusion
We have succeeded in giving an application-independent exposition of two further improvements to worklist-based local fixpoint algorithms. This allowed us not only to exhibit a common algorithmic idea in seemingly different fast special-purpose algorithms, like the one of Horwitz et al. for the interprocedural framework IDE [17] or Heintze's algorithm for control-flow analysis [13]. Our exposition furthermore explains how such algorithms can be practically improved -- namely by incorporating recursive descent into variables as well as timestamps [12]. Finally, our approach allowed us to develop completely new efficient algorithms for abstract OLDT-resolution.
References

1. M. Alt and F. Martin. Generation of Efficient Interprocedural Analyzers with PAG. In 2nd SAS, 33-50. LNCS 983, 1995.
2. I. Balbin and K. Ramamohanarao. A Generalization of the Differential Approach to Recursive Query Evaluation. JLP, 4(3):259-262, 1987.
3. F. Bancilhon and R. Ramakrishnan. An Amateur's Introduction to Recursive Query Processing Strategies. In ACM SIGMOD Conference 1986, 16-52, 1986.
4. F. Bancilhon and R. Ramakrishnan. Performance Evaluation of Data Intensive Logic Programs. In Jack Minker, editor, Foundations of Deductive Databases and Logic Programming, chapter 12, 439-517. Morgan Kaufmann Publishers, 1988.
5. B. Le Charlier and P. Van Hentenryck. A Universal Top-Down Fixpoint Algorithm. Technical Report CS-92-25, Brown University, Providence, RI 02912, 1992.
6. B. Le Charlier and P. Van Hentenryck. Experimental Evaluation of a Generic Abstract Interpretation Algorithm for Prolog. ACM TOPLAS, 16(1):35-101, 1994.
7. M. Codish, D. Dams, and E. Yardeni. Derivation of an Abstract Unification Algorithm for Groundness and Aliasing Analysis. In ICLP, 79-93, 1991.
8. P. Cousot and R. Cousot. Static Determination of Dynamic Properties of Recursive Programs. In E.J. Neuhold, editor, Formal Descriptions of Programming Concepts, 237-277. North-Holland Publishing Company, 1977.
9. V. Englebert, B. Le Charlier, D. Roland, and P. Van Hentenryck. Generic Abstract Interpretation Algorithms for Prolog: Two Optimization Techniques and their Experimental Evaluation. SPE, 23(4):419-459, 1993.
10. C. Fecht. GENA - A Tool for Generating Prolog Analyzers from Specifications. 2nd SAS, 418-419. LNCS 983, 1995.
11. C. Fecht. Abstrakte Interpretation logischer Programme: Theorie, Implementierung, Generierung. PhD thesis, Universität des Saarlandes, Saarbrücken, 1997.
12. C. Fecht and H. Seidl. An Even Faster Solver for General Systems of Equations. 3rd SAS, 189-204. LNCS 1145, 1996. Extended version to appear in SCP'99.
13. N. Heintze. Set-Based Analysis of ML Programs. ACM Conf. LFP, 306-317, 1994.
14. N. Heintze and D.A. McAllester. On the Cubic Bottleneck in Subtyping and Flow Analysis. IEEE Symp. LICS, 342-351, 1997.
15. P. Van Hentenryck, O. Degimbe, B. Le Charlier, and L. Michel. Abstract Interpretation of Prolog Based on OLDT Resolution. Technical Report CS-93-05, Brown University, Providence, RI 02912, 1993.
16. P. Van Hentenryck, O. Degimbe, B. Le Charlier, and L. Michel. The Impact of Granularity in Abstract Interpretation of Prolog. 3rd WSA, 1-14. LNCS 724, 1993.
17. S. Horwitz, T.W. Reps, and M. Sagiv. Precise Interprocedural Dataflow Analysis with Applications to Constant Propagation. 6th TAPSOFT, 651-665. LNCS 915, 1995.
18. S. Horwitz, T.W. Reps, and M. Sagiv. Precise Interprocedural Dataflow Analysis via Graph Reachability. 22nd POPL, 49-61, 1995.
19. D. Melski and T.W. Reps. Interconvertibility of Set Constraints and Context-Free Language Reachability. ACM SIGPLAN Symp. PEPM, 74-89, 1997.
20. K. Muthukumar and M.V. Hermenegildo. Compile-Time Derivation of Variable Dependency Using Abstract Interpretation. JLP, 13(2&3):315-347, 1992.
21. H. Riis Nielson and F. Nielson. Infinitary Control Flow Analysis: A Collecting Semantics for Closure Analysis. 24th POPL, 332-345, 1997.
22. R. Paige. Symbolic Finite Differencing - Part I. 3rd ESOP, 36-56. LNCS 432, 1990.
23. J. Palsberg. Closure Analysis in Constraint Form. ACM TOPLAS, 17:47-82, 1995.
24. J. Palsberg and P. O'Keefe. A Type System Equivalent to Flow Analysis. ACM TOPLAS, 17:576-599, 1995.
25. H. Seidl and C. Fecht. Interprocedural Analysis Based on PDAs. Technical Report 97-06, University of Trier, 1997. Extended Abstract in: Verification, Model Checking and Abstract Interpretation. A Workshop in Association with ILPS'97.
26. H. Seidl and M.H. Sørensen. Constraints to Stop Higher-Order Deforestation. 24th POPL, 400-413, 1997.
27. H. Søndergaard. An Application of Abstract Interpretation of Logic Programs: Occur Check Reduction. 1st ESOP, 327-338. LNCS 213, 1986.
Reasoning about Classes in Object-Oriented Languages: Logical Models and Tools

ULRICH HENSEL¹  MARIEKE HUISMAN²  BART JACOBS²  HENDRIK TEWS¹
¹ Inst. Theor. Informatik, TU Dresden, D-01062 Dresden, Germany. {hensel,tews}@tcs.inf.tu-dresden.de
² Dep. Comp. Sci., Univ. Nijmegen, P.O. Box 9010, 6500 GL Nijmegen, The Netherlands. {marieke,bart}@cs.kun.nl
Abstract. A formal language CCSL is introduced for describing specifications of classes in object-oriented languages. We show how class specifications in CCSL can be translated into higher order logic. This allows us to reason about these specifications. In particular, it allows us (1) to describe (various) implementations of a particular class specification, (2) to develop the logical theory of a specific class specification, and (3) to establish refinements between two class specifications. We use the (dependently typed) higher order logic of the proof-assistant PVS, so that we have extensive tool support for reasoning about class specifications. Moreover, we describe our own front-end tool to PVS, which generates from CCSL class specifications appropriate PVS theories and proofs of some elementary results.
1 Introduction
During the last two decades, object-orientation has established itself in analysis, design and programming. At this moment, C++ and JAVA are probably the most popular object-oriented programming languages. Despite this apparent success, relatively little work has been done on formal (logical) methods for object-oriented programming. One of the reasons, we think, is that there is no generally accepted formal computational model for object-oriented programming. Such a model is needed as a domain of reasoning. One such formal model has recently emerged in the form of "coalgebras" (explicitly e.g. in [21]). It should be placed in the tradition of behavioural specification, see also [6, 8, 4]. Coalgebras are the formal duals of algebras, see [14] for background information. They consist of a (hidden) state space--typically written as Self in this context--together with several operations (or methods) acting on Self. These operations may be attributes giving some information about objects (the elements of Self), or they may be procedures for modifying objects. All access to elements of Self should go via the operations of the coalgebra. In contrast, elements of abstract data types represented as algebras can only be built via the "constructor" operations (of the algebra). We consider coalgebras together with initial states as classes, and elements of the carrier Self of a coalgebra as (states of) objects of the class.
For verification purposes involving coalgebraic classes and objects we are interested in the observable behavior and not in the concrete representation of elements of Self. A behavior of an object in this context is the object's reaction pattern, i.e. what we can observe via the attributes after performing internal computations triggered by pressing procedure buttons. This naturally leads to notions like bisimilarity (indistinguishability of objects via the available operations) and invariance. Based on coalgebras, a certain format has been developed for class specifications, see [21, 11, 10]. This format typically consists of three sections, describing the class specification's methods, assertions, and creation-conditions--which hold for newly created objects. We have developed this format into a formal language CCSL, for Coalgebraic Class Specification Language, which will be sketched below. Ad hoc representations of these class specifications in the higher order logic of the proof-tool PVS [18, 17] have been used in [12, 13] to reason about such classes--notably for refinement arguments. Further experiments with formal reasoning about classes and objects have led us to a general representation of CCSL class specifications in higher order logic. Below we explain this model (in the logic of PVS), and also a (preliminary version of a) front-end tool that we use for generating such models from class specifications. The code for this tool (called LOOP, for Logic of Object-Oriented Programming) is written in OCAML [22]. It basically performs three consecutive steps: it first translates a CCSL class specification into some representation in OCAML; this representation is then internally analysed and finally transformed into PVS theories and proofs. The generated PVS file contains several theories describing the representation of the class specification, via appropriate definitions and associated lemmas (e.g. stating that bisimilarity is an equivalence relation). Another file that is generated by our tool contains instructions for proofs of the lemmas in the PVS file. The architecture of our tool allows for easy extensions, e.g. to accept JAVA [7] or EIFFEL [16] classes, or to generate files for other proof assistants such as ISABELLE [19]. The diagram below describes (via the solid lines) what our tool currently does, see Section 5 for some more details. The dashed lines indicate possible future extensions.
[Diagram: CCSL class specifications are lexed and parsed into a representation in OCAML, from which PVS theories & proofs are generated (solid lines); dashed lines indicate possible future extensions, e.g. input from programming-language classes and output of ISABELLE theories & proofs.]
The idea behind the dashed lines on the left is that classes in actual programming languages should lead to executable class specifications, about which one can
reason. We have made some progress--which will be reported elsewhere--in reasoning about JAVA classes in this setting. Here we concentrate on the upper solid lines. The paper is organised as follows. We start in Sections 2 and 3 with an elaborate discussion of two examples, involving a class specification of a register in which one can store data at addresses, and a subclass specification of a bounded register, in which writing is only allowed if the register is not full. This involves overriding of the original write method. Then, in Section 4 we discuss some further aspects of the way that we model class specifications and that we reason about them. Finally, in Section 5, we describe the current stage of the implementation of our front-end tool. We shall be using the notation of PVS's higher order logic when we describe our models. It is largely self-explanatory, and any non-standard aspects will be explained as we proceed. The LOOP front-end tool that will be described in Section 5 is still under development. Currently, it does the basics of the translation from CCSL class specifications to PVS theories and proofs (without any fancy GUI). It may take some time before it reaches a stable form. Instead of elaborating implementation details, this paper focuses on the basic ideas of our models.
2 A simple register: specification and modeling
We start by considering a simple register, which can store data at an address. It contains read, write and erase operations, satisfying some obvious requirements. It is described in our coalgebraic class specification language CCSL in Figure 1. The type constructor A ↦ Lift[A] adds a bottom element 'bot' to a type A, and keeps all elements a from A as up(a). A total function B -> Lift[A] may then be identified with a partial function B -> A. We use square brackets [A1, ..., An] for the Cartesian product A1 x ... x An.

Begin Register[ Data : Type, Address : Type ] : ClassSpec
Method
  read : [Self, Address] -> Lift[Data];
  write : [Self, Address, Data] -> Self;
  erase : [Self, Address] -> Self
Assertion
  read_write : PVS FORALL(a,b : Address, d : Data) :
    read(write(x, a, d), b) = IF a = b THEN up(d) ELSE read(x, b) ENDIF ENDPVS
  read_erase : PVS FORALL(a,b : Address) :
    read(erase(x, a), b) = IF a = b THEN bot ELSE read(x, b) ENDIF ENDPVS
Constructor
  new : Self
Creation
  read_new : PVS FORALL(a : Address) : read(new, a) = bot ENDPVS
End Register

Fig. 1. A register class specification in CCSL

The types Data and Address are parameters in this specification, which can be suitably instantiated in a particular situation. What is coalgebraic about such specifications is that all methods act on Self, i.e. have Self as one of their
input types¹. Usually this type Self is not written explicitly in object-oriented languages, but it is there implicitly as the receiver of method invocations. The constructor section declares new as a constructor (without parameters) for creating a new register. Notice that assertions and creation-conditions have names. The PVS and ENDPVS tags are used to delimit strings, which are basically boolean expressions in PVS². It is assumed that x is a variable in Self. In order to reason (with PVS tool support) about such a Register class specification, we first model it in the higher order logic of PVS. This is what our LOOP tool does automatically. It generates several PVS theories to capture this specification. Space restrictions prevent us from discussing all these theories in detail, so we concentrate on the essentials. The first step is to introduce a (single) type which captures the interface of a class specification, via a labeled product. For Register, this is done in the following PVS theory.

RegisterInterface[ Self, Data, Address : TYPE ] : THEORY
BEGIN
  RegisterIFace : TYPE =
    [# read : [Address -> Lift[Data]],
       write : [Address, Data -> Self],
       erase : [Address -> Self] #]
END RegisterInterface
The square brace notation [A1, ..., An -> B] is used in PVS for the type of (total) functions with n inputs from A1, ..., An and with result in B. Notice that in the types of the operations in this interface the input type Self is omitted³. This is intended: a crucial step in our approach is that we use coalgebras of the form

c : [Self -> RegisterIFace[Self, Data, Address]]

as models of the method section of the Register specification, with Self as state space. The individual methods can be extracted from such a coalgebra c via the definitions:

read(c) : [Self, Address -> Lift[Data]] =
  LAMBDA(x : Self, a : Address) : read(c(x))(a)
write(c) : [Self, Address, Data -> Self] =
  LAMBDA(x : Self, a : Address, d : Data) : write(c(x))(a, d)
: [Self, Address -> Self] = : Self, a : Address) : erase(c(x))(a)
LAMBDA(x
Thus the individual methods of a class can be extracted from such a single coalgebra c.

¹ A bit more precisely, the methods can all be written, possibly using currying, in the form Self -> Fi(Self); and they can be combined into a single operation Self -> F1(Self) x ... x Fn(Self).
² Our front-end tool simply passes the string in PVS ... ENDPVS on to the PVS tool, where it is parsed and typechecked.
³ Categorically, the type RegisterIFace captures the functor associated with the signature of operations in the class specification, see [14].
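Before turning to invariants and bisimulations, it may help to see the shape of this coalgebraic encoding outside PVS. The following OCaml fragment is our own rough analogue (the type and function names are ours, not the paper's code; Lift is rendered as a variant type):

  type 'a lift = Bot | Up of 'a   (* Lift[A]: A plus a bottom element *)

  (* The interface: a record whose fields no longer mention the state
     type on the input side; a coalgebra maps a state to such a record. *)
  type ('self, 'data, 'addr) register_iface = {
    read  : 'addr -> 'data lift;
    write : 'addr -> 'data -> 'self;
    erase : 'addr -> 'self;
  }
  type ('self, 'data, 'addr) coalgebra =
    'self -> ('self, 'data, 'addr) register_iface

  (* Extracting the individual methods from a single coalgebra c: *)
  let read c x a = (c x).read a
  let write c x a d = (c x).write a d
  let erase c x a = (c x).erase a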
Next, our formalisation deals with invariants and bisimulations. These are special kinds of predicates and relations on Self which are suitably closed under the above operations. For example, an invariant P ⊆ Self with respect to a Register coalgebra c satisfies, by definition:
  P(x) ⟹ ∀a : Address, d : Data. P(write(c)(x, a, d))
        ∧ ∀a : Address. P(erase(c)(x, a)).
A bisimulation w.r.t. c is a relation R ⊆ Self × Self satisfying:

  R(x, y) ⟹ ∀a : Address. read(c)(x, a) = read(c)(y, a)
          ∧ ∀a : Address, d : Data. R(write(c)(x, a, d), write(c)(y, a, d))
          ∧ ∀a : Address. R(erase(c)(x, a), erase(c)(y, a)).
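These closure clauses can be tested directly on finite data. Continuing the rough OCaml analogue from above (ours, not the paper's development, which works with arbitrary types in higher order logic), the bisimulation clauses become:

  let implies p q = (not p) || q

  (* Check the bisimulation clauses for relation r at a pair of states
     (x, y), for the finite lists of addresses and data supplied. *)
  let bisim_clauses_hold c addrs datas r x y =
    implies (r x y)
      (List.for_all (fun a -> (c x).read a = (c y).read a) addrs
       && List.for_all (fun a ->
            List.for_all
              (fun d -> r ((c x).write a d) ((c y).write a d)) datas) addrs
       && List.for_all (fun a -> r ((c x).erase a) ((c y).erase a)) addrs)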
Bisimilarity bisim? is then the greatest bisimulation relation. Interestingly, these notions of invariant and bisimulation are completely determined by the class interface RegisterIFace. They are generated automatically (by our tool) by induction on the structure of the types in the interface, based on liftings of these types to predicates and relations, as introduced in [9] (see also [13]). These PVS theories about invariants and bisimulations contain several standard lemmas (stating e.g. that invariants are closed under finite conjunctions ∧ and universal quantification ∀), for which proof instructions are generated automatically (again using induction). The next theory RegisterSemantics deals with the assertions and creation-conditions. The two assertions in Figure 1 are translated into two predicates on the carrier type Self of a Register coalgebra c : [Self -> RegisterIFace[Self, Data, Address]]. Assuming that x is a variable of type Self, we generate:

read_write?(c)(x) : bool =
  FORALL(a, b : Address, d : Data) :
    read(c)(write(c)(x, a, d), b) =
      IF a = b THEN up(d) ELSE read(c)(x, b) ENDIF
read_erase?(c)(x) : bool =
  FORALL(a, b : Address) :
    read(c)(erase(c)(x, a), b) =
      IF a = b THEN bot ELSE read(c)(x, b) ENDIF

For convenience, these predicates are collected in a single predicate:

RegisterAssert?(c) : bool =
  FORALL(x : Self) : read_write?(c)(x) AND read_erase?(c)(x)

Similarly, we put the creation-condition in a predicate:

RegisterCreate?(c) : PRED[Self] =
  {x : Self | FORALL(a : Address) : read(c)(x, a) = bot}

At this stage we are able to say what actually constitutes a class implementation that satisfies a class specification as in Figure 1: it is a coalgebra c : [Self -> RegisterIFace[Self, Data, Address]] satisfying the predicate RegisterAssert?, together with some element new : Self satisfying the predicate RegisterCreate?(c). This is formalised in the following theory using a (dependent!) labeled product.
RegisterClassStructure[Self, Data, Address : TYPE] : THEORY
BEGIN
  IMPORTING RegisterSemantics[Self, Data, Address]
  RegisterClass : TYPE =
    [# clg : (RegisterAssert?),
       new : (RegisterCreate?(clg)) #]
END RegisterClassStructure
The notation (P) for a predicate P : [A -> bool] on A : TYPE is used in PVS as an abbreviation for the predicate subtype {x : A | P(x)}. A class thus consists of a state space Self with appropriate operations (combined in a coalgebra clg on Self) and with an appropriate constructor new. An object of such a class is then simply an inhabitant of the state space Self. Thus, in the way that we model classes and objects, the methods are part of the class, and not of the object. This is called the delegation implementation, in contrast to the embedding implementation, where the operations are part of the object, see [1, Sections 2.1 and 2.2]. Once we have all this settled, we can start reasoning about the class specification. The two things we can do immediately are: (1) describing an implementation of the specification, and (2) developing its theory. Both are user tasks: the tool only provides theory frames which the user can fill in. We give a sketch of what can be done. As to (1), it is a wise strategy to write out an implementation immediately after finishing the specification. It is notoriously hard to write "good" specifications which capture the informal description of the matter in question and, at the same time, are consistent in the logic used. This is sometimes called the "ground problem". Usually, specialists have a good understanding of a particular implementation. Once this implementation is formally written out it can be checked against the assertions and creation-conditions. For example, for the Register class specification, an obvious implementation describes registers as partial functions from addresses to data. This can be done via the Lift[-] type constructor, and yields as state space:

FunctionSpace : TYPE = [Address -> Lift[Data]]

This type can be equipped with a suitable coalgebra structure and a constructor:

c : [FunctionSpace -> RegisterIFace[FunctionSpace, Data, Address]] =
  LAMBDA(f : FunctionSpace) :
    (# read  := LAMBDA(a : Address) : f(a),
       write := LAMBDA(a : Address, d : Data) : f WITH [(a) := up(d)],
       erase := LAMBDA(a : Address) : f WITH [(a) := bot] #)

new : FunctionSpace = LAMBDA(a : Address) : bot

(The notation g WITH [(y) := z] is an abbreviation for LAMBDA x : IF x = y THEN z ELSE g(x) ENDIF.) This coalgebra structure on the state space FunctionSpace clearly captures our intuition, and it is not hard to prove that both propositions RegisterAssert?(c) and RegisterCreate?(c)(new) hold. Actually, PVS can prove both of them with a single command, (GRIND).
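For comparison, the same implementation in the rough OCaml analogue used earlier (again ours; it assumes the register_iface record from above):

  (* A register as a total function from addresses to lifted data. *)
  let func_space : ('addr -> 'data lift, 'data, 'addr) coalgebra =
    fun f -> {
      read  = (fun a -> f a);
      write = (fun a d -> fun b -> if b = a then Up d else f b);
      erase = (fun a -> fun b -> if b = a then Bot else f b);
    }

  let new_register : 'addr -> 'data lift = fun _ -> Bot

Evaluating (func_space new_register).read a yields Bot for every address a, which is exactly the creation-condition read_new.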
Of course, we can also define other implementations. For example, one can define an implementation in which the sequence of operations applied to an object is recorded for each address. This can be done by taking as state space:

HistorySpace : TYPE = [Address -> list[Lift[Data]]]

The implementation of the methods and constructor on this state space is left to the interested reader. Again, (GRIND) in PVS proves that the assertions and creation-conditions hold (for our implementation). When class specifications are used as components in other classes (e.g. via class-valued attributes, see Section 4) we need a model for them. Obvious choices for a model are (1) an arbitrary, so-called "loose" model and (2) a final model. Both are generated. Once we know that our class specification has a non-trivial model (and hence that it is consistent) we can safely postulate the existence of a loose model. A final model enables the use of subclasses for components, but its existence is an open question in the presence of binary methods. Due to a lack of space, only the loose model is described here. It has the following form.

LooseRegisterClass[Data, Address : TYPE] : THEORY
BEGIN
  LooseRegisterType : TYPE
  IMPORTING RegisterClassStructure[LooseRegisterType, Data, Address]
  loose_Register_existence : AXIOM EXISTS(cl : RegisterClass) : TRUE
  LooseRegisterClass : RegisterClass
END LooseRegisterClass
In this theory the existence of an arbitrary model of the class specification is guaranteed via an axiom. In principle this can be dangerous, because it may lead to inconsistencies. However, as long as a non-trivial implementation has been given (earlier) by hand, there is no such danger. The type LooseRegisterType in this theory is simply postulated, and we know nothing about its internal structure. This ensures that when this model is used as a component in another class, no internal details can be accessed (simply because there are no such details). We turn to the second way to reason about a (translated) specification. Our tool generates an almost empty PVS theory frame called RegisterUserTheory. This theory starts by declaring a coalgebra structure c satisfying the predicate RegisterAssert?, together with a constructor satisfying the creation-condition RegisterCreate?(c). Under these assumptions a user can start proving various logical consequences of the assertions in the class specification. For example, a useful proposition that can be proved in RegisterUserTheory is the following characterisation of bisimilarity.

bisim_char : LEMMA
  bisim?(c)(x, y) IFF
    FORALL(a : Address) : read(c)(x, a) = read(c)(y, a)
This expresses that two objects (or states) x, y : Self are bisimilar (i.e. indistinguishable) w.r.t. the assumed (arbitrary) model c if and only if they give the same read output at each address. Intuitively this may be clear: if we cannot see
a difference between two objects via reading, then using a write or erase will not create a difference between these objects (because a read after a write or erase is completely determined by the Register assertions). Using this characterization, it is easy to prove, for example,

write_commutation : LEMMA
  FORALL(a, b : Address, d, e : Data) : a /= b IMPLIES
    bisim?(c)(write(c)(write(c)(x, a, d), b, e),
              write(c)(write(c)(x, b, e), a, d))
This result says that one can exchange write operations at different addresses. Notice that we are careful in only stating that the outcomes are bisimilar, and not necessarily equal. We avoid the use of equality of objects/states, since we regard these as hidden, and we restrict access to (public) methods. In addition, the use of bisimilarity entails that the results that we prove also hold in implementations where bisimilar states need not be (internally) equal, like in the above HistorySpace model. There we can have equal reads at all addresses in two states, even though the histories of these states are quite different. Hence such states are bisimilar, but internally different. At the end, it may be instructive to compare this coalgebraic way of combining methods with the approach taken in [1] (explicitly e.g. in Section 8.5.2). There the methods of a class are combined in a slightly different manner, namely in a labeled product, called "trait type":

RegisterTrait = [# read : Self -> [Address -> Lift[Data]],
                   write : Self -> [Address, Data -> Self],
                   erase : Self -> [Address -> Self] #]

What we do is basically the same, except that our methods are combined "coalgebraically", with the common input type Self on the outside. What is called a "class type" in [1] is such a "trait type" together with a constructor new, see the RegisterClass type above. Thus, when it comes to interfaces, there is no real difference between our approach and the one in [1]. But we go further in two essential ways: (a) we restrict the methods and constructors so that they satisfy certain requirements (given in the assertions and creation-conditions in the specification), and (b) we (automatically) generate appropriate notions of invariance and bisimilarity for (the interface of) each class specification, and use them systematically in reasoning about these specifications.
3 A bounded register: inheritance and overriding
Having described an implementation for the Register class specification--and developed part of its theory--we now introduce a new class specification BoundedRegister by inheritance. A bounded register is a subclass of a register, which overrides the write operation and defines a new attribute count. A bounded register can only store a limited number of data elements, and the count attribute is
Begin BoundedRegister[ Data : Type, Address : Type, n : nat ] : ClassSpec
Inherit from Register[Data, Address]
Method
  write : [Self, Address, Data] -> Self;
  count : Self -> nat
Assertion
  override_write_def : PVS FORALL(a : Address, d : Data) :
    bisim?(write(x, a, d),
           IF count(x) < n OR up?(read(x, a))
           THEN super_write(x, a, d) ELSE x ENDIF) ENDPVS
  count_super_write : PVS FORALL(a : Address, d : Data) :
    count(super_write(x, a, d)) =
      IF bot?(read(x, a)) THEN count(x) + 1 ELSE count(x) ENDIF ENDPVS
  count_erase : PVS FORALL(a : Address) :
    count(erase(x, a)) =
      IF bot?(read(x, a)) THEN count(x) ELSE max(0, count(x) - 1) ENDIF ENDPVS
Constructor
  new : Self
Creation
  count_new : PVS count(new) = 0 ENDPVS
End BoundedRegister

Fig. 2. A bounded register class specification in CCSL
used to keep track of how much data is currently stored. When the bounded register is full (i.e. when its count has reached the number n given as a parameter), a write operation does not have any effect; otherwise it acts as the write operation from the superclass Register. Further, the read and erase operations from Register are used without modification. A CCSL class specification of a bounded register is given in Figure 2. The predicates bot? and up? on Lift[Data] tell us whether an element x : Lift[Data] is bot or up(d), for some d : Data. Again, our tool generates several PVS theories from this specification. This section will discuss the essential consequences the use of inheritance (in combination with overriding) has on the generated theories. We model inheritance by letting the interface of the BoundedRegister not only contain the operations write and count, but also the superclass as a field (super_Register). This enables access to the methods of the superclass.

BoundedRegisterIFace : TYPE =
  [# super_Register : RegisterIFace[Self, Data, Address],
     write : [Address, Data -> Self],
     count : nat #]

Now we provide access not only to the individual methods of the BoundedRegister class but also to the methods from the superclass, via the following
definitions.

c : VAR [Self -> BoundedRegisterIFace[Self, Data, Address]]

super_Register(c) : [Self -> RegisterIFace[Self, Data, Address]] =
  LAMBDA(x : Self) : super_Register(c(x))
read(c) : [[Self, Address] -> Lift[Data]] =
  LAMBDA(x : Self, a : Address) : read(super_Register(c(x)))(a)
super_write(c) : [[Self, Address, Data] -> Self] =
  LAMBDA(x : Self, a : Address, d : Data) : write(super_Register(c(x)))(a, d)
write(c) : [[Self, Address, Data] -> Self] =
  LAMBDA(x : Self, a : Address, d : Data) : write(c(x))(a, d)
erase(c) : [[Self, Address] -> Self] =
  LAMBDA(x : Self, a : Address) : erase(super_Register(c(x)))(a)
count(c) : [Self -> nat] =
  LAMBDA(x : Self) : count(c(x))

Via these explicit definitions, all methods of superclasses can be used in subclasses. The number of such definitions may be considerable when there are high inheritance trees, but our tool generates all of them automatically. In fact, this is one of the reasons for developing such a tool. The write operation in the subclass specification in Figure 2 also occurs in the superclass. This double occurrence is used to signal overriding. Our tool recognizes it, and generates as a result two write operations: a "direct" one from the current subclass (simply called write) and an "indirect" one from the superclass (called super_write). Notice that the coalgebra c--used as a variable in this theory--combines both the structure of the subclass and the superclass. The theories about invariants and bisimulations are generated incrementally, i.e. they extend the predicates and relations on Register with appropriate clauses for the additional methods of the subclass. The assertions and creation-conditions of BoundedRegister are translated into PVS predicates, just as in the Register example. The resulting predicate BoundedRegisterAssert? combines these assertions with the assertions in RegisterAssert?. The predicate BoundedRegisterCreate? similarly combines the new creation-conditions with the "super" creation-conditions from Register. This implies that, although we override a method, we can still expect the superclass to behave as specified.

BoundedRegisterAssert?(c) : bool =
  RegisterAssert?[Self, Data, Address](super_Register(c)) AND
  FORALL(x : Self) : override_write_def?(c)(x) AND
                     count_super_write?(c)(x) AND count_erase?(c)(x)

BoundedRegisterCreate?(c) : PRED[Self] =
  {x : Self | count(c)(x) = 0 AND
              RegisterCreate?[Self, Data, Address](super_Register(c))(x)}

The BoundedRegisterStructure theory now contains an additional casting operation from BoundedRegisterClass to RegisterClass.

BoundedRegisterClass : TYPE =
  [# clg : (BoundedRegisterAssert?),
     new : (BoundedRegisterCreate?(clg)) #]

cast : [BoundedRegisterClass -> RegisterClass] =
  LAMBDA(cl : BoundedRegisterClass) :
    (# clg := super_Register(clg(cl)), new := new(cl) #)
(Well-definedness of cast involves proving two easy results.) When an implementation for a bounded register is described, definitions for the methods in BoundedRegister (i.e. count and write) and for those in the superclass (i.e. read, write, erase) have to be given. An obvious implementation of the bounded register specification uses the Cartesian product [nat, FunctionSpace] as state space, where FunctionSpace is the state space of the first Register implementation in the previous section. The first component nat describes the value of count. Appropriate operations on this state space are easily defined, by re-using the Register implementation on FunctionSpace. The contents of the theory with the loose model is not influenced by inheritance, and also the way the theory is generated is not altered.
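To round off this section, the inheritance encoding can also be mimicked in the rough OCaml analogue (ours; the nested super_register field mirrors the super_Register field of BoundedRegisterIFace):

  (* The subclass interface contains the superclass interface as a field. *)
  type ('self, 'data, 'addr) bounded_register_iface = {
    super_register : ('self, 'data, 'addr) register_iface;
    write'         : 'addr -> 'data -> 'self;  (* the overriding write *)
    count          : int;
  }

  (* Superclass methods are re-exported by going through super_register: *)
  let read_b c x a          = (c x).super_register.read a
  let super_write_b c x a d = (c x).super_register.write a d
  let write_b c x a d       = (c x).write' a d
  let erase_b c x a         = (c x).super_register.erase a
  let count_b c x           = (c x).count

  (* Casting a bounded-register coalgebra to a plain register coalgebra: *)
  let cast c = fun x -> (c x).super_register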
4 Modeling other object-oriented aspects
This section briefly discusses how--and to what extent--various typically object-oriented features are realised in our formalisation. Not all of the aspects that we touch upon have fully crystalised into stable form, and the further development and use of our tool may lead to certain changes.

Component classes. When specifying a new class one often wishes to use another class as a component. By a component we mean an attribute which is an instance of another class. This is also known as an aggregation, realising a has-a relationship between two classes.

Begin Counter[ n : posnat, val_init : nat ] : CLASSSPEC
Method
  val : Self -> nat;
  next : Self -> Self;
  clear : Self -> Self
Assertion
  val_next : PVS val(next(x)) = IF val(x) = n-1 THEN 0 ELSE val(x)+1 ENDIF ENDPVS
  val_init : PVS val_init <= n ENDPVS
  val_clear : PVS val(clear(x)) = 0 ENDPVS
Constructor
  new : Self
Creation
  val_new : PVS val(new) = val_init ENDPVS
End Counter
To demonstrate the use of components we adopt an example from [12]. Suppose that we have a class Counter, which counts modulo a parameter n, as in Figure 3. This class Counter is used (twice) as a component in the class specification of a DoubleCounter in Figure 4. A DoubleCounter has two counters as components, both counting modulo n. It has operations next, val and clear. The first counter is incremented every time a next operation is executed. The second counter is only incremented when the first counter reaches n. As we have seen, our tool automatically generates loose and final models (without any internal structure) for every specification, and presents an option for the user. Both these models can be used for components, but a final model enables subclassing for components.
116
Begin DoubleCounter[ n: posnat ] : CLASSSPEC Method val : Self -> nat; first : S e l f - > Counter[n,0]; second : Self -> Counter[n,O] ; next : Self -> Self; clear : Self -> Self A s s e r t i o n val_def : PVS val(x) = n * val(second(x)) + val(first (x)) ENDPVS first_next : PVS bisim?(first (next(x)), next (first (x))) ENDPVS second_next : PVS bisim?(second(next(x)), IF val(first(x)) = n-1 THEN next (second(x)) ELSE second(x) ENDIF) ENDPVS first_clear : PVS bisim?(first (clear(x)) , clear(first (x)) ENDPVS second_clear : PVS bisim? (second (clear (x)) , clear (second (x)) ) ENDPVS Constructor new : Self Creation first_new: PVS bisim?(first(new), new) ENDPVS second_new: PVS bisim?(second(new), new) E~IDPVS End DoubleCount er
Fig. 4. A double counter class specification in CCSL As an example, the interface for DoubleCounter, using a loose model for the components, will be generated as follows. DoubleCounterIFace
:
TYPE = [# val : nat, first : LooseCounterType[n,0], second : LooseCounterType[n,O], next : Self, clear : Self #]
When generating the other theories for DoubleCounter, components are handled just as normal attributes (with bisimilarity as their equality relation). R e f i n e m e n t . Earlier we mentioned how to implement a class specification and how to develop its theory. A third i m p o r t a n t activity is proving refinements between class specifications. We say that a "concrete" class refines an "abstract" class when a model (i.e. an implementation) of the abstract class can be described in terms of the concrete class. We construct this model as abstract(c): [Self--+ AbstractIFace[Self,...]], where c: [Self--+ ConcreteIFace[Self,...]] is an arbitrary model of the concrete class 4. Following [13] we do not need the entire state space Self to obtain an "abstract" model, but we can restrict ourselves to the subtype (P) of Self arising from an invariant P on Self (w. r. t. the abstract class). Then abstract (c) restricts to an operation of type [(P) --+ A b s t r a c t I F a c e [ ( P ) , . . . ] ] . Of course, it has to be proven that the model satisfies the assertions and creationconditions of the abstract class, as expressed by the following lemma. Abstract_refine : LEMMA AbstractAssert? (abstract (c)) AND AbstractCreate? (abstract (c)) (new)
4 Such a model abstract(c) should actually incorporate models of all the superclasses of the abstract class. Therefore, in practice, the model abstract(c) is best constructed by first constructing all these "super" models.
117
As an example, we can prove that DoubleCounter with p a r a m e t e r n refines a counter modulo n ~. The model for this refinement uses the invariant that the values of b o t h component counters are bounded by n. O v e r l o a d e d m e t h o d s . Some object-oriented languages allow overloading of methods: multiple methods with the same n a m e m a y occur in the same class as long as their types are different. This is also possible in CCSL. PVS does allow overloading of functions, but field names in a labeled p r o d u c t - - u s e d as types of interfaces--are not permitted, hence we use ordinary products in interfaces with overloading. M u l t i p l e i n h e r i t a n c e . In our formalization we allow multiple inheritance (even though some object-oriented languages do not). This requires coping with name clashes, for instance: (1) if different superclasses define a m e t h o d with the same name, and (2) if one class is inherited twice via different paths. To solve the first problem, the user can rename the conflicting methods in the INHERIT FROM section in the CCSL specification, like in EIFFEL [16]. As an example, a class can inherit both from Counter and from DoubleCounter in the following manner. INHERIT FROM Counter[n,0]
RENAMING val AS val_c AND next AS next_c AND c l e a r AS c l e a r _ c , DoubleCounter[n] RENAMING val AS val_d AND next AS next_d AND clear AS clear_d
This will lead to method definitions like val_c(c) : LAMBDA(x val_d(c) : LAMBDA(x
[Self -> : Self) [Self-> : Self)
nat] = : val(super_Counter(c(x))) nat] = : val (super_DoubleCounter (c (x)))
Renaming is also necessary for different instances of the same class. The second problem of multiple paths to the same method is solved essentially by using sets of ancestor methods. C r e a t i o n w i t h p a r a m e t e r s . So far we have simply used 'new' in CCSL specifications as a constructor which returns a new instance of a class. In object-oriented languages one can usually parametrise such constructors with the initial values of the attributes. Typically, in a point class (specification) with attributes f s t and snd for first and second coordinate, one m a y wish to have new as a (binary) constructor satisfying the following creation-conditions. fst(new(a,
b))
= a AND s n d ( n e w ( a ,
b))
= b
This option also exists in CCSL: one can put constructors as functions with type [ A 1 , . . . , A,~] --4 Self in the constructor section. They are handled in e v s via a labeled product containing all these constructors, instead of a single constructor new, as in the examples in Sections 2 and 3. Since we have not yet reached agreement on whether or not constructors should be inherited in object-oriented specifications, we included both options. S u b t y p i n g . The usual object-oriented view is that inheritance (subclassing) implies subtyping (see [1, Section 3.2]), namely of the form: in every place where an
118
object from a superclass is expected, an object from a subclass may be used as well. This is because all methods from the superclass also exist in the subclass-possibly in overridden form, but still with the same type. Precisely this aspect of subclassing exists in our formalisation because all methods from superclasses are explicitly (re-)defined in subclasses, see the definitions of r e a d ( c ) etc. for bounded registers in Section 3. This "structural" subtyping (see again [1, Section 3.2]) arises because the Register interface is part of the Boundedl~egister interface. Also we use explicit casting operations from subclasses to superclasses, as described for bounded registers in Section 3. Such casting operations are generated for components as final models. B i n a r y m e t h o d s . Binary methods are a topic of intense debate in the objectoriented community, see [3]. They are allowed in many object-oriented languages, but can lead to various problems (notably type insecurities). A standard example of a binary method is the union (or intersection) operation in a class (specification) of sets (over some parameter type A). . o ,
elem? : [Self, A -> bool]; add, delete : [Self, A -> Self];
union, intersection : [Self, Self -> Self]; , o ,
Typically, a binary (or n-ary, for n > 1) method takes multiple inputs of type Self. Methods of type [Self, A 1 , ' - - , A,] --4 F(Self) are allowed in CCSL under the following two restrictions: (1) if Self occurs in Ai then Ai = Self, (2) Self occurs only positively in F. L a t e b i n d i n g . Consider a Point class specification with attributes f s t and s n d (as above) and with a move method satisfying: fst(move(x,da,db))
-- f s t ( x )
+ d a AND s n d ( m o v e ( x , d a , d b ) )
-- s n d ( x )
+ db
Suppose now that we often need the move operation with parameters da = db = 1, and decide to define it explicitly as m o r e l ( x ) = move(x, 1, i ) . Late binding means that if we later override move in a subclass of Point, then the morel method will change accordingly: its definition will then use the overridden move. At this moment we have an ad hoc solution to model late binding, and we are still testing its appropriateness in various examples.
5
T h e f r o n t - e n d LOOP t o o l
Thus far we have seen how ( C C S L ) class specifications can be translated into higher order logic. This translation is done automatically by our tool, which is constructed as a front-end to a proof assistant. In general, front-end tools provide a higher level interface tailored to a specific application domain [2, 20, 23, 15, 5]. They vary in the degree of sophistication and user support. While simple systems feature theory blueprints where the user fills out special slots in combination with specialised high level tactics [2, 5], more advanced approaches define a special language and provide command line compilers [20] or even interactive user interfaces [15].
119
Our development aims at an environment in which the user can specify classes in several languages and frameworks and can then reason about their properties and relationships in a suitable proof assistant of choice. Ultimately, we desire a tool, called LOOP (for: Logic of Object-Oriented Programming), which provides an interactive (emacs) shell for the proof assistant. Thus far, as a first step, we focus on the compiler, which generates for a given class specification the corresponding theory and proof representations for the target proof assistant. It should be easy to extend the tool to other object-oriented languages and proof assistants. Also, it should come with a suitable graphical user interface. These aims influenced the choice of the implementation language and the architecture of the compiler. We use the typed functional language Objective Caml (OCAML) [22], the current release of the French ML dialect CAML. Objective Caml provides, above the strict typing and readable syntax of an ML dialect, a typed module system, command line compilers with the capability of generating native machine code, lexer and parser generators, and an extensive library including an X-Window interface. The architecture of the compiler (see Figure 5) exploits standard compiler construction techniques. It is organised in a number of passes which work on fixed interface data structures. This enables us to easily plug-in modules for other input languages (than CCSL) and other target proof assistants (than PVS).
CCSL " string> ~
~ [ ~ stream I~". . . .
]~class_type
Itypechecker]
Iinheritancean~lyserI
itheorygeneratori the~
ype~-[ prettyprinter] Pvs ~o strings Fig.5. Tool architecture The compiler basically consists of the input modules lexer and parser, the internal modules (the vertical part in Figure 5), and the pretty printer. The lexer and parser are generated by the OCAML tools OCAMLLEX and OCAMLYACCwhich resemble the well-known LEX and YAcc from c programming environments. Parsing a (CCSL) string yields an internal symbolic class represented as a value of the complicated, inductively defined OCAML type c l a s s _ t y p e . The parser can be replaced by any other function which generates values of class_~ype. All internal passes have input and output values in this type. The real work is carried out at a symbolic level. Extra steps can easily be inserted. After type checking and performing several semantic checks (for instance to determine the full inheritance tree of a class) the final internal pass produces symbolic theories
and proofs as values of the OCAML type theory_type. This latter pass is the workhorse of the whole system. Finally, a target-specific pretty printer converts the symbolic representation for PVS (or another proof assistant). Currently, the compiler accepts CCSL class specifications in a file name.beh and generates the corresponding theories and proofs as described in the previous sections. For instance, compilation of a file register.beh containing the simple specification from Figure 1 will generate the files register.pvs and register.prf. The file register.pvs can then be loaded, parsed, and type checked in PVS. Before filling out the theory frames as described above the user can prove automatically all the standard lemmas with the proof-file command.
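The pass structure just described can be summarised in a few lines of OCaml (a sketch: class_type and theory_type are the tool's types, but the pass signatures shown here are our assumption):

  type class_type   (* inductively defined in the tool; abstract here *)
  type theory_type  (* symbolic PVS theories and proofs *)

  (* Each internal pass maps class_type to class_type, so extra passes
     can be inserted; the theory generator leaves the symbolic level. *)
  let compile
      (parse : string -> class_type)
      (typecheck : class_type -> class_type)
      (analyse_inheritance : class_type -> class_type)
      (generate : class_type -> theory_type)
      (pretty : theory_type -> string)
      (src : string) : string =
    pretty (generate (analyse_inheritance (typecheck (parse src))))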
Conclusions and future work

We have elaborated a way to model object-oriented class specifications in higher order logic in such detail that it is amenable to tool support. Future work, as already mentioned at various points in this paper, involves: elaboration of the formal definition of CCSL (including e.g. visibility modifiers and late binding), completion of the implementation of the LOOP tool, definition of appropriate tactics, stepwise refinement, development of various extensions to the tool, and of course: use of the tool in reasoning about various object-oriented systems.
Acknowledgements

We thank David Griffioen for helpful discussions.
References

1. M. Abadi and L. Cardelli. A Theory of Objects. Monographs in Comp. Sci. Springer, 1996.
2. M. Archer and C. Heitmeyer. TAME: A specialized specification and verification system for timed automata. In A. Bestavros, editor, Work In Progress (WIP) Proceedings of the 17th IEEE Real-Time Systems Symposium (RTSS'96), pages 3-6, Washington, DC, December 1996. The WIP Proceedings is available at http://www.cs.bu.edu/pub/ieee-rts/rtss96/wip/proceedings.
3. K. Bruce, L. Cardelli, G. Castagna, The Hopkins Objects Group, G. Leavens, and B. Pierce. On binary methods. Theory & Practice of Object Systems, 1(3), 1996.
4. C. Cîrstea. Coalgebra Semantics for Hidden Algebra: parametrised objects and inheritance. To appear in: Workshop on Algebraic Development Techniques (Springer LNCS), 1998.
5. A. Dold, F.W. von Henke, H. Pfeifer, and H. Rueß. Formal verification of transformations for peephole optimization. In FME '97: Formal Methods: Their Industrial Application and Strengthened Foundations, Lecture Notes in Computer Science.
6. J.A. Goguen and G. Malcolm. An extended abstract of a hidden agenda. In J. Meystel, A. Meystel, and R. Quintero, editors, Proceedings of the Conference on Intelligent Systems: A Semiotic Perspective, pages 159-167. Nat. Inst. Stand. & Techn., 1996.
7. J. Gosling, B. Joy, and G. Steele. The Java Language Specification. Addison-Wesley, 1996.
8. R. Hennicker, M. Wirsing, and M. Bidoit. Proof systems for structured specifications with observability operators. Theor. Comp. Sci., 173(2):393-443, 1997.
9. C. Hermida and B. Jacobs. Structural induction and coinduction in a fibrational setting. Information & Computation (to appear).
10. B. Jacobs. Inheritance and cofree constructions. In P. Cointe, editor, European Conference on Object-Oriented Programming, number 1098 in Lect. Notes Comp. Sci., pages 210-231. Springer, Berlin, 1996.
11. B. Jacobs. Objects and classes, co-algebraically. In B. Freitag, C.B. Jones, C. Lengauer, and H.-J. Schek, editors, Object-Orientation with Parallelism and Persistence, pages 83-103. Kluwer Acad. Publ., 1996.
12. B. Jacobs. Behaviour-refinement of coalgebraic specifications with coinductive correctness proofs. In M. Bidoit and M. Dauchet, editors, TAPSOFT'97: Theory and Practice of Software Development, number 1214 in Lect. Notes Comp. Sci., pages 787-802. Springer, Berlin, 1997.
13. B. Jacobs. Invariants, bisimulations and the correctness of coalgebraic refinements. In M. Johnson, editor, Algebraic Methodology and Software Technology, number 1349 in Lect. Notes Comp. Sci., pages 276-291, Berlin, 1997.
14. B. Jacobs and J. Rutten. A tutorial on (co)algebras and (co)induction. EATCS Bulletin, 62:222-259, 1997.
15. J. Knappman. A PVS based tool for developing programs in the refinement calculus. Master's thesis, Inst. of Comp. Sci. & Appl. Math., Christian-Albrechts-Univ. of Kiel, 1996. http://www.informatik.uni-kiel.de/inf/deRoever/DipiJKm.html
16. B. Meyer. Object-Oriented Software Construction. Prentice Hall, 2nd rev. edition, 1997.
17. S. Owre, S. Rajan, J.M. Rushby, N. Shankar, and M. Srivas. PVS: Combining specification, proof checking, and model checking. In R. Alur and T.A. Henzinger, editors, Computer Aided Verification, number 1102 in Lect. Notes Comp. Sci., pages 411-414. Springer, Berlin, 1996.
18. S. Owre, J.M. Rushby, N. Shankar, and F. von Henke. Formal verification for fault-tolerant architectures: Prolegomena to the design of PVS. IEEE Trans. on Softw. Eng., 21(2):107-125, 1995.
19. L.C. Paulson. Isabelle: A Generic Theorem Prover. Number 828 in Lect. Notes Comp. Sci. Springer, Berlin, 1994.
20. C.H. Pratten. An Introduction to Proving AMN Specifications with PVS and the AMN-Proof Tool. http://www.dsse.ecs.soton.ac.uk/~chp/amn_proof/papers.html.
21. H. Reichel. An approach to object semantics based on terminal co-algebras. Math. Struct. in Comp. Sci., 5:129-152, 1995.
22. D. Rémy and J. Vouillon. Objective ML: A simple object-oriented extension of ML. In Princ. of Progr. Lang., pages 40-53. ACM Press, 1997.
23. J.U. Skakkebæk. A Verification Assistant for a Real Time Logic. PhD thesis, Dep. of Computer Science, Techn. Univ. Denmark, 1994.
Language Primitives and Type Discipline for Structured Communication-Based Programming

KOHEI HONDA*, VASCO T. VASCONCELOS†, AND MAKOTO KUBO‡

ABSTRACT. We introduce basic language constructs and a type discipline as a foundation of structured communication-based concurrent programming. The constructs, which are easily translatable into the summation-less asynchronous π-calculus, allow programmers to organise programs as a combination of multiple flows of (possibly unbounded) reciprocal interactions in a simple and elegant way, subsuming the preceding communication primitives such as method invocation and rendez-vous. The resulting syntactic structure is exploited by a type discipline à la ML, which offers a high-level type abstraction of interactive behaviours of programs as well as guaranteeing the compatibility of interaction patterns between processes in a well-typed program. After presenting the formal semantics, the use of language constructs is illustrated through examples, and the basic syntactic results of the type discipline are established. Implementation concerns are also addressed.
1. Introduction

Recently, the significance of programming practice based on communication among processes is rapidly increasing due to the development of networked computing. From network protocols over the Internet to server-client systems in local area networks to distributed applications in the world wide web to interaction between mobile robots to a global banking system, the execution of complex, reciprocal communication among multiple processes becomes an important element in the achievement of the goals of applications. Many programming languages and formalisms have been proposed so far for the description of software based on communication. As programming languages, we have CSP [7], Ada [28], languages based on Actors [2], POOL [3], ABCL [33], Concurrent Smalltalk [32], or more recently Pict and other π-calculus-based languages [6, 23, 29]. As formalisms, we have CCS [15], Theoretical CSP [8], π-calculus [18], and other process algebras. In another vein, we have functional programming languages augmented with communication primitives, such as CML [26], dML [21], and Concurrent Haskell [12]. In these languages and formalisms, various communication primitives have been offered (such as synchronous/asynchronous message passing, remote procedure call, method-call and rendez-vous), and the description of communication behaviour is done by combining these primitives. What we observe in these primitives is that, while they do express one-time interaction between processes, there is no construct to structure a series of reciprocal interactions between two parties as such. That is, the only way to represent a series of communications following a certain scenario (think of interactions between a file server and its client) is to describe them as a collection of distinct, unrelated interactions. In applications based on complex interactions among concurrent processes, which are appearing more and more in these days, the lack of structuring methods would result in low readability and careless bugs in final programs, apart from the case when the whole communication behaviour can be simply described as, say, a one-time remote procedure call. The situation may be illustrated in comparison with the design history

* Dept. of Computer Science, University of Edinburgh, UK.
† Dept. of Computer Science, University of Lisbon, Portugal.
‡ Dept. of Computer Science, Chiba University of Commerce, Japan.
of imperative programming languages. In early imperative languages, programs were constructed as a bare sequence of primitives which correspond to machine operations, such as assignment and goto (consider early Fortran). As is well-known, as more programs in large size were constructed, it had become clear that such a method leads to programs without lucidity, readability or verifiability, so that the notion of structured programming was proposed in 1970's. In present days, having the language constructs for structured programming is a norm in imperative languages. Such a comparison raises the question as to whether we can have a similar structuring method in the context of communication-based concurrent programming. Its objective is to offer a basic means to describe complex interaction behaviours with clarity and discipline at the high-level of abstraction, together with a formal basis for verification. Its key elements would, above all, consist of (1) the basic communication primitives (corresponding to assignment and arithmetic operations in the imperative setting), and (2) the structuring constructs to combine them (corresponding to "if-then-else" and "while"). Verification methodologies on their basis should then be developed. The present paper proposes structuring primitives and the accompanying type discipline, as a basic structuring method for communication-based concurrent programming. The proposed constructs have a simple operational semantics, and various communication patterns, including those of the conventional primitives as well as those which go beyond them, are representable as their combination with clarity and rigor at the high-level of abstraction. The type discipline plays a fundamental role, guaranteeing compatibility of interaction patterns among processes via a type inference algorithm in the line of ML [19]. Concretely our proposal consists of the following key ideas. 9 A basic structuring concept for communication-based programming, called session. A session is a chain of dyadic interactions whose collection constitutes a program. A session is designated by a p r i ~ t e port called channel, through which interactions belonging to that session are performed. Channels form a distinct syntactic domain; its differentiation from the usual port names is a basic design decision we take to make the logical structure of programs explicit. Other than session, the standard structuring constructs for concurrent programming, parallel composition, name hiding, conditional and recursion are provided. In particular the combination of recursion and session allows the expression of unbounded thread of interactions as a single abstraction unit. 9 Three basic communication primitives, value passing, label branching, and delegation. The first is the standard synchronous message passing as found in e.g. CSP or ~r-calculus. The second is a purified form of method invocation, deprived of value passing. The third is for passing a channel to another process, thus allowing a programmer to dynamically distribute a single session among multiple processes in a well-structured way. Together with the session structure, the combination of these primitives allows the flexible description of complex communication structures with clarity and discipline. 9 A basic type discipline for the communication primitives, as an indispensable element of the structuring method. The typability of a program ensures two possibly communicating processes always own compatible communication patterns. 
For example, a procedure call has a pattern of output-input from the caller's viewpoint; the callee should then have a communication pattern of input-output. Because incompatibility of interaction patterns between processes would be one of the main sources of bugs in communication-based programming, we believe such a type discipline has important pragmatic significance. The derived type gives a high-level abstraction of the interactive behaviours of a program.
Because communication between processes over a network may take place between modules written in different programming languages, the proposed communication constructs may well be used by embedding them in various programming languages; for simplicity, however, we present them as a self-contained small programming language, stripped to the barest minimum necessary to explain its novel features. Using this language, the basic concept of the proposed constructs is illustrated through programming examples. They show how extant communication primitives such as remote procedure call and method invocation can be concisely expressed as sessions of specific patterns (hence with specific type abstractions). They also show how sessions can represent communication structures which do not conform to the preceding primitives but which naturally arise in practice. This suggests that session-based program organisation may constitute a synthesis of a number of familiar as well as new programming ideas concerning communication. We also show that the proposed constructs are simply translatable into the asynchronous polyadic pi-calculus with branching [31], which suggests the feasibility of implementation in a distributed environment. Yet much remains to be studied concerning the proposed structuring method, including the efficient implementation of the constructs and the accompanying reasoning principles; see Section 6 for further discussion. The technical content of the present paper is developed on the basis of the preceding proposal [27] due to Kaku Takeuchi and the present authors. The main contributions of the present proposal in comparison with [27] are: the generalisation of the session structure by delegation and recursive sessions, which definitely enlarges the applicability of the proposed structuring method; the typing system incorporating these novel concepts; the representation of conventional communication primitives by the structuring constructs; and basic programming examples which show how the constructs, in particular through the above-mentioned novel features, can lucidly represent complex communication structures which would not be amenable to conventional communication primitives. In the rest of the paper, Section 2 introduces the language primitives and their operational semantics. Section 3 illustrates how the primitives allow a clean representation of extant communication primitives. Section 4 shows, through key programming examples, how the primitives can represent interaction structures beyond those of conventional communication primitives. Section 5 presents the typing system and establishes the basic syntactic results. Section 6 concludes with discussions of implementation concerns, related work and further issues.
2. Syntax and Operational Semantics

2.1. Basic Concepts. The central idea in the present structuring method is the session. A session is a series of reciprocal interactions between two parties, possibly with branching and recursion, and serves as a unit of abstraction for describing interaction. Communications belonging to a session are performed via a port specific to that session, called a channel. A fresh channel is generated when each session is initiated, for use in the communications of that session. We use the following syntax for the initiation of a session.

    request a(k) in P        accept a(k) in P        (initiation of session)

The request first requests, via a name a, the initiation of a session as well as the generation of a fresh channel; P would then use that channel for later communications. The accept, on the other hand, receives the request for the initiation of a session via a and generates a new channel k, which would be used for communications in P. In the above grammar, the parentheses (k) and the keyword in show the binding and its scope.
Thus, in request a(k) in P, the part (k) binds the free occurrences of k in P. This convention is used uniformly throughout the present paper. Via the channel of a session, three kinds of atomic interactions are performed: value sending (including name passing), branching, and channel passing (or delegation).

    k![e1 ⋯ en]; P        k?(x1 ⋯ xn) in P                 (data sending/receiving)
    k ◁ l; P              k ▷ {l1 : P1 ‖ ⋯ ‖ ln : Pn}      (label selection/branching)
    throw k[k′]; P        catch k(k′) in P                 (channel sending/receiving, i.e. delegation)

Data sending/receiving is the standard synchronous message passing; here each ei denotes an expression, such as an arithmetic or boolean formula or a name. We assume the variables x1..xn are all distinct. We do not consider program passing, for simplicity; cf. [11]. Branching/selection is a minimal form of method invocation in object-based programming; l1,..,ln are labels, assumed to be pairwise distinct. Channel sending/receiving, which we often call delegation, passes a channel which is being used in one session to another process, thus radically changing the structure of the session. Delegation generalises the concept of the same name originally conceived in concurrent object-oriented programming [34]; see Section 4.3 for detailed discussion. In passing we note that treating it as distinct from ordinary value passing is essential both for disciplined programming and for a tractable type inference.
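As a small illustration of delegation (ours, not from the original text), consider a process that accepts a session on a and immediately delegates its channel to a worker waiting on b:

    accept a(k) in request b(h) in throw h[k]; inact
    | accept b(h) in catch h(k) in k?(x) in k![x * x]; inact
    | request a(k) in k![3]; k?(y) in P

After the two session initiations and the throw/catch interaction, it is the worker alone that continues the dialogue on k with the client, which remains unaware of the hand-over; the client eventually receives 9 for y.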
Communication primitives, organised by sessions, are further combined by the following standard constructs of concurrent programming.

    P1 | P2                                              (concurrent composition)
    (νa)P    (νk)P                                       (name/channel hiding)
    if e then P else Q                                   (conditional)
    def X1(x̃1k̃1) = P1 and ⋯ and Xn(x̃nk̃n) = Pn in P      (recursion)
We do not need sequencing, since each communication primitive already carries one. We also use inact, the inaction, which denotes the lack of action (acting as the unit of |). Hiding declares a name/channel to be local to its scope (here P). Channel hiding would not be used in ordinary programming, but is needed for the operational semantics presented later. In a conditional, e should be a boolean expression. In recursion, X, a process variable, may occur in P1...Pn and P zero or more times. The identifiers in x̃ik̃i should be pairwise distinct. We could use replication (or a single recursion) to achieve the same effect, but multiple recursion is preferable for well-structured programs. This finishes the introduction of all the language constructs we shall use in this paper. We give a simple example of a program.

    accept a(k) in k![1]; k?(y) in P  |  request a(k) in k?(x) in k![x + 1]; inact

The first process receives a request for a new session via a, generates k, sends 1 and receives a return value via k, while the second requests the initiation of a session via a, receives a value via the generated channel, then returns the result of adding 1 to that value. Observe the compatibility of the communication patterns of the two processes.

2.2. Syntax Summary. We summarise the syntax introduced so far. The base sets are: names, ranged over by a, b, ...; channels, ranged over by k, k′; variables, ranged over by x, y, ...; constants (including names, integers and booleans), ranged over by c, d, ...; expressions (including constants), ranged over by e, e′, ...; labels, ranged over by l, l′, ...; and process variables, ranged over by X, Y, .... We let u, u′, ... range over names and channels. Processes, ranged over by P, Q, ..., are then given by the following grammar.
P ::= request a(k) in P                    (session request)
    | accept a(k) in P                     (session acceptance)
    | k![ẽ]; P                             (data sending)
    | k?(x̃) in P                           (data reception)
    | k ◁ l; P                             (label selection)
    | k ▷ {l1 : P1 ‖ ⋯ ‖ ln : Pn}          (label branching)
    | throw k[k′]; P                       (channel sending)
    | catch k(k′) in P                     (channel reception)
    | if e then P else Q                   (conditional branch)
    | P | Q                                (parallel composition)
    | inact                                (inaction)
    | (νu)P                                (name/channel hiding)
    | def D in P                           (recursion)
    | X[ẽk̃]                                (process variables)

D ::= X1(x̃1k̃1) = P1 and ⋯ and Xn(x̃nk̃n) = Pn    (declaration for recursion)
"|" binds most loosely; the other constructs bind equally. The parentheses (·) denote binders, which bind the corresponding free occurrences. The standard simultaneous substitution is used, written e.g. P[ẽ/x̃]. The sets of free names/channels/variables/process variables of P, defined in the standard way, are denoted respectively by fn(P), fc(P), fv(P) and fpv(P). Alpha-equality is written ≡α. We also set fu(P) ≝ fc(P) ∪ fn(P). Processes without free variables or free channels are called programs.

2.3. Operational Semantics. For a concise definition of the operational semantics of the language constructs, we introduce the structural equality ≡ (cf. [4, 16]), the smallest congruence relation on processes including the following equations.
1. P ≡ Q if P ≡α Q.
2. P | inact ≡ P,   P | Q ≡ Q | P,   (P | Q) | R ≡ P | (Q | R).
3. (νu)inact ≡ inact,   (νuu′)P ≡ (νu′u)P,   ((νu)P) | Q ≡ (νu)(P | Q) if u ∉ fu(Q),   (νu)(def D in P) ≡ def D in (νu)P if u ∉ fu(D).
4. (def D in P) | Q ≡ def D in (P | Q) if fpv(D) ∩ fpv(Q) = ∅.
5. def D in (def D′ in P) ≡ def D and D′ in P if fpv(D) ∩ fpv(D′) = ∅.

The operational semantics is now given by the reduction relation →, written P → Q, the smallest relation on processes generated by the following rules.

    [LINK]   (accept a(k) in P1) | (request a(k) in P2) → (νk)(P1 | P2)
    [COM]    (k![ẽ]; P1) | (k?(x̃) in P2) → P1 | P2[c̃/x̃]                       (ẽ ↓ c̃)
    [LABEL]  (k ◁ li; P) | (k ▷ {l1 : P1 ‖ ⋯ ‖ ln : Pn}) → P | Pi             (1 ≤ i ≤ n)
    [PASS]   (throw k[k′]; P1) | (catch k(k′) in P2) → P1 | P2
    [IF1]    if e then P1 else P2 → P1                                        (e ↓ true)
    [IF2]    if e then P1 else P2 → P2                                        (e ↓ false)
    [DEF]    def D in (X[ẽk̃] | Q) → def D in (P[c̃/x̃] | Q)                    (ẽ ↓ c̃, X(x̃k̃) = P ∈ D)
    [SCOP]   P → P′ implies (νu)P → (νu)P′
    [PAR]    P → P′ implies P | Q → P′ | Q
    [STR]    P ≡ P′, P′ → Q′ and Q′ ≡ Q imply P → Q

Above we assume given the standard evaluation relation ↓ on expressions. We write →* for the reflexive and transitive closure of →. Note how a fresh channel is generated in the [LINK]
rule as the result of interaction (in this way, request a(k) in P corresponds to bound output in the pi-calculus [18]). In the [LABEL] rule, one of the branches is selected, discarding the remaining ones. Note that we do not allow reduction under the various communication prefixes, as is standard in process calculi. As an example, the simple program of 2.1 reduces as follows (below and henceforth we omit trailing inactions).

    accept a(k) in k![1]; k?(y) in P  |  request a(k) in k?(x) in k![x + 1]
    →  (νk)(k![1]; k?(y) in P  |  k?(x) in k![x + 1])
    →  (νk)(k?(y) in P  |  k![1 + 1])
    →  P[2/y].

Observe how the interaction proceeds in a lock-step fashion. This is due to the synchronous form of the present communication primitives.
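To preview the role of the type discipline of Section 5, suppose (our variation, not in the original text) that the second process instead began with an output, as in request a(k) in k![2]; k?(y) in inact. After the [LINK] step both participants would attempt to send on k first, and the program would be stuck: no reduction rule applies. It is exactly such incompatible communication patterns that the typing system rejects.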
3. Representing Communication (1): Extant Communication Primitives

This section discusses how the structuring primitives can easily represent the communication patterns of conventional communication primitives in use in programming languages. These representations show how we can understand extant communication patterns as fixed ways of combining the proposed primitives.

3.1. Call-return. Call-return is a widely used communication pattern in which a process calls another process, and the callee, after some processing, returns a value to the caller. Usually the caller simply waits until the reply message arrives. This concept is widely used as a basic primitive in distributed computing under the name of remote procedure call. We may use the following pair of notations for call-return.

    x = call f[e1 ⋯ en] in P        fun f(x1 ⋯ xn) in P
In the latter, the return command return[e] would occur in P. Assume these constructs are added to the syntax of 2.2. A simple programming example follows.

Example 3.1 (Factorial).

    Fact(f) = fun f(x) in
                if x = 1 then return[1]
                else (νb)(Fact[b] | y = call b[x - 1] in return[x * y])

Here and henceforth we write X(x̃k̃) = P, or a sequence of such equations, for the declaration part of a recursion, leaving the body part implicit. This example implements the standard recursive algorithm for the factorial function; y = call f[5] in P would give a client process. Communication patterns based on call-return are easily representable as a combination of request/accept and send/receive. We first show the mapping of the caller. Below, [·] denotes the inductive mapping into the structuring primitives.
    [x = call a[ẽ] in P]  ≝  request a(k) in k![ẽ]; k?(x) in [P]

Naturally we assume k is chosen fresh. The basic scenario is that the caller first initiates the session, sends the values, and waits until it receives the answer on the same channel. The callee, on the other hand, is translated as follows:

    [fun a(x̃) in P]  ≝  accept a(k) in k?(x̃) in [P][k![e]/return[e]]

Here [k![e]/return[e]] denotes the syntactic substitution of k![e] for each return[e]. Observe that the original "return" operation is expressed by the "send" primitive of
the structuring primitives. As one example, let us see how the factorial program of Example 3.1 is translated by the above mapping.

Example 3.2 (Factorial, translation).

    Fact(f) = accept f(k) in k?(x) in
                if x = 1 then k![1]
                else (νb)(Fact[b] | request b(k′) in k′![x - 1]; k′?(y) in k![x * y])

Notice how the usage of k′ and k differentiates the two contexts of communication. If we compose the translation of the factorial with that of its user, we have the reduction Fact[f] | [y = call f[3] in P] →* Fact[f] | [P][6/y]. In this way, the semantics of the synchronous call-return is given by that of the structuring primitives via the translation. Some observations follow. (1) The significance of the specific notations for call-return lies in the declaration of the assumed fixed communication pattern, which enhances readability and helps verification. At the same time, the translation retains the same level of clarity as the original code, even if it needs a few additional keystrokes. (2) In the translation, the caller and the callee in general own complementary communication patterns, i.e. input-output meets output-input. However, if for example the return command appears twice in the callee, the complementarity is lost. This relates to the notion of types discussed later: indeed, non-complementary patterns are rejected by the typing system. (3) The translation also suggests how the structuring primitives generalise the traditional call-return structure: in addition to the fixed pattern of input-output (or, correspondingly, output-input), we can have a sequence of interactions of indefinite length. For such programming examples, see Section 4.

3.2. Method Invocation. The idea of method invocation originates in object-oriented languages: a caller calls an object by specifying a method name, while the object waits with a few methods together with their associated code, so that, when invoked, it executes the code corresponding to the method name. The call may or may not result in returning an answer. As a notation, an "object" would be written

    obj a : {l1(x̃1) in P1 ‖ ⋯ ‖ ln(x̃n) in Pn}

where a gives the object identifier, l1, ..., ln are labels (all pairwise distinct) with formal parameters x̃i, and Pi gives the code corresponding to each li. The return action, written return[e], may occur in each Pi as necessary. A caller then becomes:
    x = a.li[ẽ] in P        a.li[ẽ]; P
The left-hand form denotes a process which invokes the object with method li and arguments ẽ, then, receiving the answer, assigns the result to x and executes the continuation P. The right-hand form is the notation for the case when the invocation needs no return value. As an example, let us program a simple cell.

Example 3.3 (Cell).

    Cell(a, c) = obj a : {read() in (return[c] | Cell[a, c]) ‖ write(x) in Cell[a, x]}

The cell Cell[a, c] denotes an object which has the name a and saves its value in c. There are two methods, read and write, each with the obvious functionality. A caller which "reads" this object would be written, for example, x = a.read[] in P.
Method invocation can be represented in the structuring constructs by combining label branching and value passing. We show the translations of an object, of a caller which expects a return, and of a caller which does not, in this order.

    [obj a : {l1(x̃1) in P1 ‖ ⋯ ‖ ln(x̃n) in Pn}]  ≝  accept a(k) in k ▷ {l1 : k?(x̃1) in [P1]σ ‖ ⋯ ‖ ln : k?(x̃n) in [Pn]σ}
    [x = a.li[ẽ] in P]  ≝  request a(k) in k ◁ li; k![ẽ]; k?(x) in [P]
    [a.li[ẽ]; P]  ≝  request a(k) in k ◁ li; k![ẽ]; [P]

In each translation, k should be fresh. In the first equation, σ denotes the substitution [k![e]/return[e]], replacing every occurrence of return[e] by k![e]. Observe that a method invocation is decomposed into label branching and value passing. Using this mapping, the cell is now translated as follows.

Example 3.4 (Cell, translation).
    Cell(a, c) = accept a(k) in k ▷ {read : k![c]; Cell[a, c] ‖ write : k?(x) in Cell[a, x]}

Similarly, x = a.read[] in P is translated as request a(k) in k ◁ read; k?(x) in P, while a.write[3] becomes request a(k) in k ◁ write; k![3]. Some observations follow. (1) The translation is not much more complex than the original text; the specific notation, however, has the role of declaring the fixed communication pattern, and saves keystrokes. (2) Here again, the translation of the caller and the callee in general results in complementary patterns. There are, however, two main cases where the complementarity is lost: one due to the existence/lack of the return statement (e.g. x = a.write[3] in Q above), and another due to the invocation of an object with a non-existent method. Again, such inconsistencies are detectable by the typing system introduced later. (3) The translation suggests how the structuring primitives can generalise standard method invocation. For example, an object may in turn invoke a method of the caller after being invoked. We believe that, in programming practice based on the idea of interacting objects, such reciprocal, continuous interaction arises naturally and usefully. Such examples are discussed in the next section. In addition to call-return and method invocation, we can similarly represent other communication patterns in use, for example asynchronous call-return (which includes rendez-vous [28] and futures [2]) and simple message passing. For space reasons we leave their treatment to [11]. Section 6 briefly discusses the significance of specific notations for these fixed communication patterns in language design.
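As a closing illustration (ours, not in the original text), we can watch the translated cell of Example 3.4 at work with a reading client, unfolding Cell[a, c] by [DEF]:

    Cell[a, c] | request a(k) in k ◁ read; k?(x) in P
    →  (νk)(k ▷ {read : k![c]; Cell[a, c] ‖ write : k?(x) in Cell[a, x]} | k ◁ read; k?(x) in P)    by [LINK]
    →  (νk)(k![c]; Cell[a, c] | k?(x) in P)    by [LABEL]
    →* Cell[a, c] | P[c/x]    by [COM]

so the client obtains the stored value and the cell returns to its waiting state.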
4. Representing Communication (2): Complex Communication Patterns

This section shows how the structuring primitives can cleanly represent various complex communication patterns which go beyond those representable by conventional communication primitives.

4.1. Continuous Interactions. The traditional call-return primitive already encapsulates a sequence of communications, albeit a simple one, as a single abstraction unit. The key possibility of the session structure lies in extending this idea to arbitrarily complex communication patterns, including the case when multiple interaction sequences interleave with each other. The following example describes the behaviour of a banking service towards its user (essentially an automatic teller machine).
Example 4.1 (ATM).

    ATM(a, b) = accept a(k) in k?(id) in
        k ▷ { deposit : request b(h) in k?(amt) in h ◁ deposit; h![id, amt]; ATM[a, b]
            ‖ withdraw : request b(h) in k?(amt) in h ◁ withdraw; h![id, amt];
                  h ▷ { success : k ◁ dispense; k![amt]; ATM[a, b]
                      ‖ failure : k ◁ overdraft; ATM[a, b] }
            ‖ balance : request b(h) in h ◁ balance; h?(amt) in k![amt]; ATM[a, b] }

The program, after establishing a session with the user via a, first lets the user input a user code (we omit such details as verification of the code), then offers three choices: deposit, withdraw and balance. When the user selects one of them, the corresponding code is executed. For example, when withdraw is chosen, the program lets the user enter the amount to be withdrawn, then interacts with the bank via b, passing the user code and the amount. If the bank answers that there are sufficient funds in the account, the money is given to the user; if not, the overdraft message results. In either case the system eventually returns to its original waiting mode. Note in particular that the program communicates with the bank in the midst of its interaction with the user, so three parties are involved in the interaction as a whole. A user may be written as:
    request a(k) in k![myId]; k ◁ withdraw; k![58];
        k ▷ { dispense : k?(amt) in P ‖ overdraft : Q }.
Here we may consider Q as code for "exception handling" (invoked when the balance is less than expected). Notice also that the interactions are now truly reciprocal.

4.2. Unbounded Interactions. The previous example shows how the structuring primitives can easily describe situations which would be difficult to program cleanly using conventional primitives. At the same time, the code leaves room for improvement: if a user wants to look at his balance before he withdraws, he must enter his user code twice. The following refinement removes this redundancy.

Example 4.2 (Kind ATM).
    ATM′(a, b) = accept a(k) in k?(id) in Actions[a, b, id, k]
    Actions(a, b, id, k) =
        k ▷ { deposit : request b(h) in k?(amt) in h ◁ deposit; h![id, amt]; Actions[a, b, id, k]
            ‖ withdraw : request b(h) in k?(amt) in h ◁ withdraw; h![id]; h![amt];
                  h ▷ { success : k ◁ dispense; k![amt]; Actions[a, b, id, k]
                      ‖ failure : k ◁ overdraft; Actions[a, b, id, k] }
            ‖ balance : request b(h) in h ◁ balance; h?(amt) in k![amt]; Actions[a, b, id, k]
            ‖ quit : ATM′[a, b] }
As can be seen, the main difference is that the process remains within the same session even after returning to its waiting mode after processing each request: once a user establishes the session and enters the user code, she can request various services as many times as she likes. To exit from this loop, the branch quit is introduced. This example shows how recursion within a session allows a flexible description of interactive behaviour which goes beyond the recursion of usual objects (where each session consists of a fixed number of interactions). It is notable that the unbounded session retains a rigorous syntactic structure that allows type abstraction; see Section 5. Unbounded interactions with a fixed pattern arise naturally in practice, e.g. the interactions between a file server and its client. We believe recursion within a session can be used effectively in varied programming practice.

4.3. Delegation. The original idea of delegation in object-based concurrent programming [34] allows an object to delegate the processing of a request it receives to another object. Its basic purpose is the distribution of processing while maintaining the transparency of the name space for clients of the service. Practically, it can be used to enhance modularity, to express exception handling, and to increase concurrency. The following example shows how we can generalise the original notion in the present setting.

Example 4.3 (Ftp server). Below Πn P denotes the n-fold parallel composition of P.
    Init(pid, nis) = (νb)(Ftpd[pid, b] | Πn FtpThread[b, nis])
    Ftpd(pid, b) = accept pid(s) in request b(k) in throw k[s]; Ftpd[pid, b]
    FtpThread(b, nis) = accept b(k) in catch k(s) in s?(userid, passwd) in
        request nis(j) in j ◁ checkUser; j![userid, passwd];
        j ▷ { invalid : s ◁ sorry ‖ valid : s ◁ welcome; Actions[s, b] }
    Actions(s, b) = s ▷ { get : ⋯ ; s?(file) in ⋯ ; Actions[s, b]
                        ‖ put : ⋯ ; s?(file) in ⋯ ; Actions[s, b]
                        ‖ bye : ⋯ ; FtpThread[b, nis] }

The above shows an outline of the code of an ftp server, which follows the behaviour of standard ftp servers over the TCP/IP protocol. Initially the program Init generates a server Ftpd and n threads with identical behaviour (for simplicity we assume all threads share the same name b). Suppose, in this situation, the server receives a request for service from some client, establishing the session channel s. The server then requests another session with an idle thread (if one exists) and "throws" the channel s to that thread, while itself getting ready for the next request from clients. It is now the thread FtpThread which actually processes the user's request, receiving the user name, referring to NIS, and executing various operations (note that recursion within a session is used). Here delegation is used to enable the ftp server to process multiple requests concurrently without undue delay in response. The scheme is generally applicable to a server interacting with many clients. Some observations follow. (1) The example shows how the generalised delegation allows programmers to cleanly describe interaction patterns which generalise the original form of delegation. Other examples of the usage of delegation abound, for example a file server with geographically distributed sites, or a server with multiple services each to be processed by a different sub-server. (2) A key factor in the above code is that a client does not have to be conscious of the delegation which takes place on the server's side: that is, a client program can be
written as if it were interacting with a single entity, for example as follows.

    request pid(s) in s![myId]; s ▷ { sorry : ⋯ ‖ welcome : ⋯ }
Observe that, between the initial request and the next sending operation, the throw/catch interaction takes place on the server's side; the client process, however, does not have to be conscious of this event. This shows how delegation enables the distribution of computation while maintaining the transparency of the name space. (3) If we allow each ftp-thread to be dynamically generated, we can use parallel composition to the same effect, just as "fork" is used to pass process resources in UNIX. While this scheme has the limitation that we cannot send a channel to an already running process, it offers another programming method for realising flexible, dynamic communication structures. We also observe that the use of throw/catch, or the "fork" mentioned above, results in complexly woven sequences of interactions, which are more error-prone than their absence. In such situations, the type discipline discussed in the next section becomes an indispensable tool for programming: with it we can algorithmically verify whether a program has a coherent communication structure and, in particular, whether it contains interaction errors.
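To make point (3) concrete, here is a rough sketch (ours, under the assumption that a serving thread may be spawned per request rather than pre-allocated) of a delegation-free dispatcher in the style of Example 4.3:

    Ftpd′(pid, nis) = accept pid(s) in
        (Ftpd′[pid, nis] | s?(userid, passwd) in request nis(j) in j ◁ checkUser; ⋯)

Each accepted session is served by the freshly spawned parallel copy, so no throw/catch is needed; but, as noted above, a channel can then no longer be handed to an already running process.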
5. The Type Discipline

5.1. Preliminaries. The present structuring method allows the clear description of complex interaction structures beyond conventional communication primitives. The more complex the interaction becomes, however, the more difficult it is to grasp the whole interactive behaviour and to write correct programs. The type discipline discussed in this section gives a simple solution to these issues at a basic level. We first introduce the basic notions concerning types, including a duality on types which represents the complementarity of interactions.
Definition 5.1 (Types). Given type variables (t, t′, ...) and sort variables (s, s′, ...), sorts (S, S′, ...) and types (α, β, ...) are defined by the following grammar.

    S ::= nat | bool | ⟨α, ᾱ⟩ | s | μs.S
    α ::= ↓[S̃]; α | ↑[S̃]; α | ↓[α]; β | ↑[α]; β | &{l1 : α1, ..., ln : αn} | ⊕{l1 : α1, ..., ln : αn} | 1 | ⊥ | t | μt.α

where, for a type α in which ⊥ does not occur, the co-type ᾱ of α is obtained by exchanging ↑ and ↓ and exchanging & and ⊕ (so that, e.g., the co-type of ↑[S̃]; α is ↓[S̃]; ᾱ), with 1̄ = 1.
A sorting (resp. a typing, resp. a basis) is a finite partial map from names and variables to sorts (resp. from channels to types, resp. from process variables to sequences of sorts and types). We let Γ, Γ′, ... (resp. Δ, Δ′, ..., resp. Θ, Θ′, ...) range over sortings (resp. typings, resp. bases). We regard types and sorts as representing the corresponding regular trees in the standard way [5], and consider their equality in terms of this representation. A sort of the form ⟨α, ᾱ⟩ represents the two complementary structures of interaction associated with a name (one denoting the behaviour starting with accept, the other that starting with request), while a type gives the abstraction of the interaction to be performed through a channel. Note that ᾱ's co-type is again α whenever ᾱ is defined. ⊥ is a specific type indicating that no further connection is possible at a given name.
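For instance (our illustration), in the simple program of 2.1 the name a can be given the sort ⟨α, ᾱ⟩ with α = ↑[nat]; ↓[nat]; 1: the accept side sends a nat, receives a nat and stops, while the request side owns the co-type ᾱ = ↓[nat]; ↑[nat]; 1.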
The following partial algebra on the set of typings, cf. [10], plays a key role in our typing system.

Definition 5.2 (Type algebra). Typings Δ1 and Δ2 are compatible, written Δ1 ≍ Δ2, if Δ1(k) and Δ2(k) are co-types of each other for all k ∈ dom(Δ1) ∩ dom(Δ2). When Δ1 ≍ Δ2, the composition of Δ1 and Δ2, written Δ1 ∘ Δ2, is the typing such that (Δ1 ∘ Δ2)(k) is (1) ⊥ when k ∈ dom(Δ1) ∩ dom(Δ2); (2) Δi(k) if k ∈ dom(Δi) \ dom(Δi+1 mod 2) for i ∈ {1, 2}; and (3) undefined otherwise. Compatibility means each common channel k is associated with complementary behaviours, thus ensuring that the interaction at k runs without errors. When composed, the type for k becomes ⊥, preventing further connection at k (note that ⊥ has no co-type). One can check that the partial operation ∘ is partially commutative and associative.

5.2. Typing System. The main sequent of our typing system has the form

    Θ; Γ ⊢ P ▷ Δ

which reads: "under the environment Θ; Γ, the process P has typing Δ." The sorting Γ specifies protocols at the free names of P, while the typing Δ specifies P's behaviour at its free channels. When P is a program, Θ and Δ are both empty. Given a typing or a sorting, say Φ, we write Φ · s : p for Φ ∪ {s : p} together with the condition that s ∉ dom(Φ), and Φ \ s for the result of taking s : Φ(s) off Φ when Φ(s) is defined (if not, Φ \ s is Φ itself). We also assume given the evident inference rules for arithmetic and boolean expressions, whose sequent has the form Γ ⊢ e ▷ S and which enjoy the standard properties, such as that Γ ⊢ e ▷ S and e ↓ c imply Γ ⊢ c ▷ S. The main definition of this section follows.

Definition 5.3 (Basic typing system). The typing system is defined by the axioms and rules in Figure 1, where we assume that the range of Δ in [INACT] and [VAR] contains only 1 and ⊥. For simplicity, the rule [DEF] is restricted to single recursion; it is easily extendible to multiple recursion. If Θ; Γ ⊢ P ▷ Δ is derivable in the system, we say P is typable under Θ; Γ with Δ, or simply P is typable. Some comments on the typing system follow. (1) In the typing system, the left-hand side of the turnstile is for shared names and variables (the "classical realm"), while the right-hand side is for channels sharable only by two complementary parties (a variant of the "linear realm"). It differs from various sorting disciplines in that a channel k is in general ill-sorted: it may, for example, carry an integer at one time and a boolean at another. In spite of this, the manipulation of the linear realm by the typing algebra ensures the linearised usage of channels, as well as preventing interaction errors, cf. Theorem 5.4 below. (2) In [THR], the behaviour represented by α for the channel k′ is actually performed by the process which "catches" k′ (note that k′ cannot occur free in Δ, hence not in P, by our convention on "·"). To capture the interactions at k′ as a whole, k′ : α is added to the linear realm. The rule [CAT], on the other hand, guarantees that the receiving side uses the channel k′ exactly as prescribed. Reading from the conclusions to the antecedents, [THR] and [CAT] together illustrate how k′ is "thrown" from the left to the right. (3) The simplicity of the typing rules is notable, exploiting the explicit syntactic structure of sessions. In particular the system is syntax-directed and has principal types when we use a version of kinding [22]. It is then easy to show that there is a typing algorithm à la ML which extracts the principal type of a given process iff it is typable.
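As a concrete instance of the type algebra (our illustration): the typings Δ1 = k : ↑[nat]; 1 and Δ2 = k : ↓[nat]; 1 are compatible, since the two types are co-types of each other; by rule [CONC] in Figure 1 below, the parallel composition of two processes owning them carries Δ1 ∘ Δ2 = k : ⊥, so that no further process using k can be composed, and the interaction at k stays strictly dyadic.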
It should be noted that the simplicity and tractability of the typing rules do not mean that the obtainable type information is uninteresting: the resulting type abstraction richly represents the interactive behaviour of programs, as later examples exhibit.
[ACC]   Θ; Γ, a : ⟨α, ᾱ⟩ ⊢ P ▷ Δ · k : α   implies   Θ; Γ, a : ⟨α, ᾱ⟩ ⊢ accept a(k) in P ▷ Δ

[REQ]   Θ; Γ, a : ⟨α, ᾱ⟩ ⊢ P ▷ Δ · k : ᾱ   implies   Θ; Γ, a : ⟨α, ᾱ⟩ ⊢ request a(k) in P ▷ Δ

[SEND]  Γ ⊢ ẽ ▷ S̃ and Θ; Γ ⊢ P ▷ Δ · k : α   imply   Θ; Γ ⊢ k![ẽ]; P ▷ Δ · k : ↑[S̃]; α

[RCV]   Θ; Γ · x̃ : S̃ ⊢ P ▷ Δ · k : α   implies   Θ; Γ ⊢ k?(x̃) in P ▷ Δ · k : ↓[S̃]; α

[BR]    Θ; Γ ⊢ P1 ▷ Δ · k : α1  ⋯  Θ; Γ ⊢ Pn ▷ Δ · k : αn
        imply   Θ; Γ ⊢ k ▷ {l1 : P1 ‖ ⋯ ‖ ln : Pn} ▷ Δ · k : &{l1 : α1, ..., ln : αn}

[SEL]   Θ; Γ ⊢ P ▷ Δ · k : αj  (1 ≤ j ≤ n)   implies   Θ; Γ ⊢ k ◁ lj; P ▷ Δ · k : ⊕{l1 : α1, ..., ln : αn}

[THR]   Θ; Γ ⊢ P ▷ Δ · k : β   implies   Θ; Γ ⊢ throw k[k′]; P ▷ Δ · k : ↑[α]; β · k′ : α

[CAT]   Θ; Γ ⊢ P ▷ Δ · k : β · k′ : α   implies   Θ; Γ ⊢ catch k(k′) in P ▷ Δ · k : ↓[α]; β

[CONC]  Θ; Γ ⊢ P ▷ Δ and Θ; Γ ⊢ Q ▷ Δ′  (Δ ≍ Δ′)   imply   Θ; Γ ⊢ P | Q ▷ Δ ∘ Δ′

[IF]    Γ ⊢ e ▷ bool, Θ; Γ ⊢ P ▷ Δ and Θ; Γ ⊢ Q ▷ Δ   imply   Θ; Γ ⊢ if e then P else Q ▷ Δ

[INACT] Θ; Γ ⊢ inact ▷ Δ

[NRES]  Θ; Γ · a : S ⊢ P ▷ Δ   implies   Θ; Γ ⊢ (νa)P ▷ Δ

[CRES]  Θ; Γ ⊢ P ▷ Δ · k : ⊥   implies   Θ; Γ ⊢ (νk)P ▷ Δ

[VAR]   Θ · X : S̃α̃; Γ ⊢ X[ẽk̃] ▷ Δ · k̃ : α̃   (Γ ⊢ ẽ ▷ S̃)

[DEF]   Θ · X : S̃α̃; Γ · x̃ : S̃ ⊢ P ▷ k̃ : α̃ and Θ · X : S̃α̃; Γ ⊢ Q ▷ Δ
        imply   Θ; Γ ⊢ def X(x̃k̃) = P in Q ▷ Δ
FIGURE 1. The typing system

Below we briefly summarise the fundamental syntactic properties of the typing system. We need the following notions. A k-process is a prefixed process with subject k (such as k![ẽ]; P and catch k(k′) in P). A k-redex is a pair of dual k-processes composed by |, i.e. of one of the forms (k![ẽ]; P | k?(x̃) in Q), (k ◁ l; P | k ▷ {l1 : Q1 ‖ ⋯ ‖ ln : Qn}), or (throw k[k′]; P | catch k(k′) in Q). Then P is an error if P ≡ def D in (νũ)(P′ | R) where P′ is, for some k, the |-composition of either two k-processes that do not form a k-redex, or of three or more k-processes. We then have:

Theorem 5.4.
1. (Invariance under ≡) Θ; Γ ⊢ P ▷ Δ and P ≡ Q imply Θ; Γ ⊢ Q ▷ Δ.
2. (Subject reduction) Θ; Γ ⊢ P ▷ Δ and P →* Q imply Θ; Γ ⊢ Q ▷ Δ.
3. (Lack of run-time errors) A typable program never reduces to an error.

We omit the proofs, which are straightforward owing to the syntax-directed nature of the typing rules; see [11] for details. We note that the typing system is easily extended with ML-like polymorphism for recursion, which is useful e.g. for template processes (such as def Cell(c v) = ⋯ in Cell[a 42] | Cell[b true]), with which all the properties of Theorem 5.4 are preserved. This and other basic extensions are discussed in [11].

5.3. Examples. We give a few examples of typing, taking programs from the preceding sections. We omit the final 1 from types, e.g. we write ↓[α] for ↓[α]; 1. First, the factorial of Example 3.2 is assigned, at its free name, the type ↓[nat]; ↑[nat] (for the factorial) and its dual ↑[nat]; ↓[nat] (for its user). Next, the cell of Example 3.3 is given the type
&{read : ↑[α], write : ↓[α]} (for the cell) and its dual ⊕{read : ↓[α], write : ↑[α]} (for its user). The type of the cell says that a cell waits with two options; when "read" is selected it would send an integer and the session ends, and when "write" is selected it would receive an integer and again the session ends. Its dual says that a user may select either "read" or "write" and, according to which of them it selects, behaves as prescribed. As a more interesting example, take the "kind ATM" of Example 4.2 and consider ATM′[ab] under the declaration in the example. Then the typing a : ⟨α, ᾱ⟩, b : ⟨β, β̄⟩ is given to the process, where we set α, which abstracts the interaction with a user, as

    α ≝ ↓[nat]; μt.&{ deposit : ↓[nat]; t,
                      withdraw : ↓[nat]; ⊕{dispense : ↑[nat]; t, overdraft : t},
                      balance : ↑[nat]; t,
                      quit : 1 },

while β, which abstracts the interaction with the banking system, is given as

    β ≝ ⊕{ deposit : ↑[nat nat],
           withdraw : ↑[nat nat]; &{success : 1, failure : 1},
           balance : ↓[nat] }.

Notice that the type abstraction is given separately for the user (at a) and for the bank (at b), describing the behaviour of ATM′ towards each of its interacting parties. As a final example, the ftp server of Example 4.3 is given the following type at its principal name:

    ↓[nat nat]; ⊕{ sorry : 1, welcome : μt.&{ get : ⋯ ; t, put : ⋯ ; t, bye : ⋯ } }.
This example shows that the throw/catch action is abstracted away in the type with respect to the user (but not in the type with respect to the thread, which we omit), so that the user can interact without concerning himself with the delegation occurring on the other side. In these ways, the type discipline not only offers the correctness verification of programs at a basic level, but also gives a clean abstraction of the interactive behaviours of programs, which would assist programming activities.

6. Discussions

6.1. Implementation Concerns. In the previous sections we have seen how the session structure enables the clean description of complex communication behaviours, employing the synchronous form of interaction as an essential element. For the implementation of the primitives, however, the use of asynchronous communication is essential, since real distributed computing environments are inherently asynchronous. To study such an implementation in a formal setting, to be applied later to realistic implementation, we consider a translation of the present language primitives into TyCO [14], a sorted, summation-less, polyadic asynchronous pi-calculus with branching structure (which is ultimately translatable into its monadic, branch-less version). The existence of the branching structure makes TyCO an ideal intermediate language. For space reasons we cannot discuss the technical details of the translation, for which the reader may refer to [11]; we only list the essential points. (1) The translation converts both channels and names into names. Each synchronous interaction (including branching-selection) is translated into two asynchronous interactions, the second acting as an acknowledgement. This suggests the primitives are amenable to distributed implementation, at least at a rudimentary level.
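To give the flavour of (1) (our sketch, not the actual translation of [11]): a synchronous output can be rendered by an asynchronous message carrying, besides the values, a private acknowledgement name c on which the continuation waits, while the receiver signals on c before proceeding:

    ⟦k![ẽ]; P⟧ = (νc)(k⟨ẽ, c⟩ | c?() in ⟦P⟧)        ⟦k?(x̃) in Q⟧ = k?(x̃, c) in (c⟨⟩ | ⟦Q⟧)

Each synchronous exchange thus costs two asynchronous messages, the second acting as the acknowledgement.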
(2) In spite of (1) above, the resulting code is far from optimal. As a simple example, if a value is sent immediately after the request operation, the value can clearly be "piggy-backed" onto the request message. A related example is the translation of the factorial of Section 3 in comparison with its standard "direct" encoding in Pict or TyCO in continuation-passing style, cf. [24]. Finding effective, well-founded optimisation methods along these lines would be a fruitful subject of study (see 6.2 below). (3) The encoding translates typable programs into well-sorted TyCO code. It is an intriguing question how we can capture, in a precise way, the well-typedness of the original code at the level of TyCO (this question was originally posed by Simon Gay for a slightly different kind of translation).

6.2. Related Work and Further Issues. In the following, comparisons with a few related works are given, along with some significant further issues. First, in the context of conventional concurrent programming languages, the key departure of the proposed framework is that the session structure allows us to form an arbitrarily complex interaction pattern as a unit of abstraction, rather than being restricted to a fixed repertoire. The examples in Section 4 show how this feature results in the clean description of complex communication behaviours which may not be easy to express in conventional languages. For example, if we describe those examples using so-called concurrent object-oriented languages [34], whose programming concept based on objects and their interaction is close to the present one, we need to divide each series of interactions into multiple chunks of independent communications, which, together with the nesting of method invocations, would make the resulting programs hard to understand. The explicit treatment of sessions also enables the type-based verification of the compatibility of communication patterns, which facilitates the writing of correct communication-based programs at an elementary level. In spite of these observations, we believe that the various communication primitives of existing programming languages, such as object invocation and RPC, do not lose their significance even when the present primitives are incorporated. We have already discussed how they are useful for declaring fixed communication patterns, as well as for saving keystrokes (in particular, primitives for simple message passing had better be given among the language constructs, since their description in the structuring primitives is, if easy, roundabout). Notice that we can still maintain the same type discipline by regarding these constructs as standing for combinations of the structuring primitives, as we did in Section 3. In another vein, the communication structures which can be extracted from such declarations would give useful information for optimisation. Secondly, a field of research closely related to the present work is the study of various typed programming languages based on pi-calculi [6, 23, 29] and, more broadly, the study of types for pi-calculi (see for example [17, 24, 25, 13, 30, 35]). In 6.1 we already noted that the proposed primitives are easily translatable into TyCO, hence into the pi-calculus. Indeed, the encoding of various interaction structures in the pi-calculus is the starting point of the present series of studies (cf. Section 2 of [27]). Comparisons with these works may be made from two distinct viewpoints.
(1) From the viewpoint of language design, Pict and other pi-calculus-based languages use the primitives of the (polyadic) asynchronous pi-calculus or its variants as the basic language constructs, and build further layers of abstraction on this basis. The present proposal differs in that it incorporates the session-based structure as a fundamental stratum for programming, rather than relying on chains of direct name passing for describing communication behaviour. While the former is operationally translatable into the latter, as discussed in 6.1, the very translation of, say,
the programming examples of the preceding sections would reveal the significance of the session structure for abstraction concerns. In particular, any such translation must use multiple names for actions belonging to a single session, which damages the clarity and readability of the program. We note that we are far from claiming that the proposed framework forms the sole stratum for high-level communication-based programming: abstraction concerns in distributed computing are so diverse that no single framework can meet all purposes. However, we do believe that the proposed constructs (with possible variations) offer a basic and useful building block for communication-based programming, especially when the communication behaviours concerned tend to become complex. (2) From the viewpoint of type disciplines, one notable point is that well-typed programs in the present typing system are in general ill-sorted (in the sense of [17]) when regarded as pi-terms, since the same channel k may be used for carrying values of different sorts on different occasions. This is in sharp contrast with most type disciplines for the pi-calculus in the literature. An exception is the typing system for the monadic pi-calculus presented in [35], where, as in the present setting, the incorporation of sequencing information in types allows certain ill-sorted processes to be well-typed. Apart from a notable difference in motivation, a main technical difference is that the rules in [35] are not syntax-directed; at the same time, the type discipline in [35] guarantees a much stronger behavioural property. Another notable feature of the present typing system is the linear usage of channels it imposes on programs. In this context, the preceding studies of linearity in the pi-calculus, such as [10, 13], offer a clarification of the deterministic character of interaction at channels (notice that interaction at names is still non-deterministic in general). Also, the [THR] and [CAT] rules bear some resemblance to the rules for linear name communication presented in [10, 13]. Regarding these and other works on types for the pi-calculus, we note that the various cases of redundant code in the translation mentioned in 6.1 often concern certain fixed ways of using names in processes, which would be amenable to type-based analysis. Finally, one of the most important topics which we could not touch upon in the present paper is the development of reasoning principles based on the proposed structuring method. The type discipline of Section 5 already gives one simple, though useful, example. It remains to be studied whether reasoning methods for deeper properties of programs (cf. [1, 3, 20]), which are both mathematically well-founded and applicable to conspicuous practical situations, can be developed. We wish to address this topic in our future study.

Acknowledgements. We sincerely thank the three reviewers for their constructive comments. Discussions with Simon Gay, Luís Lopes, Benjamin Pierce, Kaku Takeuchi and Nobuko Yoshida have been beneficial, for which we are grateful. We also thank the last two for offering criticisms on an earlier version of the paper. Vasco Vasconcelos is partially supported by Project PRAXIS/2/2.1/MAT/46/94.
References

[1] S. Abramsky, S. Gay, and R. Nagarajan. Interaction categories and foundations of typed concurrent computing. In Deductive Program Design. Springer-Verlag, 1995.
[2] G. Agha. Actors: A Model of Concurrent Computation in Distributed Systems. MIT Press, 1986.
[3] P. America and F. de Boer. Reasoning about dynamically evolving process structures. Formal Aspects of Computing, 94:269-316, 1994.
[4] G. Berry and G. Boudol. The chemical abstract machine. TCS, 96:217-248, 1992.
[5] B. Courcelle. Fundamental properties of infinite trees. TCS, 25:95-169, 1983.
[6] C. Fournet and G. Gonthier. The reflexive chemical abstract machine and the join-calculus. In POPL'96, pp. 372-385. ACM Press, 1996.
[7] C.A.R. Hoare. Communicating sequential processes. CACM, 21(8):667-677, 1978.
[8] C.A.R. Hoare. Communicating Sequential Processes. Prentice Hall, 1985.
[9] C.B. Jones. Process-algebraic foundations for an object-based design notation. UMCS-93-10-1, Computer Science Department, Manchester University, 1993.
[10] K. Honda. Composing processes. In POPL'96, pp. 344-357. ACM Press, 1996.
[11] K. Honda, V. Vasconcelos, and M. Kubo. Language primitives and type disciplines for structured communication-based programming. Full version of this paper, in preparation.
[12] S. Peyton Jones, A. Gordon, and S. Finne. Concurrent Haskell. In POPL'96, pp. 295-308. ACM Press, 1996.
[13] N. Kobayashi, B. Pierce, and D. Turner. Linearity and the pi-calculus. In POPL'96, pp. 358-371. ACM Press, 1996.
[14] L. Lopes, F. Silva, and V. Vasconcelos. A framework for compiling object calculi. In preparation.
[15] R. Milner. Communication and Concurrency. Prentice Hall, 1989.
[16] R. Milner. Functions as processes. MSCS, 2(2):119-141, 1992.
[17] R. Milner. The polyadic pi-calculus: a tutorial. In Logic and Algebra of Specification. Springer-Verlag, 1992.
[18] R. Milner, J. Parrow, and D. Walker. A calculus of mobile processes, Parts I and II. Information and Computation, 100:1-77, September 1992.
[19] R. Milner and M. Tofte. The Definition of Standard ML. MIT Press, 1991.
[20] H. R. Nielson and F. Nielson. Higher-order concurrent programs with finite communication topology. In POPL'94. ACM Press, 1994.
[21] A. Ohori and K. Kato. Semantics for communication primitives in a polymorphic language. In POPL'93. ACM Press, 1993.
[22] A. Ohori. A compilation method for ML-style polymorphic record calculi. In POPL'92, pp. 154-165. ACM Press, 1992.
[23] B. Pierce and D. Turner. Pict: A programming language based on the pi-calculus. CSCI Technical Report 476, Indiana University, March 1997.
[24] B. Pierce and D. Sangiorgi. Typing and subtyping for mobile processes. In LICS'93, pp. 187-215, 1993.
[25] B. Pierce and D. Sangiorgi. Behavioral equivalence in the polymorphic pi-calculus. In POPL'97. ACM Press, 1997.
[26] J.H. Reppy. CML: A higher-order concurrent language. In PLDI'91, pp. 293-305. ACM Press, 1991.
[27] K. Takeuchi, K. Honda, and M. Kubo. An interaction-based programming language and its typing system. In PARLE'94, volume 817 of LNCS, pp. 398-413. Springer-Verlag, July 1994.
[28] US Government Printing Office, Washington DC. The Ada Programming Language, 1983.
[29] V. Vasconcelos. Typed concurrent objects. In ECOOP'94, volume 821 of LNCS, pp. 100-117. Springer-Verlag, 1994.
[30] V. Vasconcelos and K. Honda. Principal typing schemes for polyadic pi-calculus. In CONCUR'93, volume 715 of LNCS, pp. 524-538. Springer-Verlag, 1993.
[31] V. Vasconcelos and M. Tokoro. A typing system for a calculus of objects. In 1st ISOTAS, volume 742 of LNCS, pp. 460-474. Springer-Verlag, November 1993.
[32] Y. Yokote and M. Tokoro. Concurrent programming in ConcurrentSmalltalk. In Yonezawa and Tokoro [34].
[33] A. Yonezawa, editor. ABCL: An Object-Oriented Concurrent System. MIT Press, 1990.
[34] A. Yonezawa and M. Tokoro, editors. Object-Oriented Concurrent Programming. MIT Press, 1987.
[35] N. Yoshida. Graph types for monadic mobile processes. In FST/TCS'16, volume 1180 of LNCS, pp. 371-386. Springer-Verlag, 1996. Full version as LFCS Technical Report ECS-LFCS-96-350, 1996.
The Functional Imperative: Shape!

C.B. Jay    P.A. Steckler

School of Computing Sciences, University of Technology, Sydney, P.O. Box 123, Broadway NSW 2007, Australia; email: {cbj, steck}@socs.uts.edu.au; fax: 61 (02) 9514 1807
1 Introduction

FISH is a new programming language for array computation that compiles higher-order polymorphic programs into simple imperative programs expressed in a sub-language TURBOT, which can then be translated into, say, C. Initial tests show that the resulting code is extremely fast: two orders of magnitude faster than HASKELL, and two to four times faster than OBJECTIVE CAML, one of the fastest ML variants for array programming. Every functional program must ultimately be converted into imperative code, but the mechanism for this is often hidden. FISH achieves this transparently, using the "equation" from which it is named:

    Functional = Imperative + Shape

Shape here refers to the structure of data, e.g. the length of a vector, or the number of rows and columns of a matrix. The FISH compiler reads the equation from left to right: it converts functions into procedures by using shape analysis to determine the shapes of all array expressions, and then allocating appropriate amounts of storage on which the procedures can act. We can also read the equation from right to left, constructing functions from shape functions and procedures. Let us consider some examples.

    >-|> let v = [| 0;1;2;3 |] ;;
declares v to be a vector whose entries are 0, 1, 2 and 3. The compiler responds with both the type and the shape of v, that is,

    v : vec[int]
    #v = (~4,int_shape)

This declares that v has the type of an expression for a vector of integers, whose shape #v is given by the pair (~4,int_shape): this determines the length of v, ~4, and the common shape int_shape of its entries. The tilde in ~4 indicates that the value is a size, used to describe array lengths, which is evaluated statically. By contrast, integers may be used as array entries, and are typically computed dynamically. Arrays can be arbitrarily nested. For example
    >-|> let x = [| [| 0; 1 |] ; [| 2; 3 |] |] ;;
    x : vec[vec[int]]
    #x = (~2,(~2,int_shape))
is a vector of vectors (or 2 x 2 matrix). Here too, the entries of the outer vector all have the same shape, namely (~2,int_shape). FISH primitives can be used to construct higher-order functions. For example, sum_int adds up a vector of integers. Also, we can map functions across arrays. For example, to sum the rows of x above we have

    >-|> let y = map sum_int x ;;
    y : vec[int]
    #y = (~2,int_shape)
    >-|> %run y ;;
    Shape = (~2,int_shape)
    Value = [| 1; 5 |]

Notice that the compiler was able to compute the shape of y before computing any of the array entries. This shape analysis lies at the heart of the FISH compilation strategy. Let us consider how mapping is achieved in more detail. Consider the mapping map f x of a function f across a vector x. The resulting vector has the same length as x, but what is the common shape of its entries? In FISH, each function f has a shape #f which maps the shape of its argument to the shape of its result. Letting fsh represent #f and xsh represent #x, we can describe the shape of map f by the following function.

    >-|> let map_sh fsh xsh = (fst xsh, fsh (snd xsh)) ;;
    map_sh : (q -> r) -> s * q -> s * r
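For instance (our illustration), taking fsh to be #sum_int, which sends the shape of an integer vector to int_shape, we obtain map_sh #sum_int #x = map_sh #sum_int (~2,(~2,int_shape)) = (~2,int_shape), which is exactly the shape #y reported above for y = map sum_int x.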
This function will be used to determine the storage required for y. The procedure for assigning values to its entries is given by the following for-loop.

    >-|> let map_pr f y x = for (0 <= i < @(len_var x)) { y[i] := f !x[i] } ;;
    map_pr : (a -> b) -> var[vec[b]] -> var[vec[a]] -> comm

Here x : var[vec[a]] is a phrase variable, i.e. an assignable location for a vector whose entries are of datatype a; similarly, y : var[vec[b]]. Now len_var x is the length of x, while @ converts a size to an integer. Also, ! extracts the expression associated with a phrase variable, i.e. de-references it. Note that both x and x[i] are phrase variables. Finally, the shape function for mapping and its procedure are combined using the function

    proc2fun : (#[a] -> #[b]) -> (var[b] -> var[a] -> comm) -> a -> b
It is not a primitive constant of FISH, but is itself defined using simpler constructions that act directly on shapes and commands. In particular, it is designed to avoid unnecessary copying of its vector argument. Thus, we have the higher-order, polymorphic function

    >-|> let map f = proc2fun (map_sh #f) (map_pr f) ;;
    map : (a -> b) -> vec[a] -> vec[b]

This same strategy is also used to define other polymorphic array functions for reduction, folding, and so on. Partial evaluation converts arbitrary FISH programs into imperative programs in a sub-language called TURBOT, which can then be translated to the imperative language of choice, currently ANSI C. For example, the TURBOT code for map sum_int x is given in Figure 1. The variables B and A store x and y, respectively.
new #A = (~2,int_shape) in
new #B = (~2,(~2,int_shape)) in
B[0][0] := 0;
B[0][1] := 1;
B[1][0] := 2;
B[1][1] := 3;
for (0 <= i < 2) {
  new #C = int_shape in
  C := 0; A[i] := !C;
  for (0 <= j < 2) { A[i] := !A[i] + !B[i][j] } end
end
return A
It might be objected that this approach is not about functional programming at all, since map is "just" a disguised procedure. In response, we point out that our map is still a higher-order polymorphic function. All functional programs are ultimately compiled into imperative machine code. The difference is that FISH performs the compilation simply, efficiently and transparently, by making clear the role of shapes. Also, note that FISH, like Forsythe [Rey96], supports "callby-name" evaluation. That is, beta-reduction is always legitimate, despite the presence of commands, and side-effects.
142
Indeed, there is no obligation to program with the imperative features of FISH directly; we are constructing a library of FISH functions, called GOLDFISH, that includes the usual combinators of the Bird-Meertens Formalism [BW88], such as map, reduce, and many of the standard array operations, such as linear algebra operations and array-based sorting algorithms. The chief goal of partial evaluation is to compute the shapes of all intermediate expressions. Such static shape analysis has a number of additional benefits over a comparable dynamic analysis. First, many program errors, such as attempting to multiply matrices whose sizes do not match, show up as compiletime shape errors. The basic techniques of static shape analysis were developed in the purely functional language Vec [JS97a]. Its shape analyser is able to detect all array bound errors statically, at the cost of unrolling all loops. FISH takes a more pragmatic approach: array indices are treated as integers, not sizes, and so some array bound errors escape detection. However, combinations of functions which are free of array-bound errors, such as those in GOLDFISH, produce programs without bound errors. Second, knowledge of shapes supports compile-time strategies for data layout. On the small scale, this means not having to box array entries, a significant efficiency gain. On a large scale, FISH satisfies many of the pre-requisites for being a portable parallel programming language. Array combinators have been advocated as a powerful mechanism for structuring parallel programs [Ski94], but the inability to determine the appropriate data distributions has proved a major barrier in the quest for efficiency. Shape-based cost analysis offers a solution to this problem [JCSS97]. Also, FISH is a natural candidate for a coordination language for parallel programming (see [DGHY95], for example) since it supports both the second-order combinators useful for data distribution, and the imperative code used on individual processors. Static shape analysis requires significant amounts of function in-lining, so we have adopted the strategy of in-lining all function calls. This might be expected to lead to code explosion, but has not been a problem to date. The ability to analyse shapes means that array arguments can be stored, with only references being passed to functions (as in Figure 1) rather than copying of array expressions. Also, many of the standard functions, such as map, generate for-loops which only require one copy of the function body. Thus we have eliminated the costs associated with closures with little cost in copying of either code or data. This is a major source of the speed gains in our benchmarks. As FISH evaluation is "call-by-name" the most natural comparison is with another pure language HASKELL[HPJW92] (which is call-by-need). Our tests show FISH to be over 100 times faster than HASKELL on simple mapping and reducing problems. Of course, it is more common to use a "call-by-value" language with reference types for array computation, such as ML. Different implementations of ML vary significantly in their speed on array programs. O'CAML [Obj] is one of the fastest on arrays. But FISH is two to four times faster than O'CAML on a range of typical problems (see Section 3 for details). Of course, programs built by combining functions will often be slower than
Of course, programs built by combining functions will often be slower than hand-built procedures, due to the copying inherent in the algorithm, but initial results suggest that the additional overhead is not so large. We have implemented three matrix multiplication algorithms, using a triple for-loop (pure imperative code), a double for-loop plus inner product (mixed mode), and a purely combinatory algorithm. The combinatory algorithm takes approximately five times longer than the imperative algorithm. If these results scale up for more complex combinations of functions, then the performance penalty for using the functional style will be tolerable for a much larger range of applications than at present. Finally, observe that FISH supports a smooth transition from prototype to final program. Prototypes written in GOLDFISH can be refined by successively replacing critical parts of the program until the desired efficiency is reached. Hence, there is no need to abandon the prototype when moving to production. Also, because TURBOT is so simple, it can be easily translated to the imperative language of choice, and native code can be embedded in FISH programs. Thus, FISH is a novel combination of expressive programming style (supporting higher-order, polymorphically typed functions) and efficient execution that can be integrated with existing imperative languages. A slightly expanded version of this paper is available [JS97b].
2 The FISH Language
FISH is essentially an Algol-like language [Rey81, OT97]. In particular, it supports a sub-language TURBOT that is a simple imperative language equipped with local variables that obey a stack discipline. Commands act on a store, and function application is call-by-name. To this base, we add the structure of a typed functional language, which can also be used to model procedures. As in all Algol-like languages, the data types (meta-variable τ) are distinguished from the phrase types (meta-variable θ). Data types represent storable values, while phrase types represent meaningful program fragments. All of the fundamental types are at play in the rule for typing an assignment:

$$\frac{\Gamma \vdash A : \mathsf{var}[\alpha] \qquad \Gamma \vdash e : \mathsf{exp}[\alpha]}{\Gamma \vdash A := e : \mathsf{comm}}$$
Here A is a phrase variable for an array of data type α and e is an expression for such an array. Phrase variables can be coerced to expressions, e.g. !A : exp[α]. The assignment itself is a command. In most treatments of Algol-like languages the data types are limited to a fixed set of primitives or datum types (meta-variable δ), such as integers and booleans (though see Tennent [Ten89] for another approach). Arrays are treated as processes of phrase type, in which a suitable number of local variables are constructed to store the entries. This contradicts our usual notion of arrays as storable quantities. However, allowing for array data types introduces an additional complexity. If α = vec[int] above, then the lengths of A and e may differ, and the assignment may generate an array bound error.
FISH is able to handle this situation, since the compiler is able to determine the shapes of all arrays statically, and to check shape equality for assignments. This is achieved by dividing the data types into data structures, here just array types (meta-variable α), and shape types (meta-variable σ). Values of shape type are determined by the compiler. The structured types also support structured phrase variables, as shown by the rule

$$\frac{\Gamma \vdash A : \mathsf{var}[\mathsf{vec}[\alpha]] \qquad \Gamma \vdash i : \mathsf{exp}[\mathsf{int}]}{\Gamma \vdash A[i] : \mathsf{var}[\alpha]}$$

Thus, FISH supports assignment to whole arrays, and to their sub-arrays. More generally, the array data types support a generous class of polymorphic operations, such as mapping and reducing. FISH v0.3 is available at ftp.socs.uts.edu.au/Users/cbj/Fish by anonymous ftp. It is implemented in OBJECTIVE CAML and currently runs on Sun SparcStations under Solaris, or using the O'CAML byte-code interpreter.
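To make the shape-equality check for assignments concrete, here is a toy OCaml model of ours (the FISH compiler works on its own internal representation): a vector shape is its length paired with the common shape of its entries, and A := e is admitted only when the two shapes agree.

(* Toy model of shape-checked assignment: #vec[a] = size x #a, so a
   vector shape carries its length and its entries' common shape. *)
type shape =
  | Int_shape                 (* #[int] *)
  | Vec of int * shape        (* size paired with element shape *)

exception Shape_error of string

let rec shape_eq s1 s2 = match s1, s2 with
  | Int_shape, Int_shape -> true
  | Vec (n, s1'), Vec (m, s2') -> n = m && shape_eq s1' s2'
  | _ -> false

(* Admit A := e only when #A = #e; any surviving copy is bound-safe. *)
let check_assign a_shape e_shape =
  if not (shape_eq a_shape e_shape) then
    raise (Shape_error "shapes of assignment differ")

let () =
  check_assign (Vec (2, Int_shape)) (Vec (2, Int_shape));
  (try check_assign (Vec (2, Int_shape)) (Vec (3, Int_shape))
   with Shape_error _ -> print_endline "length mismatch caught statically")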
2.1 Types and their Shapes

Types                                          Shapes

δ ::= int | bool | float | ...                 #δ = #[δ]
α ::= X_α | δ | vec[α]                         #X = #[X]         ##[X] = #[X]
β ::= size | fact                              #vec[α] = size × #α
σ ::= X_σ | #[X_α] | #[δ] | β | σ × σ          #β = β
τ ::= α | σ                                    #var[τ] = exp[#τ]    #exp[τ] = exp[#τ]
θ ::= X_θ | #[X_θ] | var[α] | exp[τ] | comm    #comm = #[comm]      ##[comm] = #[comm]
      | #[comm] | θ → θ                        #(θ → θ') = #θ → #θ'
ρ ::= θ | ∀X_α.ρ | ∀X_σ.ρ | ∀X_θ.ρ

Fig. 2. FISH types.
The FISH types are given in Figure 2. Most of the constructions have been introduced above. Note that phrase variables are always of array type α, never of shape type. This is because shapes, once declared, cannot be changed by an assignment. There are type variables for arrays, shapes or phrases, whose kind is indicated by subscripting. Subscripts will often be elided when the kind is clear from the context. Type schemes are constructed from phrase types by quantification of type variables. The shape type constructor #[−] may be applied to array and phrase variables, datum types and the command type. It is then extended to an idempotent function # that acts on all types. Such types are
used to represent the shapes of terms. Only two cases of its definition deserve comment. The shape of a variable is an expression, not a variable, since it cannot be assigned. The shape of a vector is a pair, consisting of its length and the common shape of its entries. Since all array entries have the same shape, the regularity of arrays is enforced. For example, #vec[vec[int]] = size × (size × #[int]). In other words, the entries in a vector of vectors must all have the same length, so that any vector of vectors is a matrix. The types of the purely imperative sub-language TURBOT are those FISH types that can be constructed without the function arrow constructor, type variables, and the shapes of type variables. In the displays provided by the FISH implementation, type quantifiers are elided, array type variables are displayed using the letters a, b, c, d, shape type variables using q, r, s, t, and phrase type variables using x, y, z. Also, exp[a] is displayed as a. For example, compare the type of map given in the introduction with its type according to the language definition:
$$\forall X_\alpha, Y_\alpha.\ (\mathsf{exp}[X] \to \mathsf{exp}[Y]) \to \mathsf{exp}[\mathsf{vec}[X]] \to \mathsf{exp}[\mathsf{vec}[Y]]$$

2.2 Terms
The TURBOT terms are given in Figure 3. The FISH terms extend those of TURBOT by the additional rules in Figure 4. Many of the introduction rules for terms are standard, so we will only address the novelties here. Shape constants and operations are distinguished from their datum counterparts by prefixing a tilde (~). For example, ~3 : exp[size] is a size numeral, distinct from the integer 3, and ~true : fact is a fact, distinct from the boolean true. One may think of terms of fact type as "shape booleans"; alternatively, fact terms are compile-time assertions about shape. For example, comparing the shapes of two vectors yields a fact:

>-|> let v1 = [| 1;2 |];;
v1 : vec[int]
#v1 = (~2,int_shape)
>-|> let v2 = [| 3;4 |];;
v2 : vec[int]
#v2 = (~2,int_shape)
>-|> let w = #v1 == #v2;;
w : fact
#w = ~true
Each phrase variable A : var[α] has an associated expression !A : exp[α] which represents its value. There are also coercions from sizes to integers, and from facts to booleans (but not the reverse!). In practice, such coercions will often be inferred during parsing or type inference, but the presence of type variables means that they cannot be abandoned altogether. FISH supports three forms of conditional, with keywords if, ife and ifsh, which express conditional commands, conditional expressions, and shape-based
Term and Phrase Variables

$$\frac{}{\Gamma,\, x : \theta \vdash x : \theta}
\qquad
\frac{\Gamma \vdash A : \mathsf{var}[\mathsf{vec}[\alpha]] \quad \Gamma \vdash i : \mathsf{exp}[\mathsf{int}]}{\Gamma \vdash A[i] : \mathsf{var}[\alpha]}$$

Expressions

$$\frac{}{\Gamma \vdash d : \mathsf{exp}[\delta]}
\qquad
\frac{\Gamma \vdash e_1 : \mathsf{exp}[\delta_1] \quad \Gamma \vdash e_2 : \mathsf{exp}[\delta_2]}{\Gamma \vdash e_1 \oplus e_2 : \mathsf{exp}[\delta_3]}
\qquad
\frac{\Gamma \vdash sh_1 : \mathsf{exp}[\beta_1] \quad \Gamma \vdash sh_2 : \mathsf{exp}[\beta_2]}{\Gamma \vdash sh_1 \,\tilde\oplus\, sh_2 : \mathsf{exp}[\beta_3]}$$

$$\frac{\Gamma \vdash i : \mathsf{exp}[\mathsf{size}]}{\Gamma \vdash @i : \mathsf{exp}[\mathsf{int}]}
\qquad
\frac{\Gamma \vdash sh : \mathsf{exp}[\mathsf{fact}]}{\Gamma \vdash \mathsf{fact2bool}(sh) : \mathsf{exp}[\mathsf{bool}]}
\qquad
\frac{\Gamma \vdash A : \mathsf{var}[\alpha]}{\Gamma \vdash\ !A : \mathsf{exp}[\alpha]}$$

$$\frac{\Gamma \vdash b : \mathsf{exp}[\mathsf{bool}] \quad \Gamma \vdash e_1 : \mathsf{exp}[\alpha] \quad \Gamma \vdash e_2 : \mathsf{exp}[\alpha]}{\Gamma \vdash \mathsf{ife}\ b\ \mathsf{then}\ e_1\ \mathsf{else}\ e_2 : \mathsf{exp}[\alpha]}
\qquad
\frac{}{\Gamma \vdash \delta\mathsf{\_shape} : \mathsf{exp}[\#[\delta]]}
\qquad
\frac{\Gamma \vdash e_1 : \mathsf{exp}[\sigma_1] \quad \Gamma \vdash e_2 : \mathsf{exp}[\sigma_2]}{\Gamma \vdash (e_1, e_2) : \mathsf{exp}[\sigma_1 \times \sigma_2]}$$

Commands

$$\frac{}{\Gamma \vdash \mathsf{skip} : \mathsf{comm}}
\qquad
\frac{}{\Gamma \vdash \mathsf{abort} : \mathsf{comm}}
\qquad
\frac{\Gamma \vdash C_1 : \mathsf{comm} \quad \Gamma \vdash C_2 : \mathsf{comm}}{\Gamma \vdash C_1 ; C_2 : \mathsf{comm}}
\qquad
\frac{\Gamma \vdash A : \mathsf{var}[\alpha] \quad \Gamma \vdash e : \mathsf{exp}[\alpha]}{\Gamma \vdash A := e : \mathsf{comm}}$$

$$\frac{\Gamma \vdash b : \mathsf{exp}[\mathsf{bool}] \quad \Gamma \vdash C_1 : \mathsf{comm} \quad \Gamma \vdash C_2 : \mathsf{comm}}{\Gamma \vdash \mathsf{if}\ b\ \mathsf{then}\ C_1\ \mathsf{else}\ C_2 : \mathsf{comm}}
\qquad
\frac{\Gamma \vdash b : \mathsf{exp}[\mathsf{bool}] \quad \Gamma \vdash C : \mathsf{comm}}{\Gamma \vdash \mathsf{while}\ b\ \mathsf{do}\ C : \mathsf{comm}}$$

$$\frac{\Gamma \vdash m : \mathsf{exp}[\mathsf{int}] \quad \Gamma \vdash n : \mathsf{exp}[\mathsf{int}] \quad \Gamma,\, i : \mathsf{exp}[\mathsf{int}] \vdash C : \mathsf{comm}}{\Gamma \vdash \mathsf{for}\ (m \le i < n)\ C : \mathsf{comm}}$$

$$\frac{\Gamma \vdash e : \mathsf{exp}[\#\alpha] \quad \Gamma,\, x : \mathsf{var}[\alpha] \vdash C : \mathsf{comm}}{\Gamma \vdash \mathsf{new}\ \#\,x = e\ \mathsf{in}\ C\ \mathsf{end} : \mathsf{comm}}
\qquad
\frac{\Gamma \vdash e : \mathsf{exp}[\alpha]}{\Gamma \vdash \mathsf{output}(e) : \mathsf{comm}}$$

Fig. 3. TURBOT terms.
conditionals, respectively. The command form is standard. Expression conditionals ife b then e1 else e2 have a special status. Both e1 and e2 must have the same shape, which is then the shape of the whole conditional. This is necessary
"147
Functions

$$\frac{\Gamma,\, x : \theta_1 \vdash t : \theta_2}{\Gamma \vdash \lambda x.\,t : \theta_1 \to \theta_2}
\qquad
\frac{\Gamma \vdash t : \theta_1 \to \theta_2 \quad \Gamma \vdash t_1 : \theta_1}{\Gamma \vdash t\ t_1 : \theta_2}
\qquad
\frac{\Gamma \vdash t_2 : \theta_2 \quad \Gamma,\, x : \mathsf{Clos}_\Gamma(\theta_2) \vdash t_1 : \theta_1}{\Gamma \vdash t_1\ \mathsf{where}\ x = t_2 : \theta_1}$$

Combinators

$$\frac{c : \rho \quad \rho \succeq \theta}{\Gamma \vdash c : \theta}$$

error : ∀X_θ. X
condsh : ∀X_θ. exp[fact] → X → X → X
# : ∀X_θ. X → #[X]
null : ∀X_θ. #X → X
prim_rec : ∀X_θ. (exp[size] → X → X) → X → exp[size] → X
equal : ∀X_α. #X → #X → exp[fact]
fst : ∀X_σ, Y_σ. exp[X × Y] → exp[X]
snd : ∀X_σ, Y_σ. exp[X × Y] → exp[Y]
entry : ∀X_α. exp[size] → exp[vec[X]] → exp[X]
newexp : ∀X_α. exp[#[X]] → (var[X] → comm) → exp[X]
hold : #[comm]
heq : #[comm] → #[comm] → exp[fact]
ap2e : ∀X_α. (var[X] → comm) → (exp[X] → comm)

Fig. 4. Additional FISH terms.
for static shape analysis. If the branches are to have different shapes, then the condition must be resolved statically, i.e. must be a fact f. This is done using a shape conditional ifsh f then e1 else e2, which is sugar for condsh f e1 e2. Each (well-shaped) datum of type δ has the same shape δ_shape, which may be thought of as describing the storage requirements for such a datum, e.g. the number of bits required for a floating point real number. Similarly, every (well-shaped) command has the same shape hold. This reflects the principle that commands must leave the store in the same shape as they found it. The command block new # x = e in C end introduces a local phrase variable x whose shape is e. The combinator newexp is applied to create a local block which returns an expression. The user syntax for newexp is the same as for command blocks, but with end replaced by return x. The output construction converts an expression to a command which has no effect on the store, but outputs the value of its argument. Now let us consider the additional FISH terms in Figure 4. Polymorphic terms are constructed by where-clauses. Although we have not proved it, we believe that polymorphism need not be restricted to syntactic values, as in [MHTM97]. The error combinator arises when shape analysis detects a shape error. The shape combinator # returns the shape of its argument. Its one-sided inverse is null, which returns an array expression of the same shape as its argument, but whose entries are undefined. This is mainly for internal use by the compiler. prim_rec
is a primitive recursion combinator (see below); hold is the shape of a command without shape errors, while heq compares the shapes of two commands; ap2e allows a procedure to be applied to an expression. Its effect can be modelled using newexp, but at the cost of some unnecessary copying. The FISH primitives can be used to construct a large variety of higher-order polymorphic functions, such as the typical BMF combinators, fundamental linear algebra operations, e.g. matrix multiplication, and other fundamental array operations, e.g. quicksort. A suite of these is being assembled as a purely functional sub-language of FISH, called GOLDFISH.

General recursion introduces some delicate issues for shape analysis. In the most common cases the shapes produced are independent of the number of iterations involved. This is the case for while loops, since commands are not allowed to change any shapes. These are quite adequate for datum-level recursion. Recursion on arrays is more problematic, since the iterated function may produce a new array of different shape to the original, so that the shape of the result depends on the number of iterations. This is not a problem in the most common cases, e.g. mapping and reducing, which are handled directly in GOLDFISH. The prim_rec combinator handles the general, shape-changing operation by requiring that the number of iterations be a size, and using this to unfold the recursion during compilation. The general case, iterating a shape-changing function for an unbounded number of iterations, is not currently handled.

Type unification and inference proceed by a variant of the usual Algorithm W [Mil78]. The only difficulty is that the idempotence of the shape function # on types means that most general unifiers, and hence principal types, do not always exist. For example, taken alone, the FISH function fun x -> #x == (~2,int_shape) has ambiguous type, since x could have either type size * #[int] or vec[int]. But when applied to a term with known kind, the function's type becomes unambiguous:
J
:
#x == ( - 2 , i n t _ s h a p e ) )
( ~ 3 , i n t _ s h a p e ) ;;
fact
#J = -false >-I> (fun x -> #x == (-2,int_shape)) J : fact #J = -true
[I i;2 I] ;;
The ambiguities vanish once the kinds are known, so they do not present great practical difficulties. We plan to allow programmer annotations to resolve those that do arise.
2.3 Evaluation
The evaluation rules are not presented, for lack of space. Most are standard, the only novelties being those for applying the shape combinator #. For example, evaluation of #(entry i v) proceeds by evaluating snd #v. Efficiency of the resulting code is greatly enhanced by partial evaluation.
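The flavour of such shape evaluation can be suggested by a small OCaml sketch (ours, not the FISH evaluator): since #vec[α] = size × #α, the shape of an array entry is recovered from the array's shape alone, without consulting the store.

(* Shapes are first-class values, so #(entry i v) reduces to snd (#v),
   i.e. the common shape of the entries of v. *)
type eval_shape =
  | SPair of eval_shape * eval_shape   (* size x element shape *)
  | SSize of int
  | SDatum of string                   (* e.g. int_shape *)

let shape_of_entry v_shape =
  match v_shape with
  | SPair (_size, elem_shape) -> elem_shape   (* snd #v *)
  | _ -> failwith "shape error: argument of entry is not a vector"

let () =
  (* v : vec[vec[int]] has shape (size, (size, int_shape)) *)
  let v_shape = SPair (SSize 10, SPair (SSize 5, SDatum "int_shape")) in
  match shape_of_entry v_shape with
  | SPair (SSize n, _) -> Printf.printf "each entry has length %d\n" n
  | _ -> ()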
Partial evaluation is used to inline all function calls, apply any constant folding, and evaluate all shapes. It is guaranteed to terminate. This is done without knowledge of the store, other than its shape. Inlining succeeds because we do not support a general fixpoint operator (see above). Shape evaluation succeeds because we have maintained a careful separation of shapes from datum values. The key result is that every FISH term of type comm partially evaluates to a TURBOT command, i.e. a simple imperative program without any function calls or procedures. The FISH implementation takes the resulting TURBOT and performs a syntax-directed translation of TURBOT to C, which is compiled and executed in the usual way. The C code produced is quite readable. The translation uses the known shapes of TURBOT arrays in their C declarations. If there is sufficient space they will be allocated to the run-time stack. Otherwise they are allocated to the heap using a checked form of C's malloc() called xmalloc().
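As a taste of the constant-folding component of this pass, here is a minimal OCaml sketch of ours; the real pass additionally inlines all calls and evaluates all shapes.

(* Minimal constant folder over arithmetic expressions: closed
   sub-expressions are evaluated at compile time; everything else is
   rebuilt with folded children. *)
type cexp =
  | Int of int
  | CVar of string
  | Add of cexp * cexp
  | Mul of cexp * cexp

let rec fold e = match e with
  | Int _ | CVar _ -> e
  | Add (a, b) ->
      (match fold a, fold b with
       | Int x, Int y -> Int (x + y)
       | a', b' -> Add (a', b'))
  | Mul (a, b) ->
      (match fold a, fold b with
       | Int x, Int y -> Int (x * y)
       | a', b' -> Mul (a', b'))

(* fold (Add (Int 1, Mul (Int 2, Int 3))) evaluates to Int 7. *)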
3 Benchmarks
We have compared the run-time speed of FISH with a number of other polymorphic languages for several array-based problems. All tests were run on a Sun SparcStation 4 running Solaris 2.5. The C code generated by FISH was compiled using GNU C 2.7.2 at the lowest optimization level (the -O flag), with all floating-point variables of type double (64 bits). As FISH is a call-by-name language, the most natural comparison is with HASKELL. Preliminary tests using ghc 0.29, the Glasgow HASKELL compiler, showed it to be many times slower than FISH. For example, reducing a 100,000-element array in ghc 0.29 took over 16 seconds of user time, compared to 0.04 seconds for FISH. We attribute such lethargy to the fact that Haskell builds its arrays from lists, and may well be boxing its data. Bigloo 1.6, which like FISH compiles a functional language to C, is much slower than FISH. Now let us consider eager functional languages with mutable arrays, such as ML. O'CAML is one of the faster implementations of ML-related languages, especially for arrays, setting a high standard of performance against which to test our claims about FISH.¹ Hence, we confine further comparisons to O'CAML. For O'CAML code, we used ocamlopt, the native-code compiler, from the 1.05 distribution, using the flag -unsafe (eliminating array bounds checks), and also -inline 100, to enable any inlining opportunities. O'CAML also uses 64-bit floats. Also, O'CAML requires all arrays to be initialised, which will impose a small penalty, especially in the first two examples. None of the benchmarks includes I/O, in order to focus comparison on array computation.

¹ While there is a good deal of contention about which ML implementation is fastest overall, O'CAML appears to be superior at array processing. See the thread in the USENET newsgroup comp.lang.ml, beginning with the posting by C. Fecht on October 14, 1996.
          Map    Reduce   MM loop   MM semi   MM comb
FISH      1.02   0.37     0.41      0.70      1.11
O'CAML    3.67   2.18     0.71      5.38      3.45

          Leroy FFT   Quicksort mono   Quicksort poly
FISH      3.17        8.41             8.41
O'CAML    4.07        12.10            57.27

Fig. 5. Benchmarks: FISH vs. O'CAML.
We timed seven kinds of floating point array computations: mapping of constant division over a vector, reduction of addition over a vector, multiplication of matrices (three ways), the Fast Fourier Transform, and quicksort. The three matrix multiplication algorithms use a triply-nested for-loop; a doubly-nested loop of inner-products; and a fully combinatory algorithm using explicit copying, transposition, mapping and zipping. Its FISH form is

let mat_mult_float x y =
  zipop (zipop inner_prod_float)
        (map (copy (cols y)) x)
        (copy (rows x) (transpose y))

For the mapping and reduction tests, we tried vectors containing from 100,000 to 1,000,000 elements. For the matrix multiplications, we tried matrices with dimensions from 10×10 to 100×100. Our FFT test is based on Leroy's benchmark suite for O'CAML, which we translated mechanically to FISH. For quicksort, we implemented two variants of a non-recursive version of the algorithm, one where the comparison operation was fixed (mono), and another where the operation was passed as an argument (poly). We sorted vectors with 10,000 elements. For each benchmark, the times for FISH were at least 25% faster than for O'CAML. Where function parameters are being passed, FISH is from 3 to 7 times faster. The results for the largest input sizes we tried are summarised in Figure 5. The times indicated are user time as reported by Solaris /usr/bin/time. In the O'CAML code, one of the costs is the application of the function passed to map or reduce, which the FISH compiler inlines. To isolate this effect, we also in-lined the function calls in the O'CAML code by hand. This reduced the slow-down for mapping to a factor of two, and for reducing to a factor of three.
4 Relations to other work
As mentioned, the FISH language borrows many features from ALGOL-like languages; [OT97] is a collection of much of the historically significant work in this area. Non-strict evaluation and type polymorphism are also features of HASKELL [HPJW92]. While HASKELL does feature arrays, speed of array operations is not a particular concern. Key features of FISH are its collection of polymorphic
combinators for array programming, and the ability to define new polymorphic array operations. OBJECTIVE CAML also offers some polymorphic operations in its Array module, though no compile-time checking of shape and array bound errors is provided [Obj]. APL is a programming language specialized for array operations, though it does not offer higher-order functions [Ive62]. ZPL is a recent language and system for array programming on parallel machines [ZPL]. Turning to compilation techniques, general results on partial evaluation can be found in [NN92, JGS93]. Inlining is still a subject of active research, e.g. using flow analysis to detect inlining opportunities [JW96, Ash97]. Inlining in FISH is complete, so that there are no functions left in the generated code. Thus, FISH code is specialised for the shapes of its arguments, providing a more refined specialisation than one based just on the type [HM95]. In contrast to the approximations provided by program analyses based on abstract interpretation [CC77], FISH obtains exact shapes of data. Similarly, unboxing of polymorphic function parameters (e.g. [Ler97]) becomes moot, so our system gives a kind of "radical unboxing". Also, FISH's stack discipline obviates the need for garbage collection. Tofte and Talpin's region inference is an attempt to gain the benefits of stack discipline in ML, while still allowing a global store [TT94]. An implementation of this approach based on the ML Kit has recently become available [TBE+97]. In the regions approach, the best sizes of the regions needed are not easily determined. The ML Kit with Regions has tools for allowing programmers to tune their region allocations. In the FISH system, shape analysis assures that allocations are always of exactly the right size.
5 Future Work
Future work will proceed in several directions. Here are three major goals. First, to extend the language to support more data types, e.g. trees, whose shapes are more complex. We have already begun work on supporting higher-order arrays, which have an arbitrary finite number of dimensions, and may contain arrays as elements. Such extensions bring the possibility of combining shape analysis with the other major application of shape, shape polymorphism [BJM96], or polytypism [JJ97]. Second, we are implementing support for general recursion for first-order procedures. Third, FISH is ideally suited to play the role of a co-ordination language for parallel programming, since the GOLDFISH combinators are ideal for distribution and co-ordination, while the imperative code runs smoothly on the individual processors.
6 Conclusions
FISH combines the best features of the functional and imperative programming styles in a powerful and flexible fashion. This leads to significant efficiency gains when executing functional code, and to significant gains in expressiveness for imperative programs. The key to the smooth integration of these two paradigms is
shape analysis, which allows polymorphic functional programs to be constructed from imperative procedures and, conversely, functional programs to be compiled into imperative code. Static shape analysis determines the shapes of all intermediate data structures (i.e. arrays) from those of the arguments, and the shape properties of the program itself. In conventional functional languages this is impossible, since array shapes are given by integers whose values may not be known until execution. FISH avoids this problem by keeping shape types separate from datum types.

Acknowledgements. We would like to thank Chris Hankin, Bob Tennent and Milan Sekanina for many productive discussions.
References

[Ash97] J.M. Ashley. The effectiveness of flow analysis for inlining. In Proc. 1997 ACM SIGPLAN International Conf. on Functional Programming (ICFP '97), pages 99-111, June 1997.
[BJM96] G. Bellè, C.B. Jay, and E. Moggi. Functorial ML. In PLILP '96, volume 1140 of LNCS, pages 32-46. Springer Verlag, 1996. TR SOCS-96.08.
[BW88] R. Bird and P. Wadler. Introduction to Functional Programming. International Series in Computer Science. Prentice Hall, 1988.
[CC77] P. Cousot and R. Cousot. Abstract interpretation: A unified lattice model for static analysis of programs by construction of approximation of fixpoints. In Conf. Record of the 4th ACM Symposium on Principles of Programming Languages, pages 238-252, 1977.
[DGHY95] J. Darlington, Y.K. Guo, H.W. To, and Y. Jing. Functional skeletons for parallel coordination. In Proceedings of Europar 95, 1995.
[HM95] R. Harper and G. Morrisett. Compiling polymorphism using intensional type analysis. In Conference Record of POPL '95: 22nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 130-141, San Francisco, California, January 1995.
[HPJW92] P. Hudak, S. Peyton-Jones, and P. Wadler. Report on the programming language Haskell: a non-strict, purely functional language. SIGPLAN Notices, 1992.
[Ive62] K.E. Iverson. A Programming Language. Wiley, 1962.
[JCSS97] C.B. Jay, M.I. Cole, M. Sekanina, and P. Steckler. A monadic calculus for parallel costing of a functional language of arrays. In C. Lengauer, M. Griebl, and S. Gorlatch, editors, Euro-Par'97 Parallel Processing, volume 1300 of Lecture Notes in Computer Science, pages 650-661. Springer, August 1997.
[JGS93] N.D. Jones, C.K. Gomard, and P. Sestoft. Partial Evaluation and Automatic Program Generation. International Series in Computer Science. Prentice Hall, 1993.
[JJ97] P. Jansson and J. Jeuring. PolyP - a polytypic programming language extension. In POPL '97: The 24th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 470-482. ACM Press, 1997.
[JS97a] C.B. Jay and M. Sekanina. Shape checking of array programs. In Computing: the Australasian Theory Seminar, Proceedings, 1997, volume 19 of Australian Computer Science Communications, pages 113-121, 1997.
[JS97b] C.B. Jay and P.A. Steckler. The functional imperative: shape! Technical Report 06, University of Technology, Sydney, 1997. 20 pp.
[JW96] S. Jagannathan and A. Wright. Flow-directed inlining. In Proc. ACM SIGPLAN 1996 Conf. on Programming Language Design and Implementation, pages 193-205, 1996.
[Ler97] X. Leroy. The effectiveness of type-based unboxing. In Abstracts from the 1997 Workshop on Types in Compilation (TIC97). Boston College Computer Science Department, June 1997.
[MHTM97] R. Milner, R. Harper, M. Tofte, and D. MacQueen. The Definition of Standard ML (Revised). MIT Press, 1997.
[Mil78] R. Milner. A theory of type polymorphism in programming. JCSS, 17, 1978.
[NN92] F. Nielson and H.R. Nielson. Two-level Functional Languages. Cambridge University Press, 1992.
[Obj] OBJECTIVE CAML home page. http://pauillac.inria.fr/ocaml.
[OT97] P.W. O'Hearn and R.D. Tennent, editors. Algol-like Languages, Vols I and II. Progress in Theoretical Computer Science. Birkhäuser, 1997.
[Rey81] J.C. Reynolds. The essence of ALGOL. In J.W. de Bakker and J.C. van Vliet, editors, Algorithmic Languages, pages 345-372. IFIP, North-Holland Publishing Company, 1981.
[Rey96] J.C. Reynolds. Design of the programming language Forsythe. Report CMU-CS-96-146, Carnegie Mellon University, June 1996.
[Ski94] D.B. Skillicorn. Foundations of Parallel Programming. Number 6 in Cambridge Series in Parallel Computation. Cambridge University Press, 1994.
[TBE+97] M. Tofte, L. Birkedal, M. Elsman, N. Hallenberg, T.H. Olesen, P. Sestoft, and P. Bertelsen. Programming with regions in the ML Kit. Technical Report 97/12, Univ. of Copenhagen, 1997.
[Ten89] R.D. Tennent. Elementary data structures in Algol-like languages. Science of Computer Programming, 13:73-110, 1989.
[TT94] M. Tofte and J.-P. Talpin. Implementation of the typed call-by-value λ-calculus using a stack of regions. In Conf. Record of POPL '94: 21st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 188-201, January 1994.
[ZPL] ZPL home page. http://www.cs.washington.edu/research/zpl.
Code Motion and Code Placement: Just Synonyms?*

Jens Knoop¹**, Oliver Rüthing², and Bernhard Steffen²

¹ Universität Passau, D-94030 Passau, Germany, e-mail: [email protected]
² Universität Dortmund, D-44221 Dortmund, Germany, e-mail: {rueth...
Abstract. We prove that there is no difference between code motion (CM) and code placement (CP) in the traditional syntactic setting, but a dramatic difference in the semantic setting. We demonstrate this by re-investigating semantic CM under the perspective of the recent development of syntactic CM. Besides clarifying and highlighting the analogies and essential differences between the syntactic and the semantic approach, this leads as a side-effect to a drastic reduction of the conceptual complexity of the value-flow based procedure for semantic CM of [20], as the original bidirectional analysis is decomposed into purely unidirectional components. On the theoretical side, this establishes a natural semantical understanding in terms of the Herbrand interpretation (transparent equivalence), and thus eases the proof of correctness; moreover, it shows the frontier of semantic CM, and gives reason for the lack of algorithms going beyond. On the practical side, it simplifies the implementation and increases the efficiency, which, like for its syntactic counterpart, can be the catalyst for its migration from academia into industrial practice.

Keywords: Program optimization, data-flow analysis, code motion, code placement, partial redundancy elimination, transparent equivalence, Herbrand interpretation.
1 Motivation
Code motion (CM) is a classical optimization technique for eliminating partial redundancies (PRE).¹ Living in an ideal world, a PRE-algorithm would yield the program of Figure 1(b) when applied to the program of Figure 1(a): a truly optimal result, free of any redundancies.

* An extended version is available as [14].
** The work of the author was funded in part by the Leibniz Programme of the German Research Council (DFG) under grant O1 98/1-1.
¹ CM and PRE are often identified. To be precise, however, CM is a specific technique for PRE. As we are going to show here, identifying them is inadequate in general, and thus, we are precise on this distinction.
Fig. 1. Living in an Ideal World: The Effect of Partial Redundancy Elimination. (a) The Original Program; (b) The Desired Optimization: A Truly Optimal Program - No Redundancies at all!
Unfortunately, the world is not that ideal. Up to now, there is no algorithm achieving the result of Figure l(b). In reality, PRE is characterized by two different approaches for CM: a syntactic and a semantic one. Figure 2 illustrates their effects on the example of Figure l(a). The point of syntactic CM is to treat all term patterns independently and to regard each assignment as destructive to any term pattern containing the left-hand side variable. In the example of Figure l(a) it succeeds in eliminating the redundancy of a + b in the left loop, but fails on the redundancy of c + p in the right loop, which, because of the assignment to p, is not redundant in the "syntactic" sense inside the loop. In contrast, semantic CM fully models the effect of assignments, usually by means of a kind of symbolic execution (value numbering) or by backward substitution: by exploiting the equality of p and b after the assignment p := b, it succeeds in eliminating the redundant computation of c + p inside the right loop as well. However, neither syntactic nor semantic CM succeeds in eliminating the partial redundancies at the edges 8 and 13. This article is concerned with answering why: we will prove that redundancies like in this example are out of the scope of any "motion"-based PRE-technique. Eliminating them requires to switch from motion-based to "placement"-based techniques. This fact, and more generally, the analogies and differences between syntactic and semantic CM and CP as illustrated in Figures 2(a) and (b), and Figure l(b), respectively, are elucidated for the first time in this article.
Fig. 2. Back to Reality: Syntactic Code Motion vs. Semantic Code Motion
History and Current Situation: Syntactic CM (cf. [2, 5-8, 12, 13, 15, 18]) is well-understood and integrated in many commercial compilers.² In contrast, the much more powerful and aggressive semantic CM currently has a very limited practical impact. In fact, only the "local" value numbering for basic blocks [4] is widely used. Globalizations of this technique can be classified into two categories: limited globalizations, where code can only be moved to dominators [3, 16], and aggressive globalizations, where code can be moved more liberally [17, 20, 21]. The limited approaches are quite efficient, however, at the price of losing significant parts of the optimization power: they even fail in eliminating some of the redundancies covered by syntactic methods. In contrast, the aggressive approaches are rather complex, both conceptually and computationally, and are therefore considered impractical. This judgement is supported by the state-of-the-art here, which is still based on bidirectional analyses and heuristics making the proposed algorithms almost incomprehensible. In this article we re-investigate (aggressive) semantic CM under the perspective of the very successful recent development of syntactic CM. This investigation highlights the conceptual analogies and differences between the syntactic and the semantic approach. In particular, it allows us to show that:

² E.g., based on [12, 13] in the Sun SPARCompiler language systems (SPARCompiler is a registered trademark of SPARC International, Inc., and is licensed exclusively to Sun Microsystems, Inc.).
- the decomposition technique into unidirectional analyses developed in [12, 13] can be transferred to the semantic setting. Besides establishing a natural connection between the Herbrand interpretation (transparent equivalence) and the algorithm, which eases the proof of its correctness,³ this decomposition leads to a more efficient and easier implementation. In fact, due to this simplification, we are optimistic that semantic CM will find its way into industrial practice.
- there is a significant difference between motion and placement techniques (see Figures 1 and 2), which only shows up in the semantic setting. The point of this example is that the computations of a + b and c + b cannot safely be "moved" to their computation points in Figure 1(b), but they can safely be "placed" there (see Figure 3 for an illustration of the essentials of this example).
The major contributions of this article are thus as follows. On the conceptual side: (1) uncovering that CM and CP are not synonyms in the semantic setting (though they are in the syntactic one), (2) showing the frontier of semantic CM, and (3) giving theoretical and practical reasons for the lack of algorithms going beyond! On the technical side, though almost as a side-effect yet equally important, presenting a new algorithm for computationally optimal semantic CM, which is conceptually and technically much simpler than its predecessor of [20]. Whereas the difference between motion and placement techniques will primarily be discussed on a conceptual level, the other points will be treated in detail.
Fig. 3. Illustrating the Difference: Sem. Code Placement vs. Sem. Code Motion
Safety - The Backbone of Code Motion: Key towards the understanding of the conceptual difference between syntactic and semantic CM is the notion of

³ Previously (cf. [20, 21]), this connection, which is essential for the conceptual understanding, had to be established in a very complicated indirect fashion.
safety of a program point for some computation: intuitively, a program point is safe for a given computation if the execution of the computation at this point results in a value that is guaranteed to be computed by every program execution passing the point. Similarly, up-safety (down-safety) is defined by requiring that the computation of the value is guaranteed before meeting the program point for the first time (after meeting the program point for the last time).⁴ As properties like these are undecidable in the standard interpretation (cf. [16]), decidable approximations have been considered. Prominent are the abstractions leading to the syntactic and semantic approach considered in this article. Concerning the safety notions established above, the following result is responsible for the simplicity and elegance of the syntactic CM-algorithms (cf. [13]):

Theorem 1 (Syntactic Safety).
Safe = Up-safe ∨ Down-safe
It is the failure of this equality in the semantic setting which causes most of the problems of semantic CM, because the decomposition of safety into up-safety and down-safety is essential for the elegant syntactic algorithms. Figure 4 illustrates this failure as follows: placing the computation of a + b at the boldface join-node is (semantically) safe, though it is neither up-safe nor down-safe. As a consequence, simply transferring the algorithmic idea of the syntactic case to the semantic setting without caring about this equivalence results in an algorithm for CM with second-order effects (cf. [17]).⁵ These can be avoided by defining a motion-oriented notion of safety, which allows us to reestablish the equality for a hierarchically defined notion of up-safety: the algorithm resulting from the use of these notions captures all the second-order effects of the "straightforwardly transferred" algorithm as well as the results of the original bidirectional version for semantic CM of [20].

Fig. 4. Safe though neither Up-Safe nor Down-Safe

Motion versus Placement: The step from the motion-oriented notion of safety to "full safety" can be regarded as the step from motion-based algorithms to placement-based algorithms: in contrast to CM, CP is characterized by allowing arbitrary (safe) placements of computations with subsequent (total) redundancy

⁴ Up-safety and down-safety are traditionally called "availability" and "very busyness", which, however, does not reflect the "semantical" essence and the duality of the two properties as precisely as up-safety and down-safety.
⁵ Intuitively, this means the transformation is not idempotent (cf. [17]).
elimination (TRE). As illustrated in Figure 3, not all placements can be realized via motion techniques, which are characterized by allowing code movement only within areas where the placement would be correct. The power of arbitrary placement leads to a number of theoretic and algorithmic complications and anomalies (cf. Section 5), which, we conjecture, can only be solved by changing the graph structure of the argument program, e.g. along the lines of [19]. Retrospectively, the fact that all CM-algorithms arise from notions of safety which collapse in the syntactic setting suffices to explain that the syntactic algorithm does not have any second-order effects, and that there is no difference between "motion" and "placement" algorithms in the syntactic setting.

Theorem 2 (Syntactic CM and Syntactic CP). In the syntactic setting, CM is as powerful as CP (and vice versa).
2 Preliminaries
We consider procedures of imperative programs, which we represent by means of directed edge-labeled flow graphs G = (N, E, s, e) with node set N, edge set E, a unique start node s and end node e, which are assumed to have no incoming and no outgoing edges, respectively. The edges of G represent both the statements and the nondeterministic control flow of the underlying procedure, while the nodes represent program points only. Statements are parallel assignments of the form (x₁,...,x_r) := (t₁,...,t_r), where the x_i are pairwise distinct variables, and the t_i terms of the set T, which as usual are inductively composed of variables, constants, and operators. For r = 0, we obtain the empty statement "skip". Unlabeled edges are assumed to represent "skip". Without loss of generality we assume that edges starting in nodes with more than one successor are labeled by "skip".⁶ Source and destination node of a flow-graph edge corresponding to a node n of a traditional node-labeled flow graph represent the usual distinction between the entry and the exit point of n explicitly. This simplifies the formal development of the theory significantly, particularly the definition of the value-flow graph in Section 3.1, because the implicit treatment of this distinction, which unfortunately is usually necessary for the traditional flow-graph representation, is obsolete here; a point which is intensely discussed in [11]. For a flow graph G, let pred(n) and succ(n) denote the set of all immediate predecessors and successors of a node n, and let source(e) and dest(e) denote the source and the destination node of an edge e. A finite path in G is a sequence (e₁,...,e_q) of edges such that dest(e_j) = source(e_{j+1}) for j ∈ {1,...,q-1}. It is a path from m to n if source(e₁) = m and dest(e_q) = n. Additionally, p[i,j], where 1 ≤ i ≤ j ≤ q, denotes the subpath (e_i,...,e_j) of p. Finally, P[m,n] denotes the set of all finite paths from m to n. Without loss of generality we assume that every node n ∈ N lies on a path from s to e.

⁶ Our approach is not limited to a setting with scalar variables. However, we do not consider subscripted variables here in order to avoid burdening the development by alias-analyses.
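For concreteness, the following OCaml fragment (our rendering; the paper fixes no implementation, and all names are ours) encodes these edge-labeled flow graphs as data; the later sketches build on it.

(* Edge-labeled flow graphs: statements sit on edges, nodes are bare
   program points. *)
type term = Var of string | Const of int | Op of string * term list

(* Parallel assignment (x1,...,xr) := (t1,...,tr); r = 0 encodes "skip". *)
type stmt = Assign of string list * term list

type node = int

type edge = { src : node; dst : node; stmt : stmt }

type flow_graph = {
  nodes : node list;
  edges : edge list;
  start : node;   (* s: no incoming edges *)
  stop  : node;   (* e: no outgoing edges *)
}

(* Incoming and outgoing edges of a node; the paper's pred(n) and succ(n)
   are the sources/destinations of these. *)
let in_edges g n = List.filter (fun ed -> ed.dst = n) g.edges
let out_edges g n = List.filter (fun ed -> ed.src = n) g.edges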
3 Semantic Code Motion
In essence, the reason making semantic CM much more intricate than syntactic CM is that in the semantic setting safety is not the sum of up-safety and down-safety (i.e., it does not coincide with their disjunction). In order to illustrate this in more detail, we consider as in [17] transparent equivalence of terms, i.e., we fully treat the effect of assignments, but we do not exploit any particular properties of term operators.⁷ Moreover, we essentially base our investigations on the semantic CM-algorithm of [20], consisting of two major phases:
- Preprocess: a) Computing transparent equivalences (cf. Section 3.1). b) Constructing the value-flow graph (cf. Section 3.1).
- Main Process: Eliminating semantic redundancies (cf. Section 4).
After computing the transparent equivalences of program terms for each program point, the preprocess globalizes this information according to the value flow of the program. The value-flow graph is just the syntactic representation for storing the global flow information. Based on this information the main process eliminates semantically redundant computations by appropriately placing the computations in the program. In the following section we sketch the preprocess as far as it is necessary for the development here, while we investigate the main process in full detail. In comparison to [20] it is completely redesigned.
3.1 The Preprocess: Constructing the Value-Flow Graph
The first step of the preprocess computes for every program point the set of all transparently equivalent program terms. The corresponding algorithm is fairly straightforward and matches the well-known pattern of Kildall's algorithm (cf. [10]). As a result, every program point is annotated by a structured partition (SP) DAG (cp. [9]), i.e., an ordered, directed, acyclic graph whose nodes are labeled with at most one operator or constant and a set of variables. The SPDAG attached to a program point represents the set of all terms being transparently equivalent at this point: two terms are transparently equivalent iff they are represented by the same node of an SPDAG; e.g., an SPDAG may represent the term equivalences [a | b, d | c | x, a+b, a+d | y, c+b, c+d]. Afterwards, the value-flow graph is constructed. Intuitively, it connects the nodes, i.e., the term equivalence classes of the SPDAG-annotation D computed in the previous step, according to the data flow. Thus, its nodes are the classes of transparently equivalent terms, and its edges are the representations of the data flow: if two nodes ν and ν′ of a value-flow graph are connected by a value-flow graph edge (ν, ν′), then the terms represented by ν′ evaluate, after executing the flow-graph edge e corresponding to (ν, ν′), to the same value as the terms represented by ν before executing e (cf. Figure 7). This is made precise by

⁷ In [20] transparent equivalence is therefore called Herbrand equivalence, as it is induced by the Herbrand interpretation.
means of the relation ⇝ defined next. To this end, let Γ denote the set of all nodes of the SPDAG-annotation D. Moreover, let Terms(ν) denote the set of all terms represented by a node ν of Γ, and let N(ν) denote the flow-graph node ν is related to. Central is the definition of the backward substitution δ, which is defined for each flow-graph edge e ≡ (x₁,...,x_r) := (t₁,...,t_r) by δ_e : T → T, δ_e(t) =_df t[t₁,...,t_r/x₁,...,x_r], where t[t₁,...,t_r/x₁,...,x_r] stands for the simultaneous replacement of all occurrences of x_i by t_i in t, i ∈ {1,...,r}.⁸ The relation ⇝ on Γ is now defined by:

$$\forall (\nu,\nu') \in \Gamma \times \Gamma.\ \ \nu \rightsquigarrow \nu' \;\Leftrightarrow_{df}\; (N(\nu),N(\nu')) \in E \ \wedge\ \mathit{Terms}(\nu) \supseteq \delta_e(\mathit{Terms}(\nu'))$$

The value-flow graph for an SPDAG-annotation D is then defined as follows:

Definition 1 (Value-Flow Graph). The value-flow graph with respect to D is a pair VFG = (VFN, VFE) consisting of
- a set of nodes VFN =_df Γ, called abstract values, and
- a set of edges VFE ⊆ VFN × VFN with VFE =_df ⇝.

It is worth noting that the value-flow graph definition given above is technically much simpler than its original version of [20]. This is simply a consequence of representing procedures here by edge-labeled flow graphs instead of node-labeled flow graphs as in [20]. Predecessors, successors and finite paths in the value-flow graph are denoted by overloading the corresponding notions for flow graphs, e.g., pred(ν) addresses the predecessors of ν ∈ Γ.
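The backward substitution δ_e is easy to render over the term type of the flow-graph sketch in Section 2 (again an illustration of ours):

(* Backward substitution delta_e: for e = (x1,...,xr) := (t1,...,tr),
   all occurrences of each xi in t are replaced simultaneously by ti;
   for "skip" (r = 0) it is the identity. *)
let rec subst sigma t = match t with
  | Var x -> (try List.assoc x sigma with Not_found -> t)
  | Const _ -> t
  | Op (f, args) -> Op (f, List.map (subst sigma) args)

let delta (Assign (xs, ts)) t = subst (List.combine xs ts) t

(* E.g. for the edge p := b of Figure 2, delta maps c+p back to c+b,
   the transparent equivalence exploited by semantic CM. *)
let _ = delta (Assign (["p"], [Var "b"]))
              (Op ("+", [Var "c"; Var "p"]))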
VFG-Redundancies: In order to define the notion of (partial) redundancies with respect to a value-flow graph VFG, we need to extend the local predicate Comp for terms, known from syntactic CM, to the abstract values represented by value-flow graph nodes. In the syntactic setting Comp_e expresses that a given term t under consideration is computed at edge e, i.e., t is a sub-term of some right-hand side term of the statement associated with e. Analogously, for every abstract value ν ∈ VFN the local predicate Comp_ν expresses that the statements of the corresponding outgoing flow-graph edges compute a term represented by ν:⁹

$$\mathit{Comp}_\nu \;\Leftrightarrow_{df}\; \bigl(\forall e \in E.\ \mathit{source}(e) = N(\nu) \Rightarrow \mathit{Terms}(\nu) \cap \mathit{Terms}(e) \neq \emptyset\bigr)$$

Here Terms(e) denotes the set of all terms occurring on the right-hand side of the statement of edge e. In addition, we need the notion of correspondence between value-flow graph paths and flow-graph paths. Let p = (e₁,...,e_q) ∈ P[m,n] be a path in the flow graph G, and let p′ = (ε₁,...,ε_r) ∈ P[ν,μ] be a path in a value-flow graph VFG of G. Then p′ is a corresponding VFG-prefix of p if, for all i ∈ {1,...,r}: N(source(ε_i)) = source(e_i) and N(dest(ε_i)) = dest(e_i). Analogously, the notion of a corresponding VFG-postfix p′ of p is defined. We can now define:

⁸ Note that for edges labeled by "skip" the function δ_e equals the identity on terms, Id_T.
⁹ Recall that edges starting in nodes with more than one successor are labeled by "skip". Thus we have: ∀n ∈ N. |{e | source(e) = n}| > 1 ⟹ Terms({e | source(e) = n}) = ∅. Hence, the truth value of the predicate Comp_ν actually depends on a single flow-graph edge only.
Definition 2 (VFG-Redundancy). Let VFG be a value-flow graph, let n be a node of G, and let t be a term of T. Then t is
1. partially VFG-redundant at n, if there is a path p = (e₁,...,e_q) ∈ P[s,n] with a corresponding VFG-postfix p′ = (ε₁,...,ε_r) ∈ P[ν,μ] such that Comp_ν and t ∈ Terms(μ) holds.
2. (totally) VFG-redundant at n, if t is partially VFG-redundant along each path p ∈ P[s,n].
4 The Main Process: Eliminating Semantic Redundancies
The nodes of a value-flow graph represent semantic equivalences of terms syntactically. In the main process of eliminating semantic redundancies, they play the same role as the lexical term patterns in the elimination of syntactic redundancies by syntactic CM. We demonstrate this analogy in the following section.
4.1 The Straightforward Approach
In this section we extend the analyses underlying the syntactic CM-procedure to the semantic situation in a straightforward fashion. To this end let us first recall the equation system characterizing up-safety in the syntactic setting: up-safety of a term pattern t at a program point n means that t is computed on every program path reaching n without an intervening modification of any of its operands.¹⁰
Equation System 3 (Syntactic Up-Safety for a Term Pattern t)

$$\mathit{Syn\text{-}US}_n \;=\; (n \neq s)\cdot \prod_{m\,\in\,\mathit{pred}(n)} \bigl(\mathit{Syn\text{-}US}_m + \mathit{Comp}_m\bigr)\cdot \mathit{Transp}_{(m,n)}$$
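Read operationally, Equation System 3 asks for a greatest fixpoint over the flow graph. The following naive round-robin solver in OCaml (our illustration, reusing the flow-graph sketch of Section 2; comp and transp are assumed given) makes this concrete:

(* Greatest solution of Equation System 3: start from "true" everywhere
   and shrink until stable. comp m holds iff the term pattern is
   computed at the edges leaving m; transp ed holds iff edge ed does not
   modify the pattern's operands. *)
let syn_up_safety g comp transp =
  let us = Hashtbl.create 16 in
  List.iter (fun n -> Hashtbl.replace us n true) g.nodes;
  let changed = ref true in
  while !changed do
    changed := false;
    List.iter (fun n ->
      let v =
        n <> g.start
        && List.for_all
             (fun ed -> (Hashtbl.find us ed.src || comp ed.src) && transp ed)
             (in_edges g n)
      in
      if v <> Hashtbl.find us n then (Hashtbl.replace us n v; changed := true))
      g.nodes
  done;
  us   (* Hashtbl.find us n = true iff the pattern is up-safe at n *)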
The corresponding equation system for VFG-up-safety is given next. Note that there is no predicate "VFG-Transp" corresponding to the predicate Transp. In the syntactic setting, the transparency predicate Transp_e is required for checking that the value of the term t under consideration is maintained along a flow-graph edge e, i.e., that none of its operands is modified. The essence of the value-flow graph is that transparency is modeled by the edges: two value-flow graph nodes are connected by a value-flow graph edge iff the value they represent is maintained along the corresponding flow-graph edge.

¹⁰ As convenient, we use ·, + and overlining for logical conjunction, disjunction and negation, respectively.
Equation System 4 (VFG-Up-Safety)

$$\mathit{VFG\text{-}US}_\nu \;=\; (\nu \notin \mathit{VFN}_s)\cdot \prod_{\mu\,\in\,\mathit{pred}(\nu)} \bigl(\mathit{VFG\text{-}US}_\mu + \mathit{Comp}_\mu\bigr)$$
The roles of the start node and end node of the flow graph are here played by the "start nodes" and "end nodes" of the value-flow graph, defined by:

$$\mathit{VFN}_s =_{df} \{\, \nu \mid N(\mathit{pred}(\nu)) \neq \mathit{pred}(N(\nu)) \vee N(\nu) = s \,\}$$
$$\mathit{VFN}_e =_{df} \{\, \nu \mid N(\mathit{succ}(\nu)) \neq \mathit{succ}(N(\nu)) \vee N(\nu) = e \,\}$$

Down-safety is the dual counterpart of up-safety. However, the equation system for the VFG-version of this property is technically more complicated, as we have two graph structures, the flow graph and the value-flow graph, which must both be taken separately into account: as in the syntactic setting (or for up-safety), we need safety universally along all flow-graph edges, which is reflected by a value-flow graph node ν being down-safe at the program point N(ν) if a term represented by ν is computed on all flow-graph edges leaving N(ν), or if it is down-safe at all successor points. However, we may justify safety along a flow-graph edge e by means of the VFG in various ways, as, in contrast to up-safety concerning the forward flow (or the syntactic setting), a value-flow graph node may have several successors corresponding to e (cf. Figure 7), and it suffices to have safety only along one of them. This is formally described by:
Equation System 5 (VFG-Motion-Down-Safety)

$$\mathit{VFG\text{-}MDS}_\nu \;=\; (\nu \notin \mathit{VFN}_e)\cdot\Bigl(\mathit{Comp}_\nu \;+\; \prod_{m\,\in\,\mathit{succ}(N(\nu))}\ \sum_{\substack{\mu\,\in\,\mathit{succ}(\nu)\\ N(\mu)=m}} \mathit{VFG\text{-}MDS}_\mu\Bigr)$$
The Transformation (of the Straightforward Approach). Let VFG-US* and VFG-MDS* denote the greatest solutions of Equation Systems 4 and 5. The analogy with the syntactic setting is continued by the specification of the semantic CM-transformation, called Sem-CM_strght. It is essentially given by the set of insertion and replacement points. Insertion of an abstract value ν means to initialize a fresh temporary with a minimal representative t′ of ν, i.e., one containing a minimal number of operators, on the flow-graph edges leaving node N(ν). Replacement means to replace an original computation by a reference to the temporary storing its value.
$$\mathit{Insert}_\nu \;\Leftrightarrow_{df}\; \mathit{VFG\text{-}MDS}^*_\nu \cdot \Bigl((\nu \in \mathit{VFN}_s) + \sum_{\mu\,\in\,\mathit{pred}(\nu)} \overline{\mathit{VFG\text{-}MDS}^*_\mu + \mathit{VFG\text{-}US}^*_\mu}\Bigr)$$

$$\mathit{Replace}_\nu \;\Leftrightarrow_{df}\; \mathit{Comp}_\nu$$
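Given the greatest solutions of Equation Systems 4 and 5, insertion and replacement are purely local tests. A possible OCaml rendering (ours; mds and us are the solved predicates, and vfg_pred and is_start over the value-flow graph are assumed given):

(* A value-flow graph node is an insertion point iff it is down-safe and
   is either a VFG start node or has a predecessor that is neither
   down-safe nor up-safe; replacement points are exactly the Comp nodes. *)
let insert_point ~mds ~us ~is_start ~vfg_pred v =
  mds v
  && (is_start v
      || List.exists (fun u -> not (mds u || us u)) (vfg_pred v))

let replace_point comp v = comp v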
Managing and reinitializing temporaries: In the syntactic setting, there is for every term pattern a unique temporary, and these temporaries are not interacting with each other: the values computed at their initialization sites are thus
propagated to all their use sites, i.e., the program points containing an original occurrence of the corresponding term pattern, without requiring special care. In the semantic setting, propagating these values usually requires resetting them at certain points to values of other temporaries, as illustrated in Figure 2(b). The complete process, including the managing of temporary names, is accomplished by a straightforward analysis starting at the original computation sites and following the value-flow graph edges in the opposite direction to the insertion points. The details of this procedure are not recalled here as they are not essential for the point of this article. They can be found in [20, 21].

Second-Order Effects: In contrast to the syntactic setting, the straightforward semantic counterpart Sem-CM_strght has second-order effects. This is illustrated in Figure 5: applying Sem-CM_strght to the program of Figure 5(a) results in the program of Figure 5(b). Repeating the transformation again results in the optimal program of Figure 5(c).
Fig. 5. Illustrating Second-Order Effects
Intuitively, the reason that Sem-CM_strght has second-order effects is that safety is no longer the sum of up-safety and down-safety. Above, up-safety is only computed with respect to original computations. While this is sufficient in the syntactic case, it is not in the semantic one, as it does not take into account that the placement of computations as a result of the transformation may make "new" values available; e.g., in the program of Figure 5(b) the value of a + b becomes available at the end of the program fragment displayed. As a consequence, total redundancies like the one in Figure 5(b) can remain in the
program. Eliminating them (TRE) as illustrated in Figure 5(c) is sufficient to capture the second-order effects completely. We have:
Theorem 3 (Sem-CM_strght).
1. Sem-CM_strght relies on only two unidirectional analyses.
2. Sem-CM_strght has second-order effects.
3. Sem-CM_strght followed by TRE achieves computationally motion-optimal results wrt the Herbrand interpretation, like its bidirectional precursor of [20].

4.2 Avoiding Second-Order Effects: The Hierarchical Approach
The point of this section is to reestablish, as a footing for the algorithm, a notion of safety which can be characterized as the sum of up-safety and down-safety. The central result here is that a notion of safety tailored to capture the idea of "motion" (in contrast to "placement") can be characterized by down-safety together with the following hierarchical notion of up-safety:
Equation System 6 ("Hierarchical" VFG-Up-Safety)

$$\mathit{VFG\text{-}MUS}_\nu \;=\; \mathit{VFG\text{-}MDS}^*_\nu \;+\; (\nu \notin \mathit{VFN}_s)\cdot \prod_{\mu\,\in\,\mathit{pred}(\nu)} \mathit{VFG\text{-}MUS}_\mu$$
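Equation System 6 again calls for a greatest fixpoint, now seeded with the already-computed down-safety solution. The OCaml sketch below (ours; the value-flow graph accessors are assumed given) makes the hierarchy explicit: up-safety consumes the finished down-safety analysis instead of being intertwined with it.

(* Greatest solution of Equation System 6. mds is the solved predicate
   VFG-MDS* from Equation System 5; the analysis itself is again a plain
   unidirectional round-robin iteration. *)
let hierarchical_up_safety vfg_nodes vfg_pred is_start mds =
  let mus = Hashtbl.create 16 in
  List.iter (fun v -> Hashtbl.replace mus v true) vfg_nodes;
  let changed = ref true in
  while !changed do
    changed := false;
    List.iter (fun v ->
      let value =
        mds v
        || (not (is_start v)
            && List.for_all (fun u -> Hashtbl.find mus u) (vfg_pred v))
      in
      if value <> Hashtbl.find mus v then begin
        Hashtbl.replace mus v value; changed := true
      end) vfg_nodes
  done;
  mus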
In fact, the phenomenon of second-order effects can now completely and elegantly be overcome by the following hierarchical procedure:¹¹
1. Compute down-safety (cf. Equation System 5).
2. Compute the modified up-safety property on the basis of the down-safety computation (cf. Equation System 6).
The transformation Sem-CM_Hier of the hierarchical approach is now defined as before, except that VFG-MUS* is used instead of VFG-US*. Its results coincide with those of an exhaustive application of the CM-procedure proposed in Subsection 4.1, as well as with those of the original (bidirectional) CM-procedure of [20]. However, in contrast to the latter algorithm, which due to its bidirectionality required a complicated correctness argument, the transformation of the hierarchical approach allows a rather straightforward link to the Herbrand semantics, which drastically simplifies the correctness proof. Besides this conceptual improvement, the new algorithm is also easier to comprehend and to implement, as it does not require any bidirectional analysis. We have:
Theorem 4 (Sem-CM_Hier).
1. Sem-CM_Hier relies on only two unidirectional analyses, sequentially ordered.
2. Sem-CM_Hier is free of second-order effects.
3. Sem-CM_Hier achieves computationally motion-optimal results wrt the Herbrand interpretation, like its bidirectional precursor of [20].

¹¹ The "down-safety/earliest" characterization of [12] was also already hierarchical. However, in the syntactic setting this is not relevant, as was shown in [13].
5 Semantic Code Placement
In this section we are concerned with crossing the frontier marked by semantic CM towards semantic CP, and with giving theoretical and practical reasons for the fact that no algorithm has gone beyond this frontier so far. On the theoretical side (I), we prove that computational optimality is impossible for semantic CP in general. On the practical side (II), we demonstrate that down-safety, the handle for correctly and profitably placing computations by syntactic and semantic CM, is inadequate for semantic CP.

(I) No Optimality in General: Semantic CP cannot be done "computationally placement-optimally" in general. This is a significant difference in comparison to syntactic and semantic CM, which have a least element with respect to the relation "computationally better" (cf. [13, 20]). In semantic CP, however, we are faced with the phenomenon of incomparable minima. This is illustrated in the example of Figure 6, showing a slight modification of the program of Figure 3. Both programs of Figure 6 are of incomparable quality, since the "right-most" path is improved by impairing the "left-most" one.
Fig. 6. No Optimality in General: An Incomparable Result due to CP
(II) Inadequateness of Down-Safety: For syntactic and semantic CM, down-safe program points are always legal insertion points. Inserting a computation at a down-safe point guarantees that it can be used on every program continuation. This, however, does not hold for semantic CP. Before showing this in detail, we illustrate the difference between "placeable" VFG-down-safety (VFG-DownSafe) and "movable" VFG-down-safety (VFG-M-DownSafe) by means of Figure 7, which shows the value-flow graph corresponding to the example of Figure 3.

Fig. 7. The Corresponding Value-Flow Graph

We remark that the predicate VFG-DownSafe is decidable. However, the point to be demonstrated here is that this property is insufficient anyhow in order to (straightforwardly) arrive at an algorithm for semantic CP. This is illustrated by Figure 8, which shows a slight modification of the program of Figure 6. Though the placement of h := a + b is perfectly down-safe, it cannot be used at all, thus impairing the program.

Fig. 8. Inadequateness of Down-Safety: Degradation through Naive CP
Summary: The examples of Figure 3 and of this section share the property that they are invariant under semantic CM, since a + b cannot safely be moved to (and hence not across) the join node in the mid part. However, a + b can safely be placed in the left branch of the upper branch statement. In the example of Figure 3 this suffices to show that semantic CP is in general strictly more powerful
than semantic CM. On the other hand, Figure 6 demonstrates that "computational optimality" for semantic CP is impossible in general. While this rules out the possibility of an algorithm for semantic CP being uniformly superior to every other semantic CP-algorithm, the inadequateness of down-safety revealed by the second example of this section explains the lack of even heuristics for semantic CP: down-safety, the magic wand of syntactic and semantic CM, loses its magic for semantic CP. In fact, straightforward adaptations of the semantic CM-procedure to semantic CP would be burdened with the placing anomalies of Figures 6 and 8. We conjecture that a satisfactory solution to these problems requires structural changes of the argument program. Summarizing, we have:

Theorem 5 (Semantic Code Placement). 1. Semantic CP is strictly more powerful than semantic CM. 2. Computational placement-optimality is impossible in general. 3. Down-safety is inadequate for semantic CP.
6 Conclusion
We have re-investigated semantic CM under the perspective of the recent development of syntactic CM. This has clarified the essential difference between the syntactic and the semantic approach, and uncovered the difference between CM and CP in the semantic setting. Central to the understanding of this difference is the role of the notion of safety of a program point for some computation. Modifying the considered notion of safety is the key to transferring the syntactic algorithm to the semantic setting in a way that captures the full potential of motion algorithms. However, in contrast to the syntactic setting, motion algorithms do not capture the full potential of code placement. In fact, we conjecture that there is no satisfactory solution to the code placement problem, unless one is prepared to change the structure of the argument program.
References

1. B. Alpern, M. N. Wegman, and F. K. Zadeck. Detecting equality of variables in programs. In Conf. Rec. 15th Symp. Principles of Prog. Lang. (POPL'88), pages 1-11. ACM, NY, 1988.
2. F. Chow. A Portable Machine Independent Optimizer - Design and Measurements. PhD thesis, Stanford Univ., Dept. of Electrical Eng., Stanford, CA, 1983. Publ. as Tech. Rep. 83-254, Comp. Syst. Lab., Stanford Univ.
3. C. Click. Global code motion/global value numbering. In Proc. ACM SIGPLAN Conf. Prog. Lang. Design and Impl. (PLDI'95), volume 30(6) of ACM SIGPLAN Not., pages 246-257, 1995.
4. J. Cocke and J. T. Schwartz. Programming languages and their compilers. Courant Inst. Math. Sciences, NY, 1970.
5. D. M. Dhamdhere. A fast algorithm for code movement optimization. ACM SIGPLAN Not., 23(10):172-180, 1988.
6. D. M. Dhamdhere. Practical adaptation of the global optimization algorithm of Morel and Renvoise. ACM Trans. Prog. Lang. Syst., 13(2):291-294, 1991. Tech. Corr.
7. D. M. Dhamdhere, B. K. Rosen, and F. K. Zadeck. How to analyze large programs efficiently and informatively. In Proc. ACM SIGPLAN Conf. Prog. Lang. Design and Impl. (PLDI'92), volume 27(7) of ACM SIGPLAN Not., pages 212-223, 1992.
8. K.-H. Drechsler and M. P. Stadel. A solution to a problem with Morel and Renvoise's "Global optimization by suppression of partial redundancies". ACM Trans. Prog. Lang. Syst., 10(4):635-640, 1988. Tech. Corr.
9. A. Fong, J. B. Kam, and J. D. Ullman. Application of lattice algebra to loop optimization. In Conf. Rec. 2nd Symp. Principles of Prog. Lang. (POPL'75), pages 1-9. ACM, NY, 1975.
10. G. A. Kildall. A unified approach to global program optimization. In Conf. Rec. 1st Symp. Principles of Prog. Lang. (POPL'73), pages 194-206. ACM, NY, 1973.
11. J. Knoop, D. Koschützki, and B. Steffen. Basic-block graphs: Living dinosaurs? In Proc. 7th Int. Conf. on Compiler Constr. (CC'98), LNCS, Springer-Verlag, 1998.
12. J. Knoop, O. Rüthing, and B. Steffen. Lazy code motion. In Proc. ACM SIGPLAN Conf. Prog. Lang. Design and Impl. (PLDI'92), volume 27(7) of ACM SIGPLAN Not., pages 224-234, 1992.
13. J. Knoop, O. Rüthing, and B. Steffen. Optimal code motion: Theory and practice. ACM Trans. Prog. Lang. Syst., 16(4):1117-1155, 1994.
14. J. Knoop, O. Rüthing, and B. Steffen. Code Motion and Code Placement: Just Synonyms? Technical Report MIP-9716, Fakultät für Mathematik und Informatik, Universität Passau, Germany, 1997.
15. E. Morel and C. Renvoise. Global optimization by suppression of partial redundancies. Comm. ACM, 22(2):96-103, 1979.
16. J. H. Reif and R. Lewis. Symbolic evaluation and the global value graph. In Conf. Rec. 4th Symp. Principles of Prog. Lang. (POPL'77), pages 104-118. ACM, NY, 1977.
17. B. K. Rosen, M. N. Wegman, and F. K. Zadeck. Global value numbers and redundant computations. In Conf. Rec. 15th Symp. Principles of Prog. Lang. (POPL'88), pages 2-27. ACM, NY, 1988.
18. A. Sorkin. Some comments on a solution to a problem with Morel and Renvoise's "Global optimization by suppression of partial redundancies". ACM Trans. Prog. Lang. Syst., 11(4):666-668, 1989. Tech. Corr.
19. B. Steffen. Property-oriented expansion. In Proc. 3rd Static Analysis Symp. (SAS'96), LNCS 1145, pages 22-41. Springer-Verlag, 1996.
20. B. Steffen, J. Knoop, and O. Rüthing. The value flow graph: A program representation for optimal program transformations. In Proc. 3rd Europ. Symp. Programming (ESOP'90), LNCS 432, pages 389-405. Springer-Verlag, 1990.
21. B. Steffen, J. Knoop, and O. Rüthing. Efficient code motion and an adaption to strength reduction. In Proc. 4th Int. Conf. Theory and Practice of Software Development (TAPSOFT'91), LNCS 494, pages 394-415. Springer-Verlag, 1991.
Recursive Object Types in a Logic of Object-Oriented Programs

K. Rustan M. Leino
Digital Equipment Corporation Systems Research Center
130 Lytton Ave., Palo Alto, CA 94301, U.S.A.
http://www.research.digital.com/SRC/people/Rustan_Leino
Abstract. This paper formalizes a small object-oriented programming notation.
The notation features imperative commands where objects can be shared (aliased), and is rich enough to allow subtypes and recursive object types. The syntax, type checking rules, axiomatic semantics, and operational semantics of the notation are given. A soundness theorem showing the consistency between the axiomatic and operational semantics is also given. A simple corollary of the soundness theorem demonstrates the soundness of the type system. Because of the way types, fields, and methods are declared, no extra effort is required to handle recursive object types.
0 Introduction
It is well known that C.A.R. Hoare's logic of the basic commands of imperative, procedural languages [9] has been useful in understanding imperative languages. Object-oriented programming languages being all the rage, one is surprised that the literature has not produced a corresponding logic for modern object-oriented programs. The control structures of object-oriented programs are similar to those treated by Hoare, but the data structures of object-oriented programs are more complicated, mainly because objects are (possibly shared) references to data fields. This paper presents a logic for an object-oriented programming notation. In an early attempt at such a logic, Leavens gave an axiomatic semantics for an object-oriented language [11]. However, the language he used differs from popular object-oriented languages in that it is functional rather than imperative, so the values of the fields of objects cannot be changed. America and de Boer have given a logic for the parallel language POOL [4]. This logic applies to imperative programs with object sharing (sometimes called aliasing), but without subtyping and method overriding. In a logic that I will refer to as logic AL, Abadi and I defined an axiomatic semantics for an imperative, object-oriented language with object sharing [2], but it does not permit recursive object types. Poetzsch-Heffter and Müller have defined (but not proved sound) a Hoare-style logic for object-oriented programs that removes many of the previous limitations [18]. However, instead of following the standard methodological discipline of letting the designer of a method define its specification and then checking that implementations meet the specification, the specification of a method in the Poetzsch-Heffter and Müller logic is derived from the method's known implementations. The present logic deals with imperative features, subtyping, and recursive object types.
The literature has paid much attention to the type systems of object-oriented languages. Such papers tend to define some notion of types, the commands of some language, the type rules and operational semantics for the commands, and a soundness theorem linking the type system with the operational semantics. (Several examples of this are found in Abadi and Cardelli's book on objects [1].) But after all that effort, one still doesn't know how to reason about the programs that can be written with the provided commands, since no axiomatic semantics is given. In addition to giving a programming notation and its axiomatic semantics, this paper, like the paper describing logic AL, gives an operational semantics and a soundness theorem that links the operational semantics with the axiomatic semantics. The soundness theorem directly implies the soundness of the type system. A complication with type systems is that types can be recursive, that is, an object type T may contain a field of type T or a method whose return type is T. The literature commonly treats recursive data types by introducing some sort of fix-point operator into the type system, good examples of which are a paper by Amadio and Cardelli on recursive types and subtypes [3] and the book by Abadi and Cardelli. By treating types in a dramatically different way, the present logic supports recursive object types without the need for any special mechanism like fix-points. The inclusion of recursive object types is one main advantage of the present logic over logic AL, which does not allow them. (The other main advantage over logic AL is that the present logic can be used with any first-order theory.) Because the given soundness theorem implies the soundness of the type system, the present work contributes also to the world of type systems. In contrast to the paper by Amadio and Cardelli, which considers unrestricted recursive types, the type system in the present paper uses a restriction along the lines of name matching. In particular, types are simply identifiers, and the subtype relation is simply a given partial order among those identifiers. This is much like the classes in Java [8] or the branded object types in Modula-3 [17]. But in contrast to languages like Java or Modula-3, fields and methods are declared separately from types in the language considered in this paper. (This is also done in Cecil [5] and Ecstatic [13].) Not only does this simplify the treatment without loss of applicability to languages like Java and Modula-3, but it also makes explicit the separation of concerns. For example, as the logic shows, having to know all the fields of a particular object type is necessary only for the allocation of a new object. Furthermore, when a field or method is declared at some type T, each subtype of T automatically acquires, or inherits, that field or method. Consequently, one gets behavioral subtyping for free, something that can also be achieved by the inheritance discipline considered by Dhara and Leavens [6]. In contrast, subtype relations frequently found in the literature (including the subtype relation used in logic AL) involve the fields and methods of types. In such treatments of types, one often encounters words like "co-variant"; there will be no further occurrence of such words in this paper. The rest of this paper is organized as follows. Section 1 relates the present logic to some work that has influenced it.
Section 2 describes the declarations that can be used in program environments, and Section 3 describes the commands: their syntax, axiomatic semantics, and operational semantics. Section 4 discusses an example program. Then,
Section 5 states the soundness theorem. Section 6 discusses some limitations of the logic, and the paper concludes with a brief summary.
1 Sources of Influence
My work with Abadi has inculcated the present logic with its style and machinery. The present logic also draws from other sources with which I am quite familiar: my thesis [12], my work on an object logic with Nelson [15], and the Ecstatic language [13]. This section compares the features of these sources of influence with the features of the present logic. My thesis includes a translation of common object-oriented language constructs into Dijkstra's guarded commands, an imperative language whose well-known axiomatic semantics is given in terms of weakest preconditions [7]. My attempt at an object logic with Nelson is also based on guarded commands, and Ecstatic is a richer object-oriented programming language defined directly in terms of weakest preconditions. Types, fields, and methods in these three sources are declared in roughly the same way as in the present logic. While these sources do provide a way to reason about object-oriented programs, they take for granted the existence of an operational semantics that implements the axiomatic semantics. The present paper includes an operational semantics for the given commands, and establishes the correctness of the operational semantics with respect to the axiomatic semantics by proving a soundness theorem. Like logic AL, the present logic has few and simple commands. Each command in logic AL operates on an object store and produces a value placed in a special register called r. In the present logic, commands are allowed to refer to the initial value of that register, which simplifies many of the rules. (It also makes the commands "cute".) Another difference is that the present logic splits logic AL's let command into two commands: sequential composition and binding. The separation works well because the initial value of register r can be used. Perhaps surprisingly, another consequence of using the initial value of r is that the present logic manages fine without Abadi and Cardelli's ς binder that appears in logic AL to bind a method's self parameter.
2 Environments
This section starts defining the logic by describing program environments and the declarations that a program environment can contain. A program environment is a list of declarations. A declaration introduces a type, a field, or a method. An identifier is said to be declared in an environment if it is introduced by a type, field, or method declaration in the environment, or if it is one of the built-in types. I write x ∉ E to denote that identifier x is not declared in environment E. The judgement E ⊢ ◇ says that E is a well-formed environment. The empty list, written ∅, is a valid environment.

Empty Environment

    ∅ ⊢ ◇
The next three subsections describe types, fields, and methods, and give the remaining rules for well-formed environments.

2.0 Types
A type is an identifier. There are two built-in types, Boolean and Object. (Other types, like integers, can easily be added, but I omit them for brevity.) Types other than Boolean are called object types. A new object type is introduced by a subtyping pair, which has the form T<:U, where T is an identifier that names the new type, and U is an object type. Like in Java, Object denotes the root of the class hierarchy. The analogue of a subtyping pair T<:U in Java is a class T declared as a subclass of a class U: class T extends U { ... }. A type is said to be declared in an environment if it is Boolean, Object, or if it occurs as the first component of a subtyping pair. To express this formally, the judgement E ⊢_type T says that T is a type in environment E, and the judgement E ⊢_obj T says that T is an object type in E. The rules for these judgements are as follows. Here and throughout this paper, I use T and U, possibly subscripted, to denote types.
Types in Environments

    E ⊢_obj U    T ∉ E
    ------------------
    (E, T<:U) ⊢ ◇

Declared Types (⊢_type, ⊢_obj)

    E ⊢ ◇                E ⊢ ◇              (E, T<:U, E') ⊢ ◇        E ⊢_obj T
    ----------------     --------------     ---------------------    ----------
    E ⊢_type Boolean     E ⊢_obj Object     (E, T<:U, E') ⊢_obj T    E ⊢_type T
The reflexive, transitive closure of the subtyping pairs forms a partial order called the subtyping order. The judgement E ⊢ T <: U says that T and U are types in E that are ordered by the subtyping order. Type T is then said to be a subtype of U. The rules are:

Subtyping Order (⊢ <:)

    E ⊢_type T     (E, T<:U, E') ⊢ ◇          E ⊢ T0 <: T1    E ⊢ T1 <: T2
    ----------     ----------------------     ----------------------------
    E ⊢ T <: T     (E, T<:U, E') ⊢ T <: U     E ⊢ T0 <: T2
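To make the flavour of these judgements concrete, here is a small executable sketch in OCaml (my own illustration, not part of the paper; representing an environment as a list of declaration records is an assumption):

    (* Illustrative encoding of environments and the subtyping order. *)
    type decl =
      | Sub of string * string              (* a subtyping pair T <: U *)
      | Field of string * string * string   (* a field triple f: T -> U *)
      | Meth of string * string * string    (* m: T -> U (relation omitted) *)

    type env = decl list

    let is_obj_type (e : env) (t : string) : bool =
      t = "Object" ||
      List.exists (function Sub (t', _) -> t' = t | _ -> false) e

    let is_type (e : env) (t : string) : bool =
      t = "Boolean" || is_obj_type e t

    (* E |- T <: U: the reflexive, transitive closure of the pairs. *)
    let rec subtype (e : env) (t : string) (u : string) : bool =
      (is_type e t && t = u) ||
      List.exists
        (function Sub (t', u') -> t' = t && subtype e u' u | _ -> false) e

Because a well-formed environment introduces each type name only once and only above an already-declared object type, the subtyping pairs are acyclic, which is what makes the recursive search terminate.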
2.1 Fields
A field is a map from an object type to another type. A field f is introduced by a field triple, written f: T → U, where f is the identifier that names the field, T is an object type called the index type of f, and U is a type called the range type of f. The analogue of a field triple f: T → U in Java is an instance variable f of type U declared in a class T: class T { ... U f; ... }. An environment can contain field triples. A field f is said to be declared in an environment E if it occurs in some field triple f: T → U in E. This is expressed by the judgement E ⊢_field f: T → U. The rules for these judgements are as follows. Here and throughout, I use f, possibly subscripted, to denote field names.
Fields in Environments

    E ⊢_obj T    E ⊢_type U    f ∉ E
    --------------------------------
    (E, f: T → U) ⊢ ◇

Declared Fields (⊢_field)

    (E, f: T → U, E') ⊢ ◇
    ----------------------------------
    (E, f: T → U, E') ⊢_field f: T → U
For a type T0 declared in an environment E, the set of fields of T0 in E, written Fields(T0, E), is the set of all field triples f: T → U such that E ⊢_field f: T → U and E ⊢ T0 <: T.
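Under the same illustrative encoding, Fields(T0, E) is a single pass over the environment:

    (* Fields(T0, E): all field triples f: T -> U with E |- T0 <: T;
       uses the hypothetical 'env' and 'subtype' from the sketch above. *)
    let fields_of (e : env) (t0 : string) : (string * string * string) list =
      List.filter_map
        (function
          | Field (f, t, u) when subtype e t0 t -> Some (f, t, u)
          | _ -> None)
        e

This is the sense in which a subtype automatically inherits the fields declared at its supertypes: nothing is copied, and membership is recomputed from the subtyping order. (Methods(T0, E), defined below, is computed in the same way from the method quadruples.)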
2.2 Methods
A method quadruple has the form m: T → U : R, where m is an identifier denoting a method, T is an object type, U is a type, and R is a relation. The analogue of a method quadruple m: T → U : R in Java is a method m with return type U declared in a class T and given a specification R: class T { ... U m() {...} ... }. Note that the Java language does not have a place to write down the specification of a method. In the present language, the declaration of a method includes a specification, which specifies the effect of the method as a relation on the pre- and post-state of each method invocation. Note also that methods take no parameters (other than the object on which the method is invoked, an object commonly referred to as self). This simplifies the logic without losing theoretical expressiveness, since parameters can be passed through fields. An environment can contain method quadruples. A method m is said to be declared in an environment E if it occurs in some method quadruple m: T → U : R in E. This is expressed by the judgement E ⊢_method m: T → U : R. Formally, the rules are as follows. I use m, possibly subscripted, to denote methods.

Methods in Environments

    E ⊢_obj T    E ⊢_type U    E, ∅ ⊢_rel R    m ∉ E
    ------------------------------------------------
    (E, m: T → U : R) ⊢ ◇

Declared Methods (⊢_method)

    (E, m: T → U : R, E') ⊢ ◇
    ------------------------------------------
    (E, m: T → U : R, E') ⊢_method m: T → U : R
The judgement E, ∅ ⊢_rel R, which will be described in more detail in Section 3.1, essentially says that R is a relation that may mention fields declared in E but doesn't mention any local program variables. For a type T0 declared in an environment E, the set of methods of T0 in E, written Methods(T0, E), is the set of all method quadruples m: T → U : R such that E ⊢_method m: T → U : R and E ⊢ T0 <: T.
2.3 Relations
Methods are specified using relations. A relation is an untyped first-order predicate on a pre-state and a post-state. In order to make relations expressive, the present logic can be used with an underlying logic, which provides a set of function symbols and a set of first-order axioms about those function symbols. The example in Section 4 shows how an underlying logic may be used. Syntactically, relations are made up only of:
- the constants false, true, and nil;
- constants for field names, and the special field alloc;
- the special variables r̀, ŕ, σ̀, σ́;
- other variables (I will write v to denote a typical variable);
- equality between terms;
- applications of the functions select and store;
- applications of the functions of the underlying logic;
- the usual logical connectives ¬, ∧, and ∀.
The grammars for relations (R) and terms (e) are thus:

    R ::= e0 = e1 | ¬R | R0 ∧ R1 | (∀x :: R)
    e ::= false | true | nil | f | r̀ | ŕ | σ̀ | σ́ | v
        | select(e0, e1, e2) | store(e0, e1, e2, e3)
        | pa(e0, ..., e_{ka−1}) | ... | pz(e0, ..., e_{kz−1})
where pa, ..., pz denote the function symbols of the underlying logic with arities ka, ..., kz, respectively. It will be convenient to also allow ≠, ∨, ⇒, ⇔, ≡, and ∃ as the usual abbreviations of the operators above. The semantics of a command (program statement) is defined in terms of a relation on a register and a (data) store, together called a state. The variables r̀ and ŕ denote the register in the pre- and post-states of the command, respectively, and σ̀ and σ́ denote the store in those respective states. The value of a field f of an object e in a store σ is denoted select(σ, e, f). The expression store(σ, e0, f, e1) represents the store that results from setting the f field of object e0 in store σ to the value e1. The relationship between select and store is defined as follows.

    (∀σ, e0, e1, f0, f1, e ::
        select(store(σ, e0, f0, e), e0, f0) = e
      ∧ (e0 ≠ e1 ∨ f0 ≠ f1 ⇒
          select(store(σ, e0, f0, e), e1, f1) = select(σ, e1, f1)) )        (0)
The special field alloc is used to record which objects in the data store have been allocated; the alloc field of an object is false until the object is allocated, and is true from there on. As we shall see, the logic allows relations to be rewritten. A rewriting uses the rules of logic and some axioms. In particular, a rewriting may use as axioms the definition of select and store (0), the distinctness of the boolean values, and the distinctness of field name constants:

    false ≠ true                                                 (1)
    all field name constants (including alloc) are distinct     (2)
and the axioms of the underlying logic.
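Axiom (0) is the classical theory of arrays, read over pairs: a store is indexed by an object and a field name. The following functional model is my own illustration of the two cases of the axiom; representing objects and field names as strings is an assumption.

    (* A functional model of stores: select/store as in axiom (0). *)
    type value = VBool of bool | VObj of string | VNil
    type store = (string * string) -> value   (* (object, field) -> value *)

    let select (s : store) (o : string) (f : string) : value = s (o, f)

    (* store(s, o, f, v): override s at exactly the pair (o, f). *)
    let store_ (s : store) (o : string) (f : string) (v : value) : store =
      fun (o', f') -> if o' = o && f' = f then v else s (o', f')

Reading select (store_ s o f v) o f yields v, and reading at any other object/field pair falls through to s; these are precisely the two conjuncts of axiom (0).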
3 Commands
This section describes commands: their syntax, their axiomatic semantics, and their operational semantics.
3.0 Syntax
A command has a form dictated by the following grammar.
    a ::= c                                        constant
        | v                                        local variable
        | a0 <> a1                                 conditional
        | a0 ; a1                                  composition
        | with v: T do a                           binding
        | [T: fi = ci (i ∈ I), mj = aj (j ∈ J)]    object construction
        | f                                        field selection
        | f := v                                   field update
        | m                                        method invocation
    c ::= false | true | nil
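The grammar transcribes directly into a datatype; the following OCaml rendering is my own (constructor names are invented for illustration):

    (* Illustrative abstract syntax for the commands. *)
    type cmd =
      | Const of konst                        (* c *)
      | Var of string                         (* v *)
      | Cond of cmd * cmd                     (* a0 <> a1 *)
      | Seq of cmd * cmd                      (* a0 ; a1 *)
      | With of string * string * cmd         (* with v: T do a *)
      | New of string * (string * konst) list * (string * cmd) list
                                              (* [T: fi = ci, mj = aj] *)
      | Select of string                      (* f *)
      | Update of string * string             (* f := v *)
      | Invoke of string                      (* m *)
    and konst = KFalse | KTrue | KNil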
Informally, the semantics of the language is as follows. (Recall from the previous section that commands operate on a register and a store.)

- The constants false, true, and nil evaluate to themselves. That is, they have the effect of setting the register to themselves.
- A local variable is an identifier introduced via a binding command. Every local variable is immutable: once bound (using with, see below), the value of a local variable cannot be changed. A local variable evaluates to its value.
- The conditional command evaluates a0 if the register is initially false, and evaluates a1 if the register is initially true. Note that the guard of the conditional is not shown explicitly in the command; rather, the initial value of the register is used as the guard.
- The sequential composition of a0 and a1 first evaluates a0 and then evaluates a1. The final values of the register and store in the evaluation of a0 are used as the initial values of the register and store in the evaluation of a1. Composition is usually written a0 ; a1, but to keep the language looking like popular object-oriented languages, I also allow the alternative syntax a0.a1 (see examples below).
- The binding command with v: T do a introduces a local variable v for use in a. Its evaluation consists in evaluating a with v bound to the initial value of the register.
- The command [T: fi = ci (i ∈ I), mj = aj (j ∈ J)] constructs a new object of type T, and sets the register to (a reference to the fields and methods of) the object. The command must list every field fi from the set of fields of T. The initial value for field fi is the given constant ci. The command must also list every method mj from the set of methods of T. The implementation of method mj for the new object is given as the command aj, which receives self as the initial value of the register and returns the method result value as the final value of the register. The command aj cannot reference local variables other than those it declares.
- A field can be selected (f) and updated (f := v). Both operate on the object referenced by the initial value of the register. Selection sets the register to the f field of the object. Update sets the f field of the object to the value of v, leaving the register unchanged.
- The method invocation m finds the implementation of method m for the object referenced by the initial value of the register, and then proceeds to evaluate that implementation. The evaluation of the implementation begins with the initial register and store values of the invocation, and the invocation ends with the final register and store values of the evaluation of the implementation. Other than the initial and final register values (which encode self and the result value, respectively), a method does not have explicit parameters; instead, parameters can be passed via the fields of the object.

Here are some examples that compare the present commands with programs written in other languages. The Modula-3 program statement if b then S else T end is written as the command b ; (T <> S). The Modula-3 expression new(T, f := true).f, where T is an object type with one field f and no methods, is written as the command [T: f = true] ; f, or with the alternative syntax for composition, the command is written [T: f = true].f. The Modula-3 program x.f := true is written true ; with v: Boolean do x.f := v. As an example of object sharing, the command

    [T: f = c] ; with v: T do with w: T do (v.f := y ; w.f)

allocates a new T object whose f field is set to c, creates two references to the object (v and w), updates the object's f field via v, and reads f back via w, returning y. The following example shows the construction of a T object whose method or computes the disjunction of fields x and y:

    [T: x = false, y = false, or = with self: T do (x ; (self.y <> true))]

Note that although primitive, the programming notation is expressive enough to admit common object-oriented language features like object construction, method invocation, and object sharing. The programming notation is kept minimal in order to simplify the associated rules.
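The informal semantics can be read off an interpreter. The sketch below is my own; it reuses the value/store model and the command datatype from the earlier blocks, and elides object construction and method invocation, which need the method store of Section 3.2. Its point is to show how the register value threads through conditionals, compositions, and bindings exactly as described above.

    (* Partial interpreter for the commands. *)
    let of_konst = function
      | KFalse -> VBool false
      | KTrue  -> VBool true
      | KNil   -> VNil

    let rec eval (stack : (string * value) list)
                 (r : value) (s : store) (a : cmd) : value * store =
      match a with
      | Const c -> (of_konst c, s)
      | Var v -> (List.assoc v stack, s)
      | Cond (a0, a1) ->                (* initial register is the guard *)
          (match r with
           | VBool false -> eval stack r s a0
           | VBool true  -> eval stack r s a1
           | _ -> failwith "conditional on a non-boolean register")
      | Seq (a0, a1) ->                 (* thread register and store *)
          let r', s' = eval stack r s a0 in
          eval stack r' s' a1
      | With (v, _t, a0) ->             (* bind v to the register *)
          eval ((v, r) :: stack) r s a0
      | Select f ->
          (match r with
           | VObj o -> (select s o f, s)
           | _ -> failwith "field selection on nil")
      | Update (f, v) ->
          (match r with
           | VObj o -> (r, store_ s o f (List.assoc v stack))
           | _ -> failwith "field update on nil")
      | New _ | Invoke _ -> failwith "elided in this sketch"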
3.1 Axiomatic Semantics
This subsection gives the axiomatic semantics of the commands. The judgement

    E, V ⊢ a : T → U : R

says that command a in command environment (E, V) can be started in a state where the register contents has type T, and terminates in a state where the register contents has type U. The execution of a is such that its pre- and post-states satisfy the relation R. The rules of the axiomatic semantics double as type checking rules, because with a trivial R (such as ŕ = ŕ), the judgement expresses what it means for command a to be well-typed. Before giving the axiomatic semantics, some other definitions and rules pertaining to constants, local variables, and command environments are in order. There are three constants: false, true, and nil. The judgement E ⊢_const c: T expresses that constant c has type T.
Type of Constants (⊢_const)

    E ⊢ ◇                        E ⊢ ◇                       E ⊢_obj T
    -------------------------    ------------------------    ----------------
    E ⊢_const false: Boolean     E ⊢_const true: Boolean     E ⊢_const nil: T
A local variable declaration has the form v: T, where v is an identifier denoting a local variable and T is a type. A command environment is a pair (E, V), where E is a program environment and V is a list of local variable declarations. A local variable v is said to be declared in a command environment (E, V) if it occurs in some local variable declaration v: T in V. This is expressed by the judgement E, V ⊢_var v: T. Thus, in a command environment (E, V), E contains declarations of types, fields, and methods, whereas V contains declarations of local variables. This separation allows a simple characterization of a command environment without local variable declarations: (E, ∅). We saw this in the "Methods in Environments" rule in Section 2.2, and we will see it in the "Object Construction" rule below and in Theorems 0, 1, and 2 in Section 5. The judgement

    E, V ⊢_rel R

says that R is a relation whose free variables are fields or local variables declared in (E, V), or are among the special fields and variables alloc, r̀, ŕ, σ̀, and σ́. The obvious formal rules for this judgement are omitted. Thus, the judgement E, ∅ ⊢_rel R used in the hypothesis of the "Methods in Environments" rule in Section 2.2 implies that R does not mention local variables. I write x ∉ (E, V) to denote that identifier x is not declared in command environment (E, V). The formal rules of the above are then:

Well-formed Command Environment

    E ⊢ ◇         E, V ⊢ ◇    E ⊢_type T    v ∉ (E, V)
    --------      ------------------------------------
    E, ∅ ⊢ ◇      E, (V, v: T) ⊢ ◇

Declared Local Variables (⊢_var)

    E, (V, v: T, V') ⊢ ◇
    ---------------------------
    E, (V, v: T, V') ⊢_var v: T
Now for the rules of the axiomatic semantics. There is one rule for each command, and one subsumption rule.

Subsumption

    E, V ⊢ a : T1 → T2 : R    E ⊢ T0 <: T1    E ⊢ T2 <: T3
    ⊢_fol R ⇒ R'    E, V ⊢_rel R'
    -------------------------------------------------------
    E, V ⊢ a : T0 → T3 : R'
The judgement ⊢_fol P represents provability in first-order logic, under axioms (0), (1), and (2) from Section 2.3 and the axioms of the underlying logic.

Constant

    E, V ⊢ ◇    E ⊢_const c: T    E ⊢_type U
    -----------------------------------------
    E, V ⊢ c : U → T : ŕ = c ∧ σ̀ = σ́

Local Variable

    E, V ⊢_var v: T    E ⊢_type U
    ----------------------------------
    E, V ⊢ v : U → T : ŕ = v ∧ σ̀ = σ́

Conditional

    E, V ⊢ a0 : Boolean → T : R0    E, V ⊢ a1 : Boolean → T : R1
    --------------------------------------------------------------------
    E, V ⊢ a0 <> a1 : Boolean → T : (r̀ = false ⇒ R0) ∧ (r̀ = true ⇒ R1)
Composition

    E, V ⊢ a0 : T0 → T1 : R0    E, V ⊢ a1 : T1 → T2 : R1
    r and σ do not occur free in R0 or R1
    ---------------------------------------------------------------------------
    E, V ⊢ a0 ; a1 : T0 → T2 : (∃r, σ :: R0[ŕ, σ́ := r, σ] ∧ R1[r̀, σ̀ := r, σ])

Binding

    E, (V, v: T) ⊢ a : T → U : R
    ------------------------------------------
    E, V ⊢ with v: T do a : T → U : R[v := r̀]

Object Construction

    E, V ⊢ ◇    E ⊢_type U    E ⊢_obj T
    fi: Ti → Ui (i ∈ I) are the elements of Fields(T, E)
    mj: Tj → Uj : Rj (j ∈ J) are the elements of Methods(T, E)
    E ⊢_const ci: Ui (i ∈ I)    E, ∅ ⊢ aj : T → Uj : Rj (j ∈ J)
    ------------------------------------------------------------
    E, V ⊢ [T: fi = ci (i ∈ I), mj = aj (j ∈ J)] : U → T :
        ŕ ≠ nil ∧ select(σ̀, ŕ, alloc) = false
      ∧ σ́ = store(... (store(σ̀, ŕ, alloc, true), ŕ, fi, ci) ...) (i ∈ I)

Field Selection

    E, V ⊢ ◇    E ⊢_field f: T → U
    ------------------------------------------------------------
    E, V ⊢ f : T → U : r̀ ≠ nil ⇒ ŕ = select(σ̀, r̀, f) ∧ σ̀ = σ́

Field Update

    E ⊢_field f: T0 → U0    E ⊢ T1 <: T0    E ⊢ U1 <: U0    E, V ⊢_var v: U1
    --------------------------------------------------------------------------
    E, V ⊢ f := v : T1 → T1 : r̀ ≠ nil ⇒ ŕ = r̀ ∧ σ́ = store(σ̀, r̀, f, v)

Method Invocation

    E, V ⊢ ◇    E ⊢_method m: T → U : R
    --------------------------------------
    E, V ⊢ m : T → U : r̀ ≠ nil ⇒ R

3.2 Operational Semantics
The operational semantics is defined by the judgement

    r, σ, μ, S ⊢ a ↝ r', σ', μ'

It says that given an initial operational state (r, σ, μ) and stack S, executing command a terminates in operational state (r', σ', μ'). Operational states are triples whose first two components correspond to the register and data store components of states, as defined above. The third component is a method store. Let H denote a set of given object names. A stack is a partial function from local variables to H ∪ {false, true, nil}. A method store is a partial function μ from H, such that

- μ(h)(type) is the allocated type of object h, and
- μ(h)(m), if defined, is the implementation of method m of object h.

A store pair is a pair (σ, μ) where σ is a data store and μ is a method store. In addition to keeping the method implementations of objects, the method store keeps the allocated type of objects. The operational semantics records this information as it allocates a new object, but doesn't use it subsequently. The information is used only to state and prove the soundness theorem. By conveniently recording this information in the operational semantics, where it causes no harm, one avoids the use of a store type (cf. [2]). The result is a simpler statement and proof of soundness. To save space, I omit the rules for the operational semantics. They can be found in a SRC Technical Note [14].
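The shape of operational states nevertheless transcribes easily (an illustration, continuing the earlier sketches; partial functions become association lists):

    (* Operational states (r, sigma, mu) and stacks, reusing the
       'cmd', 'value', and 'store' types from the blocks above. *)
    type obj = string                        (* object names: the set H *)
    type mentry = {
      alloc_type : string;                   (* mu(h)(type) *)
      impls : (string * cmd) list;           (* mu(h)(m), if defined *)
    }
    type mstore = (obj * mentry) list        (* the method store mu *)
    type stack = (string * value) list       (* locals -> H u {false, true, nil} *)
    type opstate = value * store * mstore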
4 Example

In this section, I show an example of a program that can be proved in the logic. Let us consider a linked-list type with a method that appends a list to another. Reasoning about a program with such a type requires reasoning about reachability among linked-list nodes. To this end, we assume the underlying logic to contain a function symbol Reach (adapted from Greg Nelson's reachability predicate [16]). Informally, Reach(e0, e1, σ, f, e2) is true whenever it is possible to reach from object e0 to object e1 via applications of f in σ, never going through object e2. The example in this section assumes that the underlying logic contains the following two axioms, which relate Reach to select and store, respectively.

    (∀e0, e1, σ, f, e2 ::
        Reach(e0, e1, σ, f, e2) = true ≡
          e0 = e1 ∨ (e0 ≠ e2 ∧ Reach(select(σ, e0, f), e1, σ, f, e2) = true) )      (3)

    (∀e0, e1, σ, f0, e2, f1, e3, e4 ::
        f0 ≠ f1 ⇒
          Reach(e0, e1, σ, f0, e2) = Reach(e0, e1, store(σ, e3, f1, e4), f0, e2) )  (4)
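On a finite store, Reach is ordinary list reachability, and axiom (3) can be read as a recursive program. The following rendering is my own intuition-builder, not part of the logic (Reach there is an uninterpreted symbol constrained only by axioms (3) and (4)); the visited set makes the function total even on the cyclic structures discussed at the end of this section.

    (* Reach(e0, e1, s, f, e2): follow field f in store s from e0
       toward e1, never stepping through e2; cf. axiom (3). *)
    let reach (s : store) (f : string) (e2 : value)
              (e0 : value) (e1 : value) : bool =
      let rec go visited e0 =
        e0 = e1 ||
        (e0 <> e2 && not (List.mem e0 visited) &&
         match e0 with
         | VObj o -> go (e0 :: visited) (select s o f)
         | _ -> false)          (* nil or a boolean reaches only itself *)
      in
      go [] e0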
Axiom (3) resembles Nelson's axiom A1 and says that every object reaches itself, and that e0 reaches e1 if e0 is not e2 and e0.f reaches e1. Axiom (4) says that whether or not e0 reaches e1 via f0 is independent of the values of another field f1. The example uses the following environment, which I shall refer to as E:

    Node <: Object
    next: Node → Node
    appendArg: Node → Node
    append: Node → Node : R

where R is the relation
    r̀ ≠ nil ⇒
      (∃k :: Reach(r̀, k, σ̀, next, nil) ∧ select(σ̀, k, next) = nil
        ∧ select(σ́, k, next) = select(σ̀, r̀, appendArg)
        ∧ (∀o, f :: select(σ̀, o, f) = select(σ́, o, f)
                     ∨ (o = k ∧ f = next) ∨ f = appendArg) )

Informally, this relation specifies that the method store its argument (which is passed in via the appendArg field) at the end of the list linked via field next. More precisely, the relation specifies that the method find an object k reachable from r̀ (self) via the next field such that k.next is nil. The method is to set k.next to the given argument node. The method can modify k.next and can modify the appendArg field of any object (since this field is used only as a way to pass a parameter to append anyway), but it is not allowed to modify the store in any other way. To present the example code, I introduce a new command, isnil, which tests whether or not the register is nil.
Nil Test

    E, V ⊢ ◇    E ⊢_obj T
    ----------------------------------------------------------------------
    E, V ⊢ isnil : T → Boolean :
        (r̀ = nil ⇒ ŕ = true) ∧ (r̀ ≠ nil ⇒ ŕ = false) ∧ σ̀ = σ́
(Section 6 discusses expression commands such as nil tests.) To write the program text, I assume the following binding powers, from highest to lowest: :=, ;, with ... do. Now, consider the following command.

    [Node: next = nil, appendArg = nil,
     append = with self: Node do
       appendArg ; with n: Node do
         self.next ; isnil ;
         ( self.next.appendArg := n ; append
           <> self.next := n ; self ) ]                                    (5)

This command allocates and returns a Node object whose next and appendArg fields are initially nil. The implementation of append starts by giving names to
the self object and the method's argument. Then it either calls append recursively on self.next or sets self.next to the given argument, depending on whether or not self.next is nil. With axioms (3) and (4) in the underlying logic, one can prove the following judgement about the given allocation command.

    E, ∅ ⊢ (5) : Object → Node : ŕ = ŕ                                     (6)

Though the relation in this judgement (ŕ = ŕ) is trivial, establishing the judgement requires showing that the given implementation of append satisfies its declared specification. I omit the proof, which is straightforward. I conclude the example with three remarks. First, remember that to reason about a call to a method, in particular the recursive call to append, one uses the specification of the method being called, not its implementation. This makes the reasoning independent of the actual implementation of the callee, which may in fact even be a different implementation than the one shown. Second, remember that only partial correctness is proved. That is, judgement (6) says that if the method terminates, its pre- and post-states will satisfy the specified relation. Indeed, an invocation of method append on an object in a cyclic structure of Node objects will not terminate. Third, the static type of the field self.next and the argument self.appendArg is Node, but the dynamic types of these objects in an execution may be any subtype of Node. Note, however, that judgement (6) is independent of the dynamic types of these objects. Indeed, having established judgement (6) means that the method works in every execution. This is because the logic is sound, as is shown in the next section.
5 Soundness
This section states a soundness theorem, which proves the correctness of the operational semantics with respect to the axiomatic semantics. I first motivate the soundness theorem, and then state it together with an informal explanation. Some additional formal definitions and the proof itself are found in a SRC Technical Note [14].
As a gentle step in presenting the full soundness theorem, consider the following theorem.

Theorem 0. If one can derive both E, ∅ ⊢ a : Object → Boolean : ŕ = true and nil, σ0, ∅, ∅ ⊢ a ↝ r, σ, μ, then r = true.

Here and in the next two theorems, σ0 denotes a data store that satisfies (∀h ∈ H :: select(σ0, h, alloc) = false), and ∅ denotes the partial function whose domain is empty. The theorem says that if in an environment E one can prove that a command a satisfies the transition relation ŕ = true, then any terminating execution of command a from a "reset" state ends with a register value of true. A simple theorem about the result type of a command is the following.

Theorem 1. If one can derive E, ∅ ⊢ a : Object → T : R and E ⊢_obj T and nil, σ0, ∅, ∅ ⊢ a ↝ r, σ, μ, then the value r has type T, that is, either r = nil or E ⊢ μ(r)(type) <: T.

This theorem says that if one can prove, using the axiomatic semantics, that a command a has final type T, where T is an object type, and one can show that, operationally, the program terminates with a register value of r, then r is a value of type T (that is, it is nil or its allocated type is a subtype of T). This theorem shows the soundness of the type system's treatment of object types. An interesting theorem that says something about the final object store of a program is the following.

Theorem 2. If one can derive both E, ∅ ⊢ a : Object → T : R and nil, σ0, ∅, ∅ ⊢ a ↝ r, σ, μ, then R[r̀, σ̀, ŕ, σ́ := nil, σ0, r, σ] holds as a first-order predicate.

This theorem says that if one can prove the two judgements about a, then relation R actually describes the relation between the initial and final states. To prove the theorems above, one needs to prove something stronger. I call the stronger theorem, of which the theorems above are corollaries, the main theorem. The theorem is stated as follows.

Main Theorem. If (7) E, V ⊢ a : T → U : R, (8) r, σ, μ, S ⊢ a ↝ r', σ', μ', (9) E, σ, μ ⊩ r : T, (10) E ⊩ σ, μ, and (11) E, V, σ, μ ⊩ S; then (12) r, σ, r', σ', S ⊩ R, (13) (σ, μ) ≼ (σ', μ'), (14) E, σ', μ' ⊩ r' : U, and (15) E ⊩ σ', μ'.

In the antecedent of this theorem, (7) and (8) express the judgements that have been derived for some command a. One can hope to say something interesting in the conclusion of the theorem only if the execution under consideration is from a "reasonable" state (r, σ, μ) and uses a "reasonable" stack S. Therefore, judgement (9) states that r is a value of type T, judgement (10) says that store pair (σ, μ) matches the environment E, and judgement (11) says that S is a well-typed stack. In the conclusion of the theorem, (12) expresses that R does indeed describe the relation between the initial and final states of the execution, and (14) expresses that r' has type U. In addition, to use the theorem as a sufficiently strong induction hypothesis in the proof, (13) says that (σ, μ) is continued by (σ', μ'). This property expresses a
kind of monotonicity that holds between two store pairs, the first of which precedes the other in some execution. Also, judgement (15) says that (σ', μ'), like the initial store pair, matches the environment. By removing (12) from the conclusion of the main theorem, one gets a corollary that expresses that the type system is sound with respect to the operational semantics. Such a corollary follows directly from the main theorem, but could also be proved directly in the same way the main theorem is.
6 Limitations of the Logic
The object construction command is rather awkward. Because it lists method implementations, a method cannot directly construct objects whose type and method implementations are the same as for self. Instead, one can declare object types representing classes, as is done, for example, by Abadi and Cardelli [1] (see SRC TN 1997-025 for an example [14]). One can consider modifying the present logic to remove this limitation from the object construction command. For example, like in common class-based object-oriented languages, one can extend the program environment to include method implementations. One must then have a "link-time" check that ensures that every method that may be called by the program at run-time has an implementation. Or, like in common object-based languages, one can add a construct for cloning objects or their method implementations. Another omission from the present logic is the ability to compare objects for equality. Just like one would expect to add primitive types like integers to the present logic, one would expect to add more general expressions, including comparison expressions. A logic of programs provides a connection between programs and their specifications. In the present logic, method declarations contain specifications that are given simply as transition relations. Transition relations are not practically suited for writing down method specifications, because they are painfully explicit. Specification features like modifies clauses and abstract fields would remedy the situation, but lie outside the scope of this paper. To mention some work in this area, Lano and Haughton [10] have surveyed object-oriented specifications, and my thesis [12] shows how to deal with modifies clauses and data abstraction in modular, object-oriented programs. The logic for POOL [4] includes some specification features that can be used to state properties of recursive data structures.
7 Summary
I have presented a sound logic for object-oriented programs whose commands are imperative and whose objects are references to data fields. The programming notation requires that types, fields, and methods be declared in the environment before they can be used in a program. The main contributions of the paper are the logic itself, the soundness theorem, and the way that types are handled, which makes the subtype relation and the admission of recursive object types trivial.
Acknowledgements. I am grateful to Martin Abadi, Luca Cardelli, Greg Nelson, and Raymie Stata for helpful comments on the logic and the presentation thereof.
References

1. Martin Abadi and Luca Cardelli. A Theory of Objects. Springer-Verlag, 1996.
2. Martin Abadi and K. Rustan M. Leino. A logic of object-oriented programs. In Theory and Practice of Software Development: Proceedings/TAPSOFT '97, 7th International Joint Conference CAAP/FASE, volume 1214 of Lecture Notes in Computer Science, pages 682-696. Springer, April 1997.
3. Roberto M. Amadio and Luca Cardelli. Subtyping recursive types. ACM Transactions on Programming Languages and Systems, 15(4):575-631, September 1993.
4. Pierre America and Frank de Boer. Reasoning about dynamically evolving process structures. Formal Aspects of Computing, 6(3):269-316, 1994.
5. Craig Chambers. The Cecil language: Specification & rationale, version 2.1, March 7, 1997. Available from http://www.cs.washington.edu/research/projects/cecil/www/Papers/cecil-spec.html, 1997.
6. Krishna Kishore Dhara and Gary T. Leavens. Forcing behavioral subtyping through specification inheritance. Technical Report TR #95-20c, Iowa State University, Department of Computer Science, 1997.
7. Edsger W. Dijkstra. A Discipline of Programming. Prentice-Hall, Englewood Cliffs, NJ, 1976.
8. James Gosling, Bill Joy, and Guy Steele. The Java Language Specification. Addison-Wesley, 1996.
9. C. A. R. Hoare. An axiomatic basis for computer programming. Communications of the ACM, 12(10):576-580,583, October 1969.
10. Kevin Lano and Howard Haughton. Object-Oriented Specification Case Studies. Prentice Hall, New York, 1994.
11. Gary Todd Leavens. Verifying Object-Oriented Programs that Use Subtypes. PhD thesis, MIT Laboratory for Computer Science, February 1989. Available as Technical Report MIT/LCS/TR-439.
12. K. Rustan M. Leino. Toward Reliable Modular Programs. PhD thesis, California Institute of Technology, January 1995. Available as Technical Report Caltech-CS-TR-95-03.
13. K. Rustan M. Leino. Ecstatic: An object-oriented programming language with an axiomatic semantics. In The Fourth International Workshop on Foundations of Object-Oriented Languages, January 1997. Proceedings available from http://www.cs.indiana.edu/hyplan/pierce/fool/.
14. K. Rustan M. Leino. Recursive object types in a logic of object-oriented programs. Technical Note 1997-025a, Digital Equipment Corporation Systems Research Center, January 1998.
15. K. Rustan M. Leino and Greg Nelson. Object-oriented guarded commands. Internal manuscript KRML 50, Digital Equipment Corporation Systems Research Center, March 1995.
16. Greg Nelson. Verifying reachability invariants of linked structures. Conference Record of the Tenth Annual ACM Symposium on Principles of Programming Languages, pages 38-47, January 1983.
17. Greg Nelson, editor. Systems Programming with Modula-3. Series in Innovative Technology. Prentice-Hall, Englewood Cliffs, NJ, 1991.
18. Arnd Poetzsch-Heffter and Peter Müller. A logic for the verification of object-oriented programs. In R. Berghammer and F. Simon, editors, Programming Languages and Fundamentals of Programming. Christian-Albrechts-Universität Kiel, 1997.
Mode-Automata: About Modes and States for Reactive Systems*

Florence Maraninchi and Yann Rémond
VERIMAG** - Centre Equation, 2 Av. de Vignate - F38610 GIERES
http://www.imag.fr/VERIMAG/PEOPLE/Florence.Maraninchi

* This work has been partially supported by Esprit Long Term Research Project SYRF 22703
** Verimag is a joint laboratory of Université Joseph Fourier (Grenoble I), CNRS and INPG
Abstract. In the field of reactive system programming, dataflow synchronous languages like Lustre [BCH+85,CHPP87] or Signal [GBBG85] offer a syntax similar to block-diagrams, and can be efficiently compiled into C code, for instance. Designing a system that clearly exhibits several "independent" running modes is not difficult since the mode structure can be encoded explicitly with the available dataflow constructs. However the mode structure is no longer readable in the resulting program; modifying it is error prone, and it cannot be used to improve the quality of the generated code. We propose to introduce a special construct devoted to the expression of a mode structure in a reactive system. We call it mode-automaton, for it is basically an automaton whose states are labeled by dataflow programs. We also propose a set of operations that allow the composition of several mode-automata (parallel and hierarchic compositions taken from Argos [Mar92]), and we study the properties of our model, like the existence of a congruence of mode-automata for instance, as well as implementation issues.
1 Introduction
The work on which we report here has been motivated by the need to talk about running modes in a dataflow synchronous language. Dataflow languages like Lustre [BCH+85,CHPP87] or Signal [GBBG85] belong to the family of synchronous languages [BB91] devoted to the design, programming and validation of reactive systems. They have a formal semantics and can be efficiently compiled into C code, for instance. The dataflow style is clearly appropriate when the behaviour of the system to be described has some regularity, like in signal-processing. Designing a system that clearly exhibits several "independent" running modes is not so difficult since the mode structure can be encoded explicitly with the available dataflow constructs. However the mode structure is no longer readable in the resulting program; modifying it is error prone, and it cannot be used to improve the
quality of the generated code, decompose the proofs, or at least serve as a guide in the analysis of the program. In section 2 we propose a definition of a mode, in order to motivate our approach. Section 3 defines mini-Lustre, a small subset of Lustre which is sufficient for presenting our notion of mode-automata in section 4. Section 5 explains how to compose these mode-automata with the operators of Argos, and Section 6 defines a congruence. Section 7 compares the approach to others in which modes have been studied. Finally, section 8 gives some ideas for further work.
2 What is a Mode?
One (and perhaps the only one) way of facing the complexity of a system is to decompose it into several "independent" tasks. Of course the tasks are never completely independent, but it should be possible to find a decomposition in which the tasks are not too strongly connected with each other - i.e. in which the interface between tasks is relatively simple, compared to their internal structure. The tasks correspond to some abstractions of the global behaviour of the system, and they may be viewed as different parts of this global behaviour, devoted to the treatment of distinct situations. Decomposing a system into tasks allows independent reasoning about the individual tasks. Tasks may be concurrent, in which case the system has to be decomposed into concurrent and communicating components. The interface defines how the components communicate and synchronize with each other in order to reach a global goal. Thinking in terms of independent modes is in some sense an orthogonal point of view, since a mode structure is rather sequential than concurrent. This is typically the case with the modes of an airplane, which can be as high level as "landing" mode, "take-off" mode, etc. The normal behaviour of the system is a sequence of modes. In a transition between modes, the source mode is designed to build and guarantee a given configuration of the parameters of the system, such that the target mode can be entered. On the other hand, modes may be divided into sub-modes. Of course the mode structure may interfere with the concurrent structure, in which case each concurrent subsystem may have its own modes, and the global view of the system shows a Cartesian product of the sets of modes. Or the main view of the system may be a set of modes, and the description of each mode is further decomposed into concurrent tasks. Hence we need a richer notion of mode. This seems to give something similar to the notion of mode we find in Modecharts [JM88], where modes are structured like in Statecharts [Har87] (an And/Or tree). However, see section 7 for more comments about Modecharts, and a comparison between Modecharts and our approach.
2.1 Modes and States
Technically, all systems can be viewed as a (possibly huge, or even infinite) set of elementary and completely detailed states, such that the knowledge about the current state is sufficient to determine the correct output, at any point in time. States are connected by transitions, whose firing depends on the inputs to the system. This complete model of the system behaviour may not be manageable, but it exists. We call its states and transitions execution states and transitions. Execution states are really concrete ones, related for instance to the content of the program memory during execution. The question is: how can we define the modes of a system in terms of its execution states and transitions? Since the state-transition view of a complex behaviour is intrinsically sequential, it seems that, in all cases, it should be possible to relate the abstract notion of mode to collections of execution states. The portion of behaviour corresponding to a mode is then defined as a set of execution states together with the attached transitions. Related questions are: are these collections disjoint? do they cover the whole set of states? S. Paynter [Pay96] suggests that these two questions are orthogonal, and defines Real-Time Mode-Machines, which describe exhaustive but not necessarily exclusive modes (see more comments on this paper in section 7). In fact, the only relevant question is that of exclusivity, since, for non-exhaustive modes, one can always consider that the "missing" states form an additional mode.

2.2 Talking about Modes in a Programming Language
All the formalisms or languages defined for reactive systems offer a parallel composition, together with some synchronization and communication mechanism. This operation supports a conceptual decomposition in terms of concurrent tasks, and the parallel structure can be used for compositional proofs, generation of distributed code, etc. The picture is not so clear for the decomposition into modes. The question here is how to use the mode structure of a complex system for programming it, i.e. what construct should we introduce in a language to express this view of the system? The mode structure should be as readable in a program as the concurrent structure is, thus making modifications easier; moreover, it should be usable to improve the quality of the generated code, or to serve as a guide for decomposing proofs. The key point is that it should be possible to project a program onto a given mode, and obtain the behaviour restricted to this mode (as it is usually possible to project a parallel program onto one of its concurrent components, and get the behaviour restricted to this component).
2.3 Modes and Synchronous Languages
None of the existing synchronous languages can be considered as providing a construct for expressing the mode structure of a reactive system.
We are particularly interested in dataflow languages. When trying to think in terms of modes in a dataflow language, one has to face two problems: first, there should be a way to express that some parts of a program are not always active (and this is not easy); second, if these parts of the program represent different modes, there should be a way of describing how the modes are organized into the global behaviour of the system. Several proposals have been made for introducing some control features in a dataflow program, and this has been tried for one language or another among the synchronous family: [RM95] proposes to introduce in Signal a way to define intervals delimited by some properties of the inputs, and to which the activity of some subprograms can be attached; [JLMR94,MH96] propose to mix the automaton constructs of Argos with the dataflow style of Lustre: the refinement operation of Argos allows one to refine a state of an automaton by a (possibly complex) subsystem. Hence the activity of subprograms is attached to states. Embedding Lustre nodes in an Esterel program is possible, and would have the same effect. However, providing a full set of start-and-stop control structures for a dataflow language does not necessarily improve the way modes can be dealt with. It solves the first problem mentioned above, i.e. the control structures taken in the imperative style allow the specification of activity periods of some subprograms described in a dataflow declarative style. But it does little for the second problem: a control structure like the interrupt makes it easy to express that the system switches between different behaviours, losing information about the current state of the behaviour that is interrupted, and starting a new one in some initial configuration. Of course, some information may be transmitted from the behaviour that is killed to the one that is started, but this is not the default, and it has to be expressed explicitly, with the communication mechanism for instance. For switching between modes, we claim that the emphasis should be on what is transmitted from one mode to another. Transmitting the whole configuration reached by the system should be the default if we consider that the source mode is designed to build and guarantee a given configuration of the parameters of the system, such that the target mode can be entered.
2.4
A Proposal: M o d e - A u t o m a t a
We propose a programming model called "mode-automatd', made of: operations on automata taken from the definition of Argos [Mar92]; dataflow equations taken from Lustre [BCH+85]. We shall see that mode-automata can be considered as a discrete version of hybrid automata [MMP91], in which the states are labeled by systems of differential equations that describe how the continuous environment evolves. In our model, states represent the running modes of a system, and the equations associated with the states could be obtained by discretizing the control laws. Mode-automata have the property that a program may be projected onto one of its modes.
189
3
M i n i - L u s t r e : a (very) S m a l l S u b s e t of L u s t r e
For the rest single node, some sense, format used
of the paper, we use a very small subset of Lustre. A p r o g r a m is a and we avoid the complexity related to types as much as possible. In the mini-Lustre model we present below is closer to the DC [CS95] as an intermediate form in the Lustre, Esterel and Argos compilers.
Definition 1 (mini-Lustre p r o g r a m s ) . N = (Vi, re, Vt, f, I) where: Vi, Vo and Vt are pairwise disjoint sets of input, output and local variable names. I is a total function from 12oU Vt to constants, f is a total function from Vo U Vl to the set Eq(];i U 12o U Vl) and E q ( V ) is the set of expressions with variables in V , defined by the following grammar: e ::= c I x I op(e, ..., e) I pre(x), c stands for constants, x stands for a name in V , and op stands for all combinational operators. A n interesting one is the conditional i t el t h e n e2 e l s e e 3 where el should be a Boolean expression, and e2, e3 should have the same type. pre(x) stands for the previous value of the flow denoted by x. In case one needs pre(x) at the first instant, I(x) should be used. [] We restrict mini-Lustre to integer and Boolean values. All expressions are assumed to be typed correctly. As in Lustre, we require t h a t the dependency graph between variables be acyclic. A dependency of X onto Y appears whenever there exists an equation of the form X . . . . Y... and Y does not appear inside a p r e operator. In the syntax of mini-Lustre programs, it means t h a t Y appears in the expression f ( X ) , not in a p r e operator.
Definition 2 ( T r a c e semantics of mini-Lustre). Each variable name v in the mini-Lustre program describes a flow of values of its type, i.e. an infinite sequence v0, Vl,.... Given a sequence of inputs, i.e. the values Vn, for each v 9 Vi and each n > O, we describe below how to compute the sequences (or traces) of local and output flows of the program. The initialization function gives values to variables for the instant "before time starts", since it provides values in case p r e ( x ) is needed at instant O. Hence we can call it x - x : Vv 9 Vo u V~.
v-1 = I ( v )
For all instants in time, the value of an output or local variable is computed according to its definition as given by f : Vn > O.
Vv 9 Vo u v~.
v~ = y ( v ) [ x ~ l x ] [ x ~ _ l l p r e ( x ) ]
We take the expression f ( v ) , in which we replace each variable name x by its current value xn, and each occurrence of pre(x) by the previous value xn-1. This yields an expression in which combinational operators are applied to constants. The set of equations we obtain for defining the values of all the flows over time is acyclic, and is a sound definition. []
190
Definition 3 (Union of mini-Lustre nodes). Provided they do not define the same outputs, i.e. )21 o fq )22 = 0, we can put together two mini-Lustre programs. This operation consists in connecting the outputs of one of them to the inputs of the other, if they have the same name. These connecting variables should be removed from the inputs of the global program, since we now provide definitions for them. This corresponds to the usual dataflow connection of two nodes. ()2/1,)2ol,)2/1, fl,/1 ) [,.J (V? , )22o, )22, f2, I 2) :
( (v: u v?) \ vl \ V o,
U V o,
u v2 ,
Ax.if x e )21oU )21 then f l ( x ) else f2(x), x e )2o then
i1( ) else 12(=))
Local variables should be disjoint also, but we can assume that a renaming is performed before two mini-Lustre programs are put together. Hence )2~ fq )22 = 0 is guaranteed. The union of sets of equations should still satisfy the acyclicity constraint. []
Definition 4 (Trace equivalence for mini-Lustre). Two programs L1 = ()2i,)2o, )21, f l , 11) and L2 = 02i, )20, )22, f2,12) having the same input]output interface are trace-equivalent (denoted by L1 ~', L2) if and only if they give the same sequence of outputs when fed with the same sequence of inputs. [] Definition 5 (Trace equivalence for mini-Lustre with no initial specification). We consider mini-Lustre programs without initial specification, i.e. mini-Lustre programs without the function I that gives values for the flows "before time starts". Two such objects 5x = ()2i,)20, )2~ , f l ) and L2 = ()2~,Vo, )22, f2) having the same input]output interface are trace-equivalent (denoted by L1 ~ L~) if and only if, for all initial configuration I, they give the same sequence of outputs when fed with the same sequence of inputs. [] P r o p e r t y 1 : Trace equivalence is preserved by union L,.~ L' ==a L U M ~ L ' U M
4
4.1
[]
Mode-Automata
Example and Definition
The mode-automaton of figure 1 describes a program that outputs an integer X. The initial value is 0. Then, the program has two modes: an incrementing mode, and a decrementing one. Changing modes is done according to the value reached by variable X: when it reaches 10, the mode is switched to "decrementing"; when X reaches 0 again, the mode is switched to "incrementing". For simplicity, we give the definition for a simple case where the equations define only integer variables. One could easily extend this framework to all types of variables.
191
x:o
] -'-"~-'~----..... X=pre(X) + 1
X ~=
0
1
0 X=pre(X)- 1
Fig. 1. Mode-automata: a simple example
Definition
6 (Mode-automata).
A mode-automaton is a tuple (Q, qo, Vi, Vo, :~, f, T) where: - Q is the set of states of the automaton part qo E Q is the initial state - Vi and 1)o are sets of names for input and output integer variables. T C Q x C(V) • Q is the set of transitions, labeled by conditions on the variables of V : V~ U 12o I : 12o ~ Z is a function defining the initial value of output variables - f :Q ~ 1)o ~ EqR defines the labeling of states by total ]unctions from 12o to the set EqR(Vi U 12o) of expressions that constitute the right parts of the equations defining the variables of 12o.
-
-
-
The expressions in EqR(]3i U 12o) have the same syntax as in mini-Lustre nodes: e ::= c [ x [ op(e,...,e) [ pre(x), where c stands for constants, x stands for a name in Vi U ]2o, and op stands for all combinational operators. The conditions in C(~i U 12o) are expressions of the same form, but without pre operators; the type of an expression serving as a condition is Boolean. []
Note that Input variables are used only in the right parts of the equations, or in the conditions. Output variables are used in the left parts of the equations, or in the conditions. We require that the automaton part of a mode-automaton be deterministic, i.e., for each state q E Q, if there exist two outgoing transitions (q, cl,ql) and (q, c2, q2), then Cl A c2 is not satisfiable. We also require that the automaton be reactive, i.e., for each state q E Q, the formula V(q,c,q')ET C is true. With these definitions, the example of figure 1 is written as: ( { A , B } , A , O , { X } , I : X --+ O, f(A) = { X = pre(X) + I ) , f ( B ) = { X = pre(X) - 1 }), {(g,x = 10, B),(B,X = 0, A),(A,X # 10, A),(B,X # 0, B)})
In the graphical notation of the example, we omitted the two loops (A, X # 10, A) and (B,X # 0, B).
192
4.2
S e m a n t i c s b y Translation into Mini-Lustre
The main idea is to translate the automaton structure of a mode-automaton into mini-Lustre, in a very classical and straightforward way. Then we gather all the sets of equations attached to states into a single conditional structure. We choose to encode each state by a Boolean variable. Arguments for a more efficient encoding exist, and such an encoding could be applied here, independently from the other part of the translation. However, it is sometimes desirable that the pure Lustre program obtained from mode-automata be readable; in this case, a clear (and one-to-one) relation between states in the mode-automaton, and variables in the Lustre program, is required. The function /: associates a mini-Lustre program with a mode-automaton. We associate a Boolean local variable with each state in Q = {q0, ql, ..., qn), with the same name. Hence:
~ ( ( q , qo, ]:~, l;o,27, f , T ) ) = ():~, ~;o,Q,e,J) The initial values of the variables in ]2o are given by the initialization function 27 of the mode-automaton, hence Vx E ~;o, J(x) = Z(x). For the local variables of the mini-Lustre program, which correspond to the states of the mode-automaton, we have: J(qo) = true and J(q) = false, Vq ~ q0. The equation for a local variable q that encodes a state q expresses that we are in state q at a given instant if and only if we were in some state q~, and a transition (q', c, q) could be taken. Note that, because the automaton is reactive, the system can always take a transition, in any state. A particular case is q' = q: staying in a state means taking a loop on that state, at each instant for q 9 Q, e(q) is the expression:
V pre (q' A c) (q',c,q)ET
For x 9 12o, e(x) is the expression : if q0 then f(q0)else if ql then f(ql)...else if q~ then f(q~) The mini-Lustre program obtained for the example is the following (note that pre(A and X = 10) is the same as pre(A) and pre(X) = 10, hence the equations have the form required in the definition of mini-Lustre). ])~ = ~ 14 -- { X ) ])~ = {A, B )
f(X) :if A then pre(X)+l else pre(X)-I f(A) :pre (A and not X=IO) or pre(B and X = O) f(B) :pre (B and not X=O) or pre(A and X = 10) I(X) = 0 5
5.1
I ( A ) = true
Compositions
I ( B ) = false
of Mode-Automata
Parallel C o m p o s i t i o n w i t h Shared Variables
A single mode-automaton is appropriate when the structure of the running modes is flat. Parallel composition of mode-automata is convenient whenever
193
the modes can be split into at least two orthogonal sets, such that a set of modes is used for controlling some of the variables, and another set of modes is used for controlling other variables.
v:o[
Y=10
~~~-..__._.
Y -----0
Y = pre(Y) q- 1
Y=pre(Y)-
1
xX_> 10_2o <50 X=pre(X) +Y
X = pre(X) - Y
Fig. 2. Parallel composition of mode-automata, an example
D e f i n i t i o n Provided l}o1 N l~o2 = 0, we define the parallel composition of two mode-automata by : (Q1, qo~' T 1, V~, Vo~, Z 1, fx)[I (Q2, qo2, T 2, V?, 1;o2,Z 2, f2) = (Q1 x Q2,(q~,q2),(])l O'l)?)\Vo1 \ )2O2,Vo1 UI)O2,Z,f) Where:
S fl(ql)(X) f(ql'q2)(X) = "~ff(q2)(X)
if X e Yo1 otherwise, i.e. if X e 1;o2
Similarly:
Z(X) = f zl (X)
if X E Vo1 ~, Z 2 (X) if X E 1)o2
And the set T of global transitions is defined by:
(qI,CI,q'I) E TI
A(q2,C2,q'2)E T 2
---> ((ql,q2),CIAC2,(q'I,q'2)) E T
The followingproperty establishes a relationship between the parallel composition definedfor mode-automataas a synchronousproduct, and the intrinsic parallelism of Lustre programs, captured by the union of sets of equations: Property 2 s ) ,~ ~(M~) U L(M2)
[]
194
5,2
I d e a s for H i e r a r c h i c M o d e s
A very common notion related to modes is t h a t of sub-modes. In the hierarchic composition below, the equation X = p r e ( X ) - 1 associated with state D is shared by the two submodes, while these submodes have been introduced for defining Y more precisely. Note t h a t the scope of Y is the whole program, not only state D. Splitting state D for refining the definition of Y has something to do with equivalences of m o d e - a u t o m a t a (see below). Indeed, we would like the hierarchic composition to be defined in such a way t h a t the program of figure 3 be "equivalent" to that of figure 4, where the program associated with state D uses a Boolean variable q. "Refining" (or exploding) the conditional definition of Y into the m o d e - a u t o m a t o n with states A, B of figure 3 is the work performed by the Lustre compiler, when building an interpreted a u t o m a t o n from a Lustre program.
IX, Y : O
]
~D
~,
X=pre(X)-I Y=IO
Y = pre(Y) + 1
Y -- pre(Y) - 1
X=pre(X)+l X=IO Y=pre(Y)+2
J X=O
Fig. 3. Hierarchic composition of mode-automata, an example X, Y, q : 0 X = 0
q=0 X=pre(X)+l I_Y=pre(Y)+2
q=if pre(q=O and Y=IO or q = l and Y r then 1 else 0 I Y= if q=O then pre(Y) + 1 else pre(Y) - 1 [_X=pre(X)-I
Fig. 4. Describing Y with a conditional program (q = 0 corresponds to A; q --- 1 corresponds to B)
195
6
Congruences
of Mode-Automata
We try to define an equivalence relation for mode-automata, to be a congruence for the parallel composition. There are essentially two ways of defining such an equivalence : either as a relation induced by the existing trace equivalence of Lustre programs; or by an explicit definition on the structure of mode-automata, inspired by the trace equivalence of automata. The idea is that, if two states have equivalent sets of equations, then they can be identified. Definition 7 (Induced equivalence of mode-automata). M1 --i M2 < ;- /~(M1) ~/~(M2) (see definition ~ for ..~).
[]
The direct equivalence is a bisimulation, taking the labeling of states into account:
Definition 8 (Direct equivalence of mode-automata).
( Ql, q~, Tl,1)~, l)xo, El, f 1) =d ( Q2, q2, T2, l;~, V2o, iZ'2,f2) r 3R C_Rs such that: e R
^
(ql,q2) E R
~
I (ql,cl,qn) E T
1
===~
3qt2,c 2 s. t. (q2,c2,q'2) E T 2
A (qn,q,2) E R A C1 =
C2
and conversely. Where Rs C Q1 • Q2 is the relation on states induced by the equivalence of the attached sets of equations: (ql,q2) E R, ~ fl(ql) ~ f2(q2) (see definition 5 for ~). [] Note: the conditions labeling transitions are Boolean expressions with subexpressions of the form a#b, where a and b are integer constants or variables, and # is a comparison operator yielding a Boolean result. We can always consider that these conditions are expressed as sums of monomials, and that a transition labeled by a sum c V c' is replaced by two transitions labeled by c and c~ (between the same states). Hence, in the equivalence relation defined above, we can require that the conditions c 1 and c 2 match. Since c I -- c2 is usually undecidable (it is not the syntactic identity), the above definition is not practical. It is important to note that the two equivalences do not coincide: M1 ----d M2 ==~ M1 - i M2, but this is not an equivalence. Translating the modeautomaton into Lustre before testing for equivalence provides a global comparison of two mode-automata. On the contrary, the second definition of equivalence compares subprograms attached to states, and may fail in recognizing that two mode-automata describe the same global behaviour, when there is no way of establishing a correspondence between their states (see example below).
Example 1. Let us consider a program that outputs an integer X. X has three different behaviours, described by: B1 : X = pre(X) + 1, B2 : X = pre(X) + 2
196
and B3 : X = w e ( X ) - 1. The transitions between these behaviours are triggered by conditions on X: Cij is the condition for switching from Bi to Bj. A mode-automaton M1 describes B1 and B2 with a state q12, and B3 with another state q3. Of course the program associated with q12 has a conditional structure, depending on C12, C21 and a Boolean variable. The program associated with q3 is B3. Now, consider the mode-automaton 11//2 that describes B1 with a state q~, B2 and B3 with a state q23. In M2, the program associated with q23 has a conditional structure. There is no relation R between the states of M1 and the states of M2 that would allow to recognize that they are equivalent. Translating them into miniLustre is a way of translating them into single-state mode-machines (with conditional associated programs), and it allows to show that they are indeed equivalent. On the other hand, if we are able to split ql~ into two states, and q~3 into two states (as suggested in section 5.2) then the two machines have three states, and we can show that they are equivalent. [] P r o p e r t y 3 : C o n g r u e n c e s of m o d e - a u t o m a t a The two equivalences are congruences for parallel composition.
7
Related
Work
and
Comments
on the
Notion
[]
of Mode
We already explained in section 2.3 that there exists no construct dedicated to the expression of modes in the main synchronous programming languages. Mode-automata are a proposal for that. They allow to distinguish between explicit states (corresponding to modes, and described by the automaton part of the program) and implicit states (far more detailed and grouped into modes, described by the dataflow equational part of the program). This is an answer for people who argue that modes should not be related to states, because they are far more states than modes. Other people argue that modes are not necessarily exclusive. The modes in one mode-automaton are exclusive. However, concurrent modes are elements of a Cartesian product, and can share something. Similarly, two submodes of a refined mode also share something. We tried to find a motivation for (and thus a definition of) non-exclusive modes. Modecharts have recently joined the synchronous community, and we can find an Esterel-like semantics in [PSM96]. In this paper, modes are simply hierarchical and concurrent states like in Argos [Mar92]. It is mentioned that "the actual interaction with the environment is produced by the operations associated with entry and exit events'. Hence the modes are not dealt with in the language itself; the language allows to describe a complex control structure, and an external activity can be attached to a composed state. It seems that the activity is not necessarily killed when the state is left; hence the activities associated with
197
exclusive states are not necessarily exclusive. This seems to be the motivation for non-exclusive modes. Activities are similar to the external tasks of Esterel but, in Esterel, the way tasks interfere with the control struture is well defined in the language itself. Real-time mode-machines have been proposed in [Pay96]. In this paper, modes are collections of states, in the sense recalled above (section 2.1). These collections are exhaustive but not exclusive. However, it seems that this requirement for non-exclusivity is related to pipelining of the execution: part of the system is still busy with a given piece of data, while another part is already using the next piece of data. The question is whether pipelining has anything to do with overlapping or non-exclusive modes. In software pipelining, there may be two components running in parallel and corresponding to the same piece o] source program; if this portion of source describes modes, it may be the case that the two execution instances of it are in different modes at the same time, because one of them starts treating some piece of data, while the other one finishes treating another piece of data. Each instance is in exactly one mode at a given instant; should this phenomenon be called "non-exclusive modes" ? We are still searching for examples of non-exclusive modes in reactive systems.
8
Implementation
and
Further
Work
We presented mode-automata, a way to deal with modes in a synchronous language. Parallel composition is well defined, and we also have a congruence of mode-automata, w.r.t, this operation. We still have to study the hierarchic composition, following the lines of [HMP95] (in this paper we proposed an extension of Argos dedicated to hybrid systems, as a description language for the tool Polka [HPR97], in which hybrid automata may be composed using the Argos operators. In particular, hierarchic composition in HybridArgos is a way to express that a set of states share the same description of the environment). We shall also extend the language of mode-automata by allowing full Lustre in the equations labeling the states (clocks, node calls, external functions or procedures...). Mode-automata composed in parallel are already available as a language, compiled into pure Lustre, thus benefiting from all the lustre tools. We said in section 2 that "It should be possible to project a program onto a given mode, and obtain the behaviour restricted to this mode". How can we do that for programs made of mode-automata? When a program is reduced to a single mode-antomaton, the mode is a state, and extracting the subprogram of this mode consists in taking the equations associated with this state. The object we obtain is a mini-Lustre program without initial state. When the program is something more complex, we are still able to extract a non-initialized mini-Lustre program for a given composed mode; for instance the program of a parallel mode (q, q~) is the union of the programs attached to q and q~. Projecting the complete program onto its modes may be useful for generating efficient sequential code. Indeed, the mode-structure clearly identifies which parts of a program are active at a given instant. In the SCADE (Safety Critical
198
Application Development Environment) tool sold by Verilog S.A. (based upon a commercial Lustre compiler), designers use an activation condition if they want to express that some part of the dataflow program should not be computed at each instant. It is a low level mechanism, which has to be used carefully: the activation condition for subprogram P is a Boolean flow' computed elsewhere in the dataflow program. Our mode structure is a higher level generalization of this simple mechanism. It is a real language feature, and it can be used for better code generation. Another interesting point about the mode-structure of a program is the possibility of decomposing proofs. For the decomposition of a problem into concurrent tasks, and the usual parallel compositions that support this design, people have proposed compositional proof rules, for instance the assume-guarantee scheme. The idea is to prove properties separately for the components, and to infer some property of the global system. We claim that the decomposition into several modes - - provided the language allows to deal with modes explicitly, i.e. to project a global program onto a given mode - - should have a corresponding compositional proof rule. At least, a mode-automaton is a way of identifying a control structure in a complex program. It can be used for splitting the work in analysis tools like Polka [HPR97].
References [BB91] [BCH+85]
[CHPP87]
[cs95] [GBBG85]
[Hat87] [HMP95]
[HPR97]
[JLMR94]
A. Benveniste and G. Berry. Another look at real-time programming. Special Section of the Proceedings of the IEEE, 79(9), September 1991. J-L. Bergerand, P. Caspi, N. Halbwachs, D. Pilaud, and E. Pilaud. Outline of a real time data-flow language. In Real Time Systems Symposium, San Diego, September 1985. P. Caspi, N. Halbwachs, D. Pilaud, and J. Plaice. LUSTRE, a declarative language for programming synchronous systems. In 14th Symposium on Principles of Programming Languages, Munich, January 1987. C2A-SYNCHRON. The common format of synchronous languages - The declarative code DC version 1.0. Technical report, SYNCHRON project, October 1995. P. Le Guernic, A. Benveniste, P. Bournai, and T. Gauthier. Signal: A data flow oriented language for signal processing. Technical report, IRISA report 246, IRISA, Rennes, France, 1985. D. Hazel. Statechazts : A visual approach to complex systems. Science of Computer Programming, 8:231-275, 1987. N. Halbwachs, F. Mazaninchi, and Y. E. Proy. The railroad crossing problem, modeling with hybrid argos - analysis with polka. In Second European Workshop on Real-Time and Hybrid Systems, Grenoble (France), June 1995. N. Halbwachs, Y.E. Proy, and P. Roumanoff. Verification of real-time systems using linear relation analysis. Formal Methods in System Design, 11(2):157-185, August 1997. M. Jourdan, F. Lagnier, F. Maraninchi, and P. Raymond. A multipazadigm language for reactive systems. In In 5th IEEE International Conference on Computer Languages, Toulouse, May 1994. IEEE Computer Society Press.
199
pM88]
Farnam Jahanian and Aloysius Mok. Modechart: A specification language for real-time systems. IEEE Transactions on Software Engineering, 14, 1988. [Mar92] F. Maraninchi. Operational and compositional semantics of synchronous automaton compositions. In CONCUR. LNCS 630, Springer Verlag, August 1992. F. Maraninchi and N. Halbwachs. Compiling argos into boolean equations. [MH96] In Formal Techniques for Real-Time and Fault Tolerance (FTRTFT), Uppsala (Sweden), September 1996. Springer verlag, LNCS 1135. [MMP91] O. Maler, Z. Manna, and A. Pnueli. From timed to hybrid systems. In REx Workshop on Real-Time: Theory in Practice, DePlasmolen (Netherlands), June 1991. LNCS 600, Springer Verlag. S. Paynter. Real-time mode-machines. In Formal Techniques for Real-Time [Pay96] and Fault Tolerance (FTRTFT), pages 90-109. LNCS 1135, Springer Verlag, 1996. [PSM96] Carlos Puchol, Douglas Stuart, and Aloysius K. Mok. An operational semantics and a compiler for modechart specificiations. Technical Report CS-TR-95-37, University of Texas, Austin, July 1, 1996. E. Rutten and F. Martinez. SIGNALGTI, implementing task preemption [RM95] and time interval in the synchronous data-flow language SIGNAL. In 7th Euromiero Workshop on Real Time Systems, Odense (Denmark), June 1995.
From Classes to Objects via Subtyping Didier R~my INRIA-Rocquencourt*
A b s t r a c t . We extend the Abadi-Cardelli calculus of primitive objects with object extension. We enrich object types with a more precise, uniform, and flexible type structure. This enables to type object extension under both width and depth subtyping. Objects may also have extendonly or virtual contra-variant methods and read-only co-variant methods. The resulting subtyping relation is richer, and types of objects can be weaken progressively from a class level to a mote traditional object level along the subtype relationship.
1
Introduction
Object extension has long been considered unsound when combined with subtyping. The problem may be explained as follows: in an object built with two methods ~1 and ~2 of types T1 and T2, the method ~1 may require ~ to be of type ~'2. Forgetting the method ~2 by subtyping would result in the possible redefinition of method ~2 with another, incompatible type ~-s. Then, the invocation of ~1 may fail. Indeed, the first strongly-typed object-based languages that have been proposed provided either subtyping [1] or object extension [21] to circumvent the problem described above. However, each proposal was missing an important feature supported by the other one. Both of them were improved later following the same principle: At an earlier stage, object components were assembled in prototypes [20] or classes [2], relying on some extension mechanism to provide inheritance. Objects were formed in a second, atomic step, immediately losing their extension capabilities for ever, to the benefit of subtyping. In contrast to the previous work, we allow both extension and subtyping at the level of objects, avoiding stratification. Our solution is based on the enrichment of the structure of object types. Thus, our type-system rejects the above counter-example while keeping m a n y other useful programs. In our proposal, an object and its class are unified and can be considered as two different perspectives on the same value: the type of an object is a supertype of the type of its class. Fine grain subtyping allows type information to be lost gradually, both width-wise and depth-wise, slowly fading classes into objects. As is well-known, when more type information is exposed, more operations can be performed (class * BP 105, 78153 Le Chesnay Cedex, France. Email: Didier.Remyr
201
perspective). On the contrary, hiding a sufficient amount of type information allows for more object interchangeability, but permits fewer operations (object perspective). We add object extension to the object calculus of Abadi and Cardelli [3]. We adapt their typing rules to our enriched object types. In particular, we force methods to be parametric in self, that is, polymorphic over all possible extensions of the respective object. In this sense, our proposal is not a strict extension of theirs. In addition to object extension, the enriched type structure has other benefits. We can allow virtual methods in objects (i.e. methods that are required by some other method but that have not been defined yet) since we are able to described them in types. Using co-variant subtyping forbids further re-definition of the corresponding method, as in [3]. Since classes are objects, such methods are in fact final methods. Final methods can only be accessed but no more redefined (except, indirectly, by the invocation of a previously defined method). Virtual methods are useful because they allow objects to be built progressively, component by component, rather than all at once. T h e y also improve security, since they sometime avoid the artificial use of dangerous default methods. While final methods are co-variant, virtual methods, are naturally contravariant. The rest of the paper is organized as follows. In the next section, we describe our solution informally. The following section is dedicated to the formal presentation. In section 4, we show some properties of the type system, in particular the type soundness property. Section 5 illustrates the gain in security and flexibility of our proposal by running a few examples. To a large extend, these examples can be understood intuitively and may also be read simultaneously with or immediately after the informal presentation. In section 6 we discuss possible extensions and variations of our proposal, as well as further meta-theoretical developments. A brief comparison with other works is done in section 7 before concluding.
2
Informal
presentation
Technically, our first goal is to provide method extension, while preserving some form of subtyping. The counter-example given above does not imply that both method extension and width subtyping are in contradiction. It only shows that combining two existing typing rules would allow to write unsafe programs. Thus, if ever possible, a type system with both method extension and subtyping should clearly impose restrictions when combining them. Our solution is to enrich types so that subtyping becomes traceable, and so that extension can be limited to those fields whose exact type is known. We first recall record types with symmetric type information. Using a similar structure for object types, some safe uses of subtyping and object extension can be typed, while the counter-example given in the introduction is rejected.
202
Record types Record values are partial functions with finite domains that map labels to values. Traditionally, the types of records are also partial functions with finite domains that map labels to types. They are represented as records of types, that is, {l~ : % ~eI}. This type says that fields ~'s are defined with values of type Ti'S. However, it does not imply anything about other fields. Another richer, more symmetric structure has also been used for record types, originally to allow type inference for records in ML [23, 24]. There, record types are treated as total functions mapping labels to field types, with the restriction that all but a finite number of labels have isomorphic images (i.e. are equal modulo renaming). Thus, record types can still be represented finitely by listing all significant labels with their corresponding field types and then adding an extra field-type acting as a template for all other labels. In their simplest form, field types are either P T (read present with type T) or A (read absent). For instance, a record with two fields ~1 of type T1 and ~2 of type T2 is given type (~1: P T1 ; 12: P ~'2 ; A). It could also, equivalently, be given type (~I:P T1 ; s ~-2 ; ~:A ; A) where g is distinct from ~1 and 12. In the absence of subtyping, standard types for records {~i : % ~ex} can indeed be seen as a special case of record types, where field variables are disallowed; their standard subtyping relation then corresponds to the one generated by the axiom P T <: A (and obvious structural rules). The type {~1 : ~-1;..~,~ : Tn} becomes an abbreviation for (~I:P ~-1 ; ..~,~:P ~-n ; A). However, record types are much more flexible. For instance, they inherently and symmetrically express negative information. Before we added subtyping, a field g of type A was known to be absent in the corresponding record. This is quite different from the absence of information about field ~. Such precise information is sometimes essential; a wellknown example is record concatenation [16]. Instead of breaking the symmetry with the subtyping axiom P T <: A, we might have introduced a new field U (read unknown), with two axioms P T <:U and A <: U. This would preserve the property that a field of type $ is known to be absent, still allowing present and absent field to be interchanged but at their common supertype U. Field variables and row variables also increase the expressiveness of record types. However, for simplicity, we do not take this direction here. Below, we use meta-variables for rows. This is just a notational convenience. It does not add any power. Object types In their simplest form, objects are just records, thus object types mimic record types. We write object types with [p] instead of (p) to avoid confusion. An object with type [~I:P T1 ;~2:P T2;A] possesses two methods ~1 and g2 of respective types ~-1 and T2. Intuitively, an object [ll = ai ~eI] can be given type [~: P ~-i ~ez ; A] provided methods a~'s have type %'s. However, objects soon differ from records by their ability to send messages to themselves, or to return themselves in response to a method call. More generally,
203
objects are of the form It, = g(x,)a,]. Here, x~ is a variable t h a t is bound to the object itself when the m e t h o d g, is invoked. Consistently, the expression a, must be t y p e d in a context where x, is assumed of the so-called " m y t y p e " , represented by some t y p e variable X equal to the object t y p e T. T h e following typing rule is a variant of the one used in [3]. T --= ~(X)[e,:P T~'eI ; A] A F r
A,X=T,x,:xFa,:T, = q(x~)a, ,Oil : ~-
( T h e t y p e a n n o t a t i o n (X, ~') in the object expression binds the n a m e of m y t y p e locally and specifies the t y p e of the object.) A n extendible object v m a y also be used to build a new object v t with more m e t h o d s t h a n v and thus of a different type, say T'. T h e t y p e T r of self in v ~ is different from the type r of self in v. In order to remain well-typed in v ~, the m e t h o d s of v, should have been typed in a context where the t y p e of self could have been T ~ as well as T. This applies to any possible extension v ~ of v. In other words, m e t h o d s of an object of type T should be p a r a m e t r i c in all possible types of all possible successive extensions of an object of t y p e T. This condition can actually be expressed with subtyping by X < : # % where # T is called the extension type of T (also called the internal t y p e of the object). T h a t is, the least u p p e r b o u n d of all exact I types of complete extensions (extensions in which no virtual m e t h o d remains) of objects of external t y p e T. A field of t y p e A can be overridden with m e t h o d s of a r b i t r a r y types. Thus, the best t y p e for t h a t field in the self p a r a m e t e r is U, i.e. we choose # A to be U. Symmetrically, we choose # ( P T) to be U. This makes m e t h o d s of type P 7internally unaccessible. Fields of type P 7- are known to be present externally, but are not assumed to be so internally. Thus, fields of type P ~- can be overridden with m e t h o d s of a r b i t r a r y types, such as fields of t y p e A. To recover the ability to send messages to self, we introduce a new t y p e field R 7- (read required of t y p e T). A field of type R T is defined with a m e t h o d of t y p e T, and is required to remain of at least type T, internally. Such a field can only be overridden with a m e t h o d of t y p e T. Therefore, self can also view it as a field t h a t is, and will remain, of t y p e T. In m a t h , # R T is l~ T. A field of type P T, can safely be considered as a field of type R T. Thus, we assume P T <: R T. We also assume R T to be a s u b t y p e of U. As a n example, # ~ ( X ) [ ~ I : R vl ; g 2 : P ~-2 ; ~3:U ; A] is ((X)[~z: R I-1 ; ~2: U ; ta: U ; U], or shortly ((X)[el: R T ; U]. T h e extension of a field with a m e t h o d of t y p e r requires t h a t field to be either of t y p e A or R T in the original record (the field m a y also be of type P % which is a s u b t y p e of R ~-.) It is possible to factor the two cases by introducing a new field t y p e M T (read maybe of t y p e r), and the axioms g T<:M ~-, A<:bl T, and 1t w < : U. Intuitively, M r is the union type l~ T U A. This allows, in a first step, to ignore the presence of a m e t h o d while retaining its type, and, in a second step, to forget the t y p e itself. T h e t y p e of object extension becomes more uniform. 1 The exact type of an object is the type with which the methods can initiMly be typed. The external type of an object may be a supertype of the exact type.
204
Roughly, if the original object has type [~1: M~-1 ; T2] and the new method ~1 has type rl then the resulting object has type [~1: R T1 ; T2]. A field of type M r may later be defined or redefined with some method of type T, becoming of type R r, which is a subtype of M ~-. It may also be left unchanged and thus remain of type M T. Thus, a field of type M T will always remain of a subtype of MT. That is, #(M T) is MT. Deep subtyping Subtyping rules described so far allow for width subtyping but not for depth subtyping, since all constructors have been left invariant. The only constructor that could be made covariant without breaking type-soundness is P. Making R covariant would be unsafe. However, we can safely introduce a new field type R+ T to tell that a method is defined and required to be of a subtype of T, provided that a field of type R+ T is never overridden. On the other hand, a method ~1 can safely be invoked on any object of type [gl: R+ ~'1 ; U], which returns an expression of type ~'1. Of course, we also add R r <: R+ r to just forget the fact that we are revealing the exact type information. Symmetrically, a field g of type Mr cannot be accessed, but it can be redefined with a method of a subtype of T. Still, it would be unsound to make MT contravariant. By contradiction, consider an object p of type ~(X)[g:R T ; E : P X;A] where calling method ~' overrides ~ in self with a new method of type T. By subtyping P0 could be given type ((X)[i:M To ; E : P X ;A] where ~-0 is a subtype of T. Then let P2 of type ~(X)[t:M TO ; E:P X ; g":P u n i t ; A] be the extension of Pl with a new method t" that requires l of type TO. Calling method E of P2 restore field t of p2 to some method of type ~- and returns an object P3. However, calling method t" of P3 expects a method ~ of type TO but finds one of type T. We can still introduce a contra-variant symbol M- with the axiom 1~7-<: M- 7-. Then, a method M- ~- can be redefined, but the method in the resulting object remains of type M- T and is thus unaccessible. This is still useful in situations where contra-variance is mandatory or to enforce protection against accidental access (see sections 5.6, 5.2 and [3].) Virtual methods A method ~ is virtual with type T (which we write V T) if other methods have assumed ~ to be of type R T, while the method itself might not have been defined yet. When an object has a virtual method, no other method of that object can be invoked. Thus, V T should not be a subtype of U. A method of type V T can be extended as a method of type R T. Virtual methods may also be contra-variant. We use another symbol V-~- to indicate that deep subtyping has been used. A contra-variant virtual method can be extended, but it must remain contravariant after its extension, i.e. of type M- T, and thus inaccessible. This may be surprising at first. The intuition if that # ( V - T ) should be R- T. However, a method of field-type R- T would be inaccessible, since its best type is unknown. Thus R- T has been identified with M- T.
205
For convenience, we also introduce a new c o n s t a n t F t h a t is a top t y p e for fields. T h a t is, we assume V- T < : F and U < : F (all other relations hold by transitivity).
Static
Dynamic
#~ PT
U
A
U
RT
~o
R+ T MT
~o
M-~-
Allowed
Pre Type _.~ ~/
<:~
T
V
<:T
•
V
~/
< :T
T
"r
X/
<:v
r
•
<:T
•
T
V
•
r
U
~o
V
•
•
VT
RT
<:T
•
~"
V-T
H-T
V
l
~-
Fig. 1. Structure of field types
T h e final s t r u c t u r e of field types and s u b t y p i n g axioms are summarized in figure 1. Thick arrows represent the function ~:. Thick nodes are used instead of reflexive thick arrows, t h a t is, thick nodes are left invariant by # . Thin arrows represent subtyping. We added a r e d u n d a n t but useful distinction between continuous and dashed thin arrows. T h e y are respectively covariant and contravariant by type-extension: when a continuous arrow connects ~'1 and T2, then # T1 is also a s u b t y p e of # ~-2; the inverse applies to dashed arrows. A l t h o u g h it is easy to give intuitions for parts of the hierarchy taken alone (variances, virtual methods, idempotent field-types), we are not able to propose a good intuition for the whole hierarchy. The different c o m p o n e n t s are m o d u l a r technically, but their intuitive, thus approximative descriptions, c a n n o t be composed here. We think t h a t the field-type hierarchy should be u n d e r s t o o d locally, and then considered as such. T h e table on the right is a s u m m a r y of field types and their properties. T h e entry ~o in the first column indicates the static external type. T h e second
206
column # ~ is its extension type, i.e. the static internal type. The two following columns tell whether the field is guaranteed to be present (x/sign) and its type if present. The reason for having < : r instead of T is the covariance of P. The symbol V means any possible type. The last two columns describe access and overriding capabilities (_l_ means disallowed). 3 3.1
Formal
developments
Types
We assume given a denumerable collection of type variables, written a, fl, or XType expressions, written with letter T, are type variables, object types, or the top type T. An object type ~(X)[~,: ~ ieI ;~] is composed of a finite sequence of fields ~, : ~,, without repetition, and a template T for fields that are not explicitly mentioned. Variable X is bound in the object type, and should only appear positively in ~,'s as in T.
T ::= ~ I r ~I ;~]IT W::=AIPrlRrl~rlvrl~+rIM-rlV-TIUl
F
The variance of an occurrence is defined in the usual way: it is the parity of the number of times a variable crosses a contra-variant position (i.e., the number of symbols V- or M-) on that path from the root to that occurrence. The set of free variables of T is fV(T). We write / v - ( r ) the subset of those variables that occurs negatively at least once. Object types are considered equal modulo reordering of fields. They are also equal modulo expansion, that is, by extracting a field from the template:
r
~, ~ I ;~] = r
~ ~ I ; e : ~ ; ~o]
t # e.vi ~ z
Rules for the formation of types will be defined jointly with subtyping rules in figure 2 and are described below. N o t a t i o n For convenience and brevity of notation, we use meta-variables p for rows of fields, that is, syntactic expressions of the form (g~:qoi~ * ; ~o0), where ~o,'s and I are left implicit. We write p(~) the value of p in g, that is, ~o~ if g is one of the s or ~o0 otherwise. We write p \ g for (g,: ~o, ,ez,e,#e ; ~o0). and g : ~o;p for (g:~o ; g,:~o, ~e• ; ~o0). IfT~ is a relation, we write pTdp' for Vg, p(g) i~ p'(g). This is just a meta-notation that is not part of the language of types. It can always be expanded unambiguously into the more explicit notation (g,: ~o, ,~l ; ~o).
3.2
Type extension
We define the extension of field type ~, written # ~ by the two first columns of the table 1. Type extension is lifted to object types homomorphically, i.e., # ~(X)[P] is ~(X)[# P]. The extension is not defined for type variables, nor for F. Note that the extension is idempotent, that is # ( # r ) is Mways equal to # r.
207
Well-formation of environments (ENv x) E }- T <: T
(ENv 0)
(ENv a) EF-~" < : T a ~ dora ( E ) E , a <:~" f- o
x ~ dora(E)
General subtyping
(SuB VAit)
(SUB REF F)
(SuB REF T)
E , a <: % E' F- o E , a <: % E' I- a <: ~"
E F- ( p < : F E I- ~p <: qo
E[-'r <:T
(SuB TITANST) El-T1 <:T2
Ef-r
(SuB TRANS F) E ~ T2 <: T3
E F- ~vl <: %vz
E I- qo2 <: %a3 E I- %vl <: ~v3
E F- ~h <:'r3 Field subtyping (assuming E F- ~- <: T)
(SuB PA)
(SuB PR)
(SuB AM)
(SUB OF)
E}-P~-<:h
Ef-PT<:Rr
EI-A<:MT
Et-U<:F
(SuB R+U) Eb-R+T<:U
EI-M'r <:VT
(SUB RM)
(SuB RR +) Ef-RT<:R+T
Ef-RT<:M~-
(SuB MV)
(SuB MM-)
(sub M-o)
(SuB VV-)
(SuB V-F)
E I-M'r <:M- T
EI-MT<:U
El- V ~ <:V- ~-
E t- V- ~- < : F
(SuB PP) El- r
(SuB R+R +)
E I- P ~- < : P ~-'
E I- R+I- <:R+~ -'
(SuB v-v-)
(SuB M-M-)
E[-T<:T'
E~-T<:T' E ~- M- "r'<:M- ~-
El- ~- <:T' EF- V- T' <:V- T
Object subtyping (SUB TT) EI'-o E F- T <: T
(SuB OBJ) E,x<:T~p<:F x~fv-(P) E }- ((X)[P] <: T
(SuB OBJ DEEP) (r = ( ( X ) [ p ] , r ' --= r Eb~<:T EF-T' <:T E , x < : ~ "r F- p < : p' E , X < : :/:/:"r F # p < : Cp p' E F- T <: ~"
Fig. 2. Types and Subtypes
3.3
Expressions
E x p r e s s i o n s are variables, objects, m e t h o d i n v o c a t i o n , a n d m e t h o d overriding. a ::= x i ~ ( X , ' r ) [ ~ i : ~ ( x ~ ) a ~ ] i a . ~ l
a.~ ~
~(X,'r)g(x)a
208
T h e expression a.~ ~ ((X, ~-)r is the extension of a on field g with a method r The expression (X, T) binds X to the type of self in at and indicates that the resulting type of the extension should be 7-. This information is i m p o r t a n t so that types do not have to be inferred but only checked. Field u p d a t e is just a special case of object extension. This is more general, since the selection between update and extension is resolved dynamically. 3.4
W e l l formation of types and subtyping
Typing environments are sequences of bindings written with letter E. T h e r e are free kinds of judgments (the second and third ones are similar):
E::=OI~<:TIx:~-
Typing environments
EI-o E I - T <:r'
Environment E is well-formed Regular type T is a subtype of T' in E Field type ~ is a subtype of ~o' in E Expression a has type T in E
E F- r <: ~'
EFa:T
The subtyping judgment E F T <: T is used to mean that T is a well-formed regular type in E, while E F ~o <: F means that ~ is a well-formed field-type in E. Thus, T and F also play a role of kinds. For sake of simplicity, we do not allow field variables c~ <: F in environments. We have used different meta-variables T and qo for regular types and field-types for sake of readability, although this is redundant with the constraint enforced by the well-formation rules. T h e formation of environments is recursively defined with rules for the formation of types and snbtyping rules given in figure 2. The subtyping rules are quite standard. Most of the rules are dedicated to field subtyping; they formally described the relation t h a t was drawn in figure 1. A few facts are worth noticing. First we cannot derive E F F <: F. Thus F is only used in E F- ~ <: F to tell that qo is a well-formed field type. It prevents using F in object types. The typing rule SUB PA is also worth consideration. By transitivity with other rules, it allows P 7- to be a subtype of I~ ~-', even if types T and T' are incompatible. However, it remains true, and this is essential, that P r is a subtype of 1t T' if and only if ~- is a subtype of 7-'. The most interesting rule SUB OBJ DEEP describes subtyping for object types. As explained above, row variables are just a meta-notation; thus, the judgment E F p <: p' is just a short hand for E b p(e) <: p'(~) for any label ~, which only involves a finite number of them. The simpler, but weaker rule below would keep mytype invariant during subtyping. (SuB OBJ INVARIANT) (Y ~ ~(X)[P], ~' ----~(X)[P']) Ei-T<:T E}-~"<:T E , x < : T t - p < : p' E I- ~- <: ~-' This would prevent (positive) occurrences of self to be replaced by # T where T is the current type of the object. Replacing in SUB OBJ INVARIANT the bound
209
T of X by # T would actually not behave well with respect to transitivity. The additionM premise in rule SUB OBJ DEEP restores the correct behavior with respect to transitivity. Namely, the application of a rule SUB TRANS between two object types, can be replaced by SUB OBJ DEEP and applications of SUB TRANS to the bounds of the premises.
(ExPR SUBSUMPTION) E]-a:T
(ExPR VAR)
E,x:'r,E'~r
EI--T<:T t El-a:7"'
E , x : 7",E' l- x : 7"
(ExPrt OBJECT) (7" -- r E,x<:~'r,
xi:xba,:Ti,iEI
E F- r
E,X<:~fT"~(PT"~eI;A)<:p
7")[g, = r
,eI] : T
(ExPR SELECT) El-a:7.
E I- 7" <: ~(X)[g.:R+ Tt ; U]
E I- a.t : 7"~{~/a}
(Exprt UPDATE) EF-a:7"
El-'r<:((X)[~.:RT"t;po ] E F- a.i = r 7.)r
E,x<:#7",x:xI-'at 7"
:'rt
(ExPR EXTEND) (T = r (~0, P(~)) ~ {(A, P ~), (V T~, R ~), (v- T~, M- 7"~)} EFa:((X)[g:~oo;p\t] E,X<:#7",x:xFat:Tt E F a.i ~ r162
T
Fig. 3. Typing rules
3.5
T y p i n g rules
Typing rules are given in figure 3. The rules for subsumption, variables, and method invocation are quite standard. Rule ExPrt OBJECT has been discussed earlier. The last premise says that the fields ~, may actually be super-types of P r in p and other fields may also be super types of h. One cannot simply require that p be (~ : P Ti ~X; A) and later use subsumption, since the assumption made on the type of x~ while typing a, could then be too weak. Rule EXPR UPDATE is similar to the overriding rule in [3]. This rule is important since it permits both internal and external updates: the result type of the object is exactly the same as the one before the update. On the contrary, rule ExPR EXTEND is intended to add new methods that were not necessarily defined before, and thus change the type of the object.
210
Let ~ and e~ ieI be distinct labels, j in I, and v be of the form ((X, ~-)[& = g(xi)a, ~ex].
(SELECT)
v.e~ ---~ ~, { . . , - / ~ } { v / ~ } v.e, ~ ~(X, T')g(x)a ~ ~.e ~
if al
dx, ~')~(~o)~0 ~
~(X, T ~--- T')[e, = g ( x / ) a , ' E I - J , ~ j ((x, ~ ~
~')[~, =
= g(x)a]
~(~,)~, ' % e = ~(~)~]
(UPDATE) (EXTEND) (CONTEXT)
, a2 then C{al} ----+C{a2} Fig. 4. Reduction rules
There are three different sub-cases in rule EXPR EXTEND;the one that applies is uniquely determined by the given type T. Then the type of field ~ in the argument is deduced from the small table. Rules EXPR EXTEND and EXPR UPDATE both apply only when 7- is of the form ~(X)[~: P T~ ; p] or ((X)[~: 1~T~ ; p]. Then, the requirements on the type of a are the same (letting the premise of SUB EXTEND be preceded by a subsumption rule). Thus, different derivations lead to the same judgment. It would also be possible to syntactically distinguish between object extension and method update, as well as to separate the extension between three different primitive corresponding to each of the three typing cases.
3.6
Operational semantics
We give a reduction semantics for a call-by-value strategy. Values are reduced to objects. A leftmost outermost evaluation strategy is enforced by the evaluation contexts C.
v ::= ((x,,)[t, = ~(x~)a~ ,~I]
c ::= {} I c.el c.e ~- ((~-,x)dx)a
The reduction rules are given in figure 4. Since programs are explicitly typed, the reduction must also manipulate types in order to maintain programs both wellformed and well-typed, even though it is not type-driven. In fact, the reduction uses an auxiliary binary operation on types qa ~ ~oI, to recompute the witness type of object values during object extension. It is defined in figure 5. The function qa ~-- qo' is extended to object types homomorphically, i.e., ((X)[P] ((X) [P'] is ((X)[P ~ P']. There is some flexibility in defining type reduction. As explained in the next section, it is only necessary that type-extension validates l e m m a 4.
211
r
PT', A, U: V-T'
(/91 P %
A
RT',
~02
R r , R+ r, U,
M- r
Mr':
R+T '
992
M-T
t
VT'
~02
~02
~I
~I
~I
~I
M T
r
R ~"
%ol
VT
V T
(~1
R T
R T
~1
V-- T
~01
~01
M- T
~ol
Fig. 5. Type reduction ~l ~ ~2
4
Soundness of the typing rules
The soundness of the typing rules results from a combination of subject reduction and canonical forms. The proof of subject reduction is standard (see [3] for instance). A few classical lemmas help simplifying the main proof. L e m m a 1 ( B o u n d w e a k e n i n g ) . I f E F T <: ~-' and E , a <: T', E' F /7, then E, a <: T, E' F / 7 . Lemma 2 (Substitution). 1. I f E , a < : % E ' b f f and E F T' <:T, then E , E ' { T ' / a } F f f { T ' / a } . 2. I f E, x : T , E ' b /7 and E F a : % then E , E ' } - / 7 { a / x } .
Lemma 3 (Structural subtyping). 1. I f ~- - ((X)[P] and E F T <: T', then T' is either T or of the f o r m ((X)[P'] and E , X <: # T F p <: p'. 2. I r E F ~o <: R ~'e, then ~ is either R TI or P TO where E F ~'0 <: Te. 3. I f E F ~ <: R+ Tg, then qo is either P 7o, R TO, or R+ TO where E F TO <: qog. 4. I f E F ~ <: P Te, then qo is Pro where E F ~'o <: ~'eEtc.
The proof of subject reduction also uses an essential lemma that relates computation on types to subtyping. Actually, the proof does not depend on the particular definition of # , but only on the following lemma. L e m m a 4 ( T y p e c o m p u t a t i o n ) . Let r and T' be two object types ~(X)[P] and ~(X) [P']. Assume that there exists a row p" such that E, X <: # T F p <: p" and for each label ~, the pair (p"(g), p'(~)) is one of the four forms (A, P re), (V T~, R Te), (V- ~'g,M- Tt), or (~, ~). Let ~ be ~- ~ T' and ~ be p ~ p'. Then, EF'7-<:T'
EF#+<:#T'
EF#+<:#T'
Moreover, if E, X <: # 7 F p(~) <: p'(g), then E, X <: # r I- p(g) <: ~(g); otherwise E , X <: # r F P ~-~ <: ~(g).
212
Lemma 5 ( V i r t u a l m e t h o d s ) . I f E ~ T <: ~(x)[U], t h e n E F T <: # r . Theorem 1 ( S u b j e c t R e d u c t i o n ) . Typings are p r e s e r v e d by reduction. I r E F a : Ta and a
> a ~ t h e n E F- a ~ : ~-~.
Theorem 2 (Canonical Forms). Well-typed expressions that c a n n o t be reduced are values. I f 0 F- a : ~ and there exists no a t s u c h that a a value.
5
) a r, t h e n a is
Examples
For simplicity, we assume that the core calculus has been extended with abstraction and application. This extension could either be primitive or derived from the encoding given in section 5.6. For brievity, we write a.g ~ a' instead of a.~ ~- ~(X, T)q(z)a' when a' does not depend on the self parameter z. In practice, other abbreviations could be made, but we avoid them here to reduce confusion. We consider the simple example of points and colored points. These objects can of course already be written in [3]. The expressiveness of our calculus is not so much its capability to write new forms of complete objects but to provide new means of defining them. This provides more flexibility, increases security in several ways, and removes the complexity of the encoding of classes into objects. 5.1
Objects
A point object p0 can be defined as follows: ((X, p o i n t ) [ x = 0 ; m y = q( z ) A y . ( z . x ~- y) ; p r i n t = q( z ) p r i n t _ i n t z.x] where p o i n t is ((X)[X:R i n t ; m v : P i n t --* x ; p r i n t : P u n i t ; A ] . As in [3], new points can be created using method update as in po.x ~ - ( ( X , p o i n t ) ~ ( z ) l . Moreover, colored points can be defined inheriting from points: c p o i n t ~ ((X)[X: 1~ i n t ; c: R b o o l ; m y : P i n t --* X ; p r i n t : P u n i t ; A] /x
cp = (p0.c ~- r (X, c p o i n t )q( z ) t r u e ) . p r i n t ~(X, c p o i n t ) q ( z ) i f z.c t h e n p r i n t _ i n t z . x When two values of different types have a common super-type "r, they can be interchanged at type T. Here, c p o i n t is not a subtype of p o i n t , since both types carry too precise type information. However, they admit the common super-type ~(X)[X: R int ; c:U; m v : P int --* X ; p r i n t : P unit ; A]..
5.2
Abstraction via subtyping
Subtyping can also be used to enforce security. For instance, field x may be hidden by weakening its type to U. Similarly, method m v may be protected
213
against further redefinition by weakening its type to R+ τ, that is, by giving p0 the type ζ(X)[mv: R+ int → X ; print: R unit ; U]. While method mv can no longer be directly redefined, there is still a possibility of indirect redefinition. For instance, method print could have been written so that it overrides method mv before printing. To ensure that a method can never be redefined, directly or indirectly, it must be given type R+ τ at its creation.

5.3 Virtual methods
The creation of new points by updating the field of an already existing point is not quite satisfactory, since it requires default methods to represent the undefined state; these defaults are often arbitrary and may be a source of errors. Indeed, a class of points can be seen as a virtual point lacking its field components.
POINT ≜ ζ(X)[x: V int ; mv: P int → X ; print: P unit ; A]
P ≜ ς(X, POINT)[mv = ς(z) λy.(z.x ⇐ y) ; print = ς(z) print_int z.x]

New points are then created by filling in the missing fields:

new_point ≜ λy.(P.x ⇐ ς(X, point)ς(z) y)        p1 ≜ new_point 0

5.4 Traditional class-based perspective
To keep closer to the traditional approach, we may by default choose to hide both the fields corresponding to instance variables and the extendible capabilities of the remaining methods. For instance, treating x as an instance variable, and mv and print as "regular" methods, we choose ζ(X)[mv: R+ int → X ; print: R+ unit ; U] for point. Intuitively, the object-type point hides all information that is not necessary, to increase security. Conversely, the class-type POINT remains as precise as possible, to keep expressiveness. Indeed, a class of points is still an object. However, as opposed to the previous section, we adopt a uniform, more structured style, treating "real" objects differently from those representing classes. In colored points, we may choose to leave field c readable and overridable, as if we had defined two methods set_c and get_c.

cpoint ≜ ζ(X)[c: R bool ; mv: R+ int → X ; print: R+ unit ; U]

Single inheritance is obtained by class extension:

CPOINT ≜ ζ(X)[x: V int ; c: V bool ; mv: P int → X ; print: P unit ; A]
CP ≜ (P.print ⇐ ς(X, CPOINT)ς(z) if z.c then print_int z.x)
new_cpoint ≜ λy.λy′.((CP.x ⇐ ς(X, cpoint)ς(z) y).c ⇐ ς(X, cpoint)ς(z) y′)
While CPOINT is not a subtype of POINT at the class level, we recover the usual relationship that cpoint is a subtype of point at the object level. Moreover, at the object level, types are invariant under #. Thus, we also recover the subtyping relation of [3]. In particular, object types can be unfolded.
5.5 An advanced example
A colorable point p′ is a point prepared to be colored without actually being colored. It can be obtained by adding to the point p0 an extra method paint that, when called with an argument y, returns the colored point obtained by adding the color field c with value y and by updating the print method of p0.
p′ ≜ p0.paint ⇐ ς(X, point′)ς(z) λy.((z.c ⇐ y).print ⇐ ς(X, cpoint)ς(z) if z.c then print_int z.x)

where point′ is ζ(X)[x: R int ; ... print: R unit ; paint: P bool → cpoint ; c: M bool ; U]. This example may be seen as the installation (method paint) of a new behavior (method print) that interacts with the existing state x and adds some new state c. The above solution becomes more interesting if each installation involves many methods, and especially if several installations are either different fields of the same object or the same field of different objects. Then, the installation procedure can be selected dynamically by message invocation instead of manually, by applying an external function to the object.

5.6 Encoding of the lambda-calculus
This part improves the encoding proposed in [3]. It also illustrates the use of virtual methods and variances. The untyped encoding of the lambda-calculus into objects in [3] is the following²:

⟦x⟧ ≜ x.arg
⟦λx.M⟧ ≜ [arg = ς(x) x.arg ; val = ς(x) ⟦M⟧]
⟦M M′⟧ ≜ (⟦M⟧.arg ⇐ ς(x) ⟦M′⟧).val
A function is encoded as an object with a diverging method arg. The encoding of an application overrides the method arg of the encoding of the function with the encoding of the argument and invokes the method val of the resulting object. Programs obtained by the translation of functional programs will never call val before loading the argument. However, if the encoding is used as a programming style, the type system will not provide as much safety as a type system with primitive function types would. The method val could also be called, accidentally, before the field arg has been overridden. In general, this will, in turn, call the method arg and diverge. The use of default diverging methods is a hack that palliates the absence of virtual methods. It can be assimilated to a "method not understood" type error, and one could argue that the encoding of [3] is not strongly typed.

² If both functions and objects co-exist, one should actually mark variables introduced by the encoding of functions so as to leave the other variables unchanged.
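The untyped encoding can be exercised directly; below is a minimal runnable sketch in OCaml, where an object is modelled as a record of two methods that each receive self. The names obj, invoke_arg, invoke_val, encode_lambda, and encode_app are ours, introduced only for this illustration.

```ocaml
(* The untyped encoding of [3]: a function becomes an object whose default
   arg method diverges, and application overrides arg and then invokes val. *)
type obj = { arg : obj -> obj; value : obj -> obj }

let invoke_arg o = o.arg o                      (* o.arg *)
let invoke_val o = o.value o                    (* o.val *)

(* [[λx.M]] = [arg = ς(x) x.arg ; val = ς(x) [[M]]]; the body receives self
   and reads its argument through invoke_arg. *)
let encode_lambda body =
  { arg = (fun self -> invoke_arg self)         (* diverges until overridden *)
  ; value = body }

(* [[M M']] = (([[M]]).arg <= ς(x) [[M']]).val *)
let encode_app m m' = invoke_val { m with arg = (fun _ -> m') }

(* Example: the identity; encode_app identity v returns v. *)
let identity = encode_lambda (fun self -> invoke_arg self)
```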
The encoding can be improved using object extension to treat a function as an object [val = ς(x).⟦M⟧] with a virtual method arg (remember that x.arg may appear in ⟦M⟧). The type system will then prevent the method val from being called before the argument has been loaded. More precisely, let us consider the simply typed lambda-calculus:
M ::= x | λx : t.M | M M
t ::= α | t → t
Functional types are encoded as follows:
⟦t → t′⟧ ≜ ζ(X)[arg: V⁻ ⟦t⟧ ; val: R+ ⟦t′⟧ ; U]
This naturally induces a subtyping relation between function types that is contravariant on the domain and covariant on the co-domMn. T h e typed encoding is given by the following inference rules: A,x:tF-M:t'=~a
x:tEA
A F x : t =~ x . a r g
A F ),x: t.M: A F M : t~ --, t ~ a
t --* t' =~ r
xq~dom(A) ((t -~ t'/})[val = ~(x).a]
A F MP : t~ =~ a ~
A F M M ' : t $ ( a . a r g ~ ( ( X , # ( ( t --* t ' ) ) ) ~ ( x ) a ' ) . v a l It is easy to see that the translation transforms well-typed judgments r F M : t into well-typed judgments 0 F ((M// : ((t)). As in [3], the translation provides a call-by-name operational semantics for the lambda-calculus. The encoding of [3] also provides an equational theory for the object calculus and, thefore, for the l a m b d a calculus, via translation, which we do not.
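To see why the virtual arg helps, here is a small OCaml sketch of the improved encoding. OCaml cannot express virtual fields statically, so the sketch replaces the static guarantee by a runtime check; the names fn, get_arg, encode_lambda, and encode_app are ours.

```ocaml
(* Improved encoding: a function is built with val only; arg is "virtual"
   (here: an option) and is loaded by application before val is invoked. *)
type fn = { arg : fn option; value : fn -> fn }

let get_arg self =
  match self.arg with
  | Some a -> a
  | None -> failwith "arg not loaded"  (* ruled out statically in the calculus *)

let encode_lambda body = { arg = None; value = body }

(* Application extends the object with its argument, then calls val. *)
let encode_app m m' = let m = { m with arg = Some m' } in m.value m
```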
6 Discussion

6.1 Variations
Several variations can be made by consistently modifying field-types, their subtyping relationship, and the typing rule for object extension. The easiest is to drop some subtyping assumption (such as SUB PP or SUB PA) or to drop the field-type P τ altogether. This weakens the type system (some examples are no longer typable), but it retains the essential features. More significant simplifications can be made at the price of a greater restriction of expressiveness. For instance, virtual field-types could be removed. Some extensions or modifications to the type hierarchy are also possible. For instance, one could introduce fields of type †P τ that do not depend on any other method. These methods would be the dual of those of type P τ, on which no other method depends. Methods of type †P τ could always be called, even if the object is virtual. This extends similarly to field-types †R τ and †R+ τ.
6.2 Extensions
Imperative update is an issue orthogonal to the one studied here, and it could be added without any problem. Object extension should, of course, remain functional.
Equational theory. We see no difficulty in adding an equational theory to our calculus, but this remains to be investigated. Treating object extension as a commutative operator would allow object construction to be reduced to a sequence of object extensions of the empty object (virtual methods would be crucial here).
Higher-order types. As shown above, our objects are sufficiently powerful to represent classes. As opposed to [3], this does not necessitate higher-order polymorphism, because methods are already required to be parametric in all possible extensions of self. The addition of higher-order polymorphism might still be useful, in particular to enable parametric classes. We believe that there is no problem in constraining type abstraction by some supertype bound, written α <: τ as in F<:. However, it would also be useful to introduce #-bounds of the form α <: # τ. This might require more investigation.
Row variables and binary methods. We have used row variables only as a meta-notation for simplifying the presentation. It would be interesting to really allow row variables in types. This would probably augment the expressiveness of the language, since it should provide some form of matching, which has proved quite useful, especially for binary methods [11, 9, 26]. Actually, it remains to investigate how the presented calculus could be extended to cope with binary methods. Row variables might not be sufficient to express matching, and some new form of matching might have to be found. It is unclear whether the known solutions [10] could be adapted to our calculus.

7 Comparison with other works
Our proposal is built on the calculus of objects of Abadi and Cardelli [3], which is invoked throughout this paper. Our use of variance annotations is in principle similar to theirs. By attaching variance annotations to field-types rather than to fields themselves, we eliminate some useless types such as M+ τ. Indeed, such a field could not be overridden, nor accessed, and thus it could just be given type U. (Our use of variances also eliminates the ability to specify the type of a field without specifying its variance, which may cause problems with type inference [22].) An essential imported tool is the structure of record-types of [23], which was originally designed for type inference in ML [24]. The use of a richer structure of record types has previously been proposed for type checking records [15, 16, 13, 12]. To our knowledge, the benefits of symmetric information were first transferred from record types to object types in [25]. There, first-order typing rules for
objects with extension and both deep and width subtyping were roughly drafted without any formal treatment. A similar approach has also been independently proposed by Bono, Liquori and others. Their first related work [6] has since led to many closely related proposals [8, 7, 4, 18, 17, 19]. Most of these are extensions of the Fisher-Honsell calculus of objects [20]. The differences between their approach and the one of [3] (which is also ours) are not always significant, but they make a close comparison more difficult. Only two of these works [19, 18] are extensions of the Abadi-Cardelli calculus of objects [3] and are thus more closely connected to our proposal. The first-order version [19] is subsumed by both [25] (which also covers deep subtyping) and [18] (which also addresses self types). Our proposal extends both [25] and [18]. The most interesting comparison can be made with [18]. The main motivation and the key idea behind both proposals are similar: they integrate object subtyping and object extension, using a richer type structure to preserve type soundness. Saturated vs. diamond types correspond to our object-types with a field template U vs. A, respectively. Our treatment seems more uniform. We have only one kind of object types. We distinguish between the "saturation" and "diamond" properties in fields instead of objects. As a result, we can write an object type that is saturated except for a few particular fields. Our proposal also includes several additional features: it addresses deep subtyping and virtual methods; it uses a more powerful subtyping rule for object types; it also allows methods to extend self. Moreover, in our proposal, the subtyping relationship is structural for object types. Additionally, subtyping axioms are given only at the level of fields, each of them treating a different important subtyping capability. As a result, object types have a more regular structure, and can easily be adapted to further extensions. We think this is easier to understand, to modify, and to manipulate. An alternative to virtual methods has also been studied in [4], using a quite different approach, which consists in annotating each method with the list of all the other methods it depends on. Thus, each method has a different view of self. Their approach to incomplete objects is, in principle, more powerful than ours; in particular, they can type programs that even traditional class-based languages would reject. We find their object types too detailed, and thus their proposal less practical than ours. (Tracing dependencies is closer to some form of program analysis than to standard type systems.) In fact, we intentionally restricted our type system so that methods have a uniform view of self. In practice, our solution is sufficient to capture common forms of inheritance. In [20], pre-objects have pro-types and can be turned into objects with obj-types by subtyping. Pro-types and obj-types are similar to our object types ζ(X)[ℓᵢ: R τᵢ, i∈I ; A] and ζ(X)[ℓᵢ: R+ τᵢ, i∈I ; U]. One difference is that, in our case, subtyping is defined and permitted field by field rather than all at once. Fisher and Mitchell also studied the relationship between objects and classes in [14]. They use bounded existential quantification to hide some of the structure of the object in the public interface. This still allows public methods to be called, while
private methods become inaccessible. In our calculus, the richer structure of objects makes it possible to use subtyping instead of bounded existential quantification to provide a similar abstraction. This is not surprising, theoretically, since subtyping, like existential quantification, is a loss of type information. However, this is in practice a significant difference, since subtyping allows more explicit type information but is less expressive. Another difference is that, using the standard record types, they had to introduce record sorts to express negative type information. As pointed out in a more recent paper [5], the design of the language of kinds becomes important for modularity. In particular, [5] improves over [14] by changing default kinds from unknown (U in our setting) to absent (A). Instead, our record types express positive and negative information symmetrically and are viewed as total functions from fields to types, which avoids the somewhat ad hoc language of sorts. In a recent paper, Riecke and Stone have circumvented the problem of merging extension with deep and width subtyping by changing the semantics of objects [27]. In fact, their semantics remains in correspondence with the standard semantics of objects in the general case, but the semantics of extension is changed so that the counterexample becomes sound in the new semantics. They distinguish between method update and object extension. Then, a field that is already defined is automatically renamed by extension into an anonymous field that becomes externally inaccessible. With their semantics, some of our enriched type information would become obsolete for ensuring type soundness, but it might remain useful for compile-time optimizations. Other pieces of information, e.g. virtual types, would remain quite pertinent.
8 Conclusion
We have proposed a uniform and flexible method for enriching the type systems of object calculi by refining the field structure of object types, so that they carry more precise type information. Applying our approach to the object calculus of Abadi and Cardelli, we have integrated object extension and depth and width subtyping, with covariant final methods and contra-variant virtual methods, in a type-safe calculus. When sufficient type information is revealed, objects may represent classes. Type information may also be hidden progressively, until objects can be used and interchanged in a traditional fashion. An important gain is to avoid the encoding of classes as records of pre-methods. Instead, we provide a more uniform, direct approach. Another benefit of this integration is to allow mixed forms of classes and objects. The use of richer object types also increases both safety, by capturing more dynamic misbehavior as static type errors, and security, by allowing more privacy via subtyping. Moreover, our approach subsumes several other unrelated proposals, and it might provide a unified framework for studying or comparing new proposals. Some
extensions and variations are clearly possible, provided the operations on objects, their types, and the subtyping hierarchy are changed consistently. More investigation still remains to be done. Adding an equational theory to the calculus would simplify our primitives, since objects could always be built field by field using object extension only. This might also be a first step towards a better integration of record-based and delegation-based object calculi. In the future, we would also like to study the potential increase of expressiveness that field and row variables could provide. Of course, investigating binary methods remains one of the most important issues. Classes can be viewed as objects. We hope that an even richer type structure would finally enable us to see objects for what they really are, records of functions, in the (yet untyped) self-application interpretation.
References

1. Martin Abadi and Luca Cardelli. A theory of primitive objects: Untyped and first-order systems. In Theoretical Aspects of Computer Software, pages 296-320. Springer-Verlag, April 1994.
2. Martin Abadi and Luca Cardelli. A theory of primitive objects: Second-order systems. Science of Computer Programming, 25(2-3):81-116, December 1995. Preliminary version appeared in D. Sanella, editor, Proceedings of European Symposium on Programming, pages 1-24. Springer-Verlag, April 1994.
3. Martin Abadi and Luca Cardelli. A Theory of Objects. Springer, 1996.
4. V. Bono, M. Bugliesi, M. Dezani-Ciancaglini, and L. Liquori. Subtyping constraints for incomplete objects. In Proceedings of TAPSOFT-CAAP'97, International Joint Conference on the Theory and Practice of Software Development, Lecture Notes in Computer Science. Springer-Verlag, 1997.
5. V. Bono and K. Fisher. An imperative, first-order calculus with object extension. In Informal Proceedings of the FOOL 5 workshop on Foundations of Object-Oriented Programming, San Diego, CA, January 1998. To appear.
6. V. Bono and L. Liquori. A subtyping for the Fisher-Honsell-Mitchell lambda calculus of objects. In Proc. of CSL'94, International Conference of Computer Science Logic, volume 933 of Lecture Notes in Computer Science, pages 16-30. Springer-Verlag, 1995.
7. Viviana Bono and Michele Bugliesi. A lambda calculus of incomplete objects. In Proceedings of Mathematical Foundations of Computer Science (MFCS), number 1113 in Lecture Notes in Computer Science, pages 218-229, 1996.
8. Viviana Bono and Michele Bugliesi. Matching constraints for the lambda calculus of objects. In Proceedings of Mathematical Foundations of Computer Science (MFCS), 1997.
9. Kim B. Bruce. Typing in object-oriented languages: Achieving expressibility and safety. Revised version to appear in Computing Surveys, November 1995.
10. Kim B. Bruce, Luca Cardelli, Giuseppe Castagna, the Hopkins Objects Group (Jonathan Eifrig, Scott Smith, Valery Trifonov), Gary T. Leavens, and Benjamin Pierce. On binary methods. Theory and Practice of Object Systems, 1(3):221-242, 1996.
11. Kim B. Bruce, Angela Schuett, and Robert van Gent. PolyTOIL: A type-safe polymorphic object-oriented language. In ECOOP, number 952 in LNCS, pages 27-51. Springer-Verlag, 1995.
12. Luca Cardelli. Extensible records in a pure calculus of subtyping. In Carl A. Gunter and John C. Mitchell, editors, Theoretical Aspects of Object-Oriented Programming: Types, Semantics and Language Design, pages 373-425. MIT Press, 1994.
13. Luca Cardelli and John C. Mitchell. Operations on records. In Fifth International Conference on Mathematical Foundations of Programming Semantics, 1989.
14. K. Fisher and J. C. Mitchell. On the relationship between classes, objects and data abstraction. Theory and Practice of Object Systems, to appear, 1998. A preliminary version appeared in the proceedings of the International Summer School on Mathematics of Program Construction, Marktoberdorf, Germany, Springer LNCS, 1997.
15. Robert W. Harper and Benjamin C. Pierce. Extensible records without subsumption. Technical Report CMU-CS-90-102, Carnegie Mellon University, Pittsburgh, Pennsylvania, February 1990.
16. Robert W. Harper and Benjamin C. Pierce. A record calculus based on symmetric concatenation. Technical Report CMU-CS-90-157, Carnegie Mellon University, Pittsburgh, Pennsylvania, February 1990.
17. L. Liquori and G. Castagna. A typed lambda calculus of objects. In Proc. of ASIAN'96, International Asian Computing Science Conference, volume 1212 of Lecture Notes in Computer Science. Springer-Verlag, 1996.
18. Luigi Liquori. Bounded polymorphism for extensible objects. Technical Report CS-24-96, Dipartimento di Informatica, Università di Torino, 1997.
19. Luigi Liquori. An extended theory of primitive objects: First order system. In Proceedings of ECOOP'97, International European Conference on Object-Oriented Programming, Lecture Notes in Computer Science. Springer-Verlag, 1997.
20. John C. Mitchell and Kathleen Fisher. A delegation-based object calculus with subtyping. In Fundamentals of Computation Theory, number 965 in LNCS, pages 42-61. Springer, 1995.
21. John C. Mitchell, Furio Honsell, and Kathleen Fisher. A lambda calculus of objects and method specialization. In IEEE Symposium on Logic in Computer Science, pages 26-38, June 1993.
22. Jens Palsberg and Trevor Jim. Type inference of object types with variances. Private discussion, 1996.
23. Didier Rémy. Syntactic theories and the algebra of record terms. Research Report 1869, Institut National de Recherche en Informatique et en Automatique, Rocquencourt, BP 105, 78153 Le Chesnay Cedex, France, 1993.
24. Didier Rémy. Type inference for records in a natural extension of ML. In Carl A. Gunter and John C. Mitchell, editors, Theoretical Aspects of Object-Oriented Programming: Types, Semantics and Language Design. MIT Press, 1993.
25. Didier Rémy. Better subtypes and row variables for record types. Presented at the workshop on Advances in Types for Computer Science at the Newton Institute, Cambridge, UK, August 1995.
26. Didier Rémy and Jérôme Vouillon. Objective ML: An effective object-oriented extension to ML. Theory and Practice of Object Systems, to appear, 1998. A preliminary version appeared in the proceedings of the 24th ACM Conference on Principles of Programming Languages, 1997.
27. Jon G. Riecke and Christopher A. Stone. Privacy via subsumption. In Informal Proceedings of the FOOL 5 workshop on Foundations of Object-Oriented Programming, San Diego, CA, January 1998. To appear.
Building a Bridge between Pointer Aliases and Program Dependences*

John L. Ross¹ and Mooly Sagiv²

¹ University of Chicago, e-mail: johnross@...
² Tel-Aviv University, e-mail: sagiv@math.tau.ac.il
Abstract. In this paper we present a surprisingly simple reduction of the program dependence problem to the may-alias problem. While both problems are undecidable, providing a bridge between them has great practical importance. Program dependence information is used extensively in compiler optimizations, automatic program parallelization, code scheduling in super-scalar machines, and in software engineering tools such as code slicers. When working with languages that support pointers and references, these systems are forced to make very conservative assumptions. This leads to many superfluous program dependences and limits compiler performance and the usability of software engineering tools. Fortunately, there are many algorithms for computing conservative approximations to the may-alias problem. The reduction has the important property of always computing conservative program dependences when used with a conservative may-alias algorithm. We believe that the simplicity of the reduction and the fact that it takes linear time may make it practical for realistic applications.
1 Introduction
It is well known that programs with pointers are hard to understand, debug, and optimize. In recent years many interesting algorithms that conservatively analyze programs with pointers have been published. Roughly speaking, these algorithms [19, 20, 25, 5, 16, 17, 23, 8, 6, 9, 14, 27, 13, 28] conservatively solve the may-alias problem, i.e., the algorithms are sometimes able to show that two pointer access paths never refer to the same memory location at a given program point. However, may-alias information is insufficient for compiler optimizations, automatic code parallelization, instruction scheduling for super-scalar machines, and software engineering tools such as code slicers. In these systems, information about the program dependences between different program points is required. Such dependences can be uniformly modeled by the program dependence graph (see [21, 26, 12]). In this paper we propose a simple yet powerful approach for finding program dependences for programs with pointers:

* Partially supported by Binational Science Foundation grant No. 9600337
Given a program P, we generate a program P′ (hereafter also referred to as the instrumented version of P) which simulates P. The program dependences of P can be computed by applying an arbitrary conservative may-alias algorithm to P′.

We are reducing the program dependence problem, a problem of great practical importance, to the may-alias problem, a problem with many competing solutions. The reduction has the property that as long as the may-alias algorithm is conservative, the dependences computed are also conservative. Furthermore, there is no loss of precision beyond that introduced by the chosen may-alias algorithm. Since the reduction is quite efficient (linear in the program size), it should be possible to integrate our method into compilers, program slicers, and other software tools.

1.1 Main Results and Related Work
The major results in this paper are:
- The unification of the concepts of program dependences and may-aliases. While these concepts are seemingly different, we provide linear reductions between them. Thus may-aliases can be used to find program dependences, and program dependences can be used to find may-aliases.
- A solution to the previously open question of whether "store-less" (see [8-10]) may-alias algorithms such as [9, 27] can be used to find dependences. One of the simplest store-less may-alias algorithms is due to Gao and Hendren [14]. In [15], that algorithm was generalized to compute dependences by introducing new names. Our solution implies that there is no need to re-develop a new algorithm for every may-alias algorithm. Furthermore, we believe that our reduction is actually simpler to understand than the names introduced in [15], since we are proving program properties instead of modifying a particular approximation algorithm.
- Our limited experience with the reduction, which indicates that store-less may-alias algorithms such as [9, 27] yield quite precise dependence information.
- The provision of a method to compare the time and precision of different may-alias algorithms by measuring the number of program dependences reported. This metric is far more interesting than just comparing the number of may-aliases, as done in [23, 11, 31, 30, 29].

Our program instrumentation closely resembles the "instrumented semantics" of Horwitz, Pfeiffer, and Reps [18]. They propose to change the program semantics so that the interpreter will carry around program statements. We instrument the program itself to locally record statement information. Thus, an arbitrary may-alias algorithm can be used on the instrumented program without modification. In contrast, Horwitz, Pfeiffer, and Reps proposed modifications to the specific store-based may-alias algorithm of Jones and Muchnick [19] (which is imprecise and doubly exponential in space).
An additional benefit of our shift from semantics instrumentation to a program transformation is that it is easier to understand and to prove correct. For example, Horwitz, Pfeiffer, and Reps need to show the equivalence between the original and the instrumented program semantics as well as the instrumentation properties. In contrast, we show that the instrumented program simulates the original program, and then the properties of the instrumentation. Finally, program dependences can also be conservatively computed by combining side-effect analysis [4, 7, 22, 6] with reaching definitions [2], or by combining conflict analysis [25] with reaching definitions as done in [24]. However, these techniques are extremely imprecise when recursive data structures are manipulated. The main reason is that it is hard to distinguish between occurrences of the same heap-allocated run-time location (see [6, Section 6.2] for an interesting discussion).

1.2 Outline of the rest of this paper
In Section 2.1, we describe the simple Lisp-like language that is used throughout this paper. The main features of this language are its dynamic memory, pointers, and destructive assignment. The use of a Lisp-like language, as opposed to C, simplifies the presentation by avoiding types and the need to handle some of the difficult aspects of C, such as pointer arithmetic and casting. In Section 2.2, we recall the definition of flow dependences. In Section 2.3 the may-alias problem is defined. In Section 3 we define the instrumentation. We show that the instrumented program simulates the execution of the original program. We also show that for every run-time location of the original program the instrumented program maintains the history of the statements that last wrote into that location. These two properties allow us to prove that may-aliases in the instrumented program precisely determine the flow dependences in the original program. In Section 4, we discuss the program dependences computed by some may-alias algorithms on instrumented programs. Finally, Section 5 contains some concluding remarks.

2 Preliminaries

2.1 Programs
Our illustrative language (following [19, 5]) combines an Algol-like language for control flow and functions, Lisp-like memory access, and explicit destructive assignment statements. The atomic statements of this language are shown in Table 1. Memory access paths are represented by ⟨Access⟩. Valid expressions are represented by ⟨Exp⟩. We allow arbitrary control flow statements using conditions ⟨Cond⟩¹. Additionally, all statements are labeled.

¹ Arbitrary expressions and procedures can be allowed as long as it is not possible to observe actual run-time locations.
Figure 1 shows a program in our language that is used throughout this paper as a running example. This program reads atoms and builds them into a list by destructively updating the cdr of the tail of the list.
program DestructiveAppend()
begin
  s1:  new( head )
  s2:  read( head.car )
  s3:  head.cdr := nil
  s4:  tail := head
  s5:  while( tail.car ≠ 'x' ) do
  s6:    new( temp )
  s7:    read( temp.car )
  s8:    temp.cdr := nil
  s9:    tail.cdr := temp
  s10:   tail := tail.cdr
       od
  s11: write( head.car )
  s12: write( tail.car )
end.
Fig. 1. A program that builds a list by destructively appending elements to tail and its flow dependences.
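For readers who want to execute the running example, here is a close OCaml transcription; modelling read by consuming a list of input atoms is our own choice, not part of the paper.

```ocaml
(* DestructiveAppend over mutable cons-cells; the input atom list ends with "x". *)
type cell = { mutable car : string; mutable cdr : cell option }

let destructive_append input =
  let head = { car = List.hd input; cdr = None } in          (* s1-s3 *)
  let tail = ref head in                                      (* s4 *)
  let rest = ref (List.tl input) in
  while (!tail).car <> "x" do                                 (* s5 *)
    let temp = { car = List.hd !rest; cdr = None } in         (* s6-s8 *)
    rest := List.tl !rest;
    (!tail).cdr <- Some temp;                                 (* s9 *)
    (match (!tail).cdr with Some c -> tail := c | None -> ()) (* s10 *)
  done;
  print_string head.car;                                      (* s11 *)
  print_string (!tail).car                                    (* s12 *)

(* Example: destructive_append ["A"; "x"] prints "A" then "x". *)
```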
2.2 The Program Dependence Problem
Program dependences can be grouped into flow dependences (def-use), output dependences (def-def), and anti-dependences (use-def) [21, 12]. In this paper, we focus on flow dependences between program statements. The other dependences are easily handled with only minor modifications to our method.

Table 1. An illustrative language with dynamic memory and destructive updates.

⟨St⟩     ::= si: ⟨Access⟩ := ⟨Exp⟩
⟨St⟩     ::= si: new( ⟨Access⟩ )
⟨St⟩     ::= si: read( ⟨Access⟩ )
⟨St⟩     ::= si: write( ⟨Exp⟩ )
⟨Access⟩ ::= variable | ⟨Access⟩.⟨Sel⟩
⟨Exp⟩    ::= ⟨Access⟩ | atom | nil
⟨Sel⟩    ::= car | cdr
⟨Cond⟩   ::= ⟨Exp⟩ = ⟨Exp⟩ | ⟨Exp⟩ ≠ ⟨Exp⟩
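For later illustration we fix a small OCaml representation of this grammar. The type and constructor names below are our own (the paper itself does not prescribe an implementation), and the subsequent sketches in this section reuse them.

```ocaml
(* Abstract syntax for the illustrative language of Table 1. *)
type sel = Car | Cdr
type access = Var of string | Field of access * sel   (* variable | a.sel *)
type exp = Acc of access | Atom of string | Nil

type stmt =
  | Assign of int * access * exp   (* si: a := e     *)
  | New of int * access            (* si: new( a )   *)
  | Read of int * access           (* si: read( a )  *)
  | Write of int * exp             (* si: write( e ) *)

type cond = Eq of exp * exp | Neq of exp * exp
```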
Our language allows programs to explicitly modify their store through pointers. Because of this we must phrase the definition of flow dependence in terms of memory locations (cons-cells) and not variable names. We shall borrow the following definition for flow dependence:
Definition 1 ([18]). Program point q has a flow dependence on program point p if p writes into a memory location loc that q reads, and there is no intervening write into loc along an execution path by which q is reached from p.

Figure 1 also shows the flow dependences for the running example program. Notice that s11 is flow dependent only on s1 and s2, while s12 is flow dependent on s2, s4, s7, and s10. This information could be used by slicing tools to find that the loop need not be executed to print head.car in s11, or by an instruction scheduler to reschedule s11 for anytime after s2. Also, s3 and s8 have no statements dependent on them, making them candidates for elimination. Thus, even in this simple example, knowing the flow dependences would potentially allow several code transformations. Since determining the exact flow dependences in an arbitrary program is undecidable, approximation algorithms must be used. A flow dependence approximation algorithm is conservative if it always finds a superset of the true flow dependences.
2.3 The May-Alias Problem
The may-alias problem is to determine whether two access-paths, at a given program point, could denote the same cons-cell.
Definition 2. Two access-paths are may-aliases at program point p if there exists an execution path to program point p where both denote the same cons-cell.
In the running example program, head.cdr.cdr and tail are may-aliases at s6, since before the third iteration these access paths denote the same cons-cell. However, tail.cdr.cdr is not a may-alias of head, since they can never denote the same cons-cell. Since the may-alias problem is undecidable, approximation algorithms must be used. A may-alias approximation algorithm is conservative if it always finds a superset of the true may-aliases.
3 The Instrumentation

In this section the instrumentation is defined. For notational simplicity, P stands for an arbitrary fixed program, and P′ stands for its instrumented version.
3.1 The Main Idea
The program P′ simulates all the execution sequences of P. Additionally, the "observable" properties of P are preserved. Most importantly, P′ records, for every variable v, the statement from P that last wrote into v. This "instrumentation information" is recorded in v.car (while the original content of v is stored in v.cdr). This "totally static" instrumentation² allows program dependences to be recovered by may-alias queries on P′. More specifically, for every statement in P there is an associated cons-cell in P′. We refer to these as statement cons-cells. Whenever a statement si assigns a value into a variable v, P′ allocates a cons-cell that we refer to as an instrumentation cons-cell. The car of this instrumentation cons-cell always points to the statement cons-cell associated with si. Thus there is a flow dependence from a statement p to q: x := y in P if and only if y.car can point to the statement cons-cell associated with p in P′. Finally, we refer to the cdr of the instrumentation cons-cell as the data cons-cell. The data cons-cell is defined inductively:
- If si is an assignment si: v := A for an atom A, then the data cell is A.
- If si is an assignment si: v := v′, then the data cell denotes v′.cdr.
- If the statement is si: new(v), then the data cons-cell denotes a newly allocated cons-cell. Thus P′ allocates two cons-cells for this statement.

In general, there is an inductive, syntax-directed definition of the data cells, formally given by the function txe defined in Table 3.
3.2 The Instrumentation of the Running Example
To make this discussion concrete, Figure 2 shows the beginning of the running example program and its instrumented version. Figure 3 shows the stores of both programs just before the loop (on an input beginning with 'A'). The cons-cells in this figure are labeled for readability only. The instrumented program begins by allocating one statement cons-cell for each statement in the original program. Then, for every statement in the original program, the corresponding instrumented statement block records the last-wrote statement and the data. The variable _rhs is used as a temporary to store the right-hand side of an assignment, to allow the same variable to be used on both sides of an assignment. Let us now illustrate this for the statements s1 through s4 in Figure 3.
- In the original program, after s1, head points to a new uninitialized cons-cell, c1. In the instrumented program, after the block of statements labeled by s1, head points to an instrumentation cons-cell, i1, head.car points to the statement cell for s1, and head.cdr points to c1′.

² In contrast to dynamic program slicing algorithms that record similar information using hash functions, e.g., [1].
- In the original program, after s2, head.car points to the atom A. In the instrumented program, after the block of statements labeled by s2, head.cdr.car points to an instrumentation cons-cell, i2, head.cdr.car.car points to the statement cell for s2, and head.cdr.car.cdr points to A.
- In the original program, after s3, head.cdr points to nil. In the instrumented program, after the block of statements labeled by s3, head.cdr.cdr points to an instrumentation cons-cell, i3, head.cdr.cdr.car points to the statement cell for s3, and head.cdr.cdr.cdr points to nil.
- In the original program, after s4, tail points to the cons-cell c1. In the instrumented program, after the block of statements labeled by s4, tail points to an instrumentation cons-cell, i4, tail.car points to the statement cell for s4, and tail.cdr points to c1′.

Notice how the sharing of the r-values of head and tail is preserved by the transformation.
3.3 A Formal Definition of the Instrumentation
Formally, we define the instrumentation as follows.

Definition 3. Let P be a program in the form defined in Table 1. Let s1, s2, . . . , sn be the statement labels in P. The instrumented program P′ is obtained from P starting with a prolog of the form new(psi) for i = 1, 2, . . . , n. After the prolog, we rewrite P according to the translation rules shown in Table 2 and Table 3.
Example 4. In the running example program (Figure 2), in P, s11 writes head.car, and in P′, s11 writes head.cdr.car.cdr. This follows from:

txe(head.car) = txa(head.car).cdr = txa(head).cdr.car.cdr = head.cdr.car.cdr
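The two translation functions are easy to transcribe; here is a sketch in OCaml over the AST of Section 2.1 (our own representation), matching the equations of Table 3.

```ocaml
(* txa: access paths of P to access paths of P' (Table 3).
   Each selector step first skips over the instrumentation cell via cdr. *)
let rec txa = function
  | Var v -> Var v
  | Field (a, s) -> Field (Field (txa a, Cdr), s)

(* txe: expressions of P to expressions of P'. *)
let txe = function
  | Acc a -> Acc (Field (txa a, Cdr))
  | Atom x -> Atom x
  | Nil -> Nil

(* Example 4: txe(head.car) yields head.cdr.car.cdr. *)
let _ = txe (Acc (Field (Var "head", Car)))
```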
3.4 Properties of the Instrumentation
In this section we show that the instrumentation has reduced the flow dependence problem to the may-alias problem. First, the simulation of P by P′ is shown in the Simulation Theorem. This implies that the instrumentation does not introduce any imprecision into the flow dependence analysis. We also show the Last Wrote Lemma, which states that the instrumentation maintains the needed invariants. Because of the Simulation Theorem and the Last Wrote Lemma, we are able to conclude that:
1. exactly all the flow dependences in P are found using a may-alias oracle on P′;
2. using any conservative may-alias algorithm on P′ always results in conservative flow dependences for P.

Intuitively, by simulation we mean that at every label of P and P′, all the "observable properties" are preserved in P′, given the same input. In our case, the observable properties are:
- r-values printed by the write statements
program DestructiveAppend()
begin
s1: new( head )
s2: read( head.car )
s3: head.cdr := nil
s4: tail := head
s5: while( tail.car ≠ 'x' ) do
    ...

program InstrumentedDestructiveAppend()
begin
    new( psi )  for all i ∈ {1, 2, ..., 12}
s1: new( head )
    head.car := ps1
    new( head.cdr )
s2: new( head.cdr.car )
    head.cdr.car.car := ps2
    read( head.cdr.car.cdr )
s3: _rhs := nil
    new( head.cdr.cdr )
    head.cdr.cdr.car := ps3
    head.cdr.cdr.cdr := _rhs
s4: _rhs := head.cdr
    new( tail )
    tail.car := ps4
    tail.cdr := _rhs
s5: while( tail.cdr.car.cdr ≠ 'x' ) do
    ...
Fig. 2. The beginning of the example program and its corresponding instrumented program.
Table 2. The translation rules that define the instrumentation, excluding the prolog. For simplicity, every assignment allocates a new instrumentation cons-cell. The variable _rhs is used as a temporary to store the right-hand side of an assignment, to allow the same variable to be used on both sides of an assignment.

si: ⟨Access⟩ := ⟨Exp⟩   ⟹   si: _rhs := txe(⟨Exp⟩)
                               new( txa(⟨Access⟩) )
                               txa(⟨Access⟩).car := psi
                               txa(⟨Access⟩).cdr := _rhs

si: new( ⟨Access⟩ )     ⟹   si: new( txa(⟨Access⟩) )
                               txa(⟨Access⟩).car := psi
                               new( txa(⟨Access⟩).cdr )

si: read( ⟨Access⟩ )    ⟹   si: new( txa(⟨Access⟩) )
                               txa(⟨Access⟩).car := psi
                               read( txa(⟨Access⟩).cdr )

si: write( ⟨Exp⟩ )      ⟹   si: write( txe(⟨Exp⟩) )

⟨Exp1⟩ = ⟨Exp2⟩         ⟹   txe(⟨Exp1⟩) = txe(⟨Exp2⟩)
⟨Exp1⟩ ≠ ⟨Exp2⟩         ⟹   txe(⟨Exp1⟩) ≠ txe(⟨Exp2⟩)
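Table 2 is mechanical enough to implement in a few lines. The sketch below reuses the AST and the txa/txe functions from the earlier sketches and emits the instrumented block for an assignment; the pretty-printers are our own trivial helpers.

```ocaml
let rec string_of_access = function
  | Var v -> v
  | Field (a, Car) -> string_of_access a ^ ".car"
  | Field (a, Cdr) -> string_of_access a ^ ".cdr"

let string_of_exp = function
  | Acc a -> string_of_access a
  | Atom x -> "'" ^ x ^ "'"
  | Nil -> "nil"

(* Emit the instrumented block for "si: a := e" (first rule of Table 2). *)
let translate_assign i a e =
  let a' = string_of_access (txa a) in
  [ Printf.sprintf "s%d: _rhs := %s" i (string_of_exp (txe e));
    Printf.sprintf "     new( %s )" a';
    Printf.sprintf "     %s.car := ps%d" a' i;
    Printf.sprintf "     %s.cdr := _rhs" a' ]
```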
Fig. 3. The store of the original and the instrumented running example programs just before the loop, on an input list starting with 'A'. For visual clarity, statement cons-cells not pointed to by an instrumentation cons-cell are not shown. Also, cons-cells are labeled and highlighted to show the correspondence between the stores of the original and the instrumented programs.
Table 3. The function txa, which maps an access path in the original program into the corresponding access path in the instrumented program, and the function txe, which maps an expression into the corresponding expression in the instrumented program.

txa(variable)        = variable
txa(⟨Access⟩.⟨Sel⟩)  = txa(⟨Access⟩).cdr.⟨Sel⟩

txe(⟨Access⟩)        = txa(⟨Access⟩).cdr
txe(atom)            = atom
txe(nil)             = nil
- equalities of r-values
In particular, the execution sequences of P and P′ at every label are the same. This discussion motivates the following definition.

Definition 5. Let S be an arbitrary sequence of statement labels in P, let e1, e2 be P expressions, and let I be an input vector. We denote by I, S ⊨_P e1 = e2 the fact that the input I causes S to be executed in P, and that in the store after S, the r-values of e1 and e2 are equal.
Example 6. In the running example, head.cdr.cdr and tail denote the same cons-cell before the third iteration for input lists of length four or more. Therefore,

I, [s1, s2, s3, s4, s5]([s6, s7, s8, s9, s10])² ⊨_P head.cdr.cdr = tail

Theorem 7 (Simulation Theorem). Given input I, expressions e1 and e2, and a sequence of statement labels S:

I, S ⊨_P e1 = e2   ⟺   I, S ⊨_P′ txe(e1) = txe(e2)
Example 8. In the running example, before the first iteration (in the last box of Figure 3), head and tail denote the same cons-cell, and head.cdr and tail.cdr denote the same cons-cell. Also, in the instrumented program, head.cdr.cdr.cdr.cdr and tail.cdr denote the same cons-cell before the third iteration for inputs of length four or more. Therefore, as expected from Example 6,

I, [s1, s2, s3, s4, s5]([s6, s7, s8, s9, s10])² ⊨_P′ txe(head.cdr.cdr) = txe(tail)

The utility of the instrumentation is captured in the following lemma.

Lemma 9 (Last Wrote Lemma). Given input I, a sequence of statement labels S, and an access path a, the input I leads to the execution of S in P in which the last statement that wrote into a is si if and only if I, S ⊨_P′ txa(a).car = psi.

Example 10. In the running example, before the first iteration (in the last box of Figure 3), we have I, [s1, s2, s3, s4, s5] ⊨_P′ txa(head).car = ps1, since s1 is the statement that last wrote into head. Also, for the input list I = ['A', 'x'], we have I, [s1, s2, s3, s4, s5, s11, s12] ⊨_P′ txa(tail.car).car = ps2, since for this input s2 last wrote into tail.car (through the assignment to head.car).

A single statement in our language can read from many memory locations. For example, in the running example program, statement s5 reads from tail and tail.car. The complete read-sets for the statements in our language are shown in Tables 4 and 5. We are now able to state the main result.
Table 4. Read-sets for the statements in our language.

si: ⟨Access⟩ := ⟨Exp⟩    rsa(⟨Access⟩) ∪ rse(⟨Exp⟩)
si: new( ⟨Access⟩ )      rsa(⟨Access⟩)
si: read( ⟨Access⟩ )     rsa(⟨Access⟩)
si: write( ⟨Exp⟩ )       rse(⟨Exp⟩)
⟨Exp1⟩ = ⟨Exp2⟩          rse(⟨Exp1⟩) ∪ rse(⟨Exp2⟩)
⟨Exp1⟩ ≠ ⟨Exp2⟩          rse(⟨Exp1⟩) ∪ rse(⟨Exp2⟩)
Table 5. An inductive definition of rsa, the read-set for access-paths, and rse, the read-set for expressions.

rsa(variable)        = ∅
rsa(⟨Access⟩.⟨Sel⟩)  = rsa(⟨Access⟩) ∪ {⟨Access⟩}

rse(variable)        = {variable}
rse(⟨Access⟩.⟨Sel⟩)  = rse(⟨Access⟩) ∪ {⟨Access⟩.⟨Sel⟩}
rse(atom)            = ∅
rse(nil)             = ∅
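In the OCaml representation used earlier, the two read-set functions of Table 5 come out as follows (a sketch; the set-module instantiation is our choice):

```ocaml
module AccessSet = Set.Make (struct
  type t = access
  let compare = compare   (* structural comparison is fine for this AST *)
end)

(* rsa: locations read to traverse an access path (its proper prefixes). *)
let rec rsa = function
  | Var _ -> AccessSet.empty
  | Field (a, _) -> AccessSet.add a (rsa a)

(* rse: locations read to evaluate an expression (the path and its prefixes). *)
let rec rse_acc = function
  | Var v -> AccessSet.singleton (Var v)
  | Field (a, s) -> AccessSet.add (Field (a, s)) (rse_acc a)

let rse = function
  | Acc a -> rse_acc a
  | Atom _ | Nil -> AccessSet.empty
```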
Theorem 11 (Flow Dependence Reduction). Given a program P, its instrumented version P′, and any two statement labels p and q: there is a flow dependence from p to q (in P) if and only if there exists an access path a in the read-set of q such that psp is a may-alias of txa(a).car at q in P′.
Example 12. To find the flow dependences of s5 in the running example,

s5: while( tail.car ≠ 'x' )

first Tables 4 and 5 are used to determine the read-set of s5:

rse(⟨Exp1⟩) ∪ rse(⟨Exp2⟩) = rse(tail.car) ∪ rse('x')
                          = rse(tail) ∪ {tail.car} ∪ ∅
                          = {tail} ∪ {tail.car}
                          = {tail, tail.car}

Then txa(a).car is calculated for each a in the read-set:

txa(tail).car = tail.car
txa(tail.car).car = txa(tail).cdr.car.car = tail.cdr.car.car

Next, the may-aliases of tail.car and tail.cdr.car.car are computed by any may-alias algorithm. Finally, s5 is flow dependent on the statements associated with the statement cons-cells that are among the may-aliases found for tail.car and tail.cdr.car.car. The read-sets and may-aliases for the running example are summarized in Table 6. From a complexity viewpoint our method can be very inexpensive. The program transformation time and space are linear in the size of the original program. In applying Theorem 11, the number of times the may-alias algorithm is invoked is also linear in the size of the original program, or more specifically, proportional
Table 6. Flow dependence analysis of the running example using a may-alias oracle.

Stmt  Read-Set            May-Aliases
s1    ∅                   ∅
s2    {head}              {(head.car, ps1)}
s3    {head}              {(head.car, ps1)}
s4    {head}              {(head.car, ps1)}
s5    {tail, tail.car}    {(tail.car, ps4), (tail.car, ps10), (tail.cdr.car.car, ps2), (tail.cdr.car.car, ps7)}
s6    ∅                   ∅
s7    {temp}              {(temp.car, ps6)}
s8    {temp}              {(temp.car, ps6)}
s9    {tail, temp}        {(tail.car, ps4), (tail.car, ps10), (temp.car, ps6)}
s10   {tail, tail.cdr}    {(tail.car, ps4), (tail.car, ps10), (tail.cdr.cdr.car, ps9)}
s11   {head, head.car}    {(head.car, ps1), (head.cdr.car.car, ps2)}
s12   {tail, tail.car}    {(tail.car, ps4), (tail.car, ps10), (tail.cdr.car.car, ps2), (tail.cdr.car.car, ps7)}
to the size of the read-sets. It is most likely that the complexity of the may-alias algorithm itself is the dominant cost.
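Theorem 11 turns directly into a procedure. Below is a sketch reusing the earlier AST, txa, and read-set code; the interface of the may-alias oracle and the representation of the statement labels are our own framing, not prescribed by the paper.

```ocaml
(* Flow dependences of statement q via Theorem 11: q depends on p iff
   ps_p may-alias txa(a).car at q for some a in q's read-set.
   may_alias : int -> access -> access -> bool is an assumed oracle. *)
let flow_deps ~may_alias ~all_labels ~read_set q =
  AccessSet.fold
    (fun a deps ->
       List.filter
         (fun p -> may_alias q (Var (Printf.sprintf "ps%d" p))
                               (Field (txa a, Car)))
         all_labels
       @ deps)
    read_set []
```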
4 Plug and Play
An important corollary of Theorem 11 is that an arbitrary conservative may-alias algorithm on P′ yields a conservative solution to the flow dependence problem on P. Since existing may-alias algorithms often yield results which are difficult to compare, it is instructive to consider the flow dependences computed by various algorithms on the running example program.
- The algorithm of [9] yields the may-aliases shown in column 3 of Table 6. Therefore, on this program, this algorithm yields the exact flow dependences shown in Figure 1.
- The more efficient may-alias algorithms of [23, 14, 30, 29] are useful for finding flow dependences in programs with disjoint data structures. However, in programs with recursive data structures, such as the running example, they normally yield many superfluous may-aliases, leading to superfluous flow dependences. For example, [23] is not able to identify that tail points to an acyclic list. Therefore, it yields that head.car and ps7 are may-aliases at s11, and will conclude that the value of head.car read at s11 may be written inside the loop (at statement s7).
- The algorithm of [27] finds, in addition to the correct dependences, superfluous flow dependences in the running example. For example, it finds that s5 has a flow dependence on s8. This inaccuracy is attributable to the anonymous nature of the second cons-cell allocated with each new statement. There are two possible ways to remedy this inaccuracy:
• Modify the algorithm so that it is 2-bounded, i.e., so that it also keeps track of car and cdr fields of variables. Indeed, this may be an adequate solution for general k-bounded approaches, e.g., [19], by increasing k to 2k.
• Modify the transformation to assign unique names to these cons-cells. We have implemented this solution and tested it using the PAG [3] implementation of the [27] algorithm³, and found exactly all the flow dependences in the running example.

5 Conclusions
In this paper, we showed that may-alias algorithms can be used, without any modification, to compute program dependences. We hope that this will lead to more research in finding practical may-alias algorithms that compute good approximations of flow dependences. For simplicity, we did not optimize the memory usage of the instrumented program. In particular, for every executed instance of a statement in the original program that writes to the store, the instrumented program creates a new instrumentation cons-cell. This extra memory usage is harmless to may-alias algorithms (for some algorithms it can even improve the accuracy of the analysis, e.g., [5]). In cases where the instrumented program is intended to be executed, it is possible to drastically reduce the memory usage through cons-cell reuse. Finally, it is worthwhile to note that flow dependences can also be used to find may-aliases. Therefore, may-aliases are necessary in order to compute flow dependences. For example, Figure 4 contains a program fragment that provides the instrumentation to "check" whether two program variables v1 and v2 are may-aliases at program point p. This instrumentation preserves the meaning of the original program and has the property that v1 and v2 are may-aliases at p if and only if s2 has a flow dependence on s1.

p:  if v1 ≠ nil then
s1:     v1.cdr := v1.cdr
    fi
    if v2 ≠ nil then
s2:     write( v2.cdr )
    fi

Fig. 4. A program fragment such that v1 and v2 are may-aliases at p if and only if s2 has a flow dependence on s1.
Acknowledgments. We are grateful to Thomas Ball, Michael Benedikt, Thomas Reps, and Reinhard Wilhelm for their helpful comments, which led to substantial

³ On a SPARCstation 20, PAG used 0.53 seconds of cpu time.
improvements of this paper. We would also like to thank Martin Alt and Florian Martin for PAG, and for their PAG implementation of [27] for a C subset.
References

1. H. Agrawal and J.R. Horgan. Dynamic program slicing. In SIGPLAN Conference on Programming Languages Design and Implementation, volume 25 of ACM SIGPLAN Notices, pages 246-256, White Plains, New York, June 1990.
2. A.V. Aho, R. Sethi, and J.D. Ullman. Compilers: Principles, Techniques and Tools. Addison-Wesley, 1985.
3. M. Alt and F. Martin. Generation of efficient interprocedural analyzers with PAG. In SAS'95, Static Analysis, number 983 in Lecture Notes in Computer Science, pages 33-50. Springer, 1995.
4. J.P. Banning. An efficient way to find the side effects of procedure calls and the aliases of variables. In ACM Symposium on Principles of Programming Languages, pages 29-41, New York, NY, 1979. ACM Press.
5. D.R. Chase, M. Wegman, and F. Zadeck. Analysis of pointers and structures. In SIGPLAN Conference on Programming Languages Design and Implementation, pages 296-310, New York, NY, 1990. ACM Press.
6. J.D. Choi, M. Burke, and P. Carini. Efficient flow-sensitive interprocedural computation of pointer-induced aliases and side-effects. In ACM Symposium on Principles of Programming Languages, pages 232-245, New York, NY, 1993. ACM Press.
7. K.D. Cooper and K. Kennedy. Interprocedural side-effect analysis in linear time. In SIGPLAN Conference on Programming Languages Design and Implementation, pages 57-66, New York, NY, 1988. ACM Press.
8. A. Deutsch. A storeless model for aliasing and its abstractions using finite representations of right-regular equivalence relations. In IEEE International Conference on Computer Languages, pages 2-13, Washington, DC, 1992. IEEE Press.
9. A. Deutsch. Interprocedural may-alias analysis for pointers: Beyond k-limiting. In SIGPLAN Conference on Programming Languages Design and Implementation, pages 230-241, New York, NY, 1994. ACM Press.
10. A. Deutsch. Semantic models and abstract interpretation for inductive data structures and pointers. In Proc. of ACM Symposium on Partial Evaluation and Semantics-Based Program Manipulation, PEPM'95, pages 226-228, New York, NY, June 1995. ACM Press.
11. M. Emami, R. Ghiya, and L. Hendren. Context-sensitive interprocedural points-to analysis in the presence of function pointers. In SIGPLAN Conference on Programming Languages Design and Implementation, New York, NY, 1994. ACM Press.
12. J. Ferrante, K. Ottenstein, and J. Warren. The program dependence graph and its use in optimization. ACM Transactions on Programming Languages and Systems, 3(9):319-349, 1987.
13. R. Ghiya and L.J. Hendren. Is it a tree, a dag, or a cyclic graph? In ACM Symposium on Principles of Programming Languages, New York, NY, January 1996. ACM Press.
14. R. Ghiya and L.J. Hendren. Connection analysis: A practical interprocedural heap analysis for C. In Proc. of the 8th Intl. Workshop on Languages and Compilers for Parallel Computing, number 1033 in Lecture Notes in Computer Science, pages 515-534, Columbus, Ohio, August 1995. Springer-Verlag.
235 15. R. Ghiya and L.J. Hendren. Putting pointer analysis to work. In ACM Symposium on Pmnciples of Programming Languages. ACM, New York, January 1998. 16. L. Hendren. Parallelizing Programs with Recursivc Data Structures. PhD thesis, Cornell University, Ithaca, N.Y., Jan 1990. 17. L. Hendren and A. Nicolau. Parallelizing programs with recursive data structures. IEEE Transactions on Parallel and Distributed Systems, 1(1):35-47, January 1990. 18. S. Horwitz, P. Pfeiffer, and T. Reps. Dependence analysis for pointer variables. In SIGPLAN Conference on Programming Languages Design and Implementatzon, volume 24 of ACM SIGPLAN Notices, pages 28-40, Portland, Oregon, June 1989. ACM Press. 19. N.D. Jones and S.S. Muchnick. Flow analysis and optimization of Lisp-like structures. In S.S. Muchnick and N.D. Jones, editors, Program Flow Analysis: Theory and Applications, chapter 4, pages 102-131. Prentice-Hall, Englewood Cliffs, N J, 1981. 20. N.D. Jones and S.S. Muchnick. A flexible approach to interprocedural data flow analysis and programs with recursive data structures. In ACM Symposzum on Principles of Programming Languages, pages 66-74, New York, NY, 1982. ACM Press. 21. D.J. Kuck, R.H. Kuhn, B. Leasure, D.A. Padua, and M. Wolfe. Dependence graphs and compiler optimizations. In ACM Symposzum on Principles of Programming Languages, pages 207-218, New York, NY, 1981. ACM Press. 22. W. Land, B.G. Ryder, and S. Zhang. Interprocedural modification side effect analysis with pointer aliasing. In Proc. of the ACM SIGPLAN '93 Conf. on Programming Language Design and Implementation, pages 56-67, 1993. 23. W. Landi and B.G. Ryder. Pointer induced aliasing: A problem classification. In ACM Symposium on Pmnczples of Programming Languages, pages 93-103, New York, NY, January 1991. ACM Press. 24. J.R. Larus. Refining and classifying data dependences. Unpublished extended abstract, Berkeley, CA, November 1988. 25. J.R. Larus and P.N. Hilfinger. Detecting conflicts between structure accesses. In
SIGPLAN Conference on Programming Languages Design and Implementation, pages 21-34, New York, NY, 1988. ACM Press. 26. K.J. Ottenstein and L.M. Ottenstein. The program dependence graph in a software development environment. In Proceedings of the A CM SIGSOFT//SIGPLAN Soft-
ware Engineering Symposium on Practical Software Development Environments, 27.
28.
29. 30. 31.
pages 177-184, New York, NY, 1984. ACM Press. M. Sagiv, T. Reps, and R. Wilhelm. Solving shape-analysis problems in languages with destructive updating. In ACM Symposium on Pmnciples of Programming Languages, New York, NY, January 1996. ACM Press. M. Sagiv, T. Reps, and R. Wilhelm. Solving shape-analysis problems in languages with destructive updating. A CM Transactions on Programming Languages and Systems, 1997. To Appear. M. Shapiro and S. Horwitz. Fast and accurate flow-insensitive points-to analysis. In ACM Symposium on Principles of Programming Languages, 1997. B. Steengaard. Points-to analysis in linear time. In A CM Symposium on Principles of Programming Languages. ACM, New York, January 1996. R.P. Willson and M.S. Lam. Efficient context-sensitive pointer analysis for c programs. In SIGPLAN Conference on Programming Languages Design and Implementation, pages 1-12, La Jolla, CA, June 18-21 1995. ACM Press.
A Complete Declarative Debugger of Missing Answers

Salvatore Ruggieri

Dipartimento di Informatica, Università di Pisa
Corso Italia 40, 56125 Pisa, Italy
e-mail: [email protected]
Abstract. We propose two declarative debuggers of missing answers with respect to C- and S-semantics. The debuggers are proved correct for every logic program. Moreover, they are complete and terminating with respect to a large class of programs, namely acceptable logic programs. The debuggers enhance existing proposals, which suffer from a problem due to the implementation of negation as failure. The proposed solution exploits decision procedures for C- and S-semantics introduced in [9].

Keywords. Logic programming, Declarative Debugging, Error Diagnosis, Missing Answers, Acceptable Logic Programs.
1 Introduction

Declarative debugging is concerned with finding errors that cause anomalies during the execution of a program, starting from some information on the intended semantics of the program. In logic programming systems, a query which is valid in the intended meaning of a program but that is not in its actual semantics is an anomaly called a missing answer. A missing answer originates from a "failure" in the construction of a proof tree for a valid query. The reason for such a failure is the presence of uncovered atoms, i.e. of atoms A in the intended interpretation of the program for which there is no clause instance whose head is A and whose body is true in the intended interpretation. In other words, there is no immediate justification in the program in order to deduce A. The role of a declarative debugger is to find out uncovered atoms starting from missing answers, which are usually detected during the testing phase, and from the intended semantics of the program. In this paper, we concentrate on the C-semantics of Falaschi et al. [5] (or least term model semantics of Clark [3]) and on the S-semantics of Falaschi et al. [6].

Many debuggers in the literature find uncovered atoms starting from missing answers that have a finitely failed SLD-tree. As we will point out, the assumption that missing answers have finitely failed SLD-trees is restrictive in some cases, and it is due to a well-known limitation of the negation as failure rule. We show that restriction in the case of Shapiro's debugger [10] [8, Debugger S.1]. In this paper, we propose two declarative debuggers of missing answers for C- and S-semantics that are correct for any program, and complete and terminating for a large class of logic programs, namely acceptable programs [2]. The implementations of the debuggers rely on decidability procedures for C- and S-semantics which are adapted from [9]. Compared with Shapiro's approach, the debugger for C-semantics relaxes the assumption that the missing answers in input have finitely failed SLD-trees. In addition, we show that a smaller search space is considered. The debugger for S-semantics is derived by applying the insights underlying the construction of that for C-semantics to the theory of S-semantics. The only approach on debugging of missing answers with respect to S-semantics is due to Comini et al. [4]. They introduce a method for finding all uncovered atoms starting from the intended interpretation of an acceptable program. However, their approach is effective iff the intended interpretation is a finite set, whilst we make a weaker assumption.
Preliminaries. We use in this paper the standard notation of Apt [1], when not specified otherwise. In particular, we use queries instead of goals. We denote by L the underlying language a program is defined on. Atom_L denotes the set of atoms on L, B_L the Herbrand base on L. Usually, one considers L = L_P. ground_L(P) denotes the set of ground instances of clauses from P. LD-resolution is SLD-resolution together with the leftmost selection rule. An atom is called pure if it is of the form p(x1, ..., xn) where x1, ..., xn are different variables. N is the set of natural numbers. For a ground term t, ll(t) = ll(t1) + 1 if t = [t2|t1] and ll(t) = 0 otherwise, i.e. ll is the list-length function.
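For reference, ll can be written directly as a Prolog predicate (a small sketch of our own, in standard Prolog syntax; the programs below do not rely on it under this name):

% ll(+T, -N): N is the list-length of the ground term T, i.e. the
% number of list cells along its spine; any non-list term has length 0.
ll([_|T], N) :- !, ll(T, M), N is M + 1.
ll(_, 0).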
2 Program Semantics and Missing Answers
Several declarative semantics have been considered as alternatives to the standard least Herbrand model. We focus on two of them, namely the C-semantics of Falaschi et al. [5] (also known as the least term model of Clark [3]) and the S-semantics of Falaschi et al. [6].

Definition 1. For a logic program P we define

C(P) = { A ∈ Atom_L | P ⊨ A }
S(P) = { A ∈ Atom_L | A is a computed instance of a pure atom }.    □

By correctness of SLD-resolution, we observe that S(P) ⊆ C(P). To each semantics is associated a continuous immediate consequence operator. For a program P, the least fixpoint of T_P^C, C(P), and the upward ordinal closure T_P^C ↑ ω coincide [5]. Similarly, the least fixpoint of T_P^S, S(P), and the upward ordinal closure T_P^S ↑ ω coincide [6].
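In symbols, the two characterizations just recalled read:

C(P) = lfp(T_P^C) = T_P^C ↑ ω,        S(P) = lfp(T_P^S) = T_P^S ↑ ω,

where T_P^C and T_P^S are the operators of Definition 2 below.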
Definition 2. For a logic program P we define the following functions from sets of atoms into sets of atoms:

T_P^C(I) = { Aθ ∈ Atom_L | ∃ A ← B1, ..., Bn ∈ P, {B1θ, ..., Bnθ} ⊆ I }

T_P^S(I) = { Aθ ∈ Atom_L | ∃ A ← B1, ..., Bn ∈ P, B1', ..., Bn' variants of atoms in I and renamed apart, θ = mgu((B1, ..., Bn), (B1', ..., Bn')) }    □

An intended interpretation of a program w.r.t. a semantics is a set of atoms which, in the intentions of the programmer, is supposed to be the actual semantics of the program. Starting points of the debugging analysis are missing answers.

Definition 3. We say that a query is in a set of atoms if every atom of the query is in the set. Let ℱ be the C- or S-semantics, and I be the intended interpretation of a program P w.r.t. ℱ. A missing answer w.r.t. ℱ is any query which is in I but that is not in ℱ(P). □

Missing answers are caused by uncovered atoms, i.e. atoms valid in the intended meaning of a program that have no immediate justification in the program.

Definition 4. An atom A is uncovered if A ∈ I and A ∉ T_P^ℱ(I). □

For instance, if P consists of the single clause p ← q and I = {p, q}, then q is uncovered, since no clause has head q, whereas p is not, since the body of p ← q is in I.
3 Shapiro's Debugger
Consider the semantics of correct instances of logic programs, i.e. C-semantics. A query Q which is supposed to be a logical consequence of the program "fails" if it is not. However, by saying that a query Q "fails" it is often meant that Q finitely fails. This stronger assumption is due to a well-known limitation of the negation as failure rule, and it affects several declarative debuggers in the literature. Let us consider Shapiro's debugger [10] [8, Debugger S.1] as an example. Let P be the program under analysis.

miss([A | B], Goal) ←
    not(call(A)), miss(A, Goal).
miss([A | B], Goal) ←
    call(A), miss(B, Goal).
miss(A, Goal) ←
    user_pred(A), clause(A, B), valid(B), miss(B, Goal).
miss(A, A) ←
    user_pred(A), not((clause(A, B), valid(B))).

clause(A, [B1, ..., Bn]).    for every A ← B1, ..., Bn ∈ P

augmented by P.

Program 1

user_pred characterizes user-defined predicates, and it is a collection of facts user_pred(p(X1, ..., Xn)) for every predicate symbol p of arity n. valid is an oracle defining the intended meaning of P. It may be implemented either by queries to the programmer or by using some specification of the program. Consider now the program:

s.
p(X) ←
    q(Y), r(Y,X).
q(a).
% q(b).    missing
r(a, c).
r(b, X).

Program 2
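Queried directly, Program 2 behaves as follows (a hypothetical top-level session; prompt and answer syntax vary across Prolog systems, and not/1 is negation as failure):

?- p(X).
X = c
?- not(p(X)).
no

The only computed answer instantiates X to c; since p(X) has a successful derivation, not(p(X)) fails, even though p(X) is not a logical consequence of the program. This is exactly the phenomenon discussed next.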
Shapiro's debugger correctly works when the missing answer in input has a finitely failed tree. On the other hand, the query p(X), s is a missing answer, since p(X) is not a logical consequence of the program, but there is no finitely failed tree, since there is a successful derivation that instantiates X to c. A call miss([p(X), s], A) to Shapiro's debugger fails to return that q(b) is uncovered. The need for the hypothesis of finite failure lies in the use of negation in clauses such as:

miss([A | B], Goal) ←
    not(call(A)), miss(A, Goal).

where the debugger tries to prove not(call(A)). Due to well-known limitations of the negation as failure rule, not(call(A)) succeeds if P ⊨ ¬∃ A, i.e. if there is a finitely failed SLD-tree. Instead, the intended use of not(call(A)) is to prove P ⊭ ∀ A, i.e. that A is not a logical consequence of P. A form of completeness has been shown for Ferrand's debugger [7], [8, Debugger F.1], which is obtained from Program 1 by removing the literals not(call(A)) and call(A). As an example, the uncovered atom q(b) of Program 2 is detected by Ferrand's debugger. However, Ferrand's approach differs from ours in the fact that it considers impossible atoms instead of uncovered atoms. A is impossible if no instance of A is uncovered. Let us consider the following incorrect version EAppend of the Append program:

append([], Xs, Xs).
append([X|Xs], Ys, [X|Zs]) ←
    append(Ys, Ys, Zs).    % should be append(Xs, Ys, Zs)

Program 3
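The behavior of EAppend on the query examined next can be observed in the same way (again a hypothetical session, with the clauses of Program 3 loaded as ordinary :- clauses):

?- append([X], Ys, [X|Ys]).
Ys = []
?- not(append([X], Ys, [X|Ys])).
no

The single answer Ys = [] corresponds to the covered instance append([X], [], [X]); on a depth-first system, asking for further answers would diverge rather than fail finitely, so the query has no finitely failed SLD-tree.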
The query Q = append([X], Ys, [X|Ys]) is a missing answer, since it is in the intended interpretation, and it is not a logical consequence of EAppend. However, Shapiro's debugger is not able to find out that Q is an uncovered atom, since Q has no finitely failed tree. On the other hand, Ferrand's debugger does not show that Q is uncovered, due to the fact that there exists an instance of Q which is covered, namely append([X], [], [X]), and then Q is not an impossible atom. Finally, it is worth noting that both debuggers ask the oracle for a valid instance of append(Ys, Ys, Ys). Consequently, the call miss(append([], [], []), Goal) is made, where append([], [], []) is not a missing answer. In general, unnecessary questions are addressed to the oracle, in the sense that the search space includes queries that are not missing answers, and then cannot lead to an uncovered atom.
4 Acceptable programs

In this section, we introduce acceptable logic programs, a well-known large class for which we provide a decision procedure for C- and S-semantics.

4.1 The Framework
First, we recall the definition of level mappings [2].

Definition 5. A level mapping is a function | | : B_L → N of ground atoms to natural numbers. For A ∈ B_L, |A| is called the level of A. □

We are now in the position to recall the definition of acceptable logic programs, introduced by Apt and Pedreschi [2]. Intuitively, the definition of acceptability requires that for every clause, the level of the head of any of its ground instances is greater than the level of each atom in the body which might be selected further in a LD-derivation.

Definition 6. A program P is acceptable by | | : B_L → N and a Herbrand interpretation I iff I is a model of P, and for every A ← B1, ..., Bn in ground_L(P):

for i ∈ [1, n],    I ⊨ B1, ..., B_{i-1}    implies    |A| > |Bi|.

P is acceptable if it is acceptable by some | | and I. □
Consider the following program PREORDER for preorder traversals of binary trees.

(p1) preorder(nil, []).
(p2) preorder(leaf(X), [X]).
(p3) preorder(tree(X, Left, Right), Ls) ←
         preorder(Left, As),
         preorder(Right, Bs),
         append([X|As], Bs, Ls).

     append([], Xs, Xs).
     append([X|Xs], Ys, [X|Zs]) ←
         append(Xs, Ys, Zs).

Program 4

PREORDER is acceptable by | | and I, where

|preorder(t, ls)| = nodes(t) + 1
|append(xs, ys, zs)| = ll(xs)

I = { append(xs, ys, zs) | ll(zs) = ll(xs) + ll(ys) } ∪ { preorder(t, ls) | ll(ls) = nodes(t) }

where nodes(t) is 1 if t = leaf(s); it is 1 + nodes(l) + nodes(r) if t = tree(s, l, r); and it is 0 otherwise. Suppose now that the variable X in clause (p2) has been erroneously typed in lower case, and let PREORDER' be PREORDER where (p2) is replaced by:
(p2') preorder(leaf(x), [x]).

PREORDER' is still acceptable by the same | | and I. Note that preorder(tree(E, leaf(X), leaf(Y)), [E, X, Y]) is a missing answer w.r.t. C- and S-semantics. However, the query has no finitely failed SLD-tree. In particular, Shapiro's debugger is not able to find an uncovered atom starting from it. Other examples of acceptable programs are Program 2 and Program 3.

4.2 Decision Procedures
We sum up the decidability properties of acceptable programs we are interested in by means of the following result, reported from [9].

Theorem 7. Let P be an acceptable program. Every LD-derivation for P and any ground query (in any language) is finite. Moreover, C(P) and S(P) are decidable sets. □

The decision procedures for C- and S-semantics will be crucial in the debugging approach of this paper. Interestingly, they have a natural implementation in the logic programming paradigm itself as Prolog meta-programs. The procedures are provided in [9] for observable decidability. Here we specialize them for semantics decidability. We assume that the predicate freeze introduced in [12, Section 10.3] is available. A call freeze(A, B) replaces every variable of A with new distinct constants to obtain B. Unfortunately, as described by Sterling and Shapiro, freeze is not present in existing Prolog implementations. In fact, in [9] freeze is approximated by a predicate constants that replaces every variable in Q by new distinct constants. The approximation works until a fixed maximum number of new
constants is reached, and the results are parametric to that number. Here, we abstract away from such a parameterization and assume that freeze and its dual melt are available. Given a term B, melt(B, A) replaces every constant of B introduced by freezing some term by the original variable, to obtain A. For instance, freeze(p(X), Y) succeeds by instantiating Y to p(ax), where ax is a fresh constant representing the frozen variable X. The following is the decision procedure for S-semantics.

in_s(A) ←
    freeze(A, A1), pure(A, B),
    demo([A1], [B]), variants(A, B).

demo([], []).
demo([A|As], [B|Bs]) ←
    clause(A, Ls, Id), demo(Ls, L1s),
    demo(As, Bs), clause(B, L1s, Id).

pure(p(X1, ..., Xn), p(Y1, ..., Yn)).    for every predicate symbol p of arity n

clause(A, [B1, ..., Bn], k).    for every C_k = A ← B1, ..., Bn ∈ P

augmented by the definition of variants [12, Program 11.7].

Program 5
variants(A, A1) is an extra-logical predicate that succeeds iff A and A1 are variants. pure(A, B) computes a pure atom for the predicate symbol of a given atom. clause(A, [B], k) models the clause C_k = A ← B of P. A distinct identifier k is assigned to every clause of P. Finally, as a corollary of the results reported in [9], given a program P, we have that for an atom A, in_s(A) succeeds iff A ∈ S(P). Moreover, if P is acceptable then every LD-derivation of in_s(A) is finite. Next we present the procedure for C-semantics. Given a program P and an atom A, in_c(A) succeeds iff A ∈ C(P). Moreover, if P is acceptable then every LD-derivation of in_c(A) is finite.

in_c(A) ←
    freeze(A, A1), call(A1).

augmented by P.

Program 6
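The paper treats freeze and melt as given. For experimentation, a minimal sketch of the two predicates can be written as follows, in standard Prolog syntax and assuming SWI-Prolog built-ins (copy_term/2, term_variables/2, gensym/2, sub_atom/5, foldl/4); the names freeze_term and melt_term and the reserved prefix '$frz' are our own, and the bounded number of fresh constants used in [9] is ignored here:

:- use_module(library(gensym)).
:- use_module(library(apply)).

% freeze_term(+A, -B): B is a copy of A in which every variable has
% been bound to a fresh reserved constant such as '$frz1'.
freeze_term(A, B) :-
    copy_term(A, B),
    term_variables(B, Vs),
    maplist(gensym('$frz'), Vs).

% melt_term(+B, -A): A is B with every reserved constant replaced by
% a variable, the same constant always mapping to the same variable.
melt_term(B, A) :-
    melt(B, A, [], _).

melt(C, V, M0, M) :-                        % a frozen constant melts
    atom(C), sub_atom(C, 0, _, _, '$frz'), !,
    ( memberchk(C-V, M0) -> M = M0 ; M = [C-V|M0] ).
melt(T, T, M, M) :-                         % variables and other atomics
    ( var(T) ; atomic(T) ), !.
melt(T, T1, M0, M) :-                       % compound terms: melt arguments
    T =.. [F|Args],
    foldl(melt, Args, Args1, M0, M),
    T1 =.. [F|Args1].

With such definitions and Program 2 loaded, in_c behaves as a decision procedure for C-semantics should: in_c(p(c)) succeeds, via q(a) and r(a, c), whereas in_c(p(X)) fails, because freezing X prevents the instantiation X = c.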
5 Declarative Debuggers

5.1 C-semantics
We revise Shapiro's debugger, by integrating the decision procedure in_c within it.

(o)   missing_answers_c(Q, Goal) ←
          miss(Q, Goal1), melt(Goal1, Goal).
(i)   miss([A | B], Goal) ←
          not(in_c(A)), miss(A, Goal).
(ii)  miss([A | B], Goal) ←
          in_c(A), miss(B, Goal).
(iii) miss(A, Goal) ←
          user_pred(A), freeze(A, A1),
          clause(A1, B), valid_c(B),
          miss(B, Goal).
(iv)  miss(A, A1) ←
          user_pred(A), freeze(A, A1),
          not((clause(A1, B), valid_c(B))).

clause(A, [B1, ..., Bn]).    for every A ← B1, ..., Bn ∈ P

augmented by Program 6.
Program 7

valid_c is an oracle describing the queries in the intended interpretation I. Formally, called V the definition of valid_c, an atom valid_c([B]) is in C(V) iff B is in I. Consider now Program 2. We have the following definitions of user_pred and valid_c:

user_pred(s).
user_pred(p(X)).
user_pred(q(X)).
user_pred(r(X, Y)).

valid_c(s).
valid_c(p(X)).
valid_c(q(a)).
valid_c(q(b)).
valid_c(r(a, c)).
valid_c(r(b, X)).

valid_c([A|B]) ←
    valid_c(A), valid_c(B).
valid_c([]).
A call missing_answers_c([p(X), s], A) has a finite LD-tree, and computes the uncovered atom A = q(b). The computation progresses as follows. First p(X) is found not to be a logical consequence of the program. Then the clause p(X) ← q(Y), r(Y,X) is instantiated by Y = b in order to find a valid body. Note that the instance Y = a, X = c cannot be considered, as X is frozen. Finally, q(b) is found to be an uncovered atom. Consider PREORDER' and the missing answer

Q = preorder(tree(E, leaf(X), leaf(Y)), [E, X, Y]).

A call missing_answers_c([Q], A) computes A = preorder(leaf(X), [X]), which is indeed the correctly typed clause (p2).
Consider now Program 3, i.e. EAppend, and the missing answer Q = append([X], Ys, [X|Ys]). The proposed debugger shows that Q is an uncovered atom. The only query to the oracle during the computation is valid_c(append(Ys, Ys, Ys)) with Ys frozen. In other words, the oracle is asked whether append(Ys, Ys, Ys) is valid in the intended meaning of EAppend, which is obviously false. The debugger is correct for every logic program.

Theorem 8 (C-Correctness). Let P be a program, and Q a missing answer w.r.t. C-semantics. If missing_answers_c([Q], Goal) has a LD-computed instance missing_answers_c([Q], Goal) then Goal is an uncovered atom.

Proof. First of all, we consider a language L' obtained by adding to L sufficiently many new constants, which are employed by the predicate freeze. Let us show that we are in the hypotheses of the Theorem considering L' instead of L, and I' = { Aθ ∈ Atom_L' | A ∈ Atom_L ∧ A ∈ I } instead of I. Since Q is a missing answer w.r.t. I, then Q is in I' and P ⊭ Q, i.e. Q is a missing answer w.r.t. I'. Moreover, the definition V of valid_c is an oracle w.r.t. I'. In fact, consider any query Q in L'. Q can be written as Q'θ, where Q' is in L and θ replaces some variables of Q' with distinct constants not in L. Then, we have that Q is in I' iff Q' is in I, and then, since V is an oracle, iff V ⊨ valid_c([Q']). By the Theorem on Constants (see e.g. [11]), V ⊨ valid_c([Q']) iff V ⊨ valid_c([Q'])θ, when θ is of the considered form. This implies that Q is in I' iff V ⊨ valid_c([Q]), i.e. that V is an oracle w.r.t. I'. We now show that any computed instance of miss([Q], Goal) returns an uncovered atom on L'. The proof proceeds by induction on the number n of calls to miss in a refutation.

(n = 1). Goal can be only instantiated by applying rule (iv). Then there is no clause instance A1 ← B whose body is in I', i.e. A1 ∉ T_P^C(I'), and A1 is obtained by freezing A, hence A1 is in I'. Then Goal is instantiated by an uncovered atom.
(n > 1). We show that the hypothesis of the theorem holds for calls to miss in clauses (i, ii, iii).
(i) Since not(in_c(A)) succeeds, A is not in C(P) albeit by hypothesis it is in I'. Therefore, A is a missing answer.
(ii) Since in_c(A) succeeds, A is in C(P). Therefore, B must be a missing answer.
(iii) A1 ← B is a clause instance such that A1 is obtained by freezing A and B is in I'. By the Theorem on Constants, A is not in C(P) implies A1 not in C(P). As a consequence B is not in C(P). Otherwise, by Definition 2, A1 would be in T_P^C(C(P)) = C(P). Therefore, the call miss(B, Goal) satisfies the inductive hypothesis, i.e. B is a missing answer.

In conclusion, the call miss(Q, Goal1) in (o) instantiates Goal1 with an uncovered atom Goal1 on L'. By melting the frozen variables of Goal1, we obtain an atom Goal on L such that Goal is in I (since Goal1 is in I') but not in T_P^C(I) (otherwise Goal1 would be in T_P^C(I')), i.e. Goal is uncovered. □

Restricting the attention to acceptable programs, we are in the position to show completeness of the debugger. We make the further hypothesis that there are finitely many oracle's answers for Q, i.e. that there are finitely many LD-derivations for every call to valid_c during a LD-derivation for missing_answers_c([Q], Goal).

Theorem 9 (C-Completeness). Let P be an acceptable program, and Q a missing answer w.r.t. C-semantics such that there are finitely many oracle's answers for Q. Then there exists a LD-computed instance of missing_answers_c([Q], Goal).

Proof. Reasoning as in the proof of Theorem 8, we can assume a language with infinitely many constants. We observe that every prefix ξ of a LD-derivation for Q is finite if the variables of Q are never instantiated along ξ. In fact, let θ be a substitution of the variables of Q with new distinct constants. If there is an infinite prefix ξ of a LD-derivation for Q such that the variables of Q are never instantiated along ξ, then ξθ would be an infinite prefix of a LD-derivation for Qθ. This is impossible by Theorem 7, since Qθ is ground. We denote by dQ the maximum length of a prefix of a LD-derivation for Q that does not instantiate any variable of Q. The proof proceeds by induction on dQ.

(dQ = 1). Let A be the leftmost atom in Q. We claim that A ∉ C(P). Otherwise, by strong completeness of SLD-resolution, there exists a LD-refutation for A that does not instantiate any variable of A. As a consequence, dQ > 1. Therefore, A is a missing answer. By applying clause (i) the query miss(A, Goal) is resolved. We now distinguish two cases: either A is or is not a variant of the head of a clause instance A1 ← B such that B is in the intended interpretation I. In the latter case, by resolving miss(A, Goal) with clause (iv) we get a refutation, since there are finitely many oracle answers.
In the former case, A unifies with a clause head without instantiating its variables, and then dA > 1 and dQ > 1. In conclusion, the former case is impossible.

(dQ > 1). Let A be the leftmost atom in Q = D, A, E such that A ∉ C(P). By repeatedly applying clauses (i, ii) the query miss(A, Goal) is eventually resolved, since for acceptable programs the calls to in_c terminate. Moreover, since D is in C(P), then by strong completeness of SLD-resolution, there exists a LD-refutation for D that does not instantiate any variable of D, hence dQ > dA. Again, we distinguish two cases: either A is or is not a variant of the head of a clause instance of P such that the body is in the intended interpretation I. In the latter case, by resolving miss(A, Goal) with clause (iv) we get a refutation, since there are finitely many oracle answers. In the former case, clause (iii) is applicable and a query miss(B, Goal) is eventually resolved, where A1 ← B is an instance of a clause c from P such that A1 is obtained by freezing A (i.e., A1 = Aμ for μ substituting variables with fresh constants) and B is in I. As shown in the proof of Theorem 8, B cannot be in C(P), otherwise A would be. Therefore, B is a missing answer. It is readily checked that dA = dA1. We claim that dA1 > dB. In fact, let ξ be the prefix of a LD-derivation for P and B that does not instantiate any variable of B. We observe that the LD-resolvent of A1 and c is more general than B. Therefore, there exists a prefix ξ' of a LD-derivation for A1 longer than ξ. Summarizing, dA1 > dB. As a consequence, B is a missing answer and dQ > dA = dA1 > dB. Therefore we can apply the inductive hypothesis on B to obtain the conclusion of the Theorem. □

Finally, we have termination of the debugger.

Theorem 10 (C-Termination). Let P be an acceptable program, and Q a query such that there are finitely many oracle's answers for Q. Then the LD-tree of missing_answers_c([Q], Goal) is finite.

Proof. Suppose there is an infinite LD-derivation. Since calls to in_c and valid_c terminate, it necessarily happens that clause (iii) is called infinitely many times: miss(A1, Goal), ..., miss(An, Goal), .... By reasoning as in the proof of Theorem 9, we have that dA1, ..., dAn, ... is an infinite decreasing chain of naturals. This is impossible since naturals are well-founded. □

5.2 S-semantics
We observe that clauses (iii, iv) of Program 7 follow directly from the definition of uncovered atoms (Definition 4) and the definition of T_P^C (Definition 2). We derive the debugger for S-semantics similarly, but considering now T_P^S.

(o)   missing_answers_s(Q, Goal) ←
          miss(Q, Goal).
(i)   miss([A | B], Goal) ←
          not(in_s(A)), miss(A, Goal).
(ii)  miss([A | B], Goal) ←
          in_s(A), miss(B, Goal).
(iii) miss(A, Goal) ←
          user_pred(A), pure(A, A1),
          clause(A1, B), valid_s(B, C),
          variants(A, A1), miss(C, Goal).
(iv)  miss(A, A) ←
          user_pred(A), pure(A, A1),
          not((clause(A1, B), valid_s(B, _), variants(A, A1))).

clause(A, [B1, ..., Bn]).    for every A ← B1, ..., Bn ∈ P

augmented by Program 5.
Program 8

valid_s is an oracle describing the queries in the intended interpretation I. Formally, called V the definition of valid_s, an atom valid_s([B], [C]) is in S(V) iff B and C are queries whose atoms are in I and variable disjoint, and C is a variant of B. By [6, Theorems 7.1 and 7.7], a call valid_s(B, C) has a computed instance valid_s(B, C)θ iff for some renamed apart atom B' in S(V), μ = mgu(valid_s(B, C), B') and valid_s(B, C)θ = valid_s(B, C)μ. Therefore, for an atom A the query

pure(A, A1), clause(A1, B), valid_s(B, C), variants(A, A1)

has a LD-refutation iff there exists a renamed apart clause A1 ← B such that θ = mgu(B, C) for some C in I and A1θ is a variant of A, i.e. iff A ∈ T_P^S(I). Consider now, as an example, the following variant of Program 2.

s.
p(X) ←
    q(Y), r(Y,X).
% q(a).    missing
q(b).
r(a, c).
r(b, X).

We have the following definition of valid_s:

valid_s(s, s).
valid_s(p(X), p(Y)).
valid_s(p(c), p(c)).
valid_s(q(a), q(a)).
valid_s(q(b), q(b)).
valid_s(r(a, c), r(a, c)).
valid_s(r(b, X), r(b, Y)).

valid_s([A|As], [B|Bs]) ←
    valid_s(A, B), valid_s(As, Bs).
valid_s([], []).

A call missing_answers_s([p(c), s], A) has a finite LD-tree, and returns the uncovered atom q(a). The debugger is correct for every logic program.

Theorem 11 (S-Correctness). Let P be a program, and Q a missing answer w.r.t. S-semantics. If missing_answers_s([Q], Goal) has a LD-computed instance missing_answers_s([Q], Goal) then Goal is an uncovered atom.
Proof. The proof is by induction on the number n of calls to miss in a refutation.

(n = 1). Goal can be only instantiated by applying rule (iv), i.e. if Q is an atom and there is no clause A1 ← B whose body unifies with a query in the intended interpretation I with mgu θ and A1θ is a variant of Q, namely if Q ∉ T_P^S(I).

(n > 1). We show that the hypothesis of the theorem holds for calls to miss in clauses (i, ii, iii).
(i) Since not(in_s(A)) succeeds, A is not in S(P) albeit by hypothesis it is in I. Therefore, A is a missing answer.
(ii) Since in_s(A) succeeds, A is in S(P). Therefore, B must be a missing answer.
(iii) Assume that

pure(A, A1), clause(A1, B), valid_s(B, C), variants(A, A1)

succeeds. Then there exists a renamed apart clause A1 ← B such that θ = mgu(B, C) for some C in I and A1θ is a variant of A. Since A is not in S(P), then C is not in S(P). Otherwise, by Definition 2, A would be in T_P^S(S(P)) = S(P). Summarizing, by definition of valid_s, C is in I, and we showed that C is not in S(P). Therefore, the call miss(C, Goal) satisfies the inductive hypothesis, i.e. C is a missing answer. □

Restricting the attention to acceptable programs, we are in the position to show completeness of the debugger. Also, we assume that there are finitely many oracle's answers. Formally, we say that there are finitely many oracle's answers for Q iff there are finitely many LD-derivations for every call to valid_s during a LD-derivation for missing_answers_s([Q], Goal).

Theorem 12 (S-Completeness). Let P be an acceptable program, and Q a missing answer w.r.t. S-semantics such that there are finitely many oracle's answers. Then there exists a LD-computed instance of missing_answers_s([Q], Goal).
Proof. As shown in the proof of Theorem 9, every prefix ξ of a LD-derivation for Q is finite if the variables of Q are never instantiated along ξ. We denote by dQ the maximum length of a prefix of a LD-derivation for Q that does not instantiate any variable of Q. The proof proceeds by induction on dQ.

(dQ = 1). Let A be the leftmost atom in Q. We claim that A ∉ S(P). Otherwise A ∈ C(P). Then by strong completeness of SLD-resolution, there exists a LD-refutation for A that does not instantiate any variable of A. As a consequence, dQ > 1. Therefore, A is a missing answer. By applying clause (i) the query miss(A, Goal) is resolved. We now distinguish two cases: either A is or is not a variant of A1μ, where A1 ← B is a clause of P and μ = mgu(B, C) for some C in I. In the latter case, we observe that by resolving miss(A, Goal) with clause (iv) we get a refutation, since there are finitely many oracle answers. In the former case, A unifies with a clause head without instantiating its variables, and then dA > 1 and dQ > 1. In conclusion, the former case is impossible.

(dQ > 1). Let A be the leftmost atom in Q = D, A, E such that A ∉ S(P). By repeatedly applying clauses (i, ii) the query miss(A, Goal) is eventually resolved, since for acceptable programs the calls to in_s terminate. Moreover, since S(P) ⊆ C(P), then by strong completeness of SLD-resolution, there exists a LD-refutation for D that does not instantiate any variable of D, hence dQ ≥ dA. Again, we distinguish two cases: either A is or is not a variant of A1μ, where c : A1 ← B is a clause of P and μ = mgu(B, C) for some C in I. In the latter case, we observe that by resolving miss(A, Goal) with clause (iv) we get a refutation, since there are finitely many oracle answers. In the former case, clause (iii) is applicable and miss(C, Goal) is eventually resolved. In fact, by definition of valid_s, the query pure(A, A1), clause(A1, B), valid_s(B, C), variants(A, A1) succeeds under the stated hypothesis. As shown in the proof of Theorem 11, C cannot be in S(P), otherwise A would be in S(P), and then C is a missing answer. We claim that dA > dC. Let ξ be the prefix of a LD-derivation for P and C that does not instantiate any variable of C. Since A and A1μ are variants, there exists a renaming substitution σ such that A = A1μσ. Moreover, A ← Cμσ is an instance of c. Let θ be a substitution mapping all variables of A ← Cμσ into distinct fresh constants. Since no variable of C is instantiated in ξ, there exists a prefix ξ' of a LD-derivation for Cμσθ of the same length of ξ. We observe that the LD-resolvent of Aθ and c is more general than Cμσθ, since Aθ ← Cμσθ is an instance of c. Therefore, there exists a prefix ξ'' of a LD-derivation for Aθ longer than ξ'. By substituting in ξ'' every fresh constant introduced by θ with the variable it replaced, we get a prefix of a LD-derivation for P and A that does not instantiate any variable of A and whose length is greater than that of ξ. Summarizing, C is a missing answer and dQ ≥ dA > dC. Therefore we can apply the inductive hypothesis on C to obtain the conclusion of the Theorem. □

Finally, we have termination of the debugger.
Theorem 13 (S-Termination). Let P be an acceptable program, and Q a query such that there are finitely many oracle's answers for Q. Then the LD-tree of missing_answers_s([Q], Goal) is finite. □

Observe that the hypothesis that there are finitely many oracle's answers is rather restrictive in the case of S-semantics. Looking at Program 8, we quickly realize that clauses (iii, iv) call valid_s(B, C) with B instantiated by the body of some program clause. In the case of a simple program such as Append, the resulting query valid_s([append(Xs, Ys, Zs)], C) has infinitely many LD-derivations. In general, we have that "finitely many oracle's answers" actually requires that I is a finite set. However, by noting that variants(A, A1) must succeed, a weaker assumption can be made by defining an oracle valid_s(B, C, A, A1) that stops a LD-derivation if A1 becomes more instantiated than A (or some sufficient condition that implies that, e.g. by checking that the size of A1 remains lower or equal than the size of A). By such an enhanced oracle, we have that the assumption of "finitely many oracle's answers" is less restrictive, and allows for reasoning on intended interpretations that are infinite sets. As an example, starting from the missing answer append([X], Ys, [X|Ys]) for Program 3, the debugger finds out that it is an uncovered atom, when valid_s is as described above.
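A possible reading of such an enhanced oracle, as a sketch of our own (term_size/2 is a hypothetical helper counting symbol occurrences, one of the "sufficient conditions" alluded to above; the guard only cuts an LD-derivation at the boundaries between body atoms):

% valid_s(B, C, A, A1): validate the body atoms B against the intended
% interpretation as valid_s/2 does, but fail as soon as the frozen head
% A1 has grown larger than the atom A under diagnosis, since then
% variants(A, A1) could no longer succeed.
valid_s([], [], _, _).
valid_s([B|Bs], [C|Cs], A, A1) :-
    valid_s(B, C),
    term_size(A1, N1),
    term_size(A, N),
    N1 =< N,
    valid_s(Bs, Cs, A, A1).

% term_size(+T, -N): number of symbol occurrences in T; a variable counts 1.
term_size(T, 1) :-
    ( var(T) ; atomic(T) ), !.
term_size(T, N) :-
    T =.. [_|Args],
    foldl(add_size, Args, 1, N).

add_size(T, Acc0, Acc) :-
    term_size(T, M),
    Acc is Acc0 + M.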
6 Discussion
A Completeness Result. The following result is an immediate consequence of Theorems 9 and 12.

Theorem 14. Let P be an acceptable program. If there is a missing answer Q w.r.t. C-semantics (resp., S-semantics) and there are finitely many oracle's answers for Q then there exists an uncovered atom. □

A non-constructive proof has been established by Comini et al. [4] also in the case that there are not finitely many oracle's answers. Indeed, the proofs of Theorems 9 and 12 (non-constructively) show that result if we remove the hypothesis that there are finitely many oracle's answers.

Bounded Programs. A declarative characterization of a class larger than acceptable programs is considered in [9], namely the class of bounded logic programs, and a decision procedure is provided with respect to C- and S-semantics. Unfortunately, the overall approach presented in this paper cannot be extended to bounded programs. The main problem is that Theorem 14 does not hold for bounded programs, in general. As an example, p is a missing answer for the following (bounded) program, but there is no uncovered atom.

p ← p.
% p.    missing

Future work is aimed at extending the ideas and the results presented here to larger classes of logic programs, by investigating subclasses of bounded programs.
Conclusions. We presented two declarative debuggers of missing answers with respect to C- and S-semantics. They are correct for any program, and complete and terminating for a large class of logic programs. The implementations of the debuggers rely on decidability procedures for C- and S-semantics which are adapted from [9]. The results presented in this paper improve on Shapiro's and Ferrand's proposals for completeness and termination. Moreover, efficiency is improved in the following sense. As shown in Theorem 8, only calls miss([Q], Goal) where Q is a missing answer are made. On the contrary, from the example Program 3, we realize that the search spaces of Shapiro's and Ferrand's debuggers include queries that are not missing answers, and then cannot lead to uncovered atoms. Also, we introduced a debugger for S-semantics. The only approach to debugging of missing answers w.r.t. S-semantics is due to Comini et al. [4]. They present a method for finding all uncovered atoms starting from the intended interpretation I of an acceptable program. However, the approach of Comini et al. is effective iff I is a finite set, whilst we require a weaker condition, namely that there are finitely many oracle's answers.
References

1. K.R. Apt. Logic programming. In J. van Leeuwen, editor, Handbook of Theoretical Computer Science, volume B, pages 493-574. Elsevier, 1990.
2. K.R. Apt and D. Pedreschi. Reasoning about termination of pure Prolog programs. Information and Computation, 106(1):109-157, 1993.
3. K.L. Clark. Predicate logic as a computational formalism. Technical Report DOC 79/59, Imperial College, Dept. of Computing, 1979.
4. M. Comini, G. Levi, and G. Vitiello. Declarative Diagnosis Revisited. In J. W. Lloyd, editor, Proceedings of the 1995 International Logic Programming Symposium, pages 275-287. The MIT Press, 1995.
5. M. Falaschi, G. Levi, M. Martelli, and C. Palamidessi. A Model-Theoretic Reconstruction of the Operational Semantics of Logic Programs. Information and Computation, 103(1):86-113, 1993.
6. M. Falaschi, G. Levi, C. Palamidessi, and M. Martelli. Declarative Modeling of the Operational Behaviour of Logic Languages. Theoretical Computer Science, 69(3):289-318, December 1989.
7. G. Ferrand. Error Diagnosis in Logic Programming, an Adaptation of E. Y. Shapiro's Method. Journal of Logic Programming, 4(3):177-198, 1987.
8. L. Naish. Declarative Diagnosis of Missing Answers. New Generation Computing, 10(3):255-285, 1991.
9. S. Ruggieri. Decidability of Logic Program Semantics and Applications to Testing. In Proc. of PLILP'96, volume 1140 of Lecture Notes in Computer Science, pages 347-362. Springer-Verlag, Berlin, 1996.
10. E. Shapiro. Algorithmic Program Debugging. The MIT Press, 1983.
11. J. Shoenfield. Mathematical Logic. Addison-Wesley, Reading, 1967.
12. L. Sterling and E. Shapiro. The Art of Prolog. The MIT Press, second edition, 1994.
Systematic Change of Data Representation: Program Manipulations and a Case Study

William L. Scherlis *

School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213
Abstract. We present a set of semantics-based program manipulation techniques to assist in restructuring software encapsulation boundaries and making systematic changes to data representations. These techniques adapt abstraction structure and data representations without altering program functionality. The techniques are intended to be embodied in source-level analysis and manipulation tools used interactively by programmers, rather than in fully automatic tools and compilers. The approach involves combining techniques for adapting and specializing encapsulated data types (classes) and for eliminating redundant operations that are distributed among multiple methods in a class (functions in a data type) with techniques for cloning classes to facilitate specialization and for moving computation across class boundaries. The combined set of techniques is intended to facilitate revision of structural design decisions such as the design of a class hierarchy or an internal component interface. The paper introduces new techniques, provides soundness proofs, and gives details of a case study involving production Java code.
1 Introduction

Semantics-based program manipulation techniques can be used to support the evolution of configurations of program components and associated internal interfaces. Revision of these abstraction boundaries is a principal challenge in software reengineering. While structural decisions usually must be made early in the software development process, the consequences of these decisions are not fully appreciated until later, when it is costly and risky to revise them. Program manipulation techniques can potentially support the evolution of internal interface and encapsulation structure, offering greater flexibility in the management of encapsulation structure.

This paper focuses on data representation optimizations that are enabled by structural manipulations. It presents three new results: First, it introduces two new program manipulation techniques, idempotency and projection. Second, it provides proofs for the two techniques and for a related third technique, shift, that was previously described without proof [S86,S94]. Finally, a scenario of software evolution involving production Java code is presented to illustrate how the techniques combine with related structural manipulations to accomplish representation change. The case study is an account of how the classes java.lang.String, for immutable strings, and java.lang.StringBuffer, for mutable strings, could have been derived through a series of structural manipulations starting with a single class implementing C-style strings.

The intent of this research is to make the evolution of data representations a more systematic and, from a software engineering point of view, low-risk activity that can be supported by interactive source-level program analysis and manipulation tools. Source-level program manipulation presents challenges unlike those for automatic systems such as compilers and partial evaluators. The sections below introduce the program manipulation techniques, sketch their proofs, and give an account of the evolutionary case study.

* This material is based upon work supported by the National Science Foundation under Grant No. CCR-9504339 and by the Defense Advanced Research Projects Agency and Rome Laboratory, Air Force Materiel Command, USAF, under agreement number F30602-97-2-0241. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the Defense Advanced Research Projects Agency, Rome Laboratory, the National Science Foundation, or the U.S. Government. Author email: [email protected].

2 Program Manipulation
Most modern programming languages support information-hiding encapsulations such as Java classes, Standard ML structures and functors, Ada packages, and Modula-3 modules. In this paper, we use a simplified form of encapsulation based on Java classes but with no inheritance (all classes are final), full static typing, and only private non-static fields (i.e., the data components, or instance variables, of abstract objects). The internal representation of an abstract object thus corresponds to a record containing the instance variables. In Standard ML terminology, these classes correspond to abstypes or simplified structures. We use the term abstraction boundary to refer informally to the interface between the class and its clients (the signature in Standard ML), including any associated abstract specifications. When the scope of access by clients to a class is bounded and its contents can be fully analyzed, then adjustment and specialization of the class interface (and its clients) becomes possible. For larger systems and libraries, this scope is effectively unbounded, however, and other techniques, such as cloning of classes (described below), may need to be used to artificially limit access scopes.

Language. We describe the manipulation techniques in a language-independent manner, though we adapt them for use on Java classes in the case study. We can think of an object being "unwrapped" as it passes into the internal scope of its controlling class, and "wrapped" as it passes out of the scope. It is convenient in this presentation to follow the example of the early Edinburgh ML and make this transit of the abstraction boundary explicit through functions Abs and Rep, which wrap and unwrap respectively. That is, Rep translates an encapsulated abstract object into an object of its data representation type, which in Java is a record containing the object's instance variables. For example, if an abstract Pixel consists of a 2-D coordinate pair together with some abstract intensity information, then Rep : Pixel → Int * Int * Intensity and Abs : Int * Int * Intensity → Pixel. Here "*" denotes product of types. The two key restrictions are that Rep and Abs (which are implicitly indexed on type names) can be called only within the internal scope of the controlling class, and that they are the only functions that can directly construct and deconstruct abstract values.

Manipulations. Manipulation of class boundaries and exploitation of specialized contexts of use are common operations at all stages of program development. We employ two general collections of meaning-preserving techniques to support this. The first collection, called class manipulations, is the focus of this paper. Class manipulations make systematic changes to data representations in classes. They can alter performance and other attributes but, with respect to overall class functionality (barring covert channels), the changes are invisible to clients. That is, class manipulations do not alter signatures (and external invariants), but they do change computation and representation. The second class of manipulations are the boundary manipulations, which alter signatures, but do not change computation and representation. Boundary manipulations move computation into or out of classes, and they merge, clone, and split classes. The simplest boundary manipulations are operation migration manipulations, which move methods into and out of classes. Preconditions for these manipulation rules are mostly related to names, scopes, and types. A more interesting set of boundary manipulations are the cloning manipulations, which separate a class into distinct copies, without a need to fully partition the data space. Cloning manipulations introduce a type distinction and require analyses of types and use of object identity to assure that the client space can be partitioned, potentially with the introduction of explicit "conversion" points. Boundary (and class) manipulations are used to juxtapose pertinent program elements, so they can be further manipulated as an aggregate, either mechanically or manually. Boundary manipulations are briefly introduced in [S94] and are not detailed here. It is worth noting, however, that many classical source-level transformation techniques [BD77] can be understood as techniques to juxtapose related program elements so simplifications can be made. Two familiar trivial examples illustrate: The derivation of a linear-space list reversal function from the quadratic form relies on transformation steps whose sole purpose is to juxtapose the two calls to list append, enabling them to be associated to the right, rather than the left association that is tacit in the recursion structure of the initial program. The derivation of the linear Fibonacci from exponential relies on juxtaposing two separate calls to F(x - 2) from separate recursive invocations. Various strategies for achieving this juxtaposition have been developed such as tupling, composition, and deforestation [P84,S80,W88].
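For the record, the tupling strategy mentioned above can be summarized as follows (a standard rendering of our own, not taken from [P84,S80,W88]). With F(0) = F(1) = 1 and F(x) = F(x-1) + F(x-2), define

G(x) = (F(x), F(x-1)),    G(1) = (1, 1),    G(x) = let (a, b) = G(x-1) in (a+b, a)

Then F(x) is the first component of G(x), and the two recursive calls of F have been juxtaposed into the single call G(x-1), yielding a linear-time definition.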
Class manipulations. The class manipulations adjust representations and their associated invariants while preserving the overall meaning of a class. Details of these techniques and proofs are in the next two sections.

The shift transformation, in object-oriented terms, systematically moves computation among the method definitions within a class definition while preserving the abstract semantics presented to clients of a class. Shift operates by transferring a common computation fragment uniformly from all wrapping sites to all unwrapping sites (or vice versa). In the case study later in this paper, shift is used several times to make changes to the string representations. Shifts enable frequency reduction and localizing computation for later optimization. For example, in database design, the relative execution frequencies, use of space, and other considerations determine which computations are done when data is stored and which are done when data is retrieved. Shifts are also useful to accomplish representation change. For example, suppose a function Magnitude : Int * Int → Real calculates the distance from a point to the origin. If magnitudes are calculated when pixels are accessed, shift can be used to move these calls to wrapping sites. The representation of pixels thus changes from Int * Int to Int * Int * Real, with distances being calculated only when pixels are created or changed, not every time they are accessed.

The project transformation enables "lossy" specialization of representations, eliminating instance variables and associated computation. For example, in a specialized case where intensity information is calculated and maintained but not ultimately used by class clients, project could be used to simplify pixels from Rep : Pixel → Int * Int * Intensity to Rep' : Pixel → Int * Int and eliminate intensity-related calculations.

The idempotency transformation eliminates subcomputations that are idempotent and redundant across method definitions in a class. Idempotency is a surprisingly common property among program operations. Most integrity checks, cache building operations, and invariant maintaining operations such as tree balancing are idempotent.
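To make the Pixel/Magnitude example above concrete, one possible instantiation of the shift rule of the next section is the following (our own rendering; the paper's case study works with strings rather than pixels):

Span : Int * Int → Int * Int * Real,    Span(x, y) = (x, y, Magnitude(x, y))

Every wrapping site Abs : Int * Int → Pixel becomes Abs' ∘ Span, with Abs' : Int * Int * Real → Pixel, and every unwrapping context Span ∘ Rep becomes Rep' : Pixel → Int * Int * Real. The magnitude is thus computed once, when a pixel is created or changed, and merely read at each access.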
3 Class manipulation techniques

Class manipulations exploit the information hiding associated with instance variables in a class (in Standard ML, variables of types whose data constructors are hidden in a structure or abstype). Significant changes can be made to the internal structure of a class, including introduction and elimination of instance variables, without effect on client-level semantics. Class manipulations are based on Hoare's venerable idea of relating abstraction and representation in data types.

Shift. The shift rule has two symmetrical variants. As noted above, the Rep and Abs functions are used to manage access to the data representation. A requirement on all instances of Rep or Abs (which are easy to find, since they are all within the syntactic scope of the class) thus has the effect of a universal requirement on all instances where an encapsulated data object is constructed or selected. Let T be the class (abstract) type of encapsulated objects and V be the type of their representations. In the steps below, a portion of the internal computation of all methods that operate on T objects is identified and abstracted into a function called Span that operates on V objects. Note, in a Java class definition V would be an aggregate of the instance variables (non-static fields) of the class. For our presentation purposes, we adapt the usual object-oriented notational convention to make the aggregate of instance variables an explicit value of type V. In the case study we make a similar adaptation for Java. This means we can require that the fragment of shifted computation, abstracted as Span, be a simple function, allowing a more natural functional style in the presentation of the technique. The effect of shift is to move a common computation from all program points in a class where objects are unwrapped to program points where objects are wrapped (and vice-versa):

1. By other manipulations, local within each operation definition, establish that all sites of Rep : T → V appear in the context of a call to Span,

   Span ∘ Rep : T → V'

   where Span : V → V'. That is, there is some common functional portion of computation of every method that occurs at object unwrapping. If this is not possible, the transformation cannot be carried out. Abstract all instances of this computation into calls to Span.

2. Replace Abs and Rep as follows. All instances of Abs : V → T become

   Abs' ∘ Span : V → T'

   where Abs' : V' → T'. All instances of Rep, as Span ∘ Rep : T → V', become

   Rep' : T' → V'

3. It is now established that all sites of Abs' : V' → T' are in the context of a call to Span,

   Abs' ∘ Span : V → T'

The shift rule, intuitively, advances computation from object unwrapping to object wrapping. The variant of shift is to reverse these three steps and the operations in them, thus delaying the Span computation from object wrapping to subsequent object unwrapping. Although the universal requirement of the first step may seem difficult to achieve, it can be accomplished trivially in those cases where Span is feasibly invertible, by inserting Span⁻¹ ∘ Span in specific method definitions. The Span operation is then relocated using the rule above. Later manipulations may then be able to unfold and simplify away the calls to Span⁻¹.

Project. The project manipulation rule eliminates code and fields in a class that have become dead, for example as a result of specialization of the class interface. The rule can be used when the "death" of code can be deduced only by analysis encompassing the entire class. Suppose the representation type V can
be written as V1 * V2 for some V1 and V2. This could correspond to a partitioning of the instance variables (represented here as a composite type V) into two sets. If the conditions below are met, then project can be used to eliminate the V2 portion of the representation type and associated computations. The rule is as follows:

1. Define Rep as Rep : T → V1 * V2 and Abs as Abs : V1 * V2 → T.

2. Represent the concrete computation of each method op that operates on any portion of the internal representation V as

   op : V1 * V2 * U → V1 * V2 * W

   where U and W are other types (representing non-encapsulated inputs and outputs). Then, by other manipulations, redefine op to calculate its result using two separate operations that calculate separate portions of the result,¹

   op1 : V1 * U → V1 * W
   op2 : V1 * V2 * U → V2

   Do this analogously for all operations op on internal representations V1 * V2. If this cannot be done for all such operations involving V1 and V2, then the rule cannot be applied.

3. Replace all operations op within the class by the new operations op1.

4. Simultaneously, replace all instances of Abs and Rep by new versions Rep' : T → V1 and Abs' : V1 → T.

The effect of project is to eliminate the V2 portion of the overall object representation and all the computations associated with it. In Java, for example, V2 would be a subset of the instance variables of a class. The project manipulation generally becomes a candidate for application after boundary manipulations have eliminated or separated all methods that depend on the particular instance variables corresponding to V2. It is possible to think of shift and project as introduction and elimination rules for instance variables in the specific sense that, among its uses, shift can be used to introduce new fields and computations that could later be eliminated using project.

Idempotency. The idempotency rule is used to eliminate certain instances of idempotent computations such as integrity checks, caching, or invariant-maintaining operations. When an idempotent operation is executed multiple times on the same data object, even as it is passed among different methods, then all but one of the calls can be eliminated.

1. Define an idempotent Span function of type V → V:

   Span ∘ Span = Span

2. Establish that each call of Abs appears in context Abs ∘ Span or, alternatively, establish that each call to Rep appears in context Span ∘ Rep. (These variants are analogous to the two variants of shift.)

¹ op(v1, v2, u) = [let (v1', w) = op1(v1, u) and v2' = op2(v1, v2, u) in (v1', v2', w) end]
3. For one or more particular operation (method) definitions, establish in each that the call to Span can be commuted with the entire remaining portion of the definition:

   op = Span ∘ op' = op' ∘ Span

4. In each of the cases satisfying this commutativity requirement, replace op by op'. That is, replace the calls to Span by the identity function.

Examples: Performance-oriented calculations (spread over multiple operations) such as rebalancing search trees or rebuilding database indexes can be eliminated or delayed when their results are not needed for the immediate operation. An integrity check involving a proper subset of the instance variables can be eliminated from operations that do not employ those variables (i.e., delayed until a later operation that involves a variable in that subset). Note that this integrity check example depends on a liberal treatment of exceptions in the definition of program equivalence.

4 Correctness of Class Manipulations
The class manipulations alter data representations and method definitions in a class. Establishing correctness of the manipulations amounts to proving behavioral equivalence of the original and transformed class definitions. Since encapsulated objects are only unwrapped by methods within the class, the external behavior of a class can be construed entirely in terms of sequences of incoming and outgoing "exposed" values (to use Larch terminology) [G93]. This external class behavior is usually related to the internal definition of the class (i.e., the definitions of the instance variables and methods) through an explicit abstraction mapping, as originally suggested by Hoare [H72]. In the proofs below (as in the descriptions of the manipulations above), we employ a functional style in which methods are functional and objects are pure values. The set of instance variables of a Java class, which can be thought of as an implicit aggregate parameter and result, are made explicit in our rendering as new parameters and results of type T. This simple translation is possible (in our limited Java subset) because objects are pure values, i.e. there is no use of "object identity". Suppose, for example, that T is the abstract type of a class, U and W are other types, and there are four methods in the class, which are functional and have the following signatures: op1 : U → T, op2 : T * T → T, op3 : T → U, and op4 : T → T * W. Thus, for example, op1 is a constructor and op3 does not alter or create T objects. Properties of objects in a class are expressed using an invariant I(t, v) that relates abstract values t (of type T above) to representations (instance variables) v of type V. Behavioral correctness of a class is defined in terms of preconditions on incoming exposed values, postconditions on outgoing exposed values, and a predicate I that is an invariant and that relates abstract values with representations. If an abstract specification of the class is not needed, the proof can be obtained without reference to the abstract values t.
If op_i is a method, let op̄_i be the implementation of the method, typed accordingly. For example, if op_i : U * T → T * W then op̄_i : U * V → V * W. Let pre_i be a precondition on values of the exposed inputs to op_i and post_i be a postcondition on the values of the exposed outputs from op_i. For the invariant I to hold for the class above (op_1 through op_4), the following four properties of I must hold:
pre_1(u) ⇒ I(op_1(u), op̄_1(u))
I(t_1, v_1) ∧ I(t_2, v_2) ⇒ I(op_2(t_1, t_2), op̄_2(v_1, v_2))
I(t, v) ⇒ post_3(op̄_3(v))
I(t, v) ⇒ I(op_4(t), op̄_4(v)) ∧ post_4(op̄_4(v))
Behavioral correctness. Establishing behavioral correctness of a manipulation rule operating on a class definition amounts to showing that the relationships among pre_i and post_j values remain unchanged, even as the definition of I changes in the course of transformation.
Correctness of shift. Let I(t, v) be an invariant that holds for all methods in class T prior to carrying out the transformation. The effect of shift is to create a new invariant I′(t, v′) that relates the new representation values v′ to the abstract values t. The proof proceeds by defining I′, showing it is invariant in the transformed class, and showing that it captures the same relationships between preconditions and postconditions as does I. We prove the first variant of shift; the proof for the second variant is essentially identical. For a method op : T → T, for example, the Span precondition of the definition of the shift manipulation guarantees for some residual computation r that

I(t, v) ⇒ I(op(t), (r ∘ Span)(v))
Define the post-manipulation invariant I′(t, v′) ≡ I(t, v) ∧ v′ = Span(v). Now,

I′(t, v′) ≡ I(t, v) ∧ v′ = Span(v)                 (definition of I′)
         ⇒ I(op(t), (r ∘ Span)(v)) ∧ v′ = Span(v)  (invariance property of I)
         ⇒ I(op(t), r(v′))                          (definition of I′)
         ⇒ I′(op(t), (Span ∘ r)(v′))                (definition of I′)
         ⇒ I′(op(t), op̄′(v′))                       (definition of shift)
This establishes that I′ is invariant for the operation op in the class definition after it has been subject to the manipulation. To understand the effect on postconditions, we consider the case of op_3.
I(t, v) ⇒ post_3(op̄_3(v))                             (assumed)
I(t, v) ⇒ post_3(r(Span(v)))                           (precondition of shift)
I′(t, Span(v)) ⇒ post_3(r(Span(v)))                    (definition of I′)
I′(t, Span(v)) ⇒ post_3(op̄′_3(Span(v)))                (definition of I′)
(∃v)(v′ = Span(v)) ∧ I′(t, v′) ⇒ post_3(op̄′_3(v′))
This is sufficient, since in the modified class definition, all outputs of methods are in the codomain of Span. These arguments are easily generalized to methods that include additional parameters and results.
Correctness of idempotency. There are two variants of the idempotency manipulation. We prove one, in which the second precondition (in the description of the technique) assures that for any t, I(op(t), v) ⇒ (∃v′)(v = (Span ∘ r)(v′)) for some residual function r. Consider the case op : T → T. Because of the second precondition of the idempotency rule, op preserves the invariant,
I(t, v) ⇒ I(op(t), (Span ∘ op̄′)(v)).
For each method in which Span commutes (and in which Span can therefore be eliminated by the transformation), we need to show that the invariant is maintained, that is, I(t, v) ⇒ I(op(t), op̄′(v)). Assume inductively that invariant I(t, v) holds for all class operations on type T prior to carrying out the transformation. We proceed as follows:
I(t, v) ⇒ (∃v′) I(t, v) ∧ v = (Span ∘ r)(v′)           (idempotency precondition)
        ⇒ (∃v′) I(op(t), (Span ∘ op̄′ ∘ Span ∘ r)(v′))   (invariant for op)
        ⇒ (∃v′) I(op(t), (op̄′ ∘ Span ∘ Span ∘ r)(v′))   (commutativity)
        ⇒ (∃v′) I(op(t), (op̄′ ∘ Span ∘ r)(v′))          (idempotency of Span)
        ⇒ I(op(t), op̄′(v))                              (definition of v)
Correctness of project. The proof for the project manipulation is similar, and is omitted.

5 Strings in Java
We now present a modest-scale case study, which is a hypothetical re-creation of the evolution of early versions of the two Java classes used for strings, String and StringBuffer.² Java Strings are meant to be immutable and efficiently represented with minimal copying. For example, substring operations on Strings manipulate pointers and do not involve any copying. StringBuffers, on the other hand, are mutable and flexible, at a modest incremental performance cost. The typical use scenario is that strings are created and altered as StringBuffers and then searched and shared as Strings, with conversion operations between

² These two classes are part of an early version of the released Java Development Kit 1.0Beta from Sun Microsystems. Java is a trademark of Sun Microsystems. Since fragments of the code (string 1.51 and stringbuffer 1.21) are quoted in this paper, we include the following license text associated with the code: "Copyright (c) 1994 Sun Microsystems, Inc. All Rights Reserved. Permission to use, copy, modify, and distribute this software and its documentation for non-commercial purposes and without fee is hereby granted provided that this copyright notice appears in all copies. Please refer to the file "copyright.html" for further important copyright and licensing information. Sun makes no representations or warranties about the suitability of the software, either express or implied, including but not limited to the implied warranties of merchantability, fitness for a particular purpose, or non-infringement. Sun shall not be liable for any damages suffered by licensee as a result of using, modifying or distributing this software or its derivatives."
them. The data representations used in both classes facilitate this conversion by delaying or minimizing copying of structure. In particular, conversions from StringBuffer to String do not always result in copying of the character array data structure.
Both classes represent strings as character arrays called value. Both have an integer field called count, which is the length of the string in the array. The String class also maintains an additional integer field called offset, which is the initial character position in the array. Since Strings are immutable, this enables substrings to be represented by creating new String objects that share the character array and modify count and offset. The StringBuffer class, which is used for mutable strings, has three fields. In addition to value and count, there is a boolean shared that is used to help delay or avoid array copy operations when StringBuffer objects are converted into Strings. The method used for this conversion does not copy value, but it does record the loss of exclusive access to the array value by setting the shared flag. If the flag is set, then a copy is done immediately prior to the next mutation, at which point the flag is reset. If no mutation is done, no copy is made. A simpler but less efficient scheme would always copy.
Selected steps of the evolutionary account are summarized below. The initial steps, omitted here, establish the initial string class, called Str, which manages mutable strings represented as null-terminated arrays, as in C. Str always copies character arrays passed to and from operations. (All boundary and class manipulation steps below have been manually carried out using our techniques, and one of these is examined more closely in the following section.)
1. Introduce count. In the initial class Str, trade space for time by using shift to replace uses of the null character by adding the private instance variable count. Simplifications following the shift eliminate all reference to and storage of null terminating characters.
2. Separate Str into String and StringBuffer. Label all Str variables in the scope in which Str is accessible as either String or StringBuffer, and introduce two methods to convert between them (both initially the identity function with a type cast). Since copy operations are done at every step, object identity cannot be relied upon, and this is a safe step. This process would normally be iterative, using type analysis and programmer-guided selection of conversion points. Those Str variables that are subject to mutating operations, such as append, insert, and setCharAt, are typed as StringBuffer. Those Str variables that are subject to selection operations, such as substring and regionMatches, are typed as String. Common operations such as length and charAt are in both clones. Because copying is done at every step, conversion operations can always do an additional copy of mutable structures (i.e., value). This approach to allocating variables to the two clones enables use of boundary manipulations to remove mutating operations such as setCharAt, for example, from String. At this point, the two clones have different client variables and different sets of operations, but identical representations and representation invariants.
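For orientation, here is a minimal Java sketch of the two target representations described above. It is illustrative only — simplified from, and not identical to, the JDK 1.0Beta code the paper quotes — and the local classes deliberately shadow java.lang for the sake of matching the names in the text:

    class String {                 // illustrative; shadows java.lang.String
        private char value[];      // shared character storage
        private int offset;        // first character of this string in value
        private int count;         // number of characters

        private String(char value[], int offset, int count) {
            this.value = value; this.offset = offset; this.count = count;
        }
        // Substring shares value; only offset and count change -- no copying.
        public String substring(int b, int e) {
            return new String(value, offset + b, e - b);
        }
    }

    class StringBuffer {           // illustrative; shadows java.lang.StringBuffer
        private char value[];
        private int count;
        private boolean shared;    // set when value escapes into a String

        private void beforeMutation() {
            if (shared) {          // copy-on-write: copy only if a String aliases value
                value = java.util.Arrays.copyOf(value, value.length);
                shared = false;
            }
        }
    }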
3. Introduce offset. Since Strings are immutable, substring does not need to copy value. But elimination of copying requires changing the representation to include offsets as well as length. Use shift to accomplish the representation change preparatory to eliminating the copying. (This is the step that is detailed below.)
4. Introduce shared. In StringBuffer, value is copied prior to passing it to the String constructor. This is necessary because value would otherwise be aliased and subject to subsequent mutation, which would violate the immutability invariant of String. In fact, the copying needs to be done only if value is subsequently mutated, and could be delayed until then. Use shift to relocate the copying to whatever StringBuffer method immediately follows the conversion, adding the shared flag. This delays the copy by "one step." Then use idempotency to delay that copying "multiple steps" (by commuting it within all calls to non-mutating access methods) until a StringBuffer operation that actually does mutation.
5. Add specialized methods. Identify conversion functions from various other common types to char[]. For both string classes, use boundary shifts to compose these conversion functions with the constructors and with commonly used functions such as append and insert. Then use additional boundary shifts to relocate these composed functions as public methods, naming them appropriately (mostly through overloading). Then, unfold (inline) the type conversion calls contained in these methods and specialize accordingly. (This introduces many of the large number of methods defined in the two classes.)
6 (hypothetical). Eliminate value. Suppose that in some context only the lengths of StringBuffers were needed, but not their contents. That is, after a series of calls to methods such as insert, append, setCharAt, reverse, and setLength, the only accessor method called is length. Clone and project can be used to create a specialized class for this case.
6 A Closer Look at Offsets in Class String
We now consider more closely step 3 above, which introduces the offset instance variable as an alternative to copying when substrings are calculated. This step must include, for example, modification of all character access operations to index from offset. This account illustrates how shift can be employed to introduce offsets into the string representation. An initial Java version of substring in class String could look like this:

public String substring (int b, int e) {
    // b is first char position; e-1 is last character position.
    if ((b < 0) || (e > count) || (b > e)) {
        throw new IndexOutOfBoundsException();
    }
    int ncount = e - b;
    char res[] = new char[ncount];
    ArrayCopy (value, b, res, 0, ncount);
    return new String (res, ncount);
}
ArrayCopy copies ncount characters from value to res starting at offsets b and 0 respectively. To facilitate presentation consistent with the manipulation rules, the code below explicitly mentions in special brackets those (- instance variables -) that are referenced or changed. This allows a more functional treatment of methods. We also usually abbreviate {- integrity checks -}. Thus the code above could be rendered more succinctly as:

public String substring (int b, int e) (- value, count -) {
    {- Check (b, e, count) -}
    int ncount = e - b;
    char res[] = new char[ncount];
    ArrayCopy (value, b, res, 0, ncount);
    return new String (res, ncount);
}
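In standard Java, the ArrayCopy helper used throughout this section corresponds directly to System.arraycopy; the wrapper below is our gloss, not part of the original code:

    // ArrayCopy (src, srcPos, dst, dstPos, len) as used in the examples
    static void ArrayCopy(char[] src, int srcPos, char[] dst, int dstPos, int len) {
        System.arraycopy(src, srcPos, dst, dstPos, len);
    }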
The principal steps for shift, recall, are (1) defining a suitable Span function, (2) adapting code so it appears in all the required places, and (3) carrying out the shift transformation by relocating the Span calls. We then (4) simplify the results.
(1) Define Span. The motivating observation is that the array copying is unnecessary when a substring is extracted. We therefore abstract the instances of array copy code into a new method definition and use that as Span. The original code sites will be replaced by calls.

private (char[], int) Span (- char[] nvalue, int ncount, int noffset -) {
    char res[] = new char[ncount];
    ArrayCopy (nvalue, noffset, res, 0, ncount);
    return (res, ncount);
}
(An additional notational liberty we take is to allow functions to return a tuple of results.) For this variant of shift, the parameters of Span are exactly the new set of instance variables, and the results are the old instance variables.
(2) Call Span. The method abstraction step needs to be done in a way that introduces Span calls into all methods that construct or mutate string values (i.e., alter value or count). Note that substring does not mutate or construct string values directly; it uses a String constructor:

public String (nvalue, ncount) (- value, count -) {
    value = new char[ncount];
    count = ncount;
    ArrayCopy (nvalue, 0, value, 0, count);
}

After Span is abstracted:

public String (nvalue, ncount) (- value, count -) {
    (value, count) = Span (nvalue, ncount, 0);
}
When Span has an easily computed inverse function, then Span can always be introduced trivially into otherwise less tractable cases as part of a composition with the inverse. But this will likely necessitate later code simplification.
(3) Do the Shift. Once all the Span calls have been inserted, doing the shift is a mechanical operation. The effect is as follows:
- The class now has three private instance variables, nvalue, ncount, and noffset, corresponding to the parameters of Span.
- All Span calls are replaced by (simultaneous) assignments of the actual parameters to the new instance variables.
- In each method that accesses instance variables, the first operation becomes a call to Span that computes value and count (now ordinary local variables) in terms of the instance variables.
For example, here is the transformed definition of substring:

public String substring (int b, int e) (- nvalue, ncount, noffset -) {
    (char value[], int count) = Span (nvalue, ncount, noffset);
    {- Check (b, e, count) -}
    int c = e - b;
    char res[] = new char[c];
    ArrayCopy (value, b, res, 0, c);
    return new String (res, c);
}
(4) Simplify. Unfolding Span and simplifying:

public String substring (int b, int e) (- nvalue, ncount, noffset -) {
    {- Check (b, e, ncount) -}
    int c = e - b;
    char res[] = new char[c];
    ArrayCopy (nvalue, noffset, res, 0, c);
    ArrayCopy (res, b, res, 0, c);
    return new String (res, c);
}
This motivates our introducing a new String constructor that can take an offset, public String (nvalue, ncount, noffset). This is done by abstracting the last two lines of substring and exploiting the idempotency of array copying. After simplifications:

public String substring (int b, int e) (- nvalue, ncount, noffset -) {
    {- Check (b, e, ncount) -}
    return new String (nvalue, e - b, b + noffset);
}

In the case of the charAt method in String, we start with:

public char charAt (int i) (- value, count -) {
    {- Check (i, count) -}
    return value[i];
}

This does not create or modify the instance variables, so no call to Span is needed before the shift. It does, however, access instance variables, so shift inserts an initial call to Span:

public char charAt (int i) (- nvalue, ncount, noffset -) {
    (char value[], int count) = Span (nvalue, ncount, noffset);
    {- Check (i, count) -}
    return value[i];
}
Simplification entails unfolding Span, eliminating the array copy operation, and simplifying:

public char charAt (int i) (- nvalue, ncount, noffset -) {
    {- Check (i, ncount) -}
    return nvalue[i + noffset];
}

7 Background and Conclusion
Background. There is significant previous work on synthesizing and reasoning about encapsulated abstract data types. Class manipulations are most obviously influenced by Hoare's early work [H72]. Most related work focuses either on synthesizing type implementations from algebraic specifications or on refinement of data representations into more concrete forms [D80,MG90,W93]. But evolution does not always involve synthesis or refinement, and the techniques introduced here are meant to alter (and not necessarily refine) both structure and (internal) meaning.
Ideas for boundary manipulations appear in Burstall and Goguen's work on theory operations [BG77]. Wile developed some of these ideas into informally described manipulations on types that he used to provide an account of the heapsort algorithm [W81]. He used a system of manipulating flag bits to record whether values have been computed yet in order to achieve the effect of a primitive class manipulation. For boundary manipulations, we build on several recent efforts which have developed and applied concepts analogous to the operation migration techniques [GN93,JF88,O92]. Early versions of the boundary manipulation and shift techniques were defined (without proof) and applied to introduce destructive operations in programs [JS87,S86] and in a larger case study in which a complex data structure for the text buffer of an interactive text editor was developed [NLS90]. The program manipulation techniques we introduce for classes and other encapsulations are meant to complement other manipulation and specialization techniques for achieving local restructuring and optimizations that do not involve manipulating encapsulations.
Conclusion. The techniques described in this paper are source-level techniques meant to be embodied in interactive tools used with programmer guidance to develop and evolve software. The intent is to achieve a more flexible and exploratory approach to the design, configuration, implementation, and adaptation of encapsulated abstractions. Manipulation techniques such as these can decrease the coupling between a programmer decision to perform a particular computation and the related decision of where the computation should be placed with respect to class and internal interface structure.
Acknowledgements. Thanks to John Boyland and Edwin Chan, who contributed valuable insights. Thanks also to the anonymous referees for helpful comments.
References
[BG94] R.W. Bowdidge and W.G. Griswold, Automated support for encapsulating abstract data types. ACM SIGSOFT Symposium on the Foundations of Software Engineering, 1994.
[BD77] R.M. Burstall and J. Darlington, A transformation system for developing recursive programs. JACM 24, pp. 44-67, 1977.
[BG77] R.M. Burstall and J. Goguen, Putting theories together to make specifications. IJCAI, 1977.
[D80] J. Darlington, The synthesis of implementations for abstract data types. Imperial College Report, 1980.
[GN93] W.G. Griswold and D. Notkin, Automated assistance for program restructuring. ACM TOSEM 2:3, 1993.
[G93] J.V. Guttag and J.J. Horning, et al., Larch: Languages and Tools for Formal Specification. Springer-Verlag, 1993.
[H72] C.A.R. Hoare, Proof of correctness of data representations. Acta Informatica 1, pp. 271-281, 1972.
[JF88] R. Johnson and B. Foote, Designing reusable classes. Journal of Object-Oriented Programming, June/July 1988.
[JS87] U. Jørring and W. Scherlis, Deriving and using destructive data types. Program Specification and Transformation, Elsevier, 1987.
[K96] G. Kiczales, Beyond the Black Box: Open Implementation. IEEE Software, January 1996.
[MG90] Carroll Morgan and P.H.B. Gardiner, Data refinement by calculation. Acta Informatica 27, 1990.
[MG95] J.D. Morgenthaler and W.G. Griswold, Program analysis for practical program restructuring. ICSE-17 Workshop on Program Transformation for Software Evolution, 1995.
[NLS90] R.L. Nord, P. Lee, and W. Scherlis, Formal manipulation of modular software systems. ACM/SIGSOFT Formal Methods in Software Development, 1990.
[O92] W. Opdyke, Refactoring Object-Oriented Frameworks. PhD thesis, University of Illinois, 1992.
[P86] R. Paige, Programming with invariants. IEEE Software 3:1, 1986.
[P84] A. Pettorossi, A powerful strategy for deriving efficient programs by transformation. ACM Symposium on Lisp and Functional Programming, 1984.
[S80] W. Scherlis, Expression procedures and program derivation. Stanford University technical report, 1980.
[S86] W. Scherlis, Abstract data types, specialization, and program reuse. Advanced Programming Environments, Springer, 1986.
[S94] W. Scherlis, Boundary and path manipulations on abstract data types. IFIP 94, North-Holland, 1994.
[W88] P. Wadler, Deforestation: Transforming programs to eliminate trees. European Symposium on Programming, Springer, 1988.
[W81] D.S. Wile, Type transformations. IEEE TSE SE-7, pp. 32-39, 1981.
[W93] K.R. Wood, A practical approach to software engineering using Z and the refinement calculus. ACM SIGSOFT Symposium on the Foundations of Software Engineering, 1993.
A Generic Framework for Specialization (Abridged Version)

Peter Thiemann*
Universität Tübingen
[email protected]
Abstract. We present a generic framework for specifying and implementing offline partial evaluators. The framework provides the infrastructure for specializing higher-order programs with computational effects specified through a monad. It performs sound specialization for all monadic instances and is evaluation-order independent. It subsumes most previously published partial evaluators for higher-order functional programming languages in the sense that they are instances of the generic framework with respect to a particular monad.
Keywords: higher-order programming, program transformation, partial evaluation, computational effects
1 Introduction
A partial evaluator [10,23] specializes a program with respect to a known part of its input. The resulting specialized program takes the rest of the input and delivers the same result as the original program applied to the whole input. The specialized program usually runs faster than the original one.
One particular flavor of partial evaluation is offline partial evaluation. In its first stage, a binding-time analysis annotates all phrases of a program that only depend on the known input as executable at specialization time. The execution of the remaining phrases is deferred to the run time of the specialized program. In the second stage, a static reducer interprets the annotated program. It evaluates all phrases annotated as executable and generates code for the remaining phrases.
The goal of this work is to present a generic framework to specify and implement the static reducer. The framework unifies the existing specifications of static reducers and it provides a sound basis to implement reducers that execute and generate code with computational effects. It goes beyond existing specializers in that it allows for experimentation with various effects in a modular way.
To achieve this modularity, the framework is parameterized over a monad. The choice of a monad fixes a particular computational effect. Monads have been used in the context of programming languages to structure denotational

* Author's present address: Department of Computer Science, University of Nottingham, University Park, Nottingham NG7 2RD, England
semantics [30], to structure functional programs [39,40], and to equip pure functional languages with side-effecting operations like I/O and mutable state [24,32]. Closely related to structuring denotational semantics is the construction of modular interpreters [18,28]. There are also recent theoretical approaches to formalize partial evaluation using monads [21,26].
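For readers new to partial evaluation, a standard illustration (ours, not the paper's) is specializing a power function to a known exponent: the static reducer evaluates the recursion on the known input and emits code for the rest.

    // Illustrative only: specializing power(x, n) to the known exponent n = 3.
    class PowerDemo {
        static int power(int x, int n) {                  // the general program
            return (n == 0) ? 1 : x * power(x, n - 1);
        }
        static String specializePower(String x, int n) {  // toy "specializer"
            return (n == 0) ? "1" : x + " * " + specializePower(x, n - 1);
        }
        public static void main(String[] args) {
            System.out.println(specializePower("x", 3));  // prints: x * x * x * 1
        }
    }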
1.1 Four Ways to Static Reduction
In the past, four different approaches have been used to implement static reducers dealing with a particular computational effect. All of them rest on denotational specifications of specialization, or — from a programmer's point of view — on viewing the static reducer as an interpreter for an annotated language. The implementation languages of these interpreters (the metalanguages of the specifications) range from applied lambda calculus to ML with control operators. By factoring these specifications over an annotated variant of Moggi's computational metalanguage [21,30], we demonstrate that all of them are composed from the same set of building blocks, the sole difference being the staging of computation at specialization time. Here is the set of building blocks:

- eval_v: an evaluation function for a pure call-by-value lambda calculus;
- B: a binding-time analysis that maps terms to annotated terms;
- S_v: a specializer for applied lambda calculus written in lambda calculus (e.g., Lambdamix [19]);
- M: a monadic expansion translation that maps the computational metalanguage to applied lambda calculus by expanding the monadic operators to lambda terms according to the definition of the monad;
- E_v: a call-by-value explication that translates from the (annotated) source language to the (annotated) metalanguage, encoding a call-by-value evaluation strategy.

As an example for the different approaches, consider a call-by-value language λ! with some computational effect and specialize a program p to r with respect to known data s. We assume access to the program text of all functions mentioned above: ⌜S_v⌝ is the lambda term denoting the function S_v = eval_v ⌜S_v⌝, ⌜p s⌝ is the textual application of p to s, and so on.
Transform Source Program to Expanded Monadic Style. Applying M ∘ E_v to ⌜p s⌝ yields an effect-free lambda term. This term can now be analyzed and statically reduced with a specializer for the lambda calculus [8,19].
S_v (B (M (E_v(⌜p s⌝)))) = M (E_v(⌜r⌝))    (1)
This approach is viable [9,31,37], but it suffers from a number of drawbacks.
- Monadic expansion typically introduces many new abstractions. This increase in program size slows down the analysis B and the static reduction.
- The expanded term can be hard to read for the user of the partial evaluator. It provides no useful feedback from the annotated term on how to change the source program to achieve better specialization.
- The straightforward expansion often does not lead to satisfactory results from the binding-time analysis. For example, sophisticated state-passing translations have been designed to obtain better results [31,37].
- The specialized program is also in expanded monadic style (for example, in continuation-passing style [9]), which requires an inverse translation (for example, a direct style translation [13]) to obtain readable results.
Specializer in Expanded Monadic Style. At the cost of moving to a more sophisticated binding-time analysis B!, which takes computational effects into account (e.g., [37]), equation (1) can be rewritten to
S_v (M (E_v (B!⌜p s⌝))) = M (E_v ⌜r⌝), hence
(S_v ∘ M ∘ E_v) (B!⌜p s⌝) = (M ∘ E_v) ⌜r⌝, hence
(E_v⁻¹ ∘ M⁻¹ ∘ S_v ∘ M ∘ E_v) (B!⌜p s⌝) = ⌜r⌝.

Changing the staging by symbolically composing the specializer with the monadic expansion and the explication and their inverses yields

eval_v ⌜E_v⁻¹ ∘ M⁻¹ ∘ S_v ∘ M ∘ E_v⌝ (B!⌜p s⌝) = ⌜r⌝    (2)
There are a number of advantages in return for this complication.
- The source program does not undergo any translation.
- The binding-time analysis applies directly to the source program p and can transmit useful feedback to the user of the system.
- The specialized program is built from pieces of the original source program. The inverse translation is hard-wired into the specializer.
Examples of this approach are Bondorf's specializer in extended continuation-passing style [4] and a specializer for call-by-value lambda calculus with first-class references in extended continuation-passing store-passing style [17].
Specializer in Direct Style with Monadic Operators. Here we depart from writing the specializer in a pure language and use a meta-interpreter eval! = (eval_v ∘ M ∘ E_v) for λ! with the monad in question built in. On top of that we write a direct-style specializer S! in λ! using the built-in monadic operations. For example, if S! were written in ML it could make use of exceptions, state, and control operations. This approach has the same advantages as the previous approach. Additionally, it is usually more efficient since eval! can employ machine-level implementations of the monadic operations. Now we can reason as follows:
⌜r⌝ = eval! ⌜S!⌝ (B!⌜p s⌝) = (eval_v ∘ M ∘ E_v) ⌜S!⌝ (B!⌜p s⌝) = eval_v (M(E_v⌜S!⌝)) (B!⌜p s⌝)
Lawall and Danvy [25] have followed exactly this path (for the continuation monad) to construct an efficient implementation of Bondorf's specializer in direct style with control operators. Their work exploits the built-in continuation monad of Scheme and ML. Since S! simply interprets unannotated pure terms, this generalizes a result of Lawall and Danvy [25], namely

⌜S_c⌝ = C_v ⌜S_v⌝,

where
- C_v is a translation to extended continuation-passing style, encoding call-by-value evaluation; it holds [20] that C_v = M_C ∘ E_v (where M_C is the monadic expansion for the continuation monad, see Sec. 5);
- S_c is a continuation-based specializer in CPS; it holds that S_c = S_v ∘ C_v.

Define a Specializer for the Metalanguage. Hatcliff and Danvy [21] translate p to the metalanguage via E_v and then define the binding-time analysis B_ML and the specializer S_ML for the metalanguage.
S_ML (B_ML(E_v(⌜p s⌝))) = E_v(⌜r⌝)
The specialized program is also written in the metalanguage and may have to be translated back into λ!.

1.2 The Design Space
Figure 1 shows the design space of specialization for languages with computational effects. Again, λ! is an impure lambda calculus with some "built-in" effects, λ_ML is an enrichment of Moggi's computational metalanguage, and λ is a pure (but applied) lambda calculus. The underlined variants are the respective annotated versions of the calculi. They are connected via binding-time analyses B for the respective calculi. The annotated programs are mapped to their specialized versions by the respective specializer S. The diagram should convey the idea that — ideally — the order of the transformations should not matter for the results.
Paths in the diagram correspond to design choices in constructing a specializer, as exemplified before. Clearly, if the path to the specialized program does not end in the lower left of the diagram, we need inverse transformations for E and (possibly) M to get a specialized program in the source language. The diagram suggests that a specializer has to map the annotated version of a calculus to its standard version. As a counterexample, consider the specializer M⁻¹ ∘ S! that maps λ! to λ_ML.
1.3 Our Approach
We distinguish three languages: the annotated source language λ!, the annotated computational metalanguage λ_ML, and an implementation language λ_impl.
Fig. 1. Generic Framework
λ_impl is a functional programming language equipped with a particular monad T. All of them are defined in Sec. 2. In Sec. 3, we define a monadic semantics for the annotated source language in terms of an explication translation from λ! to λ_ML. The actual specializer maps λ_ML programs to λ! programs. This has two advantages: the binding-time analysis and the explication of the evaluation order are left to the frontend, and an inverse of the explication translation is not required. The first stage of specialization, G, maps λ_ML programs to λ_impl programs. G is generalized from a continuation interpretation for λ_ML in Sec. 4. Evaluation of the λ_impl program yields the specialized program in λ!. Subsequently, we instantiate T with various monads and discuss the outcome: the continuation monad yields continuation-based partial evaluation (Sec. 5), the identity monad yields Gomard and Jones's specializer for the lambda calculus (Sec. 6), a combination of the continuation monad and the store monad yields a specializer for core ML with references (Sec. 7), and a combination of the continuation monad and the exception monad yields a specializer that can process exceptions at specialization time (Sec. 8). For each choice of monad T we give a specification G_T of the monadic operators in terms of λ. The composition G_T ∘ G plays the role of M in the diagram. As outlined in 1.1 above, any implementation of the operators will do, as long as it obeys the specification.
2 Notation

2.1 Annotated Lambda Calculus
The source language is a simply typed annotated lambda calculus λ. It will be extended later to λ! to demonstrate the treatment of monadic effects like state and exceptions. It is straightforward to extend both with the usual programming
constructs. The typing rules are standard.

terms  E ::= x | λx.E | E@E | λ̲x.E | E @̲ E
types  τ ::= ι | τ → τ | ι̲ | τ →̲ τ

The type ι is the type of integers, and τ → τ′ is the type of functions that map τ to τ′. In the implementation, the underlined (dynamic) types are subsumed in the type Code. Beta reduction of static terms is the only rule of computation. We use standard notational conventions: application associates to the left, the scope of a λ extends as far to the right as possible, and we can merge lambda abstractions as in λxy.E. As usual, ⇒ is single-step reduction, ⇒* is the reflexive transitive closure of ⇒, and = is the reflexive, transitive, and symmetric closure of ⇒.
2.2 Annotated Computational Metalanguage
The computational metalanguage [30] is an extended lambda calculus that makes the introduction and composition of computations explicit. Here is the syntax of its annotated version [21]:

terms  M ::= x | λx.M | M@M | unit(M) | let x ⇐ M in M | λ̲x.M | M @̲ M | unit̲(M) | let̲ x ⇐ M in M
types  τ ::= ι | τ → τ | T τ | ι̲ | τ →̲ τ | T̲ τ

Intuitively, unit(M) denotes a trivial computation that returns the value of M. The monadic let, let x ⇐ M1 in M2, expresses the sequential composition of computations: first, M1 is performed and then M2, with x bound to the value returned by M1. We augment beta reduction and the monadic reduction rules

let x ⇐ (let y ⇐ M1 in M2) in M3 → let y ⇐ M1 in let x ⇐ M2 in M3   (3)
let x ⇐ M in unit(x) → M                                            (4)
let x ⇐ unit(M1) in M2 → M2[x := M1]                                (5)

by rules that reorganize underlined let expressions, too.

let x ⇐ (let̲ y ⇐ M1 in M2) in M3 → let̲ y ⇐ M1 in let x ⇐ M2 in M3   (6)
let̲ x ⇐ (let y ⇐ M1 in M2) in M3 → let y ⇐ M1 in let̲ x ⇐ M2 in M3   (7)
let̲ x ⇐ (let̲ y ⇐ M1 in M2) in M3 → let̲ y ⇐ M1 in let̲ x ⇐ M2 in M3   (8)
2.3 Implementation Language
The implementation language λ_impl is a lambda calculus extended with the monadic constructs and some special operators. The special operators include the binary syntax constructors λ̲(,) and @̲(,). Furthermore, there are operators specific to the currently used monad (see below for examples). The implementation language is no longer a true annotated language, since λ̲(,) and @̲(,) are merely constructors for the datatype Code. The implementation language is purposefully close to existing functional programming languages so that its programs are easily transcribed.
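To make the Code datatype concrete, here is a minimal Java rendering of the two syntax constructors (an illustration under our own naming; the paper's λ_impl is a functional language, not Java):

    // Residual-code AST: Code is a datatype with constructors for
    // variables, dynamic lambda, and dynamic application.
    abstract class Code {}
    class Var extends Code { final String name; Var(String n) { name = n; } }
    class Lam extends Code {            // the constructor for underlined lambda
        final String var; final Code body;
        Lam(String var, Code body) { this.var = var; this.body = body; }
    }
    class App extends Code {            // the constructor for underlined application
        final Code fun, arg;
        App(Code fun, Code arg) { this.fun = fun; this.arg = arg; }
    }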
3 Explication
The explication translation performs the first part of the work. It maps the (annotated) source language λ! into the (annotated) metalanguage λ_ML and makes a particular evaluation order explicit. We only consider E_v, which fixes left-to-right call-by-value evaluation.

E_v(x) = unit(x)
E_v(λx.E) = unit(λx.E_v(E))
E_v(E1@E2) = let x1 ⇐ E_v(E1) in let x2 ⇐ E_v(E2) in x1@x2
E_v(λ̲x.E) = unit̲(λ̲x.E_v(E))
E_v(E1 @̲ E2) = let̲ x1 ⇐ E_v(E1) in let̲ x2 ⇐ E_v(E2) in x1 @̲ x2
The translation of the underlined constructs mirrors the translation of the static constructs exactly, which in turn is standard [20]. This will always be the case for the explication translation. It is easy to show that the translation preserves typing.

Lemma 1. Suppose Γ ⊢ E : τ. Then M_v⟦Γ⟧ ⊢ E_v(E) : T M_v⟦τ⟧, where M_v⟦τ2 → τ1⟧ = M_v⟦τ2⟧ → T M_v⟦τ1⟧ and M_v⟦Code⟧ = Code.
This approach is similar to that of Hatcliff and Danvy [21]. They translate the source language to the metalanguage in the very beginning and perform the binding-time analysis on terms of the metalanguage. We can accommodate this setup, but we also support a translation from the annotated source language (after binding-time analysis) to the annotated metalanguage. Both metalanguages have in common the existence of reductions involving the underlined constructs. The language considered by Hatcliff and Danvy excludes rules (7) and (8) [21, fig. 10]. This is merely a different design choice, as explained by Lawall and Thiemann [26]. The crucial point is the following theorem [21].
Theorem 1. λ_ML reduction preserves operational equivalence.

4 Semantics for λ_ML
The language λ_ML contains non-standard reductions, namely the reorganizing rules for let expressions. Hence we develop a CPS translation that maps λ_ML to λ_impl in such a way that reduction in λ_ML is simulated by reduction in λ_impl. Figure 2 defines the translation, using let̲(x, E1, E2) as syntactic sugar for @̲(λ̲(x, E2), E1).

Lemma 2. Suppose M1 ⇒ M2. Then M_C⟦M1⟧ =_λimpl M_C⟦M2⟧.
Recall that we promised a generic implementation scheme parameterized over a monad. So far, we have only produced one implementation for a particular instance, the continuation monad. Let us now abstract from this instance
M_C⟦x⟧ = x
M_C⟦λx.M⟧ = λx.M_C⟦M⟧
M_C⟦M1@M2⟧ = λk.M_C⟦M1⟧@M_C⟦M2⟧@k
M_C⟦unit(M)⟧ = λk.k@M_C⟦M⟧
M_C⟦let x ⇐ M1 in M2⟧ = λk.M_C⟦M1⟧@λx.M_C⟦M2⟧@k
M_C⟦λ̲x.M⟧ = λ̲(x, M_C⟦M⟧@λz.z)
M_C⟦M1 @̲ M2⟧ = λk.k@(@̲(M_C⟦M1⟧, M_C⟦M2⟧))
M_C⟦let̲ x ⇐ M1 in M2⟧ = λk.M_C⟦M1⟧@λz.let̲(x, z, M_C⟦M2⟧@k)
~M = Ax.G[M]
G[[Ax.M~
apply G[[M1]]G[M2]] _~ unit (G[[M]]) O[let x r M1 in M2]] let x r G[[M1] in G~'M2]] G[hx.M] ~(x, ~Code (G~'M]])) a[M~M2~ - unit(~(G[[M1]], ttCode(G[[U~]))) if M2: T r = unit(~(G[Ml]], al[M~])) otherwise G[unit(M)~ unit (aiM]I) G[let x r M1 in M2]] = shiftCode w. let z ~.~ 6[[M1] in unit(let(x, z, ~Code (w@G[M2])))
~[[MI@M2]
O[[unit(M)D
Fig. 3. Translation to the implementation language
by rewriting--in the implementation language the right sides of M c such that we obtain the original definition of M c by monadic expansion, i.e., CPS transformation. Figure 3 defines the translation. It might seem like nothing has happened in this transformation. However, we have taken an important conceptual step. We have gotten rid of the annotated language with non-standard reductions in favor of a standard functional language with monadic operators. In other words, we have an implementation. There are two non-standard constructs in the translated terms. ~Code(M) denotes an effect delimiter. It runs the computation M (which may involve effects), discards all effects, and returns the result, provided that M terminates. shiftCode x . M grabs the context of the computation up to the next enclosing ~Code(M'), discards it, and binds it as a function to x. The standard implementation of these operators in terms of continuations is given in the next section 5.
275 These operators are restricted so that shiftCode x . M only abstracts contexts that return Code. Otherwise, type soundness could not be guaranteed.
5
Continuation-Based Specialization
We obtain sound continuation-based specialization [26] from the generic specification G by substituting the continuation monad T T = (r --+ Code) --+ Code in the implementation language. The monadic expansion translation boils down to defining the operators apply M M, unit(M), let x ~ M in M, ~Code(/Yl), and shiftCode x . M . aoM
=
Gd[Ax.M]
-- Ax.Gd[M] =- Ak.Gd[M1]]@Gd[M2]@k
G~l[apply M1 M2]I Gelid(x, M)ll acid(M1, M2)~ G~[unit (M)]] ae[let x ~= M1 in M2] G~[[~Code(M)] 6~[[shiftCode x.M]]
= ~(x, Qd[U]l)
= ~(ac[M1], G~[M2]) = Ak.k@G~[M] = Ak.{~e[[M1]]~Ax.ac[[M2]]@k =_ Q~IIM]]@Az.x - )~k.(Xx.O,[[M~QAz.z)@Ay.Ak'.k'@(k@y)
Here is the connection to Ad~ with o denoting composition. L e m m a 3. Gc o ~ = ~
2t4~.
For the specializer S~ of Lawall and Thiemann [26] we find: L e m m a 4. (6c o 6 o C~) = ~
Sc.
There are specifications of continuation-based specialization in direct style [25] that employ the control operators shift and reset [14]. In contrast to reset, ~Code0 is a general effect delimiter that runs an encapsulated computation, extracts its results, and hides all its effects. The difference is visible in the computation type of "reset(M)".
6
Specialization for the Lambda Calculus
Gomard's specializer Lambdamix [19] is targeted towards an applied lambda calculus. It results from setting T T = T, the identity monad. GI[Ax.M]
= Ax.Q,[M] Qil[apply M1 M2] = ~,[M1]~Qi[M2] Q,[~(z, M)ll = ~(x,Q,~M])
a,[~(M1, Ms)]
Qi[unit(M)] = Q,[let z r M1 in M2] = Gil~Code(M)] = G,[shiftCode z.M D -
G,[M] (Ax.~I[M2])@G,IMx] a,[M] (xx.a,~M])~(Xz.z)
= ~(a,[M~], G,[M~])
Calling Lambdamix S~ we have the following connection (see also [38]): L e m m a 5. (G~ o C~) ~
S~
276 7
Specialization
with
Mutable
Store
If we set T T = (r X Store --+ Code x Store) -+ Store ~ Code x Store (a continuation-passing and store-passing monad) we obtain a specializer for a language with mutable store [17]. The expansion now reads as follows (using (,) for pairing and 7h for projection).
G,[Ax.M] ~,[[apply Mt M2]] G, li(a:, M)]
G[(~(Mt,M,)]
=- ,',x.6,[M] =- ,~ks.~,[M~]@G,[M2l@k~s = i(z, ~,[M])
a,[u~it(M)]
-= ~(G, [Mt], g, [M2]) - ~ k ~ . k @ ( g , [ M ] , ~)
~, [~Code(M)] ~,l[shiftCode x.M]
=- 7rl(~,[M]~(A(x,s).(x,s))@Sempty) = ,~ks.(),x.G,[M]~(,~(z, s').(z, s'))@s)~Ay.)~k' s'.k'@(k~(y, s'))
The only problematic cases are ~Code(M) and shiftCode x.M. The ~Code(M) case shows that the standard reset operator is not sufficient here. The computation M is applied to the empty continuation $(x, s).(x, s) and the empty store Sempty. In the end the Store component is discarded and only the value returned. An implementation of reset would have to thread the store through the computation. Since ~Code(M) discards the static store, the binding-time analysis must guarantee that each reference, whose lifetime crosses the effect delimiter, is dynamic. We have developed such an analysis elsewhere [37]. To understand the implementation of shiftCode x . M observe that the computation M is applied to the empty continuation and to the current store s. In the translated term, each occurrence of x stands for a function that accepts a value y, a continuation k', and a store s'. The function applies the captured continuation k to (y, s') to obtain the result of k paired with the resulting store. Both are passed to k' to produce the result of the computation. Specialization with a store is not very interesting without any operations on it. We add the standard set of primitive operations, to allocate, read, and update references, to the source language. E :: . . . . v :: . . . .
I ref E l ! E I E := E Ire__XfEI!E[E :-- E I ref T [ re_~f~-
The typing rules and operational semantics are again standard (similar to ML). The metalanguage and the implementation language are extended by the same set of operators, but the result type of all these operations is a computation type, i.e., we consider Store an abstract datatype with operations mkref : A4v[T -+ ref T~, rdref : A/tv[ref 7- -~ T]], and wrref : ref T --4 A4~ IT --+ T~. Extending the explication is standard: g,(ref E) -- let x 4= E,(E) in ref x gv(! E) -- let x r gv(E) in ! x g,(E1 :----E2) - let Xl 4= g~(Et) in let x2 4= Ev(E2) in xt := x2
277 G requires some more care. ~[ref M] --- ref G[M~ ~ I ! M ] -~!GEM] G[[ M~ ~ unit(~(G[M]))
A
G[re._[fM]
-= ~[M1 := M2] = {;[M1 := M2] = =
unit(ref(~Code(~[M]))) if M : T r unit(ref(G[M])) otherwise ~M~] := G[M2~ unit(:-~(G[[M~],tlCode(~[[M2~)))if M2: T__~unit(:~(~[[M1], G~M2])) otherwise
T h e translation depends on the type of terms. In the image of the call-byn a m e translation, the argument of ref and the second argument of :-- have computation type. The placement of the delimiter ~Code0 is required to preserve t y p e correctness. Since each effect delimiter causes less effects to be performed at specialization time, it is obvious t h a t in a call-by-name language with effects less computations are performed in a known context. The monadic expansion to the implementation language is straightforward. G, iref M] = Aks.mkref G~[M] ks r M] _= Aks.rdref ~,[M] ks ~,[Ul := U2] = Aks.wrref G,[M1] a,[M2] ks
8
A
Gs[ref(M)] Gs~(M)]
_= ref(G, [M]) _= !(G,[M])
Specialization with Exceptions
Finally, we embark on processing exceptions at specialization time. We use a model of exceptions with "raise E" and "El handle E2" constructs to raise and intercept exceptions and one fixed type "Exception" of exceptions. We assume that E2 is a function that maps Exception to the same type as El. E ::.... I raise E I E handle E I raise E I E handle E The extension of the explication translation is standard: gv (raise E) = let x r s (E) in raise x s handle E2) = E,(E) handle Ax.C,,(E2@x) T h e interesting part is the translation G. G[[raise M~ = raise ~[[M]] G[[M1 handle M2] = G[[M1] handle GHM2] ~[[raise M~ = raise(G[M~) aIM1 handle M2~ = handle(~Code(G[[Ul]] ), G[[M2]]) In the definition of the monadic expansion M e for the exception monad, we write [f,g] : (A + B) -+ C if f : A -+ C and g : B --~ C. We use standard notation for the injection functions inl : A -~ A + B and inr : B --+ A + B. We do not use the straightforward exception monad, but a composition with the continuation monad, which is required to define a sound call-by-value specializer with exceptions. T r = ((T + Exception) --+ (Code + Exception)) --+ (Code + Exception)
278
~X ~e [M1 @M2 ]
= Ax.G~[[M]] - ,Xk.G~[MI~@G~[M2~@k
G~[A(x, M)] G~[4(Mx, M2)]] --= @(~,[M1], G~[M21) G~[[unit(M)~ --- Ak.k@(inl G, gM]) Gd[let x 4= M1 in M2] -- Ak.G,[MI]@[Ax.~,~M2~@k, Ay.k@inr y] -- G ~ M ] @ [ A z . x , ;~y.s z)] -- Ak.(Ax.G,~M]@[Az.inl z, Az.inr z])@Ay.Ak'.k'@(k@(inl y)) g~ ~shiftCode x . M ] G~~raise M~ -- Ak.k@(inr GerM]) Gr -- raise(6~ IM]I) = Ak.6,[M~]@[Ax.k@(inl x),G~[M2]@k] 6~ JIM1 handle M2~ G~[M1 handle M2~ --- Gr handle G,[M2]I Again, the placement of ~Code(M) restricts the binding-time analysis. All exceptions that may cross the effect delimiter must be dynamic.
9
R e l a t e d Work
There are two formalizations of partial evaluation using Moggi's work [30]. Hatcliff and Danvy [21] define a binding-time analysis and specialization for an annotated version of the monadic metalanguage. Their specializer exploits the monadic law 3 to "flatten" nested let expressions. To obtain an executable specification of the specializer they define a separate operational semantics (close to an abstract machine) that they prove equivalent to their first definition of specialization. Lawall and Thiemann [26] define an annotated version of Moggi's computational lambda calculus and show that it is implementable through a annotated CPS translation. This translation forms a reflection in a annotated lambda calculus of the annotated computational lambda calculus. Thus they show that a particular flavor of continuation-based partial evaluation is sound for all monadic models, thereby establishing firm ground for the development of specializers with computational effects expressed through monads. In contrast to these works, the present work continues the work on monadic interpreters [18, 28, 40] in that it shows how to use monads to structure specializers in functional programming languages. Hence, the focus is on directly executable specifications. Our specifications implement the flattening transformation (the monad law 3) using special operations of the monad used to implement the specializer. Incidentally, these operations correspond to effect delimiters, control operators, and store operators [22]. We rely on the two above works [21,26] for the soundness of this approach. Effect delimiters have been considered by a number of researchers for varying purposes. Riecke and Viswanathan [35,36] construct fully abstract denotational semantics for languages with monadic effects. Launchbury and Peyton Jones [24] define an effect delimiter for the state monad with a second order polymorphic type to encapsulate state-based computations. Similar operators have been used
279
by Dussart et al [16] in order to get satisfactory results in a type specializer for the monadic metalanguage extended with mutable store (as in Sec.7). There are offiine partial evaluators for first-order imperative languages [1, 5, 7, 11, 12] and for higher-order languages [2, 3, 8, 29]. However, most partial evaluators for higher-order imperative languages [2, 3,6] defer all computational effects to run time [5]. Realistic partial evaluators for higher-order languages with side effects must be able to perform side effects at specialization time. The only partial evaluator so far capable of this has been specified for a subset of Scheme by Thiemann and Dussart [17]. That work defines the specializer in extended continuation-passing store-passing style, it defines a binding-time analysis (which is proved correct elsewhere [37]), and considers pragmatic aspects such as efficient management of the store at specialization time and specialization of named program points. In contrast, the present work identifies a general scheme underlying the construction of specializers that address languages with computational effects. It does so in an evaluation-order independent framework and is built around monads in order to achieve maximum flexibility. Birkedal and Welinder [2] developed an ad hoc scheme to deal with exceptions in their specializer for ML. It needs a separate correctness proof, because it is not based on the monadic metalanguage. 10
Conclusion
We have specified a generic framework for partial evaluation and demonstrated that it subsumes many existing algorithms for partial evaluation. The framework is correct for all monadic instances since it only performs reductions which are sound in the computational metalanguage.
In addition, we have investigated the construction of a generic binding-time analysis for languages with arbitrary effects, the construction of program generators instead of specializers, and the construction of specializers in direct style. These are reported in the full version of this paper [38], which also considers the call-by-name explication and some further language constructs, namely numbers and primitive operations, conditionals, and recursion.

References
1. Lars Ole Andersen. Program Analysis and Specialization for the C Programming Language. PhD thesis, DIKU, University of Copenhagen, May 1994. (DIKU report 94/19).
2. Lars Birkedal and Morten Welinder. Partial evaluation of Standard ML. Rapport 93/22, DIKU, University of Copenhagen, 1993.
3. Anders Bondorf. Automatic autoprojection of higher order recursive equations. Science of Programming, 17:3-34, 1991.
4. Anders Bondorf. Improving binding times without explicit CPS-conversion. In Proc. 1992 ACM Conference on Lisp and Functional Programming, pages 1-10, San Francisco, California, USA, June 1992.
5. Anders Bondorf and Olivier Danvy. Automatic autoprojection of recursive equations with global variables and abstract data types. Science of Programming, 16(2):151-195, 1991.
6. Anders Bondorf and Jesper Jørgensen. Efficient analysis for realistic off-line partial evaluation. Journal of Functional Programming, 3(3):315-346, July 1993.
7. Mikhail A. Bulyonkov and Dmitrij V. Kochetov. Practical aspects of specialization of Algol-like programs. In Danvy et al. [15], pages 17-32.
8. Charles Consel. Polyvariant binding-time analysis for applicative languages. In David Schmidt, editor, Proc. ACM SIGPLAN Symposium on Partial Evaluation and Semantics-Based Program Manipulation PEPM '93, pages 66-77, Copenhagen, Denmark, June 1993. ACM Press.
9. Charles Consel and Olivier Danvy. For a better support of static data flow. In John Hughes, editor, Proc. Functional Programming Languages and Computer Architecture 1991, volume 523 of Lecture Notes in Computer Science, pages 496-519, Cambridge, MA, 1991. Springer-Verlag.
10. Charles Consel and Olivier Danvy. Tutorial notes on partial evaluation. In POPL 1993 [33], pages 493-501.
11. Charles Consel, Luke Hornof, François Noël, Jacques Noyé, and Nicolae Volanschi. A uniform approach for compile-time and run-time specialization. In Danvy et al. [15], pages 54-72.
12. Charles Consel and François Noël. A general approach for run-time specialization and its application to C. In Proc. 23rd Annual ACM Symposium on Principles of Programming Languages, pages 145-156, St. Petersburg, Fla., January 1996. ACM Press.
13. Olivier Danvy. Back to direct style. Science of Programming, 22:183-195, 1994.
14. Olivier Danvy and Andrzej Filinski. Abstracting control. In LFP 1990 [27], pages 151-160.
15. Olivier Danvy, Robert Glück, and Peter Thiemann, editors. Dagstuhl Seminar on Partial Evaluation 1996, volume 1110 of Lecture Notes in Computer Science, Schloß Dagstuhl, Germany, February 1996. Springer-Verlag.
16. Dirk Dussart, John Hughes, and Peter Thiemann. Type specialisation for imperative languages. In Mads Tofte, editor, Proc. International Conference on Functional Programming 1997, pages 204-216, Amsterdam, The Netherlands, June 1997. ACM Press, New York.
17. Dirk Dussart and Peter Thiemann. Partial evaluation for higher-order languages with state. Berichte des Wilhelm-Schickard-Instituts WSI-97-XX, Universität Tübingen, April 1997.
18. David Espinosa. Building interpreters by transforming stratified monads. ftp://altdorf.ai.mit.edu/pub/dae, June 1994.
19. Carsten K. Gomard and Neil D. Jones. Compiler generation by partial evaluation: A case study. Structured Programming, 12:123-144, 1991.
20. John Hatcliff and Olivier Danvy. A generic account of continuation-passing styles. In Proc. 21st Annual ACM Symposium on Principles of Programming Languages, pages 458-471, Portland, OR, January 1994. ACM Press.
21. John Hatcliff and Olivier Danvy. A computational formalization for partial evaluation. Mathematical Structures in Computer Science, 7(5):507-542, 1997.
22. G. F. Johnson and Dominic Duggan. Stores and partial continuations as first-class objects in a language and its environment. In Proc. 15th Annual ACM Symposium on Principles of Programming Languages, pages 158-168, San Diego, California, January 1988. ACM Press.
23. Neil D. Jones, Carsten K. Gomard, and Peter Sestoft. Partial Evaluation and Automatic Program Generation. Prentice-Hall, 1993.
24. John Launchbury and Simon L. Peyton Jones. Lazy functional state threads. In Proc. of the ACM SIGPLAN '94 Conference on Programming Language Design and Implementation, pages 24-35, Orlando, Fla., USA, June 1994. ACM Press.
25. Julia Lawall and Olivier Danvy. Continuation-based partial evaluation. In Proc. 1994 ACM Conference on Lisp and Functional Programming, pages 227-238, Orlando, Florida, USA, June 1994. ACM Press.
26. Julia Lawall and Peter Thiemann. Sound specialization in the presence of computational effects. In Proc. Theoretical Aspects of Computer Software, volume 1281 of Lecture Notes in Computer Science, pages 165-190, Sendai, Japan, September 1997. Springer-Verlag.
27. Proc. 1990 ACM Conference on Lisp and Functional Programming, Nice, France, 1990. ACM Press.
28. Sheng Liang, Paul Hudak, and Mark Jones. Monad transformers and modular interpreters. In POPL 1995 [34], pages 333-343.
29. Karoline Malmkjær, Nevin Heintze, and Olivier Danvy. ML partial evaluation using set-based analysis. In Record of the 1994 ACM SIGPLAN Workshop on ML and its Applications, number 2265 in INRIA Research Report, pages 112-119, Orlando, Florida, June 1994.
30. Eugenio Moggi. Notions of computations and monads. Information and Computation, 93:55-92, 1991.
31. Barbara Moura, Charles Consel, and Julia Lawall. Bridging the gap between functional and imperative languages. Publication interne 1027, Irisa, Rennes, France, July 1996.
32. Simon L. Peyton Jones and Philip L. Wadler. Imperative functional programming. In POPL 1993 [33], pages 71-84.
33. Proc. 20th Annual ACM Symposium on Principles of Programming Languages, Charleston, South Carolina, January 1993. ACM Press.
34. Proc. 22nd Annual ACM Symposium on Principles of Programming Languages, San Francisco, CA, January 1995. ACM Press.
35. John Riecke. Delimiting the scope of effects. In Arvind, editor, Proc. Functional Programming Languages and Computer Architecture 1993, pages 146-155, Copenhagen, Denmark, June 1993. ACM Press, New York.
36. John Riecke and Ramesh Viswanathan. Isolating side effects in sequential languages. In POPL 1995 [34], pages 1-12.
37. Peter Thiemann. Correctness of a region-based binding-time analysis. In Proc. Mathematical Foundations of Programming Semantics, Thirteenth Annual Conference, volume 6 of Electronic Notes in Theoretical Computer Science, page 26, Pittsburgh, PA, March 1997. Carnegie Mellon University, Elsevier Science BV. URL: http://www.elsevier.nl/locate/entcs/volume6.html.
38. Peter Thiemann. A generic framework for specialization. Berichte des Wilhelm-Schickard-Instituts WSI-97-XXX, Universität Tübingen, October 1997.
39. Philip L. Wadler. Comprehending monads. In LFP 1990 [27], pages 61-78.
40. Philip L. Wadler. The essence of functional programming. In Proc. 19th Annual ACM Symposium on Principles of Programming Languages, pages 1-14, Albuquerque, New Mexico, January 1992. ACM Press.
Complexity of Concrete Type-inference in the Presence of Exceptions*

Ramkrishna Chatterjee¹, Barbara G. Ryder¹, and William A. Landi²

¹ Department of Computer Science, Rutgers University, Piscataway, NJ 08855, USA, Fax: 732 445 0537, {ramkrish,ryder}@cs.rutgers.edu
² Siemens Corporate Research Inc., 755 College Rd. East, Princeton, NJ 08540, USA, [email protected]
Abstract. Concrete type-inference for statically typed object-oriented programming languages (e.g., Java, C++) determines, at each program point, those objects to which a reference may refer or a pointer may point during execution. A precise compile-time solution for this problem requires a flow-sensitive analysis. Our new complexity results for concrete type-inference distinguish the difficulty of the intraprocedural and interprocedural problem for languages with combinations of single-level types³, exceptions with or without subtyping, and dynamic dispatch. Our results include:
– The first polynomial-time algorithm for concrete type-inference in the presence of exceptions, which handles Java without threads, and C++;
– Proofs that the above algorithm is always safe and provably precise on programs with single-level types, exceptions without subtyping, and without dynamic dispatch;
– Proof that the intraprocedural concrete type-inference problem with single-level types and exceptions with subtyping is PSPACE-complete, while the interprocedural problem without dynamic dispatch is PSPACE-hard.
Other complexity characterizations of concrete type-inference for programs without exceptions are also presented.
1 Introduction

Concrete type-inference (CTI from now on) for statically typed object-oriented programming languages (e.g., Java, C++) determines, at each program point, those objects to which a reference may refer or a pointer may point during execution. This information is crucial for static resolution of dynamically dispatched calls, side-effect analysis, testing, program slicing and aggressive compiler optimization.

* The research reported here was supported, in part, by NSF grant GER-9023628 and the Hewlett-Packard Corporation.
³ These are types with data members only of primitive types.
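For orientation, the following minimal sketch (the classes Shape, Circle, Square and the method render are ours, not taken from the paper) shows the kind of question CTI answers:

  abstract class Shape { abstract void draw(); }
  class Circle extends Shape { void draw() { /* ... */ } }
  class Square extends Shape { void draw() { /* ... */ } }

  class Client {
    static void render(boolean cond) {
      Shape s;
      if (cond) s = new Circle();   // allocation site s1
      else      s = new Square();   // allocation site s2
      s.draw();  // CTI determines that s may refer to object_s1 or object_s2,
                 // so this dynamically dispatched call may invoke
                 // Circle.draw or Square.draw
    }
  }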
The problem of CTI is both intraprocedurally and interprocedurally flow-sensitive. However, there are approaches with varying degrees of flow-sensitivity for this problem. Although some of these have been used for pointer analysis of C, they can be adapted for CTI of Java without exceptions and threads, or C++ without exceptions. At one end of the spectrum are intraprocedurally and interprocedurally flow-insensitive approaches [Ste96, SH97, ZRL96, And94], which are the least expensive, but also the most imprecise; at the other end are intraprocedurally and interprocedurally flow-sensitive approaches [LR92, EGH94, WL95, CBC93, MLR+93, Ruf95], which are the most precise, but also the most expensive. Approaches like [PS91, PC94, Age95] lie between these two extremes.

An intraprocedurally flow-insensitive algorithm does not distinguish between program points within a method; hence it reports the same solution for all program points within each method. In contrast, an intraprocedurally flow-sensitive algorithm tries to compute different solutions for distinct program points. An interprocedurally flow-sensitive (i.e., context-sensitive) algorithm considers (sometimes approximately) only interprocedurally realizable paths [RHS95, LR91]: paths along which calls and returns are properly matched. An interprocedurally flow-insensitive (i.e., context-insensitive) algorithm does not make this distinction. For the rest of this paper, we will use the term flow-sensitive to refer to an intra- and interprocedurally flow-sensitive analysis.

In this paper, we are interested in a flow-sensitive algorithm for CTI of a robust subset of Java with exceptions, but without threads (this subset is described in Section 2). The complexity of flow-sensitive CTI in the presence of exceptions has not been studied previously. None of the previous flow-sensitive pointer analysis algorithms [LR92, WL95, EGH94, PR96, Ruf95, CBC93, MLR+93] for C/C++ handles exceptions. However, unlike in C++, exceptions are frequently used in Java programs, making this an important problem for Java. The main contributions of this paper are:
– The first polynomial-time algorithm for CTI in the presence of exceptions that handles a robust subset of Java without threads, and C++⁴;
– Proofs that the above algorithm is always safe and provably precise on programs with single-level types, exceptions without subtyping, and without dynamic dispatch; thus this case is in P;
– Proof that intraprocedural CTI for programs with single-level types and exceptions with subtyping is PSPACE-complete, while the interprocedural problem (even) without dynamic dispatch is PSPACE-hard;
– New complexity characterizations of CTI in the absence of exceptions.
These results are summarized in Table 1, which also gives the sections of the paper containing these results.

The rest of this paper is organized as follows. First, we present a flow-sensitive algorithm, called the basic algorithm, for CTI in the absence of exceptions, and discuss our results about the complexity of CTI in the absence of exceptions.

⁴ In this paper, we present our algorithm only for Java.
problem               | section | single-level types | exceptions w/o subtyping | exceptions w/ subtyping | dynamic dispatch | results
interprocedural CTI   | Sec. 4  | x                  | x                        |                         |                  | in P, O(n^7)
intraprocedural CTI   | Sec. 4  | x                  |                          | x                       |                  | PSPACE-complete
interprocedural CTI   | Sec. 4  | x                  |                          | x                       |                  | PSPACE-hard
interprocedural CTI   | Sec. 3  | x                  |                          |                         | x                | PSPACE-hard
interprocedural CTI   | Sec. 3  | x                  |                          |                         |                  | in P, O(n^5)
intraprocedural CTI   | Sec. 3  | x                  |                          |                         |                  | in NC

Table 1. Complexity results for CTI summarized

Next, we extend the basic algorithm for CTI in the presence of exceptions, and discuss the complexity and correctness of the extended algorithm. Finally, we present PSPACE-hardness results about CTI in the presence of exceptions. Due to lack of space, we have omitted all proofs; these proofs and further details about the results in this paper are given in [CRL97]⁵.
2 Basic definitions

Program representation. Our algorithm operates on an interprocedural control flow graph or ICFG [LR91]. An ICFG contains a control flow graph (CFG) for each method in the program. Each statement in a method is represented by a node in the method's CFG. Each call site is represented using a pair of nodes: a call-node and a return-node. Information flows from a call-node to the entry-node of a target method and comes back from the exit-node of the target method to the return-node of the call-node. Due to dynamic dispatch, interprocedural edges are constructed iteratively during data-flow analysis as in [EGH94]; details of this construction are shown in Figure 3. We will denote the entry-node of main by start-node in the rest of this paper.
Representation of dynamically created objects. All run-time objects (or arrays) created at a program point n are represented symbolically by object_n. No distinction is made between different elements of an array; thus, if an array is created at n, object_n represents all elements of the array.
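A small illustrative sketch of this naming scheme (the class, method and site labels are ours):

  class Elem { }

  class ArrayDemo {
    static Elem[] build() {
      Elem[] arr = new Elem[10];   // allocation site n1: the array is object_n1
      arr[0] = new Elem();         // allocation site n2: arr[0] refers to object_n2
      arr[5] = new Elem();         // allocation site n3: arr[5] refers to object_n3
      // Since elements are not distinguished, the analysis records that
      // *any* element of object_n1 may refer to object_n2 or object_n3.
      return arr;
    }
  }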
Precise solution for CTI. A reference variable is one of the following:
– a static variable (class variable) of reference⁶ type;
– a local variable of reference type;
– Av, where Av is V[t_1]...[t_d], V is a static/local variable or an array object_n allocated at program point n, V is either a d-dimensional array of reference type or an array of any type having more than d dimensions, and each t_i is a non-negative integer; or
– V.s_1...s_k, where V is either a static/local variable of reference type, or V is some Av, or V is an object object_n created at program point n; for 1 ≤ i ≤ k, each V.s_1...s_{i-1} (V for i = 1) has the type of a reference to a class T_i, and each s_i is a field of reference type of T_i, or s_i = f_i[t_{i1}]...[t_{i r_i}] where f_i is a field of T_i that is an array having at least r_i dimensions and each t_{ij} is a non-negative integer; and V.s_1...s_k is of reference type.

Using these definitions, the precise solution for CTI can be defined as follows: given a reference variable RV and an object object_m, ⟨RV, object_m⟩ belongs to the precise solution at a program point n if and only if RV is visible at n and there exists an execution path from the start-node of the program to n such that, if this path is followed, RV points to object_m at n (i.e., at the top of n). Unfortunately, not all paths in a program are necessarily executable, and determining which are executable is undecidable. Barth [Bar78] defined precise up to symbolic execution to be the precise solution under the assumption that all program paths are executable (i.e., the result of a test is independent of previous tests and all branches are possible). In the rest of this paper we use precise to mean precise up to symbolic execution.

⁵ Available at http://www.prolangs.rutgers.edu/refs/docs/tr341.ps.
⁶ A reference may refer to an instance of a class or an array.
Points-to. A points-to has the form ⟨var, obj⟩, where var is one of the following: (1) a static variable of reference type, (2) a local variable of reference type, (3) object_m, an array object created at program point m, or (4) object_n.f, a field f of reference type of an object created at a program point n; and obj is object_s, an object created at a program point s.
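To make these forms concrete, consider the following fragment and the points-tos that hold after it (the classes and site labels s1, s2 are ours, for illustration only):

  class B { }
  class A { B f; }

  class PointsToDemo {
    static A g;                 // a static variable of reference type
    static void m() {
      A p = new A();            // site s1: afterwards ⟨p, object_s1⟩ holds
      p.f = new B();            // site s2: afterwards ⟨object_s1.f, object_s2⟩ holds
      g = p;                    // afterwards ⟨g, object_s1⟩ holds
    }
  }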
Single-level type. A single-level type is one of the following: (1) a primitive type defined in [GJS96] (e.g., int, float, etc.), (2) a class that has all non-static data members of primitive types (e.g., class A { int i, j; }), or (3) an array of a primitive type.

Subtype. We use Java's definition of subtyping: a class A is a subtype of another class B if A extends B, either directly or indirectly through inheritance.
Safe solution. An algorithm is said to compute a safe solution for CTI if and only if, at each program point, the solution computed by the algorithm is a superset of the precise solution.
Subset of Java considered. We essentially consider a subset that excludes threads, but in some cases we may need to exclude three other features: finalize methods, static initializations and dynamically defined classes. Since finalize methods are called (non-deterministically) during garbage collection or unloading of classes, if a finalize method modifies a variable of reference type (extremely rare), it cannot be handled by our algorithm. Static initializations complicate analysis due to dynamic loading of classes: if static initializations can be done in program order, our algorithm can handle them; otherwise, if they depend upon dynamic loading (extremely rare), our algorithm cannot handle them. Similarly, our algorithm cannot handle classes that are constructed on the fly and not known statically. We will refer to this subset as JavaWoThreads.

Also, we have considered only exceptions generated by throw statements. Since run-time exceptions can be generated by almost any statement, we have ignored them. Our algorithm can handle run-time exceptions if the set of statements that can generate these exceptions is given as an input. If all statements that can potentially generate run-time exceptions are considered, we will get a safe solution; however, this may generate far more information than what is useful.
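For instance, the following (rare) idiom is exactly the kind of finalize method excluded above, since the assignment runs at an unpredictable point during garbage collection (example ours):

  class Resurrect {
    static Object lastCollected;     // a reference variable
    protected void finalize() {
      lastCollected = this;          // modifies a reference variable during GC,
    }                                // so this program is outside JavaWoThreads
  }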
3 CTI in the absence of exceptions

Our basic algorithm for CTI is an iterative worklist algorithm [KU76]. It operates on an ICFG and is similar to the Landi-Ryder algorithm [LR92] for alias analysis, but instead of aliases, it computes points-tos. In Section 4, we will extend this algorithm to handle exceptions.
Lattice for data-flow analysis. In order to restrict data-flow only to realizable paths, points-tos are computed conditioned on assumed-points-tos (akin to reaching aliases in [LR92, PR96]), which represent points-tos reaching the entry of a method and approximate the calling context in which the method has been called (see the example in Appendix A and the sketch below). A points-to along with its assumed-points-to is called a conditional-points-to, and has the form ⟨condition, points-to⟩, where condition is an assumed-points-to or empty (meaning the points-to is applicable to all contexts). For simplicity, we will write ⟨empty, points-to⟩ as points-to. Also, a special data-flow element reachable is used to check whether a node is reachable from the start-node through a realizable path; this ensures that only such reachable nodes are considered during data-flow analysis and only points-tos generated by them are put on the worklist for propagation. The lattice for data-flow analysis (associated with a program point) is a subset lattice consisting of sets of such conditional-points-tos and the data-flow element reachable.
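The role of assumed-points-tos can be seen on a small example (ours): the callee's effect is kept conditional on its calling context, so facts from one call site do not leak to another.

  class A { }

  class Context {
    static A g;
    static void callee(A q) {
      g = q;   // at the exit-node: ⟨⟨q, object_s1⟩, ⟨g, object_s1⟩⟩ and
    }          //                   ⟨⟨q, object_s2⟩, ⟨g, object_s2⟩⟩
    static void one() {
      callee(new A());   // site s1: only the fact conditional on ⟨q, object_s1⟩
    }                    // is propagated back to this return-node
    static void two() {
      callee(new A());   // site s2: analogously, only ⟨g, object_s2⟩ here
    }
  }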
Query. Using these conditional-points-tos, a query for CTI is answered as follows. Given a reference variable V and a program point l, the conditional-points-tos with compatible assumed-points-tos computed at l are combined to determine the possible values of V. Assumed-points-tos are compatible if and only if they do not imply different values for the same user-defined variable. For example, if V is p.f1, and the solution computed at l contains ⟨empty, ⟨p, obj1⟩⟩, ⟨z, ⟨obj1.f1, obj2⟩⟩ and ⟨u, ⟨obj1.f1, obj3⟩⟩, then the possible values of V are obj2 and obj3.
Algorithm description. Figure 1 contains a high-level description of the main loop of the basic algorithm. apply computes the effect of a statement on an incoming conditional-points-to. For example, suppose l labels the statement p.f1 = q, ndf_elm (i.e., the points-to reaching the top of l) is ⟨z, ⟨p, object_s⟩⟩, and ⟨u, ⟨q, object_n⟩⟩ is present in the solution computed at l so far. Assuming z and u are compatible, apply generates ⟨object_s.f1, object_n⟩ under the condition that both z and u hold at the entry-node of the method containing l. Then either z or u is chosen as the condition for the generated data-flow element; for example, if u is chosen, then ⟨u, ⟨object_s.f1, object_n⟩⟩ is generated. When a conjunction of conditions is associated with a points-to, any fixed-size subset of these conditions may be stored without affecting safety: at a program point where this data-flow element is used, if all the conjuncts are true then any subset of the conjuncts is also true. This may cause overestimation of the solution at program points where only a proper subset of the conjuncts is true. At present, we store only the first member of the list of conditions. apply is defined in Appendix B.

add_to_solution_and_worklist_if_needed checks whether a data-flow element is present in the solution set (computed so far) of a node. If not, it adds the data-flow element to the solution set, and puts the node along with this data-flow element on the worklist.

process_exit_node propagates data-flow elements from the exit-node of a method to the return-node of a call site of this method. Suppose ⟨z, u⟩ holds at the exit-node of a method M, and consider a return-node R of a call site C of M. For each assumed-points-to x such that ⟨x, t⟩ is in the solution set at C and t implies z at the entry-node of M, ⟨x, u⟩ is propagated by process_exit_node to R. process_exit_node is defined in Figure 2.

process_call_node propagates data-flow elements from a call site to the entry-node of a method called from this site. Due to dynamic dispatch, the set of methods invoked from a call site is iteratively computed during the data-flow analysis, as in [EGH94]. Suppose ⟨x, t⟩ holds at a call site C which has a method M in its set of invocable methods computed so far. If t implies a points-to z at the entry-node of M (e.g., through an actual-to-formal binding), ⟨z, z⟩ is forwarded to the entry-node of M. process_call_node also remembers the association between x and z at C, because this is used by process_exit_node as described above. process_call_node is defined in Figures 3 and 4.

Other functions used by the above routines are defined in Appendix B. Appendix A contains an example which illustrates the basic algorithm.
  // Initialize worklist. Each worklist node contains a data-flow element,
  // which is a conditional-points-to or reachable, and an ICFG node.
  create a worklist node containing the entry-node of main and reachable,
    and add it to the worklist;
  while ( worklist is not empty ) {
    WLnode = remove a node from the worklist;
    ndf_elm = WLnode.data_flow_element;
    node = WLnode.node;
    if ( node != a call node and node != exit node of a method ) {
      // Compute the effect of the statement associated with node on ndf_elm.
      generated_data_flow_elements = apply( node, ndf_elm );
      for ( each successor succ of node )
        for ( each df_elm in generated_data_flow_elements )
          add_to_solution_and_worklist_if_needed( df_elm, succ );
    } // end of if
    if ( node is an exit node of a method )
      process_exit_node( node, ndf_elm );
    if ( node is a call node )
      process_call_node( node, ndf_elm );
  } // end of while

Fig. 1. High-level description of the basic algorithm

Precision of the basic algorithm. By induction on the number of iterations needed to compute a data-flow element and on the length of a path associated with a data-flow element, we prove in [CRL97] that the basic algorithm computes the precise solution for programs with only single-level types and without dynamic dispatch, exceptions or threads. For programs of this form, CTI is distributive, and a conditional-points-to at a program point can never require the simultaneous occurrence of multiple conditional-points-tos at (any of) its predecessors; intuitively, this is why the above proof works. The presence of general types, dynamically dispatched calls, or exceptions with subtyping violates this condition, and hence CTI is not polynomial-time solvable in the presence of these constructs. We also prove that the basic algorithm computes a safe solution for programs written in JavaWoThreads, but without exceptions.
Complexity of the basic algorithm. The complexity of the basic algorithm for programs with only single-level types and without dynamic dispatch, exceptions or threads is O(n^5), where n is approximately the number of statements in the input program. This is an improvement over the O(n^7) worst-case bound achievable by applying the previous approaches of [RHS95] and [LR91] to this case. Note that O(n^3) is a trivial worst-case lower bound for obtaining a precise solution for this case. For programs written in JavaWoThreads, but without exceptions, the basic algorithm is polynomial-time.
Other results on the complexity of CTI in the absence of exceptions. In [CRL97], we prove the following two theorems:

Theorem 1. Intraprocedural CTI for programs with only single-level types is in non-deterministic log-space and hence in NC.
Recall that non-deterministic log-space is the set of languages accepted by non-deterministic Turing machines using logarithmic space [Pap94], and NC is the class of efficiently parallelizable problems, which contains non-deterministic log-space.

  void process_exit_node( exit_node, ndf_elm ) {
    // Let M be the method containing the exit_node.
    if ( ndf_elm represents the value of a local variable )
      // It need not be forwarded to the successors (return-nodes of call
      // sites for this method) because the local variable is not visible
      // outside this method.
      return;
    if ( ndf_elm is reachable ) {
      for ( each call site C in the current set of call sites of M ) {
        if ( solution at C contains reachable ) {
          add_to_solution_and_worklist_if_needed( ndf_elm, R );
          // R is the return-node for C.
          for ( each s in C.waiting_local_points_to_table ) {
            // Conditional-points-tos representing values of local variables
            // reaching C are not forwarded to R until R is found reachable;
            // C.waiting_local_points_to_table contains such elements.
            // Since R has been found to be reachable:
            delete s from C.waiting_local_points_to_table;
            add_to_solution_and_worklist_if_needed( s, R );
          }
        }
      }
      return;
    }
    add_to_table_of_conditions( ndf_elm, exit_node );
    // This table is accessed from the call sites of M for expanding
    // assumed-points-tos.
    for ( each call site C in the current set of call sites of M ) {
      S = get_assumed_points_tos( C, ndf_elm.assumed_points_to, M );
      for ( each assumed_points_to Apt in S ) {
        CPT = new conditional_points_to( Apt, ndf_elm.points_to );
        add_to_solution_and_worklist_if_needed( CPT, R );
        // R is the return-node for C.
      }
    }
  } // end of process_exit_node

Fig. 2. Code for processing an exit-node
Theorem 2. CTI for programs with only single-level types and dynamic dispatch is PSPACE-hard.
4 Algorithm for CTI in the presence of exceptions

In this section we extend the basic algorithm for CTI of JavaWoThreads, and discuss the complexity and precision of this extended algorithm.

Data-flow elements: The data-flow elements propagated by this extended algorithm have one of the following forms: 1. ⟨reachable⟩, 2. ⟨label, reachable⟩, 3. ⟨excp-type, reachable⟩, 4. ⟨z, u⟩, 5. ⟨label, z, u⟩, 6. ⟨excp-type, z, u⟩, 7. ⟨excp, z, obj⟩, where z and u are points-tos. The lattice for data-flow analysis associated with a program point is a subset lattice consisting of sets of these data-flow elements. In the rest of this section, we present definitions of these data-flow elements and a brief description of how they are propagated; further details are given in [CRL97]. First we describe how a throw statement is handled. Next, we describe propagation at a method exit-node. Finally, we describe how a finally statement is handled.
  void process_call_node( C, ndf_elm ) {
    // R is the return-node for call node C.
    if ( ndf_elm implies an increase in the set CM of methods invoked
         from this site ) {
      // Recall that due to dynamic dispatch, the interprocedural
      // edges are constructed on the fly, as in [EGH94].
      add this new method nM to CM;
      for ( each dfelm in the solution set of C )
        interprocedurally_propagate( dfelm, C, nM );  // defined in Figure 4
    }
    if ( ndf_elm represents the value of a local variable ) {
      if ( solution set for R contains reachable )
        // Forward ndf_elm to the return-node because (unlike C++)
        // a local variable cannot be modified by a call in Java.
        add_to_solution_and_worklist_if_needed( ndf_elm, R );
      else
        // Cannot forward till R is found to be reachable.
        add ndf_elm to C.waiting_local_points_to_table;
    }
    for ( each method M in CM )
      interprocedurally_propagate( ndf_elm, C, M );
  }

Fig. 3. Code for processing a call-node

  void interprocedurally_propagate( ndf_elm, C, M ) {
    // C is a call node, R is the return-node of C, and M is a method
    // called from C.
    if ( ndf_elm == reachable ) {
      add_to_solution_and_worklist_if_needed( ndf_elm, M.entry_node );
      if ( M.exit_node has reachable ) {
        add_to_solution_and_worklist_if_needed( ndf_elm, R );
        for ( each s in C.waiting_local_points_to_table ) {
          // Since R has been found to be reachable:
          delete s from C.waiting_local_points_to_table;
          add_to_solution_and_worklist_if_needed( s, R );
        }
      }
      propagate_conditional_points_tos_with_empty_condition( C, M );
      return;
    }
    // Get the points-tos implied by ndf_elm at the entry-node of M.
    S = get_implied_conditional_points_tos( ndf_elm, M, C );
    for ( each s in S ) {
      add_to_solution_and_worklist_if_needed( s, M.entry_node );
      add_to_table_of_assumed_points_tos( s.assumed_points_to,
                                          ndf_elm.assumed_points_to, C );
      // This table is accessed from exit-nodes of methods called from C
      // for expanding assumed-points-tos.
      if ( ndf_elm.apt is a new apt for s.apt ) {
        // apt stands for assumed-points-to
        Pts = lookup_table_of_conditions( s.assumed_points_to, M.exit_node );
        // ndf_elm.assumed_points_to is an assumed-points-to for each
        // element of Pts.
        for ( each pts in Pts ) {
          cpt = new conditional_points_to( ndf_elm.assumed_points_to, pts );
          add_to_solution_and_worklist_if_needed( cpt, R );
        }
      }
    } // end of each s in S
  }

Fig. 4. Code for interprocedurally_propagate

throw statement: In addition to the conditional-points-tos described previously, this algorithm uses another kind of conditional-points-tos, called exceptional-conditional-points-tos, which capture propagation due to exceptions. The conditional part of these points-tos consists of an exception type and an assumed-points-to (as before). Consider a throw statement l in a method Proc which throws an object of type T (the run-time type, not the declared type), and let ⟨⟨q, obj1⟩, ⟨p, obj2⟩⟩ be a conditional-points-to reaching the top of l. At the throw statement, this points-to is transformed to ⟨T, ⟨q, obj1⟩, ⟨p, obj2⟩⟩ and propagated to the exit-node of the corresponding try statement, if there is one. A precalculated catch-table at this node is checked to see if this exception (identified by its type T) can be caught by any of the corresponding catch statements. If so, this exceptional-conditional-points-to is forwarded to the entry-node of that catch statement, where it is changed back into an ordinary conditional-points-to ⟨⟨q, obj1⟩, ⟨p, obj2⟩⟩. If not, this exceptional-conditional-points-to is forwarded to the entry-node of a finally statement (if any), or to the exit-node of the innermost enclosing try, catch, finally or method body.

A throw statement also generates a data-flow element for the exception itself. Suppose the thrown object is obj and it is the thrown object under the assumed-points-to ⟨p, obj1⟩. Then ⟨excp, ⟨p, obj1⟩, obj⟩, representing the exception, is generated. Such data-flow elements are handled like the exceptional-conditional-points-tos described above; if such a data-flow element reaches the entry of a catch statement, it is used to instantiate the parameter of the catch statement.
In addition to propagating reachable (defined in Section 3), this algorithm also propagates data-flow elements of the form ⟨excp-type, reachable⟩. When ⟨reachable⟩ reaches a throw statement, it is transformed into ⟨excp-type, reachable⟩, where excp-type is a run-time type of the exception thrown; this element is then propagated like other exceptional-conditional-points-tos. If the throw is not directly contained in a try statement, the data-flow elements generated by it are propagated to the exit-node of the innermost enclosing catch, finally or method body.

exit-node of a method: At the exit-node of a method, a data-flow element of type 4, 6 or 7 is forwarded (after replacing the assumed-points-to as described in Section 3) to the return-node of a call site of this method if and only if the assumed-points-to of the data-flow element holds at the call site. At a return-node, ordinary conditional-points-tos (type 4) are handled as before. However, a data-flow element of type 6 or 7 is handled as if it were generated by a throw at this return-node.

finally statement: The semantics of exception handling in Java is more complicated than in other languages like C++ because of the finally statement. A try statement can optionally have a finally statement associated with it, which is executed no matter how the try statement terminates: normally or due to an exception. A finally statement is always entered with a reason, which could be an exception thrown in the corresponding try statement or in one of the corresponding catch statements, or leaving the try statement or one of its catch clauses by a return, (labelled) break or (labelled) continue, or by falling through. This reason is remembered on entering a finally, and unless the finally statement itself creates its own reason to exit the finally, at the exit-node of the finally this reason is used to decide control flow. If the finally itself creates its own reason to exit itself (e.g., due to an exception), this new reason overrides any previous reason for entering the finally. Also, nested finally statements cause reasons for entering them to stack up.

In order to handle this involved semantics correctly, for all data-flow elements entering a finally, the algorithm remembers the reason for entering it. For data-flow elements of type 3, 6 or 7 (enumerated above), the associated exception already represents this reason. A label is associated with data-flow elements of type 1 or 4, which represents the statement number to which control should go after exit from the finally. Thus the data-flow elements in a finally have one of the following forms: 1. ⟨label, reachable⟩, 2. ⟨excp-type, reachable⟩, 3. ⟨label, z, u⟩, 4. ⟨excp-type, z, u⟩, 5. ⟨excp, z, obj⟩. When a labelled data-flow element reaches the labelled statement, the label is dropped and the element is transformed into the corresponding unlabelled data-flow element.
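A small hypothetical example of these "reasons" (the class and method are ours): the same finally block is entered either because of a return or because of an exception, and the data-flow elements inside it carry a label or an exception type accordingly.

  class A { }

  class Reasons {
    static A a;
    static void m(boolean b) throws Exception {
      try {
        if (b) throw new Exception();  // reason: an exception of type Exception
        return;                        // reason: a return out of the try block
      } finally {
        a = new A();  // the points-to generated here is carried both with the
                      // label of the return's target and with the exception
                      // type Exception, so control flow after the finally is
                      // decided correctly for either reason
      }
    }
  }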
Inside a finally, due to the labels and exception types associated with data-flow elements, apply uses a different criterion for combining data-flow elements (at an assignment node) than the one given in Section 3: two data-flow elements ⟨x1, y1, z1⟩ and ⟨x2, y2, z2⟩ can be combined if and only if x1 and x2 represent the same exception type or the same label, and y1 and y2 are compatible (as defined in Section 3). At a call statement (inside a finally), if a data-flow element has a label or an exception type associated with it, it is treated as part of the context (assumed-points-to) and not forwarded to the target node; it is put back when assumed-points-tos are expanded at the exit-node of a method. For exceptional-conditional-points-tos and data-flow elements representing exceptions, the exceptions associated with them at the exit-node override any label or exception type associated with their assumed-points-tos at a corresponding call site. Data-flow elements of the form ⟨label, reachable⟩ or ⟨excp-type, reachable⟩ are propagated across a call if and only if ⟨reachable⟩ reaches the exit-node of one of the called methods. A mechanism similar to the one used for handling a call is used for handling a try statement nested inside a finally, because it can cause labels and exceptions to stack up; details are given in [CRL97]. If the finally generates a reason of its own for exiting itself, the previous label/exception-type associated with a data-flow element is discarded, and the new label/exception-type representing this reason for leaving the finally is associated with the data-flow element.
Example. The example in Figure 5 illustrates the above algorithm.

Precision of the extended algorithm. In [CRL97], we prove that the extended algorithm described in this section computes the precise solution for programs with only single-level types, exceptions without subtyping, and without dynamic dispatch. We also prove that this algorithm computes a safe solution for programs written in JavaWoThreads.
Complexity of the extended algorithm. The worst-case complexity of the extended algorithm for programs with only single-level types, exceptions without subtyping, and without dynamic dispatch is O(n^7). Since we have proved that the algorithm is precise for this case, this shows that this case is in P. If we disallow trys nested inside a finally, the worst-case complexity is O(n^6). For general programs written in JavaWoThreads, the extended algorithm is polynomial-time.
Complexity due to exceptions with subtyping. In [CRL97], we prove the following theorem:

Theorem 3. Intraprocedural CTI for programs with only single-level types and exceptions with subtyping is PSPACE-complete, while the interprocedural case (even) without dynamic dispatch is PSPACE-hard.
  // Note: for simplicity only a part of the solution is shown
  class A {};
  class excp_t extends Exception {};

  class base {
    public static A a;
    public static void method( A param ) throws excp_t {
      excp_t unexp;
      a = param;
  l1: unexp = new excp_t;
        // ⟨empty, ⟨unexp, object_l1⟩⟩,
        // ⟨⟨param, object_l2⟩, ⟨a, object_l2⟩⟩
      throw unexp;
        // ⟨excp, empty, object_l1⟩,
        // ⟨excp_t, ⟨param, object_l2⟩, ⟨a, object_l2⟩⟩
    }
  };

  class test {
    public static void test_method( ) {
      A local;
  l2: local = new A;
      try {
        base.method( local );
          // ⟨excp, empty, object_l1⟩, ⟨excp_t, empty, ⟨a, object_l2⟩⟩
      }
      catch( excp_t param ) {
          // ⟨empty, ⟨param, object_l1⟩⟩, ⟨empty, ⟨a, object_l2⟩⟩
  l3: }
      finally {
          // ⟨l4, empty, ⟨a, object_l2⟩⟩
      }
  l4:   // ⟨empty, ⟨a, object_l2⟩⟩
    }
  };

Fig. 5. CTI in the presence of exceptions

Theorems 2 and 3 show that, in the presence of exceptions, among all the reasonable special cases that we have considered, programs with only single-level types, exceptions without subtyping, and without dynamic dispatch comprise the only natural special case that is in P. Note that just adding subtyping for exception types and allowing overloaded catch clauses increases the complexity from P to PSPACE-hard.
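The feature that triggers this jump can be seen in a small hypothetical fragment (ours): with subtyping among exception types, which catch clause receives an exception depends on its run-time type, and the analysis must track these distinctions simultaneously.

  class E1 extends Exception { }
  class E2 extends E1 { }         // subtyping among exception types

  class CatchDemo {
    static void m(boolean b) {
      try {
        if (b) throw new E2();
        else   throw new E1();
      } catch (E2 e) {
        // entered only when the run-time type is E2
      } catch (E1 e) {
        // entered for E1 and for any other subtype of E1 not caught above
      }
    }
  }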
5 Related work

As mentioned in the introduction, no previous algorithm for pointer analysis or CTI handles exceptions. This work takes the state of the art in pointer analysis one step further by handling exceptions. Our algorithm differs from other pointer analysis and CTI algorithms [EGH94, WL95, Ruf95, PC94, PS91, CBC93, MLR+93] in the way it maintains context-sensitivity: by associating assumed-points-tos with each data-flow element, rather than using some approximation of the call stack. This way of handling context-sensitivity enables us to obtain a precise solution for the polynomial-time solvable cases, and to handle exceptions. It is similar to Landi and Ryder's [LR92] method of storing context using reaching aliases, except that our algorithm uses points-tos rather than aliases. Our algorithm also differs from approaches like [PS91, Age95] in being intraprocedurally flow-sensitive.
6 Conclusion

In this paper, we have studied the complexity of CTI for a subset of Java which includes exceptions. To the best of our knowledge, the complexity of CTI in the presence of exceptions has not been studied before. The following are the main contributions of this work (proofs are not presented in this paper, but appear in [CRL97]):
1. The first polynomial-time algorithm for CTI in the presence of exceptions which handles a robust subset of Java without threads, and C++.
2. A proof that CTI for programs with only single-level types, exceptions without subtyping, and without dynamic dispatch is in P and can be solved in O(n^7) time.
3. A proof that intraprocedural CTI for programs with only single-level types, exceptions with subtyping, and without dynamic dispatch is PSPACE-complete, and the interprocedural case is PSPACE-hard.
Additional contributions are:
1. A proof that CTI for programs with only single-level types, dynamic dispatch, and without exceptions is PSPACE-hard.
2. A proof that CTI for programs with only single-level types can be done in O(n^5) time. This is an improvement over the O(n^7) worst-case bound achievable by applying the previous approaches of [RHS95] and [LR91] to this case.
3. A proof that intraprocedural CTI for programs with only single-level types is in non-deterministic log-space and hence in NC.
References

[Age95] Ole Agesen. The Cartesian product algorithm: Simple and precise type inference of parametric polymorphism. In Proceedings of the European Conference on Object-Oriented Programming (ECOOP '95), 1995.
[And94] L. O. Andersen. Program Analysis and Specialization for the C Programming Language. PhD thesis, DIKU, University of Copenhagen, 1994. Also available as DIKU report 94/19.
[Bar78] J. M. Barth. A practical interprocedural data flow analysis algorithm. Communications of the ACM, 21(9):724–736, 1978.
[CBC93] Jong-Deok Choi, Michael Burke, and Paul Carini. Efficient flow-sensitive interprocedural computation of pointer-induced aliases and side effects. In Proceedings of the ACM SIGPLAN/SIGACT Symposium on Principles of Programming Languages, pages 232–245, January 1993.
[CRL97] Ramkrishna Chatterjee, Barbara Ryder, and William Landi. Complexity of concrete type-inference in the presence of exceptions. Technical Report DCS-TR-341, Dept. of CS, Rutgers University, September 1997.
[EGH94] Maryam Emami, Rakesh Ghiya, and Laurie J. Hendren. Context-sensitive interprocedural points-to analysis in the presence of function pointers. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 242–256, 1994.
[GJS96] James Gosling, Bill Joy, and Guy Steele. The Java Language Specification. Addison-Wesley, 1996.
[KU76] J. B. Kam and J. D. Ullman. Global data flow analysis and iterative algorithms. Journal of the ACM, 23(1):158–171, 1976.
[LR91] W. A. Landi and Barbara G. Ryder. Pointer-induced aliasing: A problem classification. In Proceedings of the ACM SIGPLAN/SIGACT Symposium on Principles of Programming Languages, pages 93–103, January 1991.
[LR92] W. A. Landi and Barbara G. Ryder. A safe approximation algorithm for interprocedural pointer aliasing. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 235–248, June 1992.
[MLR+93] T. J. Marlowe, W. A. Landi, B. G. Ryder, J. Choi, M. Burke, and P. Carini. Pointer-induced aliasing: A clarification. ACM SIGPLAN Notices, 28(9):67–70, September 1993.
[Pap94] C. H. Papadimitriou. Computational Complexity. Addison-Wesley, 1994.
[PC94] J. Plevyak and A. Chien. Precise concrete type inference for object-oriented languages. In Proceedings of the Conference on Object-Oriented Programming Systems, Languages and Applications (OOPSLA '94), pages 324–340, October 1994.
[PR96] Hemant Pande and Barbara G. Ryder. Data-flow-based virtual function resolution. In LNCS 1145, Proceedings of the Third International Symposium on Static Analysis, 1996.
[PS91] J. Palsberg and M. Schwartzbach. Object-oriented type inference. In Proceedings of the Conference on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA '91), pages 146–161, October 1991.
[RHS95] T. Reps, S. Horwitz, and M. Sagiv. Precise interprocedural dataflow analysis via graph reachability. In Proceedings of the ACM SIGPLAN/SIGACT Symposium on Principles of Programming Languages, pages 49–61, 1995.
[Ruf95] E. Ruf. Context-insensitive alias analysis reconsidered. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 13–22, June 1995.
[SH97] M. Shapiro and S. Horwitz. Fast and accurate flow-insensitive points-to analysis. In Proceedings of the ACM SIGPLAN/SIGACT Symposium on Principles of Programming Languages, pages 1–14, 1997.
[Ste96] Bjarne Steensgaard. Points-to analysis in almost linear time. In Proceedings of the ACM SIGPLAN/SIGACT Symposium on Principles of Programming Languages, pages 32–41, 1996.
[WL95] Robert P. Wilson and Monica S. Lam. Efficient context-sensitive pointer analysis for C programs. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 1–12, 1995.
[ZRL96] S. Zhang, B. G. Ryder, and W. Landi. Program decomposition for pointer aliasing: A step towards practical analyses. In Proceedings of the 4th Symposium on the Foundations of Software Engineering, October 1996.
A Example for the basic algorithm

  // Note: due to lack of space only a part of the solution is shown
  class A {};

  class B { public B field1; };

  class C { public B field1; };

  class base {
    public static A a;
    public void method( ) {
  l1: a = new A;
        // ⟨empty, ⟨a, object_l1⟩⟩
  exit_node:
    };
  };

  class derived extends base {
    public void method( ) {     // overrides base::method
  l2: a = new A;
        // ⟨empty, ⟨a, object_l2⟩⟩
  exit_node:
    };
  };

  class caller {
    public static void call( base param ) {
        // ⟨⟨param, object_l4⟩, ⟨param, object_l4⟩⟩,
        // ⟨⟨param, object_l5⟩, ⟨param, object_l5⟩⟩
  l3: param.method();
        // ⟨param, object_l4⟩ => base::method is called.
        // ⟨param, object_l5⟩ => derived::method is called.
        // ⟨empty, ⟨a, object_l1⟩⟩ is changed to
        // ⟨⟨param, object_l4⟩, ⟨a, object_l1⟩⟩ because base::method is
        // called only when ⟨param, object_l4⟩.
        // ⟨empty, ⟨a, object_l2⟩⟩ is changed to
        // ⟨⟨param, object_l5⟩, ⟨a, object_l2⟩⟩ because derived::method is
        // called only when ⟨param, object_l5⟩.
    };
  }; // end of class caller

  class potpourri {
    public static void example( C param ) {
      C local;
        // Let S = { ⟨⟨param, object_l6⟩, ⟨param, object_l6⟩⟩,
        //           ⟨⟨param, object_l7⟩, ⟨param, object_l7⟩⟩,
        //           ⟨⟨object_l6.field1, null⟩, ⟨object_l6.field1, null⟩⟩,
        //           ⟨⟨object_l7.field1, null⟩, ⟨object_l7.field1, null⟩⟩ }
        // The solution at this point is S.
      local = param;
        // Let S1 = S ∪ { ⟨⟨param, object_l6⟩, ⟨local, object_l6⟩⟩,
        //                ⟨⟨param, object_l7⟩, ⟨local, object_l7⟩⟩ }
        // The solution at this point is S1.
  l8: local.field1 = new B;
        // solution =
        // S1 ∪ { ⟨⟨param, object_l6⟩, ⟨object_l6.field1, object_l8⟩⟩,
        //        ⟨⟨param, object_l7⟩, ⟨object_l7.field1, object_l8⟩⟩,
        //        ⟨empty, ⟨object_l8.field1, null⟩⟩ }
  exit_node:
    };
  };

  class test {
    public void test1( ) {
      base p;
  l4: p = new base;
        // ⟨empty, ⟨p, object_l4⟩⟩
  l9: caller.call(p);
        // ⟨empty, ⟨p, object_l4⟩⟩.
        // At l9, ⟨p, object_l4⟩ => ⟨empty, ⟨a, object_l1⟩⟩.
  exit_node:
    };

    public void test2( ) {
      derived p;
  l5: p = new derived;
  l10: caller.call(p);
        // ⟨empty, ⟨p, object_l5⟩⟩.
        // At l10, ⟨p, object_l5⟩ => ⟨empty, ⟨a, object_l2⟩⟩.
  exit_node:
    };

    public void test3( ) {
      C q;
  l6: q = new C;
      potpourri.example( q );
    };

    public void test4( ) {
      C q;
  l7: q = new C;
      potpourri.example( q );
    };
  };
B Auxiliary functions

  // CPT stands for a conditional-points-to. DFE stands for a
  // data-flow element, which could be a CPT or `reachable'.

  set_of_data_flow_elements apply( node, rDFE ) {
    // This function computes the effect of the statement (if any)
    // associated with node on rDFE, i.e. the resulting data-flow
    // elements at the bottom of node.
    set_of_data_flow_elements retVal = empty set;
    if ( node is not an assignment node ) {
      add rDFE to retVal;
      return retVal;
    }
    if ( rDFE == reachable ) {
      add rDFE to retVal;
      if ( node unconditionally generates a conditional-points-to ) {
        // e.g. l: p = new A; unconditionally generates
        // ⟨empty, ⟨p, object_l⟩⟩.
        add this conditional-points-to to retVal;
      }
    }
    else {
      // rDFE is a conditional-points-to.
      lhs_set = combine compatible CPTs in the solution set computed so far
        (including rDFE) to generate the set of locations represented by
        the left-hand side;
        // Note: each element in lhs_set has a set of assumed-points-tos
        // associated with it.
      similarly compute rhs_set for the right-hand side;
      retVal = combine compatible elements from lhs_set and rhs_set;
        // Only one of the assumed-points-tos associated with a resulting
        // points-to is chosen as its assumed-points-to.
      if ( rDFE is not killed by this node )
        add rDFE to retVal;
    }
    return retVal;
  } // end of apply

  void add_to_table_of_conditions( rCPT, exit_node ) {
    // Each exit-node has a table condTable associated with it, which
    // stores, for each assumed-points-to, the points-tos which hold at
    // the exit-node with this assumed-points-to. This function stores
    // rCPT in this table.
  }

  void add_to_table_of_assumed_points_tos( s, condition, C ) {
    // C is a call-node, condition is an assumed-points-to, and s is a
    // points-to passed to the entry-node of a method invoked from C.
    // Each call-node has a table asPtTtable associated with it, which
    // stores, for each points-to that is passed to the entry-node of a
    // method invoked from this site, the assumed-points-tos which imply
    // this points-to at the call site. This function stores condition
    // with s in this table.
  }

  set_of_points_tos get_assumed_points_tos( C, s, M ) {
    if ( s is empty ) {
      if ( the solution set at C does not contain reachable )
        return empty set;
      else {
        if ( C is a dynamically-dispatched call site )
          return the set of assumed-points-tos for the values of the
            receiver which result in a call to method M;
        else
          return a set containing empty;
      }
    }
    else {
      return the assumed-points-tos stored with s in C.asPtTtable;
      // asPtTtable is defined in add_to_table_of_assumed_points_tos.
    }
  }

  set_of_points_tos lookup_table_of_conditions( condition, exit_node ) {
    // Returns the points-tos stored with condition in exit_node.condTable,
    // which is defined in add_to_table_of_conditions.
  }

  set_of_conditional_points_tos get_implied_conditional_points_tos( rCPT, M, C ) {
    // Returns the conditional-points-tos implied by rCPT.points_to at the
    // entry-node of method M at call-node C. This means it also performs
    // actual-to-formal binding. Note that it may return an empty set.
  }

  void propagate_conditional_points_tos_with_empty_condition( C, M ) {
    if ( C is not a dynamically dispatched call site )
      S = { empty };
    else
      S = the set of assumed-points-tos for the values of the receiver
        which result in a call to method M;
    Pts = lookup_table_of_conditions( empty, M.exit_node );
    for ( each s in S ) {
      for ( each pts in Pts ) {
        cpt = new conditional_points_to( s, pts );
        add_to_solution_and_worklist_if_needed( cpt, R );
        // R is the return-node of C.
      }
    }
  }
LANGUAGE PRIMITIVES AND TYPE DISCIPLINE FOR STRUCTURED COMMUNICATION-BASED PROGRAMMING

KOHEI HONDA*, VASCO T. VASCONCELOS†, AND MAKOTO KUBO‡

* Dept. of Computer Science, University of Edinburgh, UK.
† Dept. of Computer Science, University of Lisbon, Portugal.
‡ Dept. of Computer Science, Chiba University of Commerce, Japan.
Abstract. We introduce basic language constructs and a type discipline as a foundation of structured communication-based concurrent programming. The constructs, which are easily translatable into the summation-less asynchronous π-calculus, allow programmers to organise programs as a combination of multiple flows of (possibly unbounded) reciprocal interactions in a simple and elegant way, subsuming the preceding communication primitives such as method invocation and rendez-vous. The resulting syntactic structure is exploited by a type discipline à la ML, which offers a high-level type abstraction of interactive behaviours of programs as well as guaranteeing the compatibility of interaction patterns between processes in a well-typed program. After presenting the formal semantics, the use of the language constructs is illustrated through examples, and the basic syntactic results of the type discipline are established. Implementation concerns are also addressed.
1. Introduction

Recently, the significance of programming practice based on communication among processes has been increasing rapidly with the development of networked computing. From network protocols over the Internet, to server-client systems in local area networks, to distributed applications in the world wide web, to interaction between mobile robots, to a global banking system, the execution of complex, reciprocal communication among multiple processes has become an important element in achieving the goals of applications. Many programming languages and formalisms have been proposed so far for the description of software based on communication. As programming languages, we have CSP [7], Ada [28], languages based on Actors [2], POOL [3], ABCL [33], Concurrent Smalltalk [32], or more recently Pict and other π-calculus-based languages [6, 23, 29]. As formalisms, we have CCS [15], Theoretical CSP [8], the π-calculus [18], and other process algebras. In another vein, we have functional programming languages augmented with communication primitives, such as CML [26], dML [21], and Concurrent Haskell [12]. In these languages and formalisms, various communication primitives have been offered (such as synchronous/asynchronous message passing, remote procedure call, method call and rendez-vous), and the description of communication behaviour is done by combining these primitives. What we observe in these primitives is that, while they do express one-time interaction between processes, there is no construct to structure a series of reciprocal interactions between two parties as such. That is, the only way to represent a series of communications following a certain scenario (think of interactions between a file server and its client) is to describe them as a collection of distinct, unrelated interactions. In applications based on complex interactions among concurrent processes, which are appearing more and more these days, the lack of structuring methods results in low readability and careless bugs in final programs, apart from the case when the whole communication behaviour can be simply described as, say, a one-time remote procedure call. The situation may be illustrated in comparison with the design history
of imperative programming languages. In early imperative languages, programs were constructed as a bare sequence of primitives which correspond to machine operations, such as assignment and goto (consider early Fortran). As is well known, as more programs of large size were constructed, it became clear that such a method leads to programs without lucidity, readability or verifiability, so the notion of structured programming was proposed in the 1970s. These days, having the language constructs for structured programming is a norm in imperative languages.

Such a comparison raises the question as to whether we can have a similar structuring method in the context of communication-based concurrent programming. Its objective is to offer a basic means to describe complex interaction behaviours with clarity and discipline at a high level of abstraction, together with a formal basis for verification. Its key elements would, above all, consist of (1) the basic communication primitives (corresponding to assignment and arithmetic operations in the imperative setting), and (2) the structuring constructs to combine them (corresponding to "if-then-else" and "while"). Verification methodologies on their basis should then be developed.

The present paper proposes structuring primitives and the accompanying type discipline as a basic structuring method for communication-based concurrent programming. The proposed constructs have a simple operational semantics, and various communication patterns, including those of the conventional primitives as well as those which go beyond them, are representable as their combination with clarity and rigor at a high level of abstraction. The type discipline plays a fundamental role, guaranteeing compatibility of interaction patterns among processes via a type inference algorithm in the line of ML [19]. Concretely, our proposal consists of the following key ideas.

– A basic structuring concept for communication-based programming, called session. A session is a chain of dyadic interactions whose collection constitutes a program. A session is designated by a private port called a channel, through which interactions belonging to that session are performed. Channels form a distinct syntactic domain; their differentiation from the usual port names is a basic design decision we take to make the logical structure of programs explicit. Other than session, the standard structuring constructs for concurrent programming, parallel composition, name hiding, conditional and recursion, are provided. In particular, the combination of recursion and session allows the expression of an unbounded thread of interactions as a single abstraction unit.

– Three basic communication primitives: value passing, label branching, and delegation. The first is the standard synchronous message passing as found in e.g. CSP or the π-calculus. The second is a purified form of method invocation, deprived of value passing. The third is for passing a channel to another process, thus allowing a programmer to dynamically distribute a single session among multiple processes in a well-structured way. Together with the session structure, the combination of these primitives allows the flexible description of complex communication structures with clarity and discipline.

– A basic type discipline for the communication primitives, as an indispensable element of the structuring method. The typability of a program ensures that two possibly communicating processes always own compatible communication patterns. For example, a procedural call has a pattern of output-input from the caller's viewpoint; the callee should then have a communication pattern of input-output. Because incompatibility of interaction patterns between processes would be one of the main sources of bugs in communication-based programming, we believe such a type discipline has important pragmatic significance. The derived type gives a high-level abstraction of the interactive behaviours of a program.

Because communication between processes over the network can take place between modules written in different programming languages, the proposed communication constructs may as well be used by embedding them in various programming languages; however, for simplicity we present them as a self-contained small programming language, which has been stripped to the barest minimum necessary for explanation of its novel features. Using the language, the basic concept of the proposed constructs is illustrated through programming examples. They show how extant communication primitives such as remote procedure call and method invocation can be concisely expressed as sessions of specific patterns (hence with specific type abstractions). They also show how sessions can represent those communication structures which do not conform to the preceding primitives but which would naturally arise in practice. This suggests that session-based program organisation may constitute a synthesis of a number of familiar as well as new programming ideas concerning communication. We also show that the proposed constructs are simply translatable into the asynchronous polyadic π-calculus with branching [31], which suggests the feasibility of implementation in a distributed environment. Yet much remains to be studied concerning the proposed structuring method, including the efficient implementation of the constructs and the accompanying reasoning principles; see Section 6 for further discussion.

The technical content of the present paper is developed on the basis of the preceding proposal [27] due to Kaku Takeuchi and the present authors. The main contributions of the present proposal in comparison with [27] are: the generalisation of the session structure by delegation and recursive sessions, which definitely enlarges the applicability of the proposed structuring method; the typing system incorporating these novel concepts; the representation of conventional communication primitives by the structuring constructs; and basic programming examples which show how the constructs, in particular through the use of the above-mentioned novel features, can lucidly represent complex communication structures which would not be amenable to conventional communication primitives.

In the rest of the paper, Section 2 introduces the language primitives and their operational semantics. Section 3 illustrates how the primitives allow clean representation of extant communication primitives. Section 4 shows how the primitives can represent interaction structures beyond those of the conventional communication primitives, through key programming examples. Section 5 presents the typing system and establishes the basic syntactic results. Section 6 concludes with discussions of implementation concerns, related works and further issues.

2. Syntax and Operational Semantics

2.1. Basic Concepts. The central idea in the present structuring method is a session. A session is a series of reciprocal interactions between two parties, possibly with branching and recursion, and serves as a unit of abstraction for describing interaction. Communications belonging to a session are done via a port specific to that session, called a channel. A fresh channel is generated when initiating each session, for use in the communications within the session. We use the following syntax for the initiation of a session.
  request a(k) in P        accept a(k) in P        (initiation of session)
The request first requests, via a name a, the initiation of a session as well as the generation of a fresh channel; P would then use the channel for later communications. The accept, on the other hand, receives the request for the initiation of a session via a and generates a new channel k, which would be used for communications in P. In the above grammar, the parenthesis (k) and the keyword in show the binding and its scope.
Thus, in request a(k) in P, the part (k) binds the free occurrences of k in P. This convention is used uniformly throughout the present paper. Via a channel of a session, three kinds of atomic interactions are performed: value sending (including name passing), branching and channel passing (or delegation).

  k![e1 … en]; P       k?(x1 … xn) in P              data sending/receiving
  k ◁ l; P             k ▷ {l1 : P1 [] … [] ln : Pn}  label selection/branching
  throw k[k′]; P       catch k(k′) in P              channel sending/receiving (delegation)

Data sending/receiving is the standard synchronous message passing. Here ei denotes an expression such as an arithmetic/boolean formula as well as names. We assume the variables x1, …, xn are all distinct. We do not consider program passing, for simplicity, cf. [11]. Branching/selection is the minimisation of method invocation in object-based programming; l1, …, ln are labels which are assumed to be pairwise distinct. Channel sending/receiving, which we often call delegation, passes a channel which is being used in a session to another process, thus radically changing the structure of a session. Delegation is the generalisation of the concept with the same name originally conceived in concurrent object-oriented programming [34]; see Section 4.3 for detailed discussions. In passing, we note that its treatment distinct from the usual value passing is essential for both disciplined programming and a tractable type inference. Communication primitives, organised by sessions, are further combined by the following standard constructs in concurrent programming.

  P1 | P2                                                concurrent composition
  (νa)P    (νk)P                                         name/channel hiding
  if e then P else Q                                     conditional
  def X1(x̃1 k̃1) = P1 and … and Xn(x̃n k̃n) = Pn in P   recursion

We do not need sequencing, since each communication primitive already accompanies one. We also use inact, the inaction, which denotes the lack of action (acting as the unit of |). Hiding declares a name/channel to be local in its scope (here P). Channel hiding may not be used for usual programming, but is needed for the operational semantics presented later. In a conditional, e should be a boolean expression. In recursion, X, a process variable, may occur in P1, …, Pn and P zero or more times. Identifiers in x̃i k̃i should be pairwise distinct. We could use replication (or a single recursion) to achieve the same effect, but multiple recursion is preferable for well-structured programs. This finishes the introduction of all language constructs we shall use in this paper. We give a simple example of a program.

  accept a(k) in k![1]; k?(y) in P  |  request a(k) in k?(x) in k![x + 1]; inact

The first process receives a request for a new session via a, generates k, sends 1 and receives a return value via k, while the second requests the initiation of a session via a, receives a value via the generated channel, then returns the result of adding 1 to the value. Observe the compatibility of the communication patterns between the two processes.
2.2. Syntax Summary. We summarise the syntax we have introduced so far. The base sets are: names, ranged over by a, b, ...; channels, ranged over by k, k′, ...; variables, ranged over by x, y, ...; constants (including names, integers and booleans), ranged over by c, c′, ...; expressions (including constants), ranged over by e, e′, ...; labels, ranged over by l, l′, ...; and process variables, ranged over by X, Y, .... We let u, u′, ... denote names and channels. Then processes, ranged over by P, Q, ..., are given by the following grammar.
  P ::= request a(k) in P                              session request
      | accept a(k) in P                               session acceptance
      | k![ẽ]; P                                       data sending
      | k?(x̃) in P                                    data reception
      | k ◁ l; P                                       label selection
      | k ▷ {l1 : P1 [] ... [] ln : Pn}                label branching
      | throw k[k′]; P                                 channel sending
      | catch k(k′) in P                               channel reception
      | if e then P else Q                             conditional branch
      | P | Q                                          parallel composition
      | inact                                          inaction
      | (νu)P                                          name/channel hiding
      | def D in P                                     recursion
      | X[ẽk̃]                                        process variables

  D ::= X1(x̃1 k̃1) = P1 and ... and Xn(x̃n k̃n) = Pn   declaration for recursion

The association of | is the weakest, the others being the same. Parentheses (·) denote binders, which bind the corresponding free occurrences. The standard simultaneous substitution is used, writing e.g. P[c̃/x̃]. The sets of free names/channels/variables/process variables of, say, P, defined in the standard way, are respectively denoted by fn(P), fc(P), fv(P) and fpv(P). The alpha-equality is written ≡α. We also set fu(P) =df fc(P) ∪ fn(P). Processes without free variables or free channels are called programs.

2.3. Operational Semantics. For the concise definition of the operational semantics of the language constructs, we introduce the structural equality ≡ (cf. [4, 16]), which is the smallest congruence relation on processes including the following equations.

1. P ≡ Q if P ≡α Q.
2. P | inact ≡ P,  P | Q ≡ Q | P,  (P | Q) | R ≡ P | (Q | R).
3. (νu)inact ≡ inact,  (νuu)P ≡ (νu)P,  (νuu′)P ≡ (νu′u)P,  ((νu)P) | Q ≡ (νu)(P | Q) if u ∉ fu(Q),  (νu)(def D in P) ≡ def D in (νu)P if u ∉ fu(D).
4. (def D in P) | Q ≡ def D in (P | Q) if fpv(D) ∩ fpv(Q) = ∅.
5. def D in (def D′ in P) ≡ def D and D′ in P if fpv(D) ∩ fpv(D′) = ∅.

Now the operational semantics is given by the reduction relation →, denoted P → Q, which is the smallest relation on processes generated by the following rules.

  [Link]   (accept a(k) in P1) | (request a(k) in P2) → (νk)(P1 | P2)
  [Com]    (k![ẽ]; P1) | (k?(x̃) in P2) → P1 | P2[c̃/x̃]   (ẽ ↓ c̃)
  [Label]  (k ◁ li; P) | (k ▷ {l1 : P1 [] ... [] ln : Pn}) → P | Pi   (1 ≤ i ≤ n)
  [Pass]   (throw k[k′]; P1) | (catch k(k′) in P2) → P1 | P2
  [If1]    if e then P1 else P2 → P1   (e ↓ true)
  [If2]    if e then P1 else P2 → P2   (e ↓ false)
  [Def]    def D in (X[ẽk̃] | Q) → def D in (P[c̃/x̃] | Q)   (ẽ ↓ c̃, X(x̃k̃) = P ∈ D)
  [Scop]   P → P′ implies (νu)P → (νu)P′
  [Par]    P → P′ implies P | Q → P′ | Q
  [Str]    P ≡ P′, P′ → Q′ and Q′ ≡ Q imply P → Q

Above we assume the standard evaluation relation ↓ on expressions is given. We write →* for the reflexive, transitive closure of →. Note how a fresh channel is generated in the [Link]
rule as the result of interaction (in this way request a(k) in P corresponds to the bound output in the π-calculus [18]). In the [Label] rule, one of the branches is selected, discarding the remaining ones. Note we do not allow reduction under the various communication prefixes, as in standard process calculi. As an example, the simple program in 2.1 has the following reduction (below and henceforth we omit trailing inactions).

  accept a(k) in k![1]; k?(y) in P | request a(k) in k?(x) in k![x + 1]
    → (νk)(k![1]; k?(y) in P | k?(x) in k![x + 1])
    → (νk)(k?(y) in P | k![1 + 1])
    → P[2/y].

Observe how interaction proceeds in a lock-step fashion. This is due to the synchronous form of the present communication primitives.

3. Representing Communication (1): Extant Communication Primitives

This section discusses how the structuring primitives can represent with ease the communication patterns of conventional communication primitives which have been in use in programming languages. The representations show how we can understand the extant communication patterns as fixed ways of combining the proposed primitives.

3.1. Call-return. The call-return is a widely used communication pattern in which a process calls another process, and the callee, after some processing, returns some value to the caller. Usually the caller just waits until the reply message arrives. This concept is widely used as a basic primitive in distributed computing under the name of remote procedure call. We may use the following pair of notations for call-return.

  x = call f[e1 ... en] in P        fun f(x1 ... xn) in P

On the right, the return command return[e] would occur in P. Assume these constructs are added to the syntax in 2.2. A simple programming example follows.

Example 3.1 (Factorial).

  Fact(f) = fun f(x) in if x = 1 then return[1]
            else (νb)(Fact[b] | y = call b[x − 1] in return[x × y])

Here and henceforth we write X(x̃k̃) = P, or a sequence of such equations, for the declaration part of a recursion, leaving the body part implicit. This example implements the standard recursive algorithm for the factorial function. y = call f[5] in P would give its client process. The communication patterns based on call-return are easily representable by the combination of request/accept and send/receive. We first show the mapping of the caller. Below ⟦·⟧ denotes the inductive mapping into the structuring primitives.

  ⟦x = call a[ẽ] in P⟧ =df request a(k) in k![ẽ]; k?(x) in ⟦P⟧.

Naturally we assume k is chosen fresh. The basic scenario is that a caller first initiates the session, sends the values, and waits until it receives the answer on the same channel. On the other hand, the callee is translated as follows:

  ⟦fun a(x̃) in P⟧ =df accept a(k) in k?(x̃) in ⟦P⟧[k![e]/return[e]].

Here [k![e]/return[e]] denotes the syntactic substitution of k![e] for each return[e]. Observe that the original "return" operation is expressed by the "send" primitive of
the structuring primitives. As an example, let us see how the factorial program in Example 3.1 is translated by the above mapping.

Example 3.2 (Factorial, translation).

  Fact(f) = accept f(k) in k?(x) in if x = 1 then k![1]
            else (νb)(Fact[b] | request b(k′) in k′![x − 1]; k′?(y) in k![x × y])

Notice how the usage of k′ and k differentiates the two contexts of communication. If we compose the translation of the factorial with that of its user, we have the reduction:

  Fact[f] | ⟦y = call f[3] in P⟧ →* Fact[f] | ⟦P⟧[6/y].

In this way, the semantics of the synchronous call-return is given by that of the structuring primitives via the translation. Some observations follow. (1) The significance of the specific notations for call-return would lie in the declaration of the assumed fixed communication pattern, which would enhance readability and help verification. At the same time, the translation retains the same level of clarity as the original code, even if it does need a few additional keystrokes. (2) In the translation, the caller and the callee in general own complementary communication patterns, i.e. input-output meets output-input. However, if for example the return command appears twice in the callee, the complementarity is lost. This relates to the notion of types we discuss later: indeed, non-complementary patterns are rejected by the typing system. (3) The translation also suggests how the structuring primitives generalise the traditional call-return structure. That is, in addition to the fixed pattern of input-output (or, correspondingly, output-input), we can have a sequence of interactions of indefinite length. For such programming examples, see Section 4.

3.2. Method Invocation. The idea of method invocation originates in object-oriented languages, where a caller calls an object by specifying a method name, while the object waits with a few methods together with the associated code, so that, when invoked, it executes the code corresponding to the method name. The call may or may not result in returning an answer. As a notation, an "object" would be written:

  obj a : {l1(x̃1) in P1 [] ... [] ln(x̃n) in Pn},

where a gives an object identifier, l1, ..., ln are labels (all pairwise distinct) with formal parameters x̃i, and Pi gives the code corresponding to each li. The return action, written return[e], may occur in each Pi as necessary. A caller then becomes:

  x = a.li[ẽ] in P        a.li[ẽ]; P.

The left-hand side denotes a process which invokes the object with a method li together with arguments ẽ, then, receiving the answer, assigns the result to x, and finally executes the continuation P. The right-hand side is a notation for the case when the invocation does not need the return value. As an example, let us program a simple cell.

Example 3.3 (Cell).

  Cell(a, c) = obj a : {read() in (return[c] | Cell[a, c]) [] write(x) in Cell[a, x]}.

The cell Cell[a, c] denotes an object which saves its value in c and has the name a. There are two methods, read and write, each with the obvious functionality. A caller which "reads" this object would be written, for example, x = a.read[] in P.
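For instance (an illustration of ours, in the notation just introduced, not from the original text), a client that increments the cell can be written by sequencing two invocations:

  x = a.read[] in (a.write[x + 1]; inact),

where the first invocation expects a return value and the second does not; under the translation below, each invocation opens its own session with the object.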
The method invocation can be represented in the structuring constructs by combining label branching and value passing. We show the translations of an object, a caller which expects the return, and a caller which does not expect the return, in this order.
  ⟦obj a : {l1(x̃1) in P1 [] ... [] ln(x̃n) in Pn}⟧ =df accept a(k) in k ▷ {l1 : k?(x̃1) in ⟦P1⟧σ [] ... [] ln : k?(x̃n) in ⟦Pn⟧σ}
  ⟦x = a.li[ẽ] in P⟧ =df request a(k) in k ◁ li; k![ẽ]; k?(x) in ⟦P⟧
  ⟦a.li[ẽ]; P⟧ =df request a(k) in k ◁ li; k![ẽ]; ⟦P⟧

In each translation, k should be fresh. In the first equation, σ denotes the substitution [k![e]/return[e]], replacing all occurrences of return[e] by k![e]. Observe that a method invocation is decomposed into label branching and value passing. Using this mapping, the cell is now translated into:

Example 3.4 (Cell, translation).
  Cell(a, c) = accept a(k) in k ▷ {read : k![c]; Cell[a, c] [] write : k?(x) in Cell[a, x]}.

Similarly, x = a.read[] in P is translated as request a(k) in k ◁ read; k?(x) in ⟦P⟧, while a.write[3] becomes request a(k) in k ◁ write; k![3]. Some observations follow. (1) The translation is not much more complex than the original text. The specific notation, however, has the role of declaring the fixed communication pattern and saves keystrokes. (2) Here again, the translations of the caller and the callee in general result in complementary patterns. There are however two main cases where the complementarity is lost: one due to the existence/lack of the return statement (e.g. x = a.write[3] in Q above), and another due to the invocation of an object with a non-existent method. Again, such inconsistency is detectable by the typing system introduced later. (3) The translation suggests how the structuring primitives can generalise the standard method invocation. For example, an object may in turn invoke a method of the caller after being invoked. We believe that, in programming practice based on the idea of interacting objects, such reciprocal, continuous interaction would arise naturally and usefully. Such examples are discussed in the next section. In addition to call-return and method invocation, we can similarly represent other communication patterns in use with ease, for example asynchronous call-return (which includes rendez-vous [28] and future [2]) and simple message passing. For reasons of space we leave their treatment to [11]. Section 6 gives a brief discussion of the significance of specific notations for these fixed communication patterns in language design.

4. Representing Communication (2): Complex Communication Patterns

This section shows how the structuring primitives can cleanly represent various complex communication patterns which go beyond those representable by conventional communication primitives.

4.1. Continuous Interactions. The traditional call-return primitive already encapsulates a sequence of communications, albeit a simple one, as a single abstraction unit. The key possibility of the session structure lies in that it extends this idea to arbitrarily complex communication patterns, including the case when multiple interaction sequences interleave with each other. The following shows one such example, which describes the behaviour of a banking service towards its user (essentially an automatic teller machine).
Example 4.1 (ATM).
  ATM(a, b) = accept a(k) in k?(id) in
      k ▷ {deposit : request b(h) in k?(amt) in h ◁ deposit; h![id, amt]; ATM[a, b]
        [] withdraw : request b(h) in k?(amt) in h ◁ withdraw; h![id, amt];
               h ▷ {success : k ◁ dispense; k![amt]; ATM[a, b]
                 [] failure : k ◁ overdraft; ATM[a, b]}
        [] balance : request b(h) in h ◁ balance; h?(amt) in k![amt]; ATM[a, b]}

The program, after establishing a session with the user via a, first lets the user input the user code (we omit such details as the verification of the code), then offers three choices: deposit, withdraw, and balance. When the user selects one of them, the corresponding code is executed. For example, when withdraw is chosen, the program lets the user enter the amount to be withdrawn, then interacts with the bank via b, asking with the user code and the amount. If the bank answers that there is enough in the account, the money is given to the user. If not, the overdraft message results. In either case the system eventually returns to the original waiting mode. Note in particular that the program communicates with the bank in the midst of its interaction with the user, so three parties are involved in the interaction as a whole. A user may be written as:

  request a(k) in k![myId]; k ◁ withdraw; k![58];
      k ▷ {dispense : k?(amt) in P [] overdraft : Q}.
Here we may consider Q as code for "exception handling" (invoked when the balance is less than expected). Notice also that the interactions are now truly reciprocal.

4.2. Unbounded Interactions. The previous example shows how the structuring primitives can easily describe a situation which would be difficult to program in a clean way using the conventional primitives. At the same time, the code does have room for amelioration: if a user wants to look at his balance before he withdraws, he has to enter his user code twice. The following refinement makes this redundancy unnecessary.

Example 4.2 (Kind ATM).

  ATM′(a, b) = accept a(k) in k?(id) in Actions[a, b, id, k]
  Actions(a, b, id, k) =
      k ▷ {deposit : request b(h) in k?(amt) in h ◁ deposit; h![id, amt]; Actions[a, b, id, k]
        [] withdraw : request b(h) in k?(amt) in h ◁ withdraw; h![id]; h![amt];
               h ▷ {success : k ◁ dispense; k![amt]; Actions[a, b, id, k]
                 [] failure : k ◁ overdraft; Actions[a, b, id, k]}
        [] balance : request b(h) in h ◁ balance; h?(amt) in k![amt]; Actions[a, b, id, k]
        [] quit : ATM′[a, b]}
As can be seen, the main difference lies in that the process stays within the same session even after returning to the waiting mode after processing each request: once a user establishes the session and enters the user code, she can request various services as many times as she likes. To exit from this loop, the branch quit is introduced. This example shows how recursion within a session allows a flexible description of interactive behaviour which goes beyond the recursion in usual objects (where each session consists of a fixed number of interactions). It is notable that the unbounded session owns a rigorous syntactic structure so as to allow type abstraction; see Section 5. Unbounded interactions with a fixed pattern naturally arise in practice, e.g. the interactions between a file server and its client. We believe recursion within a session can be effectively used in varied programming practice.

4.3. Delegation. The original idea of delegation in object-based concurrent programming [34] allows an object to delegate the processing of a request it receives to another object. Its basic purpose is the distribution of processing, while maintaining the transparency of the name space for the clients of the service. Practically it can be used to enhance modularity, to express exception handling, and to increase concurrency. The following example shows how we can generalise the original notion in the present setting.

Example 4.3 (Ftp server). Below ∏n P denotes the n-fold parallel composition of P.

  Init(pid, nis) = (νb)(Ftpd[pid, b] | ∏n FtpThread[b, nis])
  Ftpd(pid, b) = accept pid(s) in request b(k) in throw k[s]; Ftpd[pid, b]
  FtpThread(b, nis) = accept b(k) in catch k(s) in s?(userid, passwd) in
      request nis(j) in j ◁ checkUser; j![userid, passwd];
      j ▷ {invalid : s ◁ sorry [] valid : s ◁ welcome; Actions[s, b]}
  Actions(s, b) = s ▷ {get : ...; s?(file) in ...; Actions[s, b]
        [] put : ...; s?(file) in ...; Actions[s, b]
        [] bye : ...; FtpThread[b, nis]}

The above code shows an outline of an ftp server, which follows the behaviour of the standard ftp servers with the TCP/IP protocol. Initially the program Init generates a server Ftpd and n threads with identical behaviour (for simplicity we assume all threads share the same name b). Suppose, in this situation, the server receives a request for a service from some client, establishing the session channel s. The server then requests another session with an idle thread (if one exists) and "throws" the channel s to that thread, while itself getting ready for the next request from clients. It is now the thread FtpThread which actually processes the user's request, receiving the user name, referring to NIS, and executing various operations (note that recursion within a session is used). Here delegation is used to enable the ftp server to process multiple requests concurrently without undue delay in response. The scheme is generally applicable to a server interacting with many clients. Some observations follow. (1) The example shows how the generalised delegation allows programmers to cleanly describe interaction patterns which generalise the original form of delegation. Other examples of the usage of delegation abound, for example a file server with geographically distributed sites, or a server with multiple services each to be processed by a different sub-server. (2) A key factor of the above code is that a client does not have to be conscious of the delegation which takes place on the server's side: that is, a client program can be
written as if it were interacting with a single entity, for example as follows.

  request pid(s) in s![myId]; s ▷ {sorry : ... [] welcome : ...}

Observe that, between the initial request and the next sending operation, the catch/throw interaction takes place on the server's side; the client process, however, does not have to be conscious of this event. This shows how delegation enables the distribution of computation while maintaining the transparency of the name space. (3) If we allow each ftp-thread to be dynamically generated, we can use parallel composition to the same effect, just as "fork" is used to pass process resources in UNIX. While this scheme has a limitation in that we cannot send a channel to an already running process, it offers another programming method to realise flexible, dynamic communication structures. We also observe that the use of throw/catch, or of the "fork" mentioned above, would result in complexly woven sequences of interactions, which would become more error-prone than without. In such situations, the type discipline discussed in the next section becomes an indispensable tool for programming, with which we can algorithmically verify whether a program has a coherent communication structure and, in particular, whether it contains interaction errors.

5. The Type Discipline

5.1. Preliminaries. The present structuring method allows the clear description of complex interaction structures beyond conventional communication primitives. The more complex the interaction becomes, however, the more difficult it is to capture the whole interactive behaviour and to write correct programs. The type discipline we discuss in this section gives a simple solution to these issues at a basic level. We first introduce the basic notions concerning types, including the duality on types which represents the complementarity of interactions.

Definition 5.1 (Types). Given type variables (t, t′, ...) and sort variables (s, s′, ...), sorts (S, S′, ...) and types (α, β, ...) are defined by the following grammar.

  S ::= nat | bool | ⟨α, ᾱ⟩ | s | μs.S
  α ::= ↓[S̃]; α | ↓[α]; β | &{l1 : α1, ..., ln : αn} | 1 | ⊥
      | ↑[S̃]; α | ↑[α]; β | ⊕{l1 : α1, ..., ln : αn} | t | μt.α

For a type α in which ⊥ does not occur, we define ᾱ, the co-type of α, as follows: the co-type of ↑[S̃]; α is ↓[S̃]; ᾱ, and that of ↓[S̃]; α is ↑[S̃]; ᾱ; the co-type of ⊕{li : αi} is &{li : ᾱi}, and that of &{li : αi} is ⊕{li : ᾱi}; the co-type of ↑[α]; β is ↓[α]; β̄, and that of ↓[α]; β is ↑[α]; β̄; finally, the co-type of 1 is 1, that of t is t, and that of μt.α is μt.ᾱ.

A sorting (resp. a typing, resp. a basis) is a finite partial map from names and variables to sorts (resp. from channels to types, resp. from process variables to sequences of sorts and types). We let Γ, Γ′, ... (resp. Δ, Δ′, ..., resp. Θ, Θ′, ...) range over sortings (resp. typings, resp. bases). We regard types and sorts as representing the corresponding regular trees in the standard way [5], and consider their equality in terms of such representations. A sort of the form ⟨α, ᾱ⟩ represents two complementary structures of interaction which are associated with a name (one denoting the behaviour starting with accept, the other that which starts with request), while a type gives the abstraction of the interaction to be done through a channel. Note that the co-type of ᾱ is again α whenever ᾱ is defined. ⊥ is a specific type indicating that no further connection is possible at a given name.
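As a small illustration (ours; the same pair reappears in Section 5.3), applying these equations to α = ↑[nat]; ↓[nat]; 1 gives

  ᾱ = ↓[nat]; ↑[nat]; 1.

That is, where one party first sends a nat and then receives one, its complementary party first receives a nat and then sends one; these are precisely the user and server types of the factorial example in Section 5.3.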
The following partial algebra on the set of typings, cf. [10], plays a key role in our typing system.

Definition 5.2 (Type algebra). Typings Δ1 and Δ2 are compatible, written Δ1 ≍ Δ2, if Δ1(k) is the co-type of Δ2(k) for each k ∈ dom(Δ1) ∩ dom(Δ2). When Δ1 ≍ Δ2, the composition of Δ1 and Δ2, written Δ1 ∘ Δ2, is given as the typing such that (Δ1 ∘ Δ2)(k) is (1) ⊥, when k ∈ dom(Δ1) ∩ dom(Δ2); (2) Δi(k), if k ∈ dom(Δi) \ dom(Δi+1 mod 2) for i ∈ {1, 2}; and (3) undefined otherwise.

Compatibility means that each common channel k is associated with complementary behaviours, thus ensuring that the interaction at k runs without errors. When composed, the type for k becomes ⊥, preventing further connection at k (note ⊥ has no co-type). One can check that the partial operation ∘ is partially commutative and associative.
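For example (an illustration of ours), for the typings Δ1 = k : ↑[nat]; ↓[nat]; 1 and Δ2 = k : ↓[nat]; ↑[nat]; 1 we have Δ1 ≍ Δ2, since Δ2(k) is the co-type of Δ1(k); their composition is

  Δ1 ∘ Δ2 = k : ⊥,

so that no third process can interact at k.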
5.2. Typing System. The main sequent of our typing system has the form Θ; Γ ⊢ P ▷ Δ, which reads: "under the environment Θ; Γ, a process P has the typing Δ." The sorting Γ specifies the protocols at the free names of P, while the typing Δ specifies P's behaviour at its free channels. When P is a program, Δ and Θ are both empty. Given a typing or a sorting, say Δ, we write Δ · s : p for Δ ∪ {s : p} together with the condition that s ∉ dom(Δ), and Δ\s for the result of taking off s : Δ(s) from Δ if Δ(s) is defined (if not, we take Δ itself). We also assume given the evident inference rules for arithmetic and boolean expressions, whose sequent has the form Γ ⊢ e ▷ S, enjoying standard properties such as: Γ ⊢ e ▷ S and e ↓ c imply Γ ⊢ c ▷ S. The main definition of this section follows.

Definition 5.3 (Basic typing system). The typing system is defined by the axioms and rules in Figure 1, where we assume the range of Δ in [Inact] and [Var] contains only 1 and ⊥. For simplicity, the rule [Def] is restricted to single recursion, which is easily extendible to multiple recursion. If Θ; Γ ⊢ P ▷ Δ is derivable in the system, we say P is typable under Θ; Γ with Δ, or simply P is typable.

Some comments on the typing system follow. (1) In the typing system, the left-hand side of the turnstile is for shared names and variables (the "classical realm"), while the right-hand side is for channels sharable only by two complementary parties (a variant of the "linear realm"). It differs from various sorting disciplines in that a channel k is in general ill-sorted, e.g. it may carry an integer at one time and a boolean at another. In spite of this, the manipulation of the linear realm by the typing algebra ensures the linearised usage of channels, as well as the prevention of interaction errors, cf. Theorem 5.4 below. (2) In [Thr], the behaviour represented by α for the channel k′ is actually performed by the process which "catches" k′ (note k′ cannot occur free in Δ, hence neither in P, by our convention on "·"). To capture the interactions at k′ as a whole, k′ : α is added to the linear realm. On the other hand, the rule [Cat] guarantees that the receiving side does use the channel k′ as prescribed. Reading from the conclusions to the antecedents, [Thr] and [Cat] together illustrate how k′ is "thrown" from the left to the right. (3) The simplicity of the typing rules is notable, utilising the explicit syntactic structure of sessions. In particular, the system is syntax-directed and has principal types when we use a version of kinding [22]. It is then easy to show that there is a typing algorithm à la ML, which extracts the principal type of a given process iff it is typable. It should be noted that the simplicity and tractability of the typing rules do not mean that the obtainable type information is uninteresting: the resulting type abstraction richly represents the interactive behaviour of programs, as the later examples exhibit.
  [Acc]   Θ; Γ, a : ⟨α, ᾱ⟩ ⊢ P ▷ Δ · k : α  ⟹  Θ; Γ, a : ⟨α, ᾱ⟩ ⊢ accept a(k) in P ▷ Δ
  [Req]   Θ; Γ, a : ⟨α, ᾱ⟩ ⊢ P ▷ Δ · k : ᾱ  ⟹  Θ; Γ, a : ⟨α, ᾱ⟩ ⊢ request a(k) in P ▷ Δ
  [Send]  Γ ⊢ ẽ ▷ S̃ and Θ; Γ ⊢ P ▷ Δ · k : α  ⟹  Θ; Γ ⊢ k![ẽ]; P ▷ Δ · k : ↑[S̃]; α
  [Rcv]   Θ; Γ, x̃ : S̃ ⊢ P ▷ Δ · k : α  ⟹  Θ; Γ ⊢ k?(x̃) in P ▷ Δ · k : ↓[S̃]; α
  [Br]    Θ; Γ ⊢ P1 ▷ Δ · k : α1, ..., Θ; Γ ⊢ Pn ▷ Δ · k : αn  ⟹  Θ; Γ ⊢ k ▷ {l1 : P1 [] ... [] ln : Pn} ▷ Δ · k : &{l1 : α1, ..., ln : αn}
  [Sel]   Θ; Γ ⊢ P ▷ Δ · k : αj (1 ≤ j ≤ n)  ⟹  Θ; Γ ⊢ k ◁ lj; P ▷ Δ · k : ⊕{l1 : α1, ..., ln : αn}
  [Thr]   Θ; Γ ⊢ P ▷ Δ · k : β  ⟹  Θ; Γ ⊢ throw k[k′]; P ▷ Δ · k : ↑[α]; β · k′ : α
  [Cat]   Θ; Γ ⊢ P ▷ Δ · k : β · k′ : α  ⟹  Θ; Γ ⊢ catch k(k′) in P ▷ Δ · k : ↓[α]; β
  [Conc]  Θ; Γ ⊢ P ▷ Δ and Θ; Γ ⊢ Q ▷ Δ′ with Δ ≍ Δ′  ⟹  Θ; Γ ⊢ P | Q ▷ Δ ∘ Δ′
  [If]    Γ ⊢ e ▷ bool, Θ; Γ ⊢ P ▷ Δ and Θ; Γ ⊢ Q ▷ Δ  ⟹  Θ; Γ ⊢ if e then P else Q ▷ Δ
  [Inact] Θ; Γ ⊢ inact ▷ Δ
  [NRes]  Θ; Γ, a : S ⊢ P ▷ Δ  ⟹  Θ; Γ ⊢ (νa)P ▷ Δ
  [CRes]  Θ; Γ ⊢ P ▷ Δ · k : ⊥  ⟹  Θ; Γ ⊢ (νk)P ▷ Δ
  [Var]   Γ ⊢ ẽ ▷ S̃  ⟹  Θ · X : S̃α̃; Γ ⊢ X[ẽk̃] ▷ Δ · k̃ : α̃
  [Def]   Θ; Γ, x̃ : S̃ ⊢ P ▷ k̃ : α̃ and Θ; Γ ⊢ Q ▷ Δ with Θ(X) = S̃α̃  ⟹  Θ\X; Γ ⊢ def X(x̃k̃) = P in Q ▷ Δ

Figure 1. The typing system
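To see how the rules combine, here is a sketch of ours for the simple program of Section 2.1, writing α = ↑[nat]; ↓[nat]; 1:

  Θ; Γ, y : nat ⊢ inact ▷ k : 1                               by [Inact]
  Θ; Γ ⊢ k?(y) in inact ▷ k : ↓[nat]; 1                       by [Rcv]
  Θ; Γ ⊢ k![1]; k?(y) in inact ▷ k : α                        by [Send]
  Θ; Γ ⊢ accept a(k) in k![1]; k?(y) in inact ▷ ∅             by [Acc], with Γ(a) = ⟨α, ᾱ⟩

The request side symmetrically obtains k : ᾱ and then the empty typing by [Req], after which [Conc] composes the two processes. Had both parties attempted, say, to send first, no sort ⟨α, ᾱ⟩ could be assigned to a, and the composition would be rejected.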
Below we briefly summarise the fundamental syntactic properties of the typing system. We need the following notions: a k-process is a prefixed process with subject k (such as k![ẽ]; P and catch k(k′) in P). Next, a k-redex is a pair of dual k-processes composed by |, i.e. either of the forms (k![ẽ]; P | k?(x̃) in Q), (k ◁ l; P | k ▷ {l1 : Q1 [] ... [] ln : Qn}), or (throw k[k′]; P | catch k(k″) in Q). Then P is an error if P ≡ def D̃ in (νũ)(P′ | R) where P′ is, for some k, the |-composition of either two k-processes that do not form a k-redex, or three or more k-processes.
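For instance (our illustration), k![1]; inact | k![2]; inact composes two k-processes which do not form a k-redex (both are outputs), while k![1]; inact | k?(x) in inact | k?(y) in inact composes three k-processes; both are errors in the above sense. Neither is typable: in the first case the two typings for k are not complementary, and in the second the composed type ⊥ at k forbids the third party.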
We then have:

Theorem 5.4.
1. (Invariance under ≡) Θ; Γ ⊢ P ▷ Δ and P ≡ Q imply Θ; Γ ⊢ Q ▷ Δ.
2. (Subject reduction) Θ; Γ ⊢ P ▷ Δ and P → Q imply Θ; Γ ⊢ Q ▷ Δ.
3. (Lack of run-time errors) A typable program never reduces into an error.

We omit the proofs, which are straightforward due to the syntax-directed nature of the typing rules. See [11] for details. We note that we can easily extend the typing system with ML-like polymorphism for recursion, which is useful e.g. for template processes (such as def Cell(c v) = ... in Cell[a 42] | Cell[b true]), and with which all the properties in Theorem 5.4 are preserved. This and other basic extensions are discussed in [11].

5.3. Examples. We give a few examples of typing, taking programs from the preceding sections. We omit the final 1 from types, e.g. we write ↓[nat] for ↓[nat]; 1. First, the factorial in Example 3.2 is assigned, at its free name, the type ↓[nat]; ↑[nat] (for the factorial) and its dual ↑[nat]; ↓[nat] (for its user). Next, the cell in Example 3.3 is given the type
&{read : ↑[nat], write : ↓[nat]} (for the cell) and its dual ⊕{read : ↓[nat], write : ↑[nat]} (for its user). The type of the cell says that a cell waits with two options: when "read" is selected, it sends an integer and the session ends; when "write" is selected, it receives an integer and again the session ends. Its dual says that a user may do either "read" or "write", and, according to which of them it selects, it behaves as prescribed. As a more interesting example, take the "kind ATM" in Example 4.2. Consider ATM′[ab] under the declaration in the example. Then the typing a : ⟨α, ᾱ⟩, b : ⟨β, β̄⟩ is given to the process, where we set α, which abstracts the interaction with a user, as:

  α =df ↓[nat]; μt.&{deposit : ↓[nat]; t,
                     withdraw : ↓[nat]; ⊕{dispense : ↑[nat]; t, overdraft : t},
                     balance : ↑[nat]; t,
                     quit : 1},

while β, which abstracts the interaction with the banking system, is given as:

  β =df ⊕{deposit : ↑[nat nat], withdraw : ↑[nat nat]; &{success : 1, failure : 1}, balance : ↑[nat]; ↓[nat]}.

Notice the type abstraction is given separately for the user (at a) and for the bank (at b), describing the behaviour of ATM′ towards each of its interacting parties. As a final example, the ftp server of Example 4.3 is given the following type at its principal name:

  ↓[nat nat]; ⊕{sorry : 1, welcome : μt.&{get : ...; t, put : ...; t, bye : ...}}.

This example shows that the throw/catch action is abstracted away in the type with respect to the user (but not in the type with respect to the thread, which we omit), so that the user can interact without concerning himself with the delegation occurring on the other side. In these ways, the type discipline not only offers the verification of the correctness of programs at a basic level, but also gives a clean abstraction of the interactive behaviour of programs, which would assist programming activities.

6. Discussions

6.1. Implementation Concerns. In the previous sections, we have seen how the session structure enables the clean description of complex communication behaviours, employing the synchronous form of interactions as its essential element. For the implementation of the primitives, however, the use of asynchronous communication is essential, since real distributed computing environments are inherently asynchronous. To study such an implementation in a formal setting, which should then be applied to realistic implementations, we consider a translation of the present language primitives into TyCO [14], a sorted, summation-less, polyadic asynchronous π-calculus with branching structure (which is ultimately translatable into its monadic, branch-less version). The existence of the branching structure makes TyCO an ideal intermediate language. For reasons of space we cannot discuss the technical details of the translation, for which the reader may refer to [11]. We only list the essential points. (1) The translation converts both channels and names into names. Each synchronous interaction (including branching-selection) is translated into two asynchronous interactions, the second one acting as an acknowledgement. This suggests the primitives are amenable to distributed implementation, at least at a rudimentary level.
(2) In spite of (1) above, the resulting code is far from optimal: as a simple example, if a value is sent immediately after the request operation, the value can clearly be "piggy-backed" onto the request message. A related example is the translation of the factorial in Section 3 in comparison with its standard "direct" encoding in Pict or TyCO in a continuation-passing style, cf. [24]. To find effective, well-founded optimisation methods along this line would be a fruitful subject of study (see 6.2 below). (3) The encoding translates typable programs into well-sorted TyCO codes. It is an intriguing question how we can capture, in a precise way, the well-typedness of the original code at the level of TyCO (this question was originally posed by Simon Gay for a slightly different kind of translation).

6.2. Related Works and Further Issues. In the following, comparisons with a few related works are given along with some of the significant further issues. First, in the context of conventional concurrent programming languages, the key departure of the proposed framework lies in that the session structure allows us to form an arbitrarily complex interaction pattern as a unit of abstraction, rather than being restricted to a fixed repertoire. The examples in Section 4 show how this feature results in a clean description of complex communication behaviours which may not be easy to express in conventional languages. For example, if we were to describe those examples using so-called concurrent object-oriented languages [34], whose programming concept based on objects and their interaction is proximate to the present one, we would need to divide each series of interactions into multiple chunks of independent communications, which, together with the nesting of method invocations, would make the resulting programs hard to understand. The explicit treatment of sessions also enables the type-based verification of the compatibility of communication patterns, which would facilitate the writing of correct communication-based programs at an elementary level. In spite of these observations, we believe that various communication primitives in existing programming languages, such as object invocation and RPC, would not lose their significance even when the present primitives are incorporated. We already discussed how they would be useful for declaring fixed communication patterns, as well as for saving keystrokes (in particular, the primitives for simple message passing had better be given among the language constructs, since their description in the structuring primitives is, if easy, roundabout). Notice we can still maintain the same type discipline by regarding these constructs as standing for combinations of the structuring primitives, as we did in Section 3. In another vein, the communication structures which we can extract from such declarations would give useful information for performing optimisation. Secondly, the field of research most closely related to the present work is the study of various typed programming languages based on π-calculi [6, 23, 29] and, more broadly, the study of types for π-calculi (see for example [17, 24, 25, 13, 30, 35]). In 6.1 we already noted that the proposed primitives are easily translatable into TyCO, hence into the π-calculus. Indeed, the encoding of various interaction structures in the π-calculus is the starting point of the present series of studies (cf. Section 2 of [27]). Comparisons with these works may be done from two distinct viewpoints.
(1) From the viewpoint of language design, Pict and other π-calculus-based languages use the primitives of the (polyadic) asynchronous π-calculus or its variants as the basic language constructs, and build further layers of abstraction on their basis. The present proposal differs in that it incorporates the session-based structure as a fundamental stratum for programming, rather than relying on chains of direct name passing for describing communication behaviour. While the former is operationally translatable into the latter as discussed in 6.1, the very translation of, say,
the programming examples in the preceding sections would reveal the significance of the session structure for abstraction concerns. In particular, any such translation would have to use multiple names for actions belonging to a single session, which damages the clarity and readability of the program. We note that we are far from claiming that the proposed framework would form the sole stratum for high-level communication-based programming: abstraction concerns for distributed computing are so diverse that no single framework can meet all purposes. However, we do believe that the proposed constructs (with possible variations) would offer a basic and useful building block for communication-based programming, especially when the concerned communication behaviours tend to become complex. (2) From the viewpoint of type disciplines, one notable point would be that well-typed programs in the present typing system are in general ill-sorted (in the sense of [17]) when they are regarded as π-terms, since the same channel k may be used for carrying values of different sorts on different occasions. This is in sharp contrast with most type disciplines for the π-calculus in the literature. An exception is the typing system for the monadic π-calculus presented in [35], where, as in the present setting, the incorporation of sequencing information in types allows certain ill-sorted processes to be well-typed. Apart from a notable difference in motivation, a main technical difference is that the rules in [35] are not syntax-directed; at the same time, the type discipline in [35] guarantees a much stronger behavioural property. Another notable feature of the present typing system is the linear usage of channels it imposes on programs. In this context, the preceding studies on linearity in the π-calculus such as [10, 13] offer a clarification of the deterministic character of interaction at channels (notice interaction at names is still non-deterministic in general). Also, the [Thr] and [Cat] rules have some resemblance to the rules for linear name communication presented in [10, 13]. Regarding these and other works on types for the π-calculus, we note that the various cases of redundant code in the translation mentioned in 6.1 above often concern certain fixed ways of using names in processes, which would be amenable to type-based analysis. Finally, one of the most important topics which we could not touch upon in the present paper is the development of reasoning principles based on the proposed structuring method. Notice the type discipline in Section 5 already gives one simple, though useful, example. However, it is yet to be studied whether reasoning methods for deeper properties of programs (cf. [1, 3, 20]) which are both mathematically well-founded and applicable to conspicuous practical situations can be developed or not. We wish to touch on this topic in our future study.

Acknowledgements. We sincerely thank the three reviewers for their constructive comments. Discussions with Simon Gay, Luís Lopes, Benjamin Pierce, Kaku Takeuchi and Nobuko Yoshida have been beneficial, for which we are grateful. We also thank the last two for offering criticisms on an earlier version of the paper. Vasco Vasconcelos is partially supported by Project PRAXIS/2/2.1/MAT/46/94.
References
[1] S. Abramsky, S. Gay, and R. Nagarajan. Interaction Categories and Foundations of Typed Concurrent Computing. In Deductive Program Design, Springer-Verlag, 1995.
[2] G. Agha. Actors: A Model of Concurrent Computation in Distributed Systems. MIT Press, 1986.
[3] P. America and F. de Boer. Reasoning about Dynamically Evolving Process Structures. Formal Aspects of Computing, 94:269-316, 1994.
[4] G. Berry and G. Boudol. The chemical abstract machine. TCS, 96:217-248, 1992.
[5] B. Courcelle. Fundamental properties of infinite trees. TCS, 25:95-169, 1983.
[6] C. Fournet and G. Gonthier. The reflexive chemical abstract machine and the join-calculus. In POPL'96, pp. 372-385, ACM Press, 1996.
[7] C.A.R. Hoare. Communicating sequential processes. CACM, 21(8):667-677, 1978.
[8] C.A.R. Hoare. Communicating Sequential Processes. Prentice Hall, 1995.
[9] C.B. Jones. Process-Algebraic Foundations for an Object-Based Design Notation. UMCS-93-10-1, Computer Science Department, Manchester University, 1993.
[10] K. Honda. Composing processes. In POPL'96, pp. 344-357, ACM Press, 1996.
[11] K. Honda, V. Vasconcelos, and M. Kubo. Language primitives and type disciplines for structured communication-based programming. Full version of this paper, in preparation.
[12] S. Peyton Jones, A. Gordon, and S. Finne. Concurrent Haskell. In POPL'96, pp. 295-308, ACM Press, 1996.
[13] N. Kobayashi, B. Pierce, and D. Turner. Linearity and the pi-calculus. In POPL'96, pp. 358-371, ACM Press, 1996.
[14] L. Lopes, F. Silva, and V. Vasconcelos. A framework for compiling object calculi. In preparation.
[15] R. Milner. Communication and Concurrency. Prentice Hall, 1989.
[16] R. Milner. Functions as processes. MSCS, 2(2):119-141, 1992.
[17] R. Milner. Polyadic π-Calculus: a tutorial. In Logic and Algebra of Specification, Springer-Verlag, 1992.
[18] R. Milner, J. Parrow, and D. Walker. A calculus of mobile processes, Parts I and II. Information and Computation, 100:1-77, September 1992.
[19] R. Milner and M. Tofte. The Definition of Standard ML. MIT Press, 1991.
[20] H.R. Nielson and F. Nielson. Higher-order concurrent programs with finite communication topology. In POPL'94, ACM Press, 1994.
[21] A. Ohori and K. Kato. Semantics for communication primitives in a polymorphic language. In POPL'93, ACM Press, 1993.
[22] A. Ohori. A compilation method for ML-style polymorphic record calculi. In POPL'92, pp. 154-165, ACM Press, 1992.
[23] B. Pierce and D. Turner. Pict: A programming language based on the pi-calculus. CSCI Technical Report 476, Indiana University, March 1997.
[24] B. Pierce and D. Sangiorgi. Typing and subtyping for mobile processes. In LICS'93, pp. 187-215, 1993.
[25] B. Pierce and D. Sangiorgi. Behavioral equivalence in the polymorphic pi-calculus. In POPL'97, ACM Press, 1997.
[26] J.-H. Reppy. CML: A higher-order concurrent language. In PLDI'91, pp. 293-305, ACM Press, 1991.
[27] K. Takeuchi, K. Honda, and M. Kubo. An interaction-based programming language and its typing system. In PARLE'94, volume 817 of LNCS, pp. 398-413, Springer-Verlag, July 1994.
[28] US Government Printing Office, Washington DC. The Ada Programming Language, 1983.
[29] V. Vasconcelos. Typed concurrent objects. In ECOOP'94, volume 821 of LNCS, pp. 100-117, Springer-Verlag, 1994.
[30] V. Vasconcelos and K. Honda. Principal typing scheme for polyadic π-calculus. In CONCUR'93, volume 715 of LNCS, pp. 524-538, Springer-Verlag, 1993.
[31] V. Vasconcelos and M. Tokoro. A typing system for a calculus of objects. In 1st ISOTAS, volume 742 of LNCS, pp. 460-474, Springer-Verlag, November 1993.
[32] Y. Yokote and M. Tokoro. Concurrent programming in ConcurrentSmalltalk. In Yonezawa and Tokoro [34].
[33] A. Yonezawa, editor. ABCL, an Object-Oriented Concurrent System. MIT Press, 1990.
[34] A. Yonezawa and M. Tokoro, editors. Object-Oriented Concurrent Programming. MIT Press, 1987.
[35] N. Yoshida. Graph types for monadic mobile processes. In FST/TCS'16, volume 1180 of LNCS, pp. 371-386, Springer-Verlag, 1996. Full version as LFCS Technical Report ECS-LFCS-96-350, 1996.
Code Motion and Code Placement: Just Synonyms?*

Jens Knoop¹ **, Oliver Rüthing², and Bernhard Steffen²

¹ Universität Passau, D-94030 Passau, Germany, e-mail: [email protected]
² Universität Dortmund, D-44221 Dortmund, Germany, e-mail: {ruething,steffen}@ls5.cs.uni-dortmund.de
Abstract. We prove that there is no difference between code motion (CM) and code placement (CP) in the traditional syntactic setting; there is, however, a dramatic difference in the semantic setting. We demonstrate this by re-investigating semantic CM under the perspective of the recent development of syntactic CM. Besides clarifying and highlighting the analogies and essential differences between the syntactic and the semantic approach, this leads as a side-effect to a drastic reduction of the conceptual complexity of the value-flow based procedure for semantic CM of [20], as the original bidirectional analysis is decomposed into purely unidirectional components. On the theoretical side, this establishes a natural semantical understanding in terms of the Herbrand interpretation (transparent equivalence), and thus eases the proof of correctness; moreover, it shows the frontier of semantic CM, and gives reasons for the lack of algorithms going beyond. On the practical side, it simplifies the implementation and increases the efficiency, which, as for its syntactic counterpart, can be the catalyst for its migration from academia into industrial practice.

Keywords: Program optimization, data-flow analysis, code motion, code placement, partial redundancy elimination, transparent equivalence, Herbrand interpretation.
1 Motivation

Code motion (CM) is a classical optimization technique for eliminating partial redundancies (PRE).¹ Living in an ideal world, a PRE-algorithm would yield the program of Figure 1(b) when applied to the program of Figure 1(a): a truly optimal result, free of any redundancies.

* An extended version is available as [14].
** The work of the author was funded in part by the Leibniz Programme of the German Research Council (DFG) under grant Ol 98/1-1.
¹ CM and PRE are often identified. To be precise, however, CM is a specific technique for PRE. As we are going to show here, identifying them is inadequate in general, and thus we are precise on this distinction.
[Figure 1, not reproduced here, shows two flow graphs: panel (a), "The Original Program", containing computations of a+b, c+b and c+p, and panel (b), "Desired Optimization: A Truly Optimal Program, No Redundancies at all!", in which the recomputations are replaced by uses of temporaries h1 and h2.]
Fig. 1. Living in an Ideal World: The Effect of Partial Redundancy Elimination

Unfortunately, the world is not that ideal. Up to now, there is no algorithm achieving the result of Figure 1(b). In reality, PRE is characterized by two different approaches to CM: a syntactic and a semantic one. Figure 2 illustrates their effects on the example of Figure 1(a). The point of syntactic CM is to treat all term patterns independently and to regard each assignment as destructive to any term pattern containing the left-hand-side variable. In the example of Figure 1(a) it succeeds in eliminating the redundancy of a+b in the left loop, but fails on the redundancy of c+p in the right loop, which, because of the assignment to p, is not redundant in the "syntactic" sense inside the loop. In contrast, semantic CM fully models the effect of assignments, usually by means of a kind of symbolic execution (value numbering) or by backward substitution: by exploiting the equality of p and b after the assignment p := b, it succeeds in eliminating the redundant computation of c+p inside the right loop as well. However, neither syntactic nor semantic CM succeeds in eliminating the partial redundancies at the edges 8 and 13. This article is concerned with answering why: we will prove that redundancies like those in this example are out of the scope of any "motion"-based PRE-technique. Eliminating them requires switching from motion-based to "placement"-based techniques. This fact and, more generally, the analogies and differences between syntactic and semantic CM and CP, as illustrated in Figures 2(a) and (b) and Figure 1(b), respectively, are elucidated for the first time in this article.
[Figure 2, not reproduced here, shows the two resulting flow graphs: panel (a), "After Syntactic Code Motion", where the computation of c+p in the right loop "Failed to be Eliminated!", and panel (b), "After Semantic Code Motion".]
Fig. 2. Back to Reality: Syntactic Code Motion vs. Semantic Code Motion
History and Current Situation: Syntactic CM (cf. [2, 5-8, 12, 13, 15, 18]) is well-understood and integrated in many commercial compilers.² In contrast, the much more powerful and aggressive semantic CM currently has a very limited practical impact. In fact, only the "local" value numbering for basic blocks [4] is widely used. Globalizations of this technique can be classified into two categories: limited globalizations, where code can only be moved to dominators [3, 16], and aggressive globalizations, where code can be moved more liberally [17, 20, 21]. The limited approaches are quite efficient, however at the price of losing significant parts of the optimization power: they even fail in eliminating some of the redundancies covered by syntactic methods. In contrast, the aggressive approaches are rather complex, both conceptually and computationally, and are therefore considered impractical. This judgement is supported by the state-of-the-art here, which is still based on bidirectional analyses and heuristics, making the proposed algorithms almost incomprehensible.

² E.g., based on [12, 13] in the Sun SPARCompiler language systems (SPARCompiler is a registered trademark of SPARC International, Inc., and is licensed exclusively to Sun Microsystems, Inc.).

In this article we re-investigate (aggressive) semantic CM under the perspective of the very successful recent development of syntactic CM. This investigation highlights the conceptual analogies and differences between the syntactic and the semantic approach. In particular, it allows us to show that:
– the decomposition technique into unidirectional analyses developed in [12, 13] can be transferred to the semantic setting. Besides establishing a natural connection between the Herbrand interpretation (transparent equivalence) and the algorithm, which eases the proof of its correctness,³ this decomposition leads to a more efficient and easier implementation. In fact, due to this simplification, we are optimistic that semantic CM will find its way into industrial practice.
– there is a significant difference between motion and placement techniques (see Figures 1 and 2), which only shows up in the semantic setting. The point of this example is that the computations of a+b and c+b cannot safely be "moved" to their computation points in Figure 1(b), but they can safely be "placed" there (see Figure 3 for an illustration of the essentials of this example).

The major contributions of this article are thus as follows. On the conceptual side: (1) uncovering that CM and CP are no synonyms in the semantic setting (though they are in the syntactic one), (2) showing the frontier of semantic CM, and (3) giving theoretical and practical reasons for the lack of algorithms going beyond. On the technical side, though almost as a side-effect yet equally important, presenting a new algorithm for computationally optimal semantic CM, which is conceptually and technically much simpler than its predecessor of [20]. Whereas the difference between motion and placement techniques will primarily be discussed on a conceptual level, the other points will be treated in detail.
[Figure 3, not reproduced here, contrasts three flow-graph fragments: panel (a), "The Original Program"; panel (b), "After Semantic Code Placement", where a+b and c+b are placed at the join node; and panel (c), "After Semantic Code Motion", where neither a+b nor c+b is down-safe, hence the motion process gets stuck.]
Fig. 3. Illustrating the Difference: Sem. Code Placement vs. Sem. Code Motion
Safety – The Backbone of Code Motion: Key towards the understanding of the conceptual difference between syntactic and semantic CM is the notion of safety of a program point for some computation: intuitively, a program point is safe for a given computation if the execution of the computation at this point results in a value that is guaranteed to be computed by every program execution passing the point. Similarly, up-safety (down-safety) is defined by requiring that the computation of the value is guaranteed before meeting the program point for the first time (after meeting the program point for the last time).⁴ As properties like these are undecidable in the standard interpretation (cf. [16]), decidable approximations have been considered. Prominent are the abstractions leading to the syntactic and the semantic approach considered in this article. Concerning the safety notions established above, the following result is responsible for the simplicity and elegance of the syntactic CM-algorithms (cf. [13]):

Theorem 1 (Syntactic Safety). Safe = Up-safe ∨ Down-safe

It is the failure of this equality in the semantic setting which causes most of the problems of semantic CM, because the decomposition of safety into up-safety and down-safety is essential for the elegant syntactic algorithms. Figure 4 illustrates this failure: placing the computation of a+b at the boldface join-node is (semantically) safe, though it is neither up-safe nor down-safe.

[Figure 4, not reproduced here, shows two branches meeting in a join node, one containing x := a+b, the other c := a with no computation of a+b along this path; placing a+b at the join node is safe (though it is neither up-safe nor down-safe), and the successors compute z := a+b and z := c+b, respectively.]

Fig. 4. Safe though neither Up-Safe nor Down-Safe

As a consequence, simply transferring the algorithmic idea of the syntactic case to the semantic setting without caring about this equivalence results in an algorithm for CM with second-order effects (cf. [17]).⁵ These can be avoided by defining a motion-oriented notion of safety, which allows us to reestablish the equality for a hierarchically defined notion of up-safety: the algorithm resulting from the use of these notions captures all the second-order effects of the "straightforwardly transferred" algorithm as well as the results of the original bidirectional version for semantic CM of [20].

Motion versus Placement: The step from the motion-oriented notion of safety to "full safety" can be regarded as the step from motion-based algorithms to placement-based algorithms: in contrast to CM, CP is characterized by allowing arbitrary (safe) placements of computations with subsequent (total) redundancy elimination (TRE). As illustrated in Figure 3, not all placements can be realized via motion techniques, which are characterized by allowing code movement only within areas where the placement would be correct. The power of arbitrary placement leads to a number of theoretical and algorithmic complications and anomalies (cf. Section 5), which, we conjecture, can only be solved by changing the graph structure of the argument program, e.g. along the lines of [19]. Retrospectively, the fact that all CM-algorithms arise from notions of safety which collapse in the syntactic setting suffices to explain that the syntactic algorithm does not have any second-order effects, and that there is no difference between "motion" and "placement" algorithms in the syntactic setting.

³ Previously (cf. [20, 21]), this connection, which is essential for the conceptual understanding, had to be established in a very complicated, indirect fashion.
⁴ Up-safety and down-safety are traditionally called "availability" and "very busyness", which, however, does not reflect the "semantical" essence and the duality of the two properties as precisely as up-safety and down-safety do.
⁵ Intuitively, this means the transformation is not idempotent (cf. [17]).
Theorem 2 (Syntactic CM and Syntactic CP). In the syntactic setting, CM is as powerful as CP (and vice versa).
2 Preliminaries

We consider procedures of imperative programs, which we represent by means of directed edge-labeled flow graphs G = (N, E, s, e) with node set N, edge set E, a unique start node s and end node e, which are assumed to have no incoming and no outgoing edges, respectively. The edges of G represent both the statements and the nondeterministic control flow of the underlying procedure, while the nodes represent program points only. Statements are parallel assignments of the form (x1, ..., xr) := (t1, ..., tr), where the xi are pairwise disjoint variables, and the ti are terms of the set T, which as usual are inductively composed of variables, constants, and operators. For r = 0 we obtain the empty statement "skip". Unlabeled edges are assumed to represent "skip". Without loss of generality we assume that edges starting in nodes with more than one successor are labeled by "skip".⁶ The source and destination nodes of a flow-graph edge corresponding to a node n of a traditional node-labeled flow graph represent the usual distinction between the entry and the exit point of n explicitly. This simplifies the formal development of the theory significantly, particularly the definition of the value-flow graph in Section 3.1, because the implicit treatment of this distinction, which unfortunately is usually necessary for the traditional flow-graph representation, is obsolete here; a point which is intensely discussed in [11]. For a flow graph G, let pred(n) and succ(n) denote the sets of all immediate predecessors and successors of a node n, and let source(e) and dest(e) denote the source and the destination node of an edge e. A finite path in G is a sequence ⟨e1, ..., eq⟩ of edges such that dest(ej) = source(ej+1) for j ∈ {1, ..., q − 1}. It is a path from m to n if source(e1) = m and dest(eq) = n. Additionally, p[i, j], where 1 ≤ i ≤ j ≤ q, denotes the subpath ⟨ei, ..., ej⟩ of p. Finally, P[m, n] denotes the set of all finite paths from m to n. Without loss of generality we assume that every node n ∈ N lies on a path from s to e.
⁶ Our approach is not limited to a setting with scalar variables. However, we do not consider subscripted variables here in order to avoid burdening the development with alias analyses.
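To make these flow-graph conventions concrete, the following sketch shows one possible encoding of edge-labeled flow graphs with parallel assignments. It is our own illustration, not notation from the paper: the class names, the nested-tuple term encoding, and the example graph are all assumptions.

    # A minimal flow-graph encoding, following the conventions above:
    # edges carry parallel assignments, nodes are bare program points.

    class Edge:
        def __init__(self, source, dest, lhs=(), rhs=()):
            # lhs/rhs encode the parallel assignment (x1,...,xr) := (t1,...,tr);
            # r = 0 (empty tuples) represents the statement "skip".
            self.source, self.dest = source, dest
            self.lhs, self.rhs = tuple(lhs), tuple(rhs)

    class FlowGraph:
        def __init__(self, nodes, edges, start, end):
            self.nodes, self.edges = set(nodes), list(edges)
            self.start, self.end = start, end

        def pred(self, n):
            return {e.source for e in self.edges if e.dest == n}

        def succ(self, n):
            return {e.dest for e in self.edges if e.source == n}

    # A small example in the spirit of Figure 4: two branches joining,
    # with a+b computed only on one of them. Terms are nested tuples,
    # e.g. ("+", "a", "b") for a+b. Note that node 1, which has two
    # successors, only has outgoing "skip" edges, as required.
    g = FlowGraph(
        nodes={1, 2, 3, 4, 5},
        edges=[Edge(1, 2), Edge(1, 3),                    # skip edges at the branch
               Edge(2, 4, ("x",), (("+", "a", "b"),)),    # x := a+b
               Edge(3, 4, ("c",), ("a",)),                # c := a
               Edge(4, 5, ("z",), (("+", "a", "b"),))],   # z := a+b
        start=1, end=5)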
3 Semantic Code Motion

In essence, the reason making semantic CM much more intricate than syntactic CM is that in the semantic setting safety is not the sum of up-safety and down-safety (i.e., it does not coincide with their disjunction). In order to illustrate this in more detail, we consider, as in [17], transparent equivalence of terms, i.e., we fully treat the effect of assignments, but we do not exploit any particular properties of term operators.⁷ Moreover, we essentially base our investigations on the semantic CM-algorithm of [20], which consists of two major phases:

– Preprocess: (a) computing transparent equivalences (cf. Section 3.1); (b) constructing the value-flow graph (cf. Section 3.1).
– Main Process: eliminating semantic redundancies (cf. Section 4).

After computing the transparent equivalences of program terms for each program point, the preprocess globalizes this information according to the value flow of the program. The value-flow graph is just the syntactic representation for storing the global flow information. Based on this information the main process eliminates semantically redundant computations by appropriately placing the computations in the program. In the following section we sketch the preprocess as far as it is necessary for the development here, while we investigate the main process in full detail. In comparison to [20] it is completely redesigned.
3.1 The Preprocess: Constructing the Value-Flow Graph
The first step of the preprocess computes for every program point the set of all transparently equivalent program terms. The corresponding algorithm is fairly straightforward and matches the well-known pattern of Kildall's algorithm (cf. [10]). As a result of this algorithm every program point is annotated by a structured partition (SP) DAG (cp. [9]), i.e., an ordered, directed, acyclic graph whose nodes are labeled with at most one operator or constant and a set of variables. The SPDAG attached to a program point represents the set of all terms being transparently equivalent at this point: two terms are transparently equivalent iff they are represented by the same node of an SPDAG; e.g., an SPDAG can represent the term equivalences [a | b, d | c | z, a+b, a+d | y, c+b, c+d]. Afterwards, the value-flow graph is constructed. Intuitively, it connects the nodes, i.e., the term equivalence classes of the SPDAG-annotation D computed in the previous step, according to the data flow. Thus, its nodes are the classes of transparently equivalent terms, and its edges are the representations of the data flow: if two nodes ν and ν′ of a value-flow graph are connected by a value-flow graph edge (ν, ν′), then the terms represented by ν′ evaluate to the same value after executing the flow-graph edge e corresponding to (ν, ν′) as the terms represented by ν before executing e (cf. Figure 7). This is made precise by
⁷ In [20] transparent equivalence is therefore called Herbrand equivalence, as it is induced by the Herbrand interpretation.
means of the relation →, defined next. To this end, let Γ denote the set of all nodes of the SPDAG-annotation D. Moreover, let Terms(ν) denote the set of all terms represented by a node ν of Γ, and let N(ν) denote the flow-graph node ν is related to. Central is the definition of the backward substitution, which is defined for each flow-graph edge e ≡ (x₁,…,x_r) := (t₁,…,t_r) by δ_e : T → T, δ_e(t) =df t[t₁,…,t_r / x₁,…,x_r], where t[t₁,…,t_r / x₁,…,x_r] stands for the simultaneous replacement of all occurrences of x_i by t_i in t, for i ∈ {1,…,r}.⁸ The relation → on Γ is now defined by:
  ∀ (ν, ν′) ∈ Γ²:  ν → ν′ ⟺df (N(ν), N(ν′)) ∈ E ∧ Terms(ν) ⊇ δ_e(Terms(ν′)),

where e denotes the flow-graph edge (N(ν), N(ν′)). The value-flow graph for an SPDAG-annotation D is then defined as follows:

Definition 1 (Value-Flow Graph). The value-flow graph with respect to D is a pair VFG = (VFN, VFE) consisting of
– a set of nodes VFN =df Γ, called abstract values, and
– a set of edges VFE ⊆ VFN × VFN with VFE =df →.
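The backward substitution δ_e is purely syntactic, which makes it easy to implement. The following sketch is our own illustration (the term representation and function names are assumptions, reusing the nested-tuple encoding of the earlier sketch): it applies δ_e to a term and uses it to test the value-flow-graph edge condition for one pair of SPDAG nodes.

    # Terms: a variable/constant is a string; an operator application is a
    # tuple (op, arg1, ..., argk), e.g. ("+", "a", "b") for a+b.

    def backward_subst(term, assignment):
        """delta_e: simultaneously replace every occurrence of x_i by t_i,
        where assignment maps x_i -> t_i for the edge (x1,..,xr) := (t1,..,tr)."""
        if isinstance(term, str):
            return assignment.get(term, term)
        op, *args = term
        return (op, *(backward_subst(a, assignment) for a in args))

    def vfg_edge(terms_nu, terms_nu2, assignment):
        """Edge condition of Definition 1: the backward substitution of every
        term represented by nu' must be contained in Terms(nu)."""
        return all(backward_subst(t, assignment) in terms_nu for t in terms_nu2)

    # Example: across the edge c := a, the class containing a+b before the
    # edge feeds the class containing c+b after it, since delta_e(c+b) = a+b.
    assignment = {"c": "a"}
    print(vfg_edge({("+", "a", "b")}, {("+", "c", "b")}, assignment))  # True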
It is worth noting that the value-flow graph definition given above is technically much simpler than its original version of [20]. This is simply a consequence of representing procedures here by edge-labeled flow graphs instead of node-labeled flow graphs as in [20]. Predecessors, successors and finite paths in the value-flow graph are denoted by overloading the corresponding notions for flow graphs; e.g., pred(ν) addresses the predecessors of ν ∈ Γ.

VFG-Redundancies: In order to define the notion of (partial) redundancies with respect to a value-flow graph VFG, we need to extend the local predicate Comp for terms, known from syntactic CM, to the abstract values represented by value-flow graph nodes. In the syntactic setting Comp_e expresses that a given term t under consideration is computed at edge e, i.e., t is a sub-term of some right-hand-side term of the statement associated with e. Analogously, for every abstract value ν ∈ VFN the local predicate Comp_ν expresses that the statements of the corresponding outgoing flow-graph edges compute a term represented by ν:⁹

  Comp_ν ⟺df ∀ e ∈ E: source(e) = N(ν) ⟹ Terms(ν) ∩ Terms(e) ≠ ∅
Here Terms(e) denotes the set of all terms occurring on the right-hand side of the statement of edge e. In addition, we need a notion of correspondence between value-flow graph paths and flow-graph paths. Let p = ⟨e₁,…,e_q⟩ ∈ P[m,n] be a path in the flow graph G,
and let p′ = ⟨ε₁,…,ε_r⟩ ∈ P[ν,ν′] be a path in a value-flow graph VFG of G. Then p′ is a corresponding VFG-prefix of p if for all i ∈ {1,…,r}: N(source(ε_i)) = source(e_i) and N(dest(ε_i)) = dest(e_i). Analogously, the notion of a corresponding VFG-postfix p′ of p is defined. We can now define:

⁸ Note that for edges labeled by "skip" the function δ_e equals the identity Id_T on terms.
⁹ Recall that edges starting in nodes with more than one successor are labeled by "skip". Thus we have: ∀ n ∈ N: |{e | source(e) = n}| > 1 ⟹ Terms({e | source(e) = n}) = ∅. Hence the truth value of the predicate Comp_ν actually depends on a single flow-graph edge only.
Definition 2 (VFG-Redundancy). Let VFG be a value-flow graph, let n be a node of G, and let t be a term of T. Then t is
1. partially VFG-redundant at n if there is a path p = ⟨e₁,…,e_q⟩ ∈ P[s,n] with a corresponding VFG-postfix p′ = ⟨ε₁,…,ε_r⟩ ∈ P[ν,ν′] such that Comp_ν and t ∈ Terms(ν′) hold;
2. (totally) VFG-redundant at n if t is partially VFG-redundant along each path p ∈ P[s,n].
4 The Main Process: Eliminating Semantic Redundancies

The nodes of a value-flow graph represent semantic equivalences of terms syntactically. In the main process of eliminating semantic redundancies, they play the same role as the lexical term patterns in the elimination of syntactic redundancies by syntactic CM. We demonstrate this analogy in the following section.
4.1 The Straightforward Approach

In this section we extend the analyses underlying the syntactic CM-procedure to the semantic situation in a straightforward fashion. To this end, let us first recall the equation system characterizing up-safety in the syntactic setting: up-safety of a term pattern t at a program point n means that t is computed on every program path reaching n without an intervening modification of any of its operands.¹⁰
Equation System 3 (Syntactic Up-Safety for a Term Pattern t)

  Syn-US_n = (n ≠ s) · ∏_{m ∈ pred(n)} (Syn-US_m + Comp_{(m,n)}) · Transp_{(m,n)}
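Equation systems of this shape are solved by standard iteration towards the greatest fixpoint. The sketch below is our own illustration (the graph encoding and predicate names are assumptions): it computes Syn-US for a given term pattern by round-robin iteration, which converges because every update can only flip a value from true to false.

    def syntactic_up_safety(nodes, edges, start, comp, transp):
        """Greatest-fixpoint solution of Equation System 3.
        edges: set of (m, n) pairs; comp/transp: dicts mapping an edge to a
        bool saying whether the pattern is computed on / transparent along it."""
        us = {n: n != start for n in nodes}  # optimistic initialization
        changed = True
        while changed:
            changed = False
            for n in nodes:
                preds = [e for e in edges if e[1] == n]
                new = (n != start) and all(
                    (us[e[0]] or comp[e]) and transp[e] for e in preds)
                if new != us[n]:
                    us[n], changed = new, True
        return us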
The corresponding equation system for VFG-up-safety is given next. Note that there is no predicate like "VFG-Transp" corresponding to the predicate Transp. In the syntactic setting, the transparency predicate Transp_e is required for checking that the value of the term t under consideration is maintained along a flow-graph edge e, i.e., that none of its operands is modified. The essence of the value-flow graph is that transparency is modeled by the edges: two value-flow graph nodes are connected by a value-flow graph edge iff the value they represent is maintained along the corresponding flow-graph edge.¹⁰
¹⁰ As convenient, we use ·, + and ¬ for logical conjunction, disjunction and negation, respectively.
Equation System 4 (VFG-Up-Safety)

  VFG-US_ν = (ν ∉ VFN_s) · ∏_{μ ∈ pred(ν)} (VFG-US_μ + Comp_μ)
The roles of the start node and end node of the flow graph are played here by the "start nodes" and "end nodes" of the value-flow graph, defined by:

  VFN_s =df {ν | N(pred(ν)) ≠ pred(N(ν)) ∨ N(ν) = s}
  VFN_e =df {ν | N(succ(ν)) ≠ succ(N(ν)) ∨ N(ν) = e}
Down-safety is the dual counterpart of up-safety. However, the equation system for the VFG-version of this property is technically more complicated, as we have two graph structures, the flow graph and the value-flow graph, which must both be taken into account separately: as in the syntactic setting (or for up-safety), we need safety universally along all flow-graph edges, which is reflected by a value-flow graph node ν being down-safe at the program point N(ν) if a term represented by ν is computed on all flow-graph edges leaving N(ν), or if it is down-safe at all successor points. However, we may justify safety along a flow-graph edge e by means of the VFG in various ways since, in contrast to up-safety concerning the forward flow (or the syntactic setting), a value-flow graph node may have several successors corresponding to e (cf. Figure 7), and it suffices to have safety along only one of them. This is formally described by:
Equation System 5 (VFG-Motion-Down-Safety)

  VFG-MDS_ν = (ν ∉ VFN_e) · (Comp_ν + ∏_{m ∈ succ(N(ν))} ∑_{μ ∈ succ(ν), N(μ) = m} VFG-MDS_μ)
The Transformation (of the Straightforward Approach): Let VFG-US* and VFG-MDS* denote the greatest solutions of Equation Systems 4 and 5. The analogy with the syntactic setting is continued by the specification of the semantic CM-transformation, called Sem-CM_Strght. It is essentially given by the sets of insertion and replacement points. Insertion of an abstract value ν means to initialize a fresh temporary with a minimal representative t_ν of ν, i.e., one containing a minimal number of operators, on the flow-graph edges leaving node N(ν). Replacement means to replace an original computation by a reference to the temporary storing its value.

  Insert_ν  ⟺  VFG-MDS*_ν · ((ν ∈ VFN_s) + ∑_{μ ∈ pred(ν)} ¬(VFG-MDS*_μ + VFG-US*_μ))
  Replace_ν ⟺  Comp_ν
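Once the two analyses have been solved (by the same kind of fixpoint iteration as sketched for Equation System 3), the insertion and replacement points can be read off directly. The following fragment is our own illustration, under the same assumed encodings: the analyses are Boolean dictionaries over value-flow-graph nodes, and vfg_pred maps each node to its predecessors.

    def insertion_points(vfg_nodes, vfg_pred, vfn_s, mds, us):
        """Insert_nu: down-safe, and either a VFG start node or having some
        predecessor that is neither down-safe nor up-safe ("earliest")."""
        return {nu for nu in vfg_nodes
                if mds[nu] and (nu in vfn_s or
                                any(not (mds[mu] or us[mu])
                                    for mu in vfg_pred[nu]))}

    def replacement_points(vfg_nodes, comp):
        """Replace_nu: every original computation of a represented term."""
        return {nu for nu in vfg_nodes if comp[nu]}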
Managing and reinitializing temporaries: In the syntactic setting there is a unique temporary for every term pattern, and these temporaries do not interact with each other: the values computed at their initialization sites are thus propagated to all their use sites, i.e., the program points containing an original occurrence of the corresponding term pattern, without requiring special care. In the semantic setting, propagating these values usually requires resetting them at certain points to the values of other temporaries, as illustrated in Figure 2(b). The complete process, including the management of temporary names, is accomplished by a straightforward analysis starting at the original computation sites and following the value-flow-graph edges in the opposite direction to the insertion points. The details of this procedure are not recalled here as they are not essential for the point of this article. They can be found in [20, 21].
Second-Order Effects: In contrast to the syntactic setting, the straightforward semantic counterpart Sem-CM_Strght has second-order effects. This is illustrated in Figure 5: applying Sem-CM_Strght to the program of Figure 5(a) results in the program of Figure 5(b); repeating the transformation results in the optimal program of Figure 5(c).
Fig. 5. Illustrating Second-Order Effects

Intuitively, the reason that Sem-CM_Strght has second-order effects is that safety is no longer the sum of up-safety and down-safety. Above, up-safety is only computed with respect to original computations. While this is sufficient in the syntactic case, it is not in the semantic one, as it does not take into account that the placement of computations as a result of the transformation may make "new" values available; e.g., in the program of Figure 5(b) the value of a + b becomes available at the end of the program fragment displayed. As a consequence, total redundancies like the one in Figure 5(b) can remain in the program. Eliminating them (TRE) as illustrated in Figure 5(c) is sufficient to capture the second-order effects completely. We have:

Theorem 3 (Sem-CM_Strght).
1. Sem-CM_Strght relies on only two uni-directional analyses.
2. Sem-CM_Strght has second-order effects.
3. Sem-CM_Strght followed by TRE achieves computationally motion-optimal results wrt the Herbrand interpretation, like its bidirectional precursor of [20].
4.2 Avoiding Second-Order Effects: The Hierarchical Approach
The point of this section is to reestablish, as a footing for the algorithm, a notion of safety which can be characterized as the sum of up-safety and down-safety. The central result here is that a notion of safety tailored to capture the idea of "motion" (in contrast to "placement") can be characterized by down-safety together with the following hierarchical notion of up-safety:
Equation System 6 ("Hierarchical" VFG-Up-Safety)

  VFG-MUS_ν = VFG-MDS*_ν + (ν ∉ VFN_s) · ∏_{μ ∈ pred(ν)} VFG-MUS_μ
In fact, the phenomenon of second-order effects can now be overcome completely and elegantly by the following hierarchical procedure:¹¹
1. Compute down-safety (cf. Equation System 5).
2. Compute the modified up-safety property on the basis of the down-safety computation (cf. Equation System 6).

The transformation Sem-CM_Hier of the hierarchical approach is now defined as before, except that VFG-MUS* is used instead of VFG-US*. Its results coincide with those of an exhaustive application of the CM-procedure proposed in Subsection 4.1, as well as with those of the original (bidirectional) CM-procedure of [20]. However, in contrast to the latter algorithm, which due to its bidirectionality required a complicated correctness argument, the transformation of the hierarchical approach allows a rather straightforward link to the Herbrand semantics, which drastically simplifies the correctness proof. Besides this conceptual improvement, the new algorithm is also easier to comprehend and to implement, as it does not require any bidirectional analysis. We have:

Theorem 4 (Sem-CM_Hier).
1. Sem-CM_Hier relies on only two uni-directional analyses, sequentially ordered.
2. Sem-CM_Hier is free of second-order effects.
3. Sem-CM_Hier achieves computationally motion-optimal results wrt the Herbrand interpretation, like its bidirectional precursor of [20].
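The hierarchy only changes the input of the second fixpoint computation. The following sketch mirrors the earlier solver (our own illustration; mds is assumed to be the greatest solution of Equation System 5 as a Boolean dictionary):

    def hierarchical_up_safety(vfg_nodes, vfg_pred, vfn_s, mds):
        """Greatest fixpoint of Equation System 6, taking the previously
        computed down-safety (mds) as input -- this dependence of one analysis
        on the other is what makes the approach hierarchical."""
        mus = {nu: True for nu in vfg_nodes}  # optimistic start
        changed = True
        while changed:
            changed = False
            for nu in vfg_nodes:
                new = mds[nu] or (nu not in vfn_s and
                                  all(mus[mu] for mu in vfg_pred[nu]))
                if new != mus[nu]:
                    mus[nu], changed = new, True
        return mus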
¹¹ The "down-safety/earliest" characterization of [12] was already hierarchical, too. However, in the syntactic setting this is not relevant, as was shown in [13].
5 Semantic Code Placement

In this section we are concerned with crossing the frontier marked by semantic CM towards semantic CP, and with giving theoretical and practical reasons for the fact that no algorithm has gone beyond this frontier so far. On the theoretical side (I), we prove that computational optimality is impossible for semantic CP in general. On the practical side (II), we demonstrate that down-safety, the handle for correctly and profitably placing computations by syntactic and semantic CM, is inadequate for semantic CP.

(I) No Optimality in General: Semantic CP cannot be done "computationally placement-optimal" in general. This is a significant difference in comparison to syntactic and semantic CM, which have a least element with respect to the relation "computationally better" (cf. [13, 20]). In semantic CP, however, we are faced with the phenomenon of incomparable minima. This is illustrated in the example of Figure 6, which shows a slight modification of the program of Figure 3. Both programs of Figure 6 are of incomparable quality, since the "right-most" path is improved by impairing the "left-most" one.
Fig. 6. No Optimality in General: An Incomparable Result due to CP
(II) Inadequateness of Down-Safety: For syntactic and semantic CM, down-safe program points are always legal insertion points: inserting a computation at a down-safe point guarantees that it can be used on every program continuation. This, however, does not hold for semantic CP. Before showing this in detail, we illustrate the difference between "placable" VFG-down-safety (VFG-DownSafe) and "movable" VFG-down-safety (VFG-M-DownSafe) by means of Figure 7, which shows the value-flow graph corresponding to the example of Figure 3. We remark that the predicate VFG-DownSafe is decidable. The point to be demonstrated here, however, is that this property is anyhow insufficient for (straightforwardly) arriving at an algorithm for semantic CP.
[Figure 7 shows, in parts (a) and (b), the value-flow graph corresponding to the example of Figure 3; part (a) marks the nodes that are placably VFG-down-safe, part (b) those that are movably VFG-down-safe.]

Fig. 7. The Corresponding Value-Flow Graph
This is illustrated by Figure 8, which shows a slight modification of the program of Figure 6. Though the placement of h := a + b is perfectly down-safe, it cannot be used at all, and thus impairs the program.
Fig. 8. Inadequateness of Down-Safety: Degradation through Naive CP
Summary: The examples of Figure 3 and of this section share the property that they are invariant under semantic CM, since a + b cannot safely be moved to (and hence not across) the join node in the mid part. However, a + b can safely be placed in the left branch of the upper branch statement. In the example of Figure 3 this suffices to show that semantic CP is in general strictly more powerful than semantic CM. On the other hand, Figure 6 demonstrates that "computational optimality" for semantic CP is impossible in general. While this rules out the possibility of an algorithm for semantic CP being uniformly superior to every other semantic CP-algorithm, the inadequateness of down-safety revealed by the second example of this section explains the lack even of heuristics for semantic CP: down-safety, the magic wand of syntactic and semantic CM, loses its magic for semantic CP. In fact, straightforward adaptations of the semantic CM-procedure to semantic CP would be burdened with the placement anomalies of Figures 6 and 8. We conjecture that a satisfactory solution to these problems requires structural changes of the argument program. Summarizing, we have:
Theorem 5 (Semantic Code Placement).
1. Semantic CP is strictly more powerful than semantic CM.
2. Computational placement-optimality is impossible in general.
3. Down-safety is inadequate for semantic CP.
6 Conclusion

We have re-investigated semantic CM from the perspective of the recent development of syntactic CM. This has clarified the essential difference between the syntactic and the semantic approach, and uncovered the difference between CM and CP in the semantic setting. Central to the understanding of this difference is the role of the notion of safety of a program point for some computation. Modification of the considered notion of safety is the key to obtaining a transfer of the syntactic algorithm to the semantic setting which captures the full potential of motion algorithms. However, in contrast to the syntactic setting, motion algorithms do not capture the full potential of code placement. In fact, we conjecture that there is no satisfactory solution to the code placement problem unless one is prepared to change the structure of the argument program.
References
1. B. Alpern, M. N. Wegman, and F. K. Zadeck. Detecting equality of variables in programs. In Conf. Rec. 15th Symp. Principles of Prog. Lang. (POPL'88), pages 1–11. ACM, NY, 1988.
2. F. Chow. A Portable Machine Independent Optimizer – Design and Measurements. PhD thesis, Stanford Univ., Dept. of Electrical Eng., Stanford, CA, 1983. Publ. as Tech. Rep. 83-254, Comp. Syst. Lab., Stanford Univ.
3. C. Click. Global code motion/global value numbering. In Proc. ACM SIGPLAN Conf. Prog. Lang. Design and Impl. (PLDI'95), volume 30,6 of ACM SIGPLAN Not., pages 246–257, 1995.
4. J. Cocke and J. T. Schwartz. Programming languages and their compilers. Courant Inst. Math. Sciences, NY, 1970.
5. D. M. Dhamdhere. A fast algorithm for code movement optimization. ACM SIGPLAN Not., 23(10):172–180, 1988.
6. D. M. Dhamdhere. Practical adaptation of the global optimization algorithm of Morel and Renvoise. ACM Trans. Prog. Lang. Syst., 13(2):291–294, 1991. Tech. Corr.
7. D. M. Dhamdhere, B. K. Rosen, and F. K. Zadeck. How to analyze large programs efficiently and informatively. In Proc. ACM SIGPLAN Conf. Prog. Lang. Design and Impl. (PLDI'92), volume 27,7 of ACM SIGPLAN Not., pages 212–223, 1992.
8. K.-H. Drechsler and M. P. Stadel. A solution to a problem with Morel and Renvoise's "Global optimization by suppression of partial redundancies". ACM Trans. Prog. Lang. Syst., 10(4):635–640, 1988. Tech. Corr.
9. A. Fong, J. B. Kam, and J. D. Ullman. Application of lattice algebra to loop optimization. In Conf. Rec. 2nd Symp. Principles of Prog. Lang. (POPL'75), pages 1–9. ACM, NY, 1975.
10. G. A. Kildall. A unified approach to global program optimization. In Conf. Rec. 1st Symp. Principles of Prog. Lang. (POPL'73), pages 194–206. ACM, NY, 1973.
11. J. Knoop, D. Koschützki, and B. Steffen. Basic-block graphs: Living dinosaurs? In Proc. 7th Int. Conf. on Compiler Constr. (CC'98), LNCS, Springer-V., 1998.
12. J. Knoop, O. Rüthing, and B. Steffen. Lazy code motion. In Proc. ACM SIGPLAN Conf. Prog. Lang. Design and Impl. (PLDI'92), volume 27,7 of ACM SIGPLAN Not., pages 224–234, 1992.
13. J. Knoop, O. Rüthing, and B. Steffen. Optimal code motion: Theory and practice. ACM Trans. Prog. Lang. Syst., 16(4):1117–1155, 1994.
14. J. Knoop, O. Rüthing, and B. Steffen. Code Motion and Code Placement: Just Synonyms? Technical Report MIP-9716, Fakultät für Mathematik und Informatik, Universität Passau, Germany, 1997.
15. E. Morel and C. Renvoise. Global optimization by suppression of partial redundancies. Comm. ACM, 22(2):96–103, 1979.
16. J. H. Reif and R. Lewis. Symbolic evaluation and the global value graph. In Conf. Rec. 4th Symp. Principles of Prog. Lang. (POPL'77), pages 104–118. ACM, NY, 1977.
17. B. K. Rosen, M. N. Wegman, and F. K. Zadeck. Global value numbers and redundant computations. In Conf. Rec. 15th Symp. Principles of Prog. Lang. (POPL'88), pages 12–27. ACM, NY, 1988.
18. A. Sorkin. Some comments on a solution to a problem with Morel and Renvoise's "Global optimization by suppression of partial redundancies". ACM Trans. Prog. Lang. Syst., 11(4):666–668, 1989. Tech. Corr.
19. B. Steffen. Property-oriented expansion. In Proc. 3rd Stat. Analysis Symp. (SAS'96), LNCS 1145, pages 22–41. Springer-V., 1996.
20. B. Steffen, J. Knoop, and O. Rüthing. The value flow graph: A program representation for optimal program transformations. In Proc. 3rd Europ. Symp. Programming (ESOP'90), LNCS 432, pages 389–405. Springer-V., 1990.
21. B. Steffen, J. Knoop, and O. Rüthing. Efficient code motion and an adaption to strength reduction. In Proc. 4th Int. Conf. Theory and Practice of Software Development (TAPSOFT'91), LNCS 494, pages 394–415. Springer-V., 1991.
Mode-Automata: About Modes and States for Reactive Systems⋆

Florence Maraninchi and Yann Rémond

VERIMAG⋆⋆ – Centre Equation, 2 Av. de Vignate – F38610 GIERES
http://www.imag.fr/VERIMAG/PEOPLE/Florence.Maraninchi
Abstract. In the field of reactive system programming, data flow synchronous languages like Lustre [BCH+85,CHPP87] or Signal [GBBG85] offer a syntax similar to block-diagrams, and can be efficiently compiled into C code, for instance. Designing a system that clearly exhibits several "independent" running modes is not difficult, since the mode structure can be encoded explicitly with the available data flow constructs. However, the mode structure is no longer readable in the resulting program; modifying it is error prone, and it cannot be used to improve the quality of the generated code. We propose to introduce a special construct devoted to the expression of a mode structure in a reactive system. We call it mode-automaton, for it is basically an automaton whose states are labeled by data flow programs. We also propose a set of operations that allow the composition of several mode-automata (parallel and hierarchic compositions taken from Argos [Mar92]), and we study the properties of our model, like the existence of a congruence of mode-automata, as well as implementation issues.
1 Introduction

The work we report on here has been motivated by the need to talk about running modes in a data flow synchronous language. Data flow languages like Lustre [BCH+85,CHPP87] or Signal [GBBG85] belong to the family of synchronous languages [BB91] devoted to the design, programming and validation of reactive systems. They have a formal semantics and can be efficiently compiled into C code, for instance. The data flow style is clearly appropriate when the behaviour of the system to be described has some regularity, as in signal processing. Designing a system that clearly exhibits several "independent" running modes is not so difficult, since the mode structure can be encoded explicitly with the available data flow constructs. However, the mode structure is no longer readable in the resulting program; modifying it is error prone, and it cannot be used to improve the

⋆ This work has been partially supported by Esprit Long Term Research Project SYRF 22703.
⋆⋆ Verimag is a joint laboratory of Université Joseph Fourier (Grenoble I), CNRS and INPG.
quality of the generated code, decompose the proofs, or at least serve as a guide in the analysis of the program. In section 2 we propose a definition of a mode, in order to motivate our approach. Section 3 defines mini-Lustre, a small subset of Lustre which is sufficient for presenting our notion of mode-automata in section 4. Section 5 explains how to compose these mode-automata with the operators of Argos, and Section 6 defines a congruence. Section 7 compares the approach to others in which modes have been studied. Finally, section 8 gives some ideas for further work.
2 What is a Mode?

One way (and perhaps the only way) of facing the complexity of a system is to decompose it into several "independent" tasks. Of course the tasks are never completely independent, but it should be possible to find a decomposition in which the tasks are not too strongly connected with each other, i.e., in which the interface between tasks is relatively simple compared to their internal structure. The tasks correspond to some abstractions of the global behaviour of the system, and they may be viewed as different parts of this global behaviour, devoted to the treatment of distinct situations. Decomposing a system into tasks allows independent reasoning about the individual tasks. Tasks may be concurrent, in which case the system has to be decomposed into concurrent and communicating components. The interface defines how the components communicate and synchronize with each other in order to reach a global goal. Thinking in terms of independent modes is in some sense an orthogonal point of view, since a mode structure is sequential rather than concurrent. This is typically the case with the modes of an airplane, which can be as high-level as the "landing" mode, the "take-off" mode, etc. The normal behaviour of the system is a sequence of modes. In a transition between modes, the source mode is designed to build and guarantee a given configuration of the parameters of the system, such that the target mode can be entered. On the other hand, modes may be divided into sub-modes. Of course the mode structure may interfere with the concurrent structure, in which case each concurrent subsystem may have its own modes, and the global view of the system shows a Cartesian product of the sets of modes. Or the main view of the system may be a set of modes, and the description of each mode is further decomposed into concurrent tasks. Hence we need a richer notion of mode. This seems to give something similar to the notion of mode we find in Modecharts [JM88], where modes are structured as in Statecharts [Har87] (an And/Or tree). However, see section 7 for more comments about Modecharts, and a comparison between Modecharts and our approach.
2.1 Modes and States
Technically, all systems can be viewed as a (possibly huge, or even infinite) set of elementary and completely detailed states, such that the knowledge about the current state is sufficient to determine the correct output at any point in time. States are connected by transitions, whose firing depends on the inputs to the system. This complete model of the system behaviour may not be manageable, but it exists. We call its states and transitions execution states and transitions. Execution states are really concrete ones, related for instance to the content of the program memory during execution. The question is: how can we define the modes of a system in terms of its execution states and transitions? Since the state-transition view of a complex behaviour is intrinsically sequential, it seems that, in all cases, it should be possible to relate the abstract notion of mode to collections of execution states. The portion of behaviour corresponding to a mode is then defined as a set of execution states together with the attached transitions. Related questions are: are these collections disjoint? do they cover the whole set of states? S. Paynter [Pay96] suggests that these two questions are orthogonal, and defines Real-Time Mode-Machines, which describe exhaustive but not necessarily exclusive modes (see more comments on this paper in section 7). In fact, the only relevant question is that of exclusivity, since, for non-exhaustive modes, one can always consider that the "missing" states form an additional mode.
2.2 Talking about Modes in a Programming Language
All the formalisms or languages defined for reactive systems offer a parallel composition, together with some synchronization and communication mechanism. This operation supports a conceptual decomposition in terms of concurrent tasks, and the parallel structure can be used for compositional proofs, generation of distributed code, etc. The picture is not so clear for the decomposition into modes. The question here is how to use the mode structure of a complex system for programming it, i.e., what construct should we introduce in a language to express this view of the system? The mode structure should be as readable in a program as the concurrent structure is, thus making modifications easier; moreover, it should be usable to improve the quality of the generated code, or to serve as a guide for decomposing proofs. The key point is that it should be possible to project a program onto a given mode and obtain the behaviour restricted to this mode (as it is usually possible to project a parallel program onto one of its concurrent components and get the behaviour restricted to this component).
2.3 Modes and Synchronous Languages
None of the existing synchronous languages can be considered as providing a construct for expressing the mode structure of a reactive system.
We are particularly interested in data flow languages. When trying to think in terms of modes in a data flow language, one has to face two problems: first, there should be a way to express that some parts of a program are not always active (and this is not easy); second, if these parts of the program represent different modes, there should be a way of describing how the modes are organized into the global behaviour of the system. Several proposals have been made for introducing some control features into a data flow program, and this has been tried for one language or another among the synchronous family: [RM95] introduces in Signal a way to define intervals delimited by some properties of the inputs, to which the activity of some subprograms can be attached; [JLMR94,MH96] propose to mix the automaton constructs of Argos with the data flow style of Lustre: the refinement operation of Argos allows one to refine a state of an automaton by a (possibly complex) subsystem. Hence the activity of subprograms is attached to states. Embedding Lustre nodes in an Esterel program is possible, and would have the same effect. However, providing a full set of start-and-stop control structures for a data flow language does not necessarily improve the way modes can be dealt with. It solves the first problem mentioned above, i.e., the control structures taken from the imperative style allow the specification of activity periods of some subprograms described in a data flow declarative style. But it does little for the second problem: a control structure like the interrupt makes it easy to express that the system switches between different behaviours, losing information about the current state of the behaviour that is interrupted, and starting a new one in some initial configuration. Of course, some information may be transmitted from the behaviour that is killed to the one that is started, but this is not the default, and it has to be expressed explicitly, with the communication mechanism for instance. For switching between modes, we claim that the emphasis should be on what is transmitted from one mode to another. Transmitting the whole configuration reached by the system should be the default if we consider that the source mode is designed to build and guarantee a given configuration of the parameters of the system, such that the target mode can be entered.
2.4 A Proposal: Mode-Automata

We propose a programming model called "mode-automata", made of: operations on automata taken from the definition of Argos [Mar92]; data flow equations taken from Lustre [BCH+85]. We shall see that mode-automata can be considered as a discrete version of hybrid automata [MMP91], in which the states are labeled by systems of differential equations that describe how the continuous environment evolves. In our model, states represent the running modes of a system, and the equations associated with the states could be obtained by discretizing the control laws. Mode-automata have the property that a program may be projected onto one of its modes.
3 Mini-Lustre: a (very) Small Subset of Lustre

For the rest of the paper, we use a very small subset of Lustre. A program is a single node, and we avoid the complexity related to types as much as possible. In some sense, the mini-Lustre model we present below is closer to the DC [CS95] format used as an intermediate form in the Lustre, Esterel and Argos compilers.
Definition 1 (mini-Lustre programs). N = (V_i, V_o, V_l, f, I) where: V_i, V_o and V_l are pairwise disjoint sets of input, output and local variable names. I is a total function from V_o ∪ V_l to constants. f is a total function from V_o ∪ V_l to the set Eq(V_i ∪ V_o ∪ V_l), where Eq(V) is the set of expressions with variables in V, defined by the following grammar: e ::= c | x | op(e,…,e) | pre(x). Here c stands for constants, x stands for a name in V, and op stands for all combinational operators. An interesting one is the conditional if e1 then e2 else e3, where e1 should be a Boolean expression and e2, e3 should have the same type. pre(x) stands for the previous value of the flow denoted by x. In case one needs pre(x) at the first instant, I(x) should be used. □

We restrict mini-Lustre to integer and Boolean values. All expressions are assumed to be typed correctly. As in Lustre, we require that the dependency graph between variables be acyclic. A dependency of X onto Y appears whenever there exists an equation of the form X = …Y… and Y does not appear inside a pre operator. In the syntax of mini-Lustre programs, it means that Y appears in the expression f(X), not in a pre operator.

Definition 2 (Trace semantics of mini-Lustre). Each variable name v in the mini-Lustre program describes a flow of values of its type, i.e. an infinite sequence v₀, v₁, …. Given a sequence of inputs, i.e. the values v_n for each v ∈ V_i and each n ≥ 0, we describe below how to compute the sequences (or traces) of local and output flows of the program. The initialization function gives values to variables for the instant "before time starts", since it provides values in case pre(x) is needed at instant 0. Hence we can call it x₋₁:

  ∀ v ∈ V_o ∪ V_l:  v₋₁ = I(v)

For all instants in time, the value of an output or local variable is computed according to its definition as given by f:

  ∀ n ≥ 0: ∀ v ∈ V_o ∪ V_l:  v_n = f(v)[x_n / x][x_{n−1} / pre(x)]

We take the expression f(v), in which we replace each variable name x by its current value x_n, and each occurrence of pre(x) by the previous value x_{n−1}. This yields an expression in which combinational operators are applied to constants. The set of equations we obtain for defining the values of all the flows over time is acyclic, and is a sound definition. □

Definition 3 (Union of mini-Lustre nodes). Provided they do not define the same outputs, i.e. V_o¹ ∩ V_o² = ∅, we can put together two mini-Lustre programs. This operation consists in connecting the outputs of one of them to the inputs of the other, if they have the same name. These connecting variables should be removed from the inputs of the global program, since we now provide definitions for them. This corresponds to the usual data flow connection of two nodes.

  (V_i¹, V_o¹, V_l¹, f¹, I¹) ∪ (V_i², V_o², V_l², f², I²) =
    ((V_i¹ ∪ V_i²) \ V_o¹ \ V_o², V_o¹ ∪ V_o², V_l¹ ∪ V_l²,
     λx. if x ∈ V_o¹ ∪ V_l¹ then f¹(x) else f²(x),
     λx. if x ∈ V_o¹ ∪ V_l¹ then I¹(x) else I²(x))

Local variables should be disjoint also, but we can assume that a renaming is performed before two mini-Lustre programs are put together; hence V_l¹ ∩ V_l² = ∅ is guaranteed. The union of sets of equations should still satisfy the acyclicity constraint. □

Definition 4 (Trace equivalence for mini-Lustre). Two programs L1 = (V_i, V_o, V_l¹, f¹, I¹) and L2 = (V_i, V_o, V_l², f², I²) having the same input/output interface are trace-equivalent (denoted by L1 ∼ L2) if and only if they give the same sequence of outputs when fed with the same sequence of inputs. □

Definition 5 (Trace equivalence for mini-Lustre with no initial specification). We consider mini-Lustre programs without initial specification, i.e. mini-Lustre programs without the function I that gives values for the flows "before time starts". Two such objects L1 = (V_i, V_o, V_l¹, f¹) and L2 = (V_i, V_o, V_l², f²) having the same input/output interface are trace-equivalent (denoted by L1 ≈ L2) if and only if, for all initial configurations I, they give the same sequence of outputs when fed with the same sequence of inputs. □
Property 1 (Trace equivalence is preserved by union). L ∼ L′ ⟹ L ∪ M ∼ L′ ∪ M. □
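To make the trace semantics concrete, here is a small interpreter sketch for mini-Lustre in Python. It is our own illustration: the encoding of expressions and all function names are assumptions, not part of the paper. It evaluates the equations of Definition 2 instant by instant, using the previous instant's values for pre.

    # Expressions: constants (int/bool), variable names (str), ("pre", x),
    # or ("op", fn, e1, ..., ek) where fn is a Python function.

    def eval_expr(e, cur, prev):
        if isinstance(e, (int, bool)):
            return e
        if isinstance(e, str):
            return cur[e]  # raises KeyError if not computed yet
        if e[0] == "pre":
            return prev[e[1]]
        _, fn, *args = e
        return fn(*(eval_expr(a, cur, prev) for a in args))

    def run(node, input_trace):
        """node = (Vi, Vo, Vl, f, I); yields the outputs instant by instant.
        Assumes acyclic dependencies, as required by the paper, and that pre
        is applied only to output/local flows (whose initial values I gives)."""
        Vi, Vo, Vl, f, I = node
        prev = dict(I)  # values "before time starts"
        for inputs in input_trace:
            cur = {v: inputs[v] for v in Vi}
            pending = set(Vo) | set(Vl)
            while pending:  # naive scheduling; acyclicity guarantees progress
                for v in list(pending):
                    try:
                        cur[v] = eval_expr(f[v], cur, prev)
                        pending.discard(v)
                    except KeyError:  # some dependency not computed yet
                        pass
            yield {v: cur[v] for v in Vo}
            prev = cur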
4 Mode-Automata
4.1 Example and Definition
The mode-automaton of Figure 1 describes a program that outputs an integer X. The initial value is 0. Then the program has two modes: an incrementing mode and a decrementing one. Changing modes is done according to the value reached by the variable X: when it reaches 10, the mode is switched to "decrementing"; when X reaches 0 again, the mode is switched to "incrementing". For simplicity, we give the definition for a simple case where the equations define only integer variables. One could easily extend this framework to all types of variables.
[Figure 1 shows two states A and B, with X initialized to 0; state A is labeled X = pre(X) + 1, state B is labeled X = pre(X) − 1, with transitions A → B on X = 10 and B → A on X = 0.]
Fig. 1. Mode-automata: a simple example
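Before the formal definition, the behaviour of Figure 1 is easy to simulate directly. The following sketch is our own illustration, not part of the paper; it runs the two-mode counter for a few instants and shows the expected saw-tooth trace.

    def figure1_trace(steps):
        mode, x, out = "A", 0, []
        for _ in range(steps):
            x = x + 1 if mode == "A" else x - 1  # equation of the current mode
            # mode changes are triggered by the value reached by X
            if mode == "A" and x == 10:
                mode = "B"
            elif mode == "B" and x == 0:
                mode = "A"
            out.append(x)
        return out

    print(figure1_trace(25))  # 1,2,...,10,9,...,0,1,... (saw-tooth between 0 and 10)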
Definition 6 (Mode-automata).
A mode-automaton is a tuple (Q, q0, V_i, V_o, I, f, T) where:
– Q is the set of states of the automaton part;
– q0 ∈ Q is the initial state;
– V_i and V_o are sets of names for input and output integer variables;
– T ⊆ Q × C(V) × Q is the set of transitions, labeled by conditions on the variables of V = V_i ∪ V_o;
– I : V_o → Z is a function defining the initial values of output variables;
– f : Q → (V_o → EqR) defines the labeling of states by total functions from V_o to the set EqR(V_i ∪ V_o) of expressions that constitute the right-hand sides of the equations defining the variables of V_o. The expressions in EqR(V_i ∪ V_o) have the same syntax as in mini-Lustre nodes: e ::= c | x | op(e,…,e) | pre(x), where c stands for constants, x stands for a name in V_i ∪ V_o, and op stands for all combinational operators. The conditions in C(V_i ∪ V_o) are expressions of the same form, but without pre operators; the type of an expression serving as a condition is Boolean. □
o
o
o
Note that Input variables are used only in the right parts of the equations, or in the conditions. Output variables are used in the left parts of the equations, or in the conditions. We require that the automaton part of a mode-automaton be deterministic, i.e., for each state q 2 Q, if there exist two outgoing transitions (q; c1 ; q1 ) and (q; c2 ; q2 ), then c1 ^ c2 is not satis able. We also Wrequire that the automaton be reactive, i.e., for each state q 2 Q, the formula ( )2 c is true. With these de nitions, the example of gure 1 is written as: (fA; B g; A; ;; fX g; I : X ! 0; f (A) = f X = pre(X) + 1 g; f (B ) = f X = pre(X) - 1 g); f(A; X = 10; B ); (B; X = 0; A); (A; X 6= 10; A); (B; X 6= 0; B )g) In the graphical notation of the example, we omitted the two loops (A; X 6= 10; A) and (B; X 6= 0; B ). q;c;q 0
T
4.2 Semantics by Translation into Mini-Lustre
The main idea is to translate the automaton structure of a mode-automaton into mini-Lustre, in a very classical and straightforward way. Then we gather all the sets of equations attached to states into a single conditional structure. We choose to encode each state by a Boolean variable. Arguments for a more ecient encoding exist, and such an encoding could be applied here, independently from the other part of the translation. However, it is sometimes desirable that the pure Lustre program obtained from mode-automata be readable; in this case, a clear (and one-to-one) relation between states in the mode-automaton, and variables in the Lustre program, is required. The function L associates a mini-Lustre program with a mode-automaton. We associate a Boolean local variable with each state in Q = fq0 ; q1 ; :::; q g, with the same name. Hence: L((Q; q0 ; V ; V ; I ; f; T )) = (V ; V ; Q; e; J ) The initial values of the variables in V are given by the initialization function I of the mode-automaton, hence 8x 2 V ; J (x) = I (x). For the local variables of the mini-Lustre program, which correspond to the states of the mode-automaton, we have: J (q0 ) = true and J (q) = false; 8q 6= q0 . The equation for a local variable q that encodes a state q expresses that we are in state q at a given instant if and only if we were in some state q0 , and a transition (q0 ; c; q) could be taken. Note that, because the automaton is reactive, the system can always take a transition, in any state. A particular case is q0 = q: staying in a state means taking a loop on that state, at each instant _ for q 2 Q; e(q) is the expression: pre (q0 ^ c) n
i
o
i
o
o
o
2
(q 0 ;c;q )
T
For x 2 V , e(x) is the expression : if q0 then f (q0 )else if q1 then f (q1 ):::else if q then f (q ) The mini-Lustre program obtained for the example is the following (note that pre(A and X = 10) is the same as pre(A) and pre(X) = 10, hence the equations have the form required in the de nition of mini-Lustre). V = ; V = fX g V = fA; B g f (X ) : if A then pre(X)+1 else pre(X)-1 f (A) : pre (A and not X=10) or pre(B and X = 0) f (B ) : pre (B and not X=0) or pre(A and X = 10) I (X ) = 0 I (A) = true I (B ) = false o
n
i
o
n
l
5 Compositions of Mode-Automata
5.1 Parallel Composition with Shared Variables
A single mode-automaton is appropriate when the structure of the running modes is at. Parallel composition of mode-automata is convenient whenever
the modes can be split into at least two orthogonal sets, such that a set of modes is used for controlling some of the variables, and another set of modes is used for controlling other variables. Y:0
Y = 10 A
Y=0
B Y = pre(Y) , 1
Y = pre(Y) + 1 X>100 C X = pre(X) + Y X:0
X <50
D X = pre(X) , Y
Fig. 2. Parallel composition of mode-automata, an example
De nition Provided V \ V = ;, we de ne the parallel composition of two 1
2
o
o
mode-automata by : (Q1 ; q01 ; T 1; V 1 ; V 1 ; I 1 ; f 1 )k(Q2 ; q02 ; T 2; V 2 ; V 2 ; I 2 ; f 2) = (Q1 Q2 ; (q01 ; q02 ); (V 1 [ V 2) n V 1 n V 2; V 1 [ V 2; I ; f ) Where: 1 1 X ) if X 2 V 1 1 2 f (q ; q )(X ) = ff 2 ((qq2 )( )(X ) otherwise, i.e. if X 2 V 2 o
i
i
i
i
o
o
o
o
o
o
o
Similarly:
) if X 2 V I (X ) = II ((X X ) if X 2 V 1 2
1
o
2
o
And the set T of global transitions is de ned by: (q1 ; C 1 ; q01 ) 2 T 1 ^ (q2 ; C 2 ; q02 ) 2 T 2 =) ((q1 ; q2 ); C 1 ^ C 2 ; (q01 ; q02 )) 2 T The following property establishes a relationship between the parallel composition de ned for mode-automata as a synchronous product, and the intrinsic parallelism of Lustre programs, captured by the union of sets of equations:
Property 2 L(M kM ) L(M ) [ L(M ) 1
2
1
2
2
5.2 Ideas for Hierarchic Modes A very common notion related to modes is that of sub-modes. In the hierarchic composition below, the equation X = pre(X) , 1 associated with state D is shared by the two submodes, while these submodes have been introduced for de ning Y more precisely. Note that the scope of Y is the whole program, not only state D. Splitting state D for re ning the de nition of Y has something to do with equivalences of mode-automata (see below). Indeed, we would like the hierarchic composition to be de ned in such a way that the program of gure 3 be \equivalent" to that of gure 4, where the program associated with state D uses a Boolean variable q. \Re ning" (or exploding) the conditional de nition of Y into the mode-automaton with states A; B of gure 3 is the work performed by the Lustre compiler, when building an interpreted automaton from a Lustre program. X, Y : 0 D
X=pre(X)+1 X=10 Y=pre(Y)+2 C
X=pre(X),1 Y = 10 A
Y=0
B Y = pre(Y) , 1
Y = pre(Y) + 1
X=0
Fig. 3. Hierarchic composition of mode-automata, an example X, Y, q : 0
X =0 D
X = 10
q=0 X=pre(X)+1 Y=pre(Y)+2 C
q=if pre(q=0 and Y=10 or q=1 and Y 6=0) then 1 else 0 Y= if q=0 then pre(Y) + 1 else pre(Y) - 1 X=pre(X),1
Fig. 4. Describing Y with a conditional program (q = 0 corresponds to A; q = 1 corresponds to B )
6 Congruences of Mode-Automata We try to de ne an equivalence relation for mode-automata, to be a congruence for the parallel composition. There are essentially two ways of de ning such an equivalence : either as a relation induced by the existing trace equivalence of Lustre programs; or by an explicit de nition on the structure of mode-automata, inspired by the trace equivalence of automata. The idea is that, if two states have equivalent sets of equations, then they can be identi ed.
De nition 7 (Induced equivalence of mode-automata). M M () L(M ) L(M ) (see de nition 4 for ). 1
2
i
1
2
2
De nition 8 (Direct equivalence of mode-automata). The direct equivalence is a bisimulation, taking the labeling of states into account: (Q1 ; q01 ; T 1; V 1 ; V 1 ; I 1 ; f 1 ) (Q2 ; q02 ; T 2; V 2 ; V 2 ; I 2 ; f 2 ) () 9R R such that: (q01 ; q02 ) 2 R ^ 8 1 1 01 1 =) 9q02 ; c2 s. t. (q2 ; c2 ; q02 ) 2 T 2 > > < (q ; c ; q ) 2 T ^ (q01 ; q02 ) 2 R (q1 ; q2 ) 2 R =) > ^ c1 = c2 > : and conversely. i
o
d
i
o
s
Where R Q1 Q2 is the relation on states induced by the equivalence of the attached sets of equations: (q1 ; q2 ) 2 R () f 1 (q1 ) f 2 (q2 ) (see de nition 5 for ). 2 s
s
Note: the conditions labeling transitions are Boolean expressions with subexpressions of the form a#b, where a and b are integer constants or variables, and # is a comparison operator yielding a Boolean result. We can always consider that these conditions are expressed as sums of monomials, and that a transition labeled by a sum c _ c0 is replaced by two transitions labeled by c and c0 (between the same states). Hence, in the equivalence relation de ned above, we can require that the conditions c1 and c2 match. Since c1 = c2 is usually undecidable (it is not the syntactic identity), the above de nition is not practical. It is important to note that the two equivalences do not coincide: M1 M2 =) M1 M2, but this is not an equivalence. Translating the modeautomaton into Lustre before testing for equivalence provides a global comparison of two mode-automata. On the contrary, the second de nition of equivalence compares subprograms attached to states, and may fail in recognizing that two mode-automata describe the same global behaviour, when there is no way of establishing a correspondence between their states (see example below). Example 1. Let us consider a program that outputs an integer X . X has three dierent behaviours, described by: B1 : X = pre(X ) + 1, B2 : X = pre(X ) + 2 d
i
and B3 : X = pre(X ) , 1. The transitions between these behaviours are triggered by conditions on X : C is the condition for switching from B to B . A mode-automaton M1 describes B1 and B2 with a state q12 , and B3 with another state q3 . Of course the program associated with q12 has a conditional structure, depending on C12 , C21 and a Boolean variable. The program associated with q3 is B3 . Now, consider the mode-automaton M2 that describes B1 with a state q10 , B2 0 . In M2 , the program associated with q23 0 has a conditional and B3 with a state q23 structure. There is no relation R between the states of M1 and the states of M2 that would allow to recognize that they are equivalent. Translating them into miniLustre is a way of translating them into single-state mode-machines (with conditional associated programs), and it allows to show that they are indeed equiv0 into alent. On the other hand, if we are able to split q12 into two states, and q23 two states (as suggested in section 5.2) then the two machines have three states, and we can show that they are equivalent. 2 ij
i
j
Property 3 (Congruences of mode-automata).
The two equivalences are congruences for parallel composition. □
7 Related Work and Comments on the Notion of Mode

We already explained in section 2.3 that there exists no construct dedicated to the expression of modes in the main synchronous programming languages. Mode-automata are a proposal for that. They allow a distinction between explicit states (corresponding to modes, and described by the automaton part of the program) and implicit states (far more detailed, grouped into modes, and described by the data flow equational part of the program). This is an answer to people who argue that modes should not be related to states, because there are far more states than modes. Other people argue that modes are not necessarily exclusive. The modes in one mode-automaton are exclusive. However, concurrent modes are elements of a Cartesian product, and can share something. Similarly, two submodes of a refined mode also share something. We tried to find a motivation for (and thus a definition of) non-exclusive modes. Modecharts have recently joined the synchronous community, and an Esterel-like semantics can be found in [PSM96]. In that paper, modes are simply hierarchical and concurrent states, as in Argos [Mar92]. It is mentioned that "the actual interaction with the environment is produced by the operations associated with entry and exit events". Hence the modes are not dealt with in the language itself; the language allows the description of a complex control structure, and an external activity can be attached to a composed state. It seems that the activity is not necessarily killed when the state is left; hence the activities associated with exclusive states are not necessarily exclusive. This seems to be the motivation for non-exclusive modes. Activities are similar to the external tasks of Esterel but, in Esterel, the way tasks interfere with the control structure is well defined in the language itself. Real-time mode-machines have been proposed in [Pay96]. In that paper, modes are collections of states, in the sense recalled above (section 2.1). These collections are exhaustive but not exclusive. However, it seems that this requirement for non-exclusivity is related to pipelining of the execution: part of the system is still busy with a given piece of data, while another part is already using the next piece of data. The question is whether pipelining has anything to do with overlapping or non-exclusive modes. In software pipelining, there may be two components running in parallel and corresponding to the same piece of source program; if this portion of source describes modes, it may be the case that the two execution instances of it are in different modes at the same time, because one of them starts treating some piece of data while the other one finishes treating another piece. Each instance is in exactly one mode at a given instant; should this phenomenon be called "non-exclusive modes"? We are still searching for examples of non-exclusive modes in reactive systems.
8 Implementation and Further Work

We presented mode-automata, a way to deal with modes in a synchronous language. Parallel composition is well defined, and we also have a congruence of mode-automata w.r.t. this operation. We still have to study the hierarchic composition, following the lines of [HMP95] (in that paper we proposed an extension of Argos dedicated to hybrid systems, as a description language for the tool Polka [HPR97], in which hybrid automata may be composed using the Argos operators; in particular, hierarchic composition in hybrid Argos is a way to express that a set of states share the same description of the environment). We shall also extend the language of mode-automata by allowing full Lustre in the equations labeling the states (clocks, node calls, external functions or procedures, ...). Mode-automata composed in parallel are already available as a language, compiled into pure Lustre, thus benefiting from all the Lustre tools. We said in section 2 that "it should be possible to project a program onto a given mode, and obtain the behaviour restricted to this mode". How can we do that for programs made of mode-automata? When a program is reduced to a single mode-automaton, the mode is a state, and extracting the subprogram of this mode consists in taking the equations associated with this state. The object we obtain is a mini-Lustre program without an initial state. When the program is something more complex, we are still able to extract a non-initialized mini-Lustre program for a given composed mode; for instance, the program of a parallel mode (q, q′) is the union of the programs attached to q and q′. Projecting the complete program onto its modes may be useful for generating efficient sequential code. Indeed, the mode structure clearly identifies which parts of a program are active at a given instant. In the SCADE (Safety Critical Application Development Environment) tool sold by Verilog S.A. (based upon a commercial Lustre compiler), designers use an activation condition if they want to express that some part of the data flow program should not be computed at each instant. It is a low-level mechanism which has to be used carefully: the activation condition for a subprogram P is a Boolean flow computed elsewhere in the data flow program. Our mode structure is a higher-level generalization of this simple mechanism. It is a real language feature, and it can be used for better code generation. Another interesting point about the mode structure of a program is the possibility of decomposing proofs. For the decomposition of a problem into concurrent tasks, and the usual parallel compositions that support this design, people have proposed compositional proof rules, for instance the assume-guarantee scheme. The idea is to prove properties separately for the components, and to infer some property of the global system. We claim that the decomposition into several modes (provided the language allows one to deal with modes explicitly, i.e. to project a global program onto a given mode) should have a corresponding compositional proof rule. At least, a mode-automaton is a way of identifying a control structure in a complex program. It can be used for splitting the work in analysis tools like Polka [HPR97].
References

[BB91] A. Benveniste and G. Berry. Another look at real-time programming. Special Section of the Proceedings of the IEEE, 79(9), September 1991.
[BCH+85] J-L. Bergerand, P. Caspi, N. Halbwachs, D. Pilaud, and E. Pilaud. Outline of a real time data-flow language. In Real Time Systems Symposium, San Diego, September 1985.
[CHPP87] P. Caspi, N. Halbwachs, D. Pilaud, and J. Plaice. Lustre, a declarative language for programming synchronous systems. In 14th Symposium on Principles of Programming Languages, Munich, January 1987.
[CS95] C2A-SYNCHRON. The common format of synchronous languages – The declarative code DC version 1.0. Technical report, SYNCHRON project, October 1995.
[GBBG85] P. Le Guernic, A. Benveniste, P. Bournai, and T. Gauthier. Signal: A data flow oriented language for signal processing. Technical report, IRISA report 246, IRISA, Rennes, France, 1985.
[Har87] D. Harel. Statecharts: A visual approach to complex systems. Science of Computer Programming, 8:231–275, 1987.
[HMP95] N. Halbwachs, F. Maraninchi, and Y. E. Proy. The railroad crossing problem, modeling with hybrid Argos – analysis with Polka. In Second European Workshop on Real-Time and Hybrid Systems, Grenoble (France), June 1995.
[HPR97] N. Halbwachs, Y. E. Proy, and P. Roumanoff. Verification of real-time systems using linear relation analysis. Formal Methods in System Design, 11(2):157–185, August 1997.
[JLMR94] M. Jourdan, F. Lagnier, F. Maraninchi, and P. Raymond. A multiparadigm language for reactive systems. In 5th IEEE International Conference on Computer Languages, Toulouse, May 1994. IEEE Computer Society Press.
[JM88] Farnam Jahanian and Aloysius Mok. Modechart: A specification language for real-time systems. IEEE Transactions on Software Engineering, 14, 1988.
[Mar92] F. Maraninchi. Operational and compositional semantics of synchronous automaton compositions. In CONCUR, LNCS 630, Springer Verlag, August 1992.
[MH96] F. Maraninchi and N. Halbwachs. Compiling Argos into Boolean equations. In Formal Techniques for Real-Time and Fault Tolerance (FTRTFT), Uppsala (Sweden), September 1996. Springer Verlag, LNCS 1135.
[MMP91] O. Maler, Z. Manna, and A. Pnueli. From timed to hybrid systems. In Rex Workshop on Real-Time: Theory in Practice, DePlasmolen (Netherlands), June 1991. LNCS 600, Springer Verlag.
[Pay96] S. Paynter. Real-time mode-machines. In Formal Techniques for Real-Time and Fault Tolerance (FTRTFT), pages 90–109. LNCS 1135, Springer Verlag, 1996.
[PSM96] Carlos Puchol, Douglas Stuart, and Aloysius K. Mok. An operational semantics and a compiler for Modechart specifications. Technical Report CS-TR-95-37, University of Texas, Austin, July 1, 1996.
[RM95] E. Rutten and F. Martinez. SignalGTI, implementing task preemption and time intervals in the synchronous data-flow language Signal. In 7th Euromicro Workshop on Real Time Systems, Odense (Denmark), June 1995.
Building a Bridge between Pointer Aliases and Program Dependences*

John L. Ross¹ and Mooly Sagiv²

¹ University of Chicago, e-mail: [email protected]
² Tel-Aviv University, e-mail: [email protected]
Abstract. In this paper we present a surprisingly simple reduction of the program dependence problem to the may-alias problem. While both problems are undecidable, providing a bridge between them has great practical importance. Program dependence information is used extensively in compiler optimizations, automatic program parallelizations, code scheduling in super-scalar machines, and in software engineering tools such as code slicers. When working with languages that support pointers and references, these systems are forced to make very conservative assumptions. This leads to many superfluous program dependences and limits compiler performance and the usability of software engineering tools. Fortunately, there are many algorithms for computing conservative approximations to the may-alias problem. The reduction has the important property of always computing conservative program dependences when used with a conservative may-alias algorithm. We believe that the simplicity of the reduction and the fact that it takes linear time may make it practical for realistic applications.
1 Introduction

It is well known that programs with pointers are hard to understand, debug, and optimize. In recent years many interesting algorithms that conservatively analyze programs with pointers have been published. Roughly speaking, these algorithms [19, 20, 25, 5, 16, 17, 23, 8, 6, 9, 14, 27, 13, 28] conservatively solve the may-alias problem, i.e., the algorithms are sometimes able to show that two pointer access paths never refer to the same memory location at a given program point. However, may-alias information is insufficient for compiler optimizations, automatic code parallelizations, instruction scheduling for super-scalar machines, and software engineering tools such as code slicers. In these systems, information about the program dependences between different program points is required. Such dependences can be uniformly modeled by the program dependence graph (see [21, 26, 12]). In this paper we propose a simple yet powerful approach for finding program dependences for programs with pointers:

Given a program P, we generate a program P' (hereafter also referred to as the instrumented version of P) which simulates P. The program dependences of P can be computed by applying an arbitrary conservative may-alias algorithm to P'.

* Partially supported by Binational Science Foundation grant No. 9600337
We are reducing the program dependence problem, a problem of great practical importance, to the may-alias problem, a problem with many competing solutions. The reduction has the property that as long as the may-alias algorithm is conservative, the dependences computed are also conservative. Furthermore, there is no loss of precision beyond that introduced by the chosen may-alias algorithm. Since the reduction is quite efficient (linear in the program size), it should be possible to integrate our method into compilers, program slicers, and other software tools.
1.1 Main Results and Related Work

The major results in this paper are:

- The unification of the concepts of program dependences and may-aliases. While these concepts are seemingly different, we provide linear reductions between them. Thus may-aliases can be used to find program dependences and program dependences can be used to find may-aliases.
- A solution to the previously open question about the ability to use "store-less" (see [8-10]) may-alias algorithms such as [9, 27] to find dependences. One of the simplest store-less may-alias algorithms is due to Gao and Hendren [14]. In [15], the algorithm was generalized to compute dependences by introducing new names. Our solution implies that there is no need to re-develop a new algorithm for every may-alias algorithm. Furthermore, we believe that our reduction is actually simpler to understand than the names introduced in [15], since we are proving program properties instead of modifying a particular approximation algorithm.
- Our limited experience with the reduction, which indicates that store-less may-alias algorithms such as [9, 27] yield quite precise dependence information.
- The provision of a method to compare the time and precision of different may-alias algorithms by measuring the number of program dependences reported. This metric is far more interesting than just comparing the number of may-aliases as done in [23, 11, 31, 30, 29].

Our program instrumentation closely resembles the "instrumented semantics" of Horwitz, Pfeiffer, and Reps [18]. They propose to change the program semantics so that the interpreter will carry around program statements. We instrument the program itself to locally record statement information. Thus, an arbitrary may-alias algorithm can be used on the instrumented program without modification. In contrast, Horwitz, Pfeiffer, and Reps proposed modifications to the specific store-based may-alias algorithm of Jones and Muchnick [19] (which is imprecise and doubly exponential in space).

An additional benefit of our shift from semantics instrumentation to a program transformation is that it is easier to understand and to prove correct. For example, Horwitz, Pfeiffer, and Reps need to show the equivalence between the original and the instrumented program semantics and the instrumentation properties. In contrast, we show that the instrumented program simulates the original program and the properties of the instrumentation.

Finally, program dependences can also be conservatively computed by combining side-effect analysis [4, 7, 22, 6] with reaching definitions [2], or by combining conflict analysis [25] with reaching definitions as done in [24]. However, these techniques are extremely imprecise when recursive data structures are manipulated. The main reason is that it is hard to distinguish between occurrences of the same heap allocated run-time location (see [6, Section 6.2] for an interesting discussion).
1.2 Outline of the rest of this paper

In Section 2.1, we describe the simple Lisp-like language that is used throughout this paper. The main features of this language are its dynamic memory, pointers, and destructive assignment. The use of a Lisp-like language, as opposed to C, simplifies the presentation by avoiding types and the need to handle some of the difficult aspects of C, such as pointer arithmetic and casting. In Section 2.2, we recall the definition of flow dependences. In Section 2.3 the may-alias problem is defined. In Section 3 we define the instrumentation. We show that the instrumented program simulates the execution of the original program. We also show that for every run-time location of the original program the instrumented program maintains the history of the statements that last wrote into that location. These two properties allow us to prove that may-aliases in the instrumented program precisely determine the flow dependences in the original program. In Section 4, we discuss the program dependences computed by some may-alias algorithms on instrumented programs. Finally, Section 5 contains some concluding remarks.
2 Preliminaries

2.1 Programs

Our illustrative language (following [19, 5]) combines an Algol-like language for control flow and functions, Lisp-like memory access, and explicit destructive assignment statements. The atomic statements of this language are shown in Table 1. Memory access paths are represented by ⟨Access⟩. Valid expressions are represented by ⟨Exp⟩. We allow arbitrary control flow statements using conditions ⟨Cond⟩¹. Additionally, all statements are labeled.

¹ Arbitrary expressions and procedures can be allowed as long as it is not possible to observe actual run-time locations.
Figure 1 shows a program in our language that is used throughout this paper as a running example. This program reads atoms and builds them into a list by destructively updating the cdr of the tail of the list.
program DestructiveAppend()
begin
  s1:  new( head )
  s2:  read( head.car )
  s3:  head.cdr := nil
  s4:  tail := head
  s5:  while( tail.car ≠ 'x' ) do
  s6:    new( temp )
  s7:    read( temp.car )
  s8:    temp.cdr := nil
  s9:    tail.cdr := temp
  s10:   tail := tail.cdr
       od
  s11: write( head.car )
  s12: write( tail.car )
end.
Fig. 1. A program that builds a list by destructively appending elements to tail, and its flow dependences.
2.2 The Program Dependence Problem

Program dependences can be grouped into flow dependences (def-use), output dependences (def-def), and anti-dependences (use-def) [21, 12]. In this paper, we focus on flow dependences between program statements. The other dependences are easily handled with only minor modifications to our method.
Table 1. An illustrative language with dynamic memory and destructive updates.

⟨St⟩     ::= si: ⟨Access⟩ := ⟨Exp⟩
⟨St⟩     ::= si: new(⟨Access⟩)
⟨St⟩     ::= si: read(⟨Access⟩)
⟨St⟩     ::= si: write(⟨Exp⟩)
⟨Access⟩ ::= variable | ⟨Access⟩.⟨Sel⟩
⟨Exp⟩    ::= ⟨Access⟩ | atom | nil
⟨Sel⟩    ::= car | cdr
⟨Cond⟩   ::= ⟨Exp⟩ = ⟨Exp⟩
⟨Cond⟩   ::= ⟨Exp⟩ ≠ ⟨Exp⟩
Our language allows programs to explicitly modify their store through pointers. Because of this we must phrase the definition of flow dependence in terms of memory locations (cons-cells) and not variable names. We shall borrow the following definition for flow dependence:

Definition 1 ([18]). Program point q has a flow dependence on program point p if p writes into memory location loc that q reads, and there is no intervening write into loc along an execution path by which q is reached from p.

Figure 1 also shows the flow dependences for the running example program. Notice that s11 is flow dependent on only s1 and s2, while s12 is flow dependent on s2, s4, s7, and s10. This information could be used by slicing tools to find that the loop need not be executed to print head.car in s11, or by an instruction scheduler to reschedule s11 for anytime after s2. Also, s3 and s8 have no statements dependent on them, making them candidates for elimination. Thus, even in this simple example, knowing the flow dependences would potentially allow several code transformations.

Since determining the exact flow dependences in an arbitrary program is undecidable, approximation algorithms must be used. A flow dependence approximation algorithm is conservative if it always finds a superset of the true flow dependences.
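As a concrete reading of Definition 1, the following sketch (illustrative Python; the event encoding and the tiny trace are ours, not from the paper) computes the flow dependences observed along one execution path by tracking the last write into each location:

    # Each event is (statement label, locations written, locations read).
    def flow_dependences(trace):
        last_writer = {}   # memory location -> statement that last wrote it
        deps = set()       # (p, q): q has a flow dependence on p
        for stmt, writes, reads in trace:
            for loc in reads:
                if loc in last_writer:
                    deps.add((last_writer[loc], stmt))
            for loc in writes:
                last_writer[loc] = stmt   # a later write kills earlier ones
        return deps

    # A trace mimicking s1, s2, s11 of the running example on input "Ax";
    # "c1" names the cons-cell allocated at s1, as in Figure 3.
    trace = [("s1", ["head"], []),
             ("s2", ["c1.car"], ["head"]),
             ("s11", [], ["head", "c1.car"])]
    print(sorted(flow_dependences(trace)))
    # [('s1', 's11'), ('s1', 's2'), ('s2', 's11')]

Exact flow dependences quantify over all execution paths and are therefore undecidable; a conservative algorithm must over-approximate the union of such per-path results.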
2.3 The May-Alias Problem

The may-alias problem is to determine whether two access-paths, at a given program point, could denote the same cons-cell.

Definition 2. Two access-paths are may-aliases at program point p if there exists an execution path to program point p where both denote the same cons-cell.

In the running example program, head.cdr.cdr and tail are may-aliases at s6, since before the third iteration these access paths denote the same cons-cell. However, tail.cdr.cdr is not a may-alias to head since they can never denote the same cons-cell.

Since the may-alias problem is undecidable, approximation algorithms must be used. A may-alias approximation algorithm is conservative if it always finds a superset of the true may-aliases.
3 The Instrumentation

In this section the instrumentation is defined. For notational simplicity, P stands for an arbitrary fixed program, and P' stands for its instrumented version.

3.1 The Main Idea

The program P' simulates all the execution sequences of P. Additionally, the "observable" properties of P are preserved. Most importantly, P' records for every variable v the statement from P that last wrote into v. This "instrumentation information" is recorded in v.car (while storing the original content of v in v.cdr). This "totally static" instrumentation² allows program dependences to be recovered by may-alias queries on P'.

² In contrast to dynamic program slicing algorithms that record similar information using hash functions, e.g., [1].

More specifically, for every statement in P there is an associated cons-cell in P'. We refer to these as statement cons-cells. Whenever a statement si assigns a value into a variable v, P' allocates a cons-cell that we refer to as an instrumentation cons-cell. The car of this instrumentation cons-cell always points to the statement cons-cell associated with si. Thus there is a flow dependence from a statement p to q: x := y in P if and only if y.car can point to the statement cons-cell associated with p in P'. Finally, we refer to the cdr of the instrumentation cons-cell as the data cons-cell. The data cons-cell is inductively defined:
- If si is an assignment si: v := A for an atom A, then the data cell is A.
- If si is an assignment si: v := v', then the data cell denotes v'.cdr.
- If the statement is si: new(v), then the data cons-cell denotes a newly allocated cons-cell. Thus P' allocates two cons-cells for this statement.

In general, there is an inductive syntax-directed definition of the data cells, formally defined by the function txe given in Table 3.
3.2 The Instrumentation of the Running Example

To make this discussion concrete, Figure 2 shows the beginning of the running example program and its instrumented version. Figure 3 shows the stores of both programs just before the loop (on the input beginning with 'A'). The cons-cells in this figure are labeled for readability only.

The instrumented program begins by allocating one statement cons-cell for each statement in the original program. Then, for every statement in the original program, the instrumented statement block in the instrumented program records the last-wrote statement and the data. The variable rhs is used as a temporary to store the right-hand side of an assignment, to allow the same variable to be used on both sides of an assignment.

Let us now illustrate this for the statements s1 through s4 in Figure 3.

- In the original program, after s1, head points to a new uninitialized cons-cell, c1. In the instrumented program, after the block of statements labeled by s1, head points to an instrumentation cons-cell i1, head.car points to the statement cell for s1, and head.cdr points to c1.
- In the original program, after s2, head.car points to the atom A. In the instrumented program, after the block of statements labeled by s2, head.cdr.car points to an instrumentation cons-cell i2, head.cdr.car.car points to the statement cell for s2, and head.cdr.car.cdr points to A.
- In the original program, after s3, head.cdr points to nil. In the instrumented program, after the block of statements labeled by s3, head.cdr.cdr points to an instrumentation cons-cell i3, head.cdr.cdr.car points to the statement cell for s3, and head.cdr.cdr.cdr points to nil.
- In the original program, after s4, tail points to the cons-cell c1. In the instrumented program, after the block of statements labeled by s4, tail points to an instrumentation cons-cell i4, tail.car points to the statement cell for s4, and tail.cdr points to c1. Notice how the sharing of the r-values of head and tail is preserved by the transformation.
3.3 A Formal Definition of the Instrumentation

Formally, we define the instrumentation as follows:

Definition 3. Let P be a program in the form defined in Table 1. Let s1, s2, ..., sn be the statement labels in P. The instrumented program P' is obtained from P starting with a prolog of the form new(psi) where i = 1, 2, ..., n. After the prolog, we rewrite P according to the translation rules shown in Table 2 and Table 3.

Example 4. In the running example program (Figure 2), in P, s11 writes head.car and in P', s11 writes head.cdr.car.cdr. This follows from:
txe(head.car) = txa(head.car).cdr = txa(head).cdr.car.cdr = head.cdr.car.cdr
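The translation functions txa and txe of Table 3 can be rendered directly as code. The following sketch (illustrative Python; the variable-plus-selector-list encoding of access paths is ours) reproduces the calculation of Example 4:

    def txa(var, sels):
        """Each selector step of the original access path first crosses
        the data cell via cdr in the instrumented program."""
        out = []
        for sel in sels:
            out += ["cdr", sel]
        return (var, out)

    def txe(var, sels):
        """An access-path expression denotes its data cell: txa(...).cdr."""
        v, out = txa(var, sels)
        return (v, out + ["cdr"])

    def show(path):
        var, sels = path
        return ".".join([var] + sels)

    # Example 4: txe(head.car) = head.cdr.car.cdr
    print(show(txe("head", ["car"])))   # head.cdr.car.cdr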
3.4 Properties of the Instrumentation

In this section we show that the instrumentation has reduced the flow dependence problem to the may-alias problem. First the simulation of P by P' is shown in the Simulation Theorem. This implies that the instrumentation does not introduce any imprecision into the flow dependence analysis. We also show the Last Wrote Lemma, which states that the instrumentation maintains the needed invariants. Because of the Simulation Theorem and the Last Wrote Lemma, we are able to conclude that:

1. exactly all the flow dependences in P are found using a may-alias oracle on P';
2. using any conservative may-alias algorithm on P' always results in conservative flow dependences for P.

Intuitively, by simulation we mean that at every label of P and P', all the "observable properties" are preserved in P', given the same input. In our case, observable properties are:

- r-values printed by the write statements
program DestructiveAppend()
begin
  s1:  new( head )
  s2:  read( head.car )
  s3:  head.cdr := nil
  s4:  tail := head
  s5:  while( tail.car ≠ 'x' ) do

program InstrumentedDestructiveAppend()
begin
  new( psi )  ∀i ∈ {1, 2, ..., 12}
  s1:  new( head )
       head.car := ps1
       new( head.cdr )
  s2:  new( head.cdr.car )
       head.cdr.car.car := ps2
       read( head.cdr.car.cdr )
  s3:  rhs := nil
       new( head.cdr.cdr )
       head.cdr.cdr.car := ps3
       head.cdr.cdr.cdr := rhs
  s4:  rhs := head.cdr
       new( tail )
       tail.car := ps4
       tail.cdr := rhs
  s5:  while( tail.cdr.car.cdr ≠ 'x' ) do
Fig. 2. The beginning of the example program and its corresponding instrumented program.
Table 2. The translation rules that define the instrumentation, excluding the prolog. For simplicity, every assignment allocates a new instrumentation cons-cell. The variable rhs is used as a temporary to store the right-hand side of an assignment, to allow the same variable to be used on both sides of an assignment.

si: ⟨Access⟩ := ⟨Exp⟩  ⟹  si: rhs := txe(⟨Exp⟩)
                              new(txa(⟨Access⟩))
                              txa(⟨Access⟩).car := psi
                              txa(⟨Access⟩).cdr := rhs
si: new(⟨Access⟩)      ⟹  si: new(txa(⟨Access⟩))
                              txa(⟨Access⟩).car := psi
                              new(txa(⟨Access⟩).cdr)
si: read(⟨Access⟩)     ⟹  si: new(txa(⟨Access⟩))
                              txa(⟨Access⟩).car := psi
                              read(txa(⟨Access⟩).cdr)
si: write(⟨Exp⟩)       ⟹  si: write(txe(⟨Exp⟩))
⟨Exp1⟩ = ⟨Exp2⟩        ⟹  txe(⟨Exp1⟩) = txe(⟨Exp2⟩)
⟨Exp1⟩ ≠ ⟨Exp2⟩        ⟹  txe(⟨Exp1⟩) ≠ txe(⟨Exp2⟩)
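The assignment rule of Table 2 can likewise be rendered as a small code generator. The sketch below is illustrative Python; txa, txe, and show are as in the earlier sketch and are repeated so the example is self-contained. Its output matches the block for s4 in Figure 2:

    def txa(var, sels):
        out = []
        for sel in sels:
            out += ["cdr", sel]
        return (var, out)

    def txe(var, sels):
        v, out = txa(var, sels)
        return (v, out + ["cdr"])

    def show(path):
        return ".".join([path[0]] + path[1])

    def instrument_assign(i, lhs, rhs_expr):
        """si: Access := Exp  ==>  the four-statement block of Table 2."""
        a = show(txa(*lhs))
        return [f"s{i}: rhs := {show(txe(*rhs_expr))}",
                f"    new( {a} )",
                f"    {a}.car := ps{i}",   # record the writing statement
                f"    {a}.cdr := rhs"]     # store the translated r-value

    # s4: tail := head of the running example:
    print("\n".join(instrument_assign(4, ("tail", []), ("head", []))))
    # s4: rhs := head.cdr
    #     new( tail )
    #     tail.car := ps4
    #     tail.cdr := rhs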
[Figure 3 consists of box-and-arrow store diagrams: for each of s1 through s4, one panel gives the store of the original program (the cons-cell c1 and its car/cdr fields) and one panel the store of the instrumented program (the instrumentation cons-cells i1-i4 pointing to the statement cells ps1-ps4 and to the shared data cells). The diagrams are not reproducible here.]

Fig. 3. The store of the original and the instrumented running example programs just before the loop on an input list starting with 'A'. For visual clarity, statement cons-cells not pointed to by an instrumentation cons-cell are not shown. Also, cons-cells are labeled and highlighted to show the correspondence between the stores of the original and instrumented programs.
Table 3. The function txa, which maps an access path in the original program into the corresponding access path in the instrumented program, and the function txe, which maps an expression into the corresponding expression in the instrumented program.

txa(variable) = variable
txa(⟨Access⟩.⟨Sel⟩) = txa(⟨Access⟩).cdr.⟨Sel⟩
txe(⟨Access⟩) = txa(⟨Access⟩).cdr
txe(atom) = atom
txe(nil) = nil
- equalities of r-values

In particular, the execution sequences of P and P' at every label are the same. This discussion motivates the following definition:

Definition 5. Let S be an arbitrary sequence of statement labels in P, e1, e2 be expressions, and I be an input vector. We denote by I, S ⊨P e1 = e2 the fact that the input I causes S to be executed in P, and in the store after S, the r-values of e1 and e2 are equal.

Example 6. In the running example, head.cdr.cdr and tail denote the same cons-cell before the third iteration for input lengths of four or more. Therefore,
I, [s1, s2, s3, s4, s5]([s6, s7, s8, s9, s10])² ⊨P head.cdr.cdr = tail

Theorem 7. (Simulation Theorem) Given input I, expressions e1 and e2, and sequence of statement labels S:
I, S ⊨P e1 = e2  ⟺  I, S ⊨P' txe(e1) = txe(e2)
Example 8. In the running example, before the first iteration, in the last box of Figure 3, head and tail denote the same cons-cell and head.cdr and tail.cdr denote the same cons-cell. Also, in the instrumented program, head.cdr.cdr.cdr.cdr and tail.cdr denote the same cons-cell before the third iteration for inputs of length four or more. Therefore, as expected from Example 6,
I, [s1, s2, s3, s4, s5]([s6, s7, s8, s9, s10])² ⊨P' txe(head.cdr.cdr) = txe(tail)

The utility of the instrumentation is captured in the following lemma.

Lemma 9. (Last Wrote Lemma) Given input I, sequence of statement labels S, and an access path a, the input I leads to the execution of S in P in which the last statement that wrote into a is si if and only if
I, S ⊨P' txa(a).car = psi.
Example 10. In the running example, before the first iteration, in the last box of Figure 3, we have
I, [s1, s2, s3, s4, s5] ⊨P' txa(head).car = ps1
since s1 is the statement that last wrote into head. Also, for input list I = ['A', 'x'], we have:
I, [s1, s2, s3, s4, s5, s11, s12] ⊨P' txa(tail.car).car = ps2
since for such input s2 last wrote into tail.car (through the assignment to head.car).

A single statement in our language can read from many memory locations. For example, in the running example program, statement s5 reads from tail and tail.car. The complete read-sets for the statements in our language are shown in Tables 4 and 5. We are now able to state the main result.
Table 4. Read-sets for the statements in our language.

si: ⟨Access⟩ := ⟨Exp⟩   rsa(⟨Access⟩) ∪ rse(⟨Exp⟩)
si: new(⟨Access⟩)       rsa(⟨Access⟩)
si: read(⟨Access⟩)      rsa(⟨Access⟩)
si: write(⟨Exp⟩)        rse(⟨Exp⟩)
⟨Exp1⟩ = ⟨Exp2⟩         rse(⟨Exp1⟩) ∪ rse(⟨Exp2⟩)
⟨Exp1⟩ ≠ ⟨Exp2⟩         rse(⟨Exp1⟩) ∪ rse(⟨Exp2⟩)

Table 5. An inductive definition of rsa, the read-set for access-paths, and rse, the read-set for expressions.

rsa(variable) = ∅
rsa(⟨Access⟩.⟨Sel⟩) = rsa(⟨Access⟩) ∪ {⟨Access⟩}
rse(variable) = {variable}
rse(⟨Access⟩.⟨Sel⟩) = rse(⟨Access⟩) ∪ {⟨Access⟩.⟨Sel⟩}
rse(atom) = ∅
rse(nil) = ∅
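The definitions of rsa and rse translate directly into code. The following sketch (illustrative Python, over the same variable-plus-selectors encoding as before; atoms and nil, whose read-sets are empty, are omitted) reproduces the read-set of s5 from Example 12 below:

    def prefixes(var, sels):
        """All proper prefixes of an access path, as printable strings."""
        return [".".join([var] + sels[:k]) for k in range(len(sels))]

    def rsa(var, sels):
        # rsa(variable) = {}; rsa(A.sel) = rsa(A) + {A}
        return set(prefixes(var, sels))

    def rse(var, sels):
        # rse(variable) = {variable}; rse(A.sel) = rse(A) + {A.sel}
        return rsa(var, sels) | {".".join([var] + sels)}

    # Read-set of s5: while( tail.car != 'x' ):
    print(sorted(rse("tail", ["car"])))   # ['tail', 'tail.car']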
Theorem 11. (Flow Dependence Reduction) Given program P, its instrumented version P', and any two statement labels p and q, there is a flow dependence from p to q (in P) if and only if there exists an access path a in the read-set of q such that psp is a may-alias of txa(a).car at q in P'.

Example 12. To find the flow dependences of s5 in the running example:
s5: while(tail.car ≠ 'x')
First, Tables 4 and 5 are used to determine the read-set of s5:
rse(⟨Exp1⟩) ∪ rse(⟨Exp2⟩) = rse(tail.car) ∪ rse('x') = rse(tail) ∪ {tail.car} ∪ ∅ = {tail} ∪ {tail.car} = {tail, tail.car}.
Then txa(a).car is calculated for each a in the read-set:
txa(tail).car = tail.car
txa(tail.car).car = txa(tail).cdr.car.car = tail.cdr.car.car
Next, the may-aliases to tail.car and tail.cdr.car.car are calculated by any may-alias algorithm. Finally, s5 is flow dependent on the statements associated with the statement cons-cells that are among the may-aliases found to tail.car and tail.cdr.car.car. The read-sets and may-aliases for the running example are summarized in Table 6.

From a complexity viewpoint our method can be very inexpensive. The program transformation time and space are linear in the size of the original program. In applying Theorem 11, the number of times the may-alias algorithm is invoked is also linear in the size of the original program, or more specifically, proportional to the size of the read-sets. It is most likely that the complexity of the may-alias algorithm itself is the dominant cost.
Table 6. Flow dependence analysis of the running example using a may-alias oracle.

Stmt  Read-Set           May-Aliases
s1    ∅                  ∅
s2    {head}             {(head.car, ps1)}
s3    {head}             {(head.car, ps1)}
s4    {head}             {(head.car, ps1)}
s5    {tail, tail.car}   {(tail.car, ps4), (tail.car, ps10), (tail.cdr.car.car, ps2), (tail.cdr.car.car, ps7)}
s6    ∅                  ∅
s7    {temp}             {(temp.car, ps6)}
s8    {temp}             {(temp.car, ps6)}
s9    {tail, temp}       {(tail.car, ps4), (tail.car, ps10), (temp.car, ps6)}
s10   {tail, tail.cdr}   {(tail.car, ps4), (tail.car, ps10), (tail.cdr.cdr.car, ps9)}
s11   {head, head.car}   {(head.car, ps1), (head.cdr.car.car, ps2)}
s12   {tail, tail.car}   {(tail.car, ps4), (tail.car, ps10), (tail.cdr.car.car, ps2), (tail.cdr.car.car, ps7)}
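Operationally, Theorem 11 turns dependence computation into a loop of may-alias queries. The sketch below is illustrative Python; the oracle interface and the replayed Table 6 entries for s11 are our stand-ins for an arbitrary conservative may-alias algorithm run on the instrumented program:

    TABLE6_S11 = {("head.car", "ps1"), ("head.cdr.car.car", "ps2")}

    def may_alias(path, ps, at):              # hypothetical oracle interface
        return at == "s11" and (path, ps) in TABLE6_S11

    def flow_deps(q, probes, n_stmts=12):
        """q depends on p iff ps_p may-alias txa(a).car for some a read by q."""
        return {f"s{p}" for path in probes
                        for p in range(1, n_stmts + 1)
                        if may_alias(path, f"ps{p}", q)}

    # Read-set of s11 is {head, head.car}; txa(head).car = head.car and
    # txa(head.car).car = head.cdr.car.car, computed as in the earlier sketch.
    print(sorted(flow_deps("s11", ["head.car", "head.cdr.car.car"])))
    # ['s1', 's2']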
4 Plug and Play

An important corollary of Theorem 11 is that an arbitrary conservative may-alias algorithm on P' yields a conservative solution to the flow dependence problem on P. Since existing may-alias algorithms often yield results which are difficult to compare, it is instructive to consider the flow dependences computed by various algorithms on the running example program.

- The algorithm of [9] yields the may-aliases shown in column 3 of Table 6. Therefore, on this program, this algorithm yields the exact flow dependences shown in Figure 1.
- The more efficient may-alias algorithms of [23, 14, 30, 29] are useful to find flow dependences in programs with disjoint data structures. However, in programs with recursive data structures such as the running example, they normally yield many superfluous may-aliases leading to superfluous flow dependences. For example, [23] is not able to identify that tail points to an acyclic list. Therefore, it yields that head.car and ps7 are may-aliases at s11. Therefore, it will conclude that the value of head.car read at s11 may be written inside the loop (at statement s7).
- The algorithm of [27] finds, in addition to the correct dependences, superfluous flow dependences in the running example. For example, it finds that s5 has a flow dependence on s8. This inaccuracy is attributable to the anonymous nature of the second cons-cell allocated with each new statement. There are two possible ways to remedy this inaccuracy:
  - Modify the algorithm so that it is 2-bounded, i.e., also keeps track of car and cdr fields of variables. Indeed, this may be an adequate solution for general k-bounded approaches, e.g., [19], by increasing k to 2k.
  - Modify the transformation to assign unique names to these cons-cells. We have implemented this solution, and tested it using the PAG [3] implementation of the [27] algorithm³, and found exactly all the flow dependences in the running example.

³ On a SPARCstation 20, PAG used 0.53 seconds of cpu time.
5 Conclusions

In this paper, we showed that may-alias algorithms can be used, without any modification, to compute program dependences. We are hoping that this will lead to more research in finding practical may-alias algorithms to compute good approximations for flow dependences.

For simplicity, we did not optimize the memory usage of the instrumented program. In particular, for every executed instance of a statement in the original program that writes to the store, the instrumented program creates a new instrumentation cons-cell. This extra memory usage is harmless to may-alias algorithms (for some algorithms it can even improve the accuracy of the analysis, e.g., [5]). In cases where the instrumented program is intended to be executed, it is possible to drastically reduce the memory usage through cons-cell reuse.

Finally, it is worthwhile to note that flow dependences can also be used to find may-aliases. For example, Figure 4 contains a program fragment that provides the instrumentation to "check" if two program variables v1 and v2 are may-aliases at program point p. This instrumentation preserves the meaning of the original program and has the property that v1 and v2 are may-aliases at p if and only if s2 has a flow dependence on s1.

p:  if v1 ≠ nil then
s1:   v1.cdr := v1.cdr
    if v2 ≠ nil then
s2:   write(v2.cdr)

Fig. 4. A program fragment such that v1 and v2 are may-aliases at p if and only if s2 has a flow dependence on s1.
Acknowledgments

We are grateful to Thomas Ball, Michael Benedikt, Thomas Reps, and Reinhard Wilhelm for their helpful comments, which led to substantial improvements of this paper. We also would like to thank Martin Alt and Florian Martin for PAG, and for their PAG implementation of [27] for a C subset.
References

1. H. Agrawal and J.R. Horgan. Dynamic program slicing. In SIGPLAN Conference on Programming Languages Design and Implementation, volume 25 of ACM SIGPLAN Notices, pages 246-256, White Plains, New York, June 1990.
2. A.V. Aho, R. Sethi, and J.D. Ullman. Compilers: Principles, Techniques and Tools. Addison-Wesley, 1985.
3. M. Alt and F. Martin. Generation of efficient interprocedural analyzers with PAG. In SAS'95, Static Analysis, number 983 in Lecture Notes in Computer Science, pages 33-50. Springer-Verlag, 1995.
4. J.P. Banning. An efficient way to find the side effects of procedure calls and the aliases of variables. In ACM Symposium on Principles of Programming Languages, pages 29-41, New York, NY, 1979. ACM Press.
5. D.R. Chase, M. Wegman, and F. Zadeck. Analysis of pointers and structures. In SIGPLAN Conference on Programming Languages Design and Implementation, pages 296-310, New York, NY, 1990. ACM Press.
6. J.D. Choi, M. Burke, and P. Carini. Efficient flow-sensitive interprocedural computation of pointer-induced aliases and side-effects. In ACM Symposium on Principles of Programming Languages, pages 232-245, New York, NY, 1993. ACM Press.
7. K.D. Cooper and K. Kennedy. Interprocedural side-effect analysis in linear time. In SIGPLAN Conference on Programming Languages Design and Implementation, pages 57-66, New York, NY, 1988. ACM Press.
8. A. Deutsch. A storeless model for aliasing and its abstractions using finite representations of right-regular equivalence relations. In IEEE International Conference on Computer Languages, pages 2-13, Washington, DC, 1992. IEEE Press.
9. A. Deutsch. Interprocedural may-alias analysis for pointers: Beyond k-limiting. In SIGPLAN Conference on Programming Languages Design and Implementation, pages 230-241, New York, NY, 1994. ACM Press.
10. A. Deutsch. Semantic models and abstract interpretation for inductive data structures and pointers. In Proc. of ACM Symposium on Partial Evaluation and Semantics-Based Program Manipulation, PEPM'95, pages 226-228, New York, NY, June 1995. ACM Press.
11. M. Emami, R. Ghiya, and L. Hendren. Context-sensitive interprocedural points-to analysis in the presence of function pointers. In SIGPLAN Conference on Programming Languages Design and Implementation, New York, NY, 1994. ACM Press.
12. J. Ferrante, K. Ottenstein, and J. Warren. The program dependence graph and its use in optimization. ACM Transactions on Programming Languages and Systems, 3(9):319-349, 1987.
13. R. Ghiya and L.J. Hendren. Is it a tree, a dag, or a cyclic graph? In ACM Symposium on Principles of Programming Languages, New York, NY, January 1996. ACM Press.
14. R. Ghiya and L.J. Hendren. Connection analysis: A practical interprocedural heap analysis for C. In Proc. of the 8th Intl. Work. on Languages and Compilers for Parallel Computing, number 1033 in Lecture Notes in Computer Science, pages 515-534, Columbus, Ohio, August 1995. Springer-Verlag.
15. R. Ghiya and L.J. Hendren. Putting pointer analysis to work. In ACM Symposium on Principles of Programming Languages. ACM, New York, January 1998.
16. L. Hendren. Parallelizing Programs with Recursive Data Structures. PhD thesis, Cornell University, Ithaca, N.Y., Jan 1990.
17. L. Hendren and A. Nicolau. Parallelizing programs with recursive data structures. IEEE Transactions on Parallel and Distributed Systems, 1(1):35-47, January 1990.
18. S. Horwitz, P. Pfeiffer, and T. Reps. Dependence analysis for pointer variables. In SIGPLAN Conference on Programming Languages Design and Implementation, volume 24 of ACM SIGPLAN Notices, pages 28-40, Portland, Oregon, June 1989. ACM Press.
19. N.D. Jones and S.S. Muchnick. Flow analysis and optimization of Lisp-like structures. In S.S. Muchnick and N.D. Jones, editors, Program Flow Analysis: Theory and Applications, chapter 4, pages 102-131. Prentice-Hall, Englewood Cliffs, NJ, 1981.
20. N.D. Jones and S.S. Muchnick. A flexible approach to interprocedural data flow analysis and programs with recursive data structures. In ACM Symposium on Principles of Programming Languages, pages 66-74, New York, NY, 1982. ACM Press.
21. D.J. Kuck, R.H. Kuhn, B. Leasure, D.A. Padua, and M. Wolfe. Dependence graphs and compiler optimizations. In ACM Symposium on Principles of Programming Languages, pages 207-218, New York, NY, 1981. ACM Press.
22. W. Landi, B.G. Ryder, and S. Zhang. Interprocedural modification side effect analysis with pointer aliasing. In Proc. of the ACM SIGPLAN '93 Conf. on Programming Language Design and Implementation, pages 56-67, 1993.
23. W. Landi and B.G. Ryder. Pointer induced aliasing: A problem classification. In ACM Symposium on Principles of Programming Languages, pages 93-103, New York, NY, January 1991. ACM Press.
24. J.R. Larus. Refining and classifying data dependences. Unpublished extended abstract, Berkeley, CA, November 1988.
25. J.R. Larus and P.N. Hilfinger. Detecting conflicts between structure accesses. In SIGPLAN Conference on Programming Languages Design and Implementation, pages 21-34, New York, NY, 1988. ACM Press.
26. K.J. Ottenstein and L.M. Ottenstein. The program dependence graph in a software development environment. In Proceedings of the ACM SIGSOFT/SIGPLAN Software Engineering Symposium on Practical Software Development Environments, pages 177-184, New York, NY, 1984. ACM Press.
27. M. Sagiv, T. Reps, and R. Wilhelm. Solving shape-analysis problems in languages with destructive updating. In ACM Symposium on Principles of Programming Languages, New York, NY, January 1996. ACM Press.
28. M. Sagiv, T. Reps, and R. Wilhelm. Solving shape-analysis problems in languages with destructive updating. ACM Transactions on Programming Languages and Systems, 1997. To appear.
29. M. Shapiro and S. Horwitz. Fast and accurate flow-insensitive points-to analysis. In ACM Symposium on Principles of Programming Languages, 1997.
30. B. Steensgaard. Points-to analysis in linear time. In ACM Symposium on Principles of Programming Languages. ACM, New York, January 1996.
31. R.P. Wilson and M.S. Lam. Efficient context-sensitive pointer analysis for C programs. In SIGPLAN Conference on Programming Languages Design and Implementation, pages 1-12, La Jolla, CA, June 18-21 1995. ACM Press.
Systematic Change of Data Representation: Program Manipulations and a Case Study

William L. Scherlis*

School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213
Abstract. We present a set of semantics-based program manipulation techniques to assist in restructuring software encapsulation boundaries and making systematic changes to data representations. These techniques adapt abstraction structure and data representations without altering program functionality. The techniques are intended to be embodied in source-level analysis and manipulation tools used interactively by programmers, rather than in fully automatic tools and compilers.

The approach involves combining techniques for adapting and specializing encapsulated data types (classes) and for eliminating redundant operations that are distributed among multiple methods in a class (functions in a data type) with techniques for cloning classes to facilitate specialization and for moving computation across class boundaries. The combined set of techniques is intended to facilitate revision of structural design decisions such as the design of a class hierarchy or an internal component interface. The paper introduces new techniques, provides soundness proofs, and gives details of a case study involving production Java code.
1 Introduction

Semantics-based program manipulation techniques can be used to support the evolution of configurations of program components and associated internal interfaces. Revision of these abstraction boundaries is a principal challenge in software reengineering. While structural decisions usually must be made early in the software development process, the consequences of these decisions are not fully appreciated until later, when it is costly and risky to revise them. Program manipulation techniques can potentially support the evolution of internal interface and encapsulation structure, offering greater flexibility in the management of encapsulation structure.

* This material is based upon work supported by the National Science Foundation under Grant No. CCR-9504339 and by the Defense Advanced Research Projects Agency and Rome Laboratory, Air Force Materiel Command, USAF, under agreement number F30602-97-2-0241. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the Defense Advanced Research Projects Agency, Rome Laboratory, the National Science Foundation, or the U.S. Government. Author email: [email protected].

This paper focuses on data representation optimizations that are enabled by structural manipulations. It presents three new results: First, it introduces two new program manipulation techniques, idempotency and projection. Second, it provides proofs for the two techniques and for a related third technique, shift, that was previously described without proof [S86,S94]. Finally, a scenario of software evolution involving production Java code is presented to illustrate how the techniques combine with related structural manipulations to accomplish representation change. The case study is an account of how the classes java.lang.String, for immutable strings, and java.lang.StringBuffer, for mutable strings, could have been derived through a series of structural manipulations starting with a single class implementing C-style strings.

The intent of this research is to make the evolution of data representations a more systematic and, from a software engineering point of view, low-risk activity that can be supported by interactive source-level program analysis and manipulation tools. Source-level program manipulation presents challenges unlike those for automatic systems such as compilers and partial evaluators.

The sections below introduce the program manipulation techniques, sketch their proofs, and give an account of the evolutionary case study.
2 Program Manipulation

Most modern programming languages support information-hiding encapsulations such as Java classes, Standard ML structures and functors, Ada packages, and Modula-3 modules. In this paper, we use a simplified form of encapsulation based on Java classes but with no inheritance (all classes are final), full static typing, and only private non-static fields (i.e., the data components, or instance variables, of abstract objects). The internal representation of an abstract object thus corresponds to a record containing the instance variables. In Standard ML terminology, these classes correspond to abstypes or simplified structures. We use the term abstraction boundary to refer informally to the interface between the class and its clients (the signature in Standard ML), including any associated abstract specifications.

When the scope of access by clients to a class is bounded and its contents can be fully analyzed, then adjustment and specialization of the class interface (and its clients) becomes possible. For larger systems and libraries, this scope is effectively unbounded, however, and other techniques, such as cloning of classes (described below), may need to be used to artificially limit access scopes.

Language. We describe the manipulation techniques in a language-independent manner, though we adapt them for use on Java classes in the case study. We can think of an object being "unwrapped" as it passes into the internal scope of its controlling class, and "wrapped" as it passes out of the scope. It is convenient in this presentation to follow the example of the early Edinburgh ML and make this transit of the abstraction boundary explicit through functions Abs and Rep, which wrap and unwrap respectively. That is, Rep translates an encapsulated abstract object into an object of its data representation type, which in Java is a record containing the object's instance variables. For example, if an abstract Pixel consists of a 2-D coordinate pair together with some abstract intensity information, then Rep : Pixel → Int × Int × Intensity and Abs : Int × Int × Intensity → Pixel. Here "×" denotes product of types. The two key restrictions are that Rep and Abs (which are implicitly indexed on type names) can be called only within the internal scope of the controlling class and they are the only functions that can directly construct and deconstruct abstract values.

Manipulations. Manipulation of class boundaries and exploitation of specialized contexts of use are common operations at all stages of program development. We employ two general collections of meaning-preserving techniques to support this. The first collection, called class manipulations, is the focus of this paper. Class manipulations make systematic changes to data representations in classes. They can alter performance and other attributes but, with respect to overall class functionality (barring covert channels), the changes are invisible to clients. That is, class manipulations do not alter signatures (and external invariants), but they do change computation and representation. The second class of manipulations are the boundary manipulations, which alter signatures but do not change computation and representation. Boundary manipulations move computation into or out of classes, and they merge, clone, and split classes.

The simplest boundary manipulations are operation migration manipulations, which move methods into and out of classes. Preconditions for these manipulation rules are mostly related to names, scopes, and types. A more interesting set of boundary manipulations are the cloning manipulations, which separate a class into distinct copies, without a need to fully partition the data space. Cloning manipulations introduce a type distinction and require analyses of types and use of object identity to assure that the client space can be partitioned, potentially with the introduction of explicit "conversion" points.

Boundary (and class) manipulations are used to juxtapose pertinent program elements, so they can be further manipulated as an aggregate, either mechanically or manually. Boundary manipulations are briefly introduced in [S94] and are not detailed here. It is worth noting, however, that many classical source-level transformation techniques [BD77] can be understood as techniques to juxtapose related program elements so simplifications can be made. Two familiar trivial examples illustrate: The derivation of a linear-space list reversal function from the quadratic form relies on transformation steps whose sole purpose is to juxtapose the two calls to list append, enabling them to be associated to the right, rather than the left association that is tacit in the recursion structure of the initial program. The derivation of the linear Fibonacci from the exponential one relies on juxtaposing two separate calls to F(x − 2) from separate recursive invocations. Various strategies for achieving this juxtaposition have been developed, such as tupling, composition, and deforestation [P84,S80,W88].

Class manipulations. The class manipulations adjust representations and their associated invariants while preserving the overall meaning of a class. Details of these techniques and proofs are in the next two sections.

The shift transformation, in object-oriented terms, systematically moves computation among the method definitions within a class definition while preserving the abstract semantics presented to clients of a class. Shift operates by transferring a common computation fragment uniformly from all wrapping sites to all unwrapping sites (or vice versa). In the case study later in this paper, shift is used several times to make changes to the string representations. Shifts enable frequency reduction and localizing computation for later optimization. For example, in database design, the relative execution frequencies, use of space, and other considerations determine which computations are done when data is stored and which are done when data is retrieved.

Shifts are also useful to accomplish representation change. For example, suppose a function Magnitude : Int × Int → Real calculates the distance from a point to the origin. If magnitudes are calculated when pixels are accessed, shift can be used to move these calls to wrapping sites. The representation of pixels thus changes from Int × Int to Int × Int × Real, with distances being calculated only when pixels are created or changed, not every time they are accessed. (A sketch of this example appears below.)

The project transformation enables "lossy" specialization of representations, eliminating instance variables and associated computation. For example, in a specialized case where intensity information is calculated and maintained but not ultimately used by class clients, project could be used to simplify pixels from Rep : Pixel → Int × Int × Intensity to Rep' : Pixel → Int × Int and eliminate intensity-related calculations.

The idempotency transformation eliminates subcomputations that are idempotent and redundant across method definitions in a class. Idempotency is a surprisingly common property among program operations. Most integrity checks, cache building operations, and invariant maintaining operations such as tree balancing are idempotent.
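The Pixel/Magnitude shift mentioned above can be made concrete. The following sketch uses illustrative Python rather than the paper's Java setting, and the class and method names are ours; it contrasts the representations before and after shifting the Magnitude computation from unwrapping to wrapping sites:

    from math import sqrt

    def magnitude(x, y):
        return sqrt(x * x + y * y)   # the Span computation being shifted

    class PixelBefore:               # Rep : Pixel -> Int x Int
        def __init__(self, x, y):
            self._rep = (x, y)       # Abs: wrap at construction
        def distance(self):
            x, y = self._rep         # Rep: unwrap at access...
            return magnitude(x, y)   # ...then compute Span every time

    class PixelAfter:                # Rep' : Pixel -> Int x Int x Real
        def __init__(self, x, y):
            self._rep = (x, y, magnitude(x, y))  # Span advanced to wrapping
        def distance(self):
            return self._rep[2]      # access is now a plain selection

    assert PixelBefore(3, 4).distance() == PixelAfter(3, 4).distance() == 5.0

Clients observe identical behavior; only the frequency and placement of the Magnitude computation change, which is exactly the invisibility property claimed for class manipulations.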
3 Class manipulation techniques

Class manipulations exploit the information hiding associated with instance variables in a class (in Standard ML, variables of types whose data constructors are hidden in a structure or abstype). Significant changes can be made to the internal structure of a class, including introduction and elimination of instance variables, without effect on client-level semantics. Class manipulations are based on Hoare's venerable idea of relating abstraction and representation in data types.

Shift. The shift rule has two symmetrical variants. As noted above, the Rep and Abs functions are used to manage access to the data representation. A requirement on all instances of Rep or Abs (which are easy to find, since they are all within the syntactic scope of the class) thus has the effect of a universal requirement on all instances where an encapsulated data object is constructed or selected. Let T be the class (abstract) type of encapsulated objects and V be the type of their representations. In the steps below, a portion of the internal computation of all methods that operate on V objects is identified and abstracted into a function called Span that operates on V objects. Note that in a Java class definition V would be an aggregate of the instance variables (non-static fields) of the class. For our presentation purposes, we adapt the usual object-oriented notational convention to make the aggregate of instance variables an explicit value of type V. In the case study we make a similar adaptation for Java. This means we can require that the fragment of shifted computation, abstracted as Span, be a simple function, allowing a more natural functional style in the presentation of the technique.

The effect of shift is to move a common computation from all program points in a class where objects are unwrapped to program points where objects are wrapped (and vice versa):

1. By other manipulations, local within each operation definition, establish that all sites of Rep : T → V appear in the context of a call to Span,
   Span ∘ Rep : T → V'
   where Span : V → V'. That is, there is some common functional portion of computation of every method that occurs at object unwrapping. If this is not possible, the transformation cannot be carried out. Abstract all instances of this computation into calls to Span.
2. Replace Abs and Rep as follows. All instances of Abs : V → T become Abs' ∘ Span : V → T, where Abs' : V' → T. All instances of Rep, as Span ∘ Rep : T → V', become Rep' : T → V'.
3. It is now established that all sites of Abs' : V' → T are in the context of a call to Span,
   Abs' ∘ Span : V → T.

The shift rule, intuitively, advances computation from object unwrapping to object wrapping. The variant of shift is to reverse these three steps and the operations in them, thus delaying the Span computation from object wrapping to subsequent object unwrapping. Although the universal requirement of the first step may seem difficult to achieve, it can be accomplished trivially in those cases where Span is feasibly invertible by inserting Span⁻¹ ∘ Span in specific method definitions. The Span operation is then relocated using the rule above. Later manipulations may then be able to unfold and simplify away the calls to Span⁻¹.

Project. The project manipulation rule eliminates code and fields in a class that have become dead, for example as a result of specialization of the class interface. The rule can be used when the "death" of code can be deduced only by analysis encompassing the entire class. Suppose the representation type V can
be written as V1 × V2 for some V1 and V2. This could correspond to a partitioning of the instance variables (represented here as a composite type V) into two sets. If the conditions below are met, then project can be used to eliminate the V2 portion of the representation type and associated computations. The rule is as follows:

1. Define Rep as Rep : T → V1 × V2 and Abs as Abs : V1 × V2 → T.
2. Represent the concrete computation of each method op that operates on any portion of the internal representation as
   op : V1 × V2 × U → V1 × V2 × W
   where U and W are other types (representing non-encapsulated inputs and outputs). Then, by other manipulations, redefine op to calculate its result using two separate operations that calculate separate portions of the result,¹
   op1 : V1 × U → V1 × W
   op2 : V1 × V2 × U → V2
   Do this analogously for all operations op on internal representations V1 × V2. If this cannot be done for all such operations involving V1 and V2, then the rule cannot be applied.
3. Replace all operations op within the class by the new operations op1.
4. Simultaneously, replace all instances of Abs and Rep by new versions Rep' : T → V1 and Abs' : V1 → T.

The effect of project is to eliminate the V2 portion of the overall object representation and all the computations associated with it. In Java, for example, V2 would be a subset of the instance variables of a class. The project manipulation generally becomes a candidate for application after boundary manipulations have eliminated or separated all methods that depend on the particular instance variables corresponding to V2.

It is possible to think of shift and project as introduction and elimination rules for instance variables in the specific sense that, among its uses, shift can be used to introduce new fields and computations that could later be eliminated using project.

Idempotency. The idempotency rule is used to eliminate certain instances of idempotent computations such as integrity checks, caching, or invariant-maintaining operations. When an idempotent operation is executed multiple times on the same data object, even as it is passed among different methods, then all but one of the calls can be eliminated.

1. Define an idempotent Span function of type V → V:
   Span ∘ Span = Span
2. Establish that each call of Abs appears in the context Abs ∘ Span or, alternatively, establish that each call to Rep appears in the context Span ∘ Rep. (These variants are analogous to the two variants of shift.)

¹ op(v1, v2, u) = [let (v1', w) = op1(v1, u) and v2' = op2(v1, v2, u) in (v1', v2', w) end]
4 Correctness of Class Manipulations
The class manipulations alter data representations and method definitions in a class. Establishing correctness of the manipulations amounts to proving behavioral equivalence of the original and transformed class definitions. Since encapsulated objects are only unwrapped by methods within the class, the external behavior of a class can be construed entirely in terms of sequences of incoming and outgoing "exposed" values (to use Larch terminology) [G93]. This external class behavior is usually related to the internal definition of the class (i.e., the definitions of the instance variables and methods) through an explicit abstraction mapping, as originally suggested by Hoare [H72].
In the proofs below (as in the descriptions of the manipulations above), we employ a functional style in which methods are functional and objects are pure values. The set of instance variables of a Java class, which can be thought of as an implicit aggregate parameter and result, is made explicit in our rendering as new parameters and results of type $\rho$. This simple translation is possible (in our limited Java subset) because objects are pure values; that is, there is no use of "object identity."
Suppose, for example, that $\tau$ is the abstract type of a class, $\upsilon$ and $\omega$ are other types, and there are four methods in the class, which are functional and have the following signatures: $op_1 : \upsilon \to \tau$, $op_2 : \tau \times \tau \to \tau$, $op_3 : \tau \to \omega$, and $op_4 : \tau \times \upsilon \to \tau \times \omega$. Thus, for example, $op_1$ is a constructor and $op_3$ does not alter or create objects.
Properties of objects in a class are expressed using an invariant $I(t, v)$ that relates abstract values $t$ (of type $\tau$ above) to representations $v$ (instance variables) of type $\rho$. Behavioral correctness of a class is defined in terms of preconditions on incoming exposed values, postconditions on outgoing exposed values, and a predicate $I$ that is an invariant and that relates abstract values with representations. If an abstract specification of the class is not needed, the proof can be obtained without reference to the abstract values $t$.
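As a concrete illustration of this functional rendering (our example, using modern Java record syntax for brevity; the paper's Java subset predates records), a one-field counter class might be rendered as:

// Hypothetical sketch: the instance variables become an explicit value
// of representation type rho (here, a record with a single int field),
// passed in and returned instead of being mutated in place.
record Rep(int n) {}  // rho: the aggregate of instance variables

final class CounterOps {
    static Rep make()     { return new Rep(0); }          // constructor: upsilon -> rho
    static Rep inc(Rep v) { return new Rep(v.n() + 1); }  // "mutator" returns a new Rep
    static int get(Rep v) { return v.n(); }               // observer: rho -> omega
}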
If $op_i$ is a method, let $\overline{op}_i$ be the implementation of the method, typed accordingly. For example, if $op_i : \tau \times \upsilon \to \tau \times \omega$ then $\overline{op}_i : \rho \times \upsilon \to \rho \times \omega$. Let $pre_i$ be a precondition on values of the exposed inputs to $op_i$ and $post_i$ be a postcondition on the values of the exposed outputs from $op_i$. For the invariant to hold for the class above ($op_1$ through $op_4$), the following four properties of $I$ must hold:

  $pre_1(u) \Rightarrow I(op_1(u),\ \overline{op}_1(u))$
  $I(t_1, v_1) \wedge I(t_2, v_2) \Rightarrow I(op_2(t_1, t_2),\ \overline{op}_2(v_1, v_2))$
  $I(t, v) \Rightarrow post_3(\overline{op}_3(v))$
  $I(t, v) \Rightarrow I(op_4(t, u),\ \overline{op}_4(v, u)) \wedge post_4(\overline{op}_4(v, u))$

Behavioral correctness. Establishing behavioral correctness of a manipulation rule operating on a class definition amounts to showing that the relationships among $pre_i$ and $post_j$ values remain unchanged, even as the definition of $I$ changes in the course of transformation.
Correctness of shift. Let $I(t, v)$ be an invariant that holds for all methods in a class prior to carrying out the transformation. The effect of shift is to create a new invariant $I'(t, v')$ that relates the new representation values $v'$ to the abstract values $t$. The proof proceeds by defining $I'$, showing it is invariant in the transformed class, and showing that it captures the same relationships between preconditions and postconditions as does $I$. We prove the first variant of shift; the proof for the second variant is essentially identical.
For a method $op : \tau \to \tau$, for example, the Span precondition of the definition of the shift manipulation guarantees for some residual computation $r$ that

  $I(t, v) \Rightarrow I(op(t),\ \overline{op}(v)) \Rightarrow I(op(t),\ (r \circ Span)(v))$

Define the post-manipulation invariant $I'(t, v') \equiv I(t, v) \wedge v' = Span(v)$. Now,

  $I'(t, v') \equiv I(t, v) \wedge v' = Span(v)$                    [definition of $I'$]
  $\Rightarrow I(op(t),\ (r \circ Span)(v)) \wedge v' = Span(v)$    [invariance property of $I$]
  $\Rightarrow I(op(t),\ r(v'))$
  $\Rightarrow I'(op(t),\ (Span \circ r)(v'))$                      [definition of $I'$]
  $\Rightarrow I'(op(t),\ \overline{op}'(v'))$                      [definition of shift]

This establishes that $I'$ is invariant for the operation $op$ in the class definition after it has been subject to the manipulation. To understand the effect on postconditions, we consider the case of $op_3$:

  $I(t, v) \Rightarrow post_3(\overline{op}_3(v))$                          [assumed]
  $I(t, v) \Rightarrow post_3(r(Span(v)))$                                  [precondition of shift]
  $I'(t, Span(v)) \Rightarrow post_3(r(Span(v)))$                           [definition of $I'$]
  $I'(t, Span(v)) \Rightarrow post_3(\overline{op}'_3(Span(v)))$            [definition of $\overline{op}'_3$]
  $(\exists v)(v' = Span(v)) \wedge I'(t, v') \Rightarrow post_3(\overline{op}'_3(v'))$

This is sufficient, since in the modified class definition, all outputs of methods are in the codomain of Span. These arguments are easily generalized to methods that include additional parameters and results.
Correctness of idempotency. There are two variants of the idempotency manipulation. We prove one, in which the second precondition (in the description of the technique) assures that, for any $t$,

  $I(op(t), v) \Rightarrow (\exists v')(v = (Span \circ r)(v'))$

for some residual function $r$. Consider the case $op : \tau \to \tau$. Because of the second precondition of the idempotency rule, $\overline{op}$ preserves the invariant:

  $I(t, v) \Rightarrow I(op(t),\ \overline{op}(v)) \Rightarrow I(op(t),\ (Span \circ \overline{op}')(v))$

For each method in which Span commutes (and in which Span can therefore be eliminated by the transformation), we need to show that the invariant is maintained, that is, $I(t, v) \Rightarrow I(op(t),\ \overline{op}'(v))$. Assume inductively that the invariant $I(t, v)$ holds for all class operations on type $\tau$ prior to carrying out the transformation. We proceed as follows:

  $I(t, v) \Rightarrow (\exists v')\ I(t, v) \wedge v = (Span \circ r)(v')$                   [idempotency precondition]
  $\Rightarrow (\exists v')\ I(op(t),\ (Span \circ \overline{op}' \circ Span \circ r)(v'))$   [invariant for $op$]
  $\Rightarrow (\exists v')\ I(op(t),\ (\overline{op}' \circ Span \circ Span \circ r)(v'))$   [commutativity]
  $\Rightarrow (\exists v')\ I(op(t),\ (\overline{op}' \circ Span \circ r)(v'))$              [idempotency of Span]
  $\Rightarrow I(op(t),\ \overline{op}'(v))$                                                  [definition of $v$]

Correctness of project. The proof for the project manipulation is similar, and is omitted.
5 Strings in Java
We now present a modest-scale case study, which is a hypothetical re-creation of the evolution of early versions of the two Java classes used for strings, String and StringBuffer.² Java Strings are meant to be immutable and efficiently represented with minimal copying. For example, substring operations on Strings manipulate pointers and do not involve any copying. StringBuffers, on the other hand, are mutable and flexible, at a modest incremental performance cost. The typical use scenario is that strings are created and altered as StringBuffers and then searched and shared as Strings, with conversion operations between them. The data representations used in both classes facilitate this conversion by delaying or minimizing copying of structure. In particular, conversions from StringBuffer to String do not always result in copying of the character array data structure.
² These two classes are part of an early version of the released Java Development Kit 1.0Beta from Sun Microsystems. Java is a trademark of Sun Microsystems. Since fragments of the code (string 1.51 and stringbuffer 1.21) are quoted in this paper, we include the following license text associated with the code: "Copyright (c) 1994 Sun Microsystems, Inc. All Rights Reserved. Permission to use, copy, modify, and distribute this software and its documentation for non-commercial purposes and without fee is hereby granted provided that this copyright notice appears in all copies. Please refer to the file "copyright.html" for further important copyright and licensing information. Sun makes no representations or warranties about the suitability of the software, either express or implied, including but not limited to the implied warranties of merchantability, fitness for a particular purpose, or non-infringement. Sun shall not be liable for any damages suffered by licensee as a result of using, modifying or distributing this software or its derivatives."
Both classes represent strings as character arrays called value. Both have an integer field called count, which is the length of the string in the array. The String class also maintains an additional integer field called offset, which is the initial character position in the array. Since Strings are immutable, this enables substrings to be represented by creating new String objects that share the character array and modify count and offset.
The StringBuffer class, which is used for mutable strings, has three fields. In addition to value and count, there is a boolean shared that is used to help delay or avoid array copy operations when StringBuffer objects are converted into Strings. The method used for this conversion does not copy value, but it does record the loss of exclusive access to the array value by setting the shared flag. If the flag is set, then a copy is done immediately prior to the next mutation, at which point the flag is reset. If no mutation is done, no copy is made. A simpler but less efficient scheme would always copy.
Selected steps of the evolutionary account are summarized below. The initial steps, omitted here, establish the initial string class, called Str, which manages mutable strings represented as null-terminated arrays, as in C. Str always copies character arrays passed to and from operations. (All boundary and class manipulation steps below have been manually carried out using our techniques, and one of these is examined more closely in the following section.)
1. Introduce count. In the initial class Str, trade space for time by using shift to replace uses of the null character by adding the private instance variable count. Simplifications following the shift eliminate all reference to and storage of null terminating characters.
2. Separate Str into String and StringBuffer. Label all Str variables in the scope in which Str is accessible as either String or StringBuffer, and introduce two methods to convert between them (both initially the identity function with a type cast). Since copy operations are done at every step, object identity cannot be relied upon, and this is a safe step. This process would normally be iterative, using type analysis and programmer-guided selection of conversion points. Those Str variables that are subject to mutating operations, such as append, insert, and setCharAt, are typed as StringBuffer. Those Str variables that are subject to selection operations, such as substring and regionMatches, are typed as String. Common operations such as length and charAt are in both clones. Because copying is done at every step, conversion operations can always do an additional copy of mutable structures (i.e., value). This approach to allocating variables to the two clones enables use of boundary manipulations to remove mutating operations such as setCharAt, for example, from String. At this point, the two clones have different client variables and different sets of operations, but identical representations and representation invariants.
3. Introduce offset. Since Strings are immutable, substring does not need to copy value. But elimination of copying requires changing the representation to include offsets as well as length. Use shift to accomplish the representation change preparatory to eliminating the copying. (This is the step that is detailed below.)
4. Introduce shared. In StringBuffer, value is copied prior to passing it to the String constructor. This is necessary because value would otherwise be aliased and subject to subsequent mutation, which would violate the immutability invariant of String. In fact, the copying needs to be done only if value is subsequently mutated, and could be delayed until then. Use shift to relocate the copying to whatever StringBuffer method immediately follows the conversion, adding the shared flag. This delays the copy by "one step." Then use idempotency to delay that copying "multiple steps" (by commuting it within all calls to non-mutating access methods) until a StringBuffer operation that actually does mutation. (A code sketch of this mechanism follows the list.)
5. Add specialized methods. Identify conversion functions from various other common types to char[]. For both string classes, use boundary shifts to compose these conversion functions with the constructors and with commonly used functions such as append and insert. Then use additional boundary shifts to relocate these composed functions as public methods, naming them appropriately (mostly through overloading). Then unfold (inline) the type conversion calls contained in these methods and specialize accordingly. (This introduces many of the large number of methods defined in the two classes.)
6 (hypothetical). Eliminate value. Suppose that in some context only the lengths of StringBuffers were needed, but not their contents. That is, after a series of calls to methods such as insert, append, setCharAt, reverse, and setLength, the only accessor method called is length. Clone and project can be used to create a specialized class for this case.
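The following sketch illustrates the shared-flag mechanism of step 4. It is our hedged reconstruction, not the JDK source: the class names ImmStr and MutBuf stand in for String and StringBuffer, and the helper copyWhenShared is a hypothetical name.

// ImmStr adopts the array without copying (immutable by convention).
final class ImmStr {
    final char[] value;
    final int count;
    ImmStr(char[] value, int count) { this.value = value; this.count = count; }
}

final class MutBuf {
    private char[] value = new char[16];
    private int count = 0;
    private boolean shared = false;

    // Conversion: no copy, but record the loss of exclusive access.
    ImmStr toStr() {
        shared = true;
        return new ImmStr(value, count);
    }

    // Hypothetical helper: copy only if aliased, then reset the flag.
    private void copyWhenShared() {
        if (shared) {
            value = java.util.Arrays.copyOf(value, value.length);
            shared = false;
        }
    }

    // Mutation: the delayed copy happens here, just before writing.
    void setCharAt(int i, char c) {
        copyWhenShared();
        value[i] = c;
    }
}

If no mutation ever follows the conversion, copyWhenShared never fires and no copy is made, which is exactly the saving the shared flag is meant to buy.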
6 A Closer Look at Offsets in Class String
We now consider more closely step 3 above, which introduces the offset instance variable as an alternative to copying when substrings are calculated. This step must include, for example, modification of all character access operations to index from offset. This account illustrates how shift can be employed to introduce offsets into the string representation. An initial Java version of substring in class String could look like this:

public String substring (int b, int e) {
  // b is the first character position; e-1 is the last character position.
  if ((b < 0) || (e > count) || (b > e)) {
    throw new IndexOutOfBoundsException ();
  }
  int ncount = e - b;
  char res[] = new char[ncount];
  ArrayCopy (value, b, res, 0, ncount);
  return new String (res, ncount);
}
ArrayCopy copies ncount characters from value to res, starting at offsets b and 0 respectively. To facilitate presentation consistent with the manipulation rules, the code below explicitly mentions in special brackets those (- instance variables -) that are referenced or changed. This allows a more functional treatment of methods. We also usually abbreviate {- integrity checks -}. Thus the code above could be rendered more succinctly as:

public String substring (int b, int e) (- value, count -) {
  {- Check (b, e, count) -}
  int ncount = e - b;
  char res[] = new char[ncount];
  ArrayCopy (value, b, res, 0, ncount);
  return new String (res, ncount);
}
The principal steps for shift, recall, are (1) defining a suitable Span function, (2) adapting code so it appears in all the required places, and (3) carrying out the shift transformation by relocating the Span calls. We then (4) simplify the results.
(1) Define Span. The motivating observation is that the array copying is unnecessary when a substring is extracted. We therefore abstract the instances of array-copy code into a new method definition and use that as Span. The original code sites will be replaced by calls.

private (char[], int) Span (- char[] nvalue, int ncount, int noffset -) {
  char res[] = new char[ncount];
  ArrayCopy (nvalue, noffset, res, 0, ncount);
  return (res, ncount);
}
(An additional notational liberty we take is to allow functions to return a tuple of results.) For this variant of shift, the parameters of Span are exactly the new set of instance variables, and the results are the old instance variables.
(2) Call Span. The method-abstraction step needs to be done in a way that introduces Span calls into all methods that construct or mutate string values (i.e., alter value or count). Note that substring does not mutate or construct string values directly; it uses a String constructor:

public String (nvalue, ncount) (- value, count -) {
  value = new char[ncount];
  count = ncount;
  ArrayCopy (nvalue, 0, value, 0, count);
}

After Span is abstracted:

public String (nvalue, ncount) (- value, count -) {
  (value, count) = Span (nvalue, ncount, 0);
}
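As an aside, since real Java has no tuple returns, the same abstraction can be expressed by letting Span return only the fresh array, with the count carried alongside. This rendering is our own sketch, not the paper's notation:

// Hypothetical plain-Java rendering of Span: returns the copied array;
// the count travels separately (it is always ncount).
private static char[] span(char[] nvalue, int ncount, int noffset) {
    char[] res = new char[ncount];
    System.arraycopy(nvalue, noffset, res, 0, ncount);
    return res;
}

In this rendering the abstracted constructor body becomes value = span(nvalue, ncount, 0); count = ncount;.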
When Span has an easily computed inverse function, Span can always be introduced trivially into otherwise less tractable cases as part of a composition with the inverse. But this will likely necessitate later code simplification.
(3) Do the Shift. Once all the Span calls have been inserted, doing the shift is a mechanical operation. The effect is as follows:
- The class now has three private instance variables, nvalue, ncount, and noffset, corresponding to the parameters of Span.
- All Span calls are replaced by (simultaneous) assignments of the actual parameters to the new instance variables.
- In each method that accesses instance variables, the first operation becomes a call to Span that computes value and count (now ordinary local variables) in terms of the instance variables.
For example, here is the transformed definition of substring:

public String substring (int b, int e) (- nvalue, ncount, noffset -) {
  (char value[], int count) = Span (nvalue, ncount, noffset);
  {- Check (b, e, count) -}
  int c = e - b;
  char res[] = new char[c];
  ArrayCopy (value, b, res, 0, c);
  return new String (res, c);
}
(4) Simplify. Unfolding Span and simplifying:

public String substring (int b, int e) (- nvalue, ncount, noffset -) {
  {- Check (b, e, ncount) -}
  int c = e - b;
  char res[] = new char[c];
  ArrayCopy (nvalue, noffset, res, 0, c);
  ArrayCopy (res, b, res, 0, c);
  return new String (res, c);
}

This motivates our introducing a new string constructor that can take an offset, public String (nvalue, ncount, noffset). This is done by abstracting the last two lines of substring and exploiting the idempotency of array copying. After simplifications:

public String substring (int b, int e) (- nvalue, ncount, noffset -) {
  {- Check (b, e, ncount) -}
  return new String (nvalue, e - b, b + noffset);
}
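The abstracted constructor presumably just installs the three instance variables without copying; the body below is our hedged reconstruction, not the derivation's listing:

// Sketch: adopt the caller's array rather than copying it.  Sharing is
// sound only because String never mutates nvalue and class-internal
// callers such as substring never mutate it after handing it over.
public String (char nvalue[], int ncount, int noffset) {
  this.nvalue  = nvalue;
  this.ncount  = ncount;
  this.noffset = noffset;
}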
In the case of the charAt method in String, we start with:

public char charAt (int i) (- value, count -) {
  {- Check (i, count) -}
  return value[i];
}
This does not create or modify the instance variables, so no call to Span is needed before the shift. It does, however, access instance variables, so shift inserts an initial call to Span:

public char charAt (int i) (- nvalue, ncount, noffset -) {
  (char value[], int count) = Span (nvalue, ncount, noffset);
  {- Check (i, count) -}
  return value[i];
}
Simplification entails unfolding Span, eliminating the array copy operation, and simplifying:

public char charAt (int i) (- nvalue, ncount, noffset -) {
  {- Check (i, ncount) -}
  return nvalue[i + noffset];
}
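Pulling the pieces together, the end state of step 3 can be summarized in compilable Java roughly as follows. This consolidation is ours, not the paper's; we use the hypothetical class name OffsetString to avoid clashing with java.lang.String, and we write the integrity checks out explicitly:

public final class OffsetString {
    private final char[] nvalue;
    private final int ncount;
    private final int noffset;

    public OffsetString(char[] nvalue, int ncount, int noffset) {
        this.nvalue = nvalue; this.ncount = ncount; this.noffset = noffset;
    }

    public int length() { return ncount; }

    public char charAt(int i) {
        if (i < 0 || i >= ncount) throw new IndexOutOfBoundsException();
        return nvalue[i + noffset];                       // index from the offset
    }

    public OffsetString substring(int b, int e) {
        if (b < 0 || e > ncount || b > e) throw new IndexOutOfBoundsException();
        return new OffsetString(nvalue, e - b, b + noffset);  // shares the array: no copying
    }
}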
7 Background and Conclusion
Background. There is significant previous work on synthesizing and reasoning about encapsulated abstract data types. Class manipulations are most obviously influenced by Hoare's early work [H72]. Most related work focuses either on synthesizing type implementations from algebraic specifications or on refinement of data representations into more concrete forms [D80,MG90,W93]. But evolution does not always involve synthesis or refinement, and the techniques introduced here are meant to alter (and not necessarily refine) both structure and (internal) meaning.
Ideas for boundary manipulations appear in Burstall and Goguen's work on theory operations [BG77]. Wile developed some of these ideas into informally described manipulations on types that he used to provide an account of the heapsort algorithm [W81]. He used a system of manipulating flag bits to record whether values have been computed yet in order to achieve the effect of a primitive class manipulation. For boundary manipulations, we build on several recent efforts that have developed and applied concepts analogous to the operation migration techniques [GN93,JF88,O92]. Early versions of the boundary manipulation and shift techniques were defined (without proof) and applied to introduce destructive operations in programs [JS87,S86] and in a larger case study in which a complex data structure for the text buffer of an interactive text editor was developed [NLS90].
The program manipulation techniques we introduce for classes and other encapsulations are meant to complement other manipulation and specialization techniques for achieving local restructuring and optimizations that do not involve manipulating encapsulations.
Conclusion. The techniques described in this paper are source-level techniques meant to be embodied in interactive tools used with programmer guidance to develop and evolve software. The intent is to achieve a more flexible and exploratory approach to the design, configuration, implementation, and adaptation of encapsulated abstractions. Manipulation techniques such as these can decrease the coupling between a programmer's decision to perform a particular computation and the related decision of where the computation should be placed with respect to class and internal interface structure.
Acknowledgements. Thanks to John Boyland and Edwin Chan, who contributed valuable insights. Thanks also to the anonymous referees for helpful comments.
References
[BG94] R.W. Bowdidge and W.G. Griswold, Automated support for encapsulating abstract data types. ACM SIGSOFT Symposium on the Foundations of Software Engineering, 1994.
[BD77] R.M. Burstall and J. Darlington, A transformation system for developing recursive programs. JACM 24, pp. 44-67, 1977.
[BG77] R.M. Burstall and J. Goguen, Putting theories together to make specifications. IJCAI, 1977.
[D80] J. Darlington, The synthesis of implementations for abstract data types. Imperial College report, 1980.
[GN93] W.G. Griswold and D. Notkin, Automated assistance for program restructuring. ACM TOSEM 2:3, 1993.
[G93] J.V. Guttag and J.J. Horning, et al., Larch: Languages and Tools for Formal Specification. Springer-Verlag, 1993.
[H72] C.A.R. Hoare, Proof of correctness of data representations. Acta Informatica 1, pp. 271-281, 1972.
[JF88] R. Johnson and B. Foote, Designing reusable classes. Journal of Object-Oriented Programming, June/July 1988.
[JS87] U. Jørring and W. Scherlis, Deriving and using destructive data types. Program Specification and Transformation, Elsevier, 1987.
[K96] G. Kiczales, Beyond the black box: open implementation. IEEE Software, January 1996.
[MG90] C. Morgan and P.H.B. Gardiner, Data refinement by calculation. Acta Informatica 27, 1990.
[MG95] J.D. Morgenthaler and W.G. Griswold, Program analysis for practical program restructuring. ICSE-17 Workshop on Program Transformation for Software Evolution, 1995.
[NLS90] R.L. Nord, P. Lee, and W. Scherlis, Formal manipulation of modular software systems. ACM SIGSOFT Formal Methods in Software Development, 1990.
[O92] W. Opdyke, Refactoring Object-Oriented Frameworks. PhD thesis, University of Illinois, 1992.
[P86] R. Paige, Programming with invariants. IEEE Software 3:1, 1986.
[P84] A. Pettorossi, A powerful strategy for deriving efficient programs by transformation. ACM Symposium on Lisp and Functional Programming, 1984.
[S80] W. Scherlis, Expression procedures and program derivation. Stanford University technical report, 1980.
[S86] W. Scherlis, Abstract data types, specialization, and program reuse. Advanced Programming Environments, Springer, 1986.
[S94] W. Scherlis, Boundary and path manipulations on abstract data types. IFIP 94, North-Holland, 1994.
[W88] P. Wadler, Deforestation: transforming programs to eliminate trees. European Symposium on Programming, Springer, 1988.
[W81] D.S. Wile, Type transformations. IEEE TSE SE-7, pp. 32-39, 1981.
[W93] K.R. Wood, A practical approach to software engineering using Z and the refinement calculus. ACM SIGSOFT Symposium on the Foundations of Software Engineering, 1993.