Lecture Notes in Computer Science Commenced Publication in 1973 Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board David Hutchison Lancaster University, UK Takeo Kanade Carnegie Mellon University, Pittsburgh, PA, USA Josef Kittler University of Surrey, Guildford, UK Jon M. Kleinberg Cornell University, Ithaca, NY, USA Alfred Kobsa University of California, Irvine, CA, USA Friedemann Mattern ETH Zurich, Switzerland John C. Mitchell Stanford University, CA, USA Moni Naor Weizmann Institute of Science, Rehovot, Israel Oscar Nierstrasz University of Bern, Switzerland C. Pandu Rangan Indian Institute of Technology, Madras, India Bernhard Steffen University of Dortmund, Germany Madhu Sudan Massachusetts Institute of Technology, MA, USA Demetri Terzopoulos University of California, Los Angeles, CA, USA Doug Tygar University of California, Berkeley, CA, USA Gerhard Weikum Max-Planck Institute of Computer Science, Saarbruecken, Germany
5608
Pierre-Louis Curien (Ed.)
Typed Lambda Calculi and Applications 9th International Conference, TLCA 2009 Brasília, Brazil, July 1-3, 2009 Proceedings
Volume Editor
Pierre-Louis Curien
Université Paris Diderot - Paris 7
Laboratoire PPS (CNRS / Paris 7)
Case 7014, 75205 Paris Cedex 13, France
E-mail: [email protected]
Library of Congress Control Number: Applied for
CR Subject Classification (1998): D.1.6, D.3.2, F.3, F.4, I.2.3
LNCS Sublibrary: SL 1 - Theoretical Computer Science and General Issues
ISSN 0302-9743
ISBN-10 3-642-02272-3 Springer Berlin Heidelberg New York
ISBN-13 978-3-642-02272-2 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. springer.com © Springer-Verlag Berlin Heidelberg 2009 Printed in Germany Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper SPIN: 12695461 06/3180 543210
Preface
This volume contains the papers of the 9th International Conference on Typed Lambda Calculi and Applications (TLCA 2009), which was held July 1-3, 2009, in Brasília, Brazil, as part of the 5th International Conference on Rewriting, Deduction, and Programming (RDP 2009), together with the International Conference on Rewriting Techniques and Applications (RTA 2009), the International School on Rewriting (ISR 2009), the 4th Workshop on Logical and Semantic Frameworks with Applications (LSFA 2009), the 10th International Workshop on Rule-Based Programming (RULE 2009), the 8th International Workshop on Functional and (Constraint) Logic Programming (WFLP 2009), the 9th International Workshop on Reduction Strategies in Rewriting and Programming (WRS 2009), and the annual meeting of the IFIP Working Group 1.6 on term rewriting.

The TLCA series of conferences serves as a forum for presenting original research results that are broadly relevant to the theory and applications of the lambda-calculus. Typed lambda-calculi underlie programming language semantics and implementation on the one hand, and a large part of proof theory on the other. Previous TLCA conferences were held in Utrecht (1993), Edinburgh (1995), Nancy (1997), L'Aquila (1999), Kraków (2001), Valencia (2003), Nara (2005), and Paris (2007).

For TLCA 2009, 27 papers were accepted out of 53 submissions. Each paper was reviewed by at least three members of the Program Committee, with the help of 84 external reviewers. I would like to thank the members of the Program Committee and the external reviewers for their great work, as well as Andrei Voronkov for providing the EasyChair system, which was invaluable in the reviewing process, the electronic Program Committee meeting, and the preparation of this volume.
In addition to the contributed papers, the TLCA program contained three invited talks by:

- Marcelo Fiore (abstract included in the proceedings)
- Robert Harper (joint with RTA 2009, abstract included in the proceedings)
- Jean-Louis Krivine (talk entitled "Ultrafilters and the Heap")

Many people helped to make TLCA 2009 a success. I would like to thank in particular the Conference Chair Mauricio Ayala Rincón, the TLCA Publicity Chair Luca Paolini, and the local organization team, as well as the following sponsors: Universidade de Brasília, the Brazilian Council of Technological and Scientific Development (CNPq), the Brazilian Coordination for the Improvement of Higher Education Personnel (CAPES), and the Federal District Research Foundation (FAPDF).

April 2009
Pierre-Louis Curien
Organization
Conference Chair (RDP 2009)
Mauricio Ayala Rincón (Brasília University, Brazil)

Program Chair (TLCA 2009)
Pierre-Louis Curien (CNRS and University Paris 7, France)
Program Committee (TLCA 2009)
Zena Ariola (University of Oregon, USA)
Patrick Baillot (CNRS and ENS Lyon, France)
Thierry Coquand (Göteborg University, Sweden)
René David (Université de Savoie, Chambéry, France)
Dan Ghica (University of Birmingham, UK)
Ryu Hasegawa (Tokyo University, Japan)
Barry Jay (University of Technology, Sydney, Australia)
Soren Lassen (Google, Sydney, Australia)
Luca Paolini (University of Turin, Italy)
Frank Pfenning (Carnegie Mellon University, USA)
Thomas Streicher (Technical University of Darmstadt, Germany)
Local Organizing Committee
David Dehárbe (Federal University of Rio Grande do Norte (UFRN), Natal, Brazil)
Flávio L.C. de Moura (University of Brasília (UnB), Brazil)
Hermann Haeusler (Pontifical Catholic University of Rio de Janeiro (PUC-Rio), Brazil)
Elaine Pimentel (Federal University of Minas Gerais (UFMG), Belo Horizonte, Brazil)
Alejandro Ríos (University of Buenos Aires (UBA), Argentina)
Alberto Pardo (University of the Republic, Montevideo, Uruguay)
TLCA Steering Committee
Samson Abramsky (Oxford University, UK)
Henk Barendregt (Radboud University, Nijmegen, The Netherlands)
Mariangiola Dezani (University of Turin, Italy)
Roger Hindley (Swansea University, UK)
Martin Hofmann (Ludwig-Maximilians-Universität, Munich, Germany)
Simona Ronchi Della Rocca (University of Turin, Italy)
Paweł Urzyczyn (University of Warsaw, Poland)
External Reviewers
Thorsten Altenkirch, Roberto Amadio, Philippe Audebaud, David Baelde, Franco Barbanera, Stefano Berardi, Benno van den Berg, Aaron Bohannon, Guillaume Bonfante, Brian Campbell, Pierre Clairambault, Robin Cockett, Claudio Sacerdoti Coen, Ferruccio Damiani, Daniel Dougherty, Gilles Dowek, Derek Dreyer, Claudia Faggian, Lorenzo Tortora de Falco, Germain Faure, Andrzej Filinski, Marcelo Fiore, Marco Gaboardi, Nicola Gambino, Maxime Gamboni, Ronald Garcia, Richard Garner, Dan Grossman, Masahiro Hamano, Peter Hancock, Michael Hanus, Robert Harper, Masahito Hasegawa, Olivier Hermant, Claudio Hermida, Chung-Kil Hur, Pierre Hyvernat, Jun Inoue, Mark Jones, Ulrich Kohlenbach, Tomasz Kowalski, Jim Laird, François Lamarche, Pierre Lescanne, Paul Blain Levy, William Lovas, Maria Emilia Maietti, Julio Mariño, Damiano Mazza, Richard McKinley, Dale Miller, Virgile Mogbil, Jean-Yves Moyen, César Muñoz, Karim Nour, Mauro Piccolo, Brigitte Pientka, Andrew Pitts, John Power, Myriam Quatrini, Femke van Raamsdonk, Christophe Raffalli, Jason Reed, Laurent Regnier, Didier Rémy, Morten Rhiger, Simona Ronchi della Rocca, Luca Roversi, Arnab Roy, Andrea Schalk, Carsten Schürmann, Jean-Pierre Talpin, Kazushige Terui, Hayo Thielecke, Franklyn Turbak, Nikos Tzevelekos, Tarmo Uustalu, Vasco T. Vasconcelos, Lionel Vaux, Luca Vercelli, Edwin Westbrook, Yoriyuki Yamagata, Noam Zeilberger
Table of Contents
Mathematical Synthesis of Equational Deduction Systems (invited talk) . . . . . 1
    Marcelo Fiore and Chung-Kil Hur

A Pronominal Approach to Binding and Computation (invited talk) . . . . . 3
    Robert Harper, Daniel R. Licata, and Noam Zeilberger

A Modular Type-Checking Algorithm for Type Theory with Singleton Types and Proof Irrelevance . . . . . 5
    Andreas Abel, Thierry Coquand, and Miguel Pagano

Interactive Learning-Based Realizability Interpretation for Heyting Arithmetic with EM1 . . . . . 20
    Federico Aschieri and Stefano Berardi

Syntax for Free: Representing Syntax with Binding Using Parametricity . . . . . 35
    Robert Atkey

On the Meaning of Logical Completeness . . . . . 50
    Michele Basaldella and Kazushige Terui

Thick Subtrees, Games and Experiments . . . . . 65
    Pierre Boudes

Bounded Linear Logic, Revisited . . . . . 80
    Ugo Dal Lago and Martin Hofmann

Partial Orders, Event Structures and Linear Strategies . . . . . 95
    Claudia Faggian and Mauro Piccolo

Existential Type Systems with No Types in Terms . . . . . 112
    Ken-etsu Fujita and Aleksy Schubert

Initial Algebra Semantics for Cyclic Sharing Structures . . . . . 127
    Makoto Hamana

An Operational Account of Call-by-Value Minimal and Classical λ-Calculus in "Natural Deduction" Form . . . . . 142
    Hugo Herbelin and Stéphane Zimmermann

Refinement Types as Proof Irrelevance . . . . . 157
    William Lovas and Frank Pfenning

Weak ω-Categories from Intensional Type Theory . . . . . 172
    Peter LeFanu Lumsdaine

Relating Classical Realizability and Negative Translation for Existential Witness Extraction . . . . . 188
    Alexandre Miquel

Session-Based Communication Optimisation for Higher-Order Mobile Processes . . . . . 203
    Dimitris Mostrous and Nobuko Yoshida

The Cut-Elimination Theorem for Differential Nets with Promotion . . . . . 219
    Michele Pagani

A Polymorphic Type System for the Lambda-Calculus with Constructors . . . . . 234
    Barbara Petit

Kripke Semantics for Martin-Löf's Extensional Type Theory . . . . . 249
    Steve Awodey and Florian Rabe

On the Values of Reducibility Candidates . . . . . 264
    Colin Riba

Lexicographic Path Induction . . . . . 279
    Jeffrey Sarnat and Carsten Schürmann

Parametricity for Haskell with Imprecise Error Semantics . . . . . 294
    Florian Stenger and Janis Voigtländer

Some Observations on the Proof Theory of Second Order Propositional Multiplicative Linear Logic . . . . . 309
    Lutz Straßburger

Algebraic Totality, towards Completeness . . . . . 325
    Christine Tasson

A Logical Foundation for Environment Classifiers . . . . . 341
    Takeshi Tsukada and Atsushi Igarashi

Inhabitation of Low-Rank Intersection Types . . . . . 356
    Paweł Urzyczyn

Differential Linear Logic and Polarization . . . . . 371
    Lionel Vaux

Complexity of Gödel's T in λ-Formulation . . . . . 386
    Gunnar Wilken and Andreas Weiermann

The Computational SLR: A Logic for Reasoning about Computational Indistinguishability . . . . . 401
    Yu Zhang

Author Index . . . . . 417
Mathematical Synthesis of Equational Deduction Systems Marcelo Fiore and Chung-Kil Hur Computer Laboratory, University of Cambridge {Marcelo.Fiore,Chung-Kil.Hur}@cl.cam.ac.uk
Our view of computation is still evolving. The concrete theories for specific computational phenomena that are emerging encompass three aspects: specification and programming languages for describing computations, mathematical structures for modelling computations, and logics for reasoning about properties of computations. To make sense of this complexity, and also to compare and/or relate different concrete theories, meta-theories have been built. These meta-theories are used for the study, formalisation, specification, prototyping, and testing of concrete theories.

Our main concern here is the investigation of meta-theories to provide systems that better support the formalisation of concrete theories. Thereby we propose a research programme based on the development of mathematical models of computational languages, and the systematic use of these models to synthesise formal deduction systems for reasoning and computation.

Specifically, we put forth a mathematical methodology for the synthesis of equational and rewriting logics from algebraic meta-theories. The synthesised logics are guaranteed to be sound with respect to a canonical model theory, and we provide a framework for analysing completeness that typically leads to canonical logics. Our methodology can be used to rationally reconstruct the traditional equational logic of universal algebra and its multi-sorted version from first principles. As for modern applications, we have synthesised: (1) a nominal equational logic for specifying and reasoning about languages with name-binding operators, and (2) a second-order equational logic for specifying and reasoning about simple type theories. Overall, we aim at incorporating into the research programme further key features of modern languages, such as type dependency, linearity, sharing, and graphical structure.
References

1. Fiore, M., Hur, C.-K.: On the construction of free algebras for equational systems. In: Special Issue for Automata, Languages and Programming (ICALP 2007). Theoretical Computer Science, vol. 410, pp. 1704-1729. Elsevier, Amsterdam (2009)
2. Fiore, M., Hur, C.-K.: Term equational systems and logics. In: Proceedings of the 24th Conference on the Mathematical Foundations of Programming Semantics (MFPS XXIV). Electronic Notes in Theoretical Computer Science, vol. 218, pp. 171-192. Elsevier, Amsterdam (2008)

P.-L. Curien (Ed.): TLCA 2009, LNCS 5608, pp. 1-2, 2009.
3. Fiore, M.: Algebraic theories and equational logics. In: Invited tutorial at the 24th Conference on the Mathematical Foundations of Programming Semantics, MFPS XXIV (2008), http://www.cl.cam.ac.uk/~mpf23/
4. Fiore, M.: Second-order and dependently-sorted abstract syntax. In: 23rd Annual IEEE Symposium on Logic in Computer Science (LICS 2008), pp. 57-68. IEEE Computer Society Press, Los Alamitos (2008)
5. Hur, C.-K.: Categorical Equational Systems: Algebraic Models and Equational Reasoning. Forthcoming PhD thesis. Computer Laboratory, University of Cambridge (2009)
A Pronominal Approach to Binding and Computation Robert Harper, Daniel R. Licata, and Noam Zeilberger Carnegie Mellon University {rwh,drl,noam}@cs.cmu.edu
There has been a great deal of research on programming languages for computing with binding and scope (bound variables, α-equivalence, capture-avoiding substitution). These languages are useful for a variety of tasks, such as implementing domain-specific languages and formalizing the metatheory of programming languages. Functional programming with binding and scope involves two different notions of function: functions-as-data and functions-as-computation. Functions-as-data, used to represent abstract syntax with variable binding, have an intensional, syntactic character, in the sense that they can be inspected in ways other than function application. For example, many algorithms that process abstract syntax recur under binders, treating variables symbolically. On the other hand, functions-as-computation, the usual functions of functional programming, have an extensional character: a function from A to B is a black box that, when given an A, delivers a B.

We are investigating a programming language that provides support for both functions-as-data and functions-as-computation as two different types. Our framework provides one type constructor ⇒ for functions-as-data, used to represent variable binding, and another type constructor ⊃ for functions-as-computation, used for functional programming. This permits representations that mix the two function spaces, which is useful, e.g., for implementing normalization-by-evaluation. Our framework treats variable binding pronominally: variables are intrinsically-scoped references to a context. This permits types to be used to reason about the scoping of variables, e.g., that a normalization function maps closed terms to closed terms. In our mixed, pronominal setting, the structural properties of weakening and substitution hold only under some conditions on types, but we show that these conditions can be discharged automatically in many cases.
The interested reader may refer either to a technical account of our type theory [1], or to a more recent discussion [2] of an implementation as an embedding in the dependently typed programming language Agda 2. The latter develops a number of examples, such as normalization-by-evaluation for the untyped λ-calculus. One of the key technical tools used in our work is a proof-theoretic technique called higher-order focusing [3, 4], which provides a logical analysis of pattern matching and evaluation order. Higher-order focusing leads to convenient formalizations of programming languages with pattern-matching [5], and has been used to investigate refinement types [6] and dependent types [7] in the presence of effects.
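The contrast between the two function spaces can be made concrete in any functional host language. The following Python sketch is ours, not the authors' framework: a term represented as data can be inspected under its binders, while an ordinary closure can only be applied.

```python
# Functions-as-data: abstract syntax with binding, represented as tagged
# tuples with de Bruijn indices; algorithms may recur under binders.
def count_binders(t):
    """Walk the syntax tree, treating variables symbolically."""
    if t[0] == 'var':
        return 0
    if t[0] == 'lam':
        return 1 + count_binders(t[1])
    if t[0] == 'app':
        return count_binders(t[1]) + count_binders(t[2])

# Church numeral 2 as syntax: \f. \x. f (f x)
syntax_of_twice = ('lam', ('lam',
    ('app', ('var', 1), ('app', ('var', 1), ('var', 0)))))

# Functions-as-computation: a black box; we can apply it, but not
# inspect its body.
twice = lambda f: lambda x: f(f(x))
```

Here `count_binders` exemplifies an intensional operation that has no counterpart for `twice`; the framework sketched in the abstract keeps the two notions apart as the type constructors ⇒ and ⊃.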
References

[1] Licata, D.R., Zeilberger, N., Harper, R.: Focusing on binding and computation. In: IEEE Symposium on Logic in Computer Science (2008)
[2] Licata, D.R., Harper, R.: A universe of binding and computation (March 2009), http://www.cs.cmu.edu/~drl
[3] Zeilberger, N.: On the unity of duality. Annals of Pure and Applied Logic 153(1-3) (2008); Special issue on Classical Logic and Computation
[4] Zeilberger, N.: The logical basis of evaluation order and pattern matching. PhD thesis, Carnegie Mellon University (2009)
[5] Zeilberger, N.: Focusing and higher-order abstract syntax. In: ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp. 359-369 (2008)
[6] Zeilberger, N.: Refinement types and computational duality. In: PLPV 2009: Proceedings of the 3rd Workshop on Programming Languages Meets Program Verification, pp. 15-26. ACM Press, New York (2009)
[7] Licata, D.R., Harper, R.: Positively dependent types. In: PLPV 2009: Proceedings of the 3rd Workshop on Programming Languages Meets Program Verification, pp. 3-14. ACM, New York (2009)
A Modular Type-Checking Algorithm for Type Theory with Singleton Types and Proof Irrelevance

Andreas Abel (1), Thierry Coquand (2), and Miguel Pagano (3)

(1) Ludwig-Maximilians-Universität München, [email protected]
(2) Göteborg University, [email protected]
(3) Universidad Nacional de Córdoba, [email protected]
Abstract. We define a logical framework with singleton types and one universe of small types. We give the semantics using a PER model; it is used for constructing a normalisation-by-evaluation algorithm. We prove completeness and soundness of the algorithm; and get as a corollary the injectivity of type constructors. Then we give the definition of a correct and complete type-checking algorithm for terms in normal form. We extend the results to proof-irrelevant propositions.
1 Introduction and Related Work
One of the raisons d'être of proof checkers like Agda [26], Coq [18], and Epigram [23] is to decide if a given term has some type; i.e., if a term corresponds to a proof of a proposition [17]. Hence, the convenience of such a system is, in part, determined by the types for which the system can check membership. We extend the decidability of type-checking obtained in previous works [1,2] for Martin-Löf type theories [21,25] by considering singleton types and proof-irrelevant propositions.

Singleton types were introduced by Aspinall [8] in the context of specification languages. An important use of singletons is as definitions by abbreviations (see [8,14]); they were also used to model translucent sums in the formalisation of SML [19]. It is interesting to consider singleton types because beta-eta phase separation fails: one cannot do eta-expansion before beta-normalisation, because the shape of the types at which to eta-expand is still unknown at that point; and one cannot postpone eta-expansion until after beta-normalisation, because eta-expansion can trigger new beta-reductions. Stone and Harper [29] decide type checking in an LF with singleton types and subtyping. Yet it is not clear whether their method extends to computation on the type level. As far as we know, our work is the first where singleton types are considered together with a universe.

De Bruijn proposed the concept of irrelevance of proofs [11] to reduce the burden in the formalisation of mathematics. As shown by Werner [30], the use of proof-irrelevance types together with sigma types is one way to get subset types à la PVS [27] in type theories having the eta rule; this direction was explored by Sozeau [28, Sec. 3.3].
Checking dependent types relies on checking types for equality. To this end, we compute η-long normal forms using normalisation by evaluation (NbE) [22]. Syntactic expressions are evaluated into a semantic domain and then reified back to expressions in normal form. To handle functional and open expressions, the semantic domain has to be equipped with variables; a major challenge in rigorous treatments of NbE has been the problem of generating fresh identifiers. Solutions include term families [10], liftable de Bruijn terms [7], and Kripke semantics [4]. In this work we present a novel formulation of NbE which avoids the problem completely: reification is split into an η-expansion phase (↓) in the semantics, followed by a read-back function (R) into the syntax which is indexed by the number of already used variables. This way, a standard PER model is sufficient, and technical difficulties are avoided.

Outline. The definitions of two calculi are presented in Section 2. In Section 3 we define the semantics of this LF in a PER model, and we show soundness of the model with respect to the derived rules of the calculus. We use this model to introduce an NbE algorithm, for which we prove completeness (if t = s is derivable, then nbe(t) and nbe(s) are identical). In Section 4 we prove, using logical relations, the soundness of the algorithm (i.e., t = nbe(t) is derivable). In Section 5 we define a bi-directional algorithm for checking the type of normal forms and inferring the type of neutral terms.
2 The Calculus as a Generalised Algebraic Theory
In this section, we introduce the calculus. For ease of reading, and to show the modularity of our approach, we present it as two calculi: the first has dependent function spaces, singleton types, and a universe closed under function spaces and singletons. In the second calculus we leave out singleton types and add proof-irrelevant types. We present the calculi using the formalism proposed by Cartmell for generalised algebraic theories (GATs) [12]; however, our calculi are not proper GATs (the rules are written in the so-called "informal syntax", and the rule for application is ambiguous). We give only the introductory rules and the axioms; the rules stating that equality is a congruence relation, called derived rules, are omitted. An example of a derived rule is

  A = B ∈ Type(Γ),  γ = δ ∈ Δ → Γ  ⟹  A γ = B δ ∈ Type(Δ)

Calculus with singleton types

Sorts. The set of sort symbols is {Ctx, →, Type, Term}.

  Ctx is a type                                      (ctx-sort)
  Γ, Δ ∈ Ctx  ⟹  Γ → Δ is a type                     (subs-sort)
  Γ ∈ Ctx  ⟹  Type(Γ) is a type                      (type-sort)
  Γ ∈ Ctx, A ∈ Type(Γ)  ⟹  Term(Γ, A) is a type      (term-sort)
In the following, whenever a rule has a hypothesis A ∈ Type(Γ), then Γ ∈ Ctx shall be a further, implicit hypothesis. Similarly, σ ∈ Γ → Δ presupposes Γ ∈ Ctx and Δ ∈ Ctx, and t ∈ Term(Γ, A) presupposes A ∈ Type(Γ), which in turn presupposes Γ ∈ Ctx. Note that judgements of the form Γ ∈ Ctx, A ∈ Type(Γ), t ∈ Term(Γ, A), and σ ∈ Γ → Δ correspond to the more conventional forms Γ ⊢, Γ ⊢ A, Γ ⊢ t : A, and Γ ⊢ σ : Δ, respectively. In the rest of the paper we use the latter.

Operators. The set of operators is quite large; instead of giving it at once, we define it as the union of the disjoint sets of operators for contexts, substitutions, types, and terms.

Contexts. There are two operators for contexts: SC = {⋄, _._} (writing ⋄ for the empty context).

  ⋄ ∈ Ctx                                            (empty-ctx)
  Γ ∈ Ctx, A ∈ Type(Γ)  ⟹  Γ.A ∈ Ctx                 (ext-ctx)

Substitutions. For substitutions we have five operators: SS = {id, ⟨⟩, (_, _), _ _, p}.

  Γ ∈ Ctx  ⟹  id_Γ ∈ Γ → Γ                                  (id-subs)
  Γ ∈ Ctx  ⟹  ⟨⟩ ∈ Γ → ⋄                                    (empty-subs)
  δ ∈ Γ → Θ, σ ∈ Θ → Δ  ⟹  σ δ ∈ Γ → Δ                      (comp-subs)
  σ ∈ Γ → Δ, t ∈ Term(Γ, A σ)  ⟹  (σ, t) ∈ Γ → Δ.A          (ext-subs)
  A ∈ Type(Γ)  ⟹  p ∈ Γ.A → Γ                               (fst-subs)

Types. The set of operators for types is ST = {U, Fun _ _, {_}_, _ _}.

  Γ ∈ Ctx  ⟹  U ∈ Type(Γ)                                   (u-f)
  A ∈ Term(Γ, U)  ⟹  A ∈ Type(Γ)                            (u-el)
  A ∈ Type(Γ), B ∈ Type(Γ.A)  ⟹  Fun A B ∈ Type(Γ)          (fun-f)
  A ∈ Type(Γ), t ∈ Term(Γ, A)  ⟹  {t}_A ∈ Type(Γ)           (sing-f)
  A ∈ Type(Δ), σ ∈ Γ → Δ  ⟹  A σ ∈ Type(Γ)                  (subs-type)

Terms. The set of operators for terms is SE = {Fun _ _, {_}_, _ _, q, λ_, App _ _}.

  A ∈ Term(Γ, U), B ∈ Term(Γ.A, U)  ⟹  Fun A B ∈ Term(Γ, U)                (fun-u-i)
  B ∈ Type(Γ.A), t ∈ Term(Γ.A, B)  ⟹  λt ∈ Term(Γ, Fun A B)                (fun-i)
  t ∈ Term(Γ, Fun A B), u ∈ Term(Γ, A)  ⟹  App t u ∈ Term(Γ, B (id_Γ, u))  (fun-el)
  σ ∈ Γ → Δ, t ∈ Term(Δ, A)  ⟹  t σ ∈ Term(Γ, A σ)                         (subs-term)
  A ∈ Type(Γ)  ⟹  q ∈ Term(Γ.A, A p)                                       (hyp)
  A ∈ Term(Γ, U), t ∈ Term(Γ, A)  ⟹  {t}_A ∈ Term(Γ, U)                    (sing-u-i)
  t ∈ Term(Γ, A)  ⟹  t ∈ Term(Γ, {t}_A)                                    (sing-i)
  a ∈ Term(Γ, A), t ∈ Term(Γ, {a}_A)  ⟹  t ∈ Term(Γ, A)                    (sing-el)
Axioms. We give the axioms without their premises, except in the cases where they cannot be inferred.

Substitutions:

  (σ δ) γ = σ (δ γ)        id_Γ σ = σ              σ id_Γ = σ
  id_⋄ = ⟨⟩                ⟨⟩ σ = ⟨⟩               p (σ, t) = σ
  id_{Γ.A} = (p, q)        (σ, t) δ = (σ δ, t δ)

Substitutions on types and terms; η- and β-axioms:

  U γ = U                            (Fun A B) σ = Fun (A σ) (B (σ p, q))
  {t}_A σ = {t σ}_{A σ}              q (σ, t) = t
  t id_Γ = t                         t (σ δ) = (t σ) δ
  (λt) σ = λ(t (σ p, q))             (App r s) σ = App (r σ) (s σ)
  App (λt) r = t (id_Γ, r)           λ(App (t p) q) = t
Equality rules for singletons:

  t, t′ ∈ Term(Γ, {a}_A)  ⟹  t = t′ ∈ Term(Γ, {a}_A)      (sing-eq-i)
  t = t′ ∈ Term(Γ, {a}_A)  ⟹  t = t′ ∈ Term(Γ, A)         (sing-eq-el)

Notation. We denote by |Γ| the length of the context Γ, and by Γ!i the projection of the i-th component of Γ, for 0 ≤ i < |Γ|. We say Δ ≤_i Γ if Δ ⊢ pⁱ : Γ, where pⁱ is the i-fold composition of p with itself. We denote by Terms the set of words freely generated using symbols in SS ∪ ST ∪ SE. We write t ≡_T t′ to denote that t and t′ are syntactically equal in T ⊆ Terms. We call A the tag of {a}_A.

Definition 1 (Neutral terms and normal forms)

  Ne ∋ k ::= q | q p^{i+1} | App k v
  Nf ∋ v, V, W ::= U | Fun V W | {v}_V | λv | k

Remark 1 (Weakening of judgements). Let Δ ≤_i Γ, Γ ⊢ A = A′, and Γ ⊢ t = t′ : A; then Δ ⊢ A pⁱ = A′ pⁱ, and Δ ⊢ t pⁱ = t′ pⁱ : A pⁱ.

Remark 2 (Syntactic validity)
1. If Γ ⊢ t : A, then Γ ⊢ A.
2. If Γ ⊢ t = t′ : A, then both Γ ⊢ t : A and Γ ⊢ t′ : A.
3. If Γ ⊢ A = A′, then both Γ ⊢ A and Γ ⊢ A′.

Lemma 1 (Inversion of types)
1. If Γ ⊢ Fun A B, then Γ ⊢ A and Γ.A ⊢ B.
2. If Γ ⊢ {a}_A, then Γ ⊢ A and Γ ⊢ a : A.
3. If Γ ⊢ k, then Γ ⊢ k : U.
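The grammar of Definition 1 is mutually recursive: every neutral term is a normal form, and normal forms appear as arguments inside neutral applications. A small recognizer (our illustration; constructor tags are named after the paper's syntax, with `('qp', n)` standing for q pⁿ) makes the mutual recursion explicit:

```python
# Hypothetical transcription of Definition 1: neutral terms k and
# normal forms v, as tagged tuples.

def is_ne(t):
    """k ::= q | q p^(i+1) | App k v"""
    tag = t[0]
    if tag == 'q':
        return True
    if tag == 'qp':                      # q p^(i+1), so the power is >= 1
        return isinstance(t[1], int) and t[1] >= 1
    if tag == 'app':
        return is_ne(t[1]) and is_nf(t[2])
    return False

def is_nf(t):
    """v ::= U | Fun V W | {v}_V | lam v | k"""
    tag = t[0]
    if tag == 'U':
        return True
    if tag == 'fun':                     # Fun V W
        return is_nf(t[1]) and is_nf(t[2])
    if tag == 'sing':                    # {v}_V
        return is_nf(t[1]) and is_nf(t[2])
    if tag == 'lam':
        return is_nf(t[1])
    return is_ne(t)                      # every neutral term is normal
```

Note that an application whose head is a λ (a β-redex) is rejected: only neutral heads are allowed, which is exactly what makes these terms normal.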
Lemma 2 (Inversion of typing)
1. If Γ ⊢ Fun A B : C, then Γ ⊢ A : U, and also Γ.A ⊢ B : U.
2. If Γ ⊢ {b}_B : A, then Γ ⊢ B : U, and also Γ ⊢ b : B.
3. If Γ ⊢ λt : A, then Γ ⊢ A = Fun A′ B′, and Γ.A′ ⊢ t : B′.
4. If Γ ⊢ t : {a}_A, then Γ ⊢ t : A and Γ ⊢ t = a : A.
5. If Γ ⊢ q pⁱ : A, then either Γ ⊢ A = (Γ!i) p^{i+1}, or Γ ⊢ A = {a}_{A′} and Γ ⊢ a = q pⁱ : A′.
Calculus with Proof-Irrelevance. Our treatment of proof-irrelevance is based on [9,20]. The motivation for a canonical element witnessing the existence of a proof is to keep the modularity of the algorithm for deciding equality; but since its introduction breaks completeness of type-checking, we consider two calculi: proof (programming) developments are done in a calculus without prf-tm, and type-checking is performed in a calculus with it. We then show that the latter is a conservative extension of the former.

Introductory rules:

  A ∈ Type(Γ)  ⟹  Prf A ∈ Type(Γ)                                   (prf-f)
  a ∈ Term(Γ, A)  ⟹  [a] ∈ Term(Γ, Prf A)                           (prf-i)
  A ∈ Type(Γ)  ⟹  O ∈ Term(Γ, Prf A)                                (prf-tm)
  t, t′ ∈ Term(Γ, Prf A)  ⟹  t = t′ ∈ Term(Γ, Prf A)                (prf-eq)
  B ∈ Type(Γ), b ∈ Term(Γ.A, B p), t ∈ Term(Γ, Prf A)
      ⟹  b where_B t ∈ Term(Γ, Prf B)                               (prf-el)

Axioms:

  (Prf A) δ = Prf (A δ)
  [t] δ = [t δ]
  O δ = O
  (b where_B t) δ = b (δ p, q) where_{B δ} (t δ)
  b where_B [t] = [b (id, t)]

Lemma 3 (Inversion)
1. If Γ ⊢ [t] : A, then Γ ⊢ A = Prf A′ and Γ ⊢ t : A′.
2. If Γ ⊢ b where_B t : A, then Γ ⊢ A = Prf B, Γ ⊢ t : Prf A′, and Γ.A′ ⊢ b : B p.

As expected, there are now more normal forms and more neutral terms:

  Ne ∋ k ::= … | v where_V k
  Nf ∋ v, V ::= … | Prf V | [v] | O

Now we prove that the calculus with prf-tm is a conservative extension of the one without it. We decorate the turnstile and the equality symbol with ∗ to refer to judgements in the extended calculus.
Definition 2. A term t′ is called a lifting of a term t if all occurrences of O in t have been replaced by terms s₀, …, s_{n−1}, and O does not occur in any s_i. We extend this definition to substitutions, contexts, and equality judgements. If Γ′ is a lifting of Γ, Γ′ =∗ Γ, and also ⊢ Γ′, then we say that Γ′ is a good-lifting of Γ. We extend the definition of good-lifting to the other kinds of judgement.

Lemma 4. Let Γ ⊢∗ J; then there exists a good-lifting Γ′ ⊢ J′. Moreover, for any other good-lifting Γ″ ⊢ J″ of Γ ⊢∗ J, we have Γ′ = Γ″ and Γ′ ⊢ J′ = J″.

Corollary 1. The calculus ⊢∗ is a conservative extension of ⊢.
3 Semantics

In this section we define a PER model of the calculus presented in the previous section. The model is used to define a normalisation function later.

3.1 PER Semantics
Definition 3. We define a domain D = O ⊕ Var ⊥ ⊕ [D → D] ⊕ D × D ⊕ D × D ⊕ O ⊕ D × [D → D] ⊕ D × D, where Var is a denumerable set of variables (as usual we write xi and assume xi = xj if i = j, for i, j ∈ N), E⊥ = E ∪ {⊥} is lifting, O = {}⊥ is the Sierpinski space, [D → D] is the set of continuous functions from D to D, ⊕ is the coalesced sum, and D × D is the Cartesian product of D [6]. An element of D which is not ⊥ can be of one of the forms: (d, d ) U
Var xi Lam f
for d, d ∈ D for xi ∈ Var for d ∈ D, and f ∈ [D → D]
Fun d f
App d d
Sing d d
for d, d ∈ D .
We define application · : [D × D → D] and the projections p, q : [D → D] by

    f · d = if f = Lam f′ then f′ d else ⊥,
    p d = if d = (d1, d2) then d1 else ⊥,
    q d = if d = (d1, d2) then d2 else ⊥.

We define a partial function R : N → D → Terms which reifies elements of the model into terms; this function is similar to the read-back function of Grégoire and Leroy [16].

Definition 4 (Read-back function)

    Rj U = U
    Rj (Fun X F) = Fun (Rj X) (Rj+1 (F (Var xj)))
    Rj (Sing d X) = {Rj d}_{Rj X}
    Rj (App d d′) = App (Rj d) (Rj d′)
    Rj (Lam f) = λ(Rj+1 (f (Var xj)))
    Rj (Var xi) = q p^(j−i−1)    if j > i  (undefined otherwise).
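The read-back function is essentially a recursive traversal that invents no names: the level j is all that is needed. As an illustration only — the tuple encoding of D and the function name are ours, not the paper's — a Python sketch of R might look like this, with Python closures standing in for [D → D]:

```python
# Hypothetical value encoding mirroring Definition 3:
#   ("U",), ("Var", i), ("Lam", f), ("Fun", X, F), ("App", d, e), ("Sing", d, X)
# where f and F are Python closures playing the role of [D -> D].
# A variable Var x_i read back at depth j becomes the projection q p^(j-i-1),
# encoded here as ("qp", j - i - 1).

def readback(j, d):
    """R_j: reify a semantic value into a first-order term (Definition 4)."""
    tag = d[0]
    if tag == "U":
        return ("U",)
    if tag == "Fun":
        _, X, F = d
        return ("Fun", readback(j, X), readback(j + 1, F(("Var", j))))
    if tag == "Sing":
        _, a, X = d
        return ("Sing", readback(j, a), readback(j, X))
    if tag == "App":
        _, f, a = d
        return ("App", readback(j, f), readback(j, a))
    if tag == "Lam":
        _, f = d
        # feed the closure a fresh "level" variable instead of a fresh name
        return ("Lam", readback(j + 1, f(("Var", j))))
    if tag == "Var":
        _, i = d
        assert j > i, "R_j (Var x_i) is undefined for j <= i"
        return ("qp", j - i - 1)
    raise ValueError(d)
```

For example, reading back the semantic identity function `("Lam", lambda d: d)` at level 0 yields the term `("Lam", ("qp", 0))`, i.e. λq.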
A Modular Type-Checking Algorithm for Type Theory with Singleton Types
11
Partial Equivalence Relations. A partial equivalence relation (PER) over a set D is a binary relation over D which is symmetric and transitive. If R is a PER over D and (d, d′) ∈ R, then clearly (d, d) ∈ R. We define dom(R) = {d ∈ D | (d, d) ∈ R}. If (d, d′) ∈ R, we sometimes write d = d′ ∈ R, and d ∈ R if d ∈ dom(R). We denote by PER(D) the set of all PERs over D. If R ∈ PER(D) and F : dom(R) → PER(D), we say that F is a family of PERs indexed by R iff for all d = d′ ∈ R, F d = F d′. If F is a family indexed by R, we write F : R → PER(D). We define two binary relations over D, one for neutral terms and the other for normal forms:

    d = d′ ∈ Ne :⇐⇒ ∀i ∈ N. Ri d and Ri d′ are defined and Ri d ≡Ne Ri d′
    d = d′ ∈ Nf :⇐⇒ ∀i ∈ N. Ri d and Ri d′ are defined and Ri d ≡Nf Ri d′

The following definitions are standard [8,14] (except for 1); they will be used in the definition of the model.

Definition 5. Let X ∈ PER(D) and F ∈ X → PER(D).
– 1 = {(⋆, ⋆)};
– Σ X F = {(d, d′) | p d = p d′ ∈ X and q d = q d′ ∈ F (p d)};
– Π X F = {(f, f′) | f · d = f′ · d′ ∈ F d for all d = d′ ∈ X};
– {{d}}_X = {(e, e′) | d = e ∈ X and d = e′ ∈ X}.
We define U, T ∈ PER(D) and [ ] : dom(T) → PER(D) using Dybjer's schema of inductive-recursive definition [15]. We then show that [ ] is a family of PERs over D.

Definition 6 (PER model)
– Inductive definition of U ∈ PER(D).
  • Ne ⊆ U;
  • if X = X′ ∈ U and d = d′ ∈ [X], then Sing d X = Sing d′ X′ ∈ U;
  • if X = X′ ∈ U and F d = F′ d′ ∈ U for all d = d′ ∈ [X], then Fun X F = Fun X′ F′ ∈ U.
– Inductive definition of T ∈ PER(D).
  • U ⊂ T;
  • U = U ∈ T;
  • if X = X′ ∈ T and d = d′ ∈ [X], then Sing d X = Sing d′ X′ ∈ T;
  • if X = X′ ∈ T and F d = F′ d′ ∈ T for all d = d′ ∈ [X], then Fun X F = Fun X′ F′ ∈ T.
– Recursive definition of [ ] ∈ dom(T) → PER(D).
  • [U] = U;
  • [Sing d X] = {{d}}_[X];
  • [Fun X F] = Π [X] (d ↦ [F d]);
  • [d] = Ne, in all other cases.

Lemma 5. The function [ ] is a family of PERs over T.
3.2 Normalisation and η-Expansion in the Model
The usual way to define NbE [7] is to introduce a reification function which maps elements of the model to normal forms, and a function mapping neutral terms to elements of the model (the former is called the inverse of the evaluation function, and the latter "make self evaluating", in [10]). A tricky point of the algorithm is to find a fresh variable when reifying functions as abstractions. In this work we do not need to worry about variable capture when reifying, because we can define functions corresponding to reification and to lifting of neutrals inside the model, completely avoiding the need to deal with fresh variables.

Definition 7. The partial functions ↑, ↓ : D → D → D and ⇓ : D → D are given as follows:

    ↑_{Fun X F} d = Lam (e ↦ ↑_{F e} (App d (↓_X e)))
    ↑_{Sing d X} e = d
    ↑_U d = d
    ↑_d e = e, in all other cases

    ↓_{Fun X F} d = Lam (e ↦ ↓_{F (↑_X e)} (d · ↑_X e))
    ↓_{Sing d X} e = ↓_X d
    ↓_U d = ⇓ d
    ↓_d e = e, in all other cases

    ⇓(Fun X F) = Fun (⇓ X) (d ↦ ⇓(F (↑_X d)))
    ⇓(Sing d X) = Sing (↓_X d) (⇓ X)
    ⇓ U = U
    ⇓ d = d, in all other cases.
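Definition 7 is directly executable once neutrals live in the value domain. The following Python sketch (our own hypothetical tuple encoding again, not the paper's code) shows the shape of the type-directed pair ↑/↓ and of ⇓; note that reflecting a neutral at a function type wraps it in a closure, so no fresh names are ever generated:

```python
# reflect(X, k) plays the role of ↑_X k: eta-expand the neutral k at type X.
# reify(X, d) plays the role of ↓_X d; norm(X) plays the role of ⇓X.
# Types and neutrals share one encoding: ("U",), ("Fun", A, F), ("Sing", d, A),
# ("Var", i), ("App", k, v), ("Lam", f) with f, F Python closures.

def reflect(X, k):
    if X[0] == "Fun":
        _, A, F = X
        return ("Lam", lambda e: reflect(F(e), ("App", k, reify(A, e))))
    if X[0] == "Sing":
        _, d, A = X
        return d              # singleton type: the tag alone decides the value
    return k                  # U and all other cases

def reify(X, d):
    if X[0] == "Fun":
        _, A, F = X
        return ("Lam", lambda e: reify(F(reflect(A, e)), apply_(d, reflect(A, e))))
    if X[0] == "Sing":
        _, a, A = X
        return reify(A, a)    # ↓_{Sing a A} d = ↓_A a
    if X[0] == "U":
        return norm(d)
    return d

def apply_(f, d):
    # the application f · d; in well-typed uses f is always a Lam value
    return f[1](d) if f[0] == "Lam" else ("App", f, d)

def norm(X):
    if X[0] == "Fun":
        _, A, F = X
        return ("Fun", norm(A), lambda d: norm(F(reflect(A, d))))
    if X[0] == "Sing":
        _, a, A = X
        return ("Sing", reify(A, a), norm(A))
    return X
```

For instance, reflecting the neutral `("Var", 3)` at type `Fun U (λ_. U)` produces a Lam value that, applied to a variable, builds the neutral application — exactly the η-expansion the definition prescribes.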
Lemma 6 (Characterisation of ↑, ↓, and ⇓). Let X = X′ ∈ T. Then
1. if k = k′ ∈ Ne, then ↑_X k = ↑_{X′} k′ ∈ [X];
2. if d = d′ ∈ [X], then ↓_X d = ↓_{X′} d′ ∈ Nf;
3. and also ⇓ X = ⇓ X′ ∈ Nf.

Definition 8 (Semantics)

Contexts.
    [[⋄]] = 1
    [[Γ.A]] = Σ [[Γ]] (d ↦ [[[A]]d])

Substitutions.
    [[⟨⟩]]d = ⋆
    [[(γ, t)]]d = ([[γ]]d, [[t]]d)
    [[γ δ]]d = [[γ]]([[δ]]d)
    [[id]]d = d
    [[p]]d = p d

Terms (and types).
    [[U]]d = U
    [[Fun A B]]d = Fun ([[A]]d) (e ↦ [[B]](d, e))
    [[{a}_A]]d = Sing ([[a]]d) ([[A]]d)
    [[λt]]d = Lam (d′ ↦ [[t]](d, d′))
    [[App t u]]d = [[t]]d · [[u]]d
    [[q]]d = q d
    [[t γ]]d = [[t]]([[γ]]d)
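The semantic equations of Definition 8 are an environment-based evaluator. A Python sketch of the term part (with a hypothetical tuple encoding of the syntax; environments d are nested pairs built up by context extension) might read:

```python
# ev(t, d) plays the role of [[t]]d; evs(g, d) evaluates substitutions.
# Terms: ("U",), ("Fun", A, B), ("Sing", a, A), ("Lam", t), ("App", t, u),
# ("q",), ("sub", t, g) for t gamma. Substitutions: ("id",), ("p",),
# ("pair", g, t) for (gamma, t), ("comp", g, h) for gamma delta.

def ev(t, d):
    tag = t[0]
    if tag == "U":
        return ("U",)
    if tag == "Fun":
        return ("Fun", ev(t[1], d), lambda e: ev(t[2], (d, e)))
    if tag == "Sing":                       # {a}_A
        return ("Sing", ev(t[1], d), ev(t[2], d))
    if tag == "Lam":
        return ("Lam", lambda e: ev(t[1], (d, e)))
    if tag == "App":
        f = ev(t[1], d)
        return f[1](ev(t[2], d))            # f . a, assuming f is a Lam value
    if tag == "q":
        return d[1]                         # second projection of the environment
    if tag == "sub":                        # t gamma
        return ev(t[1], evs(t[2], d))
    raise ValueError(t)

def evs(g, d):
    if g[0] == "id":
        return d
    if g[0] == "p":
        return d[0]                         # first projection of the environment
    if g[0] == "pair":
        return (evs(g[1], d), ev(g[2], d))
    if g[0] == "comp":
        return evs(g[1], evs(g[2], d))
    raise ValueError(g)
```

Because abstractions evaluate to closures, β-redexes like App (λq) U reduce simply by function application in the metalanguage.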
Definition 9 (Validity)
1. ⊨ ⋄ iff true;
2. ⊨ Γ.A iff Γ ⊨ A;
3. Γ ⊨ A iff Γ ⊨ A = A;
4. Γ ⊨ A = A′ iff ⊨ Γ and, for all d = d′ ∈ [[Γ]], [[A]]d = [[A′]]d′ ∈ T;
5. Γ ⊨ t : A iff Γ ⊨ t = t : A;
6. Γ ⊨ t = t′ : A iff Γ ⊨ A and, for all d = d′ ∈ [[Γ]], [[t]]d = [[t′]]d′ ∈ [[[A]]d];
7. Γ ⊨ σ : Δ iff Γ ⊨ σ = σ : Δ;
8. Γ ⊨ σ = σ′ : Δ iff ⊨ Γ, ⊨ Δ, and, for all d = d′ ∈ [[Γ]], [[σ]]d = [[σ′]]d′ ∈ [[Δ]].

Theorem 1 (Soundness of the Judgements). If Γ ⊢ J, then Γ ⊨ J.

Proof. By induction on Γ ⊢ J.

Theorem 2 (Completeness of NbE). If ⊢ t = t′ : A, then ↓_{[[A]]} [[t]] = ↓_{[[A]]} [[t′]] ∈ Nf.

Proof. By Thm. 1 we have [[t]] = [[t′]] ∈ [[[A]]], and we conclude by Lem. 6.

Calculus with Proof-Irrelevance. We extend all the definitions concerning the construction of the model: D = . . . ⊕ D ⊕ O; the new inhabitants are written Prf(d) and ⋆, respectively. The read-back function is extended by the equations Rj (Prf(d)) = Prf (Rj d) and Rj ⋆ = O. We add a new clause to the definition of T: if X = X′ ∈ T, then Prf(X) = Prf(X′) ∈ T, and [Prf(X)] = {(⋆, ⋆)}. The definitions of normalisation and expansion are extended for Prf(X):

    ↑_{Prf(X)} d = ⋆
    ↓_{Prf(X)} d = ⋆
    ⇓ Prf(X) = Prf(⇓ X).

The semantic equations for the new constructions are

    [[Prf A]]d = Prf([[A]]d)
    [[b where^B t]]d = ⋆
    [[[a]]]d = ⋆
    [[O]]d = ⋆.
Remark 3. Lemmata 5 and 6 and Theorems 1 and 2 remain valid for the calculus with proof-irrelevance.
4 Logical Relations
In order to prove soundness of our normalisation algorithm, we define logical relations [24] between types and elements of dom(T), and between terms and elements of the PERs corresponding to elements of T.
Definition 10 (Logical relations). The relations Γ ⊢ A ∼ X ∈ T (ternary) and Γ ⊢ t : A ∼ d ∈ [X] are defined simultaneously by induction on X ∈ T.
– Neutral types: X ∈ Ne.
  • Γ ⊢ A ∼ X ∈ T iff, for all Δ ≤_i Γ, Δ ⊢ A p^i = R_{|Δ|} ⇓ X.
  • Γ ⊢ t : A ∼ d ∈ [X] iff Γ ⊢ A ∼ X ∈ T and, for all Δ ≤_i Γ, Δ ⊢ t p^i = R_{|Δ|} ↓_X d : A p^i.
– Universe: X = U.
  • Γ ⊢ A ∼ U ∈ T iff Γ ⊢ A = U.
  • Γ ⊢ t : A ∼ X ∈ [U] iff Γ ⊢ A = U and Γ ⊢ t ∼ X ∈ T.
– Singletons.
  • Γ ⊢ A ∼ Sing d X ∈ T iff Γ ⊢ A = {a}_{A′} for some A′, a, and Γ ⊢ a : A′ ∼ d ∈ [X].
  • Γ ⊢ t : A ∼ d′ ∈ [Sing d X] iff Γ ⊢ A = {a}_{A′} for some A′, a, such that Γ ⊢ t : A′ ∼ d ∈ [X], and Γ ⊢ A′ ∼ X ∈ T.
– Function spaces.
  • Γ ⊢ A ∼ Fun X F ∈ T iff Γ ⊢ A = Fun A′ B, Γ ⊢ A′ ∼ X ∈ T, and Δ ⊢ B (p^i, s) ∼ F d ∈ T for all Δ ≤_i Γ and Δ ⊢ s : A′ p^i ∼ d ∈ [X].
  • Γ ⊢ t : A ∼ f ∈ [Fun X F] iff Γ ⊢ A = Fun A′ B, Γ ⊢ A′ ∼ X, and Δ ⊢ App (t p^i) s : B (p^i, s) ∼ f · d ∈ [F d] for all Δ ≤_i Γ and Δ ⊢ s : A′ p^i ∼ d ∈ [X].

The following lemmata show that the logical relations are preserved by judgemental equality, by weakening of the judgement, and by the equalities of the corresponding PERs.

Lemma 7. Let Γ ⊢ A = A′, Γ ⊢ t = t′ : A, Γ ⊢ A ∼ X ∈ T, and Γ ⊢ t : A ∼ d ∈ [X]; then Γ ⊢ A′ ∼ X ∈ T and Γ ⊢ t′ : A′ ∼ d ∈ [X].

Lemma 8 (Monotonicity). Let Δ ≤_i Γ. Then
1. if Γ ⊢ A ∼ X ∈ T, then Δ ⊢ A p^i ∼ X ∈ T; and
2. if Γ ⊢ t : A ∼ d ∈ [X], then Δ ⊢ t p^i : A p^i ∼ d ∈ [X].

Lemma 9. Let Γ ⊢ A ∼ X ∈ T and Γ ⊢ t : A ∼ d ∈ [X]. Then
1. if X = X′ ∈ T, then Γ ⊢ A ∼ X′ ∈ T; and
2. if d = d′ ∈ [X], then Γ ⊢ t : A ∼ d′ ∈ [X].

The following lemma plays a key role in the proof of soundness. It shows that if a term is related to some element of a PER, then it is convertible to the reification of the corresponding element in the PER of normal forms.

Lemma 10. Let Γ ⊢ A ∼ X ∈ T, Γ ⊢ t : A ∼ d ∈ [X], and k ∈ Ne. Then
1. Γ ⊢ A = R_{|Γ|} ⇓ X;
2. Γ ⊢ t = R_{|Γ|} ↓_X d : A; and
3. if Δ ⊢ t p^i = R_{|Δ|} k : A p^i for all Δ ≤_i Γ, then Γ ⊢ t : A ∼ ↑_X k ∈ [X].
In order to finish the proof of soundness we have to prove that each well-typed term (and each well-formed type) is logically related to its denotation; to that end we extend the definition of logical relations to substitutions and prove the fundamental theorem of logical relations.

Definition 11 (Logical relation for substitutions)
– Γ ⊢ σ : ⋄ ∼ d ∈ 1.
– Γ ⊢ (σ, t) : Δ.A ∼ (d, d′) ∈ Σ X (d ↦ [F d]) iff Γ ⊢ σ : Δ ∼ d ∈ X, Γ ⊢ A σ ∼ F d ∈ T, and Γ ⊢ t : A σ ∼ d′ ∈ [F d].

After proving the counterparts of Lemmata 7, 8 and 9 for substitutions, we can proceed to the main theorem of logical relations.

Theorem 3 (Fundamental theorem of logical relations). Let Δ ⊢ δ : Γ ∼ d ∈ [[Γ]].
1. If Γ ⊢ A, then Δ ⊢ A δ ∼ [[A]]d ∈ T;
2. if Γ ⊢ t : A, then Δ ⊢ t δ : A δ ∼ [[t]]d ∈ [[[A]]d]; and
3. if Γ ⊢ γ : Θ, then Δ ⊢ γ δ : Θ ∼ [[γ]]d ∈ [[Θ]].

We define for each context Γ an element ρΓ of D which is, by construction, logically related to idΓ. This environment will be used to define the normalisation function; notice also that if we instantiate Thm. 3 with ρΓ, then a term well typed in Γ is logically related to its denotation.

Definition 12. Let ρΓ = PΓ, where P⋄ d = d and PΓ.A d = (d′, ↑_{[[A]]d′} (Var x_{|Γ|})) with d′ = PΓ d. Then Γ ⊢ idΓ : Γ ∼ ρΓ ∈ [[Γ]] for Γ ∈ Ctx.

Definition 13 (Normalisation algorithm). Let Γ ⊢ A and Γ ⊢ t : A.

    nbeΓ(A) = R_{|Γ|} (⇓ [[A]]ρΓ)
    nbe^A_Γ(t) = R_{|Γ|} (↓_{[[A]]ρΓ} [[t]]ρΓ)

The first point of soundness is a direct consequence of Thm. 3 and Lem. 7; the second point is obtained using Lem. 10.

Corollary 2 (Soundness of NbE). Let Γ ⊢ A and Γ ⊢ t : A. Then
1. Γ ⊢ A ∼ [[A]]ρΓ ∈ T and Γ ⊢ t : A ∼ [[t]]ρΓ ∈ [[[A]]ρΓ]; and
2. Γ ⊢ A = nbeΓ(A) and Γ ⊢ t = nbe^A_Γ(t) : A.

Remark 4. By expanding the definitions, we easily check that
1. nbeΓ(Fun A B) = Fun (nbeΓ(A)) (nbeΓ.A(B)), and
2. nbeΓ({a}_A) = {nbe^A_Γ(a)}_{nbeΓ(A)}.

Corollary 3. If Γ ⊢ A and Γ ⊢ A′, then we can decide Γ ⊢ A = A′. Also, if Γ ⊢ t : A and Γ ⊢ t′ : A, we can decide Γ ⊢ t = t′ : A.

Corollary 4 (Injectivity of Fun and of { }). If Γ ⊢ Fun A B = Fun A′ B′, then Γ ⊢ A = A′ and Γ.A ⊢ B = B′. Also, if Γ ⊢ {t}_A = {t′}_{A′}, then Γ ⊢ A = A′ and Γ ⊢ t = t′ : A.
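Corollary 3 is what turns NbE into a decision procedure: to decide a judgemental equality, normalise both sides and compare syntactically. The following Python fragment shows only the shape of that pipeline; the helpers `evaluate`, `reify`, and `readback` stand for [[ ]], ↓, and R, and are stubbed here for a trivial fragment (they are not the paper's definitions):

```python
# Stubs standing in for the real semantic functions of Definitions 4, 7 and 8.
# In a full implementation, `evaluate` would build domain values, `reify` would
# eta-expand them at their type, and `readback` would produce first-order terms.
def evaluate(t, env):
    return env[t] if t in env else t

def reify(X, d):
    return d

def readback(j, d):
    return d

def nbe(ctx_len, A, t, rho):
    """nbe^A_Gamma(t) = R_{|Gamma|} (down_{[[A]]rho} [[t]]rho) (Definition 13)."""
    return readback(ctx_len, reify(evaluate(A, rho), evaluate(t, rho)))

def equal(ctx_len, A, t1, t2, rho):
    """Corollary 3: decide Gamma |- t1 = t2 : A by comparing normal forms."""
    return nbe(ctx_len, A, t1, rho) == nbe(ctx_len, A, t2, rho)
```

The point of the design is that the type checker never needs to know *how* normal forms are computed: any `nbe` satisfying Remark 4 and deciding equality can be plugged in, which is exactly the modularity claimed in the title.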
Calculus with Proof-Irrelevance. We add the corresponding cases to the definition of logical relations: Γ ⊢ A ∼ Prf(X) ∈ T iff Γ ⊢ A = Prf A′ and Γ ⊢ A′ ∼ X ∈ T; and Γ ⊢ t : A ∼ d ∈ [Prf(X)] iff Γ ⊢ A ∼ Prf(X) ∈ T.

Remark 5. Lemmata 7, 8, 9 and 10, Theorem 3, and Remarks 2 and 4 are still valid. Moreover, we also have nbe(Prf A) = Prf (nbe(A)).
5 Type-Checking Algorithm
In this section we define a bi-directional type-checking algorithm for terms in normal form, and a type-inference algorithm for neutral terms. We prove its correctness and completeness. The algorithm is similar to previous ones [13,3]; the only difference is due to the presence of singleton types. We deal with them by η-normalising the type and first checking whether the normalised type is a singleton (the side condition in the type-checking rule for neutrals); in that case we check that the term is typeable with the tag of the singleton type, and that it is equal to the term of the singleton. We stress the importance of having a normalisation function with the property stated in Rem. 4, and also of having decidability of equality. In fact, it is enough to have a function nbe( ) such that:
1. nbeΓ({a}_A) = {nbe^A_Γ(a)}_{nbeΓ(A)}, and nbeΓ(Fun A B) = Fun (nbeΓ(A)) (nbeΓ.A(B));
2. nbeΓ(A) = nbeΓ(B) if and only if Γ ⊢ A = B, and nbe^A_Γ(t) = nbe^A_Γ(t′) if and only if Γ ⊢ t = t′ : A.

In this section, let V, V′, W, v, v′, w ∈ Nf and k ∈ Ne. We define a function taking the deepest tag of a singleton, essentially the same as in [8]:

    V̄ = W̄    if V ≡ {w}_W
    V̄ = V    otherwise.
The predicates for type-checking are defined mutually inductively, together with the function for inferring types. Definition 14 (Type-checking and type-inference) Types Γ ⇐ V . We presuppose Γ .
Γ ⇐U
Γ ⇐V Γ.V ⇐ W Γ ⇐ V Γ ⇐ Fun V W
Γ v ⇐ nbe(V ) Γ k ⇐ U Γ ⇐ {v}V Γ ⇐k
Terms Γ ⊢ v ⇐ V. We presuppose Γ ⊢ V, with V in η-long normal form with respect to Γ.

    Γ ⊢ V ⇐ U    Γ.V ⊢ W ⇐ U
    ──────────────────────────
    Γ ⊢ Fun V W ⇐ U

    Γ ⊢ V ⇐ U    Γ ⊢ v ⇐ nbe(V)
    ────────────────────────────
    Γ ⊢ {v}_V ⇐ U

    Γ.V ⊢ v ⇐ W
    ──────────────────
    Γ ⊢ λv ⇐ Fun V W

    Γ ⊢ v ⇐ V    Γ ⊢ v = v′ : V
    ────────────────────────────
    Γ ⊢ v ⇐ {v′}_V

    Γ ⊢ k ⇒ V′    Γ ⊢ V = V′    V ≢ {w}_W
    ──────────────────────────────────────
    Γ ⊢ k ⇐ V

Type inference Γ ⊢ k ⇒ V. We presuppose ⊢ Γ.

    ──────────────────────────────────────────────
    Γ.A_i. · · · .A_0 ⊢ q p^i ⇒ nbe(A_i p^{i+1})

    Γ ⊢ k ⇒ V′    V̄′ = Fun V W    Γ ⊢ v ⇐ V
    ─────────────────────────────────────────
    Γ ⊢ App k v ⇒ nbe(W (id, v))
Theorem 4 (Correctness of type-checking)
1. If Γ ⊢ V ⇐, then Γ ⊢ V.
2. If Γ ⊢ v ⇐ V, then Γ ⊢ v : V.
3. If Γ ⊢ k ⇒ V, then Γ ⊢ k : V.

Proof. By simultaneous induction on the type-checking judgement.

In order to prove completeness we define a lexicographic order on pairs of terms and types; in this way we can perform induction on the term and on the type.

Definition 15. Let v, v′ ∈ Nf and A, A′ ∈ Type(Γ); then (v, A) ≺ (v′, A′) is the lexicographic order on Nf × Type(Γ). The component orders are: v ≺ v′ iff v is an immediate sub-term of v′; and A ≺Γ A′ iff nbe(A′) ≡ {w}_{nbe(A)}.

Theorem 5 (Completeness of type-checking)
1. If Γ ⊢ V, then Γ ⊢ V ⇐.
2. If Γ ⊢ v : A, then Γ ⊢ v ⇐ nbe(A).
3. If Γ ⊢ k : A and Γ ⊢ k ⇒ V, then Γ ⊢ nbe(A) = V.

Proof. By simultaneous induction on V, and well-founded induction on (v, A).

Calculus with Proof-Irrelevance

Definition 16 (Type-checking and type-inference)

    Γ ⊢ V ⇐
    ──────────────
    Γ ⊢ Prf V ⇐

    Γ ⊢ v ⇐ V
    ──────────────────
    Γ ⊢ [v] ⇐ Prf V

    Γ ⊢ k ⇒ Prf V′    Γ.V′ ⊢ v ⇐ nbe(V p)
    ───────────────────────────────────────
    Γ ⊢ v where^V k ⇒ Prf V

Remark 6. Thm. 4 is still valid for the calculus with prf-tm. Moreover, Thm. 5 is valid if we add the axiom Γ ⊢ O ⇐ Prf V.
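To make the shape of Definition 14 concrete, here is a bidirectional checker for a tiny simply-typed fragment (a sketch in our own encoding, not the paper's Haskell implementation): it keeps the two singleton-specific ingredients — checking against a singleton compares the term with the tag, and the neutral rule looks through the deepest tag — but it ignores the dependency of W on the argument and the shifting of contexts, and it stubs nbe as the identity by assuming types are already normal.

```python
# Types: ("U",), ("Fun", V, W), ("Sing", v, V). Checkable terms: ("Lam", body)
# or neutrals; neutrals are de Bruijn indices (int) or ("App", k, v).
# Judgemental equality is approximated by syntactic equality of normal forms.

def nbe(V):
    return V                              # stub: types assumed already normal

def tag(V):
    """Deepest tag of a singleton: {w}_W goes to the tag of W."""
    return tag(V[2]) if V[0] == "Sing" else V

def check(ctx, v, V):                     # Gamma |- v <= V
    if V[0] == "Sing":                    # singleton: check against the tag...
        _, w, W = V
        return check(ctx, v, W) and v == w  # ...and compare with the tag itself
    if isinstance(v, tuple) and v[0] == "Lam":
        if tag(V)[0] != "Fun":
            return False
        _, A, B = tag(V)
        return check([A] + ctx, v[1], B)  # ignoring the shift of B, for brevity
    V2 = infer(ctx, v)                    # neutral: infer, then compare types
    return V2 is not None and nbe(V2) == nbe(V)

def infer(ctx, k):                        # Gamma |- k => V
    if isinstance(k, int):                # variable q p^k
        return ctx[k] if k < len(ctx) else None
    if k[0] == "App":
        F = infer(ctx, k[1])
        if F is None or tag(F)[0] != "Fun":
            return None
        _, A, B = tag(F)
        return B if check(ctx, k[2], A) else None  # ignoring W(id, v), for brevity
    return None
```

For instance, variable 0 checks against the singleton type `("Sing", 0, ("U",))` because it equals the tag, but not against `("Sing", 1, ("U",))`.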
Remark 7. Type-checking always happens before normalisation. If the term to type-check does not contain O, the case Γ ⊢ O ⇐ Prf V will never be reached, although occurrences of O may be created by normalisation.

Corollary 5. The type-checking algorithm is correct (by Cor. 1) and complete (by the last remark) with respect to the calculus without prf-tm.
6 Conclusion
The main contributions of the paper are the definition of a correct and complete type-checking algorithm, and the simplification of the NbE algorithm, for a calculus with singletons, one universe, and proof-irrelevant types. The type checker is based on the NbE algorithm, which is used to decide equality and to prove the injectivity of the type constructors. We emphasise that the type-checking algorithm is modular with respect to the normalisation algorithm. All the results can be extended to a calculus with annotated lambda abstractions, yielding a type-checking algorithm for terms not necessarily in normal form. The full version [5] extends this work with sigma-types and data types, and includes an implementation of the type checker in Haskell.
References

1. Abel, A., Aehlig, K., Dybjer, P.: Normalization by evaluation for Martin-Löf type theory with one universe. In: Fiore, M. (ed.) Proc. of the 23rd Conf. on the Mathematical Foundations of Programming Semantics (MFPS XXIII). Electr. Notes in Theor. Comp. Sci., vol. 173, pp. 17–39. Elsevier, Amsterdam (2007)
2. Abel, A., Coquand, T., Dybjer, P.: Normalization by evaluation for Martin-Löf type theory with typed equality judgements. In: Proc. of the 22nd IEEE Symp. on Logic in Computer Science (LICS 2007), pp. 3–12. IEEE Computer Soc. Press, Los Alamitos (2007)
3. Abel, A., Coquand, T., Dybjer, P.: On the algebraic foundation of proof assistants for intuitionistic type theory. In: Garrigue, J., Hermenegildo, M.V. (eds.) FLOPS 2008. LNCS, vol. 4989, pp. 3–13. Springer, Heidelberg (2008)
4. Abel, A., Coquand, T., Dybjer, P.: Verifying a semantic βη-conversion test for Martin-Löf type theory. In: Audebaud, P., Paulin-Mohring, C. (eds.) MPC 2008. LNCS, vol. 5133, pp. 29–56. Springer, Heidelberg (2008)
5. Abel, A., Coquand, T., Pagano, M.: A modular type-checking algorithm for type theory with singleton types and proof irrelevance (full version) (2009), http://www.tcs.ifi.lmu.de/~abel/singleton.pdf
6. Abramsky, S., Jung, A.: Domain theory. In: Handbook of Logic in Computer Science, pp. 1–168. Oxford University Press, Oxford (1994)
7. Aehlig, K., Joachimski, F.: Operational aspects of untyped normalization by evaluation. Math. Struct. in Comput. Sci. 14, 587–611 (2004)
8. Aspinall, D.: Subtyping with singleton types. In: Pacholski, L., Tiuryn, J. (eds.) CSL 1994. LNCS, vol. 933, pp. 1–15. Springer, Heidelberg (1995)
9. Awodey, S., Bauer, A.: Propositions as [Types]. J. Log. Comput. 14, 447–471 (2004)
10. Berger, U., Schwichtenberg, H.: An inverse of the evaluation functional for typed λ-calculus. In: Proc. of the 6th IEEE Symp. on Logic in Computer Science (LICS 1991), pp. 203–211. IEEE Computer Soc. Press, Los Alamitos (1991)
11. de Bruijn, N.G.: Some extensions of Automath: the AUT-4 family (1994)
12. Cartmell, J.: Generalised algebraic theories and contextual categories. Annals of Pure and Applied Logic 32, 209–243 (1986)
13. Coquand, T.: An algorithm for type-checking dependent types. Science of Computer Programming 26, 167–177 (1996)
14. Coquand, T., Pollack, R., Takeyama, M.: A logical framework with dependently typed records. Fundam. Inform. 65, 113–134 (2005)
15. Dybjer, P.: A general formulation of simultaneous inductive-recursive definitions in type theory. The Journal of Symbolic Logic 65, 525–549 (2000)
16. Grégoire, B., Leroy, X.: A compiled implementation of strong reduction. In: Proc. of the 7th ACM SIGPLAN Int. Conf. on Functional Programming (ICFP 2002). SIGPLAN Notices, vol. 37, pp. 235–246. ACM Press, New York (2002)
17. Harper, R., Honsell, F., Plotkin, G.: A framework for defining logics. Journal of the Association for Computing Machinery 40, 143–184 (1993)
18. INRIA: The Coq Proof Assistant, Version 8.1. INRIA (2007), http://coq.inria.fr
19. Lee, D.K., Crary, K., Harper, R.: Towards a mechanized metatheory of Standard ML. In: Hofmann, M., Felleisen, M. (eds.) Proc. of the 34th ACM Symp. on Principles of Programming Languages, POPL 2007, pp. 173–184. ACM Press, New York (2007)
20. Maillard, O.-A.: Proof-irrelevance, strong-normalisation in type theory and PER. Technical report, Chalmers Institute of Technology (2006)
21. Martin-Löf, P.: Intuitionistic Type Theory. Bibliopolis (1984)
22. Martin-Löf, P.: Normalization by evaluation and by the method of computability. Talk at JAIST, Japan Advanced Institute of Science and Technology, Kanazawa (2004)
23. McBride, C.: Epigram: practical programming with dependent types. In: Vene, V., Uustalu, T. (eds.) AFP 2004. LNCS, vol. 3622, pp. 130–170. Springer, Heidelberg (2005)
24. Mitchell, J.C., Moggi, E.: Kripke-style models for typed lambda calculus. In: LICS, pp. 303–314 (1987)
25. Nordström, B., Petersson, K., Smith, J.M.: Programming in Martin-Löf's Type Theory: An Introduction. Clarendon Press, Oxford (1990)
26. Norell, U.: Towards a practical programming language based on dependent type theory. Ph.D. thesis, Department of Computer Science and Engineering, Chalmers University of Technology, Göteborg, Sweden (2007)
27. Shankar, N., Owre, S.: Principles and pragmatics of subtyping in PVS. In: Bert, D., Choppy, C., Mosses, P.D. (eds.) WADT 1999. LNCS, vol. 1827, pp. 37–52. Springer, Heidelberg (2000)
28. Sozeau, M.: Subset coercions in Coq. In: Altenkirch, T., McBride, C. (eds.) TYPES 2006. LNCS, vol. 4502, pp. 237–252. Springer, Heidelberg (2007)
29. Stone, C.A., Harper, R.: Extensional equivalence and singleton types. ACM Trans. Comput. Logic 7, 676–722 (2006)
30. Werner, B.: On the strength of proof-irrelevant type theories. Logical Meth. in Comput. Sci. 4 (2008)
Interactive Learning-Based Realizability Interpretation for Heyting Arithmetic with EM1

Federico Aschieri and Stefano Berardi
C.S. Dept., University of Turin
Abstract. We interpret classical proofs as constructive proofs (with constructive rules for ∨, ∃) over a suitable structure N for the language of natural numbers and maps of Gödel's system T. We introduce a new realizability semantics we call "Interactive learning-based Realizability", for Heyting Arithmetic plus EM1 (the Excluded Middle axiom restricted to Σ⁰₁ formulas). Individuals of N evolve with time, and realizers may "interact" with them, influencing their evolution. We build our semantics over Avigad's fixed point result [1], but the same semantics may be defined over different constructive interpretations of classical arithmetic (in [7], continuations are used). Our notion of realizability extends Kleene's realizability and differs from it only in the atomic case: we interpret atomic realizers as "learning agents".
1 Introduction
From now on, we will call EM1 the Excluded Middle axiom restricted to Σ⁰₁ formulas. In this paper we extend Berardi and de' Liguoro's notion of atomic realizability ([4], [7]) - originally conceived for quantifier-free primitive recursive Arithmetic plus EM1 - to full predicate logic, namely Heyting Arithmetic with EM1 (HA + EM1). Our idea is to interpret classical proofs as constructive proofs over a suitable structure N for natural numbers and maps of system T. We extend Kleene's intuitionistic realizability in a natural way to a new notion, which we call "Interactive learning-based Realizability". We provide a term assignment for the standard natural deduction system of HA + EM1, which is surprisingly equal in all respects to that of HA, except that we have new realizers for atomic formulas and for the Excluded Middle. Our semantics may be used to interpret existing program extraction procedures for classical proofs, in order to address a major problem of all computational interpretations: global illegibility, which means that, even for simple classical proofs, it is extremely difficult to understand the behavior of extracted programs and how each part of the extracted program relates to the other parts of the same program. The main sources of inspiration of this paper are works of Kleene, Coquand, Hayashi, Berardi and de' Liguoro, and Avigad.

Kleene's Realizability Revisited. In [15], Kleene introduced the notion of realizability, a formal semantics for intuitionistic arithmetic. Realizability is nothing but a formal version of Heyting's semantics for intuitionistic logic, translated into the language of arithmetic.

P.-L. Curien (Ed.): TLCA 2009, LNCS 5608, pp. 20–34, 2009. © Springer-Verlag Berlin Heidelberg 2009
Intuitively, realizing a closed arithmetical formula A means exhibiting a computer program - called a realizer - able to compute all relevant information about the truth of A. Hence, realizing a formula A ∨ B means realizing A or realizing B, after computing which of the two is actually realized; realizing a formula ∃xA(x) means computing a natural number n - called a witness - and realizing A(n). These are indeed the only two cases in which there is relevant information about the truth of the corresponding formula to compute, and a decision to be made. Realizing a formula ∀xA means exhibiting an algorithm which takes as input a natural number n and outputs a realizer of A(n); realizing a formula A ∧ B means realizing A and realizing B; realizing A → B means providing an algorithm which takes as input realizers of A and outputs realizers of B. So we see that in these cases we provide no information about the formula we realize; we only take the inputs we will need for realizing existential or disjunctive formulas. Finally, realizing an atomic formula means that the formula is true: in this case, the realizer does nothing at all. Intuitionistic natural deduction rules are perfectly suited to preserving realizability. In order to actually build realizers from intuitionistic natural deductions, it suffices to give realizers for the axioms. Since our goal is to interpret classical connectives using the Heyting and Kleene interpretation of intuitionistic connectives, a first, quite naive idea would be the following: if we devised realizers for the Excluded Middle, we would be able to extend realizability to all of classical arithmetic. Unfortunately, it is well known from the work of Turing that not every instance of the Excluded Middle is realizable.
If T xyz is Kleene's predicate, realizing ∀x∀y.∃zT xyz ∨ ∀z¬T xyz means exhibiting an algorithm which, for every n, m, decides whether or not the n-th Turing machine halts on input m: the halting problem would be decidable. Hence, there is no hope of computing with effective programs all the information about the truth of the Excluded Middle. However, not all is lost. A first key observation is the following. Suppose we had a realizer O of the Excluded Middle and we made a natural deduction of a formula ∃xA actually using the Excluded Middle; then we would be able to extract from the proof a program u, containing O as a subprogram, able to compute the witness for ∃xA. Given the effectiveness of u, after a finite number of steps - and, more importantly, after a finite number of calls to O - u would yield the required witness. It is thus clear that u, to perform the calculation, uses only a finite piece of information about the Excluded Middle. This fundamental fact gives us hope: maybe it is not always necessary to fully realize the Excluded Middle, since a finite computation uses only a finite amount of information. If we were able to gain that information during the computation, we could adapt Kleene's realizability to classical logic.

Coquand's Game Semantics for Classical Arithmetic. As we have seen, computing all relevant information about the truth of a given formula A is not always possible. In [8], in the context of game semantics, Coquand introduced a new key idea to get around this problem: backtracking and learning. If we cannot
22
F. Aschieri and S. Berardi
compute all the correct information about the truth of a formula, maybe we can do so if we are allowed to make mistakes and to learn from them. Suppose, for instance, we have the formula ∀x.∃yP xy ∨ ∀y¬P xy, but no algorithm which, for each n ∈ N given as input, outputs false if ∀y¬P ny holds and true if ∃yP ny holds. Then we may describe a learning algorithm r as follows. Initially, for every n ∈ N given as input, r outputs false. Intuitively, r is initially persuaded - following the principle "if I don't see it, I don't believe it" - that for all n there is no m such that P nm holds. Hence, when asked for its opinion about the formula ∃yP ny ∨ ∀y¬P ny, r always says: ∃yP ny is false. However, if someone - an opponent of r - comes up with an m such that P nm holds, in order to show that r is wrong, then r realizes that it is indeed mistaken, and stores the information "P nm is true". The next time it is asked for an opinion about ∃yP ny ∨ ∀y¬P ny, r will say: true. In other words, such an r, after at most one "mind change", is able to learn the correct answer to any question of the form "which of ∃yP ny and ∀y¬P ny holds?". This is precisely learning by counterexamples, and it is the key idea behind Coquand's semantics. Our question is now: can we formulate a realizability notion based on learning by counterexamples in order to extend Kleene's interpretation to the sub-classical arithmetic HA + EM1? In our solution we modify the notion of individual, in such a way that individuals change with time and realizers "interact" with them.
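The behaviour of the learning algorithm r described above can be sketched in a few lines of Python (an illustration only; the class and method names are ours):

```python
# A learner for instances of EM_1 over a decidable primitive predicate P:
# it answers with a finite knowledge base of discovered witnesses, and
# revises its answer when an opponent exhibits a counterexample.

class LearningRealizer:
    def __init__(self):
        self.known = {}                  # n -> m with P(n, m) known to hold

    def guess(self, n):
        """Claimed side of  exists y.P(n,y)  vs  forall y.not P(n,y)."""
        return ("exists", self.known[n]) if n in self.known else ("forall",)

    def refute(self, n, m, P):
        """Opponent move: if P(n, m) really holds, r learns and changes its mind."""
        if P(n, m):
            self.known[n] = m
            return True                  # the guess "forall" was falsified
        return False

# With P(n, m) := (m == n * n), r first claims the universal side and then,
# after exactly one counterexample, the existential side with a correct witness.
r = LearningRealizer()
P = lambda n, m: m == n * n
print(r.guess(3))                        # ("forall",) -- the falsifiable hypothesis
r.refute(3, 9, P)
print(r.guess(3))                        # ("exists", 9) -- learned from the opponent
```

Note the "at most one mind change" per instance: once a witness for n is stored, the answer for n is stable forever.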
Basing his analysis on ideas of Gold, he defines a Kleene-style notion of realizability, equal to the original one except that the notion of individual changes: the witnesses of existential and disjunctive formulas are computed by a stream of guesses and "learned in the limit" (in the sense that the limit of the stream is a correct witness). An individual a is therefore a computable map a : N → N, with a(t) representing the value of the individual at time t. The technical device that makes this interpretation work is the class of limiting recursive functions introduced in Gold [11]. For instance, how would Hayashi realize the formula ∀x.∃yP xy ∨ ∀y¬P xy? He would define an algorithm H as follows. Given n ∈ N, H calculates the truth value of ∃y ≤ t. P ny for increasing t. The correct answer to the question "which of ∃yP ny and ∀y¬P ny holds?" is then learned in the limit by computing P(n, 0), P(n, 1), P(n, 2), . . . , P(n, k), . . . and thus producing a stream of guesses, either of the form false, false, false, . . . , true, true, . . . , true, . . . or of the form false, false, false, . . . , false, . . . ; the first stabilises in the limit to true, the second to false. Hayashi's idea is to perform a completely blind and exhaustive search: in this way, the correct answer is guaranteed to be eventually learned. Hayashi's realizers do not learn efficiently: in Hayashi's notion of realizability the only learning device is to look through all possible cases. Instead, we want a notion of learning in which the stream of guesses is driven by the proof itself, as in Coquand's game semantics.
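Hayashi's blind search can be sketched directly (again, an illustration of the idea in our own notation): the individual is a map from time to guesses, and the stream stabilises to the truth value of ∃y.P ny.

```python
# Learning in the limit by exhaustive search: the guess at time t answers
# "is there y <= t with P(n, y)?". For decidable P, the stream of guesses
# stabilises to the correct truth value of  exists y.P(n,y).

def guess_stream(P, n, horizon):
    """Individuals as maps from time to values: the guesses at times 0..horizon-1."""
    out, found = [], False
    for t in range(horizon):
        found = found or P(n, t)          # blind search, one new case per tick
        out.append(found)
    return out

# With P(n, m) := (m == 5), the stream is False at times 0..4 and True ever
# after: it stabilises in the limit to True, with the witness found at t = 5.
print(guess_stream(lambda n, m: m == 5, 0, 8))   # [False]*5 + [True]*3
```

The inefficiency the authors point out is visible here: the search order is fixed in advance and ignores the proof entirely, whereas the interactive semantics lets the proof decide which atomic facts to test.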
Realizability Based on Learning: Extending the Berardi-de' Liguoro Interpretation. As long as one investigates learning and the process of correcting hypotheses by means of counterexamples, it is natural to use Popper's ideas [16] as a metaphor. According to Popper, a scientific theory relies on a set of unproved - and unprovable - hypotheses and, through logic, makes predictions suitable to be falsified by experiments. If a prediction is falsified, some hypothesis is incorrect. Faced with a counterexample to a theory's prediction, one must modify the set of hypotheses and build a better theory, which will in turn be tested by experiments, and so on. Laws of nature are universal statements, which cannot be verified but are open to falsification. We may now explain the link between falsifiable hypotheses and EM1. For every n, given an instance ∃y.P ny ∨ ∀y.¬P ny of EM1 (with P atomic), we may formulate a hypothesis about which side of the disjunction is true. If we know that P nm is true for some m, we know that ∃y.P ny is true. Otherwise we may assume ∀y.¬P ny as a hypothesis, because it is a falsifiable hypothesis. In order to formalize the process of making hypotheses about EM1, we introduce a finite base of knowledge, called s, collecting the instances P nm which we know to hold, e.g. by direct computation. If we have evidence that P nm holds for some m (that is, P nm ∈ s), we know that ∃yP ny is true; otherwise, we assume that ∀y¬P ny is true. So s defines a set of hypotheses about EM1 of the form ∀y¬P ny: universal falsifiable statements. Using s we can effectively decide which side of a given instance of EM1 is true, albeit at the price of making mistakes: it suffices to define a program which, to decide whether ∀y¬P ny is true, looks for some P nm in the finite base s, and outputs false if the search is successful and true otherwise. Hence, if s contains enough information, we can provide an effective realizer for every instance of EM1, as we wished to show.
But, as we have said, working with Kleene's realizability and having realizers for the Excluded Middle implies being able to extract realizers from classical proofs! Now we can take the final step. A Kleene-style realizer r associated to a classical proof p interprets each step of p deriving some formula as a "prediction" of the truth of this formula, based on the information in s. For example, in front of a formula ∃x.A ∧ B, a realizer r will predict that A(n) ∧ B(n) is true for some n ∈ N (and since n depends on s, it is as if we had changed the notion of individual, interpreting "numbers" as computable maps from the set of bases of knowledge to N). Then r will predict B(n) to be true, and so on, until r arrives at some atomic formula, say ¬P nm, and predicts it to be true. If - notwithstanding that prediction - ¬P nm is false, our set of hypotheses is not correct and we have a counterexample. Our reaction is compelled: we enlarge our base of knowledge s by including the information "P nm is true", thus modifying our set of hypotheses. Our Interactive Realizability differs from Kleene's realizability in the notion of individual, as we said, and in the realizability relation for the atomic case. In our interpretation, realizing an atomic formula does not mean that the formula is true, but that the realizer extends our base of knowledge s if the formula is not true. The realizer is thought of as a learning device. Each extension of s may change the individuals which are parameters of the atomic formula, and may therefore make the atomic formula false again. Then the realizer extends s again, and so forth. The convergence of this "interaction" between a realizer and a group of individuals follows from Avigad's fixed point theorem [1]. The idea of using finite bases of knowledge, and of using them to "decide" the Excluded Middle, comes from Berardi and de' Liguoro [4], [7]. In [7] there are only realizers for atomic formulas, which have the task of extending the current knowledge and hence are not trivial: they embody learning strategies.

Why the Arithmetic HA + EM1? It is now time to explain why we chose to restrict our realizability interpretation to the sub-classical arithmetic HA + EM1, instead of considering, say, full Peano Arithmetic. There are two main reasons. First, we observe that EM1 enjoys a very good property: the information about its truth can be computed in the limit, in the sense of Gold [11], as we saw in passing when discussing Hayashi's realizability. This implies that witnesses of existential and disjunctive statements can also be learned in the limit, as shown in Hayashi [14]. Hence HA + EM1 is a very interesting and simple framework to work in. Furthermore, the realizers we extract from proofs have a straightforward interpretation as winning strategies in 1-Backtracking games [6], which are the most natural and simple instances of Coquand-style games. This conceptual simplicity helps in understanding our semantics, which is completely new and quite subtle, and keeps the technical complexity low. Secondly, if we look at actual mathematical practice, we note that a great many mathematical theorems are proved using EM1 alone ([2], [5]).

Plan of the Paper. The paper is organized as follows. In §2 we define the term calculus in which our realizers will be written: a version of Gödel's system T, extended with some syntactic sugar in order to represent bases of knowledge (which we shall call states) and to manipulate them.
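The interaction loop just described - predict relative to s, learn a falsified atomic fact, re-run - can be sketched as follows (a toy illustration in our own notation; termination here is by exhaustion of a finite search space, whereas in the paper convergence rests on Avigad's fixed point theorem [1]):

```python
# A realizer queries a finite state s of known atomic facts; when a prediction
# "forall y.not P(n,y)" turns out to be false, the falsifying fact P(n,m) is
# added to s and the computation is re-run, until no new fact is learned.

def em1_oracle(s, n):
    """Decide exists y.P(n,y) vs forall y.not P(n,y), relative to the state s."""
    for (n2, m) in s:
        if n2 == n:
            return ("exists", m)
    return ("forall",)

def interact(program, P, max_rounds=100):
    s = set()
    for _ in range(max_rounds):
        counterexample = program(lambda n: em1_oracle(s, n))
        if counterexample is None:       # every prediction was true: fixed point
            return s
        s.add(counterexample)            # learn the falsified atomic fact
    raise RuntimeError("did not converge")

# Toy program: it predicts forall y.not P(2,y); with P(n,m) := (m == n + 1)
# the prediction is falsified once, the fact (2, 3) is learned, and the
# second round reaches a fixed point.
P = lambda n, m: m == n + 1
def prog(oracle):
    if oracle(2)[0] == "forall" and P(2, 3):
        return (2, 3)                    # a counterexample to the prediction
    return None
print(interact(prog, P))                 # {(2, 3)}
```

The state s returned at the fixed point contains exactly the finite amount of information about EM1 that this particular computation needed - the "finite piece of information" discussed in the introduction.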
24
F. Aschieri and S. Berardi
make the atomic formula false again. Then the realizer extends s again, and so forth. The convergence of this “interaction” between a realizer and a group of individuals follows from Avigad's fixed point theorem [1]. The idea of using finite bases of knowledge to “decide” the Excluded Middle comes from Berardi and de' Liguoro [4], [7]. In [7] there are only realizers for atomic formulas, which have the task of extending the current knowledge and hence are not trivial: they embody learning strategies. Why the Arithmetic HA + EM1? It is now time to explain why we chose to restrict our realizability interpretation to the subclassical Arithmetic HA + EM1, instead of considering, say, full Peano Arithmetic. There are two main reasons. First, we observe that EM1 enjoys a very good property: the information about its truth can be computed in the limit, in the sense of Gold [11], as we saw en passant when discussing Hayashi's realizability. This implies that witnesses for existential and disjunctive statements too can be learned in the limit, as shown in Hayashi [14]. Hence HA + EM1 is a very interesting and simple framework to work within. Furthermore, the realizers which we will be able to extract from proofs have a straightforward interpretation as winning strategies in 1-Backtracking games [6], which are the most natural and simple instances of Coquand-style games. This conceptual simplicity helps to understand our semantics, which is completely new and quite subtle, and keeps the technical complexity low. Secondly, if we look at actual mathematical practice, we note that a great many mathematical theorems are proved by using EM1 alone ([2], [5]). Plan of the Paper. The paper is organized as follows. In §2 we define the term calculus in which our realizers will be written: a version of Gödel's system T, extended with some syntactic sugar in order to represent bases of knowledge (which we shall call states) and to manipulate them.
Then we prove a convergence property for this calculus (as in Avigad [1]). In §3, we introduce the notion of realizability and prove our Main Theorem, the Adequacy Theorem: “if a closed arithmetical formula is provable in HA + EM1 , then it is realizable”. Proofs are only included in the full version of the paper [3] (downloadable). In [3] we also include some examples of program extraction, which are the motivation of this paper: they show that our theory yields legible, intuitive, effective programs from classical proofs.
2 The Term Calculus
In this section we formalize the intuitive notion of realizer and many of the ideas discussed in the introduction. In particular, we associate with any instance ∃y.P xy ∨ ∀y.¬P xy of EM1 (Excluded Middle restricted to Σ10-formulas) two functions χP and ϕP. The function χP takes a knowledge base s and a value n for x, and returns a guess for the truth value of ∃y.P ny. When this guess is “true” the function ϕP returns a witness m of ∃y.P ny. The guess for the truth value of ∃y.P ny is computed w.r.t. the knowledge base s, and it may be wrong. For each value of the knowledge base s, the function χP(s, .) is some “approximation” of an
Interactive Learning-Based Realizability Interpretation
25
ideal map, the oracle returning the truth value of ∃y.P xy. The Skolem axioms effectively used by a given proof take the place of a set of experiments testing the correctness of the predictions about ϕP, χP (we do not check the correctness of ϕP, χP in an exhaustive way). Our Term Calculus is based on Gödel's system T. T is the simply typed λ-calculus, with atomic types N (representing the set N of natural numbers) and Bool (representing the set B of booleans), product types T × U and arrow types T → U, with pairs ⟨., .⟩, projections π0, π1, conditional ifT and primitive recursion RT in all types, and the usual reduction rules for λ, ⟨., .⟩, ifT, RT. For more details about T we refer to Girard [10].

Definition 1. (abbreviations for Gödel's system T).
1. We write t1 = t2 iff t1 = t2 is a theorem of the equational theory of system T.
2. If n ∈ N, we denote by n the numeral S^n(0) : N.
3. We denote by T, F : Bool the booleans of T.
4. We denote by FV(t) the set of free variables of a term t ∈ T.
We will denote by π2 the composition π1 π1, which is useful since we will often deal with terms u of type A × (B × C): π2 u will represent the third component of u. To be consistent, we will also denote by ⟨u0, u1, u2⟩ any term ⟨u0, ⟨u1, u2⟩⟩ of type A × (B × C). We now formalize the idea of “finite information about EM1” by the notion of state of knowledge.

Definition 2. (State of Knowledge) A state of knowledge, shortly a state, is a finite set s of triples ⟨P, n, m⟩ such that:
– P : Nk+1 → Bool is a closed term of T;
– n = n1, . . . , nk is a vector of numerals such that P nm = T;
– (consistency) if ⟨P, n, m1⟩ ∈ s and ⟨P, n, m2⟩ ∈ s, then m1 = m2.
We denote by S the set of all states of knowledge. We think of an atom ⟨P, n, m⟩ as the code of a witness for ∃y.P(n, y). The consistency condition allows at most one witness for each ∃y.P(n, y) in each knowledge base s. Two witnesses ⟨P, n, m⟩, ⟨P, n, m'⟩ with m ≠ m' are said to be inconsistent. T1 is the simply typed lambda calculus obtained by extending Gödel's T with a new atomic type S denoting the elements of S, with four new term formation rules (Def. 3) and four new reduction rules (Def. 4). We will denote terms of T1 of type S by ρ, ρ', . . ..

Definition 3.
1. A state s ∈ S is a constant of type S.
2. If P : Nk+1 → Bool is a closed term of T, ρ : S and t1, . . . , tk : N, then χP ρt1 . . . tk and ϕP ρt1 . . . tk are respectively terms of type Bool and N.
3. If ρ1, ρ2 : S, then ρ1 ⋓ ρ2 is a term of type S.
4. If ρ : S, P : Nk → Bool is a closed term of T and t1, . . . , tk : N, then (Add)ρ⟨P, t1, . . . , tk⟩ is a term of type S.
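The intended behavior of χP, ϕP, the state merge and Add (made precise by the reduction rules of Definition 4 below) can be sketched as follows. The modeling is entirely ours: one dict per predicate P, mapping each argument tuple n to its unique witness m, which builds the consistency condition in by construction.

```python
def chi(s, n):                 # guessed truth value of ∃y.P(n,y)
    return n in s

def phi(s, n):                 # guessed witness; 0 when none is recorded
    return s.get(n, 0)

def merge(s1, s2):             # asymmetric union: on conflicts, keep s1's atom
    return {**s2, **s1}        # later keys (those of s1) win in a dict merge

def add(P, s, n, m):           # only verified atoms <P, n, m> are admitted
    return merge(s, {n: m}) if P(n, m) else s

P = lambda n, m: m > n         # example predicate (ours)
s = add(P, {}, 5, 7)           # P(5,7) holds, so the atom is recorded
assert chi(s, 5) and phi(s, 5) == 7
assert phi(add(P, s, 5, 9), 5) == 7   # older witness wins: consistency kept
```

The last assertion illustrates the asymmetry of the merge: an atom inconsistent with one already in the state is silently rejected.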
We do not distinguish between a state and the constant denoting it: the set S of states is equal to the set of constants of type S. We interpret χP ρt1 . . . tk and ϕP ρt1 . . . tk respectively as a “guess” for the values of the oracle and of the Skolem map for ∃y.P t1 . . . tk y, a guess computed w.r.t. the knowledge base s which is the value of ρ. If ρ1, ρ2 have as values the states s1, s2, we interpret ρ1 ⋓ ρ2 as the union s1 ∪ s2 of the two sets of atoms s1, s2, minus all atoms of s2 which are inconsistent with some atom of s1. s1 ⋓ s2 is an asymmetrical operation: whenever an atom of s1 and an atom of s2 are inconsistent, we arbitrarily keep the atom of s1 and reject the atom of s2. In a similar way, if ρ has as value the state s, then (Add)ρ⟨P, t1, . . . , tk⟩ denotes s ∪ {⟨P, t1, . . . , tk⟩}, minus ⟨P, t1, . . . , tk⟩ in the case this atom is inconsistent with some atom of s.

Definition 4. Let s, s1, s2 ∈ S. Assume P : Nk+1 → Bool is a closed term of T. Then the equational theory associated with T1 extends Gödel's T equational theory by the following rules:
1. If ⟨P, n, m⟩ ∈ s, then χP sn = T and ϕP sn = m; otherwise χP sn = F and ϕP sn = 0.
2. s1 ⋓ s2 = the constant denoting the state s1 ∪ {⟨P, n, m⟩ ∈ s2 | ⟨P, n, m'⟩ ∉ s1 for every m' ∈ N}.
3. If P nm = T, then (Add)s⟨P, n, m⟩ = s ⋓ {⟨P, n, m⟩}; otherwise (Add)s⟨P, n, m⟩ = s.

Remark. T1 is nothing but T with some “syntactic sugar”. Indeed, the type S could be translated into N by a suitable bijection between S and N. The terms χP, ϕP, ⋓, Add could be translated by suitable terms of Gödel's T. Each reduction step of T1 could then be translated into one or more reduction steps of T. Therefore, we may assume that T1 enjoys strong normalization. The Church-Rosser property for T1 follows from the fact that the redexes of T1 are algebraic and orthogonal to each other. We may also prove a normal form property.
Any closed normal term of T1 of type N, Bool, S, A × B, A → B is equal respectively to a numeral, a boolean, a state constant, a pair or a λ-abstraction. Proof Sketch. By induction over the term. Assume the first symbol of the term is the primitive recursion symbol, the conditional, Add, a projection, or an application. By case analysis and by the induction hypothesis on the immediate subterms we may always show that the term is not normal. For the rest of the paper, we assume fixed a special state variable σ denoting the current “knowledge state” of a term. We now introduce some distinguished subsets of T1. Definition 5. Let σ : S be a fixed special variable. 1. T1− is the set of all t ∈ T1 without constants of type S. 2. T1σ is the set of all t ∈ T1− such that FV(t) ⊆ {σ}. We will call any t ∈ T1σ a term in the free state variable σ. We have T ⊂ T1σ ⊂ T1− ⊂ T1, since σ ∈ T1σ − T, x^N ∈ T1− − T1σ and s ∈ T1 − T1−. If t ∈ T1σ then t has only one free state variable, and no state constant. If t ∈ T1σ
we will write, in order to stress the fact, t[σ]; if s is a state, t[s] will denote t[s/σ]. We interpret any t ∈ T1σ as a learning process evaluated w.r.t. the information taken from a unique state, which is the value of the variable σ. t[s] is a closed term of T1, and if t[σ] : S, then t[s] = s' for some s' ∈ S. We now introduce a notion of convergence for terms of T1σ of atomic type, expressing the fact that the integers and booleans denoted by a term of T1σ eventually stop changing when the knowledge state σ increases. We say that a sequence {si}i∈N is a weakly increasing chain of states (just w.i. for short) if si ⊆ si+1 for all i ∈ N. We will often write si ≤ si+1 for si ⊆ si+1.

Definition 6. (Convergence). Assume that {si}i∈N is a w.i. chain of states, and u, v ∈ T1σ.
1. u converges in {si}i∈N if ∃n ∈ N.∀m ≥ n.u[sm] = u[sn];
2. we say that u converges to v in {si}i∈N, and we write “u[sn] = v[sn] for n → ∞”, if ∃n ∈ N.∀m ≥ n.u[sm] = v[sm];
3. u converges if u converges in every w.i. chain of states.

Note that if u[σ] is convergent, we do not ask that u converges to the same value on all w.i. chains of states. This expresses the fact that the value learned by u may depend on the order in which u gets the information. In Theorem 1 we will prove that if u[σ] has type N, Bool, then u[σ] is convergent: in other words, the value “learned” by u[σ] eventually stops changing. For terms of type S we will prove a slightly different property: on all w.i. chains, u[σ] converges to some σ ⋓ s (this depends on the fact that u[σ] has input σ). In the same Theorem we will prove that if u[σ] : S then u[s] ≥ s for all s ∈ S: we interpret this result as saying that a learning process always increases the state of knowledge. For each type A of T1 we define a set A of terms u[σ] : A which we call the set of stable terms of type A.
We define stable terms by lifting the notion of convergence from atomic types (with a special case for the atomic type S, as we said) to arrow and product types.

Definition 7. (Stable Terms). Let {si}i∈N be a w.i. chain of states and s ∈ S. Assume A is a type of T1. We define a set A of terms t[σ] ∈ T1σ of type A, by induction on A.
1. S is the set of all t[σ] : S such that for all w.i. chains {si}i∈N: (a) ∃s ∈ S. t[σ] converges to σ ⋓ s in {si}i∈N; (b) for all s, s' ∈ S, if s' = t[s] then s' ≥ s.
2. N = {t[σ] : N | t converges}
3. Bool = {t[σ] : Bool | t converges}
4. A × B = {t[σ] : A × B | π0 t ∈ A, π1 t ∈ B}
5. A → B = {t[σ] : A → B | ∀u ∈ A, tu ∈ B}
If t ∈ A, we say that t is a stable term of type A. By induction on A, we may check that there is some dummy stable term dA ∈ A: we set dN = 0, dBool = F, dS = σ, dA×B = ⟨dA, dB⟩, dA→B = λx^A.dB. We now want to prove that any term u[σ] : A is stable.
Lemma 1. (Stability) Suppose u ∈ T1σ, and for every w.i. chain of states {si}i∈N there is a v ∈ C such that u[sn] = v[sn] for n → ∞. Then u ∈ C. Proof. By induction over u (see [3]).

Theorem 1. (Stability Theorem) Let w : A be a term of T1σ and let x1 : A1, . . . , xn : An contain all the free variables of w other than σ. If t1 ∈ A1, . . . , tn ∈ An, then w[t1/x1 · · · tn/xn] ∈ A. Proof. By induction over w (see [3]).

Corollary 1. Assume that w : A, w ∈ T1σ. Then w ∈ A. If A = N, Bool then w converges, while if A = S, then w converges to some σ ⋓ s in every w.i. chain, and w[s/σ] ≥ s for all s ∈ S.

As the last result of this section, we prove that if we start from any state s and repeatedly apply a term t[σ] in the free state variable σ, we eventually reach a state s' = t^h[s] such that t[s'] = s'. We interpret this result as saying that each “learning process” t[σ] eventually stops adding new information to σ.

Theorem 2. (Fixed Point Property) Let t[σ] : S be a term in the free variable σ, and s ∈ S. Define t^0[s] = s and t^{n+1}[s] = t[t^n[s]]. Then there are h ∈ N, s' ∈ S such that s' = t^h[s], s' ≥ s and t[s'] = s'. Proof. See the full version of the paper [3].
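The Fixed Point Property can be read operationally: iterate the learning process until it adds nothing new. A sketch in Python (names and the toy process are ours); termination holds here because the state transformer only extends the state and can draw from a finite pool of atoms.

```python
def fixed_point(t, s):
    """Iterate s, t(s), t(t(s)), ... until a state s2 with t(s2) == s2."""
    while True:
        s2 = t(s)
        if s2 == s:
            return s
        s = s2

# Toy learning process: add the numbers 0, 1, 2 one at a time, then stop.
t = lambda s: s | {len(s)} if len(s) < 3 else s
assert fixed_point(t, set()) == {0, 1, 2}
```

In the extraction procedure of the next section, computing such a fixed point of an atomic realizer is exactly how the final, stable witness is obtained.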
3 An Interactive Learning-Based Notion of Realizability
In this section we introduce the notion of realizability for HA + EM1, Heyting Arithmetic plus Excluded Middle on Σ10-formulas; then we prove our Main Theorem, the Adequacy Theorem: “if a closed arithmetical formula is provable in HA + EM1, then it is realizable”. For proofs we refer to [3]. We first define the formal system HA + EM1, from now on “Core Arithmetic”. We will represent atomic formulas of HA + EM1 with terms of T of type Bool. We assume that we have in T some terms ⇒Bool : Bool, Bool → Bool, ¬Bool : Bool → Bool, . . ., implementing the boolean connectives. If t1, . . . , tn, t ∈ T have type Bool and free variables all of type Bool, we say that t is a tautological consequence of t1, . . . , tn in T (a tautology if n = 0) if all boolean assignments making t1, . . . , tn equal to T in T also make t equal to T in T. Definition 8. (Core and Extended Language of Arithmetic). The Core Language L of Arithmetic is defined as follows. 1. The terms of L are all t ∈ T such that t : N and FV(t) ⊆ {x1^N, . . . , xn^N} for some x1, . . . , xn. 2. The atomic formulas of L are all P t1 . . . tn ∈ T, for some closed P : N^n → Bool and some terms t1, . . . , tn of L.
3. The formulas of L are built from atomic formulas of L by the connectives ∨, ∧, →, ∀, ∃ as usual. The proofs of Core Arithmetic are proof-trees in natural deduction style, as in van Dalen [9], with: (i) an axiom schema for EM1; (ii) the induction rule; (iii) as Post rules: all axioms of equality and ordering on N, all equational axioms of T, and one schema for each tautological consequence of T. We denote by ⊥ the atomic formula F and will sometimes write a generic atomic formula as P(t1, . . . , tn) rather than in the form P t1 . . . tn. Finally, since any arithmetical formula has only variables of type N, we shall freely omit their types, writing for instance ∀xA in place of ∀x^N A. Post rules cover as many rules with atomic assumptions and conclusion as we find useful, for example the rule: “if f(z) ≤ 0 then f(z) = 0”. As an intermediate step towards the Realization interpretation, we introduce an extension of HA + EM1 which we call Extended Arithmetic. The Language L1 of Extended Arithmetic extends the Language of Core Arithmetic by allowing terms and atomic formulas to depend on a state, and by the symbols χP, ϕP denoting oracles and Skolem maps for Σ10-formulas. Definition 9. The Extended Language L1 is defined as follows. 1. The terms of L1 are all t ∈ T1 such that t : N and FV(t) ⊆ {x1^N, . . . , xn^N, σ} for some x1, . . . , xn. 2. The atomic formulas of L1 are all P t1 . . . tn ∈ T1, for some P : N^n → Bool, P ∈ T1σ and some terms t1, . . . , tn of L1. 3. The formulas of L1 are built from atomic formulas of L1 by the connectives ∨, ∧, →, ∀, ∃ as usual. Lσ1 is the subset of the terms and formulas of L1 without constants of type S and whose only free variable is σ : S. The deduction rules for Extended Arithmetic are those for Core Arithmetic, plus the axiom schemas for oracles: P(t1, . . . , tn, t) ⇒Bool χP σt1 . . . tn, and for Skolem maps: χP σt1 . . . tn ⇒Bool P(t1, . . . , tn, (ϕP σt1 . . . tn)).
Recall that ⇒Bool : Bool, Bool → Bool is a term implementing implication, therefore P (t1 , . . . , tn , t) ⇒Bool χP σt1 . . . tn is not an implication between two atomic formulas, but it is equal to the single atomic formula Qt1 . . . tn t, where Q = λxN1 . . . λxNn+1 ⇒Bool (P x1 . . . xn xn+1 )(χP σx1 . . . xn+1 ) The set of Skolem axioms effectively used by a given proof will take the place of experiments checking our assumptions about Skolem maps and oracles. The Language of Extended Arithmetic, indeed, is quite unusual. The idea behind it is to offer a way of extending with non computable terms (for example, the ideal χ and ϕ) the standard Language of Arithmetic. We interpret non computable terms t[σ] ∈ Lσ1 as computable maps from S to N, approximating the ideal χ and ϕ by instantiating their free state variable σ with a state s.
Using the metaphor explained in the introduction, we will use a theory (whose hypotheses are determined by s) to predict truth values for atomic formulas that we cannot effectively evaluate. Our definition of realizability will provide a formal semantics for the Extended Language of Arithmetic (in the free variable σ), and therefore also for the more usual language of Core Arithmetic, in which all functions represent recursive maps.

Definition 10. (Types for realizers) For each arithmetical formula A we define a type |A| of T by induction on A: |P(t1, . . . , tn)| = S, |A ∧ B| = |A| × |B|, |A ∨ B| = Bool × (|A| × |B|), |A → B| = |A| → |B|, |∀xA| = N → |A|, |∃xA| = N × |A|.

We will now define the realization relation t ⊩ A, where t ∈ T1σ, A ∈ Lσ1 and t : |A|. Remark that by Corollary 1 we have t ∈ |A|. By the same Corollary, if A is atomic, that is, if t[σ] : |A| = S, then for all s, s' ∈ S, if s' = t[s] then s' ≥ s. We interpret this result by saying that any realizer of an atomic formula extends the “current state of knowledge”. We first define t ⊩s A, the realization relation w.r.t. a state s ∈ S, and then t ⊩ A.

Definition 11. (Indexed Realizability and Realizability) Assume t ∈ T1σ, A ∈ Lσ1, and t : |A|. We define t ⊩s A for any s ∈ S by induction on A.

1. t ⊩s P(t1, . . . , tn) iff t[s] = s implies P(t1, . . . , tn)[s] = T
2. t ⊩s A ∧ B iff π0 t ⊩s A and π1 t ⊩s B
3. t ⊩s A ∨ B iff either π0 t[s] = T and π1 t ⊩s A, or π0 t[s] = F and π2 t ⊩s B
4. t ⊩s A → B iff for all u, if u ⊩s A, then tu ⊩s B
5. t ⊩s ∀xA iff for all n ∈ N, tn ⊩s A[n/x]
6. t ⊩s ∃xA iff π0 t[s] = n and π1 t ⊩s A[n/x]
t ⊩ A iff t ⊩s A for all s ∈ S. In the above definition, at last, we see formalized all the intuitions we hinted at in the introduction. Realizers of disjunctions and existential statements provide a witness, which is an individual depending on the actual state of knowledge s ∈ S, representing the hypotheses used to approximate the non-computable. The actual behavior of a realizer depends upon the current state of knowledge. The state is used only when there is relevant information about the truth of a given formula to be computed: the truth value P(t1, . . . , tn)[s] of an atomic formula and the existential or disjunctive witness π0 t[s] are computed w.r.t. the state. A realizer t of ∃xA uses the state s to predict that π0 t[s] equals some n which is a witness for ∃xA (i.e. such that A(n) is realizable). A realizer t of A ∨ B uses the state s to predict which one of A and B is realizable (if π0 t[s] = T then A is realizable, and if π0 t[s] = F then B is realizable). These predictions need not always be correct; hence it is possible that a realized atomic formula (which ideally should be true when realized) is actually false: notwithstanding t ⊩s P, it may happen that P[s] equals false. If a Skolem axiom predicted to be true is indeed false, then we have encountered
a counterexample, and so our theory is wrong, our approximation still inadequate; in this case, the atomic realizer t takes the state s and extends it to a state t[s] > s. That is to say: if something goes wrong, we must learn from our mistakes. The point is that after every learning step the actual state of knowledge grows, and if we ask the same realizer for new predictions, we will obtain “better” answers. Indeed, we can say more about this last point. Suppose for instance that t ⊩ A ∨ B and let {si}i∈N be a w.i. chain of states. Then, since t ∈ Bool × (|A| × |B|), π0 t converges in {si}i∈N to a boolean; thus t's predictions eventually stabilize in the limit, and hence a witness is eventually learned. In the atomic case, in order to have t ⊩s P(t1, . . . , tn), we require that if s is a fixed point of t, then P(t1, . . . , tn)[s] must equal true. That is to say: if t has nothing more to learn and has no new information to add to s, then it must assure the truth of P(t1, . . . , tn) in the state s. By the Fixed Point Theorem, our terms of type S in the free state variable σ, as we saw, have plenty of fixed points; hence the search for truth will be for us a computation of a fixed point, driven by the Skolem axioms used by the proof, rather than an exhaustive search for counterexamples. As usual for a Realization interpretation, we may extract from any realizer t ⊩ ∀x.∃y.P(x, y), with P ∈ T, some recursive map ψ : N → N such that P(n, ψ(n)) for all n ∈ N. Indeed, by unfolding the definition of realizer, for all n ∈ N, s ∈ S, if b = π0 t, a = π1 t ∈ T1σ then a(n) ⊩s P(n, b(n)). If we define φ(s) = a(n)^k[s] for the first k such that a(n)^{k+1}[s] = a(n)^k[s], as in the proof of the Fixed Point Theorem, then a(n)[φ(s)] = φ(s), and by definition of realizer P(n, b(n))[φ(s)] ≡ P(n, b(n)[φ(s)]) is true. The map ψ is then defined by ψ(n) = b(n)[φ(s0)] for all n ∈ N and an arbitrary choice of s0 ∈ S.
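The extraction of ψ just described can be sketched end to end. Everything concrete here is our own assumption: the predicate P, and a hand-written realizer split into a state-extending learner a(n) and a state-dependent witness b(n); ψ(n) drives a(n) to a fixed point of the state and reads the witness off there.

```python
P = lambda n, m: m * m >= n        # example predicate: "m is at least √n"

def a(n):                          # learner: extend the state if the guess fails
    def step(s):
        m = s.get(n, 0)
        return s if P(n, m) else {**s, n: m + 1}
    return step

def b(n):                          # witness: read the current guess off the state
    return lambda s: s.get(n, 0)

def psi(n, s0=()):                 # ψ(n) = b(n)[φ(s0)]
    s = dict(s0)
    while a(n)(s) != s:            # φ(s): iterate a(n) to its first fixed point
        s = a(n)(s)
    return b(n)(s)

assert psi(10) == 4 and P(10, psi(10))   # 4*4 = 16 >= 10
```

At a fixed point the learner has nothing left to add, so by the atomic realizability clause the recorded witness really satisfies P.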
We may prove that the map ψ in fact belongs to T1, provided we replace the notion of convergence used in this paper with the intuitionistic notion of convergence introduced in [4], and use the latter to provide a bound for the first k such that a(n)^{k+1}[s] = a(n)^k[s]. We postpone this topic to another paper. We now explain how to turn each proof D of a formula A ∈ L1 in HA + EM1 into a realizer D∗ of the same A. By induction on D, we define a “decoration with realizers” DReal of D, in which each formula B of D is replaced by a new statement u ⊩ B, for some u ∈ T1. If t ⊩ A is the conclusion of DReal, we set D∗ = t. We will then prove that if D is closed and without assumptions, then D∗ ∈ T1σ and D∗ ⊩ A. The decoration DReal of D with realizers is completely standard: we have new realizers only for Excluded Middle and for atomic formulas. For notational simplicity, if xi is the label for the set of occurrences of some assumption Ai of D, we also use xi as the name of a free variable in D∗ of type |Ai|. Definition 12. (Term Assignment Rules for Core Arithmetic). Assume D is a proof of A ∈ L1 in HA + EM1, with free assumptions A1, . . . , An labeled x1^{A1}, . . . , xn^{An} and free variables α1^N, . . . , αm^N. By induction on D, we define a decorated proof-tree DReal, in which each formula B is replaced by u ⊩ B for some u ∈ T1, and the conclusion A by t ⊩ A, with FV(t) ⊆ {x1^{|A1|}, . . . , xn^{|An|}, α1^N, . . . , αm^N, σ}. Eventually we set D∗ = t.
1. x^{|A|} ⊩ A, if D consists of a single free assumption A ∈ L labeled x^A.
2. From u ⊩ A and t ⊩ B derive ⟨u, t⟩ ⊩ A ∧ B; from u ⊩ A ∧ B derive π0 u ⊩ A and π1 u ⊩ B.
3. From u ⊩ A → B and t ⊩ A derive ut ⊩ B; from u ⊩ B derive λx^{|A|} u ⊩ A → B.
4. From u ⊩ A derive ⟨T, u, dB⟩ ⊩ A ∨ B; from u ⊩ B derive ⟨F, dA, u⟩ ⊩ A ∨ B; from u ⊩ A ∨ B, w1 ⊩ C and w2 ⊩ C derive if π0 u then (λx^{|A|} w1)(π1 u) else (λx^{|B|} w2)(π2 u) ⊩ C, where dA and dB are dummy stable terms of type |A| and |B|.
5. From u ⊩ ∀xA derive ut ⊩ A[t/x]; from u ⊩ A derive λx^N u ⊩ ∀xA, where t is a term of L and x^N does not occur free in any free assumption B of the subproof of D with conclusion A.
6. From u ⊩ A[t/α^N] derive ⟨t, u⟩ ⊩ ∃α^N.A; from u ⊩ ∃α^N.A and t ⊩ C derive (λα^N λx^{|A|} t)(π0 u)(π1 u) ⊩ C, where α^N is not free in C nor in any free assumption B different from A in the subproof of D with conclusion C.
7. From u ⊩ A(0) and v ⊩ ∀x.A(x) → A(S(x)) derive λy^N Ruvy ⊩ ∀xA.
8. From u1 ⊩ A1, u2 ⊩ A2, . . . , un ⊩ An derive u1 ⋓ u2 ⋓ · · · ⋓ un ⊩ A, where n > 0, A1, A2, . . . , An, A are atomic formulas of L, and the rule is a Post rule for equality or ordering, or a tautological consequence.
9. σ ⊩ A, where A is an atomic axiom of HA + EM1 (an axiom of equality or of ordering, a tautology, or an equation of T).
10. EP ⊩ ∀x. ∃y P(x, y) ∨ ∀y ¬Bool P(x, y), where EP is defined as λα^N ⟨χP σα, ⟨ϕP σα, σ⟩, λn^N (Add)σ⟨P, α, n⟩⟩.
The term decorating the conclusion of a Post rule is of the form u1 ⋓ · · · ⋓ un. In this case, we have n different realizers, whose learning capabilities are put together through a sort of union. If u1 ⋓ · · · ⋓ un has a fixed point s, then we may prove that s is a common fixed point of all u1, . . . , un,¹ i.e., that all ui “have nothing to learn”. Then each ui must guarantee Ai to be true, and therefore the conclusion of the Post rule is true, because true premises A1, . . . , An yield a true conclusion A. The decoration DReal of a proof D can be extended to any proof of Extended Arithmetic. Definition 13. (Extra Term Assignment Rules for Extended Arithmetic).
Proof Sketch for n = 2. Prove the following first, for all s, s1, s2 ∈ S: (i) s1 ≤ s1 ⋓ s2; (ii) if s1 ≤ s2 then s1 ⋓ s2 = s2; (iii) if s ≤ s1, s2 and s = s1 ⋓ s2 then s = s1 and s = s2. Now from s ≤ u1[s], u2[s] and s = u1[s] ⋓ u2[s] deduce s = u1[s] = u2[s].
– (Add)σ⟨P, t1, . . . , tn, t⟩ ⊩ P(t1, . . . , tn, t) ⇒Bool χP σt1 . . . tn (χ-Axiom)
– σ ⊩ χP σt1 . . . tn ⇒Bool P(t1, . . . , tn, (ϕP σt1 . . . tn)) (ϕ-Axiom)
Example (Realizer of the Excluded Middle 1). We now prove that EP ⊩ ∀x. ∃y P(x, y) ∨ ∀y ¬Bool P(x, y). Let m be a vector of numerals and let s ∈ S. EP is defined as λα^N ⟨χP σα, ⟨ϕP σα, σ⟩, λn^N (Add)σ⟨P, α, n⟩⟩
and we want to prove that EP m ⊩s ∃y P(m, y) ∨ ∀y ¬Bool P(m, y). Assume π0 EP m[s] = χP sm = T: we have to prove that π1 EP m = ⟨ϕP σm, σ⟩ ⊩s ∃y P(m, y). By definition, ϕP sm = n for some n such that P(m, n) = T; hence σ ⊩s P(m, n) and ⟨ϕP σm, σ⟩ ⊩s ∃y P(m, y). Now assume π0 EP m[s] = χP sm = F. We have to prove that π2 EP m = λn (Add)σ⟨P, m, n⟩ ⊩s ∀y ¬Bool P(m, y), that is, that for any given n ∈ N, (Add)σ⟨P, m, n⟩ ⊩s ¬Bool P(m, n). By definition, we have to assume that s is a fixed point of (Add)σ⟨P, m, n⟩, that is, (Add)s⟨P, m, n⟩ = s. Then it must be the case that P(m, n)[s] = F: if it were not so, P(m, n)[s] = T and hence s = s ∪ {⟨P, m, n'⟩} for some n' ∈ N, contradicting χP sm = F. Thus ¬Bool P(m, n)[s] = T, and our thesis follows. EP works according to the ideas we sketched in the introduction. It uses χP to make predictions about which one of ∃y P(m, y) and ∀y ¬Bool P(m, y) is true. χP, in turn, relies on the actual state s to make its own prediction. If χP sm = F, then, given any n, ¬Bool P(m, n) is predicted to be true; if this is not the case, we have a counterexample and (Add) extends the state with ⟨P, m, n⟩. On the contrary, if χP sm = T, there is unquestionable evidence that ∃y P(m, y) holds: namely, there is an n such that ⟨P, m, n⟩ ∈ s; then ϕP is called, and it returns ϕP sm = n. This is the basic mechanism by which we implement learning: every state extension is linked with an assumption about an instance of EM1 which we used and which turned out to be wrong (this is the only way to come across a counterexample); in subsequent computations the actual state will be bigger, the realizer will not make the same error, and hence will be “wiser”. We will now prove our main theorem: every theorem of HA + EM1 is realizable. We need a few Lemmas first. In the next Lemma we prove that if we introduce a state constant by substitution and then eliminate it, we preserve the realization relation.
Namely, take any realizer u[σ] in T1σ, then replace some occurrences of σ with a constant s, obtaining some term u''[σ, s] in T1 − T1σ. Assume that u''[σ, s] is equal to some u'[σ] ∈ T1σ (without the constant s). Then u and u' realize the same formulas.
Lemma 2. (State Constant Elimination) Let u, u' ∈ T1σ and C, C' ∈ Lσ1. Assume that for some u'' ∈ T1, C'' ∈ L1 we have u'[σ] = u''[σ, s], u[σ] = u''[σ, σ] and C'[σ] = C''[σ, s], C[σ] = C''[σ, σ]. Then u' ⊩s C' iff u ⊩s C. Proof. By induction on C (see [3]).

We will also need the commutation of (.)∗ with substitution: Proposition 1. If D is a proof of A, and m ∈ N, then D∗[m/α^N] = D[m/α^N]∗. We are now able to prove our main theorem.

Theorem 3. (Adequacy Theorem) Suppose that D is a proof of A in the system of Extended Arithmetic with free assumptions x1^{A1}, . . . , xn^{An} and free variables α1 : N, . . . , αk : N, σ : S. Let w = D∗. For all s ∈ S, if n1, . . . , nk ∈ N and t1 ⊩s A1[n1/α1 · · · nk/αk] and . . . and tn ⊩s An[n1/α1 · · · nk/αk], then

w[t1/x1^{|A1|} · · · tn/xn^{|An|}, n1/α1 · · · nk/αk] ⊩s A[n1/α1 · · · nk/αk]

Proof. By induction on w (see [3]). Corollary 2. If A is a closed formula provable in HA + EM1, then there exists w ∈ T1σ such that w ⊩ A.
References
1. Avigad, J.: Update Procedures and the 1-Consistency of Arithmetic. Math. Log. Q. 48(1), 3–13 (2002)
2. Akama, Y., Berardi, S., Hayashi, S., Kohlenbach, U.: An Arithmetical Hierarchy of the Law of Excluded Middle and Related Principles. In: LICS 2004, pp. 192–201 (2004)
3. Aschieri, F., Berardi, S.: An Interactive Realizability... (Full Paper), Tech. Rep., Un. of Turin (2009), http://www.di.unito.it/~stefano/Realizers2009.pdf
4. Berardi, S.: Classical Logic as Limit... MSCS 15(1), 167–200 (2005)
5. Berardi, S.: Some intuitionistic equivalents of classical principles for degree 2 formulas. Annals of Pure and Applied Logic 139(1-3), 185–200 (2006)
6. Berardi, S., Coquand, T., Hayashi, S.: Games with 1-Backtracking. In: GALOP 2005 (2005)
7. Berardi, S., de' Liguoro, U.: A calculus of realizers for EM1-Arithmetic. In: Kaminski, M., Martini, S. (eds.) CSL 2008. LNCS, vol. 5213, pp. 215–229. Springer, Heidelberg (2008)
8. Coquand, T.: A Semantics of Evidence for Classical Arithmetic. Journal of Symbolic Logic 60, 325–337 (1995)
9. van Dalen, D.: Logic and Structure, 3rd edn. Springer, Heidelberg (1994)
10. Girard, J.-Y.: Proofs and Types. Cambridge University Press, Cambridge (1989)
11. Gold, E.M.: Limiting Recursion. Journal of Symbolic Logic 30, 28–48 (1965)
12. Hayashi, S., Sumitomo, R., Shii, K.: Towards Animation of Proofs - Testing Proofs by Examples. Theoretical Computer Science (2002)
13. Hayashi, S.: Can Proofs be Animated by Games? FI 77(4), 331–343 (2007)
14. Hayashi, S.: Mathematics based on incremental learning - Excluded Middle and Inductive Inference. Theoretical Computer Science 350, 125–139 (2006)
15. Kleene, S.C.: On the Interpretation of Intuitionistic Number Theory. Journal of Symbolic Logic 10(4), 109–124 (1945)
16. Popper, K.: The Logic of Scientific Discovery. Routledge Classics, Routledge (2002)
Syntax for Free: Representing Syntax with Binding Using Parametricity Robert Atkey School of Informatics, University of Edinburgh
[email protected]
Abstract. We show that, in a parametric model of polymorphism, the type ∀α.((α → α) → α) → (α → α → α) → α is isomorphic to closed de Bruijn terms. That is, the type of closed higher-order abstract syntax terms is isomorphic to a concrete representation. To demonstrate the proof we have constructed a model of parametric polymorphism inside the Coq proof assistant. The proof of the theorem requires parametricity over Kripke relations. We also investigate some variants of this representation.
1 Introduction
Representing, computing with, and reasoning about syntax with binding has been of interest to computer scientists for the last 30 or 40 years. The crucial point that makes these activities difficult is the notion of α-equivalence, the obvious idea that if we have two terms equal up to the swapping of the names of their bound variables, e.g. λx.x and λy.y, then the terms should be treated equally. Unfortunately, the obvious representation of binders as a pair of a variable name and a subterm does not respect α-equivalence, so operations on such data must be carefully written in order to respect it. In this paper, we look at two solutions that have been put forward to deal with this (we do not look at the third major approach: nominal sets [7]): de Bruijn indices and higher-order abstract syntax, and relate the two. The de Bruijn index approach [5] addresses the problem by removing the names of bound variables altogether. Bound variables are represented by pointers to the construct that binds them. For instance, the λ-term λx.λy.xy is represented as λ.λ.1 0. The bound variable x has been replaced by a pointer to the binder one step away from the occurrence, and the bound variable y has been replaced by a pointer to a binder zero steps away. The advantage of this representation is that α-equivalent terms are now structurally equal. The disadvantage is the complicated definitions of common operations such as substitution, where non-intuitive shifting operations are required to maintain the correct pointers. Another common approach is to use higher-order abstract syntax [13]. In this approach, we use the binding structure of the meta-language to represent binding in the object-language. For the untyped λ-calculus, we suppose that there is a type tm and operations lam : (tm → tm) → tm and app : tm → tm → tm. P.-L. Curien (Ed.): TLCA 2009, LNCS 5608, pp. 35–49, 2009. © Springer-Verlag Berlin Heidelberg 2009
36
R. Atkey
The object-level term λx.λy.x y is thus represented as the meta-language term lam (λx. lam (λy. app x y)). The key advantage of this approach is that, since object-level variables are represented using meta-level variables, substitution becomes very easy to define. A disadvantage of this representation is the need to make sure that we do not allow too many terms into our type tm. Proving that we have not done so is called adequacy [8], and is usually performed by reasoning on the canonical forms of some weak type theory such as LF. The key to higher-order abstract syntax is that the meta-level variables that are used to represent object-level variables are only used as variables, and cannot be further analysed.

Washburn and Weirich [18] noted that parametric type abstraction, as available in System F, is a viable way of ensuring that represented terms are well behaved. They consider the type ∀α.((α → α) → α) → (α → α → α) → α and derive a fold operator and some reasoning principles from it. This type captures the two operations of higher-order abstract syntax, lam and app, but abstracts over the carrier type. Washburn and Weirich claim that this type represents exactly the terms of the untyped λ-calculus, but do not provide a proof. Coquand and Huet [4] also state that this type represents untyped λ-terms, again without proof. In this paper we provide such a proof.

The reason that this approach works is that System F terms of type ∀α.τ must act parametrically in α; that is, they cannot reflect on the actual instantiation of α they have been provided with. Reynolds [16] formalised this idea by stating that for any two instantiations of α, parametric terms must preserve all relations between them. We take this idea and extend it to use Kripke relations [15]. Kripke relations are relations R indexed by some preorder W, such that if w ≤ w′ in W, then R w x y implies R w′ x y.
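The Washburn–Weirich encoding can be illustrated operationally. In the sketch below (Python, chosen only for concreteness; Python cannot express the ∀α quantification, so the parametricity guarantee this paper establishes is absent), a closed term is a function expecting implementations of lam and app:

```python
# A closed term in the Washburn-Weirich style is a function expecting
# implementations of lam and app; only the computational content is shown.

def tm(lam, app):
    # lam (λx. lam (λy. app x y)), representing λx.λy.x y
    return lam(lambda x: lam(lambda y: app(x, y)))

# One instantiation of the carrier: count nodes (variables count one each).
size = tm(lam=lambda f: 1 + f(1),
          app=lambda x, y: 1 + x + y)
assert size == 5     # 2 λs + 1 application + 2 variable occurrences

# Another instantiation: read the term back as an actual function.
as_function = tm(lam=lambda f: f, app=lambda f, a: f(a))
assert as_function(lambda y: ('applied to', y))('arg') == ('applied to', 'arg')
```

The two instantiations play the role of the folds derived by Washburn and Weirich: the same representation is consumed by supplying different carriers.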
By requiring that all terms of polymorphic type preserve all Kripke logical relations, we can prove that the denotation of the type given by Washburn and Weirich is isomorphic to the type of closed de Bruijn terms: de Bruijn terms that do not have dangling pointers. The preorder-indexing of the relations is used to handle the expansion of the number of meta-variables being used as object-variables as we go under binders.

Traditionally, parametric models of System F have been hard to come by, and have generally involved fiddly constructions with PERs. We make life easier for ourselves by starting with a meta-theory (or meta-meta-theory, if one is pedantic) that already has impredicative polymorphism, and construct a parametric model of System F inside it. We use a version of Coq with impredicative polymorphism for this purpose, and we have formalised most of our results (the formal development is available from http://homepages.inf.ed.ac.uk/ratkey/parametricity).

Overview. In the next section we introduce our model of System F inside the Coq type theory. Following that, in Section 3, we present our main result, the
isomorphism between the Washburn-Weirich HOAS type and de Bruijn terms. In Section 4 we investigate two alternative representations that take different views on how variables are represented. In Section 5, we show how the computational aspect of System F can be integrated into our object-level representations, and prove that a simplified version of the Haskell ST monad can be represented using de Bruijn-style terms. Finally, Section 6 concludes with a discussion of related work.
2 A Model of Parametric Polymorphism
To state and prove our main results, we construct, inside the Coq proof assistant, a denotational model of System F that supports parametricity. For simplicity, we want System F types to be denoted by objects of sort Set; we can then express denotations of terms as normal Coq functions that preserve all Kripke relations.

2.1 Preparing the Meta-theory
In order to use Sets as denotations of System F types, we require impredicativity. The denotation of the type ∀α.τ quantifies over all denotations of types (i.e. Sets). By default, Coq's type theory is predicative for Set (although it is impredicative in the type of propositions, Prop), so one cannot construct a new object of sort Set by quantifying over all objects of sort Set. Fortunately, Coq supports a command line option -impredicative-set that allows us to proceed.

We also require three axioms to be added to Coq's theory. The first of these is proof irrelevance, which states that all proofs of a given proposition are equal:

∀P : Prop. ∀p1, p2 : P. p1 = p2

We also require extensionality for functions, which states that two functions are equal if they are equal on all inputs:

∀A : Type, B : A → Type, f, g : (∀a. B a). (∀x. f x = g x) → f = g

Extensionality for functions allows our denotational model to support the η-equality rules of System F. We also require propositional extensionality, which allows us to treat equivalent propositions as equal:

∀P, Q : Prop. (P ↔ Q) → P = Q

These axioms allow us to define data with embedded proofs that are equal if their computational contents are equal, which will aid us in proving equalities between denotations of System F types. We informally justify our use of these axioms, plus impredicativity, by the existence of models of CIC in intuitionistic set theory. In the remainder of the paper, we use informal set-theoretic notation and do not explicitly highlight the uses of these axioms. Note that everywhere we use the word “set”, we are referring to Coq objects of sort Set.
2.2 Denotational Semantics of System F
The syntax of System F types is standard:

τ ::= α | τ1 → τ2 | ∀α.τ
where α is taken from a countably infinite set of variables, and ∀α.τ binds α in τ. We actually use a de Bruijn representation of types (and terms) of System F in our Coq development, but we will use the usual concrete representation for exposition.

As we mentioned in the introduction, in order to prove the isomorphisms below involving syntax with binding, we require that the denotation of ∀α.τ be parametric over all Kripke relations over all preorders. Preorders consist of a carrier W : Type and a binary relation ≤W : W → W → Prop that is reflexive and transitive. For a given preorder W, a W-Kripke logical relation over sets A, B : Set is a predicate R : W → A → B → Prop such that ∀w, w′, a, b. w ≤W w′ → R w a b → R w′ a b. For brevity, we write the collection of all W-Kripke relations over A, B as KRel(W, A, B). Note that, even though we are using W-indexed Kripke relations, we do not use sets indexed by any particular W as denotations of System F types—we are not constructing a model of System F in the presheaf category for some preorder W. We will require multiple instantiations of W in our proofs.

Type environments γ are mappings from type variables to sets. For a preorder W and a pair of type environments γ1, γ2, a relation environment ρ is a mapping from type variables α to W-Kripke relations over γ1(α), γ2(α). For any type environment γ and preorder W, there is a relation environment ΔWγ that maps all type variables to the equality relation.

We now define the denotations of types and the induced Kripke relations between them. The mapping T⟦−⟧ maps types with type environments to sets, and the mapping R⟦−⟧ maps types τ, preorders W and relation environments over type environments γ1, γ2 to W-Kripke relations over T⟦τ⟧γ1, T⟦τ⟧γ2. These mappings are mutually defined over the structure of types:

T⟦α⟧γ = γ(α)
T⟦τ1 → τ2⟧γ = T⟦τ1⟧γ → T⟦τ2⟧γ
T⟦∀α.τ⟧γ = { x : ∀A : Set. T⟦τ⟧(γ[α ↦ A]) |
              ∀W, A1, A2, R : KRel(W, A1, A2), w : W.
                R⟦τ⟧ W (ΔWγ[α ↦ R]) w (x A1) (x A2) }

R⟦α⟧ W ρ w x y = ρ(α) w x y
R⟦τ1 → τ2⟧ W ρ w f g = ∀w′ : W, x : T⟦τ1⟧γ1, y : T⟦τ1⟧γ2.
                         w ≤W w′ → R⟦τ1⟧ W ρ w′ x y → R⟦τ2⟧ W ρ w′ (f x) (g y)
R⟦∀α.τ⟧ W ρ w x y = ∀A1, A2, R : KRel(W, A1, A2).
                      R⟦τ⟧ W (ρ[α ↦ R]) w (x A1) (y A2)
These clauses are mostly straightforward for Kripke logical relations, but we draw the reader's attention to the clause for T⟦∀α.τ⟧. We have used impredicative quantification over all sets here. We also constrain the denotations of polymorphic types to be those that preserve all W-Kripke relations, for all preorders W. It is this parametricity property that we will use to prove the isomorphisms in Section 3.

Lemma 1. The following hold, for all τ and preorders W:
1. For all γ1, γ2 and ρ, R⟦τ⟧ W ρ is a W-Kripke relation over T⟦τ⟧γ1, T⟦τ⟧γ2.
2. For all γ and w, R⟦τ⟧ W ΔWγ w x y iff x = y.

Proof. Both by induction over the structure of τ.

Note that this denotational semantics of types validates the usual representations of inductive types in System F, e.g. T⟦∀α.α → (α → α) → α⟧γ ≅ N, etc.

Denotations of System F terms. We also define a denotation for every well-typed System F term, but we have elided these for lack of space; please see the formal development for more details. The main result is that every well-typed System F term has a meaning in the model as a function from the denotation of the context to the denotation of the result type, such that all Kripke relations over any preorder are preserved by this function.
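The remark about inductive types can be checked concretely with Church numerals. The following Python sketch (illustrative only; parametricity itself is not enforceable in Python) witnesses the back-and-forth between the type ∀α. α → (α → α) → α and N:

```python
# Church numerals witness the remark that ∀α. α → (α → α) → α denotes N.

def church(n):
    # n is represented by the function that applies s to z exactly n times.
    return lambda z, s: z if n == 0 else s(church(n - 1)(z, s))

def unchurch(c):
    # Instantiate the carrier at N itself: zero and successor.
    return c(0, lambda k: k + 1)

assert unchurch(church(7)) == 7
# Instantiating the carrier at strings instead:
assert church(3)('', lambda s: s + '*') == '***'
```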
3 Representing λ-Terms Using Parametricity
We will show that, in our model, the denotation of the type

τH = ∀α.((α → α) → α) → (α → α → α) → α

is isomorphic to the set of closed de Bruijn terms. This task is not as straightforward as producing two functions and showing that they are mutually inverse: we must show that the function from the above type to de Bruijn terms actually does give a well-formed closed de Bruijn term. We define the set of well-formed de Bruijn terms as a natural-number-indexed inductively defined set Term : N → Set with constructors:

Var : {i : N | i < n} → Term(n)
Lam : Term(n + 1) → Term(n)
App : Term(n) → Term(n) → Term(n)

The set of all closed de Bruijn terms is hence given by Term(0). This definition admits the following recursion principle (it is less general than the one Coq provides, but suffices for our purposes):

term_rec : ∀P : N → Set.
             (∀n. {i : N | i < n} → P(n)) →
             (∀n. P(n + 1) → P(n)) →
             (∀n. P(n) → P(n) → P(n)) →
             ∀n. Term(n) → P(n)
We will also need the set of “pre-de Bruijn” terms—terms that are not necessarily known to be well-formed—as an intermediate staging ground. The set preTerm is defined inductively with the following constructors:

preVar : N → preTerm
preLam : preTerm → preTerm
preApp : preTerm → preTerm → preTerm

There is an obvious relation n ⊢ t relating context sizes to preTerms well-formed in that context, and an isomorphism between Term(n) and {t : preTerm | n ⊢ t}. Note that the type preTerm is a normal inductive type and is therefore representable in parametric System F. The mapping from τH to preTerm that we give is also expressible in pure System F.

We are now ready to define this mapping from denotations of the type τH to Term(0). We do this first by mapping to preTerm, and then showing that the produced term satisfies 0 ⊢ t. By the definition of T⟦τH⟧, the underlying set for this type is ∀A : Set. ((A → A) → A) → (A → A → A) → A. We define φ(t) = t (N → preTerm) lam app 0, where:

lam = λf.λi. preLam (f (λj. preVar (j − (i + 1))) (i + 1))
app = λx.λy.λi. preApp (x i) (y i)

We instantiate a value of type τH with the set N → preTerm, intending that applying a function of this type to a number n will produce a term well-formed in a context of size n. Inside the definition of these functions, the argument i represents the depth of context (or the number of binders) surrounding the current term. In the case for app, we do not go under a binder, so we do not increase the depth when applying it to the sub-terms. In the case for lam, given a function f of type (N → preTerm) → (N → preTerm) and a depth i, we apply f to an argument that will evaluate to a bound variable for a future depth j. The arithmetic computes the distance between the bound variable and its binder. Crucially, it is always the case that j > i, since we only ever count upwards in the depth of terms. This is the meat of the following:

Lemma 2. For all t : T⟦τH⟧γ, 0 ⊢ φ(t).

Proof.
We use the parametricity of the denotation of τH. Unfolding the definition of R⟦τH⟧, this tells us that the following property holds of all t : T⟦τH⟧γ:

∀W, A1, A2, R : KRel(W, A1, A2), w : W.
  (∀w1 ≥ w, lam1 : (A1 → A1) → A1, lam2 : (A2 → A2) → A2.
     (∀w2 ≥ w1, f1 : A1 → A1, f2 : A2 → A2.
        (∀w3 ≥ w2, x : A1, y : A2. R w3 x y → R w3 (f1 x) (f2 y)) →
        R w2 (lam1 f1) (lam2 f2)) →
   (∀w4 ≥ w1, app1 : A1 → A1 → A1, app2 : A2 → A2 → A2.
      (∀w5 ≥ w4, x1 : A1, x2 : A2. R w5 x1 x2 →
         (∀w6 ≥ w5, y1 : A1, y2 : A2. R w6 y1 y2 →
            R w6 (app1 x1 y1) (app2 x2 y2))) →
      R w4 (t A1 lam1 app1) (t A2 lam2 app2)))
We let W be N with the usual ordering. We will not need both type arguments for this proof, so we set A1 = N → preTerm and A2 = 1, the one-element set (we use dummy implementations of lam and app for this type). We set R n x y iff ∀n′ ≥ n. n′ ⊢ x(n′). It is easy to verify that this is a Kripke relation.

This relation will suffice to prove our lemma, provided we can prove that our implementations of lam and app in the definition of φ satisfy the requirements of t's parametricity property. For lam, we must prove that at all depths n ≥ 0, if we are given a functional argument f : (N → preTerm) → (N → preTerm) satisfying the property at all n′ ≥ n, then for all n′ ≥ n, we have

n′ ⊢ preLam (f (λj. preVar (j − (n′ + 1))) (n′ + 1))

This is true if

n′ + 1 ⊢ f (λj. preVar (j − (n′ + 1))) (n′ + 1)

Since f preserves R, we need only show that the argument λj. preVar (j − (n′ + 1)) satisfies R at all n′′ ≥ n′ + 1. This amounts to showing that

n′′ ⊢ preVar(n′′ − (n′ + 1))

which is trivial. The case for app is easier, and is a straightforward application of the required property being satisfied by the two arguments.

This proof is very similar to the Kripke logical relations proof employed by Rhiger [17] to prove that a single language embedded using higher-order abstract syntax always gives well-formed terms. We have extended this by allowing multiple languages to be embedded in a single meta-language. Rhiger also considers the use of type constructors to embed typed languages, something we cannot do in our System F setting. We also note that the proofs here are very similar in structure to the proofs used for proving adequacy of higher-order syntax encodings in LF [8].

Corollary 1. The map φ can be seen as a map from T⟦τH⟧γ to Term(0).

The map φ−1 from closed de Bruijn terms is defined by recursion over the structure of terms. We make use of an auxiliary data structure of vectors vec A n, representing lists of elements of type A : Set of length n.
These have two constructors:

vecNil : vec A 0
vecCons : A → vec A n → vec A (n + 1)

and a look-up function lookup : vec A n → {i : N | i < n} → A.
The mapping φ−1 : Term(0) → T⟦τH⟧γ is defined as:

φ−1(t) = λA : Set. λlam. λapp.
           term_rec (λn. vec A n → A)
                    (λn, i, env. lookup env i)
                    (λn, h, env. lam (λx. h (vecCons x env)))
                    (λn, x, y, env. app (x env) (y env))
                    0 t vecNil

The basic idea is to recurse down the term, maintaining a vector of representations of bound variables. Every time we go under a binder, we extend the vector by the object provided by the implementation of lam. For this mapping to be well-defined, we must prove the following:

Lemma 3. For all t : Term(0), φ−1(t) is parametric.

Proof. We must prove, essentially, that for any preorder W, pair of sets A1, A2 and W-Kripke relation R over A1, A2, if lam1, lam2 and app1, app2 are related pairs of functions, then the bodies of φ−1 are related by R at some index w. We strengthen the statement from talking about terms in Term(0) with empty starting environments to: for all n and t : Term(n), v1 : vec A1 n, v2 : vec A2 n and w′ ≥ w,

∀i : {i : N | i < n}, w′′ ≥ w′. R w′′ (lookup v1 i) (lookup v2 i)

implies R w′ (term_rec ... t v1) (term_rec ... t v2). This is easily proved by induction on t, and implies the lemma statement.

We now prove that our two mappings are mutually inverse. We first do the direction that does not require parametricity:

Lemma 4. For all t : Term(0), φ(φ−1(t)) = t.

Proof. As with the previous proof, we strengthen the statement to prove that for all n, t : Term(n) and v : vec (N → preTerm) n,

∀i ≤ n, n′. n ≤ n′ → (lookup v i) n′ = Var(i + (n′ − n))

implies term_rec ... t v n = t. This is easily proved by induction on t, and implies the lemma statement.

The other direction requires the use of parametricity:

Lemma 5. For all t : T⟦τH⟧γ, φ−1(φ(t)) = t.

Proof. We are given a set A and operations lam and app. We apply the parametricity property of t (as given in the proof of Lemma 2) with the following data. The preorder W consists of lists of elements of A with the prefix ordering.
The set A1 is set to N → preTerm, and A2 is set to A. We set the relation R to be: R env x y iff

∀env′ ≥ env. term_rec ... (x (length env′)) (toVec env′) = y
where length gives the length of a list, and toVec maps a list l of As to a value of type vec A (length l). It is easy to prove that this is a Kripke relation. The proof then proceeds in a very similar way to the proof of Lemma 2.

Summing up, we have:

Theorem 1. Term(0) ≅ T⟦τH⟧γ.
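The operational content of φ and φ⁻¹ can be sketched in Python. This is a hypothetical transcription: the parametricity side conditions of Lemmas 2, 3 and 5 are not expressible, only the computations are.

```python
# Python transcription of φ and φ⁻¹. A HOAS term is a function of lam and
# app; de Bruijn terms are tuples ('Var', i), ('Lam', b), ('App', f, a).

def phi(t):
    """HOAS to de Bruijn: the carrier is N -> preTerm, the argument being
    the number of binders surrounding the current position."""
    lam = lambda f: lambda i: ('Lam', f(lambda j: ('Var', j - (i + 1)))(i + 1))
    app = lambda x, y: lambda i: ('App', x(i), y(i))
    return t(lam, app)(0)

def phi_inv(db):
    """De Bruijn back to HOAS, carrying a vector (tuple) of carrier values
    for the bound variables, extended each time we go under a binder."""
    def go(term, env):
        def k(lam, app):
            tag = term[0]
            if tag == 'Var':
                return env[term[1]]
            if tag == 'Lam':
                return lam(lambda x: go(term[1], (x,) + env)(lam, app))
            return app(go(term[1], env)(lam, app), go(term[2], env)(lam, app))
        return k
    return go(db, ())

# λf.λx. f (f x), i.e. the Church numeral 2:
two = lambda lam, app: lam(lambda f: lam(lambda x: app(f, app(f, x))))
db = phi(two)
assert db == ('Lam', ('Lam', ('App', ('Var', 1),
                              ('App', ('Var', 1), ('Var', 0)))))
assert phi(phi_inv(db)) == db   # the round trip of Lemma 4
```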
4 Alternative Representations of Variables
Washburn and Weirich [18] also consider terms with a fixed maximum number of free variables, by using types of the form:

τH^n = ∀α.((α → α) → α) → (α → α → α) → α^n
where α^0 = α and α^(n+1) = α → α^n. By extending the proof in the previous section, we have been able to prove T⟦τH^n⟧γ ≅ Term(n) for various n, but unfortunately we have not been able to formally prove this for all n.

Washburn and Weirich further claim ([18], in the definition of iterList) that the type ∀α.((α → α) → α) → (α → α → α) → [α] → α represents terms with arbitrary numbers of free variables, where [α] is shorthand for lists of α. However, it is easy to see that this is not the case. Consider the following inhabitant of this type:

Λα. λlam. λapp. λenv. match env with nil ⇒ lam(λx.x) | cons(x, t) ⇒ x

(where we allow ourselves some syntactic sugar for lists in System F). This “term” represents λx.x when the free variable list is empty, and the first available free variable otherwise. This does not correspond to any single λ-term.

We now look at two other representations of variables in higher-order abstract syntax and evaluate them in the light of the techniques of Section 3.
4.1 Parameterised and Weak Higher-Order Abstract Syntax
In [6] the authors note that the normal higher-order abstract syntax type cannot be directly translated to an inductive type in Coq, due to the negative occurrence in the case for λ-abstraction. They propose weak higher-order abstract syntax, defined by an inductive type parameterised by a type of variables. We can represent this type in System F like so, using the normal encoding of inductive types:

τWH(ν) = ∀α.(ν → α) → ((ν → α) → α) → (α → α → α) → α

Choosing something obvious for ν, like natural numbers, results in inhabitants of this type that do not represent λ-terms (because they can inspect the variable
names they are given). The solution is to keep the type ν abstract, so that inhabitants cannot inspect their variables. Hofmann [9] analysed this construction in the setting of presheaves, using a presheaf of variables for ν. Following on from [6], Chlipala [3] noticed that, if the meta-language has parametric polymorphism, then the type ∀ν.τWH(ν) can be used to represent λ-terms, but he did not have a proof. He called this technique parameterised higher-order abstract syntax. We can supply such a proof:

Theorem 2. T⟦τH⟧γ ≅ T⟦∀ν.τWH(ν)⟧γ (≅ Term(0)).

Proof. Define (in System F) φ : τH → ∀ν.τWH(ν) and φ−1 : (∀ν.τWH(ν)) → τH by:

φ = λt. Λν. Λα. λvar. λlam. λapp. t [α] (λf. lam (λx. f (var x))) app
φ−1 = λt. Λα. λlam. λapp. t [α] [α] (λx. x) lam app

Since these functions are terms of System F, the parametricity properties automatically hold. The φ−1(φ(t)) direction is particularly easy to prove:

φ−1(Λν. Λα. λvar. λlam. λapp. t [α] (λf. lam (λx. f (var x))) app)
  = Λα. λlam. λapp. t [α] (λf. lam (λx. f ((λx. x) x))) app
  = Λα. λlam. λapp. t [α] lam app
  = t

In the reverse direction, we can prove φ(φ−1(t)) = t by applying parametricity over ordinary relations (Kripke relations are not needed here). If we have sets V for ν and A for α, the key idea is to relate A and V by R x y iff x = var y, and to relate A and A by the equality relation.

4.2 Locally Higher-Order Abstract Syntax
We now consider explicitly representing free variables in terms using any data type we choose, while representing bound variables using higher-order abstract syntax. This approach is inspired by locally nameless representations, which use de Bruijn indices only for bound variables [1]. We consider the type:

τLH(ν) = ∀α.(ν → α) → ((α → α) → α) → (α → α → α) → α

This type has three “constructors”: one for injecting free variables of type ν into terms, and the two higher-order abstract syntax constructors. We are free to choose any type we like for ν, such as natural numbers or strings. Selecting naturals, we can define the following combinators:

var : N → τLH(N)
var = λx. Λα. λv. λl. λa. v x

app : τLH(N) → τLH(N) → τLH(N)
app = λx y. Λα. λv. λl. λa. a (x [α] v l a) (y [α] v l a)

lam : N → τLH(N) → τLH(N)
lam = λx t. Λα. λv. λl. λa. l (λy. t [α] (λx′. if x′ = x then y else v x′) l a)
The var combinator constructs a term with a single free variable, and app constructs the object-level application of two terms. The lam combinator is more complicated: for a free variable x and a term t, it creates a new object-level λ-abstraction, with the body being t in which the free variable x is replaced by the variable bound by the new object-level λ-abstraction.

It is also possible to define a pattern matching combinator of type:

τLH(N) → N + (τLH(N) × τLH(N)) + (τLH(N) → τLH(N))

that analyses a term in our representation, and returns either a free variable, the pair of terms involved in an application, or a term abstracted over another term in the case of object-level λ-abstraction. We cannot give this term here due to lack of space: please see the OCaml files contained with the Coq development.

By using the techniques of Section 3, we can prove that this representation is actually equivalent to a representation using de Bruijn terms. We define such a representation LNTerm(A, n) inductively by the following constructors:

freeVar : A → LNTerm(A, n)
boundVar : {i : N | i < n} → LNTerm(A, n)
Lam : LNTerm(A, n + 1) → LNTerm(A, n)
App : LNTerm(A, n) → LNTerm(A, n) → LNTerm(A, n)

Theorem 3. For closed types τ, LNTerm(T⟦τ⟧γ, 0) ≅ T⟦τLH(τ)⟧γ.

The significance of this theorem arises from the fact that we can use a language with parametric polymorphism to represent locally nameless λ-terms: a type that would normally seem to require some kind of indexed types to represent. We speculate that it would be possible to build a convenient (if inefficient) library for manipulating syntax with binders in OCaml using this representation.
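A hypothetical Python transcription of the τLH combinators makes the substitution trick in lam concrete (free variables are named by natural numbers; the type abstraction is dropped, so nothing here enforces well-behavedness):

```python
# The var/app/lam combinators of τLH, transcribed operationally. A term is
# a function of the three "constructors" v (free variable), l (lambda)
# and a (application).

def var(x):
    return lambda v, l, a: v(x)

def app(t1, t2):
    return lambda v, l, a: a(t1(v, l, a), t2(v, l, a))

def lam(x, t):
    # Bind the free variable x: wherever t asks for x, supply the value
    # bound by the new object-level λ instead.
    return lambda v, l, a: l(
        lambda y: t(lambda x2: y if x2 == x else v(x2), l, a))

# λ0. 0 1 — variable 0 becomes bound, variable 1 stays free.
t = lam(0, app(var(0), var(1)))

# Interpret into strings, printing the remaining free variable by number:
s = t(lambda x: 'fv%d' % x,
      lambda f: '(λb. %s)' % f('b'),
      lambda m, n: '%s %s' % (m, n))
assert s == '(λb. b fv1)'
```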
5 Mixing Computation and Representation
We now go beyond the representation of pure syntax, to embed the computational power of System F in abstract syntax trees. Licata, Zeilberger and Harper [11] define a system based on a proof-theoretic analysis of focusing that allows for a mixing of computational and representational data. Note that the locally higher-order abstract syntax example from the previous section already demonstrates this in action: the ν → α constructor for free variables is computational, in the sense that it can inspect the values it is given.

5.1 Arithmetic Expressions
Our first example is from Licata et al. [11]: the abstract syntax of arithmetic expressions with embedded “semantic” binary operations. Binding structure is introduced into the type by a “let” construct. We make the following definition, assuming some primitive type of integers int:
τA = ∀α. (int → α) → ((int → int → int) → α → α → α) → (α → (α → α) → α) → α

From the type, we have three “constructors”: one to introduce integers into terms, one for terms representing binary operations, with a function expressing the actual operation to perform, and one to handle lets, using the normal higher-order abstract syntax representation for binding. We can write an evaluator for expressions of this type very easily:

eval(t) = t [int] (λx. x) (λf x y. f x y) (λx f. f x)

A de Bruijn-style representation of these arithmetic expressions is given by the following constructors for an indexed type AExp : N → Set:

Num : int → AExp(n)
Binop : (int → int → int) → AExp(n) → AExp(n) → AExp(n)
Let : AExp(n) → AExp(n + 1) → AExp(n)

Again, using the same method as Section 3, we can prove:

Theorem 4. AExp(0) ≅ T⟦τA⟧γ.
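The operational reading of τA and its evaluator can be sketched in Python (illustrative only; the [int] instantiation and the parametricity guarantee are not expressible):

```python
# A τA-style term is a function of the three "constructors" num, binop
# and let_. The evaluator mirrors eval(t) from the text: integers stand
# for themselves, binary operations are applied, let binds a value.

import operator

def eval_aexp(t):
    return t(num=lambda x: x,
             binop=lambda f, x, y: f(x, y),
             let_=lambda x, f: f(x))

# let x = 3 + 4 in x * x
t = lambda num, binop, let_: let_(
        binop(operator.add, num(3), num(4)),
        lambda x: binop(operator.mul, x, x))
assert eval_aexp(t) == 49
```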
5.2 Encapsulated Side-Effects with Dynamic Allocation
The Haskell programming language contains a monad called ST, which is used to represent encapsulated side-effects with dynamic allocation. A simplified version of this monad, with a single type σ of data stored in references, a type ρ of references and a result type τ, is given by the following data: a family of types ST τ σ ρ, with associated monadic return and bind operations, plus three operations:

new_σρ : σ → ST ρ σ ρ
upd_σρ : ρ → σ → ST 1 σ ρ
lkup_σρ : ρ → ST σ σ ρ

corresponding to dynamic allocation of a new memory cell, updating a memory cell, and looking up the value of a memory cell. This monad has an associated function runST : ∀τ.∀σ.(∀ρ. τST(τ, σ, ρ)) → τ that takes a computation and runs it, producing a final result value of type τ. The intention is that the nested quantification over ρ prevents references leaking or entering from outside the computation.

Moggi and Sabry [12] used operational techniques to prove the safety of the full ST monad with typed references. They represent values of the monadic type using a polymorphic type. Simplified to the System F setting with a single type for stored data, this type can be given as:

τST(τ, σ, ρ) = ∀α. (τ → α) → (σ → (ρ → α) → α) → (ρ → σ → α → α) → (ρ → (σ → α) → α) → α
We can make this family of types into a monad with the following definitions:

return_τσρ : τ → τST(τ, σ, ρ)
return_τσρ = λx. Λα. λret new upd lkup. ret x

bind_τ1τ2σρ : τST(τ1, σ, ρ) → (τ1 → τST(τ2, σ, ρ)) → τST(τ2, σ, ρ)
bind_τ1τ2σρ = λc f. Λα. λret new upd lkup. c [α] (λx. f x [α] ret new upd lkup) new upd lkup

Note that, unlike Moggi and Sabry, we have not included a “constructor” in our type to represent bind; it can already be defined from the ret “constructor”. We define the operations of the monad like so:

new_σρ = λs. Λα. λret new upd lkup. new s (λr. ret r)
upd_σρ = λr s. Λα. λret new upd lkup. upd r s (ret ∗)
lkup_σρ = λr. Λα. λret new upd lkup. lkup r (λs. ret s)

Using these combinators we can write programs in monadic style that issue commands to dynamically allocate new memory cells via the new operation, and access them using the upd and lkup operations.

Moggi and Sabry note that (their version of) the type τST(τ, σ, ρ) almost fits the schema for the polymorphic representation of an inductive type in System F, were it not for the negative occurrence of ρ in the new “constructor”. Using the techniques of Section 3, we can show that this type actually does correspond to an inductively defined type using a de Bruijn representation for variables. The appropriate type is given by the following constructors for an indexed type ST(A, S, −) : N → Set, for sets A and S:

Ret : A → ST(A, S, n)
New : S → ST(A, S, n + 1) → ST(A, S, n)
Update : {i : N | i < n} → S → ST(A, S, n) → ST(A, S, n)
Lookup : {i : N | i < n} → (S → ST(A, S, n)) → ST(A, S, n)

Theorem 5. For closed types τ and σ, ST(T⟦τ⟧γ, T⟦σ⟧γ, 0) ≅ T⟦∀ρ.τST(τ, σ, ρ)⟧γ.

An obvious question now is whether this result extends to the case of typed references. Following Moggi and Sabry, we would expect that the Fω type

λτ. ∀ρ : ∗ → ∗. ∀α. (τ → α) → (∀σ. σ → (ρ[σ] → α) → α) → (∀σ. ρ[σ] → σ → α → α) → (∀σ. ρ[σ] → (σ → α) → α) → α
should have a de Bruijn-style representation similar to ST above. However, there is a problem with proceeding naively here. Consider the following program written in this monad (using Haskell's do notation):

do x ← new (λ(). return ())
   upd x (λ(). do {y ← lkup x; y ()})
   y ← lkup x
   y ()

which uses “Landin's knot” to represent a non-terminating computation using mutable references. However, the “obvious” de Bruijn-style type (using a context consisting of lists of types) does not admit the translation of this term.
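The operational content of the simplified (single stored type) ST representation can be sketched in Python: a computation is a function of the four “constructors”, and a run function interprets it against a store. This is an illustrative transcription, not the formal development (the thunking in upd is an artefact of Python's strict evaluation):

```python
# Operational sketch of the simplified ST representation. A computation
# is a function of ret, new, upd, lkup; run_st interprets it against a
# list of storage cells, with references as store indices.

def ret(x):
    return lambda ret_, new_, upd_, lkup_: ret_(x)

def bind(c, f):
    # bind is definable from the representation, as noted in the text.
    return lambda ret_, new_, upd_, lkup_: c(
        lambda x: f(x)(ret_, new_, upd_, lkup_), new_, upd_, lkup_)

def new(s):
    return lambda ret_, new_, upd_, lkup_: new_(s, lambda r: ret_(r))

def upd(r, s):
    # The continuation is thunked because Python evaluates strictly.
    return lambda ret_, new_, upd_, lkup_: upd_(r, s, lambda: ret_(None))

def lkup(r):
    return lambda ret_, new_, upd_, lkup_: lkup_(r, lambda s: ret_(s))

def run_st(c):
    store = []
    def do_new(s, k):
        store.append(s)
        return k(len(store) - 1)        # a reference is a store index
    def do_upd(r, s, rest):
        store[r] = s
        return rest()
    return c(lambda x: x, do_new, do_upd, lambda r, k: k(store[r]))

# Allocate a cell, overwrite it, read it back.
prog = bind(new(10), lambda r:
       bind(upd(r, 42), lambda _:
       lkup(r)))
assert run_st(prog) == 42
```

Because the store is local to run_st, references cannot leak between runs, mirroring (operationally only) the encapsulation that the ∀ρ quantification enforces by typing.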
6 Related Work and Conclusions
Aside from the work of Washburn and Weirich [18], the closest work to ours is that of Rhiger [17], who shows that a higher-order abstract syntax encoding for a single typed object-language is sound and complete in a simply-typed meta-language with a type constructor Exp : ∗ → ∗. We have extended his work by allowing multiple embedded languages. The use of System F also allows the use of iteration constructs to access terms from the outside, as demonstrated by Washburn and Weirich.

Also related is the work of Carette et al. [2]. They use the same method as Rhiger to embed languages inside an existing typed language (OCaml in this case). They abstract over the carrier type and the actual implementations of lam and app, as we do here, but do not make the connection to concrete terms explicit.

It seems obvious, though we have not yet formally proved it, that there is a natural extension of the representation of inductive types in System F as polymorphic types ∀α.(F[α] → α) → α, where α is positive in F, to ones where we allow negative occurrences, and the represented type is some kind of abstract syntax with binding. We leave formulating and proving a general theorem of this kind to future work, but we suspect that it will be a straightforward application of the ideas in Section 3, the key idea being the use of Kripke logical relations.

In future work we also wish to consider more powerful type theories than System F for use as the meta-language. An obvious first step is the use of System Fω, which will allow the use of type parameters to represent object languages with type systems that are subsets of the meta-language type system, although the case of the multi-typed ST monad from Section 5.2 shows that this extension may not be straightforward. Pfenning and Lee [14] have considered the use of Fω as a meta-language, using a form of weak higher-order abstract syntax, but did not prove the close connection between representation and syntax that we have here.
A yet more powerful route may be to consider the combination of dependent types and parametric polymorphism, so that representations of logics in the same style as the Logical Framework approach may be used, combined with powerful ways of computing with them. The work of Izumi [10] on parametricity in dependent types may be useful here.
Acknowledgements. Thanks to Randy Pollack, Sam Staton and Jeremy Yallop for comments on this work. This work was funded by the ReQueST grant (EP/C537068) from the Engineering and Physical Sciences Research Council.
References

1. Aydemir, B.E., Charguéraud, A., Pierce, B.C., Pollack, R., Weirich, S.: Engineering formal metatheory. In: Necula, G.C., Wadler, P. (eds.) POPL, pp. 3–15. ACM Press, New York (2008)
2. Carette, J., Kiselyov, O., Shan, C.-c.: Finally tagless, partially evaluated. In: Shao, Z. (ed.) APLAS 2007. LNCS, vol. 4807, pp. 222–238. Springer, Heidelberg (2007)
3. Chlipala, A.J.: Parametric higher-order abstract syntax for mechanized semantics. In: ICFP, pp. 143–156 (2008)
4. Coquand, T., Huet, G.: Constructions: A higher order proof system for mechanizing mathematics. In: Buchberger, B. (ed.) EUROCAL 1985. LNCS, vol. 203, pp. 151–184. Springer, Heidelberg (1985)
5. de Bruijn, N.G.: Lambda-calculus notation with nameless dummies: a tool for automatic formula manipulation with application to the Church-Rosser theorem. Indag. Math. 34, 381–392 (1972)
6. Despeyroux, J., Felty, A.P., Hirschowitz, A.: Higher-Order Abstract Syntax in Coq. In: Dezani-Ciancaglini, M., Plotkin, G. (eds.) TLCA 1995. LNCS, vol. 902, pp. 124–138. Springer, Heidelberg (1995)
7. Gabbay, M., Pitts, A.M.: A New Approach to Abstract Syntax Involving Binders. In: LICS, pp. 214–224 (1999)
8. Harper, R., Licata, D.R.: Mechanizing metatheory in a logical framework. J. Funct. Program. 17(4-5), 613–673 (2007)
9. Hofmann, M.: Semantical Analysis of Higher-Order Abstract Syntax. In: LICS, pp. 204–213 (1999)
10. Izumi, T.: The Theory of Parametricity in Lambda Cube. Technical Report 1217, RIMS Kokyuroku (2001)
11. Licata, D.R., Zeilberger, N., Harper, R.: Focusing on Binding and Computation. In: LICS, pp. 241–252. IEEE Computer Society, Los Alamitos (2008)
12. Moggi, E., Sabry, A.: Monadic encapsulation of effects: a revised approach (extended version). J. Funct. Program. 11(6), 591–627 (2001)
13. Pfenning, F., Elliott, C.: Higher-Order Abstract Syntax. In: PLDI, pp. 199–208 (1988)
14. Pfenning, F., Lee, P.: Metacircularity in the polymorphic λ-calculus. Theoretical Computer Science 89, 137–159 (1991)
15. Plotkin, G.D.: Lambda-Definability in the Full Type Hierarchy. In: Seldin, J.P., Hindley, J.R. (eds.) To H. B. Curry: Essays on Combinatory Logic, Lambda Calculus and Formalism, pp. 363–373. Academic Press, London (1980)
16. Reynolds, J.C.: Types, Abstraction and Parametric Polymorphism. In: IFIP Congress, pp. 513–523 (1983)
17. Rhiger, M.: A foundation for embedded languages. ACM Trans. Program. Lang. Syst. 25(3), 291–315 (2003)
18. Washburn, G., Weirich, S.: Boxes go bananas: Encoding higher-order abstract syntax with parametric polymorphism. J. Funct. Program. 18(1), 87–140 (2008)
On the Meaning of Logical Completeness Michele Basaldella and Kazushige Terui Research Institute for Mathematical Sciences, Kyoto University, Kitashirakawa Oiwakecho, Sakyo-ku, Kyoto 606-8502, Japan {mbasalde,terui}@kurims.kyoto-u.ac.jp
Abstract. Gödel's completeness theorem is concerned with provability, while Girard's theorem in ludics (as well as full completeness theorems in game semantics) is concerned with proofs. Our purpose is to look for a connection between these two disciplines. Following a previous work [1], we consider an extension of the original ludics with contraction and universal nondeterminism, which play dual roles, in order to capture a polarized fragment of linear logic and thus a constructive variant of classical propositional logic. We then prove a completeness theorem for proofs in this extended setting: for any behaviour (formula) A and any design (proof attempt) P, either P is a proof of A or there is a model M of A⊥ which beats P. Compared with proofs of full completeness in game semantics, ours exhibits a striking similarity with proofs of Gödel's completeness, in that it explicitly constructs a countermodel essentially using König's lemma, proceeds by induction on formulas, and implies an analogue of the Löwenheim-Skolem theorem.
1 Introduction
Gödel's completeness theorem (for first-order classical logic) is one of the most important theorems in logic. It is concerned with a duality (in a naive sense) between proofs and models: for every formula A, either

∃P (P ⊢ A)    or    ∃M (M |= ¬A).
Here P ranges over the set of proofs, M over the class of models, and P ⊢ A reads "P is a proof of A." One can imagine a debate on a general proposition A, where Player tries to justify A by giving a proof and Opponent tries to refute it by giving a countermodel. The completeness theorem states that exactly one of them wins. Actually, the theorem gives us far more insights than stated.

Finite proofs vs infinite models: A very crucial point is that proofs are always finite, while models can be of arbitrary cardinality. Completeness thus implies the Löwenheim-Skolem and compactness theorems, leading to constructions of various nonstandard models.
Supported by JSPS Postdoctoral Fellowship for Foreign Researcher grant.
P.-L. Curien (Ed.): TLCA 2009, LNCS 5608, pp. 50–64, 2009. c Springer-Verlag Berlin Heidelberg 2009
Nondeterministic principles: Any proof of Gödel's completeness theorem relies on a strong nondeterministic principle such as König's or Zorn's lemma, in contrast to the trivial completeness theorem with respect to the class of boolean algebras.

Matching of two inductions: Provability is defined by induction on proofs, while truth is defined by induction on formulas. The two inductions are somehow ascribed to the essence of syntax and semantics, respectively, and the completeness theorem states that they do match.

Unlike the real debate, however, there is no interaction between proofs and models in Gödel's theorem. A more interactive account of completeness is given by Girard's ludics ([12,13]; see [10,4] for good expositions). Ludics is a variant of game semantics, which has the following prominent features.

Monism: Proofs and models are homogeneous entities, called designs.

Existentialism: Behaviours (semantic types) are built from designs, in contrast to ordinary game semantics (e.g., Hyland-Ong [14]) where one begins with the definition of arenas (types) and then proceeds to strategies (proofs).

Normalization as interaction: Designs (hence proofs and models) interact together via normalization. It induces an orthogonality relation between designs in such a way that P⊥M holds if the normalization of P applied to M converges. A behaviour A is defined to be a set of designs which is equal to its biorthogonal (A = A⊥⊥).

In this setting, Girard shows a completeness theorem for proofs [13], which roughly claims that any "winning" design in a behaviour is a proof of it. In view of the interactive definition of behaviour, it can be rephrased as follows: for every (logical) behaviour A and every (proof-like) design P, either
P ⊢ A
or
∃M (M |= A⊥ and M beats P ).
Here, "M |= A⊥" means M ∈ A⊥, and "M beats P" means that P is not orthogonal to M. Hence in case P ⊢ A, we may conclude P ∈ A⊥⊥ = A. Notice that M |= A⊥ no longer entails absolute unprovability of A; it is rather relativized to each P, and there is a real interaction between proofs and models.

Actually, Girard's original ludics is so limited that it corresponds to a polarized fragment of multiplicative additive linear logic, which is too weak to be a stand-alone logical system. As a consequence, one does not observe an opposition between finite proofs and infinite models, since one can always assume that the countermodel M is finite (related to the finite model property for MALL [15]). Indeed, the proof of the above completeness is easy once internal completeness (a form of completeness which does not refer to any proof system [13]) has been proved for each logical connective.

In this paper, we employ a term syntax for designs introduced in [19], and extend Girard's ludics with duplication (contraction) and its dual: universal nondeterminism (see [1] and references therein). Although our term approach disregards some interesting locativity-related phenomena (e.g., normalization as merging of orders and different sorts of tensors [13]), our calculus is easier to
manipulate and closer to the tradition of the λ-, λμ-, λμμ̃-, π-calculi and others. Our resulting framework is as strong as a polarized fragment of linear logic with exponentials ([4]; cf. also [16]), which is in turn as strong as a constructive version of classical propositional logic.

We then prove the completeness theorem above in this extended setting. Our proof exhibits a striking similarity with Schütte's proof of Gödel's completeness theorem [18]. Given a (proof-like) design P which is not a proof of A, we explicitly construct a countermodel M in A⊥ which beats P, essentially using König's lemma. Soundness is proved by induction on proofs, while completeness is proved by induction on types. Thus our theorem gives a matching of the two inductions. Finally, it implies an analogue of the Löwenheim-Skolem theorem, which well illustrates the opposition between finite proofs and infinite models.

In game semantics, one finds a number of similar full completeness results. However, the connection with Gödel's completeness seems less conspicuous than ours. Typically, "winning" strategies in Hyland-Ong games most naturally correspond to Böhm trees, which can be infinite (cf. [5]). Thus, in contrast to our result, one has to impose finiteness/compactness on strategies in an external and noninteractive way, in order to have a correspondence with finite λ-terms. Although this is also the case in [1], we show that such a finiteness assumption is not needed in ludics.
2 Designs

2.1 Syntax
We first recall the term syntax for deterministic designs introduced by the second author [19]. We employ a process calculus notation inspired by the close relationship between ludics and the linear π-calculus [11]. Designs are built over a given signature A = (A, ar), where A is a set of names a, b, c, . . . and ar : A → ℕ is a function which assigns to each name a its arity ar(a). Let V be a countable set of variables V = {x, y, z, . . .}. Over a fixed signature A, a (proper) positive action is a with a ∈ A, and a (proper) negative action is a(x1, . . . , xn), where the variables x1, . . . , xn are distinct and ar(a) = n. In the sequel, we assume that an expression of the form a(x) always stands for a negative action.

The positive (resp. negative) deterministic designs P (resp. N) are coinductively generated by the following grammar:

P ::= Ω | ✠ | N0|a⟨N1, . . . , Nn⟩,    N ::= x | ∑a(x).Pa,

where ar(a) = n and x = x1, . . . , xn. Intuitively, designs may be considered as infinitary λ-terms with named applications and superimposed abstractions. Specifically, a positive design N0|a⟨N1, . . . , Nn⟩ can be thought of as an iterated application N0 N1 · · · Nn of name a ∈ A, and a(x).Pa as an iterated abstraction λx.Pa of name a ∈ A. A family {a(x).Pa}a∈A of abstractions indexed by A is
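As a purely illustrative aid to reading the grammar, the deterministic fragment can be transcribed as a small datatype. The sketch below is ours, not part of the paper: the signature SIG, the names a, b and the helper well_formed are invented for the example, and the coinductive (possibly infinite) designs of the paper are represented only up to finite approximations.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical signature: each name mapped to its arity ar(a).
SIG = {"a": 1, "b": 2}

@dataclass(frozen=True)
class Omega:                 # the divergent positive design Ω
    pass

@dataclass(frozen=True)
class Daimon:                # the daimon ✠ (immediate convergence)
    pass

@dataclass(frozen=True)
class App:                   # positive predesign  N0 | a<N1, ..., Nn>
    head: object             # a negative design N0
    name: str
    args: tuple              # negative designs, len == ar(name)

@dataclass(frozen=True)
class Var:                   # negative design: a variable x (an identity)
    x: str

@dataclass(frozen=True)
class Abs:                   # negative design: superimposition Σ a(x).Pa
    comps: Dict[str, Tuple[Tuple[str, ...], object]]
    # name -> (bound variables, positive body); absent names read as Ω,
    # mimicking the partial-sum notation a(x).Pa + b(y).Pb + ...

def well_formed(d) -> bool:
    """Check that every application respects the signature's arities."""
    if isinstance(d, (Omega, Daimon, Var)):
        return True
    if isinstance(d, App):
        return (d.name in SIG and len(d.args) == SIG[d.name]
                and well_formed(d.head) and all(well_formed(n) for n in d.args))
    if isinstance(d, Abs):
        return all(name in SIG and len(xs) == SIG[name] and well_formed(p)
                   for name, (xs, p) in d.comps.items())
    return False
```

For instance, the design (a(x).✠)|a⟨w⟩ is rendered as App(Abs({"a": (("x",), Daimon())}), "a", (Var("w"),)).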
then superimposed to form a negative design ∑a(x).Pa. Each a(x).Pa is called its component. The reduction rule for designs conforms to this intuition:

(∑a(x).Pa)|b⟨N1, . . . , Nn⟩ −→ Pb[N1/x1, . . . , Nn/xn].

Namely, when the application is of name b, one picks the component b(x).Pb from {a(x).Pa}a∈A and applies β-reduction. Notice that any closed positive design P (i.e., a positive design without free variables) has one of the following forms: ✠, Ω and (∑a(x).Pa)|a⟨N1, . . . , Nn⟩. The last design reduces to another closed one. Hence P eventually reduces to ✠, or to Ω, or diverges. By stipulating that the normal form of P in the last case is Ω, we obtain a dichotomy between ✠ and Ω: the normal form of a closed positive design is either ✠ or Ω. As we shall see, this induces an orthogonality relation between positive and negative designs.

We also use Ω to encode partial sums. Given a set α = {a(x), b(y), . . . } of negative actions, we write a(x).Pa + b(y).Pb + · · · to denote the negative design ∑a(x).Ra, where Ra = Pa if a(x) ∈ α, and Ra = Ω otherwise.

Although [19] mainly deals with linear designs, there is no difficulty in dealing with nonlinear ones. To obtain completeness, however, we also need to incorporate the dual of nonlinearity, that is, universal nondeterminism [1]. It is reminiscent of differential linear logic [8], which has nondeterministic sum as the dual of contraction; the duality is essential for the separation property [17] (see also [7] for separation of Böhm trees). It is also similar to the situation in Hyland-Ong game semantics [14], where nonlinear strategies for Player may contain a play in which Opponent behaves noninnocently; Opponent's noninnocence is again essential for full completeness.

Definition 1 (Designs). For a fixed signature A, a positive (resp. negative) design P (resp. N) is a coinductively defined term given as follows:

P ::= Ω | ⋀_{i∈I} Qi    (positive designs)
Q ::= N0|a⟨N1, . . . , Nn⟩    (predesigns)
N ::= x | ∑a(x).Pa    (negative designs),

where ⋀_{i∈I} Qi is built from a family {Qi}_{i∈I} of predesigns with I an arbitrary index set. We indicate positive designs by P, Q, . . ., negative designs by N, M, . . ., and arbitrary ones by D, E, . . .. Any subterm E of D is called a subdesign of D.

A design D may contain free and bound variables. An occurrence of subterm a(x).Pa binds the free variables x in Pa. Variables which are not under the scope of the binder a(x) are free. We denote by fv(D) the set of free variables occurring in D. In analogy with the λ-calculus, we always consider designs up to α-equivalence, that is, up to renaming of bound variables (see [19] for further details). We also identify designs which only differ in indexing: ⋀_{i∈I} Pi = ⋀_{j∈J} Qj if there is a bijection σ : I → J such that Pi = Q_{σ(i)} for every i ∈ I.

The daimon ✠ is now defined to be the empty conjunction ⋀_{i∈∅} Qi. A unary conjunction ⋀_{i∈{0}} Qi is simply written as Q0. Furthermore, the conjunction operator can be extended to positive and negative designs: for I, J disjoint sets of indices,
⋀_{i∈I} Qi ∧ ⋀_{i∈J} Qi = ⋀_{i∈I∪J} Qi,    Ω ∧ P = Ω,
∑a(x).Pa ∧ ∑a(x).Qa = ∑a(x).(Pa ∧ Qa),    x ∧ N = undefined.
In particular, we have P ∧ ✠ = P, in contrast to [1], which distinguishes the two. By the above convention, the conjunction of two positive designs is always defined.

A cut is a predesign of the form (∑a(x).Pa)|a⟨N1, . . . , Nn⟩. Otherwise, a predesign is of the form x|a⟨N1, . . . , Nn⟩ and is called a head normal form. The head variable x in the predesign above plays the same role as a pointer in a strategy does in Hyland-Ong games and an address (or locus) in Girard's ludics. On the other hand, a variable x occurring in a bracket (as in N0|a⟨N1, . . . , Ni−1, x, Ni+1, . . . , Nn⟩) corresponds neither to a pointer nor to an address. Rather, it corresponds to an identity axiom (initial sequent) in sequent calculus, and for this reason it is called an identity. If a negative design N simply consists of a variable x, then N is itself an identity. A design D is said:

– total, if D ≠ Ω;
– linear (or affine), if for any subdesign of the form N0|a⟨N1, . . . , Nn⟩, the sets fv(N0), . . . , fv(Nn) are pairwise disjoint;
– deterministic, if in any occurrence of a subdesign ⋀_{i∈I} Qi, I is either empty (and hence ⋀_{i∈I} Qi = ✠) or a singleton.

Example 1 (Girard's syntax). Girard's original designs [13] can be expressed in our syntax by taking the signature G = (P_fin(ℕ), |·|), where P_fin(ℕ) consists of the finite subsets of ℕ and |·| is the function that gives the cardinality of each I ∈ P_fin(ℕ). Girard's designs correspond to total, deterministic, linear, cut-free and identity-free designs over the signature G. See [19] for more details.

2.2 Normalization
Ludics is an interactive theory. This means that designs, which subsume both proofs and models, interact together via normalization, and types (behaviours) are defined by the induced orthogonality relation. Several ways to normalize designs have been considered in the literature: abstract machines [3,9,6,1], abstract merging of orders [13], and term reduction [19]. Here we extend the last solution. As in the pure λ-calculus, normalization is not necessarily terminating, but in our setting a new difficulty arises: the operator ⋀. We define the normal forms in two steps, first giving a reduction rule which finds conjunctions of head normal forms whenever possible, and then expanding it corecursively. As usual, let D[N/x] denote the simultaneous and capture-free substitution of N = N1, . . . , Nn for x = x1, . . . , xn in D.

Definition 2 (Reduction relation −→). The reduction relation −→ is defined over the set of positive designs as follows:

Ω −→ Ω,    Q ∧ (∑a(x).Pa)|b⟨N⟩ −→ Q ∧ Pb[N/x].

We denote by −→∗ the transitive closure of −→.
Example 2. Let N = a(x).✠ + b(x).(x|a⟨x⟩ ∧ x|b⟨y⟩). Then:

N|a⟨w⟩ −→ ✠,    N|b⟨w⟩ −→ w|a⟨w⟩ ∧ w|b⟨y⟩,    N|c⟨w⟩ −→ Ω.

Given two positive designs P, Q, we write P ⇓ Q and read "P converges to Q" if P −→∗ Q and Q is a conjunction of head normal forms (including the case Q = ✠). We write P ⇑ and read "P diverges" otherwise (typically when P −→∗ Ω).

Notice that the above reduction relation is completely deterministic. Alternatively, a nondeterministic one can be defined over predesigns and Ω as follows. Given predesigns P0, R0 (which can also be seen as positive designs ⋀_{i∈{0}} P0, ⋀_{i∈{0}} R0), we write P0 ⇝ R0 if P0 −→ Q ∧ R0 for some positive design Q. We also write Ω ⇝ Ω, and P0 ⇝ Ω if P0 −→ Ω. Then it is easy to see that P0 converges if and only if all nondeterministic reduction sequences from P0 are finite. Thus our nondeterminism is universal rather than existential.

Definition 3 (Normal form). The normal form function ⟦·⟧ : D −→ D is defined by corecursion as follows:

⟦P⟧ = ⋀_{i∈I} xi|ai⟨⟦Ni⟧⟩    if P ⇓ ⋀_{i∈I} xi|ai⟨Ni⟩;
⟦P⟧ = Ω    if P ⇑;
⟦∑a(x).Pa⟧ = ∑a(x).⟦Pa⟧;    ⟦x⟧ = x.

Notice that the dichotomy in the closed case is maintained: for any closed positive design P, ⟦P⟧ is either ✠ or Ω.

Theorem 1 (Associativity). ⟦D[N1/x1, . . . , Nn/xn]⟧ = ⟦⟦D⟧[⟦N1⟧/x1, . . . , ⟦Nn⟧/xn]⟧.
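To make the reduction relation and the ✠/Ω dichotomy concrete, here is a small executable sketch of the deterministic fragment (no conjunctions, hence no universal nondeterminism). It is our illustration, not the authors' code: names and arities are invented, substitution assumes the usual α-convention (bound variables chosen fresh), and normalization is fuel-bounded since, as noted above, it need not terminate.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Omega: pass                  # the divergent design Ω

@dataclass(frozen=True)
class Daimon: pass                 # the daimon ✠

@dataclass(frozen=True)
class App:                         # positive predesign N0 | a<N1, ..., Nn>
    head: object
    name: str
    args: tuple

@dataclass(frozen=True)
class Var:                         # negative design: a variable x
    x: str

@dataclass(frozen=True)
class Abs:                         # superimposed abstractions Σ a(x).Pa
    comps: dict                    # name -> (bound variables, positive body)

def subst_p(p, env):
    """Substitute negative designs for free variables in a positive design."""
    if isinstance(p, (Omega, Daimon)):
        return p
    return App(subst_n(p.head, env), p.name,
               tuple(subst_n(n, env) for n in p.args))

def subst_n(n, env):
    if isinstance(n, Var):
        return env.get(n.x, n)
    # α-convention: bound variables are assumed fresh, so no capture occurs
    return Abs({a: (xs, subst_p(body, {v: d for v, d in env.items() if v not in xs}))
                for a, (xs, body) in n.comps.items()})

def step(p):
    """One β-step: (Σ a(x).Pa)|b<N...> -> Pb[N/x]; a missing component yields Ω."""
    if isinstance(p, App) and isinstance(p.head, Abs):
        if p.name not in p.head.comps:
            return Omega()
        xs, body = p.head.comps[p.name]
        return subst_p(body, dict(zip(xs, p.args)))
    return None                    # Ω, daimon, or head normal form: no step

def normalize(p, fuel=100):
    """Iterate step; returns the normal form, or None if fuel runs out."""
    for _ in range(fuel):
        q = step(p)
        if q is None:
            return p
        p = q
    return None
```

With N = a(x).✠ + b(x).x|a⟨x⟩ (a deterministic cousin of Example 2), N|a⟨w⟩ normalizes to ✠ and N|c⟨w⟩ to Ω (the c-component is the implicit Ω of the partial sum), while a self-applying design loops forever, which the fuel bound reports as None.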
3 Behaviours

3.1 Orthogonality
In the rest of this work, we restrict ourselves to a special subclass of designs: namely, we consider only total, cut-free and identity-free designs. The restriction to identity-free designs is not a serious limitation, since identities can be replaced by suitable infinitary identity-free designs (namely, their infinite η-expansions, called faxes in [13]). A proof of this fact is given in [19]. Since we work in a cut-free setting, we can simplify our notation: we often identify an expression like D[N/x] with its normal form ⟦D[N/x]⟧. Thus, we improperly write D[N/x] = E rather than ⟦D[N/x]⟧ = E.

Definition 4 (Orthogonality). A positive design P is closed if fv(P) = ∅, and atomic if fv(P) ⊆ {x0} for a certain fixed variable x0. A negative design N is atomic if fv(N) = ∅. Two atomic designs P, N of opposite polarities are said to be orthogonal, written P⊥N, when P[N/x0] = ✠. If X is a set of atomic designs of the same polarity, then its orthogonal set, denoted by X⊥, is defined by X⊥ := {E : ∀D ∈ X, D⊥E}.
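Orthogonality, too, can be illustrated by a small executable sketch (ours, not the paper's; deterministic fragment only, with invented names): P⊥N is tested by substituting N for x0 in P and normalizing. Since normalization need not terminate, the check is fuel-bounded, so a False answer may mean either genuine non-orthogonality or an exhausted bound — a modest computational echo of the semi-decidability issues behind the finite-proofs/infinite-models opposition.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Omega: pass                  # Ω

@dataclass(frozen=True)
class Daimon: pass                 # ✠

@dataclass(frozen=True)
class App:                         # N0 | a<N1, ..., Nn>
    head: object
    name: str
    args: tuple

@dataclass(frozen=True)
class Var:
    x: str

@dataclass(frozen=True)
class Abs:                         # name -> (bound variables, positive body)
    comps: dict

def subst_p(p, env):
    if isinstance(p, (Omega, Daimon)):
        return p
    return App(subst_n(p.head, env), p.name,
               tuple(subst_n(n, env) for n in p.args))

def subst_n(n, env):
    if isinstance(n, Var):
        return env.get(n.x, n)
    # bound variables assumed fresh (α-convention), so no capture occurs
    return Abs({a: (xs, subst_p(b, {v: d for v, d in env.items() if v not in xs}))
                for a, (xs, b) in n.comps.items()})

def normalize(p, fuel=100):
    for _ in range(fuel):
        if isinstance(p, App) and isinstance(p.head, Abs):
            if p.name not in p.head.comps:
                p = Omega()
            else:
                xs, body = p.head.comps[p.name]
                p = subst_p(body, dict(zip(xs, p.args)))
        else:
            return p               # Ω, ✠, or head normal form
    return None

def orthogonal(p, n, fuel=100):
    """P ⊥ N iff the closed design P[N/x0] normalizes to the daimon.
    Fuel-bounded: only a partial test, as normalization may diverge."""
    return normalize(subst_p(p, {"x0": n}), fuel) == Daimon()
```

For example, with a 0-ary name end, the atomic design P = x0|end⟨⟩ is orthogonal to end().✠ (the interaction immediately reaches ✠), but not to the empty superimposition, whose end-component is Ω.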
The meaning of ✠ can be clarified in terms of orthogonality. For designs D, E of the same polarity, define D ⊑ E iff {D}⊥ ⊆ {E}⊥. D ⊑ E means that E has more chances of convergence than D when interacting with other designs. The following is easy to observe.

Proposition 1. ⊑ is a preorder. Moreover, we have D ∧ E ⊑ D and D ⊑ D ∧ D for any designs D, E of the same polarity. In particular, P = P ∧ ✠ for any positive design P.

This justifies our identification of ✠ with the empty conjunction ⋀_{i∈∅} Qi. Although possible, we do not define orthogonality for nonatomic designs. Accordingly, we only consider atomic behaviours, which consist of atomic designs.

Definition 5 (Behaviour). An (atomic) behaviour X is a set of atomic designs of the same polarity such that X⊥⊥ = X. A behaviour is positive or negative according to the polarity of its designs. We denote positive behaviours by P, Q, R, . . . and negative behaviours by N, M, K, . . . .

There are least and greatest behaviours among all positive (resp. negative) behaviours with respect to set inclusion:

0+ := {✠},    ⊤+ := (0−)⊥,
0− := {✠−},    ⊤− := (0+)⊥,

where ✠− = ∑a(x).✠.
We now introduce contexts of behaviours, which correspond to the sequents of behaviours of [13].

Definition 6 (Contexts of behaviours). A positive context Γ is of the form x1 : P1, . . . , xn : Pn, where x1, . . . , xn are distinct variables and P1, . . . , Pn are (atomic) positive behaviours. We denote by fv(Γ) the set {x1, . . . , xn}. A negative context Γ, N is a positive context Γ enriched with an (atomic) negative behaviour N, to which no variable is associated. We define:

– P |= x1 : P1, . . . , xn : Pn if fv(P) ⊆ {x1, . . . , xn} and P[N1/x1, . . . , Nn/xn] = ✠ for any N1 ∈ P1⊥, . . . , Nn ∈ Pn⊥.
– N |= x1 : P1, . . . , xn : Pn, N if fv(N) ⊆ {x1, . . . , xn} and P[N[N1/x1, . . . , Nn/xn]/x0] = ✠ for any N1 ∈ P1⊥, . . . , Nn ∈ Pn⊥, P ∈ N⊥.

Clearly, N |= N iff N ∈ N, and P |= y : P iff P[x0/y] ∈ P. Furthermore, associativity (Theorem 1) implies the following quite useful principle:

Lemma 1 (Closure principle). P |= Γ, x : P if and only if P[N/x] |= Γ for any N ∈ P⊥. N |= Γ, N if and only if P[N/x0] |= Γ for any P ∈ N⊥.

3.2 Logical Connectives
We next describe how behaviours are built by means of logical connectives in ludics. Let us assume that the set of variables V is equipped with a fixed linear order x0 , x1 , x2 . . ..
Definition 7 (Logical connectives). An n-ary logical connective α is a finite set of negative actions a1(x1), . . . , am(xm) such that the names a1, . . . , am are distinct and the variables x1, . . . , xm are taken from {x1, . . . , xn}. Given a name a, an n-ary logical connective α and behaviours N1, . . . , Nn, P1, . . . , Pn, we define:

a⟨N1, . . . , Nm⟩ := {x0|a⟨N1, . . . , Nm⟩ : N1 ∈ N1, . . . , Nm ∈ Nm},
α⟨N1, . . . , Nn⟩ := (⋃_{a(x)∈α} a⟨N_{i1}, . . . , N_{im}⟩)⊥⊥,
α(P1, . . . , Pn) := (α⟨P1⊥, . . . , Pn⊥⟩)⊥,

where the indices i1, . . . , im ∈ {1, . . . , n} are determined by the vector x = x_{i1}, . . . , x_{im} given for each a(x) ∈ α.

In terms of linear logic, the cardinality of the connective α corresponds to the additive arity, while the arity of each name corresponds to the multiplicative arity.

Example 3 (Linear logic connectives). The usual linear logic connectives can be defined by the logical connectives ⅋, &, ↑, ⊥, ⊤ below; we also give some shorthand notations for readability.
⅋ := {℘(x1, x2)},    & := {π1(x1), π2(x2)},    ↑ := {↑(x1)},    ⊥ := {∗},    ⊤ := ∅,
⊗ := ⅋,    ⊕ := &,    ↓ := ↑,    1 := ⊥,    0 := ⊤,

where ∗ is a 0-ary name; when used in positive position we write • for ℘, ιi for πi and ↓ for ↑.

We do not have exponentials here, because we are working in a nonlinear setting, so that they are already incorporated into the connectives. With these logical connectives we can build (semantic versions of) the usual linear logic types (we use infix notations such as N ⊗ M rather than the prefix ⊗⟨N, M⟩):

N ⊗ M = •⟨N, M⟩⊥⊥,    P ⅋ Q = (•⟨P⊥, Q⊥⟩)⊥,
N ⊕ M = (ι1⟨N⟩ ∪ ι2⟨M⟩)⊥⊥,    P & Q = (ι1⟨P⊥⟩)⊥ ∩ (ι2⟨Q⊥⟩)⊥,
↓N = ↓⟨N⟩⊥⊥,    ↑P = (↓⟨P⊥⟩)⊥,
1 = {x0|∗⟨⟩}⊥⊥,    ⊥ = 1⊥,
0 = ∅⊥⊥,    ⊤ = 0⊥.
The next theorem illustrates a special feature of behaviours defined by logical connectives. It also suggests that nonlinearity and universal nondeterminism play dual roles.

Theorem 2. Let P be an arbitrary positive behaviour.
1. P |= x1 : P, x2 : P =⇒ P[x0/x1, x0/x2] ∈ P.
2. N ∧ M ∈ P⊥ =⇒ N ∈ P⊥ and M ∈ P⊥.
Moreover, if P = α⟨N1, . . . , Nn⟩, the converses of 1 (duplicability) and 2 (closure under ∧) hold.
Proof. 1. Let N ∈ P⊥. Then P[N/x1, N/x2] = ✠ by assumption. Hence P[x0/x1, x0/x2][N/x0] = ✠, and so P[x0/x1, x0/x2] ∈ P⊥⊥ = P.
2. Because of N ∧ M ⊑ N, M (Proposition 1).
Closure under ∧. Let N, M ∈ P⊥ = α⟨N1, . . . , Nn⟩⊥. To prove N ∧ M ∈ P⊥, it is sufficient to show that N ∧ M is orthogonal to any x0|a⟨K⟩ ∈ ⋃_{a(x)∈α} a⟨N_{i1}, . . . , N_{im}⟩. But since x0 occurs only once, at the head position, it boils down to (N ∧ M)|a⟨K⟩ = ✠, which is an easy consequence of N|a⟨K⟩ = ✠ and M|a⟨K⟩ = ✠.
Duplicability. Let P[x0/x1, x0/x2] ∈ P = α⟨N1, . . . , Nn⟩. It suffices to show that P[N/x1, M/x2] = ✠ holds for any N, M ∈ P⊥. But we have just proven that N ∧ M ∈ P⊥, and so P[x0/x1, x0/x2][N ∧ M/x0] = P[N ∧ M/x1, N ∧ M/x2] = ✠. Since N ∧ M ⊑ N, M by Proposition 1, we have P[N/x1, M/x2] = ✠.

Remark 1. Theorem 2 can be considered as an (internal, monistic) form of soundness and completeness for the contraction rule: soundness corresponds to point 1, whereas completeness corresponds to its converse (duplicability).

3.3 Internal Completeness
In [13], Girard proposes a purely monistic, local notion of completeness, called internal completeness. It means that we can give a precise and direct description of the elements of behaviours (built by logical connectives) without using the orthogonality and without referring to any proof system. Negative logical connectives easily enjoy internal completeness:

Theorem 3 (Internal completeness (negative case)).

α(P1, . . . , Pn) = {∑a(x).Pa : Pa |= x_{i1} : P_{i1}, . . . , x_{im} : P_{im} for every a(x) ∈ α},

where the indices i1, . . . , im are determined by the vector x = x_{i1}, . . . , x_{im}.

In the above, Pb can be arbitrary when b(x) ∉ α. Thus our approach is "immaterial" in that we do not consider incarnations and material designs. For example, we have

P & Q = {π1(x1).P + π2(x2).Q + · · · : P |= x1 : P and Q |= x2 : Q}
      = {π1(x0).P + π2(x0).Q + · · · : P ∈ P and Q ∈ Q},

where the irrelevant components of the sum are suppressed by "· · · ." Up to incarnation (i.e., removal of the irrelevant part), P & Q, which has been defined by intersection, is isomorphic to the cartesian product of P and Q: a phenomenon called the mystery of incarnation in [13].

As to positive connectives, [13] proves internal completeness theorems for additive and multiplicative ones separately, in the linear and deterministic setting. They are integrated in [19] as follows:

Theorem 4 (Internal completeness (linear, positive case)). When the universe of designs is restricted to linear and deterministic ones, we have
α⟨N1, . . . , Nn⟩ = ⋃_{a(x)∈α} a⟨N_{i1}, . . . , N_{im}⟩ ∪ {✠}.
However, this is no longer true with nonlinear designs. A counterexample is given below.

Example 4. Consider the behaviour P := ↓↑(0+) = (↓⟨↑(0+)⟩)⊥⊥. By construction, the design P0 := x0|↓⟨↑(x1).✠⟩ belongs to P, but then also any design of the form Pn+1 := x0|↓⟨↑(x1).Pn⟩ belongs to P. To see this, note that any N = ∑a(x).Pa ∈ P⊥ has a component of the form ↑(y).y|↓⟨M⟩ with M arbitrary (more precisely, ↑(y).⋀_{i∈I} y|↓⟨Mi⟩ for some I, with the Mi arbitrary). Hence we have

Pn+1[N/x0] = N|↓⟨↑(x1).Pn[N/x0]⟩ = (↑(x1).Pn[N/x0])|↓⟨M⟩ = Pn[N/x0];    P0[N/x0] = ✠.

This proves Pn+1 ∈ P. However, Pn+1 ∉ ↓⟨↑(0+)⟩, since ↑(x1).Pn is not atomic and so cannot belong to ↑(0+).

This motivates us to prove completeness for proofs directly, rather than deriving it from internal completeness as in the original work [13]; internal completeness for positives will be further discussed in our subsequent work. In [1] a weaker form of internal completeness is proved, which is enough to derive a weaker form of full completeness: all finite "winning" designs are interpretations of proofs. While such a finiteness assumption is quite common in game semantics, we will show that it can be avoided in ludics.
4 Proof System and Completeness for Proofs

4.1 Proof System
We will now introduce a proof system. In our system, logical rules are automatically generated by logical connectives. Since the set of logical connectives varies for each signature A, our proof system is parameterized by A. If one chooses A rich enough, the constant-only fragment of polarized linear logic ([4]; cf. also [16]) can be embedded. In the sequel, we focus on logical behaviours, which are composed by using logical connectives only.

Definition 8 (Logical behaviours). A behaviour is logical if it is inductively built as follows (α denotes an arbitrary logical connective):
N := α(P1 , . . . , Pn ).
Notice that the orthogonal of a logical behaviour is again logical. As advocated in the introduction, our monistic framework renders both proofs and models as homogeneous objects: designs. Definition 9 (Proofs, Models). A proof is a design in which all the conjunctions are unary. In other words, a proof is a deterministic and -free design. A model is an atomic linear design (in which conjunctions of arbitrary cardinality may occur).
Given a design D, let ac+(D) be the set of occurrences of proper positive actions a in D. The cardinality of D is defined to be the cardinality of ac+(D). Notice that a proof in the above sense can be infinite, so it might not "prove" anything. Hence it might better be called a "proof attempt" or an "untyped proof."

A positive (resp. negative) sequent is of the form P ⊢ Γ (resp. N ⊢ Γ, N), where P is a positive proof (resp. N is a negative proof) and Γ is a positive context (see Definition 6) of logical behaviours such that fv(P) ⊆ fv(Γ) (resp. fv(N) ⊆ fv(Γ)). Intuitively, a sequent D ⊢ Γ should be understood as a claim that D is a proof of Γ, or that D is of type Γ. Our proof system consists of three kinds of inference rules: positive (α, a), negative (α), and cut.
...
Mm Γ, Nim (z : αN1 , . . . , Nn ∈ Γ) (α, a) z|aM1 , . . . , Mm Γ
a }a(x)∈α {Pa Γ, x : P a(x).Pa Γ, α(P1 , . . . , Pn )
(α)
P Γ, z : P N Δ, P⊥ (cut) P [N/z] Γ, Δ
with the proviso: – In the rule (α, a), a(x) ∈ α, x = xi1 , . . . , xim , and i1 , . . . , im ∈ {1, . . . , n}. a stands for xi1 : Pi1 , . . . , xim : Pim . A component b(y ).Pb – In (α), x : P of a(xa ).Pa can be arbitrary when b(y ) ∈ α. Hence we again take an “immaterial” approach. It is also possible to adopt a “material” approach by requiring Pb = Ω when b(y) ∈ α. Then a proof D is finite (i.e., ac+ (D) is a finite set) whenever D Γ is derivable for some Γ. Thus, as in ordinary sequent calculi, our proof system accepts only essentially finite proofs for derivable sequents (i.e., finite up to removal of irrelevant part). For linear logic connectives, the positive and negative rules specialize to the following (taking the “material” approach):
M Γ, Ni (z : N1 ⊕ N2 ∈ Γ) (⊕, ιi ) z|ιi M Γ (z : 1 ∈ Γ) (1) z|∗ Γ
4.2
P Γ, x1 : P1 , x2 : P2 ( ℘(x1 , x2 ).P Γ, P1 P2 &
M2 Γ, N2 (z : N1 ⊗ N2 ∈ Γ) (⊗, •) z| • M1 , M2 Γ
&
M1 Γ, N1
)
P1 Γ, x1 : P1 P2 Γ, x2 : P2 (&) π1 (x1 ).P1 + π2 (x2 ).P2 Γ, P1 & P2
P Γ (⊥ ⊥) ∗.P Γ, ⊥
a( x).Ω Γ,
(
)
Completeness for Proofs
We now prove soundness and completeness for proofs. In the statements of the theorems below, "D ⊢ Γ" means that the sequent D ⊢ Γ is derivable in our proof system.
Theorem 5 (Soundness). D ⊢ Γ =⇒ D |= Γ.

Proof. By induction on the derivation of D ⊢ Γ, using Lemma 1 (the closure principle) and Theorem 2 (1).

Theorem 6 (Completeness for proofs). For every positive logical behaviour P and every proof P (see Definition 9), P |= x : P =⇒ P ⊢ x : P. Similarly for the negative case.

The proof below is analogous to Schütte's proof of Gödel's completeness theorem [18], which proceeds as follows:
1. Given an unprovable sequent P, find an open branch in the cut-free proof search tree.
2. From the open branch, build a countermodel M in which P is false.

We can naturally adapt 1 to our setting, since the bottom-up cut-free proof search in our proof system is deterministic, in the sense that at most one rule applies at each step. Moreover, it never gets stuck at a negative sequent, since a negative rule is always applicable bottom-up.

Suppose now that P ⊢ x : P does not have a derivation. Our goal is to build a model c(Px) ∈ P⊥ such that P is not orthogonal to c(Px). By König's lemma, there exists a branch in the cut-free proof search tree,

⋮
N1 ⊢ Ξ1
P1 ⊢ Θ1
N0 ⊢ Ξ0
P0 ⊢ Θ0,

with P0 = P and Θ0 = x : P, which is either finite, with a topmost sequent Pmax ⊢ Θmax (max ∈ ℕ) to which no rule applies any more, or infinite. In the latter case, we set max = ∞. Without loss of generality, we assume that each variable is associated with at most one behaviour: if x : P and x : Q occur in the branch, then P = Q (an assumption needed for Lemma 2 (2)).

We first consider the former case (max < ∞) and illustrate how to build a model c(Pi) for 0 ≤ i ≤ max by means of concrete examples. The construction proceeds by downward induction from max to 0. (i) When Pmax = Ω, let c(Pmax) = ✠− (= ∑a(x).✠). (ii) Suppose for instance that Pmax ⊢ Θmax is of the form z|a⟨M⟩ ⊢ Γ with z : M ⊗ K in Γ but a ≠ •, so that the proof search gets stuck. Then let c(Pmax) = ℘(xl, xr).✠.
(iii) Suppose that we have constructed c(Pj) for i + 1 ≤ j ≤ max, and the relevant part of the branch

⋮
Pi+1 ⊢ Θi+1
Ni ⊢ Ξi
Pi ⊢ Θi
⋮

is of the form

⋮
Pi+1 ⊢ Γ, x : P, y : Q
℘(x, y).Pi+1 ⊢ Γ, P ⅋ Q
z|•⟨℘(x, y).Pi+1, M⟩ ⊢ Γ
⋮

where Γ contains z : (P ⅋ Q) ⊗ M. Let:

c(Px) = ⋀{c(Pj) : i < j ≤ max, Pj has head variable x},
c(Pi) = ℘(xl, xr).xl|•⟨c(Px), c(Py)⟩.

Here, c(Pi) begins with ℘(xl, xr).xl rather than ℘(xl, xr).xr, because the branch goes up in the left direction, choosing the left subformula P ⅋ Q. When none of the Pj (i < j ≤ max) has head variable x, we set c(Px) = ✠−.

Next consider the case max = ∞. We first define cn(Pi) for every n, i < ∞. Let cn(Pi) = ✠− for i > n. For 0 ≤ i ≤ n, we build cn(Pi) by downward induction on i from n to 0, using (iii) above. As n → ∞, each cn(Pi) grows, in the sense that each conjunction obtains more and more conjuncts. This allows us to define c(Pi) for each i by taking the "limit" lim_{n→∞} cn(Pi), which is, roughly speaking, the "union" ⋃_{n<∞} cn(Pi) (cf. [19] for the union of designs). c(Px) is defined similarly for each variable x.

Observe that each c(Pi) and c(Px) thus constructed is surely a model, i.e., an atomic linear design. Theorem 6 is a direct consequence of the following two lemmas.
Lemma 2. For Pi Θi appearing in the branch, suppose that the head variable of Pi is z and z : R ∈ Θi . Then (1) c(Pi ) ∈ R⊥ , and (2) c(Pz ) ∈ R⊥ . Proof. By induction on R. (1) When i = max and the case (ii) applies, we have ℘(xl , xr ). ∈ (M ⊗ K)⊥ by internal completeness for negatives (Theorem 3). Suppose that the case (iii) applies to Pi Θi . Then c(Pi ) = ℘(xl , xr ).xl | • c(Px ), c(Py ) . By induction hypothesis (2), we have c(Px ) ∈ P⊥ and c(Py ) ∈ Q⊥ . Hence x0 | • c(Px ), c(Py ) ∈ P⊥ ⊗ Q⊥ = (P Q)⊥ . Since xl , xr are not free in c(Px ), c(Py ), we have xl | • c(Px ), c(Py ) |= xl : (P Q)⊥ , xr : M⊥ . Hence by Theorem 3, ℘(xl , xr ).xl | • c(Px ), c(Py ) ∈ (P Q)⊥ M⊥ = ((P Q) ⊗ M)⊥ . (2) Follows from (1) since R⊥ is closed under (Theorem 2). &
Lemma 3. Suppose that the head variable of P_0 is x. Then we have P_0 ⊥ c(P_x).
Proof. We first prove that there is a nondeterministic reduction sequence

    P_i[c(P_{v_1})/v_1, …, c(P_{v_m})/v_m] ⟶∗ P_{i+1}[c(P_{w_1})/w_1, …, c(P_{w_n})/w_n]

for any i < max, where the v's and w's are the free variables of P_i and P_{i+1}, respectively. Suppose that P_i is as in the case (iii) above. By writing [θ] for [c(P_{v_1})/v_1, …, c(P_{v_m})/v_m] and noting that c(P_z) contains c(P_i) = ℘(x_l, x_r).x_l|•⟨c(P_x), c(P_y)⟩ as conjunct, we have

    P_i[θ] = c(P_z)|•⟨℘(x, y).P_{i+1}[θ], M[θ]⟩
           ⟶ (℘(x, y).P_{i+1}[θ])|•⟨c(P_x), c(P_y)⟩
           ⟶ P_{i+1}[θ, c(P_x)/x, c(P_y)/y],
On the Meaning of Logical Completeness
as desired. When max = ∞, we have obtained an infinite reduction sequence from P_0[c(P_x)/x]. Otherwise, we have P_0[c(P_x)/x] ⟶∗ P_max[θ]. In case (i), P_max[θ] = Ω. In case (ii), P_max[θ] = c(P_z)|a⟨M⃗⟩ ⟶ Ω, because c(P_z) contains c(P_max) = ℘(x_l, x_r).(· · · + a(x⃗).Ω + · · ·) as conjunct. This establishes the proof of Theorem 6.
Our explicit construction of the model c(P_x) yields a byproduct.
Corollary 1 (Downward Löwenheim-Skolem, Finite model property). Let P be a proof and P a logical behaviour. If P ∈ P, then there is a countable model M ∈ P⊥ (i.e., ac+(M) is a countable set) such that P ⊥ M. Furthermore, when P is linear, there is a finite (and deterministic) model M ∈ P⊥ such that P ⊥ M.
The last statement is due to the observation that when P is linear the positive rule (α, a) can be replaced with a linear variant:

    M_1 ⊢ Γ_1, N_{i_1}   …   M_m ⊢ Γ_m, N_{i_m}
    ────────────────────────────────────────────
    z|a⟨M_1, …, M_m⟩ ⊢ Γ, z : α⟨N_1, …, N_n⟩

where Γ_1, …, Γ_m are disjoint subsets of Γ. We then immediately see that the proof search tree is always finite, and so is the model c(P_x).
5 Conclusion
We have presented a Gödel-like completeness theorem for proofs in the framework of ludics, aiming at linking completeness theorems for provability with those for proofs. We have explicitly constructed a countermodel against any failed proof attempt, following Schütte's idea based on cut-free proof search. Our proof employs König's lemma and reveals a sharp opposition between finite proofs and infinite models, leading to a clear analogy with the Löwenheim-Skolem theorem. In Hyland-Ong game semantics, Player's "winning" strategies most naturally correspond to possibly infinite Böhm trees (cf. [5]). One could of course impose finiteness/compactness on them to have a correspondence with finite proofs. But it would not lead to an explicit construction of Opponent's strategies beating infinite proof attempts. Although finiteness is imposed in [1] too, our current work shows that it is not necessary in ludics. Our work also highlights the duality:

    proof                       model
    deterministic, nonlinear    nondeterministic, linear

The principle is that when proofs admit contraction, models have to be nondeterministic (whereas they do not have to be nonlinear). A similar situation arises, e.g. in [7,17], when one proves the separation property (an analogue of Böhm's theorem [2]), stating that two distinct terms can be distinguished via interaction
with a suitable context. Indeed, our construction of countermodels is based on the Böhm-out technique that is also crucial for proving separation. To prove the separation property in our setting, however, a more delicate treatment of the conjunction would be required (e.g., D and D ∧ D cannot be separated since {D}⊥ = {D ∧ D}⊥).
Acknowledgements. We are deeply indebted to Pierre-Louis Curien, who pointed out a gap in an earlier draft of this paper. Our thanks are also due to the anonymous referees.
References
1. Basaldella, M., Faggian, C.: Ludics with repetitions (exponentials, interactive types and completeness). To appear in LICS (2009)
2. Böhm, C.: Alcune proprietà delle forme β-η-normali nel λ-K-calcolo. Pubblicazioni dell'Istituto per le Applicazioni del Calcolo 696 (1968)
3. Curien, P.-L.: Abstract Böhm trees. Mathematical Structures in Computer Science 8, 559–591 (1998)
4. Curien, P.-L.: Introduction to linear logic and ludics, part II. CoRR abs/cs/0501039 (2005)
5. Curien, P.-L.: Notes on game semantics (manuscript, 2006)
6. Curien, P.-L., Herbelin, H.: Abstract machines for dialogue games. CoRR abs/0706.2544 (2007)
7. Dezani-Ciancaglini, M., Intrigila, B., Venturini-Zilli, M.: Böhm's theorem for Böhm trees. In: ICTCS 1998, pp. 1–23 (1998)
8. Ehrhard, T., Regnier, L.: Differential interaction nets. Theor. Comput. Sci. 364, 166–195 (2006)
9. Faggian, C.: Travelling on designs. In: Bradfield, J.C. (ed.) CSL 2002 and EACSL 2002. LNCS, vol. 2471, pp. 427–441. Springer, Heidelberg (2002)
10. Faggian, C.: Interactive observability in ludics: The geometry of tests. Theor. Comput. Sci. 350, 213–233 (2006)
11. Faggian, C., Piccolo, M.: Ludics is a model for the finitary linear pi-calculus. In: Della Rocca, S.R. (ed.) TLCA 2007. LNCS, vol. 4583, pp. 148–162. Springer, Heidelberg (2007)
12. Girard, J.-Y.: On the meaning of logical rules I: syntax vs. semantics. In: Berger, U., Schwichtenberg, H. (eds.) Computational Logic, pp. 215–272. Springer, Heidelberg (1999)
13. Girard, J.-Y.: Locus solum: From the rules of logic to the logic of rules. Mathematical Structures in Computer Science 11, 301–506 (2001)
14. Hyland, J.M.E., Ong, C.H.L.: On full abstraction for PCF: I, II, and III. Inf. Comput. 163, 285–408 (2000)
15. Lafont, Y.: The finite model property for various fragments of linear logic. J. Symb. Log. 62, 1202–1208 (1997)
16. Laurent, O.: Polarized games. Ann. Pure Appl. Logic 130, 79–123 (2004)
17. Mazza, D., Pagani, M.: The separation theorem for differential interaction nets. In: Dershowitz, N., Voronkov, A. (eds.) LPAR 2007. LNCS, vol. 4790, pp. 393–407. Springer, Heidelberg (2007)
18. Schütte, K.: Ein System des verknüpfenden Schliessens. Archiv Math. Logik Grundlagenf. 2, 55–67 (1956)
19. Terui, K.: Computational ludics. To appear in Theor. Comput. Sci. (2008)
Thick Subtrees, Games and Experiments Pierre Boudes Laboratoire d’Informatique de l’université Paris-Nord UMR CNRS 7030 Institut Galilée – Université Paris-Nord 99, avenue Jean-Baptiste Clément 93430 Villetaneuse – France
[email protected]
Abstract. We relate the dynamic semantics (games, dealing with interactions) and the static semantics (dealing with results of interactions) of linear logic with polarities, in the spirit of Timeless Games [1]. The polarized game semantics is full and faithful for polarized proof-nets [2]. We detail the correspondence between cut free proof-nets and innocent strategies, in a framework related to abstract Böhm trees. A notion of thick subtree allows us to reveal a deep relation between plays in games and Girard's experiments on proof-nets. We then define a desequentializing operation, forgetting time in games, which coincides with the usual way of computing a result of interaction from an experiment. We then obtain our main result: desequentializing the game interpretation of a polarized proof-net yields its standard relational model interpretation (static semantics).
1 Introduction
Denotational semantics interprets a program (a proof or a λ-term) as a structure representing all its possible interactions (via cut elimination or via β-reduction) with other programs. In static semantics only the result of the interaction is represented. In dynamic semantics (games) an interaction is fully represented by a sequence (play) of actions (moves) of the program (the Player) and the environment (the Opponent). There are many references for game semantics. For an introduction, see [3]. In this paper, we use Hyland-Ong style polarized games [4]. In such games, a move can justify itself by pointing to a preceding move. Laurent proved that polarized game semantics is full and faithful [2], for the proof-nets of LLpol, the polarized fragment of linear logic (expressive enough to encode the simply-typed λ-calculus). Proof-nets have been introduced together with linear logic [5] as a more parallel syntax than sequent calculus. Experiments on proof-nets (see [6] for an extensive study) provide the same denotations as the categorical interpretation of the corresponding sequent calculus proofs. The static interpretation of a proof-net is the set of results of experiments on this proof-net. The comparison between static and dynamic semantics is strongly motivated by Ehrhard's result ([7]) stating that the extensional collapse of sequential algorithms
Work partially supported by project NOCoST (ANR, JC05_43380).
P.-L. Curien (Ed.): TLCA 2009, LNCS 5608, pp. 65–79, 2009. c Springer-Verlag Berlin Heidelberg 2009
P. Boudes
(a game model) is the hypercoherences semantics (a static semantics). In [8], by introducing a suitable game semantics (extensional games), P.-A. Melliès gives a fine-grained analysis of this result which better details the extensional content of games. We focus here on providing a simple mathematical framework suitable for a direct extraction of the static semantics from the dynamic one. The relational model is the generic static semantics of linear logic, in the sense that the others are generally derived from this one by introducing new ingredients (like various coherence relations, see [9]). In that very simple semantics, formulæ are sets and proofs are relations. A naive approach to the comparison is to consider an operation D which maps a play (an interaction) to an element in a set of results, as in Figure 1. (Fig. 1. Projection D: syntax is interpreted both in games and in a static semantics; D maps each play to a result.) The difficulties are then: (i) to build a static semantics with sets of results (a natural candidate is the relational model); (ii) to turn D into a logical map, i.e. such that the diagram commutes for proofs. This approach is successfully used in [1], by introducing a new static semantics, the (bi)-polarized pointed relational model, and in [10], by introducing a new game semantics, bordered games, where plays explicitly carry results of interactions. In this paper, we clarify the relation between syntax, static and dynamic semantics, without using models specially designed for the purpose of the projection. We introduce a desequentialization D of justified plays, for which the source is Laurent's polarized games, and the target is the standard relational model of linear logic. The desequentialization maps a play to the tree of its justification pointers (which is a thick subtree of the formula, see below).
The desequentialization introduced here may be used in future works to push some properties of game semantics through the time-forgetful projection D, or conversely to pull some conditions of the static semantics to the games level. In polarized games, proofs are interpreted as finite innocent strategies. Such strategies can be presented as finite trees of Player's views. Trees of P-views are particular instances of abstract Böhm trees ([11,12]). When it comes from the interpretation of a proof, a tree of P-views can be thought of as an abstract presentation of the Böhm tree of a simply typed λ-term. (Pointers represent variable binding.) Since the polarized game semantics is full and faithful, trees of P-views are in a bijective correspondence with cut free polarized proof-nets of the same type. By analyzing the shape of cut free proof-nets, we detail this correspondence Ψ in a very direct way (compared to [2]). To do so, we restrict ourselves to the additive-free fragment MELLpol of LLpol. This minimizes the complexity of the definition of proof-nets, at a low cost, since additives can (almost) be encoded in MELLpol. Here is a sketch of the correspondence Ψ. Obviously, the reader unfamiliar with proof-nets and games will find the definitions in the body of the paper. A tree of P-views φ is a finite tree tφ, together with two further data: a naming fφ of nodes by moves (a node is an occurrence of a move) and a relation ←φ on nodes specifying justification pointers between moves.
(Fig. 2. Desequentialization: Ψ relates MELLpol proof-nets to polarized games via trees of P-views (Böhm trees); D factors as D− followed by D+ into the relational model.)
A MELLpol proof-net π is also a finite tree Tπ (the tree representing the nesting of !-boxes), but together with: a labeling Rπ of nodes by flat proof structures (which are finite oriented graphs with pending edges) and a structure (Sπ, Bπ) relating flat proof structures to each other (the frontiers of the !-boxes). When π is in normal form (cut free) and φ is the strategy interpreting π, the correspondence Ψ establishes a tree isomorphism between Tπ (the !-boxes tree) and the tree of P-views tφ. Moreover each flat proof structure of π has a very particular shape: it only consists of one combined positive connective (a tensor of of-courses) and one combined negative connective (a par of why-nots) together with edges connecting them or going through the frontiers of !-boxes. Through Ψ, moves of tφ correspond to these combined connectives and pointers are just another way to draw the connecting edges. We introduce in Section 2 the core ingredient of the paper: the notion of thick subtree, a generalization of the usual notion of rooted subtree. We use thick subtrees both at the term level and at the type level. The desequentialization relates the two levels. Type level (Section 3). A MELLpol formula A can be thought of as a tree: its arena in games. The desequentialization of a play in A is a particular thick subtree of A. And there is a (bijective) encoding of thick subtrees of A into the set of results of type A. Term level (Section 4). Thick subtrees are used at the term level both to represent experiments in proof-nets and to express the intrinsic dynamic of plays of a strategy. The desequentialization D factors into a negative part D− followed by a positive part D+ (Fig. 2). If p is a play in an innocent strategy φ then D− (p) is a thick subtree s of tφ. Conversely, any thick subtree s of tφ can be lifted into many plays in φ. This is just a matter of extending the tree order of s into a well-shaped total order.
Intuitively, the tree order of s corresponds to the internal dynamic of the program (positive/Player) that one will find in any interaction between this program and an environment. The new part of the order is then provided by the environment (negative/Opponent) during a possible interaction. An experiment e in a proof-net π is just a thick subtree s of Tπ , together with an arbitrary valuation v of axioms. By extending the correspondence Ψ between proof-nets and trees of P-views to their thick subtrees, we show that the positive desequentialization D+ of s (together with v) is the result of the experiment e. This proves that the desequentialization is a functor which maps a finite innocent strategy (a set of plays) to the static interpretation of the corresponding proof-net. Figure 2 sketches the full picture we then obtain.
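To make the picture concrete: the copying choice behind experiments (each !-box taken an arbitrary finite number of times, recursively) is exactly the choice of a thick subtree of the box-nesting tree. The sketch below is my own illustration, not from the paper; the bound k is only there to keep the enumeration finite.

```python
from itertools import combinations_with_replacement, product

# A tree as nested tuples of children; think of it as the !-box nesting tree
# T_pi of a hypothetical proof-net: a root holding two boxes, the first of
# which contains one further box.
T = (((),), ())

def thick_subtrees(t, k):
    """Thick subtrees of t in which each child is copied at most k times.
    Encoding: a thick subtree is a tuple with one entry per child of t, that
    entry being the sorted tuple (multiset) of copies chosen for that child."""
    per_child = []
    for c in t:
        subs = sorted(thick_subtrees(c, k))
        msets = [m for n in range(k + 1)
                 for m in combinations_with_replacement(subs, n)]
        per_child.append(msets)
    return [combo for combo in product(*per_child)]

# With no copies allowed only the bare root remains; with at most one copy
# per box there are 3 choices for the outer box times 2 for the second box.
assert len(thick_subtrees(T, 0)) == 1
assert len(thick_subtrees(T, 1)) == 6
```

The unbounded union over all k gives all copying choices of an experiment.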
2 Trees and Thick Subtrees
(2.a) Trees. Let us recall some basic definitions about trees. A finite tree t is a partial order (It, ≤t), where It is a finite set, called here the indexing set, having a least element (the root), and such that if a ≤t c and b ≤t c then a ≤t b or b ≤t a. In the sequel, trees will all be finite. The associated precedence relation is denoted by <1t (so a <1t b means that a <t b and there is no c with a <t c <t b).
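As an illustration (my own encoding, not from the paper), the tree order and its precedence relation can be checked mechanically on a tree given by parent pointers:

```python
from itertools import product

# A finite tree as a partial order (I, <=) presented by parent pointers.
# Hypothetical example: parent[i] is the parent of node i, None for the root.
parent = {0: None, 1: 0, 2: 0, 3: 1, 4: 1}

def ancestors(i):
    """All j with j <= i in the tree order (reflexive)."""
    out = []
    while i is not None:
        out.append(i)
        i = parent[i]
    return out

def leq(a, b):
    """The tree order: a <= b iff a is an ancestor of b."""
    return a in ancestors(b)

def precedes(a, b):
    """Precedence a <1 b: a < b with no c strictly in between."""
    return leq(a, b) and a != b and not any(
        leq(a, c) and leq(c, b) and c not in (a, b) for c in parent)

# The tree-order axiom: the predecessors of any node are totally ordered.
assert all(leq(a, b) or leq(b, a)
           for c in parent for a, b in product(ancestors(c), repeat=2))
# Precedence coincides with the parent relation, as expected.
assert {(a, b) for a, b in product(parent, repeat=2) if precedes(a, b)} == \
       {(p, i) for i, p in parent.items() if p is not None}
```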
3 Types
Formulæ of multiplicative exponential linear logic with polarities (MELLpol) are given by:

    N ::= ?X⊥ | ⊥ | N ⅋ N | ?P    (negative formulæ)
    P ::= !X | 1 | P ⊗ P | !N     (positive formulæ)
with the usual De Morgan laws for the orthogonal (−)⊥ and where X is any element of a given set of atoms V. Here, as in [2], atoms (X, X ⊥ ) are not formulæ. This restriction is necessary for obtaining the faithfulness of the game semantics. For the same reason,
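The grammar and the De Morgan orthogonal can be transcribed directly; the tagged-tuple encoding below is mine, as a sketch:

```python
# MELLpol formulas as tagged tuples; De Morgan orthogonal (-)^⊥.
# Negative tags: 'whynot-atom' (?X^⊥), 'bot' (⊥), 'par' (⅋), 'whynot' (?);
# positive tags: 'bang-atom' (!X), 'one' (1), 'tensor' (⊗), 'bang' (!).
DUAL = {'whynot-atom': 'bang-atom', 'bot': 'one',
        'par': 'tensor', 'whynot': 'bang'}
DUAL.update({v: k for k, v in DUAL.items()})   # the laws are involutive

def orth(f):
    """De Morgan orthogonal: swap each connective for its dual, recursively."""
    tag, *args = f
    return (DUAL[tag], *[orth(a) if isinstance(a, tuple) else a for a in args])

def negative(f):
    return f[0] in ('whynot-atom', 'bot', 'par', 'whynot')

# ?X^⊥ ⅋ ?(!X ⊗ !⊥), a negative formula.
N = ('par', ('whynot-atom', 'X'),
     ('whynot', ('tensor', ('bang-atom', 'X'), ('bang', ('bot',)))))
assert negative(N) and not negative(orth(N))
assert orth(orth(N)) == N      # the orthogonal is an involution
```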
MELLpol proof-nets (see Section 4) require the introduction of a flat (♭) modality which does not belong to MELLpol and which can be thought of as a why not (?) modality, in semantics. The flat notation is used here to ensure that the introductions of ? in a proof-net are postponed as late as possible. (3.a). The relational interpretation of a formula A is a countable set, denoted |A| and called the web of A. The web of A⊥ is always the same as the web of A. The web of 1 (and the web of ⊥) is the singleton set {∗} (this set is intended to be the neutral element of the Cartesian product of sets, so ∗ shall be thought of as a notation for the empty tuple). The web of A ⊗ B (or of A ⅋ B) is |A| × |B|. The web of !A (or of ?A) is the set of finite multisets of elements of |A|. For each atom X ∈ V, an arbitrary enumerable set is chosen as web (both for X and its orthogonal). For convenience, we also set |♭A| = |?A|. To avoid some bureaucratic aspects, we will work on MELLpol up to associativity and neutrality of multiplicatives. In the relational model, this amounts to working up to associativity of the Cartesian product and neutrality of {∗}.
3.1 Arenas and the Desequentialization
An arena A is a labeled ordered finite forest together with a polarity: positive or negative. For the game semantics of MELLpol, we restrict ourselves to finite trees. The labeling function is denoted αA. The labels of leaves are elements of V ∪ {∗} and the labels of the other nodes are all equal to ∗. The polarity of the arena is extended to moves by choosing the polarity of the arena for the root and by saying that two successive nodes have different polarities. This corresponds to the usual Player/Opponent polarity as follows: positive corresponds to Player and negative to Opponent. Basically, in MELLpol, the arena associated with a formula A is the syntactic tree of this formula, up to associativity and neutrality of multiplicatives and where exponentials shift polarities.
(3.b) Arena of a formula. Let A be a formula. The arena of A is defined as follows. The polarity of the arena associated to A is the polarity of the formula. The trees of A⊥ and of A are equal. The tree of 1 or of an atom X is the tree reduced to one node: (). If t is the tree of N then (t) is the tree of !N. If (t1, …, tp) is the tree of P and (t′1, …, t′q) is the tree of P′ then the tree of P ⊗ P′ is (t1, …, tp, t′1, …, t′q). (Fig. 4. The arena of N0: root ε with positive children 1, 2, 3; node 1 has the negative child 11, whose children are 111+ and 112+ (labelled X); node 2 has the negative children 21 (labelled X) and 22.) We adopt the canonical localization of ordered trees on arenas. The labels are chosen such that the label of a node coming from an atom X or X⊥ is X and the labels of the other nodes are ∗. Conversely, every arena is the arena of a unique formula. We further identify arenas and formulæ. Figure 4 shows the arena of N0 = ?!(?1 ⅋ ?X⊥) ⅋ ?(!X ⊗ !⊥) ⅋ ?1 with the relevant part of the labeling. (3.c). In the arena of a formula A, each sub-formula of A corresponds to a move. Two sub-formulæ can correspond to a same move a, but, for each move, there is a maximal sub-formula F(a) of A corresponding to a. For instance, the first occurrence of ?1 in N0 corresponds to the move 11, but F(11) is ?1 ⅋ ?X⊥.
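The clauses of (3.b) can be turned into a small program. The reading below (flatten multiplicatives, drop units by neutrality, one node per exponential, atoms as labelled leaves) and the tuple encoding are my own; it reproduces the shape of the arena of N0 from Fig. 4:

```python
# Formulas as tagged tuples (encoding mine): 'bang-atom' = !X, 'whynot-atom' = ?X^⊥,
# 'one'/'bot' = units, 'tensor'/'par' = multiplicatives, 'bang'/'whynot' = exponentials.
def comps(f):
    """Top-level multiplicative components of f; units vanish (neutrality)."""
    tag, *args = f
    if tag in ('par', 'tensor'):
        return comps(args[0]) + comps(args[1])
    if tag in ('one', 'bot'):
        return []
    return [arena(f)]

def arena(f):
    """Arena as (label, children): exponentials create a node (polarity shift),
    atoms give leaves labelled by the atom, units give unlabelled leaves."""
    tag, *args = f
    if tag in ('bang-atom', 'whynot-atom'):
        return (args[0], [])
    if tag in ('one', 'bot'):
        return ('*', [])
    if tag in ('bang', 'whynot'):
        return ('*', comps(args[0]))
    return ('*', comps(f))   # top-level multiplicative: root over the components

# N0 = ?!(?1 ⅋ ?X⊥) ⅋ ?(!X ⊗ !⊥) ⅋ ?1
N0 = ('par',
      ('par',
       ('whynot', ('bang', ('par', ('whynot', ('one',)), ('whynot-atom', 'X')))),
       ('whynot', ('tensor', ('bang-atom', 'X'), ('bang', ('bot',))))),
      ('whynot', ('one',)))

# Matches Fig. 4: root over 1, 2, 3; 11 under 1; 111, 112(X) under 11; 21(X), 22 under 2.
assert arena(N0) == ('*', [('*', [('*', [('*', []), ('X', [])])]),
                           ('*', [('X', []), ('*', [])]),
                           ('*', [])])
```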
(3.d). A legal justified tree (LJT, for short) on A is a finite tree (I, ≤) together with a labeling function f : I → IA and a pointing relation ← such that: (i) (I, ←∗, f) is a thick subtree of A; (ii) ≤ extends the order ←∗ (i.e. ←∗ ⊆ ≤); and (iii) <1 alternates between positives and negatives. We consider that polarities extend to elements of I by saying that the polarity of a ∈ I is the polarity of f(a). So the set I is the disjoint union of a set of negative nodes I− and a set of positive nodes I+. Observe that (ii) implies that ← alternates between I− and I+ (as <1 does). The notion of LJT encompasses the game notion of legal play. A legal play is a LJT where (I, ≤) is a total order. (3.e). Observe that a LJT t has two tree structures (It, ≤t) and (It, ←∗t). We implicitly generalize some notions on trees (e.g. thick subtrees and prefixes) to LJTs by considering that (It, ≤t) is the tree of the LJT t. If t is a LJT, a thick subtree (It′, ≤t′, g) of (It, ≤t) inherits a LJT structure (←t′, ft′) from t by setting: ft′ = ft ◦ g and if a ≤t′ b and g(a) ←t g(b) then a ←t′ b. Definition 1. The desequentialization D(t) of a LJT t = (I, ≤, ←, f) on an arena A is just the thick subtree (I, ←∗) of A.
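Definition 1 in executable form (the moves and the encoding are hypothetical, mine): the desequentialization keeps only the justification pointers of a justified sequence, i.e. its pointer tree:

```python
# A justified play: entry i carries (move, j), where j is the index of the
# move it points to (None for moves justified by the root of the arena).
play = [('a', None), ('b', 0), ('c', 1), ('d', 0), ('e', 3)]

def desequentialize(play):
    """D(p): forget the total order, keep the pointer relation <-* as a tree."""
    children = {i: [] for i in range(len(play))}
    roots = []
    for i, (_, j) in enumerate(play):
        (roots if j is None else children[j]).append(i)
    def build(i):
        return (play[i][0], [build(c) for c in children[i]])
    return [build(r) for r in roots]

# The total order of the play is gone; only the pointer tree remains.
assert desequentialize(play) == [('a', [('b', [('c', [])]), ('d', [('e', [])])])]
```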
So, each node a of t such that f (a) is a leaf of A and no other is labeled. A result of type A can be seen as a concrete representation of a valuated thick subtrees of A. This representation commutes to the orthogonal. An element x of |X| (resp. ∗ ∈ |1|), is simply the unique thick subtree of the tree (), together with the valuation mapping its unique node to x (resp. ∗). Let a = ([a11 , . . . , a1n1 ], . . . , [ak1 , . . . , aknk ]) be an element of |P | where P is !N1 ⊗ . . . ⊗ !Nk (Ni can be an atom). For each 1 ≤ i ≤ k and for each 1 ≤ j ≤ ni , aij is an element of the web of Ni , so aij represents a valuated thick subtree (tji , fij , vij ) of Ni . The valuated thick subtree represented by a is then the tree (t11 , . . . , tknk ) (seen unordered) together with: the function f mapping its root to the root of the arena of P and equal to fij on the other nodes; and the valuation vij . Proposition 2. Let A be a formula. The set VTST(A) of valuated thick subtrees of A is equal to the web of A. Direct, by induction on A. So, the desequentialization of a legal play on A together with a valuation is an element of the web of A.
4 Terms (Proof-Nets)
(4.a). A flat proof structure R is a finite directed graph, built using the links of Figure 5, with at least one pending outgoing edge; the pending outgoing edges are called the conclusion edges. A label of an
edge is either a formula, positive (P, Q) or negative (N, M), or an atom (X) or its orthogonal (X⊥), or a flat formula ♭F, where F (or G) is either a positive formula or the orthogonal X⊥ of an atom (so F⊥ or G⊥ is either a negative formula or an atom X). When connecting two links by an edge, the two labels of the edge must match. Polarities of MELLpol formulæ extend to labels as follows: atoms X are negative, their orthogonals are positive, and flat formulæ are negative. In a link, an outgoing edge is a conclusion and an incoming edge is a premise. In a !-link, the edge labeled !G⊥ is the front conclusion and the other edges are the auxiliary conclusions. (Fig. 5. Links of flat proof structures: axiom, cut, ⊗, ⅋, 1, ⊥, ♭, ?, and ! links.) There is one !-link (resp. ?-link) for each natural number of auxiliary conclusions (resp. premises). For ?-links and !-links the ordering of incoming and outgoing edges is irrelevant (to remind this we draw them with a double line). Observe that R is acyclic, because for each link the label of each conclusion is strictly bigger than the label of each premise.
A proof-net π is a finite tree T and three labeling functions R, S, B of nodes of T such that: – for each node n of T , R(n) is a correct flat proof structure and S(n) is a one to one correspondence between the sons of n and the !-links of R(n); – if n is a son of n in T then (R(n ), B(n )) is a !-box for the !-link S(n)(n ). We do not make any requirement on the label f (r) of the root r of π: this label is just here to ease the writing of the definition and it can be safely forgotten. Observe that, if n is a node of a proof-net π = (T, R, S, B) and if Tn is the maximal subtree Tn of t with root n, then πn = (Tn , R|Tn , S|Tn , B|Tn ) is a proof-net. The conclusions of a proof-net are the conclusions of its root’s flat proof structure. A MELLpol proof-net is a proof-net where conclusions are not atoms or flat formulæ. We do not describe the cut elimination procedure on LLpol proof-nets [13].
4.1 Relational Semantics
(4.d). An experiment on a flat proof structure R is a labeling function e on edges of R such that:
– if a is a conclusion of an axiom link introducing an atom X, and if its other conclusion is b, then e(a) = e(b) and e(a) ∈ |X|;
– if a is the front conclusion of a !-link and b1, …, bn are the auxiliary conclusions, labeled respectively by !N, ♭F1, …, ♭Fn, then for each i, e(bi) is a multiset of points of |Fi| and e(a) is a multiset of points of |N|;
– if a is the conclusion of a 1-link or of a ⊥-link then e(a) = ∗;
– if a1 and a2 are the first and the second premises and a is the conclusion of a ⊗-link or of a ⅋-link then e(a) = (e(a1), e(a2));
– if a is the premise and b is the conclusion of a ♭-link then e(b) = [e(a)];
– if a1, …, an are the premises and b is the conclusion of a ?-link then e(a1), …, e(an) and e(b) are multisets and e(b) = e(a1) + … + e(an);
– if a is a premise of a cut link, and if its other premise is b, then e(a) = e(b).
Observe that e(a) is always an element of the web of the label of the edge a. An experiment on a flat proof structure can be considered as a choice of labels for axiom links and !-links which satisfies the constraint e(a) = e(b) on cut links, when propagated by other links. (4.e). If R has only negative conclusions and if e is an experiment on R then r(e), the result of e, is the family a ↦ e(a) indexed by conclusions of R. This notion of result extends to any flat proof structure R and to any experiment e on R by setting r(e) = r(e′), where e′ is an experiment on a flat proof structure R′ defined as follows. If R has a positive conclusion we add below a ♭-link. Then, for each conclusion of type a flat formula we add below a unary ?-link. The resulting proof structure is R′ and there is a unique extension e′ of e into an experiment of R′. (4.f). Experiments on proof-nets and their results are defined inductively on π as follows.
If the root of π is the flat proof structure R then an experiment eπ on π is an experiment e on R together with, for each proof-net πv associated with a !-link v of R, a multiset [e^1_{πv}, …, e^{kv}_{πv}] (kv ∈ ℕ) of experiments on πv which satisfies the following. If a is the front conclusion and b1, …, bn are the auxiliary conclusions of v, and if, for each i, the result of e^i_{πv} is (x_i, ν_i^{1,v}, …, ν_i^{n,v}), then e(b1) = Σ_{i=1}^{kv} ν_i^{1,v}, …, e(bn) = Σ_{i=1}^{kv} ν_i^{n,v}, and e(a) = [x_1, …, x_{kv}]. The result r(eπ) of eπ is the result of e. Hence on a proof-net, an experiment consists of two choices: (i) a copying choice for !-boxes, inductively given by: taking one copy of the root of π and, for each !-link of the root, choosing an arbitrary finite number of copies of the proof-net above, then starting again for each of these proof-nets; (ii) a choice of labels for axiom links in each (copy of) flat proof structure which has been selected during the first choice. Once propagated, these choices have to obey the only constraint of equality of labels on cut links. Observe that the first choice (i) is just the choice of an arbitrary thick subtree of Tπ and that there is no constraint on (i) and (ii) when there is no cut link. (4.g). To summarize, in this paper, an experiment on a cut-free proof-net π is given by: a thick subtree s of Tπ; together with, for each axiom link in s introducing an atom
X, the choice of an element of the web of X. We call this last choice a valuation of axioms. (4.h). The result of an experiment on a MELLpol proof-net π with only one conclusion N is an element of the web of N . The relational interpretation of a proof-net π is the set of results of experiments on π, for all possible experiments. 4.2 Cut Free MELLpol Proof-Nets In this section, we describe and simplify the structure of cut ax. free proof-nets. We start by introducing two simplifications, there will be a third one. ⊥ X X (4.i). First, we work with multiplicative connectives up to neu+ − trality and associativity. In flat proof structures there are trees of tensor links and 1-links with front conclusions of !-links Fig. 6. Axiom’s case above. We identify maximal such trees, called ⊗-trees, to links (drawn with a triangle). The same for trees of -links and ⊥-links with ?-links above ( -trees). Second, we only consider MELLpol proof-nets with only one negative conclusion. If needed we can always transform a (cut free) proof-net into such a MELLpol proofnet by adding well chosen links to the flat proof structure at its root (the same way as in §4.e). Observe that, if the conclusions of a cut free proof-net are known (before simplification) then we can recover this proof-net from its simplified version. Let π be a (simplified) MELLpol cut ⎧ !G⊥ free proof-net. We detail the shape of the ⎪ 1 ! ⎪ ⎪ ⎪ ⊗ flat proof structures contained in π. Let ⎪ ⎪ ⎪ ! ⎨ P R be a flat proof structure of π. We will !G⊥ p see that there are two cases: one with + ⎪ F 1 Fk11 p p 1 ⎪ ⎪ F F exactly one axiom-link (Fig. 6) and one ⎪ 1 kp ⎪ ⎪ ⎪ without axiom (Fig. 7). ⎩ P Each flat proof structure occurring in σR π is either in a !-box or at the root of π. ⎧ Hence R has conclusion edges labeled ⎪ ⎪ ⎪ F1 , . . . , Fk and exactly one negative ⎪ F1 F1 Fq Fq ⎪ ⎪ − ⊥ ⎪ conclusion edge e labeled G (the only ⎪ ⎪ ? ? ⎪ ⎨ conclusion if R is at the root of π). 
?F1 ?Fq According to the correctness criterion − ⎪ ⎪ ⎪ (§4.b), R has only one -link L . Since ⎪ ⎪ ⎪ ⎪ F1 Fk this is the only link which has a positive ⎪ ⎪ N ⎪ ⎩ premise and a negative conclusion, all the links with positive conclusions must be above L in R. Here are the two cases: Fig. 7. What is in the box? (i) either the premise of L is the (positive) conclusion of an axiom link Lax. and there is no other link with some positive conclusions in R; (ii) or the premise of L is the conclusion of a ⊗-tree t⊗ (possibly reduced
to an edge or to a 1-link) with front conclusions of !-links above, and there is no other link in R introducing a positive formula (we already found the unique ♭-link). In the first case, the axiom also introduces a negative atom X which cannot be the premise of any link. So R has no other link than L♭ and Lax. (in particular, R is a leaf of π). Now the second case. The conclusion G⊥ is a MELLpol formula N (if G⊥ were an atom X then there would be an axiom introducing it in R). Above the edge e−, labeled by N, there is a ⅋-tree t⅋ with ?-links above. There is no other link introducing negative conclusions than the above mentioned. If there was one, this would be a ♭-formula (because we already found the G⊥ conclusion), but flat formulæ are only introduced by ♭-links (there is only one, L♭) and !-links (another one than the above mentioned would also introduce a positive conclusion). Since the premises of the ?-links are ♭-formulæ they have to be chosen among the conclusions of L♭ or of the !-links. The other conclusions of these last links which are not premises of ?-links are the conclusions F1, …, Fk of R. So there is a pairwise connection σR of: the conclusion of t⊗ and the auxiliary conclusions of the !-links with: the premises of the ?-links and the conclusions of R different from e−. The third simplification we consider is the following. Observe that the !-links occurring in a flat proof structure R of a cut free proof-net π are totally ordered by means of the ordering of premises of the unique ⊗-tree of R. As a consequence, rather than using S for matching !-boxes with !-links, we consider Tπ as an ordered tree where the ordering of the sons of a node n is the same as the ordering of the !-links of R(n). (This cannot be done in a canonical way when there are cut links.)
In the sequel, we restrict ourselves to a negative type (the extension to positive types is easy). (4.j). A Player's view (P-view for short) is a legal play s such that if si <1 sj and sj is an Opponent's move then si ← sj (the Opponent always points to the last move). Traditionally a strategy is a set of legal plays satisfying some properties (e.g. prefix-closure, determinism). Composition of strategies is then defined pointwise on legal plays: two interacting legal plays are interleaved and, in the resulting sequence, the part on which the plays have interacted is hidden. We do not recall all the definitions of game semantics and polarized games. It is well known that, when a strategy is innocent, all its legal plays are determined by its P-views. This allows for an alternative description of innocent strategies which uses only P-views, which we next relate to the traditional presentation (§4.k and Prop. 5). Definition 4. A finite innocent strategy φ in a negative arena A is an even prefix-closed set of P-views which is finite and deterministic: the longest common prefix of every two elements of the set is of even length. We further consider φ as the prefix tree of its P-views, regarded as a particular LJT (tφ, ←φ, f) (in which every branch is a P-view).
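Definition 4's determinism condition is easy to operationalise: two P-views may diverge only right after an Opponent move, i.e. their longest common prefix has even length. A minimal sketch (moves modelled as plain strings, function names are ours, not the paper's):

```python
from itertools import combinations

def is_deterministic(pviews):
    """Check the determinism condition of Definition 4: the longest
    common prefix of every two distinct P-views has even length."""
    for s, t in combinations(pviews, 2):
        lcp = 0
        while lcp < min(len(s), len(t)) and s[lcp] == t[lcp]:
            lcp += 1
        if lcp % 2 != 0:
            return False
    return True
```

Here plays start with an Opponent move at position 0, so an even-length common prefix means that branching in the prefix tree happens only on Opponent choices.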
Thick Subtrees, Games and Experiments
75
(4.k) Traditional presentation. The set of legal plays P(φ) associated with a finite innocent strategy φ on A is the smallest set such that: (i) the P-views of φ are in P(φ); (ii) if there is a visible legal play s · ab such that s ∈ P(φ) and v+(s · ab) is a P-view of φ then s · ab ∈ P(φ). Observe that P(φ) is an even-prefix closed set of visible even length legal plays which is not, in general, finite.
(4.l). When (I, ≤, ←, f) is a LJT, we further consider the following relations: the Player's pointers ←+ = ← ∩ (I− × I+); the Opponent's pointers ←− = ← ∩ (I+ × I−); the Player's precedence <+ = <1 ∩ (I− × I+); the Opponent's precedence <− = <1 ∩ (I+ × I−); and the Player's order (<+ ∪ ←−)∗. The Player's order on I is still a tree because ←∗ ⊆ ≤.
(4.m). A LJT is visible when its Player's order contains the Player's pointers.
(4.n). The negative desequentialization D−((s, ←s)) of a visible LJT (or play) is the LJT (Is, (<+ ∪ ←−)∗, ←+) (all relations taken on s). Each branch of this tree is a P-view. The view function v+ maps visible legal plays to P-views: a legal play s with last move a is mapped to the unique branch of the tree D−(s) with leaf a. The positive desequentialization D+ is defined dually, by reversing the roles of Player and Opponent, on co-visible LJTs (the dual notion of visible LJTs). We will only use D+ on the image of D− where all LJTs are co-visible (because the Opponent always points to the last move). On LJTs the desequentialization D factors through D− and D+; moreover D and D+ coincide on the image of D−. So, for a visible legal play s, D(s) = D+(D−(s)) = D(D−(s)).
Fig. 8. Desequentialization of a play p: (a) p; (b) D−(p); (c) D(p).
Figure 8(a) shows a legal play p in the formula N0 (of Fig. 4) which is visible (but not co-visible).
If the Player's move 112 was set to point to the second occurrence of 11 (from bottom to top) then the play would not be visible. Figure 8(b) shows the negative desequentialization of p and Figure 8(c) achieves the desequentialization. (4.o). If a LJT t is such that each Opponent's move has exactly one son, then its compact presentation is t itself but where, in the tree (It, ≤t), each pair of successive nodes a <+ b is regarded as one node (a, b). An even thick subtree t of a finite innocent strategy φ is a thick subtree of φ such that each Opponent's move has exactly one son (so, t is given by an arbitrary thick subtree of the compact presentation of φ). The set of even thick subtrees of φ is denoted ETST(φ). Proposition 5. Let φ be a finite innocent strategy. If s is a legal play of φ (i.e. s ∈ P(φ)) then D−(s) is an even thick subtree of φ. Conversely, if t (together with a tree
Fig. 9. Example: (a) a strategy; (b) the go-between; (c) a proof-net (of conclusion N0).
morphism f ) is an even thick subtree of φ then any total order ≤s on It which extends ≤t and preserves the Player's precedence of t (i.e. <+s =
proof structure, similar to the one of Fig. 7. The maximal sub-formula F(n−) of N (§3.c) determines a ⅋-tree n(⅋) with, for each of its premises a1, . . . , aq, a ?-link Wi of conclusion ai. The premises of these ?-links are yet unknown. The maximal sub-formula F(n+) of N defines a ⊗-tree n(⊗) with: for each of its premises b1, . . . , bp, a !-link with principal conclusion bi; and a ♭-link n(♭) having as premise the conclusion of n(⊗). The auxiliary doors of the !-links and the permutation σRn are not yet defined. In the partial proof-net we then obtain (Figure 9(b)), we still represent the Player's pointers: if n+1 points to n−2 and if F(n1) occurs in F(n2) at place i then we draw an edge from n1(♭) to the ith ?-link Wi above n2(⅋). Next, we slice the pointers to reconstruct the missing edges of the flat proof structures, by working inductively on tφ, from leaves to root. When n(♭) points to a ?-link Wi above n(⅋) we draw an edge from the conclusion of n(♭) to Wi. Otherwise n(♭) is a conclusion of Rn. It is then passed as an auxiliary door to the associated !-link L of the flat proof structure below n: a conclusion edge is drawn from the actual source of the pointer, and this source is changed into L. It is passed again, until the source of the pointer is in the same flat proof structure as its target, a ?-link. We then draw an edge from the source to the target of the pointer. At the end of this process we obtain π. Conversely, let π be a cut free proof-net of conclusion N. We construct φ = ΨN(π) as follows. We define a labeling Mπ of the edges of the flat proof structures of π by moves of N. Intuitively this labeling is just a way to identify the occurrences of sub-formulæ of N with the places where they are created in the proof-net π. The (unique) conclusion edge of π is labeled by the empty word (the root of N). Going upward through a ?-link, a ♭-link or through a !-link and its associated !-box does not change labels.
If L is a ⊗-tree or a ⅋-tree and w is the label of its conclusion then its k premises are labeled w · 1, . . . , w · k (in that order). We use Mπ to associate to each node n of π an ordered pair made of one Opponent's move n− and one Player's move n+. For a flat proof structure containing one axiom (Fig. 6) these are respectively the move labeling the negative conclusion and the move labeling the positive conclusion of the axiom. For a flat proof structure without axiom (Fig. 7) these are respectively the move labeling the conclusion of the ⅋-tree and the move labeling the conclusion of the ⊗-tree. The tree Tπ equipped with the labeling Mπ will be the compact presentation of φ. The pointing relation ←φ is not yet defined. We first set n+1 ←φ n−2 each time n1 <1φ n2 (that is, each time (n−1, n+1) <1 (n−2, n+2) in the compact presentation of φ). Each flat proof structure Rπ(n) associated to a node n of π contains a unique flat link. We denote ♭(n) its conclusion edge. There exists a unique chain n1 <1π . . . <1π nk = n in π such that Bπ(n2)(. . . Bπ(nk)(♭(n)) . . .) is the premise of a ?-link of n1. We set n−1 ←φ n+. We then obtain φ.
4.5 Experiments and Strategies
The detailed correspondence between proof-nets and strategies shows that a thick subtree of a cut free proof-net π of N can be regarded as an even thick subtree of the corresponding strategy φ = ΨN(π) (and conversely). But an experiment in π is just a thick
subtree of Tπ together with a valuation of axioms (§4.g). We now define valuations of axioms directly in games, to obtain a notion of experiment on MELLpol strategies. The correspondence ΨN then extends into a one to one correspondence between experiments in π and experiments in φ = ΨN(π), which allows one to show that the result of a proof-net's experiment is the desequentialization of the corresponding strategy's experiment. (4.q) Valuation of axioms (2). If (t, f) is an even thick subtree of φ (equivalently, an element of P(φ), §4.k) then a valuation of axioms in (t, f) is the choice, for each pair a <+t b such that αN(a) = αN(b) is an atom X, of an element of the web of X. The set of valuated thick subtrees of φ is denoted VETST(φ) and the set of valuated legal plays of P(φ) is denoted V(P(φ)). Proposition 5 extends into a correspondence between these two sets. Even thick subtrees of a strategy inherit the labeling by moves and pointers from the strategy (§3.e). We do the same for thick subtrees of proof-nets. If (t, g) is a thick subtree of Tπ then we define three labeling functions Rt, St and Bt on t as follows. For each node n of t, Rt(n) is a copy of the flat proof structure Rπ(g(n)). If n <1t n′ then Sπ(g(n))(g(n′)) is a !-link L of Rπ(g(n)) which has a corresponding copy Lc in Rt(n). We set St(n)(n′) = Lc. The one to one correspondence Bt(n′) between conclusions of Rt(n′) and conclusions of Lc is then simply a copy of Bπ(g(n′)). Lemma 6. Let N be a negative MELLpol formula. Then ΨN extends into a one to one correspondence between the experiments on π and the experiments on ΨN(π). Moreover, if e is an experiment on π then the valuated thick subtree D+(ΨN(e)) is the result of e. The extension of ΨN is straightforward, because the condition that the functions Sπ(n) are one to one in the definition of proof-nets (Def. 3) is not necessary to make ΨN work.
But one needs to be careful with the slicing of Player's pointers when reconstructing a proof-net experiment e. If there are two pointers n− ←+ n+1 and n− ←+ n+2 such that n+1 and n+2 correspond to the same node in the strategy φ then, when going through the same partial flat proof structure R, these pointers define in R the same edge a (from a !-link to a ?-link or a conclusion) rather than two edges. This identification corresponds to a sum of multisets in the label e(a). There is a labeling of pointers of ΨN(e) which coincides with e on sources of pointers. It is then easy to check that D+(ΨN(e)) is the result of e. As an immediate consequence of this last Lemma we have: Theorem 7. If π is a MELLpol proof-net of negative conclusion N, then the relational interpretation of π is the set D+(VETST(ΨN(π))) = D(V(P(ΨN(π)))). This result proves that the desequentialization (together with valuations) defines a logical functor from polarized games to the relational model. By using techniques presented in [9], D can be composed with a functor forgetting some results, to obtain a logical functor from polarized games to coherence spaces or to hypercoherences. The results presented here extend to the full fragment of LLpol with sliced proof-nets [14]. The use of additives can be restricted to the outermost flat proof-structure (the root), by using the type isomorphism !(N & M) = !N ⊗ !M and the distributivity laws
of linear logic (leaving the semantics unchanged). The only work left is a generalization of the present work from trees to forests. An extension to full linear logic (without polarity constraint) is more problematic, at least because of the arbitrary complexity of cut-free flat proof-structures. Since the desequentialization D provides good results for the game semantics of MELLpol, we hope that it can be applied to other game semantics, for instance for languages with imperative features, in order to obtain static semantics of these syntaxes. A corollary of Theorem 7 is that all the results of experiments are equitable. This property, coming from the alternation of moves in plays, surely allows for narrowing the relational model to equitable results. Other properties of plays, such as visibility, seem harder to capture on the side of the relational model. Another direction to look at is the faithfulness of the relational model. Factoring the interpretation through abstract Böhm trees by means of thick subtrees allows for a more combinatorial approach to this long-standing conjecture [6]. The author thanks the anonymous referees for their comments on improving the readability of this paper.
References
1. Baillot, P., Danos, V., Ehrhard, T., Regnier, L.: Timeless games. In: Nielsen, M., Thomas, W. (eds.) CSL 1997. LNCS, vol. 1414, pp. 56–77. Springer, Heidelberg (1998)
2. Laurent, O.: Syntax vs. semantics: a polarized approach. Theoretical Computer Science 343(1–2), 177–206 (2005)
3. Abramsky, S., McCusker, G.: Game semantics. In: Proceedings of the 1997 Marktoberdorf Summer School. Lecture notes. Springer, Heidelberg (1998)
4. Laurent, O.: Polarized games. Annals of Pure and Applied Logic 130(1–3), 79–123 (2004)
5. Girard, J.-Y.: Linear logic. Theoretical Computer Science 50, 1–102 (1987)
6. Tortora de Falco, L.: Réseaux, cohérence et expériences obsessionnelles. Thèse de doctorat, Université Paris VII (2000)
7. Ehrhard, T.: Projecting sequential algorithms on strongly stable functions. Annals of Pure and Applied Logic 77 (1996)
8. Melliès, P.-A.: Sequential algorithms and strongly stable functions. Theoretical Computer Science 343(1–2), 237–281 (2005)
9. Boudes, P.: Non uniform hypercoherences. In: Blute, R., Selinger, P. (eds.) Proceedings of CTCS 2002. Electronic Notes in Theoretical Computer Science, vol. 69. Elsevier, Amsterdam (2003)
10. Boudes, P.: Projecting games on hypercoherences. In: Díaz, J., Karhumäki, J., Lepistö, A., Sannella, D. (eds.) ICALP 2004. LNCS, vol. 3142, pp. 257–268. Springer, Heidelberg (2004)
11. Curien, P.-L.: Abstract Böhm trees. Mathematical Structures in Computer Science 8(6) (1998)
12. Curien, P.-L., Herbelin, H.: Computing with abstract Böhm trees. In: Sato, M., Toyama, Y. (eds.) Fuji International Symposium on Functional and Logic Programming (FLOPS 1998), Kyoto, Japan, April 2–4, 1998, pp. 20–39. World Scientific, Singapore (1998)
13. Laurent, O.: Étude de la polarisation en logique. Thèse de doctorat, Université Aix-Marseille II (March 2002)
14. Laurent, O., Tortora de Falco, L.: Slicing polarized additive normalization. In: Ehrhard, T., Girard, J.-Y., Ruet, P., Scott, P. (eds.) Linear Logic in Computer Science.
Lecture Notes Series, vol. 316, pp. 247–282. London Mathematical Society (2004)
Bounded Linear Logic, Revisited Ugo Dal Lago1 and Martin Hofmann2 1
Dipartimento di Scienze dell'Informazione, Università di Bologna 2 Institut für Informatik, LMU München
Abstract. We present QBAL, an extension of Girard, Scedrov and Scott’s bounded linear logic. The main novelty of the system is the possibility of quantifying over resource variables. This generalization makes bounded linear logic considerably more flexible, while preserving soundness and completeness for polynomial time. In particular, we provide compositional embeddings of Leivant’s RRW and Hofmann’s LFPL into QBAL.
1 Introduction
Two decades after the pioneering works that started it [3,13,14], implicit computational complexity (ICC) is now an active research area at the intersection of mathematical logic and computer science. Its aim is the study of machine-free characterizations of complexity classes. The correspondence between an ICC system and a complexity class holds extensionally, i.e., the class of functions (or problems) which are representable in the system equals the complexity class. Usually, the system is a fragment or subsystem of a larger programming language or logical system, the base system, in which other functions besides the ones in the complexity class can be represented. Sometimes, one of the two inclusions is shown by proving that any program (or proof) can be reduced in bounded time; in this case, we say that the system is intensionally sound. On the other hand, ICC systems are very far from being intensionally complete: there are many programs (or proofs) in the base system which are not in the ICC system, even if they can be evaluated with the prescribed complexity bounds. Observe that this does not contradict extensional completeness, since many different programs or proofs compute the same function. Of course, a system that captures all and only the programs of the base system running within a prescribed complexity bound will in all but trivial cases (e.g., empty base system) fail to be recursively enumerable. Thus, in practice, one strives to improve intensional expressivity by capturing important classes of examples and patterns. An obstacle towards applying ICC characterizations of complexity classes to programming language theory is their poor intensional expressive power: most ICC systems do not capture natural programs and therefore are not useful in practice. This problem has already been considered in the literature.
Some papers try to address the poor intensional expressive power of ICC systems by defining new programming languages that allow programming in ways which are not possible
P.-L. Curien (Ed.): TLCA 2009, LNCS 5608, pp. 80–94, 2009. © Springer-Verlag Berlin Heidelberg 2009
in existing ICC systems. This includes quasi-interpretations [15] and LFPL by the second author [10]. Other papers analyze the intensional expressive power of existing systems either by studying necessary conditions on captured programs or, more frequently, by studying relations between existing ICC systems. One nice example is Murawski and Ong's paper [16], in which the authors prove that there cannot be any embedding (satisfying certain properties) of Bellantoni and Cook's function algebra BC [3] into light affine logic [1]. In this work, we somehow combine the two approaches, by showing that a new logical system, called QBAL, is intensionally at least as expressive as two heterogeneous, existing systems, namely Leivant's RRW [14] and LFPL. QBAL is a generalization of Girard, Scedrov and Scott's bounded linear logic (BLL, [8]), itself the first characterization of polynomial time computable functions as a fragment of Girard's linear logic [6]. Bounded linear logic has received relatively little attention in the past [11,17]. This is mainly due to its syntax, which is more involved than that of other complexity-related fragments of linear logic that appeared more recently [7,12,5]. In bounded linear logic, polynomials are part of the syntax and, as a consequence, computation time is controlled explicitly. However, it seems that BLL is not intensionally expressive enough to embed any existing ICC system corresponding to polynomial time (except Lafont's SLL [12], which anyway was conceived as a very small fragment of BLL). QBAL is obtained by endowing BLL with bounded quantification on resource variables. In other words, formulas of QBAL include the ones of BLL, plus formulas like ∃x : {x ≤ y²}.A or ∀x, y : {x ≤ z, y ≤ z³}.B. This new feature by itself increases the intensional expressive power: both RRW and LFPL can be compositionally embedded into QBAL. Moreover, QBAL remains sound with respect to polynomial time.
For these reasons, QBAL is not just another system capturing polynomial time computable functions. An extended version of this paper including all proofs is available [4].
2 Syntax
In this Section, we present the syntax of QBAL, together with some of its main properties. The lack of space prevents us from being exhaustive, but all the details can be found in [4]. In the following, we adhere to the notation adopted in the relevant literature on BLL [8,11]. Resource polynomials are finite sums of products of binomial coefficients, i.e., they can be written as $\sum \prod_{j \leq m} \prod_{i \leq k_j} \binom{x_{ij}}{n_{ij}}$, where the $x_{ij}$ are pairwise distinct and the $n_{ij}$ are natural numbers. Resource polynomials are closed under binary sum, binary product, bounded sum and composition [8]. Order relations between resource polynomials are captured by constraints and constraint sets:
Definition 1 (Constraints)
• A constraint is an inequality in the form p ≤ q, where p and q are resource polynomials. A constraint set is a finite set of constraints. Constraint sets are denoted with letters like C or D.
• For each constraint set C, we define an order ⊑C on resource polynomials by imposing p ⊑C q iff C |= p ≤ q, i.e., the (pointwise) inequality p ≤ q is a logical consequence of C.
• C |= D iff C |= p ≤ q for every constraint p ≤ q in D.
Resource polynomials, constraints and constraint sets become, in turn, the essential ingredients in the definition of QBAL formulas:
Definition 2. Formulas of QBAL are defined as follows:
A ::= α(p1, . . . , pn) | A ⊗ A | A ⊸ A | ∀α.A | !x<p A | ∀x : C.A | ∃x : C.A
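The basic objects of Definition 1 can be made concrete. The sketch below (a hypothetical representation we choose for illustration, not the paper's machinery) evaluates a resource polynomial — given as a list of monomials, each a list of (variable, n) pairs standing for binomial coefficients — and checks whether a single valuation satisfies a constraint set. Note that deciding C |= p ≤ q proper requires reasoning over all valuations; this only checks one.

```python
from math import comb

def eval_poly(poly, env):
    """poly: list of monomials; a monomial is a list of (var, n) pairs,
    each standing for the binomial coefficient binom(env[var], n)."""
    total = 0
    for monomial in poly:
        product = 1
        for var, n in monomial:
            product *= comb(env[var], n)
        total += product
    return total

def satisfies(env, constraints):
    """env |= C: every inequality (p, q), read p <= q, holds under env."""
    return all(eval_poly(p, env) <= eval_poly(q, env) for p, q in constraints)
```

For example, with p the single monomial binom(x, 2), `eval_poly([[('x', 2)]], {'x': 4})` evaluates to 6.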
[Fig. 1 presents the sequent calculus for QBAL. Sequents have the form Γ ⊢C B, where C is a constraint set. The calculus comprises: axiom and cut; the structural rules W (weakening) and X (contraction); the multiplicative logical rules R⊸, L⊸, R⊗ and L⊗; the exponential rules P!, D! and N!; the second order rules R∀α and L∀α; and the first order rules R∀x, L∀x, R∃x and L∃x.]
Fig. 1. A sequent calculus for QBAL
2.1 QBAL and Second Order Logic
Second order intuitionistic logic can be presented as a context-independent sequent calculus with explicit structural rules [18], G2i. There is a forgetful map [·] from the space of QBAL proofs to the space of G2i proofs. In particular, ⊸ corresponds to → and ⊗ corresponds to ∧. Essentially, [π] has the same structure as π, except for exponential and first order rules, which have no formal correspondence in G2i. From our point of view, if [π] = [ρ], then π and ρ correspond to the same program, i.e. QBAL can be seen as a proper decoration of second order logic proofs with additional information which is not necessary to perform the underlying computation.
2.2 Properties
QBAL inherits some nice properties from BLL. In particular, proofs can be manipulated in a uniform way by altering their conclusion without altering their structure, i.e., without changing the underlying second order logic proof. Suppose that π : A1, . . . , An ⊢C B is a QBAL proof. Then, we can construct other proofs with the same skeleton as π. For example:
• Whenever B ⊑C D and Ci ⊑C Ai for every 1 ≤ i ≤ n, there is ρ : C1, . . . , Cn ⊢C D such that [ρ] = [π].
• Whenever p1, . . . , pn are resource polynomials free for substitution for the free resource variables x1, . . . , xn in π, there is a proof π{p/x} : A1{p/x}, . . . , An{p/x} ⊢C{p/x} B{p/x} such that [π{p/x}] = [π].
• Whenever D |= C there is a proof ρ : A1, . . . , An ⊢D B such that [ρ] = [π].
Unfortunately, space constraints prevent us from explicitly giving the details and proofs of the results above (see [4]).
2.3 Cut-Elimination
A nice application of the observations we have just given is cut-elimination. Indeed, the new rules R∀x, L∀x, R∃x and L∃x do not cause any problem in the cut-elimination process; see [4] for more information. In this paper we will not study cut-elimination, and polynomial time soundness will itself be proved semantically.
2.4 Programming in QBAL
The Curry-Howard correspondence allows us to use BLL and QBAL as programming languages. In particular, following the usual impredicative encoding of data into second order intuitionistic logic, natural numbers can be represented as cut-free proofs of the formula Np = ∀α.!y<p(α(y) ⊸ α(y + 1)) ⊸ α(0) ⊸ α(p).
Functions on natural numbers can be represented by proofs with conclusion Nx ⊸ Np, where p is a resource polynomial depending on x only. More generally, functions on the word algebra W can be represented by proofs with conclusion Wx ⊸ Wp. For example, all constructors c1W, . . . , cwW correspond to proofs with
conclusion Wx ⊸ Wx+1. More generally, the polynomial p gives a bound on the size of the result, as a function of the size of the input. QBAL supports iteration on any word algebra (including natural numbers). As an example, for every p and for every A where x only appears positively, there is a proof πpA of Np, A{y/x} ⊸ A{y + 1/x}, A{0/x} ⊢ A{p/x}.
2.5 Unbounded First Order Quantification Is Unsound
One may wonder why quantification on numerical variables is restricted to be bounded (see Definition 2). The reason is very simple: in the presence of unbounded quantification, QBAL would immediately become unsound. To see that, define N∞ to be the formula ∃x : ∅.Nx. The composition of the successor with itself yields a proof with conclusion Nx ⊸ Nx+2 which, by rules R∃x and L∃x, becomes a proof with conclusion N∞ ⊸ N∞. Iterating it, we obtain a proof of Nx ⊸ N∞ which computes the function n → 2n. But by rule L∃x, it can be turned into a proof of N∞ ⊸ N∞, and iterating it again we obtain a proof computing the exponential function. The boundedness assumption will indeed be critical in Section 4, where we establish that any function which is representable in QBAL is polynomial time computable.
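The blow-up described above is just iterated doubling: if the "doubling" proof can itself be iterated n times, the result is 2^n. A tiny Python sketch of the arithmetic behind the unsoundness argument (only the arithmetic — the proof-theoretic encodings are not modeled):

```python
def iterate(f, n, x):
    """Apply f to x, n times -- the computational content of iteration."""
    for _ in range(n):
        x = f(x)
    return x

# Composing the successor with itself gives the step n |-> n + 2;
# iterating that step n times from 0 computes doubling...
add_two = lambda x: x + 2
double = lambda n: iterate(add_two, n, 0)        # n |-> 2n

# ...and iterating the (unsoundly retyped) doubling gives exponentiation.
exponential = lambda n: iterate(double, n, 1)    # n |-> 2^n
```

This is exactly why the bound on quantified resource variables cannot be dropped: it is what prevents a proof of type N∞ ⊸ N∞ from being fed to the iterator a second time.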
3 Set-Theoretic Semantics
In this Section, we give a set-theoretic semantics for QBAL. We assume that our ambient set-theory is constructive. This way we have a set of sets U which contains the natural numbers and is closed under binary products, function spaces and U-indexed products. An alternative to assuming a constructive ambient set theory consists of replacing plain sets with PERs (partial equivalence relations) or domains or similar structures. See [11] for a more detailed discussion of this issue. A formula A can be interpreted as a set ⟦A⟧ρ, where ρ is an environment mapping atoms to sets. For example, ⟦α(p)⟧ρ = ρ(α), while ⟦∀x : C.A⟧ρ is simply ⟦A⟧ρ. Please observe that the interpretation of any formula A is completely independent from the resource polynomials appearing in A. To any QBAL proof π of A1, . . . , An ⊢C B we can associate a set-theoretic function ⟦π⟧ρ : ⟦A1 ⊗ . . . ⊗ An⟧ρ → ⟦B⟧ρ by induction on π, in the obvious way. ⟦π⟧ρ is equal to the set-theoretic semantics of [π] as a proof of second order intuitionistic logic. Set-theoretic semantics of proofs is preserved by cut-elimination: if π reduces to σ by cut-elimination, then ⟦π⟧ρ = ⟦σ⟧ρ. Observe that ⟦A⟧ρ only depends on the values of ρ on atoms appearing free in A. So, in particular,
⟦Nq⟧ρ = ∏C∈U ((C ⇒ C) ⇒ (C ⇒ C))
is independent of ρ and of q, since Nq is a closed formula. Similarly for ⟦Wq⟧ρ. Actually, there are functions ϕN : N → ⟦Np⟧ and ψN : ⟦Np⟧ → N such that ψN ◦ ϕN is the identity on natural numbers. They are defined as follows:
(ϕN(n))C(f, z) = f^n(z)
ψN(x) = xN(x → x + 1)(0)
So, given a proof π : Nx ⊸ Np, the numeric function represented by π is simply ψN ◦ ⟦π⟧ ◦ ϕN. Similar arguments hold for functions with conclusion Wx ⊸ Wp.
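The retraction ψN ∘ ϕN = id can be checked concretely. Below is a hedged Python model (elements of ⟦Np⟧ are represented simply as Church-style functionals; the per-component indexing over C ∈ U is collapsed, an assumption made for illustration):

```python
def phi(n):
    """phi_N: send the number n to the functional (f, z) |-> f^n(z)."""
    def functional(f, z):
        for _ in range(n):
            z = f(z)
        return z
    return functional

def psi(x):
    """psi_N: instantiate at successor and zero to read the number back."""
    return x(lambda k: k + 1, 0)
```

`psi(phi(n))` applies the successor n times to 0, recovering n — which is exactly why a proof of Nx ⊸ Np denotes a numeric function via ψN ∘ ⟦π⟧ ∘ ϕN.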
4 QBAL and Polynomial Time
In this Section we show that all functions on natural numbers definable in QBAL are polynomial time computable. To this end, we follow the semantic approach in [11], which we now summarise.
4.1 Realizability Sets
Let X be a finite set of variables. We write V(X) for N^X — the elements of V(X) are called valuations (over X). If η ∈ V(X) and c ∈ N then η[x → c] denotes the valuation which maps x to c and acts like η otherwise. We assume some reasonable encoding of valuations as natural numbers, allowing them to be passed as arguments to algorithms. If C is a constraint set involving at most the variables in X then we write η |= C to mean that the valuation η ∈ V(X) satisfies all the constraints in C. We write P(X) for the set of resource polynomials over X. If p ∈ P(X) and η ∈ V(X) we write p(η) for the number obtained by evaluating p with x → η(x) for each x ∈ X. We assume known the untyped lambda calculus as defined e.g. in [2]. An untyped lambda term is affine linear if each variable (free or bound) appears at most once (up to α-congruence). For example, λx.λy.yx and λx.λy.y and λx.xy are affine linear while the term λx.xx is not. Notice that every affine linear term t is strongly normalisable in less than |t| steps, where |t| is the size of the term. The runtime of the computation leading to the normal form is therefore O(|t|²). We will henceforth use the expression affine lambda term for an untyped affine linear lambda term which is in normal form. If s, t are affine lambda terms then their application st is defined as the normal form of the lambda term st. Notice that the application st can be computed in time O((|s| + |t|)²). If s, t are affine lambda terms we write s⊗t for the affine lambda term λf.f st. If t is an affine lambda term possibly containing the free variables x, y then we write λx⊗y.t for λu.u(λxλy.t). Notice that (λx⊗y.t)(u⊗v) = t{u/x, v/y}. More generally, the pairing ⊗ and the pattern-matching abstraction extend to finite families (ti)i<n of affine lambda terms in the obvious way. We write Λa for the set of closed affine lambda terms. There is a canonical way of representing terms of any word algebra W as affine lambda terms, which is attributed to Dana Scott [19]. For example, the natural number 2 corresponds to the term λx.λy.x(λx.λy.x(λx.λy.y)).
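The Scott encoding can be mimicked directly with Python closures. This is only an illustration (Python functions stand in for affine lambda terms; the names `zero`, `succ`, `to_int` are ours, not the paper's):

```python
# Scott numerals: 0 = \x.\y.y, and n+1 = \x.\y.x(n).
# A numeral is consumed by supplying a "successor case" x and a "zero case" y.
zero = lambda x: lambda y: y
succ = lambda n: (lambda x: lambda y: x(n))

def to_int(t):
    """Decode a Scott numeral by case analysis on its head constructor."""
    return t(lambda pred: 1 + to_int(pred))(0)
```

Unlike a Church numeral, a Scott numeral exposes only its head constructor, so decoding recurses on the predecessor; each numeral is an affine normal form (every bound variable is used at most once), matching the term for 2 displayed above.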
Definition 4. Let X be a finite set of resource variables. A realizability set over X is a pair A = (|A|, ⊩A) where |A| is a set and ⊩A ⊆ V(X) × Λa × |A| is a ternary relation between valuations over X, affine lambda terms, and the set |A|. We write η, t ⊩A a for (η, t, a) ∈ ⊩A. The intuition behind η, t ⊩A a is that a is an abstract semantic value, η measures the abstract size of a, and the affine lambda term t encodes the abstract value a.
Example 1. (i) The realizability set Nx over {x} of tally natural numbers ("of size at most x") is defined by: |Nx| = N and η, t ⊩Nx n if t is the encoding of n and η(x) ≥ n. (ii) The realizability set Wx over {x} of free terms of W ("of length at most x") is defined by: |Wx| = W and η, t ⊩Wx w if t is the encoding of w and η(x) ≥ |w|. These realizability sets Nx and Wx turn out to be retracts of the denotations of the BLL formulas from Section 2.4.
Definition 5. Let A be a realizability set over X. We say that x ∈ X is positive (negative, respectively) in A, if for all η, μ ∈ V(X), t ∈ Λa, a ∈ |A| where η and μ agree on X \ {x} and η(x) ≤ μ(x) (η(x) ≥ μ(x), respectively), η, t ⊩A a implies μ, t ⊩A a. We notice that x is positive in Nx and Wx.
Definition 6. Let A, B be realizability sets over some set X. A morphism from A to B is a function f : |A| → |B| satisfying the following condition: there exists a function e : V(X) → Λa such that e(η) is computable in time q(η) for some resource polynomial q and for each η ∈ V(X), t ∈ Λa, a ∈ |A|, we have that η, t ⊩A a implies η, e(η)t ⊩B f(a). In this case we say that e witnesses f and write A →_e^f B, where in the notation the algorithm e is presumed to exist.
The following definition summarises the interpretation of formulas according to [11]:
Definition 7. Let A, B be realizability sets over X. Then the following are realizability sets over X:
• A ⊗ B is given by |A ⊗ B| = |A| × |B| and η, t ⊩A⊗B (a, b) iff t = u⊗v, where η, u ⊩A a and η, v ⊩B b.
• A ⊸ B is given by |A ⊸ B| = |A| ⇒ |B| and η, t ⊩A⊸B f iff whenever η, u ⊩A a it holds that η, tu ⊩B f(a).
If C is a realizability set over X ∪ {x} and p ∈ P(X) then a realizability set !x<p C over X is defined by:
• |!x<p C| = |C|, and η, t ⊩ a iff t = ⊗i<p(η) ti and η[x → i], ti ⊩C a for every i < p(η).
4.2 Extending the Realizability Model to QBAL
The notion of a realizability set above is adequate to model formulas of QBAL. The notion of a morphism, however, should be slightly generalized in order to capture constraints:
Definition 8. Let A, B be realizability sets over some set X and C a constraint set over X. A function f : |A| → |B| is a C-morphism from A to B iff there exists a function e : V(X) → Λa such that e(η) is computable in time q(η) for some resource polynomial q and for each η ∈ V(X) with η |= C, t ∈ Λa, a ∈ |A|, we have that η, t ⊩A a implies η, e(η)t ⊩B f(a).
In order to define realizability sets ∀y:C.A and ∃y:C.A, we fix some encoding of environments η as affine lambda terms using the encoding of natural numbers. We do not notationally distinguish environments from their encodings.
Definition 9. Let X, Y be disjoint sets of variables. Let A be a realizability set over X ∪ Y and C a constraint set over X ∪ Y, where we put Y = {y1, . . . , yn} and y = (y1, . . . , yn). Furthermore, for each i = 1, . . . , n let pi ∈ P(X) be such that C |= {y ≤ p}.
• |∀y:C.A| = |∃y:C.A| = |A|,
• η, t ⊩∀y:C.A a ⇐⇒ ∀μ ∈ V(Y). η∪μ |= C ⇒ η∪μ, tμ ⊩A a,
• η, μ⊗t ⊩∃y:C.A a ⇐⇒ μ ∈ V(Y) ∧ η∪μ |= C ∧ η∪μ, t ⊩A a.
Recall that ∀y:C.A and ∃y:C.A are well-formed only if for each i there is a resource polynomial pi such that C |= yi < pi. Therefore, the set {μ | η∪μ |= C} is finite and in fact computable in polynomial time from η.
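The finiteness of {μ | η∪μ |= C} can be made concrete: given the polynomial bounds on the quantified variables, one can enumerate all candidate valuations and filter by the constraints. A sketch (constraints are kept as plain Python predicates for brevity — an assumption of ours, not the paper's machinery):

```python
from itertools import product

def satisfying_valuations(bounds, constraints):
    """bounds: {var: b} with mu[var] ranging over 0..b (the evaluated
    polynomial bound); constraints: predicates mu -> bool.
    Returns every valuation over the bounded variables satisfying all
    constraints -- a finite set, as observed after Definition 9."""
    names = sorted(bounds)
    ranges = [range(bounds[v] + 1) for v in names]
    result = []
    for values in product(*ranges):
        mu = dict(zip(names, values))
        if all(c(mu) for c in constraints):
            result.append(mu)
    return result
```

For instance, with bound y ≤ 3 and the constraint y ≤ 2, exactly the three valuations {y: 0}, {y: 1}, {y: 2} survive.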
Bounded Linear Logic, Revisited
We are now able to prove the main result of this section:

Theorem 1. Let π be a proof of a sequent Γ ⊢_C B and ρ a mapping of atoms to realizability sets. Then ⟦π⟧_ρ is a C-morphism from ⟦Γ⟧_ρ to ⟦B⟧_ρ.

Proof. The proof is by induction on derivations. We only show the cases that differ significantly from the development in [11].

Case P_!. For simplicity, suppose that n = 1, q_1 = p and A_1 = A. The induction hypothesis shows that ⟦π⟧_ρ is a C-morphism from ⟦A⟧_ρ to ⟦B⟧_ρ, witnessed by e. As in the proof of the main result in [11], we define

d(η) = λ ⊗_i x_i . ⊗_i e(η[x ↦ i]) x_i .

Now, if η ⊨ D, then η[x ↦ i] ⊨ C whenever i < p(η), by the side condition of rule P_!. We obtain that ⟦π⟧_ρ is a D-morphism from ⟦!x<p.A⟧_ρ to ⟦!x<p.B⟧_ρ.
5 On Compositional Embeddings
In this section, we try to justify our emphasis on compositional embeddings. An embedding of a logical system or programming language L into QBAL is a function (·)* from the space of proofs (or programs) of L into the space of proofs of QBAL. Clearly, for an embedding to be relevant from a computational point of view, any proof π of L should be mapped to an equivalent proof π*, e.g., ⟦π⟧ = ⟦π*⟧. The existence of an embedding of L into QBAL implicitly proves that QBAL is extensionally at least as powerful as L. Such an embedding (·)* is not necessarily computable, nor natural. But whenever L is a sound and complete ICC characterization of polynomial time, it must exist, since the classes of
definable functions in L and in QBAL are exactly the same. Indeed, QBAL is both extensionally sound (see Section 4) and extensionally complete (since BLL can be compositionally embedded into it, see below). Typically, one would like to go beyond extensionality and prove that QBAL is intensionally as powerful as L. And if this is the goal, (·)* should be easily computable. Ideally, we would like (·)* to act homomorphically on the space of proofs of L. In other words, whenever a proof π of L is obtained by applying a proof-forming rule R to ρ_1, ..., ρ_n, then π* should be obtainable from ρ_1*, ..., ρ_n* in a uniform way, i.e., depending on R but not on ρ_1, ..., ρ_n. An embedding satisfying the above constraint is said to be strongly compositional. The embeddings we will present in the following two sections are only weakly compositional: [π*] can be uniformly built from [ρ_1*], ..., [ρ_n*] whenever π is obtained by applying R to ρ_1, ..., ρ_n. We believe that the existence of a weakly compositional embedding of L into QBAL is sufficient to guarantee that QBAL is intensionally as powerful as L because, as we pointed out in Section 2.1, [π] can be thought of as the program hidden in the proof π. Notice that BLL can be embedded into QBAL: for every BLL proof π : Γ ⊢ A, there is a QBAL proof π* : Γ ⊢_∅ A such that [π*] = [π]. Moreover, this embedding is strongly compositional.
6 Embedding LFPL
LFPL is a calculus for non-size-increasing computation introduced by the second author [10]. It allows one to capture natural algorithms computing functions such that the size of the result is smaller than or equal to the size of the arguments. This way, polynomial soundness is guaranteed despite the possibility of arbitrarily nested recursive definitions. We here show that a core subset of LFPL can be compositionally embedded into QBAL. LFPL types are generated by the following grammar:

A ::= ♦ | N | A ⊗ A | A ⊸ A.

Rules for LFPL in natural-deduction style are in Figure 2. We omit terms, since the computational content of type derivations is implicit in their skeleton. The set-theoretic semantics ⟦A⟧ of an LFPL formula A can be defined very easily: ⟦♦⟧ = ⋂_{C∈U} C ⇒ C, ⟦N⟧ = ⋂_{C∈U} (C ⇒ C) ⇒ (C ⇒ C), while the operators ⊗ and ⊸ are interpreted as usual. Notice that the interpretation of an LFPL formula does not depend on any environment ρ. This way, any LFPL proof π : A_1, ..., A_n ⊢ B can be given a semantics ⟦π⟧ : ⟦A_1⟧ ⊗ ... ⊗ ⟦A_n⟧ → ⟦B⟧, itself independent of any ρ. LFPL types can be translated to QBAL formulas in the following way:

⟨♦⟩_p^q = ∃ε : {1 ≤ p}.∀α.α ⊸ α
⟨N⟩_p^q = N_p
⟨A ⊗ B⟩_p^q = ∃(x, y) : {x + y ≤ p}.⟨A⟩_x^q ⊗ ⟨B⟩_y^q
⟨A ⊸ B⟩_p^q = ∀(x) : {x + p ≤ q}.⟨A⟩_x^q ⊸ ⟨B⟩_{p+x}^q
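The type translation above is a straightforward structural recursion, which can be sketched as follows. The AST encoding, the variable-naming scheme (x0, y1, ...), and the output syntax are our own choices, not fixed by the paper:

```python
import itertools

# LFPL types as nested tuples: "D" (the diamond ♦), "N",
# ("*", A, B) for A ⊗ B, and ("-o", A, B) for A ⊸ B.
def translate(A, p, q, fresh=None):
    """Translate an LFPL type into a QBAL formula string, following the
    four clauses above; p and q are resource-polynomial expressions."""
    if fresh is None:
        fresh = itertools.count()
    if A == "D":
        return f"∃ε:{{1 ≤ {p}}}.∀α.α ⊸ α"
    if A == "N":
        return f"N_{p}"
    if A[0] == "*":
        x, y = f"x{next(fresh)}", f"y{next(fresh)}"
        l = translate(A[1], x, q, fresh)
        r = translate(A[2], y, q, fresh)
        return f"∃({x},{y}):{{{x}+{y} ≤ {p}}}.({l} ⊗ {r})"
    if A[0] == "-o":
        x = f"x{next(fresh)}"
        l = translate(A[1], x, q, fresh)
        r = translate(A[2], f"{p}+{x}", q, fresh)
        return f"∀({x}):{{{x}+{p} ≤ {q}}}.({l} ⊸ {r})"
    raise ValueError(f"unknown LFPL type: {A!r}")
```

For instance, `translate(("-o", "N", "N"), "p", "q")` produces the bounded universal quantification prescribed by the last clause.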
Axiom, Base Types and Weakening:
• Axiom: A ⊢ A
• S: ♦, N ⊢ N
• T: from ♦, A ⊢ A and ⊢ A, infer N ⊢ A
• W: from Γ ⊢ A, infer Γ, B ⊢ A

Multiplicative Rules:
• I⊸: from Γ, A ⊢ B, infer Γ ⊢ A ⊸ B
• E⊸: from Γ ⊢ A ⊸ B and Δ ⊢ A, infer Γ, Δ ⊢ B
• I⊗: from Γ ⊢ A and Δ ⊢ B, infer Γ, Δ ⊢ A ⊗ B
• E⊗: from Γ ⊢ A ⊗ B and Δ, A, B ⊢ C, infer Γ, Δ ⊢ C

Fig. 2. LFPL
Please observe that the interpretation of any LFPL type is parametrized on two resource polynomials p and q. If a variable x occurs in p, but not in q, then x occurs only positively in ⟨A⟩_p^q: this is an easy induction on the structure of A. The correspondence scales to proofs:

Theorem 2. LFPL can be embedded into QBAL. In other words, for every LFPL proof π : A_1, ..., A_n ⊢ B, there exists a QBAL proof π* : ⟨A_1⟩_{x_1}^y, ..., ⟨A_n⟩_{x_n}^y ⊢_{{Σ_i x_i ≤ y, 1 ≤ y}} ⟨B⟩_{Σ_i x_i}^y such that ⟦π*⟧ = ⟦π⟧. Moreover, the correspondence (·)* is weakly compositional.
Proof. As expected, the proof goes by induction on π. We just give two interesting cases:

• If the last rule used in π is I⊸, then π : Γ ⊢ A ⊸ B and the immediate sub-proof is ρ, with conclusion Γ, A ⊢ B. By induction hypothesis, there is ρ* with the appropriate conclusion and π* becomes:

  σ : ⟨Γ⟩_x^y, ⟨A⟩_z^y ⊢_{{Σ_i x_i ≤ y, Σ_i x_i + z ≤ y, 1 ≤ y}} ⟨B⟩_{Σ_i x_i + z}^y
  ——————————————————————————————— (R⊸)
  ⟨Γ⟩_x^y ⊢_{{Σ_i x_i ≤ y, Σ_i x_i + z ≤ y, 1 ≤ y}} ⟨A⟩_z^y ⊸ ⟨B⟩_{Σ_i x_i + z}^y
  ——————————————————————————————— (R∀)
  ⟨Γ⟩_x^y ⊢_{{Σ_i x_i ≤ y, 1 ≤ y}} ⟨A ⊸ B⟩_{Σ_i x_i}^y

where σ can be easily obtained from ρ* by strengthening the underlying constraint set (see Section 2.2).

• If the last rule in π is T, then π : N ⊢ A and the immediate sub-proofs are ρ, with conclusion ♦, A ⊢ A, and σ, with conclusion ⊢ A. By the induction hypothesis, there are ρ* and σ* with the appropriate conclusions and π* becomes:
  θ : ⟨♦⟩_1^y ⊢_{{z ≤ y, 1 ≤ y}} ⟨A⟩_z^y ⊸ ⟨A⟩_{z+1}^y        ξ : ⊢_{{z ≤ y, 1 ≤ y}} ⟨A⟩_0^y
  ———————————————————————————————
  N_z ⊢_{{z ≤ y, 1 ≤ y}} ⟨A⟩_z^y

where θ and ξ are obtained by instantiating resource variables in ρ* and σ*, respectively (see Section 2.2). This concludes the proof.

One may ask whether such an embedding might work for BLL proper. We believe this to be unlikely, for several reasons. In particular, it seems that BLL lacks a mechanism for turning the information about the size of the manipulated objects from being global to being local. In QBAL, this role is played by first-order quantifiers. As an example, consider the split function for lists of natural numbers that splits a list into two lists, one containing the even entries and one containing the odd entries. The type of that function in LFPL is L(N) ⊸ L(N) ⊗ L(N), where L(·) denotes the type of lists, which we have elided from our formal treatment for the sake of simplicity. In QBAL this function gets the type

L_x(N_y) ⊸ ∃(u, v) : {u + v ≤ x}.L_u(N_y) ⊗ L_v(N_y)

The only conceivable BLL formula for this function is L_x(N_y) ⊸ L_x(N_y) ⊗ L_x(N_y). In LFPL and in QBAL we can compose the split function with "append", yielding a function of type L_x(N_y) ⊸ L_x(N_y) that can be iterated. In BLL this composition receives the type L_x(N_y) ⊸ L_{2x}(N_y), which of course is not allowed in an iteration. But a hypothetical compositional embedding of LFPL into BLL would have to be able to mimic this construction.
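The size bookkeeping behind the split/append example can be illustrated concretely. The sketch below is ours (plain Python lists standing in for the L(N) type); the invariant it checks is exactly the existential constraint u + v ≤ x from the QBAL type:

```python
def split(xs):
    """Split a list of naturals into its even and its odd entries.
    Non-size-increasing: len(evens) + len(odds) == len(xs), which is
    what the constraint u + v <= x records in the QBAL type."""
    evens = [n for n in xs if n % 2 == 0]
    odds = [n for n in xs if n % 2 == 1]
    return evens, odds

def split_append(xs):
    """Composing split with append gives a function whose output has
    exactly the input's length (type L_x(N) -o L_x(N)), so it can be
    iterated; a type of the shape L_x(N) -o L_2x(N) could not."""
    evens, odds = split(xs)
    return evens + odds
```

The point is local size information: the existential binds u and v per call, whereas in BLL the only available bound is the global x for both halves.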
7 Embedding RRW
Ramified recurrence on words (RRW) is a function algebra, extensionally corresponding to the polynomial time functions, introduced by Leivant in the early nineties [14]. Bellantoni and Cook's algebra BC can be easily embedded into RRW. Given a word algebra W, id is the identity function on W and π_i^m are the m-ary projection functions from m-tuples of terms into terms. Functions on W can be defined by composition, primitive recursion, and conditional selection. The expression rec(f_{c_1^W}, ..., f_{c_w^W}, f_{ε^W}) denotes the function obtained from f_{c_1^W}, ..., f_{c_w^W}, f_{ε^W} by primitive recursion. Similarly for comp(g, f_1, ..., f_n) and cond(f_{c_1^W}, ..., f_{c_w^W}, f_{ε^W}). Not every function obtained this way is in RRW: indeed, such functions correspond to the primitive recursive functions on W. In Figure 3, a formal system for judgements of the form f : W_{i_1} × ... × W_{i_n} → W_i (where i_1, ..., i_n, i are natural numbers) is defined. If such a judgement can be derived from the rules in Figure 3, then f is said to be an RRW function (the definition
• id : W_i → W_i
• ε_W : W_i and, for every constructor c of W, c_W : W_i → W_i
• π_i^m : W_{j_1} × ... × W_{j_m} → W_j, provided j_i = j
• from g : W_{i_1} × ... × W_{i_n} → W_i and f_k : W_{j_1} × ... × W_{j_m} → W_{i_k} (for every k), infer comp(g, f_1, ..., f_n) : W_{j_1} × ... × W_{j_m} → W_i
• from f_{c^W} : W_{i_1} × ... × W_{i_n} × W_i × W_j → W_i (for every constructor c) and f_ε : W_{i_1} × ... × W_{i_n} → W_i, infer rec(f_{c_1^W}, ..., f_{c_w^W}, f_ε) : W_{i_1} × ... × W_{i_n} × W_j → W_i, provided i < j
• from f_{c^W} : W_{i_1} × ... × W_{i_n} × W_j → W_i (for every constructor c) and f_ε : W_{i_1} × ... × W_{i_n} → W_i, infer cond(f_{c_1^W}, ..., f_{c_w^W}, f_ε) : W_{i_1} × ... × W_{i_n} × W_j → W_i

Fig. 3. RRW as a formal system
of RRW given here is slightly different from, but essentially equivalent to, the original one [14]). Leivant [14] proved that RRW functions are exactly the polytime computable functions on W. And RRW can be compositionally embedded into QBAL, at least in a weak sense:

Theorem 3. RRW can be embedded into QBAL. Suppose, in other words, that π : f : W_{i_1} × ... × W_{i_n} → W_i and that i < i_{j_1}, ..., i_{j_m}, while i = i_{k_1}, ..., i_{k_h}. Then there exists a QBAL proof π* : W_{x_1}, ..., W_{x_n} ⊢_{{x_{k_1} ≤ x, ..., x_{k_h} ≤ x}} W_{q(x_{j_1},...,x_{j_m})+x} where (∏_{i=1}^n ψ_W) ∘ ⟦π*⟧ ∘ φ_W = f. Moreover, (·)* is weakly compositional.

Quite interestingly, the proof of Theorem 3 (see [4]) is very similar in structure to the proof of polynomial time soundness for BC given in [3], which is based on the following observation: the size of the output of a BC function is bounded by a polynomial in the sizes of its normal arguments, plus the maximum of the sizes of its safe arguments. This cannot be formalized in BLL, because resource polynomials do not include any function computing the maximum of its arguments. On the other hand, it can be captured in QBAL by way of constraints.
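The three function-forming schemes of the algebra (rec, comp, cond) can be rendered as a tiny interpreter over words represented as strings. This is a sketch under our own conventions: tiers are not enforced, so it realizes the primitive recursive functions on W, of which RRW is the ramified restriction:

```python
def rec(step, base):
    """rec(f_c1,...,f_cw, f_eps): h(args, eps) = f_eps(args);
    h(args, c·w) = f_c(args, h(args, w), w).  `step` maps each
    constructor character c to the corresponding f_c."""
    def h(args, w):
        if w == "":
            return base(args)
        return step[w[0]](args, h(args, w[1:]), w[1:])
    return h

def comp(g, *fs):
    """comp(g, f1,...,fn)(args) = g(f1(args), ..., fn(args))."""
    return lambda args: g(tuple(f(args) for f in fs))

def cond(branches, base):
    """cond(f_c1,...,f_cw, f_eps): selection on the head character,
    with no recursive call."""
    def h(args, w):
        if w == "":
            return base(args)
        return branches[w[0]](args, w[1:])
    return h

# Word reversal over {0,1} by primitive recursion
reverse = rec({c: (lambda c: lambda args, prev, w: prev + c)(c)
               for c in "01"},
              lambda args: "")

# A conditional: "1" if the word is non-empty, "0" otherwise
nonempty = cond({c: (lambda args, w: "1") for c in "01"},
                lambda args: "0")
```

Note that `rec` recurses on the whole tail at every step, which is exactly the schema whose unrestricted nesting ramification is designed to tame.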
8 Conclusions
We presented QBAL, a new ICC system embedding both known systems of predicative recursion in the sense of [9]. QBAL overcomes the main weakness of BLL, namely that all resource variables are global. In the authors' view, this constitutes a first step towards unifying ICC systems into a single framework. The next step consists in defining an embedding of light linear logic into QBAL; the authors are currently investigating this.
References

1. Asperti, A., Roversi, L.: Intuitionistic light affine logic. ACM Transactions on Computational Logic 3(1), 137–175 (2002)
2. Barendregt, H.: The Lambda Calculus: Its Syntax and Semantics. Studies in Logic and the Foundations of Mathematics. North-Holland, Amsterdam (1984)
3. Bellantoni, S., Cook, S.: A new recursion-theoretic characterization of the polytime functions. Computational Complexity 2, 97–110 (1992)
4. Dal Lago, U., Hofmann, M.: Bounded linear logic, revisited. Extended version (2009), http://arxiv.org/abs/0904.2675
5. Danos, V., Joinet, J.-B.: Linear logic and elementary time. Information and Computation 183(1), 123–137 (2003)
6. Girard, J.-Y.: Linear logic. Theoretical Computer Science 50, 1–102 (1987)
7. Girard, J.-Y.: Light linear logic. Information and Computation 143(2), 175–204 (1998)
8. Girard, J.-Y., Scedrov, A., Scott, P.: Bounded linear logic: A modular approach to polynomial-time computability. Theoretical Computer Science 97(1), 1–66 (1992)
9. Hofmann, M.: Programming languages capturing complexity classes. SIGACT News Logic Column 9, 12 (2000)
10. Hofmann, M.: Linear types and non-size-increasing polynomial time computation. Information and Computation 183(1), 57–85 (2003)
11. Hofmann, M., Scott, P.: Realizability models for BLL-like languages. Theoretical Computer Science 318(1-2), 121–137 (2004)
12. Lafont, Y.: Soft linear logic and polynomial time. Theoretical Computer Science 318(1-2), 163–180 (2004)
13. Leivant, D.: A foundational delineation of computational feasibility. In: Sixth IEEE Symposium on Logic in Computer Science, Proceedings, pp. 2–11 (1991)
14. Leivant, D.: Stratified functional programs and computational complexity. In: 20th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, Proceedings, pp. 325–333 (1993)
15. Marion, J.-Y., Moyen, J.-Y.: Efficient first order functional program interpreter with time bound certifications. In: Parigot, M., Voronkov, A. (eds.) LPAR 2000. LNCS, vol. 1955, pp. 25–42. Springer, Heidelberg (2000)
16. Murawski, A., Ong, L.: On an interpretation of safe recursion in light affine logic. Theoretical Computer Science 318(1-2), 197–223 (2004)
17. Schöpp, U.: Stratified bounded affine logic for logarithmic space. In: 22nd IEEE Symposium on Logic in Computer Science, Proceedings, pp. 411–420 (2007)
18. Troelstra, A., Schwichtenberg, H.: Basic Proof Theory. Cambridge University Press, Cambridge (1996)
19. Wadsworth, C.: Some unusual λ-calculus numeral systems. In: Seldin, J.P., Hindley, J.R. (eds.) To H.B. Curry: Essays on Combinatory Logic, Lambda Calculus and Formalism. Academic Press, London (1980)
Partial Orders, Event Structures and Linear Strategies

Claudia Faggian¹ and Mauro Piccolo¹,²

¹ Preuves, Programmes et Systèmes, Paris VII (France)
² Dipartimento di Informatica, Università di Torino (Italia)
Abstract. We introduce a Game Semantics where strategies are partial orders, and composition is a generalization of the merging of orders. Building on this, to bridge between Game Semantics and Concurrency, we explore the relation between Event Structures and Linear Strategies. The former are a true concurrency model introduced by Nielsen, Plotkin, Winskel, the latter a family of linear innocent strategies developed starting from Girard’s work in the setting of Ludics. We extend our construction on partial orders to classes of event structures, showing how to reduce composition of event structures to the simple definition of merging of orders. Finally, we introduce a compact closed category of event structures which embeds Linear Strategies.
1 Introduction and Background
Game Semantics has been successful in providing accurate (fully abstract) models of programming languages and logical systems; the key feature of such a semantics is that it is interactive. Computation is interpreted as a play (an interaction) between two players, where Player (P) represents the program/proof, and Opponent (O) represents the environment, the context. The set of the possible plays represents the operational behavior of a term, and is called a strategy. The play should respect some "rules of the game", expressed by an arena, which denotes a type. Since interaction is the main feature also of concurrent systems and of process calculi, it appears natural to search for an extension of Game Semantics able to model parallel and concurrent computation. This is indeed an active line of research, even though still at an early stage. A way to allow parallelism is to relax sequentiality, and have plays (i.e. traces of computation) which are partial orders instead of totally ordered sequences of moves. The intent of this approach is actually twofold: to allow for parallelism, and to provide partial order models of sequential computation, i.e. models where the scheduling of the actions to be performed is not completely specified, while it is still possible to express constraints: certain tasks may have to be performed before others; other actions can be performed in parallel, or scheduled in any order.
The second author is partially supported by the MIUR-Cofin'07 CONCERTO Project.
P.-L. Curien (Ed.): TLCA 2009, LNCS 5608, pp. 95–111, 2009. © Springer-Verlag Berlin Heidelberg 2009
A Game Semantics where strategies would be partial orders was first propounded by Hyland in a seminal talk in Lyon. Our paper is an effort to pursue this direction. We introduce a Game Semantics where strategies are partial orders and composition is the merging of the orders; on this basis, we are led to investigate how a class of strategies, here called linear strategies, fits inside a larger picture where we have true concurrency models.

Linear strategies. In this paper, we use the name linear strategies to designate a family of strategies originated by the work of Girard in Ludics [10]. Ludics can be seen as a Game Semantics where the foundational role of interaction is taken even further; moreover, many features (actions, names, a built-in observational equivalence) make it close to process calculi. We mention that, as established in [7], Ludics has a close relation with the Linear Pi-calculus [26], i.e. a process calculus which is asynchronous and internal (see [20]). The strategies defined in Ludics (called designs in the original paper) can be seen as a (linear) variant of Hyland-Ong innocent strategies. A "more parallel" version of Ludics was then proposed in [5], leading to the introduction of graph strategies, called L-nets; these are in many ways close to proof-nets. Exploring this and moving through several degrees of sequentiality is the object of [4].

True concurrency. In the literature, there are two main approaches to the study of models for parallel and concurrent programming languages. The first one is represented by interleaving models, which describe a concurrent system by means of the possible schedulings of concurrent actions (all traces). The second one is represented by causal models (also called true-concurrency models), in which concurrency, dependency and conflict relations among actions are directly expressed.
A fundamental instance of true concurrent models are event structures, introduced by Nielsen, Plotkin and Winskel [19,23,24] as a theory combining Petri Nets and Domain Theory. An event structure describes a concurrent system in terms of a partial order, which specifies the causality relation between actions, and a conflict relation, which specifies which actions are mutually exclusive. In a previous work [6], we proposed event structures as a mathematical framework unifying proof-nets and linear strategies. Building on previous work by Varacca and Yoshida [22], there we gave a general definition of composition, based on an abstract machine, which realizes both event structure and game semantics composition. Moreover, we showed that linear strategies (and proof-nets, in a sense) correspond to a particular subclass of event structures called confusion-free event structures; these model a kind of well-behaved non-determinism, in which choice is localized in "cells".

Contributions of this paper. In this paper, we introduce a Game Semantics where strategies are partial orders (called po strategies) and composition is a generalization of the merging of orders (Section 2). The merging of two orders has been defined by Girard in [10] as the transitive closure of their set-theoretical union. Under certain acyclicity conditions, the result is a partial order. This is the basis of composition of linear strategies, and also of our po strategies. More precisely, a central contribution of this paper is to generalize this operation,
without requiring any acyclicity condition. The main advantage of this definition is that it is mathematically clean and attractive, and this translates into direct and, we believe, clearer proofs. The Game Semantics we define is rather general, since we require our strategies neither to be Player/Opponent alternating nor to be sequential. Moreover we admit an additional neutral polarity; intuitively, neutral actions correspond to the internal τ-actions of process calculi. Following an idea proposed by Hyland, we show how to extend the notion of innocence to partial order strategies, i.e. to a setting which is parallel and non-alternating. In the Hyland-Ong game model [12], innocence is an important property of strategies: it yields a well-defined Cartesian closed category which provides a model for PCF and whose effective part consists of morphisms (strategies) that are PCF-definable. In this paper we will see how to extend innocence to a parallel and non-alternating setting, showing that such a condition allows us to obtain a well-defined category where arenas are objects and po strategies are morphisms. In Section 3 we show how to generalize this construction to event structures, in order to define a general setting embedding linear strategies. Arenas are generalized to ES-arenas and po strategies become typed event structures. All these structures can be described in terms of special families of po strategies. Using this characterization, we show how to define a simple notion of composition of typed event structures, relying on the definition of merging of partial orders. This definition of composition corresponds to the one given in [22]. We then conclude by generalizing the category of innocent po strategies to a compact closed category of innocent event structures. Finally, starting from this category, we retrieve linear strategies as a sub-class.

Related work.
The exploration of concurrent Game Semantics was initiated by Abramsky and Melliès with the introduction of Concurrent Games [2], which give a fully complete model of Multiplicative-Additive Linear Logic; strategies are here closure operators. Melliès and Mimram have then developed the fertile line of asynchronous games [17,18], where plays are seen as Mazurkiewicz traces, and innocence receives a diagrammatic formulation. Since the purpose of this line of work is a better understanding and a generalization of innocence, there are several connections between our work and in particular [18], where strategies likewise are required to be neither alternating nor sequential. A main difference is whether one works with traces or with partial orders; the latter choice is what allows us to rely on the merging of orders for composition. Graph strategies have been introduced by Hyland and Schalk [14,21]. Partial order models were then proposed by McCusker [16], who used them to study programming languages. The work of Curien, Faggian and Maurel [4,5] also fits in this line. Game semantics for languages equipped with concurrency features have been developed by Ghica, Laird and Murawski [15,9]. Strategies are there described as sets of traces.
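An event structure in the sense recalled above (causality plus conflict) is easy to experiment with. The sketch below is our own encoding: causality as a set of pairs, conflict as a list of two-element sets, and a brute-force enumeration of the finite configurations, i.e. the downward-closed, conflict-free sets of events:

```python
from itertools import chain, combinations

def configurations(events, causes, conflict):
    """Finite configurations of an event structure: subsets of events
    that are downward closed under the causality pairs (d, e), meaning
    d < e, and contain no conflicting pair.  Brute force over all
    subsets, so only suitable for small examples."""
    evs = list(events)
    all_subsets = chain.from_iterable(
        combinations(evs, k) for k in range(len(evs) + 1))
    result = []
    for subset in all_subsets:
        s = set(subset)
        closed = all(d in s for (d, e) in causes if e in s)
        consistent = all({a, b} not in conflict
                         for a in s for b in s if a != b)
        if closed and consistent:
            result.append(frozenset(s))
    return result

# a causes c; b and c are in conflict (mutually exclusive)
confs = configurations({"a", "b", "c"},
                       causes={("a", "c")},
                       conflict=[{"b", "c"}])
```

Here the five configurations are ∅, {a}, {b}, {a,b} and {a,c}: the set {c} is ruled out by downward closure, and any set containing both b and c by conflict.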
2 Partial Orders as Strategies
In this section, we introduce a notion of arena and a notion of strategy on an arena, where a strategy is a partial order (po strategy). We follow a minimalist approach, in the spirit of [11,3]. We first introduce a rather general setting in which we have po strategies and a notion of parallel composition (which is well defined and associative); we then gradually add properties (innocence, alternation, arborescence), and show that these properties are all preserved by composition. Furthermore, by introducing a notion of innocence, we eventually obtain (Section 2.4) a category where the objects are arenas, the arrows are innocent po strategies, and their composition factorizes as parallel composition plus hiding. However, since both parallel composition and hiding have an independent interest, we discuss them separately. At the end of Section 2.3, after giving the definitions, we will discuss the connection between the notions we introduce and the standard notion of innocent strategy.

We first recall some preliminary notions on partial orders. A strict partial order (spo for short) is a pair ⟨X, <X⟩ where X is a set and <X is a binary, transitive and irreflexive relation; it is often denoted simply as X. We will use X, Y, Z, W, ... to range over spos. We will move freely between a strict partial order ⟨X, <X⟩ and the partially ordered set (poset) ⟨X, ≤X⟩, where ≤X is the reflexive closure of <X. Given an element x ∈ X, the set of its enabling elements is [x) = {x′ ∈ X | x′ <X x}. An spo is well founded if the set [x) is finite for all x ∈ X. It is arborescent if it is well founded and each x ∈ X has at most one immediate predecessor; we call forest a strict partial order which is arborescent (we will talk of roots and children in the obvious way). A subset S ⊆ X is said to be downward closed if for all x ∈ S, if y <X x then y ∈ S.

2.1 Arenas and Po Strategies
A polarity is an element of the set {+, −, ±}: the positive polarity + corresponds to Player, while the negative polarity − corresponds to Opponent. The neutral polarity ± plays a role similar to that of a τ-action in a process calculus. Given a polarity ∗, its dual ∗⊥ is defined by +⊥ = − and −⊥ = +, while ±⊥ is undefined. An arena is a set of elements, here called actions, together with a polarity for each action, and possibly some order structure on the actions (which may express causality or dependency). Our choice here is that the order is a forest (but this is not necessary).

Definition 1. An arena is a set Γ, whose elements are called actions, together with a strict partial order relation <Γ and a polarization function πΓ : Γ → {+, −, ±}, which satisfies: (1.) ⟨Γ, <Γ⟩ is a forest. (2.) For each a ∈ Γ, if π(a) = ±, then π(c) = ± for all c ∈ Γ which are comparable with a.
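Both conditions of Definition 1 are decidable on finite arenas. In the sketch below (our own encoding), the forest is given directly as an immediate-predecessor map, so condition (1.) reduces to checking well-foundedness, and condition (2.) to checking that polarities along any branch are uniformly neutral or uniformly non-neutral:

```python
def is_arena(parent, polarity):
    """Check Definition 1 on a finite candidate arena.  `parent` maps
    each action to its immediate predecessor (None for roots), so each
    action has at most one predecessor by construction; we verify
    acyclicity and condition (2): a neutral action is comparable only
    with neutral actions."""
    def ancestors(a):
        seen, p = [], parent[a]
        while p is not None:
            if p == a or p in seen:
                return None        # cycle: not well founded
            seen.append(p)
            p = parent[p]
        return seen
    for a in parent:
        anc = ancestors(a)
        if anc is None:
            return False
        neutral = polarity[a] == "±"
        # every comparable pair is an ancestor/descendant pair,
        # so checking each action against its ancestors suffices
        if any((polarity[b] == "±") != neutral for b in anc):
            return False
    return True

# a negative question justifying a positive answer: a legal arena
ok = is_arena({"q": None, "a": "q"}, {"q": "-", "a": "+"})
# a neutral action below a polarized one on the same branch: illegal
bad = is_arena({"q": None, "a": "q"}, {"q": "±", "a": "+"})
```

The first call accepts, the second rejects, matching condition (2.) of the definition.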
of Γ, <Γ . If b is immediate predecessor of c, we say that b justifies c, and write b Γ c. An arena Γ is called +/- polarized if πΓ (Γ ) ⊆ {+, −}, it is neutral if πΓ (Γ ) = {±}. An arena Γ is alternating if a1 Γ a2 ⇒ π(a1 ) = π(a2 )⊥ . Observe that our definition of arena is rather general, and does not require alternation. By definition, if b Γ c, either none of them is neutral, or both are neutral. Constructions on arenas. Given two arena Γ1 , Γ2 which are disjoint, i.e. Γ1 ∩ Γ2 = ∅, we write Γ, Δ or, for better readability, Γ Δ for Γ1 ∪ Γ2 with the order and polarization which are induced by the Γi ’s. Given an arena Γ , the neutral arena Γ ± is the arena having the same actions and order relation as Γ , but neutral polarity for all its actions: πΓ ± (a) = ±, for each a ∈ Γ . Given a +/- polarized arena Γ , its dual Γ ⊥ is the arena having the same actions and order relation as Γ , but inverting their polarity: πΓ ⊥ (a) = πΓ (a)⊥ . Remark 1. Observe that Γ ⊥ is not defined on neutral arenas. By writing Γ ⊥ we implicitly assume that Γ is +/- polarized. Po strategies. We are interested in spos X whose elements are taken on an arena Γ . We require that, for the elements, X ⊆ Γ and, for the order, <X respects and refines <Γ (see Condition (1.) below). We will call such spos po strategies. Definition 2. Let Γ be an arena. A spo X on Γ is a well founded spo such that X ⊆ Γ (as sets of actions). X is a a po strategy on the arena Γ , written X : Γ when it satisfies: (1.) For each b ∈ X and a ∈ Γ , if a Γ b then a ∈ X and a <X b. (2.) For each c which is maximal in X, πΓ (c) = +. Both conditions reformulate standard conditions in Game Semantics; the first condition is usually called justification. The second one means that the strategy always has an answer to an opponent move. Observe that each element in X : Γ has a polarity, the one given by πΓ . Given a, b ∈ X, we write a ←X b if a is an immediate predecessor of b according to <X . 
Recall that we write a ⊢Γ b for the same relation in the arena. Let X and Y be spos. If X : Γ is a po strategy, we write Y ⊑ X if Y ⊆ X is downward closed and, for all a, b ∈ Y, a <Y b iff a <X b.

2.2 Parallel Composition
If X is a po strategy on Γ and Γ = Γ′ Γ′′, we define X ↾Γ′ = {c ∈ X | c ∈ Γ′} to be the restriction of X to Γ′. We compose two po strategies X1, X2 when they are compatible, that is X1 : Γ1 Λ⊥ and X2 : Λ Γ2 (with Γ1, Γ2, Λ disjoint).

Definition 3 (Parallel composition). Given two orders Z1 and Z2, we write Z1 Z2 for the transitive closure of Z1 ∪ Z2 (which need not be an order).
Let X1 : Γ1 Λ⊥ and X2 : Λ Γ2 (with Γ1, Γ2, Λ disjoint). The actions on the common arena Λ are called private. We define

Int(X1, X2) = { Y spo on Γ1 Γ2 Λ± | ∃Y1, Y2 s.t. Y = Y1 Y2, Y ↾Γ1 Λ⊥ = Y1 ⊑ X1, Y ↾Λ Γ2 = Y2 ⊑ X2 }.

We define the parallel composition of X1, X2 as X1 X2 = max Int(X1, X2). We need to prove that Int(X1, X2) has a unique maximal element. The proof makes essential use of the following immediate Lemma.

Lemma 1. Let X1, X2 and Y be as in Definition 3. If Y1 = Y ↾Γ1 Λ⊥ ⊑ X1 and Y2 = Y ↾Λ Γ2 ⊑ X2, then for each private action c, c ∈ Y1 iff c ∈ Y2.

Proposition 1. The set Int(X1, X2) given in Definition 3 has a unique maximal element (hence, parallel composition is well defined).

Proof. Let U = U1 U2 and V = V1 V2 both belong to Int(X1, X2), where for each i ∈ {1, 2}, Ui = U ↾Γi Λ and Vi = V ↾Γi Λ are the downward-closed subsets of Xi given by Definition 3. Let us set Y1 = U1 ∪ V1, and observe that Y1 ⊑ X1 (and similarly for Y2). We prove that Y = Y1 Y2 is an spo.
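The merging operation at the heart of Definition 3 — the transitive closure of the union of the two orders — can be sketched directly. Whether the result is still a strict partial order amounts to checking that the closure relates no element to itself. This is our own rendering, with orders as sets of pairs:

```python
def merge_orders(r1, r2):
    """Merging of two strict orders, each given as a set of pairs
    (a, b) meaning a < b: take the transitive closure of their union.
    Returns the closed relation together with a flag telling whether
    it is still a strict partial order (no cycle was created)."""
    rel = set(r1) | set(r2)
    changed = True
    while changed:  # naive transitive closure, fine for small examples
        changed = False
        for (a, b) in list(rel):
            for (c, d) in list(rel):
                if b == c and (a, d) not in rel:
                    rel.add((a, d))
                    changed = True
    is_spo = all(a != b for (a, b) in rel)
    return rel, is_spo

# compatible constraints merge into an order ...
closure, ok = merge_orders({("a", "b")}, {("b", "c")})
# ... while opposite constraints create a cycle
_, still_order = merge_orders({("a", "b")}, {("b", "a")})
```

The second call is exactly the situation the paper's generalization handles: the merge is always defined, and failure of acyclicity is detected afterwards rather than assumed away.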
Theorem 2 (Associativity). If X1, X2, X3 are po strategies which are pairwise compatible, we have that (X1 X2) X3 = X1 (X2 X3).

Proof. Let (X1 X2) X3 = ⟨Z, <Z⟩.
In the next section, we show that composition preserves several interesting properties. We will need the following easy observations.

Proposition 2. Let a ∈ X1 X2 : Γ1 Γ2 Λ±, with X1 : Γ1 Λ⊥ and X2 : Γ2 Λ. Then: (1.) If a is private (a ∈ Λ), then a ∈ X1 and a ∈ X2. Its polarity is neutral. (2.) Otherwise, either a ∈ X1 (and a ∉ X2) or a ∈ X2 (and a ∉ X1). Moreover, assuming a ∈ X1, then c ← a (resp. a ← c) in X1 X2 iff c ←X1 a (resp. a ←X1 c); similarly if a ∈ X2.

2.3 Innocence (and Alternation, and Arborescence)
Composition of po strategies is associative, but, as we will see, it is not possible to define a strategy which behaves as the identity for composition. This motivates the restriction to a class of po strategies which we call innocent, for reasons we discuss at the end of this section. Composition preserves innocence. Moreover, more standard notions of strategies, such as arborescent strategies and alternating strategies, will form a subclass of innocent po strategies.

Definition 4. We say that a po strategy X : Γ is innocent when for all b, c ∈ X, if b ←X c and (πΓ(c) = − or πΓ(b) = +) then b ⊢Γ c.

Observe that up to now we have been rather general. In particular, in our definitions we do not require alternation in the polarity of the actions.

Remark 2. If the arena Γ is alternating and +/− polarized, then our innocence condition implies +/− alternation in X : Γ.

Remark 3. Innocent po strategies allow neutral actions. The innocence condition implies that immediate predecessors (resp. successors) of neutral actions are either negative or neutral (resp. either positive or neutral).

Proposition 3. If X1 : Γ1 Λ and X2 : Γ2 Λ⊥ are innocent, then: (1.) X1 X2 : Γ1 Γ2 Λ± is innocent. (2.) Moreover, if X1, X2 are forests, then X1 X2 is a forest.

Proof. (1.) Assume c ← a in X1 X2 = Y1 Y2, and π(a) = − or π(c) = +. By Proposition 2, for one of the two Yi (i ∈ {1, 2}) we have c ←Yi a. Hence, by innocence of Yi, c ⊢Γi a. (2.) Let Y = X1 X2, with Y = Y1 Y2, defined as in Definition 3. Let T be a maximal spo such that T is a forest and T ⊑ Y. We consider a minimal a ∈ Y such that a ∉ T. We observe that (i) each immediate predecessor of a in Y is an immediate predecessor of a in either Y1 or Y2; (ii) if a ∈ Y1 (resp. a ∈ Y2), a has at most one predecessor there. We now prove that a has a unique predecessor in Y, hence the restriction of Y to T ∪ {a} is still a forest. If a is not private, the result is immediate by Proposition 2 and by (ii). The same is true if a is private and a root in one of the Yi. If a is private and not a root, let us assume πΛ(a) = −. Let z ←Y2 a and b ←Y1 a. By innocence of Y1, we have that b ⊢Λ a. By justification, we have that b ≤Y2 z. Hence z is the only possible immediate predecessor of a also in Y.
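Definition 4 is directly checkable on finite strategies. The sketch below (our own encoding) computes the immediate-predecessor pairs from the strict order and then tests the innocence condition against the arena's justification relation:

```python
def immediate_pairs(elems, lt):
    """Pairs (b, c) with b an immediate predecessor of c in the finite
    strict order given by the predicate lt."""
    return {(b, c) for b in elems for c in elems if lt(b, c)
            and not any(lt(b, m) and lt(m, c) for m in elems)}

def is_innocent(elems, lt, pol, justifies):
    """Definition 4: whenever b is an immediate predecessor of c and
    c is negative or b is positive, b must justify c in the arena."""
    return all(justifies(b, c) for (b, c) in immediate_pairs(elems, lt)
               if pol[c] == "-" or pol[b] == "+")

# A positive action immediately followed by a negative one must be
# its justifier: the strategy below is innocent only if it is.
elems = ["a", "q2"]
lt = lambda x, y: (x, y) == ("a", "q2")
bad = is_innocent(elems, lt, {"a": "+", "q2": "-"}, lambda b, c: False)
good = is_innocent(elems, lt, {"a": "+", "q2": "-"}, lambda b, c: True)
```

This mirrors the reading given in the Discussion below Definition 4: the strategy may refine the arena order on its own positive actions, but never on Opponent's.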
Discussion. Linearity and pointers. The strategies we have defined are linear, in the sense that there are no repetitions of actions. For this reason, pointers are not required: for each action c in the strategy there is a unique action b which justifies c. One can also say that pointers are implicit: c points to b.

Innocence. Intuitively, innocence captures the idea that Player is not able to see Opponent's internal calculations; a Player strategy in the game is completely determined by the piece of the play it can see (called the view). While the most standard presentation of a strategy is as the set of all its possible plays, an innocent strategy is completely determined by its views, and can therefore equivalently be described as a set of views (see [11,3]). This is the approach we follow here, in a sense we are going to explain. It is immediate to associate to a po strategy X a set of views: following the construction developed in [4], we can associate to each x ∈ X the restriction of (X, <) to the set {x′ ∈ X : x′ ≤ x} (in a parallel setting, a view is not a sequence of actions, but a partial order). One should now see that Definition 4 generalizes the definition of view to a setting which is parallel and non-alternating. In the standard approach, one calls view a play where every O move is justified by the immediately preceding P move. This captures the idea that the only information that Player has on O moves is the dependency in the arena. Our condition says exactly that the strategy cannot refine the order given by the arena on O actions.

2.4 Arenas and Innocent Po Strategies as a Category
In this section, we show that we can organize what we have seen into a category, where the objects are +/- polarized arenas and the arrows are innocent po strategies. We define composition as standard in Game Semantics: composition = parallel composition + hiding. To complete the construction, we then verify that we have an identity arrow for each object. Hiding consists in removing the private actions, which correspond to "internal communication", i.e. the actions which are used in the parallel composition to make the two structures communicate. The following is immediate (using Remark 3).

Proposition 4 (Hiding). Let X : Γ ∥ Λ± be innocent and X′ := X ↾ Γ. We have that X′ : Γ is a po strategy. Moreover, (i) if X is innocent then X′ is innocent; (ii) if X is arborescent then X′ is arborescent.

Composition preserves innocence, and in that case also arborescence and alternation, as these properties are preserved by both parallel composition and hiding. Putting all the pieces together, we have the following.

Definition 5. Let X1 : Γ1 ∥ Λ and X2 : Γ2 ∥ Λ⊥ (Γ1, Γ2, Λ disjoint). We define their composition as X1; X2 = (X1 ∥ X2) ↾ Γ1,Γ2.

Theorem 3. Let X1 : Γ1 ∥ Λ and X2 : Λ⊥ ∥ Γ2. Then X1; X2 : Γ1 ∥ Γ2. Moreover, if X1, X2 are innocent, we have the following: (1.) X1; X2 is innocent. (2.) If X1, X2 are po strategies on alternating arenas, then X1; X2 is alternating. (3.) If X1, X2 are arborescent, then X1; X2 is arborescent.
Partial Orders, Event Structures and Linear Strategies
103
Identity (copycat). In this section we introduce a copycat strategy, which generalizes what is called fax in [10] and copycat in Game Semantics. Our approach closely corresponds to that proposed by Hyland in [13]. All along this section we fix two +/- polarized arenas Δ and Δ′, which are disjoint and isomorphic, i.e. there is an order isomorphism φ : Δ → Δ′ such that πΔ(a) = πΔ′(φ(a)). We say that φ is a renaming function, and say that Δ and Δ′ are equal up to renaming. We extend this notion to po strategies too. Given two po strategies X : Γ ∥ Δ and X′ : Γ ∥ Δ′, we say that they are equal up to renaming if X′ is obtained from X by substituting each occurrence of a ∈ Δ with φ(a) ∈ φ(Δ). The copycat is a strategy idΔ→φ(Δ) which copies any action from Δ into the corresponding action in φ(Δ).

Definition 6 (Copycat). Let Δ and Δ′ = φ(Δ) be two arenas which are equal up to renaming. We define idΔ→Δ′ : Δ⊥ ∥ Δ′ as the spo X, where the order is that induced by <Δ and <Δ′ with the addition of all the pairs {c ←X φ(c) | c ∈ Δ, πΔ(c) = −} and {φ(c) ←X c | c ∈ Δ, πΔ(c) = +}.

Example 1. Let Δ⊥ and Δ′ be the isomorphic arenas represented below. We illustrate idΔ→φ(Δ) in the following picture, where we indicate with a dashed line the order which is added w.r.t. the arenas.

[Figure: the two isomorphic arenas Δ⊥ and Δ′, and the copycat idΔ→φ(Δ), drawn over actions a, b and their copies (a1+, a2−, b1−, b2+); dashed lines mark the order pairs added to the arena orders.]
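The construction of Definition 6 can be sketched as follows. The encoding (an arena as actions, immediate-order pairs, and a polarity map; φ as a dictionary) is hypothetical; the code follows the definition literally, adding one cross-pair per action according to its polarity in Δ.

```python
# Sketch of the copycat order of Definition 6 (hypothetical encoding).
# `actions`/`order`/`polarity` describe Delta; `phi` maps each action of
# Delta to its renamed copy in Delta'.

def copycat_order(actions, order, polarity, phi):
    pairs = set()
    for (a, b) in order:
        pairs.add((a, b))                 # order of Delta
        pairs.add((phi[a], phi[b]))       # order of Delta'
    for c in actions:
        if polarity[c] == "-":
            pairs.add((c, phi[c]))        # c  <-X  phi(c)
        else:  # "+"
            pairs.add((phi[c], c))        # phi(c)  <-X  c
    return pairs
```

On a two-action arena with a negative and a positive action, the negative action precedes its copy and the positive action follows its copy, as the definition prescribes.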
Remark 4. If the arena is alternating, the definition above gives us the standard copycat strategy.

Proposition 5 (Identity). (1.) idΔ→φ(Δ) defined in Definition 6 is an innocent po strategy. (2.) idΔ→φ(Δ) ↾ φ(Δ) = φ(idΔ→φ(Δ) ↾ Δ). (3.) Let X : Γ ∥ Δ be an innocent po strategy. Then X; idΔ→φ(Δ) : Γ ∥ φ(Δ) is an innocent po strategy φ(X) which is equal to X up to renaming.

Let us conclude with an example of the fact that, without the innocence condition, the composition with the copycat does not produce the desired effect. Let us consider three singleton arenas Γ1 = {a}, Γ2 = {b}, Γ3 = {c}, where πΓ1(a) = πΓ2(b) = πΓ3(c) = +. We then consider the following po strategy X : Γ1 ∥ Γ2 ∥ Γ3, composed with idΓ1→φ(Γ1).

[Figure: the po strategy X over the actions a+, b+, c+; the copycat idΓ1→φ(Γ1) over a−, φ(a)+; and the composition X; idΓ1→φ(Γ1) over b+, φ(a)+, c+.]
Hiding and observational equivalence. The neutral actions which we hide are silent actions which correspond to an internal synchronization. Hiding gives
a canonical representation of an event structure with respect to observational equivalence. This idea is made precise in process calculi by the notion of weak bisimilarity. By analogy with labelled transitions in process calculi, we generate a labelled transition system on spos as follows: if a is minimal in X then X −a→ X \ {a}. We then define the following reductions: (i) ⟹ is the reflexive transitive closure of −τ→, where τ denotes any neutral action; (ii) =a⇒ is ⟹ −a→ ⟹, where a is a non-neutral action. We define weak bisimilarity on po strategies as the greatest binary symmetric relation ≈ satisfying the following property: whenever X1 ≈ X2 and X1 −a→ X1′, then there exists X2′ such that X2 =a⇒ X2′ and X1′ ≈ X2′. We have the following.

Proposition 6. Let X : Γ ∥ Λ±, s.t. Γ is +/- polarized. We have that X ≈ X ↾ Γ.
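The labelled transition system just described can be sketched concretely. The encoding is hypothetical (a strategy as events plus strict-order pairs, with the neutral events listed separately); `weak_successors` enumerates the states reachable by firing neutral minimal actions, i.e. the silent part of the ⟹ reduction.

```python
# Sketch of the labelled transition system on spos (hypothetical encoding:
# events as a set, order as a set of (below, above) pairs, neutral events in `taus`).

def minimal(events, order):
    # events with no predecessor can fire
    return {e for e in events if not any(b == e for (_, b) in order)}

def fire(events, order, a):
    assert a in minimal(events, order)
    return events - {a}, {(x, y) for (x, y) in order if x != a and y != a}

def weak_successors(events, order, taus):
    # all states reachable by firing any sequence of neutral minimal actions
    seen, todo = set(), [(frozenset(events), frozenset(order))]
    while todo:
        es, os = todo.pop()
        if (es, os) in seen:
            continue
        seen.add((es, os))
        for t in minimal(set(es), set(os)) & taus:
            ne, no = fire(set(es), set(os), t)
            todo.append((frozenset(ne), frozenset(no)))
    return seen
```

For a strategy with a neutral action t below a visible action a, the silent reduction removes t, after which a becomes minimal and can fire as a visible step.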
3 Event Structures as Strategies
In this section, we extend the construction we have seen for partial orders to event structures.

Definition 7. An event structure is a triple E = ⟨E, ≤, #⟩ such that (1.) ⟨E, ≤⟩ is a well-founded partial order. (2.) # is an irreflexive and symmetric relation, called the conflict relation, which is hereditary, i.e. for every e1, e2, e3 ∈ E, if e1 ≤ e2 and e1 # e3 then e2 # e3.

Given two event structures E1 = ⟨E1, ≤1, #1⟩ and E2 = ⟨E2, ≤2, #2⟩ with E1 ∩ E2 = ∅, we define E1 ∥ E2 = ⟨E1 ∪ E2, ≤1 ∪ ≤2, #1 ∪ #2⟩. With an abuse of notation, given an event structure E, we will confuse E with the set of its events E, writing e ∈ E (resp. x ⊆ E) for e ∈ E (resp. x ⊆ E). Causal order and conflict are mutually exclusive. Two events which are neither causally related nor in conflict are said to be concurrent. A conflict e1 # e2 is said to be inherited from the conflict e1 # e2′ if e2′ ≤ e2. If the conflict e1 # e2 is not inherited from any other conflict, we say that it is immediate, written e1 #μ e2. We denote by #̄ (resp. #̄μ) the reflexive closure of # (resp. #μ). Given an event structure E, a configuration is a set x ⊆ E which is downward closed and conflict free, i.e. if e, e′ ∈ x then it is never the case that e # e′. For example, given e ∈ E, the sets [e) and [e] = [e) ∪ {e} are configurations. Observe that a configuration x is implicitly a partially ordered set (and so a spo), where the partial order is the restriction of the partial order of E to x. This fact is key to our approach, together with the fact that important relations in an event structure can be recovered from its configurations; in fact, an event structure can also be described as a special set of configurations, as we sketch below. Let us denote with C(E) the family of all configurations of E. C(E) ordered by inclusion forms a coherent, finitary prime algebraic domain (a dI-domain
satisfying an additional condition [24]) whose set of prime elements is {[e] | e ∈ E}. A converse result holds too, i.e. every coherent finitary prime algebraic domain can be described using an event structure whose events are the prime elements, where the order between events is inherited from the order between prime elements, and two events are in conflict when they do not admit an upper bound as prime elements of the domain. The following result, due to Winskel [25, pp. 60-61], summarizes the previous notions.

Theorem 4 ([25]). Coherent finitary prime algebraic domains and event structures are equivalent.

More details can be found in [25]. This fact will allow us to rely on the results we developed in the previous section. Event structures form the class of objects of a category, whose morphisms are given by the partial maps λ : E1 → E2 satisfying, for every x ∈ C(E1): λ(x) ∈ C(E2), and for all e1, e2 ∈ x, if λ(e1), λ(e2) are both defined and λ(e1) = λ(e2), then e1 = e2. This category admits all finite products, co-products and pullbacks [25]. A morphism is said to be total when the underlying map is.

3.1 Typed Event Structures
We now introduce the notion of typed event structures. Informally, a typed event structure consists of a pair of event structures, one describing a process (a strategy) and the other one denoting a type (an arena); the typing relation between the two is represented by the existence of a total morphism relating them.

Definition 8. An ES-arena is a pair Γ = ⟨Γ, #Γ⟩ where Γ is an arena and #Γ is a binary relation such that (1.) ⟨Γ, ≤Γ, #Γ⟩ is an event structure; (2.) πΓ is such that (i.) if πΓ(a) = ± then πΓ(b) = ± for all b s.t. b #Γ a; (ii.) if a1 #μ a2 then πΓ(a1) = πΓ(a2).

All along this section, we will consider an ES-arena Γ both as an event structure and as an arena in the sense of Section 2 (by ignoring the conflict relation). Furthermore, it is clear that each configuration x ∈ C(Γ) with the induced polarization function is also an arena in the sense of Section 2. Thus, we can adapt all the definitions given in Section 2 (alternation, neutrality, duality, +/- polarization, . . . ) to ES-arenas. In particular, given two disjoint ES-arenas Γ1 = ⟨Γ1, #Γ1⟩ and Γ2 = ⟨Γ2, #Γ2⟩, we define Γ1 ∥ Γ2 = ⟨Γ1 ∥ Γ2, #Γ1 ∪ #Γ2⟩.

Definition 9. Let Γ be an ES-arena, E an event structure. We say that E is typed in Γ (written E : Γ) if there is a total morphism λ : E → Γ, called the labeling morphism, satisfying: (1.) if e ∈ E is maximal w.r.t. the order of E, then πΓ(λ(e)) = +.

Let E : Γ be a typed event structure with a labeling morphism λ, and let x ∈ C(E) be a configuration. Let us consider the structure ⟨λ(x), <⟩, where the order < is defined as λ(e1) < λ(e2) when e1 < e2 (e1, e2 ∈ E). ⟨λ(x), <⟩ is a well-defined spo (since λ is a total morphism and an injective map on configurations) and it
is isomorphic to x (viewed as a spo). We say that ⟨λ(x), <⟩ is a slice of E when x is such that for all e ∈ x, if e is maximal then πΓ(λ(e)) = +. We denote with Slices(E) the set of all slices of E and we use S, S1, S2, . . . to range over slices.

Lemma 3. Let E : Γ be a typed event structure and let S ∈ Slices(E). Then S = ⟨x, <⟩ is a po strategy on Γ.

The following definition characterizes when a set of slices corresponds to a typed event structure; in this way, we are able to present a typed event structure as a set of spos (which are po strategies).

Definition 10. Let Γ be an ES-arena. A family F of po strategies on Γ, ordered by the order ⊑ (see Section 2), is said to be a po strategies family when it satisfies: (1.) If X ∈ F and Y ⊑+ X then Y ∈ F. (2.) For all X̃ ⊆ F such that every X1, X2 ∈ X̃ admit an upper bound in F, we have ⊔X̃ ∈ F.

Observe that the structure of F is very close to that of a coherent finitary prime algebraic domain.

Proposition 7. (1.) Let E : Γ be a typed event structure. Then Slices(E) is a po strategies family. (2.) Let F be a po strategies family on Γ. Then there exists a typed event structure Ev(F) : Γ such that Slices(Ev(F)) is isomorphic to F.

Proof. (1.) can be obtained essentially as a corollary of Theorem 4. To prove (2.), given X ∈ F we define a view to be any subset cX ⊆ X having a top element, denoted top(cX). We define Ev(F) = ⟨E, ≤, #⟩ as: (i) E = ⋃X∈F {cX ⊆ X | cX view} (we denote sets as sequences of their elements without repetitions); (ii) ≤ is the restriction of ⊆ to E; (iii) cX # cX′ when there is no Y ∈ F such that cX, cX′ ⊆ Y. We can check that Ev(F) is an event structure and we can observe, by Theorem 4, that Slices and Ev are naturally isomorphic. Moreover, Ev(F) is typed on Γ by taking λ(cX) = top(cX).

We now define a notion of parallel composition between typed event structures, using the results given in the previous section.

Definition 11. Let E1 : Γ ∥ Λ⊥ and E2 : Λ ∥ Δ be two typed event structures. Then we define E1 ∥ E2 = Ev({S1 ∥ S2 | S1 ∈ Slices(E1), S2 ∈ Slices(E2)}), where S1 ∥ S2 is the parallel composition between spos, defined in Section 2.

Theorem 5. Let E1 : Γ ∥ Λ⊥ and E2 : Λ ∥ Δ. Then E1 ∥ E2 : Γ ∥ Λ± ∥ Δ⊥.

Theorem 6 (Associativity). Let E1 : Γ1 ∥ Λ⊥, E2 : Λ ∥ Γ2 ∥ Δ⊥, E3 : Δ ∥ Γ3. Then E1 ∥ (E2 ∥ E3) = (E1 ∥ E2) ∥ E3.

We remark that the definition of parallel composition we give is similar to the technique of normalization by slices used to define normalization of designs [10] or L-nets [5] in Ludics.
3.2 A Category of Innocent Event Structures
In this section we define a category having ES-arenas as objects and (a subclass of) typed event structures as morphisms. Once again, we rely on the definitions given for po strategies.

Definition 12. Let E : Γ be a typed event structure. We say that E is innocent when for all S ∈ Slices(E), S = ⟨x, <⟩ is an innocent po strategy on Γ.

By definition of parallel composition and as a consequence of Proposition 3, we observe that the class of innocent event structures is closed under parallel composition. We then define composition as parallel composition + hiding.

Definition 13. Given two innocent event structures E1 : Γ ∥ Λ⊥, E2 : Λ ∥ Δ, we define E1; E2 = Ev({S1; S2 | S1 ∈ Slices(E1), S2 ∈ Slices(E2)}), where S1; S2 is the composition of innocent po strategies defined in Section 2.

Using Proposition 3, we can prove that the class of innocent event structures is closed under composition and that composition is associative. Moreover, we can define a copycat event structure and prove that it plays the role of an identity with respect to composition.

Definition 14. Let Γ, Γ′ be two disjoint +/- polarized ES-arenas which are isomorphic through the isomorphism φ : Γ → Γ′ and such that for all e ∈ Γ, πΓ(e) = πΓ′(φ(e)). We define the copycat event structure as idΓ→Γ′ = Ev({idx→φ(x) | x ∈ C(Γ)}), where idx→φ(x) is the copycat spo defined in Section 2.

Proposition 8. idΓ→Γ′ : Γ′ ∥ Γ⊥ is an innocent event structure and it is the identity w.r.t. composition, i.e. for all innocent E : Γ ∥ Δ, E is isomorphic to idΓ→Γ′; E.

This result allows us to define a category of innocent event structures. We first introduce some notation. Given two event structures E1, E2, we write E1 ∼ E2 when they are isomorphic in the category of event structures, and we denote with isoE,E′ : E → E′ the isomorphism between them. We denote with [E]∼ = {E′ | E ∼ E′} the class of event structures isomorphic to E. Moreover, given an arena Γ, we define Γ̂ = {Γ′ ∈ [Γ]∼ | ∀a ∈ Γ. πΓ(a) = πΓ′(isoΓ,Γ′(a))}, i.e. the set of all arenas isomorphic to a given arena Γ and inducing the same polarity function.

Definition 15. The category InnEv is defined as follows. (1.) The class of objects is Obj(InnEv) = {Γ̂ | Γ a +/- polarized ES-arena}. (2.) The set of morphisms between Γ̂ and Δ̂ is InnEv(Γ̂, Δ̂) = {[E]∼ | E : Δ ∥ Γ⊥ innocent}. (3.) The composition of [E1]∼ : Γ̂ → Λ̂ and [E2]∼ : Λ̂ → Δ̂ is defined as [E2]∼; [E1]∼ = [E1; E2]∼ : Γ̂ → Δ̂. (4.) Let Γ̂ be an object and let Γ′ ∈ Γ̂ be disjoint from Γ. Then the identity is defined as idΓ̂ = [idΓ→Γ′]∼ : Γ̂ → Γ̂.
Theorem 7. InnEv is a compact closed category.

Proof. We can define the tensor product in the following way: (1.) Γ ⊗ Δ = Γ ∥ Δ; (2.) given two morphisms [E1]∼ : Γ1 → Δ1 and [E2]∼ : Γ2 → Δ2, we have [E1]∼ ⊗ [E2]∼ = [E1 ∥ E2]∼. It is naturally commutative and associative, and it has ∅ as neutral element. Observe also that in this category every object Γ has a dual Γ⊥: this induces a contravariant functor (−)⊥, defined as above for objects; given a morphism [E]∼ : Γ → Δ, we have [E]⊥∼ = [E]∼ : Δ⊥ → Γ⊥. We can use it to define the bifunctor ⊸ as Γ ⊸ Δ = Γ⊥ ⊗ Δ, and we can prove the required adjunction property.

3.3 Retrieving Linear Strategies: Confusion Freeness
In this section we sketch how linear strategies fit into the picture we have been developing, and in fact appear as a subclass of the category InnEv. In [6], we have shown that a feature of event structures representing linear strategies is that they are confusion free. Confusion free event structures describe a form of localized non-determinism, where the non-deterministic choice is localized in cells. Given an event structure E, a cell C ⊆ E is a maximal set of events which are pairwise in immediate conflict and have the same enabling set: ∀e, e′ ∈ C. e #̄μ e′ ∧ [e) = [e′). An event structure is said to be confusion free when cells are closed under immediate conflict. The relation of conflict models a choice: two events which are in conflict live in two different evolutions of the system. Since conflict is inherited, the point where a choice is made corresponds to events in immediate conflict, i.e. a cell. The construct in process calculus which corresponds to a cell is a guarded sum: each event in a cell can be seen as a guard on that choice. According to the polarity of the elements of the cell, we would hence have a sum which is guarded by output, input, or τ actions (resp. +, -, or ±). In [6], we showed a correspondence between (negative) cells and the additives of Linear Logic. We are now going to define a category where the objects are confusion free ES-arenas and the morphisms are confusion free event structures which are innocent. Such a category is derived from InnEv, but we first need to strengthen the conditions imposed on the labeling morphism. This is because the class of innocent confusion free event structures on a confusion free ES-arena (in the sense of Definition 9) is otherwise not closed under composition.

Definition 16. Let E : Γ be an innocent event structure. We say that it is a conf.-free innocent event structure when (1.) E and Γ are confusion free, and (2.) if e1 #μ e2 then (i.) λ(e1) #μ λ(e2) and (ii.) πΓ(λ(e1)) ≠ + and πΓ(λ(e2)) ≠ +.

Condition (1.) requires that both the event structure E and the arena Γ are confusion free. Now Γ really has the shape of a MALL formula tree, where immediate conflict codes the additive connectives. Condition (2.) requires that two events which are in immediate conflict in E are in immediate conflict also in Γ, and that they are never positive. Observe that this, together with Condition
(2.ii) of Definition 8, tells us that events belonging to the same cell of E are labeled with actions in immediate conflict in Γ. Moreover, those actions have the same polarity, and such a polarity can only be negative or neutral. The correspondence innocence/asynchrony (see [7,8]) supports this constraint, which is consistent with the fact that in an asynchronous calculus only input-prefixed (external choice) and τ-prefixed (internal choice) terms can be summands in a guarded sum. The following result allows us to define a subcategory of InnEv whose objects are given by equivalence classes of confusion free ES-arenas and whose morphisms are given by equivalence classes of conf.-free innocent event structures.

Theorem 8. Let E1 : Γ ∥ Λ⊥ and E2 : Λ ∥ Δ be two conf.-free innocent event structures. Then (1.) E1 ∥ E2 : Γ ∥ Λ± ∥ Δ is a conf.-free innocent event structure. (2.) E1; E2 : Γ ∥ Δ is a conf.-free innocent event structure.

In the class of conf.-free innocent event structures we are now able to retrieve the family of linear strategies, by imposing suitable constraints. The fundamental fact is that linear strategies can be described as partial orders with a conflict relation (as detailed in [7]). We consider the following constraints on conf.-free innocent event structures: arborescence, sequentiality, acyclicity. An event structure E = ⟨E, ≤, #⟩ is arborescent when ⟨E, ≤⟩ is. A conf.-free innocent event structure E : Γ is sequential when it is arborescent and for all S1, S2 ∈ Slices(E), S1 ∩ S2 ∈ Slices(E) (where the intersection of two slices is the set-theoretical intersection of the underlying sets of actions, with the induced order). Observe that, if the arena is alternating, this condition tells us that given e, e1, e2 ∈ E, if e ←E e1, e ←E e2 and π(e) = −, then π(e1) = π(e2) = + and e1 = e2. Intuitively, this condition corresponds to the constraint that in a process there is a unique output which is active at any time.
Finally, we say that E : Γ is acyclic if it satisfies the analogue of the acyclicity constraints given in [5], which we do not detail here. Intuitively, the condition guarantees the absence of deadlocks during parallel composition. When restricting to +/- alternating arenas (and the polarity constraints given in [10]), we retrieve the family of linear strategies.

Theorem 9. Let E : Γ be a conf.-free event structure on a +/- alternating arena. (1.) If E : Γ is sequential, then E corresponds to a linear strategy as defined by Girard in [10] (these strategies are there called designs). (2.) If E : Γ is arborescent, then E corresponds to a linear strategy extended with mix, as defined in [4] (and there called L-forests). (3.) If E : Γ is acyclic, then E corresponds to an L-net, as defined in [5].

Moreover, all of the above sub-classes of conf.-free innocent event structures are closed under composition, and all the induced categories are subcategories of InnEv. If we do not insist that the arena be alternating, we also obtain neutral cells, which correspond (in process calculus) to sums guarded by τ actions. This leaves space for a possible extension of our work to model internal choices.
In future work we want to investigate this direction as a possible approach to non-deterministic Game Semantics. A key element in this paper is linearity, which allows for a definition of composition based on the merging of orders. We believe this is not a limitation to modelling an expressive calculus. In ongoing work [8] we extend [7] to show that Ludics is in fact able to model a variant of the linear π-calculus extended with recursion. Even with recursion, the game model is linear.
References

1. Abramsky, S., Malacaria, P., Jagadeesan, R.: Full abstraction for PCF. Inf. and Comp. 163(2), 409–470 (2000)
2. Abramsky, S., Melliès, P.-A.: Concurrent games and full completeness. In: Proc. of LICS, pp. 431–442 (1999)
3. Curien, P.-L.: Notes on game semantics, http://www.pps.jussieu.fr/~curien
4. Curien, P.-L., Faggian, C.: L-nets, strategies and proof-nets. In: Ong, L. (ed.) CSL 2005. LNCS, vol. 3634, pp. 167–183. Springer, Heidelberg (2005)
5. Faggian, C., Maurel, F.: Ludics nets, a game model of concurrent interaction. In: Proc. of LICS 2005, pp. 376–385. IEEE Computer Society Press, Los Alamitos (2005)
6. Faggian, C., Piccolo, M.: A graph abstract machine describing event structure composition. ENTCS 175(4), 21–36 (2007)
7. Faggian, C., Piccolo, M.: Ludics is a model for the finitary linear pi-calculus. In: Della Rocca, S.R. (ed.) TLCA 2007. LNCS, vol. 4583, pp. 148–162. Springer, Heidelberg (2007)
8. Faggian, C., Piccolo, M.: Ludics is a model for the linear pi-calculus. Draft (2009)
9. Ghica, D.R., Murawski, A.S.: Angelic semantics of fine-grained concurrency. In: Walukiewicz, I. (ed.) FOSSACS 2004. LNCS, vol. 2987, pp. 211–225. Springer, Heidelberg (2004)
10. Girard, J.-Y.: Locus solum: from the rules of logics to the logics of rules. MSCS 11, 301–506 (2001)
11. Harmer, R.: Innocent game semantics, http://www.pps.jussieu.fr/~russ
12. Hyland, J.M.E., Ong, L.C.-H.: On full abstraction for PCF: I, II, and III. Inf. and Comp. 163(2), 285–408 (2000)
13. Hyland, M.: A category of partial orders. Communication at AMS/SMF Meeting in Lyon (July 2001), http://www.dpmms.cam.ac.uk/~martin/Research/Slides/lyon01.pdf
14. Hyland, M., Schalk, A.: Games on graphs and sequentially realizable functionals. In: Proc. of LICS 2002, pp. 257–264. IEEE Computer Society Press, Los Alamitos (2002)
15. Laird, J.: A game semantics of the asynchronous π-calculus. In: Abadi, M., de Alfaro, L. (eds.) CONCUR 2005. LNCS, vol. 3653, pp. 51–65. Springer, Heidelberg (2005)
16. McCusker, G., Wall, M.: Categorical and game semantics for SCIR. In: Proc. of GALOP, pp. 157–178 (2005)
17. Melliès, P.-A.: Asynchronous games 2: The true concurrency of innocence. TCS 358(2-3), 200–228 (2006)
18. Melliès, P.-A., Mimram, S.: Asynchronous games: Innocence without alternation. In: Caires, L., Vasconcelos, V.T. (eds.) CONCUR 2007. LNCS, vol. 4703, pp. 395–411. Springer, Heidelberg (2007)
19. Nielsen, M., Plotkin, G., Winskel, G.: Petri nets, event structures and domains. TCS 13(1), 85–108 (1981)
20. Sangiorgi, D., Walker, D.: The π-calculus: a theory of mobile processes. Cambridge University Press, Cambridge (2001)
21. Schalk, A., Pérez, J.J.P.: Concrete data structures as games. ENTCS (2005)
22. Varacca, D., Yoshida, N.: Typed event structures and the π-calculus. In: Proc. of MFPS. ENTCS, vol. 158, pp. 373–397 (2006)
23. Winskel, G.: Event structures. In: Brauer, W., Reisig, W., Rozenberg, G. (eds.) APN 1986. LNCS, vol. 255, pp. 325–392. Springer, Heidelberg (1987)
24. Winskel, G.: An introduction to event structures. In: de Bakker, J.W., de Roever, W.-P., Rozenberg, G. (eds.) Linear Time, Branching Time and Partial Order in Logics and Models for Concurrency. LNCS, vol. 354, pp. 364–397. Springer, Heidelberg (1989)
25. Winskel, G., Nielsen, M.: Models for concurrency. In: Handbook of Logic in Computer Science, pp. 1–148. Oxford University Press, Oxford (1995)
26. Yoshida, N., Berger, M., Honda, K.: Strong normalisation in the PI-calculus. Inf. and Comp. 191(2), 145–202 (2004)
Existential Type Systems with No Types in Terms

Ken-etsu Fujita¹ and Aleksy Schubert²

¹ Gunma University, Tenjin-cho 1-5-1, Kiryu 376-8515, Japan
[email protected]
² The University of Warsaw, ul. Banacha 2, 02-097 Warsaw, Poland
[email protected]
Abstract. We study the type checking, typability, and type inference problems for type-free style and Curry style second-order existential systems, where the type-free style differs from the Curry style in that the terms of the former contain information on where the existential quantifier elimination and introduction take place, but omit the information on which types are involved. We show that all the problems are undecidable, employing a reduction from second-order unification in the case of the type-free system and from semiunification in the case of the Curry style system. This provides a fine border between problems yielding to a reduction from the second-order unification problem and those yielding to the semiunification problem. In addition, we investigate the subject reduction property of the system in Curry style.
1 Introduction
The System F (λ2) of Girard-Reynolds is a fundamental system for the study of polymorphism. Its polymorphic properties are seen as the theoretical basis for functional programming languages such as ML, and this has resulted in thorough studies, e.g. [Boe85, Wel99, Sch98], also for the λ2 subsystems [FS00, Mai90]. A system with 2nd order existential types provides a theoretical basis for abstract data types [MP88]. This ingredient is also present in languages based on ML, but has been less studied, except for the domain-free style [NTKN08]. The polymorphic system has enough power to encode impredicatively the other connectives ∧, ∨, ∃ and so on in terms of →, ∀ [Pra65]. In this sense, the 2nd order existential type system λ∃ [Fuj05] can be regarded as a subsystem of λ2. On the other hand, the existential system λ∃ can serve, under CPS-translations, as a target calculus not only of λ2 (2nd order intuitionistic logic) but also of the λμ-calculus (2nd order classical logic) of Parigot [Fuj05, Has06]. Moreover, a recent work [Fuj] on the intimate connection between λ2 and λ∃ reveals a duality not only of reduction correspondences but also of proof structures between the systems. In this light, the expressiveness of λ2 and λ∃ is comparable. Still, the undecidability of the type related problems cannot be established as a corollary
This work was partly supported by the bilateral program (2008) between Japan Society for the Promotion of Science and Polish Academy of Sciences. This work was partly supported by the Polish government grant no N N206 355836.
P.-L. Curien (Ed.): TLCA 2009, LNCS 5608, pp. 112–126, 2009. c Springer-Verlag Berlin Heidelberg 2009
of the above-mentioned translations. This is in principle due to the fact that the translations work smoothly for systems with adequate type information, e.g. in the domain-free style [BS00, NTKN08], while additional or missing type information introduces complications which are not immediate to overcome. In this paper, we study the type checking, typability, and type inference problems of 2nd order existential type systems in the type-free and Curry styles. First, we show that all of the type related problems are undecidable for the 2nd order existential type system in the type-free style, which can be regarded as an intermediate system between Church style and Curry style. For this, the 2nd order unification problem in flat form is introduced, and this form is proved undecidable by reduction from 2nd order unification of simple instances [Sch98]. Then, by reduction from the flat forms, it is proved that all of the type related problems of the type-free system are undecidable. Secondly, we show that the type checking and typability problems of the system in the style of Curry are undecidable. By reductions from the semiunification problem [KTU90], the undecidability proofs are established for the fragments (→, ∃) and (¬, ∧, ∃). The first of these results settles the decidability issues left open by Nakazawa et al. [NTKN08]. Moreover, we investigate the subject reduction property of the system in Curry style. 2nd order unification is used to prove the undecidability of several type related problems for second-order calculi [Sch98, FS00, Pfe93, Boe85]. Similarly, the semiunification problem (SUP) is an appropriate tool to prove the undecidability of others [Wel99, KTU93]. The systems in this paper elucidate the mutual relation between the two problems as applied to type related questions.
We can conclude that the possibility to fix the number of arguments of a second-order construct plays the crucial part here. This possibility is, indeed, the only difference between the type-free system and the Curry-style system. In this light, the semiunification problem can be viewed as a kind of 2nd order unification where the number and the shape of the 2nd order variable arguments are not known. The main advantage of Curry-style type systems from the programmers' point of view is that the programmer is liberated from the burden of making type annotations. The type-free system can be an interesting alternative here, as the annotation tax is not high, but the programmer can decide where polymorphism is employed and how much of it is exploited. One more advantage of the type-free systems is that they correspond to 2nd order unification, which is more extensively studied than the semiunification problem.
2
Second Order Existential Type Systems (¬, →, ∧, ∃)
We consider a second-order system consisting of ¬, →, ∧, and second-order existential quantification ∃. The subsystem with → and ∃ is denoted by (→, ∃), while the one with ¬, ∧, and ∃ is denoted by (¬, ∧, ∃). The types of the full system are: A ::= X | ⊥ | ¬A | A → A | A ∧ A | ∃X.A
114
K. Fujita and A. Schubert
First, a system between the Church and Curry styles is defined as follows, where the special symbol ∃ marks the existence of a witness. We call this system the type-free system, since all types are erased from the terms, but the locations where the types would occur in the corresponding Church-style terms are marked.

M ::= x | λx.M | M M | ⟨M, M⟩ | πi(M) | ⟨∃, M⟩ | let ⟨∃, x⟩ = M in M

(var): Γ, x : A ⊢ x : A
(¬I): from Γ, x : A ⊢ M : ⊥, infer Γ ⊢ λx.M : ¬A
(¬E): from Γ ⊢ M1 : ¬A and Γ ⊢ M2 : A, infer Γ ⊢ M1 M2 : ⊥
(→I): from Γ, x : A1 ⊢ M : A2, infer Γ ⊢ λx.M : A1 → A2
(→E): from Γ ⊢ M1 : A1 → A2 and Γ ⊢ M2 : A1, infer Γ ⊢ M1 M2 : A2
(∧I): from Γ ⊢ M1 : A1 and Γ ⊢ M2 : A2, infer Γ ⊢ ⟨M1, M2⟩ : A1 ∧ A2
(∧E): from Γ ⊢ M : A1 ∧ A2, infer Γ ⊢ πi(M) : Ai
(∃I): from Γ ⊢ M : A[X := A1], infer Γ ⊢ ⟨∃, M⟩ : ∃X.A
(∃E)*: from Γ ⊢ M1 : ∃X.A and Γ, x : A ⊢ M2 : A1, infer Γ ⊢ let ⟨∃, x⟩ = M1 in M2 : A1
where (∃E)* carries the eigenvariable condition X ∉ FV(Γ, A1), and Γ ranges over contexts, i.e. sets of pairs x : A that assign types to object variables. Secondly, a system in Curry style is given as follows (see for instance [SU06]); we show only the type rules which differ from the type-free system:

M ::= x | λx.M | M M | ⟨M, M⟩ | πi(M)

(∃I): from Γ ⊢ M : A[X := A1], infer Γ ⊢ M : ∃X.A
(∃E)*: from Γ ⊢ M1 : ∃X.A and Γ, x : A ⊢ M2 : A1, infer Γ ⊢ M2[x := M1] : A1
where (∃E)* again carries the eigenvariable condition X ∉ FV(Γ, A1). As usual, a forgetful (erasure) mapping can be defined from type-free terms to Curry-style terms; for instance, ⟨∃, M⟩ erases to the erasure of M, and let ⟨∃, x⟩ = M1 in M2 erases to M2[x := M1] for type-free terms M, M1, M2 (with M1, M2 erased). Then well-typed Curry-style terms can be lifted to well-typed type-free terms as in [Bar93]. The type checking problem (TCP) is: given a term M, a type A, and a context Γ, is the judgement Γ ⊢ M : A derivable? The type inference problem (TIP) is: given a term M and a context Γ, is there a type A such that Γ ⊢ M : A is derivable? Finally, the typability problem (TP) is: given a term M, are there a context Γ and a type A such that Γ ⊢ M : A is derivable?
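The forgetful mapping and the let-expansion it prescribes can be made concrete with a small interpreter. The sketch below is our own illustration (the constructor names Var, Lam, App, Ex, LetEx are ours, not the paper's notation); it erases type-free terms to Curry-style terms, substituting the bound term for the let-variable as required for let ⟨∃, x⟩ = M1 in M2.

```python
from dataclasses import dataclass

# Hypothetical AST for type-free terms (pairs and projections omitted for brevity).
@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Lam:
    var: str
    body: object

@dataclass(frozen=True)
class App:
    fun: object
    arg: object

@dataclass(frozen=True)
class Ex:            # <exists, M>
    body: object

@dataclass(frozen=True)
class LetEx:         # let <exists, x> = M1 in M2
    var: str
    bound: object
    body: object

def subst(term, x, repl):
    """Substitution; we assume bound names are unique, so no capture can occur."""
    if isinstance(term, Var):
        return repl if term.name == x else term
    if isinstance(term, Lam):
        return Lam(term.var, subst(term.body, x, repl))
    if isinstance(term, App):
        return App(subst(term.fun, x, repl), subst(term.arg, x, repl))
    if isinstance(term, Ex):
        return Ex(subst(term.body, x, repl))
    if isinstance(term, LetEx):
        return LetEx(term.var, subst(term.bound, x, repl), subst(term.body, x, repl))
    return term

def erase(term):
    """Forgetful mapping to Curry-style terms: drop <exists, ->, expand let by substitution."""
    if isinstance(term, Ex):
        return erase(term.body)
    if isinstance(term, LetEx):
        return subst(erase(term.body), term.var, erase(term.bound))
    if isinstance(term, Lam):
        return Lam(term.var, erase(term.body))
    if isinstance(term, App):
        return App(erase(term.fun), erase(term.arg))
    return term

# let <exists, a> = <exists, y> in a z   erases to   y z
demo = LetEx("a", Ex(Var("y")), App(Var("a"), Var("z")))
assert erase(demo) == App(Var("y"), Var("z"))
```

The erasure collapses exactly the witness markers, which is why lifting a Curry-style typing back to a type-free term amounts to re-inserting the ⟨∃, ·⟩ and let constructs at the positions of (∃I) and (∃E) in the derivation.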
3
Undecidability of Restricted Unification
The proof of undecidability of the type-related problems for the type-free system is based on a reduction of a version of the 2nd-order unification problem to these problems. We present here the variant of 2nd-order unification we are interested in. Based on the syntax of types, we define expressions for unification problems. Here, the countably infinite set of type variables is separated into two sets of
Existential Type Systems with No Types in Terms
115
variables: one for first-order variables, denoted by X, Y, and another for constants, denoted by C. A unification problem consists of a set of equations E = {A1 ≐ B1, . . . , Ak ≐ Bk}. We say that the instance E is solvable if there exists a substitution S such that S(A1) = S(B1), . . . , S(Ak) = S(Bk). We write Dom(S) for the domain of S. Expressions for first-order unification problems are built from first-order variables and constants, together with →:

A, B ::= X | C | A → B

Expressions for simple instances of the second-order unification problem additionally use second-order functional variables of a fixed arity, denoted by F, G, applied to terms built of constants and → only, e.g. FC1 · · · Cn for F of arity n. Free variables in expressions of unification problems are defined as follows: FVu(C) = ∅; FVu(X) = {X}; FVu(F) = {F}; FVu(A → B) = FVu(A) ∪ FVu(B).

Definition 1 (Simple instances Es). We consider the following set Es of only two equations of simple instances, which is enough for the undecidability [Sch98]:

{FA1 . . . An ≐ (FA′1 . . . A′n) → A0,  GB1 . . . Bm ≐ (GB′1 . . . B′m) → (FA″1 . . . A″n)}
where F has arity n and G has arity m with m, n ≥ 1, and the set of free variables FVu(Ai, A′i, A″i, Bj, B′j) = ∅ for each i, j.

Theorem 1 (Undecidability for simple instances [Sch98]). Second-order unification for simple instances is undecidable.

The equations can be transformed so that the second-order variables occur only as topmost symbols (at the root of the tree they correspond to). The simple instances of Definition 1 can be reduced to the following equations Esr:

Definition 2 (Simple instances with root restriction Esr).

FA1 . . . An ≐ X_{FA′1...A′n} → A0
GB1 . . . Bm ≐ Y_{GB′1...B′m} → X_{FA″1...A″n}
FA′1 . . . A′n ≐ X_{FA′1...A′n}
GB′1 . . . B′m ≐ Y_{GB′1...B′m}
FA″1 . . . A″n ≐ X_{FA″1...A″n}
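By way of contrast with the second-order case treated here, plain first-order unification over the syntax A, B ::= X | C | A → B is decidable. The following sketch is our own illustration, not part of the paper's reduction; terms are encoded as tuples, and a Robinson-style algorithm with an occurs check is used.

```python
# Terms: ('var', X) | ('const', C) | ('arrow', A, B); a substitution is a dict X -> term.

def apply_subst(s, t):
    """Apply substitution s to term t, chasing bindings."""
    if t[0] == 'var' and t[1] in s:
        return apply_subst(s, s[t[1]])
    if t[0] == 'arrow':
        return ('arrow', apply_subst(s, t[1]), apply_subst(s, t[2]))
    return t

def occurs(x, t, s):
    """Does variable x occur in t under substitution s?"""
    t = apply_subst(s, t)
    if t[0] == 'var':
        return t[1] == x
    if t[0] == 'arrow':
        return occurs(x, t[1], s) or occurs(x, t[2], s)
    return False

def unify(eqs):
    """Return a unifier for a list of equations, or None if there is none."""
    s = {}
    work = list(eqs)
    while work:
        a, b = work.pop()
        a, b = apply_subst(s, a), apply_subst(s, b)
        if a == b:
            continue
        if a[0] == 'var':
            if occurs(a[1], b, s):
                return None               # occurs-check failure
            s[a[1]] = b
        elif b[0] == 'var':
            work.append((b, a))
        elif a[0] == b[0] == 'arrow':
            work += [(a[1], b[1]), (a[2], b[2])]
        else:
            return None                   # constant clash
    return s

# X -> C unified with C -> Y gives X := C, Y := C.
s = unify([(('arrow', ('var', 'X'), ('const', 'C')),
            ('arrow', ('const', 'C'), ('var', 'Y')))])
assert apply_subst(s, ('var', 'X')) == ('const', 'C')
```

The equation X ≐ X → C fails the occurs check and returns None; it is precisely the second-order variables, absent from this fragment, that push the problem into undecidability.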
Simple instances with the root restriction (Definition 2) can be reduced further to the following restricted form. We say that a set of unification equations E is in flat form if it obeys all of the following restrictions:

Definition 3 (Flat form)
1. Root restriction: second-order variables occur only at root positions.
2. Monadic restriction: if a second-order variable occurs as FA1 · · · An, then either all Ai are constants or all Ai are first-order variables; moreover, the symbols Ai occur only in the equation where FA1 · · · An occurs.
3. Constant restriction: each time a second-order variable F is applied to a vector X1, . . . , Xn of first-order variables as FX1 · · · Xn ≐ A, there is a set of pairwise distinct constants C1, . . . , Cn such that there is exactly one equation FC1 · · · Cn ≐ B ∈ E, where C1, . . . , Cn occur exclusively in this equation of E, and all C1, . . . , Cn occur in B.

The first equation of the simple instances with the root restriction (Definition 2) can be reduced to the following flat form Esf:

Definition 4 (Simple instances in flat form Esf)

X_{A1...An} ≐ X_{FA′1...A′n} → A0
F′X1 . . . Xn ≐ X_{A1...An} → A1 → · · · → An → o
F′C1 . . . Cn ≐ X′_{A1...An} → C1 → · · · → Cn → o

where F′ is a fresh second-order variable of arity n; Xi, X_{A1...An}, X′_{A1...An} are fresh first-order variables; C1, . . . , Cn are subject to the constant restriction; and o is a distinguished constant.

Lemma 1. The first equation FA1 . . . An ≐ X_{FA′1...A′n} → A0 in Esr of Definition 2 is solvable if and only if the equations in Esf of Definition 4 are solvable.
Proof. 1. Suppose that S(F)A1 . . . An = S(X_{FA′1...A′n}) → A0. Then define a substitution S′ for Definition 4 as follows: S′(F′)Z1 . . . Zn = (S(F)Z1 . . . Zn) → Z1 → · · · → Zn → o; S′(X_{A1...An}) = S(F)A1 . . . An; S′(X′_{A1...An}) = S(F)C1 . . . Cn; S′(Xi) = Ai; and S′(X_{FA′1...A′n}) = S(X_{FA′1...A′n}). Now all the equations in Definition 4 are solvable.
2. Suppose that the equations in Definition 4 are solvable under S. From the constant restriction of the flat form, the substitution S has the form S(F′)Z1 . . . Zn = A → Z1 → · · · → Zn → o for some A such that S(X_{A1...An}) = A[Z1 := A1, . . . , Zn := An] and S(X′_{A1...An}) = A[Z1 := C1, . . . , Zn := Cn]. Then define a substitution S′ for the first equation as follows: S′(F)Z1 . . . Zn = A and S′(X_{FA′1...A′n}) = S(X_{FA′1...A′n}). Now the first equation in Definition 2 is solvable.

The other equations of the simple instances with the root restriction (Definition 2) can be reduced similarly to the flat form; we write Esf for the resulting set as a whole.

Proposition 1 (Reduction from Es to Esf). The simple instances of Es are solvable if and only if the simple instances in the flat form Esf are solvable.

Proof. Es can be reduced to Esr, and then Lemma 1 applies.
4
Undecidability of Type Related Problems for Type-Free System
Now we embark on the reduction of the unification of equations in flat form to the type-related problems. We provide, for a set of equations E, a λ-term M
such that if a type derivation for M exists, then a unifier S for E can be extracted from it. The main idea of the construction is to ensure that the shape of a type for a variable xA occurring in M, which corresponds to a subexpression A in E, strictly corresponds to S(A). We simply write An → B for A → · · · → A → B with n occurrences of A, and MN^n for MN . . . N with n occurrences of N. Since we have a countably infinite set of term variables of the λ-calculus, we can assume one-to-one mappings between expressions of unification problems and term variables of λ-terms. Based on this, we write xA and yA for the term variables corresponding to an expression A of a unification problem. For instance, we have term variables xX and yX from the first-order variable X, and similarly xC and yC from the constant C. In particular, from the distinguished constant o we have a term variable xo. In order to make the presentation more accessible, we have simplified the formulations of the lemmas so that they fit the TP well. The handling of the TIP and the TCP requires more careful handling of the contexts.

Definition 5 (Encoding of first-order expressions). For an expression A of the first-order unification problem, we define a λ-term MA as follows:
1. Case A of X (first-order variable): MX ≡ y1(yX xX)(y1 xo^2), where y1 is a fresh term variable.
2. Case A of C (constant): MC ≡ y1(yC xC)(y1 xo^2), where y1 is a fresh term variable.
3. Case A of A1 → A2: M_{A1→A2} ≡ y1(y2(x_{A1→A2} x_{A1}))(y2 x_{A2}) M_{A1} M_{A2} (y1 xo^5), where y1, y2 are fresh term variables.

For a context Γ, an updated context Γ(x : A) is defined as usual: Γ(x : A)(y) = A if y ≡ x, and otherwise Γ(x : A)(y) = Γ(y).

Lemma 2.
1. For any expression A of first-order unification, there exists a context Γ such that Γ ⊢ MA : o, and then Γ(xA) = SΓ(A) for some substitution SΓ.
2. If we have Γ ⊢ MA : B, then Γ′ ⊢ MA : o for some Γ′ obtained by updating Γ at yi, yXj, xo for yi, yXj, xo ∈ FV(MA).
3. For any substitution S of first-order unification with FVu(A) ⊆ Dom(S), there exists a context ΓS such that ΓS ⊢ MA : o, and then ΓS(xA) = S(A).

Proof. First, we prove 1 by induction on the structure of A; 2 is clear from this proof. We assume xo : o, which is used implicitly.
(a) Case A of X: we take Γ = {y1 : o^2 → o, yX : B → o, xX : B}, where B is an arbitrary expression of first-order unification. Then we have Γ ⊢ MX : o. For all xXi ∈ Dom(Γ), we define SΓ = [X1 := Γ(xX1), . . . , Xn := Γ(xXn)]. Then we establish Γ(xX) = B = SΓ(X).
(b) Case A of C: we take Γ = {y1 : o^2 → o, yC : C → o, xC : C}. Then Γ ⊢ MC : o and Γ(xC) = C = S(C) for any S.
(c) Case A of A1 → A2: from the induction hypotheses, we have Γ1 ⊢ M_{A1} : o and Γ2 ⊢ M_{A2} : o for some contexts, where Γ1(x_{A1}) = S_{Γ1}(A1) and Γ2(x_{A2}) = S_{Γ2}(A2) for some substitutions defined by the Γi. We update Γ2 at xX1, . . . , xXk for all Xi ∈ FVu(A1) ∩ FVu(A2), so that Γ′2 = Γ2(xX1 : Γ1(xX1))(yX1 : Γ1(xX1) → o) . . . (xXk : Γ1(xXk))(yXk : Γ1(xXk) → o). Let Γ = Γ1 ∪ Γ′2. Then we have Γ ⊢ M_{Ai} : o and Γ(x_{Ai}) = SΓ(Ai) for i = 1, 2. Now we define Γ_{A1→A2} = Γ ∪ {y1 : o^5 → o, y2 : Γ(x_{A2}) → o, x_{A1→A2} : Γ(x_{A1}) → Γ(x_{A2})}. Finally, we establish Γ_{A1→A2} ⊢ M_{A1→A2} : o, and then Γ_{A1→A2}(x_{A1→A2}) = Γ(x_{A1}) → Γ(x_{A2}) = SΓ(A1) → SΓ(A2) = SΓ(A1 → A2) = S_{Γ_{A1→A2}}(A1 → A2), where S_{Γ_{A1→A2}} = SΓ.
For item 3: for any A, we have Γ ⊢ MA : o for some Γ. Here we update Γ at all xXi with Xi ∈ FVu(A) ⊆ Dom(S), such that ΓS = Γ(xX1 : S(X1))(yX1 : S(X1) → o) . . . (xXn : S(Xn))(yXn : S(Xn) → o). Then we still have ΓS ⊢ MA : o, and then ΓS(xA) = S_{ΓS}(A) for some S_{ΓS} defined by ΓS. Here, S_{ΓS} = [X1 := ΓS(xX1), . . . , Xn := ΓS(xXn)] = [X1 := S(X1), . . . , Xn := S(Xn)] = S restricted to FVu(A). Hence, we have ΓS(xA) = S(A).

Definition 6 (Encoding of first-order unification). Let E be a finite set of equations of first-order unification, E = {A1 ≐ B1} ∪ E0. For the equations we define a λ-term ME, in case E0 ≠ ∅, as ME ≡ y1(y2 x_{A1})(y2 x_{B1}) M_{A1} M_{B1} M_{E0} (y1 xo^6), where y1, y2 are fresh term variables; and in case E0 = ∅ as ME ≡ y1(y2 x_{A1})(y2 x_{B1}) M_{A1} M_{B1} (y1 xo^5), where y1, y2 are fresh term variables.

Lemma 3. Let E be a finite set of equations of first-order unification. There is a context Γ such that the problem E is solvable if and only if Γ ⊢ ME : o.

Proof. 1. Suppose that E is solvable under S, i.e., S(Ai) = S(Bi) for each 1 ≤ i ≤ n. From Lemma 2, we have ΓS ⊢ M_{Ai} : o and ΓS ⊢ M_{Bi} : o, where ΓS(x_{Ai}) = S(Ai) = S(Bi) = ΓS(x_{Bi}). Let Γ = ΓS ∪ Γ1 ∪ . . . ∪ Γn, where Γi = {y1^(i) : o^6 → o, y2^(i) : ΓS(x_{Ai}) → o} for 1 ≤ i ≤ n − 1, and Γn = {y1^(n) : o^5 → o, y2^(n) : ΓS(x_{An}) → o}. Then we establish Γ ⊢ ME : o.
2. Suppose that Γ ⊢ ME : o for some context Γ. Now we have Γ ⊢ M_{Ai} : o and Γ ⊢ M_{Bi} : o for each i, and then from Lemma 2, SΓ(Ai) = Γ(x_{Ai}) = Γ(x_{Bi}) = SΓ(Bi) for some substitution SΓ. Hence all of the equations Ai ≐ Bi are solvable under SΓ.
We define λ-terms ⟨∃^{n+1}, M⟩ ≡ ⟨∃, ⟨∃^n, M⟩⟩ and ⟨∃^1, M⟩ ≡ ⟨∃, M⟩, which denote successive applications of (∃I). We also define (let ⟨∃^{n+1}, a1⟩ = M in N) ≡ (let ⟨∃, a1⟩ = M in (let ⟨∃^n, a2⟩ = a1 in N)), which denotes successive applications of (∃E). The reduction from Es of Definition 1 to simple instances in flat form Esf yields ten equations in total, involving two second-order variables: say F in four equations and G in the remaining six. We define the following encoding for the equations with F; the case of G follows the same pattern.

Definition 7 (Encoding of second-order unification). For a set of equations E′ such that E′ \ E consists of the equations in flat form containing the second-order variable F of arity n:

FX1 . . . Xn ≐ B1,
FC1 . . . Cn ≐ B2,  FY1 . . . Yn ≐ B3,  FC′1 . . . C′n ≐ B4,

where B1 ≡ (X → A1 → · · · → An → o), B2 ≡ (X′ → C1 → · · · → Cn → o), B3 ≡ (Y → A′1 → · · · → A′n → o), and B4 ≡ (Y′ → C′1 → · · · → C′n → o); we define a λ-term M_{E′} as follows:

y1 (y2 ⟨∃^n, x_{B1}⟩) (y2 ⟨∃^n, x_{B2}⟩) (y2 ⟨∃^n, x_{B3}⟩) (y2 ⟨∃^n, x_{B4}⟩) (y2 (let ⟨∃^n, a⟩ = xF in ⟨∃^n, λz z1 . . . zn . y1 (a z z1 . . . zn)^{12}⟩)) (y2 xF) M_{B1} M_{B2} M_{B3} M_{B4} ME (y1 xo^{12})
(1)
where y1, y2 are fresh term variables; M_{B1}, . . . , M_{B4} are encodings of the first-order expressions B1, . . . , B4; and ME is an encoding of the set E of equations.

Lemma 4. Let E′ be the set of equations in flat form above. There is a context Γ such that the problem E′ is solvable if and only if Γ ⊢ M_{E′} : o.

Proof. (⇒) A straightforward transformation of the substitution. (⇐) Suppose Γ′ ⊢ M_{E′} : B′ for some Γ′ and B′; then an updated Γ of Γ′ even gives Γ ⊢ M_{E′} : o. Thus for all i = 1, 2, 3, 4, we have Γ ⊢ M_{Bi} : o as well, and Γ(x_{Bi}) = S(Bi) for some substitution S. Since (let ⟨∃^n, a⟩ = xF in ⟨∃^n, λz z1 . . . zn . y1(a z z1 . . . zn)^{12}⟩) is well typed by n applications of (∃E) and (∃I), the type of xF has the form ∃Z1 . . . Zn.B for some B, such that λz z1 . . . zn . y1(a z z1 . . . zn)^{12} and a have the same type B of the form (A′ → B′1 → · · · → B′n → o) for some types A′, B′i. From Γ ⊢ x_{Bi} : S(Bi) for all i = 1, 2, 3, 4, we must derive the same type ∃Z1 . . . Zn.B such that Γ ⊢ ⟨∃^n, x_{Bi}⟩ : ∃Z1 . . . Zn.B by n applications of (∃I). From the constant restriction on Ci, C′i, these constants are to be abstracted one by one, and moreover the head parts of the S(Bi), namely S(X), S(X′), S(Y), S(Y′), are also to be abstracted by (∃I). This means that B ≡ (A → Z1 → · · · → Zn → o) with S(X) = A[Z1 := A1, . . . , Zn := An]; S(X′) = A[Z1 := C1, . . . , Zn := Cn]; S(Y) = A[Z1 := A′1, . . . , Zn := A′n]; S(Y′) = A[Z1 := C′1, . . . , Zn := C′n]. Now we can extend the substitution S to F, such that S(F)Z1 . . . Zn = (A → Z1 → · · · → Zn → o), and S(Xi) = Ai and S(Yi) = A′i for the fresh first-order variables Xi, Yi. Then we establish S(F)S(X1) . . . S(Xn) = S(F)A1 . . . An = A
[Z1 := A1, . . . , Zn := An] → A1 → · · · → An → o = S(X) → A1 → · · · → An → o = S(B1); and S(F)C1 . . . Cn = A[Z1 := C1, . . . , Zn := Cn] → C1 → · · · → Cn → o = S(X′) → C1 → · · · → Cn → o = S(B2). The other equations can be solved as well. Thus the whole set of equations is solvable.
Theorem 2 (TP, TIP, TCP for (→, ∃)). TP, TIP, and TCP are undecidable for the system (→, ∃).

Proof. The flat form E is solvable iff Γ ⊢ ME : o for some Γ by Lemma 4, so the undecidability of TP follows. In order to obtain TCP and TIP, additional care should be taken to make sure that the types of different constants are indeed different and that the distinguished type o is distinct from the Ci. This can be obtained with the help of the following construction:

M̂E ≡ let ⟨∃, x_{C1}⟩ = x_{∃X.X} in . . . let ⟨∃, x_{Cn}⟩ = x_{∃X.X} in ⟨∃, λx̄.ME⟩

where x_{∃X.X} is a variable of the type ∃X.X and x̄ are all the free variables in ME except x_{C1}, . . . , x_{Cn}. The type ∃X.X for the variable x_{∃X.X} can be enforced in turn using the term M_{∃X.X} ≡ y1(y2 x_{∃X.X})(y2 ⟨∃, λx.x⟩)(y2 ⟨∃, ⟨∃, y3⟩⟩), where y1, y2, y3 are fresh variables. We can now combine M̂E and M_{∃X.X} into one term λx̄.y3 M̂E M_{∃X.X}, where x̄ contains all the free variables of y3 M̂E M_{∃X.X}. The details of the construction are left to the reader.
5
Simple Derivations of the System in the Curry-Style
We analyse the structure of derivations in the Curry-style system. This is further applied to TCP and TIP, and to establish the subject reduction property.

Definition 8 (Removing vacuous ∃)
1. |X| = X; |⊥| = ⊥; |¬A| = ¬|A|; |A1 ◦ A2| = |A1| ◦ |A2| for ◦ ∈ {→, ∧}.
2. |∃X.A| = ∃X.|A| if X ∈ FV(A); |∃X.A| = |A| if X ∉ FV(A).

Lemma 5 (Basic lemma)
1. (Weakening): Γ, x : A ⊢ N : A1, where x ∉ FV(N), if and only if Γ ⊢ N : A1.
2. (Substitution): If Γ ⊢ M : A, then Γ[X := A1] ⊢ M : A[X := A1].
3. (Cut): If Γ ⊢ M : A and Γ, x : A ⊢ N : A1, then Γ ⊢ N[x := M] : A1.
4. |A[X := B]| ≡ |A|[X := |B|].
5. (Non-vacuous ∃): If Γ ⊢ M : A is derivable, then |Γ| ⊢ M : |A| is derivable with no vacuous quantification in types throughout the whole derivation.
6. Γ ⊢ M : ∃X.∃Y.A if and only if Γ ⊢ M : ∃Y.∃X.A.
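The mapping |·| of Definition 8 is directly computable. A sketch in Python (our own tuple encoding of the types of Section 2, used only as an illustration):

```python
# Types: ('tvar', X) | ('bot',) | ('neg', A) | ('arrow', A, B) | ('and', A, B) | ('ex', X, A)

def fv(a):
    """Free type variables of a type."""
    if a[0] == 'tvar':
        return {a[1]}
    if a[0] == 'neg':
        return fv(a[1])
    if a[0] in ('arrow', 'and'):
        return fv(a[1]) | fv(a[2])
    if a[0] == 'ex':
        return fv(a[2]) - {a[1]}
    return set()

def strip_vacuous(a):
    """|A|: remove existential quantifiers that bind no free occurrence."""
    if a[0] == 'neg':
        return ('neg', strip_vacuous(a[1]))
    if a[0] in ('arrow', 'and'):
        return (a[0], strip_vacuous(a[1]), strip_vacuous(a[2]))
    if a[0] == 'ex':
        body = strip_vacuous(a[2])
        return ('ex', a[1], body) if a[1] in fv(body) else body
    return a

# |exists X. (Y -> Y)| = Y -> Y, while exists X. (X -> X) is kept intact.
assert strip_vacuous(('ex', 'X', ('arrow', ('tvar', 'Y'), ('tvar', 'Y')))) \
       == ('arrow', ('tvar', 'Y'), ('tvar', 'Y'))
```

Note that the inner call to strip_vacuous happens before the freeness test, which is what makes the claim of Lemma 5(5), that no vacuous quantification survives anywhere in the result, go through for nested quantifiers.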
Definition 9 (Simple derivations). A derivation D of Γ ⊢ M : A in the Curry-style system is called simple if D complies with all of the following restrictions:
1. For each existential type ∃X.A in D, we have X ∈ FV(A).
2. (No redundant (∃E)) For each application of (∃E) in D, we have x ∈ FV(N), where (∃E) infers Γ ⊢ N[x := M] : A1 from Γ ⊢ M : ∃X.A and Γ, x : A ⊢ N : A1.
3. D contains no application of (∃I) immediately followed by (∃E) whose major premise is the conclusion of that (∃I): no pattern in which Γ ⊢ M : ∃X.A is inferred from Γ ⊢ M : A[X := A1] by (∃I) and then used with Γ, x : A ⊢ N : A2 to infer Γ ⊢ N[x := M] : A2 by (∃E).
4. D contains no application of (∃E) whose major premise is the conclusion of another (∃E): no pattern in which Γ ⊢ N1[x1 := M] : ∃X2.A2 is inferred from Γ ⊢ M : ∃X1.A1 and Γ, x1 : A1 ⊢ N1 : ∃X2.A2 by (∃E) and then used with Γ, x2 : A2 ⊢ N2 : A to infer Γ ⊢ N2[x2 := N1[x1 := M]] : A by (∃E).
5. D contains no application of (∃I) whose premise is the conclusion of (∃E): no pattern in which Γ ⊢ N[x := M] : A1[Y := A2] is inferred from Γ ⊢ M : ∃X.A and Γ, x : A ⊢ N : A1[Y := A2] by (∃E) and then used to infer Γ ⊢ N[x := M] : ∃Y.A1 by (∃I).

Proposition 2 (Simple derivations). For any derivation D of Γ ⊢ M : A in the Curry-style system, there is a simple derivation D′ of Γ ⊢ M : A with the same term M.

Proof. First step: condition 1 of Def. 9 is obtained by Lemma 5(5), and condition 2 by Lemma 5(1). Second step: we can prove that for any derivation D of Γ ⊢ M : A there exists a derivation D′ of Γ ⊢ M : A with the same term which obeys conditions 3 and 4, following [Pra65, And95]. Third step: suppose that we have a derivation which obeys conditions 1, 2, 3, and 4. Then an application as in case 5 of Def. 9 can be permuted as follows: from Γ, x : A ⊢ N : A1[Y := A2] infer Γ, x : A ⊢ N : ∃Y.A1 by (∃I), and then from Γ ⊢ M : ∃X.A and Γ, x : A ⊢ N : ∃Y.A1 infer Γ ⊢ N[x := M] : ∃Y.A1 by (∃E).
Thus we can show the existence of simple derivations.
Corollary 1 (Application of (∃E)). In an application of (∃E) in a simple derivation, inferring Γ ⊢ N[x := M] : A1 from Γ ⊢ M : ∃X.A and Γ, x : A ⊢ N : A1, the judgement Γ ⊢ M : ∃X.A is the conclusion of an application of (var), (→E), or (∧E). Thus, in the case M ≡ y (a variable), we have Γ(y) = ∃X.A, and the term M can have the form of neither λy.M1 nor ⟨M1, M2⟩ for any M1, M2.
Under simple derivations, generation lemmas are naturally obtained. For successive applications of substitutions, we write A[Y1 := A1] . . . [Ym := Am], where Yi ∉ FV(Aj) for i ≠ j. A skeleton of a derivation for Γ ⊢ x : A in the Curry-style system can be represented as a type-free term M ≡ (let ⟨∃, a1⟩ = x in · · · let ⟨∃, an⟩ = a_{n−1} in ⟨∃, . . . , ⟨∃, an⟩ . . .⟩) whose erasure is x. This can be generalised to all terms by the Curry–Howard isomorphism together with a natural type erasure.

Lemma 6 (Generation lemma).
(1) (var): If Γ ⊢ x : A, then there exists x : ∃X1 . . . Xn.B ∈ Γ (n ≥ 0) such that B ≡ B0[Y1 := B1] . . . [Ym := Bm] and A ≡ ∃Y1 . . . Ym.B0 for some B0, B1, . . . , Bm (m ≥ 0).
(2) (lam): If Γ ⊢ λx.M : A, then the derivation has, for some Mi, N, the following structure P in terms of the type-free style, where the erasure of P is λx.M: P ≡ (let ⟨∃, x1⟩ = M1 in · · · let ⟨∃, xn⟩ = Mn in ⟨∃, . . . , ⟨∃, λx.N⟩ . . .⟩).
(3) (app): If Γ ⊢ MN : A, then the derivation has, for some Mi, N1, N2, P, one of the following structures Q in terms of the type-free style, where the erasure of Q is MN:
Case 1: Q ≡ (let ⟨∃, x1⟩ = M1 in · · · let ⟨∃, xn⟩ = Mn in ⟨∃, . . . , ⟨∃, N1 N2⟩ . . .⟩);
Case 2: Q ≡ (let ⟨∃, x1⟩ = M1 in · · · let ⟨∃, xn⟩ = Mn in let ⟨∃, x_{n+1}⟩ = N1 N2 in P), where the erasure of P is x_{n+1}.
(4) (pair): If Γ ⊢ ⟨M, N⟩ : A, then the derivation has, for some Mi, N1, N2, the following structure P in terms of the type-free style, where the erasure of P is ⟨M, N⟩: P ≡ (let ⟨∃, x1⟩ = M1 in · · · let ⟨∃, xn⟩ = Mn in ⟨∃, . . . , ⟨∃, ⟨N1, N2⟩⟩ . . .⟩).
(5) (proj): If Γ ⊢ πk(M) : A, then the derivation has, for some Mi, N, P, one of the following structures Q in terms of the type-free style, where the erasure of Q is πk(M):
Case 1: Q ≡ (let ⟨∃, x1⟩ = M1 in · · · let ⟨∃, xn⟩ = Mn in ⟨∃, . . . , ⟨∃, πk(N)⟩ . . .⟩);
Case 2: Q ≡ (let ⟨∃, x1⟩ = M1 in · · · let ⟨∃, xn⟩ = Mn in let ⟨∃, x_{n+1}⟩ = πk(N) in P), where the erasure of P is x_{n+1}.
6
TCP and TIP for (→, ∃) in Curry-Style
For the proof of undecidability of TCP and TIP we adopt the technique of Wells [Wel99]. Let {A1 ≤ B1, A2 ≤ B2} be an instance of SUP [KTU90] built of type variables and →.¹ Let X be the set of variables of the instance. The instance has a solution S if and only if R1(S(A1)) = S(B1) and R2(S(A2)) = S(B2) for some substitutions R1, R2. Let γ, γ1, γ2, ζ be fresh type variables which do not occur in X. In the system (→, ∃), we abbreviate A → ζ as ¬A. Each type ∃X1 . . . Xk.A where FV(A) = {X1, . . . , Xk, ζ} is called an existential closure; we write it for simplicity as ∃.A (this small abuse of notation simplifies the notation below). We reduce SUP to TCP by letting M1 ≡ λx.λz.c(λy.z(y(λu1.xu1)(λu2.xu2))) and Γ = {c : ¬∃.¬(¬¬(B1 → γ1) → ¬¬(γ2 → B2) → A1 → A2)}.

Theorem 3 (SUP → TCP). {A1 ≤ B1, A2 ≤ B2} has a solution if and only if Γ ⊢ M1 : ∃γ.(¬γ → ¬γ) is derivable in the system (→, ∃).
¹ The problem is undecidable for one binary symbol and two equations [KTU90].
Proof. For the if part, assume that the instance {A1 ≤ B1, A2 ≤ B2} of SUP has a solution, i.e., Ri(S(Ai)) = S(Bi) for i = 1, 2. Then we have the following derivation (the reader is encouraged to reconstruct the contexts missing in the derivations): from x : ¬∃.¬(SA1 → SA2) and ui : ¬(Ri(SA1) → Ri(SA2)), by (∃I) we obtain ui : ∃.¬(SA1 → SA2), hence x ui : ζ, and thus λui.x ui : ¬¬(Ri(SA1) → Ri(SA2)).
(2)
Let a : ¬(SA1 → SA2), and y : B, with B ≡ (¬¬(SB1 → R1(SA2)) → ¬¬(R2(SA1) → SB2) → SA1 → SB2) and A ≡ ¬(¬¬(B1 → γ1) → ¬¬(γ2 → B2) → A1 → A2). Then one has y(λu1.xu1)(λu2.xu2) : SA1 → SB2 under the solution. We continue: from a : ¬(SA1 → SA2) and y(λu1.xu1)(λu2.xu2) : SA1 → SA2, we obtain a(y(λu1.xu1)(λu2.xu2)) : ζ; by (→I), λy.a(y(λu1.xu1)(λu2.xu2)) : ¬B; by (∃I), λy.a(y(λu1.xu1)(λu2.xu2)) : ∃.A; with c : Γ(c), by (→E), c(λy.a(y(λu1.xu1)(λu2.xu2))) : ζ; and with z : ∃.¬(SA1 → SA2), by (∃E), c(λy.z(y(λu1.xu1)(λu2.xu2))) : ζ.
Note that the derivation above is a simple derivation. For the only if part, assume that Γ M1 : ∃γ.(¬γ → ¬γ) in the system (→ ∃), and investigate simple derivations of the judgement. Theorem 4 (TCP (→ ∃)). The TCP of the system (→ ∃) in the Curry style is undecidable. The proof given to the theorem above works as well for the statement of type inference. Let N ≡ λx.λz.c(λy.z(y(λu1 .xu1 )(λu2 .xu2 ))) and N ≡ bN , and Γ = {b : ¬∃γ.(¬γ → ¬γ), c : ¬∃.¬(¬¬(B1 → γ1 ) → ¬¬(γ2 → B2 ) → A1 → A2 )}.
Theorem 5 (TIP (→, ∃)). {A1 ≤ B1, A2 ≤ B2} has a solution if and only if there exists some A such that Γ′ ⊢ N′ : A in the system (→, ∃). Therefore, TIP of the system (→, ∃) in the Curry style is undecidable as well.

Proof. It is enough to show that if there exists some A such that Γ′ ⊢ N′ : A, then we have a simple derivation of Γ′ ⊢ N′ : ζ. The only ways to apply (∃E) at the end of the derivation are:
from Γ′ ⊢ N : ∃X.B and Γ′, x : B ⊢ bx : A, infer Γ′ ⊢ bx[x := N] : A; or
from Γ′ ⊢ N : ∃X.B and Γ′, x : B ⊢ x : A, infer Γ′ ⊢ x[x := N] : A,
K. Fujita and A. Schubert
for some ∃X.B. The former case is impossible by Cor. 1, since N is in the form of λ-abstraction. In the latter case, N : ∃X.B must be derived by (→ E) by Cor. 1. Here, however, type of N cannot be existential but ζ. This implies that Γ N : A is, at the last step, derived by (→ E), where A ≡ ζ.
7
TCP and TIP Are Undecidable in Curry-Style (¬, ∧, ∃)
Let {A1 ≤ B1, A2 ≤ B2} be an instance of SUP built of type variables and ∧. We use for (¬, ∧, ∃) in the Curry style a reduction similar to the one in Theorem 3. Let M1 ≡ λx.c(λy.y⟨λu1.π1 x u1, λu2.π1 x u2, π2 x⟩) and Γ = {c : ¬∃.¬¬(¬¬(B1 ∧ γ1) ∧ ¬¬(γ2 ∧ B2) ∧ ¬(A1 ∧ A2))}.

Theorem 6 (TCP (¬, ∧, ∃)). {A1 ≤ B1, A2 ≤ B2} has a solution if and only if Γ ⊢ M1 : ∃γ.¬(¬γ ∧ γ) in the system (¬, ∧, ∃).

The same method can be applied to TIP. Let N1 ≡ b(λx.c(λy.y⟨λu1.π1 x u1, λu2.π1 x u2, π2 x⟩)) and Γ1 = {b : ¬∃γ.¬(¬γ ∧ γ), c : ¬∃.¬¬(¬¬(B1 ∧ γ1) ∧ ¬¬(γ2 ∧ B2) ∧ ¬(A1 ∧ A2))}.
Theorem 7 (TIP (¬, ∧, ∃)). {A1 ≤ B1, A2 ≤ B2} has a solution if and only if there exists A such that Γ1 ⊢ N1 : A in the system (¬, ∧, ∃).

Proof. For the only-if part, assume that there exists A such that Γ1 ⊢ N1 : A. Then we obtain Γ1 ⊢ N1 : ⊥, and the rest follows the proof of TCP.
8
Subject Reduction Property for Curry Systems
It is known [SU06] that the subject reduction property fails in the Curry-style system with both ∀ and ∃. Prop. 2 applies to the analysis of the subject reduction property as well. We show the stronger result that already the system (→, ∃) does not enjoy the subject reduction property. Let I ≡ λx.x.

Proposition 3 (Subject reduction of β). The subject reduction property of β-reduction fails for the existential system (→, ∃) in Curry style. Let Γ be {f : Z → ∃X.(X → X), z : Z}. Then Γ ⊢ λx.(If z)(If zx) : ∃X.(X → X), but Γ ⊢ λx.(f z)(If zx) : ∃X.(X → X) is not derivable.

Proof. It is enough to show that no simple derivation ends with Γ ⊢ λx.(f z)(If zx) : ∃X.(X → X), following Lemma 6.

Proposition 4 (Subject reduction of η). The subject reduction property of η-reduction fails for (→, ∃) and for (¬, ∃).

Proof. Let f : (∃X.X) → W; then f : (∃X.X) → W ⊢ λa.fa : Z → W. Let g : ¬∃X.X; then g : ¬∃X.X ⊢ λa.ga : ¬Z. From Lemma 6, neither f : (∃X.X) → W ⊢ f : Z → W nor g : ¬∃X.X ⊢ g : ¬Z can be derived in the Curry-style system.
9
Discussion on Unification and Semiunification
In this paper, we considered type-free and Curry-style terms. The systems differ in how much of a derivation is preserved in a term: Curry-style terms omit the information from the rules (∃E) and (∃I), while type-free terms mark the applications of these rules but omit the types used. This makes a considerable difference as far as derivation reconstruction problems are concerned. In the case of a Curry-style term, we are forced to consider a potentially infinite number of ∃-introduction rules (as in the rule (∃I) from the derivation (2) in the proof of Theorem 3), while in a type-free term the number of introduction rules is determined by the number of occurrences of the ⟨∃, ·⟩ construct (as in the terms of the first line of the term presented as (1) in Definition 7). A solution of a semiunification inequality X ≤ A, where X is a variable, is a pair of substitutions R, S such that R(S(X)) = S(A). The formulation of the problem does not restrict the domain of R in any way. Therefore, this problem matches well the situation we have in the Curry-style system. Still, the unification of equations in flat form in its instance FX1 . . . Xn ≐ A, where X1, . . . , Xn are unique in the set of equations, requires a fixed number of 'additions' to the variable F. However, an equation in flat form can be seen as a semiunification inequality in a variant of semiunification in which an additional restriction is imposed on the substitution R, namely that its domain has size n. This is indeed so, as the variables X1, . . . , Xn are unique in the whole set of equations. The difference between the case with a potentially infinite domain of R and a domain of fixed size is considerable, as the original semiunification problem enjoys the most general solution property, while semiunification with a restricted domain of R, as well as 2nd-order unification, does not.
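Verifying a proposed solution (R, S) of a semiunification instance is straightforward, even though deciding whether one exists is undecidable [KTU90]. A Python sketch (our own tuple encoding; substitutions are applied once, not iterated):

```python
# Terms: ('var', X) | ('arrow', A, B); substitutions map variable names to terms.

def apply_subst(s, t):
    """Apply substitution s to term t (single application)."""
    if t[0] == 'var':
        return s.get(t[1], t)
    return ('arrow', apply_subst(s, t[1]), apply_subst(s, t[2]))

def solves(R, S, ineqs):
    """Does (R, S) satisfy R(S(A)) = S(B) for every inequality A <= B?"""
    return all(apply_subst(R, apply_subst(S, a)) == apply_subst(S, b)
               for a, b in ineqs)

# X <= X -> X is solved by S(X) = Y and R(Y) = Y -> Y.
ineq = [(('var', 'X'), ('arrow', ('var', 'X'), ('var', 'X')))]
S = {'X': ('var', 'Y')}
R = {'Y': ('arrow', ('var', 'Y'), ('var', 'Y'))}
assert solves(R, S, ineq)
```

The asymmetry the section discusses is visible here: R acts only on the left-hand sides after S, and nothing bounds its domain, whereas the flat-form equations fix the number of argument positions in advance.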
Therefore, it is difficult to devise a direct translation between the two problems, and no such translation is known at present. In [FS00], we studied the type-related problems of other systems between Church style and Curry style. One of them is known as the domain-free style, and the type inference problem was shown undecidable for the predicative fragment of domain-free λ2, called domain-free ML. For this, we reduced from the 2nd-order unification problem for simple instances. The same reduction method, however, cannot be applied to the problems for systems in the type-free style, since the previous method essentially refers to type information, which is erased in the type-free case.
References

[And95] Andou, Y.: A normalization-procedure for the first order classical natural deduction with full logical symbols. Tsukuba Journal of Mathematics 19(1), 153–162 (1995)
[Bar93] Barendregt, H.: Lambda calculi with types. In: Handbook of Logic in Computer Science, vol. II. Oxford University Press (1993)
[Boe85] Boehm, H.-J.: Partial polymorphic type inference is undecidable. In: 26th Annual Symposium on Foundations of Computer Science, pp. 339–345. IEEE, Los Alamitos (1985)
[BS00] Barthe, G., Sørensen, M.H.: Domain-free pure type systems. Journal of Functional Programming 10, 412–452 (2000)
[FS00] Fujita, K., Schubert, A.: Partially typed terms between Church-style and Curry-style. In: Watanabe, O., Hagiya, M., Ito, T., van Leeuwen, J., Mosses, P.D. (eds.) TCS 2000. LNCS, vol. 1872, pp. 505–520. Springer, Heidelberg (2000)
[Fuj] Fujita, K.: CPS-translation as adjoint (submitted)
[Fuj05] Fujita, K.: Galois embedding from polymorphic types into existential types. In: Urzyczyn, P. (ed.) TLCA 2005. LNCS, vol. 3461, pp. 194–208. Springer, Heidelberg (2005)
[Has06] Hasegawa, M.: Relational parametricity and control. Logical Methods in Computer Science 2(3) (2006)
[KTU90] Kfoury, A.J., Tiuryn, J., Urzyczyn, P.: The undecidability of the semiunification problem. In: STOC 1990: Proceedings of the Twenty-Second Annual ACM Symposium on Theory of Computing, pp. 468–476. ACM Press, New York (1990)
[KTU93] Kfoury, A.J., Tiuryn, J., Urzyczyn, P.: Type reconstruction in the presence of polymorphic recursion. ACM Transactions on Programming Languages and Systems 15(2), 290–311 (1993)
[Mai90] Mairson, H.G.: Deciding ML typability is complete for deterministic exponential time. In: POPL 1990: Proceedings of the 17th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp. 382–401. ACM Press, New York (1990)
[MP88] Mitchell, J.C., Plotkin, G.D.: Abstract types have existential type. ACM Transactions on Programming Languages and Systems 10(3), 470–502 (1988)
[NTKN08] Nakazawa, K., Tatsuta, M., Kameyama, Y., Nakano, H.: Undecidability of type-checking in domain-free typed lambda-calculi with existence. In: Kaminski, M., Martini, S. (eds.) CSL 2008. LNCS, vol. 5213, pp. 478–492. Springer, Heidelberg (2008)
[Pfe93] Pfenning, F.: On the undecidability of partial polymorphic type reconstruction. Fundamenta Informaticae 19(1–2), 185–199 (1993)
[Pra65] Prawitz, D.: Natural Deduction: A Proof-Theoretical Study. Almquist and Wiksell, Stockholm (1965)
[Sch98] Schubert, A.: Second-order unification and type inference for Church-style polymorphism. In: POPL 1998: Proceedings of the 25th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp. 279–288. ACM Press, New York (1998)
[SU06] Sørensen, M.H., Urzyczyn, P.: Lectures on the Curry-Howard Isomorphism. Studies in Logic and the Foundations of Mathematics, vol. 149. Elsevier, New York (2006)
[Wel99] Wells, J.B.: Typability and type checking in System F are equivalent and undecidable. Annals of Pure and Applied Logic 98(1–3), 111–156 (1999)
Initial Algebra Semantics for Cyclic Sharing Structures
Makoto Hamana
Department of Computer Science, Gunma University, Japan
[email protected]
Abstract. Terms are a concise representation of tree structures. Since they can be naturally defined by an inductive type, they offer convenient data structures in functional programming and mechanised reasoning, with useful principles such as structural induction and structural recursion. In the case of graphs or "tree-like" structures – trees involving cycles and sharing – however, it is not clear what kind of inductive structure exists, or how we can faithfully assign a term representation to them. In this paper we propose a simple term syntax for cyclic sharing structures that admits structural induction and recursion principles. We show that the obtained syntax is directly usable in the functional language Haskell, just like ordinary data structures such as lists and trees. To achieve this goal, we use a categorical approach to initial algebra semantics in a presheaf category. This approach follows the line of Fiore, Plotkin and Turi's models of abstract syntax with variable binding.
1 Introduction
Terms are a convenient, concise and mathematically clean representation of tree structures used in logic and theoretical computer science. In the field of traditional algorithms or graph theory, one usually uses unstructured representations for trees, such as a pair (V, E) of vertex and edge sets, adjacency lists, adjacency matrices, pointer structures, etc., which are more complex and less readable than terms. Term representation provides a well-structured, compact and more readable notation. However, consider the case of "tree-like" structures such as that depicted in Fig. 1. This kind of structure – graphs, but almost trees, involving (a few) exceptional edges – appears quite often in logic and computer science. Examples include internal representations of expressions in implementations of functional languages that share common sub-expressions for efficiency, control flow graphs of imperative programs used in static analysis and compiler optimisations [CFR+ 91], data models of XML such as trees with pointers [CGZ05], proof trees admitting cycles for cyclic proofs [Bro05], and term graphs in graph rewriting [BvEG+ 87, AK96]. Suppose we want to treat such structures (Fig. 1) in a pure functional programming language such as Haskell or Clean, or a proof assistant such as Coq or Agda [Nor07]. In such a case, we would have to abandon the use of naive term representation, and would instead be compelled to use an
P.-L. Curien (Ed.): TLCA 2009, LNCS 5608, pp. 127–141, 2009. © Springer-Verlag Berlin Heidelberg 2009
unstructured representation such as (V, E), adjacency lists, etc. Furthermore, a serious problem is that we would have to abandon structural recursion/induction to decompose them, because they look "tree-like" but are in fact graphs, so there is no obvious inductive structure in them. This means that in functional programming, we cannot use pattern matching to treat tree-like structures, which greatly decreases their convenience. This lack of structural induction means failure to be an inductive type. But are there really no inductive structures in tree-like structures? As might be readily apparent, tree-like structures are almost trees and contain only finite pieces of information. The only difference is the presence of "cycles" and "sharing". In this paper, we give an initial algebra characterisation of cyclic sharing structures in the framework of categorical universal algebra. The aim of this paper is to derive the following practical goals from the initial algebra characterisation.
[I] To develop a simple term syntax for cyclic sharing structures that admits structural induction and structural recursion principles.
[II] To make the obtained syntax directly usable in current functional languages and proof assistants, just like ordinary data structures such as lists and trees.
The goal [I] also intends that the term syntax exactly characterises cyclic sharing structures (i.e. no junk terms exist), to make structural induction possible. The goal [II] intends that the obtained syntax should be as lightweight as possible, which means that e.g. well-formedness and equality tests on terms for cyclic sharing structures should be as fast and easy as for ordinary data structures such as lists and trees. We do not want many axioms to characterise the intended structures, because in a programming situation, checking the validity of axioms every time is expensive and makes everything complicated. Therefore, ideally, formulating the structures without axioms is best.
The goal [II] is rephrased more specifically as:
[II'] To give an inductive type that represents cyclic sharing structures uniquely.
We thereby rely on the fact that a type checker automatically ensures the well-formedness of cyclic sharing structures. We choose the functional programming language Haskell to show this concretely in this paper. To achieve these goals, we use the category-theoretic formulation of initial algebra semantics. Recently, by varying the base category beyond Set, initial algebra semantics for functor-algebras has proved to be a useful framework for characterising various mathematical/computational structures in a uniform setting. We list several: S-sorted abstract syntax is characterised as an initial algebra in Set^S [Rob02], second-order abstract syntax as an initial algebra in Set^F [FPT99, Ham04, Fio08] (where F is the category of finite sets), explicit substitutions as initial algebras in the category [Set, Set]_f of finitary functors [GUH06], recursive path orderings for term rewriting systems as algebras in the category LO of linear orders [Has02], and nested datatypes [GJ07] and generalised algebraic datatypes (GADTs) [JG08] in functional programming as initial algebras in [C, C] and [|C|, C] respectively, where C is an ω-cocomplete category. This paper adds a further example to the above list. We characterise cyclic sharing structures as an initial algebra in the category (Set^{T*})^T, where T is the set of all "shapes"
of trees and T* is the set of all tree shape contexts. We derive structural induction and recursion principles from it. An important point is that we merely use algebras of a functor to formulate cyclic sharing structures, i.e. not (models of) equational specifications or (Σ, E)-algebras. This achieves the requirement of "without axioms" and is the key to formulating them by an inductive type.
Basic idea. It is known in the field of graph algorithms [Tar72] that, by traversing a rooted directed graph in a depth-first search manner, we obtain a depth-first search tree, which consists of a spanning tree of the graph (whose edges are called tree edges) together with forward edges (which connect ancestors in the tree to descendants), back edges (the reverse), and cross edges (which connect nodes across the tree from right to left). Forward edges can be decomposed into tree and cross edges by placing indirection nodes. For example, the graph in Fig. 1 becomes the depth-first search tree in Fig. 2, where solid lines are tree edges and dashed lines are back and cross edges (Fig. 2. Depth-first search tree). This is the target structure we model in this paper. That is, tree edges are the basis of an inductive structure, back edges form cycles, and cross edges form sharing. Consequently, our task is to seek how to characterise the pointers making back edges and cross edges in inductive constructions.
Formulation. The crucial idea for formulating pointers in inductive constructions is to use binders as pointers in abstract syntax. Trees are formulated as terms. Hence, a remaining problem is how to exactly capture binders in terms. Fiore, Plotkin and Turi [FPT99] have characterised abstract syntax with variable binding by an initial algebra in the category Set^F. For example, the abstract syntax of λ-terms is modeled as a functor Λ : F → Set equipped with three constructors for λ-terms as an algebra structure on Λ. Each set Λ(X) gives the set of all λ-terms which may contain free variables taken from a set X in F. This formulation models a structure (here, abstract syntax trees) indexed by a suitable invariant (here, free variables considered as contexts), which is essential information for capturing the intended structure (abstract syntax with variable binding). However, this approach using algebras in Set^F is insufficient to represent "cross edges" in tree-like graphs. Ariola and Klop have analysed that there are two kinds of sharing in this kind of tree-like graph [AK96]: (i) vertical sharing (i.e. back edges in depth-first search trees) and (ii) horizontal sharing (i.e. cross edges). In principle, binders capture "vertical" contexts only, but to represent cross edges exactly, we must capture "horizontal" context information, which cannot be handled by the index category F. To solve this problem, in this paper we take a richer index category that suffices to model cross edges. We introduce the notion of shape trees, and contexts consisting of them, which represent the other parts of a tree viewed from a pointer node. We use shape trees T as "types" of the syntax and T* as "contexts". We use Fiore's initial algebra semantics
Fig. 3. Trees involving cycle and sharing
for typed abstract syntax with variable binding [Fio02], by algebras in the presheaf category (Set^{T*})^T. Hence cyclic sharing trees are modeled as a T- and T*-indexed set T : T → (T* → Set)
equipped with constructors of cyclic sharing trees as an algebra structure.
Organisation. We first give types and abstract syntax for cyclic sharing binary trees in Section 2. We then characterise cyclic sharing binary trees as an initial algebra in Section 3. Section 4 gives a way of implementing cyclic sharing structures by an inductive type in Haskell. Section 5 generalises our treatment to an arbitrary signature for cyclic sharing structures. Section 6 relates our representation to equational term graphs in the initial algebra framework by giving a homomorphic translation. In Section 7, we discuss connections to other approaches to cyclic sharing structures.
2 Abstract Syntax for Cyclic Sharing Structures
Cyclic structures by μ-terms. The syntax of fixpoint expressions by μ-notation (μ-terms) is widely used in computer science and logic. Its theory has been thoroughly investigated, e.g., in [AK96]. The language of μ-terms suffices to express all cyclic structures. For example, the cyclic binary tree shown in Fig. 3 (i) is representable by the term
  μx.bin(μy1.bin(lf(5), lf(6)), μy2.bin(x, lf(7)))    (1)
where bin and lf denote binary and leaf nodes, respectively. The point is that the variable x refers to the root labeled by a μ-binder; hence a cycle is represented. To uniquely formulate cyclic structures, we introduce the following assumption: we attach μ-binders in front of bin only, and put exactly one μ-binder for each occurrence of bin, as in (1). This is seen as uniform addressing of bin-nodes, i.e., x, y1, y2 are seen as labels or "addresses" of bin-nodes. We also assume no axiom to equate μ-terms. That is, we do not identify a μ-term with its unfolding, since they are different (shapes of) graphs. In summary, μ-terms represent cyclic structures. This is the underlying idea of the representation of cyclic data in the functional programming language Haskell [GHUV06].
How to represent sharing. Next, we incorporate sharing. The presence of sharing makes the situation more difficult. Consider the tree (ii) in Fig. 3, involving sharing via a cross edge. Similarly to the case of cycles, this might be written as a μ-term
  μx.bin(μy1.bin(lf(5), lf(6)), μy2.bin(__, lf(7))).
But can we fill the blank to refer to the node a (in Fig. 3 (ii)) from the node c "horizontally" by using the mechanism of binders? Actually, μ-binders are insufficient for this purpose. Hence, we introduce a new notation "p↑x" to refer to a node horizontally. This notation means going up to a node x labelled by a μ-binder and then going down to a position p in the subtree rooted at the node x. In the above example, the blank is filled by 11↑x, which means going back to the node x, then going down through the left child twice (using the position 11). See also Example 1. We first focus on the formulation of binary trees involving cycles and sharing. Binary trees are the minimal case that can involve the notion of sharing in structures. To ensure correct sharing, we introduce the notion of shape trees.
Shape trees. We call our target data structures cyclic sharing trees and their syntactic representations cyclic sharing terms. Cyclic sharing trees are binary trees generated by three kinds of nodes, namely pointer, leaf, and binary nodes, and satisfying a certain well-formedness condition. We use skeletons of cyclic sharing trees, called shape trees. Shape trees are binary trees obtained from cyclic sharing trees by forgetting the values in pointer and leaf nodes. The set T of all shape trees is defined by
  T ∋ τ ::= e | p | l | b(τ1, τ2)
where e is the void shape, p is the shape of a pointer node, l is the shape of a leaf node, and b(τ1, τ2) is the shape of a binary node. We typically use Greek letters σ, τ to denote shape trees. We define referable positions in a shape tree. A position is a finite sequence over {1, 2}. The root position is denoted by the empty sequence ε, and the concatenation of positions is denoted by pq or p.q. The set Pos(τ) of positions in a shape tree τ is defined by
  Pos(e) = Pos(p) = ∅,  Pos(l) = {ε},
  Pos(b(σ, τ)) = {ε} ∪ {1p | p ∈ Pos(σ)} ∪ {2p | p ∈ Pos(τ)}.
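The position computation above can be sketched as a value-level Haskell function (a sketch; the datatype and function names are ours, introduced only for illustration):

```haskell
-- Value-level shape trees and their referable positions, mirroring
-- Pos(e) = Pos(p) = {}, Pos(l) = {eps}, Pos(b(s,t)) = {eps} ∪ 1.Pos(s) ∪ 2.Pos(t).
-- A position is a list over {1,2}; the empty list is the root position eps.
data ShapeTree = E | P | L | B ShapeTree ShapeTree

positions :: ShapeTree -> [[Int]]
positions E       = []           -- the void shape is not referable
positions P       = []           -- nor is a pointer node
positions L       = [[]]         -- a leaf is referable at its root only
positions (B s t) = [] : [1 : p | p <- positions s]
                      ++ [2 : p | p <- positions t]
```

For instance, positions (B (B L L) L) yields the five referable positions [[],[1],[1,1],[1,2],[2]].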
An important point is that the void shape e and the pointer shape p are not referable by other nodes; hence their sets of positions are defined to be empty.
Syntax and types. Shape trees are used as types in typing judgments. As usual, a typing context Γ is a sequence of (variable, shape tree)-pairs.
Typing rules (named binder version)

  (Pointer)  p ∈ Pos(σ)
             ─────────────────────────
             Γ, x : σ, Γ' ⊢ p↑x : p

  (Leaf)     k ∈ Z
             ─────────────────
             Γ ⊢ lf(k) : l

  (Node)     x : b(e, e), Γ ⊢ s : σ    x : b(σ, e), Γ ⊢ t : τ
             ───────────────────────────────────────────────
             Γ ⊢ μx.bin(s, t) : b(σ, τ)
In these typing rules, a shape tree type is assigned to the corresponding tree node. That is, a binary node has the type b(σ, τ) of binary node shape, a pointer node has the type p of pointer node shape, and a leaf node has the type l of leaf node shape. A type declaration x : σ in a typing context (roughly) means that σ is the shape of the subtree (say, t) headed by the binder μx (see Example 1). Hence, in the (Pointer) rule, taking
a position p ∈ Pos(σ), we can safely refer to a position in the tree t. The notation p↑x is designed to realise a right-to-left cross edge. Note also that the path obtained by p↑x is the shortest path from the pointer node to the node referred to by p↑x. When p = ε, we abbreviate ε↑x as ↑x. This ↑x exactly expresses a back edge. In the (Node) rule, the shape trees b(e, e) and b(σ, e) mask, by the void shape e, those nodes that are reachable via left-to-right references (i.e. not our requirement) or via redundant references (e.g. going up to a node x and then going back down through the same path).
Example 1. The binary tree involving sharing in Fig. 3 (ii) is represented by the well-typed term
  μx.bin(μy1.bin(lf(5), lf(6)), μy2.bin(11↑x, lf(7)))
The typing derivation is as follows, where α = b(e, e) and β = b(b(l, l), e):
- from y1:α, x:α ⊢ lf(5) : l and y1:b(l, e), x:α ⊢ lf(6) : l, derive x:α ⊢ μy1.bin(lf(5), lf(6)) : b(l, l);
- from 11 ∈ Pos(β), derive y2:α, x:β ⊢ 11↑x : p; together with y2:b(p, e), x:β ⊢ lf(7) : l, derive x:β ⊢ μy2.bin(11↑x, lf(7)) : b(p, l);
- hence ⊢ μx.bin(μy1.bin(lf(5), lf(6)), μy2.bin(11↑x, lf(7))) : b(b(l, l), b(p, l)).
Instead of named variables for binders, a de Bruijn notation is also possible. The construction rules are reformulated as follows. Now a typing context Γ is simply a sequence of shape trees τ1, . . . , τn; let |Γ| denote its length. A judgment Γ ⊢ t : τ denotes a well-formed term t of shape τ containing free variables (de Bruijn indices) from 1 to |Γ|. The intended meaning is that the length |Γ| denotes how many times at most we can go up from the current node t, and each shape tree τi in Γ denotes the shape of the subtree at the i-th node above t. Therefore, when t is a pointer, the context specifies the set of all positions that the pointer node can refer to. As is known from the λ-calculus, using de Bruijn notation binders become nameless, so we can safely omit "x" from μx. Since the typing rules are designed to attach exactly one μ-binder to each bin, even "μ" can be omitted. Hence we have simplified construction rules for terms.
Typing rules (de Bruijn version)
  (dbPointer)  |Γ| = i − 1    p ∈ Pos(σ)
               ─────────────────────────
               Γ, σ, Γ' ⊢ p↑i : p

  (dbLeaf)     k ∈ Z
               ─────────────────
               Γ ⊢ lf(k) : l

  (dbNode)     b(e, e), Γ ⊢ s : σ    b(σ, e), Γ ⊢ t : τ
               ─────────────────────────────────────────
               Γ ⊢ bin(s, t) : b(σ, τ)
In the (dbPointer) rule, the condition |Γ| = i − 1 says that the shape tree σ appears at the i-th position of the typing context in the lower judgment. Since for a given graph its depth-first search tree is unique, the following is immediate.
Theorem 1. Given a rooted graph which is connected, directed and edge-ordered, its term representation in de Bruijn notation is unique.
Remark 1. This uniqueness of term representation has practical importance. For instance, for the graph in the tree (ii) in Fig. 3, there is only one way to represent it in this term syntax, i.e., bin(bin(lf(5), lf(6)), bin(11↑2, lf(7))) in de Bruijn notation. Hence, we do not need any complex equality on graphs (other than syntactic equality) to check whether a given piece of data is the required one. This is in contrast to other approaches. If we represent a graph as a term graph with labels [BvEG+ 87], an equational term graph [AK96], or a letrec-term [Has97], there are several syntactic representations of a single graph, hence some normalisation is required when, e.g., defining a function on graphs. Generally speaking, our terms can be seen as a "de Bruijn notation" for term graphs with labels [BvEG+ 87].
3 Initial Algebra Semantics
3.1 Construction
In this section, we show that cyclic sharing terms form an initial algebra, and we derive structural recursion and induction from it. We use Fiore's approach to algebras for typed abstract syntax with binding [Fio02] in the presheaf category (Set^{F↓U})^U, where U is the set of all types. Here, we take the set T of all shape trees for U, and the set N of natural numbers for variables (i.e. pointers), instead of the category F of finite sets and all functions (used for renaming variables), because we do not need renaming of pointers.
Algebras. We define the discrete category T* by taking contexts Γ = ⟨τ1, . . . , τn⟩ as objects (which is equivalent to N ↓ T). We consider algebras in (Set^{T*})^T. Two preliminary definitions. We define the presheaf PO ∈ Set^{T*} for pointers by
  PO(⟨τ1, . . . , τn⟩) = { p↑i | 1 ≤ i ≤ n, p ∈ Pos(τi) }.
For each τ ∈ T, we define the functor δ_τ : Set^{T*} → Set^{T*} for context extension by δ_τ A = A(⟨τ, −⟩).
We define the signature functor Σ : (Set^{T*})^T → (Set^{T*})^T for cyclic sharing binary trees, which takes A ∈ (Set^{T*})^T and a type in T, and gives a presheaf in Set^{T*}, as follows:
  (ΣA)_e = 0    (ΣA)_p = PO    (ΣA)_l = K_Z    (ΣA)_{b(σ,τ)} = δ_{b(e,e)} A_σ × δ_{b(σ,e)} A_τ
where K_Z is the constant functor to Z, and 0 is the empty set functor. A Σ-algebra A is a pair (A, α) consisting of a presheaf A ∈ (Set^{T*})^T as a carrier and a natural transformation α : ΣA → A as an algebra structure. By the definition of Σ, to give an algebra structure is to give the following morphisms of Set^{T*}:
  ptr_A : PO → A_p    lf_A : K_Z → A_l    bin_A^{σ,τ} : δ_{b(e,e)} A_σ × δ_{b(σ,e)} A_τ → A_{b(σ,τ)}.
A homomorphism of Σ-algebras is a map φ : (A, α) → (B, β) such that φ ∘ α = β ∘ Σφ.
Initial Algebra. Let T be the presheaf of all derivable cyclic sharing terms, defined by
  T_τ(Γ) = { t | Γ ⊢ t : τ }.
Theorem 2. For the signature functor Σ for cyclic sharing binary trees, T forms an initial Σ-algebra.
Proof. Since δ_τ preserves ω-colimits, so does Σ. An initial Σ-algebra is constructed as the colimit of the ω-chain 0 → Σ0 → Σ²0 → · · · [SP82]. These construction steps correspond to derivations of terms by the typing rules, hence their union T is the colimit. The algebra structure in : ΣT → T of the initial algebra is obtained by one-step inference of the typing rules, i.e., it is given by the following operations:
  ptr_T(Γ) : PO(Γ) → T_p(Γ);  p↑i ↦ p↑i
  lf_T(Γ) : Z → T_l(Γ);  k ↦ lf(k)
  bin_T(Γ) : T_σ(b(e, e), Γ) × T_τ(b(σ, e), Γ) → T_{b(σ,τ)}(Γ);  (s, t) ↦ bin(s, t).
This development of the initial algebra characterisation follows the line of [FPT99, Fio02, MS03]. Hence we can further develop the full theory of algebraic models of abstract syntax for cyclic sharing structures along this line. It will provide second-order typed abstract syntax with object/meta-level variables and substitutions, via a substitution monoidal structure and a free Σ-monoid [Ham04, Fio08] in (Set^{T*})^T (by incorporating suitable arrows into T*). Object/meta-substitutions on cyclic sharing structures will provide ways to construct cyclic sharing structures from smaller structures in a sensible manner. But this is not the main purpose of this paper, hence details will be pursued elsewhere.
3.2 Structural Recursion Principle
One important benefit of the initial algebra characterisation is that the unique homomorphism from the initial algebra to another algebra is a mapping defined by structural recursion.
Theorem 3. The unique homomorphism φ from the initial Σ-algebra T to a Σ-algebra A is described by
  φ_p(Γ)(p↑i) = ptr_A(Γ)(p↑i)
  φ_l(Γ)(lf(k)) = lf_A(Γ)(k)
  φ_{b(σ,τ)}(Γ)(bin(s, t)) = bin_A(Γ)(φ_σ(b(e, e), Γ)(s), φ_τ(b(σ, e), Γ)(t))
Proof. Since the unique homomorphism φ : T → A is a morphism of (Set^{T*})^T.
Example 2. By structural recursion we define (1) the function height, which computes the height of a cyclic sharing tree t ∈ T_τ(Γ), and (2) the function leaves, which collects all leaf values in t (where max is the maximum function):
  height_p(Γ)(p↑i) = 1
  height_l(Γ)(lf(k)) = 1
  height_{b(σ,τ)}(Γ)(bin(s, t)) = 1 + max(height_σ(b(e, e), Γ)(s), height_τ(b(σ, e), Γ)(t))
  leaves_p(Γ)(p↑i) = ∅
  leaves_l(Γ)(lf(k)) = {k}
  leaves_{b(σ,τ)}(Γ)(bin(s, t)) = leaves_σ(b(e, e), Γ)(s) ∪ leaves_τ(b(σ, e), Γ)(t)
This is because leaves is the unique homomorphism from T to a constant Σ-algebra K_{P(Z)} (of sets of integers) whose operations are ptr(Γ)(p↑i) = ∅, lf(Γ)(k) = {k}, bin(Γ)(x, y) = x ∪ y. Similarly for height. Notice that the height is not so directly definable in ordinary graph representations. From an algorithmic point of view, this structural recursion principle provides depth-first search traversal of a rooted graph. Hence, graph algorithms based on depth-first search are directly programmable by structural recursion. On the author's home page (http://www.cs.gunma-u.ac.jp/˜hamana/), several other simple graph algorithms have been programmed by structural recursion.
3.3 Structural Induction Principle
Another important benefit of the initial algebra characterisation is its tight connection to the structural induction principle. In order to derive it, following [HJ98], we use the category Sub((Set^{T*})^T) of predicates on (Set^{T*})^T, defined by:
• objects: sub-presheaves (Q → V), i.e., inclusions between Q, V ∈ (Set^{T*})^T,
• arrows: u : (Q → V) → (P → U) are natural transformations u : V → U between the underlying presheaves satisfying: a ∈ Q_τ(Γ) implies u(a) ∈ P_τ(Γ) for all τ ∈ T, Γ ∈ T*.
A sub-presheaf (P → T) is seen as a predicate P on cyclic sharing terms T, indexed by types and contexts. So we say "P_τ^Γ(t) holds" when t ∈ P_τ(Γ) for t ∈ T_τ(Γ). We consider Σ-algebras in Sub((Set^{T*})^T). The signature functor Σ : (Set^{T*})^T → (Set^{T*})^T is lifted to Σ_pred : Sub((Set^{T*})^T) → Sub((Set^{T*})^T), called the logical predicate lifting in [HJ98], in the same way as Σ:
  (Σ_pred(Q → V))_e = (0 → 0)    (Σ_pred(Q → V))_p = (PO → PO)    (Σ_pred(Q → V))_l = (K_Z → K_Z)
  (Σ_pred(Q → V))_{b(σ,τ)} = δ_{b(e,e)}(Q → V)_σ × δ_{b(σ,e)}(Q → V)_τ
where the context extension is also lifted to δ_τ : Sub(Set^{T*}) → Sub(Set^{T*}), defined by δ_τ(A → B) = (A(⟨τ, −⟩) → B(⟨τ, −⟩)). Now, a Σ_pred-algebra structure α : Σ_pred(P → T) → (P → T) can be read as the induction hypothesis; e.g.
  bin_P^{σ,τ}(Γ) : (P_σ(b(e, e), Γ) → T_σ(b(e, e), Γ)) × (P_τ(b(σ, e), Γ) → T_τ(b(σ, e), Γ)) → (P_{b(σ,τ)}(Γ) → T_{b(σ,τ)}(Γ));  (s, t) ↦ bin(s, t)
means that "if P_σ^{b(e,e),Γ}(s) and P_τ^{b(σ,e),Γ}(t) hold, then P_{b(σ,τ)}^Γ(bin(s, t)) holds." By [HJ98], (T → T) is an initial Σ_pred-algebra. The unique homomorphism φ : (T → T) → (P → T) means that P holds for all cyclic sharing terms in T. Hence:
Theorem 4. Let P be a predicate on T. To prove that P_τ^Γ(t) holds for all t ∈ T_τ(Γ), it suffices to show
(i) P_p^Γ(p↑i) holds for all p↑i ∈ PO(Γ),
(ii) P_l^Γ(lf(k)) holds for all k ∈ Z,
(iii) if P_σ^{b(e,e),Γ}(s) and P_τ^{b(σ,e),Γ}(t) hold, then P_{b(σ,τ)}^Γ(bin(s, t)) holds.
This structural induction principle can be used to prove properties of functions on cyclic sharing terms defined by structural recursion.
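On a plain, context-free version of the term syntax, the two functions of Example 2 can be sketched in Haskell as follows; the untyped datatype Term and its constructor names are ours, introduced only as a simplification of T:

```haskell
import qualified Data.Set as Set

-- Untyped cyclic sharing terms in de Bruijn style: p↑i, lf(k), bin(s,t).
data Term = Ptr [Int] Int | Leaf Int | Bin Term Term

-- height: structural recursion, counting pointer and leaf nodes as height 1
height :: Term -> Int
height (Ptr _ _) = 1
height (Leaf _)  = 1
height (Bin s t) = 1 + max (height s) (height t)

-- leaves: collects all leaf values, interpreting bin as set union
leaves :: Term -> Set.Set Int
leaves (Ptr _ _) = Set.empty
leaves (Leaf k)  = Set.singleton k
leaves (Bin s t) = leaves s `Set.union` leaves t
```

On the de Bruijn term of Remark 1, bin(bin(lf(5), lf(6)), bin(11↑2, lf(7))), height gives 3 and leaves gives {5, 6, 7}.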
4 Inductive Type for Cyclic Sharing Structures
In this section, we realise our aim [II]: to give an inductive type for cyclic sharing structures. We choose the functional language Haskell, because (1) we want to show that our characterisation of cyclic sharing is available in today's programming language technology, and (2) Haskell's type system is powerful enough to faithfully implement our initial algebra characterisation. The resulting definition might be close to implementations in proof assistants using dependent types, such as Coq and Agda. Since the set T_τ(Γ) of cyclic sharing terms depends on a shape tree and a context, it should be implemented by a dependent type. As we have seen in the proof of Thm. 2, the constructors of cyclic sharing terms are T- and T*-indexed functions. Inductive types defined by indexed constructors have been known as inductive families in dependent type theories [Dyb94]. Recently, the Glasgow Haskell Compiler has incorporated this feature as GADTs (generalised algebraic data types) [PVWW06]. With another feature called type classes, we can realise lightweight dependently-typed programming in Haskell [McB02]. We will implement T_τ(Γ) as a GADT "T n t" that depends on a context n (for Γ) and a shape tree t (for τ). Since in Haskell a type can only depend on types (not values), we first define type-level shape trees using a type class.

data E; data P; data L = StopLf; data B a b = DnL a | DnR b | StopB
class Shape t
instance Shape E; instance Shape P; instance Shape L
instance (Shape s, Shape t) => Shape (B s t)
This defines the constructors of shape trees as types E, P, L and a type constructor B, and groups them by the type class Shape. The values of a shape tree τ (e.g. B (B L L) L in Haskell) are Pos(τ) (e.g. DnL (DnR StopLf)), i.e. the "referable positions" in τ. Similarly, a context τ1, . . . , τn is coded as a type-level sequence TyCtx τn (TyCtx τn−1 · · · (TyCtx τ1 TyEmp)), and the type constructors are grouped by the type class Ctx. The values of a context type are "pointers" (e.g. Up UpStop, meaning ↑2).

data TyEmp; data TyCtx t n = Up n | UpStop | UpGD t
class Ctx n
instance Ctx TyEmp; instance (Shape t, Ctx n) => Ctx (TyCtx t n)
Finally, we define T_τ(Γ) by a GADT "T" that takes a context and a shape tree as its two type arguments.

data T :: * -> * -> * where
  Ptr :: Ctx n => n -> T n P
  Lf  :: Ctx n => Int -> T n L
  Bin :: (Ctx n, Shape s, Shape t) =>
         T (TyCtx (B E E) n) s -> T (TyCtx (B s E) n) t -> T n (B s t)
This defines the three constructors of cyclic sharing terms faithfully (the part "Ctx n =>" is a quantification meaning "for every type n which is an instance of the type class Ctx"). For example, the term in Example 1 is indeed a well-typed term and its type is inferred in Haskell:
Bin (Bin (Lf 5) (Lf 6)) (Bin (Ptr (Up (UpGD (DnL (DnL StopLf))))) (Lf 7)) :: T TyEmp (B (B L L) (B P L))
The term Up (UpGD (DnL (DnL StopLf))) is the representation of the pointer 11↑2 in de Bruijn notation, which is read from the top as "up and up, then going down (GD is short for going down) along the position 11 and stopping at a leaf". The type inference and the type checker automatically ensure the well-formedness of cyclic sharing terms. In Haskell, we can use the GADT T just like an ordinary algebraic datatype, so we can define functions on it by structural recursion as described in Example 2 (even more simply: shape tree and context parameters are unnecessary when defining functions, due to Haskell's compilation method [PVWW06]). The implementation and further examples using the GADT T are available from the author's home page.
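As an illustration of the last point, here is a self-contained sketch (repeating the definitions of this section; the function height and the example term are our additions) showing structural recursion over the GADT T by ordinary pattern matching, with no shape or context parameters:

```haskell
{-# LANGUAGE GADTs, EmptyDataDecls, KindSignatures #-}
-- Type-level shapes, contexts, and the GADT T, as defined in this section.
data E; data P; data L = StopLf
data B a b = DnL a | DnR b | StopB
class Shape t
instance Shape E; instance Shape P; instance Shape L
instance (Shape s, Shape t) => Shape (B s t)

data TyEmp
data TyCtx t n = Up n | UpStop | UpGD t
class Ctx n
instance Ctx TyEmp
instance (Shape t, Ctx n) => Ctx (TyCtx t n)

data T :: * -> * -> * where
  Ptr :: Ctx n => n -> T n P
  Lf  :: Ctx n => Int -> T n L
  Bin :: (Ctx n, Shape s, Shape t)
      => T (TyCtx (B E E) n) s -> T (TyCtx (B s E) n) t -> T n (B s t)

-- Structural recursion on T: no index parameters are needed.
height :: T n t -> Int
height (Ptr _)   = 1
height (Lf _)    = 1
height (Bin s t) = 1 + max (height s) (height t)

-- The well-typed term of Example 1; its type is inferred as
-- T TyEmp (B (B L L) (B P L)).
ex :: T TyEmp (B (B L L) (B P L))
ex = Bin (Bin (Lf 5) (Lf 6))
         (Bin (Ptr (Up (UpGD (DnL (DnL StopLf))))) (Lf 7))
```

Here height ex evaluates to 3, while an ill-formed pointer value would already be rejected by the type checker.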
5 General Signature
We give the construction of cyclic sharing structures for an arbitrary signature, as a natural generalisation of the binary tree case. A signature Σ for cyclic sharing structures consists of a set Σ of function symbols having arities. A function symbol of arity n ∈ N is denoted by f^(n). The set T of all shape trees is defined by
  T ∋ τ ::= e | p | f(τ1, . . . , τn)    for each f^(n) ∈ Σ.
Throughout this section, we use the convention that a small capital is associated with each letter of a function symbol, e.g. f for f, g for g, etc. The set of all contexts is T* = { ⟨τ1, . . . , τn⟩ | n ∈ N, i ∈ {1, . . . , n}, τi ∈ T }. Positions are defined by
  Pos(e) = Pos(p) = ∅,
  Pos(f(τ1, . . . , τn)) = {ε} ∪ {1.p | p ∈ Pos(τ1)} ∪ . . . ∪ {n.p | p ∈ Pos(τn)}.
Typing rules

  |Γ| = i − 1    p ∈ Pos(σ)
  ─────────────────────────
  Γ, σ, Γ' ⊢ p↑i : p

  γ1, Γ ⊢ t1 : τ1  · · ·  γn, Γ ⊢ tn : τn    f^(n) ∈ Σ
  ───────────────────────────────────────────────────
  Γ ⊢ f(t1, . . . , tn) : f(τ1, . . . , τn)

where γ1 = f(e, . . . , e) and γ_{i+1} = f(τ1, . . . , τi, e, . . . , e) for each 1 ≤ i ≤ n − 1. The shape trees γi are also used below. The base category is (Set^{T*})^T. For a signature Σ, we associate a signature functor Σ : (Set^{T*})^T → (Set^{T*})^T defined by
  (ΣA)_e = 0    (ΣA)_p = PO    (ΣA)_{f(τ1,...,τn)} = ∏_{1≤i≤n} δ_{γi} A_{τi}  for each f^(n) ∈ Σ
Theorem 5 (Initial algebra). Let Σ be a signature. Then T_τ(Γ) = { t | Γ ⊢ t : τ } forms an initial Σ-algebra, whose operations are:
  ptr_T(Γ) : PO(Γ) → T_p(Γ);  p↑i ↦ p↑i
  f_T(Γ) : ∏_{1≤i≤n} T_{τi}(γi, Γ) → T_{f(τ1,...,τn)}(Γ);  (t1, . . . , tn) ↦ f(t1, . . . , tn).
Theorem 6 (Structural recursion). The unique homomorphism φ : T → A is
  φ_p(Γ)(p↑i) = ptr_A(Γ)(p↑i)
  φ_{f(τ1,...,τn)}(Γ)(f(t1, . . . , tn)) = f_A(Γ)(φ_{τ1}(γ1, Γ)(t1), . . . , φ_{τn}(γn, Γ)(tn))
Theorem 7 (Structural induction). To prove that P_τ^Γ(t) holds for all t ∈ T_τ(Γ), it suffices to show
(i) P_p^Γ(p↑i) holds for all p↑i ∈ PO(Γ),
(ii) if f^(n) ∈ Σ and P_{τi}^{γi,Γ}(ti) holds for all i = 1, . . . , n, then P_{f(τ1,...,τn)}^Γ(f(t1, . . . , tn)) holds.
Moreover, defining a GADT for a given signature is straightforward, as we have done in Sect. 4 for the signature of binary cyclic sharing trees.
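For instance, for an assumed signature Σ = { f(2), g(1), a(0) } (our example, not taken from the paper; the Shape/Ctx type classes are omitted for brevity), the GADT could be sketched as follows:

```haskell
{-# LANGUAGE GADTs, EmptyDataDecls, KindSignatures #-}
-- Shape constructors mirroring T ∋ τ ::= e | p | f(τ1,τ2) | g(τ) | a,
-- with position values as in Section 4.
data E; data P
data A = StopA
data G a = GDn a | StopG
data F a b = FDnL a | FDnR b | StopF

data TyEmp
data TyCtx t n = Up n | UpStop | UpGD t

-- One GADT constructor per function symbol; the i-th argument lives in the
-- extended context γi, Γ with γ1 = f(e,..,e) and γ(i+1) = f(τ1,..,τi,e,..,e).
data Term :: * -> * -> * where
  Ptr :: n -> Term n P
  An  :: Term n A
  Gn  :: Term (TyCtx (G E) n) s -> Term n (G s)
  Fn  :: Term (TyCtx (F E E) n) s
      -> Term (TyCtx (F s E) n) t -> Term n (F s t)

-- Structural recursion: the number of nodes of a term.
size :: Term n t -> Int
size (Ptr _)  = 1
size An       = 1
size (Gn s)   = 1 + size s
size (Fn s t) = 1 + size s + size t

-- f(g(a), 11↑1): the second child shares the a-node at position 11.
ex :: Term TyEmp (F (G A) P)
ex = Fn (Gn An) (Ptr (UpGD (FDnL (GDn StopA))))
```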
6 Connection to Equational Term Graphs in the Initial Algebra Framework
We have investigated a term syntax for cyclic sharing structures, which gives a representation of a graph. In this section, we give the converse, i.e., an explicit way to calculate the graph for a given cyclic sharing term. This amounts to giving a semantics of a cyclic sharing term as a finite graph. We give it by using Ariola and Klop's equational term graphs in the initial algebra framework. This semantics makes clear the connections to existing work on the semantics of cyclic sharing structures; we will see this in the next section. Equational term graphs [AK96] are another representation of cyclic sharing structures, which has been used in the formalisation of term graph rewriting. It represents a graph by associating a unique name to each node and writing down the interconnections through a set of recursive equations. For example, the graph in Figure 4 is represented by the equational term graph
  {x | x = bin(y1, y2), y1 = bin(z, z), y2 = lf(9), z = bin(x, u), u = lf(6)}.
Fig. 4.

We use this form of equational term graphs, called flattened form in [AK96], formally defined as follows (NB: it is slightly different from the original syntax, to make explicit the connection to cyclic sharing terms). Suppose a signature Σ and a set X = {x, x1, . . .} of variables. An equational term graph is of the form {x | x1 = t1, x2 = t2, . . .} where each ti follows the syntax

t ::= x | p↑i | f(x1, . . . , xn)   for each f(n) ∈ Σ.

A variable is called bound if it appears in the left-hand side of an equation, and free otherwise. We also call the pointers p↑i free variables (and regard them as such). It is assumed [AK96] that any useless equation y = t, where y is not reachable from the root, is automatically removed.

We define a translation from cyclic sharing terms to equational term graphs by the unique homomorphism from the initial algebra to an algebra consisting of equational term graphs. The idea is to use positions as the unique variables in an equational
Initial Algebra Semantics for Cyclic Sharing Structures
139
term graph. We define EGraph^τ(Γ) as the set of all equational term graphs having free variables taken from PO(Γ) (the shape index τ is meaningless for equational term graphs; we keep it only so that EGraph forms a presheaf). Thus EGraph forms a presheaf in (Set^{T*})^T. Any equational term graph can be drawn as a tree-like graph as in Fig. 4, hence for each node we can give its position in the whole equational term graph. So an equational term graph {x1 | x1 = t1, x2 = t2, . . .} can be normalised to an "α-equivalent form" in which, for each equation x = t, the bound variable x is renamed to the position of t in the whole term, as {ε | ε = t1, 1 = t2, . . .} (see the example below). We identify an equational term graph with its α-normal form.

Proposition 1. EGraph forms a Σ-algebra, and the unique homomorphism [[−]] : T → EGraph is monomorphic and gives an interpretation of a cyclic sharing term as a graph represented by an equational term graph.

Proof. We define an algebra structure on EGraph on α-equivalent forms as follows:

f^EGraph_τ(Γ)({ε | G1}, . . . , {ε | Gn}) = {ε | ε = f(1, . . . , n), G′1, . . . , G′n}
    where {i | G′i} = shift_i({ε | Gi}) for i = 1, . . . , n
ptr^EGraph_τ(Γ)(p↑i) = {ε | ε = p↑i}

shift_q({ε | ε = t1, 1 = t2, . . .}) = {q | q = shift_q(t1), q.1 = shift_{q.1}(t2), . . .}
shift_q(x) = q.x   for a variable x
shift_q(f(t1, . . . , tn)) = f(shift_{q.1}(t1), . . . , shift_{q.n}(tn))
shift_q(p↑i) = p_n · · · p_{i+1} p   if q is p_n · · · p_{i+1} · · · p_1 and |q| ≥ i
shift_q(p↑i) = p↑(i − |q|)   if |q| < i

The function shift_q shifts every bound variable in a term by a position q (i.e., appends q as a prefix) in order to form an equational term graph suitably. It is then straightforward that [[−]] is monomorphic and gives a translation from cyclic sharing terms to equational term graphs.

Example 3. Consider the term μx.bin(μy1.bin(μz.bin(↑x, lf(6)), 1↑y1), lf(9)) of Fig. 2. In de Bruijn notation it reads bin(bin(bin(↑3, lf(6)), 1↑1), lf(9)), and it is interpreted as the equational term graph

[[bin(bin(bin(↑3, lf(6)), 1↑1), lf(9))]] = {ε | ε = bin(1, 2), 1 = bin(11, 12), 11 = bin(111, 112), 111 = ε, 112 = lf(6), 12 = 11, 2 = lf(9)}.
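The interpretation [[−]] is directly executable. A Python sketch under a tuple encoding of de Bruijn cyclic sharing terms that is our own (nullary operators carry their label in their name, and the root position ε is represented by the empty string):

```python
def ptr(p, i):          # pointer p↑i: go up i binders, then down along p
    return ("ptr", p, i)

def op(f, *args):       # operator node f(t1, ..., tn)
    return ("op", f, list(args))

def to_graph(term):
    """Equational term graph of `term` as {position: right-hand side}."""
    eqs = {}

    def visit(t, q):                    # q = position of t (string of digits)
        if t[0] == "ptr":
            _, p, i = t
            if len(q) >= i:             # resolve to an absolute position:
                eqs[q] = q[: len(q) - i] + p   # strip i letters, descend by p
            else:                       # remains a free pointer p↑(i - |q|)
                eqs[q] = f"{p}↑{i - len(q)}"
        else:
            _, f, args = t
            kids = [q + str(k + 1) for k in range(len(args))]
            eqs[q] = (f + "(" + ", ".join(kids) + ")") if args else f
            for child, qk in zip(args, kids):
                visit(child, qk)

    visit(term, "")
    return eqs
```

On the de Bruijn term of Example 3 this produces exactly the α-normal equational term graph displayed above (with ε printed as the empty string).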
7 Further Connections to Other Works

The semantics of cyclic sharing terms by equational term graphs opens connections to other semantics as T → EGraph → S, where S is any of the following semantics of equational term graphs.
(i) letrec-expressions: an equational term graph is obviously seen as a letrec-expression.
(ii) Domain-theoretic semantics: mentioned below.
(iii) Categorical semantics in terms of traced symmetric monoidal categories [Has97].
(iv) Coalgebraic semantics: a graph is seen as a coalgebraic structure that produces the information of every node along edges, e.g. [AAMV03].

The domain-theoretic semantics of letrec-expressions, or of systems of recursive equations (e.g. [CKV74]), is by now standard; it gives the infinite expansion of cyclic sharing structures. Via equational term graphs, we can interpret our cyclic sharing terms in each of these semantics. Each semantics has its own advantages and principles for reasoning about particular aspects of cyclic sharing structures. However, none of them has focused on our goals mentioned in the Introduction: [I] a simple term syntax that admits structural induction, and [II] direct usability in functional programming. Hence we have chosen the initial algebra approach to cyclic sharing structures.
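For instance, the domain-theoretic reading can be made concrete by computing finite approximations of the infinite expansion. A small Python sketch (the dictionary encoding of an equational term graph is our own; ⊥ marks the approximation cut-off):

```python
# An equational term graph as a dictionary: each bound variable maps to
# an operator name together with the list of variables it points to.
graph = {"x": ("bin", ["y", "x"]),   # x = bin(y, x): a cyclic node
         "y": ("lf(3)", [])}         # y = lf(3): a leaf

def unfold(g, node, depth):
    """Depth-bounded finite approximation of the infinite expansion."""
    if depth == 0:
        return "⊥"                   # bottom: information cut off here
    f, kids = g[node]
    if not kids:
        return f
    return f + "(" + ", ".join(unfold(g, k, depth - 1) for k in kids) + ")"
```

Successive depths yield the usual increasing chain of approximations, e.g. `unfold(graph, "x", 1)` gives `bin(⊥, ⊥)` and deeper unfoldings refine the ⊥'s.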
8 Conclusion

We have given an initial algebra characterisation of cyclic sharing structures and derived inductive datatypes, structural recursion and structural induction on them. We have also related them to equational term graphs in the initial algebra framework, thereby showing that the various ordinary semantics of cyclic sharing structures apply equally to them. From the programming point of view, the practicality of our datatype of cyclic sharing structures still needs to be investigated. A possible direction of future work is to use a dependently-typed programming language for programming with cyclic sharing structures as an extension of this work.

Acknowledgments. The basis for this work, which was motivated by a question of Zhenjiang Hu, was done while the author visited IPL, University of Tokyo during April 2007 – March 2008. I express my sincere gratitude to Masato Takeichi and the members of IPL for the opportunity to stay in a stimulating and pleasant research environment. I am also grateful to Varmo Vene and Tarmo Uustalu for discussions on datatypes in Haskell and for reading an early draft. This work is supported by the JSPS Grant-in-Aid for Scientific Research (19700006) and an NII collaboration research grant. Finally, special thanks to Mika.
References

[AAMV03] Aczel, P., Adámek, J., Milius, S., Velebil, J.: Infinite trees and completely iterative theories: a coalgebraic view. Theor. Comput. Sci. 300(1-3), 1–45 (2003)
[AK96] Ariola, Z.M., Klop, J.W.: Equational term graph rewriting. Fundam. Inform. 26(3/4), 207–240 (1996)
[Bro05] Brotherston, J.: Cyclic proofs for first-order logic with inductive definitions. In: Beckert, B. (ed.) TABLEAUX 2005. LNCS (LNAI), vol. 3702, pp. 78–92. Springer, Heidelberg (2005)
[BvEG+87] Barendregt, H.P., van Eekelen, M.C.J.D., Glauert, J.R.W., Kennaway, R., Plasmeijer, M.J., Sleep, M.R.: Term graph rewriting. In: de Bakker, J.W., Nijman, A.J., Treleaven, P.C. (eds.) PARLE 1987. LNCS, vol. 259, pp. 141–158. Springer, Heidelberg (1987)
[CFR+91] Cytron, R., Ferrante, J., Rosen, B.K., Wegman, M.N., Zadeck, F.K.: Efficiently computing static single assignment form and the control dependence graph. ACM Trans. Program. Lang. Syst. 13(4), 451–490 (1991)
[CGZ05] Calcagno, C., Gardner, P., Zarfaty, U.: Context logic and tree update. In: Proc. of POPL 2005, pp. 271–282 (2005)
[CKV74] Courcelle, B., Kahn, G., Vuillemin, J.: Algorithmes d'équivalence et de réduction à des expressions minimales dans une classe d'équations récursives simples. In: Loeckx, J. (ed.) ICALP 1974. LNCS, vol. 14, pp. 200–213. Springer, Heidelberg (1974)
[Dyb94] Dybjer, P.: Inductive families. Formal Aspects of Computing 6, 440–465 (1994)
[Fio02] Fiore, M.: Semantic analysis of normalisation by evaluation for typed lambda calculus. In: Proc. of PPDP 2002, pp. 26–37 (2002)
[Fio08] Fiore, M.: Second-order and dependently-sorted abstract syntax. In: Proc. of LICS 2008, pp. 57–68 (2008)
[FPT99] Fiore, M., Plotkin, G., Turi, D.: Abstract syntax and variable binding. In: Proc. of 14th Annual Symposium on Logic in Computer Science, pp. 193–202 (1999)
[GHUV06] Ghani, N., Hamana, M., Uustalu, T., Vene, V.: Representing cyclic structures as nested datatypes. In: Proc. of TFP 2006, pp. 173–188 (2006)
[GJ07] Ghani, N., Johann, P.: Initial algebra semantics is enough! In: Della Rocca, S.R. (ed.) TLCA 2007. LNCS, vol. 4583, pp. 207–222. Springer, Heidelberg (2007)
[GUH06] Ghani, N., Uustalu, T., Hamana, M.: Explicit substitutions and higher-order syntax. Higher-Order and Symbolic Computation 19(2/3), 263–282 (2006)
[Ham04] Hamana, M.: Free Σ-monoids: A higher-order syntax with metavariables. In: Chin, W.-N. (ed.) APLAS 2004. LNCS, vol. 3302, pp. 348–363. Springer, Heidelberg (2004)
[Has97] Hasegawa, M.: Models of Sharing Graphs: A Categorical Semantics of let and letrec. PhD thesis, University of Edinburgh (1997)
[Has02] Hasegawa, R.: Two applications of analytic functors. Theor. Comput. Sci. 272(1-2), 113–175 (2002)
[HJ98] Hermida, C., Jacobs, B.: Structural induction and coinduction in a fibrational setting. Inf. Comput. 145(2), 107–152 (1998)
[JG08] Johann, P., Ghani, N.: Foundations for structured programming with GADTs. In: Proc. of POPL 2008, pp. 297–308 (2008)
[McB02] McBride, C.: Faking it: Simulating dependent types in Haskell. J. Funct. Program. 12(4&5), 375–392 (2002)
[MS03] Miculan, M., Scagnetto, I.: A framework for typed HOAS and semantics. In: Proc. of PPDP 2003, pp. 184–194. ACM Press, New York (2003)
[Nor07] Norell, U.: Towards a practical programming language based on dependent type theory. PhD thesis, Chalmers University of Technology (2007)
[PVWW06] Peyton Jones, S., Vytiniotis, D., Weirich, S., Washburn, G.: Simple unification-based type inference for GADTs. In: Proc. of ICFP 2006, pp. 50–61 (2006)
[Rob02] Robinson, E.: Variations on algebra: monadicity and generalisations of equational theories. Formal Aspects of Computing 13(3-5), 308–326 (2002)
[SP82] Smyth, M.B., Plotkin, G.D.: The category-theoretic solution of recursive domain equations. SIAM J. Comput. 11(4), 763–783 (1982)
[Tar72] Tarjan, R.E.: Depth-first search and linear graph algorithms. SIAM J. Comput. 1(2), 146–160 (1972)
An Operational Account of Call-by-Value Minimal and Classical λ-Calculus in "Natural Deduction" Form

Hugo Herbelin¹ and Stéphane Zimmermann²

¹ INRIA, France
² PPS, University Paris 7, France
Abstract. We give a decomposition of the equational theory of call-by-value λ-calculus into a confluent rewrite system made of three independent subsystems that refines Moggi's computational calculus:
– the purely operational system essentially contains Plotkin's βv rule and is necessary and sufficient for the evaluation of closed terms;
– the structural system contains commutation rules that are necessary and sufficient for the reduction of all "computational" redexes of a term, in a sense that we define;
– the observational system contains rules that have no proper computational content but are necessary to characterize the valid observational equations on finite normal forms.
We extend this analysis to the case of λ-calculus with control and provide the first presentation of Sabry-Felleisen's and Hofmann's equational theories of λ-calculus with control as a confluent rewrite system. Incidentally, we give an alternative definition of standardization in call-by-value λ-calculus that, unlike Plotkin's original definition, prolongs weak head reduction in an unambiguous way.
Introduction

The study of call-by-value evaluation in the context of λ-calculus goes back to Plotkin [1], who introduced and studied the rules βv and ηv and a continuation-passing-style (cps) semantics of a call-by-value λ-calculus named λv. Significant contributions were made first by Moggi [2], then by Sabry and Felleisen [3], who provided axiomatizations of call-by-value λ-calculus shown by the latter authors to be complete with respect to Plotkin's cps semantics. In the same paper, Sabry and Felleisen also give a sound and complete axiomatization of call-by-value λ-calculus with control (hereafter referred to as classical call-by-value λ-calculus). Independently, Hofmann [4] gave an alternative axiomatization of the classical call-by-value λ-calculus.

The theory of call-by-value λ-calculus turns out to be de facto more complex to describe than that of call-by-name λ-calculus. While β-reduction is enough to evaluate terms in call-by-name λ-calculus and η is enough to characterize the observational equality on finite normal forms (what the Böhm separability theorem

P.-L. Curien (Ed.): TLCA 2009, LNCS 5608, pp. 142–156, 2009.
© Springer-Verlag Berlin Heidelberg 2009
expresses; see [5] for a generic account of Böhm's theorem vs. observational completeness), the situation is more intricate in the call-by-value λ-calculus where, facing the two rules of the equational theory of call-by-name λ-calculus, the equational theories of call-by-value λ-calculus of Sabry and Felleisen and of Moggi rely on about a half dozen rules¹. Though, when we observe call-by-name and call-by-value in the context of sequent calculus [6,7], the complexity of the respective theories is about the same. Then, what is so specific to natural deduction (of which the standard λ-calculus is a notation for proofs along the Curry-Howard correspondence) that makes call-by-value more complicated there? Let us first give a more precise look at what happens in sequent calculus.

Call-by-name and call-by-value in sequent calculus

In a previous work of the first author with Curien [6], an interpretation of (a μ-variant of) Gentzen's sequent calculus [8] as a variant of λ-calculus, called the λμμ̃-calculus, was given, where μ refers to Parigot's control operator [9]. In this interpretation, the left/right symmetry of sequent calculus pervades into a perfect syntactic duality, first between terms and evaluation contexts, and second between call-by-name and call-by-value reduction. In particular, the operator μ̃, dual to μ, builds an evaluation context that binds the term to which it is applied, as in let x = [ ] in M, in the same way as Parigot's μ builds a term that binds the evaluation context to which it is applied.
The analysis of λ-calculus through sequent calculus given in [7], whether we consider call-by-name or call-by-value, shows a clear separation between operational rules and observational rules (the operational rules are cut-elimination rules; they divide into logical rules, which contract the interaction of term constructors and evaluation context constructors into more elementary interactions, and two structural rules, one called (μ) for the contraction of μ and one called (μ̃) for the contraction of μ̃, which duplicate or erase information, moving terms around to possibly create new logical interactions). A noticeable peculiarity of sequent calculus is that even in the intuitionistic case the μ operator is required, and when sequent calculus is compared to intuitionistic natural deduction, it turns out that (μ) acts as a commutative cut.

What is the problem with call-by-value λ-calculus in natural deduction syntax?

Call-by-value λ-calculus precisely needs commutative cuts so as to reveal redexes hidden by β-redexes that are not βv-redexes. Take as an example the term ((λx.λy.y)(zt))u with z, t and u as free variables. With respect to λv, this term is a normal form; however, the reduction between λy and u is possible if one

¹ The theory of Moggi has seven rules; the theory of Sabry and Felleisen has six axioms, of which βlift : E[let x = N in M] → let x = N in E[M] is redundant; we will see later on that isolating βlift is however important from an operational point of view.
takes a parallel syntax, like proof-nets, or parallel application to obtain the term (λx.u)(zt). This is due to the presence of free variables, which can hide potential reductions. When using a parallel syntax, one does not rely anymore on syntactic shapes, but on the dependencies of the calculus. In our example, the reduction between λx and (zt) is independent from the reduction between λy and u. This is where the structural rules announced in the abstract come in: they denote implicit uses of the reduction rule for μ. For call-by-name, we have the well-known σ-equivalence [10], which disentangles redexes, but we have to be more careful for call-by-value. Here the rule ((λx.λy.M)N)P →σ (λy.(λx.M)N)P is not valid, because it changes the evaluation order of N and P and would break confluence in the presence of a μ operator. Our structural rules should then preserve the head reduction order to avoid such problems.
1 Call-by-Value λ-Calculus (λCBV-Calculus)
In this section, we show how the equational theory of call-by-value λ-calculus can be decomposed into three independent subsystems of rules. Note that we consider here only left-to-right call-by-value.

1.1 The Splitting of the Usual βv Rule
To be able to use what sequent calculus teaches us, we need a construction corresponding to the μ̃ of λμμ̃. We extend the λ-calculus grammar with a let construction, as follows:

V ::= x | λx.M
T ::= M N | let x = M in N
M, N ::= V | T

The direct counterpart of this splitting is syntactically expressed by the two following rules:

(⇒)    (λx.M)N → let x = N in M
(letv) let x = V in M → M[x ← V]

To understand the intuition behind this, it is necessary to look at how head reduction is performed on an application M N: we first have to check whether M is a value, and then whether N is one, to determine what to reduce. So control is not clear on an application: sometimes the left term has it, and sometimes the right one. Using the let is an elegant way to get rid of this ambiguity. One only needs to look at the left term of an application, and when this term is eventually reduced to a λ-abstraction, it naturally gives the control to the term on the right thanks to the (⇒) rule. Thanks to this decomposition, we now have a clear notion of control and can determine where to work. An evaluation context F is now either an application with a hole on the left, or a hole inside a let:

F ::= [ ]M | let x = [ ] in M
and the mechanism of giving control to the "right" part of the application is now explicit. As we will see later, this helps to solve a standardization problem that arises from this ambiguity.

1.2 Dealing with the Implicit (μ) Rules
Whenever the calculus contains a conditional rule (W) L → R, we call pseudo-redex a term matching L but not satisfying the side condition (W). For call-by-value λ-calculus, the term (λx.x)(yz) is a pseudo-redex, because yz is not a value. In our calculus, the pseudo-redexes are not of the shape (λx.M)T, but rather let x = T in M. Still, they can hide reductions. Take for example the term (let x = zt in λy.y)u, the counterpart of the term ((λx.λy.y)(zt))u. The reduction between λy and u is still hidden, and we need, in a sense, to get the u inside the let. This can be done using the two rules (letlet) and (letapp):

(letlet) let x = (let y = M in N) in P → let y = M in let x = N in P
(letapp) (let x = M in N)P → let x = M in N P

The (letapp) rule deals with a pseudo-redex on the left side of an application: such a pseudo-redex can be moved outside the application without disturbing the evaluation order, recovering the possible interactions that are independent of the substitution. Using this rule on the example above, we get (let x = zt in λy.y)u → let x = zt in (λy.y)u. The subterm zt still has control during the reduction, but we can perform the other reductions as well. The (letlet) rule plays the same role, only for the right part of an application.

1.3 Basic Properties of the Calculus
Our presentation of call-by-value λ-calculus, called the λCBV-calculus, is given in Figure 1. The auxiliary definition of evaluation contexts F (resp. E) is used to find the sub-term locally (resp. globally) in control in a term. The (⇒) and (letv) rules correspond to the (βv) rule, and the (letlet) and (letapp) rules are the two rules managing the implicit (μ) reductions². Regarding notations, we use the convention that parentheses on the left of an application are implicit, meaning that the term M N P stands for ((M N)P). In the term λx.M, we say that the occurrences of x in M are bound, and by free variables we mean variables that are not bound. The set of free variables of a term M is written FV(M). To avoid problems of variable capture, we work modulo α-conversion, i.e. modulo the renaming of bound variables. Therefore, the names of free variables and bound variables in a given term can always be made distinct. By
² These rules actually correspond to an infinite family of rules, since E[yM] is a meta-notation for the term obtained by plugging yM into the hole of E.
Syntax

V ::= x | λx.M
M, N, P ::= V | M N | let x = M in N
F ::= [ ]M | let x = [ ] in M
E ::= [ ] | xE | F[E]

Operational rules

(⇒)    (λx.N)M →r let x = M in N
(letv) let x = V in N →r N[x ← V]

Structural rules

(letlet) let z = (let x = M in N) in P →r let x = M in (let z = N in P)
(letapp) (let x = M in N)P →r let x = M in N P

Observational rules

(η⇒)     λx.(yx) →r y
(ηlet^E) let x = M in E[x] →r E[M]   if x ∉ FV(E)
(letvar) z(let x = M in N) →r let x = M in zN

Fig. 1. The full λCBV-calculus
M[x ← V], we mean the capture-free substitution of every occurrence of the free variable x in M by V. We write → for the compatible closure of →r and →* for the reflexive and transitive closure of →. We write = for the associated equational theory.

Proposition 1 (Confluence). The operational subsystem, its extension with the structural rules, and the full system of reduction rules are all confluent in the λCBV-calculus. Otherwise said, for each of these three systems, if M →* N and M →* N′, then there exists P such that N →* P and N′ →* P.

Proof (Indication). This statement is proved using the Tait–Martin-Löf method, applied to the parallel reduction ⇛ defined by the union (of generalizations) of the above reduction rules and the following congruence:
– t ⇛ t
– if M ⇛ M′ then λx.M ⇛ λx.M′
– if M ⇛ M′ and N ⇛ N′ then M N ⇛ M′N′ and let x = M in N ⇛ let x = M′ in N′

Note that (letvar) is redundant in the equational theory but necessary in the reduction theory to obtain confluence.

We say that a reduction relation → is operationally complete if whenever a closed term M is not a value, it is reducible along →. For instance, the usual β-reduction is operationally complete for call-by-name λ-calculus.
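Operational completeness of the two operational rules can be seen concretely: here is a minimal Python sketch of head reduction of closed λCBV terms using only (⇒) and (letv), under the evaluation contexts F ::= [ ]M | let x = [ ] in M. The tagged-tuple term encoding is our own, and substitution is kept naive, which is safe here because only closed values are ever substituted.

```python
def is_value(t):
    return t[0] in ("var", "lam")

def subst(t, x, v):
    """t[x <- v]; v is closed, so no variable capture can occur."""
    tag = t[0]
    if tag == "var":
        return v if t[1] == x else t
    if tag == "lam":
        return t if t[1] == x else ("lam", t[1], subst(t[2], x, v))
    if tag == "app":
        return ("app", subst(t[1], x, v), subst(t[2], x, v))
    y, m, body = t[1], t[2], t[3]                    # let y = m in body
    return ("let", y, subst(m, x, v),
            body if y == x else subst(body, x, v))

def step(t):
    """One head-reduction step of a closed non-value."""
    tag = t[0]
    if tag == "app":
        m, n = t[1], t[2]
        if m[0] == "lam":
            return ("let", m[1], n, m[2])            # rule (=>)
        return ("app", step(m), n)                   # descend into [ ]N
    if tag == "let":
        x, m, body = t[1], t[2], t[3]
        if is_value(m):
            return subst(body, x, m)                 # rule (letv)
        return ("let", x, step(m), body)             # descend into let x = [ ] in N
    raise ValueError("stuck or already a value")

def evaluate(t):
    while not is_value(t):
        t = step(t)
    return t
```

Note that control is unambiguous: on an application, `step` only ever inspects the left subterm, exactly as the text describes for λCBV.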
Proposition 2. λCBV equipped with the rules (⇒) and (letv) is operationally complete.

Proof. Any closed term which is not a value has either the form E[let x = λy.M in N], in which case (letv) is applicable, or E[(λx.M)N], in which case (⇒) is applicable.

Remark. For simplicity of the definition of λCBV, we did not consider a minimal set of operational rules. A minimal operationally complete set can be obtained by restricting both M in (⇒) and V in (letv) to be of the form λy.P. However, this would force us to explicitly add an observational rule let x = y in N →r N[x ← y] and a structural rule (λx.N)(E[yM]) →r let x = E[yM] in N (where the notions of pseudo-redex and erasing rule, see below, are extended to the case of β-redexes).

Associated to reduction rules with constraints, we can define erasing rules used to erase the pseudo-redex part of the rule. Namely, we will use the system composed of the following rule:

let x = N in M →er N

A term M is normal for a reduction → if it is not reducible for →. It is structurally normal if, for all N such that M →er* N, N is normal for the reduction generated by the operational rules. A reduction relation → is structurally complete if all terms normal for → are structurally normal.

Proposition 3. The set of terms that are normal with respect to the operational and structural rules is described by the entry Q of the following grammar:

Q ::= λx.Q | S | let x = S in Q
S ::= x | SQ

Moreover, the reduction generated by the operational and structural rules of the λCBV-calculus is structurally complete.

Proof (Indication). The two parts are proved quite directly by induction. It is interesting to see that in a normal form like let x = P in Q, the leftmost subterm of P is always a free variable and the evaluation of P is blocked by this variable.
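Concretely, the structural rules act as local rewrites that expose redexes hidden behind pseudo-redexes. A Python sketch (with a tagged-tuple term encoding of our own, self-contained here):

```python
# (letapp): (let x = M in N) P  ->  let x = M in N P
def let_app(t):
    if t[0] == "app" and t[1][0] == "let":
        _, x, m, n = t[1]
        return ("let", x, m, ("app", n, t[2]))
    return None                      # rule does not apply

# (letlet): let z = (let x = M in N) in P  ->  let x = M in (let z = N in P)
def let_let(t):
    if t[0] == "let" and t[2][0] == "let":
        _, z, inner, p = t
        _, x, m, n = inner
        return ("let", x, m, ("let", z, n, p))
    return None
```

On the running example (let x = zt in λy.y)u, `let_app` yields let x = zt in (λy.y)u, where the hidden redex between λy and u is now reachable by the operational rules.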
This suggests an alternative definition of structural completeness based on the notion of evaluation of open terms: a set of rules is structurally complete iff any term which is neither a value nor of the form xM1...Mn or let y = xM1...Mn in N is reducible.

Let us now focus on the observational rules of Figure 1. Given a confluent reduction →, we say that an equation M = N between normal terms belongs to the observational closure of → if for every closed evaluation context E and every substitution ρ of the free variables of M and N by closed values, E[ρ(M)] converges along →* iff E[ρ(N)] converges.

Proposition 4. The observational rules from Figure 1 belong to the observational closure of the set of operational and structural rules of the λCBV-calculus.

Proof. When instantiated by closed values, both sides of (η⇒) and (letvar) converge, by respectively using (⇒) and (letv), and (⇒) and (letlet). For (ηlet^E), the result
follows by standardization (see below) since at some point, for ρ a substitution of the free variables of E[yM], the reduction of ρ(E[yM]) eventually yields a value and (letv) is applicable.

1.4 The Subformula Property and Structural Completeness
The λCBV-calculus can be typed by natural deduction just like the conventional (call-by-name) λ-calculus. The only new construction is the let, which can be typed by the cut rule. The whole typing system is the following:

Γ, x : A ⊢ x : A

Γ, x : A ⊢ M : B
────────────────────
Γ ⊢ λx.M : A ⇒ B

Γ ⊢ M : A ⇒ B    Γ ⊢ N : A
───────────────────────────
Γ ⊢ M N : B

Γ ⊢ N : A    Γ, x : A ⊢ M : B
─────────────────────────────
Γ ⊢ let x = N in M : B

Contrary to the call-by-name λ-calculus, the original call-by-value λ-calculus does not satisfy the subformula property for its normal forms. We can distinguish two kinds of "breaking" of this property. The first one happens with the pseudo-redexes. The term (λx.x)(yz) is a normal form, which can be typed this way, with Γ = y : B ⇒ A, z : B:

Γ ⊢ λx.x : A ⇒ A    Γ ⊢ yz : A
──────────────────────────────
Γ ⊢ (λx.x)(yz) : A

The problem here comes from the type A ⇒ A that appears after the cut between λx.x and yz. If we remember that a cut, in natural deduction, is an introduction rule followed by an elimination rule, then neither of the two rules, taken alone, is problematic. Think of normal forms like yM, corresponding to the elimination of implication, and λx.M, corresponding to the introduction of implication. It is really the cut that causes problems. In λCBV, because the pseudo-redexes correspond to sequent calculus cuts, we can get rid of this problem. The A ⇒ A formula becomes A ⊢ A, and our term becomes (λx.x)(yz) → let x = yz in x, whose proof has the subformula property.

However, this is not enough to get rid of all our problems. With Γ = u : A, z : C ⇒ D, t : C, the term (let x = zt in λy.y)u can be typed this way:

Γ ⊢ let x = zt in λy.y : A ⇒ B    Γ ⊢ u : A
───────────────────────────────────────────
Γ ⊢ (let x = zt in λy.y)u : B

Again, an A ⇒ B type appears and breaks the subformula property. This time, it is not a problem of a pseudo-redex, but rather one of a hidden redex. To solve it, we need the structural (letapp) rule to reduce the interaction between y and u; in a more general manner, our terms have to be structurally complete, otherwise hidden redexes will create unnecessary arrow types.
Pushing pseudo-redexes to more atomic let terms and having structural rules is sufficient to regain the subformula property for λCBV, as expressed by the following proposition.

Proposition 5. Let M be a term of the λCBV-calculus that is normal with respect to the operational and structural rules. Then if Γ ⊢ M : A, its proof satisfies the subformula property.

Proof. We reason by induction on the proof of Γ ⊢ M : A. The only technical point is to know that in a normal form, the leftmost term of an application is a variable which is in the context.

1.5 Standardization
Plotkin's definition of standard reduction sequences in λv does not characterize canonical standard reduction sequences: the standard reduction of a term is not always unique. We first recall the definition of head reduction →h and of standard reduction sequences (s.r.s.) in Plotkin [1]:

– (λx.M)V →h M[x ← V]
– if M →h M′ then M N →h M′N
– if M →h M′ then V M →h V M′

– if M1 →h M2 and M2, . . . , Nn is a s.r.s., then M1, M2, . . . , Nn is a s.r.s.
– if M1, . . . , Mn and N1, . . . , Np are s.r.s., then M1N1, . . . , MnN1, . . . , MnNp is a s.r.s.

Now, if we take M and N such that M →h M′ and N →h N′, then the two following reductions are standard:

(λx.M)N →s (λx.M)N′ →s (λx.M′)N′
(λx.M)N →s (λx.M′)N →s (λx.M′)N′

The first one is built as an extension of head reduction, while the second one is built using only the application rule of standard reduction. The explanation is that in λv there is no direct way to determine what is in control in an application, so there is no possibility to have a unique general rule for application. In λCBV, since the (βv) rule is split, the control is unambiguous, so we can get rid of this problem. Weak-head reduction for λCBV and an alternative, non-ambiguous definition of standard reduction sequences are given below.

– M →h M′ for any rule M → M′ of the chosen subset³ of rules of λCBV

³ If (letv) is present in the subset, we assume that M in the rules (letapp), (letlet), (letvar) and (ηlet^E) is restricted to the form E[yM′] so that head reduction favors (letv). If (ηlet^E) is present, we assume that N in the rules (letapp), (letlet) and (letvar) is not of the form E[x] so that head reduction favors (ηlet^E).
– if M →h M′ then F[M] →h F[M′]

– any variable is a s.r.s.
– if M, . . . , E1[V1]⁴ is a head reduction sequence, and V1, . . . , Vn and E1, . . . , Ep are s.r.s., then M, . . . , E1[V1], . . . , E1[Vn], . . . , Ep[Vn] is a s.r.s.
– if M1, . . . , Mn is a s.r.s., then λx.M1, . . . , λx.Mn is a s.r.s.
– [ ] is a s.r.s. on contexts
– if M1, . . . , Mn and E1, . . . , Ep are s.r.s. on terms and on contexts, then the sequences E1[let x = [ ] in M1], . . . , E1[let x = [ ] in Mn], . . . , Ep[let x = [ ] in Mn] and E1[[ ]M1], . . . , E1[[ ]Mn], . . . , Ep[[ ]Mn] are s.r.s. on contexts
Note that the definition applies to λv too, by using only (βv) in the definition of head reduction.

Theorem 1 (Standardization). For all M and N, if M →∗ N using any set of reduction rules, then there is a unique standard reduction sequence M, . . . , N. We write M →s N for this reduction, which extends head reduction.

Proof (Indication). By induction, permuting non-standard reductions. Uniqueness follows from the canonicity of the definition of a standard reduction sequence.

As announced, we in fact have a stronger result: if M is a closed term, its head reduction (hence its standard reduction) uses only the two rules (⇒) and (letv); the other rules are not necessary. To prove this, it suffices to notice that if M is a closed term, then M cannot be decomposed as E[yP], and that if M = E[yP], then M never reduces to a value.⁴

We could have given a somewhat "lighter" version of λCBV with rule (letapp) (or (letlet)) relaxed, as in:

(letapp)  (let x = M in N)P → let x = M in N P

But a critical pair would then have arisen on the term (let x = V in N)P, between the rules (letapp) and (letv). This does not break confluence, but head reduction of closed terms would then have used the (letapp) rule, which is not necessary, and we would have lost some structure in the calculus. To avoid this, we give priority to the (letv) rule over the (letapp) rule by restricting the latter to terms that never reduce to values. We can now state our proposition:

Proposition 6. If M is a closed term such that M →∗ V, then there exists V′ such that M →s V′ and V′ = V, with →s using only the rules (⇒) and (letv).

Proof (Indication). As a consequence of standardization, we proceed by induction on the standard reduction of M.

This result is not surprising if one recalls that the (⇒) and (letv) rules together correspond to the (βv) rule, and that (βv) alone suffices to deal with closed terms. We thus have a calculus in which every rule has its own, clearly defined purpose.

⁴ Remember that E[ ] can be empty, so that M reduces to a value.
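The operational fragment of Proposition 6 can be sketched directly. Below is a minimal Python model (the tuple-based term encoding and helper names are ours, not the paper's) of head reduction for closed λCBV terms using only the rules (⇒) and (letv), descending through the head contexts [ ]M and let x = [ ] in N.

```python
# Sketch: head reduction of closed lambda-CBV terms with only (=>) and (let_v).
# Terms: ("var", x) | ("lam", x, body) | ("app", f, a) | ("let", x, m, n).

def is_value(t):
    return t[0] in ("var", "lam")

def subst(t, x, v):
    """Substitution; capture-avoiding enough here because v is closed."""
    tag = t[0]
    if tag == "var":
        return v if t[1] == x else t
    if tag == "lam":
        return t if t[1] == x else ("lam", t[1], subst(t[2], x, v))
    if tag == "app":
        return ("app", subst(t[1], x, v), subst(t[2], x, v))
    # let: the binder may shadow x
    body = t[3] if t[1] == x else subst(t[3], x, v)
    return ("let", t[1], subst(t[2], x, v), body)

def head_step(t):
    if t[0] == "app":
        f, a = t[1], t[2]
        if f[0] == "lam":                    # (=>): (lam x. N) M -> let x = M in N
            return ("let", f[1], a, f[2])
        return ("app", head_step(f), a)      # descend through context [ ] M
    if t[0] == "let":
        x, m, n = t[1], t[2], t[3]
        if is_value(m):                      # (let_v): let x = V in N -> N[x <- V]
            return subst(n, x, m)
        return ("let", x, head_step(m), n)   # descend through let x = [ ] in N
    raise ValueError("stuck: open or ill-formed term")

def evaluate(t):
    while not is_value(t):
        t = head_step(t)
    return t
```

For example, (λx.x)((λx.x)(λx.x)) evaluates to λx.x without ever invoking a structural or observational rule, as Proposition 6 predicts.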
Call-by-Value Minimal and Classical λ-Calculus

(βv)       (λx.M)V                        → M[x ← V]
(letv)     let x = V in M                 → M[x ← V]
(let.1)    T M                            → let x = T in xM    with x fresh
(let.2)    V T                            → let x = T in V x   with x fresh
(let.let)  let y = (let x = M in N) in P  → let x = M in let y = N in P
(η⇒)       λx.V x                         → V                  if x is not free in V
(letid)    let x = M in x                 → M

W ::= x | λx.Q
Q ::= let x = yW in Q | xW | W

Fig. 2. Moggi's λC-calculus and its normal forms
1.6 Comparison with Moggi's λc-Calculus
In [2], Moggi gives a call-by-value calculus on the same syntax as λCBV. The reduction rules, and the grammar characterizing normal forms with respect to the (βv), (let.1), (let.2) and (let.let) rules, are given in Figure 2 (T denotes terms that are not values). This calculus was designed to allow more reduction than the original call-by-value calculus while remaining call-by-value. We say that two operational theories are in equational correspondence (see Sabry and Felleisen for the original notion [3]) if there is a bijection between the equivalence classes of the theories. In particular, we can show that our theory of λCBV is in equational correspondence with Moggi's.

Proposition 7. The two calculi λc and λCBV are in equational correspondence.

Proof (Indication). We only show here the interesting cases of the simulation of the rules:

letapp = let.1 + let.let + (let.1)⁻¹
letvar = let.2 + let.let + (let.2)⁻¹
let.1  = (ηlet)⁻¹, used in the context let x = [ ] in N

For let.2 we proceed case by case: if V is y, then let.2 = (ηlet)⁻¹, used in the context y[ ]; if V = λy.M, then let.2 = (η⇒) + (⇒)⁻¹.

Our claim is now that λCBV has a finer structure than λc. First, observe that in Moggi's calculus, for the same reason as in λCBV, the rules (βv) and (letv) are sufficient for operational completeness.

Proposition 8. The λc-calculus restricted to the rules (βv) and (letv) is operationally complete. The λc-calculus equipped with (βv), (letv), (let.1), (let.2) and (let.let) is structurally complete.

A weird point is the presence of an observational rule, namely (η⇒), in the simulation of the (let.1) and (let.2) rules, giving an ambiguous status to these two rules: they are used for computation but contain a bit of observation too. This
H. Herbelin and S. Zimmermann
can be related to another, not so likable behavior of λc. Recall that the calculus has the (βv) rule, which is sufficient to reduce closed terms to values; yet if we reduce the term (λx.x)((λx.x)(λx.x)) with head reduction, we obtain the following reduction sequence:

(λx.x)((λx.x)(λx.x)) → let y = (λx.x)(λx.x) in (λx.x)y →∗ λx.x.

Head reduction uses the rule (let.2) as its first reduction step, and for the term ((λx.x)(λxy.y))(λx.x)(λx.x) it uses the rules (let.1) and (let.let). So even though we do not need them, these rules appear, and the λc-calculus tends to perform useless expansions. This was already noticed by Sabry and Wadler in [11], where they showed that λc is isomorphic to one of its sub-calculi in which all the let rules have been reduced and applications occur only between two variables. The let construction is here a way to make the head-reduction flow explicit: to reduce a term, λc first encodes its evaluation flow with some let manipulations, and only then reduces the term. By doing this, we lose almost completely a "hidden agent" of the reduction: the structural congruence. In λv, the (βv) rule was sufficient because it could go inside a term to find a redex, whereas here we almost never go inside the term: the evaluation flow is encoded so that the topmost term always contains the redex, superseding the structural congruence. So the status of the let rules is not clearly defined: they are used for reducing closed terms to values, but they are also necessary for structural completeness and have a taste of observational rules. Interestingly enough, we can switch between the normal forms of λC and of λCBV by using (η⇒ᴱ): oriented from left to right it takes us from λC to λCBV, and conversely with the right-to-left orientation.
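The first simulation equation of Proposition 7 can be replayed mechanically. The sketch below (term encoding and helper names are ours, not the paper's) applies let.1, then let.let, then an inverse let.1 step to a symbolic term (let x = M in N)P, recovering exactly the result of letapp.

```python
# Sketch: the simulation let_app = let.1 + let.let + (let.1)^-1 on tuple terms
# ("var", x) | ("lam", x, b) | ("app", f, a) | ("let", x, m, n). Illustrative only.

def let1(t, fresh):
    # (let.1): T M -> let x = T in x M, for T not a value
    assert t[0] == "app" and t[1][0] not in ("var", "lam")
    return ("let", fresh, t[1], ("app", ("var", fresh), t[2]))

def let1_inv(t):
    # inverse of (let.1): let x = T in x M -> T M
    x, m, body = t[1], t[2], t[3]
    assert body == ("app", ("var", x), body[2])   # body must have shape x M
    return ("app", m, body[2])

def letlet(t):
    # (let.let): let y = (let x = M in N) in P -> let x = M in let y = N in P
    y, inner, p = t[1], t[2], t[3]
    assert inner[0] == "let"
    x, m, n = inner[1], inner[2], inner[3]
    return ("let", x, m, ("let", y, n, p))
```

Starting from ("app", ("let", "x", M, N), P), applying let1, letlet, and then let1_inv on the inner let yields ("let", "x", M, ("app", N, P)), i.e. let x = M in N P.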
2 Call-by-Value λμtp-Calculus (λμtpCBV-Calculus)

2.1 Confluence and Standardization
Our calculus is not hard to extend with continuation variables and control operators. We follow the approach of [12] and adopt Parigot's μ operator [9] together with a toplevel continuation constant tp, which simplifies the reasoning on closed computations and the connection with the λ-calculi based on callcc and A or on Felleisen's C operator. We write C[α ← [β]E] for the generalization of Parigot's structural substitution, and we abbreviate C[α ← [β]([ ])] as C[α ← β]. Since the reductions associated with μ absorb their context, structural rules should not change which subterm is in control; happily, structural rules are only of the form E[T] → E′[T], so they cannot break confluence. We must be careful with μ-reductions, however: even though μ itself has no reduction restrictions, a let can still hide μ-redexes. Therefore we need a new structural rule. The full calculus is given in Figure 3.

Proposition 9. The operational subsystem, its extension with structural rules, and the full system of reduction rules are all confluent in the λμtpCBV-calculus.
Syntax

V        ::= x | λx.M
M, N, P  ::= V | M N | let x = M in N | μα.C
C        ::= [q]M
q        ::= α | tp
F        ::= [ ]M | let x = [ ] in M
E        ::= [ ] | xE | F[E]

Operational rules

(⇒)       (λx.N)M          →r  let x = M in N
(letv)    let x = V in N   →r  N[x ← V]
(μv)      F[μα.C]          →r  μα.C[α ← [α]F]
(μbase)   [q]μα.C          →r  C[α ← q]

Structural rules

(letlet)  let z = (let x = M in N) in P  →r  let x = M in (let z = N in P)
(letapp)  (let x = M in N)P              →r  let x = M in N P
(letμ)    let x = M in μα.[β]N           →r  μα.[β](let x = M in N)

Observational rules

(η⇒ᴱ)     λx.yx                →r  y
(ηlet)    let x = M in E[x]    →r  E[M]    if x ∉ FV(E)
(ημ)      μα.[α]M              →r  M       if α not free in M
(letvar)  z(let x = M in N)    →r  let x = M in zN
(μvar)    z(μα.C)              →r  μα.C[α ← [α](z[ ])]

Fig. 3. The full λμtpCBV-calculus
Proof. The proof is the same as before, using a generalization of (μv) and (μbase) to contexts of the form E or [q]E, with the parallel reduction extended with the two following rules:
– if M ⇛ M′ then [α]M ⇛ [α]M′
– if C ⇛ C′ then μα.C ⇛ μα.C′
Note again the redundancy of (letvar) and (μvar), which are here for confluence. As head reduction can be extended with the new rules, we obtain a notion of standardization for this calculus too.
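Structural substitution is the workhorse of the μ rules. The sketch below (our own tuple encoding, not the paper's) implements C[α ← [α]F] for the single applicative context F = [ ]N, and uses it to realize the (μv) step (μα.C)N → μα.C[α ← [α]([ ]N)].

```python
# Sketch of Parigot-style structural substitution and (mu_v) for F = [ ] N.
# Terms: ("var", x) | ("lam", x, b) | ("app", f, a) | ("mu", alpha, C)
# Commands ("named" terms [q]M): ("named", q, M).

def struct_subst(t, alpha, wrap):
    """Replace every named subterm [alpha]N in t by wrap(N)."""
    tag = t[0]
    if tag == "named":
        q, m = t[1], t[2]
        m = struct_subst(m, alpha, wrap)
        return wrap(m) if q == alpha else ("named", q, m)
    if tag == "mu":
        # simplification: assume continuation variables are pairwise distinct
        return ("mu", t[1], struct_subst(t[2], alpha, wrap))
    if tag == "app":
        return ("app", struct_subst(t[1], alpha, wrap),
                       struct_subst(t[2], alpha, wrap))
    if tag == "lam":
        return ("lam", t[1], struct_subst(t[2], alpha, wrap))
    return t  # variables

def mu_v(t):
    # (mu_v) with F = [ ] N: (mu a. C) N -> mu a. C[a <- [a]([ ] N)]
    assert t[0] == "app" and t[1][0] == "mu"
    alpha, c, arg = t[1][1], t[1][2], t[2]
    wrap = lambda m: ("named", alpha, ("app", m, arg))
    return ("mu", alpha, struct_subst(c, alpha, wrap))
```

For instance, (μα.[α](λx.x))t steps to μα.[α]((λx.x)t), while subterms named by a different continuation variable are left untouched.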
The rule (letμ) is here to deal with hidden μ-redexes obfuscated by a let. For example, the term (let x = yz in μα.[α]λx.x)t first needs to be reduced to (μα.[α]let x = yz in λx.x)t; only then can the reduction between μα and [ ]t occur. It is also interesting to see that, being in call-by-value, we immediately dodge any problem with David and Py's critical pair [13]: the term λx.(μα.c)x only reduces to λx.μα.c[α ← [α]([ ]x)], because of the restriction of η to values. We say that M evaluates to V in λμtpCBV if [tp]M reduces to [tp]V, and we say that a reduction relation → is operationally complete if, whenever a closed M is not a value, [tp]M is reducible along →.

2.2 Normal Forms
Not only do we keep confluence, telling us that the extension is reasonable, but we also keep all the other good properties of the intuitionistic calculus. The grammar generating normal forms with respect to the operational and structural rules, with T as an entry point, and the typing rules associated with the new μ construction are given in Figure 4 (T is a global parameter of the type system corresponding to the type of tp, and all rules are generalized with a context Δ on the right). Properties of the normal forms are preserved, as the proposition below says.

S ::= x | ST
Q ::= λx.T | S | let x = S in Q
T ::= Q | μα.[β]Q

Γ ⊢ u : N | β : M, α : N, Δ
---------------------------
Γ ⊢ μβ.[α]u : M | α : N, Δ

Γ ⊢ u : T | β : M, Δ
--------------------
Γ ⊢ μβ.[tp]u : M | Δ

Fig. 4. Normal forms and new typing rules for the λμtpCBV-calculus
Proposition 10. The λμtpCBV-calculus equipped with its set of operational rules is operationally complete, and the λμtpCBV-calculus equipped with its sets of operational and structural rules is structurally complete. Moreover, if T is a term in normal form relative to the operational and structural rules of the λμtpCBV-calculus and Γ ⊢ T : A | Δ, then its proof satisfies the subformula property.

Proof (Indication). The first point is direct, by the same proof as in the intuitionistic case. The second point is proved by induction; most cases are unchanged.

2.3 Equational Theory
In [3], Sabry and Felleisen devised an equational theory of call-by-value λ-calculus with control operators that is sound and complete with respect to call-by-value continuation-passing-style (cps) semantics. We recall below only the part of the equations concerning the control operators, because the other part is verified by Moggi's calculus and hence by λCBV.
(Ccurrent)  callcc(λk.kM) = callcc(λk.M)
(Celim)     callcc(λd.M) = M                                    if d ∉ FV(M)
(Clift)     E[callcc M] = callcc(λk.E[M(λf.(k E[f]))])          if k, f ∉ FV(E, M)
(Cabort)    callcc(λk.C[E[kM]]) = callcc(λk.C[kM])              for C a term with a hole not binding k
(Ctail)     callcc(λk.((λz.M)N)) = (λz.callcc(λk.M))N           if k ∉ FV(N)
(Abort)     E[AM] = AM
Because the language of these equations is not the same as ours, we need some translations: callcc and A are encoded by the terms λx.μα.[α](x(λy.μδ.[α]y)) and λx.μδ.[tp]x, while, conversely, we encode μα.[β]M by callcc(λkα.A(kβ M)), μα.[tp]M by callcc(λkα.A(M)), and let x = N in M by (λx.M)N. By unfolding the definitions and doing basic calculations, we obtain the equational correspondence with the equations of Sabry and Felleisen; but instead of only having equations, we now have an oriented reduction system.

Proposition 11. There is an equational correspondence, at the level of closed expressions, between λμtpCBV and Sabry and Felleisen's axiomatization of λ-calculus with callcc and A.

Moreover, since we can equip λμtpCBV with a cps-semantics that matches the one considered by Sabry and Felleisen as soon as μα.c is interpreted by λkα.c, [α]M by M kα, and [tp]M by M (λx.x), we can transfer Sabry and Felleisen's completeness result to the full theory of λμtpCBV.

Corollary 1. The full calculus λμtpCBV is sound and complete with respect to βη along its cps-semantics.

Since tp has a passive role in the reduction system of λμtpCBV, the confluence of the different subsystems and the structural and cps completeness results also hold for λμCBV, which is λμtpCBV without tp.
Summary

We studied the equational theory of call-by-value λ-calculus and λμ-calculus from a reduction point of view, and provided what seems to be the first confluent rewrite system for λμ-calculus with control that is complete with respect to the continuation-passing-style semantics of call-by-value λ-calculus. The rewrite system we designed is made of three independent blocks that respectively address operational completeness (the ability to evaluate closed terms), structural completeness (the ability to contract hidden redexes in open terms), and purely observational properties. The notion of structural completeness is related to the ability to enforce the subformula property in simply-typed λ-calculus, but we failed to find a purely computational notion that universally captures the subformula property. For instance, any call-by-value reduction system that does not use a let and does
not smash abstraction and application in redexes of the form (λx.M)(yz) cannot be accompanied by a typing system whose normal-form type derivations satisfy the subformula property. It would have been desirable to characterize the block of observational rules as a block of rules providing observational completeness, in a way similar to the call-by-name control-free case, where β provides operational completeness and η provides observational completeness. Obtaining such a result would however require a Böhm-style separability result for call-by-value λ-calculus and λμ-calculus, which goes beyond the scope of this study. Regarding the intuitionistic call-by-value λ-calculus, we slightly refined Moggi's and Sabry and Felleisen's rewrite systems by precisely identifying which rules pertain to the operational block, the structural block, or the observational block. Finally, we proposed a new definition of standard reduction sequences for call-by-value λ-calculus that ensures the uniqueness of standard reduction paths between two terms (when such a path exists).
References

1. Plotkin, G.D.: Call-by-name, call-by-value and the lambda-calculus. Theor. Comput. Sci. 1, 125–159 (1975)
2. Moggi, E.: Computational lambda-calculus and monads. Technical Report ECS-LFCS-88-66, Edinburgh Univ. (1988)
3. Sabry, A., Felleisen, M.: Reasoning about programs in continuation-passing style. Lisp and Symbolic Computation 6(3–4), 289–360 (1993)
4. Hofmann, M.: Sound and complete axiomatisations of call-by-value control operators. Mathematical Structures in Computer Science 5(4), 461–482 (1995)
5. Dezani-Ciancaglini, M., Giovannetti, E.: From Böhm's theorem to observational equivalences: an informal account. Electr. Notes Theor. Comput. Sci. 50(2) (2001)
6. Curien, P.L., Herbelin, H.: The duality of computation. In: Proceedings of ICFP 2000. SIGPLAN Notices, vol. 35(9), pp. 233–243. ACM, New York (2000)
7. Herbelin, H.: C'est maintenant qu'on calcule: au cœur de la dualité. Habilitation thesis, University Paris 11 (December 2005)
8. Gentzen, G.: Untersuchungen über das logische Schließen. Mathematische Zeitschrift 39, 176–210, 405–431 (1935); English translation in: Szabo, M.E. (ed.) The Collected Works of Gerhard Gentzen, pp. 68–131
9. Parigot, M.: Lambda-mu-calculus: An algorithmic interpretation of classical natural deduction. In: Voronkov, A. (ed.) LPAR 1992. LNCS, vol. 624, pp. 190–201. Springer, Heidelberg (1992)
10. Regnier, L.: Une équivalence sur les lambda-termes. Theor. Comput. Sci. 126(2), 281–292 (1994)
11. Sabry, A., Wadler, P.: A reflection on call-by-value. ACM Trans. Program. Lang. Syst. 19(6), 916–941 (1997)
12. Ariola, Z.M., Herbelin, H.: Control reduction theories: the benefit of structural substitution. Journal of Functional Programming 18(3), 373–419 (2008); with a historical note by Matthias Felleisen
13. David, R., Py, W.: Lambda-mu-calculus and Böhm's theorem. J. Symb. Log. 66(1), 407–413 (2001)
Refinement Types as Proof Irrelevance

William Lovas and Frank Pfenning

Carnegie Mellon University, Pittsburgh, PA 15213, USA
[email protected], [email protected]
Abstract. Refinement types sharpen systems of simple and dependent types by offering expressive means to more precisely classify well-typed terms. Proof irrelevance provides a mechanism for selectively hiding the identities of terms in type theories. In this paper, we show that refinement types can be interpreted as predicates using proof irrelevance in the context of the logical framework LF, establishing a uniform relationship between two previously studied concepts in type theory. The interpretation and its correctness proof are surprisingly complex, lending credence to the idea that refinement types are a fundamental construct rather than just a convenient surface syntax for certain uses of proof irrelevance.
1 Introduction
Refinement type systems seek to extend type theories with more expressive means of classifying terms. Refinements typically take the form of an added layer of sorts above the usual layer of types: types express very basic well-formedness criteria while sorts specify precise properties of terms using technology like subsorting and intersection sorts. Refinement types have been profitably employed in functional languages like ML [7, 4], and they are a topic of much recent and ongoing research [6, 5, 17]. In recent work [10], we developed a system of refinement types for the logical framework LF [8]. An essential guiding principle was to restrict attention to canonical forms using bidirectional typing [16]. Under the canonical forms methodology, features which typically complicate a type system's metatheory could be expressed cleanly and simply. For example, treating intersection introduction as a checking rule and intersection elimination as a synthesis rule avoided any issues relating to intersection type inference, and restricting typing to canonical forms led to subtyping needing to be defined only at base type. A simple example of refinement types in LF is the natural numbers with refinements standing for even and odd numbers:

nat : type.
z : nat.
s : nat → nat.

even < nat.
odd < nat.
z :: even.
s :: even → odd ∧ odd → even.

P.-L. Curien (Ed.): TLCA 2009, LNCS 5608, pp. 157–171, 2009.
© Springer-Verlag Berlin Heidelberg 2009
In the above, even < nat declares even as a refinement of the type nat, and the declarations using "::" give more precise sorts for the constructors z and s. Note that since the successor function satisfies two unrelated properties, we give two refinements for it using an intersection sort.

In this paper, we exhibit an interpretation of LF refinement types which we refer to as the "subset interpretation", since a sort refining a type is interpreted as a predicate embodying the refinement, and the set of terms having that sort is simply the subset of terms of the refined type that also satisfy the predicate. For example, under the subset interpretation, we translate the refinements even and odd to predicates on natural numbers, or one-place judgments, following the LF judgments-as-types principle [8]. The refinement declarations for z and s turn into constructors for proofs of these predicates.

even : nat → type.
odd : nat → type.
z : even z.
s1 : Πx:nat. even x → odd (s x).
s2 : Πx:nat. odd x → even (s x).

The successor function's two unrelated sorts translate to proof constructors for two different predicates. We show that our interpretation is correct by proving, for instance, that a term N has sort S if and only if its translation ˜N has type ˜S(N), where ˜S(−) is the translation of the sort S into a type family representing a predicate; thus, an adequate encoding using refinement types remains adequate after translation. The chief complication in proving correctness is the dependency of types on terms, which forces us to deal with a coherence problem [2, 13]. Normally, subset interpretations are not subject to the issue of coherence (that is, of ensuring that the interpretation of a judgment is independent of its derivation), since the terms in the target of the translation are the same as the terms in the source, just with the stipulation that a certain property hold of them.
The proofs of these properties are computationally immaterial, so they may simply be ignored at runtime. But the presence of full dependent types in LF means that the interpretation of a sort might depend on these proofs, and in turn the derivability of certain typing judgments in the target might depend on the identities of these proofs. Enter proof irrelevance: our primary tool for coping with coherence. Proof irrelevance is a technique used in type theories to selectively hide the identities of certain terms representing proofs of propositions [11,1]. One typical use of proof irrelevance is to render the typechecking of subset types [3, 14] decidable. A subset type {x:A | B(x)} represents the set of terms of type A which also satisfy B; typechecking is undecidable because to determine if a term M has this type, you must search for a proof of B(M ). One might attempt to recover decidability by using a dependent sum Σx:A. B(x), representing the set of terms M of type A paired with proofs of B(M ); typechecking is decidable, since a proof of B(M ) is provided, but equality of terms is overly fine-grained: if there are two proofs of B(M ), the two pairs will be considered unequal. Using proof irrelevance, one can find a middle ground with the type Σx:A. [B(x)],
where [−] represents the proof irrelevance modality. Type checking is decidable for such terms, since a proof of the property B is always given, but the identity of that proof is ignored, so all pairs with the same first component will be considered equal. Our situation with the subset interpretation is similar: we would like to represent proofs of sort-checking judgments without depending on the identities of those proofs. By carefully using proof irrelevance to hide the identities of sort-checking proofs, we are able to show our translation sound and complete, preserving the adequacy of representations. We begin the remainder of the paper by extending our example above to demonstrate the coherence issues that arise in the subset interpretation (Section 2). After that, we review the formal treatment of refinement types (Section 3) and proof irrelevance (Section 4) in the context of the logical framework LF, and then we discuss our translation and its correctness criteria in detail (Section 5). We conclude by highlighting some broader implications (Section 6).
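The even/odd translation from Section 1 can be checked concretely: a proof term built from z, s1 and s2 determines both the judgment it establishes and the number it is about. A small Python sketch (the tuple representation and function name are ours, purely illustrative):

```python
# Sketch: validate proof terms of the translated even/odd signature.
# Proofs: ("z",) | ("s1", p) | ("s2", p), mirroring
#   z : even z,  s1 : Pi x. even x -> odd (s x),  s2 : Pi x. odd x -> even (s x).

def check(proof):
    """Return ("even", n) or ("odd", n) if proof is a valid derivation."""
    if proof == ("z",):                 # z : even z
        return ("even", 0)
    tag = proof[0]
    sub = check(proof[1])
    if tag == "s1":                     # s1 turns even x into odd (s x)
        assert sub[0] == "even"
        return ("odd", sub[1] + 1)
    if tag == "s2":                     # s2 turns odd x into even (s x)
        assert sub[0] == "odd"
        return ("even", sub[1] + 1)
    raise ValueError("not a proof")
```

For instance, ("s2", ("s1", ("z",))) is a valid proof that 2 is even, exactly as s2 (s z) (s1 z z̄) would be in the translated signature.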
2 Extended (Counter-)Example
Coherence arises in the subset interpretation due to the presence of dependent types. To show what can go wrong, we extend our example from the introduction to make use of dependency. Our uniform translation in Section 5 will differ in some details, but the essential ideas are unchanged. Consider a judgment double which relates any natural number to its doubling.

double : nat → nat → type.
dbl/z : double z z.
dbl/s : ΠN:nat. ΠN2:nat. double N N2 → double (s N) (s (s N2)).

Using refinement kinds, or classes, we can express the property that the second subject of any doubling relation is always even, no matter what properties hold of the first subject. We do so by defining a sort double* which is isomorphic to double but has a more precise class.

double* < double :: ⊤ → even → sort.
dbl/z :: double* z z.
dbl/s :: ΠN::⊤. ΠN2::even. double* N N2 → double* (s N) (s (s N2)).

The sort ⊤ represents a natural number with no special properties. Successfully sort-checking the declarations for dbl/z and dbl/s demonstrates that whenever double* M N is inhabited, the second argument, N, is even. There is a crucial difference between refinements like even or odd and refinements like double*: while even and odd denote particular subsets of the natural numbers, the inhabitants of the refinement double* x y are identical to those of the ordinary type double x y. What is important is not whether a particular instance double* x y is inhabited, but rather whether it is well-formed at all. For this reason, we separate the formation of a dependent refinement type family from its inhabitation. Following this idea, the refinement double* translates as follows:
ˆdouble* : nat → nat → type.
double*/i : Πx:nat. Πy:nat. even y → ˆdouble* x y.
double* : Πx:nat. Πy:nat. ˆdouble* x y →÷ double x y → type.

There are three declarations after translation:

– a formation family, written ˆdouble*, which is inhabited exactly when a particular instance of double* is well-formed (e.g. ˆdouble* z z will be inhabited, since double* z z is well-formed),
– a constructor for the formation family, double*/i, which builds such proofs of well-formedness (e.g. double*/i z z z will be a proof that double* z z is well-formed), and
– a predicate family, double*, which for any x and y will be inhabited by proofs that a given derivation of double x y has refinement double* x y, provided that double* x y is well-formed.

In the predicate family double*, the proof of well-formedness is made irrelevant using a proof-irrelevant function space A →÷ B, representing functions from A to B that are insensitive to the identity of their argument. Using irrelevance ensures that a given sort has a unique translation, up to equivalence. We elaborate on this below.

The final component, the predicate family double*, is populated by constants generated from the refinement declarations. We write arguments in irrelevant position in [ square brackets ].

˜dbl/z : double* z z [double*/i z z z] dbl/z.
˜dbl/s : ΠN:nat. ΠN2:nat. Π˜N2:even N2. ΠD:double N N2.
    double* N N2 [double*/i N N2 ˜N2] D
    → double* (s N) (s (s N2)) [double*/i (s N) (s (s N2)) (s2 (s N2) (s1 N2 ˜N2))] (dbl/s N N2 D).

As is evident even from this short and abbreviated example, the interpretation leads to a significant blowup in the size and complexity of a signature, underscoring the importance of a primitive understanding of refinement types. Note that the formation argument of double* above is always made irrelevant, as stipulated by its type. What if we hadn't made the proofs of formation irrelevant?
Then if there were more than one proof that double* x y were well-formed for a given x and y, a soundness problem could arise. To see how, imagine extending the above example with a sort distinguishing zero as a refinement.

zero < nat.
z :: even ∧ zero.
As with even and odd, the sort zero turns into a predicate. Now that z has two sorts, it translates to two proof constructors.

zero : nat → type.
z1 : even z.
z2 : zero z.

Next, we can observe that zero always doubles to itself, and augment the declaration of double* using an intersection class:

double* < double :: ⊤ → even → sort ∧ zero → zero → sort.

After translation, since there are potentially two ways for double* x y to be well-formed, there are two introduction constants for the formation family.

double*/i1 : Πx:nat. Πy:nat. even y → ˆdouble* x y.
double*/i2 : Πx:nat. zero x → Πy:nat. zero y → ˆdouble* x y.

The declarations for ˆdouble* and double* remain the same. Now recall the refinement declaration for doubling zero, dbl/z :: double* z z, and observe that it is valid for two reasons, since double* z z is well-formed for two reasons. Consequently, after translation, there will be two proofs inhabiting the formation family ˆdouble* z z, but only one of them will be used in the translation of the dbl/z declaration. Supposing it is the first one, we'll have

˜dbl/z : double* z z [double*/i1 z z z1] dbl/z,

but our soundness criterion will still require that the constant ˜dbl/z check at the type double* z z [double*/i2 z z2 z z2] dbl/z, the other possibility. The apparent mismatch is resolved by the fact that the formation proofs are irrelevant, and so the two types are considered equal. Following the intuition given above, we may formally describe our translation and prove it correct. But first, we take a brief detour to review prior work on refinement types and proof irrelevance in LF.
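Operationally, the translated constants for dbl/z and dbl/s behave like a recursive function extracting, from any derivation of double N N2, an evenness proof for N2. A Python sketch with our own tuple encoding (derivations and evenness proofs as in the earlier examples; all names illustrative):

```python
# Sketch: the proof content of the double* translation, as a function from
# double derivations to evenness proofs of the second subject.
# Derivations: ("dbl/z",) | ("dbl/s", d).
# Evenness proofs: ("z",) | ("s1", p) | ("s2", p).

def conclusion(d):
    """The pair (n, n2) such that d derives double n n2."""
    if d == ("dbl/z",):
        return (0, 0)
    n, n2 = conclusion(d[1])
    return (n + 1, n2 + 2)

def even_of_double(d):
    """Evenness proof for the second subject of d (the role of dbl/z~, dbl/s~)."""
    if d == ("dbl/z",):
        return ("z",)                      # even z
    p = even_of_double(d[1])               # even N2, by induction
    return ("s2", ("s1", p))               # hence even (s (s N2))
```

The ("s2", ("s1", p)) step mirrors the irrelevant formation argument s2 (s N2) (s1 N2 ˜N2) in the declaration of ˜dbl/s.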
3 Refinement Types
Refinement types give means of more precisely characterizing well-typed terms. Systems of refinement types usually sit on top of ordinary type systems, allowing the programmer to specify precise properties of programs already known to be well-typed. This refinement restriction is what allows refinement type systems to employ powerful features like subtyping and intersection and union types without overly complicating the system’s metatheory. Although traditionally treated in the context of functional programming [7, 6, 4, 5], recent work has shown how refinement types can be added to the
K ::= type | Πx:A. K                  (kinds)
A ::= P | Πx:A1. A2                   (types)
P ::= a | P N                         (base types)
L ::= sort | Πx::S. L | ⊤ | L1 ∧ L2   (classes)
S ::= Q | Πx::S1. S2 | ⊤ | S1 ∧ S2    (sorts)
Q ::= s | Q N                         (base sorts)

Fig. 1. Syntax of LF types and kinds and LFR sorts and classes
logical framework LF [10], making it easier to adequately represent languages and logics with certain forms of judgmental inclusion, and to declare and check precise properties of relations over such languages. Above, in Sections 1 and 2, we saw a simple example involving even and odd natural numbers and properties of the doubling relation. Here, we briefly recapitulate some of the details of the formal development to set the stage for what is to come.

LF with refinement types, or LFR, is specified using the methodology of canonical forms, pioneered by Watkins et al. [16] in the definition of the Concurrent Logical Framework, CLF. Following this methodology, we consider only canonical forms, i.e. terms that are β-normal and η-long. Terms are syntactically restricted to the β-normal ones via a separation into atomic and normal terms:

R ::= x | c | R N        (atomic terms)
M, N ::= R | λx. N       (normal terms)

These terms are typed bidirectionally, with a synthesis judgment Γ ⊢ R ⇒ A and a checking judgment Γ ⊢ N ⇐ A. All judgments are relative to an implicit signature Σ, which declares types and kinds for term and type constants.¹

Σ ::= · | Σ, a:K | Σ, c:A        (LF declarations)

In extending to LFR, we add three new forms of declaration: refinement declarations, constant sorting declarations, and subsorting declarations.

Σ ::= · · · | Σ, s < a | Σ, c :: S | Σ, s1 ≤ s2        (LFR declarations)

Sorts S refine types A, written Γ ⊢ S < A; this judgment is defined compositionally, with the base case fulfilled by the refinement declarations s < a in the signature.

¹ As usual, we write the signature explicitly on the turnstile only when necessary.
The rule in question,

Γ ⊢ R ⇒ P′    P′ = P
-------------------- (switch)
Γ ⊢ R ⇐ P

is restricted to base types P, forcing all variables and constants to be fully applied. Note that although switch is the analogue of the usual "conversion rule", since terms are canonical (β-normal, η-long), the equality P′ = P is just α-equivalence: we need not worry about β- or η-conversions. This key rule is changed only slightly for sort-checking: equality of base types P′ and P becomes subsorting between base sorts Q′ and Q.

Γ ⊢ R ⇒ Q′    Q′ ≤ Q
-------------------- (switch-sub)
Γ ⊢ R ⇐ Q

The switch-sub rule is in fact the only rule that appeals to subsorting: under the canonical forms methodology, subsorting need only be defined on base sorts, where it is simply the reflexive, transitive closure of the relation declared in the signature, extended through applications to identical arguments. Aside from the change to the switch rule, the only other change from typing to sorting is the addition of rules for introducing and eliminating intersections. Following the usual pattern in bidirectional typing, the introduction rules are checking rules and the elimination rules are synthesis rules.

Γ ⊢ N ⇐ S1    Γ ⊢ N ⇐ S2
------------------------ (∧-I)
Γ ⊢ N ⇐ S1 ∧ S2

Γ ⊢ R ⇒ S1 ∧ S2               Γ ⊢ R ⇒ S1 ∧ S2
--------------- (∧-E1)        --------------- (∧-E2)
Γ ⊢ R ⇒ S1                    Γ ⊢ R ⇒ S2

---------- (⊤-I)              (no ⊤-E)
Γ ⊢ N ⇐ ⊤
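These rules determine a simple decision procedure. The Python sketch below checks constants against sorts, with subsorting invoked only at the switch-sub step and intersections handled by the introduction/elimination pattern above; the signature, subsort edges, and all names are illustrative assumptions, not from the paper.

```python
# Minimal bidirectional sort checker for (fully applied) constants.
# Sorts: ("base", name) | ("and", S1, S2) | ("top",). Hypothetical subsort edges.

SUBSORT = {("zero", "even"), ("even", "nat"), ("odd", "nat")}  # assumed, acyclic

def subsort(q1, q2):
    if q1 == q2:
        return True                                   # reflexivity
    return any(a == q1 and subsort(b, q2) for (a, b) in SUBSORT)  # transitivity

def synth(sig, r):
    return sig[r]                                     # constants synthesize their declared sort

def base_views(s):
    """All base sorts a synthesizing term offers, via (and-E1)/(and-E2)."""
    if s[0] == "and":
        return base_views(s[1]) | base_views(s[2])
    return {s[1]} if s[0] == "base" else set()

def checks(sig, r, s):
    if s[0] == "top":                                 # (top-I)
        return True
    if s[0] == "and":                                 # (and-I)
        return checks(sig, r, s[1]) and checks(sig, r, s[2])
    # (switch-sub): some synthesized base sort lies below the target
    return any(subsort(q, s[1]) for q in base_views(synth(sig, r)))
```

With a constant z declared at even ∧ zero, the checker accepts z at nat, at even ∧ zero, and at ⊤, but rejects it at odd.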
To maintain terms in canonical form, we must also replace the usual syntactic substitution [M/x]N with the hereditary substitution [M/x]_A N, which hereditarily contracts any β-redexes the substitution may have created. Hereditary substitution is indexed by the putative type A of the variable x in order to facilitate an early proof of decidability. For the purposes of this paper, though, we will simply write [M/x]N for hereditary substitution, since we have no need for ordinary substitution. In addition to sort-checking being decidable, LFR enjoys the usual Substitution and Expansion Principles: canonical terms may be substituted for variables, and every atomic term can be η-expanded to a canonical one.

η_P(R) = R
η_{Πx:A. B}(R) = λx. η_B(R η_A(x))

Principle (Substitution). If ΓL, x::S, ΓR ⊢ N ⇐ T and ΓL ⊢ M ⇐ S, then ΓL, [M/x]ΓR ⊢ [M/x]N ⇐ [M/x]T.

Principle (Expansion). If Γ ⊢ S < A and Γ ⊢ R ⇒ S, then Γ ⊢ η_A(R) ⇐ S.
4 Proof Irrelevance
When constructive type theory is used as a foundation for verified functional programming, we notice that many parts of proofs are computationally irrelevant,
that is, their structure does not affect the returned value we are interested in. The role of these proofs is only to guarantee that the returned value satisfies the desired specification. For example, from a proof of ∀x:A. ∃y:B. C(x, y) we may choose to extract a function f : A → B such that C(x, f(x)) holds for every x:A, but ignore the proof that this is the case. The proof must be present, but its identity is irrelevant. Proof-checking in this scenario has to ascertain that such a proof is indeed not needed to compute the relevant result.

A similar issue arises when a type theory such as λΠ is used as a logical framework. For example, assume we would like to have an adequate representation of prime numbers, that is, a bijection between prime numbers p and closed terms M : primenum. It is relatively easy to define a type family prime : nat → type such that there exists a closed M : prime N if and only if N is prime. Then primenum = Σn:nat. prime n is a candidate (with members ⟨N, M⟩), but it is not actually in bijective correspondence with prime numbers unless the proof M that a number is prime is always unique. Again, we need the existence of M, but would like to ignore its identity.

This could be achieved with subset types [3, 14] {x:nat | prime(x)}, whose members are just the prime numbers p; but if the restricting predicate is undecidable, then type-checking would be undecidable, which is not acceptable for a logical framework. For LF, we further note that Σ is not available as a type constructor, so we instead introduce a new type primenum with exactly one constructor, primenum/i:

primenum : type.
primenum/i : ΠN:nat. prime N →÷ primenum.

Here the second arrow →÷ represents a function that ignores the identity of its argument. The inhabitants of primenum, all of the form primenum/i N [M], are now in bijective correspondence with prime numbers, since primenum/i N [M] = primenum/i N [M′] for all M and M′.
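The primenum idea can be mimicked in Python: a value paired with a primality certificate that must be present and valid at construction, but whose identity is ignored by equality. The certificate format (a record of trial divisions for every candidate divisor, assuming n ≥ 2) is our invention for this sketch.

```python
# Sketch of primenum/i N [M]: certificate checked at construction, ignored by
# equality, so all certified copies of the same prime are identified.

class PrimeNum:
    def __init__(self, n, proof):
        # The certificate must list every d with 2 <= d < n, and none may
        # divide n; its *identity* (e.g. ordering) is irrelevant below.
        if set(proof) != set(range(2, n)) or any(n % d == 0 for d in proof):
            raise ValueError("invalid primality certificate")
        self.n = n
        self._proof = proof

    def __eq__(self, other):
        # proof component ignored: primenum/i N [M] = primenum/i N [M']
        return isinstance(other, PrimeNum) and self.n == other.n

    def __hash__(self):
        return hash(self.n)
```

Two certified copies of 7 built from differently ordered certificates are equal and collapse to one element of a set, while no certificate for 6 is accepted at all; this is the middle ground between subset types and dependent sums described above.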
In the extension of LF with proof irrelevance [11, 12], or LFI, we have a new form of hypothesis x÷A (x has type A, but the identity of x should be irrelevant). In the non-dependent case (the only one important for the purposes of this paper), such an assumption is introduced by a λ-abstraction:

Γ, x÷A ⊢ M ⇐ B
─────────────────────
Γ ⊢ λx. M ⇐ A →÷ B

We can use such variables only in places where their identity doesn't matter, e.g., in the second argument to the constructor primenum/i in the prime number example. More generally, we can only use them in arguments to constructor functions that do not care about the identity of their argument:

Γ ⊢ R ⇒ A →÷ B    Γ⊕ ⊢ N ⇐ A
────────────────────────────────
Γ ⊢ R [N] ⇒ B

Here, Γ⊕ is the promotion operator which converts any assumption x÷A to x:A, thereby making x usable in N. Note that there is no direct way to use an assumption x÷A.
Refinement Types as Proof Irrelevance
165
Table 1. Judgments of the translation

Judgment                            Result
Γ ⊢ L ⊑form K ; L̂f(−)              Type of proofs of the formation family
⊢pred K ; K̂p(−, −)                 Kind of the predicate family
⊢≤ K ; K̂s(−, −, −, −, −)           Type of coercions between families of kind K
Γ ⊢ S ⊑ A ; Ŝ(−)                   Metafunction representing predicate
Γ ⊢ Q ⊑ P ⇒ L ; Q̂                  Proof that Q is well-formed
Γ ⊢ N ⇐ S ; Ñ                      Proof that N has sort S
Γ ⊢ R ⇒ S ; R̃                      Proof that R has sort S
Γ ⊢ Q1 ≤ Q2 ; F(−, −)              Metacoercion from proofs of Q1 to proofs of Q2
⊢ Q1 ≤ Q2 ; Q1-Q2                  Coercion from proofs of Q1 to proofs of Q2
⊢ Γ ctx ; Γ̃                        Translated context
⊢ Σ sig ; Σ̃                        Translated signature
The underlying definitional equality "=" (usually just α-conversion on canonical forms) is extended so that R [N] = R′ [N′] whenever R = R′, no matter what N and N′ are. The substitution principle (shown here only in its simplest, non-dependent form) captures the proper typing as well as the irrelevance of assumptions x÷A:

Principle (Irrelevant Substitution). If Γ, x÷A ⊢ N ⇐ B and Γ⊕ ⊢ M ⇐ A, then Γ ⊢ [M/x]N ⇐ B and [M/x]N = N (under definitional equality).
5 Interpretation

5.1 Overview
We interpret LFR into LFI by representing sorts as predicates and derivations of sorting as proofs of those predicates. The translation is derivation-directed and compositional: for each judgment Γ ⊢ J, there is a corresponding judgment Γ ⊢ J ; X whose rules mimic the rules of Γ ⊢ J. The syntactic class of X and its precise interpretation vary from judgment to judgment (for reference, the various forms are listed in Table 1), but a great deal of insight can be had by examining the specific cases of sort formation and sort checking.

Sort formation is embodied by the refinement judgment Γ ⊢ S ⊑ A. The corresponding translation judgment has the form Γ ⊢ S ⊑ A ; Ŝ, in which Ŝ is a meta-level function representing the sort S as a predicate. Sort checking Γ ⊢ N ⇐ S becomes the term translation Γ ⊢ N ⇐ S ; Ñ. Since Ñ represents a proof that N has sort S, we should expect that Ñ ⇐ Ŝ(N).²
² Under an appropriate context, briefly discussed below.
For example, take the rule ∧-I of intersection introduction. The corresponding translation rule represents the two independent derivations as a pair of proofs. Accordingly, the intersection sort formation rule yields a product of predicates.³

Γ ⊢ N ⇐ S1 ; Ñ1    Γ ⊢ N ⇐ S2 ; Ñ2
──────────────────────────────────────
Γ ⊢ N ⇐ S1 ∧ S2 ; ⟨Ñ1, Ñ2⟩

Γ ⊢ S1 ⊑ A ; Ŝ1    Γ ⊢ S2 ⊑ A ; Ŝ2
──────────────────────────────────────
Γ ⊢ S1 ∧ S2 ⊑ A ; λN. Ŝ1(N) × Ŝ2(N)
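A loose operational reading of these two rules (an illustrative sketch only; translate and the sort encoding are hypothetical, not the paper's metalanguage) models a translated sort Ŝ as a function from terms to proof objects, with intersection becoming pairing:

```python
# Sorts as predicates returning proof objects (None = no proof), with
# the translation of S1 ∧ S2 read as λN. Ŝ1(N) × Ŝ2(N), i.e. pairing.

def translate(sort):
    if sort[0] == "base":
        _, name, holds = sort
        return lambda n: (name, n) if holds(n) else None
    if sort[0] == "and":
        s1, s2 = translate(sort[1]), translate(sort[2])
        def pair(n):
            p1, p2 = s1(n), s2(n)
            return (p1, p2) if p1 is not None and p2 is not None else None
        return pair
    raise ValueError("unknown sort")

even = ("base", "even", lambda n: n % 2 == 0)
small = ("base", "small", lambda n: n < 10)
pred = translate(("and", even, small))
assert pred(4) == (("even", 4), ("small", 4))  # a pair of proofs
assert pred(12) is None                        # even, but not small
```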
We use a bold λ for meta-level abstraction and bold (parens) for meta-level application. Our translation is similar in spirit to Liquori and Ronchi Della Rocca's Λt∧ [9], a Church-style type system for intersections in which derivations are explicitly represented as proofs and intersections as products.

At the top level, we are interested in checking entire signatures, so it is also instructive to examine the rules for translating LFR declarations. As we saw with even and odd in Section 1, sorting declarations for constants turn into proof constructor declarations.

⊢ Σ sig ; Σ̃    c:A ∈ Σ    · ⊢ S ⊑ A ; Ŝ
──────────────────────────────────────────
⊢ Σ, c::S sig ; Σ̃, c̃:Ŝ(ηA(c))

This matches our intuitions: the proof constructor c̃ witnesses the fact that the constant c satisfies property S. Since our predicates expect terms in canonical form, we η-expand the constant. The translation on contexts is similar.

As we saw with double* in Section 2, a refinement declaration turns into three declarations: one for the formation family, one for the proof constructor for the formation family, and one for the predicate family.

⊢ Σ sig ; Σ̃    a:K ∈ Σ    · ⊢ L ⊑form K ; L̂f    ⊢pred K ; K̂p
──────────────────────────────────────────────────────────────
⊢ Σ, s ⊑ a : L sig ; Σ̃, s:K, s/i:L̂f(s), s̃:K̂p(s, a)

The class formation judgment Γ ⊢ L ⊑form K ; L̂f yields a metafunction describing the type of proofs of the formation family, while an auxiliary kind translation judgment ⊢pred K ; K̂p yields a metafunction describing the kind of the predicate family. Recall from our earlier example that the kind of the formation family is the same as the kind of the refined type, in this case K.

The metafunction L̂f takes as input the formation family so far (initially just s), and the translation derivation adds arguments on the way up. At the base case it returns the formation family itself.

The metafunction K̂p takes two arguments: one for the formation family so far (initially s) and one for the refined type so far (initially a). This translation is characterized by its behavior on the base kind, type:
⊢pred type ; λ(Qf, P). Qf →÷ P → type
³ For compositionality's sake, we target an extension of LFI with product and unit types. Such an extension is orthogonal to the addition of proof irrelevance, and has been studied by many people over the years, including Schürmann [15].
The kind of the predicate family for a base sort Q refining P is essentially a one-place judgment on terms of type P, along with an irrelevant argument belonging to the formation family of Q. In light of this, we can make sense of the rule for translating base sorts:

Γ ⊢ Q ⊑ P′ ⇒ L ; Q̂    P′ = P    L = sort
──────────────────────────────────────────
Γ ⊢ Q ⊑ P ; λN. Q [Q̂] N

The class synthesis translation judgment Γ ⊢ Q ⊑ P ⇒ L ; Q̂ yields a proof of Q's formation family; thus the predicate for a base sort Q, given an argument N, is simply the predicate family Q applied to an irrelevant proof Q̂ that Q is well-formed and the argument itself, N.

We postpone a discussion of subsorting declarations s1≤s2 until after a brief review of some metatheoretic results.

5.2 Correctness
Soundness theorems tell us that the result of a translation is well-formed. Even more importantly, they serve as an independent means of understanding the translation. In a sense, a soundness theorem can be read as the meta-level type of a translation judgment, and just as types serve as an organizing principle for the practicing programmer, so too do soundness theorems serve the thoughtful theoretician. We mention a few such theorems, then, not only to demonstrate the sensibility of our translation, but also to aid the reader in understanding its purpose.

In what follows, form(Q) represents the formation family for a base sort Q:

form(s) = s
form(Q N) = form(Q) N

Then:

Theorem 1 (Soundness). Suppose ⊢ Γ ctx ; Γ̃ and ⊢ Σ sig ; Σ̃.
1. If Γ ⊢ S ⊑ A ; Ŝ and Γ ⊢ N ⇐ S ; Ñ, then Γ̃ ⊢Σ̃ Ñ ⇐ Ŝ(N).
2. If Γ ⊢ R ⇒ S ; R̃, then Γ ⊢ S ⊑ A ; Ŝ and Γ̃ ⊢Σ̃ R̃ ⇒ Ŝ(ηA(R)) (for some A and Ŝ).
3. If Γ ⊢ S ⊑ A ; Ŝ and Γ ⊢ N ⇐ A, then Γ̃ ⊢Σ̃ Ŝ(N) ⇐ type.
4. If Γ ⊢ Q ⊑ P ⇒ L ; Q̂, then for some K, L̂f, and K̂p,
   – Γ ⊢ L ⊑form K ; L̂f and Γ̃ ⊢Σ̃ Q̂ ⇒ L̂f(form(Q)), and
   – ⊢pred K ; K̂p and Γ̃ ⊢Σ̃ Q ⇒ K̂p(form(Q), P).
5. If Γ ⊢ L ⊑form K ; L̂f and Γ ⊢ P ⇒ K, then Γ̃ ⊢Σ̃ L̂f(P) ⇐ type.
6. If ⊢pred K ; K̂p, Γ ⊢ Qf ⇒ K, and Γ ⊢ P ⇒ K, then Γ̃ ⊢Σ̃ K̂p(Qf, P) ⇐ kind.

The soundness theorems are each proven by induction on the theorem's main input derivation. Several clauses must be proved mutually, and not all theorems are shown here. The proofs appeal to several key lemmas.
Lemma 1 (Erasure). If Γ ⊢ J ; X, then Γ ⊢ J.

Lemma 2 (Reconstruction). If Γ ⊢ J, then for some X, Γ ⊢ J ; X.

Erasure and Reconstruction substantiate the claim that our translation is derivation-directed by allowing us to move freely between translation judgments and ordinary ones. Using Erasure and Reconstruction, we can leverage all of the LFR metatheory without reproving it for translation judgments. For example, several cases require us to substitute into a translation derivation: we can apply Erasure, appeal to LFR's Substitution Theorem, and invoke Reconstruction to get the output we require. But since Reconstruction only gives us some output X, we may not know that it is the one that suits our needs. Therefore, we usually require another lemma, Compositionality, to tell us that the translation commutes with substitution. There are several such lemmas; we show here the one for sort translation.

Lemma 3 (Compositionality). Let σ denote [M/x]. If Γ, x::S0 ⊢ S ⊑ A ; Ŝ and Γ ⊢ σS ⊑ σA ; Ŝ′, then σ(Ŝ(N)) = Ŝ′(σN).

Completeness theorems tell us that our target is not too rich: everything we find evidence of in the codomain of the translation actually holds true in its domain. While important for establishing general correctness, completeness theorems are not quite so nice to look at as soundness theorems, so we give here only the cases for terms. Then:

Theorem 2 (Completeness). Suppose ⊢ Γ ctx ; Γ̃ and ⊢ Σ sig ; Σ̃.
1. If Γ ⊢ S ⊑ A ; Ŝ and Γ̃ ⊢Σ̃ Ñ ⇐ Ŝ(N), then Γ ⊢ N ⇐ S.
2. If Γ̃ ⊢Σ̃ R̃ ⇒ B, then Γ ⊢ S ⊑ A ; Ŝ, B = Ŝ(ηA(R)), and Γ ⊢ R ⇒ S (for some S, A, Ŝ, and R).

In stating Completeness, we syntactically isolate the set of terms that could represent proofs using, e.g., metavariables R̃ and Ñ. Completeness is proven by induction over the structure of these terms.

Adequacy of a representation is generally shown by exhibiting a compositional bijection between informal entities and terms of certain LFR sorts.
Since we have undertaken a subset interpretation, the set of terms of any LFR sort is unchanged by translation, and so any bijective correspondence between those terms and informal entities remains after translation. Furthermore, soundness and completeness tell us that our interpretation preserves and reflects the derivability of any refinement type judgments over those terms. Thus, we have achieved our main goal: any adequate LFR representation can be translated to an adequate LFI representation.

5.3 Subsorting
We now return to the question of how the translation handles subsorting. Recall that an LFR signature can include subsorting declarations between sort family
constants, s1≤s2. We require both sort constants to refine the same type constant a and to have the same class L. The rule for translating such declarations creates a coercion constant s1-s2.

⊢ Σ sig ; Σ̃    s1 ⊑ a : L ∈ Σ    s2 ⊑ a : L ∈ Σ    a:K ∈ Σ    ⊢≤ K ; K̂s
──────────────────────────────────────────────────────────────────────────
⊢ Σ, s1≤s2 sig ; Σ̃, s1-s2 : K̂s(a, s1, s̃1, s2, s̃2)
The auxiliary judgment ⊢≤ K ; K̂s yields a metafunction describing the type of coercions between sorts that refine a type family of kind K. The metafunction K̂s takes five arguments: the refined type, the formation family and predicate family for the domain of the coercion, and the formation family and predicate family for the codomain of the coercion. As before, the translation derivation builds up a spine of arguments on the way up towards the leaves. At the base kind type, it outputs the type of the coercion:

⊢≤ type ; λ(P, Q1f, Q1, Q2f, Q2). Πf1:Q1f. Πf2:Q2f. Πx:P. Q1 [f1] x → Q2 [f2] x
Essentially, this is the type of coercions, given x, from proofs of Q1 x to proofs of Q2 x; but of course we must pass Q1 and Q2 proofs that they are well-formed, so the coercion requires those proofs as inputs as well.

How do these coercions work? Recall from Section 3 that subsorting need only be defined at base sorts Q, and there, it is simply the application-compatible, reflexive, transitive closure of the declared relation. For the purposes of the translation, we give an equivalent algorithmic formulation of subsorting. Following the inspiration of bidirectional typing, we give two judgments: a checking judgment that takes two base sorts as inputs, and a synthesis judgment that takes one base sort as input and outputs another base sort that is one step higher in the subsort hierarchy. The synthesis judgment constructs a coercion from the new coercion constants in the signature.

s1≤s2 ∈ Σ
────────────────────
⊢ s1 ≤ s2 ; s1-s2

⊢ Q1 ≤ Q2 ; Q1-Q2
────────────────────────────
⊢ Q1 N ≤ Q2 N ; Q1-Q2 N

The checking judgment, on the other hand, constructs a meta-level coercion between the two sorts. It is defined by two rules: a rule of reflexivity and a rule to climb the subsort hierarchy.

Q1 = Q2
────────────────────────────────── (refl)
Γ ⊢ Q1 ≤ Q2 ; λ(R, R̃1). R̃1

Γ ⊢ Q1 ⊑ P ⇒ sort ; Q̂1    Γ ⊢ Q′ ⊑ P ⇒ sort ; Q̂′    ⊢ Q1 ≤ Q′ ; Q1-Q′    Γ ⊢ Q′ ≤ Q2 ; F
────────────────────────────────────────────────────────────────────────────── (climb)
Γ ⊢ Q1 ≤ Q2 ; λ(R, R̃1). F(R, Q1-Q′ Q̂1 Q̂′ R R̃1)
The reflexivity rule's metacoercion simply returns the proof it is given, while the climb rule composes the actual coercion Q1-Q′ with the metacoercion F. Two extra premises generate the necessary formation proofs.

We can now end where we started, with the switch rule. Using the metacoercion generated by subsort checking, we can construct the proof we need for soundness.

Γ ⊢ R ⇒ Q′ ; R̃    Γ ⊢ Q′ ≤ Q ; F
──────────────────────────────────── (switch-sub)
Γ ⊢ R ⇐ Q ; F(R, R̃)
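The recursive structure of subsort checking, refl plus climb, can be mimicked in a small sketch (hypothetical names; the real rules also thread the formation proofs Q̂1, Q̂′, which this toy omits):

```python
# Declared subsorting edges, each carrying its coercion constant s1-s2;
# checking composes a metacoercion by reflexivity or by one climb step.

decls = {"pos": ("nat", "pos-nat"), "nat": ("int", "nat-int")}

def check(q1, q2):
    if q1 == q2:                    # (refl): return the proof unchanged
        return lambda proof: proof
    if q1 in decls:                 # (climb): one step up, then recurse
        q, coercion = decls[q1]
        rest = check(q, q2)
        if rest is not None:
            return lambda proof: rest((coercion, proof))
    return None                     # no path in the subsort hierarchy

f = check("pos", "int")
assert f("p") == ("nat-int", ("pos-nat", "p"))   # composed coercions
assert check("int", "int")("r") == "r"           # reflexivity
assert check("int", "pos") is None
```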
6 Conclusion
Logical frameworks are metalanguages specifically designed so that common concepts and notations in logic and the theory of programming languages can be represented elegantly and concisely. LF [8] intrinsically supports α-conversion, capture-avoiding substitution, and hypothetical and parametric judgments, but as with any such enterprise, certain patterns fall out of its scope and must be encoded indirectly. One pattern is the ability to form regular subsets of types already defined. This is addressed in LF extended with type refinement (LFR) [10]. Another pattern is to ignore the identities of proofs, relying only on their existence. This is addressed in LF extended with proof irrelevance (LFI) [11, 12]. In this paper we have shown that the former can be mapped to the latter in a bijective manner, preserving adequacy theorems for LFR representations in LFI.

In the methodology of logical frameworks research, it is important to understand the cost of such a translation: how much more complicated are encodings in the target framework, and how much more difficult is it to work with them? We cannot measure this cost precisely, but we hope it is evident from the definition of the translation and the examples that the price is considerable. Even if in special cases more direct encodings are possible, we believe our general translation could not be simplified much, given the explicit goal to preserve the adequacy of representations. Other translations from programming languages, such as coercion interpretations where sorts are translated to distinct types and subsorting to coercions, appear even more complex because adequacy depends on certain functional equalities between coercions.

Our preliminary conclusion is that refinement types in logical frameworks provide elegant and immediate representations that are not easy to simulate without them, providing a solid argument for their inclusion in the next generation of frameworks.
Refinement types have also been proposed for functional programming [7, 4], most recently in conjunction with a limited form of dependent types [5]. Proof irrelevance is already integrated in this setting, and also available in general type theories such as NuPrl or Coq. One can ask the same question here: Can we simply eliminate refinement types and just work with dependent types and proof irrelevance? The results in this paper lend support to the conjecture that this can be accomplished by a uniform translation. On the other hand, just as here, it seems there would likely be a high cost in terms of brevity in order to
maintain a bijection between well-sorted data in the source and dependently well-typed data in the target of the translation.

Acknowledgements. Thanks to Jason Reed for many fruitful discussions on the topic of proof irrelevance.
References

1. Awodey, S., Bauer, A.: Propositions as [types]. Journal of Logic and Computation 14(4), 447–471 (2004)
2. Breazu-Tannen, V., Coquand, T., Gunter, C.A., Scedrov, A.: Inheritance as implicit coercion. Information and Computation 93(1), 172–221 (1991)
3. Constable, R.L., et al.: Implementing Mathematics with the Nuprl Proof Development System. Prentice-Hall, Englewood Cliffs (1986)
4. Davies, R.: Practical Refinement-Type Checking. PhD thesis, Carnegie Mellon University (May 2005); available as Technical Report CMU-CS-05-110
5. Dunfield, J.: A Unified System of Type Refinements. PhD thesis, Carnegie Mellon University (August 2007); available as Technical Report CMU-CS-07-129
6. Dunfield, J., Pfenning, F.: Tridirectional typechecking. In: Leroy, X. (ed.) ACM Symp. Principles of Programming Languages (POPL 2004), Venice, Italy, pp. 281–292 (January 2004)
7. Freeman, T.: Refinement Types for ML. PhD thesis, Carnegie Mellon University (March 1994); available as Technical Report CMU-CS-94-110
8. Harper, R., Honsell, F., Plotkin, G.: A framework for defining logics. Journal of the Association for Computing Machinery 40(1), 143–184 (1993)
9. Liquori, L., Ronchi Della Rocca, S.: Intersection-types à la Church. Information and Computation 205(9), 1371–1386 (2007)
10. Lovas, W., Pfenning, F.: A bidirectional refinement type system for LF. Electronic Notes in Theoretical Computer Science 196, 113–128 (2008)
11. Pfenning, F.: Intensionality, extensionality, and proof irrelevance in modal type theory. In: Halpern, J. (ed.) Proceedings of the 16th Annual Symposium on Logic in Computer Science (LICS 2001), Boston, Massachusetts, pp. 221–230. IEEE Computer Society Press, Los Alamitos (2001)
12. Reed, J., Pfenning, F.: Proof irrelevance in a logical framework. Unpublished draft (July 2008)
13. Reynolds, J.C.: The coherence of languages with intersection types. In: Ito, T., Meyer, A.R. (eds.) TACS 1991. LNCS, vol. 526, pp. 675–700. Springer, Heidelberg (1991)
14. Salvesen, A., Smith, J.M.: The strength of the subset type in Martin-Löf's type theory. In: Proceedings of LICS 1988, pp. 384–391. IEEE Computer Society Press, Los Alamitos (1988)
15. Schürmann, C.: Towards practical functional programming with logical frameworks (July 2003), http://cs-www.cs.yale.edu/homes/carsten/delphin/ (unpublished)
16. Watkins, K., Cervesato, I., Pfenning, F., Walker, D.: A concurrent logical framework I: Judgments and properties. Technical Report CMU-CS-02-101, Department of Computer Science, Carnegie Mellon University (2002) (revised May 2003)
17. Zeilberger, N.: Refinement types and computational duality. In: PLPV 2009: Proceedings of the 3rd Workshop on Programming Languages Meets Program Verification, pp. 15–26. ACM, New York (2009)
Weak ω-Categories from Intensional Type Theory

Peter LeFanu Lumsdaine
Carnegie Mellon University
[email protected]

Abstract. Higher-dimensional categories have recently emerged as a natural context for modelling intensional type theories; this raises the question of what higher-categorical structures the syntax of type theory naturally forms. We show that for any type in Martin-Löf Intensional Type Theory, the system of terms of that type and its higher identity types forms a weak ω-category in the sense of Leinster. Precisely, we construct a contractible globular operad PMLId of type-theoretically definable composition laws, and give an action of this operad on any type and its identity types.
1 Overview
Starting with the Hofmann-Streicher groupoid model [1], higher categories have emerged as a natural approach to the semantics of intensional Martin-Löf type theory. In the globular approach to higher categories, a higher category has objects ("0-cells"), arrows ("1-cells") between objects, 2-cells between 1-cells, and so on, with various composition operations and laws depending on the kind of category in question (strict or weak, n- or ω-, . . .). The paradigm for semantics of type theory is then (very roughly!) that types (or contexts) are thought of as objects [[A]], terms x : A ⊢ τ : B as arrows [[τ]] : [[A]] → [[B]], terms of identity type ρ : IdB(τ, τ′) as 2-cells [[ρ]] : [[τ]] ⇒ [[τ′]], terms χ : Id(ρ, ρ′) as 3-cells, and so on. This idea has recently been explored by various authors in various directions: see for instance [2], [3], [4].

One such direction is investigating the structures formed by the syntax of type theory. In particular, it has been suggested that (terms of) any type (considered together with its higher identity types) should carry the structure of a weak ω-category or -groupoid. We will show that this is indeed the case, using the definition of weak ω-category given by Tom Leinster in [5] following the approach of Michael Batanin [6].

(Note that for this construction, based on a specific type A, the dimensions of cells are always one lower than described above: 0-cells will be terms τ : A, 1-cells will be terms ρ : IdA(τ, τ′), and so on. This comes from the general rule that if X, A are objects of an n-category C, then C(X, A) forms an (n − 1)-category whose 0-cells are 1-cells of C, and so on.)

An extended version of this paper, including more background material and proofs, is available online [7]. While writing this paper, I found that Benno van

P.-L. Curien (Ed.): TLCA 2009, LNCS 5608, pp. 172–187, 2009.
© Springer-Verlag Berlin Heidelberg 2009
den Berg had independently discovered a similar proof (proposed in 2006 and since completed in unpublished work [8]); a development of this is forthcoming in joint work of van den Berg and Richard Garner [9].

1.1 Outline of the Construction
(We assume throughout some general familiarity with higher category theory, but not with the particular definition of weak ω-category used, which we will recall in detail in later sections; similarly for the type theory.)

Recall that an ω-category C has a set Cn of "n-cells" for each n ≥ 0. The 0- and 1-cells correspond to the objects and arrows of an ordinary category: each arrow f has source and target objects a = s(f), b = t(f). Similarly, the source and target of a 2-cell α are a parallel pair of 1-cells f, g : a ⇉ b, and generally the source and target of an (n + 1)-cell are a parallel pair of n-cells. Cells of each dimension can be composed along a common boundary in any lower dimension, and in a strict ω-category, the composition satisfies various associativity, unit, and interchange laws, captured by the generalised associativity law: each labelled pasting diagram has a unique composite. (See illustrations in Fig. 1.)
a •
f
a
•
"
a
b
/•
•
b
α <•
•
Θ
α
_*4
!
g
B•
β
}
g f a
•
b
/•
f
c
•
f
f
f
g ·0 f •
α
a
/•
g
γ
b / D•
f
a
•
α
g
/•
h ·0 (g ·0 f ) = (h ·0 g) ·0 f
h
/•
g
@•
f
γ ·1 α
/•
b
β
c
@•
g
β ·0 α
α
γ
•
/ E•
β / • E δ
(δ ·0 γ) ·1 (β ·0 α) = (γ ·1 α) ·0 (δ ·1 β)
Fig. 1. Some cells, composites, and associativities in a strict ω-category
In a weak ω-category, we do not expect strict associativity, so may have multiple composites for a given pasting diagram, but we do demand that these composites agree up to cells of the next dimension (“up to homotopy”), and that these associativity cells satisfy certain coherence laws of their own, again up to cells of higher dimension, and so on. This is exactly the situation we find in intensional type theory. For instance, even in constructing a term witnessing the transitivity of identity—that is, a
174
P. LeF. Lumsdaine
composition law for the pasting diagram ( • → • → • ), or explicitly a term c such that
x, y, z : X, p : Id(x, y), q : Id(y, z) ⊢ c(q, p) : Id(x, z)

—one finds that there is no single canonical candidate: most obvious are the two equally natural terms cl, cr obtained by applying (Id-elim) to p or to q respectively. These are not definitionally equal, but are propositionally equal, i.e. equal up to a 2-cell: there is a term e with

x, y, z : X, p : Id(x, y), q : Id(y, z) ⊢ e(q, p) : Id(cl(q, p), cr(q, p)).

In Leinster's definition [5], a system of composition laws of this sort is wrapped up in the algebraic structure of a globular operad with contraction, and a weak ω-category is given by a globular set equipped with an action of such an operad. We generalise this slightly, to define an internal weak ω-category in any suitable category C.

Accordingly, we would like to find an operad-with-contraction PMLId of all such syntactically-definable composition laws, acting on terms of any type and its identity types. In fact, rather than using the full type theory for this, it is enough to consider the composition laws definable using just the Id-rules, so obtaining the construction for a wider class of theories.

The heart of the paper is Sect. 4, where we formalise this idea by considering a type theory MLId[X], the fragment of MLI generated just by the structural and Id-rules plus a single generic base type X; then PMLId is the endomorphism operad of the globular object X• in its syntactic category C(MLId[X]), and by some analysis of the fragment MLId[X] we show that PMLId is contractible. Since X is generic, PMLId acts on all other types, giving our main theorem:

Theorem. Let T be any type theory extending MLId, and A any type of T. Then the system of types (A, IdA, IdIdA, . . .) is equipped naturally with a PMLId-action, and hence with the structure of an internal weak ω-category in C(T).

To prepare for this, we first lay out in Sect. 2 our presentation of the full type theory MLI and of the fragment MLId, and in Sect.
3 the relevant background on globular operads and their algebras.
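The two composites cl and cr can be illustrated in a toy path model (a hypothetical sketch, not the paper's construction): identity proofs are lists of edge labels, and cl and cr recurse on p and on q respectively; they are syntactically different programs that denote the same composite.

```python
# Two transitivity witnesses, defined by "elimination" on different
# arguments; in this path model both denote concatenation of paths.

def cl(q, p):
    # recursion on p: when p is refl (the empty path), cl(q, refl) = q
    if not p:
        return list(q)
    return [p[0]] + cl(q, p[1:])

def cr(q, p):
    # recursion on q: when q is refl, cr(refl, p) = p
    if not q:
        return list(p)
    return cr(q[:-1], p) + [q[-1]]

p = ["e1"]          # a path x -> y
q = ["e2", "e3"]    # a path y -> z
assert cl(q, p) == cr(q, p) == ["e1", "e2", "e3"]
```

In the syntax of type theory the two definitions remain distinct terms, equal only up to the 2-cell e; the model collapses that 2-cell to an actual equality.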
2 The Type-Theoretic Setting

2.1 The Type Theories MLId, MLId[X]
Our main theories of interest are the various versions of Intensional Martin-Löf Type Theory, usually given with identity types (Id-types), dependent sums and products (Σ- and Π-types), units (1-types), and possibly more base types (natural numbers, Booleans, . . .). To cover all these in the main theorem, and for a self-contained presentation, we will work throughout this paper in the fragment MLId with only Id-types, and construct our operad from this.
Table 1. The type theory MLId

Basic judgement forms:
  Γ ⊢ A type    Γ ⊢ A = B type    Γ ⊢ a : A    Γ ⊢ a = b : A

Structural groups:
  Variables (Vble), Substitution (Subst), Weakening (Wkg), Exchange (Exch), Equality (=)

Id-rules:
  Id-form, Id-intro, Id-elim, Id-comp ("β" in [10]), compatibility with substitution and =
Some care is thus required in our choice of presentation; presentations which are equivalent in the presence of Σ- or Π-types may not be so in their absence. The presentation we use is taken, up to notation, from that of Jacobs [10]; we list in Table 1 the rules assumed, referring to [10] for their statements, except for the Id-rules, given in full in Table 2. The only features perhaps needing comment are the explicit inclusion of exchange rules, and of the extra dependent context Δ in the Id-rules; these are each natural rules, but often omitted since they are derivable in the presence of Π-types (as discussed on e.g. p. 587 of [10]). Note that from Exch and this Id-elim rule, we can derive a still slightly more general elimination rule Id-elim⁺, as Id-elim but with context Γ, x : A, Δ, y : A, Δ′, p : IdA(x, y), Δ′′.

To simplify notation when referring to iterated identity types, we introduce (following Warren [11]) the notation Aⁿ for the nth iterated identity type of a type A; that is, if Γ ⊢ A type, then Γ ⊢ A⁰ := A type, and inductively

Γ, x0, y0 : A⁰, x1, y1 : A¹(x0, y0), . . . , xn, yn : Aⁿ(x0, y0; . . . ; xn−1, yn−1)
    ⊢ Aⁿ⁺¹(x0, y0; . . . ; xn, yn) := Id_{Aⁿ(x0,...)}(xn, yn) type.

We will often omit the superscripts on these when unambiguous. As usual, we will also be inconsistent in suppression of dependent variables, writing sometimes e.g. Γ ⊢ A type and sometimes y⃗ : Γ ⊢ A(y⃗) type. Finally, for a finite partial order I = {i1 < . . . < in}, we will write (xi : Ai)i∈I (or just (Ai)i∈I) to denote the context xi1 : Ai1, . . . , xin : Ain.

2.2 Translations and Syntactic Categories
For reference on this section (including proofs not given here), see Cartmell [12] and Jacobs [10].
Table 2. Rules for Id-types

Γ ⊢ A type
──────────────────────────────── Id-form
Γ, x, y : A ⊢ IdA(x, y) type

Γ ⊢ A type
──────────────────────────────── Id-intro
Γ, x : A ⊢ r(x) : IdA(x, x)

Γ, x, y : A, p : IdA(x, y), Δ(x, y, p) ⊢ C(x, y, p) type
Γ, z : A, Δ(z, z, r(z)) ⊢ d(z) : C(z, z, r(z))
──────────────────────────────── Id-elim
Γ, x, y : A, p : IdA(x, y), Δ(x, y, p) ⊢ Jz.d(x, y, p) : C(x, y, p)

(premises as for Id-elim)
──────────────────────────────── Id-comp
Γ, x : A, Δ(x, x, r(x)) ⊢ Jz.d(x, x, r(x)) = d(x) : C(x, x, r(x))
From here on, we will consider type theories extending MLId; precisely, by a type theory T we will mean a generalised algebraic theory together with an interpretation of MLId in T, in the sense of Cartmell [12]. Recall that a translation F from such a type theory T into a type theory S consists of suitable mappings of types, terms, and derivable judgements, taking each judgement Γ ⊢ A type in T to a judgement F(Γ) ⊢ F(A) type in S, and so on, preserving Id-types and their term-constructors (in other words, a morphism of generalised algebraic theories under MLId).

Given T, we write T[X] for the result of adjoining to T a fresh base type X

⊢ X type   (X-form)

with no term formation rules. For any S, a translation F : T[X] → S then consists of a translation T → S together with a closed type of S. Stating this precisely in the particular case that we will need:

Proposition 1. If S is any type theory extending MLId, and A any closed type of S, then there is a unique translation Ā : MLId[X] → S preserving Id-types and their term-constructors and with Ā(X) = A.

For any type theory T, there is a syntactic category C(T), having as objects the closed contexts Γ of T, and as arrows f : Γ → Δ suitable strings of terms in context Γ (context maps), all up to α-equivalence and definitional equality. Moreover, a translation F : T → S induces a functor C(F) : C(T) → C(S); in other words, we have a functor C(−) : Th → Cat.

We will need a simple proposition on limits in syntactic categories:

Proposition 2. Suppose Γ = (xi : Ai)i∈I is a context in T, and F ⊆ P(I) a set of subsets of I, closed under binary intersection and with ∪F = I, such that for each J ∈ F, ΓJ = (xi : Ai)i∈J is also a well-formed context.
Then the ΓJ's and dependent projections between them give a diagram Γ− : (F, ⊆)op → C(T), and the dependent projections Γ → ΓJ express Γ as its limit: Γ = lim←− J∈F ΓJ.

Moreover, for any translation F : T → T′, the functor C(F) preserves all such limits.

Here dependent projections are the obvious context morphisms from a context to any well-formed subcontext, constructed from the Vble, Wkg and Exch rules. A familiar special case asserts that if Γ ⊢ A type and Γ ⊢ B type, then the following square is a pullback:

           πA.B
  Γ.A.B ────────▶ Γ.B
    │               │
  πB│               │πB
    ▼               ▼
   Γ.A ─────────▶  Γ
           πA
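The pullback square can be checked in a naive model where contexts and environments are finite maps (illustration only; project and the sample data are hypothetical):

```python
# Γ.A.B with its two projections as the pullback of Γ.A -> Γ <- Γ.B:
# a compatible pair of environments for Γ.A and Γ.B is exactly one
# environment for Γ.A.B.

G   = {"x": "nat"}
GA  = {**G, "a": "A"}
GB  = {**G, "b": "B"}
GAB = {**G, "a": "A", "b": "B"}

def project(env, ctx):
    # dependent projection: restrict an environment to a subcontext
    return {v: t for v, t in env.items() if v in ctx}

envA = {"x": 3, "a": "u"}      # an environment for Γ.A
envB = {"x": 3, "b": "v"}      # an environment for Γ.B
assert project(envA, G) == project(envB, G)   # they agree on Γ

merged = {**envA, **envB}                     # the mediating environment
assert set(merged) == set(GAB)
assert project(merged, GA) == envA and project(merged, GB) == envB
```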
The proof of the general proposition is essentially the same. To relativise the constructions of this section to dependent types and contexts over a (closed) context Γ = 0≤i
3 Globular Operads and Their Algebras
As described in the introduction, we want to describe "the globular operad of composition laws". Accordingly, we recall briefly in this section what a globular operad is, and how it formalises the intuition of a set of composition laws for pasting diagrams, with structure specifying how these laws themselves compose. For a slightly (resp. much) fuller treatment, and background on strict higher categories, see Leinster [13] (resp. [5]).

3.1 Globular Operads and Weak ω-Categories
A globular set is a presheaf on the category G generated by arrows

0 ⇉ 1 ⇉ 2 ⇉ · · ·   (sn, tn : n → n + 1)

subject to the equations ss = ts, st = tt (omitting subscripts on the arrows, as usual). We thus have the category Ĝ := [Gop, Sets] of globular sets and natural transformations between them. More generally, a globular object in a category C is a functor A• : Gop → C.
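A minimal executable encoding of (finite) globular sets, with a check of the parallelism conditions implied by ss = ts, st = tt (hypothetical class, for illustration):

```python
# Cells carry a dimension; source/target of an (n+1)-cell are n-cells,
# and for dimension >= 2 the source and target must be parallel cells.

class GlobularSet:
    def __init__(self):
        self.dim, self.src, self.tgt = {}, {}, {}

    def add(self, name, dim, s=None, t=None):
        self.dim[name] = dim
        if dim > 0:
            self.src[name], self.tgt[name] = s, t

    def check(self):
        for c, d in self.dim.items():
            if d >= 2:
                s, t = self.src[c], self.tgt[c]
                # parallelism of s and t: same source, same target
                assert self.src[s] == self.src[t]
                assert self.tgt[s] == self.tgt[t]
        return True

X = GlobularSet()
X.add("a", 0); X.add("b", 0)
X.add("f", 1, "a", "b"); X.add("g", 1, "a", "b")
X.add("alpha", 2, "f", "g")
assert X.check()
```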
Explicitly, a globular set A• has a set An of "n-cells" for each n ∈ N, and each (n + 1)-cell x has parallel source and target n-cells s(x), t(x), as illustrated in the first line of Fig. 1. (Cells x, y of dimension > 0 are parallel if s(x) = s(y) and t(x) = t(y); all 0-cells are considered parallel.) For parallel x, y ∈ An, we write A(x, y) := {z ∈ An+1 | s(z) = x, t(z) = y}, the set of (n + 1)-cells from x to y.

Example 1. For any topological space X, there is a globular set Πω(X) in which 0-cells are points of X, 1-cells are paths between points, 2-cells are homotopies between paths keeping endpoints fixed, and in general, n-cells are suitable maps H : [0, 1]ⁿ → X, viewed as homotopies between (n − 1)-cells.

Example 2. For any type A in a type theory T, the contexts

x0, y0 : A, x1, y1 : A¹(x0, y0), . . . , z : Aⁿ(x0, y0; . . . ; xn−1, yn−1),

along with their dependent projections, form a globular object A• in C(T).

Any strict ω-category (as sketched in the Introduction) has an evident underlying globular set, and in fact there is an adjunction (moreover monadic) F : Ĝ ⇄ str-ω-Cat : U, giving rise to the "free strict ω-category" monad T on Ĝ. Cells of T A• are free (strictly associative) pastings-together of cells from A•, including degenerate pastings using the identity cells of F(A•) (as shown in Fig. 2).
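The contexts of Example 2 can be generated mechanically (hypothetical helper; it expands Aⁿ to its defining Id-type rather than keeping the Aⁿ abbreviation):

```python
# Print the context x0, y0 : A, x1, y1 : Id_A(x0, y0), ... of Example 2.

def iterated_id_types(A, n):
    tys = [A]
    for k in range(1, n + 1):
        tys.append(f"Id_{{{tys[-1]}}}(x{k-1}, y{k-1})")
    return tys

def id_context(A, n):
    tys = iterated_id_types(A, n)
    return ", ".join(f"x{k}, y{k} : {tys[k]}" for k in range(n + 1))

assert iterated_id_types("A", 2)[2] == "Id_{Id_{A}(x0, y0)}(x1, y1)"
assert id_context("A", 1) == "x0, y0 : A, x1, y1 : Id_{A}(x0, y0)"
```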
[Fig. 2 shows four labelled pasting diagrams: a single 0-cell a ∈ T X₀; a degenerate pasting 1_a ∈ T X₁; a composable pair g ·₀ h ∈ T X₁; and a two-dimensional pasting 1_f ·₀ α ·₀ (β ·₁ β′) ∈ T X₂.]

(for a, b, … ∈ A₀, f, g, … ∈ A₁, α, β, … ∈ A₂)

Fig. 2. Some labelled pasting diagrams, elements of a free strict ω-category T A•
In particular, T1 (where 1 denotes the terminal globular set, with just one cell of each dimension) consists informally of pastings of this sort, but without labels on the cells. This is the crucial globular set of pasting diagrams. Every pasting diagram π ∈ T1_n has an associated globular set π̂: intuitively, the set of cells appearing in π, as shown in our pictures of pasting diagrams throughout. Taking categories of elements then gives a category ∫π := ∫_G π̂, with objects the cells of π̂ and arrows into each cell c from its sources and targets s^k(c), t^k(c), and with a functor dim : ∫π → G giving the dimension of each cell; ∫π may be seen as the shape of the canonical diagram of basic cells whose colimit in Ĝ gives π̂.
Weak ω-Categories from Intensional Type Theory
179
For more discussion of these various ways of looking at a pasting diagram, see Street [14], or the extended version of this paper [7].

A globular operad is a globular set P with maps a : P → T1 ("arity"), e : 1 → P ("units"), and m : T P ×_{T1} P → P ("composition"), such that the evident diagrams over T1 commute (i.e. e and m are maps over T1), satisfying the axioms

m · (η · e × 1_P) = 1_P = m · (η × e) : P → P,

m · (μ × m) = m · (T m × 1_P) : T²P ×_{T²1} T P ×_{T1} P → P.
Considering the fibers of a, we may view P as a family of sets P(π) of "π-ary operations" for each π ∈ T1: an element p of P(π), for π ∈ T1_n, is seen as a formal operation symbol, taking π-shaped labelled pasting diagrams as input and returning n-cells as output. The map e then gives us an n-cell "identity" operation for each n, while m allows us to compose operations appropriately.

A map f : P → Q of globular operads is a map of underlying globular sets commuting with a, e and m. An action of a globular operad P on a globular set X is a composition map c : T X ×_{T1} P → X, satisfying

c · (η × e) = 1_X : X → X,

c · (μ × m) = c · (T c × 1_P) : T²X ×_{T²1} T P ×_{T1} P → X.
Informally, this implements the "operations" of P as actual operations on X. An element of T X ×_{T1} P over some π ∈ T1_n is a π-shaped diagram x with labels from X, together with a π-ary operation p of P; c tells us how to apply p to x, yielding a single n-cell of X.

A P-algebra is a globular set X together with an action of P on X. A map f : X → Y of P-algebras is a map of globular sets commuting with the P-actions. We denote the resulting category by P-Alg.

Example 3. The globular set T1 is itself trivially an operad (indeed, the terminal one), with a = 1_{T1}, i.e. T1(π) = 1 for every π; a T1-algebra is then exactly a strict ω-category. This fits with our description above of a strict ω-category having a unique composition for each pasting diagram.

Weak ω-categories will also be described as algebras for a certain globular operad; to find a suitable operad, we need to specify a little extra structure.
A contraction on a map d : X → Y of globular sets is a choice of liftings for fillers of parallel pairs: that is, for each parallel pair x, x′ ∈ X (with the convention that all 0-cells are parallel), a map χ_{x,x′} : Y(dx, dx′) → X(x, x′) such that d · χ = 1_Y. A globular operad with contraction is a globular operad P with a contraction on the map a : P → T1; this ensures both that enough composition operations exist in P, and that the operations will be associative up to cells of the next dimension, themselves satisfying appropriate coherence laws up to yet higher cells, and so on. It is shown in [5] that the category of globular operads with contraction has an initial object L; this gives the key definition:

Definition 1. A weak ω-category is an L-algebra, where L is the initial operad-with-contraction.

A map O → P of operads induces a functor P-Alg → O-Alg; so if we have an algebra X for any operad P with contraction, the unique operad-with-contraction map L → P endows X with the structure of a weak ω-category.

Example 4. The terminal operad T1 has a trivial contraction, giving a canonical functor str-ω-Cat → wk-ω-Cat.

Example 5. For any space X, the set Π_ω(X) of Example 1 may be naturally made into a weak ω-category, the fundamental weak ω-groupoid of X [5, 9.2.7].

3.2 Endomorphism Operads and More General Actions
Proposition 3. Suppose X• : G^op → C is a globular object in a category C, and for each pasting diagram π we are given a chosen limit X_π = lim_{c ∈ ∫π} X_{dim c}, an object of "diagrams of shape π in X•". Then there is an operad End_C(X•), or just End(X•), the endomorphism globular operad of X•, in which (for π ∈ T1_n) an element of End(X•)(π) is a sequence of maps ((σ_i, τ_i)_{0 ≤ i < n}; ρ),

ρ : X_π → X_n,   σ_i : X_{s^{n−i}π} → X_i,   τ_i : X_{t^{n−i}π} → X_i,

commuting appropriately with the source and target maps — in other words, a way of composing each diagram of shape π in X• to an n-cell of X•, extending given ways of composing the source and target in each lower dimension. (More abstractly, such a ((σ_i, τ_i)_{0 ≤ i < n}; ρ) may be described as a natural transformation between a certain pair of diagrams (G/n)^op → C.)

Moreover, if F : C → D is a functor preserving the limits involved, we can also construct End(F X), and there is a natural map End(X) → End(F X).
An illustration may be useful here: the definition of X_π says, for instance, that when π is the pasting diagram consisting of two 0-composable 2-cells, X_π is the limit of the diagram of source and target maps from X₂ and X₁ down into X₀ determined by the cells of π̂, so that

X_π ≅ X₂ ×_{X₀} X₂,

giving the object of 0-composable pairs of 2-cells in X.

In the case C = Sets, the sets X_π are precisely the fibers of the map T! : T X → T1, so this description of End(X•) can be shown to agree with the definition for globular sets given in [5, 6.4]. We will not give a concrete description of the operad composition maps, noting only that they are "exactly what one would expect".

Proof. See the extended version of this paper [7].
We can now extend the definitions of the previous subsection. An action of an operad P on X• is a map of operads P → End(X•). (If C = Sets, then this agrees with our earlier notion of an action on a globular set, by [5, 6.4].) A P-algebra in a category C is a globular object in C together with a P-action; an internal weak ω-category in C is an L-algebra in C.

Moreover, an action of P on X• induces an action of P on the globular set C(Y, X•) for any Y ∈ C, since C(Y, −) : C → Sets preserves all limits, and hence we have maps P → End(X•) → End(C(Y, X•)). (This construction of the endomorphism globular operad generalises the case given in [5, 9.2.7].)
4 The Contractible Globular Operad P_{ML^Id}

In this section, we construct the promised operad P_{ML^Id}, which will act on any type; we then show that it is contractible, and describe (in the main theorem) how it acts to give the desired weak ω-category structures on types.

4.1 Construction of P_{ML^Id}
We saw above that for a type A in a type theory T extending ML^Id, the contexts

x₀, y₀ : A, x₁, y₁ : A₁(x₀, y₀), …, z : A_n(x₀, y₀, …, x_{n−1}, y_{n−1}),

together with the dependent projections between them, form a "globular context" A• : G^op → C(T). Using the machinery of the previous section, it is now
easy to describe P_{ML^Id}: it is End_{C(ML^Id[X])}(X•). However, since C(T) does not have chosen limits in general, to show the existence of the operad End_{C(T)}(A•) we must construct contexts Γ_π providing the limits required by Proposition 3.

Accordingly, suppose we are given π ∈ T1_n, with associated globular set π̂. There are various ways of putting a total order on the i-cells of π̂ for each i ≤ n; pick one such. (There is in fact a canonical choice of such orderings; see [7] for details. This choice has some extra cosmetic properties which will later spare us some use of Exch rules, so for simplicity we will assume it is the ordering chosen.) Then take Γ_π to be the context

(x_c : A)_{c ∈ π̂₀}, (x_c : A₁(x_{s(c)}, x_{t(c)}))_{c ∈ π̂₁}, …, (x_c : A_n(x_{s^n(c)}, x_{t^n(c)}; …; x_{s(c)}, x_{t(c)}))_{c ∈ π̂_n}.
For instance, Γ_{(• → • → •)} is the context

x, y, z : A, p : Id_A(x, y), q : Id_A(y, z),

which we met back in Subsection 1.1. Note that we also have projections src : Γ_π → Γ_{s(π)}, and tgt similarly.
Lemma 1. These contexts exhibit the limits required by Proposition 3; that is, Γ_π = lim_{c ∈ ∫π} A_{dim c}, with the obvious projection morphisms. Moreover, if F : T → S is a translation of type theories, then C(F) : C(T) → C(S) preserves these limits.

Proof. By Proposition 2, on limits in syntactic categories.
Thus, by Proposition 3 (the construction of endomorphism operads), we see:

Theorem 1. The globular object A• in C(T) has an endomorphism operad End_{C(T)}(A•); and if F : T → S is a translation of type theories, there is an induced map of operads End_{C(T)}(A•) → End_{C(S)}(F A•).

Let us unfold what this operad P := End_{C(T)}(A•) actually looks like. For π ∈ T1_n, an element of P(π) consists of a map ρ : Γ_π → A_n in C(T), and for 0 ≤ k < n, maps σ_k : Γ_{s^{n−k}(π)} → A_k and τ_k : Γ_{t^{n−k}(π)} → A_k, commuting with the dependent projections. So, concretely, an element of P(π) (a composition law for π) is a sequence of terms ⃗ρ = ((σ_i, τ_i)_{0≤i<n}; ρ) with

⃗x : Γ_{s^n(π)} ⊢ σ₀(⃗x) : A,
⃗x : Γ_{t^n(π)} ⊢ τ₀(⃗x) : A,
⋮
⃗x : Γ_{s(π)} ⊢ σ_{n−1}(⃗x) : Id(σ_{n−2}(src(⃗x)), τ_{n−2}(tgt(⃗x))),
⃗x : Γ_{t(π)} ⊢ τ_{n−1}(⃗x) : Id(σ_{n−2}(src(⃗x)), τ_{n−2}(tgt(⃗x))),
⃗x : Γ_π ⊢ ρ(⃗x) : Id(σ_{n−1}(src(⃗x)), τ_{n−1}(tgt(⃗x))).
The source of this is then the composition law (σ₀, τ₀, …, σ_{n−2}, τ_{n−2}; σ_{n−1}) ∈ P(s(π)), and the target is (σ₀, τ₀, …, σ_{n−2}, τ_{n−2}; τ_{n−1}) ∈ P(t(π)).

For a typical example, consider the case π = (• → • → •). Then Γ_π is the context x, y, z : A, p : Id_A(x, y), q : Id_A(y, z), and we may give a composition law by taking σ₀(x) = x, τ₀(z) = z, and taking ρ to be some term with

x, y, z : A, p : Id(x, y), q : Id(y, z) ⊢ ρ : Id(x, z),

as described in Subsection 1.1.

Definition 2. As a special case of the above construction, we take P_{ML^Id} := End_{C(ML^Id[X])}(X•), the operad of all definable composition laws on a generic type.

4.2 Contractibility of P_{ML^Id}
For general T, A, we cannot expect End_{C(T)}(A•) to be contractible: contractibility implies (at least) that any two elements of End_{C(T)}(A•)(•), i.e. any two terms x : A ⊢ τ, τ′ : A, are connected by an element of End_{C(T)}(A•)(• → •), i.e. are propositionally equal, which clearly may fail. However, in the specific case of P_{ML^Id}, we do wish to show contractibility, since this is the operad which naturally acts on any type.

What precisely does contractibility mean here? For every pasting diagram π and every parallel pair of composition laws ⃗σ ∈ P_{ML^Id}(s(π)), ⃗τ ∈ P_{ML^Id}(t(π)), we need to find some filler ⃗ρ ∈ P_{ML^Id}(π), with s(⃗ρ) = ⃗σ, t(⃗ρ) = ⃗τ.

Given π ∈ T1_n, such a parallel pair amounts to terms (σ_i, τ_i)_{0≤i<n} as above, and a filler is then a term ρ with

⃗x : Γ_π ⊢ ρ(⃗x) : Id(σ_{n−1}(src(⃗x)), τ_{n−1}(tgt(⃗x))).

Playing with small examples (the reader is encouraged to try!) suggests one should be able to do this by applying Id-elim (possibly repeatedly, working bottom-up as usual) to the variables of identity types in Γ_π. Id-elim says that to obtain ρ, it is enough to obtain it in the case where one of the k-cell variables (with k the highest dimension appearing) is of the form r(−), and its source and target (k−1)-cell variables are equal; and by repeated application, it is enough to obtain ρ in the case where multiple higher cells have had identities plugged in in this way.

Now, since the terms σ_i, τ_i have themselves been built up from just the Id-rules, as we plug r(−) terms into them and identify the lower variables, they should sooner or later collapse by Id-comp to be of the form r^i(x) themselves. In particular, once we have applied Id-elim as far as possible, identifying all the variables of type X to a single x : X and plugging in r^i(x) for all the higher variables, the σ_i, τ_i should all compute down to r^i(x), giving in particular σ_{n−1} = τ_{n−1} = r^{n−1}(x), so we can take the desired filler to be

x : X ⊢ r^n(x) : Id(r^{n−1}(x), r^{n−1}(x)).
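This bottom-up use of Id-elim can be replayed in a small case in Lean (an illustration in our own notation, with Lean's `cases` tactic playing the role of the J rule): building a composition law for the pasting diagram • → • → •, and watching it collapse to reflexivity once identity cells are plugged in.

```lean
-- A composition law for • → • → •, built by Id-elim on q:
def comp {A : Type} {x y z : A} (p : x = y) (q : y = z) : x = z := by
  cases q; exact p

-- Id-comp: with identity cells plugged in, the law computes to r(x) itself.
example {A : Type} (x : A) : comp (rfl : x = x) rfl = rfl := rfl
```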
Below, we formalise this argument. The crucial lemma is that the context x : X is an initial object in C(ML^Id[X]); this expresses the fact that, since any context Γ in ML^Id[X] is built up from X and its higher identity types, there is always a unique way to plug in terms r^i(x) (i ≥ 0) for all its variables, and ensures that if σ : Γ′ → Γ is a context morphism and we plug in r^i(x)'s for all the variables of σ, the result must reduce to consist of just r^i(x)'s.

Lemma 2. The context x : X is an initial object in C(ML^Id[X]); that is, for any closed context Γ there is a unique context morphism r_Γ : (x : X) → Γ.

Remark 1. This lemma does not generally hold in extensions of ML^Id[X]; if we add Π-types, for instance, it is easily seen to be false.
Proof. We work by structural induction (as, essentially, we must, since this is a property of the theory ML^Id[X] which can fail in extensions of it). So, given any derivation δ of a judgement J in ML^Id[X], we derive another judgement J′, with form depending on that of J as follows:

from ⃗y : Γ ⊢ A(⃗y) type, derive x : X ⊢ r_{Γ⊢A} : A(r_Γ);
from ⃗y : Γ ⊢ A(⃗y) = A′(⃗y) type, derive x : X ⊢ r_{Γ⊢A} = r_{Γ⊢A′} : A(r_Γ);
from ⃗y : Γ ⊢ τ(⃗y) : A(⃗y), derive x : X ⊢ τ(r_Γ) = r_{Γ⊢A} : A(r_Γ);
from ⃗y : Γ ⊢ τ(⃗y) = τ′(⃗y) : A(⃗y), nothing is needed.

The context morphisms r_Γ : (x : X) → Γ are built of the terms r_{Γ⊢A}, by r_{Γ, y:A} = (r_Γ, r_{Γ⊢A}). The judgements above ensure that this is well-typed, and is the unique context morphism from (x : X) to Γ.

The induction is essentially routine. As usual, we work by cases depending on the last rule of δ, and our cases for Subst- and Wkg-rules ensure that the terms r_{Γ⊢A} constructed depend only on the judgement ⃗y : Γ ⊢ A(⃗y) type, and not on the specific derivation δ thereof. Some sample cases of the induction are given in the extended version [7].

We are now ready to show that P_{ML^Id} is contractible, arguing along the lines sketched above.

Theorem 2. The operad P_{ML^Id} is contractible.

Proof. As described above, this amounts to the statement: for every n ∈ N and pasting diagram π ∈ T1_n, and every sequence (σ_i, τ_i)_{i<n} such that the judgements

⃗x : Γ_{s^{n−i}(π)} ⊢ σ_i(⃗x) : X_i(σ₀(src^i(⃗x)), …, τ_{i−1}(tgt(⃗x))),
⃗x : Γ_{t^{n−i}(π)} ⊢ τ_i(⃗x) : X_i(σ₀(src^i(⃗x)), …, τ_{i−1}(tgt(⃗x)))
(for all i < n) are derivable in ML^Id[X], we can find a "filler", i.e. a term ρ with

⃗x : Γ_π ⊢ ρ(⃗x) : X_n(σ₀(src^n(⃗x)), …, τ_{n−1}(tgt(⃗x))).

We show this by induction on the number of cells in π.

Suppose π has more than one cell. Then it must have some cells in dimension > 0. Let k be the highest dimension in which π has cells, and let c be the last cell in π̂_k (using our chosen ordering on π̂_k). Let π⁻ ∈ T1_n be the pasting diagram whose globular set is obtained from that of π by removing c and identifying s(c) and t(c).

Now Γ_{π⁻} is equal, up to renaming of variables, to the context obtained from Γ_π by removing the variables x^k_c and x^{k−1}_{t(c)} and replacing any occurrences of the latter in subsequent types by x^{k−1}_{s(c)} (at least, the contexts are equal if we used the canonical choices of orderings; otherwise, we have canonical context isomorphisms given by re-ordering variables); and we have a natural context morphism h : Γ_{π⁻} → Γ_π, given by plugging in x^{k−1}_{s(c)} for x^{k−1}_{t(c)} and r(x^{k−1}_{s(c)}) for x^k_c; and these are exactly right for
⃗x : Γ_{π⁻} ⊢ ρ⁻(⃗x) : X_n(σ₀(src^n(h(⃗x))), …, τ_{n−1}(tgt(h(⃗x))))
———
⃗x : Γ_π ⊢ J_{x^{k−1}_{s(c)}. ρ⁻}(x^{k−1}_{s(c)}, x^{k−1}_{t(c)}, x^k_c) : X_n(σ₀(src^n(⃗x)), …, τ_{n−1}(tgt(⃗x)))

to be an instance of Id-elim⁺. So to give the desired filler ρ, it is enough to give ρ⁻ with

⃗x : Γ_{π⁻} ⊢ ρ⁻(⃗x) : X_n(σ₀(src^n(h(⃗x))), …, τ_{n−1}(tgt(h(⃗x)))).

But now note that

s^{n−i}(π⁻) = s^{n−i}(π) for i < k,   s^{n−i}(π⁻) = (s^{n−i}(π))⁻ for i ≥ k,

and similarly for t^{n−i}(π⁻); moreover, we can construct context morphisms

h_{s,i} : Γ_{s^{n−i}(π⁻)} → Γ_{s^{n−i}(π)}

(analogous to h if i ≥ k, and just the identity otherwise), and these commute with the maps src and tgt. So for each i < n, we have

⃗x : Γ_{s^{n−i}(π⁻)} ⊢ σ_i(h(⃗x)) : X_i(σ₀(h(src^i(⃗x))), …, τ_{i−1}(h(tgt(⃗x)))),
⃗x : Γ_{t^{n−i}(π⁻)} ⊢ τ_i(h(⃗x)) : X_i(σ₀(h(src^i(⃗x))), …, τ_{i−1}(h(tgt(⃗x)))),

i.e. the sequence of terms (h*(σ_i), h*(τ_i))_{i<n} is a parallel pair of composition laws over π⁻, so the inductive hypothesis supplies the required ρ⁻. (In the base case, where π has just one cell, Γ_π is the context x : X, and Lemma 2 gives the filler.)
Unwinding this induction, we can see that it exactly formalises the process described at the start of Subsection 4.2. Note that Lemma 2 was applied only at the base case of the induction, and only to show that terms x : X ⊢ σ : Id(r^n(x), r^n(x)) must be equal to r^{n+1}(x). A sufficiently general normalisation result could also be used to show this, resting on showing that these are the only appropriate normal forms; this would have the advantage of extending to the operad End_{ML^I[X]}(X•) of all composition laws of the full type theory. However, the present approach seems more economical, and more directly justifies the original intuition.

4.3 Types as Weak ω-Categories
Putting the above results together, we obtain our main goal:

Theorem 3. Let T be any type theory extending the fragment ML^Id, Γ any closed context of T, and A a type over Γ. Then the globular context A• carries the structure of a weak ω-category in C(T/Γ).

Proof. By Proposition 1, there is a unique translation A : ML^Id[X] → T/Γ taking X to A, and hence taking X• to A•. By Proposition 3, this induces an action of P_{ML^Id} on A•, and so, since by Theorem 2 P_{ML^Id} admits a contraction, an action of L (the initial operad-with-contraction) on A•, as desired.

Corollary 1. Let T, Γ, A be as above, and Δ a context over Γ in T. Then the globular set C(T/Γ)(Δ, A•) of context maps

f : Γ, Δ → Γ, x₀, y₀ : A, …, x_{n−1}, y_{n−1} : A(x₀, …, y_{n−2}), z : A(x₀, …, y_{n−1})

over Γ has the structure of a weak ω-category, naturally in Δ.
References

1. Hofmann, M., Streicher, T.: The groupoid interpretation of type theory. In: Twenty-Five Years of Constructive Type Theory (Venice, 1995). Oxford Logic Guides, vol. 36, pp. 83–111. Oxford Univ. Press, New York (1998)
2. Gambino, N., Garner, R.: The identity type weak factorisation system. Theoretical Computer Science (to appear) (2008) arXiv:0808.2122
3. Garner, R.: 2-dimensional models of type theory. Mathematical Structures in Computer Science (to appear) (2008) arXiv:0808.2122
4. Awodey, S., Warren, M.A.: Homotopy theoretic models of identity types. Math. Proc. of the Cam. Phil. Soc. (to appear) (2008) arXiv:0709.0248
5. Leinster, T.: Higher Operads, Higher Categories. London Mathematical Society Lecture Note Series, vol. 298. Cambridge University Press, Cambridge (2004) arXiv:math/0305049
6. Batanin, M.A.: Monoidal globular categories as a natural environment for the theory of weak n-categories. Adv. Math. 136(1), 39–103 (1998)
7. Lumsdaine, P. LeF.: Weak ω-categories from intensional type theory (extended version) (2008) arXiv:0812.0409
8. van den Berg, B.: Types as weak ω-categories. Lecture delivered in Uppsala, and unpublished notes (2006)
9. Garner, R., van den Berg, B.: Types are weak ω-groupoids (submitted) (2008) arXiv:0812.0298
10. Jacobs, B.: Categorical Logic and Type Theory. Studies in Logic and the Foundations of Mathematics, vol. 141. North-Holland Publishing Co., Amsterdam (1999)
11. Warren, M.A.: Homotopy Theoretic Aspects of Constructive Type Theory. PhD thesis, Carnegie Mellon University (2008)
12. Cartmell, J.: Generalised algebraic theories and contextual categories. Ann. Pure Appl. Logic 32(3), 209–243 (1986)
13. Leinster, T.: A survey of definitions of n-category. Theory Appl. Categ. 10, 1–70 (2002) (electronic) arXiv:math/0107188
14. Street, R.: The petit topos of globular sets. Journal of Pure and Applied Algebra 154, 299–315 (2000)
Relating Classical Realizability and Negative Translation for Existential Witness Extraction

Alexandre Miquel
Université Paris 7 & LIP (ENS Lyon)
[email protected]

Abstract. Friedman showed how to turn a classical proof of a Σ⁰₁ formula into an intuitionistic proof of the same formula, thus giving an effective method to extract witnesses from classical proofs of such formulæ. In this paper we show how to achieve the same goal efficiently using Krivine realizability with primitive numerals, and prove that the corresponding program is but the direct-style equivalent (using call/cc) of the CPS-style program underlying Friedman's method.
1 Introduction
Classical realizability is a powerful framework introduced by Krivine [4,7] to study the proofs-as-programs paradigm in classical logic. Its main feature is that the computational interpretation of proofs is not described via a negative translation, but instead expressed in direct style, using a λ-calculus extended with the control operator call/cc. Although classical realizability is traditionally presented in second-order classical arithmetic, it can be extended to much more expressive logical frameworks, such as Zermelo-Fraenkel set theory [5] or the calculus of constructions with universes [8]. And with the help of extra instructions, it can even provide realizers for several forms of the axiom of choice [5].

The purpose of this paper is twofold. First, it aims at presenting the method that naturally comes with classical realizability in order to extract a witness from a classical proof of a Σ⁰₁-formula. Second, it aims to relate this extraction method with the traditional method introduced by Friedman [2] that combines a negative translation with an intuitionistic realizability interpretation, and to show that through this translation, both extraction methods are basically the same (up to the details in the definition of the negative translation).

For that, we first present Krivine's framework for classical realizability as well as the corresponding witness extraction method, introducing a primitive (and, we believe, more efficient) representation of natural numbers in the language of realizers, instead of using Church numerals. We then define an intuitionistic realizability model for second-order arithmetic as well as a negative translation in the spirit of [9], but extended to primitive numerals.
We finally analyze the witness extraction method of classical realizability through the negative translation, and show that it corresponds to the transformation used by Friedman to prove the conservativity of Peano arithmetic over Heyting arithmetic for Σ⁰₁ (and actually Π⁰₂) formulæ.

P.-L. Curien (Ed.): TLCA 2009, LNCS 5608, pp. 188–202, 2009. © Springer-Verlag Berlin Heidelberg 2009
2 Classical Realizability in Second-Order Logic

2.1 The Language of Second-Order Logic
The language of second-order logic is parameterized by a first-order language of expressions (a.k.a. first-order terms) to represent the individuals. In this paper, we shall only consider arithmetic expressions (notation: e, e′, etc.) that are formed from first-order variables (notation: x, y, z, etc.) and the constant symbol 0 ('zero') using function symbols for all primitive recursive definitions of functions (notation: f, g, h, etc.), including a unary function symbol s for the successor function, binary function symbols + and × for addition and multiplication, and a unary function symbol 'pred' for the predecessor function.

Formulæ of the (minimal) language of second-order logic are formed from second-order variables (notation: X, Y, Z, etc.) of all arities using implication and first- and second-order universal quantification:

Formulæ   A, B ::= X(e₁, …, e_k) | A ⇒ B | ∀x A | ∀X A
The set of all free (first- and second-order) variables of a formula A is written FV(A). The notions of first- and second-order substitution in a formula are defined as usual, and written A{x := e} and A{X(x₁, …, x_k) := B} respectively.

In what follows we shall consider the following (standard) second-order encodings for connectives and first- and second-order existential quantifications, as well as for Leibniz equality:

⊥ ≡ ∀Z Z
¬A ≡ A ⇒ ⊥
A ∧ B ≡ ∀Z ((A ⇒ B ⇒ Z) ⇒ Z)
A ∨ B ≡ ∀Z ((A ⇒ Z) ⇒ (B ⇒ Z) ⇒ Z)
∃x A(x) ≡ ∀Z (∀x (A(x) ⇒ Z) ⇒ Z)
∃X A(X) ≡ ∀Z (∀X (A(X) ⇒ Z) ⇒ Z)
e = e′ ≡ ∀Z (Z(e) ⇒ Z(e′))

(where Z is a fresh variable).
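These impredicative encodings can be checked directly in any system with an impredicative universe; the following Lean sketch is our own illustration, with `Prop` standing in for the second-order layer (the names `Conj`, `Disj`, `Ex`, `Leib` are ours, not the paper's).

```lean
-- Standard second-order (impredicative) encodings, stated in Prop:
def Bot : Prop := ∀ Z : Prop, Z
def Conj (A B : Prop) : Prop := ∀ Z : Prop, (A → B → Z) → Z
def Disj (A B : Prop) : Prop := ∀ Z : Prop, (A → Z) → (B → Z) → Z
def Ex {α : Type} (A : α → Prop) : Prop := ∀ Z : Prop, (∀ x, A x → Z) → Z
def Leib {α : Type} (e e' : α) : Prop := ∀ Z : α → Prop, Z e → Z e'

-- Pairing realizes the conjunction encoding:
example (A B : Prop) (a : A) (b : B) : Conj A B := fun _ k => k a b
-- Leibniz equality is reflexive:
example {α : Type} (e : α) : Leib e e := fun _ h => h
```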
2.2 A Type System for Second-Order Logic
We now define a type system for classical second-order logic, based on a judgment of the form Γ ⊢_NK t : A, where Γ is a typing context, t a proof-term and A a formula of the language defined above. Here, proof-terms (notation: t, u, etc.) are simply the pure λ-terms enriched with a special constant written cc ('call/cc'), to prove Peirce's law. Typing contexts (notation: Γ, Γ′, etc.) are finite functions from proof-variables to formulæ. The inference rules of this system are given in Fig. 1. These rules contain an axiom rule, introduction and elimination rules for implication and first- and second-order universal quantification, plus a typing rule for cc (Peirce's axiom). The semantics of this system is given by the classical realizability model we are going to define now.
190
A. Miquel
(Ax)       Γ ⊢_NK x : A   (if (x : A) ∈ Γ)
(Peirce)   Γ ⊢_NK cc : ((A ⇒ B) ⇒ A) ⇒ A
(⇒-intro)  from Γ, x : A ⊢_NK t : B, derive Γ ⊢_NK λx.t : A ⇒ B
(⇒-elim)   from Γ ⊢_NK t : A ⇒ B and Γ ⊢_NK u : A, derive Γ ⊢_NK tu : B
(∀¹-intro) from Γ ⊢_NK t : A, derive Γ ⊢_NK t : ∀x A   (x ∉ FV(Γ))
(∀¹-elim)  from Γ ⊢_NK t : ∀x A, derive Γ ⊢_NK t : A{x := e}
(∀²-intro) from Γ ⊢_NK t : A, derive Γ ⊢_NK t : ∀X A   (X ∉ FV(Γ))
(∀²-elim)  from Γ ⊢_NK t : ∀X A, derive Γ ⊢_NK t : A{X(x₁, …, x_k) := B}

Fig. 1. Typing rules for classical second-order logic
2.3 A Calculus of Realizers

Krivine's classical realizability model [7] is based on a much larger calculus than the calculus of proof-terms described in 2.2. Instead, the language λc distinguishes three distinct syntactic categories: terms, stacks and processes. Terms (notation: t, u, etc.) and stacks (notation: π, π′, etc.) are defined by mutual induction as follows:
Terms    t, u ::= x | λx.t | tu | κ | k_π
Stacks   π ::= α | t·π   (t closed)
Terms are the pure λ-terms enriched with constants for every instruction (notation: κ, κ′, etc.) of a fixed instruction set K that contains (at least) the instruction cc, plus a continuation constant k_π for every stack π. Stacks are finite lists of closed terms terminated by a stack constant¹ (notation: α, β, etc.). Note that unlike terms (which may be open or closed), stacks only contain closed terms and are thus closed objects, so that the continuation constant k_π associated to every stack π is really a constant. Finally, a process (notation: p, q, etc.) is a pair written t ⋆ π, formed by a closed term t and a stack π:

Processes   p, q ::= t ⋆ π   (t closed)

The set of closed terms (resp. the set of stacks) is written Λc (resp. Π), and the set of processes is written Λc ⋆ Π.
¹ Since the witness extraction method we present in this paper (in Section 4) does not make use of the constant at the bottom of the stack, it is safe to assume here that there is only a single stack constant 'nil'. However, the presence of several stack constants is sometimes desirable when working with extra instructions that make use of them, such as the 'clock' instruction [5,7].
2.4 Evaluation
The set of processes is equipped with a binary relation p ≻ p′ of evaluation, which satisfies (at least) the following axioms:

(Grab)      λx.t ⋆ u·π ≻ t{x := u} ⋆ π
(Push)      tu ⋆ π ≻ t ⋆ u·π
(Call/cc)   cc ⋆ t·π ≻ t ⋆ k_π·π
(Resume)    k_π ⋆ t·π′ ≻ t ⋆ π
for all t, u ∈ Λc and π, π′ ∈ Π. Note that only processes are subject to evaluation: there is no notion of reduction for either terms or stacks in λc.
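The four evaluation axioms can be prototyped directly. The following small Python interpreter is an illustrative sketch, not taken from the paper: it uses an ad-hoc tuple representation of terms with de Bruijn indices, and environments in place of literal substitution, but each branch mirrors one axiom of the machine.

```python
# A minimal Krivine abstract machine with call/cc.
#   ("var", i)      de Bruijn variable
#   ("lam", t)      abstraction
#   ("app", t, u)   application
#   ("cc",)         the call/cc instruction
#   ("cont", pi)    a continuation constant k_pi
# Stacks are tuples of closures (term, environment).

def run(term, stack=()):
    t, env, pi = term, (), stack          # a process: closure against a stack
    while True:
        tag = t[0]
        if tag == "app":                  # Push: tu * pi > t * u.pi
            pi = ((t[2], env),) + pi
            t = t[1]
        elif tag == "lam":                # Grab: (lam t) * u.pi > t{x:=u} * pi
            if not pi:
                return (t, env, pi)       # final state
            c, pi = pi[0], pi[1:]
            env = (c,) + env
            t = t[1]
        elif tag == "var":                # fetch the closure bound to index i
            t, env = env[t[1]]
        elif tag == "cc":                 # Call/cc: cc * t.pi > t * k_pi.pi
            if not pi:
                return (t, env, pi)
            (u, uenv), rest = pi[0], pi[1:]
            t, env = u, uenv
            pi = ((("cont", rest), ()),) + rest
        else:                             # Resume: k_pi * u.pi' > u * pi
            (u, uenv) = pi[0]
            t, env, pi = u, uenv, t[1]

I = ("lam", ("var", 0))
# cc (\k. k) applied to I: the captured continuation is applied immediately,
# and the machine halts on the identity against the empty stack.
final = run(("app", ("app", ("cc",), ("lam", ("var", 0))), I))
```

The state returned by `run` is the process at which evaluation stops; no rule applies to an abstraction facing the empty stack, matching the fact that evaluation is defined only on processes.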
2.5 The Realizability Interpretation
The construction of the classical realizability model is parameterized by a set of processes ⊥⊥ ⊆ Λc ⋆ Π (the 'pole'), which we assume to be saturated (or closed under anti-evaluation) in the sense that the conditions p ≻ p′ and p′ ∈ ⊥⊥ together imply p ∈ ⊥⊥, for all p, p′ ∈ Λc ⋆ Π.

We call a falsity value any set of stacks S ⊆ Π. By orthogonality, every falsity value S ⊆ Π induces a truth value S^⊥⊥ ⊆ Λc defined as:

S^⊥⊥ = {t ∈ Λc : ∀π ∈ S, t ⋆ π ∈ ⊥⊥}.

A valuation is a function ρ that maps every first-order variable x to a natural number ρ(x) ∈ N, and every second-order variable X of arity k to a falsity value function ρ(X) : N^k → P(Π). A parametric expression (resp. a parametric formula) is simply an expression e (resp. a formula A) equipped with a valuation ρ, which we write e[ρ] (resp. A[ρ]). Parametric contexts are defined similarly. For every parametric expression e[ρ] we write Val(e[ρ]) ∈ N for the value of e[ρ], interpreting variables by their images in ρ while giving the primitive recursive function symbols in e their standard interpretation.

Every parametric formula A[ρ] is interpreted as two sets, namely a falsity value ‖A[ρ]‖ ⊆ Π and a truth value |A[ρ]| ⊆ Λc. Both sets are defined by induction on A as follows:

‖X(e₁, …, e_k)[ρ]‖ = ρ(X)(Val(e₁[ρ]), …, Val(e_k[ρ]))
‖(A ⇒ B)[ρ]‖ = |A[ρ]| · ‖B[ρ]‖ = {t·π : t ∈ |A[ρ]|, π ∈ ‖B[ρ]‖}
‖(∀x A)[ρ]‖ = ⋃_{n ∈ N} ‖A[ρ; x ← n]‖
‖(∀X A)[ρ]‖ = ⋃_{F : N^k → P(Π)} ‖A[ρ; X ← F]‖
|A[ρ]| = ‖A[ρ]‖^⊥⊥ = {t ∈ Λc : ∀π ∈ ‖A[ρ]‖, t ⋆ π ∈ ⊥⊥}

Since the truth value |A[ρ]| and the falsity value ‖A[ρ]‖ actually depend on the parameter ⊥⊥, we shall sometimes use the notations |A[ρ]|_⊥⊥ and ‖A[ρ]‖_⊥⊥ to indicate this dependency explicitly. In what follows, we shall write
– t ⊩ A[ρ] ('t realizes A[ρ]') when t ∈ |A[ρ]|_⊥⊥ (keeping in mind that this notion depends on the choice of the pole ⊥⊥);
– t ⊫ A[ρ] ('t universally realizes A[ρ]') when t ∈ |A[ρ]|_⊥⊥ for all saturated sets ⊥⊥ ⊆ Λc ⋆ Π.

2.6 Adequacy
We call a substitution any finite function from proof-variables to closed λc-terms, and write t[σ] for the closed term obtained by applying a substitution σ to a term t. Given a substitution σ and a parametric context Γ[ρ], we write σ ⊩ Γ[ρ] when dom(Γ) ⊆ dom(σ) and σ(x) ⊩ A[ρ] for all (x : A) ∈ Γ. We say that:

– A judgment Γ ⊢_NK t : A is sound (w.r.t. the pole ⊥⊥) when for all valuations ρ and for all substitutions σ such that σ ⊩ Γ[ρ], we have t[σ] ⊩ A[ρ].
– An inference rule with premises P₁, …, Pₙ and conclusion C (where P₁, …, Pₙ and C are typing judgments) is sound (w.r.t. the pole ⊥⊥) when the soundness of its premises P₁, …, Pₙ (in the above sense) implies the soundness of its conclusion C.

From these definitions, it is clear that the conclusion of any typing derivation formed with only sound inference rules is sound.

Proposition 1 (Adequacy). The typing rules of Fig. 1 are sound w.r.t. all poles ⊥⊥ ⊆ Λc ⋆ Π.

A consequence of this proposition is that closed proof-terms given by the type system of Fig. 1 provide universal realizers of the corresponding formulæ. (But not all realizers can be detected via typing [7].)
3 From Second-Order Logic to Second-Order Arithmetic

3.1 Extending the Language of Formulæ
We enrich the language of formulæ with a unary predicate constant null(e) (whose name is self-explanatory), plus a syntactic construct {e} ⇒ B (where e is an expression and B a formula) whose semantics will be given in 3.3²:
Formulæ   A, B ::= ⋯ | null(e) | {e} ⇒ B
This extension of the language is accompanied with the shorthands:

⊤ ≡ null(0)
nat(e) ≡ ∀Z (({e} ⇒ Z) ⇒ Z)
∀ᴺx A(x) ≡ ∀x ({x} ⇒ A(x))
∃ᴺx A(x) ≡ ∀Z (∀x ({x} ⇒ A(x) ⇒ Z) ⇒ Z)

We also introduce two congruences, over expressions and over formulæ, written e ≅ e′ and A ≅ A′. The congruence e ≅ e′ over the class of expressions is
² Intuitively, {e} ⇒ B is the type of functions producing a proof of B when applied to the value of e, using the primitive representation of numerals described in 3.2.
the congruence generated by the equational theory of the primitive recursive function symbols which expressions are made of. The congruence A ≅ A′ over the class of formulæ is then defined as the least congruence containing the contextual closure of the congruence e ≅ e′ across atomic formulæ, and satisfying the extra equation null(s(e)) ≅ ⊥ (writing ⊥ ≡ ∀Z Z). Note that from the definition of the propositional constant ⊤, the equation null(0) ≅ ⊤ comes for free.
Adding Primitive Numerals to λc
The instruction set K is enriched with the following instructions:
– For every n ∈ N, a (pseudo-)instruction n̂ ∈ K representing the numeral n as a pure datum. We call the constant n̂ a pseudo-instruction since it comes with no evaluation rule (i.e. of the form n̂ ⋆ π ≻ ···), thus expressing that the constant n̂ is meaningless in function position.
– Two constants s and rec with the reduction rules

s ⋆ n̂ · u · π ≻ u ⋆ (n+1)̂ · π
rec ⋆ u0 · u1 · 0̂ · π ≻ u0 ⋆ π
rec ⋆ u0 · u1 · (n+1)̂ · π ≻ u1 ⋆ n̂ · (rec u0 u1 n̂) · π

for all u, u0, u1 ∈ Λc, n ∈ N and π ∈ Π. With these instructions, it is possible to implement every primitive recursive function f of arity k as a term f̌ with the reduction rule

f̌ ⋆ n̂1 · ... · n̂k · u · π ≻* u ⋆ m̂ · π,

writing m the image of (n1, ..., nk) by f. To improve efficiency, we can also introduce the f̌'s (or some of them) as primitive instructions.

Apart from the representation of numerals as pure data, every natural number n ∈ N can also be represented as a program ň defined by ň ≡ λx . x n̂. (This program will receive the type nat(n) from the type system defined in 3.4.) More generally we call a lazy numeral any closed term t such that t ⋆ u · π ≻* u ⋆ n̂ · π for some n ∈ N (that may depend on u), for all u ∈ Λc and π ∈ Π.
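Read operationally, the two rules for rec implement primitive recursion on the numeral at the head of the stack. As a purely illustrative aid (Python is ours, not part of the calculus), the rules can be transcribed denotationally, with the stack arguments u0 and u1 modelled as Python values and functions:

```python
def rec(u0, u1, n):
    """Denotational reading of the two evaluation rules for rec:
       rec * u0.u1.0^.pi      >  u0 * pi
       rec * u0.u1.(n+1)^.pi  >  u1 * n^.(rec u0 u1 n^).pi
    """
    if n == 0:
        return u0                          # base case: continue with u0
    return u1(n - 1, rec(u0, u1, n - 1))   # step: u1 gets the predecessor and the recursive call

# addition as a primitive recursive function: plus(m, n) = rec n (\p r. s r) m
plus = lambda m, n: rec(n, lambda p, r: r + 1, m)
print(plus(3, 4))  # -> 7
```

Under this reading, implementing a primitive recursive f as a term f̌ amounts to nesting such rec calls, one per recursion in the definition of f.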
3.3 The Extended Realizability Interpretation
The realizability interpretation defined in 2.5 is extended to the new syntactic constructs by letting:

‖null(e)[ρ]‖ = ∅ if Val(e[ρ]) = 0, and ‖null(e)[ρ]‖ = Π otherwise
‖({e} ⇒ B)[ρ]‖ = { n̂ · π : n = Val(e[ρ]), π ∈ ‖B[ρ]‖ }

From the interpretation of the predicate null(e) and from the definitions of the congruences e ≅ e′ and A ≅ A′, we immediately get:
194
A. Miquel
Proposition 2 (Denotations of congruent expressions/formulæ).
1. If e ≅ e′, then Val(e[ρ]) = Val(e′[ρ]) for all valuations ρ;
2. If A ≅ A′, then ‖A[ρ]‖ = ‖A′[ρ]‖ and |A[ρ]| = |A′[ρ]| for all valuations ρ.

3.4 The Extended Type System and Its Adequacy
We first extend the notion of typing context by allowing a second form of declaration x : {e}, where x is a proof-variable and e an arithmetic expression. Given a substitution σ and a parametric context Γ[ρ] (according to the extended definition of contexts), the notation σ ⊩NK Γ[ρ] now means that:
– dom(Γ) ⊆ dom(σ);
– σ(x) ⊩NK A[ρ] for all (x : A) ∈ Γ;
– σ(x) ≡ n̂ where n = Val(e[ρ]), for all (x : {e}) ∈ Γ.
The type system defined in Fig. 1 is extended with the inference rules of Fig. 2.
Γ ⊢NK t : A
─────────────  (A ≅ A′)
Γ ⊢NK t : A′

Γ ⊢NK t : ⊤

Γ ⊢NK s : ∀N x nat(s(x))

Γ ⊢NK rec : ∀X (X(0) ⇒ ∀N x (X(x) ⇒ X(s(x))) ⇒ ∀N x X(x))

Γ, x : {e} ⊢NK t : B
──────────────────────
Γ ⊢NK λx . t : {e} ⇒ B

Γ ⊢NK t : {e} ⇒ B
──────────────────  ((x : {e}) ∈ Γ)
Γ ⊢NK t x : B

Γ ⊢NK t : {sⁿ0} ⇒ B
────────────────────
Γ ⊢NK t n̂ : B

Fig. 2. Typing rules for classical second-order arithmetic
These rules comprise a conversion rule (in the spirit of type theory and deduction modulo), typing rules for the instructions s (2nd Peano axiom) and rec (induction principle), plus typing rules for the construct {e} ⇒ B.

Proposition 3 (Adequacy). The typing rules of Fig. 2 are sound w.r.t. all poles ⊥⊥ ⊆ Λc × Π.

Using these rules, one can derive for instance that ň ≡ λx . x n̂ has type nat(sⁿ0), and more generally build proof-terms for the axioms of arithmetic:

Fact 1. The following judgments are derivable:
1. ⊢NK 0̌ : nat(0)   (1st Peano axiom)
2. ⊢NK s : ∀N x nat(s(x))   (2nd Peano axiom)
3. ⊢NK λz . z : ∀x ∀y (s(x) = s(y) ⇒ x = y)   (3rd Peano axiom)
4. ⊢NK λz . z (λw . w) : ∀x ¬(0 = s(x))   (4th Peano axiom)
5. ⊢NK rec : A(0) ⇒ ∀N x (A(x) ⇒ A(s(x))) ⇒ ∀N x A(x)   (5th Peano axiom)
4 Witness Extraction in Classical Realizability
A fundamental difference between classical realizability and intuitionistic realizability is that in classical realizability we do not evaluate terms but processes, that is, objects formed by combining a proof (the current term) and a counter-proof (the current stack) of the same formula. From a logical point of view, evaluation thus takes place in an inconsistent world, where a proof and a counter-proof can coexist and interact with each other. It is well known that in classical realizability, the truth value |A| of any (parametric) formula A is always inhabited provided the pole ⊥⊥ is not empty.³ On the other hand, the realizability model induced by the empty pole ⊥⊥ = ∅ simply mimics the (full) standard model of PA2, since |A|(⊥⊥=∅) = Λc if A is true (in the standard model), and |A|(⊥⊥=∅) = ∅ otherwise [7].

A consequence of the 'local inconsistency' of classical realizability is that when we get a classical realizer t of an existential formula ∃N x A(x) from which we extract a number n ∈ N and a realizer tn ⊩NK A(n), we can never trust the certificate tn that 'A(n) holds'. Instead, we have to test the proposed (and potentially false) witness to check whether A(n) holds or not (which requires a decision procedure for the predicate A(x)) and repudiate the current witness (to get another witness) as long as the test fails.

Here is how to proceed formally. We assume given a unary primitive recursive symbol f with a universal realizer t0 ⊩NK ∃N x null(f(x)), for instance a universal realizer that comes from a proof in the system described in Figs. 1 and 2. (Any Σ⁰₁ formula can be given this form.) Let f̌ be a term that computes f in the sense of 3.2, that is: a term such that f̌ ⋆ n̂ · u · π ≻* u ⋆ m̂ · π with m = f(n), for all n ∈ N, u ∈ Λc and π ∈ Π. (Such a term is also a universal realizer of ∀N x nat(f(x)).) From the term f̌ let us define

df ≡ λnxy . f̌ n (λ_ . rec x (λ_ . y) _) ≡ λnxy . f̌ n (λp . rec x (λ_ . y) p).
By definition, the term df decides whether f(n) = 0 or not, in the sense that

df ⋆ n̂ · u0 · u1 · π ≻* u0 ⋆ π   if f(n) = 0
df ⋆ n̂ · u0 · u1 · π ≻* u1 ⋆ π   if f(n) ≠ 0

for all n ∈ N, u0, u1 ∈ Λc and π ∈ Π. Let us now form the term

t0′ ≡ t0 (λxy . df x (stop x) y)

where 'stop' is an instruction with no evaluation rule. Intuitively, the argument that is passed to the realizer t0 is a function that extracts a potential witness x ∈ N with a certificate y ⊩NK A(x), and that decides whether f(x) = 0 or not using df. When the test succeeds, the (correct) witness x is returned via the instruction stop. When the test fails, the (wrong) certificate y (a realizer of null(f(x)), that is, a realizer of ⊥) is given³

³ Given an arbitrary process t0 ⋆ π0 ∈ ⊥⊥, it is easy to check that kπ0 t0 ∈ |A| for every parametric formula A.
the control. In practice, such a realizer of ⊥ can do nothing but backtrack using a formerly saved stack. In this way we implement a retroaction loop where the successive witnesses proposed by the realizer t0 are tested and repudiated as long as the test fails, until a correct witness is found and then returned.⁴ Putting these intuitions into symbols, we get the following:

Proposition 4 (Decidable witness extraction). For all π ∈ Π, the process t0′ ⋆ π evaluates to stop ⋆ n̂ · π for some n ∈ N such that f(n) = 0.

Proof. Fix π ∈ Π, and consider the pole ⊥⊥ formed by all the processes p such that p ≻* stop ⋆ n̂ · π for some n ∈ N such that f(n) = 0. Since t0 is a universal realizer of the formula ∃N x null(f(x)), it is also a realizer w.r.t. the pole ⊥⊥ defined above. Taking a valuation ρ such that ρ(Z) = {π}, we immediately check that stop ⊩NK ({n} ⇒ Z)[ρ] for all n ∈ N such that f(n) = 0 (by definition of ⊥⊥). Distinguishing the cases where f(n) = 0 and f(n) ≠ 0, we then prove that λxy . df x (stop x) y ⊩NK ({n} ⇒ null(f(n)) ⇒ Z)[ρ] for all n ∈ N, hence the same term realizes (∀x ({x} ⇒ null(f(x)) ⇒ Z))[ρ]. Consequently, we have t0′ ⋆ π ≻ t0 ⋆ (λxy . df x (stop x) y) · π ∈ ⊥⊥, hence the desired result.

In Section 7 we shall reinterpret this witness extraction method through a well-suited negative translation.
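The test-and-repudiate loop behind Proposition 4 can be mimicked in a few lines of Python. This is only a toy illustration under our own assumptions: the classical realizer is modelled as a function that proposes candidate pairs (witness, certificate), backtracking is modelled by ordinary control flow returning to the realizer, and the instruction stop becomes an exception escaping with the accepted witness.

```python
class Stop(Exception):
    """Plays the role of the instruction 'stop': aborts evaluation,
    carrying the accepted witness."""
    def __init__(self, n):
        self.n = n

def extract(realizer, f):
    """Run t0' = t0 (lambda x y. df x (stop x) y): each proposed pair
    (witness x, certificate y) is tested with the decision procedure;
    a correct witness escapes via Stop, a wrong one is repudiated by
    handing control back to the realizer."""
    def u_f(x, y):
        if f(x) == 0:      # df chose the 'stop x' branch
            raise Stop(x)
        # otherwise the (wrong) certificate y gets the control: in the
        # machine it backtracks; here returning lets the toy realizer
        # propose its next candidate
    try:
        realizer(u_f)
    except Stop as e:
        return e.n

def naive_realizer(u):
    """A hypothetical 'classical realizer' that blindly proposes
    0, 1, 2, ... with dummy certificates."""
    n = 0
    while True:
        u(n, None)
        n += 1

print(extract(naive_realizer, lambda n: (n - 5) ** 2))  # -> 5
```

The point of the sketch is that no certificate is ever trusted: only the decidable test f(x) = 0 lets a witness out of the loop.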
5 Intuitionistic Realizability for Second-Order Arithmetic

We now define an intuitionistic type system accompanied with its realizability model, whose definition follows the global pattern of the Brouwer-Heyting-Kolmogorov interpretation. As in [9], we introduce a primitive form of conjunction (as a Cartesian product) and primitive forms of first- and second-order existential quantification (as infinitary unions).

5.1 The Language of Formulæ
Taking the same language of arithmetic expressions as before (cf. 2.1) with its congruence e ≅ e′ (cf. 3.1), we now consider the following language of formulæ:

Formulæ   A, B ::= null(e) | nat(e) | X(e1, ..., ek) | A ⇒ B | ∀x A | ∀X A | A ∧ B | ∃x A | ∃X A

Compared with the language for classical logic described in 2.1 and 3.1, the language above replaces the construct {e} ⇒ B by a (more standard) primitive predicate nat(e). We also consider primitive constructions for conjunction and first- and second-order existential quantifications. In this setting, numeric quantifications are defined as

∀N x A(x) ≡ ∀x (nat(x) ⇒ A(x))   and   ∃N x A(x) ≡ ∃x (nat(x) ∧ A(x))
⁴ This witness extraction method is actually implemented in the module for classical program extraction currently developed by the author for the Coq proof assistant [10].
The congruence A ≅ A′ over the class of formulæ is defined from the congruence induced by e ≅ e′ (across atomic formulæ) by adding the equations

null(s(e)) ≅ ⊥   and   (∃v A(v)) ⇒ B ≅ ∀v (A(v) ⇒ B)

where v is any first- or second-order variable that does not occur free in B. (This second equation will be crucial to establish the result of Prop. 9.) As before, we write ⊤ ≡ null(0).

5.2 A Type System for Intuitionistic Second-Order Arithmetic
We introduce an intuitionistic (and more traditional) proof system based on a judgment of the form Γ ⊢NJ t : A, where the proof-term t is now formed in the pure λ-calculus enriched with the following constants: pair (pairing), fst (first projection), snd (second projection), 0 (zero), s (successor) and rec (recursor). In what follows we shall write ⟨t; u⟩ for the application pair t u, and denote by Λ the set of all closed proof-terms. Typing contexts are simply defined here as finite functions from proof-variables to formulæ. The class of derivable judgments Γ ⊢NJ t : A is inductively defined from the rules of inference of Fig. 3, using the abbreviation ∀N x A(x) for the numeric quantification ∀x (nat(x) ⇒ A(x)) as defined in 5.1. (Note that there is no elimination rule for the primitive existential quantifier, since the desired elimination can be performed using the conversion ∀v (A(v) ⇒ B) ≅ (∃v A(v)) ⇒ B.) This system is expressive enough to provide typable proof-terms for all the theorems of intuitionistic second-order arithmetic.

5.3 Weak Reduction
Proof-terms are equipped with a binary relation of one-step weak reduction, written t →w t′ and defined by the rules

(λx . t) u →w t{x := u}
fst ⟨t1; t2⟩ →w t1
snd ⟨t1; t2⟩ →w t2
rec u0 u1 0 →w u0
rec u0 u1 (s t) →w u1 t (rec u0 u1 t)

t →w t′
──────────
t u →w t′ u

u →w u′
──────────
t u →w t u′

Note that weak reduction is allowed both in the left- and right-hand sides of applications, but not below λ-abstraction (i.e. we disable the ξ-rule of the λ-calculus). We write →*w for the reflexive-transitive closure of one-step weak reduction. Complementarily to the notion of weak reduction, we also define a relation of inner reduction, written t →i t′, from the rules:

t →w t′
────────────────
λx . t →i λx . t′

t →i t′
──────────
t u →i t′ u

u →i u′
──────────
t u →i t u′

t →i t′
────────────────
λx . t →i λx . t′

The reflexive-transitive closure of the relation of inner reduction is written →*i, while its reflexive-symmetric-transitive closure is written =i. The union of the two relations →w and →i is the ordinary relation of one-step reduction, written →. By the standard method of parallel reductions we get:
Γ ⊢NJ x : A   ((x : A) ∈ Γ)

Γ ⊢NJ t : ⊤

Γ ⊢NJ t : A
─────────────  (A ≅ A′)
Γ ⊢NJ t : A′

Γ ⊢NJ pair : A ⇒ B ⇒ A ∧ B
Γ ⊢NJ fst : A ∧ B ⇒ A
Γ ⊢NJ snd : A ∧ B ⇒ B
Γ ⊢NJ 0 : nat(0)
Γ ⊢NJ s : ∀N x nat(s(x))
Γ ⊢NJ rec : ∀X (X(0) ⇒ ∀N x (X(x) ⇒ X(s(x))) ⇒ ∀N x X(x))

Γ, x : A ⊢NJ t : B
─────────────────────
Γ ⊢NJ λx . t : A ⇒ B

Γ ⊢NJ t : A ⇒ B    Γ ⊢NJ u : A
──────────────────────────────
Γ ⊢NJ t u : B

Γ ⊢NJ t : A
──────────────  (x ∉ FV(Γ))
Γ ⊢NJ t : ∀x A

Γ ⊢NJ t : ∀x A
────────────────────
Γ ⊢NJ t : A{x := e}

Γ ⊢NJ t : A{x := e}
────────────────────
Γ ⊢NJ t : ∃x A

Γ ⊢NJ t : A
──────────────  (X ∉ FV(Γ))
Γ ⊢NJ t : ∀X A

Γ ⊢NJ t : ∀X A
────────────────────────────────
Γ ⊢NJ t : A{X(x1, ..., xk) := B}

Γ ⊢NJ t : A{X(x1, ..., xk) := B}
────────────────────────────────
Γ ⊢NJ t : ∃X A

Fig. 3. Typing rules for intuitionistic second-order arithmetic
Proposition 5. The relation → is confluent.

Moreover, we easily check that inner reduction can always be postponed:

Proposition 6. If t →i t′ →w t″, then t →w t1 →*i t″ for some t1.

From this proposition and the confluence of → we get:

Proposition 7 (Confluence of →w modulo =i). If t →*w t1 and t →*w t2, then there are terms t1′ and t2′ such that t1 →*w t1′, t2 →*w t2′ and t1′ =i t2′.

5.4 The Intuitionistic Realizability Model
We now build a simple realizability model for the type system defined above, in which formulæ are interpreted as saturated sets of terms, that is, as sets of closed proof-terms S ⊆ Λ such that the two conditions t →w t′ and t′ ∈ S imply t ∈ S. The set of all saturated sets is written SAT. Here, a valuation is a function ρ that maps every first-order variable x to a natural number ρ(x) ∈ N, and every second-order variable X of arity k to a function ρ(X) : Nᵏ → SAT. Parametric expressions, formulæ and contexts are
defined as before. Every parametric formula A[ρ] is interpreted as a saturated set |A[ρ]| ∈ SAT defined by the standard equations

|X(e1, ..., ek)[ρ]| = ρ(X)(Val(e1[ρ]), ..., Val(ek[ρ]))
|null(e)[ρ]| = Λ if Val(e[ρ]) = 0, and |null(e)[ρ]| = ∅ otherwise
|nat(e)[ρ]| = {t ∈ Λ : t →*w sⁿ0, where n = Val(e[ρ])}
|(A ⇒ B)[ρ]| = {t ∈ Λ : ∀u ∈ |A[ρ]|, t u ∈ |B[ρ]|}
|(A ∧ B)[ρ]| = {t ∈ Λ : ∃u1 ∈ |A[ρ]| ∃u2 ∈ |B[ρ]|, t →*w ⟨u1; u2⟩}
|(∀x A)[ρ]| = ⋂_{n∈N} |A[ρ; x ← n]|
|(∃x A)[ρ]| = ⋃_{n∈N} |A[ρ; x ← n]|
|(∀X A)[ρ]| = ⋂_{F : Nᵏ→SAT} |A[ρ; X ← F]|
|(∃X A)[ρ]| = ⋃_{F : Nᵏ→SAT} |A[ρ; X ← F]|

In what follows, we shall write t ⊩NJ A[ρ] for t ∈ |A[ρ]|.

Fact 2. If A ≅ A′, then |A[ρ]| = |A′[ρ]| for all valuations ρ.

5.5 Adequacy
Given a substitution σ and a parametric context Γ[ρ], we write σ ⊩NJ Γ[ρ] when dom(Γ) ⊆ dom(σ) and σ(x) ⊩NJ A[ρ] for all (x : A) ∈ Γ. We say that:
– A judgment Γ ⊢NJ t : A is sound when for all valuations ρ and for all substitutions σ such that σ ⊩NJ Γ[ρ], we have t[σ] ⊩NJ A[ρ].
– An inference rule P1 ··· Pn / C (where P1, ..., Pn and C are typing judgments) is sound when the soundness of its premises P1, ..., Pn (in the above sense) implies the soundness of its conclusion C.

Proposition 8 (Adequacy). The typing rules of Fig. 3 are sound.

From this result combined with the realizability interpretation of existential quantification and conjunction, we immediately get:

Fact 3 (Witness property). If ⊢NJ t : ∃N x A(x) ≡ ∃x (nat(x) ∧ A(x)), then t →*w ⟨sⁿ0; t′⟩ for some n ∈ N and for some realizer t′ ⊩NJ A(n).

A consequence of this is that every proof of ∃N x null(f(x)) weakly reduces to a pair of the form ⟨sⁿ0; t′⟩, where n is such that f(n) = 0.
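The restriction that makes the witness property effective is that weak reduction never goes below a λ. As an illustration outside the paper's formalism, here is a minimal Python sketch of one-step weak reduction on term ASTs, covering only β-reduction and the two application context rules, and assuming closed arguments so that naive substitution is capture-safe:

```python
def subst(t, x, v):
    """Naive substitution t{x := v}; safe here because v is closed."""
    tag = t[0]
    if tag == 'var':
        return v if t[1] == x else t
    if tag == 'lam':
        return t if t[1] == x else ('lam', t[1], subst(t[2], x, v))
    if tag == 'app':
        return ('app', subst(t[1], x, v), subst(t[2], x, v))
    return t

def wstep(t):
    """One step of weak reduction, or None if t is w-normal.
    Reduction happens on both sides of applications but never
    below a lambda (the xi-rule is disabled)."""
    if t[0] == 'app':
        f, a = t[1], t[2]
        if f[0] == 'lam':
            return subst(f[2], f[1], a)   # (lam x. t) u  ->w  t{x := u}
        r = wstep(f)
        if r is not None:
            return ('app', r, a)          # left context rule
        r = wstep(a)
        if r is not None:
            return ('app', f, r)          # right context rule
    return None

I = ('lam', 'x', ('var', 'x'))
print(wstep(('app', I, I)))                               # -> ('lam', 'x', ('var', 'x'))
print(wstep(('lam', 'z', ('app', I, ('var', 'z')))))      # -> None: no reduction under lambda
```

A proof of an existential statement, iterated under wstep, eventually exposes its numeral witness at the top of the term, which is exactly what Fact 3 exploits.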
6 The Negative Translation

6.1 Translating Formulæ
We now define a negative translation of classical formulæ (such as defined in subsections 2.1 and 3.1) into intuitionistic formulæ (in the sense of subsection 5.1),
in the spirit of [9]. As usual, this translation is parameterized by a fixed intuitionistic formula R, the formula that will represent the pole ⊥⊥. In what follows, we write ¬R A for A ⇒ R. Every classical formula A is translated as two intuitionistic formulæ, written A¬¬ and A⊥. The formula A¬¬ is simply defined as the shorthand A¬¬ ≡ ¬R A⊥, whereas the formula A⊥ is defined by induction on A as follows:

(X(e1, ..., ek))⊥ ≡ X(e1, ..., ek)
(null(e))⊥ ≡ null(h(e))
(A ⇒ B)⊥ ≡ A¬¬ ∧ B⊥
({e} ⇒ B)⊥ ≡ nat(e) ∧ B⊥
(∀x A)⊥ ≡ ∃x A⊥
(∀X A)⊥ ≡ ∃X A⊥

where the (unary) primitive recursive function symbol h is defined by the equations h(0) = 1 and h(s(x)) = 0. It is a simple exercise to check that:

Fact 4. If A ≅ A′, then A⊥ ≅ A′⊥ and A¬¬ ≅ A′¬¬.
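The clauses for A⊥ are a straightforward recursion over formula ASTs. The following Python sketch (the tuple encoding and tag names are ours, not the paper's) may help to see that the translation merely dualises the universal quantifiers and records ¬¬-shifts on the left of implications:

```python
def bot(A):
    """A -> A_bot, following the six clauses of 6.1.
    Target formulas reuse the same tuple tags."""
    tag = A[0]
    if tag == 'pvar':                    # (X(e1..ek))_bot = X(e1..ek)
        return A
    if tag == 'null':                    # (null(e))_bot = null(h(e))
        return ('null', ('h', A[1]))
    if tag == 'imp':                     # (A => B)_bot = A_negneg /\ B_bot
        return ('and', neg(A[1]), bot(A[2]))
    if tag == 'nimp':                    # ({e} => B)_bot = nat(e) /\ B_bot
        return ('and', ('nat', A[1]), bot(A[2]))
    if tag == 'all1':                    # (forall x A)_bot = exists x A_bot
        return ('ex1', A[1], bot(A[2]))
    if tag == 'all2':                    # (forall X A)_bot = exists X A_bot
        return ('ex2', A[1], bot(A[2]))
    raise ValueError(tag)

def neg(A):
    """A -> A_negneg, i.e. A_bot => R (the pole R is kept symbolic)."""
    return ('imp', bot(A), 'R')

# falsum = forall Z. Z translates to (exists Z. Z) => R
falsum = ('all2', 'Z', ('pvar', 'Z'))
print(neg(falsum))  # -> ('imp', ('ex2', 'Z', ('pvar', 'Z')), 'R')
```

Fact 4 then follows by the evident induction: congruent inputs produce congruent outputs clause by clause.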
Note also that by definition, we have

(∀v A(v))¬¬ ≡ (∃v A(v)⊥) ⇒ R ≅ ∀v (A(v)⊥ ⇒ R) ≡ ∀v (A(v)¬¬).

6.2 CPS-Translating Terms and Stacks
We now define two translations t ↦ t* and π ↦ π* (defined by mutual induction on t and π) from the terms and stacks of the λc-calculus to the proof-terms defined in 5.2. These translations are parameterized by a fixed mapping α ↦ α* associating a proof-term α* to every stack constant α of λc. Stacks are translated in the obvious way, as finite lists:

(α)* ≡ α*
(t · π)* ≡ ⟨t*; π*⟩

Variables, abstractions and applications are translated as expected:

x* ≡ x
(t u)* ≡ λk . t* ⟨u*; k⟩
(λx . t)* ≡ λk . (λx . t*) (fst k) (snd k)

whereas continuation constants and call/cc are translated as

(kπ)* ≡ λk . fst k π*
(cc)* ≡ λk . fst k ⟨(λzk . fst k z) (snd k); snd k⟩

Interestingly, the pure datum n̂ is translated as

(n̂)* ≡ sⁿ0.

Here, the translation does not start with a continuation abstraction λk . ..., since the construct n̂ is not intended to appear in head position. Finally, the instructions s and rec are translated as:

(s)* ≡ λk . fst (snd k) ⟨s (fst k); snd (snd k)⟩
(rec)* ≡ λk . rec (fst k) (λpyk . fst (snd k) ⟨p; ⟨y; k⟩⟩) (fst (snd (snd k))) (snd (snd (snd k)))
Proposition 9 (Correctness w.r.t. typing). If Γ ⊢NK t : A (Fig. 1–2), then Γ¬¬ ⊢NJ t* : A¬¬ (Fig. 3).

6.3 Simulation of Evaluation by Weak Reduction
The expected property would be that each evaluation step t1 ⋆ π1 ≻ t2 ⋆ π2 in λc corresponds to one or several weak reduction steps t1* π1* →⁺w t2* π2* through the CPS-translation. Although this works for almost all the evaluation rules (application, abstraction, call/cc, continuation and successor), the property does not hold for the evaluation of rec, so that we need to refine it a little more.

Proposition 10 (One-step simulation). If t1 ⋆ π1 ≻ t2 ⋆ π2 (one-step evaluation in λc), then t1* π1* →⁺w t2* u (weak reduction) for some term u =i π2*.

Corollary 1 (Grand simulation). If t1 ⋆ π1 ≻* t2 ⋆ π2 (evaluation in λc), then t1* π1* →*w u (weak reduction) for some term u =i t2* π2*.
7 The Negative Interpretation of Witness Extraction
Let us now reinterpret the witness extraction method described in Section 4 through the negative translation defined in Section 6. For that, consider a λc-term t0 such that ⊢NK t0 : ∃N x null(f(x)), where ∃N x null(f(x)) is a shorthand for ∀Z (∀x ({x} ⇒ null(f(x)) ⇒ Z) ⇒ Z). Let df be the decision function for the predicate null(f(x)) introduced in Section 4, and write uf ≡ λxy . df x (stop x) y (where 'stop' is an instruction with no evaluation rule) and p0 ≡ t0 ⋆ uf · α (where α is a stack constant). From the discussion of Section 4, we know that the process p0 evaluates to stop ⋆ n̂ · α for some n ∈ N such that f(n) = 0. Via the negative translation we have by Prop. 9:

⊢NJ t0* : ∀Z ((∀x ((nat(x) ∧ ((null(h(f(x))) ⇒ R) ∧ Z)) ⇒ R) ∧ Z) ⇒ R)

The crucial point is the following:

Proposition 11. In the intuitionistic realizability model:

df* ⊩NJ ∀x ((nat(x) ∧ ((null(f(x)) ⇒ R) ∧ ((null(h(f(x))) ⇒ R) ∧ ⊤))) ⇒ R)

(independently of the choice of R).

To type-check the term uf ≡ λxy . df x (stop x) y through the negative translation, let us now fix the pole by setting R ≡ ∃x (nat(x) ∧ null(f(x))), while defining the translation of the instruction stop as (stop)* ≡ λz . z. With this implementation of stop* we clearly have
⊢NJ stop* : ∀x ((nat(x) ∧ null(f(x))) ⇒ R)

hence (combining the latter with Prop. 11 using adequacy)

uf* ⊩NJ ∀x ((nat(x) ∧ ((null(h(f(x))) ⇒ R) ∧ ⊤)) ⇒ R),

from which we deduce that p0* ⊩NJ R ≡ ∃x (nat(x) ∧ null(f(x))). We have thus shown that through the negative translation, the transformation of the classical proof t0 into the process p0 is nothing but the transformation of a classical proof of a Σ⁰₁-formula into an intuitionistic realizer of the same formula, thus giving a constructive explanation of why the procedure described in Section 4 successfully extracts a reliable witness in finite time. Of course, the point here is that through the negative interpretation, the transformation of the classical proof t0 into the process p0 exactly follows the well-known method (due to Friedman [2]) to transform a classical proof of a Σ⁰₁-formula into an intuitionistic proof of the same formula.
References

1. Barendregt, H.: The Lambda Calculus: Its Syntax and Semantics. Studies in Logic and the Foundations of Mathematics, vol. 103. North-Holland, Amsterdam (1984)
2. Friedman, H.: Classically and intuitionistically provably recursive functions. Higher Set Theory 669, 21–28 (1978)
3. Girard, J.-Y., Lafont, Y., Taylor, P.: Proofs and Types. Cambridge University Press, Cambridge (1989)
4. Krivine, J.-L.: A general storage theorem for integers in call-by-name lambda-calculus. Th. Comp. Sc. 129, 79–94 (1994)
5. Krivine, J.-L.: Typed lambda-calculus in classical Zermelo-Fraenkel set theory. Arch. Math. Log. 40(3), 189–205 (2001)
6. Krivine, J.-L.: Dependent choice, 'quote' and the clock. Th. Comp. Sc. 308, 259–276 (2003)
7. Krivine, J.-L.: Realizability in classical logic. Unpublished lecture notes (available on the author's web page) (2005)
8. Miquel, A.: Classical program extraction in the calculus of constructions. In: Duparc, J., Henzinger, T.A. (eds.) CSL 2007. LNCS, vol. 4646, pp. 313–327. Springer, Heidelberg (2007)
9. Oliva, P., Streicher, T.: On Krivine's realizability interpretation of classical second-order arithmetic. Fundam. Inform. 84(2), 207–220 (2008)
10. The Coq Development Team (LogiCal Project): The Coq Proof Assistant Reference Manual – Version 8.1. Technical report, INRIA (2006)
Session-Based Communication Optimisation for Higher-Order Mobile Processes

Dimitris Mostrous and Nobuko Yoshida
Department of Computing, Imperial College London
Abstract. In this paper we solve an open problem posed in our previous work on asynchronous subtyping [12], extending the method to higher-order session communication and functions. Our system provides two complementary methods for communication code optimisation, mobile code and asynchronous permutation of session actions, within processes that utilise structured, typed communications. In order to prove transitivity of our coinductive subtyping relation, we uniformly deal with type-manifested asynchrony, linear functional types, and contravariant components in higher-order communications. For the runtime system we propose a new compact formulation that takes into account stored higher-order values with open sessions, as well as asynchronous commutativity. In spite of the enriched type structures, we construct an algorithmic subtyping system, which is sound and complete with respect to the coinductive subtyping relation. The paper also demonstrates the expressiveness of our typing system with an e-commerce example, where optimised processes can interact respecting the expected sessions.
1 Introduction

Sessions [16,7] have emerged as a tractable and expressive theoretical substrate, which offers direct language and protocol support [17,9,18] for high-level, type-safe and uniform abstraction for a wide range of communication patterns. Session types enable static validation assuring both type- and communication-safety: not only is the value of each message correctly typed, but the sequence of messages is sent and received according to the scenario specified by the session type, precluding communication mismatch. Session primitives can be smoothly integrated with traditional subtyping of object and functional languages, to obtain a more flexible behavioural composition [5]. Our recent work [12] developed a new subtyping, asynchronous subtyping, that characterises compatibility between classes of permutations of communications within asynchronous protocols, offering much greater flexibility. However, an open problem remained: how to uniformly introduce communication optimisations in the presence of code mobility [11], incorporating higher-order sessions and functions into the asynchronous subtyping [12, § 6]. This is the question we address in this paper.

Higher-Order Processes with Asynchronous Sessions. We develop a session typing system for the Higher-Order π-calculus [15], an amalgamation of the call-by-value λ-calculus and the π-calculus, extending [11]. Code mobility is facilitated by sending not just
The work is supported by EPSRC GR/T03208, GR/T03215 and IST2005-015905 MOBIUS.
P.-L. Curien (Ed.): TLCA 2009, LNCS 5608, pp. 203–218, 2009.
© Springer-Verlag Berlin Heidelberg 2009
204
D. Mostrous and N. Yoshida
ground values and channels, but also abstracted processes that can be received and activated locally, reducing the number of transmissions of remote messages. The simplest code mobility operations are sending a thunked process ⌈P⌉ via channel s (denoted s!⌈P⌉) and receiving and running it by applying the unit (denoted s?(x).x()). In our calculus, communications are always within a session, established when accept and request processes synchronise on a shared channel:

a(x).x!5.x!true.x?(y).(y() | R) | ā(x).x?(z1).x?(z2).x!⌈P⌉

resulting in a fresh session, consisting of two channels s and s̄, each private to one of the two processes, and their associated queues, initialised to be empty:

(ν s)(s!5.s!true.s?(y).(y() | R) | s̄?(z1).s̄?(z2).s̄!⌈P⌉ | s : ε | s̄ : ε)

To avoid conflicts, an output on a channel s (resp. s̄) places the value on the dual queue s̄ (resp. s), while an input on s (resp. s̄) reads from the queue s (resp. s̄). Thus, after two steps the outputs of 5 and true are placed on queue s̄ as follows:

(ν s)(s?(y).(y() | R) | s̄?(z1).s̄?(z2).s̄!⌈P⌉ | s : ε | s̄ : 5 · true)

and in two more steps the right process receives and reduces to s̄!⌈P⌉{5/z1}{true/z2}. Similarly, the next step transmits the thunked process, and R can interact with P locally. The session type of s̄, S = ?[nat].?[bool].![H] (where H is the type of ⌈P⌉), guarantees that values are received following the order specified by S.

Asynchronous Communication Optimisation with Code Mobility. Suppose the size of P is very large and it does not contain z1 and z2. Then the right process might wish to start transmission of ⌈P⌉ to s : ε concurrently, without waiting for the delivery of 5 and true, since sending is non-blocking. Thus we send ⌈P⌉ ahead, as in s̄!⌈P⌉.s̄?(z1).s̄?(z2).0. The interaction with the left process is safe, as the outputs are ordered in an exactly complementary way. However, the optimised code is not composable with the other party by the original session system [16], since it cannot be assigned S.
To make this optimisation valid, we proposed the asynchronous subtyping in [12], by which we can refine a protocol to maximise asynchrony without violating the session. For example, in the above case, S′ = ![H].?[nat].?[bool] is an asynchronous subtype of S, hence the resulting optimisation is typable. The idea of this subtyping is intuitive, and the combination of the two kinds of optimisations is vital for typing many practical protocols [17,18] and parallel algorithms [13], but it requires subtle formal formulations due to the presence of higher-order code. The linear functional typing developed in [11] permits sending a value that contains free session channels: for example, not only the message s!⌈s′?(x).s′!x⌉ (for s!⌈P⌉), but also one which contains its own session, s̄!⌈s̄?(x).s̄!x⌉, is typable (if R conforms to the dual session, e.g. R = s!7.s?(z).0). The first message can go ahead correctly, but the permutation of the second message (as s̄!⌈P⌉) violates safety, since the input action s̄?(x) will appear in parallel with s̄?(z1).s̄?(z2), creating a race condition, as seen in:

(ν s)(s̄?(x).s̄!x | R | s̄?(z1).s̄?(z2).0 | s : ε | s̄ : 5 · true)
(Identifiers)
u, v, w ::= x, y, z   variables
          | a, b, c   shared channels

k ::= x, y, z   variables
    | s, s̄     session channels

(Terms)
P, Q, R ::= V                              value
          | u(x).P                         server
          | ū(x).P                         client
          | k?(x).P                        input
          | k!V.P                          output
          | k ▷ {l1 : P1, ..., ln : Pn}    branching
          | k ◁ l.P                        selection
          | P | Q                          parallel
          | (ν a : S) P                    restriction
          | (ν s) P                        restriction
          | P Q                            application
          | 0                              nil process
          | s : h̃                          queue

(Values)
V ::= u, v, w                     shared
    | k                           linear
    | ()                          unit
    | λ(x : U).P                  abstraction
    | µ(x : U → T).λ(y : U).P     recursion

(Message Values)
h ::= l   label
    | V

(Abbreviations)
⌈P⌉ ≝ λ(x : unit).P  (x ∉ fv(P))   thunk
run ≝ λx.(x ())                    run

Fig. 1. Syntax
This paper shows that the combination of two optimisations is indeed possible by establishing soundness and communication-safety, subsuming the original typability from [11]. The technical challenge is to prove the transitivity of the asynchronous subtyping integrated with higher-order (linear) function types and session-delegation, since the types now appear in arbitrary contravariant positions [12]. Another challenge is to formulate a runtime typing system which handles both stored higher-order code with open sessions and the asynchronous subtyping. We demonstrate all facilities of typepreserving optimisations proposed in this paper by using an e-commerce scenario. A full version, containing omitted definitions and proofs, is available from [1].
2 The Higher-Order π-Calculus with Asynchronous Sessions 2.1 Syntax and Reduction The calculus is given in Fig. 1, based on the π-calculus augmented with asynchronous session primitives and the call-by-value λ-calculus. Except for recursion and message queues for asynchronous communications [8], all constructs are from the synchronous Higher-Order calculus with sessions [11]. A session is initiated over a shared channel and communications belonging to a session are performed via two fresh end-point channels specific to that session, called session channels or queue endpoint channels, used to distinguish the two end points, taking a similar approach to [5,20]. The dual of a queue endpoint s is denoted by s , and represents the other endpoint of the same session. The operation is self-inverse hence s = s. We write V for a potentially empty vector V1 ...Vn . Types, given later, are denoted by U, T and S, but type annotations are often omitted. For values, we have shared and linear identifiers, unit, abstraction and recursion. For terms, we have prefixes for declaring session connections, u(x).P for servers and
206
D. Mostrous and N. Yoshida
(beta) (λx.P)V −→ P{V/x}
(rec) (µy.λx.P)V −→ P{V/x}{µy.λx.P/y}
(send) s!V .P | s :h −→ P | s :h ·V (sel) s l.P | s :h −→ P | s :h · l
(get) s?(x).P | s :V ·h −→ P{V/x} | s :h (bra) s {l1 : P1 , . . . , ln : Pn } | s : lm ·h −→ Pm | s :h
(1 ≤ m ≤ n)
(conn) a(x).P | a(z).Q −→ (ν s) (P{s/x} | Q{s/z} | s : ε | s : ε) (app-l) (resc)
P −→ P PQ −→ P Q P −→ P (ν a)P −→ (ν a)P
(app-r) (ress)
Q −→ Q V Q −→ V Q P −→ P (ν s)P −→ (ν s)P
s, s fresh
P −→ P P | Q −→ P | Q P ≡ P −→ Q ≡ Q (str) P −→ Q (par)
Fig. 2. Reduction
ū(x).P for clients. Session communications are performed using the next four primitives: input k?(x).P, output k!V.P, branching k ▷ {l1 : P1, ..., ln : Pn} (often written k ▷ {li : Pi}i∈I with index set I), which offers alternative interaction patterns, and selection k ◁ l.P, which chooses an available branch. (ν a : S)P restricts (and binds) a channel a to the scope of P. Similarly, (ν s)P binds s and s̄, making them private to P. s : h̃ is a message queue, also called a buffer, representing ordered messages in transit with destination s (which may be considered as a network pipe in a TCP-like transport). Queues and session restrictions appear only at runtime. A program is a process which does not contain runtime terms. Other primitives are standard. We often omit 0.

The bindings are induced by (ν a : S)P, (ν s)P, u(x).P, ū(x).P, λx.P and µy.λx.P. The derived notions of bound and free identifiers, alpha-equivalence and substitution are standard. We write fv(P)/fn(P) for the sets of free variables/channels, respectively. By using recursion, we can represent infinite behaviours of processes, such as the definition agent def or the replication !u(y).P of [20,7,10,11]. For example, the replication !u(y).P in [11] can be defined as u(x).(µy.λz.(P | z(x).y z)) u with x ∉ fv(P).

The single-step call-by-value reduction relation, denoted −→, is a binary relation from closed terms to closed terms, defined by the rules in Fig. 2. The rules are from those of the HOπ-calculus [11] combined with asynchronous session communications from [8]. Rule (conn) establishes a new session between server and client via a shared name u, generating two fresh session channels and the associated two empty queues (ε denotes the empty string). Rules (send) and (sel) respectively enqueue a value and a label at the tail of the queue for the dual endpoint s̄. Rules (get) and (bra) dequeue, from the head of the queue, a value or a label: (get) substitutes the value V for x in P, while (bra) selects the corresponding m-th branch.
Since (conn) provides a queue for each channel, these rules say that a sending action is never blocked (asynchrony) and that two messages from the same sender to the same channel arrive in sending order (order preservation). The other rules are standard. Session channels s and s̄ can be sent and received (when V = k), with which various protocols are expressed, allowing complex nested and private structured communications. This interaction is called higher-order session passing (delegation). We use the standard structural rules [10], denoted ≡, such as (ν s)P | Q ≡ (ν s)(P | Q) if s, s̄ ∉ fn(Q) (see [11]). The multi-step reduction relation is defined as (≡ ∪ −→)∗.
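The queue discipline enforced by (send) and (get) can be sketched in a few lines of Python. The Session class and its names are our own illustration, not part of the calculus: sending enqueues at the tail of the destination buffer and never blocks, while receiving dequeues from the head, so messages from one sender arrive in sending order.

```python
from collections import deque

class Session:
    """Two endpoint buffers, as created by rule (conn)."""
    def __init__(self):
        self.queues = {"s": deque(), "s_bar": deque()}

    def send(self, dest, value):
        # (send): enqueue at the tail of the destination queue; this is
        # asynchronous, so it always succeeds immediately.
        self.queues[dest].append(value)

    def receive(self, dest):
        # (get): dequeue from the head of the queue.
        return self.queues[dest].popleft()

sess = Session()
sess.send("s_bar", 2)        # an output targets the dual endpoint's queue
sess.send("s_bar", True)     # a later send on the same channel
assert sess.receive("s_bar") == 2     # FIFO: order preservation
assert sess.receive("s_bar") is True
```

The two asserts mirror the paper's two guarantees: no send ever waits, and the receiver observes the sender's order.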
Session-Based Communication Optimisation for Higher-Order Mobile Processes
2.2 Example: Optimised Business Protocol with Code Mobility We show a business/financial protocol interaction from [18,17] which integrates the two kinds of type-safe optimisations. We extend the scenario from [11] to highlight the expressiveness gained using the new method. Fig. 3 draws the sequencing of actions modelling a hotel booking through a process Agent. On the left Client behaves dually to Agent; on the right, an optimised MClient utilises type-safe asynchronous behaviour.
[Figure omitted: two message sequence diagrams for the hotel booking. Left: Agent and Client exchange rtt, the choice move, the mobile code, run code, then hotel, roomtype, rate and creditcard. Right: Agent and MClient, where move, code and creditcard are sent eagerly, with the local branch also shown.]
Fig. 3. Standard (left) and Optimised (right) Interaction for Hotel Booking
The Agent behaves the same towards both clients: initially it calculates the round-trip time (RTT) of communication (rtt) and sends it; it then offers the other party the option to consider the RTT and either send mobile code to interact with the Agent at its location, or continue the protocol with each side executing its behaviour remotely. When mobile code is received (after choice move), it is run by the Agent, completing the transaction on behalf of the client in a sequence of steps. The behaviour of Client is straightforward and complementary to Agent, but MClient has special requirements: it represents a mobile device with limited processing power, and irrespective of the RTT it always sends mobile code; moreover, it does not care about money, and provides the credit card number (card) before finding out the rate. To represent this optimised scenario, we start from the process for Agent:

Agent = a(x).x!⟨rtt⟩.x ▷ {move : x?(code).(run code | Q), local : Q}
Q = x?(hotel).x?(roomtype).x!⟨rate⟩.x?(creditcard) . . .

The session is initiated over a, then the rtt is sent, then the choices move and local are offered. If the first choice is made, the received code is run in parallel with the process Q, which continues the agent's session, performing optimisation by code mobility. As expected, Client has the dual behaviour:

Client = a(x).x?(rtt).x ◁ move.x!⟨ x!⟨ritz⟩.x!⟨suite⟩.x?(rate).x!⟨card⟩ . . . ⟩
D. Mostrous and N. Yoshida
A more interesting optimisation is given by MClient, which at first sight may seem to disagree with the intended protocol:

MClient = a(x).x ◁ move.x!⟨ x!⟨ritz⟩.x!⟨suite⟩.x!⟨card⟩.x?(rtt).x?(rate) . . . ⟩

After the session is established, it eagerly sends its choice move, ignoring rtt, followed by a thunk that will continue the session; another important point is that in the mobile code the output of card happens before rtt and rate are received. Even without subtyping, the typing of sessions in the HOπ-calculus poses delicate conditions [11]; in the present system, we can further verify that the optimisation of MClient does not violate communication safety (but the similar example in § 1, s!⟨s?(x).s!⟨x⟩⟩.s?(z1).s?(z2).0, must be untypable): when values are received, they are always of the expected type, conforming to a new subtyping relation given in the next section.
3 Higher-Order Linear Types with Asynchronous Subtyping

3.1 Types

This section presents an asynchronous subtyping relation for the HOπ-calculus based on [12]. The syntax of types is given below:

Term T ::= U | ◇
Value U ::= H | S
HO-value H ::= unit | U → T | U ⊸ T | ⟨S⟩
Session S ::= ![U].S | ?[U].S | ⊕[l1 : S1, . . . , ln : Sn] | &[l1 : S1, . . . , ln : Sn] | µt.S | t | end

It is an integration of the types of the simply typed λ-calculus with linear functional types U ⊸ T, and of the session types of the π-calculus. A linear type represents a function to be used exactly once. Term types, ranged over by T, include all value types and the process type ◇. Session types are ranged over by S, S′, . . . Higher-order value types consist of the unit type, the function types, the linear function types and the shared channel type ⟨S⟩; value types consist of HO-value types and session types. Note that linearity annotations are attached only to function types. Among the session types, ![U].S represents the output of a value typed by U followed by a session typed by S; ?[U].S is its dual. ⊕[l1 : S1, . . . , ln : Sn] is the selection type, on which one of the labels li can be sent, with the subsequent session typed by Si; &[l1 : S1, . . . , ln : Sn] is its dual, called the branching type. t is a type variable and µt.S is a recursive type; we only consider contractive recursive types [20]. end denotes the termination of the session. We often write &[li : Si]i∈I and ⊕[li : Si]i∈I, and abbreviate the thunk types unit → T and unit ⊸ T. The type end is often omitted. Each session type S has a dual type, denoted S̄, which describes the complementary behaviour. Duality is defined inductively: the dual of ![U].S is ?[U].S̄; the dual of ?[U].S is ![U].S̄; the dual of ⊕[l1 : S1, . . . , ln : Sn] is &[l1 : S̄1, . . . , ln : S̄n], and vice versa; t̄ = t; the dual of µt.S is µt.S̄; and end is self-dual. We say a type is guarded if it is neither a recursive type nor a type variable.
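The inductive definition of duality can be transcribed directly; the tuple encoding of session types below is our own choice, not the paper's notation:

```python
# Session types as nested tuples: ("!", U, S) output, ("?", U, S) input,
# ("+", {l: S}) selection, ("&", {l: S}) branching, ("mu", t, S), ("var", t),
# and the terminated session "end".

def dual(S):
    """The dual session type: inputs and outputs, branchings and
    selections are exchanged; recursion and variables are preserved."""
    if S == "end":
        return "end"          # end is self-dual
    tag = S[0]
    if tag == "!":
        return ("?", S[1], dual(S[2]))
    if tag == "?":
        return ("!", S[1], dual(S[2]))
    if tag == "+":
        return ("&", {l: dual(Si) for l, Si in S[1].items()})
    if tag == "&":
        return ("+", {l: dual(Si) for l, Si in S[1].items()})
    if tag == "mu":
        return ("mu", S[1], dual(S[2]))
    if tag == "var":
        return S              # type variables are self-dual
    raise ValueError(S)

S = ("!", "nat", ("?", "bool", "end"))
assert dual(S) == ("?", "nat", ("!", "bool", "end"))
assert dual(dual(S)) == S     # duality is an involution
```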
(OI)  ![U].?[U′].S ≪ ?[U′].![U].S
(OB)  ![U].&[lj : Sj]j∈J ≪ &[lj : ![U].Sj]j∈J
(SI)  ⊕[lj : ?[U].Sj]j∈J ≪ ?[U].⊕[lj : Sj]j∈J
(SB)  ⊕[li : &[lj : Sij]j∈J]i∈I ≪ &[lj : ⊕[li : Sij]i∈I]j∈J
(Tr)  S1 ≪ S2 and S2 ≪ S3 imply S1 ≪ S3
(CO)  S ≪ S′ implies ![U].S ≪ ![U].S′
(CI)  S ≪ S′ implies ?[U].S ≪ ?[U].S′
(CS)  ∀i ∈ I. Si ≪ S′i implies ⊕[li : Si]i∈I ≪ ⊕[li : S′i]i∈I
(CB)  ∀i ∈ I. Si ≪ S′i implies &[li : Si]i∈I ≪ &[li : S′i]i∈I
(E)   end ≪ end
(M)   µt.S ≪ µt.S

Fig. 4. Top-Level Asynchronous Action Rules

(An occurrence of) a type constructor not under a recursive prefix in a recursive type is
called a top-level action (for example, ![U1] and ?[U2] in ![U1].?[U2].µt.![U3].t are top-level, but ![U3] in the same type is not). In the above type, ![U1] is the head, since it appears as the left-most occurrence of the top-level actions in S (note that ?[U2] is not the head). We write Type for the collection of all closed types.

3.2 Higher-Order Asynchronous Subtyping

This subsection studies a theory of asynchronous session subtyping: reordered communications, even higher-order and mobile, can preserve faithfulness to the dual party. Fig. 4 defines the axioms for partial permutation of top-level actions of closed types, denoted ≪. S ≪ S′ is read: S is an action-asynchronous subtype of S′, and means that S is more asynchronous (or more optimised) than S′. We write S ≫ S′ for S′ ≪ S. A permutation of two inputs or of two outputs is not allowed, since it violates type safety. Suppose P = s!⟨2⟩.s!⟨true⟩.s?(x).0 and Q = s̄?(y).s̄?(z).s̄!⟨y + 2⟩.0. These processes interact correctly. If we permute the outputs of P to get P′ = s!⟨true⟩.s!⟨2⟩.s?(x).0, then the parallel composition (P′ | Q) causes a type error. Similarly, the reverse direction of (OI, OB, SI, SB) causes a deadlock, losing progress in session s: for example, consider exchanging s!⟨true⟩ and s?(z) in P1 = s!⟨true⟩.s?(z).0, composed with Q1 = s̄?(y).s̄!⟨2⟩.0. Note that partial permutation is only applied to finitely many of the top-level actions, without unfolding recursive types. To handle recursive types in asynchronous subtyping, we need to generalise the unfolding function defined in [5], since ≪ might become applicable to a type only after unfolding recursions under some guarded prefixes. The definition is based on [12].
Definition 3.1 (n-time unfolding)

unfold0(S) = S    unfold1+n(S) = unfold1(unfoldn(S))
unfold1(![U].S) = ![U].unfold1(S)    unfold1(?[U].S) = ?[U].unfold1(S)
unfold1(⊕[li : Si]i∈I) = ⊕[li : unfold1(Si)]i∈I    unfold1(&[li : Si]i∈I) = &[li : unfold1(Si)]i∈I
unfold1(µt.S) = S[µt.S/t]    unfold1(t) = t    unfold1(end) = end

For any recursive type S, unfoldn(S) is the result of inductively unfolding the top-level recursions up to a fixed level of nesting. Because our recursive types are contractive, unfoldn(S) terminates. We now introduce the main definition of the paper, asynchronous communication subtyping for the HOπ-calculus. First, let us define:

(H, H′)◦ = (H, H′)    (S, S′)◦ = (S̄′, S̄)    (◇, ◇)◦ = (◇, ◇)

which is used to adjust for the different variance of functional and session types.
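Under the same tuple encoding of session types (our own), Def. 3.1 can be transcribed as follows; note that one unfolding step also unrolls recursions under guarded prefixes, which is exactly what makes the permutation relation applicable after unfolding:

```python
def subst(S, t, R):
    """S[R/t]: substitute the closed type R for the free variable t."""
    if S == "end":
        return S
    tag = S[0]
    if tag in ("!", "?"):
        return (tag, S[1], subst(S[2], t, R))
    if tag in ("+", "&"):
        return (tag, {l: subst(Si, t, R) for l, Si in S[1].items()})
    if tag == "mu":
        # an inner mu over the same variable shadows t
        return S if S[1] == t else ("mu", S[1], subst(S[2], t, R))
    if tag == "var":
        return R if S[1] == t else S
    raise ValueError(S)

def unfold1(S):
    """One unfolding of every top-level recursion, also under prefixes."""
    if S == "end" or S[0] == "var":
        return S
    tag = S[0]
    if tag in ("!", "?"):
        return (tag, S[1], unfold1(S[2]))
    if tag in ("+", "&"):
        return (tag, {l: unfold1(Si) for l, Si in S[1].items()})
    if tag == "mu":
        return subst(S[2], S[1], S)   # mu t.S  ->  S[mu t.S / t]
    raise ValueError(S)

def unfoldn(n, S):
    """n-fold iteration; terminates because types are contractive."""
    for _ in range(n):
        S = unfold1(S)
    return S

S = ("mu", "t", ("!", "U3", ("var", "t")))
assert unfold1(S) == ("!", "U3", S)
assert unfoldn(2, S) == ("!", "U3", ("!", "U3", S))
```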
Definition 3.2 (Asynchronous Subtyping). A relation ℜ ⊆ Type × Type is an asynchronous type simulation if (T1, T2) ∈ ℜ implies the following conditions:
1. If T1 = ◇, then T2 = ◇.
2. If T1 = unit, then T2 = unit.
3. If T1 = U1 → T1′, then T2 = U2 → T2′ or T2 = U2 ⊸ T2′ with (U2, U1)◦ ∈ ℜ and (T1′, T2′) ∈ ℜ.
4. If T1 = U1 ⊸ T1′, then T2 = U2 ⊸ T2′ with (U2, U1)◦ ∈ ℜ and (T1′, T2′) ∈ ℜ.
5. If T1 = ⟨S1⟩, then T2 = ⟨S2⟩ with (S1, S2) ∈ ℜ and (S2, S1) ∈ ℜ.
6. If T1 = end, then unfoldn(T2) = end.
7. If T1 = ![U1].S1, then unfoldn(T2) ≫ ![U2].S2 with (U1, U2)◦ ∈ ℜ and (S1, S2) ∈ ℜ.
8. If T1 = ?[U1].S1, then unfoldn(T2) = ?[U2].S2 with (U2, U1)◦ ∈ ℜ and (S1, S2) ∈ ℜ.
9. If T1 = ⊕[li : S1i]i∈I, then unfoldn(T2) ≫ ⊕[lj : S2j]j∈J with I ⊆ J and ∀i ∈ I. (S1i, S2i) ∈ ℜ.
10. If T1 = &[li : S1i]i∈I, then unfoldn(T2) = &[lj : S2j]j∈J with J ⊆ I and ∀j ∈ J. (S1j, S2j) ∈ ℜ.
11. If T1 = µt.S, then (unfold1(T1), T2) ∈ ℜ.

As standard, the coinductive subtyping relation T1 ≤c T2 (read: T1 is an asynchronous subtype of T2) holds when there exists a type simulation ℜ with (T1, T2) ∈ ℜ. The integration of the subtyping of higher-order (linear) functions and asynchronous sessions requires a careful formulation. (1, 2, 6) are standard identity rules. (3) says that an unlimited function can be used as a linear function. Note that the reverse is unsafe: suppose f = λx.k!⟨x⟩ with linear type nat ⊸ ◇. If we allowed the reverse direction, (λ(y : nat → ◇).(y 1 | y 2)) f would become typable, destroying the linearity of session k. In (3), when the Ui are session types, we use the operation (S1, S2)◦ = (S̄2, S̄1) to swap the tuple. The session types are dualised since the session channel is going to be used by a process in a contravariant manner.¹ To see this condition, suppose P = (λ(x : S).x!⟨2⟩.x?(y).0) s with S = ![nat].?[bool].end. Then P can safely interact with Q = s̄!⟨true⟩.s̄?(z).0. For P to be composable with Q, s in P must have a type dual to that of s̄ in Q, which is S′ = ?[bool].![nat].end.
Hence we must have S → ◇ ≤c S′ → ◇, with S̄ ≤c S̄′, where the subtyping ordering of session channels is covariant. The case where the Ti are session types is explained similarly. (4) is similar. (5) says the shared channel type is invariant (as in standard session types [5,12,7]). In (7), an output of T1 can be simulated after applying asynchronous optimisation to the unfolded T2. We also need to ensure that the object type U1 is a subtype of U2; for similar reasons as in (3), we swap the ordering if they are session types. For the input in (8), we do not require ≫, since, by definition of ≪, if the input appears at the top level in S, then it does so in all S′ such that S ≪ S′. The definitions of selection and branching subsume the traditional session branching/selection subtyping. In (9), selection is defined
¹ The original session typing system uses a judgement "Γ ⊢ P : Σ", where Γ is a shared (standard) environment and Σ is a mapping from session channels to session types. This means: P accesses the session channels specified, at most, by Σ. Contrarily, in our typing system defined in the next section, Σ appears in the left-side position, so that we need to dualise the session types for subtyping, cf. [19].
similarly to output, since a label appearing in T1 must be included in T2; dually, in (10), branching is defined like input, and any branch of T2 must be included in T1. Finally, (11) forces T1 to be unfolded until it reaches a guarded type. More examples can be found in § 4.3. We conclude this section with the main theorem for ≤c. Since types now include higher-order function types and session delegation, with a combination of n-time unfolding and permutation, the proof of transitivity of ≤c requires a family of relations connecting two simulations ℜ1 and ℜ2, for which we use the transitivity connection trc(ℜ1, ℜ2).

Lemma 3.3. If S1 ≤c S2 and S1′ ≪ unfoldn(S1), then S1′ ≤c S2.

From the above lemma, whenever S1 ℜ S2 for a type simulation ℜ, then for any n-time unfolding of S1, after applying a sequence of permutations to obtain S1′ such that S1′ ≪ unfoldn(S1), there exists a type simulation ℜ′ with S1′ ℜ′ S2. We use this fact below: given S1 ℜ S2, we obtain a simulation (the union of the ℜ′) for each level n of unfolding of S1, relating each possible permutation S1′ of unfoldn(S1) with S2.

Definition 3.4 (Transitivity Connection). When S1 ℜ S2 for a type simulation ℜ, we define the asynchrony relation of S1 and S2 as:

A(S1, S2) = ⋃n∈N { (S1′, S2) | unfoldn(S1) ≫ S1′ ∧ S1′ ℜ′ S2 for some type simulation ℜ′ ⊆ ≤c }
Then, for type simulations ℜ1 and ℜ2, we define their transitivity connection trc(ℜ1, ℜ2) as the smallest relation such that whenever S1 ℜ1 S2 and S2 ℜ2 S3, we have A(S2, S3) ⊆ trc(ℜ1, ℜ2). For a type simulation ℜ with S1 ℜ S2, A(S1, S2) (hence trc(ℜ1, ℜ2)) is also a type simulation. Using this property, we have:

Theorem 3.5 (Preorder). ≤c is a preorder.

Proof. Reflexivity is easy. For transitivity, we assume (T1, T2) ∈ ℜ1 and (T2, T3) ∈ ℜ2 for simulations ℜ1 and ℜ2, and find a simulation ℜ such that (T1, T3) ∈ ℜ. Define ℜ as:

ℜ = ℜ12 · ℜ21 ∪ ℜ21 · ℜ12    with    ℜij = ℜi ∪ trc(ℜj, ℜi)
We have (T1, T3) ∈ ℜ1 · ℜ2 ⊆ ℜ12 · ℜ21 ⊆ ℜ; we then prove that ℜ is a type simulation, observing that each ℜij above is a type simulation, being a union of type simulations; see [1].
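A restricted instance of the coinductive check of Def. 3.2 can be sketched for finite session types built only from output, input and end, with value subtyping taken to be equality; branching, functions and recursion are omitted, and pull_output implements only the (OI) permutation of Fig. 4. The encoding and function names are ours.

```python
def pull_output(S):
    """Bring the first output to the head by (OI) permutations.
    Returns (U, rest) with the skipped inputs re-prefixed in order,
    or None if no output can be pulled."""
    prefix = []
    while S != "end" and S[0] == "?":
        prefix.append(S[1])
        S = S[2]
    if S == "end" or S[0] != "!":
        return None
    U, rest = S[1], S[2]
    for Up in reversed(prefix):
        rest = ("?", Up, rest)
    return U, rest

def subtype(S1, S2):
    """Items 6-8 of Def. 3.2 on finite !/? types: an output of S1 may be
    matched by an output of S2 brought to the head by permutation."""
    if S1 == "end":
        return S2 == "end"
    if S1[0] == "?":
        return (S2 != "end" and S2[0] == "?" and S1[1] == S2[1]
                and subtype(S1[2], S2[2]))
    if S1[0] == "!":
        pulled = pull_output(S2)
        return (pulled is not None and pulled[0] == S1[1]
                and subtype(S1[2], pulled[1]))
    return False

# An eager output is a subtype of a type that outputs only after an input,
# but not vice versa (inputs may not be anticipated):
S1 = ("!", "str", ("?", "int", "end"))
S2 = ("?", "int", ("!", "str", "end"))
assert subtype(S1, S2)
assert not subtype(S2, S1)
```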
4 Asynchronous Higher-Order Session Typing

4.1 A Typing System for Programs

We first introduce the linear higher-order typing system for programs (terms which contain neither queues nor session restrictions). We define two environments:

Γ ::= ∅ | Γ, u : H    Σ ::= ∅ | Σ, k : S
(Common)
(Shared)   Γ, u : H; ∅; ∅ ⊢ u : H   (H ≠ U ⊸ T)
(Session)  Γ; k : S; ∅ ⊢ k : S
(LVar)     Γ, x : U ⊸ T; ∅; {x} ⊢ x : U ⊸ T
(Base)     Γ; ∅; ∅ ⊢ () : unit

(Function)
(Abs)    Γ, x : H; Σ; L ⊢ P : T   (if H = U ⊸ T then x ∈ L)   implies   Γ; Σ; L \ x ⊢ λ(x : H).P : H → T
(AbsS)   Γ; Σ, x : S; L ⊢ P : T   implies   Γ; Σ; L ⊢ λ(x : S).P : S → T
(Recursion)   Γ, x : U → T; ∅; ∅ ⊢ λ(y : U).P : U → T   implies   Γ; ∅; ∅ ⊢ µ(x : U → T).λ(y : U).P : U → T
(App)    Γ; Σ1; L1 ⊢ P : U → T (resp. U ⊸ T)   Γ; Σ2; L2 ⊢ Q : U   (if U = U′ → T′ then Σ2 = L2 = ∅)   implies   Γ; Σ1, Σ2; L1, L2 ⊢ P Q : T

(Process)
(Sub)    Γ; Σ; L ⊢ P : H   Σ ≤c Σ′   H ≤c H′   implies   Γ; Σ′; L ⊢ P : H′

Fig. 5. Selected Linear Session Typing
Γ is a finite mapping associating HO value types to identifiers, and Σ is a finite mapping from session channels to session types. In addition, we use a finite set of linear variables, ranged over by L, L′, . . ., to ensure the linear usage of function terms that may contain session channels. Σ, Σ′ and L, L′ denote disjoint-domain unions, and Γ, u : U means u ∉ dom(Γ). The typing judgement then takes the shape:

Γ; Σ; L ⊢ P : T

which is read: under a global environment Γ, a term P has type T, with session usage described by Σ and linear variables specified by L. We say the judgement is well-formed if dom(Γ) ⊇ L and dom(Γ) ∩ dom(Σ) = ∅. The typing system is given in Fig. 5. In each rule, we assume the environments of the consequence are defined. We focus on the rules which differ from [11]. In the first group, (Common), (Shared) is an introduction rule for identifiers with shared types; (Session) is for session channels; (LVar) is for linear variables. The second group, (Function), comes from the typed linear λ-calculus with recursive types; all rules are identical except for the addition of (Recursion). In (Abs), the premise side condition ensures that the formal parameter x, to be substituted with the received function, appears in the linear variables; in the conclusion, we remove x from the linear-variable set. (AbsS) is an abstraction rule for session channels. (Recursion) forbids the use of any free linear identifier (by the condition Σ = L = ∅), because the recursive function is used repeatedly. (App) is the rule for application; the side condition ensures that when the right-hand term has a shared function type, it must not have free session channels or linear variables. The conclusion says that the session environments and linear-variable sets of P and Q are disjoint. The final group, (Process), is for processes. The only rule different from [11] is (Sub), which now uses ≤c; we write Σ ≤c Σ′ when dom(Σ) = dom(Σ′) and, for all k : S ∈ Σ, we have k : S′ ∈ Σ′ with S ≤c S′.
The rest of the rules and their explanations can be found in [11].

4.2 A Typing System for Runtime

Session Remainder. Type soundness is established by also typing the queues created during the execution of a well-typed initial program. We track the movement of linear
functions and channels to and from the queue to ensure that linearity is preserved, and we check that endpoints continue to have dual types up to asynchronous subtyping after each use. To analyse the intermediate steps precisely, we utilise a session remainder S − τ̃ = S′, which subtracts the vector τ̃ of queue types (τ ::= U | l) of the values stored in a queue from the complete session type S of the queue, obtaining a remaining session S′. The rules are formalised below:

(Empty)   S − ε = S
(Get)     S − τ̃ = S′                implies   ?[U].S − U·τ̃ = S′
(Put)     S − τ̃ = S′                implies   ![U].S − τ̃ = ![U].S′
(Branch)  Sk − τ̃ = S′ ∧ k ∈ I       implies   &[li : Si]i∈I − lk·τ̃ = S′
(Select)  ∀i ∈ I. Si − τ̃ = S′i      implies   ⊕[li : Si]i∈I − τ̃ = ⊕[li : S′i]i∈I
When S′ is end, the session has been completed; otherwise it is not yet closed. (Empty) is the base rule. (Get) takes an input-prefixed session type ?[U].S and subtracts the type U at the head of the queue, then returns the remainder S′ of the rest of the session S minus the tail τ̃ of the queue type. (Put) disregards the output action of the session and calculates the remainder S′ of S − τ̃, which is returned prefixed with the original output, giving ![U].S′; therefore the output is not consumed. (Branch) is similar to (Get), but it records the remainder of only the k-th branch, with respect to a stored label lk. Dually, (Select) records the remainder of all selection paths.

A Typing System for Runtime. We first extend the session environment as follows:

Δ ::= Σ | Δ, s : τ̃ | Δ, s : (S, τ̃)

The typing judgement is also extended with Γ; Σ; L ⊢ l : l, which is used for typing any labels appearing in a session queue. Δ contains usage information for queues (s : τ̃) in a term, so that the cumulative result can be compared with the expected session type; for this we use the pairing s : (S, τ̃), which combines the usage of a channel and the sequence of types already on its queue. We identify (S, τ̃) and (τ̃, S). We define a composition operation ∘ on Δ-environments, used to obtain the paired usages of channels and queues:

Δ1 ∘ Δ2 = {s : (Δ1(s), Δ2(s)) | s ∈ dom(Δ1) ∩ dom(Δ2)} ∪ Δ1\dom(Δ2) ∪ Δ2\dom(Δ1)

The typing rules for runtime are listed in Fig. 6. (Label) types a label in a queue, while (Queue) forms the sequence of types of the values in a queue: we ensure the disjointness of the session environments of the values, and apply a weakening for end (Σ0) for closure under the structural rules. (Par) composes processes, including queues, and records session usage by ∘; this rule subsumes (Par) for programs. (News) is the main rule for typing the two endpoint queues of a session.
Types S1 and S2 can be given to the queues s and s̄ when the session remainders S1′ and S2′ of S1 − τ̃1 and S2 − τ̃2 are dual session types up to asynchronous subtyping; more precisely, S1′ must be a subtype of the dual of S2′, written S1′ ≤c S̄2′. Since the session is then compatible, we can restrict s. Note that in all runtime judgements, the set of linear variables is empty.
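The five remainder rules can be transcribed directly (the tuple encoding of types is our own); note how (Put) keeps the output prefix while (Get) consumes the matching queue type:

```python
def remainder(S, taus):
    """S - taus following (Empty), (Get), (Put), (Branch), (Select).
    taus is the list of queue types (value types and labels)."""
    if not taus:                      # (Empty): S - eps = S
        return S
    tag = S[0]
    if tag == "?":                    # (Get): consume the head queue type
        assert S[1] == taus[0], "queue value must match the expected input"
        return remainder(S[2], taus[1:])
    if tag == "!":                    # (Put): the output is not consumed
        return ("!", S[1], remainder(S[2], taus))
    if tag == "&":                    # (Branch): a stored label picks a branch
        return remainder(S[1][taus[0]], taus[1:])
    if tag == "+":                    # (Select): remainder on every branch
        return ("+", {l: remainder(Si, taus) for l, Si in S[1].items()})
    raise ValueError("no remainder for %r" % (S,))

S = ("?", "int", ("?", "bool", "end"))
assert remainder(S, ["int"]) == ("?", "bool", "end")

# An output before the expected input stays in place:
S2 = ("!", "str", ("?", "int", "end"))
assert remainder(S2, ["int"]) == ("!", "str", "end")
```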
(Label)   Γ; ∅; ∅ ⊢ l : l

(Queue)   Γ; Σi; ∅ ⊢ hi : τi (i ∈ 1..n)   (if τi = U → T then Σi = ∅)   Σ0 ⊆ {s̄ : end}   implies   Γ; (Σ0, .., Σn), s : τ1..τn; ∅ ⊢ s : h1..hn : ◇

(Par)     Γ; Δ1; L1 ⊢ P1 : ◇   Γ; Δ2; L2 ⊢ P2 : ◇   implies   Γ; Δ1 ∘ Δ2; L1, L2 ⊢ P1 | P2 : ◇

(News)    Γ; Δ, s : (S1, τ̃1), s̄ : (S2, τ̃2); ∅ ⊢ P : ◇   Si − τ̃i = Si′ (i ∈ 1, 2)   S1′ ≤c S̄2′   implies   Γ; Δ; ∅ ⊢ (ν s)P : ◇

(New)     Γ, a : ⟨S⟩; Δ; L ⊢ P : ◇   implies   Γ; Δ; L ⊢ (ν a : ⟨S⟩)P : ◇

Fig. 6. Runtime Typing
4.3 Typing the Optimised Mobile Business Protocol

Using the program and runtime typing systems, we can now type the hotel booking example of § 2.2, in the presence of asynchronous optimisation for higher-order mobility. Agent and the standard Client can be typed with:

SAgent = ![int].&[move : ?[unit ⊸ ◇].S′Agent, local : S′Agent]
with S′Agent = ?[string].?[string].![double].?[int].end and SClient = S̄Agent
We then type MClient by using the rules in Fig. 5 and [11]:

SMClient = ⊕[move : ![unit ⊸ ◇].![string].![string].![int].?[int].?[double].end]

Applying Def. 3.2, we verify that SMClient ≤c S̄Agent (= SClient). Then, using the typing rules (Acc, Req), we can type both MClient and Agent with a : ⟨SAgent⟩ ∈ Γ, after applying (Sub) on the premises of (Req) typing the body of MClient. We now demonstrate runtime typing; after three reduction steps of MClient | Agent we can reach the configuration:

(ν s)( s ▷ {move : s?(code).(run code | . . .), local : . . .} | s̄ : rtt | s : move·⟨s̄!⟨ritz⟩ . . .⟩ )

with s as the Agent's queue. Both queues contain values, including the linear higher-order code sent by MClient (which became 0 after this output). Using (Queue, Label) we type the queue s : move·⟨s̄!⟨ritz⟩ . . .⟩ with session environment {s̄ : S′MClient, s : move·(unit ⊸ ◇)}, where S′MClient comes from typing the HO code containing s̄, and:

S′MClient = ![string].![string].![int].?[int].?[double].end

and similarly we type s̄ : rtt with {s̄ : int}. The Agent s ▷ {move : . . . , local : . . .} is typed with (Bra) under the session environment s : &[move : ?[unit ⊸ ◇].S′Agent, local : S′Agent]. The above session environments can be synthesised using ∘ to obtain:

s̄ : (S′MClient, int), s : (&[move : ?[unit ⊸ ◇].S′Agent, local : S′Agent], move·(unit ⊸ ◇))

Now we use the rules in § 4.2 to calculate the session remainder of each queue:

S′MClient − int = ![string].![string].![int].?[double].end
&[move : ?[unit ⊸ ◇].S′Agent, local : S′Agent] − move·(unit ⊸ ◇) = S′Agent
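The two remainder computations above can be replayed mechanically. In this sketch "thunk" abbreviates the code type unit ⊸ ◇ and the tuple encoding of types is our own:

```python
def remainder(S, taus):
    """Session remainder S - taus, as in Section 4.2."""
    if not taus:
        return S                                  # (Empty)
    tag = S[0]
    if tag == "?":                                # (Get)
        assert S[1] == taus[0]
        return remainder(S[2], taus[1:])
    if tag == "!":                                # (Put): keep the output
        return ("!", S[1], remainder(S[2], taus))
    if tag == "&":                                # (Branch)
        return remainder(S[1][taus[0]], taus[1:])
    if tag == "+":                                # (Select)
        return ("+", {l: remainder(Si, taus) for l, Si in S[1].items()})
    raise ValueError(S)

# S'_MClient minus the queued int: the three outputs stay, ?[int] is consumed.
S_MC = ("!", "string", ("!", "string",
        ("!", "int", ("?", "int", ("?", "double", "end")))))
assert remainder(S_MC, ["int"]) == \
    ("!", "string", ("!", "string", ("!", "int", ("?", "double", "end"))))

# The Agent's branching type minus the queued move and thunk yields S'_Agent.
S_Ag2 = ("?", "string", ("?", "string", ("!", "double", ("?", "int", "end"))))
S_Ag = ("&", {"move": ("?", "thunk", S_Ag2), "local": S_Ag2})
assert remainder(S_Ag, ["move", "thunk"]) == S_Ag2
```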
and we have ![string].![string].![int].?[double].end ≤c S̄′Agent. Finally, we can apply (News) and complete the derivation. We can also check that the similar example in § 1, s!⟨s?(x).s!⟨x⟩⟩.s?(z1).s?(z2).0, is untypable, since we cannot compose the session environments, which include s both in the sent thunk and in the continuation.
5 Communication Safety and Algorithmic Subtyping

5.1 Type Soundness and Communication Safety

This section studies the key properties of our typing system. First we show that typed processes enjoy subject reduction and communication safety. We begin by introducing balanced environments, which specify the conditions for composable environments of runtime processes.

Definition 5.1 (Balanced Δ). balanced(Δ) holds if whenever {s : (S1, τ̃1), s̄ : (S2, τ̃2)} ⊆ Δ with S1 − τ̃1 = S1′ and S2 − τ̃2 = S2′, then S1′ ≤c S̄2′.

The definition is based on (News) in the runtime typing system (Fig. 6): intuitively, all subprocesses generated from an initial typable program should conform to the balanced condition. We next define an ordering on session environments which abstractly represents an interaction at session channels.

Definition 5.2 (Δ Ordering). Recall ∘ defined in § 4.2. We define Δ ⇒s Δ′ as follows:

s : ?[U].S ∘ s : U·τ̃  ⇒s  s : S ∘ s : τ̃
s : ![U].S ∘ s̄ : τ̃  ⇒s  s : S ∘ s̄ : τ̃·U
s : &[li : Si]i∈I ∘ s : lk·τ̃  ⇒s  s : Sk ∘ s : τ̃   (k ∈ I)
s : ⊕[li : Si]i∈I ∘ s̄ : τ̃  ⇒s  s : Sk ∘ s̄ : τ̃·lk   (k ∈ I)
s : µt.S ∘ s : τ̃  ⇒s  s : S′ ∘ s : τ̃′   if s : S[µt.S/t] ∘ s : τ̃ ⇒s s : S′ ∘ s : τ̃′
Δ ∘ Δ1  ⇒s  Δ ∘ Δ2   if Δ1 ⇒s Δ2 and Δ ∘ Δ1 is defined

Note that if Δ1 ⇒s Δ2 and Δ ∘ Δ1 is defined, then Δ ∘ Δ2 is defined; and if balanced(Δ) and Δ ⇒s Δ′, then balanced(Δ′). Then, by the standard substitution lemmas, we have:

Theorem 5.3 (Type Soundness)
1. Suppose Γ; Δ; L ⊢ P : ◇. Then P ≡ P′ implies Γ; Δ; L ⊢ P′ : ◇.
2. Suppose Γ; Δ; ∅ ⊢ P : T with balanced(Δ). Then P −→ P′ implies Γ; Δ′; ∅ ⊢ P′ : T, with either Δ = Δ′ or Δ ⇒s Δ′.

We now formalise communication safety (which subsumes the usual type safety). First, an s-queue is a queue process s : h̃. An s-input is a process of the shape s?(x).P or s ▷ {li : Pi}i∈I. An s-output is a process s!⟨V⟩.P or s ◁ l.P. Then, an s-process is an s-queue, an s-input or an s-output. Finally, an s-redex is a parallel composition of either an s-input and a non-empty s-queue, or an s-output and an s-queue.

Definition 5.4 (Error Process). We say P is an error if P ≡ (ν ã)(ν s̃)(Q | R), where Q is one of the following: (a) a parallel composition of two s-processes that does not form either an s-redex, or an s-input and an empty s-queue; (b) an s-redex consisting of an s-input and an s-queue such that Q = s?(x).Q′ | s : lk·h̃ or Q = s ▷ {li : Pi}i∈I | s : V·h̃; (c) an s-process for s ∈ s̃ with s̄ not free in R or Q; (d) a prefixed process or an application containing an s-queue.
The above says that a process is an error if (a) it breaks the linearity of s by having, e.g., two s-inputs in parallel; (b) there is a communication mismatch; (c) there is no corresponding opponent process for a session; or (d) it encloses a queue under a prefix, thus making it unavailable. As a corollary of Theorem 5.3, we achieve the following general communication-safety theorem, subsuming the case where P is an initial program.

Theorem 5.5 (Communication Safety). If Γ; Δ; L ⊢ P : ◇ with balanced(Δ), then P never reduces to an error.

5.2 Algorithmic Higher-Order Asynchronous Subtyping

This subsection proposes an algorithmic subtyping, extending the method of [12, § 3]. While the inclusion of higher-order sessions and functional types complicates the proof of soundness, the basic idea of the rules and proofs stays as before. First we prove the decidability of S ≪ S′, introducing the rewriting rule S →! S′, which moves an output action to the head (using ≪ in the reverse direction); similarly, S →⊕ S′ moves a selection to the head. For a simple example, let S0 = &[l1 : ?[U1].![U2].end, l2 : ![U2].end]. Then S0 →! &[l1 : ![U2].?[U1].end, l2 : ![U2].end] →! ![U2].&[l1 : ?[U1].end, l2 : end], by applying (OI, OB) of Fig. 4 in the reverse direction. Since →! and →⊕ terminate, and S ≫ S′ can be decomposed into a finite sequence S →π1 · · · →πn S′ (n ≥ 0) with πi ∈ {!, ⊕}, the decidability of ≪ is straightforward. Using this relation, we can define the derivability of the judgement Σ ⊢ T ≤ T′, where Σ is a sequence of assumed goals in the subtyping derivation. We list only the key output rule, which is used together with the standard output rule [5,4]:

(Out)   T[![U2].S2h]h∈H →! ![U2].T[S2h]h∈H    Σ ⊢ ![U1].S1 ≤ ![U2].T[S2h]h∈H    S1 ≍1 T[S2h]h∈H
implies   Σ ⊢ ![U1].S1 ≤ T[![U2].S2h]h∈H

where T[Sh]h∈H represents an h-hole context, and T ≍1 T′ means that T and T′ have the same session constructors under matching recursions; labels in each type are distinct. This rule reads: we fix the subtype and apply →! to place ![U2] at the head; then we can use the standard output rule. As an example, let S1 = &[l1 : ?[U1].end, l2 : end]. Then we can derive ![U2].S1 ≤ S0 (S0 is given above) by using (Out). The algorithm is applied to the initial goal ∅ ⊢ T ≤ T′. Then, using the same method developed in [12,5], we can prove that the subtyping algorithm always terminates. We conclude this section with the following theorem (see [1]):

Theorem 5.6 (Soundness and Completeness of the Algorithmic Subtyping). For all closed types T and T′ with T ≍1 T′, T ≤c T′ if and only if ∅ ⊢ T ≤ T′.
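The rewriting →! on the example S0 can be sketched as follows: pull moves one output to the head through inputs and branchings, i.e. it applies (OI) and (OB) right to left. The encoding and the function name are ours; the selection-moving rewriting →⊕ is analogous and omitted.

```python
def pull(S):
    """One result of the !-rewriting: bring an output to the head.
    Returns ("!", U, S2) or None if no output can be pulled."""
    if S == "end":
        return None
    tag = S[0]
    if tag == "!":
        return S                        # already output-headed
    if tag == "?":                      # (OI) reversed: jump over an input
        p = pull(S[2])
        if p is None:
            return None
        return ("!", p[1], ("?", S[1], p[2]))
    if tag == "&":                      # (OB) reversed: factor a common output
        pulled = {l: pull(Si) for l, Si in S[1].items()}
        if any(p is None for p in pulled.values()):
            return None
        Us = {p[1] for p in pulled.values()}
        if len(Us) != 1:                # every branch must emit the same type
            return None
        return ("!", Us.pop(), ("&", {l: p[2] for l, p in pulled.items()}))
    return None

# The paper's example: S0 rewrites to ![U2].&[l1 : ?[U1].end, l2 : end].
S0 = ("&", {"l1": ("?", "U1", ("!", "U2", "end")),
            "l2": ("!", "U2", "end")})
assert pull(S0) == ("!", "U2", ("&", {"l1": ("?", "U1", "end"),
                                      "l2": "end"}))
```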
6 Related and Future Work

Asynchronous subtyping was first studied in [12] for multiparty session types [8]; that work supports neither higher-order sessions (delegation) nor code mobility (higher-order functions). Both features provide powerful abstractions for structured distributed computing; delegation is the key primitive in our implementation of session types in Java [9] and web service protocols [17,18], to which we
can now apply our theory for flexible optimisation. The proof of transitivity in this paper requires a more complex construction of the transitive closure trc(ℜ1, ℜ2) (Definition 3.4) than the one in [12], due to the higher-order constructs. In spite of the richness of the type structures, we proposed a more compact runtime typing and proved communication safety in the presence of higher-order code, which is not presented in [12]. Note that our new typing system subsumes the previous linear typing system of [11], demonstrating a smooth integration of two kinds of type-directed optimisation. Coinductive subtyping of recursive session types was first studied in [5], adapting the standard methods for IO-subtyping in the π-calculus [14]. The system of [5] does not provide any form of asynchronous permutation, and thus does not need the nested n-time unfolding (Definition 3.1). Our transitivity proof and algorithmic subtyping are more involved than those of [5], due to the incorporation of n-time unfolding and higher-order functions. Our treatment of runtime typing, specifically our method for typing session queues and the use of session remainders, is more compact than in previous asynchronous session works [8,2,3], which use a method of rolling back messages: the head type of a queue typing moves to the prefix of the session type of the process using the queue, and compatibility is then checked on the constructed types. Our method is simpler, as we remove the type elements appearing in a queue from its typing. On the other hand, our queue typing is similar to that of the functional language in [6], where smaller types are obtained after matching with buffer values. Our method works with queue types rather than with values directly, hence it can be extended smoothly to handle asynchronous optimisation, which is not treated in [6].
For example, we allow a type consisting of an output followed by an input action to be reduced against a type corresponding to the input, leaving the output prefix intact. Using a more delicate composition between values and queue typings, our system enables linear mobile code to be stored in the queues. We intend to integrate the improved methods of this work back into our original subtyping method for multiparty sessions [12], extending it to higher-order multiparty sessions. Another direction is progress [2], by which we mean deadlock-free execution of multiple interleaved sessions: in the presence of higher-order code mobility, this extension is challenging, since it requires tracking dependencies inside mobile code. For example, if s!⟨P⟩ is blocked, the sessions inside P are also blocked. On the other hand, we postulate that asynchronous subtyping does not introduce deadlock into a deadlock-free supertype, as outputs and selections can only be performed in advance (partial commutativity), satisfying even stricter input dependencies than those required by the dual session of the supertype.
References

1. On-line Appendix of this paper, http://www.doc.ic.ac.uk/~mostrous/hopiasync
2. Bettini, L., Coppo, M., D'Antoni, L., De Luca, M., Dezani-Ciancaglini, M., Yoshida, N.: Global progress in dynamically interleaved multiparty sessions. In: van Breugel, F., Chechik, M. (eds.) CONCUR 2008. LNCS, vol. 5201, pp. 418–433. Springer, Heidelberg (2008)
3. Bonelli, E., Compagnoni, A.: Multipoint Session Types for a Distributed Calculus. In: Barthe, G., Fournet, C. (eds.) TGC 2007. LNCS, vol. 4912, pp. 240–256. Springer, Heidelberg (2008)
4. Carbone, M., Honda, K., Yoshida, N.: Structured Communication-Centred Programming for Web Services. In: De Nicola, R. (ed.) ESOP 2007. LNCS, vol. 4421, pp. 2–17. Springer, Heidelberg (2007)
5. Gay, S., Hole, M.: Subtyping for Session Types in the Pi-Calculus. Acta Inf. 42(2/3), 191–225 (2005)
6. Gay, S., Vasconcelos, V.T.: Linear type theory for asynchronous session types (October 2008) (submitted for publication)
7. Honda, K., Vasconcelos, V.T., Kubo, M.: Language Primitives and Type Disciplines for Structured Communication-based Programming. In: Hankin, C. (ed.) ESOP 1998. LNCS, vol. 1381, pp. 122–138. Springer, Heidelberg (1998)
8. Honda, K., Yoshida, N., Carbone, M.: Multiparty Asynchronous Session Types. In: POPL 2008, pp. 273–284. ACM Press, New York (2008)
9. Hu, R., Yoshida, N., Honda, K.: Session-Based Distributed Programming in Java. In: Vitek, J. (ed.) ECOOP 2008. LNCS, vol. 5142, pp. 516–541. Springer, Heidelberg (2008)
10. Milner, R.: Functions as processes. MSCS 2(2), 119–141 (1992)
11. Mostrous, D., Yoshida, N.: Two Session Typing Systems for Higher-Order Mobile Processes. In: Della Rocca, S.R. (ed.) TLCA 2007. LNCS, vol. 4583, pp. 321–335. Springer, Heidelberg (2007)
12. Mostrous, D., Yoshida, N., Honda, K.: Global principal typing in partially commutative asynchronous sessions. In: Castagna, G. (ed.) ESOP 2009. LNCS, vol. 5502, pp. 316–332. Springer, Heidelberg (2009), www.doc.ic.ac.uk/˜mostrous/asyncsub
13. The Message Passing Interface (MPI) standard, http://www-unix.mcs.anl.gov/mpi/usingmpi/examples/intermediate/main.htm
14. Pierce, B.C., Sangiorgi, D.: Typing and subtyping for mobile processes. In: Logic in Computer Science (1993); Full version in Mathematical Structures in Computer Science 6(5) (1996)
15. Sangiorgi, D.: Expressing Mobility in Process Algebras: First-Order and Higher-Order Paradigms. PhD thesis, University of Edinburgh (1992)
16. Takeuchi, K., Honda, K., Kubo, M.: An Interaction-based Language and its Typing System.
In: Halatsis, C., Philokyprou, G., Maritsas, D., Theodoridis, S. (eds.) PARLE 1994. LNCS, vol. 817, pp. 398–413. Springer, Heidelberg (1994)
17. UNIFI. International Organization for Standardization ISO 20022 UNIversal Financial Industry message scheme (2002), http://www.iso20022.org
18. Web Services Choreography Working Group. Web Services Choreography Description Language, http://www.w3.org/2002/ws/chor/
19. Yoshida, N.: Channel dependency types for higher-order mobile processes. In: POPL 2004, pp. 147–160. ACM Press, New York (2004), www.doc.ic.ac.uk/˜yoshida
20. Yoshida, N., Vasconcelos, V.T.: Language Primitives and Type Disciplines for Structured Communication-based Programming Revisit. In: SecRet 2006. ENTCS, vol. 171(3), pp. 127–151. Elsevier, Amsterdam (2007)
The Cut-Elimination Theorem for Differential Nets with Promotion

Michele Pagani

Laboratoire Preuves, Programmes et Systèmes, Université Paris Diderot – Paris 7 & Dipartimento di Informatica, Università degli Studi di Torino
http://www.di.unito.it/~pagani
Abstract. Recently Ehrhard and Regnier have introduced Differential Linear Logic, DiLL for short — an extension of the Multiplicative Exponential fragment of Linear Logic that is able to express non-deterministic computations. The authors have examined the cut-elimination of the promotion-free fragment of DiLL by means of a proofnet-like calculus: differential interaction nets. We extend this analysis to exponential boxes and prove the Cut-Elimination Theorem for the whole DiLL: every differential net that is sequentializable can be reduced to a cut-free net.
Introduction

The cut-elimination procedure was invented by Gentzen in order to prove the consistency of classical logic and Peano Arithmetic. First, he introduced the sequent calculus LK, a proof system sound and complete with respect to classical logic. In this system there is just one deductive rule – the cut – which might prove absurdity, so that consistency can be deduced from the redundancy of that rule. In order to achieve this, Gentzen then defined cut-elimination, a procedure transforming a proof π into another proof π′ of the same theorem in which, however, the cuts of π are in "some sense" reduced. Redundancy of the cut rule is then obtained by proving that iterating this procedure always terminates in a cut-free proof – the Cut-Elimination Theorem. Over time, the cut-elimination procedure has acquired more and more importance in proof theory, also independently of the question of consistency. In particular, in the 1960s cut-elimination revealed a deep nexus between logic and computer science, enabling a correspondence between the execution of programs and the cut-elimination of proofs. This correspondence is called the Curry-Howard correspondence, and it allows one to express termination properties of programming languages as cut-elimination theorems in specific proof systems.
Supported by a post-doc fellowship of Région Île de France and by the Italian MIUR Project "CONCERTO".
P.-L. Curien (Ed.): TLCA 2009, LNCS 5608, pp. 219–233, 2009. © Springer-Verlag Berlin Heidelberg 2009
220
M. Pagani
Linear logic (LL, [Gir87]) has been built around cut-elimination: it splits the connectives "and" and "or" of LK into two classes (the multiplicatives ⊗, ⅋ and the additives &, ⊕) depending on their behavior during cut-elimination, and it introduces a new pair of dual connectives (the exponentials !, ?) giving a logical status to the actions of erasing and duplicating whole pieces of a proof. Linear logic allows one to express cut-elimination as a rewriting of proof nets – specific graphs that give a sharper account of cut-elimination than LK. Indeed the proof nets drastically decrease the number of commutative cuts, which abound instead in sequent calculus. In the syntax of proof nets the commutative cuts are due to the boxes – special hyper-edges representing the sequent rules of LL that have some restriction on the context (i.e. promotion, additive conjunction and universal quantification)1. A particular feature of these cuts is that they can be profoundly affected by the reduction of other cuts, even changing their commutative nature: this may considerably muddle the picture (see [PTdF07] for a more detailed discussion). As far as one restricts to the box-free fragment of LL, the cut-elimination is easily tamable: the reduction of a cut does not affect that of the others, and a parallel reduction can be defined straightforwardly. Starting from this remark, Lafont introduced interaction nets [Laf90] – a graph-rewriting paradigm of distributed computation based on the box-free fragment of the proof nets. The discovery of LL and proof nets was a fundamental step towards the extension of the Curry-Howard correspondence, which was at the beginning restricted to the functional and sequential core of programming languages. In particular, Ehrhard and Regnier [ER06] have recently achieved a significant step towards a logical understanding of concurrency theory with the introduction of differential linear logic (DiLL).
This system extends linear logic with three new rules handling the ! modality (codereliction, cocontraction and coweakening) and allows one to express a concurrent sharing of resources [EL08]. The codereliction in particular creates data that can be called exactly once, so that a program made of several subroutines is executed non-deterministically on "coderelicted" inputs, depending on which subroutine gains the unique available copy of the inputs. Thus we have formal sums, where each addendum represents a possibility. The cut-elimination of DiLL is analyzed with a proofnet-like calculus, called differential nets. Actually, [ER06] considers only the promotion-free fragment of DiLL, whose nets are without boxes and are called differential interaction nets, being a non-deterministic example of Lafont's interaction nets. In that restricted setting the authors prove the Cut-Elimination Theorem. In this paper, we extend their result to the whole of DiLL, using differential nets with exponential boxes. The main difficulty in such an extension is to account for the exponential commutative cuts, and specifically for the commutative cut between a codereliction and a box, which has a completely non-standard behavior with respect to the commutative cuts in linear logic proof nets (see the discussion on Figure 6).
1. In this paper, indeed, we consider only the exponential boxes. Up to now, differential nets are restricted to the multiplicatives and exponentials; besides, there are specific notions of LL proof nets able to avoid the boxes for additives and quantifiers.
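The non-deterministic reading of codereliction sketched above can be mimicked by a toy computation: a value available exactly once is contended by several subroutines, and the whole program denotes the formal sum of the possible outcomes. This is only an illustrative analogy, not the semantics of DiLL; all names are ours.

```python
# A "coderelicted" input can be consumed exactly once. Running the program
# returns the formal sum (here: a list) of outcomes, one addendum per
# choice of which subroutine gets the unique copy.
def outcomes(subroutines, resource):
    results = []
    for i in range(len(subroutines)):
        # subroutine i consumes the unique copy; the others get nothing
        results.append(tuple(g(resource if j == i else None)
                             for j, g in enumerate(subroutines)))
    return results

subs = [lambda x: 'used' if x is not None else 'starved',
        lambda x: 'used' if x is not None else 'starved']
assert outcomes(subs, 42) == [('used', 'starved'), ('starved', 'used')]
```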
[Figure 1 not reproduced: it displays the sequent calculus rules of DiLL – the identity rules ax and cut, the exchange rule ex, the multiplicative rules ⊗ and ⅋, the exponential rules ?w, ?c, ?d and the promotion p, the differential rules !w, !c, !d, and the rules sum, 0, empty and mix.]

Fig. 1. Sequent calculus rules for differential linear logic
1 Preliminaries

1.1 Differential Nets
We recall differential nets and their cut-elimination: first, we introduce the promotion-free nets, called differential interaction nets in [ER06], then we add boxes to accommodate promotion. For the sake of brevity, the presentation is kept informal, and we refer to [ER06,Vau07] for a more detailed one. We denote sets with braces { } and sequences with angles ⟨ ⟩. Boldface letters a, b etc. range over sequences, and for i ≤ length(a), a_i denotes the i-th element of a.

Types of DiLL. The formulas of DiLL are generated by the following grammar, where X, X⊥ range over a fixed set of propositional variables:

A, B ::= X | X⊥ | A ⊗ B | A ⅋ B | !A | ?A.

Linear negation is involutive (A⊥⊥ = A) and defined through the De Morgan laws: (X)⊥ := X⊥, (A ⊗ B)⊥ := A⊥ ⅋ B⊥, (!A)⊥ := ?A⊥. Variables and their negations are atomic; ⊗, ⅋ are multiplicative, while !, ? are exponential. For brevity, we omit the multiplicative units 1, ⊥; however, all the results in this paper can be extended to the general case straightforwardly. A sequent Γ is a finite sequence of formulas A1, . . . , An. Capital Greek letters Γ, Δ range over sequents. Fig. 1 gives the rules of the DiLL sequent calculus. The calculus is extended with the rule mix and its zeroary version empty: this is needed to have a fair correctness criterion (Prop. 1).

Differential interaction nets. A simple interaction net α is the union of a graph and a hyper-graph on a given set of nodes, respecting the following constraints (see Figure 5(a) for examples).
– The nodes of α are called ports; they are crossed by exactly one edge and by at most one hyper-edge. In the figures, the ports are not explicitly depicted, as they correspond to the extremities of the edges.
[Figure 2 not reproduced: the cells ⅋, ⊗, ?w, ?d, ?c, !w, !d, !c with the types of their ports.]

Fig. 2. Cells for differential interaction nets, together with their typing rules
– The edges of α are called wires; they are undirected, possibly loops. A wire {a, b} between two distinct ports a, b has two orientations: from a to b, denoted ⟨a, b⟩, and from b to a, denoted ⟨b, a⟩; with each orientation is associated a formula of DiLL, in such a way that the formula associated with ⟨a, b⟩ must be the linear negation of the formula associated with ⟨b, a⟩.2
– The hyper-edges of α are called cells, and they are sequences of the ports they cross; the first port crossed by a cell is called principal, the other ones (if any) are called auxiliary; every cell is labelled by a symbol that determines the arity of the cell and the types of the wires incident to it, as depicted in Figure 2. We require also that cells are not incident to loops.
The type of a port a of α is the type of ⟨b, a⟩, where {a, b} is the unique wire of α crossing a; in particular this means that the ports connected by a wire have dual types. The ports of α which are crossed by no cell nor loop are called free. We require that α is given with an enumeration p of its free ports, called the interface of α. The sequent conclusion of α is Γ = A1, . . . , An, where n is the length of p and, for every i ≤ n, Ai is the type of p_i. A differential interaction net with sequent conclusion Γ is a finite multiset3, possibly empty, of simple interaction nets with the conclusion Γ. Loops must be admitted, as they can be produced by cut-elimination. However, they do not appear in the differential nets that are sequentializable, and so they are deliberately left out of the discussion most of the time.

Adding boxes. Boxes are a special kind of cell, parameterized by a net, the latter standing for a proof of the premise of the corresponding promotion rule. Formally, the sets dN of differential nets and sN of simple nets are defined simultaneously, by induction on the exponential depth. This means that we define sN_d and dN_d for every d ∈ N and then we set:
sN := ⋃_{d=0}^{∞} sN_d,    dN := ⋃_{d=0}^{∞} dN_d.
sN_0 (resp. dN_0) is the set of the simple interaction nets (resp. differential
2. Loops are intentionally considered untyped.
3. In a general setting, differential nets are finite linear combinations of simple nets with coefficients in a commutative semiring R with units. In this paper, however, we will consider only the case R = N, and in such a case the differential nets are the finite multisets of simple nets.
[Figure 3 not reproduced: (a) a box; (b) a box with its contents depicted inside, with conclusions ?C1, . . . , ?Cn, !A.]

Fig. 3. A box of type !π can also be presented with its contents depicted inside
interaction nets). A simple net of sN_{d+1} is a simple net α defined from the cells of Figs. 2 and 3(a), such that every box o of α is labelled by a symbol !π, where π is a differential net of dN_d, called the contents of o. Moreover, together with o there is given a fixed correspondence between the free ports of every simple net β ∈ π and the ports of o: for every port a of o we denote by a_β the corresponding free port of β. This correspondence enjoys the typing conditions sketched in Fig. 3(b). A differential net π of dN_d with sequent conclusion Γ is a finite multiset, possibly empty, of simple nets of sN_d with sequent conclusion Γ. Initial Greek letters α, β, γ (resp. final Greek letters π, σ, ρ) range over simple nets (resp. differential nets). We use additive notation for multisets: 0 is the empty multiset, π + σ is the disjoint union of π and σ (repetition does matter); a differential net π can also be written as ∑_{α∈π} α. The depth of a simple net α (resp. differential net π) is the minimal d such that α ∈ sN_d (resp. π ∈ dN_d). Many definitions are given by induction on the depth: we omit mentioning it explicitly when evident. So, we say that a cell/wire w is at depth d of a differential net π, denoted w ∈_d π, whenever w ∈_d α for a simple net α of π; and we say that w ∈_d α whenever either d = 0 and w is a cell/wire of α, viewed as an interaction net, or there is a box !ρ ∈_0 α and w ∈_{d−1} ρ. We write w ∈_! π meaning w ∈_d π for some d ∈ N.

Switching acyclicity. Every proof in the sequent calculus of DiLL can be translated into a differential net with the same sequent conclusion. This translation is defined by induction on the size of the proof and can be easily deduced from the sequent rules of Figure 1. The translation is not surjective (nor injective) over dN, and we call sequentializable the differential nets that are images of it.
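Before turning to correctness, the type grammar and De Morgan duality of Section 1.1 can be transcribed directly; a minimal sketch, where the tuple encoding and the function names are ours:

```python
# DiLL formulas: variables X, X⊥, tensor ⊗, par ⅋, and exponentials !, ?.
# Formulas are encoded as nested tuples.
def var(x):       return ('var', x)
def covar(x):     return ('covar', x)
def tensor(a, b): return ('⊗', a, b)
def par(a, b):    return ('⅋', a, b)
def bang(a):      return ('!', a)
def whynot(a):    return ('?', a)

def dual(f):
    """Linear negation, defined through the De Morgan laws."""
    tag = f[0]
    if tag == 'var':   return covar(f[1])
    if tag == 'covar': return var(f[1])
    if tag == '⊗':     return par(dual(f[1]), dual(f[2]))
    if tag == '⅋':     return tensor(dual(f[1]), dual(f[2]))
    if tag == '!':     return whynot(dual(f[1]))
    if tag == '?':     return bang(dual(f[1]))

A = bang(tensor(var('X'), covar('X')))
assert dual(A) == whynot(par(covar('X'), var('X')))
assert dual(dual(A)) == A   # negation is involutive: A⊥⊥ = A
```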
Purely graph-theoretical conditions, called correctness criteria, have been presented in order to characterize the set of sequentializable differential nets. We give here one of the most celebrated of such correctness criteria, switching acyclicity, presented originally by Danos and Regnier [DR89] for the multiplicative fragment of linear logic, and then extended to DiLL in [ER06].4 A switching of a cell c is an undirected graph σ whose nodes are the ports of c and whose edges are defined depending on the label of c: in case c is of type ⅋ or ?c, σ has exactly one edge, crossing the principal port and one chosen auxiliary port of c (so c has two possible switchings); otherwise σ is unique and has one edge for each auxiliary port of c, if any, wiring that port to the principal
4. To be precise, [ER06] introduces switching acyclicity in the promotion-free fragment of DiLL; however, its generalization to exponential boxes is straightforward.
one. A correctness graph of a simple net α is an undirected graph σ having as nodes the ports of α and as edges the wires of α plus the edges obtained by substituting every cell with one among its switchings. A differential net π is switching acyclic if every simple net of π is switching acyclic; a simple net α is switching acyclic if every correctness graph of α is an acyclic graph and, for every box !ρ ∈_0 α, ρ is switching acyclic.

Proposition 1. A differential net π is sequentializable iff it is switching acyclic.

Proof. Standard generalization of the technique developed in [DR89,Dan90].
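The acyclicity check at the heart of the criterion can be sketched concretely: one correctness graph is acyclic iff no edge ever joins two already-connected ports, which union-find detects. A toy, with our own graph encoding; for brevity it does not enumerate all switchings nor recurse into boxes.

```python
def acyclic(n_ports, edges):
    """True iff the undirected graph on ports 0..n_ports-1 with the given
    edges (wires plus switching edges) has no cycle."""
    parent = list(range(n_ports))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra == rb:          # joining two ports already connected
            return False      # closes a cycle in the correctness graph
        parent[ra] = rb
    return True

assert acyclic(3, [(0, 1), (1, 2)])              # a tree: correct
assert not acyclic(3, [(0, 1), (1, 2), (2, 0)])  # a cycle: not correct
```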
1.2 Cut-Elimination
A port is active whenever it is the principal port of a cell or it is an auxiliary port of a box. A cut is a wire connecting two active ports. A differential net is cut-free if it has no cut at any depth. The reader should notice the difference with respect to the differential interaction nets: in that restricted setting the only active ports are the principal ones. Exponential boxes add commutative cuts, that is, wires between the principal port of a cell and an auxiliary port of a box.

Interaction. We denote by α^a the pair of a simple net α and a sequence a of the free ports of α. Let α^a and β^b be such that a and b have the same length n and, for every i ≤ n, a_i and b_i have dual types; we call the interaction between α^a and β^b the simple net α^a|β^b obtained by identifying, for every i ≤ n, the port a_i of α with the port b_i of β, and then by merging the wires that have a port in common5:
[Diagram not reproduced: α^a|β^b plugs, for each i ≤ n, the free port a_i : A_i of α into the dually typed free port b_i : A_i⊥ of β.]
We can omit the superscripts a, b when they are clear or unimportant. The writing α^{a a′}|β^b, γ^c means (α^a|β^b)^{a′}|γ^c, where a and a′ are sequences of distinct free ports of α with the lengths and the dual types of resp. b and c. Interaction is extended to differential nets by bilinearity:

(∑_{i≤l} α_i^a) | (∑_{j≤m} β_j^b) = ∑_{i≤l} ∑_{j≤m} α_i^a|β_j^b.
Reductions. Let R be a binary relation over differential nets with the same sequent conclusion; the context closure of R is the smallest relation R◦ s.t.
5. Although intuitively clear, the operation of merging wires should be handled with care, because each of the two interfaces a and b may contain pairs of ports wired together and then loops can be produced. We refer to [Vau07] for a formal definition.
[Figure 4 not reproduced: it displays the elementary reduction steps, among them ⊗/⅋, !d/?d, !d/p, p/?d, and the steps involving (co)weakening and (co)contraction.]

Fig. 4. Elementary reduction steps (ers) for differential nets, where ∗ (resp. ∗̄) is ? (resp. !) or ! (resp. ?), and $ is !w, or !c, or !π. In the !w/?w ers the contractum is the empty graph.
– R◦ is closed by sum: π R◦ π′ implies (π + ρ) R◦ (π′ + ρ), for ρ ∈ dN; and
– R◦ is closed by interaction: π R◦ π′ implies α|π R◦ α|π′, for α ∈ sN; and
– R◦ is closed by promotion: π R◦ π′ implies !π R◦ !π′.
In Figure 4 we define specific relations called elementary reduction steps, ers for short. The net at the left of an ers is the redex, the one at the right the contractum of the ers. For any union R of ers we define a reduction, denoted →_R, as the context closure of R. We write →_{R*} for the reflexive and transitive closure of →_R. In particular, we define the cut-elimination →_cut as the context closure of the whole of Figure 4, and the exponential reduction →_e as the context closure of all the ers of Figure 4 but the ⊗/⅋ ers. We say that a differential net π enjoys cut-elimination if there is a cut-free differential net π0 such that π →_{cut*} π0. Notice that the redex of every ers in Fig. 4 is a simple net; the contractum is also simple, except for !w/?d, !d/?w, !d/?c, !c/?d, !d/p, and p/?d. In particular !w/?d and its symmetric !d/?w yield the empty sum 0. The steps !d/p and p/?d also yield 0, in case the content of !π is 0.

Proposition 2. Let π →_cut π′. If π is switching acyclic, then so is π′.
Proof. [ER06] gives the proof for the box-free case: the generalization is easy.
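On the level of types, two of the ers just recalled admit a toy reading: !d/?d exposes the cut on the immediate subformula (so the grade of the cut decreases), while !w/?d and its symmetric !d/?w yield the empty sum 0. A minimal sketch, with our own encoding of formulas as nested tuples:

```python
# A cut is represented by the formula on its !-side; ('!', A) is cut
# against the dual ('?', A⊥), which we leave implicit.
def step_bang_d_whynot_d(cut):
    """!d/?d: codereliction against dereliction yields one addendum,
    whose cut is on the immediate subformula (smaller grade)."""
    tag, inner = cut
    assert tag == '!'
    return [inner]

def step_bang_w_whynot_d(cut):
    """!w/?d (and symmetrically !d/?w): the result is the empty sum 0,
    i.e. the empty multiset of simple nets."""
    return []

cut = ('!', ('atom', 'X'))
assert step_bang_d_whynot_d(cut) == [('atom', 'X')]
assert step_bang_w_whynot_d(cut) == []
```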
[Figure 5 not reproduced: (a) a cut between a cocontraction !c and a contraction ?c on !X/?X⊥, which !c/?c-reduces to a net containing the same kind of cut, and so on forever; (b) one of its correctness graphs, which is cyclic.]

Fig. 5. A simple net not enjoying cut-elimination, nor switching acyclicity
[Figure 6 not reproduced: a reduction sequence (a) →_{p/?c} (b) →_{!d/?c} (c) →_{p/?d} (d) →_{!d/p} (e) →_{cut*} (f) from a simple net with conclusion X⊥, !X, !X to a cut-free differential net.]

Fig. 6. Example of a reduction to a cut-free differential net
Examples. Figure 5 depicts a (typed) simple net enjoying neither cut-elimination nor switching acyclicity (Figure 5(b) gives its cyclic correctness graph): as is well known, switching acyclicity is a hypothesis needed to prove cut-elimination.6 Notice that replacing the !c-cell of Fig. 5(a) with a box gives a counter-example to cut-elimination for switching cyclic LL nets. Figure 6 gives an example of a reduction from the (switching acyclic) simple net in Fig. 6(a) to the cut-free differential net in Fig. 6(f); by the way, notice that the sequent conclusion X⊥, !X, !X is provable in DiLL but not in LL. Let us comment on the main ers. In the step (a) →_{p/?c} (b) the size of the reduced net increases, since a box is duplicated; moreover, the reduction affects the commutative cut incident to the duplicated box, changing its type from !d/p to !d/?c. The fact that the reduction of a cut may affect other cuts takes us away from the interaction net paradigm. The step (b) →_{!d/?c} (c) creates a sum and duplicates cuts even outside boxes. This duplication is close to the additive duplication in the sliced proof-nets of LL, but the acquainted reader should observe that in sliced proof-nets sums cannot be created (see [PTdF07] for more details). The reduction then focuses on the left addendum of the sum, and reduces a p/?d redex, getting the differential net in (d). The step (d) →_{!d/p} (e) is the really crucial one, and it shows the main oddities of the cut-elimination of DiLL with boxes. As in the p/?c ers, the size of the reduced net increases and the reduction affects other cuts, but now among the affected cuts there is the one crossing the principal port of the box involved in the reduction, which can be non-commutative. Having ers
6. Actually, one can weaken switching acyclicity into visible acyclicity [Pag06], keeping cut-elimination.
affecting non-commutative cuts is a peculiarity of DiLL with boxes, and it makes importing techniques developed for LL subtle.
2 The Cut-Elimination Theorem
We prove our main result: switching acyclic differential nets enjoy cut-elimination (Theorem 1). This result is achieved by purely combinatorial means, specifically by induction on the pair ⟨grade(π), count(π)⟩, lexicographically ordered, where grade and count are the measures defined in Definition 1. We actually conjecture that →_cut is also strongly normalizing7, but its proof should be quite hard, and it deserves further research.

Definition 1. The grade of a formula A is the number grade(A) of connectives occurring in A; the grade({a, b}) of a wire {a, b} is the grade of the type of ⟨a, b⟩ (or equivalently of the type of ⟨b, a⟩). The grade(π) of a differential net π is the maximum grade of the cuts at any depth of π, if any; otherwise it is 0. The count of a differential net π is the number count(π) of the cuts at any depth of π having grade equal to grade(π).

Notice that π is cut-free iff grade(π) = 0 iff count(π) = 0. Moreover, consider two differential nets ρ and ρ′ s.t. ⟨grade(ρ′), count(ρ′)⟩ < ⟨grade(ρ), count(ρ)⟩, and a differential net π having a box o ∈_! π of type !ρ and grade(π) = grade(ρ); then the differential net π′ defined from π by replacing o with a box o′ of type !ρ′ enjoys ⟨grade(π′), count(π′)⟩ < ⟨grade(π), count(π)⟩. Recall Figure 4 and remark that the pair ⟨grade(π), count(π)⟩, lexicographically ordered, shrinks whenever an ers of type among ⊗/⅋, p/?d, !d/?d, !w/?w, !d/?w, !w/?d, p/?w is applied to a cut of π with maximal grade. This is not the case for the other types of ers: they are indeed handled by Lemma 1, stating that any cut with exponential type can be reduced into several (possibly 0) cuts with strictly smaller grade. Lemma 1 is proved by induction on the rank, a measure defined as follows.

Definition 2. We define simultaneously the rank of a cell and the rank of a differential net, by induction on the depth.
The rank(π) of a differential net π is the maximum rank of its simple nets; the rank of a simple net is the sum of the ranks of its cells; the rank of a cell c is the number

rank(c) := (n + 1)(rank(ρ) + 2 + n), if c is a box !ρ crossing n + 1 ports; rank(c) := 1 otherwise.

Recall Figure 4 and notice that the rank of every simple net in the contractum of a !d/p ers is strictly smaller than that of its redex, so justifying Definition 2.

Definition 3. A !-tree (resp. ?-tree) is a simple net α with a distinguished free port a of type !A (resp. ?A), for a suitable A, called the root of α, and such that one of the following inductive conditions holds (see Figure 7):
7. For every switching acyclic π there is no infinite sequence {π_i}_{i∈N} s.t. π_0 = π and π_i →_cut π_{i+1}.
[Figure 7 not reproduced: the four ways of building a !-tree, from a !d-cell, a !w-cell, a box !π, or a cocontraction !c of two !-trees.]

Fig. 7. Inductive definition of the !-trees; the ?-trees are defined similarly, using ?-types and ?-cells, except for the !π-case, which does not yield a ?-tree
– α is a wire crossing a;
– α is a !d-cell or a !w-cell (resp. ?d-cell or ?w-cell) with its incident wires, and a is the free port wired to the principal port of this cell;
– (only for the !-trees) α is a box !π with its incident wires, and a is the free port wired to the principal port of the box;
– α is made of a cocontraction (resp. contraction) l, its incident wires, and two !-trees (resp. ?-trees) having their roots among the auxiliary ports of l, and a is the free port wired to l.
Notice that a ?-tree is cut-free and switching acyclic; the same holds for !-trees, provided that the contents of their boxes are resp. cut-free and switching acyclic.

Lemma 1 (Exponential Lemma). Let !A be a formula, let α^a = ⟨α_1^{a_1}, . . . , α_n^{a_n}⟩ be a sequence of n ≥ 1 !-trees having the root a_i of type !A, and let β^b be a simple net with b = ⟨b_1, . . . , b_n⟩ distinguished free ports of type ?A⊥. Suppose also grade(β) < grade(!A) and, for every i ≤ n, grade(α_i) < grade(!A). Then the interaction β^b|α^a e-reduces to a differential net π such that grade(π) < grade(!A):
[Diagram not reproduced.] β^b|α^a →_{e*} π, with grade(π) < grade(!A).
Proof. The proof is by induction on the pair ⟨max_{i≤n}(rank(α_i)), rank(β)⟩, lexicographically ordered. If no wire {c_i, d_i} linking β to the !-trees is a cut, then we simply set π = β|α (in the sequel we will omit the superscripts b and a). Otherwise, suppose w.l.o.g. that the wire {c_1, d_1} is a cut. The proof splits into several cases, depending on the type of the ers associated with {c_1, d_1}. We consider only the most delicate cases, the others being straightforward or easy variants of those presented here. In the sequel, α′ denotes ⟨α_2^{a_2}, . . . , α_n^{a_n}⟩.

Case i (p/?d). If α_1 is a box o of type !ρ and c_1 is the principal port of a ?d-cell k (see the leftmost net of Fig. 8), then β|α →_{p/?d} ∑_{δ∈ρ} γ_δ, where γ_δ is obtained from β|α by replacing the redex made of o and k with δ (see the rightmost net of Figure 8). Call β′ the subnet of β not containing the ?d-cell k.
[Figure 8 not reproduced.]

Fig. 8. Case p/?d
[Figure 9 not reproduced.]

Fig. 9. Case !c/?d
Consider β′|α′, which is a subnet of every γ_δ, for δ ∈ ρ. Obviously we have max_{2≤i≤n}(rank(α_i)) ≤ max_{i≤n}(rank(α_i)) and rank(β′) = rank(β) − 1, hence we can apply the induction hypothesis to β′|α′ and get a differential net π′ such that β′|α′ →_{e*} π′ and grade(π′) < grade(!A). From the hypothesis we also have, for every δ ∈ ρ, grade(δ) < grade(!A). Define π as the interaction π′|ρ between the A⊥ free port of (every simple net of) π′ and the A free port of (every simple net of) ρ. Conclude that grade(π) < grade(!A) and β|α →_{e*} π.

Case ii (!c/?d). If α_1 is a cocontraction l wired to two !-trees α_l and α_r and the port c_1 of β is the principal port of a dereliction k (see the leftmost net of Fig. 9), then β|α !c/?d-reduces to γ_l + γ_r, where γ_l (resp. γ_r) is obtained by erasing the !c-cell l and by wiring its left (resp. right) auxiliary port, denoted a_l (resp. a_r), with the principal port c_1 of the dereliction k, and its right (resp. left) auxiliary port with the principal port c_w of a new weakening cell (see the differential net in the middle of Figure 9, where one must think of the sum as distributed). Call β′ the subnet of β not containing the ?d-cell k, and notice that γ_l (resp. γ_r) can be decomposed into the subnet β′|α′ and the subnet β_0^{c_1 c_w}|⟨α_l^{a_l}, α_r^{a_r}⟩ (resp. β_0^{c_1 c_w}|⟨α_r^{a_r}, α_l^{a_l}⟩), where β_0 denotes the simple net formed by the ?d-cell k and the weakening created by the ers. First, consider β′|α′: we have max_{2≤i≤n}(rank(α_i)) ≤ max_{i≤n}(rank(α_i)) and rank(β′) = rank(β) − 1, hence we can apply the induction hypothesis and get a differential net π′ s.t. β′|α′ →_{e*} π′ and grade(π′) < grade(!A). Second, consider the simple net β_0^{c_1 c_w}|⟨α_l^{a_l}, α_r^{a_r}⟩: we have max(rank(α_l), rank(α_r)) < rank(α_1) ≤ max_{i≤n}(rank(α_i)), hence we can apply the induction hypothesis and
[Figure 10 not reproduced.]

Fig. 10. Case !c/?c
get a differential net π_l satisfying β_0^{c_1 c_w}|⟨α_l^{a_l}, α_r^{a_r}⟩ →_{e*} π_l and grade(π_l) < grade(!A). Similarly we get a differential net π_r satisfying β_0^{c_1 c_w}|⟨α_r^{a_r}, α_l^{a_l}⟩ →_{e*} π_r and grade(π_r) < grade(!A). Finally, we define π as the interaction between the A⊥ free port of (every simple net of) π′ and the A free port of (every simple net of) π_l + π_r, see Figure 9. We have β|α →_{!c/?d} γ_l + γ_r →_{e*} π and grade(π) < grade(!A).

Case iii (!c/?c). If α_1 is a cocontraction l wired to two !-trees α_l and α_r and the port c_1 of β is the principal port of a contraction k (see the leftmost net of Figure 10), then β|α !c/?c-reduces to the net γ in the middle of Figure 10. Let β′ be the subnet of β not containing the contraction k, let c_l, c_r be the two auxiliary ports of k, and let a_l, a_r be the two auxiliary ports of l, which are also the roots of resp. α_l and α_r. Finally, let l_l, l_r be the two copies of l created by the reduction of {c_1, d_1}: notice that l_l, l_r are !-trees having rank equal to rank(l) = 1. Consider β′|⟨l_l, l_r⟩, α′, which is a subnet of γ. Observe that max{rank(l), rank(α_2), . . . , rank(α_n)} ≤ max_{i≤n}(rank(α_i)) and rank(β′) = rank(β) − 1, hence we can apply the induction hypothesis and get a differential net π′ such that β′|⟨l_l, l_r⟩, α′ →_{e*} π′ and grade(π′) < grade(!A). For every ϵ ∈ π′, let ϵ⁺ be the net formed by ϵ and the two copies of the contraction k created by the reduction of {c_1, d_1} (see the rightmost simple net of Figure 10). Let us consider the interaction ϵ⁺|⟨α_l, α_r⟩ and notice that max(rank(α_l), rank(α_r)) < rank(α_1) ≤ max_{i≤n}(rank(α_i)). We thus apply the induction hypothesis, getting a differential net π^ϵ s.t. ϵ⁺|⟨α_l, α_r⟩ →_{e*} π^ϵ and grade(π^ϵ) < grade(!A). Define π = ∑_{ϵ∈π′} π^ϵ and conclude: β|α →_{!c/?c} γ →_{e*} π and grade(π) < grade(!A).

Case iv (!d/p). If α_1 is a codereliction l and the port c_1 of β is an auxiliary port of a promotion o of type !ρ (see the leftmost net of Figure 11), then β|α !d/p-reduces to the differential net ∑_{δ∈ρ} β^δ, where β^δ is obtained from β by replacing o with the simple net δ′ outlined at the right of Figure 11.
For every δ ∈ ρ, we have rank(β^δ) = rank(β′) + rank(δ), where β′ is the subnet of β not containing o; we also have rank(δ) < rank(!ρ): in fact, the rank has been defined precisely so that any contractum of a !d/p ers is strictly smaller than the rank of its redex (Definition 2). Hence we conclude rank(β^δ) < rank(β). We apply the
Fig. 11. Case !d/p
induction hypothesis to β^δ|α, getting a differential net π^δ s.t. β^δ|α →^{e∗} π^δ and grade(π^δ) < grade(!A). We define π = Σ_{δ∈ρ} π^δ and we conclude: β|α →^{!d/p} Σ_{δ∈ρ} β^δ →^{e∗} π and grade(π) < grade(!A).
Case v (Otherwise). The case !d/?c is an easy variant of the case !c/?d. The case !c/p is similar to the case !c/?c: indeed, adopting the notation of Fig. 10, one has to apply the induction hypothesis three times — once to β′|α′, once to the contents of the box involved in the !c/p ers, and a last time to the interaction with the !-tree residues of α_l and α_r. The p/?c and p/p cases are variations of the case !c/p. The other cases are obvious.
Let us stress that in general the count of the differential net π mentioned in Lemma 1 may be greater than count(β^b|α^a): what decreases is the grade. This motivates the introduction of two distinct measures, grade and count.
Theorem 1. For every switching acyclic π, there is a cut-free π₀ s.t. π →^{cut∗} π₀.
Proof. The proof is by induction on the pair ⟨grade(π), count(π)⟩, lexicographically ordered. Let {a, b} ∈ π be a cut having maximal grade, i.e. grade({a, b}) = grade(π), and having maximal depth among the cuts with maximal grade in π. Let α be the simple net of π, or of the contents of a box in π, having {a, b} at depth 0. Our goal is to prove that:
(∗) there is a cut-free differential net ρ_α such that α →^{cut∗} ρ_α.
First we show that (∗) entails that π enjoys cut-elimination. Indeed, since grade(α) = grade(π), the differential net π′ obtained from π by substituting ρ_α for α satisfies ⟨grade(π′), count(π′)⟩ < ⟨grade(π), count(π)⟩. Since π →^{cut∗} π′ and π′ is switching acyclic (Proposition 2), we conclude by the induction hypothesis.
So let us prove (∗). The multiplicative case is straightforward. Suppose then that {a, b} has an exponential type, say a : !A and b : ?A⊥. Let α! be the maximal !-tree of α with root a, and let α? be the maximal ?-tree of α with root b. Let moreover o₁, …, o_n, for n ≥ 0, be the boxes of α having one or more auxiliary ports of type !A among the free ports of α?, and let α?+ be the simple net made of α? and of these boxes o₁, …, o_n. Notice that there is no cut between α? and any box o_i: indeed, α? is a ?-tree, hence none of its free ports is connected to an active port,
but b. By the switching acyclicity of α, the !-tree α! and the simple net α?+ are disjoint, i.e. the sets of the ports in resp. α! and α?+ are disjoint. This means that α may be expressed as the interaction among three simple nets, α!, α?+ and a simple net β, as follows:

α = β | α?+ | α!,
where we denote by I the set of wires, possibly empty, shared between β and α?+|α!. Notice that each wire {c, d} ∈ I, d denoting the free port for β, meets exactly one of the following three conditions:
i. either d is an auxiliary port of a !d-cell in α! (resp. ?d-cell in α?), and c has type A⊥ (resp. A);
ii. or d is an auxiliary port of a !c-cell in α! (resp. ?c-cell in α?), and c is of type ?A⊥ (resp. !A) and it is not active (i.e. neither the principal port of a cell, nor an auxiliary port of a box): in fact, if c were the principal port of a cell, then this cell would be a !-cell (resp. ?-cell) and α! (resp. α?) would not be maximal, while if c were the auxiliary port of a box then c would be a free port of α? and this box would be one of the o_i added to α?+;
iii. or d is an auxiliary port of a box in α!, or an auxiliary or principal port of a box o_i in α?+.
By hypothesis, the contents of every box of α have a strictly smaller grade than grade(α), and so in particular do the boxes in α! and those in α?+. We then deduce grade(α?+), grade(α!) < grade(α) = grade({a, b}). This means we can apply Lemma 1 to α?+|α!, and get a differential net ρ′ such that α?+|α! →^{e∗} ρ′ and grade(ρ′) < grade({a, b}). For every γ ∈ ρ′, the interaction β|γ is the result of replacing in α the subnet α?+|α! with γ. So we have α →^{e∗} Σ_{γ∈ρ′} β|γ.
For every γ ∈ ρ′ we prove ⟨grade(β|γ), count(β|γ)⟩ < ⟨grade(α), count(α)⟩. Clearly we have grade(β|γ) ≤ grade(α), so assume grade(β|γ) = grade(α) and let us prove count(β|γ) < count(α). This amounts to counting the cuts of grade grade(α) in β|γ and those in α = β|α?+|α!. Let us start with β|γ: none of these cuts can be in γ, since by hypothesis grade(γ) < grade({a, b}) = grade(α); hence these cuts are either cuts in β or they are in the set I of wires shared by β and γ. So count(β|γ) = n_β + n′_I, where n_β is the number of such cuts in β and n′_I that of such cuts in I.
As for α, we have count(α) = n_β + n_I + 1, where n_I is the number of wires of I which are cuts in β|α?+|α! with grade equal to grade(α) (in general n′_I ≠ n_I), and the final 1 accounts for {a, b}. We prove n′_I ≤ n_I, which clearly implies count(β|γ) < count(β|α?+|α!). Consider a wire {c, d} ∈ I, d denoting a free port for β, with grade equal to grade(α). Assume {c, d} is a cut in β|γ; we prove {c, d} is also a cut in β|α?+|α!. Recall that {c, d} meets exactly one among the above conditions
(i)–(iii). Since we suppose grade({c, d}) = grade(α), condition (i) fails; since we suppose {c, d} is a cut in β|γ, c is active in β and so condition (ii) fails. It remains condition (iii), which entails that {c, d} is a cut also in β|α?+|α!, d being active in α?+|α!. We eventually conclude ⟨grade(β|γ), count(β|γ)⟩ < ⟨grade(α), count(α)⟩ for every γ ∈ ρ′. However, this does not mean that the pair ⟨grade, count⟩ also shrinks in Σ_{γ∈ρ′} β|γ, since the count of Σ_{γ∈ρ′} β|γ is the sum of the counts of each β|γ. So we first apply the induction hypothesis to each β|γ (notice β|γ is switching acyclic by Proposition 2 and α →^{e∗} Σ_{γ∈ρ′} β|γ), getting a cut-free differential net ρ_γ such that β|γ →^{cut∗} ρ_γ, and then we define ρ_α = Σ_{γ∈ρ′} ρ_γ.
A Polymorphic Type System for the Lambda-Calculus with Constructors

Barbara Petit

LIP - ENS Lyon, 46 Allée d'Italie, 69364 Lyon, France
http://perso.ens-lyon.fr/barbara.petit
Abstract. We present a Curry-style second-order type system with union and intersection types for the lambda-calculus with constructors of Arbiser, Miquel and Ríos, an extension of the lambda-calculus with a pattern matching mechanism for variadic constructors. To prove the strong normalisation property for this system, we translate well-typed terms into an auxiliary calculus of case-normal forms using the interpretation method. We finally prove the strong normalisation property for the auxiliary calculus using the standard reducibility method.

Keywords: lambda-calculus, polymorphism, pattern matching, strong normalisation.
Introduction

Pattern matching is a crucial feature of modern functional programming languages [15,11,14], as well as of proof assistants (especially those based on type theory [5,1]). It has been intensively studied both in practical implementations and from the theoretical point of view of rewriting [18]. Many approaches have been proposed to extend the λ-calculus [3] with pattern matching facilities, in both untyped [13,7] and typed [12,4,8] settings. Among these approaches, the λ-calculus with constructors [2] proposes to decompose ML-style pattern matching using a case construct {|c1 → u1 ; . . . ; cn → un |} · t performing case analysis on constant constructors, in the spirit of the case of Pascal (or the switch of C). Destruction of composite data structures, formed by applying a constructor to one or many arguments, is achieved using a commutation rule between case and application¹:

(CaseApp)
{|θ|} · (tu) = ({|θ|} · t)u
Thanks to this rule, one can encode the whole of ML-style pattern matching in the calculus, and write destruction functions on more complex data types, such as

¹ This rule differs from the commutative conversion rules [10] coming from logic.
P.-L. Curien (Ed.): TLCA 2009, LNCS 5608, pp. 234–248, 2009. c Springer-Verlag Berlin Heidelberg 2009
for instance the predecessor function: pred = λx.{|0 → 0; succ → λz.z|} · x, which satisfies:

pred (succ n) = {|0 → 0; succ → λz.z|} · (succ n)
             = ({|0 → 0; succ → λz.z|} · succ) n
             = (λz.z) n
             = n

Actually, one can even encode pattern matching for variadic constructors. The λ-calculus with constructors enjoys many good properties, such as confluence and separation (in the spirit of Böhm's theorem). In this paper, we propose a polymorphic type system for this calculus, thus addressing the problem of typing the case construct in the presence of the CaseApp commutation rule. Since we are mainly interested in the correctness of the type system —in terms of strong normalisation and absence of match failure— we intentionally consider a very expressive (and strongly undecidable) type system, based on second-order polymorphism with universal and existential quantification, and binary intersection and union types, with subtyping. (Union types are essential to decompose algebraic data types.)
It appears that the main difficulty in the strong normalisation proof is to design a good notion of reducibility candidates able to cope with the commutation rules attached to the case. Instead, we prove strong normalisation indirectly, by interpreting the calculus into an intermediate calculus of case normal forms, and then proving strong normalisation for this calculus (where commutation rules have been disabled) using the standard reducibility method [10]. From this construction we deduce the main properties of the typed calculus, including the safety from match failure of well-typed terms.

Outline: Parts 1 and 2 respectively present the λC-calculus and the type system. Part 3 introduces the calculus of case normal forms, and Part 4 the reducibility candidates model. Finally, Part 5 concludes with the main properties of the typed λC-calculus.
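The derivation of pred above can be replayed mechanically. The following sketch is not from the paper: it is a minimal interpreter, with a tuple encoding and function names of my own choosing, implementing only the three rules the pred example needs (AppLam, CaseApp and CaseCons).

```python
# Terms as nested tuples: ("var", x), ("con", c), ("lam", x, body),
# ("app", t, u), ("case", {c: body, ...}, t).
# Rules used: CaseApp {|θ|}·(t u) → ({|θ|}·t) u ; CaseCons {|θ|}·c → θ(c) ;
# AppLam (beta) (λx.t) u → t[x:=u]. Substitution is naive (no capture
# avoidance), which is adequate for the closed example below.

def subst(t, x, u):
    tag = t[0]
    if tag == "var":
        return u if t[1] == x else t
    if tag == "con":
        return t
    if tag == "lam":
        return t if t[1] == x else ("lam", t[1], subst(t[2], x, u))
    if tag == "app":
        return ("app", subst(t[1], x, u), subst(t[2], x, u))
    if tag == "case":
        theta = {c: subst(b, x, u) for c, b in t[1].items()}
        return ("case", theta, subst(t[2], x, u))

def step(t):
    """One reduction step (leftmost), or None if no rule applies."""
    if t[0] == "case":
        theta, body = t[1], t[2]
        if body[0] == "app":                       # CaseApp
            return ("app", ("case", theta, body[1]), body[2])
        if body[0] == "con" and body[1] in theta:  # CaseCons
            return theta[body[1]]
        s = step(body)
        return None if s is None else ("case", theta, s)
    if t[0] == "app":
        f, a = t[1], t[2]
        if f[0] == "lam":                          # AppLam (beta)
            return subst(f[2], f[1], a)
        s = step(f)
        return None if s is None else ("app", s, a)
    return None

def normalise(t):
    while (s := step(t)) is not None:
        t = s
    return t

# pred = λx. {|0 → 0; succ → λz.z|} · x
zero, succ, n = ("con", "0"), ("con", "succ"), ("var", "n")
theta = {"0": zero, "succ": ("lam", "z", ("var", "z"))}
pred = ("lam", "x", ("case", theta, ("var", "x")))

assert normalise(("app", pred, ("app", succ, n))) == n   # pred (succ n) = n
assert normalise(("app", pred, zero)) == zero            # pred 0 = 0
```

The trace the interpreter follows is exactly the derivation above: beta, then CaseApp pushes the case binding through the application, then CaseCons selects the succ branch.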
1 The Lambda Calculus with Constructors

1.1 Its Syntax
The syntax of the λ-calculus with constructors [2] is defined from two disjoint sets of symbols: variables (notation: x, y, z, etc.) and constructors (notation: c, d, etc.). It consists of two syntactic categories defined by mutual induction in Fig. 1: terms (notation: t, u, s, etc.) and case bindings (notation: θ, φ). Terms include all the syntactic constructs of the λ-calculus, plus constructors (as constants) together with a case construct (similar to the case construct of Pascal) to analyse them. There is also a constant ✠ (the Daimon, inherited from ludics [9]) representing immediate termination. Case bindings are finite functions from constructors to terms.
Terms:          t, u, s ::= x | tu | λx.t      (λ-calculus)
                        |  c                   (Constructor)
                        |  {|θ|} · t           (Case Construct)
                        |  ✠                   (Daimon)

Case Bindings:  θ, φ ::= {c1 → u1 ; . . . ; cn → un}   (Case Binding)
                        with ci ≠ cj for i ≠ j

Fig. 1. λC-terms and case bindings
Free and bound (occurrences of) variables are defined as usual, taking care that constructors are not variables and thus not subject to α-conversion. The set of free variables (denoted by FV(t)) is defined for the new constructs by

FV(c) = ∅    FV({|θ|} · t) = FV(θ) ∪ FV(t)    FV(θ) = ⋃(c→u)∈θ FV(u)
The usual operation of substitution on terms (notation: t[x := u]) is defined as expected, taking care of renaming bound variables when needed in order to prevent variable capture. Substitution on case bindings (notation: θ[x := u]) is defined componentwise.

1.2 Its Operational Semantics
The reduction of λC is based on the 9 reduction rules given in Fig. 2, among which one can find the β and η reduction rules of the λ-calculus, now called AppLam and LamApp², respectively. Case constructs are propagated through terms via the CaseApp, CaseLam and CaseCase commutation rules, and ultimately destructed by CaseCons reduction. For an explanation of the role and expressiveness of these rules, see [2]. Confluence or non-confluence is known for every combination of the 9 reduction rules [2], and the full calculus is confluent. In this paper, we shall only consider the following subcalculi, which are both confluent:
– BC denotes the λC-calculus without the AppLam rule.
– DC is the λC-calculus without the rules AppDai, LamDai and CaseDai.
A term with no infinite reduction is said to be strongly normalising. By extension, a calculus is strongly normalising (notation: SN) when all its terms are. It is also known that BC is SN ([2], Prop. 2).

1.3 Values in λC

In λC, we call data structure a term of the form c t1 . . . tk where c is a constructor and t1, . . . , tk (k ≥ 0) are arbitrary terms. We then call value a term which is a λ-abstraction or a data structure.

² In the λC-calculus, the name of each reduction rule consists of the names of the two constructions interacting in the reduction.
Beta-reduction:
  AppLam (AL)   (λx.t)u → t{x ← u}
  AppDai (AD)   ✠u → ✠
Eta-reduction:
  LamApp (LA)   λx.tx → t        (x ∉ FV(t))
  LamDai (LD)   λx.✠ → ✠
Case propagation:
  CaseCons (CO) {|θ|} · c → t    ((c → t) ∈ θ)
  CaseDai (CD)  {|θ|} · ✠ → ✠
  CaseApp (CA)  {|θ|} · (tu) → ({|θ|} · t)u
  CaseLam (CL)  {|θ|} · λx.t → λx.{|θ|} · t    (x ∉ FV(θ))
Case conversion:
  CaseCase (CC) {|θ|} · {|φ|} · t → {|θ ◦ φ|} · t
      with θ ◦ {c1 → t1 ; . . . ; cn → tn} ≡ {c1 → {|θ|} · t1 ; . . . ; cn → {|θ|} · tn}

Fig. 2. Reduction rules for λC
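The composition θ ◦ φ used by rule CaseCase simply wraps every branch of φ in a case on θ, preserving the domain of φ. A sketch (assuming the same hypothetical tuple-and-dict encoding as above, not the paper's own code):

```python
# CaseCase composition: θ ◦ {c1 → t1; ...; cn → tn}
#                     = {c1 → {|θ|}·t1; ...; cn → {|θ|}·tn}
# Case bindings are dicts mapping constructor names to term tuples.

def compose(theta, phi):
    return {c: ("case", theta, u) for c, u in phi.items()}

theta = {"0": ("con", "0")}
phi = {"succ": ("var", "y")}

# dom(θ ◦ φ) = dom(φ), and each branch of φ is wrapped in a case on θ:
assert compose(theta, phi) == {"succ": ("case", {"0": ("con", "0")}, ("var", "y"))}
```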
We say that a term is defined when it has no subterm of the form {|θ|} · c with c ∉ dom(θ), and that it is hereditarily defined when all its reducts (in any number of steps) are defined. (Intuitively, terms that are not defined contain pattern matching failures, and will therefore be rejected by the type system.)

Proposition 1. Every defined closed normal term is either ✠ or a value.

Finally, a term which is both strongly normalising and hereditarily defined is said to be perfectly normalising.
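The notions of value and defined term are directly checkable syntactic predicates. A sketch under the same assumed tuple encoding (my own, not the paper's):

```python
# A term is a *value* if it is a lambda-abstraction or a data structure
# c t1 ... tk; it is *defined* if no subterm is {|θ|}·c with c ∉ dom(θ),
# i.e. no immediate match failure anywhere in the term.

def is_data_structure(t):
    while t[0] == "app":        # peel off applied arguments
        t = t[1]
    return t[0] == "con"        # head must be a constructor

def is_value(t):
    return t[0] == "lam" or is_data_structure(t)

def is_defined(t):
    if t[0] == "case":
        theta, body = t[1], t[2]
        if body[0] == "con" and body[1] not in theta:
            return False        # match failure {|θ|}·c, c ∉ dom(θ)
        return all(is_defined(b) for b in theta.values()) and is_defined(body)
    if t[0] == "app":
        return is_defined(t[1]) and is_defined(t[2])
    if t[0] == "lam":
        return is_defined(t[2])
    return True                 # variables and constructors

assert is_value(("app", ("con", "succ"), ("con", "0")))               # succ 0
assert not is_value(("app", ("lam", "x", ("var", "x")), ("con", "0")))
assert not is_defined(("case", {"0": ("con", "0")}, ("con", "succ")))  # {|0→0|}·succ
```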
2 Type System

2.1 An Informal Presentation
The type system we want to define includes the simply typed λ-calculus: the main type construct is the arrow type T → U, coming with its usual introduction and elimination rules. To achieve polymorphism, we introduce type variables (written X, Y, etc.) and universal type quantification (notation: ∀X.T). Instantiation is performed via a subtyping judgement containing all the rules of system F with subtyping, as presented in [16]. To type-check data structures, we associate a type constant —still written c— to every constructor c. We introduce a type application DT for applied structures, so that we can derive c ⃗t : c ⃗T from ⃗t : ⃗T (see 2.2 for more details on vectorial notations). Nevertheless, the formation of application types has to be restricted to prevent typing non-normalising terms. Indeed, unrestricted type application would allow one to give the type TT to the term δδ, where T is a type for δ³.

³ δ := λx.xx has type ∀X.(X → X) → ∀X.(X → X), for instance.
For that we distinguish a subclass of data types (notation: D, E). They will be the only types allowed on the left-hand side of a type application. In practice this subclass excludes arrow types and type variables (which could be instantiated by arbitrary types). To still keep the ability to quantify over data types, we introduce data type variables (notation: α, β, etc.) and data type quantification. To encode algebraic types, we add union types. For example, we could define a type of natural numbers with the equation nat ≡ 0 ∪ succ(nat) (where 0 and succ are constructors⁴). To distribute arrow over union, we also need intersection types: (0 ∪ succ(nat)) → T ≡ (0 → T) ∩ (succ(nat) → T). This allows us to type pred := λx.{|0 → 0; succ → λy.y|} · x with type nat → nat, for instance. By symmetry, we add existential quantifiers.

2.2 The Formal System
We define a polymorphic type system with union and intersection for both terms and case bindings of λC (Fig. 3). It uses two spaces of type variables: ordinary type variables and data type variables. There are also two kinds of types: ordinary types, and their syntactic subclass of data types.
Types:
  T, U := X                   (Ordinary type variable)
       |  α | c | DT          (Data type)
       |  T → U               (Arrow type)
       |  T ∪ U               (Union type)
       |  T ∩ U               (Intersection type)
       |  ∀α.T | ∀X.T         (Universal type)
       |  ∃α.T | ∃X.T         (Existential type)

Data Types:
  D, E := α                   (Data type variable)
       |  c | DT              (Data structure)
       |  D ∪ E               (Union data type)
       |  D ∩ E               (Intersection data type)
       |  ∀α.D | ∀X.D         (Universal data type)
       |  ∃α.D | ∃X.D         (Existential data type)

Fig. 3. Types of λC
In the following, ν denotes a variable which can be an ordinary type variable or a data type variable. The set TV(T) denotes the set of all free (ordinary and data) type variables of a type T. We also use a vectorial notation for application and arrow types:

⃗T := [] | T; ⃗T
c[] = c          c(T; ⃗T) = (cT)⃗T
[] → U = U       (T; ⃗T) → U = T → (⃗T → U)

⁴ This would require a fixpoint operator, or a double subtyping rule.
Typing rules (Fig. 4) include the usual introduction and elimination rules of the typed λ-calculus for each type operator. Some of them —like the elimination of the universal quantifier— are actually subtyping rules (Fig. 5). The subtyping rule Data allows typing constructors with arbitrary arity. A case binding is typed (by CB) like a function waiting for a constructor of its domain as argument, up to a possible conversion of an arrow type into an application type: if u has type T → U, then {c → u} has type c → (T → U) as well as cT → U. This is the point that allows the CaseApp commutation rule to be well typed. In the same way, the typing rule for a case construct {|θ|} · t (case) allows t to be a function that waits for an arbitrary number of arguments. This makes CaseLam well typed.

Case binding: if θ = {(ci → ui)}ⁿᵢ₌₁,

  CB:      Γ ⊢ ui : ⃗Ui → Ti (for each i ≤ n)   ⟹   Γ ⊢ θ : ⋂ᵢ (ci ⃗Ui → Ti)

Terms:

  Init:    Γ ⊢ x : T    (x : T ∈ Γ)
  Constr:  Γ ⊢ c : c
  False:   Γ ⊢ ✠ : T
  →intro:  Γ, x : T ⊢ t : U   ⟹   Γ ⊢ λx.t : T → U
  →elim:   Γ ⊢ t : T → U  and  Γ ⊢ u : T   ⟹   Γ ⊢ tu : U
  case:    Γ ⊢ t : ⃗U → T  and  Γ ⊢ θ : T → T′   ⟹   Γ ⊢ {|θ|} · t : ⃗U → T′

Shared rules (M is either a term t or a case binding θ):

  Univ:    Γ ⊢ M : T  and  ν ∉ TV(Γ)   ⟹   Γ ⊢ M : ∀ν.T
  Exist:   Γ, x : T ⊢ M : U  and  ν ∉ TV(U)   ⟹   Γ, x : ∃ν.T ⊢ M : U
  Inter:   Γ ⊢ M : T  and  Γ ⊢ M : U   ⟹   Γ ⊢ M : T ∩ U
  Union:   Γ, x : T1 ⊢ M : U  and  Γ, x : T2 ⊢ M : U   ⟹   Γ, x : T1 ∪ T2 ⊢ M : U
  Subs:    Γ ⊢ M : T  and  T ≤ U   ⟹   Γ ⊢ M : U

Fig. 4. Typing rules
3 Case Normal Forms Calculus
In this part we shall consider the λC-calculus in normal form w.r.t. the rules CaseCons, CaseDai, CaseApp, CaseLam and CaseCase, which we call the λCN-calculus. From a purely syntactical point of view, it is a sub-calculus of λC. However, it is not closed under substitution, and therefore requires a new definition of substitution, and adapted reduction rules. We then project λC into this new calculus.
Refl:     T ≤ T
Trans:    T ≤ T0  and  T0 ≤ T′   ⟹   T ≤ T′
Arrow:    T′ ≤ T  and  U ≤ U′   ⟹   T → U ≤ T′ → U′
∪introL:  U1 ≤ U1 ∪ U2              ∪introR:  U2 ≤ U1 ∪ U2
∪elim:    T1 ≤ U  and  T2 ≤ U   ⟹   T1 ∪ T2 ≤ U
∩intro:   T ≤ U1  and  T ≤ U2   ⟹   T ≤ U1 ∩ U2
∩elimL:   U1 ∩ U2 ≤ U1              ∩elimR:   U1 ∩ U2 ≤ U2
∀intro:   T ≤ U  and  ν ∉ TV(T)   ⟹   T ≤ ∀ν.U
∀elim:    ∀X.T ≤ T{X ← U}           ∀elimD:   ∀α.T ≤ T{α ← D}
∃intro:   T{X ← U} ≤ ∃X.T           ∃introD:  T{α ← D} ≤ ∃α.T
∃elim:    U ≤ T  and  ν ∉ TV(T)   ⟹   ∃ν.U ≤ T
App:      D ≤ D′  and  T ≤ T′   ⟹   DT ≤ D′T′
Data:     D ≤ T → DT
Constr:   c1 ≠ c2   ⟹   c1 ⃗T ∩ c2 ⃗U ≤ ∀α.α
→/∩:      ⋂ᵢ (Ti → Ui) ≤ (⋂ᵢ Ti) → (⋂ᵢ Ui)
→/∪:      ⋂ᵢ (Ti → Ui) ≤ (⋃ᵢ Ti) → (⋃ᵢ Ui)
App/∩:    (⋂ᵢ Di)(⋂ᵢ Ti) ≤ ⋂ᵢ (Di Ti)
∪/AppL:   (⋃ᵢ Di)T ≤ ⋃ᵢ (Di T)      ∪/AppR:   D(⋃ᵢ Ti) ≤ ⋃ᵢ (D Ti)
∃/AppL:   ν ∉ TV(T)   ⟹   (∃ν.D)T ≤ ∃ν.DT
∃/AppR:   ν ∉ TV(D)   ⟹   D(∃ν.T) ≤ ∃ν.DT
App/∀:    ∀ν.(DT) ≤ (∀ν.D)(∀ν.T)
→/∀:      ∀ν.(T → U) ≤ (∀ν.T) → (∀ν.U)
→/∃:      ∀ν.(T → U) ≤ (∃ν.T) → (∃ν.U)
∪/∀:      ν ∉ TV(U)   ⟹   ∀ν.(T ∪ U) ≤ (∀ν.T) ∪ U
∃/∩:      ν ∉ TV(U)   ⟹   (∃ν.T) ∩ U ≤ ∃ν.(T ∩ U)

Fig. 5. Sub-typing rules
The λCN -Calculus
The grammar of λCN (Fig. 6) differs from the one of λC by case constructs: beside a case binding one can only find an unmatchable constructor or a variable, and so λCN -terms do not have case redex. We write ΛCN the set of λCN -terms and ΛCN its subset of closed terms. 0 Since this calculus is not substitutive in the usual sense (we have to be careful when substituting x in {|θ|} · x), we need to modify substitution. For that, we use a macro: we inductively define the pseudo case constructs |θ| ·t for any t ∈ ΛCN :
A Polymorphic Type System for the Lambda-Calculus with Constructors
Terms: t, u, s x | c | | tu | λx.t | {|θ|} · x | {|θ|} · c Case Bindings: θ, φ {c1 → u1 ; . . . cn → un } Reduction rules:
β
(λx.t)u −−→ tu/x
βd
u −−→
241
(with c ∈ / dom(θ))
CN CN
η
λx.tx −−→ t
ηd
λx. −−→
CN
(x ∈ / FV(t))
CN
Fig. 6. The λCN -calculus
|θ| · x = {|θ|} · x |θ| · c = {|θ|} · c ( if c ∈ / dom(θ)) ( if c → u ∈ θ) |θ| · = |θ| · c = u |θ| · (tu) = (|θ| · t) u |θ| · (λx.t) = λx.|θ| · t ( with x ∈ / FV(θ)) |θ| · ({|φ|} · t) = {|θ˜ ◦φ|} · t where θ˜ ◦{c1 → u1 ; . . . ; cn → un } = {c1 → |θ| · u1 ; . . . ; cn → |θ| · un } Note that {|θ˜ ◦φ|} · t is well defined in ΛCN since dom(θ˜◦φ) = dom(φ). Then we inductively define the substitution in λCN , denoted by tu/x : xu/x = u ({|θ|} · x)u/x = |θu/x | · u yu/x = y ({|θ|} · y)u/x = {|θu/x |} · y cu/x = c ({|θ|} · c)u/x = {|θu/x |} · c (λy.t)u/x = λy.(tu/x ) (t1 t2 )u/x = t1 u/x t2 u/x u/x = where {c1 → t1 ; . . . ; ck → tk }u/x = {c1 → t1 u/x ; . . . ; ck → tk u/x } If a case redex appears when a substitution is performed, it is immediately destructed due to the definition of pseudo case construct. That is why there is no explicit case rule (Fig 6). Although pseudo case constructs and CN-substitution can perform many λC reduction steps at a time, their manipulation remains similar to case constructs and substitution in λC : Lemma 2. For any t, t , u ∈ ΛCN and any λCN -case bindings θ, φ, (|θ| · t)u/x = |θu/x | · tu/x |θ| · (|φ| · t) = |θ˜ ◦ φ| · t u t/x t /y = u t /y tt /y /x t −−→ t CN
⇒
(1) if x ∈ / FV(t )
tu/x −−→∗ t u/x CN
(2) (3) (4)
Values and defined terms are defined in the same way as in λC -calculus (section 1.3). We write V the set of λCN -values, and PN0 the set of perfectly normalising closed λCN -terms. Note that Prop. 1 also holds in λCN .
242
3.2
B. Petit
Projection of λC in λCN
Terms of λCN are included in Λ. Reciprocally, we can project every λC -term t in ΛCN through a translation in case-normal form called switch (notation: ↓ t): ↓x = x ↓ (λx.t) = λx. ↓ t ↓c = c ↓ (t1 t2 ) = ↓ t1 ↓ t2 ↓ = ↓ ({|θ|} · t) = | ↓ θ| · (↓ t) where ↓ {c1 → t1 ; . . . ; ck → tk } = {c1 →↓ t1 ; . . . ; ck →↓ tk } This translation preserves composition, substitution and reduction: Lemma 3. For any t, t , u ∈ Λ and any case bindings θ, φ, ↓ (θ ◦ φ) = (↓ θ ˜ ◦ ↓ φ)
(5)
↓ (t[u := x]) = (↓ t)↓ u/x t → t ⇒ ↓ t −−→∗ ↓ t
(6) (7)
CN
4
Reducibility Candidates Model
In this section, we interpret types by reducibility candidates [10], which are sets of closed and perfectly normalising λCN -terms. We complete the usual definition with the notion of data candidates. 4.1
Introduction to Reducibility Candidates
The definition of reducibility candidates is founded on the notion of neutral terms: a term is said neutral if it is not a value. We write ND the set of defined closed neutral λCN -terms. We also denote by Redn (t) the set of terms to which t reduces in n steps, and by Red∗ (t) the union of all these sets for n in N. A set S ∈ ΛCN is a reducibility candidate when it satisfies: 0 (CR1) Perfect normalisation: S ⊆ PN0 (CR2) Stability by reduction: t ∈ S ⇒ Red1 (t) ⊆ S (CR3) Stability by neutral expansion: If t ∈ ND , then Red1 (t) ⊆ S ⇒ t ∈ S We denote by CR the set of all reducibility candidates, and by (CR) the conjunction of (CR1) , (CR2) and (CR3) . Note that every reducibility candidate is non empty (it contains as neutral term), and that PN0 is in CR . A data candidate is a reducibility candidate whose all values are data structures. The subclass of data candidates, written DC , will be helpful to interpret data types. Sets that satisfy (CR1) and (CR2) are called pre-candidates of reducibility, and we write PCR the set of all such pre-candidates. Remark that t ∈ PN0 implies Red∗ (t) ∈ PCR.
A Polymorphic Type System for the Lambda-Calculus with Constructors
4.2
243
Closure Properties
A reducibility candidate is closed by reduction and by expansion for neutral terms. As a consequence, it is entirely determined by its values: Lemma 4. Let S and S be two reducibility candidates. Then S ∩ V ⊆ S ∩ V iff S ⊆ S , and so S ∩ V = S ∩ V implies S = S . This property allow us to define a candidate by the set of its values. For that, we close ordinary sets of terms by (CR) , following the presentation of [17]. Given a set X of closed λCN -terms, we define: – X0 = Red∗ (X) – Xn+1= Xn ∪ {t ∈ ND /Red1 (t) ⊆ Xn } for every n ∈ N – X = n∈N Xn . When X ⊆ PN0 , X is called the closure of X. It is justified by the following lemma: Lemma 5. If S ⊆ PN0 , S is the smallest reducibility candidate containing S. Given S ⊆ ΛCN 0 , Val(S) denotes the intersection of V and Red∗ (S). In particular, if S ∈ CR then Val(S) = S ∩ V and S = Val(S). Since all the values of a closed set S are already in S0 = Red∗ (S), the following lemma holds and allows the use of closure operator to define data candidates: Lemma 6. Let D ⊆ PN0 such that Val(D) ⊆ {cu1 ...un ui ∈ ΛCN 0 }. Then D is a data candidate. In particular, ∅ is a data candidate. 4.3
Operations in CR
Since we aim to interpret types by reducibility candidates, we need to define all type operators in CR . The definition of arrow is standard [10]: for A, B ⊆ ΛCN 0 , A → B {t ∈ ΛCN / ∀u ∈ A, tu ∈ B} . 0 We define type application by closure: given A, B ⊆ ΛCN 0 , AB {tu t ∈ A and u ∈ B } and A · B AB . The definitions of ∩ and ∀ are natural, since CR is known to be closed by intersection. Union is more tricky, since reducibility candidates are not closed by union for every reduction system. Riba proved that a necessary and sufficient condition is that perfectly normalising neutral terms which are not normal have a principal reduct ([17], Cor. 4.12), where u is said to be a principal reduct of t if t −−→ u CN
and
for all v ∈ V ,
t −−→∗ v CN
⇒
u −−→∗ v. CN
Lemma 7. 1. If t ∈ ND ∩ PN0 is reducible, then t has a principal reduct. 2. For any D ∈ DC and any A ∈ CR, Val(D · A) = Val(D)A.
244
B. Petit
3. Given T ∈ CR, P ∈ PCR, D ∈ DC, (Ti ) and (Di ) two families of CR and DC respectively,
P → T ∈ CR Ti ∈ CR
and
Ti ∈ CR
and
(8) Di ∈ DC
(9)
Di ∈ DC
(10)
D · T ∈ DC 4.4
(11)
Modelling Types
To achieve the definition of type interpretation, we need to give the interpretation of type variables. For that, we use valuations, i.e. functions matching every data-type variable to a data-candidate, and every type variable to a reducibility candidate. Given a valuation ρ, the interpretation of a type T in ρ, written [T ]ρ , is defined inductively in Fig. 7. We also associate to T (seen as a type for case bindings) and ρ the set of case bindings T ρ . Lemma 7.3 ensures that for every valuation ρ, [T ]ρ ∈ CR for any type T , and [D]ρ ∈ DC for any data type D.
[T ∩ U ]ρ = [T ]ρ ∩ [U ]ρ [∀α.U ]ρ = A∈DC [U ]ρ,α→A [∀X.U ]ρ = A∈CR [U ]ρ,X→A [T ∪ U ]ρ = [T ]ρ ∪ [U ]ρ [∃α.U ]ρ = A∈DC [U ]ρ,α→A [∃X.U ]ρ = A∈CR [U ]ρ,X→A T ρ = { θ / λx. {|θ|} · x ∈ [T ]ρ }
[α]ρ = ρ(α) [X]ρ = ρ(X) [c]ρ = {c} [DT ]ρ = [D]ρ · [T ]ρ [T → U ]ρ = [T ]ρ → [U ]ρ
Fig. 7. Interpretation of types
This definition of type interpretation gives a very precise notion of data-types: Lemma 8. (corollary of Lem. 7.2) For any constructor c and any types T1 . . . Tk , Val( [cT1 . . . Tk ]ρ ) = {ct1 . . . tn /ti ∈ [Ti ]ρ }. In particular, Prop. 1 ensures that t ∈ [cT1 . . . Tk ]ρ implies t −−→∗ ct1 . . . tn for some ti ∈ [Ti ]ρ (i ≤ n), or t −−→∗ .
CN
CN
Furthermore, this type interpretation is sound w.r.t. sub-typing: Lemma 9. If, T1 T2 then for any valuation ρ, [T1 ]ρ ⊆ [T2 ]ρ .
A Polymorphic Type System for the Lambda-Calculus with Constructors
5 5.1
245
Strong Normalisation Adequacy Lemma
In this part we prove adequacy for the model: if a λC -term has type T , then its case normal form is in the interpretation of T (and thus is perfectly normalising). Reducibility candidates model deals with closed terms, but proving adequacy lemma by induction requires the use of open terms, with some assumptions on their free variables —which will be guaranteed by a context. Therefore we use substitutions σ, τ to close CN-terms and CN-case bindings: σ := ∅ | (x → u; σ)
M∅ = M ;
Mx→u;σ = M u/x σ ,
We complete the interpretation of types with the one of judgements: given a context Γ , we say that a substitution σ satisfies Γ for the valuation ρ (notation σ ∈ [Γ ]ρ ) when (x : T ) ∈ Γ implies σ(x) ∈ [T ]ρ . A typing judgement Γ t : T (or Γ θ : T ) is said to be valid (notation: Γ t : T or Γ θ : T respectively) if for every valuation ρ and every substitution σ ∈ [Γ ]ρ , ↓ tσ ∈ [T ]ρ
(resp. ↓ θσ ∈ T ρ )
The proof of adequacy requires a kind of inversion lemma for CR : Lemma 10. For every A ∈ CR, P ∈ PCR, every t ∈ ΛCN 0 , and u ∈ PN0 , tu ∈ A
⇔
t ∈ Red∗ (u) → A
(12)
λx.t ∈ P → A
⇔
for all v ∈ P, tv/x ∈ A
(13)
Proposition 11. Given a term t, a case binding θ, a context Γ and a type T , Γ t:T
⇒
Γ t:T
(14)
Γ θ:T
⇒
Γ θ:T
(15)
and thus, every well typed λC -term has a case-normal form perfectly normalising. 5.2
Strong Normalisation of λC
Both λC without β-reduction (BC , see part 1) and typed λC in case normal form (somehow λC without case-reductions) are strongly normalising. Nevertheless, it is not sufficient to conclude the strong normalisation property for the typed λC -calculus. Indeed, subterms non terminating (for β) could be hidden in an unmatched branch of a case-binding which disappears in the CN-form —like in the term {|c → I ; d → δδ|} · c, whose CN-form is I = λx.x. In order to keep potential infinite derivation, we first expand λC -terms (in a way which avoids CaseCons and CaseDai redexes) before taking their casenormal form, though what we call the star expansion: c = I c = I
x = x ({|θ|} · t) = {|θ |} · t (tu) = t u (λx.t) = λx.t n n (ci → ui )i=1 = (ci → ui )i=1
246
B. Petit
It is straightforward to check that star expansion commutes with substitution and is preserved by reduction. Combined with the switch translation (we use the convention ↓t =↓ (t )), it preserves srtict β-reduction, and it respects typing judgements. Lemma 12. For any term t, t , any type T and any context Γ , Γ t:T
If R ∈ {AL, LA, CO}, then
t →R t
If R ∈ {CA, CL, CC}, then
t →R t
⇒
Γ t : T
(16)
⇒
↓t −−→ ↓t
⇒
↓t = ↓t
+
CN
(17) (18)
Corollary 13. Every typable term is strongly normalising for DC.

Proof. Consider the measure s(t) = (|↓t*|, μ(t)), where |↓t*| is the size of the maximal CN-reduction of ↓t*, and μ(t) the size of the maximal BC-reduction of t, and conclude with the standard rewriting method [18].

Thus, the typed λC-calculus without the daimon is strongly normalising. Since daimon rules do not create DC-redexes, we can conclude (using the postponement method [18]) the strong normalisation of the full calculus:

Lemma 14. If a term is strongly normalising for DC, then it is also for λC.

Proposition 15. The typed λC-calculus enjoys the strong normalisation property.

5.3  Correctness w.r.t. Pattern Matching
Thus our type system rules out infinite reductions. It also prevents match failure:

Lemma 16. For any λC-term t, if ↓t is hereditarily defined, then so is t.

Corollary 17. Every typed λC-term is hereditarily defined.

Notice that the daimon never appears during a reduction. Thus, Prop. 1 and the perfect normalisation of typable terms allow us to conclude:

Theorem 18. Well-typed terms of Λ0 that do not contain the daimon reduce strongly to a value.
Conclusion

Related works. The first presentation [12] of the pure pattern calculus [13] comes with an ML-style type system that enjoys the strong normalisation property. This type system is less expressive than ours (it contains neither full second-order polymorphism nor existential types, and does not distinguish data types from ordinary types) and does not prevent match failure during reduction. On the other hand, this type system is decidable, and an implementation can be found in [6]. A type system close to ours for the pure pattern calculus is currently being investigated by Lombardi, Miquel and Ríos.
A Polymorphic Type System for the Lambda-Calculus with Constructors
247
Several Church-style type systems have been proposed for the ρ-calculus, including a family of type systems organised in a cube similar to Barendregt's. As far as we know, no Curry-style type system has been proposed for the ρ-calculus.

Future works. This paper has raised many questions, mainly concerning a possible implementation of the lambda calculus with constructors. The first one is about recursively defined data types, such as nat ≡ 0 ∪ succ(nat) or list T ≡ nil ∪ cons T (list T). Adding a double subtyping judgement for each data type is a way to do it, but it requires checking the correctness of each rule. A fixpoint operator would probably be a better way, since it would allow adding recursive data types "on the fly". Still with a view to implementing the λC-calculus, we need to isolate a decidable fragment of our type system. This is a real challenge when it comes to typing case bindings (recall the example of Section 2.2, page 239) and to using union types. One idea to reach this goal is to first translate the λC-calculus into the pure λ-calculus (in the spirit of CPS translations), and then see what happens in terms of types.

Acknowledgements. I started this work at the University of Buenos Aires, which hosted me for six months during my master's thesis. I would like to thank Ariel Arbiser, Eduardo Bonelli, Carlos Lombardi, Alejandro Ríos and Roel de Vrijer for all the discussions we had there, which were profitable for this paper. I also thank my supervisor, Alexandre Miquel, who has never spared his time since then to help me and guide my research up to the results of this paper.
References

1. The Agda proof assistant, http://wiki.portal.chalmers.se/agda/
2. Arbiser, A., Miquel, A., Ríos, A.: The lambda-calculus with constructors: Syntax, confluence and separation. To appear in JFP (2009)
3. Barendregt, H.: The Lambda Calculus: Its Syntax and Semantics. Studies in Logic and The Foundations of Mathematics, vol. 103. North-Holland, Amsterdam (1984)
4. Barthe, G., Cirstea, H., Kirchner, C., Liquori, L.: Pure patterns type systems. In: POPL, pp. 250–261 (2003)
5. Bertot, Y., Castéran, P.: Coq'Art: The Calculus of Inductive Constructions. Texts in Theoretical Computer Science, vol. 25. EATCS (2004)
6. Bondi, a programming language centred on pattern-matching, http://www-staff.it.uts.edu.au/~cbj/bondi/
7. Cirstea, H., Kirchner, C.: Rho-calculus, its syntax and basic properties. In: 5th International Workshop on Constraints in Computational Logics (1998)
8. Cirstea, H., Kirchner, C., Liquori, L.: The rho cube. In: Honsell, F., Miculan, M. (eds.) FOSSACS 2001. LNCS, vol. 2030, pp. 168–183. Springer, Heidelberg (2001)
9. Girard, J.-Y.: Locus solum: From the rules of logic to the logic of rules. Mathematical Structures in Computer Science 11(3), 301–506 (2001)
10. Girard, J.-Y., Lafont, Y., Taylor, P.: Proofs and Types. Cambridge University Press, Cambridge (1989)
11. Hudak, P., Peyton Jones, S., Wadler, P.: Report on the programming language Haskell, a non-strict, purely functional language (Version 1.2). SIGPLAN Notices (1992)
12. Jay, C.B.: The pattern calculus. ACM Transactions on Programming Languages and Systems 26(6), 911–937 (2004)
13. Jay, C.B., Kesner, D.: Pure pattern calculus. In: Sestoft, P. (ed.) ESOP 2006. LNCS, vol. 3924, pp. 100–114. Springer, Heidelberg (2006)
14. Leroy, X.: The Objective Caml system, http://caml.inria.fr/
15. Milner, R., Tofte, M., Harper, R.: The Definition of Standard ML. MIT Press, Cambridge (1990)
16. Mitchell, J.C.: Polymorphic type inference and containment. Inf. Comput. 76(2/3), 211–249 (1988)
17. Riba, C.: On the stability by union of reducibility candidates. In: Seidl, H. (ed.) FOSSACS 2007. LNCS, vol. 4423, pp. 317–331. Springer, Heidelberg (2007)
18. Terese: Term Rewriting Systems. Cambridge Tracts in Theoretical Computer Science. Cambridge University Press (2003)
Kripke Semantics for Martin-Löf's Extensional Type Theory

Steve Awodey¹ and Florian Rabe²

¹ Department of Philosophy, Carnegie Mellon University, Pittsburgh, USA
[email protected]
² School of Engineering and Science, Jacobs University Bremen, Germany
[email protected]
Abstract. It is well-known that simple type theory is complete with respect to non-standard models. Completeness for standard models only holds when the class of models is enlarged, e.g., to cartesian closed categories. Similarly, dependent type theory is complete for locally cartesian closed categories. However, it is usually difficult to establish the coherence of interpretations of dependent type theory, i.e., to show that the interpretations of equal expressions are indeed equal. Several classes of models have been used to remedy this problem. We contribute to this investigation by giving a semantics that is both coherent and sufficiently general for completeness while remaining relatively easy to compute with. Our models interpret types of Martin-Löf's extensional dependent type theory as sets indexed over posets or, equivalently, as fibrations over posets. This semantics can be seen as a generalization to dependent type theory of the interpretation of intuitionistic first-order logic in Kripke models. This yields a simple coherent model theory with respect to which simple and dependent type theory are sound and complete.
1  Introduction and Related Work

Martin-Löf's extensional type theory, MLTT, is a dependent type theory ([ML84]). Its main characteristic is that there are type-valued function symbols that take terms as input and return types as output. This is enriched with further type constructors such as dependent sum and product. The syntax of dependent type theory is significantly more complex than that of simple type theory, because well-formed types and terms and both their equalities must be defined in a single joint induction. The semantics of MLTT is similarly complicated. In [See84], the connection between MLTT and locally cartesian closed (LCC) categories was first established. LCC categories interpret contexts Γ as objects ⟦Γ⟧, types in context Γ as objects in the slice category over ⟦Γ⟧, substitution as pullback, and dependent sum and product as left and right adjoints to pullback. But there is a difficulty,
The author was partially supported by a fellowship for Ph.D. research of the German Academic Exchange Service.
P.-L. Curien (Ed.): TLCA 2009, LNCS 5608, pp. 249–263, 2009. © Springer-Verlag Berlin Heidelberg 2009
namely that these three operations are not independent: substitution of terms into types is associative and commutes with sum and product formation, which is not necessarily the case for the choices of pullbacks and their adjoints. This is known as the coherence or strictness problem and has been studied extensively. In incoherent models, equal types are interpreted as isomorphic, but not necessarily equal, objects, as in [Cur89]. In [Car86], coherent models for MLTT were given using categories with attributes. And in [Hof94], a category with attributes is constructed for every LCC category. Several other model classes and their coherence properties have been studied in, e.g., [Str91] and [Jac90, Jac99]. In [Pit00], an overview is given. These model classes all have in common that they are rather abstract and have a more complicated structure than general LCC categories. It is clearly desirable to have simpler, more concrete models. But it is a hard problem to equip a given LCC category with choices for pullbacks and adjoints that are both natural and coherent. Our motivation is to find a simple concrete class of LCC categories for which such a choice can be made, and which is still general enough to be complete for MLTT.

Mathematically, our main results can be summarized very simply: using a theorem from topos theory, it can be shown that MLTT is complete with respect to (not necessarily coherent) models in the LCC categories of the form SET^P for posets P. This is equivalent to using presheaves on posets as models, which are often called Kripke models. They were also studied in [Hof97]. For these rather simple models, a solution to the coherence problem can be given. SET can be equipped with a coherent choice of pullback functors, and hence the categories SET^P can be as well. Deviating subtly from the well-known constructions, we can also make coherent choices for the required adjoints to pullback.

Finally, rather than working in the various slices SET^P/A, we use the isomorphism SET^P/A ≅ SET^{∫_P A}, where ∫_P A is the Grothendieck construction: thus we can formulate the semantics of dependent types uniformly in terms of the simple categories of indexed sets SET^Q for various posets Q. In addition to being easy to work with, this has the virtue of capturing the idea that a dependent type S in context Γ is in some sense a type-valued function on Γ: our models interpret Γ as a poset ⟦Γ⟧ and S as an indexed set ⟦Γ|S⟧ : ⟦Γ⟧ → SET. We speak of Kripke models because these models are a natural extension of the well-known Kripke models for intuitionistic first-order logic ([Kri65]). Such models are based on a poset P of worlds, and the universe is given as a P-indexed set (possibly equipped with P-indexed structure). This can be seen as the special case of our semantics where there is only one base type. In fact, our results are also interesting in the special case of simple type theory ([Chu40]). Contrary to Henkin models [Hen50, MS89] and the models given in [MM91], which like ours use indexed sets on posets, our models are standard: the interpretation ⟦Γ|S → S'⟧ of the function type is the exponential of ⟦Γ|S⟧ and ⟦Γ|S'⟧. And contrary to the models in [Fri75, Sim95], our completeness result holds for theories with more than only base types and terms.
A different notion of Kripke-models for dependent type theory is given in [Lip92], which is related to [All87]. There, the MLTT types are translated into predicates in an untyped first-order language. The first-order language is then interpreted in a Kripke-model, i.e., there is one indexed universe of which all types are subsets. Such models correspond roughly to non-standard set-theoretical models. We briefly review the syntax of MLTT in Sect. 2 and some categorical preliminaries in Sect. 3. Then we derive the coherent functor choices in Sect. 4 and use them to define the interpretation in Sect. 5. We give our main results regarding the interpretation of substitution, soundness, and completeness in Sect. 6. An extended version of this paper is available as [AR09].
2  Syntax
The basic syntax for MLTT expressions is given by the grammar in Fig. 1.

  Signatures     Σ ::= · | Σ, c : S | Σ, a : (Γ)type
  Contexts       Γ ::= · | Γ, x : S
  Substitutions  γ ::= · | γ, x/s
  Types          S ::= a γ | 1 | Id(s, s') | Σx:S S | Πx:S S
  Terms          s ::= c | x | ∗ | refl(s) | ⟨s, s'⟩ | π1(s) | π2(s) | λx:S s | s s'

Fig. 1. Basic Grammar

The vocabulary of the syntax is declared in signatures and contexts: signatures Σ declare globally accessible names c for constants of type S and names a for type-valued constants with a list Γ of argument types. Contexts Γ locally declare typed variables x. Substitutions γ translate from a context Γ to Γ' by providing terms in context Γ' for the variables in Γ. Thus, a substitution from Γ to Γ' can be applied to expressions in context Γ and yields expressions in context Γ'.

Relative to a signature Σ and a context Γ, there are two syntactical classes: types and typed terms. The base types are applications a γ of a type-valued constant to a list of argument terms γ (which we write as a substitution for simplicity). The composed types are the unit type 1, the identity types Id(s, s'), the dependent sum types Σx:S T, and the dependent function types Πx:S T. Terms are constants c, variables x, the element ∗ of the unit type, the element refl(s) of the type Id(s, s), pairs ⟨s, s'⟩, projections π1(s) and π2(s), lambda abstractions λx:S s, and function applications s s'. We do not need equality axioms s ≡ s' because they can be given as constants of type Id(s, s'). For simplicity, we omit equality axioms for types.

The judgments defining well-formed syntax are listed in Fig. 2.

  ⊢ Σ Sig              Σ is a well-formed signature
  ⊢_Σ Γ Ctx            Γ is a well-formed context over Σ
  ⊢_Σ γ : Γ → Γ'       γ is a well-formed substitution over Σ from Γ to Γ'
  Γ ⊢_Σ S : type       S is a well-formed type over Σ and Γ
  Γ ⊢_Σ S ≡ S'         types S and S' are equal over Σ and Γ
  Γ ⊢_Σ s : S          term s is well-formed with type S over Σ and Γ
  Γ ⊢_Σ s ≡ s'         terms s and s' are equal over Σ and Γ

Fig. 2. Judgments

The typing rules for these judgments are well-known. Our formulation follows roughly [See84], including the use of extensional identity types. The latter means that the equality judgment for terms s and s' holds iff the type Id(s, s') is inhabited. The equality of terms admits conversion rules such as β and η conversion for function terms. The full inference system can be found in [AR09].
3  Categorical Preliminaries
In this section, we repeat some well-known definitions and results about indexed sets and fibrations over posets (see, e.g., [Joh02]). We assume the basic notions of category theory (see, e.g., [Lan98]). We use a set-theoretical pairing function (a, b) and define tuples as left-associatively nested pairs, i.e., (a1, a2, ..., an) abbreviates (...(a1, a2), ..., an).

Definition 1 (Indexed Sets). POSET denotes the category of partially ordered sets. We treat posets as categories and write p ≤ p' for the uniquely determined morphism p → p'. If P is a poset, SET^P denotes the category of functors P → SET and natural transformations. These functors are also called P-indexed sets. We denote the constant P-indexed set that maps each p ∈ P to {∅} by 1_P.

It is often convenient to replace an indexed set A over P with a poset formed from the disjoint union of all sets A(p) for p ∈ P. This is a special case of a construction by Mac Lane ([LM92]), usually called the Grothendieck construction.

Definition 2 (Grothendieck Construction). For an indexed set A over P, we define a poset

  ∫_P A := {(p, a) | p ∈ P, a ∈ A(p)}   with   (p, a) ≤ (p', a')  iff  p ≤ p' and A(p ≤ p')(a) = a'.
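On a finite poset, this construction is directly computable. The sketch below uses a hypothetical dictionary encoding of indexed sets, not notation from the paper:

```python
# P = {0, 1} with 0 <= 1; A is a P-indexed set with restriction map A(0 <= 1).
P = [0, 1]
leq = {(0, 0), (0, 1), (1, 1)}
A = {0: {"a", "b"}, 1: {"c"}}

def A_map(p, q, a):
    """The action A(p <= q) : A(p) -> A(q)."""
    return a if p == q else {"a": "c", "b": "c"}[a]  # A(0 <= 1) collapses a, b

def grothendieck(P, leq, A, A_map):
    """Build the poset ∫_P A: pairs (p, a), with (p, a) <= (q, b) iff
    p <= q and A(p <= q)(a) = b."""
    elems = [(p, a) for p in P for a in A[p]]
    order = {(x, y) for x in elems for y in elems
             if (x[0], y[0]) in leq and A_map(x[0], y[0], x[1]) == y[1]}
    return elems, order

elems, order = grothendieck(P, leq, A, A_map)
print(sorted(elems))  # [(0, 'a'), (0, 'b'), (1, 'c')]
```

Both (0, 'a') and (0, 'b') sit below (1, 'c') in ∫_P A, reflecting that A(0 ≤ 1) identifies a and b.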
We also write ∫A instead of ∫_P A if P is clear from the context. Using the Grothendieck construction, we can work with sets indexed by indexed sets: we write P|A if A is an indexed set over P, and P|A|B if additionally B is an indexed set over ∫_P A, etc.

Definition 3. Assume P|A|B. We define an indexed set P|(A·B) by

  (A·B)(p) = {(a, b) | a ∈ A(p), b ∈ B(p, a)}

and

  (A·B)(p ≤ p') : (a, b) ↦ (a', B((p, a) ≤ (p', a'))(b))   for a' = A(p ≤ p')(a).
And we define a natural transformation π_B : A·B → A by (π_B)_p : (a, b) ↦ a.

The following definition introduces discrete opfibrations; for brevity, we will refer to them as "fibrations" in the sequel. Using the axiom of choice, these are necessarily split.

Definition 4 (Fibrations). A fibration over a poset P is a functor f : Q → P with the following property: for all p' ∈ P and q ∈ Q such that f(q) ≤ p', there is a unique q' ∈ Q such that q ≤ q' and f(q') = p'. We call f normal iff f is the first projection of Q = ∫_P A for some P|A.

For every indexed set A over P, the first projection ∫_P A → P is a (normal) fibration. Conversely, every fibration f : Q → P defines an indexed set over P by mapping p ∈ P to its preimage f⁻¹(p) ⊆ Q and p ≤ p' to the obvious function. This leads to a well-known equivalence of indexed sets and fibrations over P. If we only consider normal fibrations, we obtain an isomorphism as follows.

Lemma 1. If we restrict the objects of POSET/P to be normal fibrations and the morphisms to be (arbitrary) fibrations, we obtain the full subcategory Fib(P) of POSET/P. There are isomorphisms

  F(−) : SET^P → Fib(P)   and   I(−) : Fib(P) → SET^P.

Proof. It is straightforward to show that Fib(P) is a full subcategory. For A : P → SET, we define the fibration F(A) : ∫_P A → P by (p, a) ↦ p. And for a natural transformation η : A → A', we define the fibration F(η) : ∫_P A → ∫_P A' satisfying F(A') ∘ F(η) = F(A) by (p, a) ↦ (p, η_p(a)). For f : Q → P, we define an indexed set I(f) by I(f)(p) := {a | f(p, a) = p} and I(f)(p ≤ p') : a ↦ a', where a' is the uniquely determined element such that (p, a) ≤ (p', a') ∈ Q. And for a morphism ϕ between fibrations f : Q → P and f' : Q' → P, we define a natural transformation I(ϕ) : I(f) → I(f') by I(ϕ)_p : a ↦ a', where a' is such that ϕ(p, a) = (p, a'). Then it is easy to compute that I and F are mutually inverse functors.

Definition 5 (Indexed Elements). Assume P|A. The P-indexed elements of A are given by

  Elem(A) := { (a_p)_{p∈P} | a_p ∈ A(p), and a_{p'} = A(p ≤ p')(a_p) whenever p ≤ p' }.

Then the indexed elements of A are in bijection with the natural transformations 1_P → A. For a ∈ Elem(A), we will write F(a) for the fibration P → ∫A mapping p to (p, a_p).
[Fig. 3 sketches the fibrations F(B) : ∫B → ∫A and F(A) : ∫A → P over two worlds p1 ≤ p2, with elements (p1, a1) ≤ (p2, a2) in ∫A and (p1, a1, b1) ≤ (p2, a2, b2) in ∫B, together with the isomorphism ∫(A·B) ≅ ∫B.]

Fig. 3. Indexed Sets and Fibrations
Example 1. We exemplify the introduced notions in Fig. 3. P is a totally ordered set, visualized as a horizontal line, with two elements p1 ≤ p2 ∈ P. For P|A, ∫A becomes a blob over P. The sets A(pi) correspond to the vertical lines in ∫A, and ai ∈ A(pi). The action of A(p ≤ p') and the poset structure of ∫A are horizontal: if we assume A(p1 ≤ p2) : a1 ↦ a2, then (p1, a1) ≤ (p2, a2) in ∫A. In general, A(p1 ≤ p2) need not be injective or surjective. The action of F(A) is vertical: F(A) maps (pi, ai) to pi. For P|A|B, ∫B becomes a three-dimensional blob over ∫A. The sets B(pi, ai) correspond to the dotted lines, and bi ∈ B(pi, ai). The action of B((p1, a1) ≤ (p2, a2)) and the poset structure of ∫B are horizontal, and F(B) projects ∫B to ∫A. ∫_P(A·B) is isomorphic to ∫_{∫_P A} B: their elements differ only in the bracketing, i.e., (pi, (ai, bi)) and ((pi, ai), bi), respectively. We have (ai, bi) ∈ (A·B)(pi), and (A·B)(p1 ≤ p2) : (a1, b1) ↦ (a2, b2). Thus, the sets (A·B)(pi) correspond to the two-dimensional gray areas. Up to this isomorphism, the projection F(A·B) is the composite F(A) ∘ F(B). Indexed elements a ∈ Elem(A) are families (a_p)_{p∈P} and correspond to horizontal curves through ∫A such that F(a) is a section of F(A). Indexed elements of B correspond to two-dimensional vertical areas in ∫B (intersecting the dotted lines exactly once), and indexed elements of A·B correspond to horizontal curves in ∫B (intersecting the gray areas exactly once).
We will use Lem. 1 frequently to switch between indexed sets and fibrations, as convenient. In particular, we will use the following two corollaries.

Lemma 2. Assume P|A. Then

  Elem(A) ≅ Hom_{Fib(P)}(id_P, F(A)) = {f : P → ∫_P A | F(A) ∘ f = id_P}

and

  SET^P/A ≅ SET^{∫A}.

Proof. Both claims follow from Lem. 1 by using Elem(A) = Hom_{SET^P}(1_P, A) as well as Fib(P)/F(A) ≅ Fib(∫_P A), respectively.

Finally, as usual, we say that a category is locally cartesian closed (LCC) if it and all of its slice categories are cartesian closed (in particular, it has a terminal object). Then we have the following well-known result (see, e.g., [AR09]).

Lemma 3. SET^P is LCC.
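Cartesian closure of SET^P can be made tangible at the level of plain exponentials: over a poset, the standard presheaf exponential (B^A)(p) consists of families of functions at the worlds above p that commute with the restriction maps (the Kripke function space). A brute-force check on a two-point chain, with a hypothetical encoding not taken from the paper:

```python
from itertools import product

# Indexed sets over the chain 0 <= 1 with restriction maps r(0 <= 1).
A, rA = {0: ["a"], 1: ["a", "x"]}, {"a": "a"}
B, rB = {0: [0, 1], 1: [0, 1]}, {0: 0, 1: 1}

def exponential_at_0():
    """(B^A)(0): pairs (f0, f1) with f1(rA(a)) = rB(f0(a)) for all a in A(0)."""
    maps0 = [dict(zip(A[0], v)) for v in product(B[0], repeat=len(A[0]))]
    maps1 = [dict(zip(A[1], v)) for v in product(B[1], repeat=len(A[1]))]
    return [(f0, f1) for f0 in maps0 for f1 in maps1
            if all(f1[rA[a]] == rB[f0[a]] for a in A[0])]

# 2 choices of f0 times 4 choices of f1 give 8 candidates; naturality keeps 4:
# f1 must agree with f0 on the inherited element a, while f1(x) stays free.
print(len(exponential_at_0()))  # 4
```

The count differs from the 8 set-theoretic pairs precisely because the exponential is computed world by world under the naturality constraint, which is what makes these models "standard" in the sense of the introduction.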
4  Operations on Indexed Sets
Because SET^P is LCC, we know that it has pullbacks and that the pullback along a fixed natural transformation has left and right adjoints (see, e.g., [Joh02]). However, these functors are only unique up to isomorphism, and it is non-trivial to pick coherent choices for them.

Pullbacks. Assume P|A1 and P|A2 and a natural transformation h : A2 → A1. The pullback along h is a functor SET^P/A1 → SET^P/A2. Using Lem. 2, we can avoid dealing with slice categories of SET^P and instead give a functor h∗ : SET^{∫A1} → SET^{∫A2}, which we also call the pullback along h. The functor h∗ is given by precomposition.

Definition 6. Assume A1 and A2 indexed over P, and a natural transformation h : A2 → A1. Then for B ∈ SET^{∫A1}, we put h∗B := B ∘ F(h) ∈ SET^{∫A2}, where, as in Lem. 1, F(h) : ∫_P A2 → ∫_P A1. The action of h∗ on morphisms is defined similarly by precomposition with F(h): h∗(β : B → B') := β ∘ F(h). Finally, we define a natural transformation between P-indexed sets by

  h·B : A2·h∗B → A1·B,   (h·B)_p : (a2, b) ↦ (h_p(a2), b).

The application of h·B is independent of B, which is only needed in the notation to determine the domain and codomain of h·B.

Lemma 4 (Pullbacks). In the situation of Def. 6, the following is a pullback in SET^P.
  A2·h∗B ---(h·B)---> A1·B
     |                  |
  π_{h∗B}              π_B
     v                  v
     A2 ------h------> A1
Furthermore, we have the following coherence properties for every natural transformation g : A3 → A2:

  id_{A1}∗ B = B,        (h ∘ g)∗ B = g∗(h∗B),
  id_{A1}·B = id_{A1·B},  (h ∘ g)·B = (h·B) ∘ (g·h∗B).
Proof. The following is a pullback in POSET:

  ∫A2·h∗B ---F(h·B)---> ∫A1·B
     |                     |
  F(π_{h∗B})            F(π_B)
     v                     v
    ∫A2 ------F(h)-----> ∫A1

On elements, F(h·B) maps (p, (a2, b)) to (p, (h_p(a2), b)); F(π_{h∗B}) maps (p, (a2, b)) to (p, a2); F(π_B) maps (p, (h_p(a2), b)) to (p, h_p(a2)); and F(h) maps (p, a2) to (p, h_p(a2)).
If we turn this square into a cocone on P by adding the canonical projections F(A2) and F(A1), it becomes a pullback in Fib(P). Then the result follows by Lem. 1. The coherence properties can be verified by simple computations.

Equivalently, using the terminology of [Pit00], we can say that for every P the tuple (SET^P, SET^{∫A}, A·B, π_B, h∗B, h·B) forms a type category (where A, B, h range over arbitrary arguments). Then giving coherent adjoints to the pullback means showing that this type category admits dependent sums and products.

Adjoints. To interpret MLTT, the adjoints to h∗, where h : A2 → A1, are only needed if h is a projection, i.e., A1 := A, A2 := A·B, and h := π_B for some P|A|B. We only give adjoint functors for this special case because we use this restriction when defining the right adjoint. Thus, we give functors L_B, R_B : SET^{∫A·B} → SET^{∫A} such that L_B ⊣ π_B∗ ⊣ R_B.

Definition 7. We define the functor L_B as follows. For an object C, we put L_B C := B·(C ∘ assoc), where assoc maps elements ((p, a), b) ∈ ∫B to (p, (a, b)) ∈ ∫(A·B); and for a morphism, i.e., a natural transformation η : C → C', we put

  (L_B η)_{(p,a)} : (b, c) ↦ (b, η_{(p,(a,b))}(c))   for (p, a) ∈ ∫A.
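At a single world, the object part of Definition 7 computes an ordinary dependent sum; a hypothetical dictionary sketch, not notation from the paper:

```python
# (L_B C)(p, a) = {(b, c) | b in B(p, a), c in C(p, (a, b))}, illustrated over
# one world "p" and one point "a" of A.
B = {("p", "a"): {"b1", "b2"}}
C = {("p", ("a", "b1")): {0}, ("p", ("a", "b2")): {1, 2}}

def dependent_sum(B, C):
    """Glue each fibre of C onto the element of B it lives over."""
    return {(p, a): {(b, c) for b in bs for c in C[(p, (a, b))]}
            for (p, a), bs in B.items()}

result = dependent_sum(B, C)
print(result[("p", "a")] == {("b1", 0), ("b2", 1), ("b2", 2)})  # True
```

The fibres {0} over b1 and {1, 2} over b2 have different sizes, which is exactly what distinguishes a dependent sum from a plain product B × C.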
Lemma 5 (Left Adjoint). L_B is left adjoint to π_B∗. Furthermore, for any natural transformation g : A' → A, we have the following coherence property (the Beck-Chevalley condition):

  g∗(L_B C) = L_{g∗B}((g·B)∗C).

Proof. It is easy to show that L_B is isomorphic to composition along π_B, for which the adjointness is well-known. In particular, we have the following diagram in SET^P: the isomorphism (A·B)·C ≅ A·(L_B C) commutes with the projections π_C : (A·B)·C → A·B, π_B : A·B → A, and π_{L_B C} : A·(L_B C) → A. The coherence can be verified by direct computation.
The right adjoint is more complicated. Intuitively, R_B C must represent the dependent functions from B to C. The naive candidate for this is Elem(C) ≅ Hom(1_{∫B}, C) (i.e., Hom(B, C) in the simply-typed case), but this is not a ∫A-indexed set. There is a well-known construction to remedy this, but we use a subtle modification to achieve coherence, i.e., the analogue of the Beck-Chevalley condition. To do that, we need an auxiliary definition.

Definition 8. Assume P|A|B, P|A·B|C, and an element x := (p, a) ∈ ∫A. Let y^x ∈ SET^P and a natural transformation i : y^x → A be given by

  y^x(p') = {∅} if p ≤ p', and ∅ otherwise;    i_{p'} : ∅ ↦ A(p ≤ p')(a).

Then we define indexed sets P|y^x|B^x and P|y^x·B^x|C^x by

  B^x := i∗B,    C^x := (i·B)∗C

and put d^x := ∫(y^x·B^x) for the domain of C^x. The left diagram in Fig. 4 shows the indexed sets involved; the right one gives the actions of the natural transformations for an element p' ∈ P with p ≤ p'. Below, it will be crucial for coherence that B^x and C^x contain tuples in which a is replaced with ∅.
[Fig. 4 shows, on the left, the indexed sets y^x, y^x·B^x, (y^x·B^x)·C^x and A, A·B, (A·B)·C with their projections and the natural transformations i, i·B, and (i·B)·C; on the right, for x := (p, a) and a' := A(p ≤ p')(a), their actions ∅ ↦ a', (∅, b') ↦ (a', b'), and (∅, b', c') ↦ (a', b', c').]

Fig. 4. The Situation of Def. 8
Definition 9. Assume P|A|B. Then we define the functor R_B : SET^{∫A·B} → SET^{∫A} as follows. Firstly, for an object C, we put for x ∈ ∫A

  (R_B C)(x) := Elem(C^x).

In particular, f ∈ (R_B C)(x) is a family (f_y)_{y∈d^x} with f_y ∈ C^x(y). For x ≤ x' ∈ ∫A, we have d^x ⊇ d^{x'} and put (R_B C)(x ≤ x') : (f_y)_{y∈d^x} ↦ (f_y)_{y∈d^{x'}}. Secondly, for a morphism, i.e., a natural transformation η : C → C', we define R_B η : R_B C → R_B C' as follows: for x := (p, a) ∈ ∫A and f ∈ (R_B C)(x), we define f' := (R_B η)_x(f) ∈ (R_B C')(x) by

  f'_{(p',(∅,b'))} := η_{(p',(a',b'))}(f_{(p',(∅,b'))})   for (p', (∅, b')) ∈ d^x and a' := A(p ≤ p')(a).

Lemma 6 (Right Adjoint). R_B is right adjoint to π_B∗. Furthermore, for every natural transformation g : A' → A, we have the following coherence property:

  g∗(R_B C) = R_{g∗B}((g·B)∗C).

The proof can be found in [AR09]. The adjointness implies Elem(R_B C) ≅ Elem(C). We spell out this isomorphism explicitly because we will use it later on.

Lemma 7. Assume P|A|B and P|A·B|C. For t ∈ Elem(C) and x := (p, a) ∈ ∫A, let t_x ∈ Elem(C^x) be given by

  (t_x)_{(p',(∅,b'))} = t_{(p',(a',b'))}   where a' := A(p ≤ p')(a).

And for f ∈ Elem(R_B C) and x := (p, (a, b)) ∈ ∫(A·B), we have f_{(p,a)} ∈ Elem(C^{(p,a)}); thus, we can put f^x := (f_{(p,a)})_{(p,(∅,b))} ∈ C(p, (a, b)).
Then the sets Elem(C) and Elem(R_B C) are in bijection via

  sp(−) : Elem(C) ∋ t ↦ (t_x)_{x∈∫A} ∈ Elem(R_B C)
  am(−) : Elem(R_B C) ∋ f ↦ (f^x)_{x∈∫(A·B)} ∈ Elem(C).

Proof. This follows from the right adjointness by easy computations.
Intuitively, sp(t) turns t ∈ Elem(C) into a ∫A-indexed set by splitting it into components. And am(f) glues such a tuple of components back together. Syntactically, these operations correspond to currying and uncurrying, respectively. Then we need one last notation. For P|A, indexed elements a ∈ Elem(A) behave like mappings with domain P. We can precompose such indexed elements with fibrations f : Q → P to obtain Q-indexed elements of A ∘ f.

Definition 10. Assume P|A, f : Q → P, and a ∈ Elem(A). Then a ∗ f ∈ Elem(A ∘ f) is defined by (a ∗ f)_q := a_{f(q)} for q ∈ Q.
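The syntactic analogy can be sketched with ordinary functions; the plain Python below only loosely mirrors sp and am, dropping all indexing:

```python
def sp(t):
    """Split a two-argument function into a curried one, like sp(-)."""
    return lambda a: lambda b: t(a, b)

def am(f):
    """Amalgamate a curried function back into a two-argument one, like am(-)."""
    return lambda a, b: f(a)(b)

add = lambda a, b: a + b
print(sp(add)(2)(3))      # 5
print(am(sp(add))(2, 3))  # 5: am and sp are mutually inverse
```

Just as here am(sp(t)) recovers t, in the semantics am(sp(t)) = t and sp(am(f)) = f, which is what Lemma 7's bijection expresses.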
5  Semantics
Using the operations from Sect. 4, the definition of the semantics is straightforward. To demonstrate its simplicity, we will spell it out in an elementary way. The models are Kripke models: a Σ-model I is based on a poset P^I of worlds and provides interpretations c^I and a^I for all symbols declared in Σ. I extends to a function ⟦−⟧^I, which interprets all Σ-expressions. We will omit the index I if no confusion is possible. The interpretation is such that

– for a context Γ, ⟦Γ⟧ is a poset,
– for a substitution γ from Γ to Γ', ⟦γ⟧ is a monotone function from ⟦Γ'⟧ to ⟦Γ⟧,
– for a type S, ⟦Γ|S⟧ is an indexed set on ⟦Γ⟧,
– for a term s of type S, ⟦Γ|s⟧ is an indexed element of ⟦Γ|S⟧.

If Γ = x1 : S1, ..., xn : Sn, an element of ⟦Γ⟧ has the form (p, (a1, ..., an)) where p ∈ P, a1 ∈ ⟦·|S1⟧(p), ..., an ∈ ⟦x1 : S1, ..., x_{n−1} : S_{n−1}|Sn⟧(p, (a1, ..., a_{n−1})). Intuitively, ai is an assignment to the variable xi in world p. For a typed term Γ ⊢_Σ s : S, both ⟦Γ|s⟧ and ⟦Γ|S⟧ are indexed over ⟦Γ⟧. And if an assignment (p, α) is given, the interpretations of s and S satisfy ⟦Γ|s⟧_{(p,α)} ∈ ⟦Γ|S⟧(p, α). This is illustrated in the left diagram in Fig. 5. If γ is a substitution Γ → Γ', then ⟦γ⟧ maps assignments (p, α') ∈ ⟦Γ'⟧ to assignments (p, α) ∈ ⟦Γ⟧. And substitution in types and terms is interpreted by pullback, i.e., composition. This is illustrated in the right diagram in Fig. 5; its commutativity expresses the coherence.

The poset P of worlds plays the same role as the various posets ⟦Γ⟧: it interprets the empty context. In this way, P can be regarded as interpreting an implicit or relative context. This is in keeping with the practice of type theory
(and category theory), according to which closed expressions may be considered relative to some fixed but unspecified context (respectively, base category). Sum types are interpreted naturally as the dependent sum of indexed sets given by the left adjoint. And pairing and projections have their natural semantics. Product types are interpreted as exponentials using the right adjoint. A lambda abstraction λx:S t is interpreted by first interpreting t and then splitting it as in Lem. 7. And an application f s is interpreted by amalgamating the interpretation of f as in Lem. 7 and using the composition from Def. 10.
[Fig. 5: on the left, the section F(⟦Γ|s⟧) : ⟦Γ⟧ → ∫⟦Γ|S⟧ composed with F(⟦Γ|S⟧) is the identity on ⟦Γ⟧; on the right, ⟦Γ'|γ(S)⟧ = ⟦Γ|S⟧ ∘ ⟦γ⟧ as functors ⟦Γ'⟧ → SET.]

Fig. 5. Semantics of Terms, Types, and Substitution
Definition 11 (Models). For a signature Σ, Σ-models are defined as follows:
– A model I for the empty signature · is a poset P^I.
– A model I for the signature Σ, c : S consists of a Σ-model I_Σ and an indexed element c^I ∈ Elem(⟦·|S⟧^{I_Σ}).
– A model I for the signature Σ, a : (Γ0)type consists of a Σ-model I_Σ and an indexed set a^I over ⟦Γ0⟧^{I_Σ}.

Definition 12 (Model Extension). The extension of a model is defined by induction on the typing derivations. We assume in each case that all occurring expressions are well-formed. For example, in the case for ⟦Γ|f s⟧, f has type Πx:S T and s has type S.

– Contexts: The context x1 : S1, ..., xn : Sn is interpreted as the poset whose elements are the tuples (p, (a1, ..., an)) such that p ∈ P, a1 ∈ ⟦·|S1⟧(p, ∅), ..., an ∈ ⟦x1 : S1, ..., x_{n−1} : S_{n−1}|Sn⟧(p, (a1, ..., a_{n−1})). The ordering of the poset is inherited from the n-times iterated Grothendieck construction, to which it is canonically isomorphic.
– Substitutions γ = x1/s1, ..., xn/sn from Γ to Γ':

  ⟦γ⟧ : (p, α') ↦ (p, (⟦Γ'|s1⟧_{(p,α')}, ..., ⟦Γ'|sn⟧_{(p,α')}))   for (p, α') ∈ ⟦Γ'⟧
Kripke Semantics for Martin-L¨ of’s Extensional Type Theory
261
– Base types: ⟦Γ|a γ0⟧ := a ∘ ⟦γ0⟧
– Composed types:
⟦Γ|1⟧(p, α) := {∅}
⟦Γ|Id(s, s′)⟧(p, α) := {∅} if ⟦Γ|s⟧(p, α) = ⟦Γ|s′⟧(p, α), and ∅ otherwise
⟦Γ|Σx:S.T⟧ := L_{⟦Γ|S⟧} ⟦Γ, x : S|T⟧
⟦Γ|Πx:S.T⟧ := R_{⟦Γ|S⟧} ⟦Γ, x : S|T⟧
⟦Γ|1⟧ and ⟦Γ|Id(s, s′)⟧ are only specified for objects; their extension to morphisms is uniquely determined.
– Elementary terms: ⟦Γ|c⟧(p, α) := c_p,  ⟦x1 : S1, …, xn : Sn|xi⟧(p, (a1, …, an)) := ai
– Composed terms:
⟦Γ|∗⟧(p, α) := ∅
⟦Γ|refl(s)⟧(p, α) := ∅
⟦Γ|⟨s, s′⟩⟧(p, α) := (⟦Γ|s⟧(p, α), ⟦Γ|s′⟧(p, α))
⟦Γ|πi(u)⟧(p, α) := ai where ⟦Γ|u⟧(p, α) = (a1, a2)
⟦Γ|λx:S.t⟧ := sp(⟦Γ, x : S|t⟧)
⟦Γ|f s⟧ := am(⟦Γ|f⟧) ∗ (assoc ∘ F(⟦Γ|s⟧))
Here assoc maps ((p, α), a) to (p, (α, a)). Since the same expression may have more than one well-formedness derivation, the well-definedness of Def. 12 must be proved in a joint induction with the proof of Thm. 2 below (see also [Str91]). And because of the use of substitution, e.g., for application of function terms, the induction must be intertwined with the proof of Thm. 1 as well.
6
Main Results
Theorem 1 (Substitution). Assume ⊢Σ γ : Γ → Γ′. Then:
1. for a substitution ⊢Σ γ′ : Γ′ → Γ″: ⟦γ′ ∘ γ⟧ = ⟦γ⟧ ∘ ⟦γ′⟧,
2. for a type Γ ⊢Σ S : type: ⟦Γ′|γ(S)⟧ = ⟦Γ|S⟧ ∘ ⟦γ⟧,
3. for a term Γ ⊢Σ s : S: ⟦Γ′|γ(s)⟧ = ⟦Γ|s⟧ ∗ ⟦γ⟧.

Theorem 2 (Soundness). Assume a signature Σ and a context Γ. If Γ ⊢Σ S ≡ S′ for two well-formed types S, S′, then in every Σ-model ⟦Γ|S⟧ = ⟦Γ|S′⟧ ∈ SET^⟦Γ⟧. And if Γ ⊢Σ s ≡ s′ for two well-formed terms s, s′ of type S, then in every Σ-model ⟦Γ|s⟧ = ⟦Γ|s′⟧ ∈ Elem(⟦Γ|S⟧).
262
S. Awodey and F. Rabe
Thm. 1 and 2 must be proved in a joint induction over all expressions and their well-formedness derivations. The proofs can be found in [AR09]. According to the propositions-as-types interpretation — also known as the Curry-Howard correspondence — a type S holds in a model if its interpretation ⟦S⟧ is inhabited, i.e., the indexed set ⟦S⟧ has an indexed element. A type is valid if it holds in all models. Soundness then implies: if there is a term s of type S in context Γ, then in every Σ-model there is an indexed element of ⟦Γ|S⟧, namely ⟦Γ|s⟧. Conversely, we have:

Theorem 3 (Completeness). Assume a signature Σ, a context Γ, and a well-formed type S. If ⟦Γ|S⟧ has an indexed element in every model, then there is a term s such that Γ ⊢Σ s : S.

The basic idea of the proof is to take the syntactic category and then to construct a model out of it using topological embedding theorems. It can be found in [AR09]. Due to the presence of extensional identity types, Thm. 3 implies: for all well-formed terms s, s′ of type S, if ⟦Γ|s⟧ = ⟦Γ|s′⟧ in all Σ-models, then Γ ⊢Σ s ≡ s′. An analogous result for types is more complicated and remains future work.
7
Conclusion and Future Work
We have presented a concrete and intuitive semantics for MLTT in terms of indexed sets on posets. And we have shown soundness and completeness. Our semantics is essentially that proposed by Lawvere in [Law69] in the hyperdoctrine of posets, fibrations, and indexed sets on posets, but we have made particular choices for which the models are coherent. Our models use standard function spaces, and substitution has a very simple interpretation as composition. The same holds in the simply-typed case, which makes our models an interesting alternative to (non-standard) Henkin models. In both cases, we strengthen the existing completeness results by restricting the class of models. We assume that the completeness result can still be strengthened somewhat further, e.g., to permit equality axioms between types. In addition, it is an open problem to find an elementary completeness proof, i.e., one that does not rely on topos-theoretical results. Going beyond the results presented here, we have developed a first-order logic on top of MLTT and extended the completeness result.
References

[All87] Allen, S.: A Non-Type-Theoretic Definition of Martin-Löf's Types. In: Gries, D. (ed.) Proceedings of the Second Annual IEEE Symp. on Logic in Computer Science, LICS 1987, pp. 215–221. IEEE Computer Society Press, Los Alamitos (1987)
[AR09] Awodey, S., Rabe, F.: Kripke Semantics for Martin-Löf's Extensional Type Theory (2009), http://kwarc.info/frabe/Research/LamKrip.pdf
[Car86] Cartmell, J.: Generalized algebraic theories and contextual categories. Annals of Pure and Applied Logic 32, 209–243 (1986)
[Chu40] Church, A.: A Formulation of the Simple Theory of Types. Journal of Symbolic Logic 5(1), 56–68 (1940)
[Cur89] Curien, P.: Alpha-Conversion, Conditions on Variables and Categorical Logic. Studia Logica 48(3), 319–360 (1989)
[Fri75] Friedman, H.: Equality Between Functionals. In: Parikh, R. (ed.) Logic Colloquium. LNMath, vol. 453, pp. 22–37. Springer, Heidelberg (1975)
[Hen50] Henkin, L.: Completeness in the Theory of Types. Journal of Symbolic Logic 15(2), 81–91 (1950)
[Hof94] Hofmann, M.: On the Interpretation of Type Theory in Locally Cartesian Closed Categories. In: Pacholski, L., Tiuryn, J. (eds.) CSL 1994. LNCS, vol. 933, pp. 427–441. Springer, Heidelberg (1995)
[Hof97] Hofmann, M.: Syntax and Semantics of Dependent Types. In: Pitts, A., Dybjer, P. (eds.) Semantics and Logic of Computation, pp. 79–130. Cambridge University Press, Cambridge (1997)
[Jac90] Jacobs, B.: Categorical Type Theory. PhD thesis, Catholic University of the Netherlands (1990)
[Jac99] Jacobs, B.: Categorical Logic and Type Theory. Elsevier, Amsterdam (1999)
[Joh02] Johnstone, P.: Sketches of an Elephant: A Topos Theory Compendium. Oxford Science Publications (2002)
[Kri65] Kripke, S.: Semantical Analysis of Intuitionistic Logic I. In: Crossley, J., Dummett, M. (eds.) Formal Systems and Recursive Functions, pp. 92–130. North-Holland, Amsterdam (1965)
[Lan98] Mac Lane, S.: Categories for the working mathematician. Springer, Heidelberg (1998)
[Law69] Lawvere, W.: Adjointness in Foundations. Dialectica 23(3–4), 281–296 (1969)
[Lip92] Lipton, J.: Kripke Semantics for Dependent Type Theory and Realizability Interpretations. In: Myers, J., O'Donnell, M. (eds.) Constructivity in Computer Science, Summer Symposium, pp. 22–32. Springer, Heidelberg (1992)
[LM92] Mac Lane, S., Moerdijk, I.: Sheaves in geometry and logic. Lecture Notes in Mathematics. Springer, Heidelberg (1992)
[ML84] Martin-Löf, P.: Intuitionistic Type Theory. Bibliopolis (1984)
[MM91] Mitchell, J., Moggi, E.: Kripke-style Models for Typed Lambda Calculus. Annals of Pure and Applied Logic 51(1–2), 99–124 (1991)
[MS89] Mitchell, J., Scott, P.: Typed lambda calculus and cartesian closed categories. In: Categories in Computer Science and Logic. Contemporary Mathematics, vol. 92, pp. 301–316. Amer. Math. Society (1989)
[Pit00] Pitts, A.: Categorical Logic. In: Abramsky, S., Gabbay, D., Maibaum, T. (eds.) Handbook of Logic in Computer Science, vol. 5: Algebraic and Logical Structures, ch. 2, pp. 39–128. Oxford University Press, Oxford (2000)
[See84] Seely, R.: Locally cartesian closed categories and type theory. Math. Proc. Cambridge Philos. Soc. 95, 33–48 (1984)
[Sim95] Simpson, A.: Categorical completeness results for the simply-typed lambda-calculus. In: Dezani-Ciancaglini, M., Plotkin, G. (eds.) Typed Lambda Calculi and Applications, pp. 414–427 (1995)
[Str91] Streicher, T.: Semantics of Type Theory. Springer, Heidelberg (1991)
On the Values of Reducibility Candidates

Colin Riba

Laboratoire de l'Informatique du Parallélisme
ENS Lyon – Université de Lyon
[email protected]
http://perso.ens-lyon.fr/colin.riba/
Abstract. The straightforward elimination of union types is known to break subject reduction and, for some extensions of the lambda-calculus, to break strong normalization as well. Similarly, the straightforward elimination of implicit existential types breaks subject reduction. We propose elimination rules for union types and implicit existential quantification which use a form of call-by-value issued from Girard's reducibility candidates. We show that these rules remedy the above-mentioned difficulties with strong normalization and, for the existential quantification, with subject reduction as well. Moreover, for extensions of the lambda-calculus based on intuitionistic logic, we show that the obtained existential quantification is equivalent to its usual impredicative encoding w.r.t. provability in realizability models built from reducibility candidates and biorthogonals.
1
Introduction
Although useful in some type systems [2, 12], implicit existential types can be problematic because of their elimination rule. Some of these problems seem related to similar issues with union types. The straightforward elimination of union types is known to break subject reduction [1] and, for some extensions of the λ-calculus, to break strong normalization as well [14]. The counterexample for subject reduction given in [1] has been adapted to implicit existential types in [18]. Concerning strong normalization, the difficulties are related to type interpretations in reducibility. Usually, types are interpreted by closure operators. Union types and implicit existential types are both interpreted by the closure of the union. When no further assumption is made, the validity of their elimination rules follows from the closure under unions of the interpretations (i.e. when closure preserves unions). But the results of [14] on failures of strong normalization with elimination of union types show that some rewrite systems are not compatible with type interpretations which are closed under unions. These problems with elimination stem from bad interactions between the elimination rules and the reductions of some calculi. This suggests
UMR 5668 CNRS ENS-Lyon UCBL INRIA, 46 allée d'Italie, 69364 Lyon Cedex 7, France.
P.-L. Curien (Ed.): TLCA 2009, LNCS 5608, pp. 264–278, 2009. © Springer-Verlag Berlin Heidelberg 2009
either to adapt the elimination rules to the calculus, or to adapt the calculus to the elimination rules. In call-by-value settings, the first solution is studied in [6, 19] for systems with union and implicit existential quantification. Both works propose elimination rules restricted to call-by-value evaluation contexts. The second solution is studied in [8] for a call-by-value λ-calculus. On a related subject, a reducibility interpretation of Moggi's computational calculus based on a combination of reducibility candidates and biorthogonals is given in [11]. In this paper, working on rewriting-based extensions of the λ-calculus, we propose to eliminate union and implicit existential types by a (let x = t in c) which is reduced according to a form of call-by-value issued from Girard's reducibility candidates [7]. In contrast with [8, 6, 19], this does not force the whole calculus to be call-by-value. We build on [15, 16], where a general notion of Girard's candidates is proposed. This framework assumes only a rewrite relation and a set of contexts, called elimination contexts, subject to some axioms. The basic ingredient is an interaction property between terms and elimination contexts. Terms which interact with elimination contexts are called values, since they are observable. These are the values that we use in the reduction of the let. Our notion of call-by-value is not the usual one (considered for instance in [8]), since in our case variables are not values. From a theoretical point of view, this allows us to define the reduction of (let x = t in c) by a rewrite relation; this would have been impossible if variables were values. We present the basic tools in Sect. 2. The axiomatization of reducibility candidates is presented in Sect. 3. In Sect. 4, we show how to extend our framework modularly with the let and prove that the axioms for reducibility are preserved. Section 5 presents three applications.
First, strong normalization with union types and possibly non-deterministic simple rewriting in Sect. 5.1. Second, strong normalization and subject reduction with existential, product and sum types in Sect. 5.2. Third, concerning extensions of the λ-calculus based on intuitionistic logic, we show in Sect. 5.3 that the obtained existential quantification is equivalent to the usual impredicative encoding w.r.t. provability in realizability models built from reducibility candidates and biorthogonals. A version of this paper with full proofs is available from the webpage of the author.
2
Preliminaries
Terms. A signature Σ is a family of sets (Σn)n∈N such that Σn contains algebraic symbols of arity n. We consider λ-terms with uncurried symbols f in a signature Σ and variables x ∈ X:

t, u ∈ Λ(Σ) ::= x | λx.t | t u | f(t1, …, tn),

where f ∈ Σn. Let Λ be the set of pure λ-terms Λ(∅). A substitution is a function σ : X → Λ(Σ) of finite domain. The capture-avoiding application of σ to a term t is written tσ or t[σ(x1)/x1, …, σ(xn)/xn] if Dom(σ) = {x1, …, xn}. We let σ[t/x] be the substitution which maps x to t and is equal to σ on Dom(σ) \ {x}.
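The capture-avoiding substitution just described can be sketched as follows. This is our own minimal illustration, not the paper's; the term representation and helper names (`free_vars`, `fresh`) are ours:

```python
# A minimal sketch (representation and names ours) of Λ(Σ): λ-terms with
# uncurried algebraic symbols, plus capture-avoiding substitution t[u/x].
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Var: name: str
@dataclass(frozen=True)
class Lam: var: str; body: object
@dataclass(frozen=True)
class App: fun: object; arg: object
@dataclass(frozen=True)
class Sym: name: str; args: Tuple        # f(t1, ..., tn) with f in Sigma_n

def free_vars(t):
    if isinstance(t, Var): return {t.name}
    if isinstance(t, Lam): return free_vars(t.body) - {t.var}
    if isinstance(t, App): return free_vars(t.fun) | free_vars(t.arg)
    return set().union(*map(free_vars, t.args)) if t.args else set()

_fresh = [0]
def fresh(avoid):
    while True:
        _fresh[0] += 1
        v = f"_x{_fresh[0]}"
        if v not in avoid: return v

def subst(t, x, u):
    """Capture-avoiding t[u/x]: bound variables are renamed when needed."""
    if isinstance(t, Var): return u if t.name == x else t
    if isinstance(t, App): return App(subst(t.fun, x, u), subst(t.arg, x, u))
    if isinstance(t, Sym): return Sym(t.name, tuple(subst(a, x, u) for a in t.args))
    if t.var == x: return t              # x is re-bound here: stop
    if t.var in free_vars(u):            # rename the binder to avoid capture
        v = fresh(free_vars(u) | free_vars(t.body) | {x})
        return Lam(v, subst(subst(t.body, t.var, Var(v)), x, u))
    return Lam(t.var, subst(t.body, x, u))

r = subst(Lam("y", Var("x")), "x", Var("y"))   # (λy.x)[y/x] must rename y
assert r.body == Var("y") and r.var != "y"
```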
Reductions. A rewrite relation is a binary relation →R ⊆ (Λ(Σ) \ X) × Λ(Σ) which is closed under contexts and substitutions. Write →RS for →R ∪ →S and let (t)R := {v | t →R v}. Define the product extension of →R by (t1, …, tn) →R (u1, …, un) when there is k ∈ {1, …, n} such that tk →R uk and ti = ui for all i ≠ k. We denote by SNR the set of strongly normalizing terms for →R, which is the smallest set such that ∀t. (∀u. t →R u ⟹ u ∈ SNR) ⟹ t ∈ SNR. Let →β be the smallest rewrite relation on Λ(Σ) such that (λx.t)u →β t[u/x].

The Polymorphic λ-Calculus λ∀. Our core language is the Curry-style polymorphic λ-calculus λ∀. Its types are the formulas of second-order minimal logic, with variables X ∈ XT:

T, U ∈ T∀ ::= X ∈ XT | U ⇒ T | ∀X.T.
Its typing rules are the following:

(Ax): Γ, x : T ⊢ x : T
(⇒ I): from Γ, x : U ⊢ t : T, infer Γ ⊢ λx.t : U ⇒ T
(⇒ E): from Γ ⊢ t : U ⇒ T and Γ ⊢ u : U, infer Γ ⊢ t u : T
(∀ I): from Γ ⊢ t : T with X ∉ Γ, infer Γ ⊢ t : ∀X.T
(∀ E): from Γ ⊢ t : ∀X.T, infer Γ ⊢ t : T[U/X]
Types without ∀X.T are called simple types, denoted by T→. The simply typed λ-calculus λ→ is λ∀ restricted to T→ (hence without the rules (∀ I) and (∀ E)).

Implicit Existential Types. We add implicit existential types to λ∀. Let T∀∃ be the extension of T∀ with the second-order existential quantification ∃X.T. The usual implicit rules for ∃X.T are:

(∃ I): from Γ ⊢ t : T[U/X], infer Γ ⊢ t : ∃X.T
(∃ E): from Γ ⊢ t : ∃X.T and Γ, x : T ⊢ c : C with X ∉ Γ, C, infer Γ ⊢ c[t/x] : C
As for universal quantification, they are not reflected at the term level by corresponding constructors and eliminators. The rule (∃ E) does not satisfy subject reduction [18].

Example 2.1 ([18]). Let I = λx.x, u1 = z(Ixy)(Ixy), u2 = z(xy)(Ixy), and let Γ be the context x : Y ⇒ ∃X.X ⇒ X, y : Y, z : ∀X.(X ⇒ X) ⇒ (X ⇒ X) ⇒ Z. Using (∃ E) we have Γ ⊢ u1 : Z. But u1 →β u2, and u2 is not of type Z in Γ.

Moreover, (∃ E) causes difficulties with strong normalization. To explain them, we introduce some basic notions on reducibility.

Reducibility. Let →R be a rewrite relation on Λ(Σ). In strong normalization proofs based on reducibility, types T are interpreted by sets ⟦T⟧ of strongly normalizing terms. Strong normalization of typable terms follows from the adequacy of the interpretation: typable terms must belong to the interpretation of their type. However, not every type interpretation is adequate. Usually, an adequate interpretation can be described by a closure operator on P(SNR), i.e. by a function Red(·) : P(SNR) → P(SNR) which is idempotent (Red(Red(A)) = Red(A)),
extensive (A ⊆ Red(A)) and monotone (A ⊆ B ⇒ Red(A) ⊆ Red(B)). It is well-known that the set of closed elements Red := {Red(A) | A ⊆ SNR} is a complete lattice whose g.l.b.'s are given by intersections. We say that Red is a reducibility family if X ⊆ A for all A ∈ Red. There are essentially three kinds of reducibility families: Tait's saturated sets [17], Girard's reducibility candidates [7] and biorthogonals [13]. In this paper we focus on the last two. Assume that Red is closed under the function space ⇒, defined as A ⇒ B := {t | ∀u. u ∈ A ⟹ t u ∈ B} where A, B ⊆ Λ(Σ). Given an assignment ρ : XT → Red, the interpretation ⟦T⟧^Red_ρ ∈ Red of polymorphic types T ∈ T∀ is defined inductively by

⟦X⟧^Red_ρ := ρ(X),  ⟦U ⇒ T⟧^Red_ρ := ⟦U⟧^Red_ρ ⇒ ⟦T⟧^Red_ρ  and  ⟦∀X.T⟧^Red_ρ := ⋂_{C∈Red} ⟦T⟧^Red_{ρ[C/X]}.
We say that Red is adequate if Γ ⊢ t : T and (ρ, σ) ⊨_Red Γ imply tσ ∈ ⟦T⟧^Red_ρ, where (ρ, σ) ⊨_Red Γ if σ(x) ∈ ⟦Γ(x)⟧^Red_ρ for all x ∈ Dom(Γ). If Red is adequate then Γ ⊢ t : T implies t ∈ SNR.

Adequacy and Existential Quantification. Adequacy w.r.t. the rules (∀ I) and (∀ E) is ensured by the definition of ⟦∀X.T⟧^Red_ρ. Dually, one might expect adequacy w.r.t. (∃ I) and (∃ E) for the extension of ⟦·⟧^Red_ρ to T∀∃ defined as

⟦∃X.T⟧^Red_ρ := Red(⋃_{C∈Red} ⟦T⟧^Red_{ρ[C/X]}).
Since ⟦T⟧^Red_{ρ[C/X]} ⊆ Red(⋃_{C∈Red} ⟦T⟧^Red_{ρ[C/X]}), every reducibility family validates the rule (∃ I). This is less clear for (∃ E). We only know that it is validated by reducibility families which are closed under unions, i.e. such that Red(⋃𝒜) = ⋃𝒜 for all 𝒜 ⊆ Red. But closure under unions is not possible for all rewrite relations →R. This is related to strong normalization problems with union types.

Union Types. Let T be the extension of simple types with intersection types T ∩ U and union types T ∪ U. T is equipped with a binary relation ≤ such that (T, ≤) is a preorder with all finite l.u.b.'s and g.l.b.'s, satisfying the usual additional axioms for the arrow ⇒ (see Sect. 5.1). The typing rules are those of λ→ extended with (Sub) and (∩ I) (see Sect. 5.1). Intersections and unions are interpreted in Red similarly to the quantifications: ⟦T ∩ U⟧^Red_ρ := ⟦T⟧^Red_ρ ∩ ⟦U⟧^Red_ρ and ⟦T ∪ U⟧^Red_ρ := Red(⟦T⟧^Red_ρ ∪ ⟦U⟧^Red_ρ). As with (∃ E), if Red is closed under unions, then Red is adequate for the elimination of union:

(∪ E): from Γ ⊢ t : T1 ∪ T2, Γ, x : T1 ⊢ c : C, and Γ, x : T2 ⊢ c : C, infer Γ ⊢ c[t/x] : C
Like (∃ E), the rule (∪ E) does not satisfy subject reduction [1, 18].
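The β-step underlying the counterexample of Example 2.1 can be replayed mechanically. The sketch below is our own encoding, not the paper's; `step` implements one leftmost-outermost β-step and checks that u1 = z(Ixy)(Ixy) reduces to u2 = z(xy)(Ixy):

```python
# Replaying the reduction of Example 2.1 (our encoding): with I = λx.x,
# u1 = z (I x y) (I x y) β-reduces to u2 = z (x y) (I x y) by contracting
# the leftmost redex, which is the step that breaks subject reduction.
def subst(t, x, u):                       # naive substitution (no capture here)
    if t[0] == "var": return u if t[1] == x else t
    if t[0] == "app": return ("app", subst(t[1], x, u), subst(t[2], x, u))
    return t if t[1] == x else ("lam", t[1], subst(t[2], x, u))

def step(t):
    """One leftmost-outermost β-step, or None if t is β-normal."""
    if t[0] == "app":
        f, a = t[1], t[2]
        if f[0] == "lam": return subst(f[2], f[1], a)
        s = step(f)
        if s is not None: return ("app", s, a)
        s = step(a)
        if s is not None: return ("app", f, s)
    if t[0] == "lam":
        s = step(t[2])
        return None if s is None else ("lam", t[1], s)
    return None

V = lambda n: ("var", n)
A = lambda f, a: ("app", f, a)
I = ("lam", "x", V("x"))
Ixy = A(A(I, V("x")), V("y"))
u1 = A(A(V("z"), Ixy), Ixy)
u2 = A(A(V("z"), A(V("x"), V("y"))), Ixy)
assert step(u1) == u2         # u1 →β u2, exactly as in Example 2.1
```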
Problems with Strong Normalization. There are extensions of β-reduction with which the rule (∪ E) allows one to type non-strongly-normalizing terms. We consider simple rewrite systems R, that is, finite sets of rewrite rules of the form f(x1, …, xn) →R r, where f ∈ Σn, r ∈ Λ(Σ) and x1, …, xn are distinct variables. Algebraic symbols are typed using a rule inspired by [3]:

(Fun_R): from Γ ⊢ t̄ : T̄ and, for all rules f(x̄) →R r, Γ, x̄ : T̄ ⊢ r : U, infer Γ ⊢ f(t̄) : U

For instance, with the non-deterministic simple rewrite system x1 + x2 →+ xi (i ∈ {1, 2}), we have x1 : T1, x2 : T2 ⊢ x1 + x2 : T1 ∪ T2. By combining (Fun+) and (∪ E), we can type non-strongly-normalizing terms, while the type system with (Fun+) but without (∪ E) is strongly normalizing [14].

Example 2.2 ([14]). Let δ = λx.xx, t1 = λz.zyδ and t2 = λz.δ. There are types T1, T2, U, V such that y : V ⊢ ti : Ti and y : V, x : Ti ⊢ xx : U (note that titi ∈ SNβ+). Using (∪ E) and (Fun+) we get y : V ⊢ (t1 + t2)(t1 + t2) : U. But (t1 + t2)(t1 + t2) →∗β+ t1t2 →∗β+ δδ ∉ SNβ+.

Call-by-Value Eliminations. In this paper, we propose and study a modified version of (∃ E) and (∪ E) which uses a call-by-value (let x = t in c). We consider the extended set of terms Λ(Σ)let
::= x | λx.t | t u | f(t1, …, tn) | (let x = t in u),
and we replace (∃ E) and (∪ E) by the following rules:

(∪ Elet): from Γ ⊢ t : T1 ∪ T2, Γ, x : T1 ⊢ c : C, and Γ, x : T2 ⊢ c : C, infer Γ ⊢ (let x = t in c) : C
(∃ Elet): from Γ ⊢ t : ∃X.T and Γ, x : T ⊢ c : C with X ∉ Γ, C, infer Γ ⊢ (let x = t in c) : C
The let is reduced according to a notion of value V issued from Girard's reducibility candidates (to be defined in Sect. 3). Given a rewrite relation →R and a set of values V, we extend →R with the smallest rewrite relation →V on Λ(Σ)let s.t.

(let x = v in c) →V c[v/x]  if v ∈ V.

We show that the rules (∪ Elet) and (∃ Elet) lead to strongly normalizing systems regardless of closure under unions. We moreover show that this may allow us to recover subject reduction with implicit existential quantification.

Remark 2.3. The subtyping rule (T1 ⇒ U) ∩ (T2 ⇒ U) ≤ (T1 ∪ T2) ⇒ U causes strong normalization problems similar to those of (∪ E) [14]. Our solution does not handle it, because that would imply a call-by-value arrow, while we want to force call-by-value only locally in (∪ Elet).
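The divergence of Example 2.2 can likewise be replayed concretely by fixing one choice of branches for the non-deterministic sum. The encoding, the naive (capture-unsafe, but sufficient here) substitution, and the chosen reduction path are ours:

```python
# Replaying Example 2.2 (our encoding): δ = λx.x x, t1 = λz.z y δ, t2 = λz.δ.
# Choosing branch t1 for the function and t2 for the argument of
# (t1 + t2)(t1 + t2) leads to the loop δ δ.
def subst(t, x, u):                       # naive substitution (no capture here)
    if t[0] == "var": return u if t[1] == x else t
    if t[0] == "app": return ("app", subst(t[1], x, u), subst(t[2], x, u))
    return t if t[1] == x else ("lam", t[1], subst(t[2], x, u))

def beta(t):
    """One leftmost-outermost β-step, or None if t is β-normal."""
    if t[0] == "app":
        f, a = t[1], t[2]
        if f[0] == "lam": return subst(f[2], f[1], a)
        s = beta(f)
        if s is not None: return ("app", s, a)
        s = beta(a)
        if s is not None: return ("app", f, s)
    if t[0] == "lam":
        s = beta(t[2])
        return None if s is None else ("lam", t[1], s)
    return None

V = lambda n: ("var", n)
A = lambda f, a: ("app", f, a)
delta = ("lam", "x", A(V("x"), V("x")))
t1 = ("lam", "z", A(A(V("z"), V("y")), delta))
t2 = ("lam", "z", delta)

s = A(t1, t2)                 # after the two →+ choices in (t1 + t2)(t1 + t2)
s = beta(s)                   # → t2 y δ
s = beta(s)                   # → δ δ
assert s == A(delta, delta)
assert beta(s) == s           # δ δ β-reduces to itself: not strongly normalizing
```

This is precisely the path blocked by the call-by-value let: before any substitution, both occurrences of the sum must commit, so the two copies of x cannot end up holding different branches.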
3
Reducibility Candidates
Reducibility candidates, denoted by CRRE , form a reducibility family depending only on a rewrite relation →R and a set of elimination contexts E [15, 16]. They come with an inherent notion of values VRE , which are the terms that interact with elimination contexts (see Def. 3.4 below).
An Axiomatization. Our axiomatization of reducibility was first presented in [15]. We use the version of [16], where details and proofs can be found. Let [ ] ∈ X be a distinguished variable and →R be a rewrite relation on Λ(Σ).

Definition 3.1 (Evaluation Contexts). A set of evaluation contexts for →R is a set E of terms E[ ] containing exactly one occurrence of [ ], and which is closed under reduction: if E[ ] ∈ E and E[ ] →R t, then t = F[ ] ∈ E. If t ∈ Λ(Σ) and E[ ] ∈ E then we let E[t] := (E[ ])[t/[ ]].

Values and neutral terms are defined by an interaction property between terms and evaluation contexts.

Definition 3.2 (Interaction, Values and Neutral Terms). Let E be a set of evaluation contexts for →R.
– A term t interacts with E[ ] ∈ E for →R, written t ⊲R E[ ], if there is a term w such that E[t] →R w and w is not of the form E′[t′] with (E[ ], t) →R (E′[ ], t′).
– A value for →R in E is a term v s.t. v ⊲R E[ ] for some E[ ] ∈ E. We denote by VRE the set of values for →R in E. Given X ⊆ Λ(Σ), the set of values of X is VRE(X) := {v ∈ VRE | t →∗R v for some t ∈ X}. We write VRE(t) for VRE({t}).
– A term t is neutral for →R in E if it is not a value. We denote by NRE the set of neutral terms for →R in E.

Values are observable terms, since they interact with evaluation contexts. For instance, the values for β-reduction in the evaluation contexts E[ ] ∈ E⇒ ::= [ ] | E[ ] t are the λ-abstractions λx.t. Note that non-interaction is compositional:

Lemma 3.3. If t does not interact with E[ ] and E[t] does not interact with F[ ], then t does not interact with F[E[ ]].

Reducibility candidates are defined from a rewrite relation and a set of evaluation contexts satisfying some axioms. These axioms define elimination contexts. The modular extension of this framework with let, presented in Sect. 4, requires introducing new axioms of closure under substitution.

Definition 3.4 (Elimination Contexts). Let E be a set of evaluation contexts for →R.
Then E is a set of elimination contexts for →R if
– [ ] ∈ E,
– variables are neutral: X ⊆ NRE,
– NRE is closed under composition with E: if t ∈ NRE and E[ ] ∈ E, then E[t] ∈ NRE,
– NRE \ X is closed under substitution: if t ∈ NRE \ X, then tσ ∈ NRE,
– VRE is closed under substitution: if t ∈ VRE, then tσ ∈ VRE,
– E is closed under substitution: if E[ ] ∈ E and σ : (X \ {[ ]}) → Λ(Σ), then Eσ[ ] ∈ E.
For instance, the contexts E⇒ are elimination contexts for β-reduction. We assume given a set E of elimination contexts for →R.
Definition 3.5 (Reducibility Candidates). The set of reducibility candidates for →R in E, written CRRE, is the set of all C ⊆ SNR such that
(CR0) if t ∈ C and t →R u, then u ∈ C,
(CR1) if t ∈ NRE and ∀u. t →R u ⟹ u ∈ C, then t ∈ C.
There is a closure operator CRRE(·) : P(SNR) → P(SNR) such that CRRE(X) is the smallest reducibility candidate containing X if X ⊆ SNR. The closure of the empty set is the set of hereditarily neutral terms, i.e. the set of strongly normalizing neutral terms which never reduce to a value. Since variables are neutral terms in normal form (recall that →R ⊆ (Λ(Σ) \ X) × Λ(Σ)), we get X ⊆ C for every candidate C. It follows that reducibility candidates form a reducibility family. Note that the greatest element of CRRE is SNR.
The axioms defining elimination contexts allow us to show the following basic property of reducibility candidates. We use it to prove that elimination-based type interpretations (such as the function space ⇒) preserve CRRE.

Lemma 3.6 ([16]). Let t ∈ NRE, E[ ] ∈ E ∩ SNR and C ∈ CRRE. If E[u] ∈ C for all u ∈ (t)R, then E[t] ∈ C.

Proof. By induction on E[ ] ∈ SNR. Since E[t] is neutral, it is sufficient to show that (E[t])R ⊆ C. Let w ∈ (E[t])R. Since t is neutral, w = E′[t′] with (t, E[ ]) →R (t′, E′[ ]). If t →R t′ then w ∈ C by assumption. Otherwise E[ ] →R E′[ ], and for all u ∈ (t)R we have E[u] →R E′[u], hence E′[u] ∈ C by (CR0). Thus E′[t] ∈ C by induction hypothesis.
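On a finite abstract rewrite graph, the closure CRRE(X) can be computed directly by saturating under (CR0) and (CR1). The graph, the choice of neutral terms, and all names below are an arbitrary toy of ours, for illustration only:

```python
# A finite toy (graph and names ours) for Def. 3.5: compute the least set
# containing X that is closed under reduction (CR0) and contains every
# neutral term whose one-step reducts all already belong to it (CR1).
# The graph below is acyclic, so every term is strongly normalizing.
RED = {"a": {"b"}, "b": set(), "n": {"a"}, "m": {"n", "b"}}
NEUTRAL = {"n", "m"}                     # "a" and "b" play the role of values

def candidate(X):
    C = set(X)
    changed = True
    while changed:
        changed = False
        for t in list(C):                # (CR0): close under one-step reduction
            for u in RED[t]:
                if u not in C:
                    C.add(u); changed = True
        for t in NEUTRAL:                # (CR1): add saturated neutral terms
            if t not in C and RED[t] <= C:
                C.add(t); changed = True
    return C

assert candidate({"a"}) == {"a", "b", "n", "m"}
assert candidate(set()) == set()         # no hereditarily neutral terms here
```

The empty closure is empty here because every neutral term of this toy graph eventually reduces to a value, matching the description of CRRE(∅) above.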
The Values of Reducibility Candidates. An important property of reducibility candidates is that they are uniquely determined by their values:

Lemma 3.7 ([16]). Given X ⊆ SNR and t ∈ SNR, we have t ∈ CRRE(X) if and only if VRE(t) ⊆ VRE(X).

This paper builds on the following consequences of Lem. 3.7. Write A ⊆∗ B when A is a non-empty subset of B.

Corollary 3.8.
CRRE(A) ⊆ CRRE(B) ⇔ VRE(A) ⊆ VRE(B)    (A, B ⊆ SNR)
VRE(CRRE(⋃𝒜)) = ⋃_{A∈𝒜} VRE(A)    (𝒜 ⊆∗ CRRE)
CRRE(B ∪ (C1 ∩ C2)) = CRRE(B ∪ C1) ∩ CRRE(B ∪ C2)    (B, C1, C2 ∈ CRRE)
B ∩ CRRE(C1 ∪ C2) = CRRE((B ∩ C1) ∪ (B ∩ C2))    (B, C1, C2 ∈ CRRE)

The first equality says that CRRE is in some sense closed under unions w.r.t. values. It justifies the typing rules (∪ Elet) and (∃ Elet) (Sect. 4.2). The last two state the distributivity of the candidate lattice, which is used in Sect. 5.1. All these properties are independent of the closure under unions of CRRE.
Orthogonality. We now briefly discuss biorthogonality; see [16] for details on biorthogonality in our framework. Let →R be a rewrite relation on Λ(Σ) and E a set of elimination contexts for →R. Define ⊥ ⊆ Λ(Σ) × E by t ⊥ E[ ] if and only if E[t] ∈ SNR. Define the orthogonal of A ⊆ Λ(Σ) (resp. B ⊆ E) as

A⊥ := {E[ ] | ∀t. t ∈ A ⟹ t ⊥ E[ ]}
(resp. B⊥ := {t | ∀E[ ]. E[ ] ∈ B ⟹ t ⊥ E[ ]}).

The induced map (·)⊥⊥ on P(SNR) is a closure operator [16]. Write CR⊥⊥RE for {A⊥⊥ | A ⊆∗ SNR} = {B⊥ | B ⊆∗ E ∩ SNR}. It is a reducibility family thanks to:

Lemma 3.9 ([16]). CR⊥⊥RE ⊆ CRRE.
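That (·)⊥⊥ is a closure operator holds for any relation ⊥ between terms and contexts, not just the strong-normalization one. A finite toy illustrates this; the relation below is arbitrary and chosen by us purely for the demonstration:

```python
# A finite toy model (the relation PERP is arbitrary, ours) showing that
# A -> A⊥⊥, induced by any relation ⊥ between terms and contexts, is a
# closure operator: extensive, monotone and idempotent.
TERMS = {"t1", "t2", "t3"}
CTXTS = {"E1", "E2"}
PERP = {("t1", "E1"), ("t2", "E1"), ("t2", "E2"), ("t3", "E2")}

def orth_t(A):                           # A ⊆ TERMS  ->  A⊥ ⊆ CTXTS
    return {E for E in CTXTS if all((t, E) in PERP for t in A)}

def orth_c(B):                           # B ⊆ CTXTS  ->  B⊥ ⊆ TERMS
    return {t for t in TERMS if all((t, E) in PERP for E in B)}

def biorth(A):
    return orth_c(orth_t(A))

A = {"t1"}
assert A <= biorth(A)                            # extensive
assert biorth(biorth(A)) == biorth(A)            # idempotent
assert biorth({"t1"}) <= biorth({"t1", "t2"})    # monotone on this chain
assert biorth(A) == {"t1", "t2"}                 # here t2 is absorbed into A⊥⊥
```

The last assertion shows why biorthogonal closure can add terms: t2 behaves like t1 against every context orthogonal to t1, so the closure cannot separate them.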
4
A Call-by-Value Extension of Reducibility
We now insert the let presented in Sect. 2 into the reducibility candidates issued from a rewrite relation →R on Λ(Σ) and a set E of elimination contexts for →R. We first extend the set of terms from Λ(Σ) to Λ(Σ)let (defined in Sect. 2). We then extend →R to the smallest rewrite relation on Λ(Σ)let containing the original relation →R (which was defined on Λ(Σ)). Now, let →V be the smallest rewrite relation on Λ(Σ)let such that (let x = t in u) →V u[t/x] if t ∈ VRE. The delicate operation is to extend the set of elimination contexts. We need to extend E both with contexts of the form (let x = E[ ] in c) and with contexts having the same shape as those in E, but built on Λ(Σ)let rather than on Λ(Σ). This second operation is easy to express in the usual cases where E is defined by a grammar (see Sect. 5.1 and Sect. 5.2). However, performing this operation on an arbitrary set E while preserving the axioms of Def. 3.4 leads to some technicalities. We choose to close E by Λ(Σ)let-substitution, but this causes difficulties w.r.t. stability by reduction. A solution is to close by substitution only the contexts which are linear in X, but these need not be stable by reduction. We therefore work with the set ERlin, defined as the set of R-normal linear E[ ] ∈ E.

Definition 4.1. Let Elet be the smallest set such that
– E[ ] ∈ ERlin ∧ σ : (X \ {[ ]}) → Λ(Σ)let ⟹ E[ ]σ ∈ Elet,
– E[ ], F[ ] ∈ Elet ∧ t ∈ Λ(Σ)let ⟹ E[(let x = F[ ] in t)] ∈ Elet.

We now show that Elet is a set of elimination contexts for →RV. This allows us to define reducibility candidates and biorthogonals. We then show that this turns the closure of a union into an elimination-based interpretation.

4.1
Values, Neutral Terms and Elimination Contexts
In this section, we give the main steps of the proof that Elet is a set of elimination contexts for →RV . We first give a characterization of the values and neutral terms for →RV in Elet in terms of those for →R in E. Note that the let ensures that values for →R in E are values for →RV in Elet . The characterization of neutral terms relies on the fact that values and non-variable neutral terms are not unifiable in Λ(Σ)let , which follows from the substitution axioms of Def. 3.4.
Proposition 4.2 (Values and Neutral Terms).
V_{RV,Elet} = {vσ | v ∈ Λ(Σ) ∩ VRE ∧ σ : X → Λ(Σ)let},
N_{RV,Elet} = X ∪ {(let x = t in c) | t, c ∈ Λ(Σ)let} ∪ {nσ | n ∈ (Λ(Σ) ∩ NRE) \ X ∧ σ : X → Λ(Σ)let}.
In order to show that Elet is a set of elimination contexts for →RV, we have to check the axioms of Defs. 3.1 and 3.4. Concerning Def. 3.1, linearity in [ ] follows from an easy induction on Elet, using linearity in [ ] of E in the base case. Stability by reduction is a consequence of the condition E[ ] ∈ ERlin in the first clause of the definition of Elet. Concerning Def. 3.4, stability by substitution for values and non-variable neutral terms follows directly from Prop. 4.2. Stability by substitution for contexts follows from an easy induction on Elet. Moreover, variables are neutral by Prop. 4.2, and the preservation of neutral terms by composition with elimination contexts is a consequence of Prop. 4.2, shown by induction on Elet. Hence we have:

Theorem 4.3. Elet is a set of elimination contexts for →RV.

We thus obtain reducibility candidates CR_{RV,Elet} and biorthogonals CR⊥⊥_{RV,Elet} directly from a rewrite relation →R on Λ(Σ) and a set of elimination contexts E.
4.2
An Elimination-Based Interpretation of Unions
In this section, we show that for reducibility candidates and biorthogonals, the let turns Red(⋃𝒜) into the elimination-based interpretation ⋁Red 𝒜, defined as

{t | ∀C ∈ Red, ∀c s.t. (u ∈ ⋃𝒜 ⟹ c[u/x] ∈ C), (let x = t in c) ∈ C}.

Hence we get elimination-based interpretations of union and implicit existential types. This ensures the adequacy of reducibility candidates and biorthogonals w.r.t. (∃ Elet) and (∪ Elet), but does not change the definition of the type interpretation using the closure of unions (see Sect. 5).
Write CR for CR_{RV,Elet}, CR⊥⊥ for CR⊥⊥_{RV,Elet}, N for N_{RV,Elet}, V for V_{RV,Elet} and SN for SNRV. We begin by showing that ⋁CR 𝒜 (resp. ⋁CR⊥⊥ 𝒜) is a reducibility candidate (resp. a biorthogonal).

Lemma 4.4. If 𝒜 ⊆∗ CR then ⋁CR 𝒜 ∈ CR.

Proof. Write A for ⋁CR 𝒜. We first show that A ⊆ SN. Let t ∈ A and take C := CR(⋃𝒜). Since ⋃𝒜 ⊆ CR(⋃𝒜), we get (let x = t in x) ∈ C ⊆ SN, hence t ∈ SN. Stability by reduction (CR0) is immediate. For the clause (CR1), take t ∈ N such that (t)RV ⊆ A. Let C ∈ CR and c be such that u ∈ ⋃𝒜 implies c[u/x] ∈ C. Since (let x = [ ] in c) ∈ Elet ∩ SN and (let x = t′ in c) ∈ C for all t′ ∈ (t)RV, we have (let x = t in c) ∈ C by Lem. 3.6.

Lemma 4.5. If 𝒜 ⊆∗ CR⊥⊥ then ⋁CR⊥⊥ 𝒜 ∈ CR⊥⊥.
Proof. ⋁CR⊥⊥ 𝒜 is the orthogonal of the following non-empty subset of SN:

{E[(let x = [ ] in c)] | C ∈ CR⊥⊥ ∧ E[ ] ∈ C⊥ ∧ (u ∈ ⋃𝒜 ⟹ c[u/x] ∈ C)}.

We now show that ⋁Red is the closure of the union when Red ∈ {CR, CR⊥⊥}.

Theorem 4.6. If Red ∈ {CR, CR⊥⊥} and 𝒜 ⊆∗ Red, then ⋁Red 𝒜 = Red(⋃𝒜).

Proof. (⊇) Since Red is defined by a closure operator, by Lem. 4.4 and 4.5 it is sufficient to show that ⋃𝒜 ⊆ ⋁Red 𝒜. Let t ∈ ⋃𝒜, C ∈ Red and c be s.t. u ∈ ⋃𝒜 implies c[u/x] ∈ C. Note that t, c ∈ SN since 𝒜 is not empty. By Lem. 3.9, we have C ∈ CR. Since (let x = t in c) is neutral, it is sufficient to show that ((let x = t in c))RV ⊆ C. We reason by induction on pairs (t, c) ordered by the product extension of →RV. Let w be such that (let x = t in c) →RV w. There are two cases.
– w = c[t/x] and t ∈ V. Since t ∈ ⋃𝒜, we get c[t/x] ∈ C by assumption on C.
– w = (let x = t′ in c′) with (t, c) →RV (t′, c′). By (CR0), we have t′ ∈ ⋃𝒜 and u ∈ ⋃𝒜 ⟹ c′[u/x] ∈ C. We get w ∈ C by induction hypothesis.
(⊆) We have ⋁Red 𝒜 ∈ Red by Lem. 4.4 and Lem. 4.5. Since CR⊥⊥ ⊆ CR, we have ⋁Red 𝒜, Red(⋃𝒜) ∈ CR. Therefore, by Cor. 3.8 it is sufficient to show that V(⋁Red 𝒜) ⊆ V(Red(⋃𝒜)). Let v ∈ V(⋁Red 𝒜) and take C := Red(⋃𝒜). Since ⋃𝒜 ⊆ C, we obtain that (let x = v in x) ∈ C. But (let x = v in x) →V v since v ∈ V. Hence v ∈ C = Red(⋃𝒜) by (CR0).
5
Applications
We now discuss three applications. Our approach is to start from a given calculus made of a type system and a rewrite relation →R. We provide a set of elimination contexts E. This gives reducibility candidates (and biorthogonals) as in Sect. 3. We then apply the method of Sect. 4. Terms are extended with let, the rewrite relation →R is extended with →V, and we obtain elimination contexts Elet by Def. 4.1. Using Thm. 4.3, we obtain reducibility candidates CR (and biorthogonals CR⊥⊥) for →RV in Elet. We then extend the type system with (∪Elet) or (∃Elet). Adequacy w.r.t. these rules is ensured by Thm. 4.6. The first application concerns union types, the second existential types, and the third existential quantification in realizability models based on reducibility candidates and biorthogonals.

5.1 Union Types
We apply the framework presented in Sect. 4 to a calculus with union types, intersection types and (possibly non-deterministic) simple rewrite rules. We focus on reducibility candidates and do not consider biorthogonals since their lattice is not distributive. Some proofs are postponed until Sect. 5.2.
274
C. Riba
We consider simple types with unions and intersections T. As in [5], they are equipped with a preorder ≼ so that (T, ≼) is a distributive lattice satisfying the additional axioms

(T ⇒ U1) ∩ (T ⇒ U2) ≼ T ⇒ (U1 ∩ U2)    and    U1 ≼ U2 and T2 ≼ T1 imply U2 ⇒ T2 ≼ U1 ⇒ T1.

Given a simple rewrite system R, we add to λ the rule (FunR) and the rules

(∩I) from Γ ⊢ t : T and Γ ⊢ t : U infer Γ ⊢ t : T ∩ U
(Sub) from Γ ⊢ t : T and T ≼ U infer Γ ⊢ t : U

We consider reducibility candidates for →βR in elimination contexts E⇒. By Thm. 4.3, we thus obtain reducibility candidates CR for →βRV in the elimination contexts E⇒let. Note that (let x = t in c) →V c[t/x] if and only if t = λy.u. Therefore, continuing Ex. 2.2, we have y : V ⊢ (let x = t1 + t2 in xx) : U, but (let x = t1 + t2 in xx) does not reduce to t1 t2: since t1 + t2 is not a value, we must choose between t1 and t2 before performing the substitution. Note that E⇒let is the following set of contexts (see Lem. 5.2):

E[ ] ::= [ ] | E[ ] t | (let x = E[ ] in t) .
We now add the rule (∪Elet) to the system. Types are interpreted as in Sect. 2. Write ⟦T⟧ρ for ⟦T⟧^CR_ρ. The correctness of the subtyping relation is standard [14], except that distributivity is ensured by Cor. 3.8. We get ⟦·⟧ρ : T → CR since ⇒ : CR² → CR (Lem. 5.3).

Theorem 5.1. If Γ ⊢ t : T and (ρ, σ) |= Γ then tσ ∈ ⟦T⟧ρ.

Proof. By induction on Γ ⊢ t : T. The case of (∪Elet) follows from Thm. 4.6, which entails that ⟦T ∪ U⟧ρ = ⟦T⟧ρ ∨CR ⟦U⟧ρ (see the proof of Thm. 5.5). Adequacy w.r.t. the other rules is standard [14].
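The value restriction on let can be made concrete. The following Python sketch (our own tagged-tuple representation, not the paper's formal system) implements one step of →V for terms with a non-deterministic choice t1 + t2: a let substitutes only when its bound term is a λ-abstraction; otherwise the choice must be resolved first, exactly as in the example above.

```python
# Illustrative sketch: terms are tagged tuples
#   ("var", x) | ("lam", x, body) | ("app", t, u)
#   | ("choice", t, u) | ("let", x, t, body)

def is_value(t):
    # in this calculus the values are exactly the lambda-abstractions
    return t[0] == "lam"

def subst(x, s, t):
    """Naive, capture-ignoring substitution t[s/x] (enough for the demo)."""
    tag = t[0]
    if tag == "var":
        return s if t[1] == x else t
    if tag == "lam":
        return t if t[1] == x else ("lam", t[1], subst(x, s, t[2]))
    if tag in ("app", "choice"):
        return (tag, subst(x, s, t[1]), subst(x, s, t[2]))
    if tag == "let":
        if t[1] == x:
            return ("let", t[1], subst(x, s, t[2]), t[3])
        return ("let", t[1], subst(x, s, t[2]), subst(x, s, t[3]))
    raise ValueError(tag)

def step_let(t):
    """One-step ->_V reducts of (let x = u in c): substitute only when u
    is a value; otherwise reduce inside u (here: resolve a choice)."""
    if t[0] != "let":
        return []
    _, x, u, c = t
    if is_value(u):
        return [subst(x, u, c)]
    if u[0] == "choice":
        return [("let", x, u[1], c), ("let", x, u[2], c)]
    return []
```

For instance, (let x = t1 + t2 in xx) does not substitute t1 + t2; it first steps to (let x = t1 in xx) or (let x = t2 in xx), committing to one branch before the substitution is performed.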
5.2 Implicit Existential Types
We apply the framework of Sect. 4 to a calculus with existential, product and sum types. We show that it enjoys strong normalization and subject reduction.

The System. We let T∀∃×+ be the extension of T∀∃ with the binary product T × U and the binary sum T + U. Terms are built on the signature

Σ =def {⟨·, ·⟩, π1, π2, inj1, inj2, case(·, ·, ·)} .
We extend β-reduction with πi⟨t1, t2⟩ →β ti and case(inji t, u1, u2) →β ui t. The type system is that of λ∀ enriched with (∃I) and the following rules:

(×I) from Γ ⊢ t1 : T1 and Γ ⊢ t2 : T2 infer Γ ⊢ ⟨t1, t2⟩ : T1 × T2
(×E) from Γ ⊢ t : T1 × T2 infer Γ ⊢ πi t : Ti (i ∈ {1, 2})
(+I) from Γ ⊢ t : Ti (i ∈ {1, 2}) infer Γ ⊢ inji t : T1 + T2
(+E) from Γ ⊢ t : T1 + T2, Γ ⊢ u1 : T1 ⇒ U and Γ ⊢ u2 : T2 ⇒ U infer Γ ⊢ case(t, u1, u2) : U
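As a quick illustration (a sketch with our own tagged-tuple term representation), the two new β-rules can be implemented as a root-step function:

```python
# Root-step for the two new beta-rules:
#   pi_i <t1, t2>          ->beta  t_i
#   case(inj_i t, u1, u2)  ->beta  u_i t

def step_beta(t):
    """Return the one-step reduct at the root, or None if no rule applies."""
    tag = t[0]
    if tag == "proj" and t[2][0] == "pair":       # pi_i <t1, t2> -> t_i
        i, (_, t1, t2) = t[1], t[2]
        return t1 if i == 1 else t2
    if tag == "case" and t[1][0] == "inj":        # case(inj_i t, u1, u2) -> u_i t
        _, (_, i, s), u1, u2 = t
        return ("app", u1 if i == 1 else u2, s)
    return None
```

Note that the case branch builds an application u_i t rather than substituting, matching the formulation of the rule above.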
On the Values of Reducibility Candidates
275
Reducibility. We consider the elimination contexts E[ ] ∈ E:

E[ ] ::= [ ] | E[ ] t | πi E[ ] | case(E[ ], t, u) .
The values are the terms of the form λx.t, ⟨t, u⟩ or inji t. We consider reducibility candidates for →β in E. By Thm. 4.3, we thus obtain reducibility candidates CR and biorthogonals CR⊥⊥ for →βV in the elimination contexts Elet.

Lemma 5.2. Elet is the set of contexts

E[ ] ::= [ ] | E[ ] t | πi E[ ] | case(E[ ], t, u) | (let x = E[ ] in t) .

We add the rule (∃Elet) to the system. In the remainder of this section, we assume that Red ∈ {CR, CR⊥⊥}. We extend the interpretation of Sect. 2 as follows:

⟦T × U⟧^Red_ρ =def {t | π1 t ∈ ⟦T⟧^Red_ρ ∧ π2 t ∈ ⟦U⟧^Red_ρ}
⟦T + U⟧^Red_ρ =def {t | ∀C ∈ Red, ∀s ∈ ⟦T⟧^Red_ρ ⇒ C, ∀u ∈ ⟦U⟧^Red_ρ ⇒ C, case(t, s, u) ∈ C}

The correctness of the definition relies on the fact that the shape of elimination contexts has not been destroyed by their extension with let (Lem. 5.2).

Lemma 5.3. If ρ : XT → Red and T ∈ T∀∃×+ then ⟦T⟧^Red_ρ ∈ Red.

Proof. In both cases for Red, we reason by induction on T. The result is trivial if T is a variable. The cases of ∀X.T and ∃X.T follow from the fact that Red is defined by a closure operator. We detail the case of A ⇒ B. Write ⟦T⟧ for ⟦T⟧^Red_ρ.
– Red = CR. We have ⟦A ⇒ B⟧ ⊆ SNβV since ⟦A⟧ is not empty and ⟦B⟧ ⊆ SNβV. Stability by reduction follows from (CR0) on ⟦B⟧. We get (CR1) by Lem. 3.6, using that [ ]t ∈ Elet for all t ∈ Λ(Σ)let thanks to Lem. 5.2.
– Red = CR⊥⊥. ⟦A ⇒ B⟧ ∈ CR⊥⊥ since it is the orthogonal of {E[[ ] t] | t ∈ ⟦A⟧ ∧ E[ ] ∈ ⟦B⟧⊥},
which is a non-empty subset of SNβV.

Since type constructors are interpreted by eliminations, Lem. 5.3 implies adequacy w.r.t. the elimination rules (⇒E), (×E) and (+E). Adequacy w.r.t. the rules (⇒I), (×I) and (+I) follows from the following saturation lemma (recall that SNβV ∈ CR and [ ] ∈ Elet).

Lemma 5.4. For all u, s ∈ SNβV, all E[ ] ∈ Elet and all C ∈ CR,

E[(λx.t)u] ∈ C           if E[t[u/x]] ∈ C
E[π1⟨t, u⟩] ∈ C          if E[t] ∈ C
E[π2⟨u, t⟩] ∈ C          if E[t] ∈ C
E[case(inj1 t, u, s)] ∈ C  if E[ut] ∈ C
E[case(inj2 t, s, u)] ∈ C  if E[ut] ∈ C
Proof. Note that t, E[ ] ∈ SNβV since C ⊆ SNβV. We detail the case of E[(λx.t)u]. Since (λx.t)u is neutral, by Lem. 3.6 it is sufficient to show that E[w] ∈ C whenever (λx.t)u →β w. We reason by induction on (t, u) ordered by the product extension of →β. Let w be such that (λx.t)u →β w. If w = t[u/x] then we are done by assumption. Otherwise, w = (λx.t′)u′ with (t, u) →β (t′, u′), and we conclude by the induction hypothesis, using that C is closed under reduction.
Using Thm. 4.6 for the existential quantification, adequacy follows as usual.

Theorem 5.5. If Γ ⊢ t : T and (ρ, σ) |=Red Γ then tσ ∈ ⟦T⟧^Red_ρ.

Proof. By induction on Γ ⊢ t : T. We detail the case of (∃Elet). Let (ρ, σ) |=Red Γ. By induction hypothesis we have tσ ∈ ⟦∃X.T⟧^Red_ρ. Since X ∉ Γ, U, if C ∈ Red and u ∈ ⟦T⟧^Red_ρ[C/X] then (ρ[C/X], σ[u/x]) |=Red Γ, x : T. By induction hypothesis again, we deduce that cσ[u/x] ∈ ⟦U⟧^Red_ρ for all u ∈ ⋃_{C∈Red} ⟦T⟧^Red_ρ[C/X]. We obtain (let x = tσ in cσ) ∈ ⟦U⟧^Red_ρ since ⟦∃X.T⟧^Red_ρ = ⋁Red {⟦T⟧^Red_ρ[C/X] | C ∈ Red} by Thm. 4.6.
Subject Reduction. We sketch the proof that let allows us to recover subject reduction for implicit existential types. The key point is that the reduction of a (let x = t in c) obtained from (∃Elet) is performed only when t is a value, in which case we know that its type has been obtained by a (∃I) rule (Prop. 5.6). This rules out the counterexample of [18] (Ex. 2.1): we can type (let w = Ixy in zww), but this term does not reduce to u1 since Ixy is not a value.

Proposition 5.6 (Determinacy of Typing). Let v ∈ VβVElet. If Γ ⊢ v : ∃X.T then there is U such that Γ ⊢ v : T[U/X].

The remainder of the proof directly follows that of [9] for λ∀. It relies on the usual substitution and inversion properties.

Substitution. (i) If Γ, x : U ⊢ t : T and Γ ⊢ u : U then Γ ⊢ t[u/x] : T. (ii) If Γ ⊢ t : T then Γ[U/X] ⊢ t : T[U/X].
Inversion. (i) If Γ ⊢ λx.t : U ⇒ T then Γ, x : U ⊢ t : T. (ii) If Γ ⊢ ⟨t1, t2⟩ : A1 × A2 then Γ ⊢ ti : Ai for all i ∈ {1, 2}. (iii) If Γ ⊢ inji t : A1 + A2 then Γ ⊢ t : Ai.

Theorem 5.7 (Subject Reduction). If Γ ⊢ t : T and t →βV u then Γ ⊢ u : T.

5.3 Realizability Semantics of Existential Quantification
We now discuss the realizability interpretation of existential quantification in the presence of let. Realizability with implicit existential types is used e.g. in [12]. Given a reducibility family Red, write t ⊩Red A (read "t realizes A") if t is a closed term such that t ∈ ⟦A⟧^Red_ρ for all ρ : XT → Red. In realizability models based on λ∀, the existential quantification is usually encoded as ∃̃X.T =def ∀Y.(∀X.T ⇒ Y) ⇒ Y where Y ∉ T. Using Thm. 4.6, we define terms t, u such that, for reducibility candidates and biorthogonals, we have t ⊩ (∃X.T) ⇒ (∃̃X.T) and u ⊩ (∃̃X.T) ⇒ (∃X.T) (Thm. 5.9). This means that in the corresponding realizability models, the let makes the two existential quantifications equivalent w.r.t. provability. We assume given a rewrite relation →R on Λ(Σ), which contains →β and which is compatible with →β in the following sense: if (λx.t)u →R w and w is not of the form (λx.t′)u′ with (t, u) →R (t′, u′), then w = t[u/x]. We also
assume given a set E of elimination contexts for →R such that [ ] t ∈ E for all t ∈ Λ(Σ) and such that (λx.t)u is neutral for all t, u ∈ Λ(Σ). Note that this includes the reduction systems presented in Sect. 5.1 and Sect. 5.2. We obtain λ-terms with let and extend →R with →V. Elimination contexts are extended to Elet. We thus obtain reducibility families CR_RVElet and CR⊥⊥_RVElet, which we write CR and CR⊥⊥ respectively. Note that compatibility with →β is preserved: if (λx.t)u →RV w then either w = t[u/x] or w = (λx.t′)u′ with (t, u) →RV (t′, u′). This ensures that if t[u/x] ∈ ⟦T⟧^Red_ρ for all u ∈ ⟦U⟧^Red_ρ then λx.t ∈ ⟦U ⇒ T⟧^Red_ρ. Moreover, we still have [ ] t ∈ Elet for all t ∈ Λ(Σ)let.

The formulas we consider are the types T∀∃. They are interpreted as in Sect. 5.2, using Thm. 4.6. In particular, ⟦∃X.T⟧^Red_ρ = ⋁Red {⟦T⟧^Red_ρ[C/X] | C ∈ Red}.

Definition 5.8. Given a type T, let 2(T) =def ∀X.(T ⇒ X) ⇒ X, where X ∉ T.

Note that the boxed type 2(T) is not the double-negation of T. Boxed types are useful because of the following intermediate properties: given Red ∈ {CR, CR⊥⊥},

λx.x                            ⊩Red (∃̃X.T) ⇒ 2(∃X.T)
λx.λy.yx                        ⊩Red T ⇒ 2(T)
λx.x(λy.y)                      ⊩Red 2(T) ⇒ T
λt.λx.t(λy.(let z = y in xz))   ⊩Red 2(∃X.T) ⇒ (∃̃X.T)

Theorem 5.9. Let Red ∈ {CR, CR⊥⊥}.
(i) λy.λx.(let z = y in xz) ⊩Red (∃X.T) ⇒ (∃̃X.T)
(ii) λx.x(λy.y) ⊩Red (∃̃X.T) ⇒ (∃X.T)
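On the untyped side, the computational behavior of the two boxed-type realizers λx.λy.yx and λx.x(λy.y) can be checked directly; the following Python sketch uses plain functions for λ-terms (the typing discipline, of course, is not represented).

```python
# The realizers for T => 2(T) and 2(T) => T from the table above, read as
# ordinary untyped functions.  2(T) = forall X.(T -> X) -> X, so a "boxed"
# value is a function waiting for an elimination continuation.
to_box = lambda x: lambda y: y(x)     # lam x. lam y. y x
from_box = lambda b: b(lambda y: y)   # lam x. x (lam y. y)
```

Composing the two realizers is the identity: from_box(to_box(v)) evaluates back to v, which is the computational content of boxing and unboxing cancelling out.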
6 Conclusion
We proposed a let-elimination of union types and implicit existential quantifications. This provides a way to obtain strongly normalizing systems and, for the existential quantification, to get subject reduction as well. We have also shown that the obtained existential quantification coincides with its usual encoding w.r.t. provability in realizability models built from reducibility candidates.

Further Work. There are different ways in which this work can be extended. First, to study subject reduction of union types with let. Second, to extend the reduction of let with the usual permutative conversions. Third, to study the obtained existential quantification in Krivine's realizability [10]. Another direction is to explore links with classical logic: the obtained let seems to correspond to a form of μ̃ in the sequent-based λ-calculus of [4]. The elimination rules with let would be seen as the translation on terms of implicit right-introduction rules on contexts.

Acknowledgments. We thank Alexandre Miquel for suggesting the study of realizability models and the use of boxed types.
References

[1] Barbanera, F., Dezani-Ciancaglini, M., de' Liguoro, U.: Intersection and Union Types: Syntax and Semantics. Information and Computation 119, 202–230 (1995)
[2] Blanqui, F., Riba, C.: Combining Typing and Size Constraints for Checking the Termination of Higher-Order Conditional Rewrite Systems. In: Hermann, M., Voronkov, A. (eds.) LPAR 2006. LNCS (LNAI), vol. 4246, pp. 105–119. Springer, Heidelberg (2006)
[3] Coquand, T., Spiwack, A.: A Proof of Strong Normalisation using Domain Theory. In: Proceedings of LiCS 2006, pp. 307–316. IEEE Computer Society, Los Alamitos (2006)
[4] Curien, P.-L., Herbelin, H.: The Duality of Computation. In: Proceedings of ICFP 2000, pp. 233–243. ACM Press, New York (2000)
[5] Dezani-Ciancaglini, M., de' Liguoro, U., Piperno, A.: A Filter Model for Concurrent Lambda-Calculus. SIAM Journal on Computing 27(5), 1376–1419 (1998)
[6] Dunfield, J., Pfenning, F.: Tridirectional Typechecking. In: Proceedings of POPL 2004, pp. 281–292. ACM Press, New York (2004)
[7] Girard, J.-Y., Lafont, Y., Taylor, P.: Proofs and Types. Cambridge Tracts in Theoretical Computer Science. Cambridge University Press, Cambridge (1989)
[8] Ishihara, H., Kurata, T.: Completeness of Intersection and Union Type Assignment Systems for Call-by-Value λ-Models. Theoretical Computer Science 272, 197–221 (2002)
[9] Krivine, J.-L.: Lambda-Calculus, Types and Models. Ellis Horwood (1993)
[10] Krivine, J.-L.: Realizability in Classical Logic. To appear in Panoramas et synthèses, Société Mathématique de France, available on HAL (2004)
[11] Lindley, S., Stark, I.: Reducibility and ⊤⊤-Lifting for Computation Types. In: Urzyczyn, P. (ed.) TLCA 2005. LNCS, vol. 3461, pp. 262–277. Springer, Heidelberg (2005)
[12] Oliva, P., Streicher, T.: On Krivine's Realizability Interpretation of Classical Second-Order Arithmetic. Fundamenta Informaticae 84(2), 207–220 (2008)
[13] Parigot, M.: Proofs of Strong Normalization for Second Order Classical Natural Deduction. Journal of Symbolic Logic 62(4), 1461–1479 (1997)
[14] Riba, C.: Strong Normalization as Safe Interaction. In: Proceedings of LiCS 2007, pp. 13–22. IEEE Computer Society, Los Alamitos (2007)
[15] Riba, C.: Stability by Union of Reducibility Candidates for Orthogonal Constructor Rewriting. In: Beckmann, A., Dimitracopoulos, C., Löwe, B. (eds.) CiE 2008. LNCS, vol. 5028, pp. 498–510. Springer, Heidelberg (2008)
[16] Riba, C.: Toward a General Rewriting-Based Framework for Reducibility (submitted, 2008) (available from the author's homepage)
[17] Tait, W.W.: A Realizability Interpretation of the Theory of Species. In: Parikh, R. (ed.) Logic Colloquium. LNCS, vol. 453. Springer, Heidelberg (1975)
[18] Tatsuta, M.: Simple Saturated Sets for Disjunction and Second-Order Existential Quantification. In: Della Rocca, S.R. (ed.) TLCA 2007. LNCS, vol. 4583, pp. 366–380. Springer, Heidelberg (2007)
[19] Vouillon, J., Melliès, P.-A.: Semantic Types: A Fresh Look at the Ideal Model for Types. In: Proceedings of POPL 2004. ACM Press, New York (2004)
Lexicographic Path Induction

Jeffrey Sarnat¹ and Carsten Schürmann²

¹ Yale University
[email protected]
² IT University of Copenhagen
[email protected]
Abstract. Programming languages theory is full of problems that reduce to proving the consistency of a logic, such as the normalization of typed lambda-calculi, the decidability of equality in type theory, equivalence testing of traces in security, etc. Although the principle of transfinite induction is routinely employed by logicians in proving such theorems, it is rarely used by programming languages researchers, who often prefer alternatives such as proofs by logical relations and model-theoretic constructions. In this paper we harness the well-foundedness of the lexicographic path ordering to derive an induction principle that combines the comfort of structural induction with the expressive strength of transfinite induction. Using lexicographic path induction, we give a consistency proof of Martin-Löf's intuitionistic theory of inductive definitions. The consistency of Heyting arithmetic follows directly, and weak normalization for Gödel's T follows indirectly; both have been formalized in a prototypical extension of Twelf.
1 Introduction
Programming languages theory is full of problems that reduce to proving the consistency of a logic, such as the normalization of typed λ-calculi, the decidability of equality in type theory, etc. Although the principle of transfinite induction is routinely employed by logicians in proving such theorems, it is rarely used by programming languages researchers, who often prefer alternatives such as proofs by logical relations and model theoretic constructions. This phenomenon can be explained at least in part by the fact that ordinals can be notoriously tricky to work with. The Burali-Forti paradox illustrates that any ordinal notation system is necessarily incomplete, and in practice, as ordinals get bigger, the notation systems needed to describe them become more complex. In contrast, the lexicographic path ordering (LPO) is both powerful (its order type approaches the small Veblen ordinal) and well understood by computer scientists. Furthermore, it is easy to implement and has been used to prove the termination of term rewriting systems (TRSs) for decades, where, ironically, its considerable proof-theoretic strength cannot be fully harnessed.
This work was in part supported by grant CCR-0325808 of the National Science Foundation and NABITT grant 2106-07-0019 of the Danish Strategic Research Council.
P.-L. Curien (Ed.): TLCA 2009, LNCS 5608, pp. 279–293, 2009. © Springer-Verlag Berlin Heidelberg 2009
280
J. Sarnat and C. Schürmann
For many logical systems, consistency follows directly from the termination of a cut-elimination procedure. Since the LPO is more than strong enough to prove the consistency of arithmetic, and cut-elimination procedures can be expressed as term rewriting systems, one might hope to demonstrate the consistency of arithmetic by using the lexicographic path ordering to show the termination of such a TRS. However, this is impossible. Buchholz [Buc95] has shown that if one proves the termination of a TRS using the LPO, then the proof can be modified such that it is valid in a fragment of arithmetic. By Gödel's second incompleteness theorem, one cannot prove the consistency of arithmetic from within a fragment of arithmetic; therefore a cut-elimination procedure for arithmetic, formulated as a TRS, cannot be shown terminating via the LPO. In this paper, we show how to harness the strength of the LPO as an induction principle that we call lexicographic path induction, which combines the comfort of structural induction with the expressive strength of transfinite induction. The consistency of arithmetic and weak normalization of Gödel's T are well-known applications of transfinite induction. We give a novel consistency proof by lexicographic path induction of an intuitionistic theory of inductive definitions, based on the system by Martin-Löf [ML71] that inspired the definition of inductive types in type theory [Dyb91, PM93], and several sequent calculi in the programming languages literature [MM00, MT03, Bro06, GMN08]. The consistency of Heyting Arithmetic follows as a simple corollary, and the weak normalization of Gödel's T follows via a structural logical relation [SS08]. Both have been formalized in a prototypical extension of Twelf (http://www.twelf.org/lpo/), providing empirical evidence for the usefulness of lexicographic path induction. The paper is organized as follows.
In Section 2 we introduce the lexicographic path ordering and the principle of lexicographic path induction. In Section 3 we introduce a sequent calculus for intuitionistic logic with inductive definitions. In Section 4, we prove the consistency of this logic. In Section 5, we conclude and describe related and future work.
2 The Lexicographic Path Ordering
The lexicographic path ordering (LPO) provides a modular way of specifying orderings on finite labeled trees whose constructors have fixed arity. Given a signature Σ of fixed arity constructors, whose elements we denote generically by the letters f and g, labeled trees are defined as follows

Labeled Trees s, t ::= f(s1, . . . , sn)

where the arity of f, denoted #f, is n for n ≥ 0. We use Σn to denote the constructors of Σ of arity n. Although signatures can in principle be infinite, all of the signatures considered in this paper are finite.

Definition 1 (Lexicographic Path Ordering). Given a precedence relation < on Σ we define
f(s1, . . . , sm) <lpo g(t1, . . . , tn) to hold iff one of the following conditions is satisfied:

1. f < g and si <lpo g(t1, . . . , tn) for all i ∈ 1, . . . , m, or
2. f = g, si <lpo g(t1, . . . , tn) for all i ∈ 1, . . . , m, and (s1, . . . , sm) is smaller than (t1, . . . , tn) in the lexicographic extension of <lpo, or
3. f(s1, . . . , sm) ≤lpo ti for some i ∈ 1, . . . , n.

Lemma 1. The LPO enjoys the following properties:

(Subterm) ti <lpo f(t1, . . . , tn)
(Monotonicity) if s <lpo t then f(. . . , s, . . .) <lpo f(. . . , t, . . .)
(Transitivity) if s <lpo t and t <lpo u then s <lpo u
(Well-foundedness) <lpo is well-founded
(Big head) if every constructor occurring in t is smaller than f then t <lpo f(s1, . . . , sn)

Proof. The subterm, monotonicity and transitivity properties are shown in [KL80]; well-foundedness is shown in [Buc95]. The big head property can be shown by a straightforward induction on the structure of t.

Example 1. Let Σ = {z, succ, op}, where #z = 0, #succ = 1 and #op = 2, and let < be defined as z < {succ, op}, succ < op. The following inequalities hold for every s and t:

1. succⁿ(s) <lpo op(s, t)
2. succ(op(s, t)) <lpo op(succ(s), t)
3. op(s, op(s, t)) <lpo op(succ(s), t)
4. op(s, op(succ(s), t)) <lpo op(succ(s), succ(t))
The first inequality can be seen as an instance of the big head property. The second inequality highlights another interesting property of the LPO: if a large constructor (in this case op) occurs beneath a small constructor (in this case succ), then "bubbling up" the larger constructor results in a larger term, or, viewed the other way, "bubbling up" the smaller constructor results in a smaller term; this observation will play an important role in Lemma 6. The third inequality highlights the application of the second clause of Definition 1: in a sense, one can think of a partially applied constructor as being a constructor in its own right, where the "precedence" of op(s, −) is smaller than op(succ s, −). The last inequality can be used to help show that the Ackermann function, when formulated as a term rewriting system, terminates.

Definition 2 (Principle of Lexicographic Path Induction). The Principle of Lexicographic Path Induction is the principle of well-founded induction with the LPO as the well-founded ordering.
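The ordering is also easy to implement. The following Python sketch (a standard three-clause formulation of the LPO; the function names are ours) decides s >lpo t over the signature of Example 1 and can be used to check, e.g., the Ackermann-style decrease.

```python
# Labeled trees as ("constructor", [children]) pairs; the precedence
# z < succ < op is given by position in PREC.
PREC = {"z": 0, "succ": 1, "op": 2}

def lpo_gt(s, t):
    """Decide s >lpo t via the standard three clauses: a subterm of s
    dominates t; the head of s is bigger; or the heads are equal and
    the arguments compare lexicographically."""
    f, ss = s
    g, ts = t
    if any(si == t or lpo_gt(si, t) for si in ss):       # some s_i >= t
        return True
    if PREC[f] > PREC[g]:
        return all(lpo_gt(s, tj) for tj in ts)           # f > g
    if PREC[f] == PREC[g]:
        return all(lpo_gt(s, tj) for tj in ts) and lex_gt(ss, ts)
    return False

def lex_gt(ss, ts):
    # lexicographic extension of lpo_gt to equal-length argument lists
    for si, ti in zip(ss, ts):
        if si != ti:
            return lpo_gt(si, ti)
    return False
```

With z = ("z", []), succ and op built analogously, lpo_gt confirms for instance that op(succ(z), succ(z)) >lpo op(z, op(succ(z), z)), the decrease underlying the termination of the Ackermann rewrite system.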
In the next sections, we will show one of the applications of the principle, which justifies its definition and demonstrates how one can prove the consistency of Heyting arithmetic via this principle.
3 The Intuitionistic Theory of Inductive Definitions
In the previous section, we saw that lexicographic path induction can be seen as an instance of well-founded induction. However, more conventional induction principles can be obtained from the notion of inductive definitions. In mathematics, inductive definitions give rise to monotone operators, and the Tarski-Knaster fixed point theorem can be used to justify the existence of, and a notion of induction over, inductively defined families. Inductive definitions can also be primitively formalized in the language of first order logic, using a theory of iterated inductive definitions. Martin-Löf's intuitionistic theory of iterated inductive definitions (IID<ω) [ML71] is notable for its formulation as a natural deduction calculus, where predicate symbols play the role of inductively defined families, atomic introduction rules play the role of monotone operators and atomic elimination rules play the role of induction principles. Martin-Löf proved the consistency of IID<ω using a logical relations argument; here, we give a sequent calculus formulation of (non-iterated) intuitionistic inductive definitions (IID0) and in Section 4 we prove its consistency using lexicographic path induction. The proof-theoretic strength of IID0 is that of arithmetic [ML71].

3.1 Formulas and Proofs
The language of IID0 is parameterized by a term algebra, whose fixed arity constructors we denote generically using the letter c, by some collection of fixed arity predicate symbols, whose elements we denote generically using the letter a, and by some collection X of term variables, whose elements we refer to generically using the letters x and y. As a running example, we consider an instantiation of IID0 with a term algebra corresponding to the natural numbers, i.e. the constant z and the unary constructor s, and with the predicate symbols ⊥, nat and eq, whose arities are 0, 1, and 2, respectively. Formulas are defined below.

Terms t, u ::= x | c(t1, . . . , tn)
Formulas F, G ::= a(t1, . . . , tn) | F ∧ G | F ∨ G | F ⇒ G | ∀x.F | ∃x.F
Predicates P, Q ::= (x1, . . . , xn → F)
An n-ary predicate is the abstraction of a formula over n bound variables: note that any formula can be viewed as a 0-ary predicate, and vice-versa. We say that a formula is atomic if it is of the form a(t1, . . . , tn), which we denote using the letters A and B; otherwise it is compound, which we denote using the letters K and L. We write t[t′/x] and F[t/x] for the usual notions of capture-avoiding substitution, and P(t1, . . . , tn) for F[t1/x1] . . . [tn/xn] when P = x1, . . . , xn → F. We formalize the notion of provability using sequents of the form Γ ⊢ F, where Γ is the notion of context defined below. We depart slightly from convention by
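As an illustration (the tagged-tuple encoding and helper names are ours), the term and formula syntax transcribes directly into code, together with the free-variable function used in side conditions of the form fv(t) ⊆ Γ:

```python
# Terms and formulas of IID0 as tagged tuples:
#   terms:    ("var", x) | ("con", c, [t1, ..., tn])
#   formulas: ("atom", a, [t1, ..., tn]) | ("and"/"or"/"imp", F, G)
#             | ("all"/"ex", x, F)
def var(x): return ("var", x)
def con(c, *ts): return ("con", c, list(ts))
def atom(a, *ts): return ("atom", a, list(ts))

def fv_term(t):
    if t[0] == "var":
        return {t[1]}
    return set().union(set(), *[fv_term(u) for u in t[2]])

def fv_formula(f):
    tag = f[0]
    if tag == "atom":
        return set().union(set(), *[fv_term(u) for u in f[2]])
    if tag in ("and", "or", "imp"):
        return fv_formula(f[1]) | fv_formula(f[2])
    if tag in ("all", "ex"):          # binders remove the bound variable
        return fv_formula(f[2]) - {f[1]}
    raise ValueError(tag)
```

For example, fv_formula applied to ∀x. eq(x, y) yields {y}, reflecting that x has been bound.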
using Γ to keep track of not only which logical hypotheses may be used freely in the deduction of a sequent, but also which free term variables may be used in the deduction as well. Although naming the logical hypotheses in Γ is not useful when presenting IID0's proof rules, doing so is useful in the presentation of IID0's proof terms, which will be introduced in Section 3.2. To this end, we assume that we are given a set of hypothetical variables H, whose elements we denote generically using the letter h.

Contexts Γ ::= · | Γ, h:F | Γ, x
In order for Γ to be well formed, we require that each element of X or H occurs in Γ at most once; this condition can typically be satisfied by tacitly renaming bound variables. We write Γ[t/x] for the capture-avoiding substitution of all occurrences of x in Γ with t (if x is declared in Γ, it is removed). The relation Γ ≥ Γ′ holds whenever Γ′ can be obtained by deleting some number of declarations from Γ. We sometimes abuse notation by writing Γ, Γ′ for the concatenation of two contexts. Our rules are formulated in the style of Pfenning [Pfe95], itself similar to G3i of [TS00]; the exact formulation of rules is immaterial to the applicability of our technique, although we feel that this presentation corresponds to an especially natural notion of proof term. We depart slightly from convention by denoting the occurrence of F in Γ using the premiss h:F ∈ Γ, rather than writing Γ as Γ′, h:A, Γ′′; we feel that this presentation makes the correspondence between proof rules and proof terms more transparent. IID0's core proof rules are below.

(axiom) from h:F ∈ Γ infer Γ ⊢ F
(cut) from Γ ⊢ F and Γ, h:F ⊢ G infer Γ ⊢ G
(∧r) from Γ ⊢ F and Γ ⊢ G infer Γ ⊢ F ∧ G
(∧li) from Γ, h:Fi ⊢ G and h′:F1 ∧ F2 ∈ Γ infer Γ ⊢ G
(∨ri) from Γ ⊢ Fi infer Γ ⊢ F1 ∨ F2
(∨l) from Γ, h1:F1 ⊢ G, Γ, h2:F2 ⊢ G and h′:F1 ∨ F2 ∈ Γ infer Γ ⊢ G
(⇒r) from Γ, h:F ⊢ G infer Γ ⊢ F ⇒ G
(⇒l) from Γ ⊢ F1, Γ, h:F2 ⊢ G and h′:F1 ⇒ F2 ∈ Γ infer Γ ⊢ G
(∀r) from Γ, x ⊢ F infer Γ ⊢ ∀x.F
(∀l) from Γ, h:F[t/x] ⊢ G and h′:∀x.F ∈ Γ infer Γ ⊢ G
(∃r) from Γ ⊢ F[t/x] infer Γ ⊢ ∃x.F
(∃l) from Γ, x, h:F ⊢ G and h′:∃x.F ∈ Γ infer Γ ⊢ G
The ∀l and ∃r rules have a free variable side condition: every free variable of t is contained in Γ. Note that the eigenvariable side condition for ∀r and ∃l (i.e. x ∉ Γ) is enforced by our notion of well-formed context; this notion gives rise to a similar side condition for the hypothetical variables introduced in ⇒r and in all of the left rules. IID0 is parameterized not only by its predicate symbols, but by the proof rules that refer to its atomic formulas. As is the case with compound formulas, the proof rules for atomic formulas can be divided into right rules and left rules. The right rules are supplied by the user, but must follow a particular format; the left rules are obtained algorithmically from the right rules.
Right rules. In general, the atomic right rules must be of the form

from Γ ⊢ A1, . . . , Γ ⊢ An infer Γ ⊢ A0

where A0 . . . An are built up from predicate symbols, term-schematic variables, and constructors from the term algebra. For example, the right rules for nat, eq, and ⊥ are below (note that ⊥ has no right rules).

(natrz) Γ ⊢ nat(z)
(natrs) from Γ ⊢ nat(t) infer Γ ⊢ nat(succ t)
(eqr) Γ ⊢ eq(t, t)
Left rules. In the setting of functional programming, a catamorphism (the most famous of which is probably foldr) can be used to provide a canonical notion of induction over the inhabitants of an inductively defined datatype. Through the lens of the Curry-Howard isomorphism, we can think of IID0's predicate symbols as being inductively defined datatypes, the atomic right rules as datatype constructors, and the atomic left rules as catamorphisms. Just as foldr has arguments corresponding to the constructors nil and cons, so too must the atomic left rule for the predicate symbol nat have minor premisses corresponding to natrz and natrs. In general, atomic left rules all have the form

from the minor premisses, Γ, h:P(t1, . . . , tn) ⊢ G and h′:a(t1, . . . , tn) ∈ Γ infer Γ ⊢ G

where the major premiss (i.e. Γ, h:P(t1, . . . , tn) ⊢ G) can be seen as allowing for the generalization of the induction hypothesis P, and the minor premisses are calculated from the right rules. For example, the atomic left rules for nat, ⊥, and eq are defined as follows.

(natl) from Γ ⊢ P(z), Γ, x, h0:P(x) ⊢ P(succ x), Γ, h:P(t) ⊢ G and h′:nat(t) ∈ Γ infer Γ ⊢ G
(⊥l) from Γ, h:F ⊢ G and h′:⊥ ∈ Γ infer Γ ⊢ G
(eql) from Γ, x ⊢ P(x, x), Γ, h:P(t, u) ⊢ G and h′:eq(t, u) ∈ Γ infer Γ ⊢ G
The minor premisses of natl are calculated from natrz and natrs; the sole minor premiss of eql is calculated from eqr; ⊥ has no right rules, and therefore ⊥l has no minor premisses. For details as to how minor premisses are calculated from right rules and how mutual dependencies are treated, we refer the reader to [ML71] and [Bro06]. We can obtain Leibniz equality

from Γ, h:Q(t) ⇒ Q(u) ⊢ G and h′:eq(t, u) ∈ Γ infer Γ ⊢ G
from eql by instantiating P(x, y) to a predicate of the form Q(x) ⇒ Q(y), and satisfying the minor premiss with the impr rule; we can obtain the more familiar version of ⊥l

from h:⊥ ∈ Γ infer Γ ⊢ G

by instantiating F to G and satisfying the major premiss with the axiom rule. In fact, the major premiss of every atomic left rule can be made redundant this way, and presenting IID0's atomic left rules without major premisses would not complicate the presentation of our consistency proof. However, as Brotherston points out in [Bro06], without such major premisses, the cut rule could not be eliminated from all of IID0's proofs; although proving full cut-elimination for IID0 is beyond the scope of this paper, we choose this formulation to preserve the possibility for future work. IID0 has the following basic properties:

Lemma 2.
(Weakening) If Γ ⊢ F and Γ′ ≥ Γ then Γ′ ⊢ F
(Exchange) If Γ, h:G, h′:G′, Γ′ ⊢ F then Γ, h′:G′, h:G, Γ′ ⊢ F
(Contraction) If Γ0, h:G, Γ1, h′:G, Γ2 ⊢ F then Γ0, h:G, Γ1, Γ2 ⊢ F and Γ0, Γ1, h′:G, Γ2 ⊢ F
(Substitution) If Γ, x, Γ′ ⊢ F and fv(t) ⊆ Γ then Γ, Γ′[t/x] ⊢ F[t/x]

Definition 3 (Right Normal). We say that a proof of Γ ⊢ F is right-normal iff it contains only right rules.

Note that right-normality is a stronger criterion than being cut-free. We are primarily concerned with right-normal proofs of atomic formulas; the following lemma summarizes the interaction between atomic right-normal proofs and minor premisses in the left rules of IID0.

Lemma 3 (Folding Right-Normal Proofs). Given a right-normal proof of Γ ⊢ a(t1, . . . , tn) and proofs of the minor premisses of a's left rule with induction hypothesis P, then Γ ⊢ P(t1, . . . , tn) is provable using only substitution instances of the minor premisses' proofs and cut.

Proof. By a structural induction on Γ ⊢ a(t1, . . . , tn), using substitution.

Lemma 3 generalizes straightforwardly to mutually inductive definitions.

3.2 Proof Terms
We define proof terms for Martin-Löf's IID0 in order to provide a concise notation for describing and manipulating proof trees. The syntax for the non-atomic proof terms is defined below.

C, D, E ::= axiom h | cut F C (h.D) | andr C D | andli (h.C) h | orri C | orl (h1.C) (h2.D) h | impr (h.C) | impl C (h.D) h | allr (x.C) | alll t (h.C) h | existsr t C | existsl (x.h.C) h | . . .
We interpret h.C and x.C as binding occurrences of the variables h and x in C, and write C[t/x] for the capture-avoiding substitution of t for x. The typing rules are given below.

(axiom) from h:F ∈ Γ infer Γ ⊢ axiom h : F
(cut) from Γ ⊢ C : F and Γ, h:F ⊢ D : G infer Γ ⊢ cut F C (h.D) : G
(andr) from Γ ⊢ C : F and Γ ⊢ D : G infer Γ ⊢ andr C D : F ∧ G
(andli) from Γ, h:Fi ⊢ C : G and h′:F1 ∧ F2 ∈ Γ infer Γ ⊢ andli (h.C) h′ : G
(orri) from Γ ⊢ C : Fi infer Γ ⊢ orri C : F1 ∨ F2
(orl) from Γ, h1:F1 ⊢ C : G, Γ, h2:F2 ⊢ D : G and h′:F1 ∨ F2 ∈ Γ infer Γ ⊢ orl (h1.C) (h2.D) h′ : G
(impr) from Γ, h:F ⊢ C : G infer Γ ⊢ impr (h.C) : F ⇒ G
(impl) from Γ ⊢ C : F1, Γ, h:F2 ⊢ D : G and h′:F1 ⇒ F2 ∈ Γ infer Γ ⊢ impl C (h.D) h′ : G
(allr) from Γ, x ⊢ C : F infer Γ ⊢ allr (x.C) : ∀x.F
(alll) from Γ, h:F[t/x] ⊢ C : G and h′:∀x.F ∈ Γ infer Γ ⊢ alll t (h.C) h′ : G
(existsr) from Γ ⊢ C : F[t/x] infer Γ ⊢ existsr t C : ∃x.F
(existsl) from Γ, x, h:F ⊢ C : G and h′:∃x.F ∈ Γ infer Γ ⊢ existsl (x.h.C) h′ : G
The proof term constructors corresponding to atomic inference rules are obtained in a manner analogous to the above: subtrees correspond to subterms, and extending the context of a subderivation corresponds to introducing a new bound variable; term-, formula-, and predicate-schematic variables are explicitly included in proof terms when convenient. We continue our example below.

C, D, E ::= . . . | natrz | natrs t C | eqr t | natl P t C0 (x.h0.C1) (h.D) h | botl F (h.C) h | eql P t u (x.C) (h.D) h
Due to space constraints, we omit the typing rules for atomic proof terms; it should be a straightforward exercise for the reader to reconstruct them.

Theorem 2. There exists an isomorphism between derivations of Γ ⊢ F and proof terms D s.t. Γ ⊢ D : F.

Proof. Each direction is proven by a straightforward induction on the given derivation, replacing each instance of a proof rule with the corresponding typing rule, and vice versa.

Theorem 2 justifies treating well-typed proof terms and proof derivations interchangeably, without losing generality. For example, Lemma 3 corresponds to an analogous lemma over well-typed terms, whose behavior on untyped proof terms is, in the case of nat, realized by the following function.

fold-nat (natrz) (P) (D1) (x.h.D2) = D1
fold-nat (natrs t C) (P) (D1) (x.h.D2) = cut P(t) (fold-nat C P D1 (x.h.D2)) h.(D2[t/x])
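Erasing the term and formula indices, fold-nat is literally a fold over right-normal proofs of nat; a minimal Python sketch (indices elided, so each cut collapses to a function application):

```python
# Right-normal proofs of nat(n) with indices erased: a proof is just
# n applications of natrs to natrz.
NATRZ = ("natrz",)

def natrs(c):
    return ("natrs", c)

def fold_nat(c, d1, d2):
    """d1 plays the minor premiss for P(z); d2 turns a proof of P(x)
    into one of P(succ x).  Each recursive call corresponds to one cut
    in the paper's definition of fold-nat."""
    if c[0] == "natrz":
        return d1
    return d2(fold_nat(c[1], d1, d2))
```

Instantiating d1 and d2 with a number and a successor function, folding the right-normal proof of nat(succ(succ(z))) counts its natrs constructors, exactly as foldr would on a list.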
Lexicographic Path Induction

4  The Consistency Proof
We now give the consistency proof of Martin-Löf's IID0, where we reason about derivations using lexicographic path induction. Informally, a logic is consistent if its notion of provability is somehow nontrivial. Although there are several ways to make this precise, one of the more common is to demonstrate the existence of an unprovable sequent, which can typically be shown as a corollary to some sort of normalization theorem. We generalize this notion slightly by proving that, if · ⊢ C : A then there exists a right-normal D s.t. · ⊢ D : A. If, for example, ⊥ is an atomic predicate symbol, then consistency in the traditional sense follows immediately. Although proof terms are in a sense finite trees, we do not apply lexicographic path induction to them directly because proof terms contain information that we do not consider relevant to the size of a proof. Instead, we apply the principle to skeletons of proof trees, which will be defined in Section 4.1; lexicographic path induction on skeletons subsumes structural induction on proof trees. In many ways, our proof follows the same general structure as Gentzen's proof of the consistency of arithmetic [Gen69], and the Howard [How70] and Schütte [Sch77] proofs of normalization for Gödel's T. All involve assigning well-founded orderings to proof trees/λ-calculus terms (ε0 for Gentzen, Howard, and Schütte; the LPO here); all demonstrate normalization for a restricted class of sequents/types (· ⊢ · for Gentzen; the base type for Howard and Schütte; · ⊢ A here); and all unfold inductions all-at-once, rather than one-at-a-time. Our proof differs from the others in that lexicographic path induction is applied directly to skeletons of proof terms, whereas in Gentzen's, Howard's and Schütte's proofs, the assignment of ordinals to proofs/λ-calculus terms is very complex.
This discrepancy should not be too surprising: the order type of the lexicographic path ordering approaches the small Veblen ordinal [DO88, Mos04], which is much larger than ε0; it is often the case that using stronger-than-necessary assumptions leads to simpler proofs.

4.1  Ordering Proof Terms
Although we have seen in 3.2 that well-typed proof terms contain the same amount of information as proof trees, we do not consider all of this information to be relevant to the size of proofs. In particular, we do not consider hypotheses or elements of the term algebra relevant, nor, with the notable exception of cut, do we consider formulas or predicates relevant. We therefore map proof terms into labeled trees, called skeletons, which are obtained from proof terms by stripping spurious information. Because we consider the size of cut-formulas to be relevant to the size of proofs (as is typically the case), skeletons are defined for formulas as well. The signature Σ for skeletons is defined as follows.

Σ0 = a, axiom
Σ1 = all, exists, andli, orri, impr, allr, alll, existsr, existsl
Σ2 = and, or, imp, andr, orl, impl
Σ3 = cut
J. Sarnat and C. Schürmann
Σ also contains constructors of the appropriate arity corresponding to atomic rules, where the stripping function is defined analogously. We give some representative cases of stripping. Note the use of the sans serif font for skeletons.

|a(t1, . . . , tn)| = a
|F ∧ G| = and(|F|, |G|)
|∀x.F| = all(|F|)
|cut F C (h.D)| = cut(|F|, |C|, |D|)
|axiom h| = axiom
|allr (x.C)| = allr(|C|)
|existsl (x.h.C) h′| = existsl(|C|)
|natl P t C0 (x.h.C1) (h.D) h| = natl(|C0|, |C1|, |D|)
Definition 4 (Skeleton Ordering). We define < as the least transitive ordering on Σ satisfying all of the following:
1. If f corresponds to an atomic formula, and g is any of {and, or, imp, all, exists}, then f < g.
2. If f corresponds to a formula, and g corresponds to a proof rule, then f < g.
3. If f corresponds to a right rule or a compound left rule, then f < cut.
4. If f corresponds to an atomic left rule, then cut < f.
5. If f corresponds to an atomic right rule, then f < axiom.
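Definition 4 only fixes a precedence on skeleton labels; the ordering it induces on skeletons is the standard lexicographic path ordering from term rewriting. As a hedged sketch (the tree type, precedence encoding, and all names below are ours), a textbook LPO comparison on finite labeled trees can be written as:

```haskell
-- Labeled finite trees standing in for skeletons (hypothetical names).
data Tree = Node String [Tree] deriving (Eq, Show)

-- Textbook lexicographic path ordering: s >lpo t holds iff
--   (1) some immediate subterm of s is >=lpo t, or
--   (2) root s exceeds root t in the precedence and s >lpo every subterm of t, or
--   (3) the roots are equal, the argument lists compare lexicographically,
--       and s >lpo every subterm of t.
lpoGt :: (String -> String -> Bool) -> Tree -> Tree -> Bool
lpoGt prec s@(Node f ss) t@(Node g ts) =
     any (\si -> si == t || lpoGt prec si t) ss          -- subterm case
  || (prec f g && all (lpoGt prec s) ts)                 -- precedence case
  || (f == g && lexGt ss ts && all (lpoGt prec s) ts)    -- lexicographic case
  where
    lexGt (x:xs) (y:ys) | x == y    = lexGt xs ys
                        | otherwise = lpoGt prec x y
    lexGt _ _ = False
```

For example, with a precedence add > s > z one gets add(s(z), z) >lpo s(add(z, z)), even though the right-hand tree is larger, which is exactly the kind of size-insensitive descent the consistency proof exploits.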
4.2  The Normalization Procedure
Our proof is structured as follows. Given · ⊢ C : A, C must either be of the form ar C1 . . . Cn, or cut F D (h.E). In the former case, we right-normalize C1 . . . Cn, and apply ar to the result. In the latter case, we find a proof term C′ which is smaller than C and right-normalize C′; however, the calculation of C′ depends on whether the cut-formula F is atomic or compound. Observe that, if F is atomic, then, by induction, we can right-normalize D into D′; C′ is obtained by eliminating all uses of h from h.E, making use of D′ and Lemma 3. If F is compound, we perform what is essentially a small-step version of the cut-admissibility proof in [Pfe95], where, for reasons that will be explained later, we must be careful to avoid any "commutative conversions" for atomic left rules. Recall that we write A for atomic formulas and K for compound formulas.

Lemma 5 (Atomic Cut Reduction). For every C and h.D, if C is right-normal, Γ ⊢ C : A and Γ, h:A ⊢ D : F, then there exists E such that Γ ⊢ E : F and |E| ≤lpo |D|.

Proof. By structural induction on D. Most cases are straightforward, often using weakening on C and exchange on D before applying the induction hypothesis, and using monotonicity of ≤lpo.
Lemma 6 (Compound Cut Reduction). Let Γ contain only compound assumptions (i.e., if h:F ∈ Γ, then F must be compound). If Γ ⊢ cut K C (h.D) : G then there exists E such that Γ ⊢ E : G and |E| <lpo |cut K C (h.D)|.
Proof. By structural induction on cut K C (h.D). The proof can be realized on untyped terms by the function redK, some representative cases of which are below. Most cases use Lemma 4, clause 2. If C is a left rule, or if D is either a right rule or a left rule that acts on a hypothesis other than h, then the cut is "commutative," and the offending rule will be bubbled up (see Definition 4, clause 3). Note that the restriction on Γ means that we never encounter commutative cuts of atomic left rules, and thus will never have to bubble one past a cut. This is critical, because as we have seen in Lemma 5, cut must be smaller than al.

redK (cut K C (h.andr D1 D2)) = andr (cut K C (h.D1)) (cut K C (h.D2))
redK (cut K (andli (h′.C) h′′) (h.D)) = andli (h′.cut K C (h.D)) h′′
redK (cut K C (h.andli (h′.D) h′′)) = andli (h′.cut K C (h.D)) h′′
If C is a right rule, and D is a left rule that acts on h, then the cut is "essential." The sizes of cut-formulas play a crucial role in these cases. The ∀ and ∃ essential cases use Lemma 4, clause 3.

redK (cut (F ⇒ G) (impr (h0.C0)) (h.impl D0 (h1.D1) h)) =
  cut G (cut F (cut (F ⇒ G) (impr (h0.C0)) (h.D0)) (h0.C0))
        (h1.cut (F ⇒ G) (impr (h0.C0)) (h.D1))
redK (cut (∀x.F) (allr (x.C)) (h.alll t (h′.D) h)) =
  cut (F[t/x]) (C[t/x]) (h′.cut (∀x.F) (allr (x.C)) (h.D))
If C or D is a compound cut, we apply the induction hypothesis and monotonicity of ≤lpo; atomic cuts are instead commuted outward.

redK (cut K (C as cut L C0 (h′.C1)) (h.D)) = cut K (redK C) (h.D)
redK (cut K C (h.(D as cut L D0 (h′.D1)))) = cut K C (h.redK D)
redK (cut K (cut A C0 (h′.C1)) (h.D)) = cut A C0 (h′.cut K C1 (h.D))
redK (cut K C (h.cut A D0 (h′.D1))) = cut A (cut K C (h.D0)) (h′.(cut K C (h.D1)))
We are now ready to prove our consistency theorem.

Theorem 3 (Consistency of IID0). For all C, if · ⊢ C : A then there exists a right-normal D such that · ⊢ D : A.

Proof. By lexicographic path induction on |C|. This theorem can be realized by the function norm on untyped terms, which is defined as follows.

norm (ar C1 . . . Cn) = ar (norm C1) . . . (norm Cn)
norm (cut A C1 (h.C2)) = norm (redA (norm C1) (h.C2))
norm (C as cut K C1 (h.C2)) = norm (redK C)
Note that the consistency theorem can be generalized straightforwardly to sequents of the form Γ ⊢ F, where Γ contains only eigenvariables, and F contains no implications; we choose not to formulate it this way for the sake of simplicity, although this generalization is used in the proof of weak normalization for Gödel's T.
4.3  Heyting Arithmetic
Heyting arithmetic is usually presented with a term algebra that contains constructors not only for z and succ, but also for addition and multiplication, whose operational meaning is defined axiomatically; induction is usually defined as an axiom schema of the form P(z) ∧ (∀x.P(x) ⇒ P(succ x)) ⇒ ∀x.P(x). We instead formulate Heyting arithmetic as an instance of IID0, whose term algebra includes only z and succ, and whose atomic formulas include nat, eq and ⊥ (as described in 3.1) along with predicate symbols for add, mult and any other functions that we wish to reason about. The usual axioms for reasoning about addition and multiplication are instead formulated as right rules for add and mult.

──────────────────  addrz
Γ ⊢ add(z, t, t)

Γ ⊢ add(t1, t2, t3)
─────────────────────────────  addrs
Γ ⊢ add(succ t1, t2, succ t3)

──────────────────  multrz
Γ ⊢ mult(z, t, z)

Γ ⊢ mult(t1, t2, t3)    Γ ⊢ add(t2, t3, t4)
───────────────────────────────────────────  multrs
Γ ⊢ mult(succ t1, t2, t4)
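Read as a logic program, addrz and addrs determine the third argument of add from the first two. A minimal Haskell sketch (all names are ours, purely illustrative) that computes the witness of an add derivation:

```haskell
-- Unary numerals mirroring the term algebra with only z and succ.
data Nat = Z | S Nat deriving (Eq, Show)

-- addDer m n returns the k for which add(m, n, k) is derivable:
-- addrz is the base case add(z, t, t); addrs lifts a derivation of
-- add(t1, t2, t3) to one of add(succ t1, t2, succ t3).
addDer :: Nat -> Nat -> Nat
addDer Z     n = n                -- addrz
addDer (S m) n = S (addDer m n)   -- addrs
```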
Instead of performing induction on terms, we relativize quantifiers and perform induction on nat using natl. The relativization of ∀x1.∀x2.∃y.add(x1, x2, y), for example, is ∀x1.nat(x1) ⇒ ∀x2.nat(x2) ⇒ ∃y.nat(y) ∧ add(x1, x2, y), which is a theorem in this logic. Unfortunately, two of Peano's axioms (the inequality of zero and one, and the injectivity of the successor operation) are not provable in the logic as we have described it thus far. Rather than modify the definitions of ⊥ and eq, we explicitly add the following inference rules, without changing any of the existing atomic left rules.

Γ ⊢ eq(t, succ t)
─────────────────  pa7
Γ ⊢ ⊥

Γ ⊢ eq(succ t, succ u)
──────────────────────  pa8
Γ ⊢ eq(t, u)
Fortunately, the addition of these two rules barely complicates the consistency proof: redA must be extended with two trivial inductions over pa7 and pa8, and the normalization algorithm must be extended as follows:

norm (pa7 C) = % undefined: no such right-normal C
norm (pa8 C) = let (eqr (succ t)) = norm C in eqr t
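The pa8 clause can be illustrated on a toy proof-term type (all names below are hypothetical, not the paper's): normalizing the subderivation must produce an eqr on a successor term, from which one succ is peeled off, while a pa7-like situation has no right-normal form at all.

```haskell
-- Toy terms and proof terms for the eq fragment.
data Tm = Z | S Tm deriving (Eq, Show)
data Pf = EqR Tm | Pa8 Pf deriving (Eq, Show)

-- normPf mirrors the pa8 clause of norm: normalize the subproof, expect
-- eqr (succ t), and return eqr t.
normPf :: Pf -> Pf
normPf (EqR t) = EqR t
normPf (Pa8 c) = case normPf c of
  EqR (S t) -> EqR t
  _         -> error "pa7-like situation: no right-normal form"
```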
4.4  Weak Normalization for Gödel's T
Gödel's T is the extension of the simply-typed λ-calculus with terms for zero, successor, and a primitive recursion operator. Due to space constraints, we cannot include a full development of the proof of weak normalization for Gödel's T, but the proof follows a structural relations argument that is nearly identical to that of the simply-typed λ-calculus found in [SS08]. The proofs of Closure Under Weak Head Expansion and the Escape Lemma, both of which proceed by induction on types, are entirely unchanged. The major challenge is in the
Fundamental Theorem, where we must show that applications of the primitive recursion operator are in the logical relation. The primitive recursion operator defines induction on terms of base type, so it should come as no surprise that the Fundamental Theorem can be proven for this case by appealing to induction on the notion of "weakly normalizing at base type," which is represented here by the unary predicate symbol hco. To this end, we extend the assertion logic with an IID0-style left rule for hco (and only hco), which allows us to complete the proof of the Fundamental Theorem. A slight generalization of Theorem 3 can be applied to this assertion logic, thus completing the proof of weak normalization. For details, see http://www.twelf.org/lpo/.
5  Conclusion
We have demonstrated that lexicographic path induction is a suitable replacement for transfinite induction in at least some settings. The proofs of consistency for Heyting arithmetic and Gödel's T have been formalized in a prototypical extension of Twelf. We are not the first to prove the consistency of a logic using an ordering from term rewriting theory: [DP98, Bit99, Urb01] show the strong normalization of cut-elimination for different formulations of first-order logic using either the multiset path ordering or lexicographic path ordering. However, because these results rely on proving termination of term rewriting systems, they cannot be scaled to arithmetic (by Gödel's second incompleteness theorem and [Buc95]). Several other logics from the programming languages literature formulate the notion of induction similar to IID0: the proofs of consistency for FOλΔN [MM00], Linc [MT03], and G [GMN08] all rely on logical relations; the proof of consistency for LKID [Bro06] uses model theory. We are optimistic that many of these systems can be proven consistent using lexicographic path induction. The proof-theoretic strength of IID<ω ([ML71], section 10) exceeds that of the small Veblen ordinal, and thus its consistency cannot be proven by lexicographic path induction. We leave whether our technique scales to any useful logics or type theories whose proof-theoretic ordinal is greater than ε0, but smaller than the small Veblen ordinal, to future work.

Acknowledgments. We would like to thank Michael Rathjen and Georg Moser for their helpful answers to our questions regarding large ordinals, and Søren Debois for his helpful comments on an earlier draft of this paper.
References

[Bit99] Tahhan Bittar, E.: Strong normalization proofs for cut-elimination in Gentzen's sequent calculi. In: Proceedings of the Symposium: Algebra and Computer Science. Helena Rasiowa in memoriam, vol. 46, pp. 179–225. Banach Center Publications (1999)
[Bro06] Brotherston, J.: Sequent Calculus Proof Systems for Inductive Definitions. PhD thesis, University of Edinburgh (November 2006)
[Buc95] Buchholz, W.: Proof-theoretic analysis of termination proofs. Ann. Pure Appl. Logic 75(1–2), 57–65 (1995)
[DO88] Dershowitz, N., Okada, M.: Proof-theoretic techniques for term rewriting theory. In: LICS, pp. 104–111. IEEE Computer Society Press, Los Alamitos (1988)
[DP98] Dyckhoff, R., Pinto, L.: Cut-elimination and a permutation-free sequent calculus for intuitionistic logic. Studia Logica 60(1), 107–118 (1998)
[Dyb91] Dybjer, P.: Inductive sets and families in Martin-Löf's type theory and their set-theoretic semantics. In: Logical Frameworks, pp. 280–306. Cambridge University Press, Cambridge (1991)
[Gen69] Gentzen, G.: New version of the consistency proof for elementary number theory. In: Szabo, M.E. (ed.) The Collected Papers of Gerhard Gentzen, pp. 252–286. North-Holland Publishing Co., Amsterdam (1969)
[GMN08] Gacek, A., Miller, D., Nadathur, G.: Combining generic judgments with recursive definitions. In: Pfenning, F. (ed.) Proceedings of LICS 2008, pp. 33–44. IEEE Computer Society, Los Alamitos (2008)
[How70] Howard, W.A.: Assignment of ordinals to terms for primitive recursive functionals of finite type. In: Kino, A., Myhill, J., Vesley, R.E. (eds.) Intuitionism and Proof Theory, pp. 443–458. North-Holland, Amsterdam (1970)
[KL80] Kamin, S., Lévy, J.-J.: Attempts for generalising the recursive path orderings. Unpublished lecture notes (1980)
[ML71] Martin-Löf, P.: Hauptsatz for the intuitionistic theory of iterated inductive definitions. In: Fenstad, J.E. (ed.) Proceedings of the Second Scandinavian Logic Symposium. North-Holland, Amsterdam (1971)
[MM00] McDowell, R., Miller, D.: Cut-elimination for a logic with definitions and induction. Theoretical Computer Science 232(1–2), 91–119 (2000)
[Mos04] Moser, G., Lepper, I.: Why ordinals are good for you. In: ESSLLI 2003 – Course Material II. Collegium Logicum, vol. 6, pp. 1–65. The Kurt Gödel Society (2004)
[MT03] Momigliano, A., Tiu, A.: Induction and co-induction in sequent calculus. In: Berardi, S., Coppo, M., Damiani, F. (eds.) TYPES 2003. LNCS, vol. 3085, pp. 293–308. Springer, Heidelberg (2004)
[Pfe95] Pfenning, F.: Structural cut elimination. In: Kozen, D. (ed.) Proceedings of the Tenth Annual Symposium on Logic in Computer Science, San Diego, California, pp. 156–166. IEEE Computer Society Press, Los Alamitos (1995)
[PM93] Paulin-Mohring, C.: Inductive definitions in the system Coq – rules and properties. In: Bezem, M., Groote, J.F. (eds.) TLCA 1993. LNCS, vol. 664. Springer, Heidelberg (1993); LIP research report 92-49
[Sch77] Schütte, K.: Proof Theory. Springer, Heidelberg (1977)
[SS08] Schürmann, C., Sarnat, J.: Structural logical relations. In: Pfenning, F. (ed.) Proceedings of LICS 2008, pp. 69–80. IEEE Computer Society Press, Los Alamitos (2008)
[TS00] Troelstra, A.S., Schwichtenberg, H.: Basic Proof Theory, 2nd edn. Cambridge Tracts in Theoretical Computer Science, vol. 43. Cambridge University Press, Cambridge (2000)
[Urb01] Urban, C.: Strong normalisation for a Gentzen-like cut-elimination procedure. In: Proceedings of the 5th International Conference on Typed Lambda Calculi and Applications, Kraków, Poland, May 2001, pp. 415–442 (2001)
Parametricity for Haskell with Imprecise Error Semantics

Florian Stenger and Janis Voigtländer

Technische Universität Dresden
01062 Dresden, Germany
{stenger,voigt}@tcs.inf.tu-dresden.de
Abstract. Error raising, propagation, and handling in Haskell can be imprecise in the sense that a language implementation’s choice of local evaluation order, and optimizing transformations to apply, may influence which of a number of potential failure events hidden somewhere in a program is actually triggered. While this has pragmatic advantages from an implementation point of view, it also complicates the meaning of programs and thus requires extra care when reasoning about them. The proper semantic setup is one in which every erroneous value represents a whole set of potential (but not arbitrary) failure causes. The associated propagation rules are somewhat askew to standard notions of program flow and value dependence. As a consequence, standard reasoning techniques are cast into doubt, and rightly so. We study this issue in depth for one such reasoning technique, namely the derivation of free theorems from polymorphic types. We revise and extend the foundational notion of relational parametricity, as well as further material required to make it applicable.
1  Introduction
Functional languages come with a rich set of conceptual tools for reasoning about programs. For example, structural induction and equational reasoning tell us that the standard Haskell functions

takeWhile :: (α → Bool) → [α] → [α]
takeWhile p [ ]                 = [ ]
takeWhile p (x : y) | p x       = x : takeWhile p y
                    | otherwise = [ ]

and

map :: (α → β) → [α] → [β]
map h [ ]     = [ ]
map h (x : y) = h x : map h y
satisfy the following law for appropriately typed p, h, and l:

takeWhile p (map h l) = map h (takeWhile (p ◦ h) l) .     (1)

This author was supported by the DFG under grant VO 1512/1-1.

P.-L. Curien (Ed.): TLCA 2009, LNCS 5608, pp. 294–308, 2009.
© Springer-Verlag Berlin Heidelberg 2009
But programming language reality can be a tough game, leading to unexpected failures of such near-obvious laws. For example, Peyton Jones et al. [5] proposed a design for error handling based on a certain degree of impreciseness. The major implementations GHC and Hugs have integrated this design years ago. However, the resulting semantics breaks law (1). An instantiation showing this is p = null, h = tail, and l = [[i] | i ← [1..(div 1 0)]] (or any other immediately failing expression of type list-of-lists), where

null :: [α] → Bool
null [ ]     = True
null (x : y) = False

tail :: [α] → [α]
tail [ ]     = error "tail: empty list"
tail (x : y) = y
are standard Haskell functions as well. The problem with (1) now is that its left-hand side yields exactly the "divide by zero"-error coming from l, whereas its right-hand side may also yield the "tail: empty list"-error. This is so due to the semantics of pattern-matching in the mentioned design [5]. In short, it prescribes that when pattern-matching on an erroneous value as scrutinee, not only are any errors associated with it propagated, but, in addition, the branches of the pattern-match are investigated in "error-finding mode" to detect any errors that may arise there independently of the scrutinee. This is done to give the language implementation more freedom in arranging computations, thus allowing more transformations on the code prior to execution. But here it means that when takeWhile (null ◦ tail) encounters an erroneous value, also (null ◦ tail) x is evaluated, with x bound to a special value Bad ∅ that exists only to trigger the error-finding mode. And indeed, the application of tail on that x raises the "tail: empty list"-error, which is propagated by null and then unioned with the "divide by zero"-error from l. In contrast, takeWhile null on an erroneous value does not add any further errors, because the definition of null raises none. And, on both sides of (1), map h only ever propagates, but never introduces errors. Thus, if we do not want to take the risk of introducing previously non-existent errors, we cannot use (1) as a transformation from left to right, even though this might have been beneficial (by bringing p and h together for further analysis or for subsequent transformations potentially improving efficiency). The supposed semantic equivalence simply does not hold. So impreciseness in the semantics has its price, and if we are not ready to abandon the overall design (which would be tantamount to taking away considerable freedom from language implementers), then we must learn how to cope with it when reasoning about programs.
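The failing instance can be replayed in present-day Haskell. Which error actually surfaces is implementation-dependent under the imprecise semantics, so the sketch below (our own demo code) only observes that both sides of the offending instance are erroneous, while the law does hold on total inputs:

```haskell
import Control.Exception (SomeException, evaluate, try)

-- The failing instance of law (1): p = null, h = tail, and an immediately
-- failing list l.
l :: [[Integer]]
l = [[i] | i <- [1 .. div 1 0]]

lhs, rhs :: [[Integer]]
lhs = takeWhile null (map tail l)          -- propagates only l's error
rhs = map tail (takeWhile (null . tail) l) -- may additionally raise tail's error

-- Does forcing the list raise some exception?
erroneous :: [[Integer]] -> IO Bool
erroneous xs = do
  r <- try (evaluate (length xs)) :: IO (Either SomeException Int)
  return (either (const True) (const False) r)
```

Running `erroneous lhs` and `erroneous rhs` both yield True; comparing the *particular* exceptions observed is exactly what the imprecise semantics refuses to pin down.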
The above discussion regarding a concrete instantiation of p, h, and l gives negative information only, namely that (1) may break down in some cases. It does not provide any positive information about conditions on p, h, and l under which (1) actually is a semantic equivalence. Moreover, it is relative to the particular definition of takeWhile given at the very beginning, whereas laws like (1) are often derived more generally as free theorems [7,9] from types alone, without considering concrete definitions. In this paper, we develop the theory of free theorems for Haskell with imprecise error semantics. This continues earlier work [1] for Haskell with all potential error causes (including non-termination)
conflated into a single erroneous value ⊥. That earlier work indicates that, in this setting, (1) is a semantic equivalence provided p ≠ ⊥ and h is strict and total in the sense that h ⊥ = ⊥ and for every x ≠ ⊥, h x ≠ ⊥. The task before us involves finding the right generalizations of such conditions for a setting in which not all errors are equal. Questions like the following ones arise:

– From which erroneous values should p be different?
– For strictness, is it enough that h preserves the least element ⊥, which in the design of Peyton Jones et al. [5] denotes the union of all error causes, including non-termination?
– Or do we need that also every other erroneous value (denoting a collection of only some potential error causes, maybe just a singleton set) is mapped to an erroneous one? To the same one? Or to ⊥?
– For totality, is it enough that non-⊥ values are mapped to non-⊥ values, including possibly to non-⊥ but still erroneous values?
– Or do we need that h maps non-erroneous values only to non-erroneous ones?

We should not expect trivial answers to these questions. The two settings are simply too different. In particular, it is worth pointing out that the failure of (1) occurs for a very innocently-looking definition of takeWhile here. Note that takeWhile as defined does not, by itself, introduce any errors or non-termination, nor does it use selective strictness via Haskell's seq-primitive. Thus, the features that made life hard before are actually absent, and still the law breaks down.¹ In fact, were it not for the imprecise error semantics, (1) would hold for the given definition of takeWhile as a semantic equivalence for arbitrary p, h, and l, even ones involving ⊥ and seq in arbitrary ways and without strictness or totality conditions. By contraposition, this indicates that genuinely new challenges are posed by the imprecise error semantics. Fortunately, we are not left groping in the dark.
Our investigation can be very much goal-directed by studying proof cases of the (relational) parametricity theorem [7,9,6], which is the foundation for all free theorems, and trying to adapt the proof to the imprecise error setting. This leads us to discover, among other formal details and ingredients, the appropriate generalized conditions sought above (first as restrictions on relations, then specialized to the function level). Note that even though we do not deal with exception handling in the functions for which we derive free theorems, our results are nevertheless immediately relevant as well in the larger context of programs that do error recovery (in the IO monad; see the description by Peyton Jones [4, Section 5.2]). Just imagine alternately the left- or right-hand side of the offending instantiation of (1) in the place of the "· · ·" in the following code snippet:

Control.Exception.catch (evaluate · · · )
  (λs → if s = ErrorCall "tail: empty list" then return [[42]] else return [ ])

1 Indeed, Johann and Voigtländer [1] had to add rather ad-hoc occurrences of seq to a definition of the filter-function (of same type as takeWhile) to "provoke" failures of the corresponding standard free theorem. Here, instead, even the most natural specification of takeWhile leads to problems.
Then depending on whether the left- or right-hand side of (1) is put there, we might observe different non-erroneous program outcomes. This is even more severe than "just" a confusion between different erroneous values. With the results from this paper both kinds of problems are settled. For example, we will derive (cf. Example 1) that (1) is a true semantic equivalence provided p and h are non-erroneous, h acts as identity on erroneous values, and h never maps a non-erroneous value to an erroneous one. Similar fixes can be obtained for other free theorems. The accompanying technical report [8] goes on to establish "inequational" parametricity theorems, including one for the refinement order of Moran et al. [3]. Then, for example, slightly weaker conditions than those mentioned above suffice for a variant of (1) in which the left-hand side is only stated to semantically approximate the right-hand side. The technical report also makes some initial steps into the realm of exceptions as first-class citizens by integrating a primitive (Haskell's mapException) that allows manipulating already raised errors (respectively, their descriptive arguments) from inside the language. The work most closely related to that reported here is the recent one of Johann and Voigtländer [2], which also studies relational parametricity for a setting in which different failure causes are semantically distinguished. However, that earlier work does not consider the imprecise error semantics embodied in the mentioned Haskell implementations. Rather, error treatment there is completely deterministic, but results are given modulo a presumed, and then fixed, order on erroneous values. It might be tempting to try to encode the "contents" of erroneous values in the imprecise error semantics, namely sets of error causes, into the unstructured erroneous values of the deterministic setup.
Then one could try to choose the order on these unstructured values to agree with the reversed subset order prescribed by Peyton Jones et al. [5] on the encoded sets. But this approach cannot faithfully model how errors are propagated and combined in the imprecise error semantics, e.g., by taking unions in the semantics of pattern-matching. In fact, (1) is a semantic equivalence in the setting of Johann and Voigtländer [2] (for the given definition of takeWhile), no matter what order on erroneous values is chosen. This means that their formal development is fundamentally unsuited to make the semantic distinctions that need to be made here. The remainder of the paper is structured as follows. Section 2 introduces a Haskell-like calculus with imprecise error semantics. Section 3.1 recalls the standard approach to relational parametricity. Sections 3.2 and 3.3 adapt it to the imprecise error setting, and Section 3.4 shows how to derive revised free theorems. Section 4 concludes with an outlook on future work.
2  Imprecise Error Semantics
We consider a polymorphic lambda-calculus that corresponds to Haskell with a semantics that distinguishes between different causes of failure. The syntax of
types and terms is given as follows, where α ranges over type variables, x over term variables, and n over the integers:

τ ::= α | Int | [τ] | τ → τ | ∀α.τ
t ::= x | n | t + t | [ ]τ | t : t | case t of {[ ] → t ; x : x → t}
    | λx : τ.t | t t | Λα.t | t τ | fix | let! x = t in t | error

Note that the calculus is explicitly typed and that type abstraction and application are explicit in the syntax as well. General recursion is captured via a fixpoint combinator, while selective strictness (à la Haskell's seq) is provided via a strict-let construct. That construct's standard semantics is to evaluate the term bound to the term variable, independently of its use in the body term, and to eventually return the evaluation of the latter, potentially reusing the evaluation of the former term. Fig. 1 gives the typing rules for the calculus.² Standard conventions apply. In particular, typing environments Γ take the form α1, . . . , αk, x1 : τ1, . . . , xl : τl with distinct αi and xj, where all free variables occurring in a τj have to be among the listed type variables. The explicit type information in the syntax of empty lists ensures that for every Γ and t there is at most one τ with Γ ⊢ t : τ.

Γ, x : τ ⊢ x : τ        Γ ⊢ n : Int        Γ ⊢ [ ]τ : [τ]

Γ ⊢ fix : ∀α.(α → α) → α        Γ ⊢ error : ∀α.Int → α

Γ ⊢ t1 : Int    Γ ⊢ t2 : Int
────────────────────────────
Γ ⊢ (t1 + t2) : Int

Γ ⊢ t1 : τ    Γ ⊢ t2 : [τ]
──────────────────────────
Γ ⊢ (t1 : t2) : [τ]

α, Γ ⊢ t : τ
──────────────────
Γ ⊢ (Λα.t) : ∀α.τ

Γ ⊢ t : [τ1]    Γ ⊢ t1 : τ2    Γ, x1 : τ1, x2 : [τ1] ⊢ t2 : τ2
──────────────────────────────────────────────────────────────
Γ ⊢ (case t of {[ ] → t1 ; x1 : x2 → t2}) : τ2

Γ ⊢ t : ∀α.τ1
──────────────────────
Γ ⊢ (t τ2) : τ1[τ2/α]

Γ, x : τ1 ⊢ t : τ2
─────────────────────────
Γ ⊢ (λx : τ1.t) : τ1 → τ2

Γ ⊢ t1 : τ1 → τ2    Γ ⊢ t2 : τ1
───────────────────────────────
Γ ⊢ (t1 t2) : τ2

Γ ⊢ t1 : τ1    Γ, x : τ1 ⊢ t2 : τ2
──────────────────────────────────
Γ ⊢ (let! x = t1 in t2) : τ2
Fig. 1. Typing Rules
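The rules of Fig. 1 can be transcribed into a toy checker for a small fragment (integers, addition, lists, lambda, application). This is our own illustrative sketch, not part of the paper's development; it exploits the uniqueness of types noted above, which the annotations on empty lists and lambda binders guarantee:

```haskell
-- Types and terms of the fragment (names are ours).
data Ty = TInt | TList Ty | TArr Ty Ty deriving (Eq, Show)

data Tm = Var String | Lit Int | Plus Tm Tm | Nil Ty | Cons Tm Tm
        | Lam String Ty Tm | App Tm Tm deriving Show

type Ctx = [(String, Ty)]

-- infer returns Just the unique type of a term, or Nothing if ill-typed.
-- Failed pattern matches in Maybe-do notation abort with Nothing.
infer :: Ctx -> Tm -> Maybe Ty
infer g (Var x)     = lookup x g
infer _ (Lit _)     = Just TInt
infer g (Plus a b)  = do TInt <- infer g a
                         TInt <- infer g b
                         Just TInt
infer _ (Nil t)     = Just (TList t)
infer g (Cons a b)  = do t <- infer g a
                         TList t' <- infer g b
                         if t == t' then Just (TList t) else Nothing
infer g (Lam x t e) = TArr t <$> infer ((x, t) : g) e
infer g (App f a)   = do TArr t1 t2 <- infer g f
                         t1' <- infer g a
                         if t1 == t1' then Just t2 else Nothing
```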
As an example, map can be defined as the following term and then satisfies map : τ, where τ = ∀α.∀β.(α → β) → [α] → [β]:

fix τ (λm : τ.Λα.Λβ.λh : α → β.λl : [α].
         case l of {[ ] → [ ]β ; x : y → (h x) : (m α β h y)}) .

We use a denotational semantics that extends the one given by Peyton Jones et al. [5], our main extension being that we formalize the treatment of polymorphic
2 Note that, to simplify the presentation, we deviate from Haskell by using integers rather than strings as descriptive arguments for error.
types. Peyton Jones et al.'s main innovation, and the reason for calling the semantics "imprecise", is the use of sets of possible failure causes. Formally, let E = {ErrorCall n | n ∈ {. . . , −2, −1, 0, 1, 2, . . .}} and E^nt = {NonTermination} ∪ E, where NonTermination and ErrorCall are descriptive tags for use in the denotational semantics but without direct syntactical counterparts in the underlying calculus. The set of all erroneous values is then Verr = {Bad e | e ∈ P(E) ∪ {E^nt}}³ and its elements are ordered by

Bad e ⊑ Bad e′  iff  e ⊇ e′ .     (2)
The operation lift maps complete partial orders to so-called error-lifted cpos (henceforth, for short, elcpos): lift S = Verr ∪ {Ok s | s ∈ S}. The approximation order on such an elcpo is given by (2) on erroneous values, by taking over the order from S for non-erroneous values, and by mandating that ⊥ = Bad E^nt is below all, even non-erroneous, values, while otherwise erroneous and non-erroneous values are pairwise incomparable. Illustrated as a diagram, the structure of an elcpo is as follows:

(Diagram: the Bad values, ordered by Bad e ⊑ Bad e′ iff e ⊇ e′, stand alongside the Ok values, with ⊥ below both.)
With the above definitions in place, types are interpreted as elcpos as follows, where θ is a mapping from type variables to elcpos:

[[α]]θ = θ(α)
[[Int]]θ = lift {. . . , −2, −1, 0, 1, 2, . . .}
[[[τ]]]θ = gfp (λS. lift ({[ ]} ∪ {a : b | a ∈ [[τ]]θ, b ∈ S}))
[[τ1 → τ2]]θ = lift {f : [[τ1]]θ → [[τ2]]θ}
[[∀α.τ]]θ = lift {g | ∀D elcpo. (g D) ∈ [[τ]]θ[α→D] \ Verr} .
The first four lines are consistent with a standard semantics featuring only a single erroneous value ⊥ at every type. The complete partial order lifted in the definition of [[Int]]θ is the flat one, without ordering between integers. For list types, prior to lifting, [ ] is only related to itself, while the ordering between "− : −" values is component-wise. Also note the use of the greatest fixpoint to provide for infinite lists. The function space lifted in the definition of [[τ1 → τ2]]θ
Note that if the e in a (Bad e) ∈ Verr contains NonTermination, then it must also contain every other possible failure cause.
F. Stenger and J. Voigtländer
is the one of monotonic and continuous maps between [[τ1]]θ and [[τ2]]θ, ordered point-wise. The elements in the set lifted in the definition of [[∀α.τ]]θ are again ordered point-wise (i.e., g1 ⊑ g2 iff for every elcpo D, g1 D ⊑ g2 D). Note that, in this last line, by subtracting Verr from the possible ranges of g, we mandate that a non-erroneous polymorphic value does not have any erroneous instantiation. In particular, we thus exclude, as in Haskell, polymorphic values of which the instantiation at some type is erroneous and at some other type is non-erroneous. More specifically, an erroneous polymorphic value exhibits exactly the same potential failing behavior in each of its instantiations. Of course, ensuring all this also depends on the term semantics, to be considered next:

  [[x]]θ,σ       = σ(x)
  [[n]]θ,σ       = Ok n
  [[t1 + t2]]θ,σ = Ok (n1 + n2)                         if [[t1]]θ,σ = Ok n1 and [[t2]]θ,σ = Ok n2
                   Bad (E([[t1]]θ,σ) ∪ E([[t2]]θ,σ))    otherwise
  [[[ ]τ]]θ,σ    = Ok [ ]
  [[t1 : t2]]θ,σ = Ok ([[t1]]θ,σ : [[t2]]θ,σ)
  [[case t of {[ ] → t1 ; x1 : x2 → t2}]]θ,σ =
      [[t1]]θ,σ                                                     if [[t]]θ,σ = Ok [ ]
      [[t2]]θ,σ[x1↦a, x2↦b]                                         if [[t]]θ,σ = Ok (a : b)
      Bad (e ∪ E([[t1]]θ,σ) ∪ E([[t2]]θ,σ[x1↦Bad ∅, x2↦Bad ∅]))     if [[t]]θ,σ = Bad e
  [[λx : τ.t]]θ,σ = Ok (λa.[[t]]θ,σ[x↦a])
  [[t1 t2]]θ,σ    = [[t1]]θ,σ $ [[t2]]θ,σ
  [[Λα.t]]θ,σ     = Ok (λD.[[t]]θ[α↦D],σ)    if [[t]]θ[α↦Verr],σ = Ok v
                    Bad e                     if [[t]]θ[α↦Verr],σ = Bad e
  [[t τ]]θ,σ      = [[t]]θ,σ $$ [[τ]]θ
  [[fix]]θ,σ      = Ok (λD.Ok (λh. ⊔i ((h $)^i ⊥)))
  [[let! x = t1 in t2]]θ,σ = [[t2]]θ,σ[x↦Ok v]                      if [[t1]]θ,σ = Ok v
                             Bad (e ∪ E([[t2]]θ,σ[x↦Bad ∅]))        if [[t1]]θ,σ = Bad e
  [[error]]θ,σ    = Ok (λD.Ok (λa. Bad {ErrorCall n}   if a = Ok n
                                   Bad e               if a = Bad e))
Most of the above definitions are straightforward. (Here σ is a mapping from term variables to values.) They use λ for denoting anonymous functions, and the following two operators:

  h $ a = f a              if h = Ok f
          Bad (e ∪ E(a))   if h = Bad e

where

  E(a) = ∅   if a = Ok v
         e   if a = Bad e ,
and

  h $$ D = g D     if h = Ok g
           Bad e   if h = Bad e .
One crucial point here, taken from Peyton Jones et al. [5], is that application of an erroneous function value incurs all potential failures of the argument as well. We also essentially use their definitions of [[t1 + t2]]θ,σ and [[case t of {[ ] → t1 ; x1 : x2 → t2}]]θ,σ, except that we do not check for overflow in the case of addition. To bring about erroneous values other than ⊥ in the first place, we have the obvious definition of [[error]]θ,σ. The expression ⊔i ((h $)^i ⊥) in the definition for fix means the supremum of the chain ⊥ ⊑ h $ ⊥ ⊑ h $ (h $ ⊥) ⊑ · · ·. For [[Λα.t]]θ,σ, we first need to analyze the semantics of t to sort out the exceptional case that every D is mapped to an (actually, one and the same) erroneous value, in which case the semantics of Λα.t should itself be that erroneous value, as explicitly added via lift in the definition of [[∀α.τ]]θ. Due to the observed uniqueness, it is not actually necessary to check the behavior for every D. Instead, the test can be performed with a single, arbitrary elcpo. We choose the simplest one, namely just Verr. If we find that we are not in the exceptional case, the denotation is just the standard one, but appropriately tagged via Ok. Finally, the definition of [[let! x = t1 in t2]]θ,σ follows the standard one, but similarly to the definition of [[case t of {[ ] → t1 ; x1 : x2 → t2}]]θ,σ, and in line with the operational semantics of Moran et al. [3], t2 is evaluated in "error-finding mode" to contribute further potential failure causes in case t1 is already erroneous. Altogether, we have that if Γ ⊢ t : τ and σ(x) ∈ [[τ′]]θ for every x : τ′ occurring in Γ, then [[t]]θ,σ ∈ [[τ]]θ.
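The operators $ and E, and the imprecise treatment of addition and of error, can be mimicked in Python. This is a sketch of the semantic equations above under a hypothetical Ok/Bad encoding (failure sets as frozensets), not GHC's actual implementation:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Ok:
    value: object

@dataclass(frozen=True)
class Bad:
    causes: frozenset

def E(a):
    """E(a) = ∅ if a = Ok v, and e if a = Bad e."""
    return a.causes if isinstance(a, Bad) else frozenset()

def app(h, a):
    """h $ a: application; an erroneous function value also absorbs all
    potential failures of its argument."""
    if isinstance(h, Ok):
        return h.value(a)
    return Bad(h.causes | E(a))

def plus(x, y):
    """[[t1 + t2]]: Ok(n1 + n2) if both operands are Ok; otherwise the
    union of the failure sets of both operands."""
    if isinstance(x, Ok) and isinstance(y, Ok):
        return Ok(x.value + y.value)
    return Bad(E(x) | E(y))

def error(a):
    """[[error]] (instantiated at some type): Ok n becomes Bad {ErrorCall n}."""
    if isinstance(a, Ok):
        return Bad(frozenset({("ErrorCall", a.value)}))
    return a
```

In particular, adding two erroneous operands yields the union of their failure sets, which is exactly the "imprecise" point: the semantics records all causes that could be observed, rather than committing to an evaluation order.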
3 Parametricity

3.1 The Standard Logical Relation
The key to parametricity results is the definition of a family of relations by induction on a calculus' type structure. If we were to abandon the primitive error, and thus return to a setting without distinguishing error causes (i.e., with only one erroneous value ⊥), then the appropriate such logical relation would be as follows, where ρ is a mapping from type variables to binary relations between pointed complete partial orders:

  Δα,ρ      = ρ(α)
  ΔInt,ρ    = id
  Δ[τ],ρ    = list Δτ,ρ
  Δτ1→τ2,ρ  = {(f, g) | f = ⊥ iff g = ⊥, ∀(a, b) ∈ Δτ1,ρ. (f $ a, g $ b) ∈ Δτ2,ρ}
  Δ∀α.τ,ρ   = {(u, v) | ∀D1, D2, R ∈ Rel(D1, D2). (u $$ D1, v $$ D2) ∈ Δτ,ρ[α↦R]}

We use id to denote identity relations. The operation list takes a relation R and maps it to

  list R = gfp (λS. {(⊥, ⊥), (Ok [ ], Ok [ ])} ∪ {(Ok (a : b), Ok (c : d)) | (a, c) ∈ R, (b, d) ∈ S}) ,
where again the greatest fixpoint is taken. Rel(D1, D2) collects all relations between D1 and D2 that are strict, continuous, and bottom-reflecting. Strictness and continuity are just the standard notions (i.e., membership of the pair (⊥, ⊥) and closure under suprema). A relation R is bottom-reflecting if (a, b) ∈ R implies that a = ⊥ iff b = ⊥. The corresponding explicit condition on f and g in the definition of Δτ1→τ2,ρ serves the purpose of ensuring that bottom-reflection is preserved throughout the logical relation. Overall, for that ⊥-only setting, reasoning as Johann and Voigtländer [1] do gives the following important lemma (by induction on τ), where Rel is the union of all Rel(D1, D2). That lemma is crucial for then proving the parametricity theorem.

Lemma 1. If ρ maps only to relations in Rel, then Δτ,ρ ∈ Rel.

Theorem 1. If Γ ⊢ t : τ, then for every θ1, θ2, ρ, σ1, and σ2 such that
– for every α occurring in Γ, ρ(α) ∈ Rel(θ1(α), θ2(α)), and
– for every x : τ′ occurring in Γ, (σ1(x), σ2(x)) ∈ Δτ′,ρ,
we have ([[t]]θ1,σ1, [[t]]θ2,σ2) ∈ Δτ,ρ.

For reference, the proof of Theorem 1 for the setting without error, spelling out more formally the "narrative" of Johann and Voigtländer [1], is given in Appendix A of the accompanying technical report [8].

3.2 Towards Appropriate Restrictions on Relations
To establish an analogue of Theorem 1 for the setting including non-⊥ errors, and their deliberately imprecise semantics, we first need to determine just the right set of restrictions to impose on relational interpretations of types. Above, we required strict, continuous, and bottom-reflecting relations. It seems reasonable that continuity will still be required, as we still have general recursion via the fixpoint combinator. But for strictness and bottom-reflection, the situation is less clear when we have more than a single erroneous value ⊥ to consider. For example, strictness currently only states that the pair (⊥, ⊥) (i.e., the pair (Bad E^nt, Bad E^nt)) should be contained in every relation. But what about other erroneous values? Should any pair of them be related? Or only identical ones? Or is inclusion of (⊥, ⊥) actually enough? The best way to answer such questions is to go through the proof of Theorem 1 and see where changes in the calculus and its semantics might require a change in the proof. In our case, it of course makes most sense to study the impact of the new primitive error first. Recalling its typing rule, we will have to prove that, for every θ1, θ2, ρ, σ1, and σ2 such that
– for every α occurring in Γ, ρ(α) is an appropriately restricted relation between θ1(α) and θ2(α), and
– for every x : τ′ occurring in Γ, (σ1(x), σ2(x)) ∈ Δτ′,ρ,
we have ([[error]]θ1,σ1, [[error]]θ2,σ2) ∈ Δ∀α.Int→α,ρ. By the definition of Δ, this will require us to establish that for every D1, D2, and (appropriate) R, ([[error]]θ1,σ1 $$ D1, [[error]]θ2,σ2 $$ D2) ∈ ΔInt→α,ρ[α↦R]. Further unfolding the current definition of Δ tells us that we will have to show that

  [[error]]θ1,σ1 $$ D1 = ⊥  iff  [[error]]θ2,σ2 $$ D2 = ⊥   (3)

(or a similar statement involving also non-⊥ erroneous values) and that for every (a, b) ∈ ΔInt,ρ[α↦R], ([[error]]θ1,σ1 $$ D1 $ a, [[error]]θ2,σ2 $$ D2 $ b) ∈ R. Taking into account that the integer type should still be interpreted by an identity relation, and using the semantics definitions given in Section 2, the latter is the same as requiring that for every a ∈ lift {. . . , −2, −1, 0, 1, 2, . . .}, the value

  Bad {ErrorCall n}   if a = Ok n
  Bad e               if a = Bad e

is related to itself by R, which is equivalent to requiring that every erroneous value is related to itself by R. Therefore, we propose to generalize the notion of strictness as follows, with id_Verr = {(a, a) | a ∈ Verr}.

Definition 1. A relation R is error-strict if id_Verr ⊆ R.

Similar questions as for strictness arise for bottom-reflection in the presence of different failure causes. Is it enough to maintain that two related values are either both ⊥ or else neither one of them is? Or should we generalize by requiring that either both are erroneous or else neither one of them is? Or should we be even more demanding by expecting that only equal failure causes (or sets thereof) are related? The relevant proof case to check here is the one for the strict-let construct, because selective strictness was what necessitated bottom-reflection in the first place [1]. Recall the typing rule. Inside the proof of an analogue of Theorem 1 by induction over typing derivations we will have to establish, for the induction conclusion in this case, that, for θ1, θ2, ρ, σ1, and σ2 as above, we have ([[let! x = t1 in t2]]θ1,σ1, [[let! x = t1 in t2]]θ2,σ2) ∈ Δτ2,ρ. The semantics from Section 2 tells us that the two values in whose relatedness we are interested here are equal to

  [[t2]]θ1,σ1[x↦Ok v1]                   if [[t1]]θ1,σ1 = Ok v1
  Bad (e1 ∪ E([[t2]]θ1,σ1[x↦Bad ∅]))     if [[t1]]θ1,σ1 = Bad e1

and

  [[t2]]θ2,σ2[x↦Ok v2]                   if [[t1]]θ2,σ2 = Ok v2
  Bad (e2 ∪ E([[t2]]θ2,σ2[x↦Bad ∅]))     if [[t1]]θ2,σ2 = Bad e2 ,

respectively. The role of bottom-reflection in the ⊥-only setting is to ensure, via the induction hypothesis corresponding to the precondition Γ ⊢ t1 : τ1, viz.

  ([[t1]]θ1,σ1, [[t1]]θ2,σ2) ∈ Δτ1,ρ ,   (4)
that the same branch is chosen in (the analogues of) the two case distinctions above. Here the same can be achieved by introducing an auxiliary function extracting the tag of a value as follows:

  T(a) = Ok    if a = Ok v
         Bad   if a = Bad e ,

and generalizing bottom-reflection in such a way that related values are always required to have the same image under T. This suffices in the case that [[t1]]θ1,σ1 = Ok v1 and [[t1]]θ2,σ2 = Ok v2, because we then get the desired ([[t2]]θ1,σ1[x↦Ok v1], [[t2]]θ2,σ2[x↦Ok v2]) ∈ Δτ2,ρ from (Ok v1, Ok v2) ∈ Δτ1,ρ (cf. (4)) and the induction hypothesis corresponding to the precondition Γ, x : τ1 ⊢ t2 : τ2, namely that for every (b, c) ∈ Δτ1,ρ,

  ([[t2]]θ1,σ1[x↦b], [[t2]]θ2,σ2[x↦c]) ∈ Δτ2,ρ .   (5)

However, in the case that [[t1]]θ1,σ1 = Bad e1 and [[t1]]θ2,σ2 = Bad e2, we need to show that

  (Bad (e1 ∪ E([[t2]]θ1,σ1[x↦Bad ∅])), Bad (e2 ∪ E([[t2]]θ2,σ2[x↦Bad ∅]))) ∈ Δτ2,ρ ,   (6)

and do not yet have the means for doing so. Note that a supposed error-strictness of Δτ2,ρ would only allow us to conclude the desired membership if the sets e1 ∪ E([[t2]]θ1,σ1[x↦Bad ∅]) and e2 ∪ E([[t2]]θ2,σ2[x↦Bad ∅]) were equal. Revising the notion of error-strictness to guarantee that indeed any two erroneous values are related, independently of the sets of possible failures they represent, would risk completely blurring any distinction between different failure causes, and thus is not an acceptable option. Instead, the proposed generalization of bottom-reflection is strengthened in a very natural way. Rather than just requiring that two related values always have the same image under T, we expect the same under E.

Definition 2. A relation R is error-reflecting if (a, b) ∈ R implies that T(a) = T(b) and E(a) = E(b). (Note that E(a) = E(b) alone does not imply T(a) = T(b), as can be seen by taking a = Bad ∅ and b = Ok v.)

Then, (4) and the assumption that Δτ1,ρ is error-reflecting imply that in the case [[t1]]θ1,σ1 = Bad e1 and [[t1]]θ2,σ2 = Bad e2 we even have e1 = e2. Moreover, (5) and (Bad ∅, Bad ∅) ∈ Δτ1,ρ (cf. supposed error-strictness of Δτ1,ρ) give ([[t2]]θ1,σ1[x↦Bad ∅], [[t2]]θ2,σ2[x↦Bad ∅]) ∈ Δτ2,ρ, and thus by supposed error-reflection of Δτ2,ρ, E([[t2]]θ1,σ1[x↦Bad ∅]) = E([[t2]]θ2,σ2[x↦Bad ∅]) as well, which finally establishes (6), without having to revise the notion of error-strictness.

3.3 The New Logical Relation and Its Parametricity Theorem

We have been led to focus on relations that are error-strict, continuous, and error-reflecting. Clearly, ensuring that these restrictions are preserved will require
changes to the definition of Δ. For example, the operation list used in Section 3.1 does not suffice anymore, but it is easy enough to replace it as follows:

  list_err R = gfp (λS. id_Verr ∪ {(Ok [ ], Ok [ ])} ∪ {(Ok (a : b), Ok (c : d)) | (a, c) ∈ R, (b, d) ∈ S}) .

For the case of function types, we clearly need an appropriate replacement for the "f = ⊥ iff g = ⊥" condition occurring in the definition of Δτ1→τ2,ρ. It might seem that, to guarantee error-reflection (instead of bottom-reflection, as earlier), we will have to require "T(f) = T(g) and E(f) = E(g)". But actually it turns out that requiring just "T(f) = T(g)" is enough, as then the other conjunct can be established from relatedness of f $ a and g $ b for related a and b (see below). For the case of polymorphic types, we clearly have to restrict the relations that we quantify over to error-strict, continuous, and error-reflecting ones. To this end, for given elcpos D1 and D2, let Rel_err(D1, D2) collect all relations between them that are error-strict, continuous, and error-reflecting. (Also, let Rel_err be the union of all Rel_err(D1, D2).) Overall, we obtain the new logical relation defined as follows:

  Δerr α,ρ      = ρ(α)
  Δerr Int,ρ    = id_lift {. . . , −2, −1, 0, 1, 2, . . .}
  Δerr [τ],ρ    = list_err Δerr τ,ρ
  Δerr τ1→τ2,ρ  = {(f, g) | T(f) = T(g), ∀(a, b) ∈ Δerr τ1,ρ. (f $ a, g $ b) ∈ Δerr τ2,ρ}
  Δerr ∀α.τ,ρ   = {(u, v) | ∀D1, D2, R ∈ Rel_err(D1, D2). (u $$ D1, v $$ D2) ∈ Δerr τ,ρ[α↦R]}

Now we can state the following analogue of Lemma 1.

Lemma 2. If ρ maps only to relations in Rel_err, then Δerr τ,ρ ∈ Rel_err.
The proof is mostly routine, but we briefly sketch a few interesting parts related to the treatment of erroneous values:
– Error-strictness of Δerr τ1→τ2,ρ follows from error-reflection of Δerr τ1,ρ and error-strictness of Δerr τ2,ρ, because for every e ∈ P(E) ∪ {E^nt} and a, b with E(a) = E(b), we have ((Bad e) $ a, (Bad e) $ b) ∈ id_Verr.
– Error-reflection of Δerr τ1→τ2,ρ follows from error-strictness of Δerr τ1,ρ and error-reflection of Δerr τ2,ρ, because for every e, e′ ∈ P(E) ∪ {E^nt}, we have that E((Bad e) $ (Bad ∅)) = E((Bad e′) $ (Bad ∅)) implies e = e′.
– Error-strictness of Δerr ∀α.τ,ρ follows from error-strictness of Δerr τ,ρ[α↦R] for every error-strict, continuous, and error-reflecting relation R, because for every e ∈ P(E) ∪ {E^nt} and elcpos D1 and D2, ((Bad e) $$ D1, (Bad e) $$ D2) ∈ id_Verr.
– Error-reflection of Δerr ∀α.τ,ρ follows from error-reflection of Δerr τ,ρ[α↦R] for every error-strict, continuous, and error-reflecting relation R, because for every (erroneous or non-erroneous) value h in [[∀α.τ]]θ for some θ, and every elcpo D, we have T(h $$ D) = T(h) and E(h $$ D) = E(h).
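On finite fragments, the two restrictions can be checked mechanically. The following Python sketch (hypothetical Ok/Bad encoding, failure sets as frozensets; relations as sets of pairs) tests error-strictness and error-reflection of explicitly given relations:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Ok:
    value: object

@dataclass(frozen=True)
class Bad:
    causes: frozenset

def T(a):
    """The tag of a value: Ok or Bad."""
    return "Ok" if isinstance(a, Ok) else "Bad"

def E(a):
    """The failure set of a value: ∅ for Ok v, e for Bad e."""
    return a.causes if isinstance(a, Bad) else frozenset()

def error_strict(rel, err_values):
    """id_Verr ⊆ R, checked on a finite sample of erroneous values."""
    return all((b, b) in rel for b in err_values)

def error_reflecting(rel):
    """(a, b) ∈ R implies T(a) = T(b) and E(a) = E(b)."""
    return all(T(a) == T(b) and E(a) == E(b) for (a, b) in rel)

errs = [Bad(frozenset()), Bad(frozenset({1})), Bad(frozenset({1, 2}))]
good = {(b, b) for b in errs} | {(Ok(0), Ok("zero"))}
bad_rel = good | {(Bad(frozenset({1})), Ok(1))}  # violates error-reflection
```

Note how the pair (Bad ∅, Ok v) witnesses that agreement under E alone does not imply agreement under T, matching the remark accompanying Definition 2.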
Being assured of Lemma 2 is nice, but not our ultimate goal. Rather, we want the following analogue of Theorem 1.
Theorem 2. If Γ ⊢ t : τ, then for every θ1, θ2, ρ, σ1, and σ2 such that
– for every α occurring in Γ, ρ(α) ∈ Rel_err(θ1(α), θ2(α)), and
– for every x : τ′ occurring in Γ, (σ1(x), σ2(x)) ∈ Δerr τ′,ρ,
we have ([[t]]θ1,σ1, [[t]]θ2,σ2) ∈ Δerr τ,ρ.

Of course, Lemma 2 plays an important role in the proof, in particular in the inductive case for type application. The proof case for error was already discussed earlier, in Section 3.2. The only change necessary to what was said there is that instead of (3) we actually need to establish that T([[error]]θ1,σ1 $$ D1) = T([[error]]θ2,σ2 $$ D2). But this is straightforward from the term semantics, which forces both tags to be Ok. Another proof case already discussed in Section 3.2 is the one for the strict-let construct. Clearly, it also uses Lemma 2, to deduce T([[t1]]θ1,σ1) = T([[t1]]θ2,σ2) from ([[t1]]θ1,σ1, [[t1]]θ2,σ2) ∈ Δerr τ1,ρ, and to deduce (Bad (e1 ∪ E([[t2]]θ1,σ1[x↦Bad ∅])), Bad (e2 ∪ E([[t2]]θ2,σ2[x↦Bad ∅]))) ∈ Δerr τ2,ρ from (Bad e1, Bad e2) ∈ Δerr τ1,ρ and the statement that for every (b, c) ∈ Δerr τ1,ρ, ([[t2]]θ1,σ1[x↦b], [[t2]]θ2,σ2[x↦c]) ∈ Δerr τ2,ρ. Most other proof cases proceed like the corresponding ones for Theorem 1. The two that do not, and that require Lemma 2 as well, namely those for case and for type abstraction, are discussed in detail in the technical report [8].

3.4 Applying the New Parametricity Theorem
Having established Theorem 2, we can use it to derive free theorems that hold with respect to the imprecise error semantics. When doing so, we typically want to specialize relations (arising from the quantification in the definition of Δerr ∀α.τ,ρ) to functions. To this end, the following definition is useful. The notation ∅ is used for empty mappings from type or term variables to elcpos and values.

Definition 3. Let h be a term with ⊢ h : τ1 → τ2. The graph of h, denoted by G(h), is the relation {(a, b) | [[h]]∅,∅ $ a = b} ⊆ [[τ1]]∅ × [[τ2]]∅. Note that it is actually a function, as h and a determine b.

Of course, we should restrict attention to such h for which G(h) fulfills all necessary requirements on relations (i.e., error-strictness, continuity, and error-reflection). Error-strictness is easily translated from a restriction on G(h) to one on h. Continuity is a general property of functions and function application in the underlying semantics. Half of error-reflection, in the case of functions, is already given by error-strictness. The other half requires a guarantee that non-erroneous arguments are mapped to non-erroneous results. Altogether, we get the following definition and lemma.

Definition 4. A term h with ⊢ h : τ1 → τ2 and [[h]]∅,∅ = Ok f is
– error-strict if f a = a for every a ∈ Verr, and
– error-total if T(f a) = Ok for every a ∈ [[τ1]]∅ \ Verr.
An h with T([[h]]∅,∅) = Bad is neither error-strict nor error-total.
307
For example, null is error-strict and error-total, while tail is neither. Also, Haskell’s standard projection function fst is error-strict but not error-total, while (const 42) is error-total but not error-strict. Lemma 3. Let h be a term with h : τ1 → τ2 . Then G(h) ∈ Rel err iff h is error-strict and error-total. We will only use the if-direction of this lemma, so we only sketch the proof of that direction, and only the parts related to the treatment of erroneous values: – Error-strictness of G(h) follows from error-strictness of h by definition of $. – Error-reflection of G(h) follows from error-strictness and error-totality of h by the definition of $ and because for every a ∈ [[τ1 ]]∅ \ Verr , T (a) = Ok and E(a) = ∅, and for every b ∈ [[τ2 ]]∅ , T (b) = Ok implies E(b) = ∅. One further auxiliary lemma we need has to do with a connection between G, the function map, and list err . Lemma 4. Let h be a term with h : τ1 → τ2 . Then G(map τ1 τ2 h) = list err G(h). The proof (by coinduction, using the definition of list err via a greatest fixpoint) holds no surprises and is thus omitted here. We now have everything at hand to derive free theorems. For illustration, we take up the introductory example. Example 1. Let t be a term with t : ∀α.(α → Bool) → [α] → [α]. This requires to extend the calculus and proofs by integrating a Boolean type and associated term-formers with appropriate typing rules, semantics, and so on. Since the details are entirely straightforward, we omit them. By Theorem 2 we have ([[t]]∅,∅ , [[t]]∅,∅ ) ∈ Δerr ∀α.(α→Bool)→[α]→[α],∅ , where ∅ is now also used to denote the empty mapping from type variables to relations. Using the definition of Δerr , we obtain that for every choice of elcpos D1 , D2 , relation R ∈ Rel err (D1 , D2 ), err values p1 , p2 with (p1 , p2 ) ∈ Δerr R, α→Bool,[α→R] , and l1 , l2 with (l1 , l2 ) ∈ list err ([[t]]∅,∅ $$ D1 $ p1 $ l1 , [[t]]∅,∅ $$ D2 $ p2 $ l2 ) ∈ list R. 
Let h be a term with h : τ1 → τ2 that is error-strict and error-total. (For the introductory example, h = tail is neither error-strict nor error-total.) By Lemma 3 we have G(h) ∈ Rel err ([[τ1 ]]∅ , [[τ2 ]]∅ ), so we can use it to instantiate R above. By Lemma 4 we then have list err R = G(map τ1 τ2 h), and thus for every choice of values p1 , p2 with (p1 , p2 ) ∈ Δerr α→Bool,[α→G(h)] and l1 ∈ [[[τ1 ]]]∅ , [[map τ1 τ2 h]]∅,∅ $ ([[t τ1 ]]∅,∅ $ p1 $ l1 ) = [[t τ2 ]]∅,∅ $ p2 $ ([[map τ1 τ2 h]]∅,∅ $ l1 ). The condition on p1 and p2 unfolds to T (p1 ) = T (p2 ) and for every a ∈ [[τ1 ]]∅ , p1 $ a = p2 $ ([[h]]∅,∅ $ a). The latter is easy to satisfy by choosing p1 = [[λx : τ1 .p (h x)]]∅,∅ and p2 = [[p]]∅,∅ for some p with p : τ2 → Bool, but we need to take note of the requirement that T ([[λx : τ1 .p (h x)]]∅,∅ ) = T ([[p]]∅,∅ ), which is equivalent to T ([[p]]∅,∅ ) = Ok . Altogether, we now have for every l1 ∈ [[[τ1 ]]]∅ , [[map τ1 τ2 h]]∅,∅ $ ([[t τ1 (λx : τ1 .p (h x))]]∅,∅ $ l1 ) = [[t τ2 p]]∅,∅ $ ([[map τ1 τ2 h]]∅,∅ $ l1 ), and thus for every term l with l : [τ1 ], [[map τ1 τ2 h (t τ1 (λx : τ1 .p (h x)) l)]]∅,∅ = [[t τ2 p (map τ1 τ2 h l)]]∅,∅ under the conditions that h is error-strict and errortotal and that T ([[p]]∅,∅ ) = Ok . This is the promised equivalence repair for (1).
308
4
F. Stenger and J. Voigtl¨ ander
Dealing with Exceptions, and beyond
So far, we have dealt with errors as events that lead a program to fail, without any possibility to manipulate them from inside the language itself, or to even recover from them. The accompanying technical report [8] shows how to deal with the Haskell primitive mapException, also discussed by Peyton Jones et al. [5]. Eventually, we want to cover full exception handling by (partially) modeling Haskell’s IO monad. An important move towards practical applicability would be the provision of static analyses that can successfully check for the conditions under which free theorems are now known to hold, so as to make them safely usable in a Haskell compiler. Clearly, conditions like error-strictness and error-totality are undecidable in general. But considering slightly stronger, tractable, conditions could be good enough in practice. In particular, it should be possible to leverage GHC’s strictness analyzer for also establishing error-strictness, and a sufficient check for error-totality is possible using the strategy employed by Xu et al. [10], namely symbolic evaluation plus syntactic safety, all ready for the taking in (a branch of) GHC. Acknowledgements. We would like to thank anonymous reviewers for their comments and suggestions.
References
1. Johann, P., Voigtländer, J.: Free theorems in the presence of seq. In: POPL, Proceedings, pp. 99–110. ACM Press, New York (2004)
2. Johann, P., Voigtländer, J.: A family of syntactic logical relations for the semantics of Haskell-like languages. Information and Computation 207(2), 341–368 (2009)
3. Moran, A., Lassen, S.B., Peyton Jones, S.L.: Imprecise exceptions, co-inductively. In: HOOTS, Proceedings. ENTCS, vol. 26, pp. 122–141. Elsevier, Amsterdam (1999)
4. Peyton Jones, S.L.: Tackling the awkward squad: monadic input/output, concurrency, exceptions, and foreign-language calls in Haskell. In: Proceedings of Marktoberdorf Summer School 2000, pp. 47–96. IOS Press, Amsterdam (2001)
5. Peyton Jones, S.L., Reid, A., Hoare, C.A.R., Marlow, S., Henderson, F.: A semantics for imprecise exceptions. In: PLDI, Proceedings, pp. 25–36. ACM Press, New York (1999)
6. Pitts, A.M.: Parametric polymorphism and operational equivalence. Mathematical Structures in Computer Science 10(3), 321–359 (2000)
7. Reynolds, J.C.: Types, abstraction and parametric polymorphism. In: Information Processing, Proceedings, pp. 513–523. Elsevier, Amsterdam (1983)
8. Stenger, F., Voigtländer, J.: Parametricity for Haskell with imprecise error semantics. Technical Report TUD-FI08-08, Technische Universität Dresden (2008)
9. Wadler, P.: Theorems for free! In: FPCA, Proceedings, pp. 347–359. ACM Press, New York (1989)
10. Xu, D.N., Peyton Jones, S.L., Claessen, K.: Static contract checking for Haskell. In: POPL, Proceedings, pp. 41–52. ACM Press, New York (2009)
Some Observations on the Proof Theory of Second Order Propositional Multiplicative Linear Logic

Lutz Straßburger

INRIA Saclay – Île-de-France — Équipe-projet Parsifal
École Polytechnique — LIX — Rue de Saclay — 91128 Palaiseau Cedex — France
http://www.lix.polytechnique.fr/Labo/Lutz.Strassburger/
Abstract. We investigate the question of what constitutes a proof when quantifiers and multiplicative units are both present. On the technical level this paper provides two new aspects of the proof theory of MLL2 with units. First, we give a novel proof system in the framework of the calculus of structures. The main feature of the new system is the systematic use of deep inference, which allows us to observe a decomposition that is a version of Herbrand's theorem not visible in the sequent calculus. Second, we introduce a new notion of proof nets which is independent from any deductive system. We have "sequentialisation" into the calculus of structures as well as into the sequent calculus. Since cut elimination is terminating and confluent, we obtain a category of MLL2 proof nets. The treatment of the units is such that this category is star-autonomous.
1 Introduction

The question of when two proofs are the same is important for proof theory and its applications. It comes down to the question of which information contained in a proof is essential, and which information is purely bureaucratic, due to the chosen deductive system. One of the first results in that direction is Herbrand's theorem, which allows a separation between the quantifiers and the propositional fragment of first order classical predicate logic. The work on expansion trees by Miller [1] shows how Herbrand's result can be generalized to higher order. In this paper we present a similar result for linear logic. Our work is motivated by the desire to eventually find a general treatment for the quantifiers, independent from the propositional fragment of the logic (see the related work by McKinley [2]). The first contribution of this paper is a presentation of MLL2 in the calculus of structures, which is a new deductive formalism using deep inference. That means that inferences are allowed anywhere deep inside a formula, very similar to what happens in term rewriting. As a consequence of this freedom we can show a decomposition theorem, which is not possible in the sequent calculus, and which can be seen as a version of Herbrand's theorem for MLL2. Secondly, we give a combinatorial presentation of MLL2 proofs that we call here proof nets (following the tradition) and that quotients away irrelevant rule permutations in the deductive systems (sequent calculus and calculus of structures). The identifications made by these proof nets are consistent with those for MLL (with units) made by star-autonomous categories [3,4,5]. The main motivation for these proof nets is to exhibit the precise relation between deep inference and

P.-L. Curien (Ed.): TLCA 2009, LNCS 5608, pp. 309–324, 2009. © Springer-Verlag Berlin Heidelberg 2009
  id: ⊢ a⊥, a
  1: ⊢ 1
  ⊥: from ⊢ Γ derive ⊢ ⊥, Γ
  ⅋: from ⊢ A, B, Γ derive ⊢ [A ⅋ B], Γ
  ⊗: from ⊢ Γ, A and ⊢ B, Δ derive ⊢ Γ, (A ⊗ B), Δ
  exch: from ⊢ Γ, A, B, Δ derive ⊢ Γ, B, A, Δ
  ∃: from ⊢ A⟨a\B⟩, Γ derive ⊢ ∃a.A, Γ
  ∀: from ⊢ A, Γ derive ⊢ ∀a.A, Γ   (a not free in Γ)

Fig. 1. Sequent calculus system for MLL2
the existing presentations of MLL2-proofs: sequent calculus, Girard's proof nets with boxes [6], and Girard's proof nets with jumps [7]. In particular, there is a close connection between the decomposition theorem in deep inference and the sequentialization of proof nets. Furthermore, our proof nets are the first to accommodate the quantifiers and the multiplicative units together without boxes. The proof nets proposed here are independent from the deductive system, i.e., we do not have the strong connection between links in the proof net and rule applications in the sequent calculus. However, we have "sequentialization" into the sequent calculus as well as into the calculus of structures. As expected, there is a confluent and terminating cut elimination procedure, and thus, the two-conclusion proof nets form a category.
2 MLL2 in the Sequent Calculus

Let us recall how MLL2 is presented in the sequent calculus. Let A = {a, b, c, . . .} be a countable set of propositional variables. Then the set F of formulas is generated by

  F ::= ⊥ | 1 | A | A⊥ | [F ⅋ F] | (F ⊗ F) | ∀A.F | ∃A.F

Formulas are denoted by capital Latin letters (A, B, C, . . .). Linear negation (−)⊥ is defined for all formulas by the De Morgan laws. Sequents are finite lists of formulas, separated by commas, and are denoted by capital Greek letters (Γ, Δ, . . .). The notions of free and bound variable are defined in the usual way, and we can always rename bound variables. In view of the later parts of the paper, and in order to avoid changing syntax all the time, we use the following syntactic conventions: (i) We always put parentheses around binary connectives. For better readability we use [. . .] for ⅋ and (. . .) for ⊗. (ii) We omit parentheses if they are superfluous under the assumption that ⅋ and ⊗ associate to the left, e.g., [A ⅋ B ⅋ C ⅋ D] abbreviates [[[A ⅋ B] ⅋ C] ⅋ D]. (iii) The scope of a quantifier ends at the earliest possible place (and not at the latest possible place as usual). This helps saving unnecessary parentheses. For example, in [∀a.(a ⊗ b) ⅋ ∃c.c ⅋ a], the scope of ∀a is (a ⊗ b), and the scope of ∃c is just c. In particular, the a at the end is free. The inference rules for MLL2 are shown in Figure 1. In the following, we will call this system MLL2Seq. As shown in [6], it has the cut elimination property:

2.1 Theorem. The cut rule

  cut: from ⊢ Γ, A and ⊢ A⊥, Δ derive ⊢ Γ, Δ

is admissible for MLL2Seq.
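The formula grammar and the De Morgan definition of linear negation can be rendered as a small Python AST. This is a sketch; the constructor names are our own, not the paper's:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class One: pass           # 1

@dataclass(frozen=True)
class Bot: pass           # ⊥

@dataclass(frozen=True)
class Var:                # a or a⊥
    name: str
    neg: bool = False

@dataclass(frozen=True)
class Par:                # [A ⅋ B]
    left: object
    right: object

@dataclass(frozen=True)
class Tens:               # (A ⊗ B)
    left: object
    right: object

@dataclass(frozen=True)
class All:                # ∀a.A
    var: str
    body: object

@dataclass(frozen=True)
class Ex:                 # ∃a.A
    var: str
    body: object

def neg(f):
    """Linear negation via De Morgan: 1⊥ = ⊥, (A ⅋ B)⊥ = (A⊥ ⊗ B⊥),
    (∀a.A)⊥ = ∃a.A⊥, and dually in each case."""
    if isinstance(f, One):  return Bot()
    if isinstance(f, Bot):  return One()
    if isinstance(f, Var):  return Var(f.name, not f.neg)
    if isinstance(f, Par):  return Tens(neg(f.left), neg(f.right))
    if isinstance(f, Tens): return Par(neg(f.left), neg(f.right))
    if isinstance(f, All):  return Ex(f.var, neg(f.body))
    return All(f.var, neg(f.body))    # Ex case
```

Since negation is defined only by the De Morgan laws, it is an involution on formulas, which the frozen-dataclass equality makes directly checkable.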
  ai↓: S{1} / S[a⊥ ⅋ a]
  ⊥↓: S{A} / S[⊥ ⅋ A]
  1↓: S{A} / S(1 ⊗ A)
  α↓: S[[A ⅋ B] ⅋ C] / S[A ⅋ [B ⅋ C]]
  σ↓: S[A ⅋ B] / S[B ⅋ A]
  ls: S([A ⅋ B] ⊗ C) / S[A ⅋ (B ⊗ C)]
  rs: S(A ⊗ [B ⅋ C]) / S[(A ⊗ B) ⅋ C]
  u↓: S{∀a.[A ⅋ B]} / S[∀a.A ⅋ ∃a.B]
  e↓: S{1} / S{∀a.1}
  n↓: S{A⟨a\B⟩} / S{∃a.A}
  f↓: S{∃a.A} / S{A}   (a not free in A)

(Each rule is written premise / conclusion.)

Fig. 2. Deep inference system for MLL2
3 MLL2 in the Calculus of Structures

We now present a deductive system for MLL2 based on deep inference. We use the calculus of structures, in which the distinction between formulas and sequents disappears. This is the reason for the syntactic conventions introduced above.¹ The inference rules work directly (as rewriting rules) on the formulas. The system for MLL2 is shown in Figure 2. There, S{ } stands for an arbitrary (positive) formula context. We omit the braces if the structural parentheses fill the hole. E.g., S[A ⅋ B] abbreviates S{[A ⅋ B]}. The system in Figure 2 is called MLL2DI↓. We consider here only the so-called down fragment of the system, which corresponds to the cut-free system in the sequent calculus.² Note that the ∀-rule of MLL2Seq is in MLL2DI↓ decomposed into three pieces, namely e↓, u↓, and f↓. We also need an explicit rule for associativity, which is "built in" in the sequent calculus. The relation between the ⊗-rule and the rules ls and rs (called left switch and right switch) has already been investigated in detail by several authors [13,14,15,9]. The following theorem ensures that MLL2DI↓ is indeed a deductive system for MLL2.

3.1 Theorem. For every proof of ⊢ A1, . . . , An in MLL2Seq, there is a proof of [A1 ⅋ · · · ⅋ An] in MLL2DI↓, and vice versa.

As for MLL2Seq, we also have for MLL2DI↓ the cut elimination property, which can be stated as follows:

3.2 Theorem. The cut rule i↑: S(A ⊗ A⊥) / S{⊥} is admissible for MLL2DI↓.

¹ In the literature on deep inference, e.g., [8,9], the formula (a ⊗ [b ⅋ (a⊥ ⊗ c)]) would be written as (a, [b, (a⊥, c)]), while without our convention it would be written as a ⊗ (b ⅋ (a⊥ ⊗ c)). Our convention can therefore be seen as an attempt to please both communities. In particular, note that the motivation for the syntactic convention (iii) above is the collapse of the ⅋ on the formula level and the comma on the sequent level, e.g., [∀a.(a ⊗ b) ⅋ ∃c.c ⅋ a] is the same as [∀a.(a, b), ∃c.c, a].
² The up fragment (which corresponds to the cut in the sequent calculus) is obtained by dualizing the rules in the down fragment, i.e., by negating and exchanging premise and conclusion. See, e.g., [10,11,8,12] for details.
L. Straßburger

Each rule is written here as premise ⟶ conclusion:

  x:    S{∃a.∀b.A} ⟶ S{∀b.∃a.A}
  y↓:   S{∃a.∃b.A} ⟶ S{∃b.∃a.A}
  v↓:   S{∃a.[A ⅋ B]} ⟶ S[∃a.A ⅋ ∃a.B]
  w↓:   S{∃a.(A ⊗ B)} ⟶ S(∃a.A ⊗ ∃a.B)
  1f↓:  S{∃a.1} ⟶ S{1}
  ⊥f↓:  S{∃a.⊥} ⟶ S{⊥}
  af↓:  S{∃a.b} ⟶ S{b}
  âf↓:  S{∃a.b⊥} ⟶ S{b⊥}

  (in af↓ and âf↓, a is different from b)
Fig. 3. Towards a local system for MLL2

We write

    A
    ‖ D   MLL2DI↓
    B

to denote a derivation D in MLL2DI↓ with premise A and conclusion B. The following decomposition theorem for MLL2DI↓ can be seen as a version of Herbrand's theorem for MLL2 and has no counterpart in the sequent calculus.

3.3 Theorem. Every proof

    1
    ‖ D   MLL2DI↓
    C

can be transformed into

    1
    ‖ D1   {ai↓, ⊥↓, 1↓, e↓}
    A
    ‖ D2   {α↓, σ↓, ls, rs, u↓}
    B
    ‖ D3   {n↓, f↓}
    C
This decomposition is obtained by permuting all instances of ai↓, ⊥↓, 1↓, e↓ up and permuting all instances of n↓, f↓ down. There are two versions of the "switch" in MLL2DI↓: the left switch ls and the right switch rs. For Theorem 3.1 the ls-rule alone would be sufficient, but for obtaining the decomposition in Theorem 3.3 we also need the rs-rule. If a derivation D uses only the rules α↓, σ↓, ls, rs, u↓, then the premise and the conclusion of D (and every formula in between the two) must contain the same atom occurrences. Hence the atomic flow-graph [16,17] of the derivation D defines a bijection between the atom occurrences of the premise and the conclusion of D. Here is an example of such a derivation (we leave some applications of α↓ and σ↓ implicit; the flow-graph connects each atom occurrence of the premise to the corresponding one in the conclusion):

    ∀a.∀c.([a⊥ ⅋ a] ⊗ [c⊥ ⅋ c])
    ───────────────────────────────────── ls
    ∀a.∀c.[a⊥ ⅋ (a ⊗ [c⊥ ⅋ c])]
    ───────────────────────────────────── rs
    ∀a.∀c.[a⊥ ⅋ [(a ⊗ c⊥) ⅋ c]]
    ───────────────────────────────────── u↓
    ∀a.[∃c.a⊥ ⅋ ∀c.[(a ⊗ c⊥) ⅋ c]]
    ───────────────────────────────────── u↓
    ∀a.[∃c.a⊥ ⅋ [∃c.(a ⊗ c⊥) ⅋ ∀c.c]]
    ───────────────────────────────────── u↓
    [∀a.∃c.a⊥ ⅋ ∃a.[∃c.(a ⊗ c⊥) ⅋ ∀c.c]]          (1)
In the sequent calculus the ∀-rule has a non-local behavior, in the sense that for applying the rule we need some global knowledge about the context Γ, namely, that the variable a does not appear freely in it. This is the reason for the boxes in [6] and the jumps in [7]. In the calculus of structures this "checking" whether a variable appears freely is done in the rule f↓, which is as non-local as the ∀-rule in the sequent calculus. However, with deep inference this rule can be made local, i.e., reduced to an atomic version (in the same sense as the identity axiom can be reduced to an atomic version). For this we need an additional set of rules, shown in Figure 3 (again, we show only the down fragment), which is called Lf↓. Clearly, all these rules are sound, i.e., proper implications of MLL2. Now we have the following:

3.4 Theorem. Every derivation

    B
    ‖ D   {n↓, f↓}
    C

can be transformed into a derivation

    B
    ‖ D′   {n↓} ∪ Lf↓
    C

and vice versa.
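As an illustration of how the rules of Figure 3 replace one non-local f↓ step by local ones, here is a hedged LaTeX sketch (assuming a is different from both b and c, and \parr from the cmll package), read top-down:

```latex
% Reducing a single f-down step to atomic form using the rules of Fig. 3.
% Each line follows from the previous one by the rule named on the left.
\begin{align*}
                               & \exists a.\,(b \otimes [c \parr c^{\bot}]) \\
 w{\downarrow}:\quad           & (\exists a.\, b \otimes \exists a.\, [c \parr c^{\bot}]) \\
 af{\downarrow}:\quad          & (b \otimes \exists a.\, [c \parr c^{\bot}]) \\
 v{\downarrow}:\quad           & (b \otimes [\exists a.\, c \parr \exists a.\, c^{\bot}]) \\
 af{\downarrow}:\quad          & (b \otimes [c \parr \exists a.\, c^{\bot}]) \\
 \widehat{af}{\downarrow}:\quad& (b \otimes [c \parr c^{\bot}])
\end{align*}
```

The single f↓ step from ∃a.(b ⊗ [c ⅋ c⊥]) to (b ⊗ [c ⅋ c⊥]) is thus decomposed into steps whose side conditions only compare two variable names at a time.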
4 Proof Nets for MLL2

For defining proof nets for MLL2, we follow the ideas presented in [18,5], where the axiom linking of multiplicative proof nets has been replaced by a linking formula to accommodate the units 1 and ⊥. In such a linking formula, the ordinary axiom links are replaced by ⊗-nodes, which are then connected by ⅋s. A unit can then be attached to a sublinking by another ⊗, and so on. Here we extend the syntax of linking formulas by an additional construct to accommodate the quantifiers. Now the set L of linking formulas is generated by the grammar

L ::= ⊥ | (A ⊗ A⊥) | (1 ⊗ L) | [L ⅋ L] | ∃A.L

In [18,5] a proof net consists of the sequent forest and the linking formula. The presence of the quantifiers, in particular the presence of instantiation and substitution, makes it necessary to expand the structure of the sequent in the proof net. The set E of expanded formulas³ is generated by

E ::= ⊥ | 1 | A | A⊥ | [E ⅋ E] | (E ⊗ E) | ∀A.E | ∃A.E | ǝA.E | ƎA.E

There are only two additional syntactic primitives: the ǝ, called the virtual existential quantifier, and the Ǝ, called the bold existential quantifier. An expanded sequent is a finite list of expanded formulas, separated by commas. We denote expanded sequents by capital Greek letters (Γ, Δ, . . .). For disambiguation, the formulas/sequents introduced in Section 2 (i.e., those without ǝ and Ǝ) will also be called simple formulas/sequents. In the following we identify formulas with their syntax trees, where the leaves are decorated by elements of A ∪ A⊥ ∪ {1, ⊥}. We can think of the inner nodes as decorated either with the connectives/quantifiers ⊗, ⅋, ∀a, ∃a, ǝa, Ǝa, or with the whole subformula rooted at that node. For this reason we will use capital Latin letters (A, B, C, . . .) to denote nodes in a formula tree. We write A ≤ B if A is a (not necessarily proper) ancestor of B, i.e., B is a subformula occurrence in A. We also write Γ (resp. A) for the set of leaves of a sequent Γ (resp. a formula A).

³ This is almost the same structure as Miller's expansion trees [1]. The idea is to code a formula and its "expansion" together in the same syntactic object. But our case is simpler than in [1] because we do not have to deal with duplication.
4.1 Definition. A stretching σ for a sequent Γ consists of two binary relations σ⁺ and σ⁻ on the set of nodes of Γ (i.e., its subformula occurrences) such that σ⁺ and σ⁻ are disjoint.

A stretching consists of edges connecting Ǝ-nodes with some of their subformulas, and these edges can be positive or negative. Their purpose is to mark the places of the substitution of the atoms quantified by the Ǝ. When writing an expanded sequent Γ with a stretching σ, denoted by Γ ▸ σ, we draw these edges either inside Γ when it is written as a tree, or below Γ when it is written as a string. The positive edges are dotted and the negative ones are dashed. Examples are shown in Figures 4, 5, and 6 below. If A is a node in Γ, we write σA to denote the restriction of σ to A. The virtue of second-order MLL is the possibility of substitution and instantiation, which is the raison d'être of the expansion via Ǝ and ǝ.
4.2 Definition. For an expanded formula E and a stretching σ, we define the ceiling and the floor⁴, denoted by ⌈E⌉σ and ⌊E⌋σ respectively, to be simple formulas, inductively defined as follows.

Ceiling:

  ⌈1⌉∅ = 1    ⌈⊥⌉∅ = ⊥    ⌈a⌉∅ = a    ⌈a⊥⌉∅ = a⊥
  ⌈[A ⅋ B]⌉σ = [⌈A⌉σA ⅋ ⌈B⌉σB]
  ⌈(A ⊗ B)⌉σ = (⌈A⌉σA ⊗ ⌈B⌉σB)
  ⌈∀a.A⌉σ = ∀a.⌈A⌉σ
  ⌈∃a.A⌉σ = ∃a.⌈A⌉σ
  ⌈ǝa.A⌉σ = ∃a.⌈A⌉σ
  ⌈Ǝa.A⌉σ = ⌈A⌉σA

Floor:

  ⌊1⌋∅ = 1    ⌊⊥⌋∅ = ⊥    ⌊a⌋∅ = a    ⌊a⊥⌋∅ = a⊥
  ⌊[A ⅋ B]⌋σ = [⌊A⌋σA ⅋ ⌊B⌋σB]
  ⌊(A ⊗ B)⌋σ = (⌊A⌋σA ⊗ ⌊B⌋σB)
  ⌊∀a.A⌋σ = ∀a.⌊A⌋σ
  ⌊∃a.A⌋σ = ∃a.⌊A⌋σ
  ⌊ǝa.A⌋σ = ⌊A⌋σ
  ⌊Ǝa.A⌋σ = ∃a.⌊Ã⌋σÃ

The expanded formula Ã in the last line is obtained from A as follows: for every node B with A ≤ B and Ǝa.A σ⁺ B, remove the whole subtree B and replace it by a; and for every B with Ǝa.A σ⁻ B, replace B by a⊥.

Note that ceiling and floor of an expanded sequent Γ differ from Γ only on Ǝ and ǝ. In the ceiling, the ǝ is treated as an ordinary ∃, and the Ǝ is completely ignored. In the floor, the ǝ is ignored, and the Ǝ uses the information of the stretching to "undo the substitution". To provide this information on the location is the purpose of the stretching. To ensure that we really only "undo the substitution", instead of doing something weird, we need some further constraints, which are given by Definition 4.3 below.

Given Γ ▸ σ and nodes A, B in Γ, we write A ▷ B if either A is an Ǝ-node and there is a stretching edge from A to B, or A is an ordinary quantifier node with A ≤ B and B is the variable (or its negation) that is bound by A in ⌊A⌋σA.

4.3 Definition. A pair Γ ▸ σ is appropriate if the following three conditions hold:
1. If A σ⁺ B and A σ⁺ C, then ⌊B⌋σB = ⌊C⌋σC;
   if A σ⁻ B and A σ⁻ C, then ⌊B⌋σB = ⌊C⌋σC;
   if A σ⁺ B and A σ⁻ C, then ⌊B⌋σB = (⌊C⌋σC)⊥.

⁴ Note the close correspondence to Miller's expansion trees [1], where these two functions are called Deep and Shallow, respectively.
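As a small worked example of Definition 4.2 (a hedged illustration, not from the paper; \boldsymbol{\exists} stands for the bold existential quantifier Ǝ), assume a stretching σ with a positive edge from Ǝa to the leaf b and a negative edge from Ǝa to b⊥. The ceiling forgets the Ǝ, while the floor undoes the substitution of b for a:

```latex
% Ceiling and floor of an expanded formula under an assumed stretching
% with a positive edge to b and a negative edge to b^\bot.
\begin{align*}
 \lceil  \boldsymbol{\exists} a.\,(b \otimes b^{\bot}) \rceil_{\sigma}
   &= (b \otimes b^{\bot}) \\
 \lfloor \boldsymbol{\exists} a.\,(b \otimes b^{\bot}) \rfloor_{\sigma}
   &= \exists a.\,(a \otimes a^{\bot})
\end{align*}
```

Here the pair is appropriate in the sense of Definition 4.3: both edges point to dual atoms, so Condition 1 is satisfied.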
Fig. 4. Examples of expanded sequents with stretchings that are not appropriate: Ǝc.[(a ⊗ b) ⅋ a⊥], Ǝa.∀b.[b⊥ ⅋ b], and Ǝc.ǝa.([a ⅋ a⊥] ⊗ b⊥), each drawn with its stretching edges in the original figure.

Fig. 5. Appropriate examples of expanded sequents with stretchings: Ǝc.[(a ⊗ b) ⅋ a⊥], ∀b.Ǝa.[b⊥ ⅋ b], and ǝa.Ǝc.([a ⅋ a⊥] ⊗ b⊥), each drawn with its stretching edges in the original figure.
2. If A1 ▷ B1 and A2 ▷ B2 and A1 ≤ A2 and B1 ≤ B2, then B1 ≤ A2.
3. For every node ǝa.A, the variable a must not occur free in the formula ⌊A⌋σA.

The first condition above says that in a substitution a variable is instantiated everywhere by the same formula B. The second condition ensures that there is no variable capture in such a substitution step. The third condition is exactly the side condition of the rule f↓ in Figure 2. We show in Figure 4 three examples of pairs Γ ▸ σ that are not appropriate: the first fails Condition 1, the second fails Condition 2, and the third fails Condition 3. All three examples in Figure 5 are appropriate. In [6] and [7], the first two conditions of Definition 4.3 appear only implicitly, without being mentioned, in the treatment of the ∃-rule. But for capturing the essence of a proof independently of a deductive system, we have to make everything explicit.

4.4 Definition. A pre-proof graph is a quadruple, denoted by ⟨P, ν, Γ, σ⟩, where P is a linking formula, Γ is an expanded sequent, σ is a stretching for Γ, and ν is a bijection from the leaves of Γ to the leaves of P such that only dual atoms/units are paired up. If Γ is simple, we say that the pre-proof graph is simple; in this case σ is empty, and we can simply write ⟨P, ν, Γ⟩.
For B ∈ Γ we write Bν for its image under ν in P. When we draw a pre-proof graph ⟨P, ν, Γ, σ⟩, we write P above Γ (as trees or as strings) and connect the leaves by edges according to ν. Figure 6 shows an example written in both ways.

4.5 Definition. A switching s of a pre-proof graph ⟨P, ν, Γ, σ⟩ is the graph that is obtained from the whole of ⟨P, ν, Γ, σ⟩ by removing all stretching edges and by removing
Fig. 6. Two ways of writing a proof graph (as trees and as strings): the linking P = ∃c.[(c ⊗ c⊥) ⅋ (c ⊗ c⊥) ⅋ [(a ⊗ a⊥) ⅋ (a ⊗ a⊥) ⅋ (1 ⊗ (a ⊗ a⊥))]] is written above the expanded sequent Γ = ǝc.Ǝd.(c⊥ ⊗ c⊥), Ǝa.(∀c.[c ⅋ c] ⊗ Ǝc.(a⊥ ⊗ a⊥) ⊗ ⊥), [a ⅋ a ⅋ [a⊥ ⅋ a]], with the leaves connected according to ν and the stretching edges drawn below.
for each ⅋-node one of the two edges connecting it to its children. A pre-proof graph ⟨P, ν, Γ, σ⟩ is multiplicatively correct if all its switchings are acyclic and connected [19].

For multiplicative correctness the quantifiers are treated as unary connectives and are therefore completely irrelevant. The example in Figure 6 is multiplicatively correct. For involving the quantifiers in a correctness criterion, we need some more conditions. Let s be a switching for ⟨P, ν, Γ⟩, and let A and B be two nodes in Γ. We write A ⌣s B if there is a path in s from A to B that leaves A by going down to its parent and enters B from below. The three analogous notations, where the path leaves A or enters B from above, are defined similarly.

Let A and B be nodes in Γ with A ≤ B. The quantifier depth of B in A, denoted by qd_A(B), is the number of quantifier nodes on the path from A to B (including A if it happens to be an ∀ or an ∃, but not including B). Similarly we define qd_Γ(B). For quantifier nodes A′ in P and A in Γ, we say that A′ and A are partners, denoted by A′ ↔ A, if there is a leaf B ∈ Γ with A ≤ B in Γ and A′ ≤ Bν in P, and qd_A(B) = qd_{A′}(Bν).
4.6 Definition. We say a simple pre-proof graph ⟨P, ν, Γ⟩ is well-nested if the following five conditions are satisfied:
1. For every B ∈ Γ, we have qd_Γ(B) = qd_P(Bν).
2. If A′ ↔ A, then A′ and A quantify the same variable.
3. For every quantifier node A in Γ there is exactly one ∃-node A′ in P with A′ ↔ A.
4. For every ∃-node A′ in P there is exactly one ∀-node A in Γ with A′ ↔ A.
5. If A′ ↔ A1 and A′ ↔ A2, then there is no switching s with A1 ⌣s A2.

Thus, every quantifier node in P must be an ∃, and every quantifier node in Γ has exactly one of them as partner. On the other hand, an ∃ in P can have many partners in Γ, but exactly one of them has to be an ∀. Following Girard [6], we can call an ∃ in P together with its partners in Γ the doors of an ∀-box; the sub-graph induced by the nodes that have such a door as ancestor is called the ∀-box associated to the unique ∀-door. Even if the boxes are not really present, we can use this terminology to relate our work to Girard's. In order to help the reader understand these five conditions, we show in Figure 7 six simple pre-proof graphs, where the first fails Condition 1, the second one fails Condition 2, and so on; only the sixth one is well-nested.

(1) ∃a.∃c.[(a ⊗ a⊥) ⅋ (c ⊗ c⊥)]  over  ∃c.a⊥, ∀a.[∃c.(a ⊗ c⊥) ⅋ ∀c.c]
(2) ∃a.∃c.[(a ⊗ a⊥) ⅋ (c ⊗ c⊥)]  over  ∀a.∃b.a⊥, ∃a.[∃d.(a ⊗ c⊥) ⅋ ∀c.c]
(3) ∃a.[∃c.(a ⊗ a⊥) ⅋ ∃c.(c ⊗ c⊥)]  over  ∀a.∃c.a⊥, ∃a.[∃c.(a ⊗ c⊥) ⅋ ∀c.c]
(4) ∃a.∃c.[(a ⊗ a⊥) ⅋ (c ⊗ c⊥)]  over  ∃a.∀c.a⊥, ∃a.[∃c.(a ⊗ c⊥) ⅋ ∀c.c]
(5) ∃a.∃c.[(a ⊗ a⊥) ⅋ (c ⊗ c⊥)]  over  ∀a.∃c.a⊥, ∃a.[(∃c.a ⊗ ∃c.c⊥) ⅋ ∀c.c]
(6) ∃a.∃c.[(a ⊗ a⊥) ⅋ (c ⊗ c⊥)]  over  ∀a.∃c.a⊥, ∃a.[∃c.(a ⊗ c⊥) ⅋ ∀c.c]

Fig. 7. Examples (1)–(5) are not well-nested; only (6) is well-nested.
Every quantifier node in P must be an ∃, and every quantifier node in Γ has exactly one of them as partner. On the other hand, an ∃ in P can have many partners in Γ , but exactly one of them has to be an ∀. Following Girard [6], we can call an ∃ in P together with its partners in Γ the doors of an ∀-box and the sub-graph induced by the nodes that have such a door as ancestor is called the ∀-box associated to the unique ∀-door. Even if the boxes are not really present, we can use the terminology to relate our work to Girard’s. In order to help the reader to understand these five conditions, we show in Figure 7 six simple pre-proof graphs, where the first fails Condition 1, the second one fails Condition 2, and so on; only the sixth one is well-nested. 4.7 Definition. We say that a pre-proof graph P Γ σ is correct if the pair Γ σ is ν appropriate and the simple pre-proof graph P Γ σ is well-nested and multiplicaν tively correct. In this case we say that P Γ σ is a proof graph and Γ σ is its conclusion. ν
The example in Figure 6 is correct. There we have that Γ σ is the simple sequent ∃c.(c⊥ ⊗ c⊥ ), (∀c.[c c] ⊗ (a⊥ ⊗ a⊥ ) ⊗ ⊥), [a a [a⊥ a]] and the conclusion Γ σ is ∃d.(d ⊗ d), ∃a.(a⊥ ⊗ a ⊗ ⊥), [a a [a⊥ a]] . As said before, due to the presence of the multiplicative units (see [18,5]), we need to enforce an equivalence relation on proof graphs. 4.8 Definition. Let ∼ be the smallest equivalence on proof graphs satisfying ν
  ⟨P[Q ⅋ R], ν, Γ, σ⟩          ∼  ⟨P[R ⅋ Q], ν, Γ, σ⟩
  ⟨P[[Q ⅋ R] ⅋ S], ν, Γ, σ⟩    ∼  ⟨P[Q ⅋ [R ⅋ S]], ν, Γ, σ⟩
  ⟨P(1 ⊗ (1 ⊗ Q)), ν, Γ, σ⟩    ∼  ⟨P(1 ⊗ (1 ⊗ Q)), ν′, Γ, σ⟩
  ⟨P(1 ⊗ [Q ⅋ R]), ν, Γ, σ⟩    ∼  ⟨P[(1 ⊗ Q) ⅋ R], ν, Γ, σ⟩
  ⟨P(1 ⊗ ∃a.Q), ν, Γ{⊥}, σ⟩    ∼  ⟨P{∃a.(1 ⊗ Q)}, ν, Γ{ǝa.⊥}, σ⟩
where in the third line ν′ is obtained from ν by exchanging the preimages of the two 1s; in all other equations the bijection ν does not change. In the last line ν must match the 1 and the ⊥. A proof net is an equivalence class of ∼.

The first two equations in Definition 4.8 are simply associativity and commutativity of ⅋ inside the linking. The third is a version of associativity of ⊗. The fourth equation could destroy multiplicative correctness, but since we defined ∼ only on proof graphs, we do not need to worry about that.⁵ The last equation says that a ⊥ can freely tunnel through the border of a box. Let us emphasize that this quotienting via an equivalence is due to the multiplicative units: if one wishes to use a system without units, one can dispense with the equivalence altogether by using n-ary ⅋s in the linking.
5 Sequentialisation
In this section we discuss how proofs in the sequent calculus and in the calculus of structures can be translated into proof nets and back. Let us begin with the sequent calculus. The translation of MLL2Seq proofs into proof graphs is done inductively on the structure of the sequent proof, as shown in Figure 8. For the rules id and 1, this is trivial (ν0 and ν1 are uniquely determined, and the stretching is empty). In the rule ⊥, the bijection ν⊥ is obtained from ν by adding an edge between the new 1 and the new ⊥. The exch- and ⅋-rules are also rather trivial (P, ν, and σ remain unchanged). For the ⊗-rule, the two linkings are connected by a new ⅋-node, and the two principal formulas are connected by a ⊗ in the sequent forest. The same is done for the cut rule, where we use a special cut connective. The two interesting rules are the ones for ∀ and ∃. In the ∀-rule, a quantifier node is attached to every root node of the proof graph for the premise. This is what ensures the well-nestedness condition; it can be compared to Girard's putting a box around a proof net, and the ǝ can be interpreted as simulating the border of the box. The ∃-rule is the only one where
The translation, rule by rule (writing ⟨P, ν, Γ, σ⟩ for a pre-proof graph):

  id:   the axiom ⊢ a⊥, a becomes the linking (a ⊗ a⊥) over the sequent a⊥, a (with ν0 uniquely determined and empty stretching).
  1:    the axiom ⊢ 1 becomes the linking 1 over the sequent 1 (with ν1 uniquely determined and empty stretching).
  ⊥:    ⟨P, ν, Γ, σ⟩ becomes ⟨(1 ⊗ P), ν⊥, (Γ, ⊥), σ⟩.
  exch: ⟨P, ν, (Γ, A, B, Δ), σ⟩ becomes ⟨P, ν, (Γ, B, A, Δ), σ⟩.
  ⅋:    ⟨P, ν, (A, B, Γ), σ⟩ becomes ⟨P, ν, ([A ⅋ B], Γ), σ⟩.
  ⊗:    ⟨P, ν, (Γ, A), σ⟩ and ⟨Q, ν′, (B, Δ), τ⟩ become ⟨[P ⅋ Q], ν ∪ ν′, (Γ, (A ⊗ B), Δ), σ ∪ τ⟩.
  cut:  ⟨P, ν, (Γ, A), σ⟩ and ⟨Q, ν′, (A⊥, Δ), τ⟩ become ⟨[P ⅋ Q], ν ∪ ν′, (Γ, A ⋈ A⊥, Δ), σ ∪ τ⟩, where ⋈ stands for the special cut connective.
  ∀:    ⟨P, ν, (A, B1, …, Bn), σ⟩ becomes ⟨∃a.P, ν, (∀a.A, ǝa.B1, …, ǝa.Bn), σ⟩.
  ∃:    ⟨P, ν, (Γ, A⟨a\B⟩), σ⟩ becomes ⟨P, ν, (Γ, Ǝa.A⟨a\B⟩), σ′⟩, where σ′ adds the new stretching edges described in the text.

Fig. 8. Translating sequent calculus proofs into proof nets
⁵ In [18,5] the relation ∼ is defined on pre-proof graphs, and therefore a side condition had to be given to that equation (see also [20]).
the stretching σ is changed. As shown in Figure 1, in the conclusion of that rule the subformula B of A is replaced by the quantified variable a. When translating this rule into proof graphs, we keep the B, but to every place where it has to be substituted we add a positive stretching edge from the new Ǝa. Similarly, whenever a B⊥ should be replaced by a⊥, we add a negative stretching edge. The new stretching is σ′.

A pre-proof graph is SC-sequentializable if it can be obtained from a sequent proof as described above. If a pre-proof graph ⟨P, ν, Γ, σ⟩ is obtained this way, then the simple sequent ⌊Γ⌋σ is exactly the conclusion of the sequent proof we started from.

5.1 Theorem. Every SC-sequentializable pre-proof graph is a proof graph.

For the other direction, i.e., for going from proof graphs to MLL2Seq proofs, we need to consider two linking formulas P1 and P2 to be equivalent modulo associativity and commutativity of ⅋; we write this as P1 ∼ P2. Then we have to remove all ∃-nodes from Γ in order to get a sequentialization theorem, because the translation shown in Figure 8 never introduces an ∃-node in Γ. For this we replace in Γ every ∃a.A by ǝa.Ǝa.A, adding a stretching edge between the new Ǝa and every a and a⊥ that was previously bound by the ∃a (i.e., that occurs free in A). Let us write Γ̄ ▸ σ̄ for the result of this modification applied to Γ ▸ σ.
5.2 Theorem. If ⟨P, ν, Γ, σ⟩ is correct, then there is a P′ ∼ P such that ⟨P′, ν, Γ̄, σ̄⟩ is SC-sequentializable.

The proof works in the usual way by induction on the size of ⟨P, ν, Γ, σ⟩. It is a combination of the sequentialization proofs in [5] and [6], and it makes crucial use of the "splitting tensor lemma", which in our case also needs well-nestedness. Let us now discuss the translation between proof nets and derivations in the calculus of structures. This can be done in a more modular way than for the sequent calculus.
5.3 Proposition. An MLL2 formula P is a linking formula if and only if there is a derivation

    1
    ‖ D   {ai↓, ⊥↓, 1↓, e↓}          (2)
    P⊥

5.4 Lemma. Let P1 and P2 be two linkings. Then there is a derivation

    P1
    ‖ D   {α↓, σ↓, rs}
    P2

if and only if the simple pre-proof graph ⟨P2, ν, P1⊥⟩ is correct. If P1 and P2 have this property, we say that P1 is weaker than P2, and denote this by P1 ⪯ P2. We can now characterize simple proof graphs in terms of deep inference:
5.5 Proposition. A simple pre-proof graph ⟨P, ν, Γ⟩ is correct if and only if there is a linking P′ with P′ ⪯ P and a derivation

    P′⊥
    ‖ D   {α↓, σ↓, ls, rs, u↓}          (3)
    Γ

such that ν coincides with the bijection induced by the flow graph of D. As an example, consider the derivation in (1), which corresponds to (6) in Figure 7. Finally, we characterize appropriate pairs Γ ▸ σ in terms of deep inference.

5.6 Proposition. For every derivation

    D
    ‖   {n↓, f↓}          (4)
    C

there is an appropriate pair Γ ▸ σ with

    D = ⌈Γ⌉σ   and   C = ⌊Γ⌋σ.          (5)

Conversely, if Γ ▸ σ is appropriate, then there is a derivation (4) with (5).
We can explain the idea of this proposition by considering again the examples in Figures 4 and 5. To the non-appropriate examples in Figure 4 would correspond the following incorrect derivations:

    [(a ⊗ b) ⅋ a⊥]          ∀b.[b⊥ ⅋ b]            ∃a.([a ⅋ a⊥] ⊗ b⊥)
    ────────────── n↓       ───────────── n↓       ────────────────── f↓
    ∃c.[c ⅋ c⊥]             ∃a.∀b.[a ⅋ b]          ([a ⅋ a⊥] ⊗ b⊥)
                                                   ────────────────── n↓
                                                   ∃c.(c ⊗ b⊥)

And to the appropriate examples in Figure 5 correspond the following correct derivations:

    [(a ⊗ b) ⅋ a⊥]          ∀b.[b⊥ ⅋ b]            ∃a.([a ⅋ a⊥] ⊗ b⊥)
    ───────────────── n↓    ───────────── n↓       ────────────────── n↓
    ∃c.[(c ⊗ b) ⅋ c⊥]       ∀b.∃a.[a ⅋ b]          ∃a.∃c.(c ⊗ b⊥)
                                                   ────────────────── f↓
                                                   ∃c.(c ⊗ b⊥)

We can now easily translate an MLL2DI↓ proof into a pre-proof graph by first decomposing it via Theorem 3.3 and then applying Propositions 5.3, 5.5, and 5.6. Let us call a pre-proof graph DI-sequentializable if it is obtained in this way from an MLL2DI↓ proof.

5.7 Theorem. Every DI-sequentializable pre-proof graph is a proof graph.

By the method presented in [21], it is also possible to translate an MLL2DI↓ proof directly into a proof graph, without prior decomposition. However, the decomposition is the key for the translation from proof graphs into MLL2DI↓ proofs (i.e., for "sequentialization" into the calculus of structures). Propositions 5.3, 5.5, and 5.6 give us the following:
5.8 Theorem. If ⟨P, ν, Γ, σ⟩ is correct, then there is a P′ ⪯ P such that ⟨P′, ν, Γ, σ⟩ is DI-sequentializable.
There is an important difference between the two sequentializations. While for the sequent calculus we have a monolithic procedure reducing the proof graph node by node, for the calculus of structures we have a modular procedure that treats the different parts of the proof graph (which correspond to the three different aspects of the logic) separately. The core is Proposition 5.5, which deals with the purely multiplicative part. Then comes Proposition 5.6, which deals only with instantiation and substitution, i.e., the second-order aspect. Finally, Proposition 5.3 takes care of the linking, whose task is to describe the role of the units in the proof. Therefore the equivalence in Definition 4.8, which is due to the mobility of the ⊥, deals only with the linkings. This modularity in the sequentialization is possible because of the decomposition in Theorem 3.3. And because of this modularity, we treated the units via linking formulas [18,5] instead of a linking function, as done by Hughes in [22,20].
6 Comparison to Girard's Proof Nets for MLL2

Such a comparison can only make sense for MLL2−, i.e., the logic without the units 1 and ⊥: in [7] the units are not considered, and in [6] they are treated in a way that is completely different from the one suggested here. Consequently, in this section we consider only proof nets without any occurrences of 1 and ⊥. For simplicity, we allow n-ary ⅋s in the linkings, so that we can discard the equivalence relation of Definition 4.8 and identify proof graphs and proof nets.

The translation from our proof nets to Girard's boxed proof nets of [6] is immediate: if ⟨P, ν, Γ, σ⟩ is a given proof net, then (1) for each ∃ in P, draw a box around the sub-proof-net whose doors are this ∃ and its partners in Γ; (2) replace in Γ every node A that is not a ǝ by its floor ⌊A⌋σ, and remove all stretching edges and all ǝ-nodes; and finally (3) remove all ∃- and all ⅋-nodes in P, and replace the ⊗-nodes in P by axiom links. For the converse translation we proceed in the opposite order. It is clear that in both directions correctness is preserved, i.e., the two criteria are equivalent. Both data structures contain the same information. However, Girard's boxed proof nets depend on the deductive structure of the sequent calculus: a box stands for the global view that the ∀-rule has in the sequent calculus, and the ∃-link is attached to its full premise and conclusion, which are subject to the same side conditions as in the sequent calculus. The new proof nets presented in this paper make these side conditions explicit in the data structure, which is the reason why our definitions are a bit longer than Girard's. The proof nets of [7] are obtained from the boxed proof nets by simply removing the boxes. In our setting this is equivalent to removing all ∃-nodes in P and all ǝ-nodes in Γ. Hence, this new data structure contains less information.
This raises the question whether the other two representations contain redundant data, or whether Girard's box-free proof nets make more identifications, and whether the missing data can be recovered. The answer is that the proof nets of [7] do indeed make more proof identifications. For example, the following two proof nets of ⊢ ∀a.a, (∃b.b ⊗ [c ⅋ c⊥]) would be identified:

    ∃a.[(a⊥ ⊗ a) ⅋ (c⊥ ⊗ c)]                  [∃a.(a⊥ ⊗ a) ⅋ (c⊥ ⊗ c)]
    ∀a.a, ǝa.(Ǝb.a⊥ ⊗ [c ⅋ c⊥])     and       ∀a.a, (ǝa.Ǝb.a⊥ ⊗ [c ⅋ c⊥])          (6)

When translating back to boxed nets, we must for each ∀-link introduce a box around its whole empire. This can be done because a proof net does not lose its correctness if a ∀-box is extended to a larger (correct) subnet, provided the bound variable does not occur freely in the new scope. In [7], Girard avoids this by variable renaming. The reason why this gives unique representants is the stability and uniqueness of empires in MLL− proof nets. However, as already noted in [5], in the presence of the units empires are no longer stable, i.e., due to the mobility of the ⊥, the empire of a ∀-node might be different in different proof graphs representing the same proof net. Another reason for not using the solution of [7] is the desire to find a treatment of the quantifiers that is independent from the underlying propositional structure, i.e., that is also applicable to classical logic. While Girard's nets are tightly connected to the structure of MLL− proof nets, our presentation is closely related to Miller's expansion trees [1] and the recent development by McKinley [2]. Thus, we can hope for a unified treatment of quantifiers in classical and linear logic.
7 Concluding Remarks

We have investigated the relation between deep inference, proof nets, and the sequent calculus for MLL2, and we have shown that this relation is much closer than one might expect. We did not go into the details of cut elimination, because from the previous sections it should be clear that everything works as laid out in [6,7] and [5,18]. There are no technical surprises, and we have a confluent and terminating cut elimination procedure for our proof nets. An important consequence is that we have a category of proof nets: the objects are (simple) formulas, and a map A → B is a proof net with conclusion ⊢ A⊥, B, where the composition of maps is defined by cut elimination. A detailed investigation of this category (which is *-autonomous [5]) has to be postponed to future research.

The proof identifications made in this paper are motivated by the interplay between proof nets, the calculus of structures, and the sequent calculus. They should not be considered the final word. For example, the proof nets by Girard [7] make more identifications, and the ones by Hughes [22] make fewer. However, there are some observations about the units to be made here. The units can be expressed with the second-order quantifiers via 1 ≡ ∀a.[a⊥ ⅋ a] and ⊥ ≡ ∃a.(a ⊗ a⊥). An interesting question to ask is whether these logical equivalences should be isomorphisms in the categorification of the logic. In the category of coherent spaces [6] they are, but in our category of proof nets they are not. The two canonical maps ∀a.[a⊥ ⅋ a] → 1 and 1 → ∀a.[a⊥ ⅋ a] are given by

    [⊥ ⅋ (1 ⊗ ⊥)]                         (1 ⊗ ∃a.(a ⊗ a⊥))
    Ǝa.(1 ⊗ ⊥), 1           and           ⊥, ∀a.[a⊥ ⅋ a]          (7)

respectively. Composing them means performing the following cut elimination:
    [⊥ ⅋ (1 ⊗ ⊥) ⅋ (1 ⊗ ∃a.(a ⊗ a⊥))]          [⊥ ⅋ (1 ⊗ ∃a.(a ⊗ a⊥))]
    Ǝa.(1 ⊗ ⊥), 1 ⋈ ⊥, ∀a.[a⊥ ⅋ a]      →       Ǝa.(1 ⊗ ⊥), ∀a.[a⊥ ⅋ a]          (8)

If the two maps in (7) were isos, the result of (8) would have to be the same as the identity map ∀a.[a⊥ ⅋ a] → ∀a.[a⊥ ⅋ a], which is represented by the proof net

    ∃a.[(a⊥ ⊗ a) ⅋ (a ⊗ a⊥)]
    ∃a.(a ⊗ a⊥), ∀a.[a⊥ ⅋ a]          (9)
This is obviously not the case (even if we replaced the ∃a by ǝa.Ǝa, as for Theorem 5.2). A similar situation occurs with the additive units, for which we have 0 ≡ ∀a.a and ⊤ ≡ ∃a.a. Since we do not have 0 and ⊤ in the language, we cannot check whether we have these isos in our category. However, since 0 and ⊤ are commonly understood as initial and terminal objects of the category of proofs, we can ask whether ∀a.a and ∃a.a have this property: we clearly have a canonical proof of ∀a.a → A for every formula A, but it is not necessarily unique. The correct treatment of additive units in proof nets is still an open problem for future research.
References

1. Miller, D.: A compact representation of proofs. Studia Logica 46(4), 347–370 (1987)
2. McKinley, R.: On Herbrand's theorem and cut elimination (extended abstract) (preprint) (2008)
3. Barr, M.: *-Autonomous Categories. Lecture Notes in Mathematics, vol. 752. Springer, Heidelberg (1979)
4. Lafont, Y.: Logique, Catégories et Machines. PhD thesis, Université Paris 7 (1988)
5. Lamarche, F., Straßburger, L.: From proof nets to the free *-autonomous category. Logical Methods in Computer Science 2(4:3), 1–44 (2006)
6. Girard, J.Y.: Linear logic. Theoretical Computer Science 50, 1–102 (1987)
7. Girard, J.Y.: Quantifiers in linear logic II. Prépublication de l'Équipe de Logique, Université Paris VII, Nr. 19 (1990)
8. Brünnler, K., Tiu, A.F.: A local system for classical logic. In: Nieuwenhuis, R., Voronkov, A. (eds.) LPAR 2001. LNCS (LNAI), vol. 2250, pp. 347–361. Springer, Heidelberg (2001)
9. Guglielmi, A.: A system of interaction and structure. ACM Transactions on Computational Logic 8(1) (2007)
10. Straßburger, L.: Linear Logic and Noncommutativity in the Calculus of Structures. PhD thesis, Technische Universität Dresden (2003)
11. Brünnler, K.: Deep Inference and Symmetry for Classical Proofs. PhD thesis, Technische Universität Dresden (2003)
12. Guglielmi, A., Straßburger, L.: Non-commutativity and MELL in the calculus of structures. In: Fribourg, L. (ed.) CSL 2001 and EACSL 2001. LNCS, vol. 2142, pp. 54–68. Springer, Heidelberg (2001)
13. Retoré, C.: Réseaux et Séquents Ordonnés. Thèse de Doctorat, spécialité mathématiques, Université Paris VII (February 1993)
14. Blute, R., Cockett, R., Seely, R., Trimble, T.: Natural deduction and coherence for weakly distributive categories. Journal of Pure and Applied Algebra 113, 229–296 (1996)
15. Devarajan, H., Hughes, D., Plotkin, G., Pratt, V.R.: Full completeness of the multiplicative linear logic of Chu spaces. In: 14th IEEE Symposium on Logic in Computer Science (LICS 1999) (1999)
16. Buss, S.R.: The undecidability of k-provability. Annals of Pure and Applied Logic 53, 72–102 (1991)
17. Guglielmi, A., Gundersen, T.: Normalisation control in deep inference via atomic flows. Logical Methods in Computer Science 4(1:9), 1–36 (2008)
18. Straßburger, L., Lamarche, F.: On proof nets for multiplicative linear logic with units. In: Marcinkowski, J., Tarlecki, A. (eds.) CSL 2004. LNCS, vol. 3210, pp. 145–159. Springer, Heidelberg (2004)
19. Danos, V., Regnier, L.: The structure of multiplicatives. Annals of Mathematical Logic 28, 181–203 (1989)
20. Hughes, D.: Simple free star-autonomous categories and full coherence (preprint) (2005)
21. Straßburger, L.: From deep inference to proof nets. In: Structures and Deduction — The Quest for the Essence of Proofs (Satellite Workshop of ICALP 2005) (2005)
22. Hughes, D.: Simple multiplicative proof nets with units (preprint) (2005)
Algebraic Totality, towards Completeness Christine Tasson Preuves, Programmes, Systèmes
Abstract. Finiteness spaces constitute a categorical model of Linear Logic (LL) whose objects can be seen as linearly topologised spaces (a class of topological vector spaces introduced by Lefschetz in 1942) and whose morphisms are continuous linear maps. First, we recall the definitions of finiteness spaces and describe their basic properties, deduced from the general theory of linearly topologised spaces, and we give an interpretation of LL based on linear algebra. Second, thanks to separation properties, we introduce an algebraic notion of totality candidate in the framework of linearly topologised spaces: a totality candidate is a closed affine subspace which does not contain 0. We show that finiteness spaces with totality candidates constitute a model of classical LL. Finally, we give a barycentric simply typed lambda-calculus, with booleans B and a conditional operator, which can be interpreted in this model. We prove completeness at type Bⁿ → B for every n by an algebraic method.
Introduction

In the 1980s, Girard was led to introduce linear logic (LL) by a denotational investigation of System F. The basic idea of LL is to decompose the intuitionistic implication into a linear implication and an exponential modality. Many intuitions behind LL are rooted in linear algebra and relate algebraic concepts to operational ones. For instance, a linear function in the LL sense is a program (or a proof) which uses its argument exactly once, and LL shows that this idea is akin to linearity in the algebraic sense. Can we use vector spaces and linear maps for interpreting LL? In the exponential-free fragment of LL, this is quite easy, since all vector spaces can stay finite-dimensional: it suffices to take the standard relational interpretation of a formula (which is a set) and to build the vector space generated by this set. However, the exponential modality introduces infinite dimension, and some topology is needed for controlling the size of dual spaces. Indeed, we want all spaces to be reflexive, that is, isomorphic to their second dual, because duality corresponds to linear negation, which is involutive. There are various ways of defining interpretations with linear spaces. Among them, the interpretations based on linearly topologised spaces [1,6] have the feature of not requiring any topology on the field k. This is quite natural, since the topology of the field is never used for interpreting proofs. First introduced by Lefschetz in [15], these spaces are geometrically quite unintuitive (their basic opens are linear subspaces, whereas usual basic opens are balls). They nevertheless provide the simplest setting where formulæ of LL can be seen as (topological) linear spaces, as shown by Ehrhard when he introduced finiteness spaces [6]. There are two ways of considering finiteness spaces:

P.-L. Curien (Ed.): TLCA 2009, LNCS 5608, pp. 325–340, 2009. © Springer-Verlag Berlin Heidelberg 2009
Relational finiteness spaces: they can be seen as a refinement of the relational semantics of linear logic, in which the semantics of proofs is the same as the standard one (proofs are interpreted as relations).
Linear finiteness spaces: given a field, any relational finiteness space gives rise to a linearly topologised vector space. The category of linear finiteness spaces and continuous linear functions constitutes a model of linear logic. Besides, a linear finiteness space and its dual, with the evaluation as pairing, is a Chu space [2]. The proofs of LL are interpreted as multilinear hypocontinuous maps (hypocontinuity lies between separate continuity and continuity). The description of proofs is close to that of the Game Categories of Lafont and Streicher [14].
For describing these categories, we use the duality presentation whose importance has been emphasised by models of Classical Linear Logic such as phase semantics. Even the definition of coherence spaces [9], usually described by means of a binary symmetric and reflexive coherence relation on a set |X|, can be reformulated through duality [10]. We will freely use the terminology of [12], a survey of the different duality presentations and in particular of models of linear logic by double orthogonal. The partial orthogonality between subsets c and c′ of |X| is given by c ⊥coh c′ ⇐⇒ #(c ∩ c′) ≤ 1. A coherence space can then be seen as a pair X = (|X|, C(X)) where |X| is a countable set and C(X) is a subset of P(|X|) (the powerset of |X|). Moreover, C(X) is required to be ⊥coh-closed, that is, equal to its second dual for the duality induced by the orthogonality: C(X) = C(X)⊥⊥. The elements of C(X) are the cliques of X, those of C(X)⊥ are the anticliques. The category of coherence spaces and cliques is an orthogonality category.
The category of relational finiteness spaces and finitary relations (a refinement of standard relations) also constitutes an orthogonality category, with respect to the finite orthogonality defined as follows: for two sets u and u′, u ⊥fin u′ ⇐⇒ #(u ∩ u′) < ∞. A relational finiteness space is a pair X = (|X|, F(X)) where |X| is a countable set and the set F(X) of finitary parts is ⊥fin-closed. The elements of F(X)⊥ are the antifinitary parts. We carry relational finiteness into the linear world by considering the linear subspace of k^|X| generated by the finitary linear combinations, that is, the families x = (x_a)_{a∈|X|} whose support |x| is finitary. The finite orthogonality between supports (|x| ⊥fin |x′|) implies that the pairing between a finitary linear combination x and an antifinitary one x′ is well-defined: ⟨x, x′⟩ = Σ_{a∈|X|} x_a x′_a = Σ_{a∈|x|∩|x′|} x_a x′_a is a finite sum.
The notion of totality, introduced by Girard [8] in denotational semantics, is used for interpreting proofs more closely. It often gives the means to prove completeness results, as in Loader [16]. Girard–Loader totality is described by an orthogonality category, up to a slight modification of the partial orthogonality: u ⊥tot u′ ⇐⇒ #(u ∩ u′) = 1.
A totality candidate is then a subset Θ(X) of P(|X|) such that Θ(X) is ⊥tot-closed. A totality space¹ is a pair (|X|, Θ(X)) where Θ(X) is a totality candidate.
The notion of totality can be adapted to linear spaces. So, we introduce the polar orthogonality between vectors: x ⊥• x′ ⇐⇒ ⟨x, x′⟩ = 1. Because we are working in a linear-algebra setting, we are able to give a simple characterisation of totality candidates, that is, of polar-closed subsets of linear finiteness spaces: a totality candidate of a finiteness space is either the space itself, or the empty set, or a topologically closed affine subspace that does not contain 0. We get an orthogonality category whose objects are pairs of a finiteness space and a totality candidate and whose maps are continuous linear functions that preserve totality candidates. This is a model of LL. Since totality candidates are affine spaces, it is natural to add an affine construction to LL: we thus introduce barycentric LL.
We then address the question of completeness: is it the case that any vector in the totality candidate of a formula is the interpretation of a proof of this formula? Restricting our attention to a barycentric version of the simply typed lambda-calculus (extended with booleans B and a conditional operator), we prove completeness at type Bⁿ ⇒ B for all n, by an algebraic method.
Outline. We start Section 1 with generalities on finiteness spaces, at both the relational and the linear vector space level. Then, we give several properties inherited from linearly topologised spaces; in particular, we introduce separation results that are fundamental in the sequel. We describe the interpretation of linear logic proofs into finiteness spaces, relying on Ehrhard's results [6,4]. In Section 2, after having defined totality candidates and the associated total orthogonality category, we study the barycentric λ-calculus.
Finally, in Section 3, we tackle the completeness problem and give a positive answer for first-order boolean types.
1 Finiteness Spaces

1.1 Relational Finiteness Spaces

Let A be a countable set. The finite orthogonality is defined by: ∀u, u′ ⊆ A, u ⊥ u′ ⇐⇒ u ∩ u′ finite. As usual, the orthogonal of any F ⊆ P(A) is F⊥ = {u′ ⊆ A | ∀u ∈ F, u ⊥ u′}, and F is orthogonally closed whenever F⊥⊥ = F.

Definition 1. A relational finiteness space is a pair A = (|A|, F(A)) where the web |A| is a countable set and the set of finitary subsets F(A) ⊆ P(|A|) is orthogonally closed. We say that u′ ∈ F(A)⊥ is antifinitary.
Let A and B be relational finiteness spaces. A finitary relation R between A and B is a subset of |A| × |B| such that
∀u ∈ F(A), R · u = {b ∈ |B| | ∃a ∈ u, (a, b) ∈ R} ∈ F(B),
∀v′ ∈ F(B)⊥, ᵗR · v′ = {a ∈ |A| | ∃b ∈ v′, (a, b) ∈ R} ∈ F(A)⊥.
¹ The additional conditions that are actually required in [16] are not essential for our purpose.
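The double-orthogonal mechanics behind Definition 1 can be exercised on a finite web. Since every subset of a finite web is finitary, the finite orthogonality is trivial there, so the sketch below uses the coherence orthogonality #(c ∩ c′) ≤ 1 from the introduction instead; the helper names are ours, not the paper's:

```python
from itertools import chain, combinations

def powerset(web):
    s = list(web)
    return [frozenset(c)
            for c in chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

def orthogonal(family, web, orth):
    # F⊥ = { u' ⊆ web | ∀u ∈ F, u ⊥ u' }
    return {u2 for u2 in powerset(web) if all(orth(u, u2) for u in family)}

coh = lambda c, c2: len(c & c2) <= 1   # coherence orthogonality ⊥coh

web = {0, 1, 2}
F = {frozenset({0, 1})}
Fo = orthogonal(F, web, coh)           # F⊥
Foo = orthogonal(Fo, web, coh)         # F⊥⊥
Fooo = orthogonal(Foo, web, coh)       # F⊥⊥⊥
assert F <= Foo                        # F ⊆ F⊥⊥
assert Fooo == Fo                      # F⊥⊥⊥ = F⊥, so F⊥⊥ is orthogonally closed
```

The same two assertions hold for any family F and any orthogonality relation, which is the abstract fact used throughout the duality presentation.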
Let us call RelFin the category whose objects are the relational finiteness spaces and whose maps are the finitary relations.
Every finite subset of a countable set A is finitary. Therefore, there is only one relational finiteness space associated with a finite web (any subset is finitary). Let F, G ⊆ P(A). If F ⊆ G then G⊥ ⊆ F⊥. Besides, F ⊆ F⊥⊥, so F⊥⊥⊥ = F⊥. Therefore, (A, F⊥⊥) is always a relational finiteness space. Let A be a relational finiteness space; then (F(A)⊥)⊥ = F(A). Thus, the orthogonal A⊥ of A, defined to be (|A|, F(A)⊥), is a relational finiteness space whose orthogonal A⊥⊥ = (|A|, F(A)⊥⊥) is equal to A.

Example 2. Booleans. The relational finiteness space B is associated with the web with two elements B = {T, F}. Every subset is finitary: F(B) = P(B).
Integers. The web N of integers, associated with the finite subsets Pfin(N), constitutes a relational finiteness space denoted by N. Its orthogonal N⊥ is (N, P(N)).

1.2 Linear Finiteness Spaces

Notations. In the sequel, A, (A_i)_{i≤n} and B range over relational finiteness spaces. The field k is discrete and infinite (i.e. every subset of k is open). We handle standard notions of linear algebra using the notations:
• dim(E) is the dimension of E, E* is the space of linear forms over E,
• E′ is the topological dual of E,
• ⟨x, x*⟩ is x*(x) if x ∈ E and x* ∈ E*,
• f* : y* ∈ F* ↦ [x ∈ E ↦ ⟨f(x), y*⟩] is the linear adjoint of f : E → F,
• ker_E(x*) is the kernel of x* ∈ E*,
• ann_{E*}(x) (resp. ann_{E′}(x)) is the subspace of E* (resp. E′) of linear forms (resp. continuous linear forms) which annihilate x,
• aff(T) is the affine hull of a subset T, that is aff(T) = {Σ_{i=1}^n λ_i x_i | Σ_{i=1}^n λ_i = 1, x_i ∈ T, n ∈ N}. If dir(T) is the direction of T and x ∈ aff(T), then aff(T) = x + dir(T).

Any relational finiteness space A gives rise to a linear finiteness space k⟨A⟩ which is a subspace of the linear space k^|A|:

Definition 3.
For every x ∈ k^|A|, let |x| = {a ∈ |A| | x_a ≠ 0} be the support of x. The linear finiteness space associated with A is k⟨A⟩ = {x ∈ k^|A| | |x| ∈ F(A)}.

With each a ∈ |A|, we associate a basic vector e_a ∈ k⟨A⟩. Notice that k⟨A⟩ is generated by the finitary linear combinations of basic vectors (and not only by the finite ones). Each linear finiteness space can be endowed with a topology induced by the antifinitary parts of the underlying relational finiteness space:

Definition 4. For every J ∈ F(A)⊥, let us call V_J = {x ∈ k⟨A⟩ | |x| ∩ J = ∅} a fundamental linear neighbourhood of 0. A subset U of k⟨A⟩ is open if and only if for each x ∈ U there is J_x ∈ F(A)⊥ such that x + V_{J_x} ⊆ U. This topology is named the finiteness topology on k⟨A⟩.

The collection of the V_J, where J ranges over F(A)⊥, is a filter basis. Indeed, for every J₁, J₂ ∈ F(A)⊥, V_{J₁} ∩ V_{J₂} = V_{J₁∪J₂} and J₁ ∪ J₂ ∈ F(A)⊥. Besides, k⟨A⟩ is Hausdorff, since for every x ≠ 0 and a ∈ |x|, the finite set {a} is in F(A)⊥, so x ∉ V_{a}.
Endowed with the finiteness topology, k⟨A⟩ is a linearly topologised space, that is, a topological vector space over a discrete field whose topology is generated by a fundamental system (a filter basis of neighbourhoods of 0; here the V_J, which are linear subspaces of k⟨A⟩). Introduced by Lefschetz [15, II-§6], linearly topologised spaces have been studied in [13, §10-13].

Definition 5. Let us call LinFin the category whose objects are the linear finiteness spaces and whose maps are the continuous linear functions.

Example 6. Booleans. Like every linear finiteness space whose web is finite, the linear finiteness space associated with the boolean relational space is finite-dimensional: k⟨B⟩ = k⟨B⊥⟩ = k². The space k⟨B⟩ is endowed with the discrete topology, since B is antifinitary and so V_B = {0} is a fundamental linear neighbourhood of 0.
Integers. The linear finiteness space associated with N is the set of finite sequences over k. The linear finiteness space associated with N⊥ is the set of all sequences over k. Since N ∈ F(N)⊥, V_N = {0} is a neighbourhood of zero, thus k⟨N⟩ is endowed with the discrete topology. On the contrary, the topology on k⟨N⊥⟩ is non-trivial: the fundamental system is the collection of the V_J where J ranges over the finite subsets of N. The space k⟨N⊥⟩ is simply k^N endowed with the usual product topology.

Linearly topologised spaces are quite different from Banach spaces. Any open subspace V_J of a finiteness space is closed (for x ∉ V_J, (x + V_J) ∩ V_J = ∅): linear finiteness spaces are totally disconnected. Intuitively, unit balls are replaced by subspaces. Besides, with the usual topological definition, the only bounded subspace would be {0}. Fortunately, there are linear variants of the notions of boundedness and compactness:

Definition 7. A subspace C of k⟨A⟩ is said to be linearly bounded iff for every J ∈ F(A)⊥ the codimension of V_J ∩ C in C is finite, i.e. there exists a subspace C₀ of C such that C = (V_J ∩ C) ⊕ C₀ and dim C₀ is finite.
A subspace K of k⟨A⟩ is said to be linearly compact iff every filter F of closed affine subspaces of k⟨A⟩ which satisfies the intersection property with K (i.e. ∀F ∈ F, F ∩ K ≠ ∅) satisfies (∩F) ∩ K ≠ ∅.

Theorem 8 (Tychonov [13, §10.9(7)]). For any set I, the space k^I endowed with the product topology (generated by the V_J = {x ∈ k^I | |x| ∩ J = ∅} with J ⊆ I finite) is linearly compact.

In the converse direction, we get a characterisation of linearly compact spaces:

Theorem 9 ([15, II-§6(32.1)]). For every linearly compact vector space K, there is a set I such that K is topologically isomorphic to k^I endowed with the product topology.

Example 10. Booleans. As in every finite-dimensional linearly topologised space (see [13, §13.1]), every subspace of k⟨B⟩ is linearly bounded.
Integers. It follows from Th. 9 that a linearly compact space is discrete iff its dimension is finite. Hence, the linearly compact subspaces of k⟨N⟩ are the finite-dimensional ones. Thanks to Tychonov's Theorem 8, k⟨N⊥⟩ is linearly compact.

In the finiteness setting, finitary supports characterise linearly bounded spaces. Moreover, although this fails in the general setting of linearly topologised spaces [13, §13.1(5)], linearly compact spaces are exactly the closed linearly bounded ones.
Proposition 11. Let K be a subspace of k⟨A⟩. The following are equivalent:
1. K is linearly bounded;
2. |K| = ∪{|x| | x ∈ K} is finitary;
3. the closure of K is linearly compact.

Proof. First, let C be a linearly bounded space and J ∈ F(A)⊥. There is a finite-dimensional subspace C₀ of C such that C = (C ∩ V_J) ⊕ C₀. Since the dimension of C₀ is finite, |C₀| is finitary. Besides, |C| ∩ J = |C₀| ∩ J, which is finite: |C| ∈ F(A). Conversely, if |C| ∈ F(A), then |C| ∩ J is finite and C ⊆ (C ∩ V_J) ⊕ k^{|C|∩J} is linearly bounded. The equivalence between (2) and (3) has already been proved in [4].

We focus attention on the topological dual: a linearly topologised space endowed with the linearly compact open topology, that is, the topology of uniform convergence on linearly bounded spaces or, equivalently (thanks to Prop. 11), on linearly compact spaces.

Definition 12. The topological dual k⟨A⟩′ is the linear space of continuous linear forms over k⟨A⟩, endowed with the linearly compact open topology. This topology is generated by the sets ann_{k⟨A⟩′}(K) = {x′ ∈ k⟨A⟩′ | ∀x ∈ K, ⟨x, x′⟩ = 0}, where K ranges over linearly compact subspaces of k⟨A⟩, and by their translations.

The two following propositions are central to the totality introduced in Section 2.

Proposition 13 (Separation [13, §10.4(1′)]). For every closed subspace D of k⟨A⟩ and x ∉ D, there is a continuous linear form x′ ∈ k⟨A⟩′ such that ⟨x, x′⟩ = 1 and ∀d ∈ D, ⟨d, x′⟩ = 0.

Proposition 14 (Separation in the dual). Let T′ be a closed affine subspace of k⟨A⟩′ such that 0 ∉ T′. There exists x ∈ k⟨A⟩ such that ∀x′ ∈ T′, ⟨x, x′⟩ = 1.

Proof. First, linear algebra ensures the result when the dimension of T′ is finite. Second, the closed subspace T′ does not contain 0, so there is a linearly compact subspace K of k⟨A⟩ such that ann(K) ∩ T′ = ∅. We use the filter of closed affine subspaces made of the T_F = {x ∈ k⟨A⟩ | ∀x′ ∈ F, ⟨x, x′⟩ = 1}, with F ranging over finite subsets of T′, and the linear compactness of K to build the wanted x.
Both separation theorems are useful to prove the algebraic isomorphism between a linear finiteness space and its second dual. This isomorphism is both continuous and open, as linearly bounded subspaces of the dual coincide with equicontinuous subspaces (Prop. 15). To sum up, the reflexivity of finiteness spaces relies on the links between linear compactness, closed linear boundedness and equicontinuity.

Proposition 15 (Equicontinuous spaces). Let A be a relational finiteness space. A subspace B′ of k⟨A⟩′ is linearly bounded if and only if there is J ∈ F(A)⊥ such that B′ ⊆ ann_{k⟨A⟩′}(V_J).

Proof. First, B′ is linearly bounded if and only if |B′| ∈ F(A)⊥ (see Prop. 11, applied in the dual). Second, ∃J ∈ F(A)⊥, B′ ⊆ ann(V_J) ⇐⇒ ∃J ∈ F(A)⊥, |B′| ⊆ J ⇐⇒ |B′| ∈ F(A)⊥.

Linear finiteness spaces satisfy other good properties (they admit Schauder bases (e_a)_{a∈|A|} and are complete [6]). Although we do not know at present whether the category of linearly topologised spaces satisfying all these properties is stable under the LL constructions, we already know that the full subcategory of finiteness spaces is a model of LL [6].
1.3 A Model of MELL with MIX

Both categories RelFin and LinFin constitute models of classical linear logic, as proved by Ehrhard [6]. Although a linear finiteness space is entirely determined by its underlying relational finiteness space (Fig. 1), we give the algebraic description of the constructions of LL in LinFin (Fig. 2), as in [6,4].

Example 16. If B = 1 ⊕ 1, then we get back to Ex. 2:
|?B⊥| = |!B| = Mfin({T, F}) ≃ N²,
F(!B) = {M ⊆ Mfin({T, F}) | ∪_{μ∈M} |μ| ∈ F(B)} = P(N²),
F(?B⊥) = {M ⊆ N² | ∀M′ ⊆ N², M ∩ M′ finite} = Pfin(N²).
Multiplicatives: |A ⅋ B| = |A ⊗ B| = |A| × |B|,
F(A ⅋ B) = {R ⊆ |A| × |B| | ∀u ∈ F(A)⊥, R · u ∈ F(B) and ∀v′ ∈ F(B)⊥, ᵗR · v′ ∈ F(A)},
F(A ⊗ B) = {R ⊆ |A| × |B| | ᵗR · |B| ∈ F(A) and R · |A| ∈ F(B)}.

Additives: |&_i A_i| = |⊕_i A_i| = Σ_i |A_i| (disjoint union),
F(&_i A_i) = {∪_i u_i | ∀i ∈ I, u_i ∈ F(A_i)},
F(⊕_i A_i) = {∪_{j∈J} u_j | J ⊆ I finite and ∀j ∈ J, u_j ∈ F(A_j)}.

Exponentials: |!A| = |?A| = Mfin(|A|) = {μ : |A| → N | μ(a) > 0 for finitely many a ∈ |A|},
F(!A) = {M ⊆ Mfin(|A|) | ∪{|μ| | μ ∈ M} ∈ F(A)},
F(?A) = {M ⊆ Mfin(|A|) | ∀u′ ∈ F(A)⊥, Mfin(u′) ∩ M finite}.

Fig. 1. LL formulæ interpreted in RelFin
Multiplicatives: k⟨⊥⟩ = k⟨1⟩ = k,
k⟨A ⅋ B⟩ = k⟨A⊥⟩ ⊗ε k⟨B⊥⟩,
k⟨A ⊗ B⟩ = k⟨A⟩ ⊗̂ k⟨B⟩,
k⟨A ⊸ B⟩ = Lc(k⟨A⟩, k⟨B⟩).

Additives: k⟨⊤⟩ = k⟨0⟩ = {0},
k⟨&_{i∈I} A_i⟩ = ×_{i∈I} k⟨A_i⟩,
k⟨⊕_{i∈I} A_i⟩ = ⊕_{i∈I} k⟨A_i⟩.

Exponentials: k⟨?(A⊥)⟩ = P̂ol(k⟨A⟩),
k⟨!A⟩ = (P̂ol(k⟨A⟩))′.

Fig. 2. Interpretation of LL formulæ in LinFin
Let us give some explanations on Fig. 2.
(⊸) Continuous linear functions. We can generalise the topological-dual framework and endow the space of continuous linear functions Lc(k⟨A⟩, k⟨B⟩) with the linearly compact open topology. It is generated by the sets W(K, V) = {f | f(K) ⊆ V}, where K ranges over linearly compact subspaces of k⟨A⟩ and V over fundamental neighbourhoods of 0 in k⟨B⟩, and by their translations. The linearly topologised space Lc(k⟨A⟩, k⟨B⟩) coincides with the linear finiteness space k⟨A ⊸ B⟩ = k⟨A⊥ ⅋ B⟩. Indeed, the canonical map which sends each linear function to its matrix in the basis induced by the web is a linear homeomorphism.
(⅋) Hypocontinuous bilinear forms. As noticed by Ehrhard, the evaluation map ev : k⟨A ⊸ B⟩ × k⟨A⟩ → k⟨B⟩ is separately continuous but not continuous. That is why we need another notion of continuity. A bilinear form φ : k⟨A⟩ × k⟨B⟩ → k is said to be hypocontinuous iff for all linearly compact subspaces K_A of k⟨A⟩ and K_B of k⟨B⟩, there are neighbourhoods V_B of 0 in k⟨B⟩ and V_A of 0 in k⟨A⟩ such that φ(K_A, V_B) = 0 and φ(V_A, K_B) = 0. We denote by k⟨A⟩ ⊗ε k⟨B⟩ the space of hypocontinuous bilinear forms over k⟨A⟩ × k⟨B⟩. It is a linearly topologised space when endowed with the linearly compact open topology, generated by the sets W(K_A, K_B) = {φ | φ(K_A, K_B) = 0}, where K_A and K_B range over linearly compact subspaces of k⟨A⟩ and k⟨B⟩ respectively. The space k⟨A⟩ ⊗ε k⟨B⟩ is related to the inductive tensor product [11], which was generalised to linearly topologised spaces in [7].
(⊗̂) Complete tensor product. The dual (k⟨A⟩ ⊗ε k⟨B⟩)′ of k⟨A⟩ ⊗ε k⟨B⟩ is the completion k⟨A⟩ ⊗̂ k⟨B⟩ of the algebraic tensor product k⟨A⟩ ⊗ k⟨B⟩. Indeed, α(k⟨A⟩ ⊗ k⟨B⟩) is dense in k⟨A⟩ ⊗̂ k⟨B⟩ [7, Th. 2.12], where α : k⟨A⟩ ⊗ k⟨B⟩ → (k⟨A⟩ ⊗ε k⟨B⟩)′ maps x ⊗ y to [φ ↦ φ(x, y)].
(&) Product. Let I be a set. The linear finiteness space k⟨&_{i∈I} A_i⟩ is the product of the k⟨A_i⟩, endowed with the product topology.
(⊕) Direct sum. The linear finiteness space k⟨⊕_{i∈I} A_i⟩ is the coproduct ⊕_{i∈I} k⟨A_i⟩ (made of finite linear combinations of elements of the k⟨A_i⟩), endowed with the topology induced by the product topology.
(!) Through webs. The comonadic structure (k⟨!A⟩, d, δ) and its linear distribution κ can be described with respect to the web basis: for x = Σ_{a∈|A|} x_a e_a given in k⟨A⟩, we set x^μ = Π_{a∈|μ|} x_a^{μ(a)}, and for X = Σ_{μ∈Mfin(|A|)} X_μ e_μ ∈ k⟨!A⟩:
κ : x ↦ Σ_{μ∈Mfin(|A|)} x^μ e_μ,
d : X ↦ Σ_{a∈|A|} X_{[a]} e_a,
δ : X ↦ Σ_{M∈Mfin(Mfin(|A|))} X_{Σ(M)} e_M.
The exponentiation x^! = κ(x) of x satisfies d(x^!) = x and δ(x^!) = (x^!)^!.
(?) Analytic functions. The linear finiteness space k⟨?A⊥⟩ is the dual of k⟨!A⟩. However, there is a more algebraic approach [4] to k⟨?A⊥⟩. A function P is polynomial whenever there are symmetric hypocontinuous i-linear forms² φ_i : ×ⁱ k⟨A⟩ → k such that P(x) = Σ_{i=0}^n φ_i(x, …, x). The space k⟨?(A⊥)⟩ coincides with the completion P̂ol(k⟨A⟩) of the space Pol(k⟨A⟩) of polynomial functions
² Hypocontinuity for i-linear forms is a generalisation of the bilinear case.
over k⟨A⟩, endowed with the linearly compact open topology³. We call the elements of k⟨?(A⊥)⟩ analytic functions.
(!) Distributions. Finally, we are concerned with the Taylor expansion formula of Ehrhard [6]. Taking into account that k⟨!A⟩ is the dual space of P̂ol(k⟨A⟩), we can establish a parallel with distributions. For instance, x^! sends an analytic function F to its image ⟨x^!, F⟩ = F(x); hence it corresponds to the Dirac mass at x. Besides, in LinFin, there exists a sequence of projections
π_n : Σ_{μ∈Mfin(|A|)} X_μ e_μ ∈ k⟨!A⟩ ↦ Σ_{#μ=n} X_μ e_μ ∈ k⟨!A⟩,
which are linear and continuous, since their supports |π_n| = {(μ, μ) | #μ = n} are finitary. The vector xⁿ, the convolution of x iterated n times, satisfies (1/n!) xⁿ = π_n(x^!) = Σ_{#μ=n} x^μ e_μ in k⟨!A⟩. This distribution sends an analytic function to a homogeneous polynomial of degree n, that is, its n-th derivative at zero. From x^! = Σ_{n=0}^∞ (1/n!) xⁿ, Ehrhard deduces the Taylor expansion formula:
F(x) = Σ_{n=0}^∞ (1/n!) ⟨xⁿ, F⟩.  (1)

Example 17. We denote by k[X, Y] (resp. k[[X, Y]]) the set of polynomials (resp. formal power series) over the two variables X, Y. Then
k⟨!B⟩ = {z ∈ k^{N²} | |z| ∈ P(N²)} = k[[X_t, X_f]],
k⟨?B⊥⟩ = {z ∈ k^{N²} | |z| ∈ Pfin(N²)} = k[X_t, X_f],
k⟨!B ⊸ B⟩ = k⟨?B⊥ ⅋ (1 ⊕ 1)⟩ = k⟨?B⊥⟩² = k[X_t, X_f] × k[X_t, X_f],
k⟨⅋ⁿ ?B⊥⟩ = k[X₁, X₂, …, X_{2n−1}, X_{2n}],
k⟨⊗ⁿ !B ⊸ B⟩ = k[X₁, X₂, …, X_{2n−1}, X_{2n}]².
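Under the identification k⟨?B⊥⟩ ≅ k[X_t, X_f] of Example 17, the pairing of a promoted point x^! ∈ k⟨!B⟩ against a polynomial is a finite sum which computes the evaluation P(x_t, x_f). A minimal executable sketch (plain Python with exact rationals; the dictionary encoding of polynomials is ours, not the paper's):

```python
from fractions import Fraction as Q

# x ∈ k⟨B⟩ = k²; its promotion x! has coefficient x^μ = x_t^m · x_f^k at the
# multiset μ = [t^m, f^k]. Pairing x! against P ∈ k⟨?B⊥⟩ = k[X_t, X_f]
# (finite support) is the finite sum Σ_μ P_μ · x^μ, i.e. the evaluation P(x_t, x_f).

def pair_promotion(x, P):
    """⟨x!, P⟩ for a polynomial P encoded as {(m, k): coefficient}."""
    xt, xf = x
    return sum(c * xt**m * xf**k for (m, k), c in P.items())

P = {(0, 0): Q(1), (2, 1): Q(3), (1, 0): Q(-2)}   # 1 + 3·Xt²·Xf − 2·Xt
x = (Q(1, 3), Q(2, 3))                            # a total point: xt + xf = 1
assert pair_promotion(x, P) == Q(5, 9)            # = P(1/3, 2/3)
```

This is exactly the "Dirac mass" reading of x^!: pairing with x^! is evaluation at x.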
2 Totality and Barycentric Lambda-Calculus

In the present section, we explore an algebraic version of totality spaces, where formulæ are interpreted as finiteness spaces with an additional totality structure. Adapting Loader's definition to this algebraic setting, we define a general concept of totality finiteness space: it is a pair [k⟨A⟩, T] where k⟨A⟩ is a linear finiteness space and T is a subset of k⟨A⟩ which is equal to its second dual for the duality associated with the polar, as defined below. Actually, the finiteness space interpreting any formula coincides with the first component of the totality finiteness space interpreting this formula.

2.1 Totality Finiteness Spaces

The polar orthogonality is defined as follows: ∀x ∈ k⟨A⟩, ∀x′ ∈ k⟨A⟩′, x ⊥• x′ ⇐⇒ ⟨x, x′⟩ = 1.
The polar of a subset T of k⟨A⟩ is the following closed affine subspace of k⟨A⟩′:
T• = {x′ ∈ k⟨A⟩′ | ∀x ∈ T, ⟨x, x′⟩ = 1}.
³ Generated by the sets W(K) = {P polynomial function | P(K) = 0}, with K linearly compact.
This set is closed since (x, x′) ↦ ⟨x, x′⟩ is linear and separately continuous on k⟨A⟩ × k⟨A⟩′ (for (x, x′) ∈ k⟨A⟩ × k⟨A⟩′, {x} is linearly compact, so ann(x) is open in k⟨A⟩′; x′ is a continuous linear form, so ker(x′) is open in k⟨A⟩). Notice that, up to the homeomorphism between k⟨A⟩ and its second dual, if T′ is an affine subspace of k⟨A⟩′, then T′• = {x ∈ k⟨A⟩ | ∀x′ ∈ T′, ⟨x, x′⟩ = 1}.
There is a simple characterisation of polar-closed affine subspaces:

Proposition 18 (Characterisation). A subset T of k⟨A⟩ is polar-closed (T•• = T) iff T is the empty set, the space k⟨A⟩, or a closed affine subspace that does not contain 0.

Proof. If T = k⟨A⟩, then T• = ∅ and T•• = ∅• = k⟨A⟩ = T. If T = ∅, then T•• = ∅. It remains the case where T is affine, closed and 0 ∉ T. The inclusion T ⊆ T•• is straightforward. Let us prove the converse by contraposition. Let x₀ ∉ T. Let z₀ ∈ T and D = dir(T); then T = z₀ + D, x₀ ≠ z₀ and x₀ − z₀ ∉ D. By the separation Prop. 13, there is x′₀ ∈ k⟨A⟩′ such that ⟨x₀ − z₀, x′₀⟩ = 1 and ∀d ∈ D, ⟨d, x′₀⟩ = 0. On the one side, if λ = ⟨z₀, x′₀⟩ ≠ 0, we set y′₀ = (1/λ) x′₀; then ⟨z₀, y′₀⟩ = 1 and ∀d ∈ D, ⟨d, y′₀⟩ = (1/λ)⟨d, x′₀⟩ = 0, so y′₀ ∈ T•. However, ⟨x₀, y′₀⟩ = (1/λ)⟨x₀, x′₀⟩ = (1 + λ)/λ ≠ 1, hence x₀ ∉ T••. On the other side, if ⟨z₀, x′₀⟩ = 0, then ⟨x₀, x′₀⟩ = 1. Since 0 ∉ T, by the separation Prop. 13 there exists x′₁ ∈ k⟨A⟩′ such that ⟨z₀, x′₁⟩ = 1 and ∀d ∈ D, ⟨d, x′₁⟩ = 0, hence x′₁ ∈ T•. Moreover, ⟨z₀, x′₁ + x′₀⟩ = 1 and ∀d ∈ D, ⟨d, x′₁ + x′₀⟩ = 0, hence x′₁ + x′₀ ∈ T•. To conclude, either ⟨x₀, x′₁⟩ = 0 and x′₁ ∈ T• separates x₀, or ⟨x₀, x′₀ + x′₁⟩ = 1 + ⟨x₀, x′₁⟩ ≠ 1 and x′₀ + x′₁ ∈ T•; so in both cases, x₀ ∉ T••.

From this characterisation, we deduce another one, which will be useful to compute the constructions of the model. Recall that aff(T) = {Σ_{i=1}^n λ_i t_i | Σ λ_i = 1, t_i ∈ T}.

Corollary 19. Let T be a subset of k⟨A⟩. If T• ≠ ∅, then T•• is the closure of aff(T).

Proof. The proof is based on the inclusions T ⊆ aff(T) ⊆ T•• and on Prop. 18.

Definition 20.
A totality finiteness space is a pair [k⟨A⟩, T] made of a linear finiteness space k⟨A⟩ and a totality candidate T, that is, a polar-closed subset of k⟨A⟩. Let TotFin be the category whose objects are totality finiteness spaces and whose morphisms are continuous linear functions that preserve the totality candidates.

2.2 A Model of Classical Linear Logic

To prove that TotFin is a model of classical linear logic, we use the definitions and results of [12, §4-5]. Let G(LF) be the double glueing of the category LinFin along the HOM functor. The objects of G(LF) are triples [k⟨A⟩, U, U′] where U and U′ are subspaces of k⟨A⟩ and k⟨A⟩′ respectively. A morphism between [k⟨A⟩, U, U′] and [k⟨B⟩, V, V′] is a continuous linear function f : k⟨A⟩ → k⟨B⟩ such that f(U) ⊆ V and f*(V′) ⊆ U′, where f* is the adjoint of f. The linear exponential comonad of LinFin is equipped with a well-behaved linear distribution κ : x ∈ k⟨A⟩ ↦ x^! ∈ k⟨!A⟩ (it is routine to check the diagrams satisfied by κ). The category TotFin is a subcategory of G(LF) (considering triples [k⟨A⟩, T, T•]). More precisely, it is a tight orthogonality category with respect to the polar orthogonality. This orthogonality is stable since it is focussed with respect to the focus
{1}: x ⊥• x′ ⇐⇒ ⟨x, x′⟩ = 1 ⇐⇒ x′(x) = 1. Since LinFin is a model of classical linear logic, TotFin is also a model of classical linear logic (cf. [12, Th. 5.14]). The constructions inherited from LinFin, as described in [12, §5.3], are:
T(A⊥) = T(A)•,
T(1) = T(⊥) = {1},  T(0) = T(⊤) = {0},
T(A & B) = T(A) × T(B),  T(A ⊗ B) = [T(A) ⊗ T(B)]••,
T(A ⊸ B) = [T(A) ⊗ T(B)•]•,  T(A ⊕ B) = [T(A)• × T(B)•]•,
T(!A) = [κ(T(A))]•• = {x^! | x ∈ T(A)}••.
Moreover, we can describe every totality candidate as a closed affine subspace. This algebraic description is made possible by the characterisation of totality candidates (Prop. 18) and by the algebraic setting.

Proposition 21.
T(A ⊗ B) = aff(T(A) ⊗ T(B))¯,  (2)
T(A ⊸ B) = {f ∈ k⟨A ⊸ B⟩ | f(T(A)) ⊆ T(B)},  (3)
T(A ⊕ B) = aff(T(A) × ker(T(B)•) ∪ ker(T(A)•) × T(B))¯,  (4)
T(!A) = aff({x^! | x ∈ T(A)})¯,  (5)
T(?A⊥) = {F ∈ P̂ol(k⟨A⟩) | ∀x ∈ T(A), F(x) = 1},
T(!A ⊸ B) = {F ∈ P̂ol(k⟨A⟩, k⟨B⟩) | ∀x ∈ T(A), F(x) ∈ T(B)}.  (6)
Proof. The proof relies on showing that T • is not empty and on the use of Cor. 19. The formula A ⇒ B is interpreted as the totality finiteness space that is made of morphisms of the Kleisli category. Corollary 22. The totality candidate associated with A ⇒ B satisfies the following fundamental equation: T (A ⇒ B) = {F : A → B analytic| ∀x ∈ T (A), F (x) ∈ T (B)}
(7)
In other words, the totality we have defined is a logical relation.

Example 23.
T(B) = {(x_t, x_f) ∈ k² | x_t + x_f = 1},  T(B⊥) = {(1, 1)},
T(!B) = {F ∈ k[[X_t, X_f]] | x_t + x_f = 1 ⇒ F(x_t, x_f) = 1},
T(?B⊥) = {P ∈ k[X_t, X_f] | x_t + x_f = 1 ⇒ P(x_t, x_f) = 1},
T(!B ⊸ B) = {(P, Q) ∈ k[X_t, X_f]² | x_t + x_f = 1 ⇒ P(x_t, x_f) + Q(x_t, x_f) = 1},
T(⅋ⁿ ?B⊥) = {P ∈ k[X₁, …, X_{2n}] | (∀1 ≤ i ≤ n, x_{2i−1} + x_{2i} = 1) ⇒ P(x₁, x₂, …, x_{2n−1}, x_{2n}) = 1},
T(⊗ⁿ !B ⊸ B) = {(P, Q) ∈ k[X₁, …, X_{2n}]² | P + Q − 1 vanishes on the common zeroes of the X_{2i−1} + X_{2i} − 1}.

Because totality candidates are affine spaces, it is natural to add a barycentric construction to our proof system and to interpret it by barycentric combinations. Totality finiteness spaces constitute a model of linear logic with MIX and barycentric sums.
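The membership conditions of Example 23 can be sampled numerically. The sketch below (an illustration with our own names, not the paper's notation) tests whether a pair of polynomial functions (P, Q) satisfies P + Q = 1 along the line x_t + x_f = 1, i.e. whether it meets the condition defining T(!B ⊸ B):

```python
from fractions import Fraction as Q

# A pair (P, Q) belongs to T(!B ⊸ B) iff P + Q = 1 on the affine line
# x_t + x_f = 1; we sample that line at rational points.

def total_on_line(P, Qp, samples=8):
    # P, Qp: functions k² → k standing for the two polynomials of the pair
    for n in range(-samples, samples + 1):
        xt = Q(n, 3)
        xf = 1 - xt                       # stay on the line x_t + x_f = 1
        if P(xt, xf) + Qp(xt, xf) != 1:
            return False
    return True

neg = (lambda xt, xf: xf, lambda xt, xf: xt)          # the pair (X_f, X_t)
pi_plus = (lambda xt, xf: xt + xf, lambda xt, xf: 0)  # the pair (X_t + X_f, 0)
untotal = (lambda xt, xf: xt, lambda xt, xf: xt)      # (X_t, X_t): sums to 2·x_t

assert total_on_line(*neg)
assert total_on_line(*pi_plus)
assert not total_on_line(*untotal)
```

Sampling is only a necessary check, of course; the totality candidate itself is defined by the identity holding on the whole line.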
2.3 Simply Typed Boolean Barycentric Lambda-Calculus

We propose a λ-calculus in the style of Vaux's algebraic λ-calculus [17]. In the barycentric λ-calculus, sums of terms are allowed. It is well known that application in the λ-calculus is linear in the function but not in its argument. That is why we introduce two kinds of terms: atomic terms, which contain no barycentric sums except in the argument of an application, and barycentric terms, which are barycentric sums of atomic terms. Moreover, we add booleans and a conditional construction.

Syntax. Let V be a countable set of variables. Atomic terms s and barycentric terms R, S are inductively defined by
R, S ::= Σ_{i=1}^m a_i s_i  where ∀i ∈ {1, …, m}, a_i ∈ k and Σ_{i=1}^m a_i = 1,
s ::= x | λx.s | (s)S | T | F | if s then R else S  where x ∈ V.
We denote by Λat the collection of atomic terms and by Λbar the collection of barycentric terms. We quotient all these sets of terms by α-conversion and by associativity and commutativity of the sum.

Types. The barycentric λ-calculus is simply typed with the usual type system, with the restriction that barycentric sums of atomic terms are possible only if the latter have the same type. Decomposing A ⇒ B with the exponential and the linear map as !A ⊸ B, the type system can be reformulated within linear logic:
(var)  if x ∈ V, then Γ, x : A ⊢ x : A;
(abs)  if x : A, Γ ⊢ s : B, then Γ ⊢ λx.s : A ⇒ B;
(app)  if Γ ⊢ s : A ⇒ B and Γ ⊢ R : A, then Γ ⊢ (s)R : B;
(sum)  if Γ ⊢ s_i : A for i = 1, …, n and Σ_{i=1}^n a_i = 1, then Γ ⊢ Σ_{i=1}^n a_i s_i : A;
(true)  Γ ⊢ T : B;  (false)  Γ ⊢ F : B;
(cond)  if Γ ⊢ s : B, Γ ⊢ R : A and Γ ⊢ S : A, then Γ ⊢ if s then R else S : A.

Semantics. We interpret the barycentric λ-calculus in LinFin through the standard translation of the λ-calculus into LL, extended to deal with the barycentric and boolean features as follows:
⟦Σ_{i=1}^n a_i s_i⟧_Γ = Σ_{i=1}^n a_i ⟦s_i⟧_Γ,  ⟦T⟧_Γ = (1, 0),  ⟦F⟧_Γ = (0, 1),
⟦if s then R else S⟧_Γ = (⟦s⟧_t ⟦R⟧_t + ⟦s⟧_f ⟦S⟧_t, ⟦s⟧_t ⟦R⟧_f + ⟦s⟧_f ⟦S⟧_f).
Notice that since k⟨B⟩ = k⟨B⟩′ = k², the semantics of each term s of type B is given by its two components ⟦s⟧ = (⟦s⟧_t, ⟦s⟧_f).

Theorem 24. Totality finiteness spaces constitute a denotational model of the barycentric λ-calculus.

Thanks to Cor. 22, we can relate the notion of totality to realisability in logic, where a term λx.t : A ⇒ B is total iff for every total s : A, the term t[x ← s] : B is total.
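The boolean fragment of this semantics is small enough to execute directly. The following sketch (our own encoding, not the paper's) implements the pairs ⟦s⟧ = (⟦s⟧_t, ⟦s⟧_f), the conditional and barycentric sums, and checks that totality (the two components summing to 1) is preserved:

```python
from fractions import Fraction as Q

# Denotations at type B are pairs (s_t, s_f) ∈ k²; a term is total iff s_t + s_f = 1.
TRUE, FALSE = (Q(1), Q(0)), (Q(0), Q(1))

def cond(s, r, t):
    # ⟦if s then R else S⟧ = (s_t·R_t + s_f·S_t, s_t·R_f + s_f·S_f)
    (st, sf), (rt, rf), (tt, tf) = s, r, t
    return (st * rt + sf * tt, st * rf + sf * tf)

def bary(*weighted):
    # barycentric sum Σ a_i·s_i, with the side condition Σ a_i = 1
    assert sum(a for a, _ in weighted) == 1
    return (sum(a * s[0] for a, s in weighted),
            sum(a * s[1] for a, s in weighted))

neg = lambda s: cond(s, FALSE, TRUE)
assert neg(TRUE) == FALSE and neg(FALSE) == TRUE

# totality is preserved by the conditional and by barycentric sums
s = bary((Q(1, 2), TRUE), (Q(1, 2), FALSE))     # the point (1/2, 1/2) ∈ T(B)
out = cond(s, TRUE, neg(s))
assert sum(s) == 1 and sum(out) == 1
```

This is the logical-relation reading of Cor. 22 at the base type: total inputs yield total outputs.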
3 Towards Completeness

We focus attention on closed terms of type Bⁿ ⇒ B. As we have seen in Ex. 23, terms of that type are interpreted by pairs of polynomials P = (P_t, P_f) ∈ k[X₁, …, X_{2n}]² such that for all (a_i) ∈ k^{2n} with a_{2i−1} + a_{2i} = 1, P_t(a₁, …, a_{2n}) + P_f(a₁, …, a_{2n}) = 1.

Theorem 25 (Completeness). Every total function of T(Bⁿ ⇒ B) is the interpretation of a term of the boolean barycentric calculus.

More precisely, we prove that every pair of polynomials P ∈ T(⊗ⁿ !B ⊸ B) is boolean, i.e. there is a term S of the boolean calculus such that ⟦S⟧ = (P_t, P_f). Let us first introduce some notations and intermediate results.
¬S = if S then F else T,  ⟦¬S⟧ = (⟦S⟧_f, ⟦S⟧_t),
S⁺ = if S then T else T,  ⟦S⁺⟧ = (⟦S⟧_t + ⟦S⟧_f, 0),
S⁻ = if S then F else F,  ⟦S⁻⟧ = (0, ⟦S⟧_t + ⟦S⟧_f),
Π_i = λx₁ … xₙ. x_i,  ⟦Π_i⟧ = (X_{2i−1}, X_{2i}).
The following pairs of polynomials are boolean:

(X_{2i}, X_{2i−1}) = X_{2i} · (1, 0) + X_{2i−1} · (0, 1) = ⟦¬Π_i⟧,        (8)
(X_{2i−1} + X_{2i}, 0) = X_{2i} · (1, 0) + X_{2i−1} · (1, 0) = ⟦Π_i⁺⟧,        (9)
(1 − X_{2i}, X_{2i}) = (1, 0) + (X_{2i−1}, X_{2i}) − (X_{2i−1} + X_{2i}, 0) = ⟦T + Π_i − Π_i⁺⟧,
(1 − X_{2i−1}, X_{2i−1}) = ⟦T + ¬Π_i − Π_i⁺⟧.

We first prove a weak version of the completeness theorem, where we assume that P_t + P_f − 1 vanishes everywhere.

Lemma 26 (Affine pairs). For every polynomial P ∈ k[X_1, . . . , X_n], the pair of polynomials (1 − P, P) is boolean.
Proof. We use an induction on the degree d of P. If d = 0, there exists a ∈ k such that P = a, hence (1 − P, P) = (1 − a)⟦T⟧ + a⟦F⟧. If d > 0, let us first study the monomial case, i.e. X^μ = Π_i X_i^{μ_i} with, say, μ_1 ≥ 1. Then

(1 − X^μ, X^μ) = (1 − X_1) · (1, 0) + X_1 · (1 − X_1^{μ_1−1} Π_{i≠1} X_i^{μ_i}, X_1^{μ_1−1} Π_{i≠1} X_i^{μ_i}) = ⟦if Ξ_1 then T else Ξ_{d−1}⟧ = ⟦Ξ_μ⟧,

where the induction hypothesis ensures the existence of boolean terms Ξ_1 and Ξ_{d−1} interpreted respectively by (1 − X_1, X_1) and (1 − X_1^{μ_1−1} Π_{i≠1} X_i^{μ_i}, X_1^{μ_1−1} Π_{i≠1} X_i^{μ_i}). Finally, if P = Σ_μ a_μ X^μ, then

(1 − P, P) = (1 − Σ_μ a_μ)(1, 0) + Σ_μ a_μ (1 − X^μ, X^μ) = ⟦(1 − Σ_μ a_μ) T + Σ_μ a_μ Ξ_μ⟧.

The following algebraic lemma allows us to reduce our problem to affine pairs.
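The four boolean pairs of (8)–(9) and below are polynomial identities, so they can be sanity-checked by evaluating both sides at many points; a quick script (ours, not from the paper) over exact rationals:

```python
# Numeric check of the boolean-pair identities (8), (9) and their corollaries,
# with x playing the role of X_{2i-1} and y the role of X_{2i}.
import random
from fractions import Fraction as Fr

random.seed(0)
def rnd():
    return Fr(random.randint(-5, 5), random.randint(1, 5))

for _ in range(100):
    x, y = rnd(), rnd()
    proj = (x, y)                 # [[Pi_i]]
    npro = (y, x)                 # [[not Pi_i]]
    plus = (x + y, Fr(0))         # [[Pi_i^+]]
    T = (Fr(1), Fr(0))
    # (8): (X_{2i}, X_{2i-1}) = X_{2i}*(1,0) + X_{2i-1}*(0,1)
    assert npro == (y * 1 + x * 0, y * 0 + x * 1)
    # (9): (X_{2i-1}+X_{2i}, 0) = X_{2i}*(1,0) + X_{2i-1}*(1,0)
    assert plus == (y + x, Fr(0))
    # (1 - X_{2i}, X_{2i}) = T + Pi_i - Pi_i^+
    assert (1 - y, y) == (T[0] + proj[0] - plus[0], T[1] + proj[1] - plus[1])
    # (1 - X_{2i-1}, X_{2i-1}) = T + not Pi_i - Pi_i^+
    assert (1 - x, x) == (T[0] + npro[0] - plus[0], T[1] + npro[1] - plus[1])
print("identities hold on 100 random points")
```

Of course this does not replace the algebraic proof, but it catches transcription errors in the coefficients immediately.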
C. Tasson
Lemma 27 (Spanning polynomials). Let P ∈ k[X_1, . . . , X_{2n}], where k is an infinite field. If P vanishes on the common zeroes of the X_{2i−1} + X_{2i} − 1, then for every i in {1, . . . , n} there is Q_i ∈ k[X_1, . . . , X_{2n}] such that P = Σ_{i=1}^n Q_i (X_{2i−1} + X_{2i} − 1).

Proof. Under the change of variables Y_i = X_{2i−1} + X_{2i} − 1 and Y_{i+n} = X_{2i}, for i ∈ {1, . . . , n}, we denote by P_Y the polynomial P. Then for every (y_i) ∈ kⁿ, P_Y(0, . . . , 0, y_{n+1}, . . . , y_{2n}) = 0. Since k[Y_2, . . . , Y_{2n}] is a ring, k[Y_2, . . . , Y_{2n}][Y_1] is a euclidean ring. The euclidean division of P_Y by Y_1 gives P_Y = Q_1 Y_1 + R_1, where Q_1 ∈ k[Y_2, . . . , Y_{2n}][Y_1] and R_1 ∈ k[Y_2, . . . , Y_{2n}]. By iterating this process on R_i for i ∈ {1, . . . , n − 1}, we get P_Y = Σ_{i=1}^n Q_i Y_i + R_n, where Q_i ∈ k[Y_1, . . . , Y_{2n}] and R_n ∈ k[Y_{n+1}, . . . , Y_{2n}]. For all (y_i) ∈ kⁿ, we have P_Y(0, . . . , 0, y_{n+1}, . . . , y_{2n}) = R_n(y_{n+1}, . . . , y_{2n}) = 0. Since k is infinite, R_n = 0 and P_Y = Σ_{i=1}^n Q_i Y_i. Changing the variables back, we get P = Σ_{i=1}^n Q_i (X_{2i−1} + X_{2i} − 1).

Proof (Theorem 25). Let P ∈ T(⊗ⁿ!B ⊸ B). Thanks to Ex. 23, we know that P_t + P_f − 1 vanishes on every common zero of {X_{2i−1} + X_{2i} − 1 | 1 ≤ i ≤ n}. Then we can apply Lem. 27: P_t + P_f − 1 = Σ_{i=1}^n Q_i (X_{2i−1} + X_{2i} − 1) with Q_i ∈ k[X_1, . . . , X_{2n}]. Thus

(P_t, P_f) = Σ_{i=1}^n [(1 − Q_i) · (1, 0) + Q_i · (X_{2i−1} + X_{2i}, 0)] + (1 − P_f, P_f) − n(1, 0).

By Lem. 26, there are boolean terms S_i and S such that ⟦S_i⟧ = (1 − Q_i, Q_i) and ⟦S⟧ = (1 − P_f, P_f). We have seen in Eq. (9) that (X_{2i−1} + X_{2i}, 0) = ⟦Π_i⁺⟧. Finally, we have found a term whose semantics is P:
P = ⟦Σ_{i=1}^n (if S_i then T else Π_i⁺) + S − n T⟧.

If we inductively define the types n̄ by 1̄ = 1, 2̄ = B and n+1̄ = n̄ ⊕ 1, then the completeness theorem can be generalised to types n̄_1 × · · · × n̄_k ⇒ m̄ and to a barycentric λ-calculus with integer and case constructions.

Example 28 (Gustave and Parallel-or functions). Several pairs of polynomials can interpret the functions POr ∈ T((!B ⊗ !B) ⊸ B) and Gus ∈ T((!B ⊗ !B ⊗ !B) ⊸ B) satisfying:

POr(T, 0) = T        Gus(T, F, 0) = T
POr(0, T) = T        Gus(0, T, F) = T
POr(F, F) = F        Gus(F, 0, T) = T
                     Gus(F, F, F) = F

The pairs of polynomials with the smallest degrees are respectively:

POr : B × B ⇒ B,        (x, y) ↦ (x_t + y_t − x_t y_t , x_f y_f),
Gus : B × B × B ⇒ B,    (x, y, z) ↦ (x_t y_f + y_t z_f + z_t x_f , x_t y_t z_t + x_f y_f z_f).
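These two pairs can be checked directly against the specification. The check below is ours (not from the paper); it assumes the first component of the POr pair is the inclusion–exclusion form x_t + y_t − x_t y_t, which is the form consistent with the specification table, and it models the undefined input 0 as the pair (0, 0).

```python
# Check that the minimal-degree polynomial pairs satisfy the POr/Gus tables.
# Booleans are pairs; 0 (undefined) is modelled as (0, 0) -- an assumption.
T, F, UNDEF = (1, 0), (0, 1), (0, 0)

def por(x, y):
    xt, xf = x; yt, yf = y
    return (xt + yt - xt * yt, xf * yf)

def gus(x, y, z):
    xt, xf = x; yt, yf = y; zt, zf = z
    return (xt * yf + yt * zf + zt * xf, xt * yt * zt + xf * yf * zf)

assert por(T, UNDEF) == T and por(UNDEF, T) == T and por(F, F) == F
assert gus(T, F, UNDEF) == T and gus(UNDEF, T, F) == T
assert gus(F, UNDEF, T) == T and gus(F, F, F) == F
print("POr and Gus meet their specifications")
```

The cyclic structure of the first component of Gus (x_t y_f + y_t z_f + z_t x_f) mirrors the cyclic pattern of the Gustave function's defining clauses.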
Conclusion

The first two sections of this article emphasise the algebraic and topological description of the model of finiteness spaces, which is important for several reasons. First, the definition of linear finiteness spaces is related to relational finiteness spaces by means of
webs. The purpose of a more algebraic approach is to get rid of webs; our description of reflexivity is a first step in this direction. Second, our study has unveiled an algebraic approach to totality where totality candidates admit a simple algebraic and topological characterisation; such a characterisation was not available in coherence spaces. Moreover, although we needed linear logic to describe algebraic totality, we get a notion which is similar to reducibility candidates in realisability. Finally, the partial completeness result is proved using an algebraic method. This gives a new insight into the analogy between linear algebra and linear logic. The last section is devoted to a full completeness result which opens the path to a better understanding of non-deterministic functions such as POr and Gus.

Let us conclude with a parallel with what happened for sequential algorithms. The model of coherence spaces and stable functions is not complete for sequential algorithms: while the non-sequential function POr is not stable, Gus is stable. The full completeness result of Loader [16] does not give information on boolean functions since it is restricted to multiplicative constructions with Mix. On the contrary, hypercoherences and strongly stable functions constitute a model of linear logic which characterises sequentiality [5,3]. The perspective is now to understand whether this completeness result generalises to other types.

Acknowledgements. I want to thank Thomas Ehrhard and Pierre-Louis Curien for their constant support. I am also deeply grateful to Pierre Hyvernat, who was interested in the completeness part of this work: he simultaneously proved the result stated here with an elegant combinatorial approach and found the total description of the POr function.
References

1. Blute, R.F., Scott, P.J.: Linear Läuchli semantics. Annals of Pure and Applied Logic (1996)
2. Chu, P.-H.: Constructing *-autonomous categories. Appendix to: Barr, M., *-Autonomous Categories. Lecture Notes in Mathematics (1979)
3. Colson, L., Ehrhard, T.: On strong stability and higher-order sequentiality. In: Proceedings of the Symposium on Logic in Computer Science (LICS 1994), pp. 103–108 (1994)
4. Ehrhard, T.: On finiteness spaces and extensional presheaves over the Lawvere theory of polynomials. Journal of Pure and Applied Algebra (to appear)
5. Ehrhard, T.: Hypercoherences: a strongly stable model of linear logic. Mathematical Structures in Computer Science 3(4), 365–385 (1993)
6. Ehrhard, T.: Finiteness spaces. Mathematical Structures in Computer Science 15(4) (2005)
7. Fischer, H.R., Gross, H.: Tensorprodukte linearer Topologien. Mathematische Annalen 160 (1965)
8. Girard, J.-Y.: The system F of variable types, fifteen years later. Theoretical Computer Science 45, 159–192 (1986)
9. Girard, J.-Y.: Linear logic. Theoretical Computer Science 50, 1–102 (1987)
10. Girard, J.-Y.: Le point aveugle: cours de logique, tome 1, Vers la perfection. Visions des sciences. Hermann (2006)
11. Grothendieck, A.: Produits tensoriels topologiques et espaces nucléaires. Memoirs of the American Mathematical Society 16 (1955)
12. Hyland, M., Schalk, A.: Glueing and orthogonality for models of linear logic. Theoretical Computer Science 294(1-2), 183–231 (2003) 13. Köthe, G.: Topological Vector Spaces I. Springer, Heidelberg (1979) 14. Lafont, Y., Streicher, T.: Games semantics for linear logic. In: Proceedings of Sixth Annual IEEE Symposium on Logic in Computer Science, 1991. LICS 1991, pp. 43–50 (1991) 15. Lefschetz, S., et al.: Algebraic topology. American Mathematical Society (1942) 16. Loader, R.: Linear logic, totality and full completeness. In: Proceedings of Symposium on Logic in Computer Science, 1994. LICS 1994, pp. 292–298 (1994) 17. Vaux, L.: On linear combinations of λ-terms. In: Baader, F. (ed.) RTA 2007. LNCS, vol. 4533, pp. 374–388. Springer, Heidelberg (2007)
A Logical Foundation for Environment Classifiers

Takeshi Tsukada¹ and Atsushi Igarashi²

¹ Tohoku University
² Kyoto University
Abstract. Taha and Nielsen have developed a multi-stage calculus λα with a sound type system using the notion of environment classifiers. These are special identifiers, with which code fragments and variable declarations are annotated, and their scoping mechanism is used to ensure statically that certain code fragments are closed and safely runnable. In this paper, we investigate the Curry-Howard isomorphism for environment classifiers by developing a typed λ-calculus λ▶. It corresponds to a multi-modal logic that allows quantification by transition variables—a counterpart of classifiers—which range over (possibly empty) sequences of labeled transitions between possible worlds. This interpretation reduces the "run" construct—which has a special typing rule in λα—and the embedding of closed code into other code fragments of different stages—which would only be realized by the cross-stage persistence operator in λα—to merely special cases of classifier application. We prove that λ▶ enjoys basic properties including subject reduction, confluence, and strong normalization, and that the execution of a well-typed λ▶ program is properly staged. Finally, we show that the proof system augmented with a classical axiom is sound and complete with respect to a Kripke semantics of the logic.
1 Introduction
A number of programming languages and systems that support manipulation of programs as data [1,2,3,4,5] have been developed in the last two decades. A popular language abstraction in these languages consists of the Lisp-like quasiquotation mechanism to create and compose code fragments and a function to run them, like eval in Lisp. For those languages and systems, a number of type systems for so-called "multi-stage" calculi have been studied [5,6,7,8,9,10,11] to guarantee the safety of generated programs even before the generating program runs.

Among them, some seminal work on the principled design of type systems for multi-stage calculi is due to Davies [7] and Davies and Pfenning [8]. They discovered the Curry-Howard isomorphism between modal/temporal logics and multi-stage calculi by identifying (1) modal operators in modal logic with type constructors for code fragments treated as data and, in the case of temporal logic, (2) the notion of time with computation stages. For example, the calculus λ○ [7], which can be thought of as a reformulation of Glück and Jørgensen's

P.-L. Curien (Ed.): TLCA 2009, LNCS 5608, pp. 341–355, 2009.
© Springer-Verlag Berlin Heidelberg 2009
calculus for multi-level generating extensions [6] by using explicit quasiquote and unquote in the language, corresponds to a fragment of linear-time temporal logic (LTL) with the temporal operator "next" (written ○) [12]. Here, linearly ordered time corresponds to the level of nesting of quasiquotations, and a modal formula ○A to the type of code of type A. It, however, does not treat eval; in fact, the code type in λ○ represents open code, that is, code that may have free variables, so simply adding eval to the calculus does not work—code execution may fail by unbound variables. The calculus λ□ [8], on the other hand, corresponds to (intuitionistic) modal logic S4 (only with the necessity operator □), in which a formula □A is considered the type of closed code of type A. It supports safe eval since every code fragment is closed, but the inability to deal with open code hampers generation of efficient code. The subsequent work by Taha and others [5,13,14,9,15] sought various combinations of the two systems above to develop expressive type systems for multi-stage calculi. Finally, Taha and Nielsen [9] developed a multi-stage calculus λα, which was later modified to make type inference possible [15] and implemented as a basis of MetaOCaml. The calculus λα has a strong type system while supporting open code, eval (called run), and the mechanism called cross-stage persistence (CSP), which allows a value to be embedded in a code fragment evaluated later. For the type system, they introduced the notion of environment classifiers, which are special identifiers with which code fragments and variable declarations are annotated. A key idea is to reduce the closedness checking of a code fragment (which is useful to guarantee the safety of eval) to the freshness checking of a classifier.
Unfortunately, however, the correspondence to a logic is no longer clear for λα, resulting in somewhat ad-hoc typing rules and a complicated operational semantics, which would be difficult to adapt to different settings. In this paper, we investigate the Curry-Howard isomorphism for environment classifiers by developing a typed λ-calculus λ▶. The new calculus corresponds to a multi-modal logic that allows quantification by transition variables—the counterpart of environment classifiers. Multiple modalities correspond to the indexing of code types by classifiers, and quantifiers to types for classifier abstractions, used to ensure freshness of classifiers. One of our key ideas is to let classifiers range, in the Kripke semantics, over possibly empty sequences of labels attached to the transition functions on possible worlds. A pleasant effect of this interpretation is that it reduces the run construct—which has a peculiar typing rule in λα—and the embedding of closed code into other code fragments of different stages—which would only be realized by the CSP operator in λα—to merely special cases of classifier application. Our technical contributions are as follows:

– Identification of a modal logic that corresponds to environment classifiers;
– Development of a new typed λ-calculus λ▶, naturally emerging from the correspondence, with its syntax, operational semantics, and type system;
– Proofs of basic properties as a multi-stage calculus; and
– Proofs of soundness and completeness of the proof system (augmented with a classical axiom) with respect to a Kripke semantics of the logic.
One missing feature in λ▶ is CSP for all types of values, but we do not think it is a big problem. First, CSP for primitive or function values is easy to add as a primitive (if one gives up printing code representations of functional values as in MetaOCaml). Second, as mentioned above, embedding closed code into code fragments of later stages is supported by a different means. It does not seem very easy to add CSP for open code to λ▶, but we think it is rarely needed.

Organization of the Paper. In Section 2, we review λα and informally describe how the features of its type system correspond to those of a logic. In Section 3, we define the multi-stage calculus λ▶ and prove basic properties including subject reduction, strong normalization, confluence, and the property that the big-step semantics implements staged execution. In Section 4, we formally define (a classical version of) the logic that corresponds to λ▶ and prove soundness and completeness of the proof system with respect to a Kripke semantics. Lastly, we discuss related work and conclude. We omit proofs of the properties from the paper; a full version of the paper with proofs is available at http://www.sato.kuis.kyoto-u.ac.jp/~igarashi/papers/classifiers.html.
2 Interpreting Environment Classifiers in a Modal Logic

In this section, we informally describe how environment classifiers can be interpreted in a modal logic. We start by reviewing Davies' λ○ [7] to get an intuition of how notions in a modal logic correspond to those in a multi-stage calculus. Then, along with reviewing the main ideas of environment classifiers, we describe our logic informally and explain how our calculus λ▶ differs from the λα of Taha and Nielsen [9].

2.1 λ○: Multi-stage Calculus Based on LTL
Davies has developed the typed multi-stage calculus λ○, which corresponds to a fragment of LTL by the Curry-Howard isomorphism. It can be considered the λ-calculus with a Lisp-like quasiquotation mechanism. We first review linear-time temporal logic and the correspondence between the logic and the calculus.

Linear-time temporal logic is a sort of temporal logic in which the truth of propositions may depend on discrete and linearly ordered time, i.e., a given time has a unique time that follows it. Some of the standard temporal operators are ○ (to mean "next"), □ (to mean "always"), and U (to mean "until"). Its Kripke semantics can be given by taking the set of natural numbers as possible worlds; then, for example, the semantics of ○ is given by: n ⊨ ○τ if and only if n+1 ⊨ τ, where n ⊨ τ is the satisfaction relation, which means "τ is true at world n."

In addition to the usual Curry-Howard correspondence between propositions and types and between proofs and terms, Davies has pointed out additional correspondences between time and computation stages (i.e., levels of nested quotations) and between the temporal operator ○ and the type constructor meaning "the type of code of". So, for example, ○τ₁ → ○τ₂, which means "if τ₁ holds at the next time, then τ₂ holds at the next time," is considered the type of functions
that take a piece of code of type τ₁ and return code of type τ₂. According to this intuition, he has developed λ○, corresponding to the fragment of LTL with ○ only. λ○ has two new term constructors, next M and prev M, which correspond to the introduction and elimination rules of ○, respectively. The type judgment of λ○ is of the form Γ ⊢ⁿ M : τ, where Γ is a context, M is a term, τ is a type (a proposition of LTL with ○ only) and n is a natural number indicating a stage. A context, which corresponds to assumptions, is a mapping from variables to pairs of a type and a natural number, since the truth of a proposition depends on time. The key typing rules are those for next and prev:

Γ ⊢ⁿ⁺¹ M : τ  ⟹  Γ ⊢ⁿ next M : ○τ        Γ ⊢ⁿ M : ○τ  ⟹  Γ ⊢ⁿ⁺¹ prev M : τ
The former means that, if M is of type τ at level n + 1, then, at level n, next M is code of type τ; the latter is its converse. Computationally, next and prev can be considered quasiquote and unquote, respectively. So, in addition to the standard β-reduction, λ○ has the reduction rule prev (next M) −→ M, which cancels next by prev. The code types in λ○ are often called open code types, since the quoted code may contain free variables; hence naively adding a construct to "run" quoted code does not work, since it may cause unbound variable errors.

2.2 Multi-modal Logic for Environment Classifiers
Taha and Nielsen [9] have introduced environment classifiers to develop λα, which has quasiquotation, run, and CSP with a strong type system. We explain how λα can be derived from λ○.¹ Environment classifiers are a special kind of identifiers with which code types and quoting are annotated: for each classifier α, there are a type constructor ⟨τ⟩α for code and a term constructor ⟨M⟩α to quote M. Then, a stage is naturally expressed by a sequence of classifiers, and a type judgment is of the form Γ ⊢A M : τ, where the natural numbers of λ○ type judgments are replaced with sequences A of classifiers. So, the typing rules of quoting and unquoting (written ˜M) in λα are given as follows:

Γ ⊢Aα M : τ  ⟹  Γ ⊢A ⟨M⟩α : ⟨τ⟩α        Γ ⊢A M : ⟨τ⟩α  ⟹  Γ ⊢Aα ˜M : τ
Obviously, this is a generalization of λ○: if only one classifier is allowed, then the calculus is essentially λ○. The corresponding logic would also be a generalization of LTL, in which there are several "dimensions" of linearly ordered time. A Kripke frame for the logic is given by a transition system [12] in which each transition relation is a map. More formally, a frame is a triple (S, L, {→α | α ∈ L}), where S is the (non-empty) set
¹ Unlike the original presentation, classifiers do not appear explicitly in contexts here. The typing rules shown are adapted accordingly.
of states, L is the set of labels, and →α ∈ S → S for each α ∈ L. Then the semantics of ⟨τ⟩α is given by: s ⊨ ⟨τ⟩α if and only if s′ ⊨ τ, where s →α s′ and s, s′ are states.

The calculus λα also has a scoping mechanism for classifiers, which plays a central role in guaranteeing the safety of run. The term (α)M, which binds α in M, declares that α is used locally in M; such a local classifier can be instantiated with another classifier by the term M[β]. We show the typing rules for them, together with the one for run, below:

α ∉ FV(Γ, A)   Γ ⊢A M : τ  ⟹  Γ ⊢A (α)M : (α)τ
Γ ⊢A M : (α)τ  ⟹  Γ ⊢A M[β] : τ[α := β]
Γ ⊢A M : (α)⟨τ⟩α  ⟹  Γ ⊢A run M : (α)τ
The rule for (α)M requires that α does not occur in the context—the term M has no free variable labeled α—and gives a type of the form (α)τ, which Taha and Nielsen called an α-closed type and which characterizes a relaxed notion of closedness. The rule for run M says that an α-closed code fragment annotated with α can be run. Note that ⟨·⟩α (but not (α)·) is removed in the type of run M. Taha and Nielsen have shown that α-closedness is sufficient to guarantee the safety of run.

When this system is to be interpreted as a logic, it is fairly clear that (α)τ is a kind of universal quantifier, as Taha and Nielsen have also pointed out [9]. Then the question is "What does a classifier range over?", which has not really been answered so far. Another interesting question is "How can the typing rule for run be read logically?" One plausible answer to the first question is that classifiers range over the set of transition labels. This interpretation matches the rule for M[β], and it seems that the typing rules without run (with a classical axiom) are sound and complete with respect to the Kripke semantics that defines s ⊨ (α)τ by s ⊨ τ[α := β] for all β ∈ L. However, it is then difficult to explain the rule for run.

The key idea to solve this problem is to have classifiers range over the set of finite (and possibly empty) sequences of transition labels and to allow a classifier abstraction (α)M to be applied also to sequences of classifiers. Then run is unified with a special case of application of a classifier abstraction to the empty sequence. More concretely, we change the term M[β] to M[B], where B is a possibly empty sequence of classifiers (the left rule below). When B is empty and τ is ⟨τ₀⟩α (assuming τ₀ does not include α), the rule (shown as the right rule below) can be thought of as the typing rule of (another version of) run, since α-closed code of τ₀ becomes simply τ₀ (without (α)· as in the original λα).

Γ ⊢A M : (α)τ  ⟹  Γ ⊢A M[B] : τ[α := B]        Γ ⊢A M : (α)⟨τ₀⟩α  ⟹  Γ ⊢A M[ε] : τ₀
Another benefit of this change is that cross-stage persistence for closed code (or embedding of persistent code [10]) can be easily expressed. For example, if x is of the type (α)⟨int⟩α, then it can be used as code computing an integer at different stages, as in, say, ⟨· · · (˜x[α]) + 3 · · ·⟩α or ⟨· · · ⟨· · · 4 + (˜˜x[αβ]) · · ·⟩β · · ·⟩α. So, once a programmer obtains closed code, she can use it at any later stage.

Correspondingly, the semantics is now given by v, ρ; s ⊨ τ, where v is a valuation for propositional variables and ρ is a mapping from classifiers to sequences
of transition labels. Then v, ρ; s ⊨ ⟨τ⟩α is defined by v, ρ; s′ ⊨ τ, where s′ is reachable from s through the sequence ρ(α) of transitions, and v, ρ; s ⊨ (α)τ by: v, ρ[A/α]; s ⊨ τ for any sequence A of labels (ρ[A/α] updates the value of α to A). In Section 4, we give the formal definition of the Kripke semantics and show that the proof system based on the ideas above, augmented with double negation elimination, is sound and complete with respect to the semantics.
3 The Calculus λ▶
In this section, we define the calculus λ▶, based on the ideas described in the previous section: we first define its syntax, type system, and small-step full-reduction semantics and state some basic properties; then we define a big-step call-by-value semantics and show that staged execution is possible with this semantics. Finally, we give an example of programming in λ▶. We intentionally make the notations for type and term constructors different from λα because their precise meanings are different; it also avoids confusion when we compare the two calculi.

3.1 Syntax
Let Σ be a countably infinite set of transition variables, ranged over by α and β. A transition, denoted by A and B, is a finite sequence of transition variables; we write ε for the empty sequence and AB for the concatenation of two transitions. We write Σ* for the set of transitions. A transition is often called a stage. We write FTV(A) for the set of transition variables in A, defined by FTV(α_1 α_2 . . . α_n) = {α_i | 1 ≤ i ≤ n}. Let PV be the set of base types (corresponding to propositional variables), ranged over by b. The set Φ of types, ranged over by τ and σ, is defined by the following grammar:

Types    τ ::= b | τ → τ | ▷α τ | ∀α.τ
A type is a base type, a function type, a code type ▷α τ, which corresponds to ⟨·⟩α of λα, or an α-closed type ∀α.τ, which corresponds to (α)τ. The transition variable α of ∀α.τ is bound in τ. In what follows, we assume tacit renaming of bound variables in types. The type constructor ▷α binds tighter than →, and → tighter than ∀: for example, ▷α τ → σ means (▷α τ) → σ, and ∀α.τ → σ means ∀α.(τ → σ). We write FTV(τ) for the set of free transition variables of τ, which is defined in a straightforward manner.

Let Υ be a countably infinite set of variables, ranged over by x and y. The set of terms, ranged over by M and N, is defined by the following grammar:

Terms    M ::= x | M M | λx : τ.M | ▶α M | ◀α M | Λα.M | M A
In addition to the standard λ-terms, there are four more term forms, which correspond to ⟨M⟩α, ˜M, (α)M, and M[β] of λα (respectively, in the order presented). Note that, unlike ˜M in λα, the term ◀α M for unquote is also annotated. The variable x in λx : τ.M and the transition variable α in Λα.M are bound in M. Bound variables are tacitly renamed to avoid variable capture in substitution.
(Var)  Γ, x : τ@A ⊢A x : τ

(Abs)  Γ, x : τ@A ⊢A M : σ  ⟹  Γ ⊢A λx : τ.M : τ → σ

(App)  Γ ⊢A M : τ → σ   Γ ⊢A N : τ  ⟹  Γ ⊢A M N : σ

(▶)   Γ ⊢Aα M : τ  ⟹  Γ ⊢A ▶α M : ▷α τ

(◀)   Γ ⊢A M : ▷α τ  ⟹  Γ ⊢Aα ◀α M : τ

(Gen)  α ∉ FTV(Γ) ∪ FTV(A)   Γ ⊢A M : τ  ⟹  Γ ⊢A Λα.M : ∀α.τ

(Ins)  Γ ⊢A M : ∀α.τ  ⟹  Γ ⊢A M B : τ[α := B]

Fig. 1. Typing rules
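The rules of Fig. 1 are close to an algorithm, and a minimal checker makes the stage discipline concrete. The sketch below is ours (hypothetical names and representation, not from the paper); it represents stages as tuples of classifier names and types as tagged tuples, and it omits the freshness side condition of (Gen).

```python
# A compact, partial sketch of the typing rules of Fig. 1.
def subst_ty(t, a, B):                  # tau[alpha := B]
    tag = t[0]
    if tag == 'base':
        return t
    if tag == 'fun':
        return ('fun', subst_ty(t[1], a, B), subst_ty(t[2], a, B))
    if tag == 'code':                   # ('code', alpha, tau)
        body = subst_ty(t[2], a, B)
        if t[1] != a:
            return ('code', t[1], body)
        for c in reversed(B):           # identify |>_{AB} with |>_A |>_B
            body = ('code', c, body)
        return body
    if tag == 'all':                    # ('all', alpha, tau); no capture check
        return t if t[1] == a else ('all', t[1], subst_ty(t[2], a, B))

def typeof(ctx, stage, m):              # ctx maps a variable to (type, stage)
    tag = m[0]
    if tag == 'var':                    # (Var): usable only at its own stage
        ty, st = ctx[m[1]]
        assert st == stage
        return ty
    if tag == 'lam':                    # (Abs)
        _, x, ty, body = m
        return ('fun', ty, typeof({**ctx, x: (ty, stage)}, stage, body))
    if tag == 'app':                    # (App)
        f = typeof(ctx, stage, m[1])
        assert f[0] == 'fun' and typeof(ctx, stage, m[2]) == f[1]
        return f[2]
    if tag == 'quote':                  # quote: body checked at stage A*alpha
        return ('code', m[1], typeof(ctx, stage + (m[1],), m[2]))
    if tag == 'unquote':                # unquote: stage must end with alpha
        assert stage and stage[-1] == m[1]
        ty = typeof(ctx, stage[:-1], m[2])
        assert ty[0] == 'code' and ty[1] == m[1]
        return ty[2]
    if tag == 'gen':                    # (Gen): freshness check omitted here
        return ('all', m[1], typeof(ctx, stage, m[2]))
    if tag == 'tapp':                   # (Ins)
        ty = typeof(ctx, stage, m[1])
        assert ty[0] == 'all'
        return subst_ty(ty[2], ty[1], m[2])

# quote_a (lam x:int. x) gets type |>_a (int -> int) at the empty stage
t = typeof({}, (), ('quote', 'a', ('lam', 'x', ('base', 'int'), ('var', 'x'))))
assert t == ('code', 'a', ('fun', ('base', 'int'), ('base', 'int')))
```

The `reversed` loop in `subst_ty` implements the identification ▷AB τ = ▷A ▷B τ used in rule (Ins).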
3.2 Type System
As mentioned above, a type judgment and the variable declarations in a context are annotated with stages. A context Γ is a finite set {x_1 : τ_1@A_1, . . . , x_n : τ_n@A_n}, where the x_i are distinct variables. We often omit the braces {}. We write FTV(Γ) for the set of free transition variables in Γ, defined by FTV({x_i : τ_i@A_i | 1 ≤ i ≤ n}) = ∪_{i=1}^n (FTV(τ_i) ∪ FTV(A_i)).

A type judgment is of the form Γ ⊢A M : τ, read "term M is given type τ under context Γ at stage A." Figure 1 presents the typing rules to derive type judgments. The notation τ[α := B], used in the rule (Ins), is capture-avoiding substitution of the transition B for α in τ. When the α of ▷α is replaced by a transition, we identify ▷ε τ with τ and ▷AB τ with ▷A ▷B τ. For example, (▷α ∀α.▷α b)[α := ε] = ∀α.▷α b and (∀α.▷β b)[β := αα] = ∀α′.▷αα b.

The first three rules are almost standard except for the stage annotations, which must be equal, as in most multi-stage calculi. The rule (Var) means that variables can appear only at the stage at which they are declared. The next two rules (▶) and (◀) are for quoting and unquoting and were already explained in the previous section. The last two rules (Gen) and (Ins) are for generalization and instantiation of a transition variable, respectively. They resemble the introduction and elimination rules of ∀x.A(x) in first-order predicate logic: the side condition of the (Gen) rule ensures that the choice of α is independent of the context. Computationally, this side condition expresses the α-closedness of M: M has no free variable whose type or stage mentions α. This is a weaker form of closedness, which would mean that M has no free variable at all.

3.3 Reduction
We will introduce full reduction M −→ N , read “M reduces to N in one step,” and prove basic properties including subject reduction, confluence and strong normalization. Before giving the definition of reduction, we define substitution. Since the calculus has binders for term variables and transition variables, we need two kinds of substitutions for both kinds of variables. Substitution M [x := N ] for a term variable is the standard capture-avoiding one, and its definition is omitted
here. Substitution M[α := A] of A for α is defined similarly to τ[α := A]. For example, (λx : τ.M)[α := A] = λx : (τ[α := A]).(M[α := A]), (M B)[α := A] = (M[α := A])(B[α := A]) and (▶β M)[α := A] = ▶_{β[α := A]} (M[α := A]), where we define ▶_{α_1...α_n} M = ▶α_1 · · · ▶α_n M and ◀_{α_1...α_n} M = ◀α_n · · · ◀α_1 M. In particular, (▶α M)[α := ε] = (◀α M)[α := ε] = M[α := ε]. Note that, when a transition variable in ◀ is replaced, the order of the transition variables is reversed, because unquote is the inverse operation of quote. This is similar to the inversion operation in group theory: (a_1 a_2 . . . a_n)⁻¹ = a_n⁻¹ a_{n−1}⁻¹ . . . a_1⁻¹.

The reduction relation M −→ N is the least relation closed under the following three computation rules
α (α M ) −→ M
(Λα.M )A −→ M [α := A]
and congruence rules, which are omitted here. In addition to the standard βreduction, there are two rules: the second one, which is already explained previously, cancels quote by unquote and the last one, instantiation of a transition variable, is similar to polymorphic function application in System F. Note that the reduction is full—reduction occurs under any context—and does not take T staging into account. We can define the reduction relation as a triple M −→ N , with T standing for the stage of reduciton, as done in λ [7] and λ [10]. The reduction enjoys three basic properties, subject reduction, strong normalization and confluence. Theorem 1 (Subject Reduction). If Γ A M : τ and M −→ M , then Γ A M : τ . Theorem 2 (Strong Normalization). Let M be a typable term. There is no infinite reduction sequence M −→ N1 −→ N2 −→ · · · . Theorem 3 (Confluence). If M −→∗ N1 and M −→∗ N2 , then there exists N such that N1 −→∗ N and N2 −→∗ N .
3.4 Big-Step Semantics
Now, we give a big-step semantics and prove that the execution of a well-typed program can be properly divided into stages. The judgment has the form A ⊢ M ⇓ R, read "evaluating term M at stage A yields result R," where R is either err, which stands for a run-time error, or a value v, defined below. Values are given via a family of sets V^A indexed by transitions, that is, stages. The family V^A is defined by the following grammar:

V^ε ::= λx : τ.M | ▶α V^α | Λα.V^ε
V^A (A ≠ ε) ::= x | λx : τ.V^A | V^A V^A | ▶α V^{Aα} | Λα.V^A | V^A B | ◀α V^{A′} (if A′α = A and A′ ≠ ε)

The set V of values is defined as ∪_{A∈Σ*} V^A.
ε ⊢ λx : τ.M ⇓ λx : τ.M

ε ⊢ M ⇓ λx : τ.M′   ε ⊢ N ⇓ v   ε ⊢ M′[x := v] ⇓ v′  ⟹  ε ⊢ M N ⇓ v′

ε ⊢ M ⇓ ▶α M′  ⟹  α ⊢ ◀α M ⇓ M′

ε ⊢ M ⇓ Λα.v   ε ⊢ v[α := B] ⇓ v′  ⟹  ε ⊢ M B ⇓ v′

Bα ⊢ M ⇓ M′  ⟹  B ⊢ ▶α M ⇓ ▶α M′

B ⊢ M ⇓ M′  ⟹  B ⊢ Λα.M ⇓ Λα.M′

A ⊢ x ⇓ x

A ⊢ M ⇓ M′  ⟹  A ⊢ λx : τ.M ⇓ λx : τ.M′

A ⊢ M ⇓ M′   A ⊢ N ⇓ N′  ⟹  A ⊢ M N ⇓ M′ N′

A ⊢ M ⇓ M′  ⟹  Aα ⊢ ◀α M ⇓ ◀α M′

A ⊢ M ⇓ M′  ⟹  A ⊢ M B ⇓ M′ B

Fig. 2. Big-Step Semantics. Here, A stands for a non-empty sequence and B for a possibly empty sequence of transition variables.
Figure 2 shows the evaluation rules. The evaluation is left-to-right, call-by-value. The first four and the last two rules (where B = ε) are for ordinary evaluation. The first two rules are standard. The third rule means that quote is canceled by unquote; since the resulting term M′ belongs to the stage α (inside quotation), α is attached to the conclusion. The fourth rule, about instantiation of a transition abstraction, is straightforward. As seen in the rule for Λα.M, Λ does not delay the evaluation of the body. The rules for stages later than ε are all similar: since the term to be evaluated is inside quotation, the term constructor is left as it is and only subterms of stage ε are evaluated. For brevity, we do not present the error-generating and error-propagating rules, which are straightforward.

We show properties of the big-step semantics. The following theorem says that, unless the result is err, the result must be a value even though the rules do not say it is the case, and that successful evaluation is included in multi-step reduction (−→* stands for the reflexive transitive closure of −→).

Theorem 4. Suppose A ⊢ M ⇓ R. Then, either R = err or M −→* R ∈ V^A.

The last property is type soundness, together with the corollary that, if a well-typed program of a code type yields a result, then the result is a quoted term whose body is also typable at stage ε. In the statements, we say Γ is ε-free if A ≠ ε for every x : τ@A ∈ Γ, and we define a context Γ−A by Γ−A = {x : τ@B | x : τ@AB ∈ Γ}.

Theorem 5 (Type Soundness). If Γ is ε-free and Γ ⊢ε M : τ and ε ⊢ M ⇓ R, then R = v ∈ V^ε for some v and Γ ⊢ε v : τ. Moreover, if τ = ▷α τ₀, then v = ▶α N and Γ−α ⊢ε N : τ₀.
3.5 Programming in λ▷
We give an example of programming in λ▷: the power function, a classic example in multi-stage calculi and partial evaluation. We augment λ▷ with integers, booleans, arithmetic and comparison operators, if-then-else, a fixed-point operator fix, and let, all of which would be easy to add.
T. Tsukada and A. Igarashi
For readability, we often omit type annotations. We start with the ordinary power function without staging.

  let power0 : int → int → int =
    fix f. λn. λx. if n = 0 then 1 else x ∗ (f (n − 1) x)

Our purpose is to get a code generator power∀ that takes the exponent n and returns (closed, hence runnable) code of λx. x ∗ x ∗ ··· ∗ x ∗ 1, which computes x^n without recursion. Here, we follow the construction of code generators in the previous work [14,13]. First, we construct a code manipulator power1 : int → ▷α int → ▷α int, which takes an integer n and a piece of integer code and outputs a piece of code connecting the input code by "∗" n times. It is obtained by changing type annotations and introducing quasiquotation.

  let power1 : int → ▷α int → ▷α int =
    fix f. λn. λx : ▷α int. if n = 0 then ▷α 1 else ▷α ((◁α x) ∗ (◁α (f (n − 1) x)))

Then, from power1, we can construct a code generator powerα of type int → ▷α (int → int), meaning that it takes an integer and returns code of a function.

  let powerα : int → ▷α (int → int) =
    λn. ▷α (λx : int. ◁α (power1 n (▷α x)))

It indeed behaves as a code generator: for example, powerα 3 would evaluate to ▷α (λx : int. x ∗ (x ∗ (x ∗ 1))). This construction is independent of the choice of the stage α. So, by abstracting α at appropriate places in power1 and powerα, we can obtain the desired code generator, whose return type is a closed code type ∀α. ▷α (int → int).

  let power2 : ∀α. int → ▷α int → ▷α int =
    Λα. fix f. λn. λx : ▷α int. if n = 0 then ▷α 1 else ▷α ((◁α x) ∗ (◁α (f (n − 1) x)))

  let power∀ : int → ∀α. ▷α (int → int) =
    λn. Λα. ▷α (λx : int. ◁α (power2 α n (▷α x)))

The output from power∀ is usable at any stage. For example, if we want code of a cube function at the stage A, we write power∀ 3 A. In particular, when A is the empty sequence ε, power∀ 3 ε : int → int evaluates to a function closure which computes x ∗ x ∗ x ∗ 1 from the input x.
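The staged construction can be mimicked outside the calculus. The following Python sketch is not λ▷ itself: string building plays the role of quotation, `eval` plays the role of applying the generator to ε, and the name `power_gen` is ours.

```python
# A rough analogue of power_forall: staging simulated by generating source
# text for the specialized body, then compiling it with eval.

def power_gen(n):
    """Return source code of a function computing x**n without recursion:
    its body is x * (x * (... * (1)))."""
    body = "1"
    for _ in range(n):
        body = "x * (%s)" % body
    return "lambda x: " + body

cube_code = power_gen(3)   # "lambda x: x * (x * (x * (1)))"
cube = eval(cube_code)     # 'running' the code, like applying to stage ε
print(cube(2))             # → 8
```

The generated code is closed (it mentions only its own parameter `x`), which is the analogue of the closed code type ∀α. ▷α (int → int).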
A Logical Foundation for Environment Classifiers
4 Kripke Semantics for λ▷ and Logical Completeness
In this section, we formally define a Kripke semantics of the logic corresponding to λ▷ and prove completeness of the proof system. What we examine here is actually a classical version of the logic, which has bottom and a proof rule for double negation elimination, although λ▷ itself can be considered intuitionistic. Studying the semantics of the intuitionistic version is left for future work, for which recent work on Kripke semantics for intuitionistic LTL [16] can be a basis. First, we (re)define the set of propositions and the natural deduction proof system. Then, we proceed to the formal definition of the Kripke semantics and state soundness and completeness of the proof system.

4.1 Natural Deduction
The set Φ⊥ of propositions, ranged over by φ and ψ, is given by the grammar for Φ extended with a new constant ⊥. The natural deduction system is obtained by forgetting variables and terms in the typing rules. We add the following new rule, the ordinary double negation elimination rule adapted for this setting:

        Γ, (φ → ⊥)@A ⊢_B ⊥
        ──────────────────── (⊥-E)
             Γ ⊢_A φ

4.2 Kripke Semantics and Completeness
As mentioned in Section 2, the Kripke semantics for this logic is based on a functional transition system T = (S, L, {−→^a | a ∈ L}), where S is the (non-empty) countable set of states, L is the countable set of labels, and −→^a ∈ S → S for each label a ∈ L. We write s −→^{a1···an} s′ if there exist s1, ..., s_{n−1} such that s −→^{a1} s1 −→^{a2} ··· −→^{a_{n−1}} s_{n−1} −→^{an} s′.

To interpret a proposition, we need two valuations, one for propositional variables and the other for transition variables. The former is a total function v ∈ S × PV → {0, 1}; the latter is a total function ρ ∈ Σ → L∗, where L∗ is the set of all finite sequences of labels. Then, we define the satisfaction relation T, v, ρ; s ⊨ φ, where s ∈ S is a state, as follows:

  T, v, ρ; s ⊨ p       iff  v(s, p) = 1
  T, v, ρ; s ⊨ ⊥       never
  T, v, ρ; s ⊨ φ → ψ   iff  T, v, ρ; s ⊭ φ or T, v, ρ; s ⊨ ψ
  T, v, ρ; s ⊨ ▷α φ    iff  T, v, ρ; s′ ⊨ φ, where s −→^{ρ(α)} s′
  T, v, ρ; s ⊨ ∀α.φ    iff  for all A ∈ L∗, T, v, ρ[A/α]; s ⊨ φ

Here, ρ[A/α] is defined by: ρ[A/α](α) = A and ρ[A/α](β) = ρ(β) (for β ≠ α). The satisfaction relation is extended pointwise to contexts Γ (possibly infinite sets of pairs of a proposition and a transition) by: T, v, ρ; s ⊨ Γ iff T, v, ρ; s ⊨ ▷A φ for all φ@A ∈ Γ (where ▷A abbreviates the composition of the ▷'s along the sequence A).
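The clauses above can be transcribed into an executable checker. The Python sketch below is our own encoding, not part of the paper: it assumes a finite functional transition system and truncates the quantification over A ∈ L∗ to sequences of length at most BOUND, which is an approximation, since the real semantics ranges over all of L∗.

```python
from itertools import product

BOUND = 2  # truncation depth for the quantifier over label sequences

def run(trans, s, seq):
    """Follow the (functional) transitions along a sequence of labels."""
    for a in seq:
        s = trans[(s, a)]
    return s

def sat(trans, labels, val, rho, s, phi):
    """Satisfaction relation; formulas are tagged tuples."""
    kind = phi[0]
    if kind == 'var':      # propositional variable p
        return val[(s, phi[1])]
    if kind == 'bot':      # falsum: never satisfied
        return False
    if kind == 'imp':      # phi -> psi
        return (not sat(trans, labels, val, rho, s, phi[1])
                or sat(trans, labels, val, rho, s, phi[2]))
    if kind == 'next':     # |>alpha phi: follow rho(alpha)
        return sat(trans, labels, val, rho,
                   run(trans, s, rho[phi[1]]), phi[2])
    if kind == 'forall':   # forall alpha. phi (truncated to BOUND)
        return all(sat(trans, labels, val, {**rho, phi[1]: seq}, s, phi[2])
                   for n in range(BOUND + 1)
                   for seq in product(labels, repeat=n))
    raise ValueError(phi)

# Example: a two-state system where p holds only in state 1 (a sink).
trans = {(0, 'a'): 1, (1, 'a'): 1}
val = {(0, 'p'): False, (1, 'p'): True}
always_p = ('forall', 'alpha', ('next', 'alpha', ('var', 'p')))
```

Here `always_p` holds at state 1 (every run stays in 1) but fails at state 0, because the empty sequence keeps us at 0, where p is false.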
The local consequence relation Γ ⊨ φ is defined by: Γ ⊨ φ iff T, v, ρ; s ⊨ Γ implies T, v, ρ; s ⊨ φ, for any T, v, ρ, s. Then, the natural deduction proof system is sound and complete with respect to the local consequence relation. The proof is similar to the one for first-order predicate logic: we use the standard techniques of Skolemization and Herbrand structures.

Theorem 6. Γ ⊢_ε φ if and only if Γ ⊨ φ.
5 Related Work
Multi-Stage Calculi Based on Modal Logics and Their Extensions. Our work can be considered a generalization of the previous work on the Curry-Howard isomorphism between multi-stage calculi and modal logics [7,8,10]. Here, we briefly discuss how the earlier systems λ○ and λ□ can be embedded into λ▷.

First, as already mentioned in Section 2, λ○ is obtained by using only one transition variable: ○ translates to ▷α with a fixed transition variable α, and next and prev translate to quotation and unquotation at α, respectively.

Second, consider the calculus λ□ [8], which corresponds to intuitionistic modal logic S4 (with □). The type □τ represents closed code values, which can thus be run or embedded in code of any later stage, as is possible in λ▷. There are box and unbox_n for quoting and unquoting, respectively (see Pfenning and Davies [8] for details).² The λ□-type □τ corresponds to ∀α. ▷α τ, where τ does not include α; this reflects the fact that the code type in λ□ is (completely) closed. Unlike the embedding from λ□ to λα given in [9], there is no use of CSP. The restriction of λ□ that all code be closed precludes the definition of a code generator like power∀, which generates both efficient and runnable code. Nanevski and Pfenning [17] have extended λ□ with the notion of names, similar to the symbols in Lisp, and remedied this defect of λ□ by allowing newly generated names (not variables) to appear in closed code.

Taha and Sheard [5] added run and CSP to λ○ and developed MetaML, but its type system was not strong enough: run may fail at run time. Then, Moggi, Taha, Benaissa, and Sheard [13] developed the calculus AIM ("An Idealized MetaML"), in which there are types for both open and closed code; it was simplified to λBN, which replaced closed code types with closedness types for closed (but not necessarily code) terms. Both calculi are based on categorical models and have sound type systems. The notion of α-closedness in λα can be considered a generalization of λBN's closed types.
In fact, the typing rule for run in λBN is similar to the one in λα. Although some of these calculi have sound type systems, it is hard to regard them as logics, mainly due to the presence of CSP, which delays the stage of the type judgment to an arbitrary later stage, and due to the typing rule for run (as discussed in Section 2).
² Precisely speaking, this calculus is what they call the "Kripke-style" calculus.
One nice property of λα is that a program can be executed without exploiting information on classifiers; in other words, classifiers can be erased after typechecking. Our calculus λ▷ does not have this "erasure property," due to the presence of abstraction/instantiation of transition variables; however, by restricting ∀-types to the form ∀α. ▷α τ with α ∉ FTV(τ), information on transition variables can be mostly erased. Under this restriction, the only information left after erasure is the length n of A in M A, which only duplicates the quotation at the head of the value of M n times. This restriction, which resembles one in λi [15], still allows the embeddings of λ○ and λ□ and the definition of power∀ (by inlining power2 into its body).

Comparing λα and λ▷, we point out two differences. First, λα has CSP for all terms, but λ▷ cannot express CSP for open code. While CSP for closed code can be dealt with as syntactic sugar, CSP for open code cannot be expressed in λ▷, because there is no context C such that x : ▷α b @ ε ⊢_β C[x] : ▷α b. A second difference is the behavior of run on a term M : ∀α. ▷α ▷α b. In λα, run removes only one quotation, leaving ∀, so run M : ∀α. ▷α b, while, in λ▷, the application to ε removes all the ▷α's, that is, M ε : b.

More recently, Yuse and Igarashi [10] have proposed a calculus combining λ○ and λ□ while maintaining the Curry-Howard isomorphism. The main idea was to consider LTL with the modalities "always" (□) and "next" (○), which represent closed and open code types, respectively; it is similar to AIM in this respect. Although this calculus is based on logic, it cannot be embedded into λ▷ simply by combining the two embeddings above: in it, both directions of □○τ ↔ ○□τ are provable, whereas neither direction of (∀α. ▷α ▷β τ) ↔ ▷β ∀α. ▷α τ is provable in λ▷.
However, in their calculus it seems impossible to program a code specializer like power∀, which generates specialized code usable at any stage; the best one presented can generate specialized code usable only at later stages, so running the specialized code is not possible.

It is considered difficult to develop a sound type system for staging constructs with side effects. Calcagno, Moggi, and Sheard developed a sound type system for a multi-stage calculus with references using closed types [18]. It would be interesting to study whether their closedness condition can be relaxed by using α-closedness.

Other Multi-Stage Calculi. Calcagno, Yi, and Kim's λpoly_open [11] is a rather powerful multi-stage calculus with open and closed code fragments, intentionally variable-capturing substitution, lifting of values into code, and even references and ML-style type inference. The type structure of λpoly_open is rather different: a code type records the names of free variables and their types, as well as the type of the whole code. It is not clear how (a pure fragment of) the calculus can be related to other foundational calculi; possible directions may be to use the calculus of contexts by Sato, Sakurai, and Kameyama [19], or the contextual modal type theory of Nanevski, Pfenning, and Pientka [20].

Modal Logics. As discussed above, the □-fragment of S4 and the ○-fragment of LTL can be embedded into our logic, while the □-fragment of LTL and our logic are incomparable.
Our logic has three characteristic features: (1) it is multi-modal, (2) it has universal quantification over modalities, and (3) its modal operators are "relative", meaning that their semantics depends on the possible world at which they are interpreted. Most other logics do not have all of these features.

Dynamic logic [21] is a multi-modal logic for reasoning about programs. Its modal operators are [α] for each program α, and [α]φ means "whenever α halts after execution from the current state, φ holds". Dynamic logic is multi-modal and its modal operators are "relative", but it does not have quantification over programs. Therefore, there is no formula in Dynamic logic corresponding to ∀α. ▷α ▷α φ. Conversely, there are formulas expressible in Dynamic logic but not in our logic: e.g., the Dynamic logic formula [α∗]φ, which intuitively means φ ∧ [α]φ ∧ [α][α]φ ∧ ..., cannot be expressed in our logic.

Hybrid logic [22] is a modal logic with a new kind of atomic formulas called nominals, each of which is true at exactly one state in any model (therefore, a nominal names a state). For each nominal i, @i is a modal operator, and @i φ means "φ holds at the state denoted by i". Hybrid logic has a universal quantifier over nominals. It differs from our logic in that the modal operators @i indicate worlds directly, hence are not "relative": in Hybrid logic @i @j φ ↔ @j φ, but ▷α ▷β φ and ▷β φ are not equivalent in our logic.
6 Conclusion and Future Work
We have studied a logical aspect of environment classifiers by developing a simply typed multi-stage calculus λ▷ with environment classifiers. By the Curry-Howard isomorphism, this calculus corresponds to a multi-modal logic with quantification over transitions. The classical proof system is sound and complete with respect to the Kripke semantics. Our calculus simplifies the previous calculus λα of environment classifiers by reducing run and some uses of CSP to an extension of another construct. We believe our work helps clarify the semantics of environment classifiers.

From a theoretical perspective, it is interesting to study the semantics of the intuitionistic version of the logic, as mentioned earlier, and also the calculus corresponding to the classical version of the logic. It is known that the naive combination of staging constructs and control operators is problematic, since bound variables in a quotation may escape from its scope via a control operator. We expect that a logical analysis, like the one presented here and that of Reed and Pfenning [23], will help analyze the problem.

From a practical perspective, one feature missing from λ▷ is CSP for all types. As argued in the introduction, we think typical uses of CSP are rather limited and thus easy to support. Type inference for λ▷ is an open problem; however, Calcagno, Moggi, and Taha [15] have already developed type inference for a subset of λα, so it may be possible to apply their technique to λ▷.

Acknowledgments. This work was begun while the first author was at Kyoto University. We would like to thank Lintaro Ina, Naoki Kobayashi, Ryosuke Sato, and Naokata Shikuma for useful comments.
References

1. Jones, N.D., Gomard, C.K., Sestoft, P.: Partial Evaluation and Automatic Program Generation. Prentice-Hall, Englewood Cliffs (1993)
2. Consel, C., Lawall, J.L., Meur, A.F.L.: A tour of Tempo: A program specializer for the C language. Science of Computer Programming 52(1-3), 341–370 (2004)
3. Wickline, P., Lee, P., Pfenning, F.: Run-time code generation and Modal-ML. In: Proc. of PLDI 1998, pp. 224–235 (1998)
4. Poletto, M., Hsieh, W.C., Engler, D.R., Kaashoek, M.F.: 'C and tcc: A language and compiler for dynamic code generation. ACM TOPLAS 21(2), 324–369 (1999)
5. Taha, W., Sheard, T.: MetaML and multi-stage programming with explicit annotations. Theoretical Computer Science 248, 211–242 (2000)
6. Glück, R., Jørgensen, J.: Efficient multi-level generating extensions for program specialization. In: Swierstra, S.D. (ed.) PLILP 1995. LNCS, vol. 982, pp. 259–278. Springer, Heidelberg (1995)
7. Davies, R.: A temporal-logic approach to binding-time analysis. In: Proc. of LICS 1996, pp. 184–195 (1996)
8. Davies, R., Pfenning, F.: A modal analysis of staged computation. J. ACM 48(3), 555–604 (2001)
9. Taha, W., Nielsen, M.F.: Environment classifiers. In: Proc. of POPL 2003, pp. 26–37 (2003)
10. Yuse, Y., Igarashi, A.: A modal type system for multi-level generating extensions with persistent code. In: Proc. of PPDP 2006, pp. 201–212 (2006)
11. Kim, I.S., Yi, K., Calcagno, C.: A polymorphic modal type system for lisp-like multi-staged languages. In: Proc. of POPL 2006, pp. 257–268 (2006)
12. Stirling, C.: Modal and temporal logics. In: Handbook of Logic in Computer Science, vol. 2, pp. 477–563. Oxford University Press, Oxford (1992)
13. Moggi, E., Taha, W., Benaissa, Z.E.A., Sheard, T.: An idealized MetaML: Simpler, and more expressive. In: Swierstra, S.D. (ed.) ESOP 1999. LNCS, vol. 1576, pp. 193–207. Springer, Heidelberg (1999)
14. Benaissa, Z.E.A., Moggi, E., Taha, W., Sheard, T.: Logical modalities and multi-stage programming. In: Proc. of IMLA 1999 (1999)
15. Calcagno, C., Moggi, E., Taha, W.: ML-like inference for classifiers. In: Schmidt, D. (ed.) ESOP 2004. LNCS, vol. 2986, pp. 79–93. Springer, Heidelberg (2004)
16. Kojima, K., Igarashi, A.: On constructive linear-time temporal logic. In: Proc. of IMLA 2008 (2008)
17. Nanevski, A., Pfenning, F.: Staged computation with names and necessity. J. Functional Programming 15(5), 893–939 (2005)
18. Calcagno, C., Moggi, E., Sheard, T.: Closed types for a safe imperative MetaML. Journal of Functional Programming 13(3), 545–571 (2003)
19. Sato, M., Sakurai, T., Kameyama, Y.: A simply typed context calculus with first-class environments. J. Functional and Logic Programming 2002(4), 1–41 (2002)
20. Nanevski, A., Pfenning, F., Pientka, B.: Contextual modal type theory. ACM Transactions on Computational Logic 9(3) (2008)
21. Harel, D., Kozen, D., Tiuryn, J.: Dynamic logic. In: Gabbay, D., Guenther, F. (eds.) Handbook of Philosophical Logic, 2nd edn., vol. 4, pp. 99–218. Springer, Heidelberg (2002)
22. Areces, C., ten Cate, B.: Hybrid logics. In: Blackburn, P., Wolter, F., van Benthem, J. (eds.) Handbook of Modal Logics, pp. 821–868. Elsevier, Amsterdam (2007)
23. Reed, J., Pfenning, F.: Intuitionistic letcc via labelled deduction. In: Proc. of M4M 2007 (2007)
Inhabitation of Low-Rank Intersection Types

Paweł Urzyczyn
Institute of Informatics, University of Warsaw
[email protected]

Abstract. We prove that the inhabitation problem ("Does there exist a closed term of a given type?") is undecidable for intersection types of rank 3 and exponential space complete for intersection types of rank 2.
1 Introduction
Calculi with intersection types have been known for almost 30 years [4,16] and their importance is widely recognized in the literature. Just to mention a few applications: construction of lambda-models [2,4], normalization issues and optimal reductions [16,15], type inference and compilation [6,22]. Another, more foundational, aspect is the logic of intersection types. It has been noticed long ago that this "proof-functional", rather than "truth-functional", logic significantly differs from most other logical systems [1,13,14], and various studies were undertaken to understand the nature of this peculiar logic [5,12,17,21]. In particular, it is known that the decision problem for the logic of intersections, that is, the inhabitation problem, is undecidable [20]. This means that the expressive power of the propositional (!) logic of intersections is enormous. It is thus important to locate the exact borderline between decidable and undecidable cases. We address here the question: what is the maximal rank of types for which the logic of intersections is still decidable, and what is its complexity? (Roughly speaking, an intersection of simple types is of rank 1, and a rank n + 1 type has arguments of rank n.) In [20] it is shown that the inhabitation problem is undecidable for types of rank 4. For rank 2 the problem is decidable, but Exptime-hard [10]. (Incidentally, the rank 2 subsystem is interesting in that its typing power is exactly the same as that of rank 2 parametric polymorphism [23].) This paper aims at closing the gap between the results of [10,20]. We show that the logic of rank 3 types is undecidable (Theorem 4). We also improve the lower bound for rank 2, proving Expspace-hardness. Since the obvious algorithm for rank 2 runs in exponential space, we conclude that the problem is Expspace-complete (Theorem 9).
The intermediate problem used in the proof of Theorem 9 is the halting problem for a bus machine: an alternating device operating on a fixed-length word, but expanding its own program (the set of available instructions) during the computation. This relatively simple model, capturing the notion of exponential space without explicitly operating in exponential space, may be of some interest in itself.
Partly supported by Ministry of Science and Higher Education grant N N206 355836.
P.-L. Curien (Ed.): TLCA 2009, LNCS 5608, pp. 356–370, 2009. c Springer-Verlag Berlin Heidelberg 2009
The system. We consider the basic system of intersection types, with no constants nor subtyping. For simplicity, we assume that the intersection operator is idempotent, commutative and associative. The inference rules are given below:

  (Ax)   Γ, x : τ ⊢ x : τ

  (I→)   Γ, x : σ ⊢ M : τ
         ───────────────────────
         Γ ⊢ (λx: σ. M) : σ → τ

  (E→)   Γ ⊢ M : σ → τ    Γ ⊢ N : σ
         ──────────────────────────
         Γ ⊢ (M N) : τ

  (I∩)   Γ ⊢ M : σ    Γ ⊢ M : τ
         ──────────────────────
         Γ ⊢ M : σ ∩ τ

  (E∩)   Γ ⊢ M : σ ∩ τ
         ─────────────
         Γ ⊢ M : σ
After Leivant [11], we set rank(τ) = 0 when τ is a simple type, and we define:

  rank(σ → τ) = max{rank(σ) + 1, rank(τ)};
  rank(σ ∩ τ) = max{1, rank(σ), rank(τ)}.

The algorithm. The natural partial algorithm to find inhabitants of a given intersection type [3,10,20] is similar to the Wajsberg/Ben-Yelles algorithm for simple types [18]. But instead of processing constraints of the form Γ ⊢ ? : τ one at a time, one asks for a single solution X of a system of constraints:

  Γ1 ⊢ X : τ1, ..., Γn ⊢ X : τn,

where the domains of the environments Γi are the same for all i = 1, ..., n. The nondeterministic procedure guesses step by step the shape of a normal solution X and verifies the correctness by transforming the constraints until they are trivially satisfied. There are essentially three possible transformations.

1. If one of the types τi is an intersection, say τi = σ ∩ ρ, then the constraint Γi ⊢ X : τi is replaced by two: Γi ⊢ X : σ and Γi ⊢ X : ρ.

2. If every τi is an implication, say τi = σi → ρi, then a possible guess is that X = λx.X′, and therefore we can set the following new constraints: Γ1, x : σ1 ⊢ X′ : ρ1, ..., Γn, x : σn ⊢ X′ : ρn, if there is no variable y assigned type σi in every Γi; otherwise we can identify the variables x and y and keep the Γi unchanged.

3. Another possibility is that there is a variable x and a number k such that Γi ⊢ λz1 ... zk. x z1 ... zk : ρ_i^1 → ··· → ρ_i^k → τi, for each i, where the zj are fresh. Then we may guess that X = x Z^1 ... Z^k and consider k systems: Γ1 ⊢ Z^j : ρ_1^j, ..., Γn ⊢ Z^j : ρ_n^j, for j = 1, ..., k. These k systems must now be solved independently in parallel. (If k = 0 then x is a solution and the algorithm accepts.)

Ranks 1 and 2. It should be clear that the above algorithm yields a solution whenever one exists. To obtain a decision procedure we must be able to give an upper bound for the number of steps in which a solution should be found. The crucial issue is whether we can limit the number of constraints.
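For comparison, the Wajsberg/Ben-Yelles-style search is easy to implement in the simple-type (rank 0) special case, where each system has a single constraint and case 1 never applies. The following Python sketch uses our own encoding (types as atoms or nested pairs); it searches for a normal inhabitant with a loop check on (context, goal) pairs along each search path.

```python
# Ben-Yelles-style inhabitation search for simple (implicational) types.
# A type is an atom (a string) or a pair (sigma, tau) meaning sigma -> tau.

def inhabited(tau, gamma=frozenset(), seen=frozenset()):
    """Is there a term of type tau in context gamma (a set of types)?"""
    state = (gamma, tau)
    if state in seen:           # loop check: same goal already on this path
        return False
    seen = seen | {state}
    if isinstance(tau, tuple):  # goal sigma -> rho: extend the context (I->)
        sigma, rho = tau
        return inhabited(rho, gamma | {sigma}, seen)
    # atomic goal: try head variables x : s1 -> ... -> sk -> tau (case 3)
    for ty in gamma:
        args, head = [], ty
        while isinstance(head, tuple):
            args.append(head[0])
            head = head[1]
        if head == tau and all(inhabited(s, gamma, seen) for s in args):
            return True
    return False

a, b, c = 'a', 'b', 'c'
arrow = lambda *ts: ts[0] if len(ts) == 1 else (ts[0], arrow(*ts[1:]))
print(inhabited(arrow(a, a)))                                   # identity: True
print(inhabited(arrow(arrow(arrow(a, b), a), a)))               # Peirce's law: False
print(inhabited(arrow(arrow(a, b), arrow(b, c), arrow(a, c))))  # composition: True
```

Termination follows the argument in the text: along a path the context grows monotonically within the subformulas of the input, so the set of (context, goal) states is finite.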
This is possible for ranks 1 and 2. Indeed, if an initial problem ∅ ⊢ X : τ, with rank(τ) = n, leads to
a system Γ1 ⊢ X : τ1, ..., Γn ⊢ X : τn, then the ranks of all types in the Γi are at most n − 1. For n = 1 this means that all types in the Γi are simple, and after a linear number of steps we have no intersections left in the constraints. Since all types in the constraints are subformulas of the original input, the number of transformations is at most polynomial, quite like in the simply-typed case. Inhabitation for rank 1 is thus in Pspace, the same as for propositional intuitionistic logic.

For rank 2 we can still give a linear bound on the number of constraints. Intersections can occur in the Γi, but only at the top level, and therefore no intersection can be moved (in case 3) to the right-hand side of the ⊢ sign. There are no more parallel constraints than top-level intersections in the input type. An input τ of size n has at most n subformulas. With at most n constraints this makes n² choices of types at the right-hand side of ⊢. After every n² steps we must add a new variable at the left-hand side, in order not to get into a loop. A variable can be assigned any of n types in each of the n environments, and this gives n^n different type assignments. Therefore the number of steps we can reasonably make is at most n^n · n² ≤ 2^{n²} a.e. This is alternating exponential time, so the upper bound we obtained is exponential space, cf. [8].

We show that the exponential space upper bound is tight: the problem is Expspace-complete. This improves the result of [10], where it was shown that the problem is Exptime-hard, via simulation of linear bounded alternating machines. We use a stronger model, a bus machine, which is like a linear bounded machine capable of expanding its own program by creating new instructions on the run. Each of the new instructions is of severely limited applicability (it can be used only when the tape contents equal a specific word), but there may be plenty of them, and this is where the exponential storage is available. Bus machines and rank 2 types are discussed in Section 3.
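For concreteness, the step-count estimate n^n · n² ≤ 2^{n²} used above can be checked by rewriting both sides in base 2:

```latex
n^n \cdot n^2 \;=\; 2^{\,n\log_2 n \,+\, 2\log_2 n}
\;\le\; 2^{\,n^2},
\qquad\text{since } n\log_2 n + 2\log_2 n \le n^2 \text{ for all } n \ge 2 .
```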
Rank 3 and up. For types of rank 3 the number of constraints can grow. Here is an example from [20]. Let α = ((a → a) → 1) ∩ ((b → a) → 2) → 1, β = ((a → b) → 1) ∩ ((b → b) → 2) → 2, γ = ((c → a) → a) → 1, and δ = ((c → b) → a) → 2. Then the type α ∩ β → γ ∩ δ → c → 1 has infinitely many inhabitants, for example λxyz. x(λu1. x(λu2. x(λu3. y(λv. u1(u2(u3 v)))))). There are arbitrarily long runs of our algorithm without repeating the steadily growing set of constraints. Although types of rank 3 could create a potentially infinite search space, the "vertical" information-passing used in [20] required the fourth rank, and the problem for rank 3 remained open. It turns out that it is also undecidable. We treat this case in Section 2, using an encoding of a restricted form of the emptiness problem for a simple class of semi-Thue systems.
2 Expanding Tape
Let A be a finite alphabet, with a designated subset E ⊆ A of end symbols. A simple switch over A is a pair of elements of A, written a ← b. A labeled switch is a quadruple, written a ← b (c ← d), where the simple switch c ← d is the label. Finally, a splitting switch, or split, is a triple a ← b · c. We require that:
− In a simple switch a ← b, if b ∈ E then a ∈ E;
− In a labeled switch a ← b (c ← d), if b ∈ E then a ∈ E and c, d ∈ E;
− In a split a ← b · c, always a ∈ E and b ∉ E.

The three forms of switches are drawn as vertical arrows from b up to a, with the label c ← d attached in the labeled case, and with the two children b and c in the splitting case. They represent types of shape b → a, ((d → c) → b) → a, and b ∩ c → a, respectively, and this is reflected by the direction of the arrows. One can also read an arrow as an assignment: for instance the switch a ← b (resp. a ← b · c) indicates an option to replace a by b (resp. by bc).

An expanding tape machine (etm) is a device operating on words with the help of instructions, each instruction consisting of a number of switches. Executing an instruction I on a word amounts to replacing every symbol a of the word by another symbol b, using a switch of the form a ← b or a ← b (c ← d) provided by I. An exception is the last symbol: if it is an end symbol then it can be replaced by two, using a split occurring in I. A computation of an etm (starting with a single symbol) may be represented as a tree, expanding only to the right; the symbols at each level of the tree, read from left to right, show the contents of the tape in the consecutive steps.

In our example (depicted as such a tree in the original figure), the computation begins with 0 and the first level of the tree is obtained by applying the split 0 ← b · c. The next step uses two switches, b ← d (a ← e) and c ← h (f ← g), etc. (Note that the labels a ← e and f ← g introduced in the second step are later used as switches.) The contents of the tape is initially 0 and then bc, dh, aff, and egg. The end symbols are 0, c and h.

In order to explain precisely how all this happens, we define an etm as a tuple of the form M = ⟨A, 0, 1, I⟩, where A is a finite alphabet, 0, 1 ∈ A are the initial and final symbols, and I is the set of global instructions. Every global instruction is a finite set of switches, and we assume that either all switches in an instruction I ∈ I are labeled, or no labeled switch belongs to I. A sequence ⟨a1 ← b1, ..., ak ← bk⟩ of simple switches is called a local instruction of length k.

A configuration of an etm is a pair of the form C = ⟨w, J⟩, where w is a word over A (in which all symbols, except possibly the last one, are not active) and J is a set of local instructions of lengths not exceeding the length of w. The initial configuration is C0 = ⟨0, ∅⟩, and configurations of the form ⟨1^k, J⟩ are final.
A single move of the machine is the result of applying either a global instruction (a global step) or a local one (a local step). The latter is easier to explain. Suppose that C = ⟨w, J⟩ and I = ⟨a1 ← b1, ..., ak ← bk⟩ ∈ J, where k ≤ |w|. The instruction is applicable provided w = a1 ... a_{k−1} a_k ... a_k (the last switch covers all remaining positions), and the resulting configuration is C′ = ⟨b1 ... b_{k−1} b_k ... b_k, J⟩. We then write C ⇒_M^I C′, omitting the indices when possible. There may be more than one applicable instruction (the machine is nondeterministic).

Now we describe the action of global instructions I ∈ I:

– Let w = a1 ... an and w′ = b1 ... bn. If for every i ≤ n the simple switch ai ← bi belongs to I, then ⟨w, J⟩ ⇒_M^I ⟨w′, J⟩ holds for any J.
– Let w = a1 ... an and w′ = b1 ... bn b_{n+1}. If all the switches a1 ← b1, ..., a_{n−1} ← b_{n−1}, an ← bn · b_{n+1} belong to I, then ⟨w, J⟩ ⇒_M^I ⟨w′, J⟩, for every J.
– Let w = a1 ... an and w′ = b1 ... bn. If ai ← bi (ci ← di) is in I for all i ≤ n, then ⟨w, J⟩ ⇒_M^I ⟨w′, J′⟩, where J′ = J ∪ {⟨c1 ← d1, ..., cn ← dn⟩}.

For instance, suppose that I contains the global instructions I1 = {0 ← b·c, ...}, I2 = {b ← d (a ← e), c ← h (f ← g), ...}, and I3 = {d ← a, h ← f · f, ...}. Our example tree illustrates a computation which begins with an application of I1, followed by I2 and I3, and then followed by an application of the local instruction I = ⟨a ← e, f ← g⟩ created by the earlier use of I2.

The notation C ⇒ C′ means that C ⇒_M^I C′ for some I, and the symbol ⇒∗ denotes, as usual, the transitive and reflexive closure of ⇒. Observe an essential difference between global and local instructions. Global instructions provide for additional nondeterminism: one can pick any of the switches and apply them in any order, with or without repetitions. A local instruction, created by an earlier global step, is deterministic: it applies in exactly one way, in a strictly defined context. A configuration C is accepting iff C ⇒∗ C1 for some final configuration C1.
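The two kinds of steps can be sketched operationally. The Python rendering below uses our own encoding of instructions (dictionaries and tuples, not the paper's notation) and only applies one given instruction to one word; the machine's nondeterminism is not modeled.

```python
def apply_labeled(word, switches):
    """Apply a labeled global instruction to `word`.
    `switches` maps a -> (b, (c, d)), encoding the switch a <- b (c <- d):
    each symbol a is replaced by b, and the label c <- d is recorded.
    Returns the new word and the generated local instruction."""
    new, local = [], []
    for a in word:
        b, label = switches[a]
        new.append(b)
        local.append(label)
    return ''.join(new), tuple(local)

def apply_local(word, instr):
    """Apply a local instruction <a1<-b1, ..., ak<-bk>: position j < k uses
    switch j, and every later position reuses the last switch."""
    out = []
    for j, a in enumerate(word):
        a_j, b_j = instr[min(j, len(instr) - 1)]
        if a != a_j:
            raise ValueError("local instruction not applicable")
        out.append(b_j)
    return ''.join(out)

# The running example: I2 = {b <- d (a <- e), c <- h (f <- g)}.
I2 = {'b': ('d', ('a', 'e')), 'c': ('h', ('f', 'g'))}
word, local = apply_labeled('bc', I2)   # 'dh', creating <a <- e, f <- g>
```

Applying the created local instruction to `'aff'` yields `'egg'`, matching the last two levels of the example tree.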
We say that an etm M halts when the initial configuration C0 is accepting. The halting problem for etm's is to determine if a given machine halts.

Encoding an etm with intersection types. Given an etm M, we construct an environment Γ of types of rank at most 2, such that Γ ⊢ X : 0 for some X (where 0 is a designated type variable) if and only if M halts. The assumptions in Γ represent the global instructions of M as follows:

– An instruction of the form I = {a_t ← b_t (c_t ← d_t) | t ∈ T} is represented by a variable x_I of type ⋂_{t∈T} (((d_t → c_t) → b_t) → a_t).
– An instruction I = {a_t ← b_t | t ∈ T} ∪ {a_s ← b_s · c_s | s ∈ S} is represented by a variable x_I of type ⋂_{t∈T} (b_t → a_t) ∩ ⋂_{s∈S} (b_s ∩ c_s → a_s).

In addition, Γ contains the declaration y : 1. A system of constraints Γ1 ⊢ X : a1, ..., Γn ⊢ X : an, where the ai are type variables, represents a configuration C = ⟨w, J⟩ when w = a1 ··· an and:

– Γj = Γ ∪ Δj, where Dom(Δj) = {x_J | J ∈ J}, for each j = 1, ..., n;
– If J = ⟨a1 ← b1, ..., ak ← bk⟩ belongs to J, then Δj(x_J) = bj → aj for j ≤ k, and Δj(x_J) = bk → ak otherwise.
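The two representation clauses are mechanical. The following Python sketch builds the type of x_I from a list of switches, under our own hypothetical encoding of types as nested tuples (atoms are strings; `('->', s, t)` is s → t and `('cap', s, t)` is s ∩ t); names such as `l0` stand for machine symbols and are not from the paper.

```python
def arrow(s, t):
    """The arrow type s -> t."""
    return ('->', s, t)

def cap(types):
    """Left-nested intersection of a non-empty sequence of types."""
    ts = list(types)
    t = ts[0]
    for u in ts[1:]:
        t = ('cap', t, u)
    return t

def type_of_labeled(switches):
    """switches: list of (a, b, (c, d)) encoding a <- b (c <- d).
    Each switch contributes ((d -> c) -> b) -> a."""
    return cap(arrow(arrow(arrow(d, c), b), a) for (a, b, (c, d)) in switches)

def type_of_unlabeled(simple, splits):
    """simple: list of (a, b) for a <- b, contributing b -> a;
    splits: list of (a, b, c) for a <- b . c, contributing b ∩ c -> a."""
    return cap([arrow(b, a) for (a, b) in simple] +
               [arrow(('cap', b, c), a) for (a, b, c) in splits])

# E.g. an instruction {0' <- l0, 0'' <- r0 . $0} with one simple switch
# and one split:
t = type_of_unlabeled([("0'", 'l0')], [("0''", 'r0', '$0')])
```

This makes it easy to check, for instance, that a single labeled switch a ← b (c ← d) yields exactly the rank 3 type ((d → c) → b) → a, while unlabeled instructions stay within rank 2.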
Lemma 1. The system representing C has a solution iff C is accepting.

Proof. The proof from left to right is by induction with respect to the size of a normal solution X. First suppose that the solution is a variable. That can only happen when ai = 1 for all i, and X = y. In this case C itself is a final configuration. Otherwise the term X must be an application of the form x_I X′, where I is either a global or a local instruction.

First suppose that for every j ≤ n we have Γj ⊢ x_I : bj → aj, for some bj, and X′ is a solution of the constraints Γ1 ⊢ X′ : b1, ..., Γn ⊢ X′ : bn. This system represents a certain configuration C′ such that C ⇒ C′. Since X′ is shorter than X, we can apply the induction hypothesis and conclude that C′ (and thus also C) is accepting. If Γj ⊢ x_I : bj → aj for all j < n, and Γn ⊢ x_I : bn ∩ cn → an, then we have Γ1 ⊢ X′ : b1, ..., Γn ⊢ X′ : bn, Γn ⊢ X′ : cn, and the induction applies as well.

The remaining case is when all switches of I are labeled (so I is global) and Γj ⊢ X′ : (dj → cj) → bj, for all j ≤ n. Now X′ must necessarily be an abstraction, X′ = λz.X′′, and X′′ is a solution of the system Γ1, z : d1 → c1 ⊢ X′′ : b1, ..., Γn, z : dn → cn ⊢ X′′ : bn. The sequence ⟨c1 ← d1, ..., cn ← dn⟩ is nothing else but the local instruction obtained from C by applying I. Therefore, the new system represents a configuration obtained from C in one step, and we can again use induction. This completes the left-to-right part. The converse is shown by induction with respect to the number of steps from C to a final configuration.

Undecidability of the halting problem. Lemma 1 reduces the halting problem for etm's to the inhabitation problem for types of rank 3. We now show that the halting problem is undecidable, using a restricted class of semi-Thue systems. A semi-Thue system over an alphabet B in which each rule has the form st ⇒ uv, for some s, t, u, v ∈ B, is called a simple semi-Thue system (ssts).

Lemma 2. For an ssts it is undecidable if there exists w ∈ B∗ with w ⇒∗ 1^{|w|}.
Proof. Easy reduction from emptiness for linear bounded automata [7].
In the remainder of this section we assume a fixed ssts over an alphabet B with 1 ∈ B, and with m rewriting rules of the form si ti ⇒ ui vi (i = 1, . . . , m). We construct an etm M such that M halts iff there exists a word w that can be rewritten to a string of 1's. The alphabet A of the machine contains B, and a number of additional symbols. More precisely, A = B ∪ {0, 0′, 0″, ∗, $, ◦, •} ∪ {ℓi, ri, $i, ?i, i?, !i, i! | i = 0, . . . , m}. The initial symbol is 0, and the global instructions are as follows. First there are:

I0 = {0 ← 0′ · 0″} and I1 = {0′ ← ℓ0, 0″ ← r0 · $0}.

Then, for every i = 1, . . . , m, we have a global instruction

Di = {∗ ← ∗ (◦ ← •), ℓi−1 ← ℓi (i? ← i!), ri−1 ← ri (?i ← !i), $i−1 ← $i (◦ ← •)},

and there are two more:

D0 = {∗ ← ∗, ℓm ← ∗, rm ← ℓ0, $m ← r0 · $0},
D∞ = {∗ ← x, ℓm ← x, rm ← x | x ∈ B} ∪ {$m ← $}.
362
P. Urzyczyn
Finally, for i = 1, . . . , m, there is an instruction:

Ei = {x ← ◦ (• ← x) | x ∈ B ∪ {$}} ∪ {si ← i? (i! ← ui), ti ← ?i (!i ← vi)}.

The machine M has no more global instructions. We now explain their meaning and use. The only instructions using 0, 0′, 0″ are I0 and I1, so the computation begins with I0 followed by I1, and we obtain the word ℓ0 r0 $0. Instructions Di, for i = 0, . . . , m, are to be used in an initial phase of the computation, and their intended role is twofold: to expand the tape to a sufficient length and to "declare" local instructions to be later used to simulate rewriting rules. (A graphic representation of the switches of Di (i > 0) and D0 is omitted in this extraction.)

Suppose for a moment that the tape contents of our machine have the form ∗ ∗ · · · ∗ ℓ0 r0 $0 with j ≥ 0 stars. The only applicable instruction is D1, and the next configuration must contain the word ∗ ∗ · · · ∗ ℓ1 r1 $1. Then we have ∗ ∗ · · · ∗ ℓi ri $i, for all i ≤ m, ending up with ∗ ∗ · · · ∗ ℓm rm $m. If we now apply the instruction D0 then the resulting word is ∗ ∗ · · · ∗ ∗ ℓ0 r0 $0, with j + 1 stars. This can be iterated a number of times. The iteration can terminate only by an application of D∞, which yields an arbitrary word of the form a1 . . . an $, where n ∈ N, and aj ∈ B, for all j. No further tape expansion is possible.

Each of the local instructions generated during the initial phase has the form ⟨◦ ← •, . . . , ◦ ← •, i? ← i!, ?i ← !i, ◦ ← •⟩, and can be used to replace ◦ ◦ · · · ◦ i? ?i ◦ · · · ◦ by • • · · · • i! !i • · · · •. The two switches i? ← i!, ?i ← !i occur next to each other at an arbitrary position (because the i-th rewrite rule can be used anywhere in the word). The next phase of computation is the simulation of the rewriting. Our intention is that the machine should halt iff the word a1 . . . an can be rewritten into 1^n.
The global instructions Ei serve this purpose, and each Ei simulates the rule si ti ⇒ ui vi . The switches of Ei can act on a word of the form b1 . . . bn $ in a quite arbitrary way, but consider the next step after Ei . The only instructions that may be applicable are the local ones defined during the initial phase. So the only way to avoid a deadlock is that Ei uses each of the symbols i? and ?i exactly once and places them next to each other. All other symbols written by Ei must be ◦. Then a local instruction is applied, and then another local instruction: one generated by an application of Ei . Figure 1 illustrates the situation occurring when this was the last application of Ei (assuming that bj+1 = si , bj+2 = ti ). The bottom line of Figure 1 is exactly what one obtains by applying rule si ti ⇒ ui vi to the word b1 . . . bn . But it may happen that another application of Ei (at the same position in the word) took place earlier in the computation, a trace of this being left in memory in the form of the appropriate local instruction. This local instruction can be applied as well, yielding a word c1 . . . cj ui vi cj+3 . . . cn $ rather than the intended b1 . . . bj ui vi bj+3 . . . bn $. Note that in this case the word c1 . . . cj si ti cj+3 . . . cn $ must have earlier occurred in the computation, and thus c1 . . . cj ui vi cj+3 . . . cn represents a result of a legitimate rewriting sequence from a1 . . . an . Therefore we can state:
Inhabitation of Low-Rank Intersection Types

[Figure 1 (diagram omitted in this extraction): the word b1 . . . bj si ti bj+3 . . . bn $ is first turned by Ei into ◦ . . . ◦ i? ?i ◦ . . . ◦, with labels • ← b1, . . . , • ← bj, i! ← ui, !i ← vi, • ← bj+3, . . . , • ← bn, • ← $; a local instruction then rewrites this into • . . . • i! !i • . . . •, and the labels restore the word as b1 . . . bj ui vi bj+3 . . . bn $.]

Fig. 1. Hiding and restoring a word
Lemma 3. Assuming definitions as above, the machine M halts if and only if there exists a non-empty word a1 . . . an which rewrites to 1^n.

Proof. The hard part is "only if". As we have seen, an arbitrary computation of M must begin with an initial iteration yielding a word of the form a1 . . . an $. By induction with respect to the number of steps we then show that every third configuration must have the form ⟨w, J⟩, where a1 . . . an ⇒∗ w. If M halts then a1 . . . an ⇒∗ 1^n. The "if" direction goes by induction with respect to the number of rewriting steps.

Theorem 4. The inhabitation problem is undecidable for rank 3.

Proof. Lemma 3 implies that the halting problem for etm is undecidable. This, together with Lemma 1, yields the undecidability of inhabitation in rank 3.
3 Bus Machines
To investigate the complexity of inhabitation in rank 2 we consider another machine model, the bus machine. Unlike an etm, a bus machine operates on a word (bus) of a fixed length. However, the instructions are more complex, and a bus machine is an alternating, rather than just nondeterministic, device. Alternation is introduced by an additional form of a switch: a universal switch is a triple, written a ← b × c, which corresponds to a type of the form c → b → a, i.e., to an implication with two premises. (N.B. universality has nothing to do with the intersection operator.) Needless to say, splitting switches are not permitted.

Formally, we define a bus machine as a tuple M = ⟨A, m, w0, w1, I⟩, where A is a finite alphabet, m > 0 is the bus length of M (the length of the words processed), w0 and w1 are words of length m over A, called the initial and final word, respectively, and I is again called the set of global instructions. Every global instruction is an m-tuple I = ⟨I1, . . . , Im⟩ of sets of switches. Switches in Ii are meant to act on the i-th symbol of the bus. It is required that all switches in a given instruction I are of the same kind: either all are simple, or all are labeled, or all are universal. Therefore we classify instructions as simple, labeled, and universal. A local instruction is an m-tuple of simple switches.

A configuration of M is a pair ⟨w, J⟩, where w is a word over A of length m, and J is a set of local instructions. The initial configuration is of course ⟨w0, ∅⟩,
and any configuration of the form ⟨w1, J⟩ is called final. Local steps of computation are defined exactly as in the case of an etm. Global steps differ in that the i-th component of a global instruction applies to the i-th symbol of the tape (bus). To make it more precise, suppose that I = ⟨I1, . . . , Im⟩, and let w = a1 . . . am, w′ = b1 . . . bm, and w″ = c1 . . . cm.

– If I is a simple instruction, and for every i ≤ m the simple switch ai ← bi belongs to Ii, then ⟨w, J⟩ ⇒^I_M ⟨w′, J⟩;
– If for every i ≤ m there is ai ← bi (ci ← di) in Ii, then ⟨w, J⟩ ⇒^I_M ⟨w′, J′⟩, where J′ = J ∪ {⟨c1 ← d1, . . . , cm ← dm⟩};
– If I is universal and ai ← bi × ci is in Ii, for i ≤ m, then ⟨w, J⟩ ⇒^I_M ⟨w′, J⟩, and also ⟨w, J⟩ ⇒^I_M ⟨w″, J⟩.

A configuration ⟨w, J⟩ is accepting iff it is either a final configuration, or
– There exists a non-universal instruction I, such that ⟨w, J⟩ ⇒^I_M ⟨w′, J′⟩ and ⟨w′, J′⟩ is accepting, or
– There is a universal instruction I such that we have ⟨w, J⟩ ⇒^I_M ⟨w′, J⟩ and ⟨w, J⟩ ⇒^I_M ⟨w″, J⟩, where both ⟨w′, J⟩ and ⟨w″, J⟩ are accepting.

The machine M halts iff the initial configuration is accepting.

Example. This example, based on [10], should give some hint about bus programming. Let I = {0 ← 0, 1 ← 1}, I+ = {0 ← 1}, and I− = {1 ← 0}. Take M = ⟨A, 4, 0000, 1111, I⟩, where I consists of the following tuples: ⟨I, I, I, I+⟩, ⟨I, I, I+, I−⟩, ⟨I, I+, I−, I−⟩, ⟨I+, I−, I−, I−⟩. The machine M makes 2^4 − 1 steps and halts after it has seen all binary strings of length 4. (It uses neither alternation nor local instructions.)

Inhabitation in rank 2. We associate simple types with switches:
– a ← b translates to b → a;
– a ← b (c ← d) translates to ((d → c) → b) → a;
– a ← b × c translates to b → c → a.

A set of switches translates to an intersection of the corresponding types. Suppose that M = ⟨A, m, w0, w1, I⟩ is a bus machine with I = {I1, . . . , In}, w0 = a1 . . . am, and w1 = b1 . . . bm. Define Γi = {x0 : bi, x1 : τ1^i, . . .
, xn : τn^i}, where τj^i is the translation of the i-th coordinate of Ij.

Lemma 5. The machine M halts if and only if there exists a closed term M such that M has type ai in every Γi.

Proof. Let J = ⟨c1^J ← d1^J, . . . , cm^J ← dm^J⟩, for each J ∈ J. Prove that the configuration ⟨e1 . . . em, J⟩ is accepting if and only if there is a term M satisfying all the judgements Γi^J ⊢ M : ei, where Γi^J = Γi ∪ {yJ : di^J → ci^J | J ∈ J}. Proceed by induction with respect to the definition of acceptance ("only if" part) and with respect to M ("if" part). It follows that the halting problem for M reduces to the existence of an inhabitant of a rank 2 type of the form ⋂_{i=1}^{m} (bi → τ1^i → · · · → τn^i → ai).
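As a sanity check on these definitions, the small counter machine of the example above (which uses only simple instructions, and neither alternation nor local instructions) can be run directly. The sketch below, with hypothetical names, applies the unique applicable global instruction at each step and counts the steps.

```python
# A minimal simulator for bus machines that use only simple switches
# (sufficient for the 4-bit counter example; labeled and universal
# instructions, and local instructions, are not modeled here).

def step(word, instruction):
    """Apply a simple global instruction to the bus, if possible.

    `instruction` is an m-tuple of switch sets; a switch (a, b)
    rewrites symbol a into b at its position. Returns the new word,
    or None if some coordinate has no applicable switch.
    """
    out = []
    for symbol, switches in zip(word, instruction):
        targets = [b for (a, b) in switches if a == symbol]
        if not targets:
            return None          # this instruction does not apply
        out.append(targets[0])   # the example machine is deterministic
    return "".join(out)

def run(w0, w1, instructions, max_steps=1000):
    """Run from w0, counting steps until the final word w1 is reached."""
    word, steps = w0, 0
    while word != w1 and steps < max_steps:
        nexts = [w for w in (step(word, i) for i in instructions) if w]
        assert len(nexts) == 1   # exactly one instruction applies here
        word, steps = nexts[0], steps + 1
    return steps

# The counter machine of the example: I keeps a bit unchanged, I+ sets
# a 0 to 1, I- resets a 1 to 0; each tuple performs a binary increment.
I, Ip, Im = {("0", "0"), ("1", "1")}, {("0", "1")}, {("1", "0")}
instructions = [(I, I, I, Ip), (I, I, Ip, Im), (I, Ip, Im, Im), (Ip, Im, Im, Im)]

print(run("0000", "1111", instructions))  # 2^4 - 1 = 15 steps
```

Each tuple is applicable exactly when the bus ends in the corresponding number of 1's, so the run is a binary count from 0000 to 1111 in 15 steps, as claimed in the example.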
The exponential space hardness. Our goal is to simulate exponential space Turing Machines by means of bus machines. To this end, we exploit the identity Expspace = Aexptime, that is, we actually simulate alternating exponential time. To begin with, assume that an alternating Turing Machine T is given, together with an input word v = a1 . . . an. For simplicity we assume the following:

– T is a single-tape machine, working in time 2^(n^k).
– The tape expands to the right and it is initially filled with blanks, except that the first n cells hold the input word. (Let ak = blank, when k > n.)
– The set of states of T is partitioned into final states, universal states, and existential states; the universal and existential states are called choice states.
– Every choice state q of T has exactly two successors; in a choice state the machine does not write nor move its head.
– In a deterministic state the machine always moves its head.

We construct a bus machine M of size polynomial in n such that M halts if and only if T accepts v. The bus of M should be seen as composed of 7 segments: | d | i | x | y | z | q | Q |, to be interpreted as follows:

1. The presently scanned tape symbol d of T (the symbol segment);
2. The position i of the head of T (the head segment);
3. The left time stamp x, for the i−1-st tape cell;
4. The present time y (the clock segment);
5. The right time stamp z, for the i+1-st tape cell;
6. The present state q of T (the state segment);
7. The state Q of M's own control (the control segment).

Segments 1, 6, and 7 are single symbols. Segments 2–5 hold numbers up to 2^(n^k) in binary, i.e., use n^k space. (This is indicated by the bold symbols used.) Note that the bus length of our machine is m = 4n^k + 3. The bus of M is too short to store a complete ID of T. Therefore only partial information is kept on the bus, namely the internal state and the contents and number of the presently scanned tape cell. In addition we have the present time, and the two additional time stamps. Their role is to remember the time when the machine last scanned its i−1-st (resp. i+1-st) tape cell. The left time stamp is ⊥ when i = 1, i.e., when the machine head visits the left end of the tape. Also the right time stamp can be ⊥ (if the right neighbour cell has not been seen yet). The initial word of M is | a1 | 1 | ⊥ | 0 | ⊥ | q0 | A |, where 1 represents the string 0 . . . 01 (i.e., the binary code of the number 1). Similarly, ⊥ is the string of ⊥'s, and 0 is the string of 0's. The final word has ∗'s everywhere.

The simulation. A configuration ⟨w, J⟩ of M is called proper when the word w has the form | d | i | x | y | z | q | A |. We now show how a single step of T is simulated by one or more steps of M. The case when q is a choice state is easy: no writing, no moving, just change the internal state of T, using a global step. In deterministic states, the machine T moves its head to the left or to the right. To simulate this, the symbol in the first segment should be replaced by
the contents of the tape cell to the left or to the right of the present head position. The machine M must know what symbol it is. Thus, whenever T leaves a particular tape cell, and the current symbol must be deleted from the bus, M creates a local instruction containing the information about the present time and symbol. This local instruction is left in the memory of M to be used at the time of return to the same tape cell. To make sure that we always use the appropriate local instruction we need the time stamps.

If T makes a left move from tape cell i to i − 1, then positions i + 1, i, i − 1, i − 2 of the tape are called the back, present, target, and forward positions, respectively. For a right move, we use these words in the opposite direction, e.g., i − 1 is the back position. Suppose the machine leaves position i − 1 at time x and is expected to return there at time y. Then the back, present, and target positions at time x are, respectively, the forward, target, and present positions at time y, because the moves made at x and at y are in opposite directions.

Consider the case when T, scanning d in state q, writes c, moves its head to the left and enters state p. The simulation of such a move begins with a global labeled step, which does not affect the bus, except that the control segment is set to B. But a local instruction is created (a message is sent) to be used at the time of return. It contains the information about the current time y, the symbol c to be written, and the right (back) time stamp z. These parts of the bus can now be safely erased. This is illustrated by the top two rows in Figure 2, where the arrows between many-symbol segments abbreviate multiple switches, and the same applies to their labels. (For instance the switches used in the third segment are actually 0 ← 0 (◦ ← •) and 1 ← 1 (◦ ← •).) The third row in the figure is obtained after several steps of execution, involving a number of simple global instructions.
The machine decreases the head segment i by one (if this is impossible, it gets stuck) and shifts x and y one
[Figure 2 (diagram omitted in this extraction): successive rows of bus contents for the simulation of a deterministic left move, starting from | d | i | x | y | z | q | A |, passing through intermediate states with control symbols B, C, D, E, F, and ending in | b | i−1 | u | y+1 | y | p | A |; the arrows between the rows carry switch labels such as ? ← c, • ← y, ? ← z, ◦ ← •, i−1 ← i−1, • ← q, i! ← ui, D ← E, and E ← F.]

Fig. 2. A deterministic left move
segment to the right. Then question marks are put in place of the symbol and the left time stamp. The details of the construction are left to the reader; cf. [10]. The purpose of the next step is to prepare the bus to "receive" the message sent at time x, when the machine last scanned tape cell i − 1 and moved right to cell i. In fact, the machine must guess the essential contents of the message, namely the symbol b and the time stamp u for tape cell i − 2. (At time y this is seen as the forward time stamp, but it was the back time stamp when the message was sent.) Since the values y and q must not be lost, we "hide" them inside an auxiliary local instruction J, using a trick similar to that in Figure 1. Now we can use the message sent at time x, i.e., the local instruction created at time x. The result is the third last row in Figure 2. Then we restore y and q using J (second last row) and enter the last phase of the simulation: writing y + 1 in the clock segment, changing state q to p, and resetting control to A.

At some later time t, the machine T may return to the i-th tape cell from the left. Then the simulation begins with the bus | e | i−1 | v | t | y | r | A |. After a number of steps this is replaced by | ? | i | t | y | ? | r | C |, and then by the bus | ? | i | ◦ | y | ? | ◦ | D |. Now the message sent at time y is applicable. Note that we send only one message at a time, so that there is no danger of reading a wrong message: the proper one is identified by the target time stamp. The same applies to the auxiliary local instructions.

A computation step of T involving a right head shift is simulated by M in a dual way, provided that the right time stamp is not ⊥. Otherwise (when T enters a tape cell for the first time) we only use global instructions, as shown below:

[Diagram omitted in this extraction: global steps turn the bus | d | i | x | y | ⊥ | q | A | via an intermediate state with control B into | ai+1 | i+1 | y | y+1 | ⊥ | p | A |, with switch labels such as i ← i, ? ← c, ? ← x, ◦ ← •, and D ← E.]

Details of the construction are left to the reader. It is also left to the reader to define the appropriate global instruction that can reset the whole bus to a string of stars, when an accepting state of T appears in the 6-th segment.
Correctness. A computation of the alternating machine T is a tree of height 2^(n^k). An initial segment of a branch in the tree, i.e., a sequence C0, C1, . . . , Cy of IDs of T, is called a play. We describe conditions under which a proper configuration ⟨w, J⟩ of M represents a play P = C0, C1, . . . , Cy of T. First, we want w = | d | i | x | y | z | q | A |, where the meaning of the components is as in the previous subsection. For instance, q is the state in Cy, the binary word x stands for the largest x < y such that the machine head in Cx is at i − 1, etc. Then we require that the instructions in J encode the machine moves made so far, as follows. For every y′ < y such that T makes a left (right) move in Cy′, there must be one local instruction in J of one of the forms below (cf. Figure 2):
[Diagrams omitted in this extraction: the two possible shapes of such a local instruction, both rewriting control D into E and carrying the position i, the symbol c, and the time stamps y′ and z,]

where the values of i, c, and z are adequate for Cy′. If tape cell j was visited at time x < y and then at time y + 1, but not in between, then we may also have auxiliary local instructions in J, of one of two possible shapes:

[diagrams omitted: both rewrite control E into F and carry the position j, the symbol b, the state q, and the time stamps x, u, and y.]

Although there may be more than one ⟨w, J⟩ satisfying the above conditions, note that the local instructions representing deterministic steps are uniquely determined by the values y of the clock.

Lemma 6. Let a configuration ⟨w, J⟩ represent a play C0, C1, . . . , Cy of T.
1. If C0, C1, . . . , Cy, Cy+1 is a play then ⟨w, J⟩ ⇒⇒ ⟨w′, J′⟩, where ⟨w′, J′⟩ represents C0, C1, . . . , Cy, Cy+1.
2. If ⟨w, J⟩ ⇒⇒ ⟨w′, J′⟩, with ⟨w′, J′⟩ proper and with no proper configuration as an intermediate step, then ⟨w′, J′⟩ represents a play of the form C0, C1, . . . , Cy, Cy+1.

Proof. (1) The nontrivial case is when Cy is deterministic. The message sent at time y (to be received and executed at some time t) contains two kinds of data.

1. Known both at time y and at time t:
   – the present position (at head segment), to become target at time t;
   – the present time (at clock segment), to be target time stamp at time t.
2. Known at time y but not at time t:
   – the present symbol (at symbol segment), to be target symbol at time t;
   – the back time stamp (at back segment), to be forward time stamp at t.
Data of the first kind identify the instruction uniquely. Indeed, only one message is sent at time y; when the machine returns to the same position next time, the target time stamp is y and it will never be y again, so there is no chance of confusing messages. Data of the second kind constitute the actual contents of the message. It is used to restore the information about the back/forward position that was deleted from the bus at time y.

Now if ⟨w, J⟩ represents C0, C1, . . . , Cy, we can see that M can move from ⟨w, J⟩ to a configuration representing C0, C1, . . . , Cy, Cy+1 by properly restoring the missing information. This requires creating and executing an auxiliary local instruction, and again there is no way to confuse it, because no other auxiliary instruction available at time y can refer to the clock x. The one created at time y uses the clock x, and after step y + 1 it becomes obsolete too.

(2) We only consider the case of deterministic Cy. The machine M cannot really behave in a different way than described in part (1). The only nondeterminism available at time y is the guessing of a part of the message to be received. If M makes a wrong guess, it must get stuck, because there is no other
instruction that could be used. Any other behaviour of M is fully determined by the configuration. In particular the message to be received must introduce itself using the proper position and time stamp.

Lemma 7. The Turing machine T accepts a1 . . . an if and only if M halts.

Proof. First observe that the initial configuration of M represents the initial play of T. So it is enough to show, for ⟨w, J⟩ representing C0, C1, . . . , Cy, that:

Cy is an accepting ID of T ⇔ ⟨w, J⟩ is an accepting configuration of M.
The proof of each direction is by induction with respect to the size of the accepting computation. That is, in the base case we deal with a final ID of T, and we must observe that the final bus ∗ ∗ · · · ∗ can be directly obtained from ⟨w, J⟩ only if w contains a final state of T. The induction step in the deterministic case follows directly from Lemma 6. In the existential and universal cases we must note that ⟨w, J⟩ has exactly two proper successors with respect to ⇒⇒. The following is now immediate:

Proposition 8. The halting problem for bus machines is Aexptime-complete, and therefore Expspace-complete.

Theorem 9. The inhabitation problem for rank 2 types is Expspace-complete.

Proof. Follows from Lemma 5 and Proposition 8.
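The switch-to-type translation underlying Lemma 5 is entirely mechanical; the following sketch (with a hypothetical string representation of types, not part of the paper) spells it out for the three kinds of switches.

```python
def tr(switch):
    """Translate a switch to a simple type, rendered as a string.

    Simple switches are pairs (a, b) for a <- b; labeled switches are
    ("labeled", a, b, c, d) for a <- b (c <- d); universal switches are
    ("universal", a, b, c) for a <- b x c. These encodings are chosen
    here for illustration only.
    """
    if len(switch) == 2:                 # a <- b  ~>  b -> a
        a, b = switch
        return f"{b} -> {a}"
    if switch[0] == "labeled":           # a <- b (c <- d)  ~>  ((d -> c) -> b) -> a
        _, a, b, c, d = switch
        return f"(({d} -> {c}) -> {b}) -> {a}"
    _, a, b, c = switch                  # a <- b x c  ~>  b -> c -> a
    return f"{b} -> {c} -> {a}"

print(tr(("a", "b")))                    # b -> a
print(tr(("labeled", "a", "b", "c", "d")))
print(tr(("universal", "a", "b", "c")))
```

A set of switches then translates to the intersection of the resulting types, coordinate by coordinate, as in the definition of Γi.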
4 Related Results and Open Problems
There are very few positive results on decidability of inhabitation for subsystems of intersection types. One specific case is the system without rule (I∩), shown decidable in [9] by observing that the number of constraints is bounded. Some other cases are treated in [3]; unfortunately the proof given there for the system without rule (E∩) contains a gap.¹ The question of inhabitation without rule (E∩) seems to be harder than expected. We still have rule (I∩) and types of unbounded rank, so there is no immediate limit on the number of constraints; take, e.g., ((α → p) ∩ (β → p) → p) → p. On the other hand, with no elimination rule, nondeterminism is severely restricted. The conjecture that the problem is decidable seems plausible, yet at present we do not know how to prove it.

A different aspect of inhabitation emerges from the semantics-motivated systems built over a finite number of constants [2,19, Problem #13]. The results of the present paper do not apply to these questions directly; however, we believe that the techniques developed here may be useful in solving these cases too.
¹ The claim m ≤ r on page 14 is not supported.
References

1. Alessi, F., Barbanera, F.: Strong conjunction and intersection types. In: Tarlecki, A. (ed.) MFCS 1991. LNCS, vol. 520, pp. 64–73. Springer, Heidelberg (1991)
2. Alessi, F., Barbanera, F., Dezani-Ciancaglini, M.: Intersection types and lambda models. Theoretical Computer Science 355(2), 108–126 (2006)
3. Bunder, M.W.: The inhabitation problem for intersection types. In: Harland, J., Manyem, P. (eds.) Computing: The Australasian Theory Symposium. Conferences in Research and Practice in Information Technology, vol. 77, pp. 7–14. Australian Computer Society (2008)
4. Coppo, M., Dezani-Ciancaglini, M.: An extension of basic functionality theory for lambda-calculus. Notre Dame Journal of Formal Logic 21, 685–693 (1980)
5. Dezani-Ciancaglini, M., Ghilezan, S., Venneri, B.: The "relevance" of intersection and union types. Notre Dame Journal of Formal Logic 38(2), 246–269 (1997)
6. Kfoury, A.J., Wells, J.B.: Principality and type inference for intersection types using expansion variables. Theoretical Computer Science 311(1–3), 1–70 (2004)
7. Kozen, D.: Automata and Computability. Undergraduate Texts in Computer Science. Springer, Heidelberg (1997)
8. Kozen, D.: Theory of Computation. Springer, Heidelberg (2006)
9. Kurata, T., Takahashi, M.: Decidable properties of intersection type systems. In: Dezani-Ciancaglini, M., Plotkin, G. (eds.) TLCA 1995. LNCS, vol. 902, pp. 297–311. Springer, Heidelberg (1995)
10. Kuśmierek, D.: The inhabitation problem for rank two intersection types. In: Ronchi Della Rocca, S. (ed.) TLCA 2007. LNCS, vol. 4583, pp. 240–254. Springer, Heidelberg (2007)
11. Leivant, D.: Polymorphic type inference. In: PoPL, pp. 88–98. ACM Press, New York (1983)
12. Liquori, L., Ronchi Della Rocca, S.: Intersection-types à la Church. Information and Computation 205(9), 1371–1386 (2007)
13. Lopez-Escobar, E.G.K.: Proof functional connectives. In: Methods in Mathematical Logic. LNMath, vol. 1130, pp. 208–221. Springer, Heidelberg (1993)
14. Mints, G.E.: The completeness of provable realizability. Notre Dame Journal of Formal Logic 30, 420–441 (1989)
15. Møller Neergaard, P., Mairson, H.G.: Types, potency, and idempotency: Why nonlinearity and amnesia make a type system work. In: ICFP, pp. 138–149. ACM Press, New York (2004)
16. Pottinger, G.: A type assignment for the strongly normalizable λ-terms. In: Seldin, J.P., Hindley, J.R. (eds.) To H.B. Curry: Essays on Combinatory Logic, Lambda Calculus and Formalism, pp. 561–577. Academic Press, London (1980)
17. Ronchi Della Rocca, S., Roversi, L.: Intersection logic. In: Fribourg, L. (ed.) CSL 2001. LNCS, vol. 2142, pp. 414–428. Springer, Heidelberg (2001)
18. Sørensen, M.H., Urzyczyn, P.: Lectures on the Curry-Howard Isomorphism. Elsevier, Amsterdam (2006)
19. TLCA List of Open Problems, http://tlca.di.unito.it/opltlca/
20. Urzyczyn, P.: The emptiness problem for intersection types. Journal of Symbolic Logic 64(3), 1195–1215 (1999)
21. Venneri, B.: Intersection types as logical formulae. Journal of Logic and Computation 4(2), 109–124 (1994)
22. Wells, J.B., Haack, C.: Branching types. In: Le Métayer, D. (ed.) ESOP 2002. LNCS, vol. 2305, pp. 115–132. Springer, Heidelberg (2002)
23. Yokouchi, H.: Embedding a second-order type system into an intersection type system. Information and Computation 117(2), 206–220 (1995)
Differential Linear Logic and Polarization

Lionel Vaux

Laboratoire de Mathématiques de l'Université de Savoie, UMR 5127 CNRS, 73376 Le Bourget-du-Lac Cedex, France
[email protected]
Abstract. We extend Ehrhard–Regnier's differential linear logic along the lines of Laurent's polarization. We provide a denotational semantics of this new system in the well-known relational model of linear logic, extending canonically the semantics of both differential and polarized linear logics: this justifies our choice of cut elimination rules. Then we show that this polarized differential linear logic refines the recently introduced convolution λ̄μ-calculus, the same way as linear logic decomposes the λ-calculus.
1 Introduction
Differential Linear Logic. Differential interaction nets (DIN) were introduced by Ehrhard and Regnier in [1] to provide a notion of proof nets for the finitary fragment of their differential λ-calculus [2]. Both DIN and the differential λ-calculus originate in the study of models of linear logic designed after Girard's quantitative semantics of λ-calculus [3], such as Ehrhard's finiteness spaces [4]. The distinctive attribute of these models is that intuitionistic proofs, hence typed λ-terms, are interpreted by power series in particular vector spaces; thus it makes sense to define differentiation on these. The differential λ-calculus embodies this notion of differentiation, in close correspondence with the linear logic approach to resources of computation: a functional program is linear when it uses its argument exactly once. Just as the derivative of a smooth function can be thought of as its best linear approximation, the derivative of an abstraction D(λx s) · t reduces to an abstraction λx (∂s/∂x · t), where ∂s/∂x · t is obtained by substituting t for exactly one linear occurrence of x in s, where "linear occurrence" means an occurrence which is used exactly once in the head reduction of s. There can be many such occurrences in a term, hence one actually considers the sum of all such terms, which is similar to the well known rule for the derivative of a product: (f × g)′ = f′ × g + f × g′.

Such a differential extension can be reproduced in linear logic: it boils down to the introduction of costructural rules, dual to linear logic structural rules, and a codereliction rule, dual to dereliction. Costructural rules reflect an algebraic structure on exponentials, with a convolution product m : !A ⊗ !A → !A and its unit u : 1 → !A. The basis of differentiation in these models is that morphisms f : !A → B are power series: then codereliction ∂ : A → !A is such that
Supported by French ANR project CHoCo.
P.-L. Curien (Ed.): TLCA 2009, LNCS 5608, pp. 371–385, 2009. c Springer-Verlag Berlin Heidelberg 2009
f ◦ ∂ : A → B is the linear part of f, i.e., its derivative at point 0; together with the convolution product, this defines the derivative at any point. The cut elimination procedure then reflects valid equations in the model. The system of DIN presented in [1] is not exactly an extension of linear logic: the promotion rule is missing. It is however possible to reintroduce it, together with appropriate cut elimination rules derived from the semantics in finiteness spaces: this defines differential nets (DN), which depart from the interaction net paradigm (see, e.g., [5]). One can naturally introduce a sequent calculus associated with DN, where cut elimination is guided by the reduction of nets: call this system differential linear logic (DiLL).

Polarization. The notion of polarities in linear logic was made prominent by Andreoli's work on focusing proofs [6] and Girard's deterministic system for classical logic [7]. The latter led to the definition of polarized linear logic (LLP) by Laurent [8]: in the polarized fragment of linear logic, the structural rules can be extended to all negative formulas rather than ?-formulas only. It is well known that the transition from an intuitionistic system to a classical one can be performed by allowing deductions with multiple conclusion formulas. Since negative formulas are the target language of Girard's translation of implicative formulas into linear logic, we understand that LLP corresponds to such a relaxation.

The computational counterpart of classical logic is well established: classical truths type control operators. It is moreover possible to extend the Curry–Howard correspondence to a classical logic setting, while retaining the intuition of proofs with multiple conclusions. For instance, Parigot's λμ-calculus and Herbelin's λ̄μ-calculus can be considered as calculi of programs with multiple outputs, controlled by term constructions reflecting polarized structural rules.
These enjoy decompositions into LLP, similar to the translation of λ-calculus into linear logic proof nets studied by Danos and Regnier [9,10]. From a semantical point of view, the idea that polarization canonically extends the structure of exponentials to polarized formulas is also valid. For instance Girard's correlation spaces [7] are coherence spaces equipped with a ⅋-monoid structure and provide a semantics of LLP [8]: the interpretation of costructural rules on polarized formulas is built from that on their subformulas (basically variables and exponentials). Work by Laurent and Regnier [11] later showed that this construction generalizes: the ⅋-monoids of a Lafont category [12] form a model of LLP.

Polarized Costructural Rules. In short, DiLL introduces a symmetry on exponential types, with costructural rules, and provides a differential analysis of proofs through a computational notion of derivatives. On the other hand, LLP extends the linear decomposition of intuitionistic logic to classical logic, by relaxing structural rules, i.e., by canonically extending the structure of exponentials to polarized formulas. This motivates the study of the relations entertained by both of these extensions of the Curry–Howard correspondence and its analysis by linear logic.

A first result was provided in [13]: the author introduces a differential λμ-calculus which is a conservative extension of both λμ-calculus and differential λ-calculus, enjoying confluence and strong normalization of typed terms: the definability of such a system witnesses a compatibility between both extensions,
Differential Linear Logic and Polarization
373
and does not involve any new logical interaction. Indeed, although this is not done in [13], one can consider the system obtained as the union of the rules of DiLL and LLP, then check that any kind of cut in this system is already covered by the cut elimination rules of DiLL or LLP: this is the target of a translation of differential λμ-calculus extending naturally that of differential λ-calculus in DiLL and that of λμ-calculus in LLP. In the present paper we rather investigate the effect of polarization on DN: we consider the system obtained by relaxing not only structural rules to negative formulas but also costructural rules to positive formulas. Again, the idea is that polarization should also extend the algebraic structure of exponentials to polarized formulas. In particular, this preserves the symmetry between structural and costructural rules introduced in DiLL. There are two main guiding lines when designing cut elimination in this system: symmetry and semantics. We consider a model of linear logic which can be extended to both DiLL and LLP: both correlation spaces and finiteness spaces are refinements of the relational model which underlies Girard's coherence semantics. Moreover, in the relational interpretation of linear logic proofs, duality boils down to reversing the orientation of relations. This allows us to deduce, in a very natural way, the semantics of polarized costructural rules from that of polarized structural rules: just reverse the corresponding relations. The reflexive object introduced in [14] is well suited for this study: it allows us to interpret both DiLL and LLP in a pure (i.e. untyped) setting, so that exponential structural and costructural rules are exchanged by symmetry, and polarized structural rules are given by a `-monoid structure on the object. It is then easy to derive the computational behaviour of polarized costructural rules from this semantics.
The system presented in the current paper can be seen as the end result of this line of thought. In [15], the author introduced the convolution λ̄μ-calculus based on similar ideas: interpret Herbelin's λ̄μ-calculus into the object of [14] through LLP, then investigate the computational counterpart of the monoid operation modelling polarized costructural rules, when applied to the denotations of contexts, which are dual to terms.

Organization of the paper. In section 2, we introduce the system of polarized differential nets (PDN), together with typing and reduction rules. Then, in section 3, we validate this new system by providing a denotational semantics on a particular object of the relational model of linear logic. This canonically extends the relational semantics of both DN and LLP. Section 4 briefly reviews sequentialization of PDN. Last, section 5 makes explicit the translation of the convolution λ̄μ-calculus in PDN, as hinted in [15]. The end of the paper proposes a quick glimpse at how to bring differentiation back into that setting.
2 Polarized Differential Nets
Polarized differential nets (PDN) are formal finite sums of simple nets, which are particular multiport interaction nets, as studied by Mazza [16]
L. Vaux
Fig. 1. An example of simple net
following Lafont [17]. The cells of simple PDN are actually those of DN, i.e. DIN plus promotion boxes. Mainly, PDN differ from DN when considering typing, which is relaxed by polarization, and cut elimination, which involves new rules. An example simple PDN is given in Figure 1, with a pure typing: this is the translation of the closed convolution λ̄μ-calculus term λx μα ⟨x , (x · α) ∗ α⟩ (see below).

Nets. We call signature a set Σ of symbols, where each symbol α ∈ Σ is given an arity a(α) ∈ N. A simple net on signature Σ is a circuit built up from a finite number of cells, each given a symbol in Σ, which are connected by finitely many wires, so that every cell c is connected to a(αc) + 1 wires, where αc is the symbol of c. We allow wires with dangling ends, and also loop wires. The ends of wires are called ports (a loop is a wire whose ports are equal). Hence each port is either a cell port, or a loop port, or a free port (of a dangling wire). The placement of connection points matters: cell ports are not interchangeable. If c is a cell, we write c0, . . . , ca(αc) for its ports. Port c0, which is always present, is the principal port of c; the possible other ones are called auxiliary ports. In general, cells are depicted by triangles, with their respective principal port put on the tip of the triangle, and the auxiliary ones on the opposite edge. The interface of a simple net is the set of its free ports. A net μ is a multiset [μ1, . . . , μn] of simple nets sharing the same interface, which we also consider to be the interface of μ. If μ and μ′ are nets with the same interface, we denote additively their multiset union μ + μ′. We also denote by 0 the empty multiset of simple nets, whatever the underlying interface. This should not be confused with the empty simple net ε, the interface of which is empty. We will consider cells of a special kind: a box is a cell with symbol μ!, where μ is a net whose interface matches the ports of the box-cell. A box μ!
is depicted as a rectangle containing μ, where we distinguish the principal port by a circled exclamation mark. Let Σ be a signature. We define the signature Σ! by induction on the depth of boxes: Σ! = ⋃_{n∈N} Σ(n), where Σ(0) = Σ, and if Σ(n) is defined, we set Σ(n+1) = Σ(n) ∪ {μ! ; μ! is a box symbol with μ a net on Σ(n)}. Notice that boxes may contain sums, since the nets in box symbols are not necessarily simple.

Definition 1. The signature Δ0 of PDN of depth 0 is that of Ehrhard–Regnier's DIN [1]: binary symbols tensor ⊗, par `, contraction c and cocontraction m; unary symbols dereliction d and codereliction ∂; and nullary symbols weakening w and coweakening u. Then the signature of all PDN is Δ = Δ0!.

Typing. The polarized formulas of multiplicative exponential linear logic are given by the following mutually inductive grammars:
Fig. 2. Typing of the cells of simple PDN
negative: M, N ::= X | M ` N | ?P
positive: P, Q ::= X⊥ | P ⊗ Q | !N

with negation defined by De Morgan duality: X⊥⊥ = X, (M ` N)⊥ = N⊥ ⊗ M⊥ and (?P)⊥ = !P⊥. Recall that the linear logic formulas used in the decomposition of minimal implicative natural deduction (i.e. simply typed λ-calculus) through Girard's translation A ⇒ B = !A ⊸ B are the intuitionistic and exponential ones, organized as follows:

negative: A, B ::= X | ?A⊥ ` B
positive: A⊥, B⊥ ::= X⊥ | A⊥ ⊗ !B
why-not: ?A⊥
of-course: !A
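As a quick illustration, the polarized grammar and its De Morgan duality can be written down directly; duality flips polarity and is involutive. The encoding below is ours, not the paper's:

```python
# Polarized MELL formulas as nested tuples (encoding ours):
# ('var', X) / ('covar', X), ('par', M, N) / ('tens', P, Q), ('?', P) / ('!', N).

def dual(f):
    """De Morgan duality: X^⊥⊥ = X, (M ` N)^⊥ = N^⊥ ⊗ M^⊥, (?P)^⊥ = !P^⊥."""
    tag = f[0]
    if tag == 'var':   return ('covar', f[1])
    if tag == 'covar': return ('var', f[1])
    if tag == 'par':   return ('tens', dual(f[2]), dual(f[1]))  # note the swap
    if tag == 'tens':  return ('par', dual(f[2]), dual(f[1]))
    if tag == '?':     return ('!', dual(f[1]))
    return ('?', dual(f[1]))                                    # tag == '!'

def negative(f):
    """A formula is negative iff its head is a variable, a par, or a why-not."""
    return f[0] in ('var', 'par', '?')
```

For instance, the dual of X ` ?Y⊥ is !Y ⊗ X⊥, which is positive, and taking the dual twice gives the original formula back.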
These are special cases of polarized formulas. Pure types were introduced by Danos [9] and Regnier [10] in order to interpret pure λ-calculus in linear logic. They are variable-free intuitionistic and exponential formulas obtained from an additional constant o, subject to the equation o = o ⇒ o, which allows one to type all pure λ-terms. This translates to o = !o ⊸ o = ?o⊥ ` o. We obtain four possible formulas: o itself (the type of terms), i = o⊥ (its dual, the type of contexts), !o (the type of arguments) and ?i (the type of free variables). In a pure setting, o (resp. i, !o, ?i) is the only negative (resp. positive, of-course, why-not) formula: it can be considered as the archetypal one.

Definition 2. A typing of a simple PDN is the assignment of a type to each oriented wire, such that reversing the orientation of the wire negates the type, and respecting some typing constraints on symbols. The typing rules for the cells of simple PDN of depth 0 are given in Figure 2, where types are polarized formulas. If I = p1, . . . , pk is an ordered interface of a simple net μ, i.e. a list of its free ports, and if Γ = γ1, . . . , γk is a list of types, then we write μ ⊢I Γ if there is a typing of μ such that the outgoing type at port pi is γi. We extend typing to all PDN by the following rules for sums and boxes: if μ = ∑_{j=1}^{m} μj is a net with ordered interface I, then μ ⊢I Γ as soon as, for all j, μj ⊢I Γ; if moreover I = {0, . . . , n}, Γ = N, N1, . . . , Nn and μ ⊢I Γ, then for a box b of symbol μ!, we have b ⊢_{b0,...,bn} !N, N1, . . . , Nn (this is the promotion rule of LLP). Notice the four polarized typing rules for structural and costructural cells. From this polarized typing, one straightforwardly deduces an intuitionistic (resp. pure) type system for PDN.

Cut elimination. Among the ports of each cell, some are called active: these correspond to active formulas in the associated deduction rule.
The only active port of a non-box cell is the principal one; by contrast, all the ports of a box
are active. A cut in a net is a wire between active ports of distinct cells: a redex is the data of two cells c, d and indices of active ports i of c and j of d, such that ci, dj is a wire. We denote such a redex by ⟨ci , dj⟩. To each typable redex ⟨ci , dj⟩, we associate a reduced net which depends only on the symbols αc and αd, and the port indices i and j, so that the free ports of the reduced net are assigned to the free ports of the cut, i.e. the ports of c and d minus ci and dj.

Definition 3. We first define reduction at depth 0. If μ is a simple net, ⟨ci , dj⟩ is a redex of μ and ∑_{k=1}^{n} νk is the reduct of ⟨ci , dj⟩ given in Figure 3, then we write μ →0 ∑_{k=1}^{n} μk, where each μk is the simple net obtained by removing wire ⟨ci , dj⟩ and cells c and d from μ, then plugging νk instead. This is extended to sums as follows: μ →0 μ′ as soon as μ = ∑_{i=0}^{n} μi and μ′ = ∑_{i=0}^{n} μ′i, where each μi is simple, μ0 →0 μ′0 and, for i ∈ {1, . . . , n}, μi →0 μ′i or μi = μ′i. We now define reduction at any depth. Assume →n is defined. Then if μ is a simple net, μ →n+1 μ′ if μ →0 μ′, or if there are nets ν and ν′ such that ν →n ν′ and μ′ is obtained by replacing a box with symbol ν! in μ with a box of symbol ν′!. We extend →n+1 to sums similarly to →0. We finally set μ → μ′ if μ →n μ′ for some n.

We provided annotations for the reduction rules of Figure 3, organized as follows. Groups m and e are the cut elimination rules for multiplicative exponential linear logic. Groups m and r correspond to the reduction of DIN in [1]; if we add d, we obtain the reduction rules of DN, suitable to encode differential λ-calculus. Groups m, e and p define the cut elimination procedure of LLP. This is actually a local version of the reduction presented in [8]: group p and rules e2,3,4 decompose in many steps the reductions of positive trees versus structural rules and auxiliary ports of boxes. The only new reduction rules in PDN are those of group p′.
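The extension of one-step reduction from a single summand to formal sums can be sketched in a simplified model (ours: a net is a list of "simple nets", and we rewrite a single summand, whereas Definition 3 allows rewriting several at once):

```python
def step_sum(mu, step0):
    """All one-step reducts of a formal sum mu (a list of simple nets).
    step0(s) returns the list of reducts of simple net s; each reduct is
    itself a formal sum, spliced in place of s."""
    out = []
    for i, s in enumerate(mu):
        for reduct in step0(s):
            out.append(mu[:i] + reduct + mu[i + 1:])
    return out

# Toy instance: 'simple nets' are numbers, and n > 0 rewrites to the sum (n-1) + (n-1).
def toy_step(n):
    return [[n - 1, n - 1]] if n > 0 else []
```

Here `step_sum([1, 0], toy_step)` yields `[[0, 0, 0]]`: the summand 1 was replaced by its two-element reduct while the summand 0 stayed in place.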
It is easily checked that the left part of Figure 3, i.e. groups m, r, p and p′ except p′3, defines a confluent and terminating system. As first noticed by Tranquilli [5], however, even local confluence of the system including d1 is only verified up to some structural reductions (similar to the structural equivalence of Definition 9): this is because rule d1 forces which passive port of the convolution product m receives the linear argument provided by ∂. A full study of the proof-theoretic properties of PDN (including confluence and strong normalization) is left for future work: although similar questions for DN receive a partial answer in [5], much remains to be settled. In the following, we concentrate on a semantical justification of our choice of cut elimination (section 3), remarks on sequentialization properties (section 4) and the computational expressivity of polarized costructural rules (section 5).
3 Relational Semantics
Following [14], we construct an object in the usual multiset-based relational model of linear logic (the category of sets and relations, where multiplicatives
Fig. 3. Reduction rules of PDN
Fig. 4. Relational typing of the non-box cells of PDN
are interpreted by cartesian products, and exponential modalities by the free commutative monoid construction, i.e. finite multisets) which is an extensional reflexive object in the co-Kleisli category associated with the ! modality. If X is a set, denote by Mfin(X) the set of all finite multisets of elements of X, and by Mfin(X)(ω) the set of all sequences ξ = (ξ(i))_{i∈ω} of multisets in Mfin(X) such that ξ(i) = [] holds for almost all i ∈ ω. We define an increasing family (Dn)_{n∈N} of sets by induction on n: D0 = ∅ and Dn+1 = Mfin(Dn)(ω). Then we set D = ⋃_{n∈N} Dn. If a ∈ Mfin(D) and α ∈ D, write a :: α for the sequence β such that β(0) = a and β(i + 1) = α(i) for all i ∈ ω. We denote by ι the constant sequence such that ι(i) = [] for all i ∈ ω. For instance, D1 = {ι}. The mapping (a, α) → a :: α is clearly a bijection from Mfin(D) × D to D, and satisfies ι = [] :: ι. This bijection makes D an extensional reflexive object in the cartesian closed category defined by the co-Kleisli construction on !, hence an extensional model of pure λ-calculus. It also provides a model of pure DiLL: the finitary semantics of [4] is easily reproduced with base types interpreted by D, pruning the finiteness structure. We show how this object provides a model of the reduction of pure PDN, first by defining a commutative monoid structure ⋆ on D, with unit ι: for all i ∈ ω, set (α ⋆ β)(i) = α(i) + β(i). Following [11], we obtain a model of pure LLP; we show that this actually extends to a model of PDN. We call relational type any pair (γ, i), where i is an orientation bit 0 or 1, and γ ∈ Mfin(D) ∪ D. We set the duality on types to negate the orientation bit. By convention, when depicting the relational typing of a PDN, we fix the orientation of wires so that the orientation bit is always the same (say 0): on these oriented wires, we only give the value α or a of the type.
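A concrete rendering of D and its monoid (encoding ours: an element of D is a tuple of multisets, each multiset a sorted tuple, with trailing empty multisets dropped so that equal sequences compare equal):

```python
def norm(seq):
    """Drop trailing empty multisets: the all-[] tail of a sequence is implicit."""
    seq = list(seq)
    while seq and seq[-1] == ():
        seq.pop()
    return tuple(seq)

IOTA = ()  # the constant sequence i -> [], i.e. the element ι

def mset(*elems):
    """A finite multiset of elements of D, as a sorted tuple."""
    return tuple(sorted(elems))

def cons(a, alpha):
    """The bijection (a, α) -> a :: α."""
    return norm((a,) + alpha)

def star(alpha, beta):
    """The monoid operation, written ⋆ above: (α ⋆ β)(i) = α(i) + β(i)."""
    n = max(len(alpha), len(beta))
    pad = lambda s: s + ((),) * (n - len(s))
    return norm(tuple(mset(*(x + y)) for x, y in zip(pad(alpha), pad(beta))))
```

For instance `cons((), IOTA) == IOTA` reproduces the identity ι = [] :: ι, and `star` is commutative with unit `IOTA`.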
This is extended to all PDN of depth 0 as follows: if μ = i=1 μi is a sum of simple nets with ordered interface I, then μ I Γ as soon as μi I Γ for some i (not necessarily all). It remains to define the typing of boxes. We write μ : (γ1 , . . . , γl I γ1 , . . . , γm ) if μ I (γ1 , 0), . . . , (γm , 0), (γ1 , 1), . . . , (γl , 1). i i ) Assume I = 0, . . . , l + m and there are typings μ : (a1 , . . . , ail I αi , β1i , . . . , βm ! for i ∈ {1, . . . , n}. Then for all box b of symbol μ , we have b : (a , . . . , a
1 l I n
n i i [α1 , . . . , αn ] , β1 , . . . , βm ), where aj = i=1 aj and βk = i=1 βk . The relational semantics of a PDN is then the set of its input-output typings: μI = {((γ1 , . . . , γl ), (γ1 , . . . , γm )); μ : (γ1 , . . . , γl I γ1 , . . . , γm )}. Theorem 1. The relational semantics of PDN is preserved under reduction. Proof. One simply inspects the reduction rules and checks that they preserve all possible typings. Then one concludes by contextuality.
Fig. 5. Sequentiality of PDN
4 Sequentiality
Definition 5. A PDN is sequential if it is a sum of simple nets obtained by the rules of Figure 5, plus the formation of boxes, where the simple nets μ1 and μ2, and the nets inside boxes, are inductively supposed to be sequential. It is said to be weakly sequential when one moreover allows the formation of the empty PDN ε and the juxtaposition (i.e. disjoint union) of simple nets as inductive cases. A first sequentiality criterion is provided by the well-known Danos–Regnier switching condition.

Definition 6. Let μ be a simple PDN. A switching of μ is a graph G with vertices the ports of μ and with edges as follows: every wire of μ is an edge in G; for each cell c in μ, with symbol ` or c, there is an edge between c0 and exactly one of the ports ci, i > 0; for each cell d in μ, with symbol other than ` or c, there is an edge between d0 and every port dj, j > 0. A PDN ν is correct if every switching of every simple net ν′ in ν is acyclic and, recursively, every PDN inside a box cell of ν is correct. Of course, the set of correct PDN is stable under cut elimination.

Theorem 2. A PDN is weakly sequential iff it is correct.

Proof. One easily adapts the proof by Danos in [9] for MELL proof structures to the case of PDN. Indeed, this proof is only about the geometry of nets: here m is handled like ⊗, ∂ is handled like d and u is handled like a tensor unit.¹

One consequence of the polarization property in linear logic, as described by Laurent in [8], is that one can characterize exactly sequential proof structures, based on a simple criterion on so-called correctness graphs. This no longer applies here: the typing rule of codereliction breaks the constraining character of polarization. In particular, we can no longer claim that every typed and sequential net has at most one positive conclusion: this was an essential property of polarization in [10] and [8] (pure or intuitionistic PDN retain this property, however, as first noted by Tranquilli for intuitionistic DN [5]).
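The switching condition of Definition 6 is directly checkable at depth 0: enumerate switchings and test acyclicity with a union–find. Below is a sketch (the graph encoding is ours; cells are (symbol, principal port, auxiliary ports) triples):

```python
from itertools import product

def acyclic(edges, vertices):
    """Union-find cycle detection on an undirected multigraph."""
    parent = {v: v for v in vertices}
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # this edge closes a cycle
        parent[ru] = rv
    return True

def switchings(wires, cells):
    """For a par (`) or contraction (c) cell keep exactly one edge from the
    principal port to an auxiliary port; for every other cell keep them all."""
    fixed, choices = list(wires), []
    for sym, p, auxs in cells:
        if sym in ('`', 'c'):
            choices.append([(p, a) for a in auxs])
        else:
            fixed += [(p, a) for a in auxs]
    for pick in product(*choices):
        yield fixed + list(pick)

def correct(wires, cells, ports):
    """Danos-Regnier condition at depth 0: every switching is acyclic."""
    return all(acyclic(g, ports) for g in switchings(wires, cells))
```

For instance, a cell whose two auxiliary ports are joined by a wire is correct when the cell is a par (each switching keeps only one branch) but not when it is a tensor (all branches are kept, closing a cycle).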
¹ Although this is not done in the present paper, one can introduce multiplicative units: 1 is positive and ⊥ is negative. Then, by the polarized typing rules of PDN, 1 (resp. ⊥) can be seen as a special case of u (resp. w).
5 Convolution λ̄μ-calculus
We now recall the definition of the convolution λ̄μ-calculus of [15]. Like Herbelin's λ̄μ-calculus, it involves three distinct syntactic categories: terms (proofs with an active conclusion), contexts (proofs with an active hypothesis) and commands (cuts between active conclusions of terms and active hypotheses of contexts). It moreover introduces a binary operation on contexts, which is meant to provide a computational counterpart to the polarized costructural rules of PDN. It turns out that the obtained reduction rules closely resemble the definition of the convolution product of distributions [18].

5.1 Syntax
Basic Syntax. Fix two denumerably infinite sets V (set of variables, denoted by x, y, z) and N (set of names, denoted by α, β, γ).

Definition 7. Define terms, contexts and commands by:

simple terms:     s ::= x | λx s | μα c
stacks:           σ ::= α | S · e
simple contexts:  e ::= 1 | σ ∗ e
simple commands:  c ::= ⟨s , e⟩
terms:            S ::= 0 | s + S
contexts:         E ::= 0 | e + E
commands:         C ::= 0 | c + C
We consider terms, commands and contexts up to permutativity of sum, in the sense that, e.g., s + (s′ + S) = s′ + (s + S). Also, we consider simple contexts up to permutativity of the convolution product: e.g., α ∗ ((S · e) ∗ e′) = (S · e) ∗ (α ∗ e′). Notice that these identities preserve free and bound variables and names: hence they are compatible with α-conversion.

Notations. We call simple object any simple term, simple context or simple command, and object any term, context or command. We allow formation of arbitrary finite sums of objects of the same kind, with the obvious meaning. Thus sum + becomes an associative and commutative binary operation on terms, contexts and commands respectively, and object 0 is neutral. Similarly, we allow arbitrary finite convolution products of simple contexts, with unit 1. We can then extend our syntactic constructs by linearity:

λx (∑_{i=1}^{n} si) = ∑_{i=1}^{n} λx si
μα (∑_{l=1}^{r} cl) = ∑_{l=1}^{r} μα cl
S · (∑_{j=1}^{p} ej) = ∑_{j=1}^{p} S · ej
(∑_{j=1}^{p} ej) ∗ (∑_{k=1}^{q} fk) = ∑_{j=1}^{p} ∑_{k=1}^{q} ej ∗ fk
⟨∑_{i=1}^{n} si , ∑_{j=1}^{p} ej⟩ = ∑_{i=1}^{n} ∑_{j=1}^{p} ⟨si , ej⟩

Notice that the cons S · E of term S and context E is not linear in the term: this is the analogue of application not being linear in the argument, in ordinary λ-calculus. This definition introduces some overlap of notations: e.g., λx s denotes both a simple term in our basic syntax, and the value of λx (s + 0) in the above definition. This is however harmless since both writings denote the same term. Hence the set of terms (resp. contexts, commands) is endowed with a structure of commutative monoid. The set of contexts is moreover endowed with a
structure of commutative rig (i.e. a commutative ring, without the condition that every element admits an opposite), with addition + and multiplication ∗. Also, λ- and μ-abstractions are linear, cons is linear in the context, and cut is bilinear. Thanks to the notations we have just introduced, the capture-avoiding substitution of a term for a variable (resp. of a context for a name) in an object is defined as usual, by induction on objects.
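A minimal executable model of this rig structure (encoding ours: a simple context is a sorted tuple of stacks, with the empty tuple as the unit 1; a context is a sorted tuple of simple contexts, with the empty tuple as the sum 0):

```python
from itertools import product

ONE = ()  # the simple context 1 (empty convolution product)

def simple(*stacks):
    """A simple context: a convolution product of stacks (here, opaque labels).
    Sorting realises permutativity of the product."""
    return tuple(sorted(stacks))

def csum(simples):
    """A context: a formal sum of simple contexts; sorting realises
    permutativity of the sum."""
    return tuple(sorted(simples))

def conv(E, F):
    """Convolution product, extended bilinearly to sums."""
    return csum(simple(*(e + f)) for e, f in product(E, F))
```

In this model the sum with 0 and the product with the sum of 1 alone are neutral, the product is commutative, and E ∗ 0 = 0 by bilinearity.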
5.2 Translation and Reduction
Translation. Before we recall the reduction of convolution λ̄μ-calculus from [15], we make explicit the intended translation into PDN, first in a typed setting.

Definition 8. The typing rules for the simple objects of convolution λ̄μ-calculus are given in Figure 6, together with their translation in PDN: to each derivation of Γ ⊢ s : A | Δ (resp. Γ | e : A ⊢ Δ, c : (Γ ⊢ Δ)), where Γ = x1 : A1, . . . , xn : An and Δ = α1 : B1, . . . , αp : Bp, we associate an intuitionistic sequential PDN.
For sums of simple objects, we moreover have the following three typing rules:

{Γ ⊢ si : A | Δ}i=1,...,n
──────────────────────────
Γ ⊢ ∑_{i=1}^{n} si : A | Δ

{Γ | ei : A ⊢ Δ}i=1,...,n
──────────────────────────
Γ | ∑_{i=1}^{n} ei : A ⊢ Δ

{ci : (Γ ⊢ Δ)}i=1,...,n
──────────────────────────
∑_{i=1}^{n} ci : (Γ ⊢ Δ)
and the translation of a sum is the sum of the translations. In particular, the object 0 lives in all types and is translated by the PDN 0 with corresponding interface. From this definition, one easily derives a translation of pure convolution λ̄μ-calculus into pure typed sequential PDN. Like the translation of λ-calculus into linear logic, this translation of convolution λ̄μ-calculus is meant up to a structural equivalence on PDN.

Definition 9. We define the structural equivalence ≅ of PDN as the reflexive, symmetric, transitive and contextual closure of the equations of Figure 7.

Convolution Reduction. We call contextual relation any triple r of binary relations respectively on terms, contexts and commands, each also denoted by r, which are closed under the constructions of Definition 7: let • denote a single occurrence of simple object θ in object Θ, then θ r Θ′ implies Θ r Θ[Θ′/•].

Definition 10. Reduction → is the least contextual relation such that:

(1) ⟨μα c , e⟩ → c[e/α]
(2) ⟨λx s , (S · e) ∗ f⟩ → ⟨λy μα ⟨s[y + S/x] , α ∗ e⟩ , f⟩
(3) ⟨λx s , 1⟩ → ⟨s[0/x] , 1⟩

with y a fresh variable and α a fresh name in (2).
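Rule (1) can be prototyped directly on a small AST (encoding ours; substitution is naive about capture, which is safe only for the closed examples below):

```python
# Objects as tagged tuples (our encoding):
#   terms:    ('var', x) | ('lam', x, s) | ('mu', a, c)
#   contexts: ('name', a) | ...
#   commands: ('cmd', s, e)

def subst_name(obj, a, e):
    """Capture-naive substitution of context e for name a in obj."""
    if obj[0] == 'name':
        return e if obj[1] == a else obj
    if obj[0] == 'mu' and obj[1] == a:
        return obj  # a is rebound here, stop
    return (obj[0],) + tuple(subst_name(p, a, e) if isinstance(p, tuple) else p
                             for p in obj[1:])

def step(cmd):
    """Rule (1): <mu a. c , e>  ->  c[e/a]; None if the command is not a redex."""
    _, s, e = cmd
    if s[0] == 'mu':
        return subst_name(s[2], s[1], e)
    return None
```

For instance the command ⟨μα ⟨x , α⟩ , β⟩ steps to ⟨x , β⟩, and substitution stops under a rebinding μα.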
Fig. 6. Typing of convolution λ̄μ-calculus objects and translation in PDN
In [15], it is proved that this notion of reduction is confluent. It is also proved in [19] that the simply typed objects of convolution λ̄μ-calculus are all strongly normalizing: one adapts the proof by Polonowski in [20] for λ̄μμ̃-calculus.
Fig. 7. Structural equivalence of PDN

Fig. 8. PDN simulating the substitution operations of the convolution λ̄μ-calculus
Lemma 1. The PDN represented in subfigures (a), (b), (c) and (d) of Figure 8 reduce to PDN structurally equal to the translation of, respectively, c[e/α], c[0/α], c[y + z/x] and c[T/x], where: c : (Γ ⊢ α : A, Δ) and Γ | e : A ⊢ Δ in case (a); c : (Γ, x : A ⊢ Δ) in cases (b), (c) and (d); y and z are distinct fresh variables in case (c); Γ ⊢ T : A | Δ in case (d).

Theorem 3. The cut elimination procedure of PDN up to structural equivalence simulates the reduction of convolution λ̄μ-calculus.

Proof. By context closure, it is sufficient to consider the case of redexes. For ⟨μα c , e⟩ → c[e/α], case (a) of Lemma 1 applies directly. For ⟨λx s , 1⟩ → ⟨s[0/x] , 1⟩, reduce the cut ⟨u , `⟩, then apply case (b). For ⟨λx s , (T · e) ∗ f⟩ → ⟨λy μα ⟨s[T + y/x] , e ∗ α⟩ , f⟩, reduce the cut ⟨m , `⟩ followed by ⟨⊗ , `⟩; then apply case (c) with fresh variables y and z, followed by case (d), to obtain the PDN associated with ⟨λy μα ⟨s[y + z/x][T/z] , e ∗ α⟩ , f⟩.

Recall from [15] that the relational model described in section 3 provides a denotational semantics of convolution λ̄μ-calculus. The reader will easily check
that this semantics can be decomposed as the translation of pure convolution λ̄μ-calculus in PDN, followed by the semantics of pure typed PDN in D.
5.3 Towards a Differential λ̄μ-calculus
Notice that ⟨λx s , (T · e) ∗ (U · f)⟩ →∗ ⟨s[T + U/x] , e ∗ f⟩. In some sense, context (T · e) ∗ (U · f) simulates (T + U) · (e ∗ f), as is reflected by the fact that these contexts are identified in the relational semantics (see [15]). This suggests introducing the notations S! = S · 1 and ↑e = 0 · e, so that S! ∗ ↑e simulates S · e: from now on, we consider a syntax where stacks are restricted to those two shapes. Reduction rule (2) boils down to the elementary rules ⟨λx s , T! ∗ f⟩ → ⟨λx s[x + T/x] , f⟩ and ⟨λx s , ↑e ∗ f⟩ → ⟨λx μα ⟨s , e ∗ α⟩ , f⟩. In this new syntax, all causality information in stacks is lost, and one only retains a minimal form of sequentiality: contexts of type A ⇒ B become bags of arguments of type A and future contexts of type B. So far, we have only focused on the computational content of costructural rules. It turns out that the decomposition of contexts we have just performed makes the introduction of differentiation in the sense of [2] very natural in this setting. Introduce a new stack construction [s] with typing rule and associated net:
Γ ⊢ s : A | Δ
─────────────────────
Γ | [s] : A ⇒ B ⊢ Δ

(the associated net, which involves a codereliction ∂ on !A, is omitted here)
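The identification of (T · e) ∗ (U · f) with (T + U) · (e ∗ f) noted above can be checked in a toy model (ours): a stack-context over A ⇒ B is a pair of a bag of A-arguments and a bag standing for the future context of type B, and convolution merges both bags.

```python
from collections import Counter

def stack(args, rest):
    """The stack T · e, modelled (our choice) as a pair:
    (bag of arguments of type A, bag standing for the future context e)."""
    return (Counter(args), Counter(rest))

def conv(c1, c2):
    """(T · e) ∗ (U · f): merge argument bags and convolve the rests."""
    (t, e), (u, f) = c1, c2
    return (t + u, e + f)
```

In this model (T · e) ∗ (U · f) and (T + U) · (e ∗ f) are literally the same value, mirroring their identification in the relational semantics.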
One then defines a linear variant of substitution, ∂s/∂x · t, following [2], so that ⟨λx s , [t] ∗ e⟩ → ⟨λx (∂s/∂x · t) , e⟩, in adequation with cut elimination in PDN. One has to pay attention to the fact that case (a) of Lemma 1 no longer holds: stacks containing linear arguments can no longer be duplicated nor erased freely. Hence one has to introduce a notion of named cut: let θ be any simple object with free name α of type A and e be a simple context of type A; we must define an object ⟨θ , e⟩α so that the PDN interpretation of ⟨μα c , e⟩ reduces (up to ≅) to that of ⟨θ , e⟩α. This can be done by induction on e, inspecting the possible PDN reductions, and involves a construction similar to the named derivative of [13]. One then proves that the reduction relation of the obtained pure calculus is confluent and appropriately simulated by cut elimination in PDN; simply typed objects are moreover strongly normalizing. These results are detailed in [19, Chapter 8]. The obtained system can be seen as a classical sequent calculus version of differential λ-calculus. Current investigations include: establishing a precise relationship between this calculus and Boudol's resource λ-calculus as studied in [5]; studying the "equation" S! = ∑_{n} (1/n!) [S]^n, w.r.t. both denotational and operational semantics, following [21,22]; relating the convolution product with parallel composition as known in concurrency theory; more generally, revealing the expressivity of differential λ̄μ-calculus w.r.t. concurrent computing, following recent advances [23] on simulating Milner's π-calculus [24] in DN.
References

1. Ehrhard, T., Regnier, L.: Differential interaction nets. Theor. Comput. Sci. 364, 166–195 (2006)
2. Ehrhard, T., Regnier, L.: The differential lambda-calculus. Theor. Comput. Sci. 309, 1–41 (2003)
3. Girard, J.Y.: Normal functors, power series and lambda-calculus. Ann. Pure Appl. Logic 37, 129–177 (1988)
4. Ehrhard, T.: Finiteness spaces. Math. Struct. Comput. Sci. 15, 615–646 (2005)
5. Tranquilli, P.: Intuitionistic differential nets and lambda-calculus. To appear in Theor. Comput. Sci. (2008)
6. Andreoli, J.M.: Logic programming with focusing proofs in linear logic. Journal of Logic and Computation 2, 297–347 (1992)
7. Girard, J.Y.: A new constructive logic: Classical Logic. Math. Struct. Comput. Sci. 1, 255–296 (1991)
8. Laurent, O.: Étude de la polarisation en logique. PhD thesis, Université Aix-Marseille 2 (2002)
9. Danos, V.: Une application de la logique linéaire à l'étude des processus de normalisation (principalement du λ-calcul). PhD thesis, Université Paris 7 (1990)
10. Regnier, L.: Lambda-calcul et réseaux. PhD thesis, Université Paris 7 (1992)
11. Laurent, O., Regnier, L.: About translations of classical logic into polarized linear logic. In: Proceedings of LICS 2003 (2003)
12. Bierman, G.M.: What is a categorical model of intuitionistic linear logic? In: Dezani-Ciancaglini, M., Plotkin, G. (eds.) TLCA 1995. LNCS, vol. 902. Springer, Heidelberg (1995)
13. Vaux, L.: The differential λμ-calculus. Theor. Comput. Sci. 379, 166–209 (2007)
14. Bucciarelli, A., Ehrhard, T., Manzonetto, G.: Not enough points is enough. In: Duparc, J., Henzinger, T.A. (eds.) CSL 2007. LNCS, vol. 4646, pp. 298–312. Springer, Heidelberg (2007)
15. Vaux, L.: Convolution λ̄μ-calculus. In: Della Rocca, S.R. (ed.) TLCA 2007. LNCS, vol. 4583, pp. 381–395. Springer, Heidelberg (2007)
16. Mazza, D.: Interaction Nets: Semantics and Concurrent Extensions. PhD thesis, Université Aix-Marseille 2, Università degli Studi Roma Tre (2006)
17. Lafont, Y.: From proof nets to interaction nets. In: Advances in Linear Logic. London Math. Soc. Lect. Not., vol. 222. Cambridge University Press, Cambridge (1995)
18. Schwartz, L.: Théorie des distributions. Hermann (1966)
19. Vaux, L.: λ-calcul différentiel et logique classique: interactions calculatoires. PhD thesis, Université Aix-Marseille 2 (2007)
20. Polonowski, E.: Substitutions explicites, logique et normalisation. PhD thesis, Université Paris 7 (2004)
21. Ehrhard, T., Regnier, L.: Uniformity and the Taylor expansion of ordinary lambda-terms. Theor. Comput. Sci. 403, 347–372 (2008)
22. Ehrhard, T., Regnier, L.: Böhm trees, Krivine's machine and the Taylor expansion of lambda-terms. In: Proceedings of CiE 2006 (2006)
23. Ehrhard, T., Laurent, O.: Interpreting a finitary pi-calculus in differential interaction nets. In: Caires, L., Vasconcelos, V.T. (eds.) CONCUR 2007. LNCS, vol. 4703, pp. 333–348. Springer, Heidelberg (2007)
24. Milner, R., Parrow, J., Walker, D.: A calculus of mobile processes, I. Inf. Comput. 100(1), 1–40 (1992)
Complexity of G¨odel’s T in λ-Formulation Gunnar Wilken1 and Andreas Weiermann2, 1 Mathematical Biology Unit Okinawa Institute of Science and Technology 12-2 Suzaki, 904-2234 Okinawa, Japan [email protected] 2 Vakgroep Zuivere Wiskunde en Computeralgebra Universiteit Gent Krijgslaan 281 - Gebouw S22, 9000 Gent, Belgium [email protected]
Abstract. Let T be G¨odel’s system of primitive recursive functionals of finite type in the λ-formulation. We define by constructive means using recursion on nested multisets a multivalued function I from the set of terms of T into the set of natural numbers such that if a term a reduces to a term b and if a natural number I(a) is assigned to a then a natural number I(b) can be assigned to b such that I(a) > I(b). The construction of I is based on Howard’s 1970 ordinal assignment for T and Weiermann’s 1996 treatment of T in the combinatory logic version. As a corollary we obtain an optimal derivation length classification for the λ-formulation of T and its fragments. Compared with Weiermann’s 1996 exposition this article yields solutions to several non-trivial problems arising from dealing with λ-terms instead of combinatory logic terms. Keywords: Typed λ-Calculus, Rewrite System, G¨odel’s T, Strong Normalization, Termination.
1 Introduction

A common and very convenient tool for proving termination of a reduction system consists in defining an interpretation function I from the set of terms in question into the set of natural numbers, such that if a term a rewrites to a term b then I(a) > I(b). A reduction sequence a1, . . . , an of terms then yields a strictly descending chain of natural numbers I(a1) > . . . > I(an); thus I(a1) is an upper bound for n, and therefore I provides a termination proof plus a non-trivial upper bound on the lengths of longest possible reductions. In this paper we apply a generalization of this method – the non-unique-assignment technique – to Tλ, the λ-formulation of Gödel's T, which is the prototype for a higher order rewrite system. For TCL, the combinatory logic formulation of T, a corresponding interpretation has already been constructed in [9]. In this article we solve the technically more involved problem of classifying the derivation lengths for Tλ via a multivalued interpretation function. The extra complications when compared with the treatment in [9] are due to the necessary introduction of a variable concept
This work was partially supported by a grant from the Templeton Fellowship at Ghent University.
P.-L. Curien (Ed.): TLCA 2009, LNCS 5608, pp. 386–400, 2009. © Springer-Verlag Berlin Heidelberg 2009
Complexity of G¨odel’s T in λ-Formulation
into the interpretation and to the necessity of dealing simultaneously with recursion, β-conversion, and the ξ-rule. A brief heuristic for our treatment of Tλ, which combines the techniques from [5] and [9], is given in the next section. For a recent and extensive exposition of the history of termination proofs for Gödel's T we refer the reader to Section 8.2 in [3]. In fact, [3] covers the history of λ-calculus and combinatory logic in general. Unlike the present paper, the majority of the termination proofs for Gödel's T mentioned in [3] do not yield non-trivial upper bounds on the lengths of reductions. An alternative approach for proving termination of Gödel's T which does yield non-trivial upper bounds on the lengths of reductions was suggested in [1]. There the lengths of derivations were classified by proof-theoretic investigations on head reduction trees. The current approach is more direct, and as a possible benefit for future work we expect the extraction of powerful (syntactic) termination orderings for higher-order reduction systems which generalize the recursive path ordering. We conjecture that such an ordering for Gödel's T will have the Howard–Bachmann ordinal as order type (in accordance with Cichon's suggestion in RTA open problem 23).
2 A Brief Heuristic for the Assignment

Howard proposed a (non-unique) ordinal assignment H from the set of terms into the ordinals less than ε0 such that if a term a rewrites in a certain restricted sense to a term b, and if H(a) is assigned to a, then we can effectively assign an ordinal H(b) to b such that H(a) > H(b). In this article we employ Howard's 2σ-operator in the modified version of [9] for dealing with application terms. For dealing with λ-abstraction terms we utilize a modification of Howard's δ-operator. The interplay of these two operators then allows for a treatment of β-conversion. Since the δ-operator is – as far as we can see – not strictly monotonically increasing, we follow Howard's idea of a non-unique assignment of ordinals to terms for dealing with the ξ-rule. The underlying idea is that for defining an ordinal value for an, the n-th member of a reduction sequence a1, . . . , an, one has to trace the history of λ-abstraction terms occurring in an back to the corresponding λ-abstraction terms occurring somewhere earlier in a1, . . . , an−1. For assigning natural numbers to terms one finally has to apply the collapsing function ψ : ε0 → ω from [9] to the resulting assignment. However, the approach so far yields neither strong normalization for Tλ nor a derivation lengths classification for Tλ. In [5] a reduction of R(t + 1)ab to at(Rtab) is only allowed in case that t is a numeral. For allowing recursor reductions where t is an arbitrary term of type 0 we have to modify Howard's assignment by using the ψ-function. As long as t is a closed term this works smoothly. Allowing arbitrary open terms t (whose variables may be bound by a context containing R(t + 1)ab) unfortunately causes enormous extra complications, since the treatment of recursion as in [9] does not fit the concept of Howard's class C, which is the domain of the δ-operator.
The conditions on ordinal vectors in C regulate the occurrences of variables and are required for the applicability of the δ-operator. The vectors assigned to terms of shape Rt in [9] do not meet these conditions. To overcome this difficulty we first introduce a refined version of Howard's class C that allows for a clean distinction between the different roles that variables occurring
G. Wilken and A. Weiermann
in the assignment have, namely merely being substitutable by certain terms versus additionally serving as marks for the rebuilding process carried out by the operator δ. Second, we develop our assignment in two steps as follows: In the first step we define a unique assignment [[·]] which treats β-conversion correctly but only restricted versions of the ξ-rule and of recursion, similarly to [5], Section 3. During the second step, when Howard's idea of a non-unique assignment comes into play, this assignment is applied to the abstraction terms occurring at initial positions of the reduction history of a given term; the non-uniqueness is essential in dealing with abstraction terms as well as with terms of the form Rt in the context of Tλ. The resulting assignment [·] then combines the powerful properties of Howard's original non-unique assignment with those of the assignment defined in [9]. We now explain the key features of our assignment in a more technical way. In order to distinguish the above-mentioned weak role of a variable assignment from the strong one, we introduce copies of the original variables (see the definition of OT) that are not subject to the conditions on C-vectors but can still appropriately be substituted by terms. This is of crucial importance since the substitution property (cf. [5], p. 454) is essential for dealing with β-conversion. That is, given vectors A and B assigned respectively to terms aτ and bσ, where bσ is substitutable for xσ in aτ, we obtain an appropriate vector assigned to aτ{xσ := bσ} by taking A{[xσ] := B}, where [xσ] is assigned to xσ. Consider a schematic reduction sequence of aτ with focus on a subterm λxσ.bρ0 (assuming λxσ.bρ0 to be closed for simplicity):
aτ ≡ . . . λxσ.bρ0 . . . →ξ . . . →ξ . . . (λxσ.bρn)cσ . . . ≡: dτ →β . . . bρn{xσ := cσ} . . .

The non-unique assignment works in a way that (by slight simplification)
[λxσ.bρi] = δ^{xσ}[[bρ0]] ⊕ [bρi]{[xσ] := 1}

is one of the vectors assigned to λxσ.bρi, where [bρ0] ⪰ . . . ⪰ [bρn], taking [bρ0] = [[bρ0]]{[[xσ]] := [xσ]}. The applications of (ξ) are now treated by the second summand, whereas the β-contraction is handled by

[(λxσ.bρn)cσ] ⪰ δ^{xσ}[[bρ0]] 2σ [cσ] ≻ [[bρ0]]{[[xσ]] := [cσ]} ⪰ [bρn]{[xσ] := [cσ]},

where the middle step is by Lemma 4.19 and the last step is by Definition 4.9.
Thus the first summand is kept until a β-contraction applies. The general definition of [·] is a little more involved, for the following two reasons: The first one is that when defining [·] in general we do not know the reduction history of, for example, the term dτ. This causes the non-uniqueness of [·]. The second reason is that (if we no longer assume λxσ.bρ0 to be closed) during the schematic reduction given above, β-contractions

. . . (λy.(. . . λxσ.bρi . . .))e . . . →β . . . (. . . λxσ.bρi . . .){y := e} . . .

where y ∈ FV(λxσ.bρi) may occur. We therefore have to allow a list of substitutions δ^{xσ}[[bρ0]]{[[z]] := Z | z ∈ FV(λxσ.bρ0)} in the first summand of [λxσ.bρi].
Complexity of G¨odel’s T in λ-Formulation
389
3 The Calculus λ-R

We give a short description of typed λ-calculus extended by recursors and case distinction functionals for each type. The theory Tλ is Gödel's T based on the typed λ-R-calculus. The set TP of finite types is defined inductively, containing the type 0 and the type στ whenever σ, τ ∈ TP. Type 0 is intended to be the type of IN, whereas type στ for given finite types σ and τ is meant to be the type of functions f : σ → τ. The level of a finite type is defined recursively by g0 := 0 and g(στ) := max{gσ + 1, gτ}. The set T of terms is defined inductively, containing countably infinitely many variables xσ of type σ for each type σ ∈ TP, constants 0 and S of type 0, as well as Dτ of type 0τττ and Rτ of type 0(0ττ)ττ for each τ ∈ TP, and being closed under application (that is, ab ∈ T of type τ whenever a ∈ T is of type στ and b ∈ T is of type σ) and abstraction (that is, λxρ.b ∈ T of type ρσ whenever b ∈ T is of type σ and λxρ does not occur in b). The set of variables in T is denoted by V. If a ∈ T is of type τ we sometimes communicate this by writing aτ; conversely, instead of writing xσ we sometimes simply write x for a variable in V. For a ∈ T the set BV(a) of bound variables of a consists of all variables x for which λx occurs in a, whereas the set FV(a) of free variables of a consists of all variables x occurring in a outside the scope of λx. The equality relation is the reflexive, symmetric and transitive closure of the one-step reduction →, defined as the least binary relation on terms such that

(D0) D0ab → a        (DS) D(St)ab → b
(R0) R0ab → b        (RS) R(St)ab → at(Rtab)
(Appr) a → b ⇒ ac → bc        (Appl) b → c ⇒ ab → ac
(β) (λx.a)b → a{x := b} where BV(λx.a) ∩ FV(b) = ∅
(ξ) a → b ⇒ λx.a → λx.b
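The reduction rules above can be animated on closed terms by a small interpreter. The following Python sketch is an illustration only: it keeps R and D as n-ary node tags rather than curried constants, uses naive substitution (adequate for the closed combinators used below, so the variable condition of (β) is not an issue), and reduces leftmost-outermost.

```python
# Terms: ('var', x) | ('zero',) | ('suc', t) | ('app', a, b)
#        ('lam', x, body) | ('R', t, a, b) | ('D', t, a, b)

def subst(t, x, s):
    """Naive substitution t{x := s}; safe for the closed terms used here."""
    tag = t[0]
    if tag == 'var':
        return s if t[1] == x else t
    if tag == 'zero':
        return t
    if tag == 'suc':
        return ('suc', subst(t[1], x, s))
    if tag == 'app':
        return ('app', subst(t[1], x, s), subst(t[2], x, s))
    if tag == 'lam':
        return t if t[1] == x else ('lam', t[1], subst(t[2], x, s))
    if tag in ('R', 'D'):
        return (tag,) + tuple(subst(u, x, s) for u in t[1:])

def step(t):
    """One reduction step (leftmost-outermost); None on normal forms."""
    tag = t[0]
    if tag == 'D':
        n, a, b = t[1], t[2], t[3]
        if n == ('zero',): return a                     # (D0)
        if n[0] == 'suc':  return b                     # (DS)
    if tag == 'R':
        n, a, b = t[1], t[2], t[3]
        if n == ('zero',): return b                     # (R0)
        if n[0] == 'suc':                               # (RS)
            m = n[1]
            return ('app', ('app', a, m), ('R', m, a, b))
    if tag == 'app' and t[1][0] == 'lam':               # (β)
        return subst(t[1][2], t[1][1], t[2])
    # congruence rules (App_l), (App_r), (ξ), and stepping inside R/D:
    for i in range(1, len(t)):
        if isinstance(t[i], tuple):
            s = step(t[i])
            if s is not None:
                return t[:i] + (s,) + t[i + 1:]
    return None

def normalize(t):
    while (s := step(t)) is not None:
        t = s
    return t

def num(n):  # numeral S^n 0
    t = ('zero',)
    for _ in range(n):
        t = ('suc', t)
    return t

# addition 3 + 2 as R 3 (λx.λy. S y) 2: iterate the successor 3 times
succ_step = ('lam', 'x', ('lam', 'y', ('suc', ('var', 'y'))))
three_plus_two = ('R', num(3), succ_step, num(2))
print(normalize(three_plus_two))   # the numeral for 5
```

That `normalize` always reaches a normal form is exactly the strong normalization property established for Tλ in this paper.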
4 Ordinal Terms and Vectors

This section provides the theoretical framework for our assignment of ordinal vectors to terms of T. We develop an autonomous theory of ordinal terms and vectors and give an interpretation of closed ordinal terms as ordinals below ε0.

4.1 The Sets OT and C

Definition 4.1 (N, OT and ŌT). Let N be the set of numerals. The set OT of ordinal terms is defined inductively, containing 0, 1, ω and, for each variable xσ and each i ≤ gσ, two variables xσi and x̄σi, and being closed under (a, b) → a ⊕ b, (a, b) → 2^a ⊗ b, and (a, b, n) → ψ(ω ⊗ a ⊕ b + n), where a, b ∈ OT and n ∈ N, respectively. Let ŌT be the set of all h ∈ OT not containing any variable xσi.

Convention. 1. Let h, h0, . . . , hm ∈ OT and n ∈ IN. We sometimes write the result of successively adding ordinal terms on the right-hand side (returning 0 if the summation is empty) as Σ_{i=0}^{m} hi := h0 ⊕ . . . ⊕ hm and n · h := Σ_{i=0}^{n−1} h.
2. Vectors. We denote the set of all ordinal vectors by OTω, where we identify a vector A = A0, . . . , An ∈ OT^{<ω} with A0, . . . , An, 0, 0, 0, . . . ∈ OTω. This is, however, just a matter of convenience, since we only need vectors of finite length. Let the sets X and X̄ of variable vectors be defined by X := {(xσ0, . . . , xσgσ) | xσ ∈ V} and X̄ := {(x̄σ0, . . . , x̄σgσ) | xσ ∈ V}. We denote the vectors xσ0, . . . , xσgσ and x̄σ0, . . . , x̄σgσ by Xσ and X̄σ, respectively.
3. Sum of vectors. Let A and B be ordinal vectors and τ ∈ TP. The addition of B to A up to component gτ is defined by (A ⊕τ B)i := Ai ⊕ Bi if i ≤ gτ, and (A ⊕τ B)i := Ai otherwise.
4. Substitution. Let h ∈ OT, Xσ ∈ X and B ∈ OTω. Then h{Xσ := B} denotes h where all variables xσi occurring in h are replaced by Bi. In case that X̄σ ∈ X̄, substitution is defined analogously.

Definition 4.2 (lh, dp and Sub). We define the length of an OT-term, the maximal height of its exponential towers, and the set of its subterms recursively as follows:
• If h is a constant or a variable we set lh(h) := 1, dp(h) := 0, and Sub(h) := {h}.
• For h ∈ {f ⊕ g, ψ(ω ⊗ f ⊕ g + n)} let lh(h) := lh(f) + lh(g) + 1, dp(h) := max{dp(f), dp(g)}, and Sub(h) := {h} ∪ Sub(f) ∪ Sub(g).
• If h ≡ 2^f ⊗ g we set lh(h) := 2lh(f) + lh(g) + 1, dp(h) := max{1 + dp(f), dp(g)}, and Sub(h) := {h} ∪ Sub(f) ∪ Sub(g).

We now introduce our extended version of Howard's concept of C-sets, see [5].

Definition 4.3 (C-Sets). We define sets C0 ⊂ OT and Ci ⊂ OT for i ≥ 1 by simultaneous induction: C0 contains 0, 1, xσ0, x̄σ0, a ⊕ b whenever a, b ∈ C0, and ψ(ω ⊗ f ⊕ g + n) whenever f ∈ C1, g ∈ C0, and n ∈ N. For i ≥ 1, Ci contains all terms of ŌT, the variables xσi for every xσ with i ≤ gσ, a ⊕ b whenever a, b ∈ Ci, and 2^f ⊗ g whenever f ∈ Ci+1 and g ∈ Ci. The set C ⊂ OTω is defined by C := Π_{n<ω} Cn. We further define C̄j ⊂ OT for j ∈ IN and C̄ ⊂ OTω by C̄j := Cj ∩ ŌT and C̄ := C ∩ ŌTω.
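A minimal Python sketch of the vector conventions above, with natural numbers standing in for ordinal terms and ⊕ read as ordinary addition; all function names are illustrative. It shows the zero-padding identification of finite vectors, the componentwise sum ⊕τ up to level gτ, and the type-level function g from Section 3.

```python
def level(ty):
    """Type level: g0 = 0 and g(στ) = max(gσ + 1, gτ).
    Types are '0' or pairs (sigma, tau)."""
    if ty == '0':
        return 0
    sigma, tau = ty
    return max(level(sigma) + 1, level(tau))

def comp(A, i):
    """Component A_i of a vector identified with A_0, ..., A_n, 0, 0, ..."""
    return A[i] if i < len(A) else 0

def oplus_tau(A, B, g_tau):
    """(A ⊕τ B)_i = A_i ⊕ B_i for i ≤ gτ, and A_i otherwise."""
    n = max(len(A), g_tau + 1)
    return tuple(comp(A, i) + comp(B, i) if i <= g_tau else comp(A, i)
                 for i in range(n))

print(level(('0', '0')), level((('0', '0'), '0')))  # levels 1 and 2
print(oplus_tau((3, 1, 4), (2, 7, 1, 8), 1))        # adds components 0, 1 only
```

The cutoff at gτ matters because only the components up to the level of the relevant type participate in the vector addition used for the λ-clauses of Section 5.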
Note that C is closed under substitution of vectors from X by vectors from C as well as under substitution of vectors from X̄ by vectors from C̄.

4.2 Interpreting and Comparing Ordinal Terms

We interpret closed OT-terms as ordinals below ε0. For this purpose we interpret ⊕, ⊗ by the natural sum and the natural product of ordinals. 2^(·) is also interpreted by the corresponding ordinal function, i.e. 2^α := ω^{α0} · 2^n where α = ω · α0 + n and n < ω (cf. for example [7]). At last we define the function ψ : ε0 → ω interpreting ψ-terms of OT exactly as in [9]. For this purpose we need several notions from the theory of subrecursive functions, to the same extent as in [9]. An abstract exposition of the underlying concepts can be found in [2]. Let Φ be a sufficiently fast growing number-theoretic function, for example the function x → F5(x + 100), where F0(x) := 2x and F_{n+1}(x) := F_n^{x+1}(x). Let further the norm function no : ε0 → IN be defined by no(0) := 0 and no(α) := n + no(α1) + . . . + no(αn) if α = ω^{α1} + . . . + ω^{αn} with α1 ≥ . . . ≥ αn. Then ψ is defined recursively on ε0 by ψ(α) := max({0} ∪ {ψ(β) + 1 | β < α & no(β) ≤ Φ(no(α))}).
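The norm function no and the functions Fn can be made concrete for ordinals below ε0 given in Cantor normal form. The following Python sketch uses an illustrative coding that is not the paper's machinery (the collapsing function ψ is omitted, since Φ = F5(x + 100) makes computing it astronomically expensive); it checks in particular the additivity property no(α ⊕ β) = no(α) + no(β) of Proposition 4.4.

```python
import functools

# Ordinals below ε0 in Cantor normal form, coded as tuples of exponents:
# () is 0, and (a1, ..., an) codes ω^a1 + ... + ω^an with a1 ≥ ... ≥ an.
# The natural number k is a tuple of k copies of () (since ω^0 = 1).

def no(alpha):
    """no(0) = 0 and no(ω^a1 + ... + ω^an) = n + no(a1) + ... + no(an)."""
    return len(alpha) + sum(no(a) for a in alpha)

def lt(a, b):
    """Ordinal comparison a < b on CNF codes."""
    if b == ():
        return False
    if a == ():
        return True
    if a[0] != b[0]:
        return lt(a[0], b[0])
    return lt(a[1:], b[1:])

def nsum(alpha, beta):
    """Natural (Hessenberg) sum ⊕: merge and re-sort the exponent lists."""
    cmp = lambda x, y: -1 if lt(x, y) else (1 if lt(y, x) else 0)
    return tuple(sorted(alpha + beta, key=functools.cmp_to_key(cmp), reverse=True))

def nat(k):          # the natural number k
    return ((),) * k

def omega_pow(a):    # ω^a
    return (a,)

def F(n, x):
    """F_0(x) = 2x and F_{n+1}(x) = F_n^{x+1}(x)."""
    if n == 0:
        return 2 * x
    for _ in range(x + 1):
        x = F(n - 1, x)
    return x

alpha, beta = omega_pow(nat(2)), nsum(nat(3), omega_pow(nat(1)))
print(no(alpha), no(beta), no(nsum(alpha, beta)))  # additivity of no under ⊕
print(F(0, 5), F(1, 3))                            # 10 and 48
```

Already F2 grows iterated-exponentially, which is why the bounds in Lemmata 4.12 and 4.13 are phrased in terms of F2 and F3.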
Complexity of G¨odel’s T in λ-Formulation
391
As mentioned in [9], this definition can be carried out in PRA + PRWO(ε0). We now state two basic propositions concerning the norm and the ψ-function (cf. [9]):

Proposition 4.4. Let α and β be ordinals less than ε0. Then we have
1. no(α ⊕ β) = no(α) + no(β).
2. no(α) + no(β) ∸ 1 ≤ no(α ⊗ β) ≤ no(α) · no(β) if α ≠ 0 ≠ β.
3. no(α) ≤ 2 · no(2^α) and no(2^α) ≤ 2^{no(α)}.
Proposition 4.5. Let k ∈ IN and ordinals α, β < ε0 be given. Then we have
1. k = ψ(k), k ≤ ψ(α + k), no(α) ≤ ψ(α), and ψ(β) + 1 ≤ ψ(β + 1).
2. ψ(α ⊕ ψ(β)) ≤ ψ(α ⊕ β).
3. α < β & no(α) ≤ Φ(no(β)) ⇒ ψ(α) < ψ(β).
4. α ≥ ω ⇒ Φ(no(α)) ≤ ψ(α).
For the comparison of arbitrary OT-terms and their norms we introduce the notion of an admissible substitution. Propositions 4.4 and 4.5 will then generalize.

Definition 4.6. We call a map ∗ : OT → OT an admissible substitution if for every h ∈ OT the term h∗ results from h by substituting each vector Xσ ∈ X by a vector A ∈ C satisfying no(Ai) ⪯ no(A0) for all i ≤ gσ and 0 ≺ no(Ai) for all i < gσ, and each vector X̄σ ∈ X̄ by a vector B ∈ C̄ satisfying the analogous conditions.

Definition 4.7. For g, h ∈ OT the relation no(g) ≺ no(h) holds if and only if no(g∗) < no(h∗) for every closed admissible substitution ∗. Analogously we define equality and the relation no(g) ⪯ no(h) between norms of OT-terms g and h.

Definition 4.8. For g, h ∈ OT the relation g ≺ h holds if and only if g∗ < h∗ for every closed admissible substitution ∗. Analogously we define the relation ⪯ as well as (semantic) equality of OT-terms.

Definition 4.9. For A, B ∈ C the relation A ≺ B holds if and only if A0 ≺ B0 and Ai ⪯ Bi for all i > 0. For σ ∈ TP the relation A ≺σ B holds if and only if A0 ≺ B0 and Ai ⪯ Bi for all 0 < i ≤ gσ. (Semantic) equality as well as the relation ⪯ are defined componentwise. The relations =σ and ⪯σ are defined componentwise up to level gσ.

4.3 The 2σ-Operator

In this subsection we define the operator 2σ and note its basic properties. The definition is due to Howard [5] and was modified by Schütte [7] for vector levels greater than gσ and by Weiermann [9] for vector level zero. Note that C is closed under 2σ.

Definition 4.10 (2σ : OTω × OTω → OTω). We define
(A 2σ B)i := ψ(ω ⊗ (A 2σ B)1 ⊕ A0 ⊕ B0 + gσ) if i = 0,
(A 2σ B)i := 2^{(A 2σ B)i+1} ⊗ (Ai ⊕ Bi) if 1 ≤ i ≤ gσ,
(A 2σ B)i := Ai if i > gσ.
Lemma 4.11. Let A, B, C, D ∈ OTω and σ ∈ TP.
1. Ai ⪯ (A 2σ B)i for all i, and Ai ⊕ Bi ⪯ (A 2σ B)i for all i ≤ gσ.
2. Let k ∈ {1, . . . , gσ + 1}.
(a) Ai ⪯ Bi for all i ∈ {k, . . . , gσ + 1} implies (A 2σ C)k ⪯ (B 2σ C)k. If additionally Aj ≺ Bj and Bi ⊕ Ci ≻ 0 for all i ∈ {k, . . . , j} hold for some j with k ≤ j ≤ gσ + 1, we even have (A 2σ C)k ≺ (B 2σ C)k.
(b) Ai ⪯ Bi for all i ∈ {k, . . . , gσ} implies (C 2σ A)k ⪯ (C 2σ B)k. If additionally Aj ≺ Bj and Ci ⊕ Bi ≻ 0 for all i ∈ {k, . . . , j} hold for some j with k ≤ j ≤ gσ, we even have (C 2σ A)k ≺ (C 2σ B)k.
3. If Ai, Bi ≻ 0 and Ai ⊕ Bi ⪯ Ci for all 1 ≤ i ≤ gσ + 1, then for all 1 ≤ i ≤ gσ we have (A 2σ D)i ⊕ (B 2σ D)i ≺ (C 2σ D)i.
4. If we sharpen the assumptions of 3 to 2 · A_{gσ+1} ⊕ B_{gσ+1} ≺ C_{gσ+1}, then we even have 2 · ((A 2σ D) 2σ (B 2σ D))i ≺ (C 2σ D)i for all 1 ≤ i ≤ gσ + 1.
Proof. See [5], p. 450.
Lemma 4.12. Let a, b, c, d be understood as norms of OT-terms, let ρ, σ, φ ∈ TP, and let A, B, C, D ∈ OTω with no(Ai) ⪯ a, no(Bi) ⪯ b, no(Ci) ⪯ c, and no(Di) ⪯ d for i > 0. Then no((A 2σ B)i) ⪯ F2(a + b + gσ) and no(((A 2ρ B) 2σ (C 2φ D))i) ⪯ F3(a + b + c + d + gρ + 2gσ + gφ) for all i > 0.

Proof. The proof is essentially the same as in [9].
Lemma 4.13. Let A, B, C, D ∈ OTω.
1. Let σ, τ ∈ TP where gσ < gτ.
(a) If A ≺τ B, no(Ai) ⪯ no(B0) for all i ≤ gσ + 1, and no(Ci) ⪯ no(C0) for all 1 ≤ i ≤ gσ, then A 2σ C ≺τ B 2σ C. The assertion holds true with ⪯τ instead of ≺τ.
(b) If A ≺σ B, no(Ai) ⪯ no(B0) for all i ≤ gσ, and no(Ci) ⪯ no(C0) for all 1 ≤ i ≤ gσ + 1, then C 2σ A ≺τ C 2σ B. This holds true with ⪯σ instead of ≺σ and ⪯τ instead of ≺τ.
2. From the assumptions no(Ai) ≻ 0 for all 1 ≤ i ≤ gσ and no(Ai) ⪯ no(Bi) for all 1 ≤ i ≤ gσ + 1 we obtain no((A 2σ C)i) ≺ F3(no((B 2σ C)i) + gσ) for all 1 ≤ i ≤ gσ + 1.
3. Under the assumptions D0 ≺ ω; Ai, Bi ≻ 0, Ai ⊕ Bi ⪯ Ci and no(Ai), no(Bi) ⪯ no(Ci) for all i ≤ gσ + 1; and A1 ⊕ B1 ≺ C1 in case that gσ = 0, we have (A 2σ D)i ⊕ (B 2σ D)i ≺ (C 2σ D)i for all i ≤ gσ.

Proof. Ad 1: This is easily shown using Lemmata 4.11 and 4.12 and Proposition 4.5.
Ad 2: We set Ni := Σ_{j=i}^{gσ+1} no(Aj), Mi := Σ_{j=i}^{gσ+1} no(Bj), and Li := Σ_{j=i}^{gσ} no(Cj) for i > 0. The proof of Lemma 4.12 yields for 0 < i ≤ gσ + 1

no((A 2σ C)i) ⪯ F2(Ni + Li + gσ).   (1)
Since the case i = gσ + 1 is trivial, we assume in the following that 1 ≤ i ≤ gσ. Because of no(Bi) ≻ 0 we then have no(2^{(B 2σ C)i+1}) ⪯ no((B 2σ C)i) and therefore no((B 2σ C)i+1) ⪯ 2 · no((B 2σ C)i) by Proposition 4.4. From this we obtain for
Complexity of G¨odel’s T in λ-Formulation
393
every j ∈ {i, . . . , gσ + 1} the estimate no((B 2σ C)j) ⪯ 2^{gσ} · no((B 2σ C)i). Since no(Bj) ⪯ no((B 2σ C)j) for every j and no(Cj) ⪯ no((B 2σ C)j) for every 1 ≤ j ≤ gσ, we get

Mi + Li ⪯ (gσ + 1) · 2^{gσ+1} · no((B 2σ C)i).   (2)
(1) and (2) together with Ni ⪯ Mi yield no((A 2σ C)i) ≺ F3(no((B 2σ C)i) + gσ).
Ad 3: For 1 ≤ i ≤ gσ the claim follows by Lemma 4.11 (3). Consider the case i = 0:

(A 2σ D)0 ⊕ (B 2σ D)0 = ψ(ω ⊗ (A 2σ D)1 ⊕ A0 ⊕ D0 + gσ) ⊕ ψ(ω ⊗ (B 2σ D)1 ⊕ B0 ⊕ D0 + gσ)
⪯ ψ(ω ⊗ ((A 2σ D)1 ⊕ (B 2σ D)1) ⊕ A0 ⊕ B0 ⊕ 2 · D0 + 2gσ)
≺ ψ(ω ⊗ (C 2σ D)1 ⊕ C0 ⊕ D0 + gσ) = (C 2σ D)0.

The ⪯-relationship follows by Proposition 4.5 (1) and (2). The ≺-relationship is shown using Proposition 4.5 (3) and the second part of this lemma.
4.4 The δ^{xσ}-Operator

In this subsection we introduce our refinement of Howard's δ-operator, see [5].

Definition 4.14 (δ^{xσ} : C → C). Let D ∈ C be given. We define
(δ^{xσ}D)i := L^{xσ}(D) · (δ^{xσ}_0 D0)0 if i = 0,
(δ^{xσ}D)i := Σ_{j=0}^{i} (δ^{xσ}_j Dj)i if 0 < i ≤ gσ + 1,
(δ^{xσ}D)i := Di if i > gσ + 1,
where L^{xσ} will be defined below and the δ^{xσ}_i : Ci → C, 0 ≤ i ≤ gσ + 1, are defined recursively as follows. Components not explicitly given are defined to be 0. Let h ∈ Ci.
• If xσk ∉ Sub(h) for all k, then (δ^{xσ}_i h)j := h + 1 if j = i ≤ gσ + 1, and (δ^{xσ}_i h)j := 1 if j ≤ gσ + 1 with j ≠ i.
• If xσk ∈ Sub(h) for some k, and
– h ≡ xσi, then (δ^{xσ}_i h)j := 1 for all j ≤ gσ + 1.
– h ≡ f ⊕ g where f, g ∈ Ci, then (δ^{xσ}_i h)j := (δ^{xσ}_i f)j ⊕ (δ^{xσ}_i g)j + 1 if j = 1, and (δ^{xσ}_i h)j := (δ^{xσ}_i f)j ⊕ (δ^{xσ}_i g)j if 1 ≠ j ≤ gσ + 1.
– h ≡ 2^f ⊗ g where f ∈ Ci+1, g ∈ Ci, and i > 0, then (δ^{xσ}_i h)j := 2 · (δ^{xσ}_{i+1} f)j ⊕ (δ^{xσ}_i g)j + 1 if j = gσ + 1, and (δ^{xσ}_i h)j := (δ^{xσ}_{i+1} f)j ⊕ (δ^{xσ}_i g)j if j ≤ gσ.
– h ≡ ψ(ω ⊗ f ⊕ g + n) where f ∈ C1, g ∈ C0 and i = 0, then (δ^{xσ}_i h)j := ψ(ω ⊗ f{Xσ := 1} ⊕ (δ^{xσ}_0 g)0 + n) if j = 0, and (δ^{xσ}_i h)j := (δ^{xσ}_1 f)j ⊕ (δ^{xσ}_0 g)j if 1 ≤ j ≤ gσ + 1.
The norm controlling function L^{xσ} : C → IN is defined by L^{xσ}(D) := L^{xσ}_1(D) · L^{xσ}_2(D), where L^{xσ}_1(D) := Σ_{j=0}^{gσ+1} lh^x(Dj), L^{xσ}_2(D) := max{2^{dp^x(Dj)} | j ≤ gσ + 1} + 1, and (writing x instead of xσ for simplicity) lh^x, dp^x : OT → IN are defined recursively as follows. Let h ∈ OT.
• If xσk ∉ Sub(h) for all k, then lh^x(h) := 1 and dp^x(h) := 0.
• If xσk ∈ Sub(h) for some k, and
– h ≡ xσj, then lh^x(h) := 1 and dp^x(h) := 0.
– h ∈ {f ⊕ g, ψ(ω ⊗ f ⊕ g + n)}, then lh^x(h) := lh^x(f) + lh^x(g) + 1 and dp^x(h) := max{dp^x(f), dp^x(g)}.
– h ≡ 2^f ⊗ g, then lh^x(h) := 2lh^x(f) + lh^x(g) + 1 and dp^x(h) := max{1 + dp^x(f), dp^x(g)}.
Lemma 4.15 (Substitution property of δ^{xσ}). Let B, D ∈ C and xσ, yρ be such that yρ ≢ xσ and none of the variables xσj occurs in B. Then (δ^{xσ}D){Yρ := B} ≡ δ^{xσ}(D{Yρ := B}). If B ∈ C̄, then for every z̄ρ we have (δ^{xσ}D){Z̄ρ := B} ≡ δ^{xσ}(D{Z̄ρ := B}).

Proof. It is easy to verify the substitution property first for lh^x, dp^x, L^{xσ} and the δ^{xσ}_i; the substitution property for δ^{xσ} then follows.

The following definition together with the next two lemmata will be needed for the proofs of Lemma 5.5 and Corollary 6.3.

Definition 4.16. The set T^x_{ji}(h) of maximal x-free subterms of i-th level of h ∈ Cj and the set Sub^x_{ji}(h) of subterms different from x of i-th level of h, where x-free subterms are considered atomic, are defined recursively over the build-up of h, using the abbreviation {h}i,j := {h | i = j}:
• If xσk ∉ Sub(h) for all k, then T^x_{ji}(h) := {h}i,j =: Sub^x_{ji}(h).
• If xσk ∈ Sub(h) for some k, and
– h ≡ xσj, then T^x_{ji}(h) := ∅ =: Sub^x_{ji}(h).
– h ≡ f ⊕ g, then T^x_{ji}(h) := T^x_{ji}(f) ∪ T^x_{ji}(g) and Sub^x_{ji}(h) := {h}i,j ∪ Sub^x_{ji}(f) ∪ Sub^x_{ji}(g).
– h ∈ {2^f ⊗ g, ψ(ω ⊗ f ⊕ g + n)}, then T^x_{ji}(h) := T^x_{j+1,i}(f) ∪ T^x_{ji}(g) and Sub^x_{ji}(h) := {h}i,j ∪ Sub^x_{j+1,i}(f) ∪ Sub^x_{ji}(g).
We further define T^x_j(h) := ⋃_{i≥j} T^x_{ji}(h) and Sub^x_j(h) := ⋃_{i≥j} Sub^x_{ji}(h). Let Ψ^x(h) be the set of all subterms of shape ψ(ω ⊗ f ⊕ g + n) of h ∈ C0 that contain some variable xσk.
Lemma 4.17. Let h ∈ Cj, with j = 0 in assertion 3. Then the following assertions hold:
1. 2^f ⊗ g ∈ Sub((δ^{xσ}_j h)i) implies 2^f ⊗ g ∈ Sub(h) ∪ Sub(h{Xσ := 1}). If additionally i > 0 we even have 2^f ⊗ g ∈ Sub[T^x_{ji}(h)].
2. Let ψ(ω ⊗ f ⊕ g + n) ∈ Sub((δ^{xσ}_j h)i). Then either ψ(ω ⊗ f ⊕ g + n) is a member of Sub(h) ∪ Sub(h{Xσ := 1}), or there are terms f′ and g′ with f ≡ f′{Xσ := 1} and g ≡ (δ^{xσ}_0 g′)0 such that ψ(ω ⊗ f′ ⊕ g′ + n) belongs to Sub(h). If additionally i > 0 we even have ψ(ω ⊗ f′ ⊕ g′ + n) ∈ Sub[T^x_{ji}(h)].
3. ψ(ω ⊗ f ⊕ g + n) ∈ Ψ^y((δ^{xσ}_0 h)0) implies either ψ(ω ⊗ f ⊕ g + n) ∈ Ψ^y(h) or ψ(ω ⊗ f′ ⊕ g′ + n) ∈ Ψ^y(h), where f ≡ f′{Xσ := 1} and g ≡ (δ^{xσ}_0 g′)0.

Proof. Assertions 1 and 2 are proved easily by induction on the definition of δ^{xσ}_j, and 3 is proved by induction on the definition of δ^{xσ}_0.
Complexity of G¨odel’s T in λ-Formulation
395
σ
Lemma 4.18. 1. ∀h ∈ C0 h{X σ := 1} ( δ0x h)0 . 2. Let h ∈ Ci , j > 0 and assume ∀t ∈ Tijx (h) no(t) ≺ m for some m ∈ C0 with σ m 0. Then we have no(( δix h)j ) lhx (h) · m. x 3. Let h ∈ Ci and assume ∀2f ⊗ g ∈ Subxi (h) g 0. Then no(t) 2dp (h) · no(h) for all t ∈ Subxi (h). σ 4. lh(( δix h)j ) < 4 · lh(h) for h ∈ Ci . σ 5. Let h ∈ Ci , j > 0, 0 < α < ε0 s.t. ∀t ∈ Tijx (h) t < α. Then ( δix h)j ≤ lhx (h) · α. x (h) and for 6. Let h ∈ C0 , 0 < m < ω, and 0 < α < ε0 such that t < m for all t ∈ T00 x all f, g, n with ψ(ω ⊗ f ⊕ g + n) ∈ Ψ (h) f < α, n < m, and no(f ) F2 (g + n). σ Then ( δ0x h)0 < ψ(ω ⊗ lhx (h) · α ⊕ lhx (h) · m). Proof. Assertion 3 is proved by straightforward induction on t ∈ Subxi (h). The reσ maining assertions are proved straightforwardly by induction on δix . The following lemma addresses the treatment of β-contraction and shows the interplay between the operators 2 and δ. Lemma 4.19. Let B, D ∈ C be vectors satisfying no(f ) F2 (g + n) for any f, g, n such that ψ(ω⊗f ⊕g+n) ∈ Ψ x (D0 ) as well as ∀i no(Bi ) B0 &∀i < gσ 0 ≺ no(Bi ), σ i.e. B is an admissible substitution for X σ . Then we have δ x D 2σ B D{X σ := B}. Proof. We first show that the lemma follows from the following Claim. Let h ∈ Ci where i ≤ gσ + 1. If i = 0 let further no(f ) F2 (g + n) for all σ f, g, n s.t. ψ(ω ⊗ f ⊕ g + n) ∈ Ψ x (h). Then we have ( δix h 2σ B)i h{X σ := B}. We distinguish between the following three cases: σ Case 1: i > gσ + 1. Then clearly ( δ x D 2σ B)i = Di ≡ Di {X σ := B}. σ σ Case 2: 1 ≤ i ≤ gσ + 1. For all j ∈ {i, . . . , gσ + 1} holds ( δ x D)j ( δix Di )j . Thus σ σ by Lemma 4.11 (2) we have ( δ x D 2σ B)i ( δix Di 2σ B)i and the Claim yields σ ( δix Di 2σ B)i Di {X σ := B}. Case 3: i = 0. Here the Claim applies since we have σ
σ
σ
( δ x D 2σ B)0 = ψ(ω ⊗ ( δ x D 2σ B)1 ⊕ ( δ x D)0 ⊕ B0 + gσ) σ
σ
σ
ψ(ω ⊗ ( δ0x D0 2σ B)1 ⊕ ( δ0x D0 )0 ⊕ B0 + gσ) = ( δ0x D0 2σ B)0 by Proposition 4.5 (3) whose assumptions are easily checked: For all j ≤ gσ + 1 we σ σ σ σ have ( δ x D)j ( δ0x D0 )j . Lemma 4.11 (2) yields ( δ x D 2σ B)1 ( δ0x D0 2σ B)1 . σ σ By Lemma 4.13 (2) no(( δ0x D0 2σ B)1 ) F3 (no(( δ x D 2σ B)1 ) + gσ). σ We now prove the Claim by induction on the definition of δix . Assume first that σ σ no variable xσk occurs in h. Then clearly ( δix h 2σ B)i ( δix h)i = h + 1 h ≡ h{X σ := B}. If otherwise ∃k xσk ∈ Sub(h), whence i ≤ gσ must hold, we distinguish between the following four cases: Case 1: h ≡ xσi . (1 2σ B)i Bi + 1 Bi ≡ h{X σ := B}. Case 2: h ≡ f ⊕ g. Then the i.h. applies after an application of Lemma 4.13 (3). Case 3: h ≡ 2f ⊗ g. Then i ≥ 1 and the i.h. applies after using Lemma 4.11 (4). Case 4: h ≡ ψ(ω ⊗ f ⊕ g + n). Then we have i = 0 and we obtain h{X σ := B} = ψ(ω ⊗ f {X σ := B} ⊕ g{X σ := B} + n)
396
G. Wilken and A. Weiermann σ
σ
≺ ψ(ω ⊗ ( δ1x f 2σ B)1 ⊕ ( δ0x g 2σ B)0 + n) σ
(see below)
σ
σ
= ψ(ω ⊗ ( δ1x f 2σ B)1 ⊕ ψ(ω ⊗ ( δ0x g 2σ B)1 ⊕ ( δ0x g)0 ⊕ B0 + gσ) + n) σ
σ
σ
ψ(ω ⊗ (( δ1x f 2σ B)1 ⊕ ( δ0x g 2σ B)1 ) ⊕ ( δ0x g)0 ⊕ B0 + gσ + n) σ σ ψ(ω ⊗ ( δ0x h 2σ B)1 ⊕ ( δ0x h)0 ⊕ B0 + gσ) (see below) σ
= ( δ0x h 2σ B)0 .
The strict inequality follows from the i.h., using that – since B is an admissible substitution for X σ – by assumption no(f {X σ := B}) F2 (g{X σ := B} + n). The last inequality is easily verified using Lemmata 4.11 (3) and 4.13 (2).
5 Assignment of Ordinal Vectors to Terms Definition 5.1 (Recursive Definition of [[·]] : T → C). [[0]] := 0,
[[S]]0 := 1,
[[xσ ]]i := xσi if i ≤ gσ,
⎧ ⎨ 2 if i = 0 [[Rρ ]]i := 1 if 1 ≤ i ≤ gρ + 1, ⎩ ω if i = gρ + 2
[[Dτ ]]i := 1 if i ≤ gτ + 1,
[[aστ bσ ]]i := ([[aστ ]] 2σ [[bσ ]])i if i ≤ gτ,
σ
[[λxσ.aτ ]] := δ x [[aτ ]] ⊕τ [[aτ ]]{[[xσ ]] := 1}. Components which are not explicitly given above are defined to be 0. Definition 5.2 (The Non-Unique Assignment [·] [0] := [[0]],
[S] := [[S]],
[Dτ ] := [[Dτ ]],
⎧ ⎨ [t]0 + 2 if i = 0 1 if 1 ≤ i ≤ gρ + 1, [Rρ t0 ]i := ⎩ [t]0 + 2 if i = gρ + 2
⊂
T × C ).
[Rτ ] := [[Rτ ]],
[xσ ]i := xσi if i ≤ gσ,
[aστ bσ ]i := ([aστ ] 2σ [bσ ])i if i ≤ gτ,
σ
[λxσ.aτ ] := ( δ x [[dτ ]]){[[y ρ ]] := [yρ ] | y ρ ∈ FV(λxσ.dτ )} ⊕τ [aτ ]{[xσ ] := 1}, where dτ ∈ T and y ∈ T is substitutable for y in λxσ.dτ for every y ∈ FV(λxσ.dτ ) as well as dτ {y := y | y ∈ FV(λxσ.dτ )} n aτ for some n < ω. Components which are not explicitly given above are defined to be 0. [·] is well-defined, shown by induction along the relation t defined below. For every strict subterm a of a term b we have t(a) < t(b) whenever t(b) was defined via t(a). Definition 5.3 (Howard [5]). We define a relation t ⊂ T × IN as follows: t(aτ ) := 1 if aτ is a constant or a variable t(aστ bσ ) := t(aστ ) + t(bσ ) t(λxσ.aτ ) := 1 + t(d0 ) + . . . + t(dn ) where d0 1 . . . 1 dn ≡ aτ and n < ω.
Complexity of G¨odel’s T in λ-Formulation
397
Lemma 5.4 (Substitution property of [·]). Let bρ , cτ ∈ T be terms satisfying the condition BV(cτ ) ∩ FV(bρ ) = ∅. Let further [cτ ] be assigned to cτ and [bρ ] be assigned to bρ . Then [cτ ] { [z ρ ] := [bρ ] } is assigned to cτ {z ρ := bρ }. Proof. By induction on t(cτ ) using Lemma 4.15.
The lemma below shows that the norms of the components of a vector A assigned to some term are majorized by its first component A0 . Lemma 5.5. 1. Let dτ ∈ T be given. Set di := [[d]]i . For every j ∈ IN the following statements hold: (a) ∀i < gτ di 0. (b) ∀2f ⊗ g ∈ Sub(dj ) g 0. (c) no(dj ) d0 . σ σ (d) ∀t ∈ Tjx (dj ) no(t) ≺ Lx2 ([[d]]) · ( δ0x d0 )0 . (e) ∀ψ(ω ⊗ f ⊕ g + n) ∈ Sub(dj ) no(f ) F2 (g + n). 2. (a) Let aστ , bσ ∈ T and i > 0. Then no([[aστ bσ ]]i ) F2 (a0 + b0 + gσ). (b) Let aσϕτ , bσ , cρϕ , dρ ∈ T and i > 0. Then no([[ab(cd)]]i ) F3 (a0 + b0 + c0 + d0 + gσ + 2gϕ + gρ). 3. The statements (a), (b), (c) and (e) of part (1) hold for all [d] assigned to dτ ∈ T . 4. The statements of part (2) also hold for the non-unique assignment [·]. Proof. The proof is carried out by the following steps: First prove part (1) by simultaneous induction on the structure of d using Lemmata 4.12, 4.17 and 4.18. Then conclude part (2) by property (c) of part (1) and Lemma 4.12. Part (3) is established by induction on t(d) using part (1). Finally we conclude part (4) from part (3) in the same way we concluded (2) from (1). The details are now straightforward.
6 Proof of the Main Theorem and Final Results Let G(aτ ) denote the maximum type level of subterms of aτ ∈ T , RG(aτ ) denote the maximum type level of recursors occurring in aτ , and dp(aτ ) be defined recursively by the following clauses: Constants and variables have depth 1, abstraction terms λxσ.aτ have depth dp(aτ ) + 1 and terms aστ bσ have depth dp(aστ ) + dp(bσ ). Theorem 6.1. Assume aσ bσ and let [a] be a vector assigned to aσ . Then we can constructively assign a vector [b] to b such that [a] [b]. Proof. The proof is by induction on t(a). The cases (D0 ), (DS ) and (R0 ) are trivial. (RS ) Rρ (St)ab at(Rρ tab) : Let s be of type 0 with given assignment [s] and ⎧ ⎨ [s]0 + 2 if i = 0 [Rρ s]i = 1 if 1 ≤ i ≤ gρ + 1 ⎩ [s]0 + 2 if i = gρ + 2 where components not shown are 0. Since [Rρ s] ≺ [Rρ ] 20 [s] by Lemma 4.13 (1) for s ≡ St we have [Rρ (St)] 20ρρ [a] 2ρ [b] ≺ [Rρ ] 20 [St] 20ρρ [a] 2ρ [b]. We may
398
G. Wilken and A. Weiermann
therefore assume that [Rρ (St)ab] is given by [Rρ (St)], [a] and [b]. At first we show for all i ∈ {1, . . . , gρ + 1}, using Lemma 4.11 (4), 2 · ((A 2ρ [b]) 2ρ ([Rρ ta] 2ρ [b]))i ≺ ([Rρ (St)a] 2ρ [b])i
(1)
where A is defined by Ai := [at]i if i ≤ gρ, Ai := [a]i + 1 if i = gρ + 1, and setting all other components to 0. In order to prove (1) we verify the assumptions of Lemma 4.11 (4). By Lemma 5.5 we clearly have 0 ≺ [at]i , [Rρ ta]i for every i ≤ gρ whereas for i = gρ + 1 we have 0 ≺ 2[t]0 +2 ⊗ (1 ⊕ [a]i ) = [Rρ ta]i as well as 2 · Ai ⊕ [Rρ ta]i = 2 · ([a]i + 1) ⊕ 2[t]0 +2 ⊗ (1 ⊕ [a]i ) ≺ 2[t]0 +3 ⊗ (1 ⊕ [a]i ) = [Rρ (St)a]i .
(2)
For 1 ≤ i ≤ gρ we have [Rρ ta]i ≺ [Rρ (St)a]i by Lemma 4.11 (2). From this and (2) we obtain for 1 ≤ i ≤ gρ Ai ⊕ [Rρ ta]i = [a]i ⊕ 2[Rρ ta]i+1 ⊗ (1 ⊕ [a]i ) ≺ 2[Rρ (St)a]i+1 ⊗ (1 ⊕ [a]i ) = [Rρ (St)a]i .
(3)
Lemma 4.11 (4) now yields (1), and since [at] ≺ A 2ρ [b] it follows from (1), using Lemma 4.11 (2), that ∀i ∈ {1, . . . , gρ} 2 · [at(Rρ tab)]i ≺ [Rρ (St)ab]i . In case that gρ > 0, from this we conclude, using Lemma 4.11 (1), [at(Rρ tab)]1 ⊕ a1 ⊕ [Rρ tab]1 ≺ [Rρ (St)ab]1 .
(4)
In order to prove [at(Rρ tab)]0 ≺ [Rρ (St)ab]0 we make use of (2), (4), and of [Rρ ta]0 + a0 + t0 = ψ(ω ⊗ [Rρ ta]1 ⊕ t0 + a0 + gρ + 3) + a0 + t0 ψ(ω ⊗ [Rρ ta]1 ⊕ 2t0 + 2a0 + gρ + 3) ≺ ψ(ω ⊗ [Rρ (St)a]1 ⊕ t0 + a0 + gρ + 4) = [Rρ (St)a]0
(5)
where the strict inequality follows from Proposition 4.5 (3) since Lemma 5.5 (4) yields no([Rρ ta]1 ) F2 (t0 + 2 + a0 + gρ + 1) and the relation [Rρ ta]1 ≺ [Rρ (St)a]1 has already been shown. We are now able to estimate the term [at(Rρ tab)]0 . Since the case gρ = 0 is easier to deal with we assume gρ > 0. [at(Rρ tab)]0 = ψ(ω ⊗ [at(Rρ tab)]1 ⊕ [at]0 ⊕ [Rρ tab]0 + gρ) = ψ(ω ⊗ [at(Rρ tab)]1 ⊕ ψ(ω ⊗ [at]1 ⊕ a0 ⊕ t0 ) ⊕ ψ(ω ⊗ [Rρ tab]1 ⊕ [Rρ ta]0 ⊕ b0 + gρ) + gρ) ψ(ω ⊗ ([at(Rρ tab)]1 ⊕ [at]1 ⊕ [Rρ tab]1 ) ⊕ [Rρ ta]0 ⊕ a0 ⊕ t0 ⊕ b0 + 2gρ) ≺ ψ(ω ⊗ [Rρ (St)ab]1 ⊕ [Rρ (St)a]0 ⊕ b0 + gρ) = [Rρ (St)ab]0 where the strict inequality again follows using Proposition 4.5 (3) since by (4) we have [at(Rρ tab)]1 ⊕ [at]1 ⊕ [Rρ tab]1 ≺ [Rρ (St)ab]1 and using (5) and Lemma 5.5 (4) we
Complexity of G¨odel’s T in λ-Formulation
399
obtain no([at(Rρ tab)]1 ⊕ [at]1 ⊕ [Rρ tab]1 ) F3 (a0 + t0 + [Rρ ta]0 + b0 + 3gρ) + F2 (a0 + t0 ) + F2 ([Rρ ta]0 + b0 + gρ) ≺ Φ([Rρ (St)a]0 + b0 + gρ). (Appr ) a b ⇒ ac bc : Straightforward. (Appl ) b c ⇒ ab ac : Straightforward. (β) (λx.a)b a{x := b} (BV(λx.a) ∩ FV(b) = ∅) : Let [(λx.a)b] be given by [λxσ.aτ ] and [bσ ] with σ
[λxσ.aτ ] = ( δ x [[dτ ]]){[[y ρ ]] := [y ρ ] | y ρ ∈ FV(λxσ.dτ )} ⊕τ [aτ ]{[xσ ] := 1}
(6)
where the reduction sequence d0 ≡ d{y := y | y ∈ FV(λx.d)} 1 . . . 1 dn ≡ a for terms di with t(di ) < t(λx.a) and 0 ≤ i ≤ n is part of the reduction history of λx.a. Wlog we may assume that BV(d0 ) ∩ FV(b) = ∅ and consider the vector σ D := [[d]]{[[y]] := [y] | y ∈ FV(λx.d)}. Note that by the substitution property of δ x , σ σ see Lemma 4.15, we have ( δ x [[d]]){[[y]] := [y] | y ∈ FV(λxσ.d)} ≡ δ x D. Lemma 4.19 yields σ (7) δ x D 2σ [b] D{[[x]] := [b]} since the assumptions of Lemma 4.19 are guaranteed through Lemma 5.5. Now [d] := [[d]] {[[z]] := [z] | z ∈ FV(d)} is assigned to the term d, and by the substitution property of [·], Lemma 5.4, [d0 ] := [d]{[y] := [y] | y ∈ FV(λx.d)} ≡ D{[[x]] := [x]} is assigned to the term d0 . An n-fold application of the i.h. yields a vector A assigned to the term a with [d0 ] A where equality holds if and only if n = 0. Since by Lemma 5.5 [b] is an admissible substitution for [x] we obtain, using again Lemma 5.4, [d0 {x := b}] := [d0 ]{[x] := [b]} A{[x] := [b]} =: [a{x := b}]. For this reason we have for every i ≤ gτ σ
  [(λx.a)b]i = ([λx.a] 2σ [b])i ≽ (δσx D 2σ [b])i   (by Lemma 4.13 (1))
             ≽ Di{[[x]] := [b]}                      (by (7))
             = [d0{x := b}]i ≽ [a{x := b}]i.

(ξ) a → b ⇒ λx.a → λx.b: Let [λxσ.aτ] be given as above in (6). We have t(a) < t(λx.a), whence by the i.h. there is a vector [b] assigned to bτ with [a] ≽ [b]. Since 1 is an admissible substitution, we get [a]{[x] := 1} ≽ [b]{[x] := 1}. Now set [λx.b] := (δσx [[dτ]]){[[yρ]] := [yρ] | yρ ∈ FV(λxσ.dτ)} ⊕τ [bτ]{[xσ] := 1}.

Corollary 6.2. Let aσ ∈ T be given. The vector [a] := [[a]]{[[x]] := [x] | x ∈ FV(a)} is assigned to aσ, and we obtain an upper bound for the depth of the reduction tree of a by [a]0{[x] := 1 | x ∈ FV(a)} ∈ ℕ. Hence typed λ-calculus with recursors, and thus Gödel's T in λ-formulation, is strongly normalizing.

Corollary 6.3.
1. The functions definable in T, i.e. the provably recursive functions of PA, are < ε0-recursive. The derivation lengths function for T is ε0-recursive.
2. The functions definable in Tn, and thus the provably recursive functions of IΣn+1, are < ωn+2-recursive. The derivation lengths function for Tn is ωn+2-recursive.

To prove this corollary we employ the following two technical lemmata:
Lemma 6.4. Let d ∈ T. Then we have Lx([[d]]) < 2₂(Gσ(d) + 1 + 2dp(d)), where Gσ(d) := max{gσ + 1, G(d)}.
400
G. Wilken and A. Weiermann
Proof. Since dpx ≤ dp, lhx ≤ lh and 2dp ≤ lh, it is sufficient to estimate the term (∑_{i=0}^{gσ+1} lh([[d]]i))². By induction on d, setting kfσ(d) := 2₂(Gσ(f) + 2dp(d)) and Kfσ(d) := kfσ(d)² = 2₂(Gσ(f) + 1 + 2dp(d)) for f ∈ T, it is easy to prove that ∑_{i=0}^{Gσ(f)} lh([[d]]i) < kfσ(d) for every f such that d is a subterm of f. From this the lemma now follows for f ≡ d.

For h ∈ OT we define h̄ to be h{xiσ := 1, x̄iσ := 1 | xiσ, x̄iσ ∈ OT}.

Lemma 6.5. Let d, f ∈ T where d is a subterm of f. Setting G := G(f), RG := RG(f), and di := ([[d]]i)‾ we have

  di = 0                                                  if i > G,
  di < 2_{G+2∸i}(2(G + 1) ∸ i + 2dp(d))                   if G ≥ i > RG,        (1)
  di < 2_{RG∸i}(ω ⊗ 2_{G+2∸i}(2(G + 1) ∸ i + 2dp(d)))     if RG ≥ i ≥ 1,
  d0 < ψ(ω ⊗ 2_{RG∸1}(ω ⊗ 2_{G+1}(2(G + 1) + 2dp(d)))).

Abbreviating the upper bound of di given above by Di, we have

  ∀ψ(ω ⊗ f ⊕ g + n) ∈ Ψx([[d]]0): f̄ < D1 & n < G,
  ∀i, j ≤ G ∀t ∈ Txji([[d]]j): t̄ < Di.                    (2)

Proof. (1) and (2) are proved by simultaneous induction on d, using Lemmata 4.17, 4.18 (4,5,6) and 5.5.

Proof (of Corollary 6.3). We make use of the well-known fact that PA has a functional interpretation in T (cf. [4,8,7]) and the fact that the fragments IΣn+1 have functional interpretations in the Tn (cf. [6]). Then, by results from [2], the assertions 1 and 2 follow easily from Theorem 6.1 using Lemmata 6.4 and 6.5.
References

1. Beckmann, A., Weiermann, A.: Analyzing Gödel's T via Expanded Head Reduction Trees. Mathematical Logic Quarterly 46, 517–536 (2000)
2. Buchholz, W., Cichon, E.A., Weiermann, A.: A Uniform Approach to Fundamental Sequences and Hierarchies. Mathematical Logic Quarterly 40, 273–286 (1994)
3. Cardone, F., Hindley, J.R.: History of λ-Calculus and Combinatory Logic. In: Gabbay, D.M., Woods, J. (eds.) Handbook of the History of Logic, vol. 5. Elsevier, Amsterdam (to appear)
4. Gödel, K.: Über eine bisher noch nicht benützte Erweiterung des finiten Standpunktes. Dialectica 12, 280–287 (1958)
5. Howard, W.A.: Assignment of Ordinals to Terms for Primitive Recursive Functionals of Finite Type. In: Intuitionism and Proof Theory, pp. 443–458. North-Holland, Amsterdam (1970)
6. Parsons, C.: On n-Quantifier Induction. The Journal of Symbolic Logic 37(3), 466–482 (1972)
7. Schütte, K.: Proof Theory. Springer, Heidelberg (1977)
8. Shoenfield, J.R.: Mathematical Logic. Addison-Wesley, New York (1967)
9. Weiermann, A.: How Is It That Infinitary Methods Can Be Applied to Finitary Mathematics? Gödel's T: A Case Study. The Journal of Symbolic Logic 63(4), 1348–1370 (1998)
The Computational SLR: A Logic for Reasoning about Computational Indistinguishability Yu Zhang Laboratory of Computer Science, Institute of Software, CAS, Beijing, China FIT, Macau University of Science and Technology, Macau SAR China [email protected]
Abstract. Computational indistinguishability is a notion in complexity-theoretic cryptography and is used to define many security criteria. However, in traditional cryptography, proving computational indistinguishability is usually done informally and becomes error-prone when cryptographic constructions are complex. This paper presents a formal proof system based on an extension of Hofmann's SLR language, which can capture probabilistic polynomial-time computations through typing and is sufficient for expressing cryptographic constructions. In particular, we define rules that directly justify the computational indistinguishability between programs and prove that these rules are sound with respect to the set-theoretic semantics, hence with respect to the standard definition of security. We also show that the system is applicable in cryptography by verifying, in our proof system, Goldreich and Micali's construction of a pseudorandom generator.
1 Introduction

Research on the verification of cryptographic protocols has in recent years switched its focus from the Dolev-Yao model to the computational model — a more realistic model where criteria for the underlying cryptography are taken into account. Computational indistinguishability is an important notion in cryptography and in the computational model of protocols, and it is used in particular to define many security criteria. However, proving computational indistinguishability in traditional cryptography is usually done in a paper-and-pencil, semi-formal way. It is often error-prone and becomes unreliable when the cryptographic constructions are complex. This paper aims at designing a formal system that can help us verify cryptographic proofs; our ultimate goal is fully or partially automating the verification. Noticing that computational indistinguishability can be seen as a special notion of equivalence between programs, we make use of techniques from the theory of programming languages, but this requires in the first place a proper language for expressing cryptographic constructions and adversaries. In particular, we shall consider only "feasible" adversaries, that is, probabilistic programs that terminate within polynomial time. While such a complexity restriction can easily be formulated using the model of Turing machines, that is by no means a good model for formal verification. At this point, our attention is drawn to Hofmann's SLR system [7,8], a functional programming language that implements Bellantoni and Cook's safe recursion [3]. The very nice property of SLR is the characterization of polynomial-time computations through typing.

P.-L. Curien (Ed.): TLCA 2009, LNCS 5608, pp. 401–415, 2009.
© Springer-Verlag Berlin Heidelberg 2009

The
probabilistic extension of SLR has been studied by Mitchell et al. [10], where functions of the proper type capture exactly the computations that terminate in polynomial time on a probabilistic Turing machine. Our system is based on this probabilistic extension of SLR, and we develop on top of it an axiomatic proof system with rules justifying computational indistinguishability between programs. We prove that these rules are sound with respect to the set-theoretic semantics of the language, hence with respect to the traditional definition of computational indistinguishability. Reasoning about cryptographic constructions in the proof system is purely syntactic, without explicit analysis of the probability of program outputs or of the complexity bound of adversaries. The rest of the paper is organized as follows: Section 2 introduces the computational SLR — a probabilistic extension of Hofmann's SLR — together with an adapted definition of computational indistinguishability based on the language. In Section 3 we develop the equational proof system and prove the soundness of its rules. Cryptographic examples using the proof system are given in Section 4 to illustrate its usability in cryptography. Section 5 summarizes related work and Section 6 concludes the paper.
2 The Computational SLR

We start by defining a language for expressing cryptographic constructions and adversaries, as well as the computational indistinguishability between programs. Due to complexity considerations, the language should offer a mechanism to capture the class of probabilistic polynomial-time computations. Bellantoni and Cook have proposed a recursion model, different from the model of Turing machines, called safe recursion, which defines exactly the functions that are computable in polynomial time on a Turing machine [3]. This is an intrinsic, purely syntactic mechanism: variables are divided into safe variables and normal variables, and safe variables must be instantiated by values that are computed using only safe variables; recursion must take place on normal variables, and intermediate recursion results are never sent to safe variables. When higher-order functions are concerned, it is also required that step functions be linear, i.e., intermediate recursive results can be used only once in each step. Hofmann later developed a functional language called SLR to implement safe recursion [7,8]. In particular, he introduces a type system with modality, to distinguish between normal variables and safe variables, and linearity, to distinguish between normal functions and linear functions. He proves that well-typed functions of a proper type are exactly the polynomial-time computable functions. Hofmann's original SLR system has a polymorphic type system, but polymorphism is not necessary in cryptography, so in this section we first introduce a non-polymorphic version of Hofmann's SLR system, then extend it to express cryptographic constructions. We shall adapt the definition of computational indistinguishability to our language.

2.1 The Non-polymorphic SLR for Bitstrings

Types are defined by:

  τ, τ′, ... ::= Bits | τ × τ′ | τ ⊗ τ′ | □τ → τ′ | τ → τ′ | τ ⊸ τ′.
Bits is the base type for bitstrings, and all other types are from Hofmann's language: τ × τ′ are cartesian product types, and τ ⊗ τ′ are tensor product types as in linear λ-calculus. There are three sorts of functions: □τ → τ′ are modal functions with no restriction on the use of arguments; τ → τ′ are non-modal functions, where arguments must be safe values; τ ⊸ τ′ are linear functions, where arguments can be used only once. SLR uses aspects to represent these function spaces: τ →a τ′ is a function type with aspect a, which is either m = (modal, nonlinear) for □τ → τ′, n = (nonmodal, nonlinear) for τ → τ′, or l = (nonmodal, linear) for τ ⊸ τ′. The aspects are ordered: m ≤ n ≤ l. The type system also inherits the sub-typing from SLR, and we write τ <: τ′ if τ is a sub-type of τ′. In particular, the sub-typing relation between the three sorts of functions is: τ ⊸ τ′ <: τ → τ′ <: □τ → τ′. As in the SLR for numbers, we also have Bits → τ <: Bits ⊸ τ, stating that bitstrings can be duplicated without violating linearity. Expressions of SLR are defined by the following grammar:

  e1, e2, ... ::= x                       atomic variables
              | nil                       empty bitstring
              | B0 | B1                   bits
              | caseτ                     case distinction
              | recτ                      safe recursor
              | λx.e                      abstraction
              | e1 e2                     application
              | ⟨e1, e2⟩                  product
              | proj1 e | proj2 e         product projection
              | e1 ⊗ e2                   tensor product
              | let x ⊗ y = e1 in e2      tensor projection
B0 and B1 are two constants for constructing bitstrings: if u is a bitstring, B0 u (resp. B1 u) is the new bitstring with a bit 0 (resp. 1) added at the left end of u. We often use B to denote the bit constructor when its value is irrelevant. caseτ is the constant for case distinction: caseτ(n, e, f0, f1) tests the bitstring n and returns e if n is the empty bitstring, f0(n′) if the first bit of n is 0 and the rest is n′, and f1(n′) if the first bit of n is 1 and the rest is n′. recτ is the constant for recursion on bitstrings: recτ(e, f, n) returns e if n is empty, and f(n, recτ(e, f, n′)) otherwise, where n′ is the bitstring n with its first bit cut off. Typing assertions for expressions are of the form Γ ⊢ t : τ, where Γ is a typing context that assigns types and aspects to variables. A context is typically written as a list of bindings x1 :a1 τ1, ..., xn :an τn, where a1, ..., an are aspects in {m, n, l}. Some typical typing rules of SLR are those for abstractions and applications:
    Γ, x :a τ ⊢ e : τ′
  ──────────────────────
   Γ ⊢ λx.e : τ →a τ′

    Γ, Δ1 ⊢ e1 : τ →a τ′    Γ, Δ2 ⊢ e2 : τ    Γ nonlinear    x :a′ σ ∈ Γ, Δ2 implies a′ ≤ a
  ──────────────────────────────────────────
    Γ, Δ1, Δ2 ⊢ e1 e2 : τ′
and types for constants:

  B0, B1 : Bits ⊸ Bits,
  recτ : τ ⊸ (Bits → τ ⊸ τ) → Bits → τ,
  caseτ : Bits ⊸ (τ × (Bits ⊸ τ) × (Bits ⊸ τ)) ⊸ τ.

The full set of typing rules can be found in Hofmann's paper [8] or in the full version of this paper [16].

2.2 The Computational SLR

The probabilistic extension of SLR is studied by Mitchell et al. by adding a random-bit oracle to simulate the oracle tape of probabilistic Turing machines [10]. In cryptographic reasoning we often need to state that certain functions are purely deterministic, but there is no explicit distinction in their language between probabilistic and purely deterministic functions, so we instead adopt a type system in the style of Moggi's computational λ-calculus [12], where probabilistic computations are captured by monadic types. We call the language computational SLR and often abbreviate it as CSLR. Types in CSLR are extended with a unary type constructor T, which comes from Moggi's language: a type Tτ is called a monadic type (or a computation type), for computations that return (if they terminate correctly) values of type τ. In our case, a computation always terminates and can be probabilistic, hence it will return one of a set of values, each with a certain probability. The sub-typing system is extended by a rule stating that the constructor T preserves sub-typing. Expressions of the computational SLR are extended with three constructions for probabilistic computations:
  e1, e2, ... ::= ...                     SLR terms
              | rand                      oracle bit
              | val(e)                    deterministic computation
              | bind x = e1 in e2         sequential computation
The constant rand returns a random bit 0 or 1, each with probability 1/2. val(e) is the trivial (deterministic) computation which returns e with probability 1. bind x = e1 in e2 is the sequential computation which first computes e1, binds the resulting value to x, then computes e2. We sometimes abbreviate a program of the form bind x1 = e1 in ... bind xn = en in e as bind (x1 = e1, ..., xn = en) in e. Note that the order of some bindings must be carefully kept in the abbreviated form. Typing rules for these extra constants and constructions are given in Figure 1. Note that when defining a purely deterministic program in CSLR, it is not sufficient to state that its type has no monadic components. For instance, the function λxBits.(λyTBits.x)rand has type Bits ⊸ Bits, but it still contains probabilistic computations. Instead, we must show that the program can be defined and typed in (non-probabilistic) SLR, and in that case we say it is SLR-definable and SLR-typable. As in standard typed λ-calculi, we can define a reduction system for the computational SLR and prove that every closed term has a canonical form. In particular, the canonical form of type Bits is:
  Γ ⊢ rand : TBits   (T-RAND)

    Γ ⊢ e : τ
  ──────────────────── (T-VAL)
    Γ ⊢ val(e) : Tτ

    Γ, Δ1 ⊢ e1 : Tτ1    Γ, Δ2, x :a τ1 ⊢ e2 : Tτ2    Γ nonlinear    x′ :a′ σ ∈ Γ, Δ1 implies a′ ≤ a
  ───────────────────────────────────────────────── (T-BIND)
    Γ, Δ1, Δ2 ⊢ bind x = e1 in e2 : Tτ2

Fig. 1. Typing rules for the computational SLR
  b ::= nil | B0 b | B1 b.

If u is a closed term of type Bits, we write |u| for its length. We define the length of a bitstring on its canonical form b: |nil| = 0, |Bi b| = |b| + 1 (i = 0, 1).
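As a concrete aside (ours, not part of the paper), the bitstring constructors and the behaviour of caseτ and recτ described above can be modeled in plain Python, representing a bitstring as a string over {'0','1'} with the empty string for nil:

```python
def b0(u): return '0' + u   # B0: add a 0 bit at the left end
def b1(u): return '1' + u   # B1: add a 1 bit at the left end

def case_(n, e, f0, f1):
    # case(n, e, f0, f1): e if n is empty, fi(rest) if the first bit is i
    if n == '':
        return e
    return f0(n[1:]) if n[0] == '0' else f1(n[1:])

def rec_(e, f, n):
    # rec(e, f, n): e if n is empty, f(n, rec(e, f, n')) otherwise,
    # where n' is n with its first bit cut off
    if n == '':
        return e
    return f(n, rec_(e, f, n[1:]))

# example: the length |u| of a bitstring, via the recursor
length = lambda n: rec_(0, lambda m, r: r + 1, n)
```

For instance, `length('10101')` evaluates to 5, matching |B1 B0 B1 B0 B1 nil| = 5.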
2.3 A Set-Theoretic Semantics

We write 𝔹 for the set of bitstrings, with a special element ε denoting the empty bitstring. To interpret probabilistic computations, we adopt the probabilistic monad defined in [14]: if A is a set, we write DA for the set of probability mass functions over A, i.e., functions A → [0, 1]. The original monad in [14] is defined using measures instead of mass functions, and is of type (2^A → [0, ∞]) → [0, ∞], where 2^A denotes the set of all subsets of A, so that it can also represent probabilities over infinite data structures, not just discrete probabilities. For the sake of simplicity, in this paper we work with mass functions instead of measures. Note that this monad is not the one defined in [10], which is used to keep track of the bits read from the oracle tape rather than to reason about probabilities. When d is a mass function in DA and a ∈ A, we also write Pr[a ← d] for the probability d(a). If d ∈ DA has finite support, we can write d as {(a1, p1), ..., (an, pn)}, where ai ∈ A and pi = d(ai). With this monad, every computation type Tτ in CSLR is interpreted as D⟦τ⟧, where ⟦τ⟧ is the interpretation of τ. Expressions are interpreted within an environment which maps every free variable to an element of the corresponding type, and we often write ρ ∈ ⟦Γ⟧ to denote that ρ is an environment for the typing context Γ. In particular, the two computational constructions are interpreted as:

  ⟦val(e)⟧ρ = {(⟦e⟧ρ, 1)},
  ⟦bind x = e1 in e2⟧ρ = λv. ∑_{v′ ∈ ⟦τ⟧} ⟦e2⟧ρ[x ↦ v′](v) × ⟦e1⟧ρ(v′),

where τ is the type of x (i.e., Tτ is the type of e1). The interpretation of other types and expressions is given in [16, Figure 4]. The very nice property of SLR is the characterization of polynomial-time computations (the class PTIME) through typing:
Theorem 1 (Hofmann [8]). The set-theoretic interpretations of closed terms of type Bits → Bits in SLR define exactly the polynomial-time computable functions.

Mitchell et al. extended Hofmann's result to the probabilistic version of SLR with a random-bit oracle, showing that terms of the same type in their language define exactly the functions that can be computed by a probabilistic Turing machine in polynomial time (the class PPT). Although our language is slightly different from their language OSLR (which does not have computation types), the categorical model that they use to prove the above result can also be used to interpret the computational SLR. In particular, if we follow the traditional encoding of call-by-value λ-calculus into Moggi's computational language, function types τ →a τ′ in OSLR are encoded as τ →a Tτ′ in CSLR, hence OSLR functions that correspond to PPT computations are actually CSLR functions of type Bits → TBits. This permits us to reuse the result of [10], adapted to the computational SLR:

Theorem 2 (Mitchell et al. [10]). The set-theoretic interpretations of closed terms of type Bits → TBits in CSLR define exactly the functions that can be computed by a probabilistic Turing machine in polynomial time.

2.4 Computational Indistinguishability

We say that a closed SLR-term p (of type Bits → Bits) is length-sensitive if for every two bitstrings u1, u2 of the same length, i.e. |u1| = |u2|, it holds that |p(u1)| = |p(u2)|. When a term p is length-sensitive, we write |p| for the underlying length measure function, i.e., |p|(n) = |p(u)| where |u| = n. If p and q are two length-sensitive SLR-functions, we write |p| < |q| for the fact that for all bitstrings u, |p(u)| < |q(u)|, and similarly for |p| > |q|, |p| = |q|, etc. A length-sensitive function p is said to be positive if for every bitstring u, |p(u)| > |u|.
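To make the set-theoretic semantics of Section 2.3 concrete, here is a small Python model of the probability-mass-function monad (our sketch, not the paper's code): a computation of type Tτ is a dict mapping result values to probabilities, and val, bind and rand follow the interpretation given above.

```python
def val(v):
    # val(e): the trivial computation, returning v with probability 1
    return {v: 1.0}

def bind(d1, k):
    # bind x = e1 in e2: sum over intermediate values v' of
    # [[e2]][x -> v'](v) * [[e1]](v'), as in the displayed equation
    out = {}
    for v1, p1 in d1.items():
        for v2, p2 in k(v1).items():
            out[v2] = out.get(v2, 0.0) + p1 * p2
    return out

rand = {'0': 0.5, '1': 0.5}  # the fair random bit

# example: the distribution of two concatenated coin flips
two_flips = bind(rand, lambda a: bind(rand, lambda b: val(a + b)))
# two_flips == {'00': 0.25, '01': 0.25, '10': 0.25, '11': 0.25}
```

Each of the four two-bit strings receives probability 1/4, and the probabilities of any well-formed computation sum to 1, as a mass function must.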
We say that a closed CSLR-term p (of type Bits → τ) is numerical if its value depends only on the length of its argument, i.e., p(u1) = p(u2) if |u1| = |u2|. Note that we do not introduce the standard numerical functions in the language, so numerical and length-sensitive SLR-functions will be used to represent the usual polynomials over numbers, and we often abbreviate them as polynomials. A numerical polynomial is canonical if it returns the empty bitstring or bitstrings containing the bit 1 only. Intuitively, two probabilistic functions are computationally indistinguishable if the probability that any feasible adversary can distinguish them becomes negligible when they take sufficiently large arguments. We adapt the definition of computational indistinguishability of [6, Definition 3.2.2] to the setting of CSLR.

Definition 1 (Computational indistinguishability). Two CSLR terms f1 and f2, both of type Bits → TBits, are computationally indistinguishable (written f1 ≃ f2) if for every term A such that ⊢ A : Bits → TBits → TBits and every positive polynomial p (SLR-typable with type Bits → Bits), there exists n ∈ ℕ such that for all bitstrings w with |w| ≥ n,

  |Pr[ε ← A(w, f1(w))] − Pr[ε ← A(w, f2(w))]| < 1/|p(w)|,

where ε denotes the empty bitstring.
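As an illustration (ours, not the paper's formalism), the quantity bounded in Definition 1 can be computed at a fixed input w in the mass-function semantics. Here f1, f2 map a bitstring to a probability mass function (a dict), and the adversary is modeled abstractly as a function A(w, d) giving the probability that A outputs the empty string when given access to the distribution d (it may sample d as often as it likes).

```python
def advantage(A, f1, f2, w):
    # |Pr[eps <- A(w, f1(w))] - Pr[eps <- A(w, f2(w))]| at a fixed w
    return abs(A(w, f1(w)) - A(w, f2(w)))

# toy usage: distinguishing a constant one-bit output from a fair coin
f1 = lambda w: {'0': 1.0}               # always outputs the bit 0
f2 = lambda w: {'0': 0.5, '1': 0.5}     # uniform one-bit output
A = lambda w, d: d.get('0', 0.0)        # accepts exactly when it sees '0'
# advantage(A, f1, f2, '1111') == 0.5, far from negligible
```

Indistinguishability in the sense of Definition 1 asks that, for feasible A, this quantity eventually drops below 1/|p(w)| for every positive polynomial p.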
Note that the second parameter of the adversary must be a computation, which can be executed several times. If the adversary were of type Bits → Bits → TBits, it would be too weak, since the only way to feed it the second argument from the programs under testing would be bind x = fi(w) in A(w, x), where the adversary executes the programs only once and uses the resulting value everywhere.

2.5 Examples of PPT Functions

Before moving on to develop the logic for reasoning about programs in CSLR, we define some useful PPT functions that will be used frequently in cryptographic constructions.
– The random bitstring generation rs := λx : Bits. rec(val(nil), hrs, x), where hrs is defined by

    hrs := λm. λr. bind (b = rand, u = r) in case(b, val(nil), λx.val(B0 u), λx.val(B1 u)).

  rs receives a bitstring and returns a uniformly random bitstring of the same length. It can be checked that hrs : Bits → TBits ⊸ TBits, hence rs : Bits → TBits. If e is a closed program of type TBits and all possible results of e are of the same length, we write |e| for the length of its result bitstrings. Clearly, for any bitstring u, the result bitstrings of rs(u) are of the same length, and it can easily be checked that |rs(u)| = |u|.
– The string concatenation conc := λx. λy. rec(y, hconc, x), where hconc is defined by

    hconc := λm. λr. case(m, r, λx.B0 r, λx.B1 r).

  It can be checked that conc is a purely deterministic, well-typed SLR-function of type Bits → Bits ⊸ Bits. Note that conc can also be defined as an SLR-term of type Bits ⊸ Bits → Bits, i.e., it recurs on only one of its arguments, but it does not matter which one, so we do not distinguish the two forms and only require that one of the two arguments of conc be normal (modal). We often abbreviate conc(u, v) as u•v.
– Head function and tail function:
    hd := λx. case(x, nil, λy.0, λy.1),
    tl := λx. case(x, nil, λy.y, λy.y).

  Both hd and tl are SLR-definable and SLR-typable of type Bits ⊸ Bits.
– Split function split := λx. λn. rec(nil ⊗ x, hsplit, n), where

    hsplit := λm. λr. let v1 ⊗ v2 = r in case(v2, v1 ⊗ v2, λy.(v1•0) ⊗ y, λy.(v1•1) ⊗ y).
  split(x, n) splits the bitstring x into two bitstrings, of which the first one is of length |n| if |n| ≤ |x|, and is x otherwise. It can be checked that split is SLR-definable and SLR-typable of type Bits ⊸ Bits → Bits ⊗ Bits. With split we can define the prefix and suffix functions:

    pref := λx. λn. let u1 ⊗ u2 = split(x, n) in u1,
    suff := λx. λn. let u1 ⊗ u2 = split(x, n) in u2.

  Both functions are SLR-definable of type Bits ⊸ Bits → Bits.
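For reference, here are plain-Python analogues of these deterministic helpers (our sketch, ignoring the SLR typing discipline; bitstrings are strings over {'0','1'}):

```python
def conc(x, y):
    # u.v: string concatenation
    return x + y

def hd(x):
    # first bit (as a one-character string), empty string if x is nil
    return '' if x == '' else x[0]

def tl(x):
    # x with its first bit removed, nil if x is nil
    return x[1:]

def split(x, n):
    # first component of length |n| (or all of x when |n| > |x|),
    # second component the remainder
    k = min(len(n), len(x))
    return x[:k], x[k:]

def pref(x, n): return split(x, n)[0]
def suff(x, n): return split(x, n)[1]
```

For example, `split('10110', '11')` yields `('10', '110')`: the first component has the length of the second argument.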
3 The Proof System

We present in this section an equational proof system C on top of CSLR, through which one can justify computational indistinguishability between CSLR programs at the syntactic level. The system C has two sets of rules: the first set (see [16, Figure 6]) are rules for justifying semantic equivalence between CSLR programs (we write e1 ≡ e2 if e1 and e2 are semantically equivalent), and the second set (Figure 2) are rules for justifying computational indistinguishability.
    ⊢ e1 : Bits → TBits    ⊢ e2 : Bits → TBits    e1 ≡ e2
  ───────────────────────────────────────────── (EQUIV)
    e1 ≃ e2

    e1 ≃ e2    e2 ≃ e3
  ──────────────────── (TRANS-INDIST)
    e1 ≃ e3

    x :n Bits, y :n Bits ⊢ e : TBits    e1 ≃ e2
  ──────────────────────────────────────────────── (SUB)
    λx. bind y = e1(x) in e ≃ λx. bind y = e2(x) in e

    x :n Bits, n :n Bits ⊢ e : TBits    λn.e[u/x] is numerical for all bitstrings u
    λx. e[i(x)/n] ≃ λx. e[B1 i(x)/n] for all canonical polynomials i such that |i| < |p|
  ──────────────────────────────────────────────── (H-IND)
    λx. e[nil/n] ≃ λx. e[p(x)/n]

Fig. 2. System C rules for computational indistinguishability
The first set consists of standard rules of typed λ-calculi, together with axioms for probabilistic computations. In particular, we have the following three axioms:

  bind x = val(e1) in e2 ≡ e2[e1/x],
  bind x = e in val(x) ≡ e,
  bind x = (bind y = e1 in e2) in e3 ≡ bind y = e1 in bind x = e2 in e3.
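These three axioms are exactly the monad laws, and they hold in the probability-mass-function semantics. The check below (our sketch, not the paper's code) verifies them on concrete distributions, with computations modeled as dicts from values to probabilities:

```python
def val(v):
    return {v: 1.0}

def bind(d, k):
    out = {}
    for v, p in d.items():
        for w, q in k(v).items():
            out[w] = out.get(w, 0.0) + p * q
    return out

rand = {'0': 0.5, '1': 0.5}
e2 = lambda x: bind(rand, lambda b: val(x + b))  # appends a random bit
e3 = lambda x: val(x + '!')

# bind x = val(e1) in e2  ==  e2[e1/x]
assert bind(val('1'), e2) == e2('1')
# bind x = e in val(x)  ==  e
assert bind(rand, val) == rand
# bind x = (bind y = e1 in e2) in e3  ==  bind y = e1 in (bind x = e2 in e3)
assert bind(bind(rand, e2), e3) == bind(rand, lambda y: bind(e2(y), e3))
```

These are checks on particular programs, not a proof; Theorem 3 below is what establishes the axioms in general.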
Rules in the second set are similar to those in the logic of Impagliazzo and Kapron [9] (which we shall refer to as the IK-logic in the sequel), where they also define an equational proof system for computational indistinguishability, based on their own arithmetic model. But we do not have their EDIT rule for manipulating bitstrings, which appears internally in their logic, because there are no primitive operations in CSLR for editing bitstrings except the two bit constructors B0, B1. Many bitstring operations are defined as CSLR functions, and we have introduced a series of lemmas (see [16, Section 3.2]) which can be used in proofs in the same way as system rules. The H-IND rule comes from the frequently used hybrid technique in cryptography: if two complex programs can be transformed into a "small" (polynomial) number of hybrids (relatively simpler programs), where the extreme hybrids are exactly the original programs, then proving the computational indistinguishability of the two original programs can be reduced to proving the computational indistinguishability of neighboring hybrids. The H-IND rule in our system is slightly different from that in the IK-logic, since we do not have the general primitive that uniformly returns a number smaller than a polynomial, but the underlying support for the hybrid technique remains. We remark that the rule TRANS-INDIST is safe and does not break the complexity constraint, because the number of times a rule is applied in a proof is irrelevant to the security parameter of the programs under testing. To show that the system C is sound with respect to the set-theoretic semantics of CSLR, we prove the soundness of the two sets of rules.

Theorem 3 (Soundness of program equivalence rules). If Γ ⊢ e1 : τ, Γ ⊢ e2 : τ, and e1 ≡ e2 is provable in system C, then ⟦e1⟧ρ = ⟦e2⟧ρ for every ρ ∈ ⟦Γ⟧.

Proof. Most rules for semantic equivalence are standard in typed λ-calculus. The probabilistic monad validates the axioms for computations.
Theorem 4 (Soundness of computational indistinguishability rules). If Γ ⊢ e1 : Bits → TBits, Γ ⊢ e2 : Bits → TBits, and e1 ≃ e2 is provable in the system C, then e1 and e2 are computationally indistinguishable.

Proof. We prove that the rules in Figure 2 are sound. The soundness of the rule EQUIV is obvious. For the rule TRANS-INDIST, let A be an arbitrary (well-typed, hence computable in polynomial time) adversary and q an arbitrary positive polynomial; we can easily define another polynomial q′ such that for all bitstrings u, |q′(u)| = 2|q(u)| (e.g.,
q′ := λx. q(x)•q(x), which is clearly well typed). Because e1 ≃ e2, according to Definition 1 there exists some n ∈ ℕ such that for any bitstring w with |w| ≥ n, |Pr[ε ← A(w, e1(w))] − Pr[ε ← A(w, e2(w))]| <
1/|q′(w)|.
and for any bitstring w such that
|Pr[ε ← A(w, e2(w))] − Pr[ε ← A(w, e3(w))]| <
1/|q′(w)|.
Without loss of generality, suppose that n ≥ n′; then for every bitstring w such that |w| ≥ n,

  |Pr[ε ← A(w, e1(w))] − Pr[ε ← A(w, e3(w))]|
    ≤ |Pr[ε ← A(w, e1(w))] − Pr[ε ← A(w, e2(w))]| + |Pr[ε ← A(w, e2(w))] − Pr[ε ← A(w, e3(w))]|
    < 1/|q′(w)| + 1/|q′(w)| = 1/|q(w)|.

Since q is arbitrary, according to Definition 1, e1 ≃ e3.

To prove the soundness of the rule SUB, we assume that there exists an adversary which can computationally distinguish the two terms in the conclusion part, and show that one can then build another adversary which computationally distinguishes the two terms in the premise part. More precisely, for some polynomial p and any integer n, there exists some bitstring w such that |w| ≥ n and |Pr[ε ← A(w, f1(w))] − Pr[ε ← A(w, f2(w))]| ≥
1/|p(w)|,
where f1 and f2 are the two programs in the conclusion part of the rule SUB. We then build another adversary A′ := λz. λz′. A(z, bind y = z′ in e),
where f is not free in A and e. According to the set-theoretic semantics, ⟦A′(w, ei(w))⟧ = ⟦A(w, bind y = ei(w) in e)⟧, hence |Pr[ε ← A′(w, e1(w))] − Pr[ε ← A′(w, e2(w))]| ≥
1/|p(w)|,
which contradicts the premise e1 ≃ e2.

The soundness of the rule H-IND can be proved in a similar way to that of TRANS-INDIST. Let A be an arbitrary well-typed adversary and q an arbitrary positive polynomial. Define another polynomial q′ := λx. rec(nil, λm.λr.q(x)•r, p(x)). Clearly, for all bitstrings u, |q′(u)| = |q(u)| · |p(u)|. Because λx.e[i(x)/n] ≃ λx.e[B1 i(x)/n] for all canonical numerals i such that |i| < |p|, we can find a sufficiently large number m ∈ ℕ such that for all bitstrings w whose length is larger than m,

  |Pr[ε ← A(w, e[nil/n])] − Pr[ε ← A(w, e[1/n])]| < 1/|q′(w)|
  ......
  |Pr[ε ← A(w, e[p(w) − 1/n])] − Pr[ε ← A(w, e[p(w)/n])]| <
1/|q′(w)|.
Therefore,

  |Pr[ε ← A(w, e[nil/n])] − Pr[ε ← A(w, e[p(w)/n])]|
    ≤ |Pr[ε ← A(w, e[nil/n])] − Pr[ε ← A(w, e[1/n])]| + ...... + |Pr[ε ← A(w, e[p(w) − 1/n])] − Pr[ε ← A(w, e[p(w)/n])]|
    < 1/|q′(w)| + ... + 1/|q′(w)| = |p(w)|/|q′(w)| = 1/|q(w)|,

and according to Definition 1, λx.e[nil/n] ≃ λx.e[p(x)/n].
4 Cryptographic Examples

In this section we give two cryptographic examples, whose proofs are carried out in system C, to illustrate the usability of the proof system in cryptography.

4.1 Pseudorandom Generators

The first example verifies the correctness of Goldreich and Micali's construction of a pseudorandom generator [6]. This example also appears in [9], but their proof has a subtle flaw (see Section 5 for an explanation). We first reformulate in CSLR the standard definition of a pseudorandom generator [6, Definition 3.3.1].

Definition 2 (Pseudorandom Generator). A pseudorandom generator (PRG for short) is a length-sensitive SLR term g : Bits → Bits such that |g(s)| > |s| for every bitstring s, and

  λx. bind u = rs(x) in val(g(u)) ≃ λx. rs(g(x)).

If g is a pseudorandom generator, we call |g| its expansion factor. We recall the construction of Goldreich and Micali [6] (reformulated in CSLR): suppose that g1 is a PRG with the expansion factor |g1|(x) = x + 1, i.e.,

  λx. bind u = rs(x) in val(g1(u)) ≃ λx. rs(B x).

Let B(x) be the function returning the first bit of g1(x), and R(x) the function returning the remaining bits:

  B := λx. hd(g1(x)),    R := λx. tl(g1(x)).

Clearly, both B and R are well-typed functions (of the same type Bits → Bits). We then define an SLR-function G:

  G := λu. λn. rec(nil, λm. λr. r•B(R′(u, m)), n),
where the function R′ is defined as:

  R′ := λu. λn. rec(u, λm. λr. R(pref(r, u)), n).
It can also be checked that both G and R′ are well-typed SLR-terms (of type Bits → Bits → Bits). We first prove that, given a polynomial p, one can easily use G to construct a PRG with the expansion factor |p|; the proof is carried out in system C.
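To see the shape of the construction, here is a toy Python rendering (ours, NOT the paper's g1): a length-extending function g1 with |g1(x)| = |x| + 1 is iterated, emitting one bit per step. The g1 below is trivially insecure and only serves to exercise the recursion; the definition of G is given iteratively rather than via the rec-based term above.

```python
def g1(x):
    # toy stand-in: prepend the parity bit of x (length |x| + 1).
    # A real instantiation would be a genuine PRG.
    parity = str(sum(map(int, x)) % 2) if x else '0'
    return parity + x

def B(x):
    return g1(x)[0]    # first bit of g1(x)

def R(x):
    return g1(x)[1:]   # remaining |x| bits of g1(x)

def G(u, n):
    # emit B(u), B(R(u)), B(R(R(u))), ... for |n| steps
    out, state = '', u
    for _ in range(len(n)):
        out += B(state)
        state = R(state)
    return out
```

With a genuine PRG in place of g1, Theorem 5 below states that λx. G(x, p(x)) is again a PRG, now with expansion factor |p|.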
Proposition 1. For every well-typed (length-sensitive) polynomial p : Bits → Bits,

  λx. bind u = rs(x) in val(G(u, p(u))) ≃ λx. rs(p(x)).

Proof. The proof follows the traditional hybrid technique in cryptography, by defining the hybrid function H := λu1. λu2. λn. (u2 − n)•G(u1, n). It can be checked that for all bitstrings u1, u2, λn.H(u1, u2, n) is numerical, and it can be proved that

  λx. bind (u1 = rs(x), u2 = rs(p(x))) in val(H(u1, u2, nil)) ≡ λx. rs(p(x)),
  λx. bind (u1 = rs(x), u2 = rs(p(x))) in val(H(u1, u2, p(x))) ≡ λx. bind u = rs(x) in val(G(u, p(u))).

We can then prove the hybrid step in the system C, and by the rule H-IND, the two programs on the right-hand side of the above two equations are computationally indistinguishable.

Theorem 5. The CSLR term λx. G(x, p(x)) is a pseudorandom generator with the expansion factor |p|.

Proof. Obvious from Proposition 1 and Definition 2.
4.2 Relating Pseudorandomness and Next-Bit Unpredictability

The second example is the equivalence between pseudorandomness and next-bit unpredictability [6]. The notion of next-bit unpredictability can be reformulated in CSLR: a positive polynomial f such that f : Bits → Bits is next-bit unpredictable if, for every canonical numeral i such that |i| < |f|,

  λx . bind u = rs(x) in val(pref(f(u), 1•i(x)))
    ≈ λx . bind u = rs(x) in bind b = rand in val(pref(f(u), i(x))•b).   (1)
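To see what equation (1) rules out, here is a hedged Python sketch with a deliberately bad expander (all names are illustrative, not from the paper): `f` copies its first input bit to the end, so a predictor that has seen the i-bit prefix guesses the real next bit perfectly, while against a truly random bit it does no better than chance.

```python
import random

rng = random.Random(1)

def f(u: str) -> str:
    """A deliberately bad expander: appends a copy of the first bit,
    so its last bit is perfectly next-bit predictable."""
    return u + u[0]

def predictor(prefix: str) -> str:
    """Guesses that the next bit equals the first bit of the prefix."""
    return prefix[0]

n, i, trials = 8, 8, 1000   # predict bit i of f(u), given the first i bits
hit_real = hit_rand = 0
for _ in range(trials):
    u = format(rng.getrandbits(n), f"0{n}b")
    hit_real += predictor(f(u)[:i]) == f(u)[i]                   # left side of (1)
    hit_rand += predictor(f(u)[:i]) == str(rng.getrandbits(1))   # right side of (1)
assert hit_real == trials                # always right about the real next bit
assert abs(hit_rand / trials - 0.5) < 0.1  # near chance against a random bit
```

The gap between the two success rates is exactly the distinguishing advantage that (1) forbids for a next-bit-unpredictable f.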
Lemma 1. Pseudorandomness implies next-bit unpredictability: if a positive polynomial f is a pseudorandom generator, then it is next-bit unpredictable.

Proof. We can prove the indistinguishability of (1) by directly applying the rules of system C and the assumption of pseudorandomness, without defining any auxiliary functions.

Lemma 2. Next-bit unpredictability implies pseudorandomness: if a positive polynomial f is next-bit unpredictable, then it is a pseudorandom generator with expansion factor |f|.

Proof. The proof uses the hybrid technique. We define a hybrid function:

  H = λx . λy . λz . pref(f(x), z)•suff(y, z).

It can be easily proved that, for all bitstrings u, v such that |v| = |f(u)|, H(u, v, nil) ≡ v and H(u, v, f(u)) ≡ f(u), hence

  λx . bind ( u = rs(x), v = rs(f(x)) ) in val(H(u, v, nil)) ≡ λx . rs(f(x)),

  λx . bind ( u = rs(x), v = rs(f(x)) ) in val(H(u, v, f(x))) ≡ λx . bind u = rs(x) in val(f(u)).
The Computational SLR
413
We can then prove the hybrid step in system C and, by the rule H-IND, conclude that f is a pseudorandom generator with expansion factor |f|.

Theorem 6. A positive polynomial is a pseudorandom generator if and only if it is next-bit unpredictable.

Proof. The two directions are proved in the above two lemmas, respectively.
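The endpoint identities used in Lemma 2 can be checked concretely. A minimal Python sketch, with a stand-in `f` (any length-increasing map works for the endpoint check):

```python
def f(u: str) -> str:
    """Stand-in for the candidate generator; here simply doubling the input."""
    return u + u

def H(x: str, y: str, z: str) -> str:
    """Lemma 2's hybrid: the first |z| bits come from f(x), the rest from y."""
    return f(x)[:len(z)] + y[len(z):]

u = "0110"
v = "10101100"                 # |v| = |f(u)|
assert H(u, v, "") == v        # z = nil: entirely the random string v
assert H(u, v, f(u)) == f(u)   # z = f(u): entirely pseudorandom
```

Each single step from hybrid z to hybrid z plus one bit swaps a random bit for the corresponding bit of f(x), which is exactly what next-bit unpredictability makes indistinguishable.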
5 Related Work

Many researchers in cryptography have realized that the increasing complexity of cryptographic proofs has become an obstacle that can no longer be ignored, and that formal techniques must be introduced to write and check such proofs. Several proof systems similar to ours have been proposed in recent years.

The PPC (probabilistic polynomial-time process calculus) system designed by Mitchell et al. [11] is based on a variant of CCS with bounded replication and messages that are computable in probabilistic polynomial time. An equational proof system is also given in their setting to prove observational equivalence between processes, and its soundness is established upon a form of probabilistic bisimulation. Interestingly, they mention that terms (or messages) in their language can be those of OSLR (the probabilistic extension of SLR), but it is not clear to us how much expressiveness PPC gains by adding the process part. It is probably more natural for modeling protocols, but no such examples are given in their paper.

Impagliazzo and Kapron have proposed two logics for reasoning about cryptographic constructions [9]. Their first logic is based on a non-standard arithmetic model, which they prove captures probabilistic polynomial-time computations. While it is a complex and general system, they define a simpler logic on top of the first one, with rules justifying computational indistinguishability. The language in their second logic is very close to a functional language but is unfortunately not precisely defined, and in fact this leads to a subtle flaw in their proofs using the logic: the SUB rule of their logic requires that the substituted programs be closed terms, but this is not respected in their proofs.
In particular, the hybrid proofs often involve a program of the form let i ← p(n) in e, where e has a free variable x that is substituted by indistinguishable programs. But if, for instance, the two substituted programs also have a bound variable i receiving a random number,

  let i ← p(n) in e1 ≈ let i ← p(n) in e2,

then according to the rule SUB we can only deduce

  let i ← p(n) in e[let i ← p(n) in e1/x] ≈ let i ← p(n) in e[let i ← p(n) in e2/x],

but never let i ← p(n) in e[e1/x] ≈ let i ← p(n) in e[e2/x]. However, the latter is used in many proofs in [9]. Furthermore, they claim that by introducing rules that directly justify computational indistinguishability between programs, they avoid
explicit reasoning about probabilities, but the rule UNIV contains a premise in their base logic (in the arithmetic model), and proving that premise might still involve such reasoning. To our knowledge, neither the proof system of PPC nor the IK-logic has been automated.

Meanwhile, Nowak has proposed a framework for the formal verification of cryptographic primitives, implemented in the proof assistant Coq [13]. It is in fact a formalization of game-based security proofs, an approach advocated by several researchers in cryptography [4,15], where proofs are carried out by generating a sequence of games, and the transformations between games must be proved computationally sound. In Nowak's formalization, games are seen as syntactic objects and game transformations are syntactic manipulations that can be verified in the proof assistant, but complexity-theoretic issues are not considered. Similar works include the system by Barthe et al., also implemented in Coq but using an imperative language [2], and that of Backes et al., implemented in Isabelle/HOL and using a functional language with references and events [1]. Blanchet's CryptoVerif is another automated tool supporting game-based cryptographic proofs, but it is not based on any existing theorem prover [5]. Unlike the previously mentioned work, CryptoVerif aims at generating the sequence of games from a collection of predefined transformations, instead of verifying the computational soundness of transformations defined by users.
6 Conclusion

We have presented an equational proof system that can be used to prove computational indistinguishability between programs, and we have proved that the rules of the system are sound with respect to the set-theoretic semantics, hence with respect to the standard notion of security. We have also shown that the system is applicable in cryptography by using it to verify a cryptographic construction of pseudorandom generators. Unlike the related work mentioned in the previous section, which either defines a language from scratch or does not give a precise language definition, our language extends Hofmann's SLR, which has very solid mathematical support based on Bellantoni and Cook's safe recursion and a nice mechanism for characterizing polynomial-time computations.

The examples given in this paper are experimental, and we are working on proving more realistic cryptographic constructions. In particular, one should be able to formulate the game-based approach in our system without much difficulty. Furthermore, as higher-order functions are already native in the language, we expect that the system can be used to verify cryptographic protocols in the computational model.
References 1. Backes, M., Berg, M., Unruh, D.: A formal language for cryptographic pseudocode. In: 4th Workshop on Formal and Computational Cryptography, FCC 2008 (2008) 2. Barthe, G., Grégoire, B., Janvier, R., Zanella Béguelin, S.: Formal certification of code-based cryptographic proofs. In: 36th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL 2009), pp. 90–101 (2009)
3. Bellantoni, S., Cook, S.A.: A new recursion-theoretic characterization of the polytime functions. Computational Complexity 2, 97–110 (1992)
4. Bellare, M., Rogaway, P.: Code-based game-playing proofs and the security of triple encryption. Cryptology ePrint Archive, Report 2004/331 (2004)
5. Blanchet, B.: A computationally sound mechanized prover for security protocols. In: IEEE Symposium on Security and Privacy (S&P 2006), pp. 140–154 (2006)
6. Goldreich, O.: The Foundations of Cryptography: Basic Tools. Cambridge University Press, Cambridge (2001)
7. Hofmann, M.: A mixed modal/linear lambda calculus with applications to Bellantoni-Cook safe recursion. In: Nielsen, M. (ed.) CSL 1997. LNCS, vol. 1414, pp. 275–294. Springer, Heidelberg (1998)
8. Hofmann, M.: Safe recursion with higher types and BCK-algebra. Annals of Pure and Applied Logic 104(1-3), 113–166 (2000)
9. Impagliazzo, R., Kapron, B.M.: Logics for reasoning about cryptographic constructions. Journal of Computer and System Sciences 72(2), 286–320 (2006)
10. Mitchell, J.C., Mitchell, M., Scedrov, A.: A linguistic characterization of bounded oracle computation and probabilistic polynomial time. In: 39th Annual Symposium on Foundations of Computer Science (FOCS 1998), pp. 725–733 (1998)
11. Mitchell, J.C., Ramanathan, A., Scedrov, A., Teague, V.: A probabilistic polynomial-time process calculus for the analysis of cryptographic protocols. Theoretical Computer Science 353(1-3), 118–164 (2006)
12. Moggi, E.: Notions of computation and monads. Information and Computation 93(1), 55–92 (1991)
13. Nowak, D.: A framework for game-based security proofs. In: Qing, S., Imai, H., Wang, G. (eds.) ICICS 2007. LNCS, vol. 4861, pp. 319–333. Springer, Heidelberg (2007)
14. Ramsey, N., Pfeffer, A.: Stochastic lambda calculus and monads of probability distributions. In: 29th SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL 2002), pp. 154–165 (2002)
15. Shoup, V.: Sequences of games: a tool for taming complexity in security proofs. Cryptology ePrint Archive, Report 2004/332 (2004)
16. Zhang, Y.: The computational SLR: a logic for reasoning about computational indistinguishability. Cryptology ePrint Archive, Report 2008/434 (2008)
Author Index

Abel, Andreas 5
Aschieri, Federico 20
Atkey, Robert 35
Awodey, Steve 249
Basaldella, Michele 50
Berardi, Stefano 20
Boudes, Pierre 65
Coquand, Thierry 5
Dal Lago, Ugo 80
Faggian, Claudia 95
Fiore, Marcelo 1
Fujita, Ken-etsu 112
Hamana, Makoto 127
Harper, Robert 3
Herbelin, Hugo 142
Hofmann, Martin 80
Hur, Chung-Kil 1
Igarashi, Atsushi 341
Licata, Daniel R. 3
Lovas, William 157
Lumsdaine, Peter LeFanu 172
Miquel, Alexandre 188
Mostrous, Dimitris 203
Pagani, Michele 219
Pagano, Miguel 5
Petit, Barbara 234
Pfenning, Frank 157
Piccolo, Mauro 95
Rabe, Florian 249
Riba, Colin 264
Sarnat, Jeffrey 279
Schürmann, Carsten 279
Schubert, Aleksy 112
Stenger, Florian 294
Straßburger, Lutz 309
Tasson, Christine 325
Terui, Kazushige 50
Tsukada, Takeshi 341
Urzyczyn, Pawel 356
Vaux, Lionel 371
Voigtländer, Janis 294
Weiermann, Andreas 386
Wilken, Gunnar 386
Yoshida, Nobuko 203
Zeilberger, Noam 3
Zhang, Yu 401
Zimmermann, Stéphane 142