Lecture Notes in Artificial Intelligence Edited by J. G. Carbonell and J. Siekmann
Subseries of Lecture Notes in Computer Science
4130
Ulrich Furbach Natarajan Shankar (Eds.)
Automated Reasoning Third International Joint Conference, IJCAR 2006 Seattle, WA, USA, August 17-20, 2006 Proceedings
Series Editors
Jaime G. Carbonell, Carnegie Mellon University, Pittsburgh, PA, USA
Jörg Siekmann, University of Saarland, Saarbrücken, Germany

Volume Editors

Ulrich Furbach
Universität Koblenz-Landau, Fachbereich Informatik
Universitätsstrasse 1, 56070 Koblenz, Germany
E-mail: [email protected]

Natarajan Shankar
Computer Science Laboratory, SRI International
Menlo Park, CA 94025, USA
E-mail: [email protected]
Library of Congress Control Number: Applied for
CR Subject Classification (1998): I.2.3, F.4.1, F.3, F.4, D.2.4
LNCS Sublibrary: SL 7 – Artificial Intelligence
ISSN 0302-9743
ISBN-10 3-540-37187-7 Springer Berlin Heidelberg New York
ISBN-13 978-3-540-37187-8 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. Springer is a part of Springer Science+Business Media springer.com © Springer-Verlag Berlin Heidelberg 2006 Printed in Germany Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper SPIN: 11814771 06/3142 543210
Preface
This volume contains the papers presented at the Third International Joint Conference on Automated Reasoning (IJCAR 2006) held on August 17–20, 2006, in Seattle, Washington as part of the Federated Logic Conference (FLoC 2006). The IJCAR series of conferences is aimed at unifying the different research disciplines within automated reasoning. IJCAR 2006 is the fusion of several major conferences:

– CADE: The International Conference on Automated Deduction
– FroCoS: Workshop on Frontiers of Combining Systems
– FTP: The International Workshop on First-Order Theorem Proving
– TABLEAUX: The International Conference on Automated Reasoning with Analytic Tableaux and Related Methods
– TPHOLs: The International Conference on Theorem Proving in Higher-Order Logics

Prior versions of IJCAR were held at Cork, Ireland in 2004 and Siena, Italy in 2001. These proceedings contain 3 contributions by invited speakers, including 1 full paper and 2 short abstracts, 41 research papers, and 8 system descriptions. They also include a short overview of the CASC-J3 competition for automated theorem proving systems that was conducted during IJCAR 2006. In addition to the plenary CAV–IJCAR–ICLP speaker David Dill, the invited speakers included Bruno Buchberger, Adnan Darwiche, and Dale Miller. The contributed papers were selected from 133 research paper submissions and 18 system description submissions. Both the number and quality of these submissions were exceptionally high. Each paper was reviewed by at least three referees, and decisions were reached after over two weeks of discussion through an electronic Program Committee meeting. The submissions, reviews, and discussion were coordinated using the EasyChair conference management system. The accepted papers spanned the entire spectrum of research in automated reasoning, including formalization of mathematics, proof theory, proof search, description logics, interactive proof checking, higher-order logic, combination methods, satisfiability procedures, and rewriting. The Herbrand Award for distinguished contributions to automated reasoning was given to Wolfgang Bibel in recognition of his far-sighted vision and leadership for the field, and his seminal technical contributions. The selection committee for the Herbrand Award included previous award winners, the IJCAR 2006 Steering Committee, the CADE Inc. trustees, and the IJCAR 2006 Program Committee. The Herbrand Award ceremony and the acceptance speech were part of the conference program. The workshops at IJCAR 2006 were coordinated by the Workshop Chair Maria Paola Bonacina, and included:
– ACL2: The ACL2 Theorem Prover and Its Applications
– AFM: Automated Formal Methods with PVS, ICS, and SAL
– CFV: Constraints in Formal Verification
– DISPROVING: Non-Theorems, Non-Validity, Non-Provability
– ESCoR: Empirically Successful Computerized Reasoning
– FATES/RV: Formal Aspects of Testing and Runtime Verification
– LFMTP: Logical Frameworks and Meta-Languages: Theory and Practice
– PDPAR: Pragmatics of Decision Procedures in Automated Reasoning
– PLPV: Programming Languages meets Program Verification
– STRATEGIES: Strategies in Automated Deduction
– UITP: User Interfaces for Theorem Provers
– VERIFY: Verification Workshop
In addition to the Program Committee and the reviewers, many people contributed to the success of IJCAR 2006. The Conference Chair John Harrison managed the physical organization of the meeting. Sergey Berezin served as the Publicity Chair. Peter Baumgartner coordinated the Woody Bledsoe student travel awards. The IJCAR 2006 Steering Committee consisted of Franz Baader, Ulrich Furbach, Reiner Hähnle, Natarajan Shankar, Toby Walsh, Peter Baumgartner, John Harrison, Tobias Nipkow, Cesare Tinelli, and Andrei Voronkov. The FLoC 2006 General Chair Moshe Vardi, the Program Co-chairs Thomas Ball and Jakob Rehof, the FLoC Workshop Chair Gopal Gupta, and the IJCAR representative Reiner Hähnle also assisted with many organizational issues and questions. Special thanks go to Andrei Voronkov and his EasyChair system, which makes many tasks of a program chair much easier. We would like to thank all the people involved in organizing IJCAR 2006 and FLoC 2006, as well as the sponsors of FLoC 2006, Cadence, IBM, Microsoft Research, NEC, and The John von Neumann Minerva Center for the Development of Reactive Systems.
June 2006
Ulrich Furbach Natarajan Shankar
Conference Organization
Program Chairs
Ulrich Furbach, Natarajan Shankar

Conference Chair
John Harrison

Workshop Chair
Maria Paola Bonacina

Publicity Chair
Sergey Berezin

Travel Awards
Peter Baumgartner

Program Committee
Alessandro Armando, Matthias Baaz, David Basin, Bernhard Beckert, Michael Beeson, Maria Paola Bonacina, Hubert Comon, Amy Felty, Martin Giese, Rajeev Gore, Tom Henzinger, Jason Hickey, Ian Horrocks, Dieter Hutter, Andrew Ireland, Deepak Kapur, Helene Kirchner, Michael Kohlhase, Chris Lynch, Michael Maher, Tom Melham, Jose Meseguer, Aart Middeldorp, Ilkka Niemelae, Christine Paulin-Mohring, Larry Paulson, Carsten Schuermann, Stephan Schulz, John Slaney, Mark Stickel, Aaron Stump, Geoff Sutcliffe, Frank Wolter, Hantao Zhang
External Reviewers
Behzad Akbarpour Carlos Areces Serge Autexier Arnon Avron Emilie Balland Peter Baumgartner Yves Bertot Gerd Beuster Lars Birkedal Frederic Blanqui Stefan Blom Krysia Broda Chad Brown Achim Brucker Richard Bubel Pierre Casteran Yannick Chevalier Agata Ciabatonni Alessandro Cimatti Koen Claessen Sylvain Conchon Pierre Corbineau Marcello D’Agostino Jeremy Dawson Stephanie Delaune Leonardo de Moura Louise Dennis Clare Dixon Francesco M. Donini Roy Dyckhoff Mnacho Echenim Thomas Eiter Bill J. Ellis Santiago Escobar Stephan Falke Chris Fermüller Jean-Christophe Filliatre Bernd Fischer Pascal Fontaine Lilia Georgieva Jurgen Giesl
Birte Glimm Isabelle Gnaedig Sergey Goncharov Ben Gorry Bernardo Cuenca Grau Alberto Griggio Gudmund Grov Dimitar Guelev Elsa Gunter James Harland Keijo Heljanko Joseph Hendrix Miki Hermann Steffen Hoelldobler Doug Howe Marieke Huisman Joe Hurd Ullrich Hustadt Rosalie Iemhoff Florent Jacquemard Radha Jagadeesan Predrag Janicic Tommi Junttila Yevgeny Kazakov Florent Kirchner Vladimir Klebanov Boris Konev Konstantin Korovin Simon Kramer Temur Kutsia Timo Latvala Carla Limongelli Carsten Lutz Jacopo Mantovani Jia Meng Stephan Merz George Metcalfe Thomas Meyer Marius Minea Glyn Morrill Georg Moser
Till Mossakowski Boris Motik Normen Müller Fabrice Nahon Leonor Prensa Nieto Robert Nieuwenhuis Immanuel Normann Michael Norrish Hans J. Ohlbach Florina Piroi Adam Poswolsky Florian Rabe Silvio Ranise Jason Reed Greg Restall Julian Richardson Christophe Ringeissen Andreas Roth Andrey Rybalchenko Gernot Salzer Jeffrey Sarnat Ralf Sasse Uli Sattler Steffen Schlager Vasu Singh Christoph Sprenger Mark Steedman
Mark-Oliver Stehr Gernot Stenz Peter Stuckey Sebastiaan Terwijn Alwen Tiu Ashish Tiwari Ralf Treinen Dmitry Tsarkov Harvey Tuch Anni-Yasmin Turhan Xavier Urbain Josef Urban Rene Vestergaard Luca Viganò Laurent Vigneron Christian Vogt Uwe Waldmann Toby Walsh Tjark Weber John Wheeler Burkhart Wolff Chunlai Zhou Calogero G. Zarba Jian Zhang Paul Zimmermann Evgeny Zolin
Table of Contents
Invited Talks

Mathematical Theory Exploration (Abstract)
   Bruno Buchberger . . . . . . . . . . . . . . . . 1

Searching While Keeping a Trace: The Evolution from Satisfiability to Knowledge Compilation (Abstract)
   Adnan Darwiche . . . . . . . . . . . . . . . . . 3

Representing and Reasoning with Operational Semantics
   Dale Miller . . . . . . . . . . . . . . . . . . . 4

Session 1. Proofs

Flyspeck I: Tame Graphs
   Tobias Nipkow, Gertrud Bauer, Paula Schultz . . . 21

Automatic Construction and Verification of Isotopy Invariants
   Volker Sorge, Andreas Meier, Roy McCasland, Simon Colton . . . 36

Pitfalls of a Full Floating-Point Proof: Example on the Formal Proof of the Veltkamp/Dekker Algorithms
   Sylvie Boldo . . . . . . . . . . . . . . . . . . 52

Using the TPTP Language for Writing Derivations and Finite Interpretations
   Geoff Sutcliffe, Stephan Schulz, Koen Claessen, Allen Van Gelder . . . 67

Session 2. Search

Stratified Context Unification Is NP-Complete
   Jordi Levy, Manfred Schmidt-Schauß, Mateu Villaret . . . 82

A Logical Characterization of Forward and Backward Chaining in the Inverse Method
   Kaustuv Chaudhuri, Frank Pfenning, Greg Price . . . 97

Connection Tableaux with Lazy Paramodulation
   Andrei Paskevich . . . . . . . . . . . . . . . . 112
Blocking and Other Enhancements for Bottom-Up Model Generation Methods
   Peter Baumgartner, Renate A. Schmidt . . . 125

Session 3. System Description 1

The MathServe System for Semantic Web Reasoning Services
   Jürgen Zimmer, Serge Autexier . . . 140

System Description: GCLCprover + GeoThms
   Predrag Janičić, Pedro Quaresma . . . 145

A Sufficient Completeness Checker for Linear Order-Sorted Specifications Modulo Axioms
   Joe Hendrix, José Meseguer, Hitoshi Ohsaki . . . 151

Extending the TPTP Language to Higher-Order Logic with Automated Parser Generation
   Allen Van Gelder, Geoff Sutcliffe . . . 156

Session 4. Higher-Order Logic

Extracting Programs from Constructive HOL Proofs Via IZF Set-Theoretic Semantics
   Robert Constable, Wojciech Moczydlowski . . . 162

Towards Self-verification of HOL Light
   John Harrison . . . 177

An Interpretation of Isabelle/HOL in HOL Light
   Sean McLaughlin . . . 192

Combining Type Theory and Untyped Set Theory
   Chad E. Brown . . . 205

Session 5. Proof Theory

Cut-Simulation in Impredicative Logics
   Christoph E. Benzmüller, Chad E. Brown, Michael Kohlhase . . . 220

Interpolation in Local Theory Extensions
   Viorica Sofronie-Stokkermans . . . 235
Canonical Gentzen-Type Calculi with (n,k)-ary Quantifiers
   Anna Zamansky, Arnon Avron . . . 251

Dynamic Logic with Non-rigid Functions
   Bernhard Beckert, André Platzer . . . 266

Session 6. System Description 2

AProVE 1.2: Automatic Termination Proofs in the Dependency Pair Framework
   Jürgen Giesl, Peter Schneider-Kamp, René Thiemann . . . 281

CEL – A Polynomial-Time Reasoner for Life Science Ontologies
   Franz Baader, Carsten Lutz, Boontawee Suntisrivaraporn . . . 287

FaCT++ Description Logic Reasoner: System Description
   Dmitry Tsarkov, Ian Horrocks . . . 292

Importing HOL into Isabelle/HOL
   Steven Obua, Sebastian Skalberg . . . 298

Session 7. Search

Geometric Resolution: A Proof Procedure Based on Finite Model Search
   Hans de Nivelle, Jia Meng . . . 303

A Powerful Technique to Eliminate Isomorphism in Finite Model Search
   Xiang Xue Jia, Jian Zhang . . . 318

Automation of Recursive Path Ordering for Infinite Labelled Rewrite Systems
   Adam Koprowski, Hans Zantema . . . 332

Session 8. Proof Theory

Strong Cut-Elimination Systems for Hudelmaier's Depth-Bounded Sequent Calculus for Implicational Logic
   Roy Dyckhoff, Delia Kesner, Stéphane Lengrand . . . 347

Eliminating Redundancy in Higher-Order Unification: A Lightweight Approach
   Brigitte Pientka . . . 362
First-Order Logic with Dependent Types
   Florian Rabe . . . 377

Session 9. Proof Checking

Automating Proofs in Category Theory
   Dexter Kozen, Christoph Kreitz, Eva Richter . . . 392

Formal Global Optimisation with Taylor Models
   Roland Zumkeller . . . 408

A Purely Functional Library for Modular Arithmetic and Its Application to Certifying Large Prime Numbers
   Benjamin Grégoire, Laurent Théry . . . 423

Proving Formally the Implementation of an Efficient gcd Algorithm for Polynomials
   Assia Mahboubi . . . 438

Session 10. Combination

A SAT-Based Decision Procedure for the Subclass of Unrollable List Formulas in ACL2 (SULFA)
   Erik Reeber, Warren A. Hunt Jr. . . . 453

Solving Sparse Linear Constraints
   Shuvendu K. Lahiri, Madanlal Musuvathi . . . 468

Inferring Network Invariants Automatically
   Olga Grinchtein, Martin Leucker, Nir Piterman . . . 483

A Recursion Combinator for Nominal Datatypes Implemented in Isabelle/HOL
   Christian Urban, Stefan Berghofer . . . 498

Session 11. Decision Procedures

Decidability and Undecidability Results for Nelson-Oppen and Rewrite-Based Decision Procedures
   Maria Paola Bonacina, Silvio Ghilardi, Enrica Nicolini, Silvio Ranise, Daniele Zucchelli . . . 513

Verifying Mixed Real-Integer Quantifier Elimination
   Amine Chaieb . . . 528
Presburger Modal Logic Is PSPACE-Complete
   Stéphane Demri, Denis Lugiez . . . 541

Tree Automata with Equality Constraints Modulo Equational Theories
   Florent Jacquemard, Michael Rusinowitch, Laurent Vigneron . . . 557

Session 12. CASC-J3

CASC-J3 – The 3rd IJCAR ATP System Competition (Invited Summary)
   Geoff Sutcliffe . . . 572

Session 13. Rewriting

Matrix Interpretations for Proving Termination of Term Rewriting
   Jörg Endrullis, Johannes Waldmann, Hans Zantema . . . 574

Partial Recursive Functions in Higher-Order Logic
   Alexander Krauss . . . 589

On the Strength of Proof-Irrelevant Type Theories
   Benjamin Werner . . . 604

Consistency and Completeness of Rewriting in the Calculus of Constructions
   Daria Walukiewicz-Chrzaszcz, Jacek Chrzaszcz . . . 619

Session 14. Description Logic

Specifying and Reasoning About Dynamic Access-Control Policies
   Daniel J. Dougherty, Kathi Fisler, Shriram Krishnamurthi . . . 632

On Keys and Functional Dependencies as First-Class Citizens in Description Logics
   David Toman, Grant Weddell . . . 647

A Resolution-Based Decision Procedure for SHOIQ
   Yevgeny Kazakov, Boris Motik . . . 662

Author Index . . . 679
Mathematical Theory Exploration

Bruno Buchberger
Research Institute for Symbolic Computation
Johannes Kepler University, Linz, Austria
Mathematics is characterized by its method of gaining knowledge, namely reasoning. The automation of reasoning has seen significant advances over the past decades and, thus, the expectation was that these advances would also have significant impact on the practice of doing mathematics. However, so far, this impact is small. We think that the reason for this is the fact that automated reasoning so far concentrated on the automated proof of individual theorems whereas, in the practice of mathematics, one proceeds by building up entire theories in a step-by-step process. This process of exploring mathematical theories consists of the invention of notions, the invention and proof of propositions (lemmas, theorems), the invention of problems, and the invention and verification of methods (algorithms) that solve problems. Also, in this process of mathematical theory exploration, storage and retrieval of knowledge plays an important role. The way one proceeds in building up a mathematical theory in successive, well-structured layers has significant influence on the ease of proving individual propositions that occur in the build-up of the theory and also on the readability and explanatory power of the proofs generated. Furthermore, in the practice of mathematical theory exploration, different reasoning methods are used for different theories and, in fact, reasoning methods are expanded and changed in the process of exploring theories, whereas traditional automated reasoning systems try to get along with one reasoning method for large parts or all of mathematics. We describe a methodology for the algorithmic support of mathematical theory exploration. Starting from any mathematical knowledge base (a collection of formulae that may be axioms, definitions, propositions, lemmas, theorems, problem specifications, algorithms), in one exploration round, we introduce a few new notions by axioms or definitions and then explore the possible interactions of these notions with all existing notions as completely as possible before we proceed to the introduction of the next notions. Achieving a certain degree of completeness in the current exploration round is crucial for decreasing the complexity of proving in the subsequent exploration rounds. The invention of axioms and definitions, the invention of propositions that describe interactions between notions, the invention of problems involving the new notions, and the invention of algorithms that solve these problems is something which, typically, needs a certain degree of human intervention. However, we show that significant algorithmic support for these invention steps is possible by the use of formula schemes. Formula schemes are formulae with higher-order variables for functions and predicates that describe fundamental properties of functions and predicates, as for example commutativity, isomorphy, etc., that have proved to be useful in the practice of mathematics.
The application of formula schemes provides a kind of bottom-up support in the exploration of mathematical theories. A second idea is orthogonal to the use of formula schemes and provides top-down support in the exploration: We provide tools for the analysis of failing proofs. Failing proofs contain a lot of creative information. They give us hints about missing links in the systematic exploration of notions. For example, a failing induction proof of a conjecture may give us a hint about which lemmas we should try to prove before going back to the proof of the conjecture. The extraction of intermediate lemmas from failing proofs can, however, also be applied for the synthesis of algorithms. We illustrate the interplay between the use of formula schemes, lemma extraction from failing proofs and automated reasoning (special reasoners for special theories) in a couple of examples. Our main example is the theory of Groebner bases that has found numerous applications in many different areas of mathematics (algebraic geometry, symbolic analysis, algebraic combinatorics etc.). Particular emphasis is put on a new method for the algorithm-supported synthesis of algorithms, which we call the lazy thinking method, which combines the use of formula schemes and the extraction of conjectures from failing (automated) proofs in the following way:

– We start from a formal specification P of the problem and try out one algorithm scheme A after the other from a list of algorithm schemes. Such a scheme defines A recursively in terms of unknown subalgorithms B, C, etc.
– For the chosen scheme A, we try to prove (by an automated reasoner) the correctness theorem that states that A solves problem P. Typically, this proof will fail because nothing is known about the subalgorithms B, C, etc.
– We analyze the failing proof object and extract properties Q, R, etc. for B, C, etc. such that if B, C, etc. had these properties then the correctness proof could continue and eventually succeed. (We have a couple of heuristic rules for this extraction.)
– Properties Q, R, etc. can now be conceived as specifications for the subalgorithms B, C, etc. Now we either have suitable algorithms B, C, etc. already available in our current mathematical knowledge base (in this case we are done) or we call the lazy thinking method recursively for B, C, etc.

We demonstrate that this seemingly simple procedure has significant inventive power: It is strong enough, for example, to automatically synthesize an algorithm for the construction of Groebner bases. This is surprising because the construction of Groebner bases is a completely nontrivial problem which, in the early days of computer algebra, was even conjectured to be algorithmically unsolvable. We also give some remarks, again in the example of Groebner bases theory, on the use of special theorem provers and the expansion and change of provers during the process of mathematical theory exploration and the implications of these facts on future reasoning systems for mathematics. All the examples are given in our Theorema system, which tries to implement the above approach to mathematical theory exploration.
Searching While Keeping a Trace: The Evolution from Satisfiability to Knowledge Compilation

Adnan Darwiche
Computer Science Department
University of California, Los Angeles, CA 90095-1596, USA
[email protected]
Satisfiability testing has seen significant growth over the last decade, leading to orders of magnitude improvements in performance over a relatively short period of time. State-of-the-art algorithms for satisfiability, which are based on DPLL search, were originally meant to find a single satisfying assignment, but their scope has been extended recently to perform exhaustive searches for the purpose of counting and enumerating models. Moreover, the algorithms have been augmented with sophisticated techniques, such as component analysis and formula caching, which are critical to their performance in real-world applications. In a parallel thread of developments, work has been commencing in the area of knowledge compilation, which aims at converting knowledge bases into tractable representations that allow some hard operations to be performed in polytime on the compiled representations. Work in this area has led to the identification of a comprehensive taxonomy of tractable languages that explicates their relative succinctness and the polytime operations they support. In this talk, I will examine these two threads of developments in automated reasoning and show a deep connection that has evolved over the last few years. In particular, I will show that the trace of a search can be interpreted (and stored) as a propositional sentence, leading each search algorithm to define a propositional language consisting of sentences generated by all possible executions of the algorithm. I will show several matches between exhaustive search algorithms in common use today and well-known languages based on Binary Decision Diagrams (BDDs) and Decomposable Negation Normal Form (DNNF), which currently dominate the area of knowledge compilation. I will also discuss two implications of such matches: (1) a uniform and practical framework in which successful search techniques can be harnessed for the compilation of knowledge bases into various languages of interest, and (2) a new methodology whereby the hidden power and limitations of search algorithms can be unveiled by looking up the properties (tractability and succinctness) of corresponding propositional languages. The talk will provide an exposition of many empirical studies that reflect the state of the art in exhaustive DPLL search and knowledge compilation. The talk will also show how these developments have led to automated reasoning systems that are now forming the backbones of state-of-the-art problem solvers in various areas, including planning (probabilistic and deterministic), state estimation and diagnosis, and probabilistic reasoning in graphical models.
Representing and Reasoning with Operational Semantics

Dale Miller
INRIA & LIX, École Polytechnique
Abstract. The operational semantics of programming and specification languages is often presented via inference rules and these can generally be mapped into logic programming-like clauses. Such logical encodings of operational semantics can be surprisingly declarative if one uses logics that directly account for term-level bindings and for resources, such as are found in linear logic. Traditional theorem proving techniques, such as unification and backtracking search, can then be applied to animate operational semantic specifications. Of course, one wishes to go a step further than animation: using logic to encode computation should facilitate formal reasoning directly with semantic specifications. We outline an approach to reasoning about logic specifications that involves viewing logic specifications as theories in an object-logic and then using a meta-logic to reason about properties of those object-logic theories. We motivate the principal design goals of a particular meta-logic that has been built for that purpose.
1 Roles for Logic in the Specification of Computations
There are two broad approaches to using logic to specify computational systems. In the computation-as-model approach, computations are encoded as mathematical structures, containing such items as nodes, transitions, and state. Logic is used in an external sense to make statements about those structures. That is, computations are used as models for logical expressions. Intensional operators, such as the modals of temporal and dynamic logics or the triples of Hoare logic, are often employed to express propositions about the change in state. This use of logic to represent and reason about computation is probably the oldest and most broadly successful use of logic for specifying computation. The computation-as-deduction approach uses pieces of logic's syntax (formulas, terms, types, and proofs) directly as elements of the specified computation. In this much more rarefied setting, there are two rather different approaches to how computation is modeled. The proof normalization approach views the state of a computation as a proof term and the process of computing as normalization (via β-reduction or cut-elimination). Functional programming can be explained using proof-normalization as its theoretical basis [23]. The proof search approach views the state of a computation as a sequent (a structured collection of formulas) and the process of computing as the search for a proof of a sequent: the changes that take place in sequents capture the dynamics of computation.
Proof search has been used to provide a theoretical foundation for logic programming [33] and to justify the design of new logic programming languages [31]. The divisions proposed above are informal and suggestive: such a classification is helpful in pointing out different sets of concerns represented by these two broad approaches (reduction, confluence, etc, versus unification, backtracking search, etc). Of course, a real advance in computational logic might allow us to merge or reorganize this classification. This classification can help to categorize the various proof systems that have been used to reason about computation systems. For example, the computation-as-model approach usually implies that one divides the problem of reasoning about computation into two steps. In the first step, one implements mathematics via some set-theory or higher-order logic (for example, HOL [14], Isabelle/ZF [46], PVS [44]). In the second step, one reduces program correctness problems to mathematics. Thus, data structures, states, stacks, heaps, invariants, etc, all are represented as various kinds of mathematical objects. One then reasons directly on these objects using standard mathematical techniques (induction, primitive recursion, fixed points, well-founded orders, etc). Researchers who specify computation using the proof-normalization approach usually first implement mathematics, but this time, in a constructive mathematics, using, for example, Martin-Löf type theory [23] and higher-order intuitionistic logic or dependent type theory (for example, Coq [9] and NuPRL [8]). This paper describes another possibility for the construction of a prover that takes its inspiration from the proof search approach to the specification of computation.
2 The Proof Search Paradigm
As one builds cut-free proofs of sequents (in the sense of, say, Gentzen [12]), sequents change and this change, reading proofs bottom-up, is used to capture the dynamics of computation that one intends to model. The cut-rule and the cut-elimination process do not have roles to play during this simulation of computation: instead, they can play important roles in reasoning about specified computations. While proof search can be seen as a means of giving a broad foundation to logic programming, there are a number of aspects of proof search (as computation) that have not been embraced by the general logic programming community. For example, proof search relies primarily on proof theory for new designs and analysis tools, instead of model theory as is more customarily used by logic programming researchers. Proof search generally stresses logically correct deduction even if non-logical techniques, such as dropping all occur-checks during unification and using the ! pruning or “cut” operator of Prolog, can yield more effective executions. Also, proof search design and theory also focuses on the meaning of logical connectives and quantifiers for expressiveness and for new designs. Such a focus is in contrast to, say, constraint logic programming [21].
As we highlight in the rest of this section, the proof search paradigm allows for a relatively straightforward treatment of such "intensional" aspects of computation as binders, binder mobility, and resource management.
2.1 Encoding Symbolic Expression Via λ-Tree Syntax
Most human authored and readable symbolic expressions start life as strings: such linearly organized data are full of semantically unimportant information such as white space, infix/postfix operators, and parentheses. Before processing such concrete syntax, one removes much of this concrete nonsense by parsing the data into a more abstract representation we call here parse trees (often also called abstract syntax). Most parsing technology unfortunately leaves the names of bound variables in the resulting parse trees. Although binders are, of course, important aspects of the meaning of computational objects, the name of variables used to encode binders are another form of concrete nonsense. Since dealing with bindings in syntax is a well known problem, various techniques are available to help make this concrete and semantically meaningless aspect of syntax more abstract. One approach to bindings in syntax uses deBruijn numerals [5]. While such an encoding has proved its value within implementations, deBruijn numerals seem too explicit and complicated to support declarative reasoning of syntax. Other approaches involve the direct use of named variables and a version of set theory to accommodate names and renaming [11]. The higher-order abstract syntax (hoas) [47] approach to encoding syntax proposes that bindings in data should be mapped to the bindings found in whatever programming or specification language one is using. Within functional programming, term-level binders are then encoded as functional objects. While some interesting specifications have resulted [10,18], this approach has numerous semantic problems. For example, while expressions with bindings are still intended to be finite and syntactic objects, the corresponding functions yields values that are usually infinite in extension. Also, there are usually many more constructors for function spaces than simply the λ-abstraction within a functional programming setting: for example, recursive function definition. In contrast, the proof search approach to the specification of computation allows for a different approach to hoas. In that setting, λ-terms modulo various weak subsets of λ-conversion can be used to directly encode expressions. Here, α-conversion abstracts away from the names of bindings, β0 -conversion allows for binder mobility [28,30], and β-conversion allows for object-level substitution. We shall use the term λ-tree syntax [29] to denote the proof search approach to hoas. While there is a long history of using λ-tree syntax in specifying computation, starting with Huet and Lang [20] and Miller and Nadathur [32], few computer systems actually support it directly: the λProlog [40] programming language and the Isabelle [43] and Twelf [48] specification languages are the best known exceptions. Using meta-level application to encode object-level applications is standard practice: for example, one uses meta-level application to apply, say, cons, to two
arguments: (cons X L). The λ-tree syntax approach is simply a dualizing of this practice that uses meta-level abstraction to encode object-level binders.
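For instance, a λProlog-style declaration of untyped λ-terms maps every object-level binder to a meta-level abstraction. The following is only an illustrative sketch (the names tm, app, and abs are the constructors that also appear in Section 3; the module and signature boilerplate of an actual λProlog system is omitted):

    kind tm   type.               % syntactic category of untyped λ-terms
    type app  tm -> tm -> tm.     % object-level application
    type abs  (tm -> tm) -> tm.   % object-level abstraction

    % The object-level term λx. x x is written  abs (x\ app x x):
    % α-equivalent object terms receive βη-equal encodings of type tm.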
2.2 Encoding Computational Processes as Formula or as Terms
It seems that there are two choices one can make when encoding “active” components of a computation into proof search. Usually, these active objects, such as computation threads, automata states, etc, which we collectively call here as simply “processes”, are encoded as terms. In this process-as-term approach, predicates are then used to state relationships between computational items. For example, we have statements such as “M has value V ”, “in context Γ , M has type σ”, “P makes an A transition and becomes Q”, etc. Given such atomic formulas, one then encodes operational semantics as compound formulas within either an intuitionistic or a classical logic. For an example of encoding the πcalculus using this process-as-term approach, see [35,54] and Section 6. With the availability of linear logic and other sub-structural logics, it seems sensible to consider another style of encoding where processes are encoded directly as formulas. In the process-as-formula approach to encoding, formulas no longer denote truth values: instead they denote “resources” which can change over time. In such a setting, the combinators of a processes calculus are mapped to logical connectives and the environment of a computation thread (including, for example, memory and other threads) are modeled via a logical context (within a sequent, for example). In principle, this approach requires fewer nonlogical constants than are used with the process-as-term approach. There is a large literature of specifying programming language features in this manner using linear logic [31]. While encodings using the process-as-formula approach can often capture the notion of process execution or of reachability, they fail to directly support rich notions of program or process equivalences, such as bisimulation or observational equivalence. To capture these equivalences, the process-as-term approach has provided more successes.
3 Operational Semantics as Logic Specification
A common style of operational semantics specification is presented as inference rules involving relations. We illustrate how such semantic specifications can be mapped into logical specifications. For example, some of the rules for specifying CCS [37] are given by the following inference rules.

    P −A→ R            Q −A→ R            P −A→ P′            Q −A→ Q′
    ────────────       ────────────       ──────────────      ──────────────
    P + Q −A→ R        P + Q −A→ R        P | Q −A→ P′| Q     P | Q −A→ P | Q′

By viewing + and | as constructors of processes and · −·→ · as a predicate of three arguments, it is easy to write these inference rules as the following first-order Horn clauses.

    ∀P ∀Q∀A∀R [P −A→ R ⊃ P + Q −A→ R]
    ∀P ∀Q∀A∀R [Q −A→ R ⊃ P + Q −A→ R]
    ∀P ∀A∀P′∀Q [P −A→ P′ ⊃ P | Q −A→ P′| Q]
    ∀P ∀A∀Q′∀Q [Q −A→ Q′ ⊃ P | Q −A→ P | Q′]

For a slightly more challenging specification of operational semantics, we consider a specification of call-by-name evaluation, which involves bindings and substitution (call-by-value evaluation can also be used here just as easily). Let the type tm denote the syntactic category of untyped λ-terms and let the two constructors abs of type (tm → tm) → tm and app of type tm → tm → tm denote abstraction and application within the untyped λ-calculus, respectively. This encoding places α-equivalence classes of untyped λ-terms in one-to-one correspondence with βη-equivalence classes of terms of type tm. To specify call-by-name evaluation, we use an infix binary predicate ⇓ to denote evaluation between two arguments of type tm. Call-by-name evaluation can be specified by the following two inference rules.

                                   M ⇓ (abs λx.S)     (S[x/N ]) ⇓ V
    (abs λx.S) ⇓ (abs λx.S)       ──────────────────────────────────
                                            (app M N ) ⇓ V
Representing and Reasoning with Operational Semantics
9
type) and arr of type ty → ty → ty (denoting the function type constructor). The usual rule for simple typing is given as follows: Γ M : (arr U T ) Γ N: U Γ (app M N ): T
Γ, x: T S: U (†) Γ (abs λx.S): (arr T U )
The second inference rule has the proviso (†): x must be a new variable; that is, it is not free in T , U , nor in any of the pairs in Γ . To encode these inference rules into logic, we first pick a binary predicate typeof, whose arguments are of type tm and ty, respectively, to denote the colon relation above. Then the following formulas provide an elegant encoding of these typing inference rules. ∀M ∀N ∀T ∀U [typeof M (arr U T ) ∧ typeof N U ⊃ typeof (app M N ) T ] ∀R∀T ∀U [∀x.[typeof x T ⊃ typeof (R x) U ] ⊃ typeof (abs R) (arr T U )] Notice that these formulas are no longer Horn clauses. The use of λ-tree syntax allows for dispensing with any explicit reference to bindings. The use of the implication in the body of clauses means that the explicit context Γ is being managed implicitly by logic. The term-level binding in λx can be seen as “moving” to the formula-level binding ∀x. During proof search, this formula-level binding will be replaced with an eigenvariable: thus, this formula-level binding will move to a proof-level binding. Such binder mobility gives λ-tree syntax one of its strength: a specification does not need to specify details about how binders are encode, instead, binders only need to be moved from term-level to formula-level to prooflevel bindings. Details of binders need to be addressed only by implementors of the logic.
4
What Good Is a Logic Specification Anyway?
People working in programming language specification and implementation have a history of using declarative tools. For example, both lexical analyzers and parsers are often generated by special tools (e.g., lex and yacc) that work from such declarative specifications as regular expressions and context-free grammars. Similarly, operational semantics has been turned into interpreters via logic programming engines [4] and denotational semantics have been used to generate compilers [45]. Given a history of interest in declarative techniques to specify programming language systems, it seems natural to now focus on the question: why should anyone care that we have written an operational semantic specification or a typing relation declaratively? What benefits should arise from using λ-tree syntax, from using intuitionistic logic or linear logic? One benefit arises from the fact that logic is a difficult discipline to follow: the efforts of the specifier to hammer a specification into a declarative setting that lacks, for example, side-conditions, can often lead to new ways of thinking about what one is specifying. Such rarefied and declarative settings can also allow broad results to be inferred from specifications: for example, the fact that bisimulation
10
D. Miller
is a congruence can be established for process calculi (see, for example, [15,56]) or for functional programming languages [19] by checking syntactic conditions on the declarative specification of operational semantics. Another benefit is that an implementation of logic might provide a uniform means to animate a wide range of logic specifications. The benefit that concerns us here, however, is that a logic specification should facilitate the inferring of formal properties. While this might sound obvious, designing a “meta-logic” for reasoning about logic specifications requires some work. We motivate via some examples one particular meta-logic.
5
Example: A Subject-Reduction Theorem
Consider again the specification of evaluation and typing given in Section 3. The following theorem is usually called the type preservation or the subject-reduction theorem. The informal proof of this theorem below is taken from [24]. Theorem 1. If P evaluates to V and P has type T then V has type T . Proof. We write B to mean that there is a uniform proof of B, where uniform proofs are certain kinds of cut-free proofs that have been used to formalize the notion of goal-directed proof search [33]. Restricting to such uniform proofs in this setting does not result in a loss of completeness. We proceed by induction on the structure of a uniform proof of P ⇓ V that for all T , if typeof P T then typeof V T . Since P ⇓ V is atomic, its proof must end by backchaining on one of the formulas encoding evaluation. If the backchaining is on the ⇓ formula for abs, then P and V are both equal to abs R, for some R, and the consequent is immediate. If P ⇓ V is proved using the ⇓ formula for app, then P is of the form app M N and for some R, there are shorter proofs of M ⇓ (abs R) and (R N ) ⇓ V . Since typeof (app M N ) T , this typing relation must have been proved using backchaining and, hence, there is a U such that typeof M (arr U T ) and typeof N U . Using the inductive hypothesis, we have typeof (abs R) (arr U T ). This atomic formula must have been proved by backchaining on the typeof formula for abs, and, hence, ∀x.[typeof x U ⊃ typeof (R x) T ]. Since our meta-language is (intuitionistic) logic, we can instantiate this quantifier with N and use cut and cut-elimination to conclude that typeof (R N ) T . (This last step is essentially a “substitution lemma” which comes for free given cut elimination and our use of λ-tree syntax.) Using the inductive hypothesis a second time yields typeof V T . This proof is clear and natural and we would like our meta-logic to support similarly structured proofs. This example suggests that the following features would be valuable in the meta-logic. 1. Two distinct logics. In the above informal proof, there are clearly two distinct logics being used. One logic is written with logical syntax and describes some relations, e.g. typability and evaluation. The second logic is written
Representing and Reasoning with Operational Semantics
11
with English text: atomic formulas of that logic are (provability) judgments about the object-logic. This use of two distinct logics – one for the specifying operational semantics and typing and one for the meta-logic – is an important aspect of our approach to reasoning about computation. 2. Structural induction over object-level sequent calculus proofs was used. Obviously induction over other structures (e.g., expressions, formulas) and coinduction play important roles in meta-level reasoning about computation. 3. The instantiation of meta-level eigenvariables was used in this proof. In particular, the variable P was instantiated in one part of the proof to (abs R) and in another part of the proof to (app M N ). Notice that such instantiation of eigenvariables within a proof does not happen in proof search in conventional sequent calculi. 4. The inversion of assumed judgment was used in the above proof a few times, leading, for example, from the assumption typeof (abs R) (arr U T ) to the assumption ∀x[typeof x U ⊃ typeof (R x) T ]. The specification of typeof allows the implication to go in the other direction, but given the structure of the specification of typeof, this direction can also be justified at the meta-level. The system Twelf is capable of proving such type preservation properties along rather similar lines, except that an explicit meta-logic with an explicit induction rule is replaced by a meta-level tool that checks properties such as coverage and termination [51]. In the example above, bindings in the object-logic and object-language played a small role: they were treated only by instantiation. In the next section, we consider the π-calculus since it provides a more challenging problem for dealing with bindings in syntax and in computations.
6
Example: A π-Calculus Specification
To encode the syntax of the π-calculus, let the types p, n, and a denote the syntactic category of processes, names, and actions, respectively. A signature for the π-calculus can thus be listed as 0 : p, out : n → n → p → p, in : n → (n → p) → p, +, | : p → p → p, match : n → n → p → p, ν : (n → p) → p. For example, the expression x(y).P , where x is a name and y is a binding with scope P , can be encoded using a constructor in as the expression (in x (λy.P )). Similarly, the restriction operator νx.P can be encoded as ν(λx.P ). We next introduce three constructors for actions: τ denotes the silent action and the down arrow ↓ and up arrow ↑ encode input and output actions, resp: in particular, the expression (↓xy) denotes an input action on channel x of value y. Notice that the two expressions, λy.↑xy and ↑x, denoting abstracted actions, are equal up to η-conversion and can be used interchangeably. To specifying the operational semantics of the π-calculus, we use the horizontal arrow −−→ to relate a process with an action and a continuation (a process),
12
D. Miller
and the “harpoon” −− to relate a process with an abstracted action and an abstracted continuation (of types n → a and n → p, resp.). The following three rules (named (close), (res), (open)) are part of the specification of one-step transitions for the π-calculus: the full specification using λ-tree syntax can be found in, for example, [34,36]. ↓X
P −− M
↑X
Q −− N
τ
P | Q −−→ νy.(M y | N y)
↑Xy
A
∀n(N n −−→ M n)
∀y(N y −−→ M y)
A
νn.N n −−→ νn.M n
λy.↑Xy
νy.N y −− λy.M y
The (close) rule describes how a bound input and bound output action can yield a τ step with a ν-restricted continuation. The (res) rule illustrates how λ-tree syntax and appropriate quantification can remove the need for side conditions: since substitution in logic does not allow for the capture of bound variables, all instances of the premise of this rule have a horizontal arrow in which the action label does not contain the universally quantified variable free. Thus, the usual side condition for this rule is treated declaratively. There is a direct translation of such inference rules into, say, λProlog [40], in such a way that one can directly animate the operational semantics of the π-calculus.
7
Example: Bisimulation for the π-Calculus
There seems to be something questionable about the use of the universal quantifier in the premises of the operational semantics for the π-calculus above. For A
example, the (res) rule says that if N n −−→ M n is provable for all instances A
of the free variable n then the transition νn.N n −−→ νn.M n is justified. This does not seem to be a completely correct sense of what is implied by the original specification rule of the π-calculus. A more correct sense of the rule should be A
something like: if N n −−→ M n is provable for some new name n, then the above conclusion is justified. In a proof search setting involving only positive inference about computation (for example, judgments involving only may behavior of a process), such a quantifier appears only positively and is instantiated with a new (proof-level bound) variable called an eigenvariable. In this setting, the notion of new name is supported well by the universal quantifier. If, however, negative information is being inferred, as is possible with judgments involving must behaviors, then the universal quantifier is instantiated with any number of existing names. This seems like the wrong meaning for this rule. To illustrate this example more concretely, note that for any name x, the process νy.[x = y]¯ xz is bisimilar to 0: that is, this process can make no transitions. This fact also seems to follow from the nature of bindings: the scope of the bindings for x and for y are such that any instance of x can never equal y (a simple consequence of that fact that sound substitutions avoid variable capture). Now, proving this bisimulation fact should be equivalent to proving A
∀x∀A∀P ¬(νy.[x = y]¯ xz −−→ P )
Representing and Reasoning with Operational Semantics
13
Using the above operational semantics, this should be equivalent to proving A
∀x∀A∀P ¬∀y([x = y]¯ xz −−→ P ) and
A
∀x∀A∀P ∃y¬([x = y]¯ xz −−→ P )
Now it seems that standard proof theory techniques will not achieve a proof: somehow we need to have additional information that for every name there exists another name that is distinct from it. Adopting such an axiom is often done in many settings, but this seems to go against the usual spirit of sequent calculus (a system usually containing no axioms) and against the idea that proof theory is, in fact, an ideal setting to deal with notions of bindings and scope directly. To come up with a proof theoretic approach to address this problem with using the ∀-quantifier in operational semantics, Miller and Tiu [35,36] introduced the ∇-quantifier: the informal reading of ∇x.Bx, in both positive and negative settings, is equivalent to Bx for a new name x. To support this new quantification, sequent calculus is extended to support a notion of “generic judgments” so that “newness” remains a (proof-level) binding and can be seen as being hypothetical. That is, the truth condition of ∇x.Bx roughly reduces to the conditional “if a new name c is created then Bc.” Notice that no assumption about whether or not the domain of quantification is non-empty is made (this detail makes ∇ behave differently from the Gabbay-Pitts “newness quantifier” [11]). If one is interested only in establishing one-step transitions (and not their negation), then it is possible to use ∇ and ∀ in the premises of the operational semantics for the π-calculus interchangeably. Using ∇-quantification instead of ∀-quantification in the premise of the (res) rule does, in fact, allow proving the formula A
∀x∀A∀P ¬(νy.[x = y]¯ xz −−→ P ), A
since this now reduces to ∀x∀A∀P ∇y¬([x = y]¯ xz −−→ P ). If one follows the proof theory for ∇ carefully [36] this negation is provable because the expressions λy.x and λy.y do not unify (for free variable x). Notice that the binding of y is maintained all the way to the level of unification where, in this case, it ensures the correct failure to find an appropriate instance for x. Using the ∇-quantifier, it is now easy and natural to specify bisimulation for the π-calculus with the equivalence displayed in Figure 1. Notice the elegant parA
A
A
A
↓X
↓X
↓X
↓X
↑X
↑X
↑X
↑X
bisim P Q ≡ ∀A∀P [P −−→ P ⇒ ∃Q .Q −−→ Q ∧ bisim P Q ] ∧ ∀A∀Q [Q −−→ Q ⇒ ∃P .P −−→ P ∧ bisim Q P ] ∧ ∀X∀P [P −− P ⇒ ∃Q .Q −− Q ∧ ∀w.bisim (P w) (Q w)] ∧ ∀X∀Q [Q −− Q ⇒ ∃P .P −− P ∧ ∀w.bisim (Q w) (P w)] ∧ ∀X∀P [P −− P ⇒ ∃Q .Q −− Q ∧ ∇w.bisim (P w) (Q w)] ∧ ∀X∀Q [Q −− Q ⇒ ∃P .P −− P ∧ ∇w.bisim (Q w) (P w)] Fig. 1. A specification of bisimulation for the π-calculus
Notice the elegant parallelism between using ∀ to quantify bisimulation of abstracted continuations for bound inputs and using ∇ to quantify bisimulation of abstracted continuations for bound outputs. As is shown in [54], proving this formula in intuitionistic logic yields open bisimulation [49]. Without an inference rule for co-induction, this equivalence is only correct for the finite π-calculus (the subset of the π-calculus that contains neither the replication operator ! nor recursive definitions of processes). If we add the excluded-middle assumption ∀w∀z(w = z ∨ w ≠ z) (which is trivially true in a classical meta-theory), the resulting specification can be used to specify late bisimulation [38] (see [54] for the precise statements regarding this specification). A logic programming-style implementation of proof search provides an immediate symbolic bisimulation checker (in the sense of [17,3]) for the finite π-calculus [54,53].
8
The LINC Meta-logic
The three examples above allow us to now motivate the design of a meta-logic that can be used to state properties of object-level provability and, hence, to reason about operational semantics (via their encodings into object-logics). Our meta-logic is called LINC, an acronym coined by Tiu [52] for “lambda, induction, nabla, co-induction.” This logic contains the following three key features. First, LINC is based on the predicative and intuitionistic fragment of Church Simple Theory of Types [6] (restricted to the axioms 1 – 6). Provability for this logic can be described as being essentially Gentzen’s LJ sequent calculus [12] to which is added simple typing for all variables and constants, quantification at all types (excluding the type of predicates), and an inference rule that allows βη-conversion on any formula in a sequent. This logic provides support for λ-tree syntax. Considering a classical logic extension of LINC is also of some interest, as is an extension allowing for quantification at predicate type. Second, LINC incorporates the proof-theoretical notion of definition, a simple and elegant device for extending logic with the if-and-only-if closure of a logic specification (similar to the closed-world assumption [7]). This notion of definition was developed by Halln¨ as and Schroeder-Heister [16,50] and, independently, by Girard [13]. This feature of definitions allows for the “inversion of assumed judgments” mentioned at the end of Section 5 and for the ability to capture not just may behavior but also must behavior. In particular, definitions are central to the treatment of bisimulation mentioned in Section 7 (see also [27]) and for doing model checking directly with operational semantics (see, for example, [53,55]). It also allows for certain failures of proof search to be turned into successful proofs of negations. Definitions are also a natural place to incorporate inductive and co-inductive inference rules: for full details, see paper by McDowell, Miller, Momigliano, and Tiu [25,26,35,39,52]. Third, LINC contains the ∇ quantifier, which, as we just illustrated, allows for more natural and direct reasoning about syntactic encodings based on λ-tree syntax.
As Tiu has shown in [52], under the restriction to appropriately "stratified" definitions (a restriction which rules out, for example, a predicate being defined as its own negation), the LINC logic satisfies cut-elimination. The logic FOλΔIN of [25,26] is a subset of LINC, corresponding roughly to the fragment that results from deleting the ∇-quantifier, removing co-induction, and limiting induction to natural number induction. The principal use of the ∇-quantifier is to help with the treatment of bindings in λ-tree syntax encodings. In fact, we know of no use of ∇ in specifications that involve only, say, first-order terms. It is also the case that ∇ is interchangeable with ∀ when definitions are "positive": that is, when they contain no occurrences of implications and negations. In such Horn clause-like definitions, one can interchange these two quantifiers in the body of definitions without affecting the atomic formulas that are provable [36].
9
Formal Reasoning About Logic Specifications
Figure 2 presents an architecture for organizing the various symbolic systems involved with the specification of and reasoning about computation systems. The top level contains the many applications about which we hope to provide formal proofs. Possible applications should include programming languages, specification languages, security protocols, type systems, etc. The middle layer contains a few object-level logics, such as Horn clauses (HC), hereditary Harrop formulas (HH) [33], and linear logic (LL). These logics all have well understood meta-theories and their operational semantics is given by proof search following the normal forms dictated by uniform proofs and backchaining [33] or focused proofs [1]. In fact, all of these logics can be seen as modularly sitting inside one single logic, namely, linear logic.
[Fig. 2 depicts a three-level architecture: level I (applications) includes the π-calculus, the λ-calculus, Idealized Algol, Concurrent-ML, and others; level II (object-logics) includes HC, HH, LL, and others; level III (meta-logic) contains LINC.]
Fig. 2. A three level architecture
The bottom layer consists of the single logic LINC, where object-level provability must be encoded (as an abstracted logic programming interpreter) and important results about object-level provability (including cut-elimination) must be proved. Also, object-level logic specifications used to capture aspects of an application must also be encoded into the meta-level. Since the meta-logic and object-logic share the same application and abstraction, terms used to encode application-level objects (for example, a π-calculus expression) are the same at both levels. To illustrate these three-levels, consider the proof of the subject-reduction theorem in Section 5. The application level contains two classes of linguistic items: untyped λ-terms (constructed using abs and app) and simple type expressions (constructed using gnd and arr). The formulas in Section 5 that specify evaluation and typing are object-level (hereditary Harrop) formulas. At the metalevel, such formulas are simply terms, where object-level predicates, such as ⇓ and typeof, are now binary constructor in the meta-logic and where object-level logic connectives and quantifiers are also meta-level term constructors (requiring meta-level λ-abstraction to encode quantifiers). Formulas at the LINC (metalogic) level must now encode the notion of provability for the object-level logic as well as any other judgments that are specific to the application being considering (such as, say, bisimulation). For example, provability of hereditary Harrop formulas can be defined in LINC via a predicate, say, seq Γ B to describe when the object-level formula B is provable from the list of object-level formulas in Γ and the object-level formulas describing evaluation and typing (see [26, Section 4.3] for specifics on how this can be done). The meta-level formula that one wishes to prove within LINC is then ∀P ∀V [seq nil (P ⇓ V ) ⊃ ∀T [seq nil (typeof P T ) ⊃ seq nil (typeof V T )]] This and many similar theorems are proved in [26,36]. For another example, consider again the π-calculus examples given above. In Section 6, operational semantics was given using Horn clauses that allowed ∀ in their bodies. When we moved to Section 7, we needed to make sure that these ∀-quantifiers were replaced by ∇-quantification. This transition is now easily explained: when specifying at the LINC level an interpreter for object-level Horn clauses, that interpreter will naturally translate the object-level conjunction and existential quantifier to the meta-level conjunction and existential quantifier. It will, however, need to translate the object-level universal quantifier to the meta-level ∇-quantifier (for full details, see [36, Section 6]).
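To make the quantifier translation concrete, an interpreter for such an object-logic can be given as a definition for seq whose clauses have roughly the following shape (this is only a schematic illustration of the translation just described, not the precise clauses, which appear in [26, Section 4.3] and [36, Section 6]):
seq Γ (B & C) ≡ seq Γ B ∧ seq Γ C
seq Γ (∃x. B x) ≡ ∃x. seq Γ (B x)
seq Γ (∀x. B x) ≡ ∇x. seq Γ (B x)
with atomic object-level formulas handled by backchaining over Γ and the encoded specification clauses.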
10
Future Work and Conclusions
We have described how logic can be used to specify operational semantics: the logics used for this purpose are essentially logic programming languages based in either classical, intuitionistic, or linear logic. These logics generally use higher-type quantification in order to support λ-tree syntactic representation. Logic is also used to reason about specifications made in this first logic. This second
logic is thus a meta-logic for reasoning about provability in those object-logics. A particular meta-logic, LINC, is based on intuitionistic logic and incorporates the ∇-quantifier and principles of induction and co-induction. Armed with the meta-logic LINC, with several interesting examples of using it to reason about computation, and with several years of experience with implementing proof search systems involving the unification of λ-terms, it is now time to build prototype theorem provers for LINC and develop larger examples. Already, we can use λProlog [40] via its Teyjus implementation [42] to animate specifications given in a number of object-logics. A simple model checking-style generalization of (part of) λProlog has also been implemented and used to verify various simple properties of, say, the π-calculus [55,53]. One of the goals of the Parsifal project at INRIA is to use this two-level logic approach to reason formally about operational semantics, say, in the context of the POPLmark challenge [2]. We also hope to use this framework to reason about specification logics themselves: for example, to prove soundness of logics used to annotate programming languages for extended static checking, such as the ESC/Java2 object logic [22]. Consistency of two simpler object-logics has been proved in [26] by showing formally in (a subset of) LINC that cut-elimination holds for them. Acknowledgments. I am grateful to the anonymous reviewers for their helpful comments on an earlier draft of this paper. This work has been supported in part by the INRIA "Équipes Associées" Slimmer, ACI grants GEOCAL and Rossignol, and the Information Society Technologies programme of the European Commission, Future and Emerging Technologies under the IST-2005-015905 MOBIUS project. This paper reflects only the author's views and the Community is not liable for any use that may be made of the information contained therein.
References 1. J.-M. Andreoli. Logic programming with focusing proofs in linear logic. J. of Logic and Computation, 2(3):297–347, 1992. 2. B. E. Aydemir, A. Bohannon, M. Fairbairn, J. N. Foster, B. C. Pierce, P. Sewell, D. Vytiniotis, G. Washburn, S. Weirich, and S. Zdancewic. Mechanized metatheory for the masses: The PoplMark challenge. In Theorem Proving in Higher Order Logics: 18th International Conference, LNCS, pages 50–65. Springer-Verlag, 2005. 3. M. Boreale and R. D. Nicola. A symbolic semantics for the π-calculus. Information and Computation, 126(1):34–52, April 1996. 4. P. Borras, D. Cl´ement, T. Despeyroux, J. Incerpi, G. Kahn, B. Lang, and V. Pascual. Centaur: the system. In Proceedings of SIGSOFT’88: Third Annual Symposium on Software Development Environments (SDE3), Boston, 1988. 5. N. Bruijn. Lambda calculus notation with namefree formulas involving symbols that represent reference transforming mappings. Indag. Math., 40(3):348–356, 1979. 6. A. Church. A formulation of the simple theory of types. J.of Symbolic Logic, 5:56–68, 1940.
7. K. L. Clark. Negation as failure. In J. Gallaire and J. Minker, editors, Logic and Data Bases, pages 293–322. Plenum Press, New York, 1978. 8. R. L. Constable et al. Implementing Mathematics with the Nuprl Proof Development System. Prentice-Hall, 1986. 9. T. Coquand and G. Huet. The calculus of constructions. Information and Computation, 76(2/3):95–120, February/March 1988. 10. J. Despeyroux, A. Felty, and A. Hirschowitz. Higher-order abstract syntax in Coq. In Second International Conference on Typed Lambda Calculi and Applications, pages 124–138, April 1995. 11. M. J. Gabbay and A. M. Pitts. A new approach to abstract syntax with variable binding. Formal Aspects of Computing, 13:341–363, 2001. 12. G. Gentzen. Investigations into logical deductions. In M. E. Szabo, editor, The Collected Papers of Gerhard Gentzen, pages 68–131. North-Holland, Amsterdam, 1969. 13. J.-Y. Girard. A fixpoint theorem in linear logic. An email posting to the mailing list
[email protected], February 1992. 14. M. Gordon. HOL: A machine oriented formulation of higher-order logic. Technical Report 68, University of Cambridge, July 1985. 15. J. F. Groote and F. Vaandrager. Structured operational semantics and bisimulation as a congruence. Information and Computation, 100:202–260, 1992. 16. L. Halln¨ as and P. Schroeder-Heister. A proof-theoretic approach to logic programming. II. Programs as definitions. J. of Logic and Computation, 1(5):635–660, October 1991. 17. M. Hennessy and H. Lin. Symbolic bisimulations. Theoretical Computer Science, 138(2):353–389, Feb. 1995. 18. M. Hofmann. Semantical analysis of higher-order abstract syntax. In 14th Symp. on Logic in Computer Science, pages 204–213. IEEE Computer Society Press, 1999. 19. D. J. Howe. Proving congruence of bisimulation in functional programming languages. Information and Computation, 124(2):103–112, 1996. 20. G. Huet and B. Lang. Proving and applying program transformations expressed with second-order patterns. Acta Informatica, 11:31–55, 1978. 21. J. Jaffar and J.-L. Lassez. Constraint logic programming. In Proceedings of the 14th ACM Symposium on the Principles of Programming Languages, 1987. 22. J. R. Kiniry, P. Chalin, and C. Hurlin. Integrating static checking and interactive verification: Supporting multiple theories and provers in verification. In VSTTE’05, Proceedings of Verified Software: Theories, Tools, Experiements, Zurich, Switzerland, October 2005. 23. P. Martin-L¨ of. Constructive mathematics and computer programming. In Sixth International Congress for Logic, Methodology, and Philosophy of Science, pages 153–175, Amsterdam, 1982. North-Holland. 24. R. McDowell and D. Miller. A logic for reasoning with higher-order abstract syntax. In G. Winskel, editor, 12th Symp. on Logic in Computer Science, pages 434–445, Warsaw, Poland, July 1997. IEEE Computer Society Press. 25. R. McDowell and D. Miller. Cut-elimination for a logic with definitions and induction. Theoretical Computer Science, 232:91–119, 2000. 26. R. McDowell and D. Miller. Reasoning with higher-order abstract syntax in a logical framework. ACM Trans. on Computational Logic, 3(1):80–136, 2002. 27. R. McDowell, D. Miller, and C. Palamidessi. Encoding transition systems in sequent calculus. Theoretical Computer Science, 294(3):411–437, 2003. 28. D. Miller. A logic programming language with lambda-abstraction, function variables, and simple unification. J. of Logic and Computation, 1(4):497–536, 1991.
29. D. Miller. Abstract syntax for variable binders: An overview. In J. Lloyd and et. al., editors, Computational Logic - CL 2000, number 1861 in LNAI, pages 239–253. Springer, 2000. 30. D. Miller. Bindings, mobility of bindings, and the ∇-quantifier. In J. Marcinkowski and A. Tarlecki, editors, 18th International Workshop CSL 2004, volume 3210 of LNCS, page 24, 2004. 31. D. Miller. Overview of linear logic programming. In T. Ehrhard, J.-Y. Girard, P. Ruet, and P. Scott, editors, Linear Logic in Computer Science, volume 316 of London Mathematical Society Lecture Note, pages 119 – 150. Cambridge University Press, 2004. 32. D. Miller and G. Nadathur. A logic programming approach to manipulating formulas and programs. In S. Haridi, editor, IEEE Symposium on Logic Programming, pages 379–388, San Francisco, September 1987. 33. D. Miller, G. Nadathur, F. Pfenning, and A. Scedrov. Uniform proofs as a foundation for logic programming. Annals of Pure and Applied Logic, 51:125–157, 1991. 34. D. Miller and C. Palamidessi. Foundational aspects of syntax. ACM Computing Surveys, 31, September 1999. 35. D. Miller and A. Tiu. A proof theory for generic judgments: An extended abstract. In 18th Symp. on Logic in Computer Science, pages 118–127. IEEE, June 2003. 36. D. Miller and A. Tiu. A proof theory for generic judgments. ACM Trans. on Computational Logic, 6(4):749–783, Oct. 2005. 37. R. Milner. Communication and Concurrency. Prentice-Hall International, 1989. 38. R. Milner, J. Parrow, and D. Walker. A calculus of mobile processes, Part II. Information and Computation, pages 41–77, 1992. 39. A. Momigliano and A. Tiu. Induction and co-induction in sequent calculus. In M. C. Stefano Berardi and F. Damiani, editors, Post-proceedings of TYPES 2003, number 3085 in LNCS, pages 293 – 308, January 2003. 40. G. Nadathur and D. Miller. An Overview of λProlog. In Fifth International Logic Programming Conference, pages 810–827, Seattle, August 1988. MIT Press. 41. G. Nadathur and D. Miller. Higher-order Horn clauses. Journal of the ACM, 37(4):777–814, October 1990. 42. G. Nadathur and D. J. Mitchell. System description: Teyjus—a compiler and abstract machine based implementation of Lambda Prolog. In H. Ganzinger, editor, Proceedings of the 16th International Conference on Automated Deduction, pages 287–291, Trento, Italy, July 1999. Springer-Verlag LNCS. 43. T. Nipkow, L. C. Paulson, and M. Wenzel. Isabelle/HOL: A Proof Assistant for Higher-Order Logic. Springer, 2002. LNCS Tutorial 2283. 44. S. Owre, J. M. Rushby, , and N. Shankar. PVS: A prototype verification system. In D. Kapur, editor, 11th International Conference on Automated Deduction (CADE), volume 607 of LNAI, pages 748–752, Saratoga, NY, jun 1992. Springer-Verlag. 45. L. Paulson. Compiler Generation from Denotational Semantics. In B. Lorho, editor, Methods and Tools for Compiler Construction, pages 219–250. Cambridge Univ. Press, 1984. 46. L. C. Paulson and K. Gr¸abczewski. Mechanizing set theory: Cardinal arithmetic and the axiom of choice. J. of Automated Deduction, 17(3):291–323, Dec. 1996. 47. F. Pfenning and C. Elliott. Higher-order abstract syntax. In Proceedings of the ACM-SIGPLAN Conference on Programming Language Design and Implementation, pages 199–208. ACM Press, June 1988. 48. F. Pfenning and C. Sch¨ urmann. System description: Twelf — A meta-logical framework for deductive systems. In H. Ganzinger, editor, 16th Conference on Automated Deduction, number 1632 in LNAI, pages 202–206, Trento, 1999. Springer.
49. D. Sangiorgi. A theory of bisimulation for the π-calculus. Acta Informatica, 33(1):69–97, 1996. 50. P. Schroeder-Heister. Rules of definitional reflection. In M. Vardi, editor, Eighth Annual Symposium on Logic in Computer Science, pages 222–232. IEEE Computer Society Press, IEEE, June 1993. 51. C. Sch¨ urmann and F. Pfenning. A coverage checking algorithm for LF. In Theorem Proving in Higher Order Logics: 16th International Conference, TPHOLs 2003, volume 2758 of LNCS, pages 120–135. Springer-Verlag, 2003. 52. A. Tiu. A Logical Framework for Reasoning about Logical Specifications. PhD thesis, Pennsylvania State University, May 2004. 53. A. Tiu. Model checking for π-calculus using proof search. In M. Abadi and L. de Alfaro, editors, CONCUR, volume 3653 of LNCS, pages 36–50. Springer, 2005. 54. A. Tiu and D. Miller. A proof search specification of the π-calculus. In 3rd Workshop on the Foundations of Global Ubiquitous Computing, volume 138 of ENTCS, pages 79–101, Sept. 2004. 55. A. Tiu, G. Nadathur, and D. Miller. Mixing finite success and finite failure in an automated prover. In Proceedings of ESHOL’05: Empirically Successful Automated Reasoning in Higher-Order Logics, pages 79 – 98, December 2005. 56. A. Ziegler, D. Miller, and C. Palamidessi. A congruence format for name-passing calculi. In Proceedings of SOS 2005: Structural Operational Semantics, Electronic Notes in Theoretical Computer Science, Lisbon, Portugal, July 2005. Elsevier Science B.V.
Flyspeck I: Tame Graphs Tobias Nipkow, Gertrud Bauer, and Paula Schultz† Institut für Informatik, TU München
Abstract. We present a verified enumeration of tame graphs as defined in Hales’ proof of the Kepler Conjecture and confirm the completeness of Hales’ list of all tame graphs while reducing it from 5128 to 2771 graphs.
1
Introduction
In 1611 Kepler asserted what every cannoneer of the day must have known, that the so-called cannonball packing is a densest arrangement of 3-dimensional balls of the same size. In 1900 this assertion became part of Hilbert’s 18th problem. In 1998 Thomas Hales announced the first (by now) accepted proof. It involves 3 distinct large computations. After 4 years of refereeing by a team of 12 referees, an abridged version was published only recently [5]. The referees declared that they were 99% certain of the correctness of the proof. The programs were merely given a “diagonal look”. Dissatisfied with this state of affairs Hales started the informal open-to-all collaborative flyspeck project (www.math.pitt.edu/∼thales/flyspeck) to formalize the whole proof with a theorem prover. This paper is the first definite contribution to flyspeck. Hales’ proof goes roughly like this: any potential counter example (denser packing) gives rise to a tame plane graph, where tameness is a very specific notion; enumerate all (finitely many) tame graphs (by computer); for each of them check (again by computer) that it cannot constitute a counter example. For modularity reasons Hales provided the Archive, a collection of files with (hopefully) all tame graphs. We recast Hales’ Java program for the enumeration of all tame graphs in logic (Isabelle/HOL), proved its completeness, ran it, and compared the output to Hales’ Archive. It turns out that Hales was right, the Archive is complete, although redundant (there are at most 2771, not 5128 tame graphs), and that one tameness condition present in his Java program was missing from his proof text. Apart from the contribution to Hales’ proof, this paper demonstrates that theorem provers can not just formalize known results but can help in establishing the validity of emerging state-of-the-art mathematical proofs. An intrinsic feature of this proof, which it shares with Gonthier’s proof of the Four Colour Theorem [3], is the need to perform massive computations involving the defined functions (§1.3, §5). Hence efficiency is a concern: during the development phase it is not very productive if, after every change, it takes a week to rerun the proof to find that the change broke it. Part of the achievement of our work is narrowing the gap between the specification of tameness and the enumeration (to simplify the proof) without compromising efficiency unduly. U. Furbach and N. Shankar (Eds.): IJCAR 2006, LNAI 4130, pp. 21–35, 2006. c Springer-Verlag Berlin Heidelberg 2006
Here is one motivating glimpse of where the tame graphs come from: each one is the result of taking a cluster of balls and projecting the centers of the balls on to the surface of the ball in the center, connecting two centers if they are within a certain distance. For an eminently readable overview of Hales’ proof see [4], for the details see [5], and for the full Monty read [6] and accompanying articles in the same volume. For Hales’ Java program and his Archive see his web page or the online material accompanying [5]. The gory details of our own work can be found in the Isabelle theories available from the first author’s home page and the Archive of Formal Proofs (afp.sourceforge.net). The thesis [1] also contains full details but is a precursor to the work presented here: the enumeration and the proof have changed considerably and the proof has been completed. 1.1
Related Work
Obua [10] and Zumkeller [11] work towards the verification of the remaining computational parts in Hales’ proof. Gonthier’s proof of the Four Colour Theorem [3] is very much concerned with efficient data structures for various computations on plane graphs, a feature it shares with our proof. At the same time he employs hypermaps as a unifying representation of plane graphs. Potentially, hypermaps could play the same role in the less computational but mathematically more abstract parts of flyspeck. 1.2
Basic Notation
Isabelle/HOL [9] conforms largely to everyday mathematical notation. This section introduces further non-standard notation and in particular a few basic data types with their primitive operations. Types. The basic types of truth values, natural numbers and integers are called bool, nat, and int. The space of total functions is denoted by ⇒. Type variables are written a, b, etc. The notation t::τ means that term t has type τ . Sets (type a set ) come with their usual syntax. The pointwise image of set M under a function f is written f ‘ M. Lists (type a list ) come with the empty list [], the infix constructor · , the infix @ that appends two lists, and the conversion function set from lists to sets. Function hd yields the head of a list, last the last element, and butlast drops the last element. Variable names ending in “s” usually stand for lists, |xs| is the length of xs, distinct xs means that the elements of xs are all distinct. Instead of map f xs and filter P xs we frequently write [f x . x ∈ xs] and [x ∈xs . P x ]. 1.3
Proof by Evaluation
Many theorem provers (ACL2, Coq and PVS) have the ability to evaluate functions during proof by compilation into some efficient format followed by execution. Isabelle has been able to generate ML code for some time now [2]. Recently this has been made available as an inference rule: given a term t, all functions in t are compiled to ML (provided this is possible), t is reduced to u by the ML
system, and the result is turned into the theorem t = u. Essentially, the ML system is used as an efficient term rewriting engine. To speed things up further, nat and int are implemented by arbitrary precision ML integers. In order to remove constructs which are not by themselves executable, code generation is preceded by a preprocessor that rewrites the term with specified lemmas. For example, the lemmas ∀ x ∈set xs. P x ≡ list-all P xs and ∃ x ∈set xs. P x ≡ list-ex P xs replace bounded quantifiers over lists with executable functions. This is the key to bridging the gap between specification and implementation automatically and safely. Because lists are executable but sets are not, we sometimes phrase concepts in terms of lists rather than sets to avoid having to define the concept twice and having to provide a trivial implementation proof.
2
Plane Graphs
Following Hales we represent finite, undirected, plane graphs as lists (= finite sets) of faces and faces as lists of vertices. Note that by representing faces as lists they have an orientation. Our enumeration of plane graphs requires an additional distinction between final and nonfinal faces. This flag is attached directly to each face: vertex = nat, face = vertex list × bool,
graph = face list
The projection functions for faces are called vertices and final. The size of a face is the length of its vertex list. Occasionally we call a list of vertices a face, too. A graph is final if all its faces are final. Function F returns the set of faces of a graph, i.e. is a synonym for function set. Function V returns the set of vertices in a graph, countVertices the number of vertices. Given a graph g and a vertex v, facesAt g v computes the list of faces incident to v. For navigation around a face f we consider its list of vertices as cyclic and introduce the following notation: if v is a vertex in f then f · v is the vertex following v and f^i · v is the ith vertex following v (where i may also be −1 ). This description is ambiguous if there are multiple occurrences of v in f, but this cannot arise in our context. Representing faces as lists means that we want to regard two vertex lists us and vs as equivalent if one can be obtained from the other by rotation, in which case we write us ∼= vs. We introduce the notation
x ∈∼= M ≡ ∃ y∈M . x ∼= y,
M ⊆∼= N ≡ ∀ x ∈M . x ∈∼= N
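As a small illustration of this cyclic reading of faces, the successor operation and rotation equivalence can be written as follows (a Python sketch only, with vertices as integers; it is not part of the formalization):

    def next_vertex(f, v, i=1):
        # f^i · v: the i-th vertex after v on the cyclic face f (i may be -1)
        return f[(f.index(v) + i) % len(f)]

    def congruent(us, vs):
        # us ∼= vs: vs is a rotation of us
        return len(us) == len(vs) and \
               any(us[i:] + us[:i] == vs for i in range(len(us)))

For example, congruent([1, 2, 3], [2, 3, 1]) holds, while congruent([1, 2, 3], [3, 2, 1]) does not; reversal of orientation is treated separately in Section 4.1.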
Throughout most of this paper we pretend that a graph is just a face list, but in reality it is more complicated. To avoid recomputation, countVertices and facesAt are (hidden) components of the graph. 2.1
Enumeration of Plane Graphs
Not every list of faces constitutes a plane graph. Hence we need additional means of characterizing planarity. We have chosen an operational characterization, i.e.
an executable enumeration due to Hales. The reason is that we can then view the enumeration of tame graphs as a restriction of the enumeration of plane graphs. The justification for not starting with a more abstract traditional characterization of planarity is that this is the first definite contribution to flyspeck and it is not yet clear which notion of planarity is most suitable for the rest of the project. In a nutshell, we wanted to concentrate on the new (and unchecked by referees!) enumeration of tame graphs rather than the mathematically well understood issue of planarity. The graphs required for Hales’ proof are plane graphs with at least 2 faces (including the outer one), where each face is a simple polygon of size ≥ 3. In the sequel the term plane refers to this class. Hales’ enumeration of plane graphs proceeds inductively: you start with a seed graph with two faces, the final outer one and the (reverse) nonfinal inner one. If a graph contains a nonfinal face, it can be subdivided into a final face and any number of nonfinal ones as shown below. Final faces are grey, nonfinal ones white. The unbounded grey square indicates the unbounded outer face.
Because a face can be subdivided in many ways, this process defines a tree of graphs. By construction the leaves must be final graphs, and they are the plane graphs we are interested in: any plane graph (in the above sense) of n faces can be generated in n − 1 steps by this process, adding one (final) face at a time. This definition is also meant to serve as the basis of the actual enumeration. Hence we reduce its redundancy, i.e. the number of times each graph is generated, by the following means: – The enumeration is parameterized by a natural number p which controls the maximal size of final faces in the generated graphs. The seed graph contains two (p + 3)-gons and the final face created in each step may at most be a (p + 3)-gon. As a result, different parameters lead to disjoint sets of graphs. Note that the nonfinal faces may (and need to be) of arbitrary size. – In each step we subdivide only one fixed face and the new final face always shares one fixed edge with the subdivided face; which face and edge are chosen is immaterial. This does not affect the set of final graphs that can be generated but merely the order in which the final faces are created. Formalization. Now we are ready for the top level formal specification: PlaneGraphs ≡
⋃p {g | Seed p [next-plane p ]→∗ g ∧ final g}
where Seed p ≡ [([0 ,. . .,p+2 ],True), ([p+2 ,. . .,0 ],False)] is the seed graph described above. Notation g 0 [f ]→∗ g is simply suggestive syntax for (g 0 , g) ∈
{(g, g ) | g ∈ set (f g)}∗ , i.e. we can reach g from g 0 in finitely many steps via function f :: graph ⇒ graph list. In our case f is next-plane p which maps a graph to a list of successor graphs: next-plane p g ≡ let fs = [f ∈faces g . ¬ final f ] in if fs = [] then [] else let f = minimalFace fs; v = minimalVertex g f in i∈ [3 ..p + 3 ] generatePolygon i v f g
If there are only final faces, we are done. Otherwise we pick a minimal nonfinal face (in terms of size) and a minimal vertex within that face. Minimality of the vertex refers to its distance from the vertices in the seed graph. This policy favours compact graphs over stringy objects. Its implementation requires an additional (hidden) component in each graph. But since the choice of vertex (and face) is irrelevant as far as completeness of the enumeration is concerned, so is the precise implementation. Having fixed f and v we subdivide f in all possible ways by placing an i-gon inside it (along the edge from v to its predecessor vertex in f ), where i ranges from 3 to p+3. Function generatePolygon returns a list of all possible successor graphs and the suggestive syntax i ∈ I F i represents the concatenation of all F i for i in I. Function generatePolygon operates and is explained in stages: generatePolygon n v f g ≡ let enumeration = enumerator n |vertices f |; enumeration = [is∈enumeration . ¬ containsDuplicateEdge g f v is]; vertexLists = [indexToVertexList f v is. is ∈ enumeration] in [subdivFace g f vs. vs ∈ vertexLists]
Enumeration. We have to enumerate all possible ways of inscribing a final ngon inside f such that it shares the edge (f −1 · v , v ) with f (which is removed). The new n-gon can in fact share all edges with f, in which case we simply finalize f without adding any new nonfinal faces; or it can touch f only at f −1 · v and v and all of its other vertices are new; or anything in between. Following Hales one can describe each of these n-gons by a list of length n of increasing indices from the interval {0 ,. . .,|vertices f | − 1 }. Roughly speaking, index i represents vertex f i · v and a pair i,j of adjacent list elements is interpreted as follows: if i < j then the new polygon contains an edge from vertex f i · v to f j · v ; if i = j then the new polygon contains a new vertex at that point. For example, given the face [v 0 ,. . .,v 5 ], the index list [0 ,2 ,3 ,3 ,3 ,5 ] represents some face [v 0 ,v 2 ,v 3 ,x ,y,v 5 ] where x and y are new vertices. The enumeration of all these index lists is the task of function enumerator which returns a nat list list. We have proved that enumerator n m returns all (not necessarily strictly) increasing lists of length n starting with 0 and ending with m − 1 : set (enumerator n m) = {is | |is| = n ∧ hd is = 0 ∧ last is = m − 1 ∧ last (butlast is) < last is ∧ increasing is}
Condition last (butlast is) < last is excludes lists like [. . .,m,m] which would insert a new vertex behind the last element, i.e. between f −1 · v and v. The next stage in generatePolygon removes those index lists which would create a duplicate edge: containsDuplicateEdge g f v is checks that there are no two adjacent indices i < j in is such that (f i · v , f j · v ) or (f j · v , f i · v ) is an edge in g (unless it is an edge in f, in which case no duplicate edge is created because f is removed). Finally the index list is turned into a list of vertices as sketched above employing datatype a option = None | Some a to distinguish an existing vertex Some(f i · v ) from a new vertex None: indexToVertexList f v is ≡ hideDups [f k · v . k ∈ is] hideDups (i · is) = Some i · hideDupsRec i is hideDupsRec a [] = [] hideDupsRec a (b · bs) = (if a = b then None · hideDupsRec b bs else Some b · hideDupsRec b bs)
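To make the enumeration of index lists concrete, the following Python sketch generates the lists characterised above and converts them to vertex lists in the style of indexToVertexList. It is an illustration only, not the verified Isabelle/HOL functions: faces are plain lists of vertices, Some v is written simply as v, and None as None.

    from itertools import combinations_with_replacement

    def enumerator(n, m):
        # all weakly increasing lists of length n over 0..m-1 that start
        # with 0, end with m-1, and whose second-to-last entry is < m-1
        out = []
        for mid in combinations_with_replacement(range(m), n - 2):
            iss = [0, *mid, m - 1]
            if iss[-2] < m - 1:
                out.append(iss)
        return out

    def index_to_vertex_list(f, v, iss):
        # index k stands for f^k · v on the cyclic face f; a repeated index
        # stands for a fresh vertex and is rendered as None (cf. hideDups)
        start = f.index(v)
        vs = [f[(start + k) % len(f)] for k in iss]
        out, prev = [vs[0]], vs[0]
        for b in vs[1:]:
            out.append(None if b == prev else b)
            prev = b
        return out

    # e.g. [0, 2, 3, 3, 3, 5] is in enumerator(6, 6); with f = [v0, ..., v5]
    # it yields [v0, v2, v3, None, None, v5], matching the example above.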
The result (in generatePolygon) is vertexLists of type vertex option list list where each list in vertexLists describes one possibility of inserting a final face into f. Subdivision. The last step in generatePolygon is to generate a new graph subdivFace g f vos for each vos in vertexLists by subdividing f as specified by vos. This is best visualized by an example. Given a face f = [1 ,. . .,8 ] and vos = [Some 1 , Some 3 , None, Some 4 , None, Some 8 ] the result of inserting a face specified by vos into f is shown below.
[Diagram: the octagonal face [1,. . .,8] on the left and, on the right, the inserted final face [1,3,9,4,10,8], where 9 and 10 are new vertices, together with the remaining nonfinal faces.]
| Some v ⇒ if f · u = v ∧ n = 0 then subdivFace g f v 0 vos else let ws = [countVertices g..
The definition is by recursion on vos. The base case simply turns f into a final face in g. If vos starts with None, no splitting takes place but n is incremented. If vos starts with Some v there are two possibilities. Either we have merely advanced one vertex along f, in which case we keep on going. Or we have skipped at least one vertex of f, in which case we must split f between u and v, inserting n new vertices: the term [i..<j ] is the list of natural numbers from and including i up to but excluding j. Function splitFace returns the two new faces along with the new graph. Only f 2 is used (it is the face that is subdivided further), but returning both faces helps to state many lemmas more succinctly. Function subdivFace (called from generatePolygon) simply starts subdivFace : subdivFace g f (Some u · vs) ≡ subdivFace g f u 0 vs
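The vertex list of the final face produced by this process can be read off directly from vos. The following small Python sketch (an illustration under the same simplifying assumptions as before, with fresh vertices numbered from fresh onwards) reproduces the example above:

    def new_final_face(vos, fresh):
        # Some v contributes the existing vertex v; each None allocates
        # the next unused vertex name
        out = []
        for vo in vos:
            if vo is None:
                out.append(fresh)
                fresh += 1
            else:
                out.append(vo)
        return out

    # new_final_face([1, 3, None, 4, None, 8], 9) == [1, 3, 9, 4, 10, 8]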
Note that because all index lists produced by enumerator are nonempty, all vertex lists produced by indexToVertexList are nonempty and start with Some. 2.2
Invariants
Almost half the proof is concerned with verifying that PlaneGraphs satisfy certain invariants which are combined into the predicate inv :: graph ⇒ bool. Probably half that effort is caused by showing that the extended graph representation, primarily facesAt, is kept consistent. The remaining properties are: each face is of size ≥ 3 and all its vertices are distinct, there are at least two faces, the faces are distinct modulo ∼ = and if the graph has more than 2 faces also modulo reversal, the edges of distinct faces are disjoint (where edges are pairs of adjacent vertices in an oriented face), and any nonfinal face is surrounded by final faces.
3
Tame Graphs
Tameness is rooted in geometric considerations but for this paper it is simply a fixed interface to the rest of Hales’ proof and should be taken as God given. 3.1
Definition of Tame Graphs
The definition relies on 4 tables a :: nat ⇒ nat, b :: nat ⇒ nat ⇒ nat, c :: nat ⇒ int, d :: nat ⇒ nat. Their precise definition is immaterial for this paper and can be found elsewhere [1,5]. Like Hales (in his Java program) we have scaled all rational numbers (from the paper proof) by 1000, thus turning them into integers.
Summing over the elements of a list (below: of faces) is written Σ x∈xs f x. Function faces returns the list of faces in a graph: in our simplified model of graphs it is the identity but in the real model it is a projection. Functions tri and quad count the number of final triangles and quadrilaterals incident to a vertex. Hales calls a face f exceptional if it is a pentagon or larger, i.e. if 5 ≤ |vertices f |. Function except returns the number of final exceptional faces incident to a vertex. A vertex has type (p, q) if p = tri g v, q = quad g v and except g v = 0. A graph is tame if it is plane and satisfies 8 conditions:
2. Every 3-cycle is a face or the opposite of a face: tame 2 g ≡ ∀ a b c. is-cycle g [a, b, c] ∧ distinct [a, b, c] −→ (∃ f ∈F g. vertices f ∼ = [a, b, c] ∨ vertices f ∼ = [c, b, a])
where is-cycle g vs ≡ hd vs ∈ set (neighbors g (last vs)) ∧ is-path g vs, function neighbors does the obvious and is-path g [] = True is-path g (u · vs) = (case vs of [] ⇒ True | v · ws ⇒ v ∈ set (neighbors g u) ∧ is-path g vs)
3. Every 4-cycle surrounds one of the following configurations:
The tame configurations are straightforward to describe:
tameConf 1 a b c d ≡ [[a, b, c, d ]]
tameConf 2 a b c d ≡ [[a, b, c], [a, c, d ]]
tameConf 3 a b c d e ≡ [[a, b, e], [b, c, e], [a, e, c, d ]]
tameConf 4 a b c d e ≡ [[a, b, e], [b, c, e], [c, d , e], [d , a, e]]
Predicate tame-quad formalizes that its parameters form one of the tame configurations, taking rotation into account:
tame-quad g a b c d ≡
set (tameConf 1 a b c d ) ⊆∼= vertices ' F g ∨
set (tameConf 2 a b c d ) ⊆∼= vertices ' F g ∨
set (tameConf 2 b c d a) ⊆∼= vertices ' F g ∨
(∃ e∈V g − {a, b, c, d }.
 set (tameConf 3 a b c d e) ⊆∼= vertices ' F g ∨
 set (tameConf 3 b c d a e) ⊆∼= vertices ' F g ∨
 set (tameConf 3 c d a b e) ⊆∼= vertices ' F g ∨
 set (tameConf 3 d a b c e) ⊆∼= vertices ' F g ∨
 set (tameConf 4 a b c d e) ⊆∼= vertices ' F g)
Finally, tame 3 also takes reversal of orientation into account: tame 3 g ≡ ∀ a b c d. is-cycle g [a, b, c, d ] ∧ distinct [a, b, c, d ] −→ tame-quad g a b c d ∨ tame-quad g d c b a
4. The degree of every vertex is at most 6 and at most 5 if the vertex is contained in an exceptional face: tame 45 g ≡ ∀ v ∈V g. degree g v ≤ (if except g v = 0 then 6 else 5 )
We have combined conditions 4 and 5 from [5] into one. 5. The following inequality holds: tame 6 g ≡ 8000 ≤
Σ f ∈faces g c |vertices f |
6. There exists an admissible assignment of weights to faces of total weight less than 14800: tame 7 g ≡ ∃ w . admissible w g ∧
Σ f ∈faces g w f < 14800
Admissibility is quite involved and discussed below. Although this is not immediately obvious, tame 7 guarantees there are only finitely many tame graphs. It also is the source of most complications in the proofs because it is not straightforward to check this condition. 7. There are no two adjacent vertices of type (4, 0): tame 8 g ≡ ¬ (∃ v ∈V g. type40 g v ∧ (∃ w ∈set (neighbors g v ). type40 g w ))
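For illustration, the degree-based conditions tame 45 and tame 8 can be read executably as follows. This is a Python sketch only, not the Isabelle/HOL definitions: it assumes a graph is given as a list of faces (each a vertex list) together with the sublist finals of final faces, and it takes the degree of a vertex to be the number of faces incident to it.

    def faces_at(faces, v):
        # all faces incident to vertex v
        return [f for f in faces if v in f]

    def tri(finals, v):
        return sum(1 for f in faces_at(finals, v) if len(f) == 3)

    def quad(finals, v):
        return sum(1 for f in faces_at(finals, v) if len(f) == 4)

    def except_(finals, v):
        return sum(1 for f in faces_at(finals, v) if len(f) >= 5)

    def degree(faces, v):
        return len(faces_at(faces, v))

    def neighbors(faces, v):
        ns = set()
        for f in faces_at(faces, v):
            i = f.index(v)
            ns.update({f[(i - 1) % len(f)], f[(i + 1) % len(f)]})
        return ns

    def tame45(faces, finals, vertices):
        # degree at most 6, and at most 5 at vertices of exceptional faces
        return all(degree(faces, v) <= (6 if except_(finals, v) == 0 else 5)
                   for v in vertices)

    def tame8(faces, finals, vertices):
        # no two adjacent vertices of type (4, 0)
        t40 = {v for v in vertices
               if tri(finals, v) == 4 and quad(finals, v) == 0
               and except_(finals, v) == 0}
        return all(neighbors(faces, v).isdisjoint(t40) for v in t40)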
Now tame is the conjunction of tame 1 up to tame 8 ; the numbering follows [5]. Note that tame 8 is missing in earlier versions of the proof, e.g. www.math.pitt. edu/∼thales/kepler04/fullkepler.pdf of 13/3/2004. The second author noticed and informed Hales of this discrepancy between the proof and his Java code, where the test is present. As a result Hales added tame 8 in the published versions of his proof. 3.2
Admissible Weight Assignment
For w :: face ⇒ nat to be an admissible weight assignment it needs to meet the following requirements: 1. admissible 1 w g ≡ ∀ f ∈F g. d |vertices f | ≤ w f 2. admissible 2 w g ≡
∀ v ∈V g. except g v = 0 −→ b (tri g v ) (quad g v ) ≤ Σ f ∈facesAt g v w f
3. admissible 3 w g ≡ ∀ V . separated g (set V ) ∧ set V ⊆ V g −→
Σ v ∈V a (tri g v ) + Σ f ∈[f ∈faces g . ∃ v ∈set V . f ∈ set (facesAt g v )] d |vertices f | ≤ Σ f ∈[f ∈faces g . ∃ v ∈set V . f ∈ set (facesAt g v )] w f
The first two constraints express that d and b yield lower bounds for w. The last requirement yields another lower bound for w in terms of separated sets of vertices. Separatedness means that they are not neighbours, do not lie on a common quadrilateral, and fulfill some additional constraints:
separated 1 g V ≡ ∀ v ∈V . except g v = 0
separated 2 g V ≡ ∀ v ∈V . ∀ f ∈set (facesAt g v ). f · v ∉ V
separated 3 g V ≡ ∀ v ∈V . ∀ f ∈set (facesAt g v ). |vertices f | ≤ 4 −→ V f ∩ V = {v }
separated 4 g V ≡ ∀ v ∈V . degree g v = 5
Note that Hales [5] lists 4 admissibility conditions, the third of which our work shows to be superfluous. 3.3
Enumeration of Tame Graphs
The enumeration of tame graphs is a modified enumeration of plane graphs where we remove final graphs that are definitely not tame, and cut the search tree at nonfinal graphs that cannot lead to tame graphs anymore. Note that in contrast to the enumeration of plane graphs, a specification we must trust, the enumeration of tame graphs is accompanied by a correctness theorem (Theorem 3 below), the central result of the work, stating that all tame graphs are generated. Hence it is less vital to present the tame enumeration in complete detail (except to satisfy the curiosity of the reader and allow reproduceability). The core of the tame enumeration is a filtered version of generatePolygon: generatePolygonTame n v f g = [g ∈generatePolygon n v f g . ¬ notame g ]
In reality this is not the definition but the characteristic lemma. The actual definition replaces the repeated enumeration of lists of index lists in generatePolygon by a table lookup. This is “merely” an optimization for speed, but an important one. The filter notame removes all graphs that cannot lead to a tame graph: notame g ≡ ¬ (tame 45 g ∧ is-tame 7 g) is-tame 7 g ≡ squanderLowerBound g < 14800
Using tame 45 on nonfinal graphs is justified because the degree of a vertex can only increase as a graph is refined and because except takes only the final faces into account.1 Function squanderLowerBound computes a lower bound for the total admissible weight of any final graph that can be generated from g. By tame 7 this lower bound must be < 14800. squanderLowerBound g ≡
Σ f ∈finals g d |vertices f | + ExcessNotAt g None
The lower bound consists of a d-sum over all final faces (justified by admissible 1 and the fact that d cannot be negative) and an error correction term ExcessNotAt. The definition of ExcessNotAt is somewhat involved and not shown. 1
tame 6 on the other hand cannot be used to filter out nonfinal graphs because function c may return both positive and negative values, i.e. summing over it is not monotone under the addition of new faces.
Essentially we enumerate all maximal separated sets of vertices, compute the “excess” over and above the d-sum for each one (taking a and b into account as justified by admissible 2 and admissible 3 ), and take the maximum. Note that the number of separated sets can grow exponentially with the number of vertices. On top of generatePolygonTame we have a variant of next-plane p : next-tame0 p g ≡ let fs = [f ∈faces g . ¬ final f ] in if fs = [] then [] else let f = minimalFace fs; v = minimalVertex g f in i∈ polysizes p g generatePolygonTame i v f g
where polysizes restricts the possible range [3 ..p + 3 ] to those sizes which can still lead to tame graphs: polysizes p g ≡ [n∈[3 ..p + 3 ] . squanderLowerBound g + d n < 14800 ]
The justification is that the insertion of a new n-gon into g adds at least d n to squanderLowerBound g. Hence all these graphs would immediately be discarded again by notame and polysizes is merely an optimization, but one which happens to reduce the run time by a factor of 10. The key correctness theorems for squanderLowerBound (recall inv from §2.2) are that it increases with next-tame0 (in fact with next-plane): Theorem 1. If g′ ∈ set (next-tame0 p g) and inv g then squanderLowerBound g ≤ squanderLowerBound g′. and that for (final) tame graphs squanderLowerBound is a lower bound for the total weight of an admissible assignment: Theorem 2. If tame g and final g and inv g and admissible w g and Σ f ∈faces g w f < 14800 then squanderLowerBound g ≤ Σ f ∈faces g w f. These two theorems are the main ingredients in the completeness proof of next-tame0 w.r.t. next-plane: any tame graph reachable via next-plane is still reachable via next-tame0. Now we compose next-tame0 with a function makeTrianglesFinal (details omitted) which finalizes all nonfinal triangles introduced by next-tame0 : next-tame1 p ≡ map makeTrianglesFinal ◦ next-tame0 p
This step appears to be a trivial consequence of tame 2 which says that all 3-cycles must be triangles, i.e. that one should not be allowed to subdivide a triangle further. The latter implication, however, is not completely trivial: one has to show that if one ever subdivides a triangle, that triangle cannot be reintroduced as a face later on. A lengthy proof yields completeness of next-tame1 w.r.t. next-tame0. The invariants (§2.2) are absolutely essential here. As a final step we filter out all untame final graphs: next-tame p ≡ filter (λg. ¬ final g ∨ is-tame g) ◦ next-tame1 p is-tame g ≡ tame 45 g ∧ tame 6 g ∧ tame 8 g ∧ is-tame 7 g ∧ is-tame 3 g
Tameness conditions 1 and 2 are guaranteed by construction. Conditions 45, 6 and 8 are directly executable. Condition 7 has been discussed already. This leaves condition 3, the check of all possible 4-cycles: is-tame 3 g ≡ ∀ vs∈set (find-cycles 4 g). is-cycle g vs ∧ distinct vs ∧ |vs| = 4 −→ ok4 g vs
This implementation is interesting in that it employs a search-and-check technique: function find-cycles need not be verified at all because the rest of the code explicitly checks that the vertex lists are actually cycles of distinct vertices of length 4. This can at most double the execution time but reduces verification time. In fact, find-cycles is expressed in terms of a while-functional [9] which simplifies definition but would complicate verification. Function ok4 is a straightforward implementation of tame-quad. Note that this search-and-check technique is applicable only because we merely need to ensure completeness of the enumeration of tame graphs, not correctness. Otherwise we would need to verify that find-cycles finds all cycles. Completeness of next-tame w.r.t. next-tame1 follows from Theorem 2 together with the implementation proof of ok4 w.r.t. tame-quad. Putting the three individual completeness theorems together we obtain the overall completeness of next-tame: all tame graphs are enumerated. Theorem 3. If Seed p [next-plane p ]→∗ g and final g and tame g then Seed p [next-tame p ]→∗ g. The set of tame graphs is defined in the obvious manner: TameEnum p ≡ {g | Seed p [next-tame p ]→∗ g ∧ final g} TameEnum ≡ ⋃ p ≤ 5 TameEnum p
An executable version of TameEnum p is provided under the name tameEnum. It realizes a simple work list algorithm directly on top of next-tame and need not be shown or discussed, except for one detail. Being in a logic of total functions we have to apply an old trick: since we want to avoid proving termination of the enumeration process (which is bound to be quite involved), tameEnum takes two parameters: the usual p and a counter which is decremented in each step. If it reaches 0 prematurely, we return None, otherwise we return Some Fs where Fs is the collected list of final graphs, the result of the enumeration. When running tameEnum we merely need to start with a large enough counter. Because the returned graphs are all final we reduce each graph to a list of list of vertices via fgraph g ≡ map vertices (faces g)
before including it in the result. Hence the actual return type of tameEnum is vertex fgraph list where
a fgraph = a list list.
We merely show tameEnum’s correctness theorem, not the definition: Theorem 4. If tameEnum p n = Some Fs then set Fs = fgraph ‘ TameEnum p .
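The work-list-with-counter idea behind tameEnum can be sketched as follows in Python. This is purely illustrative: the seed, successor function, finality test and fgraph projection are passed in as parameters and merely stand in for their Isabelle counterparts.

    def tame_enum(p, counter, seed, next_tame, is_final, fgraph):
        # explore the successor relation with an explicit work list; the
        # counter substitutes for a termination proof: None means the
        # counter ran out before the enumeration finished
        todo, finals = [seed(p)], []
        while todo:
            if counter == 0:
                return None
            counter -= 1
            g = todo.pop()
            if is_final(g):
                finals.append(fgraph(g))
            else:
                todo.extend(next_tame(p, g))
        return finals  # corresponds to Some Fs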
As a final step we need to run tameEnum p n with suitably large n (such that Some is returned) for all p ≤ 5 2 and compare the result with the contents of the Archive.
4
The Archives
It turned out that Hales’ Archive was complete but redundant. That is, our verified enumeration produced only 2771 graphs as opposed to Hales’ 5128. The reason is twofold: there are many isomorphic copies of graphs in his Archive and it contains a number of non-tame graphs (partly because for efficiency reasons he does not enforce tame 3 completely in his Java program). The new reduced Archive can be found at the first author’s web page. The new Archive is a constant Archive :: vertex fgraph set in the Isabelle theories which is defined via the concatenation of 6 separate archives, one for each p ≤ 5 : Archive ≡ set (Tri @ Quad @ Pent @ Hex @ Hept @ Oct)
The main theorem of our work is the completeness of Archive: Theorem 5. If g ∈ PlaneGraphs and tame g then fgraph g ∈ Archive. Relation is graph isomorphism (§4.1) and x ∈ M ≡ ∃ y∈M . x y. This theorem is a combination of the completeness of next-tame (Theorem 3) and of fgraph ‘ TameEnum ⊆ Archive (where M ⊆ N ≡ ∀ x ∈M . x ∈ N ). The latter is a corollary of the fact that, for each p ≤ 5, tameEnum p n (for suitable n) returns Some Fs such that Fs is equivalent to the corresponding part of the Archive. Quite concretely, we have proved by evaluation (§1.3) that same (tameEnum 0 800000 ) Tri, same (tameEnum 1 8000000 ) Quad, same (tameEnum 2 20000000 ) Pent, same (tameEnum 3 4000000 ) Hex, same (tameEnum 4 1000000 ) Hept, and same (tameEnum 5 2000000 ) Oct, where same is an executable check of equivalence (modulo )3 of two lists of fgraphs. Corollary fgraph ‘ TameEnum ⊆ Archive follows by correctness of tameEnum (Theorem 4) . We cannot detail the definition of same (or its correctness theorem) but we should point out that it is a potential bottleneck: for p = 2 we need to check 15000 graphs for inclusion in an archive of 1500 graphs — modulo graph isomorphism! Although isomorphism of plane graphs can be determined in linear time [8], this algorithm is not very practical because of a large constant factor. Instead we employ a hashing scheme to home in on the isomorphic graph quickly. The graphs of each archive are stored in a search tree (a trie) indexed by a list of natural numbers. The index is the concatenation of a number of hash values invariant under isomorphism. The most important component is obtained by adding up, for each vertex, the size of the faces around that vertex, and then sorting the resulting list. This idea is due to Hales. 2
3
By tame 1 we are only interested in graphs where all faces are of size ≤ 8 = 5 +3, the 3 being added in next-plane. A check for ⊆ would suffice but it is nice to know that Archive is free of junk.
4.1
Plane Graph Isomorphisms
For lack of space we present our definition of isomorphism but not its implementation:
is-pr-iso ϕ Fs 1 Fs 2 ≡ is-pr-Iso ϕ (set Fs 1 ) (set Fs 2 )
is-pr-Iso ϕ Fs 1 Fs 2 ≡ is-Hom ϕ Fs 1 Fs 2 ∧ inj-on ϕ (⋃ F ∈Fs 1 set F )
is-Hom ϕ Fs 1 Fs 2 ≡ map ϕ ' Fs 1 // {∼=} = Fs 2 // {∼=}
Parameter ϕ is a function of type vertex ⇒ vertex. Predicate is-pr-iso compares lists of faces (fgraphs), is-pr-Iso sets of faces; pr stands for proper. Proper isomorphisms assume that the faces of both graphs have the same orientation. An isomorphism is defined as usual as an injective homomorphism. A homomorphism must turn one graph into the other, modulo rotation of faces: the infix // is quotienting and the symbol {∼=} is defined as {(f 1 , f 2 ) | f 1 ∼= f 2 }. 'Improper' isomorphisms allow reversing the orientation of all faces in one graph (rev reverses a list):
is-iso ϕ Fs 1 Fs 2 ≡ is-Iso ϕ (set Fs 1 ) (set Fs 2 )
is-Iso ϕ Fs 1 Fs 2 ≡ is-pr-Iso ϕ Fs 1 Fs 2 ∨ is-pr-Iso ϕ Fs 1 (rev ' Fs 2 )
Two fgraphs are isomorphic if there exists an isomorphism between them: g 1 ≃ g 2 ≡ ∃ ϕ. is-iso ϕ g 1 g 2
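The isomorphism-invariant hash values mentioned at the beginning of this section are easy to state concretely. The following Python sketch (illustrative only, for fgraphs given as lists of integer vertex lists; it is not the Isabelle implementation) computes Hales' main hash component and a rotation-invariant normal form of a single face:

    def fingerprint(fgraph):
        # for every vertex, add up the sizes of the faces around it; the
        # sorted list of these sums is invariant under isomorphism and
        # under reversal of orientation
        sums = {}
        for face in fgraph:
            for v in face:
                sums[v] = sums.get(v, 0) + len(face)
        return sorted(sums.values())

    def rotation_normal_form(face):
        # least rotation of the vertex list: equal for faces related by ∼=
        return min(tuple(face[i:] + face[:i]) for i in range(len(face)))

Two fgraphs with different fingerprints cannot be isomorphic, so the fingerprint can serve as (part of) the index of the search tree, and only graphs in the same bucket need to be compared modulo ≃.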
5
Statistics
The starting point were 2200 lines of Java, the result 600 lines of executable HOL (= ML), excluding comments, debugging aids, and libraries. This reduction is primarily due to simplifications of the algorithm itself: not splitting the treatment of triangle and quadrilateral seed graphs into 17 cases, dropping all symmetry checks, dropping the special treatment of nonfinal quadrilaterals, and dropping some complicated lower bound estimates (which are all still present in [1]). The simplicity of the final solution belies the difficulty of arriving at it. The whole formalization encompasses 17000 lines of definitions and proofs. Running the complete proof takes 165 minutes on a Xeon: the completeness proof takes 15 minutes, evaluating the enumeration 105 minutes, and comparing the resulting graphs with the Archive (modulo graph isomorphism) 45 minutes. During execution of the enumeration, the gargantuan number of 23 million graphs are generated and examined, of which 35000 are final. The distribution of graphs over the new Archive (for p = 0, . . . , 5) is (20,22,13), (923,18,12), (1545,18,13), (238,17,12), (23,16,12), and (22,15,12), where each triple gives the number of graphs, average number of faces, and average number of vertices for that group of graphs.
6
Future Work
The enumeration of plane graphs needs to be connected with some abstract notion of planarity. Hales is preparing a revised proof based on hypermaps
that could serve as the glue — face lists are easily turned into hypermaps. On the other end, it needs to be shown that none of the tame graphs constitutes a counter example. The linear programming techniques for this step are in place [10], but their application is nontrivial and not well documented in Hales' proof. Finally there is the exciting prospect of modifying our proof to cover a very similar graph enumeration in the proof of the Dodecahedral Conjecture [7]. Acknowledgments. The first author wishes to thank: Tom Hales for generously hosting his sabbatical semester at the University of Pittsburgh and for patiently answering all questions; Jeremy and Sean for lunches and friendship; Q & U for late night entertainment.
References 1. G. Bauer. Formalizing Plane Graph Theory — Towards a Formalized Proof of the Kepler Conjecture. PhD thesis, Technische Universit¨ at M¨ unchen, 2006. 2. S. Berghofer and T. Nipkow. Executing higher order logic. In P. Callaghan, Z. Luo, J. McKinna, and R. Pollack, editors, Types for Proofs and Programs (TYPES 2000), volume 2277 of Lect. Notes in Comp. Sci., pages 24–40. Springer-Verlag, 2002. 3. G. Gonthier. A computer-checked proof of the four colour theorem. Available at research.microsoft.com/∼ gonthier/4colproof.pdf . 4. T. C. Hales. Cannonballs and honeycombs. Notices Amer. Math. Soc., 47:440–449, 2000. 5. T. C. Hales. A proof of the Kepler conjecture. Annals of Mathematics, 162:1063– 1183, 2005. 6. T. C. Hales. Sphere packings, VI. Tame graphs and linear programs. Discrete and Computational Geometry, 2006. To appear. 7. T. C. Hales and S. McLaughlin. A proof of the dodecahedral conjecture. E-print archive arXiv.org/abs/math.MG/9811079. 8. J. E. Hopcroft and J. K. Wong. Linear time algorithm for isomorphism of planar graphs (preliminary report). In STOC ’74: Proc. 6th ACM Symposium Theory of Computing, pages 172–184. ACM Press, 1974. 9. T. Nipkow, L. Paulson, and M. Wenzel. Isabelle/HOL — A Proof Assistant for Higher-Order Logic, volume 2283 of Lect. Notes in Comp. Sci. Springer-Verlag, 2002. http://www.in.tum.de/∼ nipkow/LNCS2283/. 10. S. Obua. Proving bounds for real linear programs in Isabelle/HOL. In J. Hurd, editor, Theorem Proving in Higher Order Logics (TPHOLs 2005), volume 3603 of Lect. Notes in Comp. Sci., pages 227–244. Springer-Verlag, 2005. 11. R. Zumkeller. A formalization of global optimization with Taylor models. In Automated Reasoning (IJCAR 2006), 2006. This volume.
Automatic Construction and Verification of Isotopy Invariants Volker Sorge1 , Andreas Meier2 , Roy McCasland3, , and Simon Colton4 1
School of Computer Science, University of Birmingham, UK
[email protected] http://www.cs.bham.ac.uk/~vxs 2 DFKI GmbH, Saarbr¨ ucken, Germany
[email protected] http://www.ags.uni-sb.de/~ameier 3 School of Informatics, University of Edinburgh, UK
[email protected] http://www.inf.ed.ac.uk/~rmccasla 4 Department of Computing, Imperial College London, UK
[email protected] http://www.doc.ic.ac.uk/~sgc
Abstract. We extend our previous study of the automatic construction of isomorphic classification theorems for algebraic domains by considering the isotopy equivalence relation, which is of more importance than isomorphism in certain domains. This extension was not straightforward, and we had to solve two major technical problems, namely generating and verifying isotopy invariants. Concentrating on the domain of loop theory, we have developed three novel techniques for generating isotopic invariants, by using the notion of universal identities and by using constructions based on substructures. In addition, given the complexity of the theorems which verify that a conjunction of the invariants form an isotopy class, we have developed ways of simplifying the problem of proving these theorems. Our techniques employ an intricate interplay of computer algebra, model generation, theorem proving and satisfiability solving methods. To demonstrate the power of the approach, we generate an isotopic classification theorem for loops of size 6, which extends the previously known result that there are 22. This result was previously beyond the capabilities of automated reasoning techniques.
1
Introduction
The classification of abstract algebraic objects such as groups and rings has been a major driving force in pure mathematics. For the majority of algebraic domains, however, full classifications have been elusive (with notable exceptions being Abelian groups, finite simple groups and Abelian quasigroups [17]). This is partially due to the sheer number of classes in domains such as quasigroups, where the axioms are not particularly restrictive. Due to this volume of classes,
The author’s work was supported by EPSRC MathFIT grant GR/S31099.
U. Furbach and N. Shankar (Eds.): IJCAR 2006, LNAI 4130, pp. 36–51, 2006. c Springer-Verlag Berlin Heidelberg 2006
Automatic Construction and Verification of Isotopy Invariants
37
automatic techniques have much potential to add to mathematical knowledge. In particular, enumeration techniques have been used to count the instances of certain finite algebras [11] and model generating and constraint solving techniques could solve open existence problems [19]. Moreover, using first order theorem proving, McCune et. al have generated single axiom representations for numerous algebraic domains [9]. We are more interested in qualitative rather than quantitative results. To this end, we have developed a bootstrapping algorithm for the construction of classification theorems in finite algebraic domains. While it is largely generic, its power relies upon the automatic discovery of algebraic invariants which are able to discriminate between members of different classes defined by the equivalence relation under consideration. When considering isomorphism equivalences, we were able to employ machine learning techniques to generate the invariants, and this led to novel theorems which achieve classifications up to isomorphism of quasigroups and loops of small order [4]. We provide a brief overview of the bootstrapping algorithm in Sec. 2. We present here the application of the bootstrapping algorithm to the production of classification theorems up to isotopism, an equivalence relation which is of greater importance than isomorphism in certain domains, in particular algebraic loop theory. Unfortunately, the machine learning approach did not suffice for this application, as finding isotopy invariants is a much more complex task. Hence, concentrating on the domain of loop theory, we have developed new methods for generating isotopy invariants, as described in Sec. 3. Firstly, using results from the literature, we show how universal identities can be used to generate invariants via an intricate interplay of model generation and theorem proving. Secondly, we present two new sets of invariants derived by systematically examining substructures of loops, and we describe their construction, which uses symbolic computation techniques. As described in Sec. 4, it has also been a challenge to automatically verify that a conjunction of invariant properties defines an isotopy class (another important aspect of the bootstrapping algorithm). For this reason, we have developed methods for simplifying the problems with computer algebra, before providing proof via a satisfiability solver. As a by-product, these methods significantly simplify the task of generating non-isotopic models. Having solved the problems of generating and verifying isotopic invariants, we have employed the bootstrapping algorithm to generate new results. In particular, in Sec. 5, we present an isotopic classification theorem for loops of order 6, which extends the known enumeration theorem by providing a full set of classification properties.
2
Background
As described in [4], we have developed a bootstrapping algorithm for generating classification theorems in algebraic domains of pure mathematics. Moreover, we have applied this approach to constructing new classification results with respect to isomorphism mainly in the algebraic domains of quasigroups and loops. In this section, we briefly outline this method, and summarise our previous results.
38
V. Sorge et al.
The bootstrapping algorithm takes a set of properties, P, a cardinality, n, and an equivalence relation, E, as input. It returns a decision tree that contains the classification theorem for the algebraic structures of order n that satisfy P with respect to E, as well as a set of representants for each equivalence class. During the construction, a set of theorems are proved, which – taken together – prove the correctness and the completeness of the classification theorem. In detail, the method proceeds as follows. Firstly, it initialises a decision tree with root node N labelled with the properties P. We denote the properties that a node is labelled by as PN . The algorithm then works iteratively, starting with the root node and moving on to successive nodes in the tree. It constructs an example of an algebraic structure of order n satisfying PN . When no example can be produced, the algorithm will prove that no structure of size n with properties PN can exist. When an example does exist, one of two things happen, either: (1) the process shows that the node represents an equivalence class with respect to E, i.e., it proves that all structures of order n that satisfy the properties PN are equivalent to each other under E, or (2) the process constructs another algebraic structure satisfying PN , which is not equivalent to the first one. Note that cases (1) and (2) are mutually exclusive and can be performed in parallel. For case (2), the algorithm computes a discriminating property P for the two structures, such that P holds for one structure and ¬P holds for the other. P is then used to further expand the decision tree by adding two new nodes N and N below N , with labels PN = PN ∪ {P } and PN = PN ∪ {¬P } respectively. The algorithm then iterates over these nodes and adds to the tree accordingly. After new nodes have been created for each of these nodes, the above steps are carried out again. The algorithm terminates once no more expansions can be applied. Leaf nodes either represent equivalence classes or are empty, i.e., no structure exists with the properties given in the node. The final classification theorem comprises the disjunction of the properties which label the leaf nodes. The bootstrapping algorithm is a framework that combines a host of reasoning techniques which play their part in achieving the overall goal. In particular, it relies on third party systems to generate algebras and discriminants, and to verify the construction of the decision tree at each step. We use the following methodologies in the different steps of the algorithm: Generating Algebras. We use model generation to construct algebras. In the first step, the algorithm calls a model generator to return an algebra corresponding to the input axioms. Throughout the process, model generators are used to construct algebras which are not equivalent to a given algebra. For the experiments described in [4], we used the model generator Mace [10], but we have replaced this by Sem [22] and Finder [18], as they are more effective in our domain. While Sem is generally the more powerful of the two, it has weaknesses when dealing with function symbols of arity greater than 2, which can be introduced when we employ Skolemisation techniques (see [12] for details). Generating Discriminants. The approach to constructing discriminating properties varies from equivalence relation to equivalence relation. When dealing with the isomorphism relation, we treated the generation of a discriminant
Automatic Construction and Verification of Isotopy Invariants
39
for a pair of algebras as a machine learning problem, and successfully applied automated theory formation [3] and inductive logic programming [5] to such problems. Verifying Properties. Throughout the bootstrapping procedure, all the results coming from third party systems are independently verified by first order automated theorem provers. Thus, for a given discriminant P and two algebras Q and Q , we show that (1) P is a proper discriminant for the equivalence relation E [which means that if Q and Q differ with respect to the property, then they cannot be members of the same equivalence class], (2) P holds for Q, and (3) P does not hold for Q . Proving these properties explicitly guarantees the overall correctness of the constructed decision tree. The proofs themselves are generally not very challenging, and we have experimented with several provers. We generally employ Spass [21] for these tasks. Verifying Equivalence Classes. The most difficult verification problems occurring during the classification process involve showing that a given node forms an equivalence class with respect to the equivalence relation under consideration. More formally, we need to prove that, for a particular set of properties of a node, P , all algebras of cardinality n, which satisfy P , are equivalent, and every member of the equivalence class satisfies P . These types of proof are necessary to fully verify the completeness of a decision tree. Although the theorems are essentially second order, because we work in a finite domain, they can be expressed as propositional logic problems by enumerating all possible equivalence mappings for structures of cardinality n and thus made accessible to ATP systems. In our original experiments, described in [4], we used Spass for these problems, as it was the only system which coped with the massive clause normalisations required (cf. [20]). For later experiments, we replaced Spass by state of the art SAT solvers and developed a range of encoding techniques for a diverse range of systems (cf. [12]). Currently we have integrated zChaff [13] that can handle pure boolean SAT problems, the DPLLT [7] system, which can handle ground equations, and CVClite [2], which can also deal with finite quantification. While using SAT solvers increases the power of our algorithm, if translated naively, many of the proof problems would still be beyond the capabilities of state of the art systems. To enable us to solve these problems, we implemented some computer algebra algorithms in GAP [8] that exploit some mathematical domain knowledge to reduce complexity. In our experiments with isomorphism as the equivalence relation, we mainly concentrated on the domain of quasigroups and loops. A quasigroup is a nonempty set G together with a binary operation ◦ that satisfies the property ∀a, b∈G (∃x∈G x ◦ a = b) ∧ (∃y∈G a ◦ y = b). This property is often called the Latin Square property and has the effect that every element of G appears exactly once in every row and every column of the multiplication table of ◦. A loop is a quasigroup that contains a unit element, i.e., an element e such that ∀x∈G x ◦ e = e ◦ x = x. We generated novel isomorphism classification theorems for quasigroups of orders 3 to 5 and loops of order 4 to 6, a partial classification of
40
V. Sorge et al.
loops of order 7, as well as some classification theorems for quasigroups of orders 6 and 7 with additional special properties, e.g., idempotency. The largest decision tree so far is the full isomorphism classification of quasigroups of size 5. This contains 2875 nodes and 1411 isomorphism classes, but is relatively shallow, with a maximum depth of 23. Its completion took more than four months of processing time. For more details see http://www.cs.bham.ac.uk/˜vxs/quasigroups/.
3
Generating Isotopy Invariants
The isotopy equivalence relation is of considerable importance in quasigroup theory, hence we concentrate on isotopy in this paper. It is defined as follows: Definition 1. We say two quasigroups (G, ·) and (H, ∗) are isotopic to each other — or G is an isotope of H — if there are bijective mappings α, β, γ from G to H such that for all x, y ∈ G, α(x) ∗ β(y) = γ(x · y) holds. Isotopy is an equivalence relation and the classes induced by it are called isotopism classes. It is a generalisation of isomorphism, since G and H are isomorphic if α = β = γ. In other words, while two quasigroups can be isotopic but not necessarily isomorphic to each other, all members of an isomorphism class belong to the same isotopism class. Importantly, every quasigroup is isotopic to a loop, which we call its loop-isotope [14]. Thus, in certain respects, we can restrict the classification of quasigroups to just the classification of loops. As mentioned above, an important function of the bootstrapping algorithm is to generate invariant properties which enable us to discriminate between two algebras belonging to different equivalence classes. For instance, when dealing with the isomorphism equivalence relation, if one example of an algebra was commutative (i.e., ∀x, y (x∗y = y∗x)) and another was not, then they could not belong to the same isomorphism class. Our machine learning approach was able to generate such properties, given background concepts including the multiplication operation. Unfortunately, such simple properties do not enable us to distinguish between members of different isotopy classes. For example, commutativity is not an isotopy invariant, as the isotopism (α, ι, ι), where ι is the identity mapping and α = ι, can map a commutative quasigroup to a noncommutative isotope. Due to the more complicated nature of isotopy invariants, initial experiments with the learning system failed to identify any suitable isotopy invariants. In light of this failure, we developed bespoke methods for generating three types of isotopy invariants that are used while producing isotopic classifications of loops. We first describe the generation of universal identities (quasigroup isotopy invariants), which builds on a concept introduced by Falconer [6]. We describe how we obtain these invariants using an interplay of model generation and theorem proving. We also present two more types of invariants, which are based on substructures, which – to the best of our knowledge – have not been reported in the literature on loops, hence represent a novel way of characterising loops. These invariants are derived by systematically examining substructures of loops, and are constructed using symbolic computation techniques.
Automatic Construction and Verification of Isotopy Invariants
3.1
41
Universal Identities
Following Falconer [6], from a given loop identity we derive a universal identity. We first define two additional operations \ and / on a pair of elements such that: 1. x · (x\y) = y and x\(x · y) = y
2. (y/x) · x = y and (y · x)/x = y
Note that these operations are well defined because loops are quasigroups. Given a loop identity w1 = w2 , where w1 , w2 are words of a loop, i.e., combinations of elements of a loop with respect to the loop operation ·, we can obtain a derived or universal identity w 1 = w 2 by recursively applying the following transformations: 2. if w = u · v, then w = (u/y) · (z\v)
1. if w = x, then w = x;
Here y and z are arbitrary elements of a loop, i.e., new universally quantified variables. As an example, consider the two loops below: L4 0 1 2 3 4 5
0 0 1 2 3 4 5
1 1 2 0 5 3 4
2 2 0 1 4 5 3
3 3 5 4 1 0 2
4 4 3 5 0 2 1
5 5 4 3 2 1 0
L8 0 1 2 3 4 5
0 0 1 2 3 4 5
1 1 2 0 5 3 4
2 2 0 1 4 5 3
3 3 4 5 1 0 2
4 4 5 3 0 2 1
5 5 3 4 2 1 0
The following universal identity holds for L4 but does not hold for L8 : ∀x ∀y1 ∀y2 (x/y1 )·(((x/y1 )·(y2 \x))·(y2 \x)) = ((x/y1 )·(y2 \x))·((x/y1 )·(y2 \x)). This universal identity was derived from the following loop identity: ∀x x · ((x · x) · x) = (x · x) · (x · x) To generate universal identities for isotopy invariants, we have to first find an equation that holds in some loops, transform it into a derived identity, and subsequently show that this new identity is indeed a loop invariant. In order to get a large number of invariants we employ a process of interleaving model generation and first order theorem proving with intermediate transformations of the respective results. We first systematically generate identities I for which we check whether they are loop identities, by trying to generate a loop of size ≤ 8 that satisfies I using the model generator Mace. All identities for which a loop exists are then transformed into universal identities U as described above. Each U is then passed to at least one first order theorem prover in order to show that it is an isotopy invariant. We employ both Vampire [15] and E [16] for this task. Combined, these show that around 80% of the universal identities are indeed isotopy invariants. Note that for each universal identity, U , we show that it is an invariant under isotopy independently of the size of a loop. We can therefore reuse these universal identities in different classifications. Consequently, we do not have to repeat this step in every classification, but rather perform it offline and collect universal identities in a pool of confirmed isotopy invariants. To date, we have
42
V. Sorge et al.
generated 54, 000 confirmed loop identities, which can be translated into universal identities. From these, we have attempted to prove 8, 000 to be isotopy invariants and have succeeded to show this for 6, 831 using both Vampire and E. During the classification of loops of a particular size, n, we draw on this pool by first filtering them again by using the model generator Finder to generate loops of size n that satisfy the invariant. We then extract those invariants for which at least one loop of order n exists, and we use only these as potential discriminants. Note that the filter discards any invariants which cannot solve any discrimination problem, as no loop of size n satisfies the invariant property. Then, when we need to discriminate between two loops we test whether one the invariants holds for one and not for the other, using DPLLT or CVClite. While the model generation and theorem proving stages are ran in parallel by distributing the problems on a large cluster of processors for both the generation and testing of invariants, finding a discriminant can still take a very long time. Moreover, universal identities are not necessarily sufficient to discriminate between two non-isotopic quasigroups. We therefore explore only a limited number of randomly selected invariants, and if this is not successful, we employ two different methods for generating invariants, which are based on the substructures of quasigroups, as described in Sec. 3.2 and Sec. 3.3. 3.2
Substructure Invariants
Let (G, ·) be a quasigroup, and let A and B be non-empty subsets of G. We adopt the usual notation for the set A · B, namely, A · B = {a · b : a ∈ A ∧ b ∈ B}. Lemma 1. Let (G, ·) be a quasigroup and let (H, ∗) be a quasigroup that is isotopic to (G, ·) under the bijections (α, β, γ). Then, for any non-empty subsets A and B of G, we have |A · B| = |α(A) ∗ β(B)|. Proof. Observe that since γ is a bijection, then |γ(A · B)| = |A · B|. It suffices then to show that γ(A · B) = α(A) ∗ β(B). But this follows immediately from the fact that for all a ∈ A and b ∈ B, we have γ(a · b) = α(a) ∗ β(b). When G is finite, one can interpret the elements of A (resp., B) as designating a subset of rows (resp., columns) in the multiplication table of G. The set A · B then consists of the elements where these rows and columns meet. The above result thus suggests the following notation: Notation 1. Let (G, ·) be a quasigroup of order n, and let i, j, k each be integers such that 1 ≤ i, j, k ≤ n. Let G(i, j, k) denote the set: G(i, j, k) = {(A, B) : A, B ⊆ G, |A| = i, |B| = j, |A · B| = k} . Theorem 1. Let (G, ·) and (H, ∗) be isotopic quasigroups of order n, and let i, j, k each be integers such that 1 ≤ i, j, k ≤ n. Then |G(i, j, k)| = |H(i, j, k)|. Proof. Note that the one-to-one correspondence between the collection of ordered pairs (A, B) such that A, B ⊆ G, |A| = i, |B| = j, and the corresponding
Automatic Construction and Verification of Isotopy Invariants
43
collection of ordered pairs of subsets of H, is preserved under isotopy. The result now follows easily from Lemma 1. Continuing with the notation above, fix an element (A, B) ∈ G(i, j, k), and for each gh ∈ A · B, (1 ≤ h ≤ k), let f (gh ) = |{(a, b) ∈ A × B : a · b = gh }|. In other words, f (gh ) is the number of times that gh appears in the block formed by A and B (which we will henceforth refer to as the A · B block). We let F (A, B) = (f (g1 ), . . . , f (gk )), and call this the (un-ordered) frequency-tuple of (A, B). If two such frequency-tuples F and F are the same (up to order), then we write F ≈ F . Lemma 2. Let (G, ·) and (H, ∗) be isotopic quasigroups (under the bijections (α, β, γ)) of order n, and let i, j, k each be integers such that 1 ≤ i, j, k ≤ n. If (A, B) ∈ G(i, j, k), then F (A, B) ≈ F (α(A), β(B)). Proof. In light of Theorem 1, it suffices to prove that, for every g ∈ A · B, f (g) = f (γ(g)). But this equality follows immediately from the fact that if a · b = g, then α(a) ∗ β(b) = γ(g). Given this latest result, we adopt the following notation: Notation 2. Let (G, ·) be a quasigroup of order n, let i, j, k be integers such that 1 ≤ i, j, k ≤ n, and let F be a frequency-tuple for some (C, D) ∈ G(i, j, k). Then, let G(i, j, k, F ) denote the set: G(i, j, k, F ) = {(A, B) ∈ G(i, j, k) : F (A, B) ≈ F } Theorem 2. Let (G, ·) be a quasigroup of order n, let i, j, k be integers such that 1 ≤ i, j, k ≤ n, and let F be a frequency-tuple for some (C, D) ∈ G(i, j, k). Furthermore, let (H, ∗) be a quasigroup isotopic to (G, ·). Then |G(i, j, k, F )| = |H(i, j, k, F )|. Proof. This is an immediate consequence of Theorem 1 and Lemma 2. To generate isotopy invariants based on Theorems 1 and 2, we implemented an algorithm that compares the number of elements in substructures for two quasigroups (G, ·) and (H, ∗). This works iteratively, as follows: for i = 2, . . . , n− 1, j = 2, . . . , n − 1, and k = max(i, j), . . . , n, if |G(i, j, k)| = |H(i, j, k)| then return the invariant, otherwise continue. If all the possible substructures are exhausted without yielding an invariant, we perform a frequency analysis for all the G(i, j, k) and H(i, j, k) until we find a pair G(i, j, k, F ) and H(i, j, k, F ), such that F = F . To keep the formulas resulting for these invariants small, we always prefer an existence argument over the actual comparison of numbers of substructures. That is, we give a preference to invariants, such that either |G(i, j, k)| = 0 or |H(i, j, k)| = 0. As an example of such an invariant, consider property P9 below that expresses that there exists a 2 × 2 substructure that contains exactly 2 distinct elements. Note the use of the unique existence quantifier for variables v1 and v2 . P9 : ∃r1 , r2 ∃c1 , c2 ∃!v1 , v2 r1 = r2 ∧ c1 = c2 ∧ v1 = v2 ∧
2 2 2 i=1 j=1 k=1
(ri · cj = vk )
44
V. Sorge et al.
With respect to P9 , consider these loops: L21 0 1 2 3 4 5 3.3
0 0 1 2 3 4 5
1 1 0 3 2 5 4
2 2 4 0 5 1 3
3 3 5 1 4 0 2
4 4 2 5 0 3 1
5 5 3 4 1 2 0
Note that L21 satisfies property P9 , due to the boxed structure, which is a 2 × 2 structure with exactly 2 elements. However, P9 does not hold for L23 , i.e., |L23 (2, 2, 2)| = 0.
L23 0 1 2 3 4 5
0 2 0 1 4 5 3
1 0 1 2 3 4 5
2 1 2 0 5 3 4
3 5 3 4 0 1 2
4 3 4 5 2 0 1
5 4 5 3 1 2 0
Patterns
Given non-empty subsets A and B of a quasigroup (G, ·), we look for patterns amongst the numbers of distinct elements within the respective sub-blocks. By this, we mean the following: Let |A| = i, |B| = j, and choose i , j such that 1 ≤ i ≤ i and 1 ≤ j ≤ j. Now for each k, 1 ≤ k ≤ n, we let AB(i , j , k) = {(A , B ) : A ⊆ A, B ⊆ B, |A | = i , |B | = j , |A · B | = k}. Furthermore, let pk = |AB(i , j , k)|. In other words, pk is the number of i × j sub-blocks of the A · B block, that have precisely k distinct entries. We now let Pi ,j (A, B) = (p1 , . . . , pn ), and we call Pi ,j (A, B) the i × j pattern-tuple of (A, B). Lemma 3. Let (G, ·), (H, ∗), (α, β, γ), i, j, k, n be as in Lemma 2, and let i , j be integers such that 1 ≤ i ≤ i and 1 ≤ j ≤ j. If A, B ⊆ G such that |A| = i and |B| = j, then Pi ,j (A, B) = Pi ,j (α(A), β(B)). Proof. Note that for each k, 1 ≤ k ≤ n, and for each (A , B ) ∈ AB(i , j , k), we have |A · B | = |α(A ) ∗ β(B )|, by Lemma 1. Now since α and β are bijections, then (A , B ) ∈ AB(i , j , k) if and only if (α(A ), β(B )) ∈ α(A)β(B)(i , j , k). The result now follows in a straightforward manner. Following similar lines as previously, we introduce the following notation: Notation 3. Let (G, ·) be a quasigroup of order n, and let Pi ,j be an i × j pattern-tuple of (C, D) for some C, D ⊆ G such that |C| = i and |D| = j (1 ≤ i, j ≤ n), where integers i , j are such that 1 ≤ i ≤ i and 1 ≤ j ≤ j. We let G(i, j, Pi ,j ) denote the set: G(i, j, Pi ,j ) = {(A, B) : A, B ⊆ G, |A| = i, |B| = j, Pi ,j (A, B) = Pi ,j } Theorem 3. Let (G, ·) be a quasigroup of order n, and let Pi ,j be an i × j pattern-tuple of (C, D) for some C, D ⊆ G such that |C| = i and |D| = j (1 ≤ i, j ≤ n), where integers i , j are such that 1 ≤ i ≤ i and 1 ≤ j ≤ j. Furthermore, let (H, ∗) be a quasigroup isotopic to (G, ·). Then |G(i, j, Pi ,j )| = |H(i, j, Pi ,j )|. Proof. This follows immediately from Lemma 3. In order to generate additional invariants for a given pair of quasigroups (G, ·) and (H, ∗), we employ Theorem 3 in a similar fashion to that used in Sec. 3.2. That is, we successively compare the number of patterns of the same size and the same number of distinct elements.
Automatic Construction and Verification of Isotopy Invariants
4
45
Simplifying Problems
The most difficult verification problems which occur during the classification process involve showing that a given node forms an isotopy class, i.e., we need to prove that a particular set of properties is indeed classifying with respect to isotopy. The opposite problem is to take a given node that doesn’t form an isotopy class, and construct a representant that is not isotopic to the existing representant of the node. To solve these two problems with automated theorem provers and model generators requires considerable effort, as basic formalisations of these problems are not solvable using state of the art techniques. We therefore employ several computational methods, implemented in the computer algebra system GAP [8], for reducing the complexity of the problems to be solved. The methods we developed for isotopism problems make use of those we developed for the corresponding isomorphism problems and also exploit some known results from the isotopy theory of quasigroups. Hence, we first give a brief account of how we handled isomorphism problems (for further details see [12]), and then present their adaptation for isotopism, using theorems presented in [14]. 4.1
Handling of Isomorphism Problems
In general, to prove that a particular set of properties associated with a node on the decision tree is classifying with respect to isomorphism is a higher-order problem, as it requires proving that all structures satisfying the properties are isomorphic. However, since we are concerned with finite structures, the proof problem can be reduced to first-order and even propositional logic by enumerating all (finitely many) possible isomorphic structures of the representant, Q, of the node. With all isomorphic structures available, it suffices to prove that each structure that satisfies the properties of the node equals one of the isomorphic structures. For the construction of a non-isomorphic structure, it suffices to generate a structure that satisfies the properties of the node but differs from each of the isomorphic structures. A naive approach to enumerate all isomorphic structures considers all n! possible bijective mappings for structures of cardinality n. From each of these bijective mappings, a structure can be constructed that is isomorphic to Q, and these n! are all possible isomorphic structures of Q. This naive approach can be simplified by the usage of generating systems. A structure Q with binary operation ◦ is said to be generated by a set of elements {q1 , . . . , qm } ⊆ Q if every element of Q can be expressed as a combination — usually called a factorisation or word — of the qi under the operation ◦. We call a set of generators together with the corresponding factorisations a generating system. Given a generating system, we can exploit the fact that each isomorphism is uniquely determined by the images of the generators, to reduce the total number of isomorphisms that we need to consider. If n is the cardinality of the structures and m is the number of generators, then, instead n! of n!, there are only (n−m)! possible mappings and possible resulting structures to consider, since only the m generators have to be mapped explicitly. In theory, the worst case complexity resulting from the simplification with generating systems is still n!, because there is no guarantee that a generating
46
V. Sorge et al.
system with a small number of generators exists. In our experiments, however, the number of generators was typically only 2, so the complexity for isomorphism problems was reduced to n2 from n!. This approach requires additional effort in order to compute a (minimal) generating system for Q, as well as to prove that a computed system is indeed a generating system for Q. The former task is performed by GAP, whereas the latter task results in an additional proof problem, which is trivial even for large n. 4.2
Handling of Isotopism Problems
Unfortunately, we cannot directly employ the trick of using generating systems since we now have three, instead of one, bijective mappings to consider, so the images of the generators no longer uniquely determine isotopisms. This means a naive approach to performing the isotopism proof or model construction would have to consider all (n!)3 possible triples of bijective mappings to be applied to a representant Q of a node in the classification tree. As for the case of isomorphism, from the (n!)3 possible triples of bijective mappings, all (n!)3 structures that are isotopic to Q can be computed. With all the isotopes of Q available, an automated theorem prover can be used to show that each structure that satisfies the properties of the node equals one of the isotopes. Moreover, a model generator can be asked to compute a structure that satisfies the properties of the node but differs from each of the isotopes. However, we have found that this naive approach exceeds the abilities of state of the art systems, even for n = 4. In order to simplify our problems, particularly by making use of the reduction offered by generating systems, we exploit the following known results from the isotopy theory of quasigroups (see [14] for details). Definition 2. A quasigroup (G, ·) is a principal isotope of the quasigroup (G, ∗) if there are permutations α, β, ι on G such that for all x, y ∈ G, α(x) ∗ β(y) = ι(x · y), where ι is the identity permutation. Theorem 4. If (G, ·) and (H, ∗) are isotopic quasigroups, then (H, ∗) is isomorphic to some principal isotope of (G, ·). Definition 3. Let (G, ·) be a quasigroup. Let f and g be fixed elements of G. Then the isotope (G, ∗) such that (x · g) ∗ (f · y) = x · y for all x, y ∈ G is called the fg-isotope of (G, ·). Theorem 5. Let (G, ·) and (H, ∗) be quasigroups. H is isotopic to G if and only if it is isomorphic to an f g-isotope of G. These results show that, for our purposes, we no longer need to consider quasigroups with distinct ground sets (the set G will typically suffice). Moreover, it is easy to show that every f g-isotope is, in fact, a loop with unit element f g. Consequently, every quasigroup has a loop isotope, and hence, rather than compute all (n!)3 isotopes for a structure Q, it suffices to: 1. compute the set of all f g-isotopes of Q, F G(Q), 2. remove structures from F G(Q) that are isomorphic to other structures in F G(Q) until the resulting set F G (Q) contains no pair of isomorphic structures,
Automatic Construction and Verification of Isotopy Invariants
47
3. compute the set of all structures that are isomorphic to a structure in F G (Q). We denote this set: If g (Q). For a given representant, Q, there are n2 f g-isotopes and for each f g-isotope there are n! isomorphic structures. Hence, our simplification reduces the complexity from (n!)3 structures in the naive approach to n!n2 structures. Although this is already a considerable improvement, the resulting complexity still exceeds the capabilities of state of the art systems, even for small n. We now observe that step 3 discusses isomorphisms only, which consequently allows us to exploit the generating systems simplification developed for the isomorphism problems. Hence, instead of step 3 above, we can perform the following alternative steps: 3a. compute a generating system for each structure in F G (Q), respectively, 3b. for each pair of a structure in F G (Q) and its generating system, compute the set of isomorphic structures with respect to the generating system, respectively. Then, compute the union Gf g (Q) of all resulting structures. In theory, the worst case complexity of Gf g (Q) is still n!n2 . However, in our experiments, we found that the complexity of F G (Q) is typically n rather than n2 , and the complexity resulting from the isomorphisms typically comes to n2 instead of n! when the generating system simplification is exploited. Hence, in practise, the simplified isotopism proof and model generation formalisations typically require the consideration of just n3 structures. Note that additional effort is necessary to prove that two given structures are isomorphic in step 2, and to compute and verify generating systems in step 3a. However, these additional problems are trivial even for large problems.
5
Results
Our primary result is a new qualitative isotopy classification theorem for loops of order 6, in the sense that it provides a full set of classifiers for the 22 known isotopism classes and corresponding representants. The decision tree for the theorem is given in Fig. 1 (for display purposes, we have broken it up into five parts). The entire tree consists of 43 nodes, where the doubly circled leaf nodes represent isotopy classes. The respective classifying properties correspond to the conjunction of discriminants given along the path from the leaves to the root. The discriminants themselves are given below the tree in Fig. 1. Note that the top three discriminants are universal identities, while the remainder are substructure invariants, with the exception of P11 , which is a pattern invariant. The pattern invariant and four of the substructure invariants (P4 , P5 , P6 , P9 ) use an existence argument. The reason for the relatively low number of universal identities as discriminants is that in each step we only test 100 invariants randomly selected out of the overall pool of known invariants before testing for discriminating substructure properties. This restriction is necessary as in our current implementation, testing universal identities with respect to discrimination still consumes a lot of search time. However, we hope that with more advanced encoding techniques and better pre-selection criteria for universal
48
V. Sorge et al. 1
¬P1 2
¬P2 4
P2 P3
6
P5
7
10
8
P7
¬P7
P7
14
15
16
P6
11
26 11
P9
13 ¬P7
21
17
26
18
19
24
30
P14 ¬P12 31
34 P15 36
P1 : P2 : P3 : P4 : P7 : P10 : P12 : P15 :
25 27
P13
32
29 P12
23 ¬P11
P11
¬P7
28
¬P10
22 ¬P8
¬P6 27
P10
P8
9
¬P9
20
¬P13 P7
¬P4
¬P5
10
P6
12
3
P4 5
¬P3
¬P6
P1
33 ¬P14
35 ¬P15
¬P16
P16
39
38
¬P14
P14 40
37
41
P17
¬P17
42
43
∀x, y1 , y2 (x/y1 )·(((x/y1 )·(y2 \x))·(y2 \x)) = ((x/y1 )·(y2 \x))·((x/y1 )·(y2 \x)) ∀x, y1 , y2 (x/y1 )·(((x/y1 )·(y2 \y1 ))·(y2 \y1 )) = ((x/y1 )·((x/y1 )·(y2 \y1 )))·(y2 \y1 ) ∀x, y1 , y2 (x/y1 ) = ((x/y1 )·(y2 \y1 ))·(y2 \y1 ) |G(3, 3, 4)| = 0 P5 : |G(3, 3, 3)| = 0 P6 : |G(2, 3, 3)| = 0 |G(2, 2, 2)| = 7 P8 : |G(2, 3, 3)| = 8 P9 : |G(2, 2, 2)| = 0 |G(2, 2, 2)| = 9 P11 : |G(4, 4, P2,2 )| = 0 ∧ P2,2 = (0, 2, 12, 23, 0, 0) |G(3, 3, 3)| = 8 P13 : |G(2, 2, 2)| = 5 P14 : |G(2, 3, 3)| = 4 |G(3, 2, 3)| = 4 P16 : |G(2, 2, 2)| = 4 P17 : |G(2, 2, 2)| = 11
Fig. 1. Decision tree and discriminating properties for the loops 6 classification
identity invariants, we will be able to replace more substructure discriminants in the decision tree by universal identities. For the construction of the models in the tree, we used the Sem, Finder, and DPLLT systems. Apart from the proof problems concerning universal identities, which were carried out by Vampire or E, the majority of the verification proofs have been done with CVClite or DPLLT. A translation into a purely propositional problem without equality, was prohibitive, however, as both the number of variables and nesting depth of the quantifications led to intractably large boolean satisfiability problems. While most properties of the tree are fully automatically verified, we still had difficulty proving that some of the leaf nodes are indeed isotopy classes. However, we hope to finalise this by shifting to the new version of DPLLT in the future. Moreover, given that the number of isotopy classes of order 6 is indeed 22 as known from the literature [11,14] and given the
Automatic Construction and Verification of Isotopy Invariants
49
proved discriminating nature of the classifying properties, the decision tree does indeed constitute a classification theorem. We are currently working on the classification theorem for loops of order 7, for which there are 564 isotopy classes. So far, we have generated 563 nodes in the decision tree, which contains 439 substructures invariants, with an additional 7 employing frequencies, and 117 depending on pattern invariants or frequency on patterns. We suspect that the remaining isotopy class can be discriminated by extending the concept of patterns by a recursion depth. In many cases, the model generation fails to produce any results, so we re-use results from our isomorphism classification of loops of order 7, as well as enumeration results by McKay and Myrwold [11], to search for non-isotopic models. That is, we start with a set of isomorphism class representants of order 7 loops and search amongst those for one that is non-isotopic to a given loop. Once we have finished the tree, we hope to simplify it by searching for useful universal identities.
6
Conclusions and Future Work
We have shown how the bootstrapping algorithm developed in [4] has been extended to produce isotopy classification theorems in loop theory. This involved solving two technical problems, namely the generation of isotopic invariants, and verifying that nodes in the classification tree define isotopy classes. To solve the first problem, we used the notion of universal identities to generate a pool of thousands of invariants, and we also developed new techniques for using concepts based on substructures as invariants. To solve the second problem, we combined facts from algebraic quasigroup theory in a novel computational way to extend techniques we developed in the context of isomorphism classification to isotopy problems. This not only significantly simplified the verification tasks but also made the related task of generating non-isotopic models feasible. However, it has become clear that it will be difficult to exceed loops of size 7 or 8 with our existing techniques. In particular, the computational techniques to generate substructure invariants need to be improved in order to scale up to larger sizes. This problem can be partially overcome by using more universal identities as discriminants. To this end, it is necessary to cut down on search time by using more intelligent pre-selection of universal identity invariants, as well as producing better reformulations of problems involving universal identities to make them accessible to efficient SAT solving techniques. It is also not clear that our current set of invariants will suffice for all orders. Indeed, we have already investigated another kind of invariant – which extends the concept of frequency to patterns – but which has not been used in the context of the results presented in this paper. Finding new types of isotopy invariants for loops is certainly interesting from a mathematical viewpoint. We plan to extract details of the bespoke invariant generation techniques developed for isotopy into more extensive background knowledge and improved production rules for the machine learning approach. We believe that it may be possible to re-instate the learning approach, thus restoring a more generic approach to invariant discovery, and possibly discovering new invariants. We
50
V. Sorge et al.
also plan to undertake a detailed analysis of the decision trees, as well as a comparison of classification theorems of different sizes for particular structures. From a purely technical point of view, an efficient reimplementation of the entire bootstrapping algorithm would be desirable to overcome some limitations of the Lisp implementation and choice of data structures. Historically, classification projects have been undertaken for algebras with relatively few classes for small orders (e.g., there are only 5 groups of size 8 up to isomorphism). However, classification of other algebras are no less valid projects, albeit ones which are possibly more suited to an automated approach, due to the large number of classes of small order (e.g., there are 106, 228, 849 loops of size 8 up to isomorphism and 1, 676, 267 up to isotopism). We have shown that it is possible to make progress on large classification projects, but that this requires the combination of reasoning techniques, including theorem proving, model generation, machine learning, satisfiability solving and symbolic algebra. We believe that such integrated approaches will lead the way in the application of automated reasoning to complex mathematical discovery tasks.
References 1. R. Alur and D. Peled, eds. Proc. of CAV 2004, LNCS 3114. Springer Verlag, 2004. 2. C. Barrett and S. Berezin. CVC Lite: A New Implementation of the Cooperating Validity Checker. In Alur and Peled [1], pages 515–518. 3. S. Colton. Automated Theory Formation in Pure Mathematics. Springer, 2002. 4. S. Colton, A. Meier, V. Sorge, and R. McCasland. Automatic Generation of Classification Theorems for Finite Algebras. In Proc. of IJCAR 2004, LNAI 3097, pages 400–414. Springer Verlag, 2004. 5. S. Colton and S. Muggleton. Mathematical Applications of Inductive Logic Programming. Machine Learning Journal, 2006 (forthcoming). 6. E. Falconer. Isotopy Invariants in Quasigroups. Transactions of the American Mathematical Society, 151(2):511–526, 1970. 7. H. Ganzinger, G. Hagen, R. Nieuwenhuis, A. Oliveras, and C. Tinelli. DPLL(T): Fast Decision Procedures. In Alur and Peled [1], pages 175–188. 8. GAP Group. Groups, Algorithms, and Programming, v4.7, 2002. gap-system.org. 9. W. McCune. Single axioms for groups and Abelian groups with various operations. Journal of Automated Reasoning, 10(1):1–13, 1993. 10. W. McCune. Mace4 Reference Manual and Guide. Argonne Nat. Laboratory, 2003. 11. B. McKay, A. Meynert, and W. Myrvold. Counting Small Latin Squares. In Europ. Women in Mathematics Int. Workshop on Groups and Graphs, pages 67–72, 2002. 12. Andreas Meier and Volker Sorge. Applying SAT Solving in Classification of Finite Algebras. Journal of Automated Reasoning, 34(2):34 pages, 2005. 13. M. Moskewicz, C. Madigan, Y. Zhao, L. Zhang, and S. Malik. chaff: Engineering an efficient SAT Solver. In Proc. of DAC-2001, pages 530–535, 2001. 14. H. Pflugfelder. Quasigroups and Loops: Introduction, volume 7 of Sigma Series in Pure Mathematics. Heldermann Verlag, Berlin, Germany, 1990. 15. A. Riazanov and A. Voronkov. Vampire 1.1. In Proc. of IJCAR 2001, LNAI 2083, pages 376–380. Springer Verlag, 2001. 16. S. Schulz. E: A Brainiac theorem prover. J. of AI Comm., 15(2–3):111–126, 2002. 17. J. Schwenk. A classification of Abelian quasigroups. Rendiconti di Matematica, Serie VII, 15:161–172, 1995.
Automatic Construction and Verification of Isotopy Invariants
51
18. J. Slaney. FINDER, Notes and Guide. Center for Information Science Research, Australian National University, 1995. 19. J Slaney, M Fujita, and M Stickel. Automated reasoning and exhaustive search: Quasigroup existence problems. Computers and Mathematics with Applications, 29:115–132, 1995. 20. G. Sutcliffe. The IJCAR-2004 Automated Theorem Proving Competition. AI Communications, 18(1):33–40, 2005. 21. C Weidenbach, U Brahm, T Hillenbrand, E Keen, C Theobald, and D Topic. SPASS Version 2.0. In Proc. of CADE–18, LNAI 2392, pages 275–279. Springer, 2002. 22. J. Zhang and H. Zhang. SEM User’s Guide. Dep. of Comp. Science, UIowa, 2001.
Pitfalls of a Full Floating-Point Proof: Example on the Formal Proof of the Veltkamp/Dekker Algorithms Sylvie Boldo INRIA Futurs – PCRI LRI, Bˆ at. 490, Universit´e Paris Sud 91405 Orsay Cedex, France
[email protected]
Abstract. Some floating-point algorithms have been used for decades and proved decades ago in radix-2, providing neither Underflow, nor Overflow occurs. This includes the Veltkamp algorithm, used to split a float into an upper part and a lower part and the Dekker algorithm, used to compute the exact error of a floating-point multiplication. The aim of this article is to show the difficulties of a strong justification of the validity of these algorithms for a generic radix and even when Underflow or Overflow occurs. These cases are usually dismissed even if they should not: the main argument in radix 2 of the first algorithm fails in other radices. Nevertheless all results still hold here under mild assumptions. The proof path is interesting as these cases are hardly looked into and new methods and results had to be developed.
1
Introduction
Some algorithms have been known for decades and thoroughly used. They have been proved by different people using different paths. This can be seen enough for the result to be considered correct [1]. Nevertheless, examples have shown that even in this case, the proof may be wrong [2,3]. In particular, such floatingpoint results can reasonably be assumed to be correct in the general case (radix 2, no Underflow, no Overflow). This assumption is usually based on a pen and paper proof. In this kind of proof, the hypotheses and where they are used are well hidden: it is difficult to remove a hypothesis such as “the radix is 2” and see where the proof has to be modified. This is the reason why we use formal methods and proof assistants that give a high guarantee in the result. Some formalizations have already been successful including both hardwarelevel [4,5,6] and high-level floating-point arithmetic [7,8]. We use the Coq proof assistant [9] where floating-point arithmetic has already been formalized [10]. The case study we are interested in is first the Veltkamp algorithm used to split floating-point numbers into an upper and a lower part. This algorithm was discovered at the same period both by Veltkamp who is probably the main author of [11,12] and by Dekker [13]. Both paternities are uncertain, that is why we chose to use their most common denomination. U. Furbach and N. Shankar (Eds.): IJCAR 2006, LNAI 4130, pp. 52–66, 2006. c Springer-Verlag Berlin Heidelberg 2006
Pitfalls of a Full Floating-Point Proof
53
The second case study is an algorithm by Dekker [13] that computes the exact floating point error of a multiplication: for two floating-point numbers a and b in a given format, we compute two floating-point numbers r1 and r2 in the same floating-point format such that r1 is the rounding of a×b and a×b = r1 +r2 (this is an exact mathematical equality). With other similar techniques, this allows the computation with floating-point expansions [13,14,15] to do multi-precision computations using the floating-point unit of the processor. Such methods are used in computational geometry [15] to give the correct answer to “is this point to the left or to the right of this line?” or “is this point in or out of this circle?”. This paper is organized as follows: Section 2 describes floating-point arithmetic and our formalization of it. Section 3 describes the splitting algorithm and its full proof. Section 4 describes the exact multiplication algorithm and its full proof. Section 5 gives some perspectives.
2
Floating-Point Arithmetic
Floating-point formats, numbers and operations are defined by the IEEE-754 standard for binary floating-point arithmetic [16,17], that is under revision [18], and by the IEEE-854 standard for generic radix [19]. A format is a limit on the size of the fraction and on the size of the exponent. A floating-point number is then composed of a sign, a fraction and an exponent within these limits. The value of the floating-point number is then: (−1)s 1.f × 2e−B where s is the sign bit, f is the fraction (the list of bits), e is the exponent and B is a known bias. For special values of the exponent, we get either smaller values that are called subnormals (their value is (−1)s 0.f × 21−B ) or exceptional values such as ±∞. As in [20], the Coq formalization of floating-point arithmetic [10] represents a floating-point number by a pair of integers. For example, the radix-2 number 1.0012E1 is represented as (10012 , −2) = (9, −2). The left part of a float is called the significant and the right part is the exponent. Note that the exponent is shifted compared to the exponent of the IEEE machine number. The radix is defined as 2 in the IEEE-754 standard and can be either 2 or 10 in the IEEE-854 standard. With the increasing interest in the radix 10, we set the radix β as an integer greater than 1. Therefore, a float can be interpreted as a real value as follows: (n, e) ∈ Z2 → n × β e ∈ R. We restrict ourselves to the numbers that fit in a given floating-point format, that gives both the precision p of the format and its minimal exponent Ei . We say that a float (n, e) is bounded if and only if |n| < β p and e ≥ −Ei . The main characteristic of this formalization is that many floats (called a cohort [18]) may share the same real value: (4, 3)10 =R (40, 2)10 =R (400, 1)10 =R (4000, 0)10. To retain uniqueness, we define a canonical representation corresponding to the IEEE float. The reader does not have to bother with the canonicness or not of the floats in the proof. These ponderous details are handled by the proof assistant (and the author). For a canonical float f = (nf , ef ), the ulp (unit in the last place) has a simple expression: ulp(f ) = β ef .
54
S. Boldo
The rounding to nearest is denoted by ◦. It gives the bounded float that is the closest to the real value. When in the middle, this rounding can give either of the enclosing floats. This implies that if f = ◦(x), then |x − f | ≤ ulp(x)/2 = β ef /2. even The default IEEE rounding mode is denoted by ◦ : when in the middle, it gives the float with the even mantissa (ending with a zero in radix 2). All lemmas and theorems explained below are proved in Coq. The corresponding files Veltkamp.v and Dekker.v can be found at the following address: http://lipforge.ens-lyon.fr/www/pff/.
3 3.1
Splitting of Float Algorithm and Hand-Waving Proof
Algorithm. The Veltkamp algorithm [11,12,13] splits a float x in two floats x1 and x2 such that x = x1 + x2 and x1 is the result of rounding x on a smaller precision. More precisely, let s and t be two integers such that 2 ≤ s ≤ t − 2. The working precision is on t bits (or digits) and we want to have x1 on t − s bits. The remainder x2 fits on s bits. The algorithm, described below, only contains usual floating-point operations using the working precision. The constant C can be stored beforehand so the algorithm consists of 4 operations: Algorithm 1 (Veltkamp) Let C = β s + 1. p = ◦(x × C) q = ◦(x − p) x1 = ◦(p + q) x2 = ◦(x − x1 ) Hand-Waving Proof. We compute p ≈ (β s + 1) × x and q ≈ x − p ≈ −β s × x. Therefore p and −q are very near one from another and x1 is computed exactly. And p + q ≈ x as q ≈ x − p, but the exponent of q is about s plus the exponent of x so that all the bits of x lesser than this value are lost. So, we only have in x1 the first t − s bits of x. s bits x +
s
β ×x p = ◦(x × C) q = ◦(x − p) x1 = p + q
Of course, a drawing is not a proof, and especially not a formal proof. We have to deal with many cases, including the possible values of the exponent of p and x1 and, the main point is the values of the exponent of q.
Pitfalls of a Full Floating-Point Proof
3.2
55
Proof
Lemma 1 (ImplyClosest). For a bounded float f (in precision p), if we have e β e+p−1 ≤ z ≤ β e+p and β e+p−1 ≤ f and |z − f | ≤ β2 , then f = ◦(z). In this lemma, e is an integer being the exponent of f . As both z and f are greater that β e+p−1 , we do not fall into the pitfall described below. For more on the definition of the ulp so that this pitfall does not exist, see [21]. ) ulp(f ) f + − ulp(f 2 ; 2
z ◦(z)
f = βe
Radix-2 Assumptions. Let us move to the proof of the Veltkamp algorithm. We first assume the radix is 2 and that there is no Underflow: all floats are normal floats. We assume that the rounding is any rounding to nearest in precision t. Here are the steps taken to prove Algorithm 1: 1. 2. 3. 4. 5. 6. 7.
x1 = p + q, e q ≤ ep ≤ s + 1 + e x , eq ≤ s + ex , eq ≥ s + ex , x1 = ◦t−s (x) using Lemma 1, x = x1 + x2 , x2 fits on s − 1 bits.
All proofs will be described in the next subsection except Step 3’s and 7’s. As a matter of fact, these assumptions only hold in radix 2. Lemma 2 (eqLe2). β = 2 ⇒ eq ≤ s + ex We bound q: |q| = ◦(|p − x|) ≤ (2s + 1)|x| + 12 ulp(p) − |x| + 12 ulp(q). Then |q| ≤ 2s |x| + ulp(p) ≤ |nx |2ex +s + 2s+1+ex = (|nx | + 2)2ex +s . If |nx | ≤ 2t − 3, then |q| < 2t+ex +s and we deduce that eq ≤ s + ex . If not, then we either have |nx | equal to 2t − 2 or 2t − 1: – if |nx | = 2t − 1, then |p| = 2ex +s+1 (2t−1 + 2t−s−1 − 1) and |q| = 2ex +s (2t − 2), – if |nx | = 2t − 2, then |p| = 2ex +s+1 (2t−1 + 2t−s−1 − 1) and |q| = 2ex +s (2t − 2), In any case, we have that eq ≤ s + ex .
This lemma is very useful as it exactly gives the exponent of q, which is the key point of the proof in [13,15]. Here is a counter-example when β = 10, t = 6 and s = 3. For x = 999996, we have x × 1001 = 1000995996, which is rounded into p = 1001000000, and x − p = −1000000004, therefore q = −1000000000, whose exponent is 4 + e_x = 1 + s + e_x. What fails in the previous proof is that we only have |q| ≤ (n_x + β)β^{e_x+s}. So instead of having 2 cases to check, we have β cases to check, and when β > 2 we can find among these cases some where the result does not hold.
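This counter-example can be replayed with Python's decimal module, which implements radix-10 arithmetic with a configurable precision and rounds to nearest, ties to even, by default (the variable names follow Algorithm 1):

from decimal import Decimal, getcontext

getcontext().prec = 6                    # t = 6
x = Decimal(999996)
C = Decimal(10**3 + 1)                   # s = 3
p = x * C                                # 1000995996 rounds to 1.00100E+9
q = x - p                                # -1000000004 rounds to -1.00000E+9
assert p == Decimal("1.00100E+9")
assert q == Decimal("-1.00000E+9")
assert q.as_tuple().exponent == 4        # e_q = 4 = 1 + s + e_x, not s + e_x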
Lemma 3 (Veltkamp_tail2). β = 2 ⇒ x2 fits on s − 1 bits.

It will be proved later (Lemma 13) that x2 can be represented with a mantissa smaller than or equal to β^s/2. As β = 2, this mantissa is smaller than or equal to β^{s−1}, therefore x2 can be represented on s − 1 bits. This fact is well known in radix 2, but has counter-examples in greater radices, for example when β = 10, t = 6 and s = 3: for x = 123456, we have x1 = 123000 and x2 = 456, which cannot fit in 2 digits. In radix 2, the sign is enough to compensate one bit and x2 can be shortened by one bit.

A Proof with a Generic Radix But Still no Underflow. We assume there is no Underflow. We want to prove that x1 = ◦_{t−s}(x). By symmetry of the rounding, we assume that x > 0. Therefore, we have p > 0, q < 0 and x1 > 0.

Lemma 4 (hxExact). x1 = p + q

We use Sterbenz's lemma [22] to prove it. First, −q = ◦(p − x) ≤ ◦(p) = p. Second, if f = ◦(z) and f is normal, then |z| ≤ |f|(1 + β^{1−t}/2), so

p = (p − x) + x ≤ |q|(1 + β^{1−t}/2) + xC/(β^s + 1) ≤ |q|(1 + β^{1−t}/2) + |p|(1 + β^{1−t}/2)/(β^s + 1).

Therefore, some computations lead to p ≤ 2|q|. As |q| ≤ p ≤ 2|q|, Sterbenz's lemma ensures that p + q is computed exactly, hence x1 = p + q.
Lemma 5 (eqLeep, epLe). e_q ≤ e_p ≤ s + 1 + e_x

As |q| ≤ |p|, we have e_q ≤ e_p. As p = ◦((β^s + 1) × x) ≤ ◦(β^{s+1} × x), we have e_p ≤ s + 1 + e_x.

Lemma 6 (eqLe). We either have e_q ≤ s + e_x, or both q = −β^{t+s+e_x} and |x − x1| ≤ β^{e_x+s}/2.

The result is a disjunction: either we are in the common case (as in radix 2), where the exponent of q will be known exactly, or we are in the exceptional case corresponding to the counter-example of the previous section: then q is slightly too big to have the exponent s + e_x. Nevertheless, we can bound the difference between x and x1 as if q had a smaller exponent.

|q| = −q = ◦(p − x) ≤ p − x + ulp(q)/2 ≤ (β^s + 1)x + ulp(p)/2 − x + ulp(q)/2. Therefore, |q| ≤ β^s x + ulp(p) ≤ n_x β^{e_x+s} + β^{s+1+e_x} ≤ (n_x + β)β^{e_x+s}. If n_x ≤ β^t − β − 1, then |q| < β^{t+e_x+s} and we deduce that e_q ≤ s + e_x. If not, then n_x ≥ β^t − β, so

x × C ≤ β^{e_x}(β^t − 1)(β^s + 1) < β^{s+1+e_x}(β^{t−1} + β^{t−s−1})
p ≤ β^{s+1+e_x}(β^{t−1} + β^{t−s−1}) ≤ β^{t+s+e_x} + β^{t+e_x}

So, we have p − x < β^{t+s+e_x} and −q ≤ β^{t+s+e_x}. If −q < β^{t+s+e_x}, the result holds. When −q = β^{t+s+e_x}, we bound x − x1:

– if p − x ≤ −q, then −q is nearer to p − x than (−q)⁻, the predecessor of −q, therefore the distance between p − x and −q is smaller than half the distance between −q and its predecessor: |x − x1| = |(p − x) − (−q)| ≤ (−q − (−q)⁻)/2 = ulp((−q)⁻)/2 = β^{e_x+s}/2.
– if p − x ≥ −q, then |x − x1| = |(p − x) − (−q)| = p − x + q, so we have |x − x1| ≤ β^{t+s+e_x} + β^{e_x+1} − β^{t+s+e_x} ≤ β^{e_x+s}/2.

Lemma 7 (eqGe). e_q ≥ s + e_x

We bound q: |q| = ◦(p − x) ≥ p − x − ulp(q)/2 ≥ β^s x − ulp(p)/2 − ulp(q)/2. Therefore |q| ≥ n_x β^{s+e_x} − β^{s+1+e_x} ≥ (n_x − β)β^{s+e_x}. So if n_x ≥ β^{t−1} + β, we have e_q ≥ s + e_x. If not, we may have n_x = β^{t−1}. In this case, all the computations are exact and −q = β^{s+t−1+e_x} has the required exponent. The last uncovered possibility is that β^{t−1} + 1 ≤ n_x ≤ β^{t−1} + β − 1. Note that in radix 2, there is only one such float. Here, we precisely bound p and q:

x × C ≥ β^{e_x}(β^{t−1} + 1)(β^s + 1) > β^{s+e_x}(β^{t−1} + β^{t−s−1} + 1)
p ≥ β^{s+e_x}(β^{t−1} + β^{t−s−1} + 1) ≥ β^{t+s−1+e_x} + β^{t−1+e_x} + β^{s+e_x}
p − x ≥ β^{t+s−1+e_x} + β^{t−1+e_x} + β^{s+e_x} − β^{e_x}(β^{t−1} + β − 1) > β^{t+s−1+e_x}
−q ≥ β^{t+s−1+e_x}

In all cases, e_q ≥ s + e_x.
Therefore, we always have |x − x1| ≤ β^{e_x+s}/2! Indeed, depending on the case of Lemma 6 and the result of Lemma 7, we either have e_q = s + e_x, or both q = −β^{t+s+e_x} and |x − x1| ≤ β^{e_x+s}/2. But if e_q = s + e_x, then we also have |x − x1| = |(x − p) − q| ≤ ulp(q)/2 = β^{e_x+s}/2.
Lemma 8 (Veltkamp_aux_aux). β^{t−1+e_x} ≤ x1

If n_x ≥ β^{t−1} + β^s/2, then β^{t−1+e_x} ≤ x − β^{s+e_x}/2 ≤ x − |x − x1| ≤ x1. If not, then we have an integer ε < β^s/2 such that n_x = β^{t−1} + ε, and we can exactly give the values of p and q using ε: p = β^{s+e_x}(β^{t−1} + β^{t−1−s} + ε) and q = −β^{s+e_x}(β^{t−1} + ε). So x1 = β^{t−1+e_x}. We proved that these are the only possible roundings to nearest of x × C and x − p using Lemma 1.

Lemma 9 (Veltkamp_pos). x1 = ◦_{t−s}(x)

Let us apply Lemma 1 with z = x, f = x1, p = t − s and e = e_x + s. We first have to prove that x1 can be represented on t − s bits. We know that e_q = s + e_x and s + e_x ≤ e_p ≤ s + e_x + 1, therefore x1 = p + q can be represented with the exponent s + e_x. The corresponding mantissa m is then such that |m| = |x1| β^{−s−e_x} ≤ (|x| + |x − x1|) β^{−s−e_x} < β^{t−s} + 1/2. So |m| ≤ β^{t−s}. This implies that there exists a float v bounded on t − s bits that is equal to x1 (it has exponent s + e_x and mantissa m, except when m = β^{t−s}, where it has exponent 1 + s + e_x). We then have β^{e_x+t−1} ≤ x ≤ β^{e_x+t} as x is normal. From Lemma 8, we have that β^{e_x+t−1} ≤ x1. Finally, we have that |x − x1| ≤ β^{e_x+s}/2, which ends the proof.

Why Underflow Does Not Matter Here. The reason why Underflow does not matter here is that we only deal with floating-point additions. Indeed, multiplying x by C is only adding x and xβ^s (which is a bounded float). When you
perform a floating-point addition, you cannot lose precision due to Underflow: all the nonzero digits will be above the Underflow threshold. Note that the Underflow threshold in precision t − s is the same as in precision t. We then artificially create a floating-point format with an Underflow threshold small enough to guarantee that all our floating-point numbers in the working precision will be normal: we put E_i^{ext} = E_i + t − 1.

Lemma 10 (bimplybplusNorm). If f is bounded and nonzero, there exists g such that g is normal with respect to the extended format and equal to f.

We transform our subnormal x into a normal float in the extended precision. We then have to link the rounding modes in the working and extended formats.

Lemma 11 (Closestbplusb). If f = ◦_ext(z) and f is bounded in the working format, then f = ◦(z).

Lemma 12 (Closestbbplus). Let f_ext be a float (possibly unbounded) such that e_{f_ext} ≥ −E_i. If f = ◦(f_ext), then f = ◦_ext(f_ext).

When performing a floating-point addition, we are exactly in this case: the real we want to round is an integer multiplied by the Underflow threshold. Even when x is subnormal, we prove that the algorithm gives the expected result:

Theorem 1 (Veltkamp). If x is a bounded float and p, q and x1 are computed using Algorithm 1, then x1 = ◦_{t−s}(x).

As for x2, we first prove

Lemma 13 (Veltkamp_tail_aux). The value x − x1 can be represented with exponent e_x and a mantissa smaller than or equal to β^s/2.

When x is normal, as x1 ≥ β^{t−1+e_x}, we know that x1 has an exponent greater than the exponent of x. When x is subnormal, its exponent is minimal, therefore the exponent of x1 is greater. So in any case, x − x1 can be represented with exponent e_x. Moreover, |x − x1| ≤ β^{e_x+s}/2, so the mantissa is |x − x1| β^{−e_x} ≤ β^s/2.

Lemma 14 (Veltkamp_tail). x = x1 + x2 and x2 fits on s bits.

From Lemma 13, we know that x − x1 fits on t bits. It will therefore be computed exactly. Moreover, the lemma also states that x2 = x − x1 can be represented with a mantissa strictly smaller than β^s, therefore it fits on s bits.
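The behaviour stated by Theorem 1 and Lemma 14 (and the radix-10 counter-example to Lemma 3) can be observed with Python's decimal module; the chosen precision and values are ours:

from decimal import Decimal, getcontext

getcontext().prec = 6                    # t = 6, s = 3, C = 10^3 + 1
x = Decimal(123456)
C = Decimal(1001)
p = x * C
q = x - p
x1 = p + q
x2 = x - x1
assert x1 == 123000                      # x1 = o_{t-s}(x): x rounded to 3 digits
assert x1 + x2 == x and x2 == 456        # x = x1 + x2, and x2 needs s = 3 digits, not s - 1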
3.3 Special Rounding to Nearest Matters
Rounding to Nearest, Ties to Even. We now assume all the computations are done using ◦^even. The goal is to prove that x1 = ◦^even_{t−s}(x).
Lemma 15 (ImplyClosestStrict2). For a bounded float f (in precision p), a real z and an integer e, if we have β^{e+p−1} ≤ z ≤ β^{e+p} and β^{e+p−1} ≤ f and |z − f| < β^e/2, then f = ◦(z) and f is the only possible rounding to nearest.
The only difference from Lemma 1 is that the "less than or equal to" β^e/2 has become a strict "less than", which guarantees that f is the only rounding to nearest.

Lemma 16 (ClosestImplyEven_int). For a bounded float f and a real z, if f = ◦^even(z), f is positive, and there exists an integer n such that z = β^{e_f} × (n + 1/2), then f is even.

We know that |x − x1| ≤ β^{e_x+s}/2. If we moreover have |x − x1| < β^{e_x+s}/2, then using Lemma 15 and Lemma 9, we easily prove that x1 = ◦^even_{t−s}(x). The only remaining case is the tie-breaking case where |x − x1| = β^{e_x+s}/2. We know from Theorem 1 that x1 = ◦_{t−s}(x), so we only have to prove that either x1 is even (on t − s bits) or it is the only possible rounding to nearest of x.

If the radix is odd, then |x − x1| = β^{e_x+s}/2 implies that there exists an integer m such that 2 × m = β^s. As s ≥ 2 and β is odd, β^s is odd, therefore 2 × m is odd, which is impossible.

Let us assume the radix is even. From the proof of Lemma 9, we have two cases: x1 either has exponent s + e_x or is equal to (β^{t−s−1}, 1 + s + e_x). In the second case, x1 is even as β is even. Let us look at the first case, where e_{x1} = s + e_x. Then we will prove that the digits at level s + e_x of both p and q are even and the lower digits are zero. As x1 = p + q, this will imply that x1 is even. As s + e_x ≤ e_q ≤ e_p ≤ s + e_x + 1, the digits lower than s + e_x are zero for both p and q. Let us prove that the digits at level s + e_x are even. As |x − x1| = β^{e_x+s}/2, there exists ε ∈ {1, −1} such that x = x1 + εβ^{e_x+s}/2. If e_p = s + e_x + 1, then the digit of p at level s + e_x is zero. If e_p = s + e_x, there exists m ∈ Z such that x × C = β^{e_x+s}(m + ε/2); from Lemma 16, p is even. For q, we split depending on Lemma 6. If q = −β^{t+s+e_x}, then its digit at level s + e_x is even. In the other case, e_q ≤ s + e_x, and there exists m ∈ Z such that x − p = β^{e_x+s}(m + ε/2); from Lemma 16, this implies that q is even. In all cases and for both p and q, the digits at level s + e_x are even. So x1 is even.

Rounding to Nearest, Ties Away from Zero. An interesting fact is that this algorithm holds for rounding to nearest and for rounding to nearest, ties to even, but not for rounding to nearest, ties away from zero [18]. This rounding means that, when exactly in the middle of two floating-point numbers, we choose the one with the bigger absolute value. This rounding is used, for example, in banking and phone bills: when in the middle of two cents, you always have to pay more. When all the operations are performed using rounding to nearest, ties away from zero, the result is not the rounding to nearest, ties away from zero, of the input. For example, let β = 2, t = 6 and s = 3. If x = 100100_2, then p = 101001000_2 and q = −100101000_2, so x1 = 100000_2, which is not the expected result (101000_2, as the rounding is away from zero). This is an unexpected behavior, as these two roundings differ in very few points.
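Since the usual binary hardware formats do not offer rounding to nearest, ties away from zero, directly, the counter-example above can be replayed with a small simulation in exact rational arithmetic; the helper rn_away and the test values are ours (it rounds to a t-digit radix-2 mantissa, breaking ties away from zero):

from fractions import Fraction

def rn_away(v, t):
    # Round v to the nearest radix-2 float with a t-digit mantissa, ties away from zero.
    v = Fraction(v)
    if v == 0:
        return v
    sign = 1 if v > 0 else -1
    a = abs(v)
    e = 0
    while a / Fraction(2) ** e >= 2 ** t:       # find e with 2^(t-1) <= a/2^e < 2^t
        e += 1
    while a / Fraction(2) ** e < 2 ** (t - 1):
        e -= 1
    m = a / Fraction(2) ** e
    m_rounded = int(m) + (1 if m - int(m) >= Fraction(1, 2) else 0)
    return sign * m_rounded * Fraction(2) ** e

t, s = 6, 3
x = Fraction(0b100100)                          # 100100_2 = 36
p = rn_away(x * (2 ** s + 1), t)                # 101001000_2 = 328
q = rn_away(x - p, t)                           # -100101000_2 = -296
x1 = rn_away(p + q, t)
assert x1 == 0b100000                           # 100000_2 = 32 ...
assert rn_away(x, t - s) == 0b101000            # ... but the expected rounding is 101000_2 = 40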
3.4 Conclusion on Splitting
We finally proved that this algorithm is correct whatever the radix and the precision. Nevertheless, being generic with respect to the radix is difficult: it easily invalidates assertions, as authors are trained in radix-2 properties. Being generic has a cost, and formal proofs are particularly useful here, as it is hard to guess which properties will still hold and which will not when the radix changes.

The main drawback of this algorithm is that it may overflow: when computing p = ◦(x × C), an overflow may occur. It is the only place where it can occur, as |q| ≤ |p| and the other computed values are much smaller. This algorithm does not overflow if x × C does not overflow. If M is the greatest floating-point number, in rounding to nearest, ties to even or away from zero, this exactly means that |x| × C < M + ulp(M)/2. And in IEEE formats, M = (2^t − 1) 2^{2t−3} 2^{E_i}. We therefore need |x| < (2^{t+1} − 1) 2^{E_i+2t−4} / (2^s + 1). A sufficient assumption is |x| ≤ 2^{E_i+2t−2}.

4 Exact Multiplication

4.1 Algorithm and Hand-Waving Proof
Algorithm. For two bounded floats x and y, the Dekker algorithm [13] computes two bounded floats r1 and r2 such that r1 = ◦(x × y) and x × y = r1 + r2. The algorithm is described below. It only contains operations in the working precision (t bits). We use s = ⌈t/2⌉ to split x and y into two (near-)equal parts.

Algorithm 2 (Dekker)
(x1, x2) = Split(x)
(y1, y2) = Split(y)
r1 = ◦(x × y)
t1 = ◦(−r1 + ◦(x1 × y1))
t2 = ◦(t1 + ◦(x1 × y2))
t3 = ◦(t2 + ◦(x2 × y1))
r2 = ◦(t3 + ◦(x2 × y2))

Hand-Waving Proof. The idea is that there is no error in the computations of t1, t2, t3 and r2. The multiplications are exact because the multiplicands are half-size numbers. The additions are exact because they mostly cancel, as we know r2 will fit on t bits:

r1 = ◦(x × y)
t1 = x1 × y1 − r1
t2 = t1 + x1 × y2
t3 = t2 + x2 × y1
r2 = t3 + x2 × y2
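As with the splitting algorithm, Algorithm 2 can be exercised directly in IEEE binary64 arithmetic (β = 2, t = 53, s = 27), a case covered by the theorems below; this Python sketch and its test values are ours, and the exactness check uses exact rationals:

from fractions import Fraction

def veltkamp_split(x, s=27):
    C = 2.0**s + 1.0
    p = x * C
    q = x - p
    x1 = p + q
    return x1, x - x1

def dekker_product(x, y):
    # Algorithm 2: r1 = o(x*y) and, barring over/underflow, x*y = r1 + r2 exactly.
    x1, x2 = veltkamp_split(x)
    y1, y2 = veltkamp_split(y)
    r1 = x * y
    t1 = -r1 + x1 * y1
    t2 = t1 + x1 * y2
    t3 = t2 + x2 * y1
    r2 = t3 + x2 * y2
    return r1, r2

x, y = 1.0 + 2.0**-30, 1.0 + 2.0**-40
r1, r2 = dekker_product(x, y)
assert Fraction(x) * Fraction(y) == Fraction(r1) + Fraction(r2)   # the product is exact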
4.2 Proof
Dekker seems to have believed this algorithm to be correct whatever the radix [13]. Nevertheless, it was discovered later by Linnainmaa [23] that this is not the case. In fact, this algorithm works perfectly in radix 2, and works in a generic radix when the precision is even (see below).

To guarantee that a computation is exact, we give a possible exponent e (meaning that f × β^{−e} is an integer) and prove that |f| < β^{e+t}. The proofs will mostly be inequalities, as possible exponents can be deduced from one code line to the next. We first assume there is no Underflow. In this case, we know that x1 can be represented with exponent e_x + s, that x2 can be represented with exponent e_x, and that |x2| ≤ β^{s+e_x}/2. The same properties hold for y/y1/y2. As explained before, we choose s = ⌈t/2⌉, so that x1 and y1 will be on ⌊t/2⌋ digits and x2 and y2 on ⌈t/2⌉ digits. The first inequalities needed are:

– |x2 × y2| ≤ β^{2s+e_x+e_y}/4,
– |x2 × y1| < β^{t+s+e_x+e_y}/2 + β^{2s+e_x+e_y}/4,
– |x1 × y2| < β^{t+s+e_x+e_y}/2 + β^{2s+e_x+e_y}/4,
– |x × y − r1| ≤ β^{t+e_x+e_y}/2: as |x × y| < (β^t − 1)^2 β^{e_x+e_y}, we have e_{r1} ≤ t + e_x + e_y,
– e_{r1} ≥ t − 1 + e_x + e_y: as x and y are normal, we have |r1| ≥ β^{2t−2+e_x+e_y}.

As for the multiplications of the halves:

– x1 × y1 will be computed exactly with exponent 2s + e_x + e_y, as 2⌊t/2⌋ ≤ t.
– x1 × y2 will be computed exactly with exponent s + e_x + e_y, as ⌊t/2⌋ + ⌈t/2⌉ ≤ t.
– x2 × y1 will be computed exactly with exponent s + e_x + e_y, as ⌊t/2⌋ + ⌈t/2⌉ ≤ t.

Therefore the t_i's are computed exactly:

– t1 can be exactly represented with exponent t − 1 + e_x + e_y. Moreover, |−r1 + x1 × y1| ≤ |r1 − x × y| + |x1 × y2| + |x2 × y1| + |x2 × y2| < β^{t+e_x+e_y}/2 + β^{t+s+e_x+e_y} + 3β^{2s+e_x+e_y}/4 < β^{2t−1+e_x+e_y}.
– t2 can be exactly represented with exponent s + e_x + e_y. Moreover, |t1 + x1 × y2| = |−r1 + x1 × y1 + x1 × y2| ≤ |r1 − x × y| + |x2 × y1| + |x2 × y2| < β^{t+e_x+e_y}/2 + β^{t+s+e_x+e_y}/2 + β^{2s+e_x+e_y}/2 < β^{t+s+e_x+e_y}.
– t3 can be exactly represented with exponent s + e_x + e_y. Moreover, |t2 + x2 × y1| = |−r1 + x1 × y1 + x1 × y2 + x2 × y1| ≤ |r1 − x × y| + |x2 × y2| ≤ β^{t+e_x+e_y}/2 + β^{2s+e_x+e_y}/4 < β^{t+s+e_x+e_y}.

What is left is the last computation: if x2 × y2 is computed exactly, then r2 will be computed exactly, as r2 would then be equal to x × y − ◦(x × y), which is representable when no Underflow occurs. We deduce from the inequalities that:

Lemma 17 (Dekker_aux). Provided no Underflow occurs, if x2 × y2 is representable, then x × y = r1 + r2.

The problem is that x2 × y2 fits on 2 × ⌈t/2⌉ digits, and this can exceed t by 1 in some cases. There are various solutions:
– If some extra precision is available, then it can be used just for the computation of x2 × y2, to guarantee the exactness of the result.
– If the radix is 2, then x2 and y2 fit on s − 1 bits, therefore x2 × y2 fits on 2 × ⌈t/2⌉ − 2 ≤ t bits and the computation is exact.
– If the precision is even, then 2 × ⌈t/2⌉ = t and the computation is exact.

Theorem 2 (Dekker1). Provided e_x + e_y ≥ −E_i, if β = 2 or t is even, then x × y = r1 + r2.

The "no Underflow" assumption is explicit in this theorem: x and y may be subnormal, but the requirement is that the sum of their exponents is not below the Underflow threshold. This is indeed a necessary and sufficient condition for the error term to be representable [24]. This theorem means that, if the error term can be exactly represented, it will be computed by the given algorithm.
4.3 When Underflow Happens
Underflow may happen in this algorithm and its consequences are harmful: instead of a certain mathematical equality, the only possible result is a bound on the error of the computations. Indeed, instead of all floating-point computations being exact, they may all become error-prone, and each computation has to be taken care of and its error bounded. The problem then lies in a duality: if we are above the Underflow threshold for one computation, then it is exact; if we are under the Underflow threshold, then the error of the considered computation is less than β^{−E_i}/2. If we just bound the error of each computation, the error bound will be huge, as exactness will not be taken into account. Therefore, to get an error bound that is a multiple of β^{−E_i}, we have to take great care of the propagation of the error.

More precisely, we define an extended format where there will be no Underflow (same precision as the working format, but an Underflow threshold small enough to be sure it will not be crossed). We then define the following property for the floating-point numbers a and a′ and the real numbers r and ε:

Underf_Err(a, a′, r, ε) iff a = ◦(r), a′ is bounded on the extended format, |a − a′| ≤ εβ^{−E_i}, and if the exponent of a′ is greater than the Underflow threshold of the working precision, then a = a′.

We can then initialize the error computation by this lemma:

Lemma 18 (Underf_Err1). If a′ fits on the extended format and a = ◦(a′), then Underf_Err(a, a′, a′, 0.5).

Typically, the property for the computation of x1 × y2 is: if the corresponding exact float on the extended format has a large enough exponent, then the computation is error-free; if not, the error is bounded by β^{−E_i}/2. The most used lemma is the one corresponding to the computations of the t_i's:

Lemma 19 (Underf_Err3). Let us assume that Underf_Err(x, x′, r_x, ε_x) and that Underf_Err(y, y′, r_y, ε_y). Assume there exists a floating-point number z′ on the extended format such that z′ = x′ − y′ and e_{z′} ≤ min(e_{x′}, e_{y′}). Now, let z = ◦(x − y). Then, if ε_x + ε_y ≤ β^{p−1} − 1, we have Underf_Err(z, z′, x − y, ε_x + ε_y).
If the Underflow threshold is not crossed, the computation is exact, as the threshold was crossed neither for x′ nor for y′. If the Underflow threshold is crossed, then the accumulated error is the one on x plus the one on y; there is no error in the computation of x − y, as both are subnormal. The hypothesis ε_x + ε_y ≤ β^{p−1} − 1 is here to guarantee that the drift is not too big. When the drift grows, it might create x and y that are greater than the smallest normal number, and then the error could no longer be bounded by β^{−E_i}/2. This assumption is here easily fulfilled, as the final bound is 3.5 β^{−E_i}:

Lemma 20 (Dekker2). If β = 2 or t is even, then |x × y − (r1 + r2)| ≤ (7/2) β^{−E_i}.

The Underflow problem has already been tackled in a recent paper by Ogita, Rump and Oishi [25], where they give a bound of 5 β^{−E_i}. There is no real proof of it and the authors confirm this bound is rough. The 7/2 bound could probably be sharpened too; in particular, if the radix is 2, it can be reduced to 3.

Theorem 3 (Dekker). Assume
– the radix β is greater than 1, the precision t is greater than 3 and the Underflow threshold β^{−E_i} is smaller than 1;
– we either have β = 2 or t is even;
– r1 and r2 are computed by Algorithm 2.
Then −E_i ≤ e_x + e_y implies that x × y = r1 + r2. And in any case, |x × y − (r1 + r2)| ≤ (7/2) β^{−E_i}.
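To see both parts of Theorem 3 at work in binary64 (where β = 2, t = 53 and the Underflow threshold β^{−E_i} is 2^{−1074}), here is a small check built on the dekker_product sketch above, with inputs chosen by us so that the exact error term of the product is below the Underflow threshold:

from fractions import Fraction

x = (2**53 - 1) * 2.0**-1060
y = (2**53 - 1) * 2.0**-60           # e_x + e_y = -1120 < -E_i = -1074
r1, r2 = dekker_product(x, y)
err = abs(Fraction(x) * Fraction(y) - (Fraction(r1) + Fraction(r2)))
assert err != 0                                        # the equality x*y = r1 + r2 is lost ...
assert err <= Fraction(7, 2) * Fraction(2) ** -1074    # ... but the error stays within (7/2) * 2^-1074
# (here the lost part is exactly 2**-1120, far below the bound)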
4.4 Conclusion on Exact Multiplication
We finally proved the correctness of the algorithm under mild assumptions. Even if the hypothesis "the radix is 2 or the precision is even" is far from elegant, it is very easy to check. Moreover, it does hold for the decimal basic formats of the revision of the IEEE-754 standard [18] on 64 bits (t = 16) and 128 bits (t = 34), but not for the decimal storage format on 32 bits (t = 7). In this last case, some extra precision may be available; the computation can then be exact if the result of x2 × y2 can be stored with at least one more digit. In most real cases, the exact multiplication algorithm will work, but the other cases should not be ignored, as they might make some higher-level algorithm fail.

The study of Underflow is enlightening: it was clear from the start that the error in that case would not be zero but a multiple of the Underflow threshold, yet proving it was not that easy. We had to create an unintuitive property that carried all the information we needed to be sure that "no Underflow" meant a correct result and "Underflow" meant a small error. Thanks to this tricky definition, we rather easily propagated the property to the end of the computation to obtain an upper bound. This strategy was efficient but quite unusual. A more common solution would have been to split into many subcases:

– if x1 × y1 is subnormal, then...
– if x1 × y1 is normal, but x1 × y2 or x2 × y1 is subnormal, then...
– if x1 × y1, x1 × y2 and x2 × y1 are normal, but x2 × y2 is subnormal, then...
This would probably have worked and given a similar result (the author would not dare to guess which solution gives the better bound), but the proof cost would have been much greater. We do not think this amount of work would have been worth it: the Underflow case must be studied, so that we can guarantee that the error cannot become huge without warning; but as soon as the error is reasonably bounded, it does not always need to be sharpened.

As with the previous algorithm, the main drawback of this exact multiplication is that it might Overflow, either in the splitting of x or y, or in the computation of ◦(x × y) or ◦(x1 × y1). In any case, either r1 or r2 or both will then be exceptional values (infinities or NaNs), so this can be detected afterwards.
5 Perspectives
These algorithms are well known, but the facts that the Veltkamp algorithm works in rounding to nearest, ties to even, and that it does not work in rounding to nearest, ties away from zero, are new. They were never proved before, as only rounding to nearest (with an error bounded by half an ulp) had been considered. Of course, the algorithm still returns a rounding to nearest, but the tie-breaking is not as it should be. The tie-breaking is not important for the Dekker algorithm, but it may be decisive in other cases. The point is that it is difficult to guess the correctness of the tie-breaking: yes for ties to even and no for ties away from zero.

There are various striking things in this case study of floating-point algorithms. The first one is that a generic-radix proof is tricky. Most existing proofs are radix-2 and are said to be "easily generalized". We are aware of the fact that tackling an odd radix seems excessive, but many basic facts do not hold when the radix is 10. It might change the exponent of some variables. It changes whether the tail of the float fits on s − 1 or s digits. It might make the algorithm of exact multiplication fail. These differences can often be handled as patches to the proof, as they may be small. In some cases, unfortunately, they cannot: the basic path of the proof relies on some lemma that can be contradicted or some inequality that does not hold anymore. We were flabbergasted by the gap that sometimes exists between a radix-2 proof and its generic counterpart.

The second one is the overall complexity of the proofs. The algorithms are only a few lines long, but the formal proofs are above 6,600 lines. This is mostly real computations and inequalities. Justifying that a float is a/the rounding to nearest/ties to even is painstaking work, and computations on powers are entirely left to the user. Some tactics could be written to prove goals such as β^{p−1} + β^{p−3} + 1 < β^p assuming β > 1. Another perspective is the flexibility of the formal proof checker. When we apply a theorem to a goal that does not have the exact required shape, we would like the checker to be able to make it fit, even if it means an additional proof obligation. For example, we would like the proof assistant to be able to unfold some definitions, or to guess that, if the goal has the form |a × b − c| ≤ d and the theorem has the form |c − a × b| ≤ d, then the theorem should be applied and an additional goal |a × b − c| = |c − a × b| be added (or deduced). A real example is the following
goal: let r be a real number and f a float. The theorem ClosestUlp states that 2 × |r − f| ≤ ulp(f), with ulp(f) = β^{e_N(f)} where N is a known function. To prove that 2 × |f − r| ≤ β^e, we would like to apply this theorem and just prove that e is e_N(f) (it would be enough to prove that e_N(f) ≤ e, but this may be too much to ask).

The third striking aspect is the unexpected difficulty of dealing with Underflow. Underflow creates unpredictable behaviors, and these behaviors are difficult to detect as they give no warning at the end of the computation, contrary to Overflow here. Each theorem has a first proof "when no Underflow occurs"; usually this means that the inputs are normal floats. Then either the proof can be extended to subnormal floats, which usually means an unintuitive proof (here, we considered the same format with a smaller Underflow threshold, and it remains to prove that the results of the roundings are similar whatever the format used), or the proof fails. What can then be guaranteed is usually a bound on the error (a small multiple of the Underflow threshold is good). Some weaker property could also be guaranteed: a variable is nonzero, the final result is normal, and so on. Dealing with Underflow is highly unintuitive, as subnormal floats behave like fixed-point numbers with a constant maximal error on each operation (note that we do have some interesting properties: for example, the addition of subnormal numbers is always exact). The theorems and methods of floating-point studies are thus ill-adapted to these cases, especially when normal and subnormal floats may be mixed: it means a mix between a fixed error term and a proportional error term, and the two usually do not mix well. New methods should be considered to treat the Underflow problem more easily. This is no simple task, as the behavior of subnormal floats can go from a dream (addition is exact) to a nightmare (the error is 50% of the value of the result), and there is no way to deduce the behavior of an algorithm when Underflow happens from its normal behavior.
References

1. Davis, P.J.: Fidelity in mathematical discourse: Is one and one really two? Amer. Math. Monthly 79 (1972) 252–263
2. Lamport, L., Melliar-Smith, P.M.: Synchronizing clocks in the presence of faults. Journal of the ACM 32(1) (1985) 52–78
3. Rushby, J., von Henke, F.: Formal verification of algorithms for critical systems. In: Conference on Software for Critical Systems, New Orleans, Louisiana (1991)
4. Carreño, V.A., Miner, P.S.: Specification of the IEEE-854 floating-point standard in HOL and PVS. In: 1995 International Workshop on Higher Order Logic Theorem Proving and its Applications, Aspen Grove, Utah (1995) supplemental proceedings
5. Russinoff, D.M.: A mechanically checked proof of correctness of the AMD K5 floating point square root microcode. Formal Methods in System Design 14(1) (1999) 75
6. Jacobi, C.: Formal Verification of a Fully IEEE Compliant Floating Point Unit. PhD thesis, Saarland University, Saarbrücken, Germany (2002)
7. Harrison, J.: A machine-checked theory of floating point arithmetic. In: 12th International Conference on Theorem Proving in Higher Order Logics, Nice (1999)
8. Harrison, J.: Formal verification of floating point trigonometric functions. In Hunt, W.A., Johnson, S.D., eds.: Proceedings of the Third International Conference on Formal Methods in Computer-Aided Design, Austin, Texas (2000) 217–233
9. The Coq Development Team, LogiCal Project, INRIA: The Coq Proof Assistant: Reference Manual, Version 8.0 (2004)
10. Daumas, M., Rideau, L., Théry, L.: A generic library of floating-point numbers and its application to exact computing. In: 14th International Conference on Theorem Proving in Higher Order Logics, Edinburgh, Scotland (2001) 169–184
11. Veltkamp, G.W.: Algolprocedures voor het berekenen van een inwendig product in dubbele precisie. RC-Informatie 22, Technische Hogeschool Eindhoven (1968)
12. Veltkamp, G.W.: ALGOL procedures voor het rekenen in dubbele lengte. RC-Informatie 21, Technische Hogeschool Eindhoven (1969)
13. Dekker, T.J.: A floating point technique for extending the available precision. Numerische Mathematik 18(3) (1971) 224–242
14. Priest, D.M.: Algorithms for arbitrary precision floating point arithmetic. In: 10th Symposium on Computer Arithmetic, Grenoble, France (1991) 132–144
15. Shewchuk, J.R.: Adaptive precision floating-point arithmetic and fast robust geometric predicates. In: Discrete and Computational Geometry. Volume 18. (1997)
16. Stevenson, D., et al.: A proposed standard for binary floating point arithmetic. IEEE Computer 14(3) (1981) 51–62
17. Stevenson, D., et al.: An American national standard: IEEE standard for binary floating point arithmetic. ACM SIGPLAN Notices 22(2) (1987) 9–25
18. Floating-Point Working Group of the Microprocessor Standards Subcommittee of the Standards Committee of the IEEE Computer Society: IEEE standard for floating-point arithmetic (2006) Work in progress
19. Cody, W.J., Karpinski, R., et al.: A proposed radix and word-length independent standard for floating point arithmetic. IEEE Micro 4(4) (1984) 86–100
20. Harrison, J.: Verifying the accuracy of polynomial approximations in HOL. In: Proceedings of the 10th International Conference on Theorem Proving in Higher Order Logics, Murray Hill, New Jersey (1997) 137–152
21. Muller, J.M.: On the definition of ulp(x). Technical Report LIP RR2005-09, INRIA RR-5504, Laboratoire de l'Informatique du Parallélisme (2005)
22. Sterbenz, P.H.: Floating Point Computation. Prentice Hall (1974)
23. Linnainmaa, S.: Software for doubled-precision floating-point computations. ACM Transactions on Mathematical Software 7(3) (1981) 272–283
24. Boldo, S., Daumas, M.: Representable correcting terms for possibly underflowing floating point operations. In: Symposium on Computer Arithmetic, Spain (2003)
25. Ogita, T., Rump, S.M., Oishi, S.: Accurate sum and dot product. SIAM Journal on Scientific Computing 26(6) (2005) 1955–1988
Using the TPTP Language for Writing Derivations and Finite Interpretations

Geoff Sutcliffe¹, Stephan Schulz², Koen Claessen³, and Allen Van Gelder⁴

¹ University of Miami, USA
[email protected]
² Technische Universität München, Germany
[email protected]
³ Chalmers University of Technology, Sweden
[email protected]
⁴ University of California at Santa Cruz, USA
[email protected]
Abstract. One of the keys to the success of the TPTP and related projects is their consistent use of the TPTP language. The ability of the TPTP language to express solutions as well as problems, in conjunction with the simplicity of the syntax, sets it apart from other languages used in ATP. This paper provides a complete definition of the TPTP language, and describes how the language should be used to write derivations and finite interpretations.
1 Introduction
The TPTP problem library [19] is a well known standard set of test problems for first order automated theorem proving (ATP) systems. The TSTP solution library [18], the “flip side” of the TPTP, is becoming known as a resource for contemporary ATP systems’ solutions to TPTP problems. The SystemOnTPTP [16] and associated software have been employed in a range of application projects, e.g., [4,21,23]. One of the keys to the success of these projects is their consistent use of the TPTP language, which enables convenient communication between different systems and researchers. TPTP v3.0.0 introduced a new version of the TPTP language [20]. The language was designed to be suitable for writing both ATP problems and ATP solutions, to be flexible and extensible, and easily processed by both humans and computers. The entry barrier for using the TPTP language is (and has always been) very low. The syntax shares many features with Prolog, a language that is widely known in the ATP community. Indeed, with a few operator definitions, units of TPTP data can be read in Prolog using a single read/1 call, and written with a single writeq/1 call. Development, or at least prototyping, of reasoning software in Prolog is common, and Prolog compatibility eliminates the mundane task of writing IO routines for the reasoning software. The key development from the old (pre-v3.0.0) TPTP language to the new one was the addition of features for writing solutions to ATP problems. The features were designed for writing derivations, but their flexibility makes it possible to
write a range of DAG structures. Additionally, there are features of the language that make it possible to conveniently specify finite interpretations. This paper provides a complete definition of the TPTP language, and describes how the language should be used to write derivations and finite interpretations. The ability of the TPTP language to express solutions as well as problems, in conjunction with the simplicity of the syntax, sets it apart from other languages used in ATP. Some languages, e.g., the LOP format [13], were designed for writing problems, and do not support writing solutions. Some languages for writing solutions are limited in scope, e.g., the PCL language [5] is limited to solutions to to equational problems, and the OpenTheory language [8] is designed only to be a computer processible form for systems that implement the HOL logic [6]. There are some general purpose languages that have features for writing derivations, e.g., Otter’s proof object format [11,10] and the DFG syntax [7], but none of these (that we know of) also provide support for writing finite interpretations. Mark-up languages such as OmDoc [9], OpenMath [2], and MathML [2] are quite expressive (especially for mathematical content), but their XML based format is not suitable for human processing. Overall, the TPTP language is more expressive and usable than other languages. Interoperability with other languages is supported in some cases, through translation tools.
2 The TPTP Language
The new TPTP language was first used in TPTP v3.0.0, released in November 2004. It has been taken up by a number of developers and received valuable comments and feedback. As a consequence, since that first release there have been some small, but significant, changes and extensions. The BNF definition of the language has recently been thoroughly overhauled. A principal goal has been to make it easy to translate the BNF into lex/yacc/flex/bison input, so that construction of parsers (in languages other than Prolog) can be a reasonably easy task. The BNF definition is in the appendix of this paper.

The TPTP language definition uses a modified BNF meta-language that separates semantic, syntactic, lexical, and character-macro rules. Syntactic rules use the standard ::= separator, e.g.,
  <source> ::= <general_term>
When only a subset of the syntactically acceptable values for a non-terminal make semantic sense, a second rule for the non-terminal is provided using a :== separator, e.g.,
  <source> :== <dag_source> | <internal_source> | <external_source>, etc.
Any further semantic rules that may be reached only from the right hand side of a semantic rule are also written using the :== separator, e.g.,
  <dag_source> :== <name> | <inference_record>
This separation of syntax from semantics eases the task of building a syntactic analyzer, as only the ::= rules need be considered. At the same time, the
semantic rules provide the detail necessary for semantic checking. The rules that produce tokens from the lexical level use a ::- separator, e.g.,
  <lower_word> ::- <lower_alpha><alpha_numeric>*
with the bottom level character-macros defined by regular expressions in rules using a ::: separator, e.g.,
  <lower_alpha> ::: [a-z]
The BNF is documented with comments.

The top level building blocks of TPTP files are annotated formulae, include directives, and comments. An annotated formula has the form:
  language(name, role, formula, source, [useful info]).
The languages currently supported are fof - formulae in full first order form, and cnf - formulae in clause normal form. The role gives the user semantics of the formula, e.g., axiom, lemma, conjecture, and hence defines its use in an ATP system - see the BNF for the list of recognized roles and their meaning. The logical formula, in either FOF or CNF, uses a consistent and easily understood notation [20] that can be seen in the BNF. The source describes where the formula came from, e.g., an input file or an inference. The useful info is a list of arbitrary useful information, as required for user applications. The useful info field is optional, and if it is not used then the source field becomes optional. An example of a FOF formula, supplied from a file, is:

fof(formula_27,axiom,
    ! [X,Y] :
      ( subclass(X,Y)
    <=> ! [U] :
          ( member(U,X)
         => member(U,Y) ) ),
    file('SET005+0.ax',subclass_defn),
    [description('Definition of subclass'), relevance(0.9)]).
An example of an inferred CNF formula is:

cnf(175,lemma,
    ( rsymProp(ib,sk_c3)
    | sk_c4 = sk_c3 ),
    inference(factor_simp,[status(thm)],[
        inference(para_into,[status(thm)],[96,78,theory(equality)])]),
    [iquote('para_into,96.2.1,78.1.1,factor_simp')]).
A novel feature of the TPTP language, which is employed in the representation of finite interpretations, is the recognition of interpreted predicates and functors. These come in two varieties: defined predicates and functors, whose interpretation is specified by the TPTP language, and system predicates and functors, whose interpretation is ATP system specific. Interpreted predicates and functors are syntactically different from uninterpreted predicates and functors. Defined predicates and functors either start with a $, or are composed of non-alphanumeric characters. System predicates and functors start with $$. Uninterpreted predicates and functors start with a lower case alphabetic. The
defined predicates recognized so far are $true and $false, with the obvious interpretations, and = and != for equality and inequality. The defined functors recognized so far are "distinct object"s, written in double quotes, and numbers. A "distinct object" is interpreted as the domain element in the double quotes. Numbers are interpreted as themselves (as domain elements). A consequence of the predefined interpretations is that all different "distinct object"s and numbers are known to be unequal, e.g., "Apple" != "Microsoft" and 1 != 2 are implicit axioms. Such implicit axioms may be built into an ATP system, e.g., [14], or generated. System predicates and functors are used for interpreted predicates and functors that are available in particular ATP tools. The names are not controlled by the TPTP language, so they must be used with caution. The source field of an annotated formula is most commonly a file record or an inference record. A file record stores the name of the file from which the annotated formula was read, and optionally the name of the annotated formula as it occurs in the file (this may be different from the name of the annotated formula itself, e.g., if the ATP system renames the annotated formulae that it reads in). An inference record stores three items of information about an inferred formula: the name of the inference rule provided by the ATP system, i.e., there are no standards; a list of useful information items, e.g., the semantic status of the formula and with respect to its parents as an SZS ontology value [20] (commonly inferred formulae are theorems of their parents, but in some cases the semantic relationship is weaker, as in Skolemization steps); and a list of the parents, which most commonly are parent annotated formula names, nested inference records, and theory records. A theory record is used when the axioms of some theory are built into the inference rule, e.g., equality axioms are built into paramodulation. The include directives of the TPTP language are analogous to C’s #include directives. An include directive may include an entire file, or may specify the names of the annotated formulae that are to be included from the file, thus providing a more finely grained include mechanism. Regular comments in the TPTP language extend from a % character to the end of the line, or may be block comments within /* ...*/ bracketing. System comments in the TPTP language are used for system specific annotations. They extend from a %$$ sequence to the end of the line, or may be block comments within /*$$ ...*/ bracketing. System comments look like regular comments, so normally they would be discarded. However, a wily user of the language can store/extract information from the comment before discarding it. System comments should identify the ATP system, followed by a :, e.g., /*$$Otter 3.3: Demodulator */. Comments may occur between any two tokens. Parsing tools written in C are available for the TPTP language, conversion of the BNF into lex/yacc input is available [22], and the tptp2X utility distributed with the TPTP is compatible with the language.
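As noted above, the implicit inequality axioms between "distinct object"s may either be built into an ATP system or generated. Here is a minimal Python sketch of such a generator; the naming scheme distinct_N and the sample objects are our own choices, not part of the TPTP language:

from itertools import combinations

def distinct_object_axioms(objects):
    # One cnf annotated formula per unordered pair of "distinct object"s.
    for i, (a, b) in enumerate(combinations(objects, 2)):
        yield 'cnf(distinct_%d,axiom, "%s" != "%s" ).' % (i, a, b)

for axiom in distinct_object_axioms(["Apple", "Microsoft"]):
    print(axiom)    # cnf(distinct_0,axiom, "Apple" != "Microsoft" ).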
3 Derivations
A derivation is a directed acyclic graph (DAG) whose leaf nodes are formulae from the input, whose interior nodes are formulae inferred from parent formulae, and whose root nodes are the final derived formulae. For example, a proof of a FOF theorem from some axioms, by refutation of the CNF of the axioms and negated conjecture, is a derivation whose leaf nodes are the FOF axioms and conjecture, whose internal nodes are formed from the process of clausification and then from inferences performed on the clauses, and whose root node is the false formula. The information required to record a derivation is, minimally, the leaf formulae, and each inferred formula with references to its parent formulae. More detailed information that may be recorded and useful includes: the role of each formula; the name of the inference rule used in each inference step; sufficient details of each inference step to deterministically reproduce the inference; and the semantic relationships of inferred formulae with respect to their parents. The TPTP language is sufficient for recording all this, and more. A comprehensively recorded derivation provides the information required for various forms of processing, such as proof verification [17], proof visualization [15], and lemma extraction [5]. A derivation written in the TPTP language is a list of annotated formulae. Each annotated formula has a name, a role, and the logical formula. Each inferred formula has an inference record with the inference rule name, the semantic relationship of the formula to its parents as an SZS ontology value in a status record, and a list of references to its parent formulae. Example. Consider the following toy FOF problem, to prove the conjecture from the axioms (not all the axioms are needed for the proof - the extra axioms come into play when the example is used again in Section 4 to illustrate the finite interpretation format): %-----------------------------------------------------------------------%----All (hu)men are created equal. John is a human. John got an F grade. %----There is someone (a human) who got an A grade. An A grade is not %----equal to an F grade. Grades are not human. Therefore there is a %----human other than John. fof(all_created_equal,axiom,( ! [H1,H2] : ( ( human(H1) & human(H2) ) => created_equal(H1,H2) ) )). fof(john,axiom,( human(john) )). fof(john_failed,axiom,( grade(john) = f )). fof(someone_got_an_a,axiom,( ? [H] : ( human(H) & grade(H) = a ) )). fof(distinct_grades,axiom,( a != f )). fof(grades_not_human,axiom,( ! [G] : ~ human(grade(G)) )). fof(someone_not_john,conjecture,( ? [H] : ( human(H) & H != john ) )). %--------------------------------------------------------------------
Here is a derivation recording a proof by refutation of the CNF, adapted (removing inferences that simply copy the parent formula) from the one produced by the ATP system EP v0.91 [12]: %-------------------------------------------------------------------fof(3,axiom,( grade(john) = f ), file(’CreatedEqual.p’,john_failed)). fof(4,axiom,( ? [X3] : ( human(X3) & grade(X3) = a ) ), file(’CreatedEqual.p’,someone_got_an_a)). fof(5,axiom,( a != f ), file(’CreatedEqual.p’,distinct_grades)). fof(7,conjecture,( ? [X3] : ( human(X3) & X3 != john ) ), file(’CreatedEqual.p’,someone_not_john)). fof(8,negated_conjecture,( ~ ? [X3] : ( human(X3) & X3 != john ) ), inference(assume_negation,[status(cth)],[7])). cnf(14,plain, ( grade(john) = f ), inference(split_conjunct,[status(thm)],[3])). fof(16,plain, ( human(esk1_0) & grade(esk1_0) = a ), inference(skolemize,[status(sab)],[4])). cnf(17,plain, ( grade(esk1_0) = a ), inference(split_conjunct,[status(thm)],[16])). cnf(18,plain, ( human(esk1_0) ), inference(split_conjunct,[status(thm)],[16])). cnf(19,plain, ( a != f ), inference(split_conjunct,[status(thm)],[5])). fof(22,negated_conjecture,( ! [X3] : ( ~ human(X3) | X3 = john ) ), inference(fof_nnf,[status(thm)],[8])). cnf(24,negated_conjecture, ( X1 = john | ~ human(X1) ), inference(split_conjunct,[status(thm)],[22])). cnf(25,negated_conjecture, ( john = esk1_0 ), inference(spm,[status(thm)],[24,18,theory(equality)])). cnf(28,plain, ( f = a ), inference(rw,[status(thm)],[ inference(rw,[status(thm)],[17,25,theory(equality)]), 14,theory(equality)])). cnf(29,plain, ( $false ), inference(sr,[status(thm)],[28,19,theory(equality)])). %--------------------------------------------------------------------
4 Finite Interpretations
A finite interpretation (or “finite model” of some identified formulae) consists of a finite domain, an interpretation of functors - a functor applied to domain elements is interpreted as a domain element, and an interpretation of predicates - a predicate applied to domain elements is interpreted as true or false. The elements of the domain are known to be distinct. The interpretation of functors and predicates is total, i.e., there is an interpretation for every functor and predicate for every pattern of domain element arguments. The TPTP language is sufficient for recording a finite interpretation. The domain, interpretation of functors, and interpretation of predicates, are written as FOF annotated formulae. A recorded interpretation provides the information required for various forms of processing, such as model verification, interpretation of formulae, and identification of isomorphic interpretations.

The domain of a finite interpretation is written in the form:
  fof(fi_name,fi_domain, ! [X] : ( X = e1 | X = e2 | ... | X = en ) ).
where the ei are all "distinct object"s, or all distinct integers, or all distinct constant terms. If "distinct object" or integer terms appear in the interpreted signature, then all those terms must appear in the domain. If constant terms are used they are freely chosen constant terms that do not appear in the signature being interpreted. The ei values then provide an exhaustive list of constant terms whose interpretations form the domain (there is a bijection from the constant terms to the domain, so one may think of the constant terms directly as the domain elements). The use of "distinct object"s or integer terms for a domain is preferred over constant terms, because that takes advantage of the predefined interpretation of such terms - all such terms and corresponding domain elements are known to be distinct (see Section 2). If the domain elements are constant terms then their inequality must be explicitly stated in annotated formulae of the form:
  fof(ei_not_ej,fi_domain, ei != ej ).
The interpretation of functors is written in the form:
  fof(fi_name,fi_functors, ( f(e1,...,em) = er & f(e1,...,ep) = es ... ) ).
specifying that, e.g., f(e1,...,em) is interpreted as the domain element er. If "distinct object"s or integer terms appear in the interpreted signature, then those terms are necessarily interpreted as themselves and must not be interpreted in the fi_functors. The interpretation of predicates is written in the form:
  fof(fi_name,fi_predicates, ( p(e1,...,em) & ~ p(e1,...,ep) ... ) ).
specifying that, e.g., p(e1,...,em) is interpreted as true and p(e1,...,ep) is interpreted as false. Equality is interpreted naturally by the domain, with the understanding that identical elements are equal.

Example. Consider again the FOF problem from Section 3, but with the conjecture replaced by:

fof(equality_lost,conjecture,(
    ! [H1,H2] : ( created_equal(H1,H2) <=> H1 = H2 ) )).
The resultant problem is CounterSatisfiable, i.e., there is a model for the axioms and negated conjecture. Here is one such model, adapted (by converting constant term domain elements to "distinct object" domain elements) from the one found by the model finding system Paradox 1.3 [3]: %-------------------------------------------------------------------fof(equality_lost,fi_domain, ! [X] : ( X = "a" | X = "f" | X = "john" | X = "got_a") ). fof(equality_lost,fi_functors, ( a = "a" & f = "f" & john = "john" & grade("a") = "f" & grade("f") = "a" & grade("john") = "f" & grade("got_a") = "a" )
).
fof(equality_lost,fi_predicates, ( human("john") & human("got_a") & ~ human("a") & ~ human("f") & ~ created_equal("a","a") & ~ created_equal("a","f") & ~ created_equal("a","john") & ~ created_equal("a","got_a") & ~ created_equal("f","a") & ~ created_equal("f","f") & ~ created_equal("f","john") & ~ created_equal("f","got_a") & ~ created_equal("john","a") & ~ created_equal("john","f") & created_equal("john","john") & created_equal("john","got_a") & ~ created_equal("got_a","a") & ~ created_equal("got_a","f") & created_equal("got_a","john") & created_equal("got_a","got_a") ) ). %--------------------------------------------------------------------
Variations, Layout, and Verification. Normally every functor and predicate is interpreted once for every pattern of domain element arguments. No functor or predicate may be interpreted more than once for an argument pattern. If a functor or predicate is not interpreted for a given argument pattern then multiple interpretations are being represented, in which that functor or predicate applied to the argument pattern is interpreted as each of the possible values (each domain element for a functor, both true and false for a predicate). It is recommended that interpretations follow a standard layout, as illustrated by the examples above. However, the conjuncts of functor and predicate interpretations may be separated into individual annotated formulae. Compact forms are possible using universally quantified formulae, e.g.,
fof(equality_lost,fi_predicates, ( human("john") & human("got_a") & ~ human("a") & ~ human("f") & ! [X] : ~ created_equal("a",X) & ! [X] : ~ created_equal("f",X) & ! [X] : ~ created_equal(X,"a") & ! [X] : ~ created_equal(X,"f") & created_equal("john","john") & created_equal("john","got_a") & created_equal("got_a","john") & created_equal("got_a","got_a") ) ).
An interpretation can be verified as a model of a set of formulae by directly evaluating each formula in the model. The TPTP format also provides an alternative approach - the interpretation is adjoined to the formulae, and a trusted model finder is then used to find a model of the combined formula set.
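For a finite interpretation such as the one above, direct evaluation is straightforward once the domain and the interpretation tables are available. Here is a minimal Python sketch, using our own encoding of that interpretation (it is not a general FOF evaluator), which checks two of the axioms of the Section 3 problem against it:

domain = ["a", "f", "john", "got_a"]
grade = {"a": "f", "f": "a", "john": "f", "got_a": "a"}
human = {"a": False, "f": False, "john": True, "got_a": True}
created_equal = {(h1, h2): h1 in ("john", "got_a") and h2 in ("john", "got_a")
                 for h1 in domain for h2 in domain}

# ! [G] : ~ human(grade(G))
assert all(not human[grade[g]] for g in domain)
# ! [H1,H2] : ( ( human(H1) & human(H2) ) => created_equal(H1,H2) )
assert all(created_equal[(h1, h2)] for h1 in domain for h2 in domain
           if human[h1] and human[h2])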
5 Conclusion
Standards for writing derivations and finite interpretations have been presented. These standards should be adopted by the ATP community, to increase the range of ATP tools that can be seamlessly integrated into more complex and effective reasoning systems. Increased interoperability will contribute to the usability and uptake of ATP technology in application domains. Current work is extending the TPTP language for higher order logic [22]. When this is available, it will be used for extending the TPTP to higher order logic [1]. Future work will include the design of standards for representing infinite interpretations. As a first step, it is planned to represent Herbrand interpretations by term grammars, e.g., formulae of the form: ! [X,Y] : (p(X,Y) <=> ((X != a & Y != a) | (X = a & Y = a))) There are decision procedures for the truth of ground atoms in the context of such formulae. Compaction of finite interpretations using normal-form theory from relational databases is also being considered.
References 1. C. Benzm¨ uller and C. Brown. A Structured Set of Higher-Order Problems. In J. Hurd and T. Melham, editors, Proceedings of the 18th International Conference on Theorem Proving in Higher Order Logics, LNAI 3606, pages 66–81, 2005. 2. O. Caprotti and D. Carlisle. OpenMath and MathML: Semantic Mark Up for Mathematics. ACM Crossroads, 6(2), 1999. 3. K. Claessen and N. Sorensson. New Techniques that Improve MACE-style Finite Model Finding. In P. Baumgartner and C. Fermueller, editors, Proceedings of the CADE-19 Workshop: Model Computation - Principles, Algorithms, Applications, 2003. 4. E. Denney, B. Fischer, and J. Schumann. Using Automated Theorem Provers to Certify Auto-generated Aerospace Software. In M. Rusinowitch and D. Basin, editors, Proceedings of the 2nd International Joint Conference on Automated Reasoning, LNAI 3097, pages 198–212, 2004.
5. J. Denzinger and S. Schulz. Recording and Analysing Knowledge-Based Distributed Deduction Processes. Journal of Symbolic Computation, 21:523–541, 1996.
6. M. Gordon and T. Melham. Introduction to HOL, a Theorem Proving Environment for Higher Order Logic. Cambridge University Press, 1993.
7. R. Hähnle, M. Kerber, and C. Weidenbach. Common Syntax of the DFG-Schwerpunktprogramm Deduction. Technical Report TR 10/96, Fakultät für Informatik, Universität Karlsruhe, Karlsruhe, Germany, 1996.
8. J. Hurd and R. Arthan. OpenTheory. http://www.cl.cam.ac.uk/~jeh1004/research/opentheory, URL.
9. M. Kohlhase. OMDOC: Towards an Internet Standard for the Administration, Distribution, and Teaching of Mathematical Knowledge. In J.A. Campbell and E. Roanes-Lozano, editors, Proceedings of the Artificial Intelligence and Symbolic Computation Conference, 2000, LNCS 1930, pages 32–52, 2000.
10. W. McCune and O. Shumsky-Matlin. Ivy: A Preprocessor and Proof Checker for First-Order Logic. In M. Kaufmann, P. Manolios, and J. Strother Moore, editors, Computer-Aided Reasoning: ACL2 Case Studies, number 4 in Advances in Formal Methods, pages 265–282. Kluwer Academic Publishers, 2000.
11. W.W. McCune. Otter 3.3 Reference Manual. Technical Report ANL/MSC-TM-263, Argonne National Laboratory, Argonne, USA, 2003.
12. S. Schulz. E: A Brainiac Theorem Prover. AI Communications, 15(2-3):111–126, 2002.
13. S. Schulz. LOP-Syntax for Theorem Proving Applications. http://www4.informatik.tu-muenchen.de/~schulz/WORK/lop.syntax, URL.
14. S. Schulz and Maria Paola Bonacina. On Handling Distinct Objects in the Superposition Calculus. In B. Konev and S. Schulz, editors, Proceedings of the 5th International Workshop on the Implementation of Logics, pages 66–77, 2005.
15. G. Steel. Visualising First-Order Proof Search. In D. Aspinall and C. Lüth, editors, Proceedings of User Interfaces for Theorem Provers 2005, pages 179–189, 2005.
16. G. Sutcliffe. SystemOnTPTP. In D. McAllester, editor, Proceedings of the 17th International Conference on Automated Deduction, LNAI 1831, pages 406–410, 2000.
17. G. Sutcliffe. Semantic Derivation Verification. International Journal on Artificial Intelligence Tools, 2006. To appear.
18. G. Sutcliffe. The TSTP Solution Library. http://www.TPTP.org/TSTP, URL.
19. G. Sutcliffe and C.B. Suttner. The TPTP Problem Library: CNF Release v1.2.1. Journal of Automated Reasoning, 21(2):177–203, 1998.
20. G. Sutcliffe, J. Zimmer, and S. Schulz. TSTP Data-Exchange Formats for Automated Theorem Proving Tools. In W. Zhang and V. Sorge, editors, Distributed Constraint Problem Solving and Reasoning in Multi-Agent Systems, number 112 in Frontiers in Artificial Intelligence and Applications, pages 201–215. IOS Press, 2004.
21. J. Urban. MPTP - Motivation, Implementation, First Experiments. Journal of Automated Reasoning, 33(3-4):319–339, 2004.
22. A. Van Gelder and G. Sutcliffe. Extending the TPTP Language to Higher-Order Logic with Automated Parser Generation. In U. Furbach and N. Shankar, editors, Proceedings of the 3rd International Joint Conference on Automated Reasoning, Lecture Notes in Artificial Intelligence, 2006.
23. J. Zimmer. A New Framework for Reasoning Agents. In V. Sorge, S. Colton, M. Fisher, and J. Gow, editors, Proceedings of the Workshop on Agents and Automated Reasoning, 18th International Joint Conference on Artificial Intelligence, pages 58–64, 2003.
Appendix

%------------------------------------------------------------------------------
%----README ... this header provides important meta- and usage information
%----
%----Intended uses of the various parts of the TPTP syntax are explained
%----in the TPTP technical manual, linked from www.tptp.org.
%----
%----Four kinds of separators are used, to indicate different types of rules:
%----  ::= is used for regular grammar rules, for syntactic parsing.
%----  :== is used for semantic grammar rules. These define specific values
%----      that make semantic sense when more general syntactic rules apply.
%----  ::- is used for rules that produce tokens.
%----  ::: is used for rules that define character classes used in the
%----      construction of tokens.
%----
%----White space may occur between any two tokens. White space is not specified
%----in the grammar, but there are some restrictions to ensure that the grammar
%----is compatible with standard Prolog: a <TPTP_file> should be readable with
%----read/1.
%----
%----The syntax of comments is defined by the <comment> rule. Comments may
%----occur between any two tokens, but do not act as white space. Comments
%----will normally be discarded at the lexical level, but may be processed
%----by systems that understand them (e.g., if the system comment convention
%----is followed).
%------------------------------------------------------------------------------
%----Files. Empty file is OK.
<TPTP_file>          ::= <TPTP_input>*
<TPTP_input>         ::= <annotated_formula> | <include>

%----Formula records
<annotated_formula>  ::= <fof_annotated> | <cnf_annotated>
%----Future languages may include ... english | efof | tfof | mathml | ...
<fof_annotated>      ::= fof(<name>,<formula_role>,<fof_formula><annotations>).
<cnf_annotated>      ::= cnf(<name>,<formula_role>,<cnf_formula><annotations>).
<annotations>        ::= <null> | ,<source><optional_info>
%----In derivations the annotated formulae names must be unique, so that
%----parent references (see <inference_record>) are unambiguous.

%----Types for problems.
%----Note: The previous <source_type> from ...
%----   <formula_role> ::= <user_role>-<source>
%----... is now gone. Parsers may choose to be tolerant of it for backwards
%----compatibility.
<formula_role>       ::= <lower_word>
<formula_role>       :== axiom | hypothesis | definition | lemma | theorem |
                         conjecture | lemma_conjecture | negated_conjecture |
                         plain | fi_domain | fi_functors | fi_predicates |
                         unknown
%----"axiom"s are accepted, without proof, as a basis for proving "conjecture"s
%----and "lemma_conjecture"s in FOF problems. In CNF problems "axiom"s are
%----accepted as part of the set whose satisfiability has to be established.
%----There is no guarantee that the axioms of a problem are consistent.
%----"hypothesis"s are assumed to be true for a particular problem, and are
%----used like "axiom"s.
%----"definition"s are used to define symbols, and are used like "axiom"s.
%----"lemma"s and "theorem"s have been proven from the "axiom"s, can be used
%----like "axiom"s, but are redundant wrt the "axiom"s. "lemma" is used as the
%----role of proven "lemma_conjecture"s, and "theorem" is used as the role of
%----proven "conjecture"s, in output. A problem containing a "lemma" or
%----"theorem" that is not redundant wrt the "axiom"s is ill-formed. "theorem"s
%----are more important than "lemma"s from the user perspective.
%----"conjecture"s occur in only FOF problems, and are to all be proven from
%----the "axiom"(-like) formulae. A problem is solved only when all
%----"conjecture"s are proven.
%----"lemma_conjecture"s are expected to be provable, and may be useful to
%----prove, while proving "conjecture"s.
%----"negated_conjecture"s occur in only CNF problems, and are formed from
%----negation of a "conjecture" in a FOF to CNF conversion.
%----"plain"s have no special user semantics, and can be used like "axiom"s.
%----"fi_domain", "fi_functors", and "fi_predicates" are used to record the
%----domain, interpretation of functors, and interpretation of predicates, for
%----a finite interpretation.
%----"unknown"s have unknown role, and this is an error situation.
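%----Illustrative only (not part of the grammar): a minimal FOF problem
%----fragment with hypothetical predicate and formula names, showing the
%----"axiom" and "conjecture" roles in <fof_annotated> records.
%----    fof(transitivity,axiom,
%----        ! [X,Y,Z] : ((p(X,Y) & p(Y,Z)) => p(X,Z))).
%----    fof(goal,conjecture,
%----        (p(a,b) & p(b,c)) => p(a,c)).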
%----"negated_conjecture"s occur in only CNF problems, and are formed from %----negation of a "conjecture" in a FOF to CNF conversion. %----"plain"s have no special user semantics, and can be used like "axiom"s. %----"fi_domain", "fi_functors", and "fi_predicates" are used to record the %----domain, interpretation of functors, and interpretation of predicates, for %----a finite interpretation. %----"unknown"s have unknown role, and this is an error situation. %----FOF formulae. All formulae must be closed. ::= | ::= <nonassoc_binary> | %----Only some binary connectives are associative %----There’s no precedence among binary connectives <nonassoc_binary> ::= ::= <=> | => | <= | <~> | ~ | ~& %----Associative connectives & and | are in ::= | ::= <more_or_formula>* <more_or_formula> ::= ::= & <more_and_formula>* <more_and_formula> ::= & %---- are in ()s or do not have a at the %----top level. ::= | | () | ::= [] : ::= ! | ? %----! is universal quantification and ? is existential. Syntactically, the %----quantification is the left operand of :, and the is %----the right operand. Although : is a binary operator syntactically, it is %----not a , and thus a is a %----. %----Universal example: ! [X,Y] : ((p(X) & p(Y)) => q(X,Y)). %----Existential example: ? [X,Y] : (p(X) & p(Y)) & ~ q(X,Y). %----Quantifiers have higher precedence than binary connectives, so in %----the existential example the quantifier applies to only (p(X) & p(Y)). ::= | , %----Future variables may have sorts and existential counting %----Unary connectives bind more tightly than binary ::= ::= ~ %----CNF formulae (variables implicitly universally quantified) ::= () | ::= <more_disjunction>* <more_disjunction> ::= ::= | ~ %----Atoms (<predicate> is not used currently) ::= | <defined_atom> | <system_atom> ::= %----A looks like a , but really we mean %---- ::= <proposition> | <predicate>(<arguments>) %---- <proposition> ::= %---- <predicate> ::= %----Using removes a reduce/reduce ambiguity in lex/yacc. <arguments> ::= | ,<arguments> <defined_atom> ::= $true | $false | <defined_infix_pred> <defined_infix_pred> ::= = | != %----A more general formulation, which syntactically admits more defined atoms, %----is as follows. Developers may prefer to adopt this. %---- <defined_atom> ::= <defined_prop> | <defined_pred>(<arguments>) | %--- <defined_infix_pred> %---- <defined_prop> ::= %---- <defined_prop> :== $true | $false
%----Terms
<term>               ::= <function_term> | <variable>
<function_term>      ::= <plain_term> | <defined_term> | <system_term>
<plain_term>         ::= <constant> | <functor>(<arguments>)
<constant>           ::= <functor>
<functor>            ::= <atomic_word>
<defined_term>       ::= <number> | <distinct_object>
%----A more general formulation, which syntactically admits more defined terms,
%----is as follows. Developers may prefer to adopt this.
%---- <defined_term>       ::= <number> | <distinct_object> |
%----                          <defined_constant> |
%----                          <defined_functor>(<arguments>) |
%----                          <term> <defined_infix_func> <term>
%---- <defined_constant>   ::= <atomic_defined_word>
%---- <defined_constant>   :==
%---- <defined_functor>    ::= <atomic_defined_word>
%---- <defined_functor>    :==
%---- <defined_infix_func> ::=
%----System terms have system specific interpretations
<system_term>        ::= <system_constant> | <system_functor>(<arguments>)
<system_functor>     ::= <atomic_system_word>
<system_constant>    ::= <system_functor>
<variable>           ::= <upper_word>

%----Formula sources
<source>             ::= <general_term>
<source>             :== <dag_source> | <internal_source> | <external_source> |
                         unknown
%----Only a <dag_source> can be a <name>, i.e., derived formulae can be
%----identified by a <name> or an <inference_record>
<dag_source>         :== <name> | <inference_record>
<inference_record>   :== inference(<inference_rule>,<useful_info>,
                             [<parent_list>])
<inference_rule>     :== <atomic_word>
%----Examples are        deduction | modus_tollens | modus_ponens | rewrite |
%                        resolution | paramodulation | factorization |
%                        cnf_conversion | cnf_refutation | ...
<parent_list>        :== <parent_info> | <parent_info>,<parent_list>
<parent_info>        :== <source><parent_details>
<parent_details>     :== :<general_list> | <null>
<internal_source>    :== introduced(<intro_type><optional_info>)
<intro_type>         :== definition | axiom_of_choice | tautology
%----This should be used to record the symbol being defined, or the function
%----for the axiom of choice
<external_source>    :== <file_source> | <theory> | <creator_source>
<file_source>        :== file(<file_name><file_info>)
<file_info>          :== ,<name> | <null>
<theory>             :== theory(<theory_name>)
<theory_name>        :== equality | ac
%----More theory names may be added in the future. The <useful_info> is
%----used to store, e.g., which axioms of equality have been implicitly used,
%----e.g., theory(equality,[rst]). Standard format still to be decided.
<creator_source>     :== creator(<creator_name>)
<creator_name>       :== <atomic_word>
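%----Illustrative only (not part of the grammar): annotated formulae with
%----hypothetical names whose <source>s are a file record and an inference
%----record with an SZS status and two parents (the second parent, c_0_5,
%----is assumed to be recorded elsewhere in the same derivation).
%----    cnf(c_0_3,axiom,
%----        ( ~ p(X) | q(X) ),
%----        file('Axioms/EXA001-0.ax',p_implies_q)).
%----    cnf(c_0_7,plain,
%----        ( q(a) ),
%----        inference(resolution,[status(thm)],[c_0_3,c_0_5])).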
::= ,<useful_info> | <useful_info> ::= <useful_info> :== [] | [] :== | , :== | | %----Useful info for formula records :== <description_item> | <description_item> :== description() :== iquote() %----s are used for recording exactly what the system output about %----the inference step. In the future it is planned to encode this information %----in standardized forms as <parent_details> in each . %----Useful info for inference records :== | :== status(<status_value>) | %----These are the status values from the SZS ontology <status_value> :== tau | tac | eqv | thm | sat | cax | noc | csa | cth | ceq | unc | uns | sab | sam | sar | sap | csp | csr | csm | csb %----The most commonly used status values are: %---- thm - Every model (and there are some) of the parent formulae is a %---model of the inferred formula. Regular logical consequences. %---- cth - Every model (and there are some) of the parent formulae is a %---model of the negation of the inferred formula. Used for negation %---of conjectures in FOF to CNF conversion. %---- sab - There is a bijection between the models (and there are some) of %---the parent formulae and models of the inferred formula. Used for %---Skolemization steps. %----For the full hierarchy see the SZSOntology file distributed with the TPTP. :== (,) :== refutation() %----Useful info for creators is just %----Include directives ::= include(). ::= ,[] | ::= | , %----Non-logical data ::= | : | ::= | () | | ::= | , ::= [] | [] ::= | , %----General purpose ::= | ::= | <single_quoted> %----This maybe useful in the future %--- ::= <dollar_word>