Formalism and Beyond
Logos
Studien zur Logik, Sprachphilosophie und Metaphysik Herausgegeben von/Edited by Volker Halbach, Alexander Hieke, Hannes Leitgeb, Holger Sturm
Volume / Band 23
Formalism and Beyond
On the Nature of Mathematical Discourse
Edited by Godehard Link
ISBN 978-1-61451-829-7
e-ISBN (PDF) 978-1-61451-847-1
e-ISBN (EPUB) 978-1-61451-996-6
ISSN 2198-2201

Library of Congress Cataloging-in-Publication Data
A CIP catalog record for this book has been applied for at the Library of Congress.

Bibliographic information published by the Deutsche Nationalbibliothek
The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.dnb.de.

© 2014 Walter de Gruyter Inc., Boston/Berlin
Printing: CPI books GmbH, Leck
♾ Printed on acid-free paper
Printed in Germany
www.degruyter.com
Contents

Preface

Duality, Epistemic Efficiency & Consistency
Michael Detlefsen
  1 Introduction
  2 Abstract Duality or Dualization?
  3 The Contentual Addition Model of Dualization
  4 Proofs & Proof Developments
  5 The Contentual Addition Model & The Traditional Contentualist View of Proof
  6 Contentual Addition in an Abstract Setting
  7 Non-Trivial Axiom Systems
  8 Conclusion

Frege on Quantities and Real Numbers in Consideration of the Theories of Cantor, Russell and Others
Matthias Schirn
  1 Introduction
  2 The concept of quantity in Frege’s writings between 1874 and 1884
  3 Cantor’s theory of irrational numbers and Frege’s critique
  4 Russell on quantities and real numbers in Principles of Mathematics and Principia Mathematica
  5 Quantities and real numbers in Grundgesetze
  6 Frege’s plan carried out: von Kutschera’s account

Frege on Formality and the 1906 Independence-Test
Patricia A. Blanchette
  1 Introduction
  2 The Proposal
  3 The Import of the 1910 Notes
  4 The Anti-Metatheory Explanation
  5 The Similarity with Hilbert
  6 Conclusion

Formal Discourse in Russell: From Metaphysics to Philosophical Logic
Godehard Link
  1 Introduction
  2 Setting the Stage: Russell’s Early Ontology
  3 On the Nature of Functions
  4 The Substitutional Theory
  5 Principia Mathematica
  6 Ramification: Gödel’s Gestalt Switch
  7 Lessons for Ontology
  8 Conclusion

On Live and Dead Signs in Mathematics
Felix Mühlhölzer
  1 A Mess Concerning the Reference, Interpretation and Application
  2 How can Intended Models be Singled Out?
  3 Strings of Strokes in Hilbertian Finitism

Generalization and the Impossible
Paul Ziche
  1 “Contradictions are emotions”: The example of the complex numbers
  2 Russell on symbolism: Making the simple complicated
  3 Ways into logic: Generalization and abstraction
  4 Pure logic and meta-scientific induction
  5 Scientistic liberalism and interesting generalizations

Assumptions of Infinity
Karl-Georg Niebergall
  1 Introduction
  2 “T makes an assumption of infinity”, “T merely assumes the finite”
  3 Expressing infinity: a preliminary suggestion
  4 Axioms for and definitions of “finite”
  5 Elaboration of (DIiii)
  6 The potentially infinite
  7 Conclusion
  8 Appendix

The Interpretation of Classes in Axiomatic Set Theory
Daniel Roth, Gregor Schneider
  1 Introduction
  2 Set Theories
  3 Interpreting Classes
  4 Concluding Remarks

Purity in Arithmetic: some Formal and Informal Issues
Andrew Arana
  1 Introduction
  2 Topical purity
  3 The infinitude of primes
  4 Incompleteness and the possibility of purity
  5 Closing thoughts

Domain Extensions and Higher-Order Syntactical Interpretations
Marek Polański
  1 Introductory remarks
  2 Domain extensions: some paradigmatic examples
  3 L-operations and L-constructions
  4 Higher-order syntactical interpretations and their constructions
  5 Concluding remarks

Finite Methods in Mathematical Practice
Laura Crosilla, Peter Schuster
  1 Introduction
  2 Hilbert’s programme now and then
  3 Finite methods for constructive algebra
  4 Geometric formulas and dynamical proofs
  5 Realising Hilbert’s programme in commutative algebra
  6 Appendix

List of Contributors
Name Index
Preface

The present volume grew out of a cooperation between German, Italian, and US-American researchers from 2005 to 2011, which was funded in equal parts by the Alexander-von-Humboldt Foundation and a matching fund from the University of Notre Dame within the format of one of the Foundation’s TransCoop projects. The general theme of the cooperation was called: “Imaginary and Ideal Elements and Limit Concepts in Mathematics: Their Theory, History, and Philosophical Understanding”. The title was motivated by an ongoing philosophical interest in the role of formalist aspects in mathematical theorizing and practice. Accordingly, the research activities ranged from studies approaching historical topics with modern logico-philosophical tools to systematic conceptual and logical analyses. The contributions presented here cover this ground and to some degree extend it in various directions. Papers focussing on central historical figures in the field, like Frege, Russell, Hilbert, and Wittgenstein, are accompanied by those dealing with issues like infinity, finiteness, and proof procedures and ones putting formalist mathematics into historical perspective. More general information is given in the abstracts preceding each paper.

A note on the name index: Names of authors might not explicitly appear on a page mentioned in the index if their work is merely referred to via numbers in square brackets. In such a case the names can be retrieved from the bibliography of the essay concerned.

In the name of all contributors, I wish to thank the Humboldt Foundation and the University of Notre Dame for funding the project, and the Munich Center for Mathematical Philosophy (MCMP) for generous additional financial support. I also thank several anonymous reviewers for their expert opinion; Jesse Tomalty for looking over the English of non-native speakers; Johannes Stern and Roland Poellinger for administrative work at various phases of the project. In particular, Mic Detlefsen is to be thanked for originally initiating the transatlantic cooperation, and Gregor Schneider for serving as the general LaTeX editor of the volume. Finally, I thank the editors of the series and the publisher for agreeing to produce the book with them.

Munich, June 2014, G. L.
Duality, Epistemic Efficiency & Consistency Michael Detlefsen
Duality has often been described as a means of extending our knowledge with a minimal additional outlay of investigative resources. I attempt to construct a serious argument for this view. Certain major elements of this argument are then considered at length. They’re found to be out of keeping with certain widely held views concerning the nature of axiomatic theories (both in projective geometry and elsewhere). They’re also found to require a special form of consistency requirement.
1 Introduction

Duality or reciprocity principles in projective geometry have been described as being among the most significant developments in modern geometry.¹ The geometer H. S. M. Coxeter stated the basic rationale behind such claims as follows:

    One of the most attractive features of projective geometry is the symmetry² and economy with which it is endowed by the principle of duality: fifty detailed proofs may suffice to establish as many as a hundred theorems. [9, p. 25]³,⁴

¹ Cf. [36, p. 24].
² ‘Symmetry’ is the term Klein, Veblen and Young (cf. [32, p. 7]) and others used to describe an important structural feature of the theorem-set of projective geometry they took to be induced by duality.
³ For a sampling of similar statements see [25, p. 25]; [23, p. 217]; [20, pp. 3-4]; [19, p. 15].
⁴ For convenience, I’ll refer to the basic epistemological idea expressed in this claim – that dualization effectively provides for the doubling of the knowledge represented by a given body of primary proofs – as the doubling idea.

Duality has thus been seen as a means of effecting a dramatic economy in our geometrical thinking. It has in fact been linked to economy or efficiency
from the very start. A supposed connection is clear, for example, in the writings of Gergonne, who emphasized that efficiency and organization are just as important to the development of scientific thinking as the discovery and justification of “new truths”. Indeed, Gergonne suggested, for extensively developed sciences, improved internal organization may be an even more compelling consideration than the development of further new knowledge. In his view, in fact, this was the case for the mathematics of his day.

    [A]t the point which mathematics has reached today, . . . encumbered as we are with theorems of which even the most intrepid memory cannot flatter itself it retains the statements, it would perhaps be less useful to science to seek new truths than to reduce the truths already discovered to a small number of guiding principles. In any case a science perhaps recommends itself less by the multitude of propositions which make it up than by the manner in which these propositions are related and connected to one another. [14, pp. 150-151]⁵

⁵ An obvious question, of course, is why the improvements in internal organization should not themselves be seen as constituting discoveries of new truths. Gergonne need not have denied that they are. His point could be reformulated by making a distinction between new truths of the type typically identified as such by mathematicians and truths not so recognized. His suggestion would then have been that development of the former is in certain circumstances as important as development of the latter.

Roughly a century after Gergonne’s observation that accumulated knowledge sometimes stands in need of the type of simplifying “reduction” suggested by duality, Veblen and Young suggested making dual structuring, or something akin to it, an ideal (or at least a virtue) of axiomatic theorizing in geometry more generally. They termed this ideal symmetry, and they described it as a type of internal structuring of a theory in which proofs and/or theorems are “paired” in such a way that, from a given element of a given pair, the other element can be obtained by application of a routine mechanical transformation.⁶ They listed this ideal alongside such better-known general standards and ideals of scientific theorizing as consistency, independence and categoricity.⁷

    There is, further, the desideratum of utmost symmetry . . . in the whole body of theorems.⁸ . . . Symmetry can frequently be obtained by a judicious choice of terminology. This is well illustrated by the concept of “points at infinity” which is fundamental in any treatment of projective geometry. . . . Let us now consider the . . . two propositions:

        1. Any two distinct points of a plane are on one and only one line.
        1′. Any two distinct lines of a plane are on one and only one point.

    Either of these propositions is obtained from the other by simply interchanging the words point and line. . . . In view of the symmetry of these two propositions it would clearly add much to the symmetry and generality of all propositions derivable from these two, if we could regard them both as true without exception. This can be accomplished by attributing to two parallel lines a point of intersection. [32, pp. 7-8]⁹

⁶ Veblen and Young spoke of mathematical ‘sciences’ where I speak of mathematical ‘theories’. “We understand the term a mathematical science to mean any set of propositions arranged according to a sequence of logical deductions.” [32, p. 2]
⁷ Cf. [32, §§1, 2].
⁸ Veblen and Young also introduced a further condition they termed ‘generality.’ They described it as follows: “[T]he applicability of a theorem shall be as wide as possible. This has relation to the arrangement of the assumptions, and can be attained by using in the proof of each theorem a minimum of assumptions.” This description suggests that generality too might have been intended to ‘compress’ or tighten a theory’s internal organization by maximizing the specificity of connections between theorems and axioms.
⁹ It should perhaps be noted that what we’re here calling duality, or related (though not always identical) notions going by the same name, have been suggested as ideals for theories other than projective geometry. A recent example is an appeal for duality in algebraic geometry (cf. [2]). There are more remote examples as well, one of which concerns duality in set theory and its relationship to duality in projective geometry (cf. [29]).

Veblen’s and Young’s suggestion thus seems to have been that such dualities as the familiar point-line duality of plane projective geometry provide for a resource-conserving conversion of certain proofs into certain other proofs, and that this convertibility, in turn, provides for increased efficiency in the development of our knowledge.¹⁰

¹⁰ Other dualities (e.g. point-plane) were taken to offer similar benefits. These dualities took various forms. We have already noted dualities for particular types of geometrical figures (e.g. the above-mentioned point-line duality for planar figures, or the point-plane duality for spatial figures). Some also offered more abstract statements of dualities such as the following: “[F]rom any statement or theorem concerning the relative positions of the elements composing a geometrical configuration, another statement or theorem can be obtained by a simple interchange of the elements of the configuration with their reciprocals.” [10, p. 10].

The reasoning behind remarks such as those just surveyed has not been set out carefully and explicitly. A first task, then, is to try to get clearer on what such reasoning might look like. The following argument, I believe, is an at least credible first candidate. Its clarification and evaluation will be the chief concerns of this paper.

Dualization Argument

    Comparative Epistemic Gain: To make use of dualization provides an opportunity to significantly increase the extent of our knowledge over what it would be were we not to make use of it.¹¹,¹²

    Modest Developmental Costs: To make use of dualization requires only a modest outlay of developmental resources or “labor” beyond those needed to obtain the primary proofs to which it (i.e. dualization) is to be applied.¹³

    Nature of Epistemic Efficiency: If the gains-to-costs ratio of one method of epistemic development is greater than that of another, we say that the epistemic efficiency of the one is higher than that of the other.

    ∴ Improved Epistemic Efficiency: Dualization presents a compelling opportunity to improve epistemic efficiency.

    ∴ Dualization: We ought to make use of dualization when we have the opportunity to do so.

This or something like it seems to have been the basic reasoning behind the claims concerning the significance of duality just surveyed. There are a number of points at which it invites closer examination. My focus will be the first premise, Comparative Epistemic Gain, whose justification, I believe, is problematic. Before turning to this premise and the question of its justification, however, I want briefly to consider what may be an even more basic matter – namely, whether it is duality principles per se (i.e., duality principles conceived essentially as abstract existence claims), or dualization processes, that are supposed to sustain whatever epistemic economies there are that may credibly be attributed to duality.¹⁴
¹¹ Or, if not the quantity or extent of what is strictly speaking knowledge, the quantity of some other type of epistemic good.
¹² The following remark by Smart is an example of the use of explicitly epistemic language to describe the virtues of duality. “From any known theorem we are . . . able to write down a reciprocal theorem whose truth we can at once assert; and we have thus a useful and valuable method of extending our knowledge of geometrical properties.” [28, p. 260].
¹³ The following remark by Mathews is an example of how the term ‘labor’ has been used to describe the benefits of dualization. “[P]ractically the principle of duality halves our labour, because all we have to do is to translate, so to speak, the enunciation of a proved proposition into that of its correlative, and then infer the latter at once”. [20, p. 4].
¹⁴ There is another basic matter I want to mention, even though I will not discuss it further here. This concerns an assumption that is implicit in the Dualization Argument, namely, that there are reasons to want to economize on the expenditure of cognitive resources. This leads fairly directly to the question of (i) whether and, if so, in what ways cognitive resources are “limited” or “depletable”, and to questions concerning (ii) how limits on and/or depletion of resources for individuals may compare to limits on and/or depletion of resources for communities. These are important and difficult questions, but questions I will not go into here.
2 Abstract Duality or Dualization?

There is, I think, a difference between taking the economy that duality is supposed to represent to be a product of what might be described as “abstract” knowledge of the existence of dual proofs and taking it to be a product of such knowledge of dual proofs as might be obtained through application of known processes of dualization to known primary proofs. For cases where the knowledge represented by having and comprehending a proof of a theorem differs materially from that represented by merely knowing that there is a proof of that theorem, the above distinction can be expected to matter. How extensive and important such differences may be is difficult to say in any general way, and I will make no attempt to cut through this difficulty here. This notwithstanding, the relevance of such a distinction seems an important point to bear in mind. Generally speaking, its importance will be proportional to the extent to which knowing a proof of a proposition provides for better knowledge of it (and perhaps also more extensive knowledge of related propositions) than merely knowing that there is a proof of it. This being so, I will consider dualization rather than mere abstract duality as the focal form duality should be considered to take for purposes of reckoning the extent to which it may increase epistemic efficiency.

Those who have written on the significance of duality have not always been careful to mark such a distinction.

    [G]iven a theorem and its proof, we can immediately assert the dual theorem; for a proof of the latter could be written down mechanically by dualizing every step in the proof of the original theorem. [9, p. 231]
Such statements are in certain respects puzzling. Coxeter describes the process as beginning with a given theorem and a given proof of it. This presumed theorem and its presumed proof are what I will generally refer to as the primary theorem and proof of a dualization process. Coxeter says that these “givens” immediately justify assertion of the dual theorem. What seems curious to me is his description of what is given. He does not say that what is given is that there is a proof of the primary theorem. Rather, he describes the given as the primary proof itself. This naturally suggests an intention to build not on the abstract fact of the existence of a primary proof, but on the substance of that proof itself. In other words, it suggests the following plan of epistemic expansion:

    Dualization: Given a primary proof of a primary theorem, we mechanically transform it into a dual proof of the dual theorem. Knowledge of this newly obtained proof then justifies, among other things, assertion of the dual theorem.
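The mechanical character of this plan can be illustrated with a deliberately naive sketch. It assumes, hypothetically, that a proof is recorded as a list of English sentences and that dualizing amounts to interchanging the words "point" and "line", as in the pair of propositions 1 and 1′ quoted from Veblen and Young in Section 1. The names `SWAP`, `TERM`, `dualize` and `dualize_proof` are illustrative inventions and appear in none of the sources discussed; a serious treatment would also have to interchange relational vocabulary, which the sketch ignores.

```python
import re
from typing import List

# Hypothetical interchange table for plane point-line duality (an assumption
# made for illustration; it covers only lower-case singular and plural forms).
SWAP = {"point": "line", "points": "lines", "line": "point", "lines": "points"}

# Match the four words as whole words, so that e.g. "plane" is left untouched.
TERM = re.compile(r"\b(points?|lines?)\b")


def dualize(statement: str) -> str:
    """Interchange 'point' and 'line' in a single pass over the statement."""
    return TERM.sub(lambda match: SWAP[match.group(1)], statement)


def dualize_proof(steps: List[str]) -> List[str]:
    """Dualize every step of a proof, here modeled as a list of sentences."""
    return [dualize(step) for step in steps]


primary = "Any two distinct points of a plane are on one and only one line."
print(dualize(primary))
# -> Any two distinct lines of a plane are on one and only one point.
```

On this toy model, applying `dualize` twice returns the original statement, and the work of dualizing a proof is a single pass over its steps. Whether the knowledge obtained that way matches what a primary proof would provide is a separate question that the sketch does not address.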
This is not how Coxeter describes things though. He does not appeal to the transformation of the primary proof into a dual proof. Rather he appeals to knowledge that the proof can be dualized, and concludes the assertability of the dual theorem from this. The procedure that Coxeter describes is therefore not one which appears to transform a given primary proof into a corresponding dual proof. Rather, it is something seemingly intended to transform knowledge of a primary proof directly into knowledge of a dual theorem. What matters, he suggests, is not that knowledge of a primary proof is transformed into knowledge of a dual proof, but that knowledge of a primary proof assures us that a dual proof “could be written down mechanically” (loc. cit., emphasis added). Coxeter’s conception of the epistemic expansion underwritten by duality thus seems a curious mixture of the scheme described in Dualization with a more abstract understanding of duality – an understanding which sees duality as proceeding from known existence of a primary proof (as distinct from knowledge of the proof itself) to known existence of a dual of it to, finally, assertion of the dual theorem. In other words, it appears to combine Dualization with the following:

    Abstract Duality: Given a sentence that is known to have a primary proof, we may immediately assert both the provability of its dual and the dual itself.

Dualization and Abstract Duality seem to me to offer substantially different models for the type of epistemic expansion that may be supported by a duality phenomenon. These differences seem generally to be as significant as the combined differences between (i) knowing that a primary proof exists vs. knowing such a proof itself and (ii) knowing that a transformation of a primary to a dual proof exists vs. knowing and applying such a transformation to obtain a dual proof.
This notwithstanding, there may be settings where the differences are not so great, or where such differences as there are suggest a preference for Abstract Duality. Consider, for example, the possibility of a theorem of which the following conditions hold: (a) the complexity of even the simplest proof of a theorem (whether primary or secondary) is such as to make it largely unknowable to humans, but (b) the prospects for human knowledge of the existence of such proofs are significantly better. It may be that there are propositions with respect to which the best we can hope for is knowledge of the existence of a proof. For such propositions, applicable forms of duality might be limited to Abstract Duality.¹⁵ Generally speaking though (e.g., when simplest proofs are not so complex as to practically afford only abstract knowledge of their existence), the epistemic expansion underwritten by Dualization can be expected to be greater than that underwritten by Abstract Duality. The reason is that dualization can generally be expected to yield knowledge not only of dual theorems but of dual proofs, and such knowledge seems generally to go beyond mere knowledge of a dual theorem (and/or abstract knowledge of the existence of a proof of it).¹⁶ At the same time, though, the sum of the resources consumed by dualization may be greater than that required for knowledge of Abstract Duality.¹⁷ The mere fact, if it is a fact, that the overall epistemic product of Dualization should be greater than that of Abstract Duality does not in itself imply that Dualization generally sustains greater gains in efficiency than does Abstract Duality. To determine that it does, if (and where) it does, would require more intricate analysis than I will provide here. What seems clear, though, is that Dualization and Abstract Duality generally represent different plans for epistemic expansion. I note this because, in what follows, I will consider only dualizational plans and such efficiencies as they may or may not support.

¹⁵ Perhaps Appel’s and Haken’s computer-assisted proof of the four color theorem is a case in point. This at any rate if we suppose that the proof defies human comprehension in a relevant sense(s) in which the program verifying that there is a proof does not. In general, one of the questions raised by the Appel-Haken proof is how the knowledge of a theorem that is normally represented by knowing a proof of a theorem compares to the knowledge represented by knowing that there is a proof of it. This is an interesting and important question, though there is not room enough to explore it further here.
¹⁶ To mention an obvious additional component, it would generally include knowledge of the premises of the dual proof.
¹⁷ To dualize an entire proof might generally require more resources than merely to dualize the theorem proved. In addition, the demands of grasping a dual proof, and of recognizing it as a dual proof once it has been generated, might be considerably greater than those of knowing that a given primary proof can be dualized.

3 The Contentual Addition Model of Dualization

As I am conceiving of it, dualization is a process of proof development. It starts with a primary proof, treated as given, and transforms it into another proof. This transformation, it is commonly believed, engenders a body of knowledge which represents the epistemic product of the process of dualization. The (or at least a) common view is that this product effectively “doubles” the knowledge represented by the primary proof from which the dualization process proceeds. It does so, moreover, with little additional expenditure of investigative resources beyond those expended in the development of the primary proof mentioned. A little more specifically, what is taken to be doubled is the extent or amount of knowledge that is represented by a supposed primary proof. To put it another way, the contents of the knowledge that is a direct product of dualizational development of a proof are presumed to be roughly equal in extent to the contents of the knowledge that is a similarly direct product of the primary proof from which dualization departs. These contents are also presumed to be “new”. A proof obtained by dualization is thus presumed to constitute an addition to the contents of the dualizer’s knowledge that is roughly equal in extent to the epistemic product of the primary proof with which it begins.

For present purposes, then, I am conceiving of the epistemic product of a proof or proof development as the total body of knowledge which it represents. I am thinking of the extent of such a product, moreover, as a measure of some type of combination of the propositional contents of the several constituent pieces of knowledge whose combination constitutes the product mentioned.

What I am calling the Contentual Addition Model of dualization is a view which sees it (i.e. dualization) as a means of increasing the extent of the knowledge that is represented by a primary proof development. Indeed, if Coxeter and others of like persuasion are correct, dualization provides for what is essentially a doubling of the extent of the knowledge provided by a primary proof development. It is supposed to provide this, moreover, with little additional expenditure of cognitive resources beyond that which is required for the primary proof development to whose product it is to be applied.
Both primary and dualizational developments of proofs of dual theorems, then, require expenditure of cognitive resources.¹⁸ Those who believe in the epistemic benefits of duality generally hold a view to the effect that the ratio of the extent of the epistemic product to the cost of a dualizational development of a proof of a dual theorem generally exceeds that of a primary development of a proof for it.
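The comparison just stated can be made explicit in a small worked formula. The symbols are assumptions introduced here for illustration and do not appear in the sources discussed: $P_d$ and $C_d$ stand for the epistemic product and the cost of a dualizational development of a proof of a dual theorem, and $P_p$ and $C_p$ for those of a primary development of a proof of the same theorem, all assumed to be measurable on a common scale.

```latex
% Claimed advantage of dualization (all symbols are illustrative assumptions):
\[
  \frac{P_d}{C_d} \;>\; \frac{P_p}{C_p}.
\]
% If, as the doubling idea supposes, P_d \approx P_p, the claim reduces to C_d < C_p.
% Coxeter's illustration: fifty primary proofs plus fifty dualizations,
% compared with one hundred primary proofs:
\[
  \frac{50\,C_p + 50\,C_d}{100\,C_p}
  \;=\; \frac{1}{2} + \frac{C_d}{2\,C_p}
  \;\longrightarrow\; \frac{1}{2}
  \quad \text{as } \frac{C_d}{C_p} \to 0 .
\]
```

On these assumptions, labor is close to halved only when $C_d$ is negligible relative to $C_p$. The first inequality, in turn, depends on $P_d$ not falling far short of $P_p$.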
¹⁸ As with developed content, so too with developmental expenditures, I will assume them to be quantitatively measurable or estimable. It may also be necessary to suppose that their supply is in certain respects limited.

4 Proofs & Proof Developments

We must be careful, though, to distinguish between the knowledge represented by a proof (be it primary or a proof by dualization) per se and the generally more extensive knowledge represented by a proof development. That this is so is due to the fact that, in the end, the choices presented by duality phenomena are choices between proof developments, and not merely choices between proofs per se. Dualizational proof development has a mechanical character which may generally make it surer and more straightforward than primary development of a proof of a dual theorem. It may therefore seem to offer advantages of surety and simplicity relative to primary developments of proofs for dual theorems. Be this as it may, dualizational and primary developments also have associated epistemic products, and were the epistemic products of primary developments systematically (or at least predictably) greater than those of their dualizational counterparts, the surety and straightforwardness of dualizational development might not be sufficient to warrant an overall preference for dualizational over primary developments. Even supposing that the costs of dualizational development are as a rule substantially lower than those of primary developments, primary development of proofs for dual theorems might predictably generate a greater epistemic product than does dualizational development. Were the expected increases in epistemic product great enough, primary development of proofs of dual theorems might prove to be rationally preferable to dualizational development even if its costs were generally higher, perhaps even much higher. When considering the opportunities for epistemic economy that may be offered by duality, then, we ought to compare the relative epistemic products of primary and dualizational proof development for dual theorems and not merely their primary and secondary proofs. If duality is to represent an opportunity for epistemic economy, it must presumably be the case that the product-to-cost ratio of dualizational proof development for dual theorems is generally greater than that of their primary counterparts.
This, at any rate, is the key idea of what I will call the Contentual Addition Model of dualization. According to it, dualization adds to the knowledge represented by a primary proof development by adding to the extent or quantity of the contents known. On some versions of this view, in fact, it provides for what is essentially a doubling of the (extent of) such contents. If the Contentual Addition Model is to be convincing, the idea of contentual addition on which it is based must be plausible. As we will now see, though, there are a number of questions which can be raised concerning such a view of contentual addition. These include:
(i) Are the axioms of projective geometry themselves propositions or, as per the common conception of projective geometry as a so-called abstract science, are they rather something more schematic (e.g. propositional functions of some type)?
(ii) (a) Are proofs in projective geometry finite sequences of judgments each element of which (including the conclusions proved) adds to the extent of the total epistemic product of the proof? Or (b) do proofs in projective geometry rather establish what is generally a logical relationship between the contents of a set of possibly non-judgmentive premises and the content of a non-judgmentive conclusion?
In addition to these, there are questions concerning the requirements the Contentual Addition Model places on our knowledge of models of the theories (e.g. projective geometry) to which it (the Contentual Addition Model) is applied. Here one thing seems sure – namely, that mere consistency is not in itself enough to sustain the Contentual Addition Model. It seems instead to require knowledge of a model – indeed, particular knowledge of a model, as distinct from mere or abstract knowledge of its existence. This type of knowledge seems to involve more than that which would at least standardly be provided by what Hilbert referred to as “direct” (i.e. proof-theoretic) proofs of consistency. This complicates the question of what form a consistency requirement ought best to take for theories where dualization is wanted as a means of efficiently extending our knowledge. More on this later.
5 The Contentual Addition Model & The Traditional Contentualist View of Proof
Let us conceive of the epistemic product of a proof development as the body of epistemic goods in whose production it figures significantly.19 Likewise, let’s conceive of the cost of a proof development as the total body of cognitive resources expended in carrying it out. These are rough characterizations, of course, but they suggest an at least correspondingly rough way to think of the epistemic efficiency of a proof development – namely, as the ratio of (the extent of) its epistemic product to (the extent of) its cost, however these might more exactly be conceived.20
19 More (though not perfectly) exactly, we may think of it as the total body of epistemic products that are (a) reasonably seen as being regular concomitants of the development in question and (b) reasonably taken to be generally valuable to pursuers of the proof development in question as pursuers of their epistemic-developmental type.
20 We could, and ultimately should, of course, consider not only matters of extent or quantity but also of quality. In a more fully developed account, we would want a ratio of the value of the product (i.e. a composite of its quantity and (relative) quality) of a proof development and the value of the investigative resources consumed by it (i.e. some composite of the scarcity of these resources and of the relative value of expending them in this way).
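The rough notion of epistemic efficiency just described can be put schematically. What follows is only a sketch: the symbols P (extent of epistemic product), C (extent of cost) and E (efficiency), and the subscripts marking primary and dualizational developments, are notation assumed here for illustration and are not the author's own.

```latex
\[
  E(D) \;=\; \frac{P(D)}{C(D)}
\]
% Necessary condition (per Section 4) for duality to offer epistemic economy,
% for a dual theorem with primary development D_prim and
% dualizational development D_dual:
\[
  \frac{P(D_{\mathrm{dual}})}{C(D_{\mathrm{dual}})}
  \;>\;
  \frac{P(D_{\mathrm{prim}})}{C(D_{\mathrm{prim}})}
\]
```

On this way of putting it, a lower cost for the dualizational development does not by itself settle the comparison; the inequality can fail if the primary development's product is sufficiently greater.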
We must therefore attempt to determine what are the typical epistemic products of primary and dualizational proof developments, and what are the expenditures typically needed for their respective executions. This may be subtler than it may at first appear to be. To begin with, we must decide certain larger questions concerning the overall character of the type of proof being developed. On one view, proofs are sequences of judgments the propositional contents of which are judged to stand in certain logical relationships to one another. The judgments mentioned are affirmations of propositional contents. Some of these are the contents of premises, one (typically) the content of the conclusion and some the contents of the judgments of logical relationships mentioned. For convenience, I’ll call this the Traditional Contentualist View (TCV) of proof. On the TCV, the epistemic product of a primary proof development can be roughly divided into four types of components – the premisory, logical, conclusory and auxiliary components, respectively. These are what the names suggest. The premisory component is thus knowledge of the developed proof’s premises. The logical subproduct is the knowledge that the development gives of the logical relationships between the several premises and sets of premises of the proof and between (sets of) its premises and its conclusion. The conclusory part of the product is the knowledge the proof development gives of the conclusion of the proof. The auxiliary subproduct of a proof development, finally, is comprised of such other items of knowledge as are non-incidentally engendered by it, even though they should not be part of that knowledge which in some sense constitutes the proof itself.21 As an example of auxiliary knowledge, consider the type of knowledge that may result from making a false start on developing a primary proof, learning from its failure, and eventually successfully developing a primary proof. 
Such knowledge seems to be a common (if not universal) element of primary proof development – common enough, at least, to deserve consideration as a possibility generally to be considered in realistic estimations of the comparative epistemic merits of primary and dualizational proof developments. A particular point to consider in this connection is that dualizational and primary proof development may not be equal in their auxiliary elements. In particular, dualization might not regularly offer the same rich potential for 21 To avoid misunderstanding, let me note that the auxiliary items I have in mind are not what earlier writers have sometimes referred to as intervenient knowledge – that is, knowledge of such a type as equips a knower to better develop her knowledge in the future, even though it may not itself constitute a contentual addition to her present knowledge. Mathematics has traditionally been prized for such benefits. Cf. [1, Bk. 2, VIII, 2]; the anonymously written dedication to [27, xiv-xv]; and [4, p. 172]. What I am here terming auxiliary items are, by contrast, contentual additions to an agent’s present knowledge, and not (or at least not merely) enhanced capacity for future contentual extension of her knowledge.
development of auxiliary knowledge of the type just mentioned as primary proof development does. Auxiliary knowledge is of course just that, auxiliary. It is not knowledge that is properly a part of that knowledge which constitutes the proof developed in a proof development. This notwithstanding, it may nonetheless contribute to the realization of a more comprehensive set of ends which extend, but also continue (in an appropriate way), the ends of the given proof development(s). Auxiliary gains of the general type mentioned above (i.e. those involved in making and correcting false starts) do not of course come at no cost. The underlying trial-and-error procedures can be expected to consume cognitive resources, and it would not be surprising if the amounts consumed sometimes exceed those consumed by their more mechanical dualizational counterparts.22 This notwithstanding, the main point is this: in estimating the relative benefits of primary and dualizational developments, the possibility of auxiliary gains and costs ought generally to be considered. Dualization may generally offer advantages over primary development in terms of lowered cognitive costs. If, however, it were also to offer less by way of auxiliary gains, it might be that primary development of a proof for a dual theorem would be overall preferable to dualizational development. This being so, the idea that dualization should generally offer what is essentially a doubling of the epistemic benefits of a parallel primary development seems dubious, or at least in need of special argument. Whatever the correct calculation of the alleged efficiency of a dualization might be, it will require knowing of the primary development(s) to which a dualizational alternative is to be compared (a) whether it offers auxiliary epistemic benefits, (b) what the extent of such benefits is, and (c) what their cumulative cognitive cost is. 
This suggests a significantly more complicated calculation than that suggested by the simple doubling idea.
22 It does not seem inevitable that they should do so, though. Even mechanical procedures can be long and intricate.
6 Contentual Addition in an Abstract Setting
The TCV is likely the most accommodating view as regards the Contentual Addition Model of duality. As we have just seen, though, even it poses certain difficulties for the Contentual Addition Model. In addition, the TCV does not embody a conception of theory that has generally been taken to be the (or an) appropriate conception for that theory (or cluster of theories) with respect to which dualization has featured most prominently – namely, that pertaining to projective geometry. The
prevailing view since the late nineteenth and early twentieth centuries is that projective geometry should be conceived and formulated as a so-called “abstract science” (abstracte Wissenschaft).23 Veblen and Young gave the following summary statement of this view. The starting point of any strictly logical treatment of geometry (and indeed of any branch of mathematics) must . . . be a set of undefined elements and relations, and a set of unproved propositions involving them; and from these all other propositions (theorems) are to be derived by the methods of formal logic. . . . [T]he undefined elements are to be regarded as mere symbols devoid of content, except as implied by the fundamental propositions. Since it is manifestly absurd to speak of a proposition involving these symbols as self-evident, the unproved propositions referred to above must be regarded as mere assumptions. [32, p. 1-2, emphases in text]
They went on to remark, however, that though mathematical sciences are not generally intended as descriptions of particular domains identified in advance, their significance as bodies of knowledge nonetheless depends on their having application to (parts of) our larger experience. We understand the term a mathematical science to mean any set of propositions arranged according to a sequence of logical deduction. From the point of view developed above such a science is purely abstract. If any concrete system of things may be regarded as satisfying the fundamental assumptions, this system is a concrete application or representation of the abstract science. The practical importance or triviality of such a science depends . . . on the importance or triviality of its possible applications. [Loc. cit., emphases in text]
This became the commonly accepted understanding of projective geometry in the early years of the twentieth century. A clear and clearly stated example was Whitehead who described the axioms of projective geometry as “statements about relations between points” [34, p. 1]. He quickly added, though, that “they are not statements about particular relations between particular points” [loc. cit.]. Rather, the points and relations mentioned “are not otherwise specified than by the supposition that the axioms are true propositions when they are considered as referring to them” [loc. cit.]. In the abstract science of projective geometry, then, [T]he points mentioned in the axioms are not a special determinate class of entities; but they are in fact any entities whatever, which happen to be inter-related in such a manner, that the axioms are true when they are considered as referring to those entities and their inter-relations. [34, p. 2] 23 This conception was given impetus by Pasch’s 1882 lectures on what he and others referred to as the “new” geometry ([22]), and also by various writings of Peano and his students. See also [35, p. 46] for a use of the phrase “abstracte Wissenschaft” that Blumenthal reported as having impressed Hilbert. Wiener argued that geometry ought to be constructed as an abstract science whose propositions are in some sense “independent” of the usual axioms of geometry.
Accordingly, the axioms of an abstract science are not truly propositions at all, but propositional functions. As such they are not strictly capable of being either true or false and are therefore not properly regarded as truths. This was the typical view of projective geometry after the turn of the twentieth century, though not every statement of it was so clear as Whitehead’s, or so explicit in its use of a distinction like that between propositions and propositional functions. The important point for our purposes is that on the abstract conception of projective geometry,24 the premises and conclusions of proofs are not propositions but propositional functions. As such they do not have contents to which other contents might be added, or which might be added to other contents, in the way presumed by the Contentual Addition Model. In addition to being conceived as abstract sciences, it has been common in the late nineteenth and twentieth centuries to conceive of mathematical theories hypothetically. In an axiomatic theory thus conceived, proofs are not intended to establish the truth of their conclusions but only to reveal a relationship of logical implication between the axioms of the theory and them. [A] mathematical demonstration, strictly speaking, is not concerned with the truth of the proposition at all; it is concerned merely with the logical relation that exists between the given proposition and certain other propositions called the axioms – in other words, all that a mathematical demonstration tells us is that if the axioms are true, then the theorem in question will also be true – provided, of course, that our deductive reasoning is sound. [17, §8, p. 159]
Such an understanding of proof does not seem to fit well with the Contentual Addition Model. The reason, briefly, is that, on this understanding, primary proofs are proofs of tautologies and dualized proofs of tautologies do not generally constitute proofs of new tautologies in any relevant way.25 The sentences proved and their proofs may be syntactically distinct, but this does not imply a genuine newness of the tautology proved or the proof by which it is proved. To see why, consider a dualizable theory T in a language LT .26 Let π be a primary proof (i.e. a proof developed by primary means) in T of a sentence θ of LT . For convenience, let’s say that an axiom of T is “in” π if it is one of the elements of the finite sequence of sentences of LT 24 This is a conception that has been taken to apply not only to projective geometry but to mathematics more generally. Cf. [3, p. 359]. 25 Still less do they constitute new proofs of new tautologies. 26 This is intended to include, of course, theories that are dually axiomatized (i.e. whose axioms are given in dual pairs). For simplicity, in fact, we may assume that T is of this type.
that constitutes π. For simplicity, let us further stipulate that knowing π may be regarded as giving knowledge that the axioms of T that are in π logically imply the conclusion of π. Suppose, then, that we dualize π to obtain a proof d(π) of the sentence d(θ). The pivotal question is this: What knowledge may d(π) and its dualizational development reasonably be expected to add to the knowledge assumed to be given by π and its primary development?27 Whatever the details, the core of the answer seems to be this: “d(π) and its dualizational development can not be expected to add anything new of comparable extent to the knowledge provided by π and its development.” The reason is that substituting terms into a tautology to make it a new sentence (i.e. a new instance of an already recognized tautologous form) is not in itself enough to make knowledge either of it or of its tautologousness new knowledge. Thus, for example, the knowledge represented by developing a primary proof of, say, 1. Either [any two distinct points of a plane are on one and only one line] or it is not the case that [[any two distinct points of a plane are on one and only one line]].
is not doubled by making the substitutions necessary to dualize that proof into a proof of 1D. Either {any two distinct lines of a plane are on one and only one point} or it is not the case that {{any two distinct lines of a plane are on one and only one point}}.
Considered as a tautology, in fact, 1D is not new when compared to 1. Neither is the dualized proof of 1 a new proof of a tautology as compared with the assumed primary proof of 1. The reason, generally speaking, is that dualization of tautologies preserves their recognizable logical forms. That is, it preserves the forms that make the dualized sentences the tautologies they are. This being so, and knowledge of tautologousness being mainly a matter of (a) knowing of a sentence that it has a certain logical form F and (b) knowing that all sentences having that form are tautologies, the dual of a tautology will not be a genuinely “new” tautology. It will be only a syntactically distinct instance of an already recognized tautologous form. What would make it a new tautology would be its having a new tautologous form. Dualization, though, does not generally produce sentences having new logical forms as compared to their primary counterparts. Rather, it produces only sentences which, despite their syntactical distinctness from their primary counterparts, are nonetheless of recognizedly the same logical form. 27 Here, as expected, d(θ) is the dual of θ, and d(π) the proof which results from dualizing each element of π.
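The point about preserved logical form can be illustrated with a toy sketch. It assumes, purely for illustration, that dualization amounts to a simultaneous swap of the words "point" and "line" (and their plurals) in a sentence; the function names `dualize` and `excluded_middle` are hypothetical and not drawn from the text. Under that assumption, dualizing an instance of the form "Either [P] or it is not the case that [P]" yields another instance of the very same form.

```python
import re

# Hypothetical term swap standing in for dualization in plane projective geometry.
SWAP = {"point": "line", "line": "point", "points": "lines", "lines": "points"}


def dualize(sentence: str) -> str:
    """Swap 'point(s)' and 'line(s)' simultaneously, in a single pass."""
    return re.sub(r"\b(points?|lines?)\b", lambda m: SWAP[m.group(1)], sentence)


def excluded_middle(p: str) -> str:
    """Build an instance of the tautologous form 'Either [P] or not [P]'."""
    return f"Either [{p}] or it is not the case that [{p}]"


p = "any two distinct points of a plane are on one and only one line"
sentence_1 = excluded_middle(p)
sentence_1d = dualize(sentence_1)

# The dual of the instance is the same form applied to the dual of P.
same_form = sentence_1d == excluded_middle(dualize(p))
print(sentence_1d)
print(same_form)
```

The sketch shows only that the substitution commutes with the formation of the instance; it does not model what a prover knows about the form, which is the further question the text goes on to address.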
Much the same is true of proofs of tautologies and their dualizations. If the displayed logical forms of a finite family of sentences α1, . . . , αn are perceived to stand in certain formal-logical relationships to one another, the displayed logical forms of d(α1), . . . , d(αn), generally speaking, can be expected to perceivedly stand in the same relationships.28 A more detailed description of the general situation may make things clearer. Consider, then, the general scenario in which a primary development of a proof π of θ is based on the following knowledge:
(A) (i) knowledge of a certain syntactically displayable logical form Fc, and (ii) knowledge that Fc is displayed by θ,
(B) (i) knowledge of certain syntactically displayable logical forms Fa1, . . . , Fan and (ii) knowledge that these are severally the forms displayed by the axioms in π and, finally,
(C) knowledge that a set of sentences of LT which severally have the forms Fa1, . . . , Fan logically imply any sentence that has the form Fc.
Consider further a dualizing prover who knows of her dualizing process that for all sentences σ and all finite sets of sentences Σ of LT, Σ logically implies σ only if d(Σ) logically implies d(σ).29 For such a prover, dualization would seem not to double the knowledge produced by a primary development of π.30 In particular, it could not reasonably be expected to yield new knowledge to match the knowledge described in (C). The knowledge corresponding to the (C) component under dualization would not be new knowledge. Rather, as mentioned earlier, it would be essentially the same knowledge as that represented by the (C) component of the primary proof development. Neither, for that matter, would we expect the (A)(i) and (B)(i) components to be new. What might more reasonably be seen as new are the (A)(ii) and (B)(ii) components – the knowledge that Fc is a form of d(θ) and that Fa1, . . . , Fan are forms of the axioms d(a1), . . . , d(an) that are in d(π). 
Such knowledge being new would not, however, be enough to warrant a claim that dualization of π effectively doubles the knowledge yielded by a primary development of π. Rather, for such a claim to be plausible, the (C) 28 I say ‘can be expected to’ rather than ‘will’ because it is possible that, as a matter of human perception, dualization may alter our perception of even logical form. Application of the Contentual Addition Model of duality might thus require premises concerning our perception of logical forms and what stands to influence it. 29 Here, of course, d(σ) is the dual of σ, Σ is a class of sentences of LT , and the elements of d(Σ) are the duals of the elements of Σ. 30 It seems quite plausible that a dualizing prover should have the knowledge mentioned. The usual dualization processes for projective geometry clearly preserve standard logical forms and whether or not Σ implies σ is at least normally determined by the standard logical forms of σ and the sentences in Σ. My use of “normally” here is intended to reflect the fact that, like other norms, norms concerning what is central to rational judgments of implication are at least in principle subject to change.
component of a dual proof would also have to constitute new knowledge of comparable extent to the (C) component of its primary counterpart, and this seems generally not to hold. The Contentual Addition Model of dualization thus seems incapable of sustaining the Dualization Argument. Specifically, it seems incapable of sustaining the Comparative Epistemic Gain premise when projective geometry is either viewed as an abstract science or a hypothetical science.31
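The structure of the argument in this section can be summarized schematically. The notation below is introduced only as a sketch and is not the author's own: the double turnstile is read as "logically implies", and the labels refer to the (A), (B) and (C) components of knowledge distinguished in the text.

```latex
% What the dualizing prover is assumed to know about the dualizing map d:
\[
  \text{for all finite } \Sigma \text{ and all } \sigma \text{ of } L_T:\qquad
  \Sigma \vDash \sigma \;\Longrightarrow\; d(\Sigma) \vDash d(\sigma)
\]
% Status of each component of knowledge under dualization:
\[
  \underbrace{(A)(ii),\ (B)(ii)}_{\text{might reasonably be seen as new}}
  \qquad
  \underbrace{(A)(i),\ (B)(i),\ (C)}_{\text{not expected to be new}}
\]
```

Since the (C) component is the knowledge of logical implication itself, its failing to be new is what blocks the claim that dualization doubles the knowledge yielded by the primary development.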
31 Historically, the hypothetical and abstract conceptions seem often to have been combined.
7 Non-Trivial Axiom Systems
The above argument having been given, it is important to note that both Veblen and Young [32] (cf. p. 2) and Whitehead [34] (cf. p. 2) expressed general concerns regarding what they termed the possible “triviality” of abstract axiomatic systems. They described this triviality as consisting in the absence of “concrete system[s] of things which may be regarded as satisfying” [32] the axioms of an abstract axiomatic system. The standard usage of ‘concrete’ in mathematics in the late nineteenth and early twentieth centuries was intended to express a contrast with the meaning of ‘abstract’ in ‘abstract science’. Accordingly, a concrete model of an abstract science was taken to be a system of propositions – as distinct from propositional functions or proposition-schemata – obtainable from an abstract science by substituting particular propositions for its propositiono-schematic axioms. Some, however, have suggested that non-triviality requires more than mere existence of a model. They have suggested such additional requirements as the “practical importance” (cf. [17, §16]; [32, p. 2]) of a nontrivializing interpretation. Huntington, for example, described possession of an “interesting concrete interpretation” [16, p. 4, fn ∗] as a condition of an abstract science’s significance or study-worthiness. It has thus been fairly common to see the existence of a model as necessary though not in itself sufficient for non-triviality. The relevant meanings of such terms as “concrete”, “interesting” and “practical importance” are not altogether clear of course.32 Let us suppose, however, at least for the sake of argument, that a “concrete” and/or
32 Nor did mathematical writers do much to clarify their meanings. 
The philosopher Josiah Royce offered a more substantive description making use of a contrast he found compelling between mathematical science and games such as chess. He wrote (cf. [26, pp. 451-452]): “The exactly stated ideal hypotheses whose consequences the mathematician develops must possess, as is sometimes said, sufficient intrinsic importance to be worthy of scientific treatment. They must not be trivial hypotheses. The mathematician is not, like the solver of chess problems, merely displaying his skill in dealing with the arbitrary fictions of an ideal game. His truth is, indeed, ideal;
“interesting” and/or “practically important” model can always be found for a set of projective axioms.33 Under such an assumption, is the Comparative Epistemic Gain premise plausible? Not clearly so, it seems. Even given an interpretation of an abstract theory under which its axioms may be evident, there is no guarantee that dualization will essentially double the extent of one’s knowledge, or otherwise substantially increase it. For one thing, the details of the particular dualization may matter. Suppose, for example, that the system in question is one in which each axiom has a dual that is also an axiom of the system. Some, but not all, dualizations are of this axiomatically dual type. An interpretation of an axiomatically dual system that makes its axioms evident, interesting or practically important will generally confer as much of these qualities on dual proofs as it confers on their primary counterparts. For dualizable systems that are not axiomatically dual, there seems to be no similar guarantee. Being evident, interesting or practically important are not properties that dualization may generally be expected to preserve. Finding an evident, interesting or practically important interpretation of a non-axiomatically dual dualizable system cannot therefore generally be expected to sustain epistemic doubling (or other substantial epistemic increase) under dualization. In addition to this, there is the question of what proportion of the knowledge represented by a primary proof is constituted by knowledge of logical implication. If the proportion is relatively high, the factor by which dualization can be expected to increase or extend the knowledge represented by a primary proof will be relatively low. The reason, as stated in section 6, is that the knowledge of logical implication represented by a dual proof cannot generally be counted on to be genuinely new knowledge. 
If the above reasoning is correct, requiring of a dualizable system that it have not merely a model, but an evident, interesting or practically important model does not in itself do much to improve the plausibility of the Comparative Epistemic Gain premise.
his world is, indeed, treated by his science as if this world were the creation of his postulates a ‘freie Schöpfung.’ But he does not thus create for mere sport. On the contrary, he reports a significant order of truth. As a fact, the ideal systems of the pure mathematician are customarily defined with an obvious, even though often highly abstract and remote, relation to the structure of our ordinary empirical world. Thus the various algebras which have been actually developed have, in the main, definite relations to the structure of the space world of our physical experience. The different systems of ideal geometry, even in all their ideality, still cluster, so to speak, about the suggestions which our daily experience of space and of matter give us.”
33 Alternatively, let us suppose the Dualization Argument to be restricted to systems for which this is true.
8 Conclusion
If the preceding discussion is correct, the question of whether and to what extent dualization may be capable of extending our knowledge in an especially efficient way is complex and difficult. Perhaps the central concern is how rightly to conceive quantity or extent of knowledge. If extent of knowledge is understood contentually (i.e. as consisting in the extent of its contents), we need better ways of estimating, comparing and measuring it. How significant dualization may be as a means of increasing the contentual extent of our knowledge also depends on questions concerning the nature of the theories being dualized. In particular, it depends on whether theories are taken to be particular34 (i.e. their axioms are taken to be particular propositions) or abstract (i.e. their axioms are taken to be propositional functions). It depends as well on questions concerning the epistemic aim of proof within a dualized theory. In particular, it depends on whether the aim of proof is taken to be (i) the justification of the theorems proved, or (ii) the discovery of relationships of logical implication which obtain between the axioms of a theory and other propositions expressible in its language.
Among questions we have not thus far properly attended to are the demands the Contentual Addition Model places on dualized theories and how these demands compare to the traditional requirements placed on axiomatic theories. This goes particularly for the traditional consistency requirement. For Contentual Addition to be possible, more seems to be required than mere consistency. The dualized theory must at the very least have a model, and it seems, in fact, that it must have a model that is known in an appropriate way to be a model.
This last remark calls for elaboration. It was not uncommon for earlier writers to identify consistency with model-existence. Such identification was perhaps particularly common when it came to matters of practical proof. 
The (or at least a) common belief was that, practically speaking, the only way to prove the consistency of a theory was to provide a model for it.35 Definitionally, the two were often enough distinguished, even if the significance of their differences may not generally have been clear. It was also common practice to characterize the consistency of a set of axioms in terms of what was logically deducible from it. Hilbert thus described the consistency of the five groups of axioms in his Geometrie as consisting 34 Or ‘concrete’ to employ a more commonly used term. 35 Cf. [12, §95]; [13, §143]; [6, p. 530]; [7, p. 629]; [8, p. 77]; [32, p. 3]; [37, pp. 43-44]; [21, pp. 10-11, 14].
in the fact that they “do not contradict one another, i.e. it is not possible, through logical inference, to derive (abzuleiten) from them a fact (Thatsache) which contradicts any of the axioms” [15, p. 19].36,37 Despite common acceptance of a difference in meaning between consistency and model-existence, then, there was no similarly common belief in their practical difference. There was virtually universal agreement that model-existence implies consistency and therefore that proofs of model-existence are in effect proofs of consistency. Some, in fact, described this as a basic dictate of logic. Whitehead, for example, saw it as following from the Law of Contradiction. A set of axioms must be consistent, that is to say, it must not be possible to deduce the contradictory of any axiom from the other axioms. According to the logical ‘Law of Contradiction’, a set of entities cannot satisfy inconsistent axioms. Thus the existence theorem for a set of axioms proves their consistency. Seemingly this is the only possible method of proof of consistency. [34, p. 3]
That model-existence implies consistency was thus a widely held belief.38 So, too, as noted a couple of paragraphs above, was the view expressed in the last sentence of Whitehead’s remark. There seems to have been no comparably widespread belief that consistency implies model-existence. These were, of course, the decades just before Gödel’s proof of his completeness theorem – the major result of recent times (and probably ever) addressing the connection between deducibility concepts of consistency and model-existence. Still less common, perhaps, was the idea that there might be a practical way to prove consistency that didn’t involve construction of a model. Here, of course, Hilbert was an important exception.39 36 For similar characterizations of consistency in terms of what is deducible from a set of propositions see [34, p. 3]; [18, p. 13]; [21, pp. 10-11]. There were others who, at least sometimes, defined consistency not as non-contradictoriness of deducible consequences, but as possession of a model (cf. [32, p. 3]; [17, p. 165]). 37 In one respect this characterization of consistency is curious. It identifies contradiction with the axioms as that which is to be avoided. In truth, though, we should want to avoid not only contradiction between a theorem and an axiom, but contradiction between any two theorems. This same curiously narrow characterization of consistency was adopted by other writers too. 38 It was not universal, however. The American philosopher Paul Weiss asked: “Is it possible that the only way we can determine whether a set is consistent is by seeing all the postulates actually exemplified in some one object?” [33, p. 468]. “If so,”, he answered, “we must arbitrarily assume that the object is self-consistent, so that the proof of consistency must ultimately rest on a dogma.” [ibid.]. 39 Whitehead alluded to another class of exceptions as well. 
“Some mathematicians solve the difficult problem of existence theorems by assuming the converse relation between existence theorems and consistency, namely that, if a set of axioms are consistent, there exists a set of entities satisfying them. Then consistency can only be guaranteed by a direct appeal to intuition, and by the fact that no contradiction has hitherto been deduced from the axioms. Such a procedure in the deduction of existence theorems seems to be founded on a rash reliance on a particular philosophical doctrine respecting the creative activity of the mind.” [34, pp. 3-4].
Even had Gödel’s completeness theorem been known at the time, it would not have suggested an alternative to model-construction as a practical means of proving consistency. For first-order axioms, it showed that consistency implies model-existence. It did not, however, provide or suggest an alternative to model-construction as a means of proving consistency. Neither, for that matter, does it provide or suggest actual models for given consistent sets of first-order axioms. Consistency and model-existence are thus theoretically and practically distinct and they were generally taken to be so by nineteenth and twentieth century foundational writers who prized duality. It is to me, then, a little surprising that these same writers did not more carefully distinguish consistency and model-existence as constraints on the adequacy of geometrical axiom-systems. The Contentual Addition Model of duality requires not only that a model of the dualizable axioms exist, but that it be known. Those, therefore, who take dualization to be among the cardinal virtues of projective geometry, and who see Contentual Addition (or something like it) as the basis for this virtue, cannot settle for consistency as the basic constraint on axiom systems. They require not merely consistency or even existence of a model. Rather, at the very least, they require the existence of a known model. A direct proof of consistency of the type envisioned by Hilbert for the axioms of arithmetic would not give them this. Consequently, it would not give them what they need – namely, a means of sustaining Contentual Addition by dualization. Only construction of a model would suffice, and for theoretical reasons as well as for such practical reasons as there may be.
Acknowledgments
It is a pleasure to acknowledge the generous financial support of the TransCoop Programme of the Alexander von Humboldt Stiftung and the Agence nationale de la recherche (ANR) of France under their chaires d’excellence programme. It is also a pleasure to thank the members of the Imaginary and Ideal Elements and Limit Concepts in Mathematics TransCoop project, the Ideals of Proof (IP) research group, the philosophy department and logic group at the University of Notre Dame, the HPS and SPHERE groups at the Université de Paris 7–Diderot, the Philosophy Department and the Archives Henri Poincaré at the Université de Lorraine, the philosophy department and logic and HPS groups at the École Normale Supérieure and the groups attached to the past and current chairs in the
philosophy of language and epistemology at the Collège de France. These groups supported my students and me in a variety of ways and provided a welcoming and stimulating environment. Among individuals, special thanks are due to Paddy Blanchette, Henk Bos, James Cargile, Martin Carrier, Marcus Giaquinto, Jeremy Gray, Tim McCarthy, Colin McLarty, Michael Potter, Greg Restall, Peter Schröder-Heister, Göran Sundholm and Jean-Jacques Szczeciniarz for useful discussion of various points.
References
[1] F. Bacon. The two bookes of Francis Bacon. Of the proficience and advancement of learning, divine and humane. Printed for Henrie Tomes, London, 1605.
[2] J. Becker and D. Gottlieb. A History of Duality in Algebraic Topology. In I. M. James (ed.), History of Topology, pp. 725–745. North-Holland, Amsterdam, 1999.
[3] P. Bernays. Hilbert, David. In Borchert (ed.), Encyclopedia of Philosophy, volume IV. MacMillan Reference USA, Detroit, 2006.
[4] B. Bolzano. Preface to Considerations on some Objects of Elementary Geometry. Reprinted in [11], volume 1. Page references are to this reprinting.
[5] L. E. J. Brouwer. Über die Bedeutung des Satzes vom ausgeschlossenen Dritten in der Mathematik, insbesondere in der Funktionentheorie. Journal für die reine und angewandte Mathematik, 154:1–7, 1925. English translation in [30], 334-345. Page references are to this translation.
[6] H. C. Brown. Review of [24]. Journal of Philosophy, Psychology and Scientific Methods, 3:530–531, 1906.
[7] H. C. Brown. Infinity and the Generalization of the Concept of Number. Journal of Philosophy, Psychology and Scientific Methods, 5:628–634, 1908.
[8] J. Coolidge. The Elements of Non-Euclidean Geometry. Clarendon Press, Oxford, 1909.
[9] H. S. M. Coxeter. Introduction to Geometry. Wiley, New York, 2nd edition, 1969.
[10] L. Dowling. Projective Geometry. McGraw-Hill Book Co., New York, 1917.
[11] W. Ewald. From Kant to Hilbert: A Source Book in the Foundations of Mathematics, two volumes. Oxford University Press, Oxford, 1996.
[12] G. Frege. Die Grundlagen der Arithmetik. Eine logisch mathematische Untersuchung über den Begriff der Zahl. W. Koebner, Breslau, 1884.
[13] G. Frege. Grundgesetze der Arithmetik, Begriffsschriftlich abgeleitet II. H. Pohle, Jena, 1903.
[14] J. D. Gergonne. Géométrie de Situation. Annales de Mathematique, 18:149–154, 1827-28.
[15] D. Hilbert. Grundlagen der Geometrie. Teubner, Leipzig, 1899.
[16] E. Huntington. The fundamental laws of addition and multiplication in elementary algebra. Annals of Mathematics, 8:1–44, 1906.
[17] E. Huntington. The fundamental propositions of algebra. In J. W. A. Young (ed.), Monographs on topics of modern mathematics relevant to the elementary field, pp. 151–210. Longmans, Green, and Co., London, 1911.
[18] E. Huntington. The Continuum and other types of serial order. Harvard University Press, Cambridge, MA, 2nd edition, 1917.
[19] G. Ling, G. Wentworth, and D. Smith. Elements of Projective Geometry. Ginn & Co., New York, 1922.
[20] G. Mathews. Projective Geometry. Longmans, Green & Co., New York, 1914.
[21] C. O’Hara and D. Ward. Introduction to Projective Geometry. Oxford University Press, Oxford, 1937.
[22] M. Pasch. Vorlesungen über Neuere Geometrie. Teubner, Leipzig, 1882.
[23] A. Pickford. Elementary projective geometry. Cambridge University Press, Cambridge, 1909.
[24] M. Pieri. Sur la compatibilité des axiomes de l’arithmétique. Revue de Métaphysique et de Morale, 14:196–207, 1906.
[25] T. Reye. Lectures on the Geometry of Position, Part I. The Macmillan Co., New York, 1898.
[26] J. Royce. The Sciences of the Ideal. Science, 20(510):449–462, 1904.
[27] N. Saunderson. The elements of algebra. Cambridge University Press, Cambridge, 1740.
[28] E. H. Smart. A First Course in Projective Geometry. Macmillan & Co., London, 1913.
[29] E. Specker. Dualität. Dialectica, 12:451–465, 1958.
[30] J. van Heijenoort. From Frege to Gödel: A source book in mathematical logic 1879–1931. Harvard University Press, Cambridge, 1967.
[31] O. Veblen and J. W. Young. A set of Assumptions for Projective Geometry. American Journal of Mathematics, 30:347–380, 1908.
[32] O. Veblen and J. W. Young. Projective Geometry, volume I. Ginn & Co., Boston, 1910.
[33] P. Weiss. The Nature of Systems. II. The Monist, 39(3):440–472, 1929.
[34] A. N. Whitehead. The Axioms of Projective Geometry, Cambridge Tracts in Mathematics and Mathematical Physics. Cambridge University Press, London, 1906.
[35] H. Wiener. Über Grundlagen und Aufbau der Geometrie. Jahresbericht der Deutschen Mathematiker-Vereinigung, 1:45–48, 1892.
[36] J. W. Young. Projective Geometry, Mathematical Association of America. Open Court Publishing Co., Chicago, 1930.
[37] J. W. Young, W. W. Denton, and U. G. Mitchell. Lectures on Fundamental Concepts of Algebra and Geometry. Macmillan Co., New York, 1911.
Frege on Quantities and Real Numbers in Consideration of the Theories of Cantor, Russell and Others1
Matthias Schirn
The core of this essay is a detailed account of Frege’s theory of real numbers in the second volume of his opus magnum Grundgesetze der Arithmetik [21]. I begin with introductory comments on Frege’s standpoint vis-à-vis the conception of analysis by some of his contemporaries and remarks about his platonism. In section 2, I first take a look at Frege’s theory of quantity in his Habilitationsschrift Rechnungsmethoden, die sich auf eine Erweiterung des Größenbegriffes gründen [16]. I deal then with some critical observations that Frege makes in Die Grundlagen der Arithmetik [18] with respect to Hankel and Newton’s treatment of the concept of quantity and make a few remarks on Frege’s review of H. Cohen’s book Das Prinzip der Infinitesimal-Methode und seine Geschichte [7], in which Cohen comments on the Kantian distinction between extensive and intensive magnitudes. In section 3, I describe the essential features of Cantor’s theory of irrational numbers and examine the main points of the critique deployed by Frege. In section 4, I take a look at Russell’s theory of real numbers in [39] as well as in [56]. Section 5 (5.1–5.4) is devoted to a detailed reconstruction of Frege’s conception of the notion of quantity and his theory of real numbers with an eye to both the sketchy informal and the meticulous formal account as far as it goes in [21]. Special emphasis is placed on the considerations that led him to set up the definitions of the concepts positival class and positive class and the problem of proving the mutual independence of the clauses that make up the definition of the former concept which is only preparatory to the latter. Due to Russell’s paradox, Frege’s logical foundation of analysis remained a fragment. Section 5.2 is an interlude in which I deal briefly with the concept of quantity in the work of Euclid, Aristotle and Euler. 
In the final section 6, I give a brief account of von Kutschera’s proposal of how Frege might have carried on with the logical construction of analysis in a projected third volume of Grundgesetze, had he not been shocked by Russell’s paradox.
1 I dedicate this essay to Christian Thiel on the occasion of his 75th birthday.
1 Introduction: the targets of Frege’s critique in Grundgesetze (vol. II) and a question concerning his Platonism
The method of introducing the real numbers proposed by Frege in the second volume of Grundgesetze der Arithmetik [21] lies between the traditional geometrical approach and the theories developed by Cantor, Weierstraß, and Dedekind. The latter purport to be purely arithmetical – hence the label “the arithmetization of analysis”. From the geometrical approach Frege adopts the characterization of the real numbers as ratios of quantities or, as he also says, as measurement numbers (Maßzahlen). And taking up a key idea of his fellow mathematicians, he detaches the real numbers from all special kinds or types of quantity. The rationale for doing this, so we are told, is that the application of the real numbers is not restricted to any special types of quantity, but rather relates to the domain of the measurable, which embraces all types of quantity whatsoever. On the face of it, this seems to be largely in the spirit of Frege’s logicism which he had laid out informally, and by paying much attention to its philosophical underpinnings, in Die Grundlagen der Arithmetik [18]. In this splendid work as well as in the short essay ‘Formale Theorien der Arithmetik’ [19], Frege argued with great cogency that his logicist project rests crucially on the insight that, if arithmetic is to be regarded as a branch of logic, both the application of the numbers and the laws governing them must exhibit the most salient feature of logic, which is utmost generality. At that time (and certainly for several years to come), Frege was deeply convinced that arithmetic meets the logicist requirement of unrestricted generality and, moreover, enjoys the likewise distinguished status of possessing unmatched objectivity: “There is nothing more objective than the laws of arithmetic” [18, § 105].
Frege’s way of discussing the foundations of analysis in [21] bears a striking methodological similarity to his treatment of number theory in [18]. In neither [18] nor [21] does he begin by propounding his own theory; instead, he launches a vigorous attack on rival theories. In [21], the main targets are Heine and Thomae’s radical version of formalism (Frege calls it game formalism), Cantor’s theory of real numbers as well as Weierstraß’s view of the natural numbers. Any reader of this volume who is expecting a thorough examination of the theory of irrational numbers “of such a distinguished mathematician as Weierstraß” [21, § 148] is bound to be disappointed. Frege takes the easy route. He basically confines himself to making critical remarks, spiced with plenty of irony, about Weierstraß’s treatment of the natural numbers2 and eventually tries to convince us that, due to its shaky foundations, Weierstraß’s theory of irrational numbers need not be examined in greater detail. Likewise, Frege pays comparatively little attention to Dedekind’s theory of real numbers, although he praises his sharp distinction between sign and reference [Bedeutung] and the view, disavowed by the formalists, that numbers are what numerical signs refer to and not those signs themselves.3 However, endorsing arithmetical platonism himself, as I think that he does, Frege naturally finds fault with Dedekind’s creation of new mathematical objects by abstraction.4 In the current literature, it is not undisputed that Frege was a fully fledged platonist. Yet putting his platonism in the right perspective is, to my mind, of considerable importance for appropriately assessing his overall philosophy of arithmetic, including his foundational approach to analysis. Thus, some clarifying words about Frege’s platonism may be in order here. I hold that his logicism goes hand in hand with his endorsement of an arithmetical version of ontological platonism. Frege is convinced that all numbers are logical objects which exist independently of human minds. In particular, his ontological platonism in the period 1893–1902 is meant to apply to logical objects of a fundamental and irreducible kind, namely to courses-of-values of functions. According to his logicist manifesto, all numbers are to be identified with logical objects of this prototype. The common view that Frege was a realist with respect to logical objects has been challenged unsuccessfully, I think, by several Frege scholars such as Sluga, Currie, Resnik, and others. The arguments which they advance I take to be far-fetched, or awkward, or both. Perhaps the clearest expression of Frege’s arithmetical platonism can be encountered in the context of his repudiation of Hankel’s formal arithmetic in [18].
2 All translations from the work of Frege, Hankel and Euler into English are my own. In a very few cases of Frege’s work, I have only modified and corrected the existing translations. As far as I can tell, most of the passages that I translated from [21] have so far not been published in English translation. The translation of [20, 21] by P. Ebert and M. Rossberg is forthcoming from Oxford University Press.
3 Peter Simons, in his stimulating essay ‘Frege’s Theory of Real Numbers’ [52, p. 359], contends that Frege “brings perceptive criticisms of then current theories of reals, among others those of Cantor, Dedekind and Weierstrass, which are not without contemporary relevance”. However, as far as these three mathematicians are concerned, this claim holds at most for Cantor’s theory, but even in that case it must be relativized; see my assessment of Frege’s critique of Cantor’s theory of the reals in section 3.
4 See [9, § 4] and [10, § 6]. Abstraction à la Dedekind (which he characteristically weds to structure) differs significantly from Fregean abstraction. The latter consists in the transformation of a given equivalence relation into an identity between abstract objects. Note that Frege does not speak of abstract objects on his own account, but rather of non-real objects (nicht-wirklichen Gegenständen) when he deals with those objects which we would call abstract today, for example, the axis of the Earth, the equator, the centre of mass of the solar system (cf. [18, p. 35]).
There he says that even the mathematician cannot create something arbitrarily, any more than the geographer; “he too can only discover what is there and name
it” [18, pp. 107 f.]. In the Preface to [20] (p. XIII), Frege argues exactly in the same vein. Elsewhere [50] I have pointed out that in a passage in [21], where Frege considers the issue of how we have cognitive access to the objects of arithmetic, he does not give a clear-cut answer to the question whether the step of logical abstraction from right to left in Axiom V could – reasonably and acceptably – be called a creation. At the same time, I deliberately refrained from speculating about the reason(s) that might have motivated Frege (a) to ask this question at all and (b) to desist from giving a straightforward answer to it. Nonetheless, three points seem clear to me. First, there is ample evidence that in [21] Frege’s platonism did not undergo any significant change, in spite of (a). Second, by his own lights, Frege should never have conceded that discussing the question of whether his introduction of courses-of-values by way of logical abstraction via Axiom V can be called a creation, may easily degenerate to a quarrel over words. To my mind, he should have avoided raising this issue at all in § 146 of [21] – in fact, there was no recognizable need to do so – instead of backing himself into a corner by leaving it undecided. Plainly, once the issue was brought up, Frege should have given a definite answer, not an evasive one. In the light of the available evidence about the status and the role that he assigns to Axiom V in his logical system, a coherent answer would have been one along these lines: Axiom V is designed to function as the appropriate means of coming into epistemic contact with courses-of-values, of grasping them; it is not intended to call them into being. Like any explicit definition that meets Pascal’s classical requirements of eliminability and non-creativity (cf. [38, pp. 356 f.]) it would be powerless to achieve this anyway. 
It is true that, unlike proper definitions which are immediately turned into epistemically trivial assertoric sentences once the definiendum has been defined, Frege considers axioms to contain real knowledge.5
5 He does not justify this view. It is, for example, difficult to see how Frege could convince us that the axiom ⊢ a → a (cf. [20, § 18]) possesses genuine epistemic value; see [48] for a discussion of this and related issues.
Nonetheless, nowhere does he unambiguously claim that they have any creative potential. When he raises the epistemological key question “How do we grasp logical objects?”, he presupposes that they exist prior to their apprehension. The answer to the question, though not exactly in Frege’s words, is of course as follows: they are grasped by means of logical abstraction. Third, in the light of his undisguised fondness for ontological platonism, Frege would have been well-advised to refrain from conceding that one might perhaps call the procedure of logical abstraction a creation, if creation is meant in a rigid sense, implying a barrier to its executability. For even if a creation of logical objects (if it is possible at all) were to proceed in a regulated, non-arbitrary fashion and thus within sharp boundaries,
it would nevertheless be a creation and as such clash with the platonist aspirations that Frege manifests in several places of his writings, not least in Grundgesetze. In other words: prohibiting or condemning any arbitrary and boundless creation of mathematical objects and, in the same breath, licensing in certain cases a creation of such objects, if the mode of carrying it out and its admissibility are established once and for all (cf. [21, p. 149]), marks a position that Frege could not consistently maintain, quite apart from the fact that he fails to spell out what “admissibility” is to mean here precisely and how it could be established. In short, he could not have accepted any creation of mathematical or logical objects, no matter how it were performed. (Concerning Frege’s attack on the practice of bringing numbers into existence by means of definition, which was apparently widespread among his fellow mathematicians, see also [51, pp. 156 ff.].) In my view, another issue raised in § 147 would require clarification. When in § 147 Frege asks whether “our procedure can be called a creation” and responds by saying that this question may easily degenerate to a quarrel over words, it is not absolutely clear what he means by “our procedure”. Considering the entire context of his remarks, I presume that he appeals to the step of logical abstraction inherent in Axiom V, that is, to the transformation of the generality of an equality into a course-of-values equality. Yet I do not wish to vouch for this option. (Frege explains that in carrying out this transformation we acknowledge something in common to the two functions – namely their course-of-values; this is how he characterizes the move of abstraction in Axiom V in [21, § 146]; cf. also [24, p. 198]). 
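For orientation, the transformation at issue can be displayed in a modernized rendering of Axiom V. The schematic letters $f$, $g$ and $a$, the linear quantifier notation, and the use of $\acute{\varepsilon}$ to stand in for Frege's smooth-breathing mark are notational assumptions of this sketch, not Frege's own two-dimensional script.

```latex
% Axiom V, modernized: a course-of-values identity (left) is equated with
% the generality of an equality between the function values (right).
\[
  \bigl(\acute{\varepsilon} f(\varepsilon) = \acute{\alpha} g(\alpha)\bigr)
  \;=\;
  \bigl(\forall a\,(f(a) = g(a))\bigr)
\]
```

Read from right to left, the equation passes from the generality of an equality to a course-of-values equality, which is the step of logical abstraction under discussion.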
It is true that in the third passage and especially in the first half of the fourth and concluding passage of § 147 Frege focuses entirely on the alleged virtues of Axiom V: (1) that it is the appropriate cognitive means of grasping logical objects, if there are such objects at all; (2) that it is scientifically indispensable or, more specifically, that without it a scientific justification of arithmetic would be impossible; (3) that it serves the same ends that other mathematicians seek to attain by creating new numbers. (Concerning (3), Frege had already made it clear that the transformation in Axiom V differs fundamentally from the unregulated and arbitrary creation of numbers by other mathematicians.) He goes on to say: “We thus hope to be able to develop the whole wealth of objects and functions dealt with in mathematics out of the functions whose names are listed in I, § 31, as from a seed. Can our procedure be called a creation?” I find this transition irritating, especially since Frege spares himself the trouble of explaining it to his readers. In particular, I fail to see why and how Frege’s hope of being able to develop all the objects and functions dealt with in mathematics out of the primitive functions of the system of Grundgesetze should derive from the virtues that he claims for Axiom V. Admittedly, one might perhaps say that courses-of-values are developed out of the primitive course-of-values function ἐϕ(ε) via Axiom V. And in a sense, Axiom V is designed to “yield” (not to create) all objects dealt with in arithmetic. Recall that according to Frege’s logicist credo all numbers are to be defined as or identified with courses-of-values. Nonetheless, at least the way the functions that occur in arithmetic sprout from the seed of the primitive, logically simple functions is a matter quite distinct from logical abstraction inherent in Axiom V.
In any event, at this point of Frege’s exposition it seems that we cannot definitely rule out that with “our procedure” he intends to refer quite generally to the development of the objects and functions dealt with in arithmetic out of the primitive functions and not exclusively to the transformation as represented by Axiom V. However, the content of the last sentence of § 147, vague as it is, appears to speak again in favour of my presumption that with the use of the phrase “our procedure” a few sentences earlier Frege intends to refer only to Axiom V. In this sentence (“And with this, all the difficulties and doubts [concerns] that otherwise call into question the logical possibility of creation disappear, and we may hope that with our courses-of-values we achieve everything that has been missed by following those other paths”), he mentions explicitly courses-of-values and thus appeals implicitly also to Axiom V. Be this as it may, the question as to how the development of the objects and functions treated of in mathematics out of the primitive functions is to proceed is passed over in silence by Frege. This is unfortunate because he missed the chance of dispelling any remaining doubt about what he meant by “our procedure” in the relevant context. As to the development of objects and functions out of the primitive functions, I conjecture that what Frege had in mind was the construction of logically complex function-names and object-names by iterated application of the formation rules of his system, which are “gap formation” and “insertion”.
In this way, he does indeed obtain special functions – for example, the relation of an object falling within the extension of a concept, the single-valuedness of a relation, the following [succession] of an object after an object in the series of a relation – and likewise special objects – for example, equivalence classes of equinumerosity, extensions of relations (= Relations), Relations of Relations – that are required for laying the foundations of arithmetic, and he is able to define them via constructive definitions. See the table of definitions in [20, pp. 240 f.]; see also [21], for example, §§ 167, 173, 175, 193.
Admittedly, those who still wish to raise doubts about Frege’s platonism in the period 1893–1903 might think they have an easy task by drawing attention to Frege’s apparently insouciant stipulations when he comes to laying out his formal system. What I have in mind is first and foremost his stipulation at the end of § 10 in [20]: the True and the False are identified with their own unit classes in order to remove, in a first essential step, the referential indeterminacy of course-of-values terms, arising from Frege’s metalinguistic stipulation concerning the informal analogue of the name of the course-of-values function “ἐϕ(ε)” in § 3, later to be enshrined in the formal version of Axiom V.6
6 The stipulation in § 3 reads as follows: “I use the words ‘the function Φ(ξ) has the same course-of-values as the function Ψ(ξ)’ generally as coreferential [gleichbedeutend] with the words ‘the functions Φ(ξ) and Ψ(ξ) always have the same value for the same argument’.” Axiom V appears for the first time at the end of § 20, clad in formal garb. – Above I wrote deliberately “in a first essential step”. In § 10, Frege proposes to achieve a more exact specification of courses-of-values, that is, to remove the referential indeterminacy of course-of-values terms, by determining for every primitive first-level function, when introducing it, which values it receives for courses-of-values as arguments, just as for all other arguments. At the stage of § 10, the proposed procedure boils down to determining the values of ξ = ζ for courses-of-values and the two truth-values as arguments. In §§ 11-12, Frege introduces the last two primitive first-level function-names of his system, the description operator and the conditional sign, by determining the values of the corresponding functions for courses-of-values as arguments, and for all other arguments. Note that neither function is completely reducible to a primitive first-level function that has already been elucidated. Although Frege passes the issue over in conspicuous silence, it could seem that with these two additional stipulations the piecemeal process of fixing completely the reference of the name of the second-level course-of-values function ἐϕ(ε) has come to an end for him. Note that the determination of the values of ξ = ζ for courses-of-values and the two truth-values as arguments plays a key role in Frege’s method of fixing completely the references of courses-of-values terms. This applies even independently of the fact that for negation, the determination of the function-values for the two truth-values and all other arguments (of type 1) proves to be unnecessary and the horizontal function (which is a concept under which only the True falls) is reducible to ξ = ζ; plainly, this concept is co-extensive with ξ = (ξ = ξ). – By the way, if for Frege a sound elucidation of the primitive function-name “ἐϕ(ε)” had proved to be feasible, that is, one which did not rest on a presupposed acquaintance with courses-of-values, then he could have defined the predicate “a is a course-of-values” (“CV(a)”), modelled on his definition of “n is a cardinal number” in § 72 of [18]: CV(a) := ∃ϕ (ἐϕ(ε) = a). (As far as I can see, this was first noted in [41] and [44].) Equipped with this definition, which, let us suppose, satisfies Frege’s principle of completeness, he would have been in a position to decide, in principle, for every given object a whether or not it is a course-of-values. If a is a course-of-values and is given to us as such, Axiom V would tell us whether a is identical with a course-of-values b referred to by a canonical course-of-values name. Unfortunately, the prospects for devising an impeccable elucidation of “ἐϕ(ε)” were not encouraging for Frege.
The sceptic might object that the identification just mentioned flies in the face of ontological platonism. According to Frege’s alleged platonist stance – so he or she might argue – it should be an objective fact whether, say, the True is a course-of-values or not, and if it is one, which one it is. From the point of view of the platonist, this has to be fixed once and for all in the mind-independent universe of logical objects and, hence, can never be a matter of arbitrary stipulation. On the one hand, I do not think that we are entitled to claim, by appealing to Frege’s practice of making certain stipulations in the course of constructing his mature logical theory, that he was not a platonist, at least not during the period of Grundgesetze. On the other hand, I do not wish to deny that there is indeed a tension between Frege’s platonism and certain stipulations that he makes in [20]. To be sure, we have no evidence that he was fully aware of this conflict; nor do we know whether he thought that he could lightly pass over it, insisting that he was at liberty to make certain stipulations – consistent with the set of assumptions underlying the theory – in order to secure a unique reference (Bedeutung) for every well-formed expression of his formal language. Before I turn to Frege’s view vis-à-vis Cantor’s approach to analysis, let me briefly illustrate the tension that I mentioned by considering just one
special aspect. On the face of it, it seems consistent for Frege (1) to dismiss as indefensible the general proposal, made in the second footnote to [20, § 10], of identifying with their unit classes all and only those objects which are not given to us as courses-of-values (that is, which are not referred to by canonical course-of-values terms) and yet (2) to allow certain particular identifications which the general proposal, if accepted, would also license. On closer reflection, however, this is less clear. The identification of the True and the False with their unit classes is, from Frege’s point of view, indeed consistent with Axiom V, as is established by his “permutation argument” in § 10. Yet following his line of thought in the second footnote, it seems that, before we make this stipulation, we are bound to rule out that the True and the False are courses-of-values or classes containing more than one object. For according to the argument presented there, the fact that an object is not given to us as a course-of-values does not imply that it is not one. In particular, we have no guarantee that such an object is not a course-of-values distinct from its unit class. But why should this argument not apply to Frege’s favourite logical object, referred to by “∀x (x = x)”, for example? And if it does, how can Frege then legitimately identify the True with its unit class? So much for Frege’s platonism which in my view overarches his entire philosophy of arithmetic. It is for this reason that in the present introduction I tried to shed some new light on it.7 Let us return to Frege’s critique of rival theories of real numbers. His discussion of Cantor’s theory of irrational numbers appears to be a trifle less polemical than both his crusade against the formalists and his attempt to make fun of and pull to pieces Weierstraß’s theory of the natural numbers.
The discussion consists in large part in demonstrating that Cantor offends against two principles of correct explicit definitions that Frege lays down in [20] and considers at length in [21, §§ 56-67]: the principle of completeness and the principle of simplicity (of the definiendum).8 As to the first principle, he confines himself to considering the case of first-level concepts and first-level relations. The principle then states that a definition of a concept must uniquely determine, with respect to any object, whether or not it falls under the concept. Similarly, a definition of a dyadic relation must unambiguously determine, with respect to any one object and any other object, whether or not the one stands in that relation to the other. The principle of simplicity states that the sign or name defined may not be composed of any
7 My motivation to take a closer look at the concluding passage of [21, § 147] from the point of view of Frege’s platonism derives from the talk ‘“The discussion of this question can easily degenerate into a quarrel about words”: Platonism in Frege’s Grundgesetze?’ that Marcus Rossberg and Philip Ebert delivered in May 2011 in a conference on Frege’s philosophy of mathematics in Bucharest (organized by Sorin Costreie), and especially from our subsequent discussion.
8 In [20, § 33], Frege states seven principles that he considers to be relevant for definitions.
Frege on Quantities and Real Numbers
familiar signs that are yet to be defined.9 To all appearances, it was not Frege’s primary concern to comment on the very substance of Cantor’s theory of real numbers. His résumé that this theory in no way reaches its aim seems, apart from a few sound but comparatively minor objections, strongly exaggerated.10 In my eyes, the momentousness of the theories of real numbers developed by Cantor, Weierstraß and Dedekind is beyond doubt. As a matter of fact, these theories had a decisive impact on later approaches to the foundations of analysis. It is mainly for this reason that they deserve to be called “classical”.
Frege distinguishes between three notions of the essence of cardinal numbers in the writings of Weierstraß, all of which he dismisses as untenable.
(1) A number is an aggregate of concrete things. “If you roused a man, who had never contemplated the matter, from his sleep with the question, ‘What is a number?’, he would likely put forth, in his initial state of perplexity, expressions similar to those of Weierstraß: ‘set’, ‘mass’, ‘series of things’, ‘object consisting of homogeneous parts’, etc. [. . . ] Both of the possible major errors have thereby been committed. The first consists in confusing the number with its bearer or substrate [. . . ] The second lies in the fact that neither the concept nor the extension of the concept are taken to be the bearer of the number, but rather that which should be denoted by the words ‘aggregate’, ‘series of things’, ‘object consisting of homogeneous parts’” [21, p. 150].
(2) The number is a property (value, validity) of such an aggregate. Frege’s comment is this: “The value or validity of an aggregate or a number is distinguished from the aggregate itself and, hence, it seems obvious that with this the actual [eigentliche] number is meant. This is also a way in which it is smuggled in; nowhere is it said what the value or validity might be” [21, p. 151].
(3) The number is an aggregate of abstract things or of a single, repeatedly occurring abstract thing. “As a row of books consists of books, so the number 3 consists then of abstract units, or better yet, of the – of course repeatedly occurring – One. We do not learn what this might be, though. It is probably so abstract that in order to think it, one must not think anything at all” [21, p. 152].
It is in view of these deficiencies that Frege feels free to leave out an examination of Weierstraß’s theory of irrational numbers. The basis of this theory is simply not firm, Frege surmises. In my view, Frege is not taking the matter seriously enough here. As I have implied above, Weierstraß’s construction of analysis – the terminological shortcomings aside – makes sense. Thus, it would have deserved more careful consideration by Frege beyond criticizing Weierstraß’s use of the word “aggregate” and providing evidence of definition-theoretic errors. By the way, one may speculate what Frege himself would have answered if he had been roused in the night with the question: “What is a number?”
9 Frege maintains that the simplicity of the definiendum does not rule out that it may be regarded as consisting of parts. Its simplicity does exclude, however, that the reference of the definiendum follows from the references of the parts and, furthermore, that these parts occur also in other combinations and are treated as independent signs with a reference of their own (cf. [21, § 66]).
10 Of course, it must be assumed in the first place that one is prepared to accept Frege’s theory of definition.
Having said that, let me also mention, in fairness to Frege, that in his critical remarks on Weierstraß’s conception of natural numbers (see also [24, pp. 232 ff.]), just as when he inveighs, rather dismissively, against Cantor’s description of how to arrive at the cardinal number of a given set by carrying out a double act of abstraction (cf. [24, pp. 76-80]) or takes sides against Biermann and Schubert’s views of the numbers (cf. [24, pp. 81-95]; [23, pp. 240-261]), he plays masterfully on the keyboard of irony and sarcasm. And for the most part, I am inclined to acknowledge his arguments as sound; a few of them even strike me as devastating. Some might complain that Frege’s critique of Weierstraß, Cantor, Husserl, Heine, Thomae, Biermann, Schubert and other contemporaries of his lacks charity and occasionally overshoots the mark (see, for example, [53] and [54]). Be this as it may, compared with the academically long-winded and stilted writing of many of his fellow mathematicians (I recommend [27] as a delightful sample), I find Frege’s way of criticizing rival theories of number both refreshing and insightful.
Needless to say, Frege’s own theory of real numbers in his mature period, although it differs markedly from the theories of his fellow mathematicians, did not emerge out of the blue. Typically enough, he refers to it as “Größenlehre” (“theory of quantity”). In a sense, the concept of quantity was a constant companion of his when he developed his foundational project in several stages. As a matter of fact, this concept already plays a key role for Frege at the beginning of his career, namely in his “Habilitationsschrift” Rechnungsmethoden, die sich auf eine Erweiterung des Größenbegriffs gründen of 1874, is at least touched upon in his philosophical masterpiece of 1884 and again plays a dominant role in [21]. So far these topics have not received the attention they deserve. This applies also to Frege’s critique of Cantor’s theory of irrational numbers.11 In sections 2 and 3, I shall try to fill this gap to some extent. Admittedly, as I indicated above, due to its focus on alleged definition-theoretic errors and its neglect of certain issues germane to the quintessence of Cantorian analysis, Frege’s assessment of Cantor’s approach suffers from one-sidedness. Moreover, to my mind it is not free from bias. All the same, I think that it deserves to be discussed by paying a little more attention to some of the details of Cantor’s doctrine than Frege does.
11 To the best of my knowledge, so far only Dummett [12, pp. 63 ff.] has dealt with Frege’s critique of Cantor’s theory of irrational numbers. Yet his account differs very much from my own. As to the concept of quantity in [16], the only treatment that I have seen in the literature is the one given by [8, pp. 353 ff.]. However, I have never come across any discussion of Frege’s remarks on Hankel’s theory of real numbers in [18, § 12].
I shall now proceed as follows. In section 2, I shall be concerned with Frege’s understanding of the concept of quantity in his work between 1874 and 1884. In a first step, I deal with his theory of quantity in his Habilitationsschrift. In a second step (likewise in section 2), I comment on Hankel’s conception of quantity and a remark of Frege’s on Hankel’s theory of real numbers ([18, § 12]). In addition, I consider a passage in [18, § 19], where Frege makes some remarks about Newton’s view of number in terms of “the abstract relation between any given quantity and another of the same kind that is taken as a unity”. In section 3, I describe the essential features of Cantor’s theory of irrational numbers and examine the main points of the critique deployed by Frege. In section 4, I take a look at Russell’s theory of real numbers in [39] as well as in [56]. Section 5 is devoted to a detailed reconstruction of Frege’s conception of the notion of quantity and his theory of real numbers with an eye to both the sketchy informal and the meticulous formal account as far as it goes in [21]. Special emphasis is placed on the considerations that led him to set up the definitions of the concepts positival class and positive class and the problem of proving the mutual independence of the clauses that make up the definition of the former concept which is only preparatory to the latter. Due to Russell’s paradox, Frege’s logical foundation of analysis remained a fragment. In the final section 6, I give a brief account of von Kutschera’s proposal of how Frege might have carried on with the logical construction of analysis in a projected third volume of Grundgesetze, had he not been shocked by Russell’s paradox.
2 The concept of quantity in Frege’s writings between 1874 and 1884
2.1 Methods of calculation and the concept of quantity: Frege’s Habilitationsschrift (1874)
At the outset of his Habilitationsschrift Rechnungsmethoden, die sich auf eine Erweiterung des Größenbegriffs gründen (1874), Frege aims at illustrating the remarkable difference between geometry and arithmetic in the way in which their fundamental principles are grounded. As the title suggests, it is by investigating the concept of quantity that he pursues this aim. Frege points out that this concept had gradually been detached from intuition and finally gained the status of a self-subsistent concept. Its range of application is indeed so comprehensive that he is certainly right in denying that it stems from intuition. Frege argues as follows. Since we have no intuition of the object of arithmetic, its principles cannot rest on intuition either. One might add by way of analogy: Since we have no sense perception of the object of arithmetic, its principles cannot rest on sense perception either. Frege does not tell us directly from which source of knowledge the principles of arithmetic are supposed to originate, but I trust that he would have said something like this: these principles derive from conceptual or pure thinking. In an instructive letter to Anton Marty written in 1882, Frege
mentions for the first time the notion of a source of knowledge which he presumably borrowed from Kant. However, as early as in [16, p. 50] Frege speaks of intuition as the source of the axioms of geometry. I am almost certain that the word “source” is meant there to refer to what in his letter to Marty of 1882 he characterizes as the source of knowledge of spatial intuition. Generally speaking, I suppose that Frege uses the term “source of knowledge” to refer to a cognitive faculty of the human mind, and in doing so he is following deliberately in Kant’s footsteps. Yet unlike Kant, he explicitly characterizes (though only in his late fragments) a source of knowledge as that which justifies the acknowledgement of truth, the judgement [24, p. 286]. In the letter to Marty, Frege emphasizes that a source of knowledge more restricted in scope than conceptual thinking (begriffliches Denken), like spatial intuition or sense perception, would not suffice to guarantee the general validity of the arithmetical sentences. Thus, in this letter Frege already classifies three sources of knowledge, a classification that reappears, save for one modification, in an undated letter to E. V. Huntington (presumably written in 1902) and in his late fragments ‘Erkenntnisquellen der Mathematik und der mathematischen Naturwissenschaften’ and ‘Neuer Versuch der Grundlegung der Arithmetik’.12 In these fragments, he acknowledges the logical source of knowledge, the geometrical source of knowledge (that is, spatial intuition) and sense perception as constituting the third source of knowledge. Due to the lack of available evidence, I hesitate to suggest that what in the letter to Marty and again in [18] (cf. § 14) Frege calls conceptual thinking coincides with the logical source of knowledge.
However, on plausible grounds I assume that in Frege’s eyes our ability and actual performance of conceptual thinking, in particular our practice of drawing deductive inferences, is very much akin to what in his late fragments he terms the logical source of knowledge. I presume, however, that in his view the logical source of knowledge is not only the faculty of drawing deductive inferences13 and, hence, of providing deductive justifications for truths. My hunch is that he also takes it to be that cognitive faculty which enables us to grasp, in a direct, non-inferential way, primitive laws of logic.14 This is not to say that he regards the logical source of knowledge at the same time as furnishing justifying grounds for acknowledging primitive laws of logic to be true. In the Preface to [20], Frege raises the question why and with what right we acknowledge a logical law to be true. His answer is that logic can respond only by reducing it to other logical laws. When this is not possible – as can be seen whenever the act of acknowledging a primitive law of logic as true is at issue – Frege claims that logic can give no answer. In summary then, I presume that during his entire career, from his first writings until his last fragments, Frege adhered unwaveringly and invariably to the Kantian idea that the human mind is endowed with certain specific faculties of attaining knowledge, with sources of knowledge.
12 In his letter to Huntington, Frege writes [25, p. 89]: “I have set myself the goal of grounding arithmetic on logic alone. For this it is essential to exclude with certainty everything that stems from other sources of knowledge (intuition, sense experience).”
13 According to Frege, deductive inference is to judge by being aware of other truths as grounds of justification. In his fragment ‘Logik’ (I), he underscores that deductive inference cannot be the only mode of justifying truths. “There must be judgements whose justification rests on something else, if they stand in need of justification at all” [24, p. 3]. The task of investigating non-deductive or non-logical forms of justification is assigned to epistemology. Logic and epistemology are thus put on a par only insofar as both disciplines are concerned with justifying grounds of truths. Admittedly, in ‘Logik’ (I), Frege does not mention other forms of justification besides deductive inference or deductive proof. In particular, he does not say there that epistemology can provide a non-deductive justification of a primitive law of logic. It therefore remains unclear on what the justification of truths (if there are any), which are capable and (or) in need of justification, but resist justification through deductive proof, is supposed to rest. It seems, however, that if there were no such truths, epistemology, as characterized by Frege, would lack a proper domain of investigation. For he can hardly see its task in supplying justifying grounds for truths which do not stand in need of justification. Notice that in ‘Logik’ (I) Frege does not explicitly claim or demand the existence of truths that need neither deductive nor non-deductive justification.
14 I am inclined to ask: if it is not the logical source of knowledge that enables us to grasp primitive truths of logic, which other source of knowledge should then enable us to do this? Moreover, it is perfectly possible that in Frege’s view the logical source of knowledge comprises also the faculty of conceptual analysis.
In [18, § 12], Frege also uses the terms “Erkenntnisgrund” (“ground of knowledge”) and “Erkenntnisprinzip” (“principle of knowledge”) when he refers to (pure) intuition. There are several places in the Kritik der reinen Vernunft (Critique of Pure Reason, [35]) where Kant employs the term “Erkenntnisquelle”. Following Kemp Smith’s translation of [35], I render it as “source of knowledge”; “source of cognition” is another possible translation, and it is chosen by Guyer and Wood in their translation of [35]. Thus, for example, Kant writes in [35, B4] (I quote again from the translation by Guyer and Wood): “. . . strict universality belongs to a judgement essentially; this points to a special source of cognition [knowledge], namely a faculty of a priori cognition [knowledge]. Necessity and strict universality are therefore secure indications of an a priori cognition [knowledge].” Such a faculty of a priori knowledge is space and time: “Time and space are accordingly two sources of cognition [knowledge], from which different synthetic cognitions can be drawn a priori, of which especially pure mathematics in regard to the cognitions of space and its relations provides a splendid example” (A38-9/B55-6). Much later, in the Transcendental Dialectic, Kant writes (A294/B350-1): “But the formal aspect of all truth consists in agreement with the laws of the understanding. In the senses there is no judgement at all, neither a true nor a false one. Now because we have no other sources of cognition [Erkenntnisquellen] besides these two, it follows that error is effected only through the unnoticed influence of sensibility on understanding, through which it happens that the subjective grounds of the judgement join with the objective ones, and make the latter deviate from their destination. . . ” Earlier in the Critique (A94/B127), Kant had mentioned “three original sources (capacities or faculties of the soul), which contain the conditions of the possibility of all experience, and cannot themselves be derived from any other faculty of the mind, namely sense, imagination, and apperception.” Later, at the very outset of the second book of the Analytic of Principles (A130-1/B169), he underscores that “general logic is constructed on a plan that corresponds quite precisely with the division of the higher faculties of cognition [Erkenntnisvermögen]. These are: understanding [Verstand], the power of judgement [Urteilskraft], and reason [Vernunft]. In its Analytic that doctrine accordingly deals with concepts, judgements, and inferences, corresponding exactly to the functions and the order of those powers of mind [Gemütskräfte] which are comprehended under the broad designation of understanding in general.” I presume that Kant uses the terms “source of knowledge [cognition]” and “faculty of knowledge [cognition]” largely in the same sense. According to him, there are basically two sources or faculties of knowledge, a lower one, namely sense, and a higher one, which is understanding, taken in a comprehensive sense. Understanding, conceived of in this wider sense, comprises both the power of judgement and reason.
On the face of it, Frege’s logical source of knowledge bears a notable similarity to Kant’s source or faculty of knowledge of the understanding. According to Frege, the logical source of knowledge is involved when inferences are drawn, and thus is almost always involved. Similarly, in Kant’s view, both judgements and inferences fall, by their very nature, in the domain and activity of the understanding, construed in the broader sense.
As far as the role of concepts in Kant’s Analytic is concerned – recall that he establishes a correspondence between understanding (construed in the narrower sense), the power of judgement, and reason and the specific function belonging to each of these higher faculties of cognition – Frege’s use of the term “conceptual thinking” may come to mind. This term is perhaps a kind of forerunner of the term “logical source of knowledge”, bearing in mind that Frege uses the latter term only in an undated letter to Huntington and his last fragments. Let me emphasize that, despite the similarity I just mentioned, Frege’s conception of logic does not coincide with Kant’s. Kant distinguishes between general or formal logic and transcendental logic. We know from a remark of Frege’s in ‘Über die Grundlagen der Geometrie’ II (1906) that, despite first appearances, logic is for him not purely formal (cf. [23, p. 322]). I must leave a thorough comparison of the conceptions of logic of Kant and Frege for another occasion.
It is true that logic is not even mentioned in [16]. Yet stressing the comprehensive range of application of the concept of quantity, as Frege does, seems to foreshadow his later argument from the universal applicability of arithmetic to its purely logical nature ([18] and [19]). To be sure, it is not more than this. In [16], Frege does not anticipate, let alone explicitly state, the central thesis of his philosophical masterpiece still to come in 1884. The thesis is as follows: The fundamental laws of arithmetic are (in all likelihood) analytic, that is, they can be derived exclusively from
primitive laws of logic and definitions.15 I hasten to add that the key idea underlying Frege’s logicist project does already appear in [17], although there it is not yet framed in terms of the notion of analyticity.16 After having classified two kinds of truths which require a proof for their justification – the proof of a truth of the first kind can proceed purely logically, while the proof of a truth of the second kind must be supported by empirical facts – the question to be settled for the laws of arithmetic is to which of these two kinds of truths they belong. As Frege points out, the answer requires us to test “how far one could get in arithmetic by means of inferences alone, relying only on the laws of thought, which are beyond all particularities. The procedure for this test was that I sought first to reduce the concept of ordering in a series to the concept of logical consequence, in order to advance from here to the concept of number” [17, p. X].17
15 In Frege’s view, primitive truths of logic are maximally general truths which, thanks to their evidence, neither need proof nor admit or are capable of proof in a theory T in which they are laid down as axioms. While the property of unprovability depends on a particular system, Frege seems to regard the property of not needing proof as something that belongs intrinsically to certain distinguished, (self-)evident general truths, quite independently of the system in which they are singled out as axioms. Note that in [18, § 3] Frege defines the concept of analyticity only for truths which are capable of being proved; no provision is made for the first premises of the deductive proof (of an arithmetical truth), namely the basic laws of logic figuring as axioms in a theory T and the definitions framed in T. I assume, however, that Frege, had his attention been drawn to the omission, would have characterized both the primitive laws or axioms of logic and the definitions as analytic. Frege insists that in his definition of analyticity in terms of deducibility from fundamental logical laws and definitions it is presupposed that we take into consideration also those propositions on which the permissibility of a definition rests. (Both Austin and Jacquette’s translations of the relevant passage in [18, § 3] are inaccurate; see [22, p. 4] and [26, p. 19].)
16 In his Begriffsschrift of 1879, Frege does not yet employ the term “analytic” in the sense in which he defines it in [18, § 3]. In [17, § 24], he explains his conception of definition by taking as an example the definition of a hereditary property in a series (or sequence). It is only in this context that he uses the term “analytic”, and he does so along Kantian lines, where the analyticity of a judgement implies its epistemic triviality. Once the content of the definiens has been bestowed upon the definiendum, the definition is immediately turned into an analytic judgement, because it displays only what was put into the new symbols in the first place. By contrast, Frege’s definition of the notion of analyticity in terms of the notion of deductive proof in [18, § 3] allows that an analytic truth extends our knowledge. It therefore differs essentially from Kant’s explanation, despite the fact that in [18, § 3 (footnote)] Frege tries to play down the difference by saying that he does not intend to confer a new sense on the term “analytic”, but only to state accurately what Kant has meant by it. Yet in [18, § 88] Frege finds fault with what he sees as the narrowness of Kant’s definition of analyticity. It is already in his letter to Marty that Frege criticizes Kant for having placed too little value on analytic judgements because the examples on which he draws are too simple. I doubt that the basic laws of arithmetic, if they “can be proved from definitions by means of logical laws alone. . . may have to be regarded as analytic judgements in the Kantian sense” [25, p. 163].
17 In part III of [17] entitled ‘Einiges aus der allgemeinen Reihenlehre’ (‘Some Topics from a General Theory of Sequences’), Frege derives a number of theorems about sequences to provide a general idea of how to handle his concept-script and underscores the extensive applicability of the theorems obtained. He makes it clear that the range of validity or application of a truth is as wide as the scope of the source of knowledge from which it derives. For the sake of convenience, I use here the term “source of knowledge”; recall that in 1879 Frege does not yet use this term. When he embarks on commenting on theorems about sequences, he mentions pure thinking and intuition which only a few years later he terms sources of knowledge.
After having argued against the intuitive character of the subject matter of arithmetic in [16], Frege goes on to write (p. 51):
If, as we have shown, we do not find the concept of quantity in intuition, but create it ourselves, then we are justified in trying to formulate its definition so as to permit as manifold an application as possible, in order to extend the domain that is subject to arithmetic as far as possible. Now to what do those principles, from which the whole of arithmetic grows as from a seed, refer? To addition; for the other kinds of calculation arise from this one. This is why there is such an intimate connection between the concepts of addition and quantity that the latter cannot be grasped at all without the former. Quite generally speaking, the process of addition is the following: we replace a group of things by a single one of the same kind. This gives us a determination of the concept of quantitative identity. If we can decide in every case when objects agree in a property, then we have obviously the correct concept of the property. Thus in specifying under what conditions there is a quantitative identity, we determine thereby the concept of quantity. A quantity of a certain kind – for example, a length – is accordingly a property in which a group of things can agree with a single thing of the same kind, independently of their internal structure.
Frege adds that the proposed determination of the concept of quantity can be regarded as sound only if the property we are thinking of allows such a scope that it is possible for things not to agree in it. He calls the multiplicity enclosed within this scope the quantitative domain. This exposition of the concept of quantity is far from being a paragon of clarity and definiteness.
(1) To begin with, it springs to mind that Frege speaks of a creation of the concept of quantity by ourselves. To the best of my knowledge, this is the only place in his entire work where he acknowledges a creation at all concerning concepts, numbers or logical objects in general, within the bounds of his own philosophy of arithmetic. I do not know how much importance we should attach to this remark which is at variance with everything that Frege says in his later work about the formation of concepts in general and of mathematical and logical concepts in particular. Perhaps Frege only wanted to convey that the concept of quantity does not originate in intuition, but is rather something that we find only in rational, conceptual thinking. Be this as it may, it is true that only a few years later in ‘Boole’s rechnende Logik und die Begriffsschrift’ of 1880-81 and likewise in [18] the usefulness of definitions in mathematics and logic is not generally seen as restricted to their function as abbreviations and simplifications as in Frege’s work, say, after 1891, when he comes to develop and present a systematic theory of (explicit) definition, based on a few clear-cut principles. According to [18], the distinguishing mark of “really good” definitions lies in the fact that they embody a process of fruitful concept formation. This process proceeds by analyzing a judgeable content into a constant and a variable part or in other words: by applying the method which in [20] Frege describes by stating three rules of constructing function-names in his formal language and which I term rules of gap formation. Yet no matter how fruitful concept formation via gap formation is taken by him to be (at least during the period 1880-1884), he nowhere characterizes it as a creation.18
(2) So much at least is clear: Addition is regarded as the key operation on which every other arithmetical operation rests. The concepts of quantity and of addition are inextricably intertwined. It is through addition that the relation of quantitative identity is fixed. In Frege’s theory of real numbers in [21], addition again plays a distinguished role. Here the demarcation of the quantitative domain results from the requirement that the commutative and associative laws for addition hold.
(3) The reader who is expecting that a definition of the concept of quantity is finally forthcoming is bound to be disappointed. As to Frege’s claim “If we can decide in every case when objects agree in a property, then we have obviously the correct concept of the property”, Currie [8, p. 354] has pointed out that it is ambiguous. He argues that “agreement in a property” may mean: (a) “a and b have F”, or (b) “a and b have F to the same degree”, or (c) “a and b are the same F”.
He considers (b) to be the most likely option: “Because from ‘a and b have F to the same degree’ we can infer (A): ‘the magnitude of a’s Fness = the magnitude of b’s Fness’. . . ” (p. 354). Perhaps this is right; it is hard to tell. Note that in Frege’s later work to have a grasp of a first-level concept or property would amount to saying: we can decide for every given object whether it falls under the concept or not/whether it has the property or not. (After 1891, Frege construed the concepts under which a given object falls as its properties.) Now, on the face of it, the way Frege characterizes the intended definitional introduction of the concept of quantity is reminiscent of the method of introducing (tentatively) a concept (more precisely: a function-name or a singular term forming operator) by means of a contextual definition, in terms of an abstraction principle. Unfortunately, Frege fails to specify a criterion of identity for quantities. If Currie’s proposal is correct, then we are left with “The quantity of a’s Fness = the quantity of b’s Fness if and only if. . . ”, and have to find out what could or should be put into the empty place, marked by the three dots. If it were the sign of a suitable equivalence relation, then the contextual definition so construed would define the operator “the quantity of a’s ϕness”.19
18 According to [18], the characteristic marks of fruitful definitions are as follows: (1) they represent a kind of concept formation in which, to use Frege’s geometrical image, entirely new boundary lines are drawn; (2) they enable us to carry out gapless proofs, something that would have been impossible without them; (3) we may draw inferences from them which extend our knowledge. This, however, is not to say that a fruitful definition as such adds to our knowledge. Frege nowhere claimed that it does. As I indicated above, in his mature period after 1891 Frege abandoned his thesis about the systematic fruitfulness of good definitions in mathematics and logic. For details regarding this change see [43].
To be sure, Frege spares himself the trouble of showing how the content of arithmetic is contained in the properties of quantity (he thinks) he has set out, and how special kinds of quantity, such as cardinal number and angle, can also be defined from his standpoint. He confines himself to drawing the conclusion that quantity can be ascribed to operations. In general, he says, it is possible to search for the operation which, when applied n times, can replace a given operation, and for the operation which reverses the given one. “We can easily see that these operations and the ones that can arise from them in the ways indicated form a quantitative domain” [16, p. 52]. Frege goes on to point out that there are several examples of the repetition of the same operation to be found in arithmetic. Thus, addition is said to lead to multiplication, and multiplication to involution. So much for Frege’s treatment of the concept of quantity in [16]. Let us now turn to Hankel and Newton and Frege’s comments on their doctrines.
19 We do not know when Frege began composing and writing [18]. Although it is a fairly small book, it contains almost his entire philosophy of mathematics in a rather condensed form. Furthermore, Frege went through a fair amount of literature before or when writing several chapters of the book. Thus, I presume that soon after the completion of his Begriffsschrift of 1879 he began working on his philosophical masterpiece. The fact that apart from his Begriffsschrift he published relatively little during the period, say, 1875–1883, lends perhaps further support to my presumption. In short, assuming that in [16] Frege thought that the concept of quantity would be best defined in terms of an abstraction principle is by no means far-fetched, let alone out of place.

2.2 Frege on Hankel and Newton in Frege 1884

Guided by his earlier definitions of the terms “analytic truth”, “synthetic truth”, “a posteriori truth” and “a priori truth”, Frege raises the question, in the heading of § 12 of [18], whether the laws of arithmetic are synthetic a priori or analytic. He comments mainly on Hankel’s theory of real numbers and Kant’s notion of intuition. Before I turn to Frege’s comment on Hankel’s theory, let me take a look at Hankel’s introduction of the concept of magnitude in [27]. At the outset of the section entitled “The real numbers in the theory of magnitude”, Hankel claims that the relation-concept magnitude (Grösse) is immediately given in pure intuition. He concludes from this that he need not provide a metaphysical definition of this concept, that is, a definition that reveals completely its essence, and that an exposition of it will suffice for his purposes. After having made a somewhat cloudy remark on the nature of mathematical definitions, Hankel goes on to say that regarding the concept of magnitude we need not define the concept of quantity (Quantität), but must rather define the concept of a quantum (Quantum). He adds that these two concepts are united in the word “magnitude” and finally suggests that it is not the concept of magnitude that requires a definition, but rather “what ‘large’ is” (“was ‘gross’ sei”). I find this hard to follow. Hankel possibly wishes to convey that the meaning of the word “magnitude” contains two distinct constituents or elements, namely the concept of quantity and the concept of a quantum. Alternatively, it could seem that he wants to say that the concept of magnitude has two intimately related conceptual components, namely the concepts just mentioned. I presume that what has to be defined, according to Hankel, is the predicate “is large” (“ist gross”). In what follows, Hankel refers to Euclid. He says that an analysis of the use that Euclid makes of the concept of being large or of the concept of largeness (Begriff des Grossen) yields the following definition [27, pp. 48 f.]:

Grösse heisst ein Object, wenn es grösser, kleiner als ein anderes, oder ihm gleich ist, und in letzterem Falle ihm überall substituiert werden kann; wenn es ausserdem durch wiederholte Position vervielfacht (und geteilt) werden kann. Gleichartig heissen Grössen, wenn die eine vervielfältigt, die andere übertreffen kann.

We call an object a magnitude if it is greater or smaller than another object or equal to it, and in the latter case can always be substituted for it; if, furthermore, it can be multiplied (and divided) by iterated position. Magnitudes are of the same kind if the one, when multiplied, can exceed the other.
The last sentence expresses the same idea as definition 4 of book V of Euclid’s Elements (I first quote from the original Greek text and then from the translation provided by Heath): Λόγον ἔχειν πρὸς ἄλληλα μεγέθη λέγεται, ἃ δύναται πολλαπλασιαζόμενα ἀλλήλων ὑπερέχειν. Magnitudes are said to have a ratio to one another which are capable, when multiplied, of exceeding one another.
Roberto Torretti has pointed out to me that Hankel’s way of phrasing Euclid’s definition is imprecise. According to Torretti, the correct translation of definition 4 into German should have been this: Gleichartig heissen Grössen, wenn eine jede vervielfältigt die andere übertrifft.
I think that Torretti is right in stressing that this amendment is not sheer pedantry. He argues that if it were sufficient for satisfying the definition
that any of the two magnitudes, when multiplied, were to exceed the other, we might say that there is a ratio between a right angle α and any curved angle β contained in α since α multiplied by 1 is clearly greater than β. On the other hand, though, it is obvious that β multiplied by n will never exceed α, no matter how large the factor n may be. As to Hankel’s explanation “Grösse heisst ein Object, wenn es grösser, kleiner als ein anderes, oder ihm gleich ist, und in letzterem Falle ihm überall substituiert werden kann; wenn es ausserdem durch wiederholte Position vervielfacht (und geteilt) werden kann”, it does not correspond to any of the passages of the Elements where Euclid uses the word “μέγεθος” (“magnitude”). Now it is correct that in Book V of the Elements Euclid does not define the concept of magnitude, but rather analyzes its properties and structure by setting up a group of definitions and by subsequently proving a number of propositions involving the concepts of magnitude, of ratio, of multiple, of proportion, proportional, etc. However, instead of presenting a definition of the concept of a quantum or of the predicate “is large”, Hankel expressly offers a definition (not an exposition!) of the concept of magnitude when he makes his stipulation by appealing to Euclid. Recall that Hankel considered a definition of this concept to be unnecessary in the first place. However, I refrain here from trying to disentangle what strikes me as a confusion of terms and turn now to Frege’s comments on Hankel in [18, § 12]. In this section, Frege mentions that Hankel [27] bases the theory of real numbers on three principles, to which he ascribes the character of notiones communes.20 He then quotes from [27, p. 54]:

They become perfectly evident through explication, are valid for all domains of magnitudes, according to the pure intuition of magnitude; and they can, without forfeiting their character, be transformed into definitions, by saying: By the addition of magnitudes one understands an operation that satisfies these principles.
Frege objects that in the last claim there is an unclarity. He is willing to grant that the proposed definition can perhaps be framed. Yet he also points out [18, § 12] that it cannot serve as a substitute for those principles; for in the application it would always be at issue: are the cardinal numbers magnitudes, and is, what one ordinarily calls addition of cardinal numbers, addition in the sense of this definition? And to answer it, one would already need to know those sentences about the cardinal numbers.
The three principles that, according to Hankel, can be transformed into definitions are the following (cf. [27, pp. 54 ff.]):

(1) a + (b + c) = (a + b) + c.
(2) a + b = b + a.
(3) If a = Ae, b = Be and a′ = Ae′, b′ = Be′, then (a + b) of e is the same multiple as (a′ + b′) of e′.

20 Hankel appeals here to Kant’s conception of notiones communes.
Indeed, Hankel’s way of presenting the matter falls short of clarity, but not necessarily for the first reason that Frege mentions. To begin with, strictly speaking, it is not correct to claim that the principles (1), (2), and (3) are transformed into definitions. What Hankel suggests is rather a single definition of the operation of addition of magnitudes, consisting of several clauses. Understood in this way, the definition does not give rise to objections on formal grounds. Thus, at least from a formal point of view, Frege’s cautious phrase “can perhaps be made” seems to be misplaced. Frege is of course right in speaking of a definition in the singular and thus in tacitly correcting Hankel’s phrasing. Hankel’s definition should read as follows:

The addition of magnitudes is an operation that satisfies the following principles:

(1) a + (b + c) = (a + b) + c.
(2) a + b = b + a.
(3) If a = Ae, b = Be and a′ = Ae′, b′ = Be′, then (a + b) of e is the same multiple as (a′ + b′) of e′.
Now a word about Frege’s first objection. I take it that Hankel regarded the cardinal numbers as magnitudes/quantities just as Frege did in [16] (cf. p. 51) and in [18]. The definition of the addition of magnitudes given above would then fully apply to the cardinals constituting just one type of magnitude. Moreover, I fail to see why Hankel should care much about the question of whether addition of cardinal numbers in the “ordinary” sense is addition in the sense of his definition. (Note that Frege fails to spell out what “ordinary” is to mean here precisely.) Hankel’s definition just lays down how the operation of addition of magnitudes of any type should be understood in his theory of magnitude. And to be sure, at least the properties of associativity and commutativity fully apply to what we “ordinarily” regard as the operation of addition of cardinal numbers. Frege writes [18, p. 18]: If we consider everything that is called a magnitude: cardinal numbers, lengths, surface areas, volumes, angles, curvatures, masses, velocities, forces, light intensities, galvanic currents, and so forth, we can well understand how they can all be brought under one concept of magnitude; but the expression “intuition of magnitude”, and even more so “pure intuition of magnitude”, cannot be acknowledged as correct.
In the light of Frege’s later work on the foundations of analysis in [21], it might come as a surprise that in this quotation he also mentions cardinal numbers as forming a kind of magnitude. From his later point of view, this is illicit. In [21], in the course of comparing the reals with the cardinals,
he makes it clear that a cardinal number serves to answer a question of the form “How many objects of a certain kind are there?”, while a real number answers the question “How great is a magnitude (or quantity) compared with a unit magnitude (or unit quantity)?”. In § 19 of [18], when Frege comes to discuss Newton’s conception of number, he claims – erroneously – that the number that gives the answer to the question “how much?” can also determine how many units are contained in a length. Finally, returning to Frege’s critique of Hankel’s theory of magnitude, I think that Frege is right in denying that the expression “intuition of magnitude”, and even more so the term “pure intuition of magnitude”, can be acknowledged as correct. Unfortunately, it remains obscure why Hankel appeals to a pure intuition at all in this context. Perhaps he was strongly influenced by Kant’s notion of a pure intuition in the Critique of Pure Reason. However this may be, from a mathematical point of view, there was no need for Hankel to invoke a pure intuition when introducing and characterizing the concept of magnitude. Thus, I think that Frege’s objection has after all little weight, since Hankel could easily have refrained from using the phrase “according to the pure intuition of magnitude”, indeed without any loss for his mathematical theory of magnitude in general and his proposed definition of the operation of addition of magnitudes in particular. Let us now turn to Frege’s comment on Newton’s conception of number in [18, § 19], which he bases on Baumann’s account of Newton’s ideas (cf. [4]). While I was composing this essay, I did not succeed in getting hold of Newton’s original work; nor did I manage to cast a glance at Baumann’s book. It is for this trivial reason that at present I cannot judge whether Baumann represents Newton’s conception of number faithfully.
In § 19, Frege argues against the attempt to conceive numbers geometrically, as ratios between lengths or surfaces. He cites Newton as proposing to define number as the abstract ratio between quantities, namely between any given quantity and another quantity of the same kind, taken as unity. This tallies with Frege’s own characterization of real numbers two decades later in [21]. Frege observes that Newton’s definition applies to numbers in the wider sense, including fractions and irrational numbers, adding the proviso that in this case the concepts of magnitude and of ratio of magnitudes are presupposed. Frege concludes from this: “Accordingly, it appears that the explanation of number in the narrower sense, of cardinal number, will not be superfluous.”21 Now, the supposed fact that Newton’s general definition of number presupposes the concepts of magnitude and of ratio of magnitudes would not cause any problem if prior to that definition Newton provided a proper explication or definition of the concepts of magnitude and of ratio of magnitudes (which at present I do not know). At any rate, Frege tries to justify the apparent need for explaining the concept of cardinal number within a Newtonian setting by referring to Euclid: “for Euclid needs the concept of equimultiple [“des Gleichvielfachen”] in order to define the identity of two ratios of lengths; and the equimultiple amounts again to a numerical identity.”22 Frege does not exclude that the identity of ratios of lengths can be defined independently of the concept of number. He goes on to say that if it could be defined in this way, then we might remain in uncertainty in which relation the geometrically defined number would stand to the number of ordinary life. He adds that a further problem might arise, namely the question of whether arithmetic itself can get along well with a geometrical concept of number, especially if one thinks of the number of roots of an equation or the numbers prime to a number and smaller than it. However, as I already pointed out, the sharp distinction between the application of the reals and the application of the cardinals along the lines of [21] is missing in [18]. This is obvious from Frege’s remark that the number that gives the answer to “How many?” can also determine how many units are contained in a length. In conclusion, he raises another objection to what, in his view, might have been Newton’s understanding of the notion of magnitude:

Calculation with negative, fractional, irrational numbers can be reduced to calculation with the natural numbers. Yet what NEWTON perhaps wished to understand by magnitudes, as whose ratio the number is defined, was not only geometrical magnitudes but also sets. In that case, however, his definition is useless for our purposes, since of the expressions “number through which a set is determined” and “ratio of a set to the unit of the set” the latter provides no better information than the first.

So much for Frege’s critical assessment of Hankel’s and Newton’s conceptions of quantity.

21 As so often, Austin deviates a trifle too far from Frege’s original text when he translates [22, p. 25]: “This should presumably mean. . . ” This is simply inaccurate. By contrast, Jacquette [26, p. 34] gets it right here: “Accordingly, it appears. . . ” However, both Austin and Jacquette render the term “Grössenverhältnis” somewhat awkwardly as “relation in respect of magnitude” (Austin) and as “magnitude relations” (Jacquette).

22 Jacquette’s translation of “Gleichvielfaches” as “equinumerosity” is incorrect, since Frege distinguishes between Gleichvielfaches and Gleichzahligkeit (equinumerosity).

2.3 Extensive and intensive magnitudes: Frege on Cohen

In a review of [7], Frege criticizes Cohen’s treatment of the distinction between extensive and intensive magnitudes.23 This distinction has Kantian roots. In [35] (“Systematic representation of all synthetic principles”), Kant distinguishes between extensive and intensive magnitudes. The principle of the axioms of intuition is: All intuitions are extensive magnitudes, whereas the principle of the anticipations of perception is: In all appearances the real which is an object of the sensation, has intensive magnitude, that is, a degree. Kant calls an “extensive magnitude that in which the representation of the parts makes possible the representation of the whole (and therefore necessarily precedes the latter)” (A162/B203). He calls that magnitude “which can only be apprehended as a unity, and in which the multiplicity can only be represented through approximation to negation = 0, intensive magnitude” (A168/B210). In his review of [7], Frege writes [23, p. 101]:

Now the distinction between intensive and extensive magnitudes has no sense in pure arithmetic. Nor does it seem to matter anywhere else in the whole of mathematics. The number 3, for example, can serve as a measurement number for a distance with respect to a unit of length; but it can also serve as the measurement number for an intensive magnitude, for example, for a light-intensity measured in terms of a unit of light-intensity. The calculation proceeds in both cases according to exactly the same laws. The number 3 is therefore neither an extensive nor an intensive magnitude but it rather stands above this contrast. The same holds also for the infinitesimal. Cohen would perhaps respond to this: Light-intensity is not an intensive but an extensive magnitude; yet it seems that such a response would fly in the face of linguistic usage.

There is not much to add to this assessment from my point of view. I agree with Frege that the distinction at issue does not have any proper place in arithmetic. Following Frege again, I further hold that this distinction is unsuited for playing any significant and fruitful role in other branches of mathematics. Finally, I think that both the concept of extensive magnitude and that of intensive magnitude are far from being sharply defined either in [35] or in [7]. To form a sustainable judgement concerning the legitimacy of Frege’s objections, I have taken the trouble of reading half way through [7], but found Cohen’s cloudy style of writing and arguing hard to digest. It seems to me that in raising his objections to certain ideas presented by Cohen Frege exercised verbal restraint. Despite the massive shortcomings of Cohen’s account, Frege’s critique is not accompanied by irony, let alone by sarcasm. On several other occasions, when he shoots the arrows of his mordant criticism at mathematicians and philosophers alike, just the opposite is the case.

23 See in this connection [7], section 19 “Differential und intensive Größe”, section 33 “Intensive Realität”, section 58 “Das Intensive und das Inextensive” and section 79 “Die intensive Größe und das Infinitesimale bei Kant”; cf. also [6, pp. 211 ff.]. Just one year before he published his Begriffsschrift Frege delivered in Jena a short lecture on a way of conceiving the shape of a triangle as a complex quantity. He argues that, despite first appearances, the shape of a triangle can be conceived of not only as a quality, but also as a quantity. He underscores that the second option ought not to be confused with the fact that the shape can be characterized by quantitative determinations. “What we are concerned with here is to obtain one and only one measurement number [Meßzahl] for each triangular shape, so that one can speak of the addition of two triangular shapes to yield a new triangular shape” [23, p. 90].
3 Cantor’s theory of irrational numbers and Frege’s critique

In what follows, I shall characterize Cantor’s theory of irrational numbers and take a closer look at the objections that Frege raises to this theory. In his essay ‘Über die Ausdehnung eines Satzes aus der Theorie der trigonometrischen Reihen’ of 1872, Cantor develops for the first time – albeit in a rather condensed form – his theory of irrational numbers [5, pp. 92-101]. He construes them as limit values of convergent sequences of rational numbers. In later work (see, for example, [5, p. 186]), he calls these sequences fundamental sequences (Fundamentalreihen). In his essay of 1872, Cantor takes the rational numbers as given and defines an infinite sequence of rationals $a_1, a_2, \ldots, a_n, \ldots$ (that is, a Cauchy-sequence $\{a_n\}$ of rational numbers) by appealing to the condition that the difference $a_{n+m} - a_n$ becomes infinitely small with increasing n, whatever the positive integer m may be. In other words: he defines an infinite sequence of rationals by appealing to the condition that for an arbitrarily chosen positive rational ε there is a positive integer $n_1$ such that $|a_{n+m} - a_n| < \varepsilon$, if $n \geq n_1$ and m is an arbitrary positive integer. Cantor expresses this condition of $\{a_n\}$ succinctly as follows: The sequence $\{a_n\}$ has a certain limit b [5, p. 93]. In his seminal work ‘Grundlagen einer allgemeinen Mannigfaltigkeitslehre’ of 1883, Cantor discusses three main forms of introducing the real numbers in a strict arithmetical fashion: the definitions suggested by Weierstraß, Dedekind and himself. All three definitions are said to share the common characteristic that to the definition of an irrational real number there always belongs a well-defined (countably) infinite set of rational numbers.
Cantor points out that the difference between the three forms of definition is due to the momentum of generation (Erzeugungsmoment) through which the set of rational numbers is linked to the number it defines, and to the conditions which the set must satisfy in order to qualify as a foundation for the definition of the number in question [5, p. 184]. As to his own definition of the real numbers, Cantor likewise proceeds from a countably infinite set of rational numbers $(a_\nu)$. Every such set $(a_\nu)$, which can also be characterized by the requirement

$$\lim_{\nu = \infty} (a_{\nu+\mu} - a_\nu) = 0 \quad \text{(for arbitrary } \mu\text{)},$$

he calls a fundamental sequence and assigns to it a number b, to be defined through it, “for which one can expediently use the sign $(a_\nu)$ itself, as suggested by Heine” [5, p. 186].24
24 Note that here Cantor himself does not use quotation marks.
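Cantor’s condition on a fundamental sequence can be spot-checked on a concrete example. The following Python sketch is an illustration only and is not drawn from Cantor’s or Frege’s texts: the sample sequence (Newton’s iteration for the square root of 2) and the function names `sqrt2_terms` and `first_index_within` are assumptions introduced here. Since only finitely many terms are inspected, the check cannot show that a sequence is fundamental; it merely finds an index $n_1$ that works for the listed terms and a given ε.

```python
from fractions import Fraction


def sqrt2_terms(count):
    """Return the first `count` terms of an illustrative rational sequence.

    The terms come from Newton's iteration x -> (x + 2/x) / 2 started at 1.
    Every term is an exact rational number (a Fraction), so no rounding occurs.
    """
    terms = []
    x = Fraction(1)
    for _ in range(count):
        terms.append(x)
        x = (x + 2 / x) / 2
    return terms


def first_index_within(terms, eps):
    """Smallest 1-based index n1 with |a_(n+m) - a_n| < eps for all listed terms.

    Only pairs inside the finite list are compared, so this is a finite
    spot check of Cantor's condition, not a proof. The last index always
    qualifies vacuously; None is returned only for an empty list.
    """
    for n1 in range(1, len(terms) + 1):
        tail = terms[n1 - 1:]
        if all(abs(later - earlier) < eps
               for i, earlier in enumerate(tail)
               for later in tail[i + 1:]):
            return n1
    return None


if __name__ == "__main__":
    terms = sqrt2_terms(6)
    print([str(t) for t in terms[:4]])
    print(first_index_within(terms, Fraction(1, 100)))
```

Exact rationals are used on purpose: Cantor takes the rational numbers as given, and floating-point values would blur the inequality with rounding error.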
Quite in the spirit of Frege, one might object to this explanation that it fails to spell out what entitles us to use the sign ‘(aν )’ in place of the number b we assign to a set (aν ) that meets the requirement mentioned above. Cantor’s appeal to Heine as an authority in this matter must have raised a red flag for Frege. Strictly speaking, a sign cannot take over the status and the function of a number; or, in other words: the former cannot replace the latter. This applies even if by a sign one does not understand the actualized sign type, that is, the concrete, physical occurrence or inscription of the sign, but rather the sign type qua abstract object. Yet Cantor’s subsequent explanations suggest that he originally intends to correlate numbers b, b′ and not their signs with his fundamental sequences. He writes [5, p. 186]: Such a fundamental sequence presents three cases, as can be rigorously deduced from its concept: either its members aν for sufficiently large values of ν are smaller in absolute value than any arbitrarily given number; or, from a certain ν on they are greater than a determinable positive rational number ρ; or, from a certain ν on they are less than a determinable negative rational quantity −ρ. In the first case, I say that b is equal to zero, in the second, that b is greater than zero or positive, in the third that b is less than zero or negative.
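The three cases that Cantor distinguishes in the passage just quoted can be mimicked on a finite stretch of a sequence. The Python sketch below is a hedged illustration under assumptions made here: the function name `classify_tail`, the label strings and the sample sequences are hypothetical. A single bound ρ stands in for “any arbitrarily given number” in the first case, so the result “zero-like” is only a proxy, and a finite check decides nothing about the infinite sequence.

```python
from fractions import Fraction


def classify_tail(terms, start, rho):
    """Finite spot check of Cantor's three cases on terms[start:].

    Returns "positive" if every inspected term exceeds rho, "negative" if
    every inspected term is below -rho, "zero-like" if every inspected term
    is smaller than rho in absolute value, and "undecided" otherwise.
    """
    if rho <= 0:
        raise ValueError("rho must be a positive rational number")
    tail = terms[start:]
    if not tail:
        return "undecided"
    if all(term > rho for term in tail):
        return "positive"
    if all(term < -rho for term in tail):
        return "negative"
    if all(abs(term) < rho for term in tail):
        return "zero-like"
    return "undecided"


if __name__ == "__main__":
    shrinking = [Fraction(1, n) for n in range(1, 21)]     # 1, 1/2, 1/3, ...
    above_one = [Fraction(n + 1, n) for n in range(1, 21)]  # 2, 3/2, 4/3, ...
    print(classify_tail(shrinking, 10, Fraction(1, 10)))
    print(classify_tail(above_one, 0, Fraction(1)))
```

The `start` argument plays the role of Cantor’s “from a certain ν on”: the classification concerns only the terms after that position, not the initial segment.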
Definitions of the relations of identity, greater than and less than for two numbers b and b′ are provided only after the sum and the difference b ± b′ as well as the product b · b′ have been defined. Cantor stipulates that b = b′ or b > b′ or b < b′, depending on whether b − b′ is equal to zero or greater than zero or less than zero [5, p. 186]. It seems obvious that he does not intend to set up definitions of the relations of being identical with, being greater than or being less than for two numerical signs “b” and “b′”; for in this case his definitions would be nonsensical. Not surprisingly, it is by invoking his principles of definition that Frege raises objections to all three groups of Cantor’s definitions. In [21, § 69], he purports to have unmasked the definitions of the first group as flawed, on the grounds that the definienda are not simple. In fact, the definienda contain the words “greater” and “less”, with which acquaintance prior to the act of framing the definitions must be assumed. Hence, according to Frege, this offends against his two principles of definition. He writes ([21, § 69]; see also [21, §§ 77 f.]):

But acquaintance with the words “zero” and “equal” must also be assumed; and then the expressions “equal to zero”, “greater than zero”, and “less than zero” are completely known and must not be explained again. If they were not [completely known], then the previous definitions would have been incomplete – a violation of our first principle of definition.
For someone who is prepared to endorse Frege’s theory of definition, in particular his prohibition on piecemeal definitions, these criticisms would certainly go through.
Frege goes on to take Cantor to task for his definitions of the elementary operations. Among other things (cf. [21, §§ 79 f.]), he objects that the expressions “sum”, “difference”, and “product” are explained through themselves. Since they had thus been explained only incompletely until now, his principle of completeness was breached. Cantor is rebuked for having passed something off as a definition that he would have needed to prove as a theorem. Frege also deals with the third group of definitions in great detail (cf. [21, §§ 81-83]). His critique essentially boils down to the point that in the definitions he is considering the expressions “equal”, “greater”, and “less” are shifted back and forth between being known and being unknown (p. 94). However, in this way, he thinks, his principle of completeness is infringed. Furthermore, he makes it clear that a definition can never be used to define two things. In the present case, this would mean: the greater-than relation and the irrational numbers. The reason is that every attempt to do this ignores the principle of simplicity of the definiendum. Regarding Cantor’s definitions of sum, difference, product, being equal to, being less than, and being greater than, Frege jettisons them on the grounds that they have a kind of Protean ring to them (note that he uses a different, though somewhat related metaphor, [21, § 82]): What first presents itself as an explanation of the signs “+”, “>”, etc., claims, in the next instance, to determine more exactly that which, according to Cantor, should be assigned to the fundamental sequences. However, this deception is only possible due to the fact that those signs are now, once again, considered to be known. Thus, those definitions shimmer in two colours, by sometimes defining the sum, the product, being greater than, etc., and by sometimes being intended to determine the new numbers. But this is incompatible.
In any event, Frege’s complaint that Cantor does not always clearly distinguish between the sign and its denotation or reference appears to be justified. To see this, let us consider a passage from Cantor’s ‘Bemerkung mit Bezug auf den Aufsatz: Zur Weierstraß-Cantorschen Theorie der Irrationalzahlen’ of Illigens (1889) [5, p. 114]25 on which Frege likewise comments. . . . but I have never asserted, nor has anyone else ever asserted that the signs b, b′ , b′′ , . . . are concrete quantities in the literal sense. As abstract thought objects they are only quantities in the non-literal or figurative sense. What must be considered decisive here is that, as anyone familiar with my theory already knows, with the help of these abstract quantities b, b′ , b′′ , . . . concrete quantities in the literal sense, for example, geometric distances, etc., can be quantitatively determined in a precise manner.
I find this a little hard to follow. To begin with, Cantor refers here to b, b′ , b′′ etc., as signs, and, in the same breath, as abstract thought objects 25 It is a short reply to the criticism that E. Illigens [34, pp. 155-160] levelled against Cantor’s theory of irrational numbers.
52
Matthias Schirn
or as abstract quantities. At the beginning of his ‘Bemerkung’, he even calls b, b′ , b′′ irrational number concepts.26 Cantor distinguishes between quantities in the literal sense and abstract quantities in the figurative sense but, as the use of the word “figurative” may already indicate, he seems to acknowledge only the former as quantities in the proper sense of this word. To my mind, the distinction, vague as it is, makes little sense. It seems to me that the talk of certain signs as abstract quantities is only a façon de parler on which nothing should be grounded. Moreover, I fail to see how with the aid of certain signs conceived of as abstract quantities one could bring it about to determine concrete quantities quantitatively in a precise manner. Frege rightly objects to Cantor’s explanation that it would indeed take a strong faith to construe signs qua physical objects as abstract thought objects. At the same time, he grants that Cantor probably considered the signs “b”, “b′ ”, “b′′ ”, etc. to refer to abstract thought objects. It goes without saying that signs written on a piece of paper with pencil or on a blackboard with chalk cannot be regarded as abstract objects. By contrast, a sign construed as a sign type can and even must be considered an abstractum. Thus, if in the preceding quotation Cantor had the sign qua type rather than the sign qua token in mind, then Frege’s complaint would, at least prima facie, lose some of its force. Nevertheless, my hunch is that Cantor meant in fact the sign qua token. I even doubt that he was aware of the type-token distinction (was Frege aware of it?), but these are issues that we cannot and need not decide here. In any case, no matter how a sign is construed – as type or as token – it remains that, taken by itself, it cannot do the job of determining quantities such as geometric distances, time spans, electric charges, light-intensities, and so on in a precise manner. 
26 It is not surprising that Cantor and other contemporaries of Frege’s do not distinguish between concept and object as clearly and systematically as Frege does in his work after 1891, if they draw any such distinction at all. Cantor characterizes individual cardinal numbers and order types as general concepts. In the third of his three explanations of the terms “power” and “cardinal number” (which he apparently takes to be synonymous) in ‘Mitteilungen zur Lehre vom Transfiniten’ [5, pp. 411 f.], he speaks also of a set whose elements are particular, well-distinguished abstract concepts. It should be possible, he says, to conceive as objects not only concrete things qua elements of a set, but also abstract concepts qua elements of a set [5, p. 420]. Unfortunately, Cantor does not give clear examples of concepts that he considers to be abstract. However, I assume that he would classify, for example, the concepts cardinal number and order type as abstract ones. One is inclined to point out here that all concepts are, by their very nature, abstract entities and not only those concepts under which abstract objects fall or, let us say, those higher order concepts under which concepts fall.
My (partly) conciliatory proposal is then this: when Cantor refers to b, b′, b′′, . . . (without using quotation marks) as abstract quantities, he means, in contrast to the wording he chooses, that the numbers b, b′, b′′, . . .
or the objects designated by “b”, “b′ ”, “b′′ ”, . . . are abstract thought objects with the help of which concrete quantities can be quantitatively determined.27 If this is correct, then by straightening out his terminology Cantor could have escaped blatant incoherence. Frege writes [21, p. 86]: Now if by the expression “abstract thought object” Cantor understands that what we call logical object, then there seems to be perfect agreement between us. Yet it is too bad that these abstract objects do not occur at all in Cantor’s explanation! We have fundamental sequences and signs b, b′ , etc. We cannot, with the best will in the world, consider these to be abstract thought objects, nor can the fundamental sequences be meant.
27 In the case of individual numbers, Cantor also speaks of number concepts. Notice in this connection that he does not clearly distinguish between concept and object, let alone along Fregean lines.
28 I say this although Cantor’s concept of an abstract thought object is not as clear as it should be. In fact, it is less clear than Frege’s concept of a logical object. It is true that Frege says relatively little about this concept. With the exception of the two truth-values, he introduces logical objects by way of what I call logical abstraction (see [48, pp. 172-178]; concerning Frege’s view of logical objects see also [49]). In [18], cardinal numbers are tentatively introduced via Hume’s Principle – which at a later stage of his logical-mathematical investigation into the concept of cardinal number (cf. § 73) is very sketchily and incompletely “derived” from the explicit definition of the cardinality operator – and in § 3 of [20] courses-of-values by means of an informal semantic stipulation later to be embodied in the formal version of Basic Law V. Like Hume’s Principle, Basic Law V is a logical abstraction principle of second order; both the second-level equivalence relation of equinumerosity between two first-level concepts and the second-level relation of coextensiveness between two (monadic) first-level functions can be defined in second-order logic. Yet unlike Hume’s Principle, Basic Law V is acknowledged by Frege to be a primitive law of logic. As far as Cantor’s notion of an abstract thought object is concerned, I presume that in general he construed an abstract object in accordance with the customary view as one that is non-spatial and non-temporal and, hence, not capable of involvement in causal or physical interaction. Cardinal or ordinal numbers, for example, are abstract thought objects for Cantor. If we cast a glance at his description of how one may arrive at, let us say, the cardinal number $\overline{\overline{M}}$ of a given set M, we may gain an approximate idea of what he means by an abstract thought object. Cantor construes $\overline{\overline{M}}$ as a definite set, comprised of nothing but units, which exists in our mind as an intellectual copy or a projection of M. $\overline{\overline{M}}$ is obtained by carrying out the process of abstraction from both the nature of the elements of M and their order. To be sure, Frege would have refrained from calling numbers, or more generally courses-of-values, abstract thought objects. Note also his disdain for Cantorian abstraction as a variety of psychological abstraction.
According to Cantor, we have the fundamental sequences and the numbers b, b′, etc. assigned to them, and the latter should be defined through the former. However, as we have already agreed, for Cantor the numbers b, b′, etc. are in fact abstract thought objects. I hasten to add that his notion of an abstract thought object does not match Frege’s concept of a logical object (under which Frege subsumes numbers, courses-of-values, and the two truth-values).28 In particular, the question arises as to why the fundamental sequences themselves could not have been meant by the
abstract thought objects. As to Cantor’s fundamental sequences, Frege rightly conjectures in another place that they are taken to consist of abstract thought objects (cf. [21, § 86]). However, I fail to see why the fundamental sequences themselves could not be classified correctly as abstract thought objects. Plainly, a sequence or set of abstract objects is likewise to be considered an abstractum. In [21, § 77], Frege calls into question the idea that the numbers assigned to fundamental sequences are signs. The following view seems more plausible to him [21, p. 89]:
Related to each fundamental sequence there is a certain number that need not be a rational. These numbers are thus, to some extent, new and not yet considered, and should be determined by the fundamental sequences to which they are related. The sign ‘b’, then, does not designate the fundamental sequence, but rather the number related to it. Hence, this number is, itself, not a sign, but rather that which Cantor calls an abstract thought object.
Frege recognizes the weak point of the view that he attributes to Cantor. It is the fact that the assignment of a number to a fundamental sequence and the definition of a new number “are contracted into one act”. Surely, one can assign a previously defined number to a given fundamental sequence, but one cannot assign a determinate number to it that has yet to be defined through it.
It is time to summarize Frege’s critique of Cantor’s theory of irrational numbers (cf. [21, § 84]). Frege distinguishes between two views. According to the first, the numbers assigned to fundamental sequences are signs; according to the second, they are abstract thought objects. In the first case, the correlation of numbers qua signs with certain fundamental sequences is said to be inessential. In Frege’s view, Cantor has at his disposal only his fundamental sequences, while the ratios of quantities are lacking. “First we must know the ratios of quantities, the real numbers; then we may discover how we can determine the ratios by means of the fundamental sequences” [21, § 76]. Frege adds that Cantor’s theory is by no means purely arithmetical since its application to geometry is crucial for it. That is to say: Cantor considered it essential that with the help of abstract quantities b, b′, b′′, . . . concrete quantities, such as geometrical distances, be precisely determinable. Cantor’s introduction of abstract numerical quantities is purely arithmetical, but is said to miss the decisive point. The description of how one could determine distances through numerical quantities includes the decisive point, but is alleged to be not purely arithmetical. According to Frege, in the second case we do not succeed in grasping (fassen) the numbers assigned to the fundamental sequences qua abstract thought objects. One cannot assign the numbers to the sequences until one is in possession of these numbers. Frege’s critique of Cantor’s definitional practice need not be repeated here.
Thanks to his introduction of the fundamental sequences and the three forms of definition of the real numbers mentioned a while ago, Cantor arrives at the following theorem:
(I) If b is the number determined by a fundamental sequence $(a_\nu)$, then $b - a_\nu$ with increasing $\nu$ becomes less in absolute value than any conceivable rational number, or, what amounts to the same thing: $\operatorname{Lim}_{\nu=\infty} a_\nu = b$.
Cantor underscores that in his definition of the real numbers the number b is not defined as the limit of the members $a_\nu$ of a fundamental sequence $(a_\nu)$. If this were the case, he would be committing the logical error of presupposing the existence of the limit $\operatorname{Lim}_{\nu=\infty} a_\nu$. According to Cantor, the situation is precisely this: his previous definitions have assigned to the concept b such properties and such relations in which it stands to the rational numbers that from this we can infer with logical evidence that the limit $\operatorname{Lim}_{\nu=\infty} a_\nu$ exists and is equal to b.
Cantor holds that the irrational number, in virtue of the property ascribed to it by the definitions, has just as definite a reality in our minds (he calls it intrasubjective or immanent reality) as the rational numbers. We need not acquire an irrational number through a limiting process, but, on the contrary, “by possession of it we become convinced of the practicability and evidence of limiting processes in general” [5, p. 187]. This reflection leads Cantor to the following extension of theorem (I):
(II) If $(b_\nu)$ is any set of rational or irrational numbers with the property that $\operatorname{Lim}_{\nu=\infty} (b_{\nu+\mu} - b_\nu) = 0$ (for any $\mu$), then there is a number b determined by a fundamental sequence $(a_\nu)$ such that $\operatorname{Lim}_{\nu=\infty} b_\nu = b$.
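The defining property of a fundamental sequence can be made concrete with a small computation. The following is only a sketch under my own assumptions: the sequence is generated by Newton's iteration for the square root of 2, which is not Cantor's example, and the names `fundamental_sequence` and `tail_gap` are hypothetical. It shows rational terms whose mutual differences shrink although no term squares to 2.

```python
from fractions import Fraction

def fundamental_sequence(n_terms):
    """First n_terms of a rational sequence approximating sqrt(2) (Newton's iteration, an assumed example)."""
    a = Fraction(1)
    terms = []
    for _ in range(n_terms):
        terms.append(a)
        a = (a + 2 / a) / 2  # stays rational at every step
    return terms

def tail_gap(terms, nu):
    """Largest |a_(nu+mu) - a_nu| over the computed terms beyond index nu."""
    return max(abs(t - terms[nu]) for t in terms[nu + 1:])

terms = fundamental_sequence(8)
# The gaps shrink as nu grows, which is the property required of a fundamental sequence.
gaps = [tail_gap(terms, nu) for nu in range(6)]
# No rational term is itself a square root of 2.
no_rational_root = all(t * t != 2 for t in terms)
```

A finite computation of this kind can only illustrate the condition; it cannot show that the limit exists, which is exactly the point at issue between Cantor and his critics.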
From (I) and (II) it emerges that the same numbers b that are defined on the basis of fundamental sequences $(a_\nu)$ – Cantor calls them fundamental sequences of the first order – and which are defined as limits of the $a_\nu$, can also be represented in various ways as limits of fundamental sequences $(b_\nu)$, where each of the $b_\nu$ is defined by a fundamental sequence of the first order $(a_\mu^{(\nu)})$ (with fixed $\nu$) [5, p. 188]. Accordingly, Cantor calls such a set $(b_\nu)$, if it has the property that $\operatorname{Lim}_{\nu=\infty} (b_{\nu+\mu} - b_\nu) = 0$ (for any $\mu$), a fundamental sequence of the second order. Furthermore, one may construct not only fundamental sequences of the third, fourth, . . . , nth order, but also fundamental sequences of the αth order, where α is any number of what Cantor refers to as the second number class.29 All fundamental sequences of higher order do the same job for the determination of a real number b as do the fundamental sequences of the first order. Hence, by appealing to fundamental sequences of higher order we do not introduce numbers which could not already have been determined through the fundamental sequences of the first order.30
29 Cantor’s first number class is the set of all finite ordinal numbers {ν}, which has the type ω (the smallest transfinite ordinal number). The second number class Z(ℵ0) is the set {α} of all order types α of well-ordered sets of cardinality ℵ0 and thus the set of all transfinite denumerable ordinal numbers. (ω is thus the smallest number of the second number class.) The type of Z(ℵ0) is the smallest non-denumerable ordinal number, its power the second smallest transfinite cardinal number ℵ1 (cf. [5, pp. 325, 331]).
30 See Cantor’s explanation [5, pp. 188 ff.] of why he thinks that his definition of the real numbers is suitable. Cf. also § 10 entitled ‘Die in einer transfiniten geordneten Menge enthaltenen Fundamentalreihen’ (‘The fundamental sequences contained in a transfinite ordered set’) of his ‘Beiträge zur transfiniten Mengenlehre’ [5, pp. 307 ff.].
4 Russell on quantities and real numbers in Principles of Mathematics and Principia Mathematica
In [39, p. 285], Russell attempts to undermine Cantor’s claim that his theory of irrational numbers renders his theorem (I) (to which I referred above) strictly demonstrable. According to Russell, Cantor’s supposed proof is fallacious precisely because it fails to show that a rational can be subtracted from a real number. Russell makes it clear that connected with every rational number a there is a real number defined by the fundamental sequence whose terms are all identical with a; if b is the real number defined by a fundamental sequence $(a_\nu)$, and if $b_\nu$ is the real number defined by a fundamental sequence whose terms are all equal to $a_\nu$, then $(b_\nu)$ is a fundamental sequence of real numbers whose limit is b. Russell points out that, contrary to what Cantor assumes, we cannot infer from this reasoning that $\operatorname{Lim} a_\nu$ exists. Asserting the existence of $\operatorname{Lim} a_\nu$ is justified only if $(a_\nu)$ has a rational limit. “The limit of a series of rationals either does not exist, or is rational; in no case is it a real number. But in all cases a fundamental series of rationals defines a real number, which is never identical with any rational” [39, p. 285]. Russell [39, pp. 270 ff.] defines the irrational numbers as classes of segments of rational numbers. According to his theory, we can define, with respect to a given rational number r, four infinite classes of rationals: (1) those less than r, (2) those not greater than r, (3) those greater than r, (4) those not less than r. Classes of rationals that have the property of (1) are called segments. A segment of rationals is defined as a class of rationals which is not empty, nor yet coextensive with the rationals themselves, and which is identical with the class of rationals x such that there is a rational y of the said class such that x is less than y. It is in this connection that Russell refers to G. Peano’s work Formulaire de Mathématiques (vol. II, 1899, part III, § 61). Moreover, in his letter to Frege of 20 February 1903, Russell observes that Frege’s criticism of the
arithmetical theory of irrational numbers in [21] seems sound. He adds that he himself has a purely arithmetical theory which is free of logical errors: “Let k be a class of rational numbers; I then call the class of all rational numbers smaller than at least one member of k the real number determined by k. Some hints of this theory are to be found in Peano” [25, p. 237]. In saying this, Russell refers first and foremost to Peano’s essay ‘Sui numeri irrazionali’ (Rivista di Matematica 6 (1896-99), pp. 126-140). Russell [39, p. 270] observes that in ‘Beiträge zur Begründung der transfiniten Mengenlehre’ (Mathematische Annalen 46 (1895), pp. 481-512) Cantor comes very close to his own theory of real numbers. As to Peano, Russell complains [39, pp. 274 f.] that Peano’s way of characterizing the relation between segments and irrational numbers lacks clarity. Another point he makes is that Peano goes astray in construing the real numbers as the limits of classes of rationals: a segment is in no sense a limit of a class of rationals [39, pp. 274 f.]. Although Peano maintains that a complete theory of irrational numbers can be constructed by appeal to segments, he does not seem to be aware of the philosophical reasons why this must be done. (See Russell’s arguments for the necessity of supplying such a construction in chapter XXXIV of his [39].) There is an interesting comment by Frege on Russell’s theory of irrational numbers in [39]. In his letter of 21 May 1903 to Russell, Frege recognizes this theory as “logically unassailable”, but makes one important proviso: it must be guaranteed that the word “class” has been given a proper meaning. From Frege’s point of view, this means that classes are to be conceived of as logical objects in accordance with his own conception of logical objects, and thus not as aggregates, systems or wholes consisting of parts (cf. [20, pp. 1-3]; [23, pp. 104 f.; 193 ff.]; [25, pp. 222 f., 225]).
He attributes the latter conception to Russell and argues that it does not allow a logical foundation for arithmetic. He writes [25, p. 239]: When you define an irrational number as a class of rational numbers, it is, of course, something different from what I call an irrational number according to my definition, although there is naturally a connection. It seems to me that you need a double transition: (1) from numbers to rational numbers, and (2) from rational to real numbers in general. I want to go at once from numbers to real numbers as ratios of quantities.
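Russell's definition of a segment, and of the real number determined by a class k, lends itself to a small illustration. What follows is a sketch under my own assumptions, not Russell's formalism: a class of rationals is represented by a membership test, k is taken to be finite, the segment properties are checked only on a finite sample, and the names `segment_of`, `below_sqrt2` and `behaves_like_segment` are hypothetical.

```python
from fractions import Fraction

def segment_of(k):
    """Membership test for the rationals smaller than at least one member of k (k finite here)."""
    members = list(k)
    return lambda x: any(x < y for y in members)

def below_sqrt2(x):
    """A class with no rational limit: negative rationals and those whose square is below 2."""
    return x < 0 or x * x < 2

def behaves_like_segment(member, samples):
    """On finite samples only: non-empty, not all rationals, and closed downwards."""
    inside = [x for x in samples if member(x)]
    outside = [x for x in samples if not member(x)]
    return bool(inside) and bool(outside) and all(x < y for x in inside for y in outside)

samples = [Fraction(n, 10) for n in range(-30, 31)]
seg_a = segment_of({Fraction(1, 2), Fraction(3, 4)})
seg_b = segment_of({Fraction(3, 4)})
# Different classes k can determine one and the same segment.
same_on_samples = all(seg_a(x) == seg_b(x) for x in samples)
```

The two finite classes determine the same segment, which has the rational 3/4 as its limit, whereas `below_sqrt2` describes a segment without a rational limit and so corresponds to an irrational number in Russell's sense.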
In Part VI, entitled “Quantity” of volume III of Principia Mathematica (1913), Russell presents his mature theory of quantities and real numbers which he had worked out in collaboration with Whitehead. As to their treatment of quantities, they proceed from a simple guiding principle: “No quantity of any kind without a comparison of different quantities of that kind” (p. 261). In section B, Russell and Whitehead concern themselves with “kinds” of quantity: masses, spatial distances, velocities. They regard each kind of quantity as what they call a “vector-family”. A vector-family is a class of
one-one relations all of which have the same converse domain and, moreover, have their domain contained in their converse domain. Russell and Whitehead argue that in a case that relates to spatial distances, the applicability of this view is obvious; concerning masses, the view is said to become applicable by considering, for example, one gramme as + one gramme, that is, as the relation of a mass m to a mass m′ when m exceeds m′ by one gramme. What is commonly called one gramme will then be the mass which has the relation + one gramme to the zero of mass. Section C is dedicated to measurement, that is, to the discovery of ratios (see the definition of ratios *303.01 on p. 260) or of the relations expressed by real numbers, between the members of a vector-family. A vector-family is measurable if it contains a member T (the unit) such that any other member S stands to T in a relation which is either a ratio or a real number. Section D is concerned with cyclic families of vectors, such as angles or elliptic straight lines. When Russell and Whitehead come to consider the real numbers as opposed to ratios, they point out that the former are said to be required first and foremost to obtain a Dedekindian series, so as to secure limits to sets of rationals having no rational limit. If rationals and irrationals are to form one series, it is necessary to give some definition of “rationals” other than “ratios”, since the series of ratios (assuming the acceptance of the axiom of infinity) is not Dedekindian, and is not part of any arithmetically definable Dedekindian series. Whitehead and Russell stress (p. 316, *310) that the properties which real numbers must possess from their point of view, will be forthcoming if they are identified with segments of H (see the explanation below), and if segments of the form H ′ X, that is, segments which have ratios as limits, will be termed ‘rational real numbers’.
They likewise emphasize in this context that hardly any of the properties of real numbers can be proved without acknowledging the axiom of infinity. “Thus H ′ X is the rational real number corresponding to the ratio X, and a real number in general is of the form H ′′ λ, where λ is a class of ratios. H ′′ λ will be irrational when λ has no limit or maximum in H” [56, p. 316]. Note that for the relation “less than” among rationals of a given type (excluding $0_q$), Whitehead and Russell use the letter “H”, “to suggest η . . . because, if the axiom of infinity holds, the series of rationals of a given type is an η” (p. 278); cf. the definition of “H” on p. 278, *304.02. Following Cantor, the authors use “η” for the class of rational series (cf. the definition *273.01 on p. 202) and define a rational series as one which is compact, has no beginning or end, and has ℵ0 terms in its field. Consequently, the field of rational series can be arranged in a progression, and this is the source of the special properties by means of which rational series can be distinguished from other compact series (cf. p. 199).
5 Quantities and real numbers in Grundgesetze
5.1 Informal considerations
I shall now turn to Frege’s theory of real numbers in [21]. Going through its details will provide an appropriate idea of his sustained efforts; the theory did not just fall into his lap. In §§ 157-164, we find a series of preliminary considerations, followed by the formal construction of analysis under the heading “Grössenlehre” (“theory of quantity”). In section A, Frege proves the associative and commutative laws for the composition of extensions of relations. The latter, unlike the former, are not generally valid. Section B provides the definitional introduction of the quantitative domain, of the concept positival class, as well as the derivation of several theorems – among others, devoted to greater than and less than in a positival class. In section Γ, Frege defines the upper limit in a positival class; in section ∆, he defines the concept positive class and proves Archimedes’ Axiom. Section E contains the proof of the commutative law in a positive class. Finally, in section Z, Frege proves the commutative law in the domain of a positive class. Having completed this proof, Frege rather abruptly breaks off the logical construction of analysis, which he apparently had hoped to bring to a happy ending in a third Grundgesetze volume. The reason is Russell’s discovery of a contradiction in Frege’s logical system.
In a first preliminary reflection, Frege discusses the reason why, in his view, the domain of the cardinals cannot be extended to that of the reals. The cardinal numbers are not ratios, and must therefore be distinguished from the positive integers. Surely, we use cardinal numbers to count, and the domain of what is countable is, according to Frege, the widest domain of all. In fact, he considers it to be all-embracing, because everything thinkable belongs to it.
Cardinal numbers, by their very nature, provide answers to questions of the type “How many objects of a certain kind are there?” By contrast, the reals are to be construed as ratios of quantities;31 they measure how large a given quantity is compared with a unit quantity. Thus, in Frege’s view, the mode of application of the reals differs fundamentally from that of the cardinals.32 And just as in [18] he attempted to account for the application of the natural numbers in counting in their definition, so in [21] he takes pains to ensure that the application of the real numbers in measurement is appropriately built into their definition.
31 In [21, § 157 (footnote 2)] Frege approvingly mentions Newton when he emphasizes again that he has construed the real numbers as ratios of quantities and has thus pointed to the quantities as those objects between which such a ratio holds.
32 Currie [8, p. 349] claims that the standpoint of [20, 21] presupposes a sharp distinction – both methodological and ontological – between the natural and the real numbers. Only the first half of this claim is correct. According to Frege, both cardinal numbers and real numbers are to be defined as courses-of-values, the cardinals as equivalence classes of equinumerosity and the reals as Relations of Relations.
It is noteworthy that the f -relation in which a cardinal number stands to its successor differs from the relation ξ + 1 = ζ. The former only holds between cardinal numbers, while the latter holds also between numbers other than positive integers. Hence, the formula “aS(bSf )” – “b directly follows a in the series of cardinal numbers” – cannot be replaced by “a+1 = b”. By inverting ξ + 1 = ζ, guided by the positive integers, we can go back via zero to the negative numbers. However, going back beyond the cardinal number 1 is impossible. This motivates Frege to distinguish between the cardinal numbers 0 and 1 and what he simply calls the numbers 0 and 1.33 Frege likewise stresses that his way of introducing the real numbers cannot be grounded on geometrical configurations.34 Taken literally, this remark goes practically without saying, for if the theory of real numbers rested intrinsically on geometrical constructions or configurations, then the logicist thesis could hardly apply to analysis, contrary to Frege’s own express opinion. To be sure, both the second and an envisioned third Grundgesetze volume were designed to establish the logicist thesis for the theory of real numbers as well, bearing in mind that Frege presumably considered cardinal arithmetic to be the paradigm case and at the same time the acid test for logicism. If the real numbers are construed as ratios of quantities, then they could not be distances, for example. For it is imperative to distinguish between a distance and the measurement number that belongs to it in proportion to a unit distance. A numerical symbol does not denote a distance; it does not refer to anything geometrical. The same ratio of quantities associated with distances is also present in other types of quantities, for example, in masses, angles, volumes, electric charges, light-intensities, time periods, velocities, moments of inertia, forces, curvatures, etc. 
Thus, the application of the real numbers is not restricted to any special kinds of quantity (for example, to geometrical ones), but rather relates to the domain of the measurable, which encompasses all kinds of quantity whatsoever. In the course of discussing Cantor’s theory of irrational numbers, Frege accordingly mentions two advantages that the conception of real numbers as ratios of quantities can lay claim to [21, p. 85]:
If we take a closer look, we note that a numerical sign, taken by itself [für sich allein], cannot denote a length, a force etc., but only in connection with the designation of a measure, a unit, such as a metre, a gram, etc. What, then, does the numerical sign, taken by itself, [thereby] denote? Obviously a ratio of quantities . . . If now by ‘number’ we understand the reference [Bedeutung] of a numerical sign, then a real number is the same as a ratio of quantities. Now what have we gained by defining real number as ratio of quantities? [The second emphasis is mine.] At first it seems only that one expression has been replaced by another. This is, nevertheless, a step forward. Firstly, no one will confuse a ratio of quantities with a written or printed sign; and thus a source of countless misunderstandings and errors is blocked. Secondly, with the expression ‘ratio of quantities’ or ‘ratio of a quantity to a quantity’ we indicate the mode in which real numbers are linked with quantities. Of course, the main work remains to be done. Initially, we have merely words that indicate to us only approximately the direction in which the solution is to be sought. The meaning [Bedeutung] of these words has yet to be fixed more precisely. But we shall even from now on no longer say that a number or numerical sign denotes, now a length, now a mass, now a light-intensity. We shall say, rather, that a length can have to a length the same ratio as a mass has to a mass, or as a light-intensity has to a light-intensity; and this same ratio is the same number and can be denoted by the same numerical sign.
33 Frege introduces here a special notation for the designation of cardinal numbers. I shall ignore it henceforth.
34 “For if arithmetical sentences can be proved independently of geometrical axioms, then they must be so proved” [21, § 158].
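Frege's closing remark, that a length can have to a length the same ratio as a mass has to a mass, can be mirrored in a toy model. The sketch below rests on assumptions of my own and is purely illustrative: it represents quantities by rational magnitudes, which Frege's construction does not presuppose, and the `kind` labels as well as the names `Quantity` and `ratio` are hypothetical.

```python
from dataclasses import dataclass
from fractions import Fraction

@dataclass(frozen=True)
class Quantity:
    kind: str         # illustrative label such as "length" or "mass"
    amount: Fraction  # magnitude relative to an arbitrary reference of that kind (an assumption)

def ratio(a, b):
    """Ratio of one quantity to another of the same kind; the result carries no kind."""
    if a.kind != b.kind:
        raise ValueError("a ratio holds only between quantities of one kind")
    return a.amount / b.amount

# A length stands to a length as a mass stands to a mass.
length_ratio = ratio(Quantity("length", Fraction(3, 2)), Quantity("length", Fraction(1, 2)))
mass_ratio = ratio(Quantity("mass", Fraction(6)), Quantity("mass", Fraction(2)))
same_number = length_ratio == mass_ratio
```

The value returned by `ratio` is a bare number without any kind attached, which corresponds to Frege's claim that the same numerical sign denotes the same ratio whatever the kind of quantity involved.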
Two comments may be in order here. (a) To note that a numerical sign, taken by itself, cannot denote a length, etc. does not require that we look more carefully at its use; this fact is obvious. In any event, the initial statement in the quoted passage does not imply that a numerical sign, taken by itself, does not denote anything. Thus, in the relevant context it is consistent for Frege to claim further that a numerical sign, taken by itself, denotes a ratio of quantities (a number). Now, the second sentence in the original German reads as follows: “Was bedeutet nun dabei das Zahlzeichen allein?” (My emphasis; I render here the word “allein” in the sense of the phrase “für sich allein” .) At least at first glance it is not clear what the word “dabei” is intended to mean in this context. Is it to mean “in Verbindung mit der Bezeichnung eines Maasses. . . ” (“in connection with the designation of a measure. . . ”) – which is the very phrase that Frege employs in the preceding sentence – or is it merely a dispensable filler, that is, a word whose sense is not intended to contribute anything essential to the expression of the thought? I tend to vote for the second option. The numerical sign, taken by itself, denotes a ratio of quantities. Plainly, it makes little sense to assume that in the quoted sentence Frege uses the word “dabei” with the intention to refer to the case in which a numerical sign is considered just by itself, since this would amount to asking: What does a numerical sign, taken by itself, denote when taken by itself? 
(b) It is worth pointing out that in raising the question “What, then, does the numerical sign, taken by itself, denote?” Frege appears to ignore his dictum in [21, § 97]: “One may ask about meanings only where the signs are constituents of sentences which express thoughts.” (“Nach Bedeutungen kann nur gefragt werden, wo die Zeichen Bestandtheile von Sätzen sind, die Gedanken ausdrücken.”) This dictum, which Frege states in the course of criticizing Thomae’s game formalism, is immediately reminiscent of the
context principle in [18], especially when this is clad in the garb of a methodological maxim: “The meaning of words must be asked for in the context of a sentence, not in isolation” (p. XXIII). It is chiefly for reasons of space that I do not pursue this conflict further here. Suffice it to observe that Frege did not dismiss the context principle from his mind; it is still in force in [20, 21], although it is no longer highlighted as a guiding principle of his foundational project (cf. [42] and [46]).35
As I mentioned at the very outset of this essay, Frege’s method of introducing the real numbers lies between the traditional geometrical approach and the new theories developed by Cantor, Weierstraß, and Dedekind. Concerning the geometrical foundation, Frege retains the characterization of the real numbers as ratios of quantities, but, following a key idea of the new theories, he detaches them from all special kinds of quantity. The application of the real numbers in measuring quantities may not, of course, simply be externally patched onto them, because we would then have to state separately for each kind of quantity how the measurement is to be carried out, and we would lack general criteria for the applicability of the real numbers as measurement numbers. Frege mentions one serious doubt that might be raised against the middle course he has adopted. If the positive square root of 2 is a ratio of quantities, then it seems indispensable for its definition that it provide quantities which in fact stand in this relation to one another. The question is how this can be accomplished if appeal to geometrical or physical quantities is inadmissible. One still needs a ratio of quantities like the positive square root of 2 because otherwise our use of the symbol “√2” would not be justified. In Frege’s view, offering a solution to this difficulty requires that we elucidate the meaning of the word “quantity”.
35 Whenever it is beyond doubt that Frege uses the word “Bedeutung” in the technical sense that he associates with it in his official semantics of the Sinn and the Bedeutung of linguistic expressions, I render it as “reference” (“denotation” might be another option). Note that in his writings after 1892 Frege’s use of the word “Bedeutung” on a few occasions differs from the technical sense which vastly predominates in them. In the logical investigation ‘Der Gedanke’, for example, he writes: “Die Bedeutung des Wortes ‘wahr’ scheint ganz einzigartig zu sein” [23, p. 345]. I believe that in this context “Bedeutung” is used in a neutral, nontechnical sense; it should therefore be rendered as “meaning”: “The meaning of the word ‘true’ seems to be altogether unique.” Note that nowhere in his writings does Frege say anything about the reference of “true”; he confines himself to emphasizing the special, “non-contributory” sense of this word whenever it occurs in standard linguistic environments such as “The thought that p is true” or “It is true that p”. Bearing all this in mind, I decided to translate the second occurrence of “Bedeutung” in the quotation above as “meaning”, because I presume that Frege does not use it here strictly in the technical sense. In this context, “Bedeutung” might even be meant to comprise, possibly in a somewhat loose sense, the two semantic components sense and reference. Analogous remarks apply to Frege’s dictum in [21, § 97].
We are told that all previous attempts to define the term “quantity” have miscarried. It is likely that this claim is meant to include Frege’s
own earlier suggestion (in [16]) of how the notion of quantity should be explicated, namely by stating the conditions under which identity of quantity holds [16, p. 51]. In a moment, we shall see that one of the criticisms that Frege levels against previous and current attempts to define or explicate the concept of quantity, is that the use of the phrase “of the same kind” proves to be an idle wheel. Yet it is precisely this phrase that occurs in Frege’s early explanation of what type of property a quantity of a certain kind such as length, force, mass or velocity would be, if the determination of the general concept of quantity were to proceed via the formulation of identity conditions for quantities. Recall that it would be a property in which a group of things, independently of their internal structure, can agree with a single thing of the same kind. In short, if in [21] Frege still fully subscribed to this early proposal, it would be hard to understand why he did not explicitly exempt it from his wholesale rejection of all attempts to define “quantity”. Be this as it may, his principal objection is that the term “quantity” is usually explained with the help of another term that stands in equal need of explanation as the explicandum itself. As a consequence, we are left with no better an understanding of the term “quantity” than before. Frege illustrates this by means of some examples, but mentions only Otto Stolz and Hermann Hankel by name. If one looks at the attempted explanations of the term “quantity”, Frege says, one often comes across the word “of the same kind” or “homogeneous” (“gleichartig” ) or the like. It is required of quantities that those of the same kind can be compared, added and subtracted, also that a quantity can be decomposed into parts of the same kind. 
Frege’s objection is that here the phrase “of the same kind” does not explain anything, for things can be of the same kind in one respect, but of different kinds in another (which, to my mind, is almost trivially true).36 Yet the fact that we cannot decide unambiguously whether an object is of the same kind as another goes against his logical requirement of sharp delimitation of a concept or a relation. Others, Frege points out, define the concept of quantity by using the words “greater” and “smaller” or “increase” and “diminish”, but they are said to spare themselves the trouble of explaining in what the relation of being greater or smaller or the activity of increasing or diminishing consists. The same is said to apply to the use of the words “addition”, “sum”, “duplicate”.
36 In the section entitled “The Mathematical-Sublime” of his Kritik der Urteilskraft (Critique of the Power of Judgement) [36, p. 92], Kant says (I slightly paraphrase): That something is a quantity can be recognized by restricting one’s attention to the thing itself, without comparing it at all with other things, namely when plurality of that which is homogeneous constitutes a unit. However, to determine how large it is always requires something else for its measure, which likewise is a quantity.
5.2 Interlude: The concept of quantity/magnitude in the work of Euclid, Aristotle, and Euler – some remarks

In what follows, I shall take a look at the expositions of the concept of magnitude or quantity in the work of Euclid, Aristotle, and Euler and consider briefly Frege’s possible response to them. Frege was undoubtedly familiar with Euclid’s Elements and thus also with Book V in which Euclid deals with the concept of magnitude (μέγεθος). It is perfectly possible and perhaps even probable that he also knew the relevant passages in which Aristotle explicates the concept of quantity (ποσὸν). Finally, I take it to be rather likely that Frege knew (at least part of) the work Vollständige Anleitung zur Algebra (1770) of the famous mathematician Leonhard Euler. At the very beginning of it, Euler gives an explication of the notion of quantity (Grösse) in three parts. When Frege jettisons across the board all previous and present explanations of the notion of quantity in mathematics, he may also have had in mind Euclid or Aristotle or Euler or even the exposition of that concept by all three of them.

To all appearances, Frege greatly admired Euclid’s work on the foundations of mathematics for its methodological rigour and its groundbreaking results. This is already evident from the fact that he unconditionally endorsed Euclid’s theory of geometry and declared his conception of axioms to be sacrosanct. I mentioned earlier (in 2.2) that in Book V of the Elements Euclid does not define the concept of magnitude, but rather analyzes its properties and structure by setting up a group of definitions and by subsequently proving a number of propositions involving the concepts of magnitude, of ratio, of multiple, of proportion, of proportional, etc. At the outset of Book V, Euclid states altogether eighteen definitions.
Regarding the very first definition, one may have expected a definition of “magnitude”, but learns instead under which condition a magnitude is a part of a magnitude.

Definition 1: Μέρος ἐστὶ μέγεθος μεγέθους τὸ ἔλασσον τοῦ μείζονος, ὅταν καταμετρῇ τὸ μεῖζον. A magnitude is a part of a magnitude, the less of the greater, when it measures the greater.

Definition 3 defines the concept of a ratio: Λόγος ἐστὶ δύο μεγεθῶν ὁμογενῶν ἡ κατὰ πηλικότητά ποια σχέσις. A ratio is a sort of relation in respect of size between two magnitudes of the same kind.

Definition 4 is as follows (it was already quoted earlier in 2.2): Λόγον ἔχειν πρὸς ἄλληλα μεγέθη λέγεται, ἃ δύναται πολλαπλασιαζόμενα ἀλλήλων ὑπερέχειν. Magnitudes are said to have a ratio to one another which are capable, when multiplied, of exceeding one another.
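Definition 4 is commonly read as an Archimedean condition. The following gloss in modern notation is only a sketch and not Euclid’s own formulation; it assumes that a and b are magnitudes of the same kind and that m and n range over the positive integers:

```latex
\[
  a \text{ and } b \text{ have a ratio to one another}
  \;\Longleftrightarrow\;
  \exists m\, \exists n\, \bigl( m \cdot a > b \;\wedge\; n \cdot b > a \bigr).
\]
```

For two line segments of lengths 1 and 5, for instance, the condition is met with m = 6 and n = 1, since 6 · 1 > 5 and 1 · 5 > 1.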
Definition 5 stipulates what it is for magnitudes to be in the same ratio, while Definition 6 introduces the term “proportional” for magnitudes that have the same ratio. Definitions 12–16 are concerned with ratios; they determine the meaning of the terms “alternate ratio”, “converse ratio”, “composition of a ratio”, “separation of a ratio”, “conversion of a ratio”.37 After having set up the group of eighteen definitions, Euclid proceeds to prove twenty-five propositions starting with the proof of the proposition “If there be any number of magnitudes whatever which are, respectively, equimultiples of any magnitudes equal in multitude, then whatever multiple one of the magnitudes is of one, that multiple also will all be of all” and ending with the proof of proposition 25 “If four magnitudes be proportional, the greatest and the least are greater than the remaining two”. So much for Euclid on the concept of magnitude.

Let us now cast a glance at Aristotle’s conception of quantity. Aristotle deals with the concept of quantity (ποσὸν, which literally means: how much) briefly in his Metaphysics and more extensively in the Categories. I begin with his exposition in the Metaphysics (1020a). I first quote from the original Greek text and shall then present the English translation by W.D. Ross:38

Ποσὸν λέγεται τὸ διαιρετὸν εἰς ἐνυπάρχοντα ὧν ἑκάτερον ἢ ἕκαστον ἕν τι καὶ τόδε τι πέφυκεν εἶναι. Πλῆθος μὲν οὖν ποσόν τι ἐὰν ἀριθμητὸν ᾖ, μέγεθος δὲ ἂν μετρητὸν ᾖ. λέγεται δὲ πλῆθος μὲν τὸ διαιρετὸν δυνάμει εἰς μὴ συνεχῆ, μέγεθος δὲ τὸ εἰς συνεχῆ· μεγέθους δὲ τὸ μὲν ἐφ' ἓν συνεχὲς μῆκος τὸ δ' ἐπὶ δύο πλάτος τὸ δ' ἐπὶ τρία βάθος.

We call a quantity (ποσὸν) that which is capable of being divided into two or more constituent parts of which each is by nature a one and a “this”. A quantity is a plurality (πλῆθος) if it is numerable, a magnitude (μέγεθος) if it is measurable. We call a plurality that which is divisible potentially into non-continuous parts, a magnitude that which is divisible into continuous parts.
Thus, Aristotle considers both the measurability and the divisibility into continuous parts to be defining properties of a quantity qua magnitude. 37 One special aspect concerning Euclid’s treatment of the concept of magnitude in Book V is that it explains how to work with ratios of lengths, areas, etc. without defining numbers which occur in these ratios. 38 Speaking for myself, I consider it essential to include, at least in some places, the original Greek text. As a rule, in my work I rely on original texts whenever I can. One of the reasons is that I never unconditionally trust any translation of a philosophical text. The available standard English translations of Frege’s works, for example, contain numerous mistakes (actually too many for my taste) which distort the meaning of the original. Another more practical reason is that a reader who is interested in casting at least a glance at the original text will presumably appreciate the comfort of having immediate access to it.
Frege does not follow Aristotle in distinguishing between two basic kinds of quantity: plurality and magnitude. For Frege, a quantity is, by its very nature, measurable. I assume that Frege would be reluctant to accept Aristotle’s explanation of “quantity”; he would definitely not adopt it as his own, although Aristotle does not speak of the divisibility (or decomposition) of a magnitude into parts of the same kind, but rather of its divisibility into continuous parts.

Unlike the Metaphysics, Aristotle’s work Categories (4b) does not yet provide an explication (definition) of the term “quantity”. Here Aristotle confines himself to distinguishing between discrete and continuous quantities and further between quantities that are composed of parts which have position in relation to one another and quantities that are not composed of parts which have position in relation to one another. He mentions number and language as instances of discrete quantities, and lines, surfaces, bodies, time and place as instances of continuous quantities. In the original text, it reads as follows:

Τοῦ δὲ ποσοῦ τὸ μέν ἐστι διωρισμένον, τὸ δὲ συνεχές· καὶ τὸ μὲν ἐκ θέσιν ἐχόντων πρὸς ἄλληλα τῶν ἐν αὐτοῖς μορίων συνέστηκε, τὸ δὲ οὐκ ἐξ ἐχόντων θέσιν. ἔστι δὲ διωρισμένον μὲν οἷον ἀριθμὸς καὶ λόγος, συνεχὲς δὲ γραμμή, ἐπιφάνεια, σῶμα, ἔτι δὲ παρὰ ταῦτα χρόνος καὶ τόπος.
In what follows, Aristotle attempts to explain each of these categories of quantities. A quantity is further characterized quite generally as something that has no contrary and, furthermore, as something that does not seem to admit of a more and a less. Finally, the most important distinguishing mark of a quantity is, Aristotle claims, its being called both equal and unequal.

Ἔτι τῷ ποσῷ οὐδέν ἐστιν ἐναντίον . . . Οὐ δοκεῖ δὲ τὸ ποσὸν ἐπιδέχεσθαι τὸ μᾶλλον καὶ τὸ ἧττον, οἷον τὸ δίπηχυ . . . Ἴδιον δὲ μάλιστα τοῦ ποσοῦ τὸ ἴσον τε καὶ ἄνισον λέγεσθαι.
As to the last point, I am not exactly sure what Aristotle has in mind. I presume that he wants to convey that a quantity can be said to be equal or unequal to another quantity of the same kind, as the case may be, and that this is the most significant feature of any quantity and of paramount importance for apprehending the concept of quantity.39

39 In a written comment, Michael Scanlan agrees with me that “τὸ ἴσον” is best rendered as “equal” and not as “identical”. He points out that Aristotle is talking about the amount of things, a group of men or the length of a given line. Here one group of men is equal or unequal to another in number, one line is equal or unequal to another in length. Scanlan suggests that here we have an equivalence relation, not an identity. He writes: “Any time that two objects share some, but not necessarily all, properties, then I prefer to talk of an equivalence relation. One way to do this is to treat identity as one among many other equivalence relations. This is definitionally smoother, but outside a formal system, I think it is clearer to save “equivalence” for talking about relations in which a and b share some, but not all, characteristics. Thus, we have an equivalence relation of sameness in height among a group of people, who are all distinct individuals, or in geometry we can think of two distinct triangles that are equivalent with respect to the properties involved in congruence.” Scanlan proposes that Aristotle seems to be thinking of equivalence relations in this latter sense in his use of the word “equal” in the chapter on quantity in the Categories.

Let me end this interlude with a few words on Leonhard Euler’s concept of quantity. The first sentence of Euler’s work Vollständige Anleitung zur Algebra (1770) begins with an explanation of the concept of quantity:

1. Erstlich wird alles dasjenige eine Größe genannt, was einer Vermehrung oder Verminderung fähig ist oder wozu sich noch etwas hinzusetzen oder wovon sich etwas hinwegnehmen läßt . . .
2. Es giebt sehr viele verschiedene Arten von Größen, welche sich nicht wohl aufzählen lassen; und daher entstehen die verschiedenen Theile der Mathematik, deren jeder mit einer besonderen Art von Größen beschäftigt ist. Die Mathematik ist überhaupt nichts anderes als eine Wissenschaft der Größen, welche Mittel ausfindig macht, wie man letztere ausmessen kann.
3. Es läßt sich aber eine Größe nicht anders bestimmen oder ausmessen, als daß man eine andere Größe derselben Art als bekannt annimmt, und das Verhältniß angiebt, in dem diese zu jener steht.

Here is my English translation:

1. In the first place, everything is called a quantity which is capable of an increase or a decrease or to which something can be added or from which something can be taken away . . .
2. There are very many different kinds of quantities which cannot be properly enumerated; and it is for this reason that the different parts of mathematics emerge, each of which is concerned with a special kind of quantity. Mathematics is indeed nothing but a science of quantities that traces ways in which one can measure the latter.
3. However, a quantity can be determined or measured only by assuming that another quantity of the same kind is known, and by giving the ratio in which the first stands to the second.

As I have already said, it is possible that Frege also had Euler’s explication of the concept of quantity (in section 1 of the quotation above) in mind when he criticized those explanations of this concept in which the terms “to increase” and “to diminish” occur essentially. Note that what Euler says in section 2 is reminiscent of and basically in line with certain remarks of Frege’s in [16]. As to Euler’s section 3, I suppose that Frege would have basically agreed that a quantity can be measured only by giving the ratio in which it stands to another quantity, but would probably have criticized Euler’s use of the words “of the same kind”.40

40 I assume that an explanation of “quantity” such as the one listed in The Encyclopedia of Philosophy (ed. P. Edwards, vol. 5, Macmillan Publishing Co., New York, London 1967, p. 242): “If one thing can be said to be greater than, equal to, or less than another in a certain respect, then this respect may be called a quantity” would likewise have fallen prey to Frege’s critique.
So much for my comments on Euclid, Aristotle, Euler and Frege’s possible reaction to their views of the notion of quantity. It is time to return to Frege’s treatment of the concept of quantity in [21].
5.3 Informal considerations continued

Frege considers the source of the failures concerning the explications of the concept of quantity to lie in the manner in which the fundamental question is posed. Instead of asking: “What properties must an object have in order to be a quantity?”, we ought to ask: “How must a concept be constituted if its extension is to be a quantitative domain?”41 If we substitute “class” for “extension of a concept”, then the question becomes: “What properties must a class possess in order to be a quantitative domain?” A thing is not a quantity taken by itself, but only in so far as it belongs, with other objects, to a class which is a quantitative domain. For the sake of convenience, Frege expressly disregards absolute quantities. He confines himself to considering only those quantitative domains in which a contrast [Gegensatz] occurs, to which the contrast of the positive and the negative corresponds when it comes to dealing with measurement numbers (cf. [21, § 162, p. 159]).

It is in this connection that Frege refers approvingly to Gauss and quotes a longer passage from him (Gauss, works, vol. II, p. 170). In it, Gauss proceeds from the observation that positive and negative numbers can only be applied where that which has been counted [das Gezählte] has an opposite or a contrary [ein Entgegengesetztes]. He points out that, on closer examination, this presupposition applies only where it is not substances (that is, objects conceivable for themselves) [für sich denkbare Gegenstände], but rather the relations, each holding between two objects, that are taken as that which has been counted. It is thereby postulated that these objects are, in a certain way, ordered in a series S, for example, A, B, C . . . and that the relation of A to B can be regarded as being equal to the relation of B to C, etc.
Gauss goes on to say that to the concept of opposition [Entgegensetzung] belongs only the permutation [Umtausch]42 of the members of the relation, such that if the relation of A to B is considered to be +1, the relation of B to A must be represented as −1. Thus, insofar as such a series S is unbounded on both sides, every real number represents the relation of a member, arbitrarily chosen as the beginning (of the series), to a given member of S. Frege signals that he basically agrees with this thought, but adds that (for his own purpose) he leaves out the limitation to the integers and prefers to replace the phrase “that which has been counted” with the phrase “that which has been measured” (“das Gemessene”). Moreover, he points out that unlike Gauss he considers the equality of the relations to be definable without regard to certain objects that may stand to one another in the relation. If a relation is given in which A stands to B, then – so Frege argues further – it is at the same time determined whether B stands to C and C stands to D in this same relation, and this yields automatically an ordering of objects in a series.

41 More literally, but perhaps less elegantly: “. . . so that its extension is a quantitative domain.” Another option is: “. . . for its extension to be a quantitative domain.” Frege [21, p. 158, footnote 2] acknowledges that Stolz makes a move in this direction when he writes that a quantitative concept is a concept of such a kind that any two of the objects falling under it are explained as being equal or unequal. Frege argues that the sentence that immediately follows this explanation (or definition), namely: “In other words: ‘Quantity means every object which should be set equal to or unequal to another object’ ” [“Mit andern Worten: ‘Grösse heisst jedes Ding, welches einem andern gleich oder ungleich gesetzt werden soll’ ”], shows that Stolz abandons his (initially promising) attempt.

42 I presume that the term “permutation” fits the bill here; “interchange” may be another option.

Let me make a couple of brief remarks on this. First, Gauss employs the term “Relation”, not the term “Beziehung”, but he obviously uses the first in the usual sense of the second. In his comment on Gauss, Frege uses the two terms indiscriminately in the sense of “Beziehung”. It is only a few lines later (in the concluding passage of § 162) that he introduces “Relation” as a shorthand expression for “Umfang einer Beziehung” (“extension of a relation”). Second, we know from Frege’s remarks elsewhere in his work that identity of relations cannot, at least in a strict sense of identity, be defined, contrary to what he claims in his comment on Gauss. The reason is that Frege considers identity to be a relation that holds only between objects. However, he claims (cf. [24, p. 131]) that there is a second-level relation between (first-level) concepts which is akin to the first-level relation of identity, namely the mutual subordination or coextensiveness of first-level concepts.
Note that in this connection Frege does not mention the analogous case of the coextensiveness of (first-level) relations. In his letter to the American mathematician Huntington which I mentioned earlier in section 2, Frege draws attention to some features of his own theory of real numbers and makes a few critical comments on the theory of magnitudes and real numbers that his addressee had presented in a trias of short articles (see [31, 32, 33]). It is his theory of real numbers in Grundgesetze where Frege sees the points of contact with Huntington’s foundational work. I am just now busy with the printing of the second volume of my Basic Laws of Arithmetic, which partly contains some considerations similar to the ones in your papers, especially with respect to Archimedes’ Axiom and the commutative principle, even though our points of departure are different.
Frege goes on to point out that in his forthcoming work he raises the (fundamental) question “What properties must a class have in order to be a quantitative domain?” and adds that there are some points where his treatment of analysis also diverges from Huntington’s. Moreover, he claims greater simplicity for his account; he writes [25, p. 89]:

I too take into account at once the contrast between positive and negative, taking from Gauss the hint that this contrast occurs only among relations. The question now arises in this form: What properties must a class of Relations have in order to be a quantitative domain? This way of putting the question is somewhat simpler than yours, because something corresponding to your ‘rule of combination’ is given from the outset, namely the composition of Relations, which was already defined in the first volume. You take two things into account: the class or ‘assemblage’ and the ‘rule of combination’ in this class, and this rule is not given through the class. This is also a point where, from a logical point of view, your account does not seem to be perfectly correct.
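The role that the composition of Relations plays as a built-in “rule of combination” can be pictured with a small sketch. It is an illustration only and not Frege’s formal definition: Relations are modelled as finite Python sets of ordered pairs, the names `converse`, `compose` and `step` are hypothetical, the order of composition is an assumption, and the finite window merely stands in for a series that is unbounded on both sides.

```python
def converse(rel):
    """Swap the two members of every ordered pair in the relation."""
    return {(y, x) for (x, y) in rel}


def compose(p, q):
    """Pairs (x, z) such that x stands in p to some y and y stands in q to z.

    The order of composition is an assumption of this sketch.
    """
    return {(x, z) for (x, y1) in p for (y2, z) in q if y1 == y2}


def step(k, window=range(10)):
    """Hypothetical Relation 'k places further on in the series',
    cut down to a finite window of positions."""
    return {(x, x + k) for x in window if x + k in window}


empty = set()  # the one and only empty Relation

# Composing 'one place on' with 'two places on' gives 'three places on',
# so composition behaves like addition of the corresponding numbers.
print(compose(step(1), step(2)) == step(3))

# The converse behaves like a change of sign.
print(converse(step(2)) == step(-2))

# The empty Relation is unaffected by converse and composition.
print(converse(empty) == empty and compose(empty, converse(empty)) == empty)
```

On this picture a Relation is fixed by the pairs it contains, so the operation of composing two of them does not have to be supplied separately from outside.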
The quantities to be considered by Frege are “Relationen”. Henceforth, I use the term “Relation(s)” with a capital “R” as a shorthand for “extension of a relation” or “extensions of relations”.43 Accordingly, in the quotation above I have rendered Frege’s term “Relationen” as “Relations”, with the only exception of its occurrence in the first sentence where he refers to Gauss. While in [16] Frege somewhat vaguely characterized a quantitative domain as the multiplicity enclosed within the scope of a quantitative kind (for example, length), conceived of as a property in which a group of things can agree with a single thing of the same kind, he now specifies quantitative domains more succinctly and more precisely as classes of Relations, that is, as extensions of concepts subordinate to the concept Relation. Thus, both the quantities themselves and the quantitative domains are now, in pursuit of the logicist programme, taken to be logical objects à la Frege par excellence.

43 Frege sees no need to introduce a special axiom governing double courses-of-values, and R. Heck [28, pp. 283 f.] explains correctly why this is so. Note in this context that the terms for double courses-of-values can be formed by means of the notation available for the “simple” courses-of-values introduced in [20, § 9]; cf. § 36.

Frege lays down that the converse of what he calls “sign” (“Vorzeichen”) corresponds to the converse of the relation (K and UK). The addition of the measurement numbers corresponds to the composition of Relations (KLΠ). Hence, the symbol “U” is comparable to the minus sign and “L” is comparable to the sign for addition. The formula “ALUB” corresponds to “a − b”, and the formula “ALUA” corresponds to the null sign. (See the definitions of the converse of a relation Up and the composite relation pLq in section 5.4.)

In [21, § 164], Frege addresses the question of where quantities whose ratios are irrational numbers might be found. They will have to be nonempty Relations, that is, the required quantities must not be extensions of those (first-order) relations in which no objects stand to one another. For it is plain that such relations are coextensive; there is only one empty Relation. Yet no real number can be defined by means of the empty Relation. If q is the empty Relation, then both the converse of q and the composition of q with its converse coincide with q. Furthermore, the composition of Relations of the quantitative domain under consideration must not yield the empty Relation. However, this would be the case if there were no object ∆ to which an object would stand in the first Relation and which would stand in the second Relation to an object. The upshot so far is obvious: “We thus need a class of objects which stand to one another in the Relations of our quantitative domain, and in fact this class must comprise infinitely many objects” [21, § 164]. Frege observes that the required class must have a cardinality greater than the class of natural numbers (finite cardinal numbers), and draws attention to the fact that the cardinal number belonging to the concept class of natural numbers is, in effect, greater than the cardinal number of the concept natural number. Somewhat surprisingly, Cantor’s proof that for any set M, the cardinality of the power set ℘(M) is greater than the cardinality of M is passed over in silence. (In his short article ‘Über eine elementare Frage der Mannigfaltigkeitslehre’ (1890–91), Cantor had already announced his diagonal argument for proving the result just mentioned; cf. [5, pp. 278 ff.].)

Having arrived at this point, Frege sketches his plan for the envisaged introduction of the real numbers. In order to render his exposition more accessible to the reader, he temporarily assumes that the irrational numbers are known. Every positive real number a can be represented in the form

\[
  r + \sum_{k=1}^{\infty} \frac{1}{2^{n_k}}
\]
where r is a positive integer or 0, and $n_1, n_2, \ldots$ form an infinite, monotone increasing sequence of positive integers. To every positive rational or irrational number a there belongs an ordered pair ⟨r, R⟩, where r is a positive integer or 0, and R an infinite class of positive integers (class of the $n_k$). If instead of the integers we take cardinal numbers, then to every positive real number there belongs an ordered pair whose first member is a cardinal number and whose second member is a class of cardinal numbers which does not contain the cardinal number 0. Suppose now that a, b and c are positive real numbers and that a + b = c holds. Then for every b there is a relation holding between the pairs belonging to a and to c. This relation is said to be definable without presupposing any knowledge of the real numbers. Thus, we have relations, each of which is again characterized by a pair (belonging to b), to which we add the converses. As Frege further points out, the extensions of these relations (that is, these Relations) correspond single-valuedly44 (eindeutig) to the positive and negative real numbers; and to the addition of the numbers b and b′ corresponds the composition of the corresponding (or associated) Relations. He eventually observes that the class of these Relations is a domain which suffices for his plan, but hastens to add that it is not thereby said that he will hold precisely to this route.

44 Here I am indebted to the translators of Frege’s Grundgesetze, Philip Ebert and Marcus Rossberg. They suggested to me that “single-valued” is a better translation of the term “eindeutig” than the customary “many-one”.

Thus, following Frege’s exposition in [21, § 164] one could define the real numbers as ordered pairs ⟨m, M⟩, formed from an integer m and an infinite set M of natural numbers not including 0. If with regard to this representation of the real numbers as ordered pairs ⟨m, M⟩ one defines addition (‘+’), the relation less-than (‘<’), and One (‘1’) appropriately and defines R as the set of the pairs ⟨m, M⟩, then one can show that the quadruple ⟨R, <, +, 1⟩ is a model of the axiomatic system of the arithmetic of real numbers given by Tarski (cf. [55, pp. 201 ff.]).45

45 Cf. also [37]. In [30], Hilbert presents his first treatment of the foundations of arithmetic, and he does so by setting up an axiom system for the real numbers. The way of introducing the axioms of analysis is akin to the introduction of the axioms of geometry in Die Grundlagen der Geometrie [29]: “We think a system of things; we call these things numbers and designate them by a, b, c, . . . We think these numbers in certain reciprocal relationships whose exact and complete description occurs through the following axioms.” These axioms fall into four groups: I. Axioms of linking, II. Axioms of calculation, III. Axioms of ordering, and IV. Axioms of continuity, comprising the Archimedean Axiom and the Axiom of completeness. Hilbert stresses that the latter two are independent of each other; they do not make any statement about the concept of convergence or about the existence of limits. Thus, Hilbert characterizes the real numbers axiomatically as an ordered, Archimedean field that is incapable of being extended while continuing to satisfy all the axioms.

Axiomatic System A*
A1: x ≠ y → x < y ∨ y < x.
A2: x < y → ¬(y < x).
A3: x < z → ∃y(x < y ∧ y < z).
A4: S ⊂ R ∧ T ⊂ R ∧ ∀x∀y(x ∈ S ∧ y ∈ T → x < y) → ∃z∀x∀y(x ∈ S ∧ x ≠ z ∧ y ∈ T ∧ y ≠ z → x < z ∧ z < y).
A5: x + (y + z) = (x + y) + z.
A6: ∀x∀y∃z(x = y + z).
A7: x + z < y + t → x < y ∨ z < t.
A8: 1 ∈ R.
A9: 1 < 1 + 1.

These nine axioms may be divided into three groups. The first group consists of A1–A4, the second of A5–A7, and the third of A8 and A9. A1 is the Law of Connectivity for the relation less-than. A2 states of course that the relation less-than is asymmetrical. A3 is the Law of Density for the relation less-than. It says that the dyadic relation < is dense in the set of all numbers. A4 is the Law of Continuity for the relation less-than, or as it is also sometimes called, Dedekind’s Axiom. One can provide A4 with a more manageable context by adding two definitions:

D1: The set of numbers S precedes the set of numbers T if and only if every element of S is less than every element of T.
D2: The number z separates the set S of numbers from the set T if and only if for any two elements x of S and y of T, where x ≠ z and y ≠ z, we have: x < z and z < y.

With the aid of D1 and D2, the Axiom of Continuity may be formulated in simpler terms as follows: If one set of numbers precedes another, then there is at least one number which separates the first set from the second. While the axioms of the first group state fundamental properties of the relation less-than, the axioms of the second group relate to the binary operation of addition. A5 is of course the Associative Law for addition. A6 is the Law of Right Invertibility for addition. Tarski did not endow A7 with its own term. He derives this proposition qua theorem from the Laws of Monotonicity on the basis of the axiomatic system he refers to as A′′ of a fragment for the arithmetic of real numbers. The axioms belonging to the third group of System A*, namely A8 and A9, do not require further explanation. Axioms A1, A2, A5, A6, and A7 of system A* for the arithmetic of real numbers comprise the axiomatic system which Tarski calls A′′′ and which – just like its equivalent companion A′′ – characterizes the set of all numbers as an ordered Abelian group with respect to the relation < and the operation +.
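The first-order axioms of system A* can be spot-checked mechanically. The sketch below is a sanity check on a finite sample of rational numbers and in no way a proof; the sample and the function name `failed_axioms` are hypothetical. The second-order axiom A4 is left out deliberately, since it cannot be tested on a finite sample and in fact fails for the rationals, and A8 holds trivially because 1 is taken from the number type itself.

```python
from fractions import Fraction
from itertools import product

# Hypothetical finite sample of rationals; any other finite sample would do.
SAMPLE = sorted({Fraction(n, d) for n in range(-3, 4) for d in (1, 2, 3)})
ONE = Fraction(1)


def failed_axioms(sample):
    """Return the names of the checked axioms that fail somewhere on the sample."""
    failed = set()
    for x, y in product(sample, repeat=2):
        if x != y and not (x < y or y < x):
            failed.add("A1")            # connectivity
        if x < y and y < x:
            failed.add("A2")            # asymmetry
        if x < y and not (x < (x + y) / 2 < y):
            failed.add("A3")            # density, witness: the midpoint
        if x != y + (x - y):
            failed.add("A6")            # right invertibility, witness: x - y
    for x, y, z in product(sample, repeat=3):
        if x + (y + z) != (x + y) + z:
            failed.add("A5")            # associativity
    for x, y, z, t in product(sample, repeat=4):
        if x + z < y + t and not (x < y or z < t):
            failed.add("A7")
    if not ONE < ONE + ONE:
        failed.add("A9")
    return failed


print(failed_axioms(SAMPLE))
```

That the rationals pass every one of these checks shows where the real work lies: only the Law of Continuity A4 separates the real numbers from a merely dense ordered Abelian group.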
If we take into account the axioms in A*, which are added to the axiomatic system A′′′ , (thus A2, A4, A8, and A9), A* may be characterized as follows: A* expresses the fact that the set of all numbers R is a densely and continuously ordered Abelian group with respect to the relation < and the operation +, and it singles out a certain positive element 1 in that set. From a methodological point of view, system A* offers several advantages. To my knowledge, it is the simplest known axiomatic system that suffices to construct arithmetic in its entirety. With the exception of A1, which can be derived from the remaining axioms, all other axioms as well as the primitive terms occurring in them are independent of one another. Let us return from this minor digression to Frege’s plan and pick up the thread of representing the real numbers as ordered pairs hr, Ri. If one proves the existence of the class of the Relations which correspond singlevaluedly to the positive and negative real numbers – Frege calls this class
the positive class – one can define the real numbers as ratios of quantities of a domain that belongs to a positive class. Moreover, one is then in a position to prove that the real numbers themselves belong to the domain of a positive class. Before embarking on the formal development – that part is entitled “The theory of quantity” (“Die Größenlehre”) – Frege draws attention to two points that he considers to be important. First, neither the classes of finite cardinals nor the ordered pairs mentioned above, nor the Relations between these pairs are irrational numbers. Second, the Relations between the pairs can be defined without assuming that the connection with the irrational numbers is known. “In this way, we shall succeed in defining the real number purely arithmetically or logically as a ratio of quantities which can be shown to exist [to be available], so that no doubt can remain that there are irrational numbers” [21, p. 164].
5.4 The formal theory
Throughout my account of Frege’s formal theory of quantity I translate his concept-script formulae largely into “modern” or at least into more accessible notation. In particular, I replace Frege’s symbol for negation by “¬”, his symbol for implication (the conditional) by “→”, his sign for the universal quantifier by “∀”, and the symbol for the course-of-values operator by “#”. Following common usage, I shall (mostly) employ “x”, “y” and “z” as signs for bound object variables (and occasionally also “r”, “u” and “w”). (I use “∃” only twice when I offer an equivalent formulation of a formula in which “∀” occurs.) Instead of Frege’s sign for the second-level course-of-values function “ἐϕ(ε)” (the course-of-values operator as I also call it) I use “#xϕ(x)”. “#” is usually followed by “x”, or also by “y” when it concerns the designation of a double-course-of-values; in exceptional cases, it is combined with “u” and “v”. I presume that in his concept-script Frege would have prohibited such different uses of the same sign, but I do not think that my choice causes any confusion. In every formula that results from my transcription of a concept-script formula and in which one and the same letter occurs more than once it is always employed in the same function. As to the role of “=” in Frege’s concept-script, we must bear in mind his assimilation of assertoric sentences to proper names. Due to the fact that he construes (referential) expressions with the syntactic structure of assertoric sentences as proper names referring either to the object the True or to the object the False, he feels entitled to use “=” not only between ordinary singular terms, but also between assertoric sentences (truth-value names); Basic Law V is a paradigm case. In my transcriptions, I follow Frege in this respect, although “=”, when flanked by two truth-value names, could be replaced by “↔” without essentially affecting the content of a given formula.
Instead of Frege’s symbol for the membership-function “S” I employ “∈”. I am aware that the meaning usually attached to “∈” does not coincide with the meaning that Frege assigns to “S”, but the meaning of the latter is at least akin to the meaning of the former. Again, this should not give rise to any problem, especially since below I list Frege’s definition of the membership-function in modern notation. Thus, whenever the symbol “∈” occurs in my transcription of a Fregean formula (and it does occur quite often), it must always be understood in the precise sense that the definition bestows upon it. As a matter of fact, Frege employs only negation, implication, the universal quantifier and identity from the repertoire of logical signs available in standard first- and second-order logic. For the sake of simplification and perspicuity, I occasionally make use of “∧” in addition to “¬” and “→” and stipulate that “∧” binds stronger than “→”; clearly, “¬(p → ¬q)” is logically equivalent to “p ∧ q”, and “p → (q → r)” is logically equivalent to “p ∧ q → r”. In a few cases, I shall present a definition that Frege frames or a theorem that he proves in equivalent symbolic formulations. Frege uses a number of “exotic” symbols for the designation of the fundamental concepts that he needs for laying the logical foundations of analysis (for example, for the designation of the concepts of positival class and positive class). I decided to retain these special symbols in my transcription of his notation. Finally, a word about Frege’s judgement stroke and the horizontal. In my translation of the propositions that he puts forward or proves in [21], I dispense, for the sake of simplicity, with the judgement stroke, which, for obvious reasons, appears in his symbolism always in tandem with the horizontal. Apart from this, I retained his use of the horizontal. From a logical point of view, the horizontal is dispensable in Frege’s system. 
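The two propositional equivalences invoked above when “∧” is introduced alongside “¬” and “→” can be checked mechanically. The following is a minimal sketch in Python, not part of Frege’s or the present apparatus; the helper name implies and the result names conj_ok and export_ok are introduced here purely for illustration.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Material conditional: false only when a is true and b is false."""
    return (not a) or b


# "not (p -> not q)" agrees with "p and q" under every valuation.
conj_ok = all(
    (not implies(p, not q)) == (p and q)
    for p, q in product([False, True], repeat=2)
)

# "p -> (q -> r)" agrees with "(p and q) -> r" under every valuation.
export_ok = all(
    implies(p, implies(q, r)) == implies(p and q, r)
    for p, q, r in product([False, True], repeat=3)
)

print(conj_ok, export_ok)
```

Both checks range over all valuations, so they confirm the equivalences for the classical two-valued reading only.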
As I pointed out in section 1, the concept −− ξ can be reduced to the relation ξ = ζ, since −− ξ is co-extensive with ξ = (ξ = ξ). Yet without the notational benefit that Frege derives from “−− ξ”, his two-dimensional concept-script would not even have got off the ground. Note that by prefixing the horizontal to any well-formed object-name of Frege’s formal language which is not a truth-value name (a sentence) we obtain a truth-value name. Not every name of a truth-value (for example, “ἐ(ε = (ε = ε))”) is a truth-value name, that is, an object name with the syntactic structure of a sentence. Moreover, by prefixing the horizontal to the name of a monadic function that is not a concept we obtain a concept-expression and, analogously, prefixing the horizontal to the name of a dyadic function that is not a relation yields a relational expression. I conclude these preliminary remarks by listing several definitions that Frege sets up in [20] and on which he relies not only in the logical construction of cardinal arithmetic, but also in the development of his theory of real numbers.
The relation of an object falling within the extension of a concept (Beziehung des Hineinfallens eines Gegenstandes in einen Begriffsumfang), § 34: a ∈ u := K#y[¬∀f (u = #xf (x) → ¬f (a) = y)]. In equivalent notation: a ∈ u := K#y[∃f (u = #xf (x) ∧ f (a) = y)]. Single-valuedness of a relation (Eindeutigkeit einer Beziehung), § 37: Ip := ∀x∀y(x ∈ (y ∈ p) → ∀z(x ∈ (z ∈ p) → y = z)).
In equivalent notation: Ip := ∀x∀y∀z(x ∈ (y ∈ p) ∧ x ∈ (z ∈ p) → y = z).
Mapping-into by a relation (Abbildung durch eine Beziehung), § 38: ip := #x#y[¬Ip → ¬∀r(∀u(r ∈ (u ∈ p) → ¬u ∈ x) → ¬r ∈ y)].
Converse of a relation (Umkehrung einer Beziehung), § 39: Up := #x#y(x ∈ (y ∈ p)).
The following [succession] of an object after an object in the series of a relation (das Folgen eines Gegenstandes auf einen Gegenstand in der Reihe einer Beziehung), § 45: Mq := #x#y∀F (∀r(F (r) → ∀u(r ∈ (u ∈ q) → F (u))) → (∀u(y ∈ (u ∈ q) → F (u)) → F (x))).
The relation that an object belongs to the series of a relation beginning with an object (die Beziehung, dass ein Gegenstand der mit einem Gegenstand anfangenden Reihe einer Beziehung angehört), § 46: Rq := #x#y¬y ∈ (x ∈ Mq) → x = y. Composite relation (zusammengesetzte Beziehung), § 54: pLq := #x#y[¬∀r(r ∈ (x ∈ q) → ¬y ∈ (r ∈ p))]. In equivalent notation: pLq := #x#y[∃r(r ∈ (x ∈ q) ∧ y ∈ (r ∈ p))]. So much for the definitions on which Frege relies in his formal account and some relevant details of notation. We can now focus on the stages of Frege’s formal development of his theory of quantity. Frege maintains that the demarcation of the quantitative domain results from the demand that the commutative and associative laws for addition hold, which basically tallies with his treatment of the concept of quantity in [16]. Accordingly, he wishes to get his theory of quantity off the ground by posing the following question: What properties must a class of Relations possess so that in it the commutative and associative laws for the composition of Relations hold? The proof of the associative law (Theorem 489) requires that several sentences about the identity of Relations be derived. The demand that p be a Relation may be expressed by #x#y[−− f (x, y)] = p
as well as by #x#y[−− x ∈ (y ∈ p)] = p. In [21, § 166] the following three theorems concerning the identity of Relations are derived: Theorem (485): #x#y[−− g(x, y)] = q → (#x#y[−− f (x, y)] = p → (∀u∀v((−− v ∈ (u ∈ p)) = (−− v ∈ (u ∈ q))) → p = q)).
Theorem (486):
#x#y[−− g(x, y)] = q → (∀u∀v((−− v ∈ (u ∈ (pLr))) = (−− v ∈ (u ∈ q))) → pLr = q).
Theorem (487): ∀u∀v((−− v ∈ (u ∈ (pLr))) = (−− v ∈ (u ∈ (sLt)))) → pLr = sLt.
Frege further proves the associative law for the composition of Relations (cf. [21, p. 165]). Theorem (489): pL(qLt) = pLqLt. In Frege’s view, two related theorems, namely Theorem (490) F (pLqLt) → F (pL(qLt))
and Theorem (491) F (pL(qLt)) → F (pLqLt),
which obviously could be contracted into F (pLqLt) ↔ F (pL(qLt)),
belong likewise to the more important ones (cf. [21, p. 245]). Unlike the associative law, the commutative law does not hold universally for the composition of Relations. Frege first proves it for the members of a sequence like K, KLK, KL(KLK), KL(KL(KLK)), . . . To render the proof more succinct, he introduces a simple symbol for the sequence-forming relation (reihende Beziehung)46 in such a sequence. He defines [21, p. 166]: ∗ t := #x#y[tLy = x] and then draws the most direct conclusions from this: Theorem (492): F (d ∈ (a ∈ ∗ t)) → F (tLd = a). Theorem (493): d ∈ (a ∈ ∗ t) → (F (tLd) → F (a)).
46 Marcus Rossberg suggested to me that Frege’s phrase “reihende Beziehung” is probably best translated as “series-forming relation”; I agree. However, I have a slight preference for the term “sequence” over “series”, although it is not a perfectly literal translation of the word “Reihe”. A more casual rendering of “reihende Beziehung” might be “alignment relation” or “lining-up relation”.
Theorem (494): d ∈ (a ∈ ∗ t) → (F (a) → F (tLd)). Theorem (495): F (tLd = a) → F (d ∈ (a ∈ ∗ t)). Theorem (496): d ∈ (tLd ∈ ∗ t). The commutative law for the composition of Relations belonging to a sequence of the form mentioned above has the following form: Theorem (501): t ∈ (p ∈ R∗ t) ∧ t ∈ (q ∈ R∗ t) → pLq = qLp.
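The contrast between the associative law, which holds for the composition of Relations without restriction, and the commutative law, which holds only for the members of a sequence generated from a single Relation, can be illustrated on small finite relations. The following Python sketch is merely an illustration under assumptions of my own: a Relation is modelled as a finite set of ordered pairs, a pair (a, b) is read as “a stands in the Relation to b”, and the names compose and powers as well as the sample relations p, q and t are hypothetical.

```python
def compose(p, q):
    """Pairs (a, b) such that a stands in p to some r and r stands in q to b."""
    return frozenset((a, b) for (a, r1) in p for (r2, b) in q if r1 == r2)


def powers(t, n):
    """The first n members of the sequence t, tLt, tL(tLt), ..."""
    out = [t]
    for _ in range(n - 1):
        out.append(compose(t, out[-1]))
    return out


# Three arbitrary sample relations (assumed purely for illustration).
p = frozenset({(0, 1), (1, 2)})
q = frozenset({(0, 2), (2, 0)})
t = frozenset({(0, 1), (0, 2), (1, 2), (2, 0)})

# Associativity holds for these three relations.
assoc_ok = compose(p, compose(q, t)) == compose(compose(p, q), t)

# Commutativity fails for p and q.
commute_in_general = compose(p, q) == compose(q, p)

# Any two members of the sequence generated by t commute.
seq = powers(t, 4)
powers_commute = all(compose(a, b) == compose(b, a) for a in seq for b in seq)

print(assoc_ok, commute_in_general, powers_commute)
```

The check on the sequence generated by t covers only its first four members; it illustrates the claim of Theorem (501) and does not replace Frege’s inductive proof by means of Theorem (144).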
In order to prove it by appeal to Theorem (144) a ∈ (b ∈ Rq) → (∀x(F (x) → ∀y(x ∈ (y ∈ q) → F (y))) → (F (a) → F (b))) (cf. [20, p. 143], [21, p. 246]), Frege needs
t ∈ (p ∈ R∗ t) → (pLd = dLp → (d ∈ (a ∈ R∗ t) → pLa = aLp))
or
t ∈ (p ∈ R∗ t) ∧ pLd = dLp → pL(tLd) = tLdLp, which he proves with the help of t ∈ (p ∈ R∗ t) → pLt = tLp. The latter theorem can be derived by using Theorem (144) (cf. [21, §§ 169 f.]). By appealing to Theorem (501), we can regard the class of members of a sequence like K, KLK, KL(KLK), KL(KL(KLK)), . . . as a quantitative domain and define every positive rational number as a ratio of two quantities belonging to that domain. The negative rational numbers could be introduced by adding the converses of the Relations. When it comes to the irrational numbers, Frege stresses that they can be obtained only as limits. And the limit in turn can be defined only in terms of the relation greater than. For the sake of convenience, he wants to reduce this relation to the notion of the positive: a is greater than b if and only if the Relation composed of a and the converse of b is positive. Now, if a positive class Σ is at hand, the quantitative domain associated with it can be determined as follows: to it belongs every Relation which either is a member of Σ, or is the converse of a Relation belonging to Σ, or is composed of a Relation belonging to Σ and its converse (cf. [21, § 173]). If Σ is a positive class, then Π belongs to the corresponding quantitative domain if and only if ∀x(x ∈ Σ → ¬(¬Π = Ux → Π = xLUx)) → Π ∈ Σ holds. Bearing all this in mind, Frege frames the following definition of the notion of a quantitative domain [21, p. 169]:
ðs := #x[∀y(y ∈ s → ¬(¬x = Uy → x = yLUy)) → x ∈ s].
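Read classically, the definition says that x belongs to ðs just in case x is a member of s, or is the converse of a member of s, or is composed of a member of s and its converse. A minimal Python sketch of this reading follows. It rests on assumptions of my own: Relations are modelled as finite sets of ordered pairs, the function names compose, converse and quantitative_domain are hypothetical, and the one-membered class used as an example (the shift by 1 on the integers modulo 5) only serves to exercise the definition; it is not a positival class.

```python
def compose(p, q):
    """Pairs (a, b) such that a stands in p to some r and r stands in q to b."""
    return frozenset((a, b) for (a, r1) in p for (r2, b) in q if r1 == r2)


def converse(p):
    """The relation obtained by swapping the members of every pair of p."""
    return frozenset((b, a) for (a, b) in p)


def quantitative_domain(s):
    """Members of s, their converses, and each member composed with its converse."""
    domain = set(s)
    for y in s:
        domain.add(converse(y))
        domain.add(compose(y, converse(y)))
    return domain


# Example class (an assumption for illustration): the shift by 1 modulo 5.
shift1 = frozenset((i, (i + 1) % 5) for i in range(5))
identity = frozenset((i, i) for i in range(5))

domain = quantitative_domain({shift1})
print(len(domain), identity in domain, converse(shift1) in domain)
```

In this example the domain has three members: the shift itself, its converse, and the identity relation, which plays the part of the null-Relation.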
Frege now outlines the following strategy. To frame a sustainable definition of the concept of positive class requires, he thinks, a certain productive detour. The irrational can only be obtained as a limit. Yet a definition of the notion of limit requires a relation of the smaller to the greater, which Frege wishes to define with the help of the notion of a positive class. In order to address the central question “When is a class a positive class?”, the wider concept of what Frege calls a positival class must first be introduced. Equipped with the latter we can define the notion of least upper bound, and with the least upper bound at hand we arrive at the notion of a positive class. In order to define the notion of a positival class appropriately, it is mandatory to lay down the following conditions: (1) Each Relation that belongs to the positival class Σ is one-to-one. (2) The Relation composed of such a Relation and its converse does not – qua Null quantity (Nullgrösse) – belong to Σ. (3) If the Relations p and q are members of Σ [and p ≠ q], then the composition of p with q is in Σ. (4) If p and q are in Σ [and p ≠ q], then the Relation composed of p and the converse of q belongs to the quantitative domain of Σ. (5) If p and q are in Σ [and p ≠ q], then the composition of the converse of p with q is in the quantitative domain of Σ. Frege makes a number of further stipulations. An object x that stands in the Relation Π to any object, is called the first member of Π. If any object stands to y in the Relation Π, then y is called the second member of Π. For every Relation there is a first class of objects which can appear as the first members of the Relation and a second class of objects which can appear as the second members of the Relation. Frege demands that the first class belonging to a Relation Π is equal to the second class belonging to a Relation K, if Π and K belong to the same positival class.
For every positival class Σ there is then exactly one class of objects which can appear both as first and as second members of every Relation belonging to Σ. By appealing to these considerations, Frege sets up the following definition (Ψ) of the concept positival class [21, § 175]: fs := ∀x(x ∈ s → ¬(∀y(y ∈ s → ¬(∀z(∀w¬(z ∈ (w ∈ x)) = ∀w¬(w ∈ (z ∈ y))) ∧ UxLy ∈ ðs ∧ xLUy ∈ ðs → ¬xLy ∈ s)) → #u#v[−− v ∈ (u ∈ x)] = x ∧ IUx ∧ Ix → xLUx ∈ s)).
An equivalent formulation is: fs := ∀x(x ∈ s → ∀y(y ∈ s → ∀z(∀w¬(z ∈ (w ∈ x)) = ∀w¬(w ∈ (z ∈ y))) ∧ UxLy ∈ ðs ∧ xLUy ∈ ðs ∧ xLy ∈ s) ∧ #u#v[−− v ∈ (u ∈ x)] = x ∧ IUx ∧ Ix ∧ ¬(xLUx ∈ s)). Peter Simons [52, p. 374] aptly observes that this definition does not constrain the ordering structure defined in terms of greater than to be continuous, or even dense. A positival class, he says, could have the ordering structure of the positive integers, the positive rationals, or those positive rationals which are expressible as multiples of a negative power of 2, for example, 27/64, 675/512. Frege underscores that in definition (Ψ) he has taken pains to include only those clauses that are independent of one another. In the same breath, he admits that their mutual independence cannot be proved, and expresses the belief that especially the clause “UxLy ∈ ðs” cannot be dispensed with. Naturally, the question arises here as to whether the mutual independence of the clauses of (Ψ), in particular the independence of “UxLy ∈ ðs” of the other clauses (assuming that the independence does exist), can in fact be proved, contrary to what Frege claims. It seems unsatisfactory to say: I have tried repeatedly, but in vain, to reduce, say, “UxLy ∈ ðs” to any of the other clauses. Hence, “UxLy ∈ ðs” is likely to be independent of the rest (cf. [21, pp. 171 f.]).47 Frege must have felt uneasy about the way he presents his “independence problem”, as is evident from a footnote at the very end of his formal account (cf. [21, p. 243]). There he suggests that his earlier claim that the mutual independence of the clauses of definition (Ψ) is unprovable should not be construed in an absolute sense. For it is conceivable that one could find classes of Relations to which all conditions except one would apply “so that each of these would not apply in one of the examples”. Thus, if one managed to present a class of Relations satisfying all clauses of (Ψ) with the only exception of, say, “UxLy ∈ ðs”, one would have succeeded in proving the independence of this clause from the other clauses.
At the stage he has reached in [21, § 175] Frege doubts, however, that it should be possible to give examples of classes of Relations to which all clauses of (Ψ) except one apply, without presupposing geometry, the rational and irrational numbers, or even empirical facts. According to this assessment, a proof of the mutual independence of the clauses making up (Ψ) or at least a proof that “UxLy ∈ ðs” is independent of the rest, would fulfil its purpose only if it were carried out in a purely logical fashion, hence without invoking at least geometrical and/or empirical facts. Hardly anything else could have been expected from someone who committed himself to logicism concerning both number theory and analysis. As to the truths of (Euclidean) geometry, Frege does not even touch upon the issue of determining their epistemological status in [20, 21]. However, there are good reasons to suppose that around 1903 he still endorsed the view set out earlier in [18]: our knowledge of the geometrical truths rests on pure spatial intuition; they are synthetic a priori. I even firmly assume that Frege maintained the thesis that the axioms of Euclidean geometry derive from spatial intuition throughout his entire academic career (that is, from 1873 until 1925). Now I tend to believe that an independence proof concerning the clauses of (Ψ) which presupposes or even makes explicit use of the rational, negative, and irrational numbers would not necessarily offend against the requirement that the proof must proceed in a purely logical manner. Yet it could seem that from Frege’s point of view the proof could be accepted only on condition that these numbers had been introduced as logical objects prior to the formulation of (Ψ).
47 On Frege’s “independence problem” see [1] and [12, p. 288]. Note that what I refer to as clauses (4) and (5) of definition (Ψ) Dummett refers to as clauses (3) and (4). Probably thanks to the expertise of his fellow authors Adeleke and Neumann (both are eminent group theorists), Dummett [12, p. 288] writes: “Clause (4) says that < is a strict linear ordering of the negative elements. . . and is equivalent to the proposition that < is a strict lower semilinear ordering of the group (where this has the obvious meaning). (3) and (4) together are therefore tantamount to the proposition that < is a strict linear ordering of the group. If the ordering is left-invariant, clause (4) must hold. . . (The converse, however, does not hold; a group may have a right-invariant linear ordering that is not left-invariant.) Frege’s independence problem thus amounts to asking whether there is a group with a right- but not left-invariant upper semilinear ordering that is not linear.” The elaboration of the positive answer is to be found in [1] under the heading “Frege’s independence problem solved”.
In what follows, Frege first draws about two dozen conclusions from (Ψ) – Theorems 518–544 – before turning to the proof of Theorem (556). Among the more important ones of those theorems are: Theorem (521): fs → (p ∈ s → (q ∈ s → (∀x(¬x ∈ (d ∈ q)) → ∀x(¬d ∈ (x ∈ p))))). Theorem (526) fs → (p ∈ s → (q ∈ s → UpLq ∈ ðs)). Theorem (528) fs → (p ∈ s → (q ∈ s → pLUq ∈ ðs)).
Theorem (529) fs → (p ∈ s → (q ∈ s → pLq ∈ s)).
Theorem (531) fs ∧ p ∈ s → #x#y(− − y ∈ (x ∈ p)) = p. Theorem (533) fs ∧ p ∈ s → IUp. Theorem (535) fs ∧ p ∈ s → Ip.
Theorem (536) fs ∧ p ∈ s → ¬pLUp ∈ s.
Theorem (542) fs → (q ∈ s → (f (U(pLUq)) → f (qLUp))). Theorem (543)
fs → (q ∈ s → (f (qLUp) → f (U(pLUq)))).
Theorem (544) fs → (p ∈ s → (f (U(UpLq)) → f (UqLp))).
Frege now proceeds to prove Theorem (556) fs → (q ∈ s → (p ∈ s → qLUq = UpLp)).
If the Relation Π belongs to the positival class Σ, he calls both ΠLUΠ and UΠLΠ null-Relation of the Σ-domain. Frege makes use of sentence (487) in his proof of Theorem (556): “If the Relations q and p belong to the same positival class, then the null-Relation qLUq coincides with the null-Relation UpLp.” From Theorem (556) it follows that there is only one null-Relation in the domain of a positival class. The proof requires reference to these sentences (cf. [21, §§ 177 f.]): (α) fs → (q ∈ s → (p ∈ s → (d ∈ (a ∈ (qLUq)) → d ∈ (a ∈ (UpLp))))).
(β) fs → (q ∈ s → (p ∈ s → (d ∈ (a ∈ (UpLp)) → d ∈ (a ∈ (qLUq))))). Proving (α), in turn, requires appeal to the following two sentences: (γ) ¬d ∈ (d ∈ (UpLp)) → ∀x(¬x ∈ (d ∈ p)). (δ) fs → (q ∈ s → (d ∈ (a ∈ (qLUq)) → a = d)).
And for the proof of (β), the following sentences are needed: (ε) ¬a ∈ (a ∈ (qLUq)) → ∀x(¬a ∈ (x ∈ q)).
(ζ) fs → (p ∈ s → (d ∈ (a ∈ (UpLp)) → a = d)). The proof of Theorem (556) is followed by the derivation of three further theorems that are crucial for the concept positival class. Theorem (561) p ∈ ðs → (¬Up ∈ s → (fs → (r ∈ s → (¬p ∈ s → p = rLUr)))).
A Relation that belongs to the domain of a positival class is a null-Relation if neither it nor its converse belongs to the positival class itself. Theorem (562) q ∈ s → (fs → (p ∈ s → qL(pLUp) = q)). Theorem (571): p ∈ s → (fs → (q ∈ s → pLUpLq = q)). Both (562) and (571) say that a Relation belonging to a positival class remains unchanged when it is composed with the null-Relation of the domain of its positival class. Frege proves (561) by employing Theorem (516): p ∈ ðs → (∀x(x ∈ s → ¬(¬p = Ux → p = xLUx)) → p ∈ s).48
48 This theorem can easily be inferred from the definition of the quantitative domain that belongs to a positive class (cf. [21, pp. 170, 180]).
After having completed the proof, he uses (486) #x#y[−− g(x, y)] = q → (∀u∀v((−− v ∈ (u ∈ (pLr))) = (−− v ∈ (u ∈ q))) → pLr = q)
in his derivation of (562), and for this purpose he needs both (α) fs → (p ∈ s → (q ∈ s → (d ∈ (a ∈ q) → d ∈ (a ∈ (qL(pLUp))))))
and (β) fs → (p ∈ s → (d ∈ (a ∈ (qL(pLUp))) → d ∈ (a ∈ q))).49
Theorem (571) can be reduced to sentence (566) q ∈ s → (fs → (p ∈ s → qLUpLp = q)) by appealing to sentence (559) (cf. [21, p. 179])
fs → (q ∈ s → (p ∈ s → (f (qLUq) → f (pLUp)))) and by substituting “q” for “p” in (566) (cf. [21, §§ 183 f.]): fs → (q ∈ s → (qLUqLq = q)) or equivalently fs ∧ q ∈ s → qLUqLq = q. In [21, §§ 187-192], Frege eventually derives a few theorems concerning the greater and the smaller in a positival class. If Σ is a positival class and the Relations Π and K belong to its domain, then “ΠLUK ∈ Σ” can be rendered as follows: “Π is greater than K in the Σ-domain”. First, Theorem (586) is proved: q ∈ s → (fs → (r ∈ s → (qLUr ∈ s → (rLt ∈ s → qLt ∈ s)))).
If of two Relations (q) and (r) which belong to the same positival class, (q) is greater than (r), then the Relation composed of (q) and a third Relation (t) belongs to the positival class (s), if the Relation composed of (r) and (t) does. Now if the third Relation is conceived of as the converse of a Relation (p), we obtain as a special case the sentence: “If a Relation (q) is greater than a second Relation (r), which itself is greater than a third (p), then (q) is greater than (p), if the Relations belong to the same positival class.” In [21, § 189], Theorems (587) and (588) are derived; the proof is relatively simple; (588) is considered by Frege to be more important than (587) (cf. p. 576): Theorem (587) a ∈ s → (¬cLUa ∈ s → (fs → (c ∈ s → (¬aLUc ∈ s → a = c)))). Theorem (588): a ∈ s → (¬a = c → (fs → (c ∈ s → (¬aLUc ∈ s → cLUa ∈ s)))). Finally, from (538) (cf. [21, p. 175])
49 Cf. [21, §§ 181 f.].
fs ∧ p ∈ s → ¬Up ∈ s
follows Theorem (589): fs → (p ∈ s → (qLUp ∈ s → ¬pLUq ∈ s)).
A Relation that belongs to a positival class is not greater than a second Relation, if the second Relation is greater than the first.
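The behaviour of “greater than” described by Theorems (586), (588) and (589) can be made concrete in one familiar model. The Python sketch below rests on assumptions that are mine and not Frege’s: the Relation “add k” on the rationals is represented by the number k itself, so that composition becomes addition and the converse becomes negation, and s is taken to be the class of positive rationals. The names compose, converse, in_s and greater are hypothetical, and the checks run over a finite sample only; they illustrate the theorems and prove nothing.

```python
from fractions import Fraction
from itertools import product


def compose(p, q):
    """Composition of the Relations 'add p' and 'add q' (assumed model)."""
    return p + q


def converse(p):
    """Converse of the Relation 'add p' (assumed model)."""
    return -p


def in_s(p):
    """Membership in the sample positival class: the positive rationals."""
    return p > 0


def greater(p, q):
    """p is greater than q: p composed with the converse of q lies in s."""
    return in_s(compose(p, converse(q)))


sample = [Fraction(n, d) for n in range(1, 6) for d in (1, 2, 3)]

# Special case of (586): greater-than is transitive.
transitive = all(
    greater(q, p)
    for q, r, p in product(sample, repeat=3)
    if greater(q, r) and greater(r, p)
)

# (588): of two distinct members of s, one is greater than the other.
comparable = all(
    a == c or greater(a, c) or greater(c, a)
    for a, c in product(sample, repeat=2)
)

# (589): if q is greater than p, then p is not greater than q.
asymmetric = all(
    not (greater(q, p) and greater(p, q))
    for q, p in product(sample, repeat=2)
)

print(transitive, comparable, asymmetric)
```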
The matter has now reached a point where the final step towards defining the concept positive class can be taken, namely to set up the definition of the notion of least upper bound of a class of Relations in a positival class. Frege suggests using “Σ-limit of Φ” as an abbreviation for the phrase “upper limit of those Relations in a positival class Σ that belong to a class Φ”. ∆ is a Σ-limit of Φ (the Relation ∆ is a least upper bound of a class Φ in a positival class Σ) if and only if the following conditions are fulfilled: (1) Σ is a positival class. (2) ∆ belongs to Σ. (3) Every Relation in Σ that is smaller than ∆ belongs to class Φ. (4) Every Relation in Σ that is greater than ∆ is greater than at least one Relation in Σ that does not belong to Φ. Now, before defining the limit, Frege first defines the dyadic first-level function-name “ξ F ζ” for the sake of abbreviation and simplification: s F u := #x[∀y(y ∈ s → (xLUy ∈ s → y ∈ u))] In equivalent notation: s F u := #x[∀y((y ∈ s ∧ xLUy ∈ s) → y ∈ u)].
With the help of “s F u”, he can define the sign “s ł u” for the limit as follows: s ł u := #x[¬(∀y(y ∈ s → (yLUx ∈ s → ¬y ∈ (s F u))) → (x ∈ (s F u) → (x ∈ s → ¬fs)))].
“∆ ∈ (Σ ł Φ)” should thus be read as “∆ is the Σ-limit of Φ”.
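Conditions (2) to (4) of the definition can be tried out on a small ordered sample. The sketch below is a Python illustration under assumptions of my own: the integers 1 to 20 stand in for the members of Σ, the ordinary relation < stands in for “smaller than” in the Σ-domain, and Φ is given as a predicate. Condition (1) is not checked (a finite initial segment of the integers is not closed under composition and hence is no positival class), and the names in_F and is_limit are hypothetical.

```python
def in_F(x, sigma, phi):
    """x belongs to (sigma F phi): every member of sigma smaller than x is in phi."""
    return all(phi(y) for y in sigma if y < x)


def is_limit(x, sigma, phi):
    """x satisfies conditions (2) to (4) for being a sigma-limit of phi."""
    return (
        x in sigma                      # (2) x belongs to sigma
        and in_F(x, sigma, phi)         # (3) every smaller member is in phi
        and all(                        # (4) no greater member belongs to (sigma F phi)
            not in_F(y, sigma, phi) for y in sigma if y > x
        )
    )


# Finite stand-in for sigma and a sample class phi (both assumed for illustration).
sigma = range(1, 21)


def phi(y):
    return y < 7


limits = [x for x in sigma if is_limit(x, sigma, phi)]
print(limits)
```

In this discrete sample the only value that passes is the greatest member all of whose smaller members lie in Φ, and the list never contains more than one entry, in keeping with the uniqueness of the Σ-limit.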
Theorem (602), which states that there is no more than one Σ-limit of a class, p ∈ (s ł u) ∧ q ∈ (s ł u) → p = q, can be proved with Theorem (587) by showing that of such limits the first is neither greater nor smaller than the second (cf. [21, §§ 195 f.]). Having defined the Σ-limit of a class with the aid of the concept of a positival class, the requirements for framing an appropriate definition of the notion of a positive class are met. A class Σ must possess the following properties in order to be a positive class: (1) Σ must be a positival class. (2) For every Relation in Σ, there must be a smaller Relation in Σ. (3) If there is a Relation in Σ which is such that in Σ every smaller
Relation belongs to a class Φ, while there is a Relation in Σ that is not a member of Φ, then Φ must have a least upper bound in Σ (or in Frege’s words: then there must be a Σ-limit of Φ). The second condition rules out discrete orderings like the positive integers, while the third guarantees continuity (cf. [52, p. 375]). The definition of the concept positive class (pξ) then has the following form: ps := ¬(∀x∀y∀z(∀w(¬w ∈ (s ł x)) ∧ y ∈ (s F x) ∧ y ∈ s → (z ∈ s → z ∈ x)) → ∀z(∀y(y ∈ s → ¬zLUy ∈ s) → ¬z ∈ s) → ¬fs). As the reader may have expected, Frege draws the most direct conclusions from this definition. Among them are: Theorem (604) ps → (∀x(¬x ∈ (s ł u)) → (e ∈ (s F u) → (e ∈ s → (a ∈ s → a ∈ u)))). Theorem (605) ps → ¬(∀x(∀y(y ∈ s → ¬xLUy ∈ s) → ¬x ∈ s) → ¬fs).
Theorem (606) fs → (∀y(y ∈ s → ¬aLUy ∈ s) → a ∈ s).
Frege then focuses on the proof of Archimedes’ Axiom; it reads as follows: When two Relations belong to the same positive class, then there is a multiple of the one that is not smaller than the other. The thought that there is a multiple of a Relation Π that is not smaller than A can be expressed symbolically in this way: ¬∀x(Π ∈ (x ∈ R∗ Π) → ALUx ∈ Σ), if Σ is the positive class. For the sake of abbreviation, Frege introduces by definition a new sign “s ı p” as follows: s ı p := #x#y[¬∀z(x ∈ (z ∈ R∗ p) → yLUz ∈ s)] [21, p. 192]. According to this definition, he reads
A ∈ (Π ∈ (Σ ı Π))
as: There is a multiple of Π that is not smaller than A, if Σ is a positive class and A and Π belong to it. When we use “s ı p”, Archimedes’ Axiom (Theorem 635) is adequately expressed by the following formula: ps → (p ∈ s → (a ∈ s → a ∈ (p ∈ (s ı p))))
or equivalently by ps ∧ p ∈ s ∧ a ∈ s → a ∈ (p ∈ (s ı p)).
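What the Archimedean property asserts can be seen in a numerical model. In the Python sketch below, which is an illustration under my own assumptions and not Frege’s construction, the Relation “add k” on the rationals is represented by k, composition is addition, and s is the class of positive rationals. The function name first_multiple_not_smaller and the cut-off parameter max_steps are hypothetical; the loop walks through p, pLp, pL(pLp), . . . and stops at the first member that is not smaller than a.

```python
from fractions import Fraction


def first_multiple_not_smaller(p, a, max_steps=10_000):
    """Return (n, n*p) for the first multiple of p that is not smaller than a.

    Assumed model: p and a are positive rationals standing for the
    Relations 'add p' and 'add a'; composition is addition.
    """
    if p <= 0 or a <= 0:
        raise ValueError("both arguments must belong to the positive class")
    multiple = p
    for n in range(1, max_steps + 1):
        # 'multiple is smaller than a' means: a composed with the converse
        # of multiple is positive, i.e. a - multiple > 0.
        if not (a - multiple > 0):
            return n, multiple
        multiple = p + multiple  # next member of the series: p composed with it
    raise RuntimeError("no suitable multiple found within max_steps")


n, multiple = first_multiple_not_smaller(Fraction(3, 7), Fraction(5))
print(n, multiple)
```

For p = 3/7 and a = 5 the search stops at the twelfth multiple, 36/7, since the eleventh, 33/7, is still smaller than 5.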
In order to prove (635),50 Frege reasons as follows (cf. [21, § 201]): Suppose that the Relations q and p belong to the positive class s. Now if every member of the ∗ p-series beginning with q were smaller than a, then, according to Theorem (604) ps → (∀x(¬x ∈ (s ł u)) → (e ∈ (s F u) → (e ∈ s → (a ∈ s → a ∈ u))))
there would exist an s-limit of q ∈ (s ı p). For in the positive class s there are also Relations – for example, q – that are at least reached by members of the series in question, which then holds for all smaller Relations. This s-limit m of q ∈ (s ı p) is clearly dependent on q. Frege now wants to show that cLp is the s-limit of pLp ∈ (s ı p) if c is the s-limit of p ∈ (s ı p) and, hence, that c ∈ (s ł (p ∈ (s ı p))) ∧ p ∈ s → cLp ∈ (s ł (pLp ∈ (s ı p))). Since these s-limits obviously coincide, cLp = c would have to hold. Yet according to sentence (585) (cf. [21, p. 185]), c ∈ s ∧ fs ∧ cLp = c → ¬p ∈ s: if a Relation composed of a second and a third Relation is identical with the second, then the third does not belong to the same positival class as the second. Hence p could then not belong to class s. That it is an error to assume that every member of the ∗ p-series beginning with p – that is, that every multiple of p – is smaller than a emerges precisely from this consideration. The first step towards proving Theorem (635) consists in deriving the sentence (α) c ∈ (s ł (p ∈ (s ı p))) ∧ p ∈ s → cLp ∈ (s ł (pLp ∈ (s ı p))) (627)
According to (601) (cf. [21, p. 189]), fs → (∀x(x ∈ s → (xLUp ∈ s → ¬x ∈ (s F u))) → (p ∈ (s F u) → (p ∈ s → p ∈ (s ł u)))), the sentence (β) c ∈ (s ł (p ∈ (s ı p))) → (fs → (p ∈ s → cLp ∈ (s F (pLp ∈ (s ı p))))) is required (Theorem 619). With (597) (cf. [21, p. 188]), p ∈ (s ł u) → (a ∈ s → (pLUa ∈ s → a ∈ u)),
Frege arrives at c ∈ (s ł (q ∈ (s ı p))) → (r ∈ s → (cLUr ∈ s → r ∈ (q ∈ (s ı p))))
by substituting “aLUp” for “r”, after having introduced “r ∈ ðs” in place of “r ∈ s”. By using (γ) aLUb ∈ (q ∈ (s ı p)) → a ∈ (qLb ∈ (s ı p)), it is possible to arrive at (β). (γ) is derived from (δ) q ∈ (t ∈ R∗ p) → qLb ∈ (tLb ∈ R∗ p),
50 Dummett [12, p. 289] renders Theorem (635) as follows: If < is a complete upper semilinear ordering, then the Archimedean law holds.
which can be proved with the aid of Theorem (144), mentioned earlier. In order to replace “r ∈ s” by “r ∈ ðs” along the lines of the plan just outlined, Frege derives Theorem (615): r ∈ ðs → (fs → (q ∈ s → (rLUq ∈ s → r ∈ s))). A quantity is positive if it is greater than a positive quantity in its domain. In the proof of Theorem (627) (cf. [21, § 201]) one still requires, apart from Theorem (619),51 according to (601), the sentence c ∈ (s ł (p ∈ (s ı p))) → (fs → (cLp ∈ s → (p ∈ s → (e ∈ s → (eLU(cLp) ∈ s → ¬e ∈ (s F (pLp ∈ (s ı p)))))))).
On the assumptions made in [21, § 201], it says: For every Relation e greater than cLp there is a Relation that (i) is smaller than e, (ii) is positive and (iii) is not reached by any member of the ∗ p-series beginning with pLp, when c is the s-limit of p ∈ (s ı p). Frege shows that cLp itself is not reached. For it follows from “c ∈ (s ł (p ∈ (s ı p)))” that c is not reached by any member of the ∗ p-series beginning with p. From this it follows further that cLp is not reached by any member of the ∗ p-series beginning with pLp.52 The reasoning at the beginning of § 201 suggests that the derivation of Theorem (627) must be followed by the proof that the s-limits of pLp ∈ (s ı p) and of p ∈ (s ı p) coincide. It is still imperative to show that each s-limit of pLp ∈ (s ı p) is also an s-limit of p ∈ (s ı p). Theorem (633): p ∈ s → (d ∈ (s ł (pLp ∈ (s ı p)))) → d ∈ (s ł (p ∈ (s ı p))). By means of Theorems (627) and (633), Frege can finally complete the execution of his plan and prove Archimedes’ Axiom (see [21, § 214]). Frege then turns to the task of proving the commutative law, first within a positive class and then for the entire quantitative domain of a positive class. This is the concern of Theorems (674) and (689) (cf. [21, pp. 204, 238, 239, 243]), which have the following form: Theorem (674) p ∈ s → (q ∈ s → (ps → qLp = pLq)).
Theorem (689) q ∈ ðs → (p ∈ ðs → (ps → pLq = qLp)).
In preparation for the proof of (674), Frege first derives Theorem (638): UqLp ∈ s → (p ∈ s → (ps → (q ∈ s → qLUq ∈ s)))
and Theorem (641): qLUp ∈ s → (p ∈ s → (ps → (q ∈ s → UpLq ∈ s))).
51 See the proof in [21, § 204].
52 Cf. [21, p. 196]. See the proof of Theorems 620-627 in [21, §§ 206-208].
In addition, he needs to draw on (644): c ∈ s → (d ∈ s → (∀x(x ∈ s → ¬(xL(dLUc) ∈ s → ¬(xL(cLUd) ∈ s)) → (fs → c = d))). To make this easier to understand, the following loose translation is suggested: If in a quantitative domain the two differences (dLUc) and (cLUd) of the positive quantities c and d are smaller than every positive quantity, then the quantities c and d are identical. Theorem (644) is reduced to the following sentence: b ∈ ðs → (p ∈ s → (∀x(x ∈ s → ¬(xLUb ∈ s → ¬xLb ∈ s)) → (fs → b = pLUp))).
In a likewise loose translation, it reads as follows:
A quantity b is a null-quantity if both it and its converse Ub yield a positive quantity when added to any positive quantity of the same domain. For the sake of brevity in this passage, Frege calls positive quantity a Relation that belongs to the positival class (s). “The proof is to be carried out with (516). One shows that b is neither positive, nor can it be the converse of a Relation (q) that is positive.” In the first case, we have bLUb, in the second qLb is not positive ([21, p. 209]; concerning the proof of (644), see [21, § 220]). With the aid of (644), Frege proves the commutative law in a positive class by substituting “pLq” for “d” and “qLp” for “c”. He includes pLq and qLp between the same limits which he then approximates arbitrarily to one another (“die wir dann beliebig einander nähern”). The proof of Theorem (666): pLUb ∈ s → (ps → (p ∈ s → (q ∈ s → (qLUb ∈ s → (fs → (b ∈ s → bLbL(qLq)LU(pLq) ∈ s)))))) is first on the agenda (cf. [21], §§ 221-230). Frege then turns to the proof of Theorem (673) (§§ 231-236) ∀x(qLUx ∈ s → (pLUx ∈ s → (x ∈ s → ¬(aLU(xLx) ∈ s))) → (p ∈ s → (q ∈ s → (ps → (fs → ¬a ∈ s)))). Having carried out this proof, he applies (673) to (666) in order to get a∈s→ (p ∈ s → (q ∈ s → (ps → (fs → aL(qLpLU(pLq)) ∈ s)))).
As was said above, by means of (644), Frege can prove the commutative law in a positive class and thus complete the task that he had set himself
for section E. In section Z (§§ 239–244) he succeeds then in proving the commutative law for the entire quantitative domain of a positive class. After having achieved all this, Frege breaks off his attempt to provide a logical foundation of the theory of real numbers by giving a short preview of the next steps to be carried out. In the first place, it is mandatory to demonstrate the existence of a positive class, as already indicated in § 164. If this can be accomplished, then the real numbers can be defined as ratios of quantities of a domain that belongs to a positive class. By invoking this definition, Frege thinks that he can also prove that the real numbers themselves belong to the domain of a positive class (cf. [21, § 245]).
6 Frege’s plan carried out: von Kutschera’s account
Franz von Kutschera [37] shows that Frege’s plan can be carried out.53 Following Frege, the real numbers can be introduced as Relations between the quantities of a domain that belongs to a positive class:

$$R\langle m, M\rangle(r, t) := \exists s\left(P^{*}(s) \wedge r \in s \wedge t = m \cdot r + \sum_{i=1}^{\infty} \frac{c_i^{M} \cdot r}{2^{i}}\right).$$

The symbols used here are explained in this way: m is an integer and M is an infinite set of natural numbers not including 0. V. Kutschera’s representation of the real numbers in the form

$$m + \sum_{i=1}^{\infty} \frac{c_i^{M}}{2^{i}}$$

(where $c_i^{M} = 1$ if i ∈ M, and $c_i^{M} = 0$ if i ∉ M) in section II of his essay corresponds to the representation given by Frege in [21, § 164]. R is defined as the set of the pairs ⟨m, M⟩.
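As a rough illustration of this representation, the following Python sketch computes partial sums of m + Σ c_i^M / 2^i. The function name `partial_value` and the encoding of the infinite set M as a membership predicate are assumptions made for the example; they are not taken from Frege or von Kutschera.

```python
from fractions import Fraction

def partial_value(m, in_M, terms):
    """Partial sum m + sum_{i=1}^{terms} c_i / 2**i, where c_i = 1 iff in_M(i)."""
    total = Fraction(m)
    for i in range(1, terms + 1):
        if in_M(i):  # M is infinite, so it is passed as a predicate
            total += Fraction(1, 2 ** i)
    return total

# Hypothetical example: M = the odd positive integers, so the series is
# 1/2 + 1/8 + 1/32 + ..., a geometric series with limit 2/3.
is_odd = lambda i: i % 2 == 1
approx = partial_value(0, is_odd, 20)
print(float(approx))
```

Because M is required to be infinite, a number such as 1 would be encoded as ⟨0, M⟩ with M the set of all positive integers (1/2 + 1/4 + 1/8 + ...) rather than by a terminating expansion, which is one way to see why each number receives only one pair.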
To be sure, the definition of the real numbers as ratios of quantities is tenable only if one succeeds in demonstrating the existence of a positive class. According to Frege’s exposition in [21, § 164], the real numbers can be defined as ordered pairs ⟨m, M⟩ by proceeding from the natural numbers. It can further be shown that the Relations x + a = y for positive real numbers a form a positive class. The proof that the real numbers qua ratios of quantities or measurement numbers do exist is carried out by using the real numbers in the manner just indicated. This is, as v. Kutschera shows, quite simple after already having proved the property of real numbers for the pairs ⟨m, M⟩. Let us define $\bar{a}$ := #z∃x∃y(z = ⟨x, y⟩ ∧ x + a = y) and s := class of $\bar{a}$ for positive a; then $U\bar{a} = \overline{-a}$, $\bar{a}L\bar{b} = \overline{a + b}$, $\bar{a}LU\bar{b} = \overline{a - b}$ and s = class of $\bar{a}$ for any real number a. We obtain then P(s) by using the definition of the concept of positival class, and from the theorems

A3: $r <_s t \rightarrow \exists p\,(r <_s p \wedge p <_s t)$

and

A4: $u \subset s \wedge v \subset s \wedge \forall r\,\forall t\,(r \in u \wedge t \in v \rightarrow r <_s t) \rightarrow \exists q\,\forall r\,\forall t\,(r \in u \wedge r \neq q \wedge t \in v \wedge t \neq q \rightarrow r <_s q \wedge q <_s t)$

we also obtain $P^{*}(s)$. Hence, the sets R⟨m, M⟩ are not empty for any pair ⟨m, M⟩. Furthermore, R⟨m, M⟩ ∩ R⟨m′, M′⟩ = ∅ for m ≠ m′ or M ≠ M′. Since all elements of a positive class s can be represented in the form

$$m \cdot r + \sum_{i=1}^{\infty} \frac{c_i^{M} \cdot r}{2^{i}}$$

(where m ≥ −1 and r ∈ s),54 $r \in s \wedge P^{*}(s) \wedge r \in s' \wedge P^{*}(s') \rightarrow s = s'$ holds. Due to the uniqueness of this representation, the following also holds:

$$m \cdot r + \sum_{i=1}^{\infty} \frac{c_i^{M} \cdot r}{2^{i}} = m' \cdot r + \sum_{i=1}^{\infty} \frac{c_i^{M'} \cdot r}{2^{i}} \rightarrow m = m' \wedge M = M'.$$

The pairs ⟨m, M⟩ and Frege’s measurement numbers are therefore correlated one-to-one.55

53 Simons [52] raises the question “What would the continuation have looked like” (p. 375) and answers the question by presenting an interesting outline (pp. 375-381).
54 See the proof of A3 and A4 in [37, p. 307].
55 Cf. [37, pp. 311 f.]. Simons [52] suggests that rather than speaking of Frege’s logicism it is more appropriate to speak of his logicisms in the plural. “The logicist thesis is applied locally, not globally” (p. 384). I disagree. I do not think that it is terribly illuminating to speak of Frege’s logicisms in the plural, of one logicism with respect to number theory and of a distinct one with respect to the theory of real numbers. (If they were not distinct, it would hardly make sense to speak of logicisms in the plural.) Admittedly, earlier I said that number theory was for Frege the paradigm and the acid test for logicism. Yet the fact that the execution of his logicist programme is designed to proceed in stages or in layers and, in particular, the fact that in the case of cardinal arithmetic and analysis the programme deals with different domains of application (the countable and the measurable) does not license the assumption that Frege was thinking in terms of local logicisms rather than in terms of one “global” logicism applying uniformly to several branches of arithmetic, in fact to arithmetic in its entirety. Following his logicist manifesto in [18], logicism is the doctrine according to which the basic laws of arithmetic are analytic, that is, can be proved exclusively from primitive laws of logic and definitions. Later, in the Preface to [20, 21], Frege writes that in his earlier work he sought to make it plausible that arithmetic is a branch of logic and need not take any ground of proof from either experience or intuition. Again, the logicist thesis in either formulation is meant to apply to all segments of arithmetic. Regarding number theory and analysis with which Frege was exclusively concerned in [20, 21], this assessment applies independently of the fact (not explicitly mentioned by Frege himself) that the domain of the measurable is less encompassing than the domain of the countable; for example, concepts, thoughts and ideas are not measurable in any plausible sense of measurement, but they are countable.

In their joint paper ‘On a Question of Frege’s about Right-ordered Groups’, Adeleke, Dummett and Neumann are, in their final assessment of Frege’s approach to the foundations of analysis, full of praise for it, despite the fact that it was overshadowed by Russell’s paradox and consequently remained a fragment. They observe that Frege “treated the applications of the real numbers as far more decisive for the way they should be defined than they are in other theories of the foundations of analysis. Mathematically, his construction of the real numbers, uncompleted because of the disaster wrought by Russell’s contradiction, was a pioneering investigation of groups with orderings. . . It is an injustice that, in the literature on group theory, Frege is left unmentioned and denied credit for his discoveries” [1, p. 64].

Acknowledgments
Just a few weeks before I submitted the final version of this essay for publication I was glad to have had the chance of presenting a considerable part of this material to several competent audiences. I am grateful to Rodrigo Bacellar for inviting me to give a talk at the Department of Philosophy of the University of São Paulo, to Jean-Yves Béziau for inviting me to talk at the Brazilian Academy of Philosophy in Rio de Janeiro and to Hannes Leitgeb for inviting me to give a talk in his Colloquium in Mathematical Philosophy at the University of Munich. Thanks to the audiences for interesting discussion. Part of this material was also presented at the Department of Logic of Charles University, Prague in November 2009. I am grateful to Vitezslav Svejdar for his kind invitation. While I began working on a joint research project entitled “Truth and truth-value in the analytic and logical tradition” in Rio de Janeiro in autumn 2011, I also managed to write a few sections of the present paper and almost put the finishing touches to it. I am grateful to CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nivel Superior) for generous financial support of my stay at Universidade Federal do Rio de Janeiro, especially to Mrs. Carina Mayumi Hosaka de Vasconcelos and Mr. Leonardo Oliveira for taking care of my individual concerns, and to the DAAD (Deutscher Akademischer Austauschdienst) for providing financial travel support (thanks are due to Rita Meyer for her advice when I was preparing my stay in Rio de Janeiro and to Elke Massa-Miranda). I also wish to express my enormous gratitude to Godehard Link and Mic Detlefsen for organizing the terrific project “The Imaginary, the Ideal and the Infinite in Mathematics. Their Theory, History, and Philosophical Understanding” (sponsored by the Alexander von Humboldt-Stiftung), from which I derived much benefit, to Jesse Anne Tomalty and Daniel Mook for carefully reading the entire essay and for making useful suggestions to improve it, to Roberto Torretti for his advice concerning a question about Hankel and Euclid and to Michael Scanlan for his comments on the section I wrote on Euclid and Aristotle’s conception of magnitude.
References
[1] S. Adeleke, M. Dummett, and P. Neumann. On a Question of Frege’s About Right-Ordered Groups. Bulletin of the London Mathematical Society, 19:513–521, 1987. A summary of the article with a few additional comments appeared in Dummett, Frege and Other Philosophers, Clarendon Press, Oxford, 1991, 53–64 (page references are to this book).
[2] Aristotle. Metaphysics, volume 1, a revised text with introduction and commentary by W. D. Ross. Clarendon Press, Oxford, 1924. English translation by W. D. Ross.
[3] Aristotle. Categoriae et liber de interpretatione. Oxford University Press, Oxford, 1956, edited by L. Minio-Paluello.
[4] J. J. Baumann. Die Lehren von Raum, Zeit und Mathematik in der neueren Philosophie nach ihrem ganzen Einfluss dargestellt und beurtheilt, I. Band: Suarez, Descartes, Spinoza, Hobbes, Locke, Newton. 1868.
[5] G. Cantor. Gesammelte Abhandlungen mathematischen und philosophischen Inhalts. Berlin, 1932, edited by E. Zermelo. Reprint G. Olms, Hildesheim 1966.
[6] H. Cohen. Kants Theorie der Erfahrung. Ferd. Dümmlers Verlagsbuchhandlung, Berlin, 1871.
[7] H. Cohen. Das Prinzip der Infinitesimal-Methode und seine Geschichte. Ein Kapitel zur Grundlegung der Erkenntniskritik. Ferd. Dümmlers Verlagsbuchhandlung, Berlin, 1883. New edition with an introduction of W. Flach, Suhrkamp, Frankfurt/M. 1968.
[8] G. Currie. Continuity and Change in Frege’s Philosophy of Mathematics. In L. Haaparanta and J. Hintikka (eds), Frege Synthesized, pp. 345–373. Dordrecht, 1986.
[9] R. Dedekind. Stetigkeit und irrationale Zahlen. Vieweg und Sohn, Braunschweig, 1872.
[10] R. Dedekind. Was sind und was sollen die Zahlen? Vieweg und Sohn, Braunschweig, 1888.
[11] W. Demopoulos (ed.). Frege’s Philosophy of Mathematics. Harvard University Press, Cambridge, Massachusetts, 1997.
[12] M. Dummett. Frege. Philosophy of Mathematics. Duckworth, London, 1991.
[13] Euclid. Elements, Greek text in six volumes. Teubner, Leipzig, 1969-1977, edited by E. S. Stamatis. English translation by L. Heath, Green Lion Press, Santa Fe, New Mexico 2002.
[14] L. Euler. Vollständige Anleitung zur Algebra. 1770.
[15] G. Frege. Über eine geometrische Darstellung der imaginären Gebilde in der Ebene. In Frege [23], pp. 1–49.
[16] G. Frege. Rechnungsmethoden, die sich auf eine Erweiterung des Größenbegriffes gründen. In Frege [23], pp. 50–84.
[17] G. Frege. Begriffsschrift. Eine der arithmetischen nachgebildete Formelsprache des reinen Denkens. L. Nebert, Halle a. S., 1879.
[18] G. Frege. Die Grundlagen der Arithmetik. Eine logisch mathematische Untersuchung über den Begriff der Zahl. W. Koebner, Breslau, 1884.
[19] G. Frege. Über formale Theorien der Arithmetik. Jenaische Zeitschrift für Naturwissenschaft Supplement, 19:94–104, 1885/86.
[20] G. Frege. Grundgesetze der Arithmetik. Begriffsschriftlich abgeleitet, volume I. H. Pohle, Jena, 1893.
[21] G. Frege. Grundgesetze der Arithmetik. Begriffsschriftlich abgeleitet, volume II. H. Pohle, Jena, 1903.
[22] G. Frege. The Foundations of Arithmetic. A Logical-Mathematical Enquiry into the Concept of Number. Translated by J. L. Austin. Basil Blackwell, Oxford, 1950.
[23] G. Frege. Kleine Schriften. G. Olms, Hildesheim, 1967, edited by I. Angelelli.
[24] G. Frege. Nachgelassene Schriften. F. Meiner, Hamburg, 1969, edited by H. Hermes, F. Kambartel, and F. Kaulbach.
[25] G. Frege. Wissenschaftlicher Briefwechsel. F. Meiner, Hamburg, 1976, edited by G. Gabriel, H. Hermes, F. Kambartel, C. Thiel, and A. Veraart.
[26] G. Frege. The Foundations of Arithmetic. A Logical-Mathematical Investigation into the Concept of Number. Translated with an Introduction and Critical Commentary by D. Jacquette. Longman, New York, 2007.
[27] H. Hankel. Theorie der complexen Zahlensysteme [,] insbesondere der gemeinen imaginären Zahlen und der Hamilton’schen Quaternionen nebst ihrer geometrischen Darstellung. Leipzig, 1867.
[28] R. G. Heck. The Julius Caesar Objection. In R. G. Heck (ed.), Language, Thought and Logic, Essays in Honour of Michael Dummett, pp. 273–308. Clarendon Press, Oxford, 1997.
[29] D. Hilbert. Grundlagen der Geometrie. Teubner, Leipzig, 1899. Seventh revised and enlarged edition 1930.
[30] D. Hilbert. Über den Zahlbegriff. Jahresbericht der deutschen Mathematiker-Vereinigung, 8:180–184, 1900.
[31] E. V. Huntington. Simplified Definition of a Group. Bulletin of the American Mathematical Society, 2nd Series 8:296–300, 1901-2.
[32] E. V. Huntington. A Complete Set of Postulates for the Theory of Absolute Continuous Magnitude. Transactions of the American Mathematical Society, 3:264–279, 1902.
[33] E. V. Huntington. Complete Sets of Postulates for the Theories of Positive Integral and Positive Rational Numbers. Transactions of the American Mathematical Society, 3:280–284, 1902.
[34] E. Illigens. Zur Definition der Irrationalzahlen. Mathematische Annalen, 35:451–455, 1889. ISSN 0025-5831.
[35] I. Kant. Kritik der reinen Vernunft. 1781/1787. Edited by R. Schmid, Felix Meiner, Hamburg 1956; English translation by N. K. Smith, Macmillan & Co, New York, 1929; edited and newly translated by P. Guyer and A.W. Wood, Cambridge University Press, Cambridge, 1998.
[36] I. Kant. Kritik der Urteilskraft. 1799. Hamburg, 1963, edited by K. Vorländer.
[37] F. von Kutschera. Freges Begründung der Analysis. Archiv für mathematische Logik und Grundlagenforschung, 9:102–111, 1966. Reprinted in Schirn [40], vol I., Logik und Philosophie der Mathematik – Logic and Philosophy of Mathematics, 301-312.
[38] B. Pascal. Oeuvres complètes. Paris, 1963, edited by L. Lafuma. De l’esprit géométrique et de l’art de persuader.
[39] B. Russell. The Principles of Mathematics. W.W. Norton Company, New York, 1903.
[40] M. Schirn (ed.). Studien zu Frege – Studies on Frege, vols. I-III. Frommann-Holzboog, Stuttgart–Bad Cannstatt, 1976.
[41] M. Schirn. Studien zu Freges Philosophie der Mathematik. Unpublished Habilitationsschrift, University of Regensburg (710 pp.), 1985.
[42] M. Schirn. Semantische Vollständigkeit, Wertverlaufsnamen und Freges Kontextprinzip. Grazer Philosophische Studien, 23:79–104, 1985.
[43] M. Schirn. Frege on the Purpose and Fruitfulness of Definitions. Logique et Analyse, 125-126:61–80, 1989.
[44] M. Schirn. Frege y los nombres de cursos de valores. Theoria, 9:109–133, 1994.
[45] M. Schirn (ed.). Frege: Importance and Legacy. de Gruyter, Berlin, New York, 1996.
[46] M. Schirn. O Principio do Contexto nas Grundgesetze de Frege (The Context Principle in Frege’s Grundgesetze). Theoria, 11:177–201, 1996.
[47] M. Schirn. Fregean Abstraction, Referential Indeterminacy and the Logical Foundations of Arithmetic. Erkenntnis, 59:203–232, 2003.
[48] M. Schirn. Hume’s Principle and Axiom V Reconsidered: Critical Reflections on Frege and his Interpreters. Synthese, 148:171–227, 2006.
[49] M. Schirn. Concepts, Extensions, and Frege’s Logicist Project. Mind, 115:983–1005, 2006.
[50] M. Schirn. On Translating Frege’s Die Grundlagen der Arithmetik. History and Philosophy of Logic, 31:47–72, 2010.
[51] M. Schirn. Consistency, Models, and Soundness. Axiomathes, 20:153–207, 2010.
[52] P. Simons. Frege’s Theory of Real Numbers. History and Philosophy of Logic, 8:25–44, 1987. Reprinted in [11], pp. 358-385.
[53] W.W. Tait. Frege versus Cantor and Dedekind: On the Concept of Number. In Schirn [45], pp. 70–113.
[54] J. Tappenden. Frege on Axioms, Indirect Proof, and Independence Arguments in Geometry: Did Frege Reject Independence Arguments? Notre Dame Journal of Formal Logic, 41:271–315, 2000.
[55] A. Tarski. Introduction to Logic and to the Methodology of Deductive Sciences. Oxford University Press, New York, 1946. Second edition.
[56] A. N. Whitehead and B. Russell. Principia Mathematica, volume 3. Cambridge University Press, Cambridge, 1913.
Frege on Formality and the 1906 Independence-Test
Patricia A. Blanchette
In 1906, Frege proposes a method, one which rests on the “formal” nature of logical laws, for proving mathematical independence claims. There are many curious features of the 1906 proposal, including the fact that Frege seems subsequently to have found it unacceptable. This essay explores Frege’s proposal, and rejection, of the 1906 independence-test, with the goal of clarifying Frege’s understanding of the nature of logical entailment and of the “formal” nature of logical laws.
1 Introduction
In 1906, Frege proposes a general procedure for demonstrating the independence of a given (mathematical) claim from others.1 Three features of this proposal are worth noting. First, the proposal follows hard on the heels of seven years’ worth of harsh criticism, on Frege’s part, of independence-proofs. Since the appearance of Hilbert’s geometrical independence-proofs in 1899, Frege has been uniformly critical of all such proofs. The 1906 discussion is the first instance of any positive proposal on Frege’s part for demonstrating independence, and is the first indication he gives that he takes independence to be demonstrable at all. The second striking feature of the 1906 proposal is that it is not just Frege’s first positive discussion of a potential means of demonstrating independence; it is his only such discussion. Frege never again returns to the procedure proposed here, and indeed, as far as we can tell, never so much as refers to it again. This is not because Frege was satisfied with the 1906 discussion or considered the issue resolved. The proposal made there was, as Frege says, very tentative and without his own usual high standards of care and rigor. Far from being satisfied with this preliminary discussion of the independence test, Frege seems on reflection to have taken the test to be entirely unsatisfactory. Despite later discussions of independence, Frege never returns to the 1906 proposal, and in 1910 claims that the independence of the axiom of parallels – the very kind of thing which was presumably to have been demonstrated by the proposed method – “cannot be proved”.2 The third noteworthy feature of the proposed 1906 independence-test is that it bears a striking similarity both to our own, standard means of proving independence today, and to Hilbert’s method, the one relentlessly criticized by Frege from 1900-1906, even in the very essay in which he proposes his own technique. This similarity makes it difficult to understand what Frege had in mind when he proposed the 1906 test. Why, in particular, didn’t his criticisms of Hilbert apply immediately and obviously to his own proposal? And why, having presumably thought that his own method did not fail in this immediate way, did Frege later abandon it? What follows is intended as a contribution to the understanding of Frege’s conception of logic by way of coming to understand what lies behind Frege’s proposal of, and his apparent subsequent rejection of, the 1906 independence-test.
1 [13].
2 The Proposal
Frege’s proposal occurs in the final section of the 1906 “On the Foundations of Geometry”, his second essay-series of that title. The bulk of the essay consists of a criticism of Hilbert’s consistency- and independence-results, a criticism continued from the 1899-1900 correspondence with Hilbert, and the 1903 “On the Foundations of Geometry”.3 Following the criticism of Hilbert’s specific style of independence-proof, Frege at last turns to the question of whether independence can be demonstrated at all. To begin with, Frege clarifies what he means by “independence”. The conception is relatively straightforward: a given thought is independent of a collection of thoughts if that thought can’t be obtained by a (presumably finite) series of steps of logical inference from the thoughts in that collection.4 That it’s thoughts in question, rather than sentences, is crucial for Frege. As he sees it, the question of logical entailment, and hence of independence, arises only for thoughts, i.e. for the kinds of things that have a definite truth-value, and are not subject to reinterpretation in the way that sentences are. As Frege puts it,

When one uses the phrase ‘prove a proposition’ in mathematics, then by the word ‘proposition’ one clearly means not a sequence of words or a group of signs, but a thought; something of which one can say that it is true. And similarly, when one is talking about the independence of propositions or axioms, this, too, will be understood as being about the independence of thoughts.5

2 [14, p. 183n.]. The claim that Frege’s 1910 discussion constitutes a rejection of the proposed 1906 test is not uncontroversial. For a contrary view, see [27]; for a reply, see section 3 below.
3 See the correspondence with Hilbert, English translation in [18, p. 31-52], and [11].
4 See [13, pp. 423-4/334]. Frege isn’t explicit about the finitude, but the context would seem to indicate that he takes the series of inferential steps to be one that we could in principle complete. Nothing in what follows will turn on the assumption that he presumes the number of inferences to be finite. As a reminder: thoughts, for Frege, are the nonlinguistic propositions expressed by sentences; they are not mental entities.
The discussion of how one might demonstrate independence begins as follows:

How can one prove the independence of a thought from a group of thoughts? First of all, it may be noted that with this question we enter into a realm that is otherwise foreign to mathematics. For although like all other disciplines mathematics, too, is carried out in thoughts, still, thoughts are otherwise not the object of its investigations. Even the independence of a thought from a group of thoughts is quite distinct from the relations otherwise investigated in mathematics. Now we may assume that this new realm has its own specific, basic truths which are as essential to the proofs constructed in it as the axioms of geometry are to the proofs of geometry; and that we also need these basic truths especially to prove the independence of a thought from a group of thoughts.6
Frege begins his sketch of the new discipline by laying down two straightforward “basic truths”. These truths sound strange to a modern ear, since they follow from Frege’s unusual view that the premises of an inference are always true, together with the standard view that logical inference is truth-preserving. The two principles are as follows:

(L1) If the thought G follows from the thoughts A, B, C by a logical inference, then G is true.

(L2) If the thought G follows from the thoughts A, B, C by a logical inference, then each of the thoughts A, B, C is true.

These laws will of course not get us very far in the investigation of independence claims, particularly when we are interested in the independence of a true thought from a group of true thoughts. As Frege says,

But our aim is not to be achieved with these basic truths alone. We need yet another law which is not expressed quite so easily. Since a final settlement of the question is not possible here, I shall abstain from a precise formulation of this law and merely attempt to give an approximation of what I have in mind.7
The “approximation” Frege gives is as follows: Suppose we have a language L which is fully interpreted, in the sense that its sentences each express determinate thoughts. Suppose also that L is “logically perfect” in the sense that the replacement of a word in a given sentence by a word of the same syntactic category always gives a new well-formed sentence, one which expresses a determinate thought. Consider now a function µ which maps words of L to words of L, and which both preserves syntactic type, and maps “logical” words to themselves. This function will then induce a map from sentences to sentences, and from whole arguments to whole arguments. Given an argument with true premises and a true conclusion, if the function µ maps this argument to a new argument which also has true premises but has a false conclusion, then we can conclude that the original argument’s conclusion is independent of its premises. As Frege puts it,

Let us now consider whether a thought G is dependent upon a group of thoughts Ω. We can give a negative answer to this question if, according to our vocabulary [i.e., according to the mapping µ], to the thoughts of group Ω there corresponds a group of true thoughts Ω′, while to the thought G there corresponds a false thought G′. For if G were dependent upon Ω, then, since the thoughts of Ω′ are true, G′ would also have to be dependent upon Ω′ and consequently G′ would be true.8

5 [13, p. 401]. See also [15, p. 206]: “What we prove is not a sentence, but a thought. And it is neither here nor there which language is used in giving the proof.”
6 [13, pp. 425-6 (336)].
7 [13, p. 426 (337)].
We can abbreviate the test for independence as follows:

(L3) Consider a set P of premise-sentences and a conclusion-sentence C of a fully-interpreted, logically-perfect language L. Let µ be a mapping of the primitive vocabulary of the language to itself which meets the following conditions:
(i) µ preserves syntactic type; and
(ii) µ maps logical terms to themselves.
Consider now the set P′ and sentence C′ obtained from P and C by replacing each term with its image under µ. If each of the thoughts expressed by the members of P′ is true, while the thought expressed by C′ is false, then the thought expressed by C is independent of the thoughts expressed by the members of P.

Frege closes his discussion of the test on a cautious note, pointing out two difficulties: first, that of giving the proposed test with more precision, and second, that of distinguishing logical from non-logical vocabulary, as is essential in order to precisely specify the second requirement. He does not claim here that these difficulties are insurmountable or that they give rise to serious objections to the proposal. His attitude seems to be rather that there is more to be said, and that further investigation is required before the rule can be clearly formulated and applied.
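The test (L3) can be sketched for a very small toy language. Everything in the following Python sketch is an assumption made for illustration: the vocabulary (the predicate "person" in particular), the set `FACTS` that stands in for the fixed interpretation, and the function names. It is not Frege's own formulation, which he explicitly leaves imprecise.

```python
# Toy vocabulary (hypothetical; "person" is added only for this example).
NAMES = {"Charlemagne", "Sahara"}
PREDICATES = {"king", "desert", "person"}
LOGICAL = {"not", "and", "or"}

# Stand-in for the fixed interpretation: the atomic sentences that are true.
FACTS = {("king", "Charlemagne"), ("person", "Charlemagne"), ("desert", "Sahara")}

def is_true(sentence):
    """Atoms are (predicate, name) pairs; compounds are (connective, parts...)."""
    head = sentence[0]
    if head == "not":
        return not is_true(sentence[1])
    if head == "and":
        return is_true(sentence[1]) and is_true(sentence[2])
    if head == "or":
        return is_true(sentence[1]) or is_true(sentence[2])
    return sentence in FACTS

def admissible(mu):
    """Condition (i): syntactic type is preserved; (ii): logical words are fixed."""
    names_ok = all(mu.get(n, n) in NAMES for n in NAMES)
    preds_ok = all(mu.get(p, p) in PREDICATES for p in PREDICATES)
    logic_ok = all(mu.get(w, w) == w for w in LOGICAL)
    return names_ok and preds_ok and logic_ok

def translate(sentence, mu):
    """Replace every non-logical word by its image under mu."""
    head = sentence[0]
    if head in LOGICAL:
        return (head,) + tuple(translate(part, mu) for part in sentence[1:])
    predicate, name = sentence
    return (mu.get(predicate, predicate), mu.get(name, name))

def shows_independence(premises, conclusion, mu):
    """True if the images of the premises are true and the image of the conclusion is false."""
    if not admissible(mu):
        raise ValueError("mapping violates condition (i) or (ii)")
    premises_true = all(is_true(translate(p, mu)) for p in premises)
    return premises_true and not is_true(translate(conclusion, mu))

MU = {"Charlemagne": "Sahara", "Sahara": "Charlemagne",
      "king": "desert", "desert": "king"}
result = shows_independence([("king", "Charlemagne")], ("person", "Charlemagne"), MU)
print(result)
```

In this toy case the premise "Charlemagne is a king" is mapped to the true "Sahara is a desert", while the conclusion "Charlemagne is a person" is mapped to the false "Sahara is a person", so the test counts the conclusion as independent of the premise. A result of `False` for a given µ settles nothing, since the test only states a sufficient condition for independence.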
8 [13, p. 428 (339)].
Prior to suggesting his test for independence, while warning his audience that what he’s presenting is merely an “approximation” of what he has in mind, Frege notes that we can say of the central idea here that:

One might call it an emanation of the formal nature of logical laws.9
It’s going to be important for what follows to have as clear an understanding as possible of what Frege means here by the “formal” nature of logical laws. Note, to begin with, that he does not mean anything like “syntactic”. For Frege, logic is not essentially connected with syntax in any way; it’s not essentially connected with the “forms” of sentences, or of any linguistic items. Logical inference, and hence the principles that govern that inference, have as their subject-matter, again, Fregean thoughts. Sentences, as merely the contingent means of expressing thoughts, are connected to principles of logic only via the thoughts they express. The kinds of syntactic derivation-rules one finds in a system like that of Frege’s Begriffsschrift merely tell one how justifiably to derive sentences one from another, where that justification is given by (a) the fundamental logical principles linking thoughts one to another, and (b) the contingent choices we’ve made concerning how thoughts of various kinds are expressed by sentences of various kinds. What, then, does Frege mean by the “formal” nature of logical laws? En route to explaining the importance of condition (ii) in (L3) above, Frege asks his audience to suppose a mapping µ meeting the conditions other than (ii) listed above, and gives a brief description of how the appeal to such a mapping when applying (the thus-minimized version of) (L3) can give rise to faulty judgments of independence. The demonstration is as follows. Consider a case in which we’re presented on the left-hand side of a page a series of sentences expressing true premise-thoughts and a conclusion-thought that’s logically entailed by those premise-thoughts, and on the right-hand side the sentences, expressing thoughts, that are induced by a mapping µ meeting condition (i) but not (ii).10 Assume further that the premise-thoughts on the right are true. 
We can now ask, says Frege, whether the conclusion-sentence on the right “is the appropriate conclusion-sentence of the inference on the right”.11 That is, we ask whether we’ll get, on the right-hand side, an argument whose conclusionthought is entailed by its premise-thoughts in a way that mirrors the entailment on the left. (In order to see what’s coming, note that by disregarding criterion (ii), our mapping might e.g. have mapped an “and” on the lefthand side to an “or” on the right, hence mangling, as one might put it, the logical structure of the argument.) Regarding this question, Frege remarks: 9
[13, p. 426/337].
10 Frege does not explicitly say that the conclusion on the left is entailed by the premises on the left. I take it from the surrounding discussion that this is what he intends. 11 [13, p. 427/338].
One may now be tempted to answer our question in the affirmative, thereby appealing to the formal nature of the laws of logic according to which, as far as logic itself is concerned, each object is as good as any other, and each concept of the first level as good as any other and can be replaced by it; etc. But this would be excessively hasty, for logic is not as unrestrictedly formal as is here presupposed. If it were, then it would be without content. Just as the concept point belongs to geometry, so logic, too, has its own concepts and relations; and it is only in virtue of this that it can have a content. Toward what is thus proper to it, its relation is not at all formal. No science is completely formal; but even gravitational mechanics is formal to a certain degree, in so far as optical and chemical properties are all the same to it. To be sure, as far as it is concerned, bodies with different masses are not mutually replaceable; but in gravitational mechanics the difference of bodies with respect to their chemical properties does not constitute a hindrance to their mutual replacement. To logic, for example, there belong the following: negation, identity, subsumption, subordination of concepts. And here logic brooks no replacement. It is true that in an inference we can replace Charlemagne by Sahara, and the concept king by the concept desert, in so far as this does not alter the truth of the premises. But one may not thus replace the relation of identity by the lying of a point in a plane. Because for identity there hold certain logical laws which as such need not be numbered among the premises, and to these nothing would correspond on the other side. Consequently a lacuna might arise at that place in the proof. One can express it metaphorically like this: About what is foreign to it, logic knows only what occurs in the premises; about what is proper to it, it knows all. 
Therefore in order to be sure that in our translation, to a correct inference on the left there again corresponds a correct inference on the right, we must make certain that in the vocabulary to words and expressions that might occur on the left and whose references belong to logic, identical ones are opposed on the right.12
12 [13, pp. 427-8; 338-9].
That is to say, we must require the mappings in question to meet condition (ii). One thing that’s clear from this passage is that the “formal” nature of logic, as Frege sees it, is of a kind with a certain formality had by every (or almost every?) science. For each science, there’s some range of concepts, objects, functions etc. such that the replacement of one concept/object/function by another (preserving type) outside of this range is irrelevant to the science. The uniform replacement of terms referring to the color red by terms referring to blue, for example, when we’re reasoning just about the masses of the objects in question, will have no effect on the scientific viability of that discourse. Logic is just, in Frege’s view here, a particular example of this phenomenon: for terms whose referents come from a given range of concepts/objects/functions whose nature is part of the subject-matter of logic, the replacement of one for another can change the expression of good logical reasoning into the expression of fallacious such reasoning, while the substitution of terms outside of this range for one another will not affect the logical validity of such reasoning.
Given this account of formality, it’s clear how the formal nature of logic leads naturally to the idea of (L3) as a test for logical independence. In the kind of case at hand, we know that because C′ expresses a false thought while the members of P′ express truths, C′ cannot be reached by steps of logical inference from P′. But then we know that the original C, similarly, cannot be reached by steps of logical inference from P. For each step of logical inference in the path from P to C would, via the mapping µ, correspond to a step of logical inference constituting a path from P′ to C′, which we know there can’t be. That the steps of good logical inference in the path from P to C would correspond via µ to steps of good logical inference in the path from P′ to C′ is guaranteed by the fact that µ preserves “logical” objects/concepts/functions, and that all other type-preserving substitutions (i.e. the ones given by µ) are ones with respect to which logic is insensitive. Given Frege’s view of independence as the absence of just such an inferential path, it would seem that (L3) is a good test for Fregean independence, and that its success is indeed due, as Frege says, to the “formal” nature of the laws of logic.
Why, then, doesn’t Frege just stop here and call it a day? Here there are several questions. Why did Frege consider this account of a method for proving independence to be unfinished? Why, given the importance for mathematics generally and for Frege’s work in particular of the notion of independence, does he never return to the topic and take care of whatever remaining doubts he had?
And why does he claim four years later, presumably in wholesale rejection of the method given here, that
The indemonstrability of the axiom of parallels cannot be proved. If we do this apparently, we use the word ‘axiom’ in a sense quite different from that which is handed down to us. Cf. my essays ‘On the Foundations of Geometry’. . .13
The “Foundations of Geometry” essays Frege refers us to here are those in which he engages in a sustained criticism of Hilbert’s proofs of consistency and independence. One of his points there, alluded to in this passage and discussed below, is that one can only take Hilbert’s proof-technique to be successful in demonstrating the independence of what one calls an “axiom” of Euclidean geometry if one uses this term non-standardly, to stand either for a multiply-interpretable sentence, or for one of the non-Euclidean thoughts expressed under a non-standard interpretation of such a sentence.
Before turning to the questions just raised, we look first at a line of reasoning that seeks to show that the 1910 passage just quoted does not, in fact, show Frege rejecting independence proofs in general.
13 [18, p. 183n.]. Frege’s reference here is to [11] and [13].
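The role that condition (ii) plays in the 1906 test can be illustrated with a small truth-table sketch. It is only a toy, and makes several assumptions that are not Frege’s: truth-functional formulas stand in for Fregean thoughts, and the function names (`atoms`, `evaluate`, `entails`, `translate`) and the two sample mappings are hypothetical, introduced here purely for illustration. The sketch takes the valid left-hand argument from “p and q” to “p”, and compares a mapping that sends “and” to “or” (violating (ii)) with one that renames only non-logical letters.

```python
from itertools import product

def atoms(formula):
    """Collect the sentence letters occurring in a formula."""
    if isinstance(formula, str):
        return {formula}
    _, *args = formula
    return set().union(*(atoms(a) for a in args))

def evaluate(formula, valuation):
    """Evaluate a formula (a letter or a tuple like ("and", "p", "q"))."""
    if isinstance(formula, str):
        return valuation[formula]
    op, *args = formula
    if op == "not":
        return not evaluate(args[0], valuation)
    if op == "and":
        return evaluate(args[0], valuation) and evaluate(args[1], valuation)
    if op == "or":
        return evaluate(args[0], valuation) or evaluate(args[1], valuation)
    raise ValueError(f"unknown connective: {op}")

def entails(premises, conclusion):
    """True iff no valuation makes every premise true and the conclusion false."""
    letters = sorted(set().union(atoms(conclusion), *(atoms(p) for p in premises)))
    for values in product([True, False], repeat=len(letters)):
        valuation = dict(zip(letters, values))
        if all(evaluate(p, valuation) for p in premises) and not evaluate(conclusion, valuation):
            return False
    return True

def translate(formula, mu):
    """Apply a vocabulary mapping uniformly; symbols absent from mu are held fixed."""
    if isinstance(formula, str):
        return mu.get(formula, formula)
    op, *args = formula
    return (mu.get(op, op), *[translate(a, mu) for a in args])

# Left-hand argument: "p and q", therefore "p".
left_premises = [("and", "p", "q")]
left_conclusion = "p"

mu_bad = {"and": "or"}            # violates (ii): a logical term is replaced
mu_good = {"p": "r", "q": "s"}    # respects (ii): only non-logical letters change

def image(mu):
    """Right-hand argument induced by the mapping mu."""
    return [translate(p, mu) for p in left_premises], translate(left_conclusion, mu)
```

Under `mu_bad` the right-hand premise can be true while the right-hand conclusion is false, so a version of the test without condition (ii) would wrongly report independence for an argument whose left-hand conclusion is in fact entailed. Under `mu_good` the entailment on the left is mirrored on the right.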
3 The Import of the 1910 Notes
Jamie Tappenden has argued that the passage from Frege’s 1910 notes to Jourdain quoted above does not express the view that independence, of the kind under discussion in the 1906 passage, cannot be demonstrated. As Tappenden sees it, there are two senses of “independence” at play, which we can characterize as follows.
[Ind-1]:
Axiom A is independent of axioms A1 . . . An if it can be assumed without contradiction that A is false while A1 . . . An are true.
[Ind-2]:
A thought T is independent of a group of thoughts Ω iff T cannot be proved from Ω via steps of valid logical inference.14
Tappenden’s further claim is that Frege’s rejection of Hilbert’s methodology in the Foundations of Geometry is a rejection of attempts to prove independence in the first of these senses. Similarly for the rejection expressed in the 1910 Jourdain notes: Frege is here rejecting the idea that we can prove the parallels postulate to be independent in sense [Ind-1] of the other axioms of Euclidean geometry. But, says Tappenden, Frege has no objection to the demonstration of independence in sense [Ind-2].
As Tappenden understands it, Frege’s objection to demonstrations of independence in the sense of [Ind-1] is that such demonstrations, and indeed even the statement of independence itself, must involve what is by Frege’s lights an incoherent supposition: namely, that an axiom is false. From Frege’s point of view, axioms (which, recall, are not sentences but thoughts) are by definition true, and there is no sense to be made of supposing them to be false. If you think you’re considering a circumstance in which the parallels axiom is false, as Frege sees it, then you must be considering something other than points, lines, parallelness, etc; and hence it is not really the parallels axiom that you’re contemplating.
. . . [A]s long as I understand the words ‘straight line’, ‘parallel’, and ‘intersect’ as I do, I cannot but accept the parallels axiom. If someone else does not accept it, I can only assume that he understands these words differently. Their sense is indissolubly bound up with the axiom of parallels.15
In short: considerations that begin with the supposition that an axiom-thought is false are already, from this point of view, incoherent. Hence the methodology naturally suggested by [Ind-1] is, by Frege’s lights, fundamentally flawed. And this methodology, i.e. that of supposing the axiom in question to be false and checking for contradiction, is as Tappenden sees it the procedure rejected by Frege both in his controversy with Hilbert and in the 1910 Jourdain notes.
14 [27, pp. 273-4].
15 [15, pp. 266/247].
Frege does, to be sure, reject the idea (at least after 1884) that we can make sense of suppositions that involve the falsehood of axiom-thoughts. But it is not accurate to portray this as the reasoning behind Frege’s rejection of Hilbert’s independence-proofs. As Frege recognizes, Hilbert’s method in FG does not turn on supposing axiom-thoughts to be false. Instead, Hilbert’s proofs proceed by reinterpreting axiom-sentences in such a way that the thoughts newly expressed by those sentences are not the original axiom-thoughts of Euclidean geometry, but are instead new thoughts of an entirely different science altogether. Specifically, in order to show that an axiom-sentence σ is independent of a set Γ of axiom-sentences, Hilbert reinterprets the sentences in such a way that while Γ, as interpreted, expresses a set of theorems of the background theory B (in this case, a theory of constructions out of real numbers), σ on the new interpretation expresses a falsehood, specifically, the negation of a theorem of B. Hence, assuming the consistency of B, it is demonstrated that the sentence σ is not derivable from the set Γ of sentences. We can refer to the true geometric thoughts originally expressed by the members of Γ and by σ as “G(Γ)” and “G(σ)” respectively; the thoughts about real numbers expressed under Hilbert’s reinterpretation we’ll call “R(Γ)” and “R(σ)”. Hilbert’s procedure isn’t that of supposing the members of G(Γ) to be true while G(σ) is false, which would, as above, be incoherent on Frege’s view. His procedure instead involves, to put it in Frege’s terms, noting that the sentences can be reinterpreted so as to express (sets of) thoughts R(Γ) and R(σ) respectively, such that the members of the first are true while the last is false. 
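The shape of the reinterpretation procedure just described can be shown with a deliberately small sketch. It is not Hilbert’s construction: the three “axiom-sentences” (reflexivity, symmetry, transitivity of a binary relation), the three-element domain, and the relation chosen to interpret the relation symbol are all hypothetical stand-ins, with simple arithmetic on 0, 1, 2 playing the part of the background theory B. The sketch exhibits one interpretation under which the members of Γ come out true and σ comes out false.

```python
def reflexive(domain, relation):
    """Axiom-sentence 1 of the toy set Γ: every element bears R to itself."""
    return all((x, x) in relation for x in domain)

def symmetric(domain, relation):
    """Axiom-sentence 2 of the toy set Γ: R holds in both directions or neither."""
    return all((y, x) in relation for (x, y) in relation)

def transitive(domain, relation):
    """The toy sentence σ whose independence from Γ is at issue."""
    return all(
        (x, z) in relation
        for (x, y) in relation
        for (w, z) in relation
        if y == w
    )

# Hypothetical reinterpretation: the relation symbol is read as
# "differs by at most 1" over the numbers 0, 1, 2.
domain = {0, 1, 2}
relation = {(x, y) for x in domain for y in domain if abs(x - y) <= 1}

gamma = [reflexive, symmetric]
sigma = transitive

gamma_true = all(axiom(domain, relation) for axiom in gamma)
sigma_false = not sigma(domain, relation)
```

Here 0 is related to 1 and 1 to 2, but 0 is not related to 2, so σ fails while both members of Γ hold. What such a result shows is that the sentence σ is not derivable from the sentences Γ in any system whose rules preserve truth under every such reinterpretation; it says nothing, by itself, about the thoughts those sentences originally expressed, which is the gap Frege presses.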
Frege criticizes Hilbert’s terminological looseness with respect to the term “axiom”, and notes that it is confusing to have to deal with Hilbert’s use of the term to refer not just to what it should refer to, namely axiom-thoughts, but also to reinterpretable sentences, and to the thoughts expressed by those sentences under various reinterpretations. Those newly-expressed thoughts are typically not the axioms of any science, and indeed this is strikingly clear, from Frege’s point of view, with respect to those newly-expressed thoughts that are in fact false. Frege’s central complaint is that when Hilbert demonstrates what he, Hilbert, refers to as a situation in which axioms A1 . . . An are true and axiom A is false, what he has done instead is to consider a situation in which newly-expressed thoughts, not the axioms of geometry, have those truth-values.16 The difficulty, in short, is not that Hilbert supposes, incoherently, of a true axiom-thought G(A) that it is false, but instead that he shifts his focus from such a true axiom-thought to a quite different, false thought R(A) of a different science altogether.
16 For a detailed discussion of Frege’s criticism of Hilbert, see my [2] and [3], and [4, Chapter 5].
That this is a crucial point for Frege has to do with the fact that the independence of R(A) from R(A1) . . . R(An), as amply demonstrated by the truth of the latter thoughts and the falsehood of the first, does not imply the independence of G(A) from G(A1) . . . G(An), despite the use of the same sentences to express these thoughts. This is not the place to enter into all of Frege’s reasons for this view, but here we can sum up by noting that for Frege, the independence of one thought from a collection of thoughts turns not just on the syntactic structure of the sentences used to express them, but additionally on logical connections, if any, that obtain between the objects, concepts, and functions referred to by the parts of those sentences.
To return to the central issue of this section: Frege does indeed, considerably earlier than the debate with Hilbert, speak of independence in the kind of modal sense depicted in [Ind-1]. In Grundlagen, he says
For purposes of conceptual thought we can always assume the contrary of some one or other of the geometrical axioms, without involving ourselves in any self-contradictions when we proceed to our deductions, despite the conflict between our assumptions and our intuition. The fact that this is possible shows that the axioms of geometry are independent of one another and of the primitive laws of logic. . .17
17 [9, § 14].
This somewhat vague characterization appears several years before Frege had become clear about the nature of thoughts. It is unclear whether this characterization of independence is one that Frege would continue to endorse by the time he had developed his mature position on thoughts as the relata of logical entailment and hence of independence. It is, in any case, not a characterization that appears on either side in the debate with Hilbert.
What about the comment in the notes to Jourdain that the unprovability of the parallels axiom can’t be proven? Here the context is very thin, but it’s hard to make a case that Frege here means something other than unprovability from the other axioms of Euclidean geometry. More to the point, he doesn’t say or otherwise indicate that the problem has to do with the incoherence of assuming a true axiom-thought to be false. Instead, he refers his reader back to his own criticisms of Hilbert, which as we’ve seen are of a very different kind.
In short, while there’s some indication in Grundlagen of an understanding of independence which involves the supposition, later thought by Frege to be incoherent, of conditions under which an axiom-thought is false, this isn’t a conception that shows up in his complaints against independence-proofs. His rejection in 1910 of the possibility of proving independence, accompanied as it is by a reference to the complaints against Hilbert, would seem to be a rehearsal of the complaints he has made all along since 1900, complaints against a technique that does not involve the incoherent supposition of the falsehood of an axiom-thought, but that involves instead, as Frege sees it, the mis-application of the technique of reinterpretation. So the claim in 1910 that the unprovability of the parallels postulate can’t be demonstrated does, it seems, stand in quite stark contrast with the suggestion in 1906 that there may be a workable method for providing such demonstrations. Finally, Frege’s later discussion, in 1914, of independence in the posthumously-published “Logic in Mathematics” contains a rehearsal of the early criticism of Hilbert’s independence-proofs, and no mention of the 1906 proposal.18 It’s hard to avoid the conclusion that by 1910 Frege thought there was something seriously wrong with the approach outlined in 1906.
4 The Anti-Metatheory Explanation
According to one influential understanding of his work, Frege conceives of logic in a way that rules out, as meaningless, a large number of questions (and answers) which form a central part of contemporary logical theory. In particular, Frege’s conception of logic rules out, on this understanding, all metatheoretical questions. If this interpretation of Frege is correct, then there is – or, at least, it has been argued that there is – a straightforward explanation of Frege’s rejection of the 1906 independence-test. The argument, proposed by Tom Ricketts, is that in rejecting the prerequisites for metatheory, Frege rejects exactly what is required for making sense of, and for construing as successful, the 1906 independence-test.19 Indeed, as Ricketts sees it, Frege’s rejection of the 1906 test gives independent support to the claim that Frege’s conception of logic entails the incoherence of metatheory.
The general idea that Frege’s understanding of logic rules out metatheory rests on the ideas (a) that Frege conceives of logic as “universal”, and (b) that this universality makes it impossible to take logic, or a system of logic, as itself a subject-matter about which one can provide rigorous proofs. As van Heijenoort and Dreben put it,
For Frege, and then for Russell and Whitehead, logic was universal: within each explicit formulation of logic all deductive reasoning, including all of classical analysis and much of Cantorian set theory, was to be formalized. Hence not only was pure quantification theory never at the center of their attention, but metasystematic questions as such, for example the question of completeness, could not be meaningfully raised. . . . we have no vantage point from which we can survey a given formalism as a whole, let alone look at logic whole.20
18 [15].
19 [23].
20 [6, p. 44]. See also [22].
Similarly, as Ricketts puts it, it is impossible for Frege to formulate any “overarching conception of the logical”.21 The view of Frege’s conception of logic as anti-metatheoretical has a number of problems, and has been subjected to careful scrutiny and criticism in a number of places.22 The central difficulty, which in my view is decisive, is that of finding a sense of the “universality” of logic which is both adopted by Frege and incompatible with metatheory. Leaving aside the general difficulties, however, it is of interest to see whether, as Ricketts suggests, the 1906 discussion sheds any light on the “no-metatheory” claim, or vice-versa. Recall the second “difficulty” Frege notes on proposing the 1906 independence-test: the difficulty of distinguishing logical from non-logical vocabulary. As Ricketts sees it, Frege’s understanding of logic entails that this difficulty cannot be met. Because Frege holds that one cannot “step back” and survey logic whole, he must hold that one cannot, in principle, provide a distinction between the logical and the non-logical. And since the proposed test for independence turns crucially on such a distinction, the test is, in principle, unworkable. In short, Frege’s universalist conception of logic, on this account, entails both that metatheory makes no sense, and that the distinction between the “logical” and the “nonlogical”, a distinction crucial to the 1906 independence-test, cannot, in principle, be drawn. Before looking directly at this question, it’s interesting to ask to what extent the 1906 discussion is itself a piece of metatheory. If it is, then it would seem that Frege’s very raising of the question of how to demonstrate independence is itself reason against the interpretation on which metatheory is from his point of view incoherent. But in fact, it’s not clear whether one should apply the term “metatheory” here at all. 
If we take “metatheory” to mean roughly the systematic study of (a certain range of properties of) formal systems, then the independence-claims Frege is talking about in 1906 are not metatheoretical. This distinguishes Frege’s independence question from central independence-issues today. That e.g. CH is independent of ZFC is a claim about the non-existence of a proof in a specific (kind of) formal system. In general, where P is a set of premise-sentences and C a conclusion-sentence, a model of P ∪ {∼C} shows that C isn’t derivable from P in any system whose proofs are truth-preserving with respect to each of the members of the class of models under consideration. So such a result is very squarely metatheoretical, demonstrating as it does an important feature of the deductive system, or range of deductive systems, in question. For Frege on the other hand, the question of whether a given thought τ is independent of a collection Π of thoughts is the question of whether τ can be obtained from Π by a finite number of valid steps of logical inference. And while such a series of steps is straightforwardly a proof, it is not a proof in any particular formal system. Frege’s question is not whether a given formula is derivable from a set of formulas in e.g. the system of Begriffsschrift or Grundgesetze. It’s rather the question of whether a thought follows via steps of logical inference from a collection of thoughts, independently of whether there is available for our use a good codification in a formal system of the inferential steps involved. A modern, model-theoretic independence-proof reduces the consistency in system F of P ∪ {∼C} to the consistency of a background theory B (the theory used to provide a model of this set of formulas), and shows, assuming the consistency of B, that F itself is consistent. (For if F is inconsistent, then no set of formulas is consistent in F.) But because Frege’s independence-question has nothing to do with a particular formal system, an affirmative answer to his type of independence-question will have nothing to do with the consistency of any such system.
21 [23, p. 174].
22 See [26], [24], [25], [4], [5].
Recall Ricketts’ point about logical vocabulary. Independently of issues concerning metatheory, it is certainly true that if Frege cannot distinguish logical from non-logical vocabulary, then he can’t apply his own suggested law. Frege says very little throughout his corpus about the distinction between logical and non-logical vocabulary. This is as one should expect. Unlike his logicist successors, Frege does not understand logicism to involve any claims about logical vocabulary. His thesis does not involve the claim, arguably crucial to Russell and Carnap, that mathematical concepts are definable in terms of “purely logical” ones.
Frege’s understanding of the “purely logical” as it pertains to logicism is that purely-logical truths – a class which, he would argue, includes all truths of arithmetic – are those truths provable in the appropriate way from a canonical and independently-recognizable collection of fundamental logical principles. The mark of the purely logical is not a matter of a sentence’s containing just the right kind of vocabulary: it’s a matter of that sentence expressing a thought that’s provable via clearly logical inferences from plainly logical premises. So the Fregean logicist project provides Frege with no reason to distinguish logical from non-logical vocabulary, objects, or concepts.
Frege also lacks the roughly Tarskian motivation for distinguishing logical from non-logical vocabulary, i.e. the motivation supplied by the use of the apparatus of formal semantics to provide accounts of such relations as that of logical entailment. Frege has no involvement with relations defined in terms of the reinterpretation of vocabulary, and hence the most important reason for us post-Fregeans to distinguish logical from non-logical vocabulary – namely to mark off the former as those to be exempted from reinterpretation – is a reason that, aside from the considerations found in the 1906 passage, Frege doesn’t have. The 1906 passage is novel in Frege’s
work not just in its discussion of a potential means for demonstrating independence, but also in its involvement with any project that provides a reason to distinguish logical from non-logical vocabulary. Does the universality of logic as Frege conceives of it make it impossible, as Ricketts suggests, to draw such a distinction? Here, the answer would seem to be a clear “no”. Frege’s conception of logic as “universal” is a combination of the idea that the fundamental logical principles apply in all domains whatsoever, and that his own systems of formal logic are to be applicable, once we expand their vocabulary appropriately, to the formalization of discourse in any area whatsoever. But neither of these claims has any bearing on whether there is a helpful way to distinguish the range of concepts/objects/functions that count as “logical” in the sense required for (L3) from all other concepts, objects, and functions. Similarly, the question of whether Frege can make sense of metatheoretical reasoning would seem far removed from the question of whether he thinks he can provide a clear such distinction. For, again, the distinction in question has nothing to do with properties of particular formal systems, i.e. with metatheory. In sum: as I’ll argue below, the evidence suggests that Ricketts is right to focus on the difficulty of distinguishing logical terms and their references from others as a key to understanding Frege’s rejection not just of his own 1906 independence-test, but also of the general idea of proving independence. But the difficulties here have nothing to do with metatheory or universalism.
5 The Similarity with Hilbert
Recall that the central difficulty with Hilbert’s independence-proofs, as Frege sees it, is that they can establish the independence of the geometric axiom-thought G(A) from the collection of geometric axiom-thoughts G(A1) . . . G(An) only on the assumption that this independence-result follows from the superficially-similar result regarding wholly different thoughts, namely the independence of the thought R(A) from the collection R(A1) . . . R(An) of thoughts. And, for Frege, this assumption is generally unreliable, given the possibility of logical connections obtaining amongst the G-thoughts that don’t obtain amongst the superficially-similar R-thoughts.
But now we face the striking fact that Frege’s own proposed method of demonstrating independence would seem to suffer from exactly the same difficulty. Assuming no expressive limitations on the language in question, it would seem that a geometric axiom-sentence A will be independent of a collection of geometric axiom-sentences A1 . . . An in Hilbert’s sense if and only if the axiom-thought G(A) is independent of the axiom-thoughts G(A1) . . . G(An) in the sense of Frege’s 1906 test. For there will be an
interpretation (in Hilbert’s sense) on which each member of A1 . . . An is true while A is false, if and only if there is a function µ of the kind Frege mentions mapping A to A′, each An to A′n, and such that each member of A′1 . . . A′n expresses a true thought while A′ expresses a false one. The only difference between the Hilbert-interpretation I and Frege’s function µ is that while I maps each term t to a set or object o, µ maps t to a term t′ which refers to o. So as long as the Fregean language in question contains names for all of the objects and relations (or their extensions) to which Hilbert has recourse in constructing interpretations, Hilbert’s independence-test and the proposed 1906 independence-test will have exactly the same results. If this is right, then we can easily see that the cases in which Hilbert’s test gives, from Frege’s point of view, the wrong results will also be cases in which Frege’s proposed test gives the same incorrect results. The difficulty is that if the mapping µ takes us from vocabulary whose contents bear important logical relations to one another – e.g. the terms of geometry or of analysis – to ones that don’t, then the independence-declaration will be unreliable. Given such conceptual relations, it can straightforwardly be the case that the thought expressed by A is logically entailed by those expressed by A1 . . . An despite the existence of a mapping µ delivering false A′ and true A′1 . . . A′n. Consider for example the kind of entailment central to Frege’s logicist project: the thought expressed by (α) “0 has a successor” follows logically from that expressed by (β) “0 is a cardinal number” despite the fact that the 1906 test would say otherwise via a mapping of the arithmetical terms to non-arithmetical ones.23 In general, if the language in question includes such unanalyzed arithmetical terms as “0”, “successor”, and so on, then the 1906 test will, it seems, give results that flatly contradict Frege’s logicist thesis.
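The correspondence between a Hilbert-interpretation I and a Frege-mapping µ can be made concrete in a small sketch. Everything named here is a hypothetical illustration rather than anything found in Frege or Hilbert: the toy language has atomic sentences of the form (predicate-term, name-term), `ref` is its fixed reference relation, and the particular µ is chosen arbitrarily. The sketch assumes, as the text requires, that the language contains a term for every object or extension the interpretation uses, so that I can be obtained by composing µ with `ref`.

```python
# Fixed reference relation of the toy language: each term refers to an object
# (for names) or to an extension (for predicate-terms).
ref = {
    "0": 0,
    "1": 1,
    "2": 2,
    "even": frozenset({0, 2}),
    "odd": frozenset({1}),
}

def true_under(sentence, assignment):
    """An atomic sentence (predicate, name) is true iff the named object
    falls in the extension assigned to the predicate."""
    predicate, name = sentence
    return assignment[name] in assignment[predicate]

def translate(sentence, mu):
    """Frege-style step: replace terms by terms, holding unmapped terms fixed."""
    predicate, name = sentence
    return (mu.get(predicate, predicate), mu.get(name, name))

# Frege-style mapping: term -> term.
mu = {"even": "odd", "0": "1"}

# Hilbert-style interpretation: term -> object, obtained as ref composed with mu.
I = {term: ref[mu.get(term, term)] for term in ref}

sentences = [("even", "0"), ("even", "2"), ("odd", "1"), ("odd", "2")]

# Reinterpreting a sentence via I agrees with translating it via mu and then
# reading the translation under the fixed reference relation.
agree = all(
    true_under(s, I) == true_under(translate(s, mu), ref)
    for s in sentences
)
```

Because I is defined as the composition of µ with the fixed reference relation, the two procedures cannot come apart on any sentence of this toy language; that is the sense in which the two tests deliver the same verdicts when the same terms are held fixed.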
And if this is correct, i.e. that Frege’s proposed test and Hilbert’s own test for independence are equivalent (given the satisfaction of some unexceptional criteria by the language in question), then it would seem that the important question is not that of why Frege rejected the test, but of why he proposed it in the first place.
23 Indeed, it’s stronger than this: for Frege, the thoughts expressed by (α) and by (β) each follow logically from the empty set, i.e. are truths of logic. For us, the important point is the weaker one that the first thought is not independent of the second.
One point in the above line of reasoning, however, deserves scrutiny. The essential equivalence between Hilbert’s method and the method proposed by Frege in 1906 requires not just the linguistic richness noted above (i.e. that Frege’s language contains terms for Hilbert’s objects and functions), but also an agreement between the two methods with respect to the terms that are to be held fixed by the mapping (Frege) or the reinterpretation (Hilbert) in question. Recall again Frege’s requirement on the mapping µ that it map terms with “logical” contents to themselves. It will be important here not to leap to the conclusion that by “logical” objects/concepts/functions Frege means what we would mean by “logical” objects/concepts/functions. All we know so far is that Frege recognizes a distinction between terms (or their references) whose substitution one for another can make a difference to the logical validity of a step of inference, and all others; and that he uses the term “logical” for the former. While Hilbert is not explicit in Foundations of Geometry about the distinction between those words open for re-translation and those whose interpretation must remain fixed, his practice is the straightforward and familiar one of re-interpreting all geometric terms, and of holding fixed just a small core of paradigmatically-logical terms like “all”, “not”, “and”, etc.24 An important question to ask is that of whether Frege’s category of terms that must be held fixed in the mapping µ is broader than is Hilbert’s category of terms whose interpretation must be held fixed. If Frege’s category is significantly broader than Hilbert’s, then the equivalence suggested above disappears: independence in Hilbert’s sense will not, under this condition, imply independence in Frege’s 1906 sense.
With respect to that question, there are reasons that pull in both directions. First of all, Frege’s 1906 test is clearly inspired by the kinds of duality principles well known from projective geometry.25 That the interchange e.g.
of “point” with “line”, together with corresponding exchanges amongst related vocabulary, is guaranteed to map a theorem to a theorem, and a proof to a proof in projective geometry, provides a straightforward means of demonstrating independence: under such circumstances, the demonstration that such a map delivers true A′1 . . . A′n and false A′ gives a guarantee that A is not provable in that science from A1 . . . An . Even when we re-phrase this in terms of the Fregean thoughts involved, the guarantee stands: giving the terms in these sentences their ordinary references, where “point” means point and so on, the independence result is clear. That is to say, it’s clear as long as the mapping is of the kind just described, in which “point” is mapped to “line” and vice-versa, “lies on” to “passes through”, etc. In this paradigm case, there are two things worth noting. First, every geometric term would seem to be available for mapping to another; the only terms held fixed are the narrowly-circumscribed, now-canonical terms of logic. This feature of the paradigm setting would 24 There are no unrestricted quantifiers in F G. The sense in which “all” is fixed is that the only allowed re-interpretation of ‘all F ’s. . . ’ is that given by the reinterpretation of F . 25 For helpful discussion of this point see [27].
Frege on Formality and the 1906 Independence-Test
seem to inspire a narrow understanding of the class of fixed terms. But secondly, the geometric terms so mapped in this model case are generally all simple: their references are not definable or analyzable in terms of one another. And this simplicity is essential to the success of the paradigmatic independence-tests inspired by duality principles. Taking this crucial feature into account might be taken to inspire the view that the class of non-fixed terms can only include logically-simple ones. In a suitably rich language, this principle will give rise to a broad understanding of the class of fixed terms. Frege does not seem to have thought that all of the terms of Euclidean geometry are simple in this sense. He reports to Hilbert, for example, that in his own unfinished investigations into the foundations of geometry, he has been able to make do with fewer primitive terms than has Hilbert, this presumably because he takes it that it is possible to define some in terms of others.26 Along these lines we find, for example, Frege’s 1879 definition of the lying of a point A on a line BC in terms of the congruence of pairs of points.27 Let’s return to the case of arithmetic. Frege’s view is that when we have fully analyzed the contents of (α) and (β), we will be left with considerably more-complex sentences (α∗) and (β∗), sentences which cash out those thoughts in terms of considerably simpler functions and objects, as is done in Frege’s Grundlagen and Grundgesetze. The new sentence (α∗) is straightforwardly derivable from (β∗), which shows, as far as Frege is concerned, that the original thought expressed by (α) is provable from the original thought expressed by (β). If we were to treat the terms “0”, “successor”, and “cardinal number” as open for reinterpretation (in the Hilbert test) or mapping to different terms (in the 1906 test), then we will achieve what is by Frege’s lights exactly the wrong result via both methods: i.e. 
the result that (α) (or the thought it expresses) is independent of (β) (or the thought it expresses). The difficulty here is that we’ve treated logically-complex terms as outside the range of the “fixed” terms, hence undermining (from Frege’s point of view) the essential role of the mapping, i.e. that it not disturb logical structure or “form”. The same point will presumably arise in geometry once our vocabulary includes terms with logically-complex content. Here we have very little to go on by way of concrete texts, since Frege’s work on analyzing geometrical concepts has mostly not survived. But let’s consider a hypothetical example, along the lines of the analysis Frege gives of “point A lies on line BC” in terms of the congruence of pairs of points.28 If for Frege the 26 See Frege’s letter to Hilbert of 27 December 1899; translation in [18, p. 34]. 27 See [7]. 28 That analysis is as follows. “Point A lies on line BC” is analyzed as: For all points D: If CA is congruent with CD, and BA is congruent with BD, then A = D.
Patricia A. Blanchette
content of the term “between” can be analyzed in terms of the contents of simpler terms, and in such a way that, when fully cashed out, the content of (γ) “Point B lies between points A and C” is provable from that of (δ) “Point B lies between points C and A”, then the thought expressed by (δ) is not, in Frege’s considered judgment, independent of the thought expressed by (γ). The sentence (δ) is of course, in Hilbert’s sense, independent of the sentence (γ).29 One way to put our question above about the breadth of Frege’s understanding of “logical” terms is this: Is the thought expressed by (δ) independent of the thought expressed by (γ) in the sense of the 1906 test? The answer to this will depend entirely on whether the mapping µ is required to map “between” to itself or not. If by “logical term” Frege means something like what we post-Tarskians tend to mean, so as to include just the kinds of terms that Hilbert himself holds fixed in FG, then “between” is not one of those terms that must be mapped to itself by µ, and hence the 1906 test gives, by Frege’s lights, the wrong result – i.e., essentially, Hilbert’s result. If on the other hand the “logical” terms are for Frege those terms such that the replacement of one of them by some other term can disrupt the logical structure of an argument, then “between” is, under our assumption, a logical term. And in this case, the 1906 test does not declare (δ) independent of (γ), which is to say that it gives, by Frege’s lights, the right result. In the passage quoted in section 2 above, Frege claims that “subsumption . . . [and] subordination of concepts” form part of the subject-matter of logic, and hence part of what the mapping µ must maintain. One might take it that he has in mind here the kind of preservation of logical form obtained by holding fixed those terms whose contents bear such conceptual relations to one another. 
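The sentence-level verdict that (δ) is independent of (γ) can be illustrated with a toy model. What follows is a minimal sketch under stated assumptions, not anything found in Frege or Hilbert: the three-point domain, the function names, and the representation of “between” as a set of ordered triples are all hypothetical choices made for illustration. It shows only that once “between” is open to reinterpretation, an interpretation making (γ) true and (δ) false is immediately available, whereas the intended reading on an ordered line makes both true.

```python
def gamma(between, a, b, c):
    """(gamma): 'Point b lies between points a and c'."""
    return (a, b, c) in between


def delta(between, a, b, c):
    """(delta): 'Point b lies between points c and a'."""
    return (c, b, a) in between


def separates(between, a, b, c):
    """True if this interpretation makes (gamma) true and (delta) false."""
    return gamma(between, a, b, c) and not delta(between, a, b, c)


def intended_between(points):
    """Intended reading on an ordered line: strict betweenness in either direction."""
    pts = list(points)
    return {
        (x, y, z)
        for x in pts for y in pts for z in pts
        if x < y < z or z < y < x
    }


# Hypothetical reinterpretation: 'between' holds of exactly one ordered triple.
reinterpreted = {(0, 1, 2)}
intended = intended_between(range(3))

print(separates(reinterpreted, 0, 1, 2))  # True: (gamma) holds, (delta) fails
print(separates(intended, 0, 1, 2))       # False: both hold on the intended reading
```

On the narrow reading of “logical term” nothing rules out the reinterpreted extension; on the broad reading, and under the hypothetical assumption about the analysis of “between” discussed above, that term would be held fixed and no such reinterpretation would be admissible.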
The question raised at the outset of this section was that of how to make sense of Frege’s proposal of the 1906 test, given its apparent very close similarity to the methodology of Hilbert’s FG, a methodology which Frege had emphatically, and with good reason, rejected. We’ve now recognized that the methodology may not, after all, be as similar to Hilbert’s as it first appeared. Whether it is or not will depend on how broadly we understand the class of terms which the mapping µ is required to map to themselves. We have seen that there are some reasons, though not conclusive, to take Frege to have included amongst these terms not just the narrow range of terms typically treated as “logical” today, but also more broadly any whose content is relevant to the logical connections between those thoughts expressed by their use. Read in this broader way, Frege’s proposed 1906 test is not equivalent to Hilbert’s, and is not susceptible to the same failings. Nevertheless, read in this way, the test is entirely unwieldy, given Frege’s views about the pervasiveness and the difficulty of recognizing such entailment-relevant content. The import of this unwieldiness is taken up below.
28 (continued) Given this analysis, it is immediate that Point A lies on BC is provable, just using principles of logic, from Point A lies on CB. See [7, p. 204].
29 “If A, B, C are points on a line and B lies between A and C, then B lies between C and A” is an axiom of order for Hilbert, and demonstrably independent of other axioms of order. Any interpretation falsifying that axiom is one that demonstrates the independence of (δ) from (γ).
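The claim in footnote 28, that “Point A lies on BC” follows from “Point A lies on CB” by logic alone once “lies on” is analyzed in terms of congruence, can be checked mechanically on small examples. The sketch below is illustrative only: the three-element domain, the randomly generated congruence relations, and all function names are assumptions, and sampling interpretations is of course no substitute for the proof, which rests simply on the commutativity of conjunction. The point is that the two sentences agree however “congruent” is reinterpreted, so long as the analysis of “lies on” is held fixed.

```python
import random
from itertools import product

POINTS = range(3)  # hypothetical three-element domain of "points"


def lies_on(cong, a, b, c):
    """'Point a lies on line bc', analyzed as in footnote 28:
    for all points d, if ca is congruent with cd and ba is congruent with bd,
    then a = d."""
    return all(
        a == d
        for d in POINTS
        if cong(c, a, c, d) and cong(b, a, b, d)
    )


def random_congruence(rng):
    """An arbitrary 4-place relation read as 'segment pq is congruent with segment rs'."""
    table = {quad: rng.random() < 0.5 for quad in product(POINTS, repeat=4)}
    return lambda p, q, r, s: table[(p, q, r, s)]


def symmetric_under_all_samples(trials=200, seed=0):
    """Check that 'a lies on bc' and 'a lies on cb' agree under every sampled
    reinterpretation of congruence."""
    rng = random.Random(seed)
    for _ in range(trials):
        cong = random_congruence(rng)
        for a, b, c in product(POINTS, repeat=3):
            if lies_on(cong, a, b, c) != lies_on(cong, a, c, b):
                return False
    return True
```

Calling `symmetric_under_all_samples()` finds no interpretation of congruence that separates the two sentences, which is the behaviour one would expect if the defined term “lies on” carries logical structure that a form-preserving mapping must not disturb.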
6 Conclusion
To recap the situation thus far: with “logical term” read narrowly so as to include just the usual post-Tarskian array of connectives and perhaps identity, the 1906 test gives, in the context of a sufficiently-rich language, what are by Frege’s lights the wrong answers. Here, by “sufficiently rich” is meant a language in which some terms outside of that narrow range have contents with logical connections to one another, e.g. ones definable in terms of others. The languages for which the 1906 test will give what are by Frege’s lights the right answers are of two kinds. First are those languages all of whose terms are logically independent, so that none expresses a content definable or analyzable in terms of others. Here for example are, arguably, languages whose only geometric terms are such simple ones as “point”, “lies on”, “line”, etc. Alternatively, there are those languages that are richer, but with respect to which “logical term” is read broadly so as to include any terms whose replacement by other grammatically-appropriate terms can turn a step of good logical inference into a fallacious such step. If the above hypothetical case accurately reflects Frege’s views, then the word “between” would be held fixed in such a language. Certainly “successor” and “cardinal number” would be. From Frege’s point of view, in order to systematically and effectively apply the 1906 test, then, one would need a way to distinguish terms whose contents bear logical connections to one another from those that don’t. One will need, that is, to be able to distinguish the “logical” from the “non-logical” in the broad sense of that term. And here it’s clear that Frege does not think there is a systematic or straightforward way to do this. 
For as far as Frege is concerned, the question of whether a given term has a content that will yield on conceptual analysis to as-yet-unnoticed complexity is a question which is often very difficult to answer, a question whose resolution can take generations of mathematical or other analytic work. As Frege puts it in 1892,
Now something logically simple is no more given us at the outset than most of the chemical elements are; it is reached only by means of scientific work. [10, p. 193/182]
and with respect to the conceptual complexity responsible for logical incompatibilities amongst thoughts: That a concept contains a contradiction is not always obvious without investigation30
Frege’s worries about distinguishing “logical” from other terms can now be made sense of. While as Hilbert understands them, independence and formality in geometry have to do largely with the kind of structure explicitly reflected in syntax, both independence and formality for Frege have to do additionally with conceptual connections not reflected in bare syntactic structure. Hence the choice of which terms to hold fixed in a mapping (or re-interpretation) whose goal is to preserve form and provide information about independence, while easy and straightforward from the Hilbertian point of view, is difficult and philosophically contentious from Frege’s point of view. My suggestion regarding the proposal and later rejection of the 1906 test, then, is this. Focusing on geometric cases in which the paradigmatic mappings so useful in projective settings really do preserve “form” in Frege’s broad sense (because of the simplicity of the vocabulary involved), Frege’s goal in the 1906 essay is to sketch out the general case of which these trustworthy examples are instances. This general case is the method we see tentatively laid out there, with a good deal of hesitation, and a warning that further clarification is needed with respect to the class of terms to be held fixed by the mapping. We can now see why this worry about the fixed terms can only have grown deeper as soon as Frege tried to circumscribe them in any general way. For given Frege’s views about logical entailment, the only way to be sure, in a given case, that one is holding fixed the right collection of terms is to be sure that one is including all of those terms whose contents bear logical relations to others. 
And while one can sometimes tell in particular cases that a term is or is not logically complex in this way, Frege’s view about the highly non-trivial nature of the kinds of conceptual analysis necessary to ferret out such connections means that there can be no general recipe for distinguishing fixed from non-fixed terms. And without such a general recipe, the 1906 sketch can never be completed. This, I would like to suggest, explains his rejection of the method tentatively proposed there.
30 [9, § 74]. See also [12, § 145]: “[N]ot every contradiction lies quite open to view.” Translation p. 159 of Geach & Black.
References
[1] M. Beaney and E. Reck (eds). Gottlob Frege: Critical Assessments of Leading Philosophers. Routledge Press, 2005.
[2] P. Blanchette. Frege and Hilbert on Consistency. Journal of Philosophy, 93:317–336, July 1996.
[3] P. Blanchette. Frege on Consistency and Conceptual Analysis. Philosophia Mathematica, 15(3):321–346, 2007.
[4] P. Blanchette. Frege’s Conception of Logic. Oxford University Press, 2012.
[5] P. Blanchette. From Logicism to Metatheory. In B. Linsky and N. Griffin (eds), The Palgrave Centenary Companion to Principia Mathematica. Palgrave Macmillan Press, forthcoming.
[6] B. Dreben and J. van Heijenoort. Introductory Note to [Gödel] 1929, 1930, and 1930a. In S. Feferman, J. Dawson, S. Kleene, G. Moore, and R. Solovay (eds), Kurt Gödel: Collected Works. Vol. 1: Publications 1929-1936, pp. 44–59. Oxford University Press, New York, 1986.
[7] Gottlob Frege. Applications of the ‘Conceptual Notation’. Translation in [16, pp. 204-208]. Lecture delivered to the Jenaische Gesellschaft für Medicin und Naturwissenschaft on 24 Jan. 1879.
[8] Gottlob Frege. Begriffsschrift, Eine Der Arithmetischen Nachgebildete Formelsprache Des Reinen Denkens. Louis Nebert, Halle, 1879. English translation by Stefan Bauer-Mengelberg as: Begriffsschrift, A Formula Language, Modeled Upon That of Arithmetic, for Pure Thought in van Heijenoort (ed.) pp. 5–82.
[9] Gottlob Frege. Die Grundlagen der Arithmetik, Eine logisch mathematische Untersuchung über den Begriff der Zahl. Wilhelm Koebner, Breslau, 1884. English translation by J. L. Austin as: The Foundations of Arithmetic, A logico-mathematical enquiry into the concept of number, Oxford: Blackwell 1953.
[10] Gottlob Frege. Über Begriff und Gegenstand. Vierteljahrsschrift für Wissenschaftliche Philosophie, 16:192–205, 1892. Reprinted in [21, pp. 166-178]. English translation as “On Concept and Object” in [20, pp. 182-194].
[11] Gottlob Frege. Über die Grundlagen der Geometrie. Jahresbericht der Deutschen Mathematiker-Vereinigung, 12:319–24, 368–75, 1903. Reprinted in [21, pp. 262-266]. English translation as “Foundations of Geometry: First Series” in [20, pp. 273-284].
[12] Gottlob Frege. Grundgesetze der Arithmetik, Band II. Hermann Pohle, Jena, 1903. Some sections translated into English in Peter Geach and Max Black (eds) Translations from the Philosophical Writings of Gottlob Frege, 3rd edn, Totowa, New Jersey: Rowman and Littlefield, 1980, pp. 139-224.
[13] Gottlob Frege. Über die Grundlagen der Geometrie. Jahresbericht der Deutschen Mathematiker-Vereinigung, 15:293–309, 377–403, 423–430, 1906. Reprinted in [21, pp. 281-323]. English translation as “Foundations of Geometry: Second Series” in [20, pp. 293-340].
[14] Gottlob Frege. Notes to Jourdain. In [18], 1910.
[15] Gottlob Frege. Logik in der Mathematik. In Frege [19], pp. 219–270. English translation as “Logic in Mathematics” in [17, pp. 203-250].
[16] Gottlob Frege. Conceptual Notation and Related Articles. Oxford University Press, Chicago, 1972, edited by T. W. Bynum.
[17] Gottlob Frege. Posthumous Writings. University of Chicago Press, Chicago, 1979, edited by H. Hermes, F. Kambartel, and F. Kaulbach. (Translation of most of [19].)
[18] Gottlob Frege. Philosophical and Mathematical Correspondence. University of Chicago Press, Chicago, 1980, edited by G. Gabriel, H. Hermes, F. Kambartel, Ch. Thiel, and A. Veraart.
[19] Gottlob Frege. Nachgelassene Schriften. Felix Meiner, Hamburg, 2nd revised edition, 1983, edited by H. Hermes, F. Kambartel, and F. Kaulbach.
[20] Gottlob Frege. Collected Papers on Mathematics, Logic, and Philosophy. Blackwell, 1984, edited by B. McGuinness. (Translation of most of Frege [1990].)
[21] Gottlob Frege. Kleine Schriften. Georg Olms, Hildesheim, 2nd edition, 1990, edited by I. Angelelli.
[22] W. Goldfarb. Logic in the Twenties: The Nature of the Quantifier. The Journal of Symbolic Logic, 44:351–368, Sept 1979.
[23] T. Ricketts. Frege’s 1906 Foray Into Metalogic. Philosophical Topics, 25(2):169–188, Fall 1997. Reprinted in [1, Vol II pp. 136-155].
[24] J. Stanley. Truth and Metatheory in Frege. Pacific Philosophical Quarterly, 77:45–70, 1996. Reprinted in [1, Vol II].
[25] P. Sullivan. Metaperspectives and Internalism in Frege. In Beaney and Reck [1], pp. 85–105.
[26] J. Tappenden. Metatheory and Mathematical Practice in Frege. Philosophical Topics, 25:213–264, 1997. Revised and expanded version in [1, Vol II pp. 190-228].
[27] J. Tappenden. Frege on Axioms, Indirect Proof, and Independence Arguments in Geometry: Did Frege Reject Independence Arguments? Notre Dame Journal of Formal Logic, 41(3):271–315, 2000.
Formal Discourse in Russell: From Metaphysics to Philosophical Logic
Godehard Link
We must speak by the card, or equivocation will undo us.
Shakespeare, Hamlet
The paper is a case study of the metaphysical sources of Russell’s philosophical logic, combined with a moral drawn from it for current ontological debates. Its “genetic” methodology aims to explore the inherent dialectics in the development of ideas which does not always square with the intentions of an author who first put them forth. In Russell’s case, it appears that by trying out various formal means to express his metaphysical convictions he found himself more or less driven towards a modern conception of logic which he had only partly envisaged. A case in point discussed in the paper is the idea of ramification, which was turned into the concept of definability by Gödel and put to use in his hierarchy of constructible sets.
1 Introduction
Around the year 1990 the distinguished historian of logic Ivor Grattan-Guinness wrote the following:
Until twenty years ago the outline history of logicism was well known. Frege had had the important ideas, until he was eclipsed by Wittgenstein. Russell was important in publicising the former and tutoring the latter, and also for working with Moore in the conversion of British philosophy from neo-Hegelianism to the new analytic tradition in the 1900s, but his own work on logic and especially logicism was very muddled. (Preface to [73, p. xiii])
This assessment is certainly less than sympathetic to Russell, and barely stands up under closer scrutiny. There are still at least two good reasons for continuing to give Russell a careful reading. The first is given by Grattan-Guinness himself: with the ongoing publication of the Collected Papers
a much more balanced view of Russell’s logical work has emerged.1 The second reason relates to the history of ideas: it is the detours, the blind alleys, the abortive attempts, and even the complete failures that make it easier for us to imagine the situation before a major historical development sets in, and appreciate the intellectual progress made as the new ways of thinking evolve. The present paper intends to shed some new light on the crucial developments leading to the emergence of modern mathematical logic and philosophy in which Russell played an essential role. In order to explain my own take on something which in one way or another has obviously been done many times before, I am going to distinguish three kinds of project in approaching the work of an author: (i) The first kind can best be called hermeneutic in the traditional sense. Central to such a project is the question, What did the author mean? as well as the goal of a coherent reconstruction of the author’s intention. (ii) The second type of project is systematic: it typically answers questions like, How can the ideas contained in an author’s work, however imperfect, be turned into problems of theoretical interest which make sense independent of the historical context and lend themselves to inquiries that incorporate state-of-the-art methods and insights? Most research on an important author in the history of a field is concerned with either the first or the second kind of project, typically combining the two approaches as well. 
Regarding Bertrand Russell, let me just mention, without intending to exclude other important work,2 three commentators some of whose works exemplify those projects: Bernard Linsky’s Russell’s Metaphysical Logic [63] mainly for the first type, Gregory Landini’s attempt [56] to revive Russell’s logicism for a combination of the two approaches, with emphasis on the second, and again Peter Hylton’s thorough investigation Russell, Idealism, and the Emergence of Analytic Philosophy [49] for the first. But Hylton’s book can also be regarded as an instance of a third kind of project, which, even when dealing with a particular author, is more concerned with the development of certain ideas in an epoch of a given field, resulting from the collective enterprise of its most creative protagonists. Let us call this kind of project genetic. It is meant to stress the historical observation that ideas tend to develop a life of their own which is not always in line with the
1 There has been a wealth of new material on Russell in recent years flowing from the Russell Archives and published in the multi-volume series of his Collected Papers. Relevant to our concern here are Volume 4 on the Foundations of Logic 1903–05 [84] and Volume 5 for the period 1906–08. Unfortunately, the publication of the latter volume is, as of this writing, still in limbo, so that apart from the major works published during Russell’s lifetime, I had to draw mainly on Volume 4, which is, however, quite rich in information by itself. In the following, this volume is identified in the text as ‘CP4’. Likewise, there are abbreviations for specific papers in that volume that can be found in the corresponding entries in the bibliography.
2 Scholarly works on Russell are of course legion. Two more recent pointers to the literature are [42] and [62].
intentions of a particular author and which can grow into a new paradigm of thinking, or, in our case, a new technical framework, not necessarily envisaged by those who initiated the development. I would like to view the work reported in this paper as contributing to the third type of project. Thus I am interested in hermeneutic questions only insofar as they elucidate changing views and highlight novel ways of thinking. Furthermore, I am not concerned with those admirable attempts at vindicating the logicist programs ex post facto, be it that of Frege or Russell, attempts that enjoy the benefit of up-to-date knowledge and technical tools.3 Also, unlike Hylton I am not so much interested in the emergence of analytic philosophy in general, which has been described and amply documented so many times; my focus will rather be on the emergence of modern logic proper to the extent to which it derives from Russell’s unique intellectual journey during his formative years between the publication of Principles of Mathematics (PoM) [76] and Principia Mathematica (PM) [93]. Starting out as a philosopher with a considerable metaphysical burden he found his way, by the very cunning of reason, as it were, towards a completely formal conception of logic which proved viable enough to serve as a basis for modern developments in both mathematical and philosophical logic. By the “metaphysical burden” I mean to refer to the fact that Russell entertained a rich and unrestrained ontology. Language mirrored this ontology and afforded, almost in a one-to-one fashion, an immediate access to all its entities, be they concrete, abstract, simple or complex. Objectivity resided in the ontology, and language, considered more or less psychological, still waited to become a theoretical problem for Russell.4 When he began to seriously direct his thought towards logic and mathematics he had not much of a clue, nor did he have the proper tools, to deal with the essential issue of logical generality. 
As a result, his chances were rather moderate of giving an adequate account of the nature of quantification and the role variables played in it, and of the notion of function which was to loom so large in his mature theory; all these key concepts of logic display aspects of logical generality. Yet when Russell hit upon Peano’s formalism for symbolic logic he quickly managed to advance to the very heart of the realm of logic where paradox was lurking. It speaks to his determination and remarkable resilience that in spite of countless setbacks he continued to explore the possibilities of founding both mathematics and philosophy on logic. Now taking up the sentiment in the opening quotation by Grattan-Guinness, which echoes Quine and others, the standard complaint about
3 For the so-called Neo-Fregeanism, see [44]; for a logicist theory Russell-style see [57].
4 For a case in point, in Principles we find the startling sentence: “Thus meaning, in the sense in which words have meaning, is irrelevant to logic.” [76, p. 47]
Russell is that Russell’s work is hopelessly marred by his constant confusion of use and mention, in particular compared with Frege’s exacting standards in this respect, implying that his work is of limited value for today. It is true that without the extensive use of the principle of charity Russell is at times difficult to understand. But Russell scholars have developed a certain virtuosity figuring out what Russell should have meant when he seems to take the symbol for the symbolized again, and indeed after some reading one can develop a certain feel for it. The truth is that Russell does know how to hold the names apart from the things named, and he can be very explicit about it when he thinks it is important in a given context, but more often than not he simply doesn’t seem to care. That practice is, of course, prone to confuse author and commentators alike but should not deter serious research into uncovering the valuable ideas in Russell’s foundational work. That much should be said in fairness to one of the founding fathers of the field even by somebody who finds many of his technical solutions wanting and disagrees with most of his philosophical positions.5 I would like to take my survey of Russell’s views not only as a contribution to the history of logic but at the same time as a case study of what I regard as philosophical problems in Russell’s ontology that are still with us today. Russell is rightly famous for dissociating language from ontology by means of his celebrated method of incomplete symbols, first in the theory of descriptions and then in his so-called no-classes theory. This new view derived from the recognition of the fact that the relationship between language and the world is less direct and more involved than a straightforward naming game in which words stand for objects and the meaning of phrases is established in a homomorphic block-building routine. 
Perfectly grammatical phrases like ‘the present king of France’ might fail to signify, and a sentence containing such a description has to undergo reconstruction in order to reveal its proper meaning. In the wake of Russell’s discovery the naive belief in the object-creating power of language was irrevocably shaken: not all descriptive phrases manage to identify an object, neither in everyday life, nor in philosophy, nor in mathematics, as the new paradoxes began to reveal. In mathematics in particular, freeing the formalism from the perceived straitjacket of metaphysics and ontology had been felt to be distinct progress in the 19th century. For instance, formal algebraic calculations involving imaginary numbers were accepted during the computational process if they yielded interpretable results. The American mathematician James Pierpont expressed this common sentiment among mathematicians when he addressed the American Mathematical Society in 1899. Referring to the field of analysis he says:
The mathematician of today, trained in the school of Weierstrass, is fond of speaking of his science as “die absolut klare Wissenschaft” [the absolutely clear science]. Any attempt to drag in metaphysical speculations are resented with indignant energy. With almost painful emotions, he looks back at the sorry mixture of metaphysics and mathematics which was so common in the last century and at the beginning of this. (Cited in [55, p. 322])
5 Among the recent scholarly work on Russell benefiting from the access to the Russell Archives, Landini’s monograph [56] stands out equally in its technical skill, its close reading of the text, and in its sympathetic account of Russell’s project and achievements. Countering Quine’s use-mention narrative Landini writes at one point: “Russell was not confused at all.” [56, p. 253]. I hope to make clear in what follows that confusion is not the issue; Russell had advanced to, and persisted to plough, such a new and difficult conceptual terrain that his technical sophistication tended to lag behind the richness of his ideas for coping with the problems he faced.
The typical mathematician of the time portrayed here would embrace the doctrine of formalism which takes mathematics as a mere game, albeit governed by definite rules.6 Russell was immune to that extreme position and sided with Frege in explicitly rejecting it. But he came to appreciate the benefits of the formal stance in his foundational enterprise, in particular when facing paradox; in 1903 he writes: In order to realize that no contradictions are involved, it is necessary to remember that our symbols possess a subtlety of discrimination which language cannot reproduce; hence we must reason strictly by the symbolism in this subject . . . (Classes (Cl), [84, p. 6])
Once the referential ties were loosened the symbolism could take on a life of its own, predominantly structured by internal coherence. Under its influence Russell’s philosophical views underwent crucial changes. When in 1908 he had finally arrived at his theory of types he argues for a separation of the formal theory from interpretation; the concluding paragraph of Mathematical Logic as Based on the Theory of Types (MLT) [80] reads: The theory of types raises a number of difficult philosophical questions concerning its interpretation. Such questions are, however, essentially separable from the mathematical development of the theory, and, like all philosophical questions, introduce elements of uncertainty which do not belong to the theory itself. It seems better, therefore, to state the theory without reference to philosophical questions, leaving these to be dealt with independently. ([80, p. 102])
This was of course meant merely as a methodological separation; Russell still had a definite interpretation in mind for his theory. After classes and even propositions had dissolved in the course of his efforts, Russell intended to found the mature theory of Principia on an ontology of propositional functions. Even so, the theory of ramified types that was taken to implement this goal proved to be open to quite different uses and interpretations. On the one hand, philosophers argue to this day about what kind of ontology this theory is exactly committed to; more relevant for the history of logic and set theory, however, is the twist that Kurt Gödel added to the understanding of ramification: he used the tool to develop a model of set theory, viz., the constructible hierarchy, that served to prove the consistency of the axiom of choice and the continuum hypothesis relative to the other axioms of set theory. Gödel’s work is a contribution to the theory of classes or sets, which Russell had come to repudiate, whereas propositional functions, the ontological backbone of Principia, were transformed at Gödel’s hands into mere syntactic forms of varying complexity.
6 For an extensive overview on the history of formalism see [14].
Above I hinted that Russell’s ontological development still has some bearing on today’s discussion in the philosophy of mathematics. Despite his eliminativism with respect to certain kinds of object Russell never gave up his basic platonist outlook. Today platonism has fallen into disrepute, but its traditional opposite standpoint, nominalism, hasn’t been able to profit very much from that. Instead it seems that modern theories of fictionalism, according to which mathematical discourse isn’t about any mathematical objects at all, have superseded the methodology of Ockham’s razor, which was employed by Russell only on a limited basis.
Here, then, is how the paper is organized. I will begin with a short description of Russell’s ontology in his Principles, which serves as a necessary background for the discussion to follow. Its prime feature is the central role Russell assigns to propositions in the furniture of the world. I will then address the main logical problem occasioned by this theoretical choice, that is, how to treat general assertions with such an ontology. In this context attention is also drawn to Russell’s interesting discussion of the nature of the variable. The next section is devoted to Russell’s notion of function, in which the variable plays an essential role. I go on to discuss the relationship between Russell’s ramified theory of types and Gödel’s constructible hierarchy. 
Finally, by way of relating Russell’s ontological troubles to current discussions in the philosophy of mathematics, I will argue for deflating ontology, and hence for adopting nominalism combined with what I call methodological realism. I will also explain why I think that the currently quite popular doctrine of fictionalism founders on the issues of mathematical truth and objectivity.
2 Setting the Stage: Russell’s Early Ontology
2.1 Russell’s major sources
To begin with, let me briefly recapitulate the most important sources that are well-known to have had a major influence on Russell’s thinking. In 1900 Russell published a book on Leibniz whom he criticizes for a “faulty logic” [76, p. 5] neglecting the irreducible character of relations, but takes
Formal Discourse in Russell
over from him an ontological principle which was to play a key role in his theoretical decisions: Whatever is, is one.7 In Principles, even before reading Frege, Russell shows himself dissatisfied not only with traditional logic but also with the algebraic school of Boole and followers. In Boole [5], quantification was hidden in the calculus of classes, and everything was forced into the Procrustean bed of algebraic equations. While Russell acknowledged in the algebraic school the “considerable technical development . . . pursued with a certain vigour” [76, p. 10], he found the approach of little use both for philosophy and mathematics in general.8 This was mainly because he could not see how relations could be fitted into the framework in a natural way, and because the extensional approach was against his intensional philosophical persuasions. Moreover, his view of logic as the most general and topic-neutral science led him to reject the notion of universe of discourse introduced by Boole and hinting at a modern model-theoretic conception of interpreting a calculus in varying semantic domains. The issue of the universality of logic looms large in Russell’s work and led to the doctrine of the unrestricted variable, another methodological principle which ranks among Russell’s firm convictions. Interestingly, Russell’s dislike of the algebraic school did not extend to the work of Alfred Whitehead, whose express push for generalizations was in line with Russell’s view of logic. Cantor had a similar appeal for Russell, with his extension of the concept of number to include the realm of the transfinite. But the flexible logical symbolism of Peano had the greatest impact on Russell, who welcomed it as the long-sought tool for formalizing mathematical and philosophical arguments. 
It is a curious and well-known fact about Principles, though, to which I will return below, that alongside the Peano-style quantifiers now at his disposal Russell continued to pursue an altogether different approach to quantification, viz., the theory of denoting concepts. On the philosophical side, Russell gives the greatest credit to George Moore. He writes:
On fundamental questions of philosophy, my position, in all its chief features, is derived from Mr. G. E. Moore. I have accepted from him the non-existential nature of propositions . . . and their independence of any knowing mind; also the pluralism which regards the world[.] (PoM, [76, p. xviii])
Here we find the origin of Russell’s realism regarding propositions and his free-wheeling ontology of Platonic atomism. The intellectual turn coming with it is described by him very vividly in [82, chap. 5], called Revolt into
7 See, for instance, [76, p. 132], and again [78, p. 189].
8 In a letter to Jourdain on April 15, 1910, Russell writes: “Until I got hold of Peano, it had never struck me that Symbolic Logic would be any use for the Principles of mathematics, because I knew the Boolian stuff and found it useless.” [41, p. 133] Cf. also [82, p. 51].
Godehard Link
Pluralism. However, in spite of Russell’s renunciation of his early idealist convictions, it is the neo-Hegelian Bradley whose preoccupation with “contradictions” had left a lasting mark on Russell’s thinking, be it in the form of Bradley’s regress argument or the puzzle over classes as one vs. classes as many.9 Finally, Russell’s study of Frege came too late for his first foundational work Principles to be influenced by Frege’s writings, but Russell found that he agreed with him in many respects, most importantly in their common logicist outlook. However, there are notable differences to be discussed as we go along.
2.2 Russell’s ontological arsenal in Principles
According to the liberal ontology of Principles the world consists of terms. Terms come in two basic varieties, simple and complex. Simple terms are on the one hand regular things like people, chairs, but also points, instants or colors, and on the other hand universals, i.e., concepts (properties) and relations like man and greater-than. The point here is that such universals were viewed as unanalyzable logically, and therefore simple. Thus it is not that properties and relations were put on one side, and objects exemplifying them on the other; Russell categorized things according to their analyzability. The most prominent complex terms are the propositions; they are the cornerstone for logical analysis. Terms can enter such propositional complexes as logical subjects, i.e., those constituents in a proposition which the proposition is about.10 Aboutness is a relation not between a linguistic statement and some subject matter in the world, but between two worldly entities, a proposition and one or more of its constituents or even non-constituents. While concepts and relations count as simple terms and hence can occur as logical subjects in a proposition p, there is always a “relating” relation in p whose job it is to hold the propositional complex
9 For Russell and Bradley’s regress argument, see, for instance, [48], [49], and [74].
10 A comment on Russell’s idiosyncratic use of the word ‘term’, confusing to the modern eye, might be in order here. It is understood in an expressly ontological rather than linguistic sense. Russell uses it as synonymous with ‘individual’ (indicating unity) and ‘entity’ (indicating being), but wants to have a neutral expression covering both connotations. Commonly mathematicians speak of a term in an equation, which is a linguistic expression. Equations express mathematical propositions, but the word ‘proposition’ itself oscillates between linguistic and ontological usage, in particular in Russell’s work. Now in Boole’s Laws of Thought, known to Russell, we find the author saying that he investigates the mental operations by representing them through operations on signs. Like Frege, Boole is very clear about the distinction between language and the world. In Boole the propositions are linguistic and take the form of equations; and since equations are flanked by terms in algebra, it is natural to call the constituents of propositions terms, and that’s what Boole does. So we might speculate that Russell, after turning propositions, following Moore, into non-linguistic entities, simply kept the word ‘term’ for their constituents.
together; it is not itself the subject matter of p, but lends its unifying power to it. Russell gives the example (i) ‘Socrates is human’ vs. (ii) ‘Humanity belongs to Socrates’: in (i) the concept human is the relating relation, whereas in (ii) the same concept, expressed by ‘humanity’, appears as logical subject.11
Figure 1: Russell’s Early Ontology
[Chart: a term (= unit, individual, entity) is either simple or complex. Labelled nodes: thing (concreta, points, instants, numbers, classes-as-one), concept, predicate (class-concept), relation, denoting concept (a man, every man, men) and proposition; further examples shown are red, man, class of men, meet, is-human, greater-than. The sample proposition is that Jones met a man, where ‘a man’ denotes a certain man and the proposition is about Jones and a certain man.]
Thus simple terms and propositions, also called propositional complexes, are the basic components of Russell’s ontology. He distinguishes a further kind of complex term, the denoting concepts, which will be dealt with below. All those items are shown in the chart of Figure 1.12 The conspicuous absence of a final and essential ingredient will be noticed, however, the propositional function, which appears in Principles and was later to play a pivotal role in Principia. In a modern reading of ‘function’, a propositional function is something, as the name would suggest, that takes a term and returns a proposition, and that is really what it amounts to. However, the notion of function as an assignment of values to arguments has no natural place in Russell’s ontological setup. He does say, somewhat vaguely: “We may explain (but not define) this notion as follows: φx is a propositional function if, for every value of x, φx is a proposition, determinate when x
11 Russell equates the property of being human with humanity, caring little about grammatical detail. This is of course in stark contrast with Frege’s meticulous distinctions, both grammatical and logical; see [24].
12 This chart has been adapted from its German original designed by the author for his work Collegium Logicum, Volume 2, Münster: mentis 2014. Its English amendment is reproduced here by kind permission from the publisher.
is given. Thus ‘x is a man’ is a propositional function.” [76, p. 19] He even gives a modern statement of ‘function’: “Every relation which is many-one . . . defines a function.” [76, p. 83]. But he finds that using this notion in the case of propositional functions would be circular:
[F]or in the above general definition of a function propositional functions already occur. In the case of propositions of the type “x is an a,” if we ask what propositions are of this type, we can only answer “all propositions in which a term is said to be a”; and here the notion to be defined reappears. [76, p. 83]
The reasoning here seems to be that functions must somehow arise from propositional functions,13 and these in turn from particular propositions. A propositional function is viewed as a type of proposition, arising, though, not from abstraction but from substitution. As we will see, substitution turns out to be a central idea of Russell’s in coping with generality in his ontology. Directly following the above quote “explaining” what propositional functions are, he describes the operation as follows:
In any proposition, however complicated, which contains no real variables, we may imagine one of the terms, not a verb or adjective, to be replaced by other terms: instead of “Socrates is a man” we may put “Plato is a man,” “the number 2 is a man,” and so on [footnote omitted]. Thus we get successive propositions all agreeing except as to the one variable term. Putting x for the variable term, “x is a man” expresses the type of all such propositions. [76, pp. 19f.]
Since the variable x marks the place in the proposition where any possible term could go, we are left with what Russell calls an “ambiguous” entity lacking a determinate value. In this respect propositional functions are defective, but not “unsaturated”, as Frege would have it.14 As far as the propositional complex is concerned, the variable really fills the slot (see Figure 2). Substitution is Russell’s technique of making up for the lack of an explicit semantics. In modern substitutional semantics, for instance, we substitute (constant) names for variables, thereby determining their value, but this is an operation purely on linguistic expressions. Again, in model-theoretic semantics value determination is effected by an object in the domain of discourse satisfying an open formula with respect to a variable contained in it, and this establishes a relation between an individual and an expression. By contrast, in Russell’s theory determination through substitution takes place among things “in the world”. A precarious ontological status is
13 See the next section for more on this.
14 Russell explicitly rejects the idea of unsaturatedness (see below). As Hylton [49] suggests, such Frege-style incomplete objects would be in need of being supplemented by other objects, and thus not be separate entities required by the kind of atomistic universe that Russell and Moore subscribed to, where all things are ontologically independent from one another.
thereby given to the propositional function, which is said to ambiguously denote all the propositions of its type. We find the vague idea of ambiguous denotation expressed again and again, all the way into Principia. A propositional function, Russell says there,15 in a phrasing well-known for its obscurity, “is not a definite object . . . ; it is a mere ambiguity awaiting determination.” [93, I, p. 48] We might try to apportion the indefiniteness of the object to its semantic role, but grant its claim to existence in the ontology as a regular (complex) term, and this is what Russell seems to settle on.
Figure 2: The variable replacing a term in a proposition
[Diagram with two panels, ‘proposition’ and ‘propositional function’. In the first, Jones met a man, and the proposition is about Jones and a certain man. In the second, x takes the place of Jones, and the complex is about any term and a certain man.]
But what, then, is the nature of the variable in that type of proposition, or that “proposition of constant form”, an expression also used for propositional functions [76, p. 89]? Russell calls its nature “fundamental”, but also “one of the most difficult [notions] to understand.” [ibid.]. There is again a tension between its role in the propositional function, where it is responsible for the generality conveyed by the given form, and its ontological status. If the proposition φa, with objectual constituent a, is a complex term and not a purely linguistic expression, and if the propositional function φx is obtained by swapping x for a, then x had better be an objectual term as well, but somehow differing from any of the regular terms like Socrates or Plato. Speaking of the number variable n, Russell says: “In fact, n just denotes any number, and this is something quite distinct from each and all of the numbers.” [76, p. 91] And again: “Thus, although x is a number, and no one number is x, yet there is here no contradiction, so soon as it is recognized that x is not one definite term.” [ibid., infra] We have to conclude that the variable is some indefinite term, presumably belonging to an extra ontological category. Inspired by his theory of denoting concepts (see below) Russell distinguishes “the true or formal variable from the
15 Here and in similar places where reference is made to passages of Principia, the attribution of the content is always meant to be made to both of its authors, Whitehead and Russell.
restricted variable. Any term is a concept denoting the true variable; if u be a class not containing all terms, any u denotes a restricted variable.” [76, p. 91] Let us call an indefinite term an arbitrary term.16 Then there is just one arbitrary term, the unrestricted true variable; and for every class-concept u we have the arbitrary u. So it would seem that every predicate or class categorizing a certain domain of objects gives rise to a representative term going proxy for, and being different from, each term in the respective domains. Variables are just those proxies; they are denoted by the denoting concept any u, where ‘any’ is the characteristic marker for indicating their indefinite nature. It seems not impossible to develop a coherent theory in this somewhat extravagant reification program. Russell’s ontology at the time is certainly liberal enough to embrace such a realm of arbitrary terms. Even in mathematics it is sometimes felt that the so-called indeterminates in algebra, for instance, the X in a ring of polynomials R[X] over a given basic ring R, are more “object-like” than just syntactic variables, because the polynomials over X, though formal, can be added and multiplied, and form a separate algebraic structure hovering over R. Indeed, a few decades ago the notion of indeterminate was used in Situation Theory [4], where types of situation, as types of local chunks of the world, play much the same role as Russell’s propositional functions, with indeterminates taking the place of variables. In the set-theoretic interpretation of situation theory the indeterminates are treated as urelements and thereby granted the status of “first-class citizens” in the set-theoretic universe. Thus they assume the “kind of individuality” which Russell wants to grant his variables [76, p. 94]. The indeterminates differ from arbitrary terms in that they are not attached to any specific property. 
Thus the ontic view of variables doesn’t necessarily stand and fall with the restricting properties; he could have kept his reified variables in the sense of indeterminates. But Russell was soon to turn away from denoting concepts, and after their demise he might have felt more clearly how strained his view of the variable was, a view he had indeed never been comfortable with even in Principles. However, in his transient but important Substitutional Theory of 1906, he found a way to still pursue the theory of objectual substitution in real propositions without those dubious indefinite terms. Russell puts forth a completely type-free theory there, according to which there are just two kinds of entity, individuals and propositions. I will return to the theory below, but I want to focus here on the crucial operation of substitution employed, and a kind of turning point that Russell reached therewith on his way towards a truly symbolic understanding of logic.
16 This is in analogy with Kit Fine’s arbitrary objects; see [21].
In his paper On the Substitutional Theory of Classes and Relations [78] Russell begins by explaining what he means by substitution. He sets it apart from the notion of determination, which consists in assigning values to variables and thus plays some sort of semantic role. By contrast, Russell tells us, “substitution consists in replacing one constant by another.” [78, p. 167] He gives an example:
Substitution is required if we say ‘Plato was a philosopher whose sympathies were with the aristocratic party; and the same is true of Socrates’[.] . . . When we say ‘the same’ is true of Socrates, we mean that the proposition which results from substituting Socrates for Plato is true[.] [ibid.]
This is one of the passages where it is hard for the modern reader not to interpret Russell as speaking of propositions in the linguistic sense, and thereby to catch him committing one of those use-mention errors by failing to put the names ‘Socrates’ and ‘Plato’ in quotation marks. But Russell’s theory is about propositions in their ontological sense, and Socrates and Plato are objectual constituent terms just like in the framework of Principles. Russell does speak of replacing constants, but again constants are just constant objectual terms. In fact, in the paper from which the above quotations are taken, Russell is quite explicit about the difference between expressions and the entities they might refer to. By 1906 he had at his disposal his theory of incomplete symbols which he meant to use now for eliminating classes from his ontology. So he does distinguish symbols from objects, but nonetheless intends to take substitution objectually. This becomes particularly evident when we consider his definition of identity in [78]. Using the letters ‘p’, ‘q’ etc. for propositions, ‘a’ for individuals and ‘x’, ‘y’ for arbitrary terms,17 Russell explains the operation of substitution as follows:
I use p(x/a)!q or p/a; x!q to mean ‘q results from p by substituting x for a in all those places (if any) where a occurs in p’. [78, p. 168]
Now Russell goes on to define identity in terms of substitution:
\[ x = y \quad .\!=\!. \quad x(y/x)!x \qquad \text{Df.} \tag{4.1} \]
When we substitute the term y for the term x in x, and the result is x, then obviously y must have been x all along. Now notice that this definition makes sense only if substitution is a non-linguistic operation on entities in the ontology.18 Otherwise we would get identity of expressions, and
17 In fact, the choice of the letters is only suggestive; Russell allows the small Latin letters to stand indiscriminately for all entities, individuals and propositions.
18 Landini [56, p. 98] expresses this interpretation very clearly when he writes: “But Russell’s ‘substitution’ concerns entities not symbols. Substitution is ontological, not linguistic. Perhaps the point is more salient if we read ‘p/a; b!q’ as ‘b occurs [as entity] in q wherever a occurs [as entity] in p.’ The fundamental notion is really the notion of occurring [as entity] at a position in an entity.”
Leibniz’s Law would be trivialized. Thus the definition commits Russell to a view of substitution according to which all kinds of chunks out there in the world are constantly shifted around by the pure power of logic, as it were, which, however, might fit gods or Aladdin’s Djinn better than humans. This view is perfectly in line with the old “Mont-Blanc view” of propositions. There is the often cited exchange between Russell and Frege, where the latter objects to Russell’s view that Mont Blanc with all its snowfields is itself a component part of the thought that Mont Blanc is more than 4000 meters high. But Russell stood by his position, on the grounds that otherwise we would not be able to know anything about Mont Blanc.19 I think the ontological interpretation of substitution has been driven to a point here where it collapses under its own weight. It seems that while Russell thought of his logical practice as manipulating the world, he was in fact already manipulating symbols. By separating determination from substitution he had already gotten around to dropping the objectual view of the variable and adopting the modern scheme of assigning values to linguistic variables, albeit within a yet semi-formal notion of semantic interpretation that even persisted in Principia. Thus it was de facto a small step for Russell to fully embrace the symbolic stance on substitution and quantification. He had known explicit quantifier symbols ever since he adopted Peano’s notation, so the symbols were waiting to be put to their proper use. This was finally borne out in Russell’s theory of types.
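The substitutional definition of identity (4.1) can be illustrated by a small Python sketch. It is only a toy model under assumptions that are not Russell’s: entities are represented as strings (atoms) or nested tuples (propositions), and the names `Entity`, `substitute` and `identical` are hypothetical. Note also that the sketch compares the result of a substitution by Python’s built-in equality, so it illustrates how (4.1) behaves rather than providing a non-circular definition of identity.

```python
from typing import Tuple, Union

# Hypothetical modelling choice: an entity is either an atom (a string
# standing in for an individual or a relation) or a proposition, modelled
# as a tuple of entities. This is an illustration, not Russell's formalism.
Entity = Union[str, Tuple["Entity", ...]]


def substitute(p: Entity, a: Entity, x: Entity) -> Entity:
    """Return the q with p(x/a)!q: x replaces a wherever a occurs in p."""
    if p == a:                      # p itself is an occurrence of a
        return x
    if isinstance(p, tuple):        # descend into the constituents of p
        return tuple(substitute(part, a, x) for part in p)
    return p                        # an atom other than a is left alone


def identical(x: Entity, y: Entity) -> bool:
    """Reading of (4.1): x = y iff substituting y for x in x yields x."""
    return substitute(x, x, y) == x


plato_prop = ("is_a_philosopher", "Plato")
print(substitute(plato_prop, "Plato", "Socrates"))  # ('is_a_philosopher', 'Socrates')
print(identical("Plato", "Plato"))                  # True
print(identical("Plato", "Socrates"))               # False
```

In this model `substitute` never inspects names of entities, only the entities themselves, which is the feature of the objectual reading that the discussion above turns on.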
2.3 General assertions
In Principles a crucial point is reached when Russell tries to accommodate in his framework general assertions like ‘all Greeks are human’. The problem for Russell is twofold; first of all, he has to recapture the familiar ground of simple quantification within his framework of propositions, but secondly, he is also aware that he has to account for the general case of more than one quantifier. He would like to think of ‘all Greeks are human’ as the infinite conjunction of elementary propositions, of the form ‘Socrates is human and Plato is human and Aristotle is human and . . . ’, but that would be a proposition with infinitely many constituents, which, Russell argues, could not be grasped by finite human minds. So there must be a way to conceive of a proposition which has only finitely many constituents but at the same time is able to deal with, or be about, infinitely many terms. Early on, Russell took notice of a special feature of the English language when it comes to expressing universality; right after the Paris meeting with Peano, Russell writes to Moore:
19 [31], vol. 2: Letter of Russell to Frege dated December 12, 1904.
Have you ever considered the meaning of any? I find it to be the fundamental problem for mathematical philosophy. E.g. ‘Any number is less by one than another number’. Here any number cannot be a new concept, distinct from the particular numbers, for only those fulfill the above proposition. But can any number be an infinite disjunction? And if so, what is the ground of the proposition? The problem is the general one as to what is meant by any member of the defined class. I have tried many theories without success. (Letter to Moore, August 8, 1900, cited in [73, p. 136])
The so-called free-choice ‘any’ has a special status in the English language. In order to make a universal statement, rather than invoking the whole domain at issue by means of the explicit quantifier ‘all ’ or ‘every’, or enumerating every instance of it (which is impossible most of the time), one simply makes a schematic statement instead. This is what is frequently found in mathematical texts: “Any non-negative real number has a real square root” is short for: “Suppose an arbitrary non-negative real number x is given; then x has a real square root.” In Principles Russell translates the single quantifier case into what he calls the formal implication (as opposed to the material implication implies of propositional logic): ‘All Greeks are human’ becomes ‘x is Greek implies x is human’. This is a new type of proposition which by material implication relates two terms that are themselves propositions, but “formal” ones in the above sense in that they contain a variable. However, this format doesn’t seem to lend itself easily to multiple quantifiers, a problem that is also mentioned in the above quotation. Thus the second thing the passage shows is that Russell is aware of this crucial desideratum for any general account of quantification. But in order to devise a solution for it, he feels he has to abide by the constraints set up by Moore’s relational theory of propositions, in which apart from ordinary things only concepts could occur as constituents. So he thinks that the noun phrase ‘any number ’ must somehow be represented as a concept. The solution he comes up with is his first theory of denoting involving the introduction of denoting concepts. As has been described in the literature20 its central idea is the separation of two features of a proposition, viz., the notion of aboutness and that of constituency. Denoting concepts are special concepts that can occur as constituents in a proposition. 
But they differ from simple terms like Socrates, which are constituents and at the same time logical subjects, i.e., terms that the proposition is about. By contrast, denoting concepts are, as concepts, constituents, but the proposition in which they occur is not about them but rather about the things that the concepts denote, in virtue of a newly introduced logical relation between those concepts and certain terms not occurring in the proposition. It is in this way that the denoting concept ‘any number’ can be
20 See, for instance, [49].
about an infinite totality of objects. Other denoting concepts are, e.g., ‘all men’, ‘every man’, ‘some man’, etc., expressed linguistically by determiner phrases. This appears to take care of quantification and to solve the philosophical puzzle about ways of accessing the infinite with finite resources. Unfortunately, Russell couldn’t think of an adequate implementation of the idea of denoting concepts in a proper logical framework. As it happens there is a way of doing just that, using the modern notion of generalized quantifiers.21 In fact, the idea was around at the time, having to do with the second-level notion of number developed by Frege and by Russell himself. Just as the number One can be modeled as the second-order property applying to all concepts true of exactly one object, or as the class of all singleton classes, we could take, say, the denoting concept ‘some number’ to stand for the class of all classes containing an object falling under the concept number, and then explain the meaning of a sentence like ‘some number is prime’ as the subsumption of the concept prime under our denoting concept ‘some number’. Similarly, the denoting concept ‘all numbers’ would subsume all concepts containing the concept number as part. Ignoring technical niceties of Montague’s framework we could reconstruct with its help what Russell ought to have said. He gives some examples involving ‘any’, in which a is a class and b is a class of classes:
“Any a belongs to any b” is equivalent to “ ‘x is an a’ implies that ‘u is a b’ implies ‘x is a u’ ”; “Any a belongs to a b” is equivalent to “ ‘x is an a’ implies that ‘there is a b, say u, such that x is a u’ ”[.] [76, pp. 89f.]
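The generalized-quantifier reading just described can be made concrete on a finite domain. The following Python sketch is only an illustration under stated assumptions: classes are finite collections, first-level concepts are Boolean-valued functions, and a denoting concept is modelled as a function from concepts to truth values. The names `every`, `some`, `is_prime` and the two `any_a_belongs_to_...` helpers are hypothetical and belong neither to Russell nor to Montague.

```python
from typing import Callable, Collection, Hashable

Concept = Callable[[Hashable], bool]      # first-level concept
Denoting = Callable[[Concept], bool]      # second-level property


def every(cls: Collection) -> Denoting:
    """'any a' / 'all a': holds of the concepts true of each member of cls."""
    return lambda concept: all(concept(x) for x in cls)


def some(cls: Collection) -> Denoting:
    """'some a' / 'a b': holds of the concepts true of at least one member."""
    return lambda concept: any(concept(x) for x in cls)


def is_prime(n: int) -> bool:
    return n > 1 and all(n % d for d in range(2, n))


def any_a_belongs_to_any_b(a: Collection, b: Collection) -> bool:
    # 'x is an a' implies that 'u is a b' implies 'x is a u'
    return every(a)(lambda x: every(b)(lambda u: x in u))


def any_a_belongs_to_a_b(a: Collection, b: Collection) -> bool:
    # 'x is an a' implies that there is a b, say u, such that x is a u
    return every(a)(lambda x: some(b)(lambda u: x in u))


numbers = range(2, 10)                 # toy domain, assumed for illustration
print(some(numbers)(is_prime))         # True: 'some number is prime'
print(every(numbers)(is_prime))        # False: 4 is a counterexample

a = frozenset({1, 2})
b = [frozenset({1, 2, 3}), frozenset({0, 1, 2})]
c = [frozenset({1}), frozenset({2, 3})]
print(any_a_belongs_to_any_b(a, b), any_a_belongs_to_a_b(a, b))  # True True
print(any_a_belongs_to_any_b(a, c), any_a_belongs_to_a_b(a, c))  # False True
```

On such finite classes the first reading holds exactly when a is included in the intersection of the members of b, and the second exactly when a is included in their union, which is why the two sentences come apart for the class c.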
Just as ‘any number’ cannot denote a separate number apart from the usual numbers, as Russell correctly remarks in the above quote, ‘any a’ could not denote an additional element of a here. But it can stand for a second-level property involving an indeterminate element of a, call it ‘x’. Introducing the circumflex or hat notation for lambda-abstracts22 and some mild Peano-style symbolism, we could set up a translation T for the determiner phrases ‘any a’ and ‘a b’ into second level abstracts (‘ε’ stands for membership):
\[ \textit{any } a \;\stackrel{T}{\Longrightarrow}\; \hat{P}\,\bigl[\, x \mathrel{\varepsilon} a \supset_{x} P x \,\bigr] \tag{4.2} \]
\[ \textit{a } b \;\stackrel{T}{\Longrightarrow}\; \hat{P}\,\bigl[\, (Eu)\; u \mathrel{\varepsilon} b \,.\, P u \,\bigr] \tag{4.3} \]
21 The concept of generalized quantifiers has been brought into prominence by Richard Montague [66], [67]. For an extensionalized version, see [3].
22 For the origin of the hat notation, and the possible meaning(s) Russell attaches to it, see below.
Thus ‘any a’ translates into the second-level property true of those first-level concepts P that apply to an arbitrary a-term x; ‘a b’ translates into the property of those concepts being true of some member of b. We can now build up the translation of ‘any a belongs to any b’ in a strictly compositional way; ‘belongs to’ becomes a relation belongs.to′ between an individual and a second-level property in such a way that we have the following translation of ‘belongs to any b’:
\begin{align}
\textit{belongs to any } b \;&\stackrel{T}{\Longrightarrow}\; \hat{u}\; \textit{belongs.to}'\bigl(u,\, \hat{P}\,[\, x \mathrel{\varepsilon} b \supset_{x} P x \,]\bigr) \tag{4.4}\\
&\iff\; \hat{u}\,\bigl[\, x \mathrel{\varepsilon} b \supset_{x} u \mathrel{\varepsilon} x \,\bigr] \tag{4.5}
\end{align}
In (4.5) we give the phrase ‘belongs to any b’ the reading intended by Russell, viz., being a u, which, given an arbitrary member x of b, is an element of x. Now this first-level concept is in turn subsumed by the property (4.2), but when we put the translation of the whole sentence together we see that we have to change the indeterminate in (4.5) and make it y, say. Translation combined with lambda conversion then yields the following equivalences:
\begin{align}
\textit{any } a \textit{ belongs to any } b \;&\stackrel{T}{\Longrightarrow}\; \hat{P}\,\bigl[\, x \mathrel{\varepsilon} a \supset_{x} P x \,\bigr]\,\bigl(\hat{u}\,[\, y \mathrel{\varepsilon} b \supset_{y} u \mathrel{\varepsilon} y \,]\bigr) \notag\\
&\iff\; x \mathrel{\varepsilon} a \supset_{x} \hat{u}\,[\, y \mathrel{\varepsilon} b \supset_{y} u \mathrel{\varepsilon} y \,]\, x \tag{4.6}\\
&\iff\; x \mathrel{\varepsilon} a \supset_{x} \bigl[\, y \mathrel{\varepsilon} b \supset_{y} x \mathrel{\varepsilon} y \,\bigr] \notag\\
&\iff\; x \mathrel{\varepsilon} a \,.\, y \mathrel{\varepsilon} b \supset_{x,y} x \mathrel{\varepsilon} y \notag
\end{align}
The result expresses the intended meaning: if an arbitrary a-term and an arbitrary b-term are given, then the former is an element of the latter, or, in modern set-theoretic parlance, a ⊆ ⋂b. Similarly, for Russell’s second of the above sentences we get:
\[ \textit{any } a \textit{ belongs to a } b \;\stackrel{T}{\Longrightarrow}\; x \mathrel{\varepsilon} a \supset_{x} (Eu)\,\bigl[\, u \mathrel{\varepsilon} b \,.\, x \mathrel{\varepsilon} u \,\bigr] \tag{4.7} \]
This amounts to a ⊆ ⋃b. Notice, however, that for the indefinite noun phrase ‘a b’ an explicit existential quantifier has to be used. Also, just depending on the given context, we had to change variables, even if they are thought of as indeterminates. This is evidence that the “individuality” that Russell wants to attach to the variable leaves something to be desired. The irony is that ever since he came to draw on Peano’s symbolism he would de facto use the variable as a mere syntactic placeholder. However, Russell does have a serious reason to refrain from using overt universal quantification instead of the formal implication of the form ‘x is an a implies x is a b’. Such a proposition, he says, is not a single implication, but a whole class of implications, one for each a-term that can replace x. Russell finds that forming a single universal proposition out of a class of propositions has its dangers. This has to do with another
paradox he discovered, the paradox of propositions, which shaped his view of the logical world to the same extent as the paradox that bears his name. The paradox first occurs in a letter from Russell to Frege on September 29, 1902 (see [31]) and is again discussed in [76, pp. 527f.]. Suppose that for every class m of propositions there is a new proposition m∗ stating that every proposition in m is true; Russell calls it the “logical product” of m. Now let an arbitrary class m of propositions be given; then there is another class w of propositions defined thus:
\[ w \;=\; p \ni \bigl(Em \ni (\, p = [\, q \mathrel{\varepsilon} m \supset_{q} q \,] \;.\; p \sim\!\varepsilon\; m \,)\bigr) \tag{4.8} \]
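In modern notation this amounts to the class

\[
w = \{\, p \mid \exists m\,(p = \forall q\,(q \in m \rightarrow q) \wedge p \notin m) \,\}
\tag{4.9}
\]

Here m∗ equals ∀q(q ∈ m → q); so we can rewrite equation (4.9) as

\[
w = \{\, p \mid \exists m\,(p = m^{*} \wedge p \notin m) \,\}
\tag{4.10}
\]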
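Now form w∗; then, under the assumption that the star operation is one-one, we arrive at the contradiction

\[
\begin{aligned}
w^{*} \in w &\iff \exists m\,(w^{*} = m^{*} \wedge w^{*} \notin m)\\
&\iff \exists m\,(w = m \wedge w^{*} \notin m)\\
&\iff w^{*} \notin w
\end{aligned}
\tag{4.11}
\]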
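The diagonal pattern behind (4.11) can be checked mechanically on a toy universe. The following Python sketch is an illustration only and not part of Russell’s text: it assumes three stand-in “propositions” 0, 1, 2, models a star operation as an arbitrary assignment of a proposition to each class of propositions, and uses the hypothetical names `star`, `diagonal_class` and `collision_witness`. It confirms that for every such assignment the class w of (4.10) shares its star value with some other class, so that no star operation on this universe is one-one.

```python
from itertools import combinations, product


def powerset(universe):
    """All subsets of `universe`, as frozensets."""
    items = list(universe)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def diagonal_class(star, classes):
    """w = {p | there is a class m with p = star(m) and p not in m}, cf. (4.10)."""
    return frozenset(star[m] for m in classes if star[m] not in m)


def collision_witness(star, classes):
    """Return (w, m) with m != w and star(m) == star(w), or None if there is none."""
    w = diagonal_class(star, classes)
    for m in classes:
        if m != w and star[m] == star[w]:
            return w, m
    return None


props = (0, 1, 2)              # three toy "propositions" (an assumption)
classes = powerset(props)      # the 8 classes of propositions
# Every possible star operation: one proposition assigned to each class.
all_stars = [dict(zip(classes, values))
             for values in product(props, repeat=len(classes))]
witnesses = [collision_witness(star, classes) for star in all_stars]

# True exactly when every candidate star operation fails to be one-one.
print(all(witness is not None for witness in witnesses))
```

In a finite universe the failure of one-one-ness already follows by counting; the point of the sketch is that the class w supplies the colliding pair explicitly, in the same way as the derivation above.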
Thus Russell had reason to distrust not only classes but also general propositions. But propositions, apart from individuals, were central to his ontology,23 and it seemed more natural to him to let go, if need be, of propositional functions with their uncertain ontological status and their closeness to paradox, than of propositions. In the years to follow Russell hoped to make use of ‘any’ in connection with general propositions. After his landmark paper On Denoting [77] of 1905, where ‘any’ is not an issue, he returns to the problem a year later. This is the period of his substitutional theory; in his paper [78] he writes:
It often happens that what is verbally and to all appearances a general theorem about all cases, is really only a theorem about each particular case. . . . [What] looks like a general statement, may be called a prescription: it will work in each case, but there is, so to speak, no quintessence to be distilled from the various cases and stated as a general theorem. The most fundamental instance of this is presented by functions. As was explained earlier, we can state many propositions about φx, where φx
23 In 1906, Russell writes: “. . . it is very hard to believe that there are no such things as propositions.” [78, p. 188] In fact, he clings to propositions as entities even in his first enunciation [80] of ramified type theory.
Formal Discourse in Russell
may be any function of x we please; but it is meaningless to say that such a proposition hold for all values of φ. For example, if x and y are identical, φx implies φy. This holds in each particular case, but we cannot say that it holds always, because the various particular cases have not enough in common. This distinction is difficult and subtle, and I do not know how to make it clear; but the neglect of it is the ultimate source of all the contradictions which have hitherto beset the theory of the transfinite. [78, pp. 187f.]
This quotation concludes a discussion of the paradoxes, which Russell thought he had finally solved with his new substitutional theory. We can sense here his effort to draw a distinction between single overt quantificational statements and mere schematic expressions. The mention of Leibniz’s Law seems to foreshadow the distinction between axiom schemas in first-order logic and single axioms in second-order logic, where properties or classes can be freely quantified over. The interpretation of the any-locution as “prescription” in the above quote comes close to what David Hilbert, in his seminal paper Über das Unendliche, calls “hypothetical judgment”. In a well-known passage there, Hilbert explains that a statement like the equation
a + 1 = 1 + a    (4.12)
where ‘a’ is a numeral (“Zahlzeichen”), must not be interpreted as a conjunction of infinitely many equations, but is a mere hypothetical assertion in case a particular numeral is given [46, p. 173]. The schematic force of the English any-locution has developed here into a tool for finitist mathematics.24 But, as Hilbert notices, it cannot be negated without leaving the domain of finitist mathematics. For Russell, denying a schematic statement means denying any of its particular instances. He says this in [80], where he devotes a whole section to the difference between ‘all ’ and ‘any’. He first points out there that “deduction can only be effected with real variables, not with apparent variables.” 25 [80, p. 66] This is a technical remark regarding the practice of mathematical proofs of universal statements alluded to above, where “take any x” goes with real variables. But Russell goes on: For our purposes [the distinction between all and any] has a different utility, which is very great. In the case of such variables as propositions and properties, ‘any value’ is legitimate, though ‘all values’ is not. Thus we may say: ‘p is true or false, where p is any proposition’, though we cannot say ‘all propositions are true or false’. [80, p. 67]
24 Lavine also sees a connection between Russell’s any and Hilbert’s use of it in this context; see [58, p. 190]. 25 Real variables are free variables, and apparent variables are bound.
The reason is that unlike ‘any value’, the admission of ‘all values’ creates a new proposition which would have to be counted as an additional value in the range of quantification, leading to “reflexive fallacies”. These, however, are to be avoided according to the Vicious Circle Principle (VCP, see below), which motivates the introduction of types and orders effected in the paper. As Landini [56] convincingly argues, the stratification into types and orders was mainly a reaction to yet another contradiction Russell had discovered in his type-free substitutional theory of propositions, which would otherwise have been his favorite logical framework. Landini, who found the paradox in one of Russell’s unpublished papers, calls it the p₀/a₀ paradox (see [56], [57]); like the older propositional paradox it derives from a Cantor-style diagonal argument.26 Landini also points out that the current paradox is particularly basic in that it neither involves an identity with a general proposition nor has any “semantic” or intensional character [56, pp. 204f.]. Thus Russell had finally decided to embrace type theory, but the influence of the substitutional theory can still be traced in [80]. There the hierarchy of propositional functions, which was to serve as a blueprint for Principia, is only virtual in that there are, properly speaking, no propositional functions at all: they rather simulate pairs of propositions and individuals, written as ‘p/a’, i.e., in the same way as the incomplete symbols of the substitutional theory. Thus we have an ontology of individuals and order-stratified propositions here, and the bound variables range only over propositions of a definite order or type, which constitutes the range of significance of a given propositional function [80, p. 75].
Such was now the division of labor between Russell’s real variable, indicated by ‘any’, and his apparent variable, indicated by ‘all’: the latter was confined to a consistent assignment of types in a given context, while the former signaled a schema that could be instantiated by any propositional function of arbitrary type. Taking stock of Russell’s ontological views, then, we can see that at least up to the year 1907, in which [80] was written, Russell held on to his early intuitions that the world consists of simple and complex entities, the paradigm of the latter being propositions. His distrust of classes carried over to propositional functions with their unclear place in this basic picture. Accordingly he worked towards eliminating both classes and propositional functions from his ontology. This strategy was laid out in the substitutional theory, to which we will return. When Russell found that a type-free theory of propositions was not to be had, he introduced types and orders and started to work with propositional functions again, first, however, merely in the form of a “more convenient” notational device (in fact the old symbol ‘φx’) simulating the mentioned couples p/a of a proposition p together with a constituent term a marking the place of variability. By the time Principia appeared, Russell’s epistemology had contaminated his logic to the extent that he gave up on propositions as entities and introduced the so-called multiple relation theory of judgement. Complex terms in the ontological sense still survived, however, in the form of something like states-of-affairs expressed by gerunds like “a having the quality q” or “a in the relation R to b” [94, p. 44]. But these objects were no longer the bearers of truth. It is the propositional functions that took center stage now, carrying the main burden of both the logic and the ontology. The question was, and after years of commentary still is, what exactly those enigmatic propositional functions were meant to be. In order to shed some more light on it we will next ask ourselves in which way Russell’s conception of functions in general and of propositional functions in particular evolved.
26 A simplified version of the paradox is given in [58, pp. 68f.].
3 On the Nature of Functions
3.1 Frege’s notion of function
A function in present-day mathematics is a certain set of ordered pairs, viz., a relation R such that each object in its domain has a unique object in its codomain which is R-related to it. In this way a function is identified with its graph. This set-theoretic definition, Bourbaki-style, seems to have settled the question as to what functions are. Yet there has always been a rival explication in the view of a function as a correlation of one object with another according to a specifiable rule, or, in modern jargon, as an input-output device following a program. This dual aspect roughly corresponds to the philosophical distinction between the extensional and the intensional view of a function, the extension standing for the static, platonist conception, and the intension for the description of the rule in such a way that different descriptions may yield the same “course of values”, as Frege says, or again, that different programs may compute the same function. In the 19th century, by the time Frege started his career, mathematicians had developed a firm working conception of functions. For instance, in his [12] Richard Dedekind adopts Dirichlet’s notion of function in the latter’s Vorlesungen über Zahlentheorie of 1879 in the sense of a “law”, called mapping (Abbildung), which uniquely correlates a certain object with a given element of some domain. The word ‘law’ indicates the common understanding that a function, call it f, is typically given through some mathematical equation involving an “independent” variable x and the “dependent” variable y, which is the correlated value f(x) of the function f. The mathematicians also allowed themselves to freely quantify over
functions and to consider functions as arguments of other functions; for instance, a definite integral could be regarded as a “functional” on a suitable class of functions, taking an integrable function to a real or complex value.27 Thus while it was and still is natural to distinguish between objects (typically numbers) and functions, there appeared to be no problem to have functions operate on other functions as their arguments, thereby blurring a rigid division between functions on the one hand and objects on the other. Before we turn to Russell it is instructive to remind ourselves what Frege had to say about the notion of function. Frege had a strict two-sorted ontology, consisting of concepts and objects. Concepts are special functions whose hallmark is their “unsaturatedness”. To make this feature apparent Frege would use a function symbol ‘f ’ followed by an empty space f( )
(4.13)
This was meant to counter the usual practice in mathematics of slurring over the difference between the value f (x) of a function and the function f itself. But it didn’t seem to be just a notational warning. In [32] Frege says: The peculiarity of functional signs, which we here called ‘unsaturatedness,’ naturally has something answering to it in the functions themselves. They too may be called ‘unsaturated,’ and in this way we mark them out as fundamentally different from numbers. [32, p. 155]28
Here he appears to make out an ontological distinction which despite its vividness seems hard to capture. It sounds like a deficiency in an entity, an imperfection; but why should a function be less perfect than a number? Russell for one couldn’t see why. Frege does admit that he uses metaphors: “ ‘Complete’ and ‘unsaturated’ are of course only figures of speech; but all that I wish or I am able to do here is to give hints.” [32, p. 55]29 Anyway, modern set theory levels this distinction, and in mathematical practice, as mentioned above, functions can effortlessly be made into object-like “points” in a function space, mapped onto other objects, and quantified over. Now Frege would certainly not want to forgo the technical tool of quantifying over functions. The idea was to model the central ontological notion 27 See [15] for a standard survey of the mathematics of the 18th and 19th centuries. 28 “Der Eigentümlichkeit der Funktionszeichen, die wir Ungesättigtheit genannt haben, entspricht natürlich etwas an den Funktionen selbst. Auch diese können wir ungesättigt nennen und kennzeichnen sie dadurch als grundverschieden von den Zahlen.” [30, p. 89] 29 “ ‘Abgeschlossen’ und ‘ungesättigt’ sind zwar nur bildliche Ausdrücke, aber ich will und kann hier ja nur Winke geben.” [24, p. 80] The word ‘Wink ’ here has an unexpected Wittgensteinian touch.
of concept on that of mathematical function. His calculus of concepts, however, depended crucially on quantification over those concepts, without which his logicist program wouldn’t have been able to get off the ground. Thus unsaturatedness didn’t seem to be an obstacle for concepts to be quantified over. But concepts also needed to be arguments in other concepts, albeit of higher order. Just like regular objects fall under first-level concepts, so first-level concepts fall under second-level concepts, e.g., the concept subsuming all empty first-level concepts, or the concept of being equinumerous with some fixed concept F . Yet Frege seemed to draw a subtle distinction here; in [24] he writes: The relation of an object to a first-level concept that it falls under is different from the (admittedly similar) relation of a first-level to a second level concept. (To do justice at once to the distinction and to the similarity, we might perhaps say: An object falls under a first-level concept; a concept falls within a second-level concept.) The distinction of concept and object thus still holds, with all its sharpness. [32, pp. 50f.]30
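The layering of levels described here can be illustrated with a small Python sketch. It is an illustration under stated assumptions, not a rendering of Frege’s own apparatus: the finite `DOMAIN`, the sample concepts and all function names are hypothetical, first-level concepts are modeled as truth-valued functions on objects, and equinumerosity is checked by counting, which is a shortcut available only on a finite domain (Frege defines it by one-one correlation).

```python
DOMAIN = range(10)  # hypothetical finite universe of objects


# First-level concepts: objects fall under them.
def is_even(x):
    return x % 2 == 0


def is_odd(x):
    return x % 2 == 1


def is_negative(x):
    return x < 0


# Second-level concepts: first-level concepts fall within them.
def is_empty(concept):
    """Holds of a first-level concept under which no object of DOMAIN falls."""
    return not any(concept(x) for x in DOMAIN)


def equinumerous_with(fixed):
    """The second-level concept of being equinumerous with the concept `fixed`."""
    size = sum(1 for x in DOMAIN if fixed(x))
    return lambda concept: sum(1 for x in DOMAIN if concept(x)) == size


print(is_even(4))                          # an object falls under a concept
print(is_empty(is_negative))               # a concept falls within a concept
print(equinumerous_with(is_even)(is_odd))  # five evens, five odds in DOMAIN
```

Python itself does not enforce the distinction: nothing prevents a call such as `is_empty(3)` from being written, whereas on Frege’s view an object in the argument place of a second-level concept is not even well formed.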
Frege is so anxious to keep the type distinction between entities of the various levels that he “parameterizes” the subsumption relation. In fact, in his Grundlagen [22] he appears to speak of two altogether different relations, one being familiar subsumption (falling under ), but the other the relation of “zukommen”, in Austin’s translation “belong-to”.31 Thus he would use the phrase “a belongs to F ” as referring to the specific relation holding between a first-level concept F and its number a. George Boolos, in his seminal paper [6], finds this distinction important enough to give the belong-to relation a separate name, η, and to assign it a special, nonlogical status which, equipped with a proper axiom he calls Numbers, gives rise to what is now known as the (consistent) second-order theory FA of Frege Arithmetic [6, pp. 212ff.]. This is, of course, a reconstruction, but it highlights the idiosyncratic twist that Frege had added to the first levels of the natural type hierarchy. Unlike Russell, he wasn’t prepared to let the same entities, viz., first-level concepts, enter different functional roles, sometimes as functions, sometimes as arguments. 30 “Die Beziehung eines Gegenstandes zu einem Begriffe erster Stufe, unter den er fällt, ist verschieden von der allerdings ähnlichen eines Begriffes erster Stufe zu einem Begriffe zweiter Stufe. Man könnte vielleicht, um dem Unterschiede zugleich mit der Ähnlichkeit gerecht zu werden, sagen, ein Gegenstand falle unter einen Begriff erster Stufe, und ein Begriff falle in einen Begriff zweiter Stufe. Der Unterschied von Begriff und Gegenstand bleibt also in ganzer Schroffheit bestehen.” [24, pp. 75f.] 31 It should be noticed that the German verb ‘zukommen’ is not so much ‘belong-to’ in the sense of elementhood, as might be guessed from Frege’s conception of numbers (Anzahlen) as extensions, but rather in the sense of a relation of assignment or correlation, bringing it close to attaching “sizes” to concepts. 
This would square with Frege’s idea of real numbers as measurement devices, but interestingly, in his Grundgesetze Frege makes clear that his Anzahlen have nothing to do with whole numbers in the sense of real number quantities ([25], § 157). He says there that the latter are, but Anzahlen are not, proportions (“Verhältnisse”).
Moreover, Frege never attempted to say, from a logician’s point of view, what functions should be in the first place. Instead, he indulges (not always to his advantage) in criticizing almost ad nauseam the widespread use-mention blunders of the mathematicians of his time, concluding in dismay in the paper [26] already referred to above:
The endeavour to be brief has introduced many inexact expressions into mathematical language, and these have reacted by obscuring thought and producing faulty definitions. Mathematics ought properly to be a model of logical clarity. In actual fact there are perhaps no scientific works where you will find more wrong expressions, and consequently wrong thoughts, than in mathematical ones. [32, pp. 115f.]32
However, Frege’s most advanced mathematical colleagues had a claim to possessing a reasonable notion of function, for instance the one used by Dedekind alluded to above. Dedekind does speak of a function as a law, which hints at a mathematical expression, but his paper [12] makes it clear that functions are assignments between mathematical objects specified by expressions, and not the expressions themselves. Furthermore, he has single-letter names ‘ϕ’, ‘ψ’, etc., for functions, and still never confuses function and value. Finally, successful though not always explicit steps of abstraction led him to extend common algebraic operations to functions, like addition, multiplication, or composition. While the degree of precision left much to be desired, we witness instances of the remarkable phenomenon in the history of science that insufficient language and symbolism can still transport viable ideas. Thus, pace Frege, “schiefe Ausdrücke” don’t necessarily lead to wrong thoughts.33 It is mainly in foundational matters that paying close attention to notational detail turned out to be vital to progress, as certainly Russell was to learn the hard way. To finish our survey of Frege’s views it remains to mention his notion of Wertverlauf (course-of-values) of a function f. Frege uses the notation
ἐf(ε)    (4.14)
for the course-of-values of f, with the Greek spiritus lenis (the smooth-breathing diacritic) over the ε as variable-binding operator. While he realizes that in mathematical texts the course-of-values, which corresponds to the graph of a function, is increasingly identified with the function itself,
32 “Das Streben nach Kürze hat viele ungenaue Ausdrücke in die mathematische Sprache eingeführt, und diese haben rückwirkend die Gedanken getrübt und fehlerhafte Definitionen zuwege gebracht. Die Mathematik sollte eigentlich ein Muster von logischer Klarheit sein. In Wirklichkeit wird man vielleicht in den Schriften keiner Wissenschaft mehr schiefe Ausdrücke und infolgedessen mehr schiefe Gedanken finden als in den mathematischen.” [30, p. 90]
33 The work of Ludwig Boltzmann, to whose Festschrift Frege contributed his [26], is a good case in point. Boltzmann’s endless calculations, for instance in the context of his famous H-theorem, were legendary and revolutionized the subject of statistical mechanics although he certainly cared little about the subtleties of notation.
Frege calls the function logically prior to its course-of-values ([23, p. 24, fn]; [32, p. 26, fn]); they cannot be the same for categorial reasons: courses-of-values are objects again, and an object is never a function. If the function f is a concept, then it maps objects to the True or to the False, and its course-of-values essentially amounts to the extension of f, the collection of all objects falling under the concept. It is in this way that Frege arrives at classes. In summary, the basic logical categories for Frege are objects and concepts, objects being logically primitive, while concepts are modeled on the notion of mathematical function. That notion is basically left undefined except that functions are regarded as “unsaturated”. Classes as courses-of-values are objects that are logically derived from concepts. In particular, membership is not constitutive of these classes, but defined in terms of the falling-under relation.
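The relation between a concept, its extension, and the derived membership relation can be made concrete in a short Python sketch. It is a toy model under stated assumptions and not Frege’s formal system: the finite `DOMAIN`, the two sample rules and the names `extension` and `member_of` are hypothetical, concepts are modeled as truth-valued functions, and extensions as frozen sets of the objects falling under them.

```python
DOMAIN = range(-5, 6)  # hypothetical finite universe of objects


# Two different rules (intensions) ...
def equals_own_square(x):
    return x * x == x


def is_zero_or_one(x):
    return x in (0, 1)


def extension(concept):
    """The class of all objects of DOMAIN falling under `concept`."""
    return frozenset(x for x in DOMAIN if concept(x))


def member_of(x, cls, concepts):
    """Membership derived from falling-under: x belongs to cls if cls is the
    extension of some available concept under which x falls."""
    return any(extension(concept) == cls and concept(x) for concept in concepts)


concepts = [equals_own_square, is_zero_or_one]
cls = extension(equals_own_square)

# ... with one and the same extension.
print(extension(equals_own_square) == extension(is_zero_or_one))
print(member_of(1, cls, concepts), member_of(2, cls, concepts))
```

The two rules are distinct function objects but determine one and the same class, and `member_of` never inspects the class directly for the element: it goes through a concept, which mirrors the priority of concepts over classes summarized above.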
3.2 Russell’s approach
Let us return to Russell. He agrees with Frege that there should be only one notion of function for both logic and mathematics. But as mentioned above, Russell felt that since logic lies at the conceptual heart of mathematics, the notion of function cannot simply be borrowed from mathematics but needs to be explicated in logic. So this is a point where Russell differs from Frege. Another one was mentioned above: “unsaturated” entities are at variance with Russell’s metaphysical convictions, which postulated a realm of atomic entities, be they individuals or universals, that are ontologically independent. A third point of disagreement with Frege is that Russell subscribes to a positional view of objects, according to which the same attribute (property or relation) can appear in a proposition as relating relation and then again fill a subject position in a different proposition. This Janus-faced character of attributes was illustrated above with the sentence pair ‘Socrates is human’ vs. ‘Humanity belongs to Socrates’. Thus subject or entity positions in a proposition are not intrinsically linked to ontological types, but are positional in nature. There might be a great metaphysical difference between Socrates and humanity; what counts logically is their position in the constituent structure of the proposition involving them. Given that Russell wants to reduce mathematics to pure logic, the question arises how to accommodate functions in his general scheme of simple and complex entities. A function is certainly a complex entity, and so its relation to propositions has to be elucidated. Russell’s basic idea in the case of propositional functions has already been discussed in the previous section. It remains to be seen how Russell reconstructs mathematical functions, which have individuals, say, numbers, as values.
There is another crucial ingredient of the mathematical arsenal, intimately connected with propositional functions, that never really fit into Russell’s ontology, namely classes. Russell’s difficulties were twofold. First, there was an epistemological problem: he would have liked to define classes extensionally by the enumeration of their terms, but that method would not, he says, carry over to infinite classes since an infinite enumeration would exceed human capacities. In Principles, however, his (first) theory of denoting provides a solution: there is a way to deal with infinite collections, e.g., the class of numbers, by using denoting concepts of finite complexity [76, p. 73]. But in invoking denoting concepts he was on shaky ground, as he was soon to realize, so he settled on a mixed approach, relying on propositional functions to generate by subsumption both finite and infinite classes as their extensions. The second difficulty was Russell’s struggle with the supposed hybrid nature of classes, classes as one vs. classes as many. On the one hand, classes with more than one term could not really be called one as they seemed to be many, e.g., the class of men denoted by the plural denoting concept all men; but then again, the whole composed of the terms of a class should be one, in the case of humans the human race [76, p. 76]. Grammar leads Russell astray here. Also he apparently continued to be under the spell of Bradley-type “contradictions”.34 Moreover, he had been suspicious about classes for yet another reason: he strongly adhered, as we saw, to the Leibnizian notion that whatever is, is one; but what exactly made a collection, a multiplicity of objects, into one singular object? We have now three basic concepts from which mathematics should be made to follow: propositions, propositional functions, and classes.
They form a rather fateful triangle, as became apparent in 1901: the class of all non-self-membered classes, the predicate subsuming exactly the non-self-predicable predicates, and again his 1903 paradox of propositions mentioned above, all spelled doom for the foundational project. Thus it was clear that no part of the triangle was safe, and a thorough analysis of the flaws in the fundamental notions generating the problems was called for. Russell’s first idea was to simply ban classes. He tried to proxy classes by the functions defining them, thereby hoping to avoid the paradox. However, as he would later write to Jourdain:
34 As a matter of fact, Bradley upheld his one-over-many argument against classes during his whole career. As late as 1914 he advanced it again: “The class is many. It is its members. There is no entity external to and other than its members. The class is a collection. The class is One, but the One is not something outside the members. The members even seem to be members because of what each is internally. And this apparent quality in each cannot be a relation to something outside the class . . . On the other hand, a quality merely internal to each member seems to leave the class without any unity at all. The unity, therefore, not being external, must be taken itself a member of the class. And since this seems once more to be senseless, the class appears to be dissolved.” [7, p. 284]
In May 1903, I thought I had solved the whole thing by denying classes altogether; I still kept propositional functions, and made φ do duty for z̓(φz). I treated φ as an entity. All went well till I came to consider the function W, where
W(φ) . ≡ᵩ . ∼φ(φ)
This brought back the contradiction, and showed that I had gained nothing by rejecting classes. (Letter of March 15, 1906 to Philip Jourdain, [41, p. 78]; see also CP4, [84, p. xxi])
Here we witness Russell’s attempt to apply Ockham’s Razor newly sharpened with logical tools: if a symbolic discourse (the calculus of classes) leads to contradiction, this formalism could not be about anything. But a simple translation into a language of propositional functions failed predictably since the same pattern lies at the bottom of both paradoxes, viz., a negated diagonal construction. More had to be done than just that kind of rewriting exercise. Realizing this Russell tried to dig deeper and went about to reconsider the basic tenets of his whole logical theory. On the ontological side, propositions were not negotiable at that point in time. The doctrine of the unrestricted variable recognized only one kind of object, entities; in particular, if propositional functions exist they can appear as entities in the range of the universal variable. Hence self-application like ‘φ(φ)’ should be permissible. The next plan was to accept classes only if generated by a certain subclass of propositional functions which excluded those self-applicative forms and their cognates, which Russell called “quadratic functions”. The “functional theory” of the years 1903–1905 which was meant to delineate admissible functions failed, however; Russell was groping in the dark. This led Russell to focus on the nature of propositional functions in more detail. Remember that Russell found it difficult to accommodate those functions in his basic ontological framework: are they undefinable or can they be derived from other more basic entities? As an illustrative example for his reasoning I will now consider his reflections mainly contained in one of the working papers of the time contained in Volume 4 of the Collected Papers (CP4), called On the Nature of Functions (ONF, [84, pp. 264ff.]); but I will also occasionally draw on other papers from that volume.
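A loose computational analogue of the function W from the letter to Jourdain can be written in Python. It is offered as an illustration only, on the assumption that untyped Python functions may stand in for propositional functions within the range of an unrestricted variable; the names `holds_of_everything` and `diagonal_has_value` are hypothetical, and non-termination in an interpreter is an analogy to the contradiction, not a proof of it.

```python
def W(phi):
    """W holds of phi exactly when phi does not hold of itself."""
    return not phi(phi)


def holds_of_everything(phi):
    """A harmless function that can be applied to itself without trouble."""
    return True


def diagonal_has_value():
    """Try to evaluate W(W); each step asks again for `not W(W)`."""
    try:
        W(W)
    except RecursionError:
        return False
    return True


print(W(holds_of_everything))  # False: the argument holds of itself
print(diagonal_has_value())    # False: W(W) never settles on a truth value
```

Self-application as such is unproblematic here, as the first call shows; the trouble arises only for the negated diagonal case W(W), which is the pattern common to both the class paradox and its functional counterpart.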
3.3 On the nature of functions
The text begins as follows:
Is the function derived from the complex, or the complex from the function? The notation x̂(φ‘x) suggests the former. The use of φ‘x and φ‘y as involving the same φ suggests the latter. (ONF, [84, p. 265])
To begin with, let me explain the symbols involved. In the current phase of his writings Russell uses the inverted comma symbol
φ‘x    (4.15)
for what he calls an “ambiguous value” of a propositional function φ, that is, a propositional complex with an undetermined subject x, which, by substituting a constant, say a, for x, becomes the definite proposition φ‘a. φ‘x is a “variable” propositional complex, with φ in the role of the relating relation of the complex, while φ‘a is a constant proposition like ‘Socrates is human’. The propositional function itself is given by ‘φ‘x’ with a circumflex, or hat, placed on the letter ‘x’:
φ‘x̂    (4.16)
Like Frege, Russell has been at pains to distinguish the function from its values in his symbolism. Where Frege used the empty space ‘f( )’, Russell puts a circumflex on the variable. In his notes Outlines of Symbolic Logic (OSL) of 1904 he illustrates the difference thus:
∗1·3 φ‘x. This represents an unspecified value of the function φ‘x̂. The difference of the two may be illustrated by the following: (1). sin x̂ is a periodic function; (2). sin x is a real number. I use x̂ where Frege uses Greek letters. (OSL, [84, p. 80])
The last remark is not quite correct; Frege uses Greek symbols in connection with the courses-of-values of functions, see (4.14) above. This is more in line with Russell’s second use of the circumflex. It produces abstracts of the form
x̂(φ‘x)    (4.17)
The meaning of that symbol changes over time, however. At first it takes the role of a class abstract, where the hat replaces Frege’s smooth-breathing notation. Later on, and this is the sense we are dealing with here, the same expression stands for what we now call a lambda abstract denoting a function that takes an argument x and returns the value φ‘x. The striking similarity with the modern use has led some commentators to speculate whether Russell had discovered the lambda calculus many years before Church.35 However, to my mind there is evidence against this interpretation. Russell didn’t really pursue the idea, and in Principia there are no genuine lambda abstracts to be found (see below). A further remark. On the face of it Russell is attempting to make headway by paying close attention to symbolism and trying out various notations in a sometimes bewildering abundance. But as pointed out above,
35 See the excellent paper [53] exploring this issue in some detail.
Russell never subscribed to formalism of the kind Frege criticizes, quite the opposite; in Principles Russell states explicitly that Frege’s works contain much admirable criticism of the psychological standpoint in logic, and also of the formalist theory of mathematics, which believes that the actual symbols are the subject matter dealt with, and that their properties can be arbitrarily assigned by definition. In both these points, I find myself in complete agreement with him. (PoM, [76, p. 520])
His search for the “right” symbolism is rather driven by the need for precision in the face of the danger of paradox lurking everywhere. More importantly, language, including formal discourse, has always had a transparent quality 36 for Russell in that the structure of the things talked about “shines through” and can be read off from the linguistic expressions used. His confidence was of course deeply unsettled by the paradoxes, and so he resorted to turning transparency into a virtue by finding symbolic forms that are true to the structure referred to. This is a theme that was to reappear in Wittgenstein’s Tractatus. It should be clear, then, that we are dealing with ontological structure in search of its proper notation. In this way Russell’s concern for the metaphysical grounding relation between complexes and propositional functions comes to the fore. Let us consider the propositional complex φ‘x; it contains the particular, if indeterminate, object x as a constituent, supplementing an otherwise unspecified compound represented by φ. It then sounds plausible that by abstracting from x by means of the circumflex operation we arrive at the function x̂(φ‘x). In this way the complex is ontologically prior to the function. This is indeed the option that Russell settles on. But he feels he has to address the argument for the opposite direction as well: it seems equally obvious that the proposition φ‘x is put together from the propositional function φ and x to begin with. To avoid the grounding circle Russell argues as follows: The various complexes φ‘x, φ‘y, etc. occur, as it were, in nature, the φ just indicating their common part. It is only by analysis that we recognize the mode of combination, as Russell calls it, that is shared by all those complexes. This mode of combination can be identified with the propositional function, which for varying arguments delivers as values all the different, but identically structured propositions.
It is here that Russell sees the difference in meaning between φ‘x and the abstract x̂(φ‘x):
φ‘x designates the compound of x with other entities according to a certain mode of combination. x̂(φ‘x) designates that mode of combination itself. (ONF, [84, p. 265])
36 I have taken this apt metaphor from [49, p. 171].
The abstract stands for the intensional description of the rule governing the function, the complex φ‘x for the result. But if the propositional function is that mode of combination according to which those propositions are composed, then it cannot be a constituent of any of them; he writes in another paper of the same year, just a month earlier (On Functions, OFc):
A mode of combination, like everything else, is an entity; but it is not one of the entities occurring in a complex composed of entities combined in the mode in question. . . . In short, in a complex, the combination is a combination of all the constituents, and cannot therefore be itself one of the constituents. (OFc, [84, p. 98])
In this way complexes can indeed be given priority over propositional functions. Incidentally, I interpret the last remark in this quote as the origin of the Vicious Circle Principle, which was to become the rationale for ramification (see Section 5). Now what about the possible anticipation of the lambda calculus? It is true that in an earlier manuscript of 1903, Functions (Fc), Russell introduces an explicit symbol, the vertical bar, for functional application. There he also has functional abstraction for which he still uses Frege’s spiritus lenis. He writes:
If φ denotes the function, φ|x will be used to denote the value of the function for the argument x; and conversely, if X denotes an expression containing x, x̓(X) will be used to denote the function involved. Thus x̓(X)|x will be another symbol for X. . . . Also x̓(φ|x) will be another symbol for φ; and thus {x̓(φ|x)}|x will be another symbol for φ|x. (Fc, [84, p. 50])
This looks like a rudimentary version of the principle of concretion or lambda-conversion, and later in that paper he even states a kind of axiom to that effect. But the subsequent discussion shows that he is not satisfied with it, and later he seems to have dropped it. In any event, no concretion rule can be found in Principia Mathematica. By that time the functional abstract of the form x̂(φ‘x) had turned into the incomplete symbol for a class. What Russell did anticipate was Moses Schönfinkel’s device of reducing multi-place functions to one-place functions. This was possible because of his original type-free approach. In the paper just quoted he writes:
It will be found unnecessary to regard functions of two or more variables as radically different from functions of one variable; we shall find it possible to treat all such functions as functions whose values are themselves functions. (Fc, [84, p. 51])
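In modern lambda notation, which is of course not Russell’s own, the identities in the quotation from Functions and the reduction of many-place functions might be sketched as follows. Rendering Russell’s abstract as λx.X, his application bar φ|x as juxtaposition φ x, and the names f and F are assumptions made purely for illustration.

```latex
% A hedged modern paraphrase; lambda notation is not Russell's symbolism.
% Assumption: x does not occur free in \varphi (needed for the second line).
\begin{align*}
  (\lambda x.\,X)\,x &= X
    && \text{the abstract applied to its own variable gives back } X \\
  \lambda x.\,(\varphi\,x) &= \varphi
    && \text{abstracting from the value of } \varphi \text{ gives back } \varphi \\
  (\lambda x.\,\varphi\,x)\,x &= \varphi\,x
    && \text{the first line with } X := \varphi\,x \\
  (F\,x)\,y &= f(x,y)
    && \text{where } F = \lambda x.\,\lambda y.\,f(x,y) \text{ (hypothetical names)}
\end{align*}
```

The first line is only the special case of lambda-conversion in which the argument is the bound variable itself; the general rule, (λx.X)a = X with a substituted for x, is the one that the text calls concretion. The last line shows a two-place function treated as a function whose values are themselves functions.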
Let us return to the other use of the circumflex occurring in Russell’s notation apart from the abstract, namely the one in (4.16), φ‘x̂. Russell explains the difference as follows:
In the case of particular complexes, although x̂(φ‘x) seems to be practically more convenient, it is important to observe that φ‘x̂ is philosophically more correct. . . . The point is that the function, as opposed to the complex, is got merely by excluding the supposition of any particular value for x, i.e. by taking x to be the variable as such (as we called it). The variable in this sense is expressed by x̂. Hence φ‘x̂ properly expresses the function. (ONF, [84, p. 272])
This passage reflects Russell’s idea already hinted at above, which is also present in Principles: ontologically speaking, a propositional function is really a complex which is structurally identical to any of its values φ‘x, except for “the variable as such”, x̂,³⁷ replacing the x. This idea is more important for Russell than propositional functions represented by abstracts. While the latter notation was felt to be redundant and was later put to use for naming classes, ‘φ‘x̂’ made it into Principia with its intended meaning virtually unchanged. The basic picture emerged that you get from propositional complexes to function complexes and back again by substitution. This was to become the central idea in the substitutional theory of the year 1906. The paper On the Nature of Functions closes with an intriguing comment on Russell’s notation which is worth quoting in full. Russell seems to sense an ambiguity in his symbolism here that persisted right up to Principia where it constitutes to my mind the greatest nuisance and obstacle to getting clear about the exact nature of the book’s logical system. The passage reads as follows.
But now the following difficulty arises: If φ‘x̂ is the function, then φ‘x̂ is φ, and we might write (φ‘x̂)‘x instead of φ‘x. Here again, the difficulty springs from the two ways of regarding φ‘x, namely (a) as a complex compounded of x and other entities, (b) as a value of the function φ. A function φ is got by substituting x̂ for x in a complex containing x: this is the way things occur when we start with the complex. But when we start with the function, the opposite order is appropriate: A function is a concept containing x̂, and a complex containing x results from substituting x for x̂ in the function, i.e. taking the value of the function for the argument x. These two opposing points of view must be remembered.
The confusing thing is that, while φ‘x properly expresses “the value of φ for the argument x”, we have to use it to express “any complex containing x”, which is a different notion, though it covers the same entities. (ONF, [84, p. 272])
37 This is the indeterminate again, now decorated with a hat.
What Russell says here is that there are two ways to understand the symbol ‘φ‘x’, the first being the ontological reading where φ‘x stands for a complex involving the constituent x, while the second just focuses on the value of the function φ, or φ‘x, to keep to Russell’s convention; here, of course, x need not be a constituent of the value. Now by deflating ontology, the first amounts to nothing other than an open formula with the variable ‘x’ in it, i.e., a syntactic expression containing ‘x’. It would have been wise to introduce altogether different symbols for these quite distinct readings, for instance, by keeping the Greek letter ‘φ’ for functions and writing the matrix as an expression in Latin letters, say like
\[ A[x] \tag{4.18} \]
in Kurt Schütte’s nennform³⁸ style (that’s the notation I prefer). The important distinction here is that ‘φ’ can well serve as a quantifiable variable for functions, whereas there is no way to “quantify” open formulas. This is of course a point which Quine urged a long time ago;³⁹ we will return to it when we come to the meaning of Principia’s axiom of reducibility.
I conclude this section with some remarks about what Russell called denoting functions. They were designed to complement propositional functions in Russell’s unified logical account of the notion of function. Unlike the latter, denoting functions return as values not propositional complexes but simple things, i.e., regular individuals. In a mathematical context these would typically be numbers. Hence mathematical functions are special denoting functions. Let us see how Russell proposes to accommodate them. In the manuscript OSL, composed in July 1904, he writes:
The ordinary functions of mathematics, such as 2x, x², sin x, are denoting functions, and are at first sight functions of entities. But on a nearer view, each of them is seen to be a function of a function containing x. (OSL, [84, p. 82])
The denoting functions are explained in the following way:
A function of a function (f‘(φ‘ẑ)) is often of the sort which I call denoting functions, i.e. instead of having propositions for its values it has entities other than propositions for its values. This is the case with (ιx̂)φ‘x̂ and with x̓(φ‘x̂). In such cases we can distinguish the meaning, which is complex, and contains φ‘ẑ as a constituent, from the denotation, which may be simple, and in general does not contain φ‘ẑ. (OSL, [84, p. 81])
38 Translated as ‘nominal form’ in [85, p. 11].
39 The well-known criticism is advanced, for instance, in [92, p. 151].
A year before the discovery of his theory of descriptions we find Russell still entangled in his old theory of denoting. There we have a second kind of complex entity, apart from propositional complexes, namely the complex denoting concept or denoting complex, for short. Of denoting concepts we only have to know here that a proposition containing such a concept as constituent is not about its constituent but about what the concept denotes in virtue of a special logical denoting relation which Russell describes in his Principles. We are then invited to parse the denoting complexes (ιx̂)φ‘x̂ or x̓(φ‘x̂) as a second level function f with a first level function φ‘ẑ in entity position. f is a function in Russell’s logical sense due to the fact that its argument is the variable φ‘x̂ (of the function kind) and not some particular function, say ‘x wrote Waverley’. The value of f for that argument would be obtained by substituting the specific function for the variable function, yielding:
\[ f\text{‘}\,\frac{\hat{x}\ \text{wrote Waverley}}{\varphi\text{‘}\hat{x}} \tag{4.19} \]
(employing the notation for substitution used by Russell at the time). Now if f is the definite article function, then (4.19) comes out as the denoting complex the author of Waverley. However, this complex is not the person who wrote Waverley, viz., Walter Scott. To account for this difference Russell draws on the distinction between meaning and denotation, which is obviously similar, but far from identical, to Frege’s distinction of Sinn und Bedeutung.⁴⁰ Turning now to mathematical functions, Russell gives a parallel account. The function 2x, properly construed, is also of the form (f‘(φ‘ẑ)); here f is the class abstraction operating on a first level function of x. The above quotation continues:
Thus e.g.
\[ 2x = \overset{,}{w}\{\mathrm{E}(u,v)\;\; u\,\varepsilon\,x\;\; v\,\varepsilon\,x\;\; u \cap v = \Lambda\;\; \hat{w}\ \mathrm{sim}\ u \cup v\} \quad \text{Df.} \tag{4.20} \]
Thus 2x is a denoting function of the function
\[ \mathrm{E}(u,v)\;\; u\,\varepsilon\,x\;\; v\,\varepsilon\,x\;\; u \cap v = \Lambda\;\; \hat{w}\ \mathrm{sim}\ u \cup v \tag{4.21} \]
which is a function containing x. (OSL, [84, p. 82])⁴¹
Russell’s insight that denoting functions are functions of second order opens up a way to conceive of them as akin to logical operators in their algebraic sense: just as quantifiers operate on lower level functions and return propositions, so the description operator, and later the lambda operator, takes functions and gives back objects of other categories, i.e., individuals or attributes.⁴² The first theory of denoting laid out in Principles was a heroic attempt to accommodate general propositions in the framework of complexes. The theory died at Russell’s own hands in his seminal paper On Denoting. The advent of the new theory of description that was developed in that paper had the salutary effect of dispelling some of the metaphysical fog that clouded his logical program. But the basic intensional view on the nature of functions barely changed. Denoting functions were replaced by descriptive functions and under this name made it into Principia Mathematica; see [93, ∗30, pp. 232ff.]. Here functions are still specified by some defining expression and not by a mere extensional correlation of argument and value.
40 There is an extensive literature on the relation between Russell’s and Frege’s distinction, sparked in particular by commentary on Russell’s infamously obscure Gray’s Elegy argument. A major source here is [65].
41 Writing out the corresponding formula for the sine would certainly pose a somewhat greater challenge, although it could be done “in principle”, e.g., by using the Taylor series expansion for the sine. In fact, in Principia [93, I, p. 418] the authors suggest just that, by giving the first few terms of the expansion, sparing their readers the clumsy Peano notation, though.
42 A unified algebraic account of propositions and functions was given in Aczel’s Frege Structures [1], which, in the light of Russell’s ontological phase discussed here, could as well have been dubbed “Russell structures”.
4 The Substitutional Theory
With the benefit of the new theory of description Russell started his well-known ontological austerity program. Not only Meinongian entities had to go but also classes. Because of their infectious closeness to classes that were plagued by paradox, Russell also sought to do away with propositional functions. Referring to his new substitutional theory which he had communicated to Jourdain, he welcomed Jourdain’s positive reaction in a letter from October, 1906 and explained his motives:
I am glad you feel attracted by the no-classes theory. I am engaged at present in purging it of metaphysical elements as far as possible, with a view to getting the bare residuum on which its success depends. [41, p. 93]
To explain the basic idea of the substitutional theory it is best to call attention to the fact that after the demise of denoting concepts the “variable-as-such”, which was supposed to be a denoting concept, was left dangling. So far the variable had been the mark of a complex representing a propositional function. Now leaving a “hole” in the complex was no option (remember Russell’s opposition to unsaturated objects à la Frege). More importantly, Russell had no intention (not yet!) of giving up the bedrock notion of his ontology, the propositional complex; this, together with a basic notion of individual, is the “bare residuum” in the quotation above. Thus barring denoting concepts, Russell in fact returned to his old ontological picture of Principles (or never departed from it), namely one containing a uniform realm of entities, which were only distinguished according to the feature plus/minus logically complex. Variables become purely linguistic in a type-free set-up, with only one style of variable ranging over all entities; this confirmed the doctrine of the unrestricted variable. Hence in particular, the substitutional theory allows quantification over propositions. The logical tools at his disposal were the methods of incomplete symbols and substitution. Russell introduces the elementary four-place relation,
\[ p/a;b\,!\,q \tag{4.22} \]
with the intended reading: “the proposition q is the result of substituting the object b for the object a in the proposition p”. Some axioms have of course to be given to secure the essential features of this reading, in particular, that the result of such a substitution is unique. Then the shortened symbol
\[ p/a;b = (\iota\hat{q})(p/a;b\,!\,q) \tag{4.23} \]
will be the definite description standing for the result of putting b in place of a in p. Now if there are no propositional functions left, how can elementary predication be effected in the theory? Well, by substitution rather than functional application. Suppose we want to assert that Socrates is mortal. Then, instead of attributing mortality to Socrates via Fregean saturation, we start out with any proposition containing the relating relation is-mortal, e.g., with the proposition that Plato is mortal, and then substitute Socrates for Plato. Propositional functions are then proxied by pairs of entities (p, a) consisting of a proposition p and an entity a occurring in p in entity position. To give an example, let a be a constant for Plato, p the proposition that Plato is human, q the proposition that Plato is mortal, and x a variable. Then the logical form of the statement that every human is mortal would come out as the formula (in Peano’s notation):
\[ p/a;x \supset_x q/a;x \tag{4.24} \]
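To see how (4.24) works, one can instantiate it with the example just given. The following worked instance is only an illustration: the abbreviations H(a) and M(a) for the propositions that Plato is human and that Plato is mortal, and the name s for Socrates, are assumptions introduced here. In particular, H and M are mere shorthand and are not meant as propositional functions in the ontology.

```latex
% Hypothetical abbreviations: H(a), M(a) are shorthand for propositions; s names Socrates.
\begin{align*}
  p &= H(a), \qquad q = M(a)
    && a = \text{Plato} \\
  p/a;s &= H(s)
    && \text{the result of substituting Socrates for Plato in } p \\
  q/a;s &= M(s)
    && \text{the result of substituting Socrates for Plato in } q \\
  p/a;s &\supset q/a;s
    && \text{the instance of (4.24) for } x := s
\end{align*}
```

The last line says that if Socrates is human then Socrates is mortal, reached by substitution in two propositions about Plato rather than by applying a function to an argument.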
If the variables ‘p’ and ‘a’ occur in the combination ‘p/a’, this can only happen, according to the syntax adopted, as part of an instance of the full four-place substitution relation ‘p/a; b ! q’ and, derivatively, as part of ‘p/a; b’. Thus the symbol ‘p/a’ in isolation is meaningless. Russell calls ‘p/a’ a matrix and writes:
Thus the matrix p/a is a symbol for the phrase ‘the result of replacing a in p by’, which is incomplete and meaningless; in order to acquire meaning, we must add the name of the entity which is to replace a. [78, p. 170]
An expression can be incomplete or “unsaturated”, but not its ontological counterpart, the complex. Since the variable as indeterminate object is gone, the propositional function, which used to contain such a variable as constituent, dissolves. Thus while the complete expression ‘a is human’ refers to a constant proposition, the matrix ‘p/a’ alone doesn’t name anything. This is illustrated in Figure 3. As a matter of fact, this is the doctrine of Principles, which Russell was now able to implement technically in an appropriate way. In [76, p. 88] he says of the propositional function φx: “[T]he φ in φx is not a separate and distinguishable entity: it lives in the propositions of the form φx, and cannot survive analysis.”
[Figure 3: The proposition p vs. the matrix p/a. Diagram labels: the expression ‘a is human’ names the proposition p (is-human, a); the matrix p/a doesn’t name anything.]
The matrix p/a does more, however. It can also be recruited to do the job of classes. We just agree to call the incomplete symbol a class and explain membership as follows:
Thus ‘x is a member of the class p/a’ is to be interpreted as ‘the result of replacing a in p by x is true’. Here the phrase represented by p/a occurs as part of the whole sentence, but is obviously not a part which has an independent meaning of its own. [78, p. 170]
If ‘ε’ stands for membership we can express this by the principle:
\[ x\,\varepsilon\,p/a \equiv_x p/a;x \tag{4.25} \]
Of course, the left-hand side receives its meaning only by this equation. What looks like a rewriting exercise has the following effect for the theory: Russell’s paradox cannot be derived anymore. Since no classes are entities, the Russell class isn’t an entity either. A matrix expression for the putative Russell class would be
\[ r/a \doteq a \sim\!\varepsilon\, a \tag{4.26} \]
with an arbitrary constant a. But since it is not the name of any entity at all, it is not a term appearing in the range of x in (4.25) and hence is not an instance to be substituted for a; the contradiction is blocked.
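The blocking can be laid out step by step. What follows is only a sketch of the derivation that a would-be paradox needs, under the assumption that one tries to treat the matrix r/a as if it were a name; the labels on the right are glosses added here, not Russell’s or Landini’s wording.

```latex
% A sketch of the attempted (and blocked) derivation; right-hand labels are informal glosses.
\begin{align*}
  &r/a \,\varepsilon\, r/a \;\equiv\; r/a;(r/a)
    && \text{would be (4.25) with } p := r \text{ and } x := r/a \\
  &r/a;(r/a) \;=\; (r/a \sim\!\varepsilon\; r/a)
    && \text{would be (4.26) with } r/a \text{ substituted for } a \\
  &r/a \,\varepsilon\, r/a \;\equiv\; \sim(r/a \,\varepsilon\, r/a)
    && \text{the contradiction, if both steps were available}
\end{align*}
```

Neither of the first two lines is well-formed in the substitutional theory: the matrix r/a names no entity, so it is not among the values of x and cannot be put in place of a, and the third line is therefore never reached.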
The leading expert on Russell’s substitutional theory is arguably Gregory Landini. In [56] he not only worked out in meticulous detail all the parts and versions of Russell’s sketchy theory, which had to be collected from mostly unpublished sources; he also discovered that Russell himself had found a decisive catch in the system, the p₀/a₀ paradox mentioned above, which he never published in his lifetime. Landini argues convincingly that it was this paradox that finally persuaded Russell to pursue the seemingly safer type-theoretic approach in Principia.⁴³
43 It should be mentioned that Landini has a much more contentious claim to the effect that Principia’s ramified type theory is but the substitutional theory in disguise, and that, worked out properly, Russell’s logicism can be vindicated after all (see [56], [57]). This is a rather valiant claim, which cannot be evaluated here.
5 Principia Mathematica
We finally arrive at the system of Principia Mathematica. After the final blow to the substitutional theory dealt by yet another propositional paradox, Russell gave up on propositions and returned to propositional functions. He now wanted to be on the safe side. The well-known discussion with Poincaré about the underlying sources of the paradoxes led to the formulation of the Vicious Circle Principle and to a careful construal of propositional functions, which now had to carry the ontological burden of the whole system. Classes could be treated as incomplete symbols again, not referring to any real entities, and propositions dissolved with Russell’s new multiple relation theory of judgment. The Vicious Circle Principle directed Russell’s attention to the various degrees of complexity of propositional functions. Since these functions are supposed to be intensional entities, let us also call them attributes, following Quine. Now consider Russell’s own example, viz.,
\[ \text{Napoleon had all the qualities that make a great general} \tag{4.27} \]
Let us begin with the elementary attribute ‘being a great general’, which involves no generalization; such an attribute is now called a matrix (this is not quite the same use as in the substitutional theory). Expanding the attribute by generalizing over individuals, as in ‘being a great general of a European power’, would turn it into a first-order function, first order since only a quantification of an individual variable is involved. Such a function is also of lowest type 1 since only individuals fall under it. But generalizing over qualities is effected by a quantification of a variable for first-order functions themselves. Now having all the qualities that make a great general is an attribute of individuals (Napoleon and Caesar fall under it, but not Pyrrhus, I suppose). According to Principia, however, this attribute cannot be in the range of that first-order function variable. A function, Russell explains, is an “ambiguity” φx̂ (now written without the inverted comma) which is only a definite entity when all its values φa, φb, φc, etc. are given in advance:
That is to say, a function is not a well-defined function unless all its values are already well-defined. It follows from this that no function can have among its values anything which presupposes the function, for if it had, we could not regard the objects ambiguously denoted by the function as definite until the function was definite, while conversely, as we have just seen, the function cannot be definite until its values are definite. This is a particular case, but perhaps the most fundamental case, of the vicious-circle principle. [93, I, p. 39]
According to this, then, if our attribute above (having all the qualities that make a great general) were in the range of the first-order function quantifier it would be involved in its own definition, violating the VCP. It is thus an attribute of second order. In this way an infinite hierarchy of propositional functions is started, stratified not only according to type but also according to their quantificational complexity measured by the orders. So far this is common wisdom. Now the question arises, how many propositional functions are there? Compare second-order arithmetic; the strength of a system in the language of second-order arithmetic is determined by its comprehension principle which guarantees the existence of enough number sets for the purpose at hand, for instance, those definable from arithmetical formulas only which contain no set quantifiers. The disturbing news in our case is that no comparable comprehension principle can be found in the whole of Principia Mathematica. Most commentators thought that that could not possibly be the case, and an interpretive escape hatch was found soon: complex function terms can be formed by abstraction, and existential generalization delivers the functions needed. Now it is true that Principia preserves the circumflex notation from Russell’s earlier writings, albeit in two different versions. First of all, there is the abstract
\[ \hat{z}(\psi z) \tag{4.28} \]
which, however, is not used for functions at all now but for classes that get eliminated anyway according to the no-classes strategy. The only notation we find in connection with functions is
\[ \varphi\hat{z} \tag{4.29} \]
But when we look into the text we realize that this is still used in the same sense as in Russell’s 1904 papers, namely as an indication that we are dealing with the function itself as opposed to one of its values φx. In fact, as Landini correctly observed [56, p. 265], Russell uses the notation as just an expedient to mark precisely this difference. Russell notes:
We have found it convenient and possible – except in the explanatory portions – to keep the explicit use of symbols of the type “φẑ”, either as constants [e.g. x̂ = a] or as real variables, almost entirely out of this work. [93, I, p. 19]
Thus the second use of the circumflex only survived as a reminder that the function and its values are strictly distinguished in Principia, or so it seems. A better practice would have been to simply drop the argument and write ‘φ’ like we would do today. But this apparently ran counter to Russell’s “no-unsaturated-entities” postulate. (Notice, however, that this consideration should have been of no concern to him anymore, once he had separated language from ontology.) Even so, the mainstream interpretation of Principia assumes, following Church, that the circumflex in its second use does produce something like modern lambda terms. That only makes sense, however, when the letter ‘φ’ acts as a syntactic variable (a “Mitteilungszeichen” in Hilbert’s apt terminology) for a well-formed formula and is not, or not exclusively, a function variable. The authors speak indiscriminately of “functions”, but this is clearly a flaw in the exposition, just as Quine said. For instance, to pick just one among many examples to the same effect, we can read the following:
Thus “φ!x” stands for any value for any function which involves no variables except individuals. It will be seen that “φ!x” is itself a function of two variables, namely φ!x̂ and x. Thus φ!x involves a variable which is not an individual, namely φ!x̂. [93, I, p. 51]
If ‘φ!x’ is allowed to contain (individual) variables, it had better be a complex expression, and indeed, a few lines earlier examples are given, e.g., φ!x ≐ (y) ψ(x, y), φ!x ≐ (Ey) ψ(x, y), etc. However, on the same page we find the example of a higher-order function, where the symbol ‘φ’ is obviously a genuine predicate variable: “φ!x implies φ!a with all possible values of φ” [ibid.]. The inescapable conclusion is that the word ‘function’ in Principia is “systematically ambiguous” in a bad sense.⁴⁴ It seems to me that the only way to save the lambda-term interpretation is to charitably suppose that the work operated with an unofficial rule of substitution, by means of which the existence of higher-order attributes can then be secured by logic alone, that is, existential generalization. Given Russell’s old idea that there is only a small gap between a proposition and its corresponding function, which is easily bridged by swapping a variable for a constituent in a complex, it could well be that Russell felt no need to turn that substitution into an official rule of the system. However, the assumption of such a rule would run counter to Landini’s argument that all function variables are so-called predicative function variables anyway [56, p. 264].⁴⁵
44 When Russell says in the above quote that ‘φ!x’ is really a function of two variables, then what he ought to refer to is rather a (typically ambiguous) constant, call it ‘α’, for two-place functional application, taking as arguments predicative (because of the shriek; see below) functions φ to be evaluated at x. Here again ‘φ’ is a variable, not a formula.
45 More on Landini’s interpretation presently.
What looks merely like a technical point gets to the heart of Principia’s ontology. If one assumes that abstraction terms are part of the system then one has indeed to embrace the commitment to the abundance of highly structured attributes which in a strange way just duplicate the definitional structure giving rise to them. If not, then Principia is left with a relatively austere ontology of predicative functions guaranteed by the notorious axioms of reducibility. They are then the only explicit existence axioms of the system. Suppose Alexander, Caesar and Napoleon are all and only the individuals that display all the qualities that make a great general; then these three men make up the extension of the property of having all the qualities of a great general. As explained above, this attribute is of type 1 and has ramified order 2 as it involves a quantifier over first-order functions. Now if our aim is just to pick out those three men it would be a rather long-winded way to find them by checking all the properties of a great general and asking who falls under them (we might even dispute Napoleon’s claim to this characterization – didn’t he fail rather miserably in Russia and at Waterloo?). The simplest way would be to write down the predicate
\[ \psi\hat{x} \doteq \hat{x} = \text{Alexander} \vee \hat{x} = \text{Caesar} \vee \hat{x} = \text{Napoleon} \tag{4.30} \]
and the membership check is trivial. This predicate also has the virtue of being of the lowest order compatible with its arguments. In Principia such attributes are called predicative. The axiom of reducibility (AR) postulates that for any attribute of whatever order there exists a coextensive predicative attribute. Whitehead and Russell write this as follows ([93, I, p. 167]; the shriek ‘!’ indicates predicative functions):
\[ \ast 12{\cdot}1 \qquad \vdash (\mathrm{E}f)\;\; \varphi x \equiv_x f!x \tag{4.31} \]
A modern rendering would be
\[ (\exists f^{(o(\tau)+1)\tau})(\forall x^{\tau})[\varphi(x) \leftrightarrow f(x)] \tag{4.32} \]
where τ is the order-type of the quantified argument and f is the predicative function, of lowest compatible order o(τ) + 1, which AR claims to exist.
In Principia Russell’s no-classes theory now takes on the following form. The class abstract ‘ẑ(ψz)’ is an incomplete symbol in the style of definite descriptions, serving the purpose of “technically providing something identical in the case of two functions having the same extension” [93, I, p. 187]. The contextual definition with respect to class expressions then reads [93, I, p. 188]:
\[ \ast 20{\cdot}01 \qquad f\{\hat{z}(\psi z)\} = (\mathrm{E}\varphi)\;\; \varphi!x \equiv_x \psi x\;\; f\{\varphi!\hat{z}\} \quad \text{Df} \tag{4.33} \]
In order to arrive at the usual class-theoretic notation, Russell defines [ibid.]:
\[ \ast 20{\cdot}02 \qquad x\,\epsilon\,(\varphi!\hat{z}) = \varphi!x \quad \text{Df} \tag{4.34} \]
This is put into (4.33) yielding
\[ \vdash x\,\epsilon\,\hat{z}(\psi z) \equiv (\mathrm{E}\varphi)\;\; \psi y \equiv_y \varphi!y\;\; \varphi!x \tag{4.35} \]
whence with AR we get the usual comprehension scheme for classes:
\[ \vdash x\,\epsilon\,\hat{z}(\psi z) \equiv \psi x \tag{4.36} \]
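The step from (4.35) to (4.36) can be spelled out as follows. This is a sketch in modern notation rather than a transcription of Principia’s proof: the explicit conjunction sign, the square brackets, and the witness name f_ψ for the predicative function supplied by AR are assumptions added for readability.

```latex
% A hedged reconstruction; \wedge, the brackets and the name f_\psi are not in the source.
\begin{align*}
 &\text{(4.35)} && x\,\epsilon\,\hat{z}(\psi z) \;\equiv\;
      (\mathrm{E}\varphi)\,[\,(\psi y \equiv_y \varphi!y) \wedge \varphi!x\,] \\
 &(\Rightarrow) && \text{Given } \varphi \text{ with } \psi y \equiv_y \varphi!y
      \text{ and } \varphi!x, \text{ take } y := x \text{ to obtain } \psi x. \\
 &(\Leftarrow) && \text{Assume } \psi x. \text{ By AR (4.31) there is a predicative } f_\psi
      \text{ with } \psi y \equiv_y f_\psi!y, \\
 & && \text{so } f_\psi!x \text{ holds, and } f_\psi \text{ witnesses }
      (\mathrm{E}\varphi)\,[\,(\psi y \equiv_y \varphi!y) \wedge \varphi!x\,]. \\
 &\text{(4.36)} && x\,\epsilon\,\hat{z}(\psi z) \;\equiv\; \psi x
\end{align*}
```

Only the right-to-left direction uses AR: without it nothing would guarantee that some predicative function is coextensive with ψ.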
In this way a calculus of classes can be simulated by propositional functions alone. Now having seen Russell waver between the variable and the schematic conception of the symbol ‘φ’, the question is again what ‘φx’ stands for in AR. But this time the answer is unambiguous: considering how AR works in proofs, e.g., in the step from (4.35) to (4.36), it is clear that ‘φx’ has to be an open formula. The same is of course true of the formula ‘ψx’ appearing in the class abstract ‘ẑ(ψx)’. But that means that in our above reading of the axiom of reducibility, where we started with “for any attribute of whatever order...”, we have given it an ontological slant that should perhaps be mitigated. Russell simply reads the axiom thus: “We assume, then, that every function of one variable is equivalent, for all its values, to some predicative function of the same argument.” [93, I, p. 166] The ‘function’ here refers of course to a propositional function, which apart from the height of its order should be made of the same “stuff” as its predicative kin. Landini draws attention to a passage in Principia’s chapter on reducibility that suggests a different interpretation (see [56, pp. 262ff.]). In PM, ∗12 Russell writes:
It should be observed that, in virtue of the manner in which our hierarchy of functions was generated, non-predicative functions always result from such as are predicative by means of generalization. Hence it is unnecessary to introduce a special notation for non-predicative functions of a given order and taking arguments of a given order. For example, second-order functions of an individual x are always derived by generalization from a matrix f!(φ!ẑ, ψ!ẑ, ... x, y, z, ...), where the functions f, φ, ψ, ... are predicative. It is possible, therefore, without loss of generality, to use no apparent variables except such as are predicative. We require, however, a means of symbolizing a function whose order is not assigned. We shall use “φx” or “f(χ!ẑ)” or etc. to express a function (φ or f) whose order, relative to its argument, is not given. Such a
function cannot be made into an apparent variable, unless we suppose its order previously fixed. As the only purpose of the notation is to avoid the necessity of fixing the order, such a function will not be used as an apparent variable; the only functions which will be so used will be predicative functions, because, as we have just seen, this restriction involves no loss of generality. [93, I, p. 165]
What is somewhat hidden here among considerations about the proper notation is Russell’s intention to confine himself to using quantifiable predicate variables for predicative functions only. According to Quine’s ontology criterion, then, it would turn out that Principia is committed only to such predicative attributes. That would mean that AR is to be read as a comprehension principle for those functions and not just as a more or less intuitive technical expedient for pushing down excessive accumulations of orders. The fact remains, though, that the objects postulated by AR are still attributes, that is, intensional entities of all finite types. Given that AR is the necessary existence principle for generating higher-order functions at all, it seems strange that Russell should be rather defensive about its justification. But while the pragmatic reason he adduces for accepting AR in the Introduction to Principia’s first edition appears weak at first sight, it shows that Russell was convinced of the axiom’s indispensability. He wrote:
That the axiom of reducibility is self-evident is a proposition which can hardly be maintained. But in fact self-evidence is never more than a part of the reason for accepting an axiom, and is never indispensable. The reason for accepting an axiom, as for accepting any other proposition, is always largely inductive, namely that many propositions which are nearly indubitable can be deduced from it, and that no equally plausible way is known by which these propositions could be true if the axiom were false, and nothing which is probably false can be deduced from it. [93, I, p. 59]
What Russell sounds here is his “regressive method” 46 by which plausibility is conferred backward on a principle by its success in the course of developing a theory. Russell continues: In the case of the axiom of reducibility, the inductive evidence in its favour is very strong, since the reasonings which it permits and the results to which it leads are all such as appear valid. But although it seems very improbable that the axiom should turn out to be false, it is by no means improbable that it should be found to be deducible from some other more fundamental and more evident axiom. [93, I, pp. 59f.]
This turned out to be a prescient remark in a very specific context, as we will see presently. Russell’s main argument for AR runs as follows:47
The axiom of reducibility is even more essential in the theory of classes. It should be observed, in the first place, that if we assume the existence of classes, the axiom of reducibility can be proved. For in that case, given any function φẑ of whatever order, there is a class α consisting of just those objects which satisfy φẑ. Hence “φx” is equivalent to “x belongs to α.” But “x belongs to α” is a statement containing no apparent variable, and is therefore a predicative function of x. Hence if we assume the existence of classes, the axiom of reducibility becomes unnecessary. The assumption of the axiom of reducibility is therefore a smaller assumption than the assumption that there are classes. [93, I, p. 58]
46 See his paper [79]. A similar idea can be found in Gödel and, quite recently, in Hugh Woodin’s work in higher set theory; see Section 7.
47 Landini discusses this passage as well; cf. [56, pp. 293f.].
Let us reconstruct this argument in a somewhat more formal way. Suppose classes exist. This entitles us to set up a language for classes with variables, say, ‘α’, ‘β’, for ranging over classes. Then there should also be a membership relation, belongs-to, between elements and classes; let’s call it ‘ε’. Furthermore, there has to be a comprehension principle in place that generates a class α for every propositional function φx:
\[ (E\alpha)\;\; x \,\varepsilon\, \alpha \equiv_x \phi x \tag{4.37} \]
Now the expression ‘x ε α’, Russell says, contains no quantifiers and hence is a predicative function of x which is coextensive with φx. Existential generalization yields
\[ (Ef)\;\; f!x \equiv_x \phi x, \tag{4.38} \]
and Russell has derived the axiom of reducibility. But notice what he has done here. To begin with, here is a passage where he is clearly prepared to give the name ‘propositional function’ to a linguistic context containing a variable. (It is true that this linguistic conception has always been lingering in Russell’s writings, but according to our above analysis that might be due to the transparence view that always implied the ontological interpretation of propositional functions as well.) By the same token, then, ‘φx’ is also an expression, possibly involving some complex quantifier structure. Let us make this reading explicit by using for (4.37) nennform expressions of the kind given in (4.18): There is a (quantifier-free) formula A[x] such that
\[ A[x] \equiv_x \phi[x] \tag{4.39} \]
This would mean that whenever there is a class specified by some arbitrarily complex formula φ[x], we will always find a quantifier-free formula A[x] that is coextensive with it. Now we have two options to proceed. Either we follow Russell and declare the inference to (4.38) legitimate; then the existential quantifier ‘(Ef)’ must be read substitutionally. But that would attribute to Russell a modern nominalistic notion of quantification that he could hardly have held at the time. I am not denying that Russell says in many places that a propositional function is just an expression containing a variable; but
the context is always such that he speaks of specific examples like ‘x is human’.48 The point is rather that he never details what kind of entity his function symbols range over when they appear as apparent, i.e., quantified, variables. The other option, which I think comes closer to the truth, is to face the fact that by mixing up genuine variables with syntactic variables, Russell (and Whitehead!) just hadn’t yet arrived at a viable understanding of logical syntax. Otherwise he would not have allowed this confusion to permeate the whole work. The step from (4.39) to (4.38) is a non sequitur: there is no way to generalize on an open formula. Thus Russell’s attempt at motivating AR in the above way fails.49 Even so, Russell calls AR Principia’s “axiom of classes” [93, I, p. 167], the proclaimed intent being to make class talk possible without commitment to class-entities. We can assume, then, that AR is modeled on an existence axiom for classes, except that what is postulated are (predicative) attributes rather than classes, and the class language is consigned to the well-known incomplete symbol treatment. Then we have to read AR objectually in the existentially quantified ‘f’-position and schematically in the ‘φ’-position. This is now a comprehension principle in its modern sense, as displayed in (4.32),50 and on a par with class comprehension. However, it is not a “smaller assumption”, as Russell says; in fact the common perception has been that it is more extravagant.
48 Landini [56, p. 277] indicates three representative places in Russell’s works where Russell expresses this view, and many more could be given. However, they are all about those singular instances.
49 Thus Quine’s criticism remains valid. Landini, however, thinks that the technical point has been exaggerated, and attributes to Russell a mixed view of quantification, objectual for individual variables, and substitutional for predicate variables [56, p. 277]. While I agree with Landini on many of his valuable observations, I remain skeptical in this regard, seeing Landini’s account rather as a (perfectly legitimate) rational reconstruction of Russell’s views. Another ingenious substitutional interpretation of Principia’s ramified type theory, albeit pertaining to the second edition, and qualifying some of Landini’s claims, is given in [45].
50 This is in fact the way most commentators, including Gödel, took AR to be understood. Landini even claims that this reading was Russell’s own intention. If that was so, then not everybody picked it up. For instance, in their Grundzüge der theoretischen Logik [47] Hilbert and Ackermann described AR in the following way: “Zu jedem im Stufenkalkül vorkommenden Funktionsausdruck gibt es einen äquivalenten prädikativen Ausdruck.” (For every function expression occurring in the calculus of types [and orders] there exists an equivalent predicative expression.) [47, p. 107] This is formally rendered as
\[ (P_n)(EP_1)(x)\bigl(P_n(x) \sim P_1(x)\bigr) \tag{4.40} \]
Ignoring the purely linguistic interpretation of Russell’s term ‘propositional function’, common to virtually all mathematicians studying Principia, we notice that the function term of arbitrary order P_n is universally quantified and hence not interpreted as an open formula! (Actually, in later editions of the Grundzüge the section on AR was completely removed.)
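The structure of the argument reconstructed in (4.37) to (4.39) can be laid out step by step. The following is only a schematic sketch in the notation used above; the bridging premise labelled (P) is not stated by Russell and is introduced here purely as an assumption, to mark the point at which the inference needs an entity rather than an expression.

```latex
\begin{align*}
&\text{(1)}\quad (E\alpha)\,\bigl(x \,\varepsilon\, \alpha \equiv_x \phi x\bigr)
  && \text{class comprehension, (4.37)}\\
&\text{(P)}\quad (\alpha)(Ef)\,\bigl(f!x \equiv_x x \,\varepsilon\, \alpha\bigr)
  && \text{assumed: membership in } \alpha \text{ is a predicative function}\\
&\text{(2)}\quad (Ef)\,\bigl(f!x \equiv_x \phi x\bigr)
  && \text{from (1) and (P); this is (4.38)}
\end{align*}
```

If (P) is read objectually, it is itself an existence claim about predicative functions, and (2) follows by transitivity of coextensiveness. On the reading defended above, however, the quoted passage supports only the syntactic observation that ‘x ε α’ contains no apparent variable, which corresponds to (4.39) and not to (P).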
By way of summarizing we may observe, then, that at the time of Principia, none of the components of what we above called “the fateful triangle” had survived Russell’s ontological purge in unaltered form. Classes had turned into mere façons de parler, and propositions had given way to the multiple relation theory of judgement, contaminated with Russell’s epistemology (cf. [93, pp. 43ff.]). Principia’s pivotal notion of propositional function had undergone a “definitionist” 51 stratification, but its ontological status was left in the dark.
6 Ramification: Gödel’s Gestalt Switch
Russell’s ramified type theory fared well with neither logicians nor philosophers. Logicians complained that the theory was set forth poorly by modern standards of rigor, and that the axiom of reducibility undid what it was meant to achieve, that is, a complete and paradox-free reduction of mathematics to logic. Philosophers took issue with the constructivist build-up of higher-order entities that were at odds with Russell’s otherwise Platonist outlook. Gödel combined and detailed all the major problems of the system in his well-known contribution [36] to the 1944 Schilpp volume on Russell. It might come as something of a surprise, then, that a rather sympathetic and intriguing reference to Russell can be found in a lecture on Gödel’s relative consistency proof that he delivered in December 1939 in Göttingen shortly before he emigrated to the United States. The lecture with English translation is now available in Volume III of Gödel’s Collected Works [40]. In it Gödel gives an outline of his proof of the relative consistency of the axiom of choice (AC) and the generalized continuum hypothesis (GCH) with the other axioms of set theory. He writes:
With regard to that [[theory]], let me explain first of all, that the objects of which set theory speaks fall into a transfinite sequence of Russellian types. [40, p. 137]
He later mentions Russell’s axiom of reducibility and says:52
To be sure, one must observe that the axiom of reducibility appears in different mathematical systems under different names and in different forms, for example, in Zermelo’s system of set theory as the axiom of separation, in Hilbert’s systems in the form of recursion axioms, and so on. [40, p. 145]
51 What I call ‘definitionist’ here has been dubbed “constructionalist” by Hazen [45, p. 449]. He draws a subtle distinction between ‘constructivist’ in the sense of the Brouwer tradition and ‘constructionalist’ derived from Russell’s methodology of logical construction. Now while Hazen is right that Russell had no real stake in mathematical constructivism, the term ‘constructionalism’ points in a somewhat different direction, that is, towards the difference between the modeling and the axiomatic approach.
52 Already in his classic [35], when Gödel introduces a comprehension axiom for type theory (Axiom IV, p. 155), he remarks: “This axiom plays the role of the axiom of reducibility (the comprehension axiom of set theory).”
The set theorist Robert Solovay, who wrote the introductory note to the Göttingen and the Brown University lectures, feels puzzled by Gödel’s reference to Russell; he comments: In reading through these two lectures, I was struck by the paucity of references to Zermelo . . . especially in contrast to Gödel’s frequent references to Russell. . . . The second place where the reference to Russell, rather than to Zermelo, seems strange is in the pride of place given to the axiom of reducibility over the axiom of separation of Zermelo. . . . In Russell’s system, where he is guided by his “vicious circle” principle to a bewildering apparatus of types of all levels and ranks, the axiom of reducibility is generally regarded as the grossest philosophical expediency. . . . Therefore it seems strange for Gödel to refer to the separation axiom of set theory as merely a form of the reducibility axiom of Russell. (R. Solovay, in [40, pp. 118f.].)
Gödel, however, was less interested in notational variants and historical faithfulness than in the logical potential of Russell’s ramification idea. First of all, he simply ignores Russell’s “fine point” about the no-classes ontology, and deals with classes themselves, now generally called sets. Next he nominalizes Russell’s higher-order attributes and without ceremony calls those functions “Aussagefunktionen”, that is, turns them into linguistic expressions.53 Here he follows Zermelo, who speaks of “definite Klassenaussagen”, that is, of well-formed formulas in the language of the theory.54 But unlike Zermelo, Gödel focuses on the specific type of quantificational complexity of the defining formula that gives rise to the comprehended set. This harks directly back to Russell’s hierarchy of orders, which grow along with every quantifier introduced for the highest-level function present in the build-up of a given propositional function. Then Gödel separates types and orders, which in Principia were strangely entangled, assigning the job of the former to the set-theoretic rank function while reinterpreting orders as stages of definability.55 In order to construct a model of set theory for his consistency proof, he allows types to become cumulative and extends the orders into the transfinite. Calling sets constructible that can be defined with these resources by increasingly complex formulas, he remarks in the first published announcement of his proof:
53 Confusingly, the English translation renders ‘Aussagefunktion’ as ‘propositional function’.
54 Cf. Zermelo’s axiom of separation (Axiom III) in [95, p. 263]. The translation of this paper in [92, p. 202] has again ‘propositional function’ for ‘Klassenaussage’.
55 A commentator appreciating the conceptual link between Russell and Gödel is Akihiro Kanamori in his [51].
This means “constructible” sets are defined to be those sets which can be obtained by Russell’s ramified hierarchy of types, if extended to include transfinite orders. [39, p. 26]
In the Göttingen lecture Gödel then goes on to prove what he calls the “fundamental theorem”, which establishes the truth of the continuum hypothesis in the hierarchy of the constructible sets. The theorem says that the first uncountable ordinal $\omega_1$ is a bound for the orders of the constructible subsets of $L_\omega$, called $M_\omega$ in Gödel’s lecture. The theorem then reads as follows:
The order of every constructible subset of $M_\omega$ is an ordinal of the second class, that is, $< \omega_1$. [40, p. 143]
Since $L_\omega = V_\omega$ is the set of hereditarily finite sets and can thus be identified with $\omega$, it is clear that every subset of $\omega$ can be correlated with a countable ordinal $\eta < \omega_1$, and we get for the cardinality $c$ of the continuum: $\mathrm{card}(c) = \mathrm{card}(2^{\omega}) \le \aleph_1$. The proof in a nutshell is this. Let $m$ be an arbitrary subset of $M_\omega = L_\omega$; by assumption, $m$ is constructible and thus lives in $L = \bigcup_{\alpha \in \mathrm{ON}} L_\alpha$, so there is an ordinal $\alpha$ with $m \in L_\alpha$. The set $m$ might appear somewhere high up in the hierarchy, possibly using up many layers of definition for some one propositional function56 $\varphi(x)$ characterizing it. However, Gödel now uses a construction introduced by T. Skolem to define what amounts to the Skolem hull of $L_\omega$ and $m$; the resulting set $K$ is still countable. It can then be “condensed” in by now well-known ways such that it becomes $\in$-isomorphic to a subuniverse $L_\eta$, which must itself be countable. Hence the ordinal $\eta$ is countable and smaller than $\omega_1$. The isomorphism maps $m$ to a set $m' \in L_\eta$, and since it maps $M_\omega$ onto itself we have $x \in m \leftrightarrow x \in m'$ for all $x \in M_\omega$. But this shows that $m$ can be defined in a “shorter” way by a function $\varphi'(x)$ with a specific upper bound on its order $\eta$. Since all these orders are smaller than $\omega_1$, and each $L_\eta$ with countable $\eta$ is itself countable, there are at most $\aleph_1 \cdot \aleph_0 = \aleph_1$ sets of natural numbers. Gödel then comments on this result as follows:
I should also like to mention that the fundamental theorem constitutes the corrected core of the so-called Russellian axiom of reducibility. After all, as was mentioned a while ago, Russell had previously given a construction similar to [[that of]] the $M_\alpha$ but had restricted himself to finite orders. His axiom of reducibility then says that the orders of the sets of every type are bounded by a fixed finite number. He was evidently far from being able to prove that. But it now turns out that if the construction of the orders is continued into the transfinite, [[the existence of]] certain transfinite bounds actually become[[s]] provable. That is the content of the fundamental theorem. [40, pp. 143, 145]
Gödel is indeed rather charitable towards Russell here. This is a result that Russell could hardly have anticipated. At the time he wrote, the necessary new technical tools developed by Skolem and Gödel himself were yet to be discovered. In Principia, the bounds on the orders in AR are simply dictated by the syntactic constraints of ramification, that is, they were set to the lowest orders compatible with the argument structure. Russell’s idea is really just to proxy the banned classes by predicative propositional functions. By contrast, Gödel cleared up the conceptual difference between the objects of the theory and their definitions and thereby assigned the “bewildering apparatus of types of all levels and ranks” its proper place. What emerged was a structural insight: The constructible hierarchy of the $L_\alpha$ is to the cumulative hierarchy of the $V_\alpha$ as the ramified theory of types is to simple type theory. (Cf. [18, p. 603].) Nonetheless the idea of the constructible hierarchy is in some sense quite germane to Russell’s definitionist outlook which, as argued above, evolved out of his thoroughly intensional conception of function.
56 Of course in the sense of “Aussagefunktion”, i.e., an open formula; cf. the preceding footnotes.
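For finite stages the cumulative hierarchy can be computed outright, which also makes concrete the identification of the hereditarily finite sets with the natural numbers used in the proof sketch earlier in this section. The text does not say which identification is meant; the sketch below assumes the standard Ackermann coding (a set is sent to the sum of 2 raised to the codes of its members), and the function names are hypothetical, chosen for illustration only.

```python
from itertools import combinations


def powerset(s):
    """Return the set of all subsets of s, each as a frozenset."""
    items = list(s)
    return {frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)}


def V(n):
    """Finite stage V_n: V_0 is empty, V_{k+1} is the power set of V_k."""
    level = set()
    for _ in range(n):
        level = powerset(level)
    return level


def ackermann_code(x):
    """Assumed coding: a hereditarily finite set maps to the sum of
    2**code(y) over its members y (the empty set maps to 0)."""
    return sum(2 ** ackermann_code(y) for y in x)


def ackermann_decode(n):
    """Inverse of the coding: the members of the set coded by n are the
    sets coded by the positions of the 1-bits of n."""
    return frozenset(ackermann_decode(i)
                     for i in range(n.bit_length())
                     if (n >> i) & 1)


if __name__ == "__main__":
    for n in range(5):
        stage = V(n)
        codes = sorted(ackermann_code(x) for x in stage)
        print(n, len(stage), codes)
```

Under this coding each finite stage is sent onto an initial segment of the natural numbers, so a subset of the hereditarily finite sets corresponds to a set of natural numbers. Stages grow as iterated exponentials, so the sketch is practical only up to the fourth stage, which has 16 elements.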
7 Lessons for Ontology
The rise of algebraic methods in 19th century mathematics and, in particular, its routine use of complex numbers in the theory of functions went along with the kind of formalist interpretation of mathematical discourse that Frege attacked. For the formalist mathematician the desire to steer clear of “metaphysical” commitments was tantamount to repudiating any semantics for the mathematical language. We find a typical reaction in the writings of James Pierpont, who was mentioned above; tired of the strife over imaginary numbers, he without further ado declared all numbers, integers, fractions, “equally real and equally imaginary”.57 Rather than trying to integrate the complex numbers in the “objectively existing” realm of established number systems, as Gauss (at least initially) had done,58 Pierpont took the demarcation dispute, reasonably enough from his viewpoint, as a dispute about words. To a positivist mind like his, it made no sense to carry the common naming relation over to mathematics because abstract objects as referents were no option for him; thus he writes [ibid.]: “Five horses, three quarters of a dollar, may have an objective existence, but the numbers 5 and 3/4 are imaginary.” Frege argued against mathematical formalism and at the same time against any attempt at providing ersatz objects as referents of number terms, be they empirical or mental, and so he embraced a platonist ontology of abstract entities. The same goes for Russell, whose metaphysical realism found no fault with abstract objects per se. Russell was no nominalist; his eliminativism had a different source. Classes had to go not because they were abstract objects but because Russell came to regard the very concept of class as incoherent, first under the influence of Bradley and then in connection with his reading of Cantor, which directly led to the discovery of his paradox. He would later recall his reasoning:
Applying this [Cantor’s theorem], as I did, to all the things in the universe, one arrives at the conclusion that there are more classes of things than there are things. It follows that classes are not ‘things’. . . . The conclusion to which I was led was that classes are merely a convenience in discourse. [82, pp. 61f.]
57 Cited in [14, p. 307].
58 For some time Gauss regarded the geometrical interpretation of imaginary numbers in the real plane as demonstrating their objective nature; around 1831 he wrote that thereby “the metaphysics of imaginary numbers has been put in their true light, and it has been shown that they have the same objective meaning (reale gegenständliche Bedeutung) as the negative numbers.” (cited from [72, pp. 61f.]) Later, however, Gauss seems to have tried to avoid reference to the spatial model; see [14, p. 271, and again pp. 307f.].
However, this is a case of what I have elsewhere called the counting fallacy [60, p. 318], typical of naive realism: as if there were a fixed number of things in the universe independent of one’s reckoning. The paradox was certainly the final straw for Russell, but it could have been dealt with technically (through type theory, for instance) without stripping classes of their existence. Russell’s conception of propositions proved more ominous. It was just a bad idea to make the bearers of truth regular entities in the world. Also, his logical atomism underlying the notion of complex entities like propositions committed another mistake, which could be called the grounding fallacy: the structure of propositions was modeled on the well-founded structure of linguistic expressions, hypostatizing syntactic constituency. In the case of singular propositions, this might have appeared reasonable to some extent, although the “glue” that the relating relation had to provide to hold the complex together remained mysterious and made them vulnerable to Bradley’s regress. But as we have seen, Russell had no viable ontology for general propositions. Finally, a ramified ontology of propositional functions, mimicking quantificational complexity, was even less plausible, and if we follow Landini, it was not even intended by Russell himself. In any event, since numbers were reduced to classes (of classes), and mathematical functions (as associations) to many-one relations,59 all mathematical entities turned out to be fictitious. Thus the Russell of Principia is a platonist regarding universals and abstract objects in general but a nominalist with respect to mathematical objects. How are we to assess this peculiar mixture of ontological attitudes? Metaphysical realism of the kind Russell (among others) adhered to is certainly not for us anymore, as Hilary Putnam has forcefully argued in many places (see, for instance, his [70], in particular Essay 12 therein). 
59 For want of a Wiener-Kuratowski style reduction of relations to classes, Principia also had a “no-relations” theory of relations, paralleling the no-classes theory.
When Russell came to apply Ockham’s Razor to his ontology, its use seemed not only to waver, as we saw, but to be misguided in important respects (except for banning the golden mountain and the present king of France). He had no qualms about universals but found it necessary to take ordinary individuals apart and reassemble them as logical constructions from “sense data”. Also, as far as ontological parsimony goes, keeping ordinary universals made the repudiation of mathematical objects appear gratuitous. Nevertheless, Russell’s efforts gave occasion for reflecting on a more enlightened picture of ontology, which is informed by the modern tools of logic and semantics. Quine in particular provided the methodology and showed the way. However, his early nominalism rejecting universals and all abstract objects was compromised by the consideration he subscribed to that the mathematics needed for science could not be recovered without set theory. This is the famous indispensability argument, also put forth by Putnam, for the existence of mathematical objects and other theoretical entities. This halfhearted platonism was for some time the default ontology in the philosophy of mathematics, while the mathematical platonism of Gödel, despite his high renown, was considered marginal. When Putnam defended indispensability in his Philosophy of Logic [69], he also referred to a “fictionalistic” philosophy, which he attributed to such uncongenial figures as the German Kant scholar Hans Vaihinger (the skeptical philosopher of “As-If”) and the distinguished historian of science, but Thomistic metaphysician, Pierre Duhem. Writing in 1971, Putnam comments: “This ‘fictionalistic’ philosophy seems presently to have disappeared[.]” [69, p. 63] However, only a few years later, fictionalism in the philosophy of both mathematics and science started a second career, and latter-day fictionalists claim that it was launched by Bas van Fraassen’s The Scientific Image [91] and Hartry Field’s Science Without Numbers [19]. Thus, for instance, Mark Kalderon writes:
Field maintained that mathematics does not have to be true to be good, and van Fraassen maintained that the aim of science is not truth but empirical adequacy. The suggestion common to each is that the aim of inquiry need not be truth, and that the acceptance of a mathematical or scientific theory need not involve belief in its content. [50, p. xi]
It seems to me, however, that both van Fraassen and Field have a somewhat different agenda than what is typically implied by today’s fictionalist-minded philosophers who enlist them as their second founding fathers. It is true that both speak of a fictionalist attitude towards abstract objects or unobservables, but only as a corollary of their broader views. Van Fraassen’s aim is epistemological; he wants to establish an enlightened empiricism that embraces the theoretical stance of modern science but is content with empirical adequacy; its shunning of truth, which is considered unattainable, is a typical feature of traditional instrumentalism.
Field, on the other hand, defends the metaphysical position of nominalism (that’s what the subtitle of his book says) by pursuing his sophisticated conservativity program.60 Also, he simply equates fictionalism and instrumentalism. Somewhat beguiled by the colorful metaphor of “fiction”, today’s fictionalists seem to regard the analogy between mathematics bereft of its objects and a piece of literature involving fictional characters as defining an ontological stance. Just as the fairy tale Snow White is not really about a particular person, so the “story of arithmetic” is not “really” (to be read with extra onto-stress) about numbers. Mathematical truth is just like truth in a story.61 Not only have book-length treatises been written recently about fictionalism ([59], [2]); genealogies tracing fictionalism in the history of ideas are also beginning to be set up. Gideon Rosen’s [75] is a case in point. His paper opens by setting fictionalism apart from instrumentalism, reductionism, and realism. Now Russell for one was a realist and reductionist, but also a fictionalist, as we saw. Duhem was a fictionalist according to the mentioned listing, but by most accounts an instrumentalist.62 It thus appears that fictionalism is a rather mixed bag, and demarcation lines are difficult to draw; we should perhaps be wary of proliferating -isms beyond necessity. Let us have a look at some typical tenets of fictionalism in the philosophy of mathematics. It is said that the mathematical sentence ‘3 is prime’ is as fictitious as the sentence ‘Snow White ate the apple’; neither can be true because both the numeral ‘3’ and the name ‘Snow White’ fail to refer to real objects. The only truth which is operative here is truth in the appropriate story from which the sentence is taken. Now assuming that there are no numbers, this is about all one can say in favor of the analogy.
There are certain story-based facts that can be gleaned from the way Snow White is usually told, and that’s it. By contrast, a mathematical “story”, i.e., a theory like arithmetic, is explanatory; it can be indefinitely explored, novel problems about it can be posed and answered, unearthing ever-new facts. If commentary on Goethe’s Faust or Shakespeare’s King Lear seems open-ended, this is because (perfectly reasonable) hermeneutics is inspired anew by every new generation of interpreters who try to shed new light on the closed world of the plot, possibly with mutually inconsistent results. Also, if ‘2 + 1 = 3’ is false, why should it be preferred to the likewise false statement ‘2 + 1 = 4’? The reasons are, as Balaguer relates, “that this story [viz., involving 2 + 1 = 3] is pragmatically useful, that it’s aesthetically pleasing, and most important, that it dovetails with our ‘way of thinking’.” [2, p. 13] The problem raised here is the source of objectivity of mathematics, already addressed by Frege in his Grundlagen der Arithmetik [22] and of course answered there by referring to its logicality. Whatever explanation a fictionalist might give of this fact, it hardly draws on the analogy with the structure of novels or fairy tales.63 However, mathematical fictionalists like Otávio Bueno [8] claim that fictionalism, while agreeing with nominalism on the non-existence of mathematical entities, is actually superior to nominalism. Bueno argues that the latter cannot, but the former can, meet the full list of five plausible desiderata he sets up for any viable philosophy of mathematics. Such a philosophy is supposed to explain (i) why mathematical knowledge is possible; (ii) how reference to mathematical entities is achieved; (iii) why mathematics can be successfully applied to science; (iv) how a uniform semantics for mathematics and science can be given; and (v) how mathematical discourse can be taken literally. [8, p. 63] Now there can be little doubt that desiderata (i) and (iii) are reasonable and should indeed be met by any account. The other points, however, are in need of substantial qualifications lest they unduly prejudice the issues involved. For instance, Bueno says, ad point (ii): “We can refer, and we often do refer, to non-existent things. They are, ultimately, objects of thought.” [8, p. 74] Bueno cannot mean ‘objects of thought’ in Frege’s sense, for then he would be back to platonism. He would have to argue for a naturalistic account of reference without falling prey to psychologism. There is nothing in fictionalism, a basically negative position like atheism, which gives it an advantage over a nominalistic strategy to deal with this problem.
60 In [20] Field does say, though, that he is “strongly inclined to think that the fictionalist view of mathematics is correct”. [20, p. 3]
61 Field seems to agree with this narrative; he says: “[T]he fictionalist can say that the sense in which ‘2 + 2 = 4’ is true is pretty much the same as the sense in which ‘Oliver Twist lived in London’ is true[.]” [loc. cit.]
62 But see recent scholarship on Duhem [64] that convincingly argues against his instrumentalism and conventionalism.
By the same token, the “uniform semantics” urged for mathematics and science, and similarly point (v), presuppose an account of meaning and truth which it is incumbent upon any anti-platonist theory to work out. There are no special “tools” available to fictionalism that would give it an edge over its rivals. In fact, when fictionalists speak about ‘3 is prime’ being false (because there are no numbers), they betray an understanding of truth which sticks to the traditional correspondence-theoretic view of platonic realism. Note that the issue is not truth of Peano Arithmetic in the standard model; in that “story” the sentence comes out true. What is at stake here is nothing less than an account of mathematical truth simpliciter.
63 John Burgess expresses the same skepticism when he writes: “I think that in view of this radical difference between mathematics and novels, fables, or other literary genres, the slogan ‘mathematics is a fiction’ not very appropriate, and the comparison of mathematics to fiction not very apt. My conclusion is that, whatever may remain to be said for or against nominalism, about whether we should or should not call ourselves ‘nominalists’, we should not call ourselves ‘fictionalists’.” [10, p. 35]
Since for the anti-platonist there is no “Model-in-the-Sky”64 in which our assertions can be evaluated, we have to find an account of what truth here on earth could mean. Fictionalism doesn’t even address this real, admittedly deep, problem. This is not to say that I disagree with fictionalism about deflating ontology. Quite the contrary. We should indeed say goodbye to a metaphysical view of reality which is commonly referred to as “platonism”. The platonist universe is the mind-independent external world, comprising a realm of eternal and immutable, abstract objects, among them the mathematical entities, numbers, sets and functions.65 It seems clear that Russell subscribed to the platonist view as a metaphysical coordinate system in the background, only within which he would pursue his reductionist program. The most perspicuous feature in Russell’s ontology was the notion of complex entities, propositions, denoting concepts and, at least originally, propositional functions. These entities owed their complexity not to contingent physical features of concrete objects (like a crystal or a living cell) but to the metaphysical structure of constituency, which involved, as we saw, an abstract glue called “mode of combination” and a well-founded dependence relation. However, the truth is that the putative structure of ontological complexes derives from the structure of language. The language species is prone to project its linguistic structure into the world. The naming game constantly played with respect to real world objects is metaphorically extended to a correspondence between formal expressions and their referents in a putative realm of mathematical objects or of abstract entities in general.
Formalists with their anti-metaphysical bent concluded that this extension is illegitimate, that mathematical expressions aren’t about anything, and that the very notion of truth in mathematics is elusive.66 Now Frege (and Russell with him) was right in rejecting that brand of formalism. Also, convinced of the objectivity and the universal applicability of logic and mathematics, Frege rejected the psychologistic notion of his time that concepts and mathematical abstracta are just ideas in the heads of people. The only alternative apparently left, then, was a realism with respect to abstract objects. A “third realm” of logical objects had to be recognized, as he famously said ([27, p. 69], [34, p. 337]). Modern nominalism contested that conclusion, and rightly so, to my mind. The third realm is a metaphysical figment. But while the platonist has a very simple account of truth and objectivity, any anti-platonist theory is seriously put to the test by these issues.
64 See [88, p. 356].
65 William Tait, who also finds this view unintelligible, prefers to call it “superrealism”, commenting: “Plato deserves a better fate.” [89, p. 91]
66 The tendency to altogether avoid the word ‘true’ in scientific discourse can still be seen in [47] and its later editions, where the truth values of propositional logic are consistently called ‘korrekt’ and ‘falsch’ (‘correct’ and ‘false’) rather than ‘wahr’ and ‘falsch’ (‘true’ and ‘false’).
I already indicated why I find fictionalism wanting in this respect. The indispensability theory fares little better. It could hardly be a platonist account because the furniture of our world, in the platonist view, cannot be a matter of negotiation and compromise. The indispensability theorist says that the reach of abstract ontology is dictated by “scientific practice”; but this practice is a collective and multifarious cultural enterprise, and not a homogeneous masterminded activity. It seems odd, therefore, that somebody like Quine, to take my favorite example, should prescribe specific demarcation lines in the ontology of infinite sets:
I recognize indenumerable infinites only because they are forced on me by the simplest known systematizations of more welcome matters. Magnitudes in excess, e.g., ℶ_ω or inaccessible numbers, I look upon only as mathematical recreation and without ontological rights. [43, p. 400]
Which authority is it that decrees ℶ_17 as real but ℶ_ω off-limits? Even Quine himself wasn’t really sure; the willy-nilly platonist was later prepared to embrace predicativism. Referring to Solomon Feferman’s claim in [16] that his predicative system W is sufficient for contemporary physics, Quine remarks: “This would be a momentous result. It would make a clean sweep of the indenumerable infinites and unspecifiable sets.” [71, p. 230] We see clearly that ontology has become a matter of expedience. Lest it should become completely arbitrary, however, it is important to address the question of how to square it with objectivity and truth. Georg Cantor, usually considered an arch platonist, trusts that mathematical concepts will either reveal their usefulness in the course of the research process or will be eliminated, thereby facing the test of their objectivity. This is quite a pragmatist idea. Cantor sees the greater danger in unduly confining the “freedom” of (pure) mathematics. He writes:
But then every mathematical concept bears the necessary corrective within itself; if it is infertile or unpractical it will reveal it before long through its uselessness, and it will be dropped for failing success. By contrast, every superfluous restriction of the mathematical research drive seems to me to involve a much greater danger . . . as the nature of mathematics consists exactly in its freedom. (Translation G. L.)67
I opt for a metaphysical nominalism combined with what I call a methodological realism with respect to abstract entities in general and mathematical objects in particular. The abstract objects of logic and mathematics are created by metaphorical extension. Metaphorization, to take up an umbrella concept from cognitive scientists George Lakoff and Rafael Núñez [55],68 is a common source of human creativity, indeed both an inevitable and inexhaustible one; it includes well-known cognitive mechanisms like reasoning by analogy, extrapolation, generalization, aided by more specific theoretical tools like systematization, problem reduction, formalization, and mathematical abstraction, i.e., reasoning modulo equivalence relations.
67 “[D]ann aber trägt auch jeder mathematische Begriff das nötige Korrektiv in sich selbst einher; ist er unfruchtbar oder unzweckmäßig, so zeigt er es sehr bald durch seine Unbrauchbarkeit und er wird alsdann wegen mangelnden Erfolgs fallen gelassen. Dagegen scheint mir aber jede überflüssige Einengung des mathematischen Forschungstriebes eine viel größere Gefahr mit sich zu bringen . . . ; denn das Wesen der Mathematik liegt gerade in ihrer Freiheit.” [11, p. 182]
The genesis of all these cognitive routines is to be found in cultural practices of human agency, stretching from the most basic interaction with the environment to ever more sophisticated negotiations in a complex society. Lakoff and Núñez use the term embodied cognition to stress the overall naturalistic roots of even such lofty fields as mathematics. Incidentally, this is a perspective that can already be found, for instance, in Philip Kitcher’s anti-platonist philosophy of mathematics [52], where the author, among other things, traces the abstract notion of collection or set back to the human activity of collecting. The embodiment69 approach to cognition helps to explain how mathematical knowledge is generated, and thereby to answer Benacerraf’s worry about the epistemic access to mathematical objects. Part of the explanation must be, however, that this knowledge is not achieved through “reference” to mathematical objects and an investigation of their properties. To put matters this way is already a biased account, imposing on the discourse involving abstracta the misplaced model of dealing with concrete objects by direct or delayed ostension via “causal chains”. It is not surprising that such a view leads to an impasse. Also, I don’t think that the cognitivist account falls under the verdict of Frege’s anti-psychologism.
To meet the charge of psychologism, we can admit the commonplace observation that human abstraction originates in mental processes; but we have to insist that that doesn’t make mathematics subjective or “mind-dependent”. Objectivity and mind-independence are achieved through a shared and highly formalized discourse; its concepts are equipped with sharp identity criteria hardly to be found elsewhere, and are linked to precise rules of use restraining the field of connotations that come with normal ideas in the mind of an individual.70 It doesn’t in the least belittle the ingenuity of mathematicians to point out that their ideas turn into a viable piece of mathematical science only to the extent to which they are couched in a public (formal) language and theory, complete with definitions, theorems and proofs.
68 The cognitive mechanisms detailed in their book (like image schemas, aspectual schemas, conceptual blends) provide vivid illustrations of the way patterns of mathematical abstraction could work. Leaving aside the manifesto style of the book in which the authors grind their axe, and which tends to make them blind to the genuinely theoretical nature of mathematics (see below), I do think that the human faculty of abstraction can be naturalized broadly along these lines.
69 Let us bear for once with this appeal to a modern buzz word, as it points in the right direction.
70 See also Leslie Tharp’s Myth and Mathematics [90], where he writes: “And once we have picked the initial concepts, we have no control over the consequences. There is some sort of objectivity and great definiteness both in the original limitations of choice, and in the uncontrollable consequences”. [90, p. 192] However, while we can agree with Tharp’s effort to secure objectivity in anti-platonism, we should be skeptical about his brand of “finitistic and mentalistic conceptualism”.
These are then out there “on paper”, ready for everyone to inspect, to correct or to build on and to develop further. Thus mathematical theories are like cultural artifacts, as objective as pieces of architecture, and as distinct from the ideas of their original authors as the building from the plan.71 Therefore it is important to supplement the cognitivist view with this cultural perspective. While the former serves to point to the evolutionary origin of all concept formation and explains, contra platonism, why mathematics is in rapport with this world, it is the latter that accounts for the theoreticity of mathematics (and of physics) as science. The cultural practice of established disciplinary training and expertise giving rise to theory formation goes beyond all attempts at reducing this collective enterprise to its narrow empiricist or naturalistic roots.72
What about mathematical truth, then? This is obviously not the place to answer the question. Suffice it to say that I find it important to reclaim a kind of “pre-Tarskian innocence” regarding meaning and truth and to insist that mathematical statements can be true simpliciter and not only true in a model. To spell out in a non-question-begging way what that amounts to, without falling back to platonism or fictionalist agnosticism, is still a major challenge in the philosophy of mathematics. Truth in a model is of course always internally available in mathematics, and this is where we are confronted with the incompleteness phenomena. But we are concerned here not with arithmetical truth, which just means truth in the standard model, nor with set-theoretic truth, but with mathematical truth. The issue is how we can renounce the Model-in-the-Sky and still develop a substantial notion of truth, if only as a regulative principle in Kant’s sense, that doesn’t fall prey to charges of relativism. The basic parts of mathematics are firmly rooted in their everyday applications, and likewise Maxwell’s equations, for instance, though highly theoretical, have a reasonable claim to truth. However, when we climb the ladder of abstraction into the higher reaches of set theory, we have to part company with people like Quine. Giving up all correspondence-theoretic delusions we have to find a different route of justification. Elsewhere I have argued for a coherentist approach to mathematical truth in this domain [61], referring to Hugh Woodin’s work in connection with the notorious continuum hypothesis CH; a certain axiom, projective determinacy, is introduced there which gives rise to a rich theory that structures the logical space of set theory in a surprisingly tight way, and actually decides CH (in the negative).
71 I think it can be argued that even the well-known psychologistic language of Dedekind and Cantor has a definite claim to objectivity when put in the proper historical perspective; Tait [89, p. 89], for instance, reads Cantor that way.
72 In van Fraassen’s constructive empiricism, I take the qualification “constructive” to point in that direction.
There is no need to claim self-evidence for such an axiom; it receives its justification from the wealth of its “verifiable consequences”.73 It is intriguing to note that the idea of an a posteriori justification of axioms harks back, as we have seen, to Russell’s regressive method. Finally, once we have given up the idea of a pre-established harmony between the terms of our mathematical language and a realm of eternal platonic entities, it is to be expected that there are points of bifurcation in our mathematical practice at which equally well justified but incompatible theories compete with one another.74 In view of the agonistic nature of argument in other cultural quarters this state of affairs should not be considered altogether anomalous. In set theory, there might be different theoretical viewpoints regarding the very notion of set behind the acceptance of particular axioms. I have in mind here what has been called the combinatorial conception of set presumably embraced by set theorists like Saharon Shelah,75 versus definitionist inclinations in the descriptive set theory school. In any case, it was again Russell whose ramified type theory first gave substance to the definitionist approach, which also accommodates the much more constrained theories of predicativism.
8 Conclusion
Russell’s metaphysical logic started out as a logic of simple and complex entities. Chief among the complexes were propositions. The logical symbolism has to mirror this complexity, thereby providing an analysis of the structure at hand. In this picture logic is the “ancilla metaphysicae”: the symbolic form follows the metaphysical structure. However, the correspondence between ontological structure and symbolism breaks down at general propositions.
73 Cf. the well-known quote from Gödel [37], which strikes a note different from the platonism usually attributed to him: “Furthermore, however, even disregarding the intrinsic necessity of some new axiom, and even in case it had no intrinsic necessity at all, a decision about its truth is possible also in another way, namely, inductively by studying its ‘success’, that is, its fruitfulness in consequences and in particular in ‘verifiable’ consequences . . . . There might exist axioms so abundant in their verifiable consequences, shedding so much light upon a whole discipline, and furnishing such powerful methods for solving given problems . . . that quite irrespective of their intrinsic necessity they would have to be assumed at least in the same sense as any well-established physical theory.” [37, p. 521], [39, pp. 182f.]
74 This issue has been labeled the problem of pluralism. See the intriguing paper [54] by Peter Koellner, who addresses it from the vantage point of Woodin’s methodology.
75 Combinatorial set theorists tend to be platonists. For instance, Shelah confesses: “I am in my heart a card-carrying Platonist seeing before my eyes the universe of sets,” though he adds, “but I cannot discard the independence phenomena.” [87, p. 5]
Russell was never able to give a satisfactory account of general propositions compatible with this view. His first theory of denoting was designed to save it but only complicated the ontology by introducing a second variety of complexes called denoting concepts, which were definitely “pre-modern”. Formal logic also spelled doom for propositions: first collecting them into classes was found paradoxical, and later quantifying over them in the modern sense was as well. Here, of course, part of the blame has to be allotted to Russell’s philosophy of logic, the fixed idea of a type-free logic coming from the doctrine of the unrestricted variable. We witness the constant struggle between Russell’s metaphysical convictions and logical preconceptions on the one hand and the requirements of a modern symbolic logic on the other, which he sometimes unwittingly hit upon and then doggedly pursued. As we have seen, a major problem for Russell was to find a place for functions in his ontology. Functional application became substitution in a complex, clearly an intensional conception. The idea is not mapping one object to another, but exchanging constituents in a given structure, the structure itself being remembered in the process. That is essentially the “program view” of functions; it seemed to work for propositional functions, but when dealing with ordinary functions it needed to be supplemented by the notion of denoting function. On Denoting did away with denoting complexes, pulling the rug from under denoting functions, but Russell could now set up definite descriptions for finding the values of functions. On Denoting broke with the strict correspondence view of form and content and therewith put an end to the transparency conception of language. The formal discourse of incomplete symbols, which conveyed meaning only syncategorematically, became Russell’s central logical tool in his reductionist program.
The resulting no-classes theory of classes, however, was not meant to reform mathematics in any way. In particular, he was not a constructivist, in spite of the definitionist character of ramified type theory. The Vicious Circle Principle grew out of the concern for grounding propositional complexes. It took Gödel’s gestalt switch to view ramification in a completely different light, that is, as a definability scheme with which the constructible hierarchy could be built up along the (admittedly impredicative) scale of ordinals. Now as was hinted at above, what is interesting here from the perspective of the history of ideas is the phenomenon that a certain discourse, quite independently from the way an author intends to use it, has the potential of lending itself to quite different interpretations and technical developments, thereby enriching the field in novel ways. This versatility is an example of what I like to call the “materiality” of the formalism, the public nature of which is a necessary characteristic of scientific culture.
Finally, we considered what Russell’s ontological development could mean for mathematical ontology today. The dual stance on ontology advocated here repudiates platonism in the sense of a cliché “superrealism” but refuses to be constrained in its mathematical practice by dispensability claims on various levels. Fictionalism was likewise found wanting when confronted with the fundamental issues of objectivity and truth. For Russell, formal discourse never meant embracing formalism. But it was instrumental in his gradual retreat from metaphysical preconceptions towards a modern understanding of logic. If he at times appears excruciatingly slow to the “enlightened” modern reader, then that is just ample evidence for the tremendous gap between nineteenth-century metaphysical logic before Russell and modern developments after his most productive years. Despite its many shortcomings, Principia Mathematica was after all a founding document of a new field. For reasons like these, and others, logic came a long way at Russell’s hands in the first decade of the last century, towards the modern foundational discipline as we know it today.
Acknowledgments
I wish to thank the members of the TransCoop Group who provided expert audiences for the ideas shared with them at several stages of the present paper. I presented some of the material and profited from discussions at various places, among them the Universities of Munich, Göttingen, Berlin (Humboldt), Konstanz, and Notre Dame. Apart from the TransCoop group, I am thankful for comments and suggestions I received from a number of people; I’d like to mention Juliet Floyd, Curtis Franks, Gottfried Gabriel, Aki Kanamori, Wolfgang Spohn, Christian Tapp; in particular, I am indebted to Bernard Linsky and Ed Zalta for extensive commentary. I thank Jesse Tomalty for checking my English.
Special thanks go to Mic Detlefsen for originally suggesting the cooperation between the German and the American groups in the TransCoop format of the Humboldt Foundation.
References
[1] Peter Aczel. Frege structures and the notions of proposition, truth and set. In Jon Barwise, H. Jerome Keisler, and Kenneth Kunen (eds), The Kleene Symposium, pp. 31–59. North-Holland, Amsterdam, 1980.
[2] Mark Balaguer. Platonism and Anti-Platonism in Mathematics. Oxford University Press, New York, 1998.
[3] Jon Barwise and Robin Cooper. Generalized quantifiers and natural language. Linguistics and Philosophy, 4:159–219, 1981.
[4] Jon Barwise and John Perry. Situations and Attitudes. MIT Press, Cambridge, Massachusetts, 1983.
[5] George Boole. An Investigation of the Laws of Thought on which are Founded the Mathematical Theories of Logic and Probabilities. Macmillan, Oxford, 1854. Reprint Dover Publications, New York 1958.
[6] George Boolos. The consistency of Frege’s Foundations of Arithmetic. In On Being and Saying: Essays in Honor of Richard Cartwright, pp. 3–20. MIT Press, Cambridge, Massachusetts, 1987. Reprinted in [13]:211–233. Page references are to this reprint.
[7] Francis H. Bradley. Essays on Truth and Reality. Clarendon, Oxford, 1914.
[8] Otávio Bueno. Mathematical fictionalism. In [9], 59–79, 2009.
[9] Otávio Bueno and Øystein Linnebo (eds). New Waves in Philosophy of Mathematics. Palgrave Macmillan, Houndmills, Basingstoke, Hampshire and New York, 2009.
[10] John P. Burgess. Mathematics and Bleak House. Philosophia Mathematica, 12:37–53, 2004.
[11] Georg Cantor (ed.). Gesammelte Abhandlungen mathematischen und philosophischen Inhalts. Julius Springer, Berlin, 1932.
[12] Richard Dedekind. Was sind und was sollen die Zahlen? Vieweg, Braunschweig, 1888. English translation in Richard Dedekind, Essays on the Theory of Numbers, Dover Publications, New York 1963, 228–115.
[13] William Demopoulos (ed.). Frege’s Philosophy of Mathematics. Harvard University Press, Cambridge, Massachusetts, 1995.
[14] Michael Detlefsen. Formalism. In [86], 236–317, 2005.
[15] Jean Dieudonné. Abrégé d’histoire des mathématiques 1700–1900. Tome I et II. Hermann, Paris, 1978.
[16] Solomon Feferman. Weyl vindicated: Das Kontinuum seventy years later. In C. Cellucci and G. Sambin (eds), Temi e Prospettive della Logica e della Scienza contemporanee, pp. 59–93. Cooperativa Libraria Universitaria Editrice, Bologna, 1988. Reprinted, with a Postscript, in [17], 249–283.
[17] Solomon Feferman. In the Light of Logic. Oxford University Press, New York and Oxford, 1998.
[18] Solomon Feferman. Predicativity. In [86], 590–624, 2005.
[19] Hartry H. Field. Science Without Numbers. A Defence of Nominalism. Basil Blackwell, Oxford, 1980.
[20] Hartry H. Field. Realism, Mathematics and Modality. Basil Blackwell, Oxford, 1989.
[21] Kit Fine. Reasoning With Arbitrary Objects. Basil Blackwell, Oxford, 1985.
[22] Gottlob Frege. Die Grundlagen der Arithmetik. Eine logisch mathematische Untersuchung über den Begriff der Zahl. Koebner, Breslau, 1884. English translation in [33].
[23] Gottlob Frege. Funktion und Begriff. Address given to the Jenaische Gesellschaft für Medizin und Naturwissenschaft, January 9, 1891. Reprinted in [30]:18–39. Page references are to this reprint. English translation as Function and Concept in [32], 21–41, 1891.
[24] Gottlob Frege. Über Begriff und Gegenstand. Vierteljahrsschrift für wissenschaftliche Philosophie, 16:192–205, 1892. Reprinted in [30]:66–80. Page references are to this reprint. English translation in [32], 42–55.
[25] Gottlob Frege. Grundgesetze der Arithmetik. Begriffsschriftlich abgeleitet. Vol. II. Pohle, Jena, 1903.
[26] Gottlob Frege. Was ist eine Funktion? In Festschrift Ludwig Boltzmann gewidmet zum sechzigsten Geburtstage 20. Februar 1904, pp. 656–666. Barth, Leipzig, 1904. Reprinted in [30]:81–90. Page references are to this reprint. English translation in [32], 107–116.
[27] Gottlob Frege. Der Gedanke. Eine logische Untersuchung. Beiträge zur Philosophie des deutschen Idealismus, 1:58–77, 1918. Reprinted in [28]:30–53, and in [29], 342–362. English translation in [34], 325–345.
[28] Gottlob Frege. Logische Untersuchungen. Vandenhoeck & Ruprecht, Göttingen, 1966. Edited and with an introduction by Günther Patzig.
[29] Gottlob Frege. Kleine Schriften. Edited by I. Angelelli. Olms, Hildesheim, 1967.
[30] Gottlob Frege. Funktion, Begriff, Bedeutung. Fünf logische Studien. Vandenhoeck & Ruprecht, Göttingen, 1969. Edited and with an introduction by Günther Patzig.
[31] Gottlob Frege. Nachgelassene Schriften und wissenschaftlicher Briefwechsel. Edited by H. Hermes, F. Kambartel, and F. Kaulbach. Vol. 2: Wissenschaftlicher Briefwechsel. Felix Meiner, Hamburg, 1976.
[32] Gottlob Frege. Translations from the Philosophical Writings of Gottlob Frege. Basil Blackwell, Oxford, 1977. Edited by Peter Geach and Max Black.
[33] Gottlob Frege. The Foundations of Arithmetic. A Logico-Mathematical Enquiry into the Concept of Number. Basil Blackwell, Oxford, 1980. English translation of [22] by J. L. Austin.
[34] Gottlob Frege. The Frege Reader. Blackwell, Oxford, 1997. Edited by Michael Beaney.
[35] Kurt Gödel. Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I. Monatshefte für Mathematik und Physik, 38:173–198, 1931. Reprinted, with English translation, in [38]:144–195. Page references are to this reprint.
[36] Kurt Gödel. Russell’s mathematical logic. In Paul Arthur Schilpp (ed.), The Philosophy of Bertrand Russell, Library of Living Philosophers, vol. 5, pp. 125–153. Open Court, LaSalle, 1944.
[37] Kurt Gödel. What is Cantor’s continuum problem? American Mathematical Monthly, 54:515–525, 1947. Reprinted in [39]:176–187.
[38] Kurt Gödel. Collected Works. Volume I: Publications 1929–1936 (CW I). Oxford University Press, New York and Oxford, 1986. Edited by Solomon Feferman et al.
[39] Kurt Gödel. Collected Works. Volume II: Publications 1938–1974 (CW II). Oxford University Press, New York and Oxford, 1990. Edited by Solomon Feferman et al.
[40] Kurt Gödel. Collected Works. Volume III: Unpublished Essays and Lectures (CW III). Oxford University Press, New York and Oxford, 1995. Edited by Solomon Feferman et al.
[41] Ivor Grattan-Guinness. Dear Russell – Dear Jourdain. A Commentary on Russell’s Logic, Based on his Correspondence with Philip Jourdain. Duckworth, London, 1977.
[42] Nicholas Griffin (ed.). The Cambridge Companion to Bertrand Russell. Cambridge University Press, Cambridge, 2003.
[43] Lewis E. Hahn and Paul A. Schilpp (eds). The Philosophy of W. V. Quine. Open Court, La Salle, Illinois, 1986.
[44] Bob Hale and Crispin Wright. Logicism in the twenty-first century. In [86], 166–202, 2005.
[45] Allen P. Hazen. A “constructive” proper extension of ramified type theory (The logic of Principia Mathematica, second edition, Appendix B). In [62], 449–480, 2004.
[46] David Hilbert. Über das Unendliche. Mathematische Annalen, 25:161–190, 1925.
[47] David Hilbert and Wilhelm Ackermann. Grundzüge der theoretischen Logik. Julius Springer, Berlin, 1928.
[48] Rolf-Peter Horstmann. Ontologie und Relationen. Hegel, Bradley, Russell und die Kontroverse über interne und externe Beziehungen. Athenäum, Königstein/Ts, 1984.
[49] Peter Hylton. Russell, Idealism, and the Emergence of Analytic Philosophy. Clarendon, Oxford, 1990.
[50] Mark Eli Kalderon (ed.). Fictionalism in Metaphysics. Clarendon, Oxford, 2005.
[51] Akihiro Kanamori. Gödel and set theory. The Bulletin of Symbolic Logic, 13:153–188, 2007.
[52] Philip Kitcher. The Nature of Mathematical Knowledge. Oxford University Press, Oxford, 1984.
[53] Kevin C. Klement. Russell’s 1903–1905 anticipation of the lambda calculus. History and Philosophy of Logic, 24:15–37, 2003.
[54] Peter Koellner. Truth in mathematics: The question of pluralism. In [9], 80–116, 2009.
[55] George Lakoff and Rafael E. Núñez. Where Mathematics Comes From. How the Embodied Mind Brings Mathematics into Being. Basic Books, New York, 2000.
[56] Gregory Landini. Russell’s Hidden Substitutional Theory. Oxford University Press, New York and Oxford, 1998.
[57] Gregory Landini. Logicism’s ‘Insolubilia’ and their solution by Russell’s substitutional theory. In G. Link (ed.), One Hundred Years of Russell’s Paradox, pp. 373–399. De Gruyter, Berlin, 2004.
[58] Shaughan Lavine. Understanding the Infinite. Harvard University Press, Cambridge, Massachusetts, 1994.
[59] Mary Leng. Mathematics and Reality. Oxford University Press, Oxford, 2010.
[60] Godehard Link. Algebraic Semantics in Language and Philosophy. CSLI Publications, Stanford, 1998.
[61] Godehard Link. Reductionism as resource-conscious reasoning. Erkenntnis, 53:173–193, 2000.
[62] Godehard Link (ed.). One Hundred Years of Russell’s Paradox. Mathematics, Logic, Philosophy. De Gruyter, Berlin, 2004.
[63] Bernard Linsky. Russell’s Metaphysical Logic. CSLI Publications, Stanford, 1999.
[64] Roberto Maiocchi. Pierre Duhem’s The Aim and Structure of Physical Theory: A book against conventionalism. Synthese, 83:385–400, 1990.
[65] Gideon Makin. The Metaphysicians of Meaning. Routledge, London, 2000.
[66] Richard Montague. Universal grammar. Theoria, 36:373–398, 1970. Reprinted in [68]:221–246.
[67] Richard Montague. The proper treatment of quantification in ordinary English. In J. Hintikka et al. (ed.), Approaches to Natural Language: Proceedings of the 1970 Stanford Workshop on Grammar and Semantics, pp. 221–242. Reidel, Dordrecht, 1973. Reprinted in [68]:247–270. [68] Richard Montague. Formal Philosophy. Selected Papers of Richard Montague. Edited and with an introduction by Richmond H. Thomason. Yale University Press, New Haven and London, 1974. [69] Hilary Putnam. Philosophy of Logic. Harper & Row, New York, 1971. [70] Hilary Putnam. Realism and Reason. Philosophical Papers, Volume 3. Cambridge University Press, Cambridge, 1983. [71] Willard V. Quine. Immanence and validity. Dialectica, 45:219–230, 1991. [72] Reinhold Remmert. Complex numbers. In Heinz-Dieter Ebbinghaus et al. (eds), Numbers, pp. 55–96. Springer, New York, Berlin, 1991. [73] Francisco Rodríguez-Consuegra. The Mathematical Philosophy of Bertrand Russell: Origins and Development. Birkhäuser, Basel, 1991. [74] Francisco Rodríguez-Consuegra. Propositional ontology and logical atomism. In [62], 417–434, 2004. [75] Gideon Rosen. Problems in the history of fictionalism. In [50]:15–64, 2005. [76] Bertrand Russell. The Principles of Mathematics [PoM]. Allen & Unwin, London, 1903. [77] Bertrand Russell. On denoting [OD]. Mind, New series 14:479–493, 1905. Reprinted in [81]:41–56. Page references are to this reprint. [78] Bertrand Russell. On the substitutional theory of classes and relations. In [83], 165–189, 1906. [79] Bertrand Russell. The regressive method of discovering the premises of mathematics. Paper read before the Cambridge Mathematical Club, March 9, 1907. Published in [83], 272–283, 1907. [80] Bertrand Russell. Mathematical logic as based on the theory of types [MLT].
American Journal of Mathematics, 30, 1908. Reprinted in [81]:59–102. Page references are to this reprint. [81] Bertrand Russell. Logic and Knowledge. Essays 1901-1950. Allen & Unwin, London, 1956. Edited by Robert C. Marsh. [82] Bertrand Russell. My Philosophical Development [MPD]. Allen & Unwin, London, 1959. Reprinted 1993 by Routledge, London. [83] Bertrand Russell. Essays in Analysis [EA]. Allen & Unwin, London, 1973. Edited by Douglas Lackey. [84] Bertrand Russell. The Collected Papers of Bertrand Russell. Volume 4: The Foundations of Logic 1903-05 [CP4]. Allen & Unwin, London, 1994. Edited by Alasdair Urquhart with the assistance of A. C. Lewis. [85] Kurt Schütte. Proof Theory. Springer, Berlin, 1977. [86] Stewart Shapiro (ed.). The Oxford Handbook of Philosophy of Mathematics and Logic. Oxford University Press, Oxford, 2005. [87] Saharon Shelah. The future of set theory. arXiv preprint math/0211397, 2002.
[88] William W. Tait. Truth and proof: The platonism of mathematics. Synthese, 69:341–370, 1986. [89] William W. Tait. The Provenance of Pure Reason. Oxford University Press, Oxford, 2005. [90] Leslie Tharp. Myth and mathematics: A conceptualistic philosophy of mathematics I. Synthese, 81:167–201, 1989. [91] Bas C. van Fraassen. The Scientific Image. Clarendon, Oxford, 1980. [92] Jean van Heijenoort (ed.). From Frege to Gödel. A Source Book in Mathematical Logic, 1879–1931. Harvard University Press, Cambridge, Massachusetts, 1967. [93] Alfred North Whitehead and Bertrand Russell. Principia Mathematica [PM]. 3 volumes. Cambridge University Press, Cambridge, 1910–13. 2nd edition 1927. [94] Alfred North Whitehead and Bertrand Russell. Principia Mathematica to *56. 2nd edition. Cambridge University Press, Cambridge, 1927. Paperback Edition to ∗56, 1962. [95] Ernst Zermelo. Untersuchungen über die Grundlagen der Mengenlehre I. Mathematische Annalen, 65:261–281, 1908. English translation in [92]:199– 215.
On Live and Dead Signs in Mathematics
Felix Mühlhölzer
The language is not alive except to those who use it.
William P. Thurston
First order theories of arithmetic have nonstandard models. Usually, however, when not doing model theory, we consider only the standard model as the ‘intended’ one which arithmetic is about. But how do we single out the standard model? This problem has been treated by many people – most recently by Volker Halbach and Leon Horsten in their paper “Computational Structuralism” – but it remains a mess to this day. To my mind, it is a pseudo-problem because it passes over the deep ambiguity in the way arithmetic is ‘about’ models and ‘about’ the numbers making up these models. There is a categorial difference between “reference”, on the one hand, and “interpretation” – in the model theoretic sense of “interpretation” which is responsible for the emergence of nonstandard models – on the other. The undifferentiated talk of arithmetic’s ‘aboutness’ blurs this difference or at least underrates it. Reference concerns the use of our signs, whereas interpretation has nothing at all to do with use. In Wittgenstein’s apt metaphor: reference concerns live signs, interpretation dead ones, that is: actually no ‘signs’ at all. This is certainly a deep difference, and in the light of it our problem dissolves. The many nonstandard models emerge with regard to the interpretation of an object language, the ‘signs’ of which are dead; and model theorists have their mathematical way of distinguishing between standard and nonstandard models, which doesn’t raise any philosophical problems. Reference, on the other hand, belongs to the domain of used signs, which are deeply different from the so-called ‘signs’ of a model theoretic object language, and ‘nonstandard models’ are out of the question from the outset.
With this distinction in mind, one can easily see how Halbach’s and Horsten’s attempt at solving our problem in a technical way, by making use of Tennenbaum’s Theorem, does not work, and it helps us to see as well why Charles Parsons’ attempt at showing the existence of models of arithmetic by pointing to what he calls ‘intuitive models’ is inadequate. Both attempts essentially blur the categorial difference between reference and interpretation.
1 A Mess Concerning the Reference, Interpretation and Application of Mathematical Terms
In an article by Luca Bellotti on Skolem’s paradox there is the following nice statement:
[T]he good mathematician sees a perfect order where the others see a terrible mess [. . . ], while the good philosopher sees a terrible mess where the others see a perfect order. [2, pp. 186f.]
Bellotti illuminatingly exposes the mess in discussions about possible interpretations of set theory and the irritating fact that there exist countable models if the theory is first order. In the present paper I want to extend Bellotti’s considerations to the notion of “interpretation” when used with regard to mathematics in general, and to its connection with the notions of “reference” and “application”. The mess will increase, and I’ll give a diagnosis of it that makes use of Wittgenstein’s way of distinguishing between “live” and “dead” signs. My strategy will be to concentrate on specific readings of “reference”, “interpretation” and “application” – readings that I believe I understand sufficiently well – and then to compare them with what can be read in the literature.
The following is a paradigmatic example of the sort of mess I mean: statements made by Hilbert about his specific way of doing geometry that can be found in his lectures and published texts about the foundations of geometry and also in his letters to Frege. In these letters he famously says that the so-called geometric points constitute a system of things that might very well be thought to be things like love, law, chimney sweep, and so on; he refers to the independence proofs that his approach allows; but he also stresses the applicability of mathematical theories to the world of appearances. In an illuminating paper in the recently published Cambridge Companion to Frege, entitled “Frege and Hilbert”, Michael Hallett, commenting on a passage in one of Hilbert’s lectures, summarizes Hilbert’s view as follows:
[According to Hilbert, it] is the mathematical theory itself which is ‘only a schema of concepts’ and which can be differently interpreted, both in the various (predictable and unpredictable) applications, sometimes to the physical world and sometimes in other mathematical theories, and also in meta-mathematical study. [. . . ] Hilbert’s ‘way of understanding’ the independence results therefore introduces, and is based on, the distinction between the axiomatized theory on the one hand and the various models on the other. [12, p. 453]
Here we have the mess that I mean in a nutshell: Does interpretation equal application? Is interpretation always essentially the same thing irrespective of whether it is directed at the physical world or at other mathematical theories or at models as in model theory? And what about reference, which is actually the main theme of Hallett’s paper? Is reference the same, or essentially the same, as interpretation? That this
remains unclear in Hilbert’s writings – as I think it does – might be a typical historical fact with regard to the inventor of a new way of thinking: that he himself may not be the best expositor of it; but a considerable lack of clarity in this domain persists up to the present day.
Let me start by explaining the notions of reference, interpretation and application in the sense that I believe I understand. I take the term “reference” at face value, like Wittgenstein, to whom the proposition »20 + 15 = 35« is a statement about numbers and who would, as he explicitly says, consider it ridiculous to dispute this.1 In his Big Typescript he asks: “What are numbers?”, and immediately gives the answer: “The meanings of numerals” (“Die Bedeutungen der Zahlzeichen”; [43, p. 569]). This can only mean: what the numerals refer to. He instantly adds: “And an investigation of this meaning is an investigation of the grammar of numerals”, and by “grammar” he means the characteristic use (and by “investigation” a philosophical investigation, not a mathematical one). In the important § 10 of the Philosophical Investigations (= PI) he writes that “one may say that the signs »a«, »b«, etc. [introduced in §§ 8f.] signify numbers”. He then stresses that what words signify should show itself “in the kind of use they have”, and that our uniform talk of the form: »This word signifies this«, should not blind us to the deep dissimilarities in our use of different sorts of words. Wittgenstein explicitly mentions the difference between numerals on the one hand and words like »block«, »slab« and »pillar« on the other. In this § 10 he has a very restricted language game in mind: the language game of § 8; but his remark applies to all language games; and it concerns, of course, not only the expression “to signify” but also expressions such as “to be about” and similar ones.
PI § 10 should be read as an invitation to clarify the actual role of such expressions, and to understand their ‘Witz’, their ‘point’, to use Wittgensteinian terminology. What I just quoted was only the beginning of a statement in which Wittgenstein proceeds: [O]ne may say that the signs »a«, »b«, etc. signify numbers: when, for example, this removes the misunderstanding that »a«, »b«, »c« play the part actually played in the language by »block«, »slab«, »pillar«. And one may also say that »c« qualifies this number and not that one; if, for example, this serves to explain that the letters are to be used in the order a, b, c, d, etc., and not in the order a, b, d, c.
Of course, one need not stop at this point and can raise the general question about the function of our talking about signification, aboutness, etc., depending on the language game one is playing. PI § 10 suggests this question.
1 See [42, p. 112]. This sentiment, of course, is also enunciated by other philosophers, for example by William Tait: “I take propositions about numbers to be about numbers and not (somehow construed) about some other type of object or about nothing at all.” ([35, p. 542]; reproduced in [36, p. 37].)
I use “reference” as a generic notion that incorporates naming in the case of singular terms (as naming the number 3 with the respective numeral), to apply to in the case of general terms (to apply to the number 7, say, in case of the general term “prime number”) and aboutness in the case of statements (as when Wittgenstein says that »20 + 15 = 35« is a statement about numbers). And one should speak of ‘reference’ only in case of used expressions. Reference essentially depends on use and a philosophical investigation of reference must be an investigation of use. This has not only been said by Wittgenstein but it is at least suggested also by Putnam, Shapiro and others. Putnam adopts what Shapiro calls the Use Thesis: the claim that one understands a language to the extent that one knows how to use the language correctly;2 and Putnam explicitly describes reference as determined by use: On any view, the understanding of the language must determine the reference of the terms, or, rather, must determine the reference given the context of use. If the use, even in a fixed context, doesn’t determine reference, then use isn’t understanding. [28, p. 24]
What we have to clarify, then, is precisely this specific aspect of the use of a term that is relevant to our talking about the term’s reference, and also how we use the term “reference” itself, and what’s the point of all this. What we should not do is talk about, say, the mechanism of reference or similar things, as if reference were a natural kind with a certain essence that should be discovered. Instead, we should look at the way we talk about ‘reference’ and how we use the term “reference” itself in this respect. This is the perspective from which I see, for example, the so-called causal theory of reference: it simply describes our actual use of names of empirical objects and of natural-kind terms that is relevant to our assigning reference to them, and it shows that this specific practice is very different from what a purely descriptionist approach suggests. In a similar way one should now investigate our mathematical vocabulary and the relevant mathematical practice that highlights its reference.3 So, for example, we treat different systems of numerals – the tally notation; the positional notation, with the possibility of using different bases; or the numerals of the Romans – as systems referring to the same numbers. Or, to take a more advanced example, it is not uncommon that mathematicians talk about the problem of “giving meaning to mathematical objects”, for instance to complex numbers, or to determinants, or to groups, and so on; and what they mean are the objects they refer to with certain symbols when they use these symbols in their mathematical practice.
2 See [30, pp. 252-255], and [31, pp. 211-214]. Shapiro connects this thesis with a view about reference, but I’m not really clear about the way he sees this connection.
3 An attitude like this has also been adopted in [16, pp. 454f.]. However, in this paper the common mess with respect to “reference” and “interpretation” is not removed.
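The remark that different systems of numerals are treated as referring to the same numbers can be made concrete with a small worked example. What follows is only an illustrative sketch: the choice of the number seven, of the bases two and ten, and the symbols d_i, b and k are assumptions introduced here and do not occur in the text.

```latex
% A positional numeral d_k ... d_1 d_0 in base b (with 0 <= d_i < b) is read as
\[
  (d_k \dots d_1 d_0)_b \;=\; \sum_{i=0}^{k} d_i\, b^{i}.
\]
% Four notations that are treated as referring to one and the same number:
\[
  \underbrace{|||||||}_{\text{tally}}
  \;=\; \underbrace{\mathrm{VII}}_{\text{Roman}}
  \;=\; (7)_{10}
  \;=\; (111)_{2}
  \;=\; 1\cdot 2^{2} + 1\cdot 2^{1} + 1\cdot 2^{0}.
\]
```

The equality signs are to be read as “refers to the same number as”. On the view taken in this paper, that sameness belongs to the way the notations are used in mathematical practice and is not something the formula establishes by itself.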
I also want to interpret as dealing with reference the following well-known passage in a letter by Dedekind to Heinrich Weber from January 1888, in which he expounds his view of irrational and cardinal numbers:
[Y]ou say that the irrational number is nothing other than the cut itself, while I prefer to create something new (different from the cut) that corresponds to the cut and of which I say that it brings forth, creates the cut. We have the right to ascribe such a creative power to ourselves; and moreover, because of the similarity of all numbers, it is more expedient to proceed in this way. The rational numbers also produce cuts, but I would certainly not call the rational number identical to the cut it produces; and after the introduction of the irrational numbers one will often speak of cut-phenomena with such expressions, and ascribe to them such attributes, as would sound in the highest degree peculiar were they to be applied to the numbers themselves. Something very similar holds for the definition of cardinal number as a class; one will say many things about the class (e.g., that it is a system of infinitely many elements, namely of all similar systems) that one would attach to the number (as a deadweight) only with the greatest reluctance. Does anybody think, or will he not gladly forget, that the number four is a system of infinitely many elements? (But that the number four is the child of the number three and the mother of the number five is something that nobody will forget).4
Dedekind’s talking about the creation of numbers and about our respective creative power sounds mysterious; but when we read it as actually dealing with reference to numbers, in contrast to bringing forth certain models of systems of numbers, it gets a relatively down-to-earth sense, a sense that I find very plausible. According to this sense, Dedekind here simply describes the way we speak about numbers and about our referring to them, which is important to our actual practice of doing mathematics. Our philosophical interest in reference then summons us to clarify this practice.
4 This is the English translation to be found in [12, p. 446]. The German original reads as follows: “Du sagst, die Irrationalzahl sei überhaupt Nichts anderes als der Schnitt selbst, während ich es vorziehe, etwas Neues (vom Schnitte Verschiedenes) zu erschaffen, was dem Schnitte entspricht, und wovon ich sage, daß es den Schnitt hervorbringe, erzeuge. Wir haben das Recht, uns eine solche Schöpferkraft zuzusprechen, und außerdem ist es der Gleichartigkeit der Zahlen wegen viel zweckmäßiger, so zu verfahren. Die rationalen Zahlen erzeugen doch auch Schnitte, aber ich werde die rationale Zahl gewiß nicht für identisch ausgeben mit dem von ihr erzeugten Schnitte; und auch nach Einführung der irrationalen Zahlen wird man von Schnitt-Erscheinungen oft mit solchen Ausdrücken sprechen, ihnen solche Attribute zuerkennen, die auf die entsprechenden Zahlen selbst angewendet gar seltsam klingen würden. Etwas ganz Ähnliches gilt auch von der Definition der Cardinalzahl (Anzahl) als Classe; man wird Vieles von der Classe sagen (z.B. daß sie ein System von unendlich vielen Elementen, nämlich allen ähnlichen Systemen ist), was man der Zahl selbst doch gewiß höchst ungern (als Schwergewicht) anhängen würde; denkt irgend Jemand daran, oder wird er es nicht gern bald vergessen, daß die Zahl vier ein System von unendlich vielen Elementen ist? (Daß aber die Zahl 4 das Kind der Zahl 3 und die Mutter der Zahl 5 ist, wird Jedem stets gegenwärtig bleiben).” [7, pp. 489-490]
I do not claim, of course, that this is what Dedekind himself actually had in mind, and I will not go into the many different interpretations that Dedekind’s pronouncements have provoked in the literature.5 What I claim is that Dedekind’s view can be understood in the way just described and that this is a reasonable view. Of course, one finds a lot of mentalistically slanted pathos in Dedekind’s utterances that does not go all too well with my interpretation; so before the passage just quoted he emphasizes that our power to create numbers is a power of the mind and he explains: “We are a divine race and undoubtedly possess creative power, not merely in material things (railways, telegraphs) but especially in things of the mind.” 6 But many of Dedekind’s statements can be easily freed from this mentalistic slant and what remains still looks to be very illuminating. In Dedekind’s texts “creation” actually can mean quite different things.7 But my interest only concerns the sense of a creation of the numbers themselves in contrast to the proxies of numbers offered by certain models, like the model of the cuts of rational numbers as proxies of the real numbers. This is the sense of “create” in the quoted letter to Weber, and in its German original – “erschaffen” – this word is only used with respect to the numbers themselves, which we create. With regard to their proxies, the cuts, Dedekind speaks of “hervorbringen” and “erzeugen” – “bring forth” and “produce” – and it is not we who bring forth and produce the cuts, but the numbers themselves that do this.8 Furthermore, these numbers themselves are not artificial entities, created only recently, but objects with which we have long been familiar. In the Preface to the first edition of his Was sind und was sollen die Zahlen? 
of 1888 Dedekind writes: “I know that, in the shadowy forms [schattenhafte Gestalten] which I bring before him, many a reader will scarcely recognize his numbers which all his life long have accompanied him as faithful and familiar friends”.9 With the
5 On this, see, for example, [32, pp. 170-175], and [34]. – Let me add that when I mentioned just now Dedekind’s bringing forth certain models of systems of numbers, the term “model” was not used in the way in which it occurs in present-day model theory. The origin of model theory is much later (see, e.g., [15] and [1]) and Dedekind’s use of models is only a step on the way to model theory. The possibility of nonstandard models, for example, certainly a hallmark of model theory, is not seen by Dedekind, who is fixated upon second-order logic anyway, and when on p. 151 of [34] Dedekind’s ‘intruders’ are considered as ‘non-standard elements’ this is an anachronism.
6 Translation in [8, p. 835] of: “Wir sind göttlichen Geschlechts und besitzen ohne jeden Zweifel schöpferische Kraft nicht bloß in materiellen Dingen (Eisenbahnen, Telegraphen), sondern ganz besonders in geistigen Dingen.” – Our mind grasps ‘thoughts’, and Dedekind’s notion of a thought is not a mentalistic but a Fregean one, as rightly stressed in [34, p. 148].
7 As shown in [34].
8 In this respect the standard English translation is sloppy in using the phrase “creates the cut”.
9 This is the English translation in [8, p. 791]. A similar sentiment is expressed in a letter Dedekind wrote to Lipschitz in 1876; see [34, p. 154 and note 66 on p. 166].
‘shadowy forms’ Dedekind of course means his newly invented proxies of the familiar natural numbers. Why did he produce them? Not in order to present the ‘actual’ or ‘genuine’ numbers, but in order to extend our theoretical possibilities of devising more proofs on the basis of more rigorous concepts. This is already hinted at in the very first sentence of said Preface: “In science nothing capable of proof ought to be believed without proof.” (See [8, p. 790]) In contrast to his shadowy forms, the numbers themselves are our ‘familiar friends’ because of our conversant use of the symbols with which we refer to them, and we are creative in precisely this practice and in the way we are incessantly extending it. Dedekind himself connects creativity and use when, in his Stetigkeit und irrationale Zahlen of 1872, he writes: I regard the whole of arithmetic as a necessary, or at least natural, consequence of the simplest arithmetical act, that of counting, and counting itself is nothing other than the successive creation of the infinite series of positive integers in which each individual is defined by the one immediately preceding; the simplest act is to pass from an already-created individual to its successor that is to be created. (See [8, p. 768])
Dedekind means this mentalistically, but I would like to free his statement from all mentalistic undertones and lay the emphasis on our use of the mathematical symbols. It is this use that has to be regarded as essential. The practice relevant to our referring to the familiar numbers is not a foundationalist one. In a foundationalist context we, of course, may sometimes refer with the numeral “3” to a set theoretic proxy of the familiar number, like the corresponding von Neumann number or the Zermelo number, or with the symbol “π” to a cut of the rational numbers or to a certain equivalence class of Cauchy sequences of rational numbers. But we are aware that this is only a modelling in set theory, which need not – and, as Dedekind says, should not – be identified with the number referred to in our mathematical practice. In the letter to Weber quoted above, Dedekind talks about a correspondence between irrational numbers and cuts (of rational numbers), without explicitly defining what he means by this; but what he apparently has in mind here is the interpretation of the structure of irrational numbers in the structure of cuts, as one now says in model theory.10 This is one of the many – amazingly many – senses of the term “interpretation”. It is the sense of interpreting a structure A in a structure B. This sense, however, is philosophically not so exciting, and in what follows, if nothing else is said, I will use the term “interpretation” in the sense of interpreting a language by means of certain structures – with Alfred Tarski’s definition of truth with respect to a given structure as a paradigm.11 This sense is also common in model theory, and it turns up in
10 See [14, p. 107].
11 See, e.g., [3, pp. 102-106 and 114-119].
many philosophical discussions about interpreting a language or theory, for example about problems concerning the ‘standard’ versus ‘non-standard’ or ‘intended’ versus ‘non-intended’ interpretations of theories, especially in the case of mathematical ones. And it is this sense that is often confounded with reference. As I just said, I interpret Dedekind’s view about our familiar numbers as concerning reference, and I think that reference must be sharply distinguished from interpretation in the model theoretic sense of interpreting a language. Even in this sense there is a certain ambiguity. Interpretation in this sense can be understood as identical with what is often simply called a structure, namely a certain set, the domain or universe of discourse, together with a mapping that for each nonlogical symbol of the language assigns a denotation to the symbol in this set in the obvious way, depending on whether the symbol is a name, a predicate or a function symbol. Precisely this is an interpretation in one of the senses I have in mind.12 In a slightly different sense, an interpretation is what has just been meant by “assigning a denotation to a symbol of the language”. This assigning, however, is not an act of doing something but a mathematical object, an object that is studied by the model theorist and that has nothing to do with any use of the respective symbol. So the ambiguity mentioned should not mislead us: both senses of “interpretation” are essentially the same. They only differ in the different emphasis they make, and when in what follows I use the term “interpretation” I will have in mind this essentially one and the same model theoretic idea. Now, the decisive point with respect to this idea is that the language to which the denotations are assigned – that is: the object language that is interpreted – is not a language considered as used. 
It is a purely mathematical object, and mathematical objects are not used: only the signs with which we refer to these objects are used. Precisely this use is totally absent in case of an interpretation. Of course, the meta-language is (or may be) used – but not the object language. Such a situation can be described with a felicitous metaphor sometimes used by Wittgenstein:13 in case of an interpretation, the object language that is interpreted is dead like any other mathematical object, and only the meta-language which is about the object language has (or may have) life. In order to refer, the signs of the object language must have life; in order to be interpreted, they must be dead. To express it differently: reference concerns the meaning of the respective signs while interpretation has nothing to do with meaning.
12 In [14, p. 2], this is called a “structure”, and the term “interpretation” is used for the interpretation of structure A in structure B (see p. 107, as already mentioned).
13 For example in [41, pp. 3-5], and [40, §§ 432 and 454]. – I understand this metaphor as felicitous and not as dubious or critical. It points to a deep difference, the difference between life and death, which Wittgenstein in the present context comprehends as the difference between used and non-used signs.
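For readers who want the model theoretic notion spelled out, the following is a minimal sketch of an interpretation in the sense just described: a domain together with a mapping that assigns denotations to the nonlogical symbols. The toy language with one constant “0” and one unary function symbol “S”, the structure names N and Z, and the superscript notation for denotations are illustrative assumptions, not taken from the text; the two structures use the von Neumann and Zermelo proxies mentioned earlier.

```latex
% A structure (interpretation) for the language L = {0, S}:
% a domain A together with an assignment I of denotations in A.
\[
  \mathfrak{A} = (A, I), \qquad I(0) \in A, \qquad I(S) : A \to A .
\]
% Denotation of closed terms and truth of equations, in Tarski's manner:
\[
  0^{\mathfrak{A}} = I(0), \qquad
  (St)^{\mathfrak{A}} = I(S)\bigl(t^{\mathfrak{A}}\bigr), \qquad
  \mathfrak{A} \models s = t \iff s^{\mathfrak{A}} = t^{\mathfrak{A}} .
\]
% Two interpretations of the same object language
% (domains: the finite von Neumann ordinals and the Zermelo numerals):
\[
  \mathfrak{N}:\; I(0) = \emptyset,\;\; I(S)(x) = x \cup \{x\}
  \qquad\qquad
  \mathfrak{Z}:\; I(0) = \emptyset,\;\; I(S)(x) = \{x\}
\]
\[
  (SSS0)^{\mathfrak{N}} = \{\emptyset, \{\emptyset\}, \{\emptyset, \{\emptyset\}\}\},
  \qquad
  (SSS0)^{\mathfrak{Z}} = \{\{\{\emptyset\}\}\} .
\]
```

Nothing in these clauses involves anyone using the term “SSS0”. The assignment I is a function, that is, a set of ordered pairs, and the term itself may be any mathematical object whatever; this is the sense in which the interpreted language is dead.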
Wittgenstein’s metaphor appears especially apt when understood in the following way: There is a whole terrain of what may be called “meaning phenomena”, analogous to the many life phenomena which biology sees as characteristic for life. Just as “life” is an intricate notion in biology, the same is true of “meaning” in the philosophy of language, and both notions, “life” and “meaning”, are important in a similar way. Precisely this analogy lets Wittgenstein talk about the ‘life’ of a sign instead of its ‘meaning’. What are the meaning phenomena? They comprise the following: that propositions can agree or disagree with reality; that linguistic signs can be ‘directed’ at objects; that we can use them to refer to entities that are far away, with respect to both time and space, that may even be outside our light cone or that may be abstract entities like numbers and sets; and so on. The problem then is how we should understand these remarkable capacities. According to Wittgenstein, they should not be seen as the effect of a specific medium, the ‘mind’, which has such an astonishing power, and they should also not be seen as afforded by certain appropriate entities that are added or attached to the signs. These are the wrong philosophical pictures. The correct point of view is described by Wittgenstein as follows:
[I]f we had to name anything which is the life of the sign, we should have to say that it was its use. [. . . ] The mistake that we are liable to make could be expressed thus: We are looking for the use of a sign, but we look for it as though it were an object co-existing with the sign. [. . . ] The sign (the sentence) gets its significance from the system of signs, from the language to which it belongs. Roughly: understanding a sentence means understanding a language. As a part of the system of language, one may say, the sentence has life. But one is tempted to imagine that which gives the sentence life as something in an occult sphere, accompanying the sentence. But whatever accompanied it would for us just be another sign. [41, pp. 4f.]
This passage could do with a perceptive commentary which I cannot provide here.14 The important insight, however, appears clear enough: that the use emphasized by Wittgenstein has to be understood not as something like a separate entity, added to a sign and thereby giving it life – as something like, say, the ‘bundle of characteristic uses of the sign’, attached to it –, but as enmeshed in an entire practice. I think that this point of view conveys the correct picture of meaning, including reference, and it underlies all the considerations in the present paper.15 It underlies in particular the distinction between “reference” and “interpretation” that I want to make. The natural place of the notion of “interpretation” as I understand it is in model theory, and model theorists are quite explicit about the fact that the interpreted language appears as a purely mathematical object. Wilfrid Hodges, for example, explains it thus in his book A Shorter Model Theory:
[W]e [. . . ] put no restrictions at all on what can serve as a name. For example any ordinal can be a name, and any mathematical object can serve as a name of itself. The items called ‘symbols’ in this book need not be written down. They need not even be dreamed. [14, p. 2]
14 Not really a commentary, but some further explications can be found in [22, pp. 115-132].
15 Despite casual allusions to Hilbert or Dedekind, my considerations are of a totally unhistorical nature, for example in the way the object- versus meta-language distinction is used. The thinking of Frege or Russell cannot be adequately captured in this way, for example, and it ought to be discussed separately. (I am grateful to Godehard Link for pointing this out to me.)
Think, for example, of Peano arithmetic. This theory can be used by simply doing mathematics based on its axioms (as it is done, for example, in Chapter 1 of Edmund Landau’s nice little book Grundlagen der Arithmetik). No meta-theoretical considerations are involved, especially no model theoretic ones, and therefore no interpretation is involved in the sense of model theory. If we start thinking about interpretations, we actually desist from the use we made of Peano arithmetic.16 And if we raise the further question about the role this use might nonetheless play in model theory, we directly land in the mess with which I’m struggling in the present paper.
16 In most cases we do not even think of writing down the expressions of the object language because they are normally unsurveyable, often to such an extent that we could not write them down or that even a machine could not do it. But in the last analysis this fact of not-writing-down the expressions is irrelevant, because from the outset they are not expressions to be used. They are simply mathematical objects and such objects are neither written down nor used. Only the signs with which we refer to the mathematical objects are written down or used.
Seen from the perspective just described, “interpretation” appears categorially different from “reference”. One might object that this comes from my all too strict understanding of the term “interpretation” as a purely mathematical one. But it is simply a fact of many philosophical discussions about interpretation and reference that they start with purely mathematical facts, like the Löwenheim-Skolem theorems and their mathematical consequences, before “reference” really comes to the fore. This is characteristic, for example, of Putnam’s reflections in his much discussed paper “Models and Reality” (1980) or, more recently, of Halbach and Horsten’s paper “Computational Structuralism” (2005). One might think that in papers like these a wider notion of “interpretation” is used, a notion that I missed until now. But the sense of this wider notion should then be clarified! From the perspective that I have described now, it seems that this wider notion must involve a transition from the dead interpretations, in the mathematical sense of this term, to the life that must be seen in
On Live and Dead Signs in Mathematics
193
what I call “reference”. My problem here is to understand this transition. Might it be the case, for example, that interpretation is some sort of formal core of reference, or “a mere shell of the reference relation”, as Shapiro says [32, p. 139]? And that reference might accomplish what this core or shell alone cannot ? For example, that it might accomplish to select the intended interpretations out of all the interpretations that the LöwenheimSkolem theorems allow? But I do not really understand this philosophical picture. Despite the amount of literature that has been produced in its sphere it remains desperately in need of clarification. I’ll come back to it shortly, but let me first indicate three further aspects of “interpretation” in my sense that also make it appear deeply different from “reference”. First, there is the issue of existence: the existence of an interpretation is normally settled in a purely mathematical way, while the question of whether terms refer seems to have quite another, genuinely philosophical status. Does, for example, the numeral “3” refer ? – I have a tendency to regard this as a question that is astray, and I actually agree with Tyler Burge who has declared any nominalist anxieties that might express themselves in questions of this sort “not to be worth tilting with” [4, p. 214, note 22]. But maybe such a reaction is too rash, at least for a philosopher; and maybe one should take seriously what Charles Parsons called the problem of nonvacuity, namely the problem whether the structures referred to by the structuralists – and therefore also the ‘places’ in such structures – do really exist.17 – Be that as it may, it seems to me that the question about the existence of an interpretation is a very different one from the question of whether our terms ‘really refer’, highlighting the difference between these two notions themselves. 
Second, a standard way of proving the existence of models consists in constructing the models out of the ‘signs’ of the object language themselves! In precisely such a way the usual proofs of the completeness and the compactness theorem for first-order theories are carried out. When Hodges, in the passage cited before, writes: “The items called ‘symbols’ in this book need not be written down. They need not even be dreamed”, he has cases like this in mind. The third aspect concerning the notion of interpretation I want to mention is that interpretation functions as an essential part of mathematical techniques or methods that aim at solving specific mathematical problems. This is already true of Hilbert’s use of interpretations in order to get metamathematical results, but it has been much expanded in the development of mathematics after Hilbert, where we can also find results belonging to core mathematics that are proved model theoretically and that are now the model theorists’ pride and joy. Nothing like this can be said of the notion of “reference”. What a huge difference! 17 See [27, pp. 48-50].
My third theme is “application”. With this term I only mean the application of mathematics to non-mathematical objects and states of affairs. I do not include intra-mathematical application – as when we apply results obtained in algebra to geometry, or vice versa – because I think that this is very different from extra-mathematical application. I will have in mind only the latter. Of course, “application”, even if restricted to the extra-mathematical case, is a huge subject, which cannot really be discussed here. What interests me is merely the difference between “application” and “reference”, and between “application” and “interpretation”. These, again, seem to be categorial differences: The signs of arithmetic refer to numbers and other mathematical objects, but arithmetic is (in my sense) applied to non-mathematical objects, and the use we have in mind when talking about reference and when talking about application is deeply different.18 Similarly, interpretation (in my sense) is totally different from application as well, as one can see, for example, from the fact that the question about the existence of an interpretation is a purely mathematical one, while the question as to whether there is an application has to be answered in a totally different way. Such are simply the distinctions that I want to make here. What about the relation of a metatheory of a mathematical theory to this mathematical theory as its object theory? This certainly is not an ‘application’ in my sense because, as normally conceived, it is of a purely mathematical character. Like all objects of pure mathematics, the objects that metamathematics is about are individuated by the theory alone, that is, by metamathematics itself; they are not given in advance. This has been conspicuously worked out in [10], and it is in accord with Wittgenstein’s view of metamathematics as explained in [25].
Of course, a metamathematical result can also be applied to the object theory. A proof, say, of the consistency of Peano Arithmetic can be applied to Peano Arithmetic in such a way that one now foretells: when you actually use Peano Arithmetic, by formulating definitions and proving theorems in this system, you will never get into a contradiction. This is an application, in the sense of an empirical prediction, but it is not the way we normally consider the metatheory as being about the object theory. The normal way is a mathematical one and does not consist in empirical predictions. 18 This sort of difference pervades mathematics, also in such cases where the aspect of an application to the empirical world seems to be especially manifest, as for example in knot theory. But also in such cases reflective mathematicians emphasize the purity of their approach. Crowell and Fox, for example, in their classic book Introduction to Knot Theory, emphasize: “Mathematics never proves anything except mathematics, and a piece of rope is a physical object and not a mathematical one. So before worrying about proofs, we must have a mathematical definition of what a knot is and another mathematical definition of when two knots are to be considered the same” [5, p. 3].
2 How can Intended Models be Singled Out?
When one starts with a formal endeavour like model theory, how might the possible use of the object language or object theory then be brought into play? The decisive idea here might perhaps be the idea of modelling: formal theories as being devised in order to model phenomena in the nonformal world – where the term “model” is now used quite differently than in model theory. Now it is the object language or theory itself which is considered as a model and not the interpretation of such a theory. In this sense, I understand so-called formal semantics: as modelling meaning phenomena that concern our live languages which essentially involve use. This modelling, of course, is nothing else than application: the formal object language or theory is applied to non-formal situations by modelling them. Can model theory itself also be understood in this way, perhaps as some sort of very coarse, or very elementary, version of formal semantics? The idea might be that with the term “interpretation” we model what one calls “reference” when taking into account our use of symbols. Or, using Wittgenstein’s metaphor: that dead things model live ones. Is this a good idea? And is it perhaps the idea lying in the background of many philosophical reflections involving model theoretic notions and results? – Take as one example the mathematically proved existence of non-standard interpretations and the problem of how to sort them out as ‘not intended’, as ‘rogue interpretations’, so to speak, and to fix the intended ones. Many people feel that this is a real problem, but to my mind it is not so easy to state precisely what it actually amounts to. After all, we do distinguish between standard and nonstandard models of a theory; and why shouldn’t we simply consider this to be the basis of excluding the nonstandard ones? In the respective model theory, that is, a metatheory, we can easily say that we want to concentrate on the standard ones.
Why shouldn’t this be enough? If I understand him correctly, such was Skolem’s answer in the later, mature period of his thinking. It is true, of course, that the metatheory now allows nonstandard models itself, described in a metametatheory; and so on. But why be bothered? Doesn’t this simply and unavoidably belong to our mathematical life? Many people, however, want more. They seem to want some more substantive relation to models, but the possible nature of such a relation remains in the dark. According to a prevalent formulation, we possess an intuition of the standard model and this intuition should sort out the nonstandard models.19 This, for example, was what Bernays said in a discussion with Skolem in 1938:20 the purely formal axiomatic method is insufficient and what we need are notions of “number” and “set” in their intuitive sense. Skolem, however, resisted this move and responded: “the best thing is to stick, in each domain of research, to a suitable formalism. This way of working does not imply a restriction of the possibilities of reasoning, since one is always free to pass from a formalism to a more extended one”. This is an important answer, I think, which I cannot discuss here, however. Instead, I want to concentrate on Bernays’ point of view and the question of whether it might be made a bit more precise. To talk about intuitions in Bernays’ way is not very helpful. I see the term “intuition” here merely as a placeholder that should now be filled with something substantial. This term appears also in Putnam’s “Models and Reality”, at least at the beginning, when Putnam expounds his problem and when he writes that intuitive notions are “not ‘captured’ by the formal system” [28, p. 3] – where he has the notion of a set in mind, but he could also have mentioned the notion of a natural number. Not only the term “intuitive” is unsatisfying here, but also the term “captured”, which shows a symptomatic indecisiveness: precisely what does Putnam mean by it here? Is he talking about “reference” or “interpretation”, or does he equate the two? It is true that in the course of his paper Putnam drops vague terms like “intuitive” and “captured”, but a considerable portion of unclarity persists because the categorial difference between the notions “reference” and “interpretation” remains underexposed until the end. A similar indecisiveness is conspicuous in Charles Parsons’ so-called Uniqueness Thesis, which he formulates as follows: “If two structures answer equally well to our conception of the sequence of natural numbers, they are isomorphic” [27, p. 272]. What does it mean to say that structures answer to conceptions? Why not use better understood terms here, like “reference” or “interpretation”? If “intuition” is merely a placeholder: how might it be substantiated? – One idea here is to refer to our knowledge of how to use the relevant terms. A thought like this, albeit in another context, has been nicely enunciated by Piero Sraffa in a note from 4 March 1934 connected to discussions with Wittgenstein:
The error is to regard intuition as a provisional substitute for science: »when you will produce a satisfactory science, I shall give up intuitions«. – Now the two things cannot be set against one another they are on entirely different planes. Intuitions are a way of acting, science one of knowing (Physician).21
19 Nobody should say at this point that second-order (or still higher order) logic might help, because in this case the second-order quantifiers are in need of being appropriately understood. Proponents of second-order logic like Shapiro explicitly admit that this presupposes an intended standard interpretation (see, e.g., [31, p. 218], and [33, pp. 45f. and 58]). With respect to our distinction between live and dead signs, first- and second-order logic are in the same – dead – boat. And if an adherent of one of them needs ‘intuition’, then the same is true of an adherent of the other.
20 See [2, pp. 182f.]
21 In [19, p. 229]. The hint at a ‘physician’ is tentatively explained in [20, p. 65] by mentioning Wittgenstein’s attitude to consider the practicing physician as nobler than the physician as scientist.
Let us call this Sraffa’s idea of explicating “intuition” via “acting”, that is: via “use”. Under this perspective, talk about ‘intuitive notions’ can get a really substantial and pleasantly down-to-earth interpretation: it can be understood as an allusion to our familiar use of the pertinent terms and especially to those aspects of this use that are relevant to what we call the reference of the terms.22 Might it be possible, then, to say that reference has the power to fix the intended interpretations? The difficulty remains that this power must be understood in the light of the fact that reference belongs to the realm of used signs and interpretation to the realm of dead signs (where I mean the signs of the object language, of course). How might reference act, so to speak, upon the realm of the formal objects of model theory in order to accomplish the wished sorting out of the non-intended ones? How can this gap between reference and interpretation be bridged? Or to speak less metaphorically: how can we on the basis of our practice of referring manage to sort out the unwished interpretations and fix the intended ones? Can we understand this ability without postulating unanalyzed and in the end mysterious powers? The difficulties that we come across here are uncovered in a revealing way by Charles Parsons in his reflections on the possibility of communicating what one means by “natural number”.23 Parsons adopts a Davidsonian perspective of radical interpretation and imagines two speakers, Kurt and Michael, who consider each other’s use of the expression “natural number” in order to understand what the other means. Parsons’ result is not the desired one. Let me quote how he himself describes the outcome. He writes: [Kurt and Michael] convince themselves that their number sequences are isomorphic, [but] it does not follow that they are standard. 
The discussion of the question from the point of view of radical interpretation seems to show that an interpreter coming from outside can interpret them so that the arithmetical truths of each [. . . ] are all true from the interpreter’s point of view and yet their number series are nonstandard, albeit essentially the same nonstandard model. This interpreter, however, will use his own notion of natural number, or some equivalent, to describe his model. It seems that his interpretation can survive only as long as he does not enter into the sort of dialogue with Kurt or Michael that they have had with each other. [27, p. 287]
22 Note that this is not Wittgenstein’s own idea. Normally, Wittgenstein himself is very critical of philosophical uses of the term “intuition”. But in the present context, Sraffa’s understanding of this term via use appears very helpful to me.
23 See [27, § 49].
One can ponder here what exactly a Davidsonian interpretation is, but I find it helpful to express Parsons’ result with the help of the categories that I here deploy. It appears, then, as a result concerning the reference of
Kurt’s and Michael’s term “natural number”, while the interpretation of this term in model theory is unaffected by this. The model theoretic sense in fact does not, as Parsons says about his ‘interpreter coming from the outside’, involve an entering into Kurt and Michael’s dialogue. It desists from Kurt’s and Michael’s use of their words altogether. Another idea to bridge the gap between reference and interpretation might be to look at our practice of theoretically modelling empirical situations, that is: our practice of application as mentioned in Section 1. Perhaps this practice can help us here? – To understand this idea, let us first consider an illuminating example of application coming from physics, where the art of modelling certainly is the most developed one. When in classical electrodynamics one treats spherical waves – waves which correspond to spherical surfaces that are spreading from some center – the respective wave equation, derived from Maxwell’s equations, has actually two solutions: the standard one, representing a wave travelling outward from the origin (the so-called retarded wave solution), and a non-standard one, representing a wave travelling inward toward the origin (the so-called advanced wave solution). The latter solution, however, is normally sorted out. Richard Feynman describes it thus: Although Maxwell’s equations would allow either possibility, we will put in an additional fact – based on experience – that only the outgoing wave solution makes ‘physical sense’. [9, p. 20-14]
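The two solutions that Feynman mentions can be written down explicitly. The following is only a sketch under the assumption of spherical symmetry; the symbols ψ (field amplitude), r (distance from the origin), c (wave speed) and the profile functions f and g are illustrative choices and are not taken from the text.

```latex
% Spherically symmetric wave equation (assumed form, written for r\psi):
\[
  \frac{\partial^{2}(r\psi)}{\partial r^{2}}
  \;=\;
  \frac{1}{c^{2}}\,\frac{\partial^{2}(r\psi)}{\partial t^{2}}
\]
% General solution: an outgoing (retarded) and an incoming (advanced) part.
\[
  \psi(r,t)
  \;=\;
  \underbrace{\frac{f\!\left(t - r/c\right)}{r}}_{\text{retarded, travelling outward}}
  \;+\;
  \underbrace{\frac{g\!\left(t + r/c\right)}{r}}_{\text{advanced, travelling inward}}
\]
```

Both terms satisfy the equation for arbitrary twice differentiable f and g; setting g = 0 is the additional choice, based on experience, that the quoted passage describes.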
So, in this way one of the mathematical possibilities is sorted out by us, based on our experience. Such a procedure is dependent on a sufficiently structured correspondence between certain theoretical descriptions and certain aspects of physical reality. This correspondence, then, lets us make inferences from what we know of this reality to the theoretical descriptions that are appropriate. When talking about a ‘correspondence’, I do not mean something simple or elementary. On the contrary, this correspondence is constituted by the highly complicated relation between theory and experience.24 But without it, the selective role of experience would be impossible.
24 So, when Feynman speaks of “an additional fact, based on experience”, he does not mean something simply given. It is claimed within a theoretical context that might be, or might become, immensely complex. This can be seen, for example, by the fact that there exists a sophisticated discussion in physics, provoked by Paul Dirac, about the possible theoretical usefulness of the non-standard waves. It is not at all a matter of course that they should be sorted out! (See [9, Section 28-5].) In other words: this sorting out, if one accepts it in the end, may rely on profound theoretical reflections, and it is not something which ‘nature itself’, so to speak, forces on us. It is we who decide, after possibly much discussion, whether to sort out or not to sort out.
Now, might not a procedure of this kind, essentially involving the application of our theory, also be seen as operative in our sorting out of non-standard models from mathematical theories? – It seems to me that
this idea does not work either, because there does not appear to be an appropriate correspondence between relevant aspects of our live practice and the dead object language of model theory. This is particularly conspicuous when we look at Halbach and Horsten’s already mentioned paper “Computational Structuralism”, in which a person’s practical ability to count, to add and to multiply is considered.25 Halbach and Horsten want to represent this ability in the abstract domain of models of Peano arithmetic by means of codings of models that make their successor function and their operations of addition and multiplication into recursive functions (in the technical sense of this term). A theorem by Stanley Tennenbaum [37] then says that such models must be standard models. Essential to their approach is the use of Church’s Thesis, which is considered to connect the domain of used signs with the domain of mathematical entities – that is, the domain of live with that of dead things. Church’s Thesis can actually have very different faces,26 but in the context of Halbach and Horsten it seems to be seen as the necessary correspondence between relevant aspects of those two domains. This, however, is a delusion. Church’s Thesis merely connects our nonrigorous notions of calculation with a rigorous one, but the latter can be a used one as well. The decisive step from used to dead signs consists in representing our calculations – non-rigorously and rigorously understood ones – by means of formulae belonging to the formal language that is the object of our model theoretic investigations, i.e., of formulae belonging to the realm of dead signs. It is this transformation of calculations into formulae, and not Church’s Thesis, that constitutes the correspondence connecting the domain of live signs with that of dead signs and their interpretations. 
And when we now look at the proof of Tennenbaum’s Theorem,27 we see that it is not at all this correspondence that is responsible for the ruling out of non-standard models. The recursiveness of addition of which the proof makes use is brought into play by model theory alone and it does not come from our practice of calculation. Our situation here is totally different from the aforementioned case of physics where the connection of the theory with experiments was responsible for the picking out of a certain theoretical possibility. In our case, the standard models are not picked out by our practice. Thus, using this theorem cannot reduce the aloofness of model theory from our live signs.28
At the end of their paper, Halbach and Horsten remark on what they call the “limited scope” of their approach, i.e., on the fact that it is restricted to arithmetic alone. From the perspective just described, this limitation is not surprising. A non-limited approach might be expected if we had a working correspondence between the domains considered in their paper, and similar correspondences could then be searched for in other cases. But in fact, such correspondences do not seem to exist, and Tennenbaum’s Theorem may simply be a singular result that is characteristic of the model theory of arithmetic alone. What, then, is the upshot of the foregoing reflections? – It seems to me that the question of how to single out the intended models is simply a muddle as soon as it asks for an answer that goes beyond what model theory itself can afford. There is no conceivable correspondence between our live signs and the dead ones of the object language of model theory that might lead to an answer. Of course, there are interesting relations between model theory and our normal mathematical practice, and they deserve detailed investigations. They are based on the fact that model theory constructs mathematical structures in its own way, structures that can turn out to be fruitful in normal mathematics. But these relations do not single out standard models, and their value consists in other structural insights of model theory, typically in those where non-standard models are brought into play.
25 Similar approaches can be found in [6] (which has influenced Halbach and Horsten) and [29].
26 See the papers in [26]. (The discussion of Church’s Thesis shows another case of messes within the philosophy of mathematics.)
27 As it is presented, for example, in [17, Section 11.3], and, in a more transparent way, in [3, Section 25.2].
28 Here is a short sketch of the proof of Tennenbaum’s Theorem: It only considers countable models and we can assume that all these models have the same domain, the standard set N of natural numbers (in the usual set theoretic framework, say). The theorem then says that if the addition function of such a model is recursive then the model cannot be nonstandard, and the proof is essentially based on two mathematical facts. The first is a simple recursion-theoretic one: that there exist recursively enumerable subsets A and B of N such that any subset C that separates them cannot be a recursive set.
(Such sets A and B are called recursively inseparable, where we say that a set C ‘separates’ A and B iff C contains A but is totally outside B.) The second essential mathematical fact is a characteristically modeltheoretic one: that the natural method of coding finite subsets of N by products of prime numbers standing for the numbers in the subsets can be extended to infinite subsets of N in the case of nonstandard models that have N as their domain. In this case, the nonstandard numbers of such models can be code numbers: code numbers of infinite subsets of N. And now this sort of coding can be expressed by a formula essentially involving addition; furthermore, the recursively inseparable sets A and B can be defined by formulae with the help of which we can define a set C that separates A and B and that can be coded by the method just mentioned. Then one can easily see that this set C would be a recursive set if the addition function were recursive. This, however, would contradict the recursive inseparability of A and B, and therefore in a nonstandard model the addition function cannot be recursive. – These are the essential steps of the proof. They immediately show that what we are doing here is not that we retract those aspects of the model theoretic thinking that go against our arithmetical practice. On the contrary, we add something that goes against this practice. It is the possibility, opened up by nonstandard models, to use the natural code method mentioned not only to code finite subsets of N but also infinite ones. But this is not possible in our practice, and this makes sufficiently clear that it is not our practice that is sorting out the nonstandard models. On the contrary, it is a genuinely model theoretic mechanism that lies beyond our practice.
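The “natural method of coding finite subsets of N by products of prime numbers” mentioned in the proof sketch can be illustrated for the finite case only. The following is a minimal sketch, assuming the convention that the i-th prime (counting 2 as the 0th) stands for the number i; this convention and the function names `first_primes`, `encode` and `decode` are illustrative assumptions, not taken from the text or from the cited proofs.

```python
def first_primes(k):
    """Return the first k primes by trial division (fine for small k)."""
    primes = []
    candidate = 2
    while len(primes) < k:
        # candidate is prime if no earlier prime divides it
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def encode(subset):
    """Code a finite set of naturals as the product of the primes standing for its members."""
    if not subset:
        return 1  # the empty set is coded by the empty product
    primes = first_primes(max(subset) + 1)
    code = 1
    for i in subset:
        code *= primes[i]
    return code


def decode(code):
    """Recover the finite set from its code by testing divisibility by successive primes."""
    members = set()
    primes = []
    index, candidate, remaining = 0, 2, code
    while remaining > 1:
        if all(candidate % p for p in primes):
            primes.append(candidate)
            if remaining % candidate == 0:
                members.add(index)
                while remaining % candidate == 0:
                    remaining //= candidate
            index += 1
        candidate += 1
    return members
```

For example, under the assumed convention `encode({0, 2, 3})` is 2 · 5 · 7 = 70, and `decode(70)` gives back `{0, 2, 3}`. Every such code is a standard number with finitely many prime factors, which is why this sketch cannot reach the infinite subsets that, according to the proof sketch, nonstandard numbers can code.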
3 Strings of Strokes in Hilbertian Finitism
I want to end this paper by applying the notions of reference, interpretation and application to a much more elementary issue than the one discussed in the preceding section. It concerns the strings of strokes occurring in Hilbert’s finitism and the question of how their role should be understood. As objects as such, irrespective of their role, I consider them as what Charles Parsons calls quasi-concrete objects. Such objects are abstract types that have a specific, internal relation to their concrete tokens. They are, as Parsons explains, “determined by their concrete embodiments. [. . . ] What makes an object quasi-concrete is that it is of a kind which goes with an intrinsic, concrete ‘representation’, such that different objects of the kind in question are distinguishable by having different representations.” [27, pp. 33-34] In what follows, when referring to a ‘string of strokes’ I always mean it in Parsons’ sense of a type of strings of strokes, i.e. as a quasi-concrete object. What is the point of considering strings of strokes? Hilbert’s original idea to use them as the undisputable basis of metamathematical investigations has become obsolete, but they can have other functions. In his book Mathematical Thought and Its Objects, Charles Parsons uses them to solve a problem that he sees in structuralist positions in the philosophy of mathematics, a viable one of which he himself tries to develop. Parsons calls his problem, as mentioned above, the problem of nonvacuity. It simply asks what we can say about the existence of instances of a mathematical structure and therewith also of the objects residing in a structure. On what grounds can we claim this existence?
With respect to natural numbers, which might be considered to be the most elementary case, Parsons’ answer makes use of the strings of strokes: according to him, they are a clear-cut case of an existing instance of a structure representing the natural numbers, an instance that cannot reasonably be disputed. They instantiate, as he puts it, an intuitive model for the natural numbers, consisting of the natural numbers of Hilbertian finitism, which we might call “intuitive numbers”. Parsons accepts these objects of intuition as mathematical objects and thereby leaves a thoroughgoing structuralist standpoint in the philosophy of mathematics, since the identity of the intuitive numbers does not only depend on their places in a structure but also on their intuitability. He considers this, however, as “an unavoidable impurity in structuralism” [27, p. 219], because otherwise he doesn’t see any chance of giving an acceptable solution to the problem of nonvacuity. Now, in light of the categorial differences that can be seen between “reference”, “interpretation” and “application”, this point of view raises suspicion. Don’t the strings of strokes, as quasi-concrete objects, belong to the domain of use, so that they should not be seen as models in the strict sense of this term? Parsons himself in no way hides the difficulties that beset his own view of the role of the strings of strokes. These are
above all vagueness problems concerning the identity of the strings, that is, concerning our speaking of the same and not the same type presented by given tokens of strings. The following is a simple example of problems of this sort. Suppose that the following type is uncontroversial to us (which it certainly is): (a) ||||. A characteristic vagueness, however, turns up if we ask whether, e.g., this (b) ||| |
is the same type. Parsons responds by first considering not the relation between the types of (a) and (b) but the relation between the type of (a) and the token of (b). He admits that this relation can be vague but insists that the relations between the types need not be regarded as vague. The vagueness of the relation between type and token may generate an ambiguity as to what type the term »the type instantiated by (b)« designates, but this does not imply that the identity statement »the type instantiated by (a) is identical with the type instantiated by (b)« must be vague (see [27, pp. 168-169]). This answer, however, is merely the suggestion of a solution to the vagueness problem. It does not tell us anything about the way in which the token of (b) is becoming the token of a certain type (of the same type as ||||, say), or of two types (||| and |, say), or of none. To my mind, it points to a real difficulty of Parsons’ position, a difficulty that stems from a one-sided fixation on the objects themselves – the types |, ||, |||, . . . – at the expense of the rules according to which we produce and use these objects. In a Kantian spirit, Parsons concentrates on our intuition of these objects – “intuition” now in a Kantian sense, not in the sense discussed above – instead of considering the rules and use that involve our dealing with them. However, it seems to me hopeless to try to say something substantial about the identity of these objects without taking our rule-governed use of them into consideration.29 There are different sorts of rules that can be seen as relevant in the present context. Let us consider only the following, very simple one which governs the production of the strings of strokes: (H)30 Start with the object |, then add this same object, then again this object, and so on; at each step first reproduce the whole object produced at the step before and only then add |. 
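Rule (H) can be read as a simple iterative procedure. The following is a minimal sketch of that reading, assuming that a string of strokes is represented as a Python string of `|` characters; the function name `strings_of_strokes` and the parameter `n_steps` are hypothetical and serve only as an illustration.

```python
def strings_of_strokes(n_steps):
    """Yield the strings produced by the first n_steps steps of rule (H)."""
    current = "|"  # start with the object |
    for _ in range(n_steps):
        yield current
        # reproduce the whole object of the previous step, and only then add |
        current = current + "|"


# The first four steps give |, ||, |||, ||||.
for string in strings_of_strokes(4):
    print(string)
```

The sketch produces exact character strings and therefore leaves out precisely what matters for tokens on paper: whether a token with a gap counts as one string or as two is not settled by the procedure itself.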
29 See also [24], where the role of Parsons’ use of Kantian intuition is investigated in more detail.
30 “H” like “Hilbert”.
When we see our use of the strings of strokes as directed by this rule, we immediately get the following obvious treatment of Parsons’ vagueness problem: The token
(b) ||| |
should be regarded as of the same type as
(a) ||||,
if (b) is treated as occurring immediately after ||| and immediately before |||||. This is the way in which the identity of the signs |, ||, |||, . . . is determined now. It is not a precise mathematical determination, however, but an essentially context-dependent one, and it obviously breaks down when the strings become too long.31 Let me give one simple example of the context-dependence relevant here. Consider the case of the strings of strokes (c) and
|||||
(d) ||| ||. Does (c) present the same string of strokes (= intuitive number) as (d)? Or does (d) present two strings of strokes, namely: ||| and ||? Compare, for example, the following two intuitively given situations: (K1) | || ||| ||| | ||| || |||| || ||| || || . . . and (K2)
. . . ||||| |||| ||| || |.
(K1) can be easily seen as an increasing sequence of consecutive intuitive numbers, and (K2) as a decreasing sequence where each number in the sequence is the foregoing number minus 1, with the consequence that ||| || counts as one number in (K1) and as two numbers, the numbers ||| and ||, in (K2). That is, the identity of the intuitive numbers depends on the context in which they occur, and this is not so in the case of genuinely mathematical objects. How is this different from a structuralist position according to which the identity of mathematical objects only depends on the structure in which they reside? – In the case of the intuitive numbers, it is not merely the structure that is relevant but also the intuitive character of the objects. Precisely this is the point of Parsons’ use of them, because only then are they candidates that he accepts for his solution to the nonvacuity problem. And Hilbert, too, could not accept a structuralist answer, because to him the intuitive numbers should be given in advance of any structuralist point of view.³² The context-dependence of the intuitive numbers has to do with the way these ‘numbers’ – that is: the strings of strokes – are used and not with the issue of ‘where they reside’ (whatever that may mean). This context-dependence remains a stumbling block to considering the strings of strokes as mathematical objects.

31 The relevance of this sort of breakdown is investigated in Wittgenstein’s reflections on the ‘surveyability’ of proofs in Part III of his Remarks on the Foundations of Mathematics; see my attempt at clarifying this topic in [21] and on numerous pages of [23].
32 This argumentative move is essential to Hilbert’s idea of metamathematics – and at the same time marks one of its essential weaknesses; see [10] and [25].

What about very long strings, that is, very large intuitive numbers? If their difference is small, they can no longer really be distinguished, and the process described in (H) can no longer afford the necessary distinction either, because at every step it demands a reproduction of the foregoing string, which ceases to be reliable. This is very different in the case of the genuine numbers, which are only given via the signs we use in order to refer to them and which allow the invention of ever new symbolic systems in order to catch bigger and bigger ones. Our distinguishing the numbers proceeds via these symbolic devices and has nothing to do with objects considered as given in intuition.

All of this is also relevant to the way we refer to the intuitive numbers in contrast to the genuine ones. Reference to strings of strokes is governed by rules like (H), which is a rule explicitly concerned with our actions and which leads to the characteristic context-dependence and restrictedness to short strings just described. In contrast to this, the way in which we refer to numbers is mainly governed by the acceptance of mathematical propositions – axioms, equations, theorems – and in this way we transcend, so to speak, the context-dependence and restrictedness characteristic of the case of the strings of strokes.

On the other hand, there must exist relatively intimate relations of arithmetic to the strings of strokes. What do they look like? – Of course, we can apply arithmetic to such quasi-concrete objects. When applying arithmetic to them, the identity of these objects, as extra-mathematical objects, is given in advance; whereas the numbers, which arithmetic is about, are only given by mathematics itself.
So, one obvious way to look at the strings of strokes is to treat them simply as objects of the application of arithmetic and not as arithmetical objects themselves. Note that this application concerns strings of strokes as types and not only strings of strokes as tokens. I agree with Parsons in thinking that it is types that we normally have in mind when considering strings of strokes. It is therefore types as well that are at issue when we apply arithmetic to strings of strokes. At the same time we should not forget the fact that the strings of strokes, of course, can also be used as signs of arithmetic. If we think of our practice as properly enriched by treating these strings like singular terms, by introducing predicates and function symbols³³ and by providing appropriate means for quantification, such that this practice becomes sufficiently similar to our familiar mathematical practice, then we should be allowed to regard the strings of strokes as signs with meaning and with reference to objects. These objects, however, are now the genuine natural numbers themselves. We then have the familiar two levels of, first, the signs, used in a certain, meaning-bestowing way and subject to their own non-precise and context-dependent criteria of identity and, second, the precise mathematical objects referred to by these signs.

33 Like “=”, “<”, “+” etc.

When using the strings of strokes as numerals, we veer away from Hilbert’s view of finitary arithmetic. According to Hilbert, the strings of strokes are not signs with meaning and reference; Hilbert does not allow sentences with these signs; and therefore his strings of strokes are not symbols of a language. Hilbert expresses elementary sentences like “2 < 3” or “2 + 3 = 5” on the level of what he calls signs for communication,³⁴ with our familiar numerals “1”, “2”, “3”, etc. as signs referring to the respective strings of strokes. Nothing could be easier, however, than to use the strings of strokes, too, as numerals referring to our familiar natural numbers. And our familiar numerals in decimal notation, which refer to these numbers as well, then appear simply as notational variants, possibly as abbreviations of strings of strokes. From this perspective we may perhaps get a better understanding of our tendency – a tendency that certainly everyone feels – to consider the strings of strokes as mathematical objects. I think we automatically tend to read them as numerals referring to numbers, and then we retroactively (so to speak), but unconsciously, use these numbers as norms for the identity of the numerals themselves. Of course, one can do precisely this, but then the strings’ character as quasi-concrete objects is disavowed, so to speak. We can treat them as genuinely mathematical objects from the outset by considering them, say, as simple graphs consisting merely of nodes, without edges, and by understanding the relations between them purely graph-theoretically.
By doing this, however, the strings of strokes can no longer be used to solve the problem of nonvacuity. One can ask whether this is a problem that needs to be solved at all, but I will not go into this question here. I actually agree not only with William Tait’s simple-hearted remark, as already quoted in note 1, that our arithmetical propositions are propositions about numbers, but also with his ironic statement that “[t]he question of whether numbers exist in this sense makes no sense, unless it is the trivial question of whether there is a number that is a number – to which one answers: 0 is a number” ([35, p. 542]; reproduced in [36, p. 37]).³⁵

34 “Zeichen der Mitteilung”; see [13, p. 21].
35 I am grateful to Juliet Floyd, Simon Friederich, Peter Hacker, Volker Halbach, Leon Horsten, Wilfried Keller, Tim Kraft, Godehard Link, Dolf Rami and Dirk Schlimm for valuable comments on preliminary versions of this paper, and to Norbert Schappacher for the discussions I could have with him about its subject matter. Its motto is from [38, p. 167].
References

[1] Calixto Badesa. The Birth of Model Theory. Princeton University Press, 2004.
[2] Luca Bellotti. Skolem, the Skolem ‘Paradox’ and Informal Mathematics. Theoria, 72:177–220, 2006.
[3] George S. Boolos, John P. Burgess, and Richard C. Jeffrey. Computability and Logic. Cambridge University Press, fifth edition, 2007.
[4] Tyler Burge. Logic and Analyticity. Grazer Philosophische Studien, 66:199–249, 2003.
[5] Richard H. Crowell and Ralph H. Fox. Introduction to Knot Theory. Ginn and Company, Boston, 1977.
[6] Walter Dean. Models and recursivity. http://www.andrew.cmu.edu/org/conference/2002/wd.pdf, 2002.
[7] Richard Dedekind. Gesammelte Werke, Bd. III. Friedrich Vieweg und Sohn, Braunschweig, 1932, edited by Robert Fricke.
[8] William Ewald (ed.). From Kant to Hilbert: A Source Book in the Foundations of Mathematics, Volume II. Oxford University Press, 1996.
[9] Richard P. Feynman, Robert B. Leighton, and Matthew Sands. The Feynman Lectures on Physics, Vol. II (Mainly Electromagnetism and Matter). Addison-Wesley, 1964.
[10] Simon Friederich. Structuralism and Meta-Mathematics. Erkenntnis, 73:67–81, 2010.
[11] Volker Halbach and Leon Horsten. Computational Structuralism. Philosophia Mathematica, 13(3):174–186, 2005.
[12] Michael Hallett. Frege and Hilbert. In Michael Potter and Tom Ricketts (eds), The Cambridge Companion to Frege, pp. 413–464. Cambridge University Press, 2010.
[13] David Hilbert and Paul Bernays. Grundlagen der Mathematik I. Springer, Berlin, 1968.
[14] Wilfrid Hodges. A Shorter Model Theory. Cambridge University Press, 1997.
[15] Wilfrid Hodges. Model Theory (Draft 20 Jul 00). http://wilfridhodges.co.uk/history07.pdf, 2000.
[16] Leon Horsten. Having an interpretation. Philosophical Studies, 150:449–459, 2010.
[17] Richard Kaye. Models of Peano Arithmetic. (Oxford Logic Guides: 15). Oxford University Press, 1991.
[18] Edmund Landau. Grundlagen der Analysis. Akademische Verlagsgesellschaft, Leipzig, 1930. (English translation by F. Steinhardt: Foundations of Analysis, AMS Chelsea Publishing, 2001).
[19] Brian McGuinness (ed.). Wittgenstein in Cambridge: Letters and Documents 1911-1951. Blackwell, 2008.
[20] Brian F. McGuinness. Wittgenstein and Schlick. Parerga Verlag, 2010.
[21] Felix Mühlhölzer. ›A mathematical proof must be surveyable‹ – What Wittgenstein meant by this and what it implies. Grazer Philosophische Studien, 71:57–86, 2006.
[22] Felix Mühlhölzer. Wittgenstein und der Formalismus. In Matthias Kroß (ed.), »Ein Netz von Normen«: Wittgenstein und die Mathematik, pp. 107–148. Parerga Verlag, 2008.
[23] Felix Mühlhölzer. Braucht die Mathematik eine Grundlegung? Ein Kommentar des Teils III von Wittgensteins »Bemerkungen über die Grundlagen der Mathematik«. Vittorio Klostermann, 2010.
[24] Felix Mühlhölzer. Mathematical Intuition and Natural Numbers. Erkenntnis, 73:265–292, 2010.
[25] Felix Mühlhölzer. Wittgenstein and Metamathematics. In Pirmin Stekeler-Weithofer (ed.), Wittgenstein: Philosophie und Wissenschaften, pp. 103–128. Verlag Felix Meiner, 2012.
[26] Adam Olszewski, Jan Woleński, and Robert Janusz (eds). Church’s Thesis After 70 Years. Ontos Verlag, 2006.
[27] Charles Parsons. Mathematical Thought and Its Objects. Cambridge University Press, 2008.
[28] Hilary Putnam. Models and reality. In Realism and Reason, Philosophical Papers, Vol. 3, pp. 1–22. Cambridge University Press, (1980) 1983.
[29] Paula Quinon and Konrad Zdanowski. The Intended Model of Arithmetic. An Argument from Tennenbaum’s Theorem. http://www.impan.pl/~kz/files/PQKZ_Tenn.pdf, 2006.
[30] Stewart Shapiro. Second-Order Logic, Foundations, and Rules. The Journal of Philosophy, 87:234–261, 1990.
[31] Stewart Shapiro. Foundations without Foundationalism: A Case for Second-order Logic. Oxford University Press, 1991.
[32] Stewart Shapiro. Philosophy of Mathematics: Structure and Ontology. Oxford University Press, 1997.
[33] Stewart Shapiro. Do Not Claim Too Much: Second-order Logic and First-order Logic. Philosophia Mathematica, 7(3):42–64, 1999.
[34] Wilfried Sieg and Dirk Schlimm. Dedekind’s Analysis of Number: Systems and Axioms. Synthese, 147:121–170, 2005.
[35] William Tait. Finitism. The Journal of Philosophy, 78:524–546, 1981. (Reprinted in [36], pp. 21-42, with an Appendix on pp. 54-60).
[36] William Tait. The Provenance of Pure Reason. Oxford University Press, 2005.
[37] Stanley Tennenbaum. Non-archimedean models for arithmetic. Notices of the American Mathematical Society, 6:270, 1959.
[38] William P. Thurston. On Proof and Progress in Mathematics. Bulletin of the American Mathematical Society, 30:161–177, 1994.
[39] Jean van Heijenoort. From Frege to Gödel: A Source Book in Mathematical Logic, 1889-1931. Harvard University Press, 1967.
[40] Ludwig Wittgenstein. Philosophical Investigations. Second edition, 1958, edited by G.E.M. Anscombe (trans.), G.E.M. Anscombe, and R. Rhees.
[41] Ludwig Wittgenstein. The Blue and Brown Books. Blackwell, 1969.
[42] Ludwig Wittgenstein. Wittgenstein’s 1939 Lectures on the Foundations of Mathematics. The Harvester Press, 1976, edited by Cora Diamond.
[43] Ludwig Wittgenstein. The Big Typescript: TS 213. Blackwell, 2005, edited by C. Grant Luckhardt and Maximilian A. E. Aue. Cited according to the original pagination in TS 213.
Generalization and the Impossible: Issues in the search for generalized mathematics around 1900

Paul Ziche
Around 1900, some apparently long-solved issues in mathematics were seen anew as being “paradoxical”; the best example is provided by the complex numbers, which philosophers, including philosophers of mathematics, as well as novelists and other authors saw as posing genuine problems. In the philosophy of mathematics, these problems are related to the issue of generalization: how can one arrive at new mathematical structures on the basis of existing mathematics? This issue became particularly relevant in the context of foundational debates that concerned the relationship between mathematics and logic. This paper presents a number of parallel attempts to discuss the problem of generalization in mathematics: Husserlian phenomenology, Russell’s philosophy of mathematics, Whitehead’s comparative project on ‘universal algebra’, and a number of lesser-known theories (such as Wilhelm Ostwald’s ideas about a “theory of order”) all deal with this issue. The paper aims at giving an overview of these debates and emphasizes their breadth and openness: Russell suggests an “inductive” method for arriving at generalizations concerning mathematical theories, while for Husserl the issue leads into a phenomenological theory of “forms of theories”.
1 “Contradictions are emotions”: The example of the complex numbers

The great Dutch author Multatuli (pseudonym of Eduard Douwes Dekker, 1820-1887) includes in his autobiographical novel Max Havelaar from 1860 a half-ironic, half-serious catalogue that lists the titles of a great number of (unwritten) pamphlets. Dwarfing the more famous Chinese encyclopaedia in the writings of Borges/Foucault [13, p. 7], at least in scope, this list of hypothetical texts is intended to summarize some of the critical issues of his epoch. One of the entries refers to a treatise on three perennial enigmas: “Over het PERPETUUM Mobile, de cirkelkwadratuur en den wortel van wortelloze getallen” [36, p. 70].
The last item stands out. That the first two tasks, that of constructing a perpetuum mobile and that of squaring the circle, had to remain without a solution has been clear for quite some time, but what about the “roots of numbers that do not have roots”? Multatuli evidently thinks of the problem of whether “$x^2 = -1$” has a solution, or, in the more popular formulation, in which sense “$\sqrt{-1}$” can be said to exist. A traditional answer, dating back at least to Newton’s Universal Arithmetick from 1707 [44], has been that “$\sqrt{-1}$” is an “impossible number” because no such number exists within the field of the real numbers. Ernest Nagel shows that this analysis was widely shared, and could even be used in theological argumentations designed “to support the mysteries of orthodox Christianity” [38, p. 170]. Interestingly, the paradoxical character associated with the imaginary numbers is by no means progressively eliminated in the course of the progress of mathematics. In the years around 1900, in particular, one witnesses a lively interest not only in the apparently more far-reaching contradictions and paradoxes such as Russell’s set-theoretic paradox, but also in the complex numbers.¹ The square root of −1 circulated as a kind of universal symbol that could be employed in literary texts as a sign for mind-boggling impossibilities, and for the conflict between critically-minded questions and an establishment that takes the legitimacy of official forms of discourse for granted. In this function it figures prominently in Robert Musil’s Zögling Törleß from 1906. “$\sqrt{-1}$” is here seen as not only “impossible”, but also as something “scary” or “uncanny” that makes the mind “spin around” or “dizzy” in thinking about it [37, p. 73-77]. This addition of an existentialist dimension to the mathematical or logical problem of the status of $\sqrt{-1}$ by no means remains restricted to literature.
Identical phrases are to be found in philosophical texts, in particular those dealing with the philosophical foundations of mathematics.² In all cases – and that holds for Musil, too – it is evident that it is not mathematical ignorance that can account for these, from our perspective, rather startling characterizations of complex or imaginary numbers as contradictory or paradoxical. A few examples suffice to illustrate this point. Edmund Husserl calls “imaginary” numbers, and more precisely the “Verknüpfungsformen”, the operations to be performed on complex numbers, “arithmetically nonsensical [sinnlos]” or downright “absurd [widersinnig]”, “erroneous and unclear”.³ He even extends these verdicts to cover not only the complex numbers, but also negative numbers, fractions, irrational numbers “and so forth”, thus leaving only the natural numbers untainted. In a similar way, Hermann Hankel, whose “principle of permanence” provided an important, although controversial, stimulus for generalizations within mathematics and logic,⁴ deals with the complex numbers in a lengthy treatise entirely devoted to “complex number systems”. Even here, in a specialist’s account of the state of the art of the relevant fields within mathematics, the imaginary numbers are called “paradoxical, strictly speaking inadmissible [paradox, streng genommen unzulässig]” ([20, V]; for more examples, see [55, ch.VI]). Ernst Cassirer embeds the idea that imaginary numbers provide an “insoluble mystery” – which for him, however, ceased to be mysterious in the work of Gauss – into a broader discussion of “scientific facts”, emphasizing that the idea of an “entirely isolated body” and the notion of inertia, too, are “by no means evident or natural” and could even be viewed as being “evidently false, and even absurd”. With regard to mathematics, he sees the only possible solution in a formalistic understanding of mathematics that at the same time captures the true character of generality within mathematics: “These obscurities could not be removed until the general character of mathematical concepts had been clearly recognized – until it had been acknowledged that mathematics is not a theory of things but a theory of symbols.” [8, p. 66-7] The existential dimension is made explicit by the Dutch mathematician, philosopher and writer Gerrit Mannoury who, in a popular book on Mathesis en Mystiek (Mathematics and mysticism) from 1924, states explicitly that a “contradiction is an emotion”. Interestingly, he at the same time refers back to Multatuli [35, p. 39, 37].⁵

1 For an extensive overview of the role of paradoxes in 20th-century logic, see [5]; the idea that the complex numbers could be paradoxical, however, is not treated there. [45] provides a taxonomy of the forms and roles of contradictions in mathematics from a Wittgensteinian perspective; see also [4].
2 As one example – many more could be given – for identically worded reservations in more traditional philosophical contexts, see Herder’s critical remarks against Kant’s notion of “Schema”: “Wer also sich an Formeln gewöhnt, als ob er den Begriff, an Begriffe, als ob er die Sache habe; wer alle unter sich verwirret und glaubt, Schemate stellen Verstandesbegriffe dar, hat sich mächtig getäuschet. Mit Einer nichtssagenden Wortformel ($\sqrt{-1}$.) konnte er ein ganzes Wörterbuch dunkler Formeln einleiten, die alle so wenig als jene erste Wortformel bedeuten.” [24, p. 123-4]
3 [31, p. 432-3]: “Das Problem des Imaginären ist innerhalb der historisch ersten Form reiner Mathematik erwachsen, innerhalb der Arithmetik, zumal in der Form der arithmetischen Algebra. Die in der algebraischen Rechnung angelegte Tendenz zur Formalisierung führte zu Operationsformen, die arithmetisch sinnlos waren, aber die merkwürdige Eigentümlichkeit zeigten, daß sie trotzdem rechnerisch verarbeitet werden durften. [. . . ] (Ich fasse hier natürlich den Titel ‘imaginär’ möglichst weit, wonach auch das Negative, ja selbst die Brüche, die Irrationalzahlen und dgl. als imaginär gelten kann. Historisch hat nur das Imaginäre im Sinn des Negativen und der Lateral-Zahl Anstoß erregt.) Die etwa entspringenden Irrtümer und Unklarheiten [. . . ] Aufgrund dieses Systems, also aufgrund der besonderen Natur der Objekte haben gewisse Verknüpfungsformen keine reale Bedeutung, d.h. es sind widersinnige Verknüpfungsformen. Mit welchem Recht darf das Widersinnige rechnerisch verarbeitet [. . . ] werden, als ob es Einstimmiges wäre.” The relevant manuscript material for these lectures – which were presented at the “Göttinger Mathematische Gesellschaft”; Hilbert commented on Husserl’s lecture – is collected and discussed in [52].
4 For an extended discussion of this principle – with an emphasis on the contribution of G. Peacock – and the role it plays in a formalist account of mathematics, see [12, p. 277-287].
5 The context is given by the so-called “signifische kring”, a group of Dutch intellectuals that was devoted to the study of all dimensions of language, including the emotional aspects of language. One of the most prominent members of this group was L.E.J. Brouwer. On the “significs”, see [50].

This dramatic staging of an arithmetical or algebraic problem requires an explanation. Normally, one would assume that by 1900 the use of complex numbers had been put on a solid basis within mathematics through the work of, for instance, Gauss or Cauchy, and had been further developed and refined in mathematical subdisciplines such as algebra and the theory of (complex) functions.⁶ Complex numbers had also found their way into mathematical physics and were successfully applied in, among other fields, optics and electrodynamics. What, then, could there possibly remain that was enigmatic enough to make complex numbers appear downright paradoxical? The obvious answer must be: It seems inadequate to assume that the complex numbers simply continued to be problematical. Rather, the complex numbers began to raise problems that were, in important aspects, new. These new questions arose in a revised account of what mathematics is, of which questions can and have to be asked in mathematics, and which strategies for justification are employed and accepted in mathematics. Quite obviously, it is the new interest in the foundations of mathematics that raised seemingly settled issues to the status of highly controversial problems. The iconic status acquired by the complex numbers in literary and philosophical debates already indicates that these questions can be embedded into a broader context, and that they are related to fundamental characteristics of the intellectual landscape around 1900.

The following remarks sketch some relevant contextual issues by moving rather freely between logic, mathematics, and philosophy. This can be motivated further by asking in which sense, precisely, the complex numbers can be thought of as being paradoxical. A contradiction does indeed arise if we combine the axioms defining $\mathbb{R}$ with the assumption that “$x^2 = -1$” has a solution within $\mathbb{R}$. It is this “axiom of solvability” (cf. [12, p. 277-287]) that accounts for the problematical status of the complex numbers: the axiom is exceedingly plausible, and the equation “$x^2 = -1$” itself does by no means seem extraordinary. Still, this axiom leads, together with the axioms for $\mathbb{R}$, to an inconsistent set of propositions. Characterizing the complex numbers as “absurd”, however, suggests more than just a formal contradiction. Their being seen as absurd implies that the complex numbers somehow clash with what one may call the natural assumptions as to what numbers are or should be, or as to how mathematics should proceed. This lies at the basis of yet another important issue that figures prominently in 19th-century mathematics, logic and philosophy, namely that of abstraction and generalization (see below, section 3).

6 General literature: [46]; [17]; [16]; [51]; [10].
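The formal contradiction mentioned above can be spelled out in a few lines. The following is only a minimal sketch and is not taken from the sources discussed here; it assumes the standard ordered-field axioms for $\mathbb{R}$ (every nonzero real is either positive or negative, products of positives are positive, and $1 > 0$), and the variable $x$ is introduced purely for illustration.

```latex
% Sketch (standard ordered-field axioms assumed, not quoted from the text):
% the assumption "x^2 = -1 has a solution in R" contradicts the order axioms.
\begin{align*}
&\text{Assume } x \in \mathbb{R} \text{ with } x^2 = -1. \\
&\text{Case } x = 0: && x^2 = 0 \neq -1. \\
&\text{Case } x > 0: && x^2 = x \cdot x > 0. \\
&\text{Case } x < 0: && -x > 0, \text{ so } x^2 = (-x)(-x) > 0. \\
&\text{Hence } x^2 \geq 0 \text{ in every case, while } -1 < 0, \text{ so } x^2 \neq -1.
\end{align*}
```

The argument relies only on the order axioms. This suggests why the conflict disappears once the demand for an ordering compatible with the operations is given up: in $\mathbb{C}$ the equation is solved by $\pm i$, at the price that not every square is positive any more.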
The paradoxes arise through a tension between what might appear as a perfectly smooth extension of well-established mathematics on the one hand and the necessity to simultaneously change or drop some of the basic concepts. In the case of the complex numbers, the “axiom of solvability” does not simply enlarge the number of equations that have a solution, but forces upon us a revision of some basic assumptions as regards the operations with numbers (such as that any square is positive), and of the very idea what numbers are (counting can no longer be viewed as the basic operation defining number). Even more radical are the revisions associated with a transition from complex numbers to quaternions or to matrices; in these transitions, fundamental algebraic properties such as the commutativity of multiplication undergo revision. The very success in calculating with the natural numbers even adds to the difficulty because it implies that some aspects of mathematical practice can be taken over without further ado, while others are in need of thoroughgoing reassessment. This dialectical structure can also be detected in the case of ideal elements. Ideal elements, as conceived by Hilbert, are “added” upon an existing theory, thereby extending this theory, making it simpler, and rounding it off/making it closed under the relevant operations ([25], [26]; see also section 3). But clearly, not just any addition to an existing theory leads to these welcome results. Hilbert himself introduces the notion of ideal elements in a strangely weak, analogical way by referring to standard examples taken from projective geometry and, on a more basic level, from the introduction of complex numbers. This procedure raises obvious questions: What is the basis of this analogy? What are the natural extensions of a theory? Which features of the introduction of the complex numbers can be generalized to the more general strategy of thinking in terms of ideal elements? 
In particular: if the complex numbers can be thought of as being “paradoxical”, it can no longer be clear why one should expect that adding paradoxical elements to a theory should lead to an improved version of the old theory.⁷

7 Issues in paraconsistent logic are not discussed here; historically, there are interesting links with the problems dealt with in this paper. On the history of paraconsistent logic, and the motives behind thinking logically about paraconsistency, see [43], in particular Part 1: “The History of Paraconsistent Logic”; [42]. – [3] discusses the question as to precisely which properties have to be preserved by consequence; this question (which, in the more narrowly defined area of “preservationism”, arose in the context of developments in deontic logic in the 1970s, [3, p. 100]) is obviously also closely related to the problem of generalization in mathematics.
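The revision of basic algebraic properties mentioned above in connection with quaternions and matrices can be made concrete by two standard computations. This is only an illustrative sketch: $i, j, k$ denote Hamilton's quaternion units, and the matrices $A$ and $B$ are a hypothetical choice made here for convenience, not an example drawn from the sources discussed.

```latex
% Sketch: commutativity of multiplication fails for quaternions and matrices.
% Quaternion units satisfy i^2 = j^2 = k^2 = ijk = -1, from which it follows:
\[
  ij = k, \qquad ji = -k, \qquad \text{so } ij \neq ji .
\]
% Two 2x2 real matrices (hypothetical choice, for illustration only):
\[
  A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \quad
  B = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}, \qquad
  AB = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \neq
  \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} = BA .
\]
```

In both cases the familiar laws of addition survive while one multiplicative law is given up, which is the pattern of partial preservation and partial revision at issue in the text.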
2 Russell on symbolism: Making the simple complicated

In his 1921 booklet, Mysticism and Logic and Other Essays, Russell deals with logic under virtually the same title as that chosen by Mannoury in 1924, echoing the dimension of paradoxicality produced by coupling logic and mysticism. The essays collected in this volume range from “entirely popular” to “somewhat more technical” [47, VI]. One of the popular items is devoted to “Mathematics and the metaphysicians”. In this paper, originally published in 1901 (that is, according to Russell’s own statement, before he became acquainted with the work of Frege) under the less far-reaching title “Recent work in the philosophy of mathematics”, Russell presents pure mathematics as one of the greatest discoveries of the 19th century. He even launches the apparently outrageous claim that Boole’s Laws of thought from 1854 was “the first [book] ever written on mathematics” [47, p. 74]. This can only mean that there is a profound difference between books in mathematics and books on mathematics, the latter being books in meta-mathematics. Within this framework, Russell makes yet another surprising statement, this time concerning the role of symbolism in mathematics, and taking up the troubling notion of simplification in mathematics: “The fact is that symbolism is useful because it makes things difficult. (This is not true of the advanced parts of mathematics, but only of the beginnings.)” [47, p. 77] The use of symbolical notations, according to this passage, has a double function. In the more advanced parts of mathematics, symbolism can help to simplify matters, but as regards the foundations of mathematics, it is precisely the other way round.
That symbolism makes things difficult has to mean that it removes a simplicity that is only apparent, and that it establishes the link with the more complicated bits and parts of mathematics, implying that even at the foundations of mathematics, we can and have to employ the very same standards of truth and provability as in the more derivative areas of mathematics. Symbolical language shatters the apparent self-evidence of seemingly trivial mathematical results (Russell uses the stock example “2 + 2 = 4”): “Obviousness is always the enemy to correctness” [47, p. 77]. In doing so, a symbolism allows us to identify the necessary basic propositions that allow for a derivation of these simple statements. Without the difficulties introduced by a symbolism, Russell claims, one would not have arrived at the axioms for algebra or arithmetic. Its function, then, consists in removing the psychological, or, in the language of the period, psychologistic assumptions underlying a justification of the foundations of mathematics in terms of evidence.⁸ Russell himself applies this idea to Peano’s axioms and to the calculus, but at the same time extends this argument far beyond the calculus. His claims are worded in strong language, and range all the way down to teaching elementary mathematics at school, criticizing the fields that were traditionally viewed as providing the natural starting-point for mathematical education. The fact that Euclidean mathematics continues to be taught at high-school level, despite the lack of solid foundations within the Euclidean system, he deems a “scandal” [47, p. 95], taking up Kant’s more famous phrase about the scandalous state of a philosophy that cannot account for the status of reality.

This Russellian argument clearly reveals the tension that exists between smoothly extending existing mathematics beyond its boundaries, and the fundamental re-shaping that those generalizing steps can require. Russell’s remark on Boole also makes clear that this has implications for the question of how one should conceive of mathematics as a whole, and for the relationship between mathematics and meta-mathematics (these issues cannot be addressed here, however). Russell’s own attempts at giving a clear and comprehensive definition of mathematics in the opening paragraphs of the Principles of mathematics give clear evidence of how pressing this problem was for him at this period [48, § 1, p. 3].

8 Cf. also Detlefsen’s [12, p. 237] emphasis on the formalists’ tendency to argue against an “immersion in intuition and meaning”.
3 Ways into logic: Generalization and abstraction

So far, three strategies for extending mathematics have emerged: the introduction of new (types of) objects, such as √−1;⁹ working out the most basic principles; and the use of symbolism throughout the whole of mathematics. Obviously, these strategies are intimately related: accepting new types of objects at the same time implies a decision as to which axioms or basic propositions should be preserved, and symbolic language seems required to pursue these goals. They all lead to a generalization of mathematics beyond its current limits. All of these generalizing strategies find counterparts in, and are embedded into, discourses about generalization in other fields, most notably in philosophy. In all those fields, 19th-century mathematicians such as Hermann Grassmann – being perhaps the most prominent source of inspiration for the new interest in generalization (on Grassmann, see now [41]; see also [55, ch.VI]) – or William Hamilton come to play a central role. It is thus not only debates that focus explicitly on the foundations of mathematics that one should consider when trying to grasp the significance of innovations in mathematics in this period. This section will explore a number of topics that illustrate this multidimensional interrelatedness of various discourses, ranging from mathematics to a broad range of ideas in philosophy and the philosophy of science.

⁹ The exchanges between Meinong and Russell concerning the interaction between forms of logic and the need to accept new forms of objects also seem pertinent.
Hilbert’s justification for “adding” ideal elements to the existing elements of mathematics is yet another, and a particularly important instance illustrating these problems. First, it is not at all clear what “adding [‘hinzufügen’]” really means. Hilbert himself introduces this idea via an analogy with “adding the imaginary elements to the real ones in the theory of complex numbers” [25, p. 160], thus again using the complex numbers as the paradigm example, and ascribes two functions to this introduction of ideal elements, namely “simplification” and a “closing/rounding off [‘Abschluß’]” of mathematical theories [25, p. 161]. In his paper “Über das Unendliche” Hilbert himself stresses that simplicity is related to a “natural and consistent [konsequent]” way of extending the important laws of algebra (“die Gesetze der Algebra”), which requires an argument as to what precisely the most relevant laws are. Hilbert introduces these laws again via examples: “the laws of algebra, e.g. those concerning the existence and the number of the roots of an equation” [26, p. 174]. He does not even shrink from employing a term from traditional metaphysics in order to give a positive twist to the problematical character of the complex numbers: one can detect a “pre-established harmony” between the development of logic and the new ideas in mathematics [26, p. 176]. On the surface, Hilbert’s notion of simplification seems to clash with Russell’s account of the role of symbolism as complicating the matters at hand. But one can easily overcome this tension by asking how the introduction of new elements can possibly contribute to simplification. Hilbert envisages a simplification that results from enriching the stock of elements. The increase in the number of elements has to be compensated by a simplification in the structure of a theory. 
This can be achieved via extending the domain of validity of the basic principles, either because the original elements can be understood as abstractions from the new ones, or the latter ones as generalizations of the original elements. The idea of a structural simplification by an enrichment on other levels can also be detected in the operations of generalization or abstraction. Introducing a more general or more abstract level increases the number of levels, but at the same time produces new and profound insights into the structure of the whole system. The concept of abstraction itself becomes ambiguous in these processes: are the complex numbers abstracted from the real ones by, for instance, adhering to a principle of permanence, or is it rather the other way round, i.e. are the real numbers abstractions reached by leaving out a part of the structurally richer complex numbers? Questions of this kind have been discussed, again, in different branches of philosophy as well as in logic or the philosophy of mathematics in a narrower sense.

The most important logical theory of abstraction is developed in direct contact with these debates in the foundations of mathematics: Frege’s and Peano’s account of definition by abstraction, where new, abstract objects are introduced by forming equivalence classes on a realm of objects. Frege’s standard example, that of defining the direction of a straight line via the equivalence relation “being parallel with”, can itself be related to the complex numbers, as Husserl emphasizes ([21, p. 291]; on the history of theories of abstraction, see [1]; [2]). This theory of abstraction, however, becomes problematical if it is to be used in the context of a program that looks for a generalization of the essential features of a mathematical theory. Isn’t it the case that each and any equivalence class adds a new, abstract element, so that these processes of abstraction cannot discriminate between the essential and the accidental features of a mathematical theory? Again, it is algebra itself that can show where a solution might be sought. One of the great advances of algebra in the 19th century was the understanding that invariants are crucial for understanding the fundamental structure of algebra. Typically, generalizations within algebra, and within mathematics on a larger scale, gave the idea of invariant structures itself fundamental status within mathematics. This is precisely what happens in Felix Klein’s “Erlangen program” from 1872, in many formalist discourses [12], but also in philosophical (see [33] on Cassirer) and scientific contexts, Noether’s theorem about the algebraic correspondence between symmetries and physical invariants being the prime example. In philosophy, a focus on structures as opposed to – in Cassirer’s terms – substances emerged in theories as diverse as logical empiricism (Carnap), mathematics-oriented Neo-Kantianism (Cassirer’s functional concepts), and phenomenology (Husserlian “Theorieformen”, see the end of this section), but also in other fields.
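The definition by abstraction discussed above can be made concrete with Frege’s direction example. The following is a minimal sketch in modern notation, not Frege’s or Peano’s own formulation; the symbols $L$, $\parallel$, $[a]_{\parallel}$ and $\operatorname{dir}$ are notational assumptions introduced here for illustration only.

```latex
% Requires amsmath for \operatorname.
% Assumed notation: L is the set of straight lines in a plane, and
% \parallel is the relation "is parallel with", taken to be reflexive.
% Step 1: \parallel is an equivalence relation on L.
\[
  a \parallel a, \qquad
  a \parallel b \Rightarrow b \parallel a, \qquad
  (a \parallel b \wedge b \parallel c) \Rightarrow a \parallel c
  \qquad (a, b, c \in L).
\]
% Step 2: the new abstract object is the equivalence class of a line.
\[
  \operatorname{dir}(a) \;:=\; [a]_{\parallel}
  \;=\; \{\, x \in L : x \parallel a \,\}.
\]
% Step 3: the identity criterion that the definition secures.
\[
  \operatorname{dir}(a) = \operatorname{dir}(b)
  \;\Longleftrightarrow\; a \parallel b .
\]
```

Since every equivalence relation on $L$ yields classes of this kind, the construction by itself does not single out which abstractions are mathematically significant, which is the difficulty raised in the text.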
In the newly budding philosophy of nature – which was itself particularly closely related to new developments in logic and mathematics – authors as diverse as the chemist-philosopher Wilhelm Ostwald, the biologist-philosopher Hans Driesch, but also A.N. Whitehead turned towards concepts such as “order” that were explicitly designed to be the most general, substance- or object-independent basis for an abstract reflection on science ([55, ch.VII.1]; [57]). A notion of “order” of course also loomed large in the innovative field of projective geometry that provided, with the infinitely distant points, the other standard example of ideal elements, and that at the same time strongly influenced the genesis of modern logic (see already [39]). At the same time, these authors – and many others – developed an interest in the formation of concepts.¹⁰ A theory of concept formation, clearly, also involves operations of generalization and/or abstraction, and leads to a hierarchization of increasingly general types of concepts. There are some obviously inadequate strategies towards increasing the generality of concepts: just leaving out some of the specifications inherent in more concrete concepts does not per se yield informative generalizations. Ostwald’s own attempts to arrive at the most general concepts underlying all discourse in science lead him, for instance, to the concept of a “thing” (cf. [56]); this abstraction, however, becomes meaningful only in the context of other assumptions as to what kind of relations hold between things or between things and perceivers (here, Ostwald’s energeticism holds sway).¹¹

A particularly telling example for the interaction between these issues can be found in the work of Edmund Husserl.¹² Husserl develops a rather intricate taxonomy of forms of generalization, in direct contact with mathematical innovations such as (Grassmann’s) theory of manifolds. In his Ideen zu einer reinen Phänomenologie from 1913, he distinguishes between the operations of “Generalisierung”, “Spezialisierung”, “Formalisierung” and “Entformalisierung” [32, p. 31-33]. The basic distinction is the following. On the one hand, there is a generalization that leads, step by step, to ever more general “Wesensgattungen” that remain, however, specific and have a determinate content. As an example, one might turn towards biological taxonomy: man – primates – vertebrates – animals . . . . On the other hand, there also are ultimate essences that provide the highest genus possible, and that no longer have any concrete, “sachhaltige” content whatsoever. The perfectly general and formal category of “things” – that also figured prominently in Ostwald’s account of concept formation – might be taken as an example. The same applies to methodological concepts; also for concepts such as “proposition” or “implication”, one can find the most general and formal “purely logical genera [reinlogische Gattungen]”. These distinctions can again be illustrated by looking at the role that manifolds came to play within mathematics. Husserl views the transition from (ordinary) space to “‘Euclidean manifolds’” not as a “generalization”, but as a truly “‘formale’ Verallgemeinerung”, as a formalization. Looking back at the first sentence of the relevant chapter in Husserl’s Ideen, where he contrasts “Generalisierung” and “Verallgemeinerung von Sachhaltigem in das reinlogisch Formale”, the latter being distinguished from ordinary forms of generalization as that step that takes us from the “concrete to what is formal in a purely logical sense”, this must mean that mathematical manifolds directly illustrate the purely formal level of logic.

Husserl is clearly aware of the necessity not only to distinguish, hierarchically, between different forms of generalization and different types of generality, but also to address mathematical objects and the form of mathematical theories simultaneously. Any fundamental change in the objects acknowledged in a mathematical theory implies a change in the “Theorieform”, and vice versa. In the following section, it will be shown that Russell, for rather similar reasons, rejected the idea of a permanence of forms as a guideline for mathematical abstraction. But then, if the idea of extending certain principles of a mathematical theory in a natural fashion is bound to be problematical, one has to search for new ways to understand these processes. Husserl clearly views his phenomenology as providing the (philosophical, but mathematically informed) solution to this problem.

¹⁰ See also [12, p. 295], with the thesis that formalism is basically a theory about concept-formation.

¹¹ The problem of concept formation played an important role in the early philosophy of science/analytical philosophy; see [38]; [23]. The Encyclopedia of unified science, where Hempel’s text was published, is yet another example of the connections between authors from various backgrounds: Mannoury joined forces in the editing of this series with, among others, Carnap and Neurath.

¹² Husserl’s discussion of mathematical theories, in particular his reception of Grassmannian manifolds, has been treated extensively in a series of recent papers and books; see [27], [28], [29]; [30]; [21], [22]; [18]; [19]; [55].
4 Pure logic and meta-scientific induction

How does a generalization of mathematics have to proceed? It seems clear that one cannot just deduce from present-day mathematics what a future mathematics should look like, and which of many possible generalizations is therefore the most reasonable. The problem of establishing what the relevant features of a theory are that then have to be preserved already makes it impossible to give a strictly deductive extension of existing mathematics. If an extension in this form were possible, that would also mean that the original, unextended theory was in fact the basic theory. The picture of a productive extension rather suggests an inductive procedure – that, however, seems at odds with traditional conceptions of the method of mathematics. In Russell’s discussion of the complex numbers in the Principles of mathematics, however, this very claim is made: we find here a strong statement that an inductive procedure is needed in order to arrive at the relevant generalization of mathematics. In the paragraphs on the complex numbers, Russell explicitly denies that there is anything paradoxical about the complex numbers: “The theory of imaginaries was formerly considered a very important branch of mathematical philosophy, but it has lost its philosophical importance by ceasing to be controversial” [48, § 357]. The investigation of the complex numbers has by now, as he states in the terminology of his essay on mathematics and the metaphysicians, taken “a more abstract direction”, namely into “an examination of the principles of symbolism, and the general nature of a Calculus”. Then he adds a startling remark: in spite of this abstraction, it is not possible to immediately start with the abstract genus; rather, “it is necessary to adopt a more inductive method, and examine the various species [of calculi, P.Z.] one by one.”¹³

The main source of inspiration for this idea is A.N. Whitehead, whose comprehensive work on Universal Algebra from 1898 [53] is quoted by Russell alongside the works of Hamilton, De Morgan, Jevons, Peirce, Boole and Grassmann. Whitehead’s book provides the basis for an inductive argument of precisely the form Russell has in mind. Whitehead presents the following main theories (and some other, minor theories employing algebras): manifolds, the algebra of symbolic logic, the calculus of extensions, invariants of groups, the theories of metrics (i.e. geometric and kinematical theories), and vector calculi. The rationale behind this overview of algebraic theories is to facilitate their comparison: “Such algebras have an intrinsic value for separate detailed study; also they are worthy of a comparative study, for the sake of the light thereby thrown on the general theory of symbolic reasoning, and on algebraic symbolism in particular” [53, V]. In this comparison, they acquire a more general role that Whitehead circumscribes via the metaphor of a thought “engine”: “Thus it is hoped in this work to exhibit the algebras both as systems of symbolism, and also as engines for the investigation of the possibilities of thought and reasoning connected with the abstract general idea of space.” (ibid.) Quite obviously, this function corresponds directly to Michael Detlefsen’s analysis of Hilbert’s instrumentalism in terms of – metaphors again – “‘inference tickets’” [11, p. 3]. For Whitehead, just as for all the other authors mentioned so far, it is again the introduction of the complex numbers that paved the way to generalized theories in mathematics that move beyond defining mathematics in terms of quantities [53, VII]: “The introduction of the complex quantity of ordinary algebra, an entity which is evidently based upon conventional definitions, gave rise to the wider mathematical science of to-day”.¹⁴ It is the task of Russell’s Principles to provide the philosophical part of this inductive strategy; the mathematical part has, in Russell’s words, been “admirably performed” by Whitehead [48, § 357, p. 377]. Russell vehemently opposes the idea that the relevant generalizations can be based on a “principle of the Permanence of Form” (Peacock’s/Hankel’s principle of permanence being the background of this argument) that states that some basic forms of operations have to remain valid for the new types of numbers. This principle, according to Russell, falls short of doing justice to the types of generalization that are required: “In Universal Algebra, our symbols of operation, such as + and ×, are variables”. Consequently, he is very eager to insist on the difference between complex numbers and the reals: real numbers cannot be understood as complex numbers without an imaginary part [48, § 359, p. 378].

This inductive strategy clearly presupposes the availability of successful algebraic theories, and thus mirrors the attempts to legitimate the use of the complex numbers via their applications. The important difference is again a case of generalization: not only new objects, but whole theories enter this process of inductive legitimation. Whiteheadian comparisons and Russellian induction pose, and in a kind of implicit definition also answer, the question as to what precisely the elements are over which the induction runs. Again, the crucial ideas come from algebra. Within algebra, it is perfectly common to talk of “an” algebra or of many different algebras; the typical objects themselves have a theory-like character (more precisely: algebra as a mathematical discipline studies what is common to different algebras). Felix Klein’s ‘Erlangen program’ introduces, in a completely analogous way, the idea of talking about many different geometries as the mathematically most satisfactory way to arrive at a “generalization of geometry” [34, p. 34]. Here, too, all the motifs mentioned and discussed so far converge: individuating geometries via their invariance under transformations, Grassmannian inspirations, the motivating role of projective geometry, and that of the complex numbers (on the latter, see [34, p. 49-53]), all of them leading to – as Klein’s programmatic title states – a “comparative” consideration of recent developments in geometry. Russell then generalizes this strategy to cover mathematics as a whole.

The notion of an “application” of mathematics also undergoes an important shift. No longer is the applicability to empirical reality the criterion that can justify adopting a novel theory, but rather, as in Whitehead’s Universal Algebra, the possibility to fit this theory into an inductive investigation of what (pure, in Russell’s terms) mathematics actually is. In devising a form of inductive argument that runs over whole theories, the subject matter of debates about mathematics is changed: such an induction can be understood as the model of an argument about mathematics. An induction of this form is, then, no longer plagued by the uncertainties inherent in any induction that takes empirical data as its basis.

¹³ [11, p. 48] discusses various forms of generalization that are related to “Hilbert’s program”, and links them to various views with regard to the nature of induction; in the characterizations he gives of these forms of induction (“genuine, contentual proof” vs. “a method of proof-schematization”), one can discover strong parallels with Husserl’s typology of forms of generalization.

¹⁴ On Russell and Whitehead see [15]. [17, p. 282] gives a rather critical account of Whitehead’s Universal Algebra: “Overall the volume gives an unclear impression, resoundingly belying its title; Whitehead had mixed logic, algebra and geometry together, but the fusion had eluded him”.
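Russell’s insistence that real numbers cannot be understood as complex numbers without an imaginary part can be illustrated with the ordered-pair construction of the complex numbers. This is a sketch in modern textbook notation and should not be read as Russell’s own presentation in [48]; the construction of $\mathbb{C}$ as $\mathbb{R} \times \mathbb{R}$ and the embedding symbol $\iota$ are assumptions introduced here for illustration.

```latex
% Requires amssymb for \mathbb.
% Assumed construction: complex numbers as ordered pairs of reals.
\[
  \mathbb{C} := \mathbb{R} \times \mathbb{R}, \qquad
  (a,b) + (c,d) := (a+c,\; b+d), \qquad
  (a,b) \cdot (c,d) := (ac - bd,\; ad + bc).
\]
% The pair (0,1) plays the role of \sqrt{-1}:
\[
  (0,1) \cdot (0,1)
  = (0 \cdot 0 - 1 \cdot 1,\; 0 \cdot 1 + 1 \cdot 0)
  = (-1,\, 0).
\]
% The reals are embedded in C, but a real a is not the pair (a,0):
\[
  \iota : \mathbb{R} \to \mathbb{C}, \quad \iota(a) := (a,0), \qquad
  \iota(a+c) = \iota(a) + \iota(c), \quad
  \iota(ac) = \iota(a) \cdot \iota(c).
\]
```

On this construction a real number $a$ and the pair $(a,0)$ are objects of different kinds, even though $\iota$ preserves sums and products; this is one way of reading the difference on which the text says Russell insists.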
5 Scientistic liberalism and interesting generalizations

The debates illustrated here clearly show that the program of generalizing mathematics that characterized much of late 19th- and early 20th-century mathematics does not proceed along fixed and pre-established lines. It makes sense to talk about inductive generalizations within mathematics, and it is possible to link these generalizations directly to the freedom and creativity that seem to be inherent in mathematical practice [38, p. 192]. Whitehead views the gigantic project of his Universal Algebra as being a direct contribution to such “interesting generalizations” [53, VIII]. This, importantly, also affects the relationship that holds between the new, general theories within mathematics and the more special ones. Not only philosophers such as Husserl (see his remarks on the different forms of generalization, section 3), but also mathematicians such as Hankel [20, p. 12] and philosophers/historians of science commenting on these developments have emphasized the necessity of distinguishing between (in Nagel’s terms; [38, p. 167]) “exhibiting” or exemplifying and “deriving” the principles or the results of mathematics. Both Hankel and Nagel view the first of these methods, that of “exhibiting” the principles, as the mathematically more relevant step.

The paradigm example of the introduction of the complex numbers, therefore, with the ensuing changes that had to be made regarding the concept of number, of quantity, and thereby of (large parts of) mathematics, allows us to summarize these developments, and to embed them into yet broader contexts. The dialectic inherent in aiming simultaneously at an extension of mathematics and at a thoroughgoing revision of basic concepts in mathematics can, at a very abstract level, be recognized in the development (and in the very term!) of metamathematics, and in the host of new, increasingly general definitions of mathematics that all go beyond defining mathematics via some notion of quantity. This program has been taken up by a surprisingly wide range of authors from many different fields, and from very different backgrounds. The academic landscape of the late 19th century was concerned with – one might even say: infatuated with – the search for a systematization of all the individual sciences that seemed to increase steadily in number and specialized complexity [55]. Two features of these debates are particularly remarkable in the present context: a surprising liberalism that could allow for different types of science (in the broad sense) to coexist harmoniously; and the fascination with the search for ever more general sciences, dealing with ever more general types of objects, but still meaningful according to traditional standards. Examples are Husserlian phenomenology with – specified for the subject matter here at hand – its interest in “forms of theories”; Meinongian “Gegenstandstheorie”; Ostwald’s philosophy of nature, which he combined with his energetics and a strong program in the systematization of the sciences; Cassirer’s theory of relational concepts. All these programs interact on many levels: Ostwald and Whitehead share the program of newly establishing a philosophy of nature while at the same time taking recent developments in mathematics into account ([55, ch.IV.7,VII]; [56]); Russell does indeed read texts by Ostwald in the years preceding his Principles;¹⁵ Carnap meets with Cassirer and quotes virtually all the authors mentioned in the above reconstruction of philosophico-mathematical issues in his Logischer Aufbau ([6, p. 3-4]; [14]).

Interestingly, some of the most innovative ideas in 19th-century mathematics very early found acceptance in new disciplines in the natural sciences or in psychology, thereby illustrating the potential these ideas carried for innovations in other fields, and the hesitancy of more traditional disciplines, including mathematics itself, to take notice of them (see [54], see also [9]). The debates concerning the systematization of the sciences can only be understood under the assumption that there was no clear and sufficiently general ideal of scientificity available that could be made to cover all the relevant phenomena, i.e. all the individual sciences that there were to be classified. It was one of the crucial tasks of the systematizations themselves to provide such an ideal. This ideal of scientificity, then, ultimately justified the newly introduced general sciences that themselves resulted from comparing the available forms of science. This strategy closely resembles the innovative features of the debates concerning the principles of mathematics. Here, too, one of the crucial results consisted in clarifying what the relevant unit ideas are; it became possible to ask what “an” algebra is, “a” geometry or “a” mathematical theory that can then be the basis for further generalizations, just as one aimed at stating more precisely what makes a particular intellectual activity into “a” science. All of the authors mentioned and discussed here devoted quite some space to the problem of the complex numbers (for Ostwald, see [40, p. 324-328], significantly enough in the context of an attempt to transform the philosophy of nature into a “science of order”; for Cassirer, [7, p. 70-87]). This may help to understand why the symbol “√−1” could gain popular currency so easily: what is at stake is not just a particular number, or a particular symbol; this symbol stands for a host of new problems arising in the foundations of mathematics, and for the whole program of arriving at new forms of sciences or at new foundational disciplines within mathematics.

¹⁵ See the “Reading Lists” from ca. 1897 in [49, p. 497, 501]; Russell refers to an article by Ostwald on atomism and to Ostwald’s handbook Outlines of General Chemistry.
References

[1] Ignacio Angelelli. Adventures of abstraction. Poznan Studies in the Philosophy of the Sciences and the Humanities, 82:11–35, 2004.
[2] Ignacio Angelelli. The Troubled History of Abstraction. In Uwe Meixner and Albert Newen (eds), Logical Analysis and History of Philosophy. Focus: History of Epistemology, pp. 157–175. Mentis, Paderborn, 2005.
[3] Bryson Brown. Preservationism: A Short History. In Dov M. Gabbay and John Woods (eds), Handbook of the History of Logic. Vol. 8., pp. 95–127. Elsevier, Amsterdam et al., 2007.
[4] William Byers. How mathematicians think. Using ambiguity, contradiction, and paradox to create mathematics. Princeton UP, Princeton, 2007.
[5] Andrea Cantini. Paradoxes, self-reference and truth in the 20th century. In Dov M. Gabbay and John Woods (eds), Handbook of the History of Logic. Vol. 5., pp. 875–1013. Elsevier, Amsterdam et al., 2009.
[6] Rudolf Carnap. Der logische Aufbau der Welt. Weltkreis-Verlag, Berlin-Schlachtensee, 1928.
[7] Ernst Cassirer. Substanzbegriff und Funktionsbegriff. Untersuchungen über die Grundfragen der Erkenntniskritik. Paul Cassirer, Berlin, 1910.
[8] Ernst Cassirer. An essay on man. An introduction to a philosophy of human culture. Gesammelte Werke, Vol. 23. Meiner, Hamburg, (1944) 2006.
[9] Olivier Darrigol. Number and measure: Hermann von Helmholtz at the crossroads of mathematics, physics, and psychology. Studies in History and Philosophy of Science, 34:515–573, 2003.
[10] John Derbyshire. Unknown quantity. A real and imaginary history of algebra. Joseph Henry Press, Washington, 2006.
[11] Michael Detlefsen. Hilbert’s Program. An Essay on Mathematical Instrumentalism. Reidel, Dordrecht et al., 1986.
[12] Michael Detlefsen. Formalism. In Stewart Shapiro (ed.), The Oxford Handbook of Philosophy of Mathematics and Logic, pp. 236–317. OUP, Oxford, 2005.
[13] Michel Foucault. Les mots et les choses. Une archéologie des sciences humaines. Gallimard, Paris, 1966.
[14] Michael Friedman. A parting of the ways: Carnap, Cassirer, and Heidegger. Open Court, Chicago / La Salle, Ill., 2000.
[15] Sébastien Gandon. Russell et l’Universal Algebra de Whitehead: La géométrie projective entre ordre et incidence (1898-1903). Revue d’histoire des mathématiques, 10:187–256, 2004.
[16] Ivor Grattan-Guinness. Forms in Algebra and their Interpretations: Some Historical and Philosophical Features. In L. Albertazzi (ed.), Shapes of Forms, pp. 177–190. Kluwer, Dordrecht et al., 1999.
[17] Ivor Grattan-Guinness. The Search for Mathematical Roots 1870-1940. Logics, Set Theories and the Foundations of Mathematics from Cantor through Russell to Gödel. Princeton UP, Princeton/Oxford, 2000.
[18] Aaron Gurwitsch. Phenomenology and the Theory of the Sciences. Northwestern UP, Evanston, 1971, edited by Lester Embree.
[19] Guillermo E. Rosado Haddock. Husserl’s philosophy of mathematics: its origin and relevance. Husserl Studies, 22:193–222, 2006.
[20] Hermann Hankel. Theorie der complexen Zahlensysteme insbesondere der gemeinen imaginären Zahlen und der Hamiltonschen Quaternionen nebst ihrer geometrischen Darstellung. Voss, Leipzig, 1867.
[21] Mirja Helena Hartimo. Towards completeness: Husserl on theories of manifolds 1890-1901. Synthese, 156:281–310, 2007.
[22] Mirja Helena Hartimo. From geometry to phenomenology. Synthese, 162:225–233, 2008.
[23] Carl G. Hempel. Fundamentals of Concept Formation in Empirical Science. In Otto Neurath, Rudolf Carnap, and Charles Morris (eds), Foundations of the Unity of Science. Toward an International Encyclopedia of Unified Science, volume 2, pp. 651–745. Chicago UP, Chicago/London, 1952.
[24] Johann Gottfried Herder. Verstand und Erfahrung. Eine Metakritik zur Kritik der reinen Vernunft. Erster Theil. Sämmtliche Werke, Vol. 21. Weidmannsche Buchhandlung, Berlin, (1799) 1881, edited by Bernhard Suphan.
[25] David Hilbert. Die logischen Grundlagen der Mathematik. Mathematische Annalen, 88:151–165, 1923.
[26] David Hilbert. Über das Unendliche. Mathematische Annalen, 95:161–190, 1926.
[27] Claire Ortiz Hill. Word and Object in Husserl, Frege, and Russell. The Roots of Twentieth-Century Philosophy. Ohio University Press, Athens, 1991.
[28] Claire Ortiz Hill. Did Georg Cantor influence Edmund Husserl? Synthese, 113:145–170, 1997.
[29] Claire Ortiz Hill. Tackling three of Frege’s problems: Edmund Husserl on sets and manifolds. Axiomathes, 13:79–104, 2002.
[30] Claire Ortiz Hill and Guillermo E. Rosado Haddock. Husserl or Frege? Meaning, Objectivity, and Mathematics. Open Court, Chicago / La Salle, Ill., 2000.
[31] Edmund Husserl. Das Imaginäre in der Mathematik. Husserliana, XII:430–444, 1901.
[32] Edmund Husserl. Ideen zu einer reinen Phänomenologie und phänomenologischen Psychologie. Erstes Buch. Husserliana III/1. The Hague, (1913) 1976.
[33] Karl-Norbert Ihmig. Cassirers Invariantentheorie der Erfahrung und seine Rezeption des ‘Erlanger Programms’. Meiner, Hamburg, 1997.
[34] Felix Klein. Das Erlanger Programm. Vergleichende Betrachtungen über neuere geometrische Forschungen. Harri Deutsch, Thun/Frankfurt a.M., (1872) 1997, edited by Hans Wußing.
[35] Gerrit Mannoury. Mathesis en mystiek. Een signifiese studie van kommunisties standpunt. Maatschappij voor Goede en Goedkoope Lectuur, Amsterdam, 1924.
[36] Multatuli. Max Havelaar of de koffiveilingen der Nederlandsche handelmaatschappy. Bert Bakker, Amsterdam, (1860) 2005, edited by Annemarie Kets.
[37] Robert Musil. Die Verwirrungen des Zöglings Törleß. Rowohlt, Reinbek b. Hamburg, (1906) 1978.
[38] Ernest Nagel. ‘Impossible Numbers’: A Chapter in the History of Modern Logic. In Teleology Revisited and Other Essays in the Philosophy and History of Science, pp. 166–194. Columbia UP, New York, (1935) 1979.
[39] Ernest Nagel. The Formation of Modern Conceptions of Formal Logic in the Development of Geometry. Osiris, 7:142–223, 1939. [40] Wilhelm Ostwald. Moderne Naturphilosophie. I. Ordnungswissenschaften. Akademische Verlagsgesellschaft, Leipzig, 1914. [41] Hans-Joachim Petsche, Albert C. Lewis, Jörg Liesen, and Steve Russ (eds). From Past to Future: Grassmann’s Work in Context. Proceedings of the Grassmann Bicentennial Conference. Birkhäuser, Stuttgart, 2011. [42] Graham Priest. Paraconsistency and Dialetheism. In Dov M. Gabbay and John Woods (eds), Handbook of the History of Logic. Vol. 8., pp. 129–204. Elsevier, Amsterdam et al., 2007. [43] Graham Priest, Richard Routley, and Jean Norman. Paraconsistent Logic. Essays on the Inconsistent. Philosophia, München / Hamden / Wien, 1989. [44] Helena M. Pycior. Symbols, Impossible Numbers, and Geometric Entanglement. British Algebra through the Commentaries on Newton’s Universal Arithmetick. CUP, Cambridge, 1997. [45] Esther Ramharter. Are all contradictions equal? Wittgenstein on confusion in mathematics. In PhiMSAMP. Philosophy of Mathematics: Sociological Aspects and Mathematical Practice, pp. 293–306, London, 2010. College Publications. [46] Reinhold Remmert. Komplexe Zahlen. In Heinz-Dieter Ebbinghaus et al. (ed.), Zahlen, pp. 45–78. Springer, Berlin et al., 3rd edition, 1992. [47] Bertrand Russell. Mysticism and Logic and Other Essays. Longmans, Green and Co, London et al., 1921. [48] Bertrand Russell. The principles of mathematics. George Allen & Unwin, London, 2nd; 1st ed. 1903 edition, (1937) 1950. [49] Bertrand Russell. Philosophical Papers 1896-99. Unwin Hyman, London et al., 1990, edited by Nicholas Griffin and Albert C. Lewis. [50] H. Walter Schmitz. De Hollandse Significa. Een reconstructie van de geschiedenis van 1892 tot 1926. Van Gorcum, Assen/Maastricht, 1990. [51] Erhard Scholz (ed.). Geschichte der Algebra. Eine Einführung. B.I. Wissenschaftsverlag, Mannheim et al., 1990. 
[52] Elisabeth Schuhmann and Karl Schuhmann. Husserls Manuskripte zu seinem Göttinger Doppelvortrag. Husserl Studies, 17:87–123, 2001. [53] Alfred North Whitehead. A treatise on universal algebra with applications. CUP, Cambridge, 1898. [54] Paul Ziche. Wilhelm Thiery Preyers Psychomathematik: Die Mathematisierung der reinen Empfindungen. Psychologie und Geschichte, 10:283–295, 2002. [55] Paul Ziche. Wissenschaftslandschaften um 1900. Philosophie, die Wissenschaften und der nicht-reduktive Szientismus. Chronos, Zürich, 2008. [56] Paul Ziche. Wilhelm Ostwald als Begründer der modernen Logik. Logik und künstliche Sprachen bei Ostwald und Louis Couturat. In Pirmin Stekeler-Weithofer, Heiner Kaden, and Nikolaus Psarros (eds), Ein Netz der Wissenschaften? Wilhelm Ostwalds “Annalen der Naturphilosophie” und die
Durchsetzung wissenschaftlicher Paradigmen. Sächsische Akademie der Wissenschaften, Leipzig, 2009. [57] Paul Ziche. Alternative claims to the discovery of modern logic: Coincidences and diversification. In K. François, B. Löwe, Th. Müller, and B. Van Kerkhove (eds), Foundations of the Formal Sciences VII. Bringing together Philosophy and Sociology of Science, pp. 243–267. College Publications, London, 2011.
Assumptions of Infinity Karl-Georg Niebergall
This text contains precise definientia of “(theory) T makes an assumption of infinity” and an assessment of their strengths and weaknesses. Related proposals for explications of “T assumes merely the finite” and “T assumes the potentially infinite” are also discussed. Some of these definitions rest on explicantia of “(formula) α expresses that x is finite relative to T” and on axiomatizations of “x is finite”. Appropriate definientia and axioms are given with respect to both set-theoretical and mereological languages.
1 Introduction
In a sense, all distinctions are on a par: either x is . . . or x is not . . . . Yet a few of them seem to be particularly fundamental and significant. In my opinion, the distinction between the finite and the infinite is a case in point.1 Either x is finite or x is infinite.2 But more should be said: between the finite and the infinite, there seems to be an abyss.3 For if one starts with a finite object and adds a finite object, or if one starts with finitely many objects and adds finitely many, one still obtains a finite object or finitely many objects; even if finitely many finite objects are adjoined, it is again merely a finite object that results. In order to obtain an infinite object from a finite one, infinitely many or infinitely large objects have to be added. There seems to be no way to bridge the gap between the finite and the infinite, if not by the infinite itself – if it exists.
1 Of course, I am not alone in this assessment: The distinction has had a grip on philosophical and scientific, if not human, thought at least from Early Greek philosophy on; see [9] and [7] for systematically relevant presentations of its history.
2 I regard “x is infinite” as equivalent (for example, by definition) to “¬ (x is finite)”.
3 This does not preclude that it is possible to deal with the infinite by finite means. Neither is the word “infinite” (construed as a token) infinite, nor need the sentences which can be used to make assertions about the infinite be infinite.
For maybe there are no infinite objects. If so, then no infinite object is different from any finite one. But that does not mean that we have to understand “x is infinite” as “x is finite”. Nor does it imply that we do not understand “x is finite” and “x is infinite” at all. Actually, in the above deliberation, I have displayed a certain – specific – understanding of “x is finite” and “x is infinite” all along.4 But how exactly are “x is finite” and “x is infinite”, and phrases containing these and similar expressions, used or understood? Let’s deal with this topic more systematically and start by listing some common contexts of use of the words “finite” and, in particular, “infinite”.
(αI) x is infinite.
(βI) There are infinitely many A’s.
(γI) (Theory) T makes an assumption of infinity.
Similar formulations, with occurrences of “finite” instead of “infinite”, are common in the case of (α) and (β), but the variant (γF) of (γI) is not so popular.
(αF) x is finite.
(βF) There are finitely many A’s.
(γF) T assumes merely the finite.5
These formulations are the linguistic data for the ensuing investigation (for further ones, cf. (αP) and (γP) below and footnotes 5 and 8). I believe that their meaning is neither entirely clear nor entirely opaque. Let me illustrate this assessment by some first, admittedly still vague, examples and principles suggested by the typical usage of (αI) to (γF). First, since “x is infinite” means just “¬ (x is finite)”, an explication of (αI) immediately yields one for (αF) and vice versa; similarly for (βI) and (βF). Second, there should be some agreement as to examples of and counterexamples to (αI) to (γF). The empty set and singletons are finite objects, the set of natural numbers and the set of real numbers are infinite objects. Then there are finitely many stones and atoms in Berlin, there are infinitely many natural numbers, and there may be finitely many or infinitely many space-time points in our universe. Finally, it is commonly taken for granted that PA (see [4]) and ZF (see [6], [17] and [2]) make assumptions of infinity, whereas PL^1_L, the set of logical truths formulated in some first order language L, merely assumes the finite.
4 And I have dealt with the infinite (if it exists) by finite means.
5 I treat “T makes an assumption of infinity”, “T assumes the infinite”, “T rests on the infinite” and “T is committed to the infinite” as synonyms. A like remark is supposed to hold for phrases of sort (γF).
It may not be settled, however, whether it makes sense to say of nonmathematical entities, in particular of non-sets, that they are finite or infinite. Is Berlin a finite object? Assume that it consists of finitely many parts. The entity which clearly is finite in this case is the set of its parts. Maybe also the fusion of Berlin’s parts is a finite object. But since this fusion is just Berlin, Berlin itself would, in the end, be a finite object. Third, there should also be some agreement as to principles holding for (αI) to (γF). For example, a subset of a finite set must be finite; and if S makes an assumption of infinity and S ⊆ T, then T makes an assumption of infinity.6 Fourth, for sets, (αI) and (βI) (plus (αF) and (βF)) seem to be interdefinable. For “There are infinitely many A’s” may be explained as “The set of A’s is infinite”. And, at least for sets x, “x is infinite” should be definable as “There are infinitely many A’s such that for all y, y ∈ x ⇐⇒ y is an A”.7 As a consequence of these observations, I will concentrate on phrases of the type “x is infinite” and “x is finite”, plus “T makes an assumption of infinity” and “T assumes merely the finite”. In addition to these, similar ones containing “potentially infinite” may be taken into account; i.e.
(αP) x is potentially infinite.
(γP) T makes an assumption of the potentially infinite.8
I regard gaining a better understanding of (αP) and (γP) as a goal in itself. One simply has to admit that they play an important role both in philosophy and in investigations on the foundations of mathematics. My feeling is, however, that a clear meaning has never been given to them. Whether a sense may be supplied for (γP) such that it receives a place of its own is discussed in section 6.
Let me now come to the topic of precise explicantia of (αI), (αF) and (γI). It is well known that in modern mathematics, “x is infinite” and “x is finite” have found such definientia: they are stated in purely set theoretic terms (i.e., formulated in L[∈], the first order language of set theory) and are widely accepted as adequate explicantia of these predicates. As a reminder, here are two of the most popular ones:9
6 Principles such as these play a crucial role for the arguments of section 6.3.
7 A closer look may reveal difficulties: for example, one would like to claim that there are infinitely many ordinal numbers; but, at least in, e.g., ZF, there is neither an infinite set nor an infinite class which contains exactly the ordinal numbers. On further thought, it may be possible to circumvent this difficulty: “There are infinitely many ordinal numbers” could be simulated by “There is an infinite set x such that each element of x is an ordinal number”.
8 I treat “T makes an assumption of the potentially infinite”, “T assumes the potentially infinite” and “T rests on the potentially infinite” as synonyms.
9 See Levy 1958, [16] and [5], plus section 4.2 and the appendix, for further definitions.
(D1) x is Dedekind-infinite :←→ ∃y (y ⊂ x ∧ y ≃ x),
x is Dedekind-finite :←→ ¬ (x is Dedekind-infinite).
(D2) x is finite :←→ ∃y (y ∈ ω ∧ y ≃ x),
x is infinite :←→ ¬ (x is finite).
Here the following definitions are at work:
x ≃ y :←→ ∃u (u is a bijection from x onto y),
x ∈ ω :←→ ∀z (∅ ∈ z ∧ ∀y (y ∈ z −→ y ∪ {y} ∈ z) −→ x ∈ z).10
It should be emphasized, however, that there seem to be no accepted proposals for precise definientia of “x is infinite” and “x is finite” which are not formulated in set theoretic terms.11 Is it possible to find further languages in which it can be expressed that x is infinite? More basically still, what does it mean for a formula α(x) to express that x is infinite (especially, if its language is not of the familiar set theoretical type)? Answers to these questions are not easily forthcoming. But I regard them as worth pursuing. For first, I take it as an aim in itself to find out whether the domain of such first order languages in which it can be expressed that x is infinite can be extended beyond the set theoretic example. Second, for historical reasons, I view it as particularly important to investigate whether in languages which are supposed to be about concrete objects, it can be expressed that x is infinite. For before the infinite became so central in modern mathematics, the predicate “is finite” had usually been ascribed to entities for which it is far from obvious that they should be construed as mathematical ones (actually, they are often most naturally conceived of as nonmathematical entities, e.g., as objects that have at least concrete components). Third, and more specifically, how to express that x is infinite should be fairly interesting for the nominalist. For he has to face this challenge: Some nominalistic theories make assumptions of infinity;12 but there are only finitely many concrete objects, all of which are finite. Thus these nominalistic theories are false. As a reply to this criticism, it has sometimes been claimed that we do not know whether there are only finitely many concrete objects, all of which are finite. This assertion may be right.
But can it be formulated or understood by a nominalist? It would be pleasing for him if he could take part in the above discussion by using his own language; i.e. by employing merely the vocabulary of a nominalistic theory. Do nominalistic theories exist relative to which it can be expressed that x is infinite? For a start, it should be investigated whether calculi of individuals could fill this role: for they are commonly conceived of as the archetypical nominalistic theories; they are comparatively simple, having only “overlaps” as a nonlogical primitive; and they should be contained in many nonmathematical, in particular, nominalistic, theories. The task of defining “x is infinite” in terms of “overlaps” is addressed in sections 4.3 and 4.4.
Let’s finally come to (γI). To the best of my knowledge, even for mathematical theories T there are only scattered attempts to provide a general explication of “T makes an assumption of infinity”.13 Moreover, there seems to be no agreement on how to evaluate these suggestions. Usually, it is just examples of theories which are supposed to make assumptions of infinity – such as the above-mentioned PA and ZF – that are presented. Let me also point out that the mere existence of an explicans for (αI) does not by itself deliver one for (γI). In particular, even if the phrase “T makes an assumption of infinity” were somehow broken up into “T makes an assumption of x” and “x is infinite” (which incidentally is far from necessary), the task of finding an explication for “T makes an assumption of x” would still not be trivial.
So much for the introduction. The subsequent investigation can be viewed as an elaboration of some of the considerations presented only sketchily above. More precisely, its main aim is to state and evaluate explicantia of “T makes an assumption of infinity”, possibly complemented by definientia of “T assumes merely the finite” and “T assumes the potentially infinite”. My approach to this task is such that, at some points, both axiom systems for and definitions of “x is finite” have to be given; I also regard these, however, as interesting in their own right.
10 The appendix contains both alternatives and more details.
11 Let’s not forget that there are also type theoretic definientia for these predicates. In this paper, however, they will not be distinguished from the set theoretic ones.
12 These are, roughly, theories which are supposed to be only about concrete objects. Nominalism, as it is here understood, consists in the acceptance of concrete objects and in the rejection of abstract ones.
Thus, in section 2, I suggest and discuss explications of “T makes an assumption of infinity” and of “T assumes merely the finite”. One interesting definiens of “T makes an assumption of infinity” (i.e., (DIiii)) needs further refinement: only if it can be expressed in the language of T that x is infinite can (DIiii) work properly. A preliminary version of a possible explicans of “α expresses that x is infinite” is introduced and motivated in section 3. It rests on axiomatizations of “x is finite”: plausible axioms are presented in section 4, both relative to set theories (section 4.1) and to mereological theories (section 4.3). In addition, set theoretical and mereological definientia of “x is finite” for which those axioms provably hold are formulated in sections 4.2 and 4.4. Different explicit and precise renderings of (DIiii), building on sections 3 and 4, are then put forward and discussed in section 5. This paper contains three further sections: section 6 deals with the prospects of finding an explicans of “T assumes the potentially infinite”; section 7 contains a final evaluation of the explicantia for “T makes an assumption of infinity” taken into account in this paper; and an appendix supplies a quite detailed proof of a theorem (Theorem 2) mentioned in section 4.2.
13 See [12] for further comments.
2 “T makes an assumption of infinity” and “T assumes merely the finite”
2.1 Background
In this section, I present several explicantia both for “T makes an assumption of infinity” and for “T assumes merely the finite”. I think that some are reasonable and others are unacceptable outright, whereas still others need further thought. Since I have a better grasp of “T makes an assumption of infinity”, I start with explications for this phrase and use them as role models for similar constructions later on. Roughly speaking, “T makes an assumption of infinity” means: T can only be true if infinity exists. The phrase “if infinity exists” is – provided it is English at all – not unambiguous, however. As I understand it, it can mean “if infinitely many objects exist”, but also “if an infinite (i.e., infinitely large) object exists”. Thus, we have
(i) T makes an assumption of infinity ⇐⇒ T can only be true if infinitely many objects exist
and
(ii) T makes an assumption of infinity ⇐⇒ T can only be true if an infinite object exists.
Prima facie, (i) and (ii) are not equivalent to each other. For naively, PA makes an assumption of infinity in assuming infinitely many objects, but, since each natural number is generally supposed to be finite, not in assuming an infinite object. In this, PA differs from ZF, which makes an assumption of infinity both in assuming infinitely many and infinitely large objects. The definientia of (i) and (ii) are certainly not clear enough, however. One reason is that they contain the modality “can”. As a first elaboration of (i) and (ii), let’s therefore consider:
(i)’ T makes an assumption of infinity ⇐⇒ (T is true =⇒ {x | x = x} is infinite)
and
(ii)’ T makes an assumption of infinity ⇐⇒ (T is true =⇒ ∃x (x is infinite)).
Yet, given (i)’ or (ii)’, if T is false, it makes an assumption of infinity in either case (which is not intended). And it should be possible that T is true and makes an assumption of infinity, even if there are not infinitely many objects or no infinite object; but this is precluded by (i)’ or (ii)’.14 The problem is that the definientia of (i)’ and (ii)’ do not capture the necessity, and generality, expressed in (i) and (ii). A familiar method for getting around this difficulty is to replace the talk of truth by truth in a model and gain generality by quantification over all models (i.e., over all “possible worlds”). This idea leads to the definitions put forward in the next subsection.
2.2 T makes an assumption of infinity
The precise versions of (i) and (ii) that I seriously consider are these:15
(DIi) T makes an assumption of infinity : ⇐⇒ ∀M (M |= T =⇒ M is infinite).
(DIii) T makes an assumption of infinity : ⇐⇒ ∀M (M |= T =⇒ ∃x (x ∈ M ∧ x is infinite)).
Here is a further version which may be viewed as an adequate rendering of (ii):
(DIiii) T makes an assumption of infinity : ⇐⇒ T |= ∃x (x is infinite).
One may even take these equivalences into account:
(DIiv) T makes an assumption of infinity : ⇐⇒ ∃M (M |= T ∧ M is infinite).
(DIv) T makes an assumption of infinity : ⇐⇒ T |= ∀x (x is infinite).
I first state my own linguistic intuitions concerning the plausibility of these definitions. Then I will see if those first judgements are viable in light of examples and general principles concerning assumptions of infinity which also should have an intuitive appeal.
(2.2.1) I take (DIiv) and (DIv) to be prima facie implausible and (DIi) to be plausible.
(2.2.2) A reason for the rejection of (DIiv) is that the set PL^1_L has infinite models; but PL^1_L does not make an assumption of infinity.
(2.2.3) A reason for the rejection of (DIv) is that ZF makes an assumption of infinity, but does not prove “∀x (x is infinite)” (where “is infinite” is explained by, e.g., (D2)).
14 This recalls a criticism of Quine’s criterion of ontological commitment, some versions of which seem to imply that if a theory T is committed to A’s, A’s exist; see, e.g., [15].
15 Let me point out that in applying the following definitions, I will tacitly assume that the theories dealt with have models. In addition, the assumption that they are formulated in first order languages L is always supposed to be at work. For models appropriate to L, which are pairs ⟨M, I⟩, I usually write “M”.
(2.2.4) In addition to (DIi), (DIiii) has some plausibility, too. (DIiii) receives a detailed treatment in sections 3 and 5.
(2.2.5) (DIii) may not be implausible, either, but is nevertheless unacceptable – given model theoretic semantics. For if T has a model M, it has a model M′ consisting only of finite objects.16 By (DIii), T therefore does not make any assumption of infinity. In sum: given (DIii), even ZF does not make an assumption of infinity – and neither does any other theory.
Let me stress that, although (DIii) may seem to be equivalent to (DIiii), this is far from being the case.17 Certainly, a theory like ZF proves “∃x (x is infinite)” (as defined in (D2), for example), thereby making an assumption of infinity in the sense of (DIiii). But this does not mean that in each model M of ZF, there has to be an infinite object. For an element a of such an M, “a is infinite” need simply not be equivalent to “M |= x is infinite[x : a]”.18
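The construction behind (2.2.5), replacing each element of a model by its singleton and transporting the interpretation, can be tried out on a small scale. The following is only a sketch under strong simplifying assumptions: the structure is a finite toy with a single binary relation, the string "omega" is a hypothetical label standing in for an object one regards as infinite, and the function names are invented for this illustration.

```python
from itertools import product

def singleton_copy(domain, relation):
    """Replace each element a by frozenset({a}) and transport the binary relation."""
    wrap = {a: frozenset([a]) for a in domain}
    new_domain = set(wrap.values())
    new_relation = {(wrap[a], wrap[b]) for (a, b) in relation}
    return wrap, new_domain, new_relation

def is_isomorphism(wrap, domain, relation, new_domain, new_relation):
    """Check that wrap is a bijection that preserves the relation in both directions."""
    if set(wrap.values()) != new_domain or len(new_domain) != len(domain):
        return False
    return all(
        ((a, b) in relation) == ((wrap[a], wrap[b]) in new_relation)
        for a, b in product(domain, repeat=2)
    )

# Hypothetical toy structure; "omega" is only a label for an allegedly infinite object.
domain = {"a", "b", "omega"}
relation = {("a", "b"), ("a", "omega"), ("b", "omega")}

wrap, new_domain, new_relation = singleton_copy(domain, relation)

# The copy satisfies the same sentences (it is isomorphic to the original) ...
print(is_isomorphism(wrap, domain, relation, new_domain, new_relation))
# ... yet every object in its domain is a one-element set, hence finite.
print(all(len(x) == 1 for x in new_domain))
```

Whatever the original objects are, the copy contains only singletons, which is why a condition on the objects themselves, as in (DIii), cannot be enforced by a first order theory.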
16 To obtain such an M′, one may replace each element of M by its singleton and change the interpretation of T’s vocabulary accordingly, herewith making M′ isomorphic to M.
17 It may well be that (DIii) owes its prima facie plausibility to a confusion with (DIiii).
18 “M |= α(x)[x : a]” means that α(x) is satisfied in M by an assignment which maps the variable “x” to the object a.
2.3 T assumes merely the finite
When it comes to assumption of merely the finite, we could envisage these analogues of the above definientia:
(DFi) T assumes merely the finite : ⇐⇒ ∀M (M |= T =⇒ M is finite).
(DFii) T assumes merely the finite : ⇐⇒ ∀M (M |= T =⇒ ∀x (x ∈ M =⇒ x is finite)).
(DFiii) T assumes merely the finite : ⇐⇒ T |= ∀x (x is finite).
(DFiv) T assumes merely the finite : ⇐⇒ ∃M (M |= T ∧ M is finite).
(DFv) T assumes merely the finite : ⇐⇒ T |= ∃x (x is finite).
My responses are similar to the commentaries from section 2.2, but not as firm.
(2.3.1) I take (DFv) to be prima facie implausible.
(2.3.2) A reason for the rejection of (DFi) is that PL^1_L merely assumes the finite: but it has infinite models.
(2.3.3) A reason for the rejection of (DFv) is that ZF does not assume merely the finite, but proves “∃x (x is finite)”.
(2.3.4) (DFiii) is incompatible with (DIi). For consider INF(∈), the axiom of infinity (in L[∈]), i.e.,
∃x (∅ ∈ x ∧ ∀y (y ∈ x −→ y ∪ {y} ∈ x))
and set T := ZF − INF(∈) + ¬INF(∈) (i.e., the theory axiomatized by the usual ZF-axioms without INF(∈), with ¬INF(∈) added). T |= ∀x (x is finite) (as defined in (D2)), but T has no finite model. Therefore, by (DIi), T makes an assumption of infinity, whereas by (DFiii), T assumes merely the finite; but this is not possible.19
(2.3.5) (DFiii) and (DFv) are questionable in a further way: given that PL^1_L merely assumes the finite, “∀x (x is finite)” or “∃x (x is finite)”, resp., have to be logical truths of first order logic (under the assumption of (DFiii) and (DFv), resp.). But this is hardly the case; in particular, if PL^1_L is stated in a language without nonlogical vocabulary.
(2.3.6) By the same type of reasoning as presented with respect to (DIii), (DFii) is unacceptable. Take a model of T consisting only of infinite objects. By (DFii), T does not merely assume the finite. That is: given (DFii), PL^1_L does not merely assume the finite and no theory does so.
As the lesson of this section, I take it that of the above definitions, only (DIi), (DIiii) and (DFiv) have a chance of being acceptable. In fact, (DIi) is simply plausible to me; intuitively, however, I am not that confident with respect to (DIiii) and (DFiv). It should be added that of the equivalences which are (DIi) to (DFv), only some of the implications have been discredited. In particular, I regard
(Ad2+) ∀M (M |= T =⇒ M is finite) =⇒ T assumes merely the finite
as fairly convincing. Even more obvious may be this variant of (Ad2+), where a uniform upper bound on the size of the models of T is assumed:
(Ad2) ∃k (k ∈ N ∧ ∀M (M |= T =⇒ |M| ≤ k)) =⇒ T assumes merely the finite.
By the compactness theorem for first order logic, however, “∀M (M |= T =⇒ M is finite)” and “∃k (k ∈ N ∧ ∀M (M |= T =⇒ |M | ≤ k))”, and therefore (Ad2+) and (Ad2), are equivalent.
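The compactness step can be spelled out with the sentences “there are at least n objects”: if T had arbitrarily large finite models, every finite subset of T together with these sentences would be satisfiable, so T would have an infinite model. The sketch below merely writes such sentences out and checks them by brute force in finite domains; the function names and the string rendering of the formulas are assumptions made for this illustration.

```python
from itertools import combinations

def at_least(n):
    """Return a first order sentence saying 'there are at least n objects' (n >= 1)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    variables = [f"x{i}" for i in range(1, n + 1)]
    prefix = " ".join(f"∃{v}" for v in variables)
    if n == 1:
        return f"{prefix} (x1 = x1)"
    # One inequality for each pair of distinct variables.
    body = " ∧ ".join(f"{u} ≠ {v}" for u, v in combinations(variables, 2))
    return f"{prefix} ({body})"

def holds_in_domain_of_size(n, size):
    """Brute force: can n pairwise distinct objects be picked from a domain of this size?"""
    return any(True for _ in combinations(range(size), n))

print(at_least(3))
# A domain with 3 objects satisfies the sentence for n = 3 but not for n = 4.
print(holds_in_domain_of_size(3, 3), holds_in_domain_of_size(4, 3))
```

No finite domain satisfies all of these sentences at once, which is what forces the model obtained by compactness to be infinite.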
19 To be explicit, in order to carry out the last step of the argument, I have presupposed an additional principle (cf. section 6.3): (Ad4) ¬ (T makes an assumption of infinity ∧ T assumes merely the finite).
3 Expressing infinity: a preliminary suggestion
3.1 The task
The definientia of (DIi) and (DFiv) differ from that of (DIiii) in an important respect: whereas the formulas “x is infinite” or “x is finite” occurring in the first group belong to some metalanguage ML[T] of the language L[T] in which T is formulated, the predicate “is infinite” from the definiens of (DIiii) is formulated in L[T] itself. ML[T] contains model theoretic terminology, in particular “|=”; and since model theoretic expressions are typically and naturally defined using set theoretical vocabulary, ML[T] may be taken to be an extension of L[∈]. It is therefore unproblematic to assume that the expressions “x is infinite” and “x is finite” occurring in the definientia of (DIi) and (DFiv) are explained set theoretically, for example, by (D1) or (D2) or in one of the ways presented in section 4.2. But when it comes to the definiens of (DIiii), it is left completely open what the vocabulary of L[T] could be. In particular, it need not be a set theoretic one. Hence, there is no guarantee at all that “∃x (x is infinite)” can be expressed in L[T] (it may be recalled that the only successful definitions of “x is infinite” seem to be set theoretic and type-theoretic ones). Now, this reasoning does not only point to possible severe restrictions as to the applicability of definition (DIiii). It also addresses a less well-defined and more basic task: the task of providing an explication for α(x) expresses that x is infinite, where α(x) should be a formula with x as its sole free variable (taken from L[T] in our case). Without such an explication, an assessment of (DIiii) seems hardly possible. For (DIiii) has a genuine chance of being adequate only if the formula mentioned in its definiens – that is, the formula for which I somewhat carelessly used the notation “∃x (x is infinite)” – expresses that an infinite entity exists. Accordingly, “x is infinite” should express that the object x assigned to the variable “x” is infinite.20 A more complete version of (DIiii) could thus be roughly stated as follows:
T makes an assumption of infinity : ⇐⇒ α(x) expresses that x is infinite and T |= ∃x α(x).
Strictly speaking, this way of putting it is of course unacceptable: α cannot be free in the definiens while not occurring in the definiendum. In addition, it may turn out that whether a formula α(x) expresses that x is infinite has to be relativized to some theory S. In the final versions, these concerns will be taken care of; see section 5.
20 This use of “x” and “x” may be regarded as somewhat questionable. If α(x) were an atomic formula – say “Px” – it would perhaps be preferable to write “P” expresses infinity instead of “Px” expresses that x is infinite. But α need not be atomic (it is possible, however, to add a variable binding operator to the language which turns formulas into predicates, thereby mimicking the procedure just suggested also for complex formulas). Anyway, the explicantia of α(x) expresses that x is finite put forward in section 5 are contextual definitions which should be free from any use-mention confusion.
3.2 Approaches to explanations
In view of the history of philosophy, one could despair at the task of attaining a decent general understanding of “expresses”. But here we have to deal only with specific, restricted, instances of it: only “expresses that x is infinite” has to be explained, and the formulas considered for the role of expressing that x is infinite are taken from familiar, explicitly stated, formal languages. To start with, it can be observed that the definientia of “x is Dedekind-infinite” and “x is infinite” from (D1) and (D2) are widely accepted as set theoretic explicantia of “x is infinite”. Using the terminology advanced in the previous section, this may be reformulated as: they are viewed as expressing that x is infinite. But what are the reasons for the assertions
(E1) “∃y (y ⊂ x ∧ y ≃ x)” expresses that x is infinite,
(E2) “¬∃y (y ∈ ω ∧ y ≃ x)” expresses that x is infinite?
It seems fair to note that in modern textbooks and research papers on set theory, (D1) and (D2) are normally simply stated and in the same breath accepted without further ado. I will therefore discuss which answers could be given to the question concerning (E1) and (E2) rather than which ones are actually given by the working set theoretician. And here, I see three options:21
(1) (E1) and (E2) are regarded as immediately comprehensible, needing no further arguments for their acceptability. I think that this sort of move should not always be rejected; but I am not satisfied with it here. To start with, the definientia from (D1) and (D2) are already so complex that it is doubtful that (E1) and (E2) can be assessed instantly. In addition, the left to right implication of (D1) is not that obvious to me. Finally, (D2) seems to presuppose that “x ∈ ω” expresses that x is a natural number (or something close to it); if so, reasons have to be given for this claim, too.
(2) An introductory course to logic or set theory provides for a paradigmatic situation where (E1) and (E2) can hardly be asserted if no reasons for them are given. Let’s treat (D2). Here, after “x ∈ ω” has been defined and the existence of ω has been postulated, it will be claimed that ω is infinite. I reason for “ω is infinite” as follows. By employing some sentences from L[∈] which I have, at this point, already accepted as set theoretical principles, I show that the following specific sentences follow from these principles: “∅ ∈ ω”, “{∅} ∈ ω”, “{∅, {∅}} ∈ ω”, . . . , “∅ ≠ {∅}”, “∅ ≠ {∅, {∅}}”, . . . . Therefore, ω has at least 1 element, has at least 2 elements, has at least 3 elements, . . . , has at least n elements – for each natural number n. Thus, ω has infinitely many elements, and is infinite.
21 This subsection is an abridged version of the more detailed treatment presented in [14].
Now, having at least 1, 2, 3 elements, and so on, is certainly not a special feature of ω. Replace ω by any other infinite set, and the same happens. That is, if x is infinite, it has at least 1, 2, 3, and so on, elements. And if x has at least 1, 2, 3, and so on, elements, it is infinite. Viewed more abstractly, what is done in the case of ω is this: for each natural number n, the sentence “∃^{≥n}y (y ∈ ω)” is shown.22 In general, one might therefore define:
(?) α(x) expresses that x is infinite : ⇐⇒ ∀n (n ∈ N =⇒ ∀x (α(x) =⇒ ∃^{≥n}y (y ∈ x))).
Yet (?) is not well formed. Now there is a standard procedure to avoid this kind of difficulty: add a metatheoretical predicate. This leads, e.g., to
(??) ∀n (n ∈ N =⇒ V |= ∀x (α(x) −→ ∃^{≥n}y (y ∈ x))).
But first, I do not know what truth in the set theoretic universe amounts to. Second, I had accepted a specific set of set theoretical principles in the reasoning for “ω is infinite”; yet they are not addressed in (??). And third, it may well be that some theory S proves “∀x (α(x) −→ ∃^{≥n}y (y ∈ x))” for each n ∈ N, whereas some other set theory T doesn’t; thus, a dependency on the respective choice of set theoretical principles must be borne in mind. What I suggest here is therefore
α(x) expresses that x is infinite relative to T : ⇐⇒ ∀n (n ∈ N =⇒ T ⊢ ∀x (α(x) −→ ∃^{≥n}y (y ∈ x))).
(3) A model theoretic treatment of (E1) and (E2) may also seem reasonable. That (E1), for example, is true may be explained in this way: for each structure M for L[∈] and each element x of M which satisfies “∃y (y ⊂ x ∧ y ≃ x)” in M, x is infinite.
Abstracting from this idea and incorporating a relativization to an appropriate set theory T, let’s therefore define:
(M1T) α(x) expresses that x is infinite relative to T : ⇐⇒ ∀a, M (M |= T ∧ a ∈ M ∧ M |= α(x)[x : a] =⇒ a is infinite).
This definition, however, falls prey to a type of argument already put forward against (DIii) in (2.2.5). Actually, it does not even matter how exactly the predicate “is infinite” used in (M1T) is understood – as long as there is at least one entity which is not infinite in this understanding, (M1T) is bound to be false for each T (which is consistent with α(x)). Now there is a certain way out of this criticism of model theoretic definitions: Even if it is not possible with model theoretic means to guarantee that an object a satisfying some formula has to be infinite, it should be possible to enforce that a – even if it is finite – nonetheless behaves like an infinite object, or plays the role of an infinite object. That is, it may happen that, given some model M of T and a ∈ M, a is finite; which means that it has finitely many elements, i.e., objects which stand in the ∈-relation to a. But since the predicate “∈” (from L[T]) need not be evaluated by the ∈-relation, there may nonetheless be infinitely many objects b ∈ M such that M |= y ∈ x[x : a, y : b]. In this case, a behaves like an infinite object in the sense that the set {b | b ∈ M ∧ M |= y ∈ x[x : a, y : b]} is infinite. This distinction leads to a correction of (M1T) to which the above mentioned objection does not apply:
(M2T) ∀a, M (M |= T ∧ a ∈ M ∧ M |= α(x)[x : a] =⇒ {b | b ∈ M ∧ M |= y ∈ x[x : a, y : b]} is infinite).
Let me also consider this seemingly negligible modification of (M2T):
(M3T) ∀a, M (M |= T ∧ a ∈ M =⇒ (M |= α(x)[x : a] ⇐⇒ {b | b ∈ M ∧ M |= y ∈ x[x : a, y : b]} is infinite)).
22 ∃^{≥n}x ψ :←→ ∃x_1 . . . x_n (ψ(x_1) ∧ . . . ∧ ψ(x_n) ∧ x_1 ≠ x_2 ∧ . . . ∧ x_1 ≠ x_n ∧ . . . ∧ x_{n−1} ≠ x_n), ∃^{=n}x ψ :←→ ∃^{≥n}x ψ ∧ ¬∃^{≥n+1}x ψ (n ∈ N).
I have no strong intuitions as to which one of (M2T ) and (M3T ) should be preferred; both of them could be acceptable as explicantia of “α(x) expresses that x is infinite relative to T ”. For their assessment, let’s also assume that for the predicate “is infinite” used in them it is the case that a is infinite ⇐⇒ ∀n ∈ N (a has at least n elements).23 Now, there are these results: Lemma 1: Let T be a consistent theory in L[∈]. Then (i) (M2T ) ⇐⇒ ∀n (n ∈ N =⇒ T ⊢ ∀x (α(x) −→ ∃≥n y (y ∈ x))).
(ii) If CST ⊆ T (see section 4.2), then (M3T ) is false. Proof: See [14].
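To make the counting quantifier from footnote 22 concrete, the following minimal sketch evaluates “∃≥n y (y ∈ x)” in a small finite structure. Everything named here (`at_least`, `exactly`, `DOMAIN`, and the relation `E` interpreting “∈”) is a hypothetical illustration and not part of the text; it only shows that the set {b | M |= y ∈ x[x : a, y : b]} is read off from the interpretation of “∈”, whatever that interpretation happens to be.

```python
from itertools import combinations

def at_least(n, domain, psi):
    """Footnote 22, read literally: n pairwise distinct witnesses of psi exist."""
    # combinations() picks n distinct positions of `domain`, so the witnesses
    # are pairwise distinct as long as `domain` has no repeated entries.
    return any(all(psi(b) for b in combo) for combo in combinations(domain, n))

def exactly(n, domain, psi):
    """Exists-exactly-n: at least n, but not at least n + 1."""
    return at_least(n, domain, psi) and not at_least(n + 1, domain, psi)

# Hypothetical structure M: five objects, with E interpreting the predicate "∈".
DOMAIN = [0, 1, 2, 3, 4]
E = {(0, 3), (1, 3), (2, 3)}  # (b, a) in E means M |= y ∈ x [x : a, y : b]

def members_of(a):
    """The condition 'b stands in the E-relation to a', as a predicate on b."""
    return lambda b: (b, a) in E

has_three = at_least(3, DOMAIN, members_of(3))
has_four = at_least(4, DOMAIN, members_of(3))
```

In a finite structure such as this one, “∃≥n y (y ∈ x)” fails for all sufficiently large n, so no object of it can meet the condition stated in the consequent of (M2T ).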
That is, on the one hand, (M2T ) may be acceptable, but does not deliver anything new when compared with the definiens of “α(x) expresses that x is infinite relative to T ” put forward in the syntactically minded approach from (2). (M3T ), on the other hand, should be rejected: it is false in too many cases where it should be true. In my opinion, the upshot of this section’s considerations is then that, whether from a syntactic or from a model theoretic point of view, an explicans of “α(x) expresses that x is infinite relative to T ” (for T in L[∈]) should at least imply
(M2T *) ∀n (n ∈ N =⇒ T ⊢ ∀x (α(x) −→ ∃≥n y (y ∈ x))).
23 (M1T ) to (M3T ) share a trait: if they are to convey what they are supposed to, the predicate “is infinite” used in them has to express infinity. Now this predicate belongs to some metalanguage ML[T] of L[T]. How is it provided for that these occurrences of “a is infinite” express that a is infinite? And how is “β(x) expresses that x is infinite” explained anyway, if β is a formula belonging to ML[T]? Hopefully not by a version of (M1T ) to (M3T ) which is part of a metalanguage of ML[T], if we do not want to enter into a regress.
In light of this, my reaction to (E1) and (E2) is: when relativized appropriately – for example, to ZF – they are true because for each n ∈ N, ZF ⊢ ∀x (x is infinite −→ ∃≥n y (y ∈ x)) and ZF ⊢ ∀x (x is Dedekind-infinite −→ ∃≥n y (y ∈ x)). (M2T *) is a necessary condition for α(x) to express that x is infinite relative to T . But is it sufficient? And do alternatives exist? These questions are addressed in the following section. In a sense, the above line of thought will be repeated in it: in order to determine whether a formula α expresses A (e.g., that x is infinite) relative to T , find out whether T proves sentences containing α which are axioms appropriate to A. It is only that the axioms from the next section will be quite different from the sentences “∀x (α(x) −→ ∃≥n y (y ∈ x))”.
4 Axioms for and definitions of “finite”
4.1 Axioms for “finite” relative to set theories
It can hardly be doubted that there are sentences containing “finite”24 which many would accept already because of the way they use or believe they use this predicate. In the context of set theoretic languages, these sentences should be among them: finite(∅), ∀x finite({x}), ∀xy (finite(x) ∧ finite(y) −→ finite(x ∪ y)).
In addition, it will probably be assumed that only the sets which are obtained in the way addressed in these principles are finite. This is a minimality condition, and the postulation of an induction schema (see, e.g., (Ax2.Ind) below) is the common method to capture it.25 Now let’s view this list as a set of axioms for “x is finite”. That is, consider the first order language L[∈, F ], which is L[∈] extended by the 1-place predicate “F ”. The use of “F ”, which is supposed to be read “is finite”, is determined by the following axiom system in L[∈, F ].
(AxFin2)
(Ax2.i) F (∅)
(Ax2.ii) ∀x F ({x})
(Ax2.iii) ∀xy (F (x) ∧ F (y) −→ F (x ∪ y))
(Ax2.Ind) {(Ind2)ψ | ψ is a formula from L[∈, F ]}.
Here, (Ind2)ψ ≡ AxFin2[ψ] −→ ∀x (F (x) −→ ψ(x)), with AxFin2[ψ] ≡ ψ(∅) ∧ ∀x ψ({x}) ∧ ∀xy (ψ(x) ∧ ψ(y) −→ ψ(x ∪ y)).
24 The reason for dealing mainly with “x is finite” instead of “x is infinite” from here on is that it is easier to find convincing axioms for finiteness than for infinity (possibly because in set theories such as ZF, there is provably no universal set). Accordingly, I will state explicantia for “α(x) expresses that x is finite”, but nonetheless eventually strive for explications of “T makes an assumption of infinity”.
25 As a matter of fact, this list (and equivalent ones) has been discovered and accepted time and again. See in particular [3]. We seem to have a considerable stability of intuitions here.
Let’s also consider a variant of the above axiom system (stated, again, in L[∈, F ]), which seems to be just as good:
(AxFin3)
(Ax3.i) F (∅)
(Ax3.ii) ∀xy (F (x) −→ F (x ∪ {y}))
(Ax3.Ind) {(Ind3)ψ | ψ in L[∈, F ]}.
Here, (Ind3)ψ ≡ AxFin3[ψ] −→ ∀x (F (x) −→ ψ(x)), with AxFin3[ψ] ≡ ψ(∅) ∧ ∀xy (ψ(x) −→ ψ(x ∪ {y})).
From the many theorems and metatheorems that could be proved now, let me mention merely those which are of particular relevance for the purposes of this paper (see [14] for a fuller development). First and foremost, the two axiom systems (AxFin2) and (AxFin3) result in the same theory.26
Theorem 1: Let T be a theory in L[∈] which proves the axioms of extensionality, union and pairing and the existence of ∅. Then: (i) T + (AxFin3) ⊢ (AxFin2), (ii) T + (AxFin2) ⊢ (AxFin3).
Proof: (i) • (AxFin3) ⊢ (Ax2.iii) – Set
ψ(x) :←→ ∀y (F (y) −→ F (x ∪ y)).
Then — (AxFin3) ⊢ ψ(∅) – since (AxFin3) ⊢ ∀y (F (y) −→ F (∅ ∪ y)). — (AxFin3) ⊢ ψ(x) −→ ψ(x ∪ {z}) – this is (AxFin3) ⊢ ∀y(F (y) −→ F (x∪y)) −→ ∀y(F (y) −→ F(x∪y∪{z})), which holds by (Ax3.ii).
26 Sometimes, it is easier to check the axioms of (AxFin3); an example can be found in the appendix.
With (Ind3)ψ , it follows that (AxFin3) ⊢ ∀x(F (x) −→ ψ(x)), the desired claim. • (AxFin3) ⊢ (Ax2.Ind) – Let ψ be a formula from L[∈, F ]; then
(AxFin3) ⊢ AxF in2[ψ] −→ [ψ(∅) ∧ ∀xy (ψ(x) −→ ψ(x ∪ {y}))], whence by using (Ind3)ψ , (AxFin3) ⊢ (Ind2)ψ .
(ii) • (AxFin2) ⊢ (Ax3.Ind) – Let ψ be a formula from L[∈, F ], and set ϕ(x) :←→ ∀z (ψ(z) −→ ψ(z ∪ x)). Then (a) (AxFin2) ⊢ ϕ(∅) – since (AxFin2) ⊢ ∀z(ψ(z) −→ ψ(z ∪ ∅)). (b) (AxFin2) ⊢ AxF in3[ψ] −→ ∀xz(ψ(z) −→ ψ(z ∪ {x})) −→ ∀xϕ({x}). Moreover,
(AxFin2) ⊢ ϕ(x) ∧ ϕ(y) ∧ ψ(z)
−→ ∀z (ψ(z) −→ ψ(z ∪ x)) ∧ ∀z (ψ(z) −→ ψ(z ∪ y)) ∧ ψ(z)
−→ ψ(z ∪ x) ∧ ∀z (ψ(z) −→ ψ(z ∪ y))
−→ ψ(z ∪ (x ∪ y)),
whence (AxFin2) ⊢ ϕ(x) ∧ ϕ(y) −→ ∀z (ψ(z) −→ ψ(z ∪ (x ∪ y))), that is (c) (AxFin2) ⊢ ∀xy (ϕ(x) ∧ ϕ(y) −→ ϕ(x ∪ y)). Now (a) to (c) yield (AxFin2) ⊢ AxFin3[ψ] −→ AxFin2[ϕ], which with (Ind2)ϕ implies (AxFin2) ⊢ AxFin3[ψ] −→ ∀x (F (x) −→ ϕ(x)). But this is (AxFin2) ⊢ AxFin3[ψ] −→ ∀x (F (x) −→ ∀z (ψ(z) −→ ψ(z ∪ x))), and therefore one obtains (with ψ(∅)) (AxFin2) ⊢ AxFin3[ψ] −→ ∀x (F (x) −→ ψ(x)).
Moreover, we have a result to the effect that in some sense, the use of “F ” is fixed by the axioms: a “characterization lemma”. By this I mean the following:
Lemma 2: (i) Let T be a theory in L[∈, F ] which proves (AxFin2), and let α(x) be a formula from L[∈, F ], such that T ⊢ AxFin2[α], and T ⊢ AxFin2[ψ] −→ ∀x (α(x) −→ ψ(x)), for each formula ψ from L[∈, F ]. Then T ⊢ ∀x (α(x) ←→ F (x)).
(ii) Let T be a theory in L[∈] and let α(x), α′ (x) be formulas from L[∈], such that T ⊢ AxF in2[α], T ⊢ AxF in2[α′ ], and T ⊢ AxF in2[ψ] −→ ∀x (α(x) −→ ψ(x)) and T ⊢ AxF in2[ψ] −→ ∀x (α′ (x) −→ ψ(x)), for each formula ψ from L[∈]. Then T ⊢ ∀x (α(x) ←→ α′ (x)).
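As a finite sanity check of the idea behind Theorem 1 (it is in no way a substitute for the proof), the closure clauses of (AxFin2) and (AxFin3) can be compared over a small universe. The sketch below is an illustration only: the function names `closure_axfin2` and `closure_axfin3` and the universe `U` are hypothetical, Python frozensets stand in for sets, and the singleton and adjunction clauses are restricted to the members of `U`, which is an assumption of the sketch and not of the axioms.

```python
from itertools import product

def closure_axfin2(universe):
    """Least family containing ∅ and every singleton {u}, closed under binary union."""
    family = {frozenset()} | {frozenset([u]) for u in universe}
    while True:
        new = {a | b for a, b in product(family, repeat=2)} - family
        if not new:
            return family
        family |= new

def closure_axfin3(universe):
    """Least family containing ∅, closed under adjunction x ∪ {u}."""
    family = {frozenset()}
    while True:
        new = {a | {u} for a in family for u in universe} - family
        if not new:
            return family
        family |= new

U = ["a", "b", "c"]  # hypothetical three-element universe
same = closure_axfin2(U) == closure_axfin3(U)
```

For this `U`, both closures are the full power set of `U`, so the two sets of generating clauses single out the same family here.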
Finally, the thoughts expressed in section 3.2 can be understood as an argument that it is intuitively plausible that for each n ∈ N, “∀x (¬F (x) −→ ∃≥n y (y ∈ x))” should be the case. They thus point to a further set of axioms for finiteness, stated in L[∈, F ]:
(AxFin1) {⌜∀x (¬F (x) −→ ∃≥n y (y ∈ x))⌝ | n ∈ N}.
(AxFin1) is quite different from (AxFin2) and (AxFin3); but there is a connection between it and them: Lemma 3: Let T be a theory in L[∈] which proves the axioms of union and pairing. Then (AxFin1) ⊆ T + {(Ax2.i), (Ax2.ii), (Ax2.iii)}.27
27 The converse inclusion does not hold.
4.2 Definitions of “finite” relative to set theories
With axiom systems (AxFin2) and (AxFin3) at hand, we now have a further means to determine whether a suggested explicans of “x is finite” is adequate. Let’s apply this to (D1) and (D2). To start with, we have, indeed, ZF ⊢ AxFin2[finite] and for each formula ψ in L[∈], ZF ⊢ AxFin2[ψ] −→ ∀x (x is finite −→ ψ(x)). But the following is false: ZF ⊢ AxFin2[Dedekind-finite] and for each formula ψ in L[∈], ZF ⊢ AxFin2[ψ] −→ ∀x (x is Dedekind-finite −→ ψ(x)). For a correct result, ZF has to be replaced here by, e.g., ZFC (see [2]). In addition, even when “x is finite” (as defined in (D2)) is employed, it is not clear whether we obtain the right results relative to set theories which do not prove that ω exists. Now, as has been pointed out before (see especially [3]), it would be particularly agreeable to have an adequate set theoretical treatment of finiteness in a set theory which does not prove the axiom of infinity. Moreover, if such a weak core set theory CST could be found, we would have a good chance to avoid the above mentioned dependency of “α(x) expresses that x is infinite relative to T ” on the theory T chosen in each particular case: for the same α could work for each T extending CST. How weak CST could be and still qualify as a set theory is not settled. Personally, I would without hesitation put the axiom of extensionality in it; in addition, the axioms of pairing and of union28 seem to be beyond dispute; and at least simple instances of the separation scheme should also be included in T , if it deserves to be called a set theory. Accordingly, I accept this determination of CST for the purposes of this paper: CST := the deductive closure of the axioms of extensionality, pairing and union and of the separation schema.29
Coming back to the set theoretical definitions of “x is finite”, it is obvious that the definientia of (D1) and (D2) are quite dissimilar to (AxFin2) and (AxFin3). Now shouldn’t it be possible to find definientia of “x is finite” in L[∈] for which one can see at a glance that they deliver, say, (AxFin2)? Surely, axioms have to be clearly distinguished from definitions and definientia. And in general, there is no reason why it should be possible to rewrite axioms as definientia. But the axioms from (AxFin2) have a specific form: the form of the clauses of an inductive definition in which “F ” occurs only positively. Thus, the well known method of transforming an inductive definition into an explicit set theoretic one suggests the following definition of “x is finite”:
Definition 1: Cl(y) :←→ ∅ ∈ y ∧ ∀z ({z} ∈ y) ∧ ∀uv (u ∈ y ∧ v ∈ y −→ u ∪ v ∈ y),
x is finite_ind :←→ ∀y (Cl(y) −→ x ∈ y).
This definition is highly plausible; but it is flawed. For it can be shown that, e.g., Z ⊢ ∀x (x is finite_ind); and not for each ψ ∈ L[∈], ZF ⊢ AxFin2[ψ] −→ ∀x (x is finite_ind −→ ψ(x)). A remedy (which probably is due to Kuratowski; see, e.g., [16] and [5]) has been known for some time. This can be further simplified to:
Definition 2: Cl+(y, x) :←→ ∅ ∈ y ∧ ∀z (z ∈ x −→ {z} ∈ y) ∧ ∀uv (u ∈ y ∧ v ∈ y −→ u ∪ v ∈ y),
x is finite_I :←→ ∀y (Cl+(y, x) −→ x ∈ y).
With this definition, we get the desired result:
Lemma 4: Set α(x) ≡ “x is finite_I”; then CST + Power ⊢ AxFin2[α], and for each formula ψ in L[∈], CST + Power ⊢ AxFin2[ψ] −→ ∀x (α(x) −→ ψ(x)).
Proof: See [14].
28 By this I mean “∀xy∃z∀u (u ∈ z ←→ u ∈ x ∨ u ∈ y)”. The more general version which postulates the existence of unions for arbitrary sets I call “axiom of big-unions”.
29 Let me note that CST does not imply the axioms of infinity, of power set (called “Power” here), of foundation, of replacement, of choice, and not even the axiom of big-unions.
What I still take to be somewhat unsatisfactory about this lemma is that the theory mentioned in it includes the power set axiom. Deplorably, I see no way to avoid its use in the proof of Lemma 4. But one can do better and get along without the power set axiom if an entirely different set theoretic definiens of “x is finite” is employed. Thus, consider the formula “SFin(x)” defined by “∃y (ω(y) ∧ y ≃ x)”; precise definitions of the components of the definiens can be found in the appendix. Here we have this metatheorem:
Theorem 2: Set α(x) ≡ “SFin(x)”; then CST ⊢ AxFin3[α], and for each formula ψ in L[∈], CST ⊢ AxFin3[ψ] −→ ∀x (α(x) −→ ψ(x)).
Proof: See the appendix.
Corollary 1: Let T be a consistent extension of CST in L[∈] and let α(x) be a formula from L[∈], such that T ⊢ AxFin3[α], and for each formula ψ from L[∈] T ⊢ AxFin3[ψ] −→ ∀x (α(x) −→ ψ(x)).
Then T ⊢ ∀x (α(x) ←→ SF in(x)).
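The definiens of Definition 2 can be tried out by brute force on a very small set. The following sketch is only a toy check under stated assumptions: the helper names (`powerset`, `cl_plus`, `finite_I_restricted`) are hypothetical, Python frozensets stand in for sets, and the quantifier “∀y” is restricted to families of subsets of x, which is a simplification made here and not part of the definition.

```python
from itertools import chain, combinations

def powerset(s):
    """All subsets of s, as a list of frozensets."""
    items = list(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def cl_plus(y, x):
    """Cl+(y, x): ∅ ∈ y, {z} ∈ y for every z ∈ x, and y closed under binary union."""
    return (frozenset() in y
            and all(frozenset([z]) in y for z in x)
            and all((u | v) in y for u in y for v in y))

def finite_I_restricted(x):
    """x ∈ y for every family y of subsets of x with Cl+(y, x).

    Assumption of this sketch: y ranges only over families of subsets of x.
    """
    families = powerset(powerset(x))
    return all(x in y for y in families if cl_plus(y, x))

x = frozenset({0, 1})
result = finite_I_restricted(x)
```

Among the sixteen families of subsets of this two-element x, only the full power set satisfies Cl+, and it contains x. The restriction “z ∈ x” in the singleton clause is what makes such a closed family available at all; the unrestricted clause of Definition 1 would demand a set containing every singleton.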
4.3 Axioms for “finite” relative to mereological theories
This and the next subsection are about mereological theories, or calculi of individuals. Let me first review what I understand by “mereological theory”. Then I will state, relative to mereological theories, axioms for “x is finite” which are very similar to the set theoretic ones. Finally, I suggest a mereological definiens of “x is finite” which, surprisingly but fortunately, works as it should.30
L[◦] is a first order language with the 2-place predicate “◦” as its sole non-logical primitive. “x ◦ y” is read “x overlaps y”. L[◦] is supplied with classical first order logic with identity. “⊑”, which is read “part of”, is defined by: x ⊑ y :←→ ∀z (z ◦ x −→ z ◦ y). Moreover, by axiom or by definition, “x = y ←→ x ⊑ y ∧ y ⊑ x” is supposed to be provable. In this language, several axioms can be stated which I assume to be correct under the above mentioned reading of “◦”. First, there are
O ∀xy (x ◦ y ←→ ∃z (z ⊑ x ∧ z ⊑ y)),
SUM ∀xy∃z∀u (u ◦ z ←→ u ◦ x ∨ u ◦ y),
NEG ∀x (¬∀v v ◦ x −→ ∃y∀v (v ⊑ y ←→ ¬v ◦ x)).
I call the theory axiomatized by these sentences “CI”. Often, instances of the so-called fusion-schema FUS are also included in mereological theories.
30 For additional information on calculi of individuals, see, e.g., [13]. This paper also contains an outline of the mereological treatment of finiteness presented here in sections 4.3 and 4.4.
By employing again the common procedure of identifying a schema with the set of “its instances”, FUS can be precisely determined as follows: let ψ be a formula in L[◦]; then set
FUSψ : ∃x ψ −→ ∃z∀y (z ◦ y ←→ ∃x (x ◦ y ∧ ψ)) (y, z being new variables),
FUS := {FUSψ | ψ is an L[◦]-formula}.
A particular instance of FUS, called “FUSAt ”, will play a prominent role in this investigation. It is obtained by choosing “At(x)” for ψ(x) in FUS, i.e.
FUSAt : ∃x At(x) −→ ∃z∀y (z ◦ y ←→ ∃x (x ◦ y ∧ At(x))), where At(x) :←→ ∀y (y ⊑ x −→ x ⊑ y).
Like CST in the set theoretic case, the theory CI + FUSAt will be the core of all the consistent theories formulated in L[◦] which I regard as mereological ones. Among its prominent extensions (in L[◦]) are these theories:
ACI := CI + {AT},
FCI := CI + {AF},
MCI := CI + {¬AT, ¬AF},
ACIn+1 := ACI + {∃=n+1 At} (for n ∈ N),
ACIω := ACI + {∃≥n+1 At | n ∈ N},
MCIn+1 := MCI + {∃=n+1 At} (for n ∈ N),
MCIω := MCI + {∃≥n+1 At | n ∈ N},
where AT expresses that the part-relation is atomic and AF expresses that it is atomless:
AT ∀x∃y (y ⊑ x ∧ At(y)),
AF ∀x∃y (y ⊑ x ∧ x ⋢ y).
L[◦] may now be extended by the 1-place predicate “F ” (for “x is finite”) to L[◦, F ]. Here, too, the use of “F ” is determined by axioms (which are now stated in L[◦, F ]).
It is clear that the first axiom system (AxFin1) (from section 4.1) has a counterpart in L[◦, F ]. It is
(AxFinI1) {⌜∀x (¬F (x) −→ ∃≥n y (y ⊑ x))⌝ | n ∈ N}.
Then I suggest a list of axioms which is close to (AxFin2).
(AxFinI2)
(AxI2.i) ∀x (At(x) −→ F (x))
(AxI2.ii) ∀xy (F (x) ∧ F (y) −→ F (x ⊔ y))31
(AxI2.Ind) {(IndI2)ψ | ψ in L[◦, F ]}.
31 We have here the contextual definition z = x ⊔ y :←→ ∀u (u ◦ z ←→ u ◦ x ∨ u ◦ y).
Here, (IndI2)ψ ≡ AxFinI2[ψ] −→ ∀x (F (x) −→ ψ(x)), with AxFinI2[ψ] ≡ ∀x (At(x) −→ ψ(x)) ∧ ∀xy (ψ(x) ∧ ψ(y) −→ ψ(x ⊔ y)). The relation between (AxFin1) and (AxFin2) addressed in Lemma 3 recurs also here:
Lemma 5: (AxFinI1) ⊆ CI + {(AxI2.i), (AxI2.ii)}.
4.4 Definitions of “finite” relative to mereological theories
Since the axioms from (AxFinI2) have, again, the form of clauses of inductive definitions, a first approach to transform them into the explicit version of such an inductive definition could be: IFin(x) :←→ ∀y (ICl(y) −→ x ∈ y), with ICl(y) :←→ ∀z (At(z) −→ z ∈ y) ∧ ∀zz′ (z ∈ y ∧ z′ ∈ y −→ z ⊔ z′ ∈ y).
Of course, since we do not have “∈” at our disposal here, this cannot be done in L[◦]. Although it is plainly superficial, let me nonetheless simply replace “∈” by “⊑” in the definiens just suggested and thereby put forward as a mereologically stated definiens of “x is finite”:
Definition 3: ICl(y) :←→ ∀z (At(z) −→ z ⊑ y) ∧ ∀zz′ (z ⊑ y ∧ z′ ⊑ y −→ z ⊔ z′ ⊑ y),
IFin(x) :←→ ∀y (ICl(y) −→ x ⊑ y).
It is barely credible that this definition should work properly – but it does:32
Theorem 3: Set α(x) ≡ “IFin(x)”; then CI + FUSAt ⊢ AxFinI2[α], and for each formula ψ in L[◦], CI + FUSAt ⊢ AxFinI2[ψ] −→ ∀x (α(x) −→ ψ(x)).
Corollary 2: Let T be a consistent extension of CI + FUSAt in L[◦] and let α(x) be a formula from L[◦], such that T ⊢ AxFinI2[α], and for each formula ψ from L[◦] T ⊢ AxFinI2[ψ] −→ ∀x (α(x) −→ ψ(x)).
Then T ⊢ ∀x (α(x) ←→ IFin(x)).
32 A detailed treatment of the mereological case will be published elsewhere.
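How Definition 3 behaves in a small atomistic structure can be checked mechanically. The sketch below is a hypothetical illustration: it takes the non-empty subsets of three atoms as domain and reads “◦” as “has a common element”, which I take to be a model of CI with exactly three atoms; the Python names (`overlaps`, `part`, `atom`, `is_sum`, `icl`, `ifin`) are mine, and each function simply spells out the corresponding defined predicate of L[◦] over this finite domain.

```python
from itertools import chain, combinations

ATOMS = ("a", "b", "c")  # hypothetical three-atom structure

# Domain: all non-empty subsets of ATOMS (there is no null individual).
DOMAIN = [frozenset(c) for c in chain.from_iterable(
    combinations(ATOMS, r) for r in range(1, len(ATOMS) + 1))]

def overlaps(x, y):
    """Interpretation of the primitive: x and y share an element."""
    return bool(x & y)

def part(x, y):
    """x ⊑ y :<-> for all z (z ◦ x -> z ◦ y)."""
    return all(overlaps(z, y) for z in DOMAIN if overlaps(z, x))

def atom(x):
    """At(x) :<-> for all y (y ⊑ x -> x ⊑ y)."""
    return all(part(x, y) for y in DOMAIN if part(y, x))

def is_sum(z, x, y):
    """z = x ⊔ y :<-> for all u (u ◦ z <-> u ◦ x or u ◦ y)."""
    return all(overlaps(u, z) == (overlaps(u, x) or overlaps(u, y)) for u in DOMAIN)

def icl(y):
    """ICl(y): every atom is part of y, and sums of parts of y are parts of y."""
    atoms_below = all(part(z, y) for z in DOMAIN if atom(z))
    sums_below = all(part(s, y)
                     for z in DOMAIN if part(z, y)
                     for z2 in DOMAIN if part(z2, y)
                     for s in DOMAIN if is_sum(s, z, z2))
    return atoms_below and sums_below

def ifin(x):
    """IFin(x) :<-> for all y (ICl(y) -> x ⊑ y)."""
    return all(part(x, y) for y in DOMAIN if icl(y))

all_ifin = all(ifin(x) for x in DOMAIN)
```

In this structure the only individual satisfying ICl is the sum of all atoms, so every individual satisfies IFin; the structure therefore does not satisfy “∃x ¬IFin(x)”.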
5 Elaboration of (DIiii)
5.1 “α(x) expresses that x is finite”
By recourse to the preparations contained in sections 3 and 4, let me now put forward precise definitions of “α(x) expresses that x is finite relative to T ”, “α(x) expresses that x is finite” and “T makes an assumption of infinity” which are elaborations of (DIiii). For each of these phrases, several definientia are formulated. First, the set theoretical case and the mereological case have to be dealt with separately. Second, in each of them, we have two axiomatizations of “x is finite”. Since there will be amendments, we eventually get eight versions of definientia of “α(x) expresses that x is finite relative to T ”. Third, each of these is carried over into the definitions of “α(x) expresses that x is finite” and “T makes an assumption of infinity”. In what follows, the theories T and the formulas α which are assumed to express finiteness relative to T are formulated in L[∈] or in L[◦]. When the T’s and the α’s occur in one and the same context, they are always supposed to be formulated in the same language. The repeatedly used phrase “T is suitable” is explained as follows: it abbreviates “T is a consistent extension of CST” for theories T in L[∈], and “T is a consistent extension of CI + FUSAt ” for theories T in L[◦].33 I begin with the set theoretic case in (A) and transfer it to the mereological one in (B).
(A) Let α be a formula from L[∈] and T (in L[∈]) be suitable. Set AxFin1[α] ≡ ∀x (¬α(x) −→ ∃≥n y (y ∈ x)). Then
α(x) strongly expresses that x is finite relative to T : ⇐⇒ T ⊢ AxFin2[α] and for each formula ψ in L[∈], T ⊢ AxFin2[ψ] −→ ∀x (α(x) −→ ψ(x)).
α(x) very weakly expresses that x is finite relative to T : ⇐⇒ AxFin1[α] ⊆ T .
Sadly, this is trivial: take “x = x” for α(x); since for each n ∈ N, “∀x (x ≠ x −→ ∃≥n y (y ∈ x))” is already a logical truth, “x = x” very weakly expresses that x is finite relative to each T formulated in L[∈]. But this is not intended. A way out of this difficulty is to add “T ⊬ ∀xα(x)” to the second definiens. Since the same can be done with respect to the first one, we have to consider two further definitions:
α(x) very strongly expresses that x is finite relative to T : ⇐⇒ T ⊬ ∀xα(x) and T ⊢ AxFin2[α] and for each formula ψ in L[∈], T ⊢ AxFin2[ψ] −→ ∀x (α(x) −→ ψ(x)).
α(x) weakly expresses that x is finite relative to T : ⇐⇒ T ⊬ ∀xα(x) and AxFin1[α] ⊆ T .
33 The definitions of this section could, in principle, be extended to theories stated in other languages L, and “T is suitable” could be explained for them, too: one merely needs a class C of distinguished theories in L, axioms for finiteness and definitions which are adequate to these axioms relative to C. Many of the theorems of section 5.3 could then easily be transferred to this new context: for they often rest only on the assumption that the set of suitable theories contains only consistent theories and is closed under extension.
(B) Let α be a formula from L[◦] and T (in L[◦]) be suitable. Set AxFinI1[α] ≡ ∀x (¬α(x) −→ ∃≥n y (y ⊑ x)).
Similarly to the set theoretic case, we have four possible explicantia:
α(x) strongly expresses that x is finite relative to T : ⇐⇒ T ⊢ AxFinI2[α] and for each formula ψ in L[◦], T ⊢ AxFinI2[ψ] −→ ∀x (α(x) −→ ψ(x)).
α(x) very weakly expresses that x is finite relative to T : ⇐⇒ AxFinI1[α] ⊆ T .
α(x) very strongly expresses that x is finite relative to T : ⇐⇒ T ⊬ ∀xα(x) and T ⊢ AxFinI2[α] and for each formula ψ in L[◦], T ⊢ AxFinI2[ψ] −→ ∀x (α(x) −→ ψ(x)).
α(x) weakly expresses that x is finite relative to T : ⇐⇒ T ⊬ ∀xα(x) and AxFinI1[α] ⊆ T .
Because of their dependency on theories T , these are certainly not the originally aimed-at definitions. Now, there is a well-known method for what to do in such a situation: get rid of the disturbing additional parameters by quantifying over them. Usually, this manoeuvre doesn’t lead anywhere, however. For if it did, a relativization to theories would probably have been superfluous from the beginning. But see section 5.3 for more.
α(x) universally very strongly expresses that x is finite : ⇐⇒ ∀T (T is suitable =⇒ α(x) very strongly expresses that x is finite relative to T ).
α(x) universally strongly expresses that x is finite : ⇐⇒ ∀T (T is suitable =⇒ α(x) strongly expresses that x is finite relative to T ).
α(x) universally weakly expresses that x is finite : ⇐⇒ ∀T (T is suitable =⇒ α(x) weakly expresses that x is finite relative to T ).
α(x) universally very weakly expresses that x is finite : ⇐⇒ ∀T (T is suitable =⇒ α(x) very weakly expresses that x is finite relative to T ).
α(x) existentially very strongly expresses that x is finite : ⇐⇒ ∃T (T is suitable ∧ α(x) very strongly expresses that x is finite relative to T ).
α(x) existentially strongly expresses that x is finite : ⇐⇒ ∃T (T is suitable ∧ α(x) strongly expresses that x is finite relative to T ).
α(x) existentially weakly expresses that x is finite : ⇐⇒ ∃T (T is suitable ∧ α(x) weakly expresses that x is finite relative to T ).
α(x) existentially very weakly expresses that x is finite : ⇐⇒ ∃T (T is suitable ∧ α(x) very weakly expresses that x is finite relative to T ).
5.2 “T makes an assumption of infinity”
Given the general considerations from section 3 and the precise versions of “α(x) expresses that x is finite” just presented, the following definitions of “T makes an assumption of infinity” in the style of (DIiii) suggest themselves. In their definientia, let L be L[∈] and T be formulated in L[∈], or let L be L[◦] and T be formulated in L[◦].
T makes a universal very strong assumption of infinity : ⇐⇒ ∃α (α ∈ L ∧ T ⊢ ∃x¬α(x) ∧ α(x) universally very strongly expresses that x is finite).
T makes a universal strong assumption of infinity : ⇐⇒ ∃α (α ∈ L ∧ T ⊢ ∃x¬α(x) ∧ α(x) universally strongly expresses that x is finite).
T makes a universal weak assumption of infinity : ⇐⇒ ∃α (α ∈ L ∧ T ⊢ ∃x¬α(x) ∧ α(x) universally weakly expresses that x is finite).
T makes a universal very weak assumption of infinity : ⇐⇒ ∃α (α ∈ L ∧ T ⊢ ∃x¬α(x) ∧ α(x) universally very weakly expresses that x is finite).
T makes an existential very strong assumption of infinity : ⇐⇒ ∃α (α ∈ L ∧ T ⊢ ∃x¬α(x) ∧ α(x) existentially very strongly expresses that x is finite).
T makes an existential strong assumption of infinity : ⇐⇒ ∃α (α ∈ L ∧ T ⊢ ∃x¬α(x) ∧ α(x) existentially strongly expresses that x is finite).
T makes an existential weak assumption of infinity : ⇐⇒ ∃α (α ∈ L ∧ T ⊢ ∃x¬α(x) ∧ α(x) existentially weakly expresses that x is finite).
T makes an existential very weak assumption of infinity : ⇐⇒ ∃α (α ∈ L ∧ T ⊢ ∃x¬α(x) ∧ α(x) existentially very weakly expresses that x is finite).
Alternatively, it could also seem reasonable that the theory which has to prove “there is an infinite object” in order to make an assumption of infinity should be the same relative to which “there is an infinite object” expresses that there is an infinite object. Given the type of definitions favoured here, this leads to two additional options: T makes a strong assumption of infinity : ⇐⇒ ∃α (α ∈ L ∧ T ⊢ ∃x¬α(x) ∧ α(x) strongly expresses that x is finite relative to T ). T makes a weak assumption of infinity : ⇐⇒ ∃α (α ∈ L ∧ T ⊢ ∃x¬α(x) ∧ α(x) weakly expresses that x is finite relative to T ).34
5.3 Evaluating the definitions
It is probably not easy to see at a glance what the strengths and weaknesses of these definitions are. Thus, let me supply some metatheorems which should be of help for their evaluation.35
Lemma 6: Let T be suitable. Then:
(i) if T makes a universal strong assumption of infinity, then T makes a strong assumption of infinity and T makes a universal very weak assumption of infinity;
(ii) if T makes a strong assumption of infinity, then T makes an existential very strong assumption of infinity and T makes a weak assumption of infinity;
(iii) if T makes an existential very strong assumption of infinity, then T makes an existential strong assumption of infinity and T makes an existential weak assumption of infinity;
(iv) if T makes an existential strong assumption of infinity, then T makes an existential very weak assumption of infinity;
(v) if T makes a universal very weak assumption of infinity, then T makes a weak assumption of infinity;
(vi) if T makes a weak assumption of infinity, then T makes an existential weak assumption of infinity;
(vii) if T makes an existential weak assumption of infinity, then T makes an existential very weak assumption of infinity.36
34 Two further possible definitions, i.e. T makes a very strong assumption of infinity : ⇐⇒ ∃α (α ∈ L ∧ T ⊢ ∃x¬α(x) ∧ α(x) very strongly expresses that x is finite relative to T ), and T makes a very weak assumption of infinity : ⇐⇒ ∃α (α ∈ L ∧ T ⊢ ∃x¬α(x) ∧ α(x) very weakly expresses that x is finite relative to T ) are redundant. For “T makes a strong assumption of infinity” and “T makes a very strong assumption of infinity” are equivalent, and “T makes a weak assumption of infinity” and “T makes a very weak assumption of infinity” are equivalent.
35 Some of the following lemmata have already been stated (more or less) in [14].
Corollary 3: (i) “SFin(x)” universally strongly expresses that x is finite (with respect to L[∈]). (ii) “IFin(x)” universally strongly expresses that x is finite (with respect to L[◦]).
Proof: (i) By Theorem 2.37 (ii) By Theorem 3.
Theorem 4: (i) For suitable T in L[∈], the following are equivalent: T ⊢ ∃x¬SFin(x); T makes a universal strong assumption of infinity; T makes a strong assumption of infinity. (ii) For suitable T in L[◦], the following are equivalent: T ⊢ ∃x¬IFin(x); T makes a universal strong assumption of infinity; T makes a strong assumption of infinity.
Proof: See [14].
Example 1: (i) Z and ZF make a universal strong assumption of infinity. (ii)
FCI, MCIn (for each n ≥ 1) and MCIω + FUSAt make a universal strong assumption of infinity. (iii) For each n ≥ 1, ACIn does not make a strong assumption of infinity.
Example 2: ACIω does not make a strong assumption of infinity.
Lemma 7: Let T be suitable. Then:
(i) if T is maximally consistent and α(x) weakly expresses that x is finite relative to T , then each model of T is infinite;
(ii) if T is maximally consistent and T makes a weak assumption of infinity, then each model of T is infinite;
(iii) if T makes a weak assumption of infinity, then each model of T is infinite.
Proof: (i) Let M |= T , and assume α(x) weakly expresses that x is finite relative to T . Then T ⊬ ¬∃x¬α(x), and since T is supposed to be maximally consistent, T ⊢ ∃x¬α(x). But then for each n ∈ N, T ⊢ ∃x∃≥n y (y ∈ x) or T ⊢ ∃x∃≥n y (y ⊑ x). Therefore, M must be infinite.
(ii) This follows directly from (i).
(iii) Assume that T makes a weak assumption of infinity, M |= T and M is finite. Consider Th(M). This is a maximally consistent suitable theory which also makes a weak assumption of infinity. Thus, by (ii), Th(M) has only infinite models. But by assumption, M is a finite model of that theory. Contradiction.
36 Several of the converse directions do not hold; this can be shown by recourse to the mereological examples presented in this section.
37 This is a strengthening of a result from [14].
Example 1(iii), continued: For each n ≥ 1, ACIn does not make a weak assumption of infinity.
Example 3: CST, CST + Power, ZF – INF(∈), ZF – INF(∈) + ¬INF(∈) do not make a weak assumption of infinity. Proof: See [14].
Lemma 8: (i) ∃T (T is suitable ∧ T makes a universal weak assumption of infinity) =⇒ ∀T (T suitable =⇒ T makes a weak assumption of infinity). (ii) ∃T (T is suitable ∧ T makes a universal very strong assumption of infinity) =⇒ ∀T (T suitable =⇒ T makes a weak assumption of infinity).
Proof: See [14].
Theorem 5: (i) No suitable theory T makes a universal very strong assumption of infinity. (ii) No suitable theory T makes a universal weak assumption of infinity.
Proof: By Lemmata 8(i) and 8(ii), and by Example 1(iii) and Example 3.
The next lemmata are concerned especially with theories stated in L[◦].
Lemma 9: Let T be suitable in L[◦]. Then: (i) if all models of T are infinite, then “¬∀y y ⊑ x” weakly expresses that x is finite relative to T ; (ii) T makes a weak assumption of infinity ⇐⇒ all models of T are infinite.
Proof: (i) Set α(x) ≡ ¬∀y y ⊑ x. Let T be suitable, having only infinite models. Then (a) T ⊬ ∀xα(x).
Moreover, assume M |= T , and let a ∈ M . Then if M |= ¬α(x)[x : a], i.e., if M |= ∀y y ⊑ x[x : a], then a = 1M (= the maximal element of M ). Now since M is infinite and each element of M stands in the ⊑M -relation to 1M , M |= ∃≥n y (y ⊑ x)[x : a] for each n ∈ N.
This shows (b) “AxFinI1[α] ⊆ T ”, which together with (a) establishes the claim. (ii) “=⇒”: Lemma 7(iii). “⇐=”: Set α(x) ≡ ¬∀y y ⊑ x. Let T be suitable, having only infinite models. Then (a) T ⊢ ∃x¬α(x); and by (i), (b) α weakly expresses that x is finite relative to T . (a) plus (b) imply that T makes a weak assumption of infinity.
Example 2, continued: ACIω makes a weak assumption of infinity.
Example 2, continued: ACIω does not make a universal very weak assumption of infinity.
Lemma 10: (i) “x ≠ x” very strongly expresses that x is finite relative to FCI. (ii) “x ≠ x” existentially very strongly expresses that x is finite.
Proof: (i) Set α(x) ≡ x ≠ x. Since FCI ⊢ ¬∃x At(x), it is easy to see that FCI ⊢ AxFinI2[α] and that for each formula ψ in L[◦], FCI ⊢ AxFinI2[ψ] −→ ∀x (α(x) −→ ψ(x)). Moreover, FCI ⊬ ∀x x ≠ x, which is FCI ⊬ ∀x α(x). In sum, α(x) very strongly expresses that x is finite relative to FCI. (ii) By (i), FCI is the witness.
Lemma 11: Each suitable theory in L[◦] makes an existential very strong assumption of infinity.
Proof: Let T be suitable, i.e., a consistent extension of CI + FUSAt . Set α(x) ≡ x ≠ x. Then by Lemma 10(ii), α(x) existentially very strongly expresses that x is finite. Moreover, T ⊢ ∃x¬α(x). Therefore, T makes an existential very strong assumption of infinity.
6 The potentially infinite
6.1 Assumptions of the potentially infinite
Let’s now come to the topic of the potentially infinite and deal with the question of how to understand (αP) and (γP). In [12], I have already voiced scepticism as to the prospects of finding convincing explications of “x is potentially infinite” and “T assumes the potentially infinite”. Personally, I simply have no ordinary understanding of these phrases, and I do not find much help in the existing literature on them. It seems that even examples are missing.38
38 I think that those who claim that the set of natural numbers is potentially infinite and that PA and PRA assume only the potentially infinite do nothing else than advocate an uncommon mode of speaking.
Assumptions of Infinity
Whereas the criticism was mainly addressed to (αP) in [12], it is directed against (γP) here. As a matter of fact, I take it that in most of the writings on the potentially infinite, it is (γP) rather than (αP) which is of particular importance. The reason is that those philosophers who are interested in the theme of the potentially infinite are usually drawn to it because they regard it as desirable to avoid assumptions of infinity (i.e., of the actual infinity), yet do not want to be restricted to a mere finitist position. An assumption of merely the potentially infinite seems to be a way out of this quandary:39 it seems to allow you to have your cake and eat it too.
6.2 Suggestions for definitions
Let’s thus look out for possible explicantia of “T assumes the potentially infinite”. A first attempt could be to rewrite the definitions from section 2.2 simply by replacing “infinite” by “potentially infinite” everywhere. By this procedure, we obtain
(DPi) T assumes the potentially infinite : ⇐⇒ ∀M (M |= T =⇒ M is potentially infinite),
(DPii) T assumes the potentially infinite : ⇐⇒ ∀M (M |= T =⇒ ∃x (x ∈ M ∧ x is potentially infinite)),
(DPiii) T assumes the potentially infinite : ⇐⇒ T |= ∃x (x is potentially infinite),
(DPiv) T assumes the potentially infinite : ⇐⇒ ∃M (M |= T ∧ M is potentially infinite),
(DPv) T assumes the potentially infinite : ⇐⇒ T |= ∀x (x is potentially infinite).
Intuitively, none of these definitions forces itself upon me as an explication of “T assumes the potentially infinite”.40 Furthermore, all of them suffer from a common shortcoming: be it as a formula of L[T] or as a formula of ML[T], the phrase “x is potentially infinite” occurring in the above definientia has to be explained first – and it has not been explained. This is in sharp contrast to at least those definitions from sections 2.2 and 2.3 where the predicates “x is infinite” and “x is finite” belong to ML[T].
39 It was once common to believe that the assumption of the actually infinite leads to contradictions, the so called paradoxes of infinity. Through the development of modern set theory, it became clear that those contradictions could be avoided. Nowadays, adherents of the potentially infinite rather claim that assumptions of the (actual) infinite are superfluous, at least for the purposes of arithmetic or analysis; cf. [8].
40 Employing the definientia of “T assumes the potentially infinite” from section 2.3 as models is also of no help.
It is granted that one could try to define “x is potentially infinite” by employing a modal vocabulary. Although not contained in the languages ML[T] taken into account up to this point, it may, of course, be added to them. But at least since Quine’s criticism there is reasonable doubt as to the general understandability of the modal idiom.41 From this point of view, modality free definientia of “T assumes the potentially infinite” should be welcome. I will consider two of them.
(A) Intuitively, it may be left open whether a theory T which assumes the potentially infinite should be allowed to have infinite models. But there should be no finite upper bound for the sizes of its finite models; for if there were such a finite upper bound, T would rather assume merely the finite (cf. (Ad2)). Formally, this idea could be captured as follows: (DPvi)
T assumes the potentially infinite : ⇐⇒ ∀n ∈ N ∃M (M |= T ∧ |M | ≥ n ∧ |M | is finite).
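The definiens of (DPvi) can be tried out on a concrete theory. The following is a minimal sketch, assuming the theory of linear orders as sample theory (this choice is not taken from the text) and using the hypothetical names `is_linear_order`, `finite_model` and `dpvi_holds_up_to`. A bounded search can only test finitely many instances of the quantifier ∀n ∈ N, so the sketch illustrates the condition and does not prove it.

```python
from itertools import product

def is_linear_order(domain, leq):
    """Check the axioms of a reflexive linear order on a finite domain."""
    elems = list(domain)
    reflexive = all(leq(a, a) for a in elems)
    antisymmetric = all(a == b or not (leq(a, b) and leq(b, a))
                        for a, b in product(elems, repeat=2))
    transitive = all(leq(a, c) or not (leq(a, b) and leq(b, c))
                     for a, b, c in product(elems, repeat=3))
    total = all(leq(a, b) or leq(b, a) for a, b in product(elems, repeat=2))
    return reflexive and antisymmetric and transitive and total

def finite_model(n):
    """A finite structure with max(n, 1) elements: {0, ..., n-1} under <=."""
    return list(range(max(n, 1))), (lambda a, b: a <= b)

def dpvi_holds_up_to(bound):
    """Test the definiens of (DPvi) for n = 0, ..., bound only."""
    for n in range(bound + 1):
        domain, leq = finite_model(n)
        # (DPvi) asks for a finite model M of T with |M| >= n.
        if len(domain) < n or not is_linear_order(domain, leq):
            return False
    return True
```

The sample theory also has infinite models; as remarked under (A), (DPvi) leaves this open.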
(B) In a nonstandard understanding of “theory”, a theory T is not regarded as something completed, but as something which undergoes development. If T assumes the potentially infinite, none of its stages is allowed to make an assumption of the (actually) infinite.42 As a precise rendering of this idea, the following has been suggested (see [10], [11] and [7]): (DPvii) T assumes the potentially infinite : ⇐⇒ ∀ϕ (T |= ϕ =⇒ ∃M(M |= ϕ ∧ |M | is finite)).
It is a pity that both of these definitions imply that PL^L_1 (where L is an arbitrary first order language) assumes the potentially infinite.
6.3 Adequacy conditions
The mere fact that the definitions of “T assumes the potentially infinite” put forward so far are not convincing is, of course, no proof that no acceptable ones can be found. But at the same time, if such a definition is regarded as an explication, how can it be decided whether it is adequate? Adequate to what? To our common, pretheoretical understanding of “T assumes the potentially infinite”? For someone like me who doubts the very existence of that understanding, this is no reasonable option. Let me suggest a different approach here. Its starting point is that an assumption of the potentially infinite should have a place of its own. That is: whatever the phrase “T assumes the potentially infinite” may mean, making an assumption of the potentially infinite becomes superfluous if it collapses into making an assumption of the infinite or assuming merely the finite. After all, (γP) has been put forward as a competitor both of (γI) and of (γF).
41 This may be brushed aside when it comes to the mathematical discourse. But here, there is a threat that talk of possibility and necessity becomes dispensable. This is the case if – as it is often assumed, e.g., in the usual application of possible worlds semantics – a mathematical sentence is regarded as necessary, if true.
42 That is, T itself may be construed as a potentially infinite object.
More explicitly, let α(T ) be: ¬(T makes an assumption of infinity) ∧ ¬(T assumes merely the finite) ∧ T assumes the potentially infinite.
As a test for the acceptability of “T assumes the potentially infinite” I thus suggest (if T is a consistent first order theory): (!)
∃T α(T ) is not false already because of principles or definitions accepted for the predicates occurring in α(T ). We will see if this requirement can be fulfilled. But let me first note that from the logical truth of “for arbitrary x, either x is finite or x is infinite”, the falsity of ∃T α(T ) does not follow. Furthermore, it does not follow that for an arbitrary T , either T makes an assumption of infinity or T assumes merely the finite (compare the end of section 1). Now, there are no explicantia of “T assumes the potentially infinite” which have been accepted so far. So how could (!) be assessed? – The idea for an answer is this: whether (!) holds or not also depends on the precise explanation of “T makes an assumption of infinity” and “T assumes merely the finite”. Maybe such explanations alone already suffice for an evaluation of (!). In the rest of this section, this idea is carried out in a specific way. In order to do this, I will not even assume the correctness of definientia of “T makes an assumption of infinity” and “T assumes merely the finite” from sections 2.2 and 2.3. Rather, merely weakenings of them will be put forward as principles for assumptions of infinity and assumptions of merely the finite, to which one or two hopefully uncontroversial principles of a different kind are added. Thus, what I suggest here is (cf. sections 2.2 and 2.3 and the introduction): (Ad1) ∀M (M |= T =⇒ M is infinite) =⇒ T makes an assumption of infinity. (Ad2) ∃k∀M (M |= T =⇒ |M | ≤ k) =⇒ T assumes merely the finite.
(Ad3) T assumes merely the finite ∧ S ⊆ T =⇒ S assumes merely the finite.
(Ad1) to (Ad3) seem to be quite weak; but they are strong enough to have consequences for (!).
Theorem 6: (Ad1) to (Ad3) imply43 for each consistent (first order) theory T : T assumes merely the finite or T makes an assumption of infinity.
43 That is, together with a certain amount of set theory.
Proof: Assume T has a finite model M. M has k elements (for some k ∈ N). Consider Th(M). Since the first order sentence stating that there are exactly k objects holds in M and thus belongs to Th(M), we have
∀M′ (M′ |= Th(M) =⇒ |M′| ≤ k).
Therefore, by (Ad2), Th(M) assumes merely the finite. Now T ⊆ Th(M). By (Ad3), this yields T assumes merely the finite. That is: (*) ∃M (M |= T ∧ M is finite) =⇒ T assumes merely the finite.
Now, we have ∃M (M |= T ∧ M is finite) or ∀M (M |= T =⇒ M is infinite). In the first case, by (*), T assumes merely the finite; in the second case, by (Ad1), T makes an assumption of infinity. Consequence of Theorem 6: (!) is not the case. For by Theorem 6, α(T ) is false for each (consistent, first order) theory T . Moreover, the reasoning for this claim involves only the principles (Ad1) to (Ad3) and set theory. With a slight extension of the above set of principles (Ad1) to (Ad3), it can moreover be shown that the equivalences from (DIi) and (DFiv) (see sections 2.2 and 2.3) are consequences of principles dealing with the assumptions of the infinite and merely the finite. Thus, add (see footnote 19) (Ad4)
¬ (T makes an assumption of infinity ∧ T assumes merely the finite).
Theorem 7: (Ad1) to (Ad4) (plus set theory) imply (DIi) and (DFiv).
Proof: (i) “∀M (M |= T =⇒ M is infinite) =⇒ T makes an assumption of infinity” is (Ad1). (ii) “T makes an assumption of infinity =⇒ ∀M (M |= T =⇒ M is infinite)”: if T makes an assumption of infinity and T has a finite model M, it follows by (*) from Theorem 6 that T assumes merely the finite, too. Yet, this is precluded by (Ad4). (iii) “∃M (M |= T ∧ M is finite) =⇒ T assumes merely the finite” is (*) from Theorem 6. (iv) “T assumes merely the finite =⇒ ∃M (M |= T ∧ M is finite)”: if T assumes merely the finite and each model of T is infinite, it follows by (i) that T makes an assumption of infinity, too. Yet, this is precluded by (Ad4).44
44 There are also versions of the dispensability of “T assumes the potentially infinite” which depend on suggested definitions of “T assumes the potentially infinite”. Here is an example: Corollary 4: (Ad2), (Ad3) plus (DPvi) (plus set theory) imply for each consistent (first order) T : if T assumes the potentially infinite, then T assumes merely the finite. Proof: By (DPvi), if T assumes the potentially infinite, then it has finite models. Together with (*) from the proof of Theorem 6 (which followed set theoretically from (Ad2) and (Ad3)), we therefore obtain that T assumes merely the finite.
7 Conclusion
7.1 Summing up
As a conclusion of this paper, let me bring together the findings of the previous sections and make some comments on them. The leading question of this work is: What could be an (adequate) explicans of “T makes an assumption of infinity”? In section 3, I have provisionally accepted two possible explicantia for it, plus one for “T assumes merely the finite”:
(DIi) T makes an assumption of infinity : ⇐⇒ ∀M (M |= T =⇒ M is infinite), or
(DIiii) T makes an assumption of infinity : ⇐⇒ T |= ∃x (x is infinite).
(DFiv) T assumes merely the finite : ⇐⇒ ∃M (M |= T ∧ M is finite).
On the face of it, (DIi) and (DIiii) are not equivalent: there is a real choice to be made between them, or so it seems. Now, section 6 contains principles for “T makes an assumption of infinity” and “T assumes merely the finite”: (Ad1) to (Ad4). Since these principles have (DIi) and (DFiv) as a consequence, there is no way around (DIi) and (DFiv) if (Ad1) to (Ad4) are accepted. In this case, it seems that (DIiii) cannot be upheld. But as we have seen, (DIiii) is incomplete: it must be refined. Thus, let’s investigate the versions of (DIiii) from section 5.2 and assess by recourse to the metatheorems given in section 5.3 which one of them could be adequate as an explicans of the pretheoretically given “T makes an assumption of infinity”. My understanding is this:
First: By Theorem 5, the predicates “makes a universal very strong assumption of infinity” and “makes a universal weak assumption of infinity” do not apply to any suitable theory. As explicantia of “makes an assumption of infinity”, they should therefore be rejected.
Second: In light of Example 1, and since I intuitively regard Z and ZF, FCI, the MCIn (for each n ≥ 1) and MCIω + FUSAt as making assumptions of infinity, I think that “T makes a universal strong assumption of infinity”, “T makes a strong assumption of infinity” and “T makes a universal very weak assumption of infinity” should have a good chance of being adequate. But then, weaker conditions are also not excluded by these examples.
Third: I take Lemma 7 as suggesting that “makes a weak assumption of infinity” is not too weak (at least not obviously so) to be adequate.
Fourth: Lemma 11 shows that for suitable mereological theories, at least, already “makes an existential very strong assumption of infinity” is too weak to be adequate. For else, a theory like ACI1 , which has merely the one-element model (up to isomorphism) would make an assumption of infinity.45 Fifth: Intuitively, I think that ACIω makes an assumption of infinity: all its models M are infinite, and each of these M contains an element 1M , the maximal element with respect to the part-of relation of M, which has infinitely many parts (in M ). Now it has been noted in section 5.3 that ACIω fails to make a universal strong assumption of infinity, a strong assumption of infinity and even a universal very weak assumption of infinity. But it does make a weak assumption of infinity. Both the fourth and the fifth observation deal only with calculi of individuals. Thus, their impact may be viewed as somewhat limited. Certainly, further examples in other languages have to be found and investigated to gain a better and more appropriate understanding of the different ways (DIiii) could be elaborated. But I take it that the above five comments nonetheless at least point to this: from the refinements of (DIiii) presented in section 5.2, “T makes a weak assumption of infinity” has the best prospects for being convincing as an explicans of “T makes an assumption of infinity”.46 It is remarkable that for suitable theories T , “T makes a weak assumption of infinity” implies the definiens of (DIi); in addition, in the case of mereological theories, the definientia of (DIi) and (DIiii), when explained by “T makes a weak assumption of infinity”, are even equivalent to each other (cf. Lemma 7 and Lemma 9(ii)).
45 Since each suitable theory in L[∈] has an infinite domain, such theories are not of much help at this point.
46 Actually, when the case of ACIω is emphasized, only “T makes a weak assumption of infinity” remains as such an explicans.
7.2 The remaining problem
Before this section is closed, a problem that has been indicated already in the fifth observation has to be addressed. Take a model M of ACIω . As has been pointed out in the proof of Lemma 9, M contains an object 1M such that
(a′) 1M has infinitely many partsM .
Now this can be shown: ACIω ⊢ ∀x IFin(x). Thus
(b′) 1M satisfies “IFin(x)” in M.
But, by Corollary 3,
(c′) “IFin(x)” universally strongly expresses that x is finite.
One thing should be clear: we do not have an antinomy here. Yet, one may have the impression that there is a tension between (a′ ), (b′ ) and (c′ ): in short, it seems that “x is finite” is ascribed by them to an object which is infinite. In what follows, let me develop and discuss the details of a slightly modified and generalized version of this idea. Thus, consider this: (RP) There is a formula α from L[◦] such that for each model M of ACIω , (a) 1M is infinite, (b) 1M satisfies α in M, (c) α expresses that x is finite. (RP) is what I call the remaining problem. If (RP) is the case, we really have a problem – at least, if a certain semantical principle (*) also holds: (*)
If α expresses that x is finite and if b satisfies α in M, then b is finite.
(for each formula α from L[◦], appropriate M and element b of M ). For choose α such that (a) to (c) hold. By (c) and (b), together with (*), we have: 1M is finite. But this contradicts (a); i.e., (RP) and (*) are inconsistent with each other. It might be submitted that (RP) is no logical consequence of (a′) to (c′), and therefore of no relevance to the observation I started with. The first part of this claim is true; but I do not agree with the second. For the step from “1M has infinitely many partsM ” to “1M is infinite” – and back – seems so obvious to me that I will gladly accept it. Moreover, I will also assume that it is allowed to conclude from
(RP-AxFinI2) There is a formula α from L[◦] such that for each model M of ACIω , (a) 1M is infinite, (b) 1M satisfies α in M, (c∗) α strongly expresses that x is finite relative to ACIω .
to (RP), and vice versa.47 With these additional assumptions, (RP) does follow from (a′) to (c′).
47 This assumption is made partly in order to simplify the ensuing argument.
After these partly preparatory comments, let me deal with two more serious suggestions for a way out of (RP). The first is: In distinction from the above-mentioned assumptions, (*) is not that plausible. In fact, what should be reasonable is this semantical principle:
If α expresses that A and α is true, then A.
(for appropriate sentences A and α). And its generalization to formulas α – here restricted to “x is finite”, like in (*) –, i.e.
If α expresses that x is finite and if b satisfies α, then b is finite
(for each appropriate object b) I regard to be just as convincing. Yet none of these should be conflated with (*). Generally, satisfaction should be emphatically distinguished from satisfaction in a model M. How debatable (*) may be can be seen from the repeated examples of “nonintended interpretations” presented in sections 2 and 3. So one answer to the remaining problem is to reject (*).
In the rest of this section, I will discuss an alternative option. It is this: the axiom system (AxFinI2) given for finiteness in section 4.3 is not good enough; we have to search for a better one. Let’s first see how this could free us from the remaining problem. Thus consider a set (AX) of axioms for finiteness in L[◦, F ] which is different from (AxFinI2). Since “T makes an assumption of infinity” is supposed to be explained in the style of (DIiii), definitions which are analogues of those from section 5, but now rest on (AX) in place of (AxFinI1) or (AxFinI2), should be taken into account. Accordingly, let α be a formula (in one free variable) from L[◦], let (AX[α]) be the result of replacing “F ” in (AX) by α, and let T (in L[◦]) be suitable. Then:
α(x) AX-expresses that x is finite relative to T : ⇐⇒ (AX[α]) ⊆ T .
α(x) universally AX-expresses that x is finite : ⇐⇒ ∀T (T is suitable =⇒ α(x) AX-expresses that x is finite relative to T ).
α(x) existentially AX-expresses that x is finite : ⇐⇒ ∃T (T is suitable ∧ α(x) AX-expresses that x is finite relative to T ).
T makes a universal AX-assumption of infinity : ⇐⇒ ∃α (α ∈ L[◦] ∧ T ⊢ ∃x¬α(x) ∧ α(x) universally AX-expresses that x is finite).
T makes an existential AX-assumption of infinity : ⇐⇒ ∃α (α ∈ L[◦] ∧ T ⊢ ∃x¬α(x) ∧ α(x) existentially AX-expresses that x is finite).
T makes an AX-assumption of infinity : ⇐⇒ ∃α (α ∈ L[◦] ∧ T ⊢ ∃x¬α(x) ∧ α(x) AX-expresses that x is finite relative to T ).
But now, it may well be that “IFin(x)” does not AX-express that x is finite relative to ACIω . In this case, “IFin(x)” fails to be a witness for the claim (RP-AX) There is a formula α from L[◦] such that for each model M of ACIω , (a) 1M is infinite, (b) 1M satisfies α in M,
(c∗ ) α AX-expresses that x is finite relative to ACIω , which is the relevant analogue of (RP-AxFinI2) now, and our reason for (RP) has disappeared. This reply to the remaining problem, though not conclusive, may nonetheless seem to be natural and initially attractive enough. On further thought, however, I do not think that it is entirely convincing. For let’s
assume that for some formula α in L[◦] and theory T (in L[◦]), T is suitable and α AX-expresses that x is finite relative to T . This is the plausible assumption anyway. For if it is denied, then, in particular, there exists no suitable T (in L[◦]) which makes a universal AX-assumption of infinity, none which makes an existential AX-assumption of infinity and none which makes an AX-assumption of infinity. Since other explicantia for “T makes an assumption of infinity” than the ones above have not been provided here, it should be concluded that there are no suitable T (in L[◦]) at all which make an assumption of infinity. But this is certainly not an intended result. I think that ACIω , for example, makes an assumption of infinity. Therefore, let’s assume that ACIω makes an AX-assumption of infinity. Then we have (+) there exists a formula α in L[◦] such that (AX[α]) is provable in ACIω .48 Now, whatever (AX) may be, I think that (AxI2.i) and (AxI2.ii) have to be provable from it; given my linguistic intuitions, this is beyond dispute. This together with (+) implies (i) ACIω ⊢ AxFinI2[α]. In addition, by Theorem 3, (ii) ACIω ⊢ AxFinI2[α] −→ ∀x (IFin(x) → α(x)). But (i) plus (ii) yield ACIω ⊢ ∀x (IFin(x) → α(x)); and since ACIω ⊢ ∀x IFin(x), we eventually get
(iii) ACIω ⊢ ∀x α(x).
In sum, that means that there exists some formula α (in L[◦]) such that for each model M of ACIω , 1M is infinite and satisfies α in M (by (iii)). And since by (+), α expresses that x is finite relative to ACIω , α is a witness for the truth of (RP-AX) and finally for (RP). That is, the remaining problem is still with us (if the assumptions mentioned at the beginning of section 7.2 are made).
At this point, my investigation comes to an end. With respect to the main question – “What could an (adequate) explicans of “T makes an assumption of infinity” be?” – its conclusion amounts to this: my initial linguistic intuitions, the derivation of (DIi) from the principles (Ad1) to (Ad4), the five observations from section 7.1 and the discussion of the remaining problem (albeit inconclusive) – all of them suggest the definiens of (DIi) as the answer.49
48 If (AX) is a weakening of the axiom system (AxFinI2), such an α is easy to find: simply take “IFin” for it. Let me add that I wonder what mereological axioms for finiteness could be which are not derivable with the help of (AxFinI2).
49 Parts of this paper were presented at several meetings of our TransCoop group on Imaginary and Ideal Elements and Limit Concepts in Mathematics. I would like to thank all its members, and, in particular, Godehard Link, Gregor Schneider, Sebastian Paasch and an anonymous referee for helpful comments.
8 Appendix: a definition of “x is finite” suitable for weak set theories
This appendix contains a precise statement and a proof of Theorem 2 from section 4.2.50
50 The approach to set theoretical definitions of infinity presented in this appendix can already be found in [1]. In particular, this book contains the definitions from section 8.1. But it seems that in order to obtain the results from sections 8.2 and 8.3, Bernays employs a theory which is stronger than CST.
8.1 The natural numbers in CST
A well known set theoretical definition of “ω” which does not need infinite sets in order to work properly is this:
Definition A1: ω⁻(x) :←→ (x = ∅ ∨ ∃y (x = Sy)) ∧ ∀z (z ∈ x −→ z = ∅ ∨ ∃y (z = Sy))51
51 Sx := x ∪ {x}.
Lemma A1:
(i) CST ⊢ ω⁻(∅)
(ii) CST ⊢ ∀x (ω⁻(x) −→ ω⁻(Sx))
Each element of ω should be an ordinal number. To be on the safe side, the condition of being an ordinal is sometimes simply added to the above definiens. Let’s do that here, too. Moreover, there are two common definientia for “x is an ordinal”: one in which the principle of foundation is built in, and another one where foundation is part of the axiomatic basis of the respective set theory. Since I want to avoid any use of the axiom of foundation, I choose the first option.
Definition A2:
connex(x) :←→ ∀yz (y ∈ x ∧ z ∈ x −→ y ∈ z ∨ z ∈ y ∨ y = z)
trans(x) :←→ ∀y (y ∈ x −→ y ⊆ x)
fund(x) :←→ ∀y (y ≠ ∅ ∧ y ⊆ x −→ ∃z (z ∈ y ∧ y ∩ z = ∅))
Ord(x) :←→ connex(x) ∧ trans(x) ∧ fund(x) (for “x is an ordinal”)
Lemma A2:
(i) CST ⊢ ∀x (fund(x) −→ x ∉ x)
(ii) CST ⊢ Ord(∅)
(iii) CST ⊢ ∀x (Ord(x) −→ Ord(Sx))
(iv) CST ⊢ ∀xy (Ord(x) ∧ y ∈ x −→ Ord(y))52
The above considerations motivate an enhancement of Definition A1 which will be highly useful for my goal.
Definition A3: ω(x) :←→ ω⁻(x) ∧ Ord(x)
Lemma A3:
(i) CST ⊢ ω(∅)
(ii) CST ⊢ ∀x (ω(x) −→ ω(Sx))
(iii) CST ⊢ ∀xu (ω(x) ∧ u ∈ x −→ ω(u))
Proof: (iii) By Lemma A2(iv), we first of all have (*)
CST ⊢ ω(x) ∧ u ∈ x −→ Ord(x) ∧ u ∈ x −→ Ord(u).
Moreover, by the definition of ω, we also have (**.i) CST ⊢ ω(x)∧u ∈ x −→ ∀z(z ∈ x −→ z = ∅∨∃y(z = Sy))∧u ∈ x −→ u = ∅ ∨ ∃y (u = Sy). In addition, CST ⊢ ω(x) ∧ u ∈ x −→ trans(x) ∧ u ∈ x −→ u ⊆ x, whence by an analogue of (**.i) in the 2nd step CST ⊢ ω(x) ∧ u ∈ x ∧ z ∈ u −→ ω(x) ∧ z ∈ x −→ z = ∅ ∨ ∃y (z = Sy),
and therefore (**.ii) CST ⊢ ω(x) ∧ u ∈ x −→ ∀z (z ∈ u −→ z = ∅ ∨ ∃y (z = Sy)). (*), (**.i) and (**.ii) imply the desired result.
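Definitions A1 to A3 can be evaluated directly on small sets. The following is a minimal sketch, assuming that hereditarily finite sets are modelled as nested Python `frozenset` values (an assumption made only for illustration; CST itself is not implemented here). The function names mirror the defined predicates; `numeral` and `zero_or_successor` are hypothetical helpers. Since Python's frozensets are automatically well-founded, `fund` never fails on them; it is spelled out only to mirror Definition A2.

```python
from itertools import combinations

EMPTY = frozenset()

def S(x):
    """Successor Sx := x ∪ {x}."""
    return x | frozenset([x])

def numeral(n):
    """Hypothetical helper: the n-fold successor of ∅."""
    x = EMPTY
    for _ in range(n):
        x = S(x)
    return x

def zero_or_successor(z):
    # If z = Sy, then y ∈ z, so searching the elements of z suffices.
    return z == EMPTY or any(S(y) == z for y in z)

def omega_minus(x):
    """Definition A1."""
    return zero_or_successor(x) and all(zero_or_successor(z) for z in x)

def connex(x):
    return all(y in z or z in y or y == z for y in x for z in x)

def trans(x):
    return all(y <= x for y in x)  # y ⊆ x for each y ∈ x

def fund(x):
    """Every non-empty subset y of x has an element z with y ∩ z = ∅."""
    elems = list(x)
    subsets = (frozenset(c) for r in range(1, len(elems) + 1)
               for c in combinations(elems, r))
    return all(any(not (y & z) for z in y) for y in subsets)

def Ord(x):
    """Definition A2."""
    return connex(x) and trans(x) and fund(x)

def omega(x):
    """Definition A3."""
    return omega_minus(x) and Ord(x)
```

On this representation, `omega(numeral(n))` can be checked for small n, and the closure under elements stated in Lemma A3(iii) can be tested by running `omega` over the elements of a numeral. The subset enumeration in `fund` is exponential, so the sketch is meant for small inputs only.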
Lemma A4: Let ϕ be a formula from L[∈]; then CST ⊢ ω(x) ∧ ϕ(x) −→ ∃x (ω(x) ∧ ϕ(x) ∧ ∀y (y ∈ x −→ ¬ϕ(y)))
Proof: By Lemma A3(ii) in the first step and (unrestricted) separation in the second step we have CST ⊢ ω(x) ∧ ϕ(x) −→ ω(Sx) ∧ x ∈ Sx ∧ ϕ(x) −→ fund(Sx) ∧ ∃u (u = {y ∈ Sx | ϕ(y)} ∧ u ≠ ∅ ∧ u ⊆ Sx). Now this yields CST ⊢ ω(x) ∧ ϕ(x) −→ ∃u (u = {y ∈ Sx | ϕ(y)} ∧ ∃z (z ∈ u ∧ u ∩ z = ∅)), whence (*) CST ⊢ ω(x) ∧ ϕ(x) −→ ∃z (ω(Sx) ∧ z ∈ Sx ∧ ϕ(z) ∧ ∀y (y ∈ z ∧ y ∈ Sx −→ ¬ϕ(y))).
52 See [17] and [2] for some details of the proof.
Moreover, by Lemma A3(iii) (**) CST ⊢ ω(Sx) ∧ z ∈ Sx ∧ ϕ(z) ∧ ∀y (y ∈ z ∧ y ∈ Sx −→ ¬ϕ(y)) −→ ω(z) ∧ ϕ(z) ∧ z ⊆ Sx ∧ ∀y (y ∈ z ∧ y ∈ Sx −→ ¬ϕ(y)) −→ ω(z) ∧ ϕ(z) ∧ ∀y (y ∈ z −→ ¬ϕ(y)). (*) plus (**) imply CST ⊢ ω(x) ∧ ϕ(x) −→ ∃x (ω(x) ∧ ϕ(x) ∧ ∀y (y ∈ x −→ ¬ϕ(y))). Let’s apply these lemmata to establish an induction principle for ω. More precisely, let ψ be a formula from L[∈] and set IHψ := ψ(∅) ∧ ∀x (ω(x) ∧ ψ(x) −→ ψ(Sx)). Theorem A1: CST ⊢ IHψ −→ ∀x (ω(x) −→ ψ(x)) Proof: By the definition of “ω”, we have
CST ⊢ IHψ ∧ω(x)∧¬ψ(x) −→ (x = ∅∨∃y (x = Sy))∧ψ(∅)∧¬ψ(x), and therefore (*) CST ⊢ IHψ ∧ ω(x) ∧ ¬ψ(x) −→ ∃y (x = Sy).
Moreover, we also have by Lemma A3(iii) CST ⊢ IHψ ∧ ω(x) ∧ ¬ψ(x) ∧ ∀y (y ∈ x −→ ψ(y)) ∧ x = Sy −→ y ∈ x ∧ ω(y), whence CST ⊢ IHψ ∧ ω(x) ∧ ¬ψ(x) ∧ ∀y (y ∈ x −→ ψ(y)) ∧ x = Sy −→ ω(y) ∧ ψ(y) ∧ IHψ ,
which by the definition of “IHψ ” yields CST ⊢ IHψ ∧ ω(x) ∧ ¬ψ(x) ∧ ∀y (y ∈ x −→ ψ(y)) ∧ x = Sy −→ ψ(Sy) ∧ x = Sy ∧ ¬ψ(x) −→ ⊥. Now this implies (**) CST ⊢ IHψ ∧ ω(x) ∧ ¬ψ(x) ∧ ∀y (y ∈ x −→ ψ(y)) −→ (∃y (x = Sy) −→ ⊥).
(*) plus (**) have as a consequence: CST ⊢ IHψ ∧ ω(x) ∧ ¬ψ(x) ∧ ∀y (y ∈ x −→ ψ(y)) −→ ⊥, and therefore
CST ⊢ IHψ −→ [∃x (ω(x) ∧ ¬ψ(x) ∧ ∀y (y ∈ x −→ ψ(y))) −→ ⊥]. Together with Lemma A4, this yields CST ⊢ IHψ ∧ ω(x) ∧ ¬ψ(x) −→ ⊥, i.e. CST ⊢ IHψ −→ ∀x (ω(x) −→ ψ(x)).
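The least-element step of Lemma A4, on which the proof of Theorem A1 rests, can be traced on concrete sets. The following is a minimal sketch, again assuming nested `frozenset` values as a stand-in for sets; `least_witness` and `numeral` are hypothetical names. It follows the proof of Lemma A4: form u = {y ∈ Sx | ϕ(y)} by separation and pick z ∈ u with u ∩ z = ∅, as fund(Sx) guarantees.

```python
EMPTY = frozenset()

def S(x):
    """Successor Sx := x ∪ {x}."""
    return x | frozenset([x])

def numeral(n):
    """Hypothetical helper: the n-fold successor of ∅."""
    x = EMPTY
    for _ in range(n):
        x = S(x)
    return x

def least_witness(x, phi):
    """Given ω(x) and ϕ(x), return z with ϕ(z) and ¬ϕ(y) for all y ∈ z."""
    if not phi(x):
        raise ValueError("phi must hold of x")
    # Separation: u = {y ∈ Sx | ϕ(y)}; u is non-empty because x ∈ u.
    u = frozenset(y for y in S(x) if phi(y))
    # fund(Sx): some z ∈ u has no element in common with u.
    for z in u:
        if not (u & z):
            return z
    raise ValueError("no ∈-minimal element found")
```

For instance, with ϕ(y) read as “y has at least two elements” and x the fifth numeral, the function returns the second numeral. Theorem A1 uses the same step contrapositively: a failure of ψ on some x with ω(x) would yield a least failure, which the induction hypothesis IHψ rules out.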
8.2 Finiteness
In view of (D2) and with “ω(y)” as an alternative to “y ∈ ω”, this definition suggests itself:
Definition A4: SFin(x) :←→ ∃y (ω(y) ∧ y ≃ x)
Yet, in the weak set theory CST dealt with here, one has to be careful with the definiens for “bijection”. For, although it is provable in CST that for each a, b, their ordered pair (defined in the usual sense) exists, the cartesian product of arbitrary sets x, y need not provably exist in CST. Thus, let’s explain what a function from x to y is without presupposing the existence of their cartesian product.
Definition A5:
Rel(u, x, y) :←→ ∀z (z ∈ u −→ ∃vw (z = ⟨v, w⟩ ∧ v ∈ x ∧ w ∈ y))
Fctpart(u, x, y) :←→ Rel(u, x, y) ∧ ∀vww′ (v ∈ x ∧ w ∈ y ∧ w′ ∈ y ∧ ⟨v, w⟩ ∈ u ∧ ⟨v, w′⟩ ∈ u −→ w = w′)
Fct(u, x, y) :←→ Fctpart(u, x, y) ∧ ∀v (v ∈ x −→ ∃v′ (⟨v, v′⟩ ∈ u))
Sur(u, x, y) :←→ Fct(u, x, y) ∧ ∀v′ (v′ ∈ y −→ ∃v (⟨v, v′⟩ ∈ u))53
In(u, x, y) :←→ Fct(u, x, y) ∧ ∀vv′w (v ∈ x ∧ v′ ∈ x ∧ w ∈ y ∧ ⟨v, w⟩ ∈ u ∧ ⟨v′, w⟩ ∈ u −→ v = v′)
u : x −→bij y :←→ Sur(u, x, y) ∧ In(u, x, y)
x ≃ y :←→ ∃u (u : x −→bij y)
53 These are simplified definientia. They are equivalent to the more common ones because CST ⊢ ⟨x, y⟩ = ⟨x′, y′⟩ −→ x = x′ ∧ y = y′.
Lemma A5:
(i) CST ⊢ ∅ : ∅ −→bij ∅
(ii) CST ⊢ ∅ ≃ y −→ y = ∅
(iii) CST ⊢ ω(x) ∧ f : x −→bij y ∧ z ∉ y −→ f ∪ {⟨x, z⟩} : x ∪ {x} −→bij y ∪ {z}
Proof: (ii) CST ⊢ u : ∅ −→bij y ∧ y ≠ ∅ −→ y ≠ ∅ ∧ ∀v′ (v′ ∈ y −→ ∃v (⟨v, v′⟩ ∈ u ∧ v ∈ ∅)) −→ ∃v′ (v′ ∈ y ∧ ∃v (⟨v, v′⟩ ∈ u ∧ v ∈ ∅)) −→ ∃v (v ∈ ∅) −→ ⊥.
(iii) Let’s abbreviate “ω(x) ∧ f : x −→bij y ∧ z ∉ y” by γ(f, x, y, z), and set g := f ∪ {⟨x, z⟩}. What has to be shown is
(a) CST ⊢ γ(f, x, y, z) −→ Rel(g, Sx, y ∪ {z}),
(b) CST ⊢ γ(f, x, y, z) −→ Fctpart(g, Sx, y ∪ {z}),
(c) CST ⊢ γ(f, x, y, z) −→ Fct(g, Sx, y ∪ {z}),
(d) CST ⊢ γ(f, x, y, z) −→ Sur(g, Sx, y ∪ {z}),
(e) CST ⊢ γ(f, x, y, z) −→ In(g, Sx, y ∪ {z}).
Ad (a): By definition of g,
CST ⊢ a ∈ g ∧ γ(f, x, y, z) −→ (a ∈ f ∨ a = ⟨x, z⟩) ∧ γ(f, x, y, z) −→ ∃vw (a = ⟨v, w⟩ ∧ v ∈ x ∧ w ∈ y) ∨ ∃vw (a = ⟨v, w⟩ ∧ v ∈ Sx ∧ w ∈ y ∪ {z}) −→ ∃vw (a = ⟨v, w⟩ ∧ v ∈ Sx ∧ w ∈ y ∪ {z}), and therefore CST ⊢ γ(f, x, y, z) −→ Rel(g, Sx, y ∪ {z}).
Ad (b): By definition of g, CST ⊢ ⟨v, w⟩ ∈ g ∧ ⟨v, w′⟩ ∈ g −→ [⟨v, w⟩ ∈ f ∧ ⟨v, w′⟩ ∈ f] ∨ [⟨v, w⟩ = ⟨x, z⟩ = ⟨v, w′⟩] ∨ [⟨v, w⟩ ∈ f ∧ ⟨x, z⟩ = ⟨v, w′⟩] ∨ [⟨v, w′⟩ ∈ f ∧ ⟨x, z⟩ = ⟨v, w⟩]
By the assumption γ(f, x, y, z) we get (*) CST ⊢ γ(f, x, y, z) ∧ ([⟨v, w⟩ ∈ f ∧ ⟨v, w′⟩ ∈ f] ∨ [⟨v, w⟩ = ⟨x, z⟩ = ⟨v, w′⟩]) −→ w = w′.
Moreover CST ⊢ γ(f, x, y, z) ∧ ([⟨v, w⟩ ∈ f ∧ ⟨x, z⟩ = ⟨v, w′⟩] ∨ [⟨v, w′⟩ ∈ f ∧ ⟨x, z⟩ = ⟨v, w⟩]) −→ x = v, whence by applying Lemma A2(i) and Definition A2 in the last step, (**) CST ⊢ γ(f, x, y, z) ∧ ([⟨v, w⟩ ∈ f ∧ ⟨x, z⟩ = ⟨v, w′⟩] ∨ [⟨v, w′⟩ ∈ f ∧ ⟨x, z⟩ = ⟨v, w⟩]) −→ (⟨x, w⟩ ∈ f ∨ ⟨x, w′⟩ ∈ f) ∧ γ(f, x, y, z) −→ x ∈ x ∧ ω(x) −→ ⊥. (*) and (**) yield (b). Ad (c): Clearly, CST ⊢ γ(f, x, y, z) ∧ v ∈ x −→ ∃v′ (⟨v, v′⟩ ∈ f) −→ ∃v′ (⟨v, v′⟩ ∈ g) and CST ⊢ γ(f, x, y, z) ∧ v = x −→ ⟨v, z⟩ ∈ g −→ ∃v′ (⟨v, v′⟩ ∈ g). Since CST ⊢ v ∈ Sx −→ v ∈ x ∨ v = x, (c) follows.
Ad (d): Like (c). Ad (e): Like (b): By definition of g,
CST ⊢ ⟨v, w⟩ ∈ g ∧ ⟨v′, w⟩ ∈ g −→ [⟨v, w⟩ ∈ f ∧ ⟨v′, w⟩ ∈ f] ∨ [⟨v, w⟩ = ⟨x, z⟩ = ⟨v′, w⟩] ∨ [⟨v, w⟩ ∈ f ∧ ⟨x, z⟩ = ⟨v′, w⟩] ∨ [⟨v′, w⟩ ∈ f ∧ ⟨x, z⟩ = ⟨v, w⟩] By the assumption γ(f, x, y, z) we get (*) CST ⊢ γ(f, x, y, z) ∧ ([⟨v, w⟩ ∈ f ∧ ⟨v′, w⟩ ∈ f] ∨ [⟨v, w⟩ = ⟨x, z⟩ = ⟨v′, w⟩]) −→ v = v′.
Moreover CST ⊢ γ(f, x, y, z) ∧ ([⟨v, w⟩ ∈ f ∧ ⟨x, z⟩ = ⟨v′, w⟩] ∨ [⟨v′, w⟩ ∈ f ∧ ⟨x, z⟩ = ⟨v, w⟩]) −→ z = w,
whence by applying the assumption “z ∉ y” in the last step, (**) CST ⊢ γ(f, x, y, z) ∧ ([⟨v, w⟩ ∈ f ∧ ⟨x, z⟩ = ⟨v′, w⟩] ∨ [⟨v′, w⟩ ∈ f ∧ ⟨x, z⟩ = ⟨v, w⟩]) −→ (⟨v, z⟩ ∈ f ∨ ⟨v′, z⟩ ∈ f) ∧ γ(f, x, y, z) −→ γ(f, x, y, z) ∧ z ∈ y −→ ⊥.
(*) and (**) yield (e).
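The construction of Lemma A5(iii) can be replayed on a small example. The following is a minimal sketch, assuming that relations are modelled as frozensets of Python 2-tuples rather than as set theoretical ordered pairs, and that the elements of y are arbitrary hashable Python values; `is_bijection` and `extend` are hypothetical names. The checker condenses the clauses Rel, Fctpart, Fct, Sur and In of Definition A5 into one function.

```python
EMPTY = frozenset()

def S(x):
    """Successor Sx := x ∪ {x}."""
    return x | frozenset([x])

def is_bijection(u, x, y):
    """u : x -->bij y, for u a finite set of pairs (v, w)."""
    rel = all(v in x and w in y for (v, w) in u)       # Rel(u, x, y)
    firsts = [v for (v, _) in u]
    seconds = [w for (_, w) in u]
    functional = len(firsts) == len(set(firsts))       # Fctpart: unique values
    total = set(firsts) == set(x)                      # Fct: defined on all of x
    surjective = set(seconds) == set(y)                # Sur
    injective = len(seconds) == len(set(seconds))      # In
    return rel and functional and total and surjective and injective

def extend(f, x, z):
    """g := f ∪ {⟨x, z⟩}, as in the proof of Lemma A5(iii)."""
    return f | frozenset([(x, z)])

# Example: f maps the numeral 2 = {0, 1} onto {"a", "b"}.
zero = EMPTY
one = S(zero)
two = S(one)
y = frozenset({"a", "b"})
f = frozenset({(zero, "a"), (one, "b")})
g = extend(f, two, "c")
```

Here `g` is a bijection from S(two) onto y ∪ {"c"}. If the new value already lies in y, injectivity fails, which corresponds to the role of the assumption z ∉ y in step (e); the assumption ω(x) secures x ∉ x, which Python's frozensets satisfy automatically.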
Corollary A1:
(i) CST ⊢ ∅ ≃ ∅
(ii) CST ⊢ ω(x) ∧ x ≃ y → x ≃ y ∪ {z} ∨ Sx ≃ y ∪ {z}

Proof: (ii) CST ⊢ x ≃ y ∧ z ∈ y → x ≃ y ∪ {z}. And by Lemma A5(iii), CST ⊢ ω(x) ∧ x ≃ y ∧ z ∉ y → Sx ≃ y ∪ {z}.

Lemma A6:
(i) CST ⊢ ω(x) ∧ f : Sx →_bij y ∧ ⟨x, z⟩ ∈ f → f \ {⟨x, z⟩} : x →_bij y \ {z}
(ii) CST ⊢ ω(x) ∧ Sx ≃ y → ∃z (z ∈ y ∧ x ≃ y \ {z})

Proof: (i) Let’s abbreviate “ω(x) ∧ f : Sx →_bij y ∧ ⟨x, z⟩ ∈ f” by γ(f, x, y, z). What has to be shown is
(a) CST ⊢ γ(f, x, y, z) → Rel(f \ {⟨x, z⟩}, x, y \ {z})
(b) CST ⊢ γ(f, x, y, z) → Fctpart(f \ {⟨x, z⟩}, x, y \ {z})
(c) CST ⊢ γ(f, x, y, z) → Fct(f \ {⟨x, z⟩}, x, y \ {z})
(d) CST ⊢ γ(f, x, y, z) → Sur(f \ {⟨x, z⟩}, x, y \ {z})
(e) CST ⊢ γ(f, x, y, z) → In(f \ {⟨x, z⟩}, x, y \ {z}).
Ad (a): Since f is a bijection (used in the second step), we have
CST ⊢ γ(f, x, y, z) ∧ a ∈ f \ {⟨x, z⟩} → ∃vw (a = ⟨v, w⟩ ∧ a ∈ f ∧ ⟨x, z⟩ ∈ f ∧ a ≠ ⟨x, z⟩ ∧ v ∈ Sx ∧ w ∈ y) → ∃vw (a = ⟨v, w⟩ ∧ v ∈ Sx ∧ w ∈ y ∧ v ≠ x ∧ w ≠ z),
and thus
CST ⊢ γ(f, x, y, z) → ∀a (a ∈ f \ {⟨x, z⟩} → ∃vw (a = ⟨v, w⟩ ∧ v ∈ x ∧ w ∈ y \ {z})) → Rel(f \ {⟨x, z⟩}, x, y \ {z}).

Ad (b) and (e): They follow from f \ {⟨x, z⟩} ⊆ f.

Ad (c): By Lemma A2(i),
CST ⊢ γ(f, x, y, z) ∧ v ∈ x → γ(f, x, y, z) ∧ v ∈ Sx ∧ v ≠ x → ∃v′ (⟨v, v′⟩ ∈ f ∧ v′ ∈ y ∧ v ≠ x) → ∃v′ (⟨v, v′⟩ ∈ f ∧ ⟨v, v′⟩ ≠ ⟨x, z⟩)
whence,
CST ⊢ γ(f, x, y, z) → ∀v (v ∈ x → ∃v′ (⟨v, v′⟩ ∈ f \ {⟨x, z⟩})).
The claim follows by (b).

Ad (d): Similar to (c).

(ii) From (i), we have
CST ⊢ γ(f, x, y, z) → ∃g (g : x →_bij y \ {z}) → x ≃ y \ {z},
whence
(*) CST ⊢ ω(x) ∧ f : Sx →_bij y → ∀z (⟨x, z⟩ ∈ f → x ≃ y \ {z}).
In addition,
(**) CST ⊢ ω(x) ∧ f : Sx →_bij y → x ∈ Sx ∧ f : Sx →_bij y → ∃z (z ∈ y ∧ ⟨x, z⟩ ∈ f).
(*) plus (**) imply (ii).
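Lemma A6(i) can be sanity-checked on small finite instances. The following sketch is an illustration only and plays no role in the derivation within CST: it models von Neumann ordinals as Python frozensets and functions as sets of pairs. All helper names (`successor`, `ordinal`, `is_bijection`, `remove_pair`, `check_lemma_a6`) are assumptions introduced here, and the target set y is an arbitrary choice.

```python
from itertools import permutations


def successor(x):
    """Sx = x ∪ {x}."""
    return x | frozenset([x])


def ordinal(n):
    """The von Neumann ordinal n, built from the empty set by n successor steps."""
    x = frozenset()
    for _ in range(n):
        x = successor(x)
    return x


def is_bijection(f, dom, cod):
    """Check f : dom ->bij cod, where f is a set of pairs (v, w)."""
    if any(v not in dom or w not in cod for v, w in f):
        return False  # f is not even a relation between dom and cod
    firsts = [v for v, _ in f]
    seconds = [w for _, w in f]
    functional = len(firsts) == len(set(firsts))   # at most one value per argument
    total = set(firsts) == set(dom)                # every argument has a value
    injective = len(seconds) == len(set(seconds))  # no value is hit twice
    surjective = set(seconds) == set(cod)          # every element of cod is hit
    return functional and total and injective and surjective


def remove_pair(f, x):
    """Return (f minus {(x, z)}, z) for the unique z with (x, z) in f."""
    (z,) = [w for v, w in f if v == x]
    return f - {(x, z)}, z


def check_lemma_a6(n):
    """Test the claim of Lemma A6(i) for x = n against every bijection f : Sx -> y."""
    x = ordinal(n)
    sx = sorted(successor(x), key=len)   # the ordinals 0, ..., n in increasing order
    y = frozenset(range(100, 101 + n))   # an arbitrary set with n + 1 elements
    for image in permutations(y):
        f = frozenset(zip(sx, image))
        g, z = remove_pair(f, x)
        if not (z in y and is_bijection(g, x, y - {z})):
            return False
    return True
```

Calling `check_lemma_a6(n)` for small n enumerates all (n + 1)! bijections, so it is a finite spot check, not a substitute for the induction-free argument given above.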
8.3 Deriving the principles

Theorem A2:
(i) CST ⊢ SFin(∅)
(ii) CST ⊢ ∀yz (SFin(y) → SFin(y ∪ {z}))
(iii) For each formula ψ in L[∈], CST ⊢ AxFin3[ψ] → ∀x (SFin(x) → ψ(x))

Proof: (i) By Lemma A3(i) and Corollary A1(i).

(ii) By Lemma A3(ii) and Corollary A1(ii),
CST ⊢ ω(x) ∧ x ≃ y → (ω(x) ∧ x ≃ y ∪ {z}) ∨ (ω(Sx) ∧ Sx ≃ y ∪ {z}) → ∃x (ω(x) ∧ x ≃ y ∪ {z}) → SFin(y ∪ {z}).

(iii) Consider this formula from L[∈] (with new y):
ϕ(x) :⟷ ∀y (x ≃ y → ψ(y)).
Claim:
(*) CST ⊢ AxFin3[ψ] → ϕ(∅),
(**) CST ⊢ AxFin3[ψ] → ∀x (ω(x) ∧ ϕ(x) → ϕ(Sx)).
By Theorem A1, (*) plus (**) imply
CST ⊢ AxFin3[ψ] → ∀y (ω(y) → ϕ(y)),
i.e.,
CST ⊢ AxFin3[ψ] → ∀xy (ω(y) ∧ y ≃ x → ψ(x)) → ∀x (∃y (ω(y) ∧ y ≃ x) → ψ(x)) → ∀x (SFin(x) → ψ(x)).

Ad (*), i.e., CST ⊢ AxFin3[ψ] → ∀y (∅ ≃ y → ψ(y)). By Lemma A5(ii), we have
CST ⊢ AxFin3[ψ] ∧ ∅ ≃ y → ψ(∅) ∧ y = ∅ → ψ(y).

Ad (**), i.e., CST ⊢ AxFin3[ψ] → ∀xy (ω(x) ∧ ϕ(x) ∧ Sx ≃ y → ψ(y)). By definition of ϕ and Lemma A6(ii) we have
CST ⊢ AxFin3[ψ] ∧ ω(x) ∧ ϕ(x) ∧ Sx ≃ y → ∃z (z ∈ y ∧ x ≃ y \ {z}) ∧ ∀y (x ≃ y → ψ(y)),
whence
(a) CST ⊢ AxFin3[ψ] ∧ ω(x) ∧ ϕ(x) ∧ Sx ≃ y → ∃z (z ∈ y ∧ x ≃ y \ {z} ∧ ψ(y \ {z})) → ∃z (z ∈ y ∧ ψ(y \ {z})).
Moreover, the definition of AxFin3[ψ] yields
CST ⊢ AxFin3[ψ] ∧ ψ(y \ {z}) → ψ((y \ {z}) ∪ {z}),
i.e.
CST ⊢ AxFin3[ψ] ∧ z ∈ y ∧ ψ(y \ {z}) → ψ(y),
and therefore
(b) CST ⊢ AxFin3[ψ] → [∃z (z ∈ y ∧ ψ(y \ {z})) → ψ(y)].
(a) plus (b) imply the sought after
CST ⊢ AxFin3[ψ] ∧ ω(x) ∧ ϕ(x) ∧ Sx ≃ y → ψ(y).
References

[1] P. Bernays. Axiomatic Set Theory. Studies in Logic and the Foundations of Mathematics. North Holland, Amsterdam, 1968.
[2] H.D. Ebbinghaus. Einführung in die Mengentheorie. Mannheim, Wien, Zürich, 3rd edition, 1994.
[3] Adolf Fraenkel. Zehn Vorlesungen über die Grundlegung der Mengenlehre (1927). Darmstadt, 1972. Reproduced by Wissenschaftliche Buchgesellschaft.
[4] R. Kaye. Models of Peano Arithmetic. Oxford, 1991.
[5] D. Klaua. Grundbegriffe der axiomatischen Mengenlehre, Teil 2. Braunschweig, 1973.
[6] K. Kunen. Set Theory. Amsterdam, New York, Oxford, 1980.
[7] S. Lavine. Understanding the Infinite. Harvard University Press, Cambridge, Mass. and London, 1994.
[8] P. Lorenzen. Das Aktual-Unendliche in der Mathematik. Philosophia Naturalis, 4:4–11, 1957.
[9] A.W. Moore. The Infinite. Routledge, London, 1991.
[10] J. Mycielski. Analysis without actual infinity. The Journal of Symbolic Logic, 46:625–633, 1981.
[11] J. Mycielski. Locally finite theories. The Journal of Symbolic Logic, 51:59–62, 1986.
[12] K.G. Niebergall. Is ZF finitistically reducible? In G. Link (ed.), One Hundred Years of Russell’s Paradox, pp. 153–180, Berlin, New York, 2004.
[13] K.G. Niebergall. Calculi of individuals and some extensions. In A. Hieke and H. Leitgeb (eds), Reduction – Abstraction – Analysis. Proceedings of the 31st International Ludwig Wittgenstein-Symposium in Kirchberg, 2008, pp. 335–354, Frankfurt, Paris, Lancaster, New Brunswick, 2009. Ontos Verlag.
[14] K.G. Niebergall. Unendlichkeit ausdrücken und Unendlichkeitsannahmen machen. In C.F. Gethmann (ed.), Lebenswelt und Wissenschaft, volume 2, pp. 957–976, 2011.
[15] I. Scheffler and N. Chomsky. What is said to be. Proceedings of the Aristotelian Society, 59:71–82, 1958.
[16] J. Schmidt. Mengenlehre I. Bibliographisches Institut, Mannheim, 1966.
[17] G. Takeuti and W. Zaring. Introduction to Axiomatic Set Theory. Springer, New York, Heidelberg, Berlin, 2nd edition, 1982.
The Interpretation of Classes in Axiomatic Set Theory
Daniel Roth, Gregor Schneider
In this article, we want to trace the discussion of what classes are back to the roots of their mathematical use in axiomatic set theories. In order to work out the intended or most suitable interpretation of classes, in the first part we outline most of the set theories that incorporate proper classes and draw on the attitudes their inventors displayed towards their classes. In the second part, adopting a more systematic approach, we discuss some of the more metamathematical questions about classes and their relation to sets.¹
1 Introduction

In this article, we focus on the functional relation between sets and proper classes. By means of a historical survey we aim to show that classes have often been used in a more or less essential way, either for axiomatizing what sets are and for working with them, or for giving our concept of open-endedness in set theory a more precise meaning. But what are these proper classes, which we are compelled to posit if we want to do set theory? How can and how should they be interpreted?
2 Set Theories

2.1 Cantor and ZF

Cantor, the founder of set theory,² was aware, at least from 1883, that one had to differentiate between a multiplicity (“Vielheit”) of things which could be conceived as a unity and an inconsistent multiplicity which does not form ‘one finished thing’.³ The former he called sets, the latter he left without a technical name. The difference between sets and inconsistent multitudes was based on the imperative,

(CL) that if such a ‘universal’ set as the collection of all sets were a unity, it should be possible, e.g. via the powerset and the diagonal argument, to ‘construct’ an even ‘bigger’ set, the set of all subsets of the universal set, which leads to the so-called Cantor Paradox.⁴

We call this principle Cantor’s Legacy. It is a way of stretching the potential infinity of the natural numbers to the size of all infinite sets. Its result is the open-endedness of the hierarchy of sets, and it stipulates a conception of sets as being indefinitely extensible. However, Cantor himself sometimes believed that every potential infinity points at an actual infinity, since otherwise it would not be possible to think of something being potentially infinite.⁵ Therefore there has to be an actually infinite, inconsistent multiplicity of exactly all sets, something nowadays called a proper class. Since sets have often been conceived as the pure structural background of everything (and we join this line of argument), there should be no other objects that could not be modeled set-theoretically. Proper classes are equivalent to a collection of sets that does not form a set. Thus, the class of all sets would be a new object with probably no (definite formal) structure. However, if we could clarify in some way the content of the concept of class, and especially of the universal class, this would seemingly shed light on the concept of set because, with knowledge about the universal class, we would gain knowledge about what “indefinitely extensible” means. In other words, we would be investigating how ‘high’ the imperative (CL) leads, or where its validity vanishes.

¹ We are grateful for the careful comments of two anonymous referees.
² The classical overview of set and class theories is [24]; more recent are [30] and [3].
Since hardly any formalism that would have allowed Cantor to express his views more formally existed in his time, he left the task of formalizing his thoughts to the mathematicians to come. The challenge was to understand what differentiates sets and, adopting modern speech, proper classes, and to find an adequate formal representation of the dividing properties. Since proper classes do not exist according to Cantor, not only in the sense that they are abstract objects whose existence always seems to be dubious, but even in the strict sense that they cannot be thought of as a coherent unity, the question was whether such classes could be represented formally at all.

The view on axiomatizing Cantor’s set theory most widely accepted nowadays was given by Zermelo. Skolem and Fraenkel as well as von Neumann and Mirimanoff made contributions to the axiom system which is nowadays called ZF. It was the idea behind the axioms which was so influential and which is summarized under the idea of building a cumulative hierarchy of sets, the building process being governed by the axioms. Arbitrary classes could not be formed by Zermelo’s axioms, even though it was clear that the intended domain of the theory, the collection of all sets, was a proper class. So Cantor’s intuition about sets and classes can be grasped in the following way: Objects that can be formed by the axioms are sets, while such classes as the Russell class or similar multiplicities cannot be formed at all. Of course the question of which axioms to adopt (the axiom of choice in particular was the subject of heated debate), or whether axioms are the right tool to formalize Cantor’s idea at all,⁶ was not settled by Zermelo’s axioms. During the 20th century, however, a wide range of different class and set theories was developed and explored, and the terms ‘class’ and ‘set’ changed their meanings.⁷ For our analysis we take set as a subspecies of class and adopt the classical distinction between proper classes that cannot be members of other classes (mere extensions) and proper classes which can (we call them set-classes).

³ Cf. [12, p. 443ff., 204], [31, p. 400n3].
⁴ Cf. [12, p. 448]. (CL) is adequate from a purely set-theoretical perspective, because cardinal and ordinal numbers are seen as special sets. From a historical or more philosophical perspective one has to distinguish the extensibility of cardinal numbers, ordinal numbers and sets, and one has to invoke something like the Axiom of Choice. Please note that (CL) does not explicitly give a property with which we could distinguish sets and inconsistent multiplicities. Rather, (CL) contains the mathematical background of the distinction between sets and ‘unfinished’ multiplicities.
⁵ Cf. [12, p. 404].
When the context makes the point of view clear, we sometimes use “class” for “set or set-class”, excluding the extensions.⁸ In set theories with a universal set-class (which is now the set-class of all sets and set-classes), the question remains how to deal with the interaction of the small sets and the big set-classes, and how to characterize the sets through their interrelation with the set-classes or how to define the set-class of all sets. For set theories with a universal set-class, there are extensions, like the Russell class or often the class of all sets, for which there is no extensionally equivalent set-class.⁹ In the tradition of the dichotomy of Cantor and Frege, it seems natural to think of mathematical sets and logical classes. But if ‘logical’ includes ‘not-mathematical’, it will not concern us. The other traditional characteristic by which extensions differ from set-classes is predicativity, but as our starting point is mathematical set theory and we are not frightened by impredicativity and Ockham’s razor, this distinction is of minor importance. It seems to be a technical and contingent restriction in a mathematical context.

⁶ See Skolem’s fundamental critique in [66].
⁷ The single quotation marks sometimes highlight the problematic character of the metaphoric use of expressions, and sometimes they are used as a relaxed version of the ordinary quotation marks. Here, “class” is an abbreviation for all equivalent terms like “Klasse” which have been used in the historical discussion.
⁸ In the title of our article we refer to proper classes.
⁹ The universal set-class is a member of itself; the universal extension is not self-membered.
2.2 Von Neumann

In [70] and [71] von Neumann introduced the first formal class theory N. His aim was to save axiomatic set theory by overcoming some of its defects. To achieve this, he wanted to calibrate the axioms so that everything desirable could be proved while most of the negative effects would be avoided. Von Neumann’s attitude was formalistic. In [71, p. 674] he emphasizes that mathematical concepts are without meaning. In short, his attitude at this time can be understood as follows: If the concepts are meaningless anyway, any two axiomatic theories are equally justified. From our point of view, von Neumann is guided by two main thoughts concerning the introduction of his class theory:

a) The theory should allow the expression of any calculations of informal set theory.
b) Formal derivations should be as simple as possible.
As is known today, his theory implies ZF,¹⁰ so the first thought is met by the axioms. The advantage of his class theory is that the naïve process of building (predicative) classes is allowed while these classes are not necessarily sets, thus avoiding the Russell paradox. Aiming to simplify formal derivations, von Neumann takes functions as basic, with one common pool of potential arguments and values. Functions are in the end classes, and the elements of a class f are defined as the arguments x which are not sent to a value a that was fixed by the axioms (i.e. x ∈ f ⇔ f(x) ≠ a). Von Neumann makes a distinction between arguments and functions, allowing only the ‘small’ functions to be arguments again for other functions. The central axiom is IV.2, in which it is specified when a function is ‘too big’, i.e. a proper class. A function f is not an argument if, and only if, there is a surjective function g from the arguments x of f for which f(x) is not a (i.e. the elements of f) onto all arguments (i.e. for every possible argument y there is an x with f(x) ≠ a such that g(x) = y). Thus a class is a set if it is smaller than the collection of all sets. In set-theoretic terms, one would state that a class is a proper class if it can be mapped onto the universe V of all sets. Sets, on the other hand, can be understood as classes that are members of other classes.

¹⁰ To be precise: Every statement of ZF which is provable in ZF is also a provable statement of N.

The seemingly harmless axiom IV.2 implies (in the presence of other axioms similar to ZF) the Axiom Schema of Separation, the Axiom Schema of Replacement and the Axiom of Global Choice.¹¹ In that way the axiom system is simpler than ZF and can furthermore be formulated without an axiom schema; accordingly, there are only finitely many axioms present. It is an easily removable ‘defect’ of von Neumann’s theory that the axiom of choice is not separated from the other axioms, which makes it difficult to track the use of that axiom. But this is what mathematicians want to know.¹² The classes in this theory are restricted insofar as a class C = { u | u is a set and φ(u) } defined by a formula is only part of the universe if the formula φ has all its bound variables restricted to sets. This restricted form of enlargement is called predicative.¹³ If one drops this restriction and allows the bound variables to range over classes, too, one gets a so-called impredicative enlargement.

Summing up, in N classes are both extensions of first-order logic predicates and rather well-defined objects, just as sets are. Interestingly, proper classes are very similar to sets, as they are objects in the range of the quantifiers which just lack the property of being elements of other objects in the discourse. Levy [42, p. 196] mentions von Neumann’s motivation, which “regards [proper] classes and sets as objects of the same kind with the same claim for existence”. Levy does not, however, mention that there is a reason for this. Von Neumann didn’t have to bother about the question of how ‘realistic’ his classes were, since he put forward a formalistic point of view. For him classes were nothing more than ideal elements, which make the theory easier through their defined properties and allow one to talk about entities such as the class of all ordinals within the framework of the theory. But if one drops this view and considers sets as somehow real combinatorial entities, the inevitable question arises of how to interpret proper classes.
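The two ingredients of von Neumann's approach described in this section, membership defined by x ∈ f ⇔ f(x) ≠ a and the size criterion of axiom IV.2, can be mimicked in a finite toy setting. The sketch below is only an analogy under strong assumptions: the pool of arguments is a small finite Python set, `A` is a stand-in for the distinguished value a, and all names are introduced here for illustration. In a finite pool a surjection onto all arguments exists exactly when there are at least as many elements as arguments, so the criterion degenerates to a counting test; in N itself the pool is infinite and the surjection must be a function of the theory.

```python
A = object()  # stand-in for the distinguished value a fixed by the axioms


def elements(f, arguments):
    """The elements of f: those arguments x with f(x) != a."""
    return {x for x in arguments if f(x) is not A}


def is_too_big(f, arguments):
    """Finite analogue of axiom IV.2.

    A surjection from the elements of f onto all arguments exists
    iff f has at least as many elements as there are arguments.
    """
    return len(elements(f, arguments)) >= len(arguments)


ARGUMENTS = {0, 1, 2, 3}                     # a toy pool of arguments


def evens(x):
    return x if x % 2 == 0 else A            # "contains" exactly 0 and 2


def everything(x):
    return x                                 # never takes the value a
```

Here `elements(evens, ARGUMENTS)` is the two-element collection {0, 2}, which is 'small', whereas `everything` has every argument as an element and is therefore 'too big' in the sense of the toy criterion.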
2.3 VNBG

Robinson simplified von Neumann’s system N in [62] by replacing the primitive notions of function and argument with class and set. As von Neumann himself observed, the concepts can freely be substituted. While technically neutral, the shift from functions to classes is philosophically of interest since functions are operations which contain a constructive element. One should keep in mind that if a function f is defined as the set of pairs ⟨x, f(x)⟩ where x is an element of the domain of f, this definition is quite abstract since it presupposes that all values of f are given at once. Both Bernays [6] and Gödel [25] introduced a modified version of von Neumann’s theory in which, similar to Robinson’s, proper classes and sets were introduced as primitive notions. The resulting theory is usually summarized under the name VNBG (von Neumann, Bernays, Gödel), but one should bear in mind that each of the three developed a different theory.¹⁴ Neither Bernays nor Gödel provides any direct argument for why proper classes are included in the axiom system. Not one line refers to the thought that classes might be considered mere ideal objects that make axiomatic set theory more elegant. Our suggestion would be that both of them had in mind to clarify the axiom system in order to prepare the debate about what proper classes might really be. Thus we have to take a closer look at their motives for reformulating von Neumann’s class theory. Bernays in [6], in short, formulates a synthesis between Zermelo [78] and von Neumann [70]. He replaced von Neumann’s functions with classes. Furthermore he distinguished syntactically between sets and proper classes, and introduced two membership relations: a ∈ b for sets in sets and c η D for sets in classes. Thus he emphasizes the fact that sets and classes may be objects of a different kind. Von Neumann’s motivation for using functions was to simplify the axiom system. Bernays, on the contrary, does not adopt the formalistic view, but talks about a ‘domain of facts’ which has to be described adequately. Even if his theory is very elegant, we should be sure to understand it as a mere modernized version of von Neumann’s. Bernays’ careful rejection of formalism is easily overlooked, for he commits himself explicitly to first order logic and gets rid of Zermelo’s so-called “Urelemente”, which are considered a superfluous addition, as set theory can be developed without using them.

¹¹ The Schema of Separation is obtained as follows. If x is a set, then x cannot be mapped onto the universe of sets. Since x ∩ Z is no bigger in size than x, x ∩ Z cannot be mapped onto the universe, and therefore is also a set in accordance with Axiom IV.2.
¹² Cf. [15, pp. 386f.].
¹³ This term was introduced by Wang, see [55, p. 310].
We think of Bernays’ theory as an attempt to find a compromise between Zermelo’s ideas about axiomatizing set theory and Skolem’s strong critique.¹⁵ Zermelo’s “definite Eigenschaften” are restricted by the use of first order logic, but are also given a natural representation by introducing classes. We think that Fraenkel’s description of Bernays’ motives in [8] is correct. In a historical introduction to Bernays’ Axiomatic Set Theory, he makes some remarks about the axiom systems of von Neumann, Bernays and Gödel in comparison to ZF:

“Since ‘over-comprehensive sets’ do only harm by being taken as elements of other sets, a distinction is made between collections that can also serve as elements, henceforth called sets, and collections excluded from elementhood, called classes” [8, p. 32].

¹⁴ As far as we can see, there is no standard for the names of the various axiomatic theories. Often the historical references are not entirely fair. Even the most prominent theory, ZF, should include Skolem’s contribution and would better be called ZFS. To avoid further complication we will stick to Maddy’s conventions.
¹⁵ See [77, p. 127] about Skolem’s skepticism toward set theory.
He mentions that “every well formed formula (predicate) defines a class, which contains all objects (sets) that satisfy the predicate”. Although this motivation is very similar to von Neumann’s, he points out the differences between the axiom systems:

“However, while for von Neumann this axiom (Axiom IV.2) also includes the axioms of Subsets, of Choice, and of Substitution, Bernays took an essential step forward by distinguishing between these different purposes” [8, p. 33].

Next to a formalistic attitude along the line of argumentation of von Neumann (‘classes do no harm’), he talks of “different purposes” of the axioms. Unfortunately, no further motivation for the use of classes is given. Gödel modified Bernays’ theory by identifying the membership relation between sets and sets with that between sets and proper classes. This stresses the similarity between sets and classes. Since Gödel’s attitude towards mathematics was certainly not formalistic, it would be interesting to know whether his modification was intended to indicate that proper classes are set-like. No explicit reference can be found in [25]. Fraenkel even suggests [8, p. 32] that Gödel’s modifications “are chiefly for the purpose of proving the consistency of the generalized continuum-hypothesis”.
2.4 Virtual Classes (Bernays, Quine)

Surprisingly, in his 1958 Axiomatic Set Theory ([8]) Bernays modified his theory once more by restricting class variables to be used only as free and not as bound variables. In the introduction, Bernays sums up the motivations for his final presentation of set theory. He first points out that it is not his primary motive to avoid the antinomies, since they don’t threaten mathematical practice:

“Of course, the importance of these antinomies is beyond doubt. But what may be regarded as an exaggeration, is to infer from them a requirement of restricting our usual methods of mathematical procedure. In particular no cogent argument can be drawn from them that mathematics have to be built up in a strictly predicative way, – if one is not a priori convinced of the necessity of a predicative procedure” [8, p. 39].
He continues to explain the motives behind his new theory. First order logic is adopted, as Skolem required; the Replacement schema (Ersetzungsschema) of Fraenkel and Skolem is included to yield “Cantor’s summation process in full generality” [p. 41]; and, contrary to von Neumann’s introduction of functions, the notion of set is retained as basic. These motives are essentially the same as those behind his [6], but now there are two new motives. First, the role of the axioms is traced more carefully by beginning with the three axioms that govern the construction of the ordinals and by introducing the stronger axioms, the Axiom of Power Set, the Axiom of Choice and the Axiom of Infinity, at a later point. Secondly, he refrains from using class variables as bound variables of the theory, introducing proper classes as part of a logical class formalism,¹⁶ which is in principle eliminable. Bernays puts forth a couple of arguments for his procedure. After mentioning that in his former system sets and proper classes were syntactically different but could be reduced to a one-sorted calculus by identifying sets with certain classes, as done by Gödel in [25], he now questions whether one should “go so far in the formal analogy” [p. 41] as to identify sets with extensions of predicates, as was Frege’s original assumption, which leads to antinomies if formalized in a naïve way. After mentioning the theory of types, he goes on:

“But another way is to give up Frege’s first assumption, that is to distinguish classes as extensions from sets as individuals. Then we have the advantage that the operation of forming a class {x|A(x)} from a predicate A(c) can be taken as an unrestricted logical operation, not depending on a specifying comprehension axiom. But from this then, we have to separate the mathematical processes of set formation which in the way of Cantor are performed as generalizations of our intuitive operations on finite collections.” [p. 42]
His motive here is to be very cautious about the use of class variables and to keep track of the distinction between logical grammar and set theoretic formations. As a second argument, Bernays mentions that the “formally stronger separation of the classes from the sets conforms to the view, that the universe of mathematical objects is not itself a mathematical object” [p. 43]. This argument underlines that Bernays does not have a formalistic attitude, since he takes seriously the fact that the universe of all sets V cannot be an object. If classes are allowed as the values of bound variables, one can quantify over V, and the domain of the interpretation is beyond V. The third argument refers to the axioms of class existence in [6]. Each of these, save the axiom of choice, can be reformulated by introducing a class formation. Bernays stresses that his former axioms already have a strong formal character. The Axiom of Choice, however, can be formulated as an assertion of mere set existence. Finally, Bernays mentions that in [6] “the bound class variables never explicitly occur in symbolic notations”, diminishing the difference from his new system, which really isn’t a different system but a modification in which “the formalization is carried out to a higher degree” [p. 43].¹⁷ Levy (in [43]) followed Bernays in his influential introduction to set theory. Following Bernays, Quine 1964 introduces in [55] a logical framework in which virtual classes are included. He introduces virtual classes by defining them together with the ∈ symbol in toto: The whole combination y ∈ {x|Fx} is simply defined as Fy. The latter is a well-formed formula of first order logic. Armed with this basic trick, one can define a framework of virtual classes and relations and talk satisfactorily about classes without any ontological commitment. Virtual classes are not to be confused with proper classes or ultimate classes, as Quine calls them:¹⁸

“Ultimate classes, for theories that admit them, are real: they belong to the universe of discourse, they are values of quantifiable variables. The virtual theory of classes, on the other hand, does not invoke classes as values of variables; it talks much as if there were classes, but explains this talk without assuming them” [55, p. 20].

¹⁶ The class formalism is introduced in the first chapter of [8].
¹⁷ Apart from the discussion about the logical framework in [8], Bernays also offers a new axiomatic approach which is closer to Cantor’s original arithmetic. Moreover, his chapter about real numbers, where it is shown how the real numbers can be constructed in the universe of sets by use of the axioms of set theory and proper definitions, is notable.
Virtual classes, in the sense in which [8] and [55] use them, are ideal in a sense quite different from that of von Neumann’s classes. They can be included in the logical grammar of any first order logic. Therefore, the very natural way of speaking about classes can be translated into formal language quite satisfactorily, and we are by no means forced to answer any question concerning the essence of a class. But of course such a question can be posed, and the three authors by no means propose that there is nothing more to classes than being a façon de parler. They only take seriously the fact that one should not introduce bound variables into a theory without making clear what one is referring to. In the second chapter of [55], Quine describes his point of view, which includes the Class Commitment Principle, very clearly:

“Once we admit classes and relations irreducibly as values of variables of quantification, and only then, we are committed to recognizing them as real objects. The range of values of the variables of quantification of a theory is the theory’s universe.” [55, p. 28]
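The contextual definition of virtual classes, on which y ∈ {x|Fx} is nothing but Fy, has a direct programming analogue: a class abstract is represented by its predicate alone, and membership is predicate application. The following sketch is a loose illustration under that reading, not a rendering of Quine's or Bernays' formal systems; the class name `VirtualClass` and its methods are assumptions introduced here.

```python
class VirtualClass:
    """A class abstract {x | F x}, stored only as the predicate F.

    No collection of members is ever built, and nothing here is a value
    of a quantified variable: talk of the class reduces to talk of F.
    """

    def __init__(self, predicate):
        self.predicate = predicate

    def __contains__(self, y):
        # y ∈ {x | F x} is defined as F y.
        return bool(self.predicate(y))

    def union(self, other):
        # {x | F x} ∪ {x | G x} is defined as {x | F x or G x}.
        return VirtualClass(lambda y: y in self or y in other)

    def intersection(self, other):
        # {x | F x} ∩ {x | G x} is defined as {x | F x and G x}.
        return VirtualClass(lambda y: y in self and y in other)

    def complement(self):
        # The complement is {x | not F x}; it needs no universe to exist.
        return VirtualClass(lambda y: y not in self)


# The virtual universal class {x | x = x}.
universal = VirtualClass(lambda x: x == x)

# The virtual class of even integers.
evens = VirtualClass(lambda n: isinstance(n, int) and n % 2 == 0)
```

With these definitions, `4 in evens` evaluates the predicate at 4, and `evens.complement()` is again just a predicate. The sketch has no counterpart of quantifying over classes, which is exactly the restriction that keeps virtual classes free of ontological commitment.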
2.5 Impredicative Class Theories

While virtual classes stress the difference between sets and proper classes and regard proper classes as less real, impredicative class theory does quite the opposite. Here proper classes are taken seriously as real entities. In VNBG classes are predicative in the sense that they only correspond to extensions of predicates in which the values of bound variables are restricted to sets. Thus predicative classes are uniquely describable by the class abstractions {x|Ax} in a language of set theory. If one drops this restriction and allows quantification over arbitrary classes in the comprehension schema, the resulting theory is called an impredicative enlargement. Adding extensions to ZF in such a way was introduced by Wang [72], Mostowski [49], Quine [54], Stegmüller [69], Tarski, Morse [48] and Kelley [36], and the result is commonly called KM, QM, or the impredicative VNBG.¹⁹ KM is a stronger theory than ZF, which can be seen from the fact that in KM the consistency of ZF and other new theorems of set theory can be proved. Formally KM is in many regards preferable to VNBG. In KM one can prove the full induction schema including class quantifiers.²⁰

¹⁸ The term “ultimate classes” gives a far better picture of what is meant by “proper classes”. Unfortunately Quine’s naming was not adopted by mathematicians and philosophers, so we stick to the term “proper class”. But it should be emphasized that sets are not ‘improper’ classes.
¹⁹ We stick to KM.
²⁰ In VNBG one can only prove the schema of induction for conditions without class quantifiers. If one adds to VNBG the full schema of induction, the resulting theory is quite similar but not quite as strong as KM; see [42, p. 198].

2.6 NF, ML and NFU

In 1937, Quine [53] introduced the axiom system NF, New Foundations, which differed from ZF and von Neumann’s system mainly by employing only one axiom schema (of comprehension) besides the axiom of extensionality. NF is best motivated as a reaction to the negative effects of Russell’s theory of types. Quine recalled in [57] that he had been disturbed not only by the fact that an axiom of infinity had to be added in type theory to do arithmetic, but also by the consecutive copies at each stage of the empty set and by the cardinal numbers, which are neither unified nor universal in type theory. Furthermore, the syntactical restriction of type theory excluded meaningful sentences from the language, e.g. formulas like x_i = x_{i+1} (with x_i and x_{i+1} variables of the stages i and i + 1, respectively), which are necessary to compare different stages.²¹ Quine’s solution was to collapse the stages to one plain universe and to confine the stratification to the formulas which are allowed for comprehension:

(NF-C) ∃x∀y (y ∈ x ↔ φ(y)) for all stratified²² φ(y).

NF is the first set theory we consider that has set-classes: The universal set-class A exists because ‘x = x’ is stratified, and A is an element of itself. In fact, ‘half’ of the universe is covered by these big set-classes suffering from the inherited deficiencies of (NF-C), whereas the other half is taken up by the combinatorial sets which are ‘built’ on the empty set by ‘repeated’ ‘construction’ of the powerset.²³

²¹ His opinion about (the later) ZFC was no better in 1936: “[I]n its multiplicity of axioms it seemed inelegant, artificial, and ad hoc” [57, p. 326]. Furthermore, it lacks the big sets, e.g. the universal set or the universal complementary sets.
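Stratification, the side condition on (NF-C), is a purely syntactic test: one must assign a natural number to every variable so that equations relate variables of the same level and membership statements relate a variable to one exactly one level higher. The sketch below checks this by propagating levels through the atomic formulas. It is a minimal illustration under simplifying assumptions: a formula is represented only by the list of its atomic subformulas as triples such as `("in", "y", "x")` for y ∈ x, distinct variables are assumed to carry distinct names, and the function name `stratify` is introduced here.

```python
from collections import defaultdict, deque


def stratify(atoms):
    """Return a level assignment {variable: natural number}, or None.

    atoms is an iterable of triples (op, left, right) with op "=" or "in".
    "=" requires level(right) == level(left);
    "in" requires level(right) == level(left) + 1.
    """
    # edges[u] holds pairs (v, d) meaning level(v) must equal level(u) + d.
    edges = defaultdict(list)
    variables = set()
    for op, left, right in atoms:
        if op not in ("=", "in"):
            raise ValueError(f"unknown atomic formula type: {op!r}")
        d = 0 if op == "=" else 1
        edges[left].append((right, d))
        edges[right].append((left, -d))
        variables.update((left, right))

    level = {}
    for start in sorted(variables):
        if start in level:
            continue
        # Breadth-first propagation of levels within one connected component.
        level[start] = 0
        component = [start]
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, d in edges[u]:
                if v not in level:
                    level[v] = level[u] + d
                    component.append(v)
                    queue.append(v)
                elif level[v] != level[u] + d:
                    return None  # conflicting requirements: not stratified
        # Shift the component so that its lowest level is 0.
        shift = min(level[v] for v in component)
        for v in component:
            level[v] -= shift
    return level
```

For example, `stratify([("=", "x", "x")])` succeeds, which matches the remark that ‘x = x’ is stratified, whereas `stratify([("in", "x", "x")])` returns `None`, so the Russell condition x ∉ x is not available for comprehension in NF.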
22 A formula φ is stratified if it is possible to assign to every variable x of φ a natural number i (written x_i) such that for all atomic formulas x_i = x_j and x_n ∈ y_m of φ it is the case that i = j and m = n + 1.
23 The more ontological separation of stages in type theory is transformed through (NF-C) into a technical matter in NF. E.g., there is no set-class in NF which is a bijection from the universal set-class into the set-class of all set-classes with exactly one element. In fact, the latter has to be incomparable in size to the universal class.
For all who sometimes struggle with the formal-mathematical material in analytic philosophy, we mention that Quine’s 1936 proof of the infinity of the natural numbers was invalid, and that his first attempt in the first edition of his Mathematical Logic was also deficient and was corrected in the second edition. In fact, in 1953 the Swiss set theorist Ernst Specker was the first to provide a valid proof in [68]. He started from the well-known fact that in NF the universal set-class is its own power set and proved that the principle of choice fails in NF. This insight was later used by Jensen in [33] to develop NF further. He weakened the axiom of extensionality to allow urelements and added inter alia an axiom of choice. Because every urelement is extensionally equivalent to the empty set, the urelements are not included in any power set, and so the power set of the universal set can be of lower cardinality than the universe. Surprisingly, NFU turned out to be equiconsistent with some fragment of ZFC and could be modeled in common set theory. Natural extensions of NFU (in the sense of NFU) were developed which could be shown to be embedded in the hierarchy of the large cardinal hypotheses referring to ZFC.24
As noted above, Quine wanted one theory of both sets and set-classes, and his axiomatization of NF treats them as being of one nature, a view he probably held quite apart from the technical challenge of how to treat classes in a formal setting. We call this intuition the holistic set intuition. It is obviously contrary to (CL). However, it gives us no further mathematical information about the ZFC-like sets (at least in NF as far as we know it). Through NF we learn nothing new about ZFC.
In 1940, Quine made another contribution to class theory by introducing ML, his Mathematical Logic, in [54]. In ML the comprehension principle of NF was replaced by two rules. The first rule states the existence of the extension of all elements satisfying an arbitrary condition φ. The second rule restricts elementhood to the set-classes and sets of NF. Quine calls the procedure by which extensions are added to a theory like NF impredicative enlargement. Impredicative enlargement is not merely an extension of a theory, but an inclusion of the axioms of a theory such as, for example, ZF in a skeletal impredicative theory. The comprehension axiom has to be relativized to sets in order to avoid a straightforward contradiction. Thus ML is obtained from NF “in somewhat the way in which von Neumann modified Zermelo’s system” [56, p. 98]. But like KM, ML is impredicative in the sense that the bound variables in the formulas involved in the comprehension axioms do not have to be restricted to the arguments (the sets). This makes ML a “curiously strong class theory”, in which, contrary to N or NF, the full principle of induction can be proven as in KM. An interesting feature of NF and ML is that no model is known at all; that is, no ZFC-like model has been discovered yet. In [73] Wang was able to show that ML is consistent if NF is.
24 See [29] and [28]. The biggest extension NFUM is almost of the strength of ZFC with a measurable cardinal.
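The stratification condition that restricts (NF-C) in this subsection can be made concrete with a small checker. The following is a minimal sketch, not part of the original text: the function name `stratify` and the encoding of atomic formulas as triples are assumptions made for illustration, and variables are assumed to have been renamed apart beforehand. It tries to assign a natural number to every variable so that `x = y` forces equal numbers and `x ∈ y` forces the number of `y` to be one higher.

```python
from collections import defaultdict, deque


def stratify(atoms):
    """Try to find a stratification for a formula.

    `atoms` lists the atomic subformulas as triples (left, op, right),
    where op is "=" or "in" (hypothetical encoding chosen for this sketch).
    Returns a dict mapping each variable to a natural number, or None
    if the formula is not stratified.
    """
    # Each atom constrains the levels: level(right) - level(left) == d.
    edges = defaultdict(list)
    for left, op, right in atoms:
        d = 0 if op == "=" else 1
        edges[left].append((right, d))
        edges[right].append((left, -d))

    level = {}
    for start in list(edges):
        if start in level:
            continue
        level[start] = 0
        component = [start]
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, d in edges[u]:
                if v not in level:
                    level[v] = level[u] + d
                    component.append(v)
                    queue.append(v)
                elif level[v] != level[u] + d:
                    return None  # two atoms demand incompatible levels
        # Shift the component so that its lowest level is 0.
        low = min(level[v] for v in component)
        for v in component:
            level[v] -= low
    return level


print(stratify([("x", "=", "x")]))    # the condition defining the universal set-class
print(stratify([("x", "in", "x")]))   # the atom underlying Russell's condition
```

Under these assumptions, `x = x` receives an assignment, whereas any formula containing the atom `x ∈ x` does not, which is why Russell's condition is not admitted for comprehension in NF.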
2.7 Finsler’s informal set theory
So far the historical significance of Finsler’s set theory (FM) is that Ackermann’s class theory can be seen as an attempt to formalize parts of it, especially the concept circle-free.25 Paul Finsler had proposed in [17] (1926) a kind of super set theory with only three seemingly philosophical or metamathematical axioms and modified it slightly in subsequent articles up to [20].26 Finsler’s reputation in logic and the foundations of mathematics is damaged because of his repeated claim to the chronological and systematic priority of his [18] over Gödel’s incompleteness theorem, which historians and Gödel himself deny. The reception of his set theory fared no better.27 Finsler refused to accept some of the philosophical underpinnings of the formal logic and model theory evolving at the time, which are pivotal for the way mathematics was practiced in the 20th century. He rejected all formal-symbolic starting points for the foundations of mathematics and even refrained from any involvement in the research and discussion in the area of symbolic logic. Thus, it is not surprising that the reception of FM is a story of misunderstandings on both sides and an actual case of mathematical discussion to which Kuhn’s notions of paradigm, normal science and incommensurability can be applied successfully.28 However, this is not the place for a final evaluation of Finsler’s set theory, whose axioms are probably not formalizable.29 We proceed as if a Finsler-style set theory were possible and highlight the important points for our topic. Finsler approached set theory through its objects and not, as usual, through postulated structural principles. Finsler’s aim was to get every set: e.g. the universal set, the set containing only itself, or the sets almost identical to inconsistent ‘sets’. He therefore attempted to restrict the existence of sets by inconsistency only.
25 We have no evidence that Ackermann intended a formalization of the concept circle-free. He also did not mention Finsler in his [1], but only Cantor as his predecessor. However, the mathematicians familiar with FM had no doubt that A was a formalization of the concept circle-free; cf. [30, 5.2], [27, 5.5], [10, pp. 101f.].
26 His articles about set theory are collected in [21]; an English translation is contained in [22]. A mathematical representation of Finsler’s set theory is given by [5].
27 For some negative reviews see [4], [67], [23] and partly [28]; for historical details and some thoughts about Finsler’s premodern framework see [11].
28 Cf. [11].
29 Cf. [61, p. 246].
Because his intuition of sets also includes set-classes, in what follows we will use the term “class” instead of Finsler’s “set” (“Menge”). For Finsler, inconsistency is ultimately based on the mathematical objects themselves. That is why the (mathematical) existence of the class of all classes A in FM can be assured by looking at the determinacy of its elements and its identity, because the element-relation is all that matters besides identity: For every class, it is definite whether or not it is an element of A, even for A itself. Because A is explicitly determined only by ‘containing all classes’ (which lacks any problematic identity statement like ‘is not identical with the class defined by the condition ‘has elements or not’ for its members’), it does not hinder the identity with isomorphic classes.30 Thus the existence of A cannot cause trouble and entail an inconsistent statement in the informal context of FM, because any inference of an inconsistent statement in FM must have its roots in an ‘inconsistent object’.31 Second, he explicitly stated that isomorphic classes are identical. This way he wanted to handle the identity of self-membered classes.32 The later version of the informal axioms is as follows:33
I For a class it is definite what classes bear the ∈-relation to it.
II Two classes are identical whenever possible, i.e. whenever it does not lead to inconsistency.
III Something is a class whenever possible with respect to axioms I&II.
A simple formalization of the first axiom, ∀x (x ∈ y ∨ x ∉ y), is a theorem of common first order predicate logic, and one has to ‘weaken’ the logical framework, e.g. by adding a third truth value or by introducing partial predicates, to be able to state the axiom in a way that makes its effects visible in a formalization. Making sense of the second axiom is not as easy. If Finsler refers to an informal way of reasoning when talking about mathematical possibility and inconsistency, which is very likely, then axiom II is possibly not formalizable, since formal provability differs from informal provability inasmuch as some rules of the latter are formalized by □ in a modal logic like S4, whereas formal provability is formalized by □ in GL. But the logical rules of S4 and GL are inconsistent if put together.34 Whereas axiom II would restrict the existence of classes to a minimum with respect to identity, axiom III widens the universe of classes to a maximum. Its predecessor is Hilbert’s completeness axiom of geometry, but Finsler’s set theoretic equivalent was heavily criticized. However, inasmuch as axiom II implies that isomorphic classes are identical, as Finsler contends, that move seems at first glance to protect FM from the traditional model-theoretic objections based on Cantor’s Legacy. It was objected that every collection of objects that satisfies the (first two) axioms of FM can be expanded by adding the specific Russell extension to the collection, which again satisfies the first two axioms and contains new/more classes. Therefore there is no finally completed collection satisfying axioms I&II, which contradicts axiom III.35 This simple style of argument, however, does not succeed if the set theoretical background of model theory is changed from ZFC to FM, because of the identity of isomorphic classes and because of the behavior of the big set-classes. In fact, if you eliminate the object which figures as the universal class from the model, there is no obvious reason, at first glance, why the resulting model should not be isomorphic to the old model. The effect seems comparable only to deleting the ‘old’ zero from, or adding a ‘new’ one to, the model of the natural numbers, which results in an identical structure.36 Call the Finsler-style class which has as its elements all sets that are not members of themselves, except itself (halfway formalized: R = { x | x ∉ x ∧ x ≠ R }), the Russell-class.37 Then, to put it very briefly, the added Russell extension could become the R-class (or a subclass of the R-class), and no new isomorphism-type enters the stage and therefore no new class.38
30 Two classes are isomorphic if there is a bijection of the transitive closure of the classes which respects the element-relation; see footnote 41.
31 What we classify as an object related approach is usually referred to as a strong type of (mathematical) platonism (e.g. in [11]), which should explain Finsler’s ‘exotic’ style of thought. On the contrary, it seems to us that Finsler thought about classes as mathematicians today sometimes consider structures as their ultimate objects. Neither are treated in a purely formal-mathematical way.
32 In fact Finsler later strengthened the criterion of identity to something like ‘two classes are identical if it is possible’, and thus gets close to Aczel’s Anti-Foundation Axiom AFA. (Note that Aczel’s formulation FAFA of Finsler’s Anti-Foundation Axiom replaces the implication with a biconditional, adding something like ‘if two sets are identical, then an isomorphism can be given’, and makes AFA and FAFA incompatible this way; cf. [2, p. 46ff.].) In any case, an additional axiom besides extensionality is needed to decide e.g. whether B = {B} and B = {{B}} are only notational variants of the same class.
33 Cf. [19, p. 33], [5].
34 But is informal logic not formalized in S4 and therefore now formal? No. First, it is not formalized by setting up a new logical system. Second, only some rules are formally stated, and they are stated in a formal logic that is, as the comparison later shows, fundamentally different. On formal and informal provability see [39] and the remarks in [9, p. 157ff.].
35 Cf. [4] or [13]. This objection “had a devastating effect on Finsler’s theory and put an early end to it. This is all the more remarkable as the very objection, which Baer took as a theorem stating the inconsistency of Finsler’s axioms[,] had already been put forward, discussed, and rejected by Finsler himself [. . . ] It seems to be rare in the history of mathematics that someone was refuted by an objection of which he himself had been fully aware.” [11, p. 261].
36 Of course, this is obviously not true in common set theories. For it to hold, the big set-classes have to depend not only on their elements but also on the classes they are elements of or on the concepts that can work as conditions in their definitions.
37 We assume here that its existence is provable in FM.
38 Finsler accused Baer of applying a circular definition which fails to define a new
definite set (see [16]).
39 Cf. [17, p. 700].
40 Cf. [19, p. 34].
41 The transitive closure of a class B is the class which contains exactly the elements of B, the elements of elements of B, etc.
If this were the case, the big set-classes would have to be fundamentally different from common sets. In the next paragraphs we indicate that Finsler in fact considered set-classes as concept-dependent and in this sense as not purely element-dependent. Finsler himself, however, denied any relevance of the model theory emerging at his time and so only put one simple argument forward: If you have all classes, there is not one left to be added.39 As the concept of the R-class indicates, Finsler did work with self-referential concepts that are not unproblematic. This allowed him to define the universe of the ZFC-like classes as a class again, namely as the class of all well-founded circle-free classes. Please note the obvious problems with finding a border between the sets and set-classes. If set-classes should primarily be the elements of class theory which contain themselves, then the set-class of all sets would be the Russell extension, i.e. it will be a set if and only if it is a set-class. If the sets should be the well-founded classes, the set-class of all well-founded classes would be well-founded itself and would therefore be a set. Because sets are commonly conceived as being well-founded and (therefore) not self-contained, a possible set-class of all sets is both well-founded and not self-contained, and something has to be added to this characterization to get sets out of classes. Finsler arranged the self-referential part by defining a circular class by two alternative conditions.40 First, every class that has an extensional circle is circular. This means it has an element in its transitive closure which occurs in its own transitive closure.41 Second, every class which is (conceptually) dependent on the (not already stated) concept circular (respectively on the opposed concept circle-free) is circular. A circular class then is defined as being either extensionally circular or dependent on the concept circular, and a circle-free class as being neither. The hard task is to state this circular concept more precisely. First, Finsler explained in epistemic terms that a conceptually dependent class is one which cannot be defined without reference to the concept it depends on, so the ‘conceptual content’ of a circular class which is not extensionally circular depends on the concept of circular. Therefore a conceptually dependent class, which is specified by a condition, cannot be characterized without reference to at least part of the given condition. Thereby a new kind of class is presumed, namely (extensional) classes with a ‘conceptual content’ which is not reducible. Second, reduced to the extensional side: A class depends on the concept circular if its elements necessarily change when one takes, at different times, different classes to be circular (or circle-free). Therefore, the class of all well-founded circle-free classes RC would itself depend on the concept of circular, and thus it is circular and no element of itself.42 In sum, circle-free classes are not dependent on their own ‘conceptual content’ or on the given unity of all of themselves and thus can be ‘built up’ from their elements. Our ZFC-like classes are all circle-free, too, and hence our combinatorial classes seem to coincide with the well-founded circle-free classes. RC is the universe of sets.43
What is the gain for set theory? The circularity of RC immediately entails a principle of reflection. No property of RC that is independent of the concept of circle-free may induce a complete characterization of RC, which it would if RC were the smallest class with this property. Hence there has to be a set (in fact as many sets as the cardinal number of RC) with this property. Keep in mind that this argument also works upwards (presuming a higher cardinal number structure), because RC also cannot be the greatest set with this property. It is unknown how much of the large cardinal hypotheses can be deduced by this principle of reflection.
Finsler himself had no problem with (pure) extensions and set-classes and assumed their existence as normal mathematical objects, having, similar to Quine, a holistic set intuition, namely that set-classes and especially the greatest possible set-classes like the universal class are on a par with sets. Extensions (“Klassen”) are, for Finsler, well-defined objects determined by the two-place relation between the extensions and the (other) classes. He presupposed their existence insofar as he used them to show the consistency of the three axioms of FM in an informal way. For him consistency was sufficient for existence, and his platonistic attitude released him from a search for a further interpretation of classes and extensions.
42 Of course the foregoing paragraphs lack a precise and formal treatment.
43 Obviously, conceptual dependency for classes is a much ‘stronger’ and more informal concept than impredicativity.
It is remarkable that FM would be a class theory just as a platonist would wish, because all open problems which could be rendered in terms of identity statements would be decided in FM by an axiom that is as general as possible, namely axiom II. Finsler never ontologically divided set-classes from sets. He had the intuition that the universal set-class is no more dubious than the empty set or any other purely extensional set. However, he tried to identify a property which characterizes exactly the set-classes and has mathematical impact. If we could make sense of his theory and of the role of conceptual referentiality and the axioms, it would be the only theory that would provide us with a really unified panorama of sets and set-classes and would justify the holistic intuition. Thankfully, there seem to be strong arguments that FM fails to be directly formalizable into a (consistent) theory,44 and so it could possibly serve as a complete informal source for our necessarily incomplete formal axiom systems.
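Only the extensional half of Finsler's circularity criterion from this subsection lends itself to a direct computation; conceptual dependence on the concept circular has no counterpart here. The following is a minimal sketch for finite membership graphs, assuming a hypothetical encoding in which a dictionary maps each class name to the list of its elements. The function names are illustrative and do not stem from Finsler or from the authors.

```python
def transitive_closure(graph, b):
    """Elements of b, elements of elements of b, and so on."""
    seen = set()
    stack = list(graph[b])
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(graph[c])
    return seen


def extensionally_circular(graph, b):
    """True if some class in the transitive closure of b
    occurs in its own transitive closure."""
    return any(
        c in transitive_closure(graph, c)
        for c in transitive_closure(graph, b)
    )


# Hypothetical finite universe: B = {B} is self-membered, C = {B, empty}.
graph = {
    "empty": [],
    "one": ["empty"],
    "B": ["B"],
    "C": ["B", "empty"],
}

for name in graph:
    print(name, extensionally_circular(graph, name))
```

In this toy universe the classes `B` and `C` come out as extensionally circular, while `empty` and `one` do not. Whether two such graphs denote the same class (for instance B = {B} and B = {{B}}) is a separate question of isomorphism that this sketch does not address.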
2.8 Ackermann’s class theory
In [1] Wilhelm Ackermann introduced his axiom system A that included a constant M and a reflection principle that stated that every property that is expressible without M (and without any reference to a given proper class) and that is restricted to members of M, determines a unique element of M. Formally his theory is a class theory, which has one kind of variables only, the membership relation ∈ and the unary predicate symbol M. Mx means ‘x is a set’. Ackermann argues directly from Cantor’s famous definition of set, which is divided into four principles (“Grundsätze”). Each principle determines an axiom or axiom schema. The four principles and their formal analogs are
1. Any property φ(x) which holds of sets only determines a class.
   ∀x(φ(x) → Mx) → ∃y∀z(z ∈ y ↔ φ(z))
2. Classes are determined by their elements.
   (x ⊆ y ∧ y ⊆ x) ↔ x = y
3. If a property φ(x) is such that it implies that x is a set, makes no use of the predicate M, and all its parameters are sets, then it determines a set.
   Mx_1 ∧ Mx_2 ∧ ... ∧ Mx_n ∧ ∀x(φ(x_1, x_2, ..., x_n, x) → Mx) → ∃y(My ∧ ∀z(z ∈ y ↔ φ(x_1, x_2, ..., x_n, z)))
   where φ is a formula with free variables among x_1, ..., x_n, x which does not contain M.
4. The members and subsets of a set are themselves sets.
   Mx ∧ (y ∈ x ∨ y ⊂ x) → My
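As a worked instance of the third principle, the existence of unordered pairs can be derived in a few lines. The sketch below is an illustration added under stated assumptions, not a quotation from Ackermann: a and b name two arbitrary sets, and φ is the pairing condition chosen here.

```latex
% Assumption: a and b are two arbitrary sets, i.e. M a and M b hold.
% Choose the M-free formula with parameters a, b:
\[
  \varphi(a, b, x) \;:\equiv\; x = a \,\lor\, x = b .
\]
% Every x satisfying \varphi is a set, because it equals a or b:
\[
  M a \land M b \;\rightarrow\; \forall x\,\bigl(\varphi(a, b, x) \rightarrow M x\bigr).
\]
% The third principle (with n = 2) then yields a set whose elements are exactly a and b:
\[
  M a \land M b \;\rightarrow\;
  \exists y\,\bigl(M y \land \forall z\,(z \in y \leftrightarrow z = a \lor z = b)\bigr).
\]
```

By the second principle this y is unique. The condition mentions neither M nor any proper class as a parameter, which is exactly what the schema requires.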
44 E.g. Holmes in [27, p. 13] about the third axiom: “We have struggled to understand what is meant by this axiom and how Finsler could draw his stated conclusions from it, and we have been unable to come up with a coherent explanation.” However, a historically sensitive approach to Finsler’s third axiom, which would not treat it simply from the view of modern predicate logic, would probably be able to bridge the gap between the informal axiom and the mathematical conclusions derived from it; see in comparison [65] as a historically informed study on Carnap’s work on extremal axioms starting in the late 1920s. Likewise, Breger [11, p. 258] remarks: “During the 1920s, Finsler did not look quite so old-fashioned as he seems today when he made the case for an absolute logic not capable of being formalized”.
The first two principles concern the objects of the domain. The concept of set-classes is intensionally vague, because a class introduced by the first axiom schema is not given as set or proper class but only as class. The third principle is the heart of his theory. It is mainly the third principle which is strong enough to replace all of the ‘ontological’ axioms of ZF: the Axiom of Pairs, the Axiom of Union, the Axiom of Infinity and the Axiom of Replacement. The proofs of the ZF axioms other than Foundation are straightforward and simple. Reinhardt [58] has shown that if the Axiom of Foundation is added to A, the resulting theory is as strong as ZF.45 Ackermann gives the following motivation for the third principle:
However the concept of set is quite open. Cantor’s definition is intended to mean that a collection is to be tested case by case for being a set, and not that it should be determined at one blow for all collections whether or not they are sets. Thus it won’t be possible to regard the collection as sufficiently sharply delimited only if it can be defined via the general concept of set, as it is the case, say, with the collection of all sets. [1, p. 337]
According to Ackermann, a definition of a set as a collection of things somehow known to us is not satisfactory if it uses the notion of set. With regard to the universal class V and the Russell class R we admit that they cannot be defined (in A) without the predicate M and thus are examples of such an unsatisfactory definition. A positive example would be the pair of two well-defined sets y and z, because its definition does not involve the general notion of set. Thus Ackermann’s third principle allows us to generate a lot of sets which are necessary for set theory while safeguarding us from paradox. Still, it leaves us with a vague feeling that the line between sets and proper set-classes is not drawn clearly enough. Levy [42] regards Ackermann’s justification based on Cantor’s concept of set as “clearly insufficient”. He objects that it is far from clear why a single set-class is not allowed as a parameter in the formulas of the axiom schema, while the bound quantifiers may range over all set-classes. A better motivation for the third principle can be gained if one understands it as a statement about the ‘inexhaustibility’ of the universal class V. Principles of this sort are called ‘Reflection Principles’. A naïve formulation of the reflection principle would be
(NRP) If V has any property P, then there is a set s which also has the property P.
We say that s mirrors V. A naïve formalization of P leads straight to paradox. If we take P to be the property of V to be the class of all sets, the reflection principle states that there is a universal set, which is impossible.46 The difficulties in the search for formal analogs of the informal naïve reflection principle lie in finding suitable candidates for ‘V has the property P’. Once a formal language L like first order logic is fixed, the reflection principle can be reformulated in the following way:
(RP) V ⊨ φ(A) → ∃α V_α ⊨ φ(A^{V_α})
A^{V_α} is called the relativization of A to V_α, i.e. A ∩ V_α. If not relativized in a suitable way, a reflection schema can become inconsistent. If properties are limited to those expressible by first order language formulas and the bound quantifiers are relativized to V_α, one gets a weak reflection principle, which is provable in ZF (see [47]). Ackermann’s third principle can be understood as a reflection principle. Semantically, M stands for the class of all sets V_M. Any class defined by a formula φ0 (which fulfills the requirements of the third axiom) is now reflected from the universe into V_M. Since Ackermann interprets M as V_M, the universal class of all sets, proper classes (the ‘Nichtmengen’) are elements of the discourse. He observes that it is necessary for his conception that proper classes are part of the range of the quantifiers. He does not say how they can be interpreted but is satisfied with the fact that the sets of the theory encompass all important (‘wichtige’) sets. Other interpretations of M, especially the ones taking M as a large set, are possible. We will come back to that discussion below.
Please note that the proper classes of A are not extensions but set-classes. Because it is essential for Ackermann’s conception that M is not definable, its proper classes cannot be exactly the classes which are not elements of other classes. Therefore proper classes of proper classes exist in A, and the argument can be repeated. The hierarchy of proper classes has to be indescribable in a way (that has no strong consequences, since A entails only a few facts about its proper classes). With regard to Cantor’s intuitions about sets, Ackermann’s theory A can be understood as an improvement on the somewhat random ZF axioms. Moreover we have fewer principles and therefore a narrower picture of the ‘force’ which propels naïve set theory. But the picture is still blurred.
45 Meaning that any theorem of ZF is provable in A and any provable statement of A which can be expressed in ZF is provable in ZF.
46 Properties which are suitable in the reflection principle are called ‘structural’; see [75].
2.9 Bernays’ last theory, 1958
Combining the impredicative class theory with Ackermann’s A, Paul Bernays introduced in [7] the class theory BL (Bernays-Levy), which had an impredicative comprehension schema and, similarly to A, a very elegant reflection schema. Building on results by Levy [40], [41], Bernays starts his article with Levy’s Schema of Relativization
(RS) φ → ∃y(Trans(y) ∧ Rel(y, φ))
where φ is a formula, and Rel(y, φ) the relativization of φ to y. Starting with sets and the Axiom of Pairing, Extensionality and the Axiom Schema of Comprehension, he shows how (RS) can be used to gain the Axiom of Union and the Axiom of Infinity. He then introduces a class formalism which allows the use of free class variables in (RS) and shows how the Replacement Schema can be derived. As in [6] and [8] the classes are formally separated from the sets. Finally the formalism is extended by allowing bound class variables in (RS). The formulas, including quantifiers ranging over classes, are reflected into a set y and its power set. Bound class variables are relativized to subsets of the set y; free class variables are relativized by intersection with y. The full schema of impredicative comprehension is added. In the extended formalism BL, the Axiom of Power Sets and the existence of Mahlo cardinals can be derived. Bernays gives only a short comment on the last extension:
It cannot be denied that the class formalism loses its elementary character when extended to the deductive frame presented here. On the other hand this extension seems to be necessary if we want to give the schema of relativization its full impact [7, p. 23]
Requiring only four principles, BL is certainly more elegant and stronger than ZF, A, VNBG or KM. Just as the axioms of ZF can be proven in A, the axioms of KM can be proven very easily in BL. To give an interpretation of BL is more complicated, since one has to interpret the use of impredicative classes and their role in the reflection schema as well. Therefore, BL is most often considered as a set theory with a second order reflection principle, for instance by Kanamori [34, p. 59]: “Bernays postulated the full second order reflection principle for the universe of sets”.
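The operation Rel(y, φ) used in (RS) can be illustrated for the pure set case, where every quantifier is bounded by y. The following is a minimal sketch under assumptions: formulas are encoded as nested tuples (a hypothetical encoding chosen for this example), the variable y is assumed not to occur in φ, and the treatment of class variables in BL (subsets of y, intersection with y) is not covered.

```python
def relativize(phi, y):
    """Relativize all quantifiers of the formula `phi` to the set variable `y`.

    Hypothetical encoding: ("in", a, b), ("eq", a, b), ("not", f),
    ("and", f, g), ("or", f, g), ("imp", f, g),
    ("forall", x, f), ("exists", x, f).
    """
    op = phi[0]
    if op in ("in", "eq"):
        return phi  # atomic formulas are left unchanged
    if op == "not":
        return ("not", relativize(phi[1], y))
    if op in ("and", "or", "imp"):
        return (op, relativize(phi[1], y), relativize(phi[2], y))
    if op == "forall":
        _, x, body = phi
        # "for all x" becomes "for all x, if x is in y then ..."
        return ("forall", x, ("imp", ("in", x, y), relativize(body, y)))
    if op == "exists":
        _, x, body = phi
        # "there is x" becomes "there is x with x in y and ..."
        return ("exists", x, ("and", ("in", x, y), relativize(body, y)))
    raise ValueError(f"unknown connective: {op!r}")


# "There is an empty set": exists x forall z (not z in x)
empty_set_exists = ("exists", "x", ("forall", "z", ("not", ("in", "z", "x"))))
print(relativize(empty_set_exists, "y"))
```

Applied to the sentence stating that an empty set exists, the sketch produces the statement that some element of y has no elements in y, which is the kind of statement (RS) asserts to hold in a suitable transitive set y.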
2.10 Oberschelp
In [51] Arnold Oberschelp introduced an impredicative class theory, G*, in which he identified proper classes with urelements (Urelemente). Starting with a theory of urelements, sets and classes, he shows how the axioms have to be modified in order to identify proper classes with urelements in a relevant way. He thus achieves that classes can be elements of other objects, namely the sets. In the spirit of von Neumann and Ackermann, he is interested in a simple formal theory from which all the results of naïve set theory can be derived. His contribution to formal simplicity is the idea to allow classes to be elements of other objects without being led into contradictions. In G* proper classes may appear as arguments of functions or relations. Notably, it is thus possible to build an ordered pair of two proper classes. This makes the system very flexible.
It has been found practicable to embed the domain of sets into the larger domain of classes. But then classes appear that are not sets, i. e., the proper classes. These cannot be elements. In particular, there are no unordered or ordered pairs of proper classes. Therefore there aren’t any relations and functions which have proper classes as arguments. Of course relations can be represented by formulas H(x, y) with two free variables for which also proper classes can be admitted as values. But such a relation cannot be represented as a single object which itself ranks among the values of variables. Thus the proper classes are not easy to handle. They bring to a close the upward process of set and class formation (every upward epsilon chain terminates when it reaches a proper class) just like the urelements figure as terminal points in the downward direction. [51, p. 236f.]
Another justification for G* is its usefulness in the theory of categories. By definition, categories are often proper classes, but usually one wants to work with them as with simple elements. Just as in the theories of von Neumann, Bernays or Ackermann, classes are ideal objects in that their main purpose is to facilitate the calculus of set theory. Even though Oberschelp devotes a substantial part of his paper to the motivation of G*, semantic questions concerning the ontological status of classes are not posed at all. Syntactically, G* is an impredicative class theory because the comprehension scheme Komp*, even though restricted to avoid contradiction, allows classes as free variables. This underlines the formalistic attitude behind G*. Oberschelp shows that G* is consistent relative to G, a usual class theory with urelements that should be consistent relative to ZF. A natural model of G* will not exist, since proper classes, like V, are allowed to be elements of sets. Oberschelp’s class theory was further developed in [52].
2.11 Topological set theories
Topological set theories47 are a subject of current research and seem promising in providing a natural, alternative enlargement up to the highest large cardinal axioms. They include a positive comprehension axiom schema, which is, systematically, the other well-known way to restrict the formulas of naïve comprehension. In NF only stratified formulas are allowed for comprehension; in positive comprehension the formulas have to be constructed without negation. Therefore both the negation sign and implication are excluded, but falsum is allowed. This construction of positive formulas is strongly related to the laws of topology that preserve open ‘sets’. The set-classes in fact behave like open ‘sets’ in topology. Further, these set theories with set-classes possibly open the justification of set theoretic axioms and equivalent large cardinal axioms to geometric intuition, something we cannot cover in depth here.
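The syntactic restriction behind positive comprehension, as described above, can be sketched as a simple recursive check. The tuple encoding and the function name `is_positive` are assumptions made for this illustration, and the exact class of admitted quantifiers differs between the positive set theories in the literature; the sketch only implements the rule stated in the text: no negation, no implication, falsum allowed.

```python
def is_positive(phi):
    """Check whether a formula is built without negation and implication.

    Hypothetical encoding: ("in", a, b), ("eq", a, b), ("falsum",),
    ("and", f, g), ("or", f, g), ("not", f), ("imp", f, g),
    ("forall", x, f), ("exists", x, f).
    """
    op = phi[0]
    if op in ("in", "eq", "falsum"):
        return True
    if op in ("and", "or"):
        return is_positive(phi[1]) and is_positive(phi[2])
    if op in ("forall", "exists"):
        return is_positive(phi[2])
    return False  # "not", "imp", or any unrecognized connective


universal = ("eq", "x", "x")                       # condition for the universal set-class
russell = ("not", ("in", "x", "x"))                # Russell's condition
disguised = ("imp", ("in", "x", "x"), ("falsum",)) # negation written as implication to falsum

for phi in (universal, russell, disguised):
    print(phi, is_positive(phi))
```

The third example shows why implication has to be excluded together with the negation sign: with falsum available, an implication would reintroduce negation through the back door.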
47 Cf. [3], [30].
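The syntactic restriction behind positive comprehension can be illustrated with a small checker. This is only a sketch: the tuple encoding of formulas and the name `is_positive` are assumptions made here for illustration, and the check enforces nothing beyond the restriction named in the text (no negation, no implication, falsum allowed). How quantifiers are treated differs between actual systems, so letting them pass freely is likewise an assumption.

```python
# Formulas are nested tuples (a hypothetical encoding chosen for this sketch):
#   ("in", x, y), ("eq", x, y), ("falsum",)
#   ("not", phi), ("and", phi, psi), ("or", phi, psi), ("implies", phi, psi)
#   ("forall", var, phi), ("exists", var, phi)

def is_positive(phi):
    """Return True if phi contains neither negation nor implication."""
    tag = phi[0]
    if tag in ("in", "eq", "falsum"):
        return True
    if tag in ("not", "implies"):
        return False
    if tag in ("and", "or"):
        return is_positive(phi[1]) and is_positive(phi[2])
    if tag in ("forall", "exists"):
        # Assumption: quantifiers are passed through; real systems may restrict them.
        return is_positive(phi[2])
    raise ValueError(f"unknown connective: {tag!r}")

russell = ("not", ("in", "x", "x"))   # x ∉ x, the Russell condition
universal = ("eq", "x", "x")          # x = x, the condition defining a universal set-class

print(is_positive(russell))    # False
print(is_positive(universal))  # True
```

The Russell condition is rejected because it needs negation, while the condition x = x passes, which fits the remark that such theories can contain a universal set-class.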
3 Interpreting Classes
3.1 The problem
In [56] Quine added some Supplementary remarks in which he mentions the development of class theory as a major part of logic:
[there are] three parts of logic which it is convenient to develop successively: truth function theory, quantification theory, and class theory.
He adds that, “whereas the first two parts are settled in essential respects, the third part – class theory – is in a speculative state”. Accordingly, for Quine the question of an alternative to ZF was in no way settled. Since the various class and set-class theories have proven to be serious alternatives to ZF, a discussion about the ‘right’ theory emerged which is still ongoing. Let C stand for any axiomatic class theory. The discussion focuses on four different questions.
1.) Which theorems can be proven in C, and to what extent can Cantor’s informal set theory be derived? Since ZF is considered an adequate formalization of naïve set theory, can the axioms of ZF be derived?
2.) How elegant and simple are the formalization of C and the proofs involved?
3.) Is C consistent? Since class theories are essentially stronger than arithmetic, and considering Gödel’s second incompleteness theorem, is the theory consistent relative to other theories, like ZF possibly enriched by large cardinal axioms?
4.) What are natural models of C? How can extensions or set-classes be interpreted in particular?
The first question is mainly a mathematical question and has been explored to a sufficient extent. The second question is similar. It is an empirical question which requires some practical work with the formalism. For radical formalists it is desirable to have a simple formalism. It should be noted that ‘simplicity’ is quite a complex concept. If we compare VNBG and KM with G*, the Axiom of Comprehension of KM, ∃A A = {x|φ(x)}, is obviously the most elegant, while in G* we can freely build the pair of two proper classes. To be finitely axiomatizable, like VNBG, would be another trait of simplicity. There is no obvious way to compare the different traits of simplicity. For someone who denies that classes can be grasped in a coherent way while affirming that sets can, ZF will in any case be the preferable axiomatization, regardless of the fact that it consists of infinitely many axioms or that the Axiom of Separation is rather limited. The third question is again a mathematical question, one that belongs to proof theory and model theory. Apart from NF, the consistency strength of each of the theories cited above is known today. Novak [50], together with Rosser and Wang [63], showed that VNBG is consistent if ZF is consistent. The
connection between the two theories is even stronger. Any theorem of VNBG which is about sets only can be proved in ZF. Thus one can say that the sets of VNBG are the same as those of ZF. From that it follows that the existence of sets that cannot be shown in ZF, such as the Mahlo cardinals, cannot be shown in VNBG either. From such a point of view, VNBG is no stronger than ZF, a fact often cited in favor of ZF. But one has to bear in mind that anyone who is interested only in the proof-theoretical strength of a theory has already made unspoken philosophical commitments. KM is consistent relative to the theory gained by adding to ZF the statement ‘There exists an inaccessible cardinal’ (see [41], [42]). It is an interesting empirical fact that almost any axiomatic theory of this kind, such as the various class theories, can be shown to be consistent if ZF + LC is consistent, with LC standing for a large cardinal axiom. Moreover, most class theories allow natural set models V_α, where α is a large cardinal number. This leads to the last question of how classes are to be interpreted. This question goes beyond pure mathematics. In this part we will show how this question was answered by set theorists.
3.2 Predicative or Impredicative Classes
In any of the class theories discussed above one always encounters the same problem: How are the class variables to be interpreted? Our tendency to interpret a variable as ranging over the elements of a domain of discourse leads to the conclusion that there is something bigger than V, the universal class, which is paradoxical. One can always avoid semantic questions by retreating to a formalistic standpoint. As long as a theory that allows us to formalize naïve set theory is consistent (and, since a direct finitistic consistency proof is impossible, that means that no inconsistency has been detected), we do not have to bother about semantics. But if one takes semantics seriously, the class variables have to be interpreted. Quine’s Class Commitment requires that we can always explain what kind of objects the quantifiers range over. One can try to interpret the proper classes of VNBG as shorthand for definite properties of sets.⁴⁸ But then either ZF or virtual class theory seems to be the more consistent axiomatization, since quantification is restricted to sets. In fact, the virtual class theory of Quine and Bernays can be seen as a natural endpoint of the discussion of how to reconcile Frege’s notion of a class and Zermelo’s axioms on sets. The dominance of this view on proper classes can be seen by looking at the popular set-theoretic textbooks that were guided by the idea that one can dispense with proper classes. Drake ([14]), Jech ([32]), Levy ([43]), and Kunen ([38]) used the ZF axioms and introduced real proper classes in a semiformal way, but not as objects of the theory. This view clearly shows that the mathematicians followed Gödel’s objectivism (see Wang [76, p. 188], [74, p. 9]) regarding sets. In KM proper classes can no longer be interpreted as properties of sets only, but are, as Maddy [44, p. 122] calls them, “combinatorially determined subcollections of the universe of sets”. One should therefore expect that theories which admit proper classes would have been rejected, since a coherent interpretation of proper classes seems impossible. But since the ontological commitment is based primarily on quantification, one could say that, if we adopt a class theory anyway, it is not necessary to restrict the Axiom of Comprehension as in VNBG. Thus we adopt either virtual or impredicative class theory. This applies to a formalistic as well as to an objectivistic position. In KM {x|φ(x)} is always a class, while in VNBG we have to check whether φ(x) is equivalent to a formula in which no quantification over proper classes is involved. Following von Neumann’s motivations mentioned above, KM turns out to be the better alternative. In Levy’s words:
The main argument in favor of QM is that once we agree, as von Neumann did, to avoid the antinomies not by forbidding the existence of large classes, but by denying their elementhood, there is no reason at all why we should stop at classes defined by conditions, which do not involve class quantifiers and not admit classes defined by other conditions. [42, p. 198]
48 Predicative classes can be characterized definitely by comprehension terms {x|Ax}, where Ax is a one-place set predicate in a formal language of set theory.
If one is guided by the Class Commitment Principle, then proper classes are either real objects, which means they should be in the range of the quantifiers, or they are not, which means we should not quantify over them at all. Now I think both Bernays and Quine can be understood as having an objectivistic conception of sets precisely in the way they took the ontological commitment seriously. Müller gives an account of Bernays’ attitude when he recalls that “Bernays did not consider classes as real mathematical objects”.⁴⁹ But if one is to have quantification, then no restriction should hold. Only three years after his virtual class theory Bernays came up with an impredicative class theory BL [7]. The case is similar with Quine.⁵⁰ Quine finds VNBG “not without attractions” when “freed for impredicative cases” [55, p. 321]. His argument against an impredicative enlargement of ZF-like set theory is notable. It is based on the following Axiom:
For any object x, {x} is an object too, and for any two objects a, b, {a, b} is an object.
Now from this axiom it follows that for any object x, x ∈ {x} ∈ V, so that x is a set. Therefore no extensions are possible. He gives a sketch of his decision:
Only because of Russell’s Paradox and the like do we not adhere to the naïve and unrestricted comprehension schema [. . . ]. Having to cut back because of the paradoxes, we are well advised to mutilate no more than what may fairly be seen as responsible for the paradoxes. Thus I do not want to mutilate truth-function logic [. . . ] nor the logic of quantification, but only set theory, the laws of the ‘∈’. But within set theory there is in turn the conspicuous distinction between finite and infinite classes; the obscure classes are infinite ones, and only the infinite ones give rise to paradox. Our maxim of minimum mutilation then favours admitting all finite classes of whatever things we admit. [55, p. 50]
49 Cf. [35].
50 For a discussion on Quine’s attitude towards the reality of mathematical objects see [45].
Suppose every class had a singleton. Then we could map the classes one-one into V. By the usual diagonalisation argument, this is impossible. The axiom commits us against proper classes. It entails that for any object x, x ∈ {x} ∈ V, so x cannot be a proper class. Therefore no extensions are possible. It is interesting, though, that Quine is quite careful about his rejection of adding extensions to set theory.
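The diagonalisation mentioned here can be spelled out in a few lines. What follows is one possible reconstruction, not Quine’s own wording. It assumes extensionality for classes and a comprehension principle strong enough to define the class R, which requires a class quantifier and is therefore impredicative. The names R and s_0 are introduced only for this illustration.

```latex
\begin{align*}
&\text{Assume every class } X \text{ has a singleton } \{X\} \in V. \\
&\text{Let } R = \{\, s \in V \mid \exists X\,( s = \{X\} \wedge s \notin X ) \,\}
  \text{ and } s_0 = \{R\}. \\
&\text{If } s_0 \in R:\ s_0 = \{X\} \text{ for some } X \text{ with } s_0 \notin X;\
  \{X\} = \{R\} \text{ gives } X = R,\ \text{so } s_0 \notin R. \\
&\text{If } s_0 \notin R:\ s_0 = \{R\} \text{ and } s_0 \notin R,\
  \text{so } s_0 \text{ meets the defining condition of } R,\ \text{i.e. } s_0 \in R. \\
&\text{Both cases are contradictory, so not every class has a singleton
  (in particular } R \text{ has none).}
\end{align*}
```

The map X ↦ {X} plays the role of the one-one map from classes into V, and R is the usual diagonal class for that map.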
3.3 The cumulative hierarchy
Many set theorists are guided by two principles which are called the iterative conception of set and the cumulative hierarchy. The sets are divided into layers, and for each set there is a minimal layer which contains it. So for each set it is clear whether it appears ‘before’ another set. The hierarchy can be defined by transfinite induction:
V_0 = {}
V_{α+1} = P(V_α) at successor ordinals α+1
V_α = ⋃_{β<α} V_β if α is a limit ordinal.
Their intuition about V = ⋃_{α∈Ω} V_α is that it is essentially unfinished. Therefore any theory introducing objects other than sets is welcome but does not really apply to V but only to a V_α. Class theories where classes are extensions of sets, like VNBG, KM or BL, can be interpreted over standard set-models by interpreting sets as members of a large V_α, and the proper classes as the new members of the powerset of V_α, that is V_{α+1} \ V_α. In class theories like A or S+ the proper classes are interpreted as members of an even bigger V_β.⁵¹ It can be shown that the models of class theory correspond to standard models of ZF enriched by strong cardinal axioms. The structural difference between proper classes and sets can always be reflected into the difference between larger and smaller sets. It is this possibility that raises the question of why to bother with class variables if everything can likewise be expressed solely by using set variables.
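The first finite stages of the hierarchy can be computed directly, which makes the picture of ‘sets in V_α, classes in V_{α+1} \ V_α’ concrete. The following is only a toy sketch: the function names `powerset`, `stage` and `rank` are chosen here for illustration, only finite stages are reachable, and a finite index n is of course not a large cardinal, so the final lines mimic the shape of the interpretation rather than a genuine model.

```python
from itertools import combinations

def powerset(s):
    """Return the set of all subsets of s, as a frozenset of frozensets."""
    items = list(s)
    return frozenset(
        frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)
    )

def stage(n):
    """V_n for a finite n: start from the empty set and apply the powerset n times."""
    v = frozenset()
    for _ in range(n):
        v = powerset(v)
    return v

def rank(x):
    """Least n with x a subset of V_n, so that x first appears in V_{n+1}."""
    return max((rank(y) + 1 for y in x), default=0)

print([len(stage(n)) for n in range(5)])  # [0, 1, 2, 4, 16]

# The hierarchy is cumulative: each stage is contained in the next.
print(stage(3) <= stage(4))               # True

# Toy analogue of the interpretation in the text: the members of V_3 play the
# role of 'sets', the new members of V_4 the role of 'proper classes'.
new_layer = stage(4) - stage(3)
print(len(new_layer))                     # 12
print(all(rank(x) == 3 for x in new_layer))  # True
```

Every element of the new layer has rank exactly 3, so the only thing separating these ‘classes’ from the ‘sets’ is the stage at which they appear.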
Maddy [44, p. 122] draws an accurate picture of set theoretical practice:
The problem is that when proper classes are combinatorically determined just as sets are, it becomes very difficult to say why this layer of proper classes atop V is not just another stage of sets we forgot to include. It looks like just another rank; saying it is not seems arbitrary. The only difference we can point to is that the proper classes are banned from membership in sets of rank less than κ. Because the classes look so much like just another layer of sets, most set theorists think of the proper classes of a weak system like VNBG as metamathematical shorthand, and those of the stronger MK as subsets of a suitably chosen high rank.
51 S+ is introduced in section 3.4.
Or in Reinhardt’s words: [. . . ] many mathematicians find ZF far more natural than KM; our idea of set comes from the cumulative hierarchy, so if you are going to add a layer at the top it looks like you forgot to finish the hierarchy. [59, p. 32]
Especially when a predicative interpretation of classes cannot be given, as in KM, “it is frequently regarded philosophically not as a theory about classes, but about subsets of some Rθ [Vθ ] where θ is inaccessible”. He adds that, “presumably any extensional theory of properties should allow such a natural model”. Finally Levy:
The main point which will, in our opinion, emerge from this analysis, is that set theory with classes and set theory with sets only are not two separate theories; they are essentially, different formulations of the same underlying theory. [42, p. 199]
The question remains what a theory is and when we are justified in speaking of different theories. An answer suggests itself, however, if we are talking about the same facts regardless of whether we work with classes or without them. By applying Occam’s razor, we can dispense with classes. But this view is not very satisfactory. When we refer to V and proper classes it seems fairly clear what we mean. By V we understand the class of all sets. The fact that we can reinterpret any theory which is supposed to be true in V in a set model does not discredit our intuition about V. On the contrary, it gives us insight into V, which is so rich in structure that no axiomatic theory can exhaust it. This principle, which says that V is essentially indescribable, is called the Reflection Principle.
3.4 Reflection principles
The axioms of ZF do not tell us enough about large sets and the structure of the universal set V and the transfinite sequence of the ordinals Ω. Our knowledge of V and Ω depends on reflection principles. Since Ackermann’s theory A, which made use of the constant M, proved to be just as strong as ZF, the central question was how stronger properties could be found. Now classes, as subclasses of V, the universal class, are the natural extensional counterpart to properties. Thus intuitively a property is described by the class of all sets having that property. In [8] Bernays showed how classes could be used to strengthen the reflection principle.
Shoenfield, Powell and Reinhardt followed Ackermann and Bernays by trying to find new set existence principles going beyond ZF. Like Ackermann, they augmented the language of set theory and added stronger reflection principles, which allowed them to generate new sets. They focused on the analysis of properties P and formulated axioms stating that these properties should be incomplete in the sense that if they hold for a given set model, there has to be a larger model in which they are valid. In [59] Reinhardt was able to show that the ideas of Powell and Shoenfield could be formalized and led to the same theories. He presents the class theories S and S+, which combine the use of classes and the reflection principle. S is very similar to Ackermann’s A. The constant is called V, the Axiom of Extensionality is assumed for all classes (Ackermann’s second principle), and elements and subclasses of sets are sets (Ackermann’s fourth principle). Ackermann’s class comprehension schema is replaced by Zermelo’s Axiom of Separation for classes:
∃z∀t(t ∈ z ↔ θ ∧ t ∈ x)
Ackermann’s third axiom schema is strengthened:
x_1, x_2, . . . , x_n ∈ V ∧ ∃X∀t(ϕ(x_1, x_2, . . . , x_n, t) → t ∈ X) → ∃y(y ∈ V ∧ ∀z(z ∈ y ↔ ϕ(x_1, x_2, . . . , x_n, z)))
where ϕ is a formula with free variables among x_1, . . . , x_n, t. In A the free variables of the ‘reflected’ formulas ϕ are all sets. In S the free variable t of ϕ is restricted to some class X only. Reinhardt calls X an imaginable set. If an Axiom of Foundation is added the following reflection principle can be derived:
x_1, x_2, . . . , x_n ∈ V → (ϕ^V ↔ ϕ)
where ϕ^V is obtained from ϕ by relativizing the quantifiers to V. This result justifies interpreting Ackermann’s third principle as a reflection principle. In S+ properties are axiomatized. To formalize the reflection schema, two constants V and U are introduced. U stands for all sets and V is restricted to the existing or real sets. In our terminology, V contains all the sets, U the set-classes. In a theory like S+, the universal class of all existing sets, V, is an object, and one can formulate a reflection principle (S3.2) which states that if a property holds in any object, then there is a real set in which the property holds. What makes the theory essentially stronger than a theory like ZF or A is Axiom (S3.3), which states that any subclass C of V is equal to a property.⁵² This principle is non-constructive and accordingly the use of classes in S+ is impredicative in a strong sense. At the time Ackermann formulated his axiom system, there was no need to think about strengthening the reflection principle. Twenty years later, however, various large cardinal principles which had a reasonable claim to be valid principles for sets but could not be proved in ZF had been formulated. In S+, the class of all ordinal sets χ = {α ∈ V | α is an ordinal} is measurable. Reinhardt differs from many other set theorists by introducing classes as a key to motivate set theoretic principles like the existence of measurable cardinals.
The axioms for properties expressed in S+ are supposed to be true in Cantor’s set theory, and I hope the reader will agree they have been introduced in a natural way. [59, p. 31]
52 Stated explicitly: (S3.3): ω ⊆ V → ∃Q∀t ∈ V (t ∈ ω ⇔ t ∈ Q), where Q ranges over the existing properties.
He adds that “properties and proper classes (as considered here) are not entrenched in mathematical practice in the way sets are” [59, p. 31]. Developing the theories BL and S+ further, Victoria M. Marshall [46] introduced similar theories with higher order reflection principles, using not only classes but also classes of classes, 2-classes, 3-classes etc. as constants and adding corresponding reflection principles. Syntactically, the universes V 1, V 2, . . . are introduced as constants to formulate the principles. Marshall showed that even the strongest large cardinal axioms are consequences of the resulting theories.
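The relativization of quantifiers to V, used in the reflection principle of this section, is a purely syntactic operation and can be sketched in a few lines. The tuple encoding of formulas and the names `relativize` and `show` are assumptions made for this illustration; the sketch follows the standard textbook clauses (a universal quantifier becomes a conditional on membership in V, an existential quantifier a conjunction) and does not reproduce any particular author’s formal system.

```python
# Formulas are nested tuples (a hypothetical encoding chosen for this sketch):
#   ("in", x, y), ("eq", x, y), ("not", phi),
#   ("and" | "or" | "implies" | "iff", phi, psi),
#   ("forall" | "exists", var, phi)

def relativize(phi, bound="V"):
    """Return phi with every quantifier restricted to members of `bound`."""
    tag = phi[0]
    if tag in ("in", "eq"):
        return phi
    if tag == "not":
        return ("not", relativize(phi[1], bound))
    if tag in ("and", "or", "implies", "iff"):
        return (tag, relativize(phi[1], bound), relativize(phi[2], bound))
    if tag == "forall":
        _, var, body = phi
        return ("forall", var, ("implies", ("in", var, bound), relativize(body, bound)))
    if tag == "exists":
        _, var, body = phi
        return ("exists", var, ("and", ("in", var, bound), relativize(body, bound)))
    raise ValueError(f"unknown connective: {tag!r}")

SYMBOLS = {"and": "∧", "or": "∨", "implies": "→", "iff": "↔"}

def show(phi):
    """Render a formula as a string."""
    tag = phi[0]
    if tag == "in":
        return f"{phi[1]} ∈ {phi[2]}"
    if tag == "eq":
        return f"{phi[1]} = {phi[2]}"
    if tag == "not":
        return f"¬{show(phi[1])}"
    if tag in SYMBOLS:
        return f"({show(phi[1])} {SYMBOLS[tag]} {show(phi[2])})"
    quantifier = "∀" if tag == "forall" else "∃"
    return f"{quantifier}{phi[1]} {show(phi[2])}"

# 'Every set is a member of some set.'
phi = ("forall", "x", ("exists", "y", ("in", "x", "y")))
print(show(relativize(phi)))  # ∀x (x ∈ V → ∃y (y ∈ V ∧ x ∈ y))
```

Read this way, the schema x_1, . . . , x_n ∈ V → (ϕ^V ↔ ϕ) says that, for parameters in V, a formula and its relativized counterpart have the same truth value.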
3.5 Interpreting classes in the reflection principle
If one wants to interpret the classes in systems like Ackermann’s A, Bernays’ BL or Reinhardt’s S+, one has to take into account that the use of set-classes and extensions is by intention not meant as a formally dispensable augmentation, but is essential to their theories. Theories like S or S+ are valid in natural set models V_α, where α is an extendible cardinal, but that does not justify their truth, unless we have other justifications for the existence of extendible cardinals. In fact, by an informal reflection argument the existence of extendible cardinals can be justified by the truth of S+: if V ⊨ S+ then there is an α such that V_α ⊨ S+. A discussion of the interpretation of proper classes can be found in [59], [60] and [75]. Both Wang and Reinhardt agree that, on the one hand, proper classes provide an important tool to formulate strong reflection principles on V or on Ω, the class of all ordinals, i.e. to formulate valid principles about Cantor’s universe, while on the other hand the use of proper classes threatens the universality of set theory. Since proper classes are necessary to formulate strong reflection principles,⁵³ these principles cannot be justified by showing that they are valid in set-models. Reinhardt makes this point clear:
Of course we can suppose there is a set which is a model, but this defeats the intention that the Rω [ = V, the universe of all sets] part of the theory really applies to Cantor’s universe and that proper classes are really classes of sets. On the other hand, if we introduce new objects (even proper classes) beyond Cantor’s universe this violates the universality of the concept of set. [60, p. 198]
53 A strong reflection principle is one that generates the existence of large cardinals.
As long as one is dealing with theories like VNBG or KM one can interpret the extensions as large sets and the sets as certain small sets. But such an interpretation is no longer acceptable if the use of classes is essential, as in S+, due to a direct appeal of our intuitions to Cantor’s universe. It is therefore important to take a closer look at the reflection principle of the theories and the use of classes in them. While Reinhardt [59], [60] differentiates between real and imaginary or potential sets, Wang [75] approaches the problem without any appeal to potentiality. He discusses how Ackermann’s theory A can be interpreted, but his thoughts apply to S, S+ or BL accordingly. The central question is how the constant M may be interpreted:
However, once Ackermann’s axiom [the reflection scheme involving the constant M ] is formulated explicitly, we encounter a serious problem of interpreting his system, because M is thought of as V and at the same time taken as one thing in the universe of discourse. If the variables range over sets only, then M cannot be V, because M belongs to the range. If M is to be taken as V, then the variables must range over a larger domain containing things which are not sets. [75, p. 323]
If we interpret M as the universal class V, as Ackermann probably intended, we have fixed the extension of V and the quantifiers range over something outside of V. We are confronted with the same problem as in the interpretation of a theory like KM. The universality of set theory is violated, so that one is led to interpret M as a set. However, the reflection schema can be justified by an informal reflection argument. V is undefinable in just the way Ackermann proposed. Therefore there has to be a set M which shares this undefinability with V. The latter undefinability can be formalized by the reflection schema of A. Informally we consider V as a class, but formally we can only state axioms about the set M. Philosophically one might call this an indirect reflection argument. Such an argument seems very appealing. It would allow us to interpret a theory like A over a set model M = ⟨V_α, V_β⟩ while the truth of the reflection principle would be due to the undefinability of the universal class V.⁵⁴ But it turns out that this interpretation yields an “insuperable difficulty” [75, p. 324]. A property P that is undefinable in the sense specified by Ackermann’s reflection schema is a property of a set m only because it is thought of as a property of V. If it were no property of V, why should the reflection schema be valid? Thus in this interpretation we have to talk about properties of V, so we have to envisage a bigger universe than the domain of the theory V. In the end, the universality of set theory is threatened again. Therefore, Wang argues for the first of the two alternative interpretations, where M is interpreted as V, and accepts that proper classes are imported into our theory, and that it is somehow possible to go “outside” of V. The idea of ‘going outside of V’ is taken up by Reinhardt.
54 V_α represents the ‘indefinable’ universe and V_β its reflection.
3.6 Potential sets
In both [59] and [60] Reinhardt distinguishes between the members of M (or V in his notation), which are called real or existing sets, and the individuals of the discourse, or members of U, which he calls imaginable or potential sets. So we can still regard V as the universal class of all existing sets, while we can go outside of V by considering levels of imaginary sets which go beyond V. Such a modal understanding of sets would certainly “require considerable clarification and defense” ([37, p. 217]), but a closer look reveals that Reinhardt does not really defend such an understanding. He takes “existing” and “imaginable” as a heuristic interpretation which allows us to understand the reflection schemata of S and S+. He discusses several ways of understanding the distinction between “existing” and “imaginable”. First he makes clear that the quantification can be understood purely extensionally, just in the way discussed by Wang [59, p. 9], [60, p. 198]. V is then interpreted as a large set mirroring the real V. This of course leaves us with the same problem of how to interpret the proper classes. An alternative interpretation, which is very close to Ackermann’s, is to conceive V and the subclasses of V as essentially unfinished:
Namely, we may conceive of V , the class of all sets which exist or have been built up, the “available” sets, having definite membership, as a variable in the old fashioned sense of a “quantity which varies”. [. . . ] Thus the extent of V varies along the ordering of stages. Thus thinking of the order “temporarily”, V changes from moment to moment. [59, p. 9]
Or in [60, p. 196]: A proper class P may however be distinguished from a set x in the following way [. . . ]: If there were more ordinals [. . . ], x would have exactly the same members, whereas P would necessarily have new elements. We could say that the extension of x is fixed but that of P depends on what there exists.
While in [59] he thinks that this interpretation cannot make the axioms of S or S+ plausible, in [60] he is more optimistic and gives a picture of how these “potential extensions” can be understood. Consider a universe including V, the class of all real sets. This universe is projected into a second universe, in which V is projected to jV in such a way that V can be elementarily embedded in jV. This is achieved by the following projection schema:
∀x, y ∈ V [θ^V(x, y) ↔ θ(x, y)]
Through this procedure V = V_Ω, where Ω (relative to V) is the class of all ordinals, becomes a set in the projected universe, which has strong closure properties, like Ω being an inaccessible cardinal. Thus the extent of the class V has grown, while the extent of any set is still unchanged. The projection schema can be understood as a formal analogue to the idea of indirect reflection. We can go outside of Cantor’s universe, formulate valid principles which hold in V or in Ω and reflect them into sets. The problem remains to clarify what is meant by going outside of Cantor’s universe. As in Axiom (3.3) in [59] (referred to as (S3.3) in section 3.4), the projection can be considerably strengthened by projecting not only the sets x ∈ V, but also arbitrary classes X ⊂ V. In strengthening the projection schema, proper classes appear in two quite different ways. First, they are extensionally open, because the extension of the class predicate ∀xφ(x) can change in the projected universe. But secondly, proper classes, being subclasses of V, are extensionally determined entities, which are values of a function and can be projected into sets in the projected universe. Reinhardt is aware of this double role:
[. . . ] while classes P ⊂ V are treated as having “potential extensions” jP , they [. . . ] are simultaneously treated formally as sets. [60, p. 197]
Since V is a fixed element of the theory of discourse, as in Ackermann’s A, many classes above V exist. Reinhardt calls them Ω-classes and shows how the projection schema can be adapted to include Ω-classes. Since Ω-classes go beyond sets and subclasses of V, they cannot be understood as a part of Cantor’s universe. The only way to justify such higher order reflection schemas is by indirect reflection, as was pointed out above. If it were possible to go outside V and to formulate valid sentences, we would come to understand that they are valid in V:
We propose to mitigate this sorrow by seeing the universality not in the extension, but in the applicability of the theory of sets. [60, p. 198]
In conclusion Reinhardt tries to combine the results of set theory with a new interpretation of sets and classes. He shows that strong cardinal principles cannot be simply derived from an axiom system like ZF, NF or KM but require a different approach.
3.7 Classes as ideal elements
Historically the use of classes in axiomatic set theory follows two traditions: the development of Cantor’s set theory and the method of ideal elements as put forward by Hilbert in the 1920s. The latter was a part of Hilbert’s Program, the attempt to found mathematics and logic by finitary methods. Ideal elements were a tool to retain as much transfinite mathematics as possible, while following a finitistic conception. The method of ideal elements refers to ideal objects like i = √−1 as well as to ideal propositions (see [26]). While ideal objects like i simplify the laws of algebra, such
as the existence and number of roots of a polynomial, ideal propositions simplify the formalism, because they allow us to apply classical logic to mathematics. In the same manner a class formalism simplifies set theory, e.g. by allowing an unrestricted Axiom of Comprehension. The standard of evidence corresponds to the truths of the sentences of set theory such as the ZF axioms. The sets, being an underlying system of objects, are enriched by ideal objects which we call proper classes. Bernays especially, who was always very careful about the formal distinction between set axioms and the class formalism, can be understood as standing in such a tradition. The method of ideal elements is linked closely to formalism. But this does not imply that the attitude is necessarily formalistic. Even though it turned out that it was not possible to secure classical mathematics by finitary consistency proofs, the method of ideal elements remained valuable.⁵⁵ So Hilbert’s Program was partially pursued by his pupils von Neumann, Ackermann, Bernays and others. Their work on axiomatic class theory shows that the method of ideal elements was independent of the whole program. The fact that there is a tradition helps to understand why the question of how to interpret the formalism was not the prime issue. If we ask how ideal elements have helped mathematicians there are at least two aspects which could be mentioned. The first aspect focuses on the formalism, which is facilitated by the use of ideal elements. This aspect can be found in theories like VNBG, KM or G*, in which classes are added somewhat arbitrarily to an essentially set-theoretic axiom system. The formalism provides a helpful framework but is in no way necessary to express set-theoretic theorems. We can compare this use of classes with the use of the sign “∞” in an expression like lim_{x→0} ln x = −∞. Here the sign ∞ is part of a convenient method to
express that for any given small y and any given negative number there is a smaller x (0 < x < y) such that the natural logarithm of x is beneath the given negative number. Now the sign “∞” is usually not added to the language of arithmetic. Therefore it can be argued that terms for proper classes should not be added to the language of set theory, but only introduced in an informal way, as is done in most textbooks. This thought is strengthened by the fact that VNBG, KM and G* do not provide any new theorems about sets. The second aspect focuses on the original standard of evidence, the sets. If one deals with a theory like N, A, S+ and BL classes cannot be interpreted as superfluous formal additions, because they play a crucial role in the formalism, especially in the reflection schema. It is the property of inexhaustibility of the classes which is reflected into the sets and which, in
55 The impossibility of finding a finite consistency proof follows from Gödel’s second incompleteness theorem.
The Interpretation of Classes in Axiomatic Set Theory
307
short, provides the motivation for even the strongest cardinal hypotheses.56 The ideal elements, the imaginary sets and the proper classes involved in formalism play a more active role. They provide essential help to understand the structure of the real elements, the sets. By translating theories like S+ into set theoretic versions, the ideal elements can be eliminated again. There is an analogy for this kind of ideal element. They are the same kind of ideal elements as Wittgenstein’s Scheinsätze in his Tractatus. It is worth talking about these a bit more, because the analogy is suitable. In the Tractatus, Wittgenstein claims that only those sentences that can be true or false, depending on facts, are meaningful. All meaningful sentences are equal, there is no order to them. This means that any other sentence is either nonsensical, because we did not give one of its components a proper meaning, or meaningless. The latter sentences include tautologies and all the sentences of mathematics. To give a full description of the world, only the meaningful sentences are necessary. By thinking about the world, we try to find metaphysical rules, which we formulate as sentences. Using a proper analysis, we have to understand that all such sentences are in some way without sense. However, this process of philosophizing is not without any use. Through it, we are able to understand the structure of the world and (or as well as) the fact that sentences that allow no verification are without meaning. Once the Scheinsätze have helped to understand the nature of real sentences and the structure of the world, one can dispense of them. For set theory this means that only the sets are real. If we try to introduce classes as real objects and include them into formalism, we find out that the resulting theory can be ‘translated’ into plain set theory, enriched with strong cardinal axioms. 
Having done this, we come to ‘see’ that the assumption of the existence of the classes can be dropped, and thus the universality of set theory is guaranteed. It is an empirical fact, a fact strengthening our belief in the independence of sets from our mind and the universality of set theory, that this translation can always be achieved. Theories like A, NFU, S+ or BL are either equiconsistent with or imply corresponding set theories. Thus it is possible to believe in the truth of the reflection principles without having to accept the existence of real classes, even though the idea of including proper classes leads to the formalization of the principles. In [64] the author has tried to show that mathematical thinking about sets and classes can be understood in just that way. The problem is that such an understanding of classes requires indirect reflection to go outside of V, which is at least a dubious process.
56 It is important to understand that the reflection principles were discovered, and that it is an empirical fact that they could be strengthened by the use of impredicative classes. If one reads through Bernays’ article from 1961 [7] one can get a very good impression of these discoveries.
Daniel Roth & Gregor Schneider
4 Concluding Remarks
The presented set theories cover all three possible incorporations of set-classes and extensions in set theories. The universe of ZF consists only of sets, but N, KM and BL integrate extensions – which cannot be members of other entities – into a universe of sets. NF, NFU, A, topological set theories and similar set theories with a universal set incorporate set-classes (as a Finsler set theory would), and ML joins sets, set-classes and extensions into one axiomatic theory. Set theories with set-classes tend to include ‘big’ set-classes like the universal set-class which close the universe; however, they do not have to. Insofar as in A the fact that the proper classes are (at least partly) elements of other proper classes is intimately connected to the heart of this theory (the third principle and axiom schema), its proper classes differ considerably from a ZF-universe augmented again and again by layers of extensions, although both can be modeled in a V_κ (κ a large cardinal). Mathematical input for a better grasp of sets comes more from the incorporation of set-classes, as these theories have an independent enlargement into the hierarchy of large cardinal axioms. As the invention of the reflection principle was made in its natural connection to A and FM, its implementation in set theories with extension-like proper classes as in BL always seems to be more technical. So there is only little, but nevertheless some, hope that an application of Quine’s and Finsler’s holistic set intuition could bring up a unified theory of sets and set-classes. If the incorporation of set-classes leads to a better understanding of sets, that would justify the treatment of set-classes as ontologically equal. Furthermore, such a theory contains a universal set-class, so that the domain of the variables of the formal language of the theory is itself an object of the theory.
The validity of (CL), however, would have to be restricted to the part of the sets.57 If one focusses on sets and extensions only, the proper classes show up as a kind of limit of thought. Either one needs them as an enhancement of the sets to get started with strong reflection, or the domain principle58 forces at least the universe class on one. Or, to make the first point more general, proper classes are necessary to look at sets from a wider perspective. It seems reasonable to consider mathematical structures not only for themselves but from the perspective of a richer structure (one can call it the principle of richer structure). However, for a mathematical structuralist it could be different. If she takes sets and set-classes (including the universal set-classes) and alters this theory until it becomes a pure theory of structures similar to FM, then – as mentioned in the first part – the relation between the structures and the extensions changes. Now the extensions give an outer ‘picture’ of the structures, which is not directly related to structural equivalents, as the Russell extension could not (at least easily) be implemented as a new structure to the structures and no obvious new knowledge about these structures could be generated by those proper classes – contrary to the enhancement of NF to ML. Anyhow, the development of set theory in the 20th century would have been very different if this principle of richer structure had been applied early to what is called the naïve comprehension principle, which in fact is not so naïve. To recognize that, we enrich the domain of a set theory by adding arbitrary objects which are not sets and not elements of sets. Then the naïve comprehension schema turns into the following: A formula with one free variable is the condition of the elements of a set (or simply forms a set), if it is true only for some sets. To make the object-language able to express what a set is without running into contradiction, one can add a predicate M(x) for ‘x is a set’, which is not definable in the object-language. However, that is exactly the central axiom schema of Ackermann’s class theory and entails a kind of reflection (as seen before). In fact, the presupposition of a domain of all and only sets, which is necessary to get started with a common naïve comprehension schema, is dubious in light of the fact that, still, nobody has put down either a complete formalization of set (which is impossible to do) or a complete intuition of what a set is. To employ the indefinable predicate M can be seen as a reflection on this state of affairs. In addition, (CL) seems to contradict the move to put up a closed unity of all sets. Hence, in view of these thoughts, if there really is an intuitive core of the naïve comprehension schema, it can be rescued by such a context-sensitive formalization.
57 In fact, if one turns (CL) around so as to start with the ‘biggest’ collections and to deny the existence of smallest ones, this principle is valid for set-classes in NF and for a part of the set-classes in FM. In this sense there is no ‘sharp border’ between sets and set-classes.
58 The domain principle in mathematics seems to go back to Cantor, who insisted (at least in an earlier period) that every domain of discourse of a mathematical discipline has to be definite (in its parts) and therefore is (to use terms of the later Cantor) a kind of, perhaps inconsistent, multiplicity (see [12, p. 410f.]).
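The reformulated comprehension schema described in this section can be written out symbolically. The following is only a sketch in one common presentation of Ackermann's schema, and its notation is an assumption rather than the notation of this chapter: φ is taken to be a formula in which the predicate M does not occur, x is its distinguished free variable, and y_1, ..., y_n are its parameters.

```latex
\[
  \bigl( M(y_1) \wedge \dots \wedge M(y_n) \wedge
         \forall x\, ( \varphi(x, y_1, \dots, y_n) \rightarrow M(x) ) \bigr)
  \;\rightarrow\;
  \exists z\, \bigl( M(z) \wedge
         \forall x\, ( x \in z \leftrightarrow \varphi(x, y_1, \dots, y_n) ) \bigr)
\]
```

Read informally: if φ, with sets as parameters, is true only of sets, then there is a set whose elements are exactly the objects satisfying φ. The restriction that M does not occur in φ is what reflects the indefinability of M in the object-language.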
References
[1] Wilhelm Ackermann. Zur Axiomatik der Mengenlehre. Mathematische Annalen, 131:336–345, 1956.
[2] Peter Aczel. Non-well-founded Sets. Stanford, 1988.
[3] Peter Apostoli, Roland Hinnion, Akira Kanda, and Thierry Libert. Alternative Set Theories. In Andrew D. Irvine (ed.), Philosophy of Mathematics, pp. 461–91. Amsterdam u.a., 2009.
[4] Reinhold Baer. Über ein Vollständigkeitsaxiom in der Mengenlehre. Mathematische Zeitschrift, 27:536–9, 1928.
[5] Arthur Bakker and Renatus Ziegler. Finsler-Mengenlehre. Technical notes (x) series, Institute for Logic, Language and Computation, Universiteit van Amsterdam, 1996.
[6] Paul Bernays. A system of axiomatic set theory. The Journal of Symbolic Logic, 2-19, 1937-1954.
[7] Paul Bernays. Zur Frage der Unendlichkeitsschemata in der axiomatischen Mengenlehre. In Yehoshua Bar-Hillel, E. I. J. Poznanski, Michael O. Rabin, and Abraham Robinson (eds), Essays on the Foundations of Mathematics. 1961.
[8] Paul Bernays. Axiomatic set theory. Second edition, 1968.
[9] Ulrich Blau. Die Logik der Unbestimmtheiten und Paradoxien. Heidelberg, 2008.
[10] David Booth. II. Foundational Part: Introduction. In David Booth and Renatus Ziegler (eds), Finsler Set Theory: Platonism and Circularity. Translation of Paul Finsler’s papers on set theory with introductory comments, pp. 85–102. 1996.
[11] Herbert Breger. A restoration that failed: Paul Finsler’s theory of sets. In Donald Gillies (ed.), Revolutions in Mathematics, chapter 13, pp. 249–64. Oxford, 1992.
[12] Georg Cantor. Gesammelte Abhandlungen mathematischen und philosophischen Inhalts. Hildesheim, 1932, edited by Ernst Zermelo. Reprint 1962.
[13] Alonzo Church. (Review of: Zur Neubegründung der Mengenlehre. by Johann Jakob Burckhardt). The Journal of Symbolic Logic, 3(4):165–6, 1938.
[14] F. R. Drake. Set Theory: An Introduction to Large Cardinals. Amsterdam, New York, 1974.
[15] Kenneth Easwaran. The Role of Axioms in Mathematics. Erkenntnis, 68:381–91, 2008.
[16] P. Finsler. Erwiderung auf die vorstehende Note des Herrn R. Baer. Mathematische Zeitschrift, 27:540–2, 1928.
[17] Paul Finsler. Über die Grundlegung der Mathematik, Erster Teil. Die Mengen und ihre Axiome. Mathematische Zeitschrift, 25:683–713, 1926.
[18] Paul Finsler. Formale Beweise und die Entscheidbarkeit. Mathematische Zeitschrift, 25:676–82, 1926.
[19] Paul Finsler. Die Unendlichkeit der Zahlenreihe. Elemente der Mathematik, 9:29–35, 1954.
[20] Paul Finsler. Ueber die Grundlegung der Mathematik, Zweiter Teil, Verteidigung. Commentarii Mathematici Helvetici, 38:172–218, 1964.
[21] Paul Finsler. Aufsätze zur Mengenlehre. Darmstadt, 1975, edited by Georg Unger.
[22] Paul Finsler. Finsler Set Theory: Platonism and Circularity. Translation of Paul Finsler’s papers on set theory with introductory comments. Basel, Boston, Berlin, 1996, edited by David Booth and Renatus Ziegler.
[23] Thomas Forster. [Review of] Finsler Set Theory: Platonism and circularity. Studia Logica, 61(3):429–33, 1998.
[24] Abraham A. Fraenkel, Yehoshua Bar-Hillel, and Azriel Levy. Foundations of Set Theory. Amsterdam, London, 2nd edition, 1973.
[25] Kurt Gödel. The consistency of the axiom of choice and of the generalized continuum hypothesis with the axioms of set theory, volume 3 of Annals of Mathematical Studies. Princeton, 1940.
[26] David Hilbert. Über das Unendliche. Mathematische Annalen, 95:161–90, 1926.
[27] M. Randall Holmes. Review of “Finsler Set Theory: Platonism and Circularity”, David Booth and Renatus Ziegler, eds. Homepage, October 1996.
[28] M. Randall Holmes. Elementary set theory with a universal set, volume 10 of Cahiers du Centre de logique. Université catholique de Louvain Département de Philosophie, 1998.
[29] M. Randall Holmes. Strong axioms of infinity in NFU. Journal of Symbolic Logic, 66(1):87–116, 2001.
[30] M. Randall Holmes. Alternative Axiomatic Set Theories. In Edward N. Zalta (ed.), The Stanford Encyclopedia of Philosophy. 2009.
[31] Ignacio Jané. The role of the absolute infinite in Cantor’s conception of set. Erkenntnis, 42:375–402, 1995.
[32] Thomas Jech. Set Theory. Berlin Heidelberg, 2003.
[33] Ronald Björn Jensen. On the consistency of a slight(?) modification of Quine’s ‘New Foundations’. Synthese, 19:250–263, 1969.
[34] Akihiro Kanamori. The Higher Infinite. Large Cardinals in Set Theory from Their Beginnings. Berlin, Heidelberg, 2nd edition, 2003.
[35] Akihiro Kanamori. Bernays and Set Theory. Bulletin of Symbolic Logic, 15(1):43–69, 2009.
[36] J. L. Kelley. General Topology. Van Nostrand, 1955.
[37] Peter Koellner. On reflection principles. Annals of Pure and Applied Logic, 157:206–19, 2009.
[38] Kenneth Kunen. Set Theory: An Introduction to Independence Proofs. Amsterdam New York Oxford, 1983.
[39] Hannes Leitgeb. On Formal and Informal Provability. In Otávio Bueno and Øystein Linnebo (eds), New Waves in Philosophy of Mathematics, pp. 263–299. New York, 2009.
[40] Azriel Levy. Contributions to the metamathematics of set theory. PhD thesis, Jerusalem, 1958.
[41] Azriel Levy. Axiom schemata of strong infinity in axiomatic set theory. Pacific Journal of Mathematics, 10:223–238, 1960.
[42] Azriel Levy. The Role of Classes in Axiomatic Set Theory. In Sets and Classes. On the work of Paul Bernays. H.G. Müller, 1976.
[43] Azriel Levy. Basic Set Theory. Springer, 1979.
[44] Penelope Maddy. Proper Classes. Journal of Symbolic Logic, 48:113–139, 1983.
[45] Penelope Maddy. The roots of contemporary platonism. The Journal of Symbolic Logic, 54:1121–1144, 1989.
[46] M. V. Marshall. Higher Order Reflection Principles. Journal of Symbolic Logic, 54:474–89, 1989.
[47] Richard M. Montague. Fraenkel’s addition to the axioms of Zermelo. In Yehoshua Bar-Hillel, E. I. J. Poznanski, Michael O. Rabin, and Abraham Robinson (eds), Essays on the Foundations of Mathematics, pp. 91–114. 1961.
[48] A. P. Morse. A theory of sets. Academic Press, 1965.
[49] A. Mostowski. Some impredicative definitions in the axiomatic set theory. Fundamenta Mathematicae, 36:111–124, 1951.
[50] I.L. Novak. A construction for models of consistent systems. Fundamenta Mathematicae, 37:87–110, 1950-51.
[51] Arnold Oberschelp. Eigentliche Klassen als Urelemente in der Mengenlehre. Mathematische Annalen, 157:234–260, 1964.
[52] Arnold Oberschelp. Allgemeine Mengenlehre. Mannheim, Leipzig, Wien, Zürich, 1994.
[53] W. V. Quine. New foundations for mathematical logic. American Mathematical Monthly, 44:70–80, 1937.
[54] W. V. Quine. Mathematical Logic. Harvard University Press, revised edition, 1951.
[55] W. V. Quine. Set Theory and Its Logic. 1971.
[56] W. V. Quine. From a logical point of view. Cambridge, 2nd revised edition, 1980.
[57] W. V. Quine. The inception of “New Foundations”. Bulletin of the Belgian Mathematical Society, 45(3):325–7, 1993.
[58] W. N. Reinhardt. Ackermann’s set theory equals ZF. Annals of Mathematical Logic, 2:189–249, 1970.
[59] W. N. Reinhardt. Set existence principles of Shoenfield, Ackermann and Powell. Fundamenta Mathematicae, 83:12–41, 1974.
[60] W. N. Reinhardt. Remarks on Reflection Principles, Large Cardinals, and Elementary Embeddings. Proceedings of Symposia in Pure Mathematics, 10:189–205, 1974.
[61] Adam Rieger. An Argument for Finsler-Aczel Set Theory. Mind, 109:241–53, 2000.
[62] R. M. Robinson. The theory of classes. A modification of von Neumann’s system. JSL, 2:29–36, 1939.
[63] J. B. Rosser and Hao Wang. Non-standard models for formal logic. Journal of Symbolic Logic, 15:113–129, 1950.
[64] Daniel Roth. Cantors unvollendetes Projekt. Reflektionsprinzipien und Reflektionsschemata als Grundlagen der Mengenlehre und grosser Kardinalzahlaxiome. München, 2003.
[65] Georg Schiemer. Carnap on extremal axioms, “completeness of the models,” and categoricity. The Review of Symbolic Logic, 5(4):613–41, 2012.
[66] Thoralf Skolem. Einige Bemerkungen zur axiomatischen Begründung der Mengenlehre. In Proceedings of the 5th Scandinavian Mathematics Congress, Helsinki 1922, 1923.
[67] Thoralf Skolem. P. Finsler. Über die Grundlegung der Mengenlehre. I: Die Mengen und ihre Axiome. Jahrbuch über die Fortschritte der Mathematik, 52(1):192–3, 1934.
[68] Ernst P. Specker. The axiom of choice in Quine’s new foundations for mathematical logic. Proceedings of the National Academy of Sciences of the USA, 39:972–975, 1953.
[69] Wolfgang Stegmüller. Eine Axiomatisierung der Mengenlehre beruhend auf den Systemen von Bernays und Quine. In Franz von Kutschera Max Käsbauer, Wilhelm Britzelmayr (ed.), Logik und Logikkalkül, pp. 57–104. 1962.
[70] John von Neumann. Eine Axiomatisierung der Mengenlehre. Journal für die reine und angewandte Mathematik, 154:241–266, 1925.
[71] John von Neumann. Die Axiomatisierung der Mengenlehre. Mathematische Zeitschrift, 27:669–752, 1928.
[72] Hao Wang. On Zermelo’s and von Neumann axioms for set theory. Acad. Sci. U.S.A., 35:150–155, 1949.
[73] Hao Wang. A formal system of logic. Journal of Symbolic Logic, 15:25–32, 1950.
[74] Hao Wang. From Mathematics to Philosophy. 1974.
[75] Hao Wang. Large Sets. In Logic, Foundations of Mathematics and Computability Theory., pp. 309–333. Butts and Hintikka, 1977.
[76] Hao Wang. Reflections on Kurt Gödel. 1991.
[77] Hao Wang. Skolem and Gödel. Nordic Journal of Philosophical Logic, 1(2):119–132, 1996.
[78] Ernst Zermelo. Untersuchungen über die Grundlagen der Mengenlehre. Mathematische Annalen, 65:261–281, 1908.
Purity in Arithmetic: some Formal and Informal Issues Andrew Arana
Over the years many mathematicians have voiced a preference for proofs that stay “close” to the statements being proved, avoiding “foreign”, “extraneous”, or “remote” considerations. Such proofs have come to be known as “pure”. Purity issues have arisen repeatedly in the practice of arithmetic; a famous instance is the question of complex-analytic considerations in the proof of the prime number theorem. This article surveys several such issues, and discusses ways in which logical considerations shed light on these issues.
1 Introduction
There has been since antiquity a tradition in mathematics of preferring solutions to problems / proofs of theorems that are restricted to considerations “close” or “intrinsic” to what is being solved/proved.1 A classical example of such preference is the desire for “synthetic” geometrical solutions to geometrical problems rather than “analytic” solutions to those problems that have struck many mathematicians as “rather far” from the problems at hand. Arithmetic in particular has been the locus of much concern over purity. For instance, regarding Hadamard and de la Vallée Poussin’s 1896 proof of the prime number theorem using complex analysis, the distinguished number theorist A. E. Ingham remarked that it “may be held to be unsatisfactory in that it introduces ideas very remote from the original problem, and it is natural to ask for a proof of the prime number theorem not depending on the theory of a complex variable” (cf. [25, pp. 5-6]). Investigation of purity in mathematical practice reveals that there are several different strains of purity differentiated by how they measure what is “close” or “intrinsic” to what is being solved/proved. In [15] we settled on one such strain as the one most central to purity in historical practice, which we called the “topical” strain.2 Very roughly, a solution to a problem is “topically pure” if it draws only on what is “contained” in (the content of) that problem, where what is “contained” in a problem is what grounds its understanding and is what we call that problem’s “topic”. If a solution to a problem draws on something extrinsic to that problem’s topic, then it is topically impure. The chief aim of this essay is to shed further light on topical purity by examining two cases from arithmetic: the infinitude of primes, and Gödel’s incompleteness theorems. The discussion of the infinitude of primes will extend the discussion in [15] of the familiar Euclidean proof. After a brief recapitulation of that work’s discussion of the topical impurity of Furstenberg’s topological proof of the infinitude of primes, stressing different notions of content that arise in reflections on this proof, attention will shift to Gödel’s work. The main question to be addressed is whether this work shows that there are some arithmetic theorems for which no topically pure proof is possible.
1 Parts of Sections 2 and 3 appeared in French in my article “L’infinité des nombres premiers : une étude de cas de la pureté des méthodes”, which was published in Les études philosophiques 2:97 (2011), pp. 193–213. The author thanks the audience at the conference “The Number Concept: Axiomatization, Cognition and Genesis” held at the Université Nancy 2 in Nancy, France in November 2010, in which a preliminary version of this article was presented. In particular the author thanks Sean Walsh, the organizer of the conference, and the Agence Nationale de la Recherche, France (ANR), who by funding Mic Detlefsen’s senior chaire d’excellence funded this conference.
2 Topical purity
The insight behind the analysis of topical purity in [15] was expressed well by Hilbert in his 1898/1899 lectures on geometry:
In modern mathematics such criticism is raised very often, where the aim is to preserve the purity of method [die Reinheit der Methode], i.e. to prove theorems if possible using means that are suggested by [nahe gelegt] the content of the theorem.3
What is critical for a proof’s being pure or not, then, is whether the means it draws upon are “suggested by the content of the theorem” being proved. Since what it is for an element of a proof to be “suggested” by its content is not particularly clear, a chief task of [15] was to clarify this. Call the commitments that together determine the understanding of a given problem (for a particular investigator α) the “topic” of that problem (for α). Among these commitments are definitions, axioms, inferences, etc. These together are constitutive of α’s understanding of the problem, and hence of the identity of the problem (to α).4 A solution to a problem is “topically pure” (for α) if it draws only on what belongs to that problem’s topic. In other words topically pure solutions to problems draw only on what is constitutive of the identity of that problem. The heart of the account in [15] of the epistemic value of topical purity is the following counterfactual: if a component of a topically pure solution to a problem were retracted by an investigator, then that investigator’s understanding of that problem would change. This is because every component of a topically pure solution to a problem belongs to the topic of that problem, and hence is partly determinative of the understanding of that problem. This is not the case for topically impure solutions to problems, since some of their components do not belong to their problems’ topics. The epistemic significance of this counterfactual is as follows. A topically pure solution to a problem remains a solution to that problem even when some component of that solution is retracted, for such retraction “dissolves” that problem, by changing its understanding and hence its identity. While dissolving a problem is not typically taken to count as solving it, we argue that it should, since the aim of problem solution is the relief of rational ignorance, and we cannot (rationally) be ignorant about dissolved problems. Hence topically pure solutions persist as solutions even when one of their components is retracted. Thus the relief of ignorance provided by topically pure solutions to problems is quite “stable” with respect to changes in attitude regarding their components. The same cannot be said of topically impure solutions to problems, however. That is because some components of topically impure solutions may be retracted without dissolving their problems and hence the relief of ignorance they provide is not as “stable” to changes in attitude regarding their components as is the relief provided by topically pure solutions. The epistemic value of topical purity is thus that topically pure solutions are more resilient to retraction of their components than are topically impure solutions. This characterization of topical purity and its value is only as clear as the notion of topic it uses. In the next section we thus turn to an example from arithmetic, the infinitude of primes, that sheds further light on this notion. This characterization also leaves open the question of whether every theorem has a topically pure proof. We will turn to this question in Section 4.
2 Cf. [1], [2] and [3] for discussions of other important strains of purity.
3 Cf. [23], pp. 315–6. The original reads, “In der modernen Mathematik wird solche Kritik sehr häufig geübt, wobei das Bestreben ist, die Reinheit der Methode zu wahren, d.h. beim Beweise eines Satzes wo möglich nur solche Hülfsmittel zu benutzen, die durch den Inhalt des Satzes nahe gelegt sind.”
4 These relativizations to a particular investigator are needed because how a problem is understood may differ from investigator to investigator. In practice there is not too much local variation in this.
3 The infinitude of primes
Section 4 of [15] raised as an example for topical purity the problem of determining whether there are infinitely many primes. Two positive solutions to that problem were considered: the classical Euclidean solution, and Furstenberg’s topological solution. In that section the case was made that the former solution is topically pure, while the latter is topically impure. In this section the purity of the Euclidean solution will be discussed in further detail, and the impurity of the topological solution will be discussed again with an eye toward setting up the discussion of Gödel’s work later in this article.
3.1 The Euclidean solution
Consider a contemporary investigator α who has a typical contemporary understanding of arithmetic. Suppose α formulates the question concerning the infinitude of primes (henceforth called IP) as follows: for every natural number, is there a greater natural number that is prime? A solution to IP for α is a proof of the result that for all natural numbers a, there exists a natural number b > a such that b is prime. A topically pure solution to IP for α may draw on what belongs to the topic of IP, that is, on the commitments that determine the content of the problem as she has formulated it. To determine whether a given solution to IP is topically pure, then, more must be said about what commitments constitute the topic of IP. The topic of IP must at least include definitions and axioms for natural number, an ordering on the natural numbers, and primality. The natural numbers are typically understood to begin with a first number 1, followed by its successor S(1), and continuing with the successors of each number already reached. Hence axioms for successor would seem to be included, as would induction axioms for making precise the view that the natural numbers “start” with 1 and “continue” onward thereafter. Definitions and axioms for an ordering on the natural numbers would also be needed for IP’s topic, and following typical practice these would specify a linear discrete ordering. Additionally, a definition of primality is also needed, and since a natural number a is ordinarily defined as prime if and only if a ≠ 1 and the only numbers dividing a are 1 and a, a definition of divisibility (written a|b) also belongs to IP’s topic. In light of this preliminary specification of IP’s topic, consider the well-known Euclidean proof from Elements IX.20. If a = 1, then since 2 = S(1) is prime, we know that there is a prime greater than a = 1. So suppose that a > 1. Let p_1, p_2, ..., p_n be all the primes less than or equal to a, and let Q = S(p_1 · p_2 ··· p_n). Note that Q has a prime divisor b. For each i, b ≠ p_i; if not, then b|(p_1 · p_2 ··· p_n) and b|S(p_1 · p_2 ··· p_n), and so b = 1, contradicting the primality of b. Finally, either b > a, or b ≤ a, but since b ≤ a contradicts that the p_i were all of the primes less than or equal to a, we may conclude that b > a. This proof has several steps that themselves require proof. Examples are the step that consists in the assertion that if b|(p_1 · p_2 ··· p_n) and b|S(p_1 · p_2 ··· p_n), then b = 1, or the step in which it is asserted that if a|b and a|S(b), then a = 1. Whether or not the Euclidean solution is topically pure thus depends on whether or not the main proof, and all of the subproofs needed to establish the steps of the main proof, draw only on what belongs to the topic of IP. On the face of it, there is nothing unarithmetic about these proofs, and so a favored initial diagnosis is that the Euclidean solution is pure. However, there are good reasons to think this too quick. A first reason for concern about the purity of this solution is that it appeals to multiplication in generating Q, though multiplication was not included in the preliminary specification of IP’s topic. A second reason for concern is that the subproofs have not been given fully, and so appeal to elements foreign to IP’s topic cannot yet be ruled out. A reply to both concerns would be to note that the main proof and each of the needed subproofs can be carried out from the first-order Peano axioms (PA), as can be (tediously) checked. Provided that the axioms of PA (augmented by definitions of the appropriate ordering and primality) belong to the topic of IP, its sufficiency for expressing the main proof and each subproof answers the second concern, and its inclusion of a definition of multiplication answers the first concern. Thus, provided that the Peano axioms belong to the topic of IP, and that the proofs when carried out in PA remain faithful to the Euclidean proof and subproofs, the Euclidean proof is topically pure. This is not an especially convincing reply, however, because it begs both questions.
The sufficiency of PA for the Euclidean solution was not in question; this is indicative of PA’s being widely considered an adequate axiomatization of elementary arithmetic. What is in question is the topicality of the commitments engendered in accepting PA for IP. The reply simply asserts that these commitments are topical for IP, but that is exactly what is being questioned. What is needed is a more fine-grained analysis of the topic of IP, in particular investigation of what operations (divisibility? multiplication? addition?) and modes of inference (classical logic? how much induction?) belong to IP’s topic. To this the essay now shifts.
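The construction in the Euclidean proof of Section 3.1 can be traced computationally. The following is a minimal sketch, not part of the original argument: the function names (`is_prime`, `least_prime_divisor`, `euclid_witness`) are hypothetical, and the code freely uses Python's built-in addition, multiplication and remainder, so it illustrates only the shape of the construction and says nothing about its topical purity.

```python
from math import prod


def is_prime(n: int) -> bool:
    """Primality as defined in the text: n != 1 and only 1 and n divide n."""
    return n != 1 and all(n % d != 0 for d in range(2, n))


def least_prime_divisor(n: int) -> int:
    """Return the smallest divisor d > 1 of n (n > 1); such a d is prime."""
    for d in range(2, n + 1):
        if n % d == 0:
            return d
    raise ValueError("n must be greater than 1")


def euclid_witness(a: int) -> int:
    """Given a natural number a >= 1, return a prime b > a."""
    if a == 1:
        return 2  # S(1) is prime
    primes = [p for p in range(2, a + 1) if is_prime(p)]  # p_1, ..., p_n
    q = prod(primes) + 1  # Q = S(p_1 * p_2 * ... * p_n)
    return least_prime_divisor(q)  # a prime divisor b of Q
```

For instance, with a = 13 the product of the primes up to 13 is 30030, so Q = 30031 = 59 · 509, and the sketch returns 59. Note that Q itself need not be prime; the proof only needs some prime divisor of Q.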
3.1.1 The topicality of arithmetic operations for IP
The second concern just raised is that the fully spelled-out Euclidean solution may contain elements that do not belong to IP’s topic. When spelling out the solution in PA this is the case, since addition is used in establishing the needed properties of multiplication, while addition is not explicitly mentioned in the problem as formulated. But this is not merely an issue with PA. Any proof that uses multiplication must either take to belong to the definition of multiplication the properties of multiplication that it needs, or prove them on some other basis. If the latter, proof via addition is the obvious choice since multiplication is ordinarily defined as iterated addition (as in PA, for instance). If the former, then some plausible nonadditive definition of multiplication is needed; and moreover some answer will be needed for the reply that multiplication is also not mentioned explicitly in IP. We will return to the issue concerning multiplication; let us for now discuss the additive case. Concerning the use of addition in the Euclidean solution, one response would be to defend the use of addition as topically pure for IP. One could do so on the grounds that addition (and multiplication) are “basic” to understanding the natural numbers, because in usual practice we are talking about a discretely ordered ring, that is, a structure with both an addition and a multiplication operator. But this seems wrong: Presburger and Skolem arithmetic (with just addition and multiplication, respectively) are just as “basic” as Peano arithmetic. Indeed, children seem to grasp the natural numbers before they understand the concepts of addition and multiplication. The sequence starting with 1 and generated by successors is more plausibly basic (though not necessarily the most basic). Another response to this objection concerning the use of addition in the Euclidean solution notes that the only need for addition in the Euclidean solution is to establish properties of multiplication such as commutativity and associativity. We may thus isolate these properties of multiplication and find a proof directly from them, without adverting to addition.
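To make concrete the remark that multiplication is ordinarily defined as iterated addition, here is a minimal sketch of the usual recursive definitions over the natural numbers starting at 1. The names `S`, `add` and `mul` are hypothetical, and Python's built-in `n + 1` and `y - 1` are used only as stand-ins for successor and predecessor; this is an illustration under those assumptions, not a rendering of any particular axiomatization.

```python
def S(n: int) -> int:
    """Successor; Python's built-in +1 stands in for the primitive operation."""
    return n + 1


def add(x: int, y: int) -> int:
    """Addition by recursion on y: x + 1 = S(x), x + S(y) = S(x + y)."""
    return S(x) if y == 1 else S(add(x, y - 1))


def mul(x: int, y: int) -> int:
    """Multiplication as iterated addition: x * 1 = x, x * S(y) = (x * y) + x."""
    return x if y == 1 else add(mul(x, y - 1), x)
```

On this definition every fact about `mul`, such as commutativity or associativity, is established by way of facts about `add`, which is why a proof that uses multiplication so defined inherits a dependence on addition.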
The following seventeen assumptions are an attempt to do this. They include assumptions regarding successor and the ordering in addition to multiplicative assumptions, in order to yield a set of assumptions sufficient for solving IP without using addition.

Assumption 1 For all x, there exists y such that y = S(x).
Assumption 2 For all x, y, x = y if and only if S(x) = S(y).
Assumption 3 For all x, y, there exists z such that z = x · y.
Assumption 4 For all x, x · 1 = x.
Assumption 5 For all x, y, z, (x · y) · z = x · (y · z).
Assumption 6 For all x, y, x · y = y · x.
Assumption 7 For every sequence of primes p₁, …, pₙ, there exists z such that z = p₁ · p₂ ⋯ pₙ.
Assumption 8 For all x, y, z, if x < y and y < z, then x < z.
Assumption 9 For all x, x ≮ x.
Assumption 10 For all x, y, either x < y, x = y, or y < x.

These three assumptions together imply that if x < y, then y ≮ x, and hence that the trichotomy asserted in Assumption 10 is exclusive, i.e. for each x, y, exactly one of x < y, x = y, and y < x holds.

Assumption 11 For all x, 1 ≤ x.
Assumption 12 For all x, y, x < y if and only if S(x) < S(y).
Assumption 13 For all x, x < S(x).
Assumption 14 For all x, y, if x < y then S(x) ≤ y.
Assumption 15 For all x, y, z, x < y if and only if xz < yz.
Assumption 16 For all y ≠ 1 and all x, S(yx) < y · S(x).
Assumption 17 For each formula ϕ(x, y), where x is a free variable and the y are terms, if ϕ(1, y) and if for all a and all b < a, ϕ(b, y) implies that ϕ(a, y), then for all a, ϕ(a, y).

These assumptions may be grouped as follows: Assumptions 1 and 2 concern successor, 3–7 concern multiplication, 8–10 concern the ordering, 11–16 concern how successor and multiplication respect the ordering, and 17 is an induction schema.⁵

5 We are not claiming that these assumptions are mutually independent of each other.

Next we will give a non-additive solution to IP using these assumptions. To simplify the structure of the main proof, we will separate from the main proof the following three lemmas, and prove them separately.

Lemma 3.1 S(1) is prime.
Lemma 3.2 Every natural number a ≠ 1 has a prime divisor p ≤ a.
Lemma 3.3 For all a, b, if a|b and a|S(b), then a = 1.

Using these lemmas, the main result, that for all a, there exists b > a such that b is prime, can be proved as follows, with the assumptions referred to therein listed afterwards.

1. Either a = 1 or a > 1. [Assumption 11]
2. a) Say a = 1.
   b) By Lemma 3.1, S(1) is prime. [Assumption 1]
   c) S(1) > 1. [Assumption 13]
3. a) Say a > 1.
   b) Let p₁, p₂, …, pₙ be all the primes less than or equal to a.
   c) Let Q = S(p₁ · p₂ ⋯ pₙ). [Assumptions 1, 7]
   d) By Lemma 3.2, Q has a prime divisor b.
   e) i. Suppose b = pᵢ.
      ii. Then b|(p₁ · p₂ ⋯ pₙ). [Assumptions 5, 6]
      iii. By Lemma 3.3, b = 1, contradicting the primality of b.
   f) Thus for each i, b ≠ pᵢ.
   g) Either a < b, or b ≤ a. [Assumption 10]
   h) b ≤ a contradicts that the pᵢ were all the primes less than or equal to a.
   i) Thus a < b.

Proof of Lemma 3.1, S(1) is prime:
1. For all n, n < S(1), n = S(1), or S(1) < n. [Assumptions 1 and 10]
2. a) Suppose n = S(1).
   b) S(1)|S(1). [Assumption 4]
3. a) Suppose n < S(1).
   b) S(n) ≤ S(1). [Assumptions 1, 14]
   c) If S(n) < S(1), then n < 1, a contradiction. [Assumptions 11 and 12; and 8–10 which imply that only one of the cases of trichotomy of < obtains]
   d) If S(n) = S(1), then n = 1. [Assumption 2]
   e) n = 1.
   f) 1|S(1). [Assumptions 4, 6]
4. a) Suppose S(1) < n.
   b) i. Suppose n|S(1).
      ii. There exists x such that nx = S(1).
      iii. S(1) · x < nx. [Assumption 15]
      iv. S(1) · x < S(1) · 1. [Assumption 4]
      v. x < 1, a contradiction. [Assumptions 11, 15]
   c) Thus, if S(1) < n, then n ∤ S(1).
5. So the only numbers dividing S(1) are 1 and S(1), and so S(1) is prime.
Proof of Lemma 3.2, every natural number a ≠ 1 has a prime divisor p ≤ a:
1. We proceed by strong induction on a.
2. Base case: S(1) is prime by Lemma 3.1.
3. Inductive case:
   a) Suppose that every y < a with y ≠ 1 has a prime divisor p ≤ y.
   b) Either a is prime or composite.
   c) If a is prime, we are finished.
   d) So suppose a is composite, i.e. that there is some b such that 1 < b < a and b|a.
   e) By the inductive hypothesis, b has a prime divisor p ≤ b.
   f) Since p|b and b|a, p|a. [Assumption 5]
   g) Since p ≤ b and b < a, p ≤ a. [Assumption 8]
4. So for all a ≠ 1, a has a prime divisor p ≤ a. [Assumption 17]
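The inductive argument of Lemma 3.2, and the main argument that relies on it, can be traced computationally. What follows is a minimal Python sketch and not part of the formal development: the function names (`S`, `divides`, `is_prime`, `prime_divisor`, `prime_above`) are hypothetical, numbers start at 1 as in the assumptions, and bounded searches with `range` stand in for the existential claims. It illustrates the shape of the proof and says nothing about which assumptions the proof requires.

```python
def S(x):
    """Successor on the naturals starting at 1."""
    return x + 1


def divides(a, b):
    """a | b: some x with 1 <= x <= b satisfies a * x == b."""
    return any(a * x == b for x in range(1, b + 1))


def is_prime(n):
    """n is prime: n != 1 and no b with 1 < b < n divides n."""
    return n != 1 and not any(divides(b, n) for b in range(2, n))


def prime_divisor(a):
    """Lemma 3.2: return a prime p <= a dividing a, for a != 1."""
    if a == 1:
        raise ValueError("1 has no prime divisor")
    for b in range(2, a):
        if divides(b, a):
            # a is composite; apply the inductive hypothesis to b < a.
            return prime_divisor(b)
    return a  # no b with 1 < b < a divides a, so a itself is prime


def prime_above(a):
    """Main argument: return a prime b with b > a."""
    if a == 1:
        return S(1)  # prime by Lemma 3.1
    product = 1
    for p in range(2, a + 1):
        if is_prime(p):
            product = product * p  # p1 * p2 * ... * pn
    q = S(product)  # Q = S(p1 * p2 * ... * pn)
    return prime_divisor(q)
```

For example, `prime_above(6)` builds Q = S(2 · 3 · 5) = 31 and returns 31, while `prime_above(13)` returns 59, because Q = 30031 = 59 · 509 is composite in that case.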
Proof of Lemma 3.3, for all a, b, if a|b and a|S(b), then a = 1:
1. Suppose a|b and a|S(b).
2. Then there exist x and y such that ax = b and ay = S(b).
3. ax = b < S(b) = ay. [Assumption 13]
4. x < y. [Assumptions 6 and 15]
5. a) Suppose a ≠ 1.
   b) S(ax) < a · S(x). [Assumptions 1, 3, and 16]
   c) S(x) ≤ y. [Assumptions 1 and 14]
   d) a · S(x) ≤ ay. [Assumptions 3, 6 and 15]
   e) S(ax) < ay. [Assumption 8]
   f) S(b) < S(b), a contradiction. [Assumption 9]
6. a = 1.

To argue that this solution is pure, we would need to argue that each of the seventeen assumptions belongs to IP’s topic, that is, that each assumption partly determines the content of IP as formulated (for an ordinary investigator). If we are willing to grant PA as topical for IP, then this is trivial, since each of these assumptions may be derived in PA. Otherwise, this is a difficult task, because it is hard to say definitively whether an assumption is determinative of the content of a problem formulation, even for an ordinary investigator. Indeed it is not even clear what the standards are for making such a determination. Toward this, we note in particular that Assumption 7 is provable by induction, that is, from Assumption 17; and that Assumption 16 asserts that multiplication grows faster than successor, which seems essential to the typical contemporary understanding of the relation of these two functions.
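To illustrate the claim that the assumptions may be derived in PA, here is a sketch of a derivation of Assumption 16. It rests on a formulation that is assumed here rather than taken from the text: the numbers start at 1, addition and multiplication satisfy the recursion equations x + S(y) = S(x + y) and x · S(y) = x · y + x, and every y ≠ 1 is of the form S(u).

```latex
% Assumed formulation (not from the text): numbers start at 1,
% x + S(y) = S(x + y), x \cdot S(y) = x \cdot y + x,
% and y \neq 1 implies y = S(u) for some u.
\begin{align*}
y \cdot S(x) &= y \cdot x + y     && \text{recursion equation for } \cdot \\
             &= y \cdot x + S(u)  && y = S(u), \text{ since } y \neq 1 \\
             &= S(y \cdot x + u)  && \text{recursion equation for } + \\
             &> S(y \cdot x)      && \text{since } y \cdot x + u > y \cdot x
\end{align*}
```

Every step of this sketch leans on addition, which is why it is available only to someone who already grants PA as topical for IP.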
While the issue concerning the topicality of addition for IP remains of interest, let us expand the discussion by considering the topicality of multiplication for IP as well. We raised two issues for said topicality earlier: firstly, some plausible non-additive definition of multiplication would be needed if multiplication is to belong “natively” to IP’s topic; and secondly, some answer would be needed to the point that like addition, multiplication is also not mentioned explicitly in IP; rather, only division is, in the definition of prime number. Both points may be met by introducing work in mathematical logic. To that work we now turn.

We first point out that since primality is defined in terms of divisibility, a definition of divisibility uncontroversially belongs to IP’s topic. This leaves open precisely which such definition is included. Divisibility is often defined in terms of multiplication – a divides b if and only if there exists x such that a · x = b – but this is not required. Informally, a divides b if a collection of b many objects can be divided into a groups with none left over. While this can be expressed in terms of multiplication, we have just shown that it need not be. The question then is what definitions and axioms ground (our understanding of) divisibility. There has been some logical work on axiomatizations of the arithmetic of divisibility, in which divisibility is taken as a primitive, notably by Cegielski (cf. [9], [10]), but this work takes the infinitude of primes as an axiom and so is not fine-grained enough for the question in focus here. We turn instead to work of Julia Robinson which offers a more promising direction for our investigation. Robinson showed how to define addition (and multiplication) for the natural numbers in terms of just successor and divisibility, both of which are explicitly referenced in the problem’s formulation (cf. [32, pp. 100-2]).
She firstly showed that addition is definable in terms of successor and multiplication as follows: a + b = c if and only if S(a·c)·S(b ·c) = S[(c·c)·S(a·b)]. She next showed both that two numbers being relatively prime, and that a number being the least common multiple of other numbers, can be defined in terms of successor and divisibility, without appeal to addition. She lastly showed how to define multiplication using successor, relative primality, and least common multiple. Using these explicit definitions, the Euclidean solution to IP (as carried out in PA) may be translated into a language with just 1, S, |, and <. In particular, all the axioms, definitions and propositions used in proving the Euclidean solution could be translated into this language. By avoiding explicit reference to addition and multiplication, this translated solution would answer worries concerning the topicality of both addition and multiplication, by showing that neither operation is needed for this translated version of the Euclidean solution, and so that a topically pure solution to IP has been identified. One might resist this response by noting, as Robinson does, that this “mechanical” translation, as Robinson describes it, results in axioms that
are “complicated and artificial” (cf. [32, pp. 102–103]). They are syntactically more complex than the ordinary axioms of PA, in particular longer than those axioms, and not identifiable to the “naked eye” as equivalent to PA. Robinson thus set out to find “a simple and elegant axiom system” for arithmetic in this restricted language. She identified a candidate for such an axiom system, and demonstrated that it proves the same theorems as PA, and thus proves the Euclidean solution to IP. This would seem to answer the objection. However, the “simple and elegant” axiom system Robinson identifies includes second-order induction. She could not prove that this axiom system proves the same theorems as PA when including just first-order induction. As far as we know, it remains open whether this can be accomplished.

It is not clear, however, what is wrong with “complicated and artificial” axioms, from the perspective of topical purity. It is not clear that the “simplicity” and “elegance” of a definition or axiom, construed for instance in terms of syntactic complexity, is relevant to whether a definition or axiom belongs to a problem’s topic. There could be problems whose topics contain irreducibly complex elements. Such problems would not be simple to understand, at least when considering all the commitments required to ground their understanding, but there is no a priori reason to think that every problem, even every commonly-studied problem, is simple to understand in this respect. If that is correct, then Robinson’s “mechanical” approach remains a viable response to the objection to the topical purity of the Euclidean solution on account of its use of addition and multiplication. The objection that the Euclidean solution’s use of addition and multiplication entails its topical impurity can thus be met in several ways.
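Robinson’s definition of addition from successor and multiplication, quoted earlier in this section, can be checked on a finite range. The Python sketch below is an illustration under stated assumptions: the names `robinson_sum` and `disagreements` are hypothetical, and the numbers are taken to start at 1 as elsewhere in this section. Algebraically, the condition reads (ac + 1)(bc + 1) = c²(ab + 1) + 1, which simplifies to c(a + b) = c², and hence to a + b = c because c ≥ 1.

```python
def S(x):
    """Successor."""
    return x + 1


def robinson_sum(a, b, c):
    """Robinson's condition: S(a*c) * S(b*c) == S((c*c) * S(a*b))."""
    return S(a * c) * S(b * c) == S((c * c) * S(a * b))


def disagreements(bound):
    """Triples in [1, bound]^3 where the condition differs from a + b == c."""
    return [
        (a, b, c)
        for a in range(1, bound + 1)
        for b in range(1, bound + 1)
        for c in range(1, bound + 1)
        if robinson_sum(a, b, c) != (a + b == c)
    ]
```

An empty result from `disagreements(bound)` means that, on that range, the multiplicative condition picks out exactly the triples with a + b = c. Such a check is only a sanity test of the stated equivalence, not a substitute for Robinson’s proof.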
In closing this section, we observe that these responses raise interesting issues concerning “implicit” understanding/commitment that we can only raise here. For instance, we can ask whether in understanding the “translated” Euclidean solution we are implicitly committed to the ordinary additive properties used in the untranslated version. We might think this if we thought there was something “basic” about those ordinary additive properties, so that these properties can be expressed in superficially non-additive ways while remaining additive in content. The Robinson definability approach points toward a way to make this sense of “basicness” somewhat sharper: if the axioms resulting from the translation are too “complex”, then the original axioms are more basic. (We return to such issues in Sections 3.2 and 4.) The notion of complexity to be used here awaits further clarification, though. Such a suggestion would be worth pursuing. If indeed the “translated” non-additive Euclidean solution turns out to be essentially additive in content, then to avoid concluding that the Euclidean solution is topically impure a reasonable strategy would be to press the point that commitment to additive properties is fundamental to an understanding of the natural numbers.
3.1.2 The topicality of arithmetic inferences for IP

Let’s now turn to the question of whether the Euclidean solution to IP remains pure when we vary the modes of inference belonging to IP’s topic. For instance, note that the Euclidean proof would still be topically pure if IP’s logic were taken to be intuitionistic, since it uses excluded middle only for effectively decidable predicates such as “is prime”, and while it uses reductio it does not use double negation. A fuller investigation would check whether some uses of substitution are topical for IP but not others. In this section we focus on inductive modes of inference. For instance, if the logic of IP were taken to be second-order, then the Euclidean solution would still qualify as topically pure by a trivial modification to use second-order induction rather than the first-order induction schema.

Let’s consider a different variant of IP. If the logic of IP were taken to be finitary rather than classical, then the Euclidean solution would also plausibly be pure. This is because it is straightforward to check that our demonstration uses the induction schema for arithmetical formulas no more complex than Σ⁰₁ – that is, it can be carried out in the weak fragment of PA known as IΣ₁. If we follow Tait [33] in taking finitary arithmetic to be Primitive Recursive Arithmetic (PRA), and follow Hájek and Pudlák [22] in taking PRA to be equivalent to IΣ₁, then a finitary solution to IP is one that can be done in IΣ₁. Hence the Euclidean solution is finitist (on this construal of finitary arithmetic), and thus would still qualify as topically pure.

Next, let’s take the logic of IP to be feasible rather than classical. If we follow Parikh [30] and take feasible arithmetic to be I∆₀ (where this is PA with the induction schema restricted to formulas with just bounded quantifiers), then it is an open problem whether IP can be solved purely. It is known, though, that the Euclidean solution is not pure for this formulation of IP.
The problem is proving the existence of Q = (p₁ · p₂ ⋯ pₙ) + 1 as is done in the Euclidean solution. This product has exponential growth (by a result of Chebyshev), but Parikh showed [30] that every ∆₀-definable function that is provably total in I∆₀ has polynomial growth.⁶ Hence in I∆₀ it is unprovable that every product of primes exists (cf. [13, p. 13]). However, the existence of Q can be proved using bounded induction provided that we add another axiom asserting the totality of the exponential relation, resulting in a theory called I∆₀(exp) (cf. [14, p. 153]). I∆₀(exp) has been well-studied (many call it EFA, for Elementary Function Arithmetic).⁷ For this solution of IP carried out in EFA to be pure when IP’s logic is construed as feasible, the axiom asserting the totality of exponentiation must be part of the topic of IP so construed. By a result of Gödel [19] we know that the exponential relation is definable in (N, 1, S, +, ·) (for a reasonably explicit definition of this type, cf. [16], pp. 276–9). It is unclear, though, whether an axiom asserting the totality of this definable relation is part of the topic of IP with its logic construed feasibly.

Let’s add one more data point to this discussion of feasibility. In his dissertation, Alan Woods [35] was able to solve IP in I∆₀ augmented by a weak version of the pigeonhole principle. Rather than giving a version of the Euclidean solution, though, Woods gave a version of a solution due to Sylvester. Woods’ theory, called I∆₀ + PHP, is logically weaker than I∆₀(exp), in that I∆₀(exp) proves I∆₀ + PHP but not vice-versa; and indeed Paris, Wilkie, and Woods were able to give a solution to IP using an even weaker version of the pigeonhole principle (cf. [31], and [14], pp. 162–4). Again, we may wonder whether one of these pigeonhole-principle-like axioms belongs to IP’s topic with its logic construed feasibly. A positive answer would imply that IP has a topically pure solution when its logic is understood feasibly. As far as we are concerned, this remains open.

We have thus surveyed some variations of IP based on what induction is taken to belong to its topic. These variations correspond to foundationally familiar arithmetic theories. As we have seen, the purity of the Euclidean solution to IP, and indeed whether there is any known pure solution to IP, depends on what inferences are licensed by its topic.

6 Cf. [14] pp. 164–7 for more on what is known concerning the rate of growth of the function yielding products of primes in I∆₀.

7 It is claimed that every result in elementary number theory (for instance, every result in Hardy and Wright’s canonical [24]) can be proved in EFA (cf. [14, p. 149n1]). Indeed, Harvey Friedman has gone further with his “grand conjecture” that, in Avigad’s words, “Every theorem published in the Annals of Mathematics whose statement involves only finitary mathematical objects (i.e., what logicians call an arithmetical statement) can be proved in elementary arithmetic” (cf. [5, p. 258]).

3.2 The topological solution to IP

We will next consider a solution to IP that we have argued (in Section 5 of [15]) is topically impure. This is a topological solution due to Harry Furstenberg that goes as follows (cf. [17, p. 353]). We begin by putting a topology on the integers, by taking the arithmetic progressions B_{a,b} = {a + bn : n ∈ Z} with a, b ∈ Z, b > 0 as the basic open sets. The following can then be shown (but we omit the details here): these sets B_{a,b} together form a basis for a topology on the integers; and each B_{a,b} is closed as well as open. By the latter it follows that the union of finitely many B_{a,b} is closed since in a topological space, unions of finitely many closed sets are closed. We now consider the set A = ⋃_p B_{0,p}, where the union is taken over the primes p ≥ 2. Since every integer besides ±1 has a prime factor (by the Fundamental Theorem of Arithmetic), every integer besides ±1 is contained in some B_{0,p}. Thus,
A = Z − {−1, 1}. If A were a union of finitely many B_{0,p}, then it would be a closed set in our topology. Then {−1, 1}, being the complement of a closed set, would be open. But this is impossible, since the basic open sets B_{a,b} are all infinite, and by the definition of basis, each nonempty open set is a superset of some basic open set. Thus A is not a union of finitely many B_{0,p}. Hence there are infinitely many primes.

Our argument that Furstenberg’s solution to IP is topically impure simply observes that its commitments to definitions of topological space, topological basis, and open and closed sets in a topology do not belong to IP’s topic. Each could be retracted without a corresponding change in our understanding of IP.

One might object that some set-theoretic commitments are necessary for understanding IP. For instance, one might reply that the “proper” definition of natural number is set-theoretic, as in the original second-order Dedekind-Peano axioms, or in Frege’s or Russell’s work. This makes it clear how difficult it is to say definitively what belongs to a problem’s topic, for a full response to this objector would be an argument against this understanding of number, a significant philosophical achievement in its own right. Rather than offer such a response, we observe that it is open whether what is defined by set-theoretic definitions of natural number is the same as what is defined by purely first-order definitions. It is consistent with what is presently accepted that we are discussing (at least) two different problems, one with set-theoretic commitments in its topic, one without.⁸ In that case, the topical purity of Furstenberg’s proof, with respect to its set-theoretic commitments, comes down to which of these problems is the IP being considered. Another objection of this type would be that while set-theoretic commitments may not be necessary to understanding natural number, they are necessary for understanding the arithmetic functions appealed to by IP.
In reply we point out that arithmetic functions can be understood algorithmically, without appeal to set theory. We see no good reason why a set-theoretic understanding of function should take precedence, particularly in the case of IP where the functions are merely used for computations. Note also that the topology used in Furstenberg’s proof, when carried out in set theory, is quite weak, i.e. it can be carried out in a fragment of set theory that uses just boolean operations on “simple” sets of natural numbers. On this point, D. Cass and G. Wildenberg have shown that Furstenberg’s proof can be reformulated in terms of periodic functions on integers, “avoiding the language of topology” (cf. [8, p. 203]). In reply, we observe that the issue again is whether any set-theoretic commitments are engendered by understanding arithmetic problems. Whether or not these set-theoretic commitments are “weak” is only relevant inasmuch as it bears on whether those commitments belong to IP’s topic, and we see no reason to think that they do.

8 We say “at least” because it is conceivable that IP’s topic may not contain all of the commitments needed to permit Furstenberg’s proof, while containing other set-theoretic commitments.

In Section 5 of [15] we surveyed a further objection to the topical impurity of Furstenberg’s solution, articulated by Colin McLarty in correspondence. McLarty’s view is that to have a full understanding of IP, one must include not merely set-theoretic commitments but indeed topological commitments of the type appealed to in Furstenberg’s proof. Hence Furstenberg’s proof should not be regarded as impure simply because it appeals to topological principles. We want to revisit this objection here in order to clarify further McLarty’s point, and to pinpoint how it suggests a notion of content that differs from the one at play in topical purity, in which the understanding of ordinary practitioners is foundational.

In taking this view, McLarty is aligning himself with the Bourbakiste tradition of arithmetic research, a tradition to which Furstenberg’s work also belongs. McLarty suggests that Furstenberg developed his proof from then-current work of Claude Chevalley in class field theory, work that was considered cutting-edge arithmetic despite the central role of topology in it. In [11] Chevalley took himself to have made progress in realizing a purist ideal; as he remarked to open the paper, “Class field theory is presented a little more simply today than a few years ago, in particular because of the elimination of ‘transcendental means’” [La théorie du corps de classes se présente un peu plus simplement aujourd’hui qu’il y a quelques années, notamment du fait de l’élimination des “moyens transcendants”] (cf. [11, p. 394]). The “transcendental means” in question are ζ-functions; as Olga Taussky-Todd remarks in her review of this paper, a striking achievement of this paper was “the exclusion of analytical methods. . .
the theory of the ζ-functions which for so long a time seemed necessary can now be omitted.”⁹ Chevalley thus sought and achieved the elimination of complex analysis from what he considered arithmetic, though he had an expansive view of what counts as arithmetic; as Taussky-Todd puts it, “topological methods play an important part in this new presentation of class field theory.” From this standpoint, Furstenberg’s proof is a way to illustrate these sophisticated methods to non-experts, and by providing a simple solution to a classical arithmetical problem, a demonstration that they are arithmetic.

9 Cf. Math. Reviews MR0002357 (2, 38c). The Riemann zeta function ζ(s) = ∑_{k=1}^{∞} 1/k^s is a well-known ζ-function used heavily in analytic number theory (cf. [21]). The general notion of ζ-function, or L-function as such functions are sometimes known, arose from work of Euler and Dirichlet, and has been used heavily in analytic and algebraic number theory (cf. [29], [7]). H. Weber had used ζ-functions in class field theory (cf. [34]); it was this use of analysis that Chevalley’s work was an attempt to purify. For a historical discussion of ζ-functions in Chevalley’s context, cf. [12]; and for a detailed technical discussion of ζ-functions in this context, cf. [26, Chapter 11].
This approach yielded new results on the cutting edge of arithmetic, but Furstenberg’s solution to IP shows that the approach also yielded new solutions to elementary arithmetic problems. Furthermore, the topological means it draws upon are topological only in an axiomatic, lattice-theoretic sense, rather than in the sense typical of the Poincaré-Lefschetz topology according to which essential use is made of continua such as the real or complex lines. Chevalley judged topology in the latter sense to be non-arithmetic, but in the former sense to be arithmetic. This, then, is the view that McLarty offers in objection to our determination that Furstenberg’s solution to IP is topically impure. He claims that the topological elements in Furstenberg’s solution belong to IP’s topic, and as a result Furstenberg’s solution should not be judged topically impure on the basis of its use of these elements.

McLarty’s Bourbakiste point is an important one. Work like Chevalley’s and Furstenberg’s shows that IP is not merely a problem of significance for arithmetic, but for topology as well. It shows that there are “deep” connections between arithmetic and topology, connections that were hidden to previous investigators. It brings to light the fusion of what were once thought separate domains. McLarty’s point is that solutions to problems that draw on commitments concerning domains that are “deeply” connected to the topic of a problem are of special epistemic importance. Their importance seems to be twofold: firstly, they improve our knowledge of the connections between domains by showing how one domain can be used to solve problems in another; and secondly, through this gain of knowledge of connections, they afford the investigator “considerable economy of thought” [économie de pensée considérable] by providing her with results applicable to multiple domains of investigation rather than to just a single one (cf. [6, § 5]).
Such solutions help combat the splintering of mathematics into autonomous disciplines with different methods and aims (cf. [6, § 1]).

We have argued (in Section 5 of [15]) against McLarty’s view, on the grounds that a topological solution of IP provides a “deep” solution but not its most “basic” solution, where “basicness” reflects the conceptual resources corresponding most closely to those which are needed to grasp the problem. Our diagnosis rests on McLarty’s claim that the Chevalley-inspired reading properly articulates IP’s topic, even though an ordinary understanding of IP would seem to contain no topological commitments. There are thus two competing notions of problem understanding that might be thought to be determinative of topics and hence of the content of problems, what could be called “basic” and “deep” understanding.¹⁰ On the latter, Bourbakiste notion suggested by McLarty, Furstenberg’s solution qualifies as topically pure. While this notion of deep understanding is important and deserves further investigation, the view here is that this notion should not replace the “basic” sense of understanding in topic determination. This is, in short, because McLarty and Bourbaki make clear that they see deep understanding as articulating connections between the domain of the problem being investigated and other domains, rather than articulating just the content of the problem being investigated. Commitments of the latter type are the ones relevant to topical purity, however, since a topically pure solution of a problem is best thought of as a solution of precisely that problem, not some different problem – even if there are good reasons to pursue the solution of that different problem, as the McLarty/Bourbaki view argues.

10 There is a discussion of a related distinction in [4] between “informal” or “intuitive” content of a statement, by which is meant what someone with a casual understanding of geometry would (be able to) grasp, and “formal” or “axiomatic” content, by which is meant the inferential role of that statement in an axiomatic system.
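Two elementary facts behind Furstenberg’s argument from the start of this section, that each arithmetic progression is closed as well as open and that the progressions through 0 with prime difference jointly cover every integer except ±1, can be checked on a finite window of integers. The Python sketch below is an illustration only: the window size and all function names are assumptions introduced here, and a finite check cannot reproduce the step of the proof that uses the infinitude of every nonempty open set.

```python
WINDOW = range(-60, 61)  # finite stand-in for Z (an assumption of this sketch)


def progression(a, b):
    """Window portion of B_{a,b} = {a + b*n : n in Z}, with b > 0."""
    return {m for m in WINDOW if (m - a) % b == 0}


def complement_as_union(a, b):
    """Complement of B_{a,b}, as the union of the other residue classes mod b."""
    result = set()
    for j in range(1, b):
        result |= progression(a + j, b)
    return result


def is_closed_in_window(a, b):
    """Check that the complement of B_{a,b} is a union of basic open sets."""
    return set(WINDOW) - progression(a, b) == complement_as_union(a, b)


def primes_up_to(n):
    """Primes p with 2 <= p <= n, by trial division."""
    return [p for p in range(2, n + 1) if all(p % d != 0 for d in range(2, p))]


def uncovered(bound):
    """Integers of the window lying in no B_{0,p} with p <= bound prime."""
    covered = set()
    for p in primes_up_to(bound):
        covered |= progression(0, p)
    return set(WINDOW) - covered
```

With all primes up to 60 in play, `uncovered(60)` is `{-1, 1}`, matching A = Z − {−1, 1} on the window. With only the primes up to 7, `uncovered(7)` also contains numbers such as 11 and 13, so finitely many progressions fail to cover the window.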
4 Incompleteness and the possibility of purity

In an article [28], Georg Kreisel explained the consequences of Gödel’s work for purity as follows:

    Gödel’s paper [18] established that logical purity can be achieved in principle, and [1931] that arithmetic purity cannot be achieved; in fact, the result in [19] is so general that it is quite insensitive to any genuine ambiguities in the notion of purity of method. (pp. 163–4)

The idea seems to be that Gödel sentences are arithmetical sentences, and that a pure proof of an arithmetical sentence must draw only upon arithmetical means. But since Gödel sentences are unprovable by just arithmetical means, they do not admit of pure proof. Such, at any rate, seems to have been Kreisel’s view.

Daniel Isaacson has articulated a view concerning the content of Gödel sentences that can be used to argue against Kreisel that Gödel sentences can be proved purely. In [27] Isaacson argued that sentences in the first-order language L_PA of arithmetic may have purely arithmetical content, or may have in addition “higher-order”, i.e. infinitary, non-arithmetical content. Since the ordinary understanding of sentences in L_PA involves only arithmetical content, he says that their higher-order content, if any, is only “implicit” or “hidden”. This follows from his view that the content of arithmetical sentences is determined by what is necessary and sufficient for “perceiving” that that sentence is true, where said “perception” amounts either to “articulating” our grasp of the structure of the natural numbers, as he claims yields the axioms of PA, or to the recognition of a proof of that sentence. Since we have that Gödel sentences are PA-provably equivalent to sentences expressing by coding metamathematical properties of arithmetic (such as unprovability or consistency), it follows that these sentences are
provably unprovable in PA but provable by higher-order means. Such equivalences reveal “the implicit (hidden) higher-order content” of truths in the language of arithmetic, Isaacson writes. He holds that “the understanding of these sentences rests crucially on understanding this coding and our grasp of the situation being coded.” Hence, he concludes, Gödel sentences are not arithmetical sentences, but rather have higher-order content. If correct, this would seem to imply that pure proofs of Gödel sentences could draw on non-arithmetical resources and hence are available, contra Kreisel. In reply, we point out that Isaacson’s view seems muddled in the following respect: on the one hand, the non-arithmetical nature of Gödel sentences is the result of their provable unprovability in PA, and on the other hand, of their having coded metamathematical content. In identifying these two, is Isaacson’s view committed to maintaining that every arithmetical sentence independent of PA has coded metamathematical content? The first criterion seems to embody the view that the content of a sentence is determined by the inferential role it plays within an axiomatic theory (in this case a theory in which the metamathematics of PA can be carried out, for instance ZFC). This view does not permit obviously arithmetic sentences like the Goldbach conjecture to be judged as arithmetical at present, since there is at present no reason to believe its (plus-minus) truth is “directly perceivable” from our grasp of the structure of the natural numbers, nor from any other truths, arithmetic or not. This tells against the first criterion as a compelling view concerning the content of sentences in the language of arithmetic. The second criterion, that Gödel sentences are higher-order rather than arithmetical in virtue of having coded metamathematical content, is more promising. However, it suffers from the following problem. 
While we cannot see that Gödel sentences are Gödel sentences without grasping their coded metamathematical content, we can grasp them the way we do ordinary universally quantified sentences in the language of arithmetic without seeing that they are Gödel sentences. For instance, we could reasonably try to prove such sentences while only accepting the axioms of PA. It is true that our interest in Gödel sentences stems from their metamathematical content, generally speaking, but whether a sentence is arithmetical should be independent of our reasons for interest in it. We could encounter Gödel sentences in mainstream number-theoretic work, without knowing beforehand that these sentences are equivalent to metamathematical sentences, and could in that case grasp these sentences without grasping any higher-order content (which is not to say we could prove them without such grasp). A defender of Isaacson’s view could draw on the distinction between “basic” and “deep” content made in the previous section. While the basic
Purity in Arithmetic: some Formal and Informal Issues
content of Gödel sentences would seem to be arithmetical, their “deep” content would seem to be metamathematical or higher-order. On this view, the basic content of any sentence expressible in the language of arithmetic is arithmetical, while its deep content depends on other theoretical factors such as its inferential role in axiomatic arithmetic. However, for evaluating Kreisel’s claim that Gödel sentences cannot be proved purely, it is basic rather than deep content that is relevant, at least if the type of purity at issue is topical. As we have explained, grasp of their deep content is unnecessary for grasping these sentences in the ordinary way sufficient for attempting their proof, for instance. But it is precisely the latter type of grasp that determines what belongs to a problem’s topic, and hence what may be drawn upon in a topically pure proof. Consequently Isaacson’s observations indicate in another way the two types of content to which we have already drawn attention, but do not pose a convincing argument against Kreisel’s claim that Gödel sentences have no topically pure proofs.
5 Closing thoughts
The case of the infinitude of primes is valuable because it highlights several key issues important for getting clearer on topical purity. How topics of problems are determined awaits further systematic study. Case studies like the one presented here are necessary and important preludes to this type of investigation. This particular case study highlights the difficulty of determining exactly what belongs to the topic of even a quite elementary problem. While addition does not explicitly appear in the problem’s formulation, it is natural to think that addition belongs to the topic of every arithmetic problem, in virtue of the natural numbers’ identity as an additive structure. The discussion of the Euclidean solution here was meant to show how to argue for its topical purity without just granting this point about the additive identity of the natural numbers. The discussion of Furstenberg’s topological solution illustrates two competing notions of problem content that might be thought to be determinative of topics, what could be called “basic” and “deep” content. We argue that what belongs to the deep content of a problem may not necessarily be drawn upon by a topically pure solution of that problem, and so in particular that Furstenberg’s solution of IP is not topically pure. We then consider Kreisel’s claim that Gödel sentences have no pure proofs and observe that Isaacson’s point that these sentences have hidden higher-order content does not contradict this claim, since this hidden content is again deep rather than basic and so does not bear on the purity or impurity of proofs drawing on these higher-order means.
Andrew Arana
References
[1] Andrew Arana. Logical and semantic purity. Protosociology, 25:36–48, 2008. Reprinted in Philosophy of Mathematics: Set Theory, Measuring Theories, and Nominalism, Gerhard Preyer and Georg Peter (eds.), Ontos, 2008.
[2] Andrew Arana. On formally measuring and eliminating extraneous notions in proofs. Philosophia Mathematica, 17:208–219, 2009.
[3] Andrew Arana. Elementarity and purity. In Andrew Arana and Carlos Alvarez (eds.), Analytic Philosophy and the Foundations of Mathematics. Palgrave/Macmillan, 2011. Forthcoming.
[4] Andrew Arana and Paolo Mancosu. On the relationship between plane and solid geometry. Review of Symbolic Logic, 5(2):294–353, June 2012.
[5] Jeremy Avigad. Number theory and elementary arithmetic. Philosophia Mathematica, 11:257–284, 2003.
[6] Nicholas Bourbaki. L’architecture des mathématiques. In François Le Lionnais (ed.), Les grands courants de la pensée mathématique. Éditions des Cahiers du Sud, 1948.
[7] Kevin Buzzard. L-functions. In Gowers [20].
[8] Daniel Cass and Gerald Wildenberg. A novel proof of the infinitude of primes, revisited. Mathematics Magazine, 76(3):203, 2003.
[9] Patrick Cegielski. La théorie élémentaire de la divisibilité est finiment axiomatisable. C. R. Acad. Sci. Paris Sér. I Math., 299(9):367–369, 1984.
[10] Patrick Cegielski, Yuri Matijasevich, and Denis Richard. Definability and decidability issues in extensions of the integers with the divisibility predicate. Journal of Symbolic Logic, 61(2):515–540, 1996.
[11] Claude Chevalley. La théorie du corps de classes. Annals of Mathematics, 41:394–418, 1940.
[12] J. W. Cogdell. On Artin L-functions. http://www.math.ohio-state.edu/~cogdell/artin-www.pdf, 2007.
[13] Paola D’Aquino. Local behaviour of the Chebyshev theorem in models of IΔ0. Journal of Symbolic Logic, 57(1):12–27, 1992.
[14] Paola D’Aquino. Weak fragments of Peano arithmetic. In The Notre Dame Lectures, volume 18 of Lecture Notes in Logic, pp. 149–185. Association for Symbolic Logic, Urbana, IL, 2005.
[15] Michael Detlefsen and Andrew Arana. Purity of methods. Philosophers’ Imprint, 11(2):1–20, 2011.
[16] Herbert B. Enderton. A mathematical introduction to logic. Harcourt/Academic Press, Burlington, MA, second edition, 2001.
[17] Harry Furstenberg. On the infinitude of primes. American Mathematical Monthly, 62(5):353, 1955.
[18] Kurt Gödel. Die Vollständigkeit der Axiome des logischen Funktionenkalküls. Monatshefte für Mathematik und Physik, 37(1):349–360, 1930. Reprinted and translated in Collected Works Volume 1, Solomon Feferman et al. (eds.), Oxford University Press, 1986.
[19] Kurt Gödel. Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I. Monatshefte für Mathematik und Physik, 38:173–198, 1931. Reprinted and translated in Collected Works Volume 1, Solomon Feferman et al. (eds.), Oxford University Press, 1986.
[20] Timothy Gowers (ed.). The Princeton companion to mathematics. Princeton University Press, Princeton, 2008.
[21] Andrew Granville. Analytic Number Theory. In Gowers [20].
[22] Petr Hájek and Pavel Pudlák. Metamathematics of first-order arithmetic. Perspectives in Mathematical Logic. Springer-Verlag, Berlin, second edition, 1998.
[23] Michael Hallett and Ulrich Majer (eds.). David Hilbert’s Lectures on the Foundations of Geometry, 1891–1902. Springer-Verlag, Berlin, 2004.
[24] G. H. Hardy and E. M. Wright. An introduction to the theory of numbers. Oxford University Press, New York, fifth edition, 1979.
[25] A. E. Ingham. The distribution of prime numbers. Cambridge University Press, Cambridge, 1932.
[26] Kenneth Ireland and Michael Rosen. A classical introduction to modern number theory, volume 84 of Graduate Texts in Mathematics. Springer-Verlag, New York, second edition, 1990.
[27] Daniel Isaacson. Arithmetical truth and hidden higher-order concepts. In W. D. Hart (ed.), The Philosophy of Mathematics, pp. 203–224. Oxford University Press, New York, 1996. First published in Logic Colloquium ’85, the Paris Logic Group (eds.), Amsterdam, North-Holland, 1987, pp. 147–169.
[28] Georg Kreisel. Kurt Gödel. Biographical Memoirs of Fellows of the Royal Society, 26:149–224, 1980.
[29] Barry Mazur. Algebraic Numbers. In Gowers [20].
[30] Rohit Parikh. Existence and feasibility in arithmetic. Journal of Symbolic Logic, 36:494–508, 1971.
[31] J. B. Paris, A. J. Wilkie, and A. R. Woods. Provability of the pigeonhole principle and the existence of infinitely many primes. Journal of Symbolic Logic, 53(4):1235–1244, 1988.
[32] Julia Robinson. Definability and decision problems in arithmetic. Journal of Symbolic Logic, 14:98–114, 1949.
[33] William W. Tait. Finitism. The Journal of Philosophy, 78(9):524–546, 1981.
[34] Heinrich Weber. Lehrbuch der Algebra, volume III. F. Vieweg und Sohn, Braunschweig, second edition, 1908.
[35] Alan Woods. Some problems in logic and number theory and their connections. PhD thesis, University of Manchester, 1981.
Domain Extensions and Higher-Order Syntactical Interpretations
Marek Polański
The paper is concerned with the logical analysis of a broad family of mathematical constructions which fall under the vague term “domain extension”. The aim of the author is to contribute to a clarification of this notion and to provide an explication in both syntactic and model-theoretic terms. The paper is mainly motivated by examples such as Whitehead’s definition of point or Russell’s construction of instants of time from events. The paper begins with a short exposition of some general model-theoretic conditions which can serve as a first approximation of an adequate explication of the notion in question. The explication proposed by the author is based on a very general notion of syntactic interpretation. This notion is introduced and characterized semantically in the second part of the paper.
1 Introductory remarks
The phrase “domain extension” has many connotations. The present paper is intended to contribute to its clarification in model-theoretic terms. Our aim is to provide a model-theoretic description and a purely syntactical characterization of a broad family of mathematical constructions which fall under this rather vague term. The common use of the term “domain extension” is not entirely clear, but it can be roughly characterized by a series of well-known algebraic and geometric examples. The latter are closely related to the so-called method of extensive abstraction (as described by Whitehead in [10, 11] and developed by Russell in [7] and Tarski in [9]). In this paper we introduce a concept of syntactic interpretation between higher-order theories which turns out to be a syntactic counterpart to constructions of this kind. The paper is organized as follows. In the next section some paradigmatic examples of domain extensions are presented. In the third section we introduce some conceptual preliminaries and formulate some very general conditions on abstract operations on relational structures which can serve
as model-theoretic counterparts of the more concrete operations briefly discussed in the second section. We introduce a very general notion of an L-construction and define a very general notion of an elementary construction. Both sections have a rather expository character. The fourth section provides a new conceptual framework which is motivated by the paradigmatic examples discussed earlier.
2 Domain extensions: some paradigmatic examples
Let us begin with a list of well-known mathematical examples which motivate the conceptual framework described and developed in the next two sections. Our paradigmatic examples can be divided into roughly three categories. The first one contains extensions of algebraic structures. This category comprises some well-known cases where an algebraic structure A is extended to an algebraic structure B for the same vocabulary. A is then (up to isomorphism) a substructure of B in the usual model-theoretic sense. The universe B of B is a result of adding new objects which satisfy some conditions expressed in the related vocabulary. The new objects can be (and usually are) roots of polynomials over A. The motivation for such an extension is then purely algebraic. Typical and well-known cases of this sort are extensions of number structures: the passage from natural numbers to integers, from integers to the field of rationals, from rationals to reals and from reals to complex numbers. The second category embraces cases where a new domain is the result of filling gaps in an ordered structure. Typical examples of such a domain extension are completions of Boolean algebras and completions of linear orderings. To the third category belong constructions related to the method of extensive abstraction originated by Whitehead and further developed by Russell, Tarski (compare [9]), and some contemporary authors working on the point-free foundations of geometries (compare [4], [5]). Usually, a point-free geometry is a theory of spatial regions which is formulated in a language extending the language of mereology. Points are introduced via an approximation procedure which consists in defining a set whose elements are certain sets of regions of space linearly ordered by the mereological inclusion.
Constructions of this kind, together with some further definitional steps, transform models of point-free geometries into models of point-based geometrical theories. The latter are in a sense definable over the original models of a given point-free geometry. The essence of the method of extensive abstraction can be illustrated by simple examples. Consider the class of all so-called interval orders, which are structures of the form A = (A, ⊲A) where A is a non-empty set and ⊲A is a binary relation on A such that:
(1) for all a ∈ A: non(a ⊲A a), and
(2) for all a, b, c, d ∈ A: if a ⊲A b and c ⊲A d then a ⊲A d or c ⊲A b.
Elements of the universe of an interval order can be regarded as time intervals. For each interval order A let us define a relation of overlapping ◦A as follows: a ◦A b iff non(a ⊲A b) and non(b ⊲A a). The following two examples show how point structures (being linear orderings) can be constructed from interval orders by means of the method of extensive abstraction. The constructions exemplify the ideas of Whitehead and Russell.
Example 1: Points as maximal sets of mutually overlapping intervals
A subset D of A is called an antichain in A if and only if for all a, b ∈ D: a ◦A b. By a maximal antichain in A we mean an antichain in A which cannot be properly extended to another antichain. Let matc(A) be the set of all maximal antichains in A. For each interval structure A let F(A) be the structure (matc(A), ⋖A) where ⋖A is a binary relation on matc(A) such that D1 ⋖A D2 iff for some a ∈ D1, b ∈ D2: a ⊲A b.
Example 2: Points as equivalence classes of abstractive sets
Let us define for all a, b ∈ A: a ≺A b :⇔ a ≠ b and for all c: if c ◦A a then c ◦A b.
A subset D of A is called an abstractive class in A if D is linearly ordered by the relation ≺A and there is no a ∈ A such that for all b ∈ D: a ≺A b. Let ac(A) be the set of all abstractive classes in A. Now let us define the binary relation ∼A on ac(A) as follows: D1 ∼A D2 iff for each a ∈ D1 there is a b ∈ D2 such that a ≺A b and for each a ∈ D2 there is a b ∈ D1 such that a ≺A b.
Clearly, ∼A is an equivalence relation on ac(A). For each interval structure A let F(A) be the structure (ac(A)/∼A, ⋖A) where ac(A)/∼A is the corresponding set of all equivalence classes, and for all [D1]∼A and [D2]∼A: [D1]∼A ⋖A [D2]∼A :⇔ for some a ∈ D1, b ∈ D2: a ⊲A b.
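For finite interval orders the construction of Example 1 can be checked mechanically. The following Python sketch is only an illustration under our own assumptions: the sample intervals, the reading of ⊲A as "ends strictly before" on real intervals, and all function names are hypothetical, and the brute-force search is not meant as an efficient procedure.

```python
from itertools import combinations

# Hypothetical sample data: closed real intervals (left, right) as time intervals.
intervals = [(0, 2), (1, 3), (4, 5), (2.5, 6)]

def precedes(a, b):
    """a ⊲ b: interval a ends strictly before interval b begins."""
    return a[1] < b[0]

def overlaps(a, b):
    """a ◦ b: neither interval wholly precedes the other."""
    return not precedes(a, b) and not precedes(b, a)

def is_antichain(d):
    """All members of d overlap pairwise."""
    return all(overlaps(a, b) for a in d for b in d)

def maximal_antichains(universe):
    """matc(A): antichains that cannot be properly extended (brute force)."""
    subsets = [frozenset(c) for r in range(1, len(universe) + 1)
               for c in combinations(universe, r)]
    antichains = [d for d in subsets if is_antichain(d)]
    return [d for d in antichains if not any(d < e for e in antichains)]

def point_before(d1, d2):
    """D1 ⋖ D2: some member of D1 wholly precedes some member of D2."""
    return any(precedes(a, b) for a in d1 for b in d2)

points = maximal_antichains(intervals)
for p in points:
    print(sorted(p), [sorted(q) for q in points if point_before(p, q)])
```

On this sample the constructed points are the three sets {(0, 2), (1, 3)}, {(1, 3), (2.5, 6)} and {(4, 5), (2.5, 6)}, and `point_before` orders them linearly.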
In the next section we shall consider a list of conditions which approximate the class of domain extensions as characterized by the above examples. We start with some very general conditions and try to narrow down the class of operations they define. In section 4 we shall provide a precise concept which can serve as a candidate for an adequate model-theoretic explicatum of the term “domain extension”.
3 L-operations and L-constructions
Let L1 and L2 be finite and purely relational vocabularies. For each vocabulary L let StrL denote the class of all relational structures which can serve as models for L. Let F be a partial operation from StrL1 to StrL2. We call F regular if it preserves isomorphisms in the following sense.
(ISOM) For all A, B ∈ dom(F): if A ≅ B then F(A) ≅ F(B).
The above condition is purely algebraic in character. It does not refer to any particular languages which can be evaluated in the structures in question. There is of course a wide variety of languages which can be associated with a given relational vocabulary. Let L be a logical system which extends first-order logic. Let us denote by FmL(L) and SentL(L) the classes of all L-formulas and all L-sentences (respectively) in the vocabulary L; here the subscript refers to the logic and the argument to the vocabulary. We call a partial operation F from StrL1 into StrL2 an L-operation if and only if the following condition holds.
(TRANS) There is a mapping τ from SentL(L2) into SentL(L1) such that for all ϕ ∈ SentL(L2) and all A ∈ dom(F): A |= τ(ϕ) iff F(A) |= ϕ.
The condition (TRANS) says intuitively that there is a uniform way to correlate (or to reduce) all L-expressible properties of a constructed structure F(A) with (to) some L-expressible properties of the original structure A. It is apparent that (TRANS) implies the following condition.
(ELEM) For all A, B ∈ dom(F): if A ≡L B then F(A) ≡L F(B).
(TRANS) and (ISOM) are independent of each other. It is easy to show that (TRANS) does not imply (ISOM). To see this let L be first-order logic and F be an operation such that (1) dom(F) is the class of all models of the first-order theory of some particular finite structure A, (2) F(A) is the linear ordering ω, and (3) for each B ∈ dom(F) such that A ≠ B, F(B) is the linear ordering ω + (ω* + ω). Since each model of the first-order theory of A is isomorphic to A and ω is not isomorphic to ω + (ω* + ω), such an operation F does not satisfy (ISOM). However, (TRANS) is satisfied. Let τ be the function which assigns to each sentence which is true in ω the sentence verum, and to each sentence which is false in ω the sentence falsum. Since ω and ω + (ω* + ω) are first-order equivalent, obviously for each sentence ϕ in an appropriate vocabulary for ω and ω + (ω* + ω), and each model B in dom(F): B |= τ(ϕ) just in case F(B) |= ϕ. It is equally easy to show that (ISOM) does not imply (TRANS). Let L be again first-order logic and let F be the operation which assigns to each linear ordering its automorphism group. Each such automorphism group F(A) of a linear ordering A is here a structure of the form (B, R) where B is the set of all automorphisms of A and R is a ternary relation on B such that for all elements i, j, k ∈ B: (i, j, k) ∈ R if and only if k is the composition of i with j. Hence, each such automorphism group is a structure for a relational vocabulary. Now the orderings ω and ω + (ω* + ω) are first-order equivalent but the corresponding automorphism groups F(ω) and F(ω + (ω* + ω)) are not (the first one contains only one element and the second one is infinite). This implies that F does not even satisfy (ELEM), let alone (TRANS). All paradigmatic examples of domain extensions described above take the form of regular L-operations. Some of them do not, strictly speaking, fulfil the condition of uniqueness. For example, each model A of Peano arithmetic has many extensions which are models of the theory of integers. Nevertheless, any two such B1 and B2 which extend a given model A of Peano arithmetic are isomorphic over A (i.e. isomorphic to each other via a mapping that is the identity on A). Similarly, any two completions of a given Boolean algebra A are isomorphic over A. However, the concept of a regular L-operation is too wide to serve as a satisfactory explicatum of “domain extension”. It can be easily demonstrated that the conditions (ISOM) and (TRANS) taken together are not strong enough to exclude many trivial correlations between two classes of structures. Consider, for example, an operation F which is such that: (i) for all A, B ∈ dom(F): A ≡L B, and (ii) there is a single structure C such that F(A) = C, for each A ∈ dom(F). Clearly, any such operation F satisfies both (ISOM) and (TRANS). Therefore further conditions should be taken into account. We call a partial operation F from StrL1 to StrL2 an L-construction if and only if there exist functions f, D, and θ such that θ transforms injectively the set of all L-variables into itself, the domains of f and D are identical with the domain of F and the following conditions hold (we write fA and DA for the values of f and D at A, respectively).
(PROXY) For each A ∈ dom(f): fA is a function from DA onto the universe of F(A).
(DEF) For each atomic formula ϕ ∈ FmL(L2) with free variables x1, ..., xn there is a formula ψ ∈ FmL(L1) with free variables θ(x1), ..., θ(xn) such that for all A ∈ dom(F) and all a1, ..., an ∈ DA:
(⋆) A |= ψ[θ(x1) : a1, ..., θ(xn) : an] iff F(A) |= ϕ[x1 : fA(a1), ..., xn : fA(an)].
The above definition resembles various conditions discussed in the literature (compare, for example, [3], [1]). Some comments are needed. Firstly, for each model A for which F is defined, the function fA assigns elements of the universe of F(A) to elements of DA in such a way that each element of the universe of F(A) has at least one counterpart. Hence, for each A in dom(F), fA is a kind of functional proxy relation. Secondly, the set DA is not necessarily a subset of the set of all individuals in the universe of A. For example, if A is an interval order (as defined above) DA may be the set of all maximal antichains or the set of all abstractive
classes in A. In such a case θ would assign to each first-order variable a second-order variable. The above definition has some simple but useful consequences.
1. There is an L-formula δ in L1 whose extension in A is identical with DA. To see this apply the condition (DEF) to the formula ‘(x = x)’. We call δ a domain formula for F. Observe that δ depends essentially on f.
2. There is an L-formula χ in L1 which defines a relation which is a surrogate for identity. Let us call it an indiscernibility formula for F. To see this apply (DEF) to the formula ‘(x = y)’. Obviously, the extension of χ in A is the kernel of the function fA, i.e. the set of all pairs (a, b) such that fA(a) = fA(b). The relation $\|\chi\|^{A}$ is clearly an equivalence relation on DA and it is a congruence with respect to the L1-counterpart of each basic predicate in L2.
3. Each L-construction is regular. To see this consider A, B ∈ dom(F) and an isomorphism h from A onto B. Now let us define a function g from rge(fA) into rge(fB) as follows. For each d ∈ dom(g) choose an object a ∈ $\|\delta\|^{A}$ such that fA(a) = d and let g(d) := fB(h̄(a)), where h̄ is a suitably defined extension of h such that for all L-formulas ϕ in L1: $\bar{h}[\|\varphi\|^{A}] = \|\varphi\|^{B}$. The claim that such an extension of an isomorphism h from A onto B exists is a kind of (model-theoretic) isomorphism theorem. It can easily be shown that g is a bijection that preserves the extensions of formulas which are L1-counterparts of basic predicates in L2.
4. F is also an L-operation. This follows from the fact that the condition (⋆) can be easily shown to hold for all formulas ϕ ∈ FmL(L2).
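The conditions (PROXY) and (DEF) and the second consequence can be observed on a toy case. The sketch below is our own hypothetical illustration and not one of the paradigmatic examples: A is a directed 6-cycle, F(A) is a directed 3-cycle, DA is the whole universe of A and fA is reduction modulo 3. The functions `psi_S` and `chi` play the roles of ψ for an atomic formula S(x, y) and of the indiscernibility formula χ; both are evaluated through paths of fixed length, which are first-order definable from R.

```python
# Hypothetical toy example: A is a directed 6-cycle, F(A) a directed 3-cycle.
A = list(range(6))
R = {(a, (a + 1) % 6) for a in A}            # the only basic relation of A
S = {(u, (u + 1) % 3) for u in range(3)}     # the only basic relation of F(A)

def f(a):
    """Proxy map f_A from D_A (here: all of A) onto the universe of F(A)."""
    return a % 3

def path(a, b, k):
    """True iff an R-path of exactly k steps leads from a to b.
    For each fixed k this is first-order definable from R."""
    frontier = {a}
    for _ in range(k):
        frontier = {y for (x, y) in R if x in frontier}
    return b in frontier

def psi_S(a, b):
    """Counterpart in A of the atomic formula S(x, y)."""
    return path(a, b, 1) or path(a, b, 4)

def chi(a, b):
    """Indiscernibility formula: counterpart in A of x = y."""
    return a == b or path(a, b, 3)

# (PROXY): f maps D_A onto the universe of F(A).
proxy_ok = {f(a) for a in A} == {0, 1, 2}
# Condition (star) for S(x, y) and for x = y.
star_ok = all(psi_S(a, b) == ((f(a), f(b)) in S) and
              chi(a, b) == (f(a) == f(b)) for a in A for b in A)
# chi is a congruence with respect to psi_S.
congruence_ok = all(psi_S(a, b) == psi_S(c, d)
                    for a in A for b in A for c in A for d in A
                    if chi(a, c) and chi(b, d))
print(proxy_ok, star_ok, congruence_ok)
```

Here the extension of `chi` is exactly the kernel of `f`, as the second consequence requires.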
We call F an elementary construction if F is an L-construction and L is first-order logic. Strictly speaking, according to this explication of an elementary (i.e. first-order) construction the canonical constructions of so-called n-dimensional elementary interpretations (as defined in [8]) are not elementary constructions. The above definition of an L-construction could be modified in such a way that ψ in (⋆) would be allowed to have nk free variables. In such a case the domain formula δ for F would contain n free variables. Although we do not follow this explication strategy it should be stressed that the notion of interpretation introduced in the present paper is motivated by Szczerba’s account. All canonical constructions in Szczerba’s sense are L-constructions in our sense. Moreover, many of the operations from our paradigmatic examples are elementary according to Szczerba’s account. In the first category there is only one exception: the operation transforming rationals to reals is non-elementary (this can be shown with the help of a simple cardinality argument). However, no operation belonging to the second category is elementary. But all of them are L-constructions for some logics L (for instance, for second-order logic). The examples from the third group are more complicated and a definitive answer with regard to some of them would require a deeper analysis. It seems that they are also not elementary, either in our sense or in the sense defined by Szczerba. This shows that the class of elementary constructions does not embrace many of the paradigmatic cases of domain extensions. This fact is the main motivation for a more systematic study of non-elementary L-constructions. This topic will occupy the rest of the present paper. Our strategy will consist in defining a special kind of higher-order constructions instead of trying to add further general conditions in the style of (ISOM) etc. However, it is interesting to notice that there is another strategy which seems equally promising but which we do not follow here. Each of the paradigmatic examples can be regarded as a class of structures H which is elementary in a logic L and such that there is a vocabulary L which extends L1 ⊎ L2 ⊎ {P} where ⊎ stands for the disjoint sum and P is a new unary predicate. For example, the completion operation for linear ordering can be regarded as the class H consisting of structures of the form A = (A, P A ,
(UNIQUE) For each A, B ∈ H: every isomorphism between the L1-reducts of A and B extends to an isomorphism between A and B.
(RIGID) For each A ∈ H: every automorphism of the L1-reducts of A extends uniquely to an automorphism of A.
(DOM) For each A ∈ H: each element of A \ P^A is definable from parameters in A.
It is easy to see that (UNIQUE) and (DOM) taken together imply (RIGID). If L is first-order logic then (AXIOM)L, (UNIQUE), and (DOM) taken together are equivalent to the n-dimensional elementary interpretability of the elementary theory of the class of all L2-reducts of H into the elementary theory of the class of all L1-reducts of H (for a proof consult [6]). Moreover, it can be shown that our paradigmatic examples satisfy (AXIOM)L, (UNIQUE), and (RIGID), for some general logics L (however, in general not for first-order logic). It also seems that there are no general results about classes of structures of this kind. In particular, no general translatability or interpretability theorems in the style of the result proved by Myers in [6] are known. Let us now turn to our explication strategy. In the next section we introduce a very general notion of syntactic interpretation and describe an L-construction (where L is a version of higher-order logic) which is its model-theoretic counterpart. The construction itself will serve as a candidate for an explicatum of the notion of domain extension.
4 Higher-order syntactical interpretations and their canonical constructions
In the following we work within a system of type logic which is the extensional fragment of the version of higher-order logic elaborated in [2]. We start with some syntactical and semantical preliminaries. In order to make the paper self-contained, section 4.1 provides a brief introduction to the main syntactical and semantical building blocks of the logical system in question. In particular, we define the notion of a Henkin model of type logic. In the ensuing section we characterize a particular method of constructing a new universe inside the universe of a given Henkin model. This method consists essentially in relativizing a given model to a (complex) predicate. It can also be regarded as a variant of constructing an inner model inside a given structure. The main idea behind the construction is that the urelements of the inner universe are allowed to be objects which are of higher types in the original model. This corresponds to constructional steps described roughly and informally in our paradigmatic examples. For instance, the objects of the universe of a completion of a given linear ordering A can be constructed as pairs of subsets of the universe of A. Similarly, the universe of points constructed out of a mereological field of geometric regions, with the help of the Whiteheadian method of extensive abstraction, consists of certain equivalence classes of sets of geometric regions linearly ordered by the mereological inclusion. In all other paradigmatic examples of domain extensions the constructed universes are of similar kinds. After describing this core idea we shall give a precise definition of a certain canonical L-construction and address the question of its purely syntactical characterization. For this purpose, we introduce the concept of a construction code which is a type-logical counterpart of the concept of an elementary n-dimensional code (as defined by Szczerba).
Each such construction code determines uniquely both an operation on Henkin models and a recursive translation between the relevant languages.
4.1 Languages and models of type logic
The set Type of all types is the smallest set containing 0 and such that if $t_1, \dots, t_n$ are types then the tuple $\langle t_1, \dots, t_n \rangle$ is also a type. 0 is the type of urelements whereas $\langle t_1, \dots, t_n \rangle$ is the type of sets of n-tuples (i.e. n-ary relations) whose i-th components are objects of type $t_i$. A language L of type logic contains the following items: (a) for each type t, a countably infinite set $\mathrm{Var}_t = \{\upsilon_n^t : n \in \omega\}$ of all variables of type t and a countable (possibly empty) set $\mathrm{Const}_t(L) = \{c_n^t : n \in \eta_t\}$ of all constants of type t (where $\eta_t \le \omega$); (b) logical signs: ¬, ∧, ∨, →, ↔, ∀, ∃, =, λ. Constants of type 0 are called individual constants whereas constants of a type $\langle t_1, \dots, t_n \rangle$ are called n-ary relational constants. The sets $\mathrm{Tm}_t(L)$ of all terms of L of type t and $\mathrm{Fm}(L)$ of all formulas of L are defined as usual by a mutual recursion:
— each variable $\upsilon_n^t$ and each constant $c_n^t$ of L is a term of L of type t;
— for each type t and any terms $\alpha_1$ and $\alpha_2$ of L of type t the expression $\alpha_1 = \alpha_2$ is a formula of L;
— for each term β of L of type $\langle t_1, \dots, t_n \rangle$ and any terms $\alpha_1, \dots, \alpha_n$ of L of types $t_1, \dots, t_n$ (respectively), the expression $\beta\alpha_1, \dots, \alpha_n$ is a formula of L;
— for any formulas ϕ and ψ of L and each variable x the expressions ¬ϕ, ϕ ∧ ψ, ϕ ∨ ψ, ϕ → ψ, ϕ ↔ ψ, ∀xϕ, and ∃xϕ are formulas of L;
— for each formula ϕ of L and any pairwise distinct variables $x_1^{t_1}, \dots, x_n^{t_n}$ the expression $\lambda x_1^{t_1} \dots x_n^{t_n} \varphi$ is a term of L of type $\langle t_1, \dots, t_n \rangle$.
In order to ensure better readability of formulas we shall use a more transparent unofficial notation and write ‘$\langle \lambda x_1^{t_1}, \dots, x_n^{t_n} . \varphi \rangle$’ instead of ‘$\lambda x_1^{t_1} \dots x_n^{t_n} \varphi$’. The notion of a free occurrence of a variable in a term or in a formula is defined as usual. For example, the free variables of a term of the form $\langle \lambda x_1^{t_1}, \dots, x_n^{t_n} . \varphi \rangle$ are all the free variables of the formula ϕ except $x_1^{t_1}, \dots, x_n^{t_n}$. Formulas and terms without free variables are called sentences and closed terms, respectively.
Let A be a nonempty set. A general model for a language L is a pair $A = (\langle U_A^t \rangle, \Im_A)$ such that $\langle U_A^t \rangle = \{U_A^t : t \in \mathrm{Type}\}$ and:
— $U_A^0 = A$ and for all types $t_1, \dots, t_n$: $U_A^{\langle t_1, \dots, t_n \rangle}$ is a nonempty subset of $\wp(U_A^{t_1} \times \dots \times U_A^{t_n})$;
— for each $c \in \mathrm{Const}_0(L)$: $\Im_A(c) \in U_A^0$;
— for all types $t_1, \dots, t_n$ and each $c \in \mathrm{Const}_{\langle t_1, \dots, t_n \rangle}(L)$: $\Im_A(c) \in \wp(U_A^{t_1} \times \dots \times U_A^{t_n})$.
An assignment in A is a function ν which is defined for all variables and such that $\nu(x^t) \in U_A^t$. For each expression $\xi \in \mathrm{Tm}(L) \cup \mathrm{Fm}(L)$ the semantical value $\|\xi\|^{A}_{\nu}$ of ξ in A under the assignment ν in A is defined as follows:
— $\|x\|^{A}_{\nu} = \nu(x)$, $\|c\|^{A}_{\nu} = \Im_A(c)$;
— $\|\alpha_1 = \alpha_2\|^{A}_{\nu} = 1$, if $\|\alpha_1\|^{A}_{\nu} = \|\alpha_2\|^{A}_{\nu}$; otherwise $\|\alpha_1 = \alpha_2\|^{A}_{\nu} = 0$;
— $\|\beta\alpha_1, \dots, \alpha_n\|^{A}_{\nu} = 1$, if $(\|\alpha_1\|^{A}_{\nu}, \dots, \|\alpha_n\|^{A}_{\nu}) \in \|\beta\|^{A}_{\nu}$; otherwise $\|\beta\alpha_1, \dots, \alpha_n\|^{A}_{\nu} = 0$;
— $\|\neg\varphi\|^{A}_{\nu}$ and $\|\varphi \circ \psi\|^{A}_{\nu}$, where ◦ ∈ {∧, ∨, →, ↔}, are defined as usual;
— $\|\forall x^t \varphi\|^{A}_{\nu} = 1$, if for all $a \in U_A^t$: $\|\varphi\|^{A}_{\nu(x:a)} = 1$; otherwise $\|\forall x^t \varphi\|^{A}_{\nu} = 0$; $\|\exists x^t \varphi\|^{A}_{\nu}$ analogously;
— $\|\langle \lambda x_1^{t_1}, \dots, x_n^{t_n} . \varphi \rangle\|^{A}_{\nu} = \{\bar{a} \in U_A^{t_1} \times \dots \times U_A^{t_n} : \|\varphi\|^{A}_{\nu(x_i^{t_i} : \bar{a}_i)_{0 < i \le n}} = 1\}$.
Observe that the value of a term α of a relational type t in a general model A under an assignment ν may lie outside the domain $U_A^t$. A general model $A = (\langle U_A^t \rangle, \Im_A)$ is called a Henkin model for L if for all types t, all terms α of L of type t and all assignments ν in A: $\|\alpha\|^{A}_{\nu} \in U_A^t$. A model A is called a full model if for each type $\langle t_1, \dots, t_n \rangle$ the set $U_A^{\langle t_1, \dots, t_n \rangle}$ is identical with $\wp(U_A^{t_1} \times \dots \times U_A^{t_n})$. Obviously, each full model for L is a Henkin model for L. Since each first-order vocabulary L can be regarded as a vocabulary for a language of type logic, each relational structure A for L can be expanded in an obvious way to a Henkin model B for L where $U_B^0 = A$. B is then called a Henkin expansion of A. We call such a model of type logic B a full expansion of A if B itself is full.
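Over a finite set of urelements the universes of a full model can be enumerated directly. The following Python sketch is a minimal illustration under stated assumptions: the encoding of types (the integer 0 for type 0, Python tuples for relational types) and the names `powerset` and `full_universe` are our own and do not come from the formal definition above.

```python
from itertools import chain, combinations, product

def powerset(items):
    """All subsets of `items`, as frozensets."""
    items = list(items)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))
    return [frozenset(s) for s in subsets]

def full_universe(base, t):
    """Universe of type t in the full model over the finite set `base`.
    Encoding (an assumption of this sketch): type 0 is the integer 0,
    a type <t1, ..., tn> is the Python tuple (t1, ..., tn)."""
    if t == 0:
        return list(base)
    factors = [full_universe(base, ti) for ti in t]
    return powerset(product(*factors))   # all sets of n-tuples

# Sizes of some universes over a two-element set of urelements.
sizes = {t: len(full_universe(["a", "b"], t))
         for t in [0, (0,), (0, 0), ((0,),)]}
print(sizes)
```

In a Henkin model that is not full, the universe of a relational type may be a proper subset of the list returned by `full_universe`.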
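4.2 Defining new universes
Consider a general model $\mathcal{A}$ for a language $L$. Let $s$ be a type and let $\delta$ be a closed term of $L$ of type $\langle s \rangle$ such that $\|\delta\|^{\mathcal{A}} \neq \emptyset$. The denotation of $\delta$ in $\mathcal{A}$ is a nonempty subset of $U_{\mathcal{A}}^s$ which can serve as the set of urelements of a new hierarchy of universes $\langle D_{\mathcal{A},\delta}^t \rangle$ constructed in a certain natural way inside $\langle U_{\mathcal{A}}^t \rangle$. In order to give an exact description of the construction we first define a family of functions called type transformations. By the type transformation generated by a type $s$ we shall understand the function $\pi_s : Type \to Type$ such that $\pi_s(0) = s$ and $\pi_s(\langle t_1, \dots, t_n \rangle) = \langle \pi_s(t_1), \dots, \pi_s(t_n) \rangle$. We can now define $\langle D_{\mathcal{A},\delta}^t \rangle$ as follows:
— $D_{\mathcal{A},\delta}^0 = \|\delta\|^{\mathcal{A}}$;
— $D_{\mathcal{A},\delta}^{\langle t_1, \dots, t_n \rangle} = U_{\mathcal{A}}^{\langle \pi_s(t_1), \dots, \pi_s(t_n) \rangle} \cap \wp(D_{\mathcal{A},\delta}^{t_1} \times \dots \times D_{\mathcal{A},\delta}^{t_n})$.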
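The type transformation $\pi_s$ is a plain structural recursion on types, and a small sketch may make it concrete. The encoding below is an assumption made only for illustration (it is not part of the paper): the type 0 is the integer `0`, and a relational type $\langle t_1, \dots, t_n \rangle$ is a Python tuple of types.

```python
from typing import Tuple, Union

# Illustrative encoding (an assumption, not the paper's notation):
# 0 is the type of individuals; a tuple (t1, ..., tn) stands for <t1, ..., tn>.
Type = Union[int, Tuple["Type", ...]]


def pi(s: Type, t: Type) -> Type:
    """Type transformation generated by s: pi_s(0) = s, and
    pi_s(<t1, ..., tn>) = <pi_s(t1), ..., pi_s(tn)>."""
    if t == 0:
        return s
    return tuple(pi(s, ti) for ti in t)


s = (0,)                 # s = <0>: the new urelements are sets of individuals
print(pi(s, 0))          # the transformed type of individuals, <0>
print(pi(s, (0, 0)))     # binary relations between sets, <<0>, <0>>
```

With $s = \langle 0 \rangle$, the type $\langle 0, 0 \rangle$ is sent to $\langle \langle 0 \rangle, \langle 0 \rangle \rangle$, which is the type of binary relations between sets of individuals.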
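Each element of the hierarchy $\langle D_{\mathcal{A},\delta}^t \rangle$ can be defined in the model by a term constructed from $\delta$ in a certain canonical way. To see this consider the family of terms $\{\delta_t : t \in Type\}$ recursively defined by the following conditions:
— $\delta_0 = \delta$;
— $\delta_{\langle t_1, \dots, t_n \rangle} = \langle \lambda \upsilon_0^{\pi_s(\langle t_1, \dots, t_n \rangle)} . \forall \upsilon_1^{\pi_s(t_1)} \dots \forall \upsilon_n^{\pi_s(t_n)} (\upsilon_0 \upsilon_1, \dots, \upsilon_n \to \bigwedge_{0 < i \le n} \delta_{t_i} \upsilon_i) \rangle$.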
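A straightforward calculation shows that for all types $t$ the sets $\|\delta_t\|^{\mathcal{A}}$ and $D_{\mathcal{A},\delta}^t$ are identical. To illustrate the main idea behind the construction let us consider the set $matc(\mathcal{A})$ of all maximal antichains in an interval order $\mathcal{A}$ as described in Example 1. $matc(\mathcal{A})$ is a subset of the powerset of $A$. Consider now a Henkin expansion $\mathcal{B}$ of $\mathcal{A}$ and let $\delta$ be the following term of $L$ of type $\langle \langle 0 \rangle \rangle$: $\langle \lambda x^{\langle 0 \rangle} . \forall y^0 (x^{\langle 0 \rangle} y^0 \leftrightarrow \forall z^0 (x^{\langle 0 \rangle} z^0 \to y^0 \circ z^0)) \rangle$. Obviously, the elements of $\|\delta_0\|^{\mathcal{B}}$ are exactly those subsets of $A$ belonging to $U_{\mathcal{B}}^{\langle 0 \rangle}$ which are antichains with respect to $\lhd_{\mathcal{A}}$ and are not properly contained in any such antichain belonging to $U_{\mathcal{B}}^{\langle 0 \rangle}$. If $\mathcal{B}$ is full then $\|\delta\|^{\mathcal{B}}$ contains all maximal antichains in $\mathcal{A}$. The set $\|\delta_{\langle 0,0 \rangle}\|^{\mathcal{B}}$ consists of all binary relations between elements of $\|\delta_0\|^{\mathcal{B}}$ which belong to $U_{\mathcal{B}}^{\langle \langle 0 \rangle, \langle 0 \rangle \rangle}$. Again, if $\mathcal{B}$ is full then $\|\delta_{\langle 0,0 \rangle}\|^{\mathcal{B}}$ is identical with the whole powerset of $\|\delta_0\|^{\mathcal{B}} \times \|\delta_0\|^{\mathcal{B}}$.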
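The defining condition of $\delta$ can be checked by brute force on a small finite example. The following sketch rests on assumptions that are not in the text: the interval order is given by four hypothetical closed intervals, precedence means "ends strictly before the other starts", $\circ$ is read as overlap (neither element precedes the other), and the expansion is taken to be full, so every subset of $A$ is available.

```python
from itertools import combinations

# Hypothetical interval order: each element is a closed interval (start, end).
intervals = {"a": (0, 2), "b": (1, 4), "c": (3, 5), "d": (6, 7)}


def precedes(x, y):
    """x lies wholly before y (assumed reading of the strict order)."""
    return intervals[x][1] < intervals[y][0]


def overlaps(x, y):
    """Assumed reading of the circle relation: neither precedes the other."""
    return not precedes(x, y) and not precedes(y, x)


def satisfies_delta(X):
    """The body of delta: for all y, (y in X) iff y overlaps every z in X."""
    return all((y in X) == all(overlaps(y, z) for z in X) for y in intervals)


def maximal_antichains():
    """All subsets satisfying delta; in a full expansion every subset is tried."""
    names = sorted(intervals)
    subsets = (
        frozenset(c)
        for r in range(len(names) + 1)
        for c in combinations(names, r)
    )
    return {X for X in subsets if satisfies_delta(X)}


for X in sorted(maximal_antichains(), key=sorted):
    print(sorted(X))
```

For these four intervals the sets satisfying the condition are {a, b}, {b, c} and {d}. The empty set is excluded because every element vacuously overlaps all of its members without belonging to it.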
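4.3 Construction codes
Let $L_1$ and $L_2$ be two languages. For the sake of simplicity we assume that $L_1$ and $L_2$ have no constants of type 0. By a code for $L_1$ in $L_2$ we shall understand a tuple $c = (s, \delta, \{\xi_\alpha : \alpha \in Const(L_1)\})$ where:
— $s$ is a type;
— $\delta$ is a closed term of $L_2$ of type $\langle s \rangle$;
— for each constant $\alpha$ in $L_1$ of a relational type $t$, $\xi_\alpha$ is a closed term of $L_2$ of type $\pi_s(t)$.
Each code $c$ of the kind defined above describes a certain partial operation from the class of all general models for $L_2$ into the class of all general models for $L_1$, which will be called here the construction for $c$ and designated by $\Gamma_c$. The domain $dom(\Gamma_c)$ of $\Gamma_c$ is the class of all general models $\mathcal{A}$ such that for all types $t$: $\|\delta_t\|^{\mathcal{A}} \neq \emptyset$. For each $\mathcal{A} \in dom(\Gamma_c)$ we define $\Gamma_c(\mathcal{A})$ as the pair $(\langle \Gamma_c U_{\mathcal{A}}^t \rangle, \Im_{\Gamma_c(\mathcal{A})})$ where:
— for each type $t$: $\Gamma_c U_{\mathcal{A}}^t = D_{\mathcal{A},\delta}^t = \|\delta_t\|^{\mathcal{A}}$;
— for each constant $\alpha$ of $L_1$ of type $\langle t_1, \dots, t_n \rangle$: $\Im_{\Gamma_c(\mathcal{A})}(\alpha) = \|\xi_\alpha\|^{\mathcal{A}} \cap (\|\delta_{t_1}\|^{\mathcal{A}} \times \dots \times \|\delta_{t_n}\|^{\mathcal{A}})$.
Clearly, $\Gamma_c(\mathcal{A})$ is a general model for $L_1$. Moreover, by the definition of $\Gamma_c U_{\mathcal{A}}^t$ and $\delta_t$, for each $t = \langle t_1, \dots, t_n \rangle$ we obtain: $\Gamma_c U_{\mathcal{A}}^t = U_{\mathcal{A}}^{\pi_s(t)} \cap \wp(\Gamma_c U_{\mathcal{A}}^{t_1} \times \dots \times \Gamma_c U_{\mathcal{A}}^{t_n})$. Hence, if $U_{\mathcal{A}}^{\pi_s(t)} = \wp(U_{\mathcal{A}}^{\pi_s(t_1)} \times \dots \times U_{\mathcal{A}}^{\pi_s(t_n)})$ then $\Gamma_c U_{\mathcal{A}}^t = \wp(\Gamma_c U_{\mathcal{A}}^{t_1} \times \dots \times \Gamma_c U_{\mathcal{A}}^{t_n})$. This leads to the following observation.
Observation 1 If $\mathcal{A} \in dom(\Gamma_c)$ is a full model then $\Gamma_c(\mathcal{A})$ is also a full model.
Our next step consists in defining a suitable translation from $L_1$ into $L_2$ which is associated in a natural way with a code for $L_1$ in $L_2$.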
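4.4 Translations generated by codes
Let $c = (s, \delta, \{\xi_\alpha : \alpha \in L_1\})$ be a code for $L_1$ in $L_2$. We define the translation function $\tau_c$ from $L_1$ into $L_2$ as follows:
— $\tau_c(\upsilon_n^t) = \upsilon_n^{\pi_s(t)}$;
— $\tau_c(\alpha) = \langle \lambda \upsilon_1^{\pi_s(t_1)}, \dots, \upsilon_n^{\pi_s(t_n)} . \xi_\alpha \upsilon_1^{\pi_s(t_1)}, \dots, \upsilon_n^{\pi_s(t_n)} \wedge \bigwedge_{0 < i \le n} \delta_{t_i} \upsilon_i^{\pi_s(t_i)} \rangle$ for each $\alpha \in L_1$ of type $\langle t_1, \dots, t_n \rangle$;
— $\tau_c(\alpha_1 = \alpha_2) = (\tau_c(\alpha_1) = \tau_c(\alpha_2))$;
— $\tau_c(\beta\alpha_1, \dots, \alpha_n) = \tau_c(\beta)\tau_c(\alpha_1), \dots, \tau_c(\alpha_n)$;
— $\tau_c(\neg\varphi) = \neg\tau_c(\varphi)$;
— $\tau_c(\varphi \circ \psi) = \tau_c(\varphi) \circ \tau_c(\psi)$, for $\circ \in \{\wedge, \vee, \to, \leftrightarrow\}$;
— $\tau_c(\forall \upsilon_n^t \varphi) = \forall \upsilon_n^{\pi_s(t)} (\delta_t \upsilon_n^{\pi_s(t)} \to \tau_c(\varphi))$;
— $\tau_c(\exists \upsilon_n^t \varphi) = \exists \upsilon_n^{\pi_s(t)} (\delta_t \upsilon_n^{\pi_s(t)} \wedge \tau_c(\varphi))$;
— $\tau_c(\langle \lambda x_1^{t_1}, \dots, x_n^{t_n} . \varphi \rangle) = \langle \lambda \tau_c(x_1^{t_1}), \dots, \tau_c(x_n^{t_n}) . \tau_c(\varphi) \wedge \bigwedge_{0 < i \le n} \delta_{t_i} \tau_c(x_i^{t_i}) \rangle$.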
It is easy to show that for all $\xi \in Tm(L_1) \cup Fm(L_1)$: the free variables of $\tau_c(\xi)$ are exactly all the variables $\upsilon_n^{\pi_s(t)}$ such that $\upsilon_n^t$ is a free variable of $\xi$. The next proposition shows that $\tau_c$ and $\Gamma_c$ are related to each other in the way described in condition (TRANS) above. The proof is not complicated and is therefore omitted.
Proposition 1 Let $c = (s, \delta, \{\xi_\alpha : \alpha \in L_1\})$ be a code for $L_1$ in $L_2$ and let $\mathcal{A} \in dom(\Gamma_c)$ be a general model for $L_2$. Then for all $\xi \in Tm(L_1) \cup Fm(L_1)$ with $Fr(\xi) = \{x_1^{t_1}, \dots, x_n^{t_n}\}$ and all $a_1 \in \Gamma_c U_{\mathcal{A}}^{t_1}, \dots, a_n \in \Gamma_c U_{\mathcal{A}}^{t_n}$ we have: $\|\tau_c(\xi)\|^{\mathcal{A}}_{\tau_c(x_1^{t_1}):a_1, \dots, \tau_c(x_n^{t_n}):a_n} = \|\xi\|^{\Gamma_c(\mathcal{A})}_{x_1^{t_1}:a_1, \dots, x_n^{t_n}:a_n}$.
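Let $\alpha$ be a term of $L_1$ of type $t = \langle t_1, \dots, t_n \rangle$. Then $\alpha$ is a constant or an abstract of the form $\langle \lambda x_1^{t_1} \dots x_n^{t_n} . \varphi \rangle$ with free variables $y_1^{s_1}, \dots, y_m^{s_m}$. In the first case, $\|\alpha\|^{\Gamma_c(\mathcal{A})}$ is identical (by the above Proposition) with the set of all tuples $\langle a_1, \dots, a_n \rangle \in \|\xi_\alpha\|^{\mathcal{A}}$ such that $a_i \in \|\delta_{t_i}\|^{\mathcal{A}}$ (for all $i$ with $0 < i \le n$). In the second case, for all $b_1 \in \Gamma_c U_{\mathcal{A}}^{s_1}, \dots, b_m \in \Gamma_c U_{\mathcal{A}}^{s_m}$, $\|\alpha\|^{\Gamma_c(\mathcal{A})}_{y_1^{s_1}:b_1, \dots, y_m^{s_m}:b_m}$ is identical (again by the above Proposition) with the set of all tuples $\langle a_1, \dots, a_n \rangle \in \|\tau_c(\varphi)\|^{\mathcal{A}}_{\tau_c(y_1^{s_1}):b_1, \dots, \tau_c(y_m^{s_m}):b_m}$ such that $a_i \in \|\delta_{t_i}\|^{\mathcal{A}}$ (for all $i$ with $0 < i \le n$). Hence, by definition of $\langle \Gamma_c U_{\mathcal{A}}^t \rangle$, for all terms $\alpha$ of $L_1$ of type $t$ with free variables $y_1^{s_1}, \dots, y_m^{s_m}$ and all $b_1 \in \Gamma_c U_{\mathcal{A}}^{s_1}, \dots, b_m \in \Gamma_c U_{\mathcal{A}}^{s_m}$: $\|\alpha\|^{\Gamma_c(\mathcal{A})}_{y_1^{s_1}:b_1, \dots, y_m^{s_m}:b_m} \in \Gamma_c U_{\mathcal{A}}^t$. This leads to the following observation.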
Observation 2 If $\mathcal{A} \in dom(\Gamma_c)$ is a Henkin model then $\Gamma_c(\mathcal{A})$ is also a Henkin model.
4.5 Syntactical interpretations between theories
Consider two sets of sentences: $\Sigma_1$ in $L_1$ and $\Sigma_2$ in $L_2$. Let $c$ be a code for $L_1$ in $L_2$. We say that $\tau_c$ is a Henkin interpretation of $\Sigma_1$ in $\Sigma_2$ if the image of $\Sigma_1$ under $\tau_c$ is contained in the Henkin closure of $\Sigma_2$, i.e. $\tau_c[\Sigma_1] \subseteq Cn_{Hen}(\Sigma_2)$. For each set of sentences $\Sigma$ the Henkin closure $Cn_{Hen}(\Sigma)$ of $\Sigma$ is defined as the set of all sentences which are true in all Henkin models of $\Sigma$. The standard closure of $\Sigma$ is defined in an analogous way, with full models in place of Henkin models. We call $\tau_c$ a standard interpretation of $\Sigma_1$ in $\Sigma_2$ if the image of $\Sigma_1$ under $\tau_c$ is contained in the standard closure of $\Sigma_2$. Observe that $\tau_c[Cn(\Sigma)] \subseteq Cn(\tau_c[\Sigma])$ (where $Cn$ is the Henkin or standard closure operator). Otherwise, for some $\varphi \in Cn(\Sigma)$ we would have $\tau_c(\varphi) \notin Cn(\tau_c[\Sigma])$ and hence, for some (Henkin or standard) model $\mathcal{A}$ of $\tau_c[\Sigma]$: $\mathcal{A} \models \neg\tau_c(\varphi)$. But then, by Proposition 1 and Observation 1 resp. 2, $\Gamma_c(\mathcal{A})$ would be a full resp. a Henkin model of $\Sigma$ and $\Gamma_c(\mathcal{A}) \models \neg\varphi$, i.e. $\varphi \notin Cn(\Sigma)$, which is a contradiction. Using similar reasoning we can easily prove the following observation.
Observation 3 $\tau_c$ is a Henkin (resp. a standard) interpretation of $\Sigma_1$ in $\Sigma_2$ if and only if $Mod_{Hen}(\Sigma_2)$ (resp. $Mod_{full}(\Sigma_2)$) is contained in $dom(\Gamma_c)$ and the image of $Mod_{Hen}(\Sigma_2)$ (resp. of $Mod_{full}(\Sigma_2)$) under $\Gamma_c$ is contained in $Mod_{Hen}(\Sigma_1)$ (resp. in $Mod_{full}(\Sigma_1)$).
Moreover, if $\Gamma_c[Mod_{Hen}(\Sigma_2)] \subseteq Mod_{Hen}(\Sigma_1)$ then, by Observation 1, we also have $\Gamma_c[Mod_{full}(\Sigma_2)] \subseteq Mod_{full}(\Sigma_1)$, which leads us to the next observation.
Observation 4 If $\tau_c$ is a Henkin interpretation of $\Sigma_1$ in $\Sigma_2$ then $\tau_c$ is a standard interpretation of $\Sigma_1$ in $\Sigma_2$.
It should be stressed, however, that the converse does not hold in general.
5 Concluding remarks
It can be shown with some effort that each of the paradigmatic examples mentioned and discussed in section 2 can be regarded as a construction for some code c as defined in section 4.3. Moreover, if a construction of this kind transforms a class of models K into a class of models K* then the associated translation τc reduces some higher-order theories associated in a natural way with K* to some higher-order theories of K. Although these results are not really surprising, they seem to be useful in that they provide a conceptual link between the informal and purely algebraic notion of domain extension on the one hand and some precisely defined relations of reducibility between sets of sentences on the other.
References
[1] J. van Benthem and D. Pearce. A mathematical characterization of interpretation between theories. Studia Logica, 43:295–303, 1984.
[2] M. Fitting. Types, Tableaus, and Gödel’s God. Kluwer, 2002.
[3] A. Gajda. The adequacy condition as a definition of elementary interpretation. Studia Logica, 47:57–69, 1988.
[4] G. Gerla. Pointless geometries. In F. Buekenhout (ed.), Handbook of Incidence Geometry, pp. 1015–1031. Elsevier Science, 1994.
[5] G. Gerla and A. Miranda. Mathematical features of Whitehead’s point-free geometry. In M. Weber and W. Desmond (eds), Handbook of Whiteheadian Process Thought, pp. 507–519. Ontos Verlag, 2008.
[6] D. Myers. An Interpretive Isomorphism Between Binary and Ternary Relations. In Structures in Logic and Computer Science, Lecture Notes in Computer Science, volume 1261, pp. 84–105. 1997.
[7] B. Russell. On order in time. In R. C. Marsh (ed.), Logic and Knowledge. Essays 1901-1950, pp. 347–363. Allen and Unwin, London, 1956.
[8] L. Szczerba. Interpretability of elementary theories. In R. E. Butts and J. Hintikka (eds), Logic, Foundations of Mathematics, and Computability Theory, pp. 129–145. Reidel, 1977.
[9] A. Tarski. Foundations of the geometry of solids. In J. H. Woodger (ed.), Logic, semantics, metamathematics, papers from 1923 to 1938, pp. 24–29. Clarendon Press, Oxford, 1956.
[10] A. N. Whitehead. An Inquiry Concerning the Principles of Natural Knowledge. Cambridge University Press, 1919.
[11] A. N. Whitehead. Process and Reality. Macmillan, New York, 1929.
Finite Methods in Mathematical Practice
Laura Crosilla, Peter Schuster
In the present contribution we look at the legacy of Hilbert’s programme in some recent developments in mathematics. Hilbert’s ideas have seen new life in generalised and relativised forms at the hands of proof theorists and have been a source of motivation for the so-called reverse mathematics programme initiated by H. Friedman and S. Simpson. More recently Hilbert’s programme has inspired T. Coquand and H. Lombardi to undertake a new approach to constructive algebra in which strong emphasis is laid on the use of finite methods. The main aim is to eliminate the ideal objects and in so doing obtain more elementary and informative proofs. We survey some work in commutative algebra – mainly about and around the Zariski spectrum and the Krull dimension of a commutative ring – which witnesses the feasibility of such a revised Hilbert’s programme.
1 Introduction
Hilbert’s programme is undoubtedly one of the main contributions of the last century to the foundations of mathematics. Following Gödel’s celebrated incompleteness results, Hilbert’s programme in its original formulation is nowadays widely deemed unachievable.1 This notwithstanding, Hilbert’s work on foundations is still highly stimulating from a philosophical point of view; for example in recent years it has inspired instrumentalist proposals by Detlefsen and Field [40, 60]. The present contribution focuses on the impact some of Hilbert’s ideas have had on recent developments in mathematics. First of all it is worth recalling that the work of Hilbert and his school has been vital in shaping the landscape of mathematical logic as it is today; more specifically Hilbert’s programme has been instrumental to the birth of the central field of proof theory. Furthermore, Hilbert’s ideas have seen new life in generalised and relativised forms at the hands of proof theorists and have been a source of motivation for the so-called reverse mathematics programme initiated by H. Friedman and S. Simpson (see section 2.3, [82, 55, 57, 126, 105, 6]). More recently Hilbert’s programme has inspired a new approach to constructive algebra in which strong emphasis is laid on the use of finite methods [10, 11, 12, 24, 25, 27, 37, 29, 30, 31, 32, 35, 33, 34, 39, 41, 42, 85, 86, 87, 88, 99, 100, 142]; see in particular [28, 31, 89]. This new trend in constructive algebra has especially focused on commutative algebra, and has sought an elimination of ideal objects in favour of more concrete, low-type ones. Such a shift has produced more elementary and more perspicuous proofs compared with the classical ones. Importantly, one of the motivations for the new approach is to ensure that proofs in commutative algebra have a clear computational significance. This new trend in commutative algebra constitutes the main topic of this note.
Notwithstanding various differences among the modified Hilbert’s programmes that we briefly survey below, as well as dissimilarities between each of them and the original programme, we should like to emphasise the following common characteristics. First of all, they all constitute attempts by mathematicians to address foundational issues which arise straight from their own mathematical practice. As such, the mathematical component of these contributions appears also to drive the philosophical reflections, if not to overshadow them. Secondly, all the programmes draw directly on Hilbert’s ideas, which constitute an explicit source of inspiration. Therefore, through these new programmes Hilbert’s legacy is very much alive today.
1 See Detlefsen [40] for a different perspective.
2 Hilbert’s programme now and then
2.1 The original Hilbert’s programme
To set the stage for the subsequent discussion, we wish to recall some well-known themes related to Hilbert’s programme.2 Hilbert’s programme may be seen as having two fundamental components: the axiomatic method and finitary proof theory. Here a crucial role is played by the notion of consistency of a formal system.3 Simplifying considerably, Hilbert aimed at a formalisation of logic and the whole of mathematics, since in so doing we can express the entire ‘thought-content of the science of mathematics in a uniform manner’ and thus ‘the interconnections between the individual propositions and facts become clear’ [71, p. 475].4 The axiomatic method had the consequence of relieving our dependence on direct intuition: ‘a theory by its very nature is such that we do not need to fall back upon intuition or meaning in the midst of some argument’ [71, p. 475]. In this context, therefore, the consistency of the axiom systems introduced gains fundamental importance. Hilbert thus aimed at showing, by exclusively finitistic means, that the formal systems encoding mathematics are consistent, that is to say, they are free of contradiction. The consistency of a body of mathematics did not need to be obtained directly, as it could instead be secured by first reducing the corresponding theory to another, more fundamental theory, and then proving the latter consistent. For example, in the case of geometry, Hilbert gave an arithmetical-analytical interpretation of its axioms, thereby reducing the question of its consistency to that of the consistency of the axioms for real numbers. The latter, however, needed to be obtained directly, as it was irreducible to a more fundamental problem. The consistency of the axioms for real numbers apparently was the ultimate problem for the foundations of mathematics according to Hilbert, so much so that already in 1900 it constituted his second problem in his famous address to the International Congress of Mathematicians. In fact, only with time and with the development of mathematical logic and proof theory by Hilbert and his school at the beginning of the 20th century did the problem of the consistency of the axioms for the real numbers become more precise.5,6 As an initial task, Hilbert’s collaborators attempted proofs of the consistency of arithmetic.7
2 The present note only highlights well-known aspects related to Hilbert’s programme, focusing on those which seem more relevant to the subsequent discussion. Regrettably, we here do no justice to the complexity of Hilbert’s thought (as well as of his collaborators’) nor take into due account the deep and rich literature on the subject. For an excellent recent survey on Hilbert’s programme, also in the light of the latest contributions in proof theory, we refer to Zach’s [144]; see also [91] and [123, 6], the latter especially with regard to the historical progression of the programme and the developments of metamathematics and proof theory.
3 We wish to highlight the purely syntactic character of the notion of consistency, and recall that, as clarified in [123, 6], Hilbert and Bernays had a fundamental role in bringing forth the nowadays familiar distinction between syntax and semantics.
4 Here and in the following we refer to the English translation of Hilbert’s work, as indicated in the references.
5 Niebergall and Schirn [95] suggest that Hilbert’s programme was formulated very imprecisely, so much so, that it can be seen as incapable of being refuted by Gödel’s incompleteness results.
6 It was observed by Hilbert and Bernays [72] that classical analysis can be formalised within second order arithmetic. Consequently, nowadays often proof-theorists identify analysis with second order arithmetic, and see the proof of consistency of the latter as the most challenging question stemming from Hilbert’s programme. This identification is however rather coarse, e.g. in view of the results highlighted in section 2.3.
7 It is well known that Ackermann proposed an erroneous proof of consistency of a version of analysis in his dissertation of 1924. He subsequently amended it, accepting suggestions from von Neumann. Ackermann and Bernays apparently believed the new version to constitute a proof of the consistency of number theory. Hilbert optimistically quoted these results in [71], suggesting that they could be easily extended to the whole of analysis. He wrote: ‘The method of W. Ackermann permits a further extension still. For the foundations of ordinary analysis his approach has been developed so far that only the task of carrying out a purely mathematical proof of finiteness remains. Already at this time I should like to assert what the final outcome will be: mathematics is a presuppositionless science’ [71, p. 479].
A proof of consistency of mathematics was deemed necessary by Hilbert also to fully justify the current mathematical practice, which had undergone substantial transformations during the nineteenth century. In particular, one of the aims of the programme was to justify the use of the actual infinite in mathematics by a reduction of infinitary notions to finitary ones, as exemplarily expounded in [70]. The strategy was to obtain such a reduction by the axiomatic method and the newly devised mathematical enterprise of proof theory. Crucially, the proof of consistency of analysis had to be obtained by totally uncontroversial means, that is to say, according to Hilbert by strictly finitary means. We also wish to highlight that the axiomatic method and proof theory were considered as all-encompassing tools which would systematically provide definitive answers to all foundational questions. More precisely, Hilbert hoped that his proof theory would enable him to:
[. . . ] eliminate once and for all the questions regarding the foundations of mathematics, in the form in which they are now posed, by turning every mathematical proposition into a formula that can be concretely exhibited and strictly derived, thus recasting mathematical definitions and inferences in such a way that they are unshakable and yet provide an adequate picture of the whole science. ([71, p. 464]).
Here the fundamental notion of a concrete object plays a crucial role, as clarified by the following quotation from [71]:
No more than any other science can mathematics be founded by logic alone; rather, as a condition for the use of logical inferences and the performance of logical operations, something must already be given to us in our faculty of representation, certain extralogical concrete objects that are intuitively present as immediate experience prior to all thought. If logical inference is to be reliable, it must be possible to survey these objects completely in all their parts, and the fact that they occur, that they differ from one another, and that they follow each other, or are concatenated, is immediately given intuitively, together with the objects, as something that neither can be reduced to anything else nor requires reduction. [. . . ] And in mathematics, in particular, what we consider is the concrete signs themselves, whose shape, according to the conception we have adopted is immediately clear and recognizable [71, p. 464-5].8
8 Very similar remarks appear elsewhere in Hilbert’s writings, for example in [70, p. 142].
Finitary sequences of strokes and the concrete signs forming the formulas of a formal system are among Hilbert’s concrete objects, as they satisfy the requirement of complete surveyability and intuitive presentation. Alongside concrete objects Hilbert also envisaged finitary statements, the simplest examples of which are those expressing equality and inequality of
numerals (see [71, p. 470], [70, p. 146]).9 In fact, elementary number theory held a special place in Hilbert’s proof theory10 , as its truths were considered provable through contentual (inhaltlich) intuitive considerations and were thus prior to any form of logical inference. In fact, according to Hilbert, no contradiction can arise in contentual number theory as there is no logical structure in the propositions of this theory. Consequently, contentual number theory is secure and thus can constitute the basis for the justification of other parts of mathematics. It is interesting to note Hilbert’s attitude towards elementary number theory as well as the requirement of surveyability of the signs as witnessed by the quotation above, as they evoke the influence of Kronecker on Hilbert.11 We wish to recall that Kronecker’s ideas have also been a prominent source of inspiration for constructive mathematicians [18, 47, 86].12 For example, Bishop [18, p. 2] writes: The primary concern of mathematics is number, and this means the positive integers. We feel about number the way Kant felt about space. The positive integers and their arithmetic are presupposed by the very nature of our intelligence [. . . ]. The development of the theory of the positive integers from the primitive concept of the unit, the concept of adjoining a unit, and the process of mathematical induction carries complete conviction. In the words of Kronecker, the positive integers were created by God.13 14 9
Hilbert’s understanding of the numerals is prone to different interpretations. See e.g. [144] for a concise summary.
10 In [70] Hilbert calls number theory the ‘purest and simplest offspring of the human mind’.
11 It is well known that beyond the quite different perspectives on the foundations of mathematics of the two influential mathematicians, there is a component of Kronecker’s legacy in Hilbert’s thought. This is for example elicited by Weyl in [140].
12 We could also mention at this point that further important sources of philosophical inspiration for many constructive mathematicians are Kant, Poincaré and Husserl (see for instance [18, 92, 86, 134]).
13 We wish to recall that Bishop’s text proceeds as follows: ‘Kronecker would have expressed it even better if he had said that the positive integers were created by God for the benefit of man (and other finite beings). Mathematics belongs to man, not to God. We are not interested in properties of the positive integers that have no descriptive meaning for finite man. When a man proves a positive integer to exist, he should show how to find it. If God has mathematics of his own that needs to be done, let him do it himself.’
14 In [19, p. 53] Bishop also wrote: ‘Intuitionism, as developed by Brouwer, stresses as basic our intuition of the integers and our intuition of the real numbers; all of mathematics is to be reduced to these two primitive constructs. In my book [18] I proposed, in the spirit of Kronecker rather than Brouwer, that the integers are the only irreducible mathematical construct. This is not an arbitrary restriction, but follows from the basic constructivist goal – that mathematics concern itself with the precise description of finitely performable abstract operations. It is an empirical fact that all such operations reduce to operations with the integers. There is no reason mathematics should not concern itself with finitely performable abstract operations of other kinds, in the event that such are ever discovered; our insistence on the primacy of the integers is not absolute.’
15 Here we closely follow [70].
Notwithstanding the crucial role of elementary number theory, ordinary mathematics is rife with statements which go beyond contentual considerations. In fact, ‘even elementary mathematics goes beyond the standpoint of intuitive number theory’ [70, p. 145]. Therefore, Hilbert15 distinguished different kinds of formulas: first of all there are those to which there correspond contentual communications of finitary propositions (e.g. numerical equations or inequalities). These are unproblematic and the usual rules of classical logic safely apply to them. Then there are other finitary statements which however are problematic. For instance, in general the negation of a finitary statement need not be itself a finitary statement. Take for example the universal statement: for any $\mathfrak{a}$, $\mathfrak{a} + 1 = 1 + \mathfrak{a}$ (where $\mathfrak{a}$ is a meta-variable for a numeral). From a finitary perspective, this statement is ‘incapable of negation’ [70, p. 144] as ‘one cannot, after all, try out all numbers’ [71, p. 470]. For statements of this kind, the usual laws of classical logic do not hold. However, the usefulness of the Aristotelian laws of logic (in particular the excluded middle) is considered fundamental by Hilbert, and indisputable. According to Hilbert, relinquishing classical logic would force us to abandon the ‘tremendous progress’ made in mathematics so far. Now, the difficulty which arises here can be overcome by the introduction of a new kind of propositions, called ideal propositions. These are formulas which in themselves mean nothing but which represent the ideal objects of the theory and are introduced ‘in order that the ordinary laws of logic would hold universally’ [70, p. 146]. In [70, 71] Hilbert used the example of algebra and the use of variables in algebra to clarify this point. For example, instead of the concrete formula $\mathfrak{a} + \mathfrak{b} = \mathfrak{b} + \mathfrak{a}$, where $\mathfrak{a}$ and $\mathfrak{b}$ stand for particular numerical symbols, it is the practice in algebra to prefer the formula: $a + b = b + a$. The latter, however, has no meaning of its own, as it does not express a finitary statement, it is not an immediate communication of something signified. On the contrary, it is a formal structure from which we can obtain the corresponding finitary statements by substitution of the variables by appropriate numerical symbols. This is to say, one can derive from ideal formulas other ones which do have meaning. Hilbert clarified that there is one indispensable condition attached to the use of the method of ideal elements: ‘extension by the addition of ideal elements is legitimate only if no contradiction is thereby brought about in the old, narrower domain, that is if the relations that result for the old objects whenever the ideal objects are eliminated are valid in the old domain.’ [71, p. 471] We wish once more to stress the role of the formalisation of mathematics. Hilbert thought that by formalising mathematics we could replace
abstract concepts and complex derivations by contentual investigations, that is by logic-free manipulation of formulas according to specific and circumscribed rules. In [71, p. 471] Hilbert wrote: ‘But a formalised proof, like a numeral, is a concrete and surveyable object. It can be communicated from beginning to end.’ Hilbert claimed, quite optimistically, that once the formalisation of mathematics had been carried out, then a proof of consistency would have easily followed. For example in [71] he wrote that consistency statements amount to showing that a certain formula (for example 0 ≠ 0) cannot be derived from our axioms and rules. However, that the end formula of a derivation ‘has the required structure, namely 0 ≠ 0, is also a property of the proof that can be concretely ascertained’ [71, p. 471]. Hilbert also once more stated that a finitary proof of consistency ‘can in fact be given, and this provides us with a justification for the introduction of our ideal propositions.’ [71, p. 471]. In fact, a proof of consistency would have made it possible to fully justify the use of classical reasoning applied to them. The question of what counted as finitary means of proof for Hilbert and his close collaborators has given rise to different interpretations. Hilbert and Bernays give examples of finitary operations which can be characterised in formal terms by the concept of primitive recursive functions. But as emphasised for example in [130, 143] the Hilbert school did use and accept methods which went beyond primitive recursion. Nevertheless, it is not clear whether they were aware of that and whether they would still have accepted them if they had been. Tait [129] has convincingly argued that finitism can be formally captured by a fragment of Peano Arithmetic (PA), which is known as Primitive Recursive Arithmetic (PRA). This is obtained by restricting PA’s induction principle to quantifier-free induction.16
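To make the notion of a primitive recursive function concrete, the following Python sketch builds addition, multiplication and the factorial from zero, successor and the primitive recursion schema alone. It is an illustration under assumptions of our own: the names `prim_rec`, `add`, `mul` and `fact` are hypothetical, and Python integers stand in for numerals.

```python
def succ(n):
    """Successor: the only arithmetical primitive used below."""
    return n + 1


def prim_rec(base, step):
    """Primitive recursion schema:
    f(0, *xs)     = base(*xs)
    f(n + 1, *xs) = step(n, f(n, *xs), *xs)
    The loop unfolds the recursion in n bounded steps."""
    def f(n, *xs):
        acc = base(*xs)
        for i in range(n):
            acc = step(i, acc, *xs)
        return acc
    return f


# add(n, x) = n + x, by recursion on n with the successor as step
add = prim_rec(lambda x: x, lambda i, acc, x: succ(acc))
# mul(n, x) = n * x, by iterating addition of x
mul = prim_rec(lambda x: 0, lambda i, acc, x: add(x, acc))
# fact(n) = n!, by iterating multiplication
fact = prim_rec(lambda: 1, lambda i, acc: mul(succ(i), acc))

# Each instance of a + 1 = 1 + a is a finitary statement that can be
# checked by computation; the loop checks finitely many instances only.
for a in range(10):
    assert add(a, 1) == add(1, a)
```

Every function obtained this way terminates after a number of steps bounded in advance by its first argument. The final loop verifies particular numerical instances of $a + 1 = 1 + a$, each of which is contentual, whereas the universal statement itself cannot be settled by trying out all numbers.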
2.2 Generalised and relativised Hilbert’s programmes
We recall that Hilbert’s programme is often understood as grounded on the following requirements17:
— At least all of PA is to be proved consistent; in fact, the aim is a proof of consistency of all of mathematics.
— The proof of consistency ought to be given by exclusively finitary methods.
16 A quite restrictive view of finitism has been proposed by Parsons [98], for which finitism is better captured by taking only addition and multiplication as starting functions and additionally allowing for bounded induction only. A more permissive characterisation of finitism has been proposed by Kreisel [82], according to which finitary reasoning is captured by a system having the same strength as PA.
17 See [40] for a divergent view. See also [95].
If Hilbert’s programme is so understood, Gödel’s second incompleteness theorem constitutes a fundamental obstacle for the programme, provided that all finitary arguments can be formalised within PA. At first, Gödel and Bernays contemplated the idea that it might still be possible to account for the consistency of PA by employing methods which were not formalizable in PA but which could still be considered finitary.18 Nevertheless, already from the mid 1930s it was widely accepted that all finitary reasoning could be formalised in PA. In fact, as we have seen above, it is nowadays widely acknowledged, following Tait [129], that PRA suffices. Soon after Gödel’s incompleteness results, it was realised (e.g. by Bernays) that an enlargement of the methods of proof theory would have allowed the survival of (a modified) Hilbert’s programme. In fact, it was suggested that ‘instead of a reduction to finitist methods of reasoning, it was required only that the arguments be of a constructive character, allowing us to deal with more general forms of inferences.’ [15, p. 502]. Bernays’s remarks indicate the possibility of a modification of Hilbert’s programme obtained by relaxing the methods used in the consistency proofs, which can now be taken to be broadly constructive rather than finitary. Following e.g. [144] we call this kind of modified Hilbert’s programme ‘generalised Hilbert’s programme’.19 Another kind of modification of Hilbert’s programme, proposed by Kreisel and then especially brought forward by Feferman, is usually referred to as ‘relativised Hilbert’s programme’ [55]. Contrary to generalised Hilbert’s programmes, relativised Hilbert’s programmes may be seen as pursuing local rather than global projects, as clarified below.
As to generalisations, these are exemplified by Gentzen’s consistency proof for arithmetic.20 Here a crucial role is played by the use of Transfinite Induction (TI) up to the (countable) ordinal $\varepsilon_0$.21 In modern terms, Gentzen’s result can be rephrased as a proof of the consistency of PA given on the basis of the system PRA plus TI up to $\varepsilon_0$. In other terms, according to Tait’s identification of PRA as coextensive with finitary reasoning, the only assumption in the proof of consistency of PA which goes beyond finitism is that of TI up to $\varepsilon_0$. What is significant here is that TI needs to be applied only to predicates which can be finitistically described (i.e. they are primitive recursive).22 Crucially, the reference to ordinals can be recast within arithmetic by using so-called ordinal notation systems. These give representations of countable ordinals by means of natural numbers. For example, Cantor’s normal form theorem provides a natural ordinal representation system which can be used in the proof of consistency of arithmetic.23 Considerations of this kind have given rise to a reading of the proof of consistency in terms of an extended form of finitism, for example by Gentzen [63] and Takeuti [131]. We wish to expand slightly on Takeuti’s views as presented in [131, § 11], as, although perhaps in need of further clarification, they raise some interesting points. There the author proposes an extended form of finitism (called Hilbert-Gentzen finitism), and distinguishes this view from the one which resorts to constructive methods in consistency proofs. Takeuti in fact recalls that intuitionism makes some abstract assumptions, an example of which is given by the notion of construction or proof which figures in the Brouwer-Heyting-Kolmogorov explanation of the connectives and quantifiers. Abstract assumptions, Takeuti suggests, are to be avoided as much as possible. 
However, one can justify a form of extended finitism, which only makes abstract assumptions of a restricted, more concrete kind. These assumptions are called Gedankenexperimente (thought experiments) by Takeuti. In Gentzen’s proof, these are necessary to convince oneself that a certain ordering is in fact a well-ordering. Quoting from [131, p. 96]:
A Gentzen-style consistency proof is carried out as follows:
(1) Construct a suitable standard ordering, in the strictly finitist standpoint.
(2) Convince oneself, in the Hilbert-Gentzen standpoint, that it is indeed a well-ordering.
(3) Otherwise use only strictly finitist means in the consistency proof.
18 As quoted in [56], Gödel wrote in [68, p. 138-9, 195]: ‘I wish to note expressly that [this theorem does] not contradict Hilbert’s formalistic viewpoint . . . it is conceivable that there exist finitary proofs that cannot be expressed in the formalism of P . . . ’. Here P refers to Gödel’s finite theory of types of [67].
19 This is also called ‘extended Hilbert’s programme’ in [58].
20 We here loosely refer to ‘Gentzen’s proof’, even if, as is well known, Gentzen gave a number of distinct proofs of the consistency of arithmetic, as highlighted for example in [139]. See also [23] for a formal comparison between Gentzen’s 1938 proof and the rendering of his result by means of infinitary systems (à la Schütte).
21 The ordinal $\varepsilon_0$ is defined as $\sup\{\omega, \omega^{\omega}, \omega^{\omega^{\omega}}, \dots\}$, that is, the least $\alpha$ such that $\omega^{\alpha} = \alpha$.
22 In fact, they are elementary computable [107].
23 Cantor’s normal form theorem states the following: for every ordinal $\alpha > 0$ there exist unique ordinals $\alpha_0 \geq \alpha_1 \geq \dots \geq \alpha_n$ such that $\alpha = \omega^{\alpha_0} + \dots + \omega^{\alpha_n}$. As a consequence of the theorem, ordinals $\alpha < \varepsilon_0$ have a normal form with $\alpha_i < \alpha$ for each $0 \leq i \leq n$. Also, each exponent has a Cantor normal form with yet smaller exponents. As this process must terminate, ordinals $< \varepsilon_0$ can be encoded by natural numbers. We observe, furthermore, that for systems proof-theoretically stronger than PA, one needs more powerful ordinal representation systems. In fact, fundamental and challenging work has been necessary to represent the much higher ordinals necessary for the analysis of more complex fragments of analysis. See e.g. [107] for a gentle tour of ordinal analysis.
Gedankenexperimente are crucially needed in performing step (2). According to Takeuti, the distinguishing feature of the Gedankenexperimente is that they only act on concretely given sequences. Hence, according to Takeuti, they are not absolutely abstract, as some of the intuitionist’s assumptions are. In other terms, Takeuti’s extended finitism can be characterized as Hilbert’s finitism plus Gedankenexperimente, which are believed to represent a minimal form of ideal component. Although quite suggestive, Takeuti’s notion of Gedankenexperimente is probably not sufficiently detailed; in particular, it would be important to better clarify how to distinguish between absolutely abstract assumptions and more concrete ones [54]. However, the discussion nicely highlights a typical attitude of many proof-theorists, who are always keen on clarifying the ingredients of a proof, and so pinning down precisely where specific, stronger assumptions pop up. As we shall see below, this is a feature traditionally associated with work in the area of proof theory, which is making its way into other areas of mathematics, for example through the work in reverse mathematics and, now, constructive algebra. In fact, Takeuti’s discussion anticipates what might appear as further general difficulties facing the proof-theorist’s attempts to frame his work on Hilbertian grounds.
With Gentzen’s work, that branch of proof theory known as ordinal analysis came into existence. One of the central themes of ordinal analysis is the classification of theories by means of ordinals. This is achieved by the assignment of proof-theoretic ordinals to theories, measuring their consistency strength and computational power.24 In simple terms, such an ordinal analysis attaches ordinals in a given representation system to formal theories. This work has produced impressive technical achievements [103, 106, 2, 3], but some see difficulties arising here with respect to the philosophical underpinnings of ordinal analysis [56]. This branch of proof theory, in fact, is often presented as a way of justifying larger and larger fragments of second order arithmetic (i.e. analysis) in constructive terms.
However, first of all the very notion of constructivity is often not sufficiently spelled out (similarly to Takeuti’s notion of Gedankenexperiment), and one may claim therefore that this reduction is not totally justified.25 Furthermore, over the years, the work in proof theory has become more and more complex, as witnessed for example by recent results on the consistency of strong subsystems of second order arithmetic [103, 106, 2, 3]. Consequently, it becomes more and more challenging for a non-expert to grasp what in these proofs exceeds Hilbert’s finitism, and thus to assess their contribution to a reductionist programme of this kind [56]. For example, Feferman [56] recalls that
The crucial question is: In what sense is the assumption of TI(α) justified constructively for the very large ordinals α used in these consistency proofs? Indeed, on the face of it, the explanation of which ordinals α are used appeals to the very concepts and results of infinitary set theory that one is trying to account for on constructive grounds.
24 For example, Gentzen’s consistency proof allows us to assign the ordinal $\varepsilon_0$ as proof-theoretic ordinal to PA. This is on the basis that PA proves TI for each ordinal strictly less than $\varepsilon_0$ but it does not prove TI for $\varepsilon_0$.
25 See however section 2.3.
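The earlier remark that ordinals below $\varepsilon_0$ can be encoded by natural numbers via Cantor normal form can be made concrete with a small sketch. The representation chosen here (nested Python tuples of exponents), the names `compare`, `is_normal_form`, `nat` and `code`, and the prime-power coding are all assumptions made for illustration; they are not the notation system of any of the cited works.

```python
# An ordinal below epsilon_0 in Cantor normal form
#   omega^a0 + ... + omega^an   with a0 >= a1 >= ... >= an
# is represented (by assumption, for this sketch) as the tuple (a0, ..., an)
# of its exponents, each exponent again being such a tuple.
ZERO = ()              # the empty sum
ONE = (ZERO,)          # omega^0
OMEGA = (ONE,)         # omega^1


def compare(a, b):
    """Return -1, 0 or 1 according to whether a < b, a == b or a > b."""
    for x, y in zip(a, b):
        c = compare(x, y)
        if c != 0:
            return c          # the first differing exponent decides
    # All shared exponents agree: the longer sum is the larger ordinal.
    return (len(a) > len(b)) - (len(a) < len(b))


def is_normal_form(a):
    """Exponents must be normal forms themselves and be non-increasing."""
    return all(is_normal_form(x) for x in a) and all(
        compare(a[i], a[i + 1]) >= 0 for i in range(len(a) - 1)
    )


def nat(n):
    """The finite ordinal n, written as omega^0 added n times."""
    return (ZERO,) * n


def primes():
    """Yield 2, 3, 5, ... by trial division (adequate for small examples)."""
    found = []
    n = 2
    while True:
        if all(n % p for p in found):
            found.append(n)
            yield n
        n += 1


def code(a):
    """An injective coding of notations by natural numbers (hypothetical choice)."""
    if not a:
        return 0
    result = 1
    for p, x in zip(primes(), a):
        result *= p ** (code(x) + 1)   # i-th prime raised to code of i-th exponent, plus one
    return result
```

For instance, `compare(OMEGA, (ONE, ZERO))` returns -1, reflecting ω < ω + 1, while `is_normal_form((ZERO, ONE))` is false because the exponents of 1 + ω are not in non-increasing order. The point of the sketch is only that both the notations and their ordering are finitary data; the well-foundedness of the ordering is exactly what such code cannot establish.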
Kreisel and Feferman have proposed an alternative route based on Hilbertian themes: relativised Hilbert’s programmes. In the case of relativised Hilbert’s programmes, the all-encompassing task of providing a finitist consistency proof is replaced by the aim of explaining parts of ideal mathematics in terms of ways of reasoning which are, in some sense, better justified. In practice this often amounts to reducing one formal system to another more elementary one, as clarified below. In addition, Kreisel raised objections to the excessive emphasis on consistency proofs as the main tool and focus of generalised Hilbert’s programmes. In relativised Hilbert’s programmes, consistency proofs are still a tool at hand, although often in the form of relative consistency; however, other proof-theoretical methods (e.g. functional interpretations) are envisaged. In fact, the principal aim is to use proof theory to make explicit the additional knowledge provided by those proofs. A further idea is that proof theory clarifies ‘what rests on what’ in mathematics [58]. Here the notion of proof-theoretic reduction plays a fundamental role. This can briefly be described as follows. One works on the basis of a reference system, say U; in practice U can be taken to be PRA. Given two theories S and T which contain U, let Φ be a primitive recursive class of formulas common to the languages of both theories. In addition, Φ should contain the closed equations of the language of U. We say that S is proof-theoretically Φ-reducible to T if in U one proves that, given any formula ϕ belonging to the class Φ, every proof of it in S can be transformed via a primitive recursive method into a proof of the same in T. 
More precisely, S is proof-theoretically Φ-reducible to T if there exists a primitive recursive function f such that $U \vdash \forall \varphi \in \Phi \, \forall x \, (\mathrm{Proof}_S(x, \varphi) \rightarrow \mathrm{Proof}_T(f(x), \varphi))$, where $\mathrm{Proof}_V(y, z)$ expresses that y codes a proof in the theory V of the formula coded by z.
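As a small worked instance of this definition, and only as a sketch, one may take ϕ to be the false equation 0 = 1, assuming it belongs to Φ. The abbreviation Con(V) and the corner quotes for the code of a formula are notational assumptions introduced here for illustration (the corner symbols need the amssymb package); they are not taken from the text above.

```latex
% Notation assumed for this sketch: \ulcorner 0 = 1 \urcorner is the code of
% the formula 0 = 1, and Con(V) says that no y codes a V-proof of 0 = 1.
\[
  \mathrm{Con}(V) \;:\equiv\;
  \neg \exists y \, \mathrm{Proof}_V(y, \ulcorner 0 = 1 \urcorner)
\]
% Instance of the reducibility condition for \varphi := (0 = 1):
\[
  U \vdash \forall x \, \bigl( \mathrm{Proof}_S(x, \ulcorner 0 = 1 \urcorner)
    \rightarrow \mathrm{Proof}_T(f(x), \ulcorner 0 = 1 \urcorner) \bigr)
\]
% Contraposing inside U: any S-proof of 0 = 1 would yield a T-proof of 0 = 1.
\[
  U \vdash \mathrm{Con}(T) \rightarrow \mathrm{Con}(S)
\]
```

Read this way, the reduction supplies a consistency proof of S relative to T which is itself carried out in the reference system U.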
Note that if S is proof-theoretically Φ-reducible to T, then S is conservative over T for Φ, in the sense that if ϕ ∈ Φ and S ⊢ ϕ then T ⊢ ϕ. If false statements such as 0 = 1 belong to Φ, then S is consistent relative to T. With the notion of proof-theoretic reduction at hand, one can see in exact terms how some forms of abstract mathematics can be reduced to more elementary ones, by looking at formal systems which codify those forms of mathematics and studying their relationship with more elementary ones. Following [55, p. 364], the pattern can be summarised as follows:
A part of mathematics M is represented in a formal system $T_1$ which is justified by a foundational or conceptual framework $F_1$. $T_1$ is reduced proof-theoretically to a system $T_2$ which is justified by another, more elementary such framework $F_2$.
For example, Kreisel and Feferman considered not only the reduction of the infinitary to the finitary but also of the nonconstructive to the constructive and of the impredicative to the predicative. In fact, a substantial amount of work has so far been accomplished allowing for the reduction of theories which codify a considerable amount of ideal mathematics to more elementary ones, as for example clarified in [55, 56, 58].
2.3 Contemporary perspectives of Hilbert’s programme
A related line of defense of roughly Hilbertian themes has appeared in writings of Simpson, pertaining to Friedman and Simpson’s Reverse Mathematics programme [125, 126]. Simpson claims here to have obtained a partial realisation of Hilbert’s programme, since ‘one can give a finitistic reduction for a substantial portion of infinitistic mathematics including many of the best-known nonconstructive theorems.’ [125, p. 349] The purpose of Reverse Mathematics is to discover which set existence axioms are needed in order to prove specific theorems of ordinary or core mathematics. Often the theorems turn out to be equivalent to the axioms; hence the slogan ‘reverse mathematics’. This programme uses the language of second order arithmetic and it has isolated five main subsystems of it that frequently occur as the reversals of mathematical theorems. To classify a mathematical theorem one usually shows that it is equivalent, on the basis of the next weaker system, to the principal set existence axiom of one of these five systems. In this way one shows which set existence axioms are actually needed for a specific mathematical construction. As a by-product of the analysis, one often obtains more complex but more informative proofs, compared with the standard ones. Surprisingly, results from reverse mathematics have revealed that quite a large part of infinitary mathematics can be reduced to finitistic reasoning. Of fundamental importance in this context is a subsystem of second order arithmetic introduced by H. Friedman and known as WKL0 (the initials referring to Weak König’s Lemma). Although mathematically rather strong, this system is proof-theoretically very weak; in fact it is conservative over PRA with respect to $\Pi^0_2$ sentences. This fact prompts Simpson to claim that ‘any mathematical theorem which can be proved in WKL0 is finitistically reducible in the sense of Hilbert’s programme’ [125, p. 354].
In other terms, many theorems which can be proved by use of infinitary techniques and concepts can also be proved elementarily, with the resulting elimination of the relevant infinitary components.26
26 We observe that reverse mathematics goes well beyond the aims of a partial realisation of Hilbert’s programme as it also studies mathematical equivalents of subsystems of second order arithmetic which are much stronger than WKL0.
We also wish to recall here that recently Ishihara, Veldman and others have initiated a programme which goes under the name of constructive reverse mathematics (see for example [74, 137, 75]). It takes inspiration from Friedman and Simpson’s programme, but differs from it in its scope of action. It builds on the known fact that Bishop-style constructive mathematics is compatible with intuitionistic, recursive and classical mathematics [22]. In fact, each one of these kinds of mathematics may be framed as Bishop-style mathematics plus some specific principles [22]. The idea then is that the privileged standpoint of the constructive mathematician allows him to compare notions and results across all these mathematical traditions. Thus the constructive programme aims at classifying not only theorems in classical mathematics but also theorems in constructive, recursive constructive, and intuitionistic mathematics. Constructive reverse mathematics thus appears as a very promising path towards an understanding of mathematical practice in constructive terms. It also highlights a new trend within constructive mathematics, by introducing a very fine attention to the ingredients of a proof. In this respect constructive reverse mathematics bears some similarities with the algebraists’ programme to be highlighted below.
The results from reverse mathematics might appear quite surprising to the general mathematician, who might wonder if a careful formalisation such as the one needed to obtain these results is worth the effort. In fact, as already indicated above and also highlighted for example in [5] and [6], starting from Gentzen’s pioneering work, proof theorists have developed a tradition of trying to keep to a minimum all assumptions they make, and they have also studied weak as well as relatively strong theories.27 In [5] the focus of attention is a system which is even weaker than PRA.
This is called Elementary Arithmetic (EA) and is so weak, for example, that it does not prove the totality of an iterated exponential function; in fact, its consistency can be proved in PRA, thus, according to Tait, by finitistic means. Again, as already clarified by Feferman and others, among proof-theorists ‘the general feeling is that most ‘ordinary’ mathematics can be carried out, formally, without using the full strength of ZFC’ [5, p. 258]. Avigad observes that EA, although so weak, turns out to be surprisingly robust from the point of view of finitary number theory and combinatorics, so much so that Harvey Friedman has made the following Grand Conjecture:28
Every theorem published in the Annals of Mathematics whose statement involves only finitary mathematical objects (i.e., what logicians call an arithmetical statement) can be proved in elementary arithmetic.
In [5] the author presents some concrete examples of significant theorems (Dirichlet’s theorem on primes in an arithmetic progression and the prime number theorem) which, although prima facie requiring a good amount of abstract notions, can be shown to hold in systems reducible to EA. Although Avigad recognises that a solution (in one direction or the other, or possibly a mixed one) to Friedman’s conjecture is far from being reachable today, he also suggests some further reasons for working in weak theories.29
The act of ‘mathematizing with one’s hands tied’ can often yield new proofs and a better understanding of old results. It can, moreover, lead to interesting questions and fundamentally new results, such as algorithms or explicit bounds that are absent from non-constructive proofs. [5, p. 272]
27 Note that even the strongest subsystems of second order arithmetic studied for example in [103, 106, 2, 3] are rather weak if compared with the full power of, for example, ZFC.
28 The conjecture was posted to the Foundations of Mathematics discussion group [FOM] on April 16, 1999.
One could say that a verification of Friedman’s conjecture would show that there is a precise sense in which the infinitary methods found in ordinary mathematics can ultimately be justified relative to finitary ones. In fact, elementary arithmetic appears to represent a notion of finitism which is quite uncontroversial; indeed, it can account for the methods of proof which would have been accepted by Kronecker.30, 31 We ought also to mention at this point the line of research often called proof mining, carried out by Kohlenbach and others, which aims at obtaining explicit bounds or rates of convergence from classical proofs, especially in analysis [81].
We conclude this short overview of the influence of Hilbert’s programme on contemporary proof theory with a brief hint at some ideas presented by Rathjen in [105]. Taking inspiration from both generalised and relativised Hilbert’s programmes, Rathjen [105] proposes a so-called Constructive Hilbert’s programme, aiming at a constructive justification of a substantial portion of classical mathematics. A crucial aspect of Rathjen’s approach is that he clearly specifies what ‘constructive’ means, by referring to a precise framework of theories: Martin-Löf type theory (MLTT). This is usually considered the most satisfactory foundation for constructive mathematics. Rathjen argues that, due to the above-mentioned work of Feferman as well as the results in reverse mathematics, one can single out a suitable subsystem of classical second order arithmetic as formalising ordinary classical mathematics. Let us call this subsystem T. The claim is that large parts of infinitistic mathematics can be developed in T. The author comes to the conclusion that ordinary mathematics is demonstrably consistent relative to MLTT, since a strong version of this latter theory proves the consistency of T. As the consistency proof for T is couched in terms of MLTT, one thus makes use of constructive and predicative rather than finitistic methods.
Rathjen’s article also raises some very intriguing issues relating to the foundations of constructive mathematics and in particular constructive type theory. Significant extensions of Martin-Löf type theory have been proposed in recent years by Palmgren, Rathjen and Setzer [96, 108, 104, 122]. Questions then arise on how to justify these powerful extensions and how far one can carry on this kind of extension process while remaining within a given constructive framework of ideas. Rathjen’s paper proposes an analysis of MLTT and its limits from an external, classical point of view, as ‘a demarcation of the latter is important in determining the ultimate boundaries of a constructive Hilbert program. The aim is to single out a fragment of second order arithmetic or classical set theory which encompasses all possible formalisations of Martin-Löf type theory.’ Rathjen presents an engaging conjecture regarding MLTT’s limits. We believe that more work still needs to be carried out to fully understand these extensions of type theory also from an internal, fully constructive point of view, and thus to fully assess their role for the reductionist programmes in proof theory [58].
29 Similar arguments have been put forward by Kreisel and Feferman. See for example [58].
30 Avigad’s paper [5] goes beyond the scope of this survey, by also addressing some fundamental philosophical questions which arise in connection with the proof-theoretic attitude to mathematics.
31 Here we wish to stress the deep similarities of attitude with the new programme in constructive algebra to be surveyed in the next section.
3 Finite methods for constructive algebra
In this note we are particularly interested in the influence Hilbert’s programme is having on a new course in constructive algebra, especially through the work of Coquand, Lombardi and others [10, 11, 12, 24, 25, 27, 28, 37, 29, 30, 31, 32, 35, 33, 34, 39, 41, 42, 85, 86, 87, 88, 99, 100, 142]. The first observation we wish to make is that the authors often frame their work explicitly as a contribution to Hilbert’s programme. In fact, in [28] Coquand writes that recent work in commutative algebra “can be seen as achieving a partial realisation of Hilbert’s programme” in such a field. The abstract of Coquand and Lombardi’s [31] reads accordingly: “Recent work in constructive mathematics shows that Hilbert’s programme works for a large part of abstract algebra”. Similar claims are to be found in a number of their articles and presentations. Before clarifying the aims and methods of this undertaking, we wish to recall that constructive algebra is part of the wider field of (Bishop-style) constructive mathematics. Though its origins can be traced back to Brouwer’s pioneering ideas and Heyting’s realisation of the importance of intuitionistic logic, constructive mathematics had its rebirth starting from Bishop’s publication of Foundations of constructive analysis [18] (see also [20, 21]). Bishop-style constructive mathematics has often been characterised as mathematics based on intuitionistic logic [22, 109, 21]. Many
constructive mathematicians further perceive their discipline as incorporating a form of predicativity, which is usually referred to as generalised predicativity. This is a quite generous notion of predicativity [59]. More specifically, for example, the unrestricted use of powerset is considered as unjustified according to this concept of constructivity; however, inductive definitions are allowed, also in their generalised form. Furthermore, we wish to recall that, following Richman’s recent appeal for a mathematics without choice [110, 111], lately more efforts have been dedicated to developing a choice-free constructive mathematics. As to the reasons for pursuing constructive mathematics, we could recall that already Bishop saw this as a necessity if one aims at recovering the computational content of mathematics that may be lost through the use of classical logic (see the introduction to [18] and also [21]). In fact, the computational significance of constructive mathematics is one of the grounds for the more widespread recognition this discipline is receiving nowadays. Certainly, as is well-known, constructive proofs have a direct computational content and thus are amenable to program extraction; and, crucially, programs extracted from constructive proofs are automatically correct. The work we are interested in is especially in commutative algebra. The partial revival of Hilbert’s programme here has two fundamental components: the elimination of ideal objects, and the use of finite methods. The strategy is to analyse classical theorems in commutative algebra and show that when one proves a concrete statement, one can often eliminate the use of ideal objects, and obtain a purely elementary proof. The elimination of the ideal objects is frequently accomplished by finite approximations of infinite objects, and by appealing to a more syntactical and low-type description of the classical notions. 
Indeed, the aim is often to prove low-type or concrete statements without resorting to notions which are formulated in a higher type level than that of the statement itself. One motivation is that constructive proofs carried out on low type levels are particularly suited for a formalisation within computer-assisted proof systems. It might be interesting to ponder a little on the choice of the field of commutative algebra. Commutative algebra, in fact, is a pivotal corner of contemporary mathematics, which by and large – among other things – has set the grounds for the eventual settling of the famous conjecture known as Fermat’s Last Theorem. But what does the issue of type levels mean in this context? Cannot many theorems of algebra be expressed by something as simple as a finite number of polynomial equations with rational coefficients and in only finitely many indeterminates, whereas to speak for example of analysis requires from the outset a permanent talk of genuinely infinite objects such as real numbers, continuous functions, and the like? Is not therefore algebra a priori much easier to deal with than analysis, from the angle of a (modified) Hilbert’s programme?
The answer to all these questions would have been in the affirmative before Dedekind’s times, and still for Kronecker’s work. However, abstract concepts and methods characteristic of modern algebra were put forward by Dedekind; profited from the rise of set theory and topology; and became most powerful in the hands of Hilbert and his followers (E. Noether, Krull, and others). It is this abstract character which causes problems if revisited from the perspective of a modified Hilbert’s programme; of course not all theorems are affected, but quite a few of the short and elegant proofs done by use of ideal objects. The prime example of an ideal object in commutative algebra is presumably the concept of an ideal that Dedekind has made of Kummer’s “ideal numbers” [43, 44, 45, 46]. Section 5.3 will explain more in detail the difficulties which arise with this notion, and the solution which has been proposed in order to deal with it in a more concrete way. Anticipating a little the contents of that section, we can recall for example that classically the notion of prime ideal of a ring R is usually rendered by appeal to the notion of a subset of R, and thus requires to step into a higher type level compared with the level where the elements of the ring reside. In addition, often proofs of existence of prime ideals involve applications of Zorn’s lemma, which is classically equivalent to the axiom of choice. More specifically, the concept of prime ideal is essential for the fundamental concept of the Zariski spectrum Spec(R) of a commutative ring R.32 In constructive algebra, following Joyal [80], Spec(R) is represented as a distributive lattice, L(R), the lattice generated by the standard basis of the Zariski topology (section 5.3 will clarify why this is in fact possible). The pleasing result of such a device is that all reference to subsets of R (especially to prime ideals) is thus avoided, and in fact this L (R) is situated at the very same type level as R is. 
With clear reference to Hilbert, the constructive algebraist stresses the direct effective description of the lattice L (R) (rather than defining it as the lattice of compact opens of Spec(R)) and the related fact that we can work with it by simple manipulation of purely symbolic expressions. We have seen above that Kreisel and Feferman’s relativised Hilbert’s programme aims at explaining what are perceived as more complex forms of mathematics in terms of more elementary ones in some sense. For example, they show what is required to explain the infinitary (nonconstructive, impredicative) in terms of the finitary (constructive, predicative). Here, too, the new programme can also be seen as an attempt to show how to give a constructive explanation of some of the ideal elements which are found in commutative algebra. Thus, for example, the lattice L(R) constitutes a way of understanding constructively the spectrum Spec(R) of the ring R. 32 The points this topological space is made of are indeed nothing but the prime ideals of R.
It is worth recalling that in this enterprise the constructive rendering of a classical notion or a constructive proof are not totally apart from the original classical objects. Already Bishop highlighted the heuristic role of classical proofs for constructive mathematics [18, p. x]: Every theorem proved with idealistic methods presents a challenge: to find a constructive version, and to give it a constructive proof.
In the case of the work by Coquand, Lombardi et al. under consideration here, the authors also highlight the fact that often the classical proof guides them in finding the constructive argument, as one can in many cases use the ideas contained in the classical argument in an essential way.33 As for Bishop, here, too, reformulating the statement in an appropriate way constitutes a very important part of the whole constructive enterprise. Once the goal statement has been expressed in concrete terms, then one knows that there must be an elementary proof of it. Importantly, the resulting arguments are in general better proofs inasmuch as they not only are more elementary, and thus more transparent, but also have a clear computational content.34 For Hilbert and his school, consistency was a fundamental instrument in justifying idealistic mathematics. As we have already seen, the prominent role of consistency proofs for Hilbert’s programme had been criticised by Kreisel and Feferman, and consequently had a more limited role within their relativised programmes. It is interesting to remark that here, too, there is not such a general and all-accomplishing tool as a consistency proof. On the contrary, the work carried out so far has looked at some relevant specific examples, in a case-by-case manner. Proof-theoretic techniques, such as the double negation interpretation or cut-elimination, can be used for parts of the proofs, but in general each problem requires a new approach and new ideas. Two classical results can be used as heuristic tools for obtaining a constructive proof from a classical one: the completeness theorem for first-order logic and Barr’s theorem on geometric logic. By completeness we know that if a first-order formula has a proof, possibly containing ideal components, then we can also give a purely first-order proof of it.
In fact, if a statement is furthermore formulated equationally, Birkhoff’s completeness theorem for equational theories [17] ensures that there is a purely equational proof of it. Barr’s theorem refers instead to a particular kind of formula which is known as a geometric formula (see section 4 below for details). More precisely, this theorem says that if a geometric sentence is derivable from a geometric theory in classical logic, possibly also by application of the axiom of choice, then it is also deducible from the same theory by intuitionistic logic and without appeal to the axiom of choice. We need to stress again that the full forms of both completeness and Barr’s theorem have classical proofs; consequently they cannot be used in general as automatic translations from classical to intuitionistic theories. In fact, they are rather heuristic tools: to know that an elementary proof ‘must exist’ helps to find it. At this point it is in order to underline the importance of an analysis of the logical complexity of statements. As already indicated, much effort in this modified Hilbert’s programme is dedicated to analysing the logical complexity of given classical theorems and attempting to reformulate them in such a way that their complexity is reduced as much as possible. The more general perspective behind this kind of analysis can be summarised as follows:
We see this as a modest, but significant, first step towards the general program of analyzing logically contemporary algebraic geometry, and classifying its results and proofs by their logical complexity. [28]
33 This is why with [29, 87] a series of papers was begun whose titles contain the phrase ‘Hidden constructions in algebra’ or ‘Constructions cachées en algèbre abstraite’.
34 This bears similarities with the arguments given e.g. in [5], where the author highlights the benefits of ‘mathematizing with one’s hands tied’.
4 Geometric formulas and dynamical proofs

As already observed in the mid 1970s [141, 90], first-order and geometric formulas play a prominent role in constructive algebra, and so do the so-called coherent formulas which are simultaneously geometric and first-order. The limited logical complexity of geometric formulas has made it possible to single out a peculiar notion of proof, called dynamical proof [16, 39]. A dynamical proof is represented by a tree, and is seen as “logic-free” and elementary. In the case of coherent theories the proofs are even represented by finitely branching trees. We now sketch the road from geometric formulas to dynamical proofs, closely following [26, 31].

In algebra atomic formulas are usually (in)equalities between terms. For example, in the language of rings they are of the form s = t, while in the case of lattices it is convenient to admit also atomic formulas of the form s ≤ t. A positive formula is built from atomic formulas by means of

⊤, ⊥, ∧, ∨, ∃, ⋁

where infinite disjunction ⋁ is often restricted to countable index sets. Note that neither implication → nor the universal quantifier ∀ may be used to build up positive formulas, let alone infinite conjunctions ⋀. A geometric formula is of the form ∀x(ϕ → ψ) with positive subformulas ϕ and ψ. Note that the universal quantifier ∀ is only allowed in the outermost position, e.g. expressing universal closure, while → is only permitted in the next position. For example, if ϕ is positive, then ϕ and ¬ϕ are geometric: the first is equivalent to ⊤ → ϕ, and the second is defined as ϕ → ⊥. Crucially, neither ϕ ∨ ¬ϕ nor ¬¬ϕ → ϕ is geometric, for ¬ϕ is not positive: it contains an implication.

A geometric theory is a theory whose axioms are all geometric sentences. A formula is first-order if it has no occurrence of infinite ⋁ or ⋀; and a coherent formula is a first-order geometric formula. A Horn formula is of the form

θ₁ ∧ … ∧ θₙ → θₙ₊₁

where all the θᵢ are atomic formulas; whence every Horn formula is coherent. Equational theories consist of Horn sentences. Examples of equational theories are the theory of (commutative) rings³⁵ and the theory of (distributive) lattices. The notion of an integral domain however cannot be captured by an equational theory, for one needs to have the coherent axiom

xy = 0 → x = 0 ∨ y = 0 .
Although the crucial axiom of a field fails to be a geometric formula if it is put as

¬x = 0 → ∃y.xy = 1 ,

it becomes even coherent once it is rephrased in the classically equivalent way

x = 0 ∨ ∃y.xy = 1 . (11.1)
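What the two coherent axioms say can be made tangible on small finite rings. The following is a minimal sketch, not taken from the text: it assumes the rings Z/nZ with n ≥ 2 as test cases and checks the integral-domain axiom and the field axiom (11.1) by exhaustive search. The function names are hypothetical.

```python
def is_integral_domain(n: int) -> bool:
    """Check xy = 0 -> x = 0 or y = 0 for all x, y in Z/nZ (n >= 2)."""
    return all(x == 0 or y == 0
               for x in range(n) for y in range(n)
               if (x * y) % n == 0)


def is_field(n: int) -> bool:
    """Check (11.1) in Z/nZ (n >= 2): every x is 0 or has some y with xy = 1."""
    return all(x == 0 or any((x * y) % n == 1 for y in range(n))
               for x in range(n))
```

Exhaustive search is available only because Z/nZ is finite; the sketch illustrates the content of the axioms rather than a general decision method.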
Here are two more examples of definitions that can be given by geometric formulas. First, to express that a ring element a is nilpotent one needs a positive formula that is not coherent:

⋁_{n∈N} aⁿ = 0 .

But that the ring A is reduced (i.e., 0 is the only nilpotent element) can even be put as a Horn formula:

x² = 0 → x = 0 .

Wraith [141] underlines that geometric formulas are relevant because of Barr’s theorem, which was formulated and proved within topos theory [13]:

Theorem 1 (Barr) For every geometric theory T and geometric sentence θ,

T + AC ⊢c θ ⇒ T ⊢i θ .

In other words, if T proves θ with classical logic and the axiom of choice, then T proves θ with intuitionistic logic and without choice.

35 See the beginning of section 5.1 for a quick reminder of the relevant notions.
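Returning to the two definitions above, the infinite disjunction for nilpotency can be read as an unbounded search for an exponent, whereas reducedness is a single Horn condition. A minimal sketch under the assumption that the ring is Z/mZ, where the search is guaranteed to stop because the powers of a eventually repeat; all names are hypothetical.

```python
from typing import Optional


def nilpotency_witness(a: int, m: int) -> Optional[int]:
    """Return the least n >= 1 with a**n == 0 in Z/mZ, or None if a is not nilpotent."""
    power, n, seen = a % m, 1, set()
    while power not in seen:
        if power == 0:
            return n            # one disjunct "a^n = 0" has been verified
        seen.add(power)
        power = (power * a) % m
        n += 1
    return None                 # the powers cycle without ever reaching 0


def is_reduced_by_search(m: int) -> bool:
    """0 is the only nilpotent element of Z/mZ."""
    return all(nilpotency_witness(a, m) is None for a in range(1, m))


def is_reduced_by_horn(m: int) -> bool:
    """The Horn formulation: x^2 = 0 -> x = 0 for all x in Z/mZ."""
    return all(x == 0 or (x * x) % m != 0 for x in range(m))
```

In an arbitrary ring no bound on the exponent is available, which is why the definition of nilpotency needs the infinite disjunction while reducedness does not.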
A coherent formula can further be characterised as (equivalent to) a finite conjunction of formulas of the form

ϕ₀ → ∃x⃗₁ ϕ₁ ∨ … ∨ ∃x⃗ₙ ϕₙ (11.2)

where every ϕᵢ is a finite conjunction of atomic formulas without parameters. This normal form of coherent formulas is related to the notion of a dynamical proof [16, 26, 39] as follows. Atomic sentences are called facts. Let T be a coherent theory, and θ ≡ θ₁ ∧ … ∧ θₙ → θₙ₊₁ a Horn sentence with facts θᵢ. To deduce θ from T a finitely branching, finite proof tree can be generated where

– a node is a set of facts, together with their parameters, known at a time;
– the root consists of the given facts θ₁, …, θₙ;
– a leaf either contains θₙ₊₁, or else is the empty set ∅ and thus represents ⊥.
The facts belonging to a node represent the state of knowledge at a certain time, and knowledge is accumulated with time unless one arrives at ∅. The immediate successors are generated according to the inference rules that correspond to the axioms of T written in the normal form (11.2). Here disjunction is understood as case distinction, with every case giving rise to an immediate successor, and executing the existential quantifier means adding a new parameter; whence in particular all the inference rules are intuitionistically valid. Consider, for instance, a geometric theory T that contains a relativised field axiom such as

θ₀ → x = 0 ∨ ∃y.xy = 1 . (11.3)

Now if N is a node to which the fact θ₀ and the parameter a belong, then N can be endowed with two immediate successors: one of them is N endowed with the fact a = 0; the other one is N enriched with a fresh parameter b and the fact ab = 1.

Interestingly, a computational interpretation of the notion of dynamical proof is possible. Here geometric axioms can be seen as subprograms, and branches in a proof tree as runs of a program. The former can be explained along the lines of the foregoing example: the relativised field axiom (11.3) corresponds to a routine, only applicable if the fact θ₀ holds, which for any given a tests whether a = 0 and, if the result is negative, produces b such that ab = 1.
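The generation of a proof tree described above can be sketched as a small search procedure. This is a simplified illustration under several assumptions that are not in the text: facts are opaque strings, every rule is already instantiated (ground), the fresh parameter b of the existential quantifier is pre-named, and the toy theory `RULES` is purely hypothetical. A rule with an empty list of alternatives stands for a conclusion ⊥.

```python
from typing import FrozenSet, List, Tuple

Fact = str
# A ground rule: premises, and one set of new facts per disjunct.
Rule = Tuple[FrozenSet[Fact], List[FrozenSet[Fact]]]


def proves(facts: FrozenSet[Fact], goal: Fact, rules: List[Rule]) -> bool:
    """True if some tree rooted at `facts` has only closed leaves."""
    if goal in facts:
        return True                       # leaf containing the goal
    for premises, alternatives in rules:
        if not premises <= facts:
            continue                      # rule not applicable at this node
        if any(alt <= facts for alt in alternatives):
            continue                      # rule would add no new knowledge
        # One immediate successor per disjunct; with no disjunct at all
        # (falsum) the node is closed, since all() of nothing is True.
        if all(proves(facts | alt, goal, rules) for alt in alternatives):
            return True
    return False


# Hypothetical toy theory: a ground instance of the relativised field
# axiom (11.3), plus two rules deriving "goal" in either case.
RULES: List[Rule] = [
    (frozenset({"theta0"}), [frozenset({"a=0"}), frozenset({"a*b=1"})]),
    (frozenset({"a=0"}), [frozenset({"goal"})]),
    (frozenset({"a*b=1"}), [frozenset({"goal"})]),
]
```

Every recursive call strictly enlarges the set of facts, so the search stops for a finite set of ground rules. The sketch does not handle genuine fresh parameters or infinite disjunctions.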
Apart from this interpretation, there are strong metamathematical reasons for considering dynamical proofs [31]:³⁶

The crucial point is that this notion of dynamical proof is complete for deducibility in a coherent theory [16, 26, 39], and that a dynamical proof uses only intuitionistically valid inference steps. Barr’s theorem [for coherent theories and sentences] [. . . ] is a simple consequence: if a coherent sentence is deducible from a coherent theory in classical logic, even with the axiom of choice, it is a semantical consequence of the theory, and so, by completeness, it can be derived by a dynamic proof, which is intuitionistically valid.

One can speak of a dynamical proof also in the more general case of a geometric theory T that indeed contains countably infinite disjunctions ⋁, in which case the proof tree is infinitely branching, and completeness as above is still valid. A connection to Hilbert’s programme is, with facts as concrete statements, as follows [31]:

A dynamic proof can be seen as a “logic-free” and elementary way to derive new concrete statements from [. . . ] a given collection of concrete statements. By completeness, we know that if we can derive a concrete statement from this theory with the use of ideal methods (typically using Zorn’s lemma), there is also an elementary derivation. [. . . ]

From a constructive perspective, however, this requires a warning to be sounded [31]:

Both the completeness theorem and Barr’s theorem are purely heuristic results from a constructive point of view however. Indeed, they are both proved using non constructive means, and do not give algorithms to transform a non effective proof to an effective one.

But this notwithstanding they seem to work quite well as heuristics [31]:

In practice however, in all examples analysed so far, it has been possible to extract effective arguments from the ideas present in the non effective proofs. We think that our work, complementary to the work done in constructive mathematics [47, 93] or in Computable Algebra [127], provides a partial realisation of Hilbert’s program in abstract commutative algebra.
36 The references that occur in this and in the following quotes are adapted to the present bibliography.

5 Realising Hilbert’s programme in commutative algebra

The partial realisation of Hilbert’s programme in commutative algebra initiated by Coquand and Lombardi is clearly of interest to anyone – mathematician, logician, philosopher, or otherwise – who is concerned with the foundations of mathematics. For this kind of addressee we will therefore try to shed some light on the recent developments in that area, actually on a particular angle which we believe to be sufficiently representative. As this must anyhow fall short of a comprehensive treatment that takes into account all technical details, we will rather focus on a certain line of concepts and results, and leave it at an overview that will hardly satisfy any working expert but – as we hope – will be of some use for the general reader. More detailed introductions and surveys are available [28, 31, 89], to which our few pages can hardly be more than an appetiser.

As compared with constructive algebra along the lines of Bishop [93] and Kronecker [47], extensive use is made of methods from point-free and formal topology [79, 86, 117] and in particular of distributive lattices [7, 78, 138]. Among the many general monographs on commutative algebra as done within customary mathematics we refer to [4] for a concise classic, and to [48] for a more recent and comprehensive source.
5.1 Rings and ideals

The basic concept of commutative algebra is the one of a commutative ring, which we briefly recall first. A ring (with unit³⁷) is a set R with two distinguished elements, zero 0 and unit 1, and two binary operations, addition + and multiplication ×, such that the following conditions are satisfied: (R, +, 0) is an Abelian group;³⁸ (R, ×, 1) is a monoid; and × distributes over + from both sides. A ring is commutative if so is multiplication, which of course is commonly denoted by juxtaposition. There is a ring with only one element, which a fortiori is commutative: that is, the trivial ring, usually written as 0, in which 1 = 0. A commutative ring R with 1 ≠ 0 is an integral domain if for every pair a, b ∈ R with ab = 0 either a = 0 or b = 0; and a field if for every a ∈ R either a = 0 or else ab = 1 for some b ∈ R.³⁹ Clearly, every field is an integral domain. The prime examples of an integral domain and a field are the sets Z and Q of the integers and the rational numbers, respectively – of course endowed with the usual operations of addition and multiplication, and with the distinguished elements 0 and 1 that are commonly denoted as such. Also, the ring of polynomials K[T] in the indeterminate T with coefficients in a field K is an integral domain.

But what are, in commutative algebra, the ideal objects in Hilbert’s sense? The prime example is presumably the concept of an ideal that Dedekind has made of Kummer’s “ideal numbers” [46, 45, 44, 43]. In modern terminology, an ideal 𝔞 of a commutative ring R is an additive subgroup that is closed under multiplication by arbitrary ring elements, which is to say that 𝔞 is a subset of R which satisfies the following three conditions:

0 ∈ 𝔞 ,   a ∈ 𝔞 ∧ b ∈ 𝔞 ⇒ a + b ∈ 𝔞 ,   a ∈ 𝔞 ∨ b ∈ 𝔞 ⇒ ab ∈ 𝔞 . (11.4)

An ideal 𝔞 equals the ring R precisely when 1 ∈ 𝔞; the zero ideal is usually denoted by its only element 0. These two ideals are trivial examples of a principal ideal

Ra = {ra : r ∈ R}

generated by a single element a of R, which is often denoted by (a) or the like.

37 It is nowadays common to suppose that a ring has a unit.
38 A group is a monoid in which every element has an inverse.
39 A field in this sense is a discrete field in the terminology of [93]: that is, one can decide for each element a whether a = 0.
An ideal 𝔭 of a commutative ring R is a prime ideal if 𝔭 ≠ R, and if the converse of the last condition in (11.4) holds too: that is, if 1 ∉ 𝔭, and if for every pair a, b ∈ R with ab ∈ 𝔭 either a ∈ 𝔭 or b ∈ 𝔭. A maximal ideal of R is an ideal 𝔪 with 1 ∉ 𝔪 such that for every a ∈ R either a ∈ 𝔪 or else 1 − ab ∈ 𝔪 for some b ∈ R. Every maximal ideal is a prime ideal. The zero ideal 0 is a maximal ideal (respectively, a prime ideal) precisely when R is a field (respectively, an integral domain). In particular, if R is a field, then 0 is the only ideal ≠ R. The prime examples of a prime ideal are the principal ideals (p) of Z that are generated by the prime numbers p, which are all maximal ideals of Z. In K[T] for any field K a principal ideal generated by an irreducible polynomial is a maximal ideal. Apart from 0, there are no other prime ideals of Z and K[T] – at least in classical algebra.⁴⁰

It is noteworthy that the definitions above of a prime ideal and of a maximal ideal, just as the one of an ideal, only require quantification over the elements of the given set R. To quantify over the ideals of R is however necessary for the alternative definition of a maximal ideal from which the name of this concept comes: as an ideal 𝔪 which is maximal among the ideals ≠ R, which is to say that, first, 𝔪 ≠ R and, secondly, that for every ideal 𝔞 ≠ R if 𝔪 ⊆ 𝔞, then 𝔪 = 𝔞.
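That these definitions quantify only over ring elements means they can be checked directly whenever the ring is finite. The following minimal sketch assumes the ring Z/nZ, with ideals given as Python sets of residues; the function names are hypothetical and not taken from the text.

```python
from typing import Set


def principal(a: int, n: int) -> Set[int]:
    """The principal ideal (a) = {ra : r in R} of R = Z/nZ."""
    return {(r * a) % n for r in range(n)}


def is_ideal(I: Set[int], n: int) -> bool:
    """The three conditions (11.4), with the last one read via commutativity."""
    return (0 in I
            and all((a + b) % n in I for a in I for b in I)
            and all((a * r) % n in I for a in I for r in range(n)))


def is_prime_ideal(I: Set[int], n: int) -> bool:
    """1 not in I, and ab in I implies a in I or b in I."""
    return (is_ideal(I, n) and 1 not in I
            and all(a in I or b in I
                    for a in range(n) for b in range(n)
                    if (a * b) % n in I))


def is_maximal_ideal(I: Set[int], n: int) -> bool:
    """1 not in I, and every a is in I or has some b with 1 - ab in I."""
    return (is_ideal(I, n) and 1 not in I
            and all(a in I or any((1 - a * b) % n in I for b in range(n))
                    for a in range(n)))
```

For instance, in Z/12Z the ideal (2) passes both tests, while (4) is an ideal that fails to be prime because 2 · 2 lies in it and 2 does not.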
40 As we shall explain below, in constructive algebra this remains valid provided that one focusses on the finitely generated ideals.

5.2 Noetherian rings

As we have just seen, ideals in general are ideal objects inasmuch as they are of the next higher type. In fact, ideals of a ring R are substructures of the algebraic structure of R: that is, subsets of the set R distinguished by a certain set of properties. To quantify over the ideals of R therefore means to quantify over a subclass of the power class of the set R.
Admittedly, this nature of a fairly generic subset is hardly visible for the most common kinds of ideals, which indeed are concrete numbers rather than ideal entities. For instance, a principal ideal Ra = {ra : r ∈ R} can be represented by any generator a. More generally, a finitely generated ideal is of the form

Ra₁ + · · · + Raₖ = {r₁a₁ + … + rₖaₖ : r₁, …, rₖ ∈ R}

with a₁, …, aₖ ∈ R, and thus can be identified with any finite list (a₁, …, aₖ) of generators. In particular, a finitely generated ideal still lives on the same level as the ring elements. It is a lucky coincidence that this identification corresponds to the common habit of writing 𝔞 = (a₁, …, aₖ) or similarly in place of 𝔞 = Ra₁ + · · · + Raₖ.

But aren’t many ideals finitely generated anyway, or even principal? The answer is that it depends, as follows, on whether one chooses to work in classical or in constructive algebra. The principal ideal rings are the rings of which every ideal is principal. Clearly every field is a principal ideal ring. In classical algebra quite a few other rings are principal ideal rings, such as Z and K[T] where K is a field. The Noetherian rings – according to the one of the several classically equivalent definitions that goes back to Hilbert – are the rings of which every ideal is finitely generated. Clearly every principal ideal ring is Noetherian. By the Hilbert basis theorem the ring K[T₁, …, Tₙ], which consists of the polynomials in n ≥ 1 indeterminates T₁, …, Tₙ with coefficients in a field K, is Noetherian. This polynomial ring K[T₁, …, Tₙ] is not a principal ideal ring unless n = 1.

In constructive algebra one can still prove [93], by means of the Euclidean algorithm to compute the greatest common divisor of two integers, that an ideal of Z is principal provided that it is finitely generated. A similar argument applies to K[T] where K is a field as defined above, i.e. a nontrivial ring with a zero test, by which assumption one can determine the degree of any given polynomial, which is to be used for K[T] in the same way as the modulus of an integer is for Z. One cannot expect however to have a constructive proof that any of these examples is a Noetherian ring according to the Hilbertian definition, or even a principal ideal ring: i.e., that every ideal is finitely generated, let alone principal. As it was put in [93, p. 193],

[. . . ] if by Noetherian we mean that every ideal is finitely generated, [. . . ] [then] only trivial rings are Noetherian in this [Hilbert’s] sense from a constructive point of view.
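The Euclidean argument for Z mentioned above can be sketched as follows. This is a minimal illustration, with hypothetical function names, of how a finite list of generators of an ideal of Z is reduced to a single generator, together with coefficients witnessing that this generator lies in the ideal.

```python
from typing import List, Sequence, Tuple


def extended_gcd(a: int, b: int) -> Tuple[int, int, int]:
    """Return (g, x, y) with g = gcd(a, b) >= 0 and g == x*a + y*b."""
    x0, y0, x1, y1 = 1, 0, 0, 1
    while b != 0:
        q, r = divmod(a, b)
        a, b = b, r
        x0, x1 = x1, x0 - q * x1
        y0, y1 = y1, y0 - q * y1
    if a < 0:                       # normalise the sign of the generator
        a, x0, y0 = -a, -x0, -y0
    return a, x0, y0


def principal_generator(gens: Sequence[int]) -> Tuple[int, List[int]]:
    """Return (g, coeffs) where (g) equals the ideal generated by `gens` in Z
    and g == sum(c * a for c, a in zip(coeffs, gens))."""
    g, coeffs = 0, []
    for a in gens:
        g, x, y = extended_gcd(g, a)
        coeffs = [x * c for c in coeffs] + [y]
    return g, coeffs
```

The coefficients show that g belongs to the ideal, and divisibility of every generator by g shows the converse inclusion. Note that the procedure takes a finite list of generators as input, in line with the restriction to finitely generated ideals.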
In fact, a certain fragment of the law of excluded middle is necessary already for the two-element field F₂ = {0, 1} to be Noetherian in Hilbert’s sense. The two most popular classical equivalents of Hilbert’s concept are equally problematic: Noether’s own definition that every ascending chain of ideals is eventually constant; and Artin’s variant that every nonempty set of ideals has a maximal element. All this will be made more precise in the appendix.

In constructive algebra it therefore is useless to speak of a Noetherian ring in any of those particular senses, let alone of a principal ideal ring. Yet why did Seidenberg ask “What is Noetherian?” [121]? The reason is that there indeed are classically equivalent but still constructively meaningful variants of Noether’s ascending chain condition. The most popular one, put forward independently by Richman [112] and Seidenberg [121], says that in every ascending chain of finitely generated ideals at least two successive terms are equal. In constructive algebra many rings are Noetherian in this sense [93], including the polynomial ring K[T₁, …, Tₙ] over a field; see also [100, 102, 120, 132]. Also, to prove with constructive means the termination of Buchberger’s algorithm for the computation of Gröbner bases, which is one of the cornerstones of computer algebra [1], it suffices that any K[T₁, …, Tₙ] is Noetherian à la Richman and Seidenberg [99].

Another constructively relevant variant of the concept of a Noetherian ring consists of the rings on which one can do Noetherian induction [77, 99, 101]. This classical contrapositive of Artin’s aforementioned maximal principle says that if a property P(𝔞) of finitely generated ideals 𝔞 of a given ring R is hereditary, then P(𝔞) holds for every finitely generated ideal 𝔞 of R, where “hereditary” means that, for every 𝔞, if P(𝔞′) for every 𝔞′ ⊋ 𝔞, then P(𝔞).⁴¹ This however requires a generalised inductive definition, which goes beyond first-order logic [31].
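For the ring Z, the Richman-Seidenberg condition can be illustrated by a search that is bound to succeed. The sketch below rests on assumptions of ours: the ascending chain is given as 𝔞ₖ = (a₁, …, aₖ) for a stream of integers aₖ supplied by a hypothetical function `element`, and each 𝔞ₖ is represented by its non-negative generator, the greatest common divisor of a₁, …, aₖ.

```python
from itertools import count
from math import gcd
from typing import Callable


def first_stationary_step(element: Callable[[int], int]) -> int:
    """Return the least k >= 1 such that the ideals (a_1, ..., a_k) and
    (a_1, ..., a_{k+1}) of Z coincide, where a_k = element(k)."""
    g = 0
    for k in count(1):
        g = gcd(g, element(k))              # (a_1, ..., a_k) = (g)
        if gcd(g, element(k + 1)) == g:     # next ideal has the same generator
            return k
```

The loop terminates because, once the generator is nonzero, each strict enlargement of the ideal replaces it by a proper divisor. The function returns two equal successive terms without claiming that the chain is constant from then on, which is exactly the difference from Noether's original formulation.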
In constructive algebra, by the way, Noetherian induction does not suffice for proving that every nontrivial ring has a maximal ideal, and thus a prime ideal. In classical algebra this is a particular case of Artin’s maximal principle mentioned before, which enables one to avoid the use of Zorn’s lemma in the context of Noetherian rings. Without the hypothesis that the ring is Noetherian, existential statements of this kind have been classified as forms of the axiom of choice; we refer the reader to [8] for more details including the original references.

Yet being classically equivalent to the classical understanding of a Noetherian ring, both the Richman-Seidenberg concept and the one based on Noetherian induction only make use of quantification over finitely generated ideals. In spite of – or rather just because of – this seeming restriction, each of these concepts still includes most of the concrete examples that the working algebraist encounters in everyday practice, and thus makes it possible to give constructive proofs of theorems that are of practical interest. Likewise, many theorems of constructive algebra are formulated for finitely generated ideals only; examples will be given later on. Perhaps the most fundamental reason for focussing on finitely generated ideals is that they can be managed more effectively, in both mathematical and programming practice: within a definition or a theorem, and as input or output of an algorithm or a computer program. More pithily put, in general one can only put one’s hand on an ideal if it is finitely generated.

Working with the Richman-Seidenberg concept of a Noetherian ring sometimes requires us to add the precondition that the ring under consideration be coherent. This is a closure condition that in these cases allows us to carry out a proof of a theorem about finitely generated ideals without having to leave the realm of finitely generated ideals. A commutative ring is coherent [93] if, first, the intersection of any two finitely generated ideals is finitely generated and, secondly, if the annihilator

(0 : 𝔞) = {x ∈ R : ∀a ∈ 𝔞 (ax = 0)}

of any finitely generated ideal 𝔞 is finitely generated. In particular, the finitely generated ideals are closed under forming finite intersections, and thus under forming transporter ideals:

(𝔟 : 𝔞) = {x ∈ R : ∀a ∈ 𝔞 (ax ∈ 𝔟)} .

Clearly, coherence would be automatic with the Hilbertian concept of a Noetherian ring; whence to assume coherence would be redundant for Noetherian rings in classical algebra.⁴²

41 In other words, the reverse inclusion order ⊇ is progressive.
42 Coherence however is of interest for non-Noetherian rings in classical commutative algebra [65, 66].

5.3 Spectra and schemes as distributive lattices

It often happens in (classical) commutative algebra [48] that the Zariski spectrum Spec(R) of a commutative ring R occurs in the course of a proof. The points of this space are the prime ideals of R: that is, as we have already recalled, the subsets 𝔭 of R which satisfy

1 ∉ 𝔭 ,   ab ∈ 𝔭 ⇔ a ∈ 𝔭 ∨ b ∈ 𝔭 ,
0 ∈ 𝔭 ,   a ∈ 𝔭 ∧ b ∈ 𝔭 ⇒ a + b ∈ 𝔭 , (11.5)

where a, b ∈ R. Moreover, Spec(R) is endowed with the Zariski topology whose open sets of points – henceforth “opens” – are usually given as the complements of the closed subsets. The latter are the subsets

Z(𝔞) = {𝔭 ∈ Spec(R) : 𝔞 ⊆ 𝔭}

where 𝔞 is an arbitrary ideal of R.
The closure of a point 𝔭 of Spec(R) consists of the points 𝔮 which, as prime ideals and thus as subsets of R, lie above 𝔭: that is, 𝔮 ⊇ 𝔭. Hence the closed points of Spec(R) are precisely the points of the subspace Max(R) that consists of the maximal ideals of R. In particular, if R is an integral domain, then not every point of Spec(R) is closed unless R is a field; the zero ideal 0 even is the only generic point of Spec(R): that is, the closure of this point is the whole space. To give an example, the initial part of Spec(Z) looks like

(2)    (3)    (5)    (7)    (11)   . . .
   ↖    ↑    ↗     . . .
         0
where → denotes inclusion ⊆. In other words, 0 → (p) also means that the point (p) is in the closure of the generic point 0.

It is not just for convenience of notation that the closed subsets of Spec(R) are given conceptual priority over the open subsets: they form, via their traces on the subspace Max(R), a vast generalisation of the time-honoured concept of an algebraic variety, i.e. the set of common roots of finitely many polynomials. The latter occurs in the special case in which R = K[T₁, …, Tₙ] where K is an algebraically closed field, such as the field of algebraic numbers or the field of complex numbers. In this case, and by the Hilbert Nullstellensatz, the maximal ideals of K[T₁, …, Tₙ] are precisely the finitely generated ideals (T₁ − t₁, …, Tₙ − tₙ) with t₁, …, tₙ ∈ K; whence

Max(K[T₁, …, Tₙ]) ≅ Kⁿ .

In other words, the maximal ideals of K[T₁, …, Tₙ] correspond to the points of n-dimensional affine space over K. Any such interpretation of course requires us to identify, following Descartes’ algebraisation of geometry, a point of this space with the n-tuple t = (t₁, …, tₙ) of its coordinates t₁, …, tₙ ∈ K. More specifically,

Z(𝔞) ∩ Max(K[T₁, …, Tₙ]) ≅ {t ∈ Kⁿ : h₁(t) = … = hᵣ(t) = 0}

for every ideal 𝔞 = (h₁, …, hᵣ) of K[T₁, …, Tₙ]. In other words, the closed subsets of Max(K[T₁, …, Tₙ]) correspond to the algebraic varieties in Kⁿ.

This strong link to classical algebraic geometry notwithstanding, admitting non-closed points is essential for the Zariski spectrum’s pivotal role in bridging the gap between algebraic geometry and algebraic number theory: in both fields the results of commutative algebra are used extensively. In fact one had to go for (and beyond) a generalisation of the Zariski spectrum, namely for Grothendieck’s concept of a scheme, to which we will return later. For the time being we leave it at an example whose relevance should be clear in view of what we have recalled above. This example is

Aⁿ_K = Spec(K[T₁, …, Tₙ]) ,

the affine scheme of dimension n over the field K.

We need to stress again that the points of the topological space Spec(R), the prime ideals of the commutative ring R, are subsets of the given set R. As a consequence, the opens of Spec(R) are – as particular sets of prime ideals – even sets of subsets of R. If one seeks to keep the type level as low as possible, then one cannot possibly stick to this definition of the topology on Spec(R). Instead, one needs to deal with this space of points in terms of point-free topology [79, 78], in which the received conceptual priority of points over opens is reversed: that is, the opens rather than the points are seen as the primitive objects. As we shall see below, one can even go one step further and work with appropriate indices, or names, for the elements of a basis of opens; in fact the elements of R are perfectly suited as indices of this kind [62, 118, 119, 124]. By so doing one also follows the lines of what nowadays is known as formal topology [86, 117]. A point is then identified with a neighbourhood filter: that is, a collection of opens each of which contains the point. Of course any such neighbourhood filter needs to be rich enough so that it locates the point in a sufficiently precise way; whence the ontological status of the point can be neglected:

The points (prime ideals, . . . ) constitute powerful intuitive help, but they are used here only as suggestive means with no actual existence. [28]
In view of the complexity of points and opens for the Zariski spectrum noticed above, at first glance the move from points to opens seems to make the situation even worse. On the contrary, it gives the chance to do much better: the elements of R can be used to index enough opens of Spec(R) (see below for more details); whence they can be used as finite approximations of points in much the same way in which, say, finite decimal fractions approximate real numbers.
5.3.1 Spectral spaces

The chief reason why, for topological methods in commutative algebra, one can get by on point-free topology is that “most of the topological spaces introduced in commutative algebra are spectral spaces” [28], with the space of minimal prime ideals (see below) being an exception that confirms the rule. In particular, Spec(R) is a spectral space, which indeed is the paradigmatic example of a spectral space: “these [the spectral spaces] are precisely the spaces that arise as Zariski spectra of commutative rings” [138, p. 121].⁴³

43 For proofs of this correspondence in point-set and point-free topology see [73] and [9], respectively.
But what does it mean that Spec(R) is a spectral space, and what is this good for? A spectral space is a topological space X which is sober (that is, every nonempty irreducible closed subset of X has a unique generic point – or, equivalently, X is homeomorphic to the space of completely prime filters of the frame of opens of X), and whose compact opens form a basis K(X) of the topology that is closed under finite intersection. In particular, a spectral space X is a compact Kolmogoroff (or T₀) space, and K(X) is a distributive lattice. Note that every Hausdorff space is sober, but not every spectral space is Hausdorff. A spectral mapping is a continuous mapping F : X₁ → X₂ with the property

V ∈ K(X₂) ⇒ F⁻¹(V) ∈ K(X₁) ,

by which it induces a homomorphism F⁻¹ : K(X₂) → K(X₁) between the distributive lattices.

A key feature of spectral spaces is that the category of spectral spaces with spectral mappings is equivalent to the category of distributive lattices with lattice homomorphisms [9]. Clearly one assigns to any spectral space X the distributive lattice K(X), and to any distributive lattice L the spectral space Pt(L) whose points are the prime filters of L and which inherits its topology from L (for technical details see the appendix). Categorical equivalence then means

X ≅ Pt(K(X))   and   K(Pt(L)) ≅ L

in which the isomorphisms are compatible with continuous mappings in X and lattice homomorphisms in L, respectively. In other words, one can go back and forth from spectral spaces to distributive lattices in a natural way, and without losing any information. In particular, the compact opens of Spec(R) form a distributive lattice whose prime filters correspond to the prime filters of the ring R and thus to the prime ideals of R. With classical logic the prime filters are indeed the complements of the prime ideals (see again Appendix 6.3), but with intuitionistic logic the former are to be given priority over the latter:

[. . . ] because it is at these objects [the prime filters of R rather than its prime ideals] that we wish to localize, and since ¬¬ ≠ id, we must deal with them directly [133, p. 194].
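The passage from a distributive lattice L to its points Pt(L) can be tried out on a small example. The sketch below is ours, not the text's: it assumes the standard definition of a prime filter (an up-closed subset containing the top but not the bottom element, closed under binary meets, and containing x or y whenever it contains x ∨ y) and takes for L the divisors of a positive integer n, ordered by divisibility, with gcd as meet and lcm as join. All names are hypothetical.

```python
from itertools import combinations
from math import gcd
from typing import FrozenSet, List


def divisors(n: int) -> List[int]:
    return [d for d in range(1, n + 1) if n % d == 0]


def lcm(a: int, b: int) -> int:
    return a * b // gcd(a, b)


def is_prime_filter(F: FrozenSet[int], L: List[int]) -> bool:
    top, bottom = max(L), min(L)          # n and 1 in the divisibility order
    return (top in F and bottom not in F
            and all(y in F for x in F for y in L if y % x == 0)   # up-closed
            and all(gcd(x, y) in F for x in F for y in F)         # meets
            and all(x in F or y in F
                    for x in L for y in L if lcm(x, y) in F))     # primeness


def prime_filters(n: int) -> List[FrozenSet[int]]:
    """All prime filters of the divisor lattice of n, by brute force."""
    L = divisors(n)
    subsets = (frozenset(c) for r in range(len(L) + 1)
               for c in combinations(L, r))
    return [F for F in subsets if is_prime_filter(F, L)]
```

For n = 12 the search yields three prime filters, generated by 2, 3 and 4 respectively. Brute force over all subsets is feasible only for very small lattices and serves here purely as an illustration.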
5.3.2 Joyal’s lattice

It seems as if one would not have gained much by replacing the Zariski topology on Spec(R) by the distributive lattice of its compact opens. Although this lattice only contains the compact opens rather than all the opens, it still consists of sets of points of Spec(R): that is, sets of subsets of R. But, once arrived at this point, one can do better. The key observation is that a fairly natural basis of the Zariski topology consists of compact opens – thus also is a basis of the associated lattice – and is even indexed by the ring elements. The elements of this truly versatile basis are

D(a) = {𝔭 ∈ Spec(R) : a ∉ 𝔭} ,

with a ∈ R, which are the complements of the closed subsets Z(𝔞) for the principal ideals 𝔞 = (a). Incidentally, the D(a) with a ∈ R form an affine cover of Spec(R): that is, each D(a) is itself isomorphic to a Zariski spectrum. In fact,

D(a) ≅ Spec(R[1/a])

where R[1/a] denotes the ring of (formal) fractions x/aⁿ with x ∈ R and n ≥ 0. In view of the characteristic properties (11.5) of a prime ideal, the D(a) with a ∈ R satisfy

D(1) = 1 ,   D(ab) = D(a) ∩ D(b) ,
D(0) = 0 ,   D(a + b) ⊆ D(a) ∪ D(b) . (11.6)

The multiplicative ones among these properties (the top line of the above) ensure that this basis is closed under finite intersections; whence it indeed is a basis rather than just a subbasis. In particular, a generic compact open of Spec(R) is a finite union of basis elements: that is,

D(a₁, …, aₙ) = D(a₁) ∪ … ∪ D(aₙ) (11.7)

with a₁, …, aₙ ∈ R.
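The properties (11.6) can be checked by hand in a small case. The following minimal sketch assumes R = Z/nZ with n ≥ 2, whose prime ideals are the ideals (p) for the prime divisors p of n, so that each point of Spec(R) is represented by such a prime p; the function names are hypothetical.

```python
from typing import FrozenSet, List


def prime_divisors(n: int) -> List[int]:
    """The prime divisors of n >= 2, i.e. the points of Spec(Z/nZ)."""
    primes, p = [], 2
    while p * p <= n:
        if n % p == 0:
            primes.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        primes.append(n)
    return primes


def D(a: int, n: int) -> FrozenSet[int]:
    """Basic open D(a): the points (p) of Spec(Z/nZ) with a not in (p)."""
    return frozenset(p for p in prime_divisors(n) if a % p != 0)


def satisfies_11_6(n: int) -> bool:
    """Check the four properties (11.6) for all a, b in Z/nZ."""
    whole, empty = frozenset(prime_divisors(n)), frozenset()
    return (D(1, n) == whole and D(0, n) == empty
            and all(D((a * b) % n, n) == D(a, n) & D(b, n)
                    and D((a + b) % n, n) <= D(a, n) | D(b, n)
                    for a in range(n) for b in range(n)))
```

In this sketch the whole space and the empty set play the roles of 1 and 0 in (11.6). The check still refers to the points of the spectrum, which is what the elementary presentation of the lattice by generators and relations is meant to avoid.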
In view of all this, K(Spec(R)) is isomorphic to a lattice that can be defined in an elementary way: to the distributive lattice L(R) that Joyal [80] has given in terms of generators and relations (see also, for instance, [9, 78]). The generators of L(R) are the symbolic expressions D(a) indexed by the a ∈ R, and they are subject to the so-called support relations:

D(1) = 1 ,   D(ab) = D(a) ∧ D(b) ,
D(0) = 0 ,   D(a + b) ≤ D(a) ∨ D(b) . (11.8)
Of course the support relations do not come out of the blue: they reflect the relations (11.6) between the compact opens, which in turn go back to the characteristic properties (11.5) of a prime ideal. The multiplicative ones among the support relations – i.e. the first line of (11.8) – allow for a simpler notation. First, the generators of L(R) are closed under finite meets; whence they form a basis of L(R) rather than just a subbasis; in other words, a generic element of L(R) is – in compliance with (11.7) – a finite join of generators

D(a₁, …, aₙ) = D(a₁) ∨ … ∨ D(aₙ) (11.9)

rather than a finite join of finite meets of generators. Secondly, while the order relation in an arbitrary distributive lattice has the normal form x₁ ∧ … ∧ xₘ ≤ y₁ ∨ … ∨ yₙ, in L(R) any instance of ≤ can even be simplified to one of the form D(a) ≤ D(b₁, …, bₙ), simply because in L(R) we have

D(a₁ · · · aₘ) = D(a₁) ∧ … ∧ D(aₘ) .

Virtues of Joyal’s lattice. The distributive lattice L(R) is isomorphic to K(Spec(R)), with D(a₁, …, aₙ) ∈ L(R) corresponding to D(a₁, …, aₙ) ∈ K(Spec(R)) for any a₁, …, aₙ ∈ R. In particular, the prime filters of L(R) equally correspond to the prime ideals of R. But what is the advantage of L(R) as compared with K(Spec(R))? Each of these two distributive lattices has a basis whose elements are indexed by the ring elements, and thus live at the very type level of the elements of the given set R; in both cases moreover an arbitrary element is indexed by a finite list of ring elements. So what? Isn’t K(Spec(R)) just good enough, and why should one bother at all about moving to L(R)?

The point is that this lattice [Joyal’s lattice L(R)] is constructively definable from [the ring R] [. . . ], so that we can bypass higher-order and irrelevant notions like the set of prime ideals [of the ring R] [. . . ] [141].
Of course one could view the symbols $D(a_1, \ldots, a_n)$ for the basic opens as names for the elements of K(Spec(R)) and exclusively work with those names, neglecting the higher-order objects they denote. By so doing, however, one would end up working in nothing but L(R).

An impossible generalisation. Why doesn’t Joyal’s method work more generally? More specifically, why cannot all open sets of a topological space, or even all subsets of a given set, be captured in a similar way?⁴⁴ So, if Spec(R) allows for a symbolic representation in terms of the D(a) with a ∈ R, why shouldn’t this be possible for the whole power class P(R) in place of Spec(R)? Some evidence can be given with Cantor’s time-honoured argument that P(R) has strictly bigger cardinality than R:⁴⁵ if R is countable, then there are countably many D(a) with a ∈ R, which hardly suffice for any representation of P(R) unless R is finite. Needless to say, there are plenty of countably infinite rings; to mention only three of them: the ring Z of integers, the field Q of rational numbers, and the field of algebraic numbers.

⁴⁴ With the discrete topology on an arbitrary set, for which every subset is open, the latter case is a special case of the former.
⁴⁵ Those questions together with this answer have been communicated to us by Karl-Georg Niebergall.

Local-global principles. With a point-free representation of the Zariski spectrum such as Joyal’s, also the local-global principles vital for commutative algebra need to, and can, be put into concrete terms. A typical local-global principle says that a commutative ring R has a certain property E, for short E(R), already if $E(R_{\mathfrak p})$ for every prime ideal $\mathfrak p$ of R. Here $R_{\mathfrak p}$ is the local ring of R at the point $\mathfrak p$, the ring of (formal) fractions x/s with x ∈ R and $s \in R \setminus \mathfrak p$. Note that one rather localises at the prime filter $R \setminus \mathfrak p$ than at the corresponding prime ideal $\mathfrak p$. To avoid any such universal quantification over all the points of Spec(R), one has to reformulate it as a quantification over finitely many (compact, or basic) opens that form a covering. This can then be carried over to Joyal’s distributive lattice L(R) or expressed, via radical ideals, in terms of the given ring R. The practicability of this undertaking was demonstrated in various areas [84, 87], such as the one of the Serre conjecture including the theorems of Horrocks and Quillen-Suslin.

Minimal primes. It is noteworthy that no infinite (meets and) joins are required to describe L(R), so the lattice need not be complete – that is, have arbitrary joins – and coherent logic is enough. However, for minimal prime ideals infinite disjunctions are indispensable, as they are licit in geometric logic. To have the minimal prime ideals of R as the prime filters of a distributive lattice [31], one indeed needs to allow for joins indexed by certain subsets of R: the topological subspace Min(R) of Spec(R) that consists of the minimal prime ideals corresponds to the quotient of L(R) which has the additional relations

$$D(a) \vee \bigvee_{b \in R \,:\, ab = 0} D(b) = 1.$$
To handle this one has to move from distributive lattices to frames (i.e., complete lattices in which the meet distributes over arbitrary joins), and from coherent to geometric logic. In particular, Min (R) is not a spectral subspace of Spec (R).
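5.3.3 Radical ideals

The open subsets $D(\mathfrak a)$ of Spec(R) correspond to the radical ideals

$$\sqrt{\mathfrak a} = \{x \in R : \exists n \ge 1\,(x^n \in \mathfrak a)\} \tag{11.10}$$

of the ideals $\mathfrak a$ of R, with

$$D(\mathfrak a) \subseteq D(\mathfrak b) \iff \sqrt{\mathfrak a} \subseteq \sqrt{\mathfrak b}.$$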
This has prompted an alternative description of Joyal’s distributive lattice L (R) in terms of radical ideals. In fact, the formal Hilbert Nullstellensatz
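[29, 78] says that

$$D(a) \le D(b_1, \ldots, b_n) \iff a \in \sqrt{(b_1, \ldots, b_n)}; \tag{11.11}$$

whence L(R) is isomorphic to the distributive lattice of the $\sqrt{(b_1, \ldots, b_n)}$ ordered by ⊆ with

$$0 = \sqrt{0}, \qquad 1 = R, \qquad \sqrt{\mathfrak a} \vee \sqrt{\mathfrak b} = \sqrt{\mathfrak a + \mathfrak b}, \qquad \sqrt{\mathfrak a} \wedge \sqrt{\mathfrak b} = \sqrt{\mathfrak a \cdot \mathfrak b}.$$

The nontrivial direction ⇒ of (11.11) is a point-free substitute for

$$a \in \bigcap \{\mathfrak p \in \mathrm{Spec}(R) : \mathfrak p \supseteq \mathfrak b\} \;\Rightarrow\; a \in \sqrt{\mathfrak b}, \tag{11.12}$$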
which by the way is needed to establish the aforementioned correspondence between open subsets of Spec(R) and radical ideals of R. Inasmuch as the prime ideals of R are the models of an appropriate theory (see the appendix), the implication (11.12) counts as a completeness theorem, of which the formal Hilbert Nullstellensatz (11.11) can be seen as a purely syntactical counterpart. This implication (11.12) moreover is tantamount to the special case in which $a = 1$ and $\mathfrak b = 0$, which is usually put in the form of its contrapositive

$$1 \ne 0 \;\Rightarrow\; \mathrm{Spec}(R) \ne \emptyset \tag{11.13}$$
that equivalently reads as “every nontrivial commutative ring has a prime ideal”. Under the name of Krull’s lemma, for arbitrary rings R this statement is known to be an equivalent of the Boolean ultrafilter theorem, a weak version of the axiom of choice [8], whereas for Noetherian rings R the implication (11.13) is provable with classical logic and without any form of the axiom of choice (Section 5.2). Note that no prime ideal at all occurs in the definition (11.10) of the radical ideal, which therefore is used in place of the alternative characterisation

$$\bigcap \{\mathfrak p \in \mathrm{Spec}(R) : \mathfrak p \supseteq \mathfrak b\} = \sqrt{\mathfrak b},$$

of which (11.12) is the nontrivial part, and which clearly violates the finite methods paradigm.
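To see how the right-hand side of (11.11) can be decided without any reference to prime ideals, here is a minimal sketch for the special case R = Z, chosen as an assumption for illustration. It relies on the fact that every finitely generated ideal of Z is generated by the greatest common divisor of its generators. The function names `in_radical` and `witness_exponent` are hypothetical.

```python
from functools import reduce
from math import gcd


def in_radical(a: int, gens: list[int]) -> bool:
    """Decide whether a lies in the radical of (b1, ..., bn) in Z."""
    g = reduce(gcd, gens, 0)      # (b1, ..., bn) = (g) in Z
    if g == 0:
        return a == 0             # Z is reduced, so the radical of (0) is (0)
    d = gcd(a, g)
    while d > 1:
        # strip from g every factor it shares with a
        while g % d == 0:
            g //= d
        d = gcd(a, g)
    # what remains of g is coprime to a; membership holds iff nothing remains
    return g == 1


def witness_exponent(a: int, gens: list[int]):
    """Smallest l >= 1 with a**l in (b1, ..., bn), or None if there is none."""
    g = reduce(gcd, gens, 0)
    if g == 0:
        return 1 if a == 0 else None
    # prime exponents in g are bounded by its bit length
    for ell in range(1, g.bit_length() + 1):
        if pow(a, ell, g) == 0:
            return ell
    return None
```

For instance, 6 lies in the radical of (4) because 6² = 36 is a multiple of 4, whereas 3 does not. The exponent returned by `witness_exponent` is the kind of finite certificate that the point-free formulation asks for.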
5.3.4 Generalisations

Projective spectra. The point-free treatment of topological spaces for commutative algebra and algebraic geometry that began with Joyal’s lattice L(R) has already been extended beyond the case of Spec(R), first to the case of projective spectra. Before we outline this development, we briefly recall the concept of a projective spectrum.
A graded ring is a commutative ring R such that a nonnegative integer, the degree deg(a), is assigned to every a ∈ R. These data are expected to satisfy, first, deg(ab) = deg(a) + deg(b) for every pair a, b ∈ R; and, secondly,

$$R = \bigoplus_{d \ge 0} R_d$$

where $R_d$ consists of the elements of R that are homogeneous of degree d:

$$R_d = \{a \in R : \deg(a) = d\}.$$

In particular, the elements of R are finite sums of the form $\sum a_d$ with $a_d \in R_d$. The prime example of a graded ring is the polynomial ring $K[x_0, \ldots, x_n]$, graded by the total degree, in n + 1 indeterminates $x_0, \ldots, x_n$ with coefficients in a given field K. Following this example, it is fairly standard to assume for any graded ring R that $R = R_0[x_0, \ldots, x_n]$ for certain $x_0, \ldots, x_n \in R_1$. A homogeneous prime ideal of a graded ring R is a prime ideal $\mathfrak p$ such that

$$\sum a_d \in \mathfrak p \iff \forall d\,(a_d \in \mathfrak p) \tag{11.14}$$

for all $\sum a_d \in R$, and with

$$\neg\,(x_0 \in \mathfrak p \wedge \ldots \wedge x_n \in \mathfrak p). \tag{11.15}$$
As for Spec(R), all the

$$D(a) = \{\mathfrak p \in \mathrm{Proj}(R) : a \notin \mathfrak p\}$$

with $a \in R_d$ and d > 0 form a basis of compact opens for the Zariski topology on

$$\mathrm{Proj}(R) = \{\mathfrak p \subseteq R : \mathfrak p \text{ homogeneous prime ideal of } R\},$$

which is called the projective spectrum of R. In the aforementioned prime example, $\mathbb P^n_K = \mathrm{Proj}(K[x_0, \ldots, x_n])$ is the projective scheme of dimension n over K. As for $\mathbb A^n_K$, if the field K is algebraically closed, then the closed points of $\mathbb P^n_K$ correspond to the points of n-dimensional projective space over K.
For any graded ring R as above, the projective spectrum Proj(R) too is a spectral space and thus can be represented in a point-free way by a distributive lattice [36]. This distributive lattice P(R) is generated by the symbols D(a) with $a \in R_d$ for some d > 0, which generators are subject to the relations

$$\begin{aligned} D(a+b) &\le D(a) \vee D(b), & D(0) &= 0, \\ D(ab) &= D(a) \wedge D(b), & D(x_0) \vee \ldots \vee D(x_n) &= 1 \end{aligned} \tag{11.16}$$

for all a, b ∈ R. To ensure that D(a + b) is defined in R, one has to require for the first relation that $a, b \in R_d$ for a common d > 0. Apart from this side condition, the first three relations of (11.16) are exactly as they are for Joyal’s lattice L(R); and P(R) is a point-free representation of Proj(R) in exactly the same way in which L(R) is one of Spec(R).
When one puts the relations for P(R) in parallel to those imposed on the generators of L(R), one might miss D(1) = 1, which is part of the latter, from the former ones. In addition, one might expect to encounter $D(\sum a_d) = \bigvee D(a_d)$, as it would perfectly mirror the additional condition (11.14) required from the homogeneous prime ideals. Both relations however are unnecessary – and in fact impossible to denote – in view of the restriction “$a \in R_d$ for some d > 0” imposed on the indices of the generators D(a), according to which one can neither write down $D(\sum a_d)$ nor D(1): in general $\sum a_d$ is inhomogeneous, and 1 has degree 0. In fact D(1) = 1 has been replaced by the last relation $D(x_0) \vee \ldots \vee D(x_n) = 1$ of (11.16), which captures condition (11.15) and corresponds to the circumstance, well-known from classical geometry, that the n-dimensional projective space has n + 1 affine pieces. This is also reflected by the fact that P(R) is isomorphic [36] to the result of glueing together all the distributive lattices

$$P(R)/(D(x_i) = 1) \;\cong\; L\bigl(R[\tfrac{1}{x_i}]_0\bigr)$$

with 0 ≤ i ≤ n. Note that a quotient lattice of this type corresponds to an open subspace: to set u = 1 in an arbitrary lattice L means to focus on the lattice elements that are below u.

Grothendieck schemes. The case of the projective spectrum has hinted at a far more general point-free concept whose customary counterpart includes a large class of schemes à la Grothendieck. As we have recalled earlier on, an affine scheme is one that is isomorphic to the Zariski spectrum of a commutative ring; more precisely the latter is to be viewed together with its natural structure sheaf of local rings. Now a (Grothendieck) scheme is a topological space endowed with a sheaf of local rings that is locally
affine: that is, has an open cover consisting of affine schemes on each of which the given sheaf restricts to the structure sheaf [69, 50]. This clearly is reminiscent of the definition of a differentiable manifold as a topological space covered by homeomorphic copies of open pieces of Euclidean space; moreover the sheaf of local rings on a scheme is nothing but a generalised notion of a continuous function. It has turned out that the already fairly general notion of a Noetherian scheme can be based on distributive lattices rather than topological spaces; even more generally this can be done [38] for every scheme whose underlying topological space is spectral. Technically speaking, these particular schemes form a full subcategory that is equivalent to the category of distributive lattices enriched with appropriate sheaves of local rings.

The resulting concept of a spectral scheme [38] instantiates the framework of formal geometries [119] given before in the context of formal topology [117]; it moreover generalises not only the distributive lattice – sketched above – which represents the projective spectrum of a graded ring [36], but also the one which stands for the space of valuations of an abstract nonsingular curve [28]. With the latter lattice Dedekind and Weber’s time-honoured approach to Riemann surfaces via valuations, which already is of a point-free nature, has eventually – actually in the second-next century – been reduced to an appropriate low type level. It is noteworthy that most of the material required for linking [38] the concept of a spectral scheme with the one of a Grothendieck scheme has already been present since the early days of modern algebraic geometry [69, 1, 6.1] – of course formulated in terms of points rather than opens.
Among other things, it then was known that the topological space of a Noetherian scheme X is spectral; and if Y is an arbitrary Grothendieck scheme, then the continuous part of every morphism of Grothendieck schemes X → Y is a spectral mapping. In this context it is therefore legitimate to repeat Rota’s question [116, p. 220]: What would have happened if topologies without points had been discovered before topologies with points, or if Grothendieck had known the theory of distributive lattices?
5.4 Krull dimension of rings and lattices

The Krull dimension of a commutative ring R is usually defined as the greatest possible length n of a chain of prime ideals $\mathfrak p_0 \subsetneq \mathfrak p_1 \subsetneq \ldots \subsetneq \mathfrak p_n$. For example, dim(Z) = 1, because $0 \subsetneq (p)$ for any prime number p is the longest possible chain of prime ideals in Z. Also, every field has Krull dimension 0, because 0 is the only prime ideal of a field.
The Krull dimension of a commutative ring R is the special case X = Spec(R) of the Krull dimension of a topological space X: that is, the greatest possible length n of a chain of nonempty irreducible closed subsets $X_0 \supsetneq X_1 \supsetneq \ldots \supsetneq X_n$. In fact there is an order-reversing correspondence between the prime ideals of R and the nonempty irreducible closed subsets of Spec(R): the latter are exactly the $Z(\mathfrak p)$ with $\mathfrak p$ a prime ideal of R.

Even for n fixed, to define dim(R) = n in the customary way recalled above requires to quantify over the prime ideals of R. However, the topological nature of Krull dimension together with the representation of the Zariski spectrum by a distributive lattice have allowed for characterisations of Krull dimension that are completely elementary [29, 35, 33, 85, 89]. Still, each of these characterisations is, in classical algebra, equivalent to the customary definition recalled above; whence one can use the former to redefine the latter without any talk of prime ideals. The elementary characterisations can be traced back to the concept of Krull dimension for distributive lattices developed by Español [51, 52, 53] following Joyal. One of these characterisations of dim(R) involves directly the generators of the lattice L(R):

Lemma 1.1 For each n ≥ 0 we have dim(R) ≤ n precisely when for all $a_0, \ldots, a_n \in R$ there are $b_0, \ldots, b_n \in R$ such that

$$\begin{aligned}
D(a_0) \wedge D(b_0) &= 0 \\
D(a_1) \wedge D(b_1) &\le D(a_0) \vee D(b_0) \\
&\;\;\vdots \\
D(a_n) \wedge D(b_n) &\le D(a_{n-1}) \vee D(b_{n-1}) \\
1 &= D(a_n) \vee D(b_n)
\end{aligned} \tag{11.17}$$
For instance, dim(R) ≤ 0 if and only if every generator of L(R) is complemented by another generator, in which case L(R) is a Boolean algebra. In view of the topological character of Krull dimension it is natural to involve the lattice L(R). However, this can be dispensed with, e.g. along the lines of the formal Hilbert Nullstellensatz (11.11), and another characterisation of Krull dimension can be put entirely in terms of the ring R. This in fact was given by Lombardi [85] before L(R) re-entered the stage.

Lemma 1.2 For each n ≥ 0 we have dim(R) ≤ n precisely when for all $a_0, \ldots, a_n \in R$ there are $b_0, \ldots, b_n \in R$ and $k_0, \ldots, k_n \in \mathbb N$ such that

$$a_0^{k_0}\bigl(a_1^{k_1}\bigl(\cdots\bigl(a_n^{k_n}(1 + a_n b_n) + \cdots\bigr) + a_1 b_1\bigr) + a_0 b_0\bigr) = 0. \tag{11.18}$$
For example, dim(R) ≤ 0 if and only if

$$\forall a \in R\ \exists b \in R\ \exists k \ge 1 .\ a^k (1 + ab) = 0. \tag{11.19}$$

Note that if R is a reduced ring, then in (11.19) one can arrive at k = 1; whence in this case dim(R) ≤ 0 if and only if the commutative ring R is von Neumann regular: that is, for every a ∈ R there is b ∈ R such that aba = a. Also, dim(R) ≤ 1 if and only if

$$\forall a_0 \in R\ \forall a_1 \in R\ \exists b_0 \in R\ \exists b_1 \in R\ \exists k_0 \ge 1\ \exists k_1 \ge 1 .\ a_0^{k_0}\bigl(a_1^{k_1}(1 + a_1 b_1) + a_0 b_0\bigr) = 0.$$
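Condition (11.19) lends itself to a direct search in a finite ring. The sketch below assumes R = Z/nZ, which classically has Krull dimension 0 for n ≥ 2, and looks for the witnesses b and k by exhaustive enumeration; it also tests von Neumann regularity in the sense recalled above. The function names are illustrative only, and the bound on k is an assumption that is justified for Z/nZ because no prime exponent in n exceeds its bit length.

```python
def zero_dim_witness(a: int, n: int):
    """Search (b, k) with a**k * (1 + a*b) == 0 in Z/nZ and k >= 1."""
    for k in range(1, n.bit_length() + 1):   # smallest exponent first
        for b in range(n):
            if pow(a, k, n) * (1 + a * b) % n == 0:
                return b, k
    return None


def has_dimension_at_most_zero(n: int) -> bool:
    """Check (11.19) for every element of Z/nZ."""
    return all(zero_dim_witness(a, n) is not None for a in range(n))


def is_von_neumann_regular(n: int) -> bool:
    """Check that every a in Z/nZ admits b with a*b*a == a."""
    return all(any(a * b * a % n == a for b in range(n)) for a in range(n))
```

In Z/12Z, which is not reduced, the element 2 needs the exponent k = 2, whereas in the reduced ring Z/30Z the exponent k = 1 suffices throughout, in line with the remark on von Neumann regular rings.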
Further there is an inductive characterisation of Krull dimension, which of course is particularly suited for proofs by induction. (It has recently also led to one of Krull codimension [115].) The key idea is to reduce the Krull dimension of a ring R to the Krull dimensions of the quotient rings $R/N_a$ of R modulo the boundary ideals

$$N_a = Ra + (\sqrt{0} : a) \tag{11.20}$$
with a ∈ R. More precisely we have the following [33]:

Lemma 1.3 For each n ≥ 0 we have dim(R) ≤ n if and only if $\dim(R/N_a) \le n - 1$ for every a ∈ R, where the trivial ring 0 has Krull dimension −1.

It is a freak of history that with Lemma 1.3 the Krull dimension can eventually, several generations of algebraists and topologists later, be seen as a particular instance of the concept of inductive dimension in topology that is due to Brouwer, Menger, and Urysohn. According to this time-honoured concept – and roughly speaking – the empty set is of dimension ≤ −1, while for n ≥ 0 a topological space has dimension ≤ n if every point has a basis of neighbourhoods with boundaries of dimension ≤ n − 1. In fact, Spec(R) = ∅ precisely when R = 0, and $\mathrm{Spec}(R/N_a)$ viewed as a closed subset of Spec(R) is nothing but the boundary ∂D(a) of D(a); we refer to the appendix for a proof of this easy but little-mentioned fact.

The inductive concept of dimension is about as intuitive as is Krull’s. For example, in a two-dimensional object such as a plane the longest chains of nonempty irreducible closed subsets are the ones of the sort plane–line–point; and neighbourhood bases can be formed by open discs, whose boundaries are circles. Yet inductive dimension seems to be inherently more elementary than the latter – at least for the Zariski spectrum, for which Lemma 1.3 provides an elementary characterisation of Krull dimension by way of nothing but inductive dimension.

The reader may suspect, however, that the inductive characterisation of Krull dimension by Lemma 1.3 does not entirely comply with the finite-methods paradigm. In particular it is unclear a priori whether the transporter ideals $(\sqrt{0} : a)$, and thus the boundary ideals $N_a$, are (at least radicals of) finitely generated ideals. This is the case, however, under the hypotheses that the ring R is Noetherian and coherent, which is frequently assumed in concrete applications of Krull dimension (see below). In fact, if R is of this kind, then L(R) is a Heyting algebra with implication

$$D(\mathfrak a) \to D(\mathfrak b) = D(\sqrt{\mathfrak b} : \mathfrak a)$$

for any (ideals generated by) finite lists $\mathfrak a$ and $\mathfrak b$ of elements of R, for which in particular the transporter ideal $(\sqrt{\mathfrak b} : \mathfrak a)$ is finitely generated [29]. Apart from this, Lemma 1.3 may be seen as a façon de parler: as a convenient method, tailor-made for proofs by induction, to encode dim(R) ≤ n still without prime ideals but in the conceptual way that makes up the strength of modern algebra. Decoding is always possible; it is fairly immediate, and only requires some notational effort (Lemma 1.1 and Lemma 1.2).
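The boundary ideals (11.20) can be computed explicitly in a finite ring. The following sketch again assumes R = Z/nZ and represents ideals naively as sets of residues; the names `nilradical`, `transporter` and `boundary_ideal` are hypothetical. In dimension 0, Lemma 1.3 predicts that every quotient $R/N_a$ is the trivial ring, that is, that every boundary ideal is the whole ring.

```python
def nilradical(n: int) -> set[int]:
    """Nilpotent elements of Z/nZ, found by testing powers up to the bit length of n."""
    bound = n.bit_length()
    return {x for x in range(n)
            if any(pow(x, k, n) == 0 for k in range(1, bound + 1))}


def transporter(a: int, n: int) -> set[int]:
    """The ideal (sqrt(0) : a) of all x with x*a nilpotent in Z/nZ."""
    nil = nilradical(n)
    return {x for x in range(n) if x * a % n in nil}


def boundary_ideal(a: int, n: int) -> set[int]:
    """N_a = Ra + (sqrt(0) : a), listed as a set of residues modulo n."""
    trans = transporter(a, n)
    return {(r * a + x) % n for r in range(n) for x in trans}
```

For n = 12 the nilradical is {0, 6}, the transporter of 2 is {0, 3, 6, 9}, and every boundary ideal turns out to be all of Z/12Z, as expected for a ring of Krull dimension 0.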
5.5 Concrete applications of Krull dimension

With the one-line characterisation (Lemma 1.2) of Krull dimension, Coquand and Lombardi [29] first proved $\dim(K[T_1, \ldots, T_n]) = n$ for every field K and n ≥ 0; in particular, $\mathbb A^n_K$, the affine space of dimension n over K, has indeed Krull dimension n. The idea of this proof is not hard to explain if one admits some suggestive terminology: a complement of a sequence $a_0, \ldots, a_n$ in a ring R is a sequence $b_0, \ldots, b_n$ in R that satisfies (11.18). Now, on the one hand, every sequence $a_0, \ldots, a_n$ in $K[T_1, \ldots, T_n]$ is algebraically dependent over K (simply because it has one element too many), which can be shown to imply that $a_0, \ldots, a_n$ has a complement; whence $\dim(K[T_1, \ldots, T_n]) \le n$. On the other hand, the sequence $T_1, \ldots, T_n$ is algebraically independent over K and thus cannot have a complement, which rules out the possibility that $\dim(K[T_1, \ldots, T_n]) \le n - 1$.
5.5.1 Kronecker’s theorem under logical scrutiny

The inductive characterisation (Lemma 1.3) of Krull dimension has enabled Coquand [25] to give an elementary constructive proof of a time-honoured theorem of Kronecker’s:

Theorem 2 (Kronecker) For each n ≥ 0, if dim(R) ≤ n, then for any given $h_1, \ldots, h_m \in R$ with m arbitrary there are $g_1, \ldots, g_{n+1} \in R$ such that

$$D(h_1, \ldots, h_m) = D(g_1, \ldots, g_{n+1}). \tag{11.21}$$
Coquand’s proof [25] is of course done by induction: if m ≤ n + 1, then there is nothing to prove; if however m > n + 1, then any given set of m generators is transformed linearly into a set of m − 1 generators, to which the induction hypothesis applies. The required linear manipulations of the generators are possible thanks to the additional information which is at one’s disposal by dim (R) ≤ n.
Just as its principal hypothesis, the conclusion of Theorem 2 is of a point-free topological nature: (11.21) as it stands is an equation in L(R). As is not untypical for the gain of generality by point-free topological methods, Coquand was able to get by without the received hypothesis that the ring R be Noetherian. This assumption still stands, albeit implicitly, behind the “first modern proof” [49] of Kronecker’s theorem by van der Waerden [135], which, anyway, is rather a geometric imagination of an idea of proof.

In his proof of Theorem 2 Coquand [25] works both in the ring R and in the lattice L(R). For the purpose of a logical analysis of Theorem 2 we thus establish a two-sorted language S of first-order predicate logic with equality that is an appropriate common extension of the languages R and L of rings with unit and bounded lattices, respectively:

1. S has a sort ρ for ring elements and a sort λ for lattice elements;
2. S has the customary function and relation symbols of R and L: that is, 0, 1, +, −, ×, = for rings;⁴⁶ and 0, 1, ∨, ∧, ≤, = for lattices;⁴⁷
3. S has a unary function symbol D of type ρ → λ.

We will only have to write down variables a, b, c, d, . . . of type ρ. With the convention (11.9) we assume par abus de langage that D may take on finite lists of type ρ as arguments. Now, if we fix numerals⁴⁸ n and m, then the conclusion of Theorem 2

$$\theta_{m,n} \equiv \forall h_1 \ldots \forall h_m\, \exists g_1 \ldots \exists g_{n+1} .\ D(h_1, \ldots, h_m) = D(g_1, \ldots, g_{n+1})$$

is a coherent sentence of S. Likewise, and again for a fixed numeral n, the hypotheses of Kronecker’s theorem form a coherent theory Γ ∪ ∆ ∪ Λ ∪ {κn} whose components are as follows:

– Γ denotes the (equational) theory of commutative rings;
– ∆ stands for the (equational) theory of distributive lattices;
– Λ consists of the additional axioms for Joyal’s lattice: that is, Λ = {ι, ζ, ∀a∀b.π(a, b) ∧ σ(a, b)} where ι, ζ, π, and σ stand for the (atomic) support relations (11.8):

$$\begin{aligned} \iota &\equiv D(1) = 1, & \pi(a, b) &\equiv D(ab) = D(a) \wedge D(b), \\ \zeta &\equiv D(0) = 0, & \sigma(a, b) &\equiv D(a+b) \le D(a) \vee D(b); \end{aligned}$$

– κn is taken from Lemma 1.1 to express that the ring has Krull dimension ≤ n: that is,

$$\kappa_n \equiv \forall a_0 \ldots \forall a_n\, \exists b_0 \ldots \exists b_n .\ \chi(a_0, b_0) \wedge \bigwedge_{1 \le i \le n} \mu(a_i, b_i, a_{i-1}, b_{i-1}) \wedge \nu(a_n, b_n)$$

where χ, µ and ν are shorthand for the (atomic) dimension relations (11.17):

$$\begin{aligned} \chi(a, b) &\equiv D(a) \wedge D(b) = 0 \\ \mu(a, b, c, d) &\equiv D(a) \wedge D(b) \le D(c) \vee D(d) \\ \nu(c, d) &\equiv 1 = D(c) \vee D(d) \end{aligned}$$

In all, Theorem 2 for fixed numerals n and m reads as

$$\Gamma, \Delta, \Lambda, \kappa_n \vdash \theta_{m,n}.$$

Note that in this theorem there is no occurrence of implication → or (logical) disjunction ∨; in particular there is no branching in the corresponding dynamical proof tree. Also, the existential quantifier ∃ can only be found in κn and θm,n.

⁴⁶ The function symbol − for subtraction is required for the theory of rings to be equational.
⁴⁷ We refrain from using different symbols – such as ⊥ and ⊤ – for the bottom and top element of a bounded lattice; we rather keep to the ones that are equally common for the zero 0 and the unit 1 in a ring with unit. Note also that equality can be defined in terms of order (by antisymmetry: x = y ≡ x ≤ y ∧ y ≤ x), and order in terms of equality and meet (x ≤ y ≡ x ∧ y = x) or join (x ≤ y ≡ x ∨ y = y).
⁴⁸ For the sake of readability we do not follow the convention to use underlined letters m, n, etc. for numerals.

Alternatively, Theorem 2 can be expressed without any talk of lattices and with fewer hypotheses, which however requires us to use formulas which are geometric but fail to be coherent. To do this we define the language R′ to be the language R of rings enriched with infinite disjunctions indexed by elements of N, and with the usual function of exponentiation (of a ring element by an integer) as a defined symbol. Now any order relation in L(R) can be put as $D(a) \le D(b_1, \ldots, b_k)$ or equivalently, by the formal Nullstellensatz, as $a \in \sqrt{(b_1, \ldots, b_k)}$. In R′ this reads as

$$\bigvee_{\ell \in \mathbb N} \exists c_1 \ldots \exists c_k .\ a^{\ell} = b_1 c_1 + \ldots + b_k c_k \tag{11.22}$$
and therefore is, for a fixed numeral k, a geometric formula but not coherent. One can modify accordingly the conclusion θm,n of Theorem 2 to get the formula

$$\theta'_{m,n} \equiv \forall h_1 \ldots \forall h_m\, \exists g_1 \ldots \exists g_{n+1} .\ \sqrt{(h_1, \ldots, h_m)} = \sqrt{(g_1, \ldots, g_{n+1})}$$

of R′, whose matrix is a finite conjunction of formulas of the form (11.22). To keep to R′, moreover, the sentence κn needs to be replaced by the equivalent of dim(R) ≤ n from Lemma 1.2:

$$\kappa'_n \equiv \forall a_0 \ldots \forall a_n\, \exists b_0 \ldots \exists b_n \bigvee_{\ell_0, \ldots, \ell_n \in \mathbb N} a_0^{\ell_0}\bigl(a_1^{\ell_1}\bigl(\cdots\bigl(a_n^{\ell_n}(1 + a_n b_n) + \cdots\bigr) + a_1 b_1\bigr) + a_0 b_0\bigr) = 0.$$

For fixed numerals n and m both $\kappa'_n$ and $\theta'_{m,n}$ are geometric formulas of R′ but not coherent, and Theorem 2 reads as

$$\Gamma, \kappa'_n \vdash \theta'_{m,n}.$$
5.5.2 The theorem of Eisenbud-Evans and Storch

The inductive characterisation (Lemma 1.3) of Krull dimension has even allowed for a constructive proof [34] of the following [49, 128] (see also [83]):

Theorem 3 (Eisenbud-Evans, Storch) Let R be Noetherian, strongly discrete, and coherent.⁴⁹ For each n ≥ 1, if dim(R) ≤ n − 1, then for any given $h_1, \ldots, h_m \in R[T]$ with m arbitrary there are $g_1, \ldots, g_n \in R[T]$ such that

$$D(h_1, \ldots, h_m) = D(g_1, \ldots, g_n). \tag{11.23}$$

⁴⁹ These hypotheses will be briefly reviewed in Section 5.2. Note that only the first of these hypotheses is required classically and can be found in the original papers.

This constructive proof of [34] is done with the concept of a Noetherian ring given by Richman and Seidenberg. Accordingly, Theorem 3 as put above, following [34], is restricted to finitely generated input ideals $(h_1, \ldots, h_m)$, and to coherent rings. Of course neither of these moves was made in the classical proofs [49, 128]: while the former is clearly unnecessary for the Hilbertian concept of a Noetherian ring, the latter is automatic for these kinds of rings, which are provably coherent (Section 5.2). For the constructive proof of [34] one further needs to assume the classical tautology – which however is constructively valid for many rings that occur in the mathematical discourse [93] – that the ring is strongly discrete: that is, membership to any finitely generated ideal is a decidable predicate of the ring elements or, equivalently, the inclusion order between finitely generated ideals is decidable.⁵⁰ By the formal Nullstellensatz again, the outcome (11.23) of Theorem 3 is tantamount to

$$\sqrt{(h_1, \ldots, h_m)} = \sqrt{(g_1, \ldots, g_n)}. \tag{11.24}$$

If this is compared with the result (11.21) of Theorem 2 and its equivalent

$$\sqrt{(h_1, \ldots, h_m)} = \sqrt{(g_1, \ldots, g_{n+1})},$$
the advance that was made with the step from Theorem 2 to Theorem 3 becomes clear once one looks at a particular case. In fact, if $R = K[T_1, \ldots, T_{n-1}]$ with K an algebraically closed field, then R has one variable less, but R[T] has as many variables as $K[T_1, \ldots, T_n]$. Now Kronecker’s Theorem 2 says that every algebraic subset of the n-dimensional affine space $K^n$ can be described by n + 1 polynomial equations, whereas according to Eisenbud-Evans and Storch’s Theorem 3 this can already be done with n polynomials. One thus can get by with one equation less than before; this n moreover is known to be the optimal bound. For instance, a point of the plane $K^2$ cannot possibly be described by a single polynomial equation: rather, it needs to be seen as the intersection of two curves. Already in dimension 3, however, this issue had remained unsettled for quite a time, and “the history of these results is rather interesting” [49]:
In 1891, 9 years after Kronecker [in 1882] had announced his theorem, Vahlen produced an example which, he claimed, showed that Kronecker’s result was the best possible. The example he gave is a curve in complex projective 3-space which, he ‘showed’, is not the intersection of 3 hypersurfaces [. . . ]. Vahlen’s error seems to have gone undetected until 1942 [actually 1941], when Perron [. . . ] exhibited 3 hypersurfaces whose intersection is the curve in question. (The year [?] before, Van der Waerden [. . . ] had given the first modern proof of Kronecker’s theorem.) In 1961 Kneser [. . . ] showed that the existence of Perron’s hypersurfaces was not an accident by proving that every curve in 3-space is an intersection of 3 hypersurfaces.
Soon after Kneser’s achievement Forster [61] in 1964 raised the question whether n equations suffice in arbitrary dimension n, but it took nearly ten more years until Eisenbud-Evans and Storch independently settled the issue in general [49, 128]. Unlike the case of Theorem 2 discussed above, for Theorem 3 the hypothesis that the ring R is Noetherian seems to be indispensable for the arguments given in [34, 49]. The related but weaker hypothesis that the topological space Spec(R) is Noetherian is sufficient for the avenue followed in [128], which by the way gives more evidence for the topological nature of the theorems of Kronecker and Eisenbud-Evans-Storch.

⁵⁰ In [93] strong discreteness is expressed by “the ring has detachable ideals”.
As compared with Theorem 2, in Theorem 3 additional hypotheses are made: that the ring be strongly discrete, Noetherian, and coherent. If we leave these aside, Theorem 3 allows for a logical analysis that is completely analogous to the one we have done before for Theorem 2; hence we do not need to do it. However, it is in order to briefly discuss the following question: What are the problems, from a constructive perspective, with the way in which Eisenbud and Evans [49] prove their theorem in classical mathematics?

Needless to say, this proof rests upon the received definition of Krull dimension, with (chains of) prime ideals. But additional use of prime ideals is made at least twice. First, the initial case dim(R) = 0 is started by observing that the quotient ring $\overline{R} = R/\sqrt{0}$ can be written as a finite product of fields; whence $\overline{R}[T]$ is a principal ideal ring. In particular, the image of the input ideal $(h_1, \ldots, h_m)$ has a single generator g in $\overline{R}[T]$, for which in this case one can show that $\sqrt{(h_1, \ldots, h_m)} = \sqrt{g}$ in R[T] as required. All this is possible thanks to (a special case of) the Lasker-Noether decomposition available for Noetherian rings, by which $\sqrt{0}$ can be expressed as the intersection of finitely many (minimal) prime ideals:

$$\sqrt{0} = \mathfrak p_1 \cap \ldots \cap \mathfrak p_k. \tag{11.25}$$

In fact, by the assumption dim(R) = 0 every $\mathfrak p_i$ is a maximal ideal – or, equivalently, $R/\mathfrak p_i$ is a field. By further removing redundancies one can achieve that the $\mathfrak p_1, \ldots, \mathfrak p_k$ are relatively prime; whence by the Chinese remainder theorem one arrives at a product of fields

$$R/\sqrt{0} \;\cong\; R/\mathfrak p_1 \times \ldots \times R/\mathfrak p_k$$

as required. The sweeping use of prime ideals notwithstanding, this construction can be constructivised whenever one imposes three additional conditions on R, each of which however is a classical tautology:

1. Suppose that R is strongly discrete (see above).
2. Assume that R has a strong primality test [99]: i.e., one can decide whether any given finitely generated ideal of R is a prime ideal, and if it is not, then one can certify this by way of a counterexample.
3. Understand “R is Noetherian” as the classically equivalent finite-depth property [101, 102]: i.e., every tree whose nodes are labelled by finitely generated ideals of R has finite depth provided that along every branch of the tree the ideals labelling the nodes form an ascending sequence.⁵¹

⁵¹ Alternatively, one can strengthen “R is Noetherian” to “R is strongly Noetherian” [99].
Under these additional hypotheses one can indeed [99, 102] grow a binary tree with root labelled by $\sqrt{0}$ which is labelled strictly increasingly by finitely generated ideals – and which therefore is finite – and whose leaves are labelled by finitely generated prime ideals $\mathfrak p_1, \ldots, \mathfrak p_k$ as required in (11.25).

The second use of prime ideals in Eisenbud and Evans’s proof [49] at first glance seems to be more problematic. To prove the nontrivial part of an equation of type (11.24) they invoke the characterisation

$$\bigcap \mathrm{Spec}(R) = \sqrt{0} \tag{11.26}$$

of the nilradical

$$\sqrt{0} = \{x \in R : \exists n \ge 1\,(x^n = 0)\} \tag{11.27}$$

of R. The nontrivial part

$$a \in \bigcap \mathrm{Spec}(R) \;\Rightarrow\; a \in \sqrt{0} \tag{11.28}$$
of (11.26), which by the way is the case b = 0 of implication (11.12), however is of an essentially nonconstructive character. Richman has observed in a related context: P Theorem 3 [i.e., if ai X i has a multiplicative inverse in R[X], then ai is nilpotent for i ≥ 1] admits an elegant proof upon observing that each ai with i ≥ 1 must be in every prime ideal of R, and that the intersection of the prime ideals of R consists of the nilpotent elements of R. This proof gives no clue as to how to calculate n such that an i = 0, while such a calculation can be extracted from the proof that we present. [113]
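The statement quoted from Richman can be checked on a toy instance. The sketch below assumes the finite ring Z/mZ (our choice, not from the source) and uses hypothetical helper names; it verifies that a given polynomial is a unit and finds the exponent n with $a_i^n = 0$ by plain search. This search is only an illustration of the statement and is not the calculation that Richman extracts from his proof.

```python
def poly_mul(f, g, m):
    """Product of two polynomials over Z/mZ (coefficient lists, lowest degree first)."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % m
    return out


def nilpotency_index(a, m):
    """Least k >= 1 with a**k == 0 in Z/mZ, or None if a is not nilpotent."""
    power = a % m
    for k in range(1, m + 1):
        if power == 0:
            return k
        power = (power * a) % m
    return None


if __name__ == "__main__":
    # Over Z/8Z the polynomial 1 + 2X is a unit with inverse 1 + 6X + 4X^2.
    print(poly_mul([1, 2], [1, 6, 4], 8))   # product of the two coefficient lists
    print(nilpotency_index(2, 8))           # exponent killing the coefficient 2
```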
We would like to stress again that, as in Richman’s case, the invocation of (11.28) in Eisenbud and Evans’s proof [49] can be replaced by fully constructive arguments [34]. As we have recalled before, the implication (11.28) in general is a form of the axiom of choice that is typically proved by way of Zorn’s lemma, whereas in the present case of a Noetherian ring there is no need for any transfinite proof method provided that one has classical logic at hand (see Section 5.2). In particular, the proof of the theorem of Eisenbud-Evans and Storch given in [49] works in Zermelo-Fraenkel set theory – without the axiom of choice but with classical logic – and most likely even in an appropriate fragment thereof.

The conclusions both of Kronecker’s theorem and of the theorem of Eisenbud-Evans and Storch are typical examples of statements whose input and output are of a finite nature, and which therefore truly merit a deduction from their hypotheses that is done exclusively by finite methods. The constructive proof [34] of Theorem 3 does indeed contain, for n and m fixed with dim (R) ≤ n − 1, an algorithm that transforms, in R[T ], any
finite list $h_1, \dots, h_m$ of length m into a finite list $g_1, \dots, g_n$ of length n that satisfies (11.23). Just as Storch’s proof [128], however, this only covers the affine case, whereas Eisenbud and Evans [49] have done the projective case as well; see also [83]. A constructivisation of the projective case has been carried out very recently [114], with the representation recalled before [36] of the projective spectrum Proj (R) of a graded ring R by a distributive lattice P (R). To follow the case of Spec (R) treated in [34], the Krull dimension of P (R) had to be turned into an elementary characterisation of the graded Krull dimension of R.
5.6 Heitmann dimension: an exception that confirms the rule?

It is in order to conclude by a quick look into a nearby direction. The inductive definition of Krull dimension has prompted an inductive definition of the Heitmann dimension Hdim, for which in (11.20) the nilradical $\sqrt{0}$ is replaced by the Jacobson radical
$$\mathrm{JR} = \{x \in R : \forall y \in R\ \exists z \in R\ ((1 + xy)z = 1)\}. \tag{11.29}$$
As for the nilradical, in constructive algebra this definition of JR is given priority over the characterisation
$$\bigcap \operatorname{Max}(R) = \mathrm{JR} \tag{11.30}$$
of JR as the intersection of all the maximal ideals of R, which of course is classically equivalent, again in general with Zorn’s lemma, to (11.29). The parallels between (11.27) and (11.29) on the one hand, and (11.26) and (11.30) on the other hand, are plain.
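The agreement between the elementwise definition (11.29) and the characterisation (11.30) can be observed directly on a finite ring, where both sides are computable by exhaustive search. The following sketch assumes R = Z/nZ (our choice of test ring, not from the source), whose maximal ideals are the (p) with p a prime divisor of n; the function names are hypothetical.

```python
def is_unit(u, n):
    """True if u has a multiplicative inverse in Z/nZ (brute force)."""
    return any((u * z) % n == 1 for z in range(n))


def jacobson_radical(n):
    """Elements x of Z/nZ such that 1 + x*y is a unit for every y, as in (11.29)."""
    return {x for x in range(n)
            if all(is_unit((1 + x * y) % n, n) for y in range(n))}


def intersection_of_maximal_ideals(n):
    """Elements of Z/nZ lying in every maximal ideal (p), p a prime divisor of n."""
    primes = [p for p in range(2, n + 1)
              if n % p == 0 and all(p % q for q in range(2, p))]
    return {x for x in range(n) if all(x % p == 0 for p in primes)}


if __name__ == "__main__":
    for n in (7, 8, 12):
        print(n, jacobson_radical(n) == intersection_of_maximal_ideals(n))
```

Note that the first function never mentions an ideal: it only quantifies over elements, which is why (11.29) is the definition preferred in constructive algebra.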
This inductive definition of Heitmann dimension has eventually led to generalisations of the theorem of Forster-Swan to the case of non-Noetherian commutative rings, as well as to improvements upon the bounds relevant in this context [32, 35, 41]. This gain of knowledge, however, requires us to handle the Heitmann dimension with particular logical care as follows [31]. Already the explicit formula
$$\forall a\, \exists b\, \forall y\, \exists z\ ((1 + a(1 + ab)y)z = 1) \tag{11.31}$$
for Hdim (R) ≤ 0 is a prenex formula with two alternations of quantifiers. Although it is a first-order formula, it is by no means a geometric formula; hence Barr’s theorem – even without the axiom of choice as an assumption to be eliminated – does not apply at all.⁵² Still according to [31] there are other, proof-theoretic methods (Gentzen’s Hauptsatz, negative translation) by which an intuitionistic proof can be obtained from a first-order classical proof.⁵³ Alternatively [31] the dependence of b on a and of z on a, y in (11.31) above can be expressed by Skolem functions f and g, respectively, with which (11.31) reads as
$$(1 + a(1 + af(a))y)\, g(a, y) = 1.$$
This move even leads to a coherent formula whose dynamical proof tree has no branching at all.

⁵² A related result of Ishihara’s [76] seems not to apply either.
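To see what the Skolem functions f and g amount to, one can tabulate them for a small ring. The sketch below assumes the finite test ring Z/nZ (our choice, not from the source) and searches exhaustively for values f(a) and g(a, y) satisfying the Skolemised form of (11.31). The names are hypothetical, and the search stands in for, rather than reproduces, the dynamical proof mentioned in the text.

```python
def inverse(u, n):
    """Inverse of u in Z/nZ, or None if u is not a unit (brute force)."""
    for z in range(n):
        if (u * z) % n == 1:
            return z
    return None


def skolem_functions(n):
    """Tables f[a] = b and g[(a, y)] = z with
    (1 + a*(1 + a*f[a])*y) * g[(a, y)] == 1 in Z/nZ,
    or None if some a admits no suitable b."""
    f, g = {}, {}
    for a in range(n):
        for b in range(n):
            c = (a * (1 + a * b)) % n
            inverses = {y: inverse((1 + c * y) % n, n) for y in range(n)}
            if all(z is not None for z in inverses.values()):
                f[a] = b
                for y, z in inverses.items():
                    g[(a, y)] = z
                break
        else:
            return None  # no witness b for this a
    return f, g


if __name__ == "__main__":
    f, g = skolem_functions(6)
    print(f)
```

Once f and g are given as data, checking the resulting equation involves no quantifier alternation at all, only a universally quantified identity.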
6 Appendix

6.1 Noetherian rings and excluded middle

Bishop’s Limited Principle of Omniscience (LPO) says that the disjunction
$$\exists n\, (a_n = 1) \lor \forall n\, (a_n = 0) \tag{11.32}$$
holds for every infinite sequence $a_0, a_1, \dots$ of binary numbers.⁵⁴ For completeness’s sake we now recall the well-known argument that LPO follows from the assumption that there is any nontrivial commutative ring R that is Noetherian in the classical sense. To this end we suppose that R is a commutative ring that has both of the following classically trivial properties:

1. R is a discrete ring: that is, a = 0 is decidable for each a ∈ R;
2. R is a nontrivial ring: that is, 1 ≠ 0 holds in R.

With intuitionistic logic we will deduce LPO from any one of the following three conditions:

4. every ideal of R is finitely generated;
5. every inhabited set of finitely-generated ideals of R has a maximal element;
6. every ascending chain of finitely-generated ideals of R is eventually constant.

(Note the restriction to finitely-generated ideals in conditions 5 and 6.) We first observe that condition 6 is clearly implied by condition 5, but also that condition 6 follows from condition 4. In fact, under condition 4 every ascending chain $\mathfrak{a}_0 \subseteq \mathfrak{a}_1 \subseteq \dots$ of ideals of R is eventually constant, no matter whether the $\mathfrak{a}_k$ are all finitely generated. To see this consider the ideal generated by all the $\mathfrak{a}_k$, viz.
$$\mathfrak{a} = \bigcup \{\mathfrak{a}_k : k \geq 0\}.$$
If this $\mathfrak{a}$ is finitely generated, $\mathfrak{a} = Rb_0 + \dots + Rb_m$ say, then for every j ≤ m we pick k(j) ≥ 0 such that $b_j \in \mathfrak{a}_{k(j)}$, for which $\mathfrak{a} \subseteq \mathfrak{a}_{k(0)} + \dots + \mathfrak{a}_{k(m)}$. If we set K = max{k(j) : j ≤ m}, then we have $\mathfrak{a}_{k(j)} \subseteq \mathfrak{a}_K$ for all j ≤ m; whence $\mathfrak{a} \subseteq \mathfrak{a}_K$ and thus $\mathfrak{a}_K = \mathfrak{a}_{K+1} = \dots$ as desired.

Now we deduce LPO from condition 6 in conjunction with conditions 1 and 2. To do so, let $a_0, a_1, \dots \in \{0, 1\}$ be given. In view of condition 2 the binary numbers 0 and 1 can be viewed as elements of R. The finitely generated ideals $\mathfrak{a}_k = Ra_0 + \dots + Ra_k$ with k ≥ 0 form an ascending chain: that is, $\mathfrak{a}_0 \subseteq \mathfrak{a}_1 \subseteq \dots$ By condition 6, there is L ≥ 0 such that $\mathfrak{a}_L = \mathfrak{a}_{L+1} = \dots$ By condition 1, either $a_i \neq 0$ for some i ≤ L or else $a_i = 0$ for every i ≤ L. In the former case we are done; in the latter case $\mathfrak{a}_L = 0$ and thus $\mathfrak{a}_k = 0$ for all k, which is to say that $a_n = 0$ for all n.

⁵³ Incidentally, methods of this kind can be used for syntactic proofs of Barr’s theorem without choice; see [94] and [97], respectively.
⁵⁴ In other words, LPO says that every $\Sigma^0_1$-formula is decidable.
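The last step of this argument is a bounded search, and all the nonconstructive content sits in the stabilisation index L supplied by condition 6. The sketch below makes that division of labour explicit. It assumes that the binary sequence is given as a Python function and that L is handed in from outside (a hypothetical oracle: nothing in the code computes it); the names are ours.

```python
def decide_with_index(seq, stabilisation_index):
    """Decide the disjunction (11.32) for a binary sequence, GIVEN an index L
    from which the chain of ideals R*a_0 + ... + R*a_k is constant.

    Returns ("exists", i) with seq(i) != 0, or ("forall", None).
    The answer is only as trustworthy as the index L passed in.
    """
    for i in range(stabilisation_index + 1):
        if seq(i) != 0:          # condition 1: a_i = 0 is decidable
            return ("exists", i)
    return ("forall", None)      # chain is 0 from L on, hence everywhere


def spike(n):
    """A sample sequence: 0 everywhere except at position 5."""
    return 1 if n == 5 else 0


if __name__ == "__main__":
    print(decide_with_index(spike, 5))   # L = 5 is a genuine stabilisation index
    print(decide_with_index(spike, 3))   # L = 3 is not, and the verdict is wrong
```

The second call shows why no algorithm is hidden here: with an index that does not actually stabilise the chain, the bounded search returns the wrong disjunct.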
6.2 Boundaries of basic opens for the Zariski spectrum

In this section we describe the boundary ∂D(a) of the basic open subset D(a) of the Zariski spectrum Spec (R) of a commutative ring R, which of course is only an exercise in classical logic and topology. The reason why we do this in some detail is that we want to show how ∂D(a) is linked to the boundary ideal $N_a$, which in fact defines ∂D(a) as a closed subset.

In general, the boundary ∂S of a subset S of a topological space is defined as the closure $\overline{S}$ minus the interior $S^{\circ}$, where $S^{\circ}$ equals S precisely when S is open. As for Spec (R), note first that the closure of D(a) is
$$\overline{D(a)} = Z(\sqrt{0} : a).$$
In fact, the complement of $\overline{D(a)}$, the so-called pseudocomplement of D(a), is the union of all the D(b) for which D(a) ∩ D(b) = ∅ or, equivalently, $ab \in \sqrt{0}$; whence
$$\operatorname{Spec}(R) \setminus \overline{D(a)} = D(\sqrt{0} : a).$$
Since Z(a) is the complement of D(a), and $N_a = (\sqrt{0} : a) + (a)$, we have
$$\partial D(a) = \overline{D(a)} \setminus D(a) = Z(\sqrt{0} : a) \cap Z(a) = Z(N_a).$$
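As a worked instance of the last formula, take R = Z and a = 12 (our choice of example, not taken from the source). Since Z is a domain, the nilradical is 0 and the ideal quotient by a nonzero element is 0 as well, so the computation reduces to the following.

```latex
\begin{align*}
(\sqrt{0} : 12) &= \{\, b \in \mathbb{Z} : 12\,b = 0 \,\} = (0), \\
\overline{D(12)} &= Z(0) = \operatorname{Spec}(\mathbb{Z}), \\
N_{12} &= (\sqrt{0} : 12) + (12) = (12), \\
\partial D(12) &= Z(N_{12}) = Z(12) = \{(2),\ (3)\}.
\end{align*}
```

So the boundary of D(12) consists exactly of the primes dividing 12, that is, of the points where 12 vanishes, while D(12) itself is dense because it contains the generic point (0).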
6.3 Prime filters of rings and lattices

A prime filter ξ of a distributive lattice L and a prime filter π of a commutative ring R are subsets which satisfy the conditions below. Since with classical logic the defining conditions for a prime filter of R are the characteristic properties of the complement of a prime ideal of R, we have put them in parallel:

| ξ prime filter of lattice L | π prime filter of ring R | p prime ideal of ring R |
| x ∨ y ∈ ξ ⇒ x ∈ ξ ∨ y ∈ ξ | a + b ∈ π ⇒ a ∈ π ∨ b ∈ π | a ∈ p ∧ b ∈ p ⇒ a + b ∈ p |
| ¬ (0 ∈ ξ) | ¬ (0 ∈ π) | 0 ∈ p |
| x ∈ ξ ∧ y ∈ ξ ⇔ x ∧ y ∈ ξ | a ∈ π ∧ b ∈ π ⇔ ab ∈ π | ab ∈ p ⇔ a ∈ p ∨ b ∈ p |
| 1 ∈ ξ | 1 ∈ π | ¬ (1 ∈ p) |
The prime filters of a distributive lattice L form a topological space Pt(L) with the family

Ξ(x) = {ξ ∈ Pt(L) : x ξ}    (x ∈ L)

as a basis of opens, where we use the fairly customary notation x ξ ≡ x ∈ ξ, whose choice we shall explain below. So, x ξ means that Ξ(x) is a neighbourhood of ξ. The topological space Spec (R) is presented by the distributive lattice L(R) inasmuch as, classically, the prime filters ξ of L(R) correspond to the prime ideals p of R, via

D(a) ξ ⟷ a ∉ p
for every a ∈ R. This even defines a homeomorphism

Pt(L(R)) ≅ Spec (R).

More specifically, the prime filters ξ of L(R) correspond to the prime filters π of R via

D(a) ξ ⟷ a ∈ π

for every a ∈ R, and the prime filters π of R correspond to the prime ideals p of R via

a ∈ π ⟷ a ∉ p
for every a ∈ R. The defining conditions for a prime ideal of R are clearly reflected by the characteristic properties of the closed subsets of Spec (R):

| p prime ideal of R | behaviour of Z(·) |
| a ∈ p ∧ b ∈ p ⇒ a + b ∈ p | Z(a) ∩ Z(b) ⊆ Z(a + b) |
| 0 ∈ p | Z(0) = Spec (R) |
| ab ∈ p ⇔ a ∈ p ∨ b ∈ p | Z(ab) = Z(a) ∪ Z(b) |
| ¬ (1 ∈ p) | Z(1) = ∅ |
Likewise, the defining conditions for a prime filter of R are reflected by the characteristic properties of the (standard) basic open subsets of Spec (R):

| π prime filter of ring R | behaviour of D(·) |
| a + b ∈ π ⇒ a ∈ π ∨ b ∈ π | D(a + b) ⊆ D(a) ∪ D(b) |
| ¬ (0 ∈ π) | D(0) = ∅ |
| a ∈ π ∧ b ∈ π ⇔ ab ∈ π | D(a) ∩ D(b) = D(ab) |
| 1 ∈ π | D(1) = Spec (R) |
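The parallel between prime filters and complements of prime ideals can be confirmed by exhaustive search on a small ring. The following sketch assumes R = Z/nZ (our test ring, not from the source), whose prime ideals are the (p) with p a prime divisor of n. It enumerates all subsets satisfying the four prime filter conditions for a ring and compares them with the complements of the prime ideals; all names are hypothetical.

```python
from itertools import combinations


def is_prime_filter(s, n):
    """Check the four defining conditions of a prime filter of the ring Z/nZ."""
    ring = range(n)
    return (
        1 % n in s
        and 0 not in s
        # a in s and b in s  <=>  ab in s
        and all(((a * b) % n in s) == (a in s and b in s)
                for a in ring for b in ring)
        # a + b in s  =>  a in s or b in s
        and all(a in s or b in s
                for a in ring for b in ring if (a + b) % n in s)
    )


def prime_filters(n):
    """All prime filters of Z/nZ, by enumerating every subset."""
    elems = list(range(n))
    return [set(c) for k in range(n + 1)
            for c in combinations(elems, k) if is_prime_filter(set(c), n)]


def prime_ideal_complements(n):
    """Complements of the prime ideals (p) of Z/nZ, p a prime divisor of n."""
    primes = [p for p in range(2, n + 1)
              if n % p == 0 and all(p % q for q in range(2, p))]
    return [{x for x in range(n) if x % p != 0} for p in primes]


if __name__ == "__main__":
    print(prime_filters(6))
    print(prime_ideal_complements(6))
```

On a finite ring membership is decidable, so the passage from a prime ideal to its complement is harmless; the point of working with prime filters in general is that their defining conditions avoid negation in all but one clause.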
Alternatively, one can consider the models of the propositional theory T(R) whose atomic propositions are the D(a) with a ∈ R and whose axioms are the universal closures of the following formulas:

D(a + b) → D(a) ∨ D(b)
D(0) → ⊥
D(a) ∧ D(b) ↔ D(ab)
⊤ → D(1)

The models µ of T(R) correspond to the prime filters ξ of L(R) via

µ ⊨ D(a) ⟷ D(a) ξ
for every a ∈ R, which also explains the customary use of the symbol for prime filters. A thorough treatment of all this has been carried out in [29].

Acknowledgments

Veronika Köberlein, Miriam Kertai, and Natalia Rabel, three former Diplom students of the second author, have contributed to this paper by asking the right questions. Discussions with and suggestions by Michael Detlefsen and Karl-Georg Niebergall have turned out most useful. Davide Rinaldi, Tobias Friedl and Daniel Wessel were so kind as to have a look at the manuscript. The first author is grateful to Andrea Cantini for an inspiring discussion on the themes of this article, which took place at a very early stage in its preparation. Luca Bellotti gave very useful comments on a draft of this paper. Jesse Anne Tomalty proof-read the paper and gave useful suggestions. Last but not least both authors are grateful to Godehard Link for his infinite patience.
References

[1] William W. Adams and Philippe Loustaunau. An Introduction to Gröbner Bases, volume 3 of Grad. Stud. Math. American Mathematical Society, Providence, R.I., 1994. [2] T. Arai. Proof theory for theories of ordinals I: Recursively Mahlo ordinals. Annals of Pure and Applied Logic, 122:1–85, 2003. [3] T. Arai. Proof theory for theories of ordinals II: Π3-Reflection. Annals of Pure and Applied Logic, 129:39–92, 2004. [4] M. F. Atiyah and I. G. Macdonald. Introduction to commutative algebra. Addison-Wesley Publishing Co., Reading, Mass.-London-Don Mills, Ont., 1969. [5] J. Avigad. Number theory and elementary arithmetic. Philosophia Mathematica, 11(3):257–284, 2003. [6] J. Avigad and E. H. Reck. Clarifying the nature of the infinite: the development of metamathematics and proof theory. Carnegie Mellon Technical Report CMU-PHIL-120, 2001. [7] Raymond Balbes and Philip Dwinger. Distributive lattices. University of Missouri Press, Columbia, Mo., 1974. [8] Bernhard Banaschewski. The power of the ultrafilter theorem. J. London Math. Soc., 27:193–202, 1983. [9] Bernhard Banaschewski. Radical ideals and coherent frames. Comment. Math. Univ. Carolin., 37(2):349–370, 1996. [10] Sami Barhoumi. Seminormality and polynomial rings. J. Algebra, 322: 1974–1978, 2009.
[11] Sami Barhoumi and Henri Lombardi. An algorithm for the Traverso-Swan theorem on seminormal rings. J. Algebra, 320:1531–1542, 2008. [12] Sami Barhoumi, Henri Lombardi, and Ihsen Yengui. Projective modules over polynomial rings: a constructive approach. Math. Nachr., 282, 2009. [13] M. Barr. Toposes without points. J. Pure and Applied Algebra, 5:265–280, 1974. [14] P. Benacerraf and H. Putnam. Philosophy of mathematics. Cambridge University Press, Cambridge, 2nd edition, 1983. [15] P. Bernays. Hilbert, David. In Encyclopedia of philosophy, volume 3, pp. 496–504. Macmillan Free Press, New York, 1967. [16] M. Bezem and T. Coquand. Newman’s lemma – a case study in proof automation and geometric logic. Bulletin of the EATCS, 79:86–100, 2003. [17] G. Birkhoff. On the structure of abstract algebras. Mathematical Proceedings of the Cambridge Philosophical Society, 31:433–454, 1935. [18] E. Bishop. Foundations of constructive analysis. McGraw-Hill, New York, 1967. [19] E. Bishop. Mathematics as a numerical language. In A. Kino, J. Myhill, and R. E. Vesley (eds), Intuitionism and Proof Theory, pp. 53–71. North-Holland, Amsterdam, 1970. [20] E. Bishop and D. Bridges. Constructive Analysis. Springer, Berlin and Heidelberg, 1985. [21] D. Bridges. Constructive mathematics. In Edward N. Zalta (ed.), The Stanford Encyclopedia of Philosophy. Center for the Study of Language and Information, Stanford University, 2009. http://plato.stanford.edu/entries/mathematics-constructive/. [22] D. S. Bridges and F. Richman. Varieties of Constructive Mathematics. Cambridge University Press, 1987. [23] W. Buchholz. Explaining Gentzen’s Consistency proof within infinitary proof theory. In G. Gottlob, A. Leitsch, and D. Mundici (eds), Computational logic and proof theory, Proceedings of the 5th Kurt Gödel Colloquium on Computational Logic and Proof Theory, volume 1289 of Lecture Notes In Computer Science. Springer, 1997. [24] J. Cederquist and Th. Coquand.
Entailment relations and distributive lattices. In Proceedings of Logic Colloquium 1998, volume 13 of Lect. Notes Log., pp. 127–139. Assoc. Symbol. Logic, Urbana, 2000. [25] Thierry Coquand. Sur un théorème de Kronecker concernant les variétés algébriques. C. R. Math. Acad. Sci. Paris, 338(4):291–294, 2004. [26] Thierry Coquand. A completeness proof for geometrical logic. In P. Hájek, L. Valdés-Villanueva, and D. Westerståhl (eds), Logic, Methodology and Philosophy of Science. Proceedings of the Twelfth International Congress, pp. 79–90. King’s College Publications, 2005. [27] Thierry Coquand. On seminormality. J. Algebra, 305:577–584, 2006. [28] Thierry Coquand. Space of valuations. Ann. Pure Appl. Logic, 157:97–109, 2009.
[29] Thierry Coquand and Henri Lombardi. Hidden constructions in abstract algebra (3): Krull dimension of distributive lattices and commutative rings. In M. Fontana et al. (ed.), Commutative Ring Theory and Applications, volume 231 of Lecture Notes in Pure and Applied Mathematics, pp. 477– 499, 2002. [30] Thierry Coquand and Henri Lombardi. A short proof for the Krull dimension of a polynomial ring. Amer. Math. Monthly, 112(9):826–829, 2005. [31] Thierry Coquand and Henri Lombardi. A logical approach to abstract algebra. Math. Struct. in Comput. Science, 16:885–900, 2006. [32] Thierry Coquand, Henri Lombardi, and Claude Quitté. Generating non noetherian modules constructively. Manuscripta Math., 115:513–520, 2004. [33] Thierry Coquand, Henri Lombardi, and Marie-Françoise Roy. An elementary characterisation of Krull dimension. In L. Crosilla and P. Schuster (eds), From Sets and Types to Topology and Analysis, volume 48 of Oxford Logic Guides, pp. 239–244. Oxford University Press, 2005. [34] Thierry Coquand, Henri Lombardi, and Peter Schuster. A nilregular element property. Archiv Math., 85:49–54, 2005. [35] Thierry Coquand, Henri Lombardi, and Claude Quitté. Dimension de Heitmann des treillis distributifs et des anneaux commutatifs. Publications mathématiques de Besançon. Algèbre et Théorie des Nombres, pp. 57–100, 2006. [36] Thierry Coquand, Henri Lombardi, and Peter Schuster. The projective spectrum as a distributive lattice. Cah. Topol. Géom. Différ. Catég., 48: 220–228, 2007. [37] Thierry Coquand, Lionel Ducos, Henri Lombardi, and Claude Quitté. Constructive Krull dimension. I: Integral extensions. J. Algebra Appl., 8:129– 138, 2009. [38] Thierry Coquand, Henri Lombardi, and Peter Schuster. Spectral schemes as ringed lattices. Ann. Math. Artif. Intell., 56:339–360, 2009. [39] Michel Coste, Henri Lombardi, and Marie-Françoise Roy. Dynamical method in algebra: Effective Nullstellensätze. Ann. Pure Appl. Logic, 111(3):203–256, 2001. [40] M. Detlefsen. 
Hilbert’s Program. Reidel, Dordrecht, 1986. [41] Lionel Ducos. Sur les théorèmes de Serre, Bass et Forster-Swan. C. R. Math. Acad. Sci. Paris, 339(8):539–542, 2004. [42] Lionel Ducos, Henri Lombardi, Claude Quitté, and Maimouna Salou. Théorie algorithmique des anneaux arithmétiques, de Prüfer et de Dedekind. J. Algebra, 281:604–650, 2004. [43] Harold M. Edwards. The genesis of ideal theory. Arch. Hist. Exact Sci., 23(4):321–378, 1980/81. [44] Harold M. Edwards. Dedekind’s invention of ideals. Bull. London Math. Soc., 15(1):8–17, 1983.
[45] Harold M. Edwards. Dedekind’s invention of ideals. In Studies in the history of mathematics, volume 26 of MAA Stud. Math., pp. 8–20. Math. Assoc. America, Washington, DC, 1987. [46] Harold M. Edwards. Mathematical ideas, ideals, and ideology. Math. Intelligencer, 14(2):6–19, 1992. [47] Harold M. Edwards. Essays in Constructive Mathematics. Springer, New York, 2005. [48] David Eisenbud. Commutative algebra, volume 150 of Graduate Texts in Mathematics. Springer-Verlag, New York, 1995. With a view toward algebraic geometry. [49] David Eisenbud and E. Graham Evans, Jr. Every algebraic set in n-space is the intersection of n hypersurfaces. Invent. Math., 19:107–112, 1973. [50] David Eisenbud and Joe Harris. The Geometry of Schemes. Springer, New York, 2000. [51] Luis Español. Constructive Krull dimension of lattices. Rev. Acad. Cienc. Zaragoza (2), 37:5–9, 1982. [52] Luis Español. Le spectre d’un anneau dans l’algèbre constructive et applications à la dimension. Cah. Topol. Géom. Différ. Catég., 24:133–144, 1983. [53] Luis Español. Finite chain calculus in distributive lattices and elementary Krull dimension. In Laureano Lambán, Ana Romero, and Julio Rubio (eds), Contribuciones científicas en honor de Mirian Andrés Gómez, pp. 273–285. Servicio de Publicaciones Universidad de La Rioja, Logroño, 2010. [54] S. Feferman. Proof theory. The Bulletin of the American Mathematical Society, 83(3):351–361, 1977. Review of Takeuti [131]. [55] S. Feferman. Hilbert’s program relativized: Proof–theoretical and foundational reductions. The Journal of Symbolic Logic, 53(2):364–384, 1988. [56] S. Feferman. What rests on what? The proof-theoretic analysis of mathematics. In Philosophy of Mathematics, Part I, Proceedings of the 15th International Wittgenstein Symposium. Verlag Hölder–Pichler–Tempsky, Vienna, 1993. [57] S. Feferman. Why a little bit goes a long way. In S. Feferman: In the light of logic. Oxford University Press, Oxford, 1998. [58] S. Feferman.
Does reductive proof theory have a viable rationale? Erkenntnis, 53:63–96, 2000. [59] S. Feferman. Predicativity. In S. Shapiro (ed.), Handbook of the Philosophy of Mathematics and Logic. Oxford University Press, Oxford, 2005. [60] H. Field. Science without numbers. A defence of Nominalism. Princeton University Press, Princeton, 1980. [61] Otto Forster. Über die Anzahl der Erzeugenden eines Ideals in einem Noetherschen Ring. Math. Z., 84:80–87, 1964. [62] Nicola Gambino and Peter Schuster. Spatiality for formal topologies. Math. Struct. Comput. Sci., 17(1):65–80, 2007.
[63] G. Gentzen. Die Widerspruchsfreiheit der reinen Zahlentheorie. Mathematische Annalen, 112:493–565, 1936. English translation in [64, pp. 132–213]. [64] G. Gentzen. The Collected Papers of Gerhard Gentzen. North–Holland, Amsterdam, 1969. Edited by M. E. Szabo. [65] Sarah Glaz. Commutative coherent rings, volume 1371 of Lecture Notes in Mathematics. Springer-Verlag, Berlin, 1989. [66] Sarah Glaz. Commutative coherent rings: historical perspective and current developments. Nieuw Arch. Wisk. (4), 10(1-2):37–56, 1992. [67] K. Gödel. Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme, I. Monatshefte für Mathematik und Physik, 38: 173–198, 1931. reproduced, with English translation, in [68], p. 144–195. [68] K. Gödel. Collected Works, volume I: Publications 1929–1936. Oxford University Press, New York, 1986. Edited by S. Feferman et al. [69] Alexander Grothendieck and Jean Dieudonné. Éléments de géométrie algébrique. Vol. 1. Springer, Berlin, 1971. [70] D. Hilbert. Über das Unendliche. Mathematische Annalen, 95:161–191, 1926. reprinted and translated as On the infinite, in [136], pp. 367–392. [71] D. Hilbert. Die Grundlagen der Mathematik. Abhandlungen aus dem Seminar der Hamburgischen Universität, 6:65–85, 1928. reprinted and translated as The foundations of mathematics, in [136], pp. 464–479. [72] D. Hilbert and P. Bernays. Grundlagen der Mathematik II. Grundlehren Math. Wiss. 50. Springer–Verlag, Berlin, 1939. [73] Melvin Hochster. Prime ideal structure in commutative rings. Trans. Amer. Math. Soc., 142:43–60, 1969. [74] H. Ishihara. Constructive reverse mathematics: compactness properties. In L. Crosilla and P. Schuster (eds), From Sets and Types to Topology and Analysis: Towards practicable foundations for constructive mathematics. Oxford University Press, 2005. [75] H. Ishihara. Reverse mathematics in Bishop’s constructive mathematics. Philosophia Scientiae, 6:43–59, 2006. [76] Hajime Ishihara. 
A note on the Gödel-Gentzen translation. MLQ Math. Log. Q., 46(1):135–137, 2000. [77] Carl Jacobsson and Clas Löfwall. Standard bases for general coefficient rings and a new constructive proof of Hilbert’s basis theorem. J. Symb. Comput., 12(3):337–372, 1991. [78] Peter T. Johnstone. Stone Spaces. Number 3 in Cambridge Studies in Advanced Mathematics. Cambridge etc.: Cambridge University Press, 1982. [79] Peter T. Johnstone. The point of pointless topology. Bull. Amer. Math. Soc. (N.S.), 8(1):41–53, 1983. [80] André Joyal. Les théoremes de Chevalley-Tarski et remarques sur l’algèbre constructive. Cah. Topol. Géom. Différ. Catég., 16:256–258, 1976. [81] U. Kohlenbach. Applied Proof Theory: Proof Interpretations and their Use in Mathematics. Springer, 2008.
[82] G. Kreisel. Hilbert’s programme. Dialectica, 12:346–372, 1958. Revised, with Postscript, in Benacerraf and Putnam [14], pp. 289–238. [83] Ernst Kunz. Introduction to Commutative Algebra and Algebraic Geometry. Birkhäuser, Boston, Basel, Berlin, 1985. [84] Henri Lombardi. Hidden constructions in abstract algebra. I. Integral dependance. J. Pure Appl. Algebra, 167:259–267, 2002. [85] Henri Lombardi. Dimension de Krull, Nullstellensätze et évaluation dynamique. Math. Zeitschrift, 242:23–46, 2002. [86] Henri Lombardi. Algèbre dynamique, espaces topologiques sans points et programme de Hilbert. Ann. Pure Appl. Logic, 137:256–290, 2006. [87] Henri Lombardi and Claude Quitté. Constructions cachées en algèbre abstraite (2). Le principe local global. In M. Fontana et al. (ed.), Commutative Ring Theory and Applications, volume 231 of Lecture Notes in Pure and Applied Mathematics, pp. 461–476, 2002. [88] Henri Lombardi and Claude Quitté. Seminormal rings (following Thierry Coquand). Theoret. Comput. Sci., 392:113–127, 2008. [89] Henri Lombardi and Claude Quitté. Algèbre commutative. Méthodes constructives. Modules projectifs de type fini. Calvage & Mounet, Paris, 2012. [90] M. Makkai and G. E. Reyes. First order categorical logic, volume 611 of Lecture Notes in Math. Springer–Verlag, Berlin, Heidelberg, New York, 1977. [91] P. Mancosu (ed.). From Brouwer to Hilbert. The Debate on the Foundations of Mathematics in the 1920s. Oxford University Press, 1998. [92] P. Martin-Löf. Truth of a proposition, evidence of a judgment, validity of a proof. Synthese, 73:407–420, 1987. [93] Ray Mines, Fred Richman, and Wim Ruitenburg. A Course in Constructive Algebra. Springer, New York, 1988. Universitext. [94] Sara Negri. Contraction-free sequent calculi for geometric theories with an application to Barr’s theorem. Arch. Math. Logic, 42(4):389–401, 2003. [95] K. Niebergall and M. Schirn. Hilbert’s Programme and Gödel’s Theorems. Dialectica, 56(4):347–370, 2002. [96] E. Palmgren.
On universes in type theory. In G. Sambin and J. Smith (eds), Twenty–five years of type theory. Oxford University Press, Oxford, 1998. [97] Erik Palmgren. An intuitionistic axiomatisation of real closed fields. MLQ Math. Log. Q., 48(2):297–299, 2002. [98] C. Parsons. Finitism and intuitive knowledge. In Matthias Schirn (ed.), The Philosophy of Mathematics Today, pp. 249–270. Oxford University Press, Oxford, 1998. [99] Hervé Perdry. Strongly Noetherian rings and constructive ideal theory. J. Symb. Comput., 37(4):511–535, 2004. [100] Hervé Perdry. Lazy bases: a minimalist constructive theory of Noetherian rings. Math. Log. Quart., 54(1):70–82, 2008.
[101] Hervé Perdry and Peter Schuster. Noetherian orders. Math. Structures Comput. Sci., 21:111–124, 2011. [102] Hervé Perdry and Peter Schuster. Constructing Gröbner bases for Noetherian rings, under revision. [103] M. Rathjen. Recent advances in ordinal analysis: $\Pi^1_2$-CA and related systems. Bulletin of Symbolic Logic, 1:468–485, 1995. [104] M. Rathjen. The superjump in Martin–Löf type theory. In S. Buss, P. Hajek, and P. Pudlak (eds), Logic Colloquium ’98, Lecture Notes in Logic 13, pp. 363–386. Association for Symbolic Logic, 2000. [105] M. Rathjen. The constructive Hilbert program and the limits of Martin–Löf type theory. Synthese, 147:81–120, 2005. [106] M. Rathjen. An ordinal analysis of parameter-free $\Pi^1_2$ comprehension. Arch. Math. Logic, 44:263–362, 2005. [107] M. Rathjen. The art of ordinal analysis. In Proceedings of the International Congress of Mathematicians, Madrid, Spain, European Mathematical Society, 2006. [108] M. Rathjen, E. Griffor, and E. Palmgren. Inaccessibility in constructive set theory and type theory. Annals of Pure and Applied Logic, 94:181–200, 1998. [109] F. Richman. Intuitionism as generalization. Philosophia Mathematica, 5: 124–128, 1990. [110] F. Richman. The fundamental theorem of algebra: a constructive development without choice. Pacific Journal of Mathematics, 196:213–230, 2000. [111] F. Richman. Constructive mathematics without choice. In P. Schuster, U. Berger, and O. Osswald (eds), Reuniting the Antipodes: Constructive and Nonstandard Views of the Continuum, volume 306 of Synthese Library. Kluwer, Dordrecht, 2001. [112] Fred Richman. Constructive aspects of Noetherian rings. Proc. Amer. Math. Soc., 44:436–441, 1974. [113] Fred Richman. Nontrivial uses of trivial rings. Proc. Amer. Math. Soc., 103(4):1012–1014, 1988. [114] Davide Rinaldi. A formal proof of the projective Eisenbud-Evans-Storch theorem. Arch. Math. (Basel), 99(1):9–24, 2012. [115] Davide Rinaldi. A constructive notion of codimension. J. Algebra, 383: 178–196, 2013. [116] Gian-Carlo Rota. Indiscrete Thoughts. Birkhäuser, Boston, Basel, Berlin, 1997. [117] Giovanni Sambin. Some points in formal topology. Theoret. Comput. Sci., 305(1-3):347–408, 2003. [118] Peter Schuster. Formal Zariski topology: positivity and points. Ann. Pure Appl. Logic, 137(1-3):317–359, 2006. [119] Peter Schuster. The Zariski spectrum as a formal geometry. Theoret. Comput. Sci., 405:101–115, 2008. [120] Peter Schuster and Júlia Zappe. Do Noetherian rings have Noetherian
basis functions? In A. Beckmann et al. (ed.), Logical Approaches to Computational Barriers. Second Conference on Computability in Europe, CiE 2006. Swansea, UK, July 2006, volume 3988 of Lect. Notes Comput. Sci., pp. 481–489, Berlin and Heidelberg, 2006. Springer. [121] Abraham Seidenberg. What is Noetherian? Rend. Sem. Mat. Fis. Milano, 44:55–61, 1974. [122] A. Setzer. Extending Martin–Löf type theory by one Mahlo–universe. Archive for Mathematical Logic, 39:155–181, 2000. [123] W. Sieg. Hilbert’s programs: 1917–1922. Bulletin of Symbolic Logic, 5: 1–44, 1999. [124] Inger Sigstam. Formal spaces and their effective presentations. Arch. Math. Logic, 34(4):211–246, 1995. [125] S. G. Simpson. Partial realizations of Hilbert’s program. Journal of Symbolic Logic, 53(2):349–363, 1988. [126] S. G. Simpson. Subsystems of Second Order Arithmetic. Perspectives in Mathematical Logic. Springer-Verlag, 1999. [127] V. Stoltenberg-Hansen and J. V. Tucker. Computable rings and fields. In Handbook of computability theory, volume 140 of Stud. Logic Found. Math., pp. 363–447. North-Holland, Amsterdam, 1999. [128] Uwe Storch. Bemerkung zu einem Satz von M. Kneser. Arch. Math. (Basel), 23:403–404, 1972. [129] W. W. Tait. Finitism. Journal of Philosophy, 78:524–546, 1981. [130] W. W. Tait. Remarks on finitism. In W. Sieg, R. Sommer, and C. Talcott (eds), Reflections on the Foundations of Mathematics. Essays in Honor of Solomon Feferman, Lecture Notes in Logic, 15. Association for Symbolic Logic and A K Peters, 2002. [131] G. Takeuti. Proof Theory. Studies in Logic, 81. North–Holland, Amsterdam, 1975. [132] Jonathan Tennenbaum. A Constructive Version of Hilbert’s Basis Theorem. PhD thesis, University of California San Diego, 1973. [133] Myles Tierney. On the spectrum of a ringed topos. In Algebra, topology, and category theory (a collection of papers in honor of Samuel Eilenberg), pp. 189–210. Academic Press, New York, 1976. [134] M. van Atten. 
Brouwer meets Husserl: on the phenomenology of choice sequences. Springer, 2007. [135] Bartel van der Waerden. Review. Zbl. Math., 24:276, 1941. [136] J. van Heijenoort (ed.). From Frege to Gödel. A Source Book in Mathematical Logic. Harvard University Press, Cambridge, Mass., 1967. [137] W. Veldman. Brouwer’s Fan Theorem as an axiom and as a contrast to Kleene’s alternative. Report no. 0509, IMAPP, Radboud University Nijmegen, 2005. [138] Steven Vickers. Topology via logic, volume 5 of Cambridge Tracts in Theoretical Computer Science. Cambridge University Press, Cambridge, 1989. [139] J. von Plato. The development of proof theory. In Edward N. Zalta (ed.),
Stanford Encyclopedia of Philosophy. Center for the Study of Language and Information, Stanford University, 2008. http://plato.stanford.edu/entries/proof-theory-development/. [140] H. Weyl. David Hilbert and his mathematical work. Bulletin of the American Mathematical Society, pp. 612–654, 1944. [141] Gavin C. Wraith. Intuitionistic algebra: some recent developments in topos theory. In Proceedings of the International Congress of Mathematicians (Helsinki, 1978), pp. 331–337, Helsinki, 1980. Acad. Sci. Fennica. [142] Ihsen Yengui. Making the use of maximal ideals constructive. Theoret. Comput. Sci., 392:174–178, 2008. [143] R. Zach. The practice of finitism. Epsilon calculus and consistency proofs in Hilbert’s program. Synthèse, 137:211–259, 2003. [144] R. Zach. Hilbert’s program then and now. In D. Jacquette (ed.), Handbook of the Philosophy of Science, volume 5, Philosophy of Logic. Elsevier, 2006.
List of Contributors
Andrew Arana is associate professor of philosophy and mathematics at the University of Illinois at Urbana-Champaign. He works in logic and the history and philosophy of mathematics, with particular interest in the interactions between different areas of mathematics such as geometry and algebra, and the role of language in mediating such interactions.
Patricia Blanchette is Professor of Philosophy at the University of Notre Dame. Her research centers on the history and philosophy of logic and mathematics, and the history of analytic philosophy. Recent work includes Frege’s Conception of Logic (Oxford University Press, 2012).
Laura Crosilla is Teaching Assistant in the Philosophy Department at the University of Leeds. She has worked on the proof theory of constructive set theory and of operational constructive set theory, applications of constructive set theory to constructive mathematics, and the theorem prover Minlog. She is currently working on the philosophy of constructive mathematics.
Michael Detlefsen is McMahon-Hank Professor of Philosophy at the University of Notre Dame and longtime editor of the Notre Dame Journal of Formal Logic. He has published on a variety of topics in logic and the history and philosophy of mathematics. Recently, his special interest has been meaningful points of connection with philosophy in the history of mathematics.
Godehard Link is a retired professor of logic and philosophy of science at the University of Munich. He is a member of the Munich Center for Mathematical Philosophy (MCMP). His research interests include Logic, Philosophy of Mathematics, Russell Studies, Logic and Semantics of Natural Language, Formal Ontology, and Foundations of Probability. He served as the speaker of the German group of the TransCoop project.
Felix Mühlhölzer is Professor of Philosophy at the Georg-August University Göttingen and has written mainly on subjects in the philosophy of science and on Wittgenstein, especially Wittgenstein’s philosophy of mathematics. His publications include Braucht die Mathematik eine Grundlegung? Ein Kommentar des Teils III von Wittgensteins Bemerkungen über die Grundlagen der Mathematik (Klostermann, 2010) and Wissenschaft (Reclam, 2011).
Karl-Georg Niebergall is Professor for Logic and Philosophy of Language at the Humboldt University Berlin. He studied Logic and Philosophy of Science, Mathematics, Physics and Philosophy at the TH Darmstadt
and at LMU Munich. He received his doctorate in Philosophy, and his Habilitation in Philosophy and in Logic and Philosophy of Science, at LMU. In 1997 he was a visiting scholar at Stanford University. His main areas of research are logic, philosophy of language, ontology, and philosophy of science.
Marek Polański studied philosophy at Adam Mickiewicz University in Poznań and logic and philosophy at LMU Munich, where he received his PhD in Logic and the Methodology of Science. His areas of interest and specialization include modal logic (especially epistemic logic and counterpart semantics), philosophy of language, and model theory and its applications in analytic philosophy. He is associated with the LMU in Munich and the Humboldt University in Berlin.
Daniel Roth is Lehrbeauftragter in Logic and Philosophy of Mathematics at the University of Munich and a member of the Munich Center for Mathematical Philosophy. He also teaches mathematics at a secondary school in Munich.
Matthias Schirn is a retired professor of Philosophy at the University of Munich and a member of the Munich Center for Mathematical Philosophy. His research interests are in the philosophy of logic and mathematics, the philosophy of language, intensional semantics, epistemology and the philosophical and logical theories of Frege, Russell, Wittgenstein and Hilbert. He has published numerous articles in international journals and has held visiting positions at the universities of Oxford, Cambridge, Harvard, Berkeley, Minnesota (Twin Cities), Rio de Janeiro (UFRJ), São Paulo (PUC-SP), Campinas (UNICAMP), Mexico (UNAM, IIF), Buenos Aires (National University) and other universities in Europe and North and South America.
Gregor Schneider is a mathematician and philosopher and a member of the Munich Center for Mathematical Philosophy. He works in the history and philosophy of mathematics and has a special interest in modern and ancient foundations of mathematics.
Peter Schuster is Lecturer in Mathematical Logic at the University of Leeds. His contributions include reverse and choice-free mathematics, formal topology, and constructive set theory. In 2008 the Humboldt Foundation awarded him a Feodor Lynen Research Fellowship.
Paul Ziche holds a chair for the history of modern philosophy at Utrecht University, The Netherlands. His research topics include the history of philosophy, with a focus on the periods around 1800 and 1900; German Idealism; the interaction between philosophy and the sciences; the history of psychology; and alternative lines in the history of logic, with a focus on the period around 1900.
Name Index
Ackermann, Wilhelm, 162, 286, 291–295, 300–306, 309, 353
Aczel, Peter, 151, 287
Adams, William W., 376
Adeleke, Samson A., 80, 91
Angelelli, Ignacio, 217
Apostoli, Peter, 275, 295
Appel, Kenneth, 7
Arai, Toshiyasu, 360, 363
Arana, Andrew, 315–318, 327, 329, 330
Aristotle, 25, 64–68
Atiyah, Michael F., 373
Austin, John, 39, 46, 141
Avigad, Jeremy, 326, 327, 352, 363, 364, 368
Bacon, Francis, 11
Badesa, Calixto, 188
Baer, Reinhold, 286, 288
Bakker, Arthur, 286, 287
Balaguer, Mark, 169, 170
Balbes, Raymond, 373
Banaschewski, Bernhard, 376, 379–381, 384
Bar-Hillel, Yehoshua, 275
Barhoumi, Sami, 352, 365
Barr, Michael, 370
Barwise, Jon, 130, 134
Baumann, J. Julius, 46
Beaney, Michael, 108
Becker, James C., 3
Beckmann, Arnold, 376
Bellotti, Luca, 196
Benacerraf, Paul, 173, 352, 357
Berger, Ulrich, 366
Bernays, Paul, 14, 195, 196, 266, 280–282, 293–295, 297,
298, 300, 301, 306, 307, 352, 353, 357, 358
Bezem, Marc, 369, 371, 372
Biermann, Otto, 34
Birkhoff, Garrett, 368
Bishop, Errett, 355, 365, 366, 368, 373, 398
Black, Max, 116
Blanchette, Patricia, 97, 105, 108
Blau, Ulrich, 288
Blumenthal, Otto, 13
Boltzmann, Ludwig, 142
Bolzano, Bernard, 11
Boole, George, 125, 126, 214, 215, 220
Boolos, George, 141, 189, 199
Booth, David, 286
Borchert, Donald M., 14
Borges, Jorge Luis, 209
Bourbaki, Nicolas, 330, 331
Bradley, Francis, 126, 144, 167
Breger, Herbert, 286–288, 291
Bridges, Douglas S., 363, 365, 366
Brouwer, Luitzen E. J., 22, 212, 355, 365, 389
Brown, Bryson, 213
Brown, H. C., 19
Buchholz, Wilfried, 358
Buekenhout, Francis, 338
Bueno, Otávio, 170
Burge, Tyler, 193
Burgess, John, 170, 189, 199
Buss, Sam, 365
Butts, Robert E., 342
Buzzard, Kevin, 329
Byers, William, 210
Cantini, Andrea, 210
Cantor, Georg, 25–27, 31–35, 49–58, 60, 62, 71, 125, 167,
172, 174, 275–277, 281, 282, 286, 291–293, 296, 302, 305, 308, 359, 382
Carnap, Rudolf, 217, 218, 222, 223
Cass, Daniel, 328
Cassirer, Ernst, 211, 217, 222, 223
Cauchy, Augustin-Louis, 212
Cederquist, Jan, 352, 365
Cegielski, Patrick, 324
Chebyshev, Pafnuty, 326
Chevalley, Claude, 329, 330
Chomsky, Noam, 235
Church, Alonzo, 146, 157, 288
Cogdell, James W., 329
Cohen, Hermann, 25, 47, 48
Coolidge, Julian L., 19
Cooper, Robin, 134
Coquand, Thierry, 351, 352, 365, 368, 369, 371, 372, 390, 391
Coste, Michel, 352, 365, 369, 371, 372
Costreie, Sorin, 32
Coxeter, Harold S. M., 1, 5, 6, 8
Crosilla, Laura, 351, 352, 365, 388, 389
Crowell, Richard H., 194
Currie, Gregory, 27, 34, 41, 42, 59
D’Aquino, Paola, 326, 327
Darrigol, Olivier, 223
de la Vallée-Poussin, Charles, 315
De Morgan, Augustus, 220
Dean, Walter, 199
Dedekind, Richard, 26, 27, 33, 49, 62, 139, 142, 174, 187–191, 367, 373, 387
Denton, William W., 19
Derbyshire, John, 212
Desmond, Will, 338
Detlefsen, Michael, 1, 123, 166, 211, 212, 214, 217, 219, 220, 315–318, 327, 329,
330, 351, 357
Dieudonné, Jean, 140, 387
Dirac, Paul, 198
Dirichlet, Johann, 139, 329
Dowling, Linnaeus W., 3
Drake, Frank R., 297
Dreben, Burton, 107
Driesch, Hans, 217
Ducos, Lionel, 352, 365, 397
Duhem, Pierre, 168, 169
Dummett, Michael, 34, 80, 86, 91
Dwinger, Philip, 373
Easwaran, Kenneth, 279
Ebbinghaus, Heinz-Dieter, 230, 245, 267
Ebert, Philip, 27, 32, 71
Edwards, Harold M., 355, 367, 372, 373
Edwards, Paul, 67
Eisenbud, David, 393–397
Enderton, Herbert B., 327
Español, Luis, 388
Euclid, 25, 43, 44, 47, 64, 65, 68
Euler, Leonhard, 25, 26, 64, 67, 68, 329
Evans, E. Graham, 393–397
Ewald, William B., 11, 188, 189
Feferman, Solomon, 166, 172, 352, 358, 360–368
Feynman, Richard P., 198
Field, Hartry, 168, 169, 351
Finsler, Paul, 286–291, 308
Fitting, Melvin, 344
Fontana, Marco, 352, 365, 368, 383
Forster, Otto, 394
Forster, Thomas, 286
Foucault, Michel, 209
Fox, Ralph H., 194
Fraenkel, Abraham, 242, 245, 275, 277, 280, 281
Frege, Gottlob, 19, 25–42, 44–54, 56, 57, 59–91, 97–116,
119, 121–123, 125–128, 132, 134, 136, 139–143, 146–148, 151, 152, 166, 170, 171, 173, 184, 192, 214, 216, 217, 277, 282, 297, 328
Friederich, Simon, 194, 203
Friedman, Harvey, 326, 351, 352, 362–364
Friedman, Michael, 223
Furstenberg, Harry, 316, 318, 327–331, 333
Gajda, Adam, 341
Gambino, Nicola, 379
Gandon, Sébastien, 220
Gauss, Carl Friedrich, 68–70, 166, 211, 212
Geach, Peter, 116
Gentzen, Gerhard, 358–360, 363, 397
Gergonne, Joseph D., 2
Gerla, Giangiacomo, 338
Gillies, Donald, 286–288, 291
Glaz, Sarah, 377
Gödel, Kurt, 20, 21, 119, 124, 160, 162–166, 168, 175, 176, 280–282, 286, 296, 298, 306, 316, 318, 327, 331–333, 351, 353, 358
Goldfarb, Warren, 107
Gottlieb, Daniel H., 3
Gottlob, Georg, 358
Gowers, Timothy, 329
Granville, Andrew, 329
Grassmann, Hermann, 215, 218, 220
Grattan-Guinness, Ivor, 119, 121, 125, 145, 152, 212, 220
Griffin, Nicholas, 120
Griffor, Edward R., 365
Grothendieck, Alexander, 378, 386, 387
Gurwitsch, Aaron, 218
Guyer, Paul, 37
Hadamard, Jacques, 315
Haddock, Guillermo E. R., 218
Hahn, Lewis E., 172
Hájek, Petr, 326, 369, 371, 372
Haken, Wolfgang, 7
Halbach, Volker, 183, 192, 199, 200
Hale, Bob, 121
Hallett, Michael, 184, 187, 316
Hamilton, William, 215, 220
Hankel, Hermann, 25–27, 34, 42–47, 63, 211, 222
Hardy, Godfrey H., 326
Harris, Joe, 387
Hartimo, Mirja H., 217, 218
Hazen, Allen P., 162, 163
Heath, Thomas L., 43
Heck, Richard G. Jr, 70
Heine, Eduard, 26, 34, 49, 50
Hempel, Carl G., 218
Herder, Johann Gottfried, 210
Heyting, Arend, 365
Hilbert, David, ix, 10, 13, 19–21, 72, 97, 98, 103–107, 110–116, 137, 157, 162, 164, 184, 185, 191, 193, 201, 203, 205, 211, 213, 216, 220, 288, 305, 306, 316, 351–358, 367, 368, 373, 375, 412
Hill, Claire O., 218
Hinnion, Roland, 275, 295
Hintikka, Jaakko, 342
Hochster, Melvin, 379
Hodges, Wilfrid, 188–190, 192, 193
Holmes, Melvin R., 275, 285, 286, 291, 295
Horsten, Leon, 183, 186, 192, 199, 200
Horstmann, Rolf-Peter, 126
Huntington, Edward V., 14, 17, 20, 36, 38, 69, 70
Husserl, Edmund, 34, 209–211, 217–220, 222, 355
Hylton, Peter, 120, 121, 126, 128,
133, 147
Ihmig, Karl-Norbert, 217
Illigens, Eberhard, 51
Ingham, Albert E., 315, 316
Ireland, Kenneth, 329
Irvine, Andrew D., 298
Isaacson, Daniel, 331–333
Ishihara, Hajime, 362, 363, 397
Jacobsson, Carl, 376
Jacquette, Dale, 39, 46, 47, 352, 355, 358
James, Ioan M., 3
Jané, Ignacio, 276
Janusz, Robert, 199
Jech, Thomas, 297
Jeffrey, Richard C., 189, 199
Jensen, Ronald B., 285
Jevons, William S., 220
Johnstone, Peter T., 373, 379, 381, 384
Jourdain, Philip, 104, 106, 125, 144, 145, 152
Joyal, André, 367, 381, 382, 388
Kalderon, Mark E., 168
Kanamori, Akihiro, 164, 294, 298
Kanda, Akira, 275, 295
Kant, Immanuel, 36–39, 42, 44, 46–48, 63, 168, 174, 210, 215, 355
Kaye, Richard, 199, 230
Kelley, John L., 284
Kitcher, Philip, 173
Klaua, Dieter, 231, 246
Klein, Felix, 1, 217, 221
Klement, Kevin C., 146
Kneser, Martin, 394
Koellner, Peter, 175, 304
Kohlenbach, Ulrich, 364
Kreisel, Georg, 331–333, 352, 357, 358, 361, 362, 364, 367, 368
Kronecker, Leopold, 355, 364, 367, 373, 390, 394
Krull, Wolfgang, 367
Kunen, Kenneth, 230, 297
Kunz, Ernst, 393, 397
Kuratowski, Kazimierz, 246
Lakoff, George, 123, 173
Lambán, Laureano, 388
Landau, Edmund, 192
Landini, Gregory, 120, 122, 138, 154–157, 159, 162, 167
Lavine, Shaughan, 137, 138, 258
Leibniz, Gottfried Wilhelm, 132, 137
Leighton, Robert B., 198
Leitgeb, Hannes, 288
Leitsch, Alexander, 358
Leng, Mary, 169
Levy, Azriel, 231, 275, 279, 284, 292, 293, 297, 300
Lewis, Albert C., 215, 222
Libert, Thierry, 275, 295
Liesen, Jörg, 215
Ling, George H., 1
Link, Godehard, 119, 120, 174, 192
Linnebo, Øystein, 170
Linsky, Bernard, 120
Lipschitz, Rudolf, 188
Löfwall, Clas, 376
Lombardi, Henri, 351, 352, 365, 368, 372, 373, 383, 388, 390
Lorenzen, Paul, 257
Loustaunau, Philippe, 376
131, 160, 229, 124,
282, 298,
167,
355, 379,
Macdonald, Ian G., 373
Maddy, Penelope, 280, 298, 299
Maiocchi, Roberto, 169
Majer, Ulrich, 316
Makin, Gideon, 151
Makkai, Michael, 369
Mancosu, Paolo, 330, 352
Mannoury, Gerrit, 211, 214, 218
Marshall, Victoria M., 302
Martin-Löf, Per, 355
Marty, Anton, 35, 36, 39
Mathews, George B., 1, 4
Mazur, Barry, 329
McGuinness, Brian F., 196
McLarty, Colin, 329–331
Meinong, Alexius, 215
Menger, Karl, 389
Mines, Ray, 372, 373, 375–377, 393, 394
Miranda, Annamaria, 338
Mirimanoff, Dmitry, 277
Mitchell, Ulysses G., 19
Montague, Richard, 134, 293
Moore, Andrew W., 229
Moore, George E., 119, 125, 126, 128, 132, 133
Morse, Anthony P., 284
Mostowski, Andrzej, 284
Mühlhölzer, Felix, 183, 191, 194, 202, 203
Multatuli (Dekker, Edward D.), 209–211
Mundici, Daniele, 358
Musil, Robert, 210
Mycielski, Jan, 258
Myers, Dale, 343
Nagel, Ernest, 210, 217, 218, 221, 222
Negri, Sara, 398
Neumann, Peter M., 80, 91
Neurath, Otto, 218
Newton, Isaac, 25, 35, 42, 46, 47, 59
Niebergall, Karl-Georg, 229, 233, 241, 243, 246, 247, 253–256, 353, 357, 382
Noether, Emmy, 367, 376
Norman, Jean, 213
Novak, Ilse L., 296
Núñez, Rafael, 123, 173
Oberschelp, Arnold, 294, 295
O’Hara, Charles W., 19, 20
Olszewski, Adam, 199
Osswald, Horst, 366
Ostwald, Wilhelm, 209, 217, 218, 222, 223
Palmgren, Erik, 365, 398
Parikh, Rohit, 326
Paris, Jeffrey B., 327
Parsons, Charles, 183, 193, 196–198, 201–204, 357
Pascal, Blaise, 28
Pasch, Moritz, 13
Peacock, George, 211, 220
Peano, Giuseppe, 13, 56, 57, 121, 125, 132, 135, 151, 153, 216
Pearce, David, 341
Peirce, Charles Sanders, 220
Perdry, Hervé, 352, 365, 376, 395, 396
Perron, Oskar, 394
Perry, John, 130
Petschke, Hans-Joachim, 215
Pickford, Alfred G., 1
Pierpont, James, 123, 166
Poincaré, Henri, 155, 355
Polański, Marek, 337
Powell, William C., 301
Priest, Graham, 213
Pudlák, Pavel, 326, 365
Putnam, Hilary, 167, 168, 186, 192, 196, 352, 357
Pycior, Helena M., 210
Quine, Willard V., 121, 122, 150, 155, 157, 160, 162, 168, 172, 174, 235, 258, 279, 282–285, 290, 296–299, 308
Quinon, Paula, 199
Quitté, Claude, 352, 365, 373, 388
Ramharter, Esther, 210
Rathjen, Michael, 352, 359, 360, 363–365
Reck, Emil H., 352, 363
Reck, Erich H., 108
Reinhardt, William N., 292, 300–305
Remmert, Reinhold, 166, 212
Resnik, Michael, 27
Reye, Theodor, 1
Reyes, Gonzalo E., 369
Richman, Fred, 363, 365, 366, 372, 373, 375–377, 393, 394, 396
Ricketts, Thomas, 107–110
Rieger, Adam, 286
Rinaldi, Davide, 389, 397
Robinson, Julia, 324, 325
Robinson, Raphael M., 279, 280
Rodríguez-Consuegra, Francisco, 119, 126, 133
Romero, Ana, 388
Rosen, Gideon, 169
Rosen, Michael, 329
Ross, William D., 65
Rossberg, Marcus, 27, 32, 72, 77
Rosser, John B., 296
Rota, Gian-Carlo, 387
Roth, Daniel, 275, 307
Routley, Richard, 213
Roy, Marie-Françoise, 352, 365, 388, 389
Royce, Josiah, 17
Rubio, Julio, 388
Ruitenburg, Wim, 372, 373, 375–377, 393, 394
Russ, Steve, 215
Russell, Bertrand, 25, 35, 56–59, 91, 119–169, 171, 175–177, 192, 209, 210, 214–216, 219–222, 284, 288, 328, 337–339
Salou, Maimouna, 352, 365
Sambin, Giovanni, 373, 379, 387
Sands, Matthew, 198
Saunderson, Nicholas, 11
Scanlan, Michael, 66, 67
Scheffler, Israel, 235
Schiemer, Georg, 291
Schilpp, Paul A., 163
Schirn, Matthias, 25, 28, 29, 31, 41, 53, 62, 353, 357
Schlimm, Dirk, 188
Schmidt, Jürgen, 231, 246
Schmitz, H. Walter, 212
Schneider, Gregor, 275
Scholz, Erhard, 212
Schönfinkel, Moses, 148
Schubert, Hermann, 34
Schuhmann, Elisabeth, 211
Schuhmann, Karl, 211
Schuster, Peter M., 351, 376, 379, 387
Schütte, Kurt, 150, 358
Seidenberg, Abraham, 376, 393
Setzer, Anton, 365
Shapiro, Stewart, 182, 186, 188, 193, 195, 366
Shelah, Saharon, 175
Shoenfield, Joseph, 301
Sieg, Wilfried, 188, 352, 357
Sigstam, Inger, 379
Simons, Peter, 27, 79, 85, 89, 90
Simpson, Stephen G., 351, 352, 362, 363
Skolem, Thoralf, 165, 195, 196, 277, 280, 281, 286
Sluga, Hans, 27
Smart, E. Howard, 4
Smith, David E., 1
Smith, Jan M., 365
Smith, Kemp, 37
Solovay, Robert, 164
Sommer, Richard, 357
Specker, Ernst, 3, 285
Sraffa, Piero, 196, 197
Stanley, Jason, 108
Stegmüller, Wolfgang, 284
Stoltenberg-Hansen, Viggo, 372
Stolz, Otto, 63, 68
Storch, Uwe, 393, 394, 397
Sullivan, Peter, 108
Sylvester, James J., 327
Szabo, Manfred E., 359
Szczerba, Lesław, 342–344
Tait, William W., 34, 171, 174, 185, 205, 326, 357, 358, 363
Takeuti, Gaisi, 230, 267, 359, 360
Talcott, Carolyn, 357
Tappenden, James, 34, 98, 104, 108, 112
Tarski, Alfred, 72, 73, 189, 284, 337, 338
Taussky-Todd, Olga, 329
Tennenbaum, Jonathan B., 376
Tennenbaum, Stanley, 199
Tharp, Leslie, 173
Thomae, Johannes, 26, 34, 61
Thurston, William P., 183, 205
Tierney, Myles, 380
Torretti, Roberto, 43
Tucker, John V., 372
Urysohn, Pavel S., 389
Vahlen, K. Theodor, 394
Vaihinger, Hans, 168
Valdés-Villanueva, Luis M., 369, 371, 372
van Atten, Mark, 355
van Benthem, Johan, 341
van der Waerden, Bartel L., 391, 394
van Fraassen, Bas C., 168, 174
van Heijenoort, Jean, 23, 107, 150, 164, 182, 207
Veblen, Oswald, 1–3, 13, 17, 19, 20
Veldman, Wim, 362, 363
Vickers, Steven, 373, 379
von Kutschera, Franz, 25, 35, 72, 89, 90
von Neumann, John, 277–281, 283–285, 294, 295, 298, 306, 353
von Plato, Jan, 358
Wang, Hao, 279, 280, 283, 286, 292, 296, 298, 302–304
Ward, Dudley R. B., 19, 20
Weber, Heinrich, 187–189, 329, 387
Weber, Michel, 338
Weierstrass, Karl, 27, 123
Weiss, Paul, 20
Wentworth, George, 1
Westerståhl, Dag, 369, 371, 372
Weyl, Hermann, 355
Whitehead, Alfred N., 13, 14, 17, 20, 57, 58, 107, 121, 125, 129, 139, 151, 152, 156–163, 209, 217, 220–222, 337–339
Wiener, Hermann, 13
Wildenberg, Gerald, 328
Wilkie, Alex J., 327
Wittgenstein, Ludwig, 119, 147, 183–186, 190, 191, 194–197, 203
Woleński, Jan, 199
Wood, Allen W., 37
Woodin, Hugh, 160, 174, 175
Woods, Alan R., 327
Wraith, Gavin C., 369, 370, 382
Wright, Crispin, 121
Wright, Edward M., 326
Yengui, Ihsen, 352, 365
Young, John W., 1–3, 13, 17, 19, 20
Zach, Richard, 352, 355, 357, 358
Zalta, Edward N., 365, 366
Zappe, Júlia, 376
Zaring, Wilson M., 230, 267
Zdanowski, Konrad, 199
Zermelo, Ernst, 164, 277, 280, 285, 297, 301
Ziche, Paul, 209, 211, 215, 217, 218, 222, 223
Ziegler, Renatus, 286, 287