Synthese (2011) 183:1–5 DOI 10.1007/s11229-009-9671-0
The axiomatic method, the order of concepts and the hierarchy of sciences: an introduction

Arianna Betti · Willem R. de Jong · Marije Martijn
Received: 25 July 2009 / Accepted: 11 August 2009 / Published online: 6 October 2009 © The Author(s) 2009. This article is published with open access at Springerlink.com
This special issue of Synthese, 'The Classical Model of Science II: The axiomatic method, the order of concepts and the hierarchy of sciences', follows up on the previous issue, 'The Classical Model of Science I: A millennia-old model of scientific rationality'. Both issues centre on the role, the significance and the impact of the axiomatic ideal of scientific knowledge in the history of philosophy. The first issue focused on the relation between axiomatics and a number of issues in the development of logic, mathematics, and the methodology and philosophy of science in Aristotle, Proclus, the seventeenth century, Kant, Bolzano, Frege and Leśniewski. The papers collected in this second issue continue, on the one hand, to investigate that relation in Kant and Bolzano, extending it up to the present day via mathematicians such as Schröder, Dedekind, and Birkhoff; on the other hand, they extend that investigation to related and current issues concerning the empirical sciences, in a systematic evaluation of modern (formal) axiomatic conceptions of science. The contributions in both issues take their cue from the axiomatic ideal in question as captured in the 'Classical Model (or Ideal) of Science' (de Jong and Betti 2008):

(1) All propositions and all concepts (or terms) of S concern a specific set of objects or are about a certain domain of being(s).
(2a) There are in S a number of so-called fundamental concepts (or terms).
(2b) All other concepts (or terms) occurring in S are composed of (or are definable from) these fundamental concepts (or terms).
(3a) There are in S a number of so-called fundamental propositions.
(3b) All other propositions of S follow from or are grounded in (or are provable or demonstrable from) these fundamental propositions.
(4) All propositions of S are true.
(5) All propositions of S are universal and necessary in some sense or another.
(6) All propositions of S are known to be true. A non-fundamental proposition is known to be true through its proof in S.
(7) All concepts or terms of S are adequately known. A non-fundamental concept is adequately known through its composition (or definition).

These seven conditions systematize desiderata of meaningfulness, economy, definability, ground and consequence, truth, necessity, and knowability of the propositions and concepts (or sentences and terms) of a real or proper science. Conditions (1–5) relate primarily to the ontological 'order of things' (ordo essendi); they can be divided into what we call the 'Domain Postulate' (1), the 'Postulate of Order' (2a, 2b; 3a, 3b), the 'Truth Postulate' (4), and the 'Necessity Postulate' and 'Universality Postulate' (5). Conditions (6) and (7), which together form the 'Postulate of (Grounded) Knowledge', instead regard the epistemological 'order of knowledge' (ordo cognoscendi). It is worth stressing that what is at issue here is an explanatory ideal of scientific knowledge: what follows from the principles (axioms and definitions) is explained by those principles, which function as grounds. The relation of 'consequence' (or, in all generality, of 'following') in (3b) should be understood accordingly. The Classical Model of Science as set down in conditions (1–7) concerns most immediately an organized system of known truths from the point of view of the context of justification; it is, however, also relevant to the context of discovery (ars inveniendi/methodus inventionis), insofar as it can serve as a normative guide for finding scientific truths, for instance as the ideal structure to be aimed at in seeking to establish axioms. Conditions (1–7) also provide a common interpretive grid and terminology, variously taken up by the papers which follow.

Hein van den Berg's 'Kant's conception of proper science' analyses Kant's take on the Postulate of Order and on the Necessity Postulate. The Postulate of Order concerns the systematic ordering of concepts by means of definition (i.e. 2a, 2b) and of propositions or judgements by means of relations of grounding (i.e. 3a, 3b); the Necessity Postulate, on one construal, concerns the constraint that the judgements of a proper science must be grounded in a priori principles, which is part of (5). Both postulates underlie Kant's conception of a proper science, which van den Berg analyses as any body of cognition which (i) is organized into a complete systematic whole, (ii) expresses objective ground-consequence relations and (iii) has a priori principles from which the non-fundamental judgements of the science can be proven. On the basis of this analysis van den Berg offers an interpretation of Kant's notion of mathematization, i.e. the claim that any real science should be mathematical, which enables him to explain two things: the specifically foundational role of mathematics with respect to physics, and the special status of the mathematical natural sciences as the only real natural sciences.
The epistemic conditions on the knowability of propositions and terms (i.e. 6, 7), and in particular Bolzano's views on the knowability of axioms in the process of discovery, are at the core of Anita Konzelmann's 'Bolzanian knowing: Infallibility, virtue and foundational truth', based on the little-studied epistemological sections of Bolzano's Wissenschaftslehre. In contemporary terms Bolzano emerges from Konzelmann's analysis as a virtue epistemologist "in its responsibilist version, which combines reliabilist concerns for epistemic success with internalist concerns for agentive responsibility for knowledge". For Bolzano, knowing that a proposition is an axiom means asserting thus, and truly evaluating one's asserting thus as infallible. The latter requires a (non-assertive) account of the evaluation of an assertion as genuinely immediate; but since there are no structural means to distinguish immediate from mediate assertions, Bolzanian knowledge seems to collapse into a very high degree of believing. Can we avoid this? Not exactly. Konzelmann argues that our best option is to take Bolzanian knowing to involve the best possible evaluation of an assertion's fallibility while relying on certain salient virtues of the knower, such as trust and prudence.

The context of discovery also figures prominently in 'On the creative role of axiomatics. The discovery of lattices by Schröder, Dedekind, Birkhoff, and others', by Dirk Schlimm, which centres on the Domain Postulate (i.e. condition (1)). Schlimm discusses the independent introduction of lattices by Schröder, Dedekind, Birkhoff and others as examples illustrating three different ways in which axiomatic systems can creatively lead to the discovery of new domains of objects of investigation: by analogy, by abstraction and by modification. In the first case we start from an observed similarity between certain domains and capture the similarity by setting up a common axiom system; in the second case we select properties of a given domain, capture these properties axiomatically, and finally identify new domains as those which also satisfy the axioms in question; in the third case we modify an axiom system directly, by adding, deleting or changing one or more axioms, and then use the system thus obtained to define a new domain. The method of abstraction in particular was made possible by the new conception of formal axiomatic systems which emerged at the end of the nineteenth century, in which primitives are conceived as reinterpretable.

This new, formal conception of axiomatics is also interestingly connected with the birth of model theory. The model-theoretical approach to axiomatics, in particular the model-theoretical notion of logical consequence, rather than the proof-theoretical one, forms the specific take on (3a, 3b) that Jaakko Hintikka champions in 'What is the axiomatic method?'. An axiomatic system in the modern sense, says Hintikka, is the study of a certain class of structures, i.e. its models. The derivation of theorems from axioms can produce only new 'surface' (or explicit) information, not new 'depth' (or implicit) information; that is, it can only produce information that can be read off a sentence without non-trivial deductive aids. Axioms cannot make theorems any more certain, and in this sense they occupy no epistemologically privileged position. Still, axioms can fulfill an explanatory role with respect to the theorems. A partial measure of this explanatory ability of axioms is their simplicity.
Hintikka’s proposal may be read as the modern counterpart of the traditional constraint that the relation of consequence between axioms and theorems
is one in which axioms ground their theorems, as discussed in van den Berg's and Konzelmann's papers on Kant and Bolzano.

Like Hintikka's paper, the last two papers of this issue offer a broad discussion of (formal) axiomatic views of science in general and are model-theoretically oriented; they lay more emphasis, however, on (the philosophy of) empirical science. F. A. Muller's 'Reflections on the revolution at Stanford' is a critical discussion of the revolutionary change brought about by Patrick Suppes' view of the nature of scientific theories ('the Model Revolution'). Muller's criticism relates to the Domain and the Truth Postulates (conditions (1) and (4) above). A scientific theory on Suppes' 'Informal Structural View' is a set of set-structures defined in the language of pure set theory. But, Muller counters, a scientific theory is supposed to give us scientific knowledge in the sense of scientifically justified true propositions about concrete actual beings. A Suppesian scientific theory glosses over how to discern which data are relevant for which theory, and it reserves no place for beings, for language, or for the relations between the structures that constitute the theory and the beings the theory is supposed to be about. Muller proposes to solve these problems and to complete 'the Model Revolution' by what he calls 'the Structural View', which involves, in Muller's opinion, a better characterization of what scientists call a 'model', that is, a structure meeting a certain set-theoretical predicate (the Suppes-predicate) together with all its formulations. A discussion of problems concerning the currently much-debated notion of scientific representation concludes the paper.

In his response to Muller, 'Future development of scientific structures closer to experiments', Patrick Suppes discusses a different direction for future developments of his Informal Structural View, one which aims at solving a number of problems in the philosophy of science, especially of experimental science. Suppes pleads for staying in touch as much as possible with the actual practices of science at the level of measurement, observation and computation, and with how these practices "should be reflected back into the theory when the limitations imposed by errors or environmental variations are taken seriously". Suppes draws attention to the contrast between experiment-talk and theory-talk; to the non-verbal aspects of experimental practice and the way in which experimental methods have influenced the progress of both the natural and the mathematical sciences in the past; and to the fact that invariance as sought after in approaches like Muller's tends to be incompatible with computational simplicity, while it is often the latter that is preferred in scientific practice. Suppes touches upon four programmatic issues: first, structures which, albeit still formally defined, match actual methods of measurement more closely are to be preferred to classical structures with infinite domains closely matching mathematical ones; secondly, axioms of floating-point arithmetic should replace the classical axioms of arithmetic for computational purposes; thirdly, the ergodic theory of chaos should be used to obtain general results about errors; and the fourth and final programmatic step is one towards a constructive, non-standard foundation of analysis.
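Suppes' second programmatic point can be made vivid with a small illustration of our own (not taken from his paper): floating-point arithmetic violates even the associativity axiom of classical arithmetic, so an axiomatization faithful to actual computation must differ from the classical one. A minimal Python sketch:

# Minimal illustration (ours, not from Suppes' paper): floating-point
# addition violates the associativity axiom of classical arithmetic.
a, b, c = 1e16, -1e16, 1.0

left = (a + b) + c   # (1e16 + -1e16) + 1.0  ->  0.0 + 1.0  ->  1.0
right = a + (b + c)  # -1e16 + 1.0 rounds back to -1e16, so the sum is 0.0

print(left, right, left == right)  # 1.0 0.0 False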
The publication of this second special issue concludes a project on the Classical Model of Science partly funded by the Netherlands Organisation for Scientific Research (project 275-80-001) and by the European Research Council (project 203194). The project involved a conference in Amsterdam which, like this issue, was entitled 'The axiomatic method, the order of concepts and the hierarchy of sciences', with an open call for papers, at which earlier versions of the papers by Anita Konzelmann, Jaakko Hintikka, Dirk Schlimm and F. A. Muller were presented. The paper by Hein van den Berg was written especially for submission to this issue. The paper by Patrick Suppes was originally a long reply to Muller's paper in the form of a letter to the author; we invited Patrick Suppes to submit the reply in paper form.

We wish to thank Stefan Roski and Jan Willem Wieland for meticulous editorial assistance on the final versions, and the anonymous referees for their precious time and kind help with this project. Finally, thanks to Jan Woleński, and to the editors of Synthese, especially John Symons, for having believed in this project and for supporting it by hosting both special issues in the journal.

Open Access This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
Reference

de Jong, W. R., & Betti, A. (2008). The classical model of science: A millennia-old model of scientific rationality. Synthese. Online first at http://www.springerlink.com/content/w5340m712258304m. doi:10.1007/s11229-008-9416-5.
Synthese (2011) 183:7–26 DOI 10.1007/s11229-009-9665-y
Kant’s conception of proper science Hein van den Berg
Received: 14 March 2008 / Accepted: 7 April 2009 / Published online: 30 September 2009 © The Author(s) 2009. This article is published with open access at Springerlink.com
Abstract Kant is well known for his restrictive conception of proper science. In the present paper I will try to explain why Kant adopted this conception. I will identify three core conditions which Kant thinks a proper science must satisfy: systematicity, objective grounding, and apodictic certainty. These conditions conform to conditions codified in the Classical Model of Science. Kant's infamous claim that any proper natural science must be mathematical should be understood on the basis of these conditions. In order to substantiate this reading, I will show that only in this way can we explain why Kant thought (1) that mathematics has a particular foundational function with respect to the natural sciences and (2) that mathematics as such secures their scientific status.

Keywords Kant · Proper science · Objective grounding · Mathematics
1 Introduction

The Preface to the Metaphysical foundations of natural science (1786) contains one of Kant's few systematic attempts at defining the notion of a proper science. Kant defines a proper science as a body of cognition that (i) is a system, (ii) constitutes a rational interconnection of grounds and consequences, and (iii) provides apodictically certain cognition. In addition, Kant states that any proper natural science must allow for the application of mathematics.1

1 Kant (1902, IV, pp. 467–471).

The Preface does not contain a detailed analysis of these conditions, nor does it explain why we should accept them. However, the implications
of these conditions are rich. They enable Kant to argue that natural description (the classification of natural kinds), natural history (the historical study of changes within nature), chemistry and empirical psychology are improper sciences. This does not mean that Kant took no active interest in the experimental sciences: recent research has shown that Kant, throughout his life, provided significant philosophical analyses of these sciences.2 This raises the question why Kant adopted his restrictive conception of proper science.

In the present paper I will try to answer this question by describing the conceptual background of Kant's idea of proper science. I will analyze Kant's conditions for proper science one by one and indicate how they are related to each other. I will also argue that several of these conditions correspond to conditions of the Classical Model of Science as set out by de Jong and Betti (2008). In the first section I will discuss the condition of systematicity. In the second section I will discuss Kant's claim that any proper science must provide a rational ordering of grounds and consequences. This claim is sometimes interpreted as stating that any proper science must have a priori principles.3 In my opinion, it can be better understood as stating that any proper science must satisfy a grounding-relation, i.e., provide explanative demonstrations. It is Kant's third condition, discussed in section three, that implies that proper sciences must have a priori principles. Finally, section four will provide an interpretation of the claim that any proper natural science must allow of mathematization.

2 See, for example, the collection of essays in Watkins (2001).
3 Watkins (2007, p. 5), Pollok (2001, pp. 56–62).

2 Systematicity

The first condition that any proper science must satisfy is that of systematicity.4

4 This notion has received considerable attention. For recent discussion see Falkenburg (2000, pp. 376–385), Fulda and Stolzenberg (2001), Guyer (2005, pp. 11–73). My account is indebted to Falkenburg, from whose analysis of Kant's theory of science I have greatly benefited.

In the Critique of pure reason Kant explicates the concept of system as follows:

If we survey the cognitions of our understanding in their entire range, then we find that what reason quite uniquely prescribes and seeks to bring about concerning it is the systematic in cognition, i.e., its interconnection based on one principle. This unity of reason always presupposes an idea, namely that of the form of a whole of cognition, which precedes the determinate cognition of the parts and contains the conditions for determining a priori the place of each part and its relation to the others. Accordingly, this idea postulates complete unity of the understanding's cognition, through which this cognition comes to be not merely a contingent aggregate but a system interconnected in accordance with necessary laws. (Kant 1787, A, p. 645/B, p. 673, original emphasis)

These remarks require explanation. First, note that the systematic unity of cognition is said to be brought about by the faculty of reason. This follows from Kant's conception of reason as a faculty that organizes cognition. In particular, reason logically
orders cognition and thus unifies it.5 The term 'cognition' refers to both concepts and judgments; I will, however, restrict my discussion to concepts. Second, Kant claims that the unity of cognition effected by reason is based on an idea of the "form of a whole of cognition" which postulates a "complete unity" of cognition. Hence, a system of cognition constitutes a complete whole. Finally, Kant claims that the place of the parts (cognitions) within a system of cognition and the relation of these parts to each other is determined a priori in accordance with certain conditions. In the following, we will see that Kant takes a system of cognition to be constructed by following certain logical rules establishing necessary relations among cognitions. The concept 'conditions' refers, among others, to these rules. In short, a system is a complete whole composed of parts that are necessarily related to each other in accordance with rules. As such, it is distinguished from an aggregate.

If we turn our attention to Kant's discussion of systematicity in the Jäsche Logik, it becomes clear that systematicity must be understood as a logical requirement concerning the form of cognition.6 Kant explicates this requirement in the Doctrine of Method of the Logik, which specifies the conditions of scientific cognition in general. Kant describes the requirement of systematicity, together with those of distinctness (Deutlichkeit) and thoroughness (Gründlichkeit), as logical perfections. These perfections provide ideals of scientific cognition.7 With respect to the ideal of systematicity, Kant remarks that the combination of cognitions in a systematic whole depends on the "distinctness of concepts both in regard to what is contained in them and in respect of what is contained under them".8 Here Kant employs traditional logical terminology to elucidate the notion of systematicity. When Kant speaks of that which is contained in a concept, he refers to the totality of partial concepts comprising the intension (Inhalt) of this concept.9 Thus, for example, the partial concepts 'animal', 'rational' and 'mortal' are contained in the concept 'man'. Conversely, when Kant speaks of that which is contained under a concept, he refers to the totality of concepts comprising its extension (Umfang). For example, the concepts 'gold', 'silver', 'copper' and so forth are contained under the concept 'metal', which functions as a characteristic (Merkmal) of these concepts.10 Finally, a concept is distinct if we possess a clear representation of its characteristics, i.e., if we are conscious of the partial concepts contained in this concept. A concept is made distinct by analyzing it.11 Kant's claim that the connection of cognitions into a systematic whole requires the distinctness of concepts in regard to what is contained in and under them can now be understood as follows: systematicity is brought about both by the analysis of the intension of concepts and by the specification of their extension.12 As such, Kant's notion of systematicity, when applied to concepts, expresses the conditions posed on concepts in the Classical Model of Science as described by de Jong and Betti (2008), i.e., that a science S has a number of fundamental concepts and that all other concepts are composed of these fundamental concepts.13

5 Kant (1787, A, pp. 298–302/B, pp. 355–359). Cf. Falkenburg (2000, pp. 376–385).
6 This is also emphasized by Longuenesse. Longuenesse (1998, pp. 149–153).
7 Kant (1902, IX, pp. 139–140).
8 Ibid.
9 Kant (1902, IX, p. 95).
10 Kant (1902, IX, p. 96).
11 Kant (1902, IX, pp. 61–62).
12 Cf. Longuenesse (1998, pp. 150–151).
13 de Jong and Betti (2008).
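The traditional reciprocity of intension and extension described here can be illustrated with a small sketch of our own (identifying a concept with a set of characteristics is our simplification, not Kant's doctrine, and the marks assigned to 'metal' are merely illustrative):

# Toy model (our simplification, not Kant's doctrine): identify a concept
# with its intension, i.e., the set of partial concepts contained IN it.
intension = {
    "man":    {"animal", "rational", "mortal"},  # example from the text
    "animal": {"animal"},
    "metal":  {"body", "fusible"},               # illustrative marks only
    "gold":   {"body", "fusible", "yellow"},
    "silver": {"body", "fusible", "white"},
}

def is_under(lower: str, higher: str) -> bool:
    """'lower' is contained UNDER 'higher' (falls within its extension)
    iff every mark of 'higher' occurs among the marks of 'lower'."""
    return intension[higher] <= intension[lower]

def extension(concept: str) -> set:
    """All concepts in this tiny universe contained under `concept`."""
    return {c for c in intension if c != concept and is_under(c, concept)}

print(is_under("gold", "metal"))   # True
print(extension("metal"))          # {'gold', 'silver'} (order may vary)
print(is_under("man", "animal"))   # True

The sketch also displays the reciprocity the hierarchy trades on: enriching a concept's intension (specification) narrows its extension, while dropping marks (subsumption under a higher genus) widens it.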
123
10
Synthese (2011) 183:7–26
Kant's conception of systematicity is exemplified by hierarchical systems of concepts (trees), proceeding from an elementary concept (genus summum) to more specific and complex concepts by adding differentiae.14 In the appendix to the Transcendental Dialectic of the first Critique Kant endorses a similar conception. There Kant provides a detailed discussion of the logical principles by means of which we establish systematic unity among cognitions. These logical principles function as rules for the construction of systems and are described by Kant as principles of (i) homogeneity, (ii) specification, and (iii) continuity.15 Rule (i) directs us to subsume any concept under a higher and more general concept. Rule (ii) directs us to divide or specify any given concept into more particular concepts (comprising subsets of the former). Finally, by means of rule (iii) we postulate that different levels of concepts within our classification are continuously related, guiding the attempt to specify continuous transitions from one level of concepts to another. By following these principles we order concepts in terms of their extension and intension and obtain a hierarchy of concepts with the "greatest unity alongside the greatest extension".16

How should we understand Kant's claim that a system of cognition should be complete? In the Metaphysik Volckmann it is argued that the completeness of any system requires (a) the specification of upper and lower limits (terminus a priori and terminus a posteriori), and (b) principles by means of which all the parts of a system can be related. Kant presents a closed genealogical tree ordered by the relation 'generated by' as an example of a system.17 Similarly, in constructing a system of concepts we can specify a highest genus and a lowest species (infima species) and relate them in terms of their extension or intension. It is important to note, however, that Kant denies the existence of an infima species: in principle, the specification of any concept can proceed indefinitely, and infimae species are specified by convention.18 Kant also claims that the assumption of the existence of a highest genus is one of reason, for we cannot empirically identify a highest genus. Hence, although in constructing a system we conventionally specify upper and lower limits, we cannot establish their objective reality. The ordo cognoscendi does not necessarily mirror the ordo essendi.

14 The theory of concepts adopted by Kant is analyzed in detail by de Jong (1995, pp. 620–627).
15 Kant (1787, A, pp. 657–658/B, pp. 685–686).
16 Kant (1787, A, p. 643/B, p. 671).
17 Kant (1902, XXVIII, pp. 355–356).
18 Kant (1787, A, p. 655/B, p. 683; 1902, IX, p. 97).

Let us return to the description of the concept 'system' given in the beginning of this section. There, a system was described as a whole consisting of parts. This description secures the generality of the notion of a system, for 'parts' can refer to concepts, judgments or material parts. In addition, Kant stated that a system is complete and that the parts of a system must be necessarily related to each other and to the whole in
accordance with certain conditions. On the basis of our discussion we can now claim that these conditions comprise (i) logical rules or principles by means of which we establish specific relations among cognitions, and (ii) the specification of upper and lower limits of a system. These conditions secure that a system is a complete and ordered whole.

3 Objective grounding

The requirement of systematicity provides a condition that any science must satisfy. In the Preface to the Metaphysical foundations of natural science (1786) Kant takes natural description, natural history, and chemistry to be systematic doctrines. However, he denies them the status of a proper science.19 Hence, systematicity is not sufficient for distinguishing science from science proper. To make this distinction, Kant adds a second condition that any proper science must satisfy. According to Kant, any proper science must be systematically ordered and constitute an interconnection of grounds and consequences. This condition provides a basis for distinguishing mere science from rational science, where being a rational science must be understood as a necessary but not sufficient condition for being a proper science:

Any whole of cognition that is systematic can, for this reason, already be called science, and if the connection of cognition is an interconnection of grounds and consequences, even rational science. (Kant 1902, IV, p. 468)

In other words, any rational science is a system of cognition containing a grounding-relation.20 This condition is similar to the Proof Postulate of the Classical Model of Science, as described by de Jong and Betti (2008), which states that all non-fundamental propositions of a science S are ultimately grounded in fundamental propositions.21 However, in the Model a neat distinction is made between the conceptual and the propositional ordering, and the Proof Postulate is related to the order of propositions or judgments. Kant does not neatly distinguish the order of concepts from that of judgments. Moreover, Kant takes the grounding-relation to obtain between both concepts and judgments. In the following, I will try to identify some core elements of Kant's conception of grounding by analyzing passages from both his pre-critical and critical writings.

19 Kant (1902, IV, pp. 467–468, 471).
20 As indicated in note 3, Pollok and Watkins interpret this condition as claiming that proper sciences must have a priori principles. Pollok further argues that Kant denies that natural description and natural history are proper sciences because they lack a priori principles. This is problematic because: (i) Kant does not criticize these doctrines in these terms, and (ii) Kant seems to allow that chemistry, based on empirical principles, provides a rational interconnection of grounds and consequences (IV, p. 468). More generally, I take this reading to conflate an epistemic condition that proper sciences must satisfy (Kant's third condition) with the condition of grounding, which I interpret as the condition that proper sciences must provide explanative demonstrations reflecting the order of nature. In terms of the Classical Model of Science, Kant's second condition relates to the ordo essendi and not to the ordo cognoscendi. In this context, we may also refer to Friedman (1992b), who in his discussion of the a priori grounding of natural laws interprets grounding solely in terms of epistemic justification.
21 de Jong and Betti (2008).
Kant provides an extensive discussion of the concept 'ground' (ratio) in his New elucidation (1755).22 Here a ground is defined as that "which determines a subject in respect of any of its predicates".23 In addition, Kant defines 'to determine' as "to posit a predicate while excluding its opposite".24 Hence, a ground is a reason for predicating some concept P of a subject-concept S, while excluding not-P. Cognition of grounds is a condition for asserting the truth of judgments, since it provides a reason for asserting a judgment 'S is P' while excluding the contradictory judgment 'S is not P'. In the absence of such cognition there would be no knowledge of truths, since all judgments would be merely taken as possibly true.25 This claim concerns the epistemic function of cognition of grounds but does not capture Kant's grounding condition.

22 Longuenesse has provided detailed accounts of the concept 'ground' in Kant's pre-critical and critical writings. Longuenesse (1998, pp. 345–358; 2001). Unlike Longuenesse, I focus on the role of this notion in Kant's views on scientific explanation.
23 Kant (1902, I, pp. 391–392).
24 Ibid.
25 Kant (1902, I, pp. 393–394).

In the New elucidation, Kant interprets the concepts 'ground' and 'consequence' ontologically, i.e., as referring to existing objects. Hence, strictly speaking, the relation of ground to consequence obtains between objects. This relation can be represented conceptually: a grounding-relation can be represented by relations holding between concepts and by relations holding between judgments.26 Any structure of concepts or judgments can thus express an objective grounding-relation. For example, Kant takes a grounding-relation to be expressed in the judgment "a triangle has three sides".27 The concept 'triangle' provides us with a reason for predicating 'three-sidedness' of it because a triangle is defined as a three-sided figure. Kant provides an example of a grounding-relation expressed by judgments when he distinguishes between an 'antecedently determining ground' and a 'consequentially determining ground'. The former is a ground of being or becoming, the reason why, while the latter is a ground of cognition, the reason that.28 For example: the eclipses of the satellites of Jupiter are a ground for cognizing that light is propagated with a finite velocity, whereas (following Descartes) the elasticity of the globules of the atmosphere in which light is propagated is a ground of being for the finite velocity of light.29 The eclipses of Jupiter's satellites are a consequence of the finite velocity of light and allow us to demonstrate this fact.30 These eclipses are not the cause of the finite velocity of light. Accordingly, they provide us with a ground of cognition, not a ground of being, for the truth that light has a finite velocity. By contrast, Descartes' hypothesis that the propagation of light must be understood as a series of impacts of elastic globules identifies a cause, a ground of
being, for the finite velocity of light. The distinction between a ground of being and a ground of cognition can be related to the distinction between a demonstratio propter quid and a demonstratio quia.31 Since Descartes' hypothesis identifies the ground of being of the finite velocity of light, his account of the velocity of light reflects the objective order of ground and consequence and allows us to give a demonstratio propter quid of this phenomenon. By contrast, cognition of the eclipses of the satellites of Jupiter merely provides subjective justification for the truth that light has a finite velocity. In Kant's terms, a ground of being is the source for the truth of judgments, i.e., a ground for some phenomenon (described by a judgment) to obtain, whereas a ground of cognition "does not bring the truth into being; it only displays it".32

In the New elucidation, Kant took grounding to be a relation that can be expressed by relations holding between concepts and judgments. This view is retained in the critical period. In the Jäsche Logik, Kant argued that a concept can be taken as a ground of cognition with respect to the set of representations comprising its extension.33 For example, the concept 'metal' functions as a ground of cognition with respect to the concepts 'gold', 'silver', etc. Kant's idea is that a genus can function as a ground of cognition for its species: the relation of species to genus provides a ground for cognizing that gold is a metal.

I will try to specify how Kant understood the relation holding between ground and consequence. In the first Critique, Kant explicates the relation between ground and consequence in terms of logical inference:

In every inference there is a proposition that serves as a ground, and another, namely the conclusion, that is drawn from the former, and finally the inference (consequence) according to which the truth of the conclusion is connected unfailingly with the truth of the first proposition. (Kant 1787, A, p. 303/B, p. 360)

Kant takes a logical inference to be a function of thought that relates true judgments and shows that the truth of the conclusion follows from the premise(s). As types of inference Kant lists 'immediate inference', i.e., subalternation, contraposition and the like, and 'mediate inference', i.e., syllogistic inference. If we employ modern terminology and strictly distinguish between logical inference and logical derivability or consequence (which Kant does not), we might say that Kant takes a logical inference to express a relation of logical derivability holding between true judgments, and that the grounding-relation can be understood in terms of derivability among truths. This is problematic, as the notion of grounding is stronger than that of derivability. Grounding p means providing an explanative demonstration of p.34 This is not necessarily the case for a derivation of p. In addition, grounding is a relation obtaining between truths, whereas (from a modern point of view) derivability can obtain between falsities.

26 Many commentators, in discussions of Kant's views on the foundation of scientific cognition, focus exclusively on relations between judgments. Cf. Guyer (2005, pp. 11–55), Friedman (1992b). This is not incorrect but does not do justice to the fact that conceptual orderings can also satisfy grounding relations. This is the case, e.g., for systems of classification given in natural history, though these systems do not express relations obtaining between real grounds and real consequences.
27 Kant (1902, I, p. 392).
28 Kant (1902, I, pp. 391–392).
29 Kant (1902, I, pp. 392–393).
30 Ibid. Cf. Longuenesse (2001, p. 69).
31 de Jong and Betti (2008).
32 Kant (1902, I, p. 394).
33 Kant (1902, IX, p. 96).
34 Cf. de Jong and Betti (2008).
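In modern notation (ours, not the author's), the two difficulties can be put schematically: derivability need not be factive, whereas grounding, as characterized above, is both factive and explanatory:

\[
\alpha \vdash \beta \;\;\text{may hold even when both } \alpha \text{ and } \beta \text{ are false;}
\]
\[
\alpha \text{ grounds } \beta \;\Longrightarrow\; \alpha \text{ and } \beta \text{ are true, and } \alpha \text{ yields an explanative demonstration of } \beta.
\]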
These two difficulties can be resolved by employing Kant's distinction between a ground of cognition and a ground of being. In Kant's view, the logical derivation of a true judgment β from a true judgment α establishes that what is asserted by α is a ground of cognition for the truth of β. However, derivability does not show that α grounds β in the sense of providing an explanative demonstration for the truth of β. This type of grounding requires that α specifies the ground of being for what is asserted by β (as in the case of the Cartesian explanation of the finite velocity of light). It is the latter type of grounding relation that must obtain between scientific cognitions, since science must provide objective explanations representing the order of nature.

This becomes clear in Kant's lectures on metaphysics, the Metaphysik Volckmann. Here, a ground is defined as that which, if it is posited, something else is posited. Kant distinguishes between the relation holding between a logical ground and logical consequence, and that holding between a real ground and real consequence.35 The first relation obtains within analytic judgments, e.g., in the hypothetical judgment "if a being is an animal, it is mortal".36 In such cases, Kant claims that the relation between ground and consequence can be established by means of the principle of identity, i.e., is analytical.37 The truth of this hypothetical can thus be proven logically. Such a proof can be interpreted as establishing a relation between a judgment (the consequent) and a ground of cognition for its truth (expressed in the antecedent), i.e., a ground for cognizing that animal beings are mortal. The ground of being of the mortality of animals is, however, not specified by this logical proof. Kant explicates this by stating that the concept of ground, as pertaining to logic, is "treated in so far it is a ground of cognition".38 If we understand Kant's notion of logical inference as derivability, even this is saying too much. For establishing a relation between a judgment and its ground of cognition via logical proof is tantamount to providing a ground for the truth of the latter, whereas the relation of derivability can hold between false judgments. However, as noted, Kant does not share our modern conception of derivability, for he takes logical inferences to be valid only if the premises are true.39 For this reason, Kant thinks that logical inference allows us to show that what is asserted in the antecedent of a hypothetical judgment is a ground of cognition for the truth of what is asserted in the consequent.

Kant's distinction between real grounds and real consequences indicates that judgments that are not logically inferred can also satisfy a grounding-relation. Thus, α can ground β even if β is not derivable from α. This distinction prohibits us from explaining Kant's notion of grounding solely in terms of derivability.40

35 Kant (1902, XXVIII, pp. 401–402). For a thorough analysis of the notion of ground in the Metaphysik Volckmann, cf. Longuenesse (1998, pp. 354–356).
36 Kant (1902, XXVIII, p. 397).
37 Kant (1902, XXVIII, p. 402).
38 Kant (1902, XXVIII, p. 399).
39 Kant (1902, IX, p. 121).
40 For this reason, I cannot follow Falkenburg, who explicates Kant's notion of 'grounding' in terms of deducibility. Falkenburg (2000, pp. 368–370).

According to Kant, the relation between a real ground and real consequence cannot be established analytically. A real ground is defined as that which "if it is posited, something else is
posited, but not according to the principle of identity".41 Here, the relation is synthetic. This relation obtains, for example, in the hypothetical 'if I have been exposed to the cold, I will come down with the flu'. In this case, a grounding-relation obtains between antecedent and consequent, although the latter cannot be logically inferred from the former. In physics, according to Kant, we are concerned with the relation between real ground and real consequence.42 Thus, judgments of physics that cannot be logically derived from one another can ground each other (express relations between real grounds and consequences). The same holds for mathematical theorems, which are synthetic and do not allow of logical proof. Kant emphasizes that the concept 'real ground' must not be interpreted as a ground of cognition, but as a ground of being.43 This implies that within mathematics and physics we establish relations between judgments that express relations between real grounds and consequences and thus provide demonstrations propter quid. The manner in which Kant takes mathematical judgments to be grounded cannot be explicated within this paper.

Judgments of physics satisfy a grounding-relation because in a proof of physics they can be related in such a manner that they express a relation between cause and effect, which is an instance of a relation between ground of being and consequence. This is clear in the Metaphysik Volckmann, where two methods of proof for the truth of cognitions are distinguished: (i) an a posteriori method, in which one proceeds from cognition of the consequence to cognition of its ground, e.g., observation of the world allows us to prove that God exists; in this case, we specify a ground of cognition for the truth that God exists. (ii) An a priori method, in which we proceed from cognition of the ground to cognition of its consequence. This is the true method of natural science, which consists in specifying causes of effects.44 A proof in which one proceeds, for example, from premises expressing relations between cause and effect to a conclusion expressing the effect would fit this method quite nicely.

41 Kant (1902, XXVIII, p. 403).
42 Ibid.
43 Kant (1902, XXVIII, p. 399).
44 Kant (1902, XXVIII, p. 355). The same conception of scientific demonstration, entitled 'dogmatic proof', is articulated in the Danziger Physik. Cf. Kant (1902, XXIX, pp. 103–104).

In the Metaphysical foundations, natural description is denied the status of a proper science on the basis of Kant's grounding condition.45 This doctrine does not provide "cognition through reason of the interconnection of natural things".46 I take this to mean that natural description does not provide demonstrations propter quid. Natural description is defined as a "system of classification for natural things in accordance with their similarity".47 Kant employs this notion to characterize classifications of natural kinds given in disciplines such as zoology or botany. According to Kant, cognitions making up such classificatory systems are not properly grounded. Take, for example, the taxonomy of organisms based on morphological criteria as given by Linnaeus in his Systema naturae. If we take this taxonomy to be correct, we are provided with a
ground for cognizing the truth that, say, a lion is a feline. However, it does not provide us with a ground of being, a reason why lions are feline. Linnaeus' taxonomy does not provide us with relationships holding between real grounds and consequences. Hence, this taxonomy does not allow us to explain why certain organisms have specific morphological characteristics. For this reason, Kant takes natural description to lack explanatory power.

The status of natural history is problematic. In his 1788 essay on teleological principles Kant construes natural history as a discipline investigating relations between present properties of natural objects and their historical causes.48 Causal regularities relating present effects with earlier causes are derived from the observation of forces presently operative in nature and from inferences by analogy, supporting the claim that these forces have been operative in the past and have produced effects similar to those presently observed. Since causal relations constitute relations between objective grounds and consequences, natural history may be interpreted as providing objective explanations, e.g., of the origin of human races.49 However, Kant emphasized that inferences by analogy merely provide empirically (non-apodictically) certain cognition50 and stressed that natural history is a novel science in need of further development.51 This may explain why natural history is classified as a doctrine rather than a science of nature.

45 It must be noted that Kant's views on the scientific merit of natural description and natural history varied throughout his philosophical career. Cf. Sloan (2006, pp. 627–648).
46 Kant (1902, IV, pp. 467–468).
47 Ibid.
48 Kant (1902, VIII, pp. 61–62).
49 Hence, I cannot subscribe to Sloan's thesis that Kant, from the 1780s onwards, gave theoretical preference to natural description over natural history. Sloan (2006, p. 629).
50 Kant (1902, IX, p. 133).
51 Kant (1902, VIII, p. 62).

4 Apodictic certainty

The third and final condition that any system of cognitions must satisfy in order to be a proper science is that its cognitions are apodictically certain, i.e., that we are conscious of their necessary truth:

What can be called proper science is only that whose certainty is apodictic; cognition that can contain mere empirical certainty is only knowledge improperly so-called. (Kant 1902, IV, p. 468)

In the Logik, Kant defines knowledge (Wissen), opinion (Meinung) and belief (Glaube) as modes of holding-to-be-true (Fürwahrhalten). Holding something to be true is, in turn, defined as a judgment through which something is subjectively "represented as true".52 In other words, opinion, belief and knowledge are terms that indicate different modes of epistemic justification. Kant's final condition of scientificity corresponds to what is called the 'Knowledge Postulate' in the Classical Model of Science, which relates to the ordo cognoscendi and states that any proposition of a science is known to be true.53 In Kant's work the 'Knowledge Postulate' is intimately
related to the 'Necessity Postulate' of the Classical Model of Science, which states that all propositions or judgments of a science are necessary, since he argues that we only have knowledge of a proposition or judgment if we assert its necessary truth.

Kant describes the three modes of epistemic justification as follows. We have an opinion if we judge without having sufficient subjective or objective grounds for the truth of this judgment. In this context, the concept 'ground' refers to a ground of cognition, a ground on the basis of which we take a judgment to be true. A ground is subjectively sufficient for taking a judgment to be true if it is sufficient for myself, and a ground is objectively sufficient for taking a judgment to be true if it is sufficient or valid for everyone.54 We opine if in the act of judging we take the judgment to be problematic, i.e., take the judgment to be merely possibly true. Believing is taking something to be true based on a ground of cognition that is objectively insufficient but subjectively sufficient; e.g., one can rationally believe that God exists, since this belief "depends on subjective grounds (of moral disposition)".55 We believe something if in the act of judging we assert the truth of the judgment. Knowing is taking something to be true based on grounds that are both objectively and subjectively sufficient. I have knowledge if I have a judgment that is apodictically certain, i.e., if I take the judgment to be necessarily true.56

In the Logik, Kant further distinguishes between two types of knowledge: empirical knowledge, based on experience, and rational knowledge, based on reason. Rational knowledge is apodictically certain and can be divided into knowledge that is mathematically (intuitively) certain and knowledge that is philosophically (discursively) certain.57 This distinction relates the epistemic status of mathematical and philosophical cognition to the methods of proof employed within mathematics and philosophy. Mathematical knowledge is intuitively certain because it is proven on the basis of a priori construction in pure intuition. In particular, mathematical theorems are mediately certain synthetic a priori propositions demonstrated from immediately certain (intuitive) synthetic a priori principles (axioms). Philosophical propositions are mediately certain propositions derived from (discursive) synthetic a priori principles. Both mathematical theorems and philosophical propositions are apodictically certain because they are proven on the basis of a priori principles. Empirical knowledge, justified merely empirically, is empirically certain or contingent. However, empirical knowledge is apodictically certain "insofar as we cognize an empirically certain proposition from principles a priori".58
52 Kant (1902, IX, pp. 65–66).
53 de Jong and Betti (2008). The fact that Kant's third condition, stating that the cognitions of a science must be apodictically certain, relates to the ordo cognoscendi, indicates that this condition should be distinguished from Kant's grounding condition discussed in Sect. 3, which relates to the ordo essendi. See note 23 for the relevance of this distinction.
54 Kant (1787, A, pp. 820–822/B, pp. 848–850).
55 Kant (1787, A, p. 829/B, p. 857).
56 Cf. Falkenburg (2001, pp. 364–365). Chignell (2007) argues that objective grounds for knowing propositions indicate that propositions have an objective probability of being true. This cannot be true if, as I will argue, objective grounds of cognition must typically be understood as a priori principles on the basis of which we take propositions to be necessarily true, i.e., have knowledge of these propositions.
57 Kant (1902, IX, pp. 70–71).
58 Kant (1902, IX, p. 71).
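The three modes of holding-to-be-true can be summarized in a small tabulation (the layout is ours; the content is from the passage above):

Mode                  Subjectively sufficient ground   Objectively sufficient ground
Opinion (Meinung)     no                               no
Belief (Glaube)       yes                              no
Knowledge (Wissen)    yes                              yes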
The foregoing shows that the epistemic justification that we have for judgments in a particular science is determined by the relation of these judgments to the principles (fundamental judgments) of this science. In particular, a judgment is apodictically certain if it can be proven by means of a priori principles. These principles are necessary and strictly universal truths, providing subjectively and objectively sufficient grounds of cognition for the truth of judgments somehow derivable from them. It follows that scientific judgments only provide us with knowledge if they can be proven by means of a priori principles. In the Metaphysical foundations, Kant expresses this point by stating that the principles of a proper science must be a priori.

Kant's conception of 'proper science' can now be summarized as follows: in order to be a proper science, any body of cognition must (i) be systematically organized, (ii) express relations between objective grounds and consequences, and (iii) have a priori principles on the basis of which the non-fundamental judgments of the science can be proven. These conditions comprise Kant's model of 'proper science'. However, the Preface to the Metaphysical foundations is infamous for a different claim: the claim that any proper natural science must allow for the application of mathematics, which Kant employs to deny that chemistry and psychology are sciences proper. In the final section I will deal with Kant's mathematization requirement. I will argue that the latter requirement follows from the requirement that any proper science must have a priori principles. Such a view is suggested by Kant's criticism of chemistry, for Kant argues that because the principles of chemistry do not allow of mathematization, we lack a priori cognition of the principles underlying chemical appearances.59 It is for the latter reason that chemistry is denied the status of a proper science. However, it is not clear why Kant thinks that the mathematization of a doctrine secures an a priori foundation of that doctrine. Kant's view becomes clearer if we take into account that he takes mathematization to be a necessary condition of the scientific status of doctrines of nature. Kant takes this view because he interprets mathematics as a science that provides us with a priori cognition of individual corporeal objects. In particular, mathematics provides a priori grounds of cognition that ground apodictically certain cognition of corporeal objects. As such, mathematics allows us to give an a priori (epistemic) foundation of the natural sciences. Before developing this interpretation, I will first discuss a more instrumental interpretation of Kant's mathematization requirement.
59 Kant (1902, IV, pp. 470–471).
60 Ibid.

5 Mathematics and a priori justification

The requirement that a proper natural science must allow of mathematization is often taken to be equivalent to the requirement that the concepts of such a science be quantifiable. Kant's claim that "in any special doctrine of nature there can be only as much proper science as there is mathematics therein"60 is accordingly read as stating that only doctrines dealing with measurable magnitudes qualify as proper natural
sciences.61 Kant's mathematization requirement is thus simply taken to express the importance of measurability. This reading certainly captures part of Kant's intentions in emphasizing the importance of mathematics within natural science: in the modern period mathematics was often thought of as providing a quantitative description of empirical objects.62 Nevertheless, I do not think this reading can explain Kant's mathematization requirement.

A difficulty confronting the above reading is that it conflates the notion of mathematization and that of measurability.63 It is true that Kant thought that the mathematical representation of magnitudes enables the measurement of magnitudes. Nevertheless, one should carefully distinguish the notion of mathematization from that of measurability. In the Critique of judgment, Kant states that we measure natural objects by assigning numbers to particular objects and that measurement requires the selection of a unit of measurement. The selection of a unit is arbitrary or context-dependent.64 In a purely mathematical context we can, e.g., represent numbers and their relations in terms of relations between line segments. In measuring natural objects we empirically specify a particular kind of object as the unit of measurement. Hence, mathematics does not by itself provide a measurement procedure. Consequently, we must distinguish between Kant's conception of mathematization and that of measurability.

If we focus on Kant's argument for the claim that proper natural sciences require mathematics, it becomes clear that considerations concerning measurability do not play any role. This argument, contained in the Preface to the Metaphysical foundations, is based on the premise that proper natural sciences are based on "a priori cognition of natural things".65 Kant continues his argument by stating that "to cognize something a priori means to cognize it from its mere possibility".66 I take this to mean that cognition of the logical possibility of an object can be gained a priori by means of the analysis of its concept. However, according to Kant such a procedure does not enable us to cognize "the possibility of determinate natural things".67 From this it is concluded that "in order to cognize the possibility of determinate natural things, and thus to cognize them a priori, it is still required that the intuition corresponding to the concept be given a priori, that is, that the concept be constructed".68 And this is a task for mathematics, since mathematical cognition is defined as cognition obtained through the construction of concepts.
61 This view has been endorsed by several commentators. Cf. Okruhlik (1986, p. 313), Nayak and Sotnak (1995, pp. 133–151). The latter authors assume that, according to Kant, the sole purpose of the application of mathematics within natural sciences is to allow for the measurability of the objects of these sciences. In the following I will argue, in contrast, that mathematics provides a priori principles securing the apodictic certainty of cognitions pertaining to the natural sciences.
62 Christian Wolff, for example, defines mathematics in his Mathematisches Lexicon as "a science that aims to measure everything that can be measured". Wolff (1965, pp. 863–864).
63 Nayak and Sotnak conflate these two conceptions. Cf. Nayak and Sotnak (1995, pp. 113, 142, 144).
64 Kant (1902, V, p. 251).
65 Kant (1902, IV, p. 470).
66 Ibid.
67 Ibid.
68 Ibid.
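The point that measurement assigns numbers only relative to a conventionally chosen unit, while mathematics itself supplies no measurement procedure, can be illustrated with a toy sketch of our own (the function and the sample values are merely illustrative):

# Toy sketch (ours): a measurement assigns a number to a magnitude only
# relative to a conventionally chosen unit; change the unit and the
# number changes, though ratios between magnitudes do not.
def measure(magnitude: float, unit: float) -> float:
    """Number assigned to `magnitude` when `unit` is taken as 1."""
    return magnitude / unit

rod = 1.5        # a length, expressed here in metres for definiteness
print(measure(rod, 1.0))   # 1.5  (unit: the metre)
print(measure(rod, 0.5))   # 3.0  (unit: a half-metre stick)

# Ratios of magnitudes are invariant under a change of unit:
plank = 4.5
print(measure(plank, 1.0) / measure(rod, 1.0))  # 3.0
print(measure(plank, 0.5) / measure(rod, 0.5))  # 3.0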
In order to understand Kant's position we must explain why only mathematical construction allows us to have a priori cognition of determinate natural things. I take Kant to hold that only mathematics provides us (i) with a priori cognition of natural objects by means of (ii) singular and immediate representations of these objects. This reading follows from the claim that a priori cognition of determinate natural things requires the construction of their concept. In the 'Discipline of Pure Reason' of the first Critique, Kant explains that mathematical reasoning is based on the construction of concepts, which is defined as follows: "to construct a concept means to exhibit a priori the intuition corresponding to it".69 The term 'intuition' refers to a particular instance of a concept. In contrast to concepts, i.e., general representations that represent their object mediately (via intuitions), Kant interprets intuitions as singular representations that represent their object immediately.70 Hence, mathematical cognition, based on the construction of concepts, concerns singular and immediate representations of objects (the term 'object' is explicated below). Moreover, within mathematics the constructed intuition is exhibited a priori: the singular representations employed within mathematical demonstrations (e.g., an isosceles triangle) can represent all intuitions falling under the same concept (all isosceles triangles),71 a characteristic of mathematical demonstration that secures the universality of what is demonstrated. Mathematics thus provides a priori cognition of objects.

With which objects is mathematics concerned? Kant accepts the traditional conception of (pure) mathematics as a science of magnitude. Geometry, for example, is construed as providing a priori cognition of space and spatial relations and thus concerns continuous magnitude. How, then, does mathematics provide a priori cognition of natural objects? In the first Critique, Kant argues that mathematical concepts relate to "data for experience" by means of the a priori construction of figures or images.72 A figure or image is an intuition: it is a particular (sensible and concrete) instance of a concept. In the Prolegomena, Kant further explains that (geometrically) constructed images agree with empirical phenomena.73 As an example, we can think of line segments as geometric images of the velocity (speed plus direction) of corporeal bodies. Kant thus construes mathematics as providing a priori cognition of mathematical constructs, images or models, that represent quantitative features of natural objects.

That Kant entertains this position is not surprising, for his views on mathematics stem from a tradition that took mathematical cognition to be descriptive of the empirical world. For example, Christian Wolff, in his Mathematisches Lexicon, defines geometry as "a science of the space taken up by corporeal things in their length, breadth, and width".74 Moreover, since all things occupy space, Wolff argues, geometry is applicable to all such objects and provides cognition of the latter. This position is similar to that of Kant, for Kant took geometry, insofar as it provides cognition of the structure of space, to provide cognition of the formal features of perceptible spatiotemporal objects.
69 Kant (1787, A, p. 713/B, p. 741).
70 Kant (1902, IX, p. 91; 1787, A, p. 68/B, p. 93).
71 Kant (1787, A, pp. 713–714/B, pp. 741–742).
72 Kant (1787, A, p. 240/B, p. 299).
73 Kant (1902, IV, p. 287).
74 Wolff (1965, p. 665). On Wolff's views on mathematics in relation to Kant see Shabel (2003).
The general conception motivating Kant's views on mathematics is thus that mathematics provides knowledge of the formal (spatiotemporal) features of corporeal objects.

In order to substantiate the present reading we can cite Kant's well-known claim that in mathematical problems the question is not about "existence as such at all, but about the properties of the objects in themselves".75 Thus, Kant does not attribute existence to mathematical objects (pure intuitions). In a similar vein, Kant states:

Through determination of the former [pure intuition] we can acquire a priori cognitions of objects (in mathematics), but only as far as their form is concerned as appearances; whether there can be things that must be intuited in this form is still left unsettled. Consequently all mathematical concepts are not by themselves cognitions, except insofar as one presupposes that there are things that can be presented to us only in accordance with the form of that pure sensible intuition. Things in space and time, however, are only given insofar as they are perceptions (representations accompanied with sensation), hence through empirical representation. (Kant 1787, B, p. 147)

This passage conveys Kant's view that the objective reality of mathematical concepts, i.e., the possible existence of objects falling under such concepts, requires their possible application to empirical intuitions, i.e., to perceivable empirical objects, a view that implies that mathematics yields a body of truths only insofar as it is applicable to empirical objects.76 Hence, when Kant claims that by means of mathematical construction we cognize the form of objects (appearances), he is referring to empirical objects (phenomena).

Note that Kant emphasizes that cognition of the objective reality of mathematical concepts requires a philosophical justification. The construction of mathematical concepts in pure intuition (space and time) guarantees their objective reality if we presuppose that "there are things that can be presented to us only in accordance with the form of that pure sensible intuition". This supposition requires the philosophical justification, given in the Transcendental Aesthetic, that space and time are pure forms of sensible intuition. Given the truth of this supposition, mathematical construction establishes the objective reality of mathematical concepts. In particular, by means of mathematical construction we show the possible existence of empirical objects, the form of which is given by the construction. A different way of putting this is that mathematics provides a priori models that possibly represent (formal features of) existing and empirically given natural objects.

With this in mind we can turn to the foundational role that Kant attributes to mathematics. A nice illustration of this role can be found in § 38 of the Prolegomena.77 In this paragraph Kant gives various examples that elucidate the transcendental claim that the understanding prescribes a priori laws to nature.

75 Kant (1787, A, p. 719/B, p. 747).
76 On these and the following points, see Thompson (1992, pp. 97–101), Parsons (1992, pp. 69–75), Friedman (1992, pp. 98–104).
77 This paragraph has been subjected to a very detailed and subtle interpretation by Friedman, to which my interpretation is indebted. Cf. Friedman (1992, Chap. 4). I employ § 38 of the Prolegomena as providing an example that allows us to understand (i) the particular foundational role that Kant assigns to mathematics with respect to physics (which Friedman does not fully explicate), and (ii) Kant's claim that only mathematical natural sciences constitute proper natural sciences.
The pièce de résistance is an example taken from physical astronomy: "a physical law of reciprocal attraction, extending to all material nature, the rule of which is that the attractions decrease inversely with the square of distance from each part of attraction."78 In other words, the main focus of § 38 is the law of gravitation. Note, however, that Kant focuses on the dependency of gravitation on distance, i.e., the fact that gravity is an inverse-square force (1/r²). How is this law prescribed to nature by the understanding?

The first example of § 38 is mathematical. Kant refers to proposition 35 from Book III of the Elementa of Euclid, stating that if two straight lines intersect one another in a circle at point E, and intersect the circle at A, C and B, D, it holds that AE × EC = BE × ED. According to Kant, this law is dependent on the understanding because it can be demonstrated "only from the condition on which the understanding based the construction of this figure, namely, the equality of the radii".79 Hence, the proof of the above law is based on the condition that all straight lines from the centre of the circle to its boundary are equal, a condition expressed in Euclid's definition of a circle.

The second example is construed by Kant as a generalization of the above property of circles to conic sections. This proposition states that chords intersecting in a conic section intersect in such a way that the rectangles from their parts "stand to one another in equal proportions".80 Thus, the products of the lengths of the segments of the chords of any conic section stand to one another in equal proportions. If we let chord AC intersect chord BD at E, and let chord A′C′ intersect chord B′D′ at E′, then for all conic sections (AE × EC) : (BE × ED) = (A′E′ × E′C′) : (B′E′ × E′D′).81

It is this property of conic sections that Kant takes as a basis for inferring that gravity is an inverse-square force. This choice of inference is understandable if we, following Friedman,82 consider the Newtonian background of Kant's argument. In particular, we must take into account Newton's derivation of the inverse-square law given in propositions 11–13 of Book I of the Principia. In proposition 11 Newton employs an instance of the property of conic sections described above to prove that: if a body P moving along an ellipse is subject to a force f centrally directed toward a focus S, then f is inversely proportional to SP².83 In propositions 1 and 2 of Book I, Newton had shown that a force acting on a body with uniform linear motion is centrally directed towards a given point if and only if this motion describes equal areas in equal times with respect to that point (i.e., satisfies Kepler's law of areas).84 Hence, Newton's proof of proposition 11 shows that if a body moving along an elliptical orbit describes equal areas in equal times with respect to the focus of the ellipse, it is subject to a central force that is inversely proportional to the square of the distance from that focus.

78 Kant (1902, IV, p. 321).
79 Ibid.
80 Ibid.
81 Cf. Friedman (1992, p. 191).
82 Friedman (1992, pp. 191–194).
83 Newton (1999, pp. 462–463).
84 Newton (1999, pp. 444–448).
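The chord property driving both of Kant's geometric examples is easy to check numerically. The following sketch is my illustration, not part of the paper: it verifies Euclid III.35 for a circle by computing the product of the chord segments for several chords through one interior point E; the power-of-a-point argument guarantees that all the products coincide.

```python
import math

# Numerical check (illustrative only) of Euclid, Elements III.35:
# for two chords of a circle meeting at an interior point E,
# the products of the chord segments coincide.

def chord_segments(r, E, theta):
    """Distances from E to the two points where the line through E
    with direction theta meets the circle of radius r about the origin."""
    ex, ey = E
    dx, dy = math.cos(theta), math.sin(theta)
    # Solve |E + t*d|^2 = r^2 for t; since |d| = 1 the quadratic is monic:
    # t^2 + b*t + c = 0 with b = 2(E . d), c = |E|^2 - r^2.
    b = 2 * (ex * dx + ey * dy)
    c = ex**2 + ey**2 - r**2
    disc = math.sqrt(b**2 - 4 * c)
    return abs((-b + disc) / 2), abs((-b - disc) / 2)

E = (1.0, 2.0)                     # a point inside the circle of radius 5
for theta in (0.3, 1.1, 2.4):      # three different chords through E
    ae, ec = chord_segments(5.0, E, theta)
    print(f"theta = {theta}: AE x EC = {ae * ec:.6f}")
# Every product equals r^2 - |OE|^2 = 20, as Euclid III.35 requires.
```

The constant value is the "power" of the point E, and it depends only on the equality of the radii, which is precisely the construction condition Kant singles out.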
In propositions 12 and 13 Newton proves that the same holds for hyperbolic and parabolic orbits. Corollary 1 of proposition 13 conversely shows that a moving body subject to a centripetal inverse-square force will orbit along a conic section. In short, propositions 11–13 of Book I prove that orbital motion along a conic section satisfying the law of areas implies an inverse-square force (and vice versa).

Kant's argument in § 38 suggests that he had this derivation in mind. Propositions 11–13 provide us with mathematical demonstrations of particular equivalences, and Kant suggests that these demonstrations allow us to infer that gravity is an inverse-square force. This is understandable if we recognize that the mathematical principles developed in Book I provide a basis for Newton's derivation of the law of gravitation in Book III of the Principia.85 In particular, Kant seems to envision an argument along the following lines. Newton's proposition 11 of Book I proves that if a body moves in an ellipse and satisfies the law of areas with respect to a focus, this motion is governed by an inverse-square centripetal force directed toward the focus. We know empirically, by means of Newton's "phenomena", that the satellites of primary bodies orbit in an ellipse and satisfy Kepler's law of areas with respect to their primary bodies situated at a focus. Hence, we can infer mathematically that these satellites are subject to an inverse-square force directed towards their primary bodies.

Newton himself employs this type of reasoning in the first three propositions of Book III, insofar as he applies mathematically demonstrated relations, obtaining between centrally directed inverse-square forces maintaining a body in orbit and this motion's satisfying Kepler's laws, to phenomena. These relations allow him to infer that the satellites of Jupiter and Saturn are subject to an inverse-square force directed toward the center of their primary planets (prop. 1),86 that the planets are subject to an inverse-square force directed towards the sun (prop. 2), and that the moon is subject to an inverse-square force directed towards the earth (prop. 3). These propositions provide the first steps in Newton's argument for the law of gravitation and allow him to conclude that gravity is an inverse-square force.

The use of mathematics within natural science described above suggests that mathematics can be interpreted as providing models of the physical world, by means of which we cognize quantitative relations obtaining between individual objects. This is a three-step procedure: (i) we mathematically establish that motion along a conic section satisfying (one of) Kepler's laws implies a centripetal inverse-square force; (ii) we empirically observe that heavenly bodies orbit in a conic section and satisfy (one of) Kepler's laws; (iii) we infer the existence of a centripetal inverse-square force. Kant took mathematics as providing models of physical objects and would have interpreted Newton's application of mathematical principles to phenomena accordingly.

85 As Newton puts it himself in the Introduction to Book III, the task of Book III is to "exhibit the system of the world from these same principles." The locution 'same principles' refers to the mathematical principles of philosophy expounded in Books I and II. Newton (1999, p. 793).
86 Newton does not, however, employ proposition 11 of Book I, but proposition 2 of Book I, allowing us to use the law of areas to infer the existence of a centripetal force, and Corollary 6 of proposition 4 of Book I, allowing us to use the harmonic law to infer the existence of an inverse-square force. For an analysis of Newton's argument, see Harper (2002, pp. 174–201).
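Footnote 86 invokes the harmonic law. For the special case of circular orbits, the inference from the harmonic law to an inverse-square force can be reconstructed in two lines; this is a standard textbook reconstruction offered only as an illustration, not Newton's own geometric proof. With centripetal force F = mv²/r, orbital speed v = 2πr/T, and harmonic law T² = kr³:

\[
F=\frac{mv^{2}}{r},\qquad v=\frac{2\pi r}{T},\qquad T^{2}=kr^{3}
\quad\Longrightarrow\quad
F=\frac{4\pi^{2}mr}{T^{2}}=\frac{4\pi^{2}mr}{kr^{3}}=\frac{4\pi^{2}m}{k}\cdot\frac{1}{r^{2}}.
\]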
In this case, a mathematically constructed conic section functions as a model of the motions of heavenly bodies, enabling the inference from (i) and (ii) to (iii) by subsumption.

Note that this inference is successful because we have a fit between our mathematical model and observable phenomena. This need not be the case. In his discussion of the mathematical-mechanical mode of explanation in the Metaphysical foundations, Kant states that this mode is based on concepts that can be mathematically represented (e.g., 'extension', 'corpuscle', 'space').87 As an example, one may think of an explanation of density differences by means of the mathematical representation of varying amounts of empty space interspersed among material particles. According to Kant we can, then, construct mathematical models on the basis of concepts central to the mathematical-mechanical mode of explanation. Nevertheless, this mode of explanation is rejected since it employs empty concepts such as 'void space' or 'absolute impenetrability', i.e., concepts the object of which cannot be cognized as existing since their instances are not observable. Consequently, mathematical models based on empty concepts cannot be taken to represent actual existing objects. This is probably a consequence of the fact that, insofar as these models are based on empty concepts, they are interpreted in a manner that cannot be adequate to existing objects (e.g., by taking mathematical points to represent absolutely impenetrable corpuscles). Thus, although the construction of a mathematical model secures its objective reality, i.e., its possible application to objects, such models can only be taken to represent particular existing objects if they are interpreted correctly.88 A possible method to determine that a mathematical model represents existing objects is via the empirical confirmation of consequences inferred on the basis of this model (i.e., the empirical confirmation of (iii)).89

The upshot of the previous paragraph is that we take (correctly interpreted) mathematical models to represent existing objects on the basis of empirical confirmation of inferences made on the basis of these models. Hence, the use of mathematics within natural science is empirically conditioned. Nevertheless, mathematics allows the a priori justification of physical cognition. The derivation of the inverse-square law, discussed above in relation to § 38, allows us to see this. For the claim that gravity is an inverse-square force is based on mathematically demonstrated (a priori cognizable) relations. We apply these relations to observable phenomena in order to infer mathematically that, e.g., the planets are subject to an inverse-square force directed towards the sun. The latter claim is thus proven on the basis of a priori principles and is accordingly apodictically certain.

87 Kant (1902, IV, pp. 524–525).
88 Abstracting from Kant's own terminology, we can say that a correct interpretation of mathematical models employed within physics is guaranteed metaphysically. For Kant takes it to be the task of metaphysics to provide a priori principles in accordance with which the concept of physics must be mathematically constructed. Cf. Kant (1902, IV, p. 473).
89 Because of the reasons sketched in this paragraph, I cannot follow commentators who interpret Kant's mathematization requirement as following from his view that the mathematical construction of the concepts of a science secures their objective reality. Cf. Falkenburg (2000, p. 289), Pollok (2001, pp. 86–87). Although constructability secures objective reality, it does not secure that mathematical models are adequate to particular natural objects. This adequacy is required, however, if mathematics is supposed to provide a priori justification of scientific cognitions. Hence, this particular interpretation cannot explain how mathematics fulfils a foundational function with respect to natural sciences.
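The three-step procedure can be made concrete with a small numerical sketch. The following is my illustration, not the author's: step (ii) uses rounded published values for the orbital radii and periods of Jupiter's Galilean moons, and the near-constancy of T²/a³ (the harmonic law) is the empirical input from which, via Newton's Book I results, step (iii) infers an inverse-square centripetal force.

```python
# Step (ii): observed orbital radii a (in 10^3 km) and periods T (in days)
# for Jupiter's Galilean moons. Rounded published values, illustrative only.
moons = {
    "Io":       (421.7,   1.769),
    "Europa":   (671.0,   3.551),
    "Ganymede": (1070.4,  7.155),
    "Callisto": (1882.7, 16.689),
}

# Kepler's harmonic law: T^2 / a^3 should be (nearly) the same for all moons.
for name, (a, T) in moons.items():
    print(f"{name:9s} T^2/a^3 = {T**2 / a**3:.3e}")

# Step (iii): since the ratio is constant, Newton's Book I propositions
# license the inference to a centripetal force on each moon varying as 1/a^2.
```

All four ratios come out at roughly 4.2e-8 in these units, which is the "fit between model and phenomena" that the inference to (iii) trades on.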
Note, moreover, that it provides cognition of a specific quantitative relationship obtaining between individual objects. Hence, mathematics allows us to obtain a priori grounded cognition of determinate natural things.

Finally, we may note that mathematics provides a priori grounds of cognition of determinate natural things. Mathematical models do not specify (in Kant's terms) grounds of being of, e.g., the fact that planets are subject to an inverse-square force, for this ground is given by the physical force of gravitation. Hence, mathematics fulfills a strictly epistemic function with respect to physics. Such a reading is confirmed by the fact that Kant occasionally describes mathematics as an organon of the sciences, which is to say that mathematics is an instrument for bringing about certain cognition.90 In short, mathematical demonstration of judgments in natural science secures knowledge but does not guarantee that judgments ground each other, i.e., express objective relations between grounds and consequences.

The interpretation developed above allows us to explain, in conclusion, why only mathematical sciences are proper sciences. According to Kant, it is mathematics alone that provides a priori insight into specific quantitative properties of individual physical objects. This is a consequence of the fact that mathematics provides a priori models (individual and concrete representations) of physical objects. Philosophy or metaphysics provides discursive a priori principles valid for natural objects. However, in contrast to mathematics, philosophical or metaphysical cognition is not based on the construction of concepts. Consequently, it does not provide a priori models of individual physical objects that can be applied to nature in order to obtain a priori cognition of specific quantitative relations obtaining between individual objects. This type of cognition can only be obtained by means of mathematics.

6 Conclusion

The upshot of the final section is that Kant argues for the necessity of applying mathematics within the study of nature since mathematics provides a priori cognition of corporeal objects. As such, mathematics can provide a priori principles for doctrines that aim to explain specific features of corporeal objects. The foundational function of mathematics is exemplified in the case of physics, where mathematics provides a priori principles for cognizing physical laws. Kant's claim that proper natural sciences require mathematics follows from the claim that any proper science must have a priori principles, which is meant to secure that scientific judgments are apodictically certain. This latter condition, in turn, builds on the condition that proper sciences must have principles or grounds, securing that they are explanatory. These conditions correspond to conditions (6) and (3) of the Classical Model of Science. Finally, the condition of systematicity secures that sciences possess a logical order and coherence, incorporating condition (2) of the Classical Model of Science. Kant's conception of proper science is thus a natural consequence of a classical ideal of science.

90 Cf. Kant (1902, IX, p. 13). Here, Kant further argues that an organon, such as mathematics, anticipates the matter of the sciences. This claim, I take it, is nicely illustrated by the interpretation of mathematics as providing (a priori) models of physical objects, providing grounds of cognition for cognitions pertaining to physics.
Acknowledgements I wish to express my appreciation for the insightful comments of two anonymous reviewers. Any remaining shortcomings are of course my own. Research for this paper was conducted within the project The Quest for the System in the Transcendental Philosophy of Immanuel Kant, subsidized by the Netherlands Organization for Scientific Research (NWO).

Open Access This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
References

Chignell, A. (2007). Kant's concepts of justification. Noûs, 41, 33–63.
de Jong, W. R. (1995). Kant's analytic judgments and the traditional theory of concepts. Journal of the History of Philosophy, 33, 613–641.
de Jong, W. R., & Betti, A. (2008). The classical model of science: A millennia-old model of scientific rationality. Synthese. doi:10.1007/s11229-008-9417-4.
Falkenburg, B. (2000). Kants Kosmologie. Frankfurt am Main: Vittorio Klostermann.
Friedman, M. (1992a). Kant and the exact sciences. Cambridge: Harvard University Press.
Friedman, M. (1992b). Causal laws and the foundations of natural science. In P. Guyer (Ed.), The Cambridge companion to Kant (pp. 161–199). Cambridge: Cambridge University Press.
Fulda, H. F., & Stolzenberg, J. (Eds.). (2001). Architektonik und System in der Philosophie Kants. Hamburg: Felix Meiner Verlag.
Guyer, P. (2005). Kant's system of nature and freedom. Oxford: Oxford University Press.
Harper, W. (2002). Newton's argument for universal gravitation. In I. B. Cohen & G. E. Smith (Eds.), The Cambridge companion to Newton (pp. 174–201). Cambridge: Cambridge University Press.
Kant, I. (1787). Kritik der reinen Vernunft (1998). Hamburg: Felix Meiner. References are in the customary way via the pagination of the first (A) or second printing (B).
Kant, I. (1902). Kants gesammelte Schriften. Vols. I–XXIX (1902–1983). Berlin: De Gruyter, Reimer. Quotations from P. Guyer & A. W. Wood (Eds.), (1992–). The Cambridge edition of the works of Immanuel Kant. Cambridge: Cambridge University Press.
Longuenesse, B. (1998). Kant and the capacity to judge. Princeton: Princeton University Press.
Longuenesse, B. (2001). Kant's deconstruction of the principle of sufficient reason. The Harvard Review of Philosophy, IX, 67–87.
Nayak, A. C., & Sotnak, E. (1995). Kant on the impossibility of the "soft sciences". Philosophy and Phenomenological Research, 55, 133–151.
Newton, I. (1999). The principia: Mathematical principles of natural philosophy (I. B. Cohen & A. Whitman, Trans.). Berkeley: University of California Press.
Okruhlik, K. (1986). Kant on realism and methodology. In R. E. Butts (Ed.), Kant's philosophy of the physical sciences (pp. 307–329). Dordrecht: Reidel.
Parsons, C. (1992). Kant's philosophy of arithmetic. In C. J. Posy (Ed.), Kant's philosophy of mathematics (pp. 43–79). Dordrecht: Kluwer.
Pollok, K. (2001). Kants "Metaphysische Anfangsgründe der Naturwissenschaft": Ein kritischer Kommentar. Hamburg: Felix Meiner Verlag.
Shabel, L. (2003). Mathematics in Kant's critical philosophy: Reflections on mathematical practice. New York: Routledge.
Sloan, P. R. (2006). Kant on the history of nature: The ambiguous heritage of the critical philosophy for natural history. Studies in History and Philosophy of Biological and Biomedical Sciences, 37, 627–648.
Thompson, M. (1992). Singular terms and intuitions in Kant's epistemology. In C. J. Posy (Ed.), Kant's philosophy of mathematics (pp. 81–107). Dordrecht: Kluwer.
Watkins, E. (Ed.). (2001). Kant and the sciences. Oxford: Oxford University Press.
Watkins, E. (2007). Kant's philosophy of science. The Stanford Encyclopedia of Philosophy (Fall 2007 edn.), E. N. Zalta (Ed.). URL: http://plato.stanford.edu/archives/fall2007/entries/kant-science/.
Wolff, C. (1965). Mathematisches Lexicon. Hildesheim, NY: Georg Olms Verlag.
Synthese (2011) 183:27–45 DOI 10.1007/s11229-009-9666-x
Bolzanian knowing: infallibility, virtue and foundational truth

Anita Konzelmann Ziv
Received: 24 March 2008 / Accepted: 14 April 2009 / Published online: 6 October 2009 © Springer Science+Business Media B.V. 2009
Abstract The paper discusses Bernard Bolzano's epistemological approach to believing and knowing with regard to the epistemic requirements of an axiomatic model of science. It relates Bolzano's notions of believing, knowing and evaluation to notions of infallibility, immediacy and foundational truth. If axiomatic systems require their foundational truths to be infallibly known, this knowledge involves both evaluation of the infallibility of the asserted truth and evaluation of its being foundational. The twofold attempt to examine one's assertions and to do so by searching for the objective grounds of the truths asserted lies at the heart of Bolzano's notion of knowledge. However, the explanatory task of searching for grounds requires methods that cannot warrant infallibility. Hence, its constitutive role in a conception of knowledge seems to imply the fallibility of such knowledge. I argue that the explanatory task contained in Bolzanian knowing involves a high degree of epistemic virtues, and that it is only through some salient virtue that the credit of infallibility can distinguish Bolzanian knowing from a high degree of Bolzanian believing.

Keywords Believing · Knowing · Infallibility · Immediacy · Foundational truth · Epistemic evaluation · Explanation · Epistemic virtue

1 Introduction

My discussion of Bernard Bolzano's conception of knowledge and its relation to the epistemic requirements of an axiomatic system presupposes that Bolzano is a partisan of the traditional axiomatic model of science.1 Consequently, his epistemology is

1 For details, see de Jong and Betti (2008) and Lapointe (2008).
A. Konzelmann Ziv (B) Philosophisches Seminar, Universität Basel, Nadelberg 6-8, 4051 Basel, Switzerland e-mail:
[email protected]
committed to the epistemic requirements of the model, summarized in the following principles: (a) all propositions in a system S are known to be true (a non-fundamental proposition is known to be true through its proof in S), and (b) all concepts in S are adequately known (a non-fundamental concept is adequately known through its composition or definition).2 For present purposes, I will restrict my considerations to the scope of principle (a), which embraces, on the one hand, rules of inference and the extent of their application, and, on the other hand, the epistemic status of axioms or foundational truths.3 The latter concern divides into the question of how to know axioms to be true and the question of how to know them to be foundational.

Answers to these questions are often given in terms of impregnable or self-evident beliefs. Another option is to say that axioms are just stated. In this case, due to their hypothetical status no truth-value can be ascribed to them. Here, the question concerns the epistemic status of hypotheses, as well as of assertions of the form "[p] is a foundational truth in S".4 If axioms are imported from the domain of another science, the question of foundational knowledge is deferred to that domain. Then, foundational knowledge seems to be dependent on the coherence of scientific systems or on a hierarchy of scientific systems in which the question of foundation recurs.

Often, axiomatic models of science go together with foundationalist epistemological accounts that explain knowledge and justification in terms of foundational beliefs being "self-evident" or resulting from some special epistemic faculty—for example "intuition". The main objections to foundationalism query the lack of criteria for self-evidence, as well as the arbitrariness of intuitional faculties. Bolzano aligns with this criticism, opposing vehemently all claims for intuitional insight as justificatory of infallibility (Bolzano 1837a §315, annotation).5 He supports, on the other hand, the idea of infallible knowledge resulting from "immediate judgments".6 Yet, he equally admits a substantial lack of criteria for these latter. Therefore, his epistemology cannot be qualified as essentially foundationalist, in spite of some strong foundationalist claims with regard to the ultimate truths of all sciences.

2 Following de Jong and Betti (2008).
3 "Given the difference between fundamental and non-fundamental elements, the Model has to accommodate two kinds of justification, one for the fundamental, and one for the non-fundamental elements. […] Grounded knowledge is to be intended as knowledge propter quid or demonstrative knowledge […]. Scientific knowledge according to the postulate of grounded knowledge is knowledge which is also explanative; in it the ordo cognoscendi matches the ordo essendi." (de Jong and Betti 2008)
4 I will use the following notations:
[p]: for a proposition stating p
[P]: for a collection of [p]
"p": for a subjective representation of p
"p!": for a judgment (assertion) of [p].
5 All quotes from Bolzano 1837a: author's translation.
6 The argument for this claim is typically foundationalist: "It is equally certain that there are also immediate judgments; for the existence of mediated ones can in the end be understood only because there are also immediate judgments" (Bolzano 1837b §300.3; cf. §300.11). For the claim of necessary trustworthiness of immediate judgments, Bolzano argues in terms of anti-skepticism: "But if you consider […] possible […] that one of your immediately formed judgments is false or that one of your immediate forms of inference is invalid, then you must distrust indeed not only some, but all your immediate judgments and forms of inference. For all of them have only one and the same warrant, namely your immediate consciousness. You […] should therefore not form any judgments at all" (Bolzano 1837b §42).
The best-fitting contemporary label for Bolzano's approach is Virtue Epistemology in its responsibilist version, which combines reliabilist concerns for epistemic success with internalist concerns for agentive responsibility for knowledge.

In the following, I shall first give an overview of Bolzano's use of central epistemological terms and then focus on the question of infallible knowledge. I will show how the claim for the existence of infallible assertions is related to the exhortation to test one's judgments. Bolzano characterizes such testing not primarily as an epistemic duty but, rather, as an act of epistemic nobleness essentially connected with the zeal to increase knowledge. The most important goal in increasing knowledge is explanatory; that is, to advance as far as the objective grounds of an asserted truth.7 I will outline how these supreme epistemic values call for a legitimation of "inverted" ways to knowledge, as well as for an ineluctably virtue-based epistemic ethics.

2 Epistemological notions

In the introductory part of his seminal work Theory of science (Wissenschaftslehre), Bolzano connects science (Wissenschaft) with knowledge (Wissen) in the following words:

I therefore take the liberty to call an aggregate of truths of a certain kind a science [Wissenschaft], if what is known of it is important enough to be set forth in a special book. […] For instance, I shall call the class of all truths which assert something about the constitution of space the science of space (geometry), because these propositions form a separate species of truths. (Bolzano 1837b §1.1)

Bolzano formulates this "Domain Postulate"8 not only in terms of a class of truths determined by a specific set of objects, but also in the pragmatic terms of its worth for a community of knowers. Apart from being a mere domain of objective truths, then, a science is moreover a domain of knowledge. Thus, the logic providing the rules to direct scientific (or more broadly epistemic) agency is a logic whose concept is close to that of epistemic logic. According to Nicholas Rescher, epistemic logic is "applied logic" which has to reflect the "ultimately factual circumstances" of intelligent beings' cognitive situation. Therefore, the theses of such a logical system are considered to "stand correlative to the ways in which we actually do talk and think about the matter".9

While the term "knowledge" in the expression "domain of knowledge" is unspecified, meaning roughly "correct assertions supported in an acceptable way", the same term will be given a specific meaning in epistemological considerations. In Bolzano's epistemology (Erkenntnistheorie), however, "knowing" is not the central term. Rather, it is an epistemology centered on the notions of "judgment" (Urteil) and "cognition" (Erkenntnis)—corresponding to the contemporary labels "belief" and "true belief"—

7 For Bolzano's theory of grounding see Tatzel (2002) and Lapointe (2008).
8 On the Domain Postulate see the introduction and de Jong and Betti (2008).
9 Rescher (2005, p. 5).
as well as on the notion of "inference" (Schliessen). The primary epistemic act is judging, that is, "taking for true" (asserting) a certain objective proposition. Judging can be either "immediate" or "mediated" by inference. In Bolzano's view, believing (Glauben) and knowing (Wissen) are secondary to judging in that they name reflective attitudes of epistemic agents towards their judgments (Bolzano 1837a §321).

Believing and knowing are introduced as attitudes resulting from assessing the fallibility of judgments. While an immediate judgment is supposed to have fallibility 0—default infallibility—the fallibility of inferred judgments depends on the way they are inferred. If the inference is "perfect deduction", that is, starts from correct assertions and follows deductive rules properly, the resulting judgment is infallible. If, on the other hand, there is uncertainty as to the correctness of the premises asserted or of the rule-following, the inferred judgment is fallible.10 Consequently, the reflective attitude towards such a judgment is Bolzanian believing, while Bolzanian knowing reflects the assessment of one's judgment as infallible.

It follows from the foregoing that Bolzano's specified concept of knowledge is internalist: an analysis of "A knows that p" essentially involves agent A's subjective appraisal of her judgmental activity. Bolzano seems to take it for granted that epistemic agents have a kind of primitive self-attitude towards their cognitive processes, similar to what Keith Lehrer labels "self-trust",11 and that this attitude, if reflected, is manifested in knowing and believing. On the other hand, Bolzano assumes that judgmental activity naturally runs in the direction of its intrinsic aim of asserting truth. This "teleological reliabilism" is based in his metaphysics of the mind, which links the causality of mental processes with the necessity of logical laws.

With regard to the ontological status of the mind, Bolzano defends a substance monism, according to which a mind or "soul" is the "ruling substance" within an "organic" aggregate of substances, called a "body".12 Since substances are defined as self-sustaining efficacious simples and are efficacious in virtue of having causal "forces", this position suggests that the activity of the mind is to be accounted for in causal terms. Given such an account, the activity of a mental substance consists in its running causal processes that are governed by the laws of mental forces. These processes are taken to be reliable ways to achieve the epistemic aim of judgments' truth-conformity. Bolzano emphasizes the reliability of causal cognitive processes by linking causal and logical necessity in his notion of the ground-consequence relation (Abfolge), for which statements on causal relations count as paradigmatic instantiations. The strong emphasis on a causalist account of the properties of mental substances, together with the teleological claim that these properties unfold in processes that are truth-aimed, results in the reliabilist optimism that epistemic processes achieve their aim of truth-conformity if they function properly in appropriate circumstances.

In Bolzano's epistemology, this externalist feature of process reliabilism meshes with the internalist feature of reflective assessment of one's epistemic procedures.
The meshing becomes salient in 10 Bolzano defines doubt in terms of degrees of probability of confidence, and confidence as indicating either immediacy of judgment or the truth-probability of a proposition [q] in relation to a set of premises. 11 “The first step in the life of reason is self-trust. I trust myself in what I accept and prefer, and I consider
myself worthy of my trust in what I accept or prefer” (Lehrer 1997, p. 5). 12 See, for example, Berg (1976); Chisholm (1991); Krause (2004).
The meshing becomes salient in terms such as "judgmental force", "representational force", "imaginative force" and so on. Here, the intentionalist vocabulary—traditionally used to account for subjectively accessible and manageable activities—combines with the reliabilist vocabulary of processual occurrence. Accordingly, epistemological notions such as "judging" are accounted for both in intentionalist terms (asserting the truth of an objective proposition) and in functionalist terms of subliminal processes that concatenate more or less strong "ideas" into a propositionally structured entity of thought.

Although internal epistemic processes run mostly unnoticed, they are not immune to beneficial or malefic influence. Epistemic practice shapes them in various ways, just as aesthetic and moral practice sharpen or weaken the faculties upon which they rely. To this extent, Bolzano's process reliabilism concerning judgmental activity is balanced by strong responsibilist claims concerning epistemic agency. The endeavor of epistemic performance is presented as a dimension of essentially ethical concern, determined not only in normative terms like "ought", "allowed" or "forbidden", but also in moral terms like "illicit" or "immoral" that emphasize epistemic responsibility (Bolzano 1837b §317.3).13 If, for example, circumstances and indices for truth-conformity make it illicit not to accept a thought as cognition, the resulting conviction should correctly be called an "ethical, or moral, or sufficient" certainty (ibid.).

This attitude corresponds to a large extent to ideas of virtue epistemology that attempt to amend the consequentialism of reliabilist accounts by adding the aspect of agentive responsibility to analyses of knowledge (see, for example, the work of J. Greco, L. Zagzebski, C. Hookway). An example of a virtue-epistemological approach to knowledge is Linda Zagzebski's argument that the reliability of a process cannot confer any added value on its result, since the process receives its own value from the fact that it leads to the given result. So if knowledge is to be more valuable than reliably true belief, the additional value cannot be explained in terms of the reliability of the process leading to it. The explanation must, instead, refer to motivational attitudes that initiate and sustain the actions and processes leading to true belief.14

Similarly, Bolzano's Theory of science emphasizes the responsibility of epistemic agents to make sure that their practices meet the high standards of epistemic ethics. For example, the scenario of epistemically irresponsible agents whose epistemic failings are constantly corrected by a reliable demon, Verity, into the full success of exclusively true beliefs would not fit the conception of Bolzanian knowing.15 To be valuable, a concept of knowledge needs to relate notions of subjective responsibility in the search for truth with notions of success in reaching this aim. In the following, I will relate this point to what Bolzano says on the responsibility of testing one's judgments and explaining the truths asserted in them. I try to show, on the one hand, that the norms for these activities are aretaic rather than deontic in nature and, on the other hand, that they must account for the epistemic requirements of axiomatic systems.
14 Zagzebski (2000) account of knowledge can be expressed in the following scheme:
S knows that p iff (i) S has a “motive for truth”, & (ii) S behaves in a cognitively reliable manner because of (i), & (iii) S successfully reaches the true belief that p because of (ii). 15 See Swank (2000).
3 Infallibility and immediacy of judgment

Let us now come back to the aforementioned attitudes of Bolzanian believing and knowing that reflect one's appraisal of one's judgment as either fallible or infallible. I take it that these evaluative attitudes have to be construed in terms of non-assertive thoughts which, although cognitive in nature (conceptual or even propositional), may involve feelings and intentions as components. Compare, for example, the notion of "self-trust" mentioned above. A construal of believing and knowing in terms of higher-order judgments—agent A's judgment "r!" assesses A's judgment "q!" by asserting the latter's certainty or degree of subjective probability16—induces an undesirable infinite regress.17 In order to avoid such an evaluator regress, Bolzanian believing and knowing are to be explained as normative attitudes whose analyses refer to a broader realm of mental states than those usually treated as epistemically relevant. As a first approach, they can be analyzed in the following way (Bolzano 1837a §321):

(1) A knowsB that [q] iff A asserts [q] & A truly evaluates her asserting [q] as infallible.
(2) A believesB that [q] iff A asserts [q] & A evaluates her asserting [q] as fallible.

The analysis of "knowing" does not need to mention the traditional clause of [q]'s being true, since infallibility of assertion implies truth-conformity. The analysis' critical point lies in the correct evaluation of infallibility. If this evaluation is not itself infallible and A can be wrong about it, subjective and objective attribution of knowledge cannot be reconciled. In the analysis of "believing", however, a wrong evaluation of fallibility does not make a significant difference, since fallibility and truth-conformity are not related in the same way. If A evaluates "q!" as fallible on the basis of [q]'s probability P < 1, this evaluation allows for [q]'s being true, and hence for A's judgment "q!" to conform to truth.

Bolzanian believing can be described as loyalty to a judgment "q!", as a "relation of our ethos (Gesinnung)" to the truth of [q], in spite of there being reasons against it (Bolzano 1837a §321). Believing is intimately connected to the virtue of trust, which presupposes an agent's acknowledging the risk of being deceived.18

16 "Certainty" (Gewissheit) and "confidence" (Zuversicht) are notions that qualify judgments as a function of the deducibility or probability of objective propositions. Roughly, confidence with regard to a judgment "q!" is the subjective probability of the judgment's conforming to truth (Bolzano 1837a §§317–320). For the notions of deducibility and probability see Sect. 4.
17 For a similar idea see John Greco's remarks on subjective justification (S's believing p is the result of dispositions that S manifests when S is trying to believe the truth): "The knower must have some awareness that a belief so formed has a good likelihood of being true. Some authors have required that the knower believe that this is so, but […] it seems that people rarely have beliefs about the genesis of their beliefs, and so it would be too strong to require that they always have one in cases of knowledge. […] People manifest highly specific, finely tuned dispositions to form beliefs in some ways rather than others. And this fact, I take it, amounts to an implicit awareness of the reliability of those dispositions." (Greco 2003, p. 127f) To be noted: the term "belief" in Greco's use corresponds to the term "judgment" in Bolzano's use, not to Bolzanian believing.
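Analyses (1) and (2) lend themselves to a toy formalization that makes their asymmetry explicit: in (1) the agent's evaluation of infallibility must itself be true, while in (2) the evaluation of fallibility need not be. The sketch below is mine, not Bolzano's; it merely restates the two schemas in executable form.

```python
from dataclasses import dataclass

# Toy model (mine, not Bolzano's) of analyses (1) and (2).
# 'infallible' stands for the objective status of the judgment;
# 'evaluated_infallible' for the agent's reflective appraisal of it.

@dataclass
class Judgment:
    asserted: bool
    infallible: bool
    evaluated_infallible: bool

def knows_B(j: Judgment) -> bool:
    # (1): assertion plus a TRUE evaluation of infallibility.
    return j.asserted and j.evaluated_infallible and j.infallible

def believes_B(j: Judgment) -> bool:
    # (2): assertion plus an evaluation of the judgment as fallible;
    # note that no truth requirement is placed on this evaluation.
    return j.asserted and not j.evaluated_infallible
```

On this rendering, an agent who wrongly evaluates an infallible judgment as fallible still counts as believing, which is exactly the point that a wrong fallibility evaluation "does not make a significant difference".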
Based on an evaluation that reveals good reasons and due attention to counterarguments, believing will not be disavowed in a significant way. If "q!" has a high degree of "confidence", the chance of its not conforming to truth may be so low that it would be epistemically "vicious" not to behave as if [q] were true. Believing is, on the other hand, also connected to the virtue of prudence, which dissuades one from holding onto a judgment that has too many reasons against it. Believing is at home in the interplay of trust and prudence.

Similar considerations apply, although in a different manner, to Bolzanian knowing. While Bolzano assumes that infallibility is a default property of immediate, that is, non-inferential, judgments, he equally assumes that there are no definite criteria to distinguish immediate judgments from "unconsciously inferred" judgments. Since both are experienced as immediate, the virtuous epistemic agent will not lightheartedly evaluate an experience of immediacy as an indicator of infallibility. Contrary to those philosophers who pretend to "immediately cognize all the truths that they, each one in his system, establish" (Bolzano 1837a §315), she will not take experienced immediacy as unconditional justification for dogmatic claims. With regard to experienced immediacy, the salient virtue is prudence. Prudence does not exclude trust or amount to mistrust, but frames trust with a caveat. Testing a judgment even if it seems immediate is motivated mainly by the desire to bring the grounds of the asserted truth to full awareness, not by one's doubting the judgment.

Although Bolzano suggests that there are globally foundational truths, that is, truths that do not have any objective grounds, he desists from trying to list such truths.19 Our judgments, even truly immediate—hence infallible—judgments, generally assert grounded truths. So even if there is no epistemic duty to justify a judgment by providing its reasons, there are epistemic virtues motivating the search for the grounds of the asserted truth. Bolzano emphasizes that this kind of investigating is a "business that we can undertake even then when we do not doubt the truth of the proposition in the least" (Bolzano 1837a §332.8). His "Heuristics" in the Theory of science (Bolzano 1837a §§322–391) is a strong plea for this point.

It is apposite to mention at this point that the concepts of Bolzanian believing and knowing do not imply continuous reflective appraisal of one's judgments. The greater part of epistemic activity runs without being accompanied by reflective awareness and is nevertheless able to compensate for errors. To this extent, Bolzano's epistemology draws importantly on ideas that are nowadays labeled "Naturalized Rationality", postulating reasoning processes that are reliable, fast and frugal enough for the demands of a normal human life. As current research programs suggest, simple heuristics such as recognition, take-the-best and take-the-last prove "remarkably successful" and "perform almost as well as a bunch of fancier processing algorithms", while being much more frugal in the use of information and the demands placed on the computational resources of a system.20
18 See for example Baier (1992).
19 Compare Sebestik (1992, p. 282): "It is paradoxical that Bolzano, who renews the conception of the axiomatic, does not establish in his Grössenlehre any proposition declared expressly as primitive truth or principle. Bolzano does not propose any list of axioms" (my translation).
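The fast-and-frugal heuristics just mentioned are simple enough to state in a few lines. Here is a minimal sketch of take-the-best (my illustration, not drawn from the paper or from Carruthers): to judge which of two objects ranks higher on some criterion, consult cues in order of their validity and decide on the first cue that discriminates, ignoring everything else.

```python
def take_the_best(obj_a, obj_b, cues):
    """Decide which of two objects ranks higher on a criterion.
    cues: cue names ordered by (assumed) validity;
    obj_a, obj_b: dicts mapping cue names to binary cue values."""
    for cue in cues:
        va, vb = obj_a.get(cue, 0), obj_b.get(cue, 0)
        if va != vb:                  # the first discriminating cue decides
            return "a" if va > vb else "b"
    return "guess"                    # no cue discriminates: guess

# Which of two cities is larger? (toy cue profiles)
cues = ["capital", "has_airport", "university"]   # ordered by validity
city_x = {"capital": 0, "has_airport": 1, "university": 1}
city_y = {"capital": 0, "has_airport": 0, "university": 1}
print(take_the_best(city_x, city_y, cues))        # -> 'a'
```

The point relevant here is frugality: the search stops at the first discriminating cue, however much further information is available.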
For many epistemic purposes, such minimal heuristics are sufficient. Yet for more specific epistemic purposes, such as establishing the foundational structure of a scientific system, appropriate heuristics demand that a selection be made of the assertions to be thoroughly checked. Even assertions of "self-evident" propositions considered to provide the foundational truths of a system merit investigation. Self-evidence might manifest itself in immediacy of judgment—but due to the phenomenal indistinguishability of immediate judgments and those achieved through unconscious (fallible) inference, one should be careful about experienced immediacy.

Having chosen to test an assertion, epistemic agents "ascend" a hierarchical order of grounds—Bolzano's Abfolge—towards the most general principles that can explain the asserted truth (Bolzano 1837a §220). The pathway followed may furcate repeatedly and lead to different possible explanations. Aside from the logical and semantic relations involved, it is the circumstances of investigation and the attitudes of the agents that determine the choices of ways and the extent of investigation in the course of their ascent. Whether an investigation has achieved the ultimate grounds is not easy to decide. Choosing "to abandon our testing and rightly so" at a given point springs from a pragmatic attitude involving many epistemic virtues that evaluate the results achieved so far, the resources available and the benefits and costs of further research (Bolzano 1837a §332.3).

Our initial question was whether the requirement of knowing the foundational principles of a system S can be accommodated within the frame of Bolzanian knowing. Applying the analysis of Bolzanian knowing to the epistemic requirement of axiomatic systems, we get the following form:

(3) A knowsB that [p] is a foundational truth in S iff A asserts that [p] is a foundational truth in S & A truly evaluates her asserting so as infallible.

Bolzanian knowledge claims apply only to assertions. Since hypothesis is distinct from assertion, Bolzanian knowing cannot be claimed for the hypothesized axioms of a system conceived in hypothetico-deductive terms. Furthermore, it cannot be claimed for "[p] is a foundational truth in S". Suppose that heuristic rules lead us "up" from some acknowledged truths [Q] to more basic truths from which they derive. The method is hypothetical or conditional in nature: we consider from what relevant set of truths [P] our initial truths [Q] can be inferred. No assertion of any [p] is needed in order to evaluate whether [P] entails [Q]. But plausibly, once the cogency of [P]'s entailing [Q] is established, we are inclined to assert the propositions in [P] in their own right as axioms of our system, and hence to assert a proposition of the form "[p] is a foundational truth in S". Such an assertion, however, does not seem to be entailed by logical necessity. We can abstain from it, considering the propositions in [P] as axioms of our system in a hypothetical mode. Doing so, we leave open the possibility of more appropriate axioms being found. If we do assert the propositions in [P] to be the axioms of S, this assertion can come about in two ways: either it ensues from probabilistic inference, or it is a decisional "leap". In the first case, our epistemic attitude is just a variety of Bolzanian believing whose underlying evaluation revealed a high degree of probability. In the second case, the epistemic attitude seems to differ from believing, since the awareness of potential error implied in believing is simply set at nought. If infallibility is to be preserved in Bolzanian knowing, it implies taking a decisional leap. Taking such a leap is epistemically daring: it is motivated by audaciousness rather than by trust and prudence.

20 Carruthers (2006).
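The hypothetical mode of the ascent, evaluating whether [P] entails [Q] without asserting any [p], can be pictured with a small search sketch. This is my illustration, not the paper's; the entailment relation is simply stipulated for the example.

```python
from itertools import combinations

# Toy "ascent" (illustrative only): from acknowledged truths Q, find which
# candidate sets P entail them, given a stipulated entailment relation.
entails = {
    frozenset({"p1", "p2"}): {"q1", "q2", "q3"},
    frozenset({"p1"}):       {"q1"},
    frozenset({"p3"}):       {"q2"},
}

def candidate_axiom_sets(Q, candidates):
    """Return every subset of the candidate truths that entails all of Q."""
    found = []
    for k in range(1, len(candidates) + 1):
        for P in combinations(candidates, k):
            if Q <= entails.get(frozenset(P), set()):
                found.append(set(P))
    return found

print(candidate_axiom_sets({"q1", "q2"}, ["p1", "p2", "p3"]))
# -> [{'p1', 'p2'}]  (the evaluation is purely conditional: no member of P
#                     is thereby asserted as a foundational truth of S)
```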
As mentioned before, evaluating one's own assertion as infallible is the critical clause in the analysis of Bolzanian knowing. So far I have mentioned three features with regard to this problem. First, it seems that infallibility evaluation cannot itself be assertive, but rather has the character of a thought in the unspecified sense of a structured complex of concepts.21 Second, immediacy of judgment is not a sufficient condition for infallibility, since unconsciously inferred judgments appear equally immediate to the judging agent. Thus, in the case of apparently immediate judgment, the task of infallibility evaluation amounts to the evaluation of authentic immediacy. Third, perfect deduction that warrants a conclusion's infallibility is dependent on infallibility claims concerning its premises. Unless all antecedent judgments have probability 1—and hence infallibility—deductive inference will not be perfect, even if the argument is valid. Thus, evaluation of an inference's infallibility amounts to evaluation of the infallibility of its premises, and we are either in a regress or thrown back to the problem of evaluating immediacy. In view of this situation, the choice is between rejecting the infallibility claim categorically or attempting to save it by providing an account of immediacy evaluation. I will adopt the latter option.

4 Immediacy evaluation

The question of immediacy evaluation leads back to reliabilist considerations. For Bolzano, judging is the mental activity of asserting an objective proposition. Yet in order to assert an objective proposition [q], an agent A does not simply "grasp" the ready-made proposition [q] from the objective realm of propositions and add her approval to it. Rather, A's asserting [q] is considered the result of processes that involve various constructive mental "forces". Although the greater part of A's constructive activity runs unnoticed, this is no reason to distrust its results.22 According to Bolzano, the cognitive process leading to a judgment is governed by what he calls the "judgmental force" (Urteilskraft), a force that concatenates a number of mental representations into propositionally structured complexes in a way that enforces approbation. Since Bolzano holds that all mental representations—whether simple or complex—mirror objective representations, there is no propositionally structured subjective representation that does not mirror an objective proposition. Therefore, what is "taken for true" or approved in judging is justly said to be the objective proposition [q] providing the

21 "I feel the truth of p" refers to an evaluative attitude whose comprehended thought is conceptually complex but not necessarily propositionally structured. Even paraphrasing "I feel the truth of p" by "I feel that p is true" does not amount to claiming that the content of the feeling is a judgment, viz. assertive and in need of justification.
22 Bolzano thinks that the processes resulting in complex representations and judgments are "much more easily comprehended" when we assume "that in doing this the mind [Seele] does not form the judgment that it does it, but that it performs these operations without being clearly conscious of them" (Bolzano 1837b §302.1).
Operations of the “judgmental force” may or may not go together with operations of the “force of inference”. In the first case, the judgment produced presupposes the “presence” of other judgments in the mind, which then are “the complete cause for the occurrence” of “q!” (Bolzano 1837b §300.2). Here, approbation of [q] is mediated by the mind’s “moving” from these antecedent judgments to the judgment “q!”. In the second case, a judgment “q!” is “immediate” in that the approbation of [q] is independent of inferential moves. Agent A’s non-conceptual judgment “I feel a pain just now”, for example, might gain its immediate approbation from the fact that it expresses an actually experienced physical state of A. For purely conceptual judgments, Bolzano claims that the compulsion for immediate approbation bears exclusively on the meaning of the concepts from which they are built and the relations between them:23

To the full reason why we couch our judgment in just that way, why we claim that each object represented by the concept A has the property represented by the concept b […] belongs primarily also the peculiar property of both the concepts A and b themselves. […] it is obvious that—in order to explain the appearance of such a judgment in our mind (Gemüt)—we must not mention anything else […] than that we envisage by A and b just those concepts that we really envisage (vorstellen) (Bolzano 1837a §302).24

Claims for the infallibility of judgments of this kind draw on the semantic properties of the representations involved to produce immediate assertion. Bolzano does not distinguish subjective from objective representations by declaring the latter the “sense” or “meaning” of the former. Rather, the distinction is expressed in terms of the “real existence” of mental phenomena—adhering to individual subjects and having causal efficacy—that instantiate or exemplify a “certain something” which “subsists” and constitutes their “immediate matter (Stoff)” (Bolzano 1837b §48.3).25 The semantics of representations is determined by their “content”, which is the “sum of the components” a representation consists of, and the order of these components (Bolzano 1837b §56). In general, a subjective representation “F” and its objective matter [F] have the same content, which enables them to fulfill their task of representing their common object (Bolzano 1837a §49.1). Agent A’s “having” a representation “F” is a mental instance of [F] that mirrors the semantic morphology of [F] and goes together with the understanding of “expressions” or “signs” that “signify” [F] (Bolzano 1837a §79.2). If A has a propositionally structured conceptual representation “q”, A apprehends [q] and understands linguistic signs signifying [q].

23 In Bolzano, a concept (Begriff) is a representation which is not an intuition (Anschauung) and does not contain any intuition. An intuition is defined as a semantically simple representation of a singular object, paradigmatically instantiated in the demonstrative [this]. A non-conceptual judgment is a judgment containing intuitions; a (purely) conceptual judgment does not contain intuitions.

24 “[W]hether we make a judgment or not […] is a result […] merely of the sum total of the ideas immediately present in our souls, according to a certain law of necessity” (Bolzano 1837c §291.5).

25 Bolzano uses the term “sense” or “meaning” (Sinn) with the complement noun “expression” or “sign”, not with the complement noun “representation”. Although in one place he qualifies an objective proposition as a “meaning”, he does not qualify it as the meaning of a subjective representation but as the meaning of a “linguistic expression” (Bolzano 1839, §2.1; Bolzano 1837a §285). The notion of subjective representation is wider than that of linguistic expression.
According to Bolzano’s claim, the compulsion for immediate approbation is triggered in this case by A’s having understandingly ordered a number of components into a complex, so that—under given epistemic circumstances—this ordered semantic complex manifests itself in the assertion “A is b!”. It is also possible, though, that an identical judgment “A is b!” is mediated by conscious or unconscious inference. The class of judgments that are absolutely non-inferable by virtue of their semantic properties, i.e. their content, is apparently empty. This fact makes it impossible to provide a structural feature that distinguishes immediate judgments. Together with the fact of process opacity in the generation of judgments, it rules out the possibility of an absolute evaluation of infallibility. As an anti-intuitionist conception of knowing that involves reflective evaluation of one’s own epistemic performance, Bolzanian knowing seems restricted to approximate evaluation of infallibility.26 To maximize evaluation success, an agent needs to combine her skills in following acquired rules of testing with virtues motivating her choices as to the limits of such testing. Maximizing evaluation success also requires submitting the results for evaluation by others.

Bolzano depicts a subjectivist epistemology, yet it is not individualist in the sense of describing epistemic agency as a monadic activity. Epistemic agents are part of epistemic communities whose standards they adopt. Epistemic processes are not restricted to cognitive processes running in natural individuals, but also include collective processes of collecting, selecting, assimilating and distributing information. Bolzano factors this collective dimension into his epistemology by strong references to a “common sense” that manifests itself, for example, in consensus and in the use of a common semiotic system in tracking truth. Bolzanian knowing seems to rely on an evaluation of infallibility that consists in the maximal combination of individual and collective abilities.

Yet if Bolzanian knowing relies on merely approximate evaluation of infallibility, then it seems to be just a higher level or an ideal case of Bolzanian believing and its evaluation of fallibility. If infallibility cannot be tracked in an absolute way, there is no point in making such tracking the definitional criterion of knowing. An attempt to save knowing as different from believing apparently requires an additional criterion that provides the value of the absolute. As briefly outlined, I think that such a criterion can be found in the difference between the salient virtues involved in believing and knowing. Before tackling this topic again, I should like to present some features of Bolzanian heuristics that combine the responsibility of testing one’s judgments with the task of searching for the grounds of the truths asserted. This twofold task lies at the heart of both the evaluation of fallibility in believing and the evaluation of infallibility in knowing. Against the background of these heuristic requirements, the difference between the salient virtues involved in believing and knowing will emerge more clearly.
26 I use the term “anti-intuitionist” in the broad sense of denoting an epistemological position that denies the existence of a specific “intuitive” epistemic faculty allowing direct insight into the truth of a proposition. In Bolzano 1837a §315 (annotation) Bolzano criticizes the intuitionist position, especially if it is used to dogmatically establish one’s philosophical system. (This meaning of the term “intuition” differs from the one referred to in note 23. The English word “intuition” translates both the German “Anschauung” (a type of representation) and the German “Intuition” (an epistemic faculty).)
5 Heuristics and virtue

The heuristics of searching for the grounds of asserted truths presupposes a testing of judgments that is motivated not by doubts about the judgment or by requests for justification, but by a desire to reveal the objective grounds of the truth upon which a judgment bears. Love of truth and zeal to explain are considered to be the driving attitudes in epistemic and, more specifically, scientific activity.27 Bolzano’s heuristics is, like his entire epistemology, deeply rooted in ethical concerns, which are much more aretaic in nature than consequentialist or deontic. Within heuristics, it is primarily inductive and abductive reasoning that fall within the scope of the epistemic virtues. Contrary to rule-governed deductive reasoning, which is strictly valid or invalid and, as such, relatively insensitive to ethical concerns, both inductive and abductive reasoning require that various factors be pondered and integrated in optimal ways into epistemic decisions. “Ascending to the grounds” of a given [q] is striving for its best explanation among possible competing options. It requires abilities such as inventive talent, delight in exploration, patience, perseverance, humility and so on—abilities that reach beyond the steadiness of rule application. Drawing on the results of abductive and inductive procedures, Bolzanian knowing is highly dependent on the efficiency of epistemic virtues.

The fact that Bolzano dissuades us from taking any apparent immediacy as a guarantee of infallibility abrogates the principle that knowing is attributable on the basis of the subjective immediacy of foundational judgments. Whether a truth is foundational cannot be decided by the immediacy of a judgment, but only by working out abductive procedures. The results of abductive procedures, the first (or last) grounds, involve various decisions made on the way of ascending to these grounds. These decisions are, consequently, part of Bolzanian knowing of foundational truth. They include, for example, decisions on where to stop investigations in order not to give them disproportionate weight (Bolzano 1837a §332.3). Appropriate decisions require evaluative agentive properties, comparable to those by which artists apply the appropriate brushstroke or fit in the right word, or those by which wise people are aware of where to initiate or to stop certain actions. The moment to intervene in children’s strife, for example, is not determinable by consulting a deontic precept. Rather, it requires the application of a wide range of skills and experience, as well as the virtues of wisdom, understanding, foresight, fairness and so on. Situations like these are characterized by their involvement of different and conflicting values. Virtues are those properties that can assess the diversity of values involved in a situation, working out cognitively and emotionally balanced responses to it. In the epistemic case, a virtuous approach works out a balanced response to the values of good reasons, the force of evidence, the strength of counterarguments, the relevance of contextual aspects and so on.

An aretaic approach to knowledge does not deny the necessity of mandatory precepts in the pursuit of truth. Rather, it emphasizes that the primary accent of the normative dimension in epistemological concerns is not the deontic character of rule-following, but the fact that epistemic duties warrant the success of epistemic activity when they are based in virtues. Bolzano’s heuristics demonstrates how duty and virtue intertwine in epistemic endeavors:
27 Valuable “cogitation” (Nachdenken) is “directed to truth” or “searching for truth” (Bolzano 1837a §322).
It provides, on the one hand, a collection of rules for one’s “behavior in thinking”, a “directive for thinking” which “ought to be observed” in order to successfully increase knowledge, and, on the other hand, it explicitly takes up aspects of cogitation that are subject to what an agent “earnestly wants”, and hence to the ethics of believing (Bolzano 1837a §§322, 324). The heart of Bolzano’s heuristics consists in a body of “general rules”, among which the following three deserve specific attention in the present context: “Tentative supposition or indirect method” (rule no. 5, §329), “Consulting judgments of others and experience” (rule no. 7, §331), and “Testing one’s own judgments” (rule no. 8, §332).

Heuristic rule no. 5 takes up the hypothetical method that “attempts to find truth by means of something that is not yet known to be true” (Bolzano 1837a §329.1). It combines with rule no. 8—testing one’s judgments—to achieve the twofold goal of testing a seemingly immediate judgment by ascending to the grounds of the asserted truth. Bolzano admits that, due to the involvement of the unknown in the procedure of knowing, the hypothetical method seems “artificial”, “inverted” or “indirect” compared to the “natural” method of inferring known truth from known truth. Practicing exclusively the “natural” method of deduction—which presupposes assertion of grounds before assertion of consequences—restricts knowledge extension, however, in an unhealthy manner. People stubbornly insisting on deductive knowing as the only appropriate kind of knowing might appear stupid, since all their cogitating may happen to lead to no significant result. On the other hand, the “inverted” methods are “amiss” in that they involve “haphazardness” and the favors of “serendipity”—and hence fallibility (Bolzano 1837a §330). Bolzano suggests that only a combination of “natural” and “inverted” methods allows accounting for knowing, and that the art of combining them ideally is highly determined by virtues.

Ascending to the “first grounds” of a set of truths requires the application of the hypothetical method, which cannot rule out fallibility; nevertheless, faultiness is minimized if we choose our tentative suppositions “with proper skillfulness” and “perform the examination in all ways available to us” (Bolzano 1837a §329.2). The required skillfulness enables agents to select those propositions that are potentially expedient as hypothetical grounds for a given statement. “Proper skills” for this task evaluate the propositions’ probability, their simplicity or their convenience for experiments with regard to the statement to be grounded and the circumstances of investigation. Within the set of selected propositions, they can further establish a hierarchy that determines the sequence of examination (Bolzano 1837a §329.3). The soundness of the potential explanans can be examined progressively, by checking the truth-values of propositions entailed by it, or again regressively, by applying the hypothetical method to the hypothetical grounds themselves. For general statements, induction is another appropriate method of examination, one that implies various epistemic and experiential skills (Bolzano 1837a §329.8). Sometimes, so Bolzano claims, it may even suffice to “think” the proposition to be examined “as clearly as possible”, including representing it “in words or signs of another kind”. This may lead us to “see” its evidence or to remember other occasions on which we already considered its truth or falsity.
While choosing the appropriate combination of these methods in the circumstances belongs to epistemic skillfulness, epistemic virtue requires us “not to take a proposition for true just because it pops up in our mind or appears there very vividly” (Bolzano 1837a §329.4). Bolzano
does not exclude subjective evidence as a relevant factor for truth-conduciveness, but he shows that epistemically virtuous agents will not unconditionally rely upon it.

To help us decide which combination of methods is the most promising in searching for the grounds of a given truth, Bolzano suggests some pragmatic rules or principles. They are inspired, on the one hand, by the strongly probabilistic nature of his logic and semantics, and, on the other hand, by consequentialist values. One such rule states that judgments must be tested only in cases in which “we see that a test is possible without assuming propositions that have less reliability than they have themselves”. If the latter is the case, we could at best corroborate the tested judgment but not refute it. Hence, if its eligible reasons are only less reliable propositions, we can “confidently desist from such testing” (Bolzano 1837a §332.7). It is obvious that the “reasons” mentioned here are supposed to refer to the objective grounds of the proposition [q] asserted in “q!”, and not merely to the “epistemic reasons” that factually led us to assert “q!”.

The reliability of propositions can be accounted for in terms of their relational property of probability. Contrary to a proposition’s intrinsic truth or falsity, a proposition’s probability is a relational property depending on a certain set of propositions. Probability is explained—as deducibility is—by a principle of truth-preservation under conditions of variability of terms. It is a triadic relation between two sets of propositions and some constitutive representations:

A proposition [q] is probable with regard to a class [P] of propositions and with regard to a variable constituent [c] if a variation of [c] generating only true propositions in [P] generates not only true propositions in [Q]. (Bolzano 1837a §161)

Similarly, a proposition [q] is deducible from [P] with regard to [c] if a variation of [c] generating only true propositions in [P] generates only true propositions in [Q]. With regard to the relation between deducibility and probability, we may say with Jan Berg that probability is a weaker case of deducibility, or with Joëlle Proust that deducibility is the ideal limit case of probability with value 1.28

28 “To say that a proposition P is derivable from the set of propositions with respect to a sequence of ideas-as-such is tantamount to saying that the conditional proposition: if is true, then P is true, is universally valid with respect to the sequence in question—in other words, that the degree of validity of this conditional is 1 with respect to the sequence. It is near at hand, then, to consider weaker cases of derivability where the conditional has a lower degree of validity. In this way Bolzano is led to his notion of probability. […] This notion of probability is the logical relation of an hypothesis to its evidential support. Hence, Bolzano’s notion of probability has the formal properties of the concept of conditional probability” (Berg 2000, p. 54). “Derivability […] is an extreme case of the relationship of probability. […] When the ‘comparative validity’ of the proposition M with respect to A, B, C, D… is less than 1, there is no longer any relationship of derivability, but a relationship of probability” (Proust 1989, p. 89).

Evaluating the probability of potential grounds [P] of [q] and comparing it to the probability of [q] requires a uniform reference set of propositions. In scientific work, for example, it is common practice to consider a stock of well-established tenets as unquestionable. Such a set can be used to “probabilize” the propositions to be examined, as well as the potential grounds [P] that are considered to explain them. Against this background set, we evaluate whether a suggested candidate [p] for grounding [q] does or does not have a higher probability than [q].
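Schematically, and as a reconstruction rather than a quotation, the two variation-based notions above can be stated as follows, assuming finitely many admissible variants of the constituent [c]:

[q] is deducible from [P] with respect to [c] iff every admissible variant of [c] that makes all propositions in [P] true also makes [q] true;

the probability (degree of validity) of [q] relative to [P] and [c] is the ratio

(number of variants of [c] making all of [P] together with [q] true) / (number of variants of [c] making all of [P] true).

Deducibility is thus the limiting case in which this ratio equals 1, in line with the glosses by Berg and Proust cited in note 28.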
It is rare, however, for epistemic agents to rely unconditionally, in the evaluation of their assertions, on a completely determined reference set of propositions. Even if the axioms of an axiomatic science or the dogmas of a fundamentalist religion provide models for absolute reference sets, scientists at least will admit that their first principles stand in certain dependence relations to propositions external to the system. Bolzano mentions such mutual dependencies between different domains of knowledge in the introduction to the Theory of science. Admitting dependence relations certainly does not entail a throwback to skepticism by way of relativism, but rather a “throwforth” to an open-mindedness that acknowledges “foundherent” relations beyond the limits of a certain domain.29 Keeping in mind that even the seemingly most basic truths involve a web of assumptions remedies the flaw of haughty dogmatism, which pretends to identify which truths are absolutely foundational grounds.

Even in cases of assumptions that are more reliable than the tested judgment, we should weigh the costs of prolonging the investigation against the damage of error. Bolzano honestly admits that the rule of at least equal probability for grounding propositions is itself not an immediate judgment and is, therefore, exposed to the “danger of error”. This danger, however, is considered a lesser one than that of “being determined by any other rule or by mere accident” with respect to the judgments to be tested.30 Since neither doubt nor trust is subject to unconditional randomness, a remaining unavoidable possibility of error is not a good reason to reject a given truth (Bolzano 1837a §332.7, annotation). This epistemic moral reflects the advice given by Kohelet according to which one should not become “overly righteous”.31 Being overly concerned with testing the ultimate possible reason for and especially against each judgment made would destroy one’s epistemic health.32 It would be as insane as taking each apparently immediate judgment to be unquestionably truth-conforming.

As heuristic rule no. 7—“Consulting judgments of others and experience”—suggests, an important factor for making proper decisions regarding the ways and length of explanatory investigation is the experience gained from shared epistemic practices. Under the concept of “common sense” (gemeiner Menschenverstand) Bolzano subsumes performances of consensus and common semiotic systems, without specifying, however, whether the term “common” is to be understood in a distributive or in a collective sense. In the first case, “common sense” would refer to a general property of all human beings, that is, to the kind of understanding that all of them have due to their nature. In the second case, the term would refer to a property that human beings have together, to a kind of understanding that grows out of their practical and mental interactions. In view of some remarks and expressions—especially in
29 For the concept of “foundherency” see Haack (1993). 30 A rule inferior in this respect is the Cartesian rule of universal doubt (or its weaker version of universal
testing), which is criticized for being impracticable. Since it is never possible to test all our judgments, we need a principle of selection in order to decide which judgments may remain untested. But exactly this principle is missing in the Cartesian rule of universal doubt. 31 “Don’t be overly righteous, neither make yourself overly wise. Why should you destroy yourself?”
(Kohelet 7:16). 32 In his Von der weisen Selbsttäuschung (On Wise Self-Deception), Bolzano even praises deception and
error as “sometimes useful […] even for the most wise and virtuous of our species” (Bolzano 1976, p. 111).
Bolzano 1837a §331—I am inclined to favor the second interpretation, at least for some occurrences of the term “common sense”.33 Collective epistemic properties imply not merely “social routes to knowledge” but also the shared intentionality of “collective epistemic agents”.34 Collective epistemology takes epistemic agents to be “deliberators engaging in certain cognitive activities from a first person perspective” which “marks” the rational point of view. Since the first person perspective allows for singular and plural specification, the “locus of epistemic authority and responsibility” can be distributed “within and across” collectives. It is held that epistemic processes often have to be accounted for as processes of joint individual contributions. Reasons that become “salient from the group perspective” will then have “immediacy” for individuals, as a function of their group membership.35

One manifestation of “common sense” is consensus, considered by Bolzano to be “a particular indicator of a judgment’s truth”, especially when “the proposition itself is not doubted in spite of dissent about its ground” (Bolzano 1837a §315.6). The claim that a consensual assertion “q!” indicates its truth-conformity is admittedly striking in those cases where the reasons given for the assertion are not only different but conflicting. Cases of convergence in judgment in spite of diverging reasons seem to be paradigmatic for the need to search for the best explanation of the asserted truth. To that extent, reliance on consensus is not the mere “faith of optimists” or the contingent “fashion” of a “party spirit”.36 Accounting for Bolzanian knowing requires considering all the processes involved in infallibility evaluation. None of them can provide per se the ultimate warrant for infallibility. Epistemically virtuous agents will take consensus on [q] not as a license for dogmatic insistence on their assertion’s infallibility, but rather as a motive for intensifying the search for grounds. Combining trust in the judgment’s truth-conformity with virtues like prudence and curiosity in a balanced way creates the integrity that warrants a successful search for the best explanation. Consensual trust in [q] is an ideal starting point for engaging in the exploration and comparison of the most promising explanations of [q].37

Similarly, shared semiotic systems are important tools in the task of evaluating infallibility. Bolzano holds that thoughts shaped in purely subjective ideas, that is, in mere “mentalese”, are too ephemeral for objective tests of truth-conformity. Such testing requires us to get thoughts “into our grip”, so that they can be recalled “as often as

33 For instance, Bolzano equates a judgment that “almost all human beings” avow “as with One mouth”
with an “utterance of the common sense” (Bolzano 1837a §331.4). 34 Cf. Gilbert (2004); Tollefsen (2007). 35 Tollefsen (2007). 36 “Furthermore, even if all who investigate will converge on a given opinion, that does not make it true.
[...] The conviction that convergence coincides with truth is the faith of optimists, not part of a proper definition of truth”; “[W]hen we form beliefs, we frequently rely on the consensus of others to guide us […] In the […] case, where agreement is used as evidence to settle belief, one is pursuing truth for oneself, and the agreement of others is simply taken as diagnostic of truth. Agreement, however, is not always diagnostic of truth” (Goldman 1999, pp. 12, 71). 37 Bolzano gives the example of the “doctrine of the imperishability of substances” that we can adopt with
full confidence, even if all the reasons philosophers provide in order to prove it should be untenable. This truth is taken to be deeply rooted in human reason (Vernunft), since “even assuming a beginning one cannot think of an end of substances” (Bolzano 1838, p. 71f).
we like”. This purpose is achieved by the use of adequate signs, whose function is “to apprehend thoughts that rushed through our soul and to subject them to more accurate testing” (Bolzano 1837a §344.13). Especially complex representations are dependent on adequate signs, which serve as the “bonding agent” or “cement” of their elements (Bolzano 1837a §344.12; §334.3–5). Bolzano does not advance anything like a “private language argument”, but the way he discusses a theory of signs (Bolzano 1837a §§334–344) as part of heuristics makes it clear that he considers a semiotic system to be essentially the result of inter-subjective endeavor. Although semiotic systems are conventional, they aim at avoiding arbitrariness and misunderstanding. They convert the mass of subjective representations into a currency that enables intra- and inter-subjective trade in truth and explanation. One interest of such trade clearly lies in making thoughts available for objective testing. Other advantages of the use of signs are the possibility of longer inferential series, a better survey of epistemic processes and the chance of being led to new and unexpected thoughts by random assemblies of signs (Bolzano 1837a §334.6–8). All these functions play relevant roles in infallibility evaluation, although none of them can warrant infallibility in its own right.

6 Conclusion

Bolzanian knowing was introduced as a reflective attitude that epistemic agents take towards their own assertions. The “reflection” is supposed to reveal the assertion’s value of infallibility. This evaluation involves explanatory attempts at “ascending” to potential grounds of the asserted truth. The endeavor of explaining asserted truths involves not only the rules of “inverted” methods, but also subjective decisions requiring a combination of skills and character traits. None of the rules mentioned can guarantee infallibility, and neither can skills nor virtuous character traits, nor any specific combination of these factors. It seems to follow, then, that Bolzanian heuristics annuls the possibility of Bolzanian knowing. We are doomed to accept that all reflective attitudes rest on the approximations of the “best possible” evaluation of an assertion’s fallibility, and hence fall within the scope of Bolzanian believing.

We can accept this verdict with a shrug—after all, there seems to be nothing bad in assuming that reflective assessment of one’s assertions always strives for the “best possible” evaluation and that the “perfect” borderline case is just an ideal, a construct helping us not to lose sight of our goal. We can say, in this case, that Bolzanian knowing is accounted for by the following analysis:

(4) A knowsB that [q] iff A believesB that [q] & A’s evaluation of her asserting [q] as fallible reveals a high probability of [q].

This analysis can be related to the claims of Bolzanian epistemic ethics, which demand that we call a proposition “secure” if its probability “for a certain being is very large and if there are circumstances which make it foolish or illicit to consider the possibility of the opposite” (Bolzano 1837b §317.3). In such circumstances—so we might claim—there is not only a right to call the asserted proposition “secure”, but also a right to consider our attitude as “knowing”.

Earlier in this paper, I suggested another option for dealing with the insufficiencies of infallibility evaluation, one that allows for a more substantial distinction of Bolzanian
knowing from believing. To provide the value of the absolute to knowing, we might resort to the idea of a decisional “leap” that leaves probability considerations behind. Let us assume that the attitude of believing implies awareness of potential error, whereas the attitude of knowing is the result of a decision to set such awareness at nought. Knowing then appears as an attitude whose salient virtue is not trust but rather audaciousness. Instead of trusting in the truth-conformity of her fallible assertion, the knowing agent dares to state the infallibility of her assertion. Such an approach can be accounted for by the following analysis:

(5) A knowsB that [q] iff A asserts [q] & A evaluates her asserting [q] & A audaciously decides that her evaluation reveals infallibility.

It is not clear how satisfactory such an analysis of knowledge can be. First, it allows two agents in identical epistemic situations with identical epistemic backgrounds to have different attitudes toward their assertion “q!”: while agent A believes that [q] because her salient virtue is prudence, agent B knows that [q] because her salient virtue is audaciousness. Second, there is the problem of virtues being gradual properties, turning into vices when they reach a certain degree or when they become dominant. Audaciousness is valuable not per se, but only in the appropriate situation and in an appropriate measure. It may be, however, that such a “decisional” conception of knowledge fits the purpose of settling the epistemic status of foundational principles in axiomatic systems. For the pragmatic purpose of working with such principles it is an advantage to attribute to them the status of being known to be the fundamentals. And it is an advantage to keep the sense of this “being known” stronger than that of “being believed”. It is wise to keep in mind, however, that rash decisions to “leap” into knowledge might testify to vicious rather than virtuous audaciousness.

Acknowledgements This paper is dedicated to my teacher Kevin Mulligan. I would like to extend my gratitude to Arianna Betti, Marije Martijn and an anonymous referee for their insightful and constructive comments, which helped me improve this paper.
References

Baier, A. (1992). Trusting people. Philosophical Perspectives, 6, 137–153.
Berg, J. (1976). Bolzanos Metaphysik. In G. Oberkofler & E. Zlabinger (Eds.), Ost-West-Begegnung in Österreich: Festschrift für E. Winter (pp. 27–33). Wien: Böhlau.
Berg, J. (2000). From Bolzano’s point of view. The Monist, 83, 47–67.
Bolzano, B. (1837a). Wissenschaftslehre. In E. Winter et al. (Eds.) (1969). Bernard Bolzano Gesamtausgabe. Reihe 1 (Vols. 11–14). Stuttgart-Bad Cannstatt: Frommann-Holzboog.
Bolzano, B. (1837b). Theory of science. Attempt at a detailed and in the main novel exposition of logic with constant attention to earlier authors. R. George (Transl. & Ed.) (1972). Berkeley and Los Angeles: University of California Press.
Bolzano, B. (1837c). Theory of science. J. Berg (Transl. & Ed.) (1973). Dordrecht/Boston: D. Reidel.
Bolzano, B. (1838). Athanasia oder Gründe für die Unsterblichkeit der Seele. Frankfurt a. M.: Minerva GmbH. (Unveränderter Nachdruck 1970.)
Bolzano, B. (1839). Von der mathematischen Lehrart. J. Berg (Ed.) (1981). Stuttgart-Bad Cannstatt: Frommann-Holzboog.
Bolzano, B. (1976). Ausgewählte Schriften. E. Winter (Ed.). Berlin: Union Verlag.
Carruthers, P. (2006). Simple heuristics meet massive modularity. In P. Carruthers, S. Laurence, & S. Stich (Eds.), The innate mind. Culture and cognition (Vol. 2, pp. 181–198). Oxford: Oxford University Press.
Chisholm, R. (1991). Bernard Bolzano’s philosophy of mind. Philosophical Topics, 19, 205–214.
de Jong, W. R., & Betti, A. (2008). The classical model of science: A millennia-old model of scientific rationality. Synthese. doi:10.1007/s11229-008-9417-4.
Gilbert, M. (2004). Collective epistemology. Episteme: A Journal of Social Epistemology, 1, 95–107.
Goldman, A. (1999). Knowledge in a social world. Oxford: Clarendon Press.
Greco, J. (2003). Knowledge as credit for true belief. In M. DePaul & L. Zagzebski (Eds.), Intellectual virtue: Perspectives from ethics and epistemology (pp. 111–134). Oxford: Oxford University Press.
Haack, S. (1993). Evidence and inquiry: Towards reconstruction in epistemology. Oxford/Malden, MA: Blackwell.
Krause, A. (2004). Bolzanos Metaphysik. Freiburg/München: Karl Alber.
Lapointe, S. (2008). Bolzano, a priori knowledge and the Classical Model of Science. Synthese. doi:10.1007/s11229-008-9421-8.
Lehrer, K. (1997). Self-trust. A study of reason, knowledge, and autonomy. Oxford: Clarendon Press.
Proust, J. (1989). Questions of form. Logic and the analytic proposition from Kant to Carnap. Minneapolis: University of Minnesota Press.
Rescher, N. (2005). Epistemic logic. A survey of the logic of knowledge. Pittsburgh: University of Pittsburgh Press.
Sebestik, J. (1992). Logique et mathématique chez Bernard Bolzano. Paris: Vrin.
Swank, C. (2000). Epistemic vice. In G. Axtell (Ed.), Knowledge, belief, and character. Readings in virtue epistemology (pp. 195–204). Lanham/Boulder/New York/Oxford: Rowman & Littlefield.
Tatzel, A. (2002). Bolzano’s theory of ground and consequence. Notre Dame Journal of Formal Logic, 43, 1–25.
Tollefsen, D. (2007). Collective epistemic agency and the need for collective epistemology. In N. Psarros & K. Schulte-Ostermann (Eds.), Facets of sociality (pp. 309–329). Frankfurt etc.: Ontos.
Zagzebski, L. (2000). From reliabilism to virtue epistemology. In G. Axtell (Ed.), Knowledge, belief, and character. Readings in virtue epistemology (pp. 113–122). Lanham/Boulder/New York/Oxford: Rowman & Littlefield.
Synthese (2011) 183:47–68 DOI 10.1007/s11229-009-9667-9
On the creative role of axiomatics. The discovery of lattices by Schröder, Dedekind, Birkhoff, and others Dirk Schlimm
Received: 5 January 2008 / Accepted: 13 March 2009 / Published online: 16 October 2009 © Springer Science+Business Media B.V. 2009
Abstract Three different ways in which systems of axioms can contribute to the discovery of new notions are presented, and they are illustrated by the various ways in which lattices have been introduced in mathematics by Schröder et al. These historical episodes reveal that the axiomatic method is not only a way of systematizing our knowledge, but that it can also be used as a fruitful tool for discovering and introducing new mathematical notions. Looked at from this perspective, the creative aspect of axiomatics for mathematical practice is brought to the fore.

Keywords Axiomatics · Discovery · Lattice theory · Mathematical practice
1 Introduction

1.1 On the creative role of axiomatics

It is quite common to regard axiomatic systems only as an aspect of the rigorous presentation of scientific or mathematical theories, or of the description of certain domains, but to deny them any role in the creation of new mathematics. This view is succinctly expressed, for example, in the following remarks by Felix Klein on the axiomatic treatment of group theory:

The abstract [axiomatic] formulation is excellently suited for the elaboration of proofs, but it is clearly not suited for finding new ideas and methods, rather, it constitutes the end of a previous development. (Klein 1926, p. 335)
D. Schlimm (B) Department of Philosophy, McGill University, 855 Sherbrooke St. W., Montreal, QC H3A 2T7, Canada e-mail:
[email protected]
Despite the fact that the importance of axiomatics for advancing mathematics had been clearly recognized and often emphasized by none other than Klein’s successor in Göttingen, namely David Hilbert (Hilbert 1918), Klein’s view has remained very popular among mathematicians and even more so among philosophers.1

Traditional accounts of the use of axiomatics in science and mathematics often begin with a specific set of objects or a certain domain of being, say D, which an axiomatic system, say S, is intended to describe and characterize.2 Understood in this way, axiomatization is the process of finding an adequate S for a given D. However, Aristotle’s brief remarks about the introduction of a new notion for what numbers, lines, solids, and times have in common, based on the similarity of certain proofs about them (Analytica Posteriora 1.5, 74a17–25),3 suggest the following procedure: Take some domains D1, D2, D3, etc. that are considered to be analogous in some respect and determine the corresponding axiomatic systems S1, S2, S3, etc.; then, compare these systems and find a (sub-)system S′ that they have in common and introduce a new notion D′ as the domain of being for S′. Aristotle noticed that a scientific system S can be used in this way to suggest new notions, objects, or domains. Thus, axiomatization is not necessarily a one-way process from D to S, but it can also lead one from S to D′. This insight presupposes neither the notion of a formal system, nor the possibility of multiple interpretations (although the latter would most likely be our way of expressing it). Since the domain D′ is more abstract (in the sense of having only a subset of the properties) than the domains D1, D2, D3, etc., the natural setting for such introductions of new notions is mathematics, where the objects are inherently abstract. Indeed, the mathematical notion of magnitude was introduced later to express what the domains discussed by Aristotle have in common.

With a conception of formal systems at hand, by which I mean systems with primitives that can be interpreted in different ways, and which emerged in the 19th century, a second, related way of introducing new domains became possible: Only certain aspects of a single domain D are axiomatized by a system S, and then a new domain D′ is introduced that is completely determined by S. As a result, this new domain is more abstract than D itself.

Furthermore, an axiomatic system S does not need to originate from a given domain D at all, but it can also be obtained through modification of another system of axioms. For example, the first axiom systems for non-Euclidean geometry were obtained in this way from given systems of Euclidean geometry. Only after their consistency was established by interpreting the primitives in a Euclidean setting were the new sets of objects, namely non-Euclidean points and lines, introduced. This is a third way of introducing new domains.

Thus, we have identified three distinct ways in which axiomatics can contribute in an essential way to the introduction of new notions:

(a) By analogy: Properties that different analogous domains have in common are expressed by a set of axioms, which, in turn, are taken as the definition of a new and more abstract notion. In other words, one begins with a prior conception of certain domains being similar and captures this similarity in terms of a common system of axioms.4 These axioms are then understood as characterizing an abstract notion that is instantiated by the analogous domains.

(b) By abstraction: Specific properties of a given domain are axiomatized and other domains are identified that also satisfy these axioms. In other words, one starts here with a particular mathematical domain and describes it axiomatically, thereby abstracting from all aspects that are deemed irrelevant. This axiomatic characterization then guides one to the discovery of other domains that satisfy the axioms and which are, on this basis, considered analogous.

(c) By modification: A given axiomatic system is modified, by adding, deleting, or changing one or more axioms, and the resulting system is used as the definition of a new kind of domain.

1 For a more in-depth discussion of this point, see Schlimm (2006).

2 See, for example, de Jong and Betti (2008).

3 For a detailed discussion of this passage, see Hasper (2006).
For the new notions introduced in one of these ways to become accepted as genuine mathematics, in particular those that arise by modification, their underlying system of axioms must be considered to be ‘interesting’ in one way or another. We will see in the historical examples discussed below that a generally accepted sufficient reason for crediting an axiomatic system with genuine mathematical content is the fact that it describes a domain that had been investigated previously in its own right. This justifies considering the axioms as characterizing a new abstract notion and bars the introduction of notions based on a completely arbitrary set of axioms, since it guarantees a connection to the current body of mathematics.

In the remainder of this paper I will present and discuss how the notion of lattice has been introduced independently by Schröder, Dedekind, Birkhoff, and others, as examples of these three methods for introducing new domains on the basis of axiomatic systems, and I conclude that the axiomatic method is not only a way of systematizing our knowledge of specific domains, but that it can be—and has been—used as a fruitful tool for discovering and introducing new mathematical notions. Looked at from this perspective, and taking into account the role of axiomatics in modern mathematical practice, the creative aspect of axiomatics is brought to the fore.

1.2 The development of lattice theory

A lattice is an algebraic structure that can be defined in terms of two operations ∧ (meet) and ∨ (join) that are commutative, associative, and satisfy the absorption laws a ∧ (a ∨ b) = a and a ∨ (a ∧ b) = a,5 or, equivalently, in terms of a partial order relation on a domain in which the infimum and supremum of any two elements exist. This structure is instantiated in many different areas of mathematics, such as logic, set theory, algebra, geometry, functional analysis, and topology. Important for the development of lattice theory is the relation of lattices to another algebraic structure, namely Boolean algebras, which can be obtained by adding a complement relation to distributive lattices with 0 and 1 elements.

4 On the use of axioms to characterize analogies, see also Schlimm (2008b).

5 In an axiomatic definition of lattices one can also use idempotency together with the equivalence (a ∧ b = a) ↔ (a ∨ b = b) instead of the absorption laws; see Birkhoff (1933) and Klein (1935).
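Footnote 5 can be made vivid with a short calculation, standard in lattice theory and not part of the historical material discussed here: idempotency need not be postulated separately, because it already follows from the two absorption laws. By the first absorption law (with b := a) we have a ∧ (a ∨ a) = a; substituting this into the second absorption law (with b := a ∨ a) yields

a ∨ a = a ∨ (a ∧ (a ∨ a)) = a,

and the dual argument gives a ∧ a = a. The partial order mentioned in the definition is then recovered by setting a ≤ b iff a ∧ b = a (equivalently, a ∨ b = b), with ∧ and ∨ as infimum and supremum.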
The history of the emergence of lattices and of the establishment of lattice theory as a well-respected and independent branch of mathematics has been investigated in great detail by Herbert Mehrtens in Die Entstehung der Verbandstheorie (The Genesis of Lattice Theory, 1979), on which I rely heavily in this presentation.6 Mehrtens identifies three main sources for the notion of lattice: the set-theoretic grounding of mathematics, modern algebra, and the “axiomatic method” (Mehrtens 1979, p. 292). With respect to the latter, he mentions two kinds of generalizations by which new notions can be introduced, which correspond to those referred to above as ‘by analogy’ and ‘by abstraction’ (Mehrtens 1979, p. 197). Comparing the various developments that led to the independent introductions of the notion of lattice in the late 19th century and again in the 1930s, Mehrtens points out that all of these formations of a new notion resulted from generalizations, but none of them as part of a solution to some concrete problem.7 Some particular episodes from the history of lattice theory that reveal the contributions of axiomatics are presented and discussed below.

2 Lattices obtained by modification of a system of axioms: Schröder’s logical calculus

From early on in his career, Ernst Schröder (1841–1902) showed an interest in formal calculi and was well aware of the creative power of axiomatics, which he intended to exploit in his programme of formal or absolute algebra. In his textbook on arithmetic and algebra (1873) he describes the general study of formal algebraic systems as proceeding in four stages: first, find all possible assumptions that could be used for defining an operation in a systematic and sufficient way, with consistency being the only restriction to be imposed on these assumptions; second, investigate the consequences that can be derived from these assumptions; third, try to find operations on number systems that are governed by the same laws; and fourth, determine what other meanings, e.g., geometric or physical, could be given to these operations (Schröder 1873, pp. 233 and 293–296).

Four years later he presented an axiomatization of Boolean algebra as a calculus for classes and propositions, aiming at a formal and rigorous presentation with a minimal number of axioms (Schröder 1877). This axiomatization is based on two operations on classes, + and · (understood as union and intersection, respectively), and it postulates that classes are closed under these operations, which are both associative, commutative, and idempotent; it further postulates that for any classes a, b and c, a = b implies ac = bc and a + c = b + c, that they satisfy the distributivity law a(b + c) = ab + ac, that the universal class (denoted by ‘1’) is the identity with respect to multiplication, and, finally, that for every class a there exists a complement a1, such that aa1 = 0 (where ‘0’ denotes the empty class) and a + a1 = 1. As Mehrtens comments, Schröder’s investigations are not as rigorous as announced (Mehrtens 1979, pp. 35–36); what is missing for a complete system of axioms for Boolean algebra are the (implicit) existence requirements for 0 and 1 and the condition that 0 ≠ 1.
in order to solve earlier problems. See also the quotation from Birkhoff at the beginning of Sect. 4, and Schlimm (2008a).
Nonetheless, Schröder is able to deduce from these axioms known theorems of Boolean algebra, including the absorption laws and one distributive law, a + bc = (a + b)(a + c), while the dual one (i.e., a(b + c) = ab + ac) is taken as an axiom. However, an axiomatic characterization of lattices cannot be obtained directly by simply removing one or more axioms from this system. For this, a rearrangement of the system was necessary, which came about a few years later, after a brief exchange with C. S. Peirce.

Peirce had also investigated Boolean algebras, and in his article “On the algebra of logic” he claimed that the distributivity laws are “easily proved […], but the proof is too tedious to give” (1880, p. 33). This remark must have caught Schröder’s attention, since he began to study the independence of the distributive axiom himself. In the course of these investigations he split the axiom into the two inequalities a(b + c) =( ab + ac and ab + ac =( a(b + c), using the symbol =( for subsumption, and he was able to show that the latter was provable from the remaining axioms, while the former was not. Schröder refers to Peirce’s claim regarding the provability of the distributive laws in the first volume of his Vorlesungen über die Algebra der Logik (Lectures on the Algebra of Logic), published in 1890, noting that “This was a point that needed correction,” and adding that

one of the distributive inequalities was indeed easy and straightforward to prove. By no means, however, was I able to find a proof for the other part of the theorem. Instead, I was successful in showing its unprovability […]. A correspondence with Mr. Peirce on this matter clarified the issue, since he also had become aware of his mistake. (Schröder 1905, Vol. I, pp. 290–291)8

To show the independence of the distributive law from the other axioms, Schröder employed what he called the method of proof “by exemplification” (Schröder 1905, Vol. I, p. 286), which involves the now familiar presentation of a model in which the independent axiom is false, but the remaining ones are true. The quest for such a model led him to reconsider his earlier work on absolute algebra, where he found a suitable one in his logical calculus with groups (“logischer Kalkul mit Gruppen”).

The results of these developments are presented in Schröder’s lectures on the algebra of logic. In the first volume of these lectures he introduces an identical calculus with subsets of a domain (“identischer Kalkul mit Gebieten einer Mannigfaltigkeit”) that is based on the primitive order relation =(. The basic assumptions for this calculus are:9

Principle I. a =( a.
Principle II. If a =( b and b =( c, then a =( c.
Def. (1). If a =( b and b =( a, then a = b.
Def. (2×). 0 is that subset for which 0 =( a, for every subset a of the domain.
Def. (2+). 1 is that subset for which a =( 1, for every subset a of the domain.
8 See also Peirce (1885, p. 190) for Peirce’s acknowledgement of Schröder’s correction. For some later
developments, see Huntington (1904, pp. 300–301) for excerpts from a letter from Peirce to Huntington on this issue, and the discussion in Peirce (1966, Vol. III, p. 128), and Mehrtens (1979, pp. 47–48). 9 These can be found on pages 168, 170, 184, 188, 196, 212, 214, 293, 302, and 303 of Vol. I of Schröder (1905); see also Mehrtens (1979, pp. 43–44).
Def. (3×). If c =( a and c =( b, then we say that c =( ab.
Def. (3+). If a =( c and b =( c, then we say that a + b =( c.
Postulate (1×). 0 is added as the empty subset.
Postulate (1+). 1 is the entire domain.
Postulate (2×). ab is that subset that is common to a and b.
Postulate (2+). a + b is that subset that is formed by a together with b.
Principle III×. If bc =( 0 (and thus bc = 0), then a(b + c) =( ab + ac.
Def. (6). The negation of a subset a is a subset a1, such that aa1 =( 0 and 1 =( a + a1 holds.
Postulate (3). For every subset a there is at least one subset a1, which can be obtained by omitting a from the entire domain.
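In modern notation, it is instructive to see why only one half of the distributive law required an independence proof: the other half holds in every lattice. The following derivation is a standard reconstruction, not Schröder’s own text. Writing ∧ and ∨ for product and sum and ≤ for subsumption: since a ∧ b ≤ a and a ∧ b ≤ b ≤ b ∨ c, we have a ∧ b ≤ a ∧ (b ∨ c); likewise a ∧ c ≤ a ∧ (b ∨ c); hence

(a ∧ b) ∨ (a ∧ c) ≤ a ∧ (b ∨ c),

which is Schröder’s provable inequality ab + ac =( a(b + c). The converse subsumption a(b + c) =( ab + ac is exactly the part whose unprovability he established by exemplification.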
With minor changes (mainly of Definition 3× and Principle III×) this system was later presented by Huntington as an axiomatization of Boolean algebra (1904). Schröder mentions six different areas of application for this calculus and points out that the conditions preceding Principle III× constitute a separate area of application, namely the logical calculus with groups, which is an instance of the modern notion of a lattice with zero and one elements. Schröder devotes three appendices of his lectures to a discussion of the logical calculus, and he uses algorithms, which he had studied extensively in earlier publications, to exhibit a model for it. For Schröder, a group is simply a system that is closed under an operation, and an algorithm is a group of formulas of a particular syntactic form. Of the form in question there are 990 different formulas, which constitute the universe U, and Schröder defines product and sum on algorithms, as well as the zero and one algorithm. While arbitrary subsets of U together with union and intersection form a Boolean algebra, Schröder shows that the distributive law (Principle III×) is not satisfied by the class of algorithms with operations suitably defined, while the laws from Principle I to Postulate (2+) are. Thus, he concludes that the notion determined by these postulates is of genuine mathematical interest and that the distributive law (and its dual, which can be proved from it) is independent of the other axioms. As Mehrtens emphasizes, it is the fact that this model has real content, i.e., that algorithms had been studied before in their own right, which makes this independence result so important. Indeed, this is the only case in which Schröder proves the independence of an axiom, from which Mehrtens concludes that “this is not just an axiomatic technique, but the demarcation of two structures” (Mehrtens 1979, pp. 49–50).

The developments after the publication of the first volume of Schröder’s lectures support the claim that an axiomatic definition of an abstract notion guides the discovery of other instances, since shortly afterwards other models were suggested for proving the independence of the distributivity axioms: classes of natural numbers that are closed under addition (Lüroth 1891), ideal contents of concepts (Voigt 1892), and Euclidean and projective geometry (Korselt 1894). These are discussed by Schröder in the second volume (1905) of his lectures (Schröder 1905, Vol. II, pp. 401–423).10
10 See also Mehrtens (1979, p. 59).
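Schröder’s model of 990 algorithms does not lend itself to compact reproduction, but the method of exemplification itself can be illustrated with the smallest standard example, the ‘diamond’ lattice M3; the sketch below is my illustration and appears in none of the sources discussed here. It verifies by brute force that M3 satisfies the lattice laws corresponding to the conditions up to Postulate (2+), while distributivity fails in it, even in the guarded form of Principle III×:

```python
from itertools import product

# The diamond lattice M3: bottom "0", three incomparable atoms a, b, c, top "1".
elems = ["0", "a", "b", "c", "1"]
leq = {(x, y) for x in elems for y in elems if x == y or x == "0" or y == "1"}

def height(z):  # number of elements below z; ranks the candidates
    return sum((w, z) in leq for w in elems)

def meet(x, y):  # greatest lower bound
    return max((z for z in elems if (z, x) in leq and (z, y) in leq), key=height)

def join(x, y):  # least upper bound
    return min((z for z in elems if (x, z) in leq and (y, z) in leq), key=height)

# Commutativity, associativity, and absorption all hold in M3 ...
for x, y, z in product(elems, repeat=3):
    assert meet(x, y) == meet(y, x) and join(x, y) == join(y, x)
    assert meet(meet(x, y), z) == meet(x, meet(y, z))
    assert join(join(x, y), z) == join(x, join(y, z))
    assert meet(x, join(x, y)) == x and join(x, meet(x, y)) == x

# ... but distributivity fails for the three atoms; since meet(b, c) == "0",
# even the guarded law of Principle III× is falsified by this model.
a, b, c = "a", "b", "c"
assert meet(b, c) == "0"
print(meet(a, join(b, c)))           # 'a'
print(join(meet(a, b), meet(a, c)))  # '0'  -- the two sides differ
```

As with Schröder’s algorithms, the force of such a model depends on its elements being mathematically meaningful; the toy example above only displays the logical mechanism of the independence proof.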
To summarize, the emergence of the notion of lattice in Schröder’s work shows how an axiomatic characterization of a new mathematical notion can have its origin in a previous axiomatization, from which only a subset of the original axioms is considered. The particular axiom that led to this subset was brought to Schröder’s attention by Peirce’s investigations regarding its independence from the other axioms, and Schröder’s own previous studies suggested to him a particular model for the remaining axioms, which was of independent interest.11 This model justified him in regarding the remaining axioms as determining a new notion, of which other mathematically interesting instances were subsequently found.

3 Lattices as abstractions: Dedekind’s Dualgruppen

Richard Dedekind (1831–1916) was highly influential in developing the modern abstract style of mathematics and many of his results and techniques have become standard: He introduced such fundamental algebraic notions as field, module, and ideal, he formulated an axiomatic characterization of the natural numbers, and he gave the construction of a continuous domain in terms of cuts of rational numbers. What is perhaps less well known is that he also developed—more or less as a byproduct of his work on algebraic number theory and independently of Schröder—the notion of lattices.

In algebraic number theory Dedekind’s general aim was to transfer notions and results pertaining to elementary number theory to more general domains of numbers. Such a programme had begun with Gauss’s investigations of the whole complex numbers of the form a + bi (a, b ∈ Z), now called ‘Gaussian integers.’ Kummer had extended this approach to the cyclotomic integers, solving the difficulty that decomposition into prime factors is not always unique by the introduction of ideal numbers.12 This background explains some of Dedekind’s seemingly unusual—for the modern reader—choices of terminology in algebra, which were deliberately made to highlight the analogies with number theory.13 He published his main contributions to algebraic number theory as Supplements to the second (1871), third (1879), and fourth (1894) editions of Dirichlet’s Vorlesungen über Zahlentheorie (Lectures on Number Theory), which Dedekind also edited.14 It is telling for the depth of his work that Emmy Noether had her students read all versions of these supplements (Dedekind 1964, Introduction). In the following, the interplay between Dedekind’s axiomatic approach15 and the emergence of his notion of lattice, which he called Dualgruppe, is presented.

11 Notice that many models constructed only for the purpose of showing the independence of certain axioms, e.g., in Huntington (1904), are of no further mathematical interest.

12 The ring of cyclotomic integers is Z[ζn], where ζn = cos(2π/n) + i sin(2π/n) is a complex nth root of 1. The name derives from the fact that the points ζn, ζn^2, ..., ζn^n are equally spaced around the unit circle. See Dedekind (1877, pp. 3–45) for a historical introduction by Stillwell.

13 See, for example, Dedekind (1877, p. 64), and Dedekind (1932, III, p. 62).

14 For a discussion of these works, see Avigad (2006).

15 On Dedekind’s axiomatic approach in his foundational work, see Sieg and Schlimm (2005).
An important technique for the formation of mathematical notions, which Dedekind employed as early as 1857, is to consider the set of objects that have a certain property as a single entity. For example, Dedekind considered congruent numbers and those numbers that are divisible by one of Kummer's ideal numbers as single mathematical objects. Similar considerations in his work on modules led him to the notion of lattice.

When Dedekind first introduced the notions of fields, modules, and ideals in 1871, the operations on these entities were not part of the definitions themselves. Rather, they were induced from the underlying domain of numbers. Thus, a module was defined simply as a system of real or complex numbers that is closed under addition and subtraction (Dedekind 1932, Vol. III, p. 242). Relations between modules, such as being a divisor or a multiple, as well as the notions of greatest common divisor (gcd) and least common multiple (lcm), were defined in terms of the underlying domain, but no symbols were introduced for these operations and Dedekind did not investigate them further. Only six years later, and with some hesitation, did Dedekind introduce symbols for multiple (>), divisor (<), lcm (+), and gcd (−) in (Dedekind 1877b, p. 121). This allowed him to state concisely the following theorems (without proof), for modules a, b, and c:

(a + b) − (a + c) = a + (b − (a + c)), and (a − b) + (a − c) = a − (b + (a − c)).

These correspond to what are now called the 'modular laws' in the theory of lattices, and they illustrate the need for symbolic representations of gcd and lcm in order to express such general facts. Dedekind also noted that these "characteristic theorems" display a dualism that holds throughout for the notions of gcd and lcm: any true formula expressed in terms of + and − can be transformed into another true formula by switching these symbols. Mehrtens also mentions notes from Dedekind's Nachlaß entitled "Über den Dualismus in den Gesetzen der Zahlen Moduln" (On the dualism in the laws of number modules),16 which reveal Dedekind's interest in this particular phenomenon.

In the 1894 version of the Supplements to Dirichlet's lectures Dedekind speaks of "a peculiar [eigentümlicher] dualism" (Dedekind 1932, Vol. III, p. 66) between gcd and lcm. He introduces these operations separately and shows their fundamental properties, i.e., commutativity, associativity, and idempotency, for modules a, b, and c (Dedekind 1932, Vol. III, pp. 63 and 65):

a + b = b + a, (a + b) + c = a + (b + c), a + a = a,
and
a − b = b − a, (a − b) − c = a − (b − c), a − a = a.
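These laws can be checked on a concrete instance. For number modules one has aZ + bZ = gcd(a, b)Z and aZ ∩ bZ = lcm(a, b)Z, so a minimal sketch (my illustration, not Dedekind's general setting) may read the two operations as gcd and lcm on positive integers and verify the characteristic theorems by brute force; since the identities come in dual pairs, the assignment of the two symbols does not matter for the check.

    from math import gcd
    from itertools import product

    def lcm(x, y):
        return x * y // gcd(x, y)

    plus, minus = gcd, lcm   # one of the two dual readings of Dedekind's + and −

    for a, b, c in product(range(1, 31), repeat=3):
        # the two "characteristic theorems" (the modular laws):
        assert minus(plus(a, b), plus(a, c)) == plus(a, minus(b, plus(a, c)))
        assert plus(minus(a, b), minus(a, c)) == minus(a, plus(b, minus(a, c)))

    print("modular laws hold for all a, b, c up to 30")

Swapping the two functions (plus, minus = lcm, gcd) turns each assertion into the other and leaves both true, a small instance of the duality that caught Dedekind's attention. (The divisibility lattice is in fact distributive, so it satisfies more than the modular laws; the check is only an illustration.)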
Together with the modular laws, which are now proved, the symmetry of these two operations becomes quite apparent.

16 (Cod. Ms. Dedekind XI, 1). This manuscript makes references to the second edition of the Supplements and thus was written before 1879, when the third edition was published.

In a footnote to these considerations Dedekind introduces the notion of a Modulgruppe:
If one repeatedly generates modules by forming greatest common divisors and least common multiples, beginning from three arbitrary modules, one obtains a finite Modulgruppe, which consists in general of 28 different modules. The peculiar laws of this group, which contains the modules a ± b if it contains the modules a and b, shall be discussed elsewhere [cf. XXX]. (Dedekind 1932, Vol. III, §169, pp. 66–67)17

The Modulgruppe is also mentioned later in the text in a footnote, where ideals are introduced as a special kind of module. Dedekind remarks that the group in question is reduced to 18 elements if its elements are ideals rather than modules, which indicates that he had already studied in some detail the structures induced by gcd and lcm.

The notion of lattice is finally introduced under the name Dualgruppe in Dedekind (1897) and studied further in Dedekind (1900). In the first of these articles, Dedekind studies systems of numbers in terms of their gcds. This is done in the most general way possible, he explains, and is to be extended to domains that do not allow for decomposition into prime factors. Dedekind begins by investigating systems of three and four numbers, then systems consisting of n general elements, called combinations. For these he formulates six fundamental laws for the operations − (the combination common to two given ones) and + (the combination that contains two given ones), referred to as "laws A":

α + β = β + α, (α + β) + γ = α + (β + γ), α + (α − β) = α,
α − β = β − α, (α − β) − γ = α − (β − γ), α − (α + β) = α.
Thus, Dedekind identifies commutativity, associativity, and the absorption laws, and he also notes that the idempotent laws α + α = α and α − α = α follow from them (for instance, α − (α + α) = α by the second absorption law, and hence α + α = α + (α − (α + α)) = α by the first), but that the distributive laws (α − β) + (α − γ) = α − (β + γ) and (α + β) − (α + γ) = α + (β − γ), although true for the combinations considered, are not deducible from the laws A. Since Dedekind's combinations are sets of elements, the operations + and − can also be interpreted as union and intersection. Seen in this way, Dedekind remarks, many of his theorems about combinations correspond to theorems proved in Schröder's lectures on the algebra of logic, and he attributes "particular importance" to the fact that Schröder showed the independence of the distributive laws from the system of laws A. In fact, he remarks that he had dealt with these questions for many years himself and that he, too, had arrived at this result "not without great effort" (Dedekind 1897, p. 113).

17 The reference XXX is to Dedekind (1900) and was added by the editors of the Gesammelte Werke. Note that Dedekind writes a ± b for 'a + b and a − b.'

In the subsequent paragraph Dedekind gives the following definition of Dualgruppe:
A system A of things α, β, γ . . . is called a Dualgruppe, if there are two operations ±, such that they create from two things α, β two things α ± β that are also in A and that satisfy the conditions A. (Dedekind 1897, p. 113)

To be sure, Dualgruppen are not groups in the modern sense, but lattices. And although Dedekind himself had studied groups in the 1850s and an axiomatic characterization of groups had been published by Dyck in 1882,18 it appears that this term was not always used in the modern sense by mathematicians who were not deeply involved with the theory of groups. For them, including Dedekind and Schröder, a group was simply a set of elements that is closed under certain operations.19

Immediately after the above definition, Dedekind continues: "In order to show how multifarious the domains are to which this concept can be applied, I mention the following examples" (Dedekind 1897, p. 113), and in addition to the model provided by Schröder, he describes five other models, namely modules, ideals, the subgroups of a group, fields, and points of an n-dimensional space. Referring to this list of examples, Birkhoff, who takes historical accuracy very seriously, remarks that "[t]he abundance of lattices in mathematics was apparently not realized before Dedekind" (Birkhoff 1940, p. 16). Thus, in this case the axiomatic definition of an abstract structure goes hand in hand with the observation that other mathematical domains also satisfy the axioms.

Dedekind not only gives an axiomatization of lattices, but also develops the theory further. As already noted, the distributive laws do not hold in general for Dualgruppen, but they do hold for the important models from logic and ideal theory. This leads Dedekind to introduce the subspecies of Dualgruppe vom Idealtypus, i.e., distributive lattices. The lattice of modules, for which the modular laws hold, is accordingly called Dualgruppe vom Modultypus. By exhibiting suitable models, Dedekind is able to show that these two notions do not coincide, i.e., that the distributive and modular laws are independent. In "Über die von drei Moduln erzeugte Dualgruppe" (On the Dualgruppe generated by three modules, 1900), Dedekind investigates—in modern terms—the free modular lattice with three generators; he also determines the structure of the lattice generated by three ideals (i.e., the free distributive lattice with three generators), and he further investigates modular lattices, proving some fundamental theorems about them.20

Mehrtens notes that in his references to Schröder's lectures, Dedekind does not mention that his own notion of Dualgruppe coincides with Schröder's notion of the logical calculus, and speculates that at this point Dedekind might not have realized that the same abstract structure underlies his and Schröder's investigations, but only that certain similar relations, which are expressed by the underlying axioms, hold between statements concerning logic and his ideals and modules (Mehrtens 1979, p. 97). Thus, if Mehrtens's analysis is correct, it was the axioms and theorems that brought out the analogy between Schröder's and Dedekind's work, at a time when it was not yet possible for Dedekind to grasp the abstract structure they instantiate.

18 See Wussing (1984).
19 To avoid confusions arising from this terminology, I shall refer to Dedekind's notion simply as Dualgruppe, rather than translating it into English.
20 See, for example, Burris and Sankappanavar (1981, pp. 12–17).
Three years later, however, Dedekind explicitly draws the connection between his Modulgruppen and Schröder's identical calculus, and the correspondence between Dualgruppen and the logical calculus (Dedekind 1900, p. 252, footnote). Thus, by 1900 Dedekind had published an axiomatic characterization of lattices, discussed the main examples, and proved the fundamental theorems concerning modular lattices; but he did not present this work as the programmatic beginning of a new and important theory. Although abstract, Dedekind's notion of Dualgruppe was intimately tied to that of modules, which he had hoped would play a fundamental role in algebraic number theory. The subsequent development did not follow his lead, however, as can be seen from the fact that modules do not occur in Weber's textbook on algebra (1895–1896) and are mentioned only very briefly in Hilbert's influential Zahlbericht of 1897.21

In contrast to Schröder, whose starting point for the development of the notion of lattice was an axiomatization of Boolean algebras, Dedekind began with the study of particular instances. The structure generated by the operations of gcd and lcm on modules gradually emerged in these investigations, and the duality of the laws governing these operations sparked Dedekind's interest. He tried to give a minimal axiomatic characterization, and in the study of the dependencies between axioms and theorems it was especially the independence of the distributive laws that caught his attention. Finally, the publication of Schröder's lectures provided him with a new instance of this notion, which motivated him to publish his own investigations on these matters.22 And as soon as the axiom system was formulated, Dedekind noticed a number of other instances.
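Dedekind's counting results can be reproduced mechanically. In the following sketch (a modern recomputation, not Dedekind's method) three 'generic' subsets of an eight-element universe play the role of three ideals; closing them under union and intersection generates the free distributive lattice on three generators, with exactly the 18 elements Dedekind counted. The Modulgruppe of three modules, i.e., the free modular lattice with its 28 elements, cannot be obtained this way, since sets under union and intersection always satisfy the distributive laws.

    from itertools import combinations

    # Three "generic" sets: element i of an 8-element universe lies in
    # A, B, or C according to the three bits of i.
    A = frozenset(i for i in range(8) if i & 1)
    B = frozenset(i for i in range(8) if i & 2)
    C = frozenset(i for i in range(8) if i & 4)

    lattice = {A, B, C}
    while True:
        new = {op for x, y in combinations(lattice, 2)
                  for op in (x | y, x & y)} - lattice
        if not new:
            break
        lattice |= new

    print(len(lattice))   # 18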
4 Lattices in the 1930s: analogies, modifications, and abstractions

Schröder's and Dedekind's notions of the logical calculus and the Dualgruppe were not taken up immediately by their contemporaries. Mathematical practice, however, changed substantially between the turn of the century and the 1930s. In particular, axiomatics developed into a general technique, and the use of set-theoretic reasoning became commonly accepted. Moreover, by 1930 many algebraic structures had been studied extensively, and generalizations and abstractions were no longer frowned upon as they often had been earlier.23 In the wake of these changes the notion of lattice was rediscovered independently by several authors at around the same time. Mehrtens describes the years between 1930 and 1940 as the formation period of lattice theory, after which it had become an established mathematical theory. It is characteristic of this formation period that mathematicians who studied lattices still had to justify their interest in the notion to their peers.

21 For a discussion of the lack of influence of both Schröder's and Dedekind's notions of lattices, see Mehrtens (1979, pp. 123–126).
22 Interestingly, the publications of Dedekind's work on the foundations of mathematics were also triggered by other publications on the same subject matter; see Dedekind (1872, p. 317), and Dedekind (1888, p. 335).
23 See Corry (1996) and Ferreirós (1999) for detailed accounts of these developments.

One of the pioneers of lattice theory, Garrett Birkhoff, reports:
I recall being dashed when my father asked me what, specifically, could I prove using lattices that could not be proved without them! My lattice-theoretic arguments seemed to me so much more beautiful, and to bring out so much more vividly the essence of the considerations involved, that they were obviously the 'right' proofs to use. (Birkhoff 1970, p. 6; quoted from Mehrtens 1979, p. 176)24

The justification for a new notion was usually given in terms of its wide range of applicability or, if possible, its usefulness for solving problems. In the following I shall briefly present how the notion of lattice emerged by analogy in the work of Menger and Bennett, by modification in that of Klein, and by abstraction in that of Ore and Birkhoff.
4.1 Menger’s unification of geometry That a projective geometry can be seen as an instance of a lattice had been noted by several authors (e. g., Korselt 1894), but only for Karl Menger (1902–1985) was this the main motivation for introducing his notion of Feld, i. e., a lattice with 0 and 1 elements. He was surprised by the fact that projective and affine geometry,25 although analogous in many respects, were not presented by similar axiomatizations, and asked: “Since projective and affine geometry have so much in common, why not base them on two sets of assumptions that have much in common?” (Menger 1940, [p. 43; translated from Mehrtens 1979, p. 132]). Menger’s interest in geometry was sparked by a coincidence. When he was assigned to teach projective geometry at the beginning of his career as professor in Vienna in 1927, he could not find a satisfying foundation of it in terms of union and intersection, and so he decided to work one out by himself.26 This resulted in a few remarks on a new axiomatization of projective geometry (1928), and Menger continued these investigations together with his students, but they did not arouse much interest outside of their circle.27 Over several years this system of axioms was studied in depth and more and more refined, and a summary of these efforts was published as “New foundations for projective and affine geometry”—subtitled “Algebra of geometry”—in 1936. Menger explicitly motivates his axiomatization of projective geometry, which is based on a single domain of entities (the linear parts of a space) and two operations of union and intersection, by the “far-reaching analogy” with abstract algebra and the algebra of logic that is thereby obtained (Menger 1936, p. 456). He recalls: The algebra of numbers has been developed from postulates about adding and multiplying numbers; the algebra of classes from postulates about joining and intersecting classes. This suggested a foundation of geometry on postulates about 24 Birkhoff’s contributions to lattice theory are discussed in Sect. 4.5, below. 25 In projective geometry all lines intersect, points and lines are dual. Affine geometry is a theory common
to Euclidean and several non-Euclidean geometries, which contains the notion of parallelism, but not that of a metric. 26 An interesting historical parallel is Dedekind’s interest in the foundations of analysis, which also resulted from his teaching duties (Dedekind 1872, p. 315). 27 See also Mehrtens (1979, p. 131).
The algebra of numbers has been developed from postulates about adding and multiplying numbers; the algebra of classes from postulates about joining and intersecting classes. This suggested a foundation of geometry on postulates about joining and intersecting flats, and the name 'algebra of geometry' for the theory developed. (Menger 1940, p. 45; quoted from Mehrtens 1979, p. 132)

In addition, his treatment of geometry differs from traditional ones in two other respects: the geometry has an arbitrary finite number of dimensions from the start, and affine and projective geometry are developed together as much as possible. Menger explains:

We first develop […] consequences of a system of axioms valid in both affine and in projective spaces. […] From this system of axioms common to both geometries we pass to either of them. By adjoining the missing dual of one axiom we obtain a completely self-dual system from which all of projective geometry can be deduced. By adding the Euclidean parallel axiom we obtain the theory of affine spaces. (Menger 1936, p. 457)

The possibility of developing great parts of both theories together "could hardly have been foreseen," Menger remarks, and he also claims that this has advantages "from the pedagogic point of view" (Menger 1936, p. 457). In the concluding paragraph he suggests further investigations based on his axiomatization:

By varying slightly some of the axioms of our system, new geometry systems might be obtained. [Footnote: This matter is evidently related to the question of the independence of our axioms, which is not considered in this paper.] Particularly promising in this respect is a variation of Axiom ·6 which, as we have seen constitutes the single difference between projective and affine geometry. (Menger 1936, p. 481)

Thus, after having introduced his axiomatization on the basis of a perceived analogy between affine and projective geometry, Menger very clearly expresses here how modifications of it can lead to new theories, and he identifies one axiom from his system that looks "particularly promising." In other words, he employs axiomatics not just for unifying different theories, for teaching, and for consolidating previous results, but also as a vehicle for further investigations.
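Menger's 'algebra of geometry' can be made concrete on a small example. The sketch below is a modern illustration, not Menger's own axiom system: it takes the linear subspaces of the vector space GF(2)^3, whose one- and two-dimensional subspaces form the points and lines of the seven-point projective plane, with intersection as meet and linear span as join, and checks that the modular law holds for all triples of subspaces while distributivity fails.

    from itertools import product

    # Vectors of GF(2)^3 encoded as bitmasks 0..7; vector addition is XOR.
    def is_subspace(s):
        return 0 in s and all(x ^ y in s for x in s for y in s)

    subspaces = [frozenset(s) for k in range(256)
                 if is_subspace(s := {v for v in range(8) if k >> v & 1})]
    # 16 in all: {0}, seven lines, seven planes, and the whole space.

    def span(s):
        s = set(s) | {0}
        while True:
            t = s | {x ^ y for x in s for y in s}
            if t == s:
                return frozenset(s)
            s = t

    meet = lambda u, v: u & v
    join = lambda u, v: span(u | v)

    # The modular law holds for all triples of subspaces ...
    for a, b, c in product(subspaces, repeat=3):
        if a <= c:
            assert join(a, meet(b, c)) == meet(join(a, b), c)

    # ... but distributivity fails for three distinct lines in one plane:
    a, b, c = frozenset({0, 1}), frozenset({0, 2}), frozenset({0, 3})
    print(meet(a, join(b, c)))           # the line a itself
    print(join(meet(a, b), meet(a, c)))  # the zero subspace {0}

The failure of distributivity for three distinct lines in a common plane is exactly what separates the lattice of flats of a projective geometry from the algebra of classes.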
4.2 Bennett’s explication of a commonality of axiom systems Although Albert Bennett’s (1888–1971) paper on lattices (1930) remained fairly isolated and consists in not much more than the definition of lattices, it is worth discussing at this point for two reasons. First, it was presented at the time when other similar formulations emerged, thus indicating that the rediscovery of the lattice structure was in the air. Second, he explicitly motivates the introduction of this notion by pointing out that it captures what is common to various previously studied notions. Thus, like Menger’s, also Bennett’s axiomatization is intended to clarify an analogy between previously given mathematical notions. Bennett begins his paper by noting that the notion of serial order (i. e., total order) and the calculus of classes have received a fair amount of attention from the “postulationists,” in particular by Huntington in (1904) and (1917). He continues:
The two subjects differ considerably but both may be developed by use of a common symbol, <, of order relation. Some other important systems differing from both show also an essentially analogous use of a symbol of dyadic order relation […]. It appears therefore worth noting that a body of common relations found in these various basic mathematical studies has hitherto escaped a common formulation. (Bennett 1930, p. 418)

Obviously unaware of the earlier work by Peirce and Schröder, Bennett presents an axiomatization of semi-serial order (partially ordered sets with suprema and infima). The axiom system is taken mostly from Huntington, but "by the omission of certain postulates there given but here extraneous and by introducing VIII an essentially new system of more extensive application is obtained" (Bennett 1930, p. 419).28 Thus, Bennett's aim is to characterize a perceived analogy axiomatically, and this analogy is based on previous axiomatizations. This is similar to the case of Aristotle mentioned in the introduction. For this reason it is appropriate to refer to his introduction of lattices as one 'by analogy,' since he would not have pursued this axiomatization without being motivated by the analogy he saw between serial orders and the calculus of classes.

After presenting his axioms Bennett shows how to define the operations ∨ and ∧ from the order relation, and he deduces some basic theorems. The last two pages of this five-page paper are devoted to a list of 12 mathematical domains that satisfy his system of axioms. He mentions the natural numbers together with ω and the non-negative real numbers with −1 and +∞, both domains with the usual order relation; the non-negative rational integers with gcd and lcm; the subclasses of a given class with logical product and sum; and the linear projective subspaces of a given space of n dimensions. In this context Bennett notices a new connection between logic and geometry, namely that the algebra of logic applied to a class of n + 1 elements is a special case of Veblen and Young's theory of finite geometry of n dimensions with p + 1 points on a line, where p = 1, and he remarks that "[t]his relationship is however left unnoted by Veblen and Young, and by Huntington" (Bennett 1930, p. 422). As further examples Bennett lists the set of closed intervals on a line, the set of all convex regions in a plane, the set of all submodules of a given module, the class of all linear subsets of a given linear set, the class of all regions in the plane each of which is bounded by a circle, the system of subgroups of a given group, and the set of idempotent elements in certain algebras of finite basis, all with appropriate operations.

In sum, this short paper is a great example of the use of axiomatics for capturing what is common to two given theories, and of the fact that other models can be found with ease once such an axiomatization is formulated.

28 The 'VIII' refers to Bennett's list of axioms.
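The passage from the order relation to the operations, as Bennett performs it, is easy to spell out: x ∨ y is the least upper bound of x and y, and x ∧ y their greatest lower bound, whenever these exist. A minimal sketch (the divisor example is mine, not Bennett's):

    # Join and meet recovered from an order relation, as least upper and
    # greatest lower bounds; illustrated on the divisors of 12, where
    # x ≤ y is taken to mean that x divides y.
    elements = [1, 2, 3, 4, 6, 12]
    leq = lambda x, y: y % x == 0

    def join(x, y):
        ubs = [z for z in elements if leq(x, z) and leq(y, z)]
        least = [u for u in ubs if all(leq(u, v) for v in ubs)]
        return least[0] if least else None   # None: no least upper bound

    def meet(x, y):
        lbs = [z for z in elements if leq(z, x) and leq(z, y)]
        greatest = [w for w in lbs if all(leq(v, w) for v in lbs)]
        return greatest[0] if greatest else None

    print(join(4, 6), meet(4, 6))   # 12 2, i.e., lcm and gcd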
4.3 Klein’s generalization of algebraic structures In contrast to Bennett, who apparently wrote only a single paper on lattices, Fritz Klein (1892–1961) published over a dozen of articles on this topic between the years 1929 and 1939. Initially, influenced by the work of Schröder and his own investigations 28 The ‘VIII’ refers to Bennett’s list of axioms.
of logic, he became interested in abstract operations, i.e., operations for which the nature of the elements operated upon can be disregarded (Klein 1931, p. 398). In particular, he found it curious that the distributive laws of logical sum and product are symmetric, while those for arithmetical addition and multiplication are not (1929). Thus, a negative analogy caught his attention. Following up on this observation he was led to the axioms for a distributive lattice, which he called "A-Menge" (1931). A year later he introduced the general notion of lattice under the now current German term "Verband" (Klein 1932, p. 117). In this context he also gave examples from number theory, which he had apparently learned in the meantime from Dedekind's works. More references to other models appear in his later publications, and Klein seems to have been encouraged in his pursuits by realizing that other mathematicians had also independently taken an interest in lattices. Like Bennett's, Klein's research focuses on pure axiomatics, and the applications merely serve to provide a justification; they do not appear to guide his investigations in any particular way. According to Mehrtens, Klein's studies are detailed but elementary; e.g., he does not discuss the modular laws at all. But his work is evidence of a change towards a mathematical practice that emphasizes abstract axiomatic approaches (Mehrtens 1979, pp. 174–175).

4.4 Ore's programme of structural investigations

A much more influential contribution to lattice theory than those provided by the three mathematicians discussed above is the work of Oystein Ore (1899–1968). After obtaining his doctorate in 1924 under Skolem, Ore worked chiefly on algebraic number theory until the 1930s, focusing in particular on field and ideal theory. Together with Noether and Fricke he edited Dedekind's collected works (1930–32), and by this time his interest had shifted to polynomials in non-commutative rings. His general aim was the transfer of decomposition theorems from algebraic number theory to non-commutative domains. These investigations were later extended to include the Jordan-Hölder theorem and group theory, where his goal became to "base the theory [of groups] as far as possible directly upon the properties of subgroups and eliminate the elements" (Ore 1937, p. 149; translated from Mehrtens 1979, p. 211). A general discussion of Ore's contribution to the structural image of algebra can be found in Corry (1996, Chap. 6, pp. 263–292). I shall focus here on the emergence of the notion of lattice in Ore's works.

Ore was interested in general structural properties of algebraic systems, and he found in lattices, which he called structures, a very fruitful tool for his investigations. Birkhoff and Mac Lane suggest (in letters to Mehrtens) that Ore developed this notion independently, despite the fact that his first publication on the subject came after Birkhoff's, whom he mentions. It was with Ore's "On the foundations of abstract algebra I" that many mathematicians first heard about the notion of lattice, and this was one of the most often quoted papers on the subject until 1940.29 Ore begins this paper by rejecting the search for a general notion that encompasses all algebraic structures.

29 Birkhoff's textbook Lattice theory was published in 1940. His early publications on lattices were in lesser-known journals.
He suggests instead a different approach to their unification and study, namely through the investigation of the systems of relations between their sub-domains in terms of a new notion:

For all these systems there are defined the two operations of union and cross-cut satisfying the ordinary axioms. This leads naturally to the introduction of new systems, which we shall call structures, having these two operations. (Ore 1935, p. 406)

He notes that, on the one hand, this more abstract approach results in a loss of available mathematical machinery (e.g., residue systems and cosets), but that, on the other hand, "a great deal of simplification and also many new results" are gained by this move (Ore 1935, p. 407). The importance of new results is also emphasized in a later paper, where he remarks, in connection with the possibility of presenting known results from different areas as following from a common unifying notion:

It is of course quite interesting to examine to what extent this is possible, but the real usefulness of the idea appears through the various new results to which it leads. (Ore 1938, p. 801; quoted from Corry 1996, p. 274)

Previous mathematicians had introduced the notion of lattice either in terms of a partial order relation, from which the operations of meet and join were then defined (e.g., Bennett), or vice versa (e.g., Menger and Klein). Ore shows that these two axiomatizations are in fact equivalent (Ore 1935, p. 409).30 I have pointed out in connection with Bennett that until 1930 the notion of order had almost exclusively been understood as linear (total) order. As such it had been investigated axiomatically by Huntington and Veblen. Hausdorff had introduced the notion of a partially ordered set in 1914, but omitted it from the revised second edition of his textbook in 1927, since it was of no further importance for his work. It is chiefly with Ore's axiomatic presentation of a "partly ordered set" and the relevance of partial orderings in connection with lattices that this notion became more prominent in mathematics.31

At this point an interesting observation can be made regarding the difference between how mathematics is perceived and presented in retrospect and how it actually develops. Birkhoff tells us the following story:

It is often said that mathematics is a language. If so, group theory provides the proper vocabulary for discussing symmetry. In the same way, lattice theory provides the proper vocabulary for discussing order, and especially systems which are in any sense hierarchies. (Birkhoff 1938, p. 793; translated from Mehrtens 1979, p. 314)

As we have seen, this logic internal to mathematics does not reflect the historical development of the theory, in which the study of orderings played only a marginal role.
30 Equivalence is meant here in the sense of mutual interpretability. 31 Ore (1935, p. 408). Ore refers to Hausdorff for the terminology; see also Mehrtens (1979, p. 187).
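Ore's equivalence result is easy to make concrete in one direction: from the operations one recovers the order by defining a ≤ b to hold iff a ∧ b = a (equivalently, iff a ∨ b = b) and verifies that the relation so defined is a partial order. A minimal sketch (my illustration, on the power set of a three-element set):

    from itertools import product

    # From the operations to the order: define a ≤ b iff a ∧ b = a.
    # Sketch on the power set of {0, 1, 2}, with ∩ as meet and ∪ as join.
    elems = [frozenset(t) for t in
             [(), (0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]]
    leq = lambda a, b: a & b == a

    for a, b in product(elems, repeat=2):
        assert leq(a, b) == (a | b == b)                  # dual definition agrees
        assert not (leq(a, b) and leq(b, a)) or a == b    # antisymmetry

    assert all(leq(a, a) for a in elems)                  # reflexivity
    assert all(leq(a, c) for a, b, c in product(elems, repeat=3)
               if leq(a, b) and leq(b, c))                # transitivity

    print("the operation-defined relation is a partial order")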
In a brief passage Ore gives us some insight into his systematic investigation of possible systems of axioms. After having introduced the modular and distributive axioms (referred to as the "Dedekind axiom" and the "Arithmetic axiom"), Ore notes that "[t]he most common algebraic systems satisfy axioms less restrictive than the arithmetic axiom and more restrictive than the Dedekind axiom" (Ore 1935, p. 415); he presents a method for obtaining axioms of intermediate strength by imposing identities on the 18 elements that are generated by union and intersection from three given elements, i.e., on the free distributive lattice generated by three elements.32 Thus, the idea is to construct possible models systematically and then to formulate axioms that distinguish between these models. Unfortunately, in the case at hand this method does not succeed, as Ore explains:

A discussion of all possibilities shows however that, aside from trivial cases, all conditions thus obtained are equivalent to the arithmetic axiom. This then proves that the arithmetic axiom is the only stronger axiom containing only three arbitrary elements A, B, C. To obtain other axioms stronger than the Dedekind axiom it is necessary to consider the general Dedekind structure generated by four or more elements. (Ore 1935, p. 415)

However, Birkhoff (1933) had shown that such structures are in general infinite, by which "the quest for such special axioms is considerably complicated" (Ore 1935, p. 415).

To conclude, Ore's notion of lattice is intended as a tool for generalizing and investigating algebraic structures. Thus, his way of arriving at lattices is, like Dedekind's, by abstraction. This becomes clear from the fact that his main justification for the new notion is that it makes it possible to recapture important algebraic decomposition theorems, and that he does not mention any models from areas of mathematics other than algebra. As Mehrtens points out, this is not enough to form the basis for an independent theory (Mehrtens 1979, p. 186). Such a basis was developed by Garrett Birkhoff.
4.5 Birkhoff’s consolidation of lattice theory When Garrett Birkhoff (1911–1996) was born, his father, G. D. Birkhoff, was one of the leading American mathematicians. Garrett received his B. A. in 1932, then went to England to study group theory, and soon thereafter published his first work of lattice theory (1933). As his main influences Birkhoff mentions the group theorist Remak and the algebraist van der Waerden.33 The latter’s 1930 book Moderne Algebra, based on lectures by Artin and Noether, was the first and highly influential, cohesive, and abstract presentation of algebraic structures. In the preface its aim is described as an introduction to a “whole world” of algebraic concepts, and the creative role of axiomatics is acknowledged:
32 This is Dedekind’s Dualgruppe vom Idealtypus, i. e., a distributive lattice; see page 56, above. 33 In a letter to Mehrtens (Mehrtens 1979, p. 159).
The recent expansion of algebra far beyond its former bounds is mainly due to the "abstract," "formal," or "axiomatic" school. This school has created a number of novel concepts, revealed hitherto unknown interrelations and led to far-reaching results, especially in the theories of fields and ideals, of groups, and hypercomplex numbers. (van der Waerden 1930, quoted from the translation of the second edition, p. xi)

General notions, like those of structure-preserving mappings and equivalence relations, are introduced set-theoretically in the first chapter, and they are then applied in the study of groups, rings, fields, etc., which are introduced axiomatically. Just before the definitions of rings and fields, the general notion of a system of double composition is introduced, as a system that is closed under the operations of addition and multiplication. A lattice is such a system, but this is not mentioned (van der Waerden 1930, p. 37).

Through this book Birkhoff became acquainted with a variety of different algebraic structures, while Remak's work showed him the importance of the study of substructures. Remak investigated the unique decomposition of finite groups and the representation of finite groups as subgroups of direct products. His work is characterized in particular by the use of the structure of the normal subgroups of a group and by the investigation of the subgroups of direct products with three factors (1932). Mehrtens speculates that Birkhoff may have begun his investigation of the structure generated from three normal subgroups by repeatedly forming direct products in connection with his studies of Remak's works. This structure corresponds to the free modular lattice generated by three elements and is the same one that Dedekind had investigated more than three decades earlier. It plays a prominent role in Birkhoff's first paper on lattice theory (1933), where he reproduces many of Dedekind's results, but also presents new material;34 e.g., that a lattice generated in this way by four elements (called "free") is in general infinite and that every distributive lattice can be represented by a ring of sets.35 Since Birkhoff's motivation for his axiomatization seems to have been particular instances of lattices, his introduction of the notion is one by abstraction, similar to Dedekind's. Birkhoff also discusses modular and distributive lattices, as well as applications to group theory, ideal theory, and geometry. These applications are elaborated in later papers, where further ones are added, e.g., to set theory, measure and probability theory, equivalence relations, and topology. The work of Garrett Birkhoff was instrumental in establishing lattice theory as an independent and generally accepted mathematical theory.

34 Birkhoff had been made aware of Dedekind's work by Ore, and he discusses its relation to his own work in Birkhoff (1934). He recalls: "Not knowing of Dedekind's previous work, I felt that my results partly justified my claims" (Birkhoff 1970, p. 6; quoted from Mehrtens 1979, p. 176). Later, he remarks: "I admired the style and the power of the master, and was glad that he had not anticipated more of my work" (letter to Mehrtens, quoted from Mehrtens 1979, p. 178). Mehrtens points out the striking similarities between Birkhoff's and Dedekind's approaches; he remarks that Dedekind introduced his Dualgruppen when he was already retired, while Birkhoff developed his notion of lattice when he was only 22 years old: the knowledge that Dedekind had accumulated during his life had in the meantime become general knowledge in the community of algebraists (Mehrtens 1979, p. 179).
35 A ring of sets is a family of sets that is closed under finite union and intersection.
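The representation result just mentioned can be sketched for finite lattices: map each element to the set of join-irreducible elements below it; for a distributive lattice this map is an isomorphism onto a ring of sets in the sense of footnote 35. The following illustration (my example, on the divisors of 30, not Birkhoff's own) checks that joins go over to unions and meets to intersections:

    from itertools import product
    from math import gcd

    # The divisors of 30 form a distributive lattice under divisibility,
    # with gcd as meet and lcm as join.
    elems = [1, 2, 3, 5, 6, 10, 15, 30]
    meet = gcd
    join = lambda x, y: x * y // gcd(x, y)
    leq = lambda x, y: y % x == 0

    def join_irreducible(j):
        # j is join-irreducible if it is not the bottom element and
        # j = x ∨ y forces j ∈ {x, y}.
        return j != 1 and all(j in (x, y) for x, y in
                              product(elems, repeat=2) if join(x, y) == j)

    irr = [j for j in elems if join_irreducible(j)]   # here: [2, 3, 5]
    r = lambda x: frozenset(j for j in irr if leq(j, x))

    # r sends joins to unions and meets to intersections:
    for x, y in product(elems, repeat=2):
        assert r(join(x, y)) == r(x) | r(y)
        assert r(meet(x, y)) == r(x) & r(y)

    print(sorted(sorted(s) for s in map(r, elems)))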
Not only did Birkhoff introduce the now common English term "lattice" in 1933 and write the first monograph on the subject in 1940; he also developed the theory in great depth and was able to integrate the work of other mathematicians into one coherent whole. Bennett and Klein showed that many algebraic structures can be studied as lattices; Menger and Ore showed the relation of lattice theory to the foundations of geometry and to the decomposition theorems of algebra; but it was Birkhoff who really emphasized the notion of lattice as being of central importance to many mathematical fields. In the opening lecture of the 1938 spring meeting of the American Mathematical Society he introduced lattice theory as a "vigorous and promising younger brother of group theory" and argued emphatically that some familiarity with it "is an essential preliminary to the full understanding of logic, set theory, probability, functional analysis, projective geometry, the decomposition theorems of abstract algebra, and many other branches of mathematics" (Birkhoff 1938, p. 793; quoted from Mehrtens 1979, p. 284).36

5 Conclusion

In all the investigations discussed in this paper, axiomatics has been a key methodological and creative tool for mathematical discovery. We have seen that the abstract notion that is today called 'lattice' was developed independently by Ernst Schröder and Richard Dedekind in the late nineteenth century. Schröder was led by considerations regarding the independence of the distributive axioms to a meaningful instance of a lattice, which in turn justified his isolation of a subset of the axioms of Boolean algebras. For Dedekind, the structures induced by the operations of gcd and lcm on modules were the instances of lattices that motivated his axiomatic characterization, and, once the notion was presented axiomatically, he quickly found further instances from many different areas of mathematics. However, their notions were not taken up by their contemporaries and thus lay dormant for decades.

By the 1930s, however, a number of developments had taken place in mathematics that facilitated the development and spread of abstract notions. These include, in particular, a general acceptance of set-theoretic and axiomatic reasoning and of the study of abstract structures in their own right. In this context the notion of lattice reemerged, in the quest for unifying notions, in the independent work of younger mathematicians (Klein, Bennett, Menger, Ore, and Birkhoff).37 Indeed, its unifying power is now, in retrospect, regarded as one of its most important virtues.38 Within a decade this research had been consolidated and lattice theory had been established as an independent branch of mathematics, a development marked by the publication of the first textbook on lattice theory by Birkhoff (1940). These developments surrounding the emergence of the notion of lattice were intimately connected with the use of axiomatics. In particular, they illustrate the three different ways I have identified in the Introduction by which axiom systems can contribute to the introduction of new notions: by modification, by abstraction, and by analogy. Thus, the creative aspect of axiomatics is an essential ingredient of mathematical practice.

36 Similar remarks can be found in the preface to the second edition of Lattice theory (Birkhoff 1940, second ed., 1948, pp. iii–iv).
37 Of possible interest is also Skolem (1936), but I was not able to access this paper; see its review (Birkhoff 1937).
38 See for example Birkhoff (1970, p. 1).
Acknowledgements This paper was presented at The Classical Model of Science, Amsterdam, The Netherlands (January 2007), at Perspectives on Mathematical Practices, Brussels, Belgium (March 2007), and at the Meeting of the Canadian Society for History and Philosophy of Mathematics (CSHPM) and the British Society for the History of Mathematics (BSHM), Concordia University, Montreal, QC (July 2007). The author would like to thank the audiences for their valuable comments, in particular Pieter Sjoerd Hasper and Paola Cantù, and Rachel Rudolph for stylistic improvements. Translations are by the author, unless otherwise noted.
References

Avigad, J. (2006). Methodology and metaphysics in the development of Dedekind's theory of ideals. In J. Ferreirós & J. Gray (Eds.), The architecture of modern mathematics (pp. 159–186). Oxford: Oxford University Press.
Bennett, A. A. (1930). Semi-serial order. American Mathematical Monthly, 37(8), 418–423.
Birkhoff, G. (1933). On the combination of subalgebras. Proceedings of the Cambridge Philosophical Society, 29, 441–464.
Birkhoff, G. (1934). Note on the paper "On the combination of subalgebras". Proceedings of the Cambridge Philosophical Society, 30, 200.
Birkhoff, G. (1937). Review of Thoralf Skolem, 'Über gewisse "Verbände" oder "lattices"' (1936). Journal of Symbolic Logic, 2(1), 50–51.
Birkhoff, G. (1938). Lattices and their applications. Bulletin of the American Mathematical Society, 44, 793–800.
Birkhoff, G. (1940). Lattice theory (Vol. 25 of AMS Colloquium Publications). New York: American Mathematical Society. (Second edition 1948).
Birkhoff, G. (1970). What can lattices do for you? In J. C. Abbott (Ed.), Trends in lattice theory (pp. 1–40). New York: Van Nostrand Reinhold Co.
Burris, S., & Sankappanavar, H. (1981). A course in universal algebra. Berlin: Springer Verlag. (References are to "The Millennium Edition," available online at http://www.math.uwaterloo.ca/snburris/htdocs/ualg.html).
Corry, L. (1996). Modern algebra and the rise of mathematical structures. Science networks. Historical studies (Vol. 17). Basel: Birkhäuser.
de Jong, W. R., & Betti, A. (2008). The Aristotelian model of science: A millennia-old model of scientific rationality. Synthese. doi:10.1007/s11229-008-9420-9.
Dedekind, R. (1857). Abriß einer Theorie der höheren Kongruenzen in bezug auf einen reellen Primzahl-Modulus. Journal für die reine und angewandte Mathematik, 54, 1–26. (Reprinted in Dedekind 1932, Vol. I, Chap. V, pp. 40–67).
Dedekind, R. (1872). Stetigkeit und irrationale Zahlen. Braunschweig: Vieweg. (Reprinted in Dedekind 1932, Vol. III, pp. 315–334. English translation Continuity and irrational numbers by W. W. Beman, revised by W. Ewald, in Ewald 1996, pp. 765–779).
Dedekind, R. (1877). Über die Anzahl der Ideal-Klassen in den verschiedenen Ordnungen eines endlichen Körpers. In Festschrift der Technischen Hochschule Braunschweig zur Säkularfeier des Geburtstages von C.F. Gauss (pp. 1–55). Braunschweig: Vieweg. (Reprinted as Chap. XII in Dedekind 1932, Vol. I, pp. 105–158).
Dedekind, R. (1888). Was sind und was sollen die Zahlen? Braunschweig: Vieweg. (Reprinted in Dedekind 1932, Vol. III, pp. 335–391. English translation by W. W. Beman, revised by W. Ewald, in Ewald 1996, pp. 787–833).
Dedekind, R. (1897). Über Zerlegungen von Zahlen durch ihre größten gemeinsamen Teiler. In Festschrift der Technischen Hochschule zu Braunschweig bei Gelegenheit der 69. Versammlung Deutscher Naturforscher und Ärzte (pp. 1–40). (Reprinted in Dedekind 1932, Vol. II, Chap. XXVIII, pp. 103–147).
Dedekind, R. (1900). Über die von drei Moduln erzeugte Dualgruppe. Mathematische Annalen, 53, 371–403. (Reprinted in Dedekind 1932, Vol. II, Chap. XXX, pp. 236–271).
Dedekind, R. (1930–1932). In R. Fricke, E. Noether & O. Ore (Eds.), Gesammelte mathematische Werke (Vol. 3). Braunschweig: Vieweg.
Dedekind, R. (1964). Über die Theorie der ganzen algebraischen Zahlen. Braunschweig: Vieweg. (Reprints of Chaps. XLVI (1894), XLVII (1871), XLVIII (1877), and XLIX (1879) from Dedekind 1932, Vol. III. With an introduction by B. L. van der Waerden).
Dedekind, R. (1996). Theory of algebraic integers (J. Stillwell, Trans.). Cambridge, UK: Cambridge University Press. (English translation of Sur la Théorie des Nombres entiers algébriques; original work published 1877).
Dirichlet, L. (1871). Vorlesungen über Zahlentheorie. Braunschweig: Vieweg. (Second edition, edited and with additions by R. Dedekind. Third edition 1877, fourth edition 1894).
Dyck, W. (1882). Gruppentheoretische Studien. Mathematische Annalen, 20(1), 1–44.
Ewald, W. (1996). From Kant to Hilbert: A source book in mathematics (Vol. 2). Oxford: Clarendon Press.
Ferreirós, J. (1999). Labyrinth of thought—A history of set theory and its role in modern mathematics. Science networks. Historical studies (Vol. 23). Basel, Boston, Berlin: Birkhäuser Verlag.
Hasper, P. S. (2006). Sources of delusion in Analytica Posteriora 1.5. Phronesis, 51(3), 252–284.
Hausdorff, F. (1914). Grundzüge der Mengenlehre. Leipzig: Veit. (Reprinted by Chelsea, New York, 1949).
Hilbert, D. (1897). Die Theorie der algebraischen Zahlenkörper. Jahresbericht der Deutschen Mathematiker-Vereinigung, 4, 175–546. (Reprinted in Gesammelte Abhandlungen, Vol. 1, 1932, pp. 63–363).
Hilbert, D. (1918). Axiomatisches Denken. Mathematische Annalen, 78, 405–415. (Reprinted in Gesammelte Abhandlungen, Vol. 3, 1935, pp. 146–156. English translation: Axiomatic thought, in Ewald 1996, pp. 1105–1115).
Huntington, E. V. (1904). Sets of independent postulates for the algebra of logic. Transactions of the American Mathematical Society, 5(3), 288–309.
Huntington, E. V. (1917). The continuum and other types of serial order (2nd ed.). Cambridge, MA: Harvard University Press. (Reprinted by Dover, 1955).
Klein, F. (1926). Vorlesungen über die Entwicklung der Mathematik im 19. Jahrhundert (Vol. 1). Berlin: Springer Verlag.
Klein, F. (1929). Einige distributive Systeme in Mathematik und Logik. Jahresbericht der Deutschen Mathematiker-Vereinigung, 38, 35–42.
Klein, F. (1931). Zur Theorie abstrakter Verknüpfungen. Mathematische Annalen, 105, 308–323.
Klein, F. (1932). Über einen Zerlegungssatz in der Theorie der abstrakten Verknüpfungen. Mathematische Annalen, 106, 114–130.
Klein, F. (1935). Beiträge zur Theorie der Verbände. Mathematische Zeitschrift, 39, 227–239.
Korselt, A. R. (1894). Bemerkungen zur Algebra der Logik. Mathematische Annalen, 44(1), 156–157. (Excerpt from a letter to the editors).
Lüroth, J. (1891). Dr. Ernst Schröder, Vorlesungen über die Algebra der Logik. Zeitschrift für Mathematik und Physik, 36, 161–169.
Mehrtens, H. (1979). Die Entstehung der Verbandstheorie. Hildesheim: Gerstenberg Verlag.
Menger, K. (1928). Bemerkungen zu Grundlagenfragen. IV: Axiomatik der endlichen Mengen und der elementargeometrischen Verknüpfungsbeziehungen. Jahresbericht der Deutschen Mathematiker-Vereinigung, 37, 309–325.
Menger, K. (1936). New foundations of projective and affine geometry. Annals of Mathematics, 37(2), 456–482.
Menger, K. (1940). On algebra of geometry and recent progress in non-Euclidean geometry. The Rice Institute Pamphlet, 27, 41–79.
Ore, O. (1935). On the foundations of abstract algebra I. Annals of Mathematics, 36(2), 406–437.
Ore, O. (1937). Structures and group theory I. Duke Mathematical Journal, 3, 149–174.
Ore, O. (1938). On the application of structure theory to groups. Bulletin of the American Mathematical Society, 44, 801–806.
Peirce, C. S. (1880). On the algebra of logic. American Journal of Mathematics, 3, 15–57. (Reprinted in Peirce 1966, Vol. III, pp. 104–157).
Peirce, C. S. (1885). On the algebra of logic: A contribution to the philosophy of notation. American Journal of Mathematics, 7(2), 180–196. (Reprinted in Peirce 1966, Vol. III, pp. 210–249).
Peirce, C. S. (1960–1966). Collected papers (Vol. 4). Cambridge, MA: Harvard University Press. (Reprint of 1931–1958, Vol. 8, C. Hartshorne & P. Weiss, Eds.).
Remak, R. (1932). Über die Untergruppen direkter Produkte von drei Faktoren. Journal für die reine und angewandte Mathematik, 166, 65–100.
Schlimm, D. (2006). Axiomatics and progress in the light of 20th century philosophy of science and mathematics. In B. Löwe, V. Peckhaus & T. Räsch (Eds.), Foundations of the formal sciences IV (Studies in logic series, pp. 233–253). London: College Publications.
Schlimm, D. (2008a). On abstraction and the importance of asking the right research questions: Could Jordan have proved the Jordan-Hölder Theorem? Erkenntnis, 68(3), 409–420.
Schlimm, D. (2008b). Two ways of analogy: Extending the study of analogies to mathematical domains. Philosophy of Science, 75(2), 178–200.
Schröder, E. (1873). Lehrbuch der Arithmetik und Algebra für Lehrer und Studierende: Die sieben algebraischen Operationen (Vol. 1). Leipzig: B.G. Teubner.
Schröder, E. (1877). Der Operationskreis des Logikkalküls. Leipzig: B.G. Teubner. (Reprint Darmstadt, 1966).
Schröder, E. (1890–1905). Vorlesungen über die Algebra der Logik. Leipzig: B.G. Teubner. (3 volumes: Vol. 1, 1890; Vol. 2.1, pp. 1–400, 1891; Vol. 2.2, pp. 401–605, 1905; Vol. 3, 1896. Reprint New York, 1966).
Serfati, M. (2007). Du psychologisme booléen au théorème de Stone. Épistémologie et histoire des algèbres de Boole. In J.-C. Pont, L. Freland, F. Padovani & L. Slavinskaia (Eds.), Pour comprendre le XIXe. Histoire et philosophie des sciences à la fin du siècle (pp. 145–169). Florence: Leo S. Olschki.
Sieg, W., & Schlimm, D. (2005). Dedekind's analysis of number: Systems and axioms. Synthese, 147(1), 121–170.
Skolem, T. (1936). Über gewisse "Verbände" oder "lattices". Avhandlinger utgitt av Det Norske Videnskaps-Akademi i Oslo, I. Mat.-naturv. klasse, 7.
van der Waerden, B. L. (1930). Moderne Algebra (Vol. 2). Berlin: Springer.
Voigt, A. (1892). Was ist Logik? Vierteljahresschrift für Philosophie, 16, 189–332.
Weber, H. (1895–1896). Lehrbuch der Algebra (Vol. 2). Braunschweig: Vieweg.
Wussing, H. (1984). The genesis of the abstract group concept. Cambridge, MA: MIT Press. (Translation of Die Genesis des abstrakten Gruppenbegriffes, VEB Deutscher Verlag der Wissenschaften, Berlin, 1969).
Synthese (2011) 183:69–85 DOI 10.1007/s11229-009-9668-8
What is the axiomatic method? Jaakko Hintikka
Received: 4 February 2008 / Accepted: 25 March 2009 / Published online: 9 October 2009 © Springer Science+Business Media B.V. 2009
Abstract The modern notion of the axiomatic method developed as a part of the conceptualization of mathematics starting in the nineteenth century. The basic idea of the method is the capture of a class of structures as the models of an axiomatic system. The mathematical study of such classes of structures is not exhausted by the derivation of theorems from the axioms but includes normally the metatheory of the axiom system. This conception of axiomatization satisfies the crucial requirement that the derivation of theorems from axioms does not produce new information in the usual sense of the term called depth information. It can produce new information in a different sense of information called surface information. It is argued in this paper that the derivation should be based on a model-theoretical relation of logical consequence rather than derivability by means of mechanical (recursive) rules. Likewise completeness must be understood by reference to a model-theoretical consequence relation. A correctly understood notion of axiomatization does not apply to purely logical theories. In the latter the only relevant kind of axiomatization amounts to recursive enumeration of logical truths. First-order "axiomatic" set theories are not genuine axiomatizations. The main reason is that their models are structures of particulars, not of sets. Axiomatization cannot usually be motivated epistemologically, but it is related to the idea of explanation.

Keywords Axiomatic method · Information · Logical consequence · Completeness · Set theory
J. Hintikka (B) Department of Philosophy, Boston University, 745 Commonwealth Avenue, Boston, MA, USA e-mail:
[email protected]
1 Axiomatic method and the conceptualization of mathematics

One of the most important ingredients of the classical model (or models) of science is the axiomatic method. This method can be viewed from several different perspectives. Philosophers' and other thinkers' ideas about the axiomatic method have been formed largely in terms of its uses in mathematics. Accordingly, one of the most instructive perspectives on it is offered by its role in the overall development of mathematics in the past two centuries.

Around 1800 mathematics still meant, very broadly speaking, merely the study of two areas: on the one hand space, and on the other numbers of different kinds, together with the functions from numbers to numbers. Slowly, the emphasis began to shift to what has been called conceptual mathematics (see here e.g. Laugwitz 1996). The job description of mathematics came to include the characterization and study of different kinds of structures and of the concepts needed to speak of those structures. At first these structures were found within mathematics itself. Early examples include the theory of surfaces of Gauss and Riemann, who were able to formulate analytic definitions of such intuitively given geometrical notions as curvature. Another, more abstract example is Galois theory, which involved the abstract structural notion of a group. The first explicit formulation of a research project in this direction was Riemann's proposal for the theory of manifolds. Manifolds were, as it were, raw material on which different forms can be imposed. Later figures in this development of the generalizing and structuralizing conception of mathematics are most prominently David Hilbert and the Bourbaki group.

The most abstract structures studied in conceptual mathematics are undoubtedly sets. The genesis of set theory is an integral part of the conceptualization of mathematics. In our present-day perspective, sets might not seem to have any structure at all. However, originally the notion of a set did not exclude its possessing an internal organization. Indeed, at an early stage of the development of set theory sets could be referred to in German as Mannigfaltigkeiten, in other words as manifolds, without any sharp distinction from Riemann's sense.

The structuralist orientation of modern mathematics naturally leads to the use of axiomatization. To understand a kind of structure, for instance the structure of a group, is to have an overview of all its instantiations. In an axiomatic system, this is accomplished by capturing all those structures as the models of the system. It is not an accident that both Hilbert and Bourbaki relied essentially on the axiomatic method (cf. Hilbert 1918; Bourbaki 1950), for this method is the natural one for a structuralist. The axioms determine a class of structures as models of the axiom system. By deriving theorems from the axioms a mathematician can study those structures. This orientation toward structures means that the focus of an axiomatic system—what the axioms axiomatize—is the structures themselves, not propositions about them. This induces a kind of model-theoretical preference into modern axiomatization efforts. The study of an axiom system typically uses model-theoretical methods, such as the construction of models that satisfy certain axioms but not others. (Think e.g. of Padoa's method in the logical theory of definition.) This orientation is frequently overlooked. For instance, Hilbert is labeled by philosophers, almost as a reflex action, as a formalist, even though his interest and his work in axiomatization are in reality strongly indicative of a model-theoretical way of thinking.
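This model-theoretical way of thinking can be illustrated on the smallest possible scale. The following sketch (my illustration, not Hintikka's) enumerates all sixteen binary operations on a two-element domain and tests which of the resulting structures are models of the group axioms:

    from itertools import product

    # All 16 binary operations on the domain {0, 1}, each given by its
    # four table entries; the group axioms then pick out the models.
    domain = (0, 1)
    pairs = list(product(domain, repeat=2))
    tables = [dict(zip(pairs, values)) for values in product(domain, repeat=4)]

    def is_group(op):
        assoc = all(op[op[x, y], z] == op[x, op[y, z]]
                    for x, y, z in product(domain, repeat=3))
        unit = any(all(op[e, x] == x == op[x, e] for x in domain) and
                   all(any(op[x, y] == e == op[y, x] for y in domain)
                       for x in domain)
                   for e in domain)
        return assoc and unit

    print(sum(map(is_group, tables)))   # 2: the two tables of the group Z2

With associativity alone, eight of the sixteen tables qualify; adding or removing axioms narrows or widens the class of models in just the way described above.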
A structure to be studied can be suggested to us by our intuitions as in geometry or as in group theory, which can be thought of as the general study of symmetries of all different kinds. But this origin of the structure does not matter for its mathematical treatment. For instance, the structure in question can be simply stipulated, as in lattice theory. (I doubt that people have innate intuitions specifically about lattices, except perhaps as a part of our geometrical intuitions in general.) Most importantly, the structures in question can be determined by the laws of nature. Thus the axiomatic method is applicable also in science and not only in mathematics. These applications to science can be extremely important and often have been recognized as being important. For instance, Hilbert spent a great deal of effort to axiomatize actual scientific theories, including thermodynamics [see here the writings of Ulrich Majer, especially Majer (1995, 2001)]. What a scientific theory does is to characterize all structures of a certain kind, viz. the structures compatible with the laws of the theory in question. It is in scientific theories that we can most clearly see the advantages of studying the models of a theory, which now are physical systems, merely by deducing consequences of the axioms governing them. What is crucial in the axiomatic method so understood is that an axiomatized theory is to capture all and only the relevant structures as so many models of the axioms. Once a complete axiom system for some particular science is established, it literally becomes possible to investigate the phenomena that are the entire subject matter of that science in one’s study, instead of a laboratory or observatory with pencil and paper. Of course, in our day and age pencil and paper are being replaced by a computer. But even so, using a computer is easier and cheaper than building an accelerator. Nuts-and-bolts experiments can be replaced by thought-experiments. It is in this way that e.g. the traditional “rational mechanics” (in contradistinction to experimental mechanics) becomes possible. Likewise, it sometimes becomes possible to predict what experience can or cannot show, as when Dirac’s equation enabled physicists to predict the existence of positrons or when Heisenberg’s uncertainty relation enabled physicists to realize the impossibility of measuring two noncommuting variables simultaneously with complete accuracy. One of the most striking examples is from nonaxiomatized physics, but it illustrates the same point. During World War II one of the crucial questions concerning the feasibility of an atom bomb was the critical mass of the nuclear reaction. German physicists never settled the question during the war. Yet the answer was implied by what was already known, so much so that Heisenberg was able to calculate it in a couple of days while interned in England without any resources and of course without performing any experiments or observations. This then is one main use of axiomatization. An axiom system can capture everything we know of certain physical systems or about other kinds of structures. The conceptualization of mathematics must be distinguished from the quest of greater strictness in mathematical theorizing and argumentation. The latter has in philosophers’ discussions received a lion’s share of attention, even though the conceptualization is in reality the more important one. One can in fact consider the entire emphasis on logical strictness as a consequence of conceptualization. 
For the tools for dealing with abstract concepts are logical ones, and using them presupposes a greater awareness of the laws of logic, in other words, of logical strictness. For a well-known
instance, the study of Fourier series led to the development of set theory, which in its turn turned out to involve a host of logical or logic-related issues, such as the so-called axiom of choice, which should be viewed as a purely logical principle.

This description of how an axiom system works may sound unfamiliar to some readers, who think of an axiomatic system as a theory of some one model (or perhaps some one structure or some one "world", alias scenario). However, this is simply a special case of the more general description just given, viz. the case in which the class of structures targeted by the axiom system has only one member. In this case each axiom and each subset of the axioms delineates a class of structures as its models until the whole set of axioms boils that class down to a single structure. In this sense, the mechanism of axiomatization is the same no matter whether the axiomatist aims at a class of structures or at a unique one.

Another respect in which the model-theoretic characterization of axiomatization may look unfamiliar is that axiomatization is often taken to mean formulating a system of basic truths from which all the other truths of some body of knowledge can be derived purely logically. There need not be anything wrong with this account of axiomatization, but it is not the whole story. The theorems so derived are the truths that hold in all the models of the axiom system in question. But there is a great deal more that can be said of the models of an axiom system than merely the set of truths holding in all of them. Hence the task of deriving theorems from the axioms of a system is only a small part of the study of the structure that its models exemplify. This is the reason why the study of an axiom system includes in mathematical practice its metatheory.
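The picture just sketched can be put schematically. The following is a minimal sketch in standard model-theoretic notation (the notation is mine, not this paper's); the group axioms serve as the stock textbook instance of an axiom system whose models are exactly the intended structures.

% A minimal sketch in standard notation (not taken from the paper).
% Mod(AX) is the class of all structures in which every axiom in AX is true.
\[
\mathrm{Mod}(\mathrm{AX}) = \{\, \mathfrak{M} \mid \mathfrak{M} \models \varphi \ \text{for every}\ \varphi \in \mathrm{AX} \,\},
\qquad
\mathrm{Thm}(\mathrm{AX}) = \{\, \psi \mid \mathrm{Mod}(\mathrm{AX}) \subseteq \mathrm{Mod}(\psi) \,\}.
\]
% For the group axioms (associativity, identity, inverses), Mod(AX) is exactly
% the class of all groups, and Thm(AX) contains, for example, the uniqueness
% of the identity element, which is true in every group.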
2 Axiomatization as a metatheoretical method

This fact is reflected in yet another aspect of the axiomatic method that is again connected with the same overall development of mathematics. When a certain type of structure is studied mathematically, this study is not restricted to what can be found out about all such structures or about some subclass of them. A perhaps even more important task in practice is to reach an overview over all such structures. The role of an axiom system in conceptually oriented mathematics is not merely to serve as a set of premises for deducing consequences, even though many philosophers seem to think so. No, an axiom system is also calculated to serve as an object of metatheoretical study. For instance, very little of the actual development of group theory consists in deriving consequences from the axioms of group theory. Most of what is actually called group theory consists of a metatheoretical study dealing with such questions as the taxonomy of different kinds of groups, representation theorems, etc. Already in Greek mathematics the idea was that relatively few conclusions are derived from the axioms and postulates alone. Typical theorems also involve definitions of particular geometrical notions.

For the purpose of reaching such a metatheoretical overview, it is crucial to grasp the logical structure of the theory in question, in the sense of seeing what the different independent assumptions of the theory are, of seeing which theorems depend on which of these basic assumptions, and so on. For this purpose, the axiomatic method is eminently appropriate. Richer structures can be captured by adding further axioms.
The independence of the different axioms from each other will show what the different and mutually independent assumptions are that go into our knowledge about a certain science or, more generally, about a certain type of structure. Thus it is important to realize that the metatheoretic aspects of an axiom system typically serve the same overall purpose as axiomatization in the first place, viz. the exploration of the models of an axiom system. The independence of a certain theorem from a given axiom can have an interesting mathematical or even physical meaning. For instance, in Hilbert's classical Grundlagen der Geometrie (1899), such independence results are a significant part of the motivation of his work. The logical inconsistency of two axiomatic theories or two potential axioms can prompt interesting attempts to reconcile them. For instance, the special theory of relativity can be viewed as an attempt to reconcile logically classical mechanics and Maxwell's theory of electro-magnetism (for problems in this direction, see Frisch 2005).

Thus an axiom system requires logic in two different senses. First, there is the logic that is being used in the axiomatic theory itself in deriving theorems from axioms. The choice of this logic can be a consequential step. For instance, suppose that IF (independence-friendly) logic is used in elementary number theory, instead of old-fashioned first-order logic. Then it turns out to be possible to formulate an elementary consistency proof for elementary arithmetic within the theory itself (see Hintikka and Karakadilar 2006; for IF logic, see Hintikka 1996). Because of the metatheoretic aspect of axiom systems, an axiomatist also needs a logic for the metatheory of his or her system. In actual mathematical practice, no sharp distinction is made between the two. Hence, one might hope that the same logic can serve both purposes. In other words, one thing that is suggested by the metatheoretic component of modern mathematical theorizing is that the logic used in an axiom system should be capable of serving as its own metalogic.

It might nevertheless appear that this desideratum is totally impossible to satisfy. The first and foremost model-theoretical concept is that of truth. Tarski (1935) showed that one cannot define truth for a first-order language in the same language. This impossibility might at first sight suggest that first-order logic, and consequently logic in general, is a poor medium for axiomatization. This is a misconception, however. It is due to the fallacious idea that Tarski's result is due to the excessive power of first-order languages. If that were so, there would be no hope of even construing a logical language that could serve in any realistic sense as its own metalogic. Hence in current mathematical practice there is in principle considerable confusion as to what kind of logic is actually being used. However, the genesis of IF logic has changed the situation radically, for a first-order language using such logic can admit the formulation of a truth predicate for the same language. Be this as it may, the power of a theory to serve as its own metatheory offers an interesting way of comparing different axiom systems with each other.
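The notion of independence invoked here has a direct model-theoretic formulation. The following is a minimal sketch in standard notation, not the author's own; the hyperbolic-plane example is the classical one.

% Independence shown by a countermodel (standard formulation, not from the paper).
% An axiom A in AX is independent of the remaining axioms iff some structure
% satisfies AX \ {A} but not A, that is:
\[
\mathrm{Mod}(\mathrm{AX} \setminus \{A\}) \not\subseteq \mathrm{Mod}(A).
\]
% Classical instance: the hyperbolic plane satisfies the other Euclidean axioms
% but not the parallel postulate, which shows the parallel postulate to be
% independent of them.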
3 What is required of axiomatization?

In order to perform all these tasks, what conditions does an axiom system have to satisfy? In formulating an answer, it is important to realize that, since the unfolding of the system takes place by deriving theorems logically from the axioms, an axiom
system has to be considered in conjunction with a definite logic. This logic is subject to nontrivial requirements. Similarly, but less crucially, the nonlogical concepts that the theory uses have to be specified. Fairly obviously, the following requirements must be satisfied:

(1) The models of the axiomatic theory have to comprise all and only the intended structures.
(2) All theorems are logical consequences of the axioms.
(3) The derivation of theorems from axioms must not introduce any new information into the conclusion.
(4) The logic used must be complete.

These requirements probably strike you as unproblematic, not to say as innocuous. In reality they have much more bite than might at first seem to be the case. Much of the rest of this essay is a commentary on these requirements. The requirement (1) should be self-explanatory. The very purpose of the axiomatic method is to study some class of structures by constructing an axiom system whose models they are. Without the requirement (2) an axiom system could not help us to study the network of dependence relations between the different truths and assumptions of a given theory.

The requirement (3) is connected with another perspective on the axiomatic method that is historically important. This perspective is most conspicuous in the case of physical theories. There it is sometimes quite unclear what precisely is or is not assumed, or whether the logical and mathematical techniques allow the smuggling of new information into an argument. This question was a genuine scientific one most prominently perhaps in the case of Boltzmann's thermodynamics. There it was a live question, among other things, whether the mathematical tools Boltzmann used tacitly introduced factual assumptions about physical reality. Hilbert already realized, however tacitly, the fundamental fact that the requirement (3) is satisfied if the derivation of theorems from axioms uses only logical consequence relations, in the model-theoretical sense of logical consequence (Hilbert 1922). This problem of summarizing all the information actually used in a theory in an axiom system was an important facet of Hilbert's motivation. He saw it as a mathematical (conceptual) problem, not a scientific one. Indeed, Problem 6 in his famous list of open mathematical problems was the injunction: Axiomatize physical theories. In his motivation of the sixth problem, Hilbert mentions in so many words Boltzmann's thermodynamics (see Hilbert 1900).

The metamathematical dimension of axiomatization shows an important presupposition of its success. An axiom system must be accompanied by a metatheory capable of studying its models, well, model-theoretically. In this sense, mathematical axiom systems are not self-sufficient, unless their model theory can be formulated as an aspect of the theory itself. This requirement can be thought of as a further condition (5) on successful axiomatization. This requirement in turn imposes conditions on the logic that is being used as the basis of an axiom system. For instance, ordinary first-order logic cannot provide a metatheory for itself, or usually for any substantial theory formulated by its means. In contrast, an independence-friendly language allows
the definition of some of the basic model-theoretical concepts for itself, including the concept of truth (see Hintikka 2001; Sandu 1998).
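The benchmark for a logic serving as its own metalogic is Tarski's adequacy condition on a truth predicate. As a minimal sketch in standard notation (not the formulation of this paper):

% Tarski's T-schema (standard formulation, not from the paper).
% Here ⌜φ⌝ is a name, e.g. a Gödel number, of the sentence φ.
\[
\mathrm{True}(\ulcorner \varphi \urcorner) \;\leftrightarrow\; \varphi
\qquad \text{for every sentence } \varphi \text{ of } L .
\]
% Tarski (1935): no consistent ordinary first-order language rich enough for
% arithmetic can define a predicate True satisfying all these instances for
% itself. The claim in the text is that a suitable IF language can do so.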
4 What kind of completeness?

As to the requirement (4), the completeness mentioned in it is of a special kind. In fact, all the requirements formulated above, including requirement (1), are in effect completeness requirements. It might now seem that results like Gödel's first incompleteness theorem have shown this requirement to be impossible to satisfy in some of the most interesting cases. It might at first look as if such results mark an important limitation of the axiomatic method, and perhaps even of what logic and mathematics can accomplish as tools for human thinking. The looming incompleteness seems to threaten our prospects of satisfying both the requirement (1) and the requirement (2).

This impression is nevertheless due to a widespread misconception. The source of this misconception is requirement (2). What is meant by "logical consequence" there? It is often, perhaps nearly universally, thought that it should be possible to capture the relevant logical consequence relations by mechanical (recursive) rules of inference. This requirement has an old motivation implicit in the very idea of symbolic logic, which presupposes that all the work in logic can be done symbolically, that is, formally. However, "formal" does not mean the same as "mechanical" or "recursive". Relations of model-theoretical (semantical) consequence are formal in the sense that they depend only on the logical form of the premises and of the conclusion. Confusing this sense of "formal", as applied to logical consequence relations, with the recursive enumerability of the consequences of an axiom system, or for that matter of any set of premises, is simply a mistake. A more recent motivation of the recursivity requirement lies in the desire to capture relations of logical consequence in computer science terms. But if this requirement is imposed, most actual axiom systems turn out to be incomplete.

If the model-theoretical characterization of the axiomatic method proposed above is accepted, then the sense of logical consequence relevant to axiomatization is what is usually known as "logical consequence", which, more fully explained, means model-theoretical (semantical) consequence. Such a consequence relation holds between A and B if and only if all the models of A are also models of B. This relation can hold even if not all such consequences B of A are formally derivable from A by mechanical rules. It is also seen that if the consequence relation is so understood, inferences to logical consequences B of a proposition A cannot yield information, in the most natural sense of the word, that is not already codified in A.

By reviewing the advantages of the axiomatic method briefly discussed above, it can be seen that most of them are available as soon as logical consequence is understood in this model-theoretical (semantical) sense. Mechanizability of the consequence relation is not required. Some of the main advantages of axiomatization are available only if the intended logical consequence relation is understood in this way. The underlying reason is obvious. An axiomatic system is a study of its models. And it
is precisely relations between models that the semantical (model-theoretical) sense of logical consequence is based on. It is when the axiom system captures our entire knowledge of a class of models (structures) as its semantical consequences that we can study this class by means of logic and mathematics without needing any further information about the specific subject matter in question. Even if we are dealing with a physical theory, we do not need any new experiments or observations.

If the notion of logical consequence is understood in this way, the requirement (2) does not restrict the power of the axiomatic method in any way. For instance, one can easily formulate model-theoretically complete axiom systems for arithmetic without violating Gödel's first incompleteness theorem (Gödel 1931). For Gödel's result is relative to the assumption that the logic used is the ordinary first-order logic. Semantically complete axiomatizations of elementary number theory are in fact available in terms of second-order logic or extended IF logic.

The advantages of the axiomatic method are in fact connected with the often overlooked fact (noted above) that the model-theoretical logical consequence relation is in a sense as formal as any proof-theoretical consequence relation, viz. in that it depends only on the logical form of the propositions involved, not on the nonlogical primitives occurring in the axioms. In this important sense, "formal" does not imply "mechanizable" or "recursive" when we are dealing with consequence relations. One can in fact turn the tables here and argue that Gödel-type results on the contrary show that in axiomatic theories the notion of logical consequence has to be taken in a model-theoretical rather than proof-theoretical sense. For those results show that the proof-theoretical consequence relation cannot capture all the actual model-theoretical consequences of the axioms that the system is calculated to capture as its theorems.

From the vantage point of the services that the axiomatic method can provide, any restriction to mechanizable consequence relations is not only unnecessary but positively harmful. The very idea was to capture all the relevant subject matter in an axiomatic system in the form of its logical consequences. If these consequences are somehow restricted, this purpose is left incompletely fulfilled, in that in a deductively incomplete theory some of those logical consequences are not accessible from the axioms.

It is sometimes said that the use of a notion of logical consequence that cannot be captured by explicit programmable rules of inference forces us to resort to irrational sources of truths, such as some kind of mathematical intuition, needed to supplement logical deduction. Gödel (1983) in fact claimed that there is a partial epistemological parity between sense-perception in science and intuition in mathematics. The non-mechanizability of relations of logical consequence might therefore seem to expose us to a danger of arbitrariness and perhaps even irrationality. This danger seems to have been actualized in thinkers like Gödel, who actually proposed that we should rely on intuition. In reality, this perceived danger is largely an illusion. Contrary to Gödel's unfortunate comparison of intuition with sense-perception, new logical principles are not dragged aus der Tiefe unseres Bewusstseins (from the depths of our consciousness), by contemplating one's mathematical soul (or is it a navel?), but by active thought-experiments: by envisaging different kinds of structures and by seeing how they can be manipulated in imagination. The relations
that can be so revealed are model-theoretical rather than proof-theoretical. Maybe such thought-experiments are examples of what is meant by appeals to intuition. But if so, mathematical intuition does not correspond on the scientific side to sense-perception, but to experimentation.

In sum, Gödel's first incompleteness theorem does not bring out any essential limitations of the axiomatic method or of human thinking more generally. What it shows are limitations of computers. It shows in effect that a mathematician's work is not completed when he or she has managed to formulate a complete axiom system. The rest of his or her work cannot be delegated to computers even when it comes to the derivation of theorems from axioms. But this is eminently in keeping with the experience of applied mathematicians. In many cases, the basic axiomatic assumptions are well known, and yet there is no mechanizable way to examine their consequences for particular problems. Many-body problems in Newtonian mechanics and quantum theory offer examples of such a situation. How much there is in analysis and physics that cannot be handled by means of computable methods is discussed by Pour-El and Richards (1989).
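The contrast this section turns on can be stated compactly. The sketch below uses standard notation (mine, not the author's); the second-order case is the textbook illustration of semantic consequence outrunning every mechanical proof procedure.

% Semantic consequence versus mechanical derivability (standard facts).
\[
A \models B \;\Longleftrightarrow\; \mathrm{Mod}(A) \subseteq \mathrm{Mod}(B).
\]
% For the second-order Peano axioms PA2 with full semantics, categoricity gives
% Mod(PA2) = { the standard model N }, up to isomorphism, so
%   { φ : PA2 ⊨ φ } = { φ : N ⊨ φ } = all arithmetical truths.
% By Gödel (1931) this set is not recursively enumerable; hence no mechanical
% rule system can generate exactly the semantic consequences of PA2.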
5 Axiomatization and surface information

Here we seem to have a paradox, maybe even a contradiction, on our hands. The very purpose of the axiomatic method was said to be the study of the models of the theory by deriving theorems purely logically from the axioms. But it is now suggested that purely logical inferences cannot yield new information. Something appears to be quite wrong here. This conundrum is in fact only one of the symptoms of the general confusion about the role of logic and logical inferences.

This apparent paradox is connected with various misinterpretations. For one thing, Hilbert's alleged formalism amounts to little more than an emphasis on the uninformativeness of the derivation of theorems from the axioms. This uninformativeness is but the other side of the coin of the subject-matter independence of all derivations of theorems from axioms, in other words, of the requirement that all the relevant information be codified in the axioms. This subject matter independence is what is highlighted by Hilbert's often misunderstood quip about tables, chairs and beer mugs. This independence should not have been news to anyone, let alone a shock. It is a consequence of the necessary truth-preservation of logical inference. But in the context of axiomatic theorizing in mathematics it suggested that an axiomatic system cannot produce new information about its ostensive subject matter. The only new things produced in axiomatic theorizing would then be uninformative proofs. Accordingly, Hilbert was interpreted as a formalist in his axiomatic philosophy of mathematics.

An answer lies in a closer analysis of the very notions of information and inference. This answer has been given in my earlier work (see Hintikka 2007a and the literature referred to there). Very briefly explained, what a proposition gives us is a disjunction of a number of mutually exclusive possibilities concerning the world. Those disjuncts specify the different alternative possibilities concerning the world admitted by the proposition in question. The prima facie information, called surface information,
is (ceteris paribus) the greater the fewer those alternatives are, in other words, the more possibilities concerning reality are being excluded. It turns out, however, that some of these prima facie possibilities concerning the world are only seemingly realizable, even though this cannot be "seen" directly from their form. If we assume that all these merely apparent possibilities (inconsistent disjuncts) are eliminated, we can use the number of (suitably weighted) remaining ones to measure a kind of information called depth information. If we do not eliminate them, we can in the same way characterize another kind of information, the one that was called surface information. (In both cases, the different alternative possibilities must be weighted in some suitable way.) Instead of the terms surface information and depth information that I have used, someone else might prefer "explicit information" and "implicit information". They are characterized elsewhere, e.g. in Hintikka (2007a).

In other words, by means of purely logical operations it can often be shown that some of those prima facie alternatives that a proposition presents to us are not really possible. For instance, such an alternative may involve the existence of two individuals who on a closer examination will turn out not to be compossible. The totality of alternatives that are not so eliminable represents the depth information of the proposition. This kind of information seems to be what is usually thought of as the Realgehalt of the proposition. Yet the elimination of merely possible-looking alternatives can yield information in a most down-to-earth sense. For before they are eliminated, we have to be prepared for the eventuality that such an alternative amounts to, for instance for an encounter with two individuals which in deeper reality are not compossible. Thus surface information is indeed information in a more concrete (and more directly accessible) sense than depth information.

As a simple illustration we can think of what the discovery of radio waves amounted to conceptually. What Marconi (or perhaps rather Hertz) did was in effect merely to move a transmitter and a receiver to some distance from each other. All that a scientist knew about their interaction without further ado was that they obeyed Maxwell's equations. But what that meant in actual circumstances had to be figured out by working out a solution of the equations with certain particular boundary and initial conditions. Or else a scientist could find out the same experimentally. In either case, he or she gained nontrivial surface information. Thus the usefulness of the axiomatic method can be seen as being due to its being a systematic means of extracting more surface information from the axioms.
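A toy example may help fix the idea. The following is my own illustration in the spirit of the distinction, not Hintikka's official measure, which is defined over distributive normal forms at a given quantificational depth.

% Toy illustration of surface vs. depth information (illustrative only).
% The proposition F presents two prima facie alternatives:
\[
F \;\equiv\; \exists x\,(P x \wedge Q x) \;\vee\; \exists x\,(P x \wedge \neg P x).
\]
% Surface information counts both disjuncts as admitted possibilities; depth
% information counts only the first, since the second is inconsistent and thus
% eliminable on purely logical grounds. In this toy case the inconsistency is
% visible at a glance; in realistic cases it is buried at greater
% quantificational depth, and its elimination is non-trivial (and, in general,
% non-mechanizable).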
6 Axiomatization and the notion of analyticity

The distinction between depth information and surface information helps to overcome another misconception that makes it unnecessarily difficult to reach an overview of the axiomatic method. This misconception is to accept Quine's view that the notion of analytic truth (and perhaps by the same token the notion of logical truth) is not viable. It would be tempting but unfair to dismiss Quine's rejection of analyticity as another example of his prejudice against model-theoretical concepts. Likewise it would be unfair to blame Quine merely for a confusion between surface information and depth information, even though one can perhaps find such a confusion in his thinking.
In the last analysis, Quine's position turns on the insight just explained, to the effect that surface information can serve to facilitate our actions and planning just as authentically as depth information. In Quine's perspective, this means that it is difficult to tell the two kinds of information from each other behaviorally. But from such a behavioral indistinguishability it does not follow that the two cannot be distinguished in other ways. For instance, they can apparently be distinguished by an outside observer only by reference to longer stretches of behavior than Quine is countenancing (cf. here Hintikka 1968). But this recognition of a true element in Quine's criticism does not in the last analysis vindicate it in the least. There is no real problem about requiring that in the axiomatic method the derivation of theorems from axioms must take place purely logically. It is in any case important to realize that Quine's rejection of the notion of analytic truth threatens to make nonsense of the entire axiomatic method; see Hintikka (2003). (This is because it means making nonsense of requirement (3).)
7 Axiomatizing logic

From what has been found, it follows that the idea of axiomatizing logic is a tricky enterprise, if not an outright dubious one. For it was seen that the axiomatization of a theory requires an explicit logic as its tool. But can one use logic as a tool for studying logic? The answer is not obvious. Peirce for one did not have any inhibitions about applying logic to logic. However, in the subsequent mainstream of logical theorizing it is generally held that for the model-theoretical study of a system of logic one needs a stronger logic. This is what Tarski's undefinability theorem seems to imply. Ultimately, one seems to be driven outside logic to set theory as the lingua franca of axiomatic theorizing. Alternatively, one is reduced to the mere syntax of logic, that is, to considering any recursive enumeration of logical truths as an axiomatization of logic. However, this idea is predicated on the view rejected in Sect. 4. As far as the meaning of axiomatization in general is concerned, this improper sense is the only viable one in which one can speak of the axiomatization of logic. It seems that some thinkers have been led by this fact to think that recursive enumerability of theorems is an essential feature of axiomatization at large.

One of the purposes of this paper is to expose the inadequacy of the current 'syndrome' of ideas concerning the axiomatization of logic. On the negative side, it will be shown in the next section that set theory is not in its present form a viable medium of axiomatization in logic or anywhere else. Indeed, the current incarnation of set theory as a first-order axiomatic theory is a hopeless mare's nest of problems. On the positive side, independence-friendly (IF) logic provides us with languages whose metatheory can at least partly be formulated in the same language (see Hintikka 1996). In other words, we can by means of such logic apply logic to study logic, perhaps even to axiomatize it in some other sense. How far these newly gained possibilities reach remains largely to be investigated. The main result already reached is of course the possibility of formulating a truth predicate for a suitable IF language in that language itself.
It is nevertheless important to realize that there is a different sense of completeness that is also relevant to the axiomatic method. In this sense, an axiomatization of logic is a recursive enumeration of logical truths. It is in this new sense complete if it enumerates all (and only) logical truths. Completeness in this sense is sometimes misleadingly called semantical completeness. If an axiomatized logic is used as the logic of a nonlogical axiom system, this axiom system serves its purpose optimally only if its logic is complete in the sense of this "semantical" completeness. For otherwise there might be logical consequences of the axiom system which cannot be captured by means of the logic used in it. However, for reasons expounded in Sect. 4, the sense of completeness that ultimately matters here is the model-theoretical one.
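For ordinary first-order logic the two senses happen to coincide, which is a large part of why they are so easily conflated. A minimal sketch of the standard facts (my formulation, not the author's):

% Two senses of completeness (standard facts, stated for contrast).
% (i) Enumeration sense: a proof system ⊢ for a logic is complete iff
\[
\vdash \varphi \;\Longleftrightarrow\; \models \varphi
\qquad \text{for every sentence } \varphi .
\]
% Gödel's completeness theorem: ordinary first-order logic has such a ⊢,
% so its logical truths are recursively enumerable.
% (ii) Model-theoretical (descriptive) sense: an axiom system AX is complete
% iff Mod(AX) comprises exactly the intended structures. Stronger logics
% (second-order, IF) can satisfy (ii) for arithmetic, while no proof system
% for them satisfies (i).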
8 Axiomatizing set theory

If we for a moment return to the conditions we have found necessary to impose on satisfactory axiom systems, they might not at first seem to have much critical bite. This impression is not accurate, however. One of the best known and most frequently used types of axiom system flunks this test. This kind of system is a first-order axiomatic set theory.

The basic source of trouble is revealed already by requirement (1). To see this, take any first-order axiomatic set theory AX. Being a set theory, the models of AX should be structures of sets. But, being models of a first-order theory, these models do not consist of sets. They are structures of particular objects, not of sets. The epsilon relation cannot without further ado be interpreted as the membership relation. For the most important instance, in many of these first-order models there are collections α of elements which do not constitute a set, in the sense that there is no element e such that (∀x)(x ε e ↔ x is in α). Admittedly some interesting structures involving sets can be characterized, such as those answering to the cumulative notion of set. But what results concerning such structures tell us about the nature of sets or about the structure of all sets remains unclear (see here Hintikka 2004a). Moreover, there turn out to be theorems of AX (on certain weak assumptions) none of whose models can be interpreted as consisting entirely of sets. Such theorems must therefore be considered false if interpreted to be about sets (see Hintikka forthcoming). Thus the models of AX do not in any reasonable sense capture the intended set-theoretical structures.

But even if they did, there would be further trouble. Condition (4) would be violated. Frege might have believed that the general rules for quantifiers codified in first-order logic suffice also for all reasoning about higher-order entities like sets. We know that they do not. Hence what a first-order set theorist must resort to doing is to introduce the missing principles as axioms, such as the so-called axiom of choice. This is a strange procedure. What a resulting axiom system captures through its models is a mélange of set-theoretical and logical truths. The received first-order axiom system does not even capture the full force of the "intuitions" that back up the axiom of choice, either. A fully general form of the axiom of choice would have to imply the existence of a full array of Skolem functions for each true sentence. But if this assumption is added to the usual axioms, a contradiction results.
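The Skolem-function form of choice that the last sentences allude to can be written down explicitly. A minimal sketch in standard notation (not the paper's own formulation):

% The Skolem-function (choice) schema alluded to in the text (standard form).
% For a binary condition S, choice licenses the step from pointwise existence
% to the existence of a choosing function:
\[
\forall x\, \exists y\, S(x,y) \;\rightarrow\; \exists f\, \forall x\, S(x, f(x)).
\]
% "A full array of Skolem functions for each true sentence" generalizes this:
% every existential quantifier in a true sentence is witnessed by a function
% of the universally quantified variables on which it depends.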
In fact, the unrestricted axiom of choice turns out to be a valid first-order mode of reasoning (see Hintikka forthcoming). All that is needed to realize this is that a part of the semantical job description of quantifiers is to express the actual dependence and independence of variables by the formal dependence or independence of the quantifiers to which they are bound. This is both good news and bad news for a traditional axiomatist of set theory. The good news is that he or she can capture the force of the axiom of choice by switching over to the use of a better first-order logic. The bad news is that unless the set theorist also does something else, his or her set theory will imply a contradiction.

If we look at the enterprise of axiomatizing set theory from a historical perspective, we find an ironical story (see here especially Ebbinghaus 2007; Zermelo 1904, 1908, 1930). Zermelo did not begin to axiomatize set theory unselfishly, from the goodness of his theoretical heart. His main purpose was to justify his well-ordering theorem. In practice, this largely meant justifying the axiom of choice. As a matter of subsequent history, his axiomatization led to all the problems detailed above. But that is not the full story. Worse still: Zermelo's specific enterprise was unnecessary, in that the so-called axiom of choice turns out to be at bottom a plain first-order logical principle.

Furthermore, Zermelo's project is turning out to be self-defeating in a sense. The intuitions that presumably are codified in the current formulations of the axiom of choice in the usual axiom system for set theory justify obviously stronger assumptions. In particular, they justify all inferences from the truth of a set-theoretical proposition to the existence of a full array of its Skolem functions. For these functions are what guarantee the existence of all the "witness individuals" that make manifest the truth of the proposition in question. However, if we add to the axioms of a current first-order axiomatization of set theory all the relevant conditionals (that is, conditionals with each set-theoretical proposition as the antecedent and a statement of the existence of its Skolem functions as the consequent), we obtain an inconsistent system. (Such a system would be inconsistent in virtue of Tarski's theorem, for it would be easy to formulate a truth predicate in it.) Zermelo's project can thus be shown to be both redundant and self-defeating.

To add incompleteness to injury, the usual first-order axiomatizations of set theory also turn out to be too weak in the dimension indicated by condition (5). Being a first-order theory, AX is not strong enough to serve as its own metatheory. For instance, one cannot define truth in a model of AX by means of AX. This is ironic in view of the widespread but mistaken idea of set theory as the natural medium of all model theory. In its incarnation as AX, set theory cannot even serve as a model theory of itself. This fact has important methodological implications. It shows that it may very well be advisable to study specific set-theoretical structures, for instance the continuum, by means other than first-order axiomatic set theory.

Thus the analysis of the axiomatic method presented here shows that the received first-order axiomatizations of set theory are not viable. First-order axiomatic set theory is a misuse of the axiomatic method.
9 Is axiomatization epistemologically motivated?

But are there perhaps other purposes that the axiomatic method can serve? Traditionally, axiomatization is sometimes thought of as a means of enhancing the credibility of a theory, perhaps by making it more intuitively obvious. Such claims must be considered skeptically. The logical deduction of theorems from axioms does not introduce any new information and hence cannot strengthen the credibility of anything. (A theorem's credibility might on the contrary sometimes be enhanced by showing that it is independent of some of the axioms!) Hence the credibility of an axiom system must be due to the axioms. Where do they come from? The class of structures that the axioms are calculated to capture can be either given by intuition, freely chosen, or else introduced by experience.

No matter how an axiom system has been arrived at, in its own right it is a logical study of a certain class of structures, viz. its models. This study is, from a purely mathematical point of view, all that an axiom system does. The only possible obstacle to its serving this function would be the inconsistency of the system. Hence the only epistemological justification that an abstract axiom system needs is a proof of its model-theoretical consistency. Such a proof was the original aim of Hilbert's foundational work.

Likewise, an axiom system calculated to capture our intuitions about some subject matter does not need any particular epistemological justification. It is at most a question to be addressed to empirical psychologists how accurately an axiom system really captures our intuitions, for instance whether our visual space is perfectly Euclidean or perhaps slightly non-Euclidean in its metric. We also have to heed the possibility that our intuitions can be re-educated. An imaginative writer, such as Abbott (1935), the author of Flatland, or Reichenbach (1958) in The philosophy of space and time, might actually induce in us an intuitive picture of what it would feel like to live in a non-Euclidean space.

Someone might suggest, as a historical objection to what was just said, the need mathematicians felt to prove the parallel postulate (Euclid's fifth postulate) from other assumptions. Does that show that some intuitions are less intuitive than others and hence in need of a proof? This seems not to be the case, for the special character of the parallel postulate amply explains the attention paid to it. For one thing, it is the only assumption in Euclid whose uses in proofs require arbitrarily long lines, lines that would for instance extend beyond the finite Aristotelian universe. Can such a postulate nevertheless be true in the actual universe? This question alone amply motivates mathematicians' concern about the parallel postulate.

On the other hand, if an axiom system is supposed to be a theory about the real world, its axioms must be verified empirically in the same way as any other scientific truths. Hilbert's axioms of geometry may have been suggested to him by his intuitions, but unlike Kant he makes it perfectly clear that their applicability to reality presupposes an empirical verification of the axioms, however intuitive they may be (Hilbert 1918, p. 149). This applies even to the continuity axioms. Indeed, the status of similar continuity and differentiability assumptions in thermodynamics had in fact been a moot issue in the philosophy of physics.
The normal development of theoretical science has only gradually led to axiomatic theories whose axioms were far from obvious, were not in any normal sense derived directly from experience, and did not otherwise enjoy greater epistemological certainty than their consequences. As a representative statement we can here quote Heinrich Hertz: "But in no way can a direct proof of Maxwell's equation be deduced from experience. It appears most logical, therefore, to regard them independently of the way in which they had been arrived at, and consider them as hypothetical assumptions and let their plausibility depend on the very large number of natural laws which they embrace. If we take this point of view we can dispense with a number of auxiliary ideas which render the understanding of Maxwell's theory more difficult" (quoted by Frank 1947, p. 38). Hence obviousness and certainty are not among the crucial characteristics of the axioms of a system that is supposed to apply to the actual world.

This is not a recent view, either. Already according to Aristotle, the first principles in a deductively organized science are found only through a potentially complex interrogative process (see here Hintikka 2004b). Likewise, according to Newton the greatest difficulty in physics is to find the forces that govern natural phenomena (cf. here Newton 1972, 'Author's Preface', p. 16). They are the explanatory assumptions from which the phenomena to be explained are derived. And these forces are not merely assumed; they are derived from phenomena. In the case of his Principia, "we derive from celestial phenomena the gravitational forces"; in general a physicist's task is "to discover the forces of nature from the phenomena of motions" (ibid.). Hence the axioms of a scientific system are not meant to occupy a privileged position epistemologically. This is related to the fact that the derivation of theorems from axioms can produce new surface information but not new depth information (cf. Sect. 5).
10 Axiomatization and explanation

There is nevertheless another epistemological dimension of the axiomatic method which has not been studied in any depth by philosophical methodologists. Maybe the axiomatic system does not in any direct sense contribute to the certainty of the theorems derivable from it. However, many practitioners of the axiomatic method emphasize its uses in explaining these theorems. This is true, for instance, of the axiomatists of the Bourbaki school. According to Claude Chevalley, another innovation of Bourbaki was "the principle that every fact in mathematics must have an explanation." This principle is separate from the idea of causality, in the sense of one fact causing the occurrence of another. Bourbaki held that "anything that was purely the result of calculation was not considered by us a good proof" (quoted in Aczel 2006).

But what is this explanatory role of the axiom system? Unfortunately, no answer is forthcoming from the earlier discussions of explanation by philosophers. Most of the earlier literature consists of nitpicking and fruitless controversies about whether explanation is subsumption under a covering law, finding causal connections, unification, or what not. However, an analysis of the role of logic in explanation is now available (see Hintikka 2007b and the references given there). It is shown there how one kind of explanation
of why G must be true if F is true can be extracted from any usual logical consequence relation. By "usual" is here meant "first-order". For this purpose, the proof establishing the consequence relation must be transformed into a normal form, essentially into a tableau proof in which there is no transfer of formulas between the left and the right side. In particular, such a normal form brings out an interpolation formula I which can be thought of as the explanation why G follows from F. This formula is normally more complicated than F or G. This additional complication can serve as a measure of the explanatory depth of the consequence relation in question. It in effect shows how the structure characteristic of the models of G inevitably emerges when we try to construct, in reality or in thought, a model of F. It thus constitutes an explanation of the connection between F and G in a vivid sense.

In this sense, the axioms of a system can provide explanations of theorems. But what is needed for them to be able to serve this explanatory purpose? A partial answer lies in what has been said: the axioms must be simple in the sense that they involve few individuals forming a straightforward structure. (In formal terms, this presupposes low quantificational depth.) Such simplicity does not, in the case of an applied axiom system, by itself make the axioms any more likely. However, it is vital for the explanatory function of axiomatization. This function deserves closer study than has been devoted to it in the literature. The notion of surface information offers us a conceptual tool in this investigation.
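The interpolation formula invoked here is closely related to the classical Craig interpolation property of first-order logic. A minimal sketch of that standard property (my formulation; the paper's normal-form construction refines it):

% Craig interpolation for first-order logic (standard theorem, Craig 1957).
% If F logically implies G, there is an interpolant I whose nonlogical
% vocabulary is shared by F and G such that
\[
F \models I, \qquad I \models G, \qquad
\mathrm{voc}(I) \subseteq \mathrm{voc}(F) \cap \mathrm{voc}(G).
\]
% Read explanatorily: I isolates just that structural content of F which
% forces the truth of G, which is why its quantificational complexity can
% serve as a measure of explanatory depth.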
References

Abbott, E. (1935). Flatland: A romance of many dimensions. Boston: Little, Brown & Company.
Aczel, A. D. (2006). The artist and the mathematician: The story of Nicholas Bourbaki, the genius mathematician who never existed. New York: Avalon.
Bourbaki, N. (1950). The architecture of mathematics. American Mathematical Monthly, 57, 221–232.
Ebbinghaus, H.-D. (2007). Ernst Zermelo: An approach to his life and work. Berlin: Springer.
Frank, Ph. (1947). Einstein: His life and times. New York: A. A. Knopf.
Frisch, M. (2005). Inconsistency, asymmetry, and non-locality. New York: Oxford University Press.
Gödel, K. (1931). Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I. Monatshefte für Mathematik und Physik, 38, 173–198. (Reprinted and translated in Collected works, Vol. 1, pp. 144–195. Oxford: Oxford University Press.)
Gödel, K. (1983). Russell's mathematical logic. In P. Benacerraf & H. Putnam (Eds.), Philosophy of mathematics (pp. 447–469). New York: Cambridge University Press.
Hilbert, D. (1899). Grundlagen der Geometrie. In Festschrift zur Feier der Enthüllung des Gauss-Weber-Denkmals in Göttingen (pp. 1–92). Leipzig: Teubner.
Hilbert, D. (1900). Mathematische Probleme. In Nachrichten von der Königlichen Gesellschaft der Wissenschaften zu Göttingen, Math.-Phys. Klasse (pp. 253–297). Lecture given at the International Congress of Mathematicians, Paris.
Hilbert, D. (1918). Axiomatisches Denken. Mathematische Annalen, 78, 405–415. English translation in W. Ewald (Ed.), From Kant to Hilbert: A source book in the foundations of mathematics (Vol. 2, pp. 1105–1110). Oxford: Oxford University Press.
Hilbert, D. (1922). Neubegründung der Mathematik, Erste Mitteilung. Abhandlungen aus dem Mathematischen Seminar der Hamburgischen Universität, 1, 157–171.
Hintikka, J. (1968). Behavioral criteria of radical translation. Synthese, 19, 69–81.
Hintikka, J. (1996). The principles of mathematics revisited. New York: Cambridge University Press.
Hintikka, J. (2001). Post-Tarskian truth. Synthese, 126, 17–36.
Hintikka, J. (2003). A distinction too few or too many? In C. Gould (Ed.), Constructivism and practice (pp. 47–74). Lanham, MD: Rowman and Littlefield.
Hintikka, J. (2004a). Independence-friendly logic and axiomatic set theory. Annals of Pure and Applied Logic, 126, 313–333.
Hintikka, J. (2004b). On the development of Aristotle's ideas of scientific method and structure of science. In Analyses of Aristotle (pp. 153–174). Dordrecht: Kluwer.
Hintikka, J. (2006). Truth, negation, and some other basic notions in logic. In J. van Benthem, et al. (Eds.), The age of alternative logics (pp. 195–219). Heidelberg: Springer.
Hintikka, J. (2007a). Who has kidnapped the notion of information? In J. Hintikka (Ed.), Socratic epistemology: Explorations of knowledge-seeking by questioning (pp. 189–210). New York: Cambridge University Press.
Hintikka, J. (2007b). Logical explanations. In J. Hintikka (Ed.), Socratic epistemology: Explorations of knowledge-seeking by questioning (pp. 161–188). New York: Cambridge University Press.
Hintikka, J. (forthcoming). Reforming logic (and set theory).
Hintikka, J., & Karakadilar, B. (2006). How to prove the consistency of arithmetic. Acta Philosophica Fennica, 78, 1–15.
Laugwitz, D. (1996). Bernhard Riemann 1826–1866: Wendepunkte in der Auffassung der Mathematik. Basel: Birkhäuser.
Majer, U. (1995). Geometry, intuition and experience: From Kant to Hilbert. Erkenntnis, 42, 261–295.
Majer, U. (2001). The axiomatic method and the foundations of science: Historical roots of mathematical physics in Göttingen (1900–1930). In M. Rédei & M. Stöltzner (Eds.), John von Neumann and the foundations of quantum physics (pp. 11–34). Dordrecht: Kluwer.
Newton, I. (1972/1726). In A. Koyré & I. B. Cohen (Eds.), Isaac Newton's Philosophiae Naturalis Principia Mathematica (Vol. 1). Cambridge: Harvard University Press.
Pour-El, M. B., & Richards, J. (1989). Computability in analysis and physics. Heidelberg: Springer.
Reichenbach, H. (1958). The philosophy of space and time. New York: Dover.
Sandu, G. (1998). IF logic and truth definition. Journal of Philosophical Logic, 27, 143–164.
Tarski, A. (1935). Der Wahrheitsbegriff in den formalisierten Sprachen. Studia Philosophica, 1, 261–405. English translation in Tarski (1956), Logic, semantics, metamathematics. New York: Oxford University Press.
Zermelo, E. (1904). Proof that every set can be well-ordered. Original German in Mathematische Annalen, 59, 514–516. English translation in J. van Heijenoort (Ed.). (1967). From Frege to Gödel: A sourcebook in mathematical logic 1879–1931 (pp. 139–141). Cambridge, MA: Harvard University Press.
Zermelo, E. (1908). Investigations on the foundations of set theory. Original German in Mathematische Annalen, 65, 261–281. English translation in J. van Heijenoort (Ed.). (1967). From Frege to Gödel: A sourcebook in mathematical logic 1879–1931. Cambridge, MA: Harvard University Press.
Zermelo, E. (1930). New investigations on the foundations of set theory. Original German in Fundamenta Mathematicae, 16, 29–47. English translation in W. Ewald (Ed.). (1996). From Kant to Hilbert (Vol. 2, pp. 1208–1233). Oxford: Oxford University Press.
Synthese (2011) 183:87–114 DOI 10.1007/s11229-009-9669-7
Reflections on the revolution at Stanford
F. A. Muller
Received: 22 February 2008 / Accepted: 24 April 2009 / Published online: 17 November 2009 © Springer Science+Business Media B.V. 2009
Abstract We inquire into the question whether the Aristotelian or classical ideal of science has been realised by the Model Revolution, initiated at Stanford University during the 1950s and spread all around the world of philosophy of science—salute Suppes. The guiding principle of the Model Revolution is: a scientific theory is a set of structures in the domain of discourse of axiomatic set-theory, characterised by a set-theoretical predicate. We expound some critical reflections on the Model Revolution; the conclusions will be that the philosophical problem of what a scientific theory is has not been solved yet—pace Suppes. While reflecting critically on the Model Revolution, we also explore a proposal of how to complete the Revolution and briefly address the intertwined subject of scientific representation, which has come to occupy center stage in philosophy of science over the past decade.
Keywords Theory · Model · Classical ideal · Structure · Being · Representation
1 Preamble

You, dear Reader, are pleased to call again, and with some earnestness, for my thoughts on the late proceedings at Stanford University. I shall not give you reason to imagine,
F. A. Muller Faculty of Philosophy, Erasmus University Rotterdam, Burg. Oudlaan 50, H5-16, 3062 PA, Rotterdam, The Netherlands F. A. Muller (B) Institute for the History and Foundations of Science, Department of Physics & Astronomy, Utrecht University, Budapestlaan 6, IGG-3.08, 3584 CD, Utrecht, The Netherlands e-mail:
[email protected];
[email protected]
that I think my sentiments of such value as to wish myself to be solicited about them. They are of too little consequence to be very anxiously either communicated or withheld. It was from attention to you, and to you only, that I hesitated at the time, when you first desired to receive them. In the first letter I had the honour to write to you, and which at length I send, I wrote neither for nor from any description of men; nor shall I in this. My errors, if any, are my own. My reputation alone is to answer for them.

This opening paragraph is (with minor modifications) the opening paragraph of Reflections on the Revolution in France, and on the Proceedings in Certain Societies in London Relative to that Event, in a Letter intended to have been sent to a Gentleman in Paris (1790) by the right honourable Edmund Burke. In this long letter to Charles-Jean-François de Pont, which detonated the great debate on the French Revolution, Burke defended a Traditionalism that involved a return to an earlier humanity and that contained revolutionary counter-revolutionary prescriptions. Reflections is and always has been a source of inspiration and of arguments for conservative politicians all around the world. Burke's style combines, in varying proportions, a perspicacious rationality, a formidable sonority, a Gothic if not pathetic severity, and a furious irony that frequently glides into a savage sarcasm.1

In sharp contrast to all of this, the current paper is not a 'Letter intended to have been sent to a Lady at Stanford University' but is based on a presentation at the engaging conference 'The Classical Model of Science' (Amsterdam, Free University, January 2007); further, it does not advocate a return to an earlier period in the philosophy of science, let alone contain 'revolutionary counter-revolutionary prescriptions'; and, last but not least, its style is cool, calm and collected. What is analogous in it to Burke's Reflections is that it recognises as revolutionary the change in our thinking about the nature of scientific theories that began at Stanford University about half a century ago under the lead of Patrick Suppes, and that it expounds some reflections of a critical nature on this revolution in the philosophy of science. In addition it also provides some suggestions for how to complete this Model Revolution, rather than return to the ancien régime. Critically progressive rather than critically conservative is the spirit of this paper.

We begin by providing a succinct summary of what the 'Model Revolution' is all about (Sect. 2). Then we inquire whether it can be interpreted as achieving the classical ideal of science (Sect. 3). The subsequent sections contain the critical reflections that form the body of this paper, addressing subsequently: first, the relation between our theories and the beings they are about (a being is here anything that can exist, and an actual being is anything that does exist; sometimes we drop 'actual' for the sake of brevity; Sect. 4); secondly, the relation between our theories and the results of experiments and observations (Sect. 5); and, thirdly, the relation between the models and the concepts and propositions of science (Sect. 6). We end with an idea of how to complete the Model Revolution (Sect. 7) and briefly address a related topic which is currently in the limelight of philosophy of science: scientific representation (Sect. 8).
1 See O’Brien (1967, pp. 42–43).
2 The model revolution at Stanford

About half a century ago at Stanford University, in the United States of America, Patrick Suppes and collaborators published, in two papers in the Journal of Rational Mechanics and Analysis, a revolutionary axiomatisation of a scientific theory, namely classical particle mechanics Newtonian-style.2 Until Suppes arrived on the stage of philosophy, an axiomatisation of a theory was generally understood in the sense in which it had gradually come to be understood in Logic since the appearance of Frege's Begriffsschrift in 1879. During the Interbellum, that understanding had been forcefully promulgated by the Logical-Positivistic philosophers of the Vienna Circle; it was inspired by exciting new developments in Logic at the time, which had started in the course of the nineteenth century (Frege, Schröder, Russell, Hilbert and others).3 In order to characterise some scientific theory T rigorously, as it is presented in some prominent monograph or review article widely used and referred to by scientists, one had to proceed as follows.4

(A) Erect an elementary formal language LT (finite lexicon, lex(LT), quantification only over all, possibly different, sorts of objects, no quantification over predicates; schemata with sentence-variables might be permitted), its lexicon including all predicates that express the 'fundamental concepts' of T. The set sent(LT) of the sentences of LT is defined inductively in the usual manner.5

(B) Erect a formal-deductive apparatus in order to be able to reason rigorously in LT, i.e. to prove theorems (this fixes the deduction relation between sentences, abbreviated as usual by: ⊢).

(C) Collect a number of sentences from sent(LT) in set ax(T); they are the axioms of T and should be formalisations of the postulates, principles and laws that characterise T and are considered not to be deducible from other postulates, principles and laws in T. Then the pentuple

FT ≡ ⟨lex(LT), sent(LT), ax(T), ⊢, T⟩   (1)

is a formal-deductive system, by definition the formalisation of T, where T is the deductive closure of the axioms:6

T ≡ {ϕ ∈ sent(LT) | ax(T) ⊢ ϕ}.   (2)

Of course one requires that T be consistent.

2 McKinsey et al. (1953), McKinsey and Suppes (1953), McKinsey and Suppes (1955) and Suppes (1954).
3 For the canonical review of the Logical-Positivistic views on scientific theories, see Suppe (1974).
4 We take for granted that we can identify a scientific theory in scientific writings. Admittedly this is not immune from philosophical doubt. But then again, nothing is.
5 At the meta-level we have some weak set-theory to make this all possible rigorously. The concept of 'object' here is the metaphysically thin notion as used by Frege and Quine: it can be anything over which we can quantify and for which we then must supply identity conditions—physical objects, persons, processes, events, mental states, space–time points, dreams, numbers, sets, actions, etc.
6 By 'formalisation' we mean what Frege meant: spelling out the alphabet, and all the formation and deduction rules.
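Definition (2) makes T a deductive closure in the standard sense. Spelled out as a minimal sketch (standard closure properties, my formulation, not the paper's):

% Closure properties implicit in definition (2) (standard facts).
% Writing Cn(X) = { φ ∈ sent(L_T) | X ⊢ φ } for the consequence operator:
\[
T = \mathrm{Cn}(\mathrm{ax}(T)), \qquad
\mathrm{ax}(T) \subseteq T, \qquad
\mathrm{Cn}(T) = T .
\]
% Consistency of T then amounts to: for no sentence φ are both φ and ¬φ in T.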
Steps (A)–(C) can be taken for mathematical and scientific theories alike;7 the next steps are for scientific theories only and thus distinguish these from mathematical ones.

(D) Distinguish in the lexicon of LT (see A) theoretical and observational predicates, in order to connect FT (1) to the results of experiments and observations, e.g. measurements, and collect these predicates in sets th(LT) and obs(LT), respectively, which thus are both proper subsets of lex(LT). The axioms of T (2) containing only theoretical predicates are called theoretical postulates, and those containing both theoretical and observational predicates—and thus connecting them—are correspondence postulates. Sentences of LT consisting entirely of observational predicates, so-called observation sentences, are in principle (but not always in practice) open to verification or falsification, or both; that is to say, whether an observation sentence is true or false can, in principle, be determined just by performing the relevant observation. To make this possible, the observational sub-language of LT, which by definition contains all and only the observation sentences without quantifiers, is interpreted (in the standard logical sense) in a specified domain DT of observable objects (events, things) deemed relevant for T; this also provides, via the correspondence postulates, a so-called partial interpretation of the theoretical sentences.

(E) Let Ot(T) ⊂ sent(LT) be the set of observation sentences verified by scientists up to historical time t that are relevant for T; they are about the members of domain DT (see D). Call formal-deductive system FT (1) observationally adequate at t iff T (2) is (consistent and) every established empirical truth relevant for T—i.e. every verified observation sentence in Ot(T)—is a member of T;8 in other words, iff

Ot(T) ⊂ T ⊂ sent(LT).   (3)
Thus human observations of observable objects, expressed in L_T, constitute the connexion between (i) the formal-deductive system F_T (1), which is constructed by us, human beings (as is T), and (ii) the concrete actual beings that T is supposed to be about, of which at most some (but certainly not all) are constructed by us.

7 Mathematical theories are about (particular kinds of) abstract objects only (numbers, geometrical figures, sets, structures); scientific theories are about concrete actual beings. The existence of the objects of mathematics is philosophically controversial; the existence of the objects of science is not, with the exception of the unobservable ones.
8 This concept of observational adequacy (ObsAdeq) should not be confused with Van Fraassen's well-known concept of empirical adequacy (EmpAdeq). Three differences between ObsAdeq and EmpAdeq: (i) ObsAdeq relies on the distinction between theoretical and observational concepts, whereas EmpAdeq does not; (ii) ObsAdeq depends on historical time, whereas EmpAdeq is timeless (it quantifies universally over historical times); (iii) EmpAdeq requires for its definition a conception of a scientific theory that results from the Model Revolution, because it cannot be defined in the conception we are expounding here. This last point, although explicitly and clearly stated in The Scientific Image (van Fraassen 1980, pp. 54–55), has been poorly appreciated by the community of philosophers of science: witness that such eminent philosophers of science as Friedman, Musgrave and the late Lipton are among the poor appreciators—see Muller and Van Fraassen (2008). Both ObsAdeq and EmpAdeq rely on the distinction between observable and unobservable objects.
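Continuing the toy example, a hedged sketch of the adequacy test (3); the observation sentences below are again our inventions, and the sole point is that observational adequacy is set-inclusion of O_t(T) in T.

```python
# Observational adequacy (3) as set-inclusion, for the toy theory above.

T = {"Raven(a)", ("Raven(a)", "Black(a)"), "Black(a)"}   # deductive closure from before

O_t = {"Raven(a)", "Black(a)"}     # observation sentences verified up to time t
print(O_t <= T)                     # True: F_T is observationally adequate at t

O_later = O_t | {"White(a)"}        # a later, newly verified observation sentence
print(O_later <= T)                 # False: adequacy (3) now fails
```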
The observation-nexus just described codifies the empirical essence of science: without it, there is no science. When the content of O_t(T) grows more varied (a notion difficult to express formally) whilst F_T (1) remains observationally adequate (3), F_T also becomes better confirmed. If a number of repeatedly verified sentences in O_t(T) turn out to be inconsistent with T, then F_T (1) is no longer observationally adequate (E); it has then been empirically falsified: O_t(T) ∩ T ≠ O_t(T), which indeed contradicts (3).

Let us call this answer to the question of what theoretical scientific knowledge consists in (the kind of knowledge acquired by science that is stored in theories9) the Formal-Linguistic View (also misleadingly known as 'the syntactic view'), henceforth L-View for brevity. The rigorous reconstrual of T on the L-View we summarise in the following octuple:
⟨lex(L_T), sent(L_T), ax(T), obs(L_T), th(L_T), T, ⊢, O_t(T)⟩,   (4)

where

lex(L_T) = obs(L_T) ∪ th(L_T) ∪ math(L_T)   (5)

is the lexicon of L_T. If L_T contains object-names, we throw the names of observable objects into obs(L_T) and those of unobservable objects into th(L_T), so that lex(L_T) indeed exhausts the union of the three sets in (5). The third, so far unmentioned set, math(L_T), contains the mathematical vocabulary (which is in principle reducible to the vocabulary of pure set-theory); how much mathematics a scientific theory employs differs from theory to theory, and therefore math(L_T) and ax(T) will differ too. The required consistency (C) of T implies that the part of T containing only observation sentences is also consistent and thus has a model. In the final and most sophisticated version of the L-View, developed by the Logical Positivists, it was required that this observational part of T have a finite model.10

The L-View went down due to an accumulation of internal problems and of external criticism—lege infra. In his seminal review article, 'The Search for Philosophic Understanding of Scientific Theories', Suppe (1974) has described the magnificent rise and fall of this ancien régime. According to some, this programme ended in failure and was actually doomed ab ovo—although this last addition requires a crushing amount of hindsight wisdom; others find priceless pearls of philosophical insight in the remains.

Patrick Suppes was, like the Logical Positivists, inspired by exciting then-new developments in Logic, e.g. the rise of Tarskian model theory in the 1950s and 1960s. When he axiomatised classical particle mechanics Newtonian-style, he did not follow Steps (A)–(E) above of the L-View: he did not create a formal language, he did not describe his deductive apparatus explicitly, he did not erect a formal-deductive system like (1), and he did not subdivide predicates into theoretical and observational ones.

9 Science harbours other kinds of knowledge as well, such as experimental results, knowledge of how to design and to conduct a scientific experiment, and of how to apply theories.
10 See Suppe (1974, pp. 50–51).
He did not obtain anything like the octuple (4). What Suppes did do when construing a scientific theory T rigorously we reconstruct in a manner that invites comparison with Steps (A)–(E) of the L-View.

(a) Adopt the informal language of axiomatic set-theory (e.g. Zermelo's theory of 1908 with the axiom of choice, ZC)—its formalised version is L_∈; it has only set-variables, and its lexicon contains no names and a single primitive dyadic predicate, the membership-relation (∈); the variables can be expanded with 'primordial elements', aka Urelemente (German).

(b) Adopt elementary predicate logic as the background deductive apparatus in order to be able to reason rigorously in L_∈ (prove theorems).

(c) (c.i) Typify the fundamental concepts in T set-theoretically (whether they are properties, relations, functions, operations, etc., and what their domains and co-domains are); (c.ii) express the postulates, principles and laws that characterise T set-theoretically; (c.iii) obtain a set-theoretical predicate—frequently called a Suppes-predicate; call its set-extension 𝒯, which then consists of exactly the sets that are structures meeting the Suppes-predicate—also referred to as models. Whence Suppes' slogan: To axiomatise a theory is to define a set-theoretical predicate.11

(d) —.

(e) Let D_t(T) be the set of data structures that are obtained up to historical time t from the measurement-results of experiments or observations relevant for T, which are extracted from 'the phenomena' that T is supposed to save. Call T observationally adequate at t iff for every data structure D ∈ D_t(T) there is some structure (model) S ∈ 𝒯 such that D is embeddable in S, where 'embeddability' is broadly construed as some morphism from D into (some part of) S.12

For the sake of rigour, we emphasise that one first has to prove the existence of a convenient set, such as V_{ω+g} (a tiny tip of the cumulative hierarchy V up to ordinal level ω+g, where g is the number googol13), in order to guarantee the existence of the set-extension 𝒯 by the Axiom of Separation, which then licenses the Suppes-predicate, say τ(·), to grab in V_{ω+g}:

𝒯 ≡ {S ∈ V_{ω+g} | τ(S)},   (6)
so that 𝒯 ∈ V_{ω+g} and therefore 𝒯 ⊂ V_{ω+g}. When asked to reconstruct some scientific theory, one can simply begin with Step (c) and forget Steps (a) and (b), because they are always the same. For the axioms of set-theory one takes those of Zermelo's ZC, because on their basis one can construct all the mathematics that science needs and ever will need.

11 See Da Costa and Chuaqui (1988) for the definition of a 'Suppes-predicate'; for a friendly definition of Bourbaki's related concept of a set-structure from 1968, see Muller (1998, pp. 106–115).
12 Some morphism, that is: isomorphism, partomorphism, monomorphism, epimorphism, partial isomorphism and perhaps more; see Muller (1998, pp. 125–126).
13 This means one can iterate the power-set operation on N, which we identify with ω, a googol (g) number of times. This yields all the Cartesian product-sets that include and contain all the mathematical objects that science needs and ever will need.
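By way of illustration, here is a hedged sketch of Steps (c.i)–(c.iii) for a deliberately tiny toy theory of our own choosing (strict partial orders, not one of Suppes' actual axiomatisations): to axiomatise is to define a predicate on structures.

```python
# A Suppes-predicate tau for the toy theory of strict partial orders.
# A "structure" S is a pair (X, R): a domain X and a binary relation R on X.

def tau(S):
    """Suppes-predicate: S = (X, R) is a strict partial order."""
    X, R = S
    irreflexive = all((x, x) not in R for x in X)
    transitive = all((x, z) in R
                     for (x, y) in R for (w, z) in R if w == y)
    return irreflexive and transitive

S1 = ({1, 2, 3}, {(1, 2), (2, 3), (1, 3)})   # a model: tau(S1) holds
S2 = ({1, 2}, {(1, 2), (2, 1)})              # violates the axioms
print(tau(S1), tau(S2))                       # True False
```

The theory 𝒯 is then {S ∈ V_{ω+g} | τ(S)}; computationally one can only ever test membership in 𝒯, which is what tau does.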
The embeddability-relation (e) constitutes the connexion between (i) theory 𝒯 and (ii) the phenomena that T is supposed to describe, to explain or to predict. To save the phenomena is to embed the data structures. The nexus between (i) and (ii) codifies the empirical essence of science: without it, there simply is no science-as-we-know-it. When the content of D_t(T) grows more varied (difficult to express formally) whilst 𝒯 remains observationally adequate, 𝒯 becomes better confirmed. If a number of repeatedly obtained data structures in D_t(T) are not embeddable into any structure in 𝒯, then 𝒯 is no longer observationally adequate (e); 𝒯 has then been falsified. This first sketch of science is, as Suppes warned, not the most accurate picture, one in which every relevant aspect of science finds a home; more accurate is a hierarchy of structures with the bare data structures at one end and general theory-structures at the other.14 We call this view the Informal-Structural View, henceforth S-View for brevity; the following ordered pair is the rigorous reconstruction of T, to be compared with the octuple (4) of the L-View:
⟨𝒯, D_t(T)⟩.   (7)
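A hedged sketch of Step (e), with a brute-force search standing in for serious mathematics: a finite data structure is embeddable in a structure iff some injective, relation-preserving map exists. All names and numbers below are our illustrative choices.

```python
# Embeddability of a finite data structure into a structure: search for an
# injective map that preserves the qualitative relation (a monomorphism in
# the simplest sense).

from itertools import permutations

def embeds(data, structure):
    (D, r_data), (X, r_struct) = data, structure
    D, X = sorted(D), sorted(X)
    for image in permutations(X, len(D)):
        f = dict(zip(D, image))                  # an injective candidate map
        if all((f[a], f[b]) in r_struct for (a, b) in r_data):
            return True                          # this morphism saves the data
    return False

data = ({"d1", "d2"}, {("d1", "d2")})            # one qualitative comparison
model = ({1, 2, 3}, {(1, 2), (2, 3), (1, 3)})    # a structure in the theory
print(embeds(data, model))                        # True: the data structure embeds
```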
Today the S-View seems largely to have replaced the ancien régime. The Model Revolution has succeeded. Why did the Model Revolution succeed? Why has the S-View replaced the L-View? For at least six (related) reasons.

I. Internal problems raised by the execution of the L-View, and external criticism of its assumptions, notably the assumption that the vocabulary of every scientific theory can be subdivided clearly and sharply into an observational and a theoretical part.15

II. Continuous with reason I: the formalisation of scientific theories has been of little help in solving all sorts of philosophical problems about science and raised by science. On the contrary, it evoked all sorts of logical problems that had little to do with the philosophical problems concerning science. Thus Van Fraassen writes:

The scholastic logistical distinctions that the logical positivists produced—observational and theoretical vocabulary, Craig reductions, Ramsey sentences, first-order axiomatizable theories, projectible predicates, disposition terms, and all the unholy rest of it—had moved us mille milles de toute habitation scientifique, isolated in our own abstract dreams.16

III. To execute Steps (c) and (e) of the S-View has turned out to be very much easier than to execute Steps (A)–(E) of the L-View. The key observation here is due to Suppes: one does not need to formalise in order to be rigorous, for the informal rigour of mathematics, i.e. of set-theory, is just as rigorous as formal rigour is.17

14 Suppes (1974), Luce in Bogdan (1979, pp. 93–110).
15 See e.g. Putnam (1960/1962), Toulmin (1974) and Suppes (1974).
16 Van Fraassen (1989, p. 225).
17 Suppes (1954, 1968).
Another slogan of Suppes: Mathematics for the philosophy of science, not meta-mathematics. Axiomatic set-theory has turned out to be extremely convenient for characterising scientific theories. Theories of physics, mathematics, chemistry, biology, economics, politics, psychology, linguistics and more have come under its Alexandrian sway. As far as reconstructing scientific theories rigorously is concerned, the S-programme has met with unprecedented success, whereas the L-programme gradually lost impetus and came to a complete standstill.18

IV. The S-View puts the model centre stage, rather than the theory. In his classic paper 'A Comparison of the Meaning and Use of Models in Mathematics and the Empirical Sciences', which appeared in this very journal half a century ago, Suppes (1960) argued that his models (i.e. set-structures inhabiting V) could be taken as what working scientists mean by 'model'. Observing further that scientists generally are model-builders rather than axiomatisers and theorem-provers, the S-View seemed to make much more sense of the practice of science than the L-View.19

V. The L-View makes a theory heavily dependent on a particular formulation in a specific language (L_T), whereas the S-View provides a characterisation of T that is not tied to a particular formulation (see further Sect. 6).

VI. The part of mathematics that any scientific theory employs needs to be isolated, formalised, axiomatised and included in F_T (1) of the L-View. The S-View disregards all this, and can discard it because, as we asserted earlier, all the mathematics that any scientific theory needs and ever will need can be erected in ZC, which marks the domain of discourse in which we (can) characterise every scientific theory.20

3 Aristotelian ideals

In his The Foundations of Mathematics of 1957, Beth provided a modern characterisation of Aristotle's 'apodeictic science', which has been advocated by the likes of Euclid, Descartes, Arnauld, Newton, Leibniz, Pascal, Kant, Bolzano, Husserl, Frege, Peano, Leśniewski, Russell, Hilbert, Carnap, Tarski, Beth, Montague, . . .21 De Jong and Betti (2008) have extended this characterisation and baptised it 'The Classical Model of Science'. We shall provide a succinct formulation tailored for present purposes.
18 See Suppes (1961, 1993, 2002), Bogdan (1979), Sneed (1979), Stegmüller (1976), Rantala (1978), Balzer et al. (1987), Giere (1988), Costa and French (1990) and Muller (1998).
19 During the past decades, Cartwright and her followers have promulgated a less mathematical, more pragmatically and practically oriented approach to scientific modelling; this approach is a line of development within, or at least given birth to by, the S-View. Typical is Morgan and Morrison (1999), who champion this approach with much clamour and less clarity; see Giere (1988) for less clamour and more clarity.
20 Somewhat devious is to ignore the mathematics of T also in the L-View and to take, by default, always ZC as the mathematical part of every T. Then math(L_T) = {∈} in (5).
21 Beth (1968, pp. 31–32); this is the second and revised edition of the first edition of 1957.
A warning is in order: this classical ideal of science was supposed to be more encompassing than what we nowadays call 'science' (to which the S-View (7) is geared), and certainly more encompassing than what we now call 'natural science' (to which the L-View (4) was arguably primarily geared). We consider an ordered pair

Θ ≡ ⟨Θ_p, Θ_c⟩   (8)

of a set of propositions (Θ_p) and a set of concepts (Θ_c). Let us first call Θ (8) a classical scientific theory iff it meets the following three conditions:

(α) Θ is about a specific set of actual beings (objects, processes, events, entities, persons, organisms, periods, structures, . . .), frequently called its domain or scope.
(β) Some concepts in Θ_c are fundamental; the other concepts in Θ_c are defined in terms of the fundamental ones.
(γ) Some propositions in Θ_p are fundamental, often called the principles of Θ; all other propositions in Θ_p follow from these.

Call Θ (8) a piece of classical theoretical scientific knowledge iff Θ is a classical scientific theory (α–γ) that meets in addition the following two conditions:

(δ) All propositions in Θ_p are true.
(ε) All propositions in Θ_p are universal and necessary.

For the next two conditions the existence of actual epistemic agents is needed. Call E our Epistemic community, which consists of all actual human beings who have a minimum level of scientific education and are of sound mind. Say that the ordered pair ⟨Θ, E⟩ meets the classical ideal of science iff Θ meets conditions (α–ε) and ⟨Θ, E⟩ meets in addition the following two conditions:

(ζ) Every proposition in Θ_p is known by some members of E; every non-fundamental proposition in Θ_p is known by members of E through its derivation from the fundamental propositions.
(η) Every concept in Θ_c is known by some members of E; every defined concept in Θ_c is known by members of E through its definition.

Notice that if Θ_p contains infinitely many propositions, conditions (ζ–η) will never be met, unless there are also infinitely many epistemic agents, or unless an omniscient entity is posited and granted membership in E (God, as Bolzano did), or unless 'known' is replaced with 'knowable'.22 Notice too that the classical ideal of science (α–η) says nothing about the relation between Θ (8) and the results of experiments and observations, notably measurements; yet precisely in this relation we would, certainly since the Scientific Revolution, seek the essence of what we now call science. Hence even if some pair Θ (8) fits the classical profile (α–ε), we should not conclude that Θ qualifies as a piece of what we today would call 'scientific knowledge', because something essential has been left out of that classical profile.

22 A referee has pointed out these options.
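For concreteness, a small sketch of condition (β); the encoding and the toy fragment of geometry are entirely our own invention.

```python
# Condition (beta) as a closure check: every non-fundamental concept must
# be reachable from the fundamental ones via definitions.

def closure(fundamental, based_on):
    """All items obtainable from `fundamental` via `based_on`, a dict
    mapping an item to the set of items it is defined from."""
    known = set(fundamental)
    changed = True
    while changed:
        changed = False
        for item, basis in based_on.items():
            if item not in known and basis <= known:
                known.add(item)
                changed = True
    return known

concepts = {"point", "line", "triangle"}
definitions = {"triangle": {"point", "line"}}
print(closure({"point", "line"}, definitions) == concepts)   # (beta) holds: True
```

Condition (γ) is the same closure check, run on Θ_p with the principles as the fundamental items and grounding in place of definition.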
De Jong and Betti (2008) point out that, in the case of geometry, the self-evidence of a proposition, as a guarantee for knowing it (ζ), was generally taken for granted for the fundamental propositions (principles) until the second half of the nineteenth century (!), whereafter the decline of this requirement set in. Beth adds the following requirement: the meaning of the terms in Θ_c must be so clear as not to require any further explication.23

Suppes (1992) has pointed out that Aristotle did not provide a single example of a piece of science that qualifies as achieving the classical ideal by Aristotle's own lights (ibid., p. 215); the very first piece of science that so qualified was Euclid's Elements. We would classify Euclidean geometry today as a piece of pure mathematics. What arguably qualifies as the very first realisation of the classical ideal of science, by our current lights, is Archimedes' theory of static equilibria, although his level of conceptual and axiomatic explicitness did not come close to Euclid's (ibid., pp. 217–219).

Since both the L-View (4) and the S-View (7) are views of scientific theories, rather than of science, and both views are neither about the epistemic status of theories (knowledge) nor about their epistemic relation to us (knowing), they seem epistemically neutral. In spite of this, there is a point in ascertaining whether the L-View and the S-View meet conditions (α–ζ), if only to understand why they fail, if fail they do, to meet the classical ideal of science. Before we do this, we remark that both Views are also neutral, first, about whether scientific theories are true or false, and, secondly, about whether scientific theories are necessary and universal, so that both Views fail to meet conditions (δ) and (ε). They fail, however, by remaining silent about these issues, not because they cannot be extended so as to meet these conditions.

Clearly the formal-deductive system F_T (1) of the L-View meets conditions (α–γ) and is therefore a classical scientific theory: take the primitive predicates of L_T to express the fundamental concepts in Θ_c and syntactic definability as 'definability' (in β), and take the sentences of L_T to express the propositions in Θ_p and the axioms of T (2) to express the principles in Θ_p (in γ). Whether 𝒯 (6) of the S-View meets conditions (α–γ) is highly problematic, because 𝒯 is not a set of propositions and concepts (but a set of set-structures), whilst the predicate 'classical scientific theory' applies to pairs like Θ (8). However, if we consider the set-theoretical renderings of the fundamental concepts of T as expressions of the fundamental concepts in Θ_c, and if we consider the characterising conjuncts in the definition of the set-theoretical predicate whose set-extension is 𝒯 (6) as expressions of the fundamental propositions in Θ_p, then there is a case to be made for the claim that 𝒯 also qualifies as a classical scientific theory.

The condition of the classical ideal of science about which we have so far remained silent is the beings condition (α). We have taken it for granted that in the L-View (4) the specification of the subject-matter of T proceeds semantically: whatever the variables of L_T range over, and whatever the predicates of L_T apply to, are the beings that T is about (unintended models of T (2) now appear on the horizon; we quickly change course).

23 The correspondence between Beth's characterisation (1968, pp. 31–32), where the conditions are labelled by Roman numerals, and De Jong and Betti (2008) is as follows: α–I, β–IV(b), γ–III, δ–II. Beth's IV(a) is the meaning of the fundamental terms being evidently clear; his V(a) is that the principles should be obviously true; and his V(b) is the soundness of the deduction-relation, which is construed epistemically by De Jong & Betti, as in condition ζ.
For the S-View we cannot take something similar for granted, because 𝒯 (6) does not consist of propositions that are about something; the set-theoretical predicate τ(·) that defines 𝒯 is an open sentence in L_∈ and is therefore literally about sets only, inhabitants of V, not of the world. But surely scientific theories are about actual beings in the world and not about abstract sets in the domain of discourse V of set-theory.

To conclude, the only good news for propounders of the classical ideal of science is that F_T (1) of the L-View (4) qualifies as a classical scientific theory. The bad news is that the classical ideal of science says nothing about the relation between theory and observation, whereas precisely here resides the essence of what we call science. There have been historical attempts to change the characterisation of the classical ideal of science so as to include observations and measurements, but the result will then be a characterisation of a different concept, as different as Galilean–Newtonian physics was from its Aristotelian–scholastic progenitors—calling both 'physics' does not really help. Perhaps one could argue that the specific failure of 𝒯 of the S-View (7) to qualify as a classical scientific theory (α–γ) does not spell trouble for the classical ideal of science, but rather spells trouble for the S-View. This is indeed what we shall argue in the next section, where we expound our first critical reflection on the Model Revolution at Stanford.
4 Deep diving for the beings

In the previous section we emphasised that the set-theoretical predicate τ(·) that defines 𝒯 (6) is an open sentence in the language of pure set-theory (L_∈); therefore it is literally about sets and not about anything else. Furthermore, theory 𝒯 (6) is itself a set of structures in V. How can any scientific theory T thus reconstructed be about any concrete being in the world, say B (for 'Being')? The ontological variable 'B' may have as values our solar system, Einstein's brain, the French Revolution, a human cell, the Big Bang, a uranium nucleus, the entire universe, the market economy of Russia after the fall of the Berlin wall, the population of lions and zebras in Kenya, an earthquake in Turkey, and so forth—all concrete actual beings in the wide sense in which we are using this term. The fact that the S-View does not meet the beings condition (α) of the classical ideal of science counts against this View, not against the classical ideal.

Yet Suppes argued, as we have also mentioned previously, that his set-structures could be taken as what working scientists mean by 'model', which is exactly why the structures are aka models (Sect. 2). But any model in science is a model of some concrete actual being B. What we know about the modelled beings we know via our scientific models and theories; the epistemic aim of science is the production of knowledge of actual beings. About this putative connexion between the model S ∈ 𝒯 and the actual beings (B), the S-View passes over in silence.

Do beings actually enter the picture in the S-View at all? They do when we draw a picture (see Fig. 1). There we can find a place for the beings. On the one hand they are what all experiments and all observations in science are about; measurements are the result of the interaction between concrete actual beings and our instruments of experimentation and observation, which are also concrete actual beings.
[Fig. 1 The informal-structural view (7), or S-View: experiment and observation on a concrete actual being B yield a data structure, which is embedded (by some morphism f) into a structure found by search in the theory 𝒯 (a model of B?)]
On the other hand, their role in the S-View is not one played somewhere on stage for all of us to spot and to marvel at, let alone in the limelight drawing attention, but somewhere backstage, as somehow responsible for the phenomena from which data structures are extracted, because the phenomena are the result of interaction between concrete actual beings and us, human beings. In contrast, the data structures are present on stage, showing off in the limelight. As soon as some data structure D is obtained, we can forget all about the concrete actual being B at hand, or so it seems. The S-View says next to nothing about how the models S (the theory 𝒯) are related to the concrete beings B they are supposed to provide knowledge of. As we concluded above, the S-View passes over in silence.

The best one could say is that a data structure D seems to act as a simulacrum of the concrete actual being B, because D is a set-theoretical representation of the qualitative results of experiments or observations extracted from some phenomenon that necessarily involves B; the embeddability relation between the data structure D and the model S ∈ 𝒯 then acts as the simulacrum of the nexus between the abstract model (structure, theory) and the concrete actual being B. But this is not good enough. We don't want simulacra. We want the real thing. Come on.

We shall presently return to this Problem of the Lost Beings; for now we focus our attention on the data structures. Is it the case that, as soon as we have data structures, no problems other than the Problem of the Lost Beings appear? Not quite, as we shall see in the next section.24

5 The sea of stories

Suppes' larger programme in the philosophy of science encompasses 'representational measurement theory', which concerns the following. We begin by stating the obvious truth that all our observations are qualitative; our sensory organs are not pieces of measurement apparatus whose numerical results we somehow read off 'inside our minds'. The world we see around us seems to consist, broadly speaking, of mostly medium-sized dry material objects, moving slowly about or staying put.

24 The essential content of Sect. 5, i.e. expounding the Problem of the Unavailable Stories, is at least 10 years old: see Muller (1998, pp. 284–292).
Science teaches us that everything we see is the result of electro-magnetic interaction between material objects and our eyes via a narrow window in the spectrum of all electro-magnetic radiation (400–800 nm), above a threshold intensity (below which our eyes do not register anything) and below some upper bound (above which our eyes burn).25 Now, a very specific kind of observations, obtained in certain contexts we call conducting experiments and performing observations, we choose to represent numerically by using pieces of measurement apparatus, and we call these numerical representations data or measurements. Says Suppes (1960, p. 297):

The maddeningly diverse and complex experience which constitutes an experiment is not the entity which is directly compared with a model of a theory. Drastic assumptions of all sorts are made in reducing the experimental experience, as I shall term it, to a simple entity ready for comparison with a model of the theory.

That simple entity is the data structure. From the 1950s onwards Suppes synthesised various approaches to the problem of the conditions under which we are justified in representing qualitative observations quantitatively, i.e. numerically. This is what representational measurement theory is all about.26 For example, by first characterising the binary relation 'is heavier than' pertaining to qualitative observations on a pair of scales, and then proving that this relation can be represented mathematically by 'is larger than' on the real numbers, one is justified in introducing mass as a real-number-valued physical magnitude. Then it makes sense to say, for instance, that one mass is 3.74 times as large as another. By contrast, one cannot characterise the binary relation 'is in love with' such that a similar representation theorem can be proved, which means there is no justification for saying that Charles Bovary loved Emma 3.74 times as much as Rodolphe Boulanger and Léon Dupuis did. To utter these words is not to say anything; it is to utter nonsense.

For the sake of exposition, let us consider the numerical data set

D(n, k) ⊂ {1, 2, . . . , k} × Q^n,   (9)

where n, k ∈ N, which consists of k (n + 1)-tuples ⟨j, q_1, q_2, . . . , q_n⟩, with j running from 1 to k, so that k is the cardinality of D(n, k). The data set D(n, k), which is set-theoretically a relation, forms the following data structure (data structures are always relation structures27):

𝒟(n, k) ≡ ⟨N, Q, D(n, k)⟩.   (10)

25 For a scientific criterion of observability, see Muller (2005, Sects. 4, 5).
26 Luce in Bogdan (1979, p. 93): "More than any other living person, Suppes has affected contemporary presentations of theories of measurement." For a review of Measurement Theory, see Humphreys (1994, pp. 219–245). Díez (1997) puts Suppes' role in historical perspective. For an elementary introduction to the subject, see Carnap (1966, pp. 62–104). Krantz et al. (1971) is one volume of an encyclopedia-type series of books containing all data structures in science, together with representation and invariance theorems.
27 Krantz et al. (1971) and Humphreys (1994, pp. 219–245).
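A hedged, finite sketch of the representational idea, with objects and relation invented by us: verify that qualitative 'is heavier than' is a strict total order, then build a numerical representation. Real representational measurement theory proves such representation theorems, plus uniqueness (scale-type) theorems, for far richer qualitative structures; claims like '3.74 times as large' require stronger, extensive-measurement axioms than this ordinal toy provides.

```python
# Check the qualitative relation 'heavier' and, if the axioms hold,
# construct a numerical representation h with: a R b  iff  h(a) > h(b).

objects = {"brick", "book", "feather"}
heavier = {("brick", "book"), ("brick", "feather"), ("book", "feather")}

def is_strict_total_order(X, R):
    irreflexive = all((x, x) not in R for x in X)
    transitive = all((x, z) in R for (x, y) in R for (w, z) in R if w == y)
    total = all((x, y) in R or (y, x) in R for x in X for y in X if x != y)
    return irreflexive and transitive and total

def representation(X, R):
    """h(x) = number of objects x dominates: an ordinal-scale representation."""
    return {x: sum(1 for y in X if (x, y) in R) for x in X}

if is_strict_total_order(objects, heavier):
    h = representation(objects, heavier)
    print(all(((a, b) in heavier) == (h[a] > h[b])
              for a in objects for b in objects if a != b))   # True
```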
Logically speaking, every conceivable data set can be found in an actual experiment, but not every data set is found. Call the ones that have been found in some experiment or observation actual data sets. We now face two problems, which we address successively.

1°. Which data sets, of all the possible ones, are actual? When we want to express ourselves formally, there seems to be no other way than to accept 'actual' as a primitive monadic predicate of data sets in L_∈. For suppose we try to provide a definition: a data set is actual iff it is obtained by performing some experiment or observation. The definiens stands as much in need of explication as the definiendum: the notion of 'a performed experiment' is as clear, or as obscure, as that of an actual data set. Etc. A simple, pragmatic and informal yet defeasible sufficient condition for a data set to be actual is this: if a data set is published in some respectable scientific journal, then it is actual. Further, a data set is actual only if it involves an actual concrete being.

2°. Suppose we have some actual data set. For which scientific theory is this data set supposed to be relevant? Consider, for the sake of concreteness, a data set D(2, 10) of type (9)—n = 2 and k = 10—and the following four simple physical experiments.

(i) We measure the length (l′) of the image on a screen of a little wooden stick we hold in front of a positive lens outside its focal distance; we also measure the length (l) of the stick itself; we have ten sticks. In this manner we obtain ten ordered triples of type ⟨j, l, l′⟩ ∈ N × Q^2, which we collect in data set D_lens(2, 10).

(ii) We measure the direct current (I) through and the voltage (U) over one resistor ten times, for different voltages; we then obtain ten triples of type ⟨j, I, U⟩ ∈ N × Q^2, which we collect in data set D_elec(2, 10).

(iii) We measure the travelled distance (d) of a freely falling coin in vacuum at ten points in time (τ) before it hits the ground; we then obtain ten triples of type ⟨j, τ, d⟩ ∈ N × Q^2, which we collect in data set D_fall(2, 10).

(iv) Finally, we measure the height of a mercury column (h) at different instants during the day (t); we then obtain ten triples of type ⟨j, t, h⟩ ∈ N × Q^2, which we collect in data set D_Hg(2, 10).

Again, for which scientific theories are these data sets supposed to be relevant? Clearly the answer is: (i) D_lens(2, 10) is relevant for ray optics, (ii) D_elec(2, 10) for the theory of electrical circuits, (iii) D_fall(2, 10) for Galilei's theory of kinematics, but also for Newtonian particle mechanics, and (iv) D_Hg(2, 10) for meteorological theories. The problem is that we cannot come to know this by staring at these data sets, because all four of them comprise ten indexed pairs of positive rational numbers. A story has to be told about how, and in which scientific context, the data sets were obtained. Thus all actual data structures float in a sea of stories that need to be told in order to know which data are relevant for which theory. Without such stories we cannot even begin to address the relation between theory and observation.

Are there any essential ingredients every such story must have? We answer in the affirmative. Every story must tell us, first, which concrete actual beings the measurements pertain to, and, secondly, what is measured, which involves a specification of the units. Our presentation of the results of the four simple experiments (i)–(iv) above was in this sense incomplete: we should have added the relevant units.
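The predicament can be displayed in miniature; the numbers below are invented, but the point is general: two data sets extracted from wholly different experiments can be numerically identical.

```python
# The sea of stories in miniature: a free-fall data set (j, tau, d) and a
# barometer data set (j, t, h) with invented but identical numbers.

D_fall = {(1, 0.1, 0.049), (2, 0.2, 0.196), (3, 0.3, 0.441)}
D_Hg   = {(1, 0.1, 0.049), (2, 0.2, 0.196), (3, 0.3, 0.441)}

print(D_fall == D_Hg)   # True: qua sets of tuples, indistinguishable.
# Only the story -- which beings were measured, in which units, with which
# instruments -- tells us which theory each data set is relevant for.
```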
Notice, however, that even providing units does not tell the complete story, because the ordered pairs in both (iii) D_fall(2, 10) and (iv) D_Hg(2, 10) have units of time (second occupant) and of length (third occupant). Confusing D_fall(2, 10) and D_Hg(2, 10) would lead to a refutation of Newtonian particle mechanics and to doomsday weather forecasts.

In telling the stories of how and in which contexts experimental results are obtained, we use language: we use words that express concepts and sentences that express propositions. When we arrive with our data structure in the realm of theories, by telling some story, we know for which scientific theory or theories the data structure is relevant. But when we only have rigorous construals of all scientific theories in accordance with the S-View (7), to which of the sets of set-structures should we go in order to find a structure that embeds an obtained data structure? The S-View lacks the resources to tell the necessary stories: language. Call this the Problem of the Unavailable Stories. This problem leads us to the next section, where we inquire into the conceptual and propositional content of the set-theoretical construals; it will give rise to our next critical reflection on the S-View.

6 Concepts and propositions at bay

We call to mind again that the epistemic aim of modern science is to provide knowledge of concrete actual beings; call theoretical scientific knowledge the kind of knowledge obtained by scientific inquiry that is stored in theories, and hence, according to the S-View (7), somehow in the models that constitute the theory. A necessary condition any model S must meet in order to be considered a piece of theoretical scientific knowledge about a particular kind of concrete actual being B is that S should embed all the relevant actual data structures obtained by observing B or by performing experiments with B, or both. But what does the set-structure S, or the theory for that matter (𝒯 ∋ S), say about B? Where has the conceptual and propositional content of the scientific theory T, as used by scientists, gone when T is reconstructed as 𝒯? Call this the Problem of the Lost Content.28

Saying that force in classical mechanics is, first, a real 3-vector, f ∈ R^3, is saying something about the concept of force, namely that a force has a direction in R^3, which is its direction in 3-dimensional space when and only when we take R^3 to represent 3-dimensional space; and, secondly, that force has a strength, or magnitude, when and only when we take the length ‖f‖ ∈ R^+ to represent this strength. But these explications go beyond the set-theoretical sentence 'f ∈ R^3', because they involve the concepts of direction in 3-dimensional space and of strength, whose content clearly goes beyond the language of pure set-theory (L_∈, in which we can talk about pure sets only).

Furthermore, where and how does truth fit into the S-View? If there are 'truth-makers' of propositions that are somehow determined by means of S (or of 𝒯), then surely they are, or somehow reside in, the concrete actual being B that S (or 𝒯) is supposed to be about. For if B is neither the truth-maker of, nor involved in the truth-making of, sentences about B, then the concept of truth gets dissociated from the actual concrete beings, and such a concept of truth does not seem to be the concept of truth that is used and understood in science.

28 The L-View is plagued by a similar problem, because its theoretical vocabulary is only partially interpreted, i.e. connected to observation predicates in the correspondence postulates of T (2); the theoretical vocabulary remains underdetermined.
Surely in science truths and falsehoods are truths and falsehoods about the world. What else could they be about? Well, how precisely does this work, then, according to the S-View? Once again, the S-View passes over in silence, this time about the question of what the relation is between the structures that constitute the theory and the beings the theory is supposed to be about (the Problem of the Lost Beings of Sect. 4 resurfaces).

Perhaps there is a profound wisdom in this silence. For what would count as a possible answer to these questions? Evidently, when we ask how this precisely works, we are asking for more words. We are asking for sentences that express propositions about being B, and these propositions surely involve concepts that express, presumably, properties of B and relations between B and other beings. Let model S ∈ 𝒯 be supposed to be about being B. The propositions and concepts expressed by the sentences and words that we are asking for will be the truly ontological ones, because they are 'directly' about B, about B an sich, as B really is, in and of itself, without any mediation by the products of science (model S, theory 𝒯). These propositions and concepts are expressed in what we may then call the ontological language of being B (L_ont). Next, L_ont somehow has to be related to L_∈, in particular to the part of L_∈ that defines S and 𝒯 (6), in order to establish the desired connexion between S and B. Certainly we would like to have true sentences of L_ont about B; all ontological truths about B may be said to express our ontological knowledge of B. When ψ(B) ∈ sent(L_ont) is a sentence about B that is made true by B, and '⊨' abbreviates this truth-maker relation, then what we are after is to fill in the dots in the following expression:

B ⊨ ψ(B) iff . . . . . . . . . . . . (in L_∈).   (11)
But how are we going to find out whether ψ(B) is true or false? By contemplation in the proverbial armchair? By thinking deeply and uncompromisingly about B? Or by combining reason with observation and imagination in that enormously successful manner that forms the beating heart of modern science? The second option surely is promising. But recall that the propositions and concepts expressed in L_ont were supposed to make sense of what science tells us about B. We were inquiring into the relation between S and B. The 'ontological theory' we were after, in order to provide us with the desired ontological knowledge of B, gives rise either to a doubtful programme of armchair science or to a scientific research project. In the last-mentioned case we are eating our own tail.

What went wrong? Something went wrong at the beginning: we should not have wanted L_ont in the first place, in order to talk about B. But is this because B as it really is, in and of itself, is epistemically inaccessible to us, as Kant would have it, so that the idea of considering L_ont is a senseless exercise? Did we fall prey to pre-Kantian metaphysics? Or, when B as it really is, in and of itself, somehow is epistemically accessible to us, do we not already have some part of the language of science to talk about B, namely the language of the scientific theory T that is supposed to be literally about B, as Ampère countenanced contra Kant? Is not T supposed to provide us with knowledge of concrete actual beings, expressed by true and justified sentences in T?
If, however, not everything we want to know about B is provided by T, then we must engage in more scientific research about B in order to refine, or to extend, or to replace T. What else is there to do? And this lands us in our original problem: a set-structure S ∈ 𝒯 is defined in L_∈ and therefore cannot tell us anything about B, whilst 𝒯 is all that science will ever be able to produce on the current construal.

Perhaps we have to take a closer look at Suppes' arguments for his claim that his models (S ∈ 𝒯) can be taken to be what working scientists call 'models'. What we have not mentioned so far is that Suppes was specifically inspired by Tarski's creation of Model Theory in Logic, and that this has been the basis for Suppes' claim.29 Perhaps we find a clue here. Let's see. Tarski's conception of a model is, properly construed and denoted, a triple

M = ⟨S, R, ⊨⟩,   (12)

consisting of a set-structure S living in the domain of meta-discourse (V whenever the meta-theory is also some set-theory, which usually is the case), a Reference-map R sending terms in a formal language L to items of S, and an inductively defined map ⊨ from sent(L) to the semantic values 'true' and 'false' (a map also known under various other names: satisfaction, truth-map, semantic valuation). We have all learned in our Logic courses that

M is a model of T iff ⟨S, R⟩ ⊨ T.   (13)
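A hedged miniature of the triple (12) and condition (13): our toy language has three names and one predicate, and satisfaction is defined for atomic sentences only, where real model theory proceeds by induction over all formulas.

```python
# A Tarskian model in miniature: structure S, reference-map R, and a
# satisfaction test for atomic sentences of the form (predicate, n1, n2).

S = ({1, 2, 3}, {"less": {(1, 2), (2, 3), (1, 3)}})    # the structure S
R = {"a": 1, "b": 2, "c": 3}                           # reference-map: names -> domain

def satisfies(S, R, sentence):
    """Satisfaction for an atomic sentence."""
    _, relations = S
    predicate, n1, n2 = sentence
    return (R[n1], R[n2]) in relations[predicate]

T_ax = [("less", "a", "b"), ("less", "b", "c"), ("less", "a", "c")]
print(all(satisfies(S, R, phi) for phi in T_ax))        # True: <S, R> |= T, cf. (13)
```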
The relevant Suppesian model here is the set-structure S, not M (12). Suppes has argued that S can be identified with what is called a 'model' by working scientists. So Suppes discards the semantics (R, ⊨); this is why the name 'semantic view' for Suppes' view is not just a misnomer but a terminological howler. But then what if we amend the S-View by taking the full triples M (12) aboard, rather than S only? We could then take the set of all of them as the rigorous reconstruction of T. The ensuing view of scientific theories could truly be called the Semantic View:

𝒯 ≡ {M | ⟨S, R⟩ ⊨ T ∧ S ∈ V_{ω+g}},   (14)
to be combined with the verified relevant observation sentences O_t(T) or with the actual relevant data structures D_t(T). Sentences and truth are now waiting just around the corner, ready to be amalgamated into the proper view of what a scientific theory is. This 𝒯 (14), however, does not seem to be what we want, because a Tarskian model M is a model of a formalised theory, of a set of sentences in a formal language, such as T (2) is. Of all admissible truth-maps, the one is chosen that makes, in collaboration with a reference-map R (from terms in the language to elements of V), the axioms of T true. From 𝒯 (14) we see that the Tarskian model presupposes part of the L-View on theories (4).

29 Suppes (2002, pp. 20–21). Suppes (1960, p. 289): "I claim that the concept of a model in the sense of Tarski may be used without distortion and as a fundamental concept . . ."
The set of all triples M (12) that are models of a formal theory T is denoted by Mod(T). Then 𝒯 ⊂ Mod(T); but when we re-define Mod(T) so as to permit only structures in V_{ω+g}, as in (14), then

𝒯 = Mod(T).   (15)

In order to see Mod(T) as the proper reconstrual of scientific theory T, we first need T. But what is T a formalisation of? If a formalisation of T, then we are back in the arms of the L-View! One way to avoid these problems is as follows. We return to the S-View (7) and consider the extension 𝒯 (6) not as the ultimate but as the penultimate step towards a rigorous construal of scientific theory T. In order to prepare for the ultimate step, we consider one of Van Fraassen's arguments in favour of his version of the semantic view (cf. reason VI at the end of Sect. 2):

In any tragedy, we suspect some crucial mistake was made at the very beginning. The mistake, I think, was to confuse a theory with the formulation of a theory in a particular language. (. . .) Words are like co-ordinates. If I present a theory in English, there is a transformation which produces an equivalent description in German. There are also transformations which produce distinct but equivalent English descriptions. This would be easiest to see if I were so conscientious as to present the theory in axiomatic form; for then it could be rewritten so that the body of theorems remains the same, but a different subset of those theorems is designated as the axioms, from which all the rest follows. Translation is thus analogous to co-ordinate transformation—is there a co-ordinate-free format as well? The answer is Yes (although the banal point that I can describe it only in words obviously remains). (. . .) Suppes' idea was simple: to present a theory, we define the class of models directly, without paying any attention to questions of axiomatisability in any special language, however relevant or logically interesting that might be.30
In order to see Mod(T) as the proper reconstrual of scientific theory T, we first need T. But what is T a formalisation of? If a formalisation of T, then we are back in the arms of the L-View! One way to avoid these problems is as follows. We return to the S-View (7) consider the extension T (6) not as the ultimate but as the penultimate step towards a rigorous construal of scientific theory T. In order to prepare for the ultimate step, we consider one of Van Fraassen’s arguments in favour of his version of the semantic view (cf. reason VI at the end of Sect. 2): In any tragedy, we suspect some crucial mistake was made at the very beginning. The mistake, I think, was to confuse a theory with the formulation of a theory in a particular language. (. . .) Words are like co-ordinates. If I present a theory in English, there is a transformation which produces an equivalent description in German. There are also transformations which produce distinct but equivalent English descriptions. This would be easiest to see if I were so conscientious as to present the theory in axiomatic form; for then it could be rewritten so that the body of theorems remains the same, but a different subset of those theorems is designated as the axioms, from all the rest follows. Translation is thus analogous to co-ordinate transformation—is there a co-ordinate-free format as well? The answer is Yes (although the banal point that I can describe it only in words obviously remains). (. . .) Suppes’ idea was simple: to present a theory, we define the class of models directly, without paying any attention to questions of axiomatisability in any special language, however relevant or logically interesting that might be.30 Van Fraassen draws a comparison with physical space–time theories and co-ordinate systems on space–time. For centuries such theories were formulated in terms of co-ordinates, which were taken to be labels of space–time points. In order to express that choosing a particular co-ordinate system is a pragmatic issue without any physical relevance, laws of nature formulated using a particular co-ordinate system were required to have the same ’physical content’ when a co-ordinate transformation was performed on the laws. After Einstein had used Riemann’s concept of a differentiable manifold in the formulation of his General Theory of Relativity, this mathematical structure was soon also used to re-formulate Einstein’s Special Theory of Relativity (Minkowski space–time, Robb space–time) and Classical Electro-Dynamics (idem), Classical Mechanics (Galilean space–time, Newtonian space–time), without and with his theory of universal gravitation (Newton–Cartan space–time). In the language of 30 Van Frassen (1989, pp. 221, 5–6).
In the language of Riemann's differential geometry, it has turned out to be possible to formulate the laws of physics (which characterise a physical space–time theory) without mentioning co-ordinate systems, so as to make the physical irrelevance of co-ordinate systems manifest. Hence Van Fraassen compares the general framework of differential geometry and a differentiable manifold to the general framework of set-theory and a set-structure, respectively; then he compares the co-ordinate-free formulation of a physical space–time theory with the language-free formulation of a scientific theory T through a set 𝒯 of set-structures S (6). The banal point is that we use L_∈ to define S and 𝒯. So L_∈ has nothing to do with the language used by scientists in textbooks and review articles about T, but is a mere means to characterise S and 𝒯. We then further compare a co-ordinate system on space–time with a formulation of T, both chosen for the sake of convenience, and transformations between co-ordinate systems with translations between formulations of T.31 All this looks pretty appealing, especially when displayed:

differential geometry                                | set-theory
differentiable manifold                              | set-structure
co-ordinate-free formulation of a space–time theory  | language-free formulation of a scientific theory
co-ordinatisation of a space–time                    | formulation of a scientific theory
atlas of all co-ordinate transformations             | all formulations of a scientific theory
co-ordinate transformations                          | formulation translations
There is, however, a problem. When we formulate T rigorously—or "conscientiously", as Van Fraassen puts it in the quotation displayed above—as a formal-deductive system F_T (1), we obtain a lexicon of a particular signature, which tells us how many names, how many sorts of variables and how many primitive predicates of each adicity there are. This fixes the signature (of the structure S) of the models of T (12), that is, of all the models in Mod(T)—or so it seems. A model having a different signature cannot, then, be a model of T? If we translate L into another language, say L′ (every item of the lexicon of L is translated in terms of the lexicon of L′), and choose axioms in L′ whose deductive closure yields the formal theory T′, such that all translated axioms of T become theorems of T′, and conversely, then "the body of theorems remains the same" (Van Fraassen, above): T = T′. But since in general the signature of L′ is different from that of L, the models of T and of T′ will be distinct, so that the classes of their models are disjoint:

Mod(T) ∩ Mod(T′) = ∅.   (16)

31 Translation in Logic is a concept also due to Tarski; see Muller (1998, pp. 170–173) for details. To formalise the making of a model of some theory is to define a translation from the object-language into the meta-language.
Yet (16) seems incompatible with T = T′, for if T = T′, then surely Mod(T) = Mod(T′). Right? Wrong. On the basis of the L-View (4), it does not follow from T = T′ that the theories are identical, because to draw that conclusion the lexicons have to coincide too. For the L-View, the formulation counts. One can have T = T′ but F_T ≠ F_T′. Since the lexicons of L and L′ are different, the signatures of the models of T and T′ are different, so that indeed (16) holds. A more explicit notation, such as 'Mod(T, lex(L))' or 'Mod(F_T)', would have prevented the fallacious inference from T = T′ to Mod(T, lex(L)) = Mod(T′, lex(L′)) or Mod(F_T) = Mod(F_T′).

But now notice that there is a straightforward manner to amend the L-View slightly so as to meet Van Fraassen's challenge to provide a language-free characterisation of scientific theory T. We could simply quantify the lexicon universally away from (4) and drop uno icto the distinction between 'observation' and 'theoretical' predicates. Specifically, we identify a theory with the class of all and only inter-translatable formal theories; loosely and symbolically:

AT ≡ {⟨T′, lex(L′)⟩ | ∃g : lex(L_T) → lex(L′) : g[T] = T′ and ∃g^inv : lex(L′) → lex(L_T) : g^inv[T′] = T},   (17)
where the bijective map g is a translation between the lexicons. Now call this the Formulation-Independent Linguistic View, or [L]-View for brevity:

⟨AT, ⊢, O_t(T)⟩,   (18)
where 'AT' alludes to 'atlas'. An immediate consequence of (17) is that one member of this class fixes the models of all its other members. So the bodies of theorems of all formal theories in this class coincide. Notice that AT is an equivalence-class, because the relation 'is inter-translatable with' is an equivalence-relation; every AT is a member of the quotient-class of the class of all elementary formal theories divided by the relation of inter-translatability. Then what the theory says about the world, i.e. its propositional content, expressed in T, is invariant under translations. Hence our characterisation (18) of T is, although not language-free, independent of any specific formulation. The conceptual content of the theory is formulation-dependent and hence language-dependent: it is expressed by a lexicon, and lexicons change under translation. Now, to provide a formulation-free rigorous characterisation of T, e.g. as in 𝒯 (6) of the S-View, is one way to make a rigorous characterisation formulation-independent (analogous to a co-ordinate-free formulation of a physical space–time theory), yet it is not the only way. Another way is to quantify the formulation-dependence universally away, as we have just done, and make the propositional content of a theory independent of its formulation, i.e. of its lexicon.
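A hedged sketch of (17), with toy vocabularies of our invention: a bijective lexicon map g carries one formulation of T into another, and the propositional content is what all such formulations share.

```python
# A bijective translation g between lexicons maps theory T (in L) onto
# theory T' (in L'), and its inverse maps T' back onto T; the two
# formulations then count as the same member of the atlas AT.

g = {"Raven": "Rabe", "Black": "Schwarz"}      # lex(L) -> lex(L'), a bijection
g_inv = {v: k for k, v in g.items()}

def translate(theory, mapping):
    """Apply a lexicon translation to every sentence of a theory."""
    return {tuple(mapping[word] for word in sentence) for sentence in theory}

T = {("Raven", "Black")}                       # a one-sentence toy theory in L
T_prime = translate(T, g)                      # its formulation in L'

print(T_prime == {("Rabe", "Schwarz")})        # True
print(translate(T_prime, g_inv) == T)          # True: inter-translatable
```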
For those who insist on a language-free characterisation of T nonetheless, rather than a formulation-independent one, we point out that even in a co-ordinate-free formulation of physical space–time theories in terms of differentiable manifolds, the co-ordinate systems are not eradicated at all. Physical space–time theories are not free of co-ordinate systems. On the contrary, the structure of a differentiable manifold involves by definition an atlas, which is the set of all local co-ordinate systems on the open cover of the space–time continuum; and it is in terms of these co-ordinate systems and their transformations that the differentiability of the manifold is characterised. There is no such thing as a co-ordinate-free differentiable manifold. To a certain extent the co-ordinate-free formulation of physical laws in space–time theories is therefore a fata morgana, because when these laws are explicated, including an explication of the concepts used in their formulation, all the co-ordinate systems inevitably re-appear on the stage. The independence of physical laws from co-ordinate systems used not to be manifest, because co-ordinates were mentioned in the formulation of physical laws; the independence was guaranteed by imposing a co-variance requirement on physical laws under co-ordinate transformations. In the [L]-View (18) something exactly analogous happens. The formulation-independence is manifest only in the language-free S-View (7).

We are now prepared to take the promised ultimate step in order to solve the Problem of the Lost Content in the S-View. The set 𝒯 consists of structures S (6). When part of a Tarskian model M (12), a structure S fixes the signature of M. This fixes in turn the signature of a formal language, call it L_S, in which we can formulate a theory that has M as its Tarskian model; call such a theory Th(S). We can consider the class of exactly the theories of S:

K_S ≡ {⟨Th(S), lex(L_S)⟩ | ∃R : ⟨S, R⟩ ⊨ Th(S) and S ∈ V_{ω+g}}.   (19)

Do the theories in K_S all have the same signature, namely that of S? Closer inspection reveals they do not, because every translation of a member ⟨Th(S), lex(L_S)⟩ will also be made true by ⟨S, R⟩. Every structure S ∈ 𝒯 now comes with a class K_S of theories, having various lexicons, all of which are made true by ⟨S, R⟩. The following ordered pair should then be an improvement on the S-View (7) as the construal of what working scientists call a 'model':

⟨S, K_S⟩.   (20)
Roughly, a model (20) is a structure together with all its formulations. When we define the propositional content of S as all its inter-translatable formulations, then a model (20) is a structure and its propositional content.32 Let 𝒯_τ now be the class of all such ordered pairs where S meets the antecedently defined Suppes-predicate τ(·) in L_∈:

𝒯_τ ≡ {⟨S, K_S⟩ | τ(S)}.   (21)

Then our improved rigorous construal of T is

⟨𝒯_τ, D_t(T)⟩.   (22)
Let us call this the Structural View, or briefly the K_S-View.

32 This idea was first proposed in Muller (2005, Sect. 6).
The K_S-View (22) is an improvement, because now we have lots of languages and theories of S—formulations of theories, if you like—to which we can appeal for locating the conceptual and propositional content of T: we now have sentences that can be true or false, and we have the predicates of the lexicons to express concepts. The choice of language does not matter for the propositional content of 𝒯_τ (21), just as the choice of co-ordinate system does not matter for the physical content of space–time theories, because all sets Th(S) of sentences generated by S, and made true by a Tarskian model M (12) that has S as its structure, can have wildly differing lexicons, and these are all included in 𝒯_τ (21). This solves the Problem of the Lost Content. Further, the K_S-View has all the resources to solve the Problem of the Unavailable Stories.

To recapitulate, we began this section by raising the difficulty of where the conceptual and propositional content of a scientific theory T has gone when T is construed in accordance with the S-View (7), namely as a set of set-structures defined in the language of pure set-theory (L_∈). This Problem of the Lost Content is related to the Problem of the Lost Beings we raised earlier, when we wondered where the S-View allocates the concrete actual beings (B). For T is supposed to provide us with scientific knowledge, in the sense of scientifically justified true propositions about concrete actual beings. We briefly explored the possibility of an ontological language (L_ont) to talk directly about the beings (so as to solve both of the mentioned problems uno icto), but we soon discovered that we were eating our own tail. Then we moved, inspired by Suppes' inspiration, from the Suppesian model S to the Tarskian model M (12). Notwithstanding the fact that Tarskian models gave us languages to talk about abstract set-structures rather than about the concrete actual beings (B) that T is about, we proceeded, inspired by Van Fraassen's analogy with co-ordinate-free characterisations of physical space–time theories, to loosen the bond between structure (S) and formulation (notably the lexicon of L) by incorporating translations into the L-View. This led to the [L]-View (18), according to which a theory is the class of all formal formulations of T. Then we re-introduced formulations of theories of structures into the S-View, but in such a manner that all formulations are incorporated that leave the propositional content of the theory invariant. This led to the K_S-View (22), which solves the Problems of the Lost Beings, of the Lost Content and of the Unavailable Stories.

Should we now choose the [L]-View (18) or the K_S-View (22) as the best reconstrual of scientific theories as they occur in science? In the next section we answer this question.

7 Towards completion

Einstein once said that we should make our scientific theories simple, but not simpler. Mutatis mutandis for our views on scientific theories. Consider the two new views:

[L]-View: ⟨AT, O_t(T)⟩,
K_S-View: ⟨𝒯_τ, D_t(T)⟩,   (23)

which are both markedly different from their progenitors, the L-View (4) and the S-View (7), respectively. Both new Views look baroque.
[Fig. 2 Beings in the KS-View (22): some formulation of T is about the concrete actual being B and is satisfied by a structure]
from the formulation-dependence that crippled its predecessor, the L-View (4), and the Structural View; further, it suffers neither from the Problem of the Lost Content, nor from the Problem of the Lost Beings, nor from the Problem of the Unavailability of the Stories—though it does suffer from the other problems that led to the decline of the L-View (see Sect. 2). The KS-View (23) also does not fall prey to the three mentioned Problems. What should not come as a surprise is that, on top of this, the gap between these two improved views (23) is not as wide as that between the L-View (4) and the S-View (7). For from (17) and (19) we deduce:

A_T = K_S.  (24)
Logically speaking, both new views really are two sides of the same coin. Philosophically speaking, the gap remains sufficiently wide to prefer the KS-View over the [L]-View (23), because the starting point for this construal remains a set-theoretical predicate τ(·) defining a type of set-structure S (the Suppesian model keeps playing the main part), and because the formal theories and their inter-translations in K_S need never be spelled out (the formalising labour that is mandatory in the L-View need not be performed). In the old S-View (7) the formulations are simply not there, which is why the conceptual and propositional content of T is lost and T floats helplessly around in the sea of stories; in the KS-View (23) they are lying there for the taking and are officially acknowledged to be part of the full reconstrual of T. Conceptual and propositional content can be hauled in at will and by law. Those propositions can and should be taken as being literally about the concrete actual beings, in exactly the same way as the proposition expressed by the Latin sentence ‘Sol lucet’ is about the Sun: we have dived deeply for the beings and here they are. This does not mean that we have solved the problem of how precisely the structure S relates to the concrete actual being B, but it does mean that we have transformed the problem so as to subsume it under one of the major problems of philosophy of the past century: the profound problem of how to understand the relation between word and world. Figure 2 depicts the situation we have arrived at. About this profound problem the KS-View (22) has little in the offing—we shall say a bit about it in the next and final section. Yet as soon as it is granted that the
[Fig. 3 Representation according to the KS-View (22): a structure represents the concrete actual being B iff it is a model of B]
KS-View of scientific theories need not solve this profound philosophical problem—as, for example, French and Ladyman (1999, p. 115), propounders of the informal-structural view, hold—we can conclude, succinctly, that the KS-View (22) completes the Model Revolution at Stanford. A complementary way to complete the Model Revolution, aimed not at the problems we have tried to solve but at rigorously solving other problems concerning experimental science, is outlined by Suppes (in this issue of Synthese).
8 Exitum: scientific representation

The concept of representation occurs in art, science, engineering, mathematics, astrology, logic, linguistics, cinematography, religion and literature, poetry included. Raising the question whether it is possible to have a substantive philosophical account of representation that covers all these varieties of human activity will mostly cause sceptical frowns rather than encouraging cheers. Did anyone ever conceive of this as possible? The best way to proceed, then, seems to be to follow Wittgenstein's way: analyse how the word ‘representation’ is used in all these different contexts; then try to find out whether these uses have anything in common, so as to identify that as the core of the concept of representation; and then, finally, adopt this core as the characterisation or even definition of the concept of representation. If all this fails, we must presumably conclude that representation is a family-resemblance concept. Our very first step is to restrict our focus to representation in science, that is, to scientific representation, leaving representation in all other mentioned fields aside, and further to focus on scientific representation according to the KS-View (22), that is, on structural scientific representation, or structural representation for brevity.33 What the KS-View straightforwardly seems to say about representation is depicted in Fig. 3.
33 Then we can gloss over examples and counter-examples of representation drawn from art etc. that Suárez
(2003) mobilises against structural scientific representation. See also French (2003).
So ‘representation’ is another word for ‘model’. Giere, however, argues that in order to make sense of (the practice of) science, representation should be expressed not by a dyadic predicate but by a four-place predicate, which reads in our terminology:34

a scientist uses model S to represent being B for purpose P.  (25)
Of course analysis (25) does not help one whit to elucidate the relation between the representing structure S and the represented being B—on the contrary, (25) only complicates it. Furthermore, an analysis of the form (25) puts an end to what Suárez has called a naturalised analysis of representation, because the epistemic agents and their actions and purposes enter the analysis. Representation has then become an intentional concept. To represent is a manifestation of human agency.

Frigg, who also takes models to be the representing entities (see Fig. 3), has listed six problems that a substantive philosophical account of scientific representation must solve:35

(F1) Ontological Problem. What sort of entities are models?
(F2) Clarification Problem. In virtue of what does a model represent something else?
(F3) Taxonomic Problem. What is the taxonomy of the various kinds of models that are used in science?
(F4) Demarcation Problem. How to distinguish scientific from non-scientific models?
(F5) Epistemological Problem. How to acquire knowledge of the world (of the concrete actual beings) from the models?
(F6) Problem of Misrepresentation. How to account for the fact that refuted, erroneous, idealised, approximate, inaccurate models never fail entirely to represent?

Frigg has argued that the S-View (7), by construing the representation-relation between structure S and being B as an isomorphism (or a sibling notion), is unable to solve most of these problems. Although a thorough inquiry into whether the improved KS-View (22) fares better in meeting the Frigg Conditions (F1–F6) is beyond the scope of the current paper, a few quick answers seem in order.

(F1) Ontological Problem. Models are set-structures.
(F2) Clarification Problem. This problem is taken up at the end of this section.
(F3) Taxonomic Problem. No stumbling blocks in the KS-View for erecting some taxonomy of models used in science.

34 Giere (2004, p. 743). Giere identifies ‘general principles’ that scientists employ to construct models of “features of the world” (of beings), and no longer has any use for the concept of a theory (2004, pp. 744–747). But a precise characterisation of these ‘general principles’ leads linea recta to a set-theoretical predicate whose extension simply is a theory, so that the models constructed by scientists employing general principles, and therefore obeying these principles, simply are members of that extension. Giere effectively uses theories in full accordance with the structural view, but he no longer wishes to mention what he uses.
35 Frigg (2006, Sect. 2) lists three problems, of which one consists of two problems (our F1, F2, F3, F4—Frigg lumps F3 and F4 together), and two requirements (our F5 and F6).
(F4) Demarcation Problem. Any set-structure in V_{ω+g} can in principle be a model. The problem of what distinguishes the activity of science from other human activities wherein representation occurs, such as art, engineering, astrology, literature and cinematography, cannot and should not be solved by focusing on models only.
(F5) Epistemological Problem. There is lots of propositional content stored in K_S (19) in T^τ (21) of the KS-View (22): there our scientific knowledge of the world is expressed.
(F6) Problem of Misrepresentation. When we choose tailor-made morphisms on a case-by-case basis, misrepresentation can easily be accounted for. Representor and represented have at least one resemblance (otherwise the one would never be called a representation of the other), and this precludes total misrepresentation.

In a similar vein, Suárez (2003) has advanced five arguments against a construal of the representation-relation as an isomorphism (or a sibling notion).

(S1) Variety Argument. There are lots of entities that represent but are not set-structures.
(S2) Logical Argument (originally due to Goodman). The representation-relation has different properties than the isomorphism-relation: the former is neither reflexive nor symmetric nor transitive, whereas the latter is an equivalence-relation.
(S3) Misrepresentation Argument. A model can be mistargeted, or can be idealised, approximate or inaccurate, without losing its representation-relation (cf. footnote 6).
(S4) Non-necessity Argument. Representation-relations are not always isomorphisms (or sibling notions).
(S5) Non-sufficiency Argument. Isomorphisms (and sibling notions) are not always representation-relations.

Again, although a thorough inquiry into whether the improved KS-View (22) also falls prey to the Suárez Arguments is beyond the scope of the current paper, a few quick responses seem in order.

(S1) Variety Argument. We decide ab initio to deal with scientific representations only, which are (nearly) always set-structures.
(S2) Logical Argument. When we choose tailor-made morphisms on a case-by-case basis, this argument breaks down.
(S3) Misrepresentation Argument. When we choose tailor-made morphisms on a case-by-case basis, misrepresentation can easily be accounted for.
(S4) Non-necessity Argument. Suárez's (2003, pp. 235–236) counter-examples from art can be set aside as irrelevant for representation in science.
(S5) Non-sufficiency Argument. Suárez's (2003, p. 236) argument that morphisms lack ‘directionality’, which is an essential feature of the representation-relation, cannot be met without extending the concept of representation to a form like (25).

I am under no illusion that these quick responses settle the representation debate as far as the KS-View (22) is concerned. At best they point in directions worth inquiring into. As Frigg (2006, Sect. 5) has emphasised, to construe the relation between a
model S and a being B as a morphism of any kind presupposes that B is a structure, because morphisms of any kind are defined as inter-structural relations. This ontological presupposition is often taken for granted, but it too, of course, stands in need of inquiry—an inquiry that also lies beyond the scope of the current paper.36 When we avoid this presupposition by taking the representation-relation (or, equivalently, the model-of-relation; see Fig. 3) as primitive, almost all of Suárez's arguments lose their force. But then Frigg's Clarification Problem (F2) remains unsolved. One way to solve it is to adopt a truth-conditional view of meaning. We then attempt to specify truth-conditions for the proposition:

structure S represents being B.  (∗)

When we take B to be some phenomenon, the truth-condition is in terms of the embeddability of the data structures extracted from B into S. For an empiricist, this is sufficient. For a realist, it is not. But then the truth-condition for (∗), in particular when B is unobservable, launches us onto the battle ground of the realism debate. Alas! Beyond the scope of the current paper.

I have little to recommend my opinions. They come from one who desires honours, distinctions, and emoluments, but little; and who expects them not at all; who has no contempt for fame and no fear of obloquy; who shuns contention, though he will hazard an opinion; from one who wishes to preserve consistency; but who would preserve consistency by varying his means to secure the unity of his end; and, when the equipoise of the vessel in which he sails may be endangered by overloading it upon one side, is desirous of carrying the small weight of his reasons to that which may preserve its equipoise.

Acknowledgements Many thanks to Roman Frigg (London School of Economics), who is responsible for the presence of a section on representation, to Th. Kuipers (Groningen University), to Mauricio Suárez (Complutense Madrid) and to Bas van Fraassen (Princeton University) for remarks, and to Patrick Suppes (Stanford University) for a long responsive letter on an earlier version. Thanks to the Dutch National Science Organisation (NWO) for financial support.
References

Balzer, W., Moulines, C. U., & Sneed, J. D. (1987). An architectonic for science: The structuralist program. Dordrecht: Reidel.
Beth, E. W. (1968). The foundations of mathematics. Amsterdam: North-Holland.
Bogdan, R. J. (Ed.). (1979). Patrick Suppes. Dordrecht: Reidel.
Carnap, R. (1966). Philosophical foundations of physics. An introduction to the philosophy of science. New York: Basic Books.
da Costa, N. C. A., & Chuaqui, R. (1988). On Suppes' set-theoretical predicates. Erkenntnis, 29, 95–112.
da Costa, N. C. A., & French, S. (1990). The model-theoretic approach in the philosophy of science. Philosophy of Science, 57, 248–265.
de Jong, W. R., & Betti, A. (2008). The classical model of science: A millennia-old model of scientific rationality. Synthese. doi:10.1007/s11229-008-9417-4.
Díez, J. A. (1997). A hundred years of numbers. An historical introduction to measurement theory 1887–1990. Studies in the History and Philosophy of Modern Physics, 28, Part I, 167–185; Part II, 237–265.
36 Sneed's (1979, p. 135) well-known concept of an empirical claim needs this ontological presupposition too, when he wants to claim that a particular concrete actual being, say the solar system, is ‘a classical particle-mechanical structure’. Sneed takes the presupposition for granted.
French, S. (2003). A model-theoretic account of representation. Philosophy of Science (PSA Proceedings), 70, 1472–1483.
French, S., & Ladyman, J. (1999). Reinflating the semantic approach. International Studies in the Philosophy of Science, 13(2), 103–121.
Frigg, R. (2006). Scientific representation and the semantic view of theories. Theoria, 55, 49–65.
Giere, R. N. (1988). Explaining science. A cognitive approach. Chicago & London: University of Chicago Press.
Giere, R. N. (2004). How models are used to represent reality. Philosophy of Science, 71, 742–752.
Humphreys, P. (1994). Patrick Suppes: Scientific philosopher (2 vols.). Dordrecht: Kluwer.
Krantz, D. H., Luce, R. D., Suppes, P., & Tversky, A. (1971). Foundations of measurement (Vol. 1). New York: Academic Press.
McKinsey, J. C. C., Sugar, A. C., & Suppes, P. (1953). Axiomatic foundation of classical particle mechanics. Journal of Rational Mechanics and Analysis, 2, 253–272.
McKinsey, J. C. C., & Suppes, P. (1953). Transformations of systems of classical particle mechanics. Journal of Rational Mechanics and Analysis, 2, 272–289.
McKinsey, J. C. C., & Suppes, P. (1955). On the notion of invariance in classical mechanics. British Journal for the Philosophy of Science, 5, 290–302.
Morgan, M. S., & Morrison, M. (Eds.). (1999). Models as mediators: Perspectives on natural and social science. Cambridge, UK: Cambridge University Press.
Muller, F. A. (1998). Structures for everyone. Amsterdam: A. Gerits & Son.
Muller, F. A. (2005). The deep black sea: Observability and modality afloat. British Journal for the Philosophy of Science, 56, 61–99.
Muller, F. A., & Van Fraassen, B. C. (2008). How to talk about unobservables. Analysis, 68(3), 197–205.
O'Brien, C. C. (1967). Burke. Reflections on the revolution in France. Middlesex, England: Penguin Books.
Putnam, H. (1960/1962). What theories are not. In E. Nagel, P. Suppes, & A. Tarski (Eds.), Logic, methodology and philosophy of science: Proceedings of the 1960 international congress (pp. 240–251). Stanford, CA: Stanford University Press.
Rantala, V. (1978). The old and the new logic of metascience. Synthese, 39, 233–247.
Sneed, J. D. (1979). The logical structure of mathematical physics. Dordrecht: Reidel.
Stegmüller, W. (1976). The structure and dynamics of theories. Berlin: Springer.
Suárez, M. (2003). Scientific representation: Against similarity and isomorphism. International Studies in the Philosophy of Science, 17(3), 225–244.
Suppe, F. (1974). The search for philosophic understanding of scientific theories. In F. Suppe (Ed.), The structure of scientific theories (pp. 6–232). Urbana: University of Illinois Press.
Suppes, P. (1954). Some remarks on the problems and methods of the philosophy of science. Philosophy of Science, 21, 242–248.
Suppes, P. (1960). A comparison of the meaning and use of models in mathematics and the empirical sciences. Synthese, 12, 287–301.
Suppes, P. (1961). Studies in the methodology and foundations of science. Dordrecht: Reidel.
Suppes, P. (1968). The desirability of formalisation in science. Journal of Philosophy, 65, 651–664.
Suppes, P. (1974). The structure of theories and the analysis of data. In F. Suppe (Ed.), The structure of scientific theories (pp. 266–307). Urbana: University of Illinois Press.
Suppes, P. (1992). Axiomatic methods in science. In M. E. Carvallo (Ed.), Nature, cognition and system II (pp. 205–232). Dordrecht: Kluwer.
Suppes, P. (1993). Models and methods in the philosophy of science. Dordrecht: Kluwer.
Suppes, P. (2002). Representation and invariance of scientific structures. Stanford: Centre for Logic, Language and Computation (distributed by Chicago University Press).
Toulmin, S. (1974). The structure of scientific theories. In F. Suppe (Ed.), The structure of scientific theories (pp. 600–614). Urbana: University of Illinois Press.
Van Fraassen, B. C. (1980). The scientific image. Oxford: Clarendon Press.
Van Fraassen, B. C. (1989). Laws and symmetries. Oxford: Clarendon Press.
Synthese (2011) 183:115–126 DOI 10.1007/s11229-009-9670-1
Future development of scientific structures closer to experiments: Response to F.A. Muller Patrick Suppes
Received: 6 January 2009 / Accepted: 17 March 2009 / Published online: 6 October 2009 © Springer Science+Business Media B.V. 2009
Abstract First of all, I agree with much of what F.A. Muller (Synthese, this issue, 2009) says in his article ‘Reflections on the revolution in Stanford’. And where I differ, the difference concerns which direction of further development represents the best choice for the philosophy of science. I list my remarks as a sequence of topics.

Keywords Scientific structures · Measurement errors · Invariance · Computation · Ergodic
1 Actual beings

F.A. Muller (2009) is, I think, almost correct when he says in Sect. 4 of his article that the Informal Structural View (my view) says nothing about actual beings. Long ago, in my 1970 monograph, A probabilistic theory of causality, I noted that for a standard probability space it was trivial to show by Padoa's principle that we could not define the concept of an actual event in terms of Kolmogorov's concepts and axioms. So I introduced an occurrence function θ. Here is what I said:

“It is obvious that the concept of the occurrence of an event is formally similar to the concept of a proposition's being true. The four axioms given below assume the algebra of events as given in the usual set-theoretical fashion. The new additional concept of occurrence is expressed by a one-place predicate θ. From the four axioms we can derive Huntington's five axioms (1934) for formalizing the ‘informal’ part of Whitehead and Russell's Principia mathematica (1925).
P. Suppes (B) Stanford University, Ventura Hall, 220 Panama Street, Stanford, CA 94305-4101, USA e-mail:
[email protected]
The predicate ‘θ’ corresponds to his predicate ‘C’, where C(x) is interpreted to mean that proposition x is true.

Axioms of Occurrence
Axiom 1. If θA, then θ(A ∪ B).
Axiom 2. If θA and θB, then θ(A ∩ B).
Axiom 3. θA or θĀ.
Axiom 4. If θA, then it is not the case that θĀ.

I shall only prove two theorems about these axioms.

Theorem 1 The Axioms of Occurrence imply Huntington's five axioms.

Theorem 2 The concept of occurrence satisfying Axioms 1–4 above is not definable in terms of standard probability concepts. Moreover, it is not definable in terms of the causal concepts set forth in Definitions 1–8.” (Suppes 1970, pp. 37–39)

I am not claiming that these axioms are enough to deal satisfactorily with the problem of characterizing actual beings. The actual events I am referring to are also in some sense abstract. In fact, without trying to go deeply into the matter here, I think that much of our experience with actual objects and processes is indescribable in its full concreteness. Any description, in informal or formal language, is some kind of abstraction. Most of our ordinary talk, not just scientific talk, is saturated with concepts that always abstract from the full particularity of a given experience. This is why demonstratives are so necessary, and are available, in every natural language. This is an old problem in philosophy that I shall not consider further here.

2 Theory-talk vs. experiment-talk

In Sect. 6 Muller is correct that my view is not the semantic view of theories. My criticism of van Fraassen's elegant coordinate-free or language-free view of theories is that it just reaches a further level of abstraction, far removed from what is going on in actual experiments. What is striking about the quote from him that Muller gives, and also about Muller's own following remarks, is that they remain very much at the level of what I call theory-talk, not talk about experiments. In the other direction, with its emphasis on language of some sort, the semantic view moves away from the approach to representation and invariance of pure mathematics (I comment on this again later). The most characteristic feature of experiment-talk, I would say, is that it is almost entirely removed from talk about models of any kind. The focus is on how things were done in a given experiment. Such talk varies widely from one subdiscipline of science to another. There is not a hint of any of this in van Fraassen. It is as if science were only about theories. In recent years I have written about experiments and how dependent they are on students and postdocs learning how to do them through apprenticeships. What is important here is that much of the instruction is nonverbal. You can never learn how to run a particle accelerator, or, for that matter, play tennis or soccer, by
listening to a lecture. Much nonverbal learning is required, but this essential aspect of science seems to be neglected by many, including both Muller and van Fraassen. I think it is worth amplifying some of the differences between theoretical and experimental developments by looking at some problems and examples from the history of science. A good initial focus is the special theory of relativity. Einstein’s original 1905 paper on special relativity is a wonderful example of “shock and awe” in science. It heralded one of the great scientific revolutions of modern science. On the other hand, from an experimental standpoint, no really new experimental procedures or new kinds of measurement were required to conduct the Michelson-Morley and other relevant experiments, although such procedures were changing, as they usually are. At that period of time in physics, there were no striking changes of methods of measurement associated with the confirmation of the theory. Michelson and Morley just used the kinds of interference measurements that had been thoroughly worked out in the 19th-century development of experimental physics. In fact, this was the first century of real experimental methods in physics. Such continuity of experimental procedures, including the main procedures of measurement, is a familiar fact about many important transitions in physics. To take another example, there were no special new observational or experimental techniques directly associated with the Watson and Crick discovery of the double helix of DNA. They analyzed data from Rosalind Franklin and Maurice Wilkins. The X-ray crystallography that was used had been developed for some time, by von Laue, the Braggs, and later, for biological molecules, by Dorothy Crowfoot Hodgkin. A cascade of new experimental methods did follow the double-helix discovery. Another kind of revolution takes place in a given area of science when a radically new kind of instrumentation is introduced. Classic examples are the telescope and, as another visual device going in the opposite direction, the microscope. With these new instruments, new discoveries were made that had a strong impact on subsequent theoretical developments, in the one case, in astronomy, and in the other, in biology. Certainly the most expensive experiments yet conceived in the history of science are the ones that are aimed at being the only approach practically available to settle such theoretical questions as the existence of the Higgs particle. The cost of the CERN accelerator sets the modern record, but the prior particle accelerators, including the long-serving Linear Accelerator Center at my home university, have also been extravagantly expensive by normal scientific standards. Yet without this new instrumentation, none of the new particles of modern physics would have been discovered, at least, not within the framework of earlier instrumentation. It is perhaps the best example of new physics being stymied without new methods of instrumentation. Still another tale can be told to emphasize that too much stress just on scientific theory is out of place. Think of cases where new mathematical methods were important to progress. In ancient astronomy, the observational methods of the Babylonians, especially from the 8th to the 1st century BCE, were in many ways superior to those of the Greeks, and the methods of calculation were about as good. 
But in terms of what we now know, the great difference between Babylonian and Greek astronomy of ancient times lies in the sophistication of the mathematical developments of the Greek geometrical ideas, including spherical trigonometry, as opposed to the arithmetical-numerical methods and ideas of the Babylonians. As is well documented in
Jens Hoyrup's (2002) book, the Babylonians had a rich computational and algebraic tradition much earlier than the Greeks, but by modern standards it was not comparable to the Greek development of geometry with extensive application to astronomy. Moreover, it seems extremely unlikely that we will find anything in the as yet large number of untranslated clay tablets of Babylon comparable to the Almagest of Ptolemy, which is a massive piece of mathematical and observational astronomy that in style and substance reads very much like a modern textbook of mathematical physics or astronomy, written with greater than usual attention to mathematical rigor. To give a quite concrete example, there is nothing as yet discovered in Babylonian astronomy comparable to Menelaus' theorem (about 100 CE) on the trigonometry of the right spherical triangle. Such spherical trigonometric results were used by Ptolemy extensively in the Almagest. But this development of spherical trigonometry did not begin with Menelaus, but rather at least as early as Autolycus of Pitane, about 300 BCE. Much of the well-known Euclidean work Phenomena is based on Autolycus. Another work is the Sphaerica of Theodosius, written about 100 BCE. Menelaus' work was particularly powerful and useful in astronomy, and was much read and used subsequently. Still another aspect of the relation between mathematics and science is that Ptolemy's extensive use of Menelaus' results required application of only the elementary parts. This seems to be a generalization that holds rather widely in the use of mathematics in physics. The more elementary parts turn out to be the parts that are really useful and pragmatically important, or at least are the ones that the physicists or astronomers take the time to learn. The tale I have been telling has more extreme versions in many parts of modern science. Scientists often have a rather poor command of both the mathematics and instrumentation used. A good example is the use of functional magnetic resonance imaging (fMRI) in contemporary neuroscience. The theory of nuclear magnetic resonance on which it is based was only discovered in physics in the 1940s, for which Felix Bloch and Edward Purcell got the Nobel prize. Its extensive application in fMRI is understood, in any serious detail, by an extraordinarily small number of neuroscientists, relative to the number that are using it for experimental work. This is not meant as a criticism of neuroscientists, but a comment on the complicated and highly specialized nature of modern science. I should emphasize this last point more in terms of the philosophy of science itself. In this age of vast and necessary scientific specialization, it is impossible to enter into all the details in every direction in discussing recent scientific work, the ancient history of science, or the progress expected in the future. One can only be selective and hit-and-miss when trying to give a broad overview. My criticisms are based, not upon the absolute necessity for restricting the range of what is discussed, but on the point that too much emphasis can be placed just on scientific theory itself, whereas often it is at least as important to discuss experimental methods and how they have influenced the progress of science on the one side, and the developments of mathematics on the other. The glory of Ptolemy is his elaborate use of theory, observation, and mathematics. In this respect, it is easier to give a more thorough discussion of the older history of science.
So, for example, there is a sense that in the future it will be recognized that the history of ancient astronomy, in spite of the restricted records, will be in many ways more complete than the history of modern science, particularly astrophysics and
cosmology, just because of the overwhelming wealth of data and the vast range of methodologies used in quantitative observations that in their full detail are theoretically immeasurable. And these matters are not restricted to astronomy and physics. They will also be characteristic in the future of biology, if they are not so already. In this respect I suppose my final point is to emphasize that, if we want anything like a realistic story of any of the major developments of science now and in the future, it will not be a simple and tidy one, theoretical or otherwise, but something of great complexity and nuance, more like a great novel than an admirably lucid textbook.
3 Invariance

I agree with Muller's further development of the Structural View in Sect. 6 of his article. It is an attractive program for some philosophers to follow. But I do want to make a remark about invariance and how it is treated as a central concept in modern mathematics. Representation and invariance theorems are proved in almost all parts of pure mathematics without reference to language or linguistic invariance. An invariance theorem is stated about mathematical objects with no explicit reference to the language describing these objects. This is the view of pure mathematics I carried over to scientific structures in my 2002 book Representation and invariance of scientific structures. I quite agree that this book of mine, as embodying my systematic views of scientific theories, does not deal at all with the problem of how to talk about actual beings or even experiments, but I have been under no illusion that it does. I say the same is true of Muller's extension of this program, but I am sure he is aware of this.

There is a problem about invariance that I now recognize more thoroughly than I did when writing the 2002 book. This is the tension between computation and invariance. It is sometimes asked, “Why don't physicists write their axioms and their formulation of physical laws in an invariant form instead of a covariant one?” Or a more drastic question: “Why are computations in computers not done in an invariant style, rather than in some arbitrarily chosen base, for example, the decimal base 10 (the customary one taught in school) or base 2?” I momentarily postpone my answer to note that it is already difficult to write the laws of physics in an invariant way. A general response might be, “Well, don't we already have a good example of invariance in the way synthetic geometry is axiomatized?” And the answer is “yes”, but it is also a very good instance of computational power being weak. Computing much of anything in the notation of Euclid, or of a modern axiomatization of Euclidean geometry in synthetic form, is nothing but a mess. This was well recognized long ago, and it was the great glory of Descartes to have discovered the computational power of analytical geometry. (None of the earlier anticipations by Apollonius and others went very far.) Physicists above all are opportunistic about notation. A transformation to a coordinate system that brings an important measured quantity, for example, velocity, to zero is a good thing, because it simplifies the computations. In fact the effort to simplify computations tends to move in the opposite direction from invariance. Physicists like to put computation in its place by proclaiming the “axiom” that all computations are physical computations. This axiom seems to support the further
inference that computations require a particular representation to have a presence, for example, in the physical processing of a digital computer. But many mathematicians, who dislike the tedious matrix and determinant calculations that were drastically reduced in modern linear algebra, would seem to prefer the heuristic, “Don't compute until you have to; stay with invariants as long as you can.” There is merit in both views. I introduce these clashing points of discord to note the tension between two of the most important concepts in modern science. To put the matter more positively, these remarks about computation do not reduce the value of the concept of invariance in mathematics, physics, and other parts of science. For example, in the case of special relativity, one's insight into the nature of the theory is, it seems to me, enhanced by knowing that the simple computation of the proper time between two point events is an invariant, and in fact a complete invariant, for special relativity. By ‘complete’ I mean that from only the assumption of this invariant for measurement frames of reference, one can derive the Lorentz transformations as the widest group preserving this quantity (Suppes 1959). But it is also clear that knowing about this invariant does not mean that you want to use it for the basic formulation of axioms of relativistic mechanics, for example. This can be seen from an axiomatic analysis of the latter in Rubin and Suppes (1954). Also, in all kinds of structures in the theory of measurement, the representation theorems are proved in terms of representations that are not invariant, for example, the measurement of length, velocity, weight, any of the many kinds of quantities one can think of, or utility in psychology and economics. Of course, it is natural then to prove an invariance theorem, which is a theorem giving the widest group of transformations under which the properties of the quantity in question are preserved. There is a great historical example that goes in the opposite direction. It is a virtue of the ratios of Euclid, used effectively all the way through to Newton's Principia, to formulate relations in terms which avoid the problems of arbitrary units of measurement, but which, in doing so, complicate the calculations.

I now turn to Muller's final section, Sect. 7, on completion. My own program for completion is of a very different character. What I see as important and much more fruitful as a direction for the Informal Structural View is to make the theoretical structures of science more closely match experimental data, and perhaps even more, experimental procedures. The rest of what I have to say is devoted to giving some extended examples of this program under four headings: measurement, computation, ergodic theory of observational error, and constructive mathematical foundations.
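Before turning to those headings, the invariance point about proper time can be made concrete with a minimal numerical check (my sketch, not part of either paper; units with c = 1): boosting the separation between two point events to several moving frames changes the coordinate differences but not the squared proper time.

```python
# A minimal numerical sketch (mine, not the paper's): the proper time between
# two point events is invariant under Lorentz boosts, in units with c = 1.
import math

def proper_time_sq(dt, dx):
    """Squared proper time for a separation (dt, dx), one spatial dimension."""
    return dt * dt - dx * dx

def boost(dt, dx, v):
    """Lorentz boost of the separation (dt, dx) to a frame moving at velocity v."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return gamma * (dt - v * dx), gamma * (dx - v * dt)

dt, dx = 5.0, 3.0
for v in (0.0, 0.3, 0.9):
    bt, bx = boost(dt, dx, v)
    # The coordinate separations differ from frame to frame,
    # but the squared proper time agrees up to rounding error.
    print(f"v={v}: (dt', dx') = ({bt:.3f}, {bx:.3f}), "
          f"proper time^2 = {proper_time_sq(bt, bx):.12f}")
```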
4 Measurement

In the theory of measurement, the program is to eliminate the classical structures that have an infinite domain and thereby closely match the structures of pure mathematics. The task remains to provide structures that are formally defined, but are much closer to actual methods of measurement. A recent paper of mine, ‘Transitive indistinguishability and approximate measurement with standard finite ratio-scale representations,’ exemplifies this approach (Suppes 2006). A representation theorem is proved in terms
of upper and lower measures to reflect the pre-statistical indeterminacy of an actual scale. The psychological consideration of thresholds, below which perceptual or other comparative judgments are difficult if not impossible, was initiated by Fechner (1860/1966). An important early mathematical analysis was given by Wiener (1921). The probabilistic analysis of thresholds dates at least from the works of Thurstone (1927a,b). Falmagne (1976, 1978) has also been a central contributor to this approach, with a number of other papers written with colleagues. Almost all of the work I have referred to assumes that the indistinguishability of similar events, objects, or stimuli is a nontransitive relation. The implicit assumption is that with many different discriminating observations, many initially indistinguishable events may be separated. Here the opposite is the starting point, and the reason for the use of the word “transitive” in the title. It is a consequence of the axioms I introduce that indistinguishability is an equivalence relation, and so transitive. The basis for transitive indistinguishability is easy to explain. An object weighed is assigned to a unique minimal interval, for example, one between 1.9 and 2.0 g. The binary relation of two objects, a and b, being equivalent in weight, a ∼ b, is that they be assigned to the same minimal interval. This relation is obviously an equivalence relation, i.e., reflexive, symmetric, and transitive, but in the system of approximation developed, these properties are not directly testable; rather, they are consequences of weighing operations with already “calibrated” sets of weights. An object assigned to the minimal interval (1.9–2.0 g) is said to have, as an approximation, an upper measure (of weight) w^∗(a) = 2.0 g and a lower measure w_∗(a) = 1.9 g. In practice, for all but the most refined procedures of measurement, no statistical analysis of having weight in such a minimal interval is given. In the cases where the minimal interval is just on the borderline of instrument performance, a statistical analysis can be given for repeated measurements. It is conceptually important here to retain both upper and lower measures, for the foundational view formalized in the axioms is that no finer measurement than that of a minimal interval is available in the given circumstances. No theoretical construction of a probability distribution for location within the minimal interval makes much scientific sense. The point being emphasized is that the formalization given is meant to be a step closer to much, but certainly not all, actual practice of measurement when a fixed standard scale representation is available.
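A small computational sketch may help fix ideas; the 0.1 g grid and the Python rendering are my illustration of the construction, not code from Suppes (2006). Because indistinguishability is defined through the single minimal interval an object is assigned to, it comes out as an equivalence relation automatically.

```python
# A small sketch (my illustration of the construction in Suppes 2006, not code
# from that paper): a scale whose minimal intervals are 0.1 g wide.
GRID = 0.1  # width of the minimal interval, in grams

def interval_index(weight):
    """Index of the unique minimal interval an object is assigned to."""
    return int(weight / GRID)

def lower_measure(weight):   # w_*(a)
    return interval_index(weight) * GRID

def upper_measure(weight):   # w^*(a)
    return (interval_index(weight) + 1) * GRID

def indistinguishable(a, b):
    """a ~ b iff a and b fall in the same minimal interval. Because this is
    defined via a function of each object alone, ~ is reflexive, symmetric
    and transitive, i.e. an equivalence relation."""
    return interval_index(a) == interval_index(b)

a, b, c = 1.93, 1.97, 2.02
print(indistinguishable(a, b))   # True: both assigned to the 1.9-2.0 g interval
print(indistinguishable(b, c))   # False: different minimal intervals
print(f"{lower_measure(a):.1f} g, {upper_measure(a):.1f} g")  # 1.9 g, 2.0 g
```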
5 Computations

The same approach should be applied to the computational side of science, especially now that so much of current science is nonlinear and equations cannot be solved in closed form, but only numerically approximated. As an example, the classical axioms of arithmetic need to be replaced for computational purposes by axioms of floating-point arithmetic. The point is to show that with full rigor we can match new constructive foundations to what is going on computationally.

In fact, it is quite a complex and subtle matter to give satisfactory axioms for floating-point computations. It can even be argued that there is no completely satisfactory set of axioms available at the present time. In this discussion, I would just like to hit a
few of the high points, to indicate why it is useful but often difficult to implement the kind of foundational program that gets closer to the actual practice of computation. There is something much simpler that is widely taught already in elementary-school mathematics, namely, standard scientific notation. Suppose we know a number, say, to 10 significant digits; the units are, for example, miles, and the quantity is the distance from Earth to some other planet at a given moment. To express this in standard scientific notation, we first write the significant digits as a number strictly between 0 and 10, for example 1.567897, and then, as the second part, the power of ten. So to express something simple, like a debt of $1,600,000, we would write that debt as $1.6 × 10^6. Well, this is simple enough, and it does not have any implications that go against the standard axioms of arithmetic.

Floating-point and actual extensive computations are another matter. Here, the scheme must be not only finite, but bounded. We cannot compute, in any actual computer, finite numbers that are too large. It is very easy to write down two numbers that we would like to multiply which cannot be multiplied in any existing computer, and if we take what some physicists claim seriously, we can even assert that, in principle, such a multiplication could never occur in any computer in our universe in the amount of time, for example, that the Earth has already existed. So the introduction of a finite bound in floating-point arithmetic is the real problem. In the ordinary mathematical system, we have the neat and clean closure of operations: any two numbers can be added to obtain another number, any two can be multiplied, and, with a restriction on division by zero, we say the same of division. There is no restriction in the case of subtraction if we are willing to introduce negative numbers, which we always do, of course. So with the full range of real numbers, the operations of addition, multiplication, division, and subtraction are closed, meaning we can perform these operations on any two numbers and obtain another number. Division by zero presents special problems that I will ignore here, but about which much can be said. In computations that are limited finitely, something more complicated has to go on. It is these complications that produce all the difficulty for floating-point arithmetic.

Historically, not much attention was paid to the problems of floating-point arithmetic; it all began with the serious development of computers. But then it was evident immediately that something of a highly specific and technical nature was needed in order to have proper control of computation. Otherwise, results that would be difficult to interpret could easily be produced. So, starting around the middle of the 20th century, much attention has been paid to this problem, and the literature is now extensive. The point of emphasis here is not the extensive character of this literature, but rather the lack of almost any developed foundational literature on the subject. There are some axiomatizations, and at least one or two dissertations written on the matter in modern computer science, but compared to the kind of foundational attention that has been devoted to a great variety of problems in elementary number theory, little exists.
It is part of the program I advocate to make this a foundational issue, and to discuss, in the constructive spirit which I think should be applied to mathematics as used in science, the idea of having finitistic constraints on floating-point arithmetic to match the necessary constraints on the measurements and computations actually used in scientific practice.
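A few lines of code illustrate the failure of closure just described; the example is mine and uses IEEE-754 double precision, the floating-point arithmetic of essentially all current machines.

```python
# A small illustration (mine): IEEE-754 double arithmetic is bounded and not
# closed under the classical operations, unlike ideal real arithmetic.
big = 1e308
print(big * 10)              # inf: multiplication overflows the finite range
print(1e16 + 1.0 - 1e16)     # 0.0: at this magnitude the spacing between
                             # adjacent doubles exceeds 1, so the 1.0 is absorbed
print((1e16 - 1e16) + 1.0)   # 1.0: the same operations in a different order
print(0.1 + 0.2 == 0.3)      # False: 0.1, 0.2, 0.3 have no exact binary form
```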
6 Ergodic theory of observational error

The next big step is to use the ergodic theory of chaos to get general results about errors. The philosophical importance of these results has not, in my view, been much appreciated by philosophers of science. Roughly speaking, they go like this. Let us recognize that all, or at least almost all, measurements of continuous quantities have errors bounded away from zero. This is contrary to the mistaken idealism of 19th-century philosophy of physics, according to which errors could be reduced ever closer to zero in a continuing sequence of improvements. Given such errors, we also recognize as impossible the exact confirmation of continuous deterministic physical systems. I am not saying this in a precise way; I think the intuition is clear enough. In any case, the details are clarified by the remarkable results of Ornstein and Weiss (1991) about a variety of chaotic systems, including, as perhaps the most vivid one, “Sinai billiards,” which means the following. A convex obstacle is added to the center of the table, off which the billiard ball rebounds according to the same symmetrical laws of physics that govern rebounds off the sides of the table. Sinai proved that for such billiards one has an ergodic measure, i.e. over a long enough time the ball will pass arbitrarily close to all points on the surface of the table, except for a set of measure zero. What is important is that Ornstein and Weiss also add a concept of α-congruence. Such congruence is of the following sort. Two models, for example, one deterministic and the other stochastic, can be defined for Sinai billiards such that they are not only measure-theoretically isomorphic but α-congruent, i.e., all distance measurements for the same pairs of points are within α (the bound on error), except for a set of points of measure α. Take as the deterministic model the simple classical-physics one for idealized billiards with no loss of energy, but with errors of measurement as noted above, and let the second be a stochastic model, a Markov process, suitably defined to be provably α-congruent to the deterministic one. The fundamental theorem is that, no matter how many observations are made, as long as the number is finite, we cannot distinguish between the goodness of fit of the deterministic model and that of the stochastic one. This establishes a new kind of invariance that pushes hard against many conventional doctrines of long standing in the philosophy of science. Note that the two models are mathematically inconsistent with each other; but empirically one cannot distinguish between the two.

These ideas about error are not just methodological ones, but important limitations on our knowledge of the world. They constitute one kind of excellent answer to Kant's conflicts of the antinomy of pure reason. Consider, for example, the second conflict, concerning whether matter is discrete or continuous. It is obvious how what I have just said could apply to this antinomy. The same thing can be said about the third conflict, on causality. I have written about these matters in an informal way in several places (Suppes and Chuaqui 1993; Suppes 1995, 1999; Suppes and de Barros 1996). I am sorry to say that I did not include these topics in my 2002 book.
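The Ornstein–Weiss results concern billiards, but a far simpler toy example, which is my analogy and not their theorem, conveys the flavour of the claim: the deterministic doubling map, observed only through a two-cell partition, produces records that finite statistics cannot tell apart from fair coin flips.

```python
# A toy analogue (my illustration, much weaker than the Ornstein-Weiss
# billiards result): coarse-grained observations of the deterministic
# doubling map x -> 2x mod 1 versus an i.i.d. fair-coin stochastic model.
import random

def doubling_symbols(x0, n):
    """Observe a deterministic orbit only through the partition [0,1/2) / [1/2,1)."""
    x, out = x0, []
    for _ in range(n):
        out.append(0 if x < 0.5 else 1)
        x = (2.0 * x) % 1.0
    return out

def coin_symbols(n):
    """The competing stochastic model: independent fair coin flips."""
    return [random.randint(0, 1) for _ in range(n)]

n = 50  # kept below ~50: each doubling of a double shifts one bit away
det = doubling_symbols(random.random(), n)
sto = coin_symbols(n)
print(sum(det) / n, sum(sto) / n)  # both relative frequencies hover near 0.5
# No finite record of 0/1 observations favours one model over the other.
```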
7 Constructive mathematical foundations

Finally, the implementation of the program, in terms of the first three steps I have outlined, suggests a fourth one to make the picture fully finitistic in spirit. This is to
provide a computational foundation of analysis that uses the same ideas about the nature of errors. Here the basic move is to have a non-standard constructive form of analysis. By non-standard I mean that we have real infinitesimals, and by constructive I have in mind a very strong form, namely that the axioms are quantifier-free, so that it is a free-variable system. Such a system is of course much weaker mathematically than classical analysis, but my claim is that once the presence of errors that I have been talking about is taken seriously, as a foundational necessity, then everything one can in fact do that is significant scientifically can be done within such a restricted, constructive mathematical system. Now of course I may be wrong in saying that everything can be done. The point of the program, which is a very concrete one, is to see how far one can go and, moreover, where there are counterexamples, if there are any: new scientific results requiring stronger methods that cannot be formulated in such a framework. Over the past fifteen years, I have written with two collaborators several long articles on constructive, non-standard foundations of analysis (Chuaqui and Suppes 1990, 1995; Sommer and Suppes 1996, 1997; Suppes and Chuaqui 1993). I remark that absorbing the theory of error in a sharp way into these foundations has been the key to giving a much more robust and simplified system that in some sense matches very well almost everything that is actually done mathematically in scientific practice, with some possible exceptions here and there. Without giving a detailed description of the system, it is still possible to convey a sense of its main features, especially those of some philosophical interest. I list three.

1. In Suppes and Chuaqui (1993), we introduced the concept of a geometric subdivision of a closed interval [a, b] of order ν, where a and b are finite real numbers and ν is an infinite natural number. We used this concept to define the differential du = (b − a)/ν and u_i = a + i·du, for 0 ≤ i ≤ ν. So the u_i form a partition of [a, b], and thereby mark the divisions of the geometric subdivision depending on a, b, and ν. This concept was anticipated in the work of Cavalieri (1635), a student of Galileo's, on the geometry of indivisibles. All of this has complications that are evident in the article mentioned. More generally, the use of the infinitesimals of modern nonstandard analysis is a natural computational approach, but brings with it more subtle complexity than initially expected, as the details are worked out. In the system I am now developing with Ted Alper we simplify, and thereby eliminate, much of this onerous detail by using a strictly finite approach as the basic model. We replace the geometric subdivision by an extremely fine but finite equally spaced grid, with the spacing many orders of magnitude smaller than any current physical constants or limitations on measurement.

2. The second feature concerns the very large. If the universe is of fixed finite size, but not too large, we may be able to measure it. On the other hand, if it is finite but very large, or infinite, there may be no way to empirically distinguish the hypothesis of being very large but finite from that of being infinite. To make this idea more concrete, consider how an approximately spherical universe with a diameter greater than 1000^(1000^1000) kilometers could be distinguished from a flat infinite universe. So the second heuristic principle Alper and I are using is the indistinguishability, in empirical quantitative terms, of a very large finite space from an infinite one.

3. The third heuristic principle is that our finite models, with a very small physical distance corresponding to an infinitesimal, should be the basis of establishing, as a weak form of isomorphism, an indistinguishability relation (reflexive and symmetric) between our very large finite models and standard models using classical analysis, as applied to quantitative empirical tests or simulations of empirical scientific results in any domain of science, but especially physics.
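As a down-to-earth rendering of the first feature (my sketch, with a modest finite ν standing in for the infinite natural number of the free-variable system), the subdivision can be used directly to compute:

```python
# A finite-grid sketch of feature 1 (mine; nu is here an ordinary large
# integer, standing in for the infinite natural number of the free-variable
# system).
def subdivision(a, b, nu):
    """The points u_i = a + i*du, 0 <= i <= nu, with du = (b - a)/nu."""
    du = (b - a) / nu
    return du, [a + i * du for i in range(nu + 1)]

def finite_integral(f, a, b, nu):
    """Riemann-style sum over the grid; on the finite-grid view the residual
    error is absorbed into the infinitesimal spacing du."""
    du, u = subdivision(a, b, nu)
    return sum(f(u_i) for u_i in u[:-1]) * du

print(finite_integral(lambda x: x * x, 0.0, 1.0, 10**6))  # approximately 1/3
```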
Of course, this program is quite different from the completion Muller has outlined. That does not mean one of us is right and the other wrong; it just means there are fundamentally different ways to extend the Informal Structural View to more details of scientific interest. My own attitude, starting with Suppes (1962), is to go as deeply as possible into the actual practices of science at the level of measurement, observation, and computation, and to ask how they should be reflected back into theory once the limitations imposed by errors or environmental variations are taken seriously. I also look upon this approach to error as really the proper answer to Kant's antinomies. As Kant himself hints in various passages, the difficulty, as in the case of Russell's paradox for Frege's system, is the unbounded or unconditioned character of the results aimed at. It is conditioning or restricting that is necessary to have a consistent theory of nature. If unrestricted and unbounded operations are permitted, then we get some form of antinomies, even in pure mathematics. This was the kind of thing that was wrong with Frege's system, and it also leads to the standard paradoxes of the same sort in axiomatic set theory.

As a final point, the program of constructive mathematical foundations outlined above also challenges the pure a priori synthetic status of arithmetic and geometry at the heart of Kant's and much later work in the foundations of mathematics. Actually, the challenge here is not so much about the a priori, but about whether arithmetic and geometry in their classical axiomatizations are scientifically the correct choice. The question is not the often-mentioned slogan about the certainty of pure geometry versus the uncertainty of empirical geometry, but rather what axiomatization is best for each major scientific discipline. New proposals for both arithmetic and geometry are part of the program I have outlined here. The constructive thrust is not to limit available mathematical methods as such, but rather to build mathematical structures that more closely match scientific practice.

References

Cavalieri, B. (1635). Geometria indivisibilibus continuorum nova quadam ratione promota. Bologna: Clemente Ferroni (2nd ed. 1653).
Chuaqui, R., & Suppes, P. (1990). An equational deductive system for the differential and integral calculus. In P. Martin-Löf & G. Mints (Eds.), Lecture notes in computer science, Proceedings of COLOG-88 international conference on computer logic (pp. 25–49). Berlin: Springer-Verlag.
Chuaqui, R., & Suppes, P. (1995). Free-variable axiomatic foundations of infinitesimal analysis: A fragment with finitary consistency proof. The Journal of Symbolic Logic, 60, 122–159.
Fechner, G. T. (1860/1966). Elemente der Psychophysik. Leipzig: Breitkopf & Härtel. Transl. H. E. Adler (1966). Elements of psychophysics (Vol. 1). New York: Holt, Rinehart & Winston.
Falmagne, J. C. (1976). Random conjoint measurement and loudness summation. Psychological Review, 83, 65–79.
Falmagne, J. C. (1978). A representation theorem for finite random scale systems. Journal of Mathematical Psychology, 18, 52–72.
Hoyrup, J. (2002). Lengths, widths, surfaces: A portrait of old Babylonian algebra and its kin. New York: Springer.
Huntington, E. V. (1934). Independent postulates for the “informal” part of Principia Mathematica. Bulletin of the American Mathematical Society, 40, 127–136.
Muller, F. A. (2009). Reflections on the revolution in Stanford. Synthese (this issue).
Ornstein, D., & Weiss, B. (1991). Statistical properties of chaotic systems. Bulletin of the American Mathematical Society (New Series), 24, 11–116.
Rubin, H., & Suppes, P. (1954). Transformations of systems of relativistic particle mechanics. Pacific Journal of Mathematics, 4, 563–601.
Sommer, R., & Suppes, P. (1996). Finite models of elementary recursive nonstandard analysis. Notas de la Sociedad Matematica de Chile, 15, 73–95.
Sommer, R., & Suppes, P. (1997). Dispensing with the continuum. The Journal of Mathematical Psychology, 41, 3–10.
Suppes, P. (1959). Axioms for relativistic kinematics with or without parity. In L. Henkin et al. (Eds.), The axiomatic method with special reference to geometry and physics (pp. 291–307). Amsterdam: North-Holland Publishing Co.
Suppes, P. (1962). Models of data. In E. Nagel, P. Suppes, & A. Tarski (Eds.), Logic, methodology, and philosophy of science: Proceedings of the 1960 international congress (pp. 252–261). Stanford: Stanford University Press.
Suppes, P. (1970). A probabilistic theory of causality. Amsterdam: North-Holland Publishing Co.
Suppes, P. (1993). The transcendental character of determinism. In P. A. French, T. E. Uehling, & H. K. Wettstein (Eds.), Midwest studies in philosophy, Vol. XVIII (pp. 242–257). Notre Dame: University of Notre Dame Press.
Suppes, P. (1995). Principles that transcend experience: Kant's antinomies revisited. Transzendentale Prinzipien: Eine Neubetrachtung der Kantschen Antinomien. Metaphysik, 11, 43–54.
Suppes, P. (1999). The noninvariance of deterministic causal models. Synthese, 121, 181–198.
Suppes, P. (2002). Representation and invariance of scientific structures. Stanford: CSLI Publications.
Suppes, P. (2006). Transitive indistinguishability and approximate measurement with standard finite ratio-scale representations. Journal of Mathematical Psychology, 50, 329–336.
Suppes, P., & Chuaqui, R. (1993). A finitarily consistent free-variable positive fragment of infinitesimal analysis. Proceedings of the IX Latin American symposium on mathematical logic. Notas de Logica Matematica, 38, 1–59.
Suppes, P., & de Barros, J. A. (1996). Photons, billiards and chaos. In P. Weingartner & G. Schurz (Eds.), Law and prediction in the light of chaos research. Lecture notes in physics (pp. 189–201). Berlin: Springer-Verlag.
Thurstone, L. L. (1927a). A law of comparative judgment. Psychological Review, 34, 273–286.
Thurstone, L. L. (1927b). Psychophysical analysis. American Journal of Psychology, 38, 368–389.
Wiener, N. (1921). A new theory of measurement: A study in the logic of mathematics. Proceedings of the London Mathematical Society, 19, 181–205.