Explanation, Prediction, and Confirmation
Proceedings of the ESF Research Networking Programme
THE PHILOSOPHY OF SCIENCE IN A EUROPEAN PERSPECTIVE Volume 2
Steering Committee
Maria Carla Galavotti, University of Bologna, Italy (Chair)
Diderik Batens, University of Ghent, Belgium
Claude Debru, Ecole Normale Supérieure, France
Javier Echeverria, Consejo Superior de Investigaciones Cientificas, Spain
Michael Esfeld, University of Lausanne, Switzerland
Jan Faye, University of Copenhagen, Denmark
Olav Gjelsvik, University of Oslo, Norway
Theo Kuipers, University of Groningen, The Netherlands
Ladislav Kvasz, Comenius University, Slovak Republic
Adrian Miroiu, National School for Political Studies and Public Administration, Romania
Ilkka Niiniluoto, University of Helsinki, Finland
Tomasz Placek, Jagiellonian University, Poland
Demetris Portides, University of Cyprus, Cyprus
Wlodek Rabinowicz, Lund University, Sweden
Miklós Rédei, London School of Economics, United Kingdom (Co-Chair)
Friedrich Stadler, University of Vienna and Institut Wiener Kreis, Austria
Gregory Wheeler, New University of Lisbon, FCT, Portugal
Gereon Wolters, University of Konstanz, Germany (Co-Chair)
www.pse-esf.org
Dennis Dieks · Wenceslao J. Gonzalez · Stephan Hartmann · Thomas Uebel · Marcel Weber Editors
Explanation, Prediction, and Confirmation
Editors Dennis Dieks Utrecht University Inst. for History and Foundations of Science PO Box 80010 3508 TA Utrecht The Netherlands
[email protected] Stephan Hartmann Tilburg University Tilburg Center for Logic and Philosophy of Science 5000 LE Tilburg The Netherlands
[email protected]
Wenceslao J. Gonzalez University of A Coruña Faculty of Humanities Dr. Vazquez Cabrera street, w/n 15.403 Ferrol Spain
[email protected]
Thomas Uebel University of Manchester School of Social Science Oxford Road M13 9PL Manchester United Kingdom
[email protected]
Marcel Weber Fachbereich Philosophie Universität Konstanz 78457 Konstanz Germany
[email protected]
ISBN 978-94-007-1179-2
e-ISBN 978-94-007-1180-8
DOI 10.1007/978-94-007-1180-8
Springer Dordrecht Heidelberg London New York
Library of Congress Control Number: 2011922745
© Springer Science+Business Media B.V. 2011
No part of this work may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording or otherwise, without written permission from the Publisher, with the exception of any material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work.
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
TABLE OF CONTENTS

DENNIS DIEKS, Preface: Explanation, Prediction, Confirmation  7

Team A: Formal Methods
JOHN WORRALL, The No Miracles Intuition and the No Miracles Argument  11
STATHIS PSILLOS, The Scope and Limits of the No Miracles Argument  23
GREGORY WHEELER AND RICHARD SCHEINES, Causation, Association and Confirmation  37
JON WILLIAMSON, An Objective Bayesian Account of Confirmation  53
ADAM GROBLER, An Explication of the Use of Inference to the Best Explanation  83
JOKE MEHEUS, A Formal Logic for the Abduction of Singular Hypotheses  93
THOMAS MÜLLER, Probabilities in Branching Structures  109

Team B: Philosophy of the Natural and Life Sciences
RAFFAELLA CAMPANER, Causality and Explanation: Issues from Epidemiology  125
SAMUEL SCHINDLER, Invariance, Mechanisms and Epidemiology  137
ALEXANDER REUTLINGER, What’s Wrong with the Pragmatic-Ontic Account of Mechanistic Explanation?  141
MICHAEL JOFFE, Causality and Evidence Discovery in Epidemiology  153
GERD GRAẞHOFF, Inferences to Causal Relevance from Experiments  167
ALAN C. LOVE AND ANDREAS HÜTTEMANN, Comparing Part-Whole Reductive Explanations in Biology and Physics  183
PETER MCLAUGHLIN, The Arrival of the Fittest  203
THOMAS A. C. REYDON, The Arrival of the Fittest What?  223

Team C: Philosophy of the Cultural and Social Sciences
WOLFGANG SPOHN, Normativity is the Key to the Difference Between the Human and the Natural Sciences  241
HANS LENK, Methodological Higher-Level Interdisciplinarity by Scheme-Interpretationism: Against Methodological Separatism of the Natural, Social, and Human Sciences  253
JAN FAYE, Explanation and Interpretation in the Sciences of Man  269
PETER KEMP, Imagination and Explanation in History  281
PAOLO GARBOLINO, Historical Narratives, Evidence, and Explanations  293
RAIMO TUOMELA, Holistic Social Causation and Explanation  305
WENCESLAO J. GONZALEZ, Complexity in Economics and Prediction: The Role of Parsimonious Factors  319
MARIA G. BONOME, Prediction and Prescription in the Science of the Artificial: Information Science and Complexity  331

Team D: Philosophy of the Physical Sciences
JEREMY BUTTERFIELD, Against Pointillisme: A Call to Arms  347
DENNIS DIEKS, The Gibbs Paradox Revisited  367
MAURO DORATO, The Alexandroff Present and Minkowski Spacetime: Why it Cannot Do What it has Been Asked to Do  379
TOMASZ PLACEK, A Locus for “Now”  395
SVEND E. RUGH AND HENRIK ZINKERNAGEL, Weyl’s Principle, Cosmic Time and Quantum Fundamentalism  411
MICHIEL P. SEEVINCK AND JOS UFFINK, Not Throwing out the Baby with the Bathwater: Bell’s Condition of Local Causality Mathematically ‘Sharp and Clean’  425

Team E: History of the Philosophy of Science
BERNA KILINC, Kant on Chance and Explanation  453
MICHAEL STÖLTZNER, Shifting the (Non-Relativized) A Priori: Hans Reichenbach on Causality and Probability (1915–1932)  465
PIERRE WAGNER, Carnap’s Theories of Confirmation  477
ARTUR KOTERSKI, The Rise and Fall of Falsificationism in the Light of Neurath’s Criticism  487
MARIA CARLA GALAVOTTI, Probability and Pragmatism  499
GRAHAM STEVENS, Russell on Non-Demonstrative Inference  511
ELISABETH NEMETH, Edgar Zilsel on Historical Laws  521
ERIC SCHLIESSER, “Every System of Scientific Theory Involves Philosophical Assumptions” (Talcott Parsons). The Surprising Weberian Roots to Milton Friedman’s Methodology  533

Index of Names  545
PREFACE: EXPLANATION, PREDICTION, CONFIRMATION
This volume, the second in the Springer series Philosophy of Science in a European Perspective, contains selected papers from the workshops organised by the ESF Research Networking Programme PSE (The Philosophy of Science in a European Perspective) in 2009. The opening conference of this Programme (Vienna, 18-20 December 2008; see F. Stadler et al., eds., The Present Situation in the Philosophy of Science, Springer, 2010—the first volume of this series) first of all identified general directions in European philosophy of science research and defined points of contact between the different research teams that are part of the Programme. In comparison, the 2009 workshops placed a stronger emphasis on the further development of individual research lines of the teams, while keeping an eye on possibilities of cooperation and cross-fertilization.

The individual PSE teams and their areas of research are as follows: Team A, Formal Methods (team leader Stephan Hartmann); Team B, Philosophy of the Natural and Life Sciences (team leader Marcel Weber); Team C, Philosophy of the Cultural and Social Sciences (team leader Wenceslao J. González); Team D, Philosophy of the Physical Sciences (team leader Dennis Dieks); Team E, History of the Philosophy of Science (team leader Thomas Uebel).

Under the umbrella of the general theme Explanation, Prediction and Confirmation, these teams organised three meetings in 2009. Team B organised a workshop on “Explanation, prediction, and confirmation in biology and medicine”, which took place in Konstanz from 2 to 4 October, with Marcel Weber as the local organiser and with support from the Konstanz “Zentrum für Philosophie und Wissenschaftstheorie”. Team C organised a workshop on “Explanation, prediction and confirmation in the social sciences: realm and limits” at the University of Amsterdam, from 26 to 27 October; the local organiser was Marcel Boumans. Teams A, D and E organised a joint meeting entitled “Physical and philosophical perspectives on probability, explanation and time”. This meeting took place at the Woudschoten Conference Center in Zeist from 19 to 20 October 2009; its local organiser was Dennis Dieks, supported by the “Institute for History and Foundations of Science” of Utrecht University. The combined presence of three research teams in this meeting offered the opportunity for holding two explicitly interdisciplinary sessions in addition to the solo sessions of the individual teams. These combined sessions focussed on areas of overlap and joint interest between Teams A and D and Teams A and E, respectively.

A detailed report of the meeting can be found in the Journal for General Philosophy of Science, 2010, DOI 10.1007/s10838-010-9132-y; the results of the workshop of Team C are discussed in detail in the Journal for General Philosophy of Science, 2010, DOI 10.1007/s10838-010-9128-7.
A large audience, from all over Europe and from a variety of specialties, attended the workshops: at the concluding lecture of the joint Zeist workshop almost one hundred people were present. Among these participants were a substantial number of students and young scholars. The workshops thus accorded very well with the general idea behind PSE: establishing contacts between scholars from different European countries while furthering high-level European research in the philosophy of science.

Although the individual PSE teams focus on subjects that at first sight may seem quite different, there turned out to be many areas of overlap and common interest, with ample opportunity for joint work. For example, a connecting thread running through a substantial number of papers in this volume is the concept of probability: probability plays a central role in present-day discussions in formal epistemology, in the philosophy of the physical sciences, and in general methodological debates—it is central in discussions concerning explanation, prediction and confirmation. It became very clear at the meetings that such topics can profit considerably from intellectual exchange between various disciplines. Accordingly, it was decided that PSE should further pursue this path of cooperation and interdisciplinarity. In fact, probability will be a Leitmotiv in 2010, with four workshops on the role of probability and statistics in various disciplines, among them a joint workshop on topics of common interest to the philosophy of the life sciences and the philosophy of the physical sciences. The results will be available in the third volume of this PSE series!
Dennis Dieks Utrecht University
Team A Formal Methods
JOHN WORRALL
THE NO MIRACLES INTUITION AND THE NO MIRACLES ARGUMENT
In this paper I contrast the very modest view of the main ‘consideration’ supporting scientific realism taken by Poincaré and others with the much more ambitious argument developed by Stathis Psillos using some ideas of Hilary Putnam’s and of Richard Boyd’s. I argue that the attempt to produce a more ambitious argument not only fails, but was always bound to fail.
1. THE NO MIRACLES INTUITION

Most of us tend toward scientific realism because of the amazing predictive successes enjoyed by theories in (mature) science. To take a well-worn example: the classical wave theory of light is, at root, a series of claims about an unobservable medium, the ‘luminiferous aether’, and about unobservable periodic disturbances travelling through it; yet it turns out to follow deductively from this theory (together of course with accepted auxiliary assumptions) that, for instance, the ‘shadow’ of a small opaque disc held in light diverging from a point source will have an illuminated spot at its centre—a claim that can be directly empirically checked and turns out to be true.1 ‘How on earth’, it seems unavoidable to ask, ‘could a theory score a dramatic predictive success like that unless its claims about the reality ‘underlying’ the phenomena (in this case, about the unobservable luminiferous aether) are at least approximately in tune with the real underlying structure of the universe?’ To assume that it could score such successes, while not itself even being approximately true, would be, in Poincaré’s words, “to attribute an inadmissible role to chance”2.

1 For the historical details of this case, which are at odds with the usual philosophical presentation, see John Worrall, “Fresnel, Poisson and the white spot: the role of successful predictions in the acceptance of scientific theories”, in: D. Gooding, T. Pinch and S. Shaffer (Eds.), The Uses of Experiment. Cambridge: Cambridge University Press, 1989, pp. 135-157.
2 Henri Poincaré, Science and Hypothesis, repr. New York: Dover 1952 (originally 1905), p. 150.

Of course in this and similar cases, predictive success is the icing on a cake that must already be substantial. If scientists threw out enough theories simply at random, eventually one would score some predictive success ‘by chance’. But other conditions are implicitly presupposed: for example, that the predictive success
is genuine and not brought about by some ad hoc accommodation of the relevant phenomenon within the theory at issue; also that the theory accounts for all the empirical success of its rivals, and so in particular for the success of its predecessor; and finally that the theory has a certain ‘simplicity’ or ‘unity’. But provided that these conditions are met, the realist-leaning force of predictive successes like that of the white spot seems difficult to resist. As Duhem3 put it:

The highest test … of our holding a classification as a natural one is to ask it to indicate in advance things which the future alone will reveal. And when the experiment is made and confirms the predictions obtained from our theory, we feel strengthened in our conviction that the relations established by our reason among abstract notions truly correspond to relations among things.
Let’s call the “conviction” highlighted by Duhem ‘the no miracles intuition’. Notice that it is local: it applies to particular theories and their particular predictive successes. A general case for scientific realism can be based on it only in a piecemeal, conjunctive way—it is reasonable to think that the general theory of relativity is approximately true because of its predictive success with, for example, the motion of Mercury, and it is reasonable to think that the photon theory of light is approximately true because of its predictive success with the photoelectric effect, and … This conjunction will not be over ‘the whole of science’ (whatever that is supposed to be). After all, some parts of science are frankly speculative, others highly problematic. Instead the conjunction will be over only those particular theories that have scored genuine particular predictive successes and hence elicit the no miracles intuition. No sensible scientific realist should ever have been realist about every theory in science, nor even about any theory that is (currently) the ‘best’ in its field. (It may after all, as has often been pointed out, be only ‘the best of a bad lot’.) She should be realist only about theories that have scored proper predictive success, since only such success elicits the no miracles intuition and only that intuition underwrites realism.

Of course scientific realism faces many well-rehearsed problems—notably the challenge based on the history of theory change: presumably it was reasonable to think that, for example, the elastic solid ether theory of light was approximately true because of its predictive success (see above). Is this compatible with the current realist view that the still more impressively predictive photon theory of light is approximately true, given that the two theories are logically incompatible? However, I lay these problems aside here.
3 Pierre Duhem, The Aim and Structure of Physical Theory, trans. P. Wiener. Princeton, NJ: Princeton University Press 1954 (originally 1906), p. 28.
2. THE ‘NO MIRACLES ARGUMENT’
Rather, the issue I want to address is whether the “conviction” pointed to by Duhem, Poincaré and others is ineliminably intuitive or can instead be backed up by some more substantial argument. After all, an intuition seems a slim reed from which to hang a philosophical position; surely an argument, if cogent, would put the realist on firmer ground. As we have seen, the intuition applies to individual theories and so the obvious first suggestion would surely be to try to produce a form of argument aimed at underwriting the claims to (approximate) truth of such individual theories. This has indeed been attempted. (It is, for example, this form of the argument that Colin Howson criticises in his Hume’s Problem4.) But I shall not consider it here, instead going straight to the more widely touted, and altogether more ambitious, form of the argument, one that I shall argue was always a non-starter.

The first step on the downward slope was taken by Hilary Putnam, who famously argued5:

The positive argument for realism is that it is the only philosophy that doesn’t make the success of science a miracle. That terms in mature scientific theories typically refer …, that the theories accepted in a mature science are typically approximately true, that the same term can refer to the same thing even when it occurs in different theories—these statements are viewed … as part of the only scientific explanation of the success of science … (emphasis added)
Putnam’s idea—that scientific realism in general could itself be regarded as the (only and therefore the) best scientific explanation of the success of science—was in turn further elaborated by Richard Boyd and then Stathis Psillos into what Psillos calls “the explanationist defence” of scientific realism. The ‘success’ claim used as a premise in this argument/defence is not about the predictive success of particular scientific theories, but instead about the ‘success’ of some alleged general scientific method. (Following van Fraassen6, this No Miracles Argument, with definite capital letters, is also sometimes called the “ultimate argument” for scientific realism.)

4 Colin Howson, Hume’s Problem. Oxford: Oxford University Press 2000.
5 Hilary Putnam, Mathematics, Matter and Method (Philosophical Papers, Volume 1). Cambridge: Cambridge University Press 1975, p. 23.
6 Bas van Fraassen, The Scientific Image. Oxford: Clarendon Press 1980, p. 39.

Psillos’ ‘explanationist defence’ supposes that there is something called ‘scientific methodology’ that has proved to be ‘reliable’—in that it consistently (or fairly consistently) produces theories that yield correct predictions. Moreover, this methodology depends in various ways on background theoretical assumptions. The best explanation of the ‘reliability of scientific methodology’ is that those theories are (approximately) true. Indeed the claim seems to be that it would be
inexplicable—a second-order ‘miracle’—if theory-dependent scientific methodology kept producing successful scientific theories, were the theories on which that methodology is dependent not at least approximately true. As Psillos7 emphatically puts it:

NMA is not just a generalisation over scientists’ [individual] abductive inferences … The explanandum of NMA is a general feature of scientific methodology—its reliability for yielding correct predictions. NMA asserts that the best explanation of why scientific methodology has the contingent feature of yielding correct predictions is that the theories which are implicated in this methodology are relevantly approximately true. (emphases added)
Moreover, the explanation involved in this defence of scientific realism is itself alleged to be, just as Putnam asserted, a scientific one. (Remember that Putnam famously claimed that scientific realism is “an overarching scientific hypothesis”8.) But, before asking whether this explanation of the success of scientific methodology can possibly itself be a scientific explanation, we should note a number of obscurities in just what the argument is supposed to be in the first place.

The underlying idea seems initially clear enough: there is something called general scientific methodology that has been impressively successful (successful in producing theories that enjoy individual predictive successes); this general scientific methodology is theory-dependent in multiple ways; it would be a ‘miracle’ if this methodology were as successful as it is, if the theories on which it depends were not (approximately) true; on the other hand the success of the methodology would be explained if the theories on which it depends were indeed true; and moreover this is the best explanation of the success of that methodology; hence we can infer (by a meta-level ‘abduction’ or ‘inference to the best explanation’) that those theories involved in scientific methodology are indeed (approximately) true.

One thing that seems to have gone unnoticed is that the conclusion that this version of the NMA allegedly validates is not the (likely approximate) truth of those scientific theories that score impressive predictive success (and hence elicit the no miracles intuition): the predictive success of our best theories is the explanandum in this alleged scientific explanation, not the explanans; the explanans (to which we are then allegedly entitled to infer) seems to be the (approximate) truth of the background theories taken to be involved in helping scientific methodology produce those predictively successful theories. This seems strange. But, even laying it aside, much remains obscure. Specifically: what exactly is general scientific methodology supposed to consist in, and what role do these presupposed background theories play in it?
7 Stathis Psillos, Scientific Realism—How Science Tracks Truth. London and New York: Routledge 1999, p. 79.
8 Hilary Putnam, Meaning and the Moral Sciences. Boston: Routledge and Kegan Paul 1978, p. 19.
Boyd, whose views Psillos sees himself as developing, is decidedly unclear. He takes it that Kuhn and others have shown that scientific methods are thoroughly theory-dependent—without indicating exactly how—with, however, two (partial) exceptions. Boyd argues that (a) decisions over which (observable) predicates are ‘projectable’ and (b) assessments of degrees of confirmation of a given theory both significantly depend on “the theoretical claims embodied in …[relevant] background theories” and hence in fact, or so he claims, on the assumption that those background theories are “approximately true”9. Psillos10 elaborates as follows (numbers in brackets added):

Scientists use accepted background theories in order [1] to form their expectations, [2] to choose the relevant methods for theory-testing, [3] to calibrate instruments, [4] to assess the experimental evidence, [5] to choose among competing theories, [6] to assess newly suggested hypotheses, etc.

9 Richard Boyd, “The Current Status of the Scientific Realism Debate”, in: Jarrett Leplin (Ed.), Scientific Realism. Berkeley: University of California Press 1984, pp. 41-82. Quote on p. 59.
10 op. cit., p. 78.

Here [1] seems to amount to Boyd’s point (a), while [2]–[6] are different aspects of Boyd’s claim (b) about ‘degree of confirmation’ being background-knowledge-dependent. What Boyd says about ‘projectability’ is rather abstract, but in so far as it applies to real science, it seems to amount to the (well-rehearsed) point that it is background theories, rather than repeated observations, that generally (though not, I think, universally) tell us which properties generalise (and also, I would add, how they may fail to generalise). So, for example, background theories tell us that all electrons have the same charge—in principle one single experiment can then fix what that charge is, and thus can sanction the generalisation that all electrons have the particular charge -e. Background evolutionary-biological theories tell us how different types of the same species of bird might differ in the colour of their plumage—instead then of observing ravens haphazardly, we investigate male and female ravens, young and mature ravens, ravens living in different geographical locations, etc.; if all those are black, and only if they all are, then we infer that all ravens are black. But this is surely best regarded simply as a process of teasing out the full consequences (invoking, of course, auxiliary assumptions) of those underlying theories and thus of further testing them. Nothing here seems to amount to a method of producing new theories whose further success can be regarded as independent of the success of theories that are already accepted in science.

Much the same point surely holds for Boyd’s claim (b) about assessments of confirmation being dependent on background theories. Undoubtedly science seeks not just theories that are individually successful, but ones that also combine together successfully. A theory that is inconsistent with some already established theory and that is not independently successful will be viewed very differently
from one that is not (yet?) independently successful but is at least consistent with already accepted theories. Notice however that independent empirical success always seems to dominate. The fact that Copernican astronomy failed to cohere with the best available physics was not regarded by the best scientists in the 17th century as a reason to think it any the less well confirmed empirically by the independent successes it enjoyed (with, amongst others, the phenomena of planetary stations and retrogressions), but instead as a reason to look for a new physics that would be coherent with it. And, in any event, this all looks like an account of one aspect of how theories are tested once they have been articulated and nothing like an account of a ‘methodology’ whose reliability in producing successful theories can be assessed.

Finally, if we were (ill-advisedly) to think of the ways that scientists test individual theories against the background of other theories as some sort of method of producing theories, it is altogether unclear how ‘reliable’ that method has been—which theories are we to count? All those that anyone ever dreamed up? Or only those that survive subsequent rigorous testing? It is standard nowadays to hold that more recent philosophy of science has taken us beyond the old Reichenbach-Popper view that the contexts of discovery and of justification are quite distinct. Nowadays it is widely believed that the process of construction of theories can be rationally analysed and is not a “mere matter of psychology” (as Popper put it). But, however much can be said by way of logical reconstruction of how particular theories have been arrived at, still most of the action is at the appraisal stage—that is, the stage where the theory is already ‘on the table’ and is being subjected to stringent tests. And no matter how systematically a theory has been arrived at—by ‘deduction from the phenomena’ or whatever—it will of course be rejected if it fails to score (at any rate eventually) independent empirical success.

I remain unconvinced, then, of the existence of anything that can plausibly be called ‘scientific methodology in general’. Moreover, for all that we claim to have gone beyond Popper, it is surely true that scientists sometimes produce theories simply to try them out, without being in any way committed to the claim that they are likely to be predictively successful/true. Nor, when they turn out not to be, should the production of such tentative theories be thought of as in any way a failure—even if we did identify them as the products of some general ‘scientific method’. To take one example: the idea that the anomalous motion of the perihelion of Mercury might be explained within Newtonian physics by invoking a hitherto undiscovered planet (tentatively called ‘Vulcan’) was of course a perfectly reasonable hypothesis. That hypothesis ‘failed’—in that no evidence of the existence of such a planet could be found. But this was in no sense a failure of ‘scientific method’: science learned that one way of solving the Mercury problem—made plausible by background knowledge in the light of the earlier success with postulating Neptune to explain anomalies in Uranus’s orbit—did not work, and so some other solution would have to be found.
But having convinced himself that the argument for realism must be at the level of some allegedly reliable ‘general scientific methodology’, Stathis Psillos necessarily views such episodes as failures and hence—even in his original treatment—is forced to weaken his position. He admits that science “has encountered many failures”11 and so concludes that “the realist argument [i.e. his NMA] should become more local in scope”12. However, he cannot of course, while remaining consistent with his general position, become totally local—he continues explicitly to deny that the NMA amounts simply to a generalisation of the particular ‘abductions’ concerning particular theories in science. So he seems in the end to adopt the view that “most” products of the scientific method are successful or, perhaps (although he does not himself explicitly invoke probabilities), that the probability of a particular theory produced by the ‘scientific method’ being successful is high. However, an objectivist probabilistic approach to modelling the production of scientific theories here will not work;13 “most” is clearly vague, and in any event we want to be realist not about ‘most’ scientific theories but (selectively) about all those that elicit the no miracles intuition by enjoying striking predictive success (and we should not want to endorse a realist attitude toward those that are not successful in this way).

In some other passages, Psillos weakens the conclusion of his argument still further, claiming that the NMA is meant only to “defend the achievability of theoretical truth”14. Given his endorsement of an externalist epistemology (another aspect of his account with which I fundamentally disagree), this further weakening would only mean that science may deliver some theoretical assertions that are, objectively speaking and independently of what we may or may not (or may or may not rationally) believe, true. But any anti-realist—certainly van Fraassen—can agree with that! And even if we stay ‘internalist’ (as we surely should; ‘externalist epistemology’ has always seemed to me an oxymoron), the weakened claim—which would now mean that science at least on occasion delivers a theoretical assertion which it is reasonable to believe is true (or, again, better: approximately true)—is surely still much too weak to sustain the sort of realism that seems intuitively sustainable. The realist should endorse a realist attitude toward all (and only all) those scientific theories that have been predictively successful.

11 Ibid., p. 80.
12 Ibid.
13 For criticism of such attempts, which however should not have been taken seriously in the first place, see P. D. Magnus and Craig Callender, “Realist Ennui and the Base Rate Fallacy”, in: Philosophy of Science, 71, 2004, pp. 320-338. For more general criticism see John Worrall, “Miracles and Realism”, in: E. Landry and D. Rickles (Eds.), Structure and Theory. Springer 2010 (forthcoming).
14 op. cit., p. 79.

Even if we were to concede that there is such a thing as a scientific methodology and that it has been reliable in producing theories that are predictively successful, the problems for this approach are far from over.
The idea that (i) the best explanation of this success is that the theories that are involved in that method are approximately true and (ii) that we are therefore entitled rationally to believe that those theories are indeed approximately true runs smack into three obvious and fundamental objections. Firstly, despite Putnam’s explicit claim (endorsed by Boyd and seemingly by Psillos), any such explanation cannot count as scientific; secondly, accepting that the argument involves a “philosophical explanation” rather than a scientific one, realism (strictly about the background theories involved in scientific method, remember) by no means clearly qualifies as even the best philosophical explanation; and thirdly, the argument is surely circular.

Even if we conceded that ‘science in general’ (or at least ‘mature science’ in general) had been ‘successful’, how could this proposed grand, meta-level ‘abduction’ or ‘inference to the best explanation’ possibly count as a scientific explanation of that ‘success’? Scientific explanations require independent testability. Is the NMA independently testable? The nearest it might come, so far as I can tell, is via the ‘prediction’ that the next theory produced by the ‘scientific method’ will be predictively successful. (The ‘prediction’ that the next theory will be (approximately) true cannot of course count. Testable predictions need to be testable! ‘Predictive success’ is an effective notion, but truth or approximate truth is not.) But this ‘prediction’ (a) could easily be false without realism thereby being at all challenged or undermined: not all of the theories actually produced in science are successful and hence there is no realist case for them being true (some of them are not even intended (necessarily) to be candidates for truth); and (b), if it refers to theories that are actually accepted in science, as opposed just to proposed or considered, then it is no testable ‘prediction’ at all, but instead a foregone conclusion: no theory would be accepted in (mature) science unless it were predictively successful and indeed more successful than its predecessor.

Suppose it is claimed instead that realism is a better philosophical explanation of the success of science than its rivals—presumably because it possesses some ‘explanatory virtue’ different from that of empirical testability. I have many doubts about the whole notion of explanation when not directly related to empirical testability—and to talk in this way seems simply to re-express the no miracles intuition in an obscure and misleading way. (Indeed Psillos admits15 that it is wrong to expect that inference to the best explanation will be an inference that fits some “logical template”; but then again one wonders why, in that case, it is supposed to be any sort of real logical inference that takes us beyond intuition.)

15 Stathis Psillos, “The Fine Structure of Inference to the Best Explanation”, in: Philosophy and Phenomenological Research 74, 2007, pp. 441-8.

And even if trade in ‘philosophical explanation’ is permitted, why exactly should realism be thought of as a better ‘philosophical explanation’ of science’s success in successfully predicting new types of phenomena than, say, the constructive empiricist ‘explanation’? This, mirroring Psillos’ approach, would presumably claim that scientific method has been successful because the background
theories that it presupposes are empirically adequate. If Psillos’ realist argument counts as a ‘philosophical explanation’ of science’s success, then it is difficult to see why the constructive empiricist one should not. On what grounds, then, could the realist claim hers to be the better explanation? Presumably only on the ground of the logical strength of the ‘explanans’. It is of course true that the realist claim that a theory is (let’s say, strictly) true is logically stronger than the constructive empiricist claim that the theory is ‘fully’ empirically adequate, and the suggestion is that we should always prize extra content in explanations (provided of course the extra strength does not lead to empirical refutation—no problem in this case). But here I am in sympathy with van Fraassen16 and Fine17—given that this extra content is in no way testable, this is exactly the sort of pseudo-‘deeper explanation’ that we should shun. We only prize (or only ought to prize) extra content when it leads to independently checkable predictions. Psillos explicitly claims that Fine’s ‘explanation’ of success in terms of empirical adequacy is to be dispreferred because invoking the instrumental reliability of science to explain its instrumental reliability is no sort of explanation at all. But neither is the realist ‘explanation’! Following Psillos in using the hackneyed example: he complains that Fine is in the position famously ridiculed by Molière. But is the claim that opium is sleep-inducing because it has dormitive virtue, and moreover this virtue was given it by God, any better an explanation than the original that just invokes dormitive virtue? And isn’t the realist simply adding a non-testable add-on extra (the truth of the theory) in a completely analogous way? Explanatory brownie points are not awarded for adding content unless the extra content leads to extra testability.

Finally, the grand meta-level ‘explanationist defence’ of realism is circular and therefore question-begging. In essence, the explanationist defence uses inference to the best explanation to defend inference to the best explanation! Realism is the claim that our best scientific theories, which are therefore presumably the best explanations we have, are reasonably regarded as approximately true on the basis of their success in predicting new phenomena. So the realist scientist endorses inference to the best explanation concerning particular theories; and when her realism is challenged, she is being encouraged by Psillos to respond that realism is the best position because it is the best explanation (now of the supposed general success of scientific method). But how could this possibly be convincing to a sceptic? If she accepted inferences to the best explanation she would not have been a sceptic in the first place! As Fine18 put it, the ‘explanationist defence’ carries no weight because it involves “the very type of argument whose cogency is the question under discussion”.

16 op. cit.
17 Arthur Fine, “Unnatural Attitudes: Realist and Instrumentalist Attachments to Science”, in: Mind, 95, 1986, pp. 149-179.
18 Arthur Fine, “Piecemeal Realism”, in: Philosophical Studies 61, 1991, pp. 79-96. Quote on p. 82.
20
John Worrall
Fine’s objection is an obvious one and so unsurprisingly has been made by a number of others (e.g. by Larry Laudan19). Psillos tried to avoid accepting its obvious correctness20 by drawing a distinction (originally used by Braithwaite21 in the (similarly doomed) attempt to argue that inductive justifications of induction are perfectly cogent) between ‘rule circularity’ and ‘premise circularity’. If an argument for some conclusion c includes c as a premise, then the argument is ‘viciously circular’; but Psillos22 endorses Braithwaite’s opinion that ‘rule circular’ arguments are not vicious. An argument is ‘rule circular’ if it employs a rule of inference, in taking us from its premises to its conclusion, that is justifiable as a truth-transferring rule only if certain assumptions, including the conclusion c itself, are themselves true. But surely, so far as the cogency of an argument goes, the only question is whether it is circular—the ‘vicious’ qualifier is just hot air! There seems to be complete equivalence between premise and rule circularity. In particular, any premise circular argument for c can be made rule circular quite trivially: remove c from the list of premises, and, for example, add an extra rule that says you can infer X & c from any derivable statement X. Given this, how could we possibly be (rationally) less concerned about a rule circular argument than a premise circular one?

While continuing to maintain that there is an important difference between premise and rule circularity, Psillos has importantly modified his position in later writings. He now seems to admit that scientific realism is not a scientific explanation of anything: “The problem lies in the thought that scientific realism can be supported by the same type of argument that scientific theories are supported [by]. This is a tempting thought. But it is flawed I now think.”23 (Notice however that this does not render the above criticisms redundant since it is still Psillos’s view that the NMA is to be articulated and defended as a grand meta-level ‘abduction’.) His view now is that the NMA “presupposes rather than establishes the realist frame[work]. Still within the realist framework, the NMA has an important role to play and this … is to offer a vindication of [inference to the best explanation].”24

19 Larry Laudan, “A Confutation of Convergent Realism”, in: David Papineau (Ed.), The Philosophy of Science. Oxford: Oxford University Press 1996, pp. 139-165.
20 Stathis Psillos, Scientific Realism—How Science Tracks Truth, op. cit.
21 Richard B. Braithwaite, Scientific Explanation: A Study of the Function of Theory, Probability and Law in Science. Cambridge: Cambridge University Press 1953.
22 op. cit., p. 82.
23 Stathis Psillos, “Choosing the Realist Framework”, in: Synthese, DOI 10.1007/s11229-009-9606-9. Published online 30 June 2009, p. 11.
24 Ibid. This could just be seen as an elaboration of his view in Scientific Realism (p. 89): “In the final analysis, we just have to rely on some basic methods of inquiry. The fact that we have to make recourse to rule-circular arguments in order to defend them, if defence is necessary, is both inescapable and harmless.”

Well, aside from the fact that no one surely ever thought that the argument establishes realism (as opposed to giving it some rational support), this new
position seems to be an endorsement of the circularity charge rather than a rejoinder to it. You will, this new position allows, be moved by the NMA only if you are already an advocate of inference to the best explanation and hence already a realist. That is, surely, you won’t be moved, objectively speaking, at all. But psychologically speaking the realist may gain extra confidence by chanting the NMA—even though it can be no news to her objectively speaking. But while preaching to the converted may make the preacher and the converted feel good, the truly converted need no preaching! Having accepted that the NMA is not an argument in favour of realism, it is difficult to see how, in his later interpretation, it is even any sort of consideration in favour of realism—and certainly impossible to see it as a “vindication” of inference to the best explanation (see above quote).

Psillos now asserts25 that “the original decision to accept [the realist] framework [or any other framework while] not arbitrary [is] not a matter that answers to truth or falsity”. It is difficult to see exactly what ‘non-arbitrary’ means here, but certainly it seems that this new position allows that someone might happen to be a realist but could equally well have chosen a rival framework—say the constructive empiricist one—and not have been in any sense wrong to do so; and had she made that alternative choice then the NMA would have nothing to say to her.

In contrast, the no miracles intuition favoured by Poincaré, Duhem and myself is at least intended to speak across frameworks. It is exactly the predictive success of some particular scientific theories that seems, whatever your initial philosophical point of view, ineluctably to elicit the feeling that the theory must have somehow ‘latched on to’ the deep structure of the universe (without of course being able to say exactly how). This obviously cannot ‘establish’ realism, but it does provide very modest support for a very modest version of scientific realism—in no stronger a sense than that it sets some version of realism as the default position. This may not seem a lot, but we cannot reasonably expect anything more. We were certainly never going to get anything more from the No Miracles Argument and, as I have argued in this paper, nothing more is exactly what we get.
25 Stathis Psillos, “Choosing the Realist Framework”, op. cit., p. 6.

LSE
Houghton Street
London WC2A 2AE
UK
[email protected]
STATHIS PSILLOS
THE SCOPE AND LIMITS OF THE NO MIRACLES ARGUMENT1
In this paper, I review the scope and limits of the no miracles argument. I defend and, where necessary, revise my account of it as a way to justify Inference to the Best Explanation (IBE).
1

I have argued in my (1999, chapter 4) that the no miracles argument (NMA) should be seen as a grand IBE. The way I read it, NMA is a philosophical argument which aims to defend the reliability of scientific methodology in producing approximately true theories. More specifically, I took it that NMA is a two-part (or two-stage) argument. Here is its structure.

NMA
(A)
(A1) Scientific methodology is theory-laden.
(A2) These theory-laden methods lead to correct predictions and experimental success (instrumental reliability).
How are we to explain this?
(C1) The best explanation (of the instrumental reliability of scientific methodology) is this: the statements of the theory which assert the specific causal connections or mechanisms in virtue of which methods yield successful predictions are approximately true.2
(B)
(B1/C1) Theories are approximately true.
(B2) These background scientific theories have themselves been typically arrived at by abductive reasoning.
(C2) Therefore, (it is reasonable to believe that) abductive reasoning is reliable: it tends to generate approximately true theories.

1 Many thanks to two anonymous readers for comments.
2 This somewhat cumbersome formulation is meant to highlight that the application of IBE should be local and selective rather than global and undifferentiated. Only those parts of a theory that do play some role in the generation of the theory’s successes do get some credit from the explanation of these successes. For all practical purposes, however, the conclusion (C1) might be abbreviated thus: the best explanation of the instrumental reliability of scientific methodology is that background theories are approximately true.

Given this structure, it is clear that NMA aims to defend the reliability of IBE as a mode of reasoning. Note that the explanandum, viz., the instrumental reliability of scientific methodology, is distinct from the explanans, viz., the reliability (in Goldman’s sense) of inference to the best explanation. As Arthur Fine aptly put it, instrumental reliability is a feature of scientific theories in virtue of which they are ‘useful in getting things to work for the practical and theoretical purposes for which we might put them to use’ (1991, 86). It has mostly to do with getting predictions right and with leading to empirical successes. Reliability, in the sense it is understood by epistemological externalists, is a property of a method by virtue of which it tracks the truth—that is, it tends to generate true conclusions when fed with true premises. This important distinction is confused in Jacob Busch’s (2008), and a lot of what follows misses the point.

It is transparent that the NMA has two conclusions (one for each part of it). The first (C1) is that we should accept as (approximately) true the theories that are implicated in the (best) explanation of the instrumental reliability of first-order scientific methodology. The second (C2) is that since, typically, these theories have been arrived at by means of IBE, IBE is reliable (truth-conducive). Both conclusions are necessary for fulfilling the aim of NMA.
2

A straightforward observation is that sub-argument (B) is not circular. It is not an instance of IBE, anyway. (B2) is a factual premise: it is meant to state a fact about how theories have been generated and accepted. It can certainly be contested. But all that matters for (C2) to follow is this: If (C1) is true, then given a factual premise (B2), (C2) is true. There is a missing premise, of course, viz., that if a method yields approximately true theories, this method is reliable. But this is how reliability is understood.

It might be objected that all that is shown—at best—is that IBE has been reliable; not that it will be; and hence, not that it is reliable, simpliciter. That is fine, however. No-one claims that the problem of induction is thereby solved.

Suppose, for the sake of the argument, that (B) has a different form. For instance, here is a reconstruction of it, motivated by Busch’s (2008).

(B*)
(C1/B1) Theories are approximately true.
(B2*) These theories have been typically produced by (or have been accepted on the basis of) IBE.
What is the best explanation of the connection between IBE and truth?
(C2) That IBE is reliable.

I do not think this is the right way to reconstruct my argument, the reason being that the claim that IBE is reliable does not explain (in this context) the connection between IBE and truth; it just sums it up. But even if it were the right way, (B*) would be an instance of IBE and not ipso facto circular.

How about (A) then? This is not circular either. It is an instance of IBE, but there is no reason to think that instances of IBE, in and of themselves, are circular. This clearly isn’t. Yet, the argument as a whole has an air of circularity. It employs/uses IBE while its (second) conclusion states that IBE (the rule or method employed at least partly for the generation of this conclusion) is reliable. Well and truly. Is this circularity vicious?
3

Vicious circularity is an epistemic charge—a viciously circular argument has no epistemic force. It cannot offer reasons to believe the conclusion. It cannot be persuasive. This has to be right. If the charge of circularity were logical and not epistemic (if, that is, a circular argument lacked validity altogether and not just epistemic force), all deductive arguments would be viciously circular. There is an obvious sense in which all deductive arguments are such that the conclusion is ‘contained’ in the premises—and this grounds/explains their logical validity. Hence, deductive arguments can be circular without being viciously circular. And similarly, some deductive arguments are viciously circular (without thereby being invalid)—for instance: if Socrates is mortal, then Socrates is mortal; Socrates is mortal; therefore Socrates is mortal.

Premise-circularity (where the conclusion is explicitly one of the premises) is always and everywhere vicious! It cannot possibly have any epistemic force for someone who does not already accept the conclusion. NMA, insofar as it is circular, is not premise-circular. (C2) is not among the premises of (B). And (C1) is not among the premises of (A).

There is, however, another kind of circularity. This, as Braithwaite (1953, 276) put it, “is the circularity involved in the use of a principle of inference being justified by the truth of a proposition which can only be established by the use of the same principle of inference”. It can be called rule-circularity. In general, an argument has a number of premises P1, …, Pn. Qua argument, it rests on (employs/uses) a rule of inference R, by virtue of which a certain conclusion Q follows. It may be that Q has a certain content: it asserts or implies something about the rule of inference R used in the argument; in particular that R is reliable. So: rule-circular arguments are such that the argument itself is an instance, or involves
essentially an application, of the rule of inference whose reliability is asserted in the conclusion. If anything, NMA is rule-circular (though in an oblique sense). Part (A) yields a conclusion (C1), such that it, together with another premise (B2), yields another conclusion (C2), whose content is that the rule by means of which (C1) was arrived at is reliable. The pertinent question is whether rule-circularity is vicious. Obviously, rule circularity is not premise-circularity. But, one may wonder, is it still vicious in the sense of lacking any epistemic force?

In my (1999), I tied this issue to the prospects of epistemological naturalism and externalism. In effect, I argued that NMA proceeds within a broad naturalistic framework in which the charge of circularity loses its bite because what is sought is not justification of inferential methods and practices (at least in the neo-Cartesian internalist sense) but their explanation and defence (in the epistemological externalist sense). It’s not as if NMA should persuade a committed opponent of realism to change sides. But it can explain, to all those who employ IBE, in virtue of what it is reliable; and it can possibly sway all those who are neutral on this issue.

I now think, however, that this kind of externalist defence of NMA is too narrow. What we should be after are reasons to believe that IBE is reliable (and not just an assertion to the effect that if indeed IBE is reliable, and we are externalists about justification, we are home and dry). Externalism does have a point. Reliability is a property of a rule of inference which the rule possesses (or fails to possess) independently of the reasons we have for thinking that it does (or does not). This is the point behind my claim that “NMA does not make IBE reliable. Nor does it add anything to its reliability, if it happens to be reliable” (1999, 83). Where I was wrong was in what immediately followed: “[NMA] merely generates a new belief about the reliability of IBE which is justified just in case IBE is reliable”. NMA does generate a new belief (about the reliability of IBE) but this belief is not justified “just in case IBE is reliable”. This is too externalist. I now think that NMA justifies this belief too. To see this, let us ask the broader (and interesting) question: can IBE be justified?
4

Obviously, this question has a fine structure. It depends on how exactly we understand IBE and how exactly we understand the call for justification. I have dealt with the first issue in some detail in my (2007). So I will limit myself to a few general comments towards the end of the paper. Let me focus on the second issue and let us ask again: can IBE be justified? If the answer is no, we end up with inferential scepticism. If the answer is yes, there are two options: non-inferential justification and inferential justification. A non-inferential justification of IBE, if
possible at all, would have to rely on some a priori rational insight. An inferential justification of IBE would have to rely on some rule of inference. There are obvious problems with all three options.

1. Scepticism leaves us in an inferential vacuum, which is hardly plausible.
2. Non-inferential justification presupposes something whose existence is dubious (rational insight).
3. Inferential justification has to rely on a rule of inference. If the rule is distinct, there is the issue of how the two rules are inferentially connected. If the rule is the self-same, we end up in rule-circularity.

The good news is that this is not a conceptual tangle that arises only in the case of IBE. It spills over to more basic forms of ampliative reasoning as well as to deductive logic. So IBE is in good company. Let’s call this ‘the good company argument’.
5

In the case of the justification of modus ponens (or any other genuinely fundamental rule of logic), if logical scepticism is to be forfeited, there are two options available: either non-inferential justification or inferential (rule-circular) justification. There is no non-inferential justification of modus ponens. Therefore, there is only rule-circular justification. Indeed, any attempt to justify modus ponens by means of an argument has to employ modus ponens itself (see Dummett 1974). Why is there no non-inferential justification of modus ponens? There are many routes to this conclusion, but two stand out. The first is Quine’s argument against basing logic on conventions; the second is that if non-inferential justification is meant to amount to default-reasonableness, we may well end up with a form of relativism, since what counts as default-reasonable might vary from one community to another. (For more on this, see Boghossian 2000). It follows that the rule-circular justification of IBE is in good company—with all basic forms of reasoning (including, of course, enumerative induction).
6

But couldn’t any mode of reasoning (no matter how crazy or invalid) be justified by rule-circular arguments? Take for instance what may be called (due to Igor Douven) Inference to the Worst Explanation:

(IWE)
Scientific theories are generally quite unsuccessful.
These theories are arrived at by application of IWE.
What is the worst explanation of this?
That IWE is a reliable rule of inference.

Let’s call this, following Boghossian (2000, 245), the bad company objection. How can it be avoided? The reply here is that the employment of rule-circular arguments rests on or requires the absence of specific reasons to doubt the reliability of a rule of inference. We can call this the Fair-Treatment Principle: a doxastic/inferential practice is innocent until proven guilty. This puts the onus on those who want to show guilt. I take this to be a fundamental epistemic principle. To motivate it properly would require much more space than I have now. But the basic idea is this. Traditional foundationalism has been tied to active justification, viz., to the active search for reasons for holding a belief. So any belief is suspect unless there is some good reason to hold it. The search for independent reasons for holding the belief is then necessary for its justification, since without them there is no way to ensure that the belief is rationally held. There are many reasons why active justification is too strong a condition on the rationality of belief. But in any case, there is an alternative picture of epistemology, what Gilbert Harman (1999) has called ‘general conservatism’. According to this picture, no belief requires active justification in the absence of well-motivated objections to it. The rationale for this is that justification has to start from somewhere and there is no other point to start apart from where we currently are, that is, from our current beliefs and inferential practices. Accordingly, unless there are specific reasons to doubt the reliability of IBE, there is no reason to forego its uses in justificatory arguments. Nor is there reason to search for an active justification of it. Things are obviously different with IWE, since there are plenty of reasons to doubt its reliability, the chief being that typically the worst explanations (whatever that means) of the explananda are not truthlike; not to mention the fact that the first premise of IWE is false. It may be further objected that even if the Fair-Treatment Principle permits the employment of certain inferential rules, it fails to give us reasons to rely on them. I am not sure positive reasons, as opposed to the absence of reasons to doubt, are required for the employment of a rule. But in any case, it can be argued that there are some proto-reasons for the use of certain basic inferential rules. Do not forget that our basic inferential rules (including IBE, of course) are rules we value. And we value them because they are our rules, that is, rules we employ and rely upon to form beliefs. Part of the reason why we value these rules is that they have tended to generate true beliefs—hence we have some reason to think they are reliable, or at least more reliable than competing rules (say IWE). So even if it is accepted that the employment of rule-circular arguments in favour of an inferential rule does require the presence of reasons to take the rule seriously, there are such reasons.
7

We can pursue the issue of justification by means of rule-circular arguments a bit further, by raising the issue of whether there are intuitive constraints on justification which rule-circular arguments violate. Suppose one were to say:

(J) No use of rule R is justified unless X.

What could X be such that rule-circular arguments violate it? The only plausible candidate for X which would be violated by a rule-circular argument is: R’s reliability-relevant properties are proved/supported by an independent argument. So

(J*) No use of rule R is justified unless R’s reliability-relevant properties are proved or supported by an independent argument.

Even then, there is a sense in which a rule-circular argument is an independent argument, since it can have epistemic force for someone who has no views about the rules they employ. In other words, an independent argument need not be an argument of a different form. Still, this is weak independence, since the users of R are disposed to use it, even if they have no views about it. What if we opted for a strong sense of independence?

(SI) An argument for the reliability-relevant properties of R is strongly independent if it is either different in form from R or it can sway someone who is not already disposed to using R to start using it (or to acquire this disposition).

Note that the first disjunct of this condition is question-begging. But, suppose it is not. If we take it seriously, as noted already, it would be impotent as a criterion for the justification of a basic rule of inference, since no basic inferential rule can be justified by the application of another (distinct in character) rule. Inferential scepticism would follow suit. To see this, reflect on the following claim: no use of memory is justified unless the memory’s reliability-relevant properties are proved/supported by a non-memory-based argument. Whatever this supposedly independent argument might look like, it will have to be, ultimately, memory-based, since it has to be remembered! The second disjunct of (SI) is moot. A rule-circular argument might (conceivably) sway someone to become disposed to use this very rule. Alternatively, why should it be an intuitive requirement on justification of an inferential rule that it can rationally force someone to start using the rule? Suppose we do require something like this. Let’s call it a condition of extra strong independence.
(ESI) An argument for the reliability-relevant properties of R is extra strongly independent if and only if it can be used to sway a sceptic about R.

(ESI) is clearly not an intuitive constraint on justification, unless massive inferential scepticism is an intuitive position—which it is not. Note, a propos, that nowhere is it said or implied that the use of a rule R is (or should be) rationally compelling—at least if by that it is meant that there are (or should be) arguments for R that can sway the sceptic. But, clearly, the use of a rule R and its justification on the basis of a non-sceptic-suasive rule-circular argument are rationally permitted.
8

In a recent piece Valeriano Iranzo (2008) has raised further objections to my formulation of the NMA. He grants part (A) of the argument (see section 1 above), but claims that my part (B) could in fact be replaced by the following:

(I)
(I1) Background theories are approximately true (a fortiori, they are approximately empirically adequate).
(I2) Background theories have been arrived at by IBE.
(*) An inference is instrumentally reliable iff it yields a high rate of empirically adequate conclusions.
(I3) Therefore, IBE is instrumentally reliable.

This, he argues, is a version of NMA suitable for anti-realists, since it rests on a weaker premise (*) and draws a weaker conclusion than (B). Clearly (*), qua a definition of instrumental reliability, is weaker than the definition of reliability required for (B). Iranzo takes it that this anti-realist version of NMA is broadly within the reliabilist camp, since it defends the instrumental reliability of IBE. But then he goes on to claim that once (I) is seen as an option, the conclusion (I3) should lead us to replace the first premise (I1) with the following weaker premise:

(I1*) Background theories are empirically adequate.

There is something strange going on here. Iranzo’s anti-realist NMA is a self-undermining argument. Its conclusion (I1*) weakens one of the premises that led to it, viz., (I1). If Iranzo grants part (A) of the argument, as he says he does, there are reasons to accept premise (I1), [C1 in my formulation of (A) in section 1], which in fact is the conclusion of (A). That is, there are reasons (best-explanation based reasons) to accept that background theories are approximately true and a fortiori that they are approximately empirically adequate. But, by plugging (I1) into (I) we are entitled, according to Iranzo, only to part of the content of the
premise (I1), viz., its part which has to do with the empirical adequacy of theories. I think this situation borders on incoherence. The point is not that we may find out that only part of the content of a premise was necessary for the derivation of a certain conclusion. This is fine, of course. The problem is that the very reason for holding the weakened premise (I1*) was the stronger premise (I1). In particular, the very reason for holding that background theories are approximately empirically adequate is that this follows from the conclusion of (A) that they are approximately true. So, I doubt that (I) is a coherently formulated argument. There is a coherent anti-realist version of NMA, but to see it we need to change part (A) of the argument too. So:

AR-NMA

(AR-A)
(A1) Scientific methodology is theory-laden.
(A2) These theory-laden methods lead to correct predictions and experimental success (instrumental reliability).
How are we to explain this?
(AR-C1) The best explanation (of the instrumental reliability of scientific methodology) is that background theories are (approximately) empirically adequate.

(AR-B)
(AR-B1) Background theories are (approximately) empirically adequate.
(I2) Background theories have been arrived at by IBE.
(*) An inference is instrumentally reliable iff it yields a high rate of empirically adequate conclusions.
(I3) Therefore, IBE is instrumentally reliable.

Note that (AR-NMA) takes it that (A) defends empirical adequacy as the best explanation of the instrumental reliability of background theories. Then, it proceeds by drawing the further (weaker) conclusion that IBE is instrumentally reliable. What is wrong with (AR-NMA)? If we take seriously the obligation/permission distinction noted above, it is a rationally permitted argument. However, in this coherent formulation of (AR-NMA), the issue between it and NMA is whether (C1) or (AR-C1) is the best explanation of the instrumental reliability of background theories. In other words, is truth or empirical adequacy the best explanation? Not much progress can be made on this front—though I still think that truth is a better explanation than empirical adequacy, for the reasons noted already in my (1999, chapter 4). Perhaps some progress can be made if we take a different line of argument. (AR-NMA) wavers between two formulations, depending on how we read premise (AR-B1):
(AR-B1*) Background theories are, at least, empirically adequate.
(AR-B1**) Background theories are, at most, empirically adequate (and false).

Obviously (AR-B1*), plugged into (AR-B), licenses the conclusion that IBE is at least instrumentally reliable. This opens up the further question of whether it is reliable, simpliciter, and what follows is that (AR-NMA) becomes compatible with the realist NMA. Equally obviously, (AR-B1**), plugged into (AR-B), licenses the conclusion that IBE is at most instrumentally reliable and unreliable simpliciter. But then it becomes flatly question-begging.
9

Is the rule-circular justification of IBE empty? It might be argued that a rule-circular justification cannot possibly fail to justify IBE. Let’s try to capture this by the not-sure-thing principle: A rule-circular justification of a rule R should be such that the proof of the possession by the rule of the reliability-relevant properties should not be a sure thing. In other words, there should be at least a prima facie risk that the rule-circular argument won’t vindicate R. I am not sure this is a good principle, but that NMA does take some risk is obvious—especially since (C1) might not be the right conclusion to draw. The very possibility of (AR-NMA) shows that there is a choice between (C1) and (AR-C1) and that whether one or the other offers the best explanation of the instrumental reliability of scientific methodology is a substantive issue. Besides, B2 might be contested: it’s a substantive truth, if anything, that theories have been arrived at by IBE. So, C2 might be false. Another related worry might be that the foregoing rule-circular justification is empty because it has no probative epistemic force. Let’s capture this by the why-bother-to-play-the-game principle: A rule-circular justification of a rule R should be such that it has some epistemic force vis-à-vis those who are either disposed to use R or are not full-blown sceptics about R. As noted already, the NMA does have some such force. It explains the presence and stability of these dispositions, systematises what inferential patterns they follow and explains why they are reliable.
10

In his critique of my version of NMA, John Worrall (this volume) takes the line that NMA is not an argument proper but rather an intuition—the one that drives home the idea that it is incredible or unlikely that theories yield novel predictions and yet they are entirely off the mark. I certainly agree with Worrall that novel predictions play a crucial role in NMA and the defence of realism. But is it right to claim that NMA is nothing over and above the union of several individual abductions—viz., those abductions that have led to the adoption of individual scientific theories? I am afraid it is not. Part (B) of NMA is necessary because clearly the union of first-order abductions says nothing of their reliability. Part (B) offers fresh information about the reliability of IBE—and hence of its properly employed instances. Moreover, part (B) supports certain counterfactuals which are not licensed by the union of first-order IBEs. Given (B), it could be argued, for instance, that had scientists adopted hypothesis X instead of hypothesis Y on the grounds that X offered (even by their lights) a better explanation of evidence than Y, they would have been closer to the truth. In any case, is the no-miracles an intuition or an argument? That it is not an argument rests only on prejudice; on thinking that only deductive arguments are proper arguments. NMA might not be a good argument (though I think it is). But insofar as there is such a thing as ampliative reasoning, NMA is a proper defeasible argument.
11

The NMA has played a key role in the explanationist defence of realism. I now think, however, that the defence it offers to realism is limited. The NMA is not an argument for scientific realism; that is, it’s not an argument for the truth of realism. The details are offered in my (forthcoming). Scientific realism is not a theory; it’s a framework which makes possible certain ways of viewing the world. Scientific realism lacks all the important features of a scientific theory. So, the problem lies in the thought that scientific realism can be supported by the same type of argument by which scientific theories are supported. This is a tempting, but flawed, thought, the reason being that the very idea of counting empirical success as being in favour of the truth of a scientific theory—the very idea of evidence making a theory probable, or the very idea that a theory is the best explanation of the evidence, and the like—presupposes that theories are already placed within the realist framework. For the NMA to work at all, it is presupposed that explanation—and in particular explanation by postulation—matters and that scientific theories should be assessed and evaluated on explanatory grounds. Hence, the no miracles argument works within the realist framework; it’s not an argument for it. It presupposes rather than establishes the realist framework. Still, within the realist framework, NMA has an
important role to play, and this, as I still think, is to offer a justification of inference to the best explanation.
12

In what has preceded, I have taken IBE for granted without explaining what exactly it is and how it works. Here again, the details are offered in my (2009). To a good approximation, IBE has the following abstract form:
• D is a collection of data (facts, observations, givens).
• H explains D (would, if true, explain D).
• No other hypothesis can explain D as well as H does.
• Therefore, H is probably true.
But it is best to see IBE as an inferential genus. The several species of the genus IBE are distinguished, among other things, by plugging assorted conceptions of explanation in the reasoning schema that constitutes the genus. For instance, if the relevant notion of explanation is causal, IBE becomes an inference to the best causal explanation. Or, if the relevant notion of explanation is subsumption under laws, IBE becomes a kind of inference to the best nomological explanation, and so forth. What is also worth stressing is that first-order IBE-type of reasoning has a fine-structure that is shaped, by and large, by the context. For instance, the context can settle what the relevant explanatory relation is; it can (in most typical cases) determine the ranking of the rival explanations; it can settle what assumptions must be in place for the best explanation to be acceptable; it can settle what to watch out for (or search) before the best explanation is accepted (e.g., the absence of certain possible defeaters). Contextual factors can link explanation and truth quite closely because, far from aiming to forge an abstract connection between them, they make this connection stand or fall together with the richness and specificity of the relevant information available. The key idea behind IBE is that explanatory considerations guide inference. But, within a more contextual approach, there is a sense in which the crucial conclusion (C1) of NMA acquires extra strength. For now (C1) gets support from countless context-dependent first-order abductions which yield true beliefs about the micro-constituents of the world.
REFERENCES
Busch, Jacob. ‘No New Miracles, Same Old Tricks’, in: Theoria, 74, 2008, pp. 102-114.
Boghossian, Paul. ‘Knowledge of Logic’, in: P. Boghossian and C. Peacocke (eds.), New Essays on the A Priori. Oxford: Oxford University Press 2000.
Braithwaite, R. B. Scientific Explanation. Cambridge: Cambridge University Press 1953.
Dummett, Michael. The Justification of Deduction. British Academy Lecture, Oxford: Oxford University Press 1974.
Fine, Arthur. ‘Piecemeal Realism’, in: Philosophical Studies, 61, 1991, pp. 79-96.
Harman, Gilbert. Reasoning, Meaning and Mind. Oxford: Oxford University Press 1999.
Iranzo, Valeriano. ‘Reliabilism and the Abductive Defence of Scientific Realism’, in: Journal for General Philosophy of Science, 39, 2008, pp. 115-120.
Psillos, Stathis. ‘Choosing the Realist Framework’, in: Synthese, DOI 10.1007/s11229-009-9606-9, forthcoming.
Psillos, Stathis. Knowing the Structure of Nature. London: MacMillan-Palgrave 2009.
Psillos, Stathis. ‘The Fine Structure of Inference to the Best Explanation’, in: Philosophy and Phenomenological Research, 74, 2007, pp. 441-448.
Psillos, Stathis. Scientific Realism: How Science Tracks Truth. London & New York: Routledge 1999.
Dept. of Philosophy and History of Science University of Athens University Campus 15771 Athens Greece
[email protected]
GREGORY WHEELER AND RICHARD SCHEINES
CAUSATION, ASSOCIATION AND CONFIRMATION
ABSTRACT Many philosophers of science have argued that a set of evidence that is “coherent” confirms a hypothesis which explains such coherence. In this paper, we examine the relationships between probabilistic models of all three of these concepts: coherence, confirmation, and explanation. For coherence, we consider Shogenji’s measure of association (deviation from independence). For confirmation, we consider several measures in the literature, and for explanation, we turn to Causal Bayes Nets and resort to causal structure and its constraint on probability. All else equal, we show that focused correlation, which is the ratio of the coherence of evidence and the coherence of the evidence conditional on a hypothesis, tracks confirmation. We then show that the causal structure of the evidence and hypothesis can put strong constraints on how coherence in the evidence does or does not translate into confirmation of the hypothesis.
1. INTRODUCTION

It has long been held that “coherent” evidence confirms a hypothesis which explains the coherence of such evidence. For example, if five of your friends come down with a stomach ailment on Tuesday after going to the same restaurant on Monday evening without even seeing each other, then this powerfully confirms the hypothesis that they got food poisoning from that restaurant. The hypothesis explains the evidence because the food served by the restaurant is a common cause of the five separate stomach ailments. Alternatively, if we know that the five friends did not eat at the same restaurant on Monday, but we do know that all met each other and either shook hands or kissed at one time or another during the day, then the co-occurrence of their Tuesday stomach trouble is no evidence at all for restaurant induced food poisoning; the explanation is quite different: they passed a stomach virus from one to the other. The difference between the cases is not the “coherence” of the evidence, but rather the causal explanation of this coherence. In the first case the association of their stomach ailment is due entirely to the influence of a common cause, and in the second case it is due entirely to a causal relationship directly between the friends. In this paper we discuss more systematically and formally the connections between 1) the causal relationships between a hypothesis under consideration and
evidence relevant to the hypothesis, 2) the coherence of this evidence, and 3) the degree to which the evidence confirms the hypothesis. Bayesian epistemologists have given us probabilistic models of both coherence and confirmation, and philosophers, computer scientists, and statisticians have given us probabilistic models of causal systems.1 Our task is to show how these pieces might or might not fit together. In this paper, our focus is not on coherence, confirmation, or how they are to be modeled, but rather on the relationship between coherence and confirmation and how this relationship might be mediated by the causal relationships among the hypothesis and the evidence. In particular, we will consider situations in which the pieces of evidence in two sets of evidence are equally relevant to a hypothesis individually, the coherence of the sets differ, but the causal structure is identical, as well as situations in which the pieces are equally relevant, the sets are equally coherent, but the causal structure is different. These cases will hopefully bring out the role that causal structure plays in mediating the relationship between coherence and confirmation. The organization of the paper is as follows. In section 2 we review coherentism from the perspective of Bayesian epistemology. In section 3 we introduce the assumptions and models we will use to give structure to notions of coherence and confirmation. In section 4 we discuss the relationship between coherence and confirmation simpliciter. In section 5 we introduce the idea of focused correlation (Wheeler 2009) and extend results connecting coherence to confirmation through focused correlation. In section 6 we present the case for making causal beliefs explicit, and trace several consequences for the relationship between coherence and confirmation that arise solely from the causal structure governing the evidence and hypothesis.
2. COHERENCE AS PROBABILISTIC ASSOCIATION

In 1985, Laurence BonJour2 gave some structure to the coherence theory of justification by laying out criteria for a theory of coherence. (He later despaired of meeting those demands and quit the theory altogether, but that is another story.3) His postulates for coherentism mention a role for probability, and several authors, operating under the heading of “Bayesian epistemology”, have explored the prospects of developing a probabilistic measure of coherence along his basic outline.4
1 Models of coherence or confirmation, or the relation between them, are discussed by Bovens and Hartmann, 2003a,b, 2006; Douven, 2007; Fitelson, 2003; Glass, 2006; Meijs, 2004; Olsson, 2002; Shogenji, 1999; Wheeler, 2009. Causal Bayes Nets, the probabilistic model of causal systems now standard in computer science and statistics, are discussed in Pearl, 2000; and in Spirtes, Glymour, and Scheines, 2000.
2 BonJour (1985, pp. 95-9).
3 BonJour (2002, pp. viii, 202).
4 See, for example, Huemer 1997, Cross 1999, Shogenji 1999, Bovens & Hartmann 2003a, 2003b, 2005; Olsson 2002, 2005; Fitelson 2003; Meijs 2004; Glass 2006.
Much of this work in Bayesian epistemology concerns coherence among a set of statements and whether a probabilistic measure of coherence can be adduced which is “truth-conducive”, that is, whether in fact higher coherence among a set means more reason for belief in that set, ceteris paribus. Another perspective divides things up into hypotheses we care about, either decision theoretically or scientifically, and evidence we might collect, possess, or already know about which may be epistemically relevant to these hypotheses. Coherence in this setting is coherence among only the evidence, and the question explored is whether evidential coherence makes any epistemic difference to the hypotheses upon which that evidence might bear, ceteris paribus. It is this perspective that we explore here. Perhaps the most obvious and most common account of epistemic coherence is to portray coherence as probabilistic association, that is, the degree to which evidential propositions co-occur relative to how much they might or might not. We take binary evidential variables as representing propositions. For example, suppose that E1 is a binary evidence variable representing a witness report, where E1 = true codes for “the witness reported fact 1 is the case,” written E1 for short, and E1 = false codes for “the witness reports that fact 1 is not the case,” written ¬E1 for short. Then perhaps the most straightforward account of evidential coherence in this tradition5 is the association (i.e., deviation from independence) measure advanced by Tomoji Shogenji (1999), which is presented here in standard and conditional form:6

S(E1, E2) = P(E1 ∩ E2) / [P(E1) P(E2)]

S(E1, E2 | H) = P(E1 ∩ E2 | H) / [P(E1 | H) P(E2 | H)]

Here binary evidence variables E1 and E2 are positively associated (associated conditional on a hypothesis, H) if S is greater than 1, independent (conditionally independent on H) if S is 1, and negatively associated (negatively associated conditional on H) if S is less than 1. Shogenji interprets S (in unconditional form) as a measure of “pairwise coherence”, which evaluates “the degree by which the two beliefs [E1 = true and E2 = true] are more likely true together than they would be if they were [probabilistically independent]” (Shogenji 1999, 339). Even though Shogenji interprets this measure to apply to cases in which the evidence variables are assigned “true”, and thus to concern the coherence of a pair of propositions, the notion of coherence here is probabilistic association among evidence variables rather than concurrent agreement of an information set. After all, the measure has no meaning without a joint probability distribution over all the possible states for the variables.
5 Other proposals along these lines have been made by Huemer 1997, Cross 1999, Olsson 2002, Fitelson 2003, Glass 2006, and Wheeler 2009.
6 Although this definition of association is attributed to Shogenji in Bayesian epistemology, it predates him in the general statistics literature by several decades.
Association fails to capture many aspects of evidential coherence, and some might argue that it fails to capture any of them. Our interest, however, is not in entering that debate; it is rather in considering the relationship between coherence and confirmation, and how causation helps mediate this relationship. So, without prejudicing the debate or claiming it is adequate, we will use Shogenji’s measure of coherence in what follows.
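To make the measure concrete, the following is a minimal sketch (our own illustration, not code from any of the works cited) of how the Shogenji measure can be computed from an explicitly enumerated joint distribution; the numbers are invented for the example.

# Shogenji's association measure S(E1, E2) = P(E1 & E2) / (P(E1) * P(E2)),
# computed from an invented joint distribution over two binary evidence variables.

def marginal(joint, event):
    """Probability of a partial assignment, e.g. {'E1': True}."""
    return sum(p for world, p in joint.items()
               if all(dict(world)[var] == val for var, val in event.items()))

def shogenji(joint):
    """S > 1: positive association; S = 1: independence; S < 1: negative association."""
    both = marginal(joint, {'E1': True, 'E2': True})
    return both / (marginal(joint, {'E1': True}) * marginal(joint, {'E2': True}))

# Each world is a tuple of (variable, value) pairs; the probabilities sum to 1.
joint = {
    (('E1', True), ('E2', True)): 0.35,
    (('E1', True), ('E2', False)): 0.15,
    (('E1', False), ('E2', True)): 0.15,
    (('E1', False), ('E2', False)): 0.35,
}

print(shogenji(joint))  # 0.35 / (0.5 * 0.5) = 1.4, so E1 and E2 are positively associated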
3. CONFIRMATION

The debate about how to characterize the idea (or mirage) of confirmation, and whether the probability calculus is the right tool for the job, is similarly contentious and far from settled. We have no desire to enter this fray either—only to examine how popular probabilistic conceptions of confirmation relate to popular probabilistic notions of coherence. Several measures of confirmation have been offered. A few of the more popular ones use probability to express how much confirmation an evidence set E provides to a hypothesis H (Eells and Fitelson 2002):

• r(H, E) =df log [ P(H | E) / P(H) ]

• l(H, E) =df log [ P(E | H) / P(E | ¬H) ]

• ko(H, E) =df [ P(E | H) – P(E | ¬H) ] / [ P(E | H) + P(E | ¬H) ]

Cohen (1977) and Earman (1992) define the incremental confirmation of E2 on H after we already know E1:

• inc1(H, E1, E2) =df P(H | E1 ∩ E2) – P(H | E1)

An extension of incremental confirmation that normalizes for how much room above P(H | E1) there is for E2 to “boost” the posterior of H is:

• inc2(H, E1, E2) =df [ P(H | E1 ∩ E2) – P(H | E1) ] / [ 1 – P(H | E1) ]

Confirmation and Coherence
Using Shogenji’s measure of association as a measure of evidential coherence, and any of the measures of confirmation given above, we can ask whether there is a relationship between the coherence of an evidence set and the confirmation it provides to a hypothesis. More formally, is it the case, or, under what conditions
is it the case, that for two evidence sets E and E’, and one of the measures of confirmation above, confi, S(E) > S(E’) ⇒ conf i (H,E) > conf i (H,E’) ? As many authors have noted, the answer is clearly no. It is not the coherence of the evidence, but its relation to H and the reason for the coherence that makes the difference. For example, consider again the food poisoning vs. communicable virus hypotheses and the evidence that five of your friends are stricken with stomach problems on Tuesday. Whatever the coherence of this evidence in Shogenji’s sense, it provides quite different confirmation to each hypothesis. The problem is that it is not the coherence or lack of such between the health of our five friends, but the reason for the coherence that is telling. Probabilistic attempts to secure a connection between coherence and confirmation either smuggle in a reason for the coherence (e.g., the partially reliable witness model of Hartmann and Bovens, 2003a,b), or rely upon a definition of coherence that is partially built from the confirmation relation (e.g., Bovens and Hartmann, 2003b).
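Before turning to the ceteris paribus conditions, a short sketch may help fix these measures; it is an illustration only, using an invented toy model in which H is a common cause of two evidence variables. The parameter values P(H) = 0.3, P(Ei | H) = 0.8, P(Ei | ¬H) = 0.3 are assumptions of the example, not values from the paper.

# Computes r, l, ko, inc1 and inc2 for an invented three-variable joint distribution
# in which the evidence variables are independent given H.
from itertools import product
from math import log

P_H, P_E_GIVEN_H, P_E_GIVEN_NOT_H = 0.3, 0.8, 0.3   # toy parameters (assumed)

joint = {}
for h, e1, e2 in product([True, False], repeat=3):
    ph = P_H if h else 1 - P_H
    pe = P_E_GIVEN_H if h else P_E_GIVEN_NOT_H
    joint[(h, e1, e2)] = ph * (pe if e1 else 1 - pe) * (pe if e2 else 1 - pe)

def p(event):
    """Marginal probability of a partial assignment over positions (0=H, 1=E1, 2=E2)."""
    return sum(pr for w, pr in joint.items()
               if all(w[i] == v for i, v in event.items()))

def cond(a, b):
    return p({**a, **b}) / p(b)

H, NOT_H = {0: True}, {0: False}
E1, E = {1: True}, {1: True, 2: True}      # E is the whole evidence set E1 = E2 = true

r = log(cond(H, E) / p(H))                                    # log-ratio measure
l = log(cond(E, H) / cond(E, NOT_H))                          # likelihood-ratio measure
ko = (cond(E, H) - cond(E, NOT_H)) / (cond(E, H) + cond(E, NOT_H))
inc1 = cond(H, E) - cond(H, E1)                               # incremental confirmation
inc2 = inc1 / (1 - cond(H, E1))                               # normalized increment

print(round(r, 3), round(l, 3), round(ko, 3), round(inc1, 3), round(inc2, 3))
# All five come out positive here: the joint evidence confirms H on every measure.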
4. CETERIS PARIBUS

What we now want to explore is whether, all else equal, evidence sets that are differentially coherent provide differential confirmation to a hypothesis H. In this section we attempt to formalize and clarify “all else equal.” In what follows, we will assume that the domain D = ⟨H, E⟩ is a hypothesis H = true and an evidence set E = {E1 = true, …, En = true}, where H and E1, …, En are propositional (binary) variables, none of which are logically related. A propositional variable conveniently expresses either the content of a proposition, or a witness report of a proposition. Extending what follows to real-valued variables is certainly possible. By insisting that no logical relations obtain, which we do for two reasons, we mean that there are positive probability distributions over D in which every pair of variables X and Y are probabilistically independent. This is not possible, for example, in a setting in which E1 = The Butler did it, and E2 = The Butler did it or The Cook did it, for in no positive distribution is E1 independent of E2. We assume this in order to activate the theory of causal Bayes nets, which require variables that are unrelated logically, and also to avoid what we take to be distracting puzzles that trade on logical relations between variables.7
7 Witness reports whose contents are logically related are not themselves logically related in this way, for it is perfectly possible to have a measure involving propositional variables V1: witness 1 report = The Butler did it, and V2: witness 2 report = The Butler did it or The Cook did it in which V1 and V2 are independent.
We assume that P(D), a probability distribution over the domain of propositions D = ⟨H, E⟩,8 is positive. We say that two distinct pieces of evidence Ei and Ej are equally confirmatory for a hypothesis H iff
• P(H | Ei) = P(H | Ej), and
• P(H | ¬Ei) = P(H | ¬Ej).
We consider two assumptions:
(A1) Positive Relevance: all propositions in a set of evidence E are positively relevant to H, i.e., ∀Ei ∈ E, P(H | Ei) ≥ P(H) ≥ P(H | ¬Ei).
(A2) Equal Relevance: all propositions in a set of evidence E are equally confirmatory, i.e., ∀Ei, Ej ∈ E, P(H | Ei) = P(H | Ej), P(H | ¬Ei) = P(H | ¬Ej).
Call an evidence set whose elements satisfy (A1) with respect to H a positive evidence set for H, and call an evidence set whose elements satisfy (A1) and (A2) for H an equally positive evidence set (epe) for H. These assumptions probabilistically encode what we take “ceteris paribus” to mean in an investigation of the relation between probabilistic notions of coherence and confirmation. We want to compare the confirmatory power for H of two sets of evidence E and E’, where E and E’ are identical in all respects except for their coherence. That is, we want to consider sets E and E’ which are comparable in terms of how confirmatory each individual piece of evidence is to H, but that differ in terms of their overall coherence, and ask which set does more to confirm H. This is how we interpret the question: Does more coherent evidence do more to confirm H, ceteris paribus?
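A small sketch of what (A1) and (A2) ask of a joint distribution; the toy common-cause model and its numbers are invented for the example and are not drawn from the paper.

# Checks assumptions (A1) and (A2) for an invented toy model in which H is a
# common cause of E1 and E2 with identical likelihoods (hence equal confirmation).
from itertools import product

P_H, P_E_GIVEN_H, P_E_GIVEN_NOT_H = 0.3, 0.8, 0.3   # assumed parameters

joint = {}
for h, e1, e2 in product([True, False], repeat=3):
    ph = P_H if h else 1 - P_H
    pe = P_E_GIVEN_H if h else P_E_GIVEN_NOT_H
    joint[(h, e1, e2)] = ph * (pe if e1 else 1 - pe) * (pe if e2 else 1 - pe)

def p(event):
    """Marginal of a partial assignment over positions (0 = H, 1 = E1, 2 = E2)."""
    return sum(pr for w, pr in joint.items()
               if all(w[i] == v for i, v in event.items()))

def p_h_given(i, value):
    """P(H | Ei = value) for evidence position i."""
    return p({0: True, i: value}) / p({i: value})

evidence = [1, 2]
a1 = all(p_h_given(i, True) >= p({0: True}) >= p_h_given(i, False) for i in evidence)
a2 = all(abs(p_h_given(i, True) - p_h_given(1, True)) < 1e-12 and
         abs(p_h_given(i, False) - p_h_given(1, False)) < 1e-12 for i in evidence)

print(a1, a2)  # True True: {E1, E2} is an equally positive evidence set (epe) for H here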
5. FOCUSED CORRELATION

Wheeler (2009) attempted to address the apparent disconnect between coherence and confirmation by using a relation between coherence and conditional coherence on a hypothesis first introduced by Myrvold (1996). Wheeler examined how different degrees of “focused correlation” track different degrees of confirmation. The focused correlation of a set of evidence E = {E1, …, En} with respect to a hypothesis H is the ratio of the coherence/association of the evidence conditional on
8 Probability can be interpreted as credal or objective, we don’t care. Our concern is only with how coherence, interpreted probabilistically, interacts with confirmation, also interpreted probabilistically.
H to the coherence/association of the evidence simpliciter, which can be expressed generally as:

ForH(E1, …, En) := S(E1, …, En | H) / S(E1, …, En) = [ P(E1, …, En | H) / (P(E1 | H) ⋯ P(En | H)) ] / [ P(E1, …, En) / (P(E1) ⋯ P(En)) ] = P(H | E1, …, En) · P(H)^(n–1) / [ P(H | E1) ⋯ P(H | En) ]
Wheeler (2009) drew several connections between focused correlation and confirmation. Before examining the role of causal structure, we strengthen these connections for the case of evidence sets with two variables. Consider Hypothesis H and evidence sets E = {E1, E2} and E’ = {E1, E3} satisfying assumption (A1) above. For each of the confirmation measures above, the confirmation of H on an evidence set E is positive (greater than 0) if ForH(E) is positive (greater than 1).
Proposition 1: If E is a positive evidence set for H, and ForH(E) > 1, then all of the following hold:
r(H, E) > 0
l(H, E) > 0
ko(H, E) > 0
inc1(H, E) > 0
inc2(H, E) > 0
Proposition 1 says that for any evidence set E that satisfies (A1), if E has a focused correlation for H above 1, then E provides positive confirmation of H by any of the above confirmation measures. If a set of evidence has more Shogenji coherence conditional on H than it does unconditionally, then the evidence provides positive confirmation to H. This is good progress, but because the condition is not necessary, it still leaves open the situations in which ForH(E) < 1. When we strengthen the assumptions regarding the uniformity of the evidence, focused correlation tracks confirmation:
Proposition 2: If E = {E1, E2} and E’ = {E1, E3}, and E ∪ E’ is an equally positive evidence set for H, then all of the following inequalities are equivalent:
ForH(E) > ForH(E’)
r(H, E) > r(H, E’)
l(H, E) > l(H, E’)
ko(H, E) > ko(H, E’)
inc1(H, E) > inc1(H, E’)
inc2(H, E) > inc2(H, E’)
So in several senses, focused correlation tracks confirmation and incremental confirmation, whereas simple coherence (association) does not. Looking at the formula for focused correlation, it is immediate that two equally positive evidence (epe) sets can have equal association while having unequal focused correlation, and thus confirmation. Why does the focused correlation capture something about the relationship between coherence and confirmation? Part of the answer, we believe, is the causal structure governing the system.
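To see how the definition and its closed form hang together, the following sketch (again an invented toy model, not the authors’ code) computes ForH both as the ratio of conditional to unconditional Shogenji coherence and via the closed form, and confirms numerically that the two agree.

# Focused correlation For_H(E1, E2) computed in two equivalent ways for a toy
# common-cause model; all parameter values are assumptions of the example.
from itertools import product

P_H, P_E_GIVEN_H, P_E_GIVEN_NOT_H = 0.3, 0.8, 0.3

joint = {}
for h, e1, e2 in product([True, False], repeat=3):
    ph = P_H if h else 1 - P_H
    pe = P_E_GIVEN_H if h else P_E_GIVEN_NOT_H
    joint[(h, e1, e2)] = ph * (pe if e1 else 1 - pe) * (pe if e2 else 1 - pe)

def p(event):
    return sum(pr for w, pr in joint.items()
               if all(w[i] == v for i, v in event.items()))

def cond(a, b):
    return p({**a, **b}) / p(b)

H, E1, E2, E1E2 = {0: True}, {1: True}, {2: True}, {1: True, 2: True}

# (i) ratio of Shogenji coherence given H to Shogenji coherence simpliciter
s_given_h = cond(E1E2, H) / (cond(E1, H) * cond(E2, H))
s_plain = p(E1E2) / (p(E1) * p(E2))
for_ratio = s_given_h / s_plain

# (ii) the closed form P(H | E1, E2) * P(H)^(n-1) / (P(H | E1) * P(H | E2)), with n = 2
for_closed = cond(H, E1E2) * p(H) / (cond(H, E1) * cond(H, E2))

print(for_ratio, for_closed)   # both approx 0.794 for these numbers
assert abs(for_ratio - for_closed) < 1e-12
# For this common-cause toy model For_H < 1, so Proposition 1's sufficient condition
# does not apply, even though each piece of evidence (and their conjunction) still
# raises the probability of H.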
6. CAUSAL STRUCTURE

Generally, Causal Bayes Nets (CBNs) provide all the apparatus needed to represent causal structure,9 and to characterize the constraints such structures impose on the probability distributions they might represent. Let a causal graph G = {V, E} be a set of random variables V and a set of directed edges E such that Ei → Ej ∈ E if and only if Ei is a direct cause of Ej relative to V. The set of direct causes of a variable are its parents. A set of variables V is causally sufficient just in case for every pair of variables Vi, Vj ∈ V, the common causes of Vi, Vj are also in V. An acyclic causal graph G and a probability distribution P(V) over the variables10 in G satisfy the Causal Markov Axiom (Spirtes, Glymour and Scheines, 2000) just in case P(V) factors according to the causal graph:

P(V) = ∏X∈V P(X | parents(X))
This factorization11 imposes independence constraints on the distributions P(V) which can be generated by a given causal graph. These constraints are characterized by the graph-theoretic relation of d-separation (Pearl, 1988). One can view the independence constraints imposed by this factorization as the non-parametric consequences of qualitative causal structure. An additional axiom typically applied to CBNs is the Faithfulness assumption (Spirtes, et al., 2000). A graph G and a probability distribution P(V) over the variables12 in G satisfy the Faithfulness Axiom just in case the only independence relations in P(V) are those entailed by the Causal Markov axiom (d-separation). If causal structure alone plays a mediating role between coherence and confirmation, then it should be through the independence constraints in distributions that are Markov and Faithful to the causal graphs that govern the relationship between evidence and hypothesis.
9 See Pearl, 2000, and Spirtes, Glymour, and Scheines, 2000.
10 The Causal Markov Axiom applies to causally sufficient sets of variables.
11 If X has no parents, then P(X | parents(X)) = P(X).
12 Again, the Faithfulness Axiom applies to causally sufficient sets of variables.
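As a rough illustration of the factorization and the independence constraints it imposes, the sketch below builds a tiny common-cause net from hypothetical conditional probability tables and checks numerically an independence that d-separation predicts; the graph and the numbers are assumptions made for the example.

# Causal Markov factorization P(V) = prod over X in V of P(X | parents(X)) for a
# tiny graph H -> E1, H -> E2, followed by a numerical check that E1 _||_ E2 | H.
from itertools import product

parents = {"H": [], "E1": ["H"], "E2": ["H"]}
order = ["H", "E1", "E2"]

# P(X = true | assignment to parents(X)); all numbers are invented.
cpt = {
    "H": {(): 0.3},
    "E1": {(True,): 0.8, (False,): 0.3},
    "E2": {(True,): 0.7, (False,): 0.2},
}

def joint(assign):
    pr = 1.0
    for var in order:
        pa = tuple(assign[q] for q in parents[var])
        p_true = cpt[var][pa]
        pr *= p_true if assign[var] else 1 - p_true
    return pr

def p(event):
    """Marginal of a partial assignment, by brute-force summation over all worlds."""
    total = 0.0
    for vals in product([True, False], repeat=len(order)):
        world = dict(zip(order, vals))
        if all(world[k] == v for k, v in event.items()):
            total += joint(world)
    return total

# d-separation: conditioning on the common cause H blocks the only path between
# E1 and E2, so P(E1 | E2, H) should equal P(E1 | H).
lhs = p({"E1": True, "E2": True, "H": True}) / p({"E2": True, "H": True})
rhs = p({"E1": True, "H": True}) / p({"H": True})
print(lhs, rhs)   # both 0.8 for these numbers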
One obvious interpretation of the Bovens and Hartmann 2003 model of partially reliable, independent witness reports is Figure 1.
[Figure 1: Common Cause Model for Bovens and Hartmann — a network over the hypothesis H, the variables R1, …, Rn, and the reports Rep1, …, Repn; figure not reproduced.]

A simplification of the Bovens-Hartmann model is the single-factor common cause model (Figure 2).
[Figure 2: Single-Factor Common Cause Model — H is the common cause of E1, E2, …, En; figure not reproduced.]

Interpreted as a CBN this model entails the evidence is independent conditional on H, that is, any coherence among the evidence is due to H: ∀i, j, Ei _||_ Ej | H.13
How does the causal structure in the single-factor common cause model mediate the relationship between coherence and confirmation? The answer is that, in this model, the coherence of the evidence is strictly a function of the degree to which the evidence is individually associated with the hypothesis, so there is no way to separate coherence and confirmation—they are perfectly intertwined.14
How does causal structure more generally constrain or mediate the relationship between coherence and confirmation? Consider a causal model (Figure 3) that simultaneously represents three important extremal cases:
1. All the coherence among the evidence is because of the hypothesis (E = {E1, E2})
2. None of the coherence among the evidence is because of the hypothesis (E’ = {E1, E3})
3. The evidence has no coherence, but is individually related to the hypothesis (E’’ = {E1, E4})
13 A _||_ B | C is to be read: A is independent of B conditional on C, where A, B, and C are random variables, or sets of random variables. If A, B, and C are naturally interpreted as events, then they can just as easily be represented as a random variable with binary outcome, 0 = the event did not occur, and 1 = the event occurred.
[Figure 3: Causal Model of Home Team Wins — E1 (Home Team’s Star Player Healthy) and E4 (Visiting Team’s Star Player Injured) are independent causes of H (Home Team Wins); E2 (Witness Report) is an effect of H; E3 (Home Team’s Published Injury Report) is an effect of E1; figure not reproduced.]

The hypothesis H of interest is whether one’s home sporting team wins in some big game that you cannot directly witness. Consider several pieces of evidence relevant to this hypothesis. E1 is whether or not the home team’s star player is injured going into the game, and similarly for E4 and the visiting team’s star player. We “code” E1 as health and E4 as “injury” so that both are positively relevant to H. Both of these facts are evidence of, but also causes of, the hypothesis of interest. As the star players for the two teams have no causal connection relating to injury, we assume that E1 and E4 are causally and probabilistically independent. E3 is the published injury report about the home team’s star player prior to the game. As any sports fan knows, such reports are only partially reliable.15 Still, we assume that such a report is an effect of whether or not the home team’s star player is in fact healthy, and probabilistically independent of everything else given the state of the star player’s health. Finally, we have the testimony of a partially reliable witness E2 on the outcome of the game.
14 This is because, in a singly connected Bayes net with binary variables, the correlation (which is proportional to the Shogenji association) between the evidence variables is simply the product of the correlation between the hypothesis and the evidence variables. See Danks and Glymour (2001).
15 In the sense of Bovens and Hartmann (2006).
First, consider the two sets of evidence E = {E1, E2} and E’ = {E1, E3}. The coherence in E is for the same reason that different effects of a common cause are coherent: any coherence between E1 and E2 is the result of the correlation between E1 and H and between H and E2. E’ is the other extreme—none of the coherence between E1 and E3 is the result of the correlation between E1 and H and between H and E3. If E and E’ have identical coherence, do they afford different degrees of confirmation to H? Since both sets share E1, this reduces to the question of whether the incremental confirmation afforded by E2 always exceeds that of E3, or vice versa, or neither one. By the causal structure of this model, H and E3 are independent conditional on E1, P(H | E1) = P(H | E1, E3), thus E3 provides zero incremental confirmation after E1. Thus the question of whether E and E’ afford different degrees of confirmation to H reduces to whether E2 provides positive incremental confirmation to H conditional on E1, i.e., P(H | E1, E2) > P(H | E1). The answer is yes, and it makes no difference how strong the relationship between H and E2 is.
Proposition 3: If E = {E1, E2} and E’ = {E1, E3} are positive evidence sets for H, then in any probability distribution P(H, E1, E2, E3) that is Markov and Faithful to the causal graph in Figure 3, inc1(H, E1, E2) > inc1(H, E1, E3).16
So coherence plays no role whatsoever in this case, except through the “side door.” The coherence between E1 and E3 is entirely the result of the direct causal influence of E1 on E3. In the case of E1 and E2, however, positive coherence between E1 and E2 must arise through the path that goes through H. So in the case of E = {E1, E2} and E’ = {E1, E3}, by the causal arrangements shown in Figure 3, and by the assumption that all evidence is individually positively relevant to H, E will provide more confirmation than E’ to H, regardless of the coherence of either set.17 It is the causal structure of the situation which drives things.
16 The proof is given in Wheeler and Scheines (2010).
17 The coherence between E’ could be as close to maximal as we like, and the coherence of E arbitrarily small (but positive), and it is nevertheless the case that E will provide more confirmation than E’.

Zero Coherence

Now consider evidence sets E = {E1, E2} and E’’ = {E1, E4}. From the causal graph in Figure 3, we know that E1 and E4 are probabilistically independent, so E’’ has zero association and a Shogenji coherence equal to 1. Is it nevertheless possible
for E’’ to provide more confirmation to H than E, which has positive coherence? The answer, surprisingly, is yes.
Proposition 4:18 In cases for which E and E’’ are equally positive evidence (epe) sets for H, then inc1(H, E1, E4) > inc1(H, E1, E2) if and only if S(E1, E2) > α/β, where

α = P(H | E1) / P(H)  and  β = P(H | E1, E4) / P(H | E1).

18 Again, the proof is given in Wheeler and Scheines (2010).
The incremental confirmation from an evidence set with zero coherence (E’’) exceeds the confirmation from an evidence set with positive coherence (E) just in case the coherence of E is greater than the ratio of the incremental confirmation provided by E1 to the incremental confirmation provided by E4 after knowing E1.
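The following sketch (our own construction, not the authors’ code) builds one joint distribution that is Markov to the Figure 3 graph; every parameter value is invented, and P(E2 | H) and P(E2 | ¬H) are tuned by hand so that E1, E2 and E4 come out equally confirmatory for H, as the epe assumption requires. The printout illustrates Proposition 3 exactly (E3 adds nothing after E1) and shows that, for these particular numbers, the comparison stated in Proposition 4 comes out as described; it is an illustration, not a proof.

# Figure 3 structure: E1 and E4 are independent causes of H, E2 is an effect of H,
# and E3 is an effect of E1 alone.  Every numerical parameter below is an assumption.
from itertools import product
from fractions import Fraction as F

P_E1, P_E4 = F(1, 2), F(1, 2)
P_H_GIVEN = {(True, True): F(9, 10), (True, False): F(3, 5),
             (False, True): F(3, 5), (False, False): F(1, 5)}   # P(H | E1, E4)
P_E2_GIVEN_H = {True: F(15, 23), False: F(5, 17)}               # tuned so P(H|E2) = P(H|E1)
P_E3_GIVEN_E1 = {True: F(9, 10), False: F(3, 10)}

VARS = ["E1", "E4", "H", "E2", "E3"]

def joint(a):
    pr = (P_E1 if a["E1"] else 1 - P_E1) * (P_E4 if a["E4"] else 1 - P_E4)
    ph = P_H_GIVEN[(a["E1"], a["E4"])]
    pr *= ph if a["H"] else 1 - ph
    pe2 = P_E2_GIVEN_H[a["H"]]
    pr *= pe2 if a["E2"] else 1 - pe2
    pe3 = P_E3_GIVEN_E1[a["E1"]]
    return pr * (pe3 if a["E3"] else 1 - pe3)

def p(event):
    total = F(0)
    for vals in product([True, False], repeat=len(VARS)):
        world = dict(zip(VARS, vals))
        if all(world[k] == v for k, v in event.items()):
            total += joint(world)
    return total

def cond(a, b):
    return p({**a, **b}) / p(b)

H, E1, E2, E3, E4 = {"H": True}, {"E1": True}, {"E2": True}, {"E3": True}, {"E4": True}

inc_E2 = cond(H, {**E1, **E2}) - cond(H, E1)   # incremental confirmation from E2 after E1
inc_E3 = cond(H, {**E1, **E3}) - cond(H, E1)   # exactly 0: E3 adds nothing after E1 (Prop. 3)
inc_E4 = cond(H, {**E1, **E4}) - cond(H, E1)

S_E1E2 = p({**E1, **E2}) / (p(E1) * p(E2))     # Shogenji coherence of E = {E1, E2}
alpha = cond(H, E1) / p(H)
beta = cond(H, {**E1, **E4}) / cond(H, E1)

print(float(inc_E2), float(inc_E3), float(inc_E4))   # approx 0.119, 0.0, 0.15
print(float(S_E1E2), float(alpha / beta))            # approx 1.125 and 1.087
# Here S(E1, E2) > alpha/beta, and the zero-coherence set {E1, E4} indeed provides more
# incremental confirmation to H than the coherent set {E1, E2}, as Proposition 4 describes.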
7. CONCLUSION

Explicating notions of coherence and confirmation has occupied philosophers of science for hundreds of years. Further, almost every philosopher since William Whewell who has discussed both notions has connected them. Recently, many have tried to model these ideas and the connection between them using just the probability calculus. Attempts to connect coherence simpliciter to confirmation are bound to fail, as probabilistic models of coherence make no reference to either the reason for coherence or the reason any piece of evidence in a set of evidence should relate to H. In our view, any such effort ought to include, explicitly in the formalism, both the reason the evidence is coherent and how the evidence is connected to H. We have tried to argue that focused correlation and causal structure move in this direction. Focused correlation, a fully probabilistic measure which compares the coherence of an evidence set conditional on the hypothesis it might or might not confirm to the unconditional coherence of the set, turns out to be a powerful indicator of when coherence helps confirmation as well as a powerful indicator of which evidence sets are more helpful than others. It is not the complete story, however, as in some cases comparing the focused correlation of two evidence sets underdetermines the confirmation relation. Since evidence can be causally connected to other evidence and to the hypothesis in virtually any way possible, it turns out to be very useful to explicitly and formally model the causal structure governing the evidence and the hypothesis. Even when one connects causal structure to probability only qualitatively through independence and conditional independence, quite a lot about the relationship between coherence and confirmation can be adduced. In cases in which all the evidence are effects of the hypothesis and otherwise causally independent, coherence
and confirmation are tightly connected.19 In cases in which the coherence between the evidence has nothing causally to do with the hypothesis, coherence and confirmation are utterly disconnected. In cases in which pieces of evidence are not caused by the hypothesis nor cause each other, the story is more complicated, but extremely rich nonetheless. We have not offered a proof that focused correlation and/or causal structure are the only keys to the castle, nor do we think one is forthcoming. Nor have we offered anything approaching a complete theory of coherence and confirmation through focused correlation and causal structure. We hope, however, that we have shown that these tools can take us much farther than probabilistic models of coherence and confirmation alone.20
19 Philosophers, statisticians, and computer scientists have learned a lot about how to tell, from data, whether or not a set of measured variables are indeed effects of an unmeasured common cause and otherwise causally independent, and so this case is epistemically particularly exciting. See Silva, Scheines, Glymour, and Spirtes (2006), Junker and Ellis (1997), and Glymour (1998).
20 Thanks are due to Clark Glymour, Teddy Seidenfeld, and audiences at CalTech and the ESF workshop in Utrecht. Thanks also to a referee of this volume for helpful comments.

REFERENCES

L. BonJour, The Structure of Empirical Knowledge. Cambridge, MA, Harvard University Press, 1985. L. BonJour, The dialectics of foundationalism and coherentism, in J. Greco and E. Sosa (eds.), The Blackwell Guide to Epistemology. Malden, MA: Blackwell, 1999, 117-42. L. BonJour, Epistemology. Rowman and Littlefield, Oxford, 2002. L. Bovens, and S. Hartmann, Bayesian Epistemology. Oxford University Press, 2003a. L. Bovens, and S. Hartmann, Solving the Riddle of Coherence, Mind, 112, 2003b, 601-633. L. Bovens, and S. Hartmann, An impossibility result for coherence rankings. Philosophical Studies, 128, 2006, 77-91. R. Carnap, The Logical Foundations of Probability. University of Chicago Press, 1962. L. J. Cohen, The Probable and the Provable, Oxford: Clarendon Press, 1977. C. B. Cross, Coherence and truth conducive justification. Analysis, 59(3), 1999, 186-93. D. Danks and C. Glymour, Linearity properties of Bayes nets with binary variables. In J. Breese & D. Koller (Eds.), Uncertainty in artificial intelligence:
Proceedings of the 17th conference (UAI-2001) (pp. 98-104). San Francisco: Morgan Kaufmann, 2001. A. Dempster, Upper and lower probabilities induced by a multivalued mapping. Annals of Mathematical Statistics. 38, 1967, 325-339. I. Douven and W. Meijs, Measuring coherence. Synthese, 156(3), 2007, 405-425. J. Earman, Bayes or Bust: A Critical Examination of Bayesian Confirmation Theory. Cambridge, MA: MIT Press, 1992. E. Eells, and B. Fitelson, Symmetries and Asymmetries in Evidential Support. Philosophical Studies, 107(2), 2002, 129-142. B. Fitelson, A probabilistic theory of coherence. Analysis, 63, 2003, 194-99. D. H. Glass, Coherence measures and their relations to fuzzy similarity and inconsistency in knowledge bases. Artificial Intelligence Review, 26, 2006, 227249. C. Glymour, What Went Wrong: Reflections on Science by Observation and The Bell Curve, Philosophy of Science, 65(1), 1998, 1-32. C. Glymour, R. Scheines, P. Spirtes, and K. Kelly, Discovering Causal Structure. Academic Press, 1987. R. Haenni, J.W. Romeyn, G. Wheeler, G. and J. Williamson, Probabilistic Logic and Probabilistic Networks, Dordrecht: The Synthese Library, 2009. M. Huemer, Probability and coherence justification. Southern Journal of Philosophy, 35, 1997, 463-72. R Jeffrey, The Logic of Decision. New York: McGraw-Hill, 1965. B. W. Junker and J.L. Ellis, A Characterization of Monotone Unidimensional Latent Variable Models. The Annals of Statistics, 25, 1997, 1327-1343. P. Klein, and T. Warfield, What price coherence? Analysis, 54(3), 1994, 129-32. H. Kyburg, Jr and M. Pittarelli, Set-Based Bayesianism. IEEE Transactions on Systems, Man and Cybernetics A, 26(3), 1996, 324-339. I. Levi, The Enterprise of Knowledge. Cambridge, MA: MIT Press, 1980. C. I. Lewis, An Analysis of Knowledge and Valuation. Open Court, La Salle, 1946. W. Meijs, A corrective to Bovens and Hartmann’s measure of coherence, Philosophical Studies, 133(2), 2004, 151-180. W. Myrvold, Bayesianism and diverse evidence: a reply to Andrew Wayne. Philosophy of Science, 63, 1996, 661-5. E. Olsson, What is the problem of coherence and truth? Journal of Philosophy, 94, 2002, 246-72. E. Olsson, Against Coherence: Truth, Probability and Justification. Oxford University Press, Oxford, 2005. F. Ramsey, Truth and Probability, reprinted in Ramsey, 1931, The Foundations of Mathematics and other Logical Essays, Ch. VII, p.156-198, edited by R.B. Braithwaite, London: Kegan, Paul, Trench, Trubner & Co., New York: Harcourt, Brace and Company. 1999 electronic edition, http://ideas.repec. org/h/hay/hetcha/ramsey1926.html.
M. Schlosshauer and G. Wheeler, Focused correlation, confirmation, and the jigsaw puzzle of variable evidence, Philosophy of Science, forthcoming. T. Shogenji, Is coherence truth conducive? Analysis, 59, 1999, 338-45. R. Silva, C. Glymour, R Scheines, and P. Spirtes, Learning the Structure of Latent Linear Structure Models, Journal of Machine Learning Research, 7, 2006,191-246. P. Spirtes, C. Glymour, and R. Scheines, Causation, Prediction, and Search. 2nd edition. MIT Press, Cambridge, MA, 2000. G. Wheeler, Focused correlation and confirmation. The British Journal for the Philosophy of Science, 60(1), 2009, 79-100. G. Wheeler and R. Scheines, Coherence, Confirmation, and Causation, CMU Department of Philosophy Technical Report, Department of Philosophy, Carnegie Mellon, Pittsburgh, PA 15213, 2010.
Gregory Wheeler CENTRIA – Center for Artificial Intelligence New University of Lisbon 2829-516 Caparica Portugal [email protected] Richard Scheines Department of Philosophy Carnegie Mellon University Pittsburgh, PA USA
Jon Williamson
An Objective Bayesian Account of Confirmation
Abstract This paper revisits Carnap’s theory of degree of confirmation, identifies certain shortcomings, and argues that a new approach based on objective Bayesian epistemology can overcome these shortcomings. Rudolf Carnap can be thought of as one of the progenitors of Bayesian confirmation theory (§1). Bayesian confirmation theory is construed in §2 as a four-step process, the third step of which results in the identification of the degree to which e confirms h, c(h, e), with the probability of h conditional on e in the total absence of further evidence, P∅ (h|e). The fourth step of this process involves isolating an appropriate candidate for P∅ ; Carnap rejected the most natural construal of P∅ on the grounds that it leads to a confirmation function c† that fails to adequately capture the phenomenon of learning from experience (§3). This led him, and subsequent confirmation theorists, to more elaborate interpretations of P∅ , resulting in certain continua of confirmation functions (§§4, 5). I argue in §§5, 6 that this was a wrong move: the original construal of P∅ is in fact required in order that degree of confirmation can capture the phenomenon of partial entailment. There remains the problem of learning from experience. I argue that this problem is best solved by revisiting the third—rather than the fourth— step of the four-step Bayesian scheme (§7) and that objective Bayesianism, which is outlined in §8, offers the crucial insight as to how this step can be rectified. This leads to an objective Bayesian confirmation theory that can capture both partial entailment and learning from experience (§9).
§1 Carnapian Confirmation

Our current understanding of confirmation owes much to Rudolf Carnap's pioneering work of the 1940s and beyond. Carnap (1950, §8) distinguishes three concepts of confirmation: a classificatory concept which applies when evidence e qualitatively confirms a hypothesis h, a comparative concept which applies when h is confirmed by e at least as highly as h′ by e′, and a quantitative concept according to which h is confirmed by e to degree q, written c(h, e) = q. Carnap also distinguishes two principal notions
of probability: probability1, or degree of confirmation, and probability2, or relative frequency (Carnap, 1950, §9). Carnap was chiefly concerned with the quantitative concept of confirmation (i.e., probability1) and we will likewise restrict our attention here to this quantitative notion. For Carnap, as for Keynes (1921) before him, this notion of probability is fundamentally a logical relation between a body of evidence and a proposition. It is clear that Carnap viewed this relation as objective, not as an expression of subjective degree of belief. On the other hand, Keynes was emphatic that the logical concept of probability underwrites rational degrees of belief; Carnap went along with this view but was more ambivalent:

Many logicians prefer formulations which may be regarded as a kind of qualified psychologism. They admit that logic is not concerned with the actual processes of believing, thinking, inferring, because then it would become a part of psychology. But, still clinging to the belief that there must somehow be a close relation between logic and thinking, they say that logic is concerned with correct or rational thinking. Thus they might explain the relation of logical consequence as meaning: 'if somebody has sufficient reasons to believe in the premise i, then the same reasons justify likewise his belief in j.' It seems to me that psychologism thus diluted has virtually lost its content; the word 'thinking' or 'believing' is still there, but its use seems gratuitous. . . . The characterization of logic in terms of correct or rational or justified belief is just as right but not more enlightening than to say that mineralogy tells us how to think correctly about minerals. The reference to thinking may just as well be dropped in both cases. (Carnap, 1950, pp. 41–42)
Some years later, however, Carnap came to be less ambivalent and took the rational degree of belief approach more seriously (see, e.g., Carnap, 1971). It is fair to say, then, that while Carnap cannot be considered an advocate of what is now called the Bayesian interpretation of probability, which takes probability to be fundamentally interpretable in terms of rational degree of belief, he can be considered to be a pioneer of what is now called Bayesian confirmation theory, which typically admits an identity (whether fundamental or not) between degree of confirmation and rational degree of belief and which proceeds along the following lines.
§2 The Bayesian Approach to Confirmation

The Bayesian approach to confirmation might broadly be characterised in terms of the following four steps.

Step 1. Consider probability functions defined over a language L.
Step 2. Identify c(h, e) = P{e}(h) for some suitable probability function P on L, where P{e}(h) is the probability of h on evidence e.
Step 3. Identify P{e}(h) = P∅(h|e).
Step 4. Find an appropriate P∅ that represents confirmation in the absence of evidence.

Let us examine these four steps in turn.

Step 1. Consider probability functions defined over a language L. Degree of confirmation is taken to be a relation between evidence and a hypothesis and these are naturally construed as propositions (or sometimes, in the case of evidence, sets of propositions). Hence the functions we need to consider—confirmation functions and probability functions—should be defined on propositions. But probability functions are normally defined on events construed as sets of possible outcomes (Kolmogorov, 1933). One of Carnap's important contributions to this area was his work on defining probability functions on logical languages as opposed to fields of sets. One might, for example, consider a propositional language L = {A1, . . . , An} on elementary propositions A1, . . . , An, with compound propositions formed by the usual connectives ¬, ∨, ∧, →, ↔. The set of atomic states of L is defined as Ωn = {±A1 ∧ · · · ∧ ±An}, where +Ai is just Ai and −Ai is ¬Ai. A probability function on L is then a function P, from propositions of L to real numbers, that satisfies the properties:

P1. P(ω) ≥ 0 for each ω ∈ Ωn,
P2. P(τ) = 1 for some tautology τ, and
P3. P(θ) = Σ_{ω |= θ} P(ω) for each proposition θ.

Alternatively one might consider a predicate language rather than a propositional language. There are various ways of proceeding here, but perhaps the simplest goes as follows (see Williamson, 2010b, Chapter 5). Construe a predicate language as L = {A1, A2, . . .} where the Ai enumerate the atomic propositions of the form Ut for some predicate U and tuple t of constant symbols. (There is assumed to be a constant symbol for each domain individual.) A finite sublanguage Ln = {A1, . . . , Al} uses only constant symbols t1, . . . , tn. The set of atomic states of Ln is Ωn = {±A1 ∧ · · · ∧ ±Al}. A probability function on a predicate language L is then a function from propositions of L to real numbers that satisfies the properties:

PP1. P(ω) ≥ 0 for each ω ∈ Ωn and each n,
PP2. P(τ) = 1 for some tautology τ,
PP3. P(θ) = Σ_{ω ∈ Ωn, ω |= θ} P(ω) for each quantifier-free proposition θ, where n is large enough that Ln contains all the atomic propositions occurring in θ, and
PP4. P(∃xθ(x)) = sup_m P(θ(t1) ∨ · · · ∨ θ(tm)).
Note in particular that a probability function P on predicate language L is determined by its values on the ω ∈ Ωn for n = 1, 2, . . . (see, e.g., Paris, 1994, Theorem 11.2). PP4 is known as Gaifman's condition, and PP1–4 imply that P(∃xθ(x)) = lim_{m→∞} P(θ(t1) ∨ · · · ∨ θ(tm)) and P(∀xθ(x)) = lim_{m→∞} P(θ(t1) ∧ · · · ∧ θ(tm)).

Step 2. Identify c(h, e) = P{e}(h) for some suitable probability function P on L, where P{e}(h) is the probability of h on evidence e. For Carnap, this step is just his explication of the quantitative concept of confirmation in terms of probability1. One thing that makes Bayesian confirmation theory Bayesian is that the probability of h on evidence e is, in turn, interpretable as the degree to which one should believe h if one were to grant just e. (Bayes (1764) wrote of 'expectation' for belief or credence.) It should be reiterated that the proponent of a logical interpretation of probability, such as Keynes or Carnap, would want to say that this Bayesian construal of the probabilities is derivative rather than fundamental: that P{e}(h) = x means that there is a logical probability-relation between {e} and h of degree x, and it is this fact that makes it rational to believe h to degree x if one were to grant just e. Proponents of a Bayesian interpretation, on the other hand, would take the rational degree of belief interpretation as fundamental. According to subjective Bayesianism, x largely depends on the whim of the agent in question, while according to objective Bayesianism, the agent's evidence plays the leading role in determining x.

Step 3. Identify P{e}(h) = P∅(h|e). It is usual for the Bayesian to identify a conditional belief with a conditional probability: the degree to which one should believe h if one were to grant just e is identified with the probability of h conditional on e (granting nothing at all). As with other rules of Bayesian probability, the justification for such a move normally proceeds via the betting interpretation of degrees of belief. In this case, the degree to which one should believe h if one were to grant just e is first interpreted in terms of a certain conditional bet and then it is shown that under this interpretation the identity posited in Step 3 must hold. The argument proceeds as follows. Interpret P{e}(h) = q as saying that one is prepared to offer a betting quotient q for h (i.e., one is prepared to bet qS for a return of S if h is true), with the bet called off if e is false. (The stake S depends on the betting quotient and may be positive or negative.) The loss one incurs on such a bet is Ie(q − Ih)S, where Iθ is the indicator function for proposition θ, which takes the value 1 if θ is true and 0 if θ is false. If one also offers betting quotient P(h ∧ e) = q′ on h ∧ e and P(e) = q′′ on e then one's total loss is Ie(q − Ih)S + (q′ − IeIh)S′ + (q′′ − Ie)S′′.
If q′ < qq′′ then the stake-maker can choose S = 1, S′ = −1 and S′′ = q to ensure certain loss qq′′ − q′. Similarly if q′ > qq′′ the stake-maker can choose S = −1, S′ = 1 and S′′ = −q to ensure certain loss q′ − qq′′. Hence unless P{e}(h)P(e) = P(h ∧ e) one can be made to lose money whatever happens. But if P{e}(h)P(e) = P(h ∧ e) then one avoids the possibility of sure loss, for the following reason. The expected loss is q′S′ + q′′S′′ + q′(−S − S′) + q′′(qS − S′′) = (q′′q − q′)S, but this is zero if q′′q − q′ = 0; if the expected loss is zero then the loss cannot be positive in every eventuality. So one avoids the possibility of sure loss if and only if q′′q − q′ = 0. Granting that avoiding the possibility of sure loss is a requirement of rationality, the identity P{e}(h) = P(h ∧ e)/P(e) = P(h|e) must hold for rational degrees of belief (as long as P(e) ≠ 0). Assuming finally that e exhausts the available evidence, P(h ∧ e) = P∅(h ∧ e) and P(e) = P∅(e), and Step 3 follows.

Step 3 proposes the use of conditional probabilities in the explication of confirmation, and this yields another sense in which the approach can be described as Bayesian. In fact it is often easier to determine the probability of the evidence conditional on the hypothesis than the probability of the hypothesis conditional on the evidence, so Step 3 provides an avenue for Bayes' theorem to enter the picture:

c(h, e) = P∅(h|e) = P∅(e|h)P∅(h)/P∅(e).
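As a sanity check on the sure-loss argument above, the losses can be computed numerically. The sketch below is illustrative only: the betting quotients 0.8, 0.5 and 0.7 and the stakes are arbitrary choices made for the example, not values from the text, and the loss expression is the one displayed at the end of the previous paragraph.

```python
# Numerical check of the conditional-bet argument: betting quotient q for h conditional
# on e, q1 for h & e, q2 for e; the agent's loss is Ie*(q - Ih)*S + (q1 - Ie*Ih)*S1 + (q2 - Ie)*S2.

def losses(q, q1, q2, S, S1, S2):
    """The agent's loss in each of the three possible outcomes."""
    outcomes = {"e & h": (1, 1), "e & not-h": (1, 0), "not-e": (0, 0)}
    return {name: Ie * (q - Ih) * S + (q1 - Ie * Ih) * S1 + (q2 - Ie) * S2
            for name, (Ie, Ih) in outcomes.items()}

# Incoherent quotients (q1 < q*q2): the stake-maker's choice S = 1, S1 = -1, S2 = q
# produces the same positive loss, q*q2 - q1 = 0.06, whatever happens.
print(losses(q=0.8, q1=0.5, q2=0.7, S=1, S1=-1, S2=0.8))

# Coherent quotients (q1 = q*q2): no sure loss can be forced; with these stakes, for
# instance, the agent loses in one outcome but gains in the other two.
print(losses(q=0.8, q1=0.56, q2=0.7, S=2, S1=-3, S2=1))
```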
Although Step 3 proposes the use of conditional probabilities, it should not be confused with the principle of Bayesian conditionalisation, which relates degrees of belief at different points in time, and which says: if you adopt belief function P now and you come to learn just e, you should then change your belief function to P (·|e). While someone who endorses Step 3 might well endorse Bayesian conditionalisation and vice versa, they are in fact rather different principles, one dealing with conditional belief and the other with changes of belief. Bayesian conditionalisation is advocated by many proponents of a Bayesian interpretation of probability, but will not be relevant in our context of Bayesian confirmation theory. Note that Steps 2 and 3 are sometimes conflated. Carnap himself ran the two steps together by making assumptions about c that directly ensure that c(h, e) = P∅ (h ∧ e)/P∅ (e) (Carnap, 1950, §§53,54B). This is perhaps a mistake; as we shall see below, the key steps must be teased apart if we are to make progress with confirmation theory. Step 4. Find an appropriate P∅ that represents confirmation in the absence of evidence. This step seems straightforward, although, as we shall see, Carnap had reservations about the following proposal. The natural choice for P∅ is
the equivocator, P=, on L, i.e., the probability function that equivocates between the atomic states, giving each ω ∈ Ωn the same probability:

P∅(ω) =df P=(ω) = 1/|Ωn|
for all ω ∈ Ωn . (The equivocator can alternatively be defined in terms of models of L rather than states of L—see Kemeny (1953).) Putting the four steps together we then have the recommendation that c(h, e) = P= (h|e). Carnap used the notation c† or c∞ for this confirmation function. Having characterised the two-place confirmation relation it is then usual to define a three-place support relation in terms of the confirmation relation (Jeffreys, 1936, p. 421; Good, 1960, pp. 146–147; Gillies, 1990, p. 144).1 Degree of support s(h, e, k) is supposed to capture the added confirmation that e offers to h, over and above the confirmation provided by background k. One possible measure of support is given by s(h, e, k) = c(h, e ∧ k) − c(h, k), but there are many others and little consensus as to which is the most appropriate (see, e.g., Fitelson, 1999). Confusingly, the word ‘confirmation’ is often used to refer both to the two-place relation and to the three-place support relation. In this paper we restrict our attention to the two-place confirmation relation.
1 Carnap introduces the distinction between confirmation and support in §B.II (p. xvi) of the Preface to the Second Edition of Carnap (1950).

§3 Learning from Experience

There is a difficulty with the approach to the problem of confirmation outlined in §2, as Carnap realised very early on in his research (see, e.g., Carnap, 1945, p. 81; Carnap, 1952, p. 38). This is the problem that the resulting choice of confirmation function, c†, renders learning from experience impossible. One can illustrate this general problem via the following example. Suppose that ravens r1, . . . , r101 are being observed to see if they are black (B). Then

c†(Br101, ∅) = P=(Br101) = 1/2,

where ∅ represents an empty evidential statement—a tautology, say. This seems right—in the absence of any evidence it seems appropriate to say that Br101 and ¬Br101 are equally confirmed. However it is also the case that

c†(Br101, Br1 ∧ · · · ∧ Br100) = P=(Br101 | Br1 ∧ · · · ∧ Br100) = (1/2^101)/(1/2^100) = 1/2.
Hence, on evidence of the first 100 ravens being black, the degree of confirmation of Br101 remains stuck at 1/2. This inability of evidence to change degree of confirmation is quite unacceptable and the confirmation function c† should be rejected, Carnap argued. This problem was in fact recognised by George Boole, who considered drawing balls from an urn containing black and white balls:

It follows, therefore, that if the number of balls be infinite, and all constitutions of the system be equally probable, the probability of drawing m white balls in succession will be 1/2^m, and the probability of drawing m+1 white balls in succession 1/2^(m+1); whence the probability that after m white balls have been drawn, the next drawing will furnish a white one, will be 1/2. In other words, past experience does not in this case affect future expectation. (Boole, 1854, pp. 371–2)
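The arithmetic of this example is easy to reproduce by enumeration. The following sketch is purely illustrative (it uses a handful of ravens rather than 101, since the number of atomic states grows as 2^(n+1)): it gives every atomic state the same probability, as the equivocator does, and computes the confirmation of the next raven being black given that all observed ravens were black.

```python
from itertools import product

def c_dagger(n):
    """c†(B r_{n+1}, B r_1 & ... & B r_n) under the equivocator over n+1 ravens."""
    states = list(product([False, True], repeat=n + 1))   # atomic states, all equally probable
    evidence = [s for s in states if all(s[:n])]           # first n ravens observed black
    both = [s for s in evidence if s[n]]                   # ... and raven n+1 black as well
    return len(both) / len(evidence)                       # uniform weights cancel

for n in (1, 5, 10, 15):
    print(n, c_dagger(n))   # 0.5 every time: the evidence never budges the confirmation
```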
§4 Carnap's Resolution

Carnap's strategy for circumventing the problem of learning from experience was to tinker with Step 4 of the four-step scheme of §2: by isolating desiderata that P∅ ought to satisfy, one can narrow down the functional form of P∅, without narrowing it down so much as to force the identity P∅ = P= (Johnson, 1932; Carnap, 1952; Paris, 1994, pp. 189–197). Consider the following desiderata:

Constant Exchangeability. P∅ should be invariant under permutations of the constant symbols ti.
Johnson's Sufficientness Postulate. P∅(U tk+1 | ±U t1 ∧ · · · ∧ ±U tk) should depend only on k and the number rk of positive observations.

It turns out that, for a predicate language with two or more predicates, all unary, there is a continuum of probability functions satisfying Constant Exchangeability and Johnson's Sufficientness Postulate, characterised by:

P∅(U tk+1 | ±U t1 ∧ · · · ∧ ±U tk) = (rk + λ/2^m)/(k + λ),
where m is the number of predicates in the language and λ ∈ [0, ∞] is an adjustable parameter, and where instances of different predicates are probabilistically independent. This is known as Carnap's continuum of inductive methods; given λ ∈ [0, ∞], the corresponding confirmation function is denoted by cλ. Note that this characterisation is also supposed to apply to languages with a single unary predicate. In that case, if λ = 0 then P∅(U tk+1 | ±U t1 ∧ · · · ∧ ±U tk) = rk/k and c0, sometimes called the straight rule, sets degrees of confirmation to observed frequencies.
If λ = 1 then P∅(U tk+1 | ±U t1 ∧ · · · ∧ ±U tk) = (rk + 1/2)/(k + 1) and c1 is called the Jeffreys-Perks' rule of succession. If λ = 2 then P∅(U tk+1 | ±U t1 ∧ · · · ∧ ±U tk) = (rk + 1)/(k + 2) and c2 is known as Laplace's rule of succession. If λ = ∞ then P∅(U tk+1 | ±U t1 ∧ · · · ∧ ±U tk) = 1/2, and we have c∞ = c†, the function that fails to admit learning from experience. P∅(U tk+1 | ±U t1 ∧ · · · ∧ ±U tk) is depicted in Fig. 1 for k = 10 and a range of λ and rk.

Figure 1: Carnap's inductive methods for λ ∈ [0, 50], m = 1 and k = 10.
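To see how the choice of λ governs the speed of learning, the continuum can simply be tabulated. The sketch below implements the formula displayed above for a sample of ten individuals, all positive instances; the function name and the default m = 1 are choices made for this illustration, not anything given in the text.

```python
def c_lambda(r_k, k, lam, m=1):
    """Carnap's lambda-continuum, as displayed above, for m unary predicates."""
    if lam == float("inf"):
        return 1 / 2 ** m                     # the limiting case c-infinity = c-dagger
    return (r_k + lam / 2 ** m) / (k + lam)

k, r_k = 10, 10                               # ten individuals observed, all positive
for lam in (0, 1, 2, 10, float("inf")):
    print(lam, c_lambda(r_k, k, lam))
# 0 -> 1.0 (straight rule), 1 -> 0.954... (Jeffreys-Perks), 2 -> 0.916... (Laplace),
# 10 -> 0.75, inf -> 0.5 (no learning from experience)
```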
§5 Problems with Carnap's Resolution

There are several concerns one might have about Carnap's resolution to the problem of learning from experience; in this section we will consider three.

Determining λ. One question that immediately arises is, how should λ be determined? Carnap himself suggested that the choice of λ will depend on empirical performance, simplicity and formal elegance of the resulting inductive method (Carnap, 1952, §18), but he gave no clear indication as to how this balance should be achieved. One might suggest that λ should be treated as a meta-inductive parameter: one should attach a prior probability distribution over λ and update in the light of new evidence (see, e.g., Good, 1980). But then there is a danger of regress: if there is a continuum, with parameter λ′, of suitable prior distributions over λ, one needs
to formulate a prior over λ′, and so on (Howson and Urbach, 1989, §4.c.2). To get round this problem one might try taking an arbitrary initial value of λ, and changing that as evidence e is gathered in order to minimise the distance between the inductive probability function P{e} and the physical probability function P∗ (Carnap, 1952, §§19–24; Kuipers, 1986). A choice has to be made concerning the most appropriate distance function—mean square error seems to be the usual choice in this context—and of course since the physical probability function is unknown, one must estimate these probabilities on the basis of available evidence. This leads to an iterative approximation method for updating λ that does not require a prior over λ and that consequently avoids the regress problem. The difficulty with this line of attack is that, since λ varies, the resulting sequence of inductive probabilities cannot be captured by a single member of the λ-continuum—the resulting inductive method is thus irrational according to the norms laid down by Carnap himself. Hence this avenue undermines the whole basis of Carnap's resolution to the problem of learning from experience.

The δ-continuum. A second worry about Carnap's resolution is that a very similar—and apparently equally justifiable—strategy leads to a totally different continuum of inductive methods, namely the Nix-Paris δ-continuum (Nix, 2005; Nix and Paris, 2006). This continuum takes parameter δ ∈ [0, 1) and is the only set of probability functions satisfying:

Regularity. P∅(θ) = 0 iff |= ¬θ.
Constant Exchangeability. P∅ should be invariant under permutations of the ti.
Predicate Exchangeability. P∅ should be invariant under permutations of the predicate symbols U.
Strong Negation. P∅ should be invariant under negating each occurrence of some predicate.
Generalised Principle of Instantial Relevance. If θ |= φ and φ(ti+1) ∧ ψ is consistent then P∅(θ(ti+2) | φ(ti+1) ∧ ψ) ≥ P∅(θ(ti+1) | ψ).

For a language with a single unary predicate we have that

P∅(±U t1 ∧ · · · ∧ ±U tk) = (1/2) ((1−δ)/2)^k [((1+δ)/(1−δ))^rk + ((1+δ)/(1−δ))^(k−rk)]

and

P∅(U tk+1 | ±U t1 ∧ · · · ∧ ±U tk) = (1+δ)/2 − δ/(((1+δ)/(1−δ))^(rk−sk) + 1),
where sk = k − rk is the number of observed negative instances of U. This last function is depicted in Fig. 2.

Figure 2: The Nix-Paris inductive methods for δ ∈ [0, 1), a single unary predicate and k = 10.

In general the δ-continuum only agrees with the λ-continuum at the point δ = 0, which corresponds to λ = ∞. This point is the equivocator function of §2—the function that gave rise to the problematic c†. (Nix and Paris stipulate that δ = 1 should correspond to λ = 0, but this stipulation is rather counterintuitive when one compares the graph of the δ-continuum, given in Fig. 2, with that of the λ-continuum given in Fig. 1.) If one takes the principles characterising the δ-continuum to be just as plausible as those characterising the λ-continuum, then Carnap's resolution to the problem of learning from experience faces an important problem: underdetermination.
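A parallel tabulation of the Nix-Paris continuum, again simply implementing the conditional-probability formula displayed above for a single unary predicate, shows that δ = 0 reproduces the equivocator while larger δ gives progressively faster learning. The sketch is illustrative only; the example values of δ are arbitrary.

```python
def c_delta(r_k, k, delta):
    """The Nix-Paris delta-continuum, as displayed above, for a single unary predicate."""
    s_k = k - r_k                                  # observed negative instances
    rho = (1 + delta) / (1 - delta)
    return (1 + delta) / 2 - delta / (rho ** (r_k - s_k) + 1)

k, r_k = 10, 10                                    # ten individuals observed, all positive
for delta in (0.0, 0.2, 0.5, 0.9):
    print(delta, c_delta(r_k, k, delta))
# 0.0 -> 0.5 (the equivocator, i.e. lambda = infinity), 0.2 -> about 0.597,
# 0.5 -> about 0.75, 0.9 -> about 0.95
```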
The pre-eminence of the equivocator. The last but most important of the problems facing Carnap’s resolution is that—setting aside, for the moment, the argument that it gives rise to c† and the problem of learning from experience—the equivocator function P= (i.e., λ = ∞, δ = 0) stands out by a long shot as the only viable candidate for P∅ . As noted above, if the Johnson-Carnap justification of the λ-continuum is convincing at all, then so is the Nix-Paris justification. And, putting
all the desiderata together, we have that the point where the two continua coincide—i.e., the equivocator P=—is the only function satisfying

Regularity. P∅(θ) = 0 iff |= ¬θ.
Constant Exchangeability. P∅ should be invariant under permutations of the ti.
Predicate Exchangeability. P∅ should be invariant under permutations of the predicate symbols U.
Strong Negation. P∅ should be invariant under negating each occurrence of some predicate.
Generalised Principle of Instantial Relevance. If θ |= φ and φ(ti+1) ∧ ψ is consistent then P∅(θ(ti+2) | φ(ti+1) ∧ ψ) ≥ P∅(θ(ti+1) | ψ).
Johnson's Sufficientness Postulate. P∅(U tk+1 | ±U t1 ∧ · · · ∧ ±U tk) should depend only on k and the number rk of positive observations.

Thus the equivocator stands out as the only viable candidate for P∅.

One might respond to this line of argument that several of the above desiderata are invariance conditions and can be thought of as applications of the principle of indifference, which says that if one is indifferent concerning which member of a partition will occur then all members of the partition should receive the same probability, and which is notorious principally for the problems that arise when it is applied over different partitions. Perhaps, then, the line of argument should not be trusted. This response can lead in two directions. If one thinks that the problems generated by multiple applications of the principle of indifference are reason enough to reject the principle straight off, then one will, indeed, reject the above line of argument. But one will also reject the applications of the principle of indifference that lead to the λ-continuum and δ-continuum respectively. Hence Carnap's resolution of §4 does not get off the ground and there is no serious alternative to the claim at Step 4 of §2 that P∅ = P=.

But the response can go in another direction. The problems generated by multiple applications of the principle of indifference are more plausibly taken as reasons to restrict the principle of indifference rather than reject it straight off. After Keynes (1921, §4.21) it is usual to restrict the principle of indifference to the finest partition over which one is indifferent. In our case there is no evidence at all (we are considering P∅) and the finest partition over which there is indifference is the finest partition simpliciter—i.e., the partition Ωn of the atomic states. This leads to the following desideratum:

State Exchangeability. P∅ should be invariant under permutations of the states ω ∈ Ωn.
But State Exchangeability clearly implies that P∅ (ω) = P= (ω) = 1/|Ωn | for all ω ∈ Ωn . And it is clear that taking different values of n will not lead to inconsistent applications of the principle of indifference. Hence we have, again, that P∅ = P= . In sum, in either direction in which one takes concerns about applying the principle of indifference, we are left with the equivocator as the only viable candidate for P∅ .
§6 A Rock and a Hard Place

A theory of degree of confirmation needs to capture two concepts. On the one hand, it should capture the ampliative concept of degree of inductive plausibility, e.g., the degree to which an observed sample of ravens all being black renders plausible the conclusion that the next observed raven will be black. On the other hand, degree of confirmation should also capture the non-ampliative concept of degree of partial entailment, e.g., the degree to which A ∨ B entails the conclusion A.2 We apply the concept of confirmation in both cases—a sample of ravens can confirm the conclusion that the next raven will be black; A ∨ B confirms A—so a theory of confirmation should be able to cope with both kinds of case.

2 This concept is called structural confirmation by Kuipers (2001, pp. 208–9).

Carnap was rightly concerned that employing the equivocator as a basis for confirmation—by using c†—would mean that confirmation theory would not be able to capture the concept of inductive plausibility. But by rescinding Step 4 of §2—i.e., by rejecting the identification of P∅ with P= and by developing his continuum of inductive methods—he threw the baby out with the bath water, because the equivocator is the only function able to capture partial entailment in the total absence of evidence. If there is no evidence to distinguish interpretations of a logical language then the degree to which premisses entail a conclusion can only viably be identified with the proportion of models of the premisses that also satisfy the conclusion—equivalently, with the proportion of those atomic states logically implying the premisses that also logically imply the conclusion (Wittgenstein, 1922, §5.1.5). One way to argue for this claim is to appeal to the reasons given in §5 for the pre-eminence of the equivocator. In order to determine the degree to which A ∨ B entails A, consider the following.

P∅(A | A ∨ B) = P∅(A ∧ (A ∨ B))/P∅(A ∨ B) = [P∅(A ∧ B) + P∅(A ∧ ¬B)]/[P∅(A ∧ B) + P∅(A ∧ ¬B) + P∅(¬A ∧ B)]
but, as argued in §5, these atomic states should all have the same probability in the absence of evidence, so

P∅(A | A ∨ B) = (1/2)/(3/4) = 2/3.
Thus A ∨ B partially entails A to degree 2/3. (A ∨ B might be said to support A to degree P∅(A | A ∨ B) − P∅(A) = 2/3 − 1/2 = 1/6.) Kemeny and Oppenheim (1952, p. 314) provide a rather different argument for the claim that the equivocator is required to capture partial entailment (which they use to derive a measure of degree of factual support). First they point out that if A and B are logically independent atomic propositions then it must be the case that they are probabilistically independent, P∅(±A ∧ ±B) = P∅(±A)P∅(±B): 'Two atomic statements which are logically independent cannot support each other factually since they express distinct facts'. Also, A and A ↔ B must be probabilistically independent since A ↔ B is just as favourable to A as to ¬A. But A ∧ (A ↔ B) is logically equivalent to A ∧ B. Hence, P∅(A)P∅(B) = P∅(A ∧ B) = P∅(A ∧ (A ↔ B)) = P∅(A)P∅(A ↔ B). In which case P∅(B) = P∅(A ↔ B). Moreover ¬A ∧ (A ↔ B) is logically equivalent to ¬A ∧ ¬B so P∅(¬A)P∅(¬B) = P∅(¬A ∧ ¬B) = P∅(¬A ∧ (A ↔ B)) = P∅(¬A)P∅(A ↔ B) and P∅(¬B) = P∅(A ↔ B). Hence P∅(B) = P∅(¬B) = 1/2. Similarly P∅(A) = P∅(¬A) = 1/2 and, since A and B are probabilistically independent, P∅(±A ∧ ±B) = 1/4. Similarly the other atomic propositions are all probabilistically independent and have probability 1/2, so P∅(ω) = 1/|Ωn| for ω ∈ Ωn. Hence P∅ = P=, the equivocator.

We are thus stuck between a rock and a hard place: on the one hand, the equivocator seems to preclude learning by experience, and so fails to capture the concept of inductive plausibility, while on the other, the equivocator seems to be required to capture the concept of partial entailment. Wesley Salmon recognised this dilemma very clearly. He pointed out that if q entails p then p partially entails q because it entails a part of q, and he argued:

if degree of confirmation is to be identified with partial entailment, then c† is the proper confirmation function after all, for it yields the result that p is probabilistically irrelevant to q whenever p and q are completely independent and there is no partial entailment between them. . . . (Salmon, 1967, p. 731)
But Salmon despaired of finding a way out of this dilemma: . . . Unfortunately for induction, statements strictly about the future (unobserved) are completely independent of statements strictly about the past (observed). Not only are they deductively independent of each other, but also they fail to exhibit any partial entailment. The force of Hume’s insight that the future is logically independent of the past is very great indeed. It rules out both full entailment and partial entailment. If partial entailment were the fundamental concept of inductive
logic, then it would in fact be impossible to learn from experience. (Salmon, 1967, pp. 731–2)
While Carnap sacrificed partial entailment for inductive plausibility, Kemeny focussed on explicating partial entailment (Kemeny, 1953). For both Carnap and Kemeny, the quest for a theory of confirmation that adequately handles the two concepts at once is left empty-handed. Salmon thought that there is no way of satisfying these apparently conflicting demands. But we shall see that there is, by taking another look at the four-step Bayesian approach to confirmation.
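Before revisiting the four-step scheme, note that the figures of 2/3 and 1/6 in the example above can be checked mechanically by listing the four atomic states of a two-proposition language. The following sketch is a minimal illustration and not part of the original argument.

```python
from fractions import Fraction
from itertools import product

states = list(product([False, True], repeat=2))          # the four atomic states over A, B
P = {s: Fraction(1, len(states)) for s in states}        # the equivocator P=

def prob(formula):
    """Probability of a proposition, given as a truth function of (A, B)."""
    return sum(P[s] for s in states if formula(*s))

c = prob(lambda A, B: A and (A or B)) / prob(lambda A, B: A or B)
print(c)                          # 2/3: the degree to which A v B partially entails A
print(c - prob(lambda A, B: A))   # 1/6: the corresponding degree of support
```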
§7 The Bayesian Approach Revisited

Let us revisit the scheme of §2 in the light of our discussion so far.

Step 1. Consider probability functions defined over a language L.
Step 2. Identify c(h, e) = P{e}(h) for some suitable probability function P on L, where P{e}(h) is the probability of h on evidence e.
Step 3. Identify P{e}(h) = P∅(h|e).
Step 4. Find an appropriate P∅ that represents confirmation in the absence of evidence. Here P∅(ω) =df P=(ω) = 1/|Ωn| for all ω ∈ Ωn, the equivocator on L.

We saw that together these steps have the unhappy consequence that c(h, e) = c†(h, e) = P=(h|e), which precludes learning from experience. Although the focus of the last 60 years of work on confirmation theory and inductive logic has been on Step 4, we have seen that it is not Step 4 that is at fault: the equivocator does indeed stand out as the only viable confirmation function in the total absence of evidence. If Step 4 is not at fault then we must look elsewhere. Revising Steps 1 or 2 would take us away from Bayesian confirmation theory and the remit of this paper;3 instead we will focus on Step 3:

Step 3. Identify P{e}(h) = P∅(h|e).

This says that the degree to which you should believe h if you were to grant e is exactly the degree to which you should believe h ∧ e were you to grant nothing, divided by the degree to which you should believe e were you to grant nothing. This claim is far from obvious, and, given that one of the four steps must be revised if confirmation theory is to capture learning from experience, Step 3 could do with closer scrutiny.

3 Popper (1934, Appendix *ix), for one, argued against Step 2.
We saw in §2 that the standard justification of Step 3 is in terms of conditional bets: if we interpret the degree to which one should believe h, were one to grant only e, as a betting quotient for h where the bet is called off if e is false, then Step 3 must hold to avoid the possibility of sure loss. It looks at first sight like the case for Step 3 is as compelling as that for the other axioms of probability, which rely on very similar betting justifications. Accordingly, if one were to cast aspersions on this kind of betting justification then one would seem to undermine the whole Bayesian programme. But this is too quick. It is clear that there is something special about Step 3, for it is clear that there are at least two cases in which one cannot explicate the relevant conditional degree of belief as a conditional probability. First, if e is not expressible in the relevant language L then while P{e}(h) may be well-defined, P∅(h|e) clearly is not. To take a trivial example, if L is a propositional language with a single propositional variable A, and e says that A has probability 0.8, then P{e}(A) is arguably 0.8 although P∅(A|e) is undefined because e is not a proposition of L. Second, if e is expressible in L but has probability 0 then P∅(h|e) is undefined but P{e}(h) may be well-defined. For example, the probability that a dart will hit a particular point of a dartboard may be 0, but on evidence e that the dart hit that point, the hypothesis h that the resulting score increased by 20 has a well-defined probability (P{e}(h) = 0 or 1); yet P∅(h|e) is undefined, so it is not possible that P{e}(h) = P∅(h|e). In response to this second case, one might point out that, as an alternative to taking conditional probability to be undefined, one can construe the conditional probability as unconstrained when the condition has zero probability: P∅(h|e) can be any value in the unit interval. But the main point goes through as before: P{e}(h) is well-defined and fully constrained by P∅, h and e, yet P∅(h|e) is unconstrained, so the two quantities cannot be identified.

In sum, it is apparent that it is not always appropriate to explicate P{e}(h) in terms of a conditional probability. This conclusion leads naturally to two questions. First, under what conditions, exactly, is this explication (and hence Step 3) plausible? Second, if we articulate these conditions to reformulate Step 3, will the problem of learning from experience remain? In order to answer these questions we will need to invoke the machinery of objective Bayesian epistemology.
§8 Objective Bayesian Epistemology

Bayesian epistemology addresses the following question: how strongly should an agent believe the various propositions expressible in her language? There are various kinds of Bayesian epistemology; in this section we will sketch objective Bayesian epistemology. The reader is referred to Williamson (2010b) for the details of this particular version of Bayesian epistemology.
According to objective Bayesian epistemology, an agent with evidence E and language L should apportion the strengths of her beliefs according to three norms:

Probability. Her belief function PE should be a probability function on L.
Calibration. Her belief function should be calibrated with her evidence. For example, her degrees of belief should be set to frequencies where known.
Equivocation. Her belief function should otherwise equivocate sufficiently between basic possibilities expressible in L.

The Probability norm requires that rational degrees of belief satisfy the axioms of probability given in §2. The norm says that PE ∈ P where P is the set of probability functions on L. (We need not assume that E itself is expressible as a set of sentences of L.) The usual justification of this norm is in terms of betting behaviour: if degrees of belief are interpreted in terms of betting quotients, then, in order to avoid the possibility of certain loss, they must be probabilities. Note that this justification only needs to appeal to an interpretation of unconditional degrees of belief as betting quotients—conditional beliefs will be analysed separately below—and the problems facing the interpretation of conditional beliefs in terms of conditional bets, alluded to in §7, can be set aside for the moment.

The Calibration norm says that the agent's belief function should lie within some subset of probability functions that are calibrated with her evidence, PE ∈ E ⊆ P. This can be cashed out as follows. The agent's evidence, construed as everything she takes for granted in her current operating context, may contain information about physical chances that constrains her degree of belief, and it may contain information that constrains degrees of belief in a way that is not mediated by facts about chances. To handle the latter kind of constraint, we may suppose that E imposes a set of structural, non-chance constraints which are satisfied by a subset S of all probability functions, and we insist that PE ∈ S; since this kind of constraint is not central to the points of this paper, there is no need to go into further detail here. To handle the former kind of constraint, we may suppose that the agent's evidence narrows down the chance function P∗ on L to a subset P∗ of P. Now this information will typically be pertinent to the agent's degrees of belief, for if she neglects to bet according to the known chances a shrewd stake-maker can force her to lose money in the long run. But it is too simplistic to say that the agent's belief function should itself be in P∗: she might, for instance, have evidence that θ refers to an event in the past, in which case its chance is 0 or 1 and P∗ ⊆ {P ∈ P : P(θ) = 0 or 1}, but it would be absurd to insist that PE ∈ P∗, i.e., to insist that she should either fully believe or fully disbelieve θ, because she might have no other evidence bearing on the truth of θ. For this reason PE is only constrained to lie in the
convex hull ⟨P∗⟩ of P∗. (The whole convex hull is admitted because, while the agent can be made to lose money in the long run if she bets according to degrees of belief outside the hull, as long as she stays within the hull then she avoids this possibility of loss.) In sum, the Calibration norm says that PE ∈ E = ⟨P∗⟩ ∩ S.

The Equivocation norm says that the agent's belief function should equivocate sufficiently between the basic possibilities expressible in L. The basic possibilities expressible in L are just the atomic states ω; the probability function that is maximally equivocal is the equivocator P=, so the Equivocation norm can be read as saying that the agent's belief function should be a function in E that is sufficiently close to P=. If we write ⇓E for the subset of functions in E that are sufficiently close to the equivocator, then the Equivocation norm says that PE ∈ ⇓E. It is usual to measure distance between probability functions by what has come to be known as the Kullback-Leibler divergence, dn(P, Q) = Σ_{ω∈Ωn} P(ω) log(P(ω)/Q(ω)). (For a predicate language, one can deem P to be closer to R than Q if there is some N such that for all n ≥ N the divergence dn(P, R) is strictly less than the divergence dn(Q, R).) Why should a belief function be equivocal? Because the equivocal belief functions turn out to be those that, under the betting interpretation, minimise worst-case expected loss, for a natural default loss function (Williamson, 2010a). Why should the belief function be sufficiently equivocal rather than maximally equivocal? Because in certain cases there may not be a maximally equivocal belief function in E; in such cases contextual considerations (such as the required numerical accuracy of predictions) can be used to determine what is to count as close enough to the equivocator. In general, if ↓E is the set of maximally equivocal probability functions in E then ↓E ⊆ ⇓E ⊆ E. If there are maximally equivocal functions and if ⇓E = ↓E then one can derive the maximum entropy principle of Jaynes (1957): PE ∈ ↓E = {P ∈ E : entropy H(P) = −Σ_ω P(ω) log P(ω) is maximised}. We shall suppose, in this paper, that if ↓E is non-empty then ⇓E = ↓E, so that the maximum entropy principle is applicable in this case.

There are two important consequences of this framework that set objective Bayesianism apart from other versions of Bayesian epistemology. First, no further rule of updating is required. If evidence E changes to E′ then PE changes to PE′ accordingly, where the latter function is determined afresh by the requirement that PE′ ∈ ⇓E′. Thus belief change is said to be foundational, with beliefs constantly tracking their evidential grounds, rather than conservative (independent rules for updating such as Bayesian conditionalisation tend to conserve prior belief, keeping new beliefs as close as possible to old beliefs). Having said all that, there are many natural circumstances under which the objective Bayesian update will match an update generated by Bayesian conditionalisation, and the cases in which there is disagreement between the two forms of updating can be thought of as pathological cases—cases in which it would be inappropriate to condi-
tionalise (Williamson, 2009). So under objective Bayesianism one can often think in terms of conditionalisation if one wishes, as long as one is aware of the pathological cases. The second important consequence concerns the treatment of conditional belief. Conditional degrees of belief are already determined by the above scheme: the degree to which one should believe h were one to grant only e, P{e} (h), is determined by the objective Bayesian protocol P{e} (h) = PE (h) where PE ∈ ⇓E and E = {e}. There is thus no need to resort to conditional probabilities or conditional bets in order to handle conditional beliefs. Under the objective Bayesian scheme, then, conditional probabilities are much less central than under other versions of Bayesian epistemology—they simply abbreviate quotients of unconditional probabildf ities, P (θ|φ) = P (θ ∧ φ)/P (φ), and are not to be interpreted in terms of special, conditional betting quotients. Having said all that, there are natural circumstances under which the objective Bayesian view of conditional beliefs will match the conditional bet view. Since these circumstances are important from the point of view of the present paper, we shall dwell on them. We have supposed that evidence E imposes a set of constraints that ought to be satisfied by an agent with that evidence. (There may be more than one way to formulate this set of constraints, but this will not matter for our purposes.) We will use χE to denote this set of constraints; hence E = {P ∈ P : P satisfies the constraints in χE }. Should evidence be inconsistent, i.e., should it determine a set χ0E of prima facie constraints that is unsatisfiable, one cannot identify E = {P ∈ P : P satisfies the constraints in χ0E } = ∅ because in such a situation one can hardly preclude an agent from holding any beliefs at all. Rather, some consistency maintenance procedure needs to be invoked, to generate a set χE of constraints that are jointly satisfiable. One might take χE to be a disjunction of maximal consistent subsets of χ0E , for example, or one might use a consistency maintenance procedure that retains the more entrenched evidence and revokes the less entrenched evidence; we need not decide this question here. Consider two sets of evidence, E and E′ = E ∪ {e}, where e is some sentence of L. We shall call e simple with respect to E iff χE′ is equivalent to (isolates the same set of probability functions as) χE ∪ {P (e) = 1}, i.e., iff the only constraint that e imposes in the context of E is P (e) = 1. Call e consistent with respect to E iff χE ∪ χ0{e} is satisfiable by some probability function (so that χE′ is equivalent to χE ∪χ0{e} ). We then have the following useful result (Seidenfeld, 1986, Result 1; Williamson, 2009): Theorem 8.1 If 1. e is expressible in L, 2. e is simple with respect to E,
3. e is consistent with respect to E, and 4. PE (·|e) satisfies χE , then PE′ (h) = PE (h|e). We see, then, that if the above four conditions are satisfied, a conditional degree of belief will match a corresponding conditional probability.
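As a toy illustration of how the Calibration and Equivocation norms interact, consider a two-proposition language and evidence that constrains the chance of A to be at least 0.7. Everything in the sketch below is an assumption made for the example (the language, the particular constraint, and the crude grid search standing in for a proper minimisation); it looks for the probability function satisfying the constraint that is closest to the equivocator in Kullback-Leibler terms.

```python
import math
from itertools import product

EQUIVOCATOR = 0.25          # each of the four atomic states A&B, A&~B, ~A&B, ~A&~B

def kl_to_equivocator(dist):
    """Kullback-Leibler divergence from dist to the equivocator over the four states."""
    return sum(p * math.log(p / EQUIVOCATOR) for p in dist if p > 0)

# Calibration: suppose the evidence constrains the chance of A to be at least 0.7,
# so the calibrated set E contains exactly the functions with P(A) >= 0.7.
grid = [i / 100 for i in range(101)]
best = None
for a, b1, b2 in product([x for x in grid if x >= 0.7], grid, grid):
    # parametrise P by P(A) = a, P(B|A) = b1, P(B|~A) = b2
    dist = (a * b1, a * (1 - b1), (1 - a) * b2, (1 - a) * (1 - b2))
    score = kl_to_equivocator(dist)
    if best is None or score < best[0]:
        best = (score, a, b1, b2)

print(best)   # minimised at P(A) = 0.7 and P(B|A) = P(B|~A) = 0.5:
              # equivocate exactly as far as the calibration constraint allows
```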
§9 Objective Bayesian Confirmation Theory

Having taken a detour into objective Bayesian epistemology, we are now in a position to return to the central concern of the paper—developing an account of confirmation that can capture both inductive plausibility (in particular, learning from experience) and partial entailment (in particular, the fact that the equivocator function captures confirmation in the total absence of evidence). In §7 we suggested that it is Step 3 of the Bayesian scheme—rather than Step 4—that needs reformulating. Here we apply objective Bayesian epistemology to see how Step 3 should be revised.4 The original Step 3 was,

Step 3. Identify P{e}(h) = P∅(h|e).

We have seen that objective Bayesianism has a rather different conception of conditional beliefs. Conditional beliefs are to be determined by the norms of objective Bayesianism, rather than via an interpretation in terms of conditional bets. This motivates a new version of Step 3:

Step 3′. Determine P{e}(h) using P{e} ∈ ⇓E, where ⇓E is the set of sufficiently equivocal probability functions satisfying constraints imposed by e.

According to this conception, the Bayesian scheme becomes:

Step 1. Consider probability functions defined over a language L.
Step 2. Identify c(h, e) = P{e}(h) for some suitable probability function P on L, where P{e}(h) is the probability of h on evidence e.
Step 3′. Determine P{e}(h) using P{e} ∈ ⇓E, where ⇓E is the set of sufficiently equivocal probability functions satisfying constraints imposed by e.
Step 4. Find an appropriate P∅ that represents confirmation in the absence of evidence. Here P∅(ω) =df P=(ω) = 1/|Ωn| for all ω ∈ Ωn, the equivocator on L.

4 The approach of this section is a development of that taken in Williamson (2010b) and supersedes that of Williamson (2007, 2008).
Partial entailment and inductive plausibility. It is not hard to see that this revised scheme does what we need of confirmation. For one thing, partial entailment is captured because confirmation in the total absence of evidence is implemented using the equivocator. In fact Step 4 is a consequence of Step 3′. According to Step 3′, P∅(h) is determined by the function in P that is closest to the equivocator. But this is just the equivocator itself (since there is no evidence here). Hence Step 4 follows. Indeed we can calculate that c(A, A ∨ B) = P{A∨B}(A) = P∅(A|A ∨ B) = P=(A|A ∨ B) = 2/3, just as suggested in §6. Here the identity P{A∨B}(A) = P∅(A|A ∨ B) follows by Theorem 8.1.

For another thing, inductive plausibility can also be captured by this theory of confirmation: learning from experience is no longer impossible. Suppose that an agent grants that a hundred ravens were sampled and all found to be black and that all outcomes are independent and identically distributed (iid) with respect to physical probability. This yields an evidence base E and tells her something about the physical probabilities: there is high probability that the probability of a raven being black is close to the sample mean, i.e., to 1. Statistical theory can be used to quantify this probability and to derive conclusions of the form P∗(P∗(Br101) ≥ 1 − δ) = 1 − ϵ.5

5 Note that frequentist statistical theory only yields claims about repeatably instantiatable events—not about single cases such as Br101. Thus frequentist statistics yields statements of the form freq_S(|X̄ − freq_R(B)| < δ) = 1 − ϵ, where here the reference class R of the innermost frequency statement is that of all ravens, the reference class S of the outermost frequency statement is that of all samples of a hundred ravens, and X̄ is the sample mean, i.e., the proportion of sampled ravens that are black (1 in the case of the agent's particular sample). Such statements are read: if one were to repeatedly sample a hundred ravens then the proportion of samples which have sample mean within δ of the proportion of ravens that are black, is 1 − ϵ. While the normal approximation to the binomial distribution might be applied to yield δ or ϵ in many such cases, in the case of extreme sample frequencies, such as the frequency 1 in our example, interval estimation is rather subtle—see, e.g., Brown et al. (2001). The frequencies in such statements are normally understood as counterfactual rather than actual frequencies—i.e., the reference classes include possible ravens and possible samples other than those that actually appear (Venn, 1866, p. 18; Kolmogorov, 1933, §2). Such a frequency statement must then be specialised to the single case before the Calibration norm can be used to constrain the single-case belief function PE by appealing to the single-case chance function P∗. The specialisation to the single case is itself a subtle question, not least because frequencies involving different reference classes can yield conflicting information about single-case probabilities (the so-called reference-class problem). The machinery of evidential probability was developed for the task of specialising frequentist statements to the single case (Kyburg Jr and Teng, 2001)—this kind of machinery can integrate into the objective Bayesian framework to permit calibration (Wheeler and Williamson, 2009).

Now fix 1 − ϵ0 to be the minimum degree of belief to which the agent
would need to believe P ∗ (Br101 ) ≥ x for her to grant it (i.e., for her to add that proposition to her evidence base). Then apply statistical theory to determine a δ0 such that P ∗ (P ∗ (Br101 ) ≥ 1 − δ0 ) = 1 − ϵ0 . By the Calibration norm of §8, the agent’s rational degrees of belief should be calibrated to this physical probability and so she should strongly believe that the chance is close to 1, PE (P ∗ (Br101 ) ≥ 1 − δ0 ) = 1 − ϵ0 . Accordingly the agent grants that the chance is close to 1, thereby increasing her evidence base from E to E′ = E ∪ {P ∗ (Br101 ) ≥ 1 − δ0 }. Applying the Calibration norm again, the agent should strongly believe that the raven in question will be black, PE′ (Br101 ) ≥ 1 − δ0 . The Equivocation norm will then incline the agent to a sufficiently equivocal point in the interval [1 − δ0 , 1], e.g., PE′ (Br101 ) = 1 − δ0 . We then have that c(Br101 , E′ ) = PE′ (Br101 ) = 1 − δ0 . Thus gaining evidence E′ does raise the degree of confirmation of the next raven being black and we do have learning from experience. Note that in this account of inductive plausibility, quite a lot is packed into E and E′ . In particular, the evidence base needs to include not only facts about the observed sample but also facts about the sampling process in order to derive useful consequences about the chances. However, as pointed out in §8, we do not need to presume that E or E′ is expressible as a proposition e of L. This is a decided advantage of the objective Bayesian approach over other versions of Bayesian confirmation theory: while, when we are deciding how strongly to believe a proposition h, it is important to be able to express that proposition, the task of expressing everything we take for granted is a hopeless, if not in principle impossible, task.6 Note too that statistical theory plays a leading role in implementing the Calibration norm. Hence it is statistical theory that accounts for the inductive plausibility component of confirmation. This contrasts with Carnap’s view that inductive plausibility is a question of logic rather than of mathematical statistics. But it is surely partial entailment, rather than inductive plausibility, that is the logical notion: partial entailment deals with the extent to which premisses entail a conclusion—and entailment is clearly a logical notion—while inductive plausibility deals with the extent to which a hypothesis which goes well beyond the evidence (i.e., which may have little or no deductive support from the evidence) is nevertheless warranted by that evidence—and this goes beyond logic.
6 On the other hand, in the above example L is taken to be rich enough to express claims, such as P∗(Br101), about physical probabilities. It is often possible to draw useful consequences about chance on less expressive languages, but one should not expect conclusions drawn on a more impoverished language to agree with those drawn on a richer language (Williamson, 2010b, §9.2).
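To make the statistical step above concrete, here is one crude way of obtaining a δ0 for a given ϵ0. This is an illustrative sketch only: the paper leaves the choice of statistical method open, and footnote 5 warns that interval estimation at extreme sample frequencies is subtle. The sketch uses the exact one-sided binomial bound: if every one of n independently sampled ravens is black, then any chance p below ϵ0^(1/n) would give that observation probability below ϵ0.

```python
def lower_confidence_bound(n, epsilon):
    """One-sided lower confidence bound on the chance p that a raven is black,
    given that all n sampled ravens were black: if p were below epsilon**(1/n),
    the probability of observing n black ravens in a row would be p**n < epsilon."""
    return epsilon ** (1 / n)

epsilon_0 = 0.05                       # the agent grants what she believes to degree 0.95
delta_0 = 1 - lower_confidence_bound(100, epsilon_0)
print(1 - delta_0, delta_0)            # about 0.9705 and 0.0295

# By Calibration, the agent's degree of belief that raven 101 is black should then be
# at least 1 - delta_0, i.e. roughly 0.97, rather than the equivocator's 0.5.
```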
Broadly speaking, then, the Equivocation norm of objective Bayesian epistemology captures partial entailment and the Calibration norm captures inductive plausibility.

Step 3 and Step 3′. To what extent does Step 3′ differ from Step 3? I.e., when will P{e}(h) = P∅(h|e) under an objective Bayesian construal? And should Step 3′ or Step 3 be preferred where they disagree? Theorem 8.1 can help us answer these questions. Applying Theorem 8.1 in the context of Step 3 and Step 3′, E = ∅, E′ = {e}, and the four conditions of Theorem 8.1 are the conditions under which P{e}(h) = P∅(h|e). Should Step 3 and Step 3′ disagree, P{e}(h) ≠ P∅(h|e), and one or more of these four conditions must fail. Let us examine such failures to see whether Step 3 or Step 3′ is to be preferred in each case.

Condition 1. Suppose e is not expressible in L. Then, as noted at the end of §7, P∅(h|e) is undefined. Hence P{e}(h) ≠ P∅(h|e). Of course in this case Step 3′ is more plausible than Step 3, because Step 3 cannot be implemented.

Condition 2. Suppose then that e is expressible in L but that e is not simple with respect to E = ∅: i.e., e does not merely impose the constraint P(e) = 1. To take a rather trivial example, suppose e says that P∗(h) = 0.9. This e clearly imposes at least two constraints: P(e) = 1 (i.e., P(P∗(h) = 0.9) = 1) and, via the Calibration norm, P(h) = 0.9. Hence Step 3′ sets P{e}(h) = 0.9. Where there is disagreement between Step 3 and Step 3′, P∅(h|e) ≠ 0.9. Clearly it is more appropriate to use Step 3′, which forces c(h, e) = 0.9, rather than Step 3, which forces c(h, e) ≠ 0.9: the conditional probability simply gets it wrong. The same point can be made in favour of Step 3′ even if the details of the account of calibration of §8 are not adopted. Suppose e says that P(h) = 0.9 (so e talks of rational belief rather than chance). Again, e clearly imposes at least two constraints: P(e) = 1 (i.e., P(P(h) = 0.9) = 1) and P(h) = 0.9. Now there are two cases. If P∅(h|e) = 0.9 then Step 3′ will agree with Step 3 and the question of which is to be preferred does not arise. Otherwise P∅(h|e) ≠ 0.9, and Step 3′ is clearly more appropriate because Step 3 will break one of the constraints imposed by e: Step 3′ forces c(h, e) = 0.9 but Step 3 forces c(h, e) ≠ 0.9. Again, the conditional probability simply gets it wrong.

Condition 3. Suppose e is inconsistent with respect to E. Since E = ∅ here, this means that e imposes a set χ0{e} of prima facie constraints that is not satisfiable by any probability function on L. As mentioned in §8, the
objective Bayesian strategy is to invoke some consistency maintenance procedure to generate a consistent set χE′ of constraints, and to set E′ = {P : P satisfies χE′}. Step 3′ then selects some PE′ ∈ E′ that is sufficiently equivocal. How does Step 3 proceed? There are two cases here. First, e may be a logical contradiction. If so, e must have probability 0 and the conditional probability P∅(h|e) must be undefined (or, just as bad, unconstrained). In this case Step 3′ is more plausible than Step 3, because either Step 3 cannot be implemented or it offers no constraint—i.e., e confirms h to no degree at all, or e confirms h to any degree (admitting conclusions as bizarre as c(e, e) = 0). The second possibility is that e is not a logical contradiction, but nevertheless it imposes unsatisfiable constraints. For instance, e may say h ∧ P(h) = 0.9, i.e., that h is true but you ought to believe it only to degree 0.9. While e is not a logical contradiction there is nevertheless something fishy about it, in the sense of Moore's paradox, because it imposes a set of prima facie constraints χ0{e} = {P(h) = 1, P(h) = 0.9} that is unsatisfiable. While there might be some question as to which consistency maintenance procedure to adopt in this situation—one might identify χ{e} with {P(h) = 1 ∨ P(h) = 0.9} or {P(h) ∈ [0.9, 1]} or ∅, for example—it is clearly the right strategy to maintain consistency somehow, since an agent must be entitled to some belief function or other in such a situation. So Step 3′ seems the right approach to take. Now if P∅(e) = 0 then, as before, P∅(h|e) is undefined or unconstrained and Step 3′ is clearly to be preferred over Step 3. But if P∅(e) > 0 and P{e}(h) ≠ P∅(h|e), then intuitively one should go with P{e}(h) rather than P∅(h|e) since only the former results from the appropriate consistency maintenance procedure. Indeed, in our example if P∅(e) > 0 then P∅(h|e) = 1 since h ∧ e is logically equivalent to e, but it is clearly unacceptable to insist that c(h, e) = 1 when e is unsatisfiable, so Step 3′ is to be preferred over Step 3.7

7 If P∅(e) is understood as an objective Bayesian probability, this last situation perhaps does not arise. Arguably it cannot be that P∅(e) > 0 because the norms of objective Bayesianism should ensure that P∅(e) = 0 when e imposes unsatisfiable constraints. The idea here is that any set of evidence E imposes the constraint P(θ) = 0 for each θ inconsistent with respect to E. Such a constraint is called a structural constraint (§8). If this policy is accepted then indeed the aforementioned situation does not arise under an objective Bayesian construal of P∅(e). Note too, though, that if this policy is accepted then P∅ will not agree with the equivocator function P= on those unsatisfiable propositions that are not logical contradictions.

Condition 4. Suppose PE(·|e) does not satisfy χE. Since in the current context E = ∅, χE must also be empty. So the only way in which PE(·|e) can fail to satisfy χE is if PE(·|e) is not a well-defined probability function. This occurs if PE(e) = 0 and conditional probability is taken as undefined
when the condition has probability zero. As we just saw in the discussion of Condition 3, in this case Step 3 is not implementable and Step 3′ is to be preferred. At the end of §7 we encountered two situations in which Step 3 is inappropriate: the case in which e is not expressible in L and the case in which e has probability 0. These cases correspond to infringements of Conditions 1 and 4 of Theorem 8.1 and one kind of infringement of Condition 3. We asked in §7 whether there are any other restrictions that need to be made to Step 3. We now have our answer: Conditions 2 and 3 spell out the only other restrictions that need to be made. Where these four conditions are satisfied the objective Bayesian account will agree with the original Bayesian scheme of §2. On the other hand, in each case in which these conditions fail, the objective Bayesian account, which replaces Step 3 by Step 3′ , is to be preferred. Note that with the problem of learning from experience, it is Condition 2 that is pertinent: new evidence e tends not to be simple with respect to background E. If e says that a hundred ravens were observed and all found to be black, and that the pertinent chances are iid, then, according to the above account, e does not merely impose the constraint P (e) = 1 but also constraints that imply P (P ∗ (Br101 ) ≥ 1 − δ) = 1 − ϵ. Consequently e is not simple and Step 3′ , rather than Step 3, must be applied. Since Step 3 is abandoned in favour of Step 3′ , the question arises as to whether the resulting account is prone to the Dutch book argument of §2. Surely an agent who does not set P{e} (h) = P∅ (h|e) opens herself up to the possibility of sure loss? The natural response to this worry is just to point out that in the objective Bayesian framework conditional beliefs are not interpreted in terms of conditional bets, so infringing Step 3 does not expose an agent to sure loss. To put it another way, one would be advised not to place a conditional bet, conditional on evidence that is not simple with respect to current evidence, with a betting quotient matching one’s rational degree of belief (as determined by Step 3′ ), for fear of sure loss as per the argument of §2. The interpretation of conditional beliefs in terms of conditional bets is therefore inappropriate in general. Under the approach advocated here, conditional beliefs are explicated by considering unconditional probabilities relative to an evidence base that is expanded to include the conditioning proposition, rather than by considering conditional bets and conditional probabilities. Levi (2010, §4) also favours an approach based on expanding evidence rather than conditional bets. However, Levi imposes a principle—Confirmational Conditionalisation—that forces consistency between conditional beliefs and conditional probabilities. This principle is arguably too strong: according to the argu-
ment of this section, while one should expect considerable agreement between conditional degrees of belief and conditional probabilities, agreement should not be universal. In particular, if the conditioning evidence is not simple with respect to the rest of the evidence base then a conditional degree of belief may well disagree with the corresponding conditional probability, and for good reason. Hence Levi’s principle of Confirmational Conditionalisation is arguably just as inappropriate as the Carnapian tradition of confirmation theory upon which he is trying to improve.8
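To see the divergence described under Condition 2 in miniature, here is a small numerical sketch (an illustration of my own, not part of the chapter's formal apparatus). It assumes a toy language with just two atomic sentences h and e, and it encodes the constraint that e imposes via calibration simply as P(h) = 0.9; Step 3 conditionalises the equivocator on e, while Step 3′ re-equivocates subject to the constraints imposed by taking e as evidence.

```python
# Toy illustration (my own, under the stated simplifying assumptions) of how
# Step 3 and Step 3' can come apart when the evidence e is not simple.
from itertools import product

states = list(product([True, False], repeat=2))   # (h, e) truth-value pairs

def equivocator():
    """P=: the maximally equivocal function, uniform over the atomic states."""
    return {s: 1.0 / len(states) for s in states}

def conditionalise(P, condition):
    """Step 3: ordinary conditionalisation P(. | condition)."""
    mass = sum(p for s, p in P.items() if condition(s))
    return {s: (p / mass if condition(s) else 0.0) for s, p in P.items()}

def maxent_given_e(chance_of_h=0.9):
    """Step 3': the most equivocal function satisfying P(e) = 1 and P(h) = 0.9.
    With only two atoms these constraints fix the distribution uniquely."""
    P = {s: 0.0 for s in states}
    P[(True, True)] = chance_of_h          # h and e
    P[(False, True)] = 1.0 - chance_of_h   # not-h and e
    return P

is_h = lambda s: s[0]
is_e = lambda s: s[1]
prob = lambda P, prop: sum(p for s, p in P.items() if prop(s))

step3 = conditionalise(equivocator(), is_e)
step3_prime = maxent_given_e()

print("Step 3 :  P_empty(h | e) =", prob(step3, is_h))        # 0.5
print("Step 3':  P_{e}(h)       =", prob(step3_prime, is_h))  # 0.9
```

Only Step 3′ respects the constraint that e itself imposes, which is the point pressed above against forcing agreement between conditional beliefs and conditional probabilities.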
§10 Conclusion
Let us recap the main line of argument. Of the four-step Bayesian scheme of §2, Step 4 has been the main locus of the debate concerning Bayesian confirmation theory, largely because it is commonly thought that Step 4 must be revised if confirmation theory is to adequately capture the problem of learning from experience. But revising Step 4 leads to another problem, namely a failure of confirmation theory to capture the phenomenon of partial entailment. In fact, learning from experience can be accounted for in a different way: by reformulating Step 3 in accordance with the prescriptions of objective Bayesian epistemology. This leads to an objective Bayesian confirmation theory and a new four-step scheme that is broadly preferable to the original scheme of §2. During the course of this argument we have had to appeal to some subtle distinctions—the distinction between Bayesian confirmation theory, the Bayesian interpretation of probability and Bayesian epistemology, for 8
Proponents of an interpretation of conditional beliefs in terms of conditional bets might wonder whether one can force consistency between conditional beliefs and conditional probabilities on the objective Bayesian account. If successful, such a move might salvage Step 3. Perhaps the most promising suggestion in this regard is simply to impose a structural constraint of the form P∅ (h|e) = P{e} (h) for each pair of sentences e and h of L. If the resulting set of constraints is satisfiable then it would appear that conditional beliefs can be thought of as conditional probabilities after all. However, it is doubtful that such a set of constraints is satisfiable. Note that P{A1 ∨¬A1 } (ω) = 1/|Ωn | for any ω ∈ Ωn . This is because a tautology fails to provide substantive information about chances, so E = P and ⇓E = {P= }. But according to the above suggestion we have a structural constraint of the form P{A1 ∨¬A1 } (ω) = P∅ (ω|A1 ∨ ¬A1 ). Now P∅ (ω|A1 ∨ ¬A1 ) = P∅ (ω ∧ (A1 ∨ ¬A1 ))/P∅ (A1 ∨ ¬A1 ) = P∅ (ω). Therefore P∅ (ω) = 1/|Ωn | for all ω ∈ Ωn , i.e., P∅ = P= . But then the problem of learning from experience reappears: under the proposed structural constraints, P{e} (h) = P∅ (h|e) = P= (h|e) = P= (h) = P∅ (h) if h and e are logically independent. This contradicts the observation above that it is possible to learn from experience on the objective Bayesian account, i.e., that P{e} (h) > P∅ (h) for some logically independent e and h.
instance, as well as the distinction between conditional probabilities, conditional bets and conditional beliefs, and the distinction between inductive plausibility and partial entailment. But by teasing these concepts apart we create the conceptual space for a new and promising theory of confirmation. Further work needs to be done to flesh out the theory, of course. The problem of reconciling learning from experience with partial entailment is but one problem for confirmation theory—others include the question of whether universal hypotheses can have positive confirmation and the question of whether language relativity infects confirmation theory. It would be interesting to see how an objective Bayesian confirmation theory might answer these questions. Concerning the first question, it is well known that the equivocator awards zero probability to universally quantified statements. (Indeed, any function satisfying Johnson’s Sufficientness Postulate, hence any function in Carnap’s λ-continuum, awards zero probability to universally quantified statements—see, e.g., Hintikka and Niiniluoto (1980) and Paris (1994, Theorem 12.10).) But under the objective Bayesian account the equivocator only captures confirmation in very special cases (e.g., in the total absence of evidence, or in the case of tautological evidence)—cases in which it is by no means problematic that universal statements be given probability zero. There is clearly nothing in objective Bayesian theory that precludes awarding positive probability to universally quantified statements in the presence of less trivial evidence. Indeed it is clear that if evidence imposes constraints that force a universal hypothesis to have positive probability then it will have positive probability. Arguably an agent’s evidence includes everything she takes for granted, including theory, assumptions and background knowledge as well as the results of observations (Williamson, 2010b, §1.4). If her theoretical evidence includes universal hypotheses, then those universal hypotheses will have positive probability, as will universal hypotheses that they partially entail or render inductively plausible. A detailed investigation of this phenomenon remains a topic for further research. Concerning the second question, it appears that objective Bayesian probability does depend to some extent on the underlying language L, and rightly so because an agent’s language can, independently of any expressions formulated in that language, encapsulate factual information about the world in which the agent dwells. To take a simple example, if the agent’s language has 20 different words for snow, that says something about her environment (Williamson, 2010b, §9.2). Under an objective Bayesian confirmation theory, this would imply that the degree to which e confirms h is relative to some extent on the perspective of the underlying language L. As to whether this leads to any kind of problematic incommensurability of confirmation is another topic for further research.
Acknowledgements I am very grateful to Donald Gillies, Theo Kuipers and an anonymous referee for helpful comments, and to the British Academy and the Leverhulme Trust for financial support.
Bibliography Bayes, T. (1764). An essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society of London, 53:370–418. Boole, G. (1854). An investigation of the laws of thought on which are founded the mathematical theories of logic and probabilities. Walton and Maberly, London. Brown, L. D., Cai, T. T., and DasGupta, A. (2001). Interval estimation for a binomial proportion. Statistical Science, 16(2):101–117. Carnap, R. (1945). On inductive logic. Philosophy of Science, 12(2):72–97. Carnap, R. (1950). Logical foundations of probability. Routledge and Kegan Paul, London, second (1962) edition. Carnap, R. (1952). The continuum of inductive methods. University of Chicago Press, Chicago IL. Carnap, R. (1971). Inductive logic and rational decisions. In Studies in inductive logic and probability, volume 1, pages 5–31. University of California Press, Berkeley. Fitelson, B. (1999). The plurality of Bayesian measures of confirmation and the problem of measure sensitivity. Philosophy of Science, 66:S362– S378. Gillies, D. (1990). The turing-good weight of evidence function and popper’s measure of the severity of a test. British Journal for the Philosophy of Science, 41:143–146. Good, I. J. (1960). The paradox of confirmation. British Journal for the Philosophy of Science, 11:145–149. Good, I. J. (1980). Some history of the hierarchical Bayes methodology. Trabajos de Estad´ıstica y de Investigaci´ on Operativa, 31(1):489–519. Hintikka, J. and Niiniluoto, I. (1980). An axiomatic foundation for the logic of inductive generalisation. In Jeffrey, R. C., editor, Studies in inductive logic and probability, volume 2. University of California Press, Berkeley and Los Angeles. Howson, C. and Urbach, P. (1989). Scientific reasoning: the Bayesian approach. Open Court, Chicago IL, second (1993) edition. Jaynes, E. T. (1957). Information theory and statistical mechanics. The Physical Review, 106(4):620–630. Jeffreys, H. (1936). Further significance tests. Mathematical Proceedings of the Cambridge Philosophical Society, 32:416–445.
Johnson, W. E. (1932). Probability: the deductive and inductive problems. Mind, 41(164):409–423. Kemeny, J. G. (1953). A logical measure function. Journal of Symbolic Logic, 14(4):289–308. Kemeny, J. G. and Oppenheim, P. (1952). Degree of factual support. Philosophy of Science, 19(4):307–324. Keynes, J. M. (1921). A treatise on probability. Macmillan (1948), London. Kolmogorov, A. N. (1933). The foundations of the theory of probability. Chelsea Publishing Company (1950), New York. Kuipers, T. A. F. (1986). Some estimates of the optimum inductive method. Erkenntnis, 24:37–46. Kuipers, T. A. F. (2001). Structures in science: heuristic patterns based on cognitive structures; an advanced textbook in neo-classical philosophy of science. Kluwer, Dordrecht. Synthese Library volume 301. Kyburg Jr, H. E. and Teng, C. M. (2001). Uncertain inference. Cambridge University Press, Cambridge. Levi, I. (2010). Probability logic, logical probability, and inductive support. Synthese, 172:97–118. Nix, C. (2005). Probabilistic induction in the predicate calculus. PhD thesis, University of Manchester. Nix, C. J. and Paris, J. B. (2006). A continuum of inductive methods arising from a generalised principle of instantial relevance. Journal of Philosophical Logic, 35:83–115. Paris, J. B. (1994). The uncertain reasoner’s companion. Cambridge University Press, Cambridge. Popper, K. R. (1934). The Logic of Scientific Discovery. Routledge (1999), London. With new appendices of 1959. Salmon, W. C. (1967). Carnap’s inductive logic. The Journal of Philosophy, 64(21):725–739. Seidenfeld, T. (1986). Entropy and uncertainty. Philosophy of Science, 53(4):467–491. Venn, J. (1866). Logic of chance: an essay on the foundations and province of the theory of probability. Macmillan, London, second (1876) edition. Wheeler, G. and Williamson, J. (2009). Evidential probability and objective Bayesian epistemology. In Bandyopadhyay, P. S. and Forster, M., editors, Handbook of the Philosophy of Statistics. Elsevier. Williamson, J. (2007). Inductive influence. British Journal for the Philosophy of Science, 58(4):689–708. Williamson, J. (2008). Objective Bayesianism with predicate languages. Synthese, 163(3):341–356. Williamson, J. (2009). Objective Bayesianism, Bayesian conditionalisation and voluntarism. Synthese, 178(1):67–85. Williamson, J. (2010a). Bruno de Finetti: Philosophical lectures on probability. Philosophia Mathematica, 18(1):130–135.
Williamson, J. (2010b). In defence of objective Bayesianism. Oxford University Press, Oxford. Wittgenstein, L. (1922). Tractatus logico-philosophicus. Routledge & Kegan Paul. Philosophy SECL, University of Kent Canterbury, CT2 7NF United Kingdom [email protected]
ADAM GROBLER
AN EXPLICATION OF THE USE OF INFERENCE TO THE BEST EXPLANATION
1. PROSPECTS AND PROBLEMS OF IBE
The aim of the present paper is, first, to give an explication of the very phrase "best explanation", and second, to give some suggestions about its methodological use. The explication on offer will be given in terms of two set-theoretical criteria for comparing the relative explanatory power of alternatives. One criterion is designed to compare rival hypotheses put forward within a fixed body of background knowledge; the other is designed to compare an original body of background knowledge with its attempted revision. The proposal will be claimed to resolve problems of the Duhemian variety as well as the incommensurability problem. In the face of vast disagreement on the content of the concept of explanation, let alone the question of which explanation is better than another, let us take as the starting point of discussion some well-known conception of these matters. I find Peter Lipton's1 account most suitable for this job, for I am very sympathetic to his claim that IBE gives the best explanation of our inductive practices—better than instant-confirmation or falsificationist accounts of scientific method. Consequently, I shall try to modify his conception, preserving its spirit, in order to produce an even better explanation. To start with, let us assume the general idea that explanation is a direct answer to a why-question. Sometimes it is claimed that there are also explanations that consist in giving an answer to a how-question. To allow this, however, makes the task of analysing the patterns of direct answers much more difficult, if feasible at all. Instead, I am inclined to take for granted that explanation-seeking how-questions are reducible to a series of why-questions. I believe so, for answers to such questions seem to be expected to uncover a hidden mechanism of a process. They are explanatory to the extent that they do not confine themselves to pure description, but also point to factors that determine or raise the probability that subsequent stages of the process occur. Sometimes it is also claimed that not all why-questions are explanation-seeking. To meet this objection, further analysis will narrow down the range of why-questions, as well as that of answers to them, that are admissible in the context of the present considerations. First of all, let us assume the idea, which was employed prior to Lipton by Bas van Fraassen2, of a contrastive why-question "Why
1 Peter Lipton, Inference to the Best Explanation. London: Routledge 1991.
2 Bas van Fraassen, The Scientific Image. Oxford: Clarendon Press 1980.
P rather than X?". This move does away with all pseudo-explanations of the sort "Such were the dictates of fortune". For Lipton, to explain why P (rather than X) is to point to an element in the causal history of P which has no counterpart in the causal history of not-X.3 This gets rid of all the conceptual problems with telling key causes from secondary ones, or deciding which causes on the indefinitely long list are explanatory and which are not, while it still leaves us with the question of which cause is to be cited in the best of the explanations at hand. The most important advantage of Lipton's formulation, as I see it, is its potential to solve all known paradoxes of explanation that have been put forward in the long-lasting debate. To give one example, let me recall the syphilis-and-paresis case4. To say that John contracted paresis because he had previously suffered from untreated latent syphilis does not explain his misfortune in Hempelian terms5, for its probability is too low. This is a rather peculiar result, given that John's past trouble raises the probability in question from zero to 10% or so. On the contrastive account it is clear that one can explain why John rather than Jane contracted paresis, provided that John has previously suffered from untreated latent syphilis while Jane has not. The possible good luck of Mary, who avoided contracting paresis despite her syphilitic past, though highly probable, remains, again contrary to Hempel's ideas, unexplained vis-à-vis John's or anybody else's fate.6 One disadvantage of Lipton's account is his neglect of forms of explanation other than causal ones, especially functional and intentional explanation. Some examples in his book7 make room for intentional explanations, though, as variations of causal ones, with motivations as causes. This idea, however, may undesirably interfere with that of causal explanation of unintended results of intentional actions. Be this as it may with intentional explanations, functional explanations are clearly ignored. Still, this drawback is easily amended once it is admitted that explanatory questions can be answered in terms of functions and intentions as well.
3 P. Lipton is not explicit on the concept of counterpart. By way of explication I am inclined to suggest that, in the context of the question "Why P rather than X?", c′ is a counterpart of c iff, by analogy, if c is a cause of P, c′ might be a cause of X. Note that c′, although it might be a cause of X, in fact is not, for the question under consideration presupposes that X has not occurred. Instead, the absence of c′, taking its analogy to c for granted in the context, is a sort of preventing cause for X to occur and therefore a sort of cause of not-X. This point will be made clear in an example to follow.
4 This example is so popular that I am unable to point to its primary source.
5 See Carl G. Hempel, Philosophy of Natural Science. Englewood Cliffs: Prentice-Hall 1966.
6 Note that in the causal histories of Jane's and Mary's not-suffering from paresis the counterparts of the relevant cause of John's suffering from paresis are, respectively, Jane's and Mary's previous sufferings from syphilis. The absence of such a counterpart in Jane's story explains the difference between the end-points of John's and Jane's stories, while its presence in Mary's story leaves open the question of what has prevented her from sharing John's fate.
7 Ibid.
A more serious shortcoming of Lipton's conception is that his criteria for the betterness of explanations are rather vague. They include mechanism, scope, and precision. But, first, how are we to compare alternative accounts of mechanism? The Ptolemaic and Copernican systems, the caloric and kinetic theories of heat, or the wave and particle theories of light propagation seem, intuitively, to provide us with equally detailed characterizations of the mechanisms of the celestial bodies' movements, of heating and cooling, or of refraction. But how can comparisons in this respect be made more sophisticated than purely intuitive ones? To make things worse, even if we were able to say which explanation uncovers the (hypothetical) mechanism in more detail, it is by no means clear that the more detailed account is better. Many authors consider simplicity an important merit of explanation. Even if the tension between thoroughness and simplicity were resolved, in many cases the alternative mechanisms seem simply incommensurable. Take, for example, Vortex against Gravity. It is easier to compare explanations with regard to scope and precision. Scope, however, is frequently enlarged at the expense of precision and vice versa. To include quantum phenomena in the scope, one has to accept Uncertainty. To have a precise solution of the n-body problem, one has to introduce special assumptions that narrow down the scope. Not only may the two criteria pull in opposite directions, but each of them taken separately may give ambiguous results. The scopes of two explanations may overlap, or a novel explanation may gain some precision with respect to one parameter and at the same time lose some precision with respect to another. To make things worse, sometimes enlarging the scope does not bring about any explanatory gain, and too much precision is even undesirable. For example, applying the pattern of Natural Selection to account for the evolution of scientific knowledge is just a trick of analogical thinking. And trying to account for the results of a diet in terms of minute and frequently recorded changes of the dieter's weight ignores ceteris paribus clauses. These observations suggest the quite plausible idea that, instead of imposing a linear ordering on alternative explanations, the criteria of betterness establish at best a partial ordering. To arrive at this 'best', and at some of its applications, I shall attempt to elaborate some more operational criteria.
2. HEURISTICS
In the previous section we accepted the idea that explanation consists in answering a contrastive why-question. On the other hand, IBE is to be applicable to the evaluation of hypotheses. To combine these two ideas, it is necessary to take into account a relation between explanatory answers and explanatory hypotheses. Fortunately, this is quite easy. For example, consider the question: "Why did John rather than Jane contract paresis?". The explanatory answer is: "Because John had suffered from untreated latent syphilis while Jane had not". Otherwise
no explanation is available, except for pseudo-explanations of the sort "Because John had bad luck and Jane had good luck". It is clear that the explanatory answer is supported by the explanatory hypothesis: "People (sometimes) contract paresis only when they have suffered from untreated latent syphilis before". Note that pseudo-explanations are not so grounded: there are no universal hypotheses about luck. Or, if there are some, they are either ad hoc, like the Polish proverb "A fool is always lucky", or have a figurative meaning, like the English proverb "Luck prefers the prepared". Now, let us take as a hint the suggestion that the more explanatory answers are supported by a hypothesis, the better the hypothesis is. "The better" means here "has a greater explanatory power". The above formulation is clearly too loose to be considered seriously as it stands. One difficulty is the problem of the individuation of questions. In the syphilis–paresis example it is tempting to consider the number of all the questions of the form "Why did x rather than y contract paresis?" that have explanatory answers in terms of the hypothesis under consideration. Suppose, however, that some alternative cause of paresis, say physilis, is discovered, and physilis turns out to be effective in far fewer cases than syphilis. If what matters is the number of pairs (x, y) such that x contracted paresis due to its hypothetical cause and y did not in the absence of the relevant cause, then the syphilis hypothesis appears much better than the physilis one, which is a highly counterintuitive result. It seems that in cases like this the whole class of questions under consideration, let us call it a question-type, is to count as a single question. For example, the disjunctive hypothesis "People (sometimes) contract paresis only when they have suffered from untreated latent syphilis or physilis before" supports explanatory answers to more questions of the question-type under consideration than each of the disjuncts does separately. This, I think, is a good reason for preferring the former over the latter. It appears, then, that the individuation of questions is a context-dependent matter. Another difficulty arises from Grünbaum's challenge.8 He argued there, for example, that Special Relativity cannot answer more questions than Classical Kinematics, for some questions answered by the latter are simply ill-posed from the point of view of the former. These include questions about the velocity of a mass-point at a specific time that are put without referring to a particular frame of reference. Grünbaum's argument reflects his preconception that whenever there is no true direct answer to a question, the question is ill-posed and, consequently, has no true answer at all. To resist his line of thought, let us invoke the concept of a corrective answer, one of the concepts of the logic of questions.
8 Adolf Grünbaum, "Can a theory answer more questions than one of its rivals?", The British Journal for the Philosophy of Science 37, 1, 1986: pp. 1-23.
3. APPLYING THE LOGIC OF QUESTIONS: PRELIMINARIES
Let us call a presupposition of the question Q a necessary condition for Q to have a true direct answer, where a direct answer, putting technicalities aside, is any possible just-sufficient answer. For example, the question "Has John stopped beating his wife?" has two direct answers: "Yes" and "No". This question has some presuppositions, such as "John has beaten his wife so far" and "John is married". When a presupposition of a question is not true, there is no true direct answer to it. Still, such a question has a true corrective answer, where a corrective answer is the denial of a presupposition of the question. In our example, possible corrective answers are "John has never beaten his wife" or "In fact, he is not married at all". Our focus, however, is not on yes-or-no questions but on why-questions. Following Tadeusz Kubiński, Theo Kuipers and Andrzej Wiśniewski, let us assume, to start with, that why-questions are reduced to what-questions.9 Consider first one of their variations, what-is-the-cause-of questions. That is, "Why P rather than X?" is to be understood as "What is the cause of P (that has no counterpart in the causal history of not-X)?". Such questions have presuppositions that correspond to what Theo Kuipers calls "meaning postulates" in his account of explanation by specification.10 These include:
i. P actually has occurred11
ii. X has not occurred12
iii. there is a causal law for some c: "if c then P occurs"13
iv. c is causally effective for P to occur …14
v. … and no counterpart of c is causally effective for X to occur (in the circumstances)15.
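The following toy sketch (my own schematic representation, not the erotetic-logic formalism of Kubiński, Kuipers and Wiśniewski referred to above) illustrates how a question with presuppositions yields a direct answer when the presuppositions hold and a corrective answer when one of them fails; the lottery scenario is borrowed from the examples given in the next section.

```python
# Schematic sketch of direct versus corrective answers (illustrative only).
from dataclasses import dataclass
from typing import Callable, Dict

Scenario = Dict[str, bool]

@dataclass
class WhyQuestion:
    text: str
    presuppositions: Dict[str, Callable[[Scenario], bool]]
    direct_answer: str

    def answer(self, scenario: Scenario) -> str:
        # A corrective answer denies a failed presupposition of the question.
        for label, holds in self.presuppositions.items():
            if not holds(scenario):
                return f"corrective answer: it is not the case that {label}"
        return f"direct answer: {self.direct_answer}"

q = WhyQuestion(
    text="Why did John rather than Jane win in the lottery?",
    presuppositions={
        "there is a cause of John's winning with no counterpart for Jane":
            lambda s: s["john_bought_more_tickets"],
    },
    direct_answer="John bought more tickets, so he had more chances than Jane",
)

print(q.answer({"john_bought_more_tickets": True}))
print(q.answer({"john_bought_more_tickets": False}))
# In the second scenario the causal presupposition fails, and the corrective
# answer plays the role of "John won for no special cause".
```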
The scheme of the search for a plausible answer draws upon the concepts of Andrzej Wiśniewski's logic of questions and was elaborated in a joint paper16.
9 Tadeusz Kubiński, An Outline of the Logical Theory of Questions, Berlin 1980; Theo A. F. Kuipers and Andrzej Wiśniewski, "An Erotetic Approach to Explanation by Specification", in: Erkenntnis 40, 3, 1994, pp. 377-402.
10 Theo A. F. Kuipers, "Explanation by Specification", in: Logique et Analyse 116, 1986, pp. 509-521.
11 Kuipers insists that P be an abnormal event. But once P is explained, it can hardly be called abnormal.
12 Kuipers's formulation is put in terms of the normal–abnormal distinction rather than in terms of the Liptonian contrastive clause.
13 Kuipers's formulation is a bit more complex. As I see it, it amounts to specifying idealizations and a ceteris paribus clause.
14 This I construe as the requirement that the idealizations and the ceteris paribus clause are satisfied in the circumstances.
15 Originally, the concept of counterpart was not mentioned.
16 Adam Grobler and Andrzej Wiśniewski, "Explanation and Theory Evaluation", in:
Here I sketch it, omitting technical details. First, one is advised to try a number of causal hypotheses of the form "if x then P"17. If this fails, one can try to revise the background knowledge so as to enlarge the scope of investigation, in order to possibly find a hitherto unknown factor that may causally operate in the circumstances. If this fails, one can try to reject some presuppositions of the question and conclude one of the following: (i) P did not occur after all, (ii) P results from a causal nexus rather than from a particular cause, or (iii) P has no specific cause. Let us illustrate the more complex of the above moves with some examples. First, take a revision of background knowledge in order to enlarge the scope of investigation. Consider the question "Why do tides occur?", that is, "What is the cause of tides?". One can establish a correlation between the tides and the movements of the Moon. The pre-Newtonian physicist would regard an appeal to that correlation as a sort of astrology, for he acknowledges only action by direct contact. Thus he cannot find any causal law of the kind required for explanation. Fortunately, the Newtonian revision of background knowledge introduced action at a distance (gravity), which provides us with a suitable explanation. Another example, this time a case of rejecting a presupposition of a why-question. Consider the question "Why did John rather than Jane win in the lottery?". Suppose John bought more tickets than Jane. Then a plausible explanatory hypothesis is that more tickets give more chances to win. Consequently, John had more chances than Jane. But if it was not the case that John bought more tickets, then, assuming that the lottery was fair, the only plausible alternative is the corrective answer to the question under consideration: "John won for no special cause". The above ideas can be generalized to include what-is-the-reason-of questions and what-is-the-function-of questions as other types of why-questions.
4. TWO COMPARATIVE CRITERIA OF EXPLANATORY POWER
Let us call a question Q decided against the background K by the hypothesis H iff H is consistent with K, and H and K taken together support a direct answer A to Q. Next, let us call a question Q rejected against the background K by the hypothesis H iff H and K taken together support a corrective answer to Q. Note that in both cases Q is answered, either directly or correctively.
Roberto Festa, Atocha Aliseda, and Jeanne Peijnenburg (eds.), Cognitive Structures in Scientific Inquiry. Essays in Debate with Theo Kuipers. Poznan Studies in the Philosophy of the Sciences and the Humanities 84, Amsterdam/New York: Rodopi 2005, pp. 299-310. Again, instead of a counterpart of c, originally we considered c itself. This, however, in many contexts is justifiable, provided that c is appropriately construed. 17 In considering each hypothesis it is required to decide a number of so-called implied questions, in the sense of erotetic implication. See e.g. Andrzej Wiśniewski, “Erotetic implications”, in: Journal of Philosophical Logic 23, 2, 1994, pp. 174-195.
With these definitions we are able to formulate the following criterion for comparing the explanatory power of rival hypotheses.18
C1. H′ is of greater explanatory power than H (relative to the background K) iff each question decided by H against K is decided by H′ against K, but not the other way round.
This criterion, however, while it helps to decide problems of choice between rival hypotheses, gives us no hint towards deciding a question of the Duhemian variety, namely, how to choose, in the face of a puzzle, between adopting a new hypothesis and revising the background.19 To solve this problem, we need another criterion:
C2. A transition from the background K to K′ is progressive iff each question answered against K is answered against K′, but there is a question decided against K′ which is not decided against K.
Two comments are in order. First, "answered" here means "decided or rejected". Second, it does not matter whether a question decided against both possible backgrounds is decided against them by the same hypothesis or by different hypotheses. Note that the above criteria (of having more questions answered) do not, as was declared before, impose a linear ordering on the pairs ⟨H, K⟩, i.e., pairs of hypotheses and systems of background knowledge. The ordering is only partial. This feature plays a crucial role in some applications mentioned in the next section, where I shall try to suggest how these criteria may work.
18 In the first paragraph of this paper the criteria were called "set-theoretical". This is because they were originally formulated (in Polish) in Adam Grobler, Metodologia nauk, Kraków: Aureus-Znak, 2006, p. 126, in terms of an inclusion relation between sets of questions. I had tried, with less success, to put them forward several times before, e.g. in Adam Grobler, "Explanation and Epistemic Virtue", in: Adam Jonkisz & Leon Koj (eds.), On Comparing and Evaluating Scientific Theories, Poznań Studies in the Philosophy of the Sciences and the Humanities, 72, 2000, pp. 61-88. That earlier attempt was also claimed to solve the problems mentioned in the next section.
19 I call this "a question of the Duhemian variety" because it closely resembles Duhem's problem. For Duhem, the question is one of choosing between rejecting a hypothesis under test and revising auxiliary hypotheses. Here again the question is how to choose between alternative lines of restoring the coherence of a more comprehensive, if not the overall, system of scientific knowledge.
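Read purely set-theoretically, C1 and C2 are easy to state in code. The sketch below (an illustration of my own, with made-up question labels) represents a hypothesis or a background system simply by the sets of questions it decides and answers, abstracting entirely from how the answers are supported.

```python
# Set-theoretic sketch of criteria C1 and C2 (illustrative only).

def greater_explanatory_power(decided_by_H2, decided_by_H1):
    """C1: H2 > H1 (relative to a fixed background K) iff the questions
    decided by H1 form a proper subset of those decided by H2."""
    return decided_by_H1 < decided_by_H2          # proper subset

def progressive_transition(answered_K, answered_K2, decided_K, decided_K2):
    """C2: the transition K -> K2 is progressive iff every question answered
    (decided or rejected) against K is answered against K2, and some question
    is decided against K2 that is not decided against K."""
    return answered_K <= answered_K2 and bool(decided_K2 - decided_K)

# Rival hypotheses against one background: the syphilis-or-physilis disjunction
# decides every paresis question that either disjunct decides, and more.
decided_by_syphilis    = {"why John rather than Jane?"}
decided_by_disjunction = {"why John rather than Jane?", "why Ann rather than Bob?"}
print(greater_explanatory_power(decided_by_disjunction, decided_by_syphilis))  # True

# A background revision: the Newtonian background decides the tides question
# left open by the contact-action background, while answering all it answered.
answered_old = {"why do apples fall?"}
answered_new = {"why do apples fall?", "why do tides occur?"}
decided_old  = {"why do apples fall?"}
decided_new  = {"why do apples fall?", "why do tides occur?"}
print(progressive_transition(answered_old, answered_new, decided_old, decided_new))  # True
```

When neither the C1 comparison nor its converse holds, the sets of decided questions merely overlap, which is exactly the case of incomparable explanatory powers exploited in the next section.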
5. APPLICATIONS TO SOME PERSISTENT QUESTIONS IN THE PHILOSOPHY OF SCIENCE
With these criteria we are able to offer solutions to some recalcitrant problems in the philosophy of science. First, a rationalization of scientists' resistance to rejecting an apparently falsified hypothesis, or to a "paradigm change", becomes available. A hypothesis that is falsified in an attempt at answering an explanatory question may still have the greatest explanatory power of all the hypotheses available at the time. This point, in a different wording, was made already by Lipton.20 But the criteria on offer enable us to make a similar point with regard to "paradigms" construed as systems of background knowledge, or as pairs of systems of background knowledge together with the hypotheses considered within their respective frameworks.21 Namely, rival paradigms may be incommensurable in the sense of having incomparable explanatory powers: the sets of questions answered by them may merely overlap. In such a case the resistance of scientists to a paradigm change represents their doubts about the potential of the new paradigm to strengthen its explanatory power to its advantage. When the criteria do not establish the superiority of the new paradigm, then, in the absence of other considerations (to be discussed in the next paragraphs), they make scientists' resistance rational, or at least not irrational. In just the same way we can rationalize the coexistence of alternative paradigms: as long as rival paradigms are incomparable in terms of the criteria, it is advisable to develop both in order to see which, if either, will win in the long run. This justification of theoretical pluralism is more elaborate than Lakatos's idiosyncratic remark about "budding" or degenerating scientific research programs that may need some time to flourish.22 Moreover, there is a use of the criteria that may help to respond to Feyerabend's claim that one can never know how much time to give to an apparently inferior scientific research program before it is abandoned.23 Clearly, responding to the anything-goes principle is not easy. As Thomas Kuhn suggested, winning paradigms often suffer explanatory losses.24 Fortunately,
20 Op. cit.
21 One "problem of the Duhemian variety" to be solved with the criteria on offer is, in this construal, whether to maintain the old paradigm, which makes an otherwise attractive hypothesis unacceptable, or to switch to a new paradigm that may appear able to accommodate the apparently negative evidence against the hypothesis in question.
22 Imre Lakatos, "Falsification and the Methodology of Scientific Research Programmes", in: Imre Lakatos & Alan Musgrave, Criticism and the Growth of Knowledge, Cambridge: Cambridge University Press, 1970, pp. 91-196. I assume that the Lakatosian concept of a scientific research program corresponds to the Kuhnian paradigm.
23 Paul K. Feyerabend, Against Method, London: New Left Books 1978. Again, the choice between alternative research programs can be combined with the question of assessing the evidence relevant for the decision about a controversial hypothesis, so as eventually to form "a problem of the Duhemian variety".
24 Thomas S. Kuhn, The Structure of Scientific Revolutions, Chicago: Chicago University Press 1962.
such paradigm changes can be accounted for, in the present framework, by supposing that on such occasions scientists reasonably expect that Kuhn's losses can be made up for in the course of future investigation. Such was, for example, the Newtonian loss of the (Cartesian) explanation of the coplanarity of the planets, which was later regained by the theory of the evolution of the solar system, a theory logically independent of Newtonian gravity. In other words, the criteria did not, to start with, decide in favor of Newtonian physics, but scientists could reasonably hope for some future development that would eventually turn the scale – which in fact came true. In the face of the above, I am inclined to suggest that the two criteria do not straightforwardly entail any particular methodological advice to be applied at a particular instant and once and for all. Rather, scientists are free to apply the criteria to an anticipated state of investigation instead of to the actual one. After all, abandoning a paradigm or research program is not aborting it. The loser at a particular stage of inquiry still lives on in books and papers and, in case a reviving idea appears, possibly driven by somebody's bizarre interest, may on demand come back onto the stage. To conclude, the present proposal offers a clarification of the concept of the best explanation and points to a reasonable middle ground between strictly logical "instant rationality", to use Lakatos's term, and anything-goes anti-methodology. It provides us with relatively precise evaluation tools and, at the same time, takes into account the element of judgment in their use.
Institute of Philosophy Opole University ul. Katowicka 89 PL-45-061 Opole Poland Home: Stojalowskiego 37/2 PL-30-611 Krakow Poland [email protected]
JOKE MEHEUS
A FORMAL LOGIC FOR THE ABDUCTION OF SINGULAR HYPOTHESES1
1 INTRODUCTION
The aim of this paper is to present a new logic, called LArs, for the abduction of singular hypotheses. The intended application context of the logic concerns problem solving situations in which one tries to 'abduce' an explanatory hypothesis for an explanandum on the basis of a background theory possibly together with some relevant empirical findings. The intended set of premises will thus typically consist of one or more explananda and some general statements that, with respect to those particular explananda, are considered as relevant. It may also contain zero or more empirical findings that are relevant (for instance, because they contradict one of the explanantia). The logic presented here is a variant of the logic LAr.2 Both LAr and LArs are adaptive logics. Adaptive logics are a family of non-standard logics that are meant to study defeasible reasoning processes in a formally exact way. As will become clear below, the adaptive logics framework is especially suited to formulate formal logics for abduction.3 The logics LAr and LArs lead to the same consequence set when applied to the same set of premises. The difference, however, is that LArs, unlike LAr, is in the so-called "standard format" of adaptive logics. Formulating an adaptive logic in standard format has many advantages. One of them is that both the semantics and the proof theory can be formulated in a completely generic way. A second, and more important advantage, is that the standard
1 Research for this paper was supported by subventions from Ghent University and from the Research Foundation – Flanders (FWO - Vlaanderen). The author is indebted to the anonymous referees for their helpful suggestions.
2 The logic LAr was first presented in Joke Meheus and Diderik Batens, "A Formal Logic for Abductive Reasoning", in Logic Journal of the IGPL 14, 2006, pp. 221-236.
3 The first logic in this family was designed around 1980 by Diderik Batens and was meant to interpret (possibly) inconsistent theories as consistently as possible. Later the notion of an adaptive logic was generalized in different ways (for instance, to capture ampliative forms of reasoning, such as abduction and inductive generalization) and a whole variety of adaptive logics was designed—for an excellent introduction to adaptive logics see Diderik Batens, Adaptive Logics and Dynamic Proofs. Mastering the Dynamics of Reasoning, with Special Attention to Handling Inconsistency, forthcoming.
format warrants that all central meta-theoretic properties (soundness, completeness, proof invariance, fixed point property, ...) hold. As soon as it can be shown that the adaptive logic satisfies the requirements of the standard format, all these properties can be proven in a generic way (that is, without referring to the properties of the specific adaptive logic). The standard format will be discussed in Section 5. A final advantage of LArs is that it is much more transparent and much more user-friendly than LAr. One of the reasons for this is that, in LArs, it is not required to determine beforehand the set of possible explanantia and the set of possible explananda, which is a drawback of LAr.4 Both LAr and LArs are based on Classical Logic—henceforth CL. So, all references to causality, laws of nature, and similar non-extensional concepts will be out of the picture. I do not doubt that more interesting results may be obtained from intensional logics. However, as a starting point, I want to keep the discussion as simple and transparent as possible.
2 THE PROBLEM
Many formal logicians display a lack of interest in, or even suspicion of, abduction. There seem to be several reasons for this. The first is that abductive steps are of the form †
(∀α)(A(α) ⊃ B(α)), B(β) / A(β)
a fallacy known as Affirming the Consequent (given Universal Instantiation). The second is that many examples of purportedly sound abductions seem to rely on a hidden non-formal reasoning: the only sensible formal rule behind them seems to lead inevitably to a set of unsound and even inconsistent conclusions. For instance, given the explananda Qa and Ra and the generalizations (∀x)(P x ⊃ Qx) and (∀x)(¬P x ⊃ Rx), (†) enables one to infer both P a and ¬P a. A third reason seems to be that abductive inferences are ‘contextually validated’. For instance, although it makes sense to infer P a on the basis of {(∀x)(P x ⊃ Qx), Qa, Qb ∧ ¬P b}, it does not make sense to infer P b from that same set of premises. Still, mainly from the side of computer science, several attempts have been made at formulating a logic-based approach to abduction.5 What these approaches have in common is that (†) is not implemented directly. Instead, abductive inferences are specified as a kind of ‘backward reasoning’: given a theory T and an explanandum B, find an A such that6 4 5
6
That both sets have to determined beforehand is one of the reasons that LAr is not in standard format—see Section 5 for the requirements on the standard format. For an excellent introduction to logic-based approaches to abduction, see Atocha Aliseda, Abductive Reasoning. Logical Investigations into Discovery and Explanation. Dordrecht: Springer 2006. I use the notion “theory” here in a broad sense. It contains the background theory, but may also contain empirical findings that are relevant for the explanation problem.
(1) T ∪ {A} ⊢ B.
(2) T ̸⊢ B.
(3) T ̸⊢ ¬A.
(4) B ̸⊢ A.
(5) A is 'minimal'.
The first of these requirements needs little explanation. Also the next two requirements are straightforward: (2) warrants that the explanandum B is not explained by the background theory, and (3) that the explanatory hypothesis A is compatible with T .7 (4) is needed to rule out (partial) self-explanations. For instance, we do not want to abduce B as an explanation for itself. Also, if T ∪ {A} ⊢ B, then T ∪ {A ∨ B} ⊢ B, but we do not want A ∨ B as an explanation for B. Cases like this are ruled out by requiring that the truth of the explanatory hypothesis is not warranted by the truth of the explanandum—this is what (4) comes to. (5) is related to the fact that, when trying to explain an explanandum, one is interested in explanations that are as parsimonious as possible. Hence, in view of (∀α)(A(α) ⊃ B(α)) ⊢CL (∀α)((A(α) ∧ C(α)) ⊃ B(α)), for an arbitrary C, (the so-called property of strengthening the antecedent) one needs to prevent that A(β) ∧ C(β) can be abduced, whenever A(β) can. This can be realized by requiring that the explanatory hypothesis is ‘minimal’. The notion of minimality can be defined in different ways. One may, for instance, consider an explanatory hypothesis as minimal if no alternative is available that is logically weaker and that also satisfies the requirements (1)–(4). This is the option that is followed for the logic LArs . The approach in the present paper is very different from the existing logicbased approaches to abduction. As mentioned above, the logic LArs is an adaptive logic. Defining a logic for abduction within the adaptive logic framework has several advantages. A first one is that the resulting logics (unlike the systems available today) have a proof theory. As we shall see below, this proof theory is dynamic (conclusions derived at some stage may be rejected at a later stage), but it warrants that the conclusions derived at a given stage are justified in view of the insight in the premises at that stage. This is especially important as, at the predicative level, abductive reasoning is not only undecidable, there even is no positive test for it (see also Section 3).8 Another advantage of the proposed logics is that they are much closer to natural reasoning than the existing systems. As we mentioned before, abduction is usually viewed as a form of backward reasoning—“find an A that satisfies the re7 8
A formula A is said to be compatible with a set of premises Γ iff Γ ̸⊢ ¬A. A property for objects of a given kind is decidable iff there is a mechanical procedure that leads to the answer YES if the property holds and to the answer NO if the property does not hold. There is a positive test for objects of a given kind iff there is a mechanical procedure that leads to the answer YES if the property holds. If the property does not hold the procedure may lead to the answer NO, but may continue forever. So, even if A follows abductively from a theory T and an explanandum B, there need not exist any finite construction that establishes this.
quirements (1)–(5)”. The search procedure by which this is realized in the existing systems (for instance, some form of linear resolution) is very different from the search procedures of human reasoners. The logic proposed in this paper treats abduction as a form of ‘forward reasoning’: it is an ampliative system that directly validates certain inferences of the form (†). The third advantage is related to this: deductive and abductive steps are nicely integrated into a single system. As a consequence, the logics not only enable one to generate explanatory hypotheses, but also to infer predictions on the basis of explanatory hypotheses and the background theory. This is highly important from the point of view of applications. In all interesting cases, explanatory hypotheses are typically used to derive predictions which, in turn, may lead to a revision of the original hypotheses. A final advantage is that, unlike what is the case for the existing logic-based approaches, LArs does not require that one determines beforehand the set of potential explanations. This makes it a much more flexible tool, especially for cases that involve ill-structured problems, where it may not be clear at all what would count as an explanation.
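Before turning to the characteristics of abductive reasoning, it may help to see the backward-reasoning requirements (1)–(4) spelled out on a finite toy language. The sketch below is emphatically not LArs (which is predicative and proof-theoretic); it is a brute-force propositional illustration of my own, in which minimality is checked only among an explicitly supplied list of candidate hypotheses, using the rubeola/rubella example of the next section.

```python
# Brute-force propositional sketch of requirements (1)-(4) plus a crude
# reading of (5) as "logically weakest admissible candidate" (illustrative).
from itertools import product

ATOMS = ("rubeola", "rubella", "fever_and_spots")

def models(formulas):
    """All truth-value assignments (as dicts) satisfying every formula."""
    return [dict(zip(ATOMS, vals))
            for vals in product([True, False], repeat=len(ATOMS))
            if all(f(dict(zip(ATOMS, vals))) for f in formulas)]

def entails(premises, conclusion):
    return all(conclusion(m) for m in models(premises))

def is_explanation(theory, explanandum, hypothesis):
    return (entails(theory + [hypothesis], explanandum)            # (1)
            and not entails(theory, explanandum)                   # (2)
            and not entails(theory, lambda m: not hypothesis(m))   # (3)
            and not entails([explanandum], hypothesis))            # (4)

theory = [lambda m: (not m["rubeola"]) or m["fever_and_spots"],    # cf. (7)
          lambda m: (not m["rubella"]) or m["fever_and_spots"]]    # cf. (8)
explanandum = lambda m: m["fever_and_spots"]                       # cf. (6)

candidates = {
    "rubeola":            lambda m: m["rubeola"],
    "rubella":            lambda m: m["rubella"],
    "rubeola or rubella": lambda m: m["rubeola"] or m["rubella"],
}

admissible = {name: f for name, f in candidates.items()
              if is_explanation(theory, explanandum, f)}
print(sorted(admissible))   # all three candidates satisfy (1)-(4)

# (5): prefer the logically weakest admissible candidate, i.e. one that is
# entailed by every other admissible candidate.
weakest = [n for n, f in admissible.items()
           if all(entails([g], f) for g in admissible.values())]
print(weakest)              # ['rubeola or rubella']
```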
3 MAIN CHARACTERISTICS OF ABDUCTIVE REASONING
In order to present the logic LArs, I shall first discuss in some more detail the main characteristics of abductive reasoning processes. As mentioned in the previous section, abductive inferences should lead to conclusions that are compatible with the premises. It is important to note, however, that different explanantia are not necessarily jointly compatible with the premises. None of the requirements (1)–(5) excludes that different explanations are incompatible with each other. Evidently, this raises the question how one can avoid, in a classical framework, that the generation of contradicting explanations leads to triviality. We shall see below that this problem is handled in a very natural way by the logic LArs. A second characteristic is that abduction is a non-monotonic form of reasoning: conclusions that follow abductively from a theory T may be withdrawn when T is extended to T ∪ T′. This characteristic is related to the fact that some of the requirements for abductive inferences are negative—see (2)–(4) in the previous section. To see the relation between the negative requirements and the non-monotonic character of abductive inferences more clearly, consider the following simple example:
(6) John has a fever and small red spots on his face and body.
(7) Everybody who has rubeola (the measles) has a fever and small red spots on the face and body.
(8) Everybody who has rubella (the German measles) has a fever and small red spots on the face and body.
(9) Nobody has rubeola more than once.
(10) Nobody has rubella more than once.
Suppose that (6) is the explanandum B, and that (7)–(10) constitute the background theory. From (7), we may derive by CL: (11)
If John has rubeola, then he has a fever and small red spots on his face and body.
Hence, as requirements (1)–(5) are evidently fulfilled for (6), (7)–(10), and the antecedent of (11), we may abduce (12)
John has rubeola.
Similarly, we may add on the basis of (8) (13)
If John has rubella, then he has a fever and small red spots on his face and body.
This in turn allows us to abduce (14)
John has rubella.
So, we derived two different explanantia for the same explanandum. However, if we interpret a ‘minimal explanation’ as the logically weakest one, then neither of the two explanantia satisfies this criterion, but their disjunction does. It is easy to see how this disjunction may be derived in the proof. From (7) and (8) follows (15)
Everybody who has rubeola or rubella has a fever and small red spots on the face and body.
From this follows by CL (16)
If John has rubeola or rubella, then he has a fever and small red spots on his face and body.
But then, in view of (6) and (15) we may abduce (17)
John has rubeola or he has rubella.
In view of the disjunction of possible explanations on line (17), the explanations on lines (12) and (14) are no longer minimal. In LArs , from line (17) on, lines (12) and (14) are marked, indicating that the sentences that occur on them are no longer considered as derived in the proof (see Sections 4 and 5 to see how this marking is technically realized). As was mentioned in the introduction, ampliative reasoning processes display an interesting interaction between abductive steps and deductive steps. Suppose that we continue our previous proof with a relevant finding that we just found out about John:
(18)
John had rubeola last year.
But then, from (9) and (18) we may derive deductively (19)
John does not have rubeola.
And hence, in view of (17) and (19), we can rule out one of the potential explanations, leaving us in this case with one potential explanation: (17)
John has rubella.
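In the same spirit as the propositional sketch at the end of Section 2, the following self-contained snippet (again an illustration of my own, not the proof theory of LArs) shows the withdrawal at work: once the background is strengthened with the information that rules out rubeola (here collapsed into the single constraint "not rubeola"), the rubeola hypothesis violates requirement (3), and only the rubella disjunct of the minimal explanation survives.

```python
# Illustration of the defeat of an abduced hypothesis by new evidence.
from itertools import product

ATOMS = ("rubeola", "rubella", "fever_and_spots")

def entails(premises, conclusion):
    assignments = (dict(zip(ATOMS, v)) for v in product([True, False], repeat=3))
    return all(conclusion(m) for m in assignments if all(p(m) for p in premises))

theory = [lambda m: (not m["rubeola"]) or m["fever_and_spots"],   # cf. (7)
          lambda m: (not m["rubella"]) or m["fever_and_spots"],   # cf. (8)
          lambda m: not m["rubeola"]]                             # from (9) and (18)

compatible = lambda hyp: not entails(theory, lambda m: not hyp(m))   # requirement (3)
print(compatible(lambda m: m["rubeola"]))   # False: the rubeola hypothesis is withdrawn
print(compatible(lambda m: m["rubella"]))   # True
print(entails(theory + [lambda m: m["rubeola"] or m["rubella"]],
              lambda m: m["rubella"]))      # True: the disjunction now yields rubella
```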
The final characteristic is that abductive reasoning processes, at the predicative level, do not have a positive test. This is related to the fact that first-order predicate logic is undecidable—if a conclusion A does not follow from a set of premises Γ, we may not be able to establish this (see also footnote 8). Hence, as abductive inferences are partly defined in terms of negative requirements, it immediately follows that, for undecidable fragments of first-order logic, abductive inferences lack a positive test. Suppose, for instance, that for a theory T , an explanandum B, and a sentence A, (1) is satisfied. In that case, it seems reasonable to conclude that A follows abductively from T , provided that (2)–(5) are satisfied. However, there is no algorithm to establish. Hence, if one is unable to establish that (2)–(5) are satisfied, there is no absolute warrant that the abductive step is not erroneous. This brings us to the general idea behind our adaptive logic. In LArs it is allowed that, as soon as soon as (1) is satisfied, an abductive step is made, but on the condition that (2)–(5) are satisfied. In other words, for any possible application of abduction, it is assumed, unless and until proven otherwise, that the conditions (2)–(5) are satisfied. How this is realized technically will become clear below. We shall end this section with a motivation for this general idea. There are different ways to deal with the lack of a positive test. The most common one is to consider only decidable fragments of first-order logic. The rationale behind this is clear: when dealing with decidable fragments, one may be sure that, for arbitrary theories T and explananda B, there is an algorithm for (2)– (5), and hence, that a decision method can be designed for “follows abductively from”. From the point of view of applications, however, this is an unacceptable restriction: nearly all interesting theories are such that their set of theorems is undecidable. The alternative is to allow that inferences are made, not on the basis of absolute warrants, but on the basis of one’s best insights in the premises. When this second option is followed, abductive reasoning processes not only exhibit an external form of dynamics (adding new information may lead to the withdrawal of previously derived conclusions), but also an internal one (the withdrawal may be caused by merely analysing the premises). Suppose, for instance, that for a theory T , an explanandum B, and a sentence A, one establishes that (1) is satisfied, and one did not establish that one of (2)–(5) is violated. In line with the general idea from two paragraphs ago, we may consider A as an abductive consequence of T . This conclusion, however, is provisional. If at a later moment in time, one is able to
show that one of the negative requirements is violated (for instance, because one establishes that ¬A follows from T ), A has to be withdrawn as an explanation for B. There are several arguments in favour of this second option. The first is that unwanted restrictions are avoided: abduction can be defined for any first-order theory. A second argument is that the conclusions of abductive reasoning processes are defeasible anyway. Whether the withdrawal of a conclusion is caused by an external factor (by adding new premises) or by an internal one (by gaining better insight in the premises) does not seem essential. The third, and most important argument is that, even for decidable fragments of first-order logic, it is often unrealistic to require absolute warrants. Even if a decision method is available, reasoners may lack the resources to perform an extensive search, and hence, may be forced to act on their present best insights. The logic LArs follows this second option. This has the advantage that, even for undecidable fragments of first-order logic, it enables one to come to justified conclusions. These conclusions are tentative and may later be rejected, but they constitute, given one’s insight in the premises at that moment, the best possible estimate of the conclusions that are ‘finally derivable’ from the premises.9
4 INFORMAL PRESENTATION OF THE LOGIC LArs

The general idea behind LArs is extremely simple: given a set of premises Γ, it is allowed that any valid inference rule of CL is applied (LArs is an adaptive extension of CL), and moreover that (†) is applied "as much as possible". For the moment, this ambiguous phrase may be interpreted as "unless and until (∀α)(A(α) ⊃ B(α)) ∧ (B(β) ∧ ¬A(β)) turns out to be CL-derivable from Γ". So, whenever it is CL-derivable from Γ that, for some general rule (∀α)(A(α) ⊃ B(α)) and some explanandum B(β), (†) cannot be applied consistently (because ¬A(β) is CL-derivable from Γ), the application of (†) is overruled. In view of what we have seen in Section 2 (requirement (3)), this is exactly what we want.

A formula of the form (∀α)(A(α) ⊃ B(α)) ∧ (B(β) ∧ ¬A(β)) will henceforth be called an abnormality. It is a formula that blocks a desired inference (in this case an abductive step). In line with what is common for adaptive logics, we shall say that LArs interprets sets of premises "as normally as possible". In this case this means that abductive steps are "applied as much as possible".10 To keep the proofs below within limits, I shall use (∀α)(A(α) ⊃ B(α))(α/β) as an abbreviation for formulas of the form (∀α)(A(α) ⊃ B(α)) ∧ (B(β) ∧ ¬A(β)).

9 At this point, one may interpret "an abductive conclusion A is finally derivable from a theory T" as "A satisfies the requirements (1)–(5)"—see Section 5 for a precise definition of the notion of final derivability.
10 The term "abnormality" is a technical term that is common in the literature on adaptive logics and that is application dependent. It does not refer to any standard of normality.
In order to rule out (partial) self-explanations, we only need one refinement. Given a formula (∀α)(A(α) ⊃ B(α))(α/β), we shall require that A(β) does not share any predicate with B(β). So, none of the following formulas counts as an abnormality:

(∀x)(Px ⊃ Px) ∧ (Pa ∧ ¬Pa)
(∀x)((Px ∧ Qx) ⊃ Px) ∧ ((Pa ∧ Qa) ∧ ¬(Pa ∧ Qa))
(∀x)((Px ∨ Qx) ⊃ (Px ∨ Rx)) ∧ ((Pa ∨ Ra) ∧ ¬(Pa ∨ Ra))
(∀x)((Px ∧ Qx) ⊃ (Px ∧ Rx)) ∧ ((Pa ∧ Ra) ∧ ¬(Pa ∧ Ra))

From now on, we shall only use the term "abnormality" for formulas of the form (∀α)(A(α) ⊃ B(α))(α/β) that satisfy this additional requirement.

To realize the contextual validation of (†), we shall make a distinction in the proof theory between unconditional rules and conditional rules. The unconditional rules are all those that are validated by CL. The conditional rules are applications of (†). In order to avoid referring to a large set of specific rules, and as is common for adaptive logics, I shall use two generic rules (see below for illustrations and Section 5 for the precise definitions of the rules). The generic rule for the unconditional rules is called RU, and the one for the conditional rules RC.

Here is a very simple example of a proof in LArs. The fifth element in the proof is the so-called condition—its function will become clear immediately.

1   (∀x)(Px ⊃ Qx)     PREM   ∅
2   (∀x)(¬Px ⊃ Rx)    PREM   ∅
3   (∀x)(Sx ⊃ Tx)     PREM   ∅
4   (∀x)(Sx ⊃ Vx)     PREM   ∅
5   (∀x)(Wx ⊃ ¬Sx)    PREM   ∅
6   Qa ∧ Ra           PREM   ∅
7   Ta ∧ Tb           PREM   ∅
8   Wa                PREM   ∅
The formulas on lines 1–5 correspond to the background theory; those on lines 6–8 to explananda. In view of the formulas on lines 1 and 6 we may derive

9   Pa    1,6; RC   {(∀x)(Px ⊃ Qx)(x/a)}
Note that the formula Pa ∨ (∀x)(Px ⊃ Qx)(x/a) is CL-derivable from the premises.11 Hence, one may read the latter formula as: Pa is derivable from the premises provided (∀x)(Px ⊃ Qx)(x/a) is false. This is why it is allowed in the adaptive proof that Pa is derived on the condition {(∀x)(Px ⊃ Qx)(x/a)}. The underlying idea is that abnormalities are supposed to be false, unless and until proven otherwise. If at some stage the condition of line 9 is no longer satisfied, the line is marked and the formula that occurs on that line is no longer considered as derived. The addition of "RC" in the justification refers to the fact that this line was added by the generic conditional rule. Note also that potential explanations will always be introduced on a non-empty condition. This provides an easy way to distinguish, in the proof, between CL-consequences of the theory and potential explanations.

In view of the formulas on lines 2 and 6, we may continue the proof as follows

10   ¬Pa    2,6; RC   {(∀x)(¬Px ⊃ Rx)(x/a)}

11 I say that a formula A is derived in a proof from Γ if A occurs on some unmarked line in the proof. I say that a formula A is derivable from Γ if there exists a proof from Γ such that A occurs on an unmarked line in it. In Section 5, we shall see the precise definitions for derived at a stage, finally derived at a stage and finally derivable.
This brings us immediately to a slight complication that is best dealt with first. From the premises, neither (∀x)(Px ⊃ Qx)(x/a) nor (∀x)(¬Px ⊃ Rx)(x/a) is CL-derivable. However, their disjunction is:

11   (∀x)(Px ⊃ Qx)(x/a) ∨ (∀x)(¬Px ⊃ Rx)(x/a)    1,2,6; RU   ∅
The information we obtain from the formula on line 11 is that, in view of the premises, one of the abnormalities (∀x)(Px ⊃ Qx)(x/a) and (∀x)(¬Px ⊃ Rx)(x/a) is true. However, at this stage of the proof, we do not know which one it is. Another way to put this is that, in view of the premises, there is no reason to choose one abnormality above the other, and hence, also no reason to choose the explanatory hypothesis Pa above the hypothesis ¬Pa.

It is in view of cases like this that a strategy is needed to disambiguate the phrase "to interpret the premises as normally as possible". The strategy that is followed in the case of LArs is the so-called Reliability Strategy. This requires some definitions and conventions on notation. Where ∆ is a finite set of abnormalities, the disjunction ∨(∆) will be called a Dab-formula and will be written as Dab(∆). A Dab-formula Dab(∆) will be called a minimal Dab-formula at stage s of a proof if, at that stage of the proof, no Dab(∆′) is derived such that ∆′ ⊂ ∆. As will become clear below, the marking of lines at a stage s of a proof will proceed in terms of the minimal Dab-formulas that are derived at that stage of the proof.

What the Reliability Strategy comes to is that, whenever a minimal Dab-formula is derived in the proof at a certain stage (as is the case on line 11 of our proof), then all disjuncts that occur in that Dab-formula are considered as unreliable, and hence, all lines that rely on the normal behaviour of any of these disjuncts are marked. Or, put differently, a line is marked at a stage s if its condition contains an abnormality that is a disjunct of a minimal Dab-formula at that stage. In our proof, this means that both lines 9 and 10 are marked from stage 11 on and the formulas that occur on these lines are no longer considered as derived in the proof. This is precisely what we want, because we want our set of explanantia to be jointly compatible with the premises. This is how the proof looks at stage 11:

1    (∀x)(Px ⊃ Qx)                                PREM        ∅
2    (∀x)(¬Px ⊃ Rx)                               PREM        ∅
3    (∀x)(Sx ⊃ Tx)                                PREM        ∅
4    (∀x)(Sx ⊃ Vx)                                PREM        ∅
5    (∀x)(Wx ⊃ ¬Sx)                               PREM        ∅
6    Qa ∧ Ra                                      PREM        ∅
7    Ta ∧ Tb                                      PREM        ∅
8    Wa                                           PREM        ∅
9    Pa                                           1,6; RC     {(∀x)(Px ⊃ Qx)(x/a)}    X11
10   ¬Pa                                          2,6; RC     {(∀x)(¬Px ⊃ Rx)(x/a)}   X11
11   (∀x)(Px ⊃ Qx)(x/a) ∨ (∀x)(¬Px ⊃ Rx)(x/a)     1,2,6; RU   ∅
The following extension of the proof illustrates the way in which abductive steps are contextually validated in LArs. Suppose that we continue the proof as follows

12   Sa    3,7; RC   {(∀x)(Sx ⊃ Tx)(x/a)}
13   Sb    3,7; RC   {(∀x)(Sx ⊃ Tx)(x/b)}
However, the following line may be added by the rule RU:

14   ¬Sa    5,8; RU   ∅
At this point it becomes clear that the following abnormality is CL-derivable

15   (∀x)(Sx ⊃ Tx)(x/a)    3,7,14; RU   ∅
From this point on, the condition of line 12 overlaps with the minimal Dab-formula on line 15, and hence, line 12 is marked:

...
12   Sa                      3,7; RC      {(∀x)(Sx ⊃ Tx)(x/a)}   X15
13   Sb                      3,7; RC      {(∀x)(Sx ⊃ Tx)(x/b)}
14   ¬Sa                     5,8; RU      ∅
15   (∀x)(Sx ⊃ Tx)(x/a)      3,7,14; RU   ∅
Despite the fact that Sa cannot be abduced on the basis of lines 3 and 7, Sb can be abduced on the basis of these same lines. For this simple example, it is easy to check that line 13 will not be marked in any extension of the proof (Sb is finally derived in this proof—see Section 5 for the precise definition of final derivability). This is also a good point to illustrate how abductive steps and deductive steps are nicely integrated in LArs. In view of lines 4 and 13, the following line may be added:

16   Vb    4,13; RU   {(∀x)(Sx ⊃ Tx)(x/b)}
The formula on line 16 is a prediction that is CL-derivable from the premises and the explanatory hypothesis on line 13. The line is added by the generic unconditional rule RU. Note that when the unconditional rule is applied, no new formulas
are added to the condition, but any formula that occurs in a non-empty condition is 'carried' over to the conclusion of the application. The reason for this is easy to understand. If, at some point, line 13 has to be marked (because its condition is no longer satisfied), then evidently any line that depends on it should also be marked.

I shall now show that the logic LArs has some further desired properties. The first is that no logically contingent formula is finally derivable as an explanation for a tautology. Suppose that we start a proof by introducing two CL-theorems:

1   Pa ∨ ¬Pa                     RU   ∅
2   (∀x)(Qx ⊃ (Px ∨ ¬Px))        RU   ∅
We may now continue the proof as follows

3   Qa    RC   {(∀x)(Qx ⊃ (Px ∨ ¬Px))(x/a)}
However, this is harmless. As soon as we add the following line:

4   (∀x)(Qx ⊃ (Px ∨ ¬Px))(x/a) ∨ (∀x)(¬Qx ⊃ (Px ∨ ¬Px))(x/a)    RU   ∅
line 3 is marked.

There is also no risk that contradictions would be (finally) abducible for logically contingent formulas. Suppose that our explanandum is Qa and that we introduce the CL-theorem (∀x)((Px ∧ ¬Px) ⊃ Qx) in our proof:

1   Qa                           PREM   ∅
2   (∀x)((Px ∧ ¬Px) ⊃ Qx)        RU     ∅
We may now abduce Pa ∧ ¬Pa as an explanation for Qa, but this line is marked as soon as we unconditionally derive (∀x)((Px ∧ ¬Px) ⊃ Qx)(x/a) from Qa in the proof:

3   Pa ∧ ¬Pa                        1,2; RC   {(∀x)((Px ∧ ¬Px) ⊃ Qx)(x/a)}   X4
4   (∀x)((Px ∧ ¬Px) ⊃ Qx)(x/a)      1; RU     ∅
The derivation of (partial) self-explanations is ruled out by the restriction that we have imposed upon our abnormalities. Consider the following proof, where Pa is our explanandum:

1   Pa                          PREM   ∅
2   (∀x)((Px ∨ Qx) ⊃ Px)        PREM   ∅
Evidently, we may extend the proof by

3   (Pa ∨ Qa) ∨ ((∀x)((Px ∨ Qx) ⊃ Px)(x/a))    1; RU   ∅
However, in this case, the derivation of line 3 will not allow us to derive Pa ∨ Qa on the condition {(∀x)((Px ∨ Qx) ⊃ Px)(x/a)}. The reason is that the latter formula is not an abnormality and that the condition may only contain abnormalities (see the next section for the precise definitions). (The formula Pa ∨ Qa could evidently be derived on the empty condition by the rule RU from the formula on line 1, but this is obviously harmless. As was mentioned above, only formulas that are derived on non-empty conditions are considered as possible explanations for some explanandum.)

There is one last complication we have to address. What if more than one explanans is abducible for the same explanandum? Consider, for instance, the following proof:

1   (∀x)(Px ⊃ Rx)    PREM      ∅
2   (∀x)(Qx ⊃ Rx)    PREM      ∅
3   Ra               PREM      ∅
4   Pa               1,3; RC   {(∀x)(Px ⊃ Rx)(x/a)}
5   Qa               2,3; RC   {(∀x)(Qx ⊃ Rx)(x/a)}
From these premises, the following minimal Dab-formulas

6   (∀x)(Px ⊃ Rx)(x/a) ∨ (∀x)((Qx ∧ ¬Px) ⊃ Rx)(x/a)    1–3; RU   ∅
7   (∀x)(Qx ⊃ Rx)(x/a) ∨ (∀x)((Px ∧ ¬Qx) ⊃ Rx)(x/a)    1–3; RU   ∅
are CL-derivable. This means that, by the Reliability Strategy, both lines 4 and 5 are marked. Hence, as lines 4 and 5 will remain marked in any extension of the proof, neither Pa nor Qa is finally derivable from these premises.12 However, their disjunction is, as is shown by the following extension:

8   (∀x)((Px ∨ Qx) ⊃ Rx)    1,2; RU   ∅
9   Pa ∨ Qa                 3,8; RC   {(∀x)((Px ∨ Qx) ⊃ Rx)(x/a)}
As no minimal Dab-formula is derivable from the premises that has (∀x)((Px ∨ Qx) ⊃ Rx)(x/a) as one of its disjuncts, the formula on line 9 is finally derived in this proof.

Before ending this section, I want to illustrate that LArs leads to the most parsimonious explanatory hypotheses. Consider the following example:

1   (∀x)(Px ⊃ Rx)            PREM   ∅
2   (∀x)((Px ∧ Qx) ⊃ Rx)     PREM   ∅
3   Ra                       PREM   ∅
At this stage both Pa and Pa ∧ Qa may be abduced:

4   Pa          1,3; RC   {(∀x)(Px ⊃ Rx)(x/a)}
5   Pa ∧ Qa     2,3; RC   {(∀x)((Px ∧ Qx) ⊃ Rx)(x/a)}

12 The only way in which lines 4 and 5 would be unmarked in some extension is when the Dab-formulas on lines 6 and 7 are no longer minimal. However, for this simple example, it is easily seen that neither of the disjuncts of the formula on line 6 is CL-derivable from the premises, and analogously for the formula on line 7.
However, line 5 is marked as soon as the following minimal Dab-formula is derived:

6   (∀x)((Px ∧ Qx) ⊃ Rx)(x/a) ∨ (∀x)((Px ∧ ¬Qx) ⊃ Rx)(x/a)    1,3; RU   ∅
I end this section with a possible objection. Some readers might object that, given a generalization of the form (∀α)(A(α) ⊃ B(α)), the logic LArs cannot distinguish between cases in which nearly all B are also A and cases in which very few B are A. In the former case, the argument might continue, an application of abduction for a particular explanandum seems sound (it leads to a sufficiently plausible hypothesis), whereas it does not seem sound in the latter case.

My reaction to this objection is twofold. The first is that, in cases where there are two or more possible explanations for the same explanandum, only their disjunction is LArs-derivable. So, the (disjunctions of) possible explanations that are LArs-derivable are at least as plausible as their most plausible disjunct. The second reaction is that the objection seems to rely on a confusion. The aim of LArs is not to select those explanantia that are most plausible. Its aim is to derive, for each explanandum, a unique disjunction of possible explanations that is complete (even the most unlikely ones occur in it). For certain application contexts, where it is crucial not to overlook any of the possibilities, this procedure makes good sense.
5 THE LOGIC LArs

In this section, I present the logic LArs in a formally precise way. Like any other adaptive logic in standard format, the logic LArs is characterized by a triple: a lower limit logic (a reflexive, transitive, monotonic, uniform, and compact logic for which there is a positive test), a set of abnormalities Ω (characterized by a, possibly restricted, logical form) and a strategy. The lower limit logic is the logic that determines the stable part of the adaptive logic, and that also determines the unconditional rule. In the case of LArs, the lower limit logic is CL and the strategy is Reliability. Relying on the previously introduced abbreviation, the set of abnormalities Ω is defined by

Definition 1 Ω = {(∀α)(A(α) ⊃ B(α))(α/β) | no predicate that occurs in B(β) occurs in A(β)}

In order to define the semantics, we need some further definitions. We first define the abnormal part of a CL-model:

Definition 2 Ab(M) = {A ∈ Ω | M ⊨ A}
We shall say that a Dab-formula Dab(∆) is a Dab-consequence of Γ if it is CL-derivable from Γ and that it is a minimal Dab-consequence of Γ if there is no ∆′ ⊂ ∆ such that Dab(∆′) is also a Dab-consequence of Γ. The set of formulas that are unreliable with respect to Γ, denoted by U(Γ), is defined by

Definition 3 Where Dab(∆1), Dab(∆2), . . . are the minimal Dab-consequences of Γ, U(Γ) = ∆1 ∪ ∆2 ∪ . . . is the set of formulas that are unreliable with respect to Γ.

In view of these definitions, the semantic consequence relation of LArs is given by:

Definition 4 A CL-model M of Γ is reliable iff Ab(M) ⊆ U(Γ).

Definition 5 Γ ⊨LArs A iff A is verified by all reliable models of Γ.

As is common for all adaptive logics in standard format, the proof theory of LArs is characterized by three generic inference rules and a marking definition. The inference rules only refer to the lower limit logic, in our case CL. Where Γ is the set of premises, the inference rules are given by

PREM   If A ∈ Γ:                             . . .   . . .
                                             A       ∅

RU     If A1, . . . , An ⊢CL B:              A1      ∆1
                                             . . .   . . .
                                             An      ∆n
                                             B       ∆1 ∪ . . . ∪ ∆n

RC     If A1, . . . , An ⊢CL B ∨ Dab(Θ):     A1      ∆1
                                             . . .   . . .
                                             An      ∆n
                                             B       ∆1 ∪ . . . ∪ ∆n ∪ Θ
The premise rule PREM simply states that, at any line of a proof, a premise may be introduced on the empty condition. What the unconditional rule RU comes to is that, whenever A1, . . . , An ⊢CL B and the A1, . . . , An occur in the proof on the conditions ∆1, . . . , ∆n, then B may be added to the proof on the condition ∆1 ∪ . . . ∪ ∆n. The conditional rule RC is analogous, except that here a new condition is introduced. The marking definition proceeds in terms of the minimal Dab-formulas derived at a stage of the proof:

Definition 6 Dab(∆) is a minimal Dab-formula at stage s iff, at stage s, Dab(∆) is derived on condition ∅, and no Dab(∆′) with ∆′ ⊂ ∆ is derived on condition ∅.
Definition 7 Where Dab(∆1), . . . , Dab(∆n) are the minimal Dab-formulas derived on condition ∅ at stage s, Us(Γ) = ∆1 ∪ . . . ∪ ∆n.

Definition 8 Where ∆ is the condition of line i, line i is marked at stage s iff ∆ ∩ Us(Γ) ≠ ∅.

A formula A is said to be derived at stage s of a proof if it occurs on a line in the proof that is unmarked at stage s. As the marking proceeds in terms of the minimal Dab-formulas that are derived at a certain stage, it is clear that marking is a dynamic matter: a line may be unmarked at a stage s, marked at a later stage s′ and again unmarked at an even later stage s′′. This is why a more stable notion of derivability is needed:

Definition 9 A is finally derived from Γ at line i of a proof at stage s iff A is derived at line i at stage s, line i is unmarked at stage s, and every extension of the proof in which line i is marked has a further extension in which i is unmarked.

As may be expected, the derivability relation of LArs is defined with respect to the notion of final derivability:

Definition 10 Γ ⊢LArs A (A is finally derivable from Γ) iff A is finally derived in an LArs-proof from Γ.

The fact that LArs is in standard format immediately warrants that it has all interesting meta-theoretic properties, such as soundness and completeness and proof invariance:13

Theorem 1 Γ ⊢LArs A iff Γ ⊨LArs A.

Theorem 2 If Γ ⊢LArs A, then every LArs-proof from Γ can be extended in such a way that A is finally derived in it.
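To make the marking machinery of Definitions 6–8 concrete, here is a minimal Python sketch (my own illustration, not part of the paper). Proof lines are represented as triples of a formula, a justification and a condition, with formulas treated as opaque strings; the data reproduce stage 11 of the proof from Section 4.

```python
def minimal_dabs(dabs):
    """Keep only the subset-minimal Dab-formulas derived on the empty condition (Def. 6)."""
    return [d for d in dabs if not any(e < d for e in dabs)]

def unreliable(dabs):
    """U_s(Γ): the union of the disjuncts of the minimal Dab-formulas at this stage (Def. 7)."""
    u = set()
    for d in minimal_dabs(dabs):
        u |= d
    return u

def marked_lines(proof, dabs):
    """A line is marked iff its condition shares a member with U_s(Γ) (Def. 8)."""
    u = unreliable(dabs)
    return [i for i, (formula, justification, condition) in proof.items() if condition & u]

# Stage 11 of the Section 4 proof (only the lines with non-empty conditions matter here).
A1 = "(∀x)(Px ⊃ Qx)(x/a)"
A2 = "(∀x)(¬Px ⊃ Rx)(x/a)"
proof = {
    9:  ("Pa",  "1,6; RC", frozenset({A1})),
    10: ("¬Pa", "2,6; RC", frozenset({A2})),
}
dabs_at_stage_11 = [frozenset({A1, A2})]   # the minimal Dab-formula derived on line 11

print(marked_lines(proof, dabs_at_stage_11))   # [9, 10]
```

Note that such a routine only computes marking at a given stage; in line with the lack of a positive test discussed in Section 3, final derivability (Definitions 9 and 10) is not in general decidable by any such procedure.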
6 CONCLUSION AND OPEN PROBLEMS

In this paper, I presented the logic LArs for the abduction of singular hypotheses. Despite some unusual properties (such as its dynamical character), LArs is a decent formal logic, with a nice proof theory and a (preferential) semantics that is sound and complete with respect to it. Several extensions and variants are possible for the logic LArs. One that first comes to mind is a variant that allows for the abduction of different explanatory hypotheses in cases where more than one explanans is available for the same explanandum. (In such cases, LArs only allows for the derivation of their disjunctions.) Other alternatives may be obtained by changing

13 I refer to Diderik Batens, Ibid., for an overview of the meta-theoretic properties and for their proofs.
the extensional framework of LArs into one that allows, for instance, for a causal implication. It would also be interesting to explore alternatives in which different explanatory hypotheses do not necessarily have the same weight. Finally, it might be interesting to work on the heuristics behind abductive processes. The logic LArs simply determines which explanantia follow in view of an explanandum and a background theory. But, evidently, looking for an (interesting) explanation is a goal-directed process. All this goes beyond the scope of the logic presented in this paper. Still, it seems like an excellent starting point to explore these questions further.
Centre for Logic and Philosophy of Science University of Ghent Blandijnberg 2 9000 Gent Belgium [email protected]
THOMAS MÜLLER
PROBABILITIES IN BRANCHING STRUCTURES
A common, natural view about probabilities, shared by philosophers of diverse persuasions, is that probabilities are graded possibilities.1 On this view, which I will presuppose, there are no probabilities without underlying possibilities, and there is room for different notions of probability at least to the extent that there are different underlying notions of possibility. In this paper I want to trace out consequences of this view for the specific case of possibilities that are grounded in branching structures. Such structures afford a natural representation of real possibilities: concrete possibilities in indexically specifiable situations, like a concrete laboratory experiment or other indeterministic happenings. I will argue that branching-based probabilities lead to interesting probability structures that can shed light on, e.g., the causal Markov condition. My paper is structured as follows: I start by discussing different types of possibility and zooming in on branching-based real possibilities, giving formal details about the theory of branching time. I then work out a theory of probabilities based on branching time structures and discuss a number of peculiarities and limitations of that approach. Finally I give a brief overview of the way in which the branching time framework is extended in the theory of branching space-times, and what an appropriate probability theory for that framework looks like.
REAL AND OTHER POSSIBILITIES

Is there just one correct analysis of the notion of possibility, or can one discern several different notions of possibility that, while sharing some common structure, are interestingly different? The logical empiricists of the 1930s held that the only viable notion of possibility was the ontologically rather innocent notion of logical possibility: the absence of formal contradiction. On the other hand, Weyl (1940) in his overview of the phenomenology of modality already distinguished several different notions of possibility and pointed out that they had their proper uses in specific contexts: apart from logical possibility he acknowledged a priori (conceptual) possibility, physical possibility, and technical possibility. Present-day discussions involving modality tend to focus on the notion of metaphysical possibility as the one notion underlying philosophical argumentation; most thought experiments in philosophy, from Descartes's genius malignus to Putnam's Twin Earth, are meant to exhibit metaphysical possibilities. The formal semantics for possibilities developed since the 1950s however shows that there is much room for

1 Cf., e.g., van Fraassen (1980, 198); Popper (1982, 70); Thomason (1984).
different accounts of possibility. How do these different notions of possibility hang together? Fine (2005) has argued convincingly that mathematical and logical possibility can be derived from the notion of metaphysical possibility, but he has also argued that physical possibility is a different matter—he shows that physical (in his terminology, "natural") possibility is a further kind of modality not reducible to the metaphysical variety. The landscape of possibilities seems to be more of a jungle than a desert.

All the notions of modality mentioned so far are abstract: they pertain to types of situations, not directly to concrete token-happenings. Possibilities that are important for our practical concerns are however different: they are concrete rather than abstract (I can walk to the market, along a specific route, or I can take my bike)—and as we all know, they vanish as time passes. What was possible today may not be possible any more tomorrow (the last chance to see a live Dodo has long passed; once that specific train is gone, it is impossible for me to catch it). Such real possibilities are at least epistemically more basic than the other, abstract notions: we encounter them in our daily lives, and they are part of our everyday conception of agency.

Branching time: a model for real possibilities

Formal models for real possibilities were developed starting in the late 1960s, in the context of semantical investigations of tense and other temporal modifiers. Arthur Prior (1957, 1967) developed his tense-logic as a formal framework in which the interaction of time and modality could be studied. The so-called Ockhamist analysis of the future tense results in a formal tempo-modal language whose models have the structure of forward-branching trees; cf. Figure 1.

Formally, a branching-time structure is a pair ⟨T, <⟩ such that:

• T is a nonempty set;
• < is a strict partial ordering on T, i.e., transitive (if x < y and y < z, then x < z) and asymmetric (if x < y then y ≮ x), hence also irreflexive (x ≮ x);
• < is backward linear, i.e., if x < z and y < z, then (x < y or x = y or y < x).

The elements of T are usually called moments, and the ordering relation < is read tempo-modally: if x < y, we say that x occurs before y, or that y is in the future of possibilities of x. A non-strict order ≤ is derived from < in the usual way, i.e., x ≤ y iff (x < y or x = y). In such structures one can single out so-called histories as maximal linear subsets (i.e., subsets h of T that are maximal w.r.t. the property that for any x, y ∈ h, either x < y or x = y or y < x). In a history h, any two distinct elements are comparable via <, while in general, T will contain incomparable elements (cf.,
e.g., the moments m1 and m2 in Figure 1); such elements do not have a history in common.

Figure 1: A branching structure

In view of the interpretation of the ordering relation, we can say that incomparable elements are inconsistent, and that a history represents a maximal possible course of events—a possible way the world depicted by ⟨T, <⟩ could develop.

The formal requirements for branching time structures laid out above form the bare minimum, and additional requirements may be useful. For example, in the important case of continuous structures it turns out to be convenient, for technical reasons, to constrain the topology of the branching. Basically there are two options: two histories could split such that there is a last moment of overlap and no first moment of difference, or they could have no last moment of overlap, but two different first moments of difference. The convenient choice is to have a last moment of overlap—cf. McCall (1990) for discussion. Formally we thus require:

• If m ∈ h1 − h2 and m′ ∈ h2 − h1, then there is some m∗ for which m∗ < m, m∗ < m′, and m∗ is maximal in h1 ∩ h2.

Note that this requirement also guarantees that any two histories are connected at some moment; any two possible developments of the world thus share a common past. In order to allow for some small technical simplifications in what follows, we also require that the ordering ⟨T, <⟩ have no maximal moments:

• For every m ∈ T there is m′ ∈ T s.t. m < m′.

Formal semantics for branching time

In order to explain the notion of real possibility in branching structures, and thus, the background for our discussion of probability, it will be good to have some
details about the formal language that is customarily defined for branching time. The Prior-Thomason semantics for such branching treelike structures defines the following temporal and modal operators: The past tense ("it was the case that", P), the future tense ("it will be the case that", F), and the operators of real possibility and necessity (or settledness), Poss and Sett. As is customary in modal languages, formulae are to be evaluated with respect to a model M (a branching structure ⟨T, <⟩ together with a valuation V assigning extensions to atomic propositions) at a point of evaluation. In basic modal logic such a point of evaluation consists just of a simple index (a "world"), so that one could expect the point of evaluation in branching time to be a simple index as well, i.e., a moment. The innovative element of Prior-Thomason branching time semantics is, however, to take the point of evaluation to be a moment and a history through that moment (written m/h, presupposing m ∈ h). For a stand-alone sentence uttered in a given context (formally: at a moment of context mC ∈ T), the parameter m is initialized as m = mC. This indexical link to a context makes the Prior-Thomason semantics well suited to represent real tenses and possibilities. A specific problem is that there is no similar initialization of the history parameter: metaphysically speaking, there is no "history of the context" singling out "the real future"—the future is open. We do not dwell on the difficulties of interpreting stand-alone sentences containing the future tense. As the philosophical discussion of the notion of an open future shows, these difficulties are real; they are not an artefact of the formal analysis via branching time. For an illuminating account of how these difficulties can be met formally, cf. Belnap (2002a). For the account of real possibility that is at issue here, however, the mentioned difficulty plays no significant role: the semantic clause for Poss (as well as the dual one for Sett) introduces a quantifier over histories up front, so that it makes no difference which history containing the context moment mC is assigned to h initially.

Intuitively speaking, the tense operators P and F move the moment of evaluation along the current history of evaluation, in accordance with the idea that tense encodes location in time. The modal operators Sett and Poss, on the other hand, do not change the moment of evaluation, but the history, thus quantifying over the possible futures of the moment of evaluation. The formal semantical clauses for the mentioned operators are as follows:

• M, m/h |= Pϕ iff there is m′ ∈ h s.t. m′ < m and M, m′/h |= ϕ;
• M, m/h |= Fϕ iff there is m′ ∈ h s.t. m < m′ and M, m′/h |= ϕ;
• M, m/h |= Sett ϕ iff for all h′ for which m ∈ h′, M, m/h′ |= ϕ;
• M, m/h |= Poss ϕ iff there is some h′ for which m ∈ h′ and M, m/h′ |= ϕ.2
2 As usual, this clause for possibility is the dual to the clause for necessity, so an explicit definition could be avoided by declaring Poss to be an abbreviation for ¬Sett ¬.
This semantics suggests a natural definition of real possibilities at a moment, which is based on the notion of division of histories at a moment:

• Let m belong to histories h1 and h2, i.e., m ∈ h1 ∩ h2. h1 and h2 are called undivided at m (written h1 ≡m h2) iff there is some m′ ∈ h1 ∩ h2 for which m < m′. Being undivided at m is an equivalence relation on the set Hm of histories containing m. Reflexivity and symmetry are trivial.3 For transitivity, let h1 ≡m h2 as witnessed by m′, and h2 ≡m h3 as witnessed by m′′. As m′, m′′ ∈ h2, we have m′ ≤ m′′ or m′′ ≤ m′ (by linearity of histories); assume the former. By backwards linearity we get m′ ∈ h3, so that m′ also serves as a witness for h1 ≡m h3.
• We say that h1 and h2 split at m (written h1 ⊥m h2) iff m is maximal in h1 ∩ h2.
• The real possibilities at m are the members of the partition Πm of Hm induced by the equivalence relation of undividedness at m, ≡m.
• We say that m is a choice point iff Πm has more than one member, i.e., if there are at least two histories splitting at m.

In order to have an easy way of talking about real possibilities in a branching structure, we introduce the notion of a transition (cf. von Wright, 1963; Belnap, 1999): a transition t is an ordered pair ⟨m, H⟩ consisting of an initial (a moment m ∈ T), together with an outcome (one of the real possibilities at that moment, H ∈ Πm). Employing suggestive notation, we also write t = m ↣ H. A transition is trivial if there is no alternative transition with the same initial. We write TR for the set of all transitions, and TRm for the set of all transitions with initial m. If m is not a choice point, then TRm = {t} with t = m ↣ Hm, a trivial transition. We will alternatively speak of Πm and of TRm as embodying the real possibilities open at m; these possibilities form an exhaustive set of mutually exclusive alternatives. A set of transitions T = {ti | i ∈ I}, where ti = mi ↣ Hi, is consistent if all the outcomes can occur together in one history, i.e., if ∩i∈I Hi ≠ ∅.
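By way of illustration, the following Python sketch (my own; the structure and moment names are invented) encodes a small finite branching-time structure, extracts its histories, and computes the partition Πm of real possibilities at a moment via undividedness. Each cell of the partition corresponds to one transition with initial m.

```python
# Immediate-successor relation of a small, invented branching-time structure:
# m is a choice point with two possible futures, and m1 is a further choice point.
succ = {
    "m":   ["m1", "m2"],
    "m1":  ["m1a", "m1b"],
    "m2":  ["m2a"],
    "m1a": [], "m1b": [], "m2a": [],
}
moments = list(succ)

def below(x, y):
    """x < y in the transitive closure of succ (the strict partial ordering)."""
    return any(z == y or below(z, y) for z in succ[x])

def histories():
    """In a finite tree, the maximal linear subsets are the branches ending in a leaf."""
    leaves = [m for m in moments if not succ[m]]
    return [frozenset([leaf] + [x for x in moments if below(x, leaf)]) for leaf in leaves]

def undivided(h1, h2, m):
    """h1 is undivided from h2 at m iff they share some moment strictly later than m."""
    return any(below(m, x) for x in h1 & h2)

def real_possibilities(m):
    """The partition of the histories containing m, induced by undividedness at m."""
    cells = []
    for h in (h for h in histories() if m in h):
        for cell in cells:
            if undivided(h, next(iter(cell)), m):
                cell.add(h)
                break
        else:
            cells.append({h})
    return cells

print(len(real_possibilities("m")))    # 2: m is a choice point with two transitions
print(len(real_possibilities("m2")))   # 1: only the trivial transition at m2
```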
PROBABILITIES IN BRANCHING TIME

How can we implement probabilities in branching time? What is the proper notion of probability based on real possibilities? Given that real possibilities at a moment m form a set Πm corresponding to the set of transitions TRm, a natural move is to

3 For reflexivity we require that there be no maximal moments. If T is allowed to have maximal moments, the definition of undividedness will have to treat those as a special case: h1 and h2 are undivided at m ∈ h1 ∩ h2 iff, either, there is no m′ ∈ T s.t. m < m′ (in which case it turns out that h1 = h2), or there is some m′ ∈ h1 ∩ h2 s.t. m < m′.—A similar patch would be needed to improve the definition of splitting.
use the set TRm as a sample space and to employ a suitable Boolean σ-algebra and some normalized measure to define a probability space.4 Technically—in the finite case we are assuming—such a probability space at m is a triple PRm = ⟨Ω, F, µ⟩ where Ω = TRm is the sample space, F is the set-theoretic Boolean algebra over Ω (i.e., the set of subsets of Ω with the usual set-theoretical operations), and µ is a normalized measure on F (i.e., µ(Ω) = 1 and for a, b ∈ F with a ∩ b = ∅, we have µ(a ∪ b) = µ(a) + µ(b)). It seems plausible that in any case in which an exhaustive set of alternative possibilities is given, this will be the way to introduce probabilities as weights on these possibilities. What's so special about probabilities in branching structures then?

It turns out that interesting questions arise once we try to combine probability spaces that are defined at different moments. Normally, probability spaces PRA, PRB are combined by forming the Cartesian product ΩA,B = ΩA × ΩB of the sample spaces and FA,B = FA × FB of the algebras, and by defining a joint measure µA,B that satisfies the marginal property (i.e., reduces to the single measures when plugging in a unit element):

PRA,B = ⟨ΩA,B, FA,B, µA,B⟩  s.t.  µA,B(⟨a, 1B⟩) = µA(a);  µA,B(⟨1A, b⟩) = µB(b).
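For concreteness, here is a small sketch (my own; the sample spaces and weights are invented and chosen as exact binary fractions) of this standard recipe, using the product measure, which is only one joint measure satisfying the marginal property (correlated joint measures can satisfy it as well).

```python
from itertools import product

# Two invented local probability spaces.
mu_A = {"a1": 0.25, "a2": 0.75}
mu_B = {"b1": 0.5, "b2": 0.5}

# Product measure on the Cartesian product of the sample spaces.
mu_AB = {(a, b): mu_A[a] * mu_B[b] for a, b in product(mu_A, mu_B)}

# Marginal property: summing out the other component recovers the single measures.
marg_A = {a: sum(mu_AB[(a, b)] for b in mu_B) for a in mu_A}
marg_B = {b: sum(mu_AB[(a, b)] for a in mu_A) for b in mu_B}
assert marg_A == mu_A and marg_B == mu_B
```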
This recipe rests on the presupposition that the underlying possibilities combine by forming Cartesian products. This is often the case, even if there are correlations (i.e., even if generally, µA,B(⟨a, b⟩) ≠ µA(a) · µB(b)). But in the case of branching structures, this assumption is not warranted. What is the technically appropriate way of defining something like joint probabilities in branching structures? How do the underlying (branching) possibilities constrain these probability spaces? And does the interpretation of the tree-like ordering in terms of an open future lead to any additional constraints on joint measures? It turns out that most of the relevant observations can be made by discussing a few simple cases. When combining two nontrivial probability spaces PRm and PRm′ with m ≠ m′, only two cases need to be discussed: we either have m < m′ or m′ < m (which cases can be discussed together as they coincide after relabeling), or m and m′ are incomparable and thus inconsistent.

Combining PRm with PRm′, m < m′. In this case the choice point m′ lies in the future of possibilities of the choice point m. The Cartesian product recipe fails instructively: most combinations of results from PRm and PRm′ make no sense. In fact, since there is splitting at m that

4 In what follows, we will ignore well-known measure-theoretic moves that will have to be made in case the sample space is infinite. We will assume finite branching from now on, i.e., we will be working exclusively with branching structures ⟨T, <⟩ s.t. for all m ∈ T, TRm is finite. (Note that this does not imply that T itself, or even all Hm, are finite.)
partitions the set Hm of histories through m, the moment m′ occurs in only one of those partitions. (Otherwise it would constitute a witness for undividedness.) Thus there is just one t = m ↣ H ∈ TRm for which m′ ∈ ∪H (i.e., Hm′ ⊆ H). Given some t′ = m′ ↣ H′ ∈ TRm′, we can understand the pair ⟨t, t′⟩ as expressing the fact that "first, choice point m had outcome H, enabling m′, and then choice point m′ had outcome H′". For some t̄ ∈ TRm − {t}, however, the pair ⟨t̄, t′⟩ makes no sense given the tempo-modal (if you wish, causal) interpretation of the ordering relation: there is no history in which first t̄ happens, excluding m′, and then m′ has some outcome or other. Given t̄, m′ is no longer possible—even though up to (and including) m, it was possible.

This point is strengthened if we look at the options for assigning probabilities to pairs of transitions in the Cartesian product. If ⟨t̄, t′⟩ cannot happen, then it should certainly have probability zero. By the marginal property, summing over all the alternative outcomes from TRm′, this leads to µm({t̄}) = 0, and this holds for all transitions t̄ ∈ TRm − {t}. Accordingly, we need to have µm({t}) = 1, by normalization. This will usually not accord with the given probability measure from PRm. Furthermore, by considering an incompatible alternative m′′ to m′, the same argument will lead to µm({t}) = 0, a contradiction. Dropping the marginal requirement seems quite a high price to pay in order to avoid this—if one cannot recover the individual probabilities from the joint probability structure, in which sense can the latter be called a combination of the former any more? In the framework of the Cartesian product, the only sensible alternative seems to be to use the product measure, which fulfils the marginal property by definition. But then we have to be ready to assign non-zero probabilities to ⟨t̄, t′⟩ and other impossible combinations. (Note that it is not an option to use the product measure only in the case in which the first component is t, and set the probabilities to be zero otherwise—such a measure will generally not be normalized.)

Isn't there a more sensible way to combine PRm and PRm′? In fact, moving from the abstract discussion above to a concrete example leads the way to a better framework. Assume that at the concrete moment m, I have the choice to go to the races (t) or to stay at home (t̄), and assume for the sake of the argument that some suitable probability measure µm can be given.5 Let m′ be the moment at which I am at the races and bet $5 on Silver Shadow, with outcomes win (t′) and lose (t̄′). There are two two-way splittings, but clearly, there are only three alternatives: staying at home, going and winning, and going and losing. Returning to the abstract framework and generalizing, the recipe is as follows:

• The alternatives that form the sample space of the combined probability space are sets of transitions from the individual sample spaces;
• only those sets that are consistent qualify as alternatives;

5 This should not be read as an endorsement that probabilities can be usefully assigned to the outcomes of a singular human decision. In fact this may not be possible. If you are worried, substitute your favourite quantum-mechanical set-up.
• in fact, the sample space consists of the set of maximal consistent sets of transitions. This recipe explains why {stay at home, win} doesn’t occur in the sample space and thus needn’t be assigned a probability: the set is inconsistent. It also explains why {go} by itself is not one of the alternatives—the set is consistent, but not maximally so (it can be extended by “win” or by “lose”). The problems discussed above are hereby avoided. Furthermore insights are gained as to the marginal property and the question of correlations. The marginal property. In the Cartesian product case, the marginal property can be motivated by the thought that if one ignores what happens in the other probability spaces (“elsewhere”), the probability measure under consideration (“the probabilities for a local outcome”) should be recovered. The spatial metaphor is apt here: in fact the Cartesian product is well motivated if simultaneous outcomes of chance experiments are under consideration. Space (at least on a simpleminded, Newtonian conception) orders that which is simultaneous. Once a temporal dimension enters, however, things are different: Here the guiding intuition is that the present probabilities for outcomes should be recovered if we ignore what happens later. Accordingly, in the races example, we want to have µm ({go}) = µm,m′ ({go, win}) + µm,m′ ({go, lose}). Correlations. Again, if we think about the Cartesian product case in terms of the spatial combination of alternatives, we have the intuition that we should normally expect to find the product measure: spatially separated happenings shouldn’t influence one another. However, correlations are not unthinkable. To use Van Fraassen’s slogan, correlations cry out for explanation—but such explanation can often be had, e.g., by signalling, or by a common cause. If the chance of Paul’s wearing red socks on a given day is 1/7, and the same holds for Albert, but the probability of seeing them both wearing red socks is 1/7 instead of 1/49, we may well venture the guess that they have communicated about this, e.g., by declaring Monday to be Red Socks Day. In the case of the temporal combination of probability spaces, correlations are much harder to make sense of. In fact, it seems that they always point to a flawed (and not just incomplete) model. If the chance to get from m to (later) m′ is a, and the chance to get from m′ to (later) m′′ is b, but the chance to get from m to m′′ isn’t a · b, what are we to say? Think in terms of frequencies: A certain fraction of a given population arrives at m′ , and from there again, a certain fraction arrives at m′′ , but the overall account doesn’t fit together? This consideration gives a strong motivation for the causal Markov condition in combining probabilities in branching structures: if m, m′ , and P Rm and P Rm′ are given and m < m′ , then on the combined probability space of maximal consistent sets of transitions from T Rm and from T Rm′ , the correct measure is the product measure.
Combining PRm′ with PRm′′, m′ and m′′ incomparable. In this case, no combination of an outcome of m′ and an outcome of m′′ is possible at all. Suppose that, continuing the races example, when staying at home I can read or cook, and that at m′′, the moment of the decision, there is an appropriate probability space PRm′′ that captures the relevant probabilities. Thus, t′′ = m′′ ↣ H′′ could stand for "read" and t̄′′ for "cook". The moments m′ (of the decision at the races) and m′′ (of the decision at home) are incomparable and thus inconsistent.6 What would it mean to combine the two given probability spaces? Can any sense be made of a combination of, e.g., t′ ("win") and t′′ ("read")? It seems not, and it seems that in this case, even the strategy to look at maximal consistent sets of transitions will fail: In the union of TRm′ and TRm′′, only the singletons are consistent sets of transitions—but if the given probabilities of the single spaces are assigned to these singletons, as we seem to have to, it turns out that normalization fails again: the sum of the respective probabilities is 2, not 1.

Again, shifting the discussion to a consideration of the underlying possibilities is helpful. Surely m′ and m′′ are inconsistent, but in branching structures as defined above, there is always a common past, a choice point at which the histories containing m′ split off from those containing m′′. In our story, this happens exactly at moment m, the moment of deciding whether to go to the races or to stay at home. If that moment is added in, the general recipe from the previous discussion can be used: the maximal consistent sets of transitions now contain one outcome of m and one of the outcomes of either m′ or m′′, and by the given recipe we arrive at a well-defined, normalized probability space. The general moral of this example is that inconsistent moments need to be supplemented by a common past choice point before a sensible joint probability structure can be defined.

Summary: Branching-time based probabilities

The two cases discussed above in fact cover everything that can happen in combining probabilities defined on branching time structures. Joint probabilities can be defined if the individual given probability spaces afford a combination that gives a full account of causal alternatives. No "raw" inconsistencies may occur. Somewhat more formally:

• Let a set M of nontrivial initials and for each m ∈ M a probability space PRm be given. (PRm has the set of transitions with initial m as its sample space. By nontriviality, this sample space has more than one element.) A joint probability space PRM can be defined if and only if for any incompatible m′, m′′ ∈ M, there is some m ∈ M for which m < m′ and m < m′′.

6 Note that these two incompatible moments could still occur at the same clock time. In order to avoid confusion it is therefore useful to call the elements of a branching structure "moments" and not "times". This is the terminology adopted in this paper.
Given such an M, the above recipe can be applied for combining the individual probability spaces:

• Let TRM = ∪m∈M TRm be the set of all transitions with initials from M.
• The sample space of PRM, ΩM, is the set of maximal consistent subsets of TRM (where consistency, as stated above, means that all outcomes share at least one history). This is the set of causal alternatives that can be built up given initials from M. As one can check, in a branching time structure a set of transitions can be consistent only if its set of initials is linearly ordered (else there isn't even a history containing all the initials).
• In the finite case that we are assuming, the algebra is simply the algebra of subsets of that sample space.
• The probabilities of the (singletons of) elements of the sample space ΩM are given by multiplying the respective probabilities of the transitions making up these elements: one multiplies probabilities "vertically", in accordance with the causal Markov condition.

In this way, joint probability structures can be defined in models of branching time.
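The recipe just summarized can be made concrete with a small Python sketch (my own illustration; the weights are invented). It uses the races example: a choice at m between going and staying, and a later choice at m′ between winning and losing that is only reachable after "go". The sketch assumes that the common past choice point m is included, as the condition above requires.

```python
# Local probability spaces at the two choice points (invented, binary-exact weights).
PR_m      = {"go": 0.5, "stay": 0.5}
PR_mprime = {"win": 0.25, "lose": 0.75}

# Maximal consistent sets of transitions from TR_m and TR_m':
# "stay" excludes m', so it cannot be extended; "go" can be extended by either outcome of m'.
alternatives = [frozenset({"stay"}),
                frozenset({"go", "win"}),
                frozenset({"go", "lose"})]

def weight(alt):
    """Multiply probabilities 'vertically' along the transitions (causal Markov condition)."""
    p = 1.0
    for t in alt:
        p *= PR_m.get(t, 1.0) * PR_mprime.get(t, 1.0)
    return p

joint = {alt: weight(alt) for alt in alternatives}

print(sum(joint.values()))   # 1.0 -- the joint space is normalized
# Temporal marginal property: ignoring what happens later at m' recovers the weight of "go".
print(joint[frozenset({"go", "win"})] + joint[frozenset({"go", "lose"})])   # 0.5
```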
EXTENDING THE ACCOUNT: BRANCHING SPACE-TIMES

In our discussion of why the Cartesian product recipe fails for probabilities based on branching possibilities, we stressed the "spatial" nature of Cartesian product-type combinations, and went for an exploration of temporal (or rather: tempo-modal) combination, resulting in the above recipe. It seems, however, that something important was lost along the way: in our above scheme, there is no place any more for Cartesian products, and thus, for combining probability spaces not just at the same time, but at the same moment (cf. note 6 above). To be sure, one can always choose to add in such structure by labeling. If, e.g., at m there is 36-fold splitting because two dice are thrown independently at that same moment, we can choose to label the outcomes accordingly, viewing the 36 element partition of Hm effected by the gamble as itself structured in 6 × 6 cases. But the framework so far provides no means of representing such information internally.

This is one motivation for coming up with an extended framework for combining branching possibilities: branching space-times. (In fact, considerations of how to represent simultaneous choices by independent agents were what historically triggered the development of that framework.) The matter is intricate, and page constraints make it impossible for me to give a full account of the framework, or of the resulting probability theory, in this paper. The groundwork for the framework was laid by Belnap (1992); modal correlations were analyzed by Belnap (2002b, 2003) and by Müller et al. (2008). Transitions were analyzed by Belnap (2005). The probability theory was developed by Weiner and Belnap
(2006), Müller (2005), and Belnap (2007). For the most natural extension of the story about branching time given above, one should consult Müller (2005).7

Very briefly, in branching space-times (BST) the guiding idea is that space should be represented at the same level as time. Considerations of relativity theory lead to the choice of relativistic space-times as the appropriate candidates for histories (possible courses of development). For the Minkowski space-time of special relativity it turns out that defining histories to be maximal directed sets is the way to go: a history should be a set such that any two of its members have a common upper bound, and should be maximally so.8 This makes room for space: incomparable elements can belong to one history, if they share a common future. Such elements are called space-like related. In a single Minkowski space-time, all future light cones intersect, so that any two incomparable (i.e., causally isolated) events have a common upper bound. In BST, which integrates, e.g., several Minkowski space-times, two elements without a common upper bound are seen to belong to modal alternatives, i.e., to different histories.

The radical extension of the history definition complicates matters significantly. Obviously, one cannot start with backwards linear structures, but has to allow for a broader class of partial orders. Continuity is a natural requirement, as well as future non-ending. The topology of branching is an especially subtle point; the requirement of maximal points in the intersection of histories, similar to what has been discussed above, turns out to be the right choice (i.e., the choice that allows for sensible definitions of undividedness and splitting, leading to local partitions of histories). BST's prior choice principle fixes the topological features.

Once the theory of transitions in BST has been developed, it turns out that much of what has been said above about the combination of "small" (local) alternatives to form joint causal alternatives carries over. Maximal consistent sets of transitions define these alternatives as well. Dealing with space however means that incomparable initials cannot be excluded straight away. If they correspond to space-like related events, like in tossing two dice at the same time, then we want the respective possibilities to combine, in a Cartesian product-like fashion. (It is still better, and more general, to keep talking about consistent sets of transitions.) Probabilistic correlations return as a live option: the causal Markov condition only constrains "vertical" combinations of transitions, not "horizontal" ones.9
7 In fact I believe that it is useful to read the present paper as an introduction to the mentioned paper, giving a broader motivational background for the technical apparatus that is developed there.
8 This will not do for some models of general relativity. An appropriate extension of the framework is still a desideratum.
9 The BST framework also leaves room for a broader notion of correlation called "modal funny business": it may turn out that local alternatives do not combine smoothly to form joint possibilities. Perfect correlations in quantum mechanics may be an example of such strange combinatorics. So far, BST probability theory has not been extended to cases of modal funny business. The matter certainly merits further study.
Given this framework, it becomes possible to tackle the issue of screening off for probabilistic correlations in an integrated and formally perspicuous manner. For an initial result, and for the technical background, see Müller (2005).
CONCLUSIONS

Probabilities are graded possibilities. Different notions of possibilities afford different probability structures. Standard ways of combining probability spaces by forming Cartesian products correspond to the spatial combination of possibilities. The temporal combination of possibilities in branching structures such as branching time leads to a different formalism, for which the notion of a maximal consistent set of transitions gives the elements of a joint sample space. By representing the temporal combination of possibilities in an explicit way, the framework contains a good motivation for the causal Markov condition. Branching time-based probabilities are however limited because the framework cannot describe the important case of spatial combination that lies at the bottom of the standard approach. The theory of branching space-times provides for an integrated framework in which both types of combination can be represented.

Acknowledgements I would like to thank the audience and my co-symposiasts at the ESF Workshop Physical and Philosophical Perspectives on Probability, Explanation and Time, Zeist, 20 October 2009, for helpful discussions.
BIBLIOGRAPHY
Belnap, N. (1992). Branching space-time. Synthese, 92:385–434.
Belnap, N. (1999). Concrete transitions. In Meggle, G., editor, Actions, Norms, Values: Discussions with Georg Henrik von Wright, pages 227–236. Berlin: de Gruyter.
Belnap, N. (2002a). Double time references: Speech-act reports as modalities in an indeterminist setting. In Wolter, F., Wansing, H., de Rijke, M., and Zakharyaschev, M., editors, Advances in Modal Logic, Vol. 3, pages 37–58. Singapore: World Scientific.
Belnap, N. (2002b). EPR-like "funny business" in the theory of branching space-times. In Placek, T. and Butterfield, J., editors, Non-locality and Modality, pages 293–315. Dordrecht: Kluwer.
Belnap, N. (2003). No-common-cause EPR-like funny business in branching space-times. Philosophical Studies, 114:199–221.
Belnap, N. (2005). A theory of causation: Causae causantes (originating causes) as inus conditions in branching space-times. British Journal for the Philosophy of Science, 56:221–253.
Belnap, N. (2007). Propensities and probabilities. Studies in History and Philosophy of Modern Physics, 38(3):593–625.
Fine, K. (2005). Modality and Tense. Oxford: Oxford University Press.
Gabbay, D. and Guenthner, F., editors (1984). Handbook of Philosophical Logic, Vol. II. Dordrecht: D. Reidel.
McCall, S. (1990). Choice trees. In Dunn, J. and Gupta, A., editors, Truth or Consequences. Essays in Honor of Nuel Belnap, pages 231–244. Dordrecht: Kluwer.
Müller, T. (2005). Probability theory and causation: A branching space-times analysis. British Journal for the Philosophy of Science, 56:487–520.
Müller, T., Belnap, N., and Kishida, K. (2008). Funny business in branching space-times: infinite modal correlations. Synthese, 164:141–159.
Popper, K. (1982). Quantum Theory and the Schism of Physics. London: Hutchinson.
Prior, A. N. (1957). Time and Modality. Oxford: Oxford University Press.
Prior, A. N. (1967). Past, Present and Future. Oxford: Oxford University Press.
Thomason, R. H. (1984). Combinations of tense and modality. In Gabbay and Guenthner (1984), pages 135–165.
van Fraassen, B. (1980). The Scientific Image. Oxford: Oxford University Press.
von Wright, G. H. (1963). Norm and Action. A Logical Inquiry. London: Routledge.
Weiner, M. and Belnap, N. (2006). How causal probabilities might fit into our objectively indeterministic world. Synthese, 149:1–36.
Weyl, H. (1940). The ghost of modality. In Farber, M., editor, Philosophical Essays in Memory of Edmund Husserl, pages 278–303. Cambridge, MA: Harvard University Press.

Department of Philosophy
University of Utrecht
Janskerkhof 13a
3512 BL Utrecht
The Netherlands
[email protected]
Team B Philosophy of the Natural and Life Sciences
RAFFAELLA CAMPANER
CAUSALITY AND EXPLANATION: ISSUES FROM EPIDEMIOLOGY
While its main aims are public health and prevention campaigns, epidemiology is also engaged in the elaboration of causal explanations of more or less common diseases. Long neglected by the philosophy of science, epidemiology is currently manifesting a strong interest in foundational and methodological issues. I shall here refer to a wide-ranging debate of the last decade concerning the definition, status and methods of epidemiology as a discipline, and what it is to explain a disease, understood in epidemiological terms as a complex and multilevel phenomenon. I shall consider how such reflections relate to some recent philosophical works on causal explanation, and how issues emerging from epidemiology can challenge them.
1. EPIDEMIOLOGY PARADIGMS The debate over causation within epidemiology has been particularly lively in the last few years, with studies reconsidering how the discipline has been developing and changing its focus. Various phases and paradigms have been identified in the history of modern epidemiology during the 19th and 20th centuries. Following the subdivisions devised by Susser, Schwartz, Morabia and Bromet1, it can be stressed how each dominant paradigm centres on reflections on the causal nexus, and each affects public health practices differently: — sanitary statistics, with the miasma paradigm (approximately 1840–1890): diseases are due to emanations from contaminated air, soil and water; interventions in urbanized societies should target sanitation and drainage; — infectious disease epidemiology, with the advent of microbiology and the germ theory paradigm (established towards the end of the 19th century, lasting approximately from 1890 to 1950): diseases (such as tuberculosis, typhoid, diphtheria)
1 Ezra Susser, Sharon Schwartz, Alberto Morabia, Evelyn Bromet (Eds.), Psychiatric Epidemiology. Oxford: Oxford University Press 2006. For recent reflections on the status, role and future of epidemiology, see also: Mervyn Susser, Ezra Susser, "Choosing a Future for Epidemiology: I. Eras and Paradigms", and "Choosing a Future for Epidemiology: II. From Black Box to Chinese Boxes and Eco-epidemiology", in: American Journal of Public Health 86, 1996, pp. 668-673 and pp. 674-677; Neil Pearce, "Traditional Epidemiology, Modern Epidemiology and Public Health", in: American Journal of Public Health 86, 1996, pp. 678-683; Jonathan Wakefield, "Ecologic Studies Revisited", in: Annual Review of Public Health 29, 2008, pp. 75-90.
are due to a single, specific infectious agent (necessary and preferably sufficient), which has to be eradicated to interrupt transmission; — risk factor epidemiology, with the theory of multiple causes and an initial special focus on chronic diseases (after World War II, approximately 1950 to date), also labelled “black box” paradigm: diseases are due to exposure to a multiplicity of factors, and public health interventions should reduce individual risk. The specific-cause germ theory is substituted by a nonspecific-cause theory and causes— denoted as “risk factors”—are taken to be unnecessary and insufficient. To evaluate them, risk factors are separated out of the web of causation, and the effects of exposures are isolated and observed under simplified conditions. While the risk factor paradigm has the advantage of acknowledging the webs of several causes underlying diseases, it has also been accused of having serious drawbacks2: a) a concentration on the description of the association between risk factors and diseases versus the explanation of causal processes linking them. The goal of isolating genuine causal factors from confounding or irrelevant ones is striven for by simplifying conditions, to create a state of all things being equal between exposed and unexposed individuals. The criticism holds that “sterilized” conditions obscure the antecedents of the risk factors and largely neglect the mediators connecting them with the disease, thus allowing for the identification rather than the explanation of causal factors; b) a strong focus on the individual level of organization: individuals are regarded as the preferred unit of interest over groups, and hence more proximate biologic factors are favoured over social, economic and environmental factors. To overcome these and other difficulties, a further paradigm, labelled “ecoepidemiology”, has recently been proposed which aims at studying data aggregated over groups rather than data on individuals, largely employing databases and providing disease mapping. Contrasting what it regards as too strong a focus on genetic factors—due to a growth in the last few decades in research on the genetic origins of diseases and their molecular mechanisms, and to the development of new technologies and screening tests—eco-epidemiology conceptualizes disease causation in terms of an interplay of different kinds of factors at different levels. For instance, “a genetic polymorphism may limit the ability of an individual to detoxify carcinogenic aromatic amines. However, only in an environment where this carcinogen is introduced by human economies does it become a potent risk factor
2 See Sharon Schwartz, Ezra Susser, Mervyn Susser, "A Future for Epidemiology?", in: Annual Review of Public Health 20, 1999, pp. 15-33, and Ana Diez-Roux, "On Genes, Individuals, Society, and Epidemiology", in: American Journal of Epidemiology 148, 1998, pp. 1027-1032. For a defence, instead, of risk-factor epidemiology, see, for instance, Sander Greenland, Manuela Gago-Dominguez, Jose Esteban Castelao, "The Value of Risk-Factor ('Black-Box') Epidemiology", in: Epidemiology 15, 2004, pp. 529-535.
for cancer.” 3 Another example is given by recent studies on schizophrenia, which highlight substantial variations in the incidence of the pathology in different areas. The factors identified as causes of schizophrenia include prenatal exposure to famine (i.e. at a specific historical moment), or, more in general, prenatal nutritional deficiency (i.e. in a specific period of development), paternal age at conception, genetic factors, but also—after observing a higher incidence of schizophrenia in urban areas—the timing and duration of exposure to urban life4. Together with the recognition of the role of environmental and socio-economic factors and the reduction of the emphasis on the microlevel of molecular genetics, the emergence of new paradigms has shifted attention away from the effects of risk factors on individuals, to populations, and thus an emphasis on the status of epidemiology as a discipline targeted to public health and prevention. In terms of causal explanation, conceptual tools need to be devised to allow for an integration of external agents and the internal functioning of the organism. It is in this respect that a notion of mechanism is to be adopted to account for both experimental evidence from basic science and environmental and socio-economic information.
2. OVERCOMING THE BLACK BOX PARADIGM. THE SEARCH FOR MECHANISMS The eruption of investigations on gene-environment interactions, and considerations on socio-economic and environmental factors, has given epidemiologists the opportunity to discuss their approaches to causation. The focus on risk factor–disease associations has been accused of neglecting "the downstream mechanisms that allow us to understand how the risk factors operate."5 Mechanistic explanations are thus invoked. Mechanicism as an approach to both causation and causal explanation has its locus in Wesley Salmon's work, and has been developed in various forms by authors like Peter Machamer, Lindley Darden, Carl Craver, William Bechtel, Robert Richardson, Adele Abrahamsen and Jim Bogen. Salmon's view—and, more specifically, his criterion of mark transmission—has been deemed to grasp some important general features of medical causation. The idea of the propagation of a mark over time is claimed to connect easily with the idea of disease localisation according to which the cause is a tissue alteration characteristic of a certain agent, persisting and transmitted over time.6 If this is taken
3 Schwartz, Susser and Susser, ibid., p. 28.
4 See Dana March, Ezra Susser, "Invited Commentary: Taking the Search for Causes of Schizophrenia to a Different Level", in: American Journal of Epidemiology 163, 2006, pp. 179-181.
5 Susser, Schwartz, Morabia, Bromet, ibid., p. 416.
6 See Paolo Vineis, "Exposure, Mutations and the History of Causality", in: Journal of Epidemiology and Community Health 54, 2000, pp. 652-653.
as a good hint to a conception of diseases, what about an understanding of the different levels across which they spread? That mechanisms are dynamic multilevel systems has been extensively argued by Machamer, Darden and Craver, and recently highlighted by Glennan, whose definition of mechanism can be seen as synthesizing different intuitions on causation that play some role in epidemiology. His “complex-systems account” centres on the following definition: “a mechanism for a behaviour is a complex system that produced that behaviour by the interaction of a number of parts, where the interactions between parts can be characterised by direct, invariant, change-relating generalizations”7. Mechanisms are systems consisting of stable arrangements of parts, and events are causally related when there is a mechanism connecting them; the description of the mechanism responsible for the effect provides its explanation. Glennan’s notion of mechanism is strongly linked to that of productive behaviour: a mechanism is identified by what it does. As in Salmon’s view, a central role is played by the notion of interaction. According to Glennan, allowable interactions taking place between the parts of a mechanism depend upon the behaviour we are interested in explaining, and mechanisms must be simply such that their internal parts interact to produce the system’s external behaviours. No specific criterion is given regarding what can properly count as “parts” of a mechanism as involved in its functioning; they can easily belong to both the organisms and their environment. The key-notion employed to identify a genuine causal mechanism is that of invariant generalization: as emerges from the key-definition given above, the interactions between parts of the mechanism which give rise to its behaviour are characterised by invariant generalizations, namely generalizations that would hold were a range of possible interventions to be performed, expressed hence in counterfactual terms. The notion of invariant generalization and the use of interventionist counterfactuals are explicitly borrowed from Jim Woodward, whose account is inspired by a manipulationist perspective on causation. Invariance is the fundamental notion on which his whole theory is grounded. Woodward defines invariance as the key feature a relationship must possess to count as causal, and deems a generalization invariant across certain changes if it holds up to certain appropriate levels of approximation across those changes. As in any manipulability approach, a strong interest in controlling is expressed: “wiggling” on a given X that is a relatively invariant cause of Y gives one some control over whether Y obtains. An explanation is stated to be an answer to a what-if-things-had-been-different question, since one ought to be able to associate with any successful explanation a hypothetical or counterfactual experiment: […] the explanation must enable us to see what sort of difference it
7 Stuart Glennan, "Rethinking Mechanistic Explanation", in: Philosophy of Science 69, 2002, pp. 342-353, quot. p. 344.
would have made for the explanandum if the factors cited in the explanans had been different in various possible ways8
Interest in causal explanation is seen as an extension of control from cases in which manipulations can be actually performed to cases in which they cannot. Glennan too understands counterfactuals without appealing to anything like possible worlds, and no such thing as a counterfactual analysis of causation separated from the mechanistic one is put forward. Glennan admits of counterfactuals only insofar as they stand in close relation with mechanisms: counterfactuals are to be employed in a specific, “interventionist” sense, to identify productive mechanisms. The counterfactual approach has gained broad acceptance in epidemiology as a guide to causal reasoning, to identify what performs the production by comparing an exposure or treatment condition with an unexposed or untreated one. Positions sympathetic to a probabilistic-counterfactual approach can be found in Rothman, Greenland, Lash; Susser, Schwartz, Morabia and Bromet; Kaufman and Poole; Parascandola and Weed; Höfler.9 Application of the counterfactual approach is held to lead epidemiologists “to ask whether the removal of one component of the constellation of circumstances […] would have prevented the outcome […] from occurring in [a] particular instance.”10 The counterfactual approach is believed to have various implications: the counterfactual definition includes the “given that other conditions are fixed” clause, and this is taken to mean that the strength of a risk factor—which leads to a disease only in the presence of its causal partners—depends on the prevalence of its causal partners in the population at stake. Furthermore, the search for risk factors cannot but focus on a restricted portion of the causal field, considering some factors a fixed background (for instance, characteristics shared by all members of the population). The designation of what counts as immutable background involves—is the worry—subjective judgments.11 It appears how, in presenting the use of counterfactuals, the focus is here on risk factors. What about the manipulability component expressed by interventionist counterfactuals? This element captures an essential aspect of epidemiology, i.e. its being strongly targeted to prevention. Prevention can often be achieved even in 8
Jim Woodward, Making Things Happen: A Theory of Causal Explanation. New York: Oxford University Press 2003, quot. p. 11.
9 Kenneth Rothman, Sander Greenland, Timothy Lash (Eds.), Modern Epidemiology (third ed.). Philadelphia: Lippincott Williams & Wilkins 2008, esp. ch. 4; Susser, Schwartz, Morabia, Bromet, ibid., esp. chs. 4-5; Jay S. Kaufman, Charles S. Poole, "Looking Back on 'Causal Thinking in the Health Sciences'", in: Annual Review of Public Health 21, 2000, pp. 101-119; Mark Parascandola, Douglas Weed, "Causation in Epidemiology", in: Journal of Epidemiology and Community Health 55, 2001, pp. 905-912; Michael Höfler, "Getting Causal Considerations Back on the Right Track", in: Emerging Themes in Epidemiology 3, 2006, pp. 8-10.
10 Susser, Schwartz, Morabia, Bromet, ibid., p. 36.
11 See Susser, Schwartz, Morabia, Bromet, ibid., pp. 38-42.
the absence of mechanistic knowledge. Mechanistic knowledge can provide the premises for the elaboration of the therapy, but in many cases the therapy itself has preceded the discovery of mechanistic details, and has done without it. The history of medicine is full of examples in which the preventive measure was actually discovered much earlier (from a few to many decades) than the causal agent and the mechanism of action of the disease.12 Waiting for further knowledge would have meant a serious delay in the promotion of those therapies and preventive strategies which have proved essential for reducing the incidence of the pathologies. If the theoretical goal of understanding the mechanistic working of the disease and the practical goal of curing it do not necessarily go together, it is mechanistic understanding that is sought for explanatory purposes. Woodward himself acknowledges that doing the experiment corresponding to the antecedents of [counterfactual claims] doesn’t make [them] have the truth-values they do. Instead the experiments look like ways of finding out what the truth values [of the counterfactual claims] were all along.13
To reconcile it with Glennan's mechanicism, we can say that the mechanism is what does the causal work, and the invariant generalizations "simply summarize the behaviour of the mechanism."14 Woodward and Christopher Hitchcock argue that adequate explanations exhibit patterns of counterfactual dependence, describing how the system's behaviour would change were some conditions to be varied. It can be easily admitted that explanations provide the resources for answering what-if-things-had-been-different questions by making explicit which variables the explanandum depends upon, and allow us to gain insights into properties we would like to control, to deal with matters such as control groups, and to respond to why we chose some specific experimental strategy. Invariant generalizations, though, will not suffice for explanatory purposes. Recent epidemiologic literature stresses how what we want to know from a causal explanation—differently from a causal identification—is what goes on between the exposure to the risk factors and the appearance of the symptoms, i.e. we want to uncover what constitutes the pathology's mechanistic working. Only in this way will the shortcomings of the black box paradigm be overcome. Specified mechanisms clarify the actions and effects of the risk factors, and the organism's behaviour they trigger. Although not strictly necessary for curing, knowledge of mechanisms allows for more effective cures and prevention strategies. For example, studies on schizophrenia are seeking to understand how
12 See Paolo Vineis, Micaela Ghisleni, "Risks, Causality, and the Precautionary Principle", in: Topoi 23, 2004, pp. 203-210.
13 Jim Woodward, "Counterfactuals and Causal Explanation", in: International Studies in the History and Philosophy of Science 18, 2004, pp. 41-72, quot. p. 46.
14 Stuart Glennan, "Mechanisms, Causes and the Layered Model of the World", in: Philosophy and Phenomenological Research, forthcoming, p. 9.
factors such as prenatal viral infections, genetic vulnerability and adolescent cannabis use combine. The identification and elimination of risk factors would reduce the pathology’s incidence, but knowledge about relevant exposures directs research into causal mechanisms to uncover, for instance, how a given virus affects the development of a foetus’ brain. Moreover, causal mechanistic explanations often provide the framework to generalize the results of a given study beyond the specific cases that have been tested. One way to accomplish the goal of generating knowledge that proves meaningful outside the confines of the study particulars is to identify the mechanisms through which the exposure affects the disease. For instance, if we are to explain infant schizophrenia spectrum disorders and have identified maternal overweight as a possible cause, different mechanisms can be hypothesized through which maternal overweight influences foetal development. The foetus can be affected by maternal overweight through gestational diabetes, or through the mother’s intake of amphetamines which physicians prescribe to limit weight gain during pregnancy. Making the hypothesis more specific by indicating the active ingredients of the exposure, the mechanisms through which they cause the disease and the conditions under which they do so, reduces the range of plausible explanations. The number of alternatives drops because any explanation will have to account not only for the exposure-disease association, but also for the explanatory power of the mechanism identified.
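The way in which specifying a mechanism narrows the field of candidate explanations can be illustrated with a small simulation. The sketch below is my own toy model, not one drawn from the studies discussed here: the variables (maternal overweight O, gestational diabetes G, amphetamine exposure A, infant disorder D) and all the probabilities are hypothetical, and in the toy "ground truth" only the diabetes pathway produces the disorder. Forcing a mediator to a fixed value plays the role of an intervention on it.

```python
import random

def simulate(n=100_000, do_g=None, do_a=None, seed=1):
    """Toy structural model (hypothetical variables and probabilities).
    O: maternal overweight, G: gestational diabetes,
    A: amphetamine exposure, D: infant disorder.
    do_g / do_a, if given, fix the corresponding mediator by intervention."""
    rng = random.Random(seed)
    risk = {True: [0, 0], False: [0, 0]}          # O -> [cases, trials]
    for _ in range(n):
        o = rng.random() < 0.30
        g = do_g if do_g is not None else rng.random() < (0.40 if o else 0.05)
        a = do_a if do_a is not None else rng.random() < (0.20 if o else 0.02)
        d = rng.random() < (0.15 if g else 0.02)  # only the G-pathway matters here
        risk[o][0] += d
        risk[o][1] += 1
    return {o: round(c / t, 3) for o, (c, t) in risk.items()}

print("observed risks      ", simulate())            # association between O and D
print("under do(G = false) ", simulate(do_g=False))  # association disappears
print("under do(A = false) ", simulate(do_a=False))  # association persists
```

Blocking the pathway that actually carries the effect makes the exposure–disease association disappear, while blocking the idle pathway leaves it intact; each hypothesized mechanism thus licenses a distinct prediction about what an intervention on its mediator would do, which is one way of cashing out the claim that specified mechanisms reduce the range of plausible explanations.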
3. MECHANISTIC EXPLANATIONS OF LAYERED DISEASES Contemporary epidemiology stresses how diseases under enquiry have a crucial multilevel character, with different kinds of properties (for instance, living in a wealthy or poor environment; having given genetic features, …) playing an important role, and affecting each other. Socio-economic and environmental properties have been increasingly identified as relevant to many pathologies, as properties which make a real difference and do not count as simple background conditions. Equal concern is expressed for causal pathways at the societal level and pathogenesis and causality at the individual, molecular level. The target is hence to integrate such different levels into some adequate conception of mechanisms and mechanical explanation. The very identification of the levels relevant properties belong to is far from unproblematic and univocal. For instance: does a greater use of cannabis in urban areas as a causal factor of schizophrenia count as a high-level social factor— given its diffusion—, or as a low-level individual factor, with specific molecular pathways? Or: are increased placental levels of corticotrophin releasing hormone (CRH), as a factor which favours pre-term delivery, an individual factor, with its specific chemical and physical features, or a socio-economic factor, given that such levels are higher in poorer women? Furthermore, what may appear to be just
one level, for instance the social one, may be further subdivided, and belonging to a family, neighbourhood, village or society may have different implications. Obviously, explanations may be pursued with different focuses and purposes, and the decision as to which levels to include will have to do with the question raised, the particular nature of the disease, the pattern of disease rates, and so on. The privileged level of investigation can be a high one, on which to intervene with prevention campaigns. In other cases, if the focus is on why certain individuals within a population become infected whereas others do not, the individual genetic level may be what the enquiry concentrates on. Once the epidemiological investigation has analyzed determinants and outcomes at different levels of organization, the action that follows will address the most efficacious levels. Prevention and public health policies hence affect the search for explanations. Although usefulness can be a criterion to decide on which level to focus, an adequate explanation will have to take into account a number of levels involved. Their mutual relations rise various issues, and complicate the search for mechanical explanations. Social and environmental facts do not contribute to the occurrence of diseases in a disembodied way: they cause them “through mediating factors that, at some point in the causal chain, become biologic phenomena. Nonetheless, they are not reducible to individual-level phenomena: they may cause disease through a large number of interactive pathways that cannot be fully itemized”. For instance, it is of course true that the availability of alcohol in and of itself cannot lead to alcoholism without working through the behaviour of individuals. The alcohol has to enter the body to make a difference. However, the availability of alcohol influences the alcoholism rate through many pathways and through interaction with numerous risk factors15.
Aiming at reducing ambiguities in talking of levels Woodward distinguishes between “upper-level”, or—he says—“coarse-grained”, and “lower-level”, or— he says—“fine-grained” causes16. By “upper-level” he refers to more abstract and less physical factors (for instance, social variables); with “lower levels” he refers to more physical and concrete causes (for instance, genetic factors). In an interventionist account like his no particular problem arises with respect to mixing variables belonging to different levels. Variables at high levels can be causes of variables at low levels, provided the right sort of invariant relationships hold. It is hence simply acknowledged that environmental events can alter finer-grained patterns of gene expression, which in turn can influence relatively more macroscopic neural processes, which in turn affect mental or behavioural patterns. Elsewhere Woodward distinguishes between three notions of “levels”, understood in terms 15 Susser, Susser, Morabia, Bromet, ibid., p. 448. 16 Jim Woodward, “Cause and Explanation in Psychiatry”, in: Kenneth Kendler, Josef Parnas (Eds.), Philosphical Issues in Pychiatry. Baltimore: The Johns Hopkins University Press 2008, pp. 132-184.
of: 1) the relative size or of part/whole relationships (cell membranes, molecules, atoms, …); 2) relative abstractness versus concreteness; 3) theories, disciplines or characteristic concepts (psychological level, biochemical level, physical level).17 All these senses can be employed in epidemiologic explanations, which take place at the crossroads of various disciplines. It is the interactions between levels that remain a crucial point for the elaboration of a satisfactory account. According to Woodward and Hitchcock, the interventionist theory of explanation has among its virtues the capability of making sense of the intuition that some explanations are deeper than others. Invariance is presented as admitting degrees: if the class of changes under which a relationship R1 is invariant is a subset of the class of changes under which R2 is invariant, then R2 is held to be more invariant than R1. The more invariant the generalization, the deeper the explanation in which it figures: “an explanation is deeper insofar as it makes use of a generalization that is more general”.18 Generality is to be conceived here as generality with respect to hypothetical changes in the system under study. In the field at hand, risk factors may exist that operate in only a specific context, but have a very strong impact there, and risk factors that are much weaker, but whose effects extend over many contexts. It is crucial to distinguish between the scope and the stability of a generalization, the former having to do with how broadly the generalization applies, the latter with the range of variations under which it holds. A very stable generalization can have a narrow scope, and a very broadly applying generalization can be invariant under a restricted range of interventions. Upper-level claims may capture a wider range of dependency relationships than lower-level causal claims, but they may also be less invariant. Epidemiologic explanations in the ecologic perspective have the integration of different levels as their specific aim. To be explanatory, causal claims have to concern mechanisms that are not only potentially present, but that are actually operating in the disease at stake, and have to regard properties of the mechanism’s components that are relevant to the specific target of the explanation. Not all the factors mechanistically involved in a given pathology turn out to be explanatorily relevant in a given context. While the whole causal story is utterly objective, the context will dictate which portion we care about. However, higher levels are not reduced to lower ones, nor are lower level explanations regarded as superior to upper level ones, the idea being that it is the interplay of variables at different levels that has to be elucidated as originating diseases. The emphasis is not so much on width of scope or on stability, as on the range of relevant variables from different levels the explanation manages to put together. On the other hand, insofar as a more stable—“deeper” in Woodward and Hitchcock’s terms—explanation is an explanation that provides the resources for answering a wider range of what-if17 Jim Woodward, “Comment: Levels of Explanation and Variable Choice”, in: Kenneth Kendler, Josef Parnas (Eds.), Philosphical Issues in Pychiatry, op. cit., pp. 216-235. 18 Jim Woodward, Christopher Hitchcock, “Explanatory Generalizations, Part II: Plumbing Explanatory Depth”, in: Noûs 37, 2003, pp. 181-199, quot. pp. 181-182.
things-had-been-different questions, it can provide the resources for more extensive manipulation, and hence, for epidemiologic purposes, for planning interventions and reaching greater control. A deeper explanation in this sense can hence prove a more useful explanation. Explanations focusing on different levels of the same phenomenon could be regarded as more or less adequate by different epidemiologic paradigms, emphasizing, for instance, either the genetic or the social or the environmental factors, with correspondingly different views on what interventions should aim at. The ecologic paradigm is explicitly meant to overcome such a privileging one level over the others, by instead putting “things back together”19. Contemporary epidemiology is also devoting increasing attention to the relationships holding between the individual and the group in which s/he is included. Individual-level explanations need to be integrated into broader models incorporating interactions between individuals, as well as group-level or society-level determinants, where society-level determinants can modify individual-level properties. Recent studies have highlighted how groups may have non-additive properties: population system epidemiology has stressed how populations cannot simply be defined by linear combinations of individuals, and how deeper investigations are warranted on the relations between causal connections at individual and group levels. Growing interest in the ecologic determinants of health promoted by ecoepidemiology has been accompanied by an interest in population patterns of exposure, and not simply numbers of people exposed. Instead of considering the joint effects of multiple exposures in individuals, in population systems modeling “models of nonlinear population processes are needed that define how time-varying patterns of connections among individuals affect population level outcomes”20. Transmission models incorporating interactions between individuals and how they are linked to a population network are thus considered more and more broadly. Individuals with a very low risk of infection, for instance, can be key-individuals in the transmission of the infection at population level. As suggested by Glennan, the concept of interaction is then to be assumed as a key-concept, to go beyond the identification of risk factors “into the realm of explaining causal processes”.21 Troubles may arise, though, in examining how individual causal processes can be considered together with population variables. Glennan—discussing natural selection—distinguishes between causal processes, operating at the individual level, 19 Diez-Roux, ibid., p. 1028. Not only are different levels integrated, but the temporal dimension is taken into account, both in the sense of the influence of historical periods in the development and spread of a pathology and in the sense of the life-course dimension. For instance, examining the trajectory of a disease over the life course can lead to include long-deferred effects of in-utero factors, early childhood and adolescence experiences, and cumulative effects of long-terms exposures. 20 James S. Koopman, John W. Lynch, “Individual Causal Models and Population System Models in Epidemiology”, American Journal of Public Health 89, 1999, pp. 1170-1174. 21 Susser, Schwartz, Morabia, Bromet, ibid., p. 39.
and population-level properties, that are causally relevant but “do not produce [the] outcome”.22 The role of population-level properties in the onset of a disease seems, though, a crucial concern for current epidemiology. Whether and how the bits belonging to different levels—population and individual—are actually entangled and can be described in mechanical terms, in the reconstruction of mechanical systems, seems to need further clarification. It is with respect to these and related issues that epidemiology has been rethinking its aims and methods: epidemiologists aim not only to identify causes, but also to explain the causal processes that lead to disease. Causal explanation requires us to elaborate our causal ideas through the specification and examination of alternative explanations, antecedents, mediators, and causal partners. […] The elucidation of causal pathways also helps to locate points for public health intervention.23
Once the relevant properties are obtained through answers to what-if-things-had-been-different questions, the demand for mechanical explanations persists. Epidemiology is striving for insights into multiple-level causes and their dynamic interplay, into time factors, connection patterns and mediators between exposures and outcomes, in order to integrate both the constitutive and the etiological aspects24. However, a mechanical description that actually includes and integrates all patterns of interactions among all explanatorily relevant properties can hardly be devised. The elaboration of complete mechanical explanations—taking into account, adequately representing and combining all the relevant factors mentioned above—is hence to be seen more as a desideratum expressed by some recent trends to overcome the black box paradigm, enrich epidemiological analyses and orient further research, than as a target reachable as such.
Department of Philosophy University of Bologna Via Zamboni 38 40126 Bologna Italy [email protected]
22 Stuart Glennan, "Productivity, Relevance and Natural Selection", Biology and Philosophy 24, 2008, pp. 325-339, quot. p. 327.
23 Susser, Schwartz, Morabia, Bromet, ibid., p. 72, italics added.
24 See Wesley Salmon, Scientific Explanation and the Causal Structure of the World. Princeton: Princeton University Press 1984, p. 275.
SAMUEL SCHINDLER
INVARIANCE, MECHANISMS AND EPIDEMIOLOGY
In her paper "Causality and Explanation: issues from epidemiology", Raffaella Campaner tries to put forth a challenge to current philosophical accounts of explanation. Of the latter, Campaner finds most relevant to epidemiological practices the mechanistic accounts of Glennan1 and Machamer, Darden, and Craver2 and Woodward's interventionist account of explanation.3 Campaner is not very specific about what her challenge to these views exactly amounts to, but I take it to consist of at least the following core claim: (CC): Contrary to Woodward, generalisations invariant under interventions do not perform any explanatory work in epidemiology. Rather, it is mechanistic explanations that epidemiologists search for in their attempts to explain diseases as multi-level and multi-factor phenomena.4 This challenge, I submit, fails on two counts. First, it ignores the view expressed by Woodward5 that talk about mechanisms can be perfectly reduced to talk about counterfactual explanations. Second, some of the mechanistic concepts Campaner wishes to transfer from the biological context (in which they were developed) to the epidemiological context do not seem to be applicable there. Since these concepts nevertheless seem to be essential to the standard mechanistic account, this raises the question of whether a different conceptual framework might not be better suited for epidemiology. Let us start with the first point. In the standard view about mechanistic explanation,6 mechanisms consist of entities and their activities that jointly produce phenomena. Providing an explanation of a phenomenon then means providing a description of the mechanism that produces or is capable of producing
5 6
Glennan (2002) Machamer, Darden, and Craver (2000) Woodward (2003) There are various places in Campaner’s paper, in which she makes claims to this effect. For example: “It is mechanistic understanding that is sought for explanatory purposes”, “invariant generalizations, though, will not suffice for explanatory purposes”, and “once the relevant properties are obtained through answers to what-if-things-hadbeen-different questions, the demand for mechanical explanations persists”. “Epidemiology is striving for insights into multiple level causes and their dynamic interplay.” Woodward (2002) see Machamer et al. (2001)
the phenomena to be explained. As Woodward7 rightly notes, the standard proponents of mechanistic explanations are somewhat vague about how the production relationship is to be understood, despite it obviously playing a crucial part in the explanation of the phenomena: if a mechanism is not capable of producing a phenomenon, describing the mechanism cannot constitute an explanation of the phenomenon in question. Woodward claims to be able to fill the lacuna left by the "mechanists". He suggests that the productive relationship between mechanisms and the phenomena be understood in terms of his interventionist account of explanation: if an intervention were to change the state of a mechanism from M to M* (by changing at least one of the components m of M), then the produced phenomenon would change its state from P to P*. Only productive relationships which conform to these "active" counterfactuals, as Woodward calls them in order to point to their interventionist component, are causal relationships. In contrast, to take one of Woodward's main examples, manipulating a barometer will not bring about the occurrence of a storm—the relationship is non-causal. Now one can of course ask how enlightening the reduction of the mechanistic production relation to an interventionist notion really is. After all, Woodward's account of causal explanation is admittedly non-reductionist, i.e., it does not even attempt to reduce causation to some other notion (other approaches try to reduce causation to e.g. some probabilistic notion8). In any case, Campaner does not really engage with any of this. In many places it looks as though she merely asserts that the interventionist notion of causation is not sufficient for explanatory purposes in epidemiology. In other places, Campaner appears to suggest that invariance relationships, contrary to mechanistic explanations, are too coarse-grained for them to serve as explanations:
Campaner provides the following example to illustrate this idea: there seems to be some sort of invariant relationship between maternal overweight and infant schizophrenia. This alone, however, amounts to no more than a brute fact. Epidemiologists of course want more. They want to find out about the 'causes' or 'mechanisms' of this correlation. Various explanations have been suggested. Infant schizophrenia could be caused by the overweight mother's gestational diabetes or by amphetamines, which overweight mothers are often prescribed by physicians in order to limit weight gain. But I don't quite see why the interventionist account could not be rendered more fine-grained here. Why can one not probe the causal content of each of the above 'mechanistic explanations' by intervening
7 Woodward (2002)
8 see Hitchcock (2010)
on the relevant variables (by sampling only mothers without gestational diabetes and by refraining from prescribing amphetamines, respectively)? Although I do have sympathies with the idea that invariances are insufficient for explanations, I don’t think Campaner does nearly enough to dismiss Woodward’s account as being inappropriate for epidemiological explanations. Let us now turn to the second major problem of Campaner’s paper. Contrary to her assessment of Woodward’s interventionist notion of causation, Campaner is largely sympathetic to the mechanistic account of explanation. For instance, Campaner finds useful for the epidemiologic context the mechanistic idea that multiple factors at different levels are involved in the causation of a phenomenon (here: a disease). But this is of course just one of the tenets of the mechanistic account. Unfortunately, Campaner does not exercise due care when discussing the other assumptions of the mechanistic account. Another central component of the mechanistic account, for example, is the idea that the levels of mechanisms are not reducible to each other.9 Although this is also true in the epidemiological context, it is trivially true. It simply would not make much sense to claim that environmental factors (as parts of higher level phenomena) are somehow reducible to genetic factors (as parts of lower level phenomena), for instance. Likewise, it would be odd to claim that environmental factors are somehow caused by genetic factors. Nevertheless, we can clearly talk about ion channels in neurons in the brain causing action potentials (the paradigmatic example of the mechanists10). Also, as Campaner notes herself, “the very identification of the levels … is far from unproblematic and univocal”. This is again clearly different from ion channels (lower level) and action potentials (higher level), where there is no doubt about a clear distinction between levels. Of course, this example might be uncharacteristically simple even in the context of biology. Perhaps one should then go a step further than Campaner (who apparently does subscribe to the notion of “levels”) and call for a broadening of the notion of mechanisms by for instance abandoning the idea of levels altogether. Doing so, however, might run the risk of trivializing the notion of mechanisms as merely multifactoral explanations of phenomena. Before giving up on the original idea of mechanisms in this way, it may be worth pondering whether the (philosophical) notion of mechanistic explanation is really appropriate to the context of epidemiology in the first place.
REFERENCES
Carl F. Craver, Explaining the brain. New York: Oxford University Press 2008.
Stuart Glennan, "Rethinking Mechanistic Explanation", in: Philosophy of Science 69, 3, 2002, pp. S342-S353.
9 cf. Machamer et al. (2000)
10 see Craver (2008)
Peter Machamer, Lindley Darden, and Carl F. Craver, "Thinking about mechanisms", in: Philosophy of Science 67, 1, 2000, pp. 1-25.
Christopher Hitchcock, "Probabilistic Causation", in: The Stanford Encyclopedia of Philosophy (Summer 2010 Edition), Edward N. Zalta (ed.), forthcoming 2010 URL = .
Jim Woodward, "What is a mechanism?", Philosophy of Science 69, 3, 2002, pp. S366–S377.
Jim Woodward, Making Things Happen. New York: Oxford University Press 2003.
Department of Philosophy University of Konstanz 78457 Konstanz Germany [email protected]
ALEXANDER REUTLINGER
WHAT’S WRONG WITH THE PRAGMATIC-ONTIC ACCOUNT OF MECHANISTIC EXPLANATION?
1. INTRODUCTION Mechanist theories of explanation1 are taken to be tremendously fruitful in philosophy of science, and especially in philosophy of biology. Among the undeniable advantages of mechanist theories of explanation are: (a) they provide a prima facie adequate theory of explanation for the biological sciences, and possibly for other special sciences (because, e.g., no universal laws are presupposed, and the reference to part-whole relations in the explanatory practice of these disciplines is taken seriously), (b) they account for general as well as actual causal explanations, (c) they cope with several notorious counterexamples to many other theories of explanation (e.g., various scenarios of preemption, counting out irrelevant factors etc.). In this contribution, I would like to critically examine a specific account of mechanistic explanation, namely the account of Daniel Sirtes.2 Daniel Sirtes advocates a pragmatic-ontic model of mechanistic explanation that he characterizes as follows: Mechanistic explanations are answers to questions determined by the triple determinants of the question Q = 〈Pk, X, R〉, and the quadruple determinants of the relevance relation R = 〈L, P, T, M〉 […].3
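For concreteness, one hypothetical way of filling the slots (my own illustration, not an example taken from Sirtes) might be:

Q = 〈Pk, X, R〉, with the topic Pk = "patient p developed lung cancer" and the contrast class X = {p developed lung cancer, p remained healthy};
R = 〈L, P, T, M〉, with L = a cell-biological lexicon, P = a qualitative degree of precision, T = the interval from first exposure to diagnosis, and M = the mechanistic relation, which stays the same in all mechanistic explanations.

On such a reading, an explanatory answer must bear the mechanistic relation M to the topic while respecting the lexical, precision and temporal constraints.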
Following van Fraassen's (1980) pragmatic model of explanation, Sirtes takes Pk to stand for the topic of the (explanation-seeking) question Q, X for the contrast class of Pk, and R for the relevance relation. Sirtes improves van Fraassen's model by fixing the meaning of R. Consequently, Sirtes is not subject to anything-goes objections to van Fraassen's original account.4 Sirtes5 specifies the relevance relation R as the tuple 〈L, P, T, M〉 with 1. L for "the level(s) or lexicon of the descriptive concepts", 2. P for "the precision or the tendency of precision of the explanation (e.g.
1 Most importantly, Machamer, Darden and Craver (2000); Glennan (2002); Woodward (2002); Craver (2007).
2 Sirtes (2010)
3 Sirtes (Ibid., p. 23)
4 cf. Kitcher & Salmon (1987)
5 Sirtes (Ibid., p. 23)
as detailed as possible, as general as possible)", 3. T for "the time-frame and time resolution of the explanans (starting and end points, changes should be accounted for in a seconds-, milliseconds-, or nanoseconds scale)", and 4. M for "the mechanistic relation, which stays the same in all mechanistic explanation". Most importantly, the mechanistic relation M is characterized as follows: (1) M holds between the parts of a system and the whole system, and (2) according to M, parts and whole causally interact.6 Note that condition (2) requires some refinement: simultaneous part-whole relations are not causal, rather a whole is constituted by its parts. Nevertheless, because of this constitutive relation the interaction of the parts at time t causes the state of the whole system at a later time t*. For present purposes, we merely require a causal relationship between parts and whole in the latter, diachronic sense (cf. Craver 2007: Chapters 4 and 5). Sirtes' main motivation for providing this model of mechanistic explanation is a shortcoming of Craver's7 model of mechanistic explanation. Craver argues that an entity A is explanatorily relevant for an entity B iff the following two conditions are satisfied by A w.r.t. B, i.e. iff the following counterfactuals CR1 and CR2 are true of A and B. Sirtes8 reconstructs Craver's causal notion of explanatory relevance as follows:
(CR1) When [the entity's activity] Φ is set to the value φ1 in an ideal intervention, then [the system's activity] Ψ takes on the value of f(φ1).
(CR2) If [the system's activity] Ψ is set to the value ψ1 in an ideal intervention, then [the entity's activity] Φ takes on the value f(ψ1).
I take Sirtes to denote a whole system by "system" and the parts of the whole by "entity". Following interventionist theories of causation9, Craver thinks that CR1 and CR2 explicate the meaning of causal statements (that assert causal relations between the whole and its parts). So, according to Craver (and other interventionist theories of explanation), to be a cause of the explanandum is at least a necessary condition for an entity to be explanatorily relevant. I take Sirtes to endorse the claim that M holds iff CR1 and CR2 are true (of the states of a whole and its parts). Sirtes criticizes Craver's account by pointing out that CR1 and CR2 cannot be sufficient for explanatory relevance, because these conditionals by themselves
6 Sirtes (Ibid., pp. 23f).
7 Craver (2007)
8 Sirtes (Ibid., p. 16)
9 cf. Hitchcock (2001), Woodward (2003), Halpern & Pearl (2005)
do not settle the following questions: (1) on what level is the whole system and its parts described?, (2) what are the temporal boundaries of a mechanism (i.e. start and finish conditions)?, (3) to what degree of precision is a system and its parts supposed to be described?, and (4) how do scientists distinguish between causal factors in the mechanism and causal background conditions of the mechanism?10 According to Sirtes, these questions can only be answered by purely pragmatic decisions of scientists. Thus, pragmatic constraints have to be added to CR1 and CR2 in order to be sufficient for explanatory relevance. So, according to Sirtes, the tuple 〈 L, P, T, M 〉 specifying the relevance relation R falls into an ontic part (consisting of M) and a pragmatic part (including L, P, and T): First, there is the ontic, causal-manipulative part of the relevance condition. The terms that appear in a mechanism description are entities and their causal interactions. […] This ensures that there is a causal efficacy of these entities and causal interactions for the explanandum. Second, there is a pragmatic part of the relevance condition. Lexical (or level(s)), temporal, and precision constraints must be added to the ontic-manipulative condition in order for the boundaries of the mechanism to be determined.11
Sirtes’ model prima facie seems to improve a pragmatic model of explanation by specifying the relevance relation. Nevertheless, I will present two worries concerning the pragmatic-ontic model.
2. WORRIES I will present two worries concerning Sirtes' model. The first worry addresses the question of how the ontic and the pragmatic parts of the relevance relation R relate to one another. I will argue that their relation is problematic. This becomes obvious, as I will argue, when one points out that—in an interventionist framework—causal statements are semantically evaluated relative to a causal model. Here Sirtes' model conflicts with the interventionist theory of causation that he relies on. The second worry addresses a problem for all interventionists and for those mechanists who rely on an interventionist notion of causation, such as Sirtes. This problem mainly stems from the notion of a "possible" or "ideal" intervention. First Worry: Causal Statements and Causal Models As shown above, Sirtes argues that pragmatic elements have to be added to CR1 and CR2. In my opinion, this is a strange claim, because interventionist theories of causation presuppose that the required pragmatic choices have already been
made. Let me elaborate on this claim. For instance, consider Woodward’s definition of a direct type-level cause: A necessary and sufficient condition for X to be a direct cause of Y with respect to some variable set V is that there be a possible intervention on X that will change Y (or the probability distribution of Y) when all other variables are held fixed at some value by intervention.12
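The definition can be made concrete with a minimal deterministic sketch. The variable set and the structural equations below are my own toy example, not Woodward's or Sirtes': V = {Z, X, Y}, with Z exogenous, X depending on Z, and Y depending on X and Z. Intervening on X while holding the remaining variable Z fixed by intervention, and checking whether Y changes, is exactly the test the definition describes.

```python
def model(do=None):
    """Toy structural model over the variable set V = {Z, X, Y}
    (hypothetical equations, chosen only for illustration).
    `do` maps variable names to values fixed by intervention."""
    do = do or {}
    v = {}
    v["Z"] = do.get("Z", 1.0)                    # exogenous variable
    v["X"] = do.get("X", 2.0 * v["Z"])           # X := 2Z unless intervened on
    v["Y"] = do.get("Y", 3.0 * v["X"] + v["Z"])  # Y := 3X + Z unless intervened on
    return v

# Intervene on X while holding the only other variable, Z, fixed by intervention.
y_low = model(do={"X": 0.0, "Z": 1.0})["Y"]    # -> 1.0
y_high = model(do={"X": 1.0, "Z": 1.0})["Y"]   # -> 4.0
print(y_low != y_high)  # True: Y changes, so X counts as a direct cause of Y
                        # relative to this variable set and these equations
```

Even in this toy case the verdict is delivered relative to the chosen variable set and equations—the relativity to a causal model on which the discussion below turns.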
So, according to Woodward, if X is a direct cause of Y, then there is a possible intervention on X that changes Y. According to Woodward, this definition of a direct cause relies on ("interventionist") counterfactuals of the same form as CR1 and CR2: If the value of X were xi, then the value of Y would change to yi. Note that Woodward's definition is stated in terms of random variables13, such as X and Y. Variables have been imported into the philosophical debate from the methodology of causal inference. This deviates from the received philosophical view: most philosophers, like David Lewis, take the causal relata to be events described by event statements such as "event e occurs". Interventionists express event statements in terms of random variables. In other words, the fact that an event e occurs is expressed by the statement that a variable X has one of its possible values xi. For instance, the event statement that the temperature in the room is 30º is represented by the quantitative statement "the variable T representing temperature has the value 30º (at a certain time and location)". Obviously, being able to accommodate quantitative language is an advantage for an explication of causation in the (special) sciences. What matters for my present concerns is this: Woodward defines causation "with respect to some variable set V". More precisely, Woodward and other interventionists14 define causation relative to a causal model. A causal model M is a triple of a set of exogenous variables U, a set of endogenous variables V, and a set L of functional relations (or "laws") over U and V. Exogenous variables are taken as given in the model, because their values are not described as being caused by other variables. They have no causal predecessors in the causal graph (associated with the model). Rather, the role they play is restricted to being the causes of endogenous variables. In contrast, each endogenous variable does have a cause in the graph. Note that there is a clear distinction between exogenous and endogenous variables within a given graph. But there is no principled distinction between
12 Woodward (2003, p. 55)
13 A variable X (in the terminology of statistics) is a function X : D → ran(X), with individuals (or outcomes) d∈D of a domain D, and the range ran(X), which is the set of possible values x∈ran(X) of the variable. For quantitative variables X, ran(X) is the set of real numbers; qualitative variables are binary, i.e. ran(X) = {0; 1}. On notation: capital letters, such as X, Y, …, denote variables; lower case letters, such as x, y, …, denote values of variables; the proposition that X has a certain value x is expressed by a statement of the form X = x.
14 Pearl (2000); Hitchcock (2001); Halpern & Pearl (2005)
endogenous and exogenous variables relative to different graphs: one variable might be considered as endogenous in graph G and as exogenous in another graph G*. Whether a variable is endogenous or exogenous depends mostly on the pragmatic decisions, background assumptions, available theories etc. of the modeling scientists. Likewise, the choice of the set of possible values over which a variable ranges is to a large extent pragmatic. Note that this choice also includes the degree of precision and, possibly, temporal information. Similarly, a distinction of levels can be added to a model.15 The upshot is that (1) the causal statement "X is a direct cause of Y" can only be true or false relative to a causal model, and that (2) there are mainly pragmatic reasons to choose one causal model rather than another. In other words, adopting an interventionist framework of causation already presupposes pragmatic constraints (although Craver uses the interventionist theory without making these pragmatic commitments explicit). If this is true, then Sirtes' claim to add pragmatic constraints is misguided, because—according to interventionist theories of causation—one cannot separate the truth conditions of causal statements from a specific causal model. Further, causal models—as used by interventionists—give rise to another problem for Sirtes. One may wonder what role laws play in Sirtes' account of explanation. Unfortunately, Sirtes' relevance relation R remains silent on a crucial feature of the interventionist account of explanation: invariant or stable generalizations.16 Following interventionists about causation and causal explanation, Craver17 calls these generalizations (in a case study of LTP mechanisms) generalizations that are limited in scope, mechanistically fragile (i.e. sensitive to disturbing factors) and historically contingent. According to the interventionist framework, an event is explained iff this event can be inferred18 from invariant generalizations and initial conditions. Note that interventionists explicitly claim that referring to generalizations is an intended similarity between their model of causal explanation and the Hempel-Oppenheim model. Interventionists differ from the latter model by, first, allowing non-universal generalizations to be explanatory, and, secondly, by relying on causal background conditions. It seems that Sirtes has to choose here: either laws are hidden in the mechanistic relation.19
15 cf. Woodward (2002, p. 375)
16 Woodward & Hitchcock (2003); Woodward (2003); Halpern & Pearl (2005); Craver (2007)
17 Craver (Ibid., pp. 66-70)
18 Obviously, this inference has to be non-monotonic, because the generalizations in question are non-universal. Thus, the interventionist model of causal/mechanist explanation differs from the DN-model which Woodward and Hitchcock use as an analogy.
19 Here is evidence that this might be the case: Sirtes says, according to CR1 and CR2, that the value of one variable is a function of the values of other variables, e.g. f(φ1). These functional relations can be understood as what interventionists take to be invariant generalizations.
explain how laws can be incorporated into his model. Or, if laws do not play a role in his account at all, he cannot adopt the interventionist theory of causal (mechanist) explanation à la Woodward and Craver. Second Worry: Ideal Interventions and Counterfactuals Like other mechanists and interventionists, Sirtes20 relies heavily on interventionist counterfactuals (such as CR1 and CR2). These counterfactuals refer to “ideal” or “possible” interventions. Thus, ideal or possible interventions are essential for the ontic part of Sirtes’ relevance condition. Presupposing interventions of this kind is a serious problem, as I will argue. In order to get a grip on the problem, let us turn to the following question: What is a possible intervention? A possible intervention on a variable X is something that explains a variation of the value of X. Interventionists like Woodward understand this variation causally. The basic idea of an intervention consists in assuming an additional causal factor I that produces a local change in X, i.e. a factor I sets a variable X to a certain value. This additional causal factor I is called an ‘intervention’ or, more precisely, an ‘intervention into the value of a variable’. In a causal graph, an intervention I on X can be represented by an additional exogenous variable I ranging over a set of possible values {i1, …, in}. Suppose that X causes Y, because some value y1 of Y counterfactually depends on some value x1 of X (as Woodward’s definition of direct causation requires). According to Woodward, the fact that X has the value x1 is itself the effect of another cause: It is the effect of the intervention variable I (figure 1 below illustrates the idea that I is modeled as an additional variable in a causal graph; the arrows represent causal relevance).
(Figure 1: the intervention variable I, the intervened-on variable X, and the effect variable Y, with arrows I → X → Y representing causal relevance.)
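To make the interventionist counterfactuals concrete, here is a minimal sketch (purely illustrative, not drawn from Woodward or Sirtes) of the causal model behind figure 1: an exogenous intervention variable I, an endogenous variable X whose value is set by I, and an endogenous variable Y governed by an invented linear “law”. The variable names and the functional relations are assumptions made only for illustration.

# Python sketch of a toy causal model M = (U, V, L) in the spirit of figure 1.
# U = {I} (exogenous), V = {X, Y} (endogenous); the "laws" L are invented.
laws = {
    "X": lambda v: v["I"],           # under the intervention, X depends only on I
    "Y": lambda v: 2 * v["X"] + 1,   # Y is a function of X alone
}

def solve(laws, exogenous):
    """Compute the endogenous values from the exogenous settings (acyclic graph I -> X -> Y)."""
    values = dict(exogenous)
    for variable in ("X", "Y"):      # topological order of the toy graph
        values[variable] = laws[variable](values)
    return values

print(solve(laws, {"I": 3}))   # {'I': 3, 'X': 3, 'Y': 7}
print(solve(laws, {"I": 5}))   # {'I': 5, 'X': 5, 'Y': 11}: changing X via I changes Y

Setting I to different values changes X and thereby Y, which is the pattern that counterfactuals of the CR1/CR2 form are meant to capture; relative to a different variable set, the same relations might of course be modelled quite differently.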
More precisely, Woodward defines interventions via intervention variables. So, let me recapitulate his definition of an intervention variable first: According to Woodward,21 a variable I is an intervention variable for X (relative to Y) iff (a) I is a cause of X. (b) There is at least some value of I such that if I takes this value, then X depends only on I and on no other variables, i.e. I is the only cause of X. (c) I is not a direct cause of Y, and if I is a cause of Y then I is an indirect cause of Y via a causal path leading through X and a – possibly empty – set of intermediate variables Z1, …, Zn. (d) I is probabilistically independent of other causes W1, …, Wn of Y, which
20 Sirtes (Ibid., p. 16)
21 Woodward (2003, p. 98)
are not on a causal path leading from X to Y.22 The notion of an intervention variable is, in turn, used to define the notion of an intervention. Any value ii of an intervention variable I is an intervention on X relative to Y iff it is the case that the value of X counterfactually depends on the fact that I has the value ii.23 Woodward repeatedly underlines that an intervention on X is an intervention relative to Y, the purported effect.24 An intervention on X relative to Y might not be an intervention on X relative to some variable U, because—although condition (a) might be satisfied for some I—the conditions (b)-(d) might not be satisfied for the second case (for instance, because other variables have to be held fixed in the cases of Y and U as effect variables). Woodward points out that interventions need not even be physically possible.25 Instead, interventions are possible in the sense that they are “logically possible” or not “ill-defined for conceptual or metaphysical reasons”26. What is a case where it is physically impossible to intervene, but nevertheless logically possible to carry out an intervention? Sirtes seems to agree with Woodward on the modality of interventions involved in conditionals such as CR1 and CR2 when he talks about “ideal” interventions. An example of a logically possible intervention from physics might be: Uranium decays “only spontaneously” in the sense that there is no physically possible way to manipulate Uranium so that it decays (or so that the probability of decaying is raised) at a time t. Nonetheless, the decay of Uranium causes a flash on a screen (in an experimental setup). Thus, interventionists, like Sirtes, are committed to the claim: there is a logically possible intervention on Uranium decay. “Ideal” or logically possible interventions are problematic for at least two reasons: Reason 1. What exactly is a logically possible cause of the decay of Uranium? Which event or process qualifies as such an intervention on the decay of Uranium? Unfortunately, the answer to these questions seems to be “Every event that is logically compatible with the decay of Uranium qualifies as a logically possible intervention”. Suppose I tell you that the appearance of bubbles in the stomach of a yellow shark is an intervention on Uranium such that Uranium decays—in this case the intervention is represented by a binary variable B, with “B = 1” for “there are bubbles in the stomach of a yellow shark”, and “B = 0” for “there are no bubbles
22 cf. Woodward (2003, pp. 99-102) for an elucidation of condition (d).
23 cf. Woodward (2003, p. 98)
24 cf., for instance, Woodward (2003, p. 103).
25 Woodward (2003, p. 128f.) distinguishes a strong and a weak sense of “physical possibility”: (a) Strong physical possibility: An intervention I is possible iff I is consistent with actual initial conditions and the actual laws. (b) Weak physical possibility: An intervention I is possible iff I is consistent with some set of possible initial conditions and the actual laws. Woodward claims that the counterexample is supposed to rule out interventions on C that are physically possible in both senses.
26 cf. Woodward (2003, pp. 128, 132).
in the stomach of a yellow shark”. Or, suppose I tell you that James’ morning walk to the office—represented by a binary variable J, with “J = 1” for “James takes a walk”, and “J=0” for “James does not take a walk”—has a similar effect on Uranium decay.27 Obviously, both examples are (a) logically possible interventions on Uranium decay, and (b) members of an infinite set of equally qualifying examples. Clearly, interventionists (and, therefore, Sirtes as well) are committed to (all of) these candidates of interventions—because it is logically possible that there is something about yellow sharks that has the desired intervening effect on Uranium. But if it is true that every event that is logically compatible with the decay of Uranium qualifies as an intervention and we have no criterion to prefer one intervention to another, then a problem arises: logically possible interventions become useless. The reason is that the sentence “there is a logically possible intervention on Uranium decay” is equivalent with “it is no contradiction to say that Uranium decays”. Thus, logically possible interventions are superfluous for determining the meaning of causal claims. Reason 2. Interventionists believe that logically possible interventions are used to determine the truth-conditions of counterfactuals such as CR1 and CR2. I will argue that this claim seems to be at odds with standard accounts of the meaning of counterfactuals. There are three alternative accounts of the meaning of counterfactuals: (I) a truth conditional possible world semantics, (II) the meta-linguistic account and (III) the suppositional account. (I) According to the possible world semantics for counterfactuals28, “If the Uranium were to decay, then there would be a flash on the screen” is true iff there is a flash on the screen in all the closest Uranium-decay-worlds. (II) According to the meta-linguistic account29, “If the Uranium were to decay, then there would be a flash on the screen” is true iff the consequent can be logically derived from a set of premises consisting of (1) laws of nature, (2) the antecedent, and (3) other singular statements describing initial conditions. (III) The suppositional account originates from an idea by Frank P. Ramsey’s that is referred to as “the Ramsey-Test” for conditionals: If two people are arguing ‘If p will q?’ and are both in doubt as to p, they are adding p hypothetically to their stock of knowledge and arguing on that basis about q […]. We can say that they are fixing their degrees of belief in q given p.30
The Ramsey-Test is a test for the correctness of assertability conditions for a conditional, i.e. a test for whether it is correct or acceptable to assert a conditional. 27 Note that these event statements themselves are physically possible, but the fact that they are causes of Uranium decay is not. However, also physically impossible events (e.g., James takes his morning walk with a speed greater than the speed of light at time t) are logically possible interventions. 28 Stalnaker (1968); Lewis (1973) 29 Goodman (1983) 30 Ramsey (1929, p. 247)
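The probabilistic reading of the Ramsey-Test mentioned just below can be illustrated with a small numerical sketch. The joint credences are invented toy values; the only point is that “adding p hypothetically” amounts to conditionalising one’s degrees of belief on p.

# Python sketch: the Ramsey-Test read probabilistically.  An agent's joint degrees
# of belief over p ("the Uranium decays") and q ("there is a flash on the screen")
# are toy numbers chosen purely for illustration.
joint = {
    (True, True): 0.18,   # credence in (p and q)
    (True, False): 0.02,  # credence in (p and not-q)
    (False, True): 0.05,
    (False, False): 0.75,
}

def degree_of_belief_in_q_given_p(joint):
    p_p = joint[(True, True)] + joint[(True, False)]   # credence in the antecedent
    return joint[(True, True)] / p_p                   # conditional credence P(q | p)

print(round(degree_of_belief_in_q_given_p(joint), 2))  # 0.9: the conditional is highly acceptable

On this reading, the acceptability of “if the Uranium were to decay, there would be a flash on the screen” is fixed by the agent’s conditional credence rather than by a truth condition.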
According to the Ramsey-Test, a conditional is acceptable (to the degree that), if one “hypothetically” adds the antecedent (e.g. “suppose that Uranium decays”) to one’s knowledge, one also believes that the consequent is true (e.g., “there is a flash on the screen”). Hypothetically adding the antecedent to one’s beliefs is an epistemic operation. It is an assumption or a supposition. Recently, sophisticated ways to defend the suppositional theory31 have given a probabilistic reading to the Ramsey-Test, but such details do not matter for our present concerns. There is a lesson to learn from these standard accounts of the meaning of counterfactuals: logically possible interventions are not as innocent as Woodward and Sirtes suggest. The reason for this is that a world where an intervention is carried out on the decay of Uranium instantiates laws of nature that differ from the actual laws (namely, Uranium decay is not an indeterministic process in these worlds), because this intervention is physically impossible given the actual physical laws. This is quite problematic: According to the standard approaches to the meaning of counterfactuals, worlds differing in laws from the actual world are not adequate to evaluate counterfactuals. (I) According to Lewis’32 possible world semantics, worlds that differ in laws are not among the closest antecedent-worlds that are relevant for fixing the truth conditions for counterfactuals. Instead, worlds differing in laws are distant. (II) According to the meta-linguistic account, using non-actual law statements as premises seems to be misguided. Intuitively, if a counterfactual is supposed to be true at the actual world, then the consequent has to be inferred from actual laws and singular statements. (III) According to the suppositional theory, beliefs about which laws are true in the actual world are important for carrying out the Ramsey-Test33. Supposing that p and adding p to a stock of beliefs including non-actual laws does not make sense. The relevant epistemically possible worlds (in a communicative situation) are (at least partly) constituted by actual laws. To sum up, worlds in which there are interventions on Uranium decay are worlds with non-actual laws. I argue elsewhere that the existence of a merely logically possible intervention clearly differs from a local violation of a law (which Lewis calls a small miracle). This leads to a conflict with the three standard approaches to the meaning of counterfactuals, because worlds with non-actual laws are not used to determine the meaning of counterfactuals. Therefore, worlds where interventions on the decay of Uranium are carried out are not appropriate for evaluating counterfactuals. Now, the burden of proof is on the interventionists’ side:
31 Adams (1975: chapter 4); Skyrms (1994); Edgington (2008)
32 Lewis (1979, pp. 47f.)
33 cf. Leitgeb (2010, section 2)
interventionists have to show (a) why the standard approaches are wrong in rejecting interventions and (b) how to evaluate counterfactuals in terms of interventions. This is, of course, an unpleasant result for interventionists. These two arguments against interventions are serious challenges for any philosopher who makes use of the interventionist theory of causation. Thus, I think that Sirtes is obliged to solve the problems that are attached to “possible interventions”. Unfortunately, adding pragmatic constraints to an interventionist theory of causation (and explanatory relevance) does not help to meet these challenges.
3. CONCLUSION Sirtes’ model clearly has attractive features: it is an improved version of van Fraassen’s pragmatic model of explanation, because Sirtes provides a fixed relevance relation. And, certainly, Sirtes is completely justified in pointing out that a good deal of explanatory practice is pragmatically constrained. Nonetheless, I expressed two worries: (1) According to interventionists, causal claims are semantically evaluated only with respect to a causal model. The choice of a causal model is unarguably pragmatic. Thus, Sirtes’ claim to add pragmatic constraints to CR1 and CR2 seems to be misguided, because these constraints are already presupposed (in a causal model) by interventionists. Consequently, one could read Sirtes as reminding us of the pragmatic constraints on causal models that we accept implicitly when we endorse an interventionist theory of causation and explanation. A further problem for Sirtes is connected with causal models: unlike interventionist theories, Sirtes’ account is unclear on the role of laws or invariant generalizations in mechanistic explanation. (2) The second worry addresses two problems concerning “logically possible” or “ideal” interventions, which interventionists and Sirtes presuppose. The first problem consists in the fact that logically possible interventions seem to be completely unnecessary for determining the meaning of causal claims. The second problem arises because assuming logically possible interventions is at odds with three standard accounts of the meaning of counterfactuals. Since Sirtes’ account depends on counterfactuals such as CR1 and CR2, the burden of proof is now on his side: Sirtes has to show (a) why the standard approaches are wrong in rejecting interventions and (b) how to evaluate counterfactuals in terms of interventions. Appealing to pragmatic constraints does not seem to be a promising strategy to solve this problem.
REFERENCES Ernest Adams, A Theory of Conditionals. An Application of Probability to Deductive Logic. Dordrecht: Reidel 1975. Carl Craver, Explaining the Brain. Oxford: Oxford University Press 2007. Dorothy Edgington, “Counterfactuals”, in: Proceedings of the Aristotelian Society 108, 2008, pp. 1-21. Stuart Glennan, “Rethinking Mechanist Explanation”, in: Philosophy of Science 69, 2002 (Supplement), pp. 342-353. Joseph Halpern and Judea Pearl, “Causes and Explanations: A Structural-Model Approach. Part I: Causes”, in: British Journal for the Philosophy of Science 56, 2005, pp. 843-887. Nelson Goodman, Fact, Fiction and Forecast (Fourth Edition). Cambridge (MA): Cambridge University Press 1983. Christopher Hitchcock, “The Intransitivity of Causation Revealed in Equations and Graphs”, in: The Journal of Philosophy 98, 2001, pp. 273-299. Hannes Leitgeb, “A Probabilistic Semantics for Counterfactuals”, unpublished manuscript, 2010. David Lewis, Counterfactuals. Oxford: Blackwell 1973. David Lewis, “Counterfactual Dependence and Time’s Arrow”, in: David Lewis (Ed.), Philosophical Papers II. New York: Oxford University Press 1986, pp. 32-51. Peter Machamer, Darden Lindley and Carl Craver, “Thinking About Mechanisms”, in: Philosophy of Science 67, 2000, pp. 1-25. Judea Pearl, Causality. Models, Reasoning and Inference, Cambridge: Cambridge University Press 2000. Frank P. Ramsey, “General Propositions and Causality”, in: D. Hugh Mellor (Ed.), Frank P. Ramsey: Philosophical Papers. Cambridge University Press, pp. 237-255. Daniel Sirtes, “A Pragmatic-Ontic Account of Mechanistic Explanation”, in: PhilSci Archive 2010, URL: http://philsci-archive.pitt.edu/archive/00005181/. Brian Skyrms, “Adams Conditionals”, in: Ellery Eells and Brian Skyrms (Eds.), Probability and Conditionals. Belief Revision and Rational Decision. Cambridge/UK: Cambridge University Press 1994, pp. 13-26. Robert Stalnaker, “A Theory of Conditionals”, in: William Harper and Robert Stalnaker (Eds.), Ifs. Dordrecht: Reidel 1968, pp. 41-55. James Woodward, “What is Mechanism? A Counterfactual Account.”, in: Philosophy of Science 69, 2002, pp. 366-377. James Woodward, Making Things Happen. Oxford: Oxford University Press 2003. James Woodward and Christopher Hitchcock, “Explanatory Generalizations, Part I: A Counterfactual Account.”, in: Nous 37, 1, 2003, pp. 1-24.
Department of Philosophy Westphalian Wilhelms University of Münster Domplatz 23 48143 Münster Germany [email protected]
MICHAEL JOFFE
CAUSALITY AND EVIDENCE DISCOVERY IN EPIDEMIOLOGY
In a classic 1965 paper, Bradford Hill set out his famous viewpoints – explicitly not “criteria” – as a guide to inferring causation from association.1 It was written very much in a practical style on the basis of his rich experience, without any attempt at a profound conceptual analysis. This paper sets out a view of causality, and attempts to make the case that it provides a systematic rationale for these viewpoints, or at least is compatible with them. His starting point was the situation where an association has been established between two variables, that is not likely to be attributable to random error. The fundamental question that he posed was, “is there any other way of explaining the set of facts before us, is there any other answer equally, or more, likely than cause and effect?”. A more fruitful way to ask this might be, “what other interpretations are possible, and what is the strength of evidence for each?” – replacing the dichotomous view with a formulation that is more like an options appraisal. This appears to be stronger because rather than seeking to dismiss possibilities, it takes each competing candidate seriously on its merits. In thinking about causality in epidemiology, I start from biology more broadly, and ask: given that there is now a great deal of reliable biological knowledge, how has this been possible? What conception of reality is implicit in the way biologists have approached research? And what view of methodology?2
EXISTENCE AND CAUSALITY In relation to ontology, biologists act as if the following were true: objective reality is assumed – a tree exists whether or not there is anyone to observe it; and objects existed before there were persons or other organisms that were capable of observing them3. Furthermore, objects not only exist but also interact, in other
1 Hill (1965)
2 Biologists typically do not specify what is meant by causation in a way that would satisfy a philosopher, so that we are dealing with their implicit usage. Most existing philosophical theories of causation are based on examples from physics, cf. Russell (1948); Salmon (1984); Dowe (2000). It may be possible to generate a rigorous theory of causation based on biological thinking, cf. Cartwright (2004), but this is beyond the scope of the present paper.
3 An “object” here can refer not only to a whole organism (a tree) but also to part of an organism, e.g. a leaf or a cell membrane, and in population biology to groups of organisms; it can also refer to something external that affects the survival and/or propagation of an organism, for example a foodstuff.
words the real world contains causal relationships. An example would be: sunlight + tree ⇒ photosynthesis (among other things). Thus they take not only existence but also causality as primitives. Generically, a causal relationship is one that has a mechanism that by its operation makes a difference. Epistemologically there may be less consensus, but a representative view may be the following: 1. we can only observe reality indirectly, through our perceptions and concepts; 2. humans are predisposed to see things in terms of causes – especially agency4, a tendency that needs to be unlearnt in order to do science – we have evolved hard-wired proclivities to do this; 3. causal perceptions are not necessarily correct; 4. to try and make them more likely to be true we have two basic approaches to the process of discovery, which are complementary: • uncovering how the causal relationship operates: “mechanism” • detecting what difference the cause makes to the effect: “difference making” Typically, biologists do not attempt to characterise the abstract properties either of mechanism or of difference making as a philosopher would. Rather, they seek concretely either to understand the former or to detect and characterise the latter. If pressed, no doubt time order would be an abstract property that would be readily agreed to, and very likely also proximity. That mechanism and difference making are separate, and are not mutually inter-convertible, is widely recognised. In philosophy, a notable example is Cartwright: “no causes in, no causes out”; knowledge of causes cannot be obtained from equations and associations alone, a view that is based on an analysis of physics rather than biology.5 A different expression of the same idea is that due to Pearl, who phrases it in terms of two separate and incommensurable languages, one causal and the other statistical – or better, associational.6 As he says, ‘the symbolic representation for the relation “symptoms cause disease” is distinct from … [that] for “symptoms are associated with disease”’. Neither mechanism nor difference making is sufficient on its own. Evidence of a stable association is unconvincing as a causal explanation unless there is at least some plausible way that the cause could operate. Yellow, nicotine-stained fingers would be unlikely as a cause of lung cancer. Conversely, mechanistic evidence on the liability of hot tarry smoke to cause lung tumours when inhaled does not neces4
4 Agency is not dealt with in this paper; it requires consideration of issues that go beyond the type of causation that is involved in biology (apart from in the study of animal or human behaviour). I therefore avoid illustrative examples both of purposeful actions and of those based on human creations e.g. a light switch.
5 Cartwright (1989).
6 Pearl (2000); Pearl (2002).
sarily mean that in practice the smoking of cigarettes will have this effect. There could e.g. be insufficient quantities of the chemicals involved, so complementary evidence of the difference-making type is indispensable. Furthermore, differencemaking evidence is also necessary to indicate the size of the resulting effect, in other words to calibrate difference-making with mechanism. Either mechanism or difference making can come first. For any particular research question, the ideal is to have evidence on both, but it is possible to have just one, at least transiently. In such a situation, the following would respectively apply: • a causal mechanism that exists potentially, but is not necessarily operating in practice; • “black-box” difference-making – a change in the causal variable makes a detectable difference to the caused variable, but how this is brought about is not known. Type or token? When biologists study causal relationships, they are almost always concerned with repeatable processes rather than cases, in other words, type not token. Ontologically, it is a matter of common observation that repeatable sequences can occur (type); one cannot exclude the possibility that in addition, some causal relationships are one-off sequences (token). However, epistemologically, the latter are not amenable to study. In assessing one-off sequences for their evidential weight, e.g. in a court case, we have to use reasoning that is derived from repeated experience about how the world works – whether commonsense or more formal. In science, reasoning is in terms of general, repeatable processes. Not only is replication a key pragmatic element, but also generalisability (applicability in a wide variety of different situations) is regarded as a positive feature in a scientific theory. For example in the mid-nineteenth century, major breakthroughs occurred when it was realised that all living organisms are composed of cells; and, all cells originate from other cells. Even when the aim is to explain a one-off event such as the origin of life on Earth, of the first cell, or of the first multi-cellular organism, this has to be in terms of the repeatable known processes of chemistry and physics. Endogenous and exogenous causation When causal relationships are part of a self-perpetuating system, it is convenient to call them endogenous causal relationships. The pioneer of system dynamics, Jay Forrester, has established that systems of interacting causes characterised by feedback relations tend to develop their own distinctive dynamic behaviour,7 and they are typically rather insensitive to initial conditions8; in addition they may be 7 8
Forrester (1970); Lane (2007). In this respect they are very different from chaotic systems.
influenced by exogenous factors. Simpler systems that lack feedback have only exogenous causation. In biology, evolution has created organisms that have the properties that Forrester described: they have self-perpetuating causal loops as well as negative feedback. This “systemness” is the causal aspect of the well-recognised feature of organisms, that they have the appearance of having been designed. As an example of a self-perpetuating causal loop, a mammal remains alive if and only if its respiration, blood circulation and hindbrain maintain the following causal relationships:
(Diagram: a self-perpetuating causal loop linking respiratory movements, oxygenation (assuming an intact lung, clear airway, etc.), the heart and circulation, perfusion, the hindbrain, and the nerve impulses that drive the respiratory movements.)
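The dynamical signature of such loops – Forrester’s point above that feedback systems settle into their own characteristic behaviour and are rather insensitive to initial conditions – can be illustrated with a deliberately crude simulation. The regulated quantity, the set point and the gain below are invented values, not a model of any real physiological mechanism.

# Python sketch of a negative-feedback loop: a regulated quantity is corrected
# in proportion to its deviation from a set point.  All numbers are toy values.
def simulate(initial, set_point=37.0, gain=0.3, steps=30):
    x = initial
    for _ in range(steps):
        x = x + gain * (set_point - x)   # correction proportional to the current error
    return x

for start in (30.0, 37.0, 44.0):
    print(start, "->", round(simulate(start), 2))   # all three runs end very close to 37.0

Trajectories starting far apart converge on the same stable behaviour, which is what distinguishes this endogenous, feedback-governed causation from the exogenous causation studied in epidemiology.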
In addition, the interior of the organism (the “milieu intérieur”) is maintained in a rather stable condition by the operation of numerous negative feedback mechanisms that maintain an optimal concentration of ions such as sodium, hormones such as cortisol, and (in mammals and birds) core temperature. The study of how organisms work – physiology, biochemistry, cell biology, genetics, etc – is essentially the study of these endogenous causal relationships9. In contrast, epidemiology is concerned with causal relationships that are not part of a self-perpetuating system, e.g. the effect of cotton dust on workers’ lungs, or the effect of a medical treatment such as a pharmaceutical agent. Even when a self-perpetuating system is involved, the focus is on what affects its functioning from outside, e.g. a drug to improve cardiac performance or a risk factor for heart disease – so, “exogenous” causation. Epidemiology sets out to identify such causal relationships using a difference-making approach, and also to estimate their strength. Understanding the causal process from a mechanistic viewpoint, in terms of the underlying chemistry, physics and microbiology, is complementary to this. Stochastic and multiple causation In physics (outside the quantum world), causation is typically deterministic. In such circumstances, difference making can be expressed using equations without stochastic terms, and these usually correspond very accurately with measurements. The study of the physico-chemical processes operating within the organism 9
There are also non-biological examples of endogenous causal processes, for example the price mechanism in economics.
draws on this, albeit with a lower degree of regularity so that physics-style equations are less prominent.10 Outside this relatively predictable realm, the degree of regularity is lower still, and biology tends to lack one-to-one relationships between cause and effect. In many situations, exposure to the causal factor produces the effect only in a proportion of cases, so that causation is probabilistic or stochastic. The causal effect is merely a tendency. Coupled with this, other factors may affect the same outcome, so that causation is multiple. Thus, among long-term heavy cigarette smokers, only a small proportion develop lung cancer; other factors (mainly unknown) are also required. Similarly, other exposures such as radon and asbestos can cause lung cancer, and fruit and vegetable consumption tends to protect against it. Discovering causal effects under these conditions involves demonstrating that the putative causal factor is associated with the outcome, and that this is brought about by a causal process from the former to the latter. “Associated” here means that the probability that they occur together is found to be significantly different from that which would occur purely by coincidence. If this is established, whether or not the association is causal is the issue that Bradford Hill addressed in 1965, to which we will soon return. This strategy depends on there being a degree of regularity in the outcome variable. A classic study which was based on the observation that some population rates are relatively stable comes from sociology rather than from biology, but is applicable also to epidemiology. In his 1897 work on suicide,11 Émile Durkheim noted: • the psychological process (mechanism) occurs at the individual level; • it occurs at a roughly constant rate: the number is fairly predictable in a large social group (e.g. a country), but the identity of who commits suicide is not; • the rates of different variables show differing degrees of constancy/variation; this also depends on the unit of time measurement, e.g. annual or multi-annual; • this constancy is not permanent, because the rate can be affected by external factors; • the rate varies between different social groups – i.e. between-group differences in rate are informative; • how the particular level of the rate, or tendency, is set is obscure. Taking these together, they imply that a stochastic cause operates at the individual level, but is only detectable at a group level. This introduces a dissociation between the mechanistic and difference-making aspects. Epidemiological examples would be: each death on the roads has its own local causes, and we cannot predict such events, but we can predict the approximate number of annual road deaths in any particular country. The same applies to the number of lung cancer cases. We 10 They do exist, e.g. the physics underlying the control of muscular movement, or the pressure relationships in the cardiovascular system. 11 Durkheim (1897/1952).
may also be interested in what additional factors (including deliberate interventions) influence this number, which could be road design or speed limits in the first case, and asbestos exposure or fruit and vegetable consumption in the second. The question of levels is also relevant to the feasibility of discovery using the difference-making strategy, because this depends on the heterogeneity (amount of variation) present. Thus, as Rose pointed out, if everyone in a particular population smoked, it would be impossible to compare the health risks of smokers and non-smokers; uniformity of this kind exists in e.g. salt and fat consumption in developed countries, so that their health consequences require between-population studies to detect them.12, 13 Epidemiological studies are of two major types: those that involve intervention by the investigator, and those that do not and are therefore limited to observation14. Biology in general is largely observational, from van Leeuwenhoek’s discovery of a previously unsuspected world of invisible organisms using the newly-developed microscope, through Darwin’s painstaking work on the subtle variations within and between species that gave rise to his evolutionary theory, up to today’s research, for example using microarrays to study gene expression. In biology as applied to medicine, the natural history of the disease under study forms the basis for any more sophisticated research, including by intervention studies. For example, it is important when studying cancer to recognise that it can take two or more decades to develop, and is often present and having biological effects before it is diagnosed. Again, Down syndrome is determined before conception, so factors occurring during pregnancy cannot affect its occurrence (although they could affect survival to term).
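Before turning to study designs, the Durkheim-style point above – that a stochastic cause operating at the individual level is detectable only as a roughly stable rate at the group level – can be made vivid with a toy simulation. The individual risk and the population size are assumed numbers with no empirical significance.

# Python sketch: unpredictable individual events, predictable group totals.
import random

random.seed(0)
individual_risk = 0.001   # assumed annual risk for any one person
population = 1_000_000    # assumed population size

for year in range(5):
    events = sum(random.random() < individual_risk for _ in range(population))
    print("year", year, "events", events)   # totals cluster around population * risk = 1000

Which individuals are affected varies from run to run, but the annual totals barely move – this is the sense in which rates, rather than individual cases, carry the difference-making signal.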
NON-RANDOMISED EPIDEMIOLOGICAL STUDIES Individual types of study design and the body of interlocking evidence In the mid-nineteenth century John Snow discovered that cholera was water-borne, using observational methods. He mapped cases in an outbreak in Soho, London, and found them to be clustered around a particular pump15; and in another study he established that the proportion of cholera cases was far higher in households supplied by one water company than another. This was before the germ theory of dis12 Rose (1985/2001); Schwartz & Diez-Roux (2001). 13 His distinction of “causes of incidence”, based on such between-population comparisons, from “causes of cases” which are individual susceptibility factors within a population, is a misnomer. They are not separate types of cause, ontologically – it is the feasibility of their discovery that differs. 14 Mathematical modelling also can play an important role; this has so far been best developed in the study of infectious diseases, cf. Anderson & May (1991). 15 That he stopped the outbreak by removing the pump handle is harder to establish, as the number of cases had already started falling by the time he did that.
ease, so was purely difference-making evidence. Sewer construction followed,16 and outbreaks of cholera (and certain other serious water-borne diseases) ceased. They have similarly ceased throughout the world wherever effective separation of piped water and sewage has been achieved. Despite this undoubted success story, both scientifically and practically, the view has become widespread that reliable causal knowledge is in principle only obtainable from intervention studies, and in particular by using randomised controlled trials (RCTs). The pros and cons of this argument are beyond the scope of this paper. Here I simply note that RCTs have been extremely valuable in assessing the merits or not of medical and surgical interventions, but that little has been discovered using this methodology that has wider or deeper scientific value. The location of the discussion within medicine accentuates the pragmatic at the expense of a stress on deeper understanding. Yet even here, non-RCT evidence has been highly successful: in addition to the cholera example, the discovery of the harmful effects of tobacco has led to action that has prevented an enormous quantity of disease, suffering and death. And tobacco is not alone: asbestos and countless other harmful substances have been incriminated using epidemiology of an observational type, not RCTs. This raises a more fundamental point. In philosophical discussions, it is tempting to be ultra-sceptical, as there is a historical tradition of calling into question even the separate existence of an object such as a tree when not being observed. The scepticism may be justified when considering the merits of an individual argument taken in isolation, but science develops by bringing together evidence from a wide variety of different sources, and by building on existing evidence – which itself may initially be tentative – and producing ever more successful accounts of the world. In Hume’s day it may have been reasonable to see the daily rising of the sun as a mere conjunction of observations, but the understanding of the Solar System has since progressed to a point where the prediction that it will rise tomorrow is no longer based only on past observations. The failure of the sun to rise would have to be understood in terms compatible with modern understanding, perhaps of an asteroid impact. Such an impending cataclysm would be predicted more than a day before it occurred. In the absence of an approaching asteroid or other such threat, it is possible to refer to absolute certainty that the sun will rise tomorrow.17 It is a mistake to regard scientific understanding as if it were a collection of such isolated, uncorroborated statements. An epidemiological example would be that it is known that malaria is transmitted by a certain genus of mosquito, which 16 Although not causally: the parliamentary decision to authorise construction was probably more attributable to the smell of sewage in the Thames near the Houses of Parliament. 17 Hume’s general point is not negated by this argument, as there are many other possible examples that could be used to illustrate it. In due course, some of these may also be sufficiently well understood to graduate from the status of mere repeated observation to that of systematic understanding.
breeds in stagnant fresh water. That these two pieces of information, obtained from non-intervention studies, are true is confirmed by practical corroboration: malaria declines when nearby ponds are drained. Despite such well-known examples, the ultra-sceptical view is influential – leading epidemiologists Rothman and Greenland prominently state: “Philosophers agree that causal propositions cannot be proved, and find flaws or practical limitations in all philosophies of causal inference”.18 It is important to get the methodological argument straight, not only to correct this erroneous view, but more importantly to provide insights that could improve the quality of research. The case against non-randomised epidemiological studies is based on the claim that causation cannot be reliably inferred from them. This is based on critiques conducted at the level of the individual study, or type of study, not at the level of all the interlocking pieces of evidence taken together.19 It is like taking Hume’s example at face value and ignoring the science of asteroids. Any particular type of study is a means of discovery, as stated above. The findings are typically not conclusive in themselves. Where different designs agree, the conclusions are strengthened. Thus case control studies, being cheaper and quicker than cohort studies, are often carried out first, but their limitations (e.g. retrospectively recalled exposures in the context of severe illness) mean that the findings are tentative in status. If corroborated in subsequent cohort studies, the evidence is more reliable. But certainty is more difficult to attain; as illustrated by the cholera and malaria examples, the keystone is provided by repeated confirmatory intervention20. The initial findings of Snow’s two pioneering studies did not have the status of confirmed truth at the time he published them, but subsequent events proved them to have been true, even by the end of the nineteenth century. The discovery in 1883 of the mechanistic cause of cholera (the water-borne Vibrio cholera bacillus) completed the basic causal picture, but Snow’s vindication would have been complete even without that. Inferring causation from association In this discussion, in addition to Bradford Hill, I draw on the work of Judea Pearl and others who have developed the causal diagram approach. I start from the same point as Bradford Hill, a robust statistical association, and ask, apart from direct causation from the putative causal factor to the outcome, what other interpretations are possible, and what is the strength of evidence for each? An important insight is that if a statistical association exists, e.g. between a 18 Rothman & Greenland (2005). 19 Prominent examples include Papineau, Cartwright and Pearl, cf. Worrall (2007). 20 Possibly but not necessarily using a randomised design. The types of intervention that occurred in the cholera and malaria examples were at the ecological or group, not the individual, level – despite the article of faith among many epidemiologists that only the individual level is really reliable, because of the ecological fallacy.
suggested causal variable X and a suggested effect Y, the number of ways that this can be brought about, other than random error, is limited21:
1) X has a causal effect on Y;
2) Y has a causal effect on X;
3) X and Y have a common cause (“ancestor”), which we can call C – this is “confounding” (although its definition can vary);
4) X and Y have a common consequence (“descendant”), which we can call C – this is sometimes known as “Berksonian bias”, which typically involves how the research sample came to be selected.
It is possible, and not unusual, for more than one of these four types to coexist. In case 3)22 there can be more than one confounder – in fact this is the rule rather than the exception. Also the variable denoted here by C can in fact be a group of inter-related variables, a pathway or a network of pathways, with the overall effect being a common cause. There is a quantitative aspect too: when an association is brought about in this indirect fashion, it is weaker than either of the associations that represent the two causal relationships (C causes X, C causes Y). The great geneticist and statistician R. A. Fisher made the case that there are “innumerable” possible unknown causes that could bring about an association between X and Y without X actually causing Y. This depends on there being innumerable examples of C of either type. Not only that, they have to be strong enough to bring about the observed association despite the attenuating effect of indirect causal pathways. This means that Fisher’s case has to be that a large number of possible large-enough causes may interfere with causal inference. If the strength of association is great enough, this becomes highly unlikely: for example in the case of cigarettes and lung cancer – the association that Fisher argued was not causal – smokers are approximately ten times as likely to contract the disease as life-long non-smokers. Such an association would be extremely difficult to produce by one or more indirect causal pathways such as 3) or 4), and the idea that an unrecognised factor with these properties could underlie the association is conjecture based on theoretical considerations that are a poor guide to reality. Causal diagrams One way of addressing the question of non-causal associations is to draw up a causal diagram with all the known and possible causal pathways that could influence the association under study. For example, a diagram would clarify that having yellow-stained fingers does not cause lung cancer: the arrows would go from habitual cigarette smoker to stained fingers and to lung cancer. But more important, it has been shown that if certain simple rules are fol
21 Pearl (2000); (2002).
22 In common with the bulk of the literature, this discussion focuses on case 3). It ignores case 4), which is generally thought to be far less important; some of the same arguments could be applied here as well.
lowed, such diagrams are mathematically rigorous.23 The type of diagram that has received most attention in this context is called a Directed Acyclic Graph (DAG): “directed” meaning that the pathways have direction, which can be thought of as causal, rather than just being associations (which are undirected); and “acyclic” means that feedback is excluded. The absence of feedback is generally appropriate for depicting causal pathways for exogenous causation.24 One advantage of diagrams is that they can express causal relationships rigorously without specifying a particular functional form, whereas regression equations need to specify e.g. a linear relationship. A corollary is that with the functional form being obscured, it is possible to make false inferences. For example, if the exposure-response function for the effect of a chemical on disease has a threshold (which is generally true apart from genotoxic carcinogens), and if the process leading to the exposure cannot produce a concentration in excess of this level, the diagram will suggest a transmission of causation that does not exist in actuality. More generally, diagrams usually depict variants of monotonic functions, but do not specify which one.25 Bradford Hill’s viewpoints We now come to review the classic 1965 paper from the perspective outlined hitherto. Taking each in turn, although not in the original order, most of them can be seen to flow from the basic principles outlined above.
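The quantitative point above – that an association produced indirectly by a common cause (case 3) is weaker than either of the causal associations that generate it – can be checked with a small calculation. All probabilities below are invented for illustration; X has, by construction, no causal effect on Y.

# Python sketch: a confounder C causes both X and Y; X does not cause Y.
# The X-Y relative risk induced by C is weaker than the C-Y relative risk.
p_c = 0.3                              # assumed prevalence of the confounder
p_x_given_c = {True: 0.8, False: 0.1}  # assumed P(X=1 | C)
p_y_given_c = {True: 0.4, False: 0.05} # assumed P(Y=1 | C); note Y ignores X

def joint(c, x, y):
    pc = p_c if c else 1 - p_c
    px = p_x_given_c[c] if x else 1 - p_x_given_c[c]
    py = p_y_given_c[c] if y else 1 - p_y_given_c[c]
    return pc * px * py

def p_y_given_x(x):
    numerator = sum(joint(c, x, True) for c in (True, False))
    denominator = sum(joint(c, x, y) for c in (True, False) for y in (True, False))
    return numerator / denominator

induced_rr = p_y_given_x(True) / p_y_given_x(False)
direct_rr = p_y_given_c[True] / p_y_given_c[False]
print(round(induced_rr, 2), "(X-Y) versus", round(direct_rr, 2), "(C-Y)")  # roughly 4 versus 8

Producing a ten-fold relative risk purely through such an indirect pathway would require a confounder that is both strongly associated with the exposure and enormously predictive of the outcome, which is the force of the reply to Fisher above.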
Bradford Hill’s viewpoints 1. strength of association 2. consistency – different persons, places, circumstances, times 3. specificity – relation of agent to disease 4. temporality 5. biological gradient (dose-response) 6. plausibility – in relation to mechanism 7. coherence with natural history 8. experiment (preventive action) 9. analogy – of suggested mechanism with other similar ones
23 Pearl (2000); Pearl (2002); VanderWeele et al. (2008). 24 Although there are situation where it can occur, e.g. “risk compensation”: an example is a road with a dangerous bend; when straightened the road may appear to be safer, but drivers may compensate by increasing their speed, thereby restoring at least some of the risk, cf. Adams (1995). This is a type of negative feedback. 25 Exceptions to this are worthy of publication. An example is the “dose-response fallacy” in human reproduction, cf. Selevan & Lemasters (1987), based on the biological reasoning that whereas the effects of a low dose of an embryonic toxin will be detectable e.g. as a congenital anomaly, a higher dose is likely to kill the embryo, leading to a miscarriage, which is far harder to detect.
4. temporality: the cause precedes the effect. This is a central feature of causality, which biologists take as given (backward and simultaneous causation do not feature in biology). It follows naturally from “a causal relationship is one that has a mechanism that by its operation makes a difference”, because “making a difference” implies that “after” is different from “before”. In most epidemiological situations the time order of the variables is clear, but it was included in Bradford Hill’s paper because sometimes there is uncertainty which factor comes first, for example does obesity predispose to type II diabetes, or vice versa? Does weight loss lead to stomach cancer, or is the cancer already present and unrecognised, and causing the weight loss? 6. plausibility: this follow directly from there being two basic approaches to the process of discovery. As epidemiology takes the difference-making route, a complete explanation also requires a complementary mechanistic understanding. For example, the observation that cigarette smoking approximately doubled the risk of ischaemic heart disease was not only the first clue to this now-established causal relationship, but at the time was regarded as lacking biological plausibility (a not-unusual initial response of laboratory scientists), until the biochemical pathways involved were established. 2. consistency: the focus on type rather than token causation implies that the cause-effect relationship will be detectable under a wide variety of conditions26. However, the possibility of effect modification means that this may not be completely uniform, as the causal effect could depend on other conditions e.g. ethnic or genetic factors. Thus, the consumption of broad beans causes destruction of red blood cells in people with a genetic condition called glucose-6-phosphate dehydrogenase (“G6PD”) deficiency, a condition that is especially frequent in certain populations27. 9. analogy – of suggested mechanism with other similar ones: this viewpoint refers to the existence of a precedent, for example, knowing that congenital anomalies can be caused by rubella and by thalidomide makes more believable a hypothesis that a new agent can cause intra-uterine damage. It was given little emphasis in Bradford Hill’s paper. It suggests a view of biological causation in which patterns recur. It does not relate directly to the view of causation put forward in this paper. 3. specificity – relation of agent to disease: outside the realm of infections, where e.g. the measles virus and the leprosy bacillus cause diseases with specific characteristics, there is no a priori reason to suppose that cigarettes, say, will only cause one type of cancer (we know this is untrue). Where specificity does apply it makes discovery easier, but it is now known that many risk factors can cause several distinct medical conditions. In Bradford Hill’s time this was not yet clear, but 26 Methodologically, the most important aspect of consistency may be that it is across types of study design, for reasons already stated. 27 Cappellini & Fiorelli (2008).
even then he raised this point mainly to caution against the temptation to “overemphasise the importance of the characteristic”. As he noted, multiple causation is possible, and this accords with the account of causation presented above. The DAG methodology is well suited to tracing out multiple causes and multiple effects.28 7. coherence with natural history: this fits with the emphasis above on knowledge of the natural history of disease, and more broadly on the integration of different types of evidence. It can occur at the level of individual cases or of population incidence: for the latter, Bradford Hill cites the rapid rise both of cigarette smoking and (some twenty years later) of lung cancer during the first half of the twentieth century. An example at the case level is that the known natural history of stomach cancer is that it takes many years to become manifest, and in the period before this occurs it can cause weight loss. 8. experiment (preventive action): again, this follows from the emphasis on integrating the whole body of scientific evidence rather than taking each statement in isolation so that it is open to criticism e.g. on logical grounds. Bradford Hill gave the example of a workplace in which the dust has been reduced, the question then being, does this affect the frequency of health events? Other examples have already been alluded to, e.g. cholera. 1. strength of association: the main issue here is that the larger is the relative risk, the less likely is the possibility that it is brought about by confounding. As Bradford Hill says, it is far harder to explain the ten-fold increase in lung cancer due to smoking by indirect means than it is to do the same for the two-fold increase in ischaemic heart disease. This fits neatly with the DAG approach, in which pathways are made explicit, and the resulting diagram is used as the basis for statistical analysis.29 5. biological gradient: this requires that the increase in risk tends generally to be monotonic – as the magnitude of the exposure increases, so does the level of risk. It does not follow directly from the conception of causality presented in this paper, but simply occurs because in most situations a larger input to the causal mechanism produces a larger output, in other words the difference that it makes is bigger.
CONCLUSION In this paper, I have suggested a concept of causation that appears to be implicit in the way that reliable biological knowledge has been accumulated since the midnineteenth century. In this, mechanism and difference making are not seen as different types of cause, but rather they are seen as the two types of strategy used in 28 Joffe & Mindell (2006). 29 Glymour et al (2005).
the discovery of causal relationships. A distinction is however made between exogenous causes, and endogenous ones that form part of a self-perpetuating system. In the branch of biology known as epidemiology – the difference-making approach to analysis of evidence relating to exogenous causal relationships – it is widely considered that inferring causation is especially difficult unless randomised intervention studies are carried out. I argue here that non-experimental epidemiology has demonstrably discovered evidence that is important both scientifically and practically. The confirmatory keystone is provided when the hitherto-tentative hypothesis is put to the test in the real world, and repeatedly has the expected consequences; understanding the mechanism is not essential. Furthermore, the suggested conception of causation and evidence discovery is compatible with the classic discussion of causal inference by Bradford Hill, who was writing on the basis of experience rather than attempting a systematic account. Acknowledgements I would like to thank Raffaella Campaner for helpful comments on an earlier draft of this paper.
REFERENCES John Adams, Risk. London: UCL Press, 1995. Roy M. Anderson and Robert M. May, Infectious diseases of humans: dynamics and control. Oxford: Oxford University Press, 1991. Maria Domenica Cappellini and G. Fiorelli, “Glucose-6-phosphate dehydrogenase deficiency”, in: Lancet 371, 2008, pp. 64-75. Nancy Cartwright, Nature’s capacities and their measurement, Oxford: Clarendon Press, 1989. Nancy Cartwright, “Causation: One Word, Many Things”, in: Philosophy of Science 71, 2004, pp. 805-19. Phil Dowe, Physical Causation, New York: Cambridge University Press, 2000. Émile Durkheim, Suicide (Introduction), London: Routledge & Kegan Paul Ltd, 1952 [first published in 1897, in French]. Jay Wright Forrester, “Counterintuitive behaviour of social systems” (1970), In: Jay Wright Forrester (Ed.), Collected papers of Jay W. Forrester [1975 collection], Cambridge, MA, Wright-Allen Press, pp 211-44. M. Maria Glymour, Jennifer Weuve, Lisa F. Berkman, Ichiro Kawachi, and James M. Robins, “When is baseline adjustment useful in analyses of change? An example with education and cognitive change”, in: American Journal of Epidemiology 162, 2005, pp. 267-78. Austin Bradford Hill, “The environment and disease: association or causation?”, in: Proceedings of the Royal Society of Medicine 58, 1965, pp. 295-300.
Michael Joffe and Jennifer Mindell, “Complex causal process diagrams for analyzing the health impacts of policy interventions”, in: American Journal of Public Health 96, 2006, pp. 473-79. David C. Lane, “The power of the bond between cause and effect”, in: System Dynamics Review 23, 2007, pp. 95-118. Judea Pearl, Causality: models, reasoning and inference. New York: Cambridge University Press, 2000. Judea Pearl, “Causal inference in the health sciences: a conceptual introduction” Health services and outcomes research methodology 2, 2002, pp. 189-220. Geoffrey Rose, “Sick individuals and sick populations”, in: International Journal of Epidemiology 30, 2001, pp. 427-32 [first published 1985]. Kenneth J. Rothman, Sander Greenland, “Causation and causal inference in epidemiology”, in: American Journal of Public Health 95 (Suppl.), 2005, pp. S144-50. Bertrand Russell, Human Knowledge. New York: Simon and Schuster, 1948. Wesley Salmon, Scientific Explanation and the Causal Structure of the World. Princeton: Princeton University Press, 1984. Sharon Schwartz and R. Diez-Roux, “Commentary: causes of incidence and causes of cases – a Durkheimian perspective on Rose”, in: International Journal of Epidemiology 30, 2001, pp. 435-39. Sherry G. Selevan and Grace K. Lemasters, “The dose-response fallacy in human reproductive studies of toxic exposures”, in: Journal of Occupational Medicine 29, 1987, pp. 451-54. John Worrall, “Why there’s no cause to randomize”, in: British Journal for the Philosophy of Science 58, 2007, pp. 451-88. Tyler J. VanderWeele, Miguel A. Hernán, and James M. Robins, “Causal directed acyclic graphs and the direction of unmeasured confounding bias”, in: Epidemiology 19, 2008, pp. 720-28.
Division of Epidemiology, Public Health and Primary Care Imperial College of Science, Technology, and Medicine London SW7 2AZ UK [email protected]
GERD GRAßHOFF
INFERENCES TO CAUSAL RELEVANCE FROM EXPERIMENTS
1 THEORY AND EXPERIENCE For many the juxtaposition of theory and experience is governed by the logical properties of a deductive relation. Often it is abbreviated as the “hypothetico-deductive model of science” (HD-model). “Theory” is a broad framework of general statements about natural properties and processes. Although theories and conjunctions of scientific statements may lack causal terminology in scientific papers, causal hypotheses are practically always involved. For the following arguments the only relevant aspect is that theories imply propositions that can be compared with empirical data. Here the implication is meant strictly in the sense of a deduction. The HD-model has at first sight attractive advantages. Independently of the logical internal structure of the theories, the implication determines the methodological evaluation of the theories by comparing the theory with empirical data according to a simple pattern: Thesis 1. Theory evaluation according to HD is – not exclusively – a logical consequence of the comparison between prediction and empirical data. The mechanism is the comparison of truth values for identical propositional content. The comparison function returns a truth value for the prediction.
1. Theory ∧ h1 … hi → prediction
2. Comparison (prediction == empirical data)
Theories in conjunction with additional hypotheses h1 … hi yield a prediction. The inferences are governed by the logical properties of the conditional: it is false only if the antecedent is true and the consequent is false. The empirical evaluation of theories then draws consequences from the comparison of the truth values of the predicted proposition and the empirical data. In case both propositions have the same content but differ in their truth value, one statement is judged to be false: either the empirical data contradict the prediction or the empirical data must be revised. Thesis 2. The result of Comparison (prediction, empirical data) according to HD then has consequences for the evaluation of the theory (which might be just a hypothesis plus auxiliary hypotheses). 1. In case the comparison reveals agreement between prediction and empirical data, support (theory ∧ h1 … hi ).
2. In case of disagreement, at least one proposition of (theory ˄ h1 … hi ) is false. A select(theory ˄ h1 … hi ) would be some sort of procedure providing criteria for the selection of a justified candidate.

It is a methodological consequence of the HD-model that a complex of hypotheses cannot gain support from a juxtaposition of its prediction with empirical data. There is a methodological asymmetry between refutation and confirmation – not proof – of theories. Still, in case of a contradiction between prediction and empirical data one obtains the information that at least one premise is false; logically one does not know which of the premises it is. Because of this strong inferential asymmetry, Karl Popper draws the following conclusion in the Logic of Scientific Discovery from the logical properties of the HD-model:

[A] theory of induction is superfluous. It has no function in a logic of science. … The best we can say of a hypothesis is that up to now it has been able to show its worth, and that it has been more successful than other hypotheses although, in principle, it can never be justified, verified, or even shown to be probable. This appraisal of the hypothesis relies solely upon deductive consequences (predictions) which may be drawn from the hypothesis: There is no need even to mention "induction".1
Some followed Mill's proposal of solving the problem of identifying the supported hypothesis by eliminating empirically equivalent alternatives. The empirical data could then be explained by a specific hypothesis if the alternatives could be eliminated by some kind of good reason beyond the empirical data. There are two reasons why such a procedure does not work: (i) each elimination based solely on empirical data faces the lack of specificity of modus tollendo tollens (MTT); (ii) MTT loses its power for refuting specific hypotheses for large or infinite sets of premises, as is common in scientific reasoning. If one does not want to follow Popper's radical conclusion, one needs criteria beyond the logical consequences of empirical statements. Duhem's bon sens is a vague description of such a selection, but no solution.
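The inferential asymmetry described above can be made concrete in a minimal sketch. The following code is our own illustration, not part of the original argument; the premise names and the toy prediction are hypothetical. It shows that HD agreement yields only unspecific support for the whole conjunction, while disagreement tells us merely that some premise is false, without identifying which.

```python
# Hypothetical sketch of HD-style theory evaluation (illustrative only).
# A "theory" is modelled as a list of named premises; a prediction function
# derives a claim from them; the datum either agrees or disagrees with it.

def hd_evaluate(premises, predict, datum):
    """Return what the HD comparison of truth values licenses us to conclude."""
    prediction = predict(premises)          # Theory & h1 ... hi -> prediction
    if prediction == datum:                 # comparison of propositional content
        return "support for the conjunction " + " & ".join(premises)
    # Modus tollendo tollens: at least one premise is false,
    # but the refutation does not say which one.
    return "at least one of " + ", ".join(premises) + " is false (unspecified)"

# Example: three premises jointly predict a rise in urea output.
premises = ["theory", "h1", "h2"]
predict = lambda ps: "urea output rises"
print(hd_evaluate(premises, predict, "urea output rises"))      # unspecific support
print(hd_evaluate(premises, predict, "urea output unchanged"))  # unspecific refutation
```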
2 CAUSAL ANALYSIS
Instead of the HD-model I propose a different model for the empirical evaluation of theories. The dual relation between theory and empirical data is replaced by the following:

1. The canonical form of a statement of prediction is a general proposition of causal relevance as defined in Graßhoff and May (2001): (A ˄ X) ˅ Y are causally relevant for Z.
1 Popper, Karl R. 1959. The Logic of Scientific Discovery. New York: Basic Books, p. 315.
2. More complex causal statements can be composed according to the rules of causal graphs.
3. Theory evaluation is (or should be) done by rules of causal reasoning.
4. Empirical data which enter the rules of causal reasoning have the form of difference tests, as described below.

In order to exemplify the new model of theory evaluation, the now classical case of the discovery of the urea synthesis by Hans Krebs and Kurt Henseleit serves as a model of typical experimental research employing causal reasoning.2

2.1 Causal models

In 1932 Hans Krebs and Kurt Henseleit explained the urea synthesis in animal liver by the urea cycle – the first cyclic metabolic pathway discovered in biochemistry.3 This discovery was a milestone in the history of the discipline. For his subsequent studies of a similar process, the tricarboxylic acid cycle, Hans Krebs was later awarded the Nobel Prize.

Already in the mid-19th century new analytical techniques showed that the rate of urea synthesis in living animals increased when they were fed an additional supply of glycine and leucine. Schultzen and Nencki assumed in 1869 that amino acids are intermediates in the reaction chain from proteins to urea.4 The introduction of the perfusion method marked an essential refinement of the experimental procedures: reagents are guided through an organ outside the living organism, and the chemical composition of the outflowing liquid is determined. In this way it was found that not only glycine and leucine but almost all known proteins and amino acids increase urea production in the liver. Until the 1920s researchers attempted to optimize the perfusion method, yet they did not succeed in deciphering the details of the chemical reactions leading to the formation of urea.

At this time Hans Krebs was working as an assistant in the laboratory of Otto Warburg in Berlin. During these years he conducted basic research and acquired practical and biochemical knowledge that would play an important role in his discovery of the urea cycle. Krebs owed to Otto Warburg especially the adoption of the tissue slice method and the use of manometric devices for sensitive measurements of small amounts of substances. In summer 1931, after Krebs had moved to Freiburg, he started his research project on the urea synthesis with his doctoral student Kurt Henseleit. It lasted nearly a full year and took nearly 200 experiments until the urea cycle could be established.

Krebs was guided by some quite general principles of causal reasoning. Crucial for this kind of reasoning is a structure called a "causal graph", which is a complex
2 Various aspects of the case study: Graßhoff and May (1995a), Nickelsen and Graßhoff (2009); laboratory notebooks as facsimile and transcription in Graßhoff and Nickelsen (2001b), Graßhoff and Nickelsen (2001a).
3 Shortened from Graßhoff and May (1995b).
4 Schultzen and Nencki (1872).
network of cause-effect relationships. Knowledge of causal relationships is essential for scientific activities such as explaining, predicting and controlling natural processes. The following discussion focuses on the causal aspects. In practice, causal reasoning is of course intertwined with other considerations.
Fig. 1: A certain type of degradation path of amino acids, from Neubauer, O. (1928), p. 845.

A biochemical pathway is an instantiation of a causal graph. As an example, take the graph from figure 1. It shows the complicated paths of degradation of amino acids, according to Neubauer (1928), a standard textbook available to Hans Krebs. The types of causal relationships in this graph are manifold:

Directly causally relevant factors: alanine → pyruvic acid
The presence of alanine (Alanin) causes the presence of pyruvic acid (Brenztraubensäure). This is the basic type of a causal relation.

Causal chains: alanine → pyruvic acid → lactic acid
The path from alanine via pyruvic acid to lactic acid (Milchsäure) is a causal chain. The relevance of alanine for the formation of lactic acid is mediated by pyruvic acid. The figure contains many such pathways.

Multiple effects: acetaldehyde → acetic acid, acetaldehyde → acetone bodies
The presence of acetaldehyde has more than one effect: it causes not only the presence of acetic acid, but also that of an acetone body. Different effects of a common cause tend to occur together; this makes them diagnostically relevant for each other.

Multiple causes: alanine → pyruvic acid, oxaloacetic acid → pyruvic acid
Pyruvic acid can be formed on different pathways, e.g. from alanine as well as from oxaloacetic acid.
Fig. 2: Types of causal relationships that can be represented in a causal graph.

An additional type of relationship is the complex cause, where several substances have to be present for an effect; e.g. both hydrogen and oxygen have to be present for the formation of water. All relationships in figure 1 turn out to be complex if the graph is specified in greater detail. Some of the reactions are such that two substances have to be combined to form a third, and all reactions depend on additional factors such as temperature, pressure and concentration. A second additional type is an inhibiting factor, which inhibits an effect that would be present if the inhibiting factor were absent. An example is a toxic substance that inhibits cellular respiration. A third additional type, crucial for our case, is a cyclic process, where some specific event of type A is relevant for the occurrence of another event of the same type. Krebs' discovery was that the degradation of amino acids in mammals is a cyclic process with ornithine acting as a catalyst. A reversible reaction, e.g. between glucose and glycogen in figure 1, is also a rudimentary type of cyclic structure. Figure 2 summarizes the possible causal relationships representable in a causal graph.

A formal analysis of causal relationships has been one of the major philosophical challenges. Although causal reasoning governs most of our daily life and scientific reasoning, it is anything but clear whether one could formulate a set of rules which capture our intuitive understanding of causality. David Hume's analysis is mostly negative, and John Stuart Mill's concept was refuted early on. Even the most advanced regularity theory of causation, that of John Mackie, was abandoned by its author.5 It has been a substantial part of our research project in Hamburg and now in Bern to formulate an adequate causal theory and a theory of causal reasoning
5 Mackie (1980).
that is applicable to well documented cases of scientific reasoning.6 The graph structures which are discussed here can be given a logical interpretation.7

2.2 Theory of causal regularities

We interpret the graph structures by utilizing a variant of a causal regularity theory. Since details can be looked up elsewhere, our exposition will be brief.8 The central idea is to represent regularities in a formalism that is similar – but not reducible – to disjunctive normal form in propositional logic. A complex cause is represented by a conjunction; alternative causes are represented by a disjunction of conjunctions; a causal factor is represented by a part of a conjunction; and an inhibiting causal factor by a negation. The presence of a complex cause is assumed to be sufficient for bringing about the effect. 'To bring about' implies an asymmetry that cannot be analyzed in terms of the material conditional alone (we will give an operational definition below).

Our knowledge of a complex set of regularities is often highly incomplete: what we know are small fragments of only some complex causes of a causal network. How can we use incomplete knowledge and even partially wrong beliefs for explaining and predicting events? How can we revise an incorrect hypothesis? How can we generate a better one?

The proposed account is based on two main ideas: uniformity and relevance. Uniformity is reflected in the representation just described: a complex cause corresponds to a conjunction ("and"); alternative causes to a disjunction of conjunctions ("or"); a causal factor to a part of a conjunction; and an inhibiting causal factor to a negation ("not"). The idea of relevance is implemented via the notion of minimality. Each complex cause has to be minimal, i.e., no part of it is itself sufficient. This ensures that every factor that is part of the complex cause plays an indispensable part in bringing about the effect. A complex cause is also called a minimal sufficient condition. Note that a causal factor is normally neither sufficient (since it is only a part of a complex cause) nor necessary (since there may be complex causes of which it is not a part) for the effect. Moreover, the disjunction of the minimal sufficient conditions is also required to be minimal: no part of it is necessary. This prevents redundant complex causes. We call a minimal necessary disjunction of minimal sufficient conjunctions a minimal theory. The presence of a complex cause is assumed to be sufficient for bringing about the effect, and the effect does not occur without one of its causes.
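To make the structure of a minimal theory concrete, the following sketch is offered as our own illustration (it is not Graßhoff and May's implementation; the encoding of factors as strings is a hypothetical choice). It represents a minimal theory as a disjunction of conjunctions of literals, with negated literals standing for inhibiting factors, and checks whether a given situation realizes at least one complex cause.

```python
# Illustrative sketch only: a minimal theory modelled as a list of complex
# causes; each complex cause is a frozenset of literals such as "A" (factor
# present) or "~G" (inhibiting factor absent). The regularity (R) from the
# example below, (ACDE) v (B~GH) v (ED~F) v (GHK) => W, is used as data.

MINIMAL_THEORY = [
    frozenset({"A", "C", "D", "E"}),
    frozenset({"B", "~G", "H"}),
    frozenset({"E", "D", "~F"}),
    frozenset({"G", "H", "K"}),
]

def literal_holds(literal, situation):
    """A literal '~X' holds iff X is absent; a literal 'X' holds iff X is present."""
    if literal.startswith("~"):
        return literal[1:] not in situation
    return literal in situation

def effect_occurs(theory, situation):
    """The effect occurs iff at least one complex cause is fully realized."""
    return any(all(literal_holds(l, situation) for l in cause) for cause in theory)

# Example: {A, C, D, E, H} realizes the first complex cause; {A, C, D} does not.
print(effect_occurs(MINIMAL_THEORY, {"A", "C", "D", "E", "H"}))  # True
print(effect_occurs(MINIMAL_THEORY, {"A", "C", "D"}))            # False
```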
6 The course book developed and used in Bern is Graßhoff and Baumgartner (2003). Recent studies: Baumgartner (2008b), Baumgartner (2008a).
7 Graßhoff and May (2001), Graßhoff and May (1995a). Michael May developed the set of rules for causal reasoning described in this paper, May (1999).
8 Graßhoff and Baumgartner (2003), Graßhoff and May (2003), Graßhoff and May (2001), May (1999).
This formal representation is very suitable for a computational representation and treatment of causal reasoning.9 As noted, our knowledge of a complex set of regularities is often highly incomplete: what we know are small fragments of only some complex causes of a causal network. Justifying the causal relevance of those partial structures we have found is arguably more important than identifying fully sufficient causes.

2.3 Principles of causal reasoning

2.3.1 Method of difference

The basic test for causal relevance of a factor is intimately tied to a procedure called the 'method of difference'. By tying the test for causal relevance to this procedure we get a straightforward operational definition for the notion of 'bringing about': a factor C brings about an effect E if (1) there are two situations S1 and S2 which are causally homogeneous except that C is present in S1 and absent in S2, and (2) E follows in S1 but not in S2. John Stuart Mill already studied this method in 1843. And it is indeed the method scientists actually employ in experiments – in laboratory notebooks we literally find hundreds of experiments that instantiate it. Mackie (1980) reconstructed the logic underlying Mill's methods; Michael May took this as the starting point to extend the basic method for handling much more complex types of experimental design.

This section introduces the basic inference pattern for inferring a positively relevant causal factor. First, we discuss the premises of an inference; then we show in an example that an application of the inference pattern leads to a deductively valid conclusion; finally we briefly discuss more complex inference patterns.

2.3.2 Assumptions

The key to the proposal is to understand the generation of causal hypotheses as an iterative strategy. Testing does not proceed by comparing the predictions of a complex causal graph against all the relevant empirical data at once.

Thesis 3. Empirical validation (and generation) of complex causal hypotheses is a matter of an iterative process of causal reasoning.

An inference step to a causally relevant factor of a more complex causal hypothesis is based on three assumptions:
9 Formally, the underlying structure is a directed and-or graph, where nodes represent types of events that can be instantiated by event tokens. A cycle corresponds to a sequence that contains type-identical but spatio-temporally distinguishable event tokens.
(i) A complex cause is sufficient for bringing about the effect, and if no complex cause is realized in a situation, the effect does not occur;
(ii) the test situation is causally homogeneous;
(iii) there is an initial hypothesis.

The first assumption is part of the concept of a deterministic causal regularity. The second assumption is discussed below. An initial hypothesis (assumption (iii)) can have different levels of specificity. At one extreme there is the bare assumption that some cause exists for the phenomenon to be explained – no specific factor is mentioned. At the other extreme is a hypothesis already specifying several minimal sufficient conditions. The discovery system starts with a hypothesis of the first kind and ends, if successful, with a justified hypothesis of the second kind.

2.3.3 Inferring a causal factor

We will assume that the initial hypothesis already specifies some factors. This is more complex than the most basic case, which can be handled by the simple method of difference. First, we assume that a factor A, together with an unknown conjunction of factors x1, is a complex cause for an effect W. Second, we assume that a factor B, together with a second unknown conjunction of factors x2, is a second complex cause for W. Third, we assume that there is an unknown disjunction y of additional minimal sufficient conditions. This is abbreviated

(H0) (Ax1) ˅ (Bx2) ˅ y ⇒ W.

This is called an incomplete hypothesis or causal fragment. The unknown complete regularity is

(R) (ACDE) ˅ (B¬GH) ˅ (ED¬F) ˅ (GHK) ⇒ W

('¬B' means 'not B'). To test the causal relevance of a factor C for the condition (Ax1), the following experiment is designed: factors A and C are crossed in a factorial design, while factor B, being an alternative intervening cause, is inhibited. A '1' indicates that the effect W is present, a '0' that it is absent.
T1       A¬B    ¬A¬B
C         1      0
¬C        0      0

Table 1: Test situation T1

The assumption of causal homogeneity puts a constraint on every causally relevant factor not varied explicitly: if this factor is instantiated in one cell (one of several corresponding test situations in an experiment), it is instantiated in every cell.
This premise has to be justified separately for every single case; there is no a priori guarantee of its correctness. Often we can justify this assumption even if we do not know the relevant factors; e.g., if we stir a solution carefully and distribute it among several vessels, we have good reason to suppose the contents of all vessels to be causally homogeneous – even if we do not know exactly what is in the vessels. In our example the assumption of causal homogeneity says that each of the factors D, E, G, H, F, K is instantiated in every cell or in none. Let us assume that the factors DEGHF¬K are instantiated.

From the result of the experiment, the initial hypothesis (H0) and assumptions (i) and (ii), we can infer deductively that C is a part of the complex cause (Ax1).10 This argument justifies a minimal expansion of (H0) to

(H1) (ACx1) ˅ (Bx2) ˅ y ⇒ W.

Proof:
(1) A minimal sufficient condition is realized in cell 1 (AC), but not in cell 2 (A¬C), since the effect W is present in 1 but not in 2.
(2) Since cells 1 and 2 are causally homogeneous except for C, every minimal sufficient condition that does not contain C or its negation is present in both cells or absent in both cells.
(3) Since no minimal sufficient condition is realized in 2, a fortiori no condition is realized there that does not contain C or its negation.
(4) It follows that no condition not containing C or its negation is realized in 1 either.
(5) But since the effect is present in 1, a cause must be present. It follows from (4) that every cause that is present must contain C or its negation – and it must contain C unnegated, since C is instantiated in 1.
(6) The same kind of argument applies to cells 1 and 3 (¬AC). The only difference is in A; we can infer that no cause is present in 1 that does not contain A.
(7) Since there is no complex cause present in 1 that does not contain A and none that does not contain C, all causes present in 1 must contain AC (note that this does not preclude more than one cause being present at the same time).

This procedure can be iterated: in the next cycle we take (H1) as our initial hypothesis and examine a further candidate factor. There are additional rules (which we have no space to discuss here) for inferring inhibiting factors, alternative causes, and factors without relevance in specific minimal sufficient conditions. These inference patterns are the backbone of the discovery program. The method possesses a high degree of correctness and completeness that largely stems from its ability to eliminate alternative hypotheses. It differs from many methods in not depending on numerical values representing heuristic criteria such
as coherence, plausibility, or simplicity. The precise meaning of these values and the scope of their applicability often remain unclear.

10 For a proof, see May (1999); compare also Mackie (1980).
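A minimal computational sketch of this inference step may help fix ideas. The code below is our own illustration, not the discovery program referred to above; the data encoding is hypothetical. It records the 2×2 design of Table 1 and checks the pattern under which, given homogeneity and the initial hypothesis (H0), the candidate factor C may be added to the complex cause containing A.

```python
# Illustrative sketch only: inferring that candidate factor C belongs to the
# complex cause containing A, from the factorial design of Table 1.
# Each cell is (factors present, effect observed); B is inhibited throughout.

cells = {
    1: ({"A", "C"}, True),    # A¬B with C   -> W
    2: ({"A"}, False),        # A¬B with ¬C  -> no W
    3: ({"C"}, False),        # ¬A¬B with C  -> no W
    4: (set(), False),        # ¬A¬B with ¬C -> no W
}

def differs_only_in(cell_a, cell_b, factor):
    """True if the two cells differ only in the presence of `factor`
    (causal homogeneity in all factors not varied explicitly is assumed)."""
    fa, fb = cells[cell_a][0], cells[cell_b][0]
    return fa.symmetric_difference(fb) == {factor}

def supports_expansion(candidate, hypothesized_factor):
    """Difference-test pattern: the effect appears only when both the
    candidate and the already hypothesized factor are present."""
    return (differs_only_in(1, 2, candidate) and cells[1][1] and not cells[2][1]
            and differs_only_in(1, 3, hypothesized_factor) and not cells[3][1])

# If this holds (and homogeneity and H0 are granted), (Ax1) may be expanded to (ACx1).
print(supports_expansion("C", "A"))  # True
```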
2.3.4 More complex designs
If one iterates this procedure, the experimental design demands an additional row for every newly discovered factor, in order to eliminate all alternative hypotheses. For testing the relevance of factor D we would have to conduct the following experiment:
T2        D     ¬D
AC¬B      1      0
¬AC¬B     0      0
A¬C¬B     0      0

Table 2: Test situation T2

From the experimental result we can again infer deductively
(H2) (ACDx1) ˅ (Bx2) ˅ y ⇒ W.
Without the third line, this inference would not have been valid. There is a possibility that D is not a part of AC: e.g., if the complete regularity were (R2) (AD) ˅ (ACG) ⇒ W, with G being constantly absent, we would get the same result in cells 1–4. The effect in cell 1 would then be due to AD, not to ACG; there would be no complex cause containing ACD, so our conclusion would be wrong. But adding the third line rules this out: if (R2) or a similar hypothesis were true, then the effect would have been present in cell 5 (since the cause AD is realized there), contrary to the observed result. By adding this line we are able to eliminate a whole class of alternative explanations for the experimental result.
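The role of the third row can be checked mechanically. The following sketch is again our own illustration in the same style as the earlier snippets (the encoding is hypothetical): it shows that the alternative regularity (R2) reproduces cells 1–4 but wrongly predicts the effect in cell 5, and is therefore excluded by the extended design.

```python
# Illustrative check only: does (R2) = (AD) v (ACG) => W survive Table 2?

R2 = [frozenset({"A", "D"}), frozenset({"A", "C", "G"})]

def effect_occurs(theory, situation):
    # The effect occurs iff some complex cause is fully realized.
    return any(cause <= situation for cause in theory)

# Observed outcomes of Table 2 (B inhibited, G assumed constantly absent).
table2 = {
    1: ({"A", "C", "D"}, True),   # AC¬B, D
    2: ({"A", "C"}, False),       # AC¬B, ¬D
    3: ({"C", "D"}, False),       # ¬AC¬B, D
    4: ({"C"}, False),            # ¬AC¬B, ¬D
    5: ({"A", "D"}, False),       # A¬C¬B, D  <- the crucial third row
    6: ({"A"}, False),            # A¬C¬B, ¬D
}

for cell, (situation, observed) in table2.items():
    predicted = effect_occurs(R2, situation)
    if predicted != observed:
        print(f"cell {cell}: (R2) predicts {predicted}, observed {observed}")
# Output: cell 5: (R2) predicts True, observed False
```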
2.3.5 Other inference patterns

Depending on the result of an experiment, different inference patterns can be applied. There are additional rules (which we have no space to discuss here) for inferring
• positively relevant factors,
• inhibiting factors,
• factors that are part of alternative causes,
• factors that are relevant through a causal chain,
• factors without relevance.

The procedure discussed in the last section demonstrates how to infer single factors. How can we justify a claim that a set of factors is minimal sufficient? The main principles are the following:

• Explaining anomalies. If all known factors of the hypothesis currently investigated are present but the effect is absent, this is noted as an anomaly. The goal is to expand the hypothesis in such a way that all anomalies can be explained. This does not have to be done immediately, but can be delayed until suitable experimental data are available. If all anomalies generated in previous experiments can be explained by the causal model – and in complex systems there are a lot of them – this is good, although not logically conclusive, evidence that the hypothesis is minimal sufficient.
• Irrelevance. A claim that a hypothesis is minimal sufficient can be strengthened by proving the non-relevance of factors not contained in the hypothesis. This should not be conflated with a general causal irrelevance of factors for the generation of a specific effect, which cannot be demonstrated by difference tests. When it can be shown that all factors not contained in the hypothesis are non-relevant, the evidence is even conclusive (however, this is a limit we do not normally strive for).
• Revision. If no attempt to explain the anomalies is successful, the causal model has to be revised by dropping the hypothesis that gave rise to the anomalies. Afterwards, the model can be expanded again.

2.4 Difference tests in practice: notebook entries
Fig. 3: First page of Henseleit's laboratory entry about the ornithine experiments.
'Difference tests' and 'rules of causal reasoning' are philosophers' terminology, hardly used by scientists. Do scientists nevertheless employ these techniques? The case study exhibits the typical scientific terminology in which rules of causal reasoning are embedded.
Fig. 4: Third and last page of Henseleit's laboratory entry about the ornithine experiments.

Figures 3 and 4 show the first and last (out of three) pages of Henseleit's entry on his experiment with ornithine. These pages are typical of the style in which both researchers recorded their experimental work. An entry begins with a dated, underlined title; in most cases it mentions the name of the substances to be tested. It is followed by a characterization of the organism from which cell tissues had been taken for the experiments. A short phrase is added to describe the procedures applied to the cell material before it was placed into the experimental tube (e.g. "Schnitte ausgewaschen" – slices washed out). The next sections on the page record the specific experimental conditions such as temperature, the chemical composition of the solution, and the time during which the tissues were exposed to substances in the solution.

The lower two thirds of the page display a table structure. The table is divided into two sections: on top there are rows beginning with the amount and name of substances added to the solution (e.g. first row "Ringer Lösung", second row
one column to the right "NH4Cl"). Little arrows to the right of the name of a substance indicate that the same amount of that substance was added to the same experimental setup. Hence each column defines the specific attributes of one experimental arrangement within the comparable series of tests. The lower section of the table starts with the record of the weight of the liver tissue, relative to which the amount of urea is measured. Since Krebs and Henseleit used manometric measuring devices, they recorded differences of pressure as measurement data. Together with auxiliary data they calculated the specific amount of urea formed in the reaction tube. This value comes at the bottom of the column and is typically underlined, either by Henseleit, who recorded most of the experiments, or by Krebs.

In many cases the experimental record in the notebook is concluded with a very short summary titled "Ergebnis" ("result"). But the historian who had hoped to find here a detailed document of the ongoing thought process of the researcher will be disappointed. In most cases a result is noted, e.g. that substance A or B leads to an increase in the formation of urea, or else there is a brief conclusion, for instance that next time one should wash the tissues for a longer time.

In the case of the ornithine experiment only the last two columns exhibit an experimental setting in which ornithine together with ammonia produces much more manometric pressure – and hence urea – than all other settings, even the one with ornithine alone. This is in fact a very puzzling result, because according to the standard hypothesis with which Krebs was operating at the time, even ornithine alone should have produced urea. This experiment showed the contrary (see the fourth- and third-to-last columns in figure 4): ornithine alone did not produce urea. Further implications of the experiment are not recorded under the title "Ergebnis". But Krebs and Henseleit did draw conclusions: the notebook carefully records all experimental conditions and those of the measuring process, which led to the measured data and from which they derived the putative findings in terms of the specific formation of urea.

The experiment is set up as a causal test for factors which either are already known to exert a causal influence on the formation of urea or are tested for that role. Typically the notebook entry records such conditions. Krebs set up the experiments as a causal difference test: if two experimental situations are equal in causally relevant aspects and one additional factor exhibits an effect which is missing in the comparable situation without that factor, then Krebs rightly concludes that the factor is causally relevant. Krebs and Henseleit operated throughout their work with this methodological machinery.

The assumption that the comparable situations should be equal to each other in causally relevant aspects is crucial. The experimenter's skill is revealed in his ability to realize such a condition in an experiment. Should small differences occur, one must be able to control them and account for them. The only way to test the validity of that condition (which we call the condition of homogeneity) is to repeat an experiment under conditions which one can control, and check whether the observed effects remain
the same. If this is not the case, the condition of homogeneity is not satisfied and the experiment does not allow causal conclusions. Krebs had to learn this the hard way when he conducted experiments with thymine. He measured some increase of pressure with his manometer and jumped to the conclusion that thymine is causally relevant for the formation of urea. It took him a month's work to discover his error: he had not controlled the validity of the condition of homogeneity with sufficient care. He rectified his error by repeating the experiment right from the beginning. The notebook entries for the ornithine experiment therefore contain two columns with an identical arrangement. The result is that there are small fluctuations in the outcome, but not of the magnitude of the "ornithine effect" (i.e. the production of urea under the given conditions). Within this range of error Krebs could thus treat his experimental procedures as causally equal.
3 METHODOLOGY OF CAUSAL MODELS
Instead of using HD-models for inferring causal hypotheses, difference tests in experiments provide the best reasons for inferring causal hypotheses. The importance of the material realization of experimental test situations for inferring causal regularities cannot be overstated: only in the actual material realization of an experiment does an effect occur because a sufficient set of factors is given. This also counters objections against the regularity theory of causation. These are variations of an argument which runs, in its simplest form, as follows: given is a complete coincidence of three factors A, B, C – either all three factors are given or all three are not instantiated. Different causal structures might be responsible for such coincidences: A could be a common cause of B and C, or B could be the common cause of A and C, and so forth. Hence, so the argument goes, regularity theory cannot discern true from false causal structures attached to a given coincidence of factors. Yet the argument falls short because it neglects the specific qualities of the experimental difference test. A material realization of an experiment introduces all factors into the inference. The homogeneity condition then singles out those experiments in which the causally relevant factor (A) is varied as a test factor under otherwise homogeneous conditions (C must be suppressed by other means). If the effect (B) then occurs, a true causal relevance between A and B can be inferred. In all other configurations the condition of homogeneity would be violated, and hence no false causal structure could be inferred.

The inference pattern also offers a solution to the induction problem: on the assumption of a valid homogeneity condition, only one (suitable) difference situation is needed to infer deductively the truth of a general causal hypothesis. Of course, the conclusion could still turn out to be wrong if one of the premises is wrong. Therefore, it is one of the most crucial experimental concerns that the condition of homogeneity is satisfied during experimentation. But this is the usual
risk that any sort of deductive reasoning on the basis of assumptions faces. Once the condition is true, one does not need any fancy inference patterns to infer the truth of a causal regularity. This makes experimental inferences surprisingly simple.

Thesis 4. The material realization of an experiment with a difference test is part of the causal reasons for a causal relevance hypothesis.

Causal reasoning reverses methodological issues which are often taken for granted.

Thesis 5. A causal hypothesis (claim of causal relevance) cannot be falsified by empirical data.

It can be strictly proven that no possible outcome of even a complex difference test allows an inference to the negation of a relevance claim. With the solution to the induction problem we do not need to follow Popper's advice to be skeptical about evidential support for general causal hypotheses.

Thesis 6. Causal inferences allow the positive justification of causal relevance on the assumptions of (i) homogeneous conditions for a given material realization, (ii) a justified initial hypothesis, and (iii) correct empirical readings of the effects.
REFERENCES

M. Baumgartner. Uncovering deterministic causal structures: a boolean approach. Synthese, pp. 1-26, 2008a.
M. Baumgartner. Regularity theories reassessed. Philosophia, pp. 1-28, 2008b.
Gerd Graßhoff and Michael Baumgartner. Kausalität und Kausales Schliessen. Bern Studies in the History and Philosophy of Science, Bern, 2003.
Gerd Graßhoff and Michael May. Methodische Analyse wissenschaftlicher Entdeckungen. Kognitionswissenschaft, 5(2): 51-67, 1995a.
Gerd Graßhoff and Michael May. From historical case studies to systematic methods of discovery. In AAAI Spring Symposium Series. AAAI Press, 1995b.
Gerd Graßhoff and Michael May. Causal regularities. In Wolfgang Spohn, Marion Ledwig, and Michael Esfeld, editors, Current Issues in Causation, pp. 85-114. Mentis Verlag, 2001.
Gerd Graßhoff and Michael May. Hans Krebs' and Kurt Henseleit's laboratory notebooks and their discovery of the urea cycle – reconstructed with computer models. In Hans-Jörg Rheinberger, Frederic L. Holmes, Jürgen Renn, editors, Reworking the Bench. Research Notebooks in the History of Science, volume 7 of Archimedes. New Studies in the History and Philosophy of Science and Technology, pp. 269-294. Kluwer Academic Publishers, Dordrecht, 2003.
Gerd Graßhoff and Kärin Nickelsen. Dokumente zur Entdeckung des Harnstoffzyklus. Bd. 2: Laborbuch Kurt Henseleit. Bern Studies in the History and Philosophy of Science, Bern, 2001a.
Gerd Graßhoff and Kärin Nickelsen. Dokumente zur Entdeckung des Harnstoffzyklus. Bd. 1: Laborbuch Hans Krebs und Erstpublikationen. Bern Studies in the History and Philosophy of Science, Bern, 2001b.
John L. Mackie. The Cement of the Universe. A Study of Causation. Clarendon Press, Oxford, 2nd edition, 1980.
Michael May. Kausales Schliessen. Eine Untersuchung über kausale Erklärungen und Theorienbildung. PhD thesis, University of Hamburg, 1999.
O. Neubauer. Intermediärer Eiweißstoffwechsel, pp. 797-825. Julius Springer, Berlin, 1928.
Kärin Nickelsen and Gerd Graßhoff. Concepts from the bench. Krebs and the urea cycle. In Giora Hon, Jutta Schickore, and Friedrich Steinle, editors, Going Amiss in Experimental Research, Dibner Institute Studies in the History of Science and Technology, pp. 91-117. Springer, Boston, 2009.
O. Schultzen and M. Nencki. Die Vorstufen des Harnstoffs im thierischen Organismus. Zeitschrift für Biologie, 1872.
Philosophische Fakultät I Institut für Philosophie Humboldt Universität zu Berlin Unter den Linden 6 10099 Berlin Germany [email protected]
ALAN C. LOVE AND ANDREAS HÜTTEMANN
COMPARING PART-WHOLE REDUCTIVE EXPLANATIONS IN BIOLOGY AND PHYSICS1
ABSTRACT Many biologists and philosophers have worried that importing models of reasoning from the physical sciences obscures our understanding of reasoning in the life sciences. In this paper we discuss one example that partially validates this concern: part-whole reductive explanations. Biology and physics tend to incorporate different models of temporality in part-whole reductive explanations. This results from differential emphases on compositional and causal facets of reductive explanations, which have not been distinguished reliably in prior philosophical analyses. Keeping these two facets distinct facilitates the identification of two further aspects of reductive explanation: intrinsicality and fundamentality. Our account provides resources for discriminating between different types of reductive explanation and suggests a new approach to comprehending similarities and differences in the explanatory reasoning found in biology and physics.
1. BIOLOGY, PHYSICS, AND NAGEL'S REDUCTIONIST SHADOW

Ernest Nagel's philosophical account of theory reduction in the sciences has cast a long shadow on discussions of the relationship between biology and physics.2 In addition to debates among philosophers of science, some biologists, such as Ernst Mayr, took Nagel's account as the epitome of what is wrong with philosophical analyses of science; namely, a Procrustean maneuver that stretched and hacked biology to fit a conception of science forged primarily on exemplars from physics.
1 Both authors contributed equally to this paper. We are grateful for the comments and suggestions we have received on this material from many colleagues, including Ingo Brigandt, Tom Doyle, Susan Hawthorne, Marie Kaiser, Peter McLaughlin, Ken Schaffner, Ken Waters, and Marcel Weber. Useful feedback also came from participants at the 2009 workshop 'Explanation, Confirmation, and Prediction in Biology and Medicine,' held in Konstanz, Germany and sponsored by the European Science Foundation (Research Networking Programme). We want to express our appreciation for financial support from the Alexander von Humboldt Foundation to pursue this collaboration, which grew out of our mutual participation in the Second German-American Frontiers of Humanities Symposium, Hamburg, Germany, October 2005, sponsored by the Alexander von Humboldt Foundation and the American Philosophical Society.
2 Nagel (1961).
One of my special concerns has been the neglect of biology in works claiming to be philosophies of science. From the 1920s to the 1960s the logical positivists and physicalists who dominated philosophy of science had little interest in and even less understanding of biology, because it simply did not fit their methodology.3
Mayr thought this reflected a general tendency among both scientists and philosophers: "The assumption that it should be possible to "reduce" the theories and concepts of all other sciences, including biology, to those of the physical sciences has clearly dominated not only philosophy but science itself" (1). Unsurprisingly, Mayr saw these as failed maneuvers: "Attempts to 'reduce' biological systems to the level of simply physico-chemical processes have failed because during the reduction the systems lost their specifically biological properties" (1). His refrain is a common one – biological systems have special or complex properties, sometimes labeled 'emergent,' which cannot be handled purely in physical or chemical terms: "Where organisms differ from inanimate matter is in the organization of their systems" (2). But Mayr also thought other differences between physics and biology could be relevant, including physical science preferences for single cause explanations and deterministic models.

For Mayr, this was no mere intellectual skirmish. As Director of the Harvard Museum of Comparative Zoology and a public spokesman for evolutionary biology in the 1960s, Mayr was involved in major battles over resources with the increasingly dominant molecular approaches to biology (populated by expatriate physicists) that flourished after the structure of DNA was discovered.4 It was in this context that Mayr marshaled his distinction between proximate and ultimate causation, the latter capturing a distinct place for evolutionary biology that was untouchable by molecular biologists.5 A reorientation of physics-dominated philosophy of science was more than academic; it meant the life (or death) of evolutionary biology. Thus, it is no surprise that Mayr's framing of the issue is in terms of 'autonomy': is biology an autonomous science? What makes biology unique?6

Mayr drew a distinction between three different forms of reduction7:
(i) constitutive: any dissection of phenomena, events, and processes into the constituents of which they are composed;
(ii) explanatory: claims that all the phenomena and processes at higher hierarchical levels can be explained in terms of the actions and interactions of the components at the lowest hierarchical levels;
3 Mayr (1988, p. 1).
4 Beatty (1994).
5 Mayr (1961).
6 cf. Mayr (2004).
7 Mayr (1988, pp. 10-11).
(iii) theory: the theories and laws formulated in biology are only special cases of theories and laws formulated in the physical sciences, and such biological theories can thus be reduced to the physical sciences.

Mayr took the first to be uncontroversial, while the latter two were areas of contention. Post-positivist philosophers, in his estimation, had decided against theory reduction (Nagel's heritage being the preeminent example), and he argued against explanatory reduction in terms of the complexity of living systems, their possession of a genetic program, the incommensurability between concepts in biology and physics, the different role played by laws, and the presence of ultimate causation in biology. In an effort to stress the failure of explanatory reduction, Mayr pointed to the paleontologist George Simpson's "reverse reduction" of physics to biology.

The point is that all known material processes and explanatory principles apply to organisms, while only a limited number of them apply to nonliving systems. … Biology, then, is the science that stands at the center of all science, and it is here, in the field where all the principles of all the sciences are embodied, that science can truly become unified.8
Although Mayr acknowledged that Simpson might have overstated his case, the conclusion that any explanatory reduction of biology to physics was a failure could not be missed. Perhaps as a consequence of his real-time, real-world battle over reductionism, Mayr overlooked multiple nuances in Nagel’s original discussion. In particular, Nagel was quite sensitive to the possibility of differences between explanations in biological science and physical science: Despite the undeniable successes of physicochemical explanations in the study of living things, biologists of unquestioned competence continue to regard such explanations as not entirely adequate for the subject matter of biology. … Some of them nevertheless maintain that the mode of analysis required for understanding living phenomena is fundamentally different from that which obtains in the physical sciences.9
Nagel identified two reasons why biological inquiry and explanation might differ from that found in physical science: "One is the dominant place occupied by teleological explanations in biological inquiry. The other is the use of conceptual tools uniquely appropriate to the study of systems whose total behavior is not the resultant of the activities of independent components."10 The latter reason can be rephrased in terms of (the failure of) part-whole explanations. In his extensive discussion of these reasons, Nagel made a salient observation about two different
8 Simpson (1964, pp. 106-107).
9 Nagel (1961, p. 398).
10 Nagel (Ibid., p. 401).
modes of analysis that correspond to the distinction between structure and function in biology. The contrast between structure and function is evidently a contrast between the spatial organization of anatomically distinguishable parts of an organ and the temporal (or spatiotemporal) organization of changes in those parts. What is investigated under each term of the contrasting pair is a mode of organization or a type of order. In the one case the organization is primarily if not exclusively a spatial one, and the object of the investigation is to ascertain the spatial distribution of organic parts and the modes of their linkage. In the other case the organization has a temporal dimension, and the aim of the inquiry is to discover sequential and simultaneous orders of change in the spatially ordered and linked parts of organic bodies.11
This observation and many others demonstrate Nagel’s acute awareness of the subtle relations between explanations in biology and explanations in physics. Although his conclusion was cautiously optimistic about the future possibility of offering complete physicochemical explanations of biological phenomena, Nagel recognized that the resistance to explanatory reduction by organismal biologists was motivated by an important point: “the stress they place on the hierarchical organization of living things and on the mutual dependence of organic parts is not a misplaced one.”12 Both Mayr and Nagel spoke in very general terms about biology and physics, their explanatory modes, and the success or failure of reductionism. Many biologists and philosophers also have worried that importing models of reasoning from the physical sciences obscures our understanding of reasoning in the life sciences in more circumscribed domains. For example, teleological or functional explanations continue to be an area of debate in philosophy of biology but the topic is absent from philosophy of physics. In this paper we follow a more circumscribed strategy by combining three elements present in the dialectic between Mayr and Nagel: explanatory reduction, part-whole relations, and the temporal dimension of organization. Biology and physics tend to incorporate different models of temporality in part-whole reductive explanations, which partially validates the worry that modes of analysis in physics cannot be imported directly into philosophical analyses of inquiry and explanation in biology. After documenting possible differences in temporal aspects of part-whole reductive explanations, we argue that these result from differential emphases on compositional and causal facets of reductive explanations. Keeping these two facets distinct facilitates the identification of two further aspects of reductive explanation: intrinsicality and fundamentality. The result is an account that provides resources for discriminating between different types of reductive explanation, and suggests a new approach to comprehending similarities and differences in the explanatory reasoning found in biology 11 Nagel (Ibid., p. 426) 12 Nagel (Ibid., p. 444)
and physics, without having to decide the general question of whether and in what sense biology is autonomous from physics.
2. TEMPORALITY IN PART-WHOLE REDUCTIVE EXPLANATIONS 2.1 Part-Whole Reductive Explanations A variety of authors have conceptualized reductionism in terms of the relationship between parts and wholes.13 Nagel also devoted attention to part-whole relations,14 but his analysis was overshadowed in subsequent developments of theory reduction by philosophers (akin to Mayr overlooking aspects of Nagel’s views on explanatory reduction). Part-whole relations play both methodological and explanatory roles in biology and are distinct from identity-reduction, which focuses on the relation of two kinds of properties within the same system rather than explaining properties of a compound system in terms of its parts. Explaining the property of a whole in terms of the properties of its parts is a distinct explanatory question from explaining a property of the whole in terms of another property of the whole by identity. Consider a case where the property of a whole (the temperature of an ideal gas) is explained in terms of the combined properties of the parts (kinetic energies of the molecules). One question we can ask is whether it is possible to explain why an ideal gas with a certain mean kinetic energy has a specific temperature. This involves two kinds of properties of the same system—the specific mean kinetic energy and temperature of the ideal gas at a time. A correspondence rule may link the mean kinetic energy (a property of the gas as a whole) to the temperature (a property of the gas as a whole). A second question is whether it is possible to explain why an ideal gas has a specific mean kinetic energy on the basis of the behavior of its constituent molecules. This involves the behavior of the components of the ideal gas and its behavior as a whole: how do the individual kinetic energies add up to the kinetic energy of the whole? The relation is between parts and wholes, not two properties within the same system. 2.2 Temporality In Nagel’s contrast between spatial and temporal modes of organization, the temporal mode is characterized by its concentration on “sequential and simultaneous orders of change in the spatially ordered and linked parts of organic bodies.” These orders of change are described frequently in causal terms, such as the parts of an organism exhibiting causal interactions conditioned by spatial proximity. All causal explanations involve some element of temporal duration. If the aim is to 13 e.g., Bechtel and Richardson (1993); Sarkar (1998); Wimsatt (1976) 14 Nagel (Ibid., pp. 380-397)
explain the increased rhythmic contraction of the heart by the cellular mechanisms that process adrenergic hormones, then the explanation requires (at least implicitly) an earlier time at which hormones are interacting with cellular receptors and a later time when the altered rhythmic contraction obtains. Temporal duration may be operationalized in different ways depending on the explanatory goals in view, such as with absolute chronology (minutes or hours) or event sequences or stages (Fertilization → Cleavage → Blastula → Gastrula). We can make the role of time more precise by characterizing temporality for part-whole reductive explanations in terms of a property of a whole at t* being explained by properties of its parts at an earlier time t. If a temporal relation is one in which a property or state at t is related to another property or state at t*, then a causal relation is one in which a property or state at t determines or influences another property or state at t*; the state or properties of the parts and their interactions at t (or t1, …, tn < t*) bring about a change in the state or properties of the compound at time t*. Many concepts commonly invoked in philosophical discussions of reduction are atemporal and non-causal.15 Part-whole explanations in physics are often atemporal.16 For the behavior of a physical system, part-whole reductions can pertain to either its states or its temporal evolution (i.e., dynamics). A part-whole reduction of a state explains the state of a compound system at a time on the basis of the states of its parts at the same time. For example, we might explain the determinate energy value E (macrostate) of a compound system (e.g., an ideal gas) by appeal to the determinate energy values of its constituents (e1 to en ; the states of the parts). The explanation relies on the states of the parts (particular facts) and a law of composition delineating how these states contribute to the state of the compound. If we assume that interactions can be neglected, the kinetic energy values simply add up. The explanation is reductive, because it only uses states of the parts and a law of composition in the explanans, but it is also atemporal.17 A second situation is the part-whole reduction of the dynamics of a physical system; the temporal evolution of a compound system can be explained in terms of the dynamics of its parts. For instance, the Hamilton operator for a compound system can be analyzed in terms of those for the parts along with interaction terms and a law of composition. The relation between the Hamilton operator 15 Supervenience is one example: ‘Mental properties supervene on physical properties, in that necessarily, for any mental property M, if any thing has M at time t, there exists a physical base (or subvenient) property P at t, and necessarily anything that has P at a time has M at that time’ (Kim 1998, 9). 16 Hüttemann (2005) 17 The situation is more complex for systems of the order of 1023 degrees of freedom, as in statistical physics. Averaging procedures such as ‘coarse graining’ are necessary to explain the behaviour of compound systems. These procedures often involve averaging over time so that the relations in question are no longer atemporal in the sense defined here.
Comparing Part-Whole Reductive Explanations
189
for the compound and those for the parts (i.e., a part-whole relation) is reductive but atemporal because the Hamilton operator of the compound at time t* is not calculated in terms of the Hamilton operators of the parts and their interactions at another time t. (Typically, the Hamilton operators of the parts and the compound are time-independent.) A part-whole reduction of the dynamics of a compound is often integrated into a causal explanation. In classical dynamics, if we want to explain why the state of the solar system Z develops over time into state Z’, we can appeal to the dynamics of its parts and their interactions. The two dimensions of the explanation are separable: (a) the non-reductive, temporal dimension; and, (b) the atemporal, part-whole reductive dimension. Part-whole reductions (and explanations more generally) in biological science are often temporal. Properties of a whole at t* are explained in terms of properties of parts at an earlier time t (or t1, …, tn < t*); the behavior of the parts at earlier times causes the compound to have a behavior or property at a later time t*. Temporal or causal part-whole reductions can be described differently depending on how the behavior of the parts is described. When laws play a role, causal partwhole reductions involve the laws and initial conditions that pertain to parts of a whole at t explaining the behavior of the whole at t*. Causal part-whole reductions also can be described in terms of causal powers: the behavior of a whole at t* is explained in terms of the causal powers of its parts at t. Causal part-whole explanations are reductive because they constrain the explanans to only laws or causal powers of the parts (and sometimes compositional rules). Examples of causal part-whole reductions abound in biology, such as explaining muscle tissue activity (behavior of a system) at a later time t* by appealing to the properties of muscle cells (the parts) composing the tissue (the whole), which contain special motor proteins that contract by molecular ratcheting, at an earlier time t. Temporal part-whole reductions explain the state of a compound or whole (muscle contraction) in terms of states of the parts at earlier times (myosin ratcheting) and must, for several reasons, be distinguished from atemporal part-whole reductions, which explain the behavior of a compound in terms of the behavior of the parts at the same time.
3. COMPOSITION, CAUSATION, AND THE DIFFERENCE TIME MAKES 3.1 Composition and Causation What explains the difference in how temporality enters into part-whole reductive explanations in biology and physics? The answer lies in the distinction between composition and causation, both of which are key facets of reductive explanation. Composition refers to higher-level entities being constituted by, realized by, or nothing but lower-level entities. The heart is composed of myocardial cells and other cellular entities. A myosin filament found in myocardial cells is composed
190
Alan C. Love and Andreas Hüttemann
of myosin proteins, which are in turn composed of amino acid residues. Causation refers to higher-level entities being caused, brought about, or determined by lower-level entities. The heart’s rhythmic beating is caused by the contraction of its myocardial cells. Myocardial cells contract because myosin proteins ratchet along another set of proteins. The appropriate folding of a particular myosin protein is determined causally by its amino acid sequence. Atemporal part-whole reductions, which correspond to what one typically finds in physical science, focus on composition; i.e., the relation of the higher level to the lower level is about constitution or realization. When there is a temporal element of the explanation, such as when explaining dynamics, the element of causation only enters as a relation between entities at the same level (e.g., the evolution of the state of the parts from an earlier to a later time). Temporal partwhole reductions, which are characteristic of reductive explanatory practices in biology, combine the compositional element with that of causation; they contain a mixture of compositional and causal claims.18 In the part-whole reductive explanation for a physical system described above, the temporal (causal) and the reductive (part-whole) dimensions can be separated neatly in contrast to the intertwined compositional and causal claims found in biology. We agree with Nagel that this difference does not establish “the absolute autonomy of biology or the inherent impossibility of physicochemical explanations of vital phenomena.”19 Therefore, we leave open the possibility that temporal partwhole reductions from biology might be explicated in terms of atemporal partwhole reductions in physics at some point in the future (according to standards of the scientific community). But the difference in explanatory practice motivates distinguishing the two facets. In particular, the way temporality enters into partwhole reductive explanations directs our attention to two further aspects of reductive reasoning that reveal more resources for characterizing differences between modes of analysis in biology and physics. 3.2 Intrinsicality and Fundamentality Part-whole spatial relations presume that parts are contained within or intrinsic to the whole. Thus, intrinsicality seems to be a precondition for part-whole reductive explanations, which relates directly to composition. Claims about reducing a higher-level entity to its component parts involves an individuation of the higherlevel entity such that its components can be distinguished from other entities (a surrounding context or environment). If the goal is to reductively explain a cell in terms of its component parts, then the intrinsic/extrinsic boundary is the cell membrane. If the goal is to reduce the heart to its component parts then the boundaries
18 Craver and Bechtel (2007)
19 Nagel (Ibid., p. 444)
of the organ demarcate intrinsic from extrinsic. What counts as intrinsic is relative to local explanatory aims.20 Parts also are considered more fundamental than the whole; they are the ‘lower level’ that reductively explains the ‘higher level’ system properties of the whole (fundamentality 21). In many cases, only a subset of fundamental level properties will count as explanatory (e.g., biochemical moieties but not spatial location). As with intrinsicality, these qualifications are spelled out locally in the context of explanation. But they also introduce qualifications about intrinsicality. A qualified fundamentality might identify biochemical properties as explanatory of cell properties, which implies that biochemical properties outside of the cell (i.e., extrinsic) are fundamental and that other intrinsic properties of cellular components (e.g., location) are not explanatory. Intrinsicality and fundamentality are constraints on part-whole reductive explanations. Thus, compositional part-whole reductions can fail as reductive explanations either because intrinsicality is violated (for a particular part-whole decomposition) or fundamentality is violated (in the broad sense or for some restricted set of properties). But it seems impossible for these to be violated independently because, at any particular time, the parts are intrinsic and fundamental to the whole. If biochemical properties of a cell’s ‘intrinsic’ parts are considered ‘fundamental’, then they either explain the cell level properties or they do not. This is where the significance of temporality emerges because it allows for the decoupling of intrinsicality and fundamentality in part-whole reductive explanations. If part-whole relations are considered temporally (as expected for biology), then it is possible for intrinsicality to be violated because a part of a whole at time t may no longer be a part of the whole at t*. Heart organogenesis is one empirical case where we can observe the importance of temporality for part-whole reductive explanations. Blood cells coursing through nascent atrial chambers during ontogeny meet the intrinsicality condition (spatially) at time t even though at t* they have passed out of the heart. Fluid flow is a key factor in the proper shaping of the heart during embryogenesis;22 blood cell components at t have a causal effect on the shape of the heart at t* when these cells are no longer intrinsic. This is still a kind of reductive explanation because properties of entities at the qualified fundamental level of cells account for changes in the properties of entities at the non-fundamental level (organs). Another common example is programmed cell death (apoptosis). Cellular components at time t 20 Researchers are explicit about this; e.g., “it is often beneficial to separate contributions arising from fluctuations that are inherent to the system of interest (intrinsic noise) from those arising from variability in factors that are considered to be external (extrinsic noise). … The definition of intrinsic noise is problem-dependent, and varies from one context to another” (Kærn et al. 2005, p. 456). 21 see Sarkar (1998, ch. 3) 22 Hove et al. (2003)
can bring about a change in the properties of a whole at t* even though the cellular components are literally absent at t* due to apoptosis. Two other kinds of possibilities bear mentioning. First, adrenergic hormones secreted in the circulatory system can modulate heart rhythm. These hormones are extrinsic to the heart, even though they meet the fundamentality condition, and exert an effect through a temporally extended process. Second, different organs can interact directly (through physical contact) with the heart to bring about changes in its morphology during embryogenesis. These other organs are not only extrinsic but also non-fundamental because they are at the ‘same’ non-fundamental level as the heart. We can summarize these possibilities in a table of reductive explanatory forms (Table 1). This displays the heterogeneous ways that part-whole explanations can succeed or fail as reductive explanations.23

Table 1. Forms of Explanation

                    I     II    III   IV    V
Temporality         N     N     Y     Y     Y
Intrinsicality      Y     N     Y     N     N
Fundamentality      Y     N     Y     Y     N
The first two forms (I and II) correspond to purely compositional relations (I = success; II = failure). Form III includes the possibility of successfully extending compositional relations through time and also encompasses explanations that appeal to fundamental, intrinsic causes at t bringing about an effect in the nonfundamental level whole at t* while ignoring compositional relations that obtain at other times between t and t* (such as blood flow in the heart or apoptosis). Form IV concerns an extrinsic entity from a fundamental level explaining a system property (e.g., the adrenergic hormone case). Although it is reductive because it appeals to properties at the fundamental level, it fails in the sense of a whole being causally explained by its parts. Form V corresponds to the possibility of one organ interacting with another organ—the entity is extrinsic and also resides at the same level of non-fundamentality. Here there is a failure of the parts to explain the whole and a failure of fundamental level properties to explain non-fundamental level properties. Once part-whole relations are temporally indexed, intrinsicality and fundamentality take on independent significance in reductive explanations.24

Table 1 shows the difference that time makes. Without it, there is only a question of whether the part-whole explanatory reduction succeeds (I) or fails (II). But the successes and failures of temporal part-whole explanatory reductions in biology often involve forms III, IV, and V. Now it is clear why the export of atemporal models of reductive explanation from philosophical reflection on physical science could hinder our analyses of reductive explanations in biology. In the part-whole reductive explanation for a physical system described above, the temporal (causal) and the reductive (part-whole) dimensions can be separated in contrast to the mixed compositional and causal claims found in biology or situations where the compound does not exist at the earlier time (e.g., during embryological development). Furthermore, physics typically treats isolated systems and, as a consequence, it is often assumed that parts are not added or lost from the compound (as happens during cell division or apoptosis); i.e., it is assumed that compositional relations remain constant. Our analysis makes explicit why some physics-derived models mischaracterize part-whole reductive explanations found in biological reasoning; i.e., in what respect explanations in biology and physics can be different. In addition to this fault line between modes of analysis in biology and physics, our account generates a new perspective on the ‘context’ objection to reductive explanation.25 The context objection claims that a reduction can be blocked because of an ineliminable appeal to contextual factors. A standard rejoinder is to pursue a reduction of this context. The inclusion of temporality gave us two different ways a reductive explanation can fail as a reductive explanation (IV – intrinsicality fails; V – fundamentality and intrinsicality fail). This provides the basis for a more nuanced reading of the context objection. The reductionist rejoinder maintains the fundamentality condition even when intrinsicality is violated; it preserves fundamentality (a reductive explanation in one aspect) at the expense of intrinsicality, and thus the context objection retains some validity. If an attempt is made to recover intrinsicality by redrawing system boundaries (e.g., treat the body cavity as the system so that the adrenergic hormone is a part of this new whole), then we have changed what counts as whole and parts, and thereby what counts as intrinsic. Whether a part-whole reductive explanation succeeds or fails with respect to intrinsicality depends on how ‘wholes’ are individuated, which means redrawing boundaries constitutes a change of the original question about whether a whole can be explained reductively by its parts. The failure of a reductive explanation for the aspect of intrinsicality may be an empirical indicator that redrawing boundaries is warranted epistemologically. But even if a reductive explanation succeeds as a consequence of new individuation criteria, this is compatible with the claim that the behavior of interest cannot be explained in terms of its intrinsic features under the original individuation criteria. Thus, the success or failure of a reductive explanation is not an all-or-nothing phenomenon. For any part-whole reductive explanation, we must not only inquire whether each of these aspects is applicable, but also characterize the details involved in order to evaluate whether there is success or failure of one kind or another.

23 Table 1 suppresses the diversity that obtains as a result of individually characterizing intrinsicality and fundamentality for a particular reductive explanation.
24 The table omits two kinds of possibilities ruled out in discussion: (a) atemporal part-whole reductions that violate either intrinsicality or fundamentality alone (two forms of explanation), and, (b) fundamentality failing when intrinsicality holds, because if a feature is intrinsic then in order to be contained within a whole it must be instantiated at a more fundamental level than the whole.
25 cf. Delehanty (2005)
4. EXAMPLES: PART-WHOLE REDUCTIVE EXPLANATIONS IN BIOLOGY AND PHYSICS

4.1 Ideal Crystal

One example of a reductive part-whole explanation in physics is the classical treatment of the ideal crystal. It is a reductive explanation of the dynamics (temporal evolution) of a compound system in terms of the dynamics of the parts and their interactions. According to standard treatments, the electrons and ions that constitute the crystal can be considered separately (adiabatic approximation). The regular structure of the crystal is generated by the ions. Within the so-called harmonic approximation they are, however, not supposed to sit motionless at their lattice-sites. According to the model the ions perform oscillations around the sites of the lattice, which are described as the mean equilibrium positions of the ions. These oscillations are considered small in comparison with the inter-ionic spacing, which means that only nearest-neighbor interactions are relevant. Furthermore, it is supposed that the potential between nearest neighbors is harmonic.26 On the basis of these assumptions we can specify the classical Hamilton function of the ideal crystal. The Hamilton function is constructed in terms of the dynamics of the constituents, which are understood as isolated (kinetic energy terms), and their interactions (potential energy). These contributions are added together according to a law of composition:

(4.1.1)   H = \sum_i E_i^{\mathrm{kin}} + \frac{1}{2} \sum_{i,j} U_{ij}\, q_i q_j

where E_i^{\mathrm{kin}} = p_i^2 / 2m is the kinetic energy of the parts, and U_{ij} = \partial^2 U(q_1, \ldots, q_{3N}) / \partial q_i \partial q_j describes the interactions between the parts. On the basis of the Hamilton function we can determine the thermal density of the crystal, which is given by

(4.1.2)   u = \frac{1}{V} \, \frac{\int d\Gamma \; H \, e^{-\beta H}}{\int d\Gamma \; e^{-\beta H}}

in which d\Gamma stands for the volume element in crystal phase space and \beta = 1/(k_B T), where k_B is the Boltzmann constant and T the temperature. The thermal density of the crystal permits us to calculate the behavior of the compound system, including measurable thermodynamic properties such as the specific heat c_v:

(4.1.3)   c_v = \frac{\partial u}{\partial T}

Classically, the specific heat in a crystal is independent of its temperature.

26 Ashcroft and Mermin (1976, pp. 422-427)
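Why the classically calculated specific heat comes out temperature-independent can be made explicit by a standard equipartition argument; the following short derivation is our gloss on (4.1.2) and (4.1.3), not part of the original treatment, and assumes the harmonic potential is positive definite. Since H is quadratic in the 3N momenta and the 3N displacement coordinates, each of these 6N degrees of freedom contributes k_B T / 2 to the phase-space average \langle H \rangle, so that

\[
u = \frac{1}{V}\,\langle H \rangle = \frac{1}{V}\cdot 6N \cdot \frac{k_B T}{2} = \frac{3 N k_B T}{V},
\qquad
c_v = \frac{\partial u}{\partial T} = \frac{3 N k_B}{V},
\]

a constant independent of T (the classical Dulong–Petit value).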
Within the conceptual framework developed above (Sections 2 and 3), this example of a reductive physical explanation has two important elements. First, the explanation is atemporal because the Hamiltonian of the compound for a certain time t does not depend on the Hamiltonians for the parts at another time t* (this obtains trivially because the Hamiltonian is time-independent). Second, the explanation relies only on features that are intrinsic with respect to the compound, which is typical for many areas of physics that focus on isolated systems. Furthermore, the properties appealed to in the explanation are those of the parts alone, and in this sense the aspect of fundamentality is met. The explanation of the crystal’s behavior provides an example of a reductive part-whole explanation that corresponds to Form I in Table 1.

4.2 Quantum Entanglement

Not all reductive part-whole explanations of states (as opposed to the dynamics) in physical science correspond to what we see in the case of the ideal crystal. Consider the spin states of a compound that consists of two non-identical particles. Normalized vectors in two-dimensional Hilbert spaces H_1 and H_2 represent the spin states of the separate particles. In order to construct a Hilbert space for the compound system we need a law of composition; e.g., the possible spin states of the compound system are all those states that can be represented as (normalized) vectors in the tensor product of H_1 and H_2: H_S = H_1 ⊗ H_2. If we take the eigenvectors in the spin z-direction as the basis for H_1 and H_2 (H_1: |\psi^1_{z\text{-up}}\rangle and |\psi^1_{z\text{-down}}\rangle; H_2: |\psi^2_{z\text{-up}}\rangle and |\psi^2_{z\text{-down}}\rangle), then we find all of the following among the possible states of the compound system:

(4.2.1)   |\psi^1_{z\text{-up}}\rangle \otimes |\psi^2_{z\text{-down}}\rangle

(4.2.2)   |\psi^1_{z\text{-down}}\rangle \otimes |\psi^2_{z\text{-up}}\rangle

(4.2.3)   \tfrac{1}{\sqrt{2}}\, |\psi^1_{z\text{-up}}\rangle \otimes |\psi^2_{z\text{-down}}\rangle \;-\; \tfrac{1}{\sqrt{2}}\, |\psi^1_{z\text{-down}}\rangle \otimes |\psi^2_{z\text{-up}}\rangle

What is essential is that equation 4.2.3 cannot be written as a simple tensor product of vectors in H_1 and H_2; it can only be written as a superposition of such tensor products. The fact that the compound is in a determinate state cannot be explained in terms of the determinate states the constituents occupy. This is because there are states, such as those described in equation 4.2.3, which do not allow the attribution of pure states to the parts of the compound. A part-whole explanation of the state of the compound thus fails. This is a failure of reductive explanation because a part-whole explanation of the state is not merely difficult to formulate but impossible to achieve. Quantum mechanics contains states of compound systems that do not allow for the attribution of pure states to the parts. The impossibility of attaining these kinds of reductive explanations is implied by the formalism of quantum mechanics. Reductive explanations that correspond to Form I in Table 1 are not obtainable.
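The impossibility can be checked directly. The following short argument is our addition (it is not spelled out in the chapter) and uses only the coefficients of (4.2.3). Suppose the state could be written as a product of vectors in H_1 and H_2:

\[
\big(a\,|\psi^1_{z\text{-up}}\rangle + b\,|\psi^1_{z\text{-down}}\rangle\big)\otimes\big(c\,|\psi^2_{z\text{-up}}\rangle + d\,|\psi^2_{z\text{-down}}\rangle\big)
= \tfrac{1}{\sqrt{2}}\,|\psi^1_{z\text{-up}}\rangle \otimes |\psi^2_{z\text{-down}}\rangle - \tfrac{1}{\sqrt{2}}\,|\psi^1_{z\text{-down}}\rangle \otimes |\psi^2_{z\text{-up}}\rangle .
\]

Comparing coefficients in the product basis gives ac = 0, ad = 1/\sqrt{2}, bc = -1/\sqrt{2}, and bd = 0. The first condition requires a = 0 or c = 0, which contradicts the second or the third condition respectively; hence no choice of a, b, c, d exists, and (4.2.3) is not a product state.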
The case of spin-states can be classified as an instance of Form II. In both of the cases from physical science, fundamentality is understood in terms of properties or states of the parts; intrinsicality is assumed. Thus, fundamentality and intrinsicality stand or fall together. Section 3 described how these two aspects could be decoupled with the addition of temporality, and biological science is a natural place to look for this explanatory pattern.

4.3 Protein Folding

Biologists have long recognized that part-whole reductive explanations are relevant in the context of explaining how protein folding occurs: “The protein folding problem … represents an unusually concrete and limited case of the whole problem of reductionism. … understanding the rules of [folding] would teach us worthwhile lessons about … exactly how an organic whole becomes so much more than a sum of its parts.”27 Philosophers have also observed that it represents a key locus for evaluating part-whole explanatory reduction in molecular biology.28 Proteins are composed of amino acid components (‘residues’) that are linked by covalent peptide bonds into a chain (‘polypeptide’). This linear chain is produced by a process termed ‘translation’; specific cellular constituents (ribosomes, themselves complexes of RNA and protein) translate a linear stretch of RNA with a triplet code of nucleotides (e.g., AAG) into amino acid residues for a linear polypeptide (e.g., AAG = lysine). Nearly all proteins adopt a three-dimensional structure in order to be functional, which is understood in terms of interactions among their amino acid residues (e.g., hydrophobic residues avoid interaction with surrounding water by segregating to internal regions). Addressing the protein folding problem requires explaining how this conformation is achieved for polypeptides subsequent to translation from RNA in the cellular context.29 The linear sequence hypothesis holds that the three-dimensional folding of a protein results from the properties of the amino acid residues in the polypeptide and their chemical interactions alone—the whole is a ‘sum’ of the interaction of its parts. Although there is an ambiguity in the linear sequence hypothesis between (a) inferring or predicting the three-dimensional structure of a protein from its linear sequence of amino acids, and (b) explaining the outcome of three-dimensional structure by appeal to the kinetic, thermodynamic, and structural processes in the cell,30 we focus only on the latter construal because it concerns part-whole explanatory reduction. Whether a protein folds only as a consequence of its amino acid residues is a causal question involving reductive explanations of wholes in terms of parts.

27 Richardson (1982, p. 1)
28 Sarkar (1998, p. 169)
29 Some folded proteins aggregate further in order to be functional (e.g. hemoglobin is a tetramer).
30 cf. Freedman (1999)
Evidence in favor of the linear sequence hypothesis was derived initially from experiments on the denaturation and refolding of ribonuclease proteins in vitro.31 Ribonucleases subjected to denaturing conditions were able to refold rapidly into the proper configuration. Correct refolding seemingly occurred as a function of the linear sequence of amino acid residues composing the polypeptide. But the folding took an hour or longer rather than several minutes or less without an enzyme from the endoplasmic reticulum (a cellular organelle where much translation occurs). Many denatured proteins do not refold as cleanly as those studied by Anfinsen’s group32 and the process requires the activity of chaperone proteins that guide folding during and after polypeptide synthesis33: “Proteins need the assistance of molecular chaperones and folding enzymes to reach their native structure efficiently”.34 Molecular chaperones must provide oversight during folding because the cellular environment is crowded.35 Distinct functional groups of chaperones monitor and facilitate protein folding during de novo synthesis, quality control, and the response to stress.36 Multiple amino acid residue interactions between an already functional, folded protein (the chaperone) and the as-yet-unfolded polypeptide underlie the process of correct folding.37 Even when mutations are introduced that lead to altered amino acid components in a polypeptide, which should prevent correct folding, proper folding can be induced by the overproduction of molecular chaperones.38 One way the linear sequence hypothesis might fail is that the ordering of the amino acid residues may be insufficient to explain the three-dimensional conformation of the folded protein, assuming the laws of macromolecular physics.39 Two separate constraints operate in the temporal part-whole reductive explanation offered by the linear sequence hypothesis. First, only properties of the parts are required to explain protein folding (i.e., intrinsicality). A property is intrinsic to the linear polypeptide if it is a property of one of its amino acid components or their interactions (intrinsic relational properties). Contextual or extrinsic causal factors are not supposed to play an essential role or contribute to correct folding, such as physico-chemical components (e.g., H2O), other proteins (e.g., chaperones), or nucleic acids (e.g., RNA). Second, the amino acids (‘parts’), as well as macromolecular laws that describe their interactions, are available to explain protein folding (fundamentality).

31 Anfinsen (1973)
32 Clark (2004)
33 Feder and Hofmann (1999); Frydman (2001)
34 Liscalijet et al. (2005, p. 78)
35 Ellis (2001); Homouz et al. (2008); Liscalijet et al. (2005)
36 Albanese et al. (2006); Ellis (1998); McClellan et al. (2005); Tang et al. (2006)
37 Tang et al. (2008)
38 Maisnier-Patin et al. (2005)
39 Sarkar (1998, pp. 169-170)
System properties due to a complex three-dimensional structure that are absent from the linear polypeptide are not fundamental. The tertiary structure of a three-dimensional protein ‘whole’ is explained by the interaction of its component parts at earlier times; the amino acid residues interact causally to bring about the state of the whole (‘correctly folded’). Molecular chaperones are non-intrinsic, non-fundamental causal factors that make specific and necessary contributions to folding (not just as appropriate environmental background): “The manner in which a newly synthesized chain of amino acids transforms itself into a perfectly folded protein depends both on the intrinsic properties of the amino-acid sequence and on multiple contributing influences from the crowded cellular milieu.”40 The intrinsic properties of the linear polypeptide arising from its amino acid residue parts are not sufficient to explain the manifestation of protein folding. The temporally extended process of folding not only requires appropriate environmental conditions but also the contribution of extrinsic chaperones; i.e., there is a failure with respect to the aspect of intrinsicality. Additionally, the causal contribution of chaperones in protein folding results from three-dimensional structure, a kind of property the amino acid parts lack. Thus, the best explanation of protein folding also involves a failure with respect to the aspect of fundamentality (Form V in Table 1). Systems with properties due to complex three-dimensional structure (folded proteins), rather than systems that lack it, are necessary to produce the native conformations of proteins in vivo—the parts alone in combination with the macromolecular laws of composition are not enough. Temporal part-whole reduction fails with respect to both aspects as a reductive explanation.41 Could a ‘reductionist’ adopt the rebuttal to the context objection here (‘just reduce the context also’)? Chaperones are composed of parts and therefore we can ‘reduce’ the operation of an extrinsic chaperone protein whole to its parts. This is akin to the strategy of preserving a reduction by ‘extending the mechanism’ backwards in time.42 But chaperone proteins require other chaperones for their own proper folding, so the attempt to reduce the extrinsic chaperone (or extend the mechanism) leads to a type of explanatory regress. According to the individuation schemes adopted by scientists, extrinsic, non-fundamental wholes (folded proteins—chaperones) are required for the proper folding of another whole (folded protein). A related objection is to suggest a new individuation scheme: the cell as a ‘larger’ whole contains the protein and the crowded cellular milieu, thereby making the molecular chaperones intrinsic and fundamental. But even if a causal part-whole explanation of the behavior of the cell is feasible in terms of its intrinsic parts, this would change the question of what parts and wholes are being reduced. The folding of a protein still cannot be explained solely in terms of its amino acid parts; both intrinsicality and fundamentality are violated because the features required to explain folding are extrinsic and not located in the fundamental realm. Shifting to a larger whole simply changes the explanandum.

40 Dobson (2003, p. 884). “There is a need for molecular chaperones because the intrinsic properties of proteins assure that incorrect interactions are possible” (van der Vies et al. 1993, p. 73).
41 This claim is relative to the individuation and decomposition of the system offered by scientists, and pertains to the process of bringing about the three-dimensional protein structure. Whether the pertinent causal powers of the molecular chaperones are truly novel vis-à-vis the causal powers of their parts concerns constitutional reductionism in an atemporal sense rather than causal part-whole reductive explanation.
42 Delehanty (2005)
5. CONCLUSION

The examples canvassed in Section 4 illustrate that part-whole explanations in biology often work differently from those in physics. This is due to the fact that part-whole explanations in physics primarily focus on compositional relations, whereas biological part-whole explanations focus on whether the behavior of the parts at an earlier time t causes the behavior of the compound at t* (Sections 2 and 3). As a consequence there are different ways for reductive explanations to succeed or fail. Because fundamentality is often spelled out in terms of the properties of the parts alone (i.e., intrinsicality) in physical science explanations, a failure of reductive explanation is a failure with respect to both aspects. Furthermore, since physics typically deals with isolated systems, there is no way to make an explanatory appeal to extrinsic features. Thus, for spin-states of a compound, the failure of reductive explanation is also a failure of explanation, and has often been classified as an emergent phenomenon.43 In contrast, the biological case illustrates how a reductive explanation can fail in different ways (intrinsicality, fundamentality, or both) without automatically failing as an explanation. Protein folding has been explained, just not only in terms of the fundamental, intrinsic properties of amino acid residues composing the polypeptide. Importing physical science conceptions of reductive part-whole explanations into biology may obscure this important point. So Mayr was correct that there are genuine dangers in such an import. However, we do not agree with his more sweeping claims concerning explanatory reduction. By developing a conceptual framework for reductive part-whole explanations it is easier to see in what respects such explanations in physics and biology differ. The point of departure for this framework was Nagel’s observation of the distinctness of temporal modes of organization in biological systems (Section 1). Explicitly incorporating temporality provides the basis for decoupling fundamentality and intrinsicality (Section 2), which yields multiple combinations of success and failure for reductive explanations that emerge from distinguishing composition and causation (Section 3; Table 1). It also offers a more precise foundation for delineating differences and similarities between the explanatory approaches of biology and physics. The result is an increased comprehension of how reductive explanations operate in the sciences.

43 Humphreys (1997)
REFERENCES

Veronique Albanese, Alice Yen-Wen Yam, Joshua Baughman, Charles Parnot, and Judith Frydman, “Systems analyses reveal two chaperone networks with distinct functions in eukaryotic cells”, in: Cell 124, 2006, pp. 75-88.
Christian B. Anfinsen, “Principles that govern the folding of protein chains”, in: Science 181, 1973, pp. 223-230.
Neil W. Ashcroft and N. David Mermin, Solid State Physics. Philadelphia: Cengage Learning Services 1976.
John Beatty, “The proximate/ultimate distinction in the multiple careers of Ernst Mayr”, in: Biology and Philosophy 9, 1994, pp. 333-356.
William Bechtel and Robert C. Richardson, Discovering Complexity: Decomposition and Localization as Strategies in Scientific Research. Princeton: Princeton University Press 1993.
Patricia L. Clark, “Protein folding in the cell: reshaping the folding funnel”, in: Trends in Biochemical Sciences 29, 2004, pp. 527-534.
Carl F. Craver and William Bechtel, “Top-down causation without top-down causes”, in: Biology and Philosophy 22, 2007, pp. 547-563.
Megan Delehanty, “Emergent properties and the context objection to reduction”, in: Biology and Philosophy 20, 2005, pp. 715-734.
Christopher M. Dobson, “Protein folding and misfolding”, in: Nature 426, 2003, pp. 884-890.
R. John Ellis, “Steric chaperones”, in: Trends in Biochemical Sciences 23, 1998, pp. 43-45.
R. John Ellis, “Macromolecular crowding: obvious but underappreciated”, in: Trends in Biochemical Sciences 26, 2001, pp. 597-604.
Martin E. Feder and Gretchen E. Hofmann, “Heat-shock proteins, molecular chaperones, and the stress response: evolutionary and ecological physiology”, in: Annual Review of Physiology 61, 1999, pp. 243-282.
Robert B. Freedman, “Protein folding in the cell”, in: Thomas E. Creighton (Ed.), Protein Folding. New York: W.H. Freeman and Company 1999, pp. 455-539.
Judith Frydman, “Folding of newly translated proteins in vivo: the role of molecular chaperones”, in: Annual Review of Biochemistry 70, 2001, pp. 603-647.
Dirar Homouz, Michael Perham, Antonius Samiotakis, Margaret S. Cheung, and Pernilla Wittung-Stafshede, “Crowded, cell-like environment induces shape changes in aspherical protein”, in: Proceedings of the National Academy of Sciences of the United States of America 105, 2008, pp. 11754-11759.
Jay R. Hove, Reinhard W. Köster, Arian S. Forouhar, Gabriel Acevedo-Bolton, Scott E. Fraser, and Morteza Gharib, “Intracardiac fluid forces are an essential epigenetic factor for embryonic cardiogenesis”, in: Nature 421, 2003, pp. 172-177.
Paul Humphreys, “How properties emerge”, in: Philosophy of Science 64, 1997, pp. 1-17.
Andreas Hüttemann, “Explanation, emergence, and quantum entanglement”, in: Philosophy of Science 72, 2005, pp. 114-127.
Mads Kærn, Timothy Elston, William Blake, and James Collins, “Stochasticity in gene expression: from theories to phenotypes”, in: Nature Reviews Genetics 6, 2005, pp. 451-464.
Jaegwon Kim, Mind in a Physical World. Cambridge, MA: MIT Press 1998.
I. Marije Liscalijet, Bertrand Kleizen, and Ineke Braakmen, “Studying protein folding in vivo”, in: Johannes Buchner and Thomas Kiefhaber (Eds.), Protein Folding Handbook. Part II. Weinheim: WILEY-VCH Verlag 2005, pp. 73-104.
Sophie Maisnier-Patin, John R. Roth, Asa Fredriksson, Thomas Nystrom, Otto G. Berg, and Dan I. Andersson, “Genomic buffering mitigates the effects of deleterious mutations in bacteria”, in: Nature Genetics 37, 2005, pp. 1376-1379.
Ernst Mayr, “Cause and effect in biology”, in: Science 134, 1961, pp. 1501-1506.
Ernst Mayr, Toward a New Philosophy of Biology: Observations of an Evolutionist. Cambridge, MA: Harvard University Press 1988.
Ernst Mayr, What Makes Biology Unique? Considerations on the Autonomy of a Scientific Discipline. New York: Cambridge University Press 2004.
Amie J. McClellan, Melissa D. Scott, and Judith Frydman, “Folding and quality control of the VHL tumor suppressor proceed through distinct chaperone pathways”, in: Cell 121, 2005, pp. 739-748.
Ernest Nagel, The Structure of Science: Problems in the Logic of Scientific Explanation. New York: Harcourt, Brace & World, Inc. 1961.
J. S. Richardson, “What do the folds in proteins look like?”, in: Donald B. Wetlaufer (Ed.), The Protein Folding Problem. Boulder, CO: Westview Press 1982, pp. 1-28.
Sahotra Sarkar, Genetics and Reductionism. Cambridge: Cambridge University Press 1998.
George G. Simpson, This View of Life: The World of an Evolutionist. New York: Harcourt, Brace & World 1964.
Yun-Chi Tang, Hung-Chun Chang, Annette Roeben, Dirk Wischnewski, Nadine Wischnewski, Michael J. Kerner, F. Ulrich Hartl, and Manajit Hayer-Hartl, “Structural features of the GroEL-GroES nano-cage required for rapid folding of encapsulated protein”, in: Cell 125, 2006, pp. 903-914.
Yun-Chi Tang, Hung-Chun Chang, Kausik Chakraborty, F. Ulrich Hartl, and Manajit Hayer-Hartl, “Essential role of the chaperonin folding compartment in vivo”, in: EMBO Journal 27, 2008, pp. 1458-1468.
Saskia M. van der Vies, Anthony Gatenby, Paul V. Viitanen, and George H. Lorimer, “Molecular chaperones and their role in protein assembly”, in: Jeffrey L. Cleland (Ed.), Protein Folding In Vivo and In Vitro. Washington, DC: American Chemical Society 1993, pp. 72-83.
William C. Wimsatt, “Reductive explanation: a functional account”, in: Robert S. Cohen (Ed.), Proceedings of the Philosophy of Science Association, 1974. Dordrecht, Holland: D. Reidel Publishing Company 1974, pp. 671-710.
Alan C. Love Department of Philosophy Minnesota Center for Philosophy of Science University of Minnesota 831 Heller Hall 271 19th Ave. S Minneapolis, MN 55455 USA [email protected] Andreas Hüttemann Philosophisches Seminar Universität zu Köln Albertus-Magnus-Platz 50923 Köln Germany [email protected]
PETER MCLAUGHLIN
THE ARRIVAL OF THE FITTEST
“Natural selection may explain the survival of the fittest, but it cannot explain the arrival of the fittest.” (Hugo De Vries)1
In one of his early sketches from the Russian Revolution Jaroslav Hasek tells the story of a more zealous than competent Red Army commander who sought to foster literacy among the peasants in the area where he was stationed by posting a written notice ordering them to learn how to read within three days. Those inhabitants of the county still illiterate after this period were to be shot. Had the local Bolshevik commissar (Hasek) not crossed the commander’s plans, the proposition, “All adult inhabitants of the county are literate,” might have become true, and its truth would have been explainable by appeal to a sort of selection. Some people believe that adaptation by means of natural selection proceeds more or less along the lines of Colonel Jerochymov’s program of literacy by firing squad. Selection, they believe, explains only the survival but not the arrival of the fittest. The question at issue is whether natural selection can explain the origin and production of adaptations or merely the preservation and spread of adaptations.
SOME HISTORICAL SCENE SETTING

In the history of biology since Darwin there have been two fundamentally different ways of conceptualizing natural selection: one view sees natural selection as a positive, creative factor in the evolution of organisms; the other views it primarily or exclusively as a negative, eliminative factor. In practice this is often more a question of general attitude than of specific doctrine: whether one looks at natural selection primarily as shaping adaptations or as sifting genes. There are, nonetheless, two diametrically opposed conceptualizations of selection involved:

1) Negative: Selection eliminates inferior types; it trims the set of genetic plans that arise by mutation (and heredity).
2) Creative: Selection produces and shapes adaptations; it structures the gene pool and recombines genes and traits.
1 De Vries (1904, pp. 825–826). De Vries is himself quoting with approval a reviewer of his Mutationstheorie.
Now some version of each of these views can be formulated such that it seems obviously true and uncontroversial. By (1) we may merely mean to assert that stabilizing selection exists or that there is no DNA in a population after a round of natural selection that was not there before that round, though the frequency of genes may have changed and some may even have disappeared. Or even more trivially: natural selection has never caused a base pair substitution. All this is obviously true, but it would be quite false—the other side can point out—to claim that in the absence of mutation there are no genotypes in the F2 population that were not already there in the F1. By (2) we may mean simply that not only are some traits replaced by other traits but traits may also be developed in a particular direction. For instance, the average neck length in a population of giraffes may increase not only because long-necked giraffes increase in frequency relative to short-necked giraffes, but also because the necks of long-necked giraffes are getting longer. But it would be false to say that selection never reduces variety. And most biologists think that natural selection embraces both processes: production of traits and elimination of traits. Expressed in Darwin’s breeder categories: There is a difference between choosing one breed of cattle over another, thus preserving it, and producing the breed in the first place. Selection can be conceptualized paradigmatically as picking a particular variant out of a large population and fixing it to the detriment of others, or alternatively, as shaping out of the variation that nature offers over a number of generations a particular form that did not previously exist. Darwin was particularly fascinated by the second possibility. If selection consisted merely in separating some very distinctive variety, and breeding from it, the principle would be so obvious as hardly to be worth notice; but its importance consists in the great effect produced by the accumulation in one direction, during successive generations, of differences absolutely inappreciable by an uneducated eye …2
If all there were to selection were simply favoring one given form over another given form, then selection would be trivial—as indeed many have thought it to be. What is non-trivial is the production (by accumulation) of forms that did not exist before selection started (independent of any new mutation). Note, however, that Darwin has serious difficulty formulating exactly what it is that the skilled breeder is supposed to be able to do; he even seems to describe the breeder’s skill as a sort of special ability to recognize an existing variety that others cannot recognize—as if normal people could see only “very distinctive” varieties but the skilled breeder could also see given varieties that are not so distinctive. This is surely not what Darwin means, but he does not make himself particularly clear. At another place Darwin uses the metaphor of characters written in invisible ink, as if the breeder could see (existing but invisible) traits that others cannot see.3

2 Darwin (1895, p. 36).
3 Darwin (1875, pp. 35-36).
While the positive or creative view of selection is associated with the breeder’s model of the creation of new forms, the negative or eliminative view is strongly associated with the analogy between natural selection and a sieve. The classical formulation of the sieve analogy derives from Hugo de Vries’ Mutationstheorie: Natural selection is a sieve, it creates nothing, as is often falsely represented, it only screens. It retains only what variability offers it. How that which it sifts arises, ought actually to lie outside the doctrine of selection. It is one thing to ask how the great competition sifts, it is another thing to ask how what was sifted had arisen before.4
And in his next book he continued: It [natural selection] is the sifting out of all organisms of minor worth through the struggle for life. It is only a sieve, and not a force of nature, no direct cause of improvement, as many of Darwin’s adversaries, and unfortunately many of his followers also, have so often asserted. It is only a sieve, which decides which is to live, and what is to die. … Of course, with the single steps of evolution it has nothing to do. Only after the step has been taken, the sieve acts, eliminating the unfit. The problem, as to how the individual steps are brought about, is quite another side of the question.5
Each “single step” in evolution is produced by mutation. Expressed schematically: mutation creates, heredity preserves, and selection discriminates.6 This position was strongly favored by the Mendelians in the first half of the 20th century. However it did incur some difficulties since it seems to be endemic to the position that all existing forms are ultimately due to a series of accidental mutations preserved by heredity—selection has just gotten rid of the other forms. To avoid the logical consequence, that adaptations, like mutations, are ultimately accidental, advocates of this view were often led from saltationism to orthogenesis. Others pushed the position to its absurd consequences and accepted them: T. H. Morgan, for instance, imagines to himself what would have happened had there been no filtering sieve.
4 “Die natürliche Auslese ist ein Sieb, sie schafft nichts, wie es oft fälschlich dargestellt wird, sondern sichtet nur. Sie erhält nur, was die Variabilität ihr bietet. Wie das, was sie siebt, entsteht, sollte eigentlich ausserhalb der Selectionslehre liegen. Eine Frage ist es, wie der grosse Wettkampf siebt, eine andere, wie das Gesiebte vorher entstanden war.” Mutationstheorie vol. II, p. 667. Gayon (1998, p. 53) points out that Spencer’s notion of “sorting” or “sifting” also involves a sieve metaphor. A.R. Wallace introduces a sieve, but not as a metaphor: “Suppose we have a plant with a small edible seed, and we want to increase the size of that seed. We grow as large a quantity of it as possible, and when the crop is ripe we carefully choose a few of the very largest seeds, or we may by means of a sieve sort out a quantity of the largest seeds” (Darwinism, p. 84).
5 De Vries (1904, pp. 6–7)
6 Such a conception views what Darwin called the “preservation of the favoured races,” i.e. selection, as heredity.
If all the new mutant types that have ever appeared had survived and left offspring like themselves, we should find today all kinds of animals and plants now present, and countless others. This consideration shows that even without natural selection evolution might have taken place. What the theory does account for is the absence of many kinds of living things that could not survive … The argument shows that natural selection does not play the rôle of a creative principle in evolution.7
According to this view, natural selection explains the gaps in the great chain of being but none of the links. It doesn’t explain how the leopard got its spots; this is explained by mutation and heredity. Selection explains only why the spotless, striped, and checkered leopards are no longer with us. If Morgan is right that mutation and heredity by themselves could produce all given forms of life, then selection is explanatorily superfluous with regard to these forms. With the evolutionary synthesis in the 1950s the pendulum swung back in the other direction, and the creationist version of natural selection became dominant. Ernst Mayr complained that natural selection had been “misunderstood and misrepresented as strictly negative”.8 Mutation was demoted to a mere contributing factor of variation. Evolution came to be presented again as a process of variation and selection, not of mutation and selection as Morgan had conceived it. “Let us remember (Mayr tells us) that recombination, not mutation as such, is the primary source of the phenotypic variation encountered by natural selection”.9 Mayr notes critically that the older view—mutation plus elimination—nonetheless continued to be found in textbooks into the 1960s and even in the 1970s: To counteract this kind of thinking, several evolutionists, as J. Huxley, Dobzhansky, and Simpson have called selection a creative process. This designation is justified because evolution mixes in every generation the genetic endowment of the few survivors during sexual reproduction and thus creates abundant new genotypes, which are then tested in the next generation.10

7 Morgan (1932, pp. 130–131). Allen (1978, p. 315) considers this to be a regression to an earlier position. Morgan’s disciple, Lancelot Hogben asks us to: “consider what would happen in a universe so large and abundantly supplied with the necessities of life that no struggle for existence intervenes. Given unlimited time in a Mendelian universe in which natural selection did not operate, all the species we know today would be present, and many more besides. Evolution would have occurred; but the pageant of life would present to the taxonomist a more continuous appearance and the striking gaps which we now see would be filled not by fossils relics but by living forms” (Hogben 1930, p. 181). Hogben, to his credit, does not think this position has anything to do with Darwin; to his discredit, however he does think it is true. See Harris (1999) for someone who takes this to be evolution and rejects it.
8 Mayr (1962, p. 7)
9 Mayr (1962, p. 8). This remains textbook wisdom; see Futuyma (1998, p. 283): “All genetic variation owes its origin ultimately to mutation, but in the short term, a great deal of the genetic variation within a population arises through recombination.”
10 Mayr (1988, p. 99).
Mayr thus also names the bone of contention: recombination, which along with mutation (and migration) produces variation according to this newer scheme. Recombination was not adequately grasped by the original Darwinian and Neodarwinian positions. Mutation is independent of past or future selection; but the other forms of variation—migration and recombination—are independent only of future selection, not however, of past selection. In fact there is an important sense in which recombination is a result of selection. Improbable combinations become probable through selection; by selection we can probabilistically see to it that certain traits are combined. Even given random mating, if selection skews the distribution of mates, it influences the results of mating. The creationist view of natural selection, as Mayr indicates, was propounded by many of the most eminent biologists of the second half of the twentieth century. Here are two examples: The chances of such realization for any particular combinations depend on the frequencies in the parental population of the genes involved in the combination. These frequencies, in turn, are to a considerable extent, if not absolutely, determined by the action of natural selection. Selection thus plays an essential part in determining what combinations of genes will be incorporated in individual organisms and so in the actual origin of new variant sorts of organisms. In this role, selection may surely be said to be a creative factor in evolution.11 But natural selection does not act merely as a sieve eliminating detrimental mutations and favoring reproductions of beneficial ones as is often suggested. In the long run, it integrates mutations, and it orders them into adaptively coherent patterns adjusted over millions of years and over millions of generations as a response to environmental challenges. It is natural selection that gives direction to changes, orients chance, and slowly, progressively produces more complex structures, new organs, and new species. Novelties come from previously unseen association of old material. To create is to recombine.12
The problem with the creative view of selection, however, is that it is a bit vague about just how selection is supposed to be creative. It is clear that recombination is somehow supposed to represent a form of selection-dependent variation, but the two quotes above are quite possibly the most specific and detailed biological deliberations on this question on record. Thus we have the assertion that natural selection is creative but only the slightest of hints as to how it manages to be so: some hand waving at recombination. We have to take Jacob and Simpson together: If to create is to recombine and if selection recombines, then selection creates. Although this is certainly preferable to Morgan’s notion that adaptations are produced not because of natural selection but in spite of it, it is still far from an explicitly articulated explanation. The creative position might even be inconsistent: If recombination is selection-dependent variation and if recombination is also locally more important than 11 Simpson (1967, p. 225). 12 Jacob (1977, p. 1163).
mutation for the phenotype differences that become reproductively relevant, then evolution is not based on two independent processes, variation and selection, but rather on two interlocked and interdependent principles. Thus it might be better to go back to the older division into two independent processes, mutation and selection, and to interpret selection so as to include recombination. That is, the problem with the position of de Vries and Morgan is perhaps not (as we have all accepted from Mayr) that they reduce variation to mutation but rather that they reduce selection to elimination. The problem is not so much that they didn’t allow recombination as a kind of variation but that they didn’t recognize recombination as an aspect of selection. If evolution is to be reduced to two independent processes, then these must be mutation and selection, whereby selection includes all (non-accidental) biasing processes including recombination.
MISLEADING METAPHORS What the creative view of selection has going for it is a certain orthodoxy, authority and probably truth, but very little by way of argument or analysis. The eliminative view, on the other hand, is supported by a great deal of argument, most of which however, as we shall see below, seems to be unsound or based on misunderstanding. While biologists tend to have an intuitive commitment to the creative view of selection, it is not something that is generating a lot of research.13 On the other hand, a number of philosophers of late have taken up the eliminative view, and as ever they tend to be on the side of the saltationists. The point of the following critical remarks is not that the vague creationist attitude must be right, but rather that a return to mutationism is not the kind of alternative that is likely to force the creationists to articulate a precise mechanism of how selection creates by recombining. Exponents of both views of natural selection sketched above have taken up de Vries’ sieve analogy to illustrate their positions; but the use of the same analogy 13 Now it is true that many biologists can be cited as saying something like proposition (1) above, but very few can be found who go on explicitly to deny proposition (2). One of these few, Endler (1986), in the course of an enthusiastic report of Elliott Sober’s views, states forthrightly that, “Natural selection is not an explanation of adaptation; it only explains how relatively better adaptations can increase in frequency” (p. 46). And it takes him exactly two hundred pages to come around to the opposite view. As a speculation he allows for “only one way” in which natural selection might indeed help explain a new trait, namely “the case where an evolutionary change requires one or more intermediate steps” (p. 246). The probability that certain independent mutations occur together in the same organism might be increased, he admits, if natural selection has increased the frequency of each variant. As I read it, the case where an evolutionary change requires one or more intermediate steps includes more or less everything but hopeful monsters.
conceals a deeper disagreement on what it signifies. This can be illustrated on the example of two of the best-known presentations of the analogy in recent literature.
Fig. 1 (Sober 1984)

The most prominent use of the sieve analogy in recent philosophy is the “selection toy” introduced by Elliott Sober in The Nature of Selection to clarify various aspects of the selection process.14 The selection toy takes a collection of four different sorts of balls, in which size and color are correlated, and sifts them through three ever finer sieves, thus cumulatively selecting in the end the smallest balls—which also happen to be (say) green. This is said to illustrate a particular distinction between selection of and selection for: selection of small green balls and selection for small size but not for green color.15 Karen Neander16 has pointed out a number of peculiarities and weaknesses in Sober’s selection toy model, for instance, that the two upper sieves are irrelevant
to the final result. The model thus—against its intent—presents natural selection as a one shot process: differences + sifting leads to fixation of a trait. But even if the model were changed a bit to allow repeated sifting, there still remains the basically pre-synthesis conceptualization of natural selection that lies in the use of the sieve itself. On Sober’s interpretation selection is in two ways purely negative: (1) The selected balls are those to which nothing has happened; and no matter how often the process is repeated, the “successful” balls (the smallest ones) continue to be those to whom nothing has happened; nature never laid a glove on them. Selection doesn’t affect the winners, their trajectory is undeflected, they just fall through. Without selection (sifting), the small balls would have been exactly where they are with selection—they would just have had more company, exactly as Morgan envisioned it. But selection does nothing to the winners. (2) The second negative aspect of natural selection lies in the fact that the action of selection is merely to eliminate. Without selection, the big white balls would have been in the bottom chamber, too; but by selection they are eliminated. Natural selection does not in any way directly affect the winners; it acts only by eliminating the losers.17 It has no influence on the properties of the winners or the losers. Survival (“preservation”) in biology appears to be somewhat analogous to inertia in physics: it’s what happens to an organism when nothing happens to it, when there is no selection. “Forces of selection,” as Sober calls them, have deflected the trajectories of other organisms, not the ones selected: We should think of the sources of mortality in a population as component forces added vectorially.18 According to this view after x rounds of selection, nothing has happened to the successful organism (or to successive tokens of the type) x number of times. For Sober’s model, evolution presents a trajectory characterized primarily by conservative replication and occasionally by imperfect replication (mutation). Selection affects the non-survivors or the non-propagated by ending their trajectories. It explains why individuals with particular traits are relatively numerous, but not why they have the traits they have. Natural selection selects out those that don’t make the grade and disposes of them; the balls left in Sober’s sieves are the losers. This makes it easy to believe that selection has had no causal input into the production of any surviving individuals. They are precisely where and what they would have 17 This difference in perspective was one of the significant differences between Darwin’s and Wallace’s views of evolution. Wallace embraced the perspective of the losers of the struggle for life; Darwin viewed selection from the point of view of the winners. See Wallace (1889, esp. Ch. 2 and 5). 18 Sober (1984b, p. 97; p. 158) views Hardy-Weinberg equilibrium as similar to inertia but argues that it is nonetheless different because it applies to states of populations, not to individual organisms; on the other hand, since he interprets his fitness statistics as based on individual propensities, his zero-force states of populations are aggregated from the propensities (states) of individuals.
been in the absence of selection. The losers on the other hand have been affected by natural selection—they are gone. Natural selection on this view explains the absence of non-spotted leopards not the origin of spotted ones. From the point of view of Darwin’s breeder, on the other hand, it is the winners, not the losers, who are left in the sieve: When the breeder selects certain organisms, he chooses them for propagation, not for destruction. The gravel selected by the sieve is dumped into the next sieve. At each step, selection picks out and retains the winners so that after x rounds of selection each type has been affected by selection x number of times. Without the sieve the gravel would have landed in the (garbage) bucket with the sand; the sieve has altered its trajectory. F. J. Ayala’s version of the sieve analogy is positive; selection does something to the successful variants:19 Natural selection has been compared to a sieve which retains the rarely arising useful and lets go the more frequently arising harmful mutants. Natural selection acts in that way, but it is much more than a purely negative process, for it is able to generate novelty by increasing the probability of otherwise extremely improbable genetic combinations. Natural selection is creative in a way.
In Ayala’s sieve it is the unsuccessful organisms that fall through the holes. Nature has selected those in the sieve for the traits they possess. They have been chosen for further selection. Ayala also gives a hint as to how selection, even if conceived of as a sieve, can be creative: by changing the frequencies of individual traits, selection changes the probabilities of combinations of traits. That is, selection recombines. Thus, agreement about the sieve analogy can mask a disagreement at a more fundamental level and can mislead the eliminativists into believing that more biologists take their line than actually do. When we compare selection to a sieve, we have to ask whether nature is selecting the sand or the gravel; whether the winners are in the sieve and thus change their trajectories or whether they are in the bucket and remain unaffected by the sieve of selection. 19 Ayala (1970, p. 5, italics PM). He has reiterated this recently: “Natural selection has been compared to a sieve that retains the rarely arising useful genes and lets go the more frequently arising harmful mutants. Natural selection acts in that way, but it is much more than a purely negative process, for it is able to generate novelty by increasing the probability of otherwise extremely improbable genetic combinations. Natural selection is thus in a way creative. It does not “create” the entities upon which it operates, but it produces adaptive genetic combinations that would not have existed otherwise” (Ayala 2004, 60). Dobzhansky warns that the sieve metaphor can be misleading; it is only useful in certain cases where selection may “act as a sieve that retains the rare favourable mutants”; see also Ayala (1970, p. 200); Mayr (1988, pp. 101–103) also distinguishes explicitly between selection and mere elimination. And even De Vries above admits that selection “retains” (erhält) what variation gives it. The sieve metaphor is perhaps fundamentally misleading since it encourages the view that natural selection is a cause of differential reproductive success rather than just being (nonaccidental) differential reproductive success.
VARIATIONAL EXPLANATION AND FILTER DEVICES
A number of philosophers have recently argued for the purely eliminative nature of natural selection. The basic argument for this position was presented by Elliott Sober in The Nature of Selection20, and various advocates have articulated some aspects in more detail and added minor improvements. The most articulate of these presentations all follow Sober in appealing to arguments and distinctions (wrongly) attributed to a paper by Richard Lewontin.21 Sober distinguishes between "developmental theories" and "selectional theories", associating developmental theories with Lamarck and selectional theories with Darwin. In developmental theories each individual changes, and the changes in a population are aggregated out of the individual changes. Selectional theories, on the other hand, have "an irreducibly population-level character"; individuals do not undergo change: "Population change isn't a consequence of individual change but of individual stasis plus individual selection"22. The difference between selectional and developmental explanations is illustrated by the example of an explanation of why all the children in a particular room can read at a third-grade level. A developmental or transformational explanation would take each child and explain how that individual learned how to read; it would then aggregate the results for the whole class. A selectional explanation is different; it would point to the fact that a reading test was administered at the door: only those children who could read at third-grade level passed the entrance exam and were thus admitted to the classroom. Natural selection, according to this view, is a selectional theory, not a developmental theory; it explains not how a particular leopard got its spots but how the leopard population came to be predominantly composed of spotted individuals. Such a selectional theory can in fact only explain the prevalence of a trait, not its production, only the survival of the fittest, not their arrival. There is a type of mechanism known in rational-choice-oriented social theory as a "filter device" that is used to show the limits of methodological individualism.23 Such a device allows one to "explain" in some cases why all As are Bs even if one cannot explain how any particular A came to be B—or to be A for that matter. In fact, even if both A and B are purely stochastic phenomena and have no common cause or any other link in their origins, I can make all As in a certain space be Bs by using a filter device that allows only those As that happen also to be Bs to pass through. In this manner I can arrange that all As in the space are also Bs without causing any single A to be B. If, for instance, I have a collection of balls of four different sizes and four different colors, where one quarter of each size has each of the colors, I can send all the balls through two different filters: the first selects (lets 20 Sober (1984b, pp. 147-155). 21 In fact Lewontin (see Walsh 1998, p. 250; 2002, p. 469) is said to be a major spokesman of this position, though I see no evidence that he even adheres to it. 22 Sober (1984b, p. 150). 23 Nozick (1974, pp. 21–22; pp. 312–317).
through) only the smallest sized balls and the second selects only the green balls. The result is that all the balls in the second space are small and green. The filters explain why all the balls in the final space are small and green although we have no explanation why any particular ball is small or green. Entrance exams—to take Sober’s example—can “explain” why all third graders in a school can read at a certain level without explaining how any child learned how to read.
Fig. 2: Why are all the objects round and gray? Process A (transformation): mold, coloring device; Process B (filter): filter (round), filter (gray).
Now imagine two somewhat simpler processes, as illustrated in Fig. 2: In the first one (A), all individuals are sent through two devices, which transform their shape and color. Thus all individuals that pass through both devices have been caused to be both gray and round. In the other example (B), a diverse group of individuals is sent through two devices that do not change any individual's properties but merely select individuals based on their properties. Process B allows the passage only of individuals that are gray and round. The outputs of both the transforming devices and the filter devices are the same. This is the alternative that Sober and his supporters see between two basic forms of explanation. The plausibility of Sober's argumentation is due to the alternative he succeeds in presenting as exhaustive: If natural selection is to be conceived either as the transformation of each individual by the same causes or as an eliminative filter device, then it would be reasonable to prefer the filter device.
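To make the contrast between the two processes concrete, here is a minimal sketch (in Python; the shapes, colors and population sizes are invented for illustration and are not taken from Sober or Nozick). Process A transforms every individual into a gray, round object; process B merely filters a varied population. The two output populations look alike, but only in process A has any individual been changed:

    import random

    random.seed(0)
    shapes = ["round", "square", "triangular"]
    colors = ["gray", "red", "blue"]

    def make_population(n):
        # A varied starting population of objects with a random shape and color.
        return [{"shape": random.choice(shapes), "color": random.choice(colors)}
                for _ in range(n)]

    def process_a(population):
        # Transformation: every individual is molded round and painted gray.
        return [{"shape": "round", "color": "gray"} for _ in population]

    def process_b(population):
        # Filter: no individual is altered; non-round or non-gray ones are removed.
        return [x for x in population if x["shape"] == "round" and x["color"] == "gray"]

    out_a = process_a(make_population(90))
    out_b = process_b(make_population(90))

    # Both outputs contain only gray, round objects ...
    assert all(x == {"shape": "round", "color": "gray"} for x in out_a + out_b)
    # ... but process A changed each individual, while process B only removed some.
    print(len(out_a), len(out_b))  # 90 versus roughly 10 (about one ninth of 90)

The point of the sketch is only that the population-level outcome underdetermines which of the two processes produced it.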
But it is questionable whether Sober has gotten the alternatives right with these mechanical models. And a look at Lewontin's own words in this oft-cited and consistently misrepresented paper casts serious doubt on this:24 There are two basic dynamic forms for evolving systems. One is transformational, in which the collection of objects evolves because every individual element in the collection undergoes a similar transformation … The universe of stars is evolving because every star undergoes the same set of transformations.
To this typically physical process Lewontin contrasts a specifically biological form of evolution: The alternative evolutionary dynamic, unique as far as we know to the organic world, and uniquely understood by Darwin, is variational evolution. In a variational scheme, there is variation of properties among individuals in the ensemble, variation that arises independent of any effect it may have on the individual who possesses it.
This is a far cry from a commitment to eliminative selection. Sober in fact at first takes up the distinction between the two kinds of theory in Lewontin's own terms. But he misunderstands the two processes, changes the meanings of the terms, and very appropriately renames them to fit the change in meaning: Lewontin's transformational explanation becomes 'developmental' and his variational explanation becomes 'selectional'—to fit the interpretation in terms of merely eliminative selection. Later writers just call it "selectional explanation", attributing not only Sober's interpretation but even his term itself to Lewontin.25 And while Lewontin is not very expansive about exactly how "variational" explanation works26, he is at least explicit that he intends it to be "unique to the organic world" and not to be something that is instantiated by every tollbooth, turnstile, and entrance exam. And in a parallel passage, where he is somewhat more expansive, Lewontin is even more explicit: "There is no historical process other than the evolution of living organisms that has this variational form, at least as far as we know".27 While I have no privileged access to Lewontin's views, the position as he presents it in these two brief discussions seems to take a fairly standard Dobzhanskyite line.
24 Lewontin (1983a, p. 23). It should be noted, however, that this paper by Lewontin cited by the philosophical eliminativists is actually only a book review in a popular journal (New York Review of Books) and that the distinction is made ad hoc and en passant while discussing one of the six different books reviewed. Nonetheless Lewontin says enough to make it clear that he is not necessarily of Sober’s opinion. 25 See Walsh (1998); Walsh, Lewens and Ariew (2002) seems to back off one of the terms. 26 Lewontin (1983b) is much more explicit in this regard. 27 Lewontin (1983b, p. 86).
Nature is characterized by Dobzhansky28 as a "stormy sea," a gene pool seething with variation; populations are taken to be extremely heterogeneous. Mutation occurs randomly, but at a dependable rate, and there are masses of new phenotypic variation every generation (due mainly to recombination). Selection shapes and biases this fecund and changing mass. Lewontin contrasts this uniquely biological kind of explanation to transformational explanation, where the same causes acting on each individual bring about the same changes in each. Many philosophers of science seem to have taken a wrong turn in the Harvard hallways and to have mistaken Lewontin's variational evolutionary dynamics for Nozick's eliminative filter device. The actual philosophical question at issue is whether natural selection is properly interpreted as a filter device or not. If it is a filter device, then no iteration of its action will do anything but filter; it will never explain the origin of a trait, whether we speak of types or tokens, of one step or many steps. Cumulative selection, if it is to be creative, must be distinct from iterated filtering. If evolution is simply a process of mutation, heredity, and selection (and selection is just filtering), then selection does not explain how the filtered entities got their traits: mutation does. Linda the leopard got her spots by accident, her children kept them by heredity, and they are relieved of the company of unspotted beasts by the sieve of selection. But if the difference between spotted and spotless leopards depends on more than one base-pair substitution, then the explanation of how the leopard got his spots is not just one accident but a sequence of perhaps dated accidents. This seems to be the basic conceptualization of selection in Jerry Fodor's notorious recent rejection of natural selection, which he takes to consist "of a random generator of traits and a filter over the traits so generated. And that is all."29 Let us take up the systematic question whether "variational explanation" (selection) can in fact explain the origin of a trait. Sober (Fig. 3) imagines a lineage of organisms (1–17) through five generations (I–V). The lineage was originally characterized by trait B, which in one individual of generation II mutates to G and quickly goes to fixation: Think of natural selection in generation II as doing exactly two things; it kills organism 2 before the organism reaches reproductive age and it causes organism 3 to have two babies. Given these events in Generation II, how could they affect the traits of individuals 4 and 5 in Generation III? How could killing organism 2 cause organisms 4 and 5 to be G rather than B? And how could allowing organism 3 to have two babies, rather than some different number, influence the traits that those babies come to have?30
28 Dobzhansky (1970, p. 210). 29 Fodor and Piattelli-Palmarini (2010, p. 14). 30 Sober (1995).
Fig. 3 (Sober 1995)
Sober rightly asks how killing a particular organism in the F2 generation could cause some other organism in the F3 to have property G instead of property B. After all, the F3 organism can only get G or B by heredity or mutation. But since the simplifying assumptions of his model allow neither for sex nor for recombination, the most we can conclude from the example is that selection without recombination could not explain the origin of a trait. The reasoning is thus not very convincing; it makes its case by stipulation, not by argument. However vague the advocates of the creative view may be about how natural selection "creates," recombination is nonetheless pretty obviously taken to be essential. Basically, if we define a trait as the phenotypic expression of a mutation, then natural selection cannot explain the origin (arrival) of a trait because it cannot explain why a mutation occurs. Sober builds this conceptualization of a trait into his example and derives from the example precisely what he has built into it. Neander31, one of the few philosophers to offer counterarguments, points out that "when selection is followed by further genetic recombination and mutation: preceding selection can dramatically alter the probability of subsequent gene sequences occurring"; but she does not follow up on this with an unequivocal assertion that natural selection recombines, and in fact she seems here to accept the conceptualization of selection as something quite distinct from recombination. Supporters of Sober in this regard, while attempting to refine the argument, generally stay within the framework: Walsh32 embellishes on the example, but still retains the basic structure: a haploid organism with no sex, no recombination. Thus the results are no more relevant to the question at issue here: Can recombination make selection creative?
31 Neander (1995, p. 76). 32 Walsh (1998).
Walsh defines selection in such a way that it cannot recombine:33 Natural selection sorts variants … But it does not produce the variants, combine them, or transmit them from parents to offspring. … It is perhaps the most distinctive contribution to Darwin's original theory that it makes this shift from individual thinking to population thinking.
This view of selection, however, would seem to have the same unwanted consequences as that of De Vries and Morgan. What the argument states is that natural selection explains only the prevalence of certain genetic plans but not the content of any particular plan. But what explains the content?—Heredity and mutation: 1) the laws of chemistry explain why the DNA chain is replicated accurately; 2) accident (or disturbance) explains why the DNA chain is replicated inaccurately. Let us pretend that the resistance of a bacterium to some antibiotic is due to a particular "gene", and that 10 base pairs have to be replaced in a string of DNA to get to the right sequence from some other sequence fairly common in the strain. How did this gene arise? Well, we know that the chance that any particular base pair substitution occurs at a locus is 10^-k, let's just say 10^-8. Let us assume that it takes us 27 generations to get 2^27 (or about 10^8) instantiations of a locus, and when we have that many we can reasonably expect to find one with the right substitution at the locus in question. Now if we take our resistant bacterium and look back in its genealogy, we find that about 27 generations ago one base pair mutated accidentally, and around 27 generations before that, another pair mutated accidentally, and so on. To trace back the ten mutations needed, we just need to follow the lineage back about 270 generations, more or less. 33 Walsh (1998, p. 252) correctly citing Sober's (1980) disputable reading of Mayr (1963). The opposition between population thinking and individual thinking is a construct: Mayr in fact distinguishes between population thinking and typological thinking, which takes the individuals as representatives of types. Population thinking on the other hand takes individuals in their individuality, viewing them from the perspective of their individual differences from one another. For Mayr population thinking is not opposed to individual thinking—it is individual thinking. For Mayr the philosopher's concern with ensembles and methodological individualism has little to do with the biologist's concern with populations and types. As in the interpretation of Lewontin, Sober first reinterprets Mayr's position and then revises the terminology to fit the interpretation; his successors then take up the new terms, attributing them to Mayr. Furthermore, since Sober's statistical view of things demands implicit reference to the reference classes for which the statistics are taken, there is a strong tendency to reintroduce typological thinking into the discussion under the name of "population thinking". Sometimes Sober's individual thinking corresponds quite well to Mayr's population thinking and his population thinking corresponds fairly well to Mayr's typological thinking. There is indeed a tension between the typological thinking needed to use mathematical models in the first place and Mayr's kind of population thinking that prevented him from profiting much from mathematical biology.
The ultimate explanation of the particular genetic plan is thus: the laws of chemistry, initial conditions, and a series of dated accidents.34 The chance that all ten base pair substitutions occur in the same organism (without natural selection to bias the statistics) is thus about 10^-10k. If we return to Morgan and Hogben's "Mendelian universe" and let a strain of bacteria grow in an infinite bathtub, we can follow this development in the other direction. If the bacteria divide and all live, then about every 27 generations (10^8 bacteria) we may expect to acquire a new mutation in one of the right lineages. But by the time we get all 10 base pair substitutions (after about 270 generations) we have 2^270 (or about 10^80) bacteria, that is, about 10^60 cubic meters of biomass—which would dwarf our solar system. The only way out of this is of course to say that there is selection and that almost all of the 10^16 bacteria expected at the end of the second round are replaced in the struggle for life by those that have the first and the second base pair substitution. Thus, each of the small steps (or all but an occasional one or two steps) on the way to the adaptation, insofar as they are really random, must confer some competitive advantage or other. Without natural selection the natural laws invoked in the above fictional history explain only those things that did not change; all the changes are explained by accident. The laws of chemistry and the initial conditions are, however, the same for all the descendants of the original bacterium; thus all differences in the traits of the descendants would be due to and explained by accident.
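The contrast drawn in this paragraph can be restated as a back-of-the-envelope calculation. The following sketch (in Python; the mutation rate of 10^-8 and the ten required substitutions are simply the figures assumed above) compares the number of organisms needed for all ten substitutions to co-occur by accident alone with the number needed when selection fixes each substitution before the next one is awaited:

    import math

    p_sub = 1e-8   # assumed chance of a particular base-pair substitution at a locus
    n_subs = 10    # substitutions assumed necessary for the hypothetical resistance gene

    # Without selection: all ten substitutions must co-occur in one lineage by accident,
    # so roughly 1 / p_sub**n_subs organisms are needed -- about 10^80.
    organisms_without_selection = 1 / p_sub**n_subs

    # With selection: each substitution only has to appear once among about 1 / p_sub
    # organisms (roughly 2^27 of them), after which selection drives it to fixation and
    # the wait for the next substitution starts from that new baseline.
    organisms_per_step = 1 / p_sub
    generations_per_step = math.ceil(math.log2(organisms_per_step))  # about 27

    print(f"{organisms_without_selection:.0e}")                 # ~1e+80
    print(f"{organisms_per_step:.0e}")                          # ~1e+08
    print(generations_per_step, n_subs * generations_per_step)  # 27 270

This is only a restatement of the arithmetic in the text, not a population-genetic model; it ignores reverse mutations and everything else the footnote flags.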
SOLVING SOBER'S PROBLEM
According to Darwin's model of selection, everything that the breeder does to his animals is part of selection. Thus whatever artificial selection does or can do is also part of selection.35 The breeder does not cause mutations, but he may recombine. Sober's rhetorical flourish, "how can killing X confer a particular trait on Y?", becomes radically less convincing as soon as we allow compound traits to be traits.36 Let's say that A is a trait and B is a trait and that AB is also a trait, not just the coexistence of distinct traits A and B in one organism. Call this new trait Q. Can killing an organism with some trait not-A in the F1 generation cause an organism in F2 to have the property Q (=AB)? Well, let's take a concrete example. Assume a population of 200 sheep in which one ram is A but not B, all the other 34 See Ayala (2004, p. 61) for similar arguments about the first two mutations in the series. To give more than a rough estimate we would of course have to take into account the chance of reverse mutations and many other things. 35 In spite of an occasional tendency in Darwin also to view domestication as a cause of non-directed variation, we should view the breeder as selecting only. On the role of animal breeding as an evolutionary experiment see Rheinberger and McLaughlin (1984). 36 For one of the few serious reflections on what exactly traits are and how to individuate them, see Allen (2001).
sheep are not-A, and the ewes are also more or less evenly B, C, D, E, or F: Can a breeder, by applying selection, cause all sheep to be Q in the F2 or F3 or sometime later? Well, yes. If I kill (or otherwise exclude from reproduction) the 99 rams in the population that bear trait not-A, preserving only the one ram with trait A, and breed this ram with all the ewes, then I will probably get a few rams and ewes with the new compound property Q even in the first season. Note that Q (AB) did not yet exist in the F1 population: it was not hidden from sight, it was not written in invisible ink, it was not there. Depending on how drastic I make selection for the compound trait, I will soon have a whole flock of Q sheep, whose identifying trait Q is a product of selection. The trait did not exist in the original population; the Q type was not instantiated. I have created it by recombination. The death—or rather the exclusion from reproduction—of the non-A organisms in F1 is essential in order to skew the probabilities in F2 in favor of the otherwise unlikely combination AB=Q. Thus Darwin's breeder, by selecting A-organisms and selecting B-organisms, that is, by recombining traits A and B, can produce Q. What the breeder does here can be viewed as an experiment in recombination: One chosen trait known to be hereditary is recombined in every way possible given the actual variation in the (female) population, and the best new combination is then selected for further breeding in the next generation. If the relevant trait of the ram is A and 20% of the ewes have trait B, then the chance that the combination AB occurs in any given member of the population has risen from 0.002 to 0.2; and the chance that it occurs at all in the population is close to certain. This is how the breeder produces new traits by selection: By systematically combining a desired trait with other traits available in the population he can, at least sometimes, produce traits that did not exist at all in the original population—even in the absence of new mutations. In Mayr's words: "Characters are the developmental product of an intricate interaction of genes and since it is selection that 'supervises' the bringing together of these genes, one is justified in asserting that selection creates superior new gene combinations."37 Selection is not merely eliminative: The breeder not only gelds most of the rams (thus eliminating them), he also breeds the remaining rams with all the female phenotypes in the flock. Recombination is part of selection as long as selection is conceptualized on the model of animal breeding. But this may not be precisely the kind of example Sober envisioned. He may actually not mean to ask how killing one organism in F1 can cause another organism in F2 to have one trait instead of another. What he seems to ask is how a past event that is isolated from the causal history of a given individual, identified by its genealogy, could affect that individual's traits. Well, it can't—if the event is indeed isolated and the individual is indeed identified by its genealogy. But Sober gives no argument that this must be the case in order for natural selection to explain adaptation—after all, we are dealing with competition for resources and the 37 Mayr (1966, p. 202).
demise of competitors must be relevant. What Sober seems to have in mind is the following scenario: The breeder undertakes to produce a particular sheep in the F3 identified as the son of Joe and Sue in the F2, and grandson of Jack and Jill and Pete and Maud in the F1 etc. Can he give this particular future individual a particular mutation-based trait that none of its ancestors have? No! But the breeder’s task is not to see to it that some future individual John, who is to be the fifth son of the third son of the seventh son of Jack, has a particular new trait; his task is rather to produce a trait in some individual or other and then breed it true. Some supporters of Sober38 have indeed taken the discussion in this direction appealing to a particular understanding of the identity of individuals and thus linking their interpretation of natural selection crucially to a particular metaphysics such that their arguments are only plausible if one accepts the essentialist metaphysics to which they subscribe. This development more or less abandons the analysis of natural selection as a theory and enters into general metaphysics: If there had been any difference at any point in the entire history of the production of an individual, it would not have been the same individual. Basically, if one DNA chain anywhere in someone’s billion-year genealogy had split up or mutated differently, he would not have had different properties, he would just have been someone else. But science does not normally strive to explain individual events. And the original question was not what it means to be a particular individual but whether natural selection explains the production of adaptations or just their preservation: Darwin thought that natural selection explained the arrival of the fittest.
REFERENCES
Garland E. Allen, Thomas Hunt Morgan. The Man and His Science. Princeton University Press 1978.
Francisco J. Ayala, "Teleological Explanations in Evolutionary Biology", in: Philosophy of Science 37, 1970, pp. 1-15.
Francisco J. Ayala, "Design without a Designer: Darwin's Greatest Discovery", in: William A. Dembski and Michael Ruse (Eds.), Debating Design: From Darwin to DNA. New York: Cambridge University Press 2004.
Charles Darwin, The Variation of Plants and Animals under Domestication. (2nd ed.) London: Murray 1875.
Charles Darwin, The Origin of Species. (6th ed.) New York: Appleton 1895.
Hugo De Vries, Die Mutationstheorie. Versuche und Beobachtungen über die Entstehung von Arten im Pflanzenreich. 2 vols., Leipzig: Veit 1901–1903.
Hugo De Vries, Species and Varieties. Their Origin by Mutation. Chicago: Open Court 1904.
38 e.g. Pust (2001; 2004).
Theodosius Dobzhansky, Genetics of the Evolutionary Process. New York: Columbia University Press 1970.
John A. Endler, Natural Selection in the Wild. Princeton, N.J.: Princeton University Press 1986.
Jerry A. Fodor and Massimo Piattelli-Palmarini, What Darwin Got Wrong. London: Profile Books 2010.
Douglas J. Futuyma, Evolutionary Biology. (3rd ed.) Sunderland, Mass.: Sinauer 1998.
Jean Gayon, Darwinism's Struggle for Survival: Heredity and the Hypothesis of Natural Selection. Cambridge University Press 1998.
Errol E. Harris, "Darwinism and God", in: International Philosophical Quarterly 39, 1999, pp. 277-290.
Lancelot Hogben, The Nature of Living Matter. London: Kegan Paul 1930.
François Jacob, "Evolution and Tinkering", in: Science 196, 4295, 1977, pp. 1161-1166.
Richard Lewontin, "Darwin's Revolution", in: New York Review of Books, June 16, 1983a.
Richard Lewontin, "The Organism as Subject and Object of Evolution", in: Scientia 118, 1983b, pp. 63-82. Reprinted in: Richard Levins and Richard Lewontin: The Dialectical Biologist. Harvard University Press, 1985, pp. 85-106.
Ernst Mayr, "Accident or Design. The Paradox of Evolution", in: Geoffrey W. Leeper (Ed.), The Evolution of Living Organisms. Melbourne: Melbourne University Press 1962.
Ernst Mayr, Animal Species and Evolution. Cambridge, Mass.: Belknap Press 1963.
Ernst Mayr, Toward a New Philosophy of Biology: Observations of an Evolutionist. Cambridge, Mass.: Harvard University Press 1988.
Thomas Hunt Morgan, The Scientific Basis of Evolution. London: Faber 1932.
Karen Neander, "Discussion: What Does Natural Selection Explain? Correction to Sober", in: Philosophy of Science 55, 1988, pp. 422-426.
Karen Neander, "Pruning the Tree of Life", in: British Journal for the Philosophy of Science 46, 1995, pp. 59-80.
Karen Neander, "Explaining Complex Adaptations: A Reply to Sober's 'Reply to Neander'", in: British Journal for the Philosophy of Science 46, 1995a, pp. 583-587.
Robert Nozick, Anarchy, State, and Utopia. New York: Basic Books 1974.
Joel Pust, "Natural Selection Explanation and Origin Essentialism", in: Canadian Journal of Philosophy 31, 2001, pp. 210-220.
Hans-Jörg Rheinberger and Peter McLaughlin, "Darwin's Experimental Natural History", in: Journal of the History of Biology 17, 1984, pp. 247-268.
George G. Simpson, The Meaning of Evolution. (2d ed.) New Haven: Yale University Press 1967.
Elliott Sober, "Evolution, Population Thinking and Essentialism", in: Philosophy of Science 56, 1980, pp. 350-383.
Elliott Sober, "Force and Disposition in Evolutionary Theory", in: Christopher Hookway (Ed.), Minds, Machines and Evolution. Cambridge: Cambridge University Press 1984a.
Elliott Sober, The Nature of Selection. Cambridge, Mass.: MIT Press 1984b.
Elliott Sober, The Philosophy of Biology. Boulder, Colorado: Westview Press 1993.
Elliott Sober, "Natural Selection and Distributive Explanation: A Reply to Neander", in: British Journal for the Philosophy of Science 46, 1995, pp. 384-397.
Alfred Russel Wallace, Darwinism: An Exposition of the Theory of Natural Selection with Some of its Applications. London: Macmillan 1889.
Denis M. Walsh, "The Scope of Selection: Sober and Neander on What Natural Selection Explains", in: Australasian Journal of Philosophy 76, 1998, pp. 250-264.
Denis M. Walsh, "Chasing Shadows: Natural Selection and Adaptation", in: British Journal for the Philosophy of Science 48, 2000, pp. 135-153.
Denis M. Walsh, Tim Lewens and André Ariew, "The Trials of Life: Natural Selection and Random Drift", in: Philosophy of Science 69, 2002, pp. 452-473.
Department of Philosophy University of Heidelberg Schulgasse 6 69117 Heidelberg Germany [email protected]
THOMAS A. C. REYDON
THE ARRIVAL OF THE FITTEST WHAT ?
ABSTRACT
Biologists and philosophers of biology usually use the phrase "the arrival of the fittest" in the context of discussions on the following question: does natural selection only explain why some organismal traits (i.e., adaptations) are preserved and spread while other traits are not, or does it explain their origin too? In these discussions, the mechanism of natural selection is often compared to a sieve through which some organism or trait types pass while others do not. In the present paper, I shall raise two worries with respect to this analogy and attempt to show that using the analogy does more harm than good when one wants to know what reference to natural selection can explain.
1. INTRODUCTION: DOES SELECTION EXPLAIN THE ARRIVAL OF THE FITTEST?
When biologists or philosophers of biology use the phrase "the arrival of the fittest", they usually refer to a particular line of discussion that began around the turn of the nineteenth to the twentieth century and concerns the question how natural selection works.1 The overarching question in this discussion is what the mechanism of natural selection does and, consequently, what exactly can be explained by invoking natural selection as an explanatory factor. Of course, this presupposes that natural selection can be conceived of as a mechanism or process that actually does something – a presupposition which itself is a topic of philosophical debate. A number of authors have argued that the dynamical/causal picture of selection as doing/causing something is mistaken and that selection should rather be understood as a statistical account of what happens in populations.2 For the sake of the argument, however, here I shall ignore this issue and won't question the basic assumption that selection does something.
1 This paper was conceived as a commentary on Peter McLaughlin's paper, 'The arrival of the fittest' (this volume) and as such addresses some of the issues that are raised there. Both papers were delivered at the workshop Explanation, Prediction, and Confirmation in Biology and Medicine (ESF research network The Philosophy of Science in a European Perspective, University of Konstanz, October 2009).
2 e.g. Walsh, Lewens & Ariew (2002); Walsh (2004)
The issue that is usually in focus under the guise of the phrase “the arrival of the fittest” is this: does natural selection only explain why some organismal traits – that is, adaptations – are preserved and spread while other traits are not, or does it explain their origin too? Is natural selection merely an eliminative factor that doesn’t produce evolutionary novelty but only eliminates part of the available variation, or is it a creative factor that both eliminates inferior types and creates novel ones? Before looking at this question in some more detail, it should be noted that this is a different question from an issue on which discussions of what selection explains also often focus. For many authors, the question is whether selection only explains the origin, preservation and spread of trait types, or also explains the presence of a particular trait token in a particular individual organism.3 This issue is not in focus here either; here, the question is whether natural selection can be conceived of as a creative factor at all. In discussions on this question, the mechanism of natural selection is often compared to a sieve through which some organism or trait types pass while others do not. In the present paper, I shall raise two worries with respect to this analogy and attempt to show that using the analogy does more harm than good when one wants to know what reference to natural selection can explain.
2. THE SIEVE ANALOGY OF NATURAL SELECTION
One might think (and McLaughlin, this volume, suggests this) that most biologists endorse a view of natural selection as a creative factor and even "tend to have an intuitive commitment to the creative view of selection" (McLaughlin, this volume), while the view that natural selection is merely eliminative is more likely to be found among philosophers who critically look at biology from the outside. I'm not sure that this is true, especially since (as will be discussed below) the eliminative view is a product of biological theorizing too and was advocated by a number of prominent biologists in the period before the establishment of the Modern Synthesis. But be this as it may, an important reason why one indeed might expect biologists to prefer the creative view of selection is that it traces back to Darwin's own view of selection, in particular to his claim that natural and artificial selection work in the same way4. As Darwin pointed out with respect to artificial selection, breeders do not just pick the organisms they want from the spectrum of already available forms and preserve the desired form by breeding true. Rather, they breed further by picking those organisms with slightly more desirable traits from each generation, crossing these and in so doing accumulating small differences and creating new forms that
3 e.g. Sober (1995, p. 385); Walsh (1998, p. 263); Matthen (1999, p. 145); Walsh, Lewens & Ariew (2002, p. 469); Pust (2004); Stegmann (2010, p. 61)
4 Darwin (1858, pp. 51-52; 1859, p. 32)
didn’t exist before. For Darwin, natural selection worked in this way too. This creative aspect was precisely what in Darwin’s view made natural selection explanatorily relevant: would selection only account for the preservation of already present forms, it would not be very interesting or of much explanatory value with regard to organismal forms or traits. As Darwin put it: [i]f selection consisted merely in separating some very distinct variety, and breeding from it, the principle would be so obvious as hardly to be worth notice; but its importance consists in the great effect produced by the accumulation in one direction […] of differences absolutely inappreciable by an uneducated eye.5
This seems to be a strong argument in favor of the creative view of selection: from Darwin onward, biologists have routinely and successfully invoked natural selection as an explanatory factor that accounts for the production of novel organismal traits as well as their preservation and the elimination of old ones – and this is a practice that would be entirely misguided if natural selection in fact was unable to explain the phenomena under consideration. But it might of course be the case that biologists have long followed a misguided practice. Indeed, comparatively soon after the publication of the Origin of Species, biologists have begun to criticize Darwin’s theory for not being able to explain where organismal traits come from. Among the early biologists who have pointed to this issue is the Dutch biologist Hugo De Vries. As De Vries put it, “[n]atural selection may explain the survival of the fittest, but it cannot explain the arrival of the fittest”.6 De Vries’ concern was that, as a causal factor in nature, natural selection needs material to work with. If we begin with a population of organisms that exhibit variation in their degrees of adaptedness to the environment they live in, some organisms will have an advantage over other organisms in the population, such that they will be able to produce more offspring to which their traits are inherited than the others. The advantageous traits will then become more common in the 5
6
Darwin (1859, p. 32). According to Paul (1988, pp. 417-418), the idea of selection as an eliminative factor long predated Darwin and Darwin’s crucial insight was that selection was not merely an eliminative but also a creative process. While this may be right, I still believe Darwin was exaggerating when he wrote that selection as a merely eliminative/preserving factor “would be so obvious as hardly to be worth notice”. Purely preserving/eliminative natural selection would be worth notice, as it still would have some explanatory value with respect to the question why we have the biodiversity in the world that we do, rather than a smaller, larger or just different diversity of forms. That is, while it wouldn’t explain the existing forms themselves, it would explain the currently existing diversity of forms. Accordingly, the question at stake here is not whether natural selection explains anything at all, but whether it only explains the preservation of traits, or their origins too. (For a standard list of what selection is thought to explain, see Stegmann 2010, p. 61.) De Vries (1904, pp. 825-826).
population, possibly even displacing other varieties. But where does the adaptive variation in the population that we started with come from in the first place? The theory of evolution by natural selection, according to De Vries (as well as a number of other prominent authors, including Thomas Hunt Morgan), could only explain why some phenotypes remained in existence and spread, while other phenotypes did not. It could not explain how these phenotypes came into being because, as De Vries put it: "Natural selection is a sieve, it creates nothing, […] it only screens. It retains only what variability offers it" (De Vries, Mutationstheorie; quoted in McLaughlin, this volume). The principal problem that Darwin's and Wallace's work left unresolved, according to De Vries and other authors, was to explain how phenotypic novelties (the material on which natural selection could operate) could arise. Both biologists and philosophers of biology have picked up on De Vries' sieve analogy in later discussions of how selection works. However, as McLaughlin (this volume) suggests, an examination of how various authors write about De Vries' sieve analogy shows that two quite different interpretations of the analogy have been endorsed. When using the analogy, some authors describe natural selection as a sieve that actively retains superior organism or trait types for the next generation7, while others describe natural selection as a sieve that passively lets through better adapted organism or trait types to the next generation, blocking the way for inferior ones.8 The difference between the two interpretations can be illustrated by means of a comparison with a simple sand sieve.9 Suppose you are sieving a mixture of sand and gravel, shoveling it through a large sieve and collecting what falls through the sieve in a large bucket. Depending on what it is that you are interested in, fine-grained sand or coarse-grained gravel, you'll view this sieving process differently. If it's the sand that you're after, you'll think of the process as filtering out everything that is unwanted. What is retained is the sand that passes unhindered through the sieve – that is, on which the sieve has not acted. If it's the gravel that you want, you'll view the process as picking out just those grains that have the desired minimum size. What is retained is the gravel that has been stopped by the sieve – that is, on which the sieve has acted. In biological terms, these views are unpacked as follows. On the one view, selection is conceptualized as a negative force: what selection does is to weed out those organism or trait types that fail to meet the criteria. Those who remain in the population are those that selection has not affected and that, therefore, cannot be explained as products of the operation of selection. On the other view, natural selection is a positive force: what selection does is to retain those organism or trait types that do meet the criteria. Those who are removed from the population are the ones that natural selection has left
7 e.g. Ayala (1970, p. 5); Gould (2001, p. 215; 2002, p. 621)
8 e.g. De Vries (1904); Sober (1984, pp. 147-155)
9 The illustration is McLaughlin's in the workshop version of his paper.
alone; those that remain are those upon which selection has acted. The difference between the two views is thus a difference from the perspective of the organism or trait types that find themselves in the next generation: the question is whether natural selection has affected these. Here a second dichotomy comes into focus. The dichotomy between the views of natural selection as an eliminative factor and as a creative factor – the one that is traditionally in focus in discussions on the "arrival of the fittest" and was discussed in Section 1 – is a difference in opinion about what the result of the operation of natural selection is. The second dichotomy, the one that now comes into focus, is a difference in opinion between two views of how the mechanism of natural selection works. McLaughlin gives these two dichotomies a prominent role in the discussion on the question what selection explains. Early on in his paper, McLaughlin writes that "[w]hile the positive or creative view of selection is associated with the breeder's model of the creation of new forms, the negative or eliminative view is strongly associated with the analogy between natural selection and a sieve" (McLaughlin, this volume). Indeed, Darwin's breeder analogy nicely fits the view that selection retains those organism or trait types that meet the criteria in the sieve: in the same manner as breeders create novel forms by retaining certain organisms and breeding with these, natural selection retains certain types and "works" further with these. An argument to the effect that the sieve analogy is inadequate as a description of how natural selection works would, on this alignment of views, yield indirect support for the creative view. At a later stage of the paper, however, McLaughlin suggests that both authors who endorse the eliminative view of selection and those who think of selection as a creative factor use the sieve analogy to strengthen their position. However, the two factions entertain different interpretations of what the sieve analogy means: those who think of natural selection as a creative factor often also think of it as a sieve that actively retains superior organisms or traits of superior types, while those who endorse a view of natural selection as purely eliminative often think of natural selection as a sieve that just passively lets through better adapted organisms or traits of more adaptive types. And indeed, there seems to be a good reason to align views this way: it is difficult to think of a passive factor as creating something, such that a view of natural selection as a creative factor needs to involve some sort of active conception of selection. If this alignment is right, an adequate criticism of the letting-pass view of selection would yield some indirect support for the creative view of selection. On either of these two alignments, though, it seems that the sieve analogy can be used to decide between the eliminative and creative views of selection. My aim here cannot be to decide whether the eliminative or the creative view of natural selection is correct – whether reference to selection explains only the preservation of traits, or their origins too. Rather, in the remainder of this paper I want to raise two worries with respect to the sieve analogy and attempt to show that using the analogy does more harm than good when one wants to know what references to natural selection can explain.
3. TWO INTERPRETATIONS OF THE SIEVE ANALOGY: A FALSE DICHOTOMY According to McLaughlin (this volume), “[w]hen we compare selection to a sieve, we have to ask whether […] the winners are in the sieve and thus change their trajectories or whether they are in the bucket and remain unaffected by the sieve of selection.” But I’m not convinced that this is a question of major importance when addressing the question what natural selection explains. It seems that one can easily make too much of the dichotomy between the two interpretations of the sieve analogy that were discussed above. In fact, I want to suggest that it is a false dichotomy and, by consequence, it is not useful for addressing the question whether natural selection is a purely eliminative factor or also a creative factor. The dichotomy rests on the implicit assumption that the letting-pass and the retentive views of selection are incompatible: either selection is a retaining mechanism, or a mechanism that lets things through, but not both. But this assumption is questionable and therefore, I think, arguing for the creative view by way of criticizing the letting-pass interpretation of the sieve analogy is not a successful strategy. Let me elaborate this point further. In the sieve analogy retention and letting through merely are two faces of the same coin. The retaining and letting-pass views are the same view cast in different words in that they are mirror images of one another: the organism or trait types that natural selection lets through the sieve’s grating on the one view are just those that natural selection retains on the other view. What is taken home and what is thrown away on one formulation of the analogy is just the reverse of what is taken home and what is thrown away on the other formulation. An indication of this mirror symmetry of the analogy when used in a specifically biological context is the fact that authors sometimes use formulations of the sieve analogy that can be translated into their reverse without any loss of meaning or clarity. Consider Gould’s formulation of the sieve analogy: The sieve is shaken, and particles of a certain size become concentrated while others pass through the webbing (lost by selection). Sieving represents the causal act of selection – the interaction of the environment (shaking the sieve) with varying individuals of a population (particles on the sieve). As a result of this interaction, some individuals live (remain on the sieve), whereas others die (pass through the sieve).10
Gould could have given exactly the same analogy aligning remaining alive with passing though the sieve and dying with remaining on the sieve without any difference in content. Another example can be found in Rosenberg & Arp’s introduction to their recently published anthology on the philosophy of biology11. Rosenberg & Arp write that “natural selection acts like a sieve that allows for a certain phenotypic characteristic to pass through to a subsequent generation”, the various spe10 Gould (2001, p. 216; also 2002, p. 621) 11 Rosenberg & Arp (2010)
cies that we find today being “those organisms that have made it through one of these environmental sieves”.12 This is the precise opposite of Gould’s sieve analogy. If we follow the analysis of the sieve analogy discussed above, Rosenberg & Arp here clearly advocate a view of selection as a passive, non-creative factor in nature and oppose Gould’s view of how selection works. However, reading Rosenberg & Arp’s text as a clear instantiation of the eliminative view of selection would, in my view, amount to reading too much into it. Even if Rosenberg & Arp actually do endorse the eliminative view of selection (which I don’t know), their writing provides an example that one account can easily be transformed into the other, and back, without changing our understanding of the mechanism of selection. They write in a positive mode: We can define natural selection as a mechanism of evolution by which the environment favors the reproductive success of individuals possessing desirable genetic variants with greater phenotypic fitness, increasing the chance that those genotypes for the phenotypic traits will predominate in succeeding generations.13
Now change all the positive terms into negative ones and see what kind of statement we get: We can define natural selection as a mechanism of evolution by which the environment disfavors the reproductive success of individuals possessing undesirable genetic variants with smaller phenotypic fitness, decreasing the chance that those genotypes for the phenotypic traits will predominate in succeeding generations.
What has changed with respect to our understanding of how natural selection works? Nothing, I contend: the retaining and letting-pass modes contribute the same to our understanding of how natural selection works. The reason for the mirror symmetry of these formulations is that the action of natural selection is comparative: some types are favored over other types, which is just to say that some types are disfavored in comparison to other types. Natural selection is a mechanism that both retains things and let things through – just as sieves do. Concerns whether the retaining view or the letting-pass view better represents how selection operates thus are misguided. It needs to be kept in mind that both are intended as analogies, not as actual explications of the mechanism of natural selection. As analogies, they are epistemological, didactical and rhetorical tools that can be used to increase clarity, achieve understanding, teach and communicate ideas, convince audiences, etc. Both the retaining view and the letting-pass view of selection work equally well in these respects. But if taken as descriptions of the actual mechanism under consideration, both may easily seduce the audience
12 Rosenberg & Arp (Ibid., pp. 2-3, my emphases) 13 Rosenberg & Arp (Ibid., p. 2, their italics)
to think of natural selection as working in a particular way in which it does not in fact work. In my view the alleged opposition between the retaining and letting-pass views is thus rooted in an analogy being taken too literally. The claim that on the letting-pass view, the winners are those that selection has left alone, whereas on the retentive view the losers are those that selection has left alone is nothing more than an artifact of the analogy. McLaughlin writes that "the use of the same analogy conceals a deeper disagreement on what it signifies" (this volume). In contrast, I think that disagreement about which interpretation of the sieve analogy is correct can suggest a deeper disagreement about the meaning of the analogy than is actually there. On both views, selection acts in the same way: it separates the gravel from the sand resulting in the same outcome, gravel in the sieve, sand in the bucket. But this is only one step in the process. Which of the two will be taken home, the sieve full of gravel or the bucket full of sand, is decided in a step taken after the sieving is over; this decision is not part of the sieve's operation itself. Thus, on either of the two interpretations the sieve analogy captures natural selection's separating activity, but fails to capture the possible creative aspect of natural selection. If natural selection is a creative factor, the creative part follows after the sieving itself is over. To illustrate this, consider how Ernst Mayr describes the mechanism of natural selection. Mayr was one of the principal advocates of the view that selection not only eliminates existing organism or trait types, but also creates novel ones (McLaughlin, this volume). For Mayr, as well as for most adherents of the Modern Synthesis view of evolution (and, as McLaughlin points out, for many contemporary textbooks too), the creation of evolutionary novelty was due not so much to mutation as to recombination. But, McLaughlin points out, recombination and selection aren't completely independent processes. One result of the operation of natural selection is that the probabilities of the various possible combinations change: some traits become more widespread and thus come to have a larger probability of ending up in combinations with other traits, while other traits become scarcer and thus come to have a smaller recombination probability. This, McLaughlin argues, is what the eliminative view of natural selection misses: "[t]he problem is not so much that [De Vries and Morgan] didn't allow recombination as a kind of variation but that they didn't recognize recombination as an aspect of selection" (this volume). Therefore, on McLaughlin's view recombination is best considered as a part of the process of natural selection. On such a conception, selection indeed does something more than just eliminating some traits from the available spectrum, namely recombining genetic material in such a way that novel traits can arise. It is a view of selection close to Darwin's own, as it conceives of natural selection as acting in a manner similar to the way breeders do: by crossing those organisms that exhibit traits they are interested in, breeders create new organismal forms. In this way, natural selection as a whole can be thought of as a creative factor.
However, this does not imply that within the mechanism of natural selection no distinction can be made between sieving and recombination. According to Mayr, whose writing on the topic is not always unequivocal, “the creative component of natural selection […] is due to the alternation between genetic recombination and reproductive success”.14 Mayr also often spoke of natural selection as a two-step process, one step being the production of variation and the other step being “selection proper”.15 Thus, Mayr distinguishes between “selection proper”, (i.e., the sieving mechanism) and the production of variation (i.e., recombination) and conceives of both as components of natural selection which operate separately but not independently.16 The sieve analogy, then, captures only one out of two possible components of natural selection. It captures Mayr’s “selection proper” but not recombination – if recombination is counted as an element of selection at all, that is. And this is the case for both interpretations of the sieve analogy – the sieve that actively retains and the sieve that passively lets through. The real issue in the discussion of whether natural selection only explains the “survival of the fittest” or the “arrival of the fittest” too, indeed is, as McLaughlin argues, whether natural selection encompasses recombination as a second, creative component in addition to the sieving (or rather, separation) mechanism. The question is whether recombination is a component of natural selection (as Mayr and most biologists who endorse the Modern Synthesis view of evolution do) or not (as De Vries and Morgan did).17 But the opposition between the two interpretations of the sieve analogy is not relevant in the context of this discussion. McLaughlin and I agree that the sieve analogy is misleading, but we find it misleading for different reasons.
4. WHAT NATURAL SELECTION EXPLAINS AND WHAT IT DOES NOT
The second worry that I want to express is that, besides not being particularly relevant to the discussion of what reference to natural selection explains, the sieve analogy also covers up an important part of what is at stake in this discussion. The sieve analogy suggests that we have a dilemma at hand: either natural selection explains the origin of organismal adaptations (that is, the sieve is a creative factor), or it does not (the sieve only passively lets through some of the already existing organism or trait types). This is also the way in which the question is often formulated: does natural selection only explain the "survival of the fittest", or does it explain the "arrival of the fittest" too? But it seems that a third option is open:
14 Mayr (1982, p. 519, emphasis added)
15 e.g. Mayr (1997, p. 188; 2004, pp. 136-138)
16 Mayr (2004, p. 137) adds that the two components operate simultaneously, rather than sequentially.
17 Here I do not wish to take sides on this matter.
natural selection explains the origin of adaptations to some extent, but it does not fully explain them. It is interesting to note that a number of contemporary biologists have hinted in this direction, signaling that the question of the "arrival of the fittest" still remains an open problem. As recently as 15 years ago, biologists – not philosophers, mind you! – Walter Fontana and Leo Buss complained that evolutionary theory still confronted the problem that was formulated by De Vries. (Fontana & Buss called the problem the "existence problem".) According to Fontana & Buss,
The principal triumph of the modern synthetic theory of evolution was to recognize that alleles, individuals and populations were entities relevant to evolution and that evolutionary changes in allele frequencies were governed by mutation, selection and drift. From this realization came a rigorous, albeit loosely connected, body of theories that is widely acknowledged as collectively providing the formal basis of our understanding of evolution. Despite the undoubted success of population genetics and, by extension, population biology, this formulation of evolution suffers from an existence problem. […] Present theory tacitly assumes the prior existence of the entities whose features it is meant to explain.18
That is, the present theory assumes the existence of organisms with varying degrees of adaptedness, rather than explaining it. And:
Biologists today, as then, have no coherent theory of variation. […] The origin of the phenotype – De Vries' arrival of the fittest – is swept under the carpet. […] Existing evolutionary theory is a formulation of the process of natural selection, but is incomplete in that it assumes the prior existence of selectable units and is formalized without a theory of the origins of variation. The Darwinian theory, thus, requires augmentation with a theory of the organism.19
To be sure, what Fontana & Buss were complaining about is not that biology had not identified a cause of the origin of evolutionary novelty simpliciter. After all, how new genotypes come into being through mutation and recombination was already known in detail. Rather, they complained that making reference to mutation and recombination (regardless of whether the latter is seen as a component of the process of natural selection or as a separate process) did not sufficiently explain variation on the level of the phenotype. In their view, there still was an important black box in the explanatory chain between the varying genotype on the one hand and the varying phenotype on the other hand. The creation of new genotypes could be explained by mutation and recombination, but the creation of new phenotypes (that are “visible” to natural selection)20 still was not fully explained. 18 Fontana & Buss (1994, p. 2) 19 Fontana & Buss (Ibid., pp. 56, 59, my emphasis) 20 “Let us remember also that phenotypes are the object of selection, not single genes.” (Mayr 2004, p. 137). In a recent commentary, Block & Kitcher (2010) objected to describing traits as being “visible” or “invisible” to selection, as in their view this metaphor too much treats selection as if it were a conscious actor. To be sure, as with
all metaphors, this one should not be taken too literally. However, I think the metaphor is useful and comparatively harmless if read in the appropriate manner.
Thus, Fontana & Buss’s “existence problem”, which they themselves equate to De Vries’ “arrival problem”, is not precisely the same problem as the problem that De Vries highlighted. For De Vries, writing at the beginning of the twentieth century, the problem was that natural selection was not a cause of novel organismal forms and nothing much was known about how evolutionary novelty arises. At the time of and immediately after De Vries, this simultaneously implied the question how novel genotypes originated and the question of the origin of novel phenotypes. After all, when the notion of ‘gene’ was first introduced in 1909 a gene simply was nothing more than whatever factor in the gametes co-determined a trait of a developing organism.21 Thus, a gene was a “gene for” a particular trait in the sense that a difference in the gene caused a difference in the trait.22 In this way, individual genes mapped directly onto individual traits, such that the problem of the origin of novel phenotypes was indistinguishable from the problem of the origin of novel genotypes. For Fontana & Buss, writing almost a century later, the problem was that even if natural selection could be seen as a cause of evolutionary novelty, the origin of novel traits could not be explained by referring only to selection as the 1-1 mapping between traits and genes had become lost.23 While the problem of the origin of novel genotypes was resolved, the problem of the origin of novel phenotypes remained to some extent open. Over the past decades, biologists have developed a solution to the “existence problem” in the form of the research program of evo-devo.24 Evo-devo emerged in response to among other things the perceived incompleteness of evolutionary theory as described by the “existence problem”. Framed in the terminology of De Vries, the contemporary problem is that while natural selection can sufficiently explain the “survival of the fittest”, with respect to the “arrival of the fittest” it is a necessary part of the explanation but by itself does not constitute a sufficient explanation. Evo-devo, as an attempt to remedy this issue, is a program that doesn’t just bring evolutionary and developmental biology in contact but aims at integrating them into a homogeneous research program that addresses the problems of both the “arrival” and the “survival of the fittest”. It aims to devise an integrated explanation of the origin of organismal forms as well as the preservation and spread of traits and the origin of species.25 This integration is necessary to achieve the necessary explanatory force: the “evo” part comes to the foreground when explaining the preservation of traits and the origin of novelty on the genotypic level and of
21 Johannsen (1926, p. 165)
22 Waters (1994, p. 172)
23 For a history of the gene concept and the changing views of how genes relate to traits, see for example Portin (1993), Waters (1994), Wieben (2003) or Griffiths & Stotz (2007).
24 For a historical overview, see Love & Raff (2003); Laubichler & Maienschein (2007)
25 Love & Raff (2003); Callebaut et al. (2007, pp. 25-30)
species, and the “devo” part comes into play when explaining the origin of form. However, both parts are indispensable elements of the explanations in both cases. Accordingly, proponents of evo-devo sometimes explicitly present their field as investigating both the “arrival” and the “survival of the fittest”.26 So, what seems to have happened in biology in the past decades is the elaboration of a new research program in response to a problem with the existing theory that, although biologists have been pointing it out for about a century, has changed its nature as biological theory advanced. This change in the nature of the problem of the “arrival of the fittest” is covered up when it is addressed in terms of the sieve analogy. In my view, then, the sieve analogy might even lead one to question the need for biologists to find a solution to the “existence problem”, as on one horn of the dilemma there was no such problem. After all, if one accepts with Mayr and others that natural selection explains both the “arrival” and the “survival of the fittest”, the question would be what, if anything, evo-devo as an integration of evolutionary and developmental biology adds to the evolutionary synthesis. I hope to have shown why this suggestion is not warranted. Biologists had good reasons to develop the research program of evo-devo in response to the “existence problem”. The problem is that, following De Vries’ terminology, the problem is sometimes presented as a dilemma: “[t]he question at stake is whether natural selection can explain the origin and production of adaptations or merely the preservation and spread of adaptations” (McLaughlin, this volume). Either it’s arrival plus survival, or only survival. But hidden in the dilemma and covered up by the sieve analogy is a trilemma: either natural selection does not contribute at all to explaining the origin of adaptations (and only explains their preservation and spread), or it contributes to but does not fully explain the origin of adaptations, or it fully explains the origin of adaptations (and there is no need for an additional explanation). And it is precisely the middle option in the trilemma – the correct one, I think – that remains hidden in De Vries’ formulation of the problem. When claiming that natural selection explains the arrival of the fittest, we have to ask: the arrival of the fittest what? Natural selection, if conceived of as including recombination, explains the arrival of fitter genotypes and as such is a necessary element of the explanation of the origin of new organismal forms, but by itself it is not sufficient to explain the arrival of fitter phenotypes.
26 As developmental biologist Scott Gilbert (2000, p. 180), one of the leading figures in contemporary evo-devo, stated: “Developmental syntheses look at the possibilities and constraints for the arrival of the fittest, while population genetics can model their survival. Both approaches are obviously needed to understand evolution.” See also Gilbert’s statements in Yoon (2007).
REFERENCES
Francisco J. Ayala, "Teleological explanations in evolutionary biology", in: Philosophy of Science 37, 1970, pp. 1-15.
Ned Block and Philip Kitcher, "Misunderstanding Darwin", in: Boston Review, March/April 2010.
Werner Callebaut, Gerd B. Müller and Stuart A. Newman, "The Organismic Systems Approach: Evo-devo and the streamlining of the naturalistic agenda", in: Roger Sansom and Robert N. Brandon (Eds.), Integrating Evolution and Development: From Theory to Practice. Cambridge (MA): MIT Press 2007, pp. 25-92.
Charles Darwin, "Abstract of a letter to Asa Gray in 'On the tendency of species to form varieties; and on the perpetuation of varieties and species by natural means of selection'", in: Journal of the Proceedings of the Linnean Society of London (Zoology) 3, 1858, pp. 46-50.
Charles Darwin, On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life, London: John Murray 1859.
Hugo De Vries, Species and Varieties: Their Origin by Mutation, Chicago: Open Court 1904.
Walter Fontana and Leo W. Buss, "'The arrival of the fittest': Toward a theory of biological organization", in: Bulletin of Mathematical Biology 56, 1994, pp. 1-64.
Scott F. Gilbert, "Genes classical and genes developmental", in: Peter Beurton, Raphael Falk and Hans-Jörg Rheinberger (Eds.), The Concept of the Gene in Development and Evolution: Historical and Epistemological Perspectives, Cambridge: Cambridge University Press 2000, pp. 178-192.
Stephen J. Gould, "The evolutionary definition of selective agency: Validation of the theory of hierarchical selection and fallacy of the selfish gene", in: Rama S. Singh, Costas B. Krimbas, Diane B. Paul and John Beatty (Eds.), Thinking About Evolution: Historical, Philosophical, and Political Perspectives. Cambridge: Cambridge University Press 2001, pp. 208-234.
Stephen J. Gould, The Structure of Evolutionary Theory, Cambridge (MA): Harvard University Press 2002.
Paul E. Griffiths and Karin Stotz, "Gene", in: David L. Hull and Michael Ruse (Eds.), The Cambridge Companion to the Philosophy of Biology. Cambridge: Cambridge University Press 2007, pp. 85-102.
Wilhelm Johannsen, Elemente der exakten Erblichkeitslehre, mit Grundzügen der biologischen Variationsstatistik (3. deutsche, neubearbeitete Auflage), Jena: Gustav Fischer 1926.
Manfred D. Laubichler and Jane Maienschein, "Embryos, cells, genes, and organisms: Reflections on the history of evolutionary developmental biology", in: Roger Sansom and Robert N. Brandon (Eds.), Integrating Evolution and Development: From Theory to Practice. Cambridge (MA): MIT Press 2007, pp. 1-24.
Alan C. Love and Rudolf A. Raff, "Knowing your ancestors: Themes in the history of evo-devo", in: Evolution & Development 5, 2003, pp. 327-330.
Mohan Matthen, "Evolution, Wisconsin style: Selection and the explanation of individual traits", in: British Journal for the Philosophy of Science 50, 1999, pp. 143-150.
Ernst Mayr, The Growth of Biological Thought: Diversity, Evolution, and Inheritance, Cambridge (MA): Harvard University Press 1982.
Ernst Mayr, This is Biology: The Science of the Living World, Cambridge (MA): Harvard University Press 1997.
Ernst Mayr, What Makes Biology Unique? Considerations on the Autonomy of a Scientific Discipline, Cambridge: Cambridge University Press 2004.
Peter McLaughlin (this volume, pp. 203-222): "The arrival of the fittest".
Diane B. Paul, "The selection of the 'survival of the fittest'", in: Journal of the History of Biology 21, 1988, pp. 411-424.
Petter Portin, "The Concept of the Gene: Short History and Present Status", in: Quarterly Review of Biology 68, 1993, pp. 173-223.
Joel Pust, "Natural selection and the traits of individual organisms", in: Biology and Philosophy 19, 2004, pp. 765-779.
Alexander Rosenberg and Robert Arp, "General introduction: A short history of philosophy of biology", in: Alexander Rosenberg and Robert Arp (Eds.), Philosophy of Biology: An Anthology. Chichester: Wiley-Blackwell 2010, pp. 1-13.
Elliott Sober, The Nature of Selection: Evolutionary Theory in Philosophical Focus, Cambridge (MA): MIT Press 1984.
Elliott Sober, "Natural selection and distributive explanation: A reply to Neander", in: British Journal for the Philosophy of Science 46, 1995, pp. 384-387.
Ulrich E. Stegmann, "What can natural selection explain?", in: Studies in History and Philosophy of Biological and Biomedical Sciences 41, 2010, pp. 61-66.
Denis M. Walsh, "The scope of selection: Sober and Neander on what natural selection explains", in: Australasian Journal of Philosophy 76, 1998, pp. 250-264.
Denis M. Walsh, "Bookkeeping or metaphysics? The units of selection debate", in: Synthese 138, 2004, pp. 337-361.
Denis M. Walsh, Tim Lewens and André Ariew, "The trials of life: Natural selection and random drift", in: Philosophy of Science 69, 2002, pp. 452-473.
C. Kenneth Waters, "Genes made molecular", in: Philosophy of Science 61, 1994, pp. 163-185.
Eric D. Wieben, "Primer on Medical Genomics, Part VII: The Evolving Concept of the Gene", in: Mayo Clinic Proceedings 78, 2003, pp. 580-587.
Carol K. Yoon, "From a few genes, life's myriad shapes", in: The New York Times, June 26, 2007; available online at http://www.nytimes.com/2007/06/26/science/26devo.html.
Institute of Philosophy & Center for Philosophy and Ethics of Science (ZEWW) Leibniz Universität Hannover Im Moore 21 30167 Hannover Germany [email protected]
Team C Philosophy of the Cultural and Social Sciences
WOLFGANG SPOHN
NORMATIVITY IS THE KEY TO THE DIFFERENCE BETWEEN THE HUMAN AND THE NATURAL SCIENCES
In this paper I take the human sciences to comprise psychology, the social, economic, and political sciences, archaeology, history, ethnology, linguistics, philologies, literary and cultural studies, and similar fields that have emerged alongside and in between them. So, the human sciences study the individual and collective ways and products of the human mind. Since the term "Geisteswissenschaften" has narrowed its meaning, the term "Humanwissenschaften", "human sciences", seems more appropriate. By contrast, the natural sciences are to comprise all the other fields of empirical study: physics, astronomy, chemistry, biology, geology, engineering, etc. In this paper I would like to give an update of, and a fresh attempt at, the long-standing, heated issue of whether or not there is a principled difference between the human and the natural sciences.
1. WHERE THE DIFFERENCE BETWEEN THE HUMAN AND THE NATURAL SCIENCES IS NOT
The issue I am going to discuss starts long before the emergence of the special sciences, namely with Descartes' powerful introduction of a dualistic world picture, with his distinction between a res extensa as the object of the physical sciences and a res cogitans as the object of studies of the mind. We still see the repercussions of this dualism everywhere. However, philosophers always found this ontological division hard to accept, not least because of the great difficulties in accounting for the obvious causal relations between the physical and the mental; those causal relations seemed to call for a more unified picture. Today the old ontological dualism may well count as obsolete. Still, there are remnants of the old dualism in the current philosophical debate about consciousness. We might explain the human sciences as dealing with the emanations of human consciousness. Hence, those sciences would find their core in present-day consciousness studies. However, dualism is no option there. Looking at phenomenal consciousness, the most evident form of consciousness, we find either the attempt at functionally explaining qualia or phenomenal contents, thus integrating them into the universal causal nexus. Or, if this should fail, as many, including me, believe, the only alternative seems to be to try to understand phenomenal consciousness directly as a biological phenomenon. Any kind of substance dualism would render further scientific inquiry into consciousness impossible.
From this side, hence, we do not receive any confirmation for a principled division among the empirical research fields. Let us look, hence, a bit more broadly at the history of our topic. When the issue came up after the emergence of the special sciences in the 18th and 19th centuries, the suspicion rather was that there is a principled methodological difference between the natural and the human sciences. This suggestion was forcefully introduced by Windelband (1884). He located a principled difference between the nomothetic and the idiographic disciplines, as he called them. The physical sciences proceed nomothetically; they attempt to subsume their phenomena under general laws and to thereby explain them, as physics paradigmatically does. By contrast, human studies proceed idiographically; they carefully investigate and describe singular constellations, individual small- or large-scale events, unrepeatable processes, as history paradigmatically does. Certainly, empirical researchers proceed both ways. However, the logical empiricists already raised the objection that this distinction does not provide a suitable dividing line between the natural and the human sciences. Single events are scrutinized everywhere, in astronomy and geology, etc., just as well as in historical or economic studies, and the attempt to generalize is undertaken everywhere; certainly, there are economic laws and perhaps even laws of history. One may suggest that the types of laws involved are different. Indeed, there is a large current debate about so-called ceteris paribus laws (cf., e.g., Schrenk 2007). It is only fundamental science that states truly general, unconditioned laws, whereas all the special sciences at most come up with laws hedged by ceteris paribus clauses. However, even this view is contested; perhaps ceteris paribus conditions run deeper. And it at best discriminates between fundamental and special sciences, not between natural and human sciences. This early debate about the lawful character of various disciplines was, however, only a prelude to a much more serious and encompassing debate initiated by Dilthey (1883) and the hermeneutic philosophy; this had a more pervasive impact. The idea was that the human sciences aim at understanding, reached by a special method, namely the hermeneutic method, which solves special epistemological problems or hermeneutic circles. These problems and this method were supposed to be quite unlike anything we find in the natural sciences. There is no point in starting a detailed argument now. However, I confess that the central claim of hermeneutics always appeared implausible to me. Certainly, deciphering and interpreting an ancient text is a very different activity from, say, observing cells under the microscope. Where, however, is the principled methodological difference? From an ordinary language point of view, opposing understanding to explanation is an unhappy move, since that opposition simply does not exist. In the relevant contexts, the two terms are often interchangeable. The sciences strive to understand nature just as human studies strive to understand men; cells are objects no less to be understood than texts.
Of course, this remark is not quite fair; understanding was intended to have a narrower meaning when opposed to explanation. The focus of hermeneutic philosophy certainly was on linguistic understanding, on coming to terms with language, the unique and overwhelmingly powerful human faculty. Now, understanding old or foreign texts quite detached from context certainly is one of the most difficult linguistic tasks we actually face. Still, the fundamental hermeneutic problem was, I find, raised rather in analytic philosophy of language, in terms of radical translation; Quine's (1960) lessons on the inscrutability of reference and the indeterminacy of translation and Davidson's (1984) attempt at breaking the basic hermeneutic circle of belief and meaning by his principle of charity were the philosophically more radical approaches. One may find Quine's strictly behavioristic approach to (stimulus) meaning too narrow and Davidson's behavioral base for inferring meanings too sparse. Still, there is no hint in their work that inquiring into meanings, as far as it is possible, follows a fundamentally different methodology. The point is still clearer in Grice's (1957) promising program of naturalizing semantics. This program specified precise ways of inferring semantics from psychology and psychology from behavior, and no step of inference showed special features principally different from those made in other empirical fields. Hence, language and linguistic understanding did not seem to be the realms providing a distinguishing mark of the human sciences. As a consequence, the issue was thrown back to the basic level where it belongs, i.e., to the question of how to conceive of human action. There is no more basic level at which a principled difference between natural and human sciences could emerge; and conversely, if the difference cannot be perceived already there, the case is very likely to be hopeless. However, this is the level at which methodological dualists suffered their most dramatic defeat. Ryle (1949) originally suggested that dispositional explanation differs from causal explanation and that this difference is crucial for understanding human actions. Hempel (1961/62) and Davidson (1963) took the opposite view that dispositional explanation is a variety of causal explanation and that human actions have causes just like any other events. One may safely say, I think, that the causalists clearly gained the upper hand; their view seems nowadays generally accepted. Von Wright (1971), who was perhaps the strongest opponent of the causalists at that time, just sounds strange when reread with present-day eyes. This outcome of the debate about the explanation of human actions always appeared to me to be the strongest point in favor of the unity of science and against the methodological disunity of the natural and human sciences. This is how I would have presented the issue 10 years ago, and even today it is, I think, a fair, though partial, description of the present state of discussion. However, I changed my mind some years ago, or, rather, I realized that this change is entailed by my considerations in Spohn (1993, 2007). There is a principled methodological difference between the natural and human sciences. The basic point is that the human sciences are shot through with normative considerations
in an irreducible way in which natural sciences are not. I would like to explain this point in the rest of my paper.
2. NORMATIVITY IN AN EXTERNAL AND AN INTERNAL PERSPECTIVE Normativity comes in many linguistic forms; the basic form, though, uses the auxiliaries “shall” or “should”. A normative question asks, and a normative claim says, what I, or we, should do, or what should be the case. Usually, the realm of normativity is decomposed into two parts, which are, of course, interrelated. One part relates to our actions: What should we do? The other part relates to our beliefs: What should we believe? Thus the realm of the normative basically divides into theoretical and practical issues. Now, it is essential to distinguish two perspectives on normative issues, a distinction I learnt from Hart (1961, pp. 54ff.). There is an external third-person perspective, and there is an internal first-person perspective. In the external perspective we observe and state which norms a person has adopted and which norms actually hold in a given group or community. A rule is more than a regularity, a norm is not just normality. The surplus may be analyzed in various ways. A norm may be said to be enforced by sanctions either explicitly imposed by appropriate institutions or implicitly exerted within the community. Or a rule may be a convention in the sense of Lewis (1969), a behavioral regularity supported by an appropriate pattern of desires and mutual beliefs. Whatever the adequate analysis, it is clear that within the external perspective normative facts are nothing but empirical facts; it is simply a matter of empirical inquiry to find out about the norms actually governing a community. Things look different, however, from the internal perspective. The difference shows up already in the peculiar ambiguity in calling a norm valid. This may mean that the norm is empirically valid, that it holds in, or is accepted by, a given community. Or it may mean that it is normatively valid, that it is to be accepted. The latter claim cannot be empirically confirmed or disconfirmed; it is subject to normative discourse where it may get approved or disapproved. Normative discourse is an essential part of the first-person perspective, a perspective we humans have and cannot escape. Whatever we do, it is legitimate to ask whether we should do, or have done, it; we have to take a normative stance, and taking no stance is, in a way, taking a stance as well. Let me slightly expand on the pervasiveness of normative discourse. There are many norms of theoretical rationality. We have a lively normative dispute about good and bad inductive inferences, about the inference to the best explanation, the principle of minimizing cross-entropy, and more. The foundations of statistics can still not count as settled. Deterministic and probabilistic sciences seem to fall
apart methodologically. And we not only have arguments about general epistemological principles; we certainly have them in every specialized empirical field. Then, of course, we have principles of practical rationality. The principle of maximizing conditional expected utility is perhaps the most basic and general one, but it is open to interpretation, counterexamples, and amendments. Game theory, no doubt, is a normative theory, too, though with some counterintuitive recommendations. There are quite a number of alternatives to standard decision theory, and there is a larger number of paradoxes which point to defects in our received picture of practical rationality. Again, we find a most lively and ramified normative dispute over those general principles. And in everyday life we continuously discuss what is reasonable in various situations. More salient than all such norms of rationality are the more substantial ethical norms about how to lead a good and virtuous life and moral norms of justice, fairness, and peaceful coexistence, norms that may enter legal codes or may be used as critical instances against positive law. There is no need to further emphasize the importance of moral discourse. It is a long-standing philosophical issue whether moral principles reduce to rationality principles. However, we need not take a stance. Even if there are two kinds of principles, they are both normative. Is there normative truth? I may remain silent on this issue. If it exists, it is not of the correspondence kind. It would be rather like pragmatic truth. Just as pragmatic truth is vaguely explained as that which is maintainable in the limit of empirical inquiry, normative truth might be conceived as being accepted in the limit of normative inquiry. However, we need not presuppose that all normative issues are settled in the limit; maybe many of them are undecidable. What is important is that normative discourse is not merely a matter of confessions; it is governed by (often defeasible) reasons, and it is fallible and open-ended; firm convictions are fine, but dogmatism is not. In all those respects, normative discourse is very similar to empirical discourse, and it would be worthwhile to more thoroughly explore the rules and patterns of normative argument. Still, empirical and normative arguments are different, simply because of their conclusions. This difference is obvious, even for those like me who do not think that each (defeasible) inference from is to ought has to be a fallacy (cf. Schurz 1997). So, to sum up, from the external perspective the normative is just a kind of empirical phenomenon to be studied like other empirical phenomena. From our internal perspective, by contrast, the normative is entirely different from the empirical, even though connected. Keeping strictly to the external perspective, we could never grasp what all this normative business within the internal perspective is about.
3. THE HUMAN AND THE NATURAL SCIENCES DIFFER IN NORMATIVE INVOLVEMENT
Now, what has all this to do with the alleged principled difference among the scientific disciplines? This is not obvious. The point is not simply that some disciplines deal with normative issues and others do not. This is much too unspecific a statement. In fact, it is clear that all scientific disciplines deal with normative issues. Each researcher and each science manager must ask herself which fields to investigate and how to allocate resources. If we compare the amount of money invested in conquering cancer and that invested in curing malaria, is that fair? Practical normative questions abound in all disciplines. Even if we declare such questions to be somehow external to science – they are not scientific questions to be answered by scientific methods – enough normativity remains. Each inquiry is confronted with methodological issues that are theoretical normative issues. How do we best go about conducting a given inquiry? Which experiment is most discriminative, which poll is most revealing? What should we infer from the data? Which hypothesis is the most plausible or the most explanatory? And so on. All these are normative, i.e., methodological issues, and we find them in all scientific disciplines. Most of them are answered as a matter of course, though some more caution would well befit the scientists. In any case, these observations rather speak against any principled difference among the scientific disciplines. Indeed, all disciplines are infected by so much normativity. The point is that only the human sciences are entangled with normativity in a special and much more far-reaching way, namely simply because only the human sciences deal with objects, namely us humans, that have an internal normative perspective. This point is familiar and not novel at all. I only want to grasp it as sharply as I can. So, let me elaborate: the point is that the human sciences empirically deal with all normative phenomena whatsoever, and not only those concerning themselves, since they empirically deal with humankind, which is occupied with all normative issues whatsoever. The crucial question now is: Can the human sciences do so by restricting themselves to the external perspective, from which normative phenomena are, as stated, a variety of empirical phenomena and may be studied accordingly? The crucial answer is: No, they cannot. Merely in order to do their empirical work properly they have to actively engage in normative theorizing from the internal perspective (or to listen to those thus actively engaged). That is, they have to do something no natural science has to do; they have to leave their empirical home and enter foreign terrain, i.e., normative discourse, in order to properly cultivate their home domain. Why should they have to do this? The basic reason is that there is a fundamental asymmetry in the explanation of human activities. Unless indeterminate, norms
tell us what is right and wrong, correct and incorrect. We humans are receptive to normative considerations; this is what it means to have an internal normative perspective. This perspective is no idle play; it guides us. So, when I ask why Mary did something, the answer "because it was right" gives a first explanatory sketch in a way in which the alternative answer "because it was wrong" would not. If Mary has done something wrong, explaining this requires some additional explanation for deviating from the norm, whereas if Mary has done something right, it suffices, as a first step, to point to her compliance with the norm. Of course, in both cases we may richly detail the explanation; there remains a lot to be cleared up (though not necessarily by philosophers). However, the asymmetry just stated remains; there is always one thing more to explain in the case of wrong doings than in the case of right doings. Or to express the same point in a different way: normative theorizing has a peculiar double role. On the one hand, it is just that, something emerging in normative discourse within the internal perspective. On the other hand, normative theorizing also serves as an empirical theory, precisely because of the human receptivity to norms. Of course, it does not deliver an empirically fully adequate theory; usually, we do more or other things than perfectly implementing norms. Still, it serves at least as an idealized empirical theory. Ideally, we would implement the norms, though actually we are imperfect. And as with any idealized theory, we need additional error theories correcting the idealized theory and making it empirically more adequate, thus also explaining deviations from the ideal. This is why we have the asymmetric explanatory situation mentioned above. One may object that for arriving at this idealized theory it suffices to take the perspective of the external observer. From that perspective one can hypothesize the norms actually holding in the community under study. It is clear, then, that those hypothesized norms are not iron rules and hence need again to be supplemented by error theories. So, the objection goes, the principled point I was just trying to make is fully respected purely within the external perspective. At best, the objector may grant, it is much more difficult to hypothesize the actually valid norms from outside than with a familiarity with the internal normative discourse. However, ease and difficulty certainly are a matter of degree, not of principle. The objection misses, I think, the full force of the internal normative perspective. The idealized theory to be applied and amended in empirical research should not merely collect what the community presently thinks is right; it should state what is right, something that can be inquired into only from within the internal perspective, even though only the limit of normative inquiry is definitive. Why should our empirical research appeal to an idealized theory in this strong sense? Because otherwise we cannot understand or explain the dynamics of our normative conceptions. To some extent the norms adopted by a community are a matter of fashion, and then one may speculate about the dynamics of fashions. To some extent, they are a matter of historical contingency; there are only historical explanations of why some countries drive on the right and not on the left or have a proportional instead of a
majority voting system. To some extent, though, the dynamics of our normative conceptions can only be understood through our efforts to find out and do what is right. The Enlightenment movement and the evolution of science can only be understood as also being a fight for epistemic rationality, whatever its ultimate standards. The changing forms of society can only be understood as also being a fight for the right conceptions of freedom and justice. History is essentially driven by the evolution of human rights, which can only be understood as a fight for finding and enforcing the right moral standards. Of course, these grand historic examples are reflected at each local and individual level. Thus, referring only to the current normative conceptions observable from the external perspective yields too weak an idealized theory; we have to invoke the full normative theory as an ideal in order to represent the actual dynamics of normative conceptions as a history of errors and of approximations to the ideal. And this representation is available only from the internal and not from the external perspective. In a nutshell, since we have to apply the model of an ideal theory and supplementary error theories also in a dynamic perspective, the ideal theory to be used cannot refer to any momentary state, but must take an ideal stance towards the entire evolution, which can only consist in the endpoint of this evolution. Of course, the endpoint is only a normative fiction that is actually never reached; humankind will be extinct before then. At each moment we can only hypothesize what the endpoint will be. However, we can only hypothesize it from the forward-looking normative point of view and never from the external point of view running behind. This is my argument for why the human sciences, and only the human sciences, have to engage in normative theorizing from the internal perspective, something ordinarily not conceived as a scientific enterprise. However, if I am right, they have to do this even with respect to their empirical interests alone. This is what fundamentally distinguishes them from the natural sciences. Is this point able to account for the earlier views on the distinction between the human and the natural sciences? Certainly not for the nomothetic/idiographic distinction; but we dismissed that, anyway. Certainly, as far as the explanation of human actions is concerned. The point there is not an obscure opposition of explanation and understanding. It is rather that the explanation of actions appeals to standards of practical rationality and even morality that can be acquired only in normative discourse. Only insofar is the explanation of actions indeed different. What about language and linguistic understanding? I am not sure. It has been strongly suggested in recent years that linguistic meaning is thoroughly normatively infected; for instance, the basic tenet of Brandom (1994) is, roughly, that meaning is constituted in the communal practice of giving and accepting reasons. Indeed, according to any kind of inferential role semantics there is at least a close relation between the space of meaning and the space of reasons. And, of course, the space of reasons opens all dimensions of theoretical rationality and epistemic normativity. Such approaches to semantics are highly suggestive; that's why I am
sympathetic to them. But they are constructively poor; that’s why I am not sure about the relation between my argument and the argument from language.
4. IS METHODOLOGICAL DUALISM COMPATIBLE WITH ONTOLOGICAL MONISM?
At the beginning of my paper I dismissed a distinction between the human and the natural sciences on ontological grounds. However, if monism is our ontological option, does it not refute my argument for a principled difference? I think not. Let me conclude my paper by explaining why. We may well grant the supervenience of the mental on the physical. This includes the supervenience of all our actual normative conceptions on the physical; if we had different ones, there would have to be some physical difference as well. If there should be normative facts, those determined in the limit of normative inquiry, one may even grant that they supervene on the physical as well. However, they do so trivially, because, like mathematical truths, they do not depend at all on the physical. The point now is that such ontological acknowledgments do not determine our epistemological third-person perspective. They do not speak for the primacy of the empirical over the normative point of view or for the irrelevance of the latter for the former. Why? It is a well-known philosophical maneuver to turn ontological considerations into epistemological ones with the help of Laplace's demon. By knowing the ultimate ontological inventory of our world, the distribution of matter (at a given time) and the fundamental physical laws governing it, the demon can apparently know everything that is, and he can apparently explain every past and predict every future action and even every normative conception we tend to have. He seems to be the incarnation of our epistemologically perfected external perspective, and there is no place for the normative perspective in that perfection. However, this is a seriously deceptive picture. We need to understand how wildly nonhuman the demon is. The point is not that in our indeterministic universe even the demon would not get far. Ontologically, we may grant strict determinism for the sake of the argument. The point is rather that neither we nor the demon are capable of specifying the supervenience relation that is only claimed to exist in our ontological professions, and that this incapability has very different, though converging, reasons for us and the demon. For us, the problem is not so much complete knowledge of fundamental physical laws; perhaps we are on the verge of it. For us, it is rather the demon's complete knowledge of particular physical facts (at a given time) and his perfect computational capacities. Both are entirely fictitious for us. It is safe to predict that we shall never exactly compute complex molecules in quantum mechanical
terms and that, despite the bold declarations of neuroscientists, we can never have more than the roughest understanding of the physiological supervenience base of complex mental processes. In particular, we have to proceed from the causalists' rudimentary explanation of our actions, which refers to principles of practical rationality and which is the only one consistent with our having a normative perspective; and we may and should specify, qualify, and amend this sketch in multifarious ways. As emphasized, we also evolve our normative point of view; we seek ever better and more complete answers to our normative questions. And as explained, we thereby promote our empirical perspective; our normative conception also serves as our empirical ideal. Any empirical theory that simply neglects our normative point of view is bound to be incomplete and inadequate. The demon has the complementary problem. Well, not necessarily: the demon might also be an eliminativist and thus not care about supervenience. However, I take the eliminativist's prediction that our intentional idiom will eventually dissolve to be simply incredible. If so, it will not do for the demon to know everything there is to know on the basic ontological level of physics. He is still entirely ignorant of all relevant supervenience relations. If he wants to know what water is, he must first know our notion of water; then, of course, it is easy for him to establish that water is H2O. If he is to predict whether or not I am happy tomorrow, he must also know how happiness supervenes on all the physics he knows; and in order to know this he would first have to acquire the complex notion of happiness. Likewise for all the other mental concepts we have. In particular, he would need to have and exercise a normative perspective himself; otherwise, he could never grasp what our normative discourse is all about. From both sides, we thus arrive at the same conclusion. The demon needs to have a normative perspective, even if his sole aim is to complete his empirical picture. We have the normative perspective and have to respect it as an empirical ideal in doing empirical human studies. Hence, even from the empirical third-person perspective one is committed to the normative first-person perspective. One cannot do well in the former without engaging in the latter; you cannot complete empirical psychology and thus the other human sciences without engaging in normative considerations – something not required in the natural sciences. So, ontological unity in terms of suitable supervenience relations does not prevent there being a principled methodological difference.
BIBLIOGRAPHY
Brandom, Robert (1994), Making It Explicit. Cambridge: Harvard University Press.
Davidson, Donald (1963), "Actions, Reasons, and Causes", Journal of Philosophy 60, 685-700.
Davidson, Donald (1984), Inquiries into Truth and Interpretation. Oxford: Clarendon Press.
Dilthey, Wilhelm (1883), Einleitung in die Geisteswissenschaften. Leipzig.
Grice, H. Paul (1957), "Meaning", Philosophical Review 66, 377-388.
Hart, Herbert L. A. (1961), The Concept of Law. Oxford: Oxford University Press.
Hempel, Carl Gustav (1961/62), "Rational Action", Proceedings and Addresses of the APA 35, 5-23.
Lewis, David (1969), Convention: A Philosophical Study. Cambridge: Harvard University Press.
Quine, Willard V. O. (1960), Word and Object. Cambridge: MIT Press.
Ryle, Gilbert (1949), The Concept of Mind. London: Hutchinson.
Schrenk, Markus (2007), The Metaphysics of Ceteris Paribus Laws. Frankfurt a.M.: Ontos.
Schurz, Gerhard (1997), The Is-Ought Problem. A Study in Philosophical Logic. Dordrecht: Kluwer.
Spohn, Wolfgang (1993), "Wie kann die Theorie der Rationalität normativ und empirisch zugleich sein?", in: L. Eckensberger, U. Gähde (eds.), Ethik und Empirie. Zum Zusammenspiel von begrifflicher Analyse und erfahrungswissenschaftlicher Forschung in der Ethik. Frankfurt a.M.: Suhrkamp, pp. 151-196.
Spohn, Wolfgang (2007), "The Core of Free Will", in: P. K. Machamer, G. Wolters (eds.), Thinking About Causes. From Greek Philosophy to Modern Physics. Pittsburgh: Pittsburgh University Press, pp. 297-309.
von Wright, Georg Henrik (1971), Explanation and Understanding. Ithaca: Cornell University Press.
Windelband, Wilhelm (1884), Aufsätze und Reden zur Einführung in die Philosophie. Freiburg.
Department of Philosophy University of Konstanz 78457 Konstanz Germany [email protected]
HANS LENK
METHODOLOGICAL HIGHER-LEVEL INTERDISCIPLINARITY BY SCHEME-INTERPRETATIONISM: AGAINST METHODOLOGICAL SEPARATISM OF THE NATURAL, SOCIAL, AND HUMAN SCIENCES
I. INTERDISCIPLINARITY IN THE INFORMATION- AND SYSTEMS-TECHNOLOGICAL WORLD
It is well known that most of the topical problems of our times cannot be addressed in clean disciplinary separations or a total disciplinary make-up, but can only be successfully addressed in an interdisciplinary, transdisciplinary or even superdisciplinary manner. For instance, ecological problems are not just natural science questions, but of course they are not only cultural or social humanities problem areas either. In the overriding and comprehensive problems of our society and age we encounter a complex of not only internal interaction and interconnection, if not a meshing, of the respective disciplinary areas. We need more abstract plus disciplinary methods, disciplines and technologies, so to speak generalized operational techniques, in order to get the more formal or abstract or methodological perspective we will discuss below. This is even true for the humanities in the narrower sense. It is quite obvious that in the last decades the techniques of information processing, electronic data-retrieval and processing as well as other information techniques have also penetrated the humanities. We deal here generally with information or, to be more precise, interpreted information as well as processes and results of interpretations. Using a rather technical methodological term, I like to talk of "interpretation constructs" (1993)1 or even, when referring to rather artificial "objects", of calling them "interpretata" or even "schematisata", i.e. the results of schematisations or interpretation processes (qua the activation of schemes), amounting to a rather "higher order" raw, if not "fine", material of the sciences and humanities as well as the respective information processing disciplines. During the last century information and the means and possibilities of processing, transmitting and designing as well as manipulating it have been systematically technologized. Already in 1970 I talked of information processing becoming comprehensively technicalized and systematized, thereby leading to what I called "an information and systems technological age" and society.2 Information and systems technologies characteristically cross over, overlap, or reach beyond
1 Cf. my Interpretationskonstrukte. Frankfurt/M.: Suhrkamp 1993.
2 My Philosophie im technologischen Zeitalter. Stuttgart: Kohlhammer 1971, 2nd ed. 1972; my (ed.) Technokratie und Ideologie. Stuttgart: Kohlhammer 1973.
traditional disciplinary borders and limits, not only intersecting but interconnecting and overriding traditional separations between areas and disciplines, amounting to a new interdisciplinary frontier of information-technological and systems-expanding processes, phenomena and transdisciplinary interactions and interconnections of whatever sort. One may, so to speak, talk of a systems-technological or systems-technogenic interdisciplinarity. The sharp separations between and by disciplines can no longer be upheld, if only for operative, methodological or methodical reasons. This certainly leads to corresponding challenges for all the respective scientific methodologies. This is especially true for the traditional humanities with their once fashionable distinction between the alleged "two cultures" of the "natural" versus the "Geisteswissenschaften", or historical disciplines.
II. THE TRADITIONAL "TWO CULTURES" PROBLEM
During the 19th century there seemed to have opened up a total cleavage between the humanities, understanding themselves as "the understanding disciplines" ("verstehende Wissenschaften"), which provocatively thought of themselves as standing in a certain kind of contrast to the so-called "explanatory sciences" dealing with law explanation and a covering-law model comprehensively applied in the theoretical and systematical natural sciences. (To be sure, there are also descriptive natural and historical disciplines like traditional descriptive botany, etc., or biology and geography, but these have also been under the grip of the law-covering sciences for at least a century now.) The methods of the covering-law sciences were said to be totally different from those of the humanities. For instance, people said that the humanities and the 'humanists' ("Geisteswissenschaftler") would only "understand" ("verstehen"), but not "explain" ("erklären") something and that vice versa the natural scientists would only "explain" but not at all "understand". Already this ironic contrast shows that this cannot be right, even if it has some plausibility at first glimpse. These separations were and are indeed much too rough – mostly because they accentuated this contrast from the beginning. But this dichotomy did not fit well even in the past. Mathematics, e.g., being a pure "Geisteswissenschaft", or logic, did not fit into this dichotomy anyhow. Linguistics and the social sciences are neither pure natural sciences nor pure humanities. A notorious example certainly is psychology, always sitting between its different branches: a naturalistic, experimental or behaviorist side and a humanistic, person-oriented side. The separatism of the disciplines and methods culminated in this contrasting of "Verstehen" and "Erklären" and even led to a critical contrast rendering the so-called two-cultures separation and a corresponding thesis after Snow.3 This two-
3 Charles Percy Snow, The Two Cultures. Cambridge, UK 1959.
cultures separation was enthusiastically hailed by the hardcore ideologues of both sides, although Snow originally did not contrast the “natural sciences” and the “humanities-bound” culture, but the “natural scientific” and the “literary intelligence”—which is a rather different contrast indeed. This was notoriously overlooked in the debate. Nowadays, this methodological separatism seems to be outmoded and ideological, a distortion or misrepresentation leading to a caricature of the relationship between the pertinent sciences and disciplines. The traditional “either-or” has to be replaced by a proportional “as well as” in a more differentiated and well analyzed relationship between the two or three or four kinds of disciplines. Separatism leads to a sort of dogmatism, and any dogmatism whatever is an end of analysis with regard to the most interesting questions addressing the urgently required interdisciplinary “diplomatic relations”. Therefore we should not retreat to dogmatism.
III. TYPES OF INTERDISCIPLINARITY
How is it possible to differentiate different disciplines and their types and interdisciplinarity from one another?
Table 1: Typical differentiations of the disciplines according to their
1. objects and fields or areas
2. methods and arsenals of methods
3. "interests of knowledge" ("Erkenntnisinteressen", Habermas)
4. theories and their systematic interconnections and networks
5. theories and their historical interconnections and developments (L. Krüger)
6. the relationship of theory and practice
7. substantiality vs. operationality vs. formality of theories, respectively (Bunge)
8. system holism vs. specificity of domains
9. a priori or analytic formality of methods vs. empiricism
10. explanatory and systematising patterns (e.g. descriptive vs. explanatory, historical vs. systematising)
11. cognitivity and normativity (descriptive vs. normative disciplines)
12. fictionality (virtual realities, "cyber worlds", e.g., the so-called "second life") and secondary reality (social "validity" or "Geltung") vs. primary reality (see the concept of "impregnation", below).
The disciplines are traditionally distinguished with respect to these criteria or markers. Very important to my mind is the difference between substantial, or substantive, and operative theories (Bunge4) where the latter ones concern procedures, operations, programmings and model-making. Substantive theories would be, e.g., gravitation theories after Newton or Einstein. Operative theories would figure as analytic instruments which can be applied in very different sciences. Information technological and information-theoretical approaches are obviously operative theories and very sensefully and importantly applied in interdisciplinary research. Formal theories are of course those which elaborate formal ideal-language concepts like mathematical theories and the new developments as, e.g., fractal geometry and chaos theory. A rather important distinction seeming to be much more than an absolute or total distinction between different sorts of disciplines is the distinction between cognitive and normative ones. Cognitive descriptive disciplines are certainly the only ones in the natural sciences proper, whereas, e.g., jurisprudence has to be largely taken as a normative discipline, although there are descriptive and cognitive parts and derivations as well as knowledge perspectives that are also important here leading to what can be called nowadays a supplementation, or underpinning, of jurisprudence by some modern sciences like sociology, neuroscience and, traditionally, psychology and even criminology as so-called auxiliary disciplines. Another important difference seems also to be the distinction between real and material objects vs. fictional or soci(et)al objects which are by definition produced by human ruling or linguistic or language structuring and categorizing of a social provenance. All these perspectives—particularly those explicitly mentioned as important—lead to different types of interdisciplinarity which are listed in the following diagram. Table 2: Types of Interdisciplinarity: 1. 2. 3. 4. 5. 6. 7.
1. interdisciplinary cooperation in and between projects
2. bi-disciplinary or interdisciplinary research field
3. multidisciplinary aggregat(iv)e science
4. (genuine) bi- or interdiscipline
5. multidiscipline (with multidisciplinary theoretical integration)
6. generalised interdisciplinary systems theories (general systems theory)
7. mathematical theories of abstract and complex dynamical systems (e.g. deterministic chaos theory)
8. supradisciplinary applied structure and operations disciplines (e.g. operations research)
9. methodological-metatheoretic supradisciplines (philosophy of science, science of science)
10. philosophical, epistemological and methodological metadiscipline (e.g. methodological scheme-interpretationism)
4 Mario Bunge, Scientific Research, Vols. I and II. Berlin etc.: Springer 1967.
Some remarks regarding these diagrams and the respective types of interdisciplinarity are in order.

Ad 1. Cooperation between projects in interdisciplinary research is certainly the practically most important sort of interdisciplinary teamwork by experts of different orientations, for instance in city planning or in any environmental research dealing with natural as well as human-made and manipulated systems. This is, however, a rather loose, not systematically or theoretically interconnected cooperation or aggregation of the experts’ work and contributions according to the respective planning or development programme.

Ad 2. There are bi-disciplinary and permanent multi-disciplinary aggregations or cooperative networks, within a research project, obtaining between two or more disciplines, e.g. between architecture and sociology in city planning.

Ad 3. This might be a borderline case towards what Weingart5 calls “multidisciplinary aggregate science” (his example is the science of science).

Ad 5. From such an aggregative cooperation a genuine specific interdiscipline is certainly to be distinguished, e.g. molecular biology or biochemistry or, traditionally, physical chemistry.

Ad 6. Then there are the generalized interdisciplinary disciplines of a formal or model-based character, for instance generalized systems theory like the one called “General Systems Theory” (after Bertalanffy).

Ad 7. The purely formal and abstract mathematical theories of, e.g., complex dynamic systems are notably involved in progressive developments in dealing with systems of deterministic chaos or with fractal geometry within these approaches.

Ad 8. Supra-disciplinary applied structural and operations disciplines, as found in economics in the form of so-called operations research, are pretty old. (However, there are also new ones like the aforementioned chaos theory, if applied.)

Ad 9. There are methodological-metatheoretical supra-disciplines of a higher level like traditional philosophy of science or also a higher-level approach to science research (“Wissenschaftsforschung”) on a more conceptual basis.

Ad 10. Finally we have to mention, and probably first of all develop, the philosophical and methodological meta-theoretical field of debates about the respective systems connections and the whole set of the disciplines at large under a specific holistic or higher-level methodological perspective, as for instance offered by the methodological interpretationism or scheme-constructionism to be discussed in the next section.

Indeed, these ten different possibilities and aspects or types of interdisciplinarity are useful, because, e.g., pure “gathering disciplines” loosely covering a practical field of research in a complex interaction of different scientific approaches, just bound together by practical requirements, are quite another thing than an exact interdiscipline like physical chemistry, or again than a mathematical operative theory such as mathematical game theory. Here we have to take into consideration clear methodological distinctions and differentiations. For all these aspects, however, we have to require that the scientists have a certain secondary competence in the other discipline. Lastly, it is obvious that the philosopher of science who wants to deal systematically with methodical and methodological problems of biology should be somehow up to date in biology proper. Imperative is the cooperation and teamwork of scientists of different provenances. It is true that at least a “third culture” (Lepenies), namely social science, has to be supposed to (have) overcome the mentioned polarity. Indeed there are additional mixed disciplines and many other sorts of interdisciplinary areas and methods on diverse levels, as mentioned before, so that even some formal and operational “sciences” (like mathematics or logic or game theory, chaos theory etc.) have to be identified as special types of interdisciplinary approaches, as done above in our diagram of the types of interdisciplinarity. The same is true for descriptive and historical disciplines as well as for linguistic research areas between the respective traditional borderlines of the natural sciences and, say, the linguistic disciplines. The traditional separatism has to be rejected. It is, methodologically speaking, false; it is not only outdated by practice but also rather skewed by ideological leanings.

5 Peter Weingart, Wissenschaftssoziologie. Bielefeld 2003.
IV. TOWARDS A SYSTEMATIC SCHEME-INTERPRETATIONISM

Any sort of cognition, perception and action is necessarily shaped by the (re)activation of “schemata” or “schemes”. Any interpretation is schema (re)activation. Schemata are, epistemologically speaking, “structural” activation patterns which can be, psychologically and neurologically speaking, accommodated, adapted and “learned” by (co- and re)activating neuronal assemblies. Indeed, in our cognition of any kind we are obliged to use frames, forms, shapes and constructs as well as schemata or schemes. This is true for all sorts of grasping something, be it by a process of recognition and categorization or of normative structuring or planned acting. Applications of forms and frames are schematizations or schema interpretations, as I would like to call these interpre(ta)tive constructs and their activation in order to distinguish them from the usual text interpretation in the hermeneutical sense. Schemata might be used consciously or activated subconsciously. Any kind of interpretation whatsoever is connected with or bound to an activation of such schemata. This connection might be characterized by core features and core stimuli whose selection is necessary, even though some of these selections are carried out subconsciously.
Generally speaking, I call these abstract constructs of a frame character schemata or schemes. Schemata are developed and applied on different representational levels in order to integrate individual experiences, single activities and sense data or stimulations into a more general frame, pattern or similarity structure. Any recognizing and generalising, in particular conceptual knowledge, is thus bound to cognitive schemata, which can be understood as more or less abstract constructs that are projected onto and into the seemingly direct sense perception and the respective experiences by recognizing Gestalten or constituting objects, processes, events etc. Any seeing and recognizing of shapes and forms is dependent on and guided by schemata. Any cognition whatsoever is thus schematic. This is true not only for recognition, but also for actions, i.e. not only for rather passive sorts of “grasping”, but also for rather active kinds. It was Kant who, in his Critique of Pure Reason (CPR), developed the concept of schema for epistemology by establishing, within quasi-operational procedures of instantiating as well as developing schemata, a connection between sense reception on the one hand and conceptual recognition on the other. Kant defined (CPR, 179f, my translation) a schema as a “product of the power of imagination (Einbildungskraft), which is not attending to individual images or imaginations, but towards the ‘unity’ of sensations and intuitions (Anschauungen) and the determination of sensuality”, “which is rather the imagination of a method to imagine according to a certain concept in an image than the image itself”. Kant anticipated the process of developing and establishing as well as applying cognitive constructs for the imaginative realization and visualisation of mental configurations and models, i.e. of cognitions. Only in recent decades, in the wake of the theories and concepts of Gestalt psychology, has cognitive psychology rediscovered this concept of schemata as “imaginative” cognitive constructs (cf. e.g. Rumelhart 1978).6 Schemata or schemes are called by Rumelhart “the building blocks of cognition” (1978). Psychology discovered that not only visual conception and sense perception in general, but also conceptual and common-sense or naive theoretical cognition operates in terms of developing and applying schemata, i.e., any cognitions, interpretations and knowledge whatsoever are bound to the application, selection and activation as well as checking of schemata (see, e.g., Neisser). The process of interpretation is basically to be seen in, or even as, the selection and activation of possible configurations of schemata, which are checked as to whether or not they are congruent with the thought and data fragments of memory. Beyond that, this process is an active process of searching for and structuring information. It is important to notice that schemata consist of sub-schemes. The activation of a subschema is usually immediately related to the activation of the schema itself, and vice versa. The mutual activation of schemata and sub-schemata is essential.
6 David E. Rumelhart, “The Building Blocks of Cognition”, in: Spiro, R. / Bruce, B. / Brewer, W. (Eds.), Theoretical Issues in Reading Comprehension. Hillsdale, N.J. 1980.
In general, the concept of a schema or cognitive construct or even interpretational construct is a rather fruitful instrument for developing a cognitive-psychological theory, but beyond that also for a new methodological epistemology. Interpretation is indeed the development, stabilization and activation (application) of mentally representing constructs or schemata. Interpretation (in a wide sense) is basically scheme-interpretation, founded on and grounded in schema activation. Therefore, I talk of schema- or scheme-interpretation. We can even conceive of a basic axiom or principle of methodological (scheme-)interpretationism stating that all kinds of grasping, cognition and action are interpretation-dependent, i.e. founded on the activation of schemata. This holds far beyond psychological theories and epistemological perspectives; it is rather a totally general, comprehensive methodological approach comprising the philosophy of knowledge (traditionally called epistemology) as well as the philosophy of action and representation. We can call this approach a methodological and transcendental construct- or scheme-interpretationism overarching even the split between the natural and social sciences as well as the humanities, since all these disciplines structure their fields and objects according to the activation of schemata, by using procedures of establishing, stabilizing and activating schemata as cognitive constructs in order to structure the respective world versions and sets of objects or events, structures, procedures as well as projections. It is interesting that schema interpretation admits of levels of categorisation as well as of degrees of variability of the respective schemata, i.e. whether or not they are hereditarily fixed or conventionalized or flexible, whether they are subconsciously developed and activated or consciously conceived and used. I developed a hierarchy of levels of interpretation consisting of six different levels or planes of interpretation. The following diagram shows the respective six levels:

Table 3: Diagram of the levels of interpretation7
IS1: practically unchangeable productive primary interpretation (“Urinterpretation”) (primary constitution or schematization, respectively)
IS2: habit-shaping, (equal) forms-constituting pattern interpretation (ontogenetically habitual(ized) form and schema categori(ali)zation and preverbal concept formation)
IS3: conventional concept formation transmitted by social, cultural and norm-regulated tradition
IS3a: … by non-verbal cultural gestures, rules, norms, forms, conventions, implicit communicative symbols
IS3b: … by verbal forms and explicitly representing communicative symbols, metasymbols, metaschemata etc.
IS4: applied, consciously shaped and accepted as well as transmitted classificatory interpretation (classification, subsumption, description by “sortals”, generic formation of kinds, directed concept formation)
IS5: explanatory and in the narrow sense “comprehending” (“verstehende”), justifying, theoretically or argumentatively substantiating interpretation, justificatory interpretation
IS6: epistemological (methodological) metainterpretation (plus meta-meta-interpretation etc.) of methods, results, instruments, and conceptions of establishing and analysing interpretative constructs themselves. The model and approach of epistemological interpretationism is itself analyzed on a respective meta-level accumulated in level IS6.

7 The different levels of interpretation are the following ones: IS1 comprises the practically unchangeable productive primary interpretations of primary constitution, which might be represented by subconscious schema instantiation. They comprise the hereditarily fixed or genetically founded activation of selective schemata of sense perception (e.g. contrasts of dark and light etc.) as well as the interactive, selective activations of early ontogenetic development like the stages of developmental psychology discussed by Piaget. Also comprised are the biologically hardwired primary theories which we cannot alter at will, but which we can (only) problematise in principle. For instance, we have no magnetic sense or capacity to trace ultrasound like bats do. But we can conceive of conditions in which we could have these senses, or at least devise technological means for substituting them.—On level IS2 we have the habitual, quality-forming frame interpretations and schema categorisations as well as “categorialisations” that are abstracted from pre-linguistic discriminatory activities, experiences of equality of shape, similarity of presentation and experience etc. The establishment and discriminatory capacity of pre-linguistic conceptualization and the development of concepts prior to language are formed on this level.—On level IS3 we have conventional concept formation, namely socially and culturally traditional conventions and norms for representation and forms of discriminatory activities, like the explicit conceptualization of framing the world according to natural kinds etc. In so far as this is not yet related to language differentiation, we can think of a sublevel (IS3a) on which pre-linguistic convention(alisation)s are characteristic. On the other hand (on IS3b) we have the explicitly linguistic conventionalization or the differentiation of concepts by means of language.—Level IS4 comprises the consciously formed interpretations of embedding and subsuming as well as classifying and describing according to generic terms, kinds etc. It is the level of ordered concept formation and classification as well as of ordering and subsumption.—Level IS5 goes beyond that by rendering explanatory, or in the narrower sense comprehending (“Verstehen”), interpretations as well as justifying, theoretically argumentative interpretations in the sense of looking for reasons and grounds of justification.—After all, we also have a level (IS6) of the epistemological and philosophical as well as methodological interpretations of a meta-character, overarching and integrating the procedures of theory building and theory interpretation, methodology and the models of interpretation; the model of interpretation in the sense of methodological interpretationism is itself certainly an interpretative one and can be described by scheme-interpretationism itself. One could call this a metalevel of interpretation and explicitly speak of epistemological meta-interpretations. However, this level is cumulative and open towards further meta-levels.

Therefore, we have
the possibility of a self-application of the interpretational method to interpretative procedures themselves. Thus, the present philosophy of interpretative constructs as an epistemological model admits of a certain kind of meta-theoretical and meta-semantical self-application in the form of a sort of “meta-meta-interpretation”. This is certainly an asset and an epistemological advantage compared to a few other epistemological approaches, including critical rationalism after Popper, a theory which does not conceive of the precise conditions of being falsified itself. To be sure, in our cognition, and even in all actions and behaviour of any kind, we are indeed obliged to follow or use patterns, structures, frames, forms, shapes, scripts, and constructs as well as schemata or schemes. This holds true for all sorts of grasping any objects, whether concrete or abstract ones, be it by recognition and categorization or by normative interpretation or planned acting. Applications of schemes are schematizations or schema interpretations, as I would like to label these interpretative constructs and their activation. They are to be distinguished from the usual text interpretation in hermeneutics. Schemata are used consciously, or they are frequently activated subconsciously. Any “interpretation” is based on or bound to such activations of schemata. Schemes are activated on a psychological, including the neuropsychological, level as well as on neuro-physiological or neurobiological8 and even biochemical sublevels, mainly in the neo-cortex, but also beyond or “below” that in overarching reaction, behaviour and action systems or by way of sub-cortical centres such as, e.g., the limbic system. Schemes can be analysed from an epistemological point of view taking a broader take-off than Kant’s approach;9 more generally they may be modelled as methodological constructs.
8 Interestingly enough, modern neuroscience is on the brink of giving a naturalized theory of schema development, schema activation and stabilization as well as schema reactivation. Modern brain researchers think of the brain as an “interpretative system” or even speak of “brain constructs” (“Hirnkonstrukte”) (Singer), which are based on the establishment and development of plastic (i.e. flexible though relatively stabilized) neuronal assemblies. The forming and establishment of neuronal assemblies is hypothesized as being a building-up and stabilization of the frequency phases of oscillatory reactions of different overlapping, co-varying and co-oscillating neuronal entities and of the neuronal assemblies or networks which are activated simultaneously and selectively on adapting to a certain rhythmic ground oscillation of 40 Hertz, together with a respective process of synchronization of these oscillations, which then start to oscillate in a common phase. Such a theory of the synchronicity of building up and dynamically stabilizing a certain kind of oscillation pattern and initiated impulses in the physical sense seems to be a potential neuro-physiological explanation for the recognition of patterns, representations of forms and the recognition of mental states of activities as well as mental imaginations and retrievals from memory.
9 To note, Immanuel Kant, in his Critique of Pure Reason (B 179f), used the concept of “schema” for epistemology by establishing, within quasi-operational procedures of instantiating as well as developing schemata, a connection between sense reception on the one hand and conceptual recognition on the other. Kant had defined “the schema” as “a product of the power of imagination, which is not attending to individual images or imaginations, but towards the ‘unity’ of sensations and intuitions and the determination of sensuality”, which is—as Kant stated—“the imagination of a method to imagine according to a certain concept in an image rather than the image itself”: “Now, this imagination of a general procedure of the power of imagination to render an image for a concept, I call the schema connected with this concept”. Kant indeed applied the term ‘schema’ as a concept of such an operation of sensual and conceptual shaping and framing not just to sense perception, like the sensing and seeing of figures in visual space, but also to the imaginative substantiation of the “pure concepts of reason” (categories)—this being “the transcendental schema”. This would be “but the pure synthesis, according to a rule of the unity following concepts in general …” (category) (ibid., 181). “In fact, at the foundation of our pure sensual concepts there are not pictures of the objects, but schemata” (ibid., my translation). Kant applied this procedure of coordination, and therefore also the concept of schema, to the “imaginative” and mental representation of any objects of any experience, i.e., of their images: “The image is a product of the empirical capacity of the productive power of imagination, the schema of sensual concepts (being of the figures in space) is a product and so to say a monogram of the pure power of imagination a priori, by which and according to which the images are rendered possible at all, which however have always to be connected with the concept only by using the schema which they designate and with which they per se are not totally congruent” (ibid.). Kant thus anticipated the process of developing and establishing as well as applying cognitive constructs for the imaginative realization and visualisation of mental configurations and models, i.e. of cognitions. He only should have extended this to actions, too.

V. INTERPRETATION AND HERMENEUTICS

The approach of schema-interpretation is a rather general and abstract, but pragmatic and comprehensively applicable, interdisciplinary methodological, even meta-methodological approach which overarches the conception and building of theories, concepts, and hypotheses of the most diverse disciplines. Certainly there are differences and incompatibilities below the abstract higher-level unity or methodological parallelism. This is not to be denied. For instance, the humanities frequently concentrate on “objects” which are at least in part produced by interpretation, i.e. fictive or “virtual” objects and fictionalised ones—as do the social sciences in dealing with human-made institutions and with social structures such as rules and norms etc. Even “the state”, or whatever institution, is not just a thing but rather a fiction, a social fiction, gaining secondary social existence by being held valid by the many humans believing in it, or in its aims and rules etc., or being accustomed to it. An overarching common point of view is indeed that science is also always the work of humans, consisting of human-made concepts, theories, hypotheses, instruments etc. (This is true even in the light of the undeniable insight that scientific constructions are not made just at will, but are checked on a rigorous basis by experiments etc.) The unity of the sciences is achieved on a higher methodological level of abstraction under this perspective of a scheme-interpretationist or interpretation-
constructivist approach and may itself be analysed by using models of a yet higher level. In so far one can say that scheme-interpretationism is a higher-level bridge between the allegedly separated “science cultures” à la Snow as well as between cognition and action. It may resume and reinstall a higher-level unity between the different polarities. This seems to be the main message, and it may even be exemplified with regard to the history of hermeneutics and the respective humanities. Traditionally it is true that even in the history of hermeneutics a sort of perspectivism was emphasized.10 The constructivist, “poietic” or object-forming constitutive function of language is already in some sense acknowledged by Schleiermacher; he already speaks of the “schematism” (“Schematismus”) (again after Kant) and of a “community of thoughts and thinkers” (“Denkgemeinschaft”, Schleiermacher11) within a language community. This constructive element is even more explicitly emphasized by Dilthey: like Schleiermacher, he talks about the reconstructions involved in the processes of any “given talk” according to formal rules, taking up the remarkable quotation from Schleiermacher: “I do not understand anything except what I can construct and see as necessary” (a statement, by the way, already proposed by Vico and Hobbes before). Dilthey would expand this by saying: “Thus originates meaning (or sense) (‘Sinn’, H.L.) by determining the undetermined by the construction”.12 He is totally convinced that any constitution is constructive and that interpretation is a constructive activity of the acting subject. This exactly parallels the above-mentioned analytic and formal reunification of knowledge and action obtaining not only in modern philosophy since Peirce’s pragmatist approach and in the ideas of the late Husserl (“Lebenswelt”), but also in those of the later Wittgenstein in the form of his “life-forms” (“Lebensformen”) and in some variants of pragmatic realism as well.13 Indeed, the central idea of the later Wittgenstein is that meanings are to be reduced to, or at least necessarily combined with, us(ag)es of actions and patterns, i.e. rule-conforming sorts of schematized patternings—developments which might be captured by the concept of “pragmatizing” semiotics and semantics as well as “functionalizing” and

10 Already by Chladenius in 1742, who conceived of knowledge according to selection, distinction and comparison, and usually if not always from a point of view: he would even speak literally of a “Sehepunkt” (point of viewing). He also directly mentioned “perspectives”, indeed, in the humanities as well as in other disciplines relying on interpretation of whatever kind. A certain kind of perspectivism is necessary, involving some kind of constructivist approach. Already Chladenius saw clearly (ibid., 518) that he would be obliged to have taken over this “Sehepunkt” in all interpretations under a perspective or constructive approach, this being a sort of interpretatory activity, a kind of art so to speak.
11 Friedrich D.E. Schleiermacher, Hermeneutik und Kritik. Frankfurt/M.: Suhrkamp 1977, 443ff., 29; 1974, 31.
12 Wilhelm Dilthey, Gesammelte Schriften, vol. 7. (Stuttgart: Teubner 1927) Göttingen: Vandenhoeck & Ruprecht 1974, 220.
13 See my Grasping Reality. Singapore: World Scientific 2003.
somehow “socializing” meaning.14 It is important to note that even the function of relating to objects or “grasping” objects (see my 2003) and statements by understanding is basically not only constructive and designative, schematizing—in short, interpretative—but in many ways also activistic. Understanding thus also is a sort of disclosing or unfolding constitution and reconstruction, as for instance Gadamer15 exemplified by drawing on the constitution of works of art or of play, analyzing these phenomena as the realization of a certain patterned activity under rules. The question is whether epistemology may have something to contribute to the bridging of this kind of follow-up “culture separation”. In contrast to the first appearance of an absolute cultural cleavage between the natural, social and human sciences, we have seen that the philosophical concepts of the natural and social sciences as well as epistemological insights lead us to the result that there is a certain more abstract, higher-level epistemological approach capable of bridging the separation of the different disciplines, indeed on a higher level. All the central concepts of knowledge and action in everyday contexts as well as in science and the humanities rely on constructing symbols, applying symbols and the interpretation of these. The development, differentiation, and application of the patterns are understood as interpretation in the widest sense, i.e. as scheme-interpretation. (The traditional hermeneutical understanding of texts would, by differentiating contradistinction, figure as a specific sub-category of scheme-interpretations, applied to reading or composing texts—under what I like to call “the reading paradigm”.) It is true that all sorts of representation of and access to the world, to other subjects as well as to the situation of a person and human being in a “Lebenswelt” are deeply interpretative, structured by scheme-interpretations and in general unavoidably shaped by interpretations and impregnations in the above-mentioned sense. The basic principle of methodological scheme-interpretationism is that all knowledge, “graspings” and actions are impregnated by or bound to scheme-interpretation and that only in a schematized manner can we “grasp”, conceive of, mean, order and act in a differentiated way. This fundamental principle cannot be doubted at all. It is the kernel and basis of the methodological epistemology of scheme-interpretation and its respective theories. At first, the present scheme-interpretationist constructive approach was conceived as a methodological proposal. However, it can also be analyzed, within a quasi-Kantian traditional epistemology, as a Kantian transcendental interpretationism.16

14 My Einführung in die Erkenntnistheorie. Munich: Fink 1998, chap. 7; my Das Denken und sein Gehalt. Munich: Oldenbourg 2001, chap. 14. One may even expand Wittgenstein’s model of “language games” towards a conception and theory of “schema games” beyond the limits of the verbal and linguistic (my Schemaspiele. Frankfurt/M.: Suhrkamp 1995).
15 Hans Georg Gadamer, Wahrheit und Methode. Tübingen: Mohr Siebeck 1960.
16 See my „Transzendentaler Interpretationismus—ein philosophischer Entwurf“, in: Holz, H. (Ed.), Die goldene Regel der Kritik. FS Radermacher. Bern-Frankfurt 1990, 121-135.
Beyond that—and even beyond Cassirer’s approach—this approach has to be expanded towards an anthropology of the meta-interpreting being.17 Humans are not characterized specifically enough as the symbol-applying and symbol-interpreting beings (primates can also do that, though in a residual way!); rather, humans are distinguished by being able to interpret their interpretations again by interpretations on a higher level, to make cognitions, actions, and interpretations the objects of a higher-level interpretation or meta-interpretation, for that matter. Humans may differentiate, distinguish, and interpret not only within a specific level or stratum of interpretations—say by conceptualizing different classes of objects, properties, relations etc.—but they may also ascend to higher levels of interpretation by making their interpretations, as higher interpretations, in turn the object of even higher meta-level interpretations. This is an open overarching of strata and levels, not to be finished at this or that specific overall general level, but open for indefinite ascension—not in practice, but in principle. The human being therefore is the meta-symbolic being of the meta-levels, the meta-schematizing and super-interpreting being par excellence. It is this possibility of emancipation from the application of symbols on a specific object level or in an object language which characterizes the human being as the meta-interpreting being. Epistemologically speaking, it is clear that with the conception of scheme-interpretations and of the interpretative schematizing activities we have found a rather comprehensive and promising attempt and model comprising the hypothetical theories of natural scientists as well as the conceptions of meaning and understanding of the humanities (also including philosophers and methodologists themselves) as well as the conceptualizations of everyday life. Of course, some differential distinctions or contrasts, even incompatibilities, in the sense of specific disciplinary perspectives are not to be denied by this. (To analyze these differences would be the task of a special scheme-interpretationist philosophy of science or hermeneutics etc.) In some sense traditional hermeneutics of understanding has developed a certainly fruitful and indispensable approach as regards some methodological interpretative rules (usages and presuppositions of the (re)interpretation of texts). This kind of hermeneutical methodology is a rather special case of our wider and comprehensive interpretation-constructivist approach.18 In particular, traditional hermeneutics as well as universal hermeneutics did not succeed in involving the schematizing activities on a neuronal and biological (neuro-biological) basis providing the vehicles (neuronal correlates) of all symbolic-interpretative activities of the organism. By contrast, modern brain research and neuro-biology did beyond any doubt underline and verify the fruitfulness of a model of scheme-development and scheme-application, be it in the primary interpretations of sense perceptions and patterns, as by biological instincts, drives and motivations, be it with respect to conventional, learned scheme-activations and stabilisations of a social and cultural provenance etc. Under the perspective of a generalized concept of interpretation as schematization and as the activation and stabilisation of schemes, we are able to unify, though on a higher level as mentioned, the basic biological and neurobiological patternings of our actions and knowledge with those of symbolic and cultural conventional provenance under a sort of theoretical and meta-theoretical roof. In addition, the same is true of the combination of everyday knowledge, of any action and object constitution etc. under the general abstract methodological (or, if you wish, quasi-transcendental) perspective of scheme-interpretationism or interpretative constructivism of a nevertheless realist sort.19 We can thus find a certain reunification of epistemology and action theory and their relevant disciplines under a kind of scheme-interpretationist symbolic anthropology, though paying the price of a certain kind of formality and abstractness by reaching the unified result only on higher levels. The unity of knowledge and action is reinstalled on a higher meta-level, namely that of rather abstract interpretative forms, rules, methods, requirements, and results of schematizations, i.e. scheme-interpretations. This is a very relevant and important result not only for the philosophy of the natural sciences but also for that of the social sciences and the humanities, as well as for everyday life.

17 My Global TechnoScience and Responsibility. Berlin: LIT 2007, chap. 3.
18 Beyond that, however, philosophical hermeneutics, starting with Dilthey but notably furthered by Heidegger and Gadamer as well as Blumenberg, claims to provide a certain kind of hermeneutic “shaping of the world”, or rather representations of the world and even of the self. Its constitution and methodological concepts as well as presuppositions are certainly to be interpreted as special cases of methodological constructive interpretationism in the mentioned general sense. However, thus far hermeneutics, even “world hermeneutics”, has remained all too much within and under the spell of text-interpretationism, subdued by what I call “the paradigm of reading”: the world should, so to speak, be interpreted as a “text”; even actions would only be understood as texts (Blumenberg’s Readability of the World, 1981, as a book title). Universal hermeneutics was fixed on the repeated construction of text-interpretation and could only by and large open up towards an interpretational constitutionalism of a more general purview.
19 See my Grasping Reality, 2003, op. cit., chap. 8.

Karlsruhe Institute of Technology (Universität Karlsruhe)
Institut für Philosophie
D-76128 Karlsruhe
Germany
[email protected]
JAN FAYE
EXPLANATION AND INTERPRETATION IN THE SCIENCES OF MAN
There is a long tradition for separating explanation from interpretation that has permeated the natural sciences, the social sciences, and the human sciences. The tradition was fuelled partly by the positivistic distinction between the context of discovery and the context of justification, partly by Hempel’s model of explanation, but mostly by the hermeneutic philosophy in the social and human sciences. These traditions may now have come to an end. In this paper I shall present a unitary theory of explanation and interpretation, a theory which I have argued for elsewhere, in which interpretations, or rather a group of interpretations, are considered as explanations of meaning.1 The view to be defended may be called the pragmatic-rhetorical theory of explanation. It sees an explanation as a discursive response to an explanationseeking question. Explanation is part of a rhetorical practice in the sense that explanation is an intentional act of communication. Rhetoric, as it is used here, has to do with expedient communication that is context-bound, directed and intentional, and potentially persuasive. An explanation is a response to a question by an interlocutor, and the explanation is meant by the respondent to inform him about what he does not understand by providing some missing information, by making something probable, or by making abstract issues concrete. The respondent’s answer brings insight to the questioner by placing the information he asks about into a broader context of what he already knows or what he is willing to accept. Any pragmatic theory of explanation is characterized by the idea that there is an intimate connection between explanation and context. But where other pragmatic theories (such as Bas van Fraassen’s and Peter Achinstein’s) associate explanatory relevance of an answer with truth or correctness, the rhetorical theory permits that false answers act as explanations whenever they are consonant with the background knowledge of the explainer (and the explainee). Thus an explanatory answer is relevant and informative with respect to the context in which the question is placed and with respect to the background assumptions of the interlocutor and the respondent, and perhaps their personal interests.
1 Jan Faye, “Interpretation in the Natural Sciences”, in: Mauro Dorato, Miklós Rédei, and Mauricio Suárez (eds.), EPSA Epistemology and Methodology of Science: Launch of the European Philosophy of Science Association. Springer 2009.
EXPLANATION AS AN ACT OF COMMUNICATION

To begin, let me briefly say what I take explanation to be.2 I hail a pragmatic theory of explanation which characterizes explanation in terms of its role of providing understanding. In contrast to any formal approach to explanation, a pragmatic approach denies that the concept of explanation can be characterised solely in semantic or syntactic terms. And contrary to an ontic approach, it refuses to accept that explanation is only concerned with ontological categories like causation. The pragmatic-rhetorical approach sees explanation as part of the scientific discourse and our general communicative practice. It regards an explanation as an appropriate answer to an explanation-seeking question in relation to a particular epistemic context. A question is raised in a situation where the questioner has a cognitive problem because he or she lacks knowledge of some form and now hopes to be informed by an explanatory answer. Therefore, this pragmatic view regards the context of the explanatory discourse, including the explainer’s beliefs, cognitive interests and background assumptions, as what determines the appropriateness of the answer. Pragmatists think that the acceptability of the explanatory product is partly a result of the circumstances under which the explanation is produced. Also, they take scientific explanations to be basically similar to explanations in everyday life. The similarity between different kinds of explanations is found in the discourse of questions and answers that takes place in a context consisting of both factual and cognitive elements. The claim is that we do not understand what an explanation is unless we also take the more pragmatic aspects of a communicative situation into consideration. The pragmatic-rhetorical view regards explanation as an agent of change in belief systems. Thus, the pragmatic-rhetorical approach holds that a fitting response to an explanation-seeking question in science need not follow by valid deduction from a set of premises, nor does it need to appeal to a causal mechanism; hence, the acceptance of a fitting response as an explanation depends on many contextual elements such as exigence, audience, and constraints. This approach does not pretend to give us more than a descriptive account of what the audience will accept as an explanation. Whether an explanation is good or bad, true or false, is not the issue as long as it fits into the general pattern of scientific inquiry. So the insight that can be associated with the pragmatic-rhetorical view of explanation is that scientific inquiry, and thus scientific explanation, is goal-oriented and context-bound. It is always performed relative to some set of interests and a set
2 See Jan Faye, “Explanation Explained”, Synthese, 111, 1999, 61-75; Jan Faye, Rethinking Science. A Philosophical Introduction to the Unity of Science. Aldershot: Ashgate, ch. 3; and in particular Jan Faye, “The Pragmatic-Rhetorical Theory of Explanation”, in: Johannes Persson and Petri Ylikoski (eds.), Rethinking Explanation (Boston Studies in the Philosophy of Science, Vol. 252). Dordrecht: Springer 2007, 43-68.
of epistemic norms and standards which are context-dependent. Moreover, those norms and standards often change with the change of context without this being explicitly acknowledged, thereby leading to controversies about what is an acceptable explanation. A pragmatist like Peter Achinstein characterizes explanation as an illocutionary act.3 In his groundbreaking theory of speech acts John Austin distinguished between three sorts: locutionary, illocutionary and perlocutionary speech acts.4 According to speech act theory, an illocutionary act is successful whenever the illocution succeeds, that is, if the sentence is stated in the right circumstances and is understood as intended by the speaker. But I think there is more to the notion of being an explanation than being an illocutionary act. It is not enough to call an appropriate answer to an explanation-seeking question successful as an explanation merely by saying that the questioner or audience understands the intention behind the answer. Imagine a situation in which the explainer provides the questioner with an answer. Indeed the answer is stated in the right circumstances by being a direct response to an explanation-seeking question; moreover the questioner or audience understands the speaker’s intention, namely as the speaker’s wish to provide an answer to his or her question. But as long as the answer does not improve the questioner’s understanding, we seem not to have a successful explanation. What is needed for such an answer to be successful as an explanation is that the perlocution succeeds, that is, an answer to an explanation-seeking question is successful as an explanation if and only if this linguistic act changes the cognitive situation of the questioner. Thus we can separate explanations from successful explanations by stating what counts as an explanation and what counts as a successful explanation in terms of a perlocutionary speech act. Whether or not an answer to an explanation-seeking question functions as a perlocutionary speech act, and therefore acts as a genuine explanation, depends on whether the speaker has the intention of informing the explainee so that the latter can reach a new or better understanding. In this case the answer has a perlocutionary purpose. Moreover, the answer is successful as an explanation if it has a perlocutionary effect, and the answer has a perlocutionary effect if it actually has an informative effect on the explainee by changing his or her cognitive state. A common objection against any pragmatic theory is that it cannot cope with the widespread wisdom that the understanding one gets from scientific explanations must be true, objective and invariable. To the extent this intuition is correct I believe the pragmatic-rhetorical approach can account for it. The pragmatist does not have to deny that scientific explanations are concerned with a mind-independent world against which scientific explanations therefore are measured to find out whether they are true or not. She may be a realist of sorts. But in my opinion
3 Peter Achinstein, The Nature of Explanation. New York and Oxford: Oxford University Press 1983, p. 16.
4 John L. Austin, How to Do Things with Words. Oxford: Oxford University Press 1962.
the common wisdom has limited value. It is based on a flawed metaphysics that there is always one, and only one, correct way of describing the mind-independent world, that there is just one perspective from which the world is truly described. To be objective is not to be context free, but to be aware of the context and the perspective we speak from. Our description of the world is dressed in conceptual and theoretical clothing, but our description is dressed for the occasion and the conceptual garb may be renewed from time to time. Norms and standards for evaluating one’s beliefs also change with respect to the problem in need of an explanation. Such a change of explanation comes not only with historical development over time but also with the shift in the problem context. Also on a higher level the fact is that scientific theories may be empirically underdetermined by evidence, which means that the theory one accepts is determined by other factors than mere observations. These other factors are, however, not equally objective, nor do they have an objective ranking. Here personal or shared interests play an important part. In my opinion, explanation should be understood in the general context of interpersonal communication. Explanation is closely connected with understanding. When we explain things and events to each other, we pass on information about an immense range of different topics. These may cover such things as the structure of the natural world, social tensions, historical events, reasons for our actions, the meaning of words, symbols, literature and art works, or instructions on how to operate a certain piece of machinery. Explaining things and events is thus an appropriate linguistic reaction to what is considered to be an explanation-seeking question by which we distribute information of all kinds to one another. Now if neither truth nor correctness matters with respect to whether an answer to an explanation-seeking question accounts as explanation, you may wonder how we can distinguish between possible and actual explanation. I suggest the following distinction: A possible explanation is for the explainer a perlocutionary response that seems plausible in the light of his beliefs concerning the evidence, his background knowledge, assumptions, and cognitive interests. An actual explanation is for the explainer a perlocutionary response which he thinks is true in the light of his background knowledge, assumptions, cognitive interests, and beliefs concerning the evidence. Indeed, according to these definitions, both a possible and an actual explanation may be false and therefore explainees may rightly take them to be false.
EXPLANATION IN THE SOCIAL SCIENCES

A pragmatic-rhetorical notion of explanation may help us to solve some methodological disputes within the philosophy of the social sciences. Over the years we have seen many different views on the grounds on which the social sciences should explain social phenomena. Some approaches begin by assuming that it must be
possible to establish genuine causal laws concerning social forces, and that given these laws one can explain various social facts. These approaches see the methodological strategy of the natural sciences as their natural goal for theory-building and objectivity. A few approaches base their study of social phenomena on intentionality and on the rationality of the individual social agent. There are those approaches that look for individual causes, rather than general laws, and there are those that attempt to explain social phenomena in terms of social rules and conventions, and still others that believe that the behaviour of social agents must be explained by its social function. This diversity of theories and methodologies in the social sciences has sometimes created a heated debate about which of these approaches, for instance methodological holism or methodological individualism, is the correct way to acquire understanding of social facts. But it is not at all obvious from the debate that such questions have a proper answer. Since our explanatory understanding of social and cultural phenomena is even more context-dependent than that of natural phenomena, it is no wonder that we may find alternative explanations within the human and social sciences. Nevertheless, some philosophers and social scientists urge that the elementary unit of social life is individual human action. For instance, Jon Elster argues: “To explain social institutions and social change is to show how they arise as the result of the actions and interaction of individuals. This view, often referred to as methodological individualism, is in my view trivially true.”5 Thus, methodological individualists believe that scientific knowledge within the social sciences is appropriately acquired through the study of individuals. Methodological holists take the opposite view, claiming that scientific knowledge within the social sciences is appropriately derived through the study of groups, classes, organizations, institutions, forces, or social processes. The explanation of individual behaviour should be based on the sustaining role this behaviour has for the social system as a whole. Philosophers and social scientists who believe that the scientific understanding of aggregate levels is prior to the scientific understanding of individual human actions may prefer various kinds of explanation. They may look for nomic, causal, structural, or functional explanation depending on which thematic problems they want to pursue. The methodological debate in the social sciences just mentioned is concerned with two partly independent questions: 1) How should we understand the difference between methodological individualism and methodological holism? Which one is the correct approach? 2) How should the social sciences consider types of explanation other than causal explanation, such as structural, functional, or intentional explanations? The answers to these questions can be kept separate, since you can be a methodological individualist and appeal to intentional explanation, or you can be a methodological holist and still defend causes and mechanisms. But most often
5 Jon Elster, Nuts and Bolts for the Social Sciences. Cambridge: Cambridge University Press 1989, p. 13.
we associate individualism with explanations in terms of causes and intentions and holism with explanations in terms of structure and functions. Now, let us focus on the explanatory significance of these two methodological approaches. As I understand them, we can illustrate their virtues by using the two models “homo economicus” and “homo sociologicus” as representing individualism and holism respectively.6 The two models stand in opposition to each other. The first sees the human agent as a rational, free to choose, and utility maximizing individual, the second sees her as a socialized, norm, cultural, and role governed individual whose behaviour is determined by the structural features of the society. If we want to explain a phenomenon like social mobility, i.e. the difference in peoples’ chances to move up the social ladder in relations to their social background, the economical approach will point to the agents’ rational calculation of possibilities and risks. Children of unskilled worker, for instance, may see higher education to be more risky and cost demanding than children of academics. The sociological approach will rather point to limits and barriers in the agents’ environment and structural conditioned resources. None of the diverse approaches seems to exclude each other in practise. There is no fact of the matter which determines what kind of representation of social phenomena one must choose to make science proper. It depends on our cognitive interest which kind of explanation we want to pursue. (Indeed, there is more at stake than cognitive interests and purposes. The discussion is also loaded with moral, normative, and ideological interests.) I believe that theoretical and methodological pluralism is as healthy for the social sciences and the human sciences as it is for the natural sciences. In the natural sciences there is not only one model which can be used to explain all physical phenomena. Not even within quantum mechanics. In physics and chemistry there exist different and mutually excluding approaches which can be used to get theoretical information about a certain physical system. For instance, in nuclear physics you have mutually inconsistent representations of the atomic nucleus, the interacting boson model, the liquid drop model, and the shell model, and which of them a physicist actually uses depends on the kind of problem she wants to solve and the context in which the problem appears. You cannot say that one of the models is correct whereas the others are incorrect. The shell model is very useful for explaining the effect of individual particle movements inside the atomic nucleus whereas the liquid drop model is better for explaining the collective movements of the nucleons. Thus, it is the problem context that determines which information we can get. Similarly, the “homo economicus”-model and “homo sociologicus”-model assign different and partly excluding properties to the agent. In one perspective her action is the result of deliberation and rational free-choice, in another much of the agent’s behaviour is determined by collective features which she is not aware of, and therefore which makes her action not so free and rational. In the 6
See Martin Hollis, The Philosophy of the Social Sciences. Cambridge: Cambridge University Press 1994, pp. 94-114.
same way as the physicist can choose between the shell model and the liquid drop model, depending on whether she wants to explain individual processes in the nucleus or wants to explain collective processes in the nucleus, the social scientists can choose between homo economicus-models, game theories, etc. and “homo sociologicus”-models if she either wants to explain the agents’ actions in terms of free decision-makers or wants to explain their actions in terms of social roles and memberships of institutions. In neoclassical economics, say, one finds the rational choice of individuals very appealing as the source of the behaviour of the financial market, whereas in new institutional economics one focuses primarily on the emergence of institutions out of interaction among individuals. But it need not be an either or. It depends. Sometimes we can get a better understanding of a social phenomenon by seeing it as a result of the actions of individual agents, but sometimes we can get a better understanding if we turn around and look at it from the institutional perspective. Individuals shape institutions, and institutions shape individuals. This interdependence cannot be explained properly from one perspective only, because seeing free agents as rational individuals excludes the attribution of different properties to humans in situations where they are seen as determined by the social patterns they partake in. In other words, we must put different perspectives of explanation on man depending on the selected context. These contrasting perspectives supply or complement each other, since none of them need to be false. But whether or not both are true depends on whether or not they can be empirically justified. It is the single scientist who chooses a certain perspective of explanation with respect to the cognitive problems she wishes to solve, and it is then experience which determines whether or not this particular explanation is empirically satisfactory.
INTERPRETATION IN CONTEXT
We do not only respond to why-questions but also to how- and what-questions. The latter questions are traditionally associated with interpretation-seeking rather than explanation-seeking questions. In contrast, why-questions are typically seen as explanation-seeking questions, although there is little to substantiate the claim that an explanation-seeking question can only be a why-question. The fact of the matter is, however, that we cannot use logical or linguistic means to determine whether a question is explanation-seeking or interpretation-seeking. Rather, it is the context that determines it. I hold that interpretation is a response, often hypothetical, to a question concerning a representational problem. This question may or may not be formulated as a what-question. A representational problem can be one of two different sorts of problems. The first kind arises from our ignorance of how to represent or conceptualize a certain object or phenomenon. The second kind stems from our lack of knowledge concerning what is the meaning of a certain
phenomenon, sign, symbol, sentence, text, action, etc. What kind of representation is it, and what does it refer to, symbolize, stand for, signify, etc.? The first form of interpretation appears in connection with problems concerning classification, categorization, conceptualization, and mathematization.7 It consists in an appropriate response to questions like: “What kind of entity is Y?”, “How can Y be represented?” and “Does Y make any sense?” Elsewhere I have called this investigative interpretation.8 It may also be termed the construction of representations, frameworks, conceptual systems, theories, or meanings. A simple example is the archaeological classification of prehistoric finds into artefacts of the Stone Age, the Bronze Age, and the Iron Age. When this categorization was first introduced, it was a genuine work of constructing a conceptual framework. Today that framework is part of archaeologists’ background assumptions and something which shapes their terminology. Their use of this rough framework of classification no longer involves interpretation, because their findings as a whole do not pose a representational problem. What may be a subject of interpretation is whether or not a particular item belongs to this or that period. The second form is about explanation of meaning. I call this determinative interpretation, or interpretation proper. This kind of explanation provides us with an answer to questions like: “What does X mean?”, “What is X evidence of?” or “How can we understand X?” Determinative interpretation is just as context-dependent as other forms of explanation. Likewise, what is used to explain meaning depends on the cognitive interest and background knowledge of the explainer. Let me provide you with an example which illustrates how different interpretative perspectives can be put on the explanation of certain cultural data. Nearly forty years ago I took part in an investigation involving iron crosses in Danish churchyards.9 Around the country you find cast iron cross memorials dating from the beginning to the close of the nineteenth century, the period during which the first iron foundries were set up in Denmark. This fashion lasted in each county fifty to sixty years, until tombstones became popular. Indeed, the first question you may raise is “What do these iron crosses mean?” I knew already, as part of my background knowledge, that they are memorials. This was what started the investigation in the first place. I was not completely in an epistemic no-man’s-land. Nevertheless, the correct interpretative answer to your question depends on what you mean. It is you, we assume, who face a representational problem.
7 This kind of interpretation is, as far as I understand, the main subject in Lenk’s works on interpretation. See Hans Lenk, Philosophie und Interpretation. Frankfurt a. M.: Suhrkamp 1993; Hans Lenk, Interpretation und Realität. Frankfurt a. M.: Suhrkamp 1995; or Hans Lenk, Grasping Reality. New Jersey, London and Singapore: World Scientific 2003.
8 See Jan Faye, “Interpretation in the Natural Sciences”, op. cit.
9 Jan Faye, “Støbejernsgravkors – et metodologisk lærestykke” (Cast iron cross memorials – a methodological lesson), in: Fortid og Nutid 37, 1987, pp. 225-38.
If you want to know what their function is, an appropriate answer is that they were meant to be personal commemorations of a deceased person which reminded the living about the dead. In contrast, you may want to know what caused people to select iron crosses rather than wooden crosses or carved tombstones. In this case the concrete answer is that iron crosses are more permanent than wood, though much more important is that they became fashionable as soon as they became available. The fashion began among upper-class people like admirals, bishops, counts, dames, and esquires. Over the years, as the iron crosses became cheaper, the fashion sank down through the social classes until it became a poor man’s symbol. However, the incidence of these cast iron crosses is higher in the western and northern parts of Jutland than in the eastern part, and here again it is higher than on the peninsula of Denmark and at cemeteries around Copenhagen. So on average fewer churchyards on the peninsula have one or more of these iron crosses than in the regions of Jutland. The variation between the regions was between 12.7 and 73.0 percent. Hence the next question you may raise to me, or I may put to myself, is something like “What does this variation mean?”, “What is the explanation of this phenomenon?”, “What is this variation across the country evidence of?”, “How can we understand this variation?” or “Why do we see such a variation?” All these questions express the same intention of the explainee, namely that I, the explainee, have a representational problem concerning what these data signify. After the field work I personally had the opportunity to conduct an examination of the data of the survey, which covered half of the churchyards in Denmark. This gave me a chance to work with inferences to the best interpretation/explanation in practice. First I had to produce various relevant interpretations of the data based on some common principles concerning understanding and representing such data. I formulated a principle, which I called the unity of time, space and cause, demanding that any understanding of this unequal distribution should be connected to the time the iron crosses were established and the time that has elapsed since then, and to the place where they were produced or where they were set up, and should only involve causes which I deemed to be relevant based on common background knowledge and ontological beliefs and assumptions. These constraints resulted in the framing of no fewer than nine distinct hypotheses, all seemingly able to explain the distribution detected. Some of them interpreted the data as a sign of the status quo from the days when the memorials were erected; the others interpreted the data as the result of a development over the years in between. These purported explanations advanced considerations as various as access to cast iron as a substitute for wood, the size and number of the foundries, the number of deaths, how long the vogue for the crosses lasted, its pervasiveness, wind and weather, the veneration shown the dead, increased permanence of residence along with a rise in the number of family graves, and the number of burials proportional to the size of the churchyard after the highpoint of their use.
What is interesting about these explanations is the fact that they are, to a considerable extent, translatable into numbers; this makes them amenable to assessment by statistical methods with a view to the identification of the best explanation. The upshot in the present instance proved to be an excellent correlation between the observations and the ‘pressure’ hypothesis, defined in terms of deaths per square meter: over the years an increase in the number of fresh burials in churchyards meant that old graves bearing cast iron crosses came under pressure. The greater the number of individuals who died in a given parish and the smaller the size of the churchyard, the more necessary it became to reuse old graves. This resulted in an escalation in the rate of removal of cast iron crosses relative to the number of deaths and the size of the churchyards. There was little or no correlation between the data and the other hypotheses. However, the hypothesis concerning veneration was interesting, since it assumed that religious sentiment was much stronger in Jutland in this period and might have caused a greater reluctance to destroy family memorials; people had therefore expanded the churchyards, because plenty of land was available. But I had difficulty in finding relevant data to test this hypothesis, since I did not know how to measure veneration other than by interviewing people. Eventually I got the idea that I might be able to circumvent the impossible task of interviewing people by looking into the result of the parliamentary election in 1970, seeing how many voters the small Christian Democratic Party gained in the various regions, and comparing these numbers with the figures for cemeteries with one or more iron crosses. The correlations turned out to be highly significant. So I was left with two distinct hypotheses, both strongly supported by evidence. Apparently, these two provided me with equally good causal explanations. One explained the variation in terms of the individual agents, and thereby supported methodological individualism; the other explained the variation in terms of a general pressure on the graves, something that seems to vindicate methodological holism. I would therefore claim that there is not one single correct explanation. Which one you actually choose depends on the context from which you see the problem. If your explanatory starting point is determined by your belief as a scientist that all social phenomena are determined by individual agents, then you would exclude some explanations. Similarly, if you believe as an explanatory starting point that all social phenomena are a result of forces and pressures in the system as a whole, then you will be blind to equally good explanations. There was, however, a common cause lurking behind this strange coincidence. Both explanations related to phenomena which were a result of the urbanization which took place during the industrialization of Denmark. During this period of a hundred years, cohorts of people moved from the countryside into the main cities, changing the demography of the country completely. Not only did this have a stronger impact on the churchyards around the cities in the eastern part of Denmark, where the churchyards were comparably smaller than in Jutland,
but it also changed the minds of the population when the majority became workers instead of farmers. An urban life with a regular income makes you on average less likely to continue to keep the same religious, social, and political values as your forefathers than if you had a life depending entirely on the weather to give you a good harvest or a catch of fish.
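To make the comparative procedure described above concrete, here is a minimal numerical sketch of how competing hypotheses, once translated into numbers, might be ranked by their correlation with the observed regional variation. The region count, the proxy values, and the variable names are purely illustrative assumptions, not the survey data discussed in the text; only the two incidence extremes (12.7 and 73.0 percent) are taken from the passage above.

import numpy as np

# Hypothetical share of churchyards with at least one cast iron cross, per region
# (only the extremes 73.0 and 12.7 come from the text; the rest is invented).
observed_incidence = np.array([73.0, 58.0, 44.0, 31.0, 20.0, 12.7])

# Illustrative numeric proxies for two of the competing hypotheses, one value per region.
candidate_proxies = {
    # 'pressure' hypothesis: burials per square meter of churchyard; more pressure
    # should mean more reuse of old graves and fewer surviving crosses.
    "burial pressure": np.array([0.12, 0.18, 0.26, 0.35, 0.47, 0.60]),
    # 'veneration' hypothesis, proxied here by the Christian Democratic vote share.
    "Christian Democratic vote share": np.array([6.5, 5.4, 4.3, 3.0, 2.1, 1.4]),
}

# Rank the hypotheses by the strength of their (positive or negative) correlation
# with the observed incidence, in the spirit of an inference to the best explanation.
for name, proxy in candidate_proxies.items():
    r = np.corrcoef(proxy, observed_incidence)[0, 1]
    print(f"{name}: Pearson r = {r:+.2f}")

In such a comparison it is the absolute strength of the correlation that matters, since a hypothesis may predict either a rise or a fall in the incidence; a candidate whose proxy shows little or no correlation simply drops out, as most of the nine hypotheses did in the study described above.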
CONCLUSION
Let us summarize the main view put forward here. Explanation is an answer to an explanation-seeking question in a context in which some person lacks some information about a certain fact (or phenomenon). Humans have epistemic interests in understanding things according to their nature, and explanations are what give us understanding with respect to the questions which are raised as a consequence of these epistemic interests. The response to such questions is an explanation. Whenever we possess an explanation, we also have an understanding of the question which has been answered by the explanation. But our background assumptions determine what the relevant answer to our questions is. Explanations are part of a bigger system of beliefs. Interpretation, on the other hand, issues an answer to a question about the explanation of meaning or about the representation of a phenomenon whose comprehension falls outside somebody’s background knowledge. Whenever we interpret something it is because we can’t explain it, since we don’t understand it. The answer transforms a phenomenon, now understood in terms of some theory, from being somehow unfamiliar into something less unknown. The phenomena, or rather beliefs about the phenomena, are thereby included among that person’s background assumptions and connected to his or her background knowledge. Phenomena become intelligible and meaningful because, by attributing identity or explanation to them, an interpretation brings them into connection with our theories or belief systems. Interpreting is a process which creates either representations or explanations, and both kinds of interpretation provide us with understanding. Thus, the aim of interpretation is to reach a proper understanding of a phenomenon, regardless of whether the proposed hypothesis is concerned with traditional meaning, function, intention or causation. In the end an interpretation is a hypothesis which is presented against a background of accepted conventions and ontological assumptions.
University of Copenhagen Department of Media, Cognition and Communication Njalsgade 80, DK-2300 Copenhagen Denmark [email protected]
PETER KEMP
IMAGINATION AND EXPLANATION IN HISTORY
The aim of this study is to clarify the role of imagination in and for explanation in the exploration and writing of history. Let me confess from the beginning that it is deeply inspired by the late French philosopher Paul Ricœur (1913–2005); but I am, of course, responsible for the structure of my presentation and the adaptation of his thoughts to my own project. My starting point must be the concept of imagination itself. It covers a very wide field of meaning based on one fundamental distinction—between perception and imagination. Imagination is the use of mental images that are different from impressions of perceived facts. However, these images are different from empirical facts in very different ways. They may be different ways of coping with or reproducing facts and, therefore, imagination has been considered a kind of weakened perception. In that case, imagination is viewed as totally dependent on perception and taking all its meaning from perception. This is reproductive imagination. However, mental images have also often been understood in opposition to empirical facts as referring to an absent other person or thing in portraits, dreams, fictions, etc., or to another dimension of the facts than the perceived reality, their invisible context, whether it is physical, practical, social, historical, or metaphysical. Here, the imagination is only partly dependent on perception; it may borrow perceptual remains from real perceptions, but it imagines other things that do not exist or are absent in space or time. This is productive imagination. Imagination in history is productive imagination because it is a kind of memory. In his writing On Memory and Recollection, Aristotle claims “that memory belongs to that part of the soul to which imagination belongs” (450a). This memory presupposes perceptions of things, but the perception is gone: “it is only the affection that is present, and the fact is not” (ibid.). Thus, memory is not perception. Moreover, it is not expectation that imagines something in the future. Aristotle declares: “It is impossible to remember the future which is an object of conjecture or expectation … nor is there memory of the present, but only perception. But memory is of the past,” or “memory is of what has been (he mneme tou genomenou)” (449b). Thus, there can be imagination of the future, but this is not memory. Moreover, there can be imagination of pure inventions, but this is not memory, either, because “memory is of the past”. It follows that memory is only one particular form of imagination.
Now the question is how that form of imagination we call memory enables us to tell or write history, i.e. to explore and explain history or historical events and the past life of cultures, societies, groups, and individuals, whether we are talking about a short term or a long term. I would like to analyze four “moments” (in the Hegelian sense of a moment, i.e. part of a whole) of historical memory and its historiography:
1. Personal memory of the past, without which the past itself has no meaning.
2. Interpretation of the traces of human activities, without which we cannot understand what these traces might signify.
3. The capacity for storytelling, without which we cannot imagine the coherence of a sequence of historical events.
4. The connection of personal memory and collective memory.
I. THE MOMENTS OF HISTORICAL MEMORY
1. Personal Memory
Aristotle is well aware of the fact that memory, mneme, presupposes a sense of before and after, i.e. of time: a human being who remembers something “always says in his mind that he has heard, or felt or thought this before” (450a). This dating of memories may be more or less exact, but it is never totally without an awareness of time. It follows that all memory “implies lapse of time. Hence, only those living creatures which are conscious of time can be said to remember, and they do so with that part [of the soul] which is conscious of time” (450a). But, according to Aristotle, we do not only have memories as affections of mental pictures. We also have a power of recollection, anamnesis. We are capable of research into our memory in order to recall what is not present to our mind or what we have forgotten. He declares that recollection differs from remembering not merely in the matter of time, since it is not primarily a concern of time, “but also because, while many other animals share in memory, one may say that none of the known animals can recollect except man [anthropos = the human being]. This is because recollecting is, as it were, a kind of inference; for when a man is recollecting, he infers that he has seen or heard or experienced something of that sort before, and the process is a kind of research” (453a). However, both mental pictures and their recollection belong to the personal experience of memory. A human being knows what memory is and can recall this memory and even fight against forgetting, because he or she experiences what we today designate as “a life story”. And to this life story belongs not only our individual past life but also our past life together with others. Moreover, we know that this personal story is not pure invention, although we are able to imagine it otherwise than it was and transform it nearly totally. But even then we know that there was a past life that we have more or less purposely manipulated. And we
know that others with whom we have shared some parts of our own past life story can refute our presentation of the story. This provides the basis for a very important distinction that Ricœur, inspired by Vladimir Jankélévitch, developed in his last great work from 2000, Memory, History, Forgetting, between “having been” and “simply elapsed”. Therefore, when we say that the historian’s representation is a present image of an absent thing, “the absent thing itself gets split into disappearance into and existence in the past. Past things are abolished, but no one can make it be that they should not have been.”1 Therefore, the historian may represent them as “simply elapsed” in the sense of being irrevocable and see them simply as materials for his construction of the past. But past things and persons also belong to our history and, more or less, to our own history, i.e. our own past or the past to which we are in debt for what we have received from our predecessors. Then, “the intention of the past as having been comes out of this reinforced, once having-been signifies having been present, living, alive”.2 Moreover, to the extent that our own lives will be remembered in the memory of others who preserve what we have been or, at least, what could have been remembered, our past cannot be annihilated. Vladimir Jankélévitch expressed this mysterious having been in his book on The Irreversible and Nostalgia in a phrase that he also posted on the door of his home and that Ricœur places at the start of his own book on Memory, History, Forgetting: “He who has been, henceforth cannot not have been: henceforth this mysterious and profoundly obscure fact of having been is his viaticum for all eternity”.3 Thus, by our own life and work, we not only make history as historical actors, but we belong to a history that exceeds our finite life. Without a sense of this belonging, the past itself would have no meaning, and history would have no meaning.
2. Interpretation of the Human Traces
We must now examine how we pass from our personal memory to historiography. The first step is the interpretation of the traces of human activities. Interpretation is different from observation as pure registration. It is a guess at the meaning and importance of a phenomenon, and let me add that there is hardly an observation without a minimum of interpretation. But this needs a more detailed discussion that I cannot take up here. My topic is the interpretation of history and, since history consists of absent things and persons, it cannot be observed, only interpreted.
1 Paul Ricœur, Memory, History, Forgetting, translated by Kathleen Blamey and David Pellauer. Chicago: The University of Chicago Press 2004, p. 280.
2 Ibid., p. 364.
3 Vladimir Jankélévitch, L’Irréversible et la nostalgie. Paris: Flammarion 1974, p. 275; cf. Paul Ricœur, Memory, History, Forgetting, loc. cit., p. 602.
How do these absent things and persons appear? They appear only indirectly, by the traces we find in monuments, buildings, castles, and towns (in ruins or not), in tools and skeletons we find in graves, etc. But very little could be said about human history on the basis of these material traces if we did not also have books and other writings in our archives. These writings are testimonies given by past human beings about what they have done and experienced in their life. What gives a testimony its force as ground for interpretation is that the assertion of reality given by the person testifying is inseparable from the self-experience of this person that claims,“I was there”. It is true that testimony may be misleading, even totally false. But as Ricœur points out in Memory, History, Forgetting, whatever may be our lack of confidence in principle in such testimony, we have nothing better than testimony, in the final analysis, to assure ourselves that something did happen in the past, which someone attests having witnessed in person, and that the principal, and at times our only, recourse, when we lack other types of documentation, remains the confrontation among testimonies.4
Thus, interpretation is not a random guess. It must conform to testimonies and, in particular, to testimonies that are confirmed by other testimonies or material traces. Furthermore, written testimonies are texts that have to be interpreted according to the so-called hermeneutical circle that Friedrich Schleiermacher developed in his Hermeneutics (notes from the period 1805–1819).5 This hermeneutical circle may be defined on three levels. At the most fundamental level, the circle is created by the relationship between individual speech and collective language: here, speech presupposes language, from which it selects the words that are applied in a sentence and a text; whereas language presupposes speech in order to be maintained and renewed. On the second level, the circle is constituted by the relationship between the different parts of the text, on the one hand, and the whole text, on the other. The sentences and sections of the text give meaning to the whole text; whereas the whole of the text as con-text gives the sentences and sections their meaning. Finally, a third circle is developed by the relationship between the particular text and the whole epoch (the “whole language”) to which this text belongs. The particular text must be seen in its cultural and temporal context; whereas the period gets its meaning or specific character as a culture in a certain time from all the texts that appear in it.
4 Paul Ricœur, Memory, History, Forgetting, loc. cit., p. 147.
5 Friedrich D. E. Schleiermacher, Hermeneutik. Nach den Handschriften neu herausgegeben und eingeleitet von Heinz Kimmerle, vorgelegt von Hans-Georg Gadamer. Heidelberg: Carl Winter 1959.
The distinction in the early Schleiermacher’s hermeneutics between grammatical and psychological interpretation may also be useful for a historian who wants to understand what he or she can do to ensure the validity of an interpretation. The grammatical interpretation belongs in the third circle, between the particular text or speech and the entire epoch, and the individual human being is perceived as an organ of the language and is understood as a part of the epoch. Here, interpretation is comparison with other texts and authors. The psychological interpretation has a reversed optic, perceiving the language as the organ or medium of the author’s individuality and defining the individual style, mode of thinking, and composition. Here, interpretation is divination (hunch, guess), which re-construes and re-narrates. The later Schleiermacher (as we find in his Hermeneutik from 1832) gave priority to the psychological interest in the author’s life over grammatical interpretation, but the early Schleiermacher wanted to establish a balance between them. He claims that, if one form of interpretation could be taken to its logical conclusion, the other would be superfluous. However, since this is not the case, the art is to use both in relation to each other. The hermeneutics developed in the 20th century (in particular, by Hans-Georg Gadamer and Paul Ricœur) totally renounced psychological interest in the author; the text should be interpreted even without any knowledge of the author’s personality. Instead of an interest in the individual author, interpretation should try to understand the individual style and composition of the text and the whole period to which it belongs. In historiography, a similar development has taken place, in that interest in great individuals has, to a considerable extent, been replaced by an interest in ordinary people, their mentalities, their roads, villages, and towns, their economies, their relationships to nature and climate, etc. Although biographies and, in particular, autobiographies have continued to play an important role in cultural life, they are no longer considered, as Wilhelm Dilthey claimed, the most profound form of historiography, but only one among several other ways of writing history.
3. The Capacity for Storytelling
This shift of focus in historical research has involved the idea that narrative is only one form for the presentation of the past in historiography. That seems to exclude the idea that history is made up of stories about the past and that historical imagination is narrative imagination. In particular, this opinion has appeared in the discussion of works from the Annales School, especially the work of Fernand Braudel, The Mediterranean and the Mediterranean World in the Age of Philip II (1966), translated into English in 1972. Braudel distinguished between what happened in the short term, including individual events, and what took place over the long term (la longue durée) around the Mediterranean, conditioned by land, climate, towns,
roads, economy, etc. And the long term seemed to be described and explained without any use of narrative imagination.6 Ricœur has discussed this question in his great work on Time and Narrative. He recognizes that a historical presentation does not need to take the form of a novel or to be a historical narrative. There are other kinds of history than the socalled narrative history. But he claims that, for our historical understanding and explanation, our narrative competence, i.e. our competence to follow a story and to tell a story, does not need to be displayed in a whole story in classical Aristotelian form with a beginning, a climax, and an ending. It is sufficient that we imagine that something has taken place in the past. On the other hand, without this minimal narrativity, there would be no historical focus. Ricœur maintains that if history were to break every connection to our basic competence for following a story and to the cognitive operations constitutive of our narrative understanding … it would lose its distinctive place in the chorus of social sciences. It would cease to be historical.7
There are, according to Ricœur, some features of the narrative competence that the historian cannot do without. First of all, the competence for formulating narrative sentences. A narrative sentence does not simply express an ordinary action, but the historical action is considered in relation to its result and its importance in relation to other actions. The effect or the effects of an historical fact determine the cause, not the reverse like an explanation in natural science. In that sense, the historian is not only an observer but a judge who judges what has happened in the light of what followed. Secondly, the historian puts the narrative sentences into a text that is more than a summary of the sentences but a representation we must follow from the beginning to the end. We cannot deduce this end from the beginning as in a mathematical calculus, and we cannot say that a consequence could have been predicted with absolute certainty from an initial cause. We accept the end if we found probable all things considered. It follows that it is the temporal succession of events or occurrences, and not an abstract logic deduction, that makes the historical explanation different from a purely physical, economic or sociological explanation. Thirdly, narrative competence allows us to establish a configuration that puts the different historical facts in relation to each other and explains them according to a social and culture structure. And, in the same way that a cause of an effect 6
7
Fernand Braudel, La Méditearranée et le monde méditerranéen a l’époque de Philippe II. Paris: Armand Colin, 1949, 2me édition 1966; English translation: The Mediterranean and the Mediterranean World in the Age of Philip II . Berkeley and Los Angeles & London: University of California Press 1972/1995. – “Histoire et sciences sociales: La longue durée” in Annales. Histoire, Sciences Sociales 13.4 (October–December 1958), pp. 725–753; with A. Coll : «Histoire et sciences sociales: La longue durée» in: Réseaux, 5:27 1987:7-37. Paul Ricœur, Time and Narrative, volume 1. Chicago: Chicago Universty Press 1984, p. 91.
may not only be a link in a series of causes but may also be the first cause of a new series that appears at a certain moment of historical duration, a structure or configuration may be stable for a shorter or a longer time, but it may also be a new structure that has been established at a certain moment of history. This does not exclude the temporal sequence. On the contrary, the configuration is connected to a sequence and explains its content. Finally, narrative competence is capable of an emplotment, une mise en scène, that uses a certain style and a certain genre in its presentation, just as Hayden White has shown in his Metahistory from 1973 that historians in the 19th century used satire, romance, comedy, and tragedy. It is true that the scientific explanation in historiography is not only given by the unfolding of these four features of narrative competence. Historical inquiry needs more. It also presupposes the development of concepts for criticism and for putting the sources in order. Moreover, the historian must make an effort to be objective, i.e. coordinate events that have happened at the same time at different places and bring different—political, social, economic, military, cultural, etc.—aspects of history together. And, finally, he must be aware of the ideology or philosophy of history that his own research implies and be ready to defend it against a critique of ideology. But the historian is not to look for a law or laws in history, as claimed by Carl Hempel.8 There can be no law in history in the way we can assume there are laws in nature, because the course of actions, events or occurrences is not determined by natural causes (according to which the same causes have the same effects), but by singular causes. Instead of explanation by laws, the historian uses a “singular causal explanation”.9 Ricœur has found this idea of singular historical explanation in Max Weber, William Dray and Georg Henrik von Wright. According to this explanation, B would not have happened if A had not happened. But A is considered a sufficient cause for the explanation of B. That does not mean that B can be logically deduced from A and could have been predicted from A. Moreover, the fact that A is a cause that happens only once does not preclude that it is similar to other singular causes and may be ascribed to the same type of occurrences. But it is not identical to other causes and, thus, does not belong to a law. It should also be noticed that anonymous entities such as peoples, nations, and civilizations have no meaning without the concrete human agents who belong to them. Peoples, nations, and civilizations are born, develop, and die as singular individuals, each of which has a life-story that can only be understood by narrative. Finally, the fact that historians, even Braudel, speak about persons, battles, and treaties shows that the long term is not without narrativity.
8 Carl Hempel, “The Function of General Laws in History”, in: The Journal of Philosophy 39, 1942.
9 Paul Ricœur, Time and Narrative, volume 1, loc. cit., p. 182 ff.
They are not necessarily short explosions. They are rather considered as variables of the plot. But, as such, they are temporal changes that take place in the great drama the historian tells us about—for instance, the drama of the Mediterranean.
4. The Connection of Personal Memory and Collective Memory
The connection of personal memory and collective memory, belonging to what Raimo Tuomela would call, respectively, ‘I thinking’ and ‘we thinking’,10 is another moment of the historical imagination I would mention. In order to clarify the role of imagination in historical research and presentation, we have concentrated on personal memory, but we have not been able to avoid the fact that the interpretation of the past cannot succeed without the testimonies of others and remnants in general from others. However, the question is whether this recollection of history based on traces of the other could be a totally individual work or, on the contrary, whether it is a collective work and, perhaps, a totally collective affair. In the French tradition stemming from the sociologist Émile Durkheim, it has been claimed that personal memory is an illusion. The historian Maurice Halbwachs has based this claim on the fact that, from our birth, we are never alone; an individual belongs to a group and his or her memories are shared memories; and we speak together about common memories, from our common life in the family, in the school class, and in the places in nature and in town where we have experiences together, to our common life in the institutions where we work or study and in the society to which we belong. Halbwachs declares in his book Collective Memory from 1950, translated into English and published the same year: “a person remembers only by situating himself within the viewpoint of one or several groups and one or several currents of collective thought”.11 As a response to this extreme position defending the idea of pure collective memory, Ricœur does not deny that our memory develops in a relationship with others and that, from the beginning, every individual shares his or her memories with others. This shared memory even has a very profound existential importance for us in close relations.12 However, he claims that personal memory and collective memory are complementary, and he endorses the observation of P. F. Strawson in his Individuals from 1959 saying that predicates, whenever they are attributable to oneself, can be attributed to someone other than oneself.13 In other words, if a phenomenon is self-ascribable, it must also be other-ascribable. Ascription to others is, therefore, “not superimposed upon self-ascription but coextensive with it”.14
10 Raimo Tuomela, The Importance of Us: A Philosophical Study of Basic Social Notions. Stanford: Stanford University Press 1995 (Stanford Series in Philosophy).
11 Maurice Halbwachs, Collective Memory. New York: Harper Colophon, p. 33, quoted by Ricœur in Memory, History, Forgetting, loc. cit., p. 121.
12 Paul Ricœur, Memory, History, Forgetting, loc. cit., p. 132.
13 Ibid., p. 125.
14 Ibid., p. 127.
It is true that this complementarity does not prevent error and illusion. Both personal memory and collective memory can be false consciousness. Yet, although it is often collective memory that must correct or criticize personal memory, the reverse may also be justified and even considered as a moral duty for the individual. I consider that this is highly relevant for the work of the historian. Every historian must use his or her own memory and his or her own interpretation and appropriation of the sources and, in particular, the testimonies of the past. But historiography today is always based on the research and historiography of others and, although it may present itself as a one man’s work, it cannot remain an individual work. As a member of a professional community, the historian who wants to be recognized as such must listen to criticism from his colleagues and even from other contemporaries who find his or her interpretation and personal memory partly or totally unsuitable for a shared memory and, thereby, for an objective view of the past in a collective memory.
II. NEW LIGHT ON EXPLANATION
Our analysis of the four moments of historical memory throws light on the role of understanding in relation to the explanation of historical phenomena. We may thereby contribute to the old discussion about the difference between understanding and explanation. Wilhelm Dilthey claimed in his work on “the makeup of the historical world” that there is a dichotomy between explanation and understanding and that the difference between the natural sciences and the human or social sciences is built on this dichotomy. He considered the makeup of the historical world as a “nexus of effects (Wirkungszusammenhang)”15 and rejected the idea of causality as a valid understanding of historical phenomena. This nexus of effects was described by Hegel as a process by which human experience has been objectified in “expressions of life” (language, myth, religion, morality, organization, and art), which must again be interpreted. Experience, expression, understanding—Erlebnis, Ausdruck und Verstehen16—were, according to Dilthey, the three phases of the hermeneutic process in which expression carries an authentic personal experience of life or a lived event to others who, in turn, appropriate this articulation according to their own understanding. Thus, Dilthey’s demarcation between the natural sciences and the human sciences: the natural sciences seek causal explanations, while the human sciences seek an understanding of the individual’s life. In both kinds of sciences, abstract similarities can be found, and both look for objective knowledge.
15 Wilhelm Dilthey, Der Aufbau der geschichtlichen Welt in den Geisteswissenschaften (1910). Frankfurt am Main: Suhrkamp Verlag, Theorie 1970, p. 167.
16 Ibid., p. 98f and 157f.
But, in the human sciences, the focus is also on the individual, the unique, and what only happens once. In the beginning, Dilthey focussed very strongly on the individual agent and wanted, like the later Schleiermacher, to find the authors behind the works. But he overcame this Romantic interest through a Hegelian focus on the objective social reality that remains after the disappearance of authors or creators. However, our analysis of the role of the imagination as memory and recollection in historical research and presentation shows that it is not fruitful or clarifying to exclude explanation from historical inquiry. When the historian tries to answer our questions—What happened? When did it happen? How did it happen? Why did it happen? What were the results?—his or her answers are explanations that give reasons, describe singular causes and their results and singular configurations and their implications at a certain time, and thereby present a nexus of effects that seems reasonable without being a logical deduction. It follows that explanations in history are not something that should be avoided but exactly what we want to have, although they must be taken in another sense than in natural science. We want to explain what has happened in the past in order to better understand our world as a narrative whole. It is even “by explaining more we obtain a better understanding” (“expliquer plus pour comprendre mieux”), as Paul Ricœur has said on several occasions.17 It is true that a singular cause or a singular configuration normally involves a series of causes and effects or a structural coherence that looks similar to a causal chain or a systemic coherence in non-living nature. But there is no universal causality or coherence in history. Historical explanations must be limited to situating an event or an occurrence as a singular initial cause, or in a particular causal series, or as part of a particular configuration. Therefore, when a historical phenomenon is surprisingly new in relation to other phenomena, this novelty, which is itself a cause, cannot be explained or fully explained by other causes or structures. Then, we understand that here really is something new and we must accept it without explanation or only with insufficient explanation. In this case, we can say that we understand without explanation. But historical understanding is much more than an acceptance of the inexplicable. It is, first of all, an imagination by which we situate ourselves in the world of memory as a whole. We only understand historical phenomena when our explanations are provided against a background of personal and collective memory that interprets the sources of the past by using our basic competence for following a story. It is, then, our imagination of the past that constitutes our understanding. And, therefore, historical understanding is fundamental for all historical explanation.
17 Paul Ricœur, Temps et récit. Paris: Seuil 1984, Vol. II, p. 13 and 54; Time and Narrative, Volume 2, translated by Kathleen Blamey and David Pellauer. Chicago: The University of Chicago Press 1985, p. 5 and 32; see also Jean-Pierre Changeux and Paul Ricœur, Ce qui nous fait penser. Paris: 1998, p. 141-150; What Makes Us Think?, English trans. by M. B. DeBevoise. Princeton and Oxford: Odile Jacob University Press, 2000, p. 125-133.
Finally, Dilthey was right in drawing the distinction between explanation and understanding. But this distinction is not the foundation for the difference between natural sciences and human sciences. Therefore, he should also have distinguished between different forms of explanation according to different forms of understanding for his separation of human sciences from natural sciences to be justified. Indeed, explanations in human sciences differ from those in natural science by the fact that they belong to a human world in which we not only remember the human beings who helped create and cultivate it but also imagine the development of ideas and knowledge, ways of life and mentalities, arts and technologies, etc. as something that has happened in the lives of human beings other than ourselves, who still have the same kind of feelings and experiences that we have. This imaginative conception of belonging to a human world we have in common with our predecessors is the basic understanding behind all historical explanation. Thus, it is by our present imagination about past life that we understand what we cannot explain. However, much can be explained. We should only take into account that, in human or social sciences in general and historical science in particular, we explain differently than in natural sciences, because we understand differently when we imagine a historical world than when we imagine nature. The two kinds of explanation presuppose two kinds of understanding.
Bjergtoften 10 DK- 2900 Hellerup Denmark [email protected]
PAOLO GARBOLINO
HISTORICAL NARRATIVES, EVIDENCE, AND EXPLANATIONS
HISTORICAL NARRATIVES AND EVIDENCE
A story is a hypothetical historical narrative describing a process, or a narrative of particular events arranged in time and forming a meaningful totality. Historians tell stories to provide understanding of the past and use narratives as a form of explanation: “A narrative explanation, presumably, presents an account of the linkages among events as a process leading to the outcome one seeks to explain”.1 William Whewell’s term ‘colligation’ has been borrowed by William Henry Walsh to describe “the procedure of explaining an event by tracing its intrinsic relations to other events and locating it in its historical context”.2 ‘Historical’ or ‘genetic’ explanations are used also in evolutionary biology.3 It is acknowledged that good narratives must be supported by evidence: “Narratives are constrained by the facts, since they are constructed from verifiable statements. They are subject to objective evaluation […] [but] as is to be expected, are underdetermined by their evidence, agreement on evidence still allows for the construction of logically incompatible histories”.4
Facts provide evidence for narratives, and constructing good evidence-based ‘stories’ entails providing an explanation of the occurrence of these particular observed facts, even though ‘good’ explanations are not necessarily ‘covering law’ explanations. In his seminal paper about the role of laws in history, Hempel wrote that: Even if a historian should propose to restrict his research to a “pure description” of the past, without any attempt at offering explanations or statements about relevance and determination, he would continually have to make use of general laws. For the object of his studies would be the past – forever inaccessible to his direct examination. He would have to establish his knowledge by indirect methods: by the use of universal hypotheses which connect his present data with those past events. This fact has been obscured partly because some of the regularities involved are so familiar that they are not considered worth mentioning at
1 Paul Roth, “Narrative explanations: The case of history”, in: History and Theory 27, 1988, pp. 1-13, p. 1.
2 William Henry Walsh, Introduction to Philosophy of History, London: Hutchinson 1951, p. 59.
3 Kenneth Schaffner, Discovery and Explanations in Biology and Medicine, Chicago: Chicago University Press 1993.
4 Paul Roth, “Narrative explanations: The case of history”, loc. cit., p. 13.
all; and partly because of the habit of relegating the various hypotheses and theories which are used to ascertain knowledge about past events, to the “auxiliary sciences” of history.5
Interpretivists claimed that explanations of human behaviour are structured entirely differently from explanations of the behaviour of physical objects, since human behaviour consists of actions done for reasons rather than events resulting from causes. Collingwood made a distinction between the outside and the inside of an event.6 By the ‘outside’ of an event, he meant its physical properties as the spatial and temporal location and the fact that bodies are actors in events: for example, the passage of Caesar across the river Rubicon at a certain date. By the ‘inside’ he meant the mental properties of the actors, as Caesar’s intention to defy Republican law. An action is the unity of the outside and the inside of an event and the historian’s task is to study actions, thinking himself into this action, and discerning the thoughts of its agents. Only the outside of an event can be linked by universal hypotheses to another physical event, a trace, because both are physical events, and there is a natural process connecting them. Actions are a class of events, because also actions occur and they have relatively clear beginnings and endings and (unclear) spatial boundaries. Actions can be causes of which events are effects, but, if Donald Davidson is right, we cannot have ‘covering laws’ of the kind Hempel was looking for: there are causal laws connecting reasons and actions but they cannot be psychophysical laws, and they cannot have the form of stating a regular connection between a psychological cause and an action.7 But psychological causes, beliefs and desires, can leave physical traces as well as physical causes. William Dray, while he was denying that the covering law model was apt to the explanation of human behaviour, acknowledged that this explanation has an inductive, empirical side, for we build up to explanatory equilibrium from the evidence. To get inside Disraeli’s shoes the historian does not simply ask himself: “What would I have done?”; he reads Disraeli’s dispatches, his letters, his speeches, etc.8
Social constructivism and Hermeneutics have taken the task of social scientists to be that of interpreting the social discourse. In their most radical versions, they have claimed that there is no logic of evidence, but only a discursive practice that creates ‘facts’. The historian Carlo Ginzburg has written that: The fashionable injunction to study reality as a text should be supplemented by the awareness that no text can be understood without a reference to extra textual realities. Even if we reject positivism, therefore, we must still confront ourselves with notions like “reality”,
5 Carl Hempel, “The Function of General Laws in History”, in: Journal of Philosophy 39, pp. 35-48, p. 48.
6 Robin Collingwood, The Idea of History, Oxford: Oxford University Press 1946.
7 Donald Davidson, Essays on Action and Events, Oxford: Clarendon Press 1980.
8 William Dray, Laws and Explanation in History, Oxford: Oxford University Press 1957, p. 131.
“proof” and “truth”. […] We can conclude, therefore, that the tasks of both the historian and the judge imply the ability to demonstrate, according to specific rules, that x did y, where x can designate the main actor, albeit unnamed, of a historical event or of a legal act, and y designates any sort of action.9
The comparison between the work of the historian and the work of the judge is a classical topic, but it is particularly meaningful because in recent years the logic of evidence has been studied with particular attention precisely in the field of law. The early attempt by John Wigmore to develop a diagrammatic approach to the analysis and presentation of legal arguments has been taken up by the so-called New Evidence Theory scholarship.10 What we might call the Wigmorean view can be restated as follows: in the context of an argument about a question of fact every inferential step from evidence to interim proposition to ultimate probandum or hypothesis requires a warrant. Such warrants typically take the form of “background generalizations”.11
The term ‘warrant’ is borrowed from Stephen Toulmin’s path-breaking work on the analysis of arguments: warrants provide justification for the statement that the data support the conclusion of an argument.12 Warrants are what, in another context, Alison Wylie calls “linking principles”: the various kinds of background knowledge—the linking principles, “middle-range theory”—that mediate the interpretation of archaeological data as evidence and establish a connection between surviving archaeological traces and specific events and conditions in the past that are thought to have produced them.13
10
11 12 13
Carlo Ginzburg, “Checking the Evidence: The Judge and the Historian”, in: James Chandler, Arnold Davidson, and Harry Harootunian (eds.), Questions of Evidence. Proof, Practice, and Persuasion across the Disciplines, Chicago: The University of Chicago Press 1994, pp. 290-303, p. 295-6. John Wigmore, The Principles of Judicial Proof: As Given by Logic, Psychology and General Experience and Illustrated in Judicial Trials, Boston: Little Brown & Co. 1937; David Schum, The Evidential Foundations of Probabilistic Reasoning, New York: Wiley 1994; Joseph Kadane, David Schum, A Probabilistic Analysis of Sacco and Vanzetti Evidence, New York: Wiley 1996; Terence Anderson, William Twining, Analysis of Evidence: How to do Things with Facts Based on Wigmore’s Science of Judicial Proof, 2nd ed., Evanston: Northwestern University Press 1998; William Twining, Ian Hampsher-Monk (eds.), Evidence and Inference in History and Law, Evanston: Northwestern University Press 2003; Terence Anderson, David Schum, and William Twining, Analysis of Evidence, 2nd ed., Cambridge: Cambridge University Press 2006. William Twining, “Evidence as a multi-disciplinary subject”, in: Law, Probability and Risk 2, 2003, pp. 91-107, p. 99. Stephen Toulmin, The Uses of Argument, Cambridge: Cambridge University Press 1958. Alison Wylie, “Evidential constraints: Pragmatic objectivism in archaeology”, in: Michael Martin, Lee McIntyre (eds.), Readings in the Philosophy of Social Sciences,
The topic of evidence emerges from these recent works “as a multi-disciplinary subject [that] is about inferential reasoning. […] ‘Evidence’ is a word of relation used in the context of argumentation (A is evidence of B)”.14 The remarks made more than fifty years ago by Hempel maintain their validity and relevance: interpretation and narrative cannot be kept separated from problems about inference and the logic of evidential reasoning.
THE LOGIC OF HISTORICAL EVIDENCE
The evidential reasoning of the historian and the judge seeks a connection between a particular observed event, described by a proposition E, and another, not observed, particular event that occurred in the past, described by a proposition H. Archaeological data, DNA data, eye-witness reports, and written documents are observable events. Missing documents must have left surviving traces which allow us to say that they existed in the past. Hearsay testimony must be reported by someone. And missing data can have an evidential import precisely through the circumstance that certain facts that should have been expected did not occur, and someone or something must bear testimony that they did not occur. Historical evidence is not confined to historians and judges. The life sciences are trying to reconstruct the evolutionary past of living species, and a growing and important amount of the data of today’s physics are traces left by past events in the history of the Universe. Which kind of connection must exist between a known proposition E and an unknown, uncertain, proposition H, in order for us to be able to say that E is (historical) evidence for H? A widely accepted answer to this question, which goes back to William Whewell, is: because H is a possible explanation of E. According to this answer, E is evidence for H if there exists an explanatory relation between H and E.15 According to Peter Achinstein’s recent proposal, this explanatory relation must be true with high probability: E is potential evidence that H if and only if the probability that there is a true explanatory relation between H and E, given E, is greater than 1/2, E is true, and E does not entail H. Achinstein is able to prove that this definition implies that the probability of H given E is greater than 1/2.16 Achinstein takes ‘evidence’ to mean strong positive evidence, that is, evidence that raises the probability of the hypothesis at hand beyond a given threshold. This understanding of ‘evidence’ is coherent with the philosophical point of view according to which believing is an ‘all-or-nothing’ matter: one can believe that a proposition is true only if its probability is great enough.
13 (cont.) Cambridge (Mass): The MIT Press 1996, pp. 747-765, p. 753-4.
14 William Twining, “Evidence as a multi-disciplinary subject”, loc. cit., p. 97.
15 Gilbert Harman, “The inference to the best explanation”, in: Philosophical Review 74, pp. 88-95; Peter Lipton, Inference to the Best Explanation, London: Routledge 1991.
16 Peter Achinstein, The Book of Evidence, Oxford: Oxford University Press 2001, pp. 160-4; and p. 170.
What is called 'identification evidence' in forensic science offers a counterexample to Achinstein's definition of "potential evidence". The so-called 'island problem' is a toy example that illustrates the basic structure of any identification problem.17 A murder has been committed on an island, on which N+1 male inhabitants remain. Suppose, for the sake of argument, that a totally reliable eye-witness has seen the murderer running away and that the murderer was bald. Horace is bald, and the probability that another random male on the island is bald is P. Let H be the proposition stating that Horace is guilty, and let E be the proposition stating that the man seen by the eye-witness was bald. Is E potential evidence that H? Why has this event occurred, namely the observation, spatially and temporally located, of a bald man running away? One possible answer is that H is true, and the explanatory connection is deductive. Achinstein shows that E is evidence that H only if:18
(1) p (there is an explanatory connection between H and E | H & E) × p (H | E) > 1/2.
In our example, because the explanatory connection is deductive, it holds that:
(2) p (there is an explanatory connection between H and E | H & E) = 1.
Therefore, E is evidence that H only if p (H | E) > 1/2. In the 'island problem' this posterior probability is given by the formula:19
(3) p (H | E) = 1/(1 + NP)
Let us take N = 100 and P = 0.04. Then p (H | E) = 1/5, and E is not potential evidence according to Achinstein's definition. Knowledge of E raises the probability of H from 1/101 to 1/5, and it seems reasonable to say that it is rather 'good' evidence, even though far from being decisive. The example shows the fundamental problem of any definition of evidence that makes use of a probability threshold: something that is not evidence suddenly becomes evidence, and vice versa. This is contrary to a common understanding of 'evidence' as exemplified by the U.S. Federal Rule of Evidence 401: "Relevant evidence means evidence having any tendency to make the existence of any fact that is of consequence to the determination of the action more probable or less probable than it would be without the evidence".
17 Richard Eggleston, Evidence, Proof and Probability, 2nd ed., London: Weidenfeld & Nicolson 1983, Appendix 3.
18 Peter Achinstein, op. cit., p. 155.
19 The posterior odds are: p (H | E) / p (not-H | E) = [p (H) / p (not-H)] × [p (E | H) / p (E | not-H)] = (1/N) × (1/P) = 1/NP. Thus, the posterior probability that Horace is guilty is: p (H | E) = (1/NP) / (1 + 1/NP) = 1/(1 + NP).
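To make the arithmetic of the example easy to check, here is a minimal computational sketch (an editorial illustration, not part of the original text); the numbers N = 100 and P = 0.04 are those used in the example above.

```python
# Island problem: posterior probability that Horace is guilty, given the
# evidence E that the man seen running away was bald (formula (3) above).

def posterior_guilt(N: int, P: float) -> float:
    """Prior odds 1/N times likelihood ratio 1/P give posterior odds
    1/(N*P); converting odds to a probability yields 1/(1 + N*P)."""
    posterior_odds = 1.0 / (N * P)
    return posterior_odds / (1.0 + posterior_odds)

N, P = 100, 0.04
prior = 1.0 / (N + 1)          # p(H) = 1/101 before the evidence
post = posterior_guilt(N, P)   # p(H|E) = 1/5 after the evidence
print(prior, post)             # E raises the probability of H considerably,
                               # yet p(H|E) stays below Achinstein's 1/2 threshold
```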
The common understanding of the terms 'probable' and 'probability' in the context of the Law of Evidence is not Bayesian; that is, it is not the standard view in this field that degrees of 'probability' must obey the rules of mathematical probability.20 But, whatever the meaning of probability, E is understood as evidence for H if E is able to change the degree of belief that the fact finder entertains about H. On the other hand, probabilistic relevance is a necessary but not a sufficient condition, for different individuals can disagree about whether a certain piece of evidence is relevant for the hypothesis, or about its degree of relevance. What can be required is that anybody have a good reason for believing that a certain piece of evidence is relevant for the hypothesis. A good reason is that the hypothesis is a potential explanation of the occurrence of the event that counts as evidence.
In the context of evidential reasoning, an 'explanation' is an answer to a why-question: why did this particular event occur? Why was the man running away bald? Why, in this sample from a thin band of reddish clay, is the amount of iridium more than three hundred times normal levels? (Because it is the trace of an asteroid impact on the Earth.) Why is this antenna troubled by a uniform background noise, seemingly coming from every point in the sky, night and day? (Because it captures the signal of the Cosmic Background Radiation.) Maybe not every explanation is an answer to a why-question, but in evidential reasoning any explanation is an answer to a particular why-question that makes use of pertinent 'scientific' knowledge. A why-question has a context that is determined by the topic of the question, a proposition E, with its associated contrast-class X that includes some other propositions E1, E2, …, alternatives to E, and by the body of background knowledge K.21 A why-question arises in a given context if K entails that E is true, and there is at least one true proposition H, compatible with K, that bears a relevance relation R to the pair 〈E, X〉. Let us say that, if the following conditions hold: (a) E is true; (b) H bears R to 〈E, X〉; (c) R is a warranted relevance relation, then H is explanatorily relevant for E. An adequate explication of '(historical) evidence' can now be given as follows. A particular proposition E is evidence for another particular proposition H if and only if: (i) H is explanatorily relevant for E; (ii) H is probabilistically relevant for E.
20 I am assuming that ‘probability’ is Bayesian probability, as it is shown in footnote (19), where Bayes Theorem is used. 21 Bas van Fraassen, The Scientific Image, New York: Oxford University Press 1980.
Condition (c) above is an addition to Van Fraassen's original definition. Indeed, Van Fraassen did not set forth any condition on the relevance relation R, and this has been criticized on the grounds that one runs the risk of being wrecked against "the Scylla of the 'anything goes' theory of explanation".22 On the other hand, by adding the condition that the relation R should be a 'genuine' scientific relation, one sails into the Charybdis of a relation of explanatory relevance too strong to be used in evidential reasoning, which mostly relies on rough-and-ready common sense generalizations.
In arguments about evidence, several different kinds of general propositions play an important role both as discrete steps in an argument and as background knowledge. […] scientific truths (such as the law of gravity), common sense generalizations (such as that running away is indicative of a sense of guilt), commonly held beliefs (such as national or ethnic stereotypes, including prejudices, that suggest that a person of such origins has certain characteristics), and general background information bearing on the present case (such as a generalization about X's habits or Y's character).23
22 Philip Kitcher, Wesley Salmon, "Van Fraassen on Explanation", in: Journal of Philosophy 84, 1987, pp. 315-330.
23 Terence Anderson, William Twining, op. cit., p. 43.
The scope of common sense generalizations is wide.
In law or medical diagnosis or historical inquiry, in considering a particular case calling for judgment about a particular event or situation, insofar as the issue is susceptible to rational argument, the main distinction is not between scientific and intuitive (or subjective) judgment. Rather it is between different kinds of generalization (scientific, common sense, case-specific) and particular items of information all of which have evidential functions in the context of an argument and all of which are subject to critical appraisal in respect of their evidential credentials—viz. relevance, credibility and probative force. […] in most contexts, "evidence" cannot be restricted to "hard" scientific data.24
In order to be able to sail in the channel between Scylla and Charybdis, I take the stance that Kitcher and Salmon have called "modest relativism": by 'a warranted relevance relation' I shall mean a relevance relation that, "for some particular science(s) and period of interest", is accepted as a 'genuine' explanation.25
24 William Twining, "Evidence as a multi-disciplinary subject", loc. cit., pp. 95-6.
25 The problem of which relevance relations are 'genuine' is well known, and it is the object of a never-ending debate in the Law. A famous, and widely discussed, ruling by the U.S. Supreme Court in 1993, known as the Daubert Rule, assigns the judge the role of scientific gatekeeper when it states that: "Faced with a proffer of expert scientific testimony […] the trial judge […] must make a preliminary assessment of whether the testimony's underlying reasoning or methodology is scientifically valid and properly can be applied to the case at issue. Many considerations will bear on the inquiry, including whether the theory or technique in question can be (or has been) tested, whether it has been subjected to peer review and publication, its known or potential error rate, and the existence and maintenance of standards controlling its operation, and whether it has attracted widespread acceptance within a relevant scientific community." (Daubert v. Merrell Dow Pharmaceuticals, 509 U.S. 579, 1993).
SCIENTIFIC LAWS AND COMMON SENSE GENERALIZATIONS

Scholars of evidence in the law have provided a provisional taxonomy of general statements. The basic distinction is between scientific, common sense, experience-based and case-specific generalizations.26 Scientific generalizations are based upon the laws of science. Case-specific generalizations are those that are or may be established upon evidence present in a particular case, as for example, "The employer-defendant in this case regularly discriminated against women in its employment practices". Experience-based generalizations are generalizations that are the product of direct individual experience, even though some of them may be widely shared in a community: for example, the experience-based knowledge of policemen, fingerprint experts, physicians, art critics and connoisseurs. Common sense generalizations are so widely accepted in the particular community that they often appear in arguments only implicitly as, for example, "the leading principle of folk psychology"27 that, if any agent X wants A, and if X believes that B is a means to attain A under the circumstances, then X does A.
Many common sense generalizations are general statements of the form 'if A, then usually (sometimes, typically, frequently) B' as, for example, the statement that a veracious eye-witness will usually tell the truth. Also many generalizations of the social sciences have this form. There are two possible readings of such statements. The first is to consider them as probabilistic laws: if A then in most cases (in few cases) B. There exist statistical laws in the social sciences and they play an important role in evidential reasoning, but common sense generalizations are not statistical laws. One cannot have any statistics about the frequency with which veracious eye-witnesses tell the truth, or about the frequency with which agents who desire A do action B, because by definition veracious eye-witnesses tell the truth, and rational agents who want A do B. Rather, one can have a high subjective probability that this particular witness is veracious and, therefore, that he is telling the truth, and that this particular agent is rational and that, in these circumstances, he desires A.
The second understanding of 'if A then usually B' statements is that they bear an unstated ceteris paribus clause excluding a (potentially endless) list of conditions: saying that if A then usually B would not be different in meaning from a universal statement with an 'other things being equal' clause: (ceteris paribus) if A then B. Starting from Hempel's paper on provisos and Nancy Cartwright's provocative theses, the nature of ceteris paribus laws has been an important issue in the philosophy of science and in the philosophy of the special sciences.28
26 The list that follows is taken from Terence Anderson, David Schum, and William Twining, op. cit., pp. 265-76.
27 Alexander Rosenberg, Philosophy of Social Sciences, 2nd ed., Boulder: Westview Press 1995, p. 31.
28 Carl Hempel, "Provisoes: A Problem Concerning the Inferential Function of Scientific Theories", in: Erkenntnis 28, 1988, pp. 147-164; Nancy Cartwright, How the Laws of Physics Lie, Oxford: Oxford University Press 1983.
Common sense generalizations of the form 'if A then usually B' are not ceteris paribus generalizations. There are two readings of cp generalizations: either they can be reduced to probabilistic laws, or they are incomplete generalizations. I do not consider here the traditional distinction between 'lawlike generalizations' and 'contingent generalizations' because the considerations that follow apply to both, and, according to my "modest relativist" view of explanatory relevance, warranted contingent generalizations can be used in evidential reasoning. If cp generalizations can be reduced to probabilistic laws, then the remarks made above will hold. If they are incomplete generalizations, then common sense generalizations are not cp general statements. The only non-trivial way of completing a cp generalization would be to formulate some conditions in the language of a more basic scientific theory, conditions that Fodor has called completers.29
Let us take a generalization concerning testimony: a veracious eye-witness will usually tell the truth. Being a veracious eye-witness is a mental state, and it means having the intention to tell what one believes. This generalization is always implicitly used in the inference from the event that X says that H occurred to the event H. Actually, it is a kind of prima facie generalization that can be further analyzed in terms of the "leading principle": if X wants to be cooperative, and if X believes that H occurred and that saying that H is a means to be cooperative under the circumstances, then X says that H occurred. Let us suppose we are able to give completers for such a generalization, so that a complete scientific answer to the question 'why does X say that H occurred?' will be: because H is true and (a long conjunction of universal and/or statistical laws plus statements of antecedent conditions). Would such an answer be more satisfactory than the answer 'because H, and X is a veracious eye-witness'? This last answer seems to be a good explanation in a context where it is a fact finder who is asking the question, and given that the truth conditions of the two answers would be the same.
There is a third possible interpretation of common sense generalizations which is helpful for understanding the role they play in evidential reasoning. Some authors have put forward the idea that intentional psychology is a family of models, following Ronald Giere's model-based approach to understanding scientific theories and laws.30
29 Jerry Fodor, "You Can Fool Some of the People All the Time, Everything Else Being Equal; Hedged Laws and Psychological Explanations", in: Mind 100, 1991, pp. 19-34.
30 Ronald Giere, Explaining Science: A Cognitive Approach, Chicago: Chicago University Press 1988; Ronald Giere, Science without Laws, Chicago: Chicago University Press 1999; Paul Teller, "Twilight of the perfect model", in: Erkenntnis 55, 2001, pp. 393-415.
According to this view, scientific laws are neither exceptionless nor ceteris paribus generalizations, but are stipulative definitions of idealising models that provide simplified representations of, and are similar to, actual complex systems. Peter Menzies shows how these models, which contain a priori definitions, can be used to make empirical predictions and explanations.31 He gives the example of the folk psychology model of rational agency. The generalization: if any agent X wants A, and if X believes that B is a means to attain A under the circumstances, then X does A, is a definitional truth, but we can apply it to a real-world situation because we make what Giere calls a "theoretical hypothesis", which specifies the relevant aspects under which the model of rational agency is similar to the intended real-world situation, and the degrees of similarity. In this case, the "theoretical hypothesis" would be that Peter is a rational agent. This is an empirical hypothesis that licenses the empirical conclusion that if Peter wants A, and if Peter believes that B is a means to attain A under the circumstances, then Peter does A.
I think that Menzies' proposal can be applied to common sense generalizations provided that models are understood in a broad sense as schemas or scripts. There is empirical evidence that jurors' reasoning can be arranged in the form of stories that very often fit standard schemas or models which represent what typically happens in the world according to jurors' knowledge.32 Sometimes the schema of a story can be summarized in a single sentence that can be generalized in the form of a general normative proposition.33 Although folk models are not explicitly stated in propositional form, educated people like historians, judges and scientists (and philosophers), who have to perform inferential tasks, must often give them a linguistic disguise. In evidential reasoning this rationalization of common sense knowledge takes the form of the development of a chain of reasoning H, A, …, F, E, linking evidence E with the hypothesis H, where each step of reasoning is supported by one or more generalizations that provide an appropriate relevance relation between two reasoning stages:
We assert a generalization G which we believe links E and F, and then we put this generalization to the test by collecting n items of ancillary evidence […] This ancillary evidence together with the generalization being tested forms the basis of our epistemic assessments of likelihoods.34
31 Peter Menzies, “Reason and causes revisited”, in: David Macarthur, Mario de Caro (eds.), Naturalism and Normativity, New York: Columbia University Press forthcoming (quoted with the permission of the author). 32 Ronald Hastie (ed.), Inside the Juror: The Psychology of Juror Decision Making, New York: Cambridge University Press 1993; Marilyn MacCrimmon, “What is ‘common’ about common sense? Cautionary tales for travellers crossing disciplinary boundaries”, in: Cardozo Law Review 22, 2001, pp. 1433-1460. 33 William Twining, “Argumentation, stories and generalizations: a comment”, in: Law, Probability and Risk 6, 2007, pp. 169-185, p. 175. 34 Joseph Kadane, David Schum, op. cit., p. 268-9.
Common sense generalizations 'if A then usually B' are thus to be understood as generalizations which are true in idealised models that are applied to the particular situation at hand. In this context, testing a generalization by ancillary evidence plays the role of Giere's "theoretical hypotheses" which state the "similarity" of the model to the real system. The "ancillary evidence" mentioned by Kadane and Schum is evidence that bears upon the "theoretical hypotheses". In the example of testimony, we have an idealised model of a veracious eye-witness in which the generalization: if X is a veracious eye-witness then X says that H is the case, is a priori true. If Peter is a veracious eye-witness (theoretical hypothesis) then he says that H. Therefore, we have an answer to a why-question: why does Peter say that H? (the contrast-class is 〈says that not-H, says nothing〉). The answer is that the hypothesis H is true, and H is explanatorily relevant (because H is the case and Peter is a veracious eye-witness), and H is probabilistically relevant, because p (E | H) > p (E | not-H). How does this reading of common sense generalizations take into account the fact that their scope is reduced by a 'usually' quantifier? A judgment of "similarity" is a subjective judgment: one judges that a model is similar to the real system if one's subjective probability that Peter is a veracious eye-witness, given the ancillary evidence, is high.
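As an editorial illustration of this last point (not part of the original text, and with invented numbers), the probative force of Peter's testimony can be modelled as a function of the subjective probability v that Peter is a veracious eye-witness; the assumption that a non-veracious witness says that H with a fixed baseline probability b is mine.

```python
# Toy model: E = "Peter says that H". Assumed: a veracious witness reports H
# exactly when H is the case; a non-veracious one says H with baseline
# probability b regardless of whether H is true.

def likelihoods(v: float, b: float = 0.5):
    p_E_given_H = v + (1 - v) * b        # p(E | H)
    p_E_given_not_H = (1 - v) * b        # p(E | not-H)
    return p_E_given_H, p_E_given_not_H

def posterior(prior_H: float, v: float, b: float = 0.5) -> float:
    """Bayesian update of p(H) on the testimony E."""
    p_e_h, p_e_nh = likelihoods(v, b)
    num = prior_H * p_e_h
    return num / (num + (1 - prior_H) * p_e_nh)

for v in (0.2, 0.6, 0.9):                # v is fixed by the ancillary evidence
    p_e_h, p_e_nh = likelihoods(v)
    print(v, p_e_h > p_e_nh, round(posterior(0.5, v), 2))
# E is probabilistically relevant (p(E|H) > p(E|not-H)) whenever v > 0,
# and the strength of the update grows with v.
```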
Facoltà di Design e arti IUAV Dorsoduro 2206 30123 Venice Italy [email protected]
RAIMO TUOMELA
HOLISTIC SOCIAL CAUSATION AND EXPLANATION
I. INTRODUCTION

It has been regarded as a problem whether there can be genuine, irreducible causation in the social sciences. By such causation I mean causation that does not reduce to individual-level causation. There is a parallel and related problem concerning psychological causation: Is there genuine, irreducible mental causation, or is all causation purely physical in its nature? In this paper I will focus on holistic social causation.1
It can be regarded as a characteristic feature of a philosophical problem qua philosophical that it is a problem what the problem really is. Given this, I would say that the problem of social causation is at least in part a philosophical problem. For one thing, it is not clear in the discussion of individualism versus holism and of the possibility of social causation what social properties (or events or states of affairs) as opposed to individualistic ones really are. I will below take the distinction between what is social—in a holistic, macrosocial sense involving irreducible social wholes—and what is individualistic to be related to the "we-mode" perspective or conceptual framework versus the "I-mode" perspective (framework).2 The we-mode is holistic (but in a sense compatible with the causal primacy of individuals) and the I-mode is individualistic. These frameworks concern concepts and principles rather than anything directly ontological, while of course the holism-individualism debate basically concerns ontological matters. I will speak of predicates and concepts (concepts being predicates with meaningful uses) and also of what properties these predicates express or purport to express—even if social properties are mostly (but not invariably) collectively man-made.3
Given that sense can be made of holistic social predicates and individualistic predicates and what they express, one needs to say something about causality and about causal laws that can be used for explanation of relevant singular happenings (and facts) and possibly regularities of them.4
1 However, I will make use of some of my discussion on mental causation in an earlier paper of mine; see Raimo Tuomela, "A Defense of Mental Causation", in: Philosophical Studies 90, 1998, pp. 1-34.
2 See Raimo Tuomela, The Philosophy of Sociality: The Shared Point of View, New York: Oxford University Press 2007 (pb ed. 2010).
3 See Raimo Tuomela, The Philosophy of Social Practices: A Collective Acceptance View, op. cit., and The Philosophy of Sociality: The Shared Point of View, op. cit., for my account of what the man-made character of social facts involves.
4 I take it that the more general formulation here is given in terms of facts rather than events and that event-talk can be formulated in terms of fact-talk.
Laws are here taken to be ontological, i.e. they are connections between real, actual and possible, events or facts. A law can in a simple paradigm case be formulated by means of a sentence of the kind "In circumstances C, if A occurs then so necessarily does B (or probably does B)", where A and B represent kinds of events or facts. If a law exists, then a sentence of the above kind will be true and support relevant "robustness" conditionals such as: "If A had occurred in circumstances C, then B would have occurred (or would probably have occurred)". I take it for granted that citing causes is explanatory in general. Yet I will not say much about the nature of causes and causal laws in this paper—but will largely rely on common sense examples and paradigm cases. I take these kinds of cases nevertheless to give a "pro tanto" (or "prima facie") reason for the existence of causal laws.5
My main thesis in this paper is simply this:
(GC) There are cases of irreducible holistic social causation and hence of explanation based on a cause-expressing holistic social explanans.
I take this thesis to involve the related theses that there are irreducible social facts and that these facts can cause other phenomena. (GC) is compatible with the kind of ontological individualism according to which individual agents, functioning individually or jointly, are the sole ultimate causal initiators or ultimate sources of causation in the social world. This kind of ontological individualism I find plausible, but I cannot use the space to argue for it here. I claim that in generalized form causal relations yield social causal laws fit for explanatory uses. Accordingly, a backing law account is tentatively used in the paper, but it is not needed for the main argument. If the backing law account is tenable, macrosocial laws (perhaps only tendency laws) exist.
II. INDIVIDUALISTIC VERSUS HOLISTIC PREDICATES AND CONCEPTS

Below I will say briefly what I take individualism and holism to involve. My approach will mainly be formulated for predicates (thus concepts, taking concepts to be predicates with meaningful uses), but I will also speak of individualistic versus holistic attitudes and actions.6 As the individualism-holism distinction is antecedently not very clear, and as different theoreticians have understood the distinction in different ways, some amount of stipulation is necessary here. Accordingly, I will speak of individualistic predicates, or briefly I-predicates, and take them to
5 I do not object to using the term 'law' alternatively to refer to a linguistic entity, i.e. the description of a law in the ontological sense. In this paper, however, the primary use is the nonlinguistic use.
6 A predicate is a linguistic entity, and it may express extralinguistic things like properties (as universals or as e.g. tropes) or sets of entities. One may also want to say that a predicate can express a concept. Here my line would be to say that predicate uses are concepts and that a predicate in this sense expresses a concept.
express attitudes, actions, etc. of individuals. (A subset of these predicates will be I-mode predicates, to be explained and discussed later.) Derivatively we may speak of I-attitudes and I-actions as attitudes and actions to which the relevant I-predicates apply.7 I-predicates are taken to be property-expressing predicates that refer to individuals’ psychological or (broadly) physical properties of the kind that do not refer to, or conceptually presuppose, irreducible social group concepts. Thus institutional concepts are excluded. Simple noninstitutional attitudes (that may be relational) are covered, and the same goes also for simple joint and collective actions. In brief, individualistic predicates, I-predicates, then are monadic or relational predicates that do not conceptually rely on irreducible social group predicates. In general terms, a holistic social predicate, termed an S-predicate, on conceptual grounds applies either to a social “whole” like a social group or to groupbased attitudes and actions (including joint actions) that conceptually presuppose irreducible social wholes. Accordingly, to be group-based an attitude or action must conceptually entail at least the collectively accepted or believed existence of a social group (broadly understood). S-predicates will not be fully characterized here because my task in this paper is to argue only for an existential claim, and this can be done by means of a subset of holistic social predicates, the subset consisting of “we-mode” predicates. Thus, I will concentrate on we-mode we-thinking and its features and claim that it involves S-features that require S-predicates for expressing them. Roughly, such we-thinking involves a group’s and its members’ thinking and reasoning in terms of the thick, “togetherness” notion of “we” about attitudes, actions, and emotions attributable (in my view only somewhat non-literally) to a social group and its members. S-predicates in the sense meant here are irreducible to I-predicates—I will comment on this matter later in the paper.8 I will also argue that they are needed for the best description and explanation of the social world. Thus there is much point in concentrating on the full-blown kind of sociality that the present subset of S-predicates involves.9 Let us consider a “we-intention” as an example of a we-mode we-attitude.10 Suppose we, a group of people, share the we-intention, thus have the joint intention, to clean up a park (X). Linguistically expressed “We share the we-intention to perform X as a group”. Such a we-intention of a participant entails the existence of
7 Relevance here means that the content of the attitude or, respectively, action is expressed by the predicate in question.
8 In my recent book I argue for such irreducibility of some social concepts and what they express—see Raimo Tuomela, The Philosophy of Sociality: The Shared Point of View, op. cit., Chapter 4.
9 The notions in this paragraph and below have been discussed in Chapter 8 of Raimo Tuomela, The Philosophy of Sociality: The Shared Point of View, op. cit. Especially the so-called CAT thesis of collective sociality is central here.
10 There are also I-mode we-attitudes, but I will not consider them here.
the joint intention by the participants to do X as a group.11 So the participants must mutually believe or accept that they form a group and thus the notion of a social group is presupposed by the full-blown joint intention notions. The participants are assumed to have identified themselves with the group in question and thus made a “togetherness-we” apply to it. From the group’s point of view, the action concept X must be a concept understood by the members, and, in our example, also the existence of the park must be assumed by them. Here is an important ontological point: Such central social notions as that of an institution are socially constructed for the group. Roughly and concisely, an institution exists for the group only if it is constructed and taken by the group to exist. What the existence of an institution thus involves is based on the group’s “intentional horizon”. The existence of a social institution and related notions (like that of a full-blown social group capable of action) is quasi-objective but collectively mind-dependent. Institutional concepts do not get more objective than that: Macro-social entities and properties exist in the appropriate sense if and only if they are collectively constructed and brought about by the group members, and here the appropriate sense involves the use of S-predicates and existence for the group. Non-social notions such as that of a mountain are not similarly constructed. Yet, when this notion occurs in the intentional context of a collective attitude such as a joint intention, it must be taken by the participants to exist, i.e. to exist for them, for their use and to comply with their interests. In general, we-mode predicates and concepts express group-based attitudes and activities defined by the notion of functioning as a group member and should thus be understood as S-predicates. To take an example, a group may have accepted a group goal for itself (e.g. “G is our group’s central goal”). Accordingly, the group members ought, because of their membership, to function in the right way which involves that they respect and promote the achievement of the group goal. This entails (at least for normal cases) that they ought to adopt the group goal as their personal goal (but not necessarily as their “private” goal, one that is not grounded in the group’s goal at all). Thus each agent here has a we-mode goal (the group’s goal, an S-goal) and may have the goal also as his private goal (I-goal).12 The S-goal constitutively depends on the group goal and may differ from (and even conflict with) a member’s relevant private, I-goal. The core of the we-mode being to function as a group member, all mental attitudes and actions that are constitutively involved in the we-mode qualify as holistic, S-features. This is the member-level sense of the we-mode—and it is worth recalling that single group members can (and indeed generally ought to) 11 We must distinguish between an intention attributed to a group agent, a joint intention that its members or some of them share, and a we-intention that a group member qua a group member has. Group intentions ontologically rely on joint intentions which in turn involve the members shared we-intentions 12 To put the matter differently, the agent may have a goal that for him is both a goal in the we-mode and in the I-mode.
personally have such S-features (e.g. we-intentions, given that the others have or can be expected to have them). The we-mode involves these three central ideas on the member level: (1) authoritative group reason (e.g. a collectively accepted or a leader-determined group goal might be one), (2) collectivity condition (intuitively, that the members are in the same boat due to their construction of the group goal), and (3) collective commitment (the members’ group-based bond to the goal).13 The macro sense of the we-mode concerns social groups as functional and constructed group agents (we-mode groups) and their characteristics (e.g. attitudes and actions). From a conceptual point of view, but strictly speaking not ontologically, we-mode groups are (extrinsically but not intrinsically) intentional group agents. They are real social systems that can function in ways resembling the functioning of individual agents. The “switch” from the I-mode to the we-mode (that is holistic and is based on the notion of a group) is sometimes factually possible and it involves both a change of agency from individual agents to collective (quasi) agents and a change of I-thinking and acting to we-mode thinking and acting. As to the notion of a we-mode group, it is assumed to commit itself to a group ethos (certain constitutive goals, beliefs, standards, norms, etc.) and to relevant we-reasoning and we-acting.14 A we-mode group always constructs itself as a group in a quasi-entifying sense and thus is an organized group capable of action as a group. In a we-mode group the members ought to act as group members and thus to identify with the group. (The above is in contrast to what needs to be the case in an I-mode group.) Generally speaking, my ontological view about the social realm is both holistic and individualistic. It is holistic in the sense of requiring holistic entities like groups as irreducible social systems to exist. It is individualistic in the sense that individual agents are “in the last analysis” the sole causal initiators or ultimate sources of causation in the social world. There is derivative causation, e.g. the fact of people spending “too” much money on goods can be the cause of increase in inflation, etc.
III. THE SOCIAL CAUSATION PROBLEM

Why is social causation seen as a problem? First, this is often connected to the belief that groups and social structures as superindividual entities are dubious entities, especially from the point of view of their causal capacities if not otherwise.
13 See Raimo Tuomela, The Philosophy of Sociality: The Shared Point of View, op. cit., and Raimo Tuomela, "Group Thinking", ms 2009, for discussion.
14 Every we-mode group has an ethos. Even when you and I spontaneously lift a table together we form a possibly transient group with an ethos constituted at least by the joint goal (and the dyad's goal) to lift the table together.
In my view we-mode groups exist as interactive social systems capable of producing uniform outcomes in the world and of doing this intentionally in view of their members' activities as group members. A we-mode group does not ontologically exist as a full-blown intentional and mind-possessing agent (or person), although functionally or instrumentally it can be regarded as an agent. (An I-mode group of course is not usually viewed as an agent.)15
Second, social causation may be argued against by claiming that the only kind of causation there is in the real world at bottom is physical (or, if you prefer another term, material) causation.16 As it is often put, the world is causally closed concerning physical causation. In the extensive debate on mental causation it has accordingly been argued that mental causation is epiphenomenal and that brain states and the behaviors they cause are all one can have concerning causation. What has been called the "exclusion argument" in effect says this in a precise, reasoned way.17 I will in this paper take for granted the existence of mental causation at least in some contexts and rely on my earlier discussion.18
As to social causation, the main problem has been the debate between holism and individualism concerning causation. Holism claims that social macroentities such as groups and social classes and their monadic and relational properties, e.g. relations of production between social classes, cause people to act in certain ways. This can involve macrocausation, i.e. causation concerning macroevents or macrofacts, or it can involve the causation of individual action. In close analogy with the exclusion argument against mental causation one can formulate an argument saying that social causation is an epiphenomenon only, that (holistic) S-properties are causally impotent, and that I-properties suffice for all causation in the social realm. I oppose this claim and allow that social systems can nonvacuously enter causal chains, although individuals in their we-mode (or some such) capacity are the central intentional initiators of causal chains.
The term "Social" in this paper expresses strong sociality, as defined earlier. Thus the impotence argument is concerned with most senses of "social".
15 I have discussed the ontology of social groups in Chapter 6 of Raimo Tuomela, The Philosophy of Sociality: The Shared Point of View, op. cit., and in Raimo Tuomela, "Group Thinking", op. cit.
16 At least in some interesting cases of causation, whatever its specific content (e.g. physical or social), a cause realizing a system's causal power and expressing its causal potency can be regarded as a cause in the sense of being an INUS condition for the effect, e.g. of the outcome of an action X (perhaps relative to suitable background conditions). An INUS condition, as explicated by John Mackie, The Cement of the Universe: A Study of Causation, Oxford: Oxford University Press 1974, p. 62, is an insufficient but non-redundant part of an unnecessary but sufficient condition of X.
17 For the epiphenomenalism and exclusion arguments see Robert Van Gulick, "Who's in Charge Here? And Who's Doing All the Work?", in: John Heil and Alfred Mele (eds.), Mental Causation, Oxford: Oxford University Press 1993, pp. 233-256. Also see the discussion in Raimo Tuomela, "A Defense of Mental Causation", Philosophical Studies 90, 1998, pp. 1-34.
18 See Raimo Tuomela, "A Defense of Mental Causation", op. cit.
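As an editorial aside (not part of Tuomela's text), Mackie's INUS condition mentioned in footnote 16 can be illustrated with a small sketch; the toy causal structure below (a fire caused either by a short circuit together with flammable material, or by arson) is an assumed example, not one drawn from the paper.

```python
from itertools import product

# Toy structure: the effect occurs iff (short_circuit AND flammable) OR arson.
# Then short_circuit is an INUS condition: an Insufficient but Non-redundant
# part of an Unnecessary but Sufficient condition for the effect.

FACTORS = ["short_circuit", "flammable", "arson"]

def effect(short_circuit, flammable, arson):
    return (short_circuit and flammable) or arson

def sufficient(fixed_true):
    """A set of factors (held true) is sufficient if the effect occurs under
    every assignment of the remaining factors."""
    free = [f for f in FACTORS if f not in fixed_true]
    return all(
        effect(**{**{f: True for f in fixed_true}, **dict(zip(free, vals))})
        for vals in product([True, False], repeat=len(free))
    )

print(sufficient({"short_circuit", "flammable"}))  # True: a sufficient condition
print(sufficient({"arson"}))                       # True: so that condition is unnecessary
print(sufficient({"short_circuit"}))               # False: insufficient on its own
print(sufficient({"flammable"}))                   # False: so short_circuit is non-redundant in the pair
```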
I will not here take a precise stand on what exactly may be "social" without being "Social" (causal interaction between individuals gives examples of such non-holistic sociality). I take it that such weaker sociality has less chance of having causal powers and, in any case, if Social properties were causally impotent that would destroy the basic point of social causation, namely that groups and social structures can in some sense be causally relevant.
A person's having a we-intention is an S-property of his—the concept of a we-intention being a holistic one, as claimed above. For instance, a we-intention does not reduce to a private I-intention (and relevant I-beliefs, etc.) basically because it is based on the notion of a group reason, in contrast to I-attitudes. As will soon be argued, social properties and hence property-expressing S-predicates are causally relevant and potent.
The main thesis of this paper says that there are real cases of irreducible holistic social causation and hence of explanation based on a cause-expressing social explanans. My proper defense of this thesis will be given in Section IV, but below I will discuss some central claims needed for that defense. Let us thus consider the following central assumptions and claims related to a we-mode group G and to its action of causally bringing about something X (building a house or entering a treaty). Some of these I have already mentioned in passing, but here is a somewhat more detailed presentation.19
(1) A we-mode group can only act (e.g. perform action X) if its members relevantly act in the we-mode. In other words, the group can act and causally bring about what X involves only vicariously, through its members' action (based on the group reason that G directs them to perform their parts of their performing X or, at least, to contribute to X). The group action supervenes on its members' relevant we-mode actions qua members: If a change from X to something else (e.g. not-X) at the group level occurs, then, given that the external circumstances "cooperate", there must be a relevant change in the member-level we-mode actions. The truth of this supervenience claim follows from the fact that the members' we-mode properties (attitudes and actions) conceptually depend on their group's ethos-related properties. Accordingly, a group can perform an action X intentionally (in accordance with and because of its intention) if and only if its members acting in the we-mode qua members do it in cooperative external circumstances. In the case of group attitudes, the group members' we-mode attitudes on conceptual grounds determine the relevant group attitude, while in the case of group action the determination is partly causal and depends also on the cooperativeness of the external circumstances. It can be noted that the discussed kind of supervenience claim need not hold relative to I-mode actions and attitudes. Thus the group can have a belief that most or perhaps all of its members do not have.20
19 See Raimo Tuomela, The Philosophy of Sociality: The Shared Point of View, op. cit., especially chapters 4-6, for a defense.
20 This is by now generally acknowledged in the literature. See Raimo Tuomela, The Importance of Us: A Philosophical Study of Basic Social Notions, Stanford: Stanford University Press 1995, and Raimo Tuomela, The Philosophy of Sociality: The Shared Point of View, op. cit. One of my examples for showing this is: The Communist Party of Ruritania believes that capitalist countries will soon perish but none of its members really believes so.
(2) In the present action context the group is indispensable. This is because the following conditional is true (partly because joint action must be based on a joint group reason in the we-mode case): If G (due to its members' relevant activities21) had not decided (or, more generally, intended) to act (to perform X), then the members would not have performed X qua group members.
(3) (a) There can be token-cases of causation in the present action context, and on the type level there can presumably be non-strict laws such as that expressed by the S-S or macro-macro law statement "For all we-mode groups G, if G intends to bring about X it tends to succeed, ceteris paribus".
(3) (b) We also get a causal connection to an individual member's part performance, which connection generally is lawlike (at least to the extent intentions have pro tanto nomic force): "For all we-mode groups G, if G intends to perform X, its members qua members tend to perform or at least try to perform their parts of X". While this conceptually still expresses an S-S law, "extensional-ontologically" its consequent concerns individuals' we-mode action, viz. the members' performances of their parts of X.22
(3) (c) Furthermore, an individual member's intention to perform his part involves at least the causal connection that he generally succeeds in bringing about the right bodily movements required for his part performance to come about. This can be argued to involve a non-strict law between intentions (i.e. special kinds of pro-attitudes and commitments) and bodily behaviors, but at least in general not his full intentional performance of his part action (e.g. "intendings-in-action" or "willings" are part of the action and can only cause its behavioral movement component).
(4) We-mode groups function in a way that tends to produce more overall order in the social world than pro-group I-mode groups or pure I-mode groups.
21 Groups cannot function without their members functioning appropriately. But there are myriad ways in which a group intention can come about, e.g. via its authorized leaders' or via its members' spontaneous (or perhaps quarrelsome and tedious) collective acceptance.
22 In a sense we here have a macro-micro law, as it connects a group's functioning with its members' functioning, but note that in my account an I-property does not occur in the law and thus the law after all is an S-S law rather than an S-I law. One might try to argue that non-actional bodily movements are I-properties rather than S-properties (or are both, the S/I distinction not making sense here), but, in contrast, a bodily movement might still be taken to satisfy the earlier group-basedness definition of an S-property. I will leave this matter unresolved here, as the actions as a group member that occur in the consequent of the law still undoubtedly are relevantly group-based and hence S-properties.
This will be argued in Section IV below. Accordingly, we-mode groups are also indispensable for achieving the best results in the task of creating institutional order in society.
IV. ARGUING FOR HOLISTIC SOCIAL CAUSATION

In this section I will argue for the existence of holistic social causation. I will first, in subsection A, present an argument related to our earlier discussion of a group causally bringing about an event or state. After that I will argue in subsection B that in the case of rational group action we-mode group action, and hence group causation, will at least sometimes prevail over I-mode (i.e. individualistic) causation.
A. Suppose a fictional country Urbania (A) intentionally acts, e.g. boycotts (X) another country, say Ruritania. A country can act only if its government acts and thus only if suitable position-holders of the group act in the right way, say agree upon an action X. They form the joint intention or decision that A is to perform X, and let us assume that this decision is effected through suitable officials jointly or separately doing something Y that in the circumstances generates X, i.e. their action amounts to X (when the connection is purely conceptual or conventional) or it causally generates A's action X, so that in each of these cases the operatives' actions can be said to have collectively generated X. The general situation here is that agent A is a collective agent and action X is a collective action attributed to A. Thus we are clearly dealing with an S-fact, a macrosocial fact. Certain operative members of A acted suitably as group members, hence in the we-mode, and performed an action or set of actions that generated X. Here the we-mode activities are S-activities that presuppose the existence of the group, A, in question. Viewed from the acting group members' point of view, they must at least believe that they produced X when it was successfully intentionally collectively brought about by them. Such intentional performance of X depended on the members' shared we-intentions to perform it. These we-intentions and the relevant beliefs needed for carrying them out are also S-properties. They are of course not directly macrosocial properties, but as they conceptually depend on the group they are indirectly macrosocial.
We have here a rather trivial causal case of macro-macro causation or S-S causation: the group through its members' activities intentionally causes some suitable X-related facts to take place in the world such that finally X (an S-feature) comes about. There can of course be cases of unintentional causation as well. For instance, the boycott action X could contingently lead to a new world war. That would be a collective S-phenomenon (that of course is not an intentional we-mode phenomenon). In such unintended causation the S-cause (the group's intention or decision) might also bring about lots of individual, non-collective consequences
(e.g. personal misery to many people). This means that a macrosocial cause can also cause I-phenomena to occur (a case of S-I causation). Such S-I connections might be lawful, and then we would in this case have a macro-micro law in an ontological sense. There may correspondingly also exist causal I-S connections—e.g. some I-mode properties might be causally relevant to people switching from the I-mode to we-mode thinking and acting.
B. In this subsection I will present an argument for the importance of S-causation and S-explanation that makes use of a recent mathematical result in game theory. I will also claim that this result gives a boost to the importance of collective goals for social explanation. As collective goals here are understood as goals that groups have, they are S-features of the world and thus serve to justify the basic claim (GC) of this paper.
As I and my colleagues have recently argued, the best overall theoretical account of cooperation can be argued to require the we-mode approach—at least the we-mode approach often suffices for an account of cooperation and is strictly needed in some contexts.23 Below I will only sketch an especially important argument that relates to cooperation in a game-theoretic context. Let me first mention the point that the switch to group thinking (esp. group agency and we-reasoning) together with the Paretian assumption will often lead to group-rational mutual cooperation in e.g. the PD and Chicken, to the High outcome in Hi-Lo, and in other social dilemmas involving common interest and action-interdependence. To make my main point about holistic social causation and the creation of order in the social world, it suffices to focus on the simple Hi-Lo game. Consider thus the following payoff matrix:

        C      D
  C    3,3    0,0
  D    0,0    1,1
This Hi-Lo game (a coordination game without communication leading to a binding agreement) has two equilibria, CC and DD, of which CC dominates over DD. Classical game theory (an I-mode theory) cannot recommend CC over DD, but group game theory can.24
23 See Raimo Tuomela, The Philosophy of Sociality: The Shared Point of View, op. cit., esp. Chapter 7, and Raul Hakli, Kaarlo Miller, and Raimo Tuomela, "We-Thinking", Economics and Philosophy, 2010. As recent experimental work shows, the we-mode approach, or as the experimenters say the "team reasoning" approach, is empirically feasible in collective action dilemmas—often most experimental subjects operate on the basis of their we-thinking and cooperate in such dilemmas. (See Andrew Colman, Briony Pulford, and Jo Rose, "Collective Rationality in Interactive Decisions: Evidence for Team Reasoning", Acta Psychologica 128, 2, 2008a, pp. 387-397, and Andrew Colman, Briony Pulford, and Jo Rose, "Team Reasoning and Collective Rationality: Piercing the Veil of Obviousness", Acta Psychologica 128, 2, 2008b, pp. 409-412.)
24 See Michael Bacharach, "Interactive Team Reasoning: A Contribution to the Theory of Co-operation", in: Research in Economics 53, 1999, pp. 117-147, and Michael Bacharach, Beyond Individual Choice, Princeton: Princeton University Press 2006.
We note that this switch entails the choice of C for the group (which rationally takes C to dominate over D). This choice entails here that the members both choose C and thus cooperate.25 By means of an adaptation of a game-theoretical theorem proved by Bacharach it can be shown that the pro-group I-mode (viz. the strong version of the I-mode where the members individually work for the group, sharing its utility function) and the we-mode, probabilistically construed concerning mode adoption, do not entail quite the same equilibrium behaviors.26 This can be seen by comparing standard Bayesian games in which the players share the group's utility function (which each is trying to maximize individually) with games in which the players acting as a group try to maximize (expected) group utility in the we-mode sense (that is, by each selecting the best outcome in terms of group utility and then inferring their own part-actions). Even when the choices, utilities and the probabilities of the players acting for their own benefit instead of the group's benefit are the same, the resulting outcomes may differ. This is because the former case admits individualistic Pareto-suboptimal equilibria that will not be equilibria in the we-mode case.
So, relating this to our present problem of holistic social causation, when a rational group (causally) produces a result that is an equilibrium—in the sense that no member has an incentive to deviate from the cooperative action in question—we have a case of collectively rational holistic causation. An I-mode group may also produce the same result, because all group-based or we-mode equilibria also become I-mode equilibria, but not conversely. So what is the point here? We must take into account all the equilibria in the game and argue that holistic causation based on a collective or group goal produces more and better order in the social world than does I-mode causation. Note that in the case of we-mode causation a we-mode equilibrium expresses an explanatory group goal (or at least a joint outcome entailed by a group goal). This particular group goal gives the group members an authoritative reason to act as proper group members in the situation. Accordingly, the we-mode approach does not allow participants functioning properly as group members to defect (while the I-mode approach under discussion does allow it). To take a simple example, in the Hi-Lo game also the DD (or low-low) outcome will qualify as a kind of fallback solution (equilibrium) in the individualistic Bayesian game, but not in the we-mode game involving maximization of (expected) group utility. Thus it is always possible rationally to free-ride when an individualistic perspective is assumed.
25 The collective Payoff Dominance principle falls short of having the same explanatory power as group game theory. See Raul Hakli, Kaarlo Miller, and Raimo Tuomela, "We-Thinking", op. cit., for discussion.
26 See Theorem 2 in Michael Bacharach, "Interactive Team Reasoning: A Contribution to the Theory of Co-operation", op. cit.
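The contrast between the two modes of reasoning in the Hi-Lo game can be made concrete with a small sketch; this is an editorial illustration of the difference described above, not Bacharach's theorem or Tuomela's own formalism.

```python
from itertools import product

# Hi-Lo payoffs (row player, column player) from the matrix above.
PAYOFFS = {
    ("C", "C"): (3, 3), ("C", "D"): (0, 0),
    ("D", "C"): (0, 0), ("D", "D"): (1, 1),
}
STRATS = ("C", "D")

def is_nash(profile):
    """I-mode test: no player gains by deviating unilaterally."""
    for player in (0, 1):
        for alt in STRATS:
            deviation = list(profile)
            deviation[player] = alt
            if PAYOFFS[tuple(deviation)][player] > PAYOFFS[profile][player]:
                return False
    return True

# Individualistic (best-response) reasoning leaves both CC and DD standing ...
print([p for p in product(STRATS, STRATS) if is_nash(p)])  # [('C','C'), ('D','D')]
# ... while we-mode reasoning selects the outcome best for the group and
# lets each member infer his or her part of it.
print(max(product(STRATS, STRATS), key=lambda p: sum(PAYOFFS[p])))  # ('C','C')
```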
To elaborate further on collective goals or, rather, group goals (which are S-properties): a we-mode group can adopt the mutual cooperation goal as its group goal, or at least adopt a goal that entails the mutual cooperation joint outcome, e.g. CC in a Hi-Lo or in a PD. In real-life terms, a group goal might be, e.g., to build a bridge together or to see to it that a bridge be built for the use of the group. In an ideal we-mode group, the group goal proximately explains at least the members' participation and in some cases (e.g. a prior goal) also the joint action, and this explanation is basically causal in its nature.27 Participation in the building is normatively governed and sanctionable action in the context of a we-mode group, we assume. In the I-mode case, including the pro-group I-mode case where the members are only privately committed to the shared I-mode group goal of building a bridge, it is rationally possible to free-ride. Shared private goals of this kind are individual or I-goals in the terminology of this paper, whereas (shared) we-mode goals are S-goals.
To repeat, the present mathematical result means that the we-mode approach is capable of creating more collective (i.e. group-level) order than is the (pro-group) I-mode approach and gives a better explanation of cooperative member behavior. This gives a strong argument for the need for irreducible S-properties that may be causally active in cases of a group performing an action and thus causally bringing about an outcome as a group.28
27 It is also teleological; indeed, I have in my earlier work developed and defended a "causal-purposive" account of action causation, see Raimo Tuomela, Human Action and Its Explanation, Dordrecht: Reidel 1977, and A Theory of Social Action, Dordrecht: Reidel 1984. The notion of purposive causation that I also applied to cases of joint action is similar to the later developed notion of intentional causation in John Searle, Intentionality: An Essay in the Philosophy of Mind, Cambridge: Cambridge University Press 1983.
28 For other kinds of arguments for S-goals over I-goals, see Raimo Tuomela, The Philosophy of Sociality: The Shared Point of View, op. cit., Chapters 2 and 7-9. The present functional advantage of we-mode groups over I-mode ones also makes it possible to give an evolutionary argument for the development of we-mode "psychology."

V. CONCLUDING OBSERVATIONS AND ARGUMENTS

In this paper, it has been assumed that holistic social notions and theorizing concern not only macroentities like groups (small social groups, organizations, states) and their properties (group action, social movements, relations of production, etc.), but also members of such groups and their properties based on their acting in group contexts qua group members. My discussion in the paper has given reasons to say that the following claims are tenable:
(i) There is (or at least can be) causation in which an S-phenomenon causes another S-phenomenon.
(ii) There is (or at least can be) causation in which an S-phenomenon causes the occurrence of an I-phenomenon (and perhaps also genuine cases of I-S causation can be found).29 The cause here might be an INUS cause (or a probabilistic INUS cause), because several alternative sets of group members can cause the type of event in question (viz. X). This is partly because the group A diachronically need not have fixed members. It can, for instance, produce X at time t1 through one set of operative members and at t2 (say one hundred years later) through another set of members. These members’ we-intentions will produce X, we can assume, and thus we have several disjunctive sets consisting of single-member we-intention (etc.) conjunctions such that each conjunction is sufficient (or sufficient given the cooperation of “Mother nature”) for X. Furthermore, the causation is aspectual, or “qua-causation”.30 E.g. in (i) a phenomenon (e.g. some people sharing a goal) conceptualized as an S-phenomenon (when in the example the goal is had in the we-mode) causes some activity conceptualized as group-based we-mode activity (rather than as shared private, I-mode action). In accordance with the above, it can be argued that the we-mode frame or perspective is not, at least a priori, reducible to the I-mode or individualistic perspective consisting only of I-properties in our present classification.31 Given this, we have: (iii) S-causes (and S-causation) are not reducible to I-causes (and I-causation). We also noted that S-causation and, a fortiori, causal explanation are not in general “replaceable” by I-causation and I-explanation, at least in the following explanatory sense: (iv) The we-mode account can be regarded as a kind of holistic social theory or, perhaps better, framework for factual theorizing and theory-formation, which gives or at least can give a better overall account of social explanation in group contexts (that is, in contexts where a group is acting qua a group) than a rival theory employing only I-mode concepts, viz. referring only to I-predicates. (v) The points made in (i), (ii), and (iii) show that there is holistic social causation in the world and that there are, as a consequence, explanatory connections (i.e. extensional-ontological S-S, S-I, I-S connections) that cannot be captured by the individualistic conceptual framework. In the same tenor, (iv) shows that when people think, reason, and act in group terms (i.e. in the we-mode) more order is created in the world than the individualistic explanatory framework is able to create. This kind of order can be properly explained only in terms of an approach making serious use of group notions and group thinking. 29 Note that the “can be” versions of (i) and (ii) are, of course, weaker than (GC). 30 This is my terminology in Raimo Tuomela, “A Defense of Mental Causation”, op. cit. 31 See Raimo Tuomela, The Philosophy of Sociality: The Shared Point of View, op. cit., Chapter 4, for my central argument.
Nevertheless, the points (i)–(v) are compatible with agency individualism, the view that the only ultimate causal entities and, so to speak, “motors” in the social realm are individual human beings (let us not here play philosophical games with robots, apes, Martians, etc.). There may be changes in the social world that function as pro tanto causes (say revolutions or breakdowns of financial systems) but the ultimate explanatory motors “behind” them clearly seem to be human beings.*
* I am grateful to Kaarlo Miller for detailed comments on a version of this paper.
Tammisalontie 28 A 1 00830 Helsinki Finland [email protected]
WENCESLAO J. GONZALEZ
COMPLEXITY IN ECONOMICS AND PREDICTION: THE ROLE OF PARSIMONIOUS FACTORS1
Complexity is a key feature in the sciences of design from the point of view of their framework—or constitutive elements—as well as of their dynamics. This twofold complexity affects economics insofar as it is one of the sciences of the artificial.2 In this regard, economics is a “science of design” that enlarges our possibilities towards the future and uses prediction as a basis for decision-making.3 But economics is also a social science that deals with human needs. This feature adds more factors to the analysis of its complex configuration as a science and its dynamic evolution. Commonly, the studies on complexity in economics are focused on particular forms of expression of complexity rather than on the roots of this problem as a dual science (i.e., artificial and social).4 Within the framework of the structural and dynamical complexity of economics, there is complexity related to economic predictions. This complexity has several forms, mainly ontological and epistemological. The characteristic forms this complexity takes are relevant for economics both as basic science and as applied science. Prediction is important for economics in both cases: on the one hand, prediction is a crucial methodological issue in economics which has been used as a “scientific test” by influential economists;5 and, on the other hand, prediction is the support used for making prescriptions in order to resolve practical problems (e.g., in national economy or in international trade). Complexity is frequently one of the reasons used for arguing that prediction in economics is more difficult than the prediction of natural events (whether it be climate change or the weather forecast).
1 This research is supported by the Spanish Ministry of Science and Innovation (FFI2008-05948). 2 Cf. Herbert Simon, The Sciences of the Artificial. 3rd ed., Cambridge: The MIT Press 1996 (1st ed., 1969; 2nd 1981). 3 Cf. Charles Holt, “Rational Forecasting, Learning, and Decision Making,” in Mie Augier and James G. March (eds.), Models of a Man: Essays in Memory of Herbert A. Simon. Cambridge: The MIT Press 2004, pp. 355-363. 4 See the three volumes on this issue: J. Barkley Rosser Jr (ed.), Complexity in Economics. Cheltenham: E. Elgar 2004. 5 On this issue, the perspectives of four Nobel laureates in economics are analyzed in Wenceslao J. Gonzalez, “Prediction as Scientific Test of Economics,” in Wenceslao J. Gonzalez and Jesus Alcolea (eds), Contemporary Perspectives in Philosophy and Methodology of Science. A Coruña: Netbiblo 2006, pp. 83-112.
1. SOURCES OF COMPLEXITY FOR ECONOMIC PREDICTIONS Undoubtedly, economic predictions are related to social needs (e.g., supply and demand of food, housing, clothing, etc.) as well as to artificial enlargements of human society by means of new designs, both at the micro and at the macro levels. The nexus between both levels—micro and macro—is far from simple. In the case of prediction, these differences in the kind of phenomena predicted should be taken into account, especially in epistemological and methodological terms. Another duality is ontological: “economic activity” and “economics as activity.” This distinction offers a central source of complexity in economics as a human undertaking. On the one hand, economic activity has specific characteristics as such; and, on the other hand, economics is a human activity that is related to other human activities (social, political, cultural, ecological, etc.),6 which makes prediction interwoven with several kinds of complexities. Furthermore, economics is a human activity that is developed by individual agents who are related to organizations and markets. Those individuals display creativity when they perform some actions, and this creativity adds another element to the complexity in making economic predictions. Thus, several components of complexity are involved here that may have consequences for making economic predictions. Starting from a social ontology, these sources of complexity resemble a scale with several steps: a) the social and artificial realms; b) the micro and macro levels; c) the degree of autonomy as a human undertaking (“economic activity” and “economics as activity”); d) the organizations and markets; e) the role of individual agents (i.e. creativity in different realms). All of them are at stake when we face a classical question: why are economic predictions so difficult? This central question, crucial for explaining their frequent unreliability, has a direct answer: the many faces of complexity of economic reality. In other words, there is an enormous variety of “respects” that could make a difference among economic events, and there exist complex ways in which the economic influences operate and interact with each other. For Sen, one source of this complexity lies in the difficulty in anticipating human behaviour, which can be influenced by a tremendously varied collection of social, political, psychological, biological and other factors. Another source is the inherent difficulty in anticipating the results of interactions of millions of human beings with different values, objectives,
6 This distinction between “economic activity” and “economics as activity” is developed in Wenceslao J. Gonzalez, “Economic Prediction and Human Activity. An Analysis of Prediction in Economics from Action Theory,” Epistemologia 17, 1994, pp. 253-294; especially, pp. 261-280.
motivations, expectations, endowments, rights, means and circumstances, dealing with each other in a wide variety of institutional settings.7
Parsimonious factors can have a role here dealing with the problem of complexity in economics, which is a main source of difficulty for economic predictions. Parsimonious factors do not look for “simplicity” (a single ingredient or a central axis); they seek sobriety in the selection of elements in order to offer those that are central for the complex system. This austere selection of factors should be able to present the set of elements that is sufficient and necessary to grasp the complex system. This study of the parsimonious factors in economics, thinking of important aspects for predictions, can be focused towards it as a science of design. This inquiry requires taking into account some features, among them: a) the twofold complexity of economics as a science of design—its framework and dynamics—, and b) the possibility of being facing a sort of complexity open to a hierarchical articulation (i.e., an organized complexity internally articulated). Accordingly, the identification of parsimonious factors in economics, understood as a large complex system, should lead to the clarification of its scientific framework and its evolving dynamics. This identification should search a possible hierarchical articulation in the architecture of the particular complex system. Thereafter, the parsimonious factors can contribute to build upon it in order to overcome the complexity of economics predictions. The contribution might be mainly in the task of increasing the accuracy and precision of the predictive models created in economics.
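As a loose computational analogue (my own illustration, not a method proposed in this chapter), the austere selection of factors can be pictured as a greedy search that keeps adding the candidate factor which most improves a simple predictive model and stops as soon as the remaining candidates no longer yield a meaningful gain. The factor names and data below are synthetic assumptions:

```python
import numpy as np

# Synthetic illustration of "parsimonious" factor selection: a few candidate
# factors truly drive the target, the rest are noise; greedily keep only the
# factors that buy a meaningful predictive gain.
rng = np.random.default_rng(0)
n, names = 200, ["demand", "rates", "employment", "noise_a", "noise_b"]
X = rng.normal(size=(n, len(names)))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.3, size=n)

def fit_error(cols):
    # Mean squared error of a least-squares fit using only the chosen columns.
    if not cols:
        return float(np.var(y))
    beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
    return float(np.mean((y - X[:, cols] @ beta) ** 2))

selected, err = [], fit_error([])
while True:
    candidates = [c for c in range(len(names)) if c not in selected]
    if not candidates:
        break
    best = min(candidates, key=lambda c: fit_error(selected + [c]))
    new_err = fit_error(selected + [best])
    if err - new_err < 0.05 * err:   # no meaningful gain: stop (parsimony)
        break
    selected.append(best)
    err = new_err

print("parsimonious factors:", [names[c] for c in selected])
```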
2. VARIETIES OF COMPLEXITY: FRAMEWORK AND DYNAMICS Initially, there is a duality in the kind of complexity present in the sciences of design, such as economics. Complexity might appear both in their constitutive components (when they are arranged as a complex framework) and in their dynamics (when the elements operate as a teleological procedure open to many possibilities in the future). (i) There is a complexity in their configuration as a complex framework. This can be seen in the constitutive elements of a science of the artificial (language, structure, knowledge, method, activity, aims, and values). (ii) There is a complexity in the dynamics of a science of design, especially as an applied science, because this teleological human activity involves aims, processes, and results. A science of design can seek the resolution of concrete problems using a complex system organized by aims, processes, and results.
7 Amartya Sen, “Prediction and Economic Theory,” in John Mason, P. Mathias, and J. H. Westcott (eds.), Predictability in Science and Society. London: The Royal Society and The British Academy 1986, p. 5.
2.1. Complex Framework and Complex Dynamics According to this initial duality in the kind of complexity present in the sciences of design, economics as a science of the artificial might have both features: a complex framework of components to explain and predict phenomena as well as a complex dynamics that seeks the resolution of specific problems in the social milieu. In the case of the complex framework the main components are semantic, logical, epistemological, ontological, axiological and ethical. Meanwhile the complex dynamics of the sciences of design requires a methodology that involves aims, processes and results. They evolve as a complex system in historical terms. An approach to complexity as a constitutive element of the sciences of the artificial can be found in Simon. Firstly his focus is on “organized complexity,” and secondly he seeks a complexity that might be characterized in ontological terms. Thus, he is not interested in a possible chaotic structure (e.g., in economics as a presumptive chaotic system) but rather in what he calls “an architecture of complexity.”8 This ontological complexity allows for a hierarchical configuration. Thus, a) the set is eo ipso decomposable—or rather “nearly decomposable”—following a hierarchical dimension (priorities), and b) the elements obtained—subsystems—may be coordinated, insofar as it is an organized complexity. Above and beyond the constitutive elements of a science of design—only some of them are studied by Simon—there is another angle: the dynamic domain in the sciences of the artificial—aims, processes, and results—, which is evolving as a complex system. When the complex system—such as branches of economics—is evolving, the internal structure is open to the future in many ways: a) the evolution might be for an adaptation to the available environment;9 b) the variation can be a historical modification of the system through the emergence of new properties;10 c) the historical modification might be a structural change that ends up being a new complex system (such as the changes in the business firms related to information and communication technologies); etc. Assuming a hierarchy of components within the complex system—an internal framework—, Simon analyzes the speed of evolution by means of the interac8
Cf. Herbert Simon, “The Architecture of Complexity,” Proceedings of the American Philosophical Society 106, 6, 1962, pp. 467-482. Reprinted in Peter E. Earl (ed.), The Legacy of Herbert Simon in Economic Analysis, Vol. 1. Cheltenham: E. Elgar 2001, pp. 485-500. 9 “To predict the short-run behavior of an adaptive organism, or its behavior in a complex and rapidly changing environment, it is not enough to know its goals. We must know about its internal structure and particularly its mechanism of adaptation,” in Herbert Simon, “Theories of Decision-making in Economics and Behavioral Science,” American Economic Review 49, 1959, p. 255. Reprinted in Herbert Simon, Models of Bounded Rationality. Vol. 2: Behavioral Economics and Business Organization, Cambridge: The MIT Press 1982, p. 289. 10 Cf. Karl-Ernest Schenk, “Complexity of Economic Structures and Emergent Properties,” Journal of Evolutionary Economics 16, 2006, pp. 231-253.
tion of components (a complex dynamics).11 For him, complex systems are nearly completely decomposable and, in the case of human systems, their search for coordination is explicit. This coordination in the sciences of the artificial can be carried out through bounded rationality—present both in science as a human activity and in the agents—and using parsimonious factors to get the whole system analyzed. On the one hand, the sciences of design are sciences of the artificial—due to a task of synthesis—and their subsystems can be taken back to other more basic ones. Thereafter, starting from these basic ingredients, it is possible to do research on the system as a whole. And, on the other hand, the methodological conception of parsimony allows us to focus on the elements needed in the system, instead of searching for the mere simplicity of a central axis or a single directive idea. Ontologically, Simon understands complexity in the sciences of the artificial as endowed with an internal hierarchy, and epistemologically he assumes that this configuration is always “nearly decomposable.”12 In addition, methodologically he develops some cases starting from a conception of parsimony.13 But a deeper analysis of the complex framework—or constitutive elements—as well as of the dynamic domain of the sciences of design (aims, processes, and results) should be made. This analysis requires a wide conceptual framework. In this regard, Nicholas Rescher offers a broader perspective in his book Complexity.14 He emphasizes that a complex system can be organized into subsystems not only in hierarchical terms but also coordinately, through the network of its relations. In order to develop a deeper analysis of the complex framework as well as of the dynamic domain of the sciences of design, a central feature to be considered is this: they are applied sciences.15 In addition, they assume bounded rationality at three successive levels: i) in science, insofar as it is a human-made undertaking; ii) in the diverse sciences, when they develop a specific modality of rationality according to the kind of problem that they deal with (such as issues in the fields of economics, information science, communication, etc.); and iii) in the agents, when they work on the decision-making of agents in the realms analyzed (economic, documentary, communicative, etc.). 11 Cf. Herbert Simon, “Near Decomposability and the Speed of Evolution,” Industrial and Corporate Change 11, 3, 2002, pp. 587-599. 12 Simon made important contributions to the sciences of the artificial. He was both a Nobel laureate in Economics (1978) and a key figure in computer science who received the Turing Award with Allen Newell (1975). 13 Cf. Herbert Simon, “Science Seeks Parsimony, not Simplicity: Searching for Pattern in Phenomena,” in Arnold Zellner, Hugo Keuzenkamp, and Michael McAleer (eds.), Simplicity, Inference and Modelling. Keeping it Sophisticatedly Simple. Cambridge: Cambridge University Press 2001, pp. 32-72.
Through the epistemological notion of “bounded rationality” and the use of a methodology of parsimonious factors it should be possible to research the problem of complexity. This might be made in two directions: to achieve the key elements within an internal hierarchical articulation in the system analyzed (economic, documentary, communicative, etc.) and to grasp the reciprocal relations of elements—the interconnection—that allow us finding a coordination within the system. 2.2. Relevant Forms of Complexity Prima facie, it could be puzzling to read “there is still no generally accepted definition of complexity, despite a vast number of proposed ansatzes.”16 Moreover, Simon himself, when dealing with the architecture of complexity, avoids a formal definition of “complex systems”, those that may be either disorganized or organized. By complex system he means “one made up of a large number of parts that have many interactions.”17 In addition, he points out that the whole is more than the sum of the parts in such systems, and it is not a trivial matter to infer the properties of the whole, given the properties of the parts and the laws of their interaction. However, these characteristics given by Simon of a “complex system” are not good enough, because few realities of our world (natural, social and artificial) seem exempt from these relations between the parts and the whole through rules of interaction.18 Another way of tackling this issue is by paying attention to “modes of complexity”, trying to grasp what they share as common features. An interesting collection of modes of complexity can be found in Rescher’s analysis. He distinguishes two large modes of complexity, with several possibilities in both: a) epistemic (complexity related to formulas); and b) ontological, which are those connected to three options: composition, structure, and function.19 Epistemic modes of complexity for Rescher are threefold: (i) descriptive; (ii) generative; and (iii) computational. Within the ontological modes of complexity, there are three blocks: the compositional complexity (constitutional and taxonomic or heterogeneity); the structural complexity (that associated with the ways of organization and the hierarchical configuration); and the functional complexity, 16 Dominique Chu, Roger Strand, and Ragnar Fjelland, “Theories of Complexity. Common Denominators of Complex Systems,” Complexity 8, 3, 2003, p. 19. “There is no agreed-upon definition of such a complex term as ‘complexity.’ Indeed, MIT’s Seth Lloyd has gathered over 45 such definitions, most of these listed in Horgan ([The End of Science] 1997, Chapter 8, footnote 11, p. 303), with many of these definitions emphasizing computational or informational measures,” J. Barkley Rosser Jr, “On the Complexities of Complex Economic Dynamics,” Journal of Economic Perspectives 13, 4, 1999, p. 170. 17 Herbert Simon, The Sciences of the Artificial, 3rd ed., pp. 183-184. 18 Cf. Nicholas Rescher, Ibid., p. 22. 19 Cf. Nicholas Rescher, Ibid., p. 9.
which includes—for him—two options: operational and nomic. All these features together lead to a concept of “complexity” that shares elements related to two domains: 1) to knowledge (to describe the system, to produce it and to resolve problems within such a system); and 2) to reality itself in three different stages: firstly, the components (the number of constituents and their variety); secondly, the structure (possible ways of arranging the interrelation of components and the modes of inclusion and subsumption in the relations of those components); and thirdly, the functions that the system is able to develop (modes of operation and possible laws governing its way of functioning). Therefore, the features of “complexity” go far beyond those emphasized by Simon, that of a mere hierarchical structure that is decomposable or near-decomposable, and that of a possible coordination that also has interrelations of constituents in order to evolve. A complex system—and, consequently, an economic system—involves at least two wide levels in its complexity: epistemic and ontological (i.e., compositional, structural and functional). Consequently, any economic prediction within a complex system should consider an array of factors. Because of the possibly large number of factors at stake, prediction should begin with something tractable: it should seek those factors that are more relevant, in principle, to cover the whole area of interest. Those parsimonious factors should consider vertical components (such as hierarchy, open to modes of inclusion and subsumption) and horizontal ingredients (such as coordination). From a graphical point of view, they can be depicted as a few branches of a tree at the beginning that, starting from a finite set of elements, might be able to reach the whole sphere at stake. Economic prediction belongs to these complex cases where there are epistemological, ontological and methodological considerations. They receive the influence of past events (historical background), and those historical trajectories might have influence in order to describe the possible future and to guide the information needed for prescriptions.20
3. THE ROLE OF PARSIMONIOUS FACTORS When economic predictions are seen in the context of complexity, the search for parsimonious factors seems particularly important. The reason is clear: frequently, “radical openness and contextuality are properties that make the control and prediction of complex systems very difficult.”21 Parsimonious factors can be used for control of the system (i.e., to grasp “economic activity” and “economics as activity”) as well as for prediction (including foresight and forecasting).22 20 Cf. Wenceslao J. Gonzalez, “Prediction and Prescription in Economics: A Philosophical and Methodological Approach”, Theoria 13, 2, 1998, pp. 321-345. 21 Dominique Chu, Roger Strand, and Ragnar Fjelland, Ibid., p. 28. 22 On the distinction between “foresight,” “prediction,” “forecasting,” and “planning,” see Wenceslao J. Gonzalez, “On the Theoretical Basis of Prediction in Economics,”
In the sciences of design, such as economics, the role of parsimonious factors is related to two main methodological problems: (i) to obtain their identification as key elements of the system (i.e., as necessary and sufficient factors to understand the complexity of the system) and as components that characterize complex adaptive systems (such as national economies or stock markets); and (ii) to state their contribution to that structural and dynamic system, which might develop new perspectives towards the future (i.e., prediction and prescription). Putting it differently, there are two central epistemological and methodological issues at stake here: on the one hand, how to reach the parsimonious factors of a complex system, both in the configuration of the scientific framework and in the evolving dynamics of the discipline (in this case, economics); and, on the other hand, how to build upon the parsimonious factors in order to improve the system considered—structure and dynamics—and to enlarge it (e.g., in a science of the artificial, such as economics). 3.1. Identification of Parsimonious Factors How to get “parsimonious factors” is obviously the first problem. It is the task of identifying whether there are “parsimonious factors” and, if so, what their relevant relations are. This involves the assumption that there are components of the system that are really the central factors in its configuration. In this regard, “identification” is an epistemological-methodological procedure that includes at least three steps: 1) to be able to recognize generators of complexity in the specific realm; 2) to grasp their nexus in order to establish the network of interrelations between generators of complexity; and 3) to arrive at the implications of the properties of those generators of complexity. These steps towards the identification of the parsimonious factors in a complex system require taking into account several approaches. Among the classical forms of analysis are the following four kinds: holological, etiological, teleological, and logical. (i) Holological analysis is when a whole is divided into its different parts; (ii) etiological analysis is carried out when there is a search for causes and their presumptive relations with effects; (iii) teleological analysis is developed when the relation is established between means and ends with regard to a given system; and (iv) logical analysis is when the relations are studied in connection with conceptual contents. Holological analysis is done by separation of components. Through this “top-down” analysis we can get at the internal inhomogeneity of the system. Simon’s conception of the “architecture of complexity” as well as his views on “near-decomposability” fit quite well within the holological analysis.23 Journal of Social Philosophy 27, 3, 1996, pp. 201-228, section 3. 23 “One of the important properties that we observe in virtually all complex systems, whether they be social, technical or natural, is that they are nearly decomposable. This means that each such system is divisible into parts, with a high density of interaction
His studies of the parsimonious factors in complex structures of the sciences of the artificial are made following the division of a whole into its parts. He gives more weight to ontological considerations than to other philosophical reflections. Etiological analysis is the search for causes in the system, mainly those causes that may have multiple effects. “Causality” has received a lot of attention in the social sciences, in general, and in economics, in particular. Its possibility in this realm as well as its characteristics24 (either similar or different in comparison with the natural sciences) have been discussed. What is commonly assumed is the high connectivity of the parts within the economic system. If we can get ontological causes (i.e., actual and effective) or epistemological ones (i.e., according to our present knowledge) in the economic system, there are important advantages for prediction. Teleological analysis is focused on the dynamics of the system. Commonly, there is a relation between means and ends within a system, which could be synchronic or diachronic. This means-ends relation is also open to etiological analysis. De facto, there are studies that “attempt to distinguish between system linkages at a point in time (synchronic) and linkages between things as time passes (diachronic): if a subsystem were subject to a shock, the structure of the former linkages could affect the set of events that then unfolded through time.”25 Frequently, when the means-ends emphasis is on the dynamics of the system, the characterization in the case of economics is seen as an evolution. Then, economic dynamics understood as an “evolution” appears as a creative and destructive process, which shows the historical component of economics. Thus, there are changes of different kinds in the relation between means and ends: a) most of them originate in the system; b) some are modifications resulting from connections between systems; and c) others are the consequence of emerging new systems.26 Logical analysis is oriented towards the search for parsimonious factors as “central statements” of the system considered. These statements might be understood as “basic principles” of the whole set of relations established in the system. This analysis avoids the interest in simplicity and looks for the common denominators of the complex system. If they are found, then they may be used for the among the elements within each part and a lower density of interaction between these elements,” Herbert Simon, “Organizing and Coordinating Talk and Silence in Organizations,” Industrial and Corporate Change 11, 3, 2002, p. 611. 24 See for example William Brock, “Causality, Chaos, Explanation and Prediction in Economics and Finance,” in John Casti and Anders Karlqvist (eds), Beyond Belief. Randomness, Prediction and Explanation in Science. Boca Raton: CRC Press 1991, pp. 230-279; and Robert Engle and Halbert White (eds), Cointegration, Causality and Forecasting: A Festschrift in Honour of Clive Granger. Oxford: Oxford University Press 1999. 25 Peter E. Earl and Jason Potts, “Bounded Rationality and Decomposability: The Basis for Integrating Cognitive and Evolutionary Economics,” in Mie Augier and James G. March (eds.), Models of a Man: Essays in Memory of Herbert A. Simon, p. 323. 26 Peter E. Earl and Jason Potts, Ibid., p. 328.
search of other properties within the system. In addition, these parsimonious factors from the logical point of view can be utilized to enlarge the system in a coherent way (i.e., as “bottom-up” analysis). Using this set of analyses—holological, etiological, teleological, and logical—the identification of “parsimonious factors” of a system could be obtained. Thereafter, the relevant elements for economic predictions of that system should be easier. In this regard, a convergence among the results of these four analyses is expected, because the main elements (parts, means, causes and reasons) might be connected in order to clarify the parsimonious factors of the structure and dynamics of the system considered. Moreover, the complex configuration of past events and their evolutive dynamics (such as the Great Depression of 1929 and subsequent economic phenomena) can offer some lessons that can be used for the analysis of present events (the ongoing international economic crisis) and their projection towards the future. 3.2. Contribution of Parsimonious Factors If the identification of parsimonious factors pays special attention to past and present aspects of the complex system, the second problem—how to build up on the parsimonious factors—is principally oriented towards the future. This is particularly important in a science of design such as economics, because it is an applied science that uses prediction as an indispensable ingredient for making prescriptions.27 In this task of modeling systems towards the future, there are a number of difficulties, among them is the absence of a “comprehensive theory of design” elaborated around central constraints.28 From a holological perspective, there is a transition from analysis to synthesis. This movement from parts to wholes has been central in Simon’s perspective, which thinks that “the whole is (sometimes) many times more than the sum of its parts.”29 If this is the case, then it is clear that there are “intrinsic difficulties with representing complexity in computer models.”30 Moreover, it seems reasonable to accept in economics the existence of nonlinear interactions between parts of the system. Thus, in order to build up on the parsimonious factors, we need elements serving organized complexity, such as hierarchy. In this regard, Simon has pointed 27 Cf. Herbert Simon, “Prediction and Prescription in Systems Modeling,” Operations Research 38, 1990, pp. 7-14; reprinted in Herbert Simon, Models of Bounded Rationality. Vol. 3: Empirically Grounded Economic Reason. Cambridge: The MIT Press 1997, pp. 115-128. Cf. Wenceslao J. Gonzalez, “Prediction and Prescription in Economics: A Philosophical and Methodological Approach,” pp. 321-345. 28 Explicitly, Simon recognizes that “I should like be able to present to you a comprehensive theory of design built around this central constraint [the scarcity of attention]; but lacking such a theory—and I certainly don’t have one—...”, Herbert Simon, “Organizing and Coordinating Talk and Silence in Organizations,” p. 614. 29 Herbert Simon, Ibid., p. 612. 30 Dominique Chu, Roger Strand, and Ragnar Fjelland, Ibid., p. 20.
out: “all business organizations are hierarchies, where the hierarchy denotes not only relations of authority, but some degree of independence of each unit and subunit.”31 Building up the system with the etiological approach requires the acceptance of “causes” in human affairs,32 both in the social sciences and in the sciences of the artificial. Simon accepts a causal relation that links phenomena in an asymmetric way. For him, this causal relation is “a functional relation between variables, not a logical implication between the values of those variables.”33 His interest is in parsimonious factors: “we necessarily restrict our analyses to events of small or middle-size worlds, abstracted from their larger environments and characterized by very small number of equations. We see that the notion of causal ordering provides us with a rigorous justification of this essential practice.”34 Causality in dynamic systems is analyzed by Simon from the point of view of a system that is composed of subsystems, where the principal strong forces connect variables belonging to the same subsystems. The dynamics of the nearly decomposable system might be in the short term (the interest is in the relative movements of the variables in each subsystem) or in the long term (the dynamics is then associated with the linkages between subsystems). Thus, “at any level in the hierarchy, the causal ordering that relates to the relative movement of the variables within any single component is (nearly) independent of the causal ordering among components.”35 Certainly, this kind of approach is also related to a teleological analysis. In the case of economics, complex systems such as organizations and markets are “goal-oriented activities.”36 Moreover, they can be seen as the principal organizers of most of our collective intentionality, and as having a historical development. Again, we can use parsimonious factors: in the case of short-term, middle-term and long-term economic phenomena, “attention can be directed to the dynamics of specific subsystems without dealing with the entire system at one.”37 31 Herbert Simon, “Organizing and Coordinating Talk and Silence in Organizations,” p. 612. 32 Cf. Merrilee Salmon, “La explicación causal en Ciencias Sociales,” in Wenceslao J. Gonzalez (ed.), Diversidad de la explicación científica. Barcelona: Ariel 2002, pp. 161-180. 33 Yumi Iwasaki and Herbert Simon, “Causal Ordering, Comparative Statics, and Near Decomposability,” Journal of Econometrics 39, 1988. Reprinted in Herbert Simon, Models of Bounded Rationality. Vol. 3: Empirically Grounded Economic Reason. Cambridge: The MIT Press 1997, p. 18. 34 Yumi Iwasaki and Herbert Simon, Ibid., p. 25. 35 Yumi Iwasaki and Herbert Simon, Ibid., p. 35. 36 Herbert Simon, “Complex Systems: The Interplay of Organizations and Markets in Contemporary Society,” Computational and Mathematical Organizational Theory 7, 2001, p. 79. 37 Yumi Iwasaki and Herbert Simon, Ibid., p. 40.
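The near-decomposability idea invoked above can be illustrated with a small numerical sketch; the coefficients are invented for the purpose and are not drawn from Iwasaki and Simon. Two subsystems with strong internal links and weak links between them behave in the short run almost as if they were isolated, while in the long run the weak inter-subsystem links govern how activity is redistributed:

```python
import numpy as np

# Hypothetical nearly decomposable system: two subsystems with strong internal
# couplings and weak (eps) couplings between them, kept column-stochastic.
def transition_matrix(eps):
    A = np.array([[0.7, 0.4, eps, eps],
                  [0.3, 0.6, eps, eps],
                  [eps, eps, 0.8, 0.5],
                  [eps, eps, 0.2, 0.5]], dtype=float)
    return A / A.sum(axis=0)          # normalize columns to sum to 1

x0 = np.array([1.0, 0.0, 0.0, 0.0])   # all activity starts in subsystem 1

def evolve(A, steps):
    x = x0.copy()
    for _ in range(steps):
        x = A @ x
    return x

coupled = transition_matrix(0.01)      # the "true" weakly coupled system
isolated = transition_matrix(0.0)      # analysis ignoring inter-block links

for steps in (3, 500):
    gap = np.abs(evolve(coupled, steps) - evolve(isolated, steps)).max()
    print(f"steps={steps:4d}  max gap={gap:.3f}")
# Short-run behaviour of each subsystem is nearly independent of the other
# (tiny gap); over long horizons the weak links redistribute activity across
# subsystems, so the decomposed description drifts away from the true one.
```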
Logically, parsimonious factors cannot be used as an “automatic deductive system”. They include elements that are interconnected. Thus, in the realm of microeconomics, there are links between uncertainty, bounded rationality and prediction. Economic prediction cannot be made on the basis of a “perfect rationality,” because there are limitations: uncertainty about the consequences that would follow from each alternative when a decision is made, possibly incomplete information about the set of alternatives, and the complexity of the situations, which can prevent us from doing the necessary computations to solve the problem.38 Thus, against neoclassical positions—mainly, those of Friedman—,39 Simon maintains that “in this kind of complexity, there is no single sovereign principle for deductive prediction.”40 For him, the evidence available reveals that a theory of substantive rationality and the assumptions of utility maximization do not provide a sufficient base for explaining and predicting economic behavior. Therefore, he suggests applying the procedural theory of rationality to economics, both at the micro and the macro levels. Hence, in order to tackle the problem of complexity, the parsimonious factors cannot be reduced to “near-decomposable” elements in a given system and their evolution. If they are to be the necessary and sufficient elements to reach the whole system, parsimonious factors should take into account the four levels pointed out. When they are used for economic predictions, those factors should give us the ontological elements (parts or phenomena), epistemological components (causes), methodological requisites (processes valid for development) and logical aspects (reasons) that are relevant for foretelling events in the economic realm (e.g., about the future of the financial crisis).
Faculty of Humanities University of A Coruña Dr. Vazquez Cabrera street, w/n 15.403 Ferrol Spain [email protected] 38 Cf. Herbert Simon, “Theories of Bounded Rationality,” in C. B. McGuire and Roy Radner (eds.), Decision and Organization. Amsterdam: North-Holland 1972, p. 169. 39 Friedman has been very influential in mainstream economics. He has emphasized the idea of predictions about phenomena not yet observed as the ultimate goal of a positive science (Cf. Milton Friedman, “The Methodology of Positive Economics,” in Milton Friedman, Essays in Positive Economics. Chicago: The University of Chicago Press 1953, 6th repr., 1969, p. 7). His idea is focusing economic theory on valid and meaningful predictions, which highlights the “impersonal” results, whereas Simon stresses the role of understanding processes and human behavior. 40 Herbert Simon, “Rationality in Psychology and Economics,” in Robin Hogarth and Melvin Reder (eds), Rational Choice. The Contrast between Economics and Psychology. Chicago: The University of Chicago Press 1987, p. 39.
MARIA G. BONOME
PREDICTION AND PRESCRIPTION IN THE SCIENCE OF THE ARTIFICIAL: INFORMATION SCIENCE AND COMPLEXITY1
Information science and computer sciences are two realms of the sciences of the artificial that are faced with new forms of complexity.2 From the point of view of the content, there are new and increasingly complex informative needs that are being demanded with growing intensity. From the perspective of agents, there are new forms of interaction between individuals and information systems.3 This relation between users and information brings about new organizations that are strengthened through new advances in technology.4 Therefore, there are inner and outer factors related to the new forms of complexity. To face those inner and outer aspects, information science and computer sciences have to work jointly to bring forward a possible future (prediction) and to give new action guidelines that allow us to solve the problems posed (prescription). Here the dynamic aspects of informative systems, whose complexity is the topic of this paper, are highlighted. There are three factors that take part in a special way: information, systems for the processes, and users. These three factors are open to the future, are closely related to one another, and have an applied side. Information is a dynamic matter from the perspective of its constitution: in its origin, in its development and in its results (which have a feedback effect). First of all, the success of the systems that carry out informative processes depends on the capacity to process information, and secondly, on the active relation with the user (as individuals, a group or an organization). The demand of new forms of informa-
3
4
This research project is supported by the Spanish Ministry of Science and Innovation (FFI2008-05948). In the case of computer sciences three aspects take part: the development of cognitive models, artificial intelligence and theory of computation. Cf. Paul Thagard, “Computing in the Philosophy of Science”, in Luciano Floridi (ed.), Philosophy of Computing and Information. Oxford: Blackwell 2004, pp. 307-308. Together with information management, time and the attention that people devote to search relevant information have an influence on decision making. Cf. Herbert A. Simon, “The Future of Information Systems”, Annals of Operations Research 71, 1997, p. 5. It has to be considered that the selection of technologies in project design has strong consequences for later stages of design. About this issue, cf. Mark Bergman and G. Mark, “Technology Choice as a First Step in Design: The Interplay of Procedural and Sensemaking Processes”, Proceedings of the Conference on Designing Interactive Systems 6, 2002, pp. 224-234.
tion that has to lead to obtaining genuine “knowledge” is related to human historicity: a human being can be subject to continuous changes in his circumstances, needs and interactions. Therefore, it can be said that initially information science has a double relation to complexity, at least.5 On the one hand, from the point of view of the content, it has to face knowledge management to reduce the uncertainties in decision making, both in the theoretical field and in the practical sphere. On the other hand, it has to face up to the interaction with users. This outer aspect involves the design of new products that deal with the needs of agents. Those two aspects—the inner and the outer—are a field of information science as a science of design. They need the contribution of other disciplines related to cognitive contents (like philosophy or psychology) or related to the networks of social links (sociology or anthropology). Therefore, information science studies complex problems in order to attend to cognitive demands, but it is open to interdisciplinary teams.
1. COMPLEXITY IN DYNAMICAL SYSTEMS In my opinion, we can recognize and distinguish three types of complexity in informative systems: a) complexity about information itself (information science deals with this specifically); b) complexity about the design of informative systems (computer sciences has a main role in this matter); and c) complexity about the feedback that comes from the interaction between the user and the informative systems. In this third aspect, disciplines that are about visual design, usability and accessibility have a fundamental task. Information science and computer sciences deal with an inner aspect. These disciplines develop mechanisms to analyze, represent, store, process, retrieve and manage informative contents into a system. However, there is an outer aspect as well. If the information the user is searching for has no answer or does not have a proper answer, the system is said “not to represent properly” the aim for which it has been designed. The sciences of design that deal with this matter are aimed to improve these questions through research. 5
Information science can be approached from two points of view: a) in a broad sense, and b) in a narrow meaning. In the first approach, it covers the whole of disciplines related to archives, libraries and documentation centres. In the second option, this science deals specifically with informative processes: creation, acquisition, organization, evaluation, storing, transmission, retrieval and spreading of information. Cf. Antonio Bereijo, “Caracteres de la predicción en las Ciencias de la Documentación: de la Information Science al Análisis Documental”, in Wenceslao J. Gonzalez (ed.), Las Ciencias de Diseño. Racionalidad limitada, predicción y prescripción. A Coruña: Netbiblo 2007, p. 217. This second aspect highlights the service to users and it is the one that focuses the present paper.
In relation to the dynamics of a system and the complexity, it can be said that there are two main aspects: i) how systems can spontaneously self-organize when they are in a state far from equilibrium and there is no external strength that acts on the system; and ii) how systems self-organize to give an answer to the action of the outer environment. Dealing with the first of these questions science of complexity propounds to consider those systems as a continuous tension between competition and cooperation.6 Competition itself is not enough to successfully guide complex systems. Besides, cooperation between different elements can cause fundamental changes in their dynamics.7 1.1. Understanding the Complexity of a System Understanding the complexity of a system involves getting to know how that system is organized. This means paying attention to both the internal elements and its relation to the environment. In this sense, the theory of organizations is useful to understand the complexity of systems such as those related to information and its retrieval. Herbert Simon made important contributions to characterize complexity in different systems. For him, complexity used to have a structure that has to be understood in terms of the relations that are established among their different components.8 Simon understands complexity of the systems in terms of organization. He uses the concept of “hierarchy” to deal with systems. His definition of hierarchy is “a system that is composed of interrelated subsystems, each of the latter being in turn hierarchic in structure until we reach some lowest level of elementary subsystems”.9 He uses this term to refer to every complex system that is analyzable from consecutive sets of subsystems. Formal hierarchy is about the more specialized systems (firms, governments, universities …).10 6
This topic is developed in Robert Axelrod, The Complexity of Co-operation. AgentBased Models of Competition and Collaboration. Princeton: Princeton University Press, 1997. 7 Cf. Cristoforo S. Bertuglia and Franco Vaio, Nonlinearity, Chaos and Complexity. Oxford: Oxford University Press 2005, p. 6. 8 Simon was interested in the procedure to increase information into organizations, but besides he was interested in the matter of how to have organizations more and more adapted to a more an increasingly complex World. Cf. Herbert A. Simon, “Designing Organizations for an Information-rich world”, in Martin Greenberger (ed.), Computers, Communications, and the Public Interest. Baltimore: The Johns Hopkins Press 1971, pp. 37-72. Compiled in Herbert A. Simon, Models of Bounded Rationality. Vol. 2: Behavioral Economics and Business Organization. Cambridge: The MIT Press 1982, pp. 171-185. Some of his contributions about this topic are analyzed in Mie Augier and James G. March (eds.), Models of a man. Essays in Memory of Herbert A. Simon. Cambridge: The MIT Press 2004. 9 Herbert A. Simon, The Sciences of the Artificial. 3rd ed., Cambridge: The MIT Press 1996, p. 184. 10 The main contributions of Herbert A. Simon to the topic of organizations in relation to
From there Simon considers that a dynamic system can be described in terms of a quasi-decomposable set. In this case, the fast evolution of any complex system consists of a set of stable subsystems, each operating nearly independently of the detailed processes going on within the other subsystems, hence influenced mainly by the net inputs and outputs of the other subsystems. If the near-decomposability condition is met, the efficiency of one component (hence its contribution to the organism’s fitness) does not depend on the detailed structure of other components.11
His approach can be summarized in two main elements. First of all, when a system is quasi-decomposable the short term behaviour of each of its subsystems is practically independent from the behaviour of the other components in the short term. Secondly, the behaviour of each of the components of the systems depends only in an aggregate way on the behaviour of the others. This does not mean that dealing with the components of a system in an independent way involves simplicity. The proposal is different: parsimonious factors are necessary to make a correct description of the system possible.12 1.2. Interaction in the System Order and disorder have been traditionally considered opposites in regards to the configuration of systems. Order is related to systems that are open to classification, to interactions that are analyzable into a discourse of rationality. Disorder is connected to the difficulty of being expressed through clear concepts of modern science, except as a limited form of statistical generalization. To deal with the interaction into the systems, science of complexity has developed a radical re-conceptualization of the relation between order and disorder. In this sense, “chaos” is now conceptualized as extremely complex information, instead of being presented
their inner structure and the relation to the environment can be found in Herbert A. Simon, Administrative Behavior. N. York: Macmillan 1947, and in James G. March and Herbert A. Simon, Organizations, N. York: J. Wiley 1958. Some interesting analysis about Simon’s contributions on decomposability and hierarchy can be found in part VI of Peter Earl (ed.), The Legacy of Herbert Simon in Economic Analysis. Vol I. Cheltenham: Edward Elgar 2001, pp. 485-548. 11 Herbert A. Simon, The Sciences of the Artificial, p. 198. 12 This conception is developed in detail in Herbert A. Simon, “Science Seeks Parsimony, not Simplicity: Searching for Pattern in Phenomena”, in Arnold Zellner, Hugo A. Keuzenkamp and Michael McAleer (eds.), Simplicity, Inference and Modelling. Keeping it Sophisticatedly Simple. Cambridge: Cambridge University Press 2001, pp. 32-72.
as an absence of order.13 Complexity can be said to be a property of the systems and their models that are in the middle of stability and chaos.14 On the one hand, there are “stable systems”. Predicting their behaviour is possible as far as the changes in the system are easy to control. On the other hand, there are “chaotic systems” where scientific prediction is highly problematic. However, in the middle, there are transitional states between order and disorder that move searching for their own equilibrium. That is the response to the interaction among their components and their response to external actions. Almost every system consists of numerous elements. Among them are reciprocal and non-linear interactions and feedback mechanisms as well. This can be seen in a wide range of scientific fields. It is precisely in this field of interaction where hypertext comes up as a form of representation and transmission of information. Hypertext documents were designed in such a way that the person who is going to read them is able to move into a document in a non-linear way or even to move from one document to another through links. In hypertext, transitions are connected to the mental task of associating ideas. Science of complexity usually adopts an overall view of the systems. It considers the intricate relation between the systems and their elements. It understands the situation in a holistic sense: each system has to be considered as a whole. The idea is to overcome the point of view of classical mechanics. One of the main questions about this approach is precisely its rejection of the stand that considers sets to be reducible to the sum of their parts.15 Information systems are usually considered to be non-linear systems. Therefore, interaction among their several elements has relations that are mainly multiplicative instead of being simply additive ones. Besides, most of systems in the real world in which human beings are involved do not work in a linear way. They are extremely sensitive to initial conditions and the re-entry of data gives rise to very different results. In this kind of system, “interaction” is a key matter because the relation between two variables can be modified by the value of a third.16 This is applied to causality in the social realm: there are numerous causes that usually interact in a non additive way.17
13 Cf. Robert Chia, “From Complexity Science to Complex Thinking: Organization as Simple Location”, Organization 5, 3, 1998, pp. 341-369. Compiled in Robert MacIntosh, Donald MacLean, Ralph Stacey and Douglas Griffin, Complexity and Organization. Readings and Conversations. London: Routledge 2006, p. 211. 14 Cf. Cristoforo S. Bertuglia and Franco Vaio, Ibid., p. viii. 15 Cf. David Byrne, Complexity Theory and the Social Sciences, London: Routledge 1998, p. 7. 16 Cf. David Byrne, Ibid., p. 19. 17 About causality there is a joint paper by Herbert A. Simon and Nicholas Rescher: “Cause and Counterfactual”, Philosophy of Science 33, 1966, pp. 323-340. Reprinted in Herbert A. Simon, Models of Discovery. Boston: Reidel 1977, pp. 107-134.
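A standard textbook illustration of such sensitivity to initial conditions, not an example taken from this chapter, is the logistic map: two inputs differing by one part in a million quickly diverge under the same non-linear update rule.

```python
# Logistic map: a classic non-linear system in which tiny differences in the
# initial data ("re-entry of data") produce very different results.
def logistic(x, r=3.9):
    return r * x * (1.0 - x)

a, b = 0.200000, 0.200001   # starting values differ by one millionth
for step in range(1, 41):
    a, b = logistic(a), logistic(b)
    if step % 10 == 0:
        print(f"step {step:2d}:  a={a:.6f}  b={b:.6f}  |a-b|={abs(a - b):.6f}")
```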
2. DIFFICULTIES FOR PREDICTION IN THE DESIGN OF INTERACTIVE SYSTEMS As a sphere related to the sciences of design it is necessary to think about how to create the scientific design. This requires the contribution of knowledge at three different levels: a) in regards to the searched aims, b) about the selection of the most suitable processes, and c) in connection with the criteria to be applied in the assessment of the results.18 Every one of the several scientific aspects has a separate development, even though there is a close relation. This gives rise to some shared tasks. Both the design of the technological instruments and the design of systems to process information draw the limits of what is possible to reach. In this sense, prediction is possible to a certain level. Nevertheless, this design has an implicit feedback process about the use of the systems: how it is used, what information is more relevant or what informative contents are irrelevant. Evolution in the design has to deal with some uncertainties, especially in those tasks related to the interaction user-based system because they are more difficult to predict. The evolution of the system through time has a particular interest; which is its dynamic. That interest is justified by the fact that when, on the basis of acquired experience, we succeed in identifying typical reoccurrences in its behaviour, we can then attempt to make predictions about the future behaviour of said system. Obviously we can only do this by assuming that what has occurred in the past will be repeated in the future, an assumption that is often not justified by experience and is sometimes completely arbitrary.19
Two different possibilities for prediction can be distinguished, depending on where the emphasis is placed: the dynamic of the "inner" or the dynamic of the "outer". The first concerns how to improve the composition of the systems and the way in which they work (for instance, whether it is possible to get a semantic web with the means we have, and how). The second deals with the relation to the environment: prediction about the behaviour of users when they use the systems (for instance, how to anticipate the future consequences of social networks). Information science can make progress towards new goals thanks to the knowledge about future events provided by prediction. Prediction contributes to establishing which aims it is possible to reach; then the most convenient ones have to be selected. Therefore, prediction in information science has a key role in solving specific problems: it serves as a basis for prescription, which decides what to do.
18 Cf. Wenceslao J. Gonzalez, "Análisis de las Ciencias de Diseño desde la racionalidad limitada, la predicción y la prescripción", in Wenceslao J. Gonzalez (ed.), Las Ciencias de Diseño. Racionalidad limitada, predicción y prescripción, pp. 5-6.
19 Cristoforo S. Bertuglia and Franco Vaio, Ibid., p. 4.
2.1. Kinds of Uncertainty

Information science moves in a social environment whose future it has to consider through prediction and, as far as possible, channel through prescription. There, uncertainty appears as an inherent feature of modern society, which is an open society. Uncertainty not only has its origin in the absence of information or the lack of knowledge—the "inner" elements of the system—but also arises from the strategic and institutional features of the networks that articulate and process problems. Several different kinds of uncertainty can be distinguished: a) substantive, b) strategic, and c) institutional.20 Substantive uncertainty refers to the availability of information when we have to face problems of a complex nature. This uncertainty is not only a matter of whether it is possible to have access to information (or to knowledge, if that is the case). It happens that "since actors have different perceptions of problems and view them from different frames of reference, they also interpret the available information differently". As a result, "the different interpretation of the meaning of information is an additional source of substantive uncertainty. More information, therefore, does not necessarily lead to less uncertainty; in fact, it might lead to more uncertainty".21 Strategic uncertainty appears when cognitive insecurity arises in the interaction among users, those who need information in order to make decisions. Information users have to respond to the strategic actions of other agents, and to do this they need to anticipate their behaviours. Mechanisms in which several agents take part may make it difficult to predict the results of their interactions, and this generates uncertainty both in dealing with the problem and in the process of its resolution. There is institutional uncertainty in the sense that informative systems are part of an organized social environment. Therefore, together with the individual factors in the agents (different aims, interests and perceptions of the users about the information), there are some contextual factors: different kinds of organizations, social networks and administrative stratifications (local, regional, national). "Interaction between actors is difficult since each will have their behaviour guided by the tasks, opinions, rules and language of their organization, their own administrative level and their own network".22 Information science works jointly with computer sciences and information and communication technologies (ICT) to face these questions about uncertainty. With their help, information science has to deal with the following issues: i) the lack of information, ii) the interpretation of information, iii) the excess of information, iv) the needs and desires of users, v) the interaction among different kinds of
20 Cf. Joop Koppenjan and Erik-Hans Klijn, Managing Uncertainties in Networks. A Network Approach to Problem Solving and Decision Making. London: Routledge 2004, pp. 6-7.
21 Ibid., p. 6.
22 Ibid., p. 7.
users, vi) the link with the cultural and organizational environment that surrounds them, and vii) the knowledge users have about the computational systems they are using. Dealing with these questions requires considering the role of prediction. The sciences of the artificial—information science and computer sciences—are configured as applied sciences aimed at solving specific problems.23 Therefore, these sciences of design use predictive models as an instrument to analyze the behaviour of large complex systems. This means that they have to be able to predict future events and to contribute to examining the possible effects of public decisions (among them, political ones).24

2.2. Scientific Design and the Management of Uncertainty Problems

To deal with uncertainties of the first kind—the substantive ones, which concern the components of the systems and their processes—prediction can be said to be easier. The greater simplicity in this case is due to better knowledge of the variables involved, which are mostly designed, so that at first their behaviour is known. Observing the results of their operation will help to improve their development (for instance, new languages or structures to tag digital documents; new strategies to improve web positioning, to advance towards semantic indexing, or new crawlers for information retrieval). Nevertheless, when we are dealing with uncertainty in which users take part, other variables come into play. These have a much more dynamic behaviour. In this sense, they are much more difficult to predict than the previous ones, as can be seen from the analysis of the features of complex social systems. In my opinion, prediction about social interaction in the use of informative systems has a feature that makes it different from prediction in other social sciences. The difference lies in the close relation between the "inner" and the "outer". Indeed, there are ways into the systems to collect, record and analyze data about the users' behaviour. This is carried out by means of a continuous feedback process. Therefore, while a person is interacting with the system, the system has been designed to try to know his or her intentions, interests, cultural level, and economic level. New research developments are working to analyze user feelings (sadness, happiness …) so as to find a way to modify those feelings while the user is interacting with the system. When we talk about institutional uncertainty, because informative systems are part of an organized social environment, there is a problem related to
23 Sciences of design—as sciences of the artificial—are applied sciences in the sense that they are aimed at goals and they seek to solve specific problems. About this issue see Ilkka Niiniluoto, "The Aim and Structure of Applied Research", Erkenntnis 38, 1993, pp. 1-21.
24 Cf. Herbert A. Simon, "Prediction and Prescription in Systems Modelling", Operations Research 38, 1, 1990, p. 7.
prediction and another problem regarding prescription. The problem of prediction concerns information in the future and how to share it. This influences the design of aims, and it affects the processes and the subsequent evaluation of the results. In this sense, the creation of international organizations in specific realms is more and more frequent, in order to work jointly on the development of predictions. The problem of prescription in institutional uncertainty can be said to create more difficulties than prediction. The exchange of information among diverse types of organizations causes an increase in situations where decision making has to be shared. The difficulty rests on the capacity to reach agreements about criteria that enable interaction among organizations belonging to diverse environments.25 A common starting point, similar processes, and the evaluation of the results by similar criteria are necessary in order to obtain authentic prescriptions. Therefore, scientific design has to deal with several different types of uncertainty: substantive, strategic and institutional. Prediction needs to contribute so that the design can be redefined into a more suitable model. In this sense, the web allows an empirical methodology that enables it to analyze itself in real time. In this way, it can check whether scientific aims are being reached properly and, if this is not the case, it makes it possible to know where improvements need to be made. We are really not able to predict some consequences of the use of systems; but since a system is the product of a design, we can act so that some things happen and others do not. This is the role of prescription.
3. THE NEED FOR PRESCRIPTION IN AN INFORMATIVE ENVIRONMENT

An information system is an open space. There, every person and every organization has a place in which to interact with each other. This is a basic issue when considering the role of prescriptions in this field; and just as some aspects of predictability have been mentioned previously, several levels of prescription can be considered here too. The main ones are the "inner" level, related to tasks of representation, storage, processing and information retrieval; and the "outer" sphere, regarding the user-system interaction.

3.1. Prescription in the "Inner" Level

In relation to prescriptions at the "inner" level, we need to emphasize that information science—like the computer sciences—has the status of a "science of design". As an applied science it has a prescriptive side; therefore it develops patterns, standards and rules that are established as operative instruments to achieve goals. Prescriptions made in the realm of this discipline do not transform reality. The practical knowledge that deals with problems concerning information does
25 Within the realm of information science, an example would be IFLA (International Federation of Library Associations and Institutions).
not modify or alter the reality it works with—it indexes, summarizes, classifies, etc.—but it does not alter the entity of that reality in any way. Its aims, processes and results are genuinely artificial.26 It can be said that information science has prescriptions in the three aspects of aims, processes and results that are characteristic of an applied science of design. This opens up multiple possibilities: i) there can be specific prescriptions aimed at guiding the transfer of information, both formal and informal; ii) it is possible to model prescriptions regarding how information needs are generated in society; iii) prescriptions can be made to improve both the systems and the quality of information by developing the proper methods; iv) standards can be developed to improve the knowledge obtained from any kind of document, in such a way that the methods for analyzing and representing systems of information can be enhanced; and v) rules can be given to increase the relevance of information sources.27 Prescriptive rules and guides have a key role in making it possible to share information. Nevertheless, they cannot easily be imposed in professional practice, due to several factors. To mention a few: a) information science has its origin in the scientification of a professional practice with very deeply rooted practices;28 b) scientific developments change so fast that organizations have hardly any time to assimilate new patterns; c) economic factors are very deeply involved here, so they appear embedded in the scientific research itself;29 and d) scientific developments for managing information are shared by both formal and informal organizations within the same space.

3.2. The "Outer" Sphere of Prescriptions

The "outer" sphere goes together with the "inner". This aspect involves dealing with the matter of prescription related to the user-system interaction and the interaction among users. As an informative system is a dynamic complex system, "application and interpretation of rules can generate shifts and changes in the sets of network rules. Furthermore, rules can be consciously broken. The fact that rules must be followed means, after all, that they are considered acceptable by the actors in the network".30 In spite of this, both prescriptions—in general and norms
26 Cf. Antonio Bereijo, "Caracteres de la predicción en las Ciencias de la Documentación: de la Information Science al Análisis Documental", pp. 272-273.
27 Cf. Peter Ingwersen and Kalervo Järvelin, The Turn: Integration of Information Seeking and Retrieval in Context. Dordrecht: Springer 2005, p. 389.
28 About this matter, see Ilkka Niiniluoto, "The Aim and Structure of Applied Research", pp. 1-21.
29 About economic aspects in science see Wenceslao J. Gonzalez, "Economic Values in the Configuration of Science", in Evandro Agazzi, Javier Echeverria and Amparo Gomez (eds.), Epistemology and the Social. Amsterdam: Rodopi 2008, pp. 85-112.
30 Cf. Joop Koppenjan and Erik-Hans Klijn, Ibid., p. 80.
in particular—guarantee a certain stability in the behaviour of users, and so reduce uncertainty. The interfaces between users and information systems are continually being improved in order to achieve global communication. Studies of usability have an increasing relevance in the development of information systems. The information that users need not only has to be accessible but also has to be useful for their purposes. Together with rational contents, this communication involves intuition, feelings and emotions, which are aspects involved in the way the systems are used. Modelling users is related to the attempt to describe which user interests should influence the computational means. In this matter, research in Psychology shows that users are not usually wholly aware of what exactly their desires are, even in tasks aimed at a specific goal.31 Therefore, prescription here works by considering some standards of behaviour, more or less well defined, to guide users in their searches through the system (giving advice, showing alternatives, providing information about the utilities of the system, etc.). To deal with the matter of interaction among heterogeneous sets of users of the web (with their different information needs), I propose turning to the theory of organizations. Organization is the basic structure through which human beings develop their interactions. Organizations are responsible for storing and transmitting both information and knowledge, so that human beings can fill their informative gaps and reduce their uncertainties.32 Herbert Simon's contributions on this matter include the analysis of the mechanisms that allow organizations to influence the behaviour of their agents, so that the agents can be integrated into the group and share common goals. These mechanisms are the following: 1) the division of labour among members, whereby every actor performs a specific task and focuses his or her attention on it; 2) the establishment of standards to determine how things have to be done, so that individuals do not make their own decisions about those matters; 3) the transmission of orders to establish a structure of authority and influence; 4) the implementation of channels of communication, so that information can flow for decision making (these channels can be formal or informal: the former are based on the hierarchical structure of the organization, the latter follow the social relations between individuals); and 5) the training given to users so that they have knowledge, skills,
31 Cf. Klaus Mainzer, Thinking in Complexity. The Computational Dynamics of Matter, Mind, and Mankind. 5th Ed., Berlin: Springer 2007, p. 399.
32 "Today, the improvement of organizations and the information systems in them is not a matter of making more information available, but of conserving scarce human attention so that it can focus on the information that is most important and most relevant to the decisions that have to be made". Herbert A. Simon, "The Future of Information Systems", p. 3.
identification and loyalty to the organization, in such a way that they are able to make decisions in the way the organization itself would make them.33 Although the organization of informative systems—particularly of the web—and the interaction among users from all over the world do not follow a preconceived scheme, it could be said that, within that space, users try to organize themselves. In that way they reproduce the same organizational structures that they have in their social life. If we are able to make predictions and prescriptions about the behaviour of human beings in specific environments, we might be able to do the same within the frame of the World Wide Web. This is a task aimed at the future and basically teleological. There the mission of prescription is highlighted, as Herbert Simon says: our practical concern in planning for the future is what we must do now to bring that future about. We use our future goals to detect what may be irreversible present actions that we must avoid, and to disclose gaps in our knowledge … Our decisions today require us to know our goals, but not the exact path along which we will reach them.34
Nevertheless, for a system to develop by exploring new possibilities, some margin for creativity has to be permitted. This happens with the new informative systems, above all the web. Ralph Stacey points out that the science of complexity demonstrates that for a system to be innovative, creative and changeable it must be driven far from equilibrium where it can make use of disorder, irregularity, and difference as essential elements in the process of change. The state a system is in now is the result of every detail of its history, and what it will become can only be known if one knows every detail of its future development, and the only way one can do that is to let the development occur. The future of such a system is open and hence unknowable until it occurs.35
To sum up, information science has to tackle the complexity of dynamic systems. There is an "inner" dynamic of the system, which seeks to increase the quantity and quality of information processes; and there is an "outer" dynamic, which corresponds to the interaction with the users of information and knowledge, agents who are placed in a changing social environment. Therefore, there are difficulties for prediction that have to be dealt with by this science of design. On the horizon there are three types of uncertainty—substantive, strategic and institutional—that generate problems for anticipating a possible future. At the
33 Cf. Herbert A. Simon, Administrative Behavior. 4th Ed., New York: The Free Press 1997, p. 112.
34 Herbert A. Simon, "Prediction and Prescription in Systems Modelling", p. 11.
35 Ralph Stacey, "The Science of Complexity: An Alternative Perspective for Strategic Change Processes", Strategic Management Journal 16, 1995, pp. 477-495. Reprinted in Robert MacIntosh, Donald MacLean, Ralph Stacey and Douglas Griffin, Complexity and Organization. Readings and Conversations, p. 92.
same time, prescription is necessary to channel the "inner" processes of acquiring and retrieving information and to respond to the users' demands. The exponential growth of the web is an example of all this.
Faculty of Humanities University of A Coruña Dr. Vazquez Cabrera, w/n; 15403 Ferrol Spain [email protected]
Team D Philosophy of the Physical Sciences
JEREMY BUTTERFIELD

AGAINST POINTILLISME: A CALL TO ARMS
ABSTRACT

This paper forms part of a wider campaign: to deny pointillisme. That is the doctrine that a physical theory's fundamental quantities are defined at points of space or of spacetime, and represent intrinsic properties of such points or point-sized objects located there; so that properties of spatial or spatiotemporal regions and their material contents are determined by the point-by-point facts. Elsewhere, I argued against pointillisme about chrono-geometry, and about velocity in classical mechanics. In both cases, attention focussed on temporal extrinsicality: i.e. on what an ascription of a property implies about other times. Therefore, I also discussed the metaphysical debate whether persistence should be understood as endurance or perdurance. In this paper, I focus instead on spatial extrinsicality: i.e. on what an ascription of a property implies about other places. The main idea will be that the classical mechanics of continuous media (solids or fluids) involves a good deal of spatial extrinsicality—which seems not to have been noticed by philosophers, even those who have no inclination to pointillisme. I begin by describing my wider campaign. Then I present some elementary aspects of stress, strain and elasticity—emphasising the kinds of spatial extrinsicality they each involve. I conduct the discussion entirely in the context of "Newtonian" ideas about space and time. But my arguments carry over to relativistic physics.
1 INTRODUCTION

This paper forms part of a wider campaign: to deny pointillisme. That is the doctrine that a physical theory's fundamental quantities are defined at points of space or of spacetime, and represent intrinsic properties of such points or point-sized objects located there; so that properties of spatial or spatiotemporal regions and their material contents are determined by the point-by-point facts. I will first describe this wider campaign (Section 2). The broad picture is that we should distinguish between temporal extrinsicality, i.e. what an ascription of a property implies about other times, and spatial extrinsicality, i.e. what an ascription of a property implies about other places. Elsewhere, I have proclaimed the
former (2004, 2005, 2006a, 2006b). That is, I argued that there is more temporal extrinsicality than the pointilliste admits. This led me to the metaphysical debate whether the persistence of objects over time should be understood as endurance or as perdurance; (and in particular to assessing the so-called ‘rotating discs argument’ (2006a)). But here, I will proclaim spatial extrinsicality: the way that an ascription “reaches across space” like extended arms—hence my punning title. My main point will be that classical continuum mechanics gives us many rich examples of spatial extrinsicality: which seem never to have been noticed in analytic metaphysics, even though the relevant physics goes back to Euler. (I will confine myself to the “Newtonian” conception of space and time, as applied to continuum mechanics. This restriction keeps things simple: and at no cost, since both the debate and my arguments carry over to relativistic physics, once one there adopts a space vs. time distinction.) This paper also forms part of a yet wider and more positive topic, which is of interest to philosophers who feel no attraction for pointillisme. Namely: the philosophical richness of classical mechanics, especially continuum mechanics. This went without saying for the natural philosophers—later divided into physicists and philosophers of nature!—from the time of Newton to about 1910. Thereafter, the conceptual aspects, indeed problems, of the revolutionary quantum and relativity theories grabbed centre-stage in the philosophy of nature. But the richness of classical mechanics remains, as emphasized in recent decades by such authors as Truesdell, in mechanics and its history, and Mark Wilson in philosophy (1998, 2006, Chapters 4 to 6). I have no space here, even for a list of topics; but my (2004, Section 2; 2004a, Section 2; 2004b, Section 3f.) give some discussion and references. Here, I just note that even as regards pointillisme, this paper is more positive than its predecessors, in two ways. The second is more important, and deserves a separate Subsection. First, one main focus of the earlier papers was whether properties of a point (or point-sized object) that are represented by vectors, tensors, connections etc. can be intrinsic to it. Typically, pointillistes try to reconcile pointillisme with the fact that such properties certainly seem to be extrinsic. So in those papers, I argued that their proposed reconciliations fail. (One main aspect was that their reconciliations involved heterodox construals of the properties. So besides criticizing the reconciliations on their own terms, I urged that once pointillisme was rejected, there was less motivation for heterodoxy.) One really does need extrinsic properties of points, both for chrono-geometry (my 2006) and for the concept of velocity in mechanics (2006b). But in this paper I have no pointilliste authors as targets, for the simple reason that the spatial extrinsicality at work in continuum mechanics has not been noticed by metaphysics—and so no one has proposed a pointilliste reconciliation that I need to rebut.
1.1 Five kinds of spatial extrinsicality We will see that continuum mechanics exhibits kinds of extrinsicality which are more varied and richer than the temporal extrinsicality which was centre-stage in the earlier papers. There, temporal extrinsicality related to two main topics: (i) implications about the existence of the same particle at different times, and so the endurance-perdurance debate (especially 2004, 2005, 2006a); and (ii) the idea of a spectrum of implications, arising from the sequence of successively higher time-derivatives of position (2006a, Section 4.5; 2006b, Sections 3.2, 4.3). Here in this paper, spatial extrinsicality will not relate to (i). There will of course be an analogue of (ii): a spectrum of extrinsicality due to the sequence of successively higher spatial derivatives. Indeed, there is “more extrinsicality”, since one has to consider derivatives not just of position, but also of many other quantities. But more important: we need to recognize that continuum mechanics also involves several other kinds of spatial extrinsicality. This paper will be mostly concerned with a kind which, like (ii), involves a spectrum. This spectrum arises, not from higher derivatives, but from geometric structures such as lines and planes through the spatial point in question (and the angles between them). Taking successively more lines and-or planes through the point yields successively higherorder tensors: another sense of “more extrinsicality”. Sections 3, 4 and 5 will give examples. (For simplicity, I will consider only rectilinear coordinates: I just note that curvilinear coordinates would give another source of extrinsicality.) Space prevents my going into details about continuum mechanics’ other kinds of spatial extrinsicality. So let me just list, and then set aside, the following three. (1) “Measure”: In treating quantities like mass and charge, continuum mechanics takes a leaf from modern measure theory (cf e.g. Truesdell 1991, pp. 16-19, 92-94). It attributes a quantity like mass primarily to finite volumes, and takes mass density as a derived quantity, defined by a limiting process. This means the mass-density at a spatial point is spatially extrinsic; (even though it is a scalar quantity—which some discussions wrongly take to imply intrinsicality: cf. Section 2). Besides, such densities are only defined almost everywhere, in the measure-theoretic sense: making the spatial extrinsicality subtle (cf. my 2006, Sections 3.3.2.B and 4.6). (2) “Metric”: I have also set aside relativity’s description of continuous matter (or fields). Here, the idea of mass is replaced by a energy-momentum tensor whose definition depends on the spacetime metric in various subtle ways. This suggests that mass-energy, momentum, stress etc. are best seen as relations of the matter (or field) to spacetime structure. Lehmkuhl (2010) is an excellent discussion of this sort of extrinsicality. (3) “Topology”: Facts about the global topology of space, or spacetime, or of configuration space, can induce another, perhaps more striking, kind of extrinsicality. For it is natural to respond to the kinds already listed by saying that, though they no doubt refute pointillisme beyond repair, surely they allow some doctrine of “localism”, that takes the bearers of a physical theory’s fundamental quantities
to be, not pointillisme’s points (or point-sized bits of matter), but rather: arbitrarily small spatial or spacetime regions (or correspondingly small bits of matter). Thus localism might be defined as claiming that the state assigned by a physical theory to (the systems within) a spatial or spacetime region R is determined by (supervenient upon) the states assigned to the elements of a covering of R consisting of arbitrarily small open sets. And this looks to be compatible with the kinds of extrinsicality so far listed. (In particular, my own earlier proclamation of temporal extrinsicality proposed such a localism as appropriate for an anti-pointilliste defence of perdurance as an account of persistence: 2006a, Section 4.) But in electromagnetism, global topological facts can lead to phenomena (most famously, the Aharonov-Bohm effect) which seem best interpreted in terms of holonomies (values of integrals around loops), thus violating localism. (Myrvold (2010) is an excellent discussion: what I have dubbed ‘localism’, he calls ‘patchy separability’.)1 To sum up: continuum classical mechanics involves at least five kinds of spatial extrinsicality. Namely: the three kinds (1) to (3) just listed, and the two I first mentioned: (4) “Derivatives”: The spectrum of extrinsicality due to successively higher spatial derivatives; (5) “Orders”: The spectrum of extrinsicality due to successively higher order tensors.2 This paper will focus on (5). Indeed, most of my points will flow from the fact that continuum mechanics embraces spatial extrinsicality due to surface forces, acting at a point, with respect to a surface through that point. In Section 3, this idea will be encapsulated in the stress tensor. (In short, stress is a codification of the forces acting at a point.) Besides, by considering successively higher order tensors, we get a spectrum of extrinsicality. This is illustrated in Section 5 by elasticity. Elasticity relates stress to strain: which, in short, is a codification of how a 1
You might object that electromagnetism is a different theory than continuum mechanics, so that presumably the latter is free of this kind of extrinsicality. I think this objection is valid only in letter, not in spirit. Without going into details, I would say, as regards 'the letter': though mechanics does invoke holonomies in explanations, I concede that, at least so far as I know, holonomies are not interpretatively essential to mechanics, in the way they seem to be for electromagnetism. But as to 'the spirit': continuum mechanics is not a conceptually closed theory, since the treatment of many topics, such as a body's restorative forces or its expansion under heat, inevitably leads to other theories of classical physics, such as thermodynamics and electromagnetism—as the Chapter titles of any of the more comprehensive treatises attest. So when interpreting continuum mechanics, it is artificial—indeed, merely an unconscious reflection of a common pedagogic restriction—to set aside these adjacent theories. Indeed the holonomy interpretation of the classical electromagnetic field is similarly based on considering adjacent theories; in that case, the quantum mechanics of a particle in such a field. For more discussion, cf. e.g. my (2004, Section 2.2), and Belot (1998, Section 5).
2 Although I here have no space to relate these kinds to each other, it is clear that this would be a good project.
continuous body is distorted at the point. So as a preliminary to discussing elasticity, I need to discuss strain, in particular the strain tensor. I do this in Section 4: where we see yet another philosophical aspect of classical mechanics—that it treats the identity of particles as primitive in the sense of distinguishing a configuration and its permuted image, even if the permutation shifts only indistinguishable particles.
2 THE WIDER CAMPAIGN AGAINST POINTILLISME

My earlier papers described this wider campaign, in particular how it relates to the debate whether persistence should be understood as endurance or perdurance. So here I shall just say, and in part repeat, what little is needed for later Sections.

2.1 Avoiding controversy about the intrinsic-extrinsic distinction

My campaign does not need to take sides in the ongoing controversy about how to analyse, indeed understand, the intrinsic-extrinsic distinction. In this paper, I can make do with a much clearer distinction, between a species of extrinsic property that Lewis (1983, p. 114) dubbed 'positive extrinsic', and the rest. The positive extrinsic properties are those that imply accompaniment, where something is accompanied iff it coexists with some wholly distinct contingent object. So the negation of this is: properties that are compatible with being unaccompanied, i.e. being the only contingent object in the universe (for short: being lonely). Lewis pointed out (against a proposal of Kim's) that, since loneliness is itself obviously extrinsic, this distinction is not the same as the intrinsic-extrinsic distinction; though 'almost any extrinsic property that a sensible person would ever mention is positive extrinsic' (1983, p. 115). But this distinction is enough for me. That is, I can take pointillisme to advocate properties that are intrinsic in the weak sense of being not positively extrinsic (i.e. being compatible with loneliness). So this makes my campaign's claims, i.e. my denial of pointillisme, logically stronger; and so I hope more interesting. Anyway, my campaign makes some novel proposals about positive extrinsicality. As we saw in Section 1, I distinguish temporal and spatial (positive) extrinsicality; and in this paper, I will propose degrees of (positive) spatial extrinsicality.

2.1.1 Distinction from three mathematical distinctions

Both (a) the murky intrinsic-extrinsic distinction, and (b) the clearer distinction between positive extrinsics and the rest, are different distinctions from three that are made within mathematics and physics, especially in those parts relevant to us: viz. pure and applied differential geometry. The first of these distinctions goes by the name 'intrinsic'/'extrinsic'; the second is called 'scalar'/'non-scalar', and
the third is called ‘local’/‘non-local’. Though my previous papers used only the third distinction, this paper will use the second and third. These distinctions are as follows. (i): The use of ‘intrinsic’ in differential geometry is a use which is common across all of mathematics: a feature is intrinsic to a mathematical object if it is determined (defined) by just the object as given, without appeal to anything extraneous—in particular a coordinate system, or an embedding of the object into another. For example, we say that the intrinsic geometry of a cylinder is flat; it is only as embedded in IR3 that it is curved. (ii): Differential geometry classifies quantities according to how they transform between coordinate systems: the simplest case being scalars which have the same value in all coordinate systems. We will not need the algebraic details of how the other cases—vectors, tensors, connections etc.—transform. But we will need some basic ideas about the geometric ideas that underly the transformation rules. For these are all about choosing appropriate lines and planes through the point in question—e.g. three orthogonal axes to define a local coordinate system—by which to give numerical and so algebraic expression to the quantity (vector or tensor etc.). So the geometric ideas behind the transformation rules are the source of the spectrum of spatial extrinsicality announced in Section 1. Besides, according to (i)’s mathematical usage, the choice of lines and planes through a point is extrinsic to it. So here (i)’s usage of ‘extrinsic’, and my philosophical usage, mesh well; (though in general the philosophical distinctions, both (a) and (b), are different from the three mathematical ones). (iii): Differential geometry uses ‘local’ (as vs. ‘global’) in various ways. But the central use is that a mathematical object or structure is local if it is associated with a point by being determined (defined) by the structures defined on any neighbourhood, no matter how small, of the point. For example, the instantaneous velocity of a point-particle at a spacetime point, and all the higher derivatives of its velocity, are local since their existence and values are determined (in the usual way as a limit of quotients of differences) by the particle’s trajectory in an arbitrarily small neighbourhood of the point. Similarly, for countless other examples of a derivative, or higher derivative, of a quantity at a point. This suggests that these derivatives represent an extrinsic property of the point, even if the original quantity was intrinsic to it. I agree. And my previous papers discussed: (i): how the sequence of derivatives suggests a spectrum of extrinsicality; and (ii): how the derivative’s being mathematically local makes the extrinsicality very mild: to take again the example of velocity, one might say that instantaneous velocity, acceleration and all higher derivatives, are ‘hardly extrinsic’.
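The second and third mathematical distinctions can be illustrated with a small Python sketch (an illustration only, not part of the paper; the temperature value, the vector and the trajectory are made up). Under a rotation of the coordinate axes, a scalar at a point keeps its value while the components of a vector at that point change; and an instantaneous velocity is 'local' in the differential-geometric sense, in that difference quotients over ever-smaller neighbourhoods of the instant suffice to determine it.

```python
import numpy as np

# Rotation of the coordinate axes by 30 degrees about the z-axis.
a = np.radians(30)
R = np.array([[np.cos(a), -np.sin(a), 0.0],
              [np.sin(a),  np.cos(a), 0.0],
              [0.0,        0.0,       1.0]])

temperature = 293.15            # a scalar quantity at a point (same value in every frame)
v = np.array([1.0, 2.0, 0.0])   # a vector quantity at the same point

print("scalar in old and new frame:", temperature, temperature)  # unchanged
print("vector components in old frame:", v)
print("vector components in new frame:", R.T @ v)                # components change

# 'Local': instantaneous velocity at t = 1 is the limit of difference
# quotients, so ever-smaller neighbourhoods of t = 1 determine it.
def position(t):                 # a made-up trajectory of a point-particle
    return np.array([t**2, np.sin(t), 0.0])

for h in (0.1, 0.01, 0.001):
    v_est = (position(1 + h) - position(1 - h)) / (2 * h)
    print(f"velocity estimate with neighbourhood half-width {h}: {v_est}")
```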
2.2 Classical mechanics is not pointilliste 2.2.1 Two versions of pointillisme I should distinguish a weaker and a stronger version of pointillisme. They differ by: (i) the weaker version taking ‘point’ in pointillisme to mean spatial point; and so denying spatial extrinsicality; (ii) the stronger version taking ‘point’ in pointillisme to mean spacetime point; and so denying temporal as well as spatial extrinsicality. Previous papers rebutted the stronger version; (this involved rebutting the rotating discs argument, so that one can understand persistence as perdurance). Here I rebut the weaker one. Thus I take ‘pointillisme as regards space’ to be, roughly, the doctrine that the instantaneous state of the world is fully described by all the intrinsic properties, at that time, of all spatial points and-or point-sized bits of matter. Here, ‘intrinsic’ means ‘spatially intrinsic’. That is, attributing such a property to an object carries no implications about spatially distant objects; but it can carry implications about objects (perhaps temporal parts of the given object) at other times. On the other hand: I take ‘pointillisme as regards spacetime’ to be, roughly, the doctrine that the history of the world is fully described by all the intrinsic properties of all the spacetime points and-or all the intrinsic properties at all the various times of point-sized bits of matter (either point-particles, or in a continuum). And here ‘intrinsic’ means just the negation of Lewis’ ‘positive extrinsic’. So it means ‘both spatially and temporally intrinsic’: attributing such a property carries no implications about objects at other places, or at other times. It is of course a delicate matter to relate either version of pointillisme to the content of specific physical theories. One naturally asks for example, how philosophers’ idea of an intrinsic property relates to the idea of a physical quantity. Here my tactic will be the obvious one: to formulate pointillisme as a doctrine relativized to (i.e. as a property of) a given physical theory. Thus I will take this paper’s target, pointillisme as regards space, to be a trio of claims about a physical theory, as follows. (a): the fundamental quantities of the theory are defined at points of space; (b): these quantities represent intrinsic properties of such points; (c): models of the theory—i.e. in physicists’ jargon, solutions of its equations, and in metaphysicians’ jargon, possible worlds according to the theory—are fully defined by a specification of the quantities’ values at all spatial points for all the various times. So, putting (a)-(c) together: the idea is that the theory’s models (or solutions or worlds) are something like conjunctions or mereological fusions of “ultralocal facts”, i.e. facts at points. Similarly, pointillisme as regards spacetime would be a trio of claims, but substituting ‘spacetime points’ for ‘spatial points’.
2.2.2 Mechanics violates pointillisme as regards space Though I have not made a systematic survey, there is no doubt that pointillisme, as regards either space or spacetime, is prominent in recent analytic metaphysics of nature, especially of neo-Humean stripe. The prime example is the metaphysical system of David Lewis, which is so impressive in its scope and detail: one of his main metaphysical theses, ‘Humean supervenience’ (1986, pp. ix-xvi; 1994), is a version of pointillisme as regards spacetime. More specifically, I think many metaphysicians believe that classical physics—or at least classical mechanics— satisfies pointillisme, at least as regards space; (though perhaps not as regards spacetime, because of the rotating discs argument). I agree that there are some considerations in favour of this belief, even the stronger belief about all of classical physics not just classical mechanics. Most of the quantities of most classical physical theories are defined at points of space or spacetime, with models being defined by those quantities’ values at the various points. Witness the endemic use of functions on space or spacetime, governed by differential equations. This supports (a) and (c) in pointillisme’s defining trio of claims. Besides, classical physics is free of various kinds of “holism”, and so anti-pointillisme, that are suggested by entanglement in quantum theory; (as is often remarked: e.g. Teller (1986), Healey (1991, p. 408f.), Esfeld (2001, Chapter 8)). I say ‘kinds’ because the basic fact of entanglement—that the pure state of a composite system (a “whole”) is not determined by the states of its components (“parts”)—manifests differently in different settings: e.g. (i) in the framedependent description of systems in relativity (Myrvold 2002, 2003, 2009), (ii) in heterodox versions of quantum theory such as pilot-wave theory (e.g. Holland 1993, Chapter 7). Nevertheless, this belief in pointillisme is false, even just for classical mechanics. The flaw lies in claim (b). I already mentioned in Section 1 five reasons why (b) is false—five kinds of spatial extrinsicality; (though one reason, viz. my (3), applied in the first instance to branches of classical physics other than mechanics). As I announced there, I will not develop all these reasons: I will just concentrate on one which, though very familiar in the mechanics textbooks, has (as far as I know) been wholly ignored in the philosophical literature. Namely, continuum mechanics embraces spatial extrinsicality due to surface forces, acting at a point, with respect to a surface through that point. We will first see this in Section 3’s discussion of the stress tensor. After that, by considering successively higher order tensors, we will get a spectrum of extrinsicality. I emphasize that all the formalism to follow was established by figures such as Euler and Cauchy between 1770 and 1825; though I shall report it in modern guise, in particular using vector calculus.1 So to sum up: my complaint against the widespread metaphysical view that classical mechanics is pointilliste as regards space is that, although pointillisme’s 1
For the history of vectors and vector calculus, cf. Crowe (1967). I stress again that this physics was not overturned by the advent of relativity: although I will ignore relativity, all my arguments apply equally to relativistic continuum mechanics.
claims (a) and (c) mostly hold good for classical mechanics, claim (b) is endemically false. Furthermore, one main reason why it is false was known already by 1825—but seems to have been wholly ignored in the philosophical literature.
3 STRESS: EXTRINSICALITY FROM A TENSOR'S ORDER
In all that follows, I must be brief about technicalities. So I will assume the ideas of tensor algebra and tensor calculus on IR3 , taken as our model of physical space. But I need only consider orthonormal bases and reference frames, and so can suppress the distinction between contravariance and covariance (upper and lower indices): I will use lower indices. More generally, I will follow the exposition, and level of rigour, of Gonzalo and Stuart’s1 excellent text (2008): where the details I must suppress, and much more, can be found. (The same material is treated with more rigour (e.g. about measure theory; cf. (1) in Section 1) in texts such as Gurtin (1981), Marsden and Hughes (1983) and Truesdell (1991).) I will also “let pointillisme off the hook” about the foundations of geometry. That is, I will allow that the pointilliste can invoke any ideas and results of geometry, in particular of tensor algebra and tensor calculus, as legitimately as the rest of us. But this allowance is a matter of not flogging a dead horse, rather than conviction: my (2006) argues that the pointilliste has grave trouble with geometry, in particular tensor calculus. From now on, the basic idea will be that a continuous body is placed in physical space IR3 by occupying a suitable set B ⊂ IR3 .2 B is called the placement or configuration of the body. I will not be precise about how to define ‘suitable set’. Suffice it to make two comments: the first technical, the second philosophical. (i): The main technical desideratum is that the meaning of ‘suitable’ should enable one to secure the invaluable integral theorems of vector calculus (such as the divergence and Stokes’ theorems); and so one usually requires B to consist of a finite number (usually one!) of open bounded disjoint components, and for the boundary ∂B to be piecewise smooth and orientable; (cf. e.g. Gonzalo and Stuart (2008, p. 54) or Kellogg (1967)). (ii): Unfortunately, most philosophical discussion of what ‘suitable’ should mean has been divorced from (i), and dominated by traditional verbal conundrums about whether two continuous bodies can touch each other: Smith (2007) is a fine attempt to connect the philosophical discussion to the physics. Building on this basic idea, this Section considers the forces exerted on the (material points constituting the) body at points x ∈ B. The next Section considers the distortion of the body.
1 Whom I thank for permission to reproduce Figures 1 to 3 below.
2 The Newtonian model of physical space is sometimes taken to be the affine space E3 ('E' for Euclid), rather than IR3. But for simplicity, I "keep the origin" and use IR3.
3.1 The need for surface forces

We begin with a type of force that pointillisme can accommodate: more precisely, can accommodate once we set aside the other kinds of spatial extrinsicality listed in Section 1! Namely, body forces. These are forces exerted on the body's material points at a distance, and so without regard to any contact between parts of the body, or between the body and its environment. The archetypal example is gravity. We model such a force with a continuous (or at least: suitably integrable) function on B. Thus the body force field, per unit volume, exerted by an external influence on B is a vector-valued function $b : B \to V$. (We will write V, rather than $\mathbb{R}^3$, for the (unique!) 3-dimensional real vector space, when it is not being interpreted as physical space.) So the resultant force on a sub-volume $\Omega \subset B$ is: $r_b(\Omega) = \int_\Omega b(x)\, dV_x$.
Similarly for the body force's influence on rotational motion, as encapsulated in the idea of torque. I shall not go into details about the need for this idea. Suffice it to say that continuum mechanics needs the idea of the torque on Ω about a point z, given by: $\tau_b(\Omega) = \int_\Omega (x - z) \times b(x)\, dV_x$. And under our present (charitable!) assumptions, pointillisme can no doubt accommodate the torque $\tau_b(\Omega)$ just as well as $r_b(\Omega)$.
But continuum mechanics also needs the idea of a surface force. Physically, this is a force that arises by contact; and its mathematical representation requires relativizing the force to the surface along which there is contact. This latter idea is fairly intuitive, when the surface in question is the body's boundary and the force is exerted on the body at the surface by the environment (or vice versa): this is called an 'external surface force'. But in order to describe how a body resists being broken up by forces that vary across its spatial extent, continuum mechanics also needs internal surface forces. Indeed, it needs to consider at any point x within B, with respect to an arbitrary surface Γ through x, the surface force exerted by the material on one side of Γ, on the material on the other side.
Nowadays, textbooks take this idea in their stride. That may seem fair enough, since after all, under small enough forces, a distorted body does not break up along any internal surface; so presumably the description of the body will need some sort of balance of forces across every such surface. But prima facie, the idea is puzzling: how can a force exerted on an internal material point "pull on the point with respect to" a surface? And certainly, to consider all such surfaces through all internal points amounts to a major expansion of the framework for describing forces, compared to a body force's resultant force and torque, $\tau_b(\Omega)$ and $r_b(\Omega)$. So it is unsurprising that historically, it was a major struggle to articulate the need for this idea. (It is no doubt also unsurprising that the first person to realize this need, and the expanded framework's sufficiency for describing continua, was that monumental genius, Euler—in the 1770s: for details, cf. e.g. Casey (1992), Wilson 1998 (Section 6).)
History aside, the mathematical representation of internal surface forces is given by the Euler-Cauchy Cut Principle, as follows. Let us write $\hat{n} : \Gamma \to V$ for the unit normal field on an arbitrary oriented surface Γ. Then the Principle
postulates that the internal surface force per unit area on Γ is represented by the surface force field (also called traction field) $t_{\hat{n}} : \Gamma \to V$. As with body forces, we can define a resultant force on Γ by integration: it is $r_s(\Gamma) := \int_\Gamma t_{\hat{n}}(x)\, dA_x$. Here, the main point for us is that the subscript $\hat{n}$ indicates that the function's value for a point $x \in \Gamma$ depends on the surface Γ: the nature of this dependence will be our next topic, in Section 3.2.

3.2 The stress tensor

In this Section, a natural simplifying assumption about how the traction field $t_{\hat{n}}$ depends on the surface Γ will imply a simple and powerful representation of the field: viz. the stress tensor. This assumption is called 'Cauchy's postulate'. When taken together with two mild continuity assumptions, it implies an analogue, for the traction field, of Newton's Third Law (that action and reaction are equal and opposite). This is often called 'Cauchy's Lemma'. Then Cauchy's Lemma yields (with no further assumptions) Cauchy's Theorem: that the traction field at any point is encoded by a second-order tensor, the (Cauchy) stress tensor. This is, essentially, a 3 × 3 matrix with the usual elementary rules for transforming under a change of basis vectors.
Cauchy's postulate says that the traction field $t_{\hat{n}}$ depends on Γ only via its unit normal $\hat{n}$ at the argument-point $x \in \Gamma$. That is: writing N for the unit sphere in $V \equiv \mathbb{R}^3$, there is a function $t : N \times B \to V$, called the traction function for B, such that:
\[ t_{\hat{n}}(x) = t(\hat{n}(x), x). \tag{3.1} \]
Thus surfaces Γ1, Γ2, ... that are tangent to each other, with the same normal, at x must have the same traction vector at x. Cf. Fig. 1.
Figure 1: Cauchy’s postulate: surfaces through a point x with normal n at x share the same traction vector t at x. This is the main assumption required for Cauchy’s Lemma.
Cauchy's Lemma: Assume (i) that $t(\hat{n}, x)$ is continuous, and (ii) that the resultant surface force on a body, divided by the surface area, goes to zero as the body's volume tends to zero, i.e. that
\[ \frac{1}{\mathrm{area}(\partial\Omega)} \int_{\partial\Omega} t(\hat{n}(x), x)\, dA_x \to 0, \quad \text{as } \mathrm{vol}(\Omega) \to 0. \tag{3.2} \]
It follows that: $t(-\hat{n}, x) = -t(\hat{n}, x)$. That is: the traction field obeys a law of action and reaction: at each point x, the traction exerted by material on one side of a surface Γ through x upon the material on the other side is equal and opposite to the traction exerted by the latter material on the former.
This Lemma now yields Cauchy's Theorem, which says that t is given by a second-order tensor, i.e. a linear map on V. (I will denote the set of such linear maps $V \to V$ by End(V).)
Cauchy's Theorem: Under these conditions, $t(\hat{n}, x)$ is linear in $\hat{n}$. That is: there is at x a second-order tensor S(x), i.e. a linear map $S(x) : V \to V$, such that $t(\hat{n}, x) = S(x)\hat{n}$.
Let me put this less abstractly, in terms of components. Recall that the components of a second-order tensor $S : V \to V$, in any orthonormal frame (i.e. trio of orthogonal unit vectors) $(e_1, e_2, e_3)$, are defined to be the nine numbers $S_{ij} := e_i \cdot S(e_j)$. One readily checks that Cauchy's Theorem, i.e. $t(\hat{n}, x) = S(x)\hat{n}$, implies that the nine components of S(x) are the components (in the elementary sense for vectors) of the three traction vectors $t(e_j, x)$ for the three coordinate planes through x. The ideas underlying this check are that: (i) a second-order tensor encodes the idea of extracting a real number as a bilinear function of two vectors; and (ii) a vector in $V \equiv \mathbb{R}^3$ is normal to a plane; in particular, each basis vector defines a coordinate plane to which it is normal: and putting (i) and (ii) together, we infer that (iii) the components of three vectors, each defined with respect to one of three mutually perpendicular planes, can be represented by a second-order tensor. Cf. Fig. 2.
The Cauchy stress tensor S is fundamental to all of continuum mechanics; and we will return to it in Section 5. There we will build on this Section's idea that a tensor encodes how a physical quantity can be relative to the lines or planes through a point; and that this implies spatial extrinsicality. More specifically: we will see how going to successively higher-order tensors gives a spectrum of spatial extrinsicality. This will be illustrated by elasticity. But to understand that, we first need the idea of strain.
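To make Cauchy's Theorem concrete, here is a small illustrative sketch in Python (not from the paper; the numerical stress components are made up). It computes the traction vector t(n̂, x) = S(x)n̂ at a single point for two different unit normals, showing how the surface force at the point is relative to the plane through the point that one considers; this is the kind of spatial extrinsicality at issue.

```python
import numpy as np

# A hypothetical Cauchy stress tensor S(x) at one point x, in an orthonormal
# frame (e1, e2, e3); units are arbitrary. It is taken symmetric, as the
# standard Cauchy stress is.
S = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 4.0]])

def traction(S, n):
    """Cauchy's Theorem: the traction vector on a surface with unit normal n
    is the linear map S applied to n, i.e. t = S n."""
    n = np.asarray(n, dtype=float)
    n = n / np.linalg.norm(n)          # make sure n is a unit vector
    return S @ n

# Two different planes through the same point: different normals give
# different traction vectors.
n1 = [1.0, 0.0, 0.0]                   # coordinate plane normal to e1
n2 = [1.0, 1.0, 1.0]                   # an oblique plane

print("t(n1) =", traction(S, n1))      # equals the first column of S
print("t(n2) =", traction(S, n2))
```

The same point thus yields different tractions for different normals; nothing in the sketch depends on the particular numbers chosen.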
4 STRAIN: HAECCEITIST KINEMATICS
I now turn from dynamics to kinematics: from forces, which cause a body to accelerate and-or deform, to the geometric description of motion and deformation. This
Section has two goals. In Section 4.1, I introduce deformation and related ideas, especially strain. I emphasize how the mathematics takes for granted the underlying identity of the body's material points: hence the title's mention of 'haecceitism'. In Section 4.2, I discuss strain. It will be clear that there are various measures of this. For simplicity, I discuss just one: the Cauchy-Green strain tensor.

4.1 The description of deformations

At any instant, a body occupies a (suitable) open subset B of $\mathbb{R}^3$. The set B does not by itself define the configuration of the body, since it does not specify which material point (infinitesimal point-mass) is at which point within B: think of how rotating a solid ball around its centre gives different configurations with the same occupied sphere.
Figure 2: The traction vectors for the coordinate planes with normals $e_1$, $e_2$ and $e_3$ at x.

But having noted this, we will not need to represent it in the formalism: we can take the placement of material points at spatial points as understood, and call B a configuration. This is to be compared with some other configuration B′. We write $X \in \mathbb{R}^3$, called material coordinates, for the location of a material point in B; and $x \in \mathbb{R}^3$, called spatial coordinates, for its location in B′. Thus we compare the configurations, and keep track of each material point, by x and X being functions of each other: x = x(X) and X = X(x). We call B the reference configuration, and B′ the deformed configuration; although, as in the ball example, B′ may have the same shape as (even be the same subset of $\mathbb{R}^3$ as) B. We focus on x being a function of X, rather than vice versa. So we define the deformation map $\phi : B \ni X \mapsto \phi(X) := x \in B'$. (This yields the obvious ancillary notion, viz. the displacement field $u : B \to V \equiv \mathbb{R}^3$ defined by $u(X) := \phi(X) - X \in V$. But we will not need this.) We assume the deformation map $\phi$ is 1-1, onto, of positive determinant, and smooth. Then we can expect to describe the local nature of a deformation using
the deformation gradient (a second-order tensor) F := ∇ϕ : X ∈ B ↦ ∇ϕ(X) ∈ End(V). In particular, we focus on the case where B is a small ball, often written as Ω, of radius α, and its image Ω′ = ϕ(Ω); and we consider the limit as α → 0. We use ‘strain’ as a general term to describe the limiting relative differences in the shapes of Ω and Ω′; and so we envisage various measures of strain defined in terms of F.
4.2 The Cauchy-Green strain tensor
One such measure is the Cauchy-Green strain tensor C. It is often useful because (as we shall see) it does not encode information about any overall rotation that there might be in the deformation from B to B′. C is defined by multiplying F (on the left) by its transpose Fᵀ. That is: C := FᵀF : B → End(V). To interpret this definition, consider the ball Ω of radius α around X ∈ B, and any two unit vectors ê and d̂, thus defining points Y = X + αê and Z = X + αd̂ on the surface of Ω. Let x, y, z denote the corresponding deformed points in Ω′, and let θ ≡ θ(ê, d̂) be the angle between the vectors v := y − x and w := z − x. Cf. Figure 3.
Figure 3: Interpreting the strain tensor C. Three points X, Y, Z in Ω are mapped to corresponding points x, y, z in Ω′. C quantifies the limiting changes in the relative position and orientation of the points.
Then it is readily shown that as α → 0:
\[
\frac{|y - x|}{|Y - X|} \;\to\; \sqrt{\hat e \cdot C \hat e} \;=:\; \lambda(\hat e), \qquad \frac{|z - x|}{|Z - X|} \;\to\; \sqrt{\hat d \cdot C \hat d} \;=:\; \lambda(\hat d); \quad \text{and} \tag{4.1}
\]
\[
\theta \;\equiv\; \cos^{-1}\!\left(\frac{v \cdot w}{|v||w|}\right) \;\to\; \cos^{-1}\!\left(\frac{\hat e \cdot C \hat d}{\lambda(\hat e)\,\lambda(\hat d)}\right) \;=:\; \cos^{-1}(\gamma(\hat e, \hat d)).
\]
Thus the limiting value λ(ê) is called the stretch in the direction ê at X. It is the limit of the ratio of deformed length to initial length for line segments that
initially lie along ê at X. And the angle γ(ê, d̂) is the limiting value of θ as Y and Z tend to X along ê and d̂. Denoting the angle between ê and d̂ by Θ(ê, d̂), the limiting change in angle between these line segments at X, i.e. the quantity γ(ê, d̂) − Θ(ê, d̂), is called the shear between ê and d̂ at X.
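A minimal numerical sketch (Python with numpy is assumed; the homogeneous deformation F = RU below is invented for illustration) shows the two properties just discussed: C = FᵀF discards any overall rotation in the deformation, and it yields the stretches and shears just defined.

```python
import numpy as np

# A hypothetical homogeneous deformation: a pure stretch U followed by a
# rigid rotation R, so the deformation gradient is F = R U.
theta = 0.3                                   # rotation angle (radians)
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
U = np.diag([1.2, 0.9, 1.0])                  # pure stretch along the axes
F = R @ U

# Cauchy-Green strain tensor C = F^T F; the rotation R drops out,
# since C = U^T R^T R U = U^2.
C = F.T @ F
assert np.allclose(C, U @ U)

# Stretch in a direction e_hat: lambda(e_hat) = sqrt(e_hat . C e_hat).
def stretch(e_hat):
    return np.sqrt(e_hat @ C @ e_hat)

e_hat = np.array([1.0, 0.0, 0.0])
d_hat = np.array([0.0, 1.0, 0.0])
print("stretch along e1:", stretch(e_hat))    # 1.2
print("stretch along e2:", stretch(d_hat))    # 0.9

# Shear between e_hat and d_hat: limiting angle gamma minus initial angle Theta.
gamma = np.arccos((e_hat @ C @ d_hat) / (stretch(e_hat) * stretch(d_hat)))
Theta = np.arccos(e_hat @ d_hat)
print("shear (radians):", gamma - Theta)      # 0 for this axis-aligned stretch
```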
5 ELASTICITY: FURTHER ALONG THE SPECTRUM OF EXTRINSICALITY
We reviewed in Section 3 how a second-order tensor at a point, such as the stress tensor, encodes the idea of extracting a real number as a bilinear function of two vectors there; or equivalently, of giving one vector as a function of another. The emphasis there was on how, since a vector defines (and is defined by) the plane to which it is normal, this corresponds to a function from planes through the point to vectors at it: for the stress tensor, the vector is the traction field (with respect to any surface tangent to the given plane).
I now turn to the way in which tensor algebra provides a spectrum of spatial extrinsicality, viz. by going to successively higher-order tensors, i.e. multilinear functions of successively more than two vectors. In particular: by combining the ideas of stress and strain, we can find physically important quantities that lie further along this spectrum. Thus the main quantitative relationships between stress and strain are given by a fourth-order tensor—with the appropriate name ‘elasticity’. Such a tensor corresponds to extracting a real number from a quadrilinear function of four vectors.
The details are as follows. The main physical idea is simply that stress arises from strain! That is: when we deform a body, putting it under strain, restorative forces arise, preventing it from falling apart (at least for small strains!). The details of exactly what stress arises depend of course on the constitution of the body. In particular, we define an elastic solid as one in which the stress at any point X ∈ B depends only on the present strain, as encoded by the deformation gradient F at X: not on its rate of change, nor on its past history. This represents a generalization of the familiar Hooke’s law, that the force (stress, tension) in a stretched spring is proportional to its extension (strain). Formally: for an elastic solid, the Cauchy stress tensor S is of the form
\[
S(X, t) = \hat S(F(X, t), X), \quad \forall X \in B \tag{5.1}
\]
where Ŝ : End(V) × B → End(V) is a function, called the stress response function, that depends on the body’s constitution; and t of course represents time.
Now we can already see why there is a physically important fourth-order tensor. Let us for simplicity consider a homogeneous elastic solid, for which Ŝ(F, X) is independent of its second argument X. So: S(X, t) = Ŝ(F(X, t)). Then we expect to use the tensor calculus (essentially: multi-variable calculus) to analyse how S depends on its argument F. So we consider the derivatives of the components of Ŝ with respect to the components of F. This gives a four-index quantity:
it is indeed a fourth-order tensor, the elasticity tensor, which I write as E:
\[
E_{ijkl} := \frac{\partial \hat S_{ij}}{\partial F_{kl}}. \tag{5.2}
\]
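A sketch of how such a fourth-order tensor can be computed in practice is given below (Python with numpy assumed). The stress response function Ŝ used here is an invented, simple quadratic-in-F response with made-up moduli lam and mu, chosen only for illustration; it is not offered as the constitutive law of any real material, and the derivatives of (5.2) are estimated by finite differences rather than in closed form.

```python
import numpy as np

# A hypothetical stress response function S_hat(F) for a homogeneous elastic
# solid; lam and mu are invented moduli, used only to illustrate the structure.
lam, mu = 1.0, 0.5
I = np.eye(3)

def S_hat(F):
    E_strain = 0.5 * (F.T @ F - I)            # Green-Lagrange-type strain measure
    return lam * np.trace(E_strain) * I + 2.0 * mu * E_strain

# Elasticity tensor E_ijkl = d S_hat_ij / d F_kl, estimated by central
# finite differences around a reference deformation gradient F0.
def elasticity_tensor(F0, h=1e-6):
    E = np.zeros((3, 3, 3, 3))
    for k in range(3):
        for l in range(3):
            dF = np.zeros((3, 3)); dF[k, l] = h
            E[:, :, k, l] = (S_hat(F0 + dF) - S_hat(F0 - dF)) / (2.0 * h)
    return E

F0 = np.eye(3)                                # undeformed reference state
E = elasticity_tensor(F0)
print("E_1111 =", E[0, 0, 0, 0])              # lam + 2*mu = 2.0
print("E_1122 =", E[0, 0, 1, 1])              # lam = 1.0
print("E_1212 =", E[0, 1, 0, 1])              # mu = 0.5
```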
6 CONCLUSION: YET MORE EXTRINSICALITY
It is clear that as regards physics, the discussion in Sections 3 to 5 has barely scratched the surface: there is much more to be said about continuum mechanics. The same goes, I submit, for philosophy. In particular, as regards pointillisme: for brevity, I have restricted my critique. I have set aside various kinds of, or sources of, extrinsicality—both physical and philosophical. I will not now lift these restrictions, nor even repeat the list of them from Section 1—why flog a dead horse? But as a final remark, I will mention that yet “more” extrinsicality ensues when contingent claims, in particular the laws of a theory, are allowed as implicit premises in the implications that make a property extrinsic. In other words: many properties that are intrinsic in philosophers’ (kind of) sense are nomically very extrinsic. That is: their ascriptions, when taken together with laws, will have many implications for states of affairs beyond the property’s instance. Let me mention two very striking examples, each with a reference. (1): The position, and even the existence, of a classical point-particle is nomically extrinsic, when we consider how classical particles are constituted, by decoherence, from the quantum state; (Butterfield (2006a, Section 5)). (2): The second example is purely classical, and concerns point-particles in a Newtonian world, interacting by instantaneous forces such as gravity; (the forces need not have an inverse-square dependence on distance). It is due to Schmidt (1997, 1998). He proves that a particle’s trajectory in spacetime, over a finite time-interval (no matter how short!), completely determines its trajectory throughout all time—regardless of how the other particles are moving. Agreed, the theorem needs some assumptions; but they are entirely reasonable.1 Thus the innovation here, in comparison with the usual idea of determinism for Newtonian gravity, is as follows. Usually one specifies the positions and momenta of all the point-particles across all of space at an instant. But here, one uses the idea that in a finite time-interval (no matter how short!), the trajectory of the particle one is considering encodes information about all the other particles’ influence on it: and then the theorem is that it in fact encodes enough information to determine the entire future and past trajectory. Let us transcribe this result into our jargon of nomic extrinsicality. That is, let us spell out its consequences for how much is implied about the rest of the universe
1 Namely: (i) the number of particles is finite; (ii) there is a distance less than which no pair of particles ever approach each other; (iii) there is a speed that no particle ever exceeds; (iv) like (iii): there is an acceleration (and so on for higher derivatives) that no particle ever exceeds. Cf. Schmidt (1997, p. 446), (1998, pp. 83-84).
by a statement of the history of the particle(s) in a spacetime region—when the implication is allowed to use the laws of motion and the reasonable assumptions as implicit premises, together with the particular history. The result is very striking. Namely: the history of a particle in a spacetime region of finite but arbitrarily small extent is nomically as extrinsic as one could imagine it to be. For when taken together with the laws of motion and reasonable assumptions, this small patch of history implies the particle’s entire history! Acknowledgements: I am very grateful to audiences in Cambridge, Melbourne, and at the ESF conference in Zeist, and to A. Caulton, W. Myrvold, M. Wilson and the editor, for helpful conversations and comments. I thank O. Gonzalo, A. Stuart and Cambridge University Press, for permission to reproduce Figures 1 to 3 from A First Course in Continuum Mechanics, copyright 2008.
REFERENCES
Belot, G. [1998]: ‘Understanding Electromagnetism’, British Journal for the Philosophy of Science 49, pp. 531-555.
Butterfield, J. [2004]: ‘On the Persistence of Homogeneous Matter’, available at: physics/0406021; and at http://philsci-archive.pitt.edu/archive/00002381/
Butterfield, J. [2004a]: ‘Between Laws and Models: some philosophical morals of Lagrangian mechanics’; available at: physics/0409030; philsci-archive.pitt.edu/archive/00001937/
Butterfield, J. [2004b]: ‘Aspects of modality in analytical mechanics’, in P. Weingartner and M. Stoeltzner (eds), Formale Teleologie und Kausalitat in der Physik, Mentis; pp. 160-198; available at: physics/0210081; philsci-archive.pitt.edu/archive/00001192
Butterfield, J. [2005]: ‘On the Persistence of Particles’, in Foundations of Physics, 35, pp. 233-269; available at: physics/0401112; philsci-archive.pitt.edu/archive/00001586/.
Butterfield, J. [2006]: ‘Against Pointillisme about Geometry’, in Time and History: Proceedings of the 28th Ludwig Wittgenstein Symposium, 2006, ed. F. Stadler and M. Stöltzner, Ontos Verlag, pp. 181-222; available at: http://philsci-archive.pitt.edu/archive/00002552/ or arxiv.org/abs/physics/0512063.
Butterfield, J. [2006a]: ‘The Rotating Discs Argument Defeated’, in British Journal for the Philosophy of Science 57, pp. 1-45; available at: http://philsci-archive.pitt.edu/archive/00002382/
Butterfield, J. [2006b]: ‘Against Pointillisme about Mechanics’, in British Journal for the Philosophy of Science 57, pp. 709-753; available at: http://philsci-archive.pitt.edu/archive/00002553/ or http://arxiv.org/abs/physics/0512064.
Casey, J. [1992]: ‘The Principle of Rigidification’, Archive for the History of the Exact Sciences 49, pp. 329-383.
Crowe, M. [1967]: A History of Vector Analysis, Notre Dame University Press; reprinted by Dover.
Esfeld, M. [2001]: Holism in Philosophy of Mind and Philosophy of Physics, Kluwer Academic.
Gonzalo, O. and Stuart, A. [2008]: A First Course in Continuum Mechanics, Cambridge University Press.
Gurtin, M. [1981]: An Introduction to Continuum Mechanics, Academic Press.
Healey, R. [1991]: ‘Holism and Nonseparability’, Journal of Philosophy 88, pp. 393-421.
Holland, P. [1993]: The Quantum Theory of Motion: an account of the de Broglie-Bohm causal interpretation of quantum mechanics, Cambridge University Press.
Kellogg, O. [1954]: Foundations of Potential Theory, Dover reprint of a 1929 original published by J. Springer.
Lehmkuhl, D. [2010]: ‘Mass-energy-momentum: only there because of spacetime?’, forthcoming in British Journal for the Philosophy of Science; available at: http://philsci-archive.pitt.edu/archive/00005137/
Lewis, D. [1983]: ‘Extrinsic Properties’, Philosophical Studies, 44, pp. 197-200; reprinted in Lewis [1999]; page references to reprint.
Lewis, D. [1986]: Philosophical Papers, volume II, New York NY: Oxford University Press.
Lewis, D. [1994]: ‘Humean Supervenience Debugged’, Mind, 103, pp. 473-490; reprinted in Lewis [1999], pp. 224-247.
Lewis, D. [1999]: Papers in Metaphysics and Epistemology, Cambridge University Press.
Marsden, J. and Hughes, T. [1983]: Mathematical Foundations of Continuum Mechanics, Prentice-Hall; reprinted 1994 by Dover.
Myrvold, W. [2002]: ‘On Peaceful Coexistence: Is the Collapse Postulate Incompatible with Relativity?’, Studies in History and Philosophy of Modern Physics 33, pp. 435-466.
Myrvold, W. [2003]: ‘Relativistic quantum becoming’, British Journal for the Philosophy of Science 53, pp. 475-500; available at: http://philsci-archive.pitt.edu/archive/00000569/
Myrvold, W. [2009]: ‘Chasing chimeras’, in British Journal for the Philosophy of Science 60, pp. 635-646; available at: http://philsci-archive.pitt.edu/archive/00004529/
Myrvold, W. [2010]: ‘Non-Separability, classical and quantum’, forthcoming in British Journal for the Philosophy of Science; available at: http://philsci-archive.pitt.edu/archive/00004859/
Schmidt, J. [1997]: ‘Classical Universes are perfectly predictable’, Studies in the History and Philosophy of Modern Physics 28B, pp. 433-460.
Schmidt, J. [1998]: ‘Predicting the motion of particles in Newtonian mechanics and special relativity’, Studies in the History and Philosophy of Modern Physics 29B, pp. 81-122.
Smith, S. [2007]: ‘Continuous bodies, impenetrability and contact interactions: the view from the applied mathematics of continuum mechanics’, British Journal for the Philosophy of Science 58, pp. 503-538.
Teller, P. [1986]: ‘Relational holism and quantum mechanics’, British Journal for the Philosophy of Science 37, pp. 71-81.
Truesdell, C. [1991]: A First Course in Rational Continuum Mechanics, volume 1, second edition, Academic.
Wilson, M. [1998]: ‘Classical Mechanics’; Entry in The Routledge Encyclopedia of Philosophy, ed. E. Craig, Routledge; pp. 251-259.
Wilson, M. [2006]: Wandering Significance: an essay on conceptual behaviour, Oxford University Press.
Trinity College Cambridge CB2 1TQ U.K. [email protected]
DENNIS DIEKS

THE GIBBS PARADOX REVISITED
ABSTRACT
The Gibbs paradox has frequently been interpreted as a sign that particles of the same kind are fundamentally indistinguishable; and that quantum mechanics, with its identical fermions and bosons, is indispensable for making sense of this. In this article we shall argue, on the contrary, that analysis of the paradox supports the idea that classical particles are always distinguishable. Perhaps surprisingly, this analysis extends to quantum mechanics: even according to quantum mechanics there can be distinguishable particles of the same kind. Our most important general conclusion will accordingly be that the universally accepted notion that quantum particles of the same kind are necessarily indistinguishable rests on a confusion about how particles are represented in quantum theory.
1 INTRODUCTION: THE GIBBS PARADOX
Imagine the following experimental set-up: a partition divides a container into two equal parts, each containing a different ideal gas—the amounts of gas, pressure and temperature being equal in the two halves of the container. Now, the partition is suddenly removed, so that the two gases start mixing via an irreversible process; eventually a macroscopic equilibrium situation is established. The uniform gas mixture that fills the container at the end of this process then possesses a higher entropy than the total entropy of the gas system we started with; the difference is the entropy of mixing. The magnitude of this entropy increase can be calculated by considering a reversible process that begins in the same unmixed situation and ends in the same final equilibrium state as the irreversible process we just described. The standard (theoretical) way of realizing this reversible mixing process makes use of semipermeable membranes: the partition between the two halves of the container is replaced (in thought) by two membranes, one only transparent to the gas on the left-hand side (gas A, say), the other only transparent to the other gas (B). These membranes can now be slowly shifted to the left and the right wall of the container, respectively, thereby letting gases A and B expand reversibly. During this process each of the expanding gases exerts a pressure P on the membrane that is opaque to it, so work is done. In order to keep the energy and the temperature at their original values a compensating amount of heat, △Q, should therefore be supplied to the gases, and this can be arranged by immersing the system in a heat bath.
The change of entropy resulting from this reversible process can be calculated via △S = ∫ dQ/T, with T the temperature. The added heat, dQ, should equal the work performed by the two gases, i.e. 2P dV. In this way we find:
\[
\triangle S = 2 \int P\, dV / T = 2 \int kN\, dV / V = 2kN \log 2, \tag{1}
\]
where we have used the ideal gas law P V = kN T, with N the number of atoms or molecules in each of the two gases and k Boltzmann’s constant.
This entropy of mixing is independent of the exact physical properties of gases A and B. The only thing that plays a role in the calculation and in the final result is that the two gases are different. This difference makes it possible to design—in principle—the semi-permeable membranes that are needed for the reversible mixing process. If the gases are the same no distinguishing membranes can exist and there is no mixing at all according to thermodynamics: from a thermodynamic point of view nothing happens when the partition is removed in this case. As a consequence, there is a discontinuity in the behavior of the entropy: any difference between the gases, however small, produces the same mixing entropy 2kN log 2, whereas there is no entropy of mixing if the gases are the same. The existence of this discontinuity is known as the Gibbs paradox.
Within the framework of thermodynamics the modern standard response [8] to the existence of this discontinuity is that nothing remarkable is happening here: In principle (although not in practice) it is always possible to design membranes that distinguish gases A and B as long as there is any difference between them at all. Because there is no conceptual difficulty in accepting a discontinuity between “A and B differ” and “A and B are equal”, it should not be considered paradoxical that there is a corresponding discontinuity between distinguishability-in-principle plus existence of a mixing entropy, and complete identity without such an entropy. Moreover, in practical situations the effectiveness of distinguishing between two gases will be restricted by technical limitations—this effectiveness will gradually diminish when the two gases become more similar. As a consequence, no discontinuity will be detectable in actual laboratory situations: the measured mixing entropy will there vanish continuously. It is only in the idealized situation of perfectly effective separation techniques that the discontinuity in the entropy will manifest itself—and as we have seen, in this case there is no conceptual problem.
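As a quick numerical check of equation (1), here is a minimal sketch (Python with numpy assumed; the particle number, temperature and volume are invented, and the temperature drops out of the result as it should): it integrates P dV/T for each gas as it expands reversibly from V to 2V.

```python
import numpy as np

k = 1.380649e-23        # Boltzmann's constant (J/K)
N = 1.0e22              # particles in each half of the container (illustrative)
T = 300.0               # temperature (K); cancels out of the entropy of mixing
V0 = 1.0e-3             # initial volume of each half (m^3)

# Reversible isothermal expansion of each gas from V0 to 2*V0:
# dS = dQ/T = P dV / T, with P = N k T / V from the ideal gas law.
V = np.linspace(V0, 2.0 * V0, 100001)
P = N * k * T / V
dS_one_gas = np.trapz(P / T, V)

dS_mixing = 2.0 * dS_one_gas
print("numerical :", dS_mixing)
print("analytic  :", 2.0 * k * N * np.log(2.0))   # 2 k N log 2
```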
2 THE GIBBS PARADOX IN STATISTICAL MECHANICS
The paradox can also be formulated in statistical mechanics. In statistical mechanics a counterpart to the thermodynamical entropy can be defined, namely the logarithm of the number of microstates W that are compatible with a given macrostate: S = k log W. When an ideal gas of N particles expands and doubles its volume, the number of available microstates X per particle doubles: each particle now obtains twice as much space available to it as it had before. This means
that W goes up, from X^N to (2X)^N, which corresponds to an entropy difference △S = kN log 2. When two different ideal gases mix, the statistical mechanical entropy of mixing is therefore 2kN log 2, exactly the value predicted by thermodynamics.
When two equal volumes of the same gas mix, the number of microstates available to an arbitrary individual particle still doubles, so the formula S = k log W gives us △S = 2kN log 2, as before. But now this result seems wrong, at least from the point of view of thermodynamics. As we have seen, the thermodynamical entropy of mixing vanishes in this case, because nothing happens when two identical gases mix. This then leads to a new form of the Gibbs paradox: the statistical mechanical entropy of mixing is insensitive to the question of whether the gases are equal or unequal, but this is in conflict with the discontinuity predicted by thermodynamics.
In the literature the following argument is often deployed in order to remove this discrepancy. “Since the molecules (or atoms) of a given species of gas are all qualitatively the same, permutations of these particles do not have any physical effect and do not lead to a new state; therefore, replace W by W/N! in the formula for the entropy.” As it turns out, this change in the way of counting the number of available microstates suffices to restore agreement with thermodynamics. Indeed, the division by N! makes the mixing entropy in the equal-gases-case disappear and leaves the other, correct results untouched. For example, doubling of the volume of a gas without a change in the number of particles gives us △S = k(log((2X)^N/N!) − log(X^N/N!)) = kN log 2, so the entropy of mixing keeps its appropriate value in the case of two different gases that mix. However, doubling the volume together with doubling N gives us, via Stirling’s formula, that the number of microstates goes from W to W²: W = X^N/N! → W′ = (2X)^{2N}/(2N)! = W². This implies, via S = k log W, that the entropy just doubles, without entropy of mixing, when two volumes of equal gases are combined.
The way in which the division by (2N)! in the equal-gases-case achieves the removal of the mixing entropy is that it throws away, in addition to the exchanges among left particles and right particles, also all permutations in which one or more particles coming from the left are exchanged with particles originating from the right side of the container. This now discarded number of permutations yields a factor M = (2N)!/(N!N!) in the number of microstates, which via △S = k log M corresponds exactly to the value of the entropy of mixing. In other words, there no longer can be any mixing, nor any entropy of mixing, because the exchange of a particle from the left with one from the right is defined away, as not giving rise to a new physical situation.
The division by N! thus restores the situation as we know it from thermodynamics: there is a finite and constant entropy of mixing in the case of different gases, however small the difference between the gases may be, and there is no such entropy in the case of equal gases. However, this result is achieved at the price of defining away differences that obviously are physically real, at least from the point of view of classical physics.
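The effect of the division by N! can be checked numerically with exact log-factorials (math.lgamma from the Python standard library); the per-particle microstate count X and the particle number N below are invented, and entropies are reported in units of k. The sketch also confirms that the discarded permutation factor M = (2N)!/(N!N!) carries (essentially) the mixing entropy.

```python
import math

def log_factorial(n):
    return math.lgamma(n + 1)      # exact log(n!) without overflow

N = 10_000          # particles per half (illustrative)
logX = 5.0          # log of the per-particle microstate count X (illustrative)

# Without the N! correction: doubling the volume doubles X for each of the
# 2N particles, giving a "mixing entropy" of 2 N log 2 even for equal gases.
dS_uncorrected = 2 * N * (logX + math.log(2)) - 2 * N * logX
print("uncorrected Delta S / k :", dS_uncorrected, "= 2 N log 2 =", 2 * N * math.log(2))

# With the N! correction: W = X^N / N!  ->  W' = (2X)^(2N) / (2N)!
logW_before = 2 * (N * logX - log_factorial(N))           # two separate gases
logW_after = 2 * N * (logX + math.log(2)) - log_factorial(2 * N)
print("corrected Delta S / k   :", logW_after - logW_before)   # ~ 0, up to O(log N)

# The permutations discarded by the (2N)! correction number M = (2N)!/(N! N!),
# and log M is, up to O(log N), exactly the mixing entropy 2 N log 2.
logM = log_factorial(2 * N) - 2 * log_factorial(N)
print("log M                   :", logM, "~", 2 * N * math.log(2))
```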
Indeed, classical particles are the example par excellence of distinguishable individuals: no two classical particles can be in exactly the same physical state because they will at any instant at least occupy different spatial positions, by virtue of their impenetrability. Moreover, classical particles follow continuous and nonintersecting trajectories, so that they “remember” where they came from. Their individuality and distinguishability are thus preserved over time, with the consequence that it makes a physical difference, in principle, whether a given particle in our gas container originates from the left or from the right. So the above resolution of the Gibbs paradox in statistical mechanics, relying as it does on an assumed identity of states that follow from each other by particle permutations, is in conflict with basic features of classical mechanics. It is for this reason that quantum mechanics is often invoked: in quantum mechanics particles are indistinguishable as a matter of principle, so that particle exchanges really do not alter the physical state—at least, that is the conventional wisdom. However, as we shall show in a moment, the starting point of the whole argument, namely that there should be no statistical mechanical entropy of mixing in the case of two gases of the same kind, is shaky. Although on the macroscopic, thermodynamical level the absence of an effect of mixing two equal gases is certainly a justified assumption, this does not mean that there are no effects if microscopic details are taken into account.
3 THE STATISTICAL MECHANICAL ENTROPY OF MIXING
We are now going to argue in the context of statistical mechanics, and this induces us to consider a slight extension of the tools that are used in thermodynamical thought experiments. In statistical mechanics the motivating idea is to take into account how gases are built up from their atoms or molecules, and this makes it natural to consider a variation on the reversible mixing process explained in section 1. In section 1 an essential role was played by semi-permeable membranes that were only transparent to one of the two gases (in the case of the mixing of different gases). In the context of thermodynamics this means that the membranes are sensitive to chemical differences between the gases. In statistical mechanics we have the opportunity to generalize this and to consider membranes that are also sensitive to microscopic particle details. In this spirit we now introduce a new type of semi-permeable membrane: one that is transparent to particles originating on the right-hand side of the container and opaque to particles coming from the left-hand half (or vice versa). According to classical physics such membranes are possible in principle, as is clear from what was said above about particle properties in classical mechanics: particles carry the data about their origin with them, in their position and momentum values, and this can (in principle) be used to determine whether the membrane should stop them or not. Figuratively speaking, think of submicroscopic computers built into the membrane that perform an ultra-rapid calculation each time a particle hits them, to see where it came from; or the proverbial demon with superhuman calculational powers who stops or lets pass particles
depending on their origin. In general, of course, allowing expedients of this kind may upset thermodynamical principles, in particular the second law of thermodynamics. But in the thought experiment we propose here we make a restricted use of these unusual membranes. The idea is merely to employ them for the purpose of demonstrating that if gases are mixed and unmixed by selection on the basis of past particle trajectories and origins, as should be possible according to classical mechanics, this leads to the emergence of an entropy of mixing. Indeed, if we use semi-permeable membranes designed in the way just described, and employ them exactly as our earlier membranes but now in the case of two gases of the same kind, we find just as before that a pressure is exerted on the membranes by the particles to which they are not transparent. Copying the reasoning from section 1, we can conclude that this leads to the presence of an entropy of mixing with the value 2kN log 2. In other words, if the submicroscopic particle picture of statistical mechanics is taken completely seriously, the original formula S = k log W , without the ad-hoc division by N !, gives us correct results. In principle then, on the microscopic level of description the mixing entropy always exists according to classical mechanics, even in the case of equal gases. In principle, classical particles can always be distinguished on the basis of their trajectories and there is a corresponding mixing entropy that can be measured by using semi-permeable membrane-like devices of the kind we have described. Classical atoms and molecules are distinguishable individual entities, and we can conclude that analysis of the Gibbs paradox supports rather than undermines this general feature of classical physics. Of course, microscopic distinguishability cannot show up if we confine ourselves to using macroscopic separation techniques of the kind considered in thermodynamics. But this is a practical matter that should not be confused with an argument for fundamental indistinguishability on the particle level.
4 A NEW DILEMMA: THE GIBBS PARADOX IN QUANTUM MECHANICS
But now we are facing a new paradox. In quantum mechanics the “identity of indistinguishable particles” has long been recognized as a basic principle, given concrete form by the (anti-)symmetrization postulates. These postulates stipulate that in the case of particles of the same kind permutations of particle indices leave a many-particle state either invariant (the case of bosons) or change its sign (the case of fermions); in either case there are no measurable physical differences associated with the permutations. These symmetrization postulates possess a law-like validity, so they hold regardless of the peculiarities of the situation that is considered. Therefore, from the quantum mechanical point of view division by N! seems completely justified and even mandatory when the number of microstates has to be determined. Application of S = k log W then seems to tell us that as a matter of principle there can be no entropy of mixing when two gases of the same
kind mix—as we have seen above, the division by N! leads immediately to this conclusion.
This is a paradox. A treatment of the mixing of gases by means of quantum mechanics should obviously reproduce the results of a classical calculation in the classical limiting situation, so it should be able to yield the value 2kN log 2 in the case in which we follow individual particle trajectories, as described in the previous section. But it now seems that according to quantum mechanics this is impossible in principle!
To put the difficulty into perspective, consider a situation that is possible according to quantum mechanics and at the same time can be dealt with by classical mechanics. Suppose that the one-particle states occurring in the quantum mechanical many-particles wavefunction of our ideal gas system do not overlap spatially, and that this remains true for a substantive time interval. Well-known results (in particular Ehrenfest’s theorem, to which we shall return in section 5) assure us that in this case the spatially isolated one-particle quantum wave packets behave exactly like classical particles. In fact, what we have here is the quantum mechanical description of a diluted ideal gas, and this description is virtually identical to what classical theory tells us: small one-particle wave packets take the place of classical particles, are subject to the same dynamical principles, and follow the same trajectories. This is a typical classical limit situation, in which the predictions made by quantum mechanics should parallel those of classical mechanics. In particular, in the experiment with the unusual membranes of section 3 we should expect that quantum mechanics gives us the result we derived there, namely the existence, in principle, of an entropy of mixing with the value 2kN log 2. In the limiting situation everything goes, according to quantum mechanics, as in the classical case and the earlier derivations can be repeated step by step.
Apparently then, the quantum mechanical symmetrization postulates are not decisive for the question of whether or not particles are distinguishable in quantum mechanics! In the diluted gas situation that we just discussed quantum particles are as distinguishable as classical particles: they occupy different spatial positions and follow continuous and non-intersecting trajectories. In this case it is clear that real physical differences correspond to different trajectories and different particle origins, and the existence of an entropy of mixing testifies to this: the differences in question give rise to empirical effects. Nevertheless and paradoxically, in this very same situation the symmetrization postulates are fully respected.
The situation becomes clearer when we consider a concrete case, namely a two-particle quantum system in which the one-particle wave functions do not overlap spatially, like in the diluted gas just discussed. Take as the quantum state of the system
\[
|\Psi\rangle = \frac{1}{\sqrt{2}}\left(|\phi_1\rangle|\psi_2\rangle + |\psi_1\rangle|\phi_2\rangle\right), \tag{2}
\]
with |ϕ⟩ and |ψ⟩ representing two non-overlapping wave packets. This state is symmetrical: exchange of the indices 1 and 2 leaves the state invariant, and there is thus no physical difference associated with the distinction between these indices.
Still, this symmetrical state represents a situation in which there are two quasi-classical objects, one at the position of the wave packet represented by |ϕ⟩ and one at the position of |ψ⟩. These wave packets and the objects they represent clearly are distinguishable, and they are the things that are relevant for the classical limit. As was illustrated above for the diluted gases case, these spatially non-overlapping wave packets take the role of particles in the classical limit.
Summarizing, although it is true that because of the symmetrization each index in an N-particles quantum state of particles of the same kind,
\[
|\Psi\rangle = \frac{1}{\sqrt{N!}} \sum \Pi\, |\phi_{i_1}\rangle|\psi_{i_2}\rangle|\chi_{i_3}\rangle|\tau_{i_4}\rangle \cdots \tag{3}
\]
—where Π denotes permutation over the indices and the summation is taken over all such permutations—is associated with exactly the same “state” (in the sense of a density operator obtained by partial tracing), there still will be distinguishable particles in the classical limit if the one-particle states |ϕ⟩, |ψ⟩, |χ⟩, |τ ⟩ · · · do not spatially overlap. Therefore, the indices in the quantum mechanical formalism, over which (anti-)symmetrization takes place, cannot refer to what we understand to be particles in classical physics! This observation is the key to the resolution of our paradox. Although the indices in the “many-particle” quantum formalism have completely symmetrical roles to play and do not correspond to any physical differences, this does not entail that it does not make a difference if we exchange two particles in the ordinary sense (i.e., the localized objects that we are wont to call particles in classical physics). Therefore, there may be a mixing entropy even if the symmetrization postulates are fully respected: the existence of this entropy depends on the distinguishability of particles, not on the distinguishability of indices. The notion that the symmetrization postulates enforce division by N ! in the classical expression for the entropy, and thus make the entropy of mixing vanish for gases of the same kind, rests on a confusion about the status of the particle concept in quantum mechanics [5, 7].
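A small numerical illustration of this point about state (2) is given below (Python with numpy assumed; the discretized “wave packets” on a toy grid are invented for illustration). With two non-overlapping, hence orthogonal, one-particle packets, the symmetrized state assigns each *index* exactly the same reduced density operator, even though the two localized packets themselves are perfectly distinguishable.

```python
import numpy as np

# Two non-overlapping one-particle "wave packets" on a toy 6-point grid:
# phi is localized on the left, psi on the right, so <phi|psi> = 0.
phi = np.array([1.0, 1.0, 0.0, 0.0, 0.0, 0.0]); phi /= np.linalg.norm(phi)
psi = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0]); psi /= np.linalg.norm(psi)

# Symmetrized two-particle state (2): |Psi> = (|phi>|psi> + |psi>|phi>)/sqrt(2).
Psi = (np.kron(phi, psi) + np.kron(psi, phi)) / np.sqrt(2.0)
M = Psi.reshape(6, 6)                          # index 1 = rows, index 2 = columns

# Reduced density operators for index 1 and index 2 (partial traces).
rho1 = M @ M.conj().T
rho2 = M.T @ M.conj()
print("indices have identical reduced states:", np.allclose(rho1, rho2))

# Each index is in the same 50/50 mixture of the left and right packet,
# so the indices carry no physical distinction.  The distinguishable
# "particles" are the localized packets phi and psi, which do not overlap:
print("overlap <phi|psi> =", phi @ psi)        # 0.0
```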
5 HOW PARTICLES ARE TO BE REPRESENTED IN QUANTUM MECHANICS
Elaborating on this conclusion, it should be noted that the symmetrization postulates, which are responsible for the physical equivalence of all indices in states of many particles of the same kind, are basic postulates of quantum mechanics; they possess a universal validity. This implies that if we were to take the indices to refer to particles, it would follow that all particles of the same kind in the universe are in exactly the same state. For example, it would not make sense to distinguish between electrons here and electrons elsewhere, for instance in another solar system: all electrons in the universe are “partly here, partly there, a bit everywhere”. The applicability of the symmetry postulates holds regardless of what kinds of
interactions and situations are considered; in particular, whatever circumstances should turn out to be responsible for the transition to the classical limit, these do not affect the applicability of the postulates. Therefore, if we were to assume that the quantum indices refer to particles, this would imply that even in the classical limit particles are all in exactly the same state—that even classical particles are completely indistinguishable! This simple reductio shows once again how wrong it is to think that the symmetrization in quantum mechanics pertains to particles in the ordinary sense. Classical particles are certainly distinguishable objects, so they simply cannot correspond to the quantum indices in the classical limit. Classical particles are characterized by their unique spatial positions and trajectories. Now, as a defence of the idea that the individuality that corresponds with these individuating features disappears as soon as quantum mechanics is taken into account, it is sometimes maintained that the “haziness” of quantum mechanical wave functions, in the sense of their non-zero spatial extensions, makes it impossible to follow a quantum particle in all detail over time (e.g., [1]). For this reason precise trajectories do not exist in quantum mechanics and the concept of genidentity, so the argument continues, cannot apply to quantum particles: their wave packets will overlap, and we can consequently not find out where each particle comes from and with which earlier particle it should be considered identical. This argument is notable for at least two reasons. First, it apparently assumes that particles can be associated with individual wave packets (that may overlap). This betrays a correct intuition about what particles are and how they are to be represented in the quantum formalism, but is of course in conflict with the standard idea that the indices in the formalism denote particles. Second, this “haziness argument” is implausible as a general defence of the indistinguishability of particles. This indistinguishability would apparently have to occur as soon as there is overlap between wave packets; but haziness is something gradual, subject to becoming more or less, and evidently not strong enough a premiss to bear such an all-or-nothing conclusion. Moreover, the (anti-)symmetrization postulates remain in force even if wave packets do not overlap at all, which demonstrates that overlap cannot be the decisive factor here. What the haziness argument actually supports is our earlier conclusion that particles in quantum mechanics should be represented by one-particle wave packets, and that there is a gradual transition from the “classical” situation, in which such wave packets are spatially disjoint, to the more typical quantum case in which there is overlap and in which the classical particle concept is not fully applicable. Our essential argument is that there is no relation between the particle concept thus understood and the “particle indices” in the quantum mechanical formalism. That quantum mechanics is indeed capable of representing classical particles in the way just explained, is (as we already mentioned earlier) guaranteed by Ehrenfest’s theorem. In the case of a Hamiltonian H = p2 /2m + V (r), with p the momentum, m the particle mass and V (r) a potential field, we can introduce a force field F (r) = −∇V (r), in terms of which Ehrenfest’s theorem takes the
form
\[
\langle F(r)\rangle = m \frac{d^2}{dt^2}\langle r\rangle. \tag{4}
\]
For certain specific potentials (in particular free motion, i.e. F = 0, relevant to our ideal gases case) we find that ⟨F(r)⟩ equals F(⟨r⟩), so that in these cases the mean value of r exactly satisfies the classical law of motion F(⟨r⟩) = m d²⟨r⟩/dt². In general this is not so. But if the wave function is localized in a sufficiently small region of space, so that the variation of the force field within that region is small, we can replace Eq. (4) by the classical equation in a good approximation (which becomes better when the state becomes more localized). From this it follows that well-localized single-particle quantum states (localized in the sense that their associated wave packets are very narrow) approximately follow classical trajectories.
Classical trajectories thus do exist in quantum mechanics: they are realized by (very) small wave packets. Consequently it is essential, for the classical limit, to have a mechanism that keeps wave packets narrow during appreciable time intervals. Such considerations are standard in studies on the classical limit of quantum mechanics, and there is growing agreement that the essential element in explaining how classical mechanics emerges from quantum mechanics is the process of decoherence. Decoherence processes cause the destruction of coherence between parts of the wavefunction centered around different positions, and effectively reduce wave functions to narrow wave packets (see for a more extensive discussion [5]). It is generally acknowledged then that the classical limit of quantum mechanics is characterized by the emergence of classical trajectories followed by narrow wave packets. These narrow wave packets become the particles we are familiar with in classical physics.
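For the free-particle case relevant to the ideal gas, the Ehrenfest point can be checked directly with a short simulation (Python with numpy assumed; grid size, packet width and initial momentum are invented, and units are chosen so that ħ = m = 1). The mean position of a narrow free wave packet follows the classical straight-line trajectory ⟨x⟩(t) = ⟨x⟩(0) + (⟨p⟩/m)t.

```python
import numpy as np

# Free evolution of a narrow Gaussian wave packet, done exactly in momentum space.
hbar, m = 1.0, 1.0
x = np.linspace(-50.0, 50.0, 4096)
dx = x[1] - x[0]
k = 2.0 * np.pi * np.fft.fftfreq(x.size, d=dx)

x0, p0, sigma = -10.0, 2.0, 1.0               # initial centre, momentum, width
psi = np.exp(-(x - x0)**2 / (4 * sigma**2) + 1j * p0 * x / hbar)
psi /= np.sqrt(np.sum(np.abs(psi)**2) * dx)

def mean_x(state):
    return np.sum(x * np.abs(state)**2) * dx

for t in (0.0, 2.0, 5.0):
    # exact free evolution: each momentum component picks up exp(-i hbar k^2 t / 2m)
    psi_t = np.fft.ifft(np.exp(-1j * hbar * k**2 * t / (2 * m)) * np.fft.fft(psi))
    print(f"t={t}:  <x> = {mean_x(psi_t):7.3f}   classical x = {x0 + (p0/m)*t:7.3f}")
```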
6 PARTICLES AS EMERGENT ENTITIES
Our conclusion is therefore that we should think of particles, as we know them from classical physics, as represented in quantum mechanics by localized wave packets [5, 7]. That is to say, if we encounter a state |Ψ⟩ defined in an n-fold tensor product Hilbert space H1 ⊗ H2 ⊗ H3 ⊗ ... ⊗ Hn, and wish to investigate whether it can be interpreted in terms of particles, we have to ask ourselves whether it can be written as a (anti-)symmetrized product of localized one-particle states. It is easy to show that if such a “particle decomposition” of |Ψ⟩ exists, it is unique [5]. In most cases states will not allow a particle interpretation; think, for example, of a state of the form (2) with two overlapping wave packets |ϕ⟩ and |ψ⟩ (each defined in a connected region of space). The particle decomposition that we need, in terms of localized states that are non-overlapping (and therefore mutually orthogonal) clearly does not exist in this case: of course there does exist a bi-orthogonal
Schmidt decomposition, but the states occurring in it will be linear combinations of |ϕ⟩ and |ψ⟩ and will consequently spatially overlap. An arbitrarily chosen quantum state will therefore not represent particles. We need special circumstances to make the particle concept applicable. In this sense, the classical limit with its decoherence processes makes classical particles emerge from the substrate of the quantum world. It may be added that the circumstances that are responsible for the emergence of classical particles at the same time justify the use of the statistics that we expect for the case of independent individuals. Indeed, in the case of spatially nonoverlapping wave packets, in which our particle concept becomes applicable, both Fermi-Dirac and Bose-Einstein statistics reduce to classical Boltzmann statistics [2, 6].
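One standard way to see this reduction numerically is by comparing state counts (a sketch in Python; the particle and state numbers are invented, and the counting argument is a generic illustration of the dilute regime rather than a claim from [2] or [6]): when the number M of available one-particle states, such as non-overlapping packets, greatly exceeds the particle number n, Bose-Einstein and Fermi-Dirac counts both approach the “corrected Boltzmann” count Mⁿ/n!.

```python
from math import comb, factorial

def bose_einstein(M, n):     # multisets of n states chosen from M
    return comb(M + n - 1, n)

def fermi_dirac(M, n):       # subsets of n distinct states chosen from M
    return comb(M, n)

def boltzmann(M, n):         # M^n assignments, divided by n! permutations
    return M**n / factorial(n)

n = 5
for M in (10, 100, 10_000, 1_000_000):
    b = boltzmann(M, n)
    print(f"M={M:>9}:  BE/Boltzmann = {bose_einstein(M, n)/b:.4f},  "
          f"FD/Boltzmann = {fermi_dirac(M, n)/b:.4f}")
# Both ratios tend to 1 as M/n grows: in the dilute (non-overlapping) regime,
# quantum statistics reproduce classical Boltzmann counting.
```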
7 THE GIBBS PARADOX: CONCLUSION
When equal gases of the same pressure and temperature mix, nothing happens from a macroscopic, thermodynamical point of view. So there should be no entropy of mixing in this case, in conformity with what thermodynamics predicts. In the literature this vanishing of the thermodynamic mixing entropy when two equal gases mix has often been interpreted as a fundamental fact, as a sign that also on the level of statistical mechanics, when the microscopic constitution of the gases is taken into account, it should not make a physical difference whether gas particles originate from one or the other of the initially separated gases.
This interpretation is mistaken. When non-thermodynamic, microscopic separation and mixing techniques are allowed, it turns out that even in the case of equal gases the value 2kN log 2 of the mixing entropy, and other effects of mixing, can be recovered and in principle experimentally verified. The vanishing of the entropy of mixing is therefore conditional on looking in a purely macroscopic way at what happens, and has no consequences for the identity and distinguishability of microscopic particles. Invoking quantum mechanics in order to argue that the mixing entropy vanishes as a matter of principle, on account of the fundamental indistinguishability of particles, is completely wrongheaded. As it turns out, the microscopic effects of mixing classical gases of the same kind persist in quantum mechanics. This becomes understandable once we realize that the gas particles do not correspond to the indices in the quantum formalism, but rather to one-particle wave packets. In the classical limit such wave packets become the particles we know from classical mechanics. The conclusion that quantum particles correspond to one-particle wave functions rather than to indices accords with other evidence that these indices have a purely mathematical significance and do not denote physical objects [3, 4, 5]. According to our analysis of what particles are, the appearance of particles constitutes a genuine case of emergence. Only if specific physical conditions are satisfied, resulting in the presence of localized wave packets (decoherence processes are usually essential here) does
the concept of a particle in the ordinary sense become applicable to the world described by quantum mechanics. Finally, even in cases in which individual particles in our sense are not present in the mixing gases of the Gibbs paradox, quantum mechanics predicts that a nonzero value of the entropy of mixing can be recovered by using sophisticated membranes. The reason is that the two initially separated volumes of ideal gas are represented by mutually orthogonal wave functions, located on the left and right, respectively. Since ideal gases do not interact these wave functions remain orthogonal, and therefore distinguishable in principle, even after the removal of the partition. This point (to be worked out elsewhere) demonstrates once more that the indistinguishability of bosons and fermions is irrelevant to the resolution of the Gibbs paradox.
BIBLIOGRAPHY
[1] Claude Cohen-Tannoudji, Bernard Diu and Frank Laloe, Quantum Mechanics, Vol. 2. Hoboken: Wiley-Interscience 1978, Ch. XIV.
[2] Dennis Dieks, “Quantum Statistics, Identical Particles and Correlations”, in: Synthese, 82, 1990, pp. 127-155.
[3] Dennis Dieks and Marijn Versteegh, “Identical Particles and Weak Discernibility”, in: Foundations of Physics, 38, 2008, pp. 923-934.
[4] Dennis Dieks, “Are ‘Identical Quantum Particles’ Weakly Discernible Objects?”, in: Mauricio Suarez, Mauro Dorato and Miklos Redei (Eds.), EPSA Philosophical Issues in the Sciences: Launch of the European Philosophy of Science Association, Volume 2. Heidelberg: Springer 2010.
[5] Dennis Dieks and Andrea Lubberdink, “How Classical Particles Emerge From the Quantum World”, in: Foundations of Physics, 2011, to appear; DOI 10.1007/s10701-010-9515-2.
[6] Steven French and Decio Krause, Identity in Physics: A Historical, Philosophical, and Formal Analysis. Oxford: Oxford University Press 2006.
[7] Andrea Lubberdink, “Identical Particles in Quantum Mechanics”, at http://arxiv.org/abs/0910.4642
[8] N.G. van Kampen, “The Gibbs Paradox”, in: W.E. Parry (Ed.), Essays in Theoretical Physics. Oxford: Pergamon Press 1984, pp. 303-312.
[9] Wojciech H. Zurek, “Decoherence and the Transition from Quantum to Classical Revisited”, in: B. Duplantier, J.-M. Raimond and M. Rivasseau (Eds.), Quantum Decoherence, Poincaré Seminar 2005 (Progress in Mathematical Physics, vol. 48). Basel: Birkhäuser 2007, pp. 1-31.

Institute for History and Foundations of Science
Utrecht University
P.O. Box 80.010
3508 TA Utrecht
The Netherlands
[email protected]
MAURO DORATO
THE ALEXANDROFF PRESENT AND MINKOWSKI SPACETIME: WHY IT CANNOT DO WHAT IT HAS BEEN ASKED TO DO1
In recent times there have been interesting attempts to introduce a mind-independent present in Minkowski spacetime, a structure that has also been regarded as capable of explaining some aspects of our subjective experience of time, in particular the fact that we share a “now” but not a “here”.2 Considering that Minkowski spacetime is the arena for three out of the four interactions postulated by contemporary physics, this claim, if correct, would be remarkable. Against the prevailing opinion of last century’s major physicists and philosophers, we would have in fact discovered that it is possible to find a now (even a “transient one”) in any contemporary physical theory whose spatiotemporal arena has the structure of Minkowski spacetime. At the same time, we would have gone some way toward a rapprochement of “the manifest image”3 of time − characterized by a cosmically extended, transient now − with the physical image, traditionally dominated by the picture of a block universe in which “the present is absent” because regarded as purely mind-dependent. The main thesis of this paper is that in Minkowski spacetime the so-called “Alexandroff present” (henceforth, for short, “Alex”), introduced by Arthur and
1 Thanks to Richard Arthur and Steven Savitt for extensive discussions and written exchanges over many of the topics discussed here. From them, I have in any case learnt a lot also in the past. Our exchange has reduced, but possibly not completely eliminated, the number of misunderstandings on my part. Carl Hoefer has read a previous version of this paper and helped with critical comments and questions.
2 Richard Arthur, “Minkowski spacetime and Dimensions of the Present”, in: Dennis Dieks (Ed.), The Ontology of Spacetime, Amsterdam: Elsevier, 2006, pp. 129-155; Steven Savitt, “The Transient nows”, in: Wayne C. Myrvold, Joy Christian (Eds.), Quantum Reality, Relativistic Causality, and Closing the Epistemic Circle, The Western Ontario Series in Philosophy of Science 74, Amsterdam: Springer 2009, pp. 339-352.
3 This famous expression is in Wilfrid Sellars, “Philosophy and the Scientific Image of Man”, chapter 1 in Wilfrid Sellars, Science, Perception, and Reality, New York: Humanities Press, 1963, p. 40, where he contrasts the manifest image of the world with its scientific image.
Savitt4 is neither a physically relevant property5 nor a very good explanans of our temporal experience. Therefore, it should be dropped. The paper is structured as follows. In 1 I will present Savitt’s two main motivations for having a present in Minkowski spacetime, and the corresponding two senses in which physics might be said to need an objective now. In 2 I will introduce the main features of Alex, and the alleged link between the agreed-upon temporally extended character of the present of our experience (specious present) and its supposed “interactive nature”. In 3 I will discuss what I refer to as “Alex’s dilemma” between its marginal role in physics and its unfaithfulness to our experience. In 4 I will illustrate the former horn of the dilemma, by presenting some possible uses of Alex in mathematical physics. These will be shown to be insufficient to conclude that Alex has an important role in physical theories. In 5, I will illustrate the latter horn of the dilemma, by defending at the same time an alternative, better explanation of the spatial extendedness of our experienced nows, based on empirical data concerning our psychophysical threshold for regarding two light signals as being temporally successive.
1 SAVITT’S TWO MOTIVATIONS FOR HAVING A PRESENT IN MINKOWSKI SPACETIME
The first, and possibly more important, motivation consists in showing that the task necessary to close “Shimony’s circle” between the objective aspect of time and its felt, subjective aspect, can, at least in principle, be completed:
The program [of closing the circle] envisages the identification of the knowing subject (or, more generally, the experiencing subject) with a natural system that interacts with other natural systems.6
I am very sympathetic toward this project,7 as it is one of the inspiring Muses of the philosophy of time.
4 Alex was originally suggested by Howard Stein, “On Relativity Theory and the Openness of the future”, in: Philosophy of Science 58, 1991, pp. 147-167, and elaborated upon by Wayne Myrvold, “Relativistic Quantum Becoming”, in: The British Journal for Philosophy of Science, 54, 2003, pp. 475-500. These authors’ claims, however, were less ambitious than Arthur’s and Savitt’s.
5 Except, possibly, in recent attempts at building a mathematical setting in which causality can be studied independently of geometrical and differentiable structure: Keye Martin, Prakash Panangaden, “Spacetime topology from causality”, in: arXiv:gr-qc/0407093v1, 2004.
6 Abner Shimony, “Reality, causality, and closing the circle”, in Search for a Naturalistic World View, Vol. I. Cambridge: Cambridge University Press, 1993, p. 40.
7 Arthur does not share this motivation with Savitt in an explicit way. Of course, this does not amount to saying that he sees no point in trying to explain objectively why time is experienced in the way we do (personal communication).
However, Savitt’s second motivation for introducing an objective now in Minkowski spacetime is, I take it, in potential conflict with his ambition of “closing Shimony’s circle”. This second motivation consists in his attempt to show that Einstein’s often quoted “worry” that physics cannot grasp “the Now” is, in some sense, unfounded:8
Once Einstein said that the problem of the now worried him seriously. He explained that the experience of the Now means something special for man … That this experience cannot be grasped by science seemed to him a matter of painful but inevitable resignation (Carnap 1963, 37, my emphasis).
Parallel to Savitt’s two motivations, there are two senses in which one could think that physics needs an objective now. According to the first, current physics is to be regarded as incomplete because it has not yet captured an allegedly objective property “being now”, a property that − unlike “being massive” or “being charged” − has so far escaped all physical theories. While this first sense might be regarded as being too strong, I take it that it is the only sense in which one would be entitled to conclude that “there is a viable alternative” to Einstein’s worry (see note 8). The second sense in which physics might need a now is much weaker, as it amounts to claiming that a physically objective now − possibly corresponding to some objective spatiotemporal structure − is needed either to explain some key features of our subjective experience or in the practical application of physical theories by human beings, or to give a precise semantic interpretation of our common use of “now” regarded as an indexical. Given that Arthur and Savitt agree with me and others that “now” is an indexical, and that indexicals cannot appear in laws, it seems safe to conclude that the only reason to introduce an objective present in Minkowski spacetime is linked to the second sense, the one that Arthur and Savitt are really after. As we will see, however, even in this weaker sense, Alex turns out to be an arbitrary choice, since other spatiotemporally extended structures, or other non-geometrical explanations, can do the explanatory job in a more satisfactory way.
8 After quoting from physicists and philosophers holding that there cannot be a now in physical theories, Savitt writes: “What I hope to do in this paper is show that there is a viable alternative picture to these views, a picture that includes, in some sense, a now and the passage of time” (ibid., p. 340). This second motivation, unlike the first, is shared also by Arthur: “Indeed it is false to say that physics does not take the ‘now’ into account” (ibid., p. 151, n. 35).
2 ALEX’S FEATURES IN RELATION TO THE EXTENDED CHARACTER OF OUR EXPERIENCE
It is now time to look at the features of “Alex” in some more detail, since they are crucial in order to assess its suitability for explaining our experience of time, or for accounting for some pragmatic uses of the “present” in cosmology, as when we claim that, for example, “galaxies in our universe formed a certain number of billions year ago”, or that “presently the universe is expanding”.9 The Alexandroff’s present relative to a segment of any timelike curve delimited by points a and b is the intersection of the future light-cone whose vertex is the beginning event a with the past light cone whose vertex is the end event b (as customary, a temporal orientation is assumed). The Alexandroff’s present relative to the segment a-b of a worldline − call such a set Alex [a,b] − is then by definition the set of points in the above mentioned intersection. The (relational) objectivity of Alex [a,b] depends on the invariance of the relation of causal connectibility within the special theory of relativity. In fact, it turns out that any event in the set Alex [a,b] is both a possible effect and a possible cause of events on the segment of the worldline, and for this reason it could be regarded as the set of events with which my body or any other physical system modelled by a worldline can in principle interact during the time-like interval a,b. Notice for example that in fig. 1 below, point c, which is outside Alex [a,b] but inside event’s b past light cone, can be a possible cause of many events on the segment, but is not in mutual causal contact with all of it, since it cannot, in its turn, be caused by events belonging to the segment.10 More precisely, Alex [a,b] is the set of all points z such that there exist two points x and y, belonging to the segment a, b for which we have both CPxz and CPzy, where CP is the relation of past causal connectibility.11 One more point worth noting is that the length of the timelike interval under consideration may vary from context to context, and this is certainly an advantage of this conception: as we know from ordinary discourse, “now” can pick out my present perception, the present historical moment, or even the “present stage” of cosmic expansion.
9 Arthur, ibid., p. 152.
10 Event a and event c are not causally connectible, and the "first" event that could cause c is below a, or "earlier than a", see fig. 1.
11 This more rigorous rendering is due to Myrvold, ibid., p. 480. Stein talked about Alex as the set of events contemporaneous with a given segment of a worldline, Stein, ibid., p. 159.
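As an illustrative aside (my own sketch, not part of Dorato's text): the definition of Alex [a,b] just given can be written as a simple membership test in Minkowski coordinates, assuming units with c = 1; the function names are hypothetical.

```python
# Minimal sketch (assumed coordinates and names): events are (t, x, y, z)
# tuples in Minkowski spacetime with c = 1. "causally_precedes(p, q)" holds
# iff q lies on or inside the future light cone of p, so that for a timelike
# segment from a to b, Alex[a, b] = J+(a) ∩ J-(b).

def causally_precedes(p, q):
    """True iff p can causally influence q (q in J+(p))."""
    dt = q[0] - p[0]
    dr2 = sum((qi - pi) ** 2 for pi, qi in zip(p[1:], q[1:]))
    return dt >= 0 and dt ** 2 >= dr2

def in_alex(z, a, b):
    """True iff z belongs to Alex[a, b]."""
    return causally_precedes(a, z) and causally_precedes(z, b)

# a and b one second apart on the same worldline:
a, b = (0.0, 0.0, 0.0, 0.0), (1.0, 0.0, 0.0, 0.0)
print(in_alex((0.5, 0.2, 0.0, 0.0), a, b))   # True: inside the causal diamond
print(in_alex((0.5, 2.0, 0.0, 0.0), a, b))   # False: spacelike to both a and b
```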
Imagine that the segment a-b represents, with due idealizations, my pointlike body looking at my room "right now". Why should my present experience be represented as temporally extended along the segment a-b? There are two possible arguments in favour of such a hypothesis. One comes from phenomenological evidence going back to James, Husserl and other psychologists and phenomenologists, linked to studies of, say, our perception of music or language. The wealth of experimental evidence coming from this corner and militating against a point-like present is quite strong. A second, more speculative argument has been put forth by Stein. On Stein's hypothesis, the temporal extendedness of the present of our experience is due to the conditions that he regards as necessary for a conscious experience to occur. These conditions require coordination, i.e., several mutual interactions of the parts of the brain with themselves and with the external environment. As is well known, in relativistic physics interactions take time, and it is this empirical hypothesis that could explain, among other things, why our experience of the present is not knife-edged, as James put it, but "specious", or temporally extended. In a word, the idea of a spatiotemporally extended present like Alex − a set of events that is in possible mutual causal interaction with our bodies and brains experiencing something − would in Stein's opinion be a consequence of the fact that the stability of the various parts of our brains and of the external physical objects around us needs several coordinations, which are realized only thanks to several mutual physical influences.12 Agreed: one immediate advantage of choosing Alex as a model of our experienced present is that it allows a prima facie plausible explanation of the reason why we all share the same now,13 as well as of our "natural" intuition − belonging
to "the manifest image of time" − that the now, or the present moment, appears to us as being cosmically extended: Alex is temporally thin and spatially fat.14 The "illusion" of a cosmically extended now famously had to be unmasked by Einstein in order to lay the foundations of the special theory of relativity. Since any perception ascertaining the simultaneity of two events is always local, the attribution of a relation of simultaneity to two distant events necessarily entails some conventional element. Considering the fact that our experience of time is always local, it is also natural to claim that if we want to represent our experience of time in Minkowski spacetime, we must rely on proper time (measured along segments of worldlines), and not on coordinate time. Dieks has pointed out, correctly in my opinion, that we don't need global nows to accommodate our local experience, since different spatial hyperplanes intersecting the same point or the same short timelike segment − an idealized representation of our body − lead to the same temporal experience.15 However, notice that Alex's spatial extendedness, while sufficient to explain the corresponding aspect of our experienced now, is far from sufficient to regard Alex as a good explanans in the second sense specified above. Different extended regions of spacetime might also succeed in explaining this aspect of our experience, and Savitt must somehow show that Alex is the best explanation we can give, a difficult task that, as we are about to see, he tries to undertake without success. Summarizing my presentation of Arthur and Savitt's view of Alex, I agree with them on the following two crucial points: (1) the present of our experience looks extended; (2) however, the notion of time on which to base the representation of the experienced present in Minkowski spacetime is proper time rather than coordinate time.
12 Stein, ibid., p. 161.
13 For a different explanation, see Jeremy Butterfield, "Seeing the Present", in: Mind 93, 1984, pp. 161-76, and Craig Callender, "The Common Now", in: Philosophical Issues 18, 2008, pp. 339-361.
14 For this reason, Alex has also been used to explain the origin of the belief in presentism. "… presentists … are inclined to accord a kind of ontological privilege to that with which they can (then) interact. What they can interact with, at a particular near-momentary subregion R of their worldtube, is the contents of that region's Stein Present. But … such a Stein Present has very little temporal thickness and is very large in spatial extent. Such a region is easily mistaken for an instantaneous, global present.", in Ian Gibson and Oliver Pooley, "Relativistic Persistence", in: John Hawthorne (Ed.), Philosophical Perspectives, Vol. 20, Metaphysics. Oxford: Blackwell, 2006, p. 169. For an attack on the significance of the presentism/eternalism debate in Minkowski spacetime, see Steven Savitt, "Presentism and eternalism in perspective", in: D. Dieks (Ed.), The Ontology of Spacetime, Amsterdam: Elsevier, 2006, pp. 111-127; Yuval Dolev, "How to square a non-localized present with special relativity", in: D. Dieks (Ed.), The Ontology of Spacetime, Amsterdam: Elsevier, 2006, pp. 177-190; and Mauro Dorato, "The irrelevance of the presentist eternalist debate in Minkowski spacetime", in: D. Dieks (Ed.), The Ontology of Spacetime, Amsterdam: Elsevier, 2006, pp. 93-109.
15 Dennis Dieks, "Becoming, relativity, and locality", in: D. Dieks (Ed.), The Ontology of Spacetime, 1, Amsterdam: Elsevier, 2006, pp. 157-175.
3 ALEX’S OSCILLATION BETWEEN ITS USELESSNESS IN PHYSICS AND ITS UNFAITHFULNESS TO OUR EXPERIENCE
The dilemma that I want to present in this section is a consequence of a tension between Savitt's two motivations for introducing Alex in Minkowski spacetime, which for brevity I will refer to as "Shimony's circle" and "Einstein's worry". (1) The indisputable objectivity of Alex − needed by Savitt and Arthur to overcome in some sense Einstein's worry − makes it unsuitable to account for, or explain, the subjective present; (2) Any success in explaining the subjective present with a geometric structure of Minkowski spacetime would make a solution to Einstein's worry highly implausible. In the rest of the paper, I will argue in favor of these two claims. As explicitly recognized by Arthur (and, I think, implicitly also by Savitt), Alex, qua objective spatiotemporal structure, is not a model for the subjective present, that is, it is not "the set of all those events of which we are consciously aware at the moment of considering them".16 One of the reasons for this claim has been put forward by Arthur himself: the set of events that we can actually be conscious of during an interval of time is always a subset of the set of those events that are in actual mutual communication with our bodies during that interval. This is not simply due to our selective attention − filtering out from our awareness many events that are in our relative Alex − but also to the fact that Alex is the set of possible and not of actual causes and effects affecting our bodies during a specious present. Therefore, many events in Alex will not be registered by our consciousness, and Alex cannot coincide with the events represented in our subjective present. For analogous reasons, however, the set of events in Alex is not picked out by our everyday uses of "now", at least not on the hypothesis that the latter is regarded (as Arthur, Savitt and I think it should) as an indexical. While the context-dependence of such uses is well accounted for by the variable length of the timelike curve with extremes a and b, and while it must be admitted that our uses of the indexical "now" need not necessarily refer to the set of events that we subjectively regard as present, the intentions of the speakers are certainly relevant to fix the set of events picked out by the indexicals we use. And by no stretch of the imagination could we claim that, by uttering "now", we typically intend to refer to all the events with which we are in possible mutual causal communication (namely, to Alex). On Savitt's and Arthur's part, this could be stipulated only with a remarkable degree of arbitrariness, since in our usage of "now", the idea of intending to refer (token-reflexively) to events that are simultaneous with our speech acts seems much more prominent. It does not seem very plausible to claim that we can be wrong about what we intend to pick out when we use the word "now", even though it must be admitted that Alex does contain simultaneous events.
16 Arthur, ibid., p. 151.
However, if Alex neither is (nor is intended by its proponents to be) the subjective present, nor can be the set of events that we typically intend to pick out when we use the temporal indexical "now", why should we believe that − simply in virtue of its spatial fatness − it is a good explanation of the "experience" of speakers and of conscious agents (assuming, of course, that the spacetime we live in is Minkowski's)? Granting that Alex can have some explanatory virtue is not sufficient to remove an impression of arbitrariness. To avoid this problem, Arthur and Savitt should show that Alex is, if not the best available explanation of our experience, at least a very plausible candidate. As an argument against this plausibility, notice that when we look, say, at the starry sky, we believe that the light we see is part of our present experience. However, the events corresponding to the emission of light from, say, a supernova occurred many years before our perception, and are therefore certainly not in our momentary specious present, which typically lasts more than half a second but less than a few seconds. It then follows that the events on the supernova that are responsible for our perception are outside Alex, and yet they are pre-theoretically regarded by us as present. The fact that Alex has not been proposed as a physical model of our experienced present (it does not even contain all the events that we regard as present) weakens its explanatory force: any strong explanans should at least correctly describe its explanandum. As an additional argument in favour of this weakness, notice that the example of the supernova entails a lack of supervenience of our mental experience of time on Alex, at least if we suppose that [a,b]'s temporal length is less than a few seconds. In fact, a difference in what we regard as psychologically present (say, we first look at the moon and then we immediately see a supernova in the sky), by including events outside Alex, is not reflected in a difference of the subvenient, physical base, since Alex remains the same in the two cases (i.e., immediately before and at the moment in which we see the supernova). But if our subjective experience does not supervene on Alex, the latter cannot provide a strong explanation of our temporal experience, at least to the extent that the existence of a relation of supervenience of a family of mental properties B on a family of physical properties A is a necessary (albeit not sufficient) condition for any explanation of B by using A. And even though neither Arthur nor Savitt has proposed Alex as a possible physical basis of supervenience for our mental experience, I think that they will grant that, as a consequence, the explanation that it can provide cannot but be weak and partial. It could be replied that whenever we are looking at closer stars, say the Sun, we could stretch the segment [a,b] that represents our present into a much longer temporal interval (16 minutes), so as to include the original electromagnetic emission from our star. Such an "elastic" procedure could possibly be iterated even when we look at the nocturnal sky with a powerful telescope. Wouldn't this be a way to rescue Alex from the charge of being "unfaithful" to our experienced present?
After all, we have already noticed how the length of the present is pragmatically dependent on the events we are referring to. Suppose we accept this "elasticity" as a necessary consequence of the variability of the contexts of utterance of the temporal indexicals. This "stretchy Alex", however, would be parasitic on a completely different model of the experienced present. The amount of the stretch would in fact come to depend on the distance of the events we are looking at, and therefore on our regarding as present the set of events on the past light-cone centered on our bodies. This would be tantamount to identifying the present with the events on the past light-cone. But then why not prefer the past light-cone to Alex as a possible explanation of the extended character of our experience? If the length of Alex comes to depend on the distance of the events with which we are interacting in perception, one must accept that Alex is a less successful explainer of our temporal experience than the past light-cone.17 Arthur considers the possibility of representing the present of our experience with the set of events lying on the past light cone, but complains that such a move would be equivalent to eliminating "any distinction between past and present".18 This is doubtless correct. He should have realized, however, that such an elimination is the essential feature of our experience of the present. In fact, we are completely unaware of the time lag due to the speed of light, and unconsciously attribute to light an infinite speed. The approximation, for all practical purposes of our earthbound life, is extremely good, since objects that are close "enough", as noted by Butterfield, typically don't change much in the time it takes light to bounce off them and reach our retinas.19 I think that what we are facing here is a deep, acute conflict between the descriptive needs of physics and those of the psychology/phenomenology of the experience of time. On the one hand, if we adopt Einstein's standard convention of simultaneity in terms of "worldline-orthogonality", from the viewpoint of physics the events of the emission of the electromagnetic radiation from the stars or the Sun must be regarded as being in the causal past of our perceiving bodies. On the other hand, if we regard the emission of radiation from celestial objects as simultaneous with our act of perceiving it, we are much more faithful to our experience, but we eliminate any difference between past and present. We could of course adopt a criterion of simultaneity that is different from the one that is customarily adopted, and yet much closer to our experience of time: according to this different method, we could regard as simultaneous with our perception any event on the past light cone. However, as already noted by Einstein, this method of fixing simultaneity would be quite impractical for physical needs, because it would make simultaneity depend on the particular spatiotemporal position occupied by the clock:
17 See the last section for more arguments.
18 Ibid., p. 151.
19 Butterfield, ibid.
We might, of course, content ourselves with time values determined by an observer stationed together with the watch at the origin of the co-ordinates, and co-ordinating the corresponding positions of the hands with light signals, given out by every event to be timed, and reaching him through empty space. But this co-ordination has the disadvantage that it is not independent of the standpoint of the observer with the watch or clock, as we know from experience.20
The impracticality of the method of simultaneity discussed in this quotation shows that, even if we decided to modify Einstein's standard criterion of simultaneity, the descriptive needs of physics and those of psychology would diverge. The second argument against the possibility of stretching Alex indefinitely according to need is this: if the extension/duration of the now is totally context-dependent, i.e., dependent on what we observe, its physical meaning would seem to become proportionally weaker. Alex would still be objective, but characterized by a length that is somehow mind- or observer-dependent! So once again our attempt to solve Einstein's worry fails to help solve the issue raised by "Shimony's circle". These conclusions are generalizable to structures other than Alex and the past light-cone above. Once we accept (for physical reasons) Einstein's standard criterion of simultaneity, it follows that even if we could successfully model the psychological present by using some other structure, like the past light-cone discussed above, we would end up with something which introduces some deep discrepancies with the constraints of the physical description of the world. And the simplest hypothesis that explains this gap is this: our natural folk-physical belief about what is present is to some extent simply illusory, since it falsely assumes the presentness (simultaneity with our perception) of whatever falls into our visual fields, supernova explosions included. Einstein expressed the illusory nature of our experience of time quite clearly:
The illusion which prevailed prior to the enunciation of the theory of relativity … had its origin in the fact that in our everyday experience we can neglect the time of propagation of light. We are accustomed on this account to fail to differentiate between "simultaneously seen" and "simultaneously happening"; and, as a result, the difference between time and local time is blurred.21
While in Savitt's intentions Alex is doubtless no more than a first step in the attempt to explain our subjective temporal experience − so that he would agree with me that much more needs to be done − my attempt here is to show that it is a step in the wrong direction.
20 Albert Einstein et al., The Principle of Relativity, transl. by W. Perrett and G.B. Jeffery, Dover Publications, New York, 1905/1952, p. 39.
21 Albert Einstein, "Physik und Realität", Journal of The Franklin Institute 221, pp. 313-347, English translation in Ideas and Opinions, New York: Bonanza, 1936, p. 299.
In fact:
1) In virtue of its objectivity, Alex cannot fulfil the aim of closing Shimony's circle. Savitt's central question of bridging the gap between the time of physics and the time of the manifest image remains open, and must be studied by other means, namely neurophysiology and psychology,22 and not by combining some arbitrary physical structure with some features of our experience.
2) Attempts to close Shimony's circle by selecting structures out of Minkowski spacetime are destined to be unsuitable for the descriptive purposes of physics, and therefore for proving Einstein's worry unfounded.
Against Savitt, I think we can conclude that his two motivations cannot be had at the same time, and one of them must give way. In the next two sections, we will see that, as a matter of fact, the situation is even worse: Alex can serve neither purpose. The physical irrelevance of the interactive present on the one hand (Section 4), and further arguments for the unfaithfulness of the interactive present on the other (Section 5), will complete my objections to the introduction of Alex in a physical theory whose spatiotemporal arena is Minkowski's.
4 THE PHYSICAL IRRELEVANCE OF ALEX
It would be unfair to claim that Alex has no role whatsoever in physical theories. For instance, the condition of "strong causality" in a Lorentzian manifold is provably equivalent to the condition that the Alexandroff topology coincides with the manifold topology, or to the condition that the Alexandroff topology is Hausdorff.23 A basis of this topology is given by the intersections of the chronological future of p (the set of points reachable from p along future-directed timelike curves) with the chronological past of q (the set of points reachable from q along past-directed timelike curves), with p earlier than q: {I+(p) ∩ I−(q) : p, q ∈ M}. Readers will recognize the Alexandroff present relative to the segment [p, q] of a timelike curve, except that Alex also contains lightlike geodesics, and is therefore {J+(p) ∩ J−(q) : p, q ∈ M}, where J+(p) is the set of points reachable from p along future-directed causal curves and J−(q) the set of points reachable from q along past-directed causal curves. We need to note an important fact separating Minkowski spacetime from general relativistic spacetimes, one concerning the possibility of deriving metrical facts from topological/causal facts alone.
22 In any case, Savitt cannot be accused of trying to replace empirical research by philosophical reflection.
23 Martin and Panangaden, ibid. The Alexandroff topology "is the coarsest topology in which all the interiors of light cones are open". See Lawrence Sklar, Philosophy and Spacetime Physics, University of California Press, Berkeley, 1985, p. 81. More generally, an Alexandroff topological space is one in which the intersection of any family of open sets (also non-finite) is still an open set.
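As an illustrative aside (my own sketch, not part of the argument): in Minkowski coordinates with c = 1, the difference between the open Alexandroff interval I+(p) ∩ I−(q) and the closed causal diamond J+(p) ∩ J−(q) is just the strict versus non-strict light-cone inequality; the function names below are hypothetical.

```python
# Minimal sketch in 1+1 Minkowski spacetime, c = 1: points are (t, x).
# I+(p): strictly timelike future of p;  J+(p): causal (timelike or lightlike) future.

def chronologically_precedes(p, q):      # q in I+(p)
    dt, dx = q[0] - p[0], q[1] - p[1]
    return dt > 0 and dt ** 2 > dx ** 2

def causally_precedes(p, q):             # q in J+(p)
    dt, dx = q[0] - p[0], q[1] - p[1]
    return dt >= 0 and dt ** 2 >= dx ** 2

p, q = (0.0, 0.0), (2.0, 0.0)
z = (1.0, 1.0)                           # on the future light cone of p
print(chronologically_precedes(p, z) and chronologically_precedes(z, q))  # False: not in I+(p) ∩ I-(q)
print(causally_precedes(p, z) and causally_precedes(z, q))                # True: in J+(p) ∩ J-(q)
```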
While in Minkowski spacetime the manifold topology coincides with the Alexandroff topology, in order to obtain the desired equivalence in general relativity we need to impose the further condition of strong causality, which is equivalent to the fact that the manifold has no "almost-closed causal curves", i.e. no curves that come arbitrarily close to intersecting themselves. This means that "for every point p in the manifold M and for every neighbourhood O of p, there is another neighbourhood V of p contained in O such that no causal curve intersects V more than once".24 In Minkowski spacetime it can be shown (following Robb) that the postulation of a "before-ness" relation, plus some axioms that such a relation must satisfy, is sufficient to fix the fact that an event is in the interior of a past or of a future light cone.25 This means that all the topological properties of regions of Minkowski spacetime can be recovered just by looking at the relation of causal connectibility, and the topology of spacetime can be taken to be the Alexandroff topology. In a word, the main foundational interest of the Alexandroff topology lies in the possibility of determining metrical facts from topological facts involving the interiors of the light cones of Minkowski spacetime. The possibility of quasi-closed curves in general relativistic models, however, divorces the manifold topology from the Alexandroff topology, because in spacetimes with almost-closed causal curves not all causal automorphisms can also be homeomorphisms. However, it should be admitted that the role of the Alexandroff topology in foundational studies illustrated here is a far cry from showing that Alex is relevant for physics. In particular, there is no connection between the Alexandroff topology and temporal properties, despite the fact that stably causal spacetimes − for which the above relationship between manifold topology and Alexandroff topology a fortiori holds − do have a cosmic time.26 The latter concept introduces another important issue: Arthur claims that it is in the application of physics, rather than with respect to physical laws, that the now has a role. He correctly points out that it makes a great deal of difference with respect to available evidence whether it is supposed that the Big Bang occurred 14 or 26 billion years ago: "the relativity of the now to certain events (humans having theories) does not detract from its objectivity".27 However, here the question is not the objectivity but the relevance of the now for physics.
24 Robert Wald, General Relativity, University of Chicago Press, Chicago, 1984, p. 196. Strong causality is weaker than stable causality, which in turn is necessary and sufficient for the existence of a cosmic time function.
25 Sklar, ibid., p. 79. See also John Winnie, "The causal theory of spacetime", in: J. Earman, C. Glymour, and J. Stachel (Eds.), Foundations of Spacetime Theories, University of Minnesota Press, Minneapolis, 1977, pp. 134-205.
26 The use of closed Alex diamonds in algebraic quantum field theory as a basis to define an algebra of observables is also interesting, but it cannot be expanded upon here.
27 Ibid., p. 152.
In the sentence just quoted, we can give the truth conditions of the above tensed expression by simply pointing out that, relative to a certain class of events in spacetime, where we are tenselessly located at the moment in which we use a certain theory, the Big Bang has occurred 14 billion years before those events, rather than 26 billion before. But let us even grant a role to the tensed expression "billions of years ago" (we can accept that indexicals have essential meanings), and let us suppose that the "located" above is "located now". Why should this expression entail reference to the set of events that is in mutual causal interaction with the worldline that, originating with the Big Bang, intersects the region of spacetime in which we now refer to that first event? Given the possibility of using a cosmic time, as is customary in Friedmann-Robertson-Walker cosmological models, we can interpret that tensed language (billions of years ago) as referring − in a token-reflexive way, and in the moment in which we use the theory − to the proper time of the "fundamental observer" associated with the mean motion of matter in our local cluster or supercluster of galaxies. Such a proper time can then be extended, given certain assumptions of isotropy, to the proper time of other "fundamental observers". Note that (i) this standard manoeuvre is fully compatible with a denial of the possibility of using cosmic time to defend presentism or a metaphysics of tenses, but also that (ii) our need to refer to the beginning of time and space cannot be said to provide evidence for the postulation of Alex. In virtue of the Cosmological Principle, in fact, "having the age of ... years" is a property that the Universe has in all spacelike-related regions now sharing the same values of pressure and density that we are measuring. The property in question is therefore not relative just to us users of the physical theory at a particular point/region along a particular worldline, as seems to be implied by a structure like Alex.
5 ANOTHER MODEL OF THE PRESENT OF OUR EXPERIENCE?
The point of this final section is to reinforce the thesis already argued for that not only is Alex unsuitable to represent or explain some features of the present of our experience, but also that no other structure of Minkowski spacetime could do the job. This will allow me to conclude that we should not try to use such a spacetime to explain or account for features of the present of our experience, in the weakest possible sense of "explain" or "account". Given the facts (already argued for) that (i) Alex has no important applications in physical theories; (ii) it does not seem a plausible, strong and non-arbitrary explanation of the extendedness of our subjective present; (iii) it does not correctly pick out the events we intend to pick out when we use "now" in ordinary language; and (iv) these seem the only reasons to introduce it, I conclude that we should drop it. With respect to (ii), it seems reasonable to add to the previously discussed counterexamples that Alex should also contain events that we regard as present
simply because their temporal separation is so small as to fall below our capacity to tell them apart. There is a lot of experimental evidence in this field, which for our purpose we will consider only in summary, and which is nevertheless very relevant in order to explain the central feature of our experience of the now, namely its spatially extended appearance.28 Owing to the reaction time that is necessary to distinguish two light signals as temporally successive, we perceive as simultaneous events that are separated by less than a given threshold. In general, the threshold necessary to tell two signals apart differs from person to person, and varies also from sensory modality to sensory modality. But on average, and for visual stimuli, empirical research indicates that there is a threshold of 30 ms for a person to be able to tell two flashes of light apart, let alone determine their temporal order, for which at least 45 ms are necessary. Multiplying the threshold of 30 ms by the speed of light c, we get 30 × 10−3 s × 300,000 km/s = 9,000 km. A flash a that originated on the surface of a sphere whose radius is 9000 km, simultaneously with event b where I am located, registers on my retina 30 ms later, at event d.
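A quick check of this arithmetic (my own sketch; the 30 ms threshold is the value quoted above, the rounded value of c is an assumption of mine):

```python
# Radius of the region whose light reaches us within one 30 ms discrimination threshold.
c_km_per_s = 300_000          # speed of light, rounded
threshold_s = 30e-3           # ~30 ms fusion threshold for visual stimuli
print(c_km_per_s * threshold_s)   # 9000.0 km
```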
More generally, we perceive anything inside a sphere centered on b, with a radius of 9000 km, as simultaneous with all the points between b and d, which are separated by 30 ms. This sphere, which in spacetime is really a cone, is visualized as the shaded area of fig. 2. By remodelling Savitt's suggestion in the right way, the succession of those cones could give us the transient now. Should we then propose the grey area in the figure as a more faithful replacement of Alex, and claim that the succession of the grey, conical regions along worldlines represents the passage of time?
28 Some of this literature is covered in Callender (ibid.), who correctly invites philosophers of time who, like ourselves, are trying to bridge experiential time with physical time, to devote more attention to experimental findings in neuropsychology.
I would be ready to maintain that the grey wedge or conical model is much less arbitrary than Arthur and Savitt's proposal, as it is based on well-confirmed psycho-physiological data, and that, therefore, it describes more precisely the extended character of our psychological present, in which perceived space, regarded as Leibniz's order of coexistence, is a construction of our brains. Note that since the light signals intersecting the vertical segment b-d one after the other are physically temporally separated but cannot be perceived as being temporally separated, they should be treated as part of the set of points that are subjectively regarded as present. Events like c are not in Alex [b,d] but are perceived as simultaneous with b (at the center of the sphere) and with a, at its surface, and therefore belong to the same psychological present of the observer located at b. Events like these count as further instances of events that Alex does not contain, and that we are forced by our innate perceptual machinery to regard as present.
6 CONCLUSION
In sum, we should not continue the game that Arthur and Savitt initiated by playing it with a different (and more appropriate) ball, namely the conical model. We have already noted how the fusion of the past with the present, of what is far away with what is occurring right here, explains why our experience of time "misrepresents" the physical, objective temporal separation of events. The misrepresentation, or better, the temporal fusion of timelike-related or lightlike-related events, is due to the fact that the time of occurrence of events is merged with the time of our perceiving them. To acknowledge that the present is extended means to acknowledge that the "past-there" is fused with the "present-here": our treating them as simultaneous is due to the psychophysical threshold needed to tell events apart, plus the considerable speed of light. From an evolutionary viewpoint, this fusion does not represent any terrible distortion of our environment, insofar as objects, as noted by Butterfield (see note 19), do not change their properties too much during the short time needed by light to inform us about them and by our nervous system to process the relevant information. Notice that if we acknowledge that our experience of time is a fusion of events that from the viewpoint of physics are temporally separated, then it does not make much sense to try to represent with a geometrical/physical structure the set of events that we represent as being present, except by specifying that these events are part of a single experience of simultaneity. Consequently, we do need to distinguish the "physical present" of events from the psychological present. The former, due to the relativity of simultaneity, must in Minkowski spacetime be represented as pointlike (each physical event is present, from its own "perspective", when and where it occurs). It is only the latter that is extended, and it can include many pointlike physical events, many of which are temporally separated from one another: violations of achronality are admissible only for the psychological present, but not for the physical present.
In a word, if we want to bridge the gap between the time of physics and the time of our experience, as Savitt correctly urges us to do, it is important to begin our work by recognizing their important differences.
Department of Philosophy University of Rome 3 Via Ostiense 234 00144 Rome Italy [email protected]
TOMASZ PLACEK

A LOCUS FOR "NOW"
ABSTRACT
We investigate the concepts of past, present, and future that build upon a modal distinction between the settled past and the open future. The concepts are defined in terms of a pre-causal ordering and of qualitative differences between alternative histories. Finally, we look at what an event's past, present, and future look like in the so-called Minkowskian Branching Structures, in which histories are isomorphic to Minkowski spacetime.
"What is the present?"1 We typically understand this question as being relative to events, that is, we fix our attention on some event (frequently, an event of our utterance) and query what the present of this event is. The question has many facets, two of which we have set apart for the purposes of this paper. First, we may be concerned with when the "now" is, or what a locus for "now" is. The aim is to indicate a part of our world, or a region of spacetime, as a locus for the present of an arbitrary event. In other words, the aim is to define the set of events co-present (contemporaneous) with a given one, or the set of locations of such events. Secondly, one may wonder what differentiates the present of an event from its past and its future. This second question quickly leads us to considerable metaphysical queries: "Is becoming real?", or "Is the distinction between tenses objective or mind-dependent?". Clearly, the second question is much harder; moreover, a positive answer to it presupposes some answer to the first question. So we put it aside, and focus upon what a locus for "now" is. The problem is that, starting with the papers of Rietdijk (1966) and Putnam (1967), there have been arguments showing that special relativity (SR) is inimical to any intuitive notion of the present, where "intuitive" here means that it is based on a relation of co-presence that is transitive and neither the identity nor the universal relation on Minkowski spacetime.2
1 I would like to thank the audience at the ESF workshop "Physical and Philosophical Perspectives on Probability, Explanation and Time" and the audience of my lunch talk at the Center for Philosophy of Science of the University of Pittsburgh in February 2010. The paper also owes much to the discussions I had with Jacek Wawer and Leszek Wroński. The MNiSW research grant 668/N-RNP-ESF/2010/0 is gratefully acknowledged.
2 Cf. Malament (1977), Dieks (1988), Stein (1991), van Benthem (1991), or Rakić (1997). For a present assessment of Putnam's argument, cf. Dorato (2008).
Although these arguments deserve a separate analysis, to keep the length of this paper short, we will limit ourselves to this not-so-rigorous formulation of the result:
R: The following set of premises is logically incoherent: (1) the relations used to define co-presence, and co-presence itself, are invariant with respect to automorphisms of Minkowski spacetime, (2) co-presence is a transitive relation on Minkowski spacetime, (3) of two co-present events, one cannot be causally before (or after) the other, and (4) co-presence is neither the identity nor the universal relation on Minkowski spacetime.
Attempts to blunt the impact of this result boil down to arguing that a failure of one of premises (1)-(4) is not as bad as it looks.3 In this paper, motivated by an intuition that associates the future with contingency, we construct a spatiotemporally extended and frame-independent notion of the present. The construction does not contradict result R (how could it?): to accommodate contingency, we will distinguish a special set of points of Minkowski spacetime, called splitting points and thought of as locations of chancy events. To define the present, we will use relations like "x is a splitting point and y lies within the future light cone of x", which clearly are not invariant with respect to automorphisms of Minkowski spacetime. Moreover, we will end up with a notion of tenses that will be separate from causal notions defined in terms of light cones. The approach is intended to be conciliatory: on the one hand, we invite the reader to modify her notion of the present. On the other, we take it that neither special relativity nor general relativity is our ultimate truth, so perhaps one day there will be a theory of both spacetime and chanciness. Needless to say, we bet on our world turning out to be chancy.4
1 MAIN INTUITION
There is a strand in philosophy that associates the future with open possibilities, the past with settled facts, and the present with a region of passage from possibility to settledness. The view had a strong proponent in Aristotle.5 In recent times, the idea was defended by Whitrow (1961, pp. 295–296):
3 For an example, see Savitt (2000) or Dieks (2006).
4 The construction developed here is in some aspects similar to that of Müller (2006) and to a model I gave at Logica 2002 (unpublished). Some ideas presented here were born in discussions I had with T. Müller in the years 2002–2004. I am very grateful to him for sharing his insights with me. It seems to me that Fred Muller once held similar views on loci for the past, present, and future.
5 Cf. Cael I.12: "No capacity relates to being in the past, but always to being in the present or future."
Strict causality would mean that the consequences pre-exist in the premises. But, if the future history of the universe pre-exists logically in the present, why it is not already in the present? If, for the strict determinist, the future is merely “the hidden present”, whence comes the illusion of temporal succession? The fact of transition and ‘becoming’ compels us to recognize the existence of an element of indeterminism and irreducible contingency in the universe. The future is hidden from us—not in the present, but in the future. Time is the mediator between the possible and the actual.
Similar elaborations on this view can be found in Eddington (1949) and (1953). The doctrine that the objectivity of the distinction between the past, the present, and the future requires indeterminism (or some aspect of contingency, or a failure of universal causation) has been vigorously opposed.6 But, strangely enough, no friends or foes of the doctrine have belaboured the underlying association between future and contingency to the point of stating it with a rigor that would make the association amenable to formal treatment. It is exactly this task to which we now turn. How then is the future different from the past and the present? Supposedly, in contrast to the latter, the future has some aspect of contingency. Yet what is this aspect, exactly? Note that once we decide on how to respond to this question, we will get a grip on a concept of the future, from which a characterization of the past and the present would follow naturally. We will define events in the past of event e as those events from whose perspective e was in the future. Having the notions of "events in the past of e" and "events in the future of e", we will declare that events co-present with e are exactly those events that are neither in the past nor in the future of e. In the above elucidation of what the past is we used tenses ("was in the future"); similarly, we will invariably use words like "after" or "before" in our final definition of the past, the present, and the future. This might invite the objection that our definition is circular. To forestall this possible confusion, we assume here a pre-causal ordering of the totality of possible point events, and that this ordering is partial. The ordering is similar to the SR ordering in terms of light cones, but generalized to modal contexts. We will read the ordering e ≤ e′ as "e can causally influence e′", or "e′ belongs to a possible continuation of e". The "after" and "before" will refer to this ordering. In a similar vein, the tense operators will be standardly defined in terms of the pre-causal ordering. As a consequence of this approach, we will get a certain separation between causal notions (including the tense operators) and the notions of past, present, and future.7 Turning to belabouring the future–contingency link, let us begin with the question: why does my toast on New Year's Eve 2012 belong to the future of my present utterance? As a first approximation, take the answer
6 See, e.g., Gale (1963), as it is an attempt to rebut Whitrow's and Eddington's arguments.
7 I am indebted to D. Dieks, K. Kishida, and J. Wawer for clarifying the distinction between a causal ordering and past, present, and future, and for the observation that tense operators (Will, Was) are defined in terms of the causal ordering, and separated from notions of past, present, and future, as here analyzed.
"It belongs to the future only if it might fail to occur." Evidently, this answer is too strong, as it excludes from the future of e any event that occurs after e in every possible continuation of e. In other words, an event deterministic from the perspective of e cannot belong to the future of e on this construal. As an improvement, consider this: "My toast on New Year's Eve 2012 belongs to the future of my present utterance only if the way it will occur is not settled yet". On this proposal, the toast in question belongs to the future of my present utterance since, for instance, it is not yet settled where I will have it. This answer is again too strong, for exactly the same reason as the previous one. What seems to me a minimal link between future and contingency is the following formulation: "My toast on New Year's Eve 2012 belongs to the future of my present utterance iff the toast is consistent with the utterance and before the toast there is an event and some aspect of it that is not settled yet."8 Here "before" is understood weakly, as "before or identical to". The requirement of consistency excludes from the future of my utterance those possible events that do not occur in a history to which my utterance belongs. To illustrate this analysis, although it is inevitable that my old-fashioned mechanical wall-clock will strike in 52 minutes, this event belongs to the future of my present utterance, because there are some events before it that are in some respect contingent from the present perspective. The New Year's Eve examples suggest that the future–contingency link should be minimal, which strongly favors our third analysis. There seems to be, however, an opposite intuition as well, which takes the event of our clock striking in 52 minutes as not really belonging to the future, since (given our assumptions) it is already settled that the clock will strike in 52 minutes. The feeling is that the clock mechanism is somewhat "isolated" from its surroundings, and especially from chancy events in its past. No matter what, it will strike. On reflection, the truth of the "already settled" sentence above means that, even if there are many histories to which my utterance belongs, in every such history there is our clock striking in 52 minutes. There is thus a disjunctive event of our clock striking that is contained in many histories. Lewis calls such events "non-fragile" since, even if our clock struck a bit differently, we would call it "the same event as the actual striking of our clock". There is, however, another concept of events, that of fragile or non-disjunctive events. On this concept of events, if our clock's striking were minimally different from the actual one, even by merely having a minimally different past, this event would not count as identical with the actual striking. The feeling of a mechanism isolated from a neighboring chancy event stems from our concentration on settled truth and the underlying disjunctive events.
The word ‘iff’ abbreviates ‘if and only if’.
At the level of non-disjunctive events, the phenomenon is absent: a slight chancy event brings in a non-erasable difference for the future.9 To further elaborate on our third analysis, we will put it down in words as below:
Condition 1 f is in the future of e iff e is consistent with f and there is some event e′ before or identical to f and a subject matter A such that at e it is contingent that A at the space-time location of e′.
To put rigour into our intuition, we need to combine spacetime with modality. The only rigorous framework for this task is the theory of branching space-times (BST) of Belnap (1992), which in turn is a development of an earlier theory of branching time (BT).10 The development consists in the fact that BST is able to account for spatial and relativistic aspects in addition to the modal and temporal aspects analyzed in BT. Both branching theories can be seen as addressing two problems. One is an ontological question: what does the indeterministic world look like? The other problem is semantic, namely, how to formally model a language with tenses, modal operators and indexicals? It is BST's capacity to handle the second (semantic) problem that we need in the present paper. The basic insight of branching theorists, owed to Prior, is that sentences are evaluated as true or false at event-history pairs, which leads to giving more structure to evaluation points. Designating the evaluation point by e/h, we will have, for a sentence A unsettled in the future of e: e/h1 |= Will : A but e/h2 ⊭ Will : A.
1.1 Models of BST
A model of BST, ⟨W, ≤⟩, is a non-empty partially ordered set of possible point events ordered by a pre-causal relation, subject to some postulates.11 Histories in ⟨W, ≤⟩ are identified with particular (maximal upward directed) subsets of W. A BST model ⟨W, ≤⟩ can serve as a basis of a semantic model ⟨⟨W, ≤⟩, I⟩ for a propositional language with tenses and modal operators, and the indexical "here-and-now". Here I is an interpretation function I : Atoms → P(W), where Atoms is the set of atomic formulas. It is understood that atomic formulas of this language have the form: "Here-and-now there is property A". Turning to truth conditions, here are a few examples (for more information on BST semantic models, cf. Belnap (2007), Müller (2002), and Placek and Müller (2007)). To avoid lengthy notation, we abbreviate the point of evaluation ⟨⟨⟨W, ≤⟩, I⟩, e/h⟩ to e/h.
9 This problem was brought to my attention by Bryan Roberts.
10 BT theory was suggested in S. Kripke's letter to A. N. Prior (dated September 3, 1958, unpublished), discussed then briefly in Prior (1967) and worked out in Thomason (1970).
11 For the postulates, and more information on BST models, see Belnap (1992).
e/h |= A iff e ∈ I(A), for A an atomic formula;
e/h |= ¬φ iff it is not the case that e/h |= φ;
e/h |= Will : φ iff ∃e′ > e : e′/h |= φ;
e/h |= Was : φ iff ∃e′ < e : e′/h |= φ;
e/h |= Poss : φ iff ∃h′ : e ∈ h′ ∧ e/h′ |= φ.
Note that in the last clause, since we quantify over histories on its right-hand side, the reference to history on the left-hand side is redundant. We will thus write e |= Poss : A instead of e/h |= Poss : A. Some (but not all) BST models allow for more structure, as one can define spacetime locations (st-locations for short) on them. A set Loc of st-locations for a BST model ⟨W, ≤⟩ is a partition of W that is conservative with respect to the ordering ≤ − cf. Müller (2005). An st-location is a relativistic counterpart of our everyday thinking of what would happen at the time or in the location of a given event, if things went differently at some junction in the past. Note that we have thus arrived at the distinction between an event (i.e., an element of W) and the st-location of an event (an element of a particular partition Loc of W). To denote the st-location of event e, we will write loc(e). In what follows, we need to consider sentences of the form "At st-location x it is φ", like "The value of the electromagnetic field at t, x, y, z is such-and-such." The truth conditions for such sentences can only be formulated with respect to a BST model with a set Loc of st-locations:
⟨⟨W, ≤, Loc⟩, I⟩, e/h |= Atx : φ iff ∃e′ : e′ ∈ h ∩ x ∧ e′/h |= φ, where x ∈ Loc.
We are now able to formulate the intuition of Condition 1 within the language of BST:
Definition 2 An event f belongs to the future of event e, f ∈ Future(e), iff there is an event e′ and an atomic formula A such that
1. there is a history h such that e, f ∈ h, and
2. e′ ≤ f, and
3. e |= Poss : Atloc(e′) : A, and
4. e |= Poss : Atloc(e′) : ¬A.
Event p belongs to the past of event e, p ∈ Past(e), iff event e belongs to the future of p. Event e′ belongs to the present of event e, e′ ∈ Present(e), iff there is a history h such that e, e′ ∈ h and e′ belongs neither to the past nor to the future of e.
The future, present and past as defined above are global, that is, whether an event belongs to the past / present / future of event e depends on the possibilities open in the histories to which e belongs. Technically speaking, clauses (3) and (4) of the definition of the future of e require quantification over all histories comprising e.
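As an illustration only (my own toy encoding, not Placek's formalism): the Will / Was / Poss clauses above can be run on a small finite model, with events as strings, the pre-causal order as a set of pairs, and histories as sets of events; all names below are assumptions of mine.

```python
# A toy BST-style model: e0 is followed by e1 in one history and by e2 in the other.
order = {("e0", "e1"), ("e0", "e2")}                  # pre-causal order: e0 < e1, e0 < e2
histories = [frozenset({"e0", "e1"}), frozenset({"e0", "e2"})]
I = {"A": {"e1"}}                                      # the atom "A" holds only at e1

def before(x, y):
    return (x, y) in order

def holds(formula, e, h):
    kind = formula[0]
    if kind == "atom":                                 # e/h |= A iff e in I(A)
        return e in I[formula[1]]
    if kind == "not":
        return not holds(formula[1], e, h)
    if kind == "will":                                 # some later event in h satisfies it
        return any(before(e, e2) and holds(formula[1], e2, h) for e2 in h)
    if kind == "was":                                  # some earlier event in h satisfies it
        return any(before(e2, e) and holds(formula[1], e2, h) for e2 in h)
    if kind == "poss":                                 # some history through e satisfies it
        return any(e in h2 and holds(formula[1], e, h2) for h2 in histories)
    raise ValueError(kind)

will_A = ("will", ("atom", "A"))
print(holds(will_A, "e0", histories[0]))               # True: A will hold (via e1)
print(holds(will_A, "e0", histories[1]))               # False: in this history A never holds
print(holds(("poss", will_A), "e0", histories[1]))     # True: it is possible that A will hold
```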
L. Wroński suggested to me (in a private communication) that for some purposes relativised notions of the past / present / future are more adequate. Typically we do not know about all the possibilities available from a given history. We might thus want to relativise the investigated notions to some set of possibilities, those we know or those that are available in our vicinity. Technically, this proposal amounts to relativising the operator Poss to a set H of histories: for H ⊆ {h ∈ Hist | e ∈ h}, e |= PossH φ iff ∃h (h ∈ H ∧ e/h |= φ). As a result of replacing Poss by the relativised operator PossH in Definition 2, typically the future of e as well as the past of e would become smaller, making the present of e larger.
1.2 Minkowskian Branching Structures (MBS)
Although the above definition adequately (we believe) captures our informal statement of Condition 1, it does not permit us to "see" what the future, and hence the past and the present, of an event are. This is a consequence of the generality of BST, which leaves it open what structure BST histories have, as long as they are maximal upward directed subsets of a base set. Thus, to address the "see" question, we need to make it relative to a specific concept of spacetime, and then consider such BST models in which histories are isomorphic to the spacetime in question. We will investigate the problem for Minkowski spacetime. A particular class of BST models, in which every history is isomorphic to Minkowski spacetime, has been investigated by Müller (2002), Wroński and Placek (2009), and Belnap and Placek (2010). To begin with an informal notion (to be proved identical to BST histories), a possible scenario can be thought of as Minkowski spacetime plus physical content. The content can be represented by an attribution of "point properties" (typically, strengths of physical fields), i.e., a function from ℜ4 to P(P), where P is the set of point properties. To get a modal aspect,12 we need a system of such "physical contents". A system of this sort is represented by a property attribution F : ℜ4 × Σ → P(P), where Σ is the set of labels for scenarios. Since we haven't (yet) imposed any restrictions on property attribution functions, we should expect that they will produce strange property attributions, or at least ones incapable of receiving a BST reading. Thus, in an attempt to arrive at BST models, we single out the class of "proper" property attributions. We shall put our requirement informally first: for F to be a proper property attribution, we require that every two scenarios σ, η ∈ Σ are qualitatively different somewhere, and that if they are different at some point, there is a special point c ∈ ℜ4 below it (called a splitting point for σ and η). Its special character consists in the fact that (1) σ and η agree at and below c, and that (2) for a point x above c, no matter how close x is to c, there is always an even closer point above c at which σ and η disagree in content.13
12 As exemplified for instance in saying "It is φ at x ∈ ℜ4, but it could be ψ there".
13 The background of the requirement is the density of ≤, which is a BST postulate.
Note that while postulating a complete qualitative agreement at and below c, we do not require a complete disagreement above c; we readily permit that over large regions above c the scenarios are qualitatively the same − as long as they are different at locations arbitrarily close to c and above c. The locutions "above" and "below" refer here to the so-called Minkowskian ordering ≤M of ℜ4:
x ≤M y iff ∑_{i=1}^{3} (x^i − y^i)^2 ≤ (x^0 − y^0)^2 and x^0 ≤ y^0,   (1)
with a resulting strict ordering <M defined in the usual way. The relation of being space-like related (SLR) is also typically defined: two points are SLR iff they are incomparable by ≤M. Putting the above informal explanation in symbols, we have this:
Definition 3 A property attribution F : ℜ4 × Σ → P(P) is proper iff for every σ, η ∈ Σ (σ ≠ η)
there is x ∈ ℜ4 such that F(x, σ) ≠ F(x, η), and   (2)
(for every x ∈ ℜ4) if F(x, σ) ≠ F(x, η), then there is c ∈ ℜ4 such that c <M x and
∀z ∈ ℜ4 (z ≤M c → F(z, σ) = F(z, η)) and   (3)
∀x′ ∈ ℜ4 (c <M x′ → ∃y ∈ ℜ4 (c <M y <M x′ ∧ F(y, σ) ≠ F(y, η))).   (4)
Points of ℜ4 that satisfy conditions 3–4 constitute what we call the set Sση of splitting points for σ and η. From this definition of proper property attribution some desired properties of sets of splitting points are deducible.14 To state them, it is useful to distinguish special subsets of ℜ4, thought of as regions of no qualitative difference of histories, and defined as Rση := {x ∈ ℜ4 | ¬∃c (c <M x ∧ c ∈ Sση)} for σ, η ∈ Σ.
Fact 4 Assume that F : ℜ4 × Σ → P(P) is a proper property attribution. Then:
1. σ ≠ η → Sση ≠ ∅;
2. Sση = Sησ;
3. ∀c, c′ ∈ Sση (c ≠ c′ → c SLR c′);
4. x ∈ Rση → F(x, σ) = F(x, η); and
5. ∀σ, η, γ ∈ Σ: Rση ∩ Rηγ ⊆ Rσγ.
14 We follow here the construction of Belnap and Placek (2010), which is more "physical" than the others, since it derives BST structures from property attributions. Apart from the proper property attributions, these authors assume a topological postulate and a condition on chains of splitting points.
Clearly, Σ is not a set of BST histories, and ≤M is not a BST ordering. To produce a BST model, we need to construct these latter notions, showing that they satisfy the BST postulates. In this task, we follow Müller's (2002) construction, to which the reader should turn for more information. First, we define the relation ≡ on ℜ4 × Σ:15 xσ ≡ yη iff x = y and x ∈ Rση. Provably, ≡ is an equivalence relation on ℜ4 × Σ. Next, we define a BST event as an equivalence class with respect to ≡, that is, [xσ] := {yη | yη ≡ xσ}. A BST ordering is defined as follows: [xσ] ≤ [yη] iff [xσ] = [xη] ∧ x ≤M y. Importantly, it turns out that Σ is indeed a set of labels for histories, as every BST history is of the form {[xσ] | x ∈ ℜ4} for σ ∈ Σ. Moreover, given that a property attribution is proper and an additional postulate is satisfied,16 [xσ] is a maximal element in the overlap of two histories {[xσ] | x ∈ ℜ4} and {[xη] | x ∈ ℜ4} iff x is a splitting point for these histories, i.e., x ∈ Sση. The construction should finish with proofs that the resulting structure is indeed a BST model.17 Figure 1 illustrates two Minkowskian Branching Structures, the first with two histories and one splitting point, and the second with four histories and two splitting points. The shaded area indicates where a given history overlaps with the first history. Formally speaking, an MBS is a triple ⟨Σ, P, F⟩, where Σ is a set of labels for scenarios, P is a set of point properties, and F is a proper property attribution. A merit of this construction is that ⟨Σ, P, F⟩ provides a natural semantic model for a propositional language with tense and modal operators, whose atomic sentences have the form "It is ψ here-and-now", where ψ ∈ P. Furthermore, the proper property attribution F determines the interpretation function I in the following manner: [xσ] ∈ I(A) iff ψ ∈ F(x, σ), where A = "It is ψ here-and-now". The BT/BST truth conditions for tense and modal operators can be readily reformulated in the MBS framework.
15 To avoid eyestrain, we write xσ rather than ⟨x, σ⟩.
16 The postulate is: every convergent sequence in a set Sση is convergent to an element of Sση − cf. Belnap and Placek (2010).
17 For the proofs we refer the reader to Belnap and Placek (2010), or Müller (2002) and Wroński and Placek (2009).
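A toy illustration may help here (my own sketch, not part of the construction or its proofs; the coordinates, the single splitting point, and all names are assumptions of mine): with two scenarios and one splitting point, two labelled copies xσ and xη of a point collapse into one BST event exactly when x is not strictly above the splitting point.

```python
# Two scenarios, one splitting point c = (0, 0); points are (t, x) in 1+1
# Minkowski coordinates with c = 1. xσ and xη are identified iff x lies in
# R_sigma_eta, i.e. iff x is not strictly above the splitting point.
SPLITTING_POINTS = {frozenset({"sigma", "eta"}): (0, 0)}

def strictly_above(c, x):
    """c <_M x: x is on or inside the future light cone of c and distinct from it."""
    dt = x[0] - c[0]
    return x != c and dt >= 0 and (x[1] - c[1]) ** 2 <= dt ** 2

def same_bst_event(x, s, t):
    """Do the labelled points x s and x t determine the same BST event [x s]?"""
    if s == t:
        return True
    c = SPLITTING_POINTS[frozenset({s, t})]
    return not strictly_above(c, x)

print(same_bst_event((-1, 0), "sigma", "eta"))   # True: shared region below the split
print(same_bst_event((2, 1), "sigma", "eta"))    # False: above c, the histories have branched
```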
Figure 1: Top: an MBS with one splitting point and two histories. Bottom: an MBS with two splitting points and four histories. Shaded regions indicate the intersection of a given history with a reference history σ.
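To make the construction just summarised more tangible, the following toy computation may help (it is our illustration, not part of the paper; the two-dimensional integer grid standing in for ℜ⁴, the three labels, and the stipulated splitting points are all our assumptions). It checks by brute force that the relation ≡ defined above is an equivalence relation and that the induced ordering on events is reflexive and antisymmetric, and it counts the resulting histories. The splitting points are stipulated directly, consistently with where the attribution F makes the scenarios differ, since the "arbitrarily close above" clause of Definition 3 has no exact discrete analogue.

```python
# Toy, discretised MBS sketch (ours): three scenario labels, two stipulated
# splitting points c2 <M c1, and a property attribution that makes the
# scenarios differ only strictly above "their" splitting point.
POINTS = [(t, s) for t in range(6) for s in range(-3, 4)]   # stand-in for R^4
SIGMA = ('sigma', 'eta', 'tau')
c1, c2 = (3, 0), (1, 0)

def leq(p, q):  # discrete Minkowski order: q lies in the closed future cone of p
    return q[0] - p[0] >= abs(q[1] - p[1])

def lt(p, q):
    return leq(p, q) and p != q

def F(x, lab):  # property attribution F(x, label) -> set of point properties
    props = set()
    if lab == 'eta' and lt(c1, x):
        props.add('p1')
    if lab == 'tau' and lt(c2, x):
        props.add('p2')
    return frozenset(props)

SPLIT = {frozenset(('sigma', 'eta')): {c1},
         frozenset(('sigma', 'tau')): {c2},
         frozenset(('eta', 'tau')): {c2}}

def R(a, b):  # region of no qualitative difference R_{a,b}
    if a == b:
        return set(POINTS)
    return {x for x in POINTS if not any(lt(c, x) for c in SPLIT[frozenset((a, b))])}

RDICT = {(a, b): R(a, b) for a in SIGMA for b in SIGMA}

def equiv(p, q):  # the relation  x a == y b  iff  x = y and x in R_{a,b}
    (x, a), (y, b) = p, q
    return x == y and x in RDICT[(a, b)]

pairs = [(x, a) for x in POINTS for a in SIGMA]
assert all(equiv(p, p) for p in pairs)                                   # reflexive
assert all(equiv(q, p) for p in pairs for q in pairs if equiv(p, q))     # symmetric
assert all(equiv(p, r) for p in pairs for q in pairs for r in pairs
           if equiv(p, q) and equiv(q, r))                               # transitive

def event(x, a):  # the BST event [x a] as an equivalence class
    return frozenset(q for q in pairs if equiv((x, a), q))

def ev_leq(e, f):  # [x a] <= [y b] iff some label shared by e and f witnesses x <=M y
    return any(a == b and leq(x, y) for (x, a) in e for (y, b) in f)

events = {event(x, a) for (x, a) in pairs}
assert all(ev_leq(e, e) for e in events)                                 # reflexive
assert all(e == f for e in events for f in events
           if ev_leq(e, f) and ev_leq(f, e))                             # antisymmetric

histories = {lab: frozenset(event(x, lab) for x in POINTS) for lab in SIGMA}
print(len(set(histories.values())), "distinct histories on the toy grid")  # -> 3
```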
As an example, here are the truth conditions for Poss and At_y (y ∈ ℜ⁴):

[xσ]/σ |= Poss : B iff there is η ∈ Σ such that [xσ] = [xη] and [xσ]/η |= B
[xσ]/σ |= At_y B iff [yσ]/σ |= B, where y ∈ ℜ⁴.   (5)

Since in the clause for Poss the reference to label σ after the stroke is redundant, we will write [xσ] |= Poss : B for [xσ]/σ |= Poss : B.
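To illustrate how these clauses work, here is a small self-contained sketch (ours; the two-history model, the discretisation, and all names are assumptions made for the example, not taken from the paper). It implements the clauses for atomic sentences, At_y and Poss on a toy MBS with a single splitting point c: from a point below or space-like to c both histories are accessible and the value of p at a point above c is still open, whereas from a point above c only one history remains accessible and that value is settled.

```python
# Toy semantics sketch (ours): one splitting point c, two scenario labels,
# and the clauses  [x s]/s |= At_y B  and  [x s] |= Poss : B  from (5).
POINTS = [(t, s) for t in range(5) for s in range(-2, 3)]
SIGMA = ('sigma', 'eta')
c = (1, 0)

def lt(p, q):                       # strict discrete Minkowski order
    return (q[0] - p[0] >= abs(q[1] - p[1])) and p != q

def F(x, lab):                      # 'eta' differs from 'sigma' only strictly above c
    return frozenset({'p'} if lab == 'eta' and lt(c, x) else set())

def same_event(x, a, b):            # [x a] = [x b]: no splitting point strictly below x
    return a == b or not lt(c, x)

def sat_at(x, lab, y, psi):         # [x lab]/lab |= At_y psi   iff   psi in F(y, lab)
    return psi in F(y, lab)

def poss_at(x, lab, y, psi, negated=False):
    # [x lab] |= Poss : At_y psi (or its negation): some history through [x lab]
    # assigns (does not assign) psi to the point y.
    return any(same_event(x, lab, h) and (psi in F(y, h)) != negated
               for h in SIGMA)

x_below, x_above, y = (0, 0), (3, 0), (4, 0)      # y lies strictly above c

# The At clause by itself just shifts the point of evaluation:
print(sat_at(x_below, 'eta', y, 'p'), sat_at(x_below, 'sigma', y, 'p'))            # True False
# From an event located below the splitting point, y's state is still open:
print(poss_at(x_below, 'sigma', y, 'p'), poss_at(x_below, 'sigma', y, 'p', True))  # True True
# From an event located above the splitting point, it is settled (here: settled false):
print(poss_at(x_above, 'sigma', y, 'p'), poss_at(x_above, 'sigma', y, 'p', True))  # False True
```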
2 WHAT DO THE PRESENTS LOOK LIKE?

We will now apply Definition 2 to some selected MBS's in order to get a grasp on what the future, the present and the past of a given event are.

Two splitting points, time-like. Consider first an MBS with three histories, i.e. Σ = {σ, η, τ}, in which the proper property attribution yields two splitting points c1, c2 ∈ ℜ⁴ such that c2 <M c1 and Sση = {c1} and Sστ = Sητ = {c2} (see Figure 2). Pick now an event e := [xσ] that is "between" c2 and c1 in the sense that c2 <M x and c1 ≮M x, and ask: (1) What is the future of e? (2) What is its past? (3) And what is its present? We claim now: For x such that c2 <M x and c1 ≮M x,
1. The future of e = [xσ] is the set of events that are strictly above [c1σ]: Future([xσ]) = {[zγ] | c1 <M z ∧ γ ∈ {σ, η}}.
Figure 2: The past, the present, and the future of e = [xσ].
2. The past of e = [xσ] is the set of events that are in history σ and not strictly above [c2σ]: Past([xσ]) = {[zσ] | c2 ≮M z}.
3. The present of e = [xσ] is the set of events in history σ and "between" c2 and c1 in the sense: Present([xσ]) = {[zσ] | c2 <M z ∧ c1 ≮M z}.

Proof: Ad. 1. (⇒) Let [zγ] ∈ Future([xσ]). There is then [z′γ′] such that (†) [z′γ′] < [zγ] and (‡) [xσ] |= Poss : At_z′ A and [xσ] |= Poss : At_z′ ¬A for some atomic formula A. It follows that for some β, β′ ∈ Σ: (⋆) [xσ] = [xβ] = [xβ′], and [z′β]/β |= A and [z′β′]/β′ |= ¬A. The latter entails (⋄) F(z′, β) ≠ F(z′, β′), and hence β ≠ β′. Given the location of x, it follows from (⋆) that β = σ and β′ = η (or vice versa), so (⋄) implies that c1 <M z′. (†) implies z′ <M z, and hence c1 <M z. The consistency clause requires γ = η or γ = σ.
(⇐) Let c1 <M z. Since c1 ∈ Sση, for some z′ such that c1 <M z′ <M z: F(z′, σ) ≠ F(z′, η). Hence for some atomic A: [z′σ]/σ |= A and [z′η]/η |= ¬A (or vice versa). By the location of x, [xσ] = [xη], and hence [xσ] |= Poss : At_z′ A and [xσ] |= Poss : At_z′ ¬A. Further, each of [zσ] and [zη] is consistent with [xσ] (= [xη]), and [z′σ] < [zσ] and [z′η] < [zη].
Ad. 2. By an argument analogous to the one given above, for every z such that c2 ≮M z, [xσ] ∈ Future([zσ]), from which the sought-for result follows.
Ad. 3. Immediate from (2) and (3) above.
Note that the present of e = [xσ] turns out to be a spatially extended and temporally thick collection of events. Its temporal thickness depends on the Lorentz interval of the (time-like) vector c1c2.

Four splitting points, layered in two SLR pairs. Consider an MBS with five histories, i.e., Σ = {σ, η, τ, ν, γ}, with Sση = {c1}, Sστ = {c2}, Sσν = {c3}, and Sσγ = {c4}. Pick an event [xσ], with x located "between" two pairs of splitting points, ⟨c1, c2⟩ and ⟨c3, c4⟩, each pair being space-like related and each element of the top pair lying above each element of the bottom pair—see Figure 3.
Figure 3: History {[xσ] | x ∈ ℜ⁴} with four splitting points c1, . . . , c4. The present of event [xσ] is indicated by the shaded area.
That is, ∀i ∈ {1, 2} ∀k ∈ {3, 4}: ck <M ci, and (c3 <M x or c4 <M x) and (c1 ≮M x and c2 ≮M x). Applying our Definition 2, we get this result: Present([xσ]) = {[yσ] | (y >M c3 ∨ y >M c4) ∧ (y ≯M c1 ∧ y ≯M c2)}. Thus, the present of [xσ] turns out to have the shape of a thick letter W.

Extreme cases: no point / every point is a splitting point. Consider an MBS with history {[xσ] | x ∈ ℜ⁴} in which no point is a splitting point, i.e., ∀x ∈ ℜ⁴ ∀ρ ∈ Σ: x ∉ Sσρ. By Definition 3 of proper property attribution, the MBS considered consists of exactly one history, that is, Σ = {σ}. This is global determinism. Then for every event, its past as well as its future are empty, from which it follows that for every event, its present is the entire history. We thus have a block universe, indeed. At the other extreme, if a history splits at every point with some other history, that is, if ∀x ∈ ℜ⁴ ∃η ∈ Σ: x ∈ Sση for some σ ∈ Σ, then for every event [xσ], its present consists merely of the event itself.
3 DISCUSSION

Taking as a guide an (alleged) link between tenses and modalities, we defined a frame-independent notion of "the present of an event". The presents of events
can be extended as well as point-like. Importantly, the underlying relation of copresence is transitive. The shape of the present of an event e occurring in history h depends on the splitting points of h, which are locations of chancy events in h. These depend in turn on the localization of qualitative differences between h and other histories. The definition allows for a non-extended present as well as for a global cosmic present, that is, the entire universe. On this construal, the present of e is a set of events, that is, it is an event-like concept. It is not a location-like concept; consequently, one cannot ask in this framework what the present of a given spatiotemporal location is. Observe that two different events sharing the same location must have different presents, since they must belong to alternative histories. Further, these two presents, that is, different sets of events, may have different locations, as chancy events in these alternative histories may have different spatiotemporal locations. Finally, according to our definition, there is a full separation of tenses and causal notions. It might happen that e is in the causal past of f (i.e., within the past light-cone of f), but belongs to the present of f. In the other direction, f might lie outside the causal future of e (i.e., e SLR f), but nevertheless belong to the future of e. This is the price to be paid for not requiring in Definition 2 that the future of e is (causally) after e. The definition also allows that for some two events, each belongs to the future of the other, which further entails that for each event of this kind, its future overlaps with its past. There is a straightforward remedy that prohibits this controversial consequence and removes the separation of tenses and causal notions in one direction: strengthen clauses 1 and 2 of Definition 2, with the following result:

Definition 5 An event f belongs to the future of event e, f ∈ Future(e), iff there is an event e′ and an atomic formula A such that
1. e < e′ ≤ f and
2. e |= Poss : At_loc(e′) A and
3. e |= Poss : At_loc(e′) ¬A.
Event p belongs to the past of event e, p ∈ Past(e), iff event e belongs to the future of p. Event e′ belongs to the present of event e, e′ ∈ Present(e), iff there is a history h such that e, e′ ∈ h and e′ belongs neither to the past nor to the future of e.

As a result, every event in the future of e will be causally after e, and every event in the past of e will be causally before e. In general, this change of Definition 2 will result in smaller futures and smaller pasts, but larger presents. In particular, in a Minkowskian Branching Structure, if there are splitting points arbitrarily close below and arbitrarily close above a given event e, the future of e is the union of the alternative possible future light-cones of e, its past—the past light-cone of e, and its present in a given history containing e—the set of events that belong to this history and are space-like related to e.
Despite the altered definition, some separation of tenses and causal notions remains. To use Aristotle's sea battle, suppose that the two admirals have already brought their hostile fleets near our harbor and have decided to have a battle tomorrow.18 Suppose further that with their decision, and all the circumstances, no matter what, the battle must happen. And not only this: assume as well that there is not a single trace of chanciness between our present thinking and the battle. No quantum decay, no agent's dithering to do this rather than that. If these conditions are satisfied, the battle is now; it belongs to the present of your reading these words now. But obviously the battle is tomorrow, that is, in one day: by the definition of tense operators (as well as by common sense) we say "There will be the sea battle tomorrow", and yet it is also now, presenting us with a clear paradox. Let us finally reflect on where we have arrived. We elaborated on an Aristotelian tradition of associating the future with contingency, choosing for our analysis what seems to be a very weak link between these notions. We then used a rigorous framework of BST and MBS's to write down our definitions, and to see what the past, the present, and the future look like according to these definitions, if the underlying spacetime is Minkowski.19 But then, nicely, we hit upon a paradox. Can we tame it or explain it away? Or is this paradox a reductio of the idea of associating the future with contingency? We leave it for the reader to decide.
18 I owe to J. Bogen the perception of how acute the separation between Will and the future is.
19 As pointed out by J. Norton, physics exhibits neither branching structures similar to those of BST, nor splitting points, nor particular patterns of branching scenarios, the single exception being perhaps quantum measurement, but (ironically) this we hardly understand. Accordingly, BST might not be adequate for analyzing time in our physical world—the objection goes. But in the present approach we derive the axioms of BST from the requirement that the attribution of properties to spatiotemporal points be proper, which is a weak and intuitive requirement. (For the details of the derivation, cf. Belnap and Placek (2010).) Perhaps physics has not yet grasped our everyday modal notion of indeterminism, which means that an event may happen, but not necessarily. (This notion is different from Laplacian indeterminism, elaborated by Montague (1962) and Lewis (1983)—for more on this, see our paper cited above.) As we said in the introduction, we bet that our world is indeterministic in the modal sense, and that physics will come to terms with it. A similar worry is that our analysis cannot be extended to general relativity, since BST axioms are incompatible with some solutions of this theory. Although we do not know how to resolve this problem in general, some initial results in this direction are reported in Placek (2009).
REFERENCES
Belnap, N. (1992). Branching space-time. Synthese, 92:385–434. 'Postprint' archived at PhilSci Archive, http://philsci-archive.pitt.edu/archive/00001003.
Belnap, N. (2002). EPR-like "funny business" in the theory of branching space-times. In T. Placek and J. Butterfield, editors, Nonlocality and Modality, pp. 293–315, Dordrecht. Kluwer Academic Publishers.
Belnap, N. (2003). No-common-cause EPR-like funny business in branching space-times. Philosophical Studies, 114:199–221.
Belnap, N. (2007). An indeterminist view on the parameters of truth. In T. Müller, editor, Philosophie der Zeit, pp. 87–113. Klostermann, Frankfurt a.M.
Dieks, D. (1988). Special relativity and the flow of time. Philosophy of Science, 55(3):456–460.
Dieks, D. (2006). Becoming, relativity and locality. In D. Dieks and M. Rédei, editors, The Ontology of Spacetime, volume 1, pp. 157–176. Elsevier, Amsterdam.
Dorato, M. (2008). Putnam on time and special relativity. European Journal of Analytic Philosophy, 4(2):51–70.
Eddington, A. S. (1949). The Nature of the Physical World. Cambridge University Press, Cambridge.
Eddington, A. S. (1953). Space, Time and Gravitation. Cambridge University Press, Cambridge.
Gale, R. M. (1963). Some metaphysical statements about time. The Journal of Philosophy, 60(9):225–237.
Lewis, D. (1983). New work for a theory of universals. Australasian Journal of Philosophy, 61:343–377.
Malament, D. (1977). Causal theories of time and the conventionality of simultaneity. Nous, 11(3):293–300.
Montague, R. (1962). Deterministic theories. In Decisions, Values, and Groups 2, pages 325–370. Oxford University Press, Oxford. Reprinted in his (1974) Formal Philosophy, ed. R. H. Thomason. Yale University Press, New Haven, CT.
Müller, T. (2002). Branching space-time, modal logic and the counterfactual conditional. In T. Placek and J. Butterfield, editors, Nonlocality and Modality, NATO Science Series, pp. 273–291, Dordrecht. Kluwer Academic Publishers.
Müller, T. (2005). Probability theory and causation: a Branching Space-Times analysis. British Journal for the Philosophy of Science, 56(3):487–520.
Müller, T. (2006). On the problem of defining the present in special relativity: a challenge for tense logic. In Stadler, F. and Stöltzner, M., editors, Time and History. Proceedings of the 28th International Ludwig Wittgenstein Symposium, Kirchberg am Wechsel, Austria 2005, pages 441–458, Frankfurt a.M. Ontos Verlag.
Müller, T., Belnap, N., and Kishida, K. (2008). Funny business in branching space-times: Infinite modal correlations. Synthese, 164(1):141–159.
Placek, T. (2009). Possibilities without possible worlds/histories. Journal of Philosophical Logic, DOI 10.1007/s10992-010-9159-x, forthcoming.
Placek, T. and Belnap, N. (2010). Indeterminism is a modal notion: branching spacetimes and Earman's pruning. Synthese, DOI 10.1007/s11229-010-9846-8, forthcoming.
Placek, T. and Müller, T. (2007). Counterfactuals and historical possibility. Synthese, 154(2):173–197.
Placek, T. and Wroński, L. (2009). On infinite EPR-like correlations. Synthese, 167(1):1–32.
Prior, A. (1967). Past, Present, and Future. Oxford University Press, Oxford.
Putnam, H. (1967). Time and physical geometry. The Journal of Philosophy, 64(8):240–247.
Rakić, N. (1997). Past, present, future, and special relativity. The British Journal for the Philosophy of Science, 48(2):257–280.
Rietdijk, C. W. (1966). A rigorous proof of determinism derived from the special theory of relativity. Philosophy of Science, 33(4):341–344.
Savitt, S. F. (2000). There's no time like the present (in Minkowski spacetime). Philosophy of Science, 67:S563–S574.
Stein, H. (1991). On relativity theory and openness of the future. Philosophy of Science, 58(2):147–167.
Thomason, R. H. (1970). Indeterminist time and truth-value gaps. Theoria, 36:264–281.
van Benthem, J. (1991). The Logic of Time. Kluwer Academic Publishers, Dordrecht.
Whitrow, G. J. (1961). The Natural Philosophy of Time. Thomas Nelson and Sons Ltd., London.
Wroński, L. and Placek, T. (2009). On Minkowskian branching structures. Studies in History and Philosophy of Modern Physics, 40:251–258.
Department of Philosophy
Jagiellonian University
52 Grodzka
31-044 Kraków, Poland
[email protected]
S. E. RUGH AND H. ZINKERNAGEL

WEYL'S PRINCIPLE, COSMIC TIME AND QUANTUM FUNDAMENTALISM
ABSTRACT
We examine the necessary physical underpinnings for setting up the cosmological standard model with a global cosmic time parameter. In particular, we discuss the role of Weyl's principle, which asserts that cosmic matter moves according to certain regularity requirements. After a brief historical introduction to Weyl's principle we argue that although the principle is often not explicitly mentioned in modern standard texts on cosmology, it is implicitly assumed and is, in fact, necessary for a physically well-defined notion of cosmic time. We finally point out that Weyl's principle might be in conflict with the widespread idea that the universe at some very early stage can be described exclusively in terms of quantum theory.
1 INTRODUCTION

A basic characteristic of the Friedmann-Lemaître-Robertson-Walker (FLRW) model is its t parameter, which is employed by cosmologists to trace back the evolution of the universe to its early stages. In a previous examination, we defended a 'time-clock' relation which asserts that time, in order to have a physical basis, must be understood in relation to physical processes which act as 'cores' of clocks (Rugh and Zinkernagel 2009). In particular, we argued that a necessary physical condition for interpreting the t parameter of the FLRW model as cosmic time in some 'epoch' of the universe is the (at least possible) existence of a physical process which can function as a core of a clock in the 'epoch' in question.1 In this paper we shall argue, in conformity with – but independently of – the time-clock relation, that the very set-up of the standard (FLRW) model in cosmology with a global time is closely linked to the motion (and properties) of cosmic matter.2
1 One of our results was that there are interesting problems for making this t ↔ time interpretation, and thus establishing a physical basis for cosmic time (in particular for a cosmic time scale), at least at ∼ 10⁻¹¹ seconds after the "big bang" – that is, approximately 30 orders of magnitude before (in a backwards extrapolation of the FLRW model) Planck scales are reached.
It is often assumed that the FLRW model may be derived just from the cosmological principle, which states that the universe is spatially homogeneous and isotropic (on large scales). It is much less well known that another assumption, often called Weyl's principle, is necessary – or, at least, has been claimed to be necessary – in order to arrive at the FLRW model and, in particular, its cosmic time parameter. In a version close to Robertson's (1933) (we shall discuss various formulations later), the principle states:

Weyl's principle: The world lines of galaxies, or 'fundamental particles', form (on average) a spacetime-filling family of non-intersecting geodesics converging towards the past.
The importance of Weyl’s principle is that it provides a reference frame based on an expanding ‘substratum’ of ‘fundamental particles’. In particular, if the geodesic world lines are required to be orthogonal to a series of space-like hypersurfaces, a comoving reference frame is defined in which constant spatial coordinates are “carried by” the fundamental particles. The time coordinate is a cosmic time which labels the series of hypersurfaces, and which may be taken as the proper time along any of the particle world lines. Insofar as the Weyl principle is necessary for the notion of cosmic time in the FLRW model, it clearly becomes important to examine whether the properties and motion of matter are compatible with the Weyl principle as we go back in cosmic time.3 If a point is reached at which this is not the case, then it appears not to be physically justified to contemplate ‘earlier’ epochs. Doing so would involve extrapolating the FLRW model into a domain where the fundamental assumptions (needed to build up the model) are no longer valid and the model would lose its physical basis (see also the discussion in Rugh and Zinkernagel 2009, p. 5). In the following, we first briefly review the early history of Weyl’s principle and question a claim, found in some of the recent literature on this principle, to the effect that the principle has been replaced by the cosmological principle. We then show that although the Weyl principle is not often mentioned explicitly in modern texts on cosmology, it is nevertheless in these texts in implicit form (and, we argue, necessarily so). We finally discuss and question the prospect of satisfying Weyl’s principle, and hence define cosmic time, at a very ‘early phase’ of the universe, if this phase is thought to be describable exclusively in terms of quantum theory.
2 A VERY BRIEF HISTORY OF WEYL'S PRINCIPLE
The early history and reception of Weyl's principle – sometimes referred to as a postulate, assumption or hypothesis – have been chronicled e.g. by North 1990, and
2 Whereas our aforementioned study examines the physical basis for time both locally and globally, we shall assume in the present manuscript that spacetime is physically well-defined locally.
3 A related question may of course be raised concerning the cosmological principle, see e.g. Weinberg (1973, p. 407).
more recently by Bergia and Mazzoni (1999) and Goenner (2001); see also Ehlers (2009). In this section we sketch a few important points about the historical development, and we argue against an apparent consensus between Bergia and Mazzoni (1999) and Goenner (2001), which takes Weyl's principle to have been rendered redundant by the cosmological principle. Weyl first introduced his principle in 1923, with the appearance of the 5th and revised version of his Raum, Zeit, Materie, in connection with a discussion of de Sitter's solution to Einstein's field equations. In Weyl's 1926 formulation of his principle (which he here called a hypothesis), it reads:4

. . . the world lines of the stars [in later contexts, galaxies] form a sheaf [bundle], which rises in a given direction from the infinitely distant past, and spreads out over the hyperboloid [representing de Sitter's model] in the direction of the future, getting broader and broader. [Quoted from Goenner (2001, p. 121), our inserts]
This principle, or hypothesis, amounts to specifying a choice of congruence (a family of non-crossing curves which fills spacetime) of timelike geodesics to represent the world lines of the cosmological substratum, see e.g. Goenner (2001, p. 120). As Weyl emphasized, such an assumption concerning the choice of congruence is necessary to derive an unambiguous cosmological redshift in de Sitter's model; see also Bergia and Mazzoni (1999, pp. 336-338). As for the possible empirical support of his principle, Weyl mentions (6 years before Hubble's 1929 paper) that "it appears that the velocities between distant celestial objects on average increase with their mutual separations" (quoted from Ehlers 2009, p. 1655). Weyl (1923, p. 1664) notes that on his hypothesis the stars (galaxies) belong "to the same causally connected system with a common origin".5 Moreover, Weyl hints (same page) that this causal connectedness of the stars implies an assumption of the 'state of rest' of the stars which is "the only one compatible with the homogeneity of space and time". On this point, Goenner (2001, p. 120) comments that Weyl thus "indicates that his hypothesis implies the existence of a common time parameter or, turned around, that the stars have a common instantaneous rest system". While the implication of a cosmic time is consistent with Robertson's use of Weyl's principle (see below), Goenner also notes (p. 126) that Weyl never explains in detail why his chosen congruence also implies a common rest frame for the galaxies (and thus a cosmic time). As we shall see in section 3, the issue of whether or not a common time (a common rest frame) is implied by Weyl's principle may be responsible for the difference in formulations of this principle in the literature. While Robertson is somewhat ambiguous concerning what is included in the Weyl principle (see section 3), he notes (1933, p. 65) that the reintroduction in cosmology of a significant simultaneity (a cosmic time) implied by Weyl's postulate is permissible since
Weyl’s formulations in his 1923 writings are more convoluted, see Goenner (2001) and Bergia and Mazzoni (1999). According to Ehlers (2009, p. 1657) this “implies that each particle can be influenced by all others at any time; in modern parlance there is no particle horizon”.
414
S. E. Rugh and H. Zinkernagel
servations support the idea that galaxies (on average) are moving away from each other with a mean motion which represents the actual motion to within relatively small and unsystematic deviations.6 Whereas, for Weyl, the selection of a particular congruence of curves as world lines to represent cosmic matter was originally merely a specific property of the de Sitter universe, it later became (e.g for Robertson) a necessary assumption for constructing cosmological models. This role of Weyl’s principle is emphasized by Ehlers in Bertotti (1990, p. 29); see also Weyl (1930, p. 937): H. Weyl in 1923 points out that to have a cosmological model one has to specify, besides a space-time (M,g), a congruence of timelike curves to represent the mean motion of matter. [Our emphasis]
Now, Weyl introduced his principle in de Sitter space where (unlike the FLRW model) there is no unique choice of congruence (this is why de Sitter’s cosmos can be written either as a static or an expanding universe), see e.g. Ellis (1990, p. 100). But, as we shall discuss further in section 3 and 4, even if the choice of congruence is unique in the FLRW model, it is still crucial that the actual matter content of the universe is (on average) well represented by this congruence. Bergia and Mazzoni (1999, p. 339) note: “In his 1929 paper, Robertson had given no justification for his introduction of a cosmic time. As we have just seen, he did offer some in 1933, guided by Weyl’s principle. Therefore the continuity between Weyl’s and the cosmological principle seems fairly well established”. This quote might indicate that the cosmological principle somehow replaced the Weyl principle but such an idea would, in our assessment, be misleading both for historical and conceptual reasons.7 For Robertson (1933, p. 65) clearly states that he uses two (in fact four) assumptions amounting to both Weyl’s principle and the cosmological principle.8 Furthermore, Robertson notes that the cosmic time implied by Weyl’s principle “allows us to give a relatively precise formulation of the assumption that our ideal approximation to the actual world is spatially uniform” (1933, p. 65), and he thus suggests that the Weyl principle is actually a precondition for the cosmological principle (we shall pursue this theme further in section 3).
6
7 8
Robertson’s empirical justification for the introduction of a cosmic time stands in contrast to Friedmann’s statement (1922, p. 1993): “In the expression for ds2 , g14 , g24 , g34 can be made to vanish by corresponding choice of the time coordinate, or, shortly said, time is orthogonal to space. It seems to me that no physical or philosophical reasons can be given for this second assumption; it serves exclusively to simplify the calculations”. The term “cosmological principle” is due to Milne in 1933, though it can already implicitly be found in Einstein’s 1917 paper, see e.g. North (1990, p. 157). The four assumptions, schematically, are: (1) a congruence of geodesics; (2) hypersurface orthogonality; (3) homogeneity; and (4) isotropy. In section 3 we return to the relation between (1), (2) and Weyl’s principle in Robertson (1933).
In a spirit which seems similar to that of Bergia and Mazzoni, Goenner (2001, p. 126) notes in connection with the literature of the 1940s: “In fact, Weyl’s hypothesis had become superfluous and was replaced by the cosmological principle, i.e. the hypothesis that, in the space sections, no point and no direction are preferred”. Apparently, Goenner’s assessment of the (ir-)relevance of Weyl’s principle for today’s cosmology is similar: “Weyl’s stature in mathematics and science may . . . explain why the hypothesis still is mentioned in some modern books on gravitation and cosmology, notably by authors not specialized in cosmological research” (2001, p. 127). This statement fits well with the fact that Weyl’s principle is often absent in an explicit form in current cosmology textbooks. However, as we shall argue in the following section, Weyl’s principle is, at least implicitly, still present (and necessarily so) in the main texts on cosmology.
3 WEYL'S PRINCIPLE IN STANDARD TEXTS ON COSMOLOGY

In some cosmology textbooks, e.g. Bondi (1960), Raychaudhuri (1979) and Narlikar (2002), the importance of Weyl's principle is emphasized, and explicitly referred to, when the physical basis of the comoving frame, cosmic time, and the FLRW model are outlined. For instance, the derivation of the FLRW metric in Narlikar (2002, p. 107 ff.) is explicitly built on two assumptions, namely:

1. Weyl's postulate (Narlikar): The world lines of galaxies [or 'fundamental particles'] form a bundle of non-intersecting geodesics orthogonal to a series of spacelike hypersurfaces.

2. The cosmological principle: The universe, on large scales, is spatially homogeneous and spatially isotropic.9

In Narlikar's formulation of Weyl's postulate (which includes the orthogonality criterion; see below), this postulate is sufficient to build up a comoving reference frame in which the constituents of the universe are at rest (on average) relative to the comoving coordinates: The trajectories, x^i = constant, of the constituents are freely falling geodesics, and the requirement that the geodesics be orthogonal to the spacelike hypersurfaces translates into the requirement g_{0i} = 0, which (globally) resolves the space-time into space and time (a 3+1 split). We have g_{00} = 1 if we choose the time coordinate t so that it corresponds to proper time (dt = ds/c) along the lines of constant x^i, i.e. t corresponds to clock time for
9 We note that while Weyl's principle and the cosmological principle allow for the possibility to set up the FLRW model with a global cosmic time, the implementation of these principles can only be motivated physically if we already have a physical foundation for the concepts of 'space' and 'time' locally – otherwise we cannot apply concepts like "spacelike", "spatially homogeneous", "spatially isotropic", which appear in the definitions of these principles.
a standard clock at rest in the comoving coordinate system.10 The metric can thus be written in synchronous form in which the spacelike hypersurfaces are surfaces of simultaneity for the comoving observers (see also e.g. MTW 1973, p. 717):

ds² = c²dt² − g_{ij} dx^i dx^j,   i, j = 1, 2, 3.   (1)
In Narlikar (2002), the role of the cosmological principle is then to simplify the spatial part of this metric in order to get the standard FLRW form:11

ds² = c²dt² − R²(t) { dr²/(1 − kr²) + r²(dθ² + sin²θ dϕ²) }   (2)

Narlikar's discussion of the assumptions going into the derivation of the FLRW line element seems to follow Robertson (1933) closely, with one notable difference: Narlikar takes orthogonality of the matter world lines to the series of space-like hypersurfaces as being part of Weyl's principle (and hence, implicitly, that the existence of a comoving frame follows directly from this principle). By contrast, Robertson states Weyl's principle as in section 1 (i.e. as the principle that matter is represented by a congruence of diverging geodesics) and adds, as a further assumption, that the space-like hypersurfaces are orthogonal to the congruence of geodesics.12 In any case, whether or not the assumption of hypersurface orthogonality is included in Weyl's principle, it is clear that one can impose the requirement of the congruence being orthogonal to the hypersurfaces only given that there is a congruence.13

10 That the world lines are geodesics implies that g_{00} depends only on x^0, and so that g_{00} can be set to unity by a suitable coordinate transformation, see e.g. Narlikar (2002, p. 109).
11 Although there is a preferred choice of the congruence of world lines (a preferred reference frame) in the FLRW model (see below), there are many different coordinate representations of this model; see e.g. Krasinski (1997, p. 11 and pp. 14-16), who outlines at least five different coordinate representations.
12 However, Robertson is actually somewhat ambiguous about what exactly is included in, and implied by, Weyl's principle. For, right after introducing both assumptions (congruence and hypersurface orthogonality), he mentions (1933, p. 65): "The possibility of thus introducing in a natural and significant way this cosmic time t we consider as guaranteed by Weyl's postulate, which is in turn a permissible extrapolation from the astronomical observations". Perhaps this ambiguity is related to Weyl's own insufficient explanation, mentioned in section 2, of whether a comoving frame ('state of rest' of the stars) follows directly from his principle.
13 One can have a congruence which is not orthogonal to a series of spacelike hypersurfaces but not, of course, a hypersurface orthogonal congruence which is not a congruence! Note however the underlying coupled problem: the specification of a congruence (is it hypersurface orthogonal?, are the world lines geodesics?, etc.) depends on gµν, and the specific form of gµν depends in turn on the choice of congruence (the reference frame).
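As a quick symbolic cross-check of the claims just reviewed (that the comoving world lines x^i = constant are geodesics, and that the associated 4-velocity field is irrotational and hence hypersurface orthogonal), the following sketch computes the relevant quantities with sympy. It is our illustration, not part of the text; the spatially flat case k = 0 in Cartesian-like comoving coordinates is chosen purely for simplicity.

```python
import sympy as sp

# Spatially flat FLRW metric in Cartesian-like comoving coordinates (k = 0):
#   ds^2 = c^2 dt^2 - a(t)^2 (dx^2 + dy^2 + dz^2)
t, x, y, z, c = sp.symbols('t x y z c', positive=True)
a = sp.Function('a')(t)
coords = [t, x, y, z]
g = sp.diag(c**2, -a**2, -a**2, -a**2)
ginv = g.inv()

def christoffel(mu, al, be):
    """Gamma^mu_{alpha beta} of the metric g."""
    return sp.Rational(1, 2) * sum(
        ginv[mu, nu] * (sp.diff(g[nu, al], coords[be])
                        + sp.diff(g[nu, be], coords[al])
                        - sp.diff(g[al, be], coords[nu]))
        for nu in range(4))

# Comoving world lines x^i = const have tangent u^mu proportional to (1, 0, 0, 0);
# with g_00 constant they are (affinely parametrised) geodesics iff Gamma^mu_00 = 0.
print([sp.simplify(christoffel(mu, 0, 0)) for mu in range(4)])   # -> [0, 0, 0, 0]

# The covariant 4-velocity u_mu = g_{mu 0} = (c^2, 0, 0, 0) is constant, so its
# antisymmetrised derivative (and with it the vorticity of the congruence) vanishes:
u = [g[mu, 0] for mu in range(4)]
curl = [[sp.simplify(sp.diff(u[m], coords[n]) - sp.diff(u[n], coords[m]))
         for n in range(4)] for m in range(4)]
print(all(comp == 0 for row in curl for comp in row))             # True: irrotational
```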
Given the importance of Weyl's principle in Robertson (1933), it may at first be surprising that other (than the above mentioned) standard textbooks on general relativity and modern cosmology, such as Misner, Thorne and Wheeler (MTW) (1973), Wald (1984), Peebles (1993) or Hawking and Ellis (1973), have no explicit reference to Weyl's principle. As far as we can see, however, the reason is simply that the Weyl principle is assumed implicitly in these books at an early stage of setting up the FLRW model. To see this, consider first Ellis' (1990, p. 99) clarification (cf. also the quote of Ehlers in section 2 above):

It is important to realise that a cosmological model is specified only when a 4-velocity ua representing the average motion of matter in the universe has been specified as well as the space-time metric gµν; the observable relations in the model are determined by the choice of this 4-velocity, or equivalently of the associated fundamental world-lines.
As mentioned in section 2, the preferred choice of a 4-velocity or, equivalently, a congruence of world lines, to represent the average motion of matter is unique in the FLRW case.14 But still, the congruence plays a fundamental role since the symmetry constraints of homogeneity and isotropy are imposed with respect to such a congruence, cf. e.g. Ellis (1999): We start by assuming large-scale spatial homogeneity and isotropy about a particular family of worldlines. The RW models used to describe the large-scale structure of the universe embody those symmetries exactly in their geometry. It follows that comoving coordinates can be chosen . . . . [Our emphasis]
Another way of stating this point is that isotropy can be satisfied only in a particular reference frame or for a particular class of fundamental observers (other observers moving with respect to these will not see isotropy). Indeed, such a fundamental class of observers (congruence) is part of the definition of isotropy, cf. e.g. Wald (1984, p. 93) (see also MTW 1973, p. 714): A spacetime is said to be (spatially) isotropic at each point if there exists a congruence of timelike curves (i.e. observers), with tangents denoted ua , filling the spacetime . . . [such that] . . . it is impossible to construct a geometrically preferred tangent vector orthogonal to ua .
Thus, Weyl’s principle – in the general sense of matter being well represented by a congruence of world lines – is a precondition for the cosmological principle; the former can be satisfied without the latter being satisfied but not vice versa. As hinted above in connection with Robertson, the specification of a congruence of world lines representing matter is a necessary but not sufficient condition 14 The mentioned equivalence follows if precisely one non-zero 4-velocity vector is assigned to every point of the manifold (the world line is obtained as the integral curve of ua ). Ehlers (1961) introduces a three-dimensional family of timelike curves to represent the world lines of the matter elements with respect to an arbitrary local coordinate system, xa = xa (yα , s). He then derives the 4-velocity of the matter elements by differentiating the coordinates xa (i.e. ua = ∂xa /∂s) with respect to the proper time s along the world lines. Subsequently the acceleration u˙ of the matter elements can be defined, and this acceleration vanishes (u˙ = 0) if the world lines move along geodesics.
for setting up a cosmic time.15 Only with the additional requirement of the congruence of world lines being hypersurface orthogonal do we get a sufficient condition. In terms of the 4-velocity field, a sufficient condition for having cosmic time is that the motion of matter is (on average) described by this field and that the motion is irrotational (corresponding to the 4-velocity having zero vorticity); see e.g. Ellis (1996).16 Asserting the existence of the 4-velocity field (representing matter) is of course prior to inquiring about its vorticity. This is just another way of repeating that the condition that matter can be well described by Weyl's principle is necessary for having cosmic time. To require that the motion of matter is well represented by a congruence of world lines (i.e. to impose Weyl's principle in the general sense) is to require that the matter world lines are non-crossing (of course, this can only be true on average, see below). This non-crossing of world lines is built into the construction of the comoving frame with respect to which cosmic time is defined. As described e.g. in MTW (1973, p. 715 ff.), see also Wald (1984, p. 95) and Weinberg (1972, p. 338), an arbitrary grid of space coordinates (x^1, x^2, x^3) (constant labels) is laid out on a spacelike hypersurface (of homogeneity). These coordinates are "propagated off" and throughout all spacetime by means of the world lines of the cosmological fluid, with proper (= cosmic) time measured along any of the fluid world lines (so the coordinates are "carried by" the world lines). Since the world lines of the cosmological fluid are used to propagate the coordinates it is crucial that there is no crossing of the world lines (i.e. that the family of world lines constitutes a congruence), as we would otherwise have the same spacetime point described by different (incompatible) coordinates.17 As Narlikar puts it (2002, p. 108):
15 For instance, Gödel's expanding and rotating solution from 1952 satisfies, as far as we can see, Weyl's principle in Weyl's 1926 (and Robertson's 1933) formulation. But this solution does not have a cosmic time (at least not for high rates of rotation, see e.g. Belot (2005, p. 27)). See also Ellis (1996) for a detailed discussion both of Gödel's static and expanding rotating solutions of Einstein's field equations.
16 The 4-velocity field may be decomposed into rotation (vorticity), shear and expansion components, see e.g. Ehlers (1961, p. 1228) or MTW (1973, §22.3). As concerns the connection between zero vorticity and hypersurface orthogonality, Malament (2006, p. 251) presents a nice picture: "Think about an ordinary rope. In its natural twisted state, the rope cannot be sliced by an infinite family of slices in such a way that each slice is orthogonal to all fibers. But if the rope is first untwisted, such a slicing is possible. Thus orthogonal sliceability is equivalent to fiber untwistedness. The proposition extends this intuitive equivalence to the four-dimensional 'spacetime ropes' (i.e. congruences of worldlines) encountered in relativity theory. It asserts that a congruence is irrotational (i.e. exhibits no twistedness) iff it is, at least locally, hypersurface orthogonal."
17 Note that in Minkowski spacetime there is no unique congruence of world lines (no unique preferred frame), no (preferred) cosmic time, and no need to impose the non-crossing criterion – but also that one does not need the world lines of the material constituents to propagate (set up) the coordinates. It is however possible to do so, and in that case it is necessary that the world lines are non-crossing, see e.g. Peebles (1993, p. 250).
It is worth emphasizing the importance of the non-intersecting nature of world lines. If two galaxy world lines did intersect, our [comoving] coordinate system above would break down, for we would then have two different values of xµ specifying the same point in spacetime (the point of intersection).
How can Weyl's principle be fulfilled in the real universe? Typical ordinary velocities of (nearby) galaxies relative to each other are ⟨v⟩ ∼ (1/1000) × c (MTW 1973, p. 711) and, indeed, some galaxies do collide. Likewise with the more fundamental constituents in earlier phases of the universe. Thus the fundamental world lines in the Weyl principle must be some 'average world lines' associated with the average motion of the fundamental particles (in order to "smooth out" any crossings).18 At present and for most of cosmic history, the comoving frame of reference can be identified as the frame in which the cosmic microwave background radiation looks isotropic (see e.g. Peebles 1993, p. 152), and cosmic matter is (above the homogeneity scale) assumed to be described as dust particles with zero pressure which fulfill Weyl's principle. In the early radiation phase, matter is highly relativistic (moving with random velocities close to c), and the Weyl principle is not satisfied for a typical particle, but one may still introduce fictitious averaging volumes in order to create substitutes for 'galaxies which are at rest'; see e.g. Narlikar (2002, p. 131). However, above the electroweak phase transition (before 10⁻¹¹ seconds 'after' the big bang), all constituents are massless and move with velocity c in any reference frame. There will thus be no constituents which are comoving (at rest).19

18 A closely related problem is to average out inhomogeneities in the matter distribution (such averaging procedures have been developed to a large degree of sophistication, see e.g. Krasinski (1997, pp. 263-275)). It is a highly non-trivial problem, and it was emphasized already by Gödel (1949, p. 560), that the necessary averaging over large volumes will introduce an arbitrariness in the definition of cosmic time depending on the details of the averaging process and the size of the regions considered (see also North (1990, p. 360) and Dieks (2005, pp. 11-12)).
19 This conclusion may also be reached by noting that the set-up of the FLRW model requires matter (the energy-momentum tensor) to be in the form of a perfect fluid, as this is the only form compatible with the FLRW symmetries, see e.g. Weinberg (1972, p. 414). And a source consisting of pure radiation is not sufficient since one cannot effectively simulate a perfect fluid by "averaging over pure radiation": Krasinski (1997, pp. 5-9) notes that the energy-momentum tensor in cosmological models may contain many different contributions, e.g. a perfect fluid, a null-fluid, a scalar field, and an electromagnetic field. But he also emphasizes that whereas a scalar field source is compatible with the FLRW geometry (since it acts as a stiff perfect fluid with equation of state p = ρ), a source of pure null fluid or pure electromagnetic field is not compatible with the FLRW geometry, and solutions with such energy-momentum sources have no FLRW limit (see Krasinski 1997, p. 13).
One might attempt to construct mathematical points (comoving with a reference frame) like the above mentioned center of mass (or, in special relativity, center of energy) out of the massless, ultrarelativistic gas particles, but this procedure requires that length scales be available in order to e.g. specify how far the particles are apart (which is needed as input in the mathematical expression for the center of energy). As discussed in Rugh and Zinkernagel (2009), the only option for specifying such length scales (above the electroweak phase transition) will be to appeal to speculative physics, and the prospects of satisfying Weyl's principle (and having a cosmic time) will therefore also rely on speculations beyond current well-established physics. We conclude that it is essential that some averaging procedure be carried out in order to yield a non-crossing family of world lines (a congruence). Whether this is possible when matter is described by quantum theory (e.g. in the very early universe) is the question we address in the next section.
4 COSMIC TIME WITH QUANTUM MATTER?

We have seen that Weyl's principle cannot be disregarded in the FLRW model, as it is either implicitly or explicitly included among the fundamental principles used to set up this model. The question therefore arises: What could be candidates for the "Weyl substratum" which, at epochs when no galaxies are present, can form substitutes for (on average) non-intersecting galaxies at rest? The empirical adequacy of both Weyl's principle and the cosmological principle depends on the actual arrangement and motion of the physical constituents of the universe. As we go backwards in time it may become increasingly difficult to satisfy these physical principles since, as mentioned in section 3, the nature of the physical constituents is changing from galaxies, to relativistic gas particles, and to entirely massless particles moving with velocity c. In particular, the Weyl principle refers to a non-crossing family of (fluid or particle) world lines, that is, to classical or classicalized particle-like behavior of the material constituents. This makes it difficult even to formulate the Weyl principle (let alone decide whether it is satisfied) if some period in cosmic history is reached where the 'fundamental particles' are to be described by wave-functions ψ(x, t) referring to (entangled) quantum constituents. What is a 'world line' or a 'particle trajectory' then? Unless one can specify a clear meaning of 'non-intersecting trajectories' in a contemplated quantum 'epoch', it would seem that the very notion of cosmic time, and hence the notion of a 'very early universe', is compromised. This last problem of identifying a Weyl substratum within a quantum description arises most clearly on a "quantum fundamentalist" view according to which the material constituents of the universe could be described exclusively in terms of quantum theory at some early stage of the universe.20 On such a quantum fundamentalist view, the following question naturally arises:

20 For instance, Kiefer notes that "The Universe was essentially 'quantum' at the onset of inflation" (Joos et al. 2003, p. 208).
The cosmic measurement problem: If the universe, either in its content or in its entirety, was once (and still is) quantum, how can there be (apparently) classical structures now?
We call this the "cosmic measurement problem" since it addresses the standard quantum measurement problem in the cosmological context. While many aspects of the cosmic measurement problem have been addressed in the literature, the perspective which we would like to add is that the problem is closely related to providing a physical basis for the (classical) FLRW model with a (classical) cosmic time parameter.21 Our point is that if cosmic time in the FLRW model is crucially dependent on a (prior) classical or classicalized behaviour of the material constituents of the universe, then one can hardly (assume a quantum fundamentalist view and) approach the cosmic measurement problem by asserting a gradual emergence of classicality framed in terms of a cosmic time. An often attempted response to the cosmic measurement problem is to proceed via the idea of decoherence. Within such an approach one may imagine that quantum particles in the early universe (like particles in a bubble chamber) will move along 'tracks' (instead of being wave functions spread out in space) — due to the interaction of the quantum constituents with the environment (that is, the environment of all the other particles is constantly 'monitoring' the particle wave function in question). However, there are reasons to question whether decoherence has sufficient explanatory power for the quantum fundamentalist (e.g. whether decoherence is sufficient to explain the building up of a Weyl substratum). First, as is widely known, decoherence cannot by itself solve the measurement problem and explain the emergence of the classical world (see e.g. Landsman 2006). Furthermore, as already indicated, if decoherence is to provide the classical structures (in the cosmological context), it cannot — as is usually assumed in environmentally induced decoherence — be a process in (cosmic) time, insofar as classical structures (non-crossing world lines) are needed from the start to define cosmic time. Finally, a general worry about decoherence has been expressed e.g. by Anastopoulos (2002): ". . . a sufficiently classical behaviour for the environment seems to be necessary if it is to act as a decohering agent and we can ask what has brought the environment into such a state ad-infinitum".22

21 Note the temporal aspect of the cosmic measurement problem: Not only are classical structures less fundamental since they are derivable from quantum structures, but they are also temporally secondary to the original quantum state of the universe. Depending on the cosmic epoch of interest, various levels of the cosmic measurement problem can be distinguished, for instance (see e.g. Kiefer and Joos 1999): (1) How to get a classical spacetime out of quantum spacetime? (2) How to get classical structures from quantum fields (in a classical spacetime background) — for instance in an early inflationary universe? (3) How to get a measurement apparatus to show definite results (the standard measurement problem)?
22 A further problem is that while the split between system and environmental degrees of freedom may be natural in earth-based experimental arrangements, it appears less obvious in the context of the early universe. Thus, while Kiefer and Joos (1999) appear to assume that various subsets of constituents in the universe can successively classicalize one another via decoherence (starting with the gravitational degrees of freedom), Anastopoulos (2002) points out that the environment/system splitting seems to be arbitrary in the context of general relativity.
Due to these limitations of the decoherence idea in the present context, the quantum fundamentalist is (in our view) still faced with the question of whether a comoving Weyl substratum can be constructed from (non-classicalized) quantum constituents (wave functions). Apart from (but related to) the mentioned concerns about decoherence in this context, one may ask what 'moves' according to the quantum description? From the point of view of a Born interpretation, the wave function in quantum theory is not a real wave but rather a probabilistic object. The evolution of a wave function ψ(x, t) therefore appears insufficient to provide a physical basis for the fluid particles comprising the Weyl substratum, since in the quantum description (in the Born interpretation) no physical object moves from a definite place A to another place B. Only a mathematical entity ψ(x, t) – the symbolic representation of the quantum system – 'moves'.23 It is not obvious that this problem will be more tractable if instead of one particle we have quantum systems composed of many constituents (see also Landsman 2006, p. 492). The early universe is envisaged to be described by a collection of interacting quantum fields. In general, these (matter and radiation) fields will be in an entangled state in which it is far from clear that individual particle trajectories are discernible. Thus, even with many constituents it is still not clear that something actually moves from one place to another. As a consequence, there may not be a well-defined notion of particle trajectories (let alone non-crossing particle trajectories), in which case no Weyl substratum can be identified. In that situation, no cosmic time can be defined and it thus seems difficult to maintain the quantum fundamentalist view of an early quantum 'epoch' of the universe. As a mathematical study, the FLRW model may be extrapolated back arbitrarily close to t = 0. But as a physical model nobody believes it 'before' the Planck time. As we have argued, however, there are interesting problems with establishing a physical basis for the FLRW model with a cosmic time, even before (in a backward extrapolation from now) we might reach an 'epoch' in which theories of quantum gravity may come into play.

Acknowledgements We would like to thank audiences at talks in Copenhagen, Dortmund, Granada, Heidelberg, Leeds, Oxford, and Utrecht for helpful comments. We also thank George Ellis and Erhard Scholz for comments on the manuscript.
23 In our assessment, also the local space and time concepts require a physical foundation in terms of the material constituents (cf. the time-clock relation in Rugh and Zinkernagel 2009). In the quantum context (quantum mechanics as well as quantum field theory) we are therefore faced with an interesting circularity: the wavefunction ψ = ψ(x, t) is defined on classical spacetime (x, t), but spacetime has in turn to be constructed with reference to the material building blocks, that is, to the wave functions ψ themselves.
We finally thank the Spanish Ministry of Science and Innovation (Project FFI2008-06418-C03-02) for financial support.
REFERENCES
Charis Anastopoulos, "Frequently asked questions about decoherence", in: International Journal of Theoretical Physics 41, 2002, pp. 1573-1590.
Gordon Belot, "Dust, Time, and Symmetry", in: British Journal for the Philosophy of Science 56, 2005, pp. 255-91.
Silvio Bergia and Lucia Mazzoni, "Genesis and evolution of Weyl's reflections on de Sitter's universe", in: Hubert Goenner et al (Eds.), The expanding worlds of general relativity. Basel: Birkhäuser 1999, pp. 325-343.
Hermann Bondi, Cosmology (2nd edition). Cambridge: Cambridge University Press 1960.
Dennis Dieks, "Becoming, Relativity and Locality", in: Dennis Dieks (Ed.), The Ontology of Spacetime. Amsterdam: Elsevier 2006, pp. 157-176.
Jürgen Ehlers, "Contributions to the Relativistic Mechanics of Continuous Media" (1961), translated in: General Relativity and Gravitation 25, 1993, pp. 1225-1266.
Jürgen Ehlers, "Editorial note to: H. Weyl, On the general relativity theory", in: General Relativity and Gravitation 41, 2009, pp. 1655-1660.
George F. R. Ellis, "Innovation, resistance and change: the transition to the expanding universe", in: Bruno Bertotti et al (Eds.), Modern cosmology in retrospect. Cambridge: Cambridge University Press 1990, pp. 97-113.
George F. R. Ellis, "Contributions of K. Gödel to relativity and cosmology", in: Petr Hájek (Ed.), Gödel '96: Lecture notes in logic 6. Berlin: Springer-Verlag 1996, pp. 325-343.
George F. R. Ellis, "83 years of general relativity and cosmology: progress and problems", in: Classical and Quantum Gravity 16, 1999, pp. A37-A75.
Stephen W. Hawking and George F. R. Ellis, The large scale structure of space-time. Cambridge: Cambridge University Press 1973.
Alexander Friedman, "On the Curvature of Space" (1922), reprinted in: General Relativity and Gravitation 31, 1999, pp. 1991-2000.
Hubert Goenner, "Weyl's contributions to cosmology", in: Erhard Scholz (Ed.), Hermann Weyl's Raum – Zeit – Materie and a general introduction to his scientific work. Basel: Birkhäuser 2001, pp. 105-137.
Kurt Gödel, "A remark about the relationship between relativity theory and idealistic philosophy", in: Paul A. Schilpp (Ed.), Albert Einstein: Philosopher-Scientist. La Salle, Illinois: Open Court, 1949, pp. 555-562.
Erich Joos et al (Eds.), Decoherence and the Appearance of a Classical World in Quantum Theory. Berlin: Springer 2003.
Claus Kiefer and Erich Joos, "Decoherence: Concepts and examples", in: Philippe Blanchard and Arkadiusz Jadczyk (Eds.), Quantum future. Berlin: Springer 1999, pp. 105-128.
Andrzej Krasinski, Inhomogeneous Cosmological Models. Cambridge: Cambridge University Press 1997.
Nicolaas P. Landsman, "Between classical and quantum", in: Jeremy Butterfield and John Earman (Eds.), Handbook of the Philosophy of Science, Vol. 2: Philosophy of Physics. Amsterdam: North-Holland, 2006, pp. 417-554.
David B. Malament, "Classical general relativity", in: Jeremy Butterfield and John Earman (Eds.), Handbook of the Philosophy of Science, Vol. 2: Philosophy of Physics. Amsterdam: North-Holland, 2006, pp. 417-554.
Charles W. Misner, Kip S. Thorne, John A. Wheeler, Gravitation. New York: W. H. Freeman 1973.
Jayant V. Narlikar, An Introduction to Cosmology. Third Edition. Cambridge: Cambridge University Press 2002.
John D. North, The measure of the universe – A history of modern cosmology. New York: Dover 1990.
Phillip J. Peebles, Principles of physical cosmology. Princeton: Princeton University Press 1993.
Amal K. Raychaudhuri, Theoretical Cosmology. Oxford: Clarendon Press 1979.
Howard P. Robertson, "Relativistic Cosmology", in: Reviews of Modern Physics 5, 1933, pp. 62-90.
Svend E. Rugh and Henrik Zinkernagel, "On the physical basis of cosmic time", in: Studies in History and Philosophy of Modern Physics 40, 2009, pp. 1-19.
Robert M. Wald, General Relativity. Chicago: The University of Chicago Press 1984.
Steven Weinberg, Gravitation and Cosmology. New York: John Wiley & Sons 1972.
Hermann Weyl, "On the general relativity theory" (1923), reprinted in: General Relativity and Gravitation 41, 2009, pp. 1661-1666.
Hermann Weyl, "Redshift and Relativistic Cosmology", in: The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 9, 1930, pp. 936-943.

S. E. Rugh
Symposion, 'The Socrates Spirit'
Section for Philosophy and the Foundations of Physics
Hellebækgade 27
Copenhagen N
Denmark
[email protected]

H. Zinkernagel
Department of Philosophy I
Granada University
18071 Granada
Spain
[email protected]
M. P. SEEVINCK AND J. UFFINK
NOT THROWING OUT THE BABY WITH THE BATHWATER: BELL'S CONDITION OF LOCAL CAUSALITY MATHEMATICALLY 'SHARP AND CLEAN'
ABSTRACT
The starting point of the present paper is Bell's notion of local causality and his own sharpening of it so as to provide for mathematical formalisation. Starting with Norsen's (2007 [15], 2009 [16]) analysis of this formalisation, it is subjected to a critique that reveals two crucial aspects that have so far not been properly taken into account. These are (i) the correct understanding of the notions of sufficiency, completeness and redundancy involved; and (ii) the fact that the apparatus settings and measurement outcomes have very different theoretical roles in the candidate theories under study. Neither aspect is adequately incorporated in the standard formalisation, and we therefore incorporate both. The upshot of our analysis is a more detailed, sharp and clean mathematical expression of the condition of local causality. A preliminary analysis of the repercussions of our proposal shows that it is able to locate exactly where and how the notions of locality and causality are involved in formalising Bell's condition of local causality.
PACS numbers: 03.65.Ta, 03.65.Ud
Now it is precisely in cleaning up intuitive ideas for mathematics that one is likely to throw out the baby with the bathwater. J. S. Bell (1990) [9, p. 106]
I. INTRODUCTION
Despite the existence of many mathematically precise results concerning Bell's theorem, there continues to be controversy over just what the ingredients of the theorem are, and what the theorem, together with the experiments confirming a violation of a Bell inequality, rules out. This is especially so with regard to the locality and causality requirements involved. Although recently great progress has
been achieved in clarifying Bell's theorem and the crucial notions and assumptions involved1, we still believe there to be a crucial gap left open. This paper, which is still work in progress, tries to fill in this gap. In section II we will present the intuitive notion of local causality and the way Bell himself further sharpened it so as to allow for mathematical formalisation. In section III Bell's formalisation of local causality will be commented on using the illuminating papers by Norsen (2007, [15], 2009 [16]). In doing so we will unearth two novel and crucial aspects that have not yet been adequately incorporated in the standard formalisation. Firstly, section III A indicates the intricate relationship between the notions of sufficiency, completeness and redundancy involved. Secondly, the very different theoretical roles of settings and outcomes in the candidate theories under study are argued for in section III B. The latter point can be dealt with rather easily, but the former needs a rather extensive discussion. This is carried out in section IV, which gives a mathematical account of the two different notions of sufficiency that are in play. It uses an important source of inspiration that has been overlooked in the debate so far: namely, it will be argued that the concept of sufficiency, first formulated by R. A. Fisher (1922, [13]) in the context of mathematical statistics, throws relevant light on this debate. Next, the tools obtained in the course of our analysis are used in section V to finally give a mathematically sharp and clean formulation of Bell's notion of local causality. In section VI the novel mathematical formalisation is analysed and it is precisely indicated where and how the notions of locality and causality are involved. Section VII marks the end of this paper by indicating that, indeed, the present paper is still work in progress, as the repercussions of our mathematical formalisation of Bell's notion of local causality still need to be fully charted and compared with those of other, similar analyses.2
II. THE INTUITIVE IDEA: BELL'S LOCAL CAUSALITY
In the section entitled 'Principle of local causality' of the very last article Bell wrote on the foundations of quantum theory (published in 1990 and entitled 'La Nouvelle Cuisine' [9]), Bell begins his explanation of the principle of local causality as follows:3
The direct causes (and effects) of events are near by, and even the indirect causes (and effects) are no further away than permitted by the velocity of light. Bell (1990) [9, p. 105]
1 For example, see Cavalcanti (2008 [11]), Norsen (2007 [15], 2009 [16]), Seevinck (2008 [17]).
2 This we hope to do in the near future (Seevinck & Uffink, 2010, [18]).
3 Here we will mainly focus on Bell's formulation of this principle as presented in 'La Nouvelle Cuisine', Bell (1990) [9]. This presentation we take to be the most definite and precise one Bell ever presented; it is overall consistent with earlier formulations Bell used to indicate this principle. See Norsen (2007) [15] for further elaboration and support of this claim.
Figure 1: "Space-time location of causes and effects of events in region 1." Figure (slightly modified) and caption taken from Bell (1990) [9, p. 105].
This locates the causes operating in a certain region in space-time in the backward light cone of that region, and the effects of anything occurring in that region in its forward light cone. See Fig. 1. But, Bell remarks, "[t]he above principle is not yet sufficiently sharp and clean for mathematics". He then continues (see Fig. 2):
Figure 2: “Full specification of what happens in 3 makes events in 2 irrelevant for predictions about 1 in a locally causal theory.” Figure and caption taken from Bell (1990) [9, p. 105].
A theory is said to be locally causal if the probabilities attached to values of local beables in a space-time region 1 are unaltered by a specification of values of local beables in a spacelike separated region 2 when what happens in the backward light cone is already sufficiently specified, for example by a full specification of local beables in a spacetime region 3. It is important that region 3 completely shields off from 1 the overlap of the backward light cones of 1 and 2. And it is important that events 3 be specified completely. Otherwise the traces in region 2 of causes of events in 1 could well supplement whatever else was being used for calculating probabilities about 1. The hypothesis is that any such information about 2 becomes redundant when 3 is specified completely. Bell (1990) [9, p. 106]
Although this formulation is considerably sharper, it is not yet cleanly formulated in terms of mathematics. Probably for this reason Bell introduces some further notation and terminology in a subsequent discussion. He in effect introduces the space-time diagram of Fig. 3 that is adapted4 from Norsen's (2009) [16] highly illuminating paper.
Figure 3: Space-time diagram of the setup Bell considers. For explanation, see text. Figure adapted from Norsen (2009) [16].
This diagram encodes the setup Bell considers. It involves measurement on a bi-partite system (e.g., two particles emitted by a source) where each part is measured by a different party, called Alice and Bob respectively. The outcomes of measurement are represented by beables A (in region 1) and B (in region 2) and the settings chosen by experimenters Alice and Bob are denoted by beables a and b respectively. The symbol λ indicates the specification of the state of the bipartite system under study together with other relevant beables in the spacetime regions 3a and 3b. The logic is now as follows. Consider a candidate theory that attempts to describe any correlations found between outcomes A and B. Suppose region 3a shields off region 1 from the overlap of the past light cones of 1 and 2, and, likewise, that region 3b shields off region 2 from the overlap of the past light cones of 1 and 2 (see Fig. 3). It is assumed that (in this candidate theory under study) λ constitutes a complete specification5 of the beables in region 3a and 3b. With all this implicitly in place, Bell continues and applies his principle of local causality to this setup:
Invoking local causality, and the assumed completeness of . . . λ . . . we declare redundant certain of the conditional variables in the last expression because they are at space-like separation from the result in question. Bell (1990) [9, p. 109]
4 See footnote 5.
5 Norsen (2007 [15], 2009 [16]) requires that such a complete specification of region 3a and 3b not only includes λ but also the setting a in region 3a and setting b in region 3b, respectively. For this purpose he has appropriately located the spacetime location of settings to overlap with region 3 in Fig. 3. We believe, however, that this need not be done, and in fact should not be done. In remark (two) later in this paper this is further explained.
Thus the specification of λ makes both B and b redundant for prediction about A, and both A and a redundant for prediction about B. This finally allows for a clean formulation in mathematics of the principle. For now we follow Norsen (2007) [15] in claiming that this indeed gives (but see our critique later on)
P(A|a, b, B, λ) = P(A|a, λ),  P(B|a, b, A, λ) = P(B|b, λ),
(1)
i.e., the conditional probability of obtaining A is independent of both B and b given the specification λ and a, and analogously for the probability of obtaining B. Using the definition of conditional probability one trivially obtains the condition
P(A, B|a, b, λ) = P(A|a, λ) P(B|b, λ),
(2)
i.e., the joint probability for obtaining outcomes A and B factorizes into a product of individual probabilities for the two spatially separated systems, with each factor containing conditionalization only on local beables. This well-known factorisation condition is thus derived from the principle of local causality, just as Bell himself stressed.6 In the following we will subject this reasoning and mathematical formalisation to a critique. We believe that Bell's qualitative statement in the long quote above can benefit greatly from a more refined and detailed mathematical discussion than is available in the literature, especially concerning (i) the understanding of the notions of sufficiency, completeness and redundancy involved; and (ii) the fact that the settings a, b and outcomes A, B have very different theoretical roles in the candidate theories under study, something which, we will argue, is not adequately reflected in the reasoning leading up to (1). The upshot of our critique is a more detailed, sharp and clean mathematical expression of local causality. It will furthermore be shown7 that this pays off when interpreting this condition.
6 "Very often such factorizability is taken as the starting point of the analysis. Here we have preferred to see it not as the formulation of "local causality", but as a consequence thereof." Bell (1990) [9, p. 109]
7 Here only a preliminary investigation of this is worked out. A full analysis of the repercussions of our proposal is to be presented in future work, see Seevinck & Uffink (2010, [18]).
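Although the original text contains no code, the content of the factorisation condition (2) can be made concrete with a small numerical sketch (our own, with illustrative setting values and response functions; nothing here is taken from the paper): a toy model whose joint probabilities are obtained by averaging a product of local response probabilities over a setting-independent λ satisfies (2) by construction, and its CHSH combination of correlators consequently respects the bound |S| ≤ 2 associated with Bell-type inequalities.

```python
import math

# A toy locally causal model: lambda is a hidden angle with a uniform (discretised)
# distribution, and each wing responds only to its own setting and to lambda.
# All setting values and response functions below are illustrative choices.
LAMBDAS = [2 * math.pi * k / 720 for k in range(720)]

def p_A(outcome, a, lam):
    """Local response probability P_a(A | lambda); a deterministic sign model here."""
    return 1.0 if outcome == (1 if math.cos(2 * (a - lam)) >= 0 else -1) else 0.0

def p_B(outcome, b, lam):
    """Local response probability P_b(B | lambda)."""
    return 1.0 if outcome == (1 if math.cos(2 * (b - lam)) >= 0 else -1) else 0.0

def joint(A, B, a, b):
    """P(A, B | a, b): the average over lambda of factorised probabilities, i.e. eq. (2)."""
    return sum(p_A(A, a, lam) * p_B(B, b, lam) for lam in LAMBDAS) / len(LAMBDAS)

def correlator(a, b):
    """E(a, b) = sum over A, B of A * B * P(A, B | a, b)."""
    return sum(A * B * joint(A, B, a, b) for A in (+1, -1) for B in (+1, -1))

# CHSH combination of correlators; any model of the factorised form (2) obeys |S| <= 2.
a0, a1 = 0.0, math.pi / 4
b0, b1 = math.pi / 8, 3 * math.pi / 8
S = correlator(a0, b0) + correlator(a0, b1) + correlator(a1, b0) - correlator(a1, b1)
print("CHSH value S =", round(S, 3), "(locally causal models satisfy |S| <= 2)")
```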
III. CLEANING UP THE INTUITIVE IDEA
Let us first comment on some crucial aspects of Bell's formulation of local causality, most of which can already be found in the literature.8 Here we will be deliberately short because our main point lies elsewhere as will soon become clear. When possible we will quote Bell so as to let the 'master speak for himself'. For a more detailed discussion of these points, see more of Bell himself (Bell, 1976 [2], 1977 [3], 1981 [5], 1990 [9]), and especially Norsen (2007 [15], 2009 [16]).
(i) It is important to note that the condition of local causality is only intended to be a constraint on candidate theories, and not on the real world. Indeed, Bell starts by writing: "A theory is said to be locally causal if . . . " [emphasis added]. Furthermore, as Norsen (2009, [16]) has pointed out, Bell has emphasized this point very clearly in (Bell, 1977 [8, p. 101]):
I would insist here on the distinction between analyzing various physical theories, on the one hand, and philosophising about the unique real world on the other hand. In this matter of causality it is a great inconvenience that the real world is given to us once only. We cannot know what would have happened if something had been different. We cannot repeat an experiment changing just one variable; the hands of the clock will have moved, and the moons of Jupiter. Physical theories are more amenable in this respect. We can calculate the consequences of changing free elements in a theory, be they only initial conditions, and so can explore the causal structure of the theory. I insist that [local causality] is primarily an analysis of certain kinds of physical theory.
Note that the fundamental concepts involved, such as 'beables', 'completeness', and 'free variables', are all relative to some particular candidate theory. This will become clear next.
(ii) Bell uses the term 'beable' to denote whatever is posited by the candidate theory to correspond to something physically real:
The beables of the theory are those elements which might correspond to elements of reality, to things which exist. Their existence does not depend on observation. Indeed observation and observers must be made out of beables. I use the term 'beable' rather than some more committed term like 'being' or 'beer' to recall the essentially tentative nature of any physical theory. Such a theory is at best a candidate for the description of nature. Terms like 'being', 'beer', 'existent', etc., would seem to me lacking in humility. In fact 'beable' is short for 'maybe-able'. (Bell, 1984 [8, p. 174])
The concept 'beable' is thus theory-relative, and it is important that the candidate theory in question is absolutely clear about what it posits as physically real (Norsen, 2007 [15]). Indeed, Bell emphasizes ". . . you must identify in your theory 'local beables'. The beables of the theory are those entities in it which are, at least tentatively, to be taken seriously, as corresponding to something real." (Bell, 1990, [9, p. 100]). And, "[i]t is in terms of local beables that we can hope to formulate some notion of local causality." (Bell, 1976, [8, p. 53]). When applied to our particular setup of Fig. 3 this implies that the candidate theory in question must provide a well-specified account of the beables λ in region 3. It should be noted that λ is not restricted in any way, as it can be anything the theory posits as physically real. In particular, λ need not be some classical hidden variable.
(iii) It is important that in Fig. 2 "[r]egion 3 completely shields off from 1 the overlap of the backward light cones of 1 and 2." (Bell, 1990 [9, p. 106]). Likewise, in the paradigmatic setup of Fig. 3 it is necessary that region 3a shields off region 1 from the overlap of the backward light cones of 1 and 2, and, analogously, that region 3b shields off region 2 from the overlap of the backward light cones of 1 and 2 (see Fig. 3). Why? Because if this were not the case, such as for region 3′ in Fig. 4, then a violation of (1) would fail to indicate the presence of some sort of non-local causation. To see this, consider for example an event ×3 in Fig. 4 located in the overlap of the backward light cones of regions 1 and 2 but in the forward light cone of region 3′. Since ×3 lies in the overlap of the backward light cones of regions 1 and 2, it can influence both A and B. Now suppose ×3 is a genuinely stochastic event, not predictable on the basis of the beables in region 3′; then specification of events in region 2 could tell us about ×3, which, in turn, could allow one to infer more about the events in 1 than is possible from just the original specification of 3′. The condition (1) with region 3 replaced by region 3′ – to be called (1′) – would exclude any correlation between events b, B and the outcome A, given a and λ. But a failure of this condition could be perfectly compatible with local causality. Thus although (1′) "may validly be described as a "no correlations" condition for regions 1 and 2, it definitely fails as a "no-causality" condition." (Norsen, 2007, [15, p. 12]).
8 Nevertheless, most of these aspects are not well-known nor generally appreciated by commentators. It is thus worthwhile mentioning them here.
Figure 4: Alternative space-time diagram of the Bell-type setup. For explanation, see text, and compare to Fig. 3. Figure adapted from Norsen (2009) [16].
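The situation described in remark (iii) can also be mimicked numerically. The following sketch (our own toy model, not part of the paper) places a genuinely stochastic event (standing in for ×3) in the overlap of the backward light cones but outside what λ′ specifies; the model is locally causal by construction, yet conditioning only on λ′ leaves a correlation between the two wings, so a condition of the form (1′) fails without any non-local causation.

```python
import random

random.seed(0)

def run_once():
    """One run of a locally causal toy model.  lam_prime stands for the beables in the
    badly placed region 3'; c is a later, genuinely stochastic common cause (the event
    'x3') lying in the overlap of the backward light cones and hence not fixed by lam_prime."""
    lam_prime = random.choice([0, 1])
    c = random.choice([0, 1])          # unpredictable from lam_prime
    A = lam_prime ^ c                  # outcome in region 1: depends only on its own causal past
    B = c                              # outcome in region 2: likewise
    return lam_prime, A, B

runs = [run_once() for _ in range(200_000)]

def prob_A_is_1(lam_value, B_value=None):
    """Estimate P(A = 1 | lam') and P(A = 1 | lam', B) from the simulated runs."""
    sel = [A for lam, A, B in runs if lam == lam_value and (B_value is None or B == B_value)]
    return sum(sel) / len(sel)

print("P(A=1 | lam'=0)      ~", round(prob_A_is_1(0), 3))             # ~ 0.5
print("P(A=1 | lam'=0, B=1) ~", round(prob_A_is_1(0, B_value=1), 3))  # ~ 1.0
# The two numbers differ: B carries news about the omitted stochastic cause c,
# not about any non-local influence on region 1.
```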
(iv) Bell takes it to be a trivial point that besides λ, both the settings a, b and the outcomes A, B are also beables:9 "The beables must include the settings of switches and knobs on experimental equipment [settings] . . . and the readings of instruments [outcomes]." (Bell, 1975 [8, p. 52]). Indeed, the settings correspond to the controllable part of some measurement apparatus and the outcomes can be taken to be manifested by the final position of some pointer (or something similar, such as a black spot on a photograph, etc.), and these are all beables as they must, in any serious candidate theory, correspond to "something physically real" (Bell, 1990 [9, p. 100]). Thus, "[the] "setting" ultimately comes down to the spatial configuration of some physically real matter, [and] . . . the outcome too is just a convenient way of referring to some physically real and directly observable configuration of matter, and so [these too] will necessarily be reflected in the beables posited by any serious candidate theory." (Norsen, 2009 [16, p. 5]). There is, however, also a very important difference between settings and outcomes that breaks the symmetry described above. This is a consequence of the fact that, in contradistinction to the outcomes, the settings are supposed10 to be uncorrelated with the beables λ. The reason is that "Now we make an important hypothesis: The variables a and b can be considered to be free,11 or random." (Bell, 1990, [9, p. 109]). Bell continues: "we can imagine these settings being freely chosen at the last second by two different experimental physicists, or some other random devices. If these last second choices are truly random, they are not influenced by the variables λ. Then the resultant values for a and b do not give any information about λ. So the probability distribution over λ does not depend on a or b", i.e.,
ρ(λ|a, b) = ρ(λ),
(3)
with ρ some density. This 'free variables' assumption has the important repercussion that, despite the fact that from a physical point of view outcomes and settings are nothing but beables, they have completely different theoretical roles to play in the candidate theories in question. In the literature this crucial difference has not been correctly incorporated in the mathematical formalisation of Bell's notion of local causality. One of the main points of the present paper is to do precisely this (in section III B and beyond).
(v) Consider the settings a′ and b′ that are located in region 4 of Fig. 4. Since region 3′ (just like region 3 of Fig. 3, by the way) shields off both regions 1 and 2 from these settings,
9 A point also made by Hans Westman, private communication.
10 This requirement is crucial in deriving the so-called Bell-type inequalities that are used to prove Bell's theorem (Bell, 1964 [1]).
11 Bell also calls them "free variables – say at the whim of experimenters". (Bell, 1976 [8, p. 61])
we expect that, if indeed the beables λ′ in region 3′ are completely specified, the beables of region 4 would be redundant for prediction of events in region 1 or 2. The reason is that local causality is supposed to enforce that a complete specification of λ′ renders the events in the backward light cone of region 3′ redundant for predictions of events in the future light cone of 3′. Indeed, this is precisely what Bell's intuitive formulation claims (see the very first quote by Bell in section II, and also Fig. 1). Therefore, under this understanding of local causality, and where we take λ′ to be completely specified, we expect that (1) reduces to
P(A|a′, b′, λ′) = P(A|λ′),  P(B|a′, b′, λ′) = P(B|λ′).
(4)
However, this ignores the fact that the settings a′ and b′ are by assumption independent of the beables λ′ of region 3′ (see remark (iv) above). Therefore, if one takes this into account, we expect even the most complete specification of λ′ not to shield off from regions 1 and 2 the causal influences a′ and b′ have on these regions. Thus, we in fact expect (4) to be violated, i.e., both a′ and b′ could very well (in a locally causal way) supply information about both A and B that is not already contained in λ′. Yet, since a′, b′ lie in the backward light cone of both regions 1 and 2, this would in no way indicate non-local causation. In order to exclude such spurious violations of (1), the settings should be spacelike separated and lie outside the overlap of the backward light cones of regions 1 and 2 (i.e., outside region 4 in Fig. 4), and in most discussions this is indeed the case, e.g., in Fig. 3. Note that the main point here is not to argue for the settings to be located as in Fig. 3 rather than as in Fig. 4 – for that is rather obvious12 if one wants to test local causality – but that this is necessary to exclude spurious violations of (1).
(vi) The previous remark (i.e., (v)) serves as a starting point for the following discussion. For it shows not only that the settings should have a very particular space-time configuration in a sound formulation of local causality; it also indicates that the supposition that λ′ could be a complete specification of the beables in region 3′ of Fig. 4 is an illusion, and the same holds for λ in region 3 of Fig. 3. This is explained next. Indeed, we cannot but give up the idea that λ is a complete specification of beables in region 3, for otherwise "one wonders how a and b could possibly not be causally influenced by λ (in a locally causal theory)." (Norsen, 2009 [16, p. 283]). In other words, λ cannot be expected to be a complete specification of region 3 because one must allow for the possibility of traces in region 3 of the causal past
12 The settings a,b need not lie in the forward light cone of region 3, all that is necessary is that they lie outside the overlap of the backward light cones of region 1 and 2.
of both the settings, and given the independence of λ and the settings, these traces cannot be included in λ.13 An alternative way of understanding this point is that one is here faced with a dilemma, namely that the following two assumptions cannot both hold: (i) the free variables assumption, and (ii) the assumption that λ is completely specified, i.e., contains the description of each and every beable in region 3. However, this dilemma can in fact be avoided because, luckily, there is no need for a completeness requirement when formalising the notion of local causality. All that is needed is that the beables are sufficiently specified, in the relevant sense. This will be further worked out in the next subsection. Although Bell seemed to be aware of the crucial distinction between a sufficient and a complete specification of the beables involved (see the next subsection), he nevertheless repeatedly14 stressed the need for λ to be a complete specification of the beables in region 3. The reason for this is the following. Suppose the events in region 3 are not completely specified, i.e., λ leaves out some relevant beables. Then a violation of (1) can no longer be used to infer some sort of non-local causation, for it could well be that the beables excluded from λ influence both regions 1 and 2 in a locally causal way such that correlations between 1 and 2 can no longer be explained by an appeal to λ alone. In order to exclude such spurious violations of (1), Bell requires the specification of region 3 by λ to be complete. Norsen (2007, [15, p. 8]) emphasizes that the key assumption
13 "For example, some candidate theory (and this is actually true of every serious extant candidate theory) might provide a specification of the state of the particle pair which is sufficient in the relevant sense, even though it leaves out some fact (say, the millionth digit of the energy of some relic microwave background photon that happens to fly into the detection region just prior to the measurement) which actually exists in the relevant spacetime region. Such a fact could then be allowed to determine the setting a without introducing even the slightest evidence for the problematic sort of correlation between a and λ. Indeed, this is just an exaggerated version of what happens in the actual experiments, where carefully-isolated and independent pseudo-random-number generators are used to produce the settings at the two stations." (Norsen, 2009, [16, p. 283])
14 Most notably this is the case in the formulation of local causality as given in section II above: "And it is important that events 3 be specified completely." (Bell, 1990 [9, p. 106]), and "Invoking local causality and the assumed completeness of c and λ, . . ." (Ibid., p. 109). But also already in 1975 in 'The theory of local beables' Bell requires this form of completeness: "However, in the particular case that [λ] contains already a complete specification of beables in the overlap of the two light cones, supplementary information from region 2 could reasonably be expected to be redundant." (Bell, 1975 [8, p. 54]). And in 'EPR correlations and EPW distributions', Bell writes: "In a locally-causal theory, probabilities attached to values of local beables in one space-time region, when values are specified for all local beables in a second space-time region fully obstructing the backward light cone of the first, are unaltered by specification of values of local beables in a third region with spacelike separation from the first two." (Bell, 1986, [8, p. 200]).
by Bell is that events be specified completely:15 "It is only because λ is assumed to be a complete description, that the non-dependence of the probability of A on the distant outcome B follows from local causality." One final remark concerning the completeness (or sufficiency; see the next subsection) that is at stake. This notion is not to be understood in the sense of the true maximal amount of knowledge concerning the systems under study. On the contrary, it is relative with respect to possible candidate theories. Therefore, the question whether the candidate theory in question is complete in the sense of including 'all that really exists' is not at stake (there is no need for 'omniscience'; see Norsen (2007, [15, p. 8])). It thus is the candidate theory in question that should indicate when the desired completeness (or sufficiency) is achieved.
——
The remarks (iv) and (vi) indicate that crucial elements of Bell's condition of local causality have escaped careful mathematical formalisation. Most notably, the theoretical distinctions between settings and outcomes and between the notions of sufficiency and completeness have not yet been properly implemented. In the next two subsections these two issues are further worked out so as to provide the means for a clean and sharp mathematical formalisation of the notions at stake, something which is undertaken in section IV. This fills in the gap that was left untouched in the analysis of others, notably the careful analysis by Norsen (2007 [15], 2009 [16]).
A. On sufficiency and completeness
The above remarks (v) and (vi) indicate that λ cannot be expected to be a complete specification of region 3 because one must allow for the possibility of traces in region 3 of the causal past of the settings, and given the independence of λ and the settings, these traces cannot be included in λ (see footnote 13). Bell seems to account for this by indicating, in his formulation of local causality as given in section II, that the specification must be "sufficiently specified, for example by a full specification of local beables" [emphasis added]. Indeed, a complete specification is not necessary, but merely sufficient to interpret (1) as a condition of local causality. However, and this is the main point, we have just seen that such a "complete specification" is not an option. It is the "sufficient specification" that can at most be required. We must thus focus our mathematical formalisation on this notion of sufficiency. Although both Bell16 (1990, [9]) and Norsen (2009, [16]) indicate that completeness is not necessary, but only sufficient, in the main discussion of the notion
15 But in a different paper Norsen (2009, [16]) mentions that, strictly speaking, such a form of completeness is not at all needed. See the next subsection where this is discussed.
of local causality they nevertheless gloss over this and assume that λ provides a complete specification of the beables in region 3. Here we want to avoid this, and therefore will proceed to give a sound analysis of the notion of sufficiency involved. It might be true that "Once one realizes that the notion of a complete specification of beables is relative to a given candidate theory, there is no further problem understanding the meaning of 'complete' or 'full'. . . . But it is less clear, even given some well defined candidate theory, what partial specifications of the beables might be considered 'sufficient'." (Norsen, 2007, [15, p. 16]) and that "[i]n a more careful discussion the notion of completeness should perhaps be replaced by that of sufficient completeness for a certain accuracy, with certain epsilonics." (Bell, 1977, [8, p. 104]). But this is a practical problem that one encounters when formulating adequate candidate theories; it is not a problem for mathematically formalising the notion of local causality. The reason is that we may assume that a serious candidate theory provides us with an unambiguous specification of the causally relevant beables. Otherwise the question of whether the theory is locally causal does not even come up.17
Combining the above points, a first crucial observation is the following. The notion of sufficiency should be relativized with respect to a specific subclass of all beables in the candidate theory, namely the subclass of all beables in a region 3 except for the free variables and the beables causally related to the latter in that region. We thus isolate a very important aspect of the notion of sufficiency involved. It is (α) relative to a specific class Rλ of beables (i.e., the beables have a particular space-time specification, and they need not include all such beables the theory in fact allows for!).
16 Apart from the citation on page 427 above (". . . is already sufficiently specified, for example . . . "), taken from La Nouvelle Cuisine (Bell, 1990 [9]), on at least two other occasions Bell mentions that the notion of sufficiency is needed when formalising the notion of local causality: "Consider, then, the hypothesis that A and B fluctuate independently when the relevant causal factors, at time T − δ − ϵ say, whatever they may be, are sufficiently well specified . . . ." [...]. That is, assume there are variables λ and some probability distribution δ such that (2) holds. [where (2) is: ρ(A, B|a, b) = ∫dλ ∫dµ σ(λ, µ) ρ1(A|a, λ) ρ2(B|b, µ)] (Bell 1980, [8, p. 106]) [emphasis added]. And, "It seems reasonable to expect that if sufficiently many such causal factors can be identified and held fixed, the residual fluctuations will be independent, i.e., P(M, N|a, b, λ) = P1(M|a, λ) P2(N|b, λ), where [. . . ] λ denotes any number of other variables that might be relevant." (Bell, 1981 [8, p. 152]) [first emphasis added]. However, despite all this, Bell did not carefully distinguish between these two notions, and in fact seemed to prefer the requirement of completeness. See footnote 14 above.
17 See Norsen (2007, [15, p. 10]) for a similar analysis that shows that it should be trivial to decide in a given candidate theory what is and what is not a causal influence. This is endorsed by Cartwright & Jones (1991, [10, p. 229]) who write "... but we take the point to be perfectly general: there are no tests of [local] causality outside of models which already have significant causal structure built in."
The next crucial observation comes from considering the following question: "λ is sufficient for what?" The first obvious answer may be: sufficient for prediction of outcome A or B. However, this cannot be true, as the probabilities assigned to region 1 need not be determined by λ and the settings alone. Let us illustrate this with an example. Since we are allowing genuinely stochastic theories there could be stochastic causally relevant factors above region 3 and within the backward light cone of 1. Consider for example some such causal factors located at events ×1 or ×2 in Fig. 4. It is perfectly legitimate18 in a locally causal theory that these events are allowed to supplement λ in determining the probabilities of events in regions 1 or 2 respectively, i.e., ×1 influencing region 1 and ×2 influencing region 2. The correct answer to our question "sufficient for what?" is that the specification λ should be sufficient for rendering B and b redundant for the task of specifying the probability of outcome A occurring. Indeed, Bell writes: "The hypothesis is that any such information about 2 becomes redundant when 3 is specified completely". As we have just seen, 'completely' should here be replaced by 'sufficient' or 'sufficiently complete'. We thus isolate another very important aspect of the notion of sufficiency involved. The specification λ should be sufficient (β1) for a specific purpose, namely (β2) to render some other variables redundant for the task of determining some particular quantity. In the next section the notion of sufficiency as spelled out via the requirements α, β1 and β2 will be properly mathematically formalised. But before we can do so a final necessary preparatory analysis needs to be presented in the next subsection.
B. The different theoretical role of settings and outcomes
In the mathematical formalisation above, (i) the outcomes A, B, (ii) the beables λ and (iii) the settings a, b appear as conditioning arguments in a probability distribution. See e.g. Eq. (1). However, if one treats the settings a and b as conditioning arguments in a probability distribution, this implies, at least in Kolmogorovian probability theory, that they are random variables, and thus a probability distribution over their possible values is defined within the model: one cannot write p(x|y) unless p(y) is also defined. In other words, this means that the candidate theory in question would have to specify how probable it is that Alice will choose one setting a1 rather than a2, and similarly for Bob and for their joint choices.
18 See Norsen (2009, [16, p. 12]) for an extensive discussion of the following point: "The claim made in the local causality condition is not that the probabilities assigned (to events in region 1, on the basis of complete information about region 3) are the "best possible" probabilities the theory allows. They aren't. Better ones might be assigned, e.g., if we move region 3 forward in time, into the more recent past of region 1."
But that would be a remarkable feat for any physical theory. Even quantum mechanics leaves the question of what measurement is going to be performed on a system as one that is decided outside the theory, and does not specify how much more probable one measurement is than another. It thus seems reasonable not to require of candidate theories that they describe such probabilities. This explains, we hope, the different theoretical status of the settings a, b and the random variables A, B, λ. And lest one should despair that this distinction introduces a dichotomy not introduced by Bell, we note that Bell also attributed a special status to a, b by relegating them to the status of 'free variables', meaning that their values could be set or changed at the last instant. See above, section III, remark (iv). This view, we believe, should also count as an argument against the presumption that a putative candidate theory ought to provide their probabilities. However, above (section III, remark (iv)) it was also mentioned that from a fundamental point of view both settings and outcomes are nothing but beables. But we have just indicated that they have very different roles in any putative candidate theory we envisage, and this means that we should not regard them on an equal footing, at least theoretically. Although their ontological (or physical) status might be the same, their theoretical status is not. And this is crucial, especially since local causality refers to putative candidate theories only (see section III, remark (i)). Thus, the difference between the two must be adequately reflected in any candidate theory. Therefore we adopt the following point of view. Outcomes A, B, as well as the beable specification λ, are random variables and figure as arguments of a joint probability distribution Pa,b(A, B, λ) = Pa,b(A, B|λ) ρ(λ). The measurement settings a, b appear as labels of this probability function, not as arguments. The fundamental conditional probabilities to be considered are thus
Pa,b(A, B|λ),
(5)
instead of Bell’s P (A, B|a, b, λ). Also, Eq. (3) encoding the ‘free variables’ assumption should in fact read: ρa,b (λ) = ρ(λ).
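To make the proposed bookkeeping concrete, here is a minimal sketch (with names and numbers of our own choosing, not taken from the paper) of how a candidate theory might be represented so that a, b enter only as labels while A, B and λ remain random variables; the density of λ is written without any reference to the settings, which is precisely the content of ρa,b(λ) = ρ(λ).

```python
from itertools import product

LAMBDAS = [0, 1, 2]

def rho(lam):
    """Density of the beable specification lambda.  It takes no setting arguments,
    which encodes the 'free variables' assumption rho_{a,b}(lambda) = rho(lambda)."""
    return 1 / len(LAMBDAS)

def P(a, b):
    """Return the probability function P_{a,b}(A, B | lambda).  The settings a, b are
    ordinary parameters, i.e. labels selecting a function; they are not random variables."""
    def prob(A, B, lam):
        # an arbitrary, illustrative locally causal assignment
        pA = 0.7 if (a + lam) % 2 == 0 else 0.3
        pB = 0.6 if (b + lam) % 2 == 0 else 0.4
        return (pA if A == 1 else 1 - pA) * (pB if B == 1 else 1 - pB)
    return prob

# Usage: the probability function is labelled by (a, b) and conditioned on lambda.
p_ab = P(a=0, b=1)
total = sum(p_ab(A, B, lam) * rho(lam) for A, B, lam in product((0, 1), (0, 1), LAMBDAS))
print("Normalisation of P_{a,b}(A, B, lambda):", round(total, 6))   # 1.0
```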
IV. INTRODUCING MATHEMATICS: FORMALIZING SUFFICIENCY
How, then, are we to mathematically implement Bell's idea of "λ being sufficiently specified so as to declare redundant some of the conditional variables" in Pa,b(A, B|λ), where the latter are in fact to range over both the labels a, b and the random variables A, B? This we will do next. It is important to realize that two distinct notions of sufficiency are in play: one in which the label b (or a), and one in which the random variable B (or A), becomes redundant for the task of determining the probability of obtaining outcome A (or B). Each of these two notions will be clarified next.
A. Functional sufficiency
The first kind of sufficiency, in which one of the labels a, b becomes redundant, can be rather easily formalized. Consider a function fx(y), where x labels the different functions fx(y). When the variable y is sufficient for the purpose of declaring x redundant for the task of determining the function fx(y), relative to the class of all y in some specific range Ry, then, given y, the function f does not differ for different values of x, i.e.,
fx(y) = f(y),  ∀y ∈ Ry, ∀x.
(6)
This can of course trivially be generalized to functions fx1,x2,...(y1, y2, . . .) that have more than one label x1, x2, . . ., and more than one argument y1, y2, . . .. Recall that our preliminary analysis of sufficiency in section III A showed that such a requirement for physical theories implies that certain variables should be sufficient for a particular purpose and relative to a specific class of variables. See the requirements α, β1 and β2 on page 437. This is retained here: y is sufficient for the purpose of making fx(y) independent of x (thus not sufficient for determining its value, or for some other purpose), and this is relative to a specific class, namely to all such y that lie in a given range Ry. Applying this condition to the usual Bell-type framework we have been sketching in the previous section is rather easy. First of all, we will suppose the condition (6) to hold for each of the probability distributions Pa,b(A, B|λ), Pa,b(A|B, λ), etc. Secondly, λ will play the role of y, and thus Rλ the role of Ry, and, thirdly, the settings a, b play the role of the labels x. Let us next turn to sufficiency in the case of statistical dependence P(·|x, y), which requires considerably more clarificatory effort.
B. Sufficiency in statistical inference
The concept of sufficiency in the context of the theory of statistical inference was developed by R. A. Fisher (1922) [13]. The basic problem of statistical inference may be formulated as follows. Suppose we have some probabilistic experiment with a fixed set of possible outcomes x ∈ X and a family of probability distributions pθ, θ ∈ Θ, each of which provides some candidate description for the experiment. Here, Θ represents some arbitrary index set. For each value θ ∈ Θ, pθ(x) then provides the probability of x to occur. It is assumed, however, that we do not know exactly what the correct probability distribution is for the experiment, and the problem is to infer something about which probability distribution out of the given collection would provide a best "fit" for the experiment on the basis of recorded outcomes. It is generally useful to present the problem in a slightly expanded form, by adding the supposition that it is possible to repeat the performance of the experiment under i.i.d. conditions (i.e., independent and identically distributed repeated
trials). In that case, assuming one performs n such trials, the probability of obtaining a sequence (x1, . . . , xn) is
pθ(x1, . . . , xn) = ∏_{i=1}^{n} pθ(xi).   (7)
The goal now becomes to make an inference about θ on the basis of the outcomes (x1, . . . , xn). There are many forms in which such an inference could be cast. For example, it could take the form of a point estimator, i.e., by designing a function τ : X^n → Θ such that τ(x1, . . . , xn) would represent the best estimate of θ. These details need not concern us now because the concept of sufficiency is equally important in all such approaches. To introduce the notion of sufficiency it is useful to consider a set of n independent functions {y1, . . . , yn} defined on X^n, such that the equations
y1(x1, . . . , xn) = c1, . . . , yn(x1, . . . , xn) = cn   (8)
always have a unique solution. Thus, one might simply think of (y1, . . . , yn) as an alternative coordinate system that charts the points in X^n just as well as (x1, . . . , xn). Clearly, it is then equivalent whether one provides the recorded data in terms of the original values (x1, . . . , xn) or in terms of the alternative coordinates (y1, . . . , yn). The probability distribution can be transformed to the alternative coordinates:
p̂θ(y1, . . . , yn) = pθ(x1, . . . , xn) |∂xi/∂yj|,   (9)
where the last factor represents the Jacobian of the transformation. Now suppose that p̂θ has the following form:
p̂θ(y1, . . . , yn) = p̂θ(y1) g(y1, . . . , yn).   (10)
In that case, the function y1 is said to be sufficient for θ. Of course, there are many choices for such an alternative coordinate system. It is straightforward to show that if y1 out of the set {y1, . . . , yn} is sufficient for θ, then the same will hold for an alternative set {y1, ỹ2, . . . , ỹn}, provided that this also gives a regular coordinate system. The intuitive idea behind this notion of sufficiency is that all the information that the data provide about the unknown value of θ is in this case really contained in y1 alone, because the probabilities of the values of the remaining variables y2, . . . , yn are insensitive to θ. In other words, whatever form our inference about θ is going to be, it seems reasonable to make it depend only on the value of y1,
since all the other data are irrelevant for this purpose. Of course, if a sufficient statistic can be found this greatly simplifies the problem of statistical inference, because the number of relevant data can then be reduced from n to 1. In Fisher's own words, the criterion of sufficiency is:
That the statistics chosen should summarize the whole relevant information supplied by the sample. [. . . ] In mathematical language we may interpret this statement by saying that if θ is the parameter to be estimated, θ1 a statistic which contains the whole of the information as to the value of θ which the sample supplies, and θ2 any other statistic, then the surface of distribution of pairs of values θ1 and θ2 for a given value of θ is such that for a given value of θ1, the distribution of θ2 does not involve θ. In other words, when θ1 is known, knowledge of the value of θ2 throws no further light upon the value of θ. Fisher (1922) [13, p. 317].
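Fisher's criterion can also be checked by brute-force simulation; the following sketch (our own, using the normal location family that the text analyses next, so all numerical choices are illustrative assumptions) fixes the sample mean y1 and verifies that what remains of the data no longer discriminates between values of θ.

```python
import random
import statistics

random.seed(1)
n = 5

def leftover_given_mean(theta, trials=50_000):
    """Draw samples of size n from N(theta, 1) and look at the data 'beyond' the sample
    mean y1, here the residual x1 - y1.  For the normal location family this residual is
    independent of y1 and its distribution does not involve theta: y1 is sufficient."""
    residuals = []
    for _ in range(trials):
        xs = [random.gauss(theta, 1.0) for _ in range(n)]
        y1 = sum(xs) / n
        residuals.append(xs[0] - y1)
    return statistics.mean(residuals), statistics.pstdev(residuals)

for theta in (0.0, 3.0):
    m, s = leftover_given_mean(theta)
    print(f"theta = {theta}: residual mean ~ {m:+.3f}, residual std ~ {s:.3f}")
# Both values of theta give ~ +0.000 and ~ 0.894 (= sqrt(1 - 1/n)): once y1 is known,
# the remaining structure of the sample throws no further light on theta.
```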
It might be worthwhile to illustrate this by a simple example. Suppose we have a real-valued outcome x and a collection of normal probability distributions which differ only in their location parameter, e.g.:
pθ(x) = (1/√(2π)) e^{−(x−θ)²/2}.   (11)
It is well-known that for this case, the function
y1 = (1/n) ∑_{i=1}^{n} xi   (12)
provides a sufficient statistic for θ. In general, however, the class of probability distributions for which a sufficient statistic exists is severely limited; the Pitman-Koopman theorem implies they exist only for the exponential family. It is useful to slightly rewrite the condition of sufficiency (10). Using (9), we can rewrite it into p̂θ(y1, y2, . . . , yn) = p̂θ(y1) ĝ(y1, . . . , yn). Next, using the definition of conditional probability this gives
p̂θ(y2, . . . , yn|y1) = ĝ(y1, . . . , yn),   (13)
which states that if y1 is sufficient for θ then the conditional probability p̂θ(y2, . . . , yn|y1) must be independent of θ (because ĝ(y1, . . . , yn) is), and thus
p̂θ(y2, . . . , yn|y1) = p̂(y2, . . . , yn|y1).   (14)
This alternative formulation of sufficiency shows that once y1 is given, the rest of the data (i.e., {y2, . . . , yn}) is irrelevant to θ. For our purposes the following notes are of crucial importance. Note firstly that notions of locality or causation can be kept safely on the bench in this approach. For example, it might be that θ labels various races of tomato plants, and x the weight of a tomato produced by such a plant. Secondly, Fisher's talk about 'information' should be understood in the following sense. It refers to information that is 'contained' in a sample and that is 'about' something. To say that y1 is sufficient is qualified by saying that it is sufficient for a purpose (inferring the value of θ; it may well be insufficient for other purposes!) and relative to a class Ry, namely all other statistics of the same sample space, i.e., all other functions y2, . . . , yn on the outcome space X^n, i.e., Ry = {y1, . . . , yn}. So we see that this framework naturally incorporates the two aspects mentioned in subsection III A, namely that sufficiency of λ should be characterised as (i) for a specific purpose, the purpose being to render some other variable redundant, and (ii) relative to a specific class of variables. See the requirements α, β1 and β2 on page 437. Finally, some words about its status. In mathematical statistics, one might feel, sufficiency is actually just a name for a particular definition. As such, one may ask, how can this be of any help to foundational or conceptual problems in physics? However, even in statistics sufficiency is much more than just a definition. It is a desideratum. But it can be turned into something much more powerful by formulating the demand that, for certain physical probabilistic or stochastic theories, certain variables should be sufficient (in the sense given by the definition above) for a particular purpose relative to a specific class of variables. To this we will now turn.
1. Bayesian inference
The above approach used the orthodox formulation of statistical inference, in which parameters are kept strictly distinct from outcomes. The basic reason for this division is that while a statistical model provides probabilities for the events or outcomes x, there is usually not a corresponding probability for parameters, i.e., they are usually not random variables but labels for the probability distributions, indicating settings of measurement apparatuses. At first sight this dichotomy between parameters and outcomes in orthodox statistical inference corresponds nicely to the dichotomy adopted above in section III B between the theoretical roles played by the settings a, b and the outcomes A, B. However, in the context of Bell's notion of local causality we are not aiming at making an inference about the settings a or b. Therefore we will have to change the perspective somewhat. There is an alternative approach to statistical inference, the Bayesian approach, that does not rely on such a strict division between events and parameters. Bayesian statistical inference proceeds from a similar point of view except that now one assumes the existence of a so-called prior probability distribution over the parameter θ. Furthermore, the probability distributions pθ(x) are now reinterpreted as conditional distributions
pθ(x) = p(x|θ).
(15)
Given these two assumptions, it is possible to provide a so-called posterior probability distribution by means of Bayes' Theorem, i.e.
p(θ|x) = p(x|θ) ρ(θ) / ∫ p(x|θ) ρ(θ) dθ.   (16)
Extending this to the case of multiple, independent and identically distributed trials one obtains:
p(θ|x1, . . . , xn) = ∏_{i=1}^{n} pθ(xi) ρ(θ) / ∫ ∏_{i=1}^{n} pθ(xi) ρ(θ) dθ.   (17)
In the Bayesian approach, the goal of statistical inference is to report this posterior probability distribution. Note that this viewpoint does not necessarily presuppose a subjective interpretation of probability. As stated before, sufficiency is an important notion regardless of which precise approach to statistical inference is chosen. In the Bayesian approach this takes the following form. Firstly, note that the transformation (9) remains valid. Secondly, consider p̂(θ|y1, . . . , yn) and note that using Bayes' Theorem this is equal to
p̂(y1, . . . , yn|θ) ρ(θ) / ∫ p̂(y1, . . . , yn|θ) ρ(θ) dθ.   (18)
Then, assuming y1 to be sufficient for θ, i.e., assuming (14) obtains, we deduce that
p̂(θ|y1, . . . , yn) = p̂θ(y1) ĝ(y1, . . . , yn) ρ(θ) / ∫ p̂θ(y1) ĝ(y1, . . . , yn) ρ(θ) dθ = p̂θ(y1) ρ(θ) / ∫ p̂θ(y1) ρ(θ) dθ.   (19)
Thus, once the fundamental assumptions of Bayesian inference are in place, the sufficiency condition is entirely equivalent to
p̂(θ|y1, . . . , yn) = p̂(θ|y1).   (20)
Perhaps this makes the underlying motivation of sufficiency even clearer: if the above condition holds, then the probability of θ, once y1 is given, is not changed when the values of y2, . . . , yn are included. These additional functions of the data are irrelevant or redundant for the purpose of assigning the posterior probability. In accordance with the above terminology, this is expressed as: y1 is sufficient for the purpose of rendering some set of other variables redundant relative to a class of variables Ry = {y1, . . . , yn}. It should also now be clear how this condition fits in with the usual Bell-type framework we have been sketching in section II (e.g., see Fig. 3). First of all, we will suppose the condition (20) to hold for each of the probability distributions labeled by the settings a, b. Secondly, λ will play the role of y1, and each of the yi (i ≠ 1) should be one of the other random variables in the candidate
theory in question, for example one of the outcomes A, B, or some other beable specification λ′. The role of Ry is thus taken over by Rλ. Thirdly, θ is one of the outcomes A, B. Now, although obviously λ is neither a datum nor a function of the data, it is still assumed to be a random variable. It has a value, and although it will in general be hidden from us we can reason about the hypothetical case in which we would know its value. Perhaps one might wonder if the identification of θ with the outcomes A, B is a valid move to make. After all, we have introduced the variables y1, . . . , yn as functions on the space of outcomes X^n, whereas θ labelled probability distributions. But in a Bayesian approach this distinction no longer counts as fundamental. As is clear from (20), we are basically dealing with probability distributions on Θ × X^n, and both θ and yi can be seen as functions on this larger space. By now we have gathered enough mathematically clean and precise tools to explicate Bell's intuitive notion of local causality, and this will be carefully outlined in the next section.
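Condition (20) also lends itself to a direct numerical check. The sketch below (with an assumed flat prior on a grid; none of these choices come from the paper) computes the posterior for the normal location family once from the full data and once from the sample mean y1 alone, and finds the two to coincide, as (20) requires.

```python
import math
import random

random.seed(2)

THETAS = [i / 20 for i in range(-60, 61)]       # grid standing in for the prior's support
PRIOR = {t: 1 / len(THETAS) for t in THETAS}    # flat prior, purely for illustration

def normal_pdf(x, mean, sd=1.0):
    return math.exp(-((x - mean) ** 2) / (2 * sd ** 2)) / (sd * math.sqrt(2 * math.pi))

def posterior_from_data(xs):
    """p(theta | x1, ..., xn) via Bayes' theorem, cf. eq. (17)."""
    w = {t: PRIOR[t] * math.prod(normal_pdf(x, t) for x in xs) for t in THETAS}
    z = sum(w.values())
    return {t: v / z for t, v in w.items()}

def posterior_from_mean(xs):
    """p(theta | y1) using only the sufficient statistic y1 = mean(xs); for this model
    y1 ~ N(theta, 1/n), so the likelihood of y1 alone is a normal density in y1."""
    n, y1 = len(xs), sum(xs) / len(xs)
    w = {t: PRIOR[t] * normal_pdf(y1, t, sd=1 / math.sqrt(n)) for t in THETAS}
    z = sum(w.values())
    return {t: v / z for t, v in w.items()}

xs = [random.gauss(1.3, 1.0) for _ in range(8)]
full, reduced = posterior_from_data(xs), posterior_from_mean(xs)
gap = max(abs(full[t] - reduced[t]) for t in THETAS)
print(f"Largest difference between the two posteriors: {gap:.2e}")   # ~ 0, i.e. eq. (20)
```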
V. THE BABY: BELL'S LOCAL CAUSALITY 'MATHEMATICALLY SHARP AND CLEAN'
Recall that our analysis of sufficiency showed that such a requirement for physical theories implies that certain variables should be sufficient (in one of the two senses given above) for a particular purpose relative to a specific class of variables. We have argued that then (6) and (20) can be obtained as mathematical criteria when dealing with, respectively, labels attached to functions and random variables that are to be conditioned on. In Bell's condition of local causality it is the beable specification λ that is supposed to be sufficient. Thus in order to analyse this condition as a sufficiency criterion, we must, firstly, indicate the purpose for which λ is sufficient (per requirements β1 and β2) and, secondly, indicate relative to which beable class this is so, i.e., indicate the range Rλ of allowable λ (per requirement α). We believe that by now it should come as no surprise that our proposal is the following. Consider again Fig. 2. A candidate theory will be said to be locally causal when, if region 1 is space-like separated from region 2, the theory provides a specification λ in a region 3 that shields off region 1 from the overlap of the backward light cones of 1 and 2, and that is sufficient, in the sense of (6) and (20), for the purpose of rendering the beables in the far-away region 2 redundant for the task of predicting the probabilities of events in region 1, where λ is relative to the class of all allowable beable specifications that can be given about a region 3 (according to the candidate theory and consistent with the 'free variables' assumption), i.e., Rλ contains all allowable beable specifications in this region 3. If we now apply this to the standard bi-partite setup of Fig. 3 we obtain that if the candidate theory in question obeys local causality then the theory provides a
beable specification λ in region 3a that is sufficient for the purpose of rendering the far-away outcome B and setting b redundant for the task of determining the probabilities of obtaining A, and this is so relative to all other allowable beable specifications Rλ that the candidate theory in question provides for region 3a. Using the results of the previous section, this entails that functional sufficiency (6) renders the label b, and statistical sufficiency (20) the random variable B, redundant for the prediction of Pa,b(A|B, λ). For determining the probability of obtaining outcome B a completely analogous analysis obtains, so that we finally get the mathematically sharp and clean formulation of the condition of local causality:
Pa,b(A|B, λ) = Pa(A|λ),  Pa,b(B|A, λ) = Pb(B|λ),
(21)
from which one trivially obtains factorisability: Pa,b (A, B|λ) = Pa (A|λ) Pb (B|λ).
(22)
Although rather similar to (1), which in section II was claimed to be the mathematical expression of the condition of local causality, the differences and the alternative derivation are crucial.
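As a final sanity check, both halves of (21) can be verified mechanically for a toy locally causal model (a sketch with ingredients chosen by us, not prescribed by the paper): functional sufficiency, because the probability for A carries no dependence on the label b once λ is given, and statistical sufficiency, because additionally conditioning on the random variable B changes nothing; factorisability (22) then follows immediately.

```python
from itertools import product

LAMBDAS = [0, 1, 2, 3]

def p_A(A, a, lam):
    """P_a(A | lambda): Alice's wing responds to its own setting and lambda only."""
    q = 0.8 if (a + lam) % 2 == 0 else 0.3
    return q if A == 1 else 1 - q

def p_B(B, b, lam):
    """P_b(B | lambda): Bob's wing likewise."""
    q = 0.6 if (b + lam) % 2 == 0 else 0.1
    return q if B == 1 else 1 - q

def p_joint(A, B, a, b, lam):
    """The toy model is defined in the factorised form of eq. (22)."""
    return p_A(A, a, lam) * p_B(B, b, lam)

def p_A_given(A, a, b, lam, B=None):
    """P_{a,b}(A | lambda), or P_{a,b}(A | B, lambda) when an outcome B is supplied."""
    num = sum(p_joint(A, Bv, a, b, lam) for Bv in (0, 1) if B is None or Bv == B)
    den = sum(p_joint(Av, Bv, a, b, lam) for Av in (0, 1) for Bv in (0, 1)
              if B is None or Bv == B)
    return num / den

ok = True
for A, a, b, lam, B in product((0, 1), (0, 1), (0, 1), LAMBDAS, (0, 1)):
    # statistical sufficiency of lambda: conditioning on the random variable B is redundant
    ok &= abs(p_A_given(A, a, b, lam, B=B) - p_A_given(A, a, b, lam)) < 1e-12
    # functional sufficiency of lambda: the label b is redundant
    ok &= abs(p_A_given(A, a, b, lam) - p_A_given(A, a, 1 - b, lam)) < 1e-12
print("Both halves of condition (21) hold in this toy model:", ok)
```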
VI. NOT THROWING OUT THE BABY WITH THE BATHWATER
A. Remarks
(one) All this might look like an overcomplicated way to obtain the already well-known, i.e., something similar to (1), but we have in fact obtained quite a lot: a mathematically clean formulation (21) that brings to the forefront crucial aspects otherwise left out. The formalisation, firstly, encodes the particular notions of sufficiency and redundancy that are involved; secondly, it incorporates the theoretical distinction between outcomes (random variables) and settings (labels) enforced by the 'free variables' assumption; and finally, it indicates rigorously where the constraints set by the notions of locality and causality enter the mathematical formalisation. This latter point has not been stressed before and needs some elaboration. Note that the mathematical formalisation of sufficiency itself needs no requirement whatsoever of locality or causality, as can be deduced from the formal analysis of section IV. We must thus look elsewhere, and here is our proposal. When applying both statistical and functional sufficiency to the setup of Fig. 3, requirements of locality and causation necessarily come into play when fulfilling the requirements (α), (β1) and (β2) of section III, page 436. Firstly, it is by an appeal to the principle that causality can only be local in the sense of Fig. 1 that the purpose for which λ is sufficient is specified. Indeed, because local causality stipulates that causes operating in a certain region
in space-time must lie in the backward light cone of that region and effects of anything occurring in that region can only lie in its forward light cone, we get the inference that anything outside the backward and forward light cone of that particular space-time region should be causally redundant. See Fig. 1 and the very first quote by Bell in section II. But in order to distinguish mere correlation from causal influence, it is required, secondly, that the range Rλ of allowable beable specifications be restricted by considerations regarding locality and causality. The beables in Rλ must lie in a space-time region with specific characteristics, namely, it must shield off the overlap of the light cones of region 1 and 2 from these same regions 1 and 2. The possibility of spurious violations of local causality mentioned in remark (iii) on page 431 is in this way eliminated. In conclusion, local causality is not a mere statistical no-correlations requirement; on the contrary, it has a substantial relation to the relativistic causal structure of Fig. 2. See also Norsen (2007, [15]), who argues extensively for this point. According to the analysis presented here, Bell's condition of local causality is a special form of statistical and functional sufficiency, where the purpose for which λ is claimed to be sufficient, and the range of validity (the class Rλ of beable specifications to which it is relative), are motivated by locality and causality constraints. It is important to realize that the notions of locality and causality referred to here involve nothing but the special relativistic causal structure as exemplified in the light cone structure of Fig. 1. Importantly, there is no need for some philosophical theory of causation or an appeal to the problematic notion of Reichenbach's Principle of the Common Cause, or the like (see also Norsen (2007, [15])). (two) It is only λ that is supposed to be sufficient, and not λ plus one or both of the settings. Of course the settings a and b are needed to determine the outcomes A and B respectively, but, remember, prediction of outcomes of measurement is not the purpose for which λ is supposed to be sufficient! As was mentioned in footnote 5, Norsen (2007 [15], 2009 [16]) does include the local setting in the specification of the beables that are supposed to render some other beables in a space-like separated region redundant. However, this is not needed, and in fact even unwanted.19 For, after all, what should be in Rλ? Only those beables whose causal past could be correlated, in the sense of Fig. 1 and according to the candidate theory in question, to the causal past of the beables that are to be rendered redundant, i.e., the beables in region 2, such as B, b. Being free variables, the local settings should thus not be in Rλ, as the following example shows. If including a in the beable specification of region 3 could be relevant to render b, B in region 2 redundant for the prediction of the probability of obtaining A, then there needs to be a genuine possibility in the candidate theory under study for correlations between the causal past of a and events in region 2 (see Fig. 2 and Fig. 3). However, this is excluded by the 'free variables' assumption. We can think of a 19 Also, it should be mentioned that Bell himself explicitly refrains from doing this. See Fig. (6.6) of Bell (1990 [9, p. 108]).
being chosen at the very last instant, and therefore (in any locally causal theory) a can make no difference as to whether beables in space-like separated region 2 are, or are not, redundant for prediction of obtaining outcome A in region 1. Also, mathematically including a in Rλ makes no difference and is therefore better left out. (three) One could proceed in a two-step procedure to obtain the mathematical formulation of local causality (21) and the condition of factorisability (22), by firstly requiring statistical sufficiency (20) so as to obtain Pa,b (A|B, λ) = Pa,b (A|λ), Pa,b (B|A, λ) = Pa,b (B|λ),
(23)
and only then functional sufficiency (6) to subsequently get: Pa,b (A|λ) = Pa (A|λ), Pa,b (B|λ) = Pb (B|λ).
(24)
These two requirements together indeed imply (21) and (22). It might be tempting to think of each of these two conditions (23) and (24) as implementing a different, weaker assumption than local causality itself. For, after all, it is the conjunction of the two that gives the desired condition (21). Logically this is indeed true. But despite this theoretical difference, the physical status of the two conditions is exactly the same. Both are a consequence of local causality, and the appeal to notions of locality and causality used in implementing the functional and the statistical sufficiency is just the same; see remark (one) above. Elsewhere this will be further argued for (Seevinck & Uffink, 2010, [18]), and the comparison to a similar famous two-step procedure by Jarrett (1984, [14]) and Shimony (1984, [19]) will be presented there. (four) Orthodox quantum mechanics violates (22). Indeed, as is well known, the quantum mechanical predictions using the singlet state can easily be used to provide such a violation. A closer look reveals that the theory violates statistical sufficiency because it violates (23), but it obeys (24) and thereby functional sufficiency. From this we can conclude that quantum mechanics does not provide a beable specification λ in region 3 with the correct characteristics, i.e., the theory is unable to provide a specification of beables in any appropriate region 3 such that the outcome B always becomes redundant for determining the probability of outcome A. It is tempting to draw grander conclusions than this one, say, of a somewhat foundational, philosophical or metaphysical nature. We will refrain from doing that here, but in remark (six) below we will address the controversial question of what legitimate conclusions can be drawn from violations of local causality.
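For definiteness, the computation behind remark (four) can be displayed in the standard way. Take λ to be the singlet state ψ (the only specification orthodox quantum mechanics offers for a region 3) and let θ be the angle between the setting directions a and b; then

\begin{align*}
P_{a,b}(A{=}{+}1\mid\lambda) &= \tfrac{1}{2} = P_{a}(A{=}{+}1\mid\lambda), && \text{so (24) holds},\\
P_{a,b}(A{=}{+}1\mid B{=}{+}1,\lambda) &= \sin^{2}(\theta/2) \neq \tfrac{1}{2} \quad (\theta \neq \pi/2), && \text{so (23) fails}.
\end{align*}

This is just the familiar point that quantum mechanics respects no-signalling while exhibiting outcome correlations that no specification it provides for region 3 screens off.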
(five) The qualification of the class of beables Rλ is essential because we must carefully ensure that no spurious violations of local causality are allowed for, and at the same time that we do not encode too much in the specification λ so as to come into conflict with the 'free variables' assumption. Therefore, as was shown earlier, the specification cannot be taken to be complete, as the free variables must be left out, yet it must be allowed to range over all other beable specifications the candidate theory in question allows for. This justifies calling the specification of λ 'sufficiently complete', meaning that nothing causally relevant that is not 'free' from the candidate theory under consideration is left out, without claiming the specification to be complete in the sense that everything in the theory is included, including the free variables. This ensures that the spurious violations of remarks (iii) and (iv) cannot occur, and that local causality is not a mere no-correlations condition. (six) Suppose (21) fails; how are we to interpret this? Well, given our remarks above, there seems to be only one option, namely that local causality is violated, i.e., there must be some non-local causation present in the candidate theory under study. We cannot blame a violation of (21) on the specification λ not being sufficiently well-specified, for, as was just argued in remark (five) above, Rλ ranges over all allowable beable descriptions and should be regarded as 'sufficiently complete'. Nor can we blame such a violation on the existence of 'locally explicable' correlations. The space-time structure of Fig. 3 and the further specification of local causality via the notion of sufficiency, including the requirements (α), (β1) and (β2), exclude any spurious violations of (21) due to correlations that do allow a locally causal explanation. This we argued for in section III. The question then arises what it (philosophically, metaphysically) means for local causality to be violated. Such an investigation will be undertaken elsewhere, but a preliminary discussion is presented in the next, final, subsection. As a final remark we wish to present a point made by Norsen (2009, [16, p. 12])20 concerning a violation of (21): It isn't necessarily that something in region 2 is causally influencing something in region 1, or vice versa. It is always possible that there is some other event, neither in region 1 nor region 2, which was not determined by [λ], and which itself causally influences both [beables in region 1] and [in region 2]. The point is, though, that this causal influence would have to be non-local (i.e., would have to violate the special relativistic causal structure sketched in [Fig. 2]).
20 See footnote 8 for the reason of mentioning it here.
VII. ENVOI Have we thrown out the baby with the bathwater? We believe not, as our mathematical cleaning up of the intuitive idea of local causality has proven fruitful and has clarified hitherto unknown aspects of the notion of local causality. It should be seen as a complement to Norsen's manuscripts (2007 [15], 2009 [16]), taking his analysis a step further. Also, we believe that our mathematical formalisation of Bell's notion of local causality further unearths its rich conceptual background, and that it thereby brings us a bit closer to answering the hard and open foundational questions that arise from attempts to incorporate violations of local causality into our physical worldview. What critical light this novel mathematical formalisation throws on other interpretations and motivations of Bell's notion of local causality is still to be worked out. This we hope to take up in the near future (Seevinck & Uffink, 2010, [18]). Acknowledgements MPS acknowledges very fruitful conversations with Hans Westman and Eric Cavalcanti and thanks the Centre for Time, Sydney, Australia, for hosting him as a guest researcher. JU acknowledges fruitful discussions with Joe Henson and thanks the Perimeter Institute, Waterloo, Canada, for generous hospitality.
REFERENCES
[1] Bell, J. S. (1964). On the Einstein-Podolsky-Rosen Paradox, Physics 1, 195. Reprinted in [8], chapter 2.
[2] Bell, J. S. (1976). The theory of local beables, Epistemological Letters, vol. 9, March 1976. Reprinted in Dialectica 39, 85 (1985) and in [8], chapter 7.
[3] Bell, J. S. (1977). Free variables and local causality, Epistemological Letters, February 1977. Reprinted in [8], chapter 12.
[4] Bell, J. S. (1980). Atomic-cascade photons and quantum-mechanical nonlocality. Comments on Atomic and Molecular Physics 9, 121. Reprinted in [8], chapter 13.
[5] Bell, J. S. (1981). Bertlmann's socks and the nature of reality, Journal de Physique, Colloque C2, suppl. au numéro 3, Tome 42, 41. Reprinted in [8], chapter 16.
[6] Bell, J. S. (1984). Beables for quantum field theory, CERN-TH.4035/84. Reprinted in [8], chapter 19.
[7] Bell, J. S. (1986). EPR correlations and EPW distributions. In New Techniques and Ideas in Quantum Measurement Theory, New York Academy of Sciences. Reprinted in [8], chapter 21.
[8] Bell, J. S. (1987). Speakable and unspeakable in quantum mechanics. Cambridge: Cambridge University Press.
[9] Bell, J. S. (1990). La nouvelle cuisine. In A. Sarlemijn and P. Kroes (Eds.), Between Science and Technology (pp. 97-115). Elsevier (North-Holland).
[10] Cartwright, N., Jones, M. (1991). How to hunt quantum causes. Erkenntnis 35, 205-231.
[11] Cavalcanti, E. G. (2008). Reality, locality and all that: "experimental metaphysics" and the quantum foundations. PhD thesis, the University of Queensland. arXiv:0810.4974 (v1, 28 Oct 2008).
[12] Elby, A., Brown, H. R., Foster, S. (1993). What makes a Theory Physically "Complete"?, Found. Phys. 23, 971.
[13] Fisher, R. A. (1922). On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society A, 222, 309-368.
[14] Jarrett, J. P. (1984). On the Physical Significance of the Locality Conditions in the Bell Arguments, Noûs 18, 569.
[15] Norsen, T. (2007). J. S. Bell's Concept of Local Causality, arXiv:0707.0401 (v2, 6 Jan 2010).
[16] Norsen, T. (2009). Local Causality and Completeness: Bell vs. Jarrett, Found. Phys. 39, 273.
[17] Seevinck, M. P. (2008). Parts and Wholes, PhD thesis, University of Utrecht. arXiv:0811.1027 (v3, 23 Apr 2009).
[18] Seevinck, M. P., Uffink, J. (2010). In preparation.
[19] Shimony, A. (1984). In Proceedings of the International Symposium: Foundations of Quantum Mechanics in the Light of New Technology, Tokyo, 1984.
M. P. Seevinck Institute for Mathematics, Astrophysics and Particle Physics Centre for the History of Philosophy and Science Faculty of Philosophy Radboud University Nijmegen Heyendaalseweg 135 6525 AJ Nijmegen The Netherlands and Institute for History and Foundations of Science Department of Physics and Astronomy Utrecht University The Netherlands [email protected] J. Uffink Department of Physics and Astronomy Utrecht University PO Box 80010 3508 AT Utrecht The Netherlands [email protected]
Team E History of the Philosophy of Science
BERNA KILINC1
KANT ON CHANCE AND EXPLANATION
1. INTRODUCTION On several occasions, Kant posed a dichotomy or trichotomy concerning the possible ways to explain the existence of a thing, within or outside of a proper science. For instance, writing in connection with artifacts and organisms in the Critique of Judgment, he pondered: Now if one asks why a thing exists, the answer is either that its existence and its generation have no relation at all to a cause acting according to intentions, and in that case one always understands its origin to be in the mechanism of nature; or there is some intentional ground of its existence (as a contingent natural being).2
A further division of grounds is found in the following sentence from the Critique of Pure Reason: “The world exists either through blind chance, or through inner necessity, or through an external cause”, where the latter cause is understood as a purposeful agency (A74/B99). In view of the fact that Kant took disjunction in the exclusive sense, the first passage poses a dilemma and the latter a trilemma concerning explanatory resources. Identifying mechanical explanation with “inner necessity”, at least as a first approximation, the two passages above, as well as several others scattered in Kant’s writings, point to the following trichotomy: The explanation of why X exists may be either 1) X is brought about by chance, 2) X has to exist (out of necessity), or 3) X is produced to achieve some goal (by some purposeful agency). Corresponding broadly to what can be briefly referred to as 1) chance, 2) natural and 3) teleological/intentional accounts, these explanatory resources are held by Kant to be mutually exclusive and jointly exhaustive of all possible ways of reflecting on why a thing exists. As is well known, Kant thought a proper science (of nature) should only produce natural explanations, even though at the time he was writing teleological viewpoints usually provided the best explanations of structure and function in living organisms. Instead of granting teleology an explanatory role, Kant recommended it as a regulative 1 2
This research is supported by Bogazici University research funds 08B202. Critique of the Power of Judgment, Cambridge: Cambridge University Press 2000, 5:426. Even when English editions are used, all citations from Kant’s works are referenced by the volume and page numbers of the standard Akademie edition, Kants gesammelte Schriften (except for citations from the Critique of Pure Reason, Cambridge: Cambridge University Press 1997, which will be referred to, in the usual fashion, without specifying the volume). If no English edition is indicated, translations are mine.
idea for guiding the study of biological organisms. Accordingly, one could regard parts of organic beings as if they were intentionally produced to fulfill certain functions, without invoking the latter as the explanation of the former (that is, the existence of the parts). As regards chance accounts, he was more severe, for he dismissed them as being neither explanatory nor regulative. The question I want to pose is why he did not recommend chance explanations on analogy with teleology, as an idea for guiding research. I show in this paper that this question is not anachronistic, as Kant was familiar with the works on the calculus of chances that were being discussed by several eminent mathematicians and philosophers in the period. As Jakob Bernoulli or Abraham De Moivre thought, even though we may grant determinism and describe the outcome of a tossed die as fully determined by the antecedent conditions and laws of mechanics, we can also represent the same event as not fully determined but with a definite chance of happening. The chance hypothesis can then be used to explain the distribution of outcomes in repeated tosses of a die, or the distribution of birth rates in a population or other kinds of demographic phenomena. Establishing that Kant overlooked an important method that was beginning to flourish in his period, I analyse in this paper the possible reasons why. Of course, the motivations are not hard to come by, as Kant was committed to the precepts of rational cosmology as paradigmatic for a proper science. In his renunciation of chance, Kant was following Baumgarten and other metaphysicians in framing rational cosmology by ruling out fate, chance, gaps and leaps.3 This aversion to chance found its expression at both explanatory and regulative levels. Kant denied chance any instrumental role within a scientific method. As the most plausible reason for this stance, I provide in this paper an analysis of Kant’s concept of contingency.
2. KANT ON CHANCE From early on, as recorded and developed in Aristotle’s writings, chance was defined as a contrastive notion, primarily in opposition to what is purposeful or intentional, but also in opposition to what is usually the case. Kant retained these contrasts. He used two German words that are translated as chance or accident: Zufall and Ungefähr, and sometimes parenthetically, the Latin word casus. There is some evidence that he did distinguish between the meanings of these words, as in the following reflection: “Chance [Zufall] is posited as the opposite of that which is designed for a purpose (of that which human reason can contrive when 3
Alexander Gottlieb Baumgarten stated: “An event in the world of which the sufficient reason is unknown is chance. Pure chance, for which there is no sufficient reason, is impossible”. (Metaphysica, 17:107-8; quoted in Eric Watkins, “Kant on Rational Cosmology”, in Eric Watkins (Ed.) Kant and the Sciences. New York: Oxford University Press 2001, p.72).
it comes to ends); chanciness [ungefähr] of that which happens according to unvarying and known laws.”4 However he conjoined these two meanings when in the Heavens he commented on the Epicurean cosmology as follows: The previously mentioned teachers of the mechanical development of the cosmic structure [Epicurus, Leucippus, Democritus, Lucretius] derived all order which can be observed in it from chance accident [ungefähren Zufalle], which allowed the atoms to come together in such a fortunate way that they created a well-ordered totality. … In my theory, by contrast, I find matter bound to certain necessary laws. … This does not happen through accident or chance [Zufall, ungefähr]. By contrast, we see that natural characteristics necessarily bring this condition with them.5
As this passage reveals, Kant took pride in being able to explain through laws of physics features of the planetary system, such as its flattened shape, for which a mechanical explanation could not be found previously.6 He could thus claim that it was not the randomly swirling motions of material particles that just happened to cluster by a fluke but the lawful evolution of an originally diffused nebulous matter that necessitated the present state of the planets and the sun.7 In the passage from the Heavens, the conjunction of the two possible renderings of chance, ‘ungefähre Zufall’, makes chance the contrary of both teleological and the natural explanations. Kant was not entirely consistent in this usage, as can be seen in the following parenthetical remark from the 1790s: “The word chance [Zufall] often indicates merely an event which is not according to order or rule-ungoverned”.8 Given the ambiguity of the words for chance in German, in what follows I use the word chance to render the conjunction “ungefähre Zufall” so as to retain its place within the trilemma sketched in the introduction. 4
5 6
7 8
“Der Zufall wird dem Zwekmäßigen (dem, was die Menschliche Vernunft bey Zwekken in Anschlag bringen kann.) entgegengesetzt; das ungefähr dem, was nach beständigen und bekanten Gesetzen geschieht.” (18:163, Refl.5370). Universal Natural History and the Theory of Heavens trans. by Ian Johnson. Virginia: Richer Resources Publications 2008, 1:XXVII. Beginning with Isaac Newton’s queries, these features of the solar system, such as the uniform direction of the motion of planets and their satellites, were attributed to divine intervention, not unlike miracles (See the 31st query of Newton’s Optics (1704), and the General Scholium for the second edition in 1713 of the Principia). Kant seconded the prevailing opinion that “mere chance” could not account for these uniformities. Concerning the moons of the planets, he asked “why move these companions in their common direction rather toward one side toward which the planets orbit, rather than toward another? … all directions are equal[ly possible]; a mere chance will choose among all these possible directions that in which the singing motion of the stuff issues in orbits.” Universal Natural History, op. cit., 1:63. Kant expressed disapproval of the Epicurean cosmology on several occasions, precisely because it assumed chance happenings. Lectures on Metaphysics, trans. by Karl Ameriks and Steve Naragon, Cambridge: Cambridge University Press 1997, 28:663.
Kant’s pronouncements on chance, few and scattered though they were, also point to a close association between this concept and the categories of contingency/necessity. This association is facilitated in German by the very etymology of the words Zufall and Zufälligkeit (contingency). For instance, in Herder’s transcription of his metaphysics lectures from 1762–4, we find: “what is not necessary by a sufficient determinate ground is chance. … It is also called chance when one indeed knows grounds of an event, but does not have in view all the grounds of the event.”9 In the mid-1770s, he defined chance as “blind accident”, articulating this as “an event which is contingent, and indeed, that the contingency takes place in every regard”.10 In these passages, chance comes out as a kind of contingency. It seems that Kant at times used the words chance and contingency interchangeably. For Kant, contingency, the opposite of necessity, is a categorical notion, and should not come in degrees. Yet, Kant discerned degrees when he remarked, for instance, in the Critique of Judgment: if one adduces, e.g., the structure of a bird, the hollowness of its bones, the placement of its wings for movement and of its tail for steering, etc., one says that given the mere nexus effectivus in nature, without the help of a special kind of causality, namely that of ends (nexus finalis), this is all in the highest degree contingent: i.e., that nature, considered as a mere mechanism, could have formed itself in a thousand different ways without hitting precisely upon the unity in accordance with such a rule …11
This observation about the structural and functional unity of an organism and the impossibility of explaining this unity by “the mechanical mode of explanation” is a recurring theme in the Critique of Judgment. As Kant reckoned, it is beyond the pale of understanding to explain why of all the myriad ways parts of an animal could have developed and assembled one combination with such an amazing functional unity fell into place. Similarly, in his example of encountering a regular hexagon drawn in the sand in an uninhabited land, Kant noted that we cannot attribute the drawing to natural causes (the sand, the nearby sea, the wind, the footprints of any known animals, or any other nonrational cause), since the “contingency” of it being due to a rational being, is “so infinitely great”.12 One could conceive of chance as a magnitude in the period when Kant wrote, and that may explain this easy slide between degrees of chance and degrees of contingency. Taken in its official signification however, contingency, like unity, actuality or causality, cannot have degrees. Events are contingent or not, excluding middling degrees.
9 10 11 12
Ibid., 28:41. Ibid., 28:199. Critique of Judgment, loc. cit. 5:360, third italics mine. Ibid., 5:370.
Kant on Chance and Explanation
457
3. THE CHANCE HYPOTHESIS That some phenomena may be due to chance was conceived not only by the Epicurean atomists, but also by the mathematically inclined investigators of demographic phenomena since the seventeenth century. In one of the earliest studies on the London Bills of mortality, the London merchant John Graunt suggested in 1662 that, assuming that an individual has an equal chance of dying in any of the decades between the ages six and seventy-six, one can explain the mortality results (summed up in decades) in the London Bills. The chance hypothesis was put to test in a short tract on birth ratios that appeared in 1710–11. The author, the British physician John Arbuthnot, understood by chance an equal tendency for male and female births, and argued that the results of the compilation of registries for the last 82 years, all indicating a preponderance of male births, would have had a staggeringly low probability were the chance hypothesis true. Noting that “it [such an extreme result] is very improbable (if mere Chance govern’d)”, Arbuthnot favored the design hypothesis, which he construed as a bias for male births.13 Arbuthnot’s way of characterizing the chance hypothesis (as the specification of equal probabilities) was common in the period, even though his presupposition that each alternative outcome gets equal probability was questioned. Soon after the publication of Arbuthnot’s tract, Nicholas Bernoulli remodeled the demographic data by using another chance hypothesis, namely, that the chance of a male birth is 18/35 and argued: “Let 14 000 dice, each having 35 faces, 18 white and 17 black be thrown up, and the odds are very great indeed that the numbers of black and white faces shall come as near, or nearer to each other, as the number of boys and girls in the bills.”14 Kant was also fascinated by the kinds of statistical regularities Arbuthnot and many other writers on political arithmetic had pointed out. He wrote as early as in 1763 in the The Only Possible Argument: Experience, too, confirms the dependency of even the freest actions upon a major natural rule. For, contingent as the decision to marry may be, it is nonetheless found that in a given country the proportion of marriage to the number of those living is fairly constant, provided that one operates with large numbers. For example, it will be found that among one hundred and ten persons of both sexes there will be one married couple. The extent to which man’s freedom contributes to the lengthening or shortening of life is a matter of common knowledge. Nonetheless, even these free actions must be subject to a greater order, for, on average, if one operates with large numbers, the number of mortalities stands in a very exact and constant ratio to the number of the living. These few proofs may suffice in some 13 John Arbuthnot, “An Argument for divine providence taken from the constant regularity observed in the births of both sexes”, in Philosophical Transactions of the Royal Society of London, 27 (1710-12):186-190, at p. 188. Arbuthnot’s reasoning appears to be the first published statistical test of significance. See Ian Hacking, The Emergence of Probability. Cambridge: Cambridge University Press, 1975, pp.166-9. 14 Quoted in Hacking, op. cit., p.168.
458
Berna Kilinc
measure to explain the fact that even the laws of freedom do not, in respect of the rules of the general order of nature, involve any such indeterminacy.15
It is surprising that Kant combined law and freedom in this fashion, given the antithetical position he defended (and also subscribed to at the end) in the third antinomy of reason: one cannot say that in place of the laws of nature, laws of freedom enter into the course of the world, because if freedom were determined according to laws, it would not be freedom, but nothing other than nature. Thus nature and transcendental freedom are as different as lawfulness and lawlessness (A447/B475).
One cannot simply explain away Kant’s recourse to laws of freedom in his pre-critical writings as an immature metaphysical commitment, since Kant did refer to the same in the Critique of Reason, commenting on his existential despair over “the contingency of conception”, and finding solace in the belief that “the contingency in the individual is nonetheless subjected to a rule in the whole” (A779/B807).16 Kant interpreted the regularities in demographic phenomena in the same way several theologians did in the period, most notably the Protestant pastor Johann Peter Süssmilch.17 According to them, long run statistical stability in human affairs was another indication of the original wise design of the world. Kant echoed the same in his 1784 essay On History, and more explicitly in a reflection: That which, as an individual event, happens by chance is, they say, on the whole necessary; as, for instance, what a human being will do is, quite naturally, by chance; on the whole, however, he does what God wills; in a similar way, providence cares about the preservation of the species and has abandoned every individual.18
Kant was familiar not only with the statistical studies on demographic phenomena, but also with the probability calculus. He referred to the calculus of probabilities both in the Critique of Pure Reason (A293/B349) and in the Prolegomena.19 He delivered lectures on probability as recorded in almost all of the surviving 15 Theoretical Philosophy 1755-1770, trans. and ed. by David Walford. Cambridge University Press 1992, 2:111. 16 “… der Zufall im einzelnen nichts desto weniger einer Regel im ganzen unterworfen ist …” 17 Süssmilch’s Die göttliche Ordnung in den Veränderungen des menschlichen Geschlechts aus der Geburt, dem Tode und der Fortpflanzung desselben erwiesen appeared in 1740 and went through four editions by 1798. See Theodore M. Porter, The Rise of Statistical Thinking 1820-1900. Princeton: Princeton University Press, 1986, p. 21, 50-51. 18 16:438, Refl. 2611. 19 Prolegomena to Any Future Metaphysics, New York: Macmillan Publishing Company 1950, 4:369-370.
Kant on Chance and Explanation
459
transcriptions of his logic lectures. There is also good evidence that Kant was familiar with Jakob Bernoulli’s theorem (the weak law of large numbers), for we find scribbled in one of the reflections the very problem that Bernoulli had set out to prove: “To guess, from the actual number of cases (of white and black ball) the number of both there are”.20 In his theorem, Bernoulli presented a model of causation in which the composition of the urn (the ratio of white to black pebbles) from which repeated drawings were made could be seen as the chance-cause of the distribution of outcomes after many trials. Given Kant’s familiarity with these new ways of dealing with uncertainty through the chance hypothesis, it would not be too far-fetched to expect their influence to show in his philosophy of science. For instance, he could have noted that the hypothesis that the chance of marriage of an adult individual is 1/110 explains the population regularity regarding marriages. Chance was precisely the concept invoked to model variability in individuals and the stability in populations in the long run. Yet, in this context instead of invoking chance, Kant preferred teleology, in the style of natural theology tradition. In the following section, I go a little farther into Kant’s account of cognition and science to provide some reasons for this omission.
4. WHY IS CHANCE ILLEGITIMATE? Kant held that chance should not be appealed to in explaining any phenomenon. I believe this repudiation of chance may be related to the peculiar role of the categories of contingency and necessity in explanations. Kant invoked the categories of contingency and necessity as concepts of the understanding responsible for thinking that which the category of actuality provided. The actual X, that is, that which we cognize as existing, is further judged as existing contingently or necessarily. Superimposing Kant’s threefold distinction among the modes of explanation of an existing thing X, namely chance, teleological/intentional and natural modes of explanation upon the categories of contingency and necessity, we see that the first and second modes of explanation can be associated with contingency. In other words, both chance and teleology/intention, if correctly deployable, are contingent modes of understanding. Contingency, however, means that which the reason cannot yet grasp as necessary. The goal of a proper science is to displace the contingent with the necessary. Let me spell out this point in more detail. For Kant, the categories are necessary to conceptualize the truth-makers of the kinds of judgments we are capable of making in order to represent empirical states of affairs. They are abstract patterns for thinking objects—“forms of thought”—by combining the sensory content (intuitions) provided by sensibility (B288). Categories are objective for they objectify. While Kant proclaimed this for all the cat20 16:440, Refl. 2619.
460
Berna Kilinc
egories, he found it necessary to qualify in the case of the modal categories: “The principles of modality are not, however, objective-synthetic, since the predicates of possibility, actuality, and necessity do not in the least augment the concept of which they are asserted in such a way as to add something to the representation of the object” (A233/B286). Kant underlined this contention by adding that the principles of modality are synthetic but in a subjective way. The categories of contingency and necessity qualify the cognition of actuality. The actuality of an object depends on the deliverances of the “material of the senses”, but not in a reductionist way. Kant did not reduce cognition of actuality to direct perception, and thereby departed from a strict empiricist criterion for existence such as observability. More precisely, Kant allowed for a distinction between what is immediately actual and what is mediately actual (to an individual cognizer). In Kant’s account, the latter is some sort of a closure of the former through the analogies of experience. One’s immediate encounter with the world is the basis of this extension, which may overlap with or without a remainder with what is immediately actual to another person. The domain of the actual is thus token-related to an individual cognizer. Actuality depends on the position of the cognizer in the world, projecting the cognizer’s immediate discursive grasp of that position in space and time: “wherever perception and whatever is appended to it in accordance with empirical laws reaches, there too reaches our cognition of the existence of things” (A226/B273). The use of the categories of contingency and necessity augments that dependence further, extending to the epistemic background of an agent. Kant maintained that all necessity is “comprehensible necessity” (A228/B280), for we judge only states of substances to be necessary on the basis of laws of causality. “Necessity therefore concerns only the relations of appearances in accordance with the dynamical law of causality, and the possibility grounded upon it of inferring a priori from some given existence (a cause) to another existence (the effect).” (A227-8/ B280). The contingency of a state of a substance, as can be inferred from the characterization of necessity, arises from an inability to access the law of causality that connects it with the previous states of this substance. The contingency does not therefore indicate the absence of a causal link, but rather the manner of cognizing the actual, as nothing but the actual. Kant’s explication of the category of contingency was very sparse, but he felt the need to point out that this category should not be understood in connection with the second analogy, lest it becomes a mere tautology: That the proposition “Everything contingent must have a cause” may be evident to everyone from mere concepts is not to be denied; but then the concept of the contingent is already taken in such a way that it contains, not the category of modality (as something, the nonexistence of which can be thought), but that of relation (as something that can only exist as the consequence of something else), and then it is, of course, an identical proposition: “What can only exist as a consequence has its cause.” (B290).
In its modal role, the category of contingency depended on the subject, on whether she could consistently hold the existing state of affairs to be not inevitable. In this way, what is contingent for one person may be necessary for someone else, and vice versa. The modal categories are subjective in that they do not serve to describe a knower-independent (but necessarily cognition-dependent) aspect of empirical reality. Another consequence of this kind of dependency is that judgments of contingency do not necessarily become obsolete with the passage of time, even when the state of affairs judged to be contingent obtained in the past. Whether the existences are in the past or in the future, the scientific enterprise lies in showing the contingent to be necessary. Writing “to explain means to derive from a principle,” Kant would concur with Carl Hempel that a scientific explanation requires the derivation of the explanandum from a general law.21 Kant’s demand was stronger, however, in that the law in question should state a necessity, at least in a proper rational science, in which “necessity of laws is inseparably attached to the concept of nature, and therefore makes claim to be thoroughly comprehended.”22 He held that Humean regularities cannot explain, unless those regularities are grounded in necessity. As he described it, on the Humean account, the connection of cause and effect was “only subjectively necessary, but objectively merely contingent … precisely as Hume has it when he calls this mere illusion from custom.”23 In contrast, laws for Kant are “principles of the necessity of that which belongs to the existence of a thing”.24 When such necessary laws are not yet known, Kant must have conceded the ascriptions of chance, as he himself used that ascription on several occasions, for instance, remarking on the “chance defects” of anomalous organisms.25 Yet, such ascriptions do not explain. Neither chance nor teleological explanations are legitimized by an admission of ignorance.
5. KANT ON SCIENTIFIC METHOD The above consideration shows that both chance and teleological/intentional explanations are unacceptable for Kant in a scientific endeavor, for they are contingent. Indeed, instead of embracing teleological explanations where they were the most needed, namely in understanding structure and function in living organisms, Kant accorded teleology a regulative function, permitting its use in purposeindicating but not truth-evaluable hypotheses in the hopes that this may generate 21 Critique of Judgment, op.cit. 5:412. 22 Metaphysical Foundations of Natural Science ed. by Michael Friedman, Cambridge: Cambridge University Press 2004, 4:469. 23 Ibid., 4:476. 24 Ibid., 4:469. 25 Critique of Judgment, op.cit. 5:372.
better natural explanations.26 As he put it, “teleological judging is rightly drawn into our research into nature, at least problematically, but only in order to bring it under principles of observation and research in analogy with causality according to ends, without presuming thereby to explain it.”27 One can treat organisms as if their internal parts were organized to fulfill certain functions and ends in order to describe those parts and those functions better. The nineteenth century advocates of non-teleological evolution, appropriating the findings of design analogy, put these results to a very different kind of explanation—a fact that testifies to the correctness of Kant’s foresight about how the metaphysical “counterfactuals” of method (that is, the “as if” view of things) need not determine the metaphysical ingredients of the resultant research. However, Kant’s strategy of incorporating an unjustifiable metaphysics into methodological precepts did not go so far as to license the chance hypothesis. Instead, when the chance hypothesis was the most appropriate, as in demographic regularities, he opted for the teleological mode. Are there reasons deriving from Kant’s idea of science and scientific method that may explain this aversion to chance? For Kant, apart from the disclosure of necessary laws, there were two major demarcation criteria for a proper science: systematicity and quantification. As regards the second criterion, the chance method should be fully admissible, as it is intended to produce precise and quantifiable results. As regards the first criterion, one may conjecture that Kant would regard the chance method to be detrimental to the achievement of a systematic account of nature. We know how to combine various chance hypotheses, but neither Kant nor his contemporaries knew about the abstract frameworks that can model the conjunction of statistical theories. Another possible objection to the chance method is that it conflicts with the natural and the teleological methods. In Kant’s favorable description of the method of naturalists, this opposition is made clear: the anatomists of plants and animals, in order to investigate their structure and to understand for what reason and to what end they have been given such a disposition and combination of parts and precisely this internal form, assume as indispensably necessary the maxim that nothing in such a creature is in vain, and likewise adopt it as the fundamental principle of the general doctrine of nature that nothing happens by chance.28
Could Kant be content with methodological maxims that guided research into different and even opposing directions? The fact that he did embrace both the naturalistic and the teleological methods in the Critique of Judgment shows that he found this unproblematic. There is further evidence that Kant thought maxims may be at odds with each other. Concerning natural classifications, Kant noted that 26 Kant would not oppose invoking intentional human agency, for instance, in his example of inferring the cause of a polygon drawn in sand. 27 Critique of Judgment, op. cit. 5:361. 28 Ibid., 5:376.
there may be conflicting principles of reason—guiding the lumpers and splitters in natural history—and they may issue in different extensions and systematizations of current cognitions (A654/B682). Instead of advocating one of these maxims to the exclusion of the other, he embraced both, supplementing the traditional maxim that “entities are not to be multiplied unnecessarily” with the one he himself formulated: “the varieties of entities are not to be diminished rashly” (A656/B684). Thus, he did not carry over a demand for systematicity to maxims of scientific research.
6. CONCLUSION I conclude with a small observation to sum up the points made. In many contexts in which Kant remarked on the nature of explanation, explanation pertained to appearances (see A476-7/B504-5, A480-1/B508-9, A562/B590, A772-3/B800-801). It is curious that Kant would regard appearances, the non-discursive representations through sensibility, rather than phenomena, to constitute the explananda. Appearances, like intuitions, should be blind, if anything can be said of them at all. But this may precisely be the reason why Kant thought explaining them was the ultimate achievement of understanding and reason. Understanding, in the Kantian sense, is an attempt to press appearances into discourse by the help of categories. Kant would also often say of chance that it is blind, in the sense that it deprives the understanding of all use and is “a breach against the laws of the understanding and of reason”.29 In some sense then appearances are all chancy, at the level at which they are given, for they are not yet grasped by understanding or reason. Their explanation is provided by the way the subject assimilates the given to her understanding and reason. But this assimilation, according to Kant, had to respect the idea of nature, precluding chance and teleology. In the framework of transcendental idealism, it would be more accurate to say that the very assimilation in question brings about nature, just as scientific explanations bring about sciences. The point I suggested in this essay is that, granting transcendental idealism, nature is still narrowly understood by Kant. The blindness of the appearances is not necessarily tantamount to the chance in nature.
Philosophy Department Bogazici University Felsefe Bolumu 34342 Bebek, Istanbul Turkey [email protected] 29 Lectures on Metaphysics, op.cit., 28:200.
MICHAEL STÖLTZNER
SHIFTING THE (NON-RELATIVIZED) A PRIORI: HANS REICHENBACH ON CAUSALITY AND PROBABILITY (1915–1932)
In the still growing literature on the history of Logical Empiricism, the Kantian roots of some of the movement's core ideas and its relationship with neo-Kantianism have been discussed frequently. A case in point is Michael Friedman's analysis of the debates between Hans Reichenbach and Moritz Schlick during the year 1920, after which Reichenbach—prematurely, to Friedman's mind—abandoned his relativized a priori conception of space and time in favor of a straighter empiricist approach, mainly in order to force the issue with neo-Kantian critics of general relativity.1 More recently, these primarily historical investigations have been supplemented by two systematic projects according to which the relativized a priori, if conceived against the backdrop of the mathematics of the day, contains the seeds for a better understanding of modern physics and permits a defense of scientific philosophy against some of the classical criticisms by Quine, Kuhn, and others. While Friedman, through this Kantian turn, intends to maintain core insights of the Logical Empiricist tradition, Ryckman challenges the authority of Reichenbach and Schlick in matters of the special and the general theories of relativity and elaborates a different approach that departs from an alternative Kant-inspired tradition centering around the pertinent works of Ernst Cassirer and Hermann Weyl.2 Against the backdrop of these projects, which focus on geometry and the interpretation of relativity theory, it is interesting to see that in the case of the third Kantian category pivotal for the emergence of modern physics, to wit, causality, the historical development of Reichenbach's ideas has been substantially different. In matters of causality and probability, I argue, Reichenbach was more prone to an aprioristic foundation of science than with respect to space and time. In line with his shift towards empiricism, it is true, Reichenbach, after 1920, came to the conclusion that the law of causality was empirical and could be faulted by the future 1
2
Michael Friedman, “Geometry, Convention, and the Relativized A Priori: Reichenbach, Schlick, and Carnap”, in Wesley Salmon and Gereon Wolters (eds.), Logic, Language, and the Structure of Scientific Theories, Pittsburgh–Konstanz: University of Pittsburgh Press, 1994, pp. 21-34. See also Friedman’s Reconsidering Logical Positivism, Cambridge: Cambridge University Press, 1999, and, with respect to Carnap’s Aufbau, Alan Richardson, Carnap’s Construction of the World, Cambridge: Cambridge University Press, 1998. Compare Michael Friedman, Dynamics of Reason, Stanford: CSLI Publications, 2001, and Thomas A. Ryckman, The Reign of Relativity. Philosophy in Physics 1915-1925, Oxford: Oxford University Press, 2005.
development of physics. Yet, the characteristic trait of Reichenbach’s approach to causality was that he never treated the concept of causality in isolation. For, the law of causality could only be applied to the empirical world if supplemented by a second principle that involved probability and inductive inference. As a consequence, there existed in point of principle no difference between the theory of error and any other probabilistic theory of physics. The development of Reichenbach’s conception of causality and probability, from his dissertation in 1915 to his emigration in 1933, was mainly marked by a change in the epistemological status of and a gradual shift of emphasis between those two principles. The story to be told in the present paper is a complex one, and it involves both the progress in atomic physics, above all the emergence of quantum mechanics in 1926, and Reichenbach’s treatment of related issues, among them relativity theory and the direction of time. The development of Reichenbach’s ideas on causality and probability can be divided into three phases. (i) From his Ph.D.-dissertation until 1920, he considered both principles, causality and the principle of lawful distribution, as synthetic a priori. In contrast to the categories of space and time, causality was not historically relativized. Still, the departure from Kant’s original doctrine and from neoKantianism was substantial because in virtue of the second principle all physical laws, at least on the empirical level, were merely probable. (ii) In the mid-1920s, Reichenbach called causality a complex of principles, the common core of which was the inductive principle of causality. It represented a hypothesis about nature, such that physics one day could be compelled to abandon causality. Reichenbach now considered the division between both principles as merely formal and proposed a theory that was based on the concept of probable determination alone. After the advent of quantum mechanics he proudly declared that he had foreseen the demise of determinism. (iii) After 1930, Reichenbach advocated a consistently probabilistic conception of physical theory. In order to maintain his identification of inductive inference, the theory of error, and probabilistic physics against the criticism that the former could not be translated into a statement about relative frequencies, he returned to an aprioristic conception. For, any claim that probability laws do not hold was self-contradictory because it already presupposed the principle of induction. To my mind, Reichenbach in effect treated induction—in the same vein as the principle of lawful distribution more than a decade before—as an a priori condition for the possibility of experience, the only difference being that no transcendental deduction was available to justify it. Notice that causality has never become one of the alleged dogmas of Logical Empiricism, even though it represented a central theme for all those members of the movement who had a physics background. One of the reasons is that there existed substantial divergences of opinion between Reichenbach and his colleagues Philipp Frank, Richard von Mises, and Moritz Schlick. The other is that Reichenbach, in his influential 1944 Philosophic Foundations of Quantum Mechanics, abandoned his probabilistic approach insofar as he considered the description
of ‘interphenomena’ between quantum mechanical measurements a meaningful question and emphasized the differences between statistical mechanics and quantum mechanics.
1. TWO SYNTHETIC APRIORIS: THE EARLY REICHENBACH Already in his Ph.D. thesis of 1915, Reichenbach developed two ideas that would remain central to his philosophy. First, the principle of causality, to become at all applicable to the description of physical phenomena, must be supplemented with a second principle, then called the principle of lawful distribution or the principle of the continuous probability function. Second, there existed no fundamental difference between the theory of error presupposed by any measuring science and the probabilistic theories of physics. This implied that dynamical (strictly causal) and statistical laws were understood as lawful within the same conceptual framework. In his early works, Reichenbach considered both the principle of causality and the principle of lawful distribution as synthetic a priori.3 Causality guaranteed that there existed a functional form of every physical judgment, while the principle of lawful distribution guaranteed the existence of definite values of the physical magnitudes figuring in these equations because it implied that the disturbing factors, the “irrational remainder of the determinants” not expressed in any additional laws, remained small enough. By doing so the probability function “connects the events laterally”.4 Notice that in contrast to his philosophy of space and time, Reichenbach did not historically relativize these synthetic a priori principles. Still, the departure from Kant’s original doctrine was substantial because there were two categories at work, rather than causality alone, and they yielded a kind of lawfulness more general than Newtonian determinism. At the empirical level, in virtue of the second principle all physical laws became merely probable. On the other hand, genuinely probabilistic laws could thus be treated as objective laws of nature rather than “escape routes sought out by the physicist when he lacks a more precise knowledge of the connections involved”.5 In this vein, Reichenbach also criticized Laplace’s 3
4 5
See Hans Reichenbach, “Der Begriff der Wahrscheinlichkeit für die mathematische Darstellung der Wirklichkeit”, Zeitschrift für Philosophie und philosophische Kritik 161 (1916), 210-39 & 162, 9-112, 223-53, trans. The Concept of Probability in the Mathematical Representation of Reality, Chicago: Open Court, 2008; “Die physikalischen Voraussetzungen der Wahrscheinlichkeitsrechnung”, Die Naturwissenschaften 8, 4655; Nachtrag, p. 349; English trans., “The Physical Presuppositions of the Calculus of Probability”, in Reichenbach, Selected Writings, 1909–1953, vol. 2 (ed. by Maria Reichenbach and Robert S. Cohen), Dordrecht: Reidel, 1978, pp. 293-311; “Philosophische Kritik der Wahrscheinlichkeitsrechnung”, Die Naturwissenschaften 8, 146-53, trans. in Reichenbach, Selected Writings, op. cit., vol. 2, pp. 312-27. “Philosophische Kritik”, op. cit., pp. 148/315 and 152/324. “Philosophische Kritik”, op. cit., p. 153/326.
principle of insufficient reason because it was unable to provide any positive reason for equiprobability.6 The equiprobability hypothesis was not even required to justify the principle of lawful distribution. Taking up Poincaré’s analysis of games of chance, Reichenbach argued that one only needs the hypothesis that the values of a probability function f(Ω) are equally probable if the values of Ω are infinitely close. Or in other words, f(Ω) must be continuous while its special form is irrelevant. Since this hypothesis transcends all finite experiences, it represents a synthetic a priori principle that guarantees the applicability of probability calculus. Schlick’s first theory of causality was also Kantian in spirit, quite in contrast to the stance he had taken in matters of relativity theory.7 He imposed two a priori conditions, simplicity and uniformity, that distinguished causal laws from other regularities. The only empirical elements of the principle of causality were linked to the general theory of relativity, namely the absence of action-at-a-distance and the fact that the space-time coordinates must not explicitly enter into causal laws, that is, Einstein’s principle of general covariance. Thus in matters of causality there was no Schlick available to lure Reichenbach into a more consistent empiricism. Their differences were rather lying in matters of probability. Reichenbach’s second principle imposed a ‘lateral’ order on what Schlick, following von Kries, had termed the ontological regularities.8 These governed what was left undetermined by the nomological regularities, the causal laws of nature. Within von Kries’ Spielraum-interpretation of probability, which Schlick would endorse until the end of his life, all statistical laws were dwelling in the ontological domain, such that objective chance could be integrated into a deterministic world-view.
2. YEARS OF TRANSITION: THE MID 1920S

In a paper written three years later but published only in 1933, Reichenbach had partly changed his mind. Although he considered the a priori conception of causality as irrefutable, because one could still claim the existence of causal laws that have not been found to date, the principle was not positively required for the existence of natural laws. The application of the other principle, which was now called probabilistic (or inductive) inference [Wahrscheinlichkeitsschluß], however, required that causality was not a priori excluded. At the beginning of the paper, Reichenbach called causality a complex of principles and provided a non-exhaustive list of its elements. It contained Schlick’s
6 See “Die physikalischen Voraussetzungen”, op. cit.
7 Moritz Schlick, “Naturphilosophische Betrachtungen über das Kausalprinzip”, Die Naturwissenschaften 8 (1920), 461-74, trans. “Philosophical Reflections on the Causal Principle” in Schlick, Philosophical Papers (ed. by Henk L. Mulder and Barbara van de Velde-Schlick), Dordrecht: Reidel, 1979, vol. I, pp. 295-321.
8 Compare Johannes von Kries, Prinzipien der Wahrscheinlichkeitsrechnung, Freiburg i.B: Mohr, 1886.
above-mentioned assertion that space-time coordinates must not figure in the laws themselves, the principle of action by contact, and the temporal order of events. But all three were only partial claims that supplemented the more general inductive principle of causality. This “says that by means of a functional relationship unobserved events can be predicted from observed ones, no matter whether the observed events lie in the future, or in the past, or happen at different space points simultaneously with the act of observation”.9 As he made clear in a rather similar list in his entry for the Handbuch der Physik that was written in the mid-1920s, causality was not exhausted by the concept of a function, as Mach had held, because it represented “a functional connection of a very specific character.”10 Laplacian determinism, on the other hand, represented an unwarranted extrapolation beyond the implication from causes to effects. Reichenbach, having abandoned the synthetic a priori, thus in the mid-1920s positioned himself between the Kantian and the empiricist tradition: the principle of causality could be empirically false but the principle of probabilistic or inductive inference remained a condition for the possibility of scientific knowledge. Reichenbach’s inductive principle of causality operated as such: Starting from a presumed law Fr(p1, …, pr) we find further relevant causes pr+1, …, pr+s that lead to a modified function F’r+s (p1, …, pr, pr+1, …, pr+s). This new governing function is the simplest function that, without being ad hoc, approximates the additional parameters in the least squares. Iterating this procedure with new classes of observed points M’, M’’, … we obtain either the infinite governing sequence (I) Fr, F’r+s, F’r+s, F’r+s, … or (II) Fr, F’r+s, F’’r+s+t, …, F(i)r+s+t+…+w, … In case (I) we have found a causal law, whereas in case (II) the connection between the observations is random. Both cases “characterize an objective state of affairs”, a conclusion for which the requirement of inductive simplicity is crucial. For only then, “the subclasses M, M’, M’’ furnish different governing functions from that furnished by the total class M(i).”11 Otherwise, (I) could trivially be obtained by an arbitrarily complex function. Apart from finding the governing function F’r+s, inductive simplicity implied that the intermediate values between two observed values, that is, future measurements, were described by F’r+s. This assumption of smoothness shows that inductive simplicity had taken the place of the principle of the continuous probability function. To sum up: “Either no continuous causal laws exist or they can be obtained by the requirement of simplicity.” 12 9
9 Reichenbach, “Die Kausalbehauptung und die Möglichkeit ihrer empirischen Nachprüfung”, Erkenntnis 3 (1933), 32-64, trans. in Reichenbach, Selected Writings, op. cit., vol. 2, pp. 345-71.
10 Compare Ernst Mach, The Science of Mechanics, La Salle, IL: Open Court 1989 (orig. 1883), and Reichenbach, “Ziele und Wege der physikalischen Erkenntnis”, in Handbuch der Physik, vol. 4, Berlin: Springer, pp. 1-80, trans. in Reichenbach, Selected Writings, op. cit., pp. 120-225.
11 Reichenbach, “Die Kausalbehauptung”, op. cit., pp. 43/354 and 45/355.
12 Ibid., p. 51/361.
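Reichenbach's contrast between case (I) and case (II) can be given a loose, deliberately anachronistic illustration in modern terms; the data, the choice of polynomial governing functions, and the least-squares fit below are invented for the example and are not Reichenbach's own construction. The sketch only exercises the criterion he states: in case (I) different subclasses of observations furnish (approximately) the same governing function, while in case (II) they do not.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 300)

# Case (I): a lawful connection plus a small "irrational remainder".
lawful = 2.0 * x**2 - x + 0.05 * rng.normal(size=x.size)
# Case (II): no lawful connection between the observations.
random_data = rng.normal(size=x.size)

def governing_functions(ys, deg=2, n_classes=3):
    """Fit the same simple function, by least squares, to successive
    subclasses of observations and report the coefficients found."""
    classes = np.array_split(np.arange(x.size), n_classes)
    return np.array([np.polyfit(x[idx], ys[idx], deg) for idx in classes])

print(governing_functions(lawful).round(2))       # rows nearly identical: the same law from every subclass
print(governing_functions(random_data).round(2))  # rows disagree: no governing function settles
```

Enlarging the class of observations and asking whether the simplest adequate function keeps changing mirrors, very roughly, the difference between the stabilizing sequence (I) and the never-stabilizing sequence (II).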
Other than descriptive simplicity, which guided the choice by convention of a geometry in relativity theory, inductive simplicity represented a hypothesis about nature. In virtue of this difference, Reichenbach also rejected the conventionalist conception of causality because “the principle of causality constitutes a restrictive statement about the behaviour of physical phenomena, and may therefore encounter contradictions.”13 This seems to me the reason why Reichenbach remained largely silent about Frank’s theory of causality. In his first philosophical paper, Frank had argued that the general law of causality was a purely conventional definition.14 Even though, in his 1932 book The Law of Causality and its Limits, Frank explicitly called this approach one-sided and searched for an empirical meaning of causality, he arrived pretty close to his original findings and could only diagnose that the validity of the principle of causality was presupposed in our every-day and scientific judgments. How could Reichenbach’s hypothesis of inductive inference and, accordingly, the principle of causality be justified? Evidently, the series (I) and (II) are infinite, while further observations yield only finitely many data points. Thus we only know with probability whether causality holds or not in a given case. But the empiricist argument that our experiences (probabilistically) confirm whether causality holds, misses the point. For, each single case contains the problem of induction in its entirety. “Whether causality holds in a specific instance can ultimately be decided only by investigating that instance. If causality holds in other cases, the probability that causality holds in the specific case under consideration merely increases”.15 But it never actually reaches unity, that is, certainty. “It is therefore not impossible that physics will some day be confronted by phenomena that compel it to abandon causality.” Mentioning quantum theory, Reichenbach concluded that “in principle, it is possible to determine on the basis of experience whether causality holds.”16 Little wonder that, when publishing the paper with a decade of delay, Reichenbach proudly announced that meanwhile quantum mechanics had led to a breakthrough of his conception by providing a physical theory of type (II). Interestingly, in the handbook entry he had argued, more cautiously, that Heisenberg’s uncertainty relations were “an entirely new kind of restriction to our knowledge of nature, the existence of which was never before suspected”.17 Reichenbach’s 1925 paper on the causal structure of the world was ambitious. Entirely dispensing with the hypothesis of strict causality, he proposed a concep-
13 Ibid., p. 59/367. 14 Philipp Frank, “Kausalgesetz und Erfahrung”, Ostwald’s Annalen der Naturphilosophie 6 (1907), 443-50, trans. “Experience and the Law of Causality” in Frank, Modern Science and its Philosophy, Cambridge, Mass.: Harvard University Press, 1949, pp. 62-8. 15 Reichenbach, “Die Kausalbehauptung”, op. cit., p. 60/367. 16 Ibid., p. 63/370. 17 Reichenbach, “Ziel und Wege”, op. cit., p. 78/216.
tion based on “the concept of probable determination alone”.18 This “accomplishes everything that is achievable by physics and … furthermore possesses the capacity to solve the problem of the difference between past and future, a problem to which the strict causal hypothesis has no solution”.19 Although he maintained his earlier convictions that physics rests upon both “the principle of causal connection and the principle of probable distribution”, and that one can in principle separate the causal connection between the determining factors and the probabilistic distribution of the remaining factors, he now considered the latter division as purely formal.20 It “can be replaced by the single assumption that a connection of a probabilistic nature exists between cause and effect.”21 This connection was anchored on the level of logic. Reichenbach replaced the causal connection of events by ‘A implies B with probability’, or A B, which he understood as a primitive concept and provided a list of laws fulfilled by it, “which claims neither to be exhaustive nor to represent a table of independent axioms.”22 While logical implication (→) connects propositions, probability implication connects events. The most striking formal novelty was that (A B) → (A ¬B). One thus obtained a topology of probability implications, while the probability measure remained unspecified. This topology, Reichenbach claimed, was sufficient to define a temporal order of events. “If probability implication is valid in only one direction [i.e. (B A) ∧ ¬ (A B)], then the antecedent [B] is the temporarily later event”.23 The main difference was that “[n]othing short of the totality of all causes is required for inferences into the future, but inference about the past can be made on the basis of a partial action [of causes]”.24 The future was thus objectively undetermined. Reichenbach also provided a detailed analysis of various inferential scenarios between three or more causes in the form of causal forks. This approach in its mature, and more rigorous, form outlined in the posthumous The Direction of Time became pretty influential on the debates about causality in the 1960s and 1970s.25 In the handbook entry, Reichenbach also discussed the relationship between causality and the special theory of relativity on the basis of his method of mark transmission.26 A mark represents a small variation in an event. If we attach a mark 18 Hans Reichenbach, “Die Kausalstruktur der Welt und der Unterschied von Vergangenheit und Zukunft”, Sitzungsberichte der Bayerischen Akademie der Wissenschaften, mathematisch-naturwissenschaftliche Abteilung, 1925, 133-75, trans. “The Causal Structure of the World and the Difference between Past and Future” in Reichenbach, Selected Writings, op. cit., vol. 2, pp. 81-119, at p. 136/83. 19 Ibid., p. 133/81. 20 Ibid., p. 135/82. 21 Ibid., p. 138/84. 22 Ibid., p. 146/91. 23 Ibid., p. 150/94. 24 Ibid., p. 151/96. 25 Hans Reichenbach, The Direction of Time, Berkeley: University of California Press, 1956. 26 Compare Hans Reichenbach, Axiomatik der relativistischen Raum-Zeit-Lehre, Braun-
to the cause A, this mark will also be observable in the effect B, but not vice versa. This asymmetry is “the distinctive characteristic of the causal relation … [and] can, in turn, be used in defining the sequence of time”.27 Accordingly, the “objective significance of time consists in its formulating the type of order of causal chains. It is, then, a physical theory of a very general nature, but not in any way the product of a special human faculty”—just as Kant had assumed.28 And, referring to his 1925 definition, he argued that the microscopic events in nature could be subjected to temporal order. Boltzmann’s contention that irreversibility and the direction of time emerge only as statistical features at the macro-level, while atomic collisions remain reversible as in Newtonian mechanics, to Reichenbach’s mind, was too closely connected to the false ideal of Laplacian determinism. Moreover, Boltzmann’s reasoning by way of the probability of initial states did not get around the reversibility objection. And Reichenbach criticized Schlick’s claim “that every indication of temporal direction must conform to the Boltzmann scheme”.29 Other than in 1925, Reichenbach’s handbook entry anchored probability implication on the level of perception rather than space-time events. These perceptions are coordinated to the things—and for the positivist this coordination amounts to identity—or to the concepts denoting the things—a difference which the realist Reichenbach considered crucial. He now criticized Schlick’s claim according to which the uniqueness of this coordination represented the only feasible definition of truth. For, this characterization, first, “pertains solely to the ultimate goal of knowledge” and, second, “does not offer any means whereby the truth of a given physical proposition can be tested”.30 The only way to solve the second problem, Reichenbach held, was to analyze our observations and propositions by means of probability implications. “We will no longer be able to speak strictly of the truth of a proposition, but only of its degree of probability”.31 And he called a proposition correct if it was highly probable. The first problem led Reichenbach to regard truth so conceived as a property of a coherent system of scientific knowledge.
schweig: Vieweg, 1924, trans. Axiomatization of the Theory of Relativity, Berkeley: University of California Press, 1969.
27 Reichenbach, “Ziele und Wege”, op. cit., p. 53/186.
28 Ibid., p. 57/190.
29 Ibid., p. 62/196; compare Schlick, “Naturphilosophie” in Max Dessoir (ed.), Lehrbuch der Philosophie: Die Philosophie in ihren Einzelgebieten, Berlin: Ullstein, 1925, pp. 397-492, trans. “Outlines of the Philosophy of Nature” in Schlick, Philosophical Papers, op. cit., vol. 2, pp. 1-90.
30 Reichenbach, “Ziele und Wege”, op. cit., pp. 28/154 and 29/155; compare Schlick, Erkenntnislehre, second edition, Berlin: Springer, trans. General Theory of Knowledge, La Salle: Open Court, 1979.
31 Reichenbach, “Ziele und Wege”, op. cit., p. 29/155.
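The asymmetry invoked above—inferences into the future require the totality of causes, while the past can be inferred from a partial action of causes—can be illustrated with a deliberately simple two-cause toy model. This is my example, not Reichenbach's; the Bernoulli variables and the conjunctive dependence are assumptions made only for the sketch, and the asymmetry is of course put in by hand through the direction of the dependence.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# A later event E with two contributing causes, C and D (a toy causal fork).
C = rng.integers(0, 2, n)   # the cause we happen to know about
D = rng.integers(0, 2, n)   # a further contributing cause
E = C & D                   # the later trace occurs only if both causes do

# Inference into the future from a partial set of causes stays merely probable:
print((E[C == 1] == 1).mean())               # ~0.5: knowing C alone does not settle E
print((E[(C == 1) & (D == 1)] == 1).mean())  # 1.0: the totality of causes does

# Inference about the past succeeds on the basis of a single partial trace:
print((C[E == 1] == 1).mean())               # 1.0: observing E suffices to retrodict C
```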
3. INDUCTIVE INFERENCE AS AN A PRIORI

Unique coordination represented the basis on which Frank and von Mises obtained the same general conclusion as Reichenbach, to wit, that there was no categorical difference between dynamical and statistical laws; the symbols of a statistical theory were coordinated to mass phenomena rather than to individual experiences.32 To Frank and von Mises, Newtonian mechanics, relativistic geometry and (classical or quantum) probabilistic physics thus stood on a par. In the discussions at the first public meeting of the Logical Empiricists, Reichenbach vehemently opposed this comparison.

For in the coordination of physical bodies to a mathematical theory, the concept of approximation appears, and this contains the concept of probability. … In the case of geometry, it is true, one is allowed to separate the problem of coordination from the mathematical theory because the problem of coordination does not contain any geometrical concept, but in the theory of probability the concept constituted by this theory enters into the problem of coordination.33
In his paper on the same meeting, Reichenbach advocated, more explicitly than before, the relative frequency interpretation and conjectured that every assertion of probability could be translated in an assertion of frequency.34 And in 1931 he openly criticized Schlick for remaining committed to the Spielraum interpretation and rejecting the notion of collective because it contained infinitely many elements.35 The association between finite observations and an infinite collective in the frequentist account, to Reichenbach’s mind, was based on probability (or inductive) inference. But here von Mises—rightly and with the authority of the mathematician who had laid the formal foundations of this interpretation—retorted that this association “was not translatable into a frequency statement”.36 To avoid this criticism, Reichenbach shifted the problem to the most basic level. “Probability logic cannot be squeezed into the Procrustes bed of strict logic” which leads to the “catastrophe of undecidability” about whether a law of nature 32 See Philipp Frank, “Was bedeuten die gegenwärtigen physikalischen Theorien für die allgemeine Erkenntnislehre?”, Die Naturwissenschaften 17 (1929), 971-7 & 987-94, trans., “Physical Theories of the 20th Century and School Philosophy” in Frank, Modern Philosophy, op. cit., pp. 90-121; Richard von Mises, “Über kausale und statistische Gesetzmäßigkeit in der Physik”, Die Naturwissenschaften 18 (1930), 145-53. 33 Reichenbach in “Diskussion über Wahrscheinlichkeit”, Erkenntnis 1 (1930), 260-85, at 275. 34 Reichenbach, “Kausalität und Wahrscheinlichkeit”, Erkenntnis 1 (1930), 158-88; partially trans. “Causality and Probability” in Reichenbach, Selected Writings, op. cit., vol. 2, pp. 333-44. 35 Compare Hans Reichenbach, “Das Kausalproblem in der Physik”, Die Naturwissenschaften 19, 713-22; English translation in Selected Writings, vol. 1, pp. 326-42. 36 Von Mises in “Diskussion über Wahrscheinlichkeit”, Erkenntnis 1 (1930), 260-85, at p. 282.
is actually confirmed or not.37 It embraces strict logic as a limit in the same vein as truth arises as the limit of high probability. Probability logic itself can only be justified by “the fact that we cannot think differently”. For: “The statement that probability laws do not hold is equivalent to predicting that, in repeated sequences, the regularity implied by the principle of induction does not hold—and this statement is empirically meaningful only if it can be decided inductively, i.e. if the principle of induction holds. The statement that probability laws do not hold is thus self-contradictory and makes no sense”.38 Since Reichenbach did not presuppose strict logic to hold, this contradiction did not amount to an indirect proof of the principle of induction. Rather, it finally dissolved Hume’s problem, as Reichenbach proudly announced. But to my mind this ambitious claim is unwarranted. Reichenbach in effect treated induction—in the same vein as the principle of lawful distribution a decade before—as an a priori condition for the possibility of experience, the only difference being that no transcendental deduction was available to justify it. Still one might wonder whether it was not at bottom a transcendental argument based on the actual practice of science. Since Reichenbach granted on the other hand that the principle of causality could be empirically inadequate, it appears that the two principles had changed rank. While initially the second principle—be it lawful distribution or probabilistic inference—had only represented an indispensable complement to causality, it had now assumed the lead. Given his repeated claims to have presaged important epistemological characteristics of quantum mechanics, it is quite surprising that in those years Reichenbach did not embark on a more detailed discussion of it and only criticized two interpretative claims of Heisenberg’s. First, the ‘positivistic’ maxim to omit unobservable quantities from the theory “must be correctly reformulated as the stipulation that dispensable quantities should be eliminated.” Yet this was, to Reichenbach’s lights, a simple consequence of probability inference. Second, Heisenberg’s elucidation of the uncertainty relation as a disturbance effect, that is, that “the influence of the instruments of observation cannot be ignored, … is not viable”.39 For, as he argued a year later, “separation in object of observation and means of observation is an idealization that is to a certain extent fulfilled for certain macroscopic phenomena, but it cannot be regarded as a necessary presupposition of the exact sciences in the sense of the principle of causality”.40 The crucial point was rather that one could not push the probability to predict certain combinations of parameters arbitrarily close to unity.
37 Reichenbach, “Kausalität und Wahrscheinlichkeit”, op. cit., p. 170.
38 Ibid., p. 187/343; translation readjusted to original.
39 Reichenbach, “Ziele und Wege”, op. cit., p. 78/215.
40 Reichenbach, “Kausalität und Wahrscheinlichkeit”, op. cit., pp. 180-181/338, trans. amended.
4. POST EMIGRATION: ABANDONING THE UNITY OF STATISTICAL AND QUANTUM MECHANICS
Having emigrated to the U.S., however, Reichenbach published extensively on the subject; and his 1944 book Philosophic Foundations of Quantum Mechanics inspired the subsequent debates. Although he still considered the “quantum mechanical criticism of causality … as the logical continuation of a line of development which began with the introduction of statistical laws into physics”, he no longer embraced them within a common program and emphasized the peculiarities of the quantum world.41 The central claim of the book was that causal anomalies were unavoidable if one insisted that interphenomena, i.e. the states between the phenomena actually observed, possess definite values. Reichenbach’s definition of a normal system was based on the idea that neither the laws of nature nor the states depend upon their being observed—while in 1930/1 he had rejected precisely this kind of separation between object system and measurement apparatus. The second major innovation of the book was Reichenbach’s three-valued semantics for quantum mechanical statements. He was dissatisfied with the Copenhagen criterion for physically meaningful statements because this restriction was of a meta-linguistic kind, and physics could not get by without any description of interphenomena. In his posthumous The Direction of Time, Reichenbach modified his causal theory of time because it became clear to him that the mark method was not free of temporal concepts. At the very end of the book, Reichenbach worried whether his idea of basing time on the microscopic order was undermined by R. P. Feynman’s contention that a positron corresponded to an electron going backward in time. In this way, a definite causal chain would exist merely locally and causal loops could not be excluded. Luckily, positrons are short-lived and “the vast majority of particles thus conform to the rules of ordered time.”42 Still, this was a statistical argument.43
Department of Philosophy University of South Carolina Columbia, SC 29208 USA [email protected] 41 Hans Reichenbach, Philosophic Foundations of Quantum Mechanics, Berkeley: University of California Press, p. 3. 42 Reichenbach, Direction of Time, op. cit., p. 268. 43 For a more detailed discussion of these later developments, see Thomas A. Ryckman, “Logical Empiricism and the Philosophy of Physics”, in Alan Richardson and Thomas E. Uebel (eds.), The Cambridge Companion to the Vienna Circle, Cambridge: Cambridge University Press, 2007, pp. 193-227.
PIERRE WAGNER
CARNAP’S THEORIES OF CONFIRMATION
The first theory of confirmation that Carnap developed in detail is to be found in “Testability and Meaning”.1 In this paper, he addressed the issue of a definition of empiricism, several years after abandoning the quest for a unique and universal logical framework supposed to be the basis of a clear distinction between the meaningful sentences of science and the pseudo-sentences of metaphysics. The principle of tolerance (according to which everyone is free to build up his own form of language as he wishes) had been adopted near the end of 1932, at a time when it was already obvious to Carnap that a strictly verificationist criterion of meaning was inadequate. He therefore considered a variety of empiricisms and a variety of choices for the language of science. As Carnap put it, “there are many different possibilities in framing an empiricist language”2 and, correspondingly, several degrees of liberalization of the criterion of meaning. It was in this context that Carnap provided both a logical (syntactical) and an empirical analysis of confirmation (and of testing), before distinguishing requirements of different strengths which served the purpose of defining several versions of empiricism. This use of confirmation and testing as a substitute for verification in the formulation of a criterion of meaning was a far cry from the theory of confirmation that Carnap would begin to elaborate a few years later in the quite different context of his work on a system of inductive logic. In the meantime, however, an intermediate stage of his thought on confirmation was expressed in the paper he read at the fifth International Congress for the Unity of Science (Cambridge, Mass., 1939), in which he borrowed a distinction from Charles Morris and divided the theory of language—and the analysis of science—into syntax, semantics, and pragmatics. At that time, Carnap insisted that the concept degree of confirmation was a pragmatic concept, not a logical (semantic) one. By this, he meant that the confirmation of a sentence, in contrast to its truth, is relative to some particular state of knowledge: “a statement of a degree of confirmation does not characterize an objective situation but rather the state of knowledge of a certain person with respect to a certain situation”.3 The need to make a sharp distinction between truth and confirmation had already been pointed out by Carnap in one of the papers he read4 at the Paris Congress on Scientific Philosophy in 1935, after Tarski had ex1 2 3 4
1 Rudolf Carnap, “Testability and Meaning”, Philosophy of Science, 3, 1936; 4, 1937.
2 Rudolf Carnap, op. cit., 4, 1937, p. 33.
3 Rudolf Carnap, “Science and Analysis of Language”, The Journal of Unified Science (Erkenntnis), 9, The Hague, 1939 [preprinted, but not published], p. 225.
4 Rudolf Carnap, “Wahrheit und Bewährung”, Actes du Congrès International de
plained to him his work on the definability of a truth predicate in the framework of a semantic (as opposed to a purely syntactical) metalanguage. In 1939, the further category of a pragmatic analysis of the language of science was added, as well as the idea of considering degrees of confirmation: “If we wish to say that a certain sentence, e.g. the law of gravitation, has a high degree of confirmation, then we must add for whom and at what time, e.g. for Newton in the year such and such, or for the Chinese physicists in the year 1900.” “A statement of degree of confirmation (…) belongs to pragmatics.”5 Such statements are in sharp contrast both with the analysis which had been offered in “Testability and Meaning”, where the main issue was the formulation of a criterion of meaning and no concept of degree of confirmation had been considered, and with Carnap’s mature theory of confirmation, in which degree of confirmation would become a logical (semantic), not a pragmatic concept, formulated in terms of conditional probability: the confirmation of some hypothesis h would then be relativized to some evidence e, thus allowing a logical definition of the confirmation of h as the probability of h given e. From 1945 onward, in the context of his theory of inductive logic, Carnap distinguished a logical and a methodological concept of confirmation, devoting most of his work to the former: to decide to what degree h is confirmed by e—a question in logic, but here inductive, not in deductive logic—we need not know whether e is true or false, whether h is true or false, whether anybody believes in e, and, if so, whether on the basis of observation or of imagination or of anything else. All we need is a logical analysis of the meanings of the two sentences. For this reason we call our problem the logical or semantical problem of confirmation, in distinction to what might be called the methodological problems of confirmation, e.g., how best to construct and arrange an apparatus for certain experiments in order to test a given hypothesis, how to carry out the experiments, how to observe the results, etc.6
He then further distinguished between a positive concept of confirmation (“h is confirmed by e”), a comparative one (“h is more confirmed by e than h′ by e′”), and a quantitative one (“h is confirmed by e to the degree r”), devoting most of his work to the latter. Carnap’s mature theory of confirmation depended on another distinction he made between two concepts of probability: the first one (“probability1”) was logical and had the meaning of degree of confirmation whereas the other one (probability2) was statistical and had the meaning of a relative frequency. More precisely, in Logical Foundations of Probability7 (LFP), his explication of the logical
Philosophie Scientifique, vol. 4, Paris, Hermann, 1936, augmented and translated as “Truth and Confirmation”, in Feigl and Sellars, ed., Readings in Philosophical Analysis, New-York, Appleton-Century-Crofts, 1949.
5 Rudolf Carnap, “Science and Analysis of Language”, op. cit., p. 225.
6 Rudolf Carnap, “The Two Concepts of Probability”, Philosophy and Phenomenological Research, 5, 1945, p. 514-5.
7 Rudolf Carnap, Logical Foundations of Probability, Chicago, The University of Chi-
concept of probability was based on a theory of the quantitative concept of confirmation. He defined confirmation functions, called c-functions, in such a way that “c(h,e) = r” meant that hypothesis h is confirmed or supported by evidence e to the degree r. He also introduced some particular c-function called c*, which he regarded as the best candidate for an explication of the concept degree of confirmation. Today, the theory of confirmation of 1950 and function c* constitute Carnap’s most well known explication of confirmation. At the time he wrote LFP, Carnap thought it should be possible to isolate just one inductive method and one confirmation function for such an explication. In 1952, however, he defined a family of functions, the cλ functions (where index λ was a real number ranging from 0 to infinity), thus displaying an infinite number of candidates for an explication of the same concept. In this way, he showed that his theory of confirmation actually was a general approach which led to an infinite number of inductive methods, each one corresponding to some particular confirmation function. Later in the fifties, his theory of confirmation was improved in different ways (e.g. Carnap tried to extend it to the case of more complex languages than the ones he had initially considered), and this led to the German book Induktive Logik und Wahrscheinlichkeit.8 Still later, in the sixties, he began to elaborate a different approach of the concept degree of confirmation, taking the theory of rational choice as a new basis and taking into account the recent developments of semantics to which his friend and collaborator John Kemeny had contributed.9 Carnap’s new ideas in inductive logic and confirmation theory were developed in a series of papers which started with “The Aim of Inductive Logic”10 and ended with “A Basic System of Inductive Logic”, a book-length unfinished text which was to be published posthumously.11 As we can see, during more than thirty years, Carnap’s theory of confirmation evolved in different ways, offering a variety of approaches, from the logical and the empirical analyses of “Testability and Meaning” to the later theory of confirmation expounded in “A Basic System of Inductive Logic”. Today, this variety of approaches is hardly known and in philosophical discussions about confirmation, if Carnap’s views are mentioned at all, the references are usually only to LFP and to the c*-function. And because the second (unrevised) edition of this book—a landmark in the history of philosophy of science—was printed in 1962,
cago Press, 1950, 2nd ed. 1962. 8 Rudolf Carnap and Wolfgang Stegmüller, Induktive Logik und Wahrscheinlichkeit, Vienna, Springer, 1959. 9 John Kemeny, “A New Approach to Semantics”, Journal of Symbolic Logic, 21, 1956. 10 Rudolf Carnap, “The Aim of Inductive Logic”, in Nagel, Suppes, Tarski, ed., Logic, Methodology and Philosophy of Science, Stanford, Stanford University Press, 1962. 11 Part I in Rudolf Carnap et Richard Jeffrey, ed., Studies in Inductive Logic and Probability, vol. I, Berkeley, 1971; Part II in Jeffrey, ed., Studies in Inductive Logic and Probability, vol. II, Berkeley, 1980.
the evolution of Carnap’s thought in the sixties is usually overlooked.12 The lack of interest in Carnap’s theory of confirmation is also due to its link with his logical interpretation of probability, an interpretation which is generally regarded as dead today (mainly because of the problem of assigning a priori probabilities), although some authors have recently evinced a renewed interest in logical probability.13 Carnap’s heroic efforts to find a solution to the general problem of defining a priori probabilities have often been considered as a desperate endeavour. Recently, however, a more careful reading of his work on inductive logic—including his latest papers—has allowed more balanced judgements about his influence on contemporary views on probability and confirmation: “he largely shaped the way current philosophy views the nature and role of probability, in particular its widespread acceptance of the Bayesian paradigm”.14 Because there is no place here for a detailed discussion of the different views Carnap espoused on confirmation, I shall now have to leave aside what he called the “classificatory” (or “positive”) and the “comparative” concepts of confirmation, and focus on his work on the “quantitative” one, from 1945 onward. Carnap analysed degree of confirmation as a logical concept, and at that time, this meant a semantic concept, formulated in terms of conditional probability with a logical, epistemic interpretation. In this analysis, the confirmation of hypothesis h being defined as p(h|e)— the probability of h given e—the problem was to find a method for assigning values to such a probabilistic function, from a purely logical viewpoint. The logical character of function c (with arguments h and e) meant two things. First, it meant that the values of this function did not depend on any empirical tests or pragmatic conditions, or on the factual truth or falsity of h and e, but only on the logical (semantic) analysis of these sentences, whatever their logical form and their truth value. Second, that the c-function was a logical function also meant that its arguments were sentences of some precisely defined language L. As a consequence, the logical analysis of h and e depended on the structure and the complexity of this language, and this was the reason why, in 1950, Carnap’s early work on inductive logic was restricted to quite simple languages. For example, all the individuals had to have a name in the language and two different individual constants of the language had to refer to two different objects. Also, in these simple languages, predicates and relations had to be logically independent.15 Later, Carnap and other people tried to cover the case of more complex languages and 12 Carnap’s latest views were discussed in the seventies, shortly after his death. Cf. Richard Jeffrey, “Carnap’s System of Inductive Logic”, Synthese, 25, 1973; Risto Hilpinen, “Carnap’s New System of Inductive Logic”, Synthese, 25, 1973. 13 Patrick Maher, “Subjective and Objective Confirmation”, Philosophy of Science, 63, 1996; James Franklin, “Resurrecting Logical Probability”, Erkenntnis, 55, 2001. 14 Sandy Zabell, “Carnap on Probability and Induction”, in Friedman and Creath, eds., The Cambridge Companion to Carnap, Cambridge U.P., p. 294. 15 For example, “green” and “red” are not logically independent in so far as green(x) implies not-red(x), for any x.
this is one direction in which he tried to improve his theory of confirmation later in the fifties. As a consequence, the fact that the c-functions were regarded as logical functions also implied that the evaluation of c(h,e) depended on some particular semantics. From an historical viewpoint, it is no accident that Carnap’s inductive logic and his quantitative theories of confirmation were developed in the forties, after his semantic turn, and not before, during his syntactical period. The first systematic exposition of his semantic theory was published in 194216, and it is important to note that Carnap’s semantics was quite different from what has become standard semantics nowadays. In particular, when Carnap wrote LFP, he had neither the notion of an L-structure for language L nor the notion of truth in a structure, and our standard notions of a model and of validity were also alien to his theory. In Carnap’s semantics, the basic notions were state description and range, which were also fundamental for his theories of confirmation. Let us consider the simple example of a language L with three individual constants a, b and c and two one-place predicates P and Q, and let us take the conjunction of all the atomic sentences: P(a)∧P(b) ∧P(c)∧Q(a)∧Q(b)∧Q(c). This is one state description for language L. The basic idea is that this conjunction describes one of the possible states of the universe of discourse. If we take all the different ways of putting a negation before some or all the atomic sentences in this conjunction, we get all the other state descriptions for this language. The state descriptions for language L are descriptions of the different possible states of the universe from the viewpoint of language L. If we now consider that a sentence A of L may be true for all, for some, or for none of the state descriptions, we can define the range of sentence A as the set of all the state descriptions which make it true. It is not difficult to discern some analogy between the state descriptions in Carnap’s semantics and the L-structures of our standard semantic theory. One of the main differences, however, is that in Carnap’s semantics the domain of discourse is fixed and reflected in the language by the individual constants. Moreover, whereas the L-structures of the standard Tarskian semantics are defined in a metalinguistic set-theoretic framework, Carnap’s state descriptions are given by formulas of the object language L itself. The reason for imposing restrictions on the object language is then obvious: complications arise as soon as an object of the universe does not have a name in the language or if two predicates such as “green( )” and “red( )” occur, which are not logically independent.17 Another difficulty arises if 16 In Introduction to Semantics, Cambridge, Mass., Harvard U.P., 1942. A less formal exposition had been given in his Foundations of Logic and Mathematics, Chicago, University of Chicago Press, 1939. 17 A further complication arises with relations such as “warmer than”. Because of the meaning of this relation, we need to make sure that the conjunction “W(a,b).W(b,c).W(c,a)”
state descriptions are to be formulated as sentences and the language is intended for an infinite domain of objects. One of the main issues for Carnap in the fifties was to find ways of extending his theory of confirmation to the case of more realistic languages than the simplistic ones he had considered in 1950. How does Carnap define a confirmation function on this basis? Let us consider the simple case of languages LN with a finite number of one-place predicates (which are logically independent from one another), and a finite number N of individual constants. First, some metric is defined on the sentences of LN by a function m ascribing a real number m(A) between 0 and 1 to each formula A. This metric (to be interpreted as a probability function) is said to be regular if the following conditions are satisfied: 1 – if A is a state description, then m(A)>0 (this means that each state of the universe is possible); 2 – the sum of all the m(Ai) (where i is an index for all the state descriptions) is equal to 1 (this means that the state descriptions describe all the possible states of the universe); 3 – if the range of a formula A is null (i.e. if A is false for every state description), then m(A)=0 (this means that if a sentence is not possible, its probability is equal to zero); 4 – if the range of A is not null, then m(A) is the sum of the m(Ai) where i is an index for all the state descriptions in the range of A (this means that the probability of a sentence depends on the number of state descriptions for which it is true). The metric defined by the regular function m is a way of ascribing some probability value to all the sentences of language LN, in such a way that the laws of probability theory apply. The requirements imposed on the regular functions m are extremely weak: they leave room for an infinite number of possible m-functions, and one of the main issues is to find a way of assigning precise values so as to get a realistic function m. For example, we may be tempted to proceed in assigning the same value to each state description, arguing that each state description has an equal a priori probability to obtain. If we proceed in this way, we get some particular function, which Carnap calls mw (because he attributes it to Wittgenstein). But he argues that this method actually produces a confirmation function which does not match the intuitive idea we have of the properties that a confirmation method should satisfy. The function cw does not have the properties we expect. From an epistemological viewpoint, there are here two important issues:
is not part of any state description (if “W(x,y)” is used for “x is warmer than y”).
1– Which function m do we have to choose? Some particular language L being given, which values do we have to assign to each sentence of L? 2– What kind of argument shall we put forward to justify our choice of one function rather than another? What will be the basis of our choice? In any case, the reasons we may give for our choice actually apply to the c-functions rather than to the m-functions, so let us first see how the confirmation cfunctions are defined on the basis on the m-functions. The definition is quite simple. Given some regular m-function—call it m—, if we see it as a function assigning probabilities to each sentence of the given language L, the confirmation function—call it c—with arguments h and e is defined as a conditional probability. For this definition, we assume that evidence e is true for at least one state description, so that m(e) is not zero: c(h,e) = df m(h.e) / m(e). This means that for arguments h and e, the value of the confirmation function c is defined as the conditional probability of h given e (p(h|e), which is nothing but p(h.e) / p(e)), where the values of the probability function p are just those of the measure function m. In the special case where e L-implies h, we can easily check that c(h,e) = 1, so that logical implication is a special case of confirmation. Clearly, Carnap’s theory of confirmation is a logical one; for any arguments h and e, c(h,e) depends on the logical form of h and e and on the logical structure of the language, not on any empirical testing. The choice of the m-function determines the definition of the c-function but this still leaves room for an infinite number of possibilities for m. In 1950, Carnap tried to narrow down this number and argued for the choice of one special cfunction that he called c*. Two reasons for this choice were put forward. The first idea was that a logical theory should make no discrimination between individuals. This meant that if some state description results from some other state description by a permutation of individual constants, they actually describe the same structure of the world, and an m-function should assign the same value to both of them. An m-function which satisfies this propriety is said to be a symmetrical function. The second idea is that an m-function should assign the same value not to each state description, but to each set of state descriptions which express the same structure of the universe of discourse. This is an application of the principle of indifference not to state descriptions, but to structure descriptions. Applying these two principles, we get the metric m*, and, applying the foregoing definition of a c-function to m*, we get the confirmation function c*. For some time, Carnap seemed to consider function c* as the best candidate for an explication of confirmation. It happens, however, that c* has serious shortcomings, of which I shall mention just one. Suppose h has a universal form, (this is the case, for instance, if h is a universal law of the form “for all x, if P(x) then Q(x)”) and suppose that the universe is infinite. In this case, we get the unexpected result that
c*(h,e) = 0, which was regarded as a serious objection to the adoption of c* as an explication of confirmation. In defence of c*, Carnap gave an ad hoc argument— which few people found really convincing—according to which the use of laws is not indispensible for making predictions and that science can actually do without universal laws.18 On the issue of the choice of a confirmation function, important progress was made in The Continuum of Inductive Methods.19 In this important book, Carnap managed to prove a fundamental theorem to the effect that it is possible to reduce the number of parameters on which the choice of a regular and symmetrical confirmation function depends. It is even possible to reduce this number to just one parameter which Carnap called λ, each value of λ defining one particular inductive method. To give just a hint at the intuitive meaning of λ, notice first that the values of some confirmation function cλ depend both on an empirical factor (the evidence we have) and on a logical factor (the language system we use). What the parameter λ determines is the precise balance between these two factors in the computation of the values of the c-function. In the book, Carnap also examined some properties of the c-functions we get when we choose some particular values for λ. From an epistemological viewpoint, one of the conclusions Carnap drew was that the choice of one precise confirmation function does not depend only on a purely logical basis, but also on its application and on the success we achieve when using some particular inductive method. A conclusion which might seem somewhat surprising for such a supporter of a logical theory of confirmation as Carnap was. Two remarks, however, should be added here. First, we should not confuse what Carnap called a theory of confirmation and what he meant by a method of confirmation. The theory is purely logical and a priori, whereas the choice of a method also depends of the particular application we want to make of the theory. Second, the conclusion I have just mentioned was actually not maintained by Carnap in the sixties, after he adopted a quite different approach to the theory of confirmation. At that time, he insisted that “in principle it is never necessary to refer to experiences in order to judge the rationality of a C-function”.20 Here, he mentions the “rationality” of the c-function, and this denotes a change in his vocabulary. In 1950, Carnap did not hesitate to identify logical probability and degree of confirmation, or to explicate the former by the latter, even though he also mentioned other explanations for the concept of logical probability, such as a fair betting quotient or an estimate of relative frequency. Later, he realized that the phrase “degree of confirmation” was somewhat ambiguous and even misleading: it could mean either “degree of support” or “increase of the degree of support” and he therefore preferred avoiding using that phrase when 18 Rudolf Carnap, “On Inductive Logic”, Philosophy of Science, 12, 1945, pp. 90-3. 19 Rudolf Carnap, The Continuum of Inductive Methods, Chicago, The University of Chicago Press, 1952. 20 Rudolf Carnap, “Inductive Logic and Inductive Intuition”, in I. Lakatos, ed., The Problem of Inductive Logic, Amsterdam, North-Holland, 1968, p. 264
explicating logical probability. Two historical reasons why Carnap changed his approach in the sixties should also be mentioned. First, an important turn took place in the history of logic at the time Kemeny, who had worked with Carnap in Princeton, introduced a new approach to semantics, thereby proposing to replace Carnap’s notion of state descriptions by a new notion of models for logical systems.21 In his later papers on inductive logic, Carnap used the abstract notion of a model and used propositions and events rather than sentences as the basic concepts of his system. Second, an important turn also occurred in the history of probability theory in the fifties, when more and more people became interested in subjective interpretations of probability and in the works of Ramsey and de Finetti. Though Carnap did not adopt a psychological interpretation of probability in terms of subjective degree of belief, he then used the setting and the concepts of decision theory. He defined a credibility function CredX which represented the disposition of some person X for having beliefs. The values CredX (h,e) of this function were measured by X’s behaviour in situations where she had to bet on hypothesis h while knowing e. More precisely, CredX (h,e) was defined as the highest betting quotient on which X would be willing to bet on h, if her total knowledge were e. Now, how can we define a confirmation function on the basis of such a subjective credibility function? Carnap’s idea was to lay down axioms that the credibility function must satisfy in order to represent a rational disposition for having beliefs. Instead of considering some real person X, he therefore proposed to adopt the viewpoint of some idealized entity, some robot or machine able to learn, to have beliefs, and to bet on a hypothesis, and to state axioms which would characterize a purely rational behaviour and, as a consequence, a purely rational credibility function. A confirmation function was then defined as a function c such that c (h,e) was exactly CredX (h,e), where X was a purely rational entity. Using the setting of decision theory, Carnap distinguished descriptive decision theory, which states psychological laws and “normative decision theory, which states conditions of rationality for decisions”.22 But the axioms Carnap proposed still left open the possibility of an infinite number of confirmation functions and for him, it was not clear how far we could go in narrowing down the choice to a smaller number by stating new axioms. This raised the issue of the justification of the axioms for defining a rational c-function. In the context I have just described, Carnap criticized the idea of a justification based on past experiences, or even on general synthetic principles such as the principle of the uniformity of the world. He preferred the idea of a justification by what he called “the ability of inductive intuition”.23 When using this 21 See reference in fn. 9 above. 22 Rudolf Carnap, “Inductive Logic and Rational Decisions”, in Carnap and Jeffrey, ed., Studies in Inductive Logic and Probability, vol. I, Berkeley, University of California Press, 1971, p. 8. 23 Rudolf Carnap, “Inductive Logic and Inductive Intuition”, in I. Lakatos, ed., op. cit., p.
phrase, he took the precaution to make clear that by such ability, he did not mean any infallible source of knowledge. In order to explain why he thought intuition was needed here, he remarked that the situation is similar in deductive logic: you cannot convince a person of the validity of the modus ponens inference if she is deductively blind, i.e. if she does not have any ability of deductive intuition. This is not to say that this kind of ability is the only epistemological basis for choosing axioms. For example, the choice can also be guided by the properties of the concepts we are dealing with. This raises an important question which concerns not just inductive or deductive logic but Carnap’s philosophical method in general: if our aim is the construction of a formalized system (in our present case, a system of inductive logic), what can we expect from such a dubious basis as our intuition? I think there are at least two answers to this question. First, paraphrasing what Carnap wrote in a paper from 1953, we can say that it is “desirable that procedures which are generally applied, though only intuitively or instinctively, are brought into the clear daylight, analysed and systematized in the form of exact rules”.24 This is the basic reason why Carnap constructed formal systems. The second answer is borrowed from a paper by Jeffrey: “the business of discovering what our inductive intuitions are is generally not so much a matter of uncovering pre-existent, covert intuitions, as of creating intuitions: forging an inductive temperament out of materials which were not inductive intuitions before they passed through the Carnapian fire”.25 This is in perfect agreement with what Jeffrey called “Carnap’s voluntarism” in another paper which pointed out this important character of his philosophy.26
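To make the constructions reviewed above concrete—state descriptions, regular measure functions, and the definition c(h,e) = m(h.e) / m(e)—here is a minimal computational sketch for the toy language with one predicate P and three individuals a, b, c. The example is mine, not Carnap's or Wagner's; it simply compares the two measures discussed above: the 'Wittgensteinian' mw, which weights every state description equally, and m*, which weights every structure description equally.

```python
from itertools import product
from fractions import Fraction
from collections import defaultdict

individuals = ["a", "b", "c"]
# A state description fixes, for every individual, whether P holds of it.
states = [dict(zip(individuals, bits))
          for bits in product([True, False], repeat=len(individuals))]

# m_w: the same weight for every state description.
m_w = {i: Fraction(1, len(states)) for i in range(len(states))}

# m*: the same weight for every structure description (here: the number of
# P-individuals), split equally among the state descriptions of that structure.
by_structure = defaultdict(list)
for i, s in enumerate(states):
    by_structure[sum(s.values())].append(i)
m_star = {i: Fraction(1, len(by_structure)) / len(members)
          for members in by_structure.values() for i in members}

def m(measure, sentence):
    """m(A) = sum of the weights of the state descriptions in A's range."""
    return sum(w for i, w in measure.items() if sentence(states[i]))

def c(measure, h, e):
    """Carnap's c(h,e) = m(h & e) / m(e)."""
    return m(measure, lambda s: h(s) and e(s)) / m(measure, e)

evidence = lambda s: s["a"] and s["b"]      # P(a) & P(b)
hypothesis = lambda s: s["c"]               # P(c)

print(c(m_w, hypothesis, evidence))     # 1/2: no learning from experience
print(c(m_star, hypothesis, evidence))  # 3/4: instance confirmation
```

The output shows the familiar contrast: cw leaves the probability of P(c) at 1/2 after the evidence P(a)∧P(b), allowing no learning from experience, whereas c* raises it to 3/4. (With the two Q-predicates P and not-P, the value 3/4 is what the λ-continuum delivers at λ = 2, and 1/2 is its limiting value as λ → ∞.)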
IHPST (Institut d’histoire et de philosophie des sciences et des techniques) Université Paris 1 Panthéon-Sorbonne 13 Rue du Four 75006 Paris France [email protected]
265. 24 Rudolf Carnap, “Inductive Logic and Science”, Proceedings of the American Academy of Arts and Sciences, vol. 80, 3, p. 190. 25 Richard Jeffrey, “Carnap’s Inductive Logic”, Synthese, 25, 1973, p. 306. 26 Richard Jeffrey, “Carnap’s Volontarism”, in Prawitz, Skyrms, Westerståhl, ed., Logic, Methodology, and Philosophy of Science IX, Amsterdam, 1994.
ARTUR KOTERSKI
THE RISE AND FALL OF FALSIFICATIONISM IN THE LIGHT OF NEURATH’S CRITICISM1
1. INTRODUCTION

At the peak of his career, Karl Popper wrote that he was still waiting for a simple and clear criticism of the theses of his Logik der Forschung (hereafter: LdF)2 and in one of his last interviews he claimed that all critical arguments were no good.3 However, Otto Neurath’s 1935 review already contained serious counterarguments against falsificationism. Since Popper’s doctrine was under intensive reconstruction for decades, we may ask if at any stage falsificationism was able to meet Neurath’s objections. Since Imre Lakatos, the rise of falsificationism has often been depicted in four phases.4 The dissection of falsificationism into ‘Popper0,’ ‘Popper1,’ and ‘Popper2,’ followed by the methodology of scientific research programs, seems to be very convenient. This fourfold division has often been questioned, however, with Popper himself among the critics.
1 Scientific work financed from the science funds in years 2008–2010 under a research project 2340/B/H03/2008/35.
2 Karl R. Popper, The Logic of Scientific Discovery, New York: Basic Books, Inc. (originally 1935).
3 See Karl R. Popper, ‘Science: Conjectures and Refutations,’ in Karl R. Popper, Conjectures and Refutations: The Growth of Scientific Knowledge, London: Routledge and Kegan Paul, 1963, p. 55 and Karl R. Popper, „Przyszłość jest otwarta. Rozmowa z Sir Karlem Popperem”, in Adam Chmielewski, Filozofia Poppera, Wrocław: Wyd. UW 1995, p. 241: “Ah, yes! Practically everything was criticized but all those critical arguments are worth nothing” (retranslated from Polish).
4 See Imre Lakatos, ‘Criticism and the Methodology of Scientific Research Programmes,’ Proceedings of the Aristotelian Society LXIX (1968), pp. 149-186; Imre Lakatos, ‘Falsification and Methodology of Scientific Research Programmes,’ in Imre Lakatos, Alan E. Musgrave (eds.), Criticism and the Growth of Knowledge, Cambridge: CUP, 1970, pp. 91-196; Imre Lakatos, ‘History of Science and its Rational Reconstructions,’ in Roger C. Buck, Robert S. Cohen (eds.), PSA 1970. In Memory of Rudolf Carnap, Dordrecht: D. Reidel Publishing Company, 1971, pp. 91-135; Imre Lakatos, ‘Popper on Induction and Demarcation,’ in Paul A. Schilpp (ed.), The Philosophy of Karl Popper, La Salle: Open Court, 1974, pp. 241-273 (written in 1971) and Artur Koterski, Falsyfikacjonistyczne kryteria demarkacji w XX-wiecznej filozofii nauki, Lublin: Wyd. UMCS, 2004.
While Popper complained about being misrepresented, his friends pointed out ambiguities in Lakatos’s taxonomy.5 Feyerabend thought Lakatos’s strategy counterproductive in that it “slowed down philosophers in their attempt to find the weak spots of critical rationalism.”6 It seems, however, that a kind of division is required, for otherwise we would have to deal with a doctrine that says simply too much to be consistent. We must bear in mind that ‘Popper0,’ ‘Popper1,’ and ‘Popper2’ stand for specified falsificationist positions and that Popper’s writings support these standpoints to some extent. The history of such support is non-linear, however, so one cannot divide his intellectual life into three consecutive periods—and that was not Lakatos’s point either (he was not even sure which one of the indexed Poppers is to be identified with its archetype). He aimed rather at the rational reconstruction of the development of falsificationism.7 Lakatos started to enumerate Popperian standpoints with index ‘0’ to stress, primarily, that Popper0 had been invented by some critics of falsificationism as a straw man as well as to insinuate that Popper0 published nothing.8 Lakatos concluded his first account of dogmatic falsificationism rather superciliously: ‘So we may just as well forget about Popper0’9. He soon changed his mind: ‘First I shall discuss a most important brand of falsificationism: dogmatic … falsificationism’10. The discussion is important because many—both sympathizers and foes—interpreted falsificationism as if it were Popper0’s proposal. In that sense, Popper0 is not
5 See, e.g., Karl R. Popper, ‘The Philosopher Replies,’ in Schilpp (ed.), The Philosophy, op. cit., pp. 999-1000, 1085, fn. 70a; Karl R. Popper, The Postscript to the Logic of Scientific Discovery, Vol. I, London: Routledge, p. xxiii; John W. N. Watkins, ‘The Propositional Content of the Popper—Lakatos Rift,’ in George Kampis, Ladislav Kvasz, Michael Stöltzner (eds.), Appraising Lakatos. Mathematics, Methodology, and the Man, Dordrecht: KAP, 2002, p. 5; Joseph Agassi, ‘Tristram Shandy, Pierre Menard, and All That: Comments on Criticism and the Growth of Knowledge,’ Inquiry 14 (1971), p. 158.
6 Paul K. Feyerabend, ‘Consolations for the Specialist,’ in Lakatos, Musgrave (eds.), Criticism, op. cit., p. 215, fn. 1.
7 Koterski, Falsyfikacjonistyczne kryteria, op. cit., pp. 17–24.
8 The zero index should not be taken to imply that Popper wrote or published nothing before LdF. Quite the contrary; those works, however, do not qualify him even as Popper0 (cf. Malachi Hacohen, Karl Popper. The Formative Years 1902–1945. Politics and Philosophy in Interwar Vienna, Cambridge: CUP, 2001, pp. 133-134, 168; Michel ter Hark, ‘Between Autobiography and Reality: Popper’s Inductive Years,’ Studies in History and Philosophy of Science, 33 (2002), pp. 88-89; Stefano Gattei, ‘Karl Popper’s Philosophical Breakthrough,’ Philosophy of Science, 71 (2004), pp. 449-453). As a supporter of inductivism and verificationism up to the late twenties, Popper held a view of the empirical basis burdened with what he was later to call ‘psychologism’, and this is perhaps the only link connecting him with Popper0.
9 Lakatos, ‘Criticism,’ op. cit., p. 152.
10 Lakatos, ‘Falsification,’ op. cit., p. 95.
a straw man even though it is hard to find him in print. As Lakatos pointed out, on some occasions even Popper himself slips into the language of Popper0.11
2. THE CRITIQUE OF DOGMATIC FALSIFICATIONISM

According to Popper0 what distinguishes scientific theories from metaphysics is that they are falsifiable.12 This is a common characteristic of all phases of Popperian falsificationism; however, each of Lakatos’s falsificationists gives it a different interpretation. For Popper0 falsifiable theories are those that have an empirical basis. In LdF this is the necessary and sufficient condition for a theory to acquire scientific status. Falsification of a theory consists in accepting a potential falsifier. The schema of falsification is based on modus tollendo tollens: ((T → p) ∧ ~p) → ~T, where p is a prediction deduced from theory T and ~p is a falsifier stating the relevant negative outcome of the experiment. The absolutely certain character of this procedure derives from its deductive nature—it generates true conclusions on the basis of true premises. The truth of the premises is guaranteed, because (T → p) is logically true and ~p is proved by perceptual experience. The empirical basis does not allow verification of any universal theory but makes it possible to show its falsity—and to reject it. As Popper put it categorically in retrospect: “[W]hen an event happens that is forbidden by a theory, then the theory is falsified; it is false, refuted.”13 This is the way of scientific progress: it consists in the undeniable, step-by-step elimination of falsehoods from science. What distinguishes Popper0 from earlier philosophers and scientists, like Kant or Bayle, who endorsed this way of thinking against the Aristotelian tradition but never became falsificationists, was the quest to answer what was for him the central question: the problem of demarcation.14 The dogmatic solution of the problem may be reconstructed as follows. A non-analytical statement (or a group of them) is scientific if it is provable by observation (like basic statements) or if it is falsifiable, i.e. when it has an empirical basis. In short: “Empiricism = singularistic basis”.15
11 It seems to be much more than an ‘unintentional’ slip when he e.g. rejects the ceteris paribus clause—see below, fn. 27.
12 Agassi conceded that early on “Popper’s view of metaphysics” was “clearly tinged with old-fashioned positivism” (Joseph Agassi, Towards A Rational Philosophical Anthropology, The Hague: Martinus Nijhoff, 1977, p. 69).
13 Karl R. Popper, ‘Einleitung 1978,’ in Karl R. Popper, Die beiden Grundprobleme der Erkenntnistheorie, Tübingen: J. C. B. Mohr, 1979, p. xxvii.
14 See Popper, Logic, op. cit., p. 34; Karl R. Popper, ‘Optimist, Pessimist and Pragmatist View of Knowledge of Scientific Knowledge,’ in Karl R. Popper, After the Open Society. Selected Social and Political Writings, London: Routledge, 2008, p. 3; Popper, Die beiden Grundprobleme, op. cit., pp. 4, 346.
15 Popper, Die beiden Grundprobleme, op. cit., p. 437.
It is important to note the absolute character of demarcation0: “Epistemology must determine a strict and universally useable criterion (i.e., a ‘demarcation criterion’)”.16 The demarcation problem is the central question of Popper’s philosophy not only in the theoretical but also in the practical dimension. The demarcation criterion gives a norm of rationality. The task of a scientist consists in the constant posing of hypotheses and relentless attempts to refute them. Rational scientists are forced to specify a priori the conditions of refutation of their own theories. The acceptance of this norm enables scientific progress. We may recapitulate the dogmatic falsificationist position as follows:
1. All scientific theories or hypotheses are fallible (this is a description, not a demarcation proposal);
2. However, the empirical basis is infallible (it is even partly proved);
3. The infallibility of the empirical basis is not transferable to the theoretical level;
4. Thus, theories are unprovable, but they may be refuted once some falsifier is accepted;
5. The only tool for this procedure is given by the modus tollendo tollens schema;
6. Falsification of a theory is—due to (2) & (5)—ultimate and leads to its rejection;
7. Science develops by the elimination of false theories and their replacement by theories that explain everything the old ones explained plus the facts that caused their refutation;
8. Theories that do not satisfy the criterion of demarcation0 are metaphysical (= pseudo-scientific).
The dogmatic criterion seems to be very powerful, fairly simple and quite effective. It seems to save endangered scientific rationality in a highly persuasive manner. However, as I have argued elsewhere,17 there are many counter-arguments against Popper0 and some are fatal for his theory. Four key problems were pointed out by many philosophers, partly by Popper himself, but most importantly—more or less explicitly—by the Vienna Circle members:
A1) it is not possible to differentiate between the observational and non-observational parts of science without adopting a convention;18
A2) it is also not possible to have a secure empirical basis;19
16 Popper, Die beiden Grundprobleme, op. cit., p. 422 (italics added).
17 See Koterski, Falsyfikacjonistyczne kryteria, op. cit., ch. I.
18 Otto Neurath, ‘Anti-Spengler,’ in Otto Neurath, Empiricism and Sociology, Dordrecht: D. Reidel Publishing Company, 1973, p. 199 (originally 1921).
19 Otto Neurath, ‘Protocol Sentences,’ in Otto Neurath, Philosophical Papers 1913–1946, Dordrecht: D. Reidel Publishing Company, 1983, pp. 92-96 (originally 1932).
A3) Popper0 excludes from science all stochastic laws;20
A4) Popper0 excludes from science all theories with a ceteris paribus clause (the logical point behind this objection is sketched at the end of this section).21
Note that Neurath’s works mentioned here predate LdF. Yet the most devastating criticism came with his review of that book.22 Neurath noticed, among other things, that falsificationism was not applicable to empirical science and that, even if it were, it would be false. His counter-arguments run as follows.23
B1) Popperian falsification is a clear-cut instrument for systems with an unambiguous language, which never occur in real science; so it is not applicable, since science is not a clear-cut system and operates with a vague language;
B2) Popperian falsification does not apply to sciences without experiments;
C1) Popperian falsification is an absolute tool, so falsificationism represents pseudo-rationalism—there is no general means of control in science;
C2) Falsificationism does not reflect the actual methods used in science over the centuries, so it is ahistorical, giving a false image of what science is;
C3) Falsificationism introduces a mechanism for accepting and rejecting statements without taking into account the pragmatic aspects of scientific conduct, and it neglects history;
C4) Falsificationism wrongly describes scientific change—it is rarely caused by falsification; moreover, the old theory T1 does not have to be an approximation of T2, so the necessary condition for falsification and replacement is not fulfilled.
20 Edgar Zilsel, ‘Popper, Karl, Logik der Forschung. Zur Erkenntnistheorie der modernen Naturwissenschaft’, Die Naturwissenschaften 23 (1935), p. 532.
21 Otto Neurath, ‘The Lost Wanderers of Descartes and the Auxiliary Motive (On the Psychology of Decision),’ in Neurath, Philosophical Papers, op. cit., p. 3 (originally 1913).
22 Popper never provided an answer to Neurath’s criticism except to suggest it was prompted by political reasons (see Karl R. Popper, ‘Memories of Otto Neurath,’ in Neurath, Empiricism and Sociology, op. cit., p. 56).
23 Neurath’s review did not deal with probability issues, referring instead to Reichenbach. With some of the following criticisms Neurath also concurred with Reichenbach, who had argued that there were no absolute falsifications, that a negative test outcome need not even be a temporary refutation, and who gave examples from the history of science inconsistent with falsificationism. (See Hans Reichenbach, ‘Über Induktion und Wahrscheinlichkeit. Bemerkungen zu Karl Poppers “Logik der Forschung”’, Erkenntnis 5 (1935), pp. 267–284.)
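The following schematic contrast is not part of Koterski’s text; it is only a standard logical reconstruction of the point behind objection A4 (and footnote 27 below), using the symbols T and p from above and a letter C introduced here purely for illustration. In the idealized case, modus tollens makes falsification conclusive:

\[ \big((T \rightarrow p) \land \lnot p\big) \rightarrow \lnot T. \]

A theory with a ceteris paribus clause, however, entails its predictions only together with the assumption C that ‘other things are equal’ (and, more generally, together with auxiliary assumptions), so the same inference yields only

\[ \big(((T \land C) \rightarrow p) \land \lnot p\big) \rightarrow \lnot (T \land C), \]

which refutes the conjunction and leaves open whether T or C is to be blamed. On this reconstruction, such theories have no unambiguous potential falsifiers, and the dogmatic criterion therefore excludes them.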
3. THE DECISIONISM OF POPPER1

To avoid the problems A1–A4, Popper1 introduces methodological conventions. The five relevant decisions concern:
1) the division of statements into observational and theoretical;24
2) moving part of the former to the background knowledge;25
3) setting the limits of ‘errors’ for probabilistic theories;26
4) placing the conditions of the ceteris paribus clause into the background knowledge;27
5) reconsidering the scientific status of some metaphysical statements: we consider them falsified if they clash with a corroborated scientific theory.28
With those corrections in mind we can reformulate Popper’s position as follows.
1. We consider as scientific those theories that are falsifiable—this is a proposal, not a statement of fact, but it is still in agreement with the history of science.
2. “Basic statements” are fallible like theories, because observational statements describe facts through the prism of a theory or theories; “basic statements” are only temporarily accepted, according to a procedure that assesses their usefulness.
3. The fallibility of the “empirical basis” carries over to falsification itself—a falsified theory may be true anyway.
4. Scientific theories cannot, or even should not, be irrevocably refuted.
5. Thus, the tool for the falsification and rejection of a theory is methodological falsification; it is a decision based on a) a tentatively accepted “empirical basis”, b) a highly corroborated falsifying hypothesis, and c) some other elements of background knowledge.
6. Since a falsified theory is merely in disagreement with accepted “basic statements”, rejection is possible only when a better theory is available.
7. The development of science consists in introducing theories with a higher degree of falsifiability. Scientific change does not have to be caused by a
24 Popper, Logic, op. cit., p. 107.
25 Popper, Die beiden Grundprobleme, op. cit., p. 430.
26 Popper, Logic, op. cit., p. 262.
27 Popper explicitly rejected the idea of a ceteris paribus clause (Popper, ‘The Philosopher Replies,’ op. cit., p. 1186, fn. 75), so the fourth methodological decision of Popper1 remains the most controversial, though indispensable, element of falsificationism1. However, he later writes that some prima facie forbidden events may be considered as such only if additional assumptions are accepted (cf. Popper, ‘Einleitung,’ op. cit., p. xxviii). This may be interpreted as an admission that a version of the ceteris paribus clause is necessary for falsification purposes.
28 Popper, Die beiden Grundprobleme, op. cit., p. 361.
conflict of a theory with a falsifier. This is the “falsification by theory” developed in the sixties.29
8. Theories that do not satisfy the criterion of demarcation do not have to be pseudo-scientific; they may even contribute (in some indirect way) to the progress of science.
Note that the new criterion of demarcation strongly resembles the old one:
Popper0: Empiricism = singularistic basis;
Popper1: A system of statements is scientific if it has an “empirical basis”.
However, the meaning of the “empirical basis” is now modified in light of the aforementioned methodological decisions, so that Popper1 has some answers to the objections he encountered from the Vienna Circle, especially to the objections A1–A4. He is also able to meet some criticism from Neurath’s review.
Ad B1, the criticism that falsification is a clear-cut tool for systems with an unambiguous language and so is not applicable to science: according to Popper1, the acceptance of empirical reports is no longer an automatic procedure—“basic statements” (especially falsifiers) require some pragmatic consideration if they are to be used in empirical tests.30
Ad C2, the criticism that falsificationism does not reflect the actual methods used in science over the centuries, giving a false image of what science is: according to Popper1, falsificationism is a normative theory stating how science ought to proceed (though it is claimed to be in good agreement with the history of science anyway).31
Ad C3, the criticism that falsificationism does not take into account the pragmatic aspects of scientific conduct: according to Popper1, decisionism replaced the alleged “mechanicism”, so the pragmatic aspect is taken into account.32
The other objections are either left unanswered or the answers are quite unsatisfactory. For Popper1 there is no science without experiments (B2), and the most important function of experiments consists in serving the falsificationist schema—in all branches of science (C1). Even if in some cases scientific change takes place without ‘falsification by observation,’ the latter is still indispensable and works in a way criticized by Neurath (C4). Even if we agree that Popper0 is a straw man, we still have to admit that Neurath’s criticism applies to the views presented in LdF and some of its
29 Cf. J. O. Wisdom, ‘Refutation by Observation and Refutation by Theory,’ in Lakatos, Musgrave (eds.), Criticism, op. cit., pp. 65-67 and J. O. Wisdom, ‘On Methods of Refutation in Metaphysics,’ in Robert S. Cohen, Marx W. Wartofsky (eds.), Boston Studies in the Philosophy of Science, Vol. IV, Dordrecht: D. Reidel Publishing Company, 1969, pp. 523-537.
30 Cf. Popper, Logic, op. cit., p. 106 and Lakatos, ‘Criticism,’ op. cit., p. 153.
31 Cf. Popper, Die beiden Grundprobleme, op. cit., pp. 393, 395.
32 Popper, Logic, op. cit., §30.
developments, i.e. Popper1. They are answered only by the sophisticated falsificationist, to whom we turn next.
4. SOPHISTICATED FALSIFICATIONISM

Popper2 is not interested in single hypotheses or their systems, but in series of theories. They alone are the subject of methodological judgment. To formulate the sophisticated criterion of demarcation we must consider three new categories and see how different such an approach is from its predecessors. The methodological judgment requires three interconnected concepts of acceptability.
Acceptability0. This is a feature of some theories that are not acceptable1—those that work as a kind of regulative idea in science, or hypotheses from a pioneering period in a given branch of science. The category is introduced because “even metaphysical theory is infinitely better than no theory”33 or because such theories are promising but not yet falsifiable. They are valuable because “a non testable theory—a metaphysical theory—may be developed and strengthened until it becomes testable”.34 “New ideas”, Popper wrote, “should be regarded as precious and should be carefully nursed; especially if they seem to be a bit wild”.35 The introduction of acceptability0 significantly blurs the border between science and metaphysics. In the previous versions of falsificationism demarcation was the main aim. Popper2 has his own criterion, but it is intentionally flexible: “I certainly do not believe in anything like sharp demarcation … the significance of the demarcation, if any, should not be overrated”.36 Instead Popper2 turns to the—unacknowledged—Neurathian thesis that “[s]cientific theories are distinguished from myths merely in being criticisable, and in being open to modification in the light of criticism”.37 This contrasts sharply with LdF, as Popper admitted: “In those days I wrongly identified the limits of science with those of arguability. I later changed my mind and argued that non-testable (i.e. irrefutable) metaphysical theories may be rationally arguable.”38
Acceptability1. A new theory in the series, T2, is acceptable1 if it explains everything that was explained by the previous one, T1, and it has some excess of
33 Popper, The Postscript, I, op. cit., p. 172.
34 Popper, The Postscript, I, op. cit., p. 191.
35 Karl R. Popper, ‘The Rationality of Scientific Revolutions,’ in Ian Hacking (ed.), Scientific Revolutions, Oxford: OUP, 1985, p. 97.
36 Popper, The Postscript, I, op. cit., p. 159; cf. Karl R. Popper, The Postscript to the Logic of Scientific Discovery, Vol. III, Totowa: Rowman and Littlefield, 1982, p. 211.
37 Popper, The Postscript, I, op. cit., p. 7; cf. Otto Neurath, ‘Ways of Scientific World-Conception,’ in Neurath, Philosophical Papers, op. cit., p. 37 (originally 1930/1931).
38 Karl R. Popper, ‘Two Faces of Common Sense: An Argument for Commonsense Realism and Against the Commonsense Theory of Knowledge,’ in Karl R. Popper, Objective Knowledge. An Evolutionary Approach, Oxford: OUP, 1979, p. 40, fn. 9.
empirical content, i.e., it additionally generates new predictions unavailable in T1, or it predicts events improbable or forbidden in T1.
Acceptability2. T2 is acceptable2 if it is acceptable1 and it is corroborated in tests of some of its excess empirical content.
A series of theories that consists solely of acceptable1 theories is theoretically progressive. When they are acceptable2 as well, the series is empirically progressive. The series is (simply) progressive if it is both theoretically and empirically progressive. For Popper2 a series of theories is a part of science when it is a priori theoretically progressive and a posteriori empirically progressive.39 Negatively, a series can also be deemed empirically and/or theoretically degenerated. The rules of elimination are different now too. A refutation does not happen because of conflicting “basic statements”. All theories swim in an ocean of anomalies. The function of an experimentum crucis does not consist in refuting one of the competing series but in confirming one of them. Empirical refutation is not necessary either; scientific change may happen even without experiments. As Elie Zahar put it, “empirical refutations … are not indispensable: since theories can satisfy certain meta-principles [like ‘make coherent choice’] to a greater or lesser degree, it is possible to improve such theories without reference to experimental results”.40 Series of theories do not get refuted and abandoned because of anomalies but for more complex reasons. Popper2 rejected the psychologically weak claim that every rational scientist is dying to refute his own theory. Corroboration became more important than falsification. We can now give a summary of the problem-shift in Popper2 and then ask whether he was finally able to answer the rest of Neurath’s objections.
1. We may characterize as scientific only some series of theories: those that consist solely of falsifiable elements and, more importantly, only when every new element is better corroborated—the time limit for a theory to acquire acceptability2 remaining unspecified.
2. The empirical basis loses its importance: accepted falsifiers do not lead to the rejection of a series unless there is a better competing series; the basic statements state the confirming evidence in the experimentum crucis.
3. Although series of theories are not falsifiable in an absolute manner, it is still possible to judge them at any given moment and to compare them with their rivals; progressive series are preferred.
4. The advance of science consists in developing a progressive series of theories that generate excess empirical content, part of which gets corroborated.
39 “I have no objection whatever to Lakatos’s use of the terms ‘acceptance1’ and ‘acceptance2’.” (Karl R. Popper, ‘Epistemology without a Knowing Subject,’ in Popper, Objective Knowledge, op. cit., p. 143, fn. 42.)
40 Elie G. Zahar, ‘Logic of Discovery or Psychology of Invention?,’ The British Journal for the Philosophy of Science 34 (1983), p. 260.
5. The tool for progress in science is inconclusive verification—partial confirmation (“corroboration”) of new predictions.
6. Neither strong corroboration nor strong disconfirmation of a given theory in the series necessarily says anything about its truth-value.
7. Theories that do not satisfy the sophisticated criterion of demarcation are not necessarily pseudo-scientific; metaphysics may function as research programs that may later turn into full-blooded scientific knowledge.
Now the last three objections from Neurath may be answered as follows.
Ad B2, the criticism that falsification does not apply to science without experiments: according to Popper2, the results scientists gathered without experiments may be moved to the background knowledge. (If for any reason this is not possible, we must remember that our criterion cannot be strict.)
Ad C1, the criticism that falsificationism represents pseudo-rationalism and that there is no general method of control in science: Popper2 replies that falsificationism is no longer absolute. There are appeal procedures by which a theoretician may invalidate the decision of an experimenter. There also exists a group of theories (those acceptable0) that are not falsifiable and yet are not rejected as metaphysics or pseudo-science, so falsification is not only non-strict and non-ultimate but also not universally applicable.
Ad C4, the criticism that falsificationism wrongly describes scientific change: Popper2 answers, first, that he allows for scientific change not only through falsification but also without it; and, second, that he can deal with so-called Kuhnian gaps by means of Watkins’ theory of incongruent counterparts.41
In principle, Popper2 answers all Neurathian objections, although the price is significant. As Watkins noted, “‘Popper2’ turned into anti-Popper”.42 Popper2’s theory could hardly be called falsificationism. As it happened, Popper himself denied he had ever called his conception “falsificationism”.43
5. THE REJECTION OF THE MAIN IDEAS OF FALSIFICATIONISM

Lakatos’s aim was to tidy up Popper2 and to extend the position by making it more flexible. First, Lakatos specified the structure of the series and supplemented its description with two requirements for its development. Science presents us with competing research programs, which include a hard core and a protective belt plus
41 See John W. N. Watkins, ‘Corroboration and the Problem of Content-Comparison,’ in Gerard Radnitzky, Gunnar Andersson (eds.), Progress and Rationality in Science, Dordrecht: D. Reidel Publishing Company, 1978, §9 and John W. N. Watkins, Science and Scepticism, Princeton: UP, 1984, §5.13.
42 John W. N. Watkins, ‘Metaphysics and the Advancement of Science,’ The British Journal for the Philosophy of Science 26 (1975), p. 107.
43 Popper, The Postscript, I, op. cit., p. xxxi.
negative and positive heuristics. Second, Lakatos adjusted the criterion of demarcation. These changes radically transformed falsificationism in three respects.
First, metaphysics is considered a part of science. The negative heuristics is a rule telling the followers of a program not to try to falsify its hard core. So the hard core is metaphysical. The positive heuristics is a group of beliefs, rules and theories that tell a scientist how to modify the protective belt to get increased empirical content. It too may contain non-falsifiable theories, i.e. metaphysics. Clearly there is more than one meaning of the adjective “metaphysical” to consider:
1. Metaphysical1 are those theories that do not have potential falsifiers;
2. Metaphysical2 are those theories that are not subjected to the refuting procedure;
3. Metaphysical3 are world-views (e.g. theories acceptable0);
4. Metaphysical4 are systems unable to increase empirical content (degenerated research programs).
For Lakatos science cannot be separated from metaphysics. This means either that there is no need for a criterion of demarcation or that the problem should be redefined. The criterion then differentiates not between science and metaphysics but between good and bad research programs. Lakatos also modified Popper2’s criterion to eliminate a problem pointed out by Feyerabend: how long can we wait for an acceptable1 theory in the series to become acceptable2? According to Lakatos both conditions should be satisfied a priori. To make this work, a special theory of novel facts or background knowledge was introduced.44
Second, science develops according to a logic of scientific discovery. LdF claimed that there was no logic of scientific discovery whatsoever: all was guesswork and trial-and-error procedure. For Lakatos, however, the positive heuristics is the logic for the development of a scientific research program. This view is backed up by Lakatos’s theory of demarcation, which is intrascientific and aims to find a program with better heuristics. When a program encounters a theoretical or an empirical problem, it may be considered acceptable if it is at least acceptable3, i.e. if it is developed in accordance with its positive heuristics—in accordance with its internal logic.
Third, Lakatos recognised the need for an inductive evaluation of theories. All Poppers were opposed to any form of induction. Lakatos was not. In requiring and appealing to yet another type of acceptance—called “acceptance3” and viewed as a kind of inductive acceptance required for assessing the future behavior of a theory or series of theories—Lakatos recognized some induction as a necessary element of the theory of knowledge.45
44 See Imre Lakatos and Elie G. Zahar, ‘Why did Copernicus’s Research Programme Supersede Ptolemy’s?,’ in Robert S. Westman (ed.), The Copernican Achievement, Los Angeles: University of California Press, 1976, pp. 354–383.
45 Imre Lakatos, ‘Changes in the Problem of Inductive Logic,’ in Imre Lakatos, Mathematics, Science and Epistemology. Philosophical Papers, Vol. II, Cambridge: CUP,
6. CONCLUSION

Any very short but still adequate description of LdF has to mention as its main aims and tenets (a) the search for demarcation and (b) anti-inductivism. The results obtained there were (c) falsificationism and (d) the proof that there was no logic of science at all. After many years, however, the demarcation problem and its solution lost their value for Popper and many of his followers. The anti-inductivism was softened as well. Here, of course, the change was not as drastic as in (a) and (c), but Lakatos openly called for induction—and he obviously rejected (d). If his methodology is indeed the last link in the falsificationist evolutionary chain, then the development of falsificationism denied and rejected the basics of Popper’s doctrine and made it collapse.46
Those changes in Popperian methodology enabled the sophisticated falsificationist to answer Neurath’s review. Neurath, of course, would not find that position pseudo-rationalist: he would be happy with many features of the Lakatosian picture of science. He would appreciate that Lakatos described science in terms of bigger units that are in competition with each other and that do not have to be consistent (the use of an inconsistent hard core is sometimes rational). He would welcome the claim that the standard of rationality is not fixed, and appreciate that the problem of demarcation is replaced by the problem of criticism (there is no magic sieve to filter out metaphysics) and that methodological tools are subject to criticism as well. It is interesting to speculate what Neurath would have said about Lakatos’s rational reconstructions of the history of science or his use of philosophy of science for the acquisition of power.
Department of Logic and Philosophy of Science Faculty of Philosophy and Sociology Maria Curie-Sklodowska University Pl. MCS 5, 20-031 Lublin Poland [email protected]
1978 (originally 1968) p. 181 f.; cf. Elie G. Zahar, Leçons d’épistémologie, Paris: Imprimerie de l’École Polytechnique, 1997, p. 125. 46 For many Popperians both antecedent and consequent are false.
MARIA CARLA GALAVOTTI
PROBABILITY AND PRAGMATISM
1. FOREWORD

Pragmatism is both a philosophical movement and a way of addressing philosophical problems. With the exception of Charles Sanders Peirce and Clarence Irving Lewis, pragmatist philosophers paid little attention to the foundations of probability; pragmatism nevertheless exercised a great influence on a number of authors who did. In the first place, this holds for the founders of the subjective interpretation of probability, namely Frank Ramsey, who is considered a pragmatist although he had no direct connections with the pragmatist movement, and Bruno de Finetti, who acknowledged the influence of Giovanni Vailati on his own work. In addition, the writings of upholders of different interpretations of probability, including Hans Reichenbach, Ernest Nagel and Rudolf Carnap, disclose similarities with the pragmatist outlook.
2. ABOUT PRAGMATISM

The paternity of the term “pragmatism” is usually ascribed to Peirce, and the philosophical movement called by that name started in the years 1870–72 in connection with the gatherings of the Metaphysical Club in Cambridge, Mass. A decisive influence on that group, which included Peirce, William James and a number of intellectuals of various provenance, was exercised by Justice Oliver Wendell Holmes, upholder of a “prediction theory of law” of which Max H. Fish regarded pragmatism as a generalization.1 Pragmatism is not only a philosophical movement, but also a “method”, to use a term adopted by both Peirce and James. According to widespread opinion, pragmatism cannot be forced into a clear-cut definition. In this vein, Lovejoy distinguished thirteen kinds of pragmatism2 and Schiller gave it seven definitions.3
1 See Max H. Fish, “Justice Holmes, the Prediction Theory of Law, and Pragmatism”, in: Kenneth Laine Ketner and Christian J. W. Kloesel (Eds.), Peirce, Semeiotic, and Pragmatism. Papers by Max H. Fish, Bloomington: Indiana University Press, 1986, pp. 6-18.
2 See Arthur Lovejoy, The Thirteen Pragmatisms and Other Essays, Baltimore: The Johns Hopkins Press, pp. 1-29.
3 See Ferdinand Canning Scott Schiller, Studies in Humanism, London-New York: Macmillan, 1907.
Nevertheless, some points of agreement among different perspectives can be identified. Philip Wiener singles out five main components of pragmatism: (1) a pluralistic empiricism; (2) a temporalistic view of reality; (3) a relativistic or contextualistic conception of reality; (4) a probabilistic view of physical and social hypotheses and laws in opposition to both mechanistic or dialectical determinism, and (5) a secular democratic individualism.4 For the purpose of these pages it is worthwhile emphasizing some further issues, which are but a development of those described by Wiener. (1) The conception of truth as consent in the long run put forward by Peirce and basically accepted by James. To quote James, truth is “an expedient in the long run and on the whole of course; … The ‘absolutely’ true, meaning what no farther experience will ever alter, is that ideal vanishing-point towards which we imagine that all our temporary truths will some day converge”.5 (2) The centrality of action. This is reflected by Peirce’s maxim underpinning his theory of meaning, according to which “the only means of determining and clarifying the sense of an assertion consists in indicating what particular sort of experiences one thereby intends to affirm will be produced, or would be produced, given certain circumstances”.6 Peirce’s maxim is echoed by James’ definition of pragmatism as a method by which we “interpret each notion by tracing its respective practical consequences”.7 (3) The stress on prediction as the main task of knowledge. If knowledge has to accomplish the purpose of helping man face the future, good predictions are needed. The criterion for evaluating the goodness of predictions is given by their success. (4) The crucial role of belief, which is assigned dispositional meaning in the sense of “preparedness to act”.8 The hypothetical aspect characterizing the view of belief as a mode of expectation, or a “looking forward to action” is a fundamental ingredient of the pragmatist philosophy. It is forcefully stressed by Giovanni Vailati, who borrows from Julius Pikler the claim that “I believe this” is tantamount to saying “if it were my will I could obtain certain presentations. … the ‘would be’ of presentation is the ‘is’ of objective existence. Actual objective existence is conditional subjective existence; it is ‘presentableness through will’ ”9. (5) Also to be stressed is Peirce’s 4 5 6
7 8
9
See Philip Wiener, “Pragmatism”, in: Philip Wiener (Ed.), Dictionary of the History of Ideas, New York: Charles Scribner’s Sons, 1973, volume III, pp. 551-570. William James, Pragmatism, London-New York: Longmans, 1907. Reprinted London, Routledge 1992, p. 109. Charles Sanders Peirce, “How to make our ideas clear”, in: Charles Hartshorne and Paul Weiss (Eds.), Collected Papers of Charles Sanders Peirce, Cambridge, Mass.: Harvard University Press, 1934, 5.402. William James, Pragmatism, cit., p. 39. In this connection Fish (among others) stresses the influence exercised upon pragmatists by Alexander Bain. See Max H. Fish, “Alexander Bain and the Genealogy of Pragmatism”, in: Kenneth Laine Ketner and Christian J. W. Kloesel (Eds.), Peirce, Semeiotic, and Pragmatism. Papers by Max H. Fish, cit., pp. 79-109. Julius Pikler, The Psychology of Belief in Objective Existence, London-Edinburgh: Williams and Norgate, 1890, p. 54. Quoted by Giovanni Vailati, “Le origini e l’idea
idea that logic is “the ethic of thought”,10 telling us how we should reason to attain our ends, including knowledge and truth.
3. PEIRCE FORERUNNER OF THE PROPENSITY INTERPRETATION OF PROBABILITY
Peirce dealt extensively with probability and induction, and the debate on the foundations of probability abounds with references to his work. He argued in favour of indeterminism, and for that reason his work is regarded by Ian Hacking as the point of arrival of the “erosion of determinism”.11 As observed by Theodore Porter, Peirce “wished to make statistical method central to scientific reasoning”.12 As concerns the interpretation of probability, Peirce anticipated the propensity interpretation. He claims that probability “is a statistical fact”.13 Probability “never properly refers immediately to a single event, but exclusively to the happening of a given kind of event on any occasion of a given kind”, but it does not simply refer to past occurrences, being rather “the ratio that there would be in the long run”.14 So defined, probability is a dispositional concept. As Peirce puts it, the assertion “the probability that if a die be thrown … it will turn up a number divisible by three, is one-third” means that the die has a certain ‘would-be’; and to say that a die has a ‘would-be’ is to say that it has a property, quite analogous to any habit that a man might have. Only the ‘would-be’ of a die is presumably much simpler and more definite than the man’s habit.15
It is because of the stress he puts on the “would-be”, or on the dispositional character of probability, that Peirce is regarded as a forerunner of the propensity theory of probability. The propensity theory of probability was developed in the 60’s by Karl Popper to interpret probability attributions in Quantum Mechanics. Later on, he generalized the theory in view of a much wider application. But while Peirce regards the
fondamentale del pragmatismo”, in: Rivista di psicologia applicata V (1909); reprinted in: Mario Quaranta (Ed.), Scritti, Sala Bolognese: Forni Editore, 1987, volume I, pp. 116-128.
10 See Charles Sanders Peirce, Collected Papers, cit., 1.280, 1.444, 5.85, 5.133, 5.533, 8.191.
11 See Ian Hacking, The Taming of Chance, Cambridge: Cambridge University Press, 1990.
12 Theodor Porter, The Rise of Statistical Thinking, 1820-1900, Princeton: Princeton University Press, 1986, p. 221.
13 Charles Sanders Peirce, “Notes on the Doctrine of Chances” (1910), in: Collected Papers, cit., 8.4.
14 Ibidem, 2.661.
15 Ibidem, 2.664.
dispositional property of probability as pertaining to objects (such as the die of the above example), Popper ascribes propensity to the set of conditions surrounding the occurrence of events, or “chance setup”. This marks a major difference between Peirce and modern propensity theories.16 After Popper, the propensionist interpretation became quite popular among philosophers of science. However, the ongoing debate on the topic has disclosed a number of problems. For one thing, the dispositional character of the notion of propensity makes it barely applicable to inverse probabilities, and therefore to Bayes’s rule. Furthermore, the propensity theory requires completeness of information in connection with the description of the chance setup, but completeness of information is hardly ever fulfilled in practice. Peirce’s views influenced Ernest Nagel, upholder of a “truth-frequency” theory according to which probability refers to the inference from one set of propositions to another, and denotes the relative frequency of the effectiveness of such an inference. Nagel regards probability as a theoretical notion, and maintains that probability statements are tested by comparing their consequences with observed frequencies. This brings his perspective close to the propensity theory. In fact, an essential component of Popper’s perspective is the distinction between probability statements, expressing propensities and referring to frequencies in virtual sequences of experiments, and statistical statements, referring to relative frequencies observed in actual sequences of experiments, which are used to test probability (propensity) statements. A distinctively pragmatic trait of Nagel’s outlook is pluralism. As he puts it, “the term ‘probability’ is not univocal, for it has different meanings in different contexts”.17 The unifying character of the different uses of probability in different contexts derives from the fact that it represents a measure of success of a certain type of inference.
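Bayes’s rule itself is not displayed in the text; for reference, in its simplest form it reads

\[ P(H \mid E) = \frac{P(E \mid H)\, P(H)}{P(E)}, \]

taking us from the “direct” probability of the evidence E given a hypothesis H to the “inverse” probability of H given E, which requires a prior P(H). On a natural reading, the difficulty alluded to above is that it is unclear what physical propensity, if any, such prior and inverse probabilities could express.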
4. THE INFLUENCE OF PRAGMATISM ON HANS REICHENBACH

Reichenbach, who was introduced to Peirce’s writings by Sidney Hook around 1928–29,18 regarded his own perspective as a form of pragmatism. This holds in the first place for his theory of meaning. Reichenbach is equally critical of the position of the “Viennese school” aiming “to show that every proposition has
16 This is stressed by Donald Gillies, Philosophical Theories of Probability, London: Routledge, 2000, where more on the propensity theory of probability can be found. 17 Ernest Nagel, “The Meaning of Probability”, in: Journal of the American Statistical Association 31, 1936, p. 26. On Nagel’s views about probability see Maria Carla Galavotti, Philosophical Introduction to Probability, Stanford: CSLI, 2005, where the reader will also find more on the debate on the interpretation of probability. 18 See Sidney Hook’s memoir in: Maria Reichenbach and Robert S. Cohen (Eds.), Hans Reichenbach, Selected Writings, Dordrecht: Reidel, 1978, volume I, pp. 32-35.
a verifiable meaning,”19 and of the notion of partial definability put forward by Carnap in “Testability and Meaning”, on the grounds that such theories overlook the probabilistic aspect characterising scientific (as well as everyday) language. By contrast, he developed a probabilistic theory of meaning which “substituted probability relations for equivalence relations and conceived of verification as a procedure in terms of probabilities rather than in terms of truth”. Instead of defining ‘the meaning’ of a sentence … it merely laid down two principles of meaning; the first stating the conditions under which a sentence has meaning; the second the conditions under which two sentences have the same meaning.20
Reichenbach’s theory of meaning ties the meaning of scientific statements to their predictive character, which is a condition for their testability, and states the fundamental principle underlying his theory of meaning as follows: “there is as much meaning in a proposition as can be utilized for action”.21 Obviously, Reichenbach’s position in this connection brings him close to pragmatism. Indeed, he acknowledges his debt to the pragmatist theory of meaning, especially in connection with his “Principle of retrogression”. This makes it possible to ascertain the relation between direct and indirect sentences by allowing the meaning of an indirect proposition to be obtained from the observable propositions from which it was inferred, and to interpret such inference as an equivalence, in the sense that “the meaning of the conclusion of the inference is the same as the meaning of the premisses of the inference”.22 For Reichenbach it is probability, not truth, that provides the toolbox for a conception of scientific knowledge in tune with scientific practice. By bringing probability into the theory of meaning and more generally into the theory of knowledge, Reichenbach means to proceed one step forward in the direction pointed by pragmatists. In his words: our conception may perhaps be taken as a further development of ideas which originated in pragmatism. Pragmatists had the merit of taking an anti-metaphysical stand, further developments were made possible by the progress of logical instruments.23
19 Hans Reichenbach, “Logicist Empiricism in Germany and the Present State of its Problems”, in: Journal of Philosophy 6 (1936), p. 143. 20 Hans Reichenbach, “The Verifiability Theory of Meaning”, in: Proceedings of the American Academy of Arts and Sciences 53 (1951), p. 47. 21 Hans Reichenbach, Experience and Prediction, Chicago: University of Chicago Press, p. 80. Italics original. 22 Ibidem, p. 49. See also footnote 10, where Reichenbach quotes Peirce and mentions James, Dewey and Schiller. 23 Ibidem, p. 69.
Also in tune with pragmatism is Reichenbach’s stress on prediction. According to him “the theory of knowledge is a theory of prediction”24, and since statements about the future are probabilistic, the theory of knowledge requires the theory of probability, which is precisely a “theory of propositions about the future … in which the two truth-values, true and false, are replaced by a continuous scale of probabilities”.25 Such a theory rests on the frequency interpretation of probability, which is the cornerstone of Reichenbach’s entire epistemology. The frequency interpretation plays a twofold role in connection with probability. On the one hand, “a frequency is used as a substantiation for the probability statement; it furnishes the reason why we believe in the statement”, on the other “a frequency is used for the verification of the probability statement; that is to say, it is to furnish the meaning of the statement”.26 Reichenbach’s version of frequentism is more flexible than that developed by Richard von Mises, and is devised for a wider range of applications.27 Unlike von Mises, Reichenbach develops a theory of induction and addresses the problem of its justification. His theory of induction revolves around the so-called “method of concatenated inductions”, whose fundamental feature is that of being self-corrective. Reichenbach praises Peirce for emphasizing this crucial character of the inductive procedure. Reichenbach’s argument for the justification of induction is genuinely pragmatical; it moves from the assertion that inductive inference “cannot be dispensed with because we need it for the purpose of action” and goes on arguing that such a procedure gives the best possible guide to the future.28 Since probability statements serve the purpose of guiding decisions leading to actions, they will be justified if it can be shown that they are the best possible guide to action. It turns out that the frequency interpretation is amenable to this kind of pragmatical justification because it satisfies what Reichenbach calls the “principle of the greatest number of successes”, namely it leads us to act in the most successful way possible. Induction is therefore justified on pragmatic grounds, in view of the attainment of what is taken to be its end, namely the formulation of good predictions.
24 Hans Reichenbach, “La philosophie scientifique: une esquisse de ses traits principaux”, in: Travaux du IX Congrès International de Philosophie, Paris: Hermann, 1937, p. 89. My translation. 25 Hans Reichenbach, “Logicist Empiricism in Germany and the Present State of its Problems”, cit., p. 159. 26 Hans Reichenbach, Experience and Prediction, cit., p. 339. Italics original. 27 For a comparison between Reichenbach’s and von Mises’s versions of frequentism see Maria Carla Galavotti, “ On Hans Reichenbach’s Inductivism”, forthcoming in Synthèse. Online at DOI: 10.1007/s11229-009-9589-6. 28 Hans Reichenbach, Experience and Prediction, cit., p. 346.
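A minimal formal gloss, not given in the text: on the frequency interpretation the probability of an attribute A is identified with the limiting relative frequency in a sequence of events,

\[ P(A) = \lim_{n \to \infty} \frac{n_A}{n}, \]

and, roughly, Reichenbach’s rule of induction “posits” that the relative frequency observed so far will remain within a small interval of that limit. If the sequence has a limiting frequency at all, repeated use of this self-correcting posit must eventually approximate it; this is the sense in which the method is claimed to be the best possible guide to action.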
5. LEWIS’S INFLUENCE ON CARNAP

Probability is addressed by Clarence Irving Lewis in Chapter X of An Analysis of Knowledge and Valuation. Lewis discusses logicism and frequentism with reference to the writings of Keynes and Reichenbach to conclude that an account of probability should be possible which identifies a tacit community of conception underlying both these theories and perhaps more fundamental and important than their differences: an account, namely, which should take as its point of departure the thesis that a probability is a valid estimate of a frequency from the given data.29
In Logical Foundations of Probability Carnap claims to be “especially gratified by the great similarity between the conceptions of the nature of the logical concept of probability which were developed independently” by Lewis and himself.30 It is not surprising that Carnap found Lewis’s approach congenial, given that in Logical Foundations of Probability logical probability, or probability1, is assigned a twofold role as a method of confirmation and a method of estimation of relative frequency, or probability2. In recent years, a number of authors have called attention to a pragmatic component of Carnap’s thought, especially in connection with the link established between meaning and testing procedures in “Testability and Meaning”, and his Principle of Tolerance.31 Furthermore, A. W. Carus argued that the young Carnap was influenced by Hans Vaihinger, claiming that evidence can be found in the preparatory sketch of the Aufbau titled “From the Chaos to Reality” (1922).32 As a matter of fact, in Meaning and Necessity Carnap acknowledges his debt to Clarence Irving Lewis’s “method of analysis”, especially in connection with the notions of “intentionality” and “synonymicity”. Carnap praises the pragmatical viewpoint for its fruitfulness in the last paragraph of Meaning and Necessity, where he maintains that there is an urgent need for a system of theoretical pragmatics, not only for psychology and linguistics, but also for analytic philosophy. Since pure semantics is sufficiently developed, the time seems ripe for attempts at constructing tentative outlines of pragmatical systems.33 29 Clarence Irving Lewis, An Analysis of Knowledge and Valuation, La Salle, Ill.: Open Court, 1946. Reprinted 1950, p. 291. 30 Rudolf Carnap, Logical Foundations of Probability, Chicago: University of Chicago Press, 1950, p. ix. Italics original. 31 See Alan Richardson, “Carnapian Pragmatism”, in: Michael Friedman and Richard Creath (Eds.), The Cambridge Companion to Carnap, Cambridge: Cambridge University Press, 2007; and Cornelius van de Waal, On Pragmatism, Belmont, California: Wadsworth, 2005. 32 See A.W. Carus, Carnap and Twentieth-Century Thought, Cambridge: Cambridge University Press, 2007. 33 Rudolf Carnap, Meaning and Necessity, Chicago: University of Chicago Press, 1947,
This claim reappears in Carnap’s reply to Morris’s paper “Pragmatism and Logical Empiricism” in the Schilpp volume.34 The distinction between “internal” and “external” questions put forward in Empiricism, Semantics and Ontology (1950) is also endowed with a pragmatical flavour. The same holds for Carnap’s admission, to be found in his late writings on inductive logic, that the choice of a λ-function does not depend on purely logical considerations, but on contextual elements such as “beliefs, actions, possible losses, and the like”.35 Carnap’s writings of the 60’s abound with expressions and statements that seemingly bring him close to pragmatism, but it should be kept in mind that Carnap retained a rationalistic view of the “reasonableness” of inductive methods, as opposed to “the successfulness of the … credence function of a given person in the real world. Success depends upon the particular contingent circumstances, rationality does not”.36 In the same spirit, after having endorsed Reichenbach’s pragmatic justification of induction during the 40’s and 50’s, in the 60’s Carnap turned to a justification grounded on the notion of inductive intuition.37 This move served the purpose of keeping the justification of induction entirely aprioristic, while discarding the argument based on success. One can conclude that in spite of having accepted the inclusion of pragmatic elements into inductive logic, Carnap maintained a genuinely rationalistic attitude.
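A formal illustration of the λ-dependence mentioned above; the formula is not quoted by Galavotti, but it is the standard form of Carnap’s continuum of inductive methods. The probability that the next individual exhibits a property P_i, given that n_i of the n individuals observed so far did, is

\[ c_\lambda(h, e) = \frac{n_i + \lambda/k}{n + \lambda}, \]

where k is the number of alternative properties in the family and λ ≥ 0 is a free parameter. Small values of λ make the estimate follow the observed frequencies closely, while large values keep it close to the a priori value 1/k. The point at issue is that nothing purely logical fixes λ; its choice depends on contextual, pragmatic considerations of the kind Carnap mentions.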
6. THE SUBJECTIVE INTERPRETATION OF PROBABILITY

As regards the debate on the foundations of probability, the natural offshoot of pragmatism is the subjective interpretation that took shape in the late 20’s with the work of Frank Ramsey and Bruno de Finetti. For subjectivists, probability is a quantitative expression of the degree of belief in the occurrence of an event entertained by a person in a state of uncertainty. Probability is taken as a primitive notion endowed with a psychological foundation, which needs an operative definition that specifies a way of measuring it. A classical way of measuring probability is in terms of bets, but this is by no means the only option. In fact Ramsey
adopted a definition in terms of “preferences” to be determined on the basis of the expectation of obtaining certain goods, not necessarily of a monetary kind, and de Finetti recommended penalty methods, like scoring rules of the kind of Brier’s, as an alternative to the betting scheme.38 After Ramsey showed that coherent degrees of belief satisfy the laws of probability, coherence became the cornerstone of the subjective interpretation, and the only condition of acceptability to be imposed on degrees of belief. Ramsey’s name is often associated with pragmatism in connection with his views on truth, theories and knowledge in general.39 However, his theory of probability is no less imbued with pragmatism, as he acknowledged by quoting Peirce and James. The overall character of Ramsey’s pragmatism amounts to its being centred on man as an agent acting in the world. This attitude imbues Ramsey’s justification of induction, according to which induction is “a useful habit” which it is reasonable to adopt because mankind would be helpless without it. As he observes: “this is a kind of pragmatism: we judge mental habits by whether they work, i.e. whether the opinions they lead to are for the most part true, or more often true than those which alternative habits would lead to”.40 The passage just quoted belongs to the final section of “Truth and Probability”, which Ramsey claims is “almost entirely based on the writings of C. S. Peirce”.41 Other points of agreement between Ramsey and Peirce concern the view of logic as “self-control”,42 and that of “truth as what everyone will believe in the end”.43 The latter bears directly on the notion of “true scientific system”, which is in turn strictly related to those of “chance” and “probability in physics”. A distinctive pragmatical flavour also characterizes the notions of causality and scientific laws, which Ramsey regards as “rules for judging” that “form a system with which the speaker meets the future”.44 38 For more on this see Philip A. Dawid and Maria Carla Galavotti, “De Finetti’s Subjectivism, Objective Probability, and the Empirical Validation of Probability Assessments”, in: Maria Carla Galavotti (Ed.), Bruno de Finetti, Radical Probabilist, London: College Publications, 2009, pp. 97-114. 39 See for instance Horace Standish Thayer, Meaning and Action. A Critical History of Pragmatism, Indianapolis: Bobbs Merrill, 1968, and Philip Wiener, “Pragmatism”, cit. 40 Frank Plumpton Ramsey, “Truth and Probability” (1926), in: Philosophical Papers, edited by Hugh Mellor, Cambridge: Cambridge University Press, 1990, p. 93. 41 Ibidem, p. 90. 42 Frank Plumpton Ramsey, “Philosophy is Analysis of Possibilities”, in: Notes of Philosophy, Probability and Mathematics, edited by Maria Carla Galavotti, Naples: Bibliopolis, 1990, p. 227. 43 Frank Plumpton Ramsey, “General Propositions and Causality”, in: Philosophical Papers, cit., p. 161. 44 Ibidem, p. 149. On Ramsey’s views on chance and related notions see Maria Carla Galavotti, “F.P. Ramsey and the Notion of ‘Chance’”, in: Jaakko Hintikka and Klaus Puhl (Eds.), The British Tradition in the 20th Century Philosophy. Proceedings of the 17th Wittgenstein Symposium, Vienna: Holder-Pichler-Tempsky, 1995, pp. 330-340.
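A small numerical illustration, not from the text, of the coherence requirement just mentioned: suppose an agent’s degree of belief in an event A is 0.6 and her degree of belief in its negation is also 0.6, violating the additivity law P(A) + P(not-A) = 1. Reading degrees of belief as betting quotients, the agent regards as fair both a bet costing 0.6 that pays 1 if A occurs and a bet costing 0.6 that pays 1 if A does not occur. Buying both guarantees a loss:

\[ 0.6 + 0.6 = 1.2 > 1, \qquad \text{total stake} - \text{guaranteed payoff} = 1.2 - 1 = 0.2. \]

Coherence is precisely the requirement that no such system of bets (a “Dutch book”, in the now standard terminology) can be made against the agent.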
Bruno de Finetti acknowledged his debt to the Italian pragmatists, claiming to “particularly admire the work of Giovanni Vailati”.45 In de Finetti’s vivid prose: I had, by and large, adopted the mode of thinking advocated by authors like Vailati and Calderoni … it was precisely this form of reasoning which, in successive waves, from Galileo to Einstein, from Heisenberg to Born, freed physics—and with it the whole of science and human thought—from those superstructures of absurd metaphysical dross which had condemned it to an endless round of quibbling about pretentious vacuities.46
The distinctive features of de Finetti’s pragmatism can be summarized as follows: (a) scientific knowledge is a product of human activity (ruled by probability rather than truth); (b) the main purpose of science is to make good predictions; (c) induction is the proper tool for making predictions and guiding action; (d) the goodness of inductive inferences is to be judged on the basis of their success; (e) the process of knowledge acquisition has a dynamical character, which is reflected by the adoption of Bayes’ method. Moreover, de Finetti’s subjectivism is intrinsically pluralistic, as it admits that different probability evaluations are made on the basis of the same body of evidence. This follows from the fact that probability assessments are taken to depend on objective information, such as frequencies or symmetries, as well as subjective elements, such as expertise in a particular field. To the subjective definition of probability shared by Ramsey, de Finetti added the notion of exchangeability. Combined with Bayes’ rule in the way reflected by the “representation theorem”, exchangeability gives rise to the inferential methodology inspiring so-called neo-Bayesianism. According to de Finetti, the representation theorem answers Hume’s problem because it justifies “why we are also intuitively inclined to expect that frequency observed in the future will be close to frequency observed in the past”.47 This argument is obviously pragmatical based as it is on the role of induction, namely to guide inductive reasoning and behaviour in a coherent way. Unlike upholders of objective interpretations of probability including frequentism, propensionism and logicism, subjectivists do not believe in true, or unknown, probability values. For subjectivists there is no need to impose requirements such as reference class homogeneity (Reichenbach) or total evidence (Carnap) on probability evaluations; the adoption of Bayesian method together with exchangeability is enough to guarantee convergence of probability evaluations as evidence accumulates, and to learn from experience faster than under independence. Obviously, this depends crucially on the choice of priors, which subjectiv45 Bruno de Finetti, Teoria delle probabilità, Torino: Einaudi, 1970, English version Theory of Probability, New York: Wiley, 1975, p. 201. 46 Ibidem, p. 41. 47 Bruno de Finetti, “Subjective or Objective probability: is the Dispute Undecidable?”, in: Symposia Mathematica IX, 1972, p. 34. For more on de Finetti’s philosophy of probability see Maria Carla Galavotti, “Anti-Realism in the Philosophy of Probability: Bruno de Finetti’s Subjectivism”, in: Erkenntnis 31 (1989), pp. 239-261.
Probability and Pragmatism
509
ists regard as context dependent, taking also in this respect a pragmatist stand. However, it should be added that while adopting requirements like reference class homogeneity and total evidence, objectivists like Reichenbach and Carnap have to admit that such requirements cannot be satisfied in practice. They are consequently forced to weaken them, and resort to epistemically relativized formulations. In other words, the notion of objective probability, or the idea that there are true or unknown probabilities, can only be upheld at the price of admitting the crucial role of pragmatic and contextual elements in the evaluation of probability. Things are not much different if probability is taken as propensity, because that also depends on some requirement of completeness of information. So, according to David Miller propensities “must be referred to the complete situation of the universe (or the light cone) at the time”.48 Incidentally, similar problems arise in connection with probabilistic explanations of single events. For instance Wesley Salmon, who made strenuous efforts to develop a theory of causal probabilistic explanation of single events based on a notion of homogeneous reference class along the lines stated by Reichenbach, was eventually forced to admit that in most cases explanation is context dependent.49
7. CONCLUDING REMARKS Peirce thought that pragmatism would become “the dominant philosophical opinion of the 20th Century”.50 Although his prediction may not have been fulfilled, there is no doubt that the development of philosophy of science since the time of logical empiricism shows a progressive opening towards pragmatism. The author who best represents this tendency is Patrick Suppes, upholder of a “probabilistic empiricism” in which important concepts of science and philosophy do not receive a univocal definition, being instead assigned a specific meaning depending on the context. According to Suppes the idea of logical reconstruction pursued by logical empiricists should give way to the search for context aware representations of scientific phenomena and context sensitive inferential methods, leading to a pluralistic and dynamic view of scientific knowledge. Similarly, invariance should replace the idea of “absolute objectivity”.51 Many others, besides Suppes, acknowledge that pragmatical considerations are unavoidable in philoso48 David Miller, Critical Rationalism. A Restatement and Defence, Chicago: Open Court, 1994, p. 183. 49 See Wesley C. Salmon, “A Realistic Account of Causation”, in: Michele Marsonet (Ed.), The Problem of Realism, Ashgate: Aldershot, 2002, pp. 106-134. 50 Quoted from Charles Morris, “Pragmatism and Logical Empiricism”, in: Paul Arthur Schilpp (Ed.), The Philosophy of Rudolf Carnap, cit., pp. 87-98. 51 See Patrick Suppes, Representation and Invariance of Scientific Structures, Stanford: CSLI, 2002.
510
Maria Carla Galavotti
phy of science; think for instance of Bas van Fraassen’s pragmatics of explanation, and Richard Jeffrey’s probability kinematics. To conclude, one can say that the extent and nature of the influence pragmatism has exercised on the debate on the foundations of probability has yet to be fully explored. Furthermore, the ongoing debate in philosophy of science is increasingly open to views that are typical of the pragmatist outlook.
Department of Philosophy University of Bologna via Zamboni 38 I-40126 Bologna Italy [email protected]
GRAHAM STEVENS
RUSSELL ON NON-DEMONSTRATIVE INFERENCE
INTRODUCTION Russell’s general attitude towards science was an element of his philosophy that remained constant throughout his career, while other doctrines and attitudes came and went around it. It is expressed succinctly in the autobiographical book My Philosophical Development: ‘Science is at no moment quite right, but it is seldom quite wrong, and has, as a rule, a better chance of being right than the theories of the unscientific. It is, therefore, rational to accept it hypothetically’.1 This view of scientific theories as successively improving approximations of the truth was more to Russell than simply his outlook on the philosophy of science. It underpinned his entire approach to philosophy, dictating its method, and forging a link between philosophy and science that alienated some (such as Wittgenstein), while inspiring others (such as Quine). It meant, for Russell, that philosophy was answerable to science, and that science often provided the most promising starting point for work in, for example, metaphysics and epistemology. Yet, like most philosophers who take seriously Hume’s sceptical analysis of inductive reasoning, Russell did not think that the principle of induction could be proved without begging the question. Nonetheless, Russell does not grant anything more than what we might call theoretical doubt concerning the truth of the principle of induction in his earlier, and better known, works. In both The Problems of Philosophy2 and in Our Knowledge of the External World,3 for example, the problem of induction is taken to show that the principle, if it is true, must be known a priori, and thus must be a synthetic a priori truth (if is true). In later work, however, Russell argues that the principle is not simply unjustified; it is false. This does not signal any change in attitude towards the status of science—as I have already said, the privileged position given to science was a fixed point in Russell’s philosophy around which other aspects of his philosophy ebbed and flowed—but signals, rather, a change in Russell’s attitude to the place of induction in scientific reasoning. Russell still maintained that scientific inferences were valid. As he came to think that the principle of induction was false, this can only mean that Russell no longer thought that scientific inferences were licensed by the principle of induction. If scientific reasoning is valid reasoning, it must be licensed by valid rules of inference. The inference rules in question were labelled ‘postulates of non-demonstrative infer1 2 3
Bertrand Russell, My Philosophical Development. London: Routledge 1959, p. 13. Bertrand Russell, The Problems of Philosophy. Oxford: Oxford University Press 1912. Bertrand Russell, Our Knowledge of the External World. London: Routledge 1914.
D. Dieks et al. (eds.), Explanation, Prediction, and Confirmation, The Philosophy of Science in a European Perspective 2, DOI 10.1007/978-94-007-1180-8_35, © Springer Science+Business Media B.V. 2011
512
Graham Stevens
ence’ by Russell. They lie at the heart of his last major work in philosophy, 1948’s Human Knowledge: it’s Scope and Limits. Human Knowledge was not well received when it was published, in part because it was diametrically opposed to the Wittgensteinian philosophical outlook dominating analytical philosophy at the time. Since that time, the book has remained largely neglected. I have argued elsewhere4 that this neglect is undeserved. My reasons for arguing that related to the semantic and metaphysical investigations in the book, and the illumination that they give to studies of Russell’s overall philosophical development. In this paper, however, I want to focus on what Russell has to say about non-demonstrative inference in Human Knowledge. In doing so I will also draw on other published and posthumously published works of Russell’s from around the same period. It is not my intention to defend Russell’s position as enunciated in Human Knowledge. On the contrary, I will argue that, ultimately, it is untenable. However, as I hope to show, the theory of non-demonstrative inference Russell develops in this final stage of his philosophical development is untenable not because of anything intrinsic to the theory, but because of his insistence on trying to reconcile that theory with other, more general, philosophical commitments. In particular, his commitment to realism and subsequent refusal to fully embrace empiricism, even when he displays obvious sympathy towards it, is responsible for the downfall of his theory of non-demonstrative inference. For it is this commitment which motivates the theory, but it also places on the theory a demand too great for it to withstand and under which the theory eventually crumbles. As mentioned above, in his earlier work, Russell adopts a broadly Humean take on the principle of induction, denying that it can be justified in a non-circular way, but maintaining that it is utterly central to all scientific (and everyday) reasoning: The general principles of science, such as the belief in the reign of law, and the belief that every event must have a cause, are as completely dependent upon the inductive principle as are the beliefs of daily life. All such general principles are believed because mankind have found innumerable instances of their truth and no instances of their falsehood. But this affords no evidence for their truth in the future, unless the inductive principle is assumed.5
By the time of his later (post 1940) work, his position is rather different: [A]lthough scientific inference needs indemonstrable extra-logical principles, induction is not one of them. It has a part to play, but not as a premise.6
4 5 6
See Graham Stevens, The Russellian Origins of Analytical Philosophy. London: Routledge 2005. Russell, The Problems of Philosophy. Op. Cit. p. 38. Russell, My Philosophical Development, Op. Cit. p. 141.
Russell on Non-Demonstrative Inference
513
What led Russell to change his mind so dramatically about the status of induction as a rule of scientific inference? If we examine his discussions of induction, and of scientific reasoning generally, in his later works—especially Human Knowledge—we find several reasons for the change of heart, including: • • • •
A conviction that the principle of induction (in its purest form, at least) is demonstrably false. A conviction that other, more (psychologically) fundamental principles, can be identified. A kind of “higher-order” naturalism: i.e. naturalism about scientific reasoning. All of these issues are connected in Russell’s thinking with his interest in the limits of empiricism.
In what follows, I will examine each of these in turn. I will begin in section one by examining Russell’s views on the principle of induction. In section two, I will present Russell’s postulates of non-demonstrative inference. In section three, I will address the difficult topic of how we can be said to know these postulates. I will argue in this section that Russell fails in his attempt to show that we have knowledge of these principles. Finally, in section four, I will look at the relation between Russell’s investigation of non-demonstrative inference, and his quite complex attitude towards empiricism and its limitations as a philosophical doctrine.
1. THE PRINCIPLE OF INDUCTION By the 1940s, Russell had begun to devote the majority of his philosophical thinking to the nature of scientific knowledge and scientific reasoning. Although Human Knowledge is the definitive statement of his mature views on these topics, as is so often the case with Russell, manuscripts written during the development of his ideas and arguments are extremely informative.7 One particularly interesting manuscript is a piece written around 1945, called ‘Non-Deductive Inference’. In this manuscript, Russell provides a very clear statement of his reasons for thinking that induction is not justified. Russell thinks that induction ‘by simple enumeration’ is generally held to be ‘the necessary and sufficient postulate of scientific method’. He states the principle in this form: Given that n instances of a class A have been found to be members of some other class B, and no members have been found to be not members of the class B, then the probability that (a) the next member of A will be found to be a member of B, and (b) that all members of 7
These manuscripts are now published in John G. Slater, ed. The Collected Papers of Bertrand Russell, vol. 11. London: Routledge 1997.
514
Graham Stevens
A are members of B, are both functions of n which increase as n increases, and approaches certainty as a limit as n approaches infinity.8
According to Russell, however, this principle is simply false. He offers the following argument as a refutation of the principle in its above form: In an inductive inference, we are given that two classes, A and B, have n members in common, and we are not given that they have any members not in common. Let a1, a2, …, an, be the n members of A that have been found to be members of B, and let an+1 be the next member of A that we encounter. Then so far as the data are concerned, B may be any class of which a1, a2, …, an are members. It is clear that unless n exceeds half the things in the universe, most of the classes of which a1, a2, …, an are members will not contain an+1. Unless, therefore, the classes A and B are subject to some limitation, it is unlikely that an+1 will be a member of B.9
Why, then, do we draw inductive inferences when induction by pure enumeration is obviously invalid (at least to those sufficiently well versed in set-theory)? The answer must be that induction does not normally proceed by pure enumeration. But that can only mean that some connection is being postulated to hold between the relevant classes. This shows, Russell thinks, that the postulated connections are more fundamental to our practice of scientific prediction/reasoning than induction alone.
2. THE POSTULATES OF NON-DEMONSTRATIVE INFERENCE Russell, in Human Knowledge, suggests five postulates that play the roles of connecting premises and conclusion in induction and which are, thus, implicitly appealed to in scientific inferences that are traditionally construed as merely inductive. Although Russell suggests these five, it should be noted that he makes no firm commitment to any one of them; rather he makes the weaker claim that there are some such principles, whatever they may in fact turn out to be. Indeed, of these five he thinks it ‘highly probable that they can be further reduced’:10 1. 2. 3. 4. 5. 8
The postulate of quasi-permanence The postulate of separable causal lines The postulate of spatio-temporal continuity in causal lines The structural postulate The postulate of analogy
Bertrand Russell, ‘Non-Deductive Inference’, in Slater, ed. Collected Papers 11, Op. Cit., pp. 121-129, p. 121-2. 9 Ibid. p. 122. 10 Bertrand Russell, Human Knowledge: It’s Scope and Limits. London: Routledge 1948, p. 506.
Russell on Non-Demonstrative Inference
515
These five postulates (or whichever number is eventually arrived at if they are further reduced) are the real indemonstrable rules of inference required for nonlogical inference, according to Russell, as he makes explicit in a note dictated to his wife in 1959 but never published: ‘These postulates should, in my opinion, replace induction as quantities needed in non-demonstrative inference’.11 (1) should be understood in light of Russell’s notion of a “Causal Line”. Causal lines are supposed to replace the metaphysical notion of substance, which Russell thinks incompatible with modern physics. Causal lines are, roughly, temporal series of events where there is no sudden change in neighbouring elements of the series. They are therefore supposed to account for the persistence of objects without the postulation of a persistent substance. This notion is partly justified by (1): Given any event A, it happens very frequently that, at any neighbouring time, there is at some neighbouring place an event very similar to A.12 But it also supported by postulate (2): It is frequently possible to form a series of events such that, from one or two members of the series, something can be inferred as to all the other members.13 The example Russell gives of where (2) is invoked is in the understanding of motion. In motion (most clearly unimpeded motion) a causal line is such that we interpret every event in the series as a position of a moving thing. The remaining postulates offer further foundations, as Russell sees them, for scientific inferences. (3) denies the possibility of “action at a distance”; (4) captures the idea that a structured series of data of the right kind is supposed to originate in a common source (e.g. if several hearers hear the same sound, we assume that the causal lines terminating in those hearers’ sensory surfaces all originate from the same source); (5) explains (warrants) our tendency to infer from observable data to unobserved phenomena. What, then, is the status of these postulates? Russell maintains that they are things which we know. Let’s turn to examine this claim now.
11 Bertrand Russell, ‘Notes on Non-Demonstrative Inference and Induction’, in Slater, ed. Collected Papers 11, Op. Cit., pp. 138-139, p. 139. It is unclear why the note was dictated though never published. It appears to have been written for a specific publication as Russell’s wife noted the exact word-count on the typescript, presumably as information for someone in an editorial role. Alternatively it may have been intended to add to the account of this episode in Russell’s thought given in My Philosophical Development which was in press at the time. See editor’s note on p. 116 of vol. 11 of the Collected Papers, op. cit. 12 Russell, Human Knowledge, Op. Cit. p. 506. 13 Ibid. p. 508.
516
Graham Stevens
3. KNOWLEDGE OF THE POSTULATES: HIGHER-ORDER NATURALISM? Obviously if the postulates are to secure scientific knowledge, they must be not only true but also known. The need to meet the first requirement explains the presence of expressions like “frequently”, “usually”, etc. in the official statement of the postulates. We will look at that in the next section. The second requirement is also very hard to meet. Obviously the propositions used to express the postulates are not universally known. But Russell is happy concede this, holding only that particular instances of the general forms are consciously known. To explain the sense in which he thinks we have knowledge of the general principles, Russell embarks on the rather drastic course of attempting to extend the traditional definition of knowledge to embrace phenomena that would not ordinarily be classed as knowledge, and in some cases do not even appear to be epistemic cases at all: Knowledge of connections between facts has its biological origin in animal expectations. An animal which experiences an A expects a B; when it evolves into a primitive man of science it sums up a number of particular expectations in the statement “A causes B”. It is biologically advantageous to have such expectations as will usually be verified; it is therefore not surprising if the psychological laws governing expectations are, in the main, in conformity with the objective laws governing expected occurrences.14
Russell’s discussion of this point is frustratingly brief, but we can get a better idea of what he has in mind from ‘Non-Deductive Inference’. In the manuscript Russell gives three examples of ways in which knowledge is less straightforward than is usually thought: it can be, he suggests, (a) uncertain, (b) unverbalized, and (c) unconscious. As an example of (a) he suggests that we can have indexical knowledge (“I am hot”) despite not knowing what “I” means. This example is unfortunate as it seems to rest on some flaws in Russell’s semantic theory for indexicals: in particular, he seems to be conflating indexical sentences (‘I am hot’) with the propositions they express in contexts of utterance (the proposition that Russell is hot, if the sentence is said by Russell). However, we could perhaps salvage Russell’s point by replacing the example with a sceptical case (I cannot be certain that I am not a brain in a vat, so I cannot be certain that my leg hurts … but we still want to say that I know my leg hurts). The example he gives of (b) is a perceptual case—walking down a street and trying to avoid bumping into other pedestrians I move accordingly, but at no point need I verbalize, even internally, my knowledge that there is a person in my way. The example of (c) he gives is the kind of knowledge employed in riding a bicycle. This may be conscious knowledge when first learnt, but quickly becomes something we employ unconsciously. Nonetheless, says Russell, it counts as knowledge. 14 Ibid. p. 514.
Russell on Non-Demonstrative Inference
517
It has to be said that a number of quite disparate mental phenomena are being run together here. Most of us would be happy to agree with Russell that the fact that knowledge is defeasible does not count against its being knowledge. However, the second and third issues appear quite unconnected to this point. The kind of knowledge Russell is talking about in (c) is surely just what we commonly call ‘knowing how’ rather than ‘knowing that’, i.e. it is a knowledge manifested in ability rather than in a consciously held attitude to something propositional. While it may take some time and patience to recruit certain muscle and nervous systems to learn how to perform a physical task like riding a bicycle, it is a casing of learning how to do it, not of learning any propositional knowledge. Once the skill has been mastered it can become “second nature”. All of this is uncontroversial, but it does not appear to have any connection with the sort of knowledge that is employed in scientific reasoning as the latter looks quite clearly to be propositional knowledge. The knowledge Russell refers to in (b) might be called dispositional. We can be said to know that a person is present when we avoid bumping into them in the street because, e.g., if questioned why we moved, we would say without hesitation, ‘to avoid that person’. There are many examples of this kind of knowledge—knowing what gear the car is in when driving, knowing how high each step of a staircase is when climbing it, and so on. The point to note about all three is that they seem to be very different cases—what does Russell think they have in common? His answer is that he thinks they all exist on a continuum that originates with something close to pure instinct or habit. This motivates an extreme version of naturalism that might be called “higher-order naturalism”. Whereas standard versions of naturalism seek to explain various phenomena in terms of scientific explanations, Russell wants to explain scientific knowledge itself in naturalistic terms. Knowledge, ultimately, is a biological phenomenon, according to this view: We are thus led to a view of knowledge which is more biological, less precise, and more a matter of degree, than the view which has usually prevailed among philosophers. In the sense in which I wish to use the word, I should say that a chicken shows “knowledge” when, without previous experience, it can pick up a grain with its beak. This “knowledge” is synthetic and a priori. Are we to let the chicken outdo us in philosophic capacity?15
This view might seem question-begging: if it is being offered as a justification for the kinds of inferences employed in scientific reasoning, then it certainly begs the question. A more charitable interpretation might be that Russell is offering explanation, not justification. In fact, he is apparently conceding that no absolutely certain justification can be found. This, after all, is the conclusion reached in Human Knowledge: ‘all human knowledge is uncertain, inexact, and partial’.16 However, Russell intends the above considerations to offer support to his claim 15 Russell, ‘Non-Deductive Inference’, Op. Cit. p. 128. 16 Russell, Human Knowledge, Op. Cit. p. 527.
518
Graham Stevens
that much of our scientific knowledge goes beyond what is provided by experience. As such, these reflections on non-demonstrative inference are supposed to be lending support to a wider argument against pure empiricism: if knowledge of the principles of non-demonstrative inference does not derive from experience, then there is more to our knowledge than empiricism permits. This anti-empiricist argument puts Russell on dangerous dialectical grounds, however: for in order for the argument to be valid, the non-demonstrative inferences must rely on knowledge of the principles. But in that case, Russell cannot remain content with offering mere explanations of our inferential behaviour; he must offer justifications of them. Yet, as we have seen, if the naturalistic thesis outlined above is offered as a justification of non-demonstrative inference, it becomes question begging.17
4. THE LIMITS OF EMPIRICISM Russell’s description of his philosophical career in My Philosophical Development as a ‘gradual retreat from Pythagoras’,18 is often mistakenly understood as signifying that the retreat culminated in an eventual rejection of realism. In fact, Russell’s retreat from Pythagorean realism was not only gradual, it was also relatively local: although Russell’s philosophical development was marked by an ever contracting ontology, he did not abandon realism as a more general philosophical stance regarding the mind-independent nature of reality. Nowhere in Russell’s later philosophy is his commitment to realism more evident than in his attitude towards empiricism.19 Despite his sympathy for empiricism, whenever it comes into conflict with Russell’s realism, it always comes off worse. For example, in the highly revealing 1936 paper, ‘The Limits of Empiricism’,20 and in the discussions 17 Perhaps an escape route from this predicament could be found in a form of epistemological externalism. If Russell were to endorse externalism, and maybe what I am calling his higher-order naturalism can be seen as gesturing in that direction, he could maintain that the justification which distinguishes true belief from knowledge may be unavailable to the knower herself, but nonetheless available from a third-person perspective. The position does not seem incompatible with the theory of knowledge contained in Human Knowledge, though there is insufficient evidence to warrant any suggestion that Russell would have endorsed it. Thanks to Thomas Uebel for drawing this point to my attention. 18 Russell, My Philosophical Development, Op. Cit. p. 208. 19 See Graham Stevens, ‘On Russell’s Naturalism’, Bertrand Russell Society Quarterly 130/131, 2006, pp. 21-30; Graham Stevens, ‘Antirealism and the Theory of Descriptions’ in Russell vs. Meinong: The Legacy of “On Denoting”, eds. Griffin and Jacquette, London: Routledge, 2009 for further discussion of Russell’s attitudes towards empiricism and realism. See also Andrew Lugg, ‘More on Russell and Quine: A Reply to Stevens’, Bertrand Russell Society Quarterly 130/131, 2006, pp. 31-37 for a contrasting view of Russell’s attitude towards empiricism. 20 Bertrand Russell (1936) ‘The Limits of Empiricism’, Proceedings of the Aristotelian
Russell on Non-Demonstrative Inference
519
of the consequences for empiricism of acknowledging epistemic constraints on truth in 1940’s An Inquiry into Meaning and Truth, Russell is not prepared to give an unqualified endorsement to empiricism, primarily because he thinks it would demand the rejection of realism.21 Russell’s analysis of non-demonstrative inference again catches him between the competing demands of realism and empiricism. Unlike other places where this situation arises in his philosophy, such as those just mentioned however, the tension between these two demands threatens to make the analysis he is offering untenable. As we saw at the end of the last section, Russell infers from his conclusion that scientific inferences rest on postulates such as the five listed in section 2 that empiricism is false. It should be stressed that this conclusion does not rest on the particular postulates he has arrived at, but rather on the fact that any such postulates will go beyond the sphere of pure experience: ‘And whatever those principles of inference may be, they certainly cannot be logically deduced from facts of experience. Either, therefore, we know something independently of experience, or science is moonshine’.22 Russell’s argument seems to be very simple: P1 The principles governing scientific inference cannot be derived from experience. P2 The principles are known (science is not ‘moonshine’). C1 Something is known independently of experience. C2 Empiricism is false. The problem with the argument, however, concerns premise 2. This premise is certainly not undefended—in a sense the whole of Human Knowledge is a defence of it—but the problem for Russell’s argument arises because of the form that his defence of premise 2 takes. As we have seen, his defence of the premise takes the form of a naturalistic account of knowledge that extends the standard extension of that term to include habits, dispositions, and so forth. Whether or not this account is true, the appeal to it in defending premise 2 strips Russell’s anti-empiricist argument of any hope of validity. Explaining, in Humean terms, why we use induction by reducing it to a matter of natural habit does not oblige us to justify induction—whether we have this habit or not is an independent issue from that of whether the principle is true or false. But making the claim that our habit has evolved into something that can accurately be called knowledge does oblige us to justify the principle, because Society 30: 131-59, 1936, reprinted in John G. Slater, ed. The Collected Papers of Bertrand Russell, vol. 10. London: Routledge 1996. 21 See, e.g., Bertrand Russell, An Inquiry into Meaning and Truth, London: Routledge 1940, pp. 275-305. 22 Russell, Human Knowledge, Op. Cit. p. 524.
520
Graham Stevens
if the principle is known it has to be true. This is the course that Russell follows, indeed it must be if premise 2 is to be defended. Russell states that the principles originally are things that we have a ‘propensity’ to adhere to in our inferences.23 Then, he claims, we reflect on these inferences and so make the principles explicit. All of this is consistent with a naturalistic explanation of our inferential behaviour of the Humean, purely explanatory, sort. But then Russell asserts: ‘As mankind have advanced in intelligence, their inferential habits have come gradually nearer to agreement with the laws of nature which have made these habits, throughout, more often a source of true expectations than of false ones’.24 At that point, we are being offered a justification of the principles of non-demonstrative inference. But the justification on offer will not convince the sceptic about the validity of those inferences, for it cannot be defended without circularity. As the justification is a naturalistic one—one that seeks to characterise knowledge in biological, and therefore scientific, terms—it cannot provide a grounding of scientific inference. The thing in need of grounding here—the validity of scientific reasoning—is being invoked to provide its own grounds. Without this argument, Russell can only offer us a brute assertion that our biological habits mature to a stage that can be labelled “knowledge”. No (non-circular) justification is given in support of the claim. But in that case, no real support has been given for the claim that science rests on knowledge that is not somehow derived from experience. Russell’s claim that the investigations in Human Knowledge into non-demonstrative reasoning present a refutation of empiricism therefore cannot be said to have been demonstrated.25
Philosophy, School of Social Science University of Manchester Manchester M13 9PL Great Britain [email protected]
23 Ibid. p. 526. 24 Ibid. p. 526. 25 I would like to thank the audience at the ESF Philosophy of Science in a European Perspective Workshop, Utrecht, October 2009 for comments. Special thanks also to Thomas Uebel for additional comments.
ELISABETH NEMETH
EDGAR ZILSEL ON HISTORICAL LAWS
I Initially it seems surprising that Edgar Zilsel’s work has found as little response among philosophers as it has. After all, his contributions to the Vienna Circle’s debates about probability and protocol statements were published in Erkenntnis.1 Already his doctoral dissertation dealt with a central problem of modern philosophy of science—the status of statistical laws in physics—and revealed a remarkably knowledgeable mathematician, physicist and philosopher.2 Yet the way in which Zilsel raised the issues, namely via Leibniz, Spinoza and Kant, was not easy to accept for many of the later logical empiricists. Zilsel stuck with what in his dissertation he had called the “problem of application” and held that it needed to be solved even once the framework of logical empiricism had been accepted. By contrast, Richard von Mises and Otto Neurath considered it a pseudo-problem. Zilsel’s views are difficult to categorise and nowadays even difficult to understand. Just as Mises and Neurath were puzzled by the problem of application, so contemporary readers are likely to be puzzled by Zilsel’s search for “historical laws”3. What were they supposed to be and why did Zilsel think it so important to discover them? My contribution will not provide wholly satisfactory answers, but I hope to offer at least some important elements for them. Significant signposts are to be found in the large historical study on the development of the concept of genius that Zilsel submitted as his habilitation dissertation at the University of Vienna in 1
2 3
For details on Zilsel’s life and work see Diederick Raven and Wolfgang Krohn, “Edgar Zilsel: His Life and Work (1891–1944)“, in Zilsel, The Social Origins of Modern Science (ed. by Diederick Raven, Wolfgang Krohn, Robert S. Cohen), Dordrecht: Kluwer 2000, pp.xii-lix. Zilsel, Das Anwendungsproblem. Ein philosophischer Versuch über das Gesetz der großen Zahlen und die Induktion, Leipzig, 1916. Nicholas Jardine discusses the tensions in Zilsel’s project in his essay review ‟Zilsel’s Dilemma”, in Annals of Science, 60 (2003), 85-94. For the complex relationships between the concept of historical laws and the ‘problem of application’ see: Johannes Lenhard and Wolfgang Krohn: ‟Das Gesetz der großen Zahlen. Edgar Zilsels Versuch einer Grundlegung physikalischer und sozio-historischer Gesetze” in: K. Hartbecke und C. Schütte (eds.): Gesetze der Natur. Historische und systematische Perspektiven, Paderborn, mentis Verlag 2005, 291-317. The authors are right in stressing the strong Kantian features of Zilsel’s approach. However, they seem to underestimate Ernst Mach’s influence on Zilsel’s thinking about physical and historical laws. It is this aspect I want to focus on in the following article.
D. Dieks et al. (eds.), Explanation, Prediction, and Confirmation, The Philosophy of Science in a European Perspective 2, DOI 10.1007/978-94-007-1180-8_36, © Springer Science+Business Media B.V. 2011
522
Elisabeth Nemeth
1924. His habilitation attempt failed, but the study was published in book-form in 1926. Once it had become clear that the habilitation commission considered it insuitable for the subject of philosophy, Zilsel withdrew his application. What is interesting for us is that he himself regarded this historical investigation as a contribution to the clarification of genuinely philosophical questions. First I want to discuss a different text, however: “On the Asymmetry of Causality and the Unidirectionality of Time” (“Über die Asymmetrie der Kausalität und die Einsinnigkeit der Zeit”; hereafter “Asymmetry”). This paper, published in 1927 in Die Naturwissenschaften, deals with the concept of causation in modern physics. At first, this topic does not seem to have anything to do with the historical investigation of the genius concept, but at the end of the paper Zilsel makes very clear that reflection on the concept of causation can provide pointers for foundational questions in historiography.
II The text of “Asymmetry” gives a good example of the difficulties that Zilsel presents his readers with and which may serve to partially explain why his work did not find much reception in philosophy so far. It contains many very detailed remarks on the state of physical theory in his day: on the meaning of simultaneity, on the definition of the direction of time, on entropy and irreversibility, on the meaning of the concept of determination, on the relation between micro- and macro-laws. These topics are discussed with reference to Mach, Boltzmann, Reichenbach, Carnap, Kurt Lewin, Hermann Weyl, Moritz Schlick and others. Zilsel’s sympathies for a Machian philosophy of science are clearly discernible, as is the fact that his views cannot be assigned unambiguously to any one philosophical school. Zilsel’s argumentation is very independent and occasionally contains anticipatory ideas, yet it also often seems quite alien and hard to comprehend. Here I can only give examples that bear on Zilsel’s view of the role of historical laws. As is well-known, it was Ernst Mach who rejected with special emphasis the employment in science of everyday conceptions of causality. From him we have learnt that to discover natural laws means to discover functions that connect states, processes, natural relations.4
With this Zilsel created the framework within which he was to develop his questions. What is the significance of Mach’s insight for our everyday conceptions of causality? What distinguishes the vague everyday conceptions of causality from the exact, scientifically definable concepts of natural law and function? Zilsel’s first answer was: asymmetry. It appears as if in our everyday conceptions 4
Zilsel, “Über die Asymmetrie der Kausalität und die Einsinnigkeit der Zeit”, in Die Naturwissenschaften 15 (1927) 280-286, at p. 280. Trans. of German originals by present translator.
Edgar Zilsel on Historical Laws
523
the relation between cause and effect, whatever else is commonly meant by it, is wholly asymmetrical, whereas with regard to laws and functions it is never possible to distinguish between dependent and independent variables in a principled way.5
Noting the opposition of asymmetry and symmetry is only a first step towards clarifying the problem, however. For Zilsel went on to show that it is possible, after all, to distinguish between dependent and independent variables in certain physical laws and functions. The emphasis of the quoted sentence lies on the expression “in a principled way”. The contrast between the scientific concept of natural law and the everyday conception of causality does not lie in that one side (everyday reasoning) assumes an asymmetrical relation and the other (science) assumes a symmetrical one. The contrast is rather that in everyday reasoning the distinction between independent and dependent variables (between causes and effects) is wholly unproblematical or, better, supposed to be unproblematical. By contrast, in science laws and functions formulating asymmetrical relations are special cases which presuppose that one can indeed draw a sharp logical distinction between the dependent and independent variables. The distinction appears as unproblematical for everyday reasoning, as it is difficult to formulate for science. To present us with an exact version of asymmetric causation, Zilsel first indicates a path from everyday reasoning to science. This path has obvious similarities with what nowadays is called “the manipulability conception of causal explanation”.6 When in everyday concerns … we operate with asymmetric causality, then this has primarily biological-practical reasons. What we know is in the service of our reactions to the environment, after all, but our reactions generally focus only on one of the variables connected by law. … If, for example, an areometer is swimming in a saline solution only subject to its weight and buoyancy, then there obtains a one-to-one lawful connection between the depth to which the swimming object has sunk and the concentration of the solution; dependent and independent variables can be exchanged arbitrarily. If we want to react to the process, then we can alter the concentration of the solution arbitrarily, but not directly the depth of the swimmer. For against our supposition, this would introduce new forces into the interplay of weight and buoyancy, as when we press down the object by hand. Now we distribute in our law—and in all others of the same type—the names of cause and effect after the model of our reactions to the variable at issue. That variable which is amenable to our will, we regard as the independent one, the cause, those which are connected to it by law we regard as the dependent ones, the effects.7
From this perspective the difference between dependent and independent variables depends on this that they are available for wilful manipulation to different degrees. Zilsel understands by ‘variables available for wilful manipulation’ not only those 5 6 7
Ibid. James Woodward, Making Things Happen. A Theory of Causal Explanation, New York: Oxford University Press, 2003, pp. 9, 25. Zilsel, “Asymmetrie”, op. cit., p. 281
524
Elisabeth Nemeth
that are indeed practically manipulable by humans. They are characterised as such instead by how they are given to us in time: the reactions of an organism take place in a temporal framework. Thus our manipulations are “limited to those magnitudes which determine a state in one instant”8 Since, e.g., nothing can be said about the time of revolution of a planet on the basis of one instant, it is impossible to determine it directly, whereas we can—at least in theory—fix its distance from the sun for that is an instantaneous value. Expressed more exactly: we can intervene in planetary movements according to the basic equations of mechanics, but with respect to Kepler’s third law, according to which distance from the sun and time of revolution are weighted equally, we cannot intervene at all.9
So there are physical laws which largely correspond to the everyday conception of causality. These are “differential laws of succession” which derive processes from a state which can be determined by one instant.10 They involve time and its unidirectionality within physics and allow the definition of asymmetry. “Cause” in a macro-law is called the temporal precedent in the temporal differential laws at issue, “effect” that what follows. This distributes the somewhat vague names “cause” and “effect” among the variables of a law in the way which also ordinary usage would appear to be directed more or less consciously.11
The asymmetry of the everyday conception is captured in such a way so as to possess a clear physical meaning. All the same, this demonstrated connection of causality and succession holds certain difficulties. The direction of time indicated (past-present-future) is bound to the character of the experience of living things, but “there is hardly an objective physical way in which the subjective difference of memory and expectation can be rendered”.12 The categorical difference between before and after cannot be derived from differential laws. … The total solar eclipse of 1850 is determined, that is, calculable, just like the one of 1899, on the basis of the planetary constellation on 1 January 1927, and this hold with full generality. Wherever differential laws holds, there the future determines the past just like the past determines the future.13
The essence of the asymmetry of cause and effect is only understood, so Zilsel, once we pay attention to the fact that “quite apart from the differential laws and the peculiarities of our stream of experience … nature [shows] this most remarkable 8 9 10 11 12 13
Ibid. Ibid. Ibid. Ibid. Ibid. Ibid., p. 282. The dates do not relate to any real total solar eclipse.
Edgar Zilsel on Historical Laws
525
structure: it includes systems which assume different states at different times.” These are the topological facts of experience which “guarantee the unidirectionality of a certain objective magnitude, ‘time’, the objective difference between past and future”. These topological laws are satisfied in an exact manner in physics, “namely everwhere where entropy is defined for closed systems”. Ultimately, they depend on the topological meaning of the second law of thermodynamics which encompasses the nonequivalence of past and future. According to Zilsel, however, also “all biological and historical developments” follow in accordance with topological laws. “In biology, Dollo’s law says that organisms never return to the form they once possessed, even if they regress like parasites.”14 This gives us the schema according to which Zilsel understands all the problems that arise when physicists try to define the unidirectionality of time. Important for us here are not the further examples which Zilsel then gives, but his general conclusion. At least we succeeded in deriving the difference between independent and dependent variables from two related facts: the difference between natural circumstances amenable and not amenable to our reactive intervention and the difference of before and after. Yet it must be stressed that both differences do not play a role in quite a few functional relations in nature. There are also natural laws to which the everyday schema of cause and effect does not apply.15
These are, first, those physical laws none of whose variables allow for our direct intervention. Zilsel’s examples are the functional relation of absorption and emission in Kirchhoff’s law and the functional relations between material constants.16 Then there are, second, those laws “in which all variables are directly amenable to our reactive interventions. In these ‘bi-directional’ laws we can intervene in both ways: cause and effect are exchangeable.”17 Zilsel’s examples are the law of the lever, imaging laws in optics and the ideal gas laws. This allows Zilsel to state about macro-laws: Among the macro-laws there are many in which the variables are not amenable in the same ways to the intervention of reacting humans. These are always laws that either regulate temporal processes directly or in which such processes play an important role. In these and only these laws cause and effect are clearly differentiated. Their difference, apart from the degree of their amenability to interventions, depends on the unidirectionality of time, but that depends objectively only on the irreversibility of our macro-world. Other macro-laws, namely those where time plays no role, either connect variables none of which are amena-
14 15 16 17
All quotations in this paragraph from ibid. Ibid., p. 284 Ibid. Ibid., p. 285
526
Elisabeth Nemeth
ble to human intervention or where all their variables are equally amenable to it. In these cases the concept of cause looses its significance.18
These conceptual clarifications are, so Zilsel, more or less obvious in physics, but they are of considerable importance for “other fields of research in which—decades after Mach—vaguely applied conceptions of causality and natural law are responsible for fruitless discussions.”19 What Zilsel has in mind are the social sciences, especially the problem of historical laws. If there are historical laws, then naturally these can only be macro-laws. Now the dispute over, e.g., the ‘materialist’ conception of history turns on the question whether the economic or the cultural, religious and artistic phenomena connected with them are the dependent or the independent variables. With regard to macro-laws, however, it is not a matter of course that dependent and independent variable can always be distinguished. These questions outside of physics proper can only be answered by empirical research. Familiarity with conceptual tools of theoretical physics may provide a good service for dealing with many a problem in the human sciences, given that clear concepts are indispensible even for empirical researchers.20
III Before we can turn to the book in which Zilsel’s attempt to discover historical laws by detailed empirical research is best documented, we must follow up the hint that Zilsel provided in this last quoted passage. The critique of conceptions of causality and natural law can bear fruit in the disputes about the materialist conception of history. Zilsel himself took part in these on the occasion of debate in the Social Democratic monthly Der Kampf in 1931. Here too we cannot cover everything but only present those elements that are of relevance for our present topic. Among other things the debate concerned the question of the significance of the materialist conception of history for the Social Democratic Party. A convinced party member since 1918 and intensively engaged in the party’s educational programmes, Zilsel opposed an earlier essay whose author opined that “by joining the Party one also affirms the Marxist theory of history”.21 Zilsel conceded that it is correct that the political work of the Social Democratic Party is sustained by views that have a theoretical foundation and must be accepted by its members, naming the following: 18 19 20 21
Ibid. Ibid., p. 286 Ibid. Zilsel, “Materialismus und marxistische Geschichtsauffassung”, Der Kampf 24 (1931), repr. in Zilsel: Wissenschaft und Weltanschauung. Aufsätze 1929–1933 (ed. by Karl Acham), Vienna: Böhlau 1992, pp. 77-98, at p. 88.
Edgar Zilsel on Historical Laws
527
our ultimate aim is the socialisation of the means of production, our enemy is the capitalist class society, our path the class struggle. Whoever belongs truly to the Party must affirm this with head and heart. Yet is the materialist conception of history also a constitutive part of the Party?22
Zilsel gave a negative answer and justified his own attitude towards the materialist conception in the course of his essay in a revealing fashion. The materialist conception of history consists of two closely related theories: the theory of class struggle and the theory of base and superstructure. The first one says that from the emergence of class societies until the future classless society all historical processes are the result of class struggles—all of them, including the Crusades, including the monotheistic reform directed against the cult of Amon by the Pharao Amonophis IV-Echnaton at around 1350 BC.23
The second theory, that of base and superstructure, rests on “Marx’s teaching that for all time ideologies have been determined strictly causally by the economy”.24 This too is a theory with a very high claim to generality. It also must be able to explain the atomism of Ancient Greece and Early Christianity on socio-economic grounds. But the Social Democratic Party, so Zilsel, had no interest in the full generality of either, the theory of class struggle nor that of the relation of base and superstructure, for it represents a community of action—unlike Marxist theory. The latter “cannot renounce atomism, Echnaton and Early Christianity. One would not make for a good scientist if one were to declare that the theory of evolution holds for horses, but not for lions which are the kings among animals, after all.”25 For theorists, distant and outlandish cases are the most interesting, for “they confirm or contradict the whole theory”.26 Zilsel continued: I oppose the misuse of Marx’s theory of history precisely because I believe it to be correct in its most radical form. It is not yet properly tested and confirmed. It should be tested by confrontation with empirical history, it should not be fought against, or watered down or made unintelligible with epistemology. Turning it into a party dogma is detrimental to the Party and to the theory in equal measure.27
22 23 24 25 26 27
Ibid. p. 89 Ibid. Ibid. Ibid., p. 89-90. Ibid. p. 99 Ibid.
528
Elisabeth Nemeth
IV In 1926, one year before ”Asymmetry“, Zilsel published his compendious historical study The Origin of the Concept of Genius (Die Entstehung des Geniebegriffs; hereafter: Origin). Zilsel intended to follow up this volume, which dealt with the development of the concept of genius from Antiquity to the 16th century, with another dealing with the development up until the 20th century. This sequel was never written, probably partly because Zilsel’s interests shifted toward the investigation of the social roots of modern natural science. His work on that topic began to be published in 1939, after he had sought exile in the USA.28 These papers made Zilsel well-known as a pioneer of modern history of science, but unfortunately, due to his early death, the project remained a fragment. This makes Origin his opus magnum—even though he most certainly did not intend this. In any case, the problem fields which Zilsel made approachable in Origin form the background to his claim in 1927 that the critical investigation of the physical concepts of cause and natural law can help avoid fruitless discussions in the human sciences. One of the strengths of the book is its astonishing wealth of material from cultural history, but I cannot delve into this. Instead I must concentrate on some of the conceptual and methodological considerations which inform Zilsel’s research. In light of the debate about the materialist conception of history just considered, another strength of the book becomes apparent. Zilsel largely avoids the terminology of Marxist theory of history but in fact does precisely what in 1931 he recommended should be done. He sets out to test the materialist conception of history by means of empirical research. He does not speak of “superstructure” but states that the point lies in showing how ideas and ideals “are lawfully connected with historically caused transformations of society”.29 The class struggle too finds no mention, but instead the hierarchically ordered strata of society “whose up and down constitutes the Renaissance development”.30 Zilsel’s Origin is best understood as a first step in the large project of testing Marx’s theories of society and history. In Zilsel’s view, such a testing was not to be undertaken on the basis of the conception of science offered by dialectical materialism, but on the basis of the methodological reflection of the “for the time most complete empirical science”, theoretical physics. What this involved was made explicit by Zilsel in 1927, retrospectively as we now can see. Anyone familiar with physics would not dare to choose sides in the ordinary debates about the materialist conception of history. Such a person knows that the question at issue—“whether the economic or the cultural, religious and artistic phenomena connected with them are the dependent or the independent variables”—cannot be 28 Reprinted as Zilsel, The Social Origins of Modern Science, op. cit. 29 Zilsel, Die Entstehung des Geniebegriffs. Ein Beitrag zu Ideengeschichte der Antike und des Frühkapitalismus, Tübingen 1926 (trans. of German by present translator), p. 300. 30 Ibid.
Edgar Zilsel on Historical Laws
529
answered under all circumstances. She knows that there are laws in physics for which this presupposition does not hold. Accordingly, she is very careful in her use of the concept of causation—but does not reject it. Such a person would search for functional relations between socio-historical facts and try to specify their character. In doing so, she would bear in mind that while there are causal relations there are also functional dependencies for which the concept of cause if inapplicable. Only those researchers will be able to determine whether there are historical laws who are aware that the question of their nature cannot be answered once and for all. The lesson of modern physics consists precisely in realising that the meaning of the concepts of natural law and lawlikeness must be rethought ever and again, always starting from empirical problems. Yet critical reflection about the concept of natural law is not the only instance of “means of thought” that Zilsel transfered from physics to history. Mach’s epistemology left its traces also in the way in which Zilsel constituted his field of research. The concept of genius is conceived of as a “social structure” made up of different elements that can be investigated from different perspectives. Right at the beginning of his book, Zilsel pointed out that in an earlier book he had investigated with psychological methods the same “structure” (the concept of genius) that he was now investigating in its social contexts.31 That disciplines are nothing but differently focussed perspectives on the same basic elements is a very Machian idea. But so is the thought that historical reasearch may assume a role that may be compared to experimental research in physics: The investigation of a social structure under different social circumstances, as presented by the course of history, may be of similar service for gaining causal knowledge as the intentional alteration of conditions in a natural scientific experiment.32
The elements which Zilsel finds the "social structure" to be composed of are just as irritatingly heterogeneous as Mach's notorious list of elements in his Analysis of Sensations: "Colors, sounds, temperatures, pressures, spaces, times, and so forth."33 For Zilsel, the 19th and 20th century concept of genius combines the following elements: the opposition of rare genius and the mass of average people, the misunderstanding of genius by the world around them, the idea of a brotherhood of the most varied geniuses across the ages, the judgement of posterity, but also reflection, formalisation, independent idolisation, quasi-religious reverence.
31 This refers to Zilsel, Die Geniereligion, Vienna and Leipzig, 1918. See Krohn and Raven, op. cit.; Elisabeth Nemeth, "'Wir Zuschauer' und das 'Ideal der Sache'. Bemerkungen zu Edgar Zilsels 'Geniereligion'", in Friedrich Stadler (ed.), Bausteine wissenschaftlicher Weltauffassung, Vienna: Springer, 1997, pp. 157-178. 32 Zilsel, Entstehung, op. cit., p. 2; cf. p. 323. 33 Ernst Mach, Analyse der Empfindungen, Leipzig, 1886, trans. The Analysis of Sensations, La Salle: Open Court, 1897, p. 2.
These elements appear across time in different combinations, so that genius in Antiquity and the Renaissance appears different from genius in the Baroque or the 19th century. According to Zilsel it is the aim of historical investigations to tell us how "individual ideas and individual developments fit into the encompassing process of history, how they are lawfully connected with historically caused transformations of society".34 Note that Zilsel here speaks, on the one hand, of "historical causes" which effect social transformations and, on the other hand, of certain ideas and developments being "lawfully connected" with them. Zilsel seems to take it as established that certain social transformations can be explained causally. By contrast, Zilsel is much more careful when it comes to the relation between those "transformations" and the "ideas" connected with them. The question of what kind of lawful connection is at issue is left open. This does not mean, however, that Zilsel did not suspect causal relations to be involved. Recall his discussion paper from 1931. There he declared explicitly that he believed Marx's theory of history to be correct in its most radical form. But from his 1927 paper we know that in physics we can only think of laws as causal laws under specific circumstances. As for physicists, so for historians the task consists in discovering functional relations and establishing what kind of lawful relations obtain. (Is one of the variables open to human intervention, or neither, or both?) These questions can only be answered as part of an empirical research programme. What is clear in any case is where Zilsel wanted to direct his attention in order to establish the nature of the lawful connections involved: the rise and fall of hierarchically ordered social strata. "The development of the concept of genius, like every historical phenomenon, can only be lawfully understood and explained as part of the phenomenon of the opposition between decaying and youthfully rising strata of society."35 Here it is important to realise that the elements which the structure 'genius concept' consists of may appear at altogether different places in this opposition of ascending and descending social strata. Individual originality, for instance, is a part of the complex and appears historically in connection with the rising inventors and artists of the Renaissance as well as with the painters of mannerism. In the first case it is related to the idea of progress, in the latter to yearning for the past. For ideas are not organically grown body parts and can pass from the hands of a rising social group nearly unchanged into the hands of a sinking one. … Such transfers of ideas and mixtures of ideas can be found in the rising Christian communities of late Antiquity and will be met with often among advocates of ideals of the new and independent in the 17th century.36
34 Zilsel, Entstehung, op. cit., p. 300. 35 Ibid. 36 Ibid., p. 320.
Zilsel was aware that the discovery of laws was much more difficult in history than in physics. He compared his historical research with statistical mechanics. In statistical mechanics, which deals with even larger mass phenomena, … the search for laws is relatively simple, for there we are not interested in individual molecules but seek to grasp a mass phenomenon … in doing history we are searching for laws as if the theory of gases and the law of entropy were derived from the behaviour of an individual molecule. Yet it may not be wholly pointless to search for lawful explanations in history too, for we should find at least developmental tendencies, more or less clear precursors of laws.37
So far, Zilsel conceded, it had not been possible to provide more than accounts of individual phenomena with certain hypotheses attached to them.38 The goal of providing historical explanation had not yet been reached. To make progress, historical research should emulate the model of fixed star astronomy. For a long time that too was confined to the description of individual phenomena, but more recently it had been "transformed from fact-collecting science into a nomological science".39 The transformation of fixed star astronomy was brought about not by discussions of the peculiarity of the field and its research methods, but by coordinated division of labour according to systematic criteria, the investigation of all stars, either in particular sections of the sky or all over up to a certain degree, and their cataloguing and statistical analysis.40
In a similar fashion one could, e.g., split the Renaissance into five-year periods and, with division of labour, investigate all evidence (books, legal documents, pictures, buildings) in a systematic way and according to unified and consistent criteria. It seems that Zilsel thought that the statistical frequency of the coincidence of certain attitudes with certain social positions provided the basis for determining functional dependencies. Zilsel regarded his own statistical analyses of the group biographies of the Renaissance as a small step in this direction, but at the end of his book he noted that the lawful connections he had found in this way were not yet "real historical laws". These would only be discovered when one analysed "superficially similar historical processes in economically, racially and otherwise different cultures under the same criteria" and compared the results: Since conditions for doing history are just as unalterable as those for fixed star astronomy, making comparisons appears to be the only way of bringing to light so far unrecognised causal chains and extraneous disturbances of the incompletely closed historical systems.
37 Ibid., p. 321. 38 Ibid., p. 320. 39 Ibid., p. 321. 40 Ibid., pp. 321-322.
It would be strange indeed if such a comparison of Renaissance movements in Italy, China and Persia would not render visible any laws.41
Zilsel's later investigations concerning the origin of modern science were based on the same methodology.42 However, in these later works the concept of physical law was thematised in a new way. It continued to be relied upon for the clarification of concepts belonging to the human sciences. At the same time, however, that concept became itself an object of historical research. For the later Zilsel, the question of what it means to search for historical laws continued to possess aspects that had to be answered by philosophical-conceptual analysis starting from the natural sciences. Yet the question now also gained historical-sociological aspects that had to be investigated by empirical studies of the history of the concept of physical law. While Zilsel gave frequent hints about the gain that historiography can derive from the methodological and conceptual reflections of physics, he was less clear about whether, and if so what, physics was to gain from the historical-sociological investigation of its basic concepts. It must be left for another occasion to consider this issue.43
University of Vienna Institute for Philosophy Universitätsstraße 7 1010 Wien Austria [email protected]
41 Ibid., p. 323. 42 See Zilsel, Social Origins, op. cit. See also Nemeth, "Logical Empiricism and the History and Sociology of Science", in Alan Richardson and Thomas Uebel (eds.), The Cambridge Companion to Logical Empiricism, Cambridge: Cambridge University Press, 2007, pp. 293-297. 43 Many thanks to Thomas Uebel for translating this paper.
ERIC SCHLIESSER
“EVERY SYSTEM OF SCIENTIFIC THEORY INVOLVES PHILOSOPHICAL ASSUMPTIONS” (TALCOTT PARSONS).
THE SURPRISING WEBERIAN ROOTS TO MILTON FRIEDMAN’S METHODOLOGY
INTRODUCTION
The main point of this paper is to contribute to understanding Milton Friedman's 1953 "The Methodology of Positive Economics" (hereafter F1953), one of the most influential statements of economic methodology of the twentieth century, and, in doing so, help discern the non-trivial but complex role of philosophic ideas in the shaping of economic theorizing and economists' self-conception.1 It also aims to contribute to a better understanding of the theoretical origins of the so-called 'Chicago' school of economics. In this paper, I first present detailed textual evidence of George Stigler's familiarity with the early work of Talcott Parsons, the most important American translator and disseminator of Max Weber's ideas, who also helped create sociology as a distinct discipline in the United States.2 The Chicago-Parsons link is no surprise because historians have known that Frank Knight and Parsons corresponded, first about translating Weber and then about matters of mutual interest.3
1 All my references to Friedman are by page-number from Milton Friedman, Essays in Positive Economics, Chicago: The University of Chicago Press, 1953. For recent scholarship see Uskali Mäki (ed.), The Methodology of Positive Economics: Reflections on the Milton Friedman Legacy, Cambridge: Cambridge University Press, 2009.
2 On Parsons' theorizing Stephen Turner's essays are useful in discerning the relevant philosophic commitments and presuppositions: "The Strange Life and Hard Times of the Concept of a General Theory in Sociology: a short history of hope," in S. Seidman and D. Wagner (eds.), Postmodernism in Social Science, Oxford: Basil Blackwell, 1992, pp. 101-133; "Defining a Discipline: Sociology and its Philosophical Problems, from its classics to 1945," in S. Turner and M. Risjord (eds.), Handbook of the Philosophy of Science: Philosophy of Anthropology and Sociology, Elsevier, 2007, pp. 3-69; on Parsons' attempts at discipline-building (comparable to the efforts by Stigler in Chicago's business school), see William Buxton and Stephen Turner, "From Education to Expertise: Sociology as 'Profession,'" in T. Halliday and M. Janowitz (eds.), Sociology and its Publics: The Forms and Fates of Disciplinary Organization, Chicago: The University of Chicago Press, 1992, pp. 373-40. On Parsons' role in the history of economics, see Geoffrey M. Hodgson, How Economics Forgot History: The Problem of Historical Specificity in Social Science, London: Routledge, 2001, chapter 13.
Knight, who was a doctoral advisor to Stigler and teacher of Milton Friedman, was not merely the first American translator of Weber but remained keenly and, perhaps, increasingly interested in Weber throughout his life.4 I am not aware of any investigation of the Weberian influence on Knight's students.5 I show that Stigler praises Parsons' treatment of Alfred Marshall, who plays an outsized role in Friedman's self-conception of economics and economic theory.6 I also show that Stigler calls attention to the methodological similarity between Friedman and Parsons. Finally, I turn to F1953, and I show, first, that some of its most distinctive and philosophically interesting claims echo Parsons' treatment of methodological matters; second, that once alerted, one can note Weberian terminology in F1953.7 Three disclaimers about the argument of this paper: first, my approach does not remove all the confusions that people have discerned in Friedman's arguments in F1953.8
3 See Ross Emmett, "Frank Knight, Max Weber, Chicago Economics and Institutionalism," Max Weber Studies, 2006, pp. 101-119.
4 Ross Emmett tells me that Stigler was in Frank Knight's 1936 seminar on Max Weber. For the importance of Weber at The University of Chicago, see also Stephen P. Turner and Regis A. Factor, Max Weber and the Dispute over Reason and Value: A Study in Philosophy, Ethics, and Politics, London: Routledge & Kegan Paul, 1984.
5 Charles Parsons informs me (personal communication, June 3, 2010) that Talcott Parsons taught a class at The University of Chicago during the Summer term of 1937. It appears that neither George Stigler, who was appointed at Iowa State, nor Milton Friedman, who was at the National Resources Committee in Washington, would have overlapped with Parsons then. I thank Daniel Hammond and Ross Emmett for discussion. In so far as James Buchanan and his colleagues in the Virginia school (and public choice analysis generally) became ever more open to so-called 'Austrian' economics, there are more Weber-Chicago connections. See L.M. Lachmann, The Legacy of Max Weber: Three Essays, Berkeley: The Glendessary Press, 1971, which (incidentally) acknowledges the importance of Parsons' influence on the reception of Weber in the English-speaking world. I treat Milton Friedman and George Stigler as distinct and separate from Austrian economics.
6 Milton Friedman, "The Marshallian Demand Curve," in Essays in Positive Economics, op. cit. See also the epigraph to Milton Friedman and Anna Jacobson Schwartz, A Monetary History of the United States, National Bureau of Economic Research, 1963, which is also quoted in note 1 on p. 10 of Parsons' Structure of Social Action! See also Roger E. Backhouse, Bradley W. Bateman and Steven G. Medema, "The reception of Marshall in the United States," in T. Raffaelli, M. Dardi, G. Beccatini and K. Caldari (eds.), The Diffusion of Alfred Marshall's Thought, Cheltenham: Edward Elgar.
7 Building on a 1979 remark by Friedman, W.J. Frazer and L.A. Boland, "An Essay on the Foundations of Friedman's Methodology," American Economic Review, 73 (March 1983): 129-144, offer an account of Friedman's Popperianism. Friedman and Popper met through the Mont Pelerin Society in 1947. The argument of my paper suggests, however, that the methodological identification with Popper is a post facto construction.
Seeing Friedman as echoing themes from Parsons sensitizes one to what elsewhere I (unhelpfully) called Friedman's "neo-Kantian" understanding of assumptions in theorizing.9 Second, the writings of Parsons need not be the most important source for understanding F1953's arguments. More attention needs to be given to J. N. Keynes,10 whom Friedman cites in the opening lines of F195311 and who was much praised by Stigler in this period (see below). No doubt they enjoyed the irony of praising the father while criticizing the more famous son. Third, I use Parsons as a proxy for a kind of innovative Weberian social science. Even though there surely are non-trivial differences between Parsons and Weber, I see Parsons' The Structure of Social Action (first edition 1937; hereafter Structure)12 as constructing a Weberian program that is supposed to guide social science research. F1953 uses aspects of this program for its own ends.13
1. STIGLER'S INTEREST IN PARSONS
In an important short piece from 1943, George Stigler criticizes what soon became the dominant approach within professional economics; that approach combines sophisticated mathematical technique, a focus on revealed preference, and an understanding of economics (inspired by L. Robbins) as resource maximization under constraint. Near the end of his discussion, Stigler writes: it is sufficient for present purposes merely to refer to such well-known presentations as those of Durkheim in sociology and Gierke in politics. Talcott Parsons probably had economists in mind when he wrote: "For it is a fact that social existence depends to a large extent 8
9 10 11
12 13
See U. Mäki “Rhetoric at the expense of coherence: a reinterpretation of Milton Friedman’s methodology,” Research in the history of economic thought and methodology, 4, Samuels, W. J. (ed.), Greenwich, CT: JAI Press: 1986, pp. 127-43. Eric Schliesser (in press) “Friedman, Positive Economics, and the Chicago Boys,” in R. Emmett, ed., The Elgar Companion to Chicago Economics, Edwin Elgar. J. N. Keynes The Scope and Method of Political Economy, London, 1891. “In his admirable book on The Scope and Method of Political Economy John Neville Keynes distinguishes among “[1] a positive science … a body of systematized knowledge concerning what is; [2] a normative or regulative science … [,] a body of systematized knowledge discussing criteria of what ought to be …[, 3] an art … [,] a system of rules for the attainment of a given end”; Friedman comments that “confusion between them is common and has been the source of many mischievous errors”; and urges the importance of “recognizing a distinct positive science of political economy,” (F1953, 3). All my references are to page-numbers of Talcott Parsons The Structure of Social Action, New York: Free Press; 2nd edition, 1967-68. Weber’s methodology has been used to explain the continuity and discontinuities between neo-classical and behavioral economics very much in the spirit of the present article, see Erik Angner and George Loewenstein (manuscript) Foundations of Behavioral Economics.
on a moral consensus of its members and that the penalty of its too radical breakdown is social extinction. This fact is one which the type of liberal whose theoretical background is essentially utilitarian is all too apt to ignore – with unfortunate practical as well as theoretical consequences." At the level of economic policy, then, it is totally misleading to talk of ends as individual and random; they are fundamentally collective and organized. If this conclusion be accepted, and accept it we must, the economist may properly exceed the narrow confines of economic analysis. He may cultivate a second discipline, the determination of the ends of his society particularly relevant to economic policy. This discipline might be called, following J. N. Keynes, applied ethics.14
The American Economic Review was already the most important journal within economics. Stigler's article opens with a long epigraph from Aristotle's Ethics; Stigler then targets the new techniques developed by Samuelson, Hotelling, Lerner, Kaldor, and Hicks, who are in the midst of launching a formal revolution within economics.15 His argument is philosophical, not mathematical. In particular, Stigler argues that economists presuppose a moral and political consensus when they are doing policy science. Stigler—who is echoing his teacher Knight here16—takes for granted that the economic sphere is framed or constrained by political or social ends. It is on this point that Stigler cites Parsons approvingly.17 Thus, Stigler's argument leads to a distinction between pure economic analysis, in which ends are thought of as individual and random, and policy science (or applied ethics), where ends are unified; it is this distinction that drives him to accept Keynes' distinction between positive and normative science. Stigler's point is not that economists should avoid policy science. Rather he insists that its normative presuppositions ought to be different from those of pure economics. In his criticism of New Welfare economics Stigler argues for greater self-understanding on the part of economists about the essentially political nature of welfare economics when applied to societies. In context, Stigler's point is meant to warn against two tendencies: first, the tendency to import the representative agent
14 George J. Stigler “The New Welfare Economics,” The American Economic Review, Vol. 33, No. 2 (Jun., 1943), pp. 355-359, emphasis added. 15 R.E. Backhouse “The Transformation of US Economics, 1920-1960, viewed through a survey of journal articles,” in M.S. Morgan and M. Rutherford, Eds. From Interwar Pluralism to Postwar Neoclassicism, Durham, NC: Duke University Press (Annual Supplement to History of Political Economy, 30), 1998, pp. 85-107. 16 David M. Levy, and Sandra J. Peart “Stigler, George Joseph (1911–1991),” in S. Durlauf, ed., The New Palgrave Dictionary of Economics, 2nd Edition, 2008. 17 Stephen Stigler informs me that his father owned a copy of the first 1937 edition of Parsons’ Structure. Stigler writes, “He read it but with few notes. Inside the back cover he marked 3 page numbers: 392, 395, 248. (in that order). In addition there are marginal dashes on pages 232, 256, and 566,” (personal communication, March 31, 2010). All six pages concern the common moral values/ends of a political community; they provide the background to the claim with which Stigler ends the 1943 American Economic Review article.
into the "pure" part of economic analysis; second, the tendency to forget the contentious nature of assuming that that society's ends are unified.18 Stigler's argument presupposes, of course, that there is a self-sufficient a-political domain of pure economics: Friedman's "positive" economics. The reference to Parsons reminds us that at the theoretical origins of 'Chicago,' the division of labor within economics is justified on social theoretical grounds. It is not the only important reference to Parsons in Stigler during the 1940s. In 1949 Stigler gave five lectures at the LSE that were published as a separate booklet.19 Near the end of the second lecture Stigler remarks, "I wish to close by offering an estimate of the net contribution of the attempt to construct a theory of monopolistic competition. Before undertaking this appraisal, however, it is necessary to set forth certain methodological principles," (23). Stigler then writes: The purpose of the study of economics is to permit us to make predictions about the behavior of economic phenomena under specified conditions. The sole test of the usefulness of an economic theory is the concordance between its predictions and the observable course of events. Often a theory is criticized or rejected because its assumptions are "unrealistic." Granting for a moment that this charge has meaning, it burdens theory with an additional function, that of description. This is a most unreasonable burden to place upon theory: the role of description is to particularize, while the role of theory is to generalize—to disregard an infinite number of differences and capture the important common element in different phenomena. (Five Lectures, 23).
Stigler adds the following footnote: “The present interpretation of these principles is due to Professor Milton Friedman; see Talcott Parsons, The Structure of Social Action.” I am unaware of any attention to this footnote in the enormous literature on F1953. It is no surprise that Stigler would mention Friedman here. Compare him on “positive economics”: Its task is to provide a system of generalizations that can be used to make correct predictions about the consequences of any change in circumstances. Its performance is to be judged by the precision, scope, and conformity with experience of the predictions it yields. … In so far as a theory can be said to have ‘assumptions’ at all, and in so far as their ‘realism’ can be judged independently of the validity of predictions, the relation between the 18 In F1953 Friedman assumes that in advanced societies values have converged. For the significant afterlife of the issue, see Schliesser “Friedman, Positive Economics, and the Chicago Boys,” loc cit., and Ross Emmett “Realism and Relevance in the Economics of a Free Society.” Journal of Economic Methodology 16.3 (June 2009), pp. 341-50. 19 George J. Stigler, Five Lectures on Economics Problems, London: Longmans, Green, and Co, 1949. It had non-trivial policy impact on debates over monopoly. Eric Schliesser “Inventing Paradigms, Monopoly, Methodology, and Mythology at ‘Chicago’: Nutter and Stigler,” (forthcoming in) Building Chicago Economics: New Perspectives on the History of America’s Most Powerful Economics Program, edited by Van Horn, Rob, Philip Mirowski, and Thomas Stapleford, Cambridge: Cambridge University Press.
significance of a theory and the ‘realism’ of its ‘assumptions’ is almost the opposite of that suggested by the view under criticism. Truly important and significant hypotheses will be found to have ‘assumptions’ that are wildly inaccurate descriptive representations of reality, and, in general, the more significant the theory, the more unrealistic the assumptions (in this sense). (F1953, 4 and 14)
Stigler's views are very similar to the ones made famous in F1953 (where Stigler is thanked in the first footnote and the Five Lectures is cited in note 38). At the time of the Five Lectures Stigler was at Columbia while Friedman had just returned to Chicago. Friedman and Stigler knew each other from Chicago in the 30s, were colleagues in the Statistical Research Group at Columbia during the war, and had shared an office at Minnesota for a year.20 We know from their correspondence that in this period Stigler and Friedman were intensely discussing methodological matters.21 Stigler's familiarity with Parsons' Structure should not come as a surprise given Knight's connections with Parsons.22 In his first publication (a reworked version of his doctoral dissertation), Stigler writes in his study of Alfred Marshall, "No attempt will be made to discuss the numerous commentaries [on Marshall]… there is no need to reproduce Parsons' path-breaking analysis of Marshall's philosophical preconceptions and their influence on his doctrines."23 While one can never rule out a certain amount of mischief on the part of Stigler (who may have enjoyed linking his friend's developing views to a very long work on social theory), I treat the note as a sincere expression of the fact that Stigler's views on methodology were inspired by Friedman. I treat the reference to Parsons as indicative of their shared source.24
20 Milton Friedman, "George Stigler: A Personal Reminiscence," The Journal of Political Economy, Vol. 101, No. 5 (Oct., 1993), pp. 768-773. 21 J. Daniel Hammond and Claire H. Hammond, Making Chicago Price Theory: Friedman-Stigler Correspondence 1945-1957, London: Routledge, 2006. 22 In his 1942 textbook The Theory of Competitive Price (New York: Macmillan), p. 20, Stigler refers to Parsons' Structure as "recommended readings" at the end of his introductory methodological chapter. Other recommendations are to works by Knight, Robbins, J.N. Keynes, and Cohen & Nagel's Introduction to Logic and Scientific Method. I thank David Levy for scanning relevant pages on my behalf. 23 George J. Stigler, Production and Distribution Theories: The Formative Period, New York: Macmillan, 1941, pp. 61-2. Stigler cites Parsons' early articles on Marshall from the Quarterly Journal of Economics, not Structure. 24 Earlier in the Five Lectures, in his treatment of Marshall, Stigler made a point of citing the relevant chapter of Structure, not the earlier journal articles on which they are based (5).
2. PARSONS AND CHICAGO
Parsons' Structure is a major study of Marshall, Pareto, Durkheim and Weber. It employs intellectual history as a form of theorizing. As Parsons writes in the "Preface" to the second edition (1949), "The Structure of Social Action analyzes a process of convergent theoretical development which constituted a major revolution in the scientific analysis of social phenomena," (xvi). Parsons' role is not that of a passive observer of the revolution. "Only very gradually did it become evident that in the treatment of these problems, even from such diverse points of view, there was involved a common conceptual scheme" (preface to the first edition, xxii). The common conceptual scheme was not clear to the participants; it required Parsons' intellectual labor to be made visible.25 So, even when he is describing others, he is often also speaking in his own voice. My argument does not require that Friedman read Parsons,26 but in what follows I collect some of Parsons' observations and compare them with some notable methodological and conceptual passages in F1953. I focus on three aspects distinctive of the methodology of F1953. While a few of these offer remarkable similarities, none is decisive for my overall argument. However, as a group they capture important features of the (indirect) influence of Parsons that I identify in F1953. First, theories only produce partial visions. Parsons writes, "The structure of the conceptual scheme itself inevitably focuses interest on a limited range of such empirical facts. These may be thought of as a "spot" in the vast encircling darkness, brightly illuminated as by a searchlight. The point is what lies outside the spot is not really "seen" until the searchlight moves …" (Structure, 16) Without the searchlight metaphor Friedman claims the same: "A theory is the way we perceive 'facts,' and we cannot perceive 'facts' without a theory." (34; the scare quotes are Friedman's.) 25 Cf. Parsons on the relationship between economics and sociology: "The inescapable conclusion from this fact is … there is no place for a logically separate body of principles of economics. Economics must be merely the application to a particular body of concrete phenomena of the general principles necessary for understanding human conduct. If any single name is applicable to this body of theory it is "encyclopedic sociology," the synthesis of all scientific theory relevant to the concrete facts of human behavior in society. Economics then becomes applied sociology," (Structure, 173). See also: "It commits him [Souter], like his master Alfred Marshall, to an "economic imperialism" in the bad sense of suppressing the "rights" of neighboring sciences to an "independent" existence in the society of the sciences … It will be necessary to consider the place of economics in the whole system of the sciences of action," (T. Parsons, "Some Reflections on 'The Nature and Significance of Economics'," Quarterly Journal of Economics, 1943, p. 522; this may be the first use of "economic imperialism.") 26 Dan Hammond writes "There is a reference [to Parsons] in [Friedman's] notes from Viner's Econ 303, 'Modern Tendencies in Economics': "'Capitalism' in Recent German Literature: Sombart and Weber," JPE Dec. 1928," (personal communication, April 20, 2010).
Friedman's way of putting it also highlights that there are no 'facts' independent from theory. (Parsons writes in his discussion of Weber, "Observation is always in terms of a conceptual scheme," 597.) It connects with Friedman's larger methodological claim, which he often attributes to Alfred Marshall, who "took the world as it is; he sought to construct an "engine" to analyze it, not a photographic reproduction of it,"27 (35; recall Stigler's view that the role of theory is not description.) Second, Parsons and Friedman offer a similar two-fold conceptual structure of theories. Parsons writes approvingly that according to Pareto: "science is best characterized by the term 'logico-experimental.'" Parsons explains: That is to say, there are two essential elements involved: logical reasoning and observation of 'fact.' Logical reasoning is by itself incapable of yielding necessary results beyond tautologies, but none the less it is an essential element. It is thought of, however, as subordinate to the other element, that of fact, experimental or observed. (181; scare-quotes in Parsons).
Friedman’s conceptual structure also stresses the tautological aspect of one part of theory and the empirical of the other part: theory is, in general, a complex intermixture of two elements. In part, it is a ‘language’ designed to promote ‘systematic and organized methods of reasoning.’ In part, it is a body of substantive hypotheses designed to abstract essential features of complex reality. Viewed as a language, theory has no substantive content; it is a set of tautologies. Its function is to serve as a filing system organizing empirical material and facilitating our understanding of it; and the criteria by which it is to be judged are appropriate to a filing system. … Factual evidence alone can show whether the categories of the ‘analytical filing system’ have a meaningful empirical counterpart, that is, whether they are useful in analyzing particular class of concrete problems (F1953, 7).
Friedman has switched from the language of subordination (between the tautological and the empirical part) to the more neutral language of serving a function. Moreover, Friedman introduces an important function of theory characteristic of his more general outlook: it makes possible the analysis of concrete problems.28 Third, this last point is connected to a very important example in Friedman's argument: his treatment of Galileo's law of fall, which runs through F1953. We may start with a simple physical example, the law of falling bodies. It is an accepted hypothesis that the acceleration of a body dropped in a vacuum is a constant … and is independent of the shape of the body, the manner of dropping it, etc. …
27 See also Friedman's "The Marshallian Demand Curve," 469, 470, 490, and Parsons' treatment of Pareto: "Pareto states that 'it is impossible to know a concrete phenomenon in all its details.' It is no valid criticism of a theory that it does not suffice fully to explain a concrete phenomenon; on the contrary it is a virtue," (Structure, 183). 28 This fits with his larger claim during this period that many of the relevant details of the component parts of the economy are not known yet. For details, see Schliesser, "Inventing Paradigms, Monopoly, Methodology, and Mythology at 'Chicago'," op. cit.
The application of this formula to a compact ball dropped from the roof of a building is equivalent to saying that a ball so dropped behaves as if it were falling in a vacuum. … Suppose, however, that a feather is dropped instead of a compact ball. The formula then gives wildly inaccurate results. (F1953, 16-17ff; see also 24, 36).29
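For concreteness, the hypothesis Friedman invokes is the familiar Galilean law of fall: a body released from rest in a vacuum covers a distance s = ½gt² after t seconds, where g is the constant acceleration (roughly 9.8 m/s² at the Earth's surface). The numerical value is the standard textbook figure rather than anything given in the quoted passage; what matters for Friedman's point is only that a feather dropped in air violates the vacuum idealization, which is exactly why the formula then "gives wildly inaccurate results".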
Friedman was not the first to employ the example in this fashion. According to Parsons, Pareto resorts to the methodological device of analyzing an abstract society, which is exactly analogous to the device of treating bodies as if they fell in a vacuum. The situation is closely analogous to that in which mechanics would be if all bodies on this earth were of a density relative to that of the atmosphere approximating that of feathers. The law of gravitation could scarcely be arrived at by a process of empirical generalization from their actual behavior in nature, or by dropping them from high places. But this would be no reason why the law of gravitation would fail to hold in such a world. … Scientific progress is a matter of successive approximation. Therefore the meanings of the symbols employed in scientific theories are never fully adequate expressions of the aspects of the concrete phenomena they attempt to formulate. (216).
From Parsons' account it is not obvious whether fall in a vacuum is the example that Pareto uses (or merely a helpful illustration by Parsons of Pareto's willingness to treat a society in abstraction).30 Parsons turns Pareto's use of abstraction into a defense of successive approximation and an attack on crude empiricism. In a world where all bodies were of a density relative to that of the atmosphere approximating that of feathers, the law of fall would seem to presuppose wildly unrealistic assumptions. Yet it would still be true. Parsons' treatment of Pareto shows that when dealing with messy empirical details through abstraction or as-if modeling one can simultaneously hold that one's assumptions are unrealistic and still think one's theory is appropriate in the analysis of a concrete example. This is a theme that runs through Parsons' book. In fact, he seems to identify it as a key factor in theoretical success: Utterly dependent logically on this "erroneous" premise [i.e., identity of interests]31 there grew up what is perhaps the most highly developed theoretical system in the social sciences [utilitarianism] with correct results—within certain limitations. 29 For analysis see Eric Schliesser, "Galilean Reflections on Milton Friedman's 'Methodology of Positive Economics,' with Thoughts on Vernon Smith's 'Economics in the Laboratory,'" Philosophy of the Social Sciences, Vol. 35, No. 1, 2005, pp. 50-74. See also G. Stigler, The Theory of Competitive Price, op. cit., p. 4. 30 It is in Vilfredo Pareto, "The New Theories of Economics," Journal of Political Economy, Vol. 5, No. 4 (Sep., 1897), pp. 485-502. Pareto is defending the use of mathematics in the study of humans. Friedman certainly knew his Pareto, see note 13 in "Lerner on the Economics of Control," (317). 31 Recall my treatment above of Stigler's use of Parsons in AER 1943.
This fact may serve as a lesson to those who are overly puristic in their scientific methodology. Perhaps it is not always wise to discard even methodologically objectionable elements so long as they serve a useful scientific function, unless one has something better to substitute, (101).32 Finally, so far I have called attention to some similarities between Parsons' Structure and F1953. These similarities are no proof either of Friedman's having read Parsons or of his familiarity with Weberian thought, despite the importance of Weber to Frank Knight and Stigler's familiarity with Parsons. The similarities suggest that Stigler was right to call attention to the connection in Five Lectures. Upon re-reading F1953 in light of my argument, one must find the following oft-ignored passage striking: The abstract model corresponding to this hypothesis contains two "ideal" types of firms: atomistically competitive firms, grouped into industries, and monopolistic firms. A firm is competitive if the demand curve for its output is infinitely elastic with respect to its own price for some price and all outputs, given the prices charged by all other firms; it belongs to an "industry" defined as a group of firms producing a single "product" … A firm is monopolistic if the demand curve for its output is not infinitely elastic at some price for all outputs.29 … As always, the hypothesis as a whole consists not only of this abstract model and its ideal types but also of a set of rules, mostly implicit and suggested by example, for identifying actual firms with one or the other ideal type and for classifying firms into industries. The ideal types are not intended to be descriptive; they are designed to isolate the features that are crucial for a particular problem. (35).
Finally, so far I have called attention to some similarities between Parsons’ Structure and F1953. These similarities identified between Structure and F1953 are no proof either of Friedman’s reading of Parsons or his familiarity with Weberian thought, despite the importance of Weber to Frank Knight and Stigler’s familiarity with Parsons. The similarities suggest that Stigler was right to call attention to the connection in Five Lectures. Upon re-reading F1953 in light of my argument, one must find the following oft-ignored passage striking: The abstract model corresponding to this hypothesis contains two “ideal” types of firms: atomistically competitive firms, grouped into industries, and monopolistic firms. A firm is competitive if the demand curve for its output is infinitely elastic with respect to its own price for some price and all outputs, given the prices charged by all other firms; it belongs to an “industry” defined as a group of firms producing a single “product” … A firm is monopolistic if the demand curve for its output is not infinitely elastic at some price for all outputs.29 … As always, the hypothesis as a whole consists not only of this abstract model and its ideal types but also of a set of rules, mostly implicit and suggested by example, for identifying actual firms with one or the other ideal type and for classifying firms into industries. The ideal types are not intended to be descriptive; they are designed to isolate the features that are crucial for a particular problem. (35).
It is accompanied by the following footnote: 29. This ideal type can be divided into two types: the oligopolistic firm, if the demand curve for its output is infinitely elastic at some price for some but not all outputs; the monopolistic firm proper, if the demand curve is nowhere infinitely elastic.
This is the passage in which Friedman explains how theory is an "engine" for research in Marshall. Here one can see Friedman casually employing the very Weberian language of "ideal types" and explaining their function in Weberian terms (cf. Structure, 601ff). Weber's theory of ideal types is often associated with Weber's program of verstehen. This is also emphasized in Parsons' Structure (583ff). One might think that Friedman's and Stigler's (recall Five Lectures, 23) methodological focus on predictions runs counter to Weberian understanding. This is mistaken for, first, Parsons introduces verstehen in the context of emphasizing Weber's claim that "the natural and social sciences are in the same situation with respect to the standard so often applied, predictability … predictability is always relative to the extent of abstract generalization" (Structure, 582). 32 On Friedman's realism, see Kevin Hoover, "Milton Friedman's Stance: The Methodology of Causal Realism," (February 5, 2004). Available at SSRN: http://ssrn.com/abstract=902062. See also Friedman's letter to Stigler (10/4/1948, pp. 91-2 in Hammond & Hammond, op. cit.).
So, in Parsons' account of Weber, prediction and verstehen go hand in hand. Second, recall that in F1953 theory's "function is to serve as a filing system organizing empirical material and facilitating our understanding of it" (7). This is no isolated occurrence. In fact, in defending his focus on as if theorizing and predictability, Friedman writes, A meaningful scientific hypothesis or theory typically asserts that certain forces are, and other forces are not, important in understanding a particular class of phenomena. It is frequently convenient to present such a hypothesis by stating that the phenomena it is desired to predict behave in the world of observation as if they occurred in a hypothetical and highly simplified world containing only the forces that the hypothesis asserts to be important. (40).
While Friedman seems to have no interest in the normative orientation of action (which is crucial to Parsons’ Weber, cf. 602, 615, etc) and he does not wish to make economics into an interpretive science, the peculiar mix of i) value neutrality of research within a conceptual scheme; ii) the unreality of assumptions; iii) the importance of predictability; iv) the methodological symmetry between natural and social science; v) the importance of understanding concrete phenomena, all these can be traced via Stigler and Parsons to Weberian roots.33
Ghent University Philosophy and Moral Sciences Blandijnberg 2 9000, Ghent Belgium [email protected]
33 I thank Erik Angner, Ross Emmett, Dan Hammond, David Levy, Deirdre N. McCloskey, Ali Khan, Steven Medema, Charles Parsons, David Teira, Stephen Turner, Thomas Uebel, Richard Wagner for comments on earlier drafts. I thank Roger Backhouse for suggestions in guiding this research. I am very grateful for Stephen Stigler’s aid in this research, including granting me access to George Stigler’s papers.
INDEX OF NAMES Not included are footnotes, figures, tables, notes and references. Abrahamsen, A. 127 Achinstein, P. 269, 271, 296, 297 Amonophis IV-Echnaton 527 Anastopoulos, C. 421 Arbuthnot, J. 457 Aristotle 281, 282, 396, 408, 536 Arp, R. 228, 229 Arthur, R. 379, 381, 384-387, 390, 392, 393 Austin, J. L. 271 Ayala, F. J. 211 Bacharach, M. 315 Bayes, T. 56, 57, 508 Bechtel, W. 127 Bell, J. S. 425-438, 442, 444, 446, 449 Belnap, N. 112, 113, 118, 119, 399, 401 Bergia, S. 413-415 Bernoulli, J. 454, 459 Bernoulli, N. 457 Bertalanffy, L. 257 Bogen, J. 127 Boghossian, P. 27, 28 Boltzmann, L. 472, 522 Bondi, H. 415 BonJour, L. 38 Boole, G. 59 Born, M. 508 Boumans, M. 7 Bovens, L. 41, 45 Boyd, R. 11, 13, 15, 18 Braithwaite, R. B. 20, 25 Brandom, R. 248 Braudel, F. 285, 287 Bromet, E. 125, 129 Bunge, M. 255, 256 Busch, J. 24 Buss, L. 232, 233 Butterfield, J. 362, 387, 393 Caesar 294 Campaner, R. 137-139, 165 Carnap, R. 53-64, 66, 73, 78, 381, 477486, 499, 503, 505, 506, 508, 509, 522
Cartwright, N. 154, 300 Carus, A. W. 505 Casey, J. 356 Cassirer, E. 266, 465 Cauchy, A.-L. 354 Caulton, A. 363 Cavalcanti, E. 449 Collingwood, R. 294 Craver, C. 127, 128, 137, 142, 145 Darden, L. 127, 128, 137 Darwin, C. 158, 203-205, 211, 212, 214, 217-219, 224-227, 230 Davidson, D. 243, 294 De Moivre, A. 454 Descartes, R. 109, 241 de Sitter, W. 413, 414 De Vries, H. 225, 226, 230-234 Dieks, D. 7, 384 Dilthey, W. 242, 264, 285, 289, 290, 291 Disraeli, B. 294 Dobzhansky, T. 206, 215 Dollo, L. 525 Douven, I. 27 Dray, W. 287, 294 Duhem, P. 12, 13, 21 Dummett, M. 27 Durkheim, E. 157, 288, 535, 539 Eddington, A. S. 397 Ehlers, J. 413, 414, 417 Ehrenfest, P. 372, 374 Einstein, A. 256, 381, 385, 387, 388, 413, 508 Elles, E. 40 Ellis, G. F. R. 414, 417, 418, 422 Elster, J. 273 Esfeld, M. 354 Euler, L. 348, 354 Feyerabend, P. 90, 488, 497 Feynman, R. P. 475 Fine, A. 19, 20, 14 Fine, K. 110 Finetti, B. de 499, 506-508 Fish, M. H. 499
546 Fisher, R. A. 161, 426, 439, 441 Fitelson, B. 40, 58 Fodor, J. 215, 301 Fontana, W. 232, 233 Forrester, J. 155, 156 Fraassen, B. van 13, 17, 19, 83, 116, 141, 150, 269, 299, 510 Frank, P. 466, 470, 473 Friedman, M. 330, 465, 533-535, 537, 538-543 Gadamer, H.-G. 265, 285 Galileo 508 Giere, R. 301-303 Gillies, D. 58, 79 Ginzburg, C. 294 Glennan, S. 128-130, 134, 137 Glymour, C. 44 Goenner, H. 413, 415 Goldman 24 Gonzalez, W. J. 7 Gonzalo, O. 355, 363 Good, I. J. 58, 60 Gould, S. J. 228, 229 Graßhoff, G. 168 Graunt, J. 457 Greenland, S. 129, 160 Grice, H. P. 243 Grünbaum, A. 86 Gurtin, M. 355 Habermas, J. 255 Hacking, I. 501 Halbwachs, M. 288 Harman, G. 28 Hart, H. L. A. 244 Hartmann, S. 7, 41, 45 Hasek, J. 203 Hawking, S. 417 Healey, R. 354 Hegel, G. W. F. 289 Heisenberg, W. 470, 474, 508 Hempel, C. G. 243, 269, 293, 294, 296, 300, 461 Hempel, K. 287 Henseleit, K. 169, 177-179 Henson, J. 449 Herder, J. G. 456 Hicks, J. 536 Hill, B. 153, 157, 160, 163-165
Index of Names Hintikka, J. 78 Hitchcock, C. 130, 133 Hobbes, T. 264 Höfler, M. 129 Hogben, L. 218 Holland, P. 354 Holmes, O. W. 499 Hotelling, H. 536 Howson, C. 13, 61 Hubble, E. 413 Hughes, T. 355 Hume, D. 65, 159, 160, 171, 474, 508 Husserl, E. 264, 383 Huxley, J. 206 Iranzo, V. 30 Jacob, F. 207 James, W. 383, 499, 500, 507 Jankélévitsch, V. 283 Jarrett, J. P. 447 Jeffrey, R. 486, 510 Jeffreys, H. 58 Johnson, W. E. 59, 78 Kadane, J. 303 Kaldor, N. 536 Kant, I. 259, 262, 264, 453-463, 466, 467, 472, 521 Kaufman, J. S. 129 Kellogg, O. 355 Kemeny, J. G. 58, 65, 66, 485 Kepler, J. 524 Keynes, J. M. 54, 56, 63, 505 Keynes, J. N. 535, 536 Kirchhoff, G. 525 Kitcher, P. 299 Knight, F. 533, 534, 536, 542 Kolmogorov, A. N. 55 Krebs, H. 169-171, 179, 180 Kries, J. v. 468 Krüger, L. 255 Kubiński, T. 87 Kuhn, T. 90, 91, 465 Kuipers, T. A. F. 61, 79, 87 Lakatos, I. 91, 487-489, 496-498 Lamarck, J.-B. 212 Landsman, N. P. 421, 422 Laplace, P.-S. 467 Lash, T. 129 Laudan, L. 20
Index of Names Leeuwenhoek, A. van 158 Lehmkuhl, D. 349 Leibniz, G. W. 521 Lepenies, W. 258 Lerner, A. 536 Levi, I. 76, 77 Lewin, K. 522 Lewis, C. I. 499, 505 Lewis, D. 144, 149, 244, 351, 353, 354 Lewontin, R. 212, 214, 215 Lipton, P. 83-85, 90 Lovejoy, A. 499 Mach, E. 469, 522, 526, 529 Machamer, P. 127, 128, 137 Mackie, J. 171, 173 Marsden, J. 355 Marshall, A. 534, 538-540, 542 Marx, K. 527, 528, 530 May, M. 168, 173 Mayr, E. 183-187, 199, 206-208, 219, 230, 231, 234 Mazzoni, L. 413-415 McCall, S. 111 McLaughlin, P. 224, 226-228, 230, 231, 234 Menzies, P. 302 Mill, J. S. 168, 171, 173 Miller, D. 509 Miller, K. 318 Mises, R. v. 466, 473, 504, 521 Misner, C. 417 Molière 19 Morabia, A. 125, 129 Morgan, T. H. 205-208, 210, 217, 218, 226, 230, 231 Morris, C. 477, 506 Müller, T. 118-120, 399-401, 403 Myrvold, W. 42, 350, 354, 363 Nagel, E. 183, 185-187, 190, 199, 499, 502 Narlikar, J. V. 415, 416, 418, 419 Neander, K. 209, 216 Neisser, U. 259 Nencki, M. 169 Neurath, O. 487, 491, 493, 495, 496, 498, 521 Newton, I. 256, 348, 478 Niiniluoto, I. 78
547 Nix, C. 61, 62 Norsen, T. 425, 426, 428-436, 446, 448, 449 Nozick, R. 215 Oppenheim, P. 65 Parascandola, M. 129 Pareto, V. 539, 541 Paris, J. B. 59, 61, 62, 78 Parsons, T. 533-543 Pearl, J. 154, 160 Peebles, P. 417, 419 Peirce, C. S. 264, 499-502, 504, 507, 509 Placek, T. 399, 401 Poincaré, H. 11, 13, 21, 468 Poole, C. S. 129 Popper, K. 16, 168, 181, 262, 487-490, 492, 494, 496, 498, 501, 502 Porter, T. 501 Prior, A. 110 Psillos, S. 11, 13-15, 17-21 Putnam, H. 11, 13, 14, 18, 109, 395 Quine, W. V. O. 27, 243, 465, 511 Ramsey, F. 499, 506-508 Raychaudhuri, A. K. 415 Reichenbach, H. 16, 465-475, 499, 502506, 508, 509, 522 Rescher, N. 323, 324 Richardson, R. 127 Ricœur, P. 281, 283-288, 290 Rietdijk, C. W. 395 Robbins, L. 535 Robertson, H. P. 412-414, 416, 417 Rose, G. 158 Rosenberg, A. 228, 229 Rothman, K. 129 Rothman, K. J. 160 Rugh, S. E. 411, 412, 420 Rumelhart, D. E. 259 Russell, B. 511-520 Ryckman, T. A. 465 Ryle, G. 243 Salmon, W. 65, 66, 127, 128, 299, 509 Samuelson, P. 536 Savitt, S. 380, 381, 384-386, 388, 389, 392-394 Scheines, R. 44 Schleiermacher, F. 264, 284, 285, 290 Schlick, M. 465, 466, 468, 472, 473, 522
548 Schmidt, J. 362 Scholz, E. 422 Schrenk, M. 242 Schultzen, O. 169 Schum, D. 303 Schurz, G. 245 Schwartz, S. 125, 129 Seevinck, M. P. 447, 449 Seidenfeld, T. 70 Sen, A. 320 Shimony, A. 447 Shogenji, T. 37, 39-41 Simon, H. A. 322-326, 328-330, 334, 335, 342, 343 Simpson, G. G. 185, 207 Sirtes, D. 141-143, 145-150 Smith, S. 355 Snow, C. P. 254, 255, 264 Snow, J. 158, 160 Sober, E. 209, 210, 212-216, 218, 220 Spinoza, B. 521 Spirtes, P. 44 Spohn, W. 243 Stacey, R. 343 Stadler, F. 7 Stein, H. 383 Stigler, G. 533-538, 540, 542, 543 Strawson, P. F. 288 Stuart, A. 355, 363 Suppes, P. 509 Susser, E. 125, 129 Süssmilch, J. P. 458 Tarski, A. 477 Teller, P. 354 Thorne, K. 417 Toulmin, S. 295 Truesdell, C. 349, 355 Tuomela, R. 288 Uebel, T. 7 Uffink, J. 447, 449 Urbach, P. 61 Vaihinger, H. 505 Vailati, G. 508 Vico, G. 264 Vries, H. de 205, 208, 217 Wald, R. M. 417, 418 Wallace, A. R. 226 Walsh, D. M. 216, 217
Index of Names Walsh, W. H. 293 Warburg, O. 169 Watkins, J. W. N. 496 Weber, M. 7, 287, 533-535, 539, 540, 542, 543 Weed, D. 129 Weinberg, S. 418 Weiner, M. 118 Weingart, P. 257 Westman, H. 449 Weyl, H. 109, 413-415, 465, 522 Wheeler, G. 38, 42, 43 Wheeler, J. A. 417 Whewell, W. 48, 293, 296 White, H. 287 Whitrow, G. J. 396 Wiener, P. 500 Wigmore, J. 295 Williamson, J. 55, 67, 69, 70, 78 Wilson, M. 348, 356, 363 Windelband, W. 242 Wiśniewski, A. 87 Wittgenstein, L. 64, 264, 482, 511 Woodward, J. 128, 130, 132, 133, 137139, 144, 146, 147, 149 Worrall, J. 33 Wright, G. H. v. 113, 243, 287 Wroński, L. 401 Wylie, A. 295 Zahar, E. 495 Zilsel, E. 521-532 Zinkernagel, H. 411, 412, 420