AN IDEAL MODEL FOR THE GROWTH OF KNOWLEDGE IN RESEARCH PROGRAMS*

AHARON KANTOROVICH†
Tel-Aviv University

In this paper a model is presented for the growth of knowledge in a dynamic scientific system, a system which is in some respects an idealization of a Lakatosian research program. The kinematics of the system is described in terms of two probabilistic variables, one of which is related to the evolution of its theoretical component and the other to the growth of the empirical component. It is shown that when the empirical growth is faster than the theoretical growth the posterior probability of the theoretical component increases. Thus, empirical progressiveness of a research program, as explicated in this model, is accompanied by an increase in the degree of confirmation. In such a case the system grows in a Popperian-like spirit, while learning from experience in a Bayesian manner.

* Received May, 1977; revised November, 1977.
† I wish to express my thanks to the University of Melbourne for its Fellowship support during the period in which the first version of this paper was written. The first version of this paper was read at the Annual Conference of the Australasian Association for the History and Philosophy of Science, in August 1974. I would like to thank Harold Lindman, Manfred von Thun and the referees of Philosophy of Science for their valuable comments.

Philosophy of Science, 45 (1978) pp. 250-272. Copyright © 1978 by the Philosophy of Science Association.
1. Bayesianism vs. Popperianism; Probability vs. Content. Bayesianism and Popperianism are generally held to be totally opposed philosophies of science. In this article an attempt will be made to construct an ideal model for the growth of scientific knowledge which will incorporate some of the methodological implications of both schools as limit cases. The main issue between the two schools concerns the status of confirmation in science; Bayesians give it a central place in their methodological schemes whereas Popperians deny its validity on logical and factual grounds. The Bayesian model of confirmation is concerned with probabilities of propositions, which are interpreted as rational degrees of belief. Its probability kinematics, which describes how probabilities change following some new evidence, is based on conditionalization of the probabilities upon the new evidence. There are a variety of ways in which this procedure is described in the current literature. Although our model will not be particularly sensitive to the differences between the various versions, I find it convenient to adopt, for the sake of presenting the model and its background,
the approach which was suggested, for example, by Ian Hacking [3]. In this version, conditionalization can be described as follows. If the initial or prior probability of an hypothesis H at time t0 is P0(H) and a new piece of empirical evidence E is acquired at time t1, such that P1(E) = 1, then according to the Bayesian model P1(H) = P0(H|E), where P0(H|E) =df P0(H & E)/P0(E). The fundamental entity here is the probability P0(A) of a proposition A, given the background information B0 which is accepted as evidence at t0, whereas the conditional probability is a defined notion (see [3] p. 314). The assumption is that when the only reason for changing the degrees of belief is the observation or the acceptance of E, a rational change should be represented by conditionalization upon E. This is what Hacking calls the "dynamic assumption" of the Bayesian model (ibid.). The personalist who makes this assumption (not always explicitly) becomes a Bayesian, and this means that his way of learning from experience is according to the following formula (from now on the time subscripts will be omitted from the probability functions at t0):

(1) P1(H) = P(H|E) = P(H)P(E|H)/P(E),
where the first equality expresses the dynamic assumption while the second is Bayes' theorem. According to this we should be able to calculate already at t0 what our degree of belief in H will be if E turns out to be true, or when E is accepted. Only if we adopt an approach such as that of Keynes [5] or Carnap (in his earlier views: [1]), which attributes a unique probability P(H|E) to any pair of H and E, is the dynamic assumption redundant. These approaches, however, do not leave room for any arbitrariness in the inductive inference which may be caused by extra-evidential factors such as aesthetic and simplicity criteria ([10], pp. 521-2). In the personalist-Bayesian model the dependence on these factors enters through the subjective prior probabilities. The prior probabilities may vary from one person to another, but a rational person changes his beliefs in accordance with the dynamic assumption. It is claimed by Bayesians that under these conditions, if an hypothesis has a finite prior probability, its posterior probabilities converge to 1 or 0 with an increasing amount of empirical evidence [2]. This must certainly be true in the extreme cases of falsification by conflicting empirical data and confirmation by successful predictions, when the Bayesian transformation of beliefs (Eq. (1)) holds. Thus, according to the Bayesian interpretation the posterior probability P(H|E) is a measure of the evidential support, or the degree of confirmation, which is granted to H after E is accepted. P(H) is the initial probability of H, given the background information B0.
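A minimal sketch of this learning rule in code may make the kinematics concrete; the function name and the numbers below are invented purely for illustration and are not from the text:

```python
def conditionalize(p_h, p_e_given_h, p_e):
    """One step of Bayesian conditionalization, Eq. (1):
    the posterior P1(H) = P(H)P(E|H)/P(E) once E is accepted."""
    return p_h * p_e_given_h / p_e

# Illustrative numbers. If H entails E, then P(E|H) = 1 and
# Eq. (1) reduces to Eq. (2): P(H|E) = P(H)/P(E).
p_h = 0.1    # prior degree of belief in H at t0
p_e = 0.25   # degree of expectedness of E at t0 (note P(E) >= P(H))
print(conditionalize(p_h, 1.0, p_e))   # 0.4: a low P(E) yields a sharp rise
```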
The initial degree of belief is influenced by plausibility criteria (metaphysical, aesthetic, etc.), some of which remain invariant over long periods of time. P(E) represents the degree of novelty of E with respect to B0. If E is deduced from H then P(E|H) = 1 and we get the simple formula:

(2) P(H|E) = P(H)/P(E),
which shows that the posterior probability of H can be low even if E is predicted by H. Wesley Salmon draws from this the following methodological conclusion: "Bayes' theorem gives an unequivocal answer to the question of whether we ought to regard high prior probability as an asset or a liability" ([9], p. 119). His answer turns out to be positive. Indeed, other things being equal, a higher prior probability ensures a higher posterior probability, but is it really an asset? Salmon and some other Bayesians interpret the prior probability of H as a measure of what they call the "plausibility" of the hypothesis, which expresses its likelihood of success. Popper has a different interpretation and a different methodological recommendation. He maintains that not only can we not have a rational recipe for arriving at true or successful theories from empirical data, but theories are not even confirmed by empirical evidence; the only thing we can learn from experience is that a theory is false, and there is no rational basis for relying on a theory that has passed tests, i.e. that has survived attempts at falsification. For him, probability is not a degree of plausibility but a measure of the lack of content. In his "Truth, Rationality and the Growth of Scientific Knowledge" he says: "... if growth of knowledge means that we operate with theories of increasing content, it must also mean that we operate with theories with decreasing probability" ([7], p. 218). Namely, the greater the logical content of a theory, or the greater its empirical content (if the theory is scientific, or non-metaphysical, the empirical content increases with logical content), the lower is its Popperian probability. Hence, a non-daring, non-interesting or non-informative theory has a high probability. His recommendation would be to choose a highly informative theory which will potentially be a significant contributor to the growth of knowledge. Probability in Popper's interpretation is not an epistemic entity; it is ascribed to a theory as a static property and thus it cannot change with experience. Once a theory is chosen, the only way to have further progress is by refuting the theory and replacing it by a more informative or less probable one. In order to exploit the maximal information content of a theory which was tentatively accepted for evaluation we have to test,
according to Popper, "one by one, some of its more remote consequences" ([7], p. 221), thus exposing it to the severest tests. In this way we have more chances to falsify the theory, if it is false, and thus to learn from experience the only thing we can learn. The Bayesian, with his dynamic assumption, claims that we can learn more. He would use Eq. (2) to show that the posterior probability of H should be higher than its initial probability when we discover a piece of empirical evidence which coincides with a prediction E of H and E was not known initially to be true, i.e. when P(E) < 1. If we interpret a "remote" consequence of H as a prediction E which was unexpected to a high degree at t0, namely has low P(E), then in case such a prediction matches empirical data the increase in the probability of H will be high. Hence, a theory with low initial probability, besides being more vulnerable to falsification, has more chances to gain higher confirmation (see below), and when it becomes highly probable, or confirmed, it is as a result of learning from experience; the effect of "prior" considerations is thus minimized. The finite initial probability which is given to H imposes a constraint on the degree of expectedness of E which can be predicted by H; as can be seen from Eq. (2), for a given H there is a lower bound for P(E), that is, P(E) ≥ P(H). Namely, E cannot be more unexpected than H which entails it. Hence, for a given H, the lower P(E) is, or the closer it is to P(H), the higher is P(H|E). P(E) will be small if E is an unexpected piece of data or when E is a conjunction of independent or nearly independent pieces of data ei, none of which need have a low P(ei); the conjunction E = e1 & e2 & ... & en will have a high degree of unexpectedness although none of the conjuncts is a big surprise. One point must be clarified here. It may be thought that the lower bound for P(E), with the above interpretation of P(E), is inconsistent with the case where a highly confirmed theory, such as Newton's dynamics and gravitational theory, gives a new prediction which is highly unexpected, such as the prediction of Neptune's existence. However, in this example it was not Newton's theory, TN, by itself which entailed Neptune's existence, eNp. If we denote by MS the model which described the structure of the solar system and which was believed to be highly confirmed at t0 (before Neptune was discovered), and by pU the predicted orbit of Uranus, then the form of the prediction was TN & MS → pU. When it was found that the true orbit eU differed from pU, the model MS was modified into M'S (or, in case MS included the assumption that there are no other planets than those specified in the model, MS was refuted and replaced by M'S), but TN remained untouched, such that TN & M'S → eU, and M'S → eNp. If eNp was highly unexpected at t0,
i.e. if P(eNp) was small, it means that P(M'S), and consequently P(TN & M'S), were very small. It is not surprising, therefore, that a prediction E could have a low initial probability P(E), since it was entailed by the initially improbable modified conjunction rather than by the highly confirmed general theory TN. And in general, a theoretical system T (a theory, or a group of interconnected theories, or a conjunction of a theory with a specific theoretical or empirical model) which has a high initial probability P(T) has to be modified into T' such that P(T') < P(T), in order to account for the unexpected data. Our methodological model will be capable of treating this sort of situation in a general way. In anticipation, it may be mentioned here only that the model would treat as methodologically significant not the confirmation of TN itself but the progressiveness of the whole dynamic system of knowledge in which it was embedded. The Bayesian might therefore adopt Popper's recommendation of severe and intensive testing for a twofold reason, i.e. to expose the theory to criticism and to give its posterior probability the opportunity to increase in case the theory passes the tests. But we can also arrive at a high P(H|E) by starting with an H which has a high prior probability, according to Salmon's recommendation. In that case H will not have unexpected predictions, since the proximity of P(H) to 1 will not allow an empirical prediction to have a low enough initial probability P(E). Therefore, a Bayesian whose interest is in the growth of knowledge will not regard a high prior probability as an asset in all circumstances. A low probability hypothesis might be favoured for initiating a research program because it is, for example, of a higher degree of precision, simplicity or generality than its competitors. P(H) is low in these cases since the amount of new empirical information H contains relative to the known facts is high. However, such an hypothesis is not necessarily implausible in the sense that we do not believe in its truth. We might be uncertain of its truth but tentatively choose it for further evaluation without accepting it. In a decision-theoretic context we might say that H has a high epistemic utility. It is also possible that the scientist who suggests the hypothesis has a high degree of confidence in it. Other scientists accept it for evaluation since they have confidence in that scientist due to his successes in the past or because of the status of leadership he enjoys in the community. Salmon mentions this last case as one of the factors which contribute to the plausibility of the hypothesis, i.e. to P(H) (ibid.). The other factors which determine the personal probability of an hypothesis are metaphysical beliefs, general intuitions, or aesthetic values. Let us refer to all of them as "metaphysical factors."
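A schematic numerical gloss on the Neptune case may be helpful here (the figures are invented for exposition, not historical estimates): since entailment requires P(E) ≥ P(H), a system with, say, P(TN & MS) = 0.9 could not issue a prediction with initial probability below 0.9; but once MS is replaced by a modified model with P(M'S) = 0.05, the conjunction satisfies P(TN & M'S) ≤ 0.05, and a consequence such as eNp may now have an initial probability as low as 0.05.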
In our model, however, we will be interested in using the initial probabilities of the hypotheses which appear in a research program as measures of the growth of empirical content. So it is of great importance for us to make sure that a lower P(H) indicates that H is more informative, not that it is implausible. I would like, therefore, to sharpen the distinction between Popperian-like improbability, which is due to richness in information, and implausibility, which indicates a deviation from the metaphysical heuristics, or the social norms, of the scientific community. Then, if we find a situation in which scientific knowledge grows while the plausibility factor remains invariant, we conclude that the variation of the P(H) variable is due only to the change in the amount of empirical information carried by H, where H denotes a dynamic theory. Thus, we might partially bridge the gap between the personalist and the Popperian conceptions of probability in the following way. We assume that if two theoretical statements T1 and T2 are evaluated by the same scientific community and belong to the same metaphysical and conceptual framework, then P(T1) > P(T2) iff Ct(T2) > Ct(T1), where Ct denotes a measure of empirical content. In other words, we assume that a rational person who does not want to take a big risk would be ready to bet at higher odds on the statement T1 which adds less testable information to his present knowledge, provided T1 and T2 share the same metaphysical factor and differ only in empirical content. The isolation of the metaphysical factor is not always possible or simple, but there are controlled situations in which we can trace the growth of empirical content. This happens presumably when T1 and T2 are two successive versions of a developing theoretical model, or more generally, successive theories Ti and Ti+1 in an ideal research program, where Ti+1 is obtained by adding some assumptions to Ti or by modifying it, without changing the hard core and without deviating from the heuristic of the program. The two inequalities P(Ti) > P(Ti+1) and Ct(Ti) < Ct(Ti+1) hold when, for example, Ti+1 has new empirical predictions in addition to those which are derived also from Ti. When the plausibility factor remains invariant we can compare probabilities by comparing the empirical contents. Under these circumstances personal probability and Popperian probability change with content in the same direction. As a realization of such a dependence, we may express, for example, the personal probability function as the product P = Pm·Pc, where Pm is a degree of plausibility, which is dependent on metaphysical factors, and Pc is a Popperian-like probability which depends on the amount of information content; e.g. Pc = 1 − Ct. The Popperian empirical content of a theory consists of the set of potential falsifiers. The
more experimental outcomes a theory excludes, the smaller are its Popperian probability and its personal probability Pc. Note that our content measure differs from Popper's in that it measures empirical content relative to a state of knowledge B0, and thus Ct is not "objective" but dependent on the state of knowledge, i.e. "epistemic." Thus, it is not the set of all potential falsifiers which counts for the personalist, but the set of those potential falsifiers which have not yet been found false. When, for example, all potential falsifiers of T are known to be false then Pc(T) = 1. Of course, in a realistic situation there is not always a clear separation between the two factors, but in the context of a research program this might be a good approximation. When a group of scientists who share common metaphysical beliefs develop a research program starting with a theory H1, we may assume that H1 is chosen in such a way that Pm(H1) is high enough and does not vary considerably among the participants in the program. When, in the course of development of the program, the theory H1 is modified into H2 such that it remains in the same metaphysical and conceptual framework, then Pm(H2) ≈ Pm(H1) and only Pc changes. I would like to stress here that I do not commit myself to the above factorized form, which serves only as a heuristic device; for constructing the ideal model for a research program we need only the assumption that the theoretical changes are controlled in such a way that the probability function varies, to a good approximation, in a Popperian-like manner, i.e. in the opposite direction to empirical content. It will be shown that in such a controlled situation it is easier for scientists to evaluate the degree of empirical progressiveness of their program. When a research program starts with a low P(H1) it means, therefore, in our model that Ct(H1) is high, namely that the amount of information H1 adds relative to the state of knowledge B0 is large, although H1 has also a high degree of plausibility. For example, in Ptolemaic astronomy the addition of epicycles for explaining a planetary orbit leads only to a change in Pc. Examples from modern science: in a research program which develops a molecular theory of gases, where H1 is the ideal model of structureless elastic particles, a more advanced version H2 which takes into account the additional degrees of freedom related to the molecule's structure will contain more information and will have more measurable predictions; its Popperian-like probability Pc relative to the initial state of knowledge will be lower but its degree of plausibility remains high, since it obeys the heuristic of the program. Or, in a research program dealing with the structure of the hydrogen atom, the theory H2, in which the electron's spin is introduced, adds content to Bohr's original model H1 but still fits the general picture, although spin is a new concept which may cause
some fluctuations in Pm. There are opposite cases where an hypothesis has an average Pm much smaller than 1 in a given community of scientists and, on the other hand, does not add much empirical content. For example, concrete models in particle physics nowadays enter into this category; e.g. the deformable sphere model for hadrons, which was introduced in hadron physics in order to account for the SU(3) symmetry of the hadron spectrum (the variety of strongly interacting particles) by assuming that they correspond to the excited states of a deformable sphere which can "breathe" infinitesimally. This model deviates from the general heuristic in that field, which requires abstract models and rejects "concrete" pictures. On the other hand, this hypothesis does not add many testable predictions to those of SU(3), so that its Pc is high. When, on the other hand, there is a metaphysical shift (presumably only in a transition from one research program to another), personal probability does not vary in the above manner with empirical content. Now, how can we apply the above considerations to P(E)? Here the answer is simple: what was assumed for hypotheses must be true for observation statements. Observation statements in a given research program are expressed in the empirical language of the research program (which is probably accepted as the empirical language in all research programs in a given field of science at a certain period of time). Hence there cannot be a difference in the metaphysical pictures related to different empirical statements, and thus if P(E1) > P(E2) (where both probabilities are measured at the same time t), E2 carries more information than E1. E.g., E2 is more precise, includes more details or adds totally new information. If, for example, B0 includes the empirical generalization "all ravens observed so far are black," the observation statement which states the existence of a yellow raven will not be regarded as implausible in our nomenclature. Such a statement just contains new unexpected information relative to B0 in conjunction with the principle of uniformity of nature. We therefore assume that within a research program P(E), similarly to P(H), changes in the opposite direction to the amount of information content, e.g. P(E) = 1 − Ct(E). Our specific usage of the term "plausibility" does not explicate some important uses of this term in ordinary language, including scientists' language when they talk about science, but such an explication is not intended. By introducing our distinction between the plausibility factor and the information-measure factor we try to separate out the "external" influences, which are presumably minimized in "normal" science and come to their maximal effect in "revolutionary" periods.
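As a toy realization of the factorized form P = Pm·Pc, the following sketch may be offered; it is only a heuristic illustration, and the function name and values are invented:

```python
def personal_probability(pm, ct):
    """Heuristic factorization P = Pm * Pc with Pc = 1 - Ct:
    Pm is the plausibility (metaphysical) factor, Ct the empirical
    content of the theory relative to the background knowledge B0."""
    return pm * (1.0 - ct)

# Two successive versions within one program: Pm is held fixed,
# so P falls in a Popperian-like way as empirical content grows.
p_h1 = personal_probability(pm=0.8, ct=0.5)   # 0.40
p_h2 = personal_probability(pm=0.8, ct=0.7)   # 0.24 < P(H1)
print(p_h1, p_h2)
```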
2. The Growth of Knowledge in a Research Program Represented in a Two-Dimensional Space of Prior Probabilities. In our model, a research program is a knowledge system which is composed of two main components, empirical and theoretical. In the empirical component empirical data is accumulated in the form of observation statements, and in the theoretical component theories are developed in order to match the empirical data and to generate empirical predictions. The theoretical component includes a hard core which is not subject to any change. We shall not adopt the approach in which observation statements are incorrigible and are naturally distinguished from theoretical statements. We shall rather hold a "revolutionary conventionalist" approach (see [6], p. 106) and call a statement observational if its truth can be decided upon by every participant in the program who is trained to use the experimental technique of the program and to interpret the experimental results in the light of the observational theories and the other accepted theories. Thus, we take observability to be a relative notion: relative to the experimental technique and to the theories which are regarded as reliable and unquestionable in the program. The hypothetical or theoretical component of a research program is composed of a growing sequence of hypotheses H1, H2, ..., Hn, whereas the empirical component is composed of a sequence of empirical statements E1, E2, ..., En. Let us denote the theoretical component by H, where H represents a changing hypothesis; it may be looked upon as a function whose values are the Hi's. H is adjusted to the development of the growing empirical component E. It includes not only the main model or theory which is subject to evaluation and modification but also general theories which are relevant to the main theory and a peripheral part which enables the main theory to make empirical predictions. Adjustments in H may take place either in the main theory or in the periphery. At the time ti+1 the empirical component E changes from Ei to Ei+1 and Hi is replaced by Hi+1. In the case of accumulating empirical information, i.e. when there is no withdrawal of empirical evidence, the new piece of evidence ei+1 which is accepted or observed at ti+1 joins Ei to form the new conjunction Ei+1, upon which conditionalization of the probabilities is made at ti+1. Let us assume that the initial problem with which the program starts originates in an empirical datum e1 which cannot be explained within the initial state of knowledge and thus P(e1) < 1. We then have to conjecture an hypothesis H1 which will explain e1. We expect H1 to also explain some relevant facts which were accepted at time t0. Let us denote by E1 the conjunction of all the observational facts,
including e1, that have to be explained by H1, so that also P(E1) < 1. Ideally E1 will be entailed by H1, i.e. H1 → E1. Under this condition Eq. (2) tells us that also P(H1) < 1. In order to test H1 we check some of its predictions, either by performing new experiments and making new observations, or by retrieving information about experiments which have already been carried out. As a result a new piece of empirical evidence e2 joins E and E1 is changed into E2. In the case E2 = E1 & e2, or generally when E2 contains more information than E1, the following inequality must hold: P(E2) < P(E1). P(E2) is the probability E2 would have at time t0, at the initial state of knowledge. If H1 ↛ E2 (i.e. E2 cannot be deduced from H1) we change H1 into H2 such that H2 → E2. The initial probability of H2, namely the probability H2 would have at t0, can take any value subject to the restriction P(H2) ≤ P(E2); namely, as expected, H2 should be richer in information than E2. By testing H2 new evidence is acquired and the process continues in such a way that in each step Hi → Ei.¹ By the latter condition we shall characterize an "ideal" research program, where in each step H is adjusted deductively to E. Consequently, in an ideal research program P(Ei|Hi) = 1 for every i. We can now describe the evolution of an ideal research program as depending on a pair of variables: x = P(H) and y = P(E). To each pair of H and E there corresponds a point in the (x,y) plane within the unit square (Fig. 1). The point (1,1) represents the state at t0, where an accepted theory T0 explains the background data E0. When P(E|H) = 1 the corresponding point will lie within the upper triangle, in which P(H) ≤ P(E). It is instructive to look at the complements of P(H) and P(E), i.e. Ct(H) = 1 − P(H) and Ct(E) = 1 − P(E), in order to see the relation between probability and empirical or testable content; the greater the distance of P(H) or P(E) from their maximal value, the larger is the empirical content (potential or actual, respectively) H or E adds to the state of reference, i.e. the state of knowledge at t0. Our model is not dependent on the above specific relation between probability and content, but the transition to other forms, such as 1/P or −ln P, is order-preserving ([8], p. 391).

¹ A universal statement can only entail singular observation statements which have a conditional form, while E may include non-conditional singular statements Es. However, even if we conditionalize upon Es and strictly H ↛ Es ([8], p. 390), still, we expect from the personalist point of view to have P(Es|H) = 1 in the case when the corresponding conditional singular statement can be deduced from H.
Figure 1: The (P(H), P(E)) plane; the unshaded area corresponds to pairs of (H, E) with P(E|H) = 1.

On the (x,y) plane we can try to describe quantitatively some methodological characteristics of the development of a research
program which do not depend on the specific ontological content of H or E or on the ways scientists arrive at them or produce them. The evolution of a research program will be represented on the (x,y) plane by a path which is built of discrete steps. When at time t1 the knowledge system is in its initial state (x1, y1) = (P(H1), P(E1)) it has five general directions in which it can move such that the content of E does not decrease, i.e. in which Δy ≤ 0 (see Fig. 2):

1. Δx > 0, Δy = 0. The program moves in this direction if the same amount of information which was explained by H1 can now be explained by an hypothesis H2 which contains less new information relative to B0. This may happen, for example, when H1 is changed in such a way that it is partially assimilated into T0: when, e.g., it is discovered that some of H1's new assumptions can be mathematically derived from the established theories at t0, or that we do not need all the assumptions in H1 in order to explain E1. This is a move towards a more economical explanation, without yielding any new empirical information. From the point of view of those scientists who wish to learn from experience, such a step does not constitute the final goal of research but only a preliminary move, though sometimes a favorable one, before the program starts to accomplish its main task, namely to acquire more empirical knowledge.
Figure 2: The five general directions along which a research program can develop in the (x,y) plane.

For them, a research program is a learning system in which changes in H should be motivated
by or result in changes in E, and they try to avoid as far as possible changes which do not involve empirical motivations or which do not lead to new empirical results.

2. Δx > 0, Δy < 0. An ad hoc move of the following kind causes the system to develop in this direction. A free parameter is added to the theory so that it can be adjusted to a new empirical datum e2 which cannot be satisfactorily explained by H1, that is, P(e2|H1) is significantly smaller than one. E.g., H1 is a theory with circular electronic or planetary orbits, which are replaced in H2 by elliptical orbits. The new theory-version H2 then has fewer potential falsifiers, and as a result P(H2) > P(H1), if this move is not accompanied by improving the precision of measurement such that H2 includes also more accurate details of the system (e.g. the parameters of planetary orbits or the details of atomic spectra).

3. Δx = 0, Δy < 0. When new predictions of H1 are tested and turn out to be successful or, in general, when new data is explained by H1 and thus empirical information is accumulating (Δy < 0) such that there is no need to change H1, the program moves in direction 3. This seems to be an ideal methodological situation.

4. Δx < 0, Δy < 0. In this direction H1 is changed into H2, which has higher content, and H2 explains empirical data E2 which cannot
be explained by H1. E.g., some new assumptions or some new details are added to H1, or H2 is logically stronger than H1, i.e. H1 can be deduced from H2, or H2's predictions are more precise versions of H1's. In all these cases, if H2 were considered at t0 it would be given a lower (prior) probability than H1. We do not yet have a clear criterion for judging the favourability of moving in this direction, a topic which will mostly occupy us later.
5. Δx < 0, Δy = 0. When an increase in the empirical content of H is not accompanied by an increase of explained empirical information, we are moving in direction 5. This may happen when H1 clashes with experiment and it is not possible to adjust it to the data by adding any amount of content to H1 in a way which is consistent with the hard core; then we move in this direction towards the y axis. Or it may happen when empirical content is added to H1 without testing the new predictions because, for example, they are not testable by the available experimental technique. For obvious reasons this is not a desirable way to proceed.

If movement "backward" in Ct(E) (or upward in y) is forbidden in the framework of an ideal research program, the above five directions exhaust all possibilities of moving from the initial point (x1, y1). The important question which arises now is how the posterior probability, or the degree of confirmation of H, P(H|E), is affected by moving in each of these directions in the (x,y) plane. Under the constraint P(E|H) = 1 the degree of confirmation changes as a function of two variables: C(x,y) = x/y. From Fig. 3 we can see that C(x,y) remains constant along the line which connects (x1, y1) to the origin. If the program, which starts with a degree of confirmation C1 = x1/y1, moves along this line, its degree of confirmation remains C = C1. We can also see that C always increases in directions 1, 2 and 3 and always decreases in the fifth direction. The most interesting case is what happens in the fourth direction. In this case C increases (ΔC > 0) only when the change is restricted to the area bounded by the lines C = C1 and C = 1 (the latter is the square's diagonal which connects (1,1) to (0,0)); ΔC = 0 along the line C = C1; otherwise, ΔC < 0. These results can be obtained analytically by calculating the total differential ΔC(x,y) = (∂C/∂x)Δx + (∂C/∂y)Δy at the point (x1, y1):

(3) (ΔC)(x1,y1) = (Δx − C1Δy)/y1

The sign of Δx − C1Δy determines the sign of ΔC for each of the five directions, in accordance with what was said above.
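A minimal sketch of this computation, with a hypothetical fourth-direction step of invented size, compares the exact change in C with the linearized form of Eq. (3):

```python
def delta_c_exact(x1, y1, dx, dy):
    """Exact change in C = x/y for a step from (x1, y1) to (x1+dx, y1+dy)."""
    return (x1 + dx) / (y1 + dy) - x1 / y1

def delta_c_linear(x1, y1, dx, dy):
    """Linearized change from the total differential, Eq. (3):
    dC = (dx - C1*dy)/y1 with C1 = x1/y1."""
    return (dx - (x1 / y1) * dy) / y1

# A hypothetical fourth-direction step: dx < 0 and dy < 0,
# with dx/dy = 0.2 below C1 = 0.5, so dC should be positive.
x1, y1, dx, dy = 0.3, 0.6, -0.02, -0.10
print(delta_c_exact(x1, y1, dx, dy))    # 0.06
print(delta_c_linear(x1, y1, dx, dy))   # 0.05, same sign
```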
Figure 3: The domain of ΔC > 0 in the fourth direction (striped area).
Hence, ΔC > 0 in directions 1 and 2, in which the system moves towards a more economical explanation or in an ad hoc manner, whereas in direction 5, where the theory becomes more complicated with no new empirical results, ΔC < 0, as expected. In the case of Δy < 0 and Δx = 0, C always increases, and this corresponds to ordinary confirmation of a static theory H1 by positive evidence or by successful predictions. Thus, the pure deductive model of confirmation is represented in direction 3. The limit of verification of H1 would be reached when all of H1's predictions match observational data and the system evolves from (x1,y1) downward to the diagonal C = 1. Although this is an unattainable limit in an absolute frame of reference, in a relativistic situation where a large portion of H belongs to the hard core we may practically be able to reach this limit, since only a finite number of independent tests may be needed for a relativistic verification of H1. In the fourth direction, when both Δx and Δy are negative, C increases, as can be seen from Eq. (3), when and only when

(4) Δx/Δy < C1.
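For illustration, with invented numbers: if C1 = 0.5 and a round of testing enriches the explained evidence by Δy = −0.2, then inequality (4) demands Δx/(−0.2) < 0.5, i.e. Δx > −0.1. The theoretical component may grow in content, but only by less than C1|Δy| = 0.1, on pain of ΔC ≤ 0.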
Moving in the fourth direction is the only case in which our intuition or existing methodologies do not provide us with a sharp criterion which will tell us whether and when it is a favorable situation. The requirement of ΔC > 0 and Eq. (4) will help us in this respect. But first let us see what is the significance of the ΔC > 0 criterion.
3. Explicating "Dynamic Inductive Inference." The criterion of ΔC > 0 for a "good" step in a research program can be viewed as an extension of the requirement that the degree of evidential support or confirmation of a static theory H will increase following an empirical investigation which yields the evidence E, i.e. P(H|E) > P(H). This is called by Carnap the positive relevance criterion ([1] chap. 6; see also [4] p. 134). When and only when this is fulfilled can it be said that E confirms H. In our model, this criterion will be a special case of the ΔC > 0 criterion, which applies to a dynamic theoretical system H. In the extended version E "confirms" H iff P(H|E) increases with Ct(E). In classical inductive methodology there are rules as to when the degree of confirmation of a static theory increases, and there are formalisms in which these rules are explicated. Let us call these rules rules of static inductive inference. In our model we find an explication for some rules of scientific inference in a dynamic knowledge system. Scientific inference is conceived here as the effect of a growing body of empirical information on a dynamic theoretical system. Our scheme does not intend to justify "dynamic induction" but only to give it a concise explication which will lead to some insights about the interrelations of different methodologies and from which we shall infer additional rules, besides reformulating some of the old ones in a more articulate and accurate manner. Thus, first, ordinary intuitive inductive rules should be explicated in the model, to account for the special case of a static theoretical system. Secondly, these rules should be extended to a dynamic H. Mary Hesse tries to carry out the first task, and she lists ([4] p. 98) eight intuitive inductive rules which should be accounted for. This list might serve as a touchstone for our scheme. When we stay on the path of ideal static deductive confirmation (Δx = 0) and obtain ΔC > 0 for Δy < 0, we account for the following four rules:

i) Enumerative induction and converse entailment: If H is a generalization, its degree of confirmation should in general increase by observation of an increasing number of its positive instances and no negative instances. And in general, the confirmation of an hypothesis should be increased following successful predictions. In our model an increasing number of positive instances or successful predictions means a decreasing P(E), which results in moving towards C = 1. But if the data is uniform the decrease in P(E) will become negligible.

ii) A variety of positive instances will result in higher confirmation. Indeed, higher variety in E means that the accumulating
empirical information is expressed by a conjunction of a large number of independent or nearly independent empirical statements, which means a larger negative Δy for a given x and consequently a higher (positive) ΔC.

iii) Initially unexpected evidence: if a logical consequence of H is initially unexpected before H is proposed and it is then observed to be true, it raises the confirmation of H more than initially expected evidence. This rule is obeyed in our model since when an unexpected E is observed or accepted there is a large drop in y.

iv) Predictive inference: If H acquires a high degree of confirmation, some further consequence of H which has not yet been observed should be predicted with high confirmation. This can be seen in the (x,y) plane from the fact that when we are on the line x = x1 at a point which is closer to C = 1, the maximal possible drop in y is smaller; namely, there is a lower bound for the initial probability of a new prediction of H. For example, when C1 = P(H1|E1), a new independent prediction e can, if successful, raise C from C1 to C' = P(H1|E1 & e) = P(H1)/(P(E1)·P(e)) = C1/P(e). Hence the minimal possible P(e), Pmin(e) = C1, will lead to C' = 1. If C1 is high, P(e) can take only high values, i.e. e should be predicted with a high degree of expectedness.

A fifth rule is imposed on the system:

v) Eliminative induction: the confirmation of H increases with the observational refutation of an increasing number of its competitors. The process of adjusting H to E can be viewed as a process of elimination. We can logically regard the hypotheses Hi, i = 1, 2, ..., n, of a research program as competing hypotheses for the explanation of the initial piece of information E1. Namely, post factum, one can regard the evolution of the research program up to the n-th stage as a process of eliminating H1, H2, ..., Hn−1 by an empirical investigation, leaving only Hn unrefuted. And by the nature of this process, the elimination of H1, H2, ..., Hn−1 is always accompanied by the growth of empirical information, or by a decreasing P(E). Thus, if we regard Hn as given from the start as a candidate for explaining E, its degree of confirmation P(Hn)/P(Ei) increases with i till i = n, when all H1, ..., Hn−1 have been replaced (not necessarily refuted). This rule can be regarded as a necessary condition for a theoretical change in the system. Indeed, movement in direction 5, which
is the only one which does not obey the rule, always leads to ΔC < 0, i.e. to disconfirmation. Rules i, ii, iii for static inductive inference can be summarized by the requirement that when a growing body of empirical information is explained by a static theory, the latter's degree of confirmation increases. This is extended in our model to the requirement that the degree of confirmation of H will increase with the growth of empirical information (with increasing numbers of positive instances, variety and unexpectedness of the empirical data) which enters into the system, i.e. which is explained by H. The requirement of static confirmation along x = x1, with Δx/Δy = 0 (third direction), is therefore a special and extreme case of the requirement Δx/Δy < C1 in the fourth direction. Note that when C1 is close to zero, namely when for explaining E1, which does not have a very high degree of unexpectedness, we introduce a very daring H1, with a very small P(H1), the program is forced into a Popperian-like situation where a big increase in the amount of explained empirical information (a large negative Δy) is needed in order to have a substantial increase in C and to get near to C = 1 in the static deductive case (Δx = 0) or in the fourth direction. In the limit case where C1 = P(H1) = 0, confirmation cannot rise from the zero level for any quantity of positive evidence or successful predictions which are consistent with B0. In the process of adjusting H to E it is always possible to protect the theoretical hard core Th. With enough ingenuity it is possible, according to the "Duhem-Quine thesis," to adjust H to E with any hard core. In other words, it is possible to protect Th from refutation "come what may," in the light of any piece of evidence, by constructing protective belts around it and proceeding in the fourth direction. But what is the price of protecting Th? H can be adjusted to E in a Popperian-like manner, where each Hi is replaced by a more informative and more testable Hi+1: for example, when Hi+1 is logically stronger than Hi or has more accurate predictions, such that Hi+1 explains Ei, the explanandum of Hi, accounts also for the problematic data which refuted Hi, and has further new predictions. On the other hand, the adjustment of H to E can be done in an ad hoc manner, in the "bad" sense of the word. But how shall we distinguish between a "good" and a "bad" move in this direction? In our model the answer is obvious: a good move should be characterized by the ΔC > 0 condition. Before trying to find the significance of this condition with respect to the problem of ad hocness, let us first see what our intuitive answer would be and what are the answers of Popper, Duhem and Lakatos. If most of the Hi's new predictions turn out
to be successful, we consider it, intuitively, as a favorable methodological situation (let us call it "definite growth"). On the other hand, it can happen that only a small portion of the new empirical content is confirmed experimentally, or even that no new prediction is confirmed ("apparent growth"). Popper eliminates the last possibility by demanding, in addition to the requirement of independent testability, the requirement that the theory should pass some new and severe tests (the "third requirement" for the growth of knowledge, [7] pp. 242-248). If this methodological requirement is obeyed, we prevent the possibility of developing an unbroken sequence of refuted theories which obey the requirement of independent testability but which do not anticipate any new empirical data. Popper calls such theories ad hoc (meaning the "bad" sense). We may consider a sequence of theories which do not produce new predictions at all (that is, when each Hi+1 explains the explanandum of Hi and the evidence which Hi could not explain, but does not have new testable predictions: "zero growth") as a limit case for ad hocness. A prototype hypothetical example of an apparent growth is given in Lakatos' story (which may be assigned to the category of "philosophy-of-science fiction") about a research program which is developed on the basis of Newton's mechanics and gravitational theory as a hard core in order to cope with a problem of planetary misbehavior, such that the sequence of protective belts does not anticipate any empirical data ([6] p. 100). Here the unbroken sequence of refuted theories is generated as a result of a refusal to abandon the hard core, namely to change Th. Presumably, the neo-falsificationist will claim that an apparent growth in this case is an indication that the hard core has to be replaced. Thus, Popper's third criterion, or rather Lakatos' condition (ibid.), serves here to eliminate a theory not by direct falsification, but by observing the way a research program which is based on it develops; if the research program "degenerates," namely if its theoretical growth does not anticipate its empirical growth, it may be regarded as a sign that the hard core should be replaced. The limit case of such a methodological situation is the movement in direction 5, ending with C = 0 in the fastest way. This will happen if it is impossible to explain the given data without replacing Th.

Popper's or Lakatos' condition gives some teeth to Duhem's claim that a theory which will not fall as a result of "refutations" will collapse as a result of too many corrections, which cause it to lose its original simplicity; if the corrections do not generate successful predictions, the program should be abandoned. Popper's stress is on the severity of the tests the theory should pass. His recommendation
will be, therefore, that the sequence of theories should generate successful predictions which were unexpected at the beginning of the research program. When we try to explicate this criterion in our model we find that the ΔC > 0 condition for the passage from Hi to Hi+1 in the fourth direction is the natural choice, and it indeed reflects the severity of the tests which Hi+1 has passed. However, it is not related at all to the anticipation of empirical data, since the model is not sensitive to temporal relations between elements of H and elements of E; it does not attribute a stronger confirmation to a theory if empirical data which matches its predictions is discovered after the prediction is made. But does the time factor necessarily have in fact such an effect? What enhances confirmation is the severity of the test, relative to the theoretical change Δx. If Δx is big and the Δy which is related to the test is small relative to Δx, then the increase in confirmation will be small, even if the empirical data was obtained after the prediction. And this is rational enough, since it is not conceivable that confirmation should depend on when the deduction was made (a scientist may present a theory without knowing that some available evidence can be deduced from it). For every example from the history of science of a strong confirmation which came as a result of the anticipation of an empirical event, one can find an example of the same effect brought about by the (theoretical) discovery that an available theory, possibly with slight modifications, can explain an available piece of data; we then say that the theory explains the data economically, i.e. we do not have to change the theory much in order to accommodate the large amount of data. All these considerations are simply summarized by the condition expressed by inequality (4). In the light of what was said above we can apprehend the significance of this condition: it states that in any process of adjustment of H to E there should be an upper bound for the rate of theoretical growth for a given rate of empirical growth. It is instructive here to evaluate this condition in the special case where H2 = H1 & h2 and E2 = E1 & e2, such that h2 and e2 are probabilistically independent of H1 and E1 respectively. In such a case:

Δx/Δy = (P(H2) − P(H1))/(P(E2) − P(E1))
      = (P(H1)P(h2) − P(H1))/(P(E1)P(e2) − P(E1))
      = C1(P(h2) − 1)/(P(e2) − 1)
      = C1 Ct(h2)/Ct(e2).
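A quick numerical check of this derivation, with invented values, assuming H2 = H1 & h2 and E2 = E1 & e2 with the stated independence:

```python
# Hypothetical probabilities; h2 and e2 are assumed independent
# of H1 and E1 respectively, as in the derivation above.
p_h1, p_e1 = 0.2, 0.5            # so C1 = P(H1)/P(E1) = 0.4
p_h2_, p_e2_ = 0.9, 0.8          # P(h2) and P(e2)

dx = p_h1 * p_h2_ - p_h1         # P(H2) - P(H1) = P(H1)(P(h2) - 1) = -0.02
dy = p_e1 * p_e2_ - p_e1         # P(E2) - P(E1) = P(E1)(P(e2) - 1) = -0.10

c1 = p_h1 / p_e1
print(dx / dy)                           # 0.2
print(c1 * (1 - p_h2_) / (1 - p_e2_))    # 0.2 = C1 * Ct(h2)/Ct(e2)
# Since Ct(h2) = 0.1 < Ct(e2) = 0.2, we get dx/dy < C1, hence dC > 0.
```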
When Ct(h2) = Ct(e2), the rate of theoretical growth equals that of empirical growth and Δx/Δy = C1, or ΔC = 0. When Ct(h2) < Ct(e2) the program is empirically progressing, in our modified sense, in which case Δx/Δy < C1 and ΔC > 0. The limit case of static deduction from H1, i.e. Ct(h2) = 0, gives according to Eq. (3) a ΔC proportional to |Δy|, which means that for a given E1 the degree of confirmation increases in proportion to Ct(e2) (since |Δy| = P(E1)(1 − P(e2)) = P(E1)Ct(e2)). Hence, in the Δx < 0 case, if ΔC > 0 it means that we add a new assumption, or a new parameter or a new factor h2, which does not add much content to H1, while a large amount of new data e2, or initially unexpected data, can be explained by H1 and h2 (the restriction P(e2) ≥ P(H1 & h2) must always hold). We have here, therefore, a gain of empirical knowledge: a large amount of empirical data is explained following a small change in H, or vice versa. The program degenerates when ΔC ≤ 0, when there is no such gain. We may use an additive measure Cta for information: Cta = −ln P. We can then define a quantity

(5) A =df (ΔCta(E)/ΔCta(H)) − 1,

which expresses the amplification power of the research program as a function of H and E. Namely, A indicates the amplification, or the gain of empirical information, which results from a change in the system. On the other hand, we can express A as a function of x and y: A = [Δln y/Δln x] − 1 = C(Δy/Δx) − 1, from which it follows (using Eq. (4)) that A > 0 iff ΔC > 0, A = 0 iff ΔC = 0, and A < 0 iff ΔC < 0. Hence, if the amount of theoretical information (i.e. the empirical content of the theoretical component) which is added to the system is smaller than the accompanying increase in the amount of empirical information which is explained within the system, then the degree of confirmation of H increases. Namely, in the fourth direction the degree of amplification, which measures empirical progressiveness, is positive if and only if the confirmation increases.
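The amplification measure can likewise be sketched in code, again with the same hypothetical step as in the earlier sketch; the numbers are invented:

```python
import math

def amplification(x1, y1, x2, y2):
    """Amplification A of Eq. (5), with the additive content
    measure Cta = -ln P applied to both components."""
    d_cta_e = math.log(y1) - math.log(y2)   # growth of explained empirical content
    d_cta_h = math.log(x1) - math.log(x2)   # growth of theoretical content
    return d_cta_e / d_cta_h - 1

# Empirical content grows faster than theoretical content,
# so A > 0, and indeed C rises: 0.28/0.50 > 0.3/0.6.
print(amplification(0.3, 0.6, 0.28, 0.50))   # about 1.6
```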
The ΔC < 0 criterion for ad hocness or non-progressiveness holds only in the fourth direction, for content-increasing theoretical changes. It corresponds to a theory modification in which a new datum is explained not by using the resources of the theory but by adding a new element to the theory which is loosely related or entirely unrelated to the main body of the theory and which, on the other hand, does not yield new predictions. There is, however, a different kind of ad hoc change, which is content-reducing. This happens when, for example, free parameters are added to the theory to accommodate new data. As a result, the theory now has fewer new predictions and therefore becomes more supported by the data; namely, the system moves in the second direction with ΔC > 0. But the addition of free parameters can also yield a progressive, or non-ad hoc, content-increasing change. When free parameters are added, an advance in measurement precision may be stimulated, and as a result the theoretical system itself develops in the direction of higher precision, yielding more accurate predictions for specific observational objects. For example, the transition from the simplified version of the Bohr atom to the Bohr-Sommerfeld version with elliptical orbits was accompanied by a higher precision in spectroscopic measurements to observe the fine structure of the spectrum. This move was progressive since the higher precision in predictions compensated for the initially content-reducing change, and the system developed into the progressive, content-increasing region, i.e. the fourth direction with ΔC > 0. The most interesting question which arises now is how it is possible to have a gain in empirical information. If we look at the (x,y) diagram (Fig. 3) we can immediately see that it is possible to have A > 0 when we have started with an initial hypothesis H1 with Ct(H1) > Ct(E1); that is, when we explain the initial problematic data in the research program by a law or a theory which has extra predictions. In this step, which initiates the development of the program, there is an initial jump from the point (1,1) to (x1, y1) with ΔC < 0. If we were to move along the C = 1 line from (1,1) to the point (x1, y1) then ΔC = 0, but amplification would not be possible. The initial jump enables an amplification of empirical information in the future development of the program. It represents the stage of potential theoretical discovery, which is the crucial step in determining the fate of the program. The bigger the first leap, i.e. the further H1 departs from E1, the higher can be the subsequent amplification power of the program. If the program starts with a low C1, i.e. when the theoretical leap lifts us high above the level of empirical data and established knowledge, we are in a Popperian-like situation in which learning from experience can be fast in the ΔC > 0 direction. On the other hand, when we start with a high C1 we are closer to the initial empirical data and the average speed of learning from experience cannot be high. There might, however, be a case in which it seems that an hypothesis H1 which is highly probable given B0 generates a research program which is highly progressive. Let us take the cited example of Newtonian theory, H1 = TN & MS. This theory was very probable at the time the deviations in the orbit of Uranus were discovered. We might be tempted to say, when in an inductivist mood, that the unexpected new evidence eU in conjunction with H1 entailed M'S, and the new version H2 = TN & M'S generated a very progressive research program,
since H2 → eNp and eNp was found to be true, and Ct(eNp) was bigger than Ct(eU). Indeed, eU expressed the results of observations made on a known planet whereas eNp was a statement about the existence of a new planet, its mass and orbit. Later on, additional effects of this new planet were discovered which confirmed H2 to a maximal degree. However, when we enter into a Popperian mentality we interpret the above situation differently, in the following way: there was a significant theoretical jump from H1 to H2 which enabled the subsequent progressive steps to take place. Instead of assuming that H1 (i.e. MS) is correct but that there was a defect in the observation, or that there were some mysterious untestable effects which caused the deviation, etc., a conjecture was made which implied much more empirical information than H1 & eU. The hypothesis H2 could by no means be entailed by H1 & eU, since many other possibilities for explanation existed. Thus, it was a theoretical jump from H1 to H2 which generated a progressive research program. To sum up, in our model of confirmation and scientific inference the methodological unit for assessment is not a single theory but a dynamic knowledge system: a research program. Our scheme incorporates as limit cases the methodologies of static induction or confirmation, which are represented by the third direction with A = ∞, and Popperian falsification, which corresponds to a movement in the fifth direction with A = −1. The growth of knowledge in Lakatosian research programs is accounted for mainly in the fourth direction, where empirical progressiveness and degeneration, in a modified sense, correspond to A > 0 (increasing confirmation) and A < 0 (decreasing confirmation) respectively. Thus the main new conclusion which we derive from the model is that the rate of theoretical growth should be small in comparison to the rate of empirical growth, since this leads to increasing confirmation. This last conclusion is familiar to physicists, who reject "too speculative" or "premature" theories which go too far from the empirical data, claiming that we first have to construct phenomenological theories, which are close to the empirical phenomena, before a new far-fetched theory will be considered seriously. The model also accounts for the general phenomenon of modern science that theories which become more and more complicated are at the same time more and more precise and, in general, empirically progressive.

REFERENCES

[1] Carnap, R. Logical Foundations of Probability. Chicago: University of Chicago Press, 1950.
[2] Edwards, W., Lindman, H., and Savage, L. "Bayesian Statistical Inference for Psychological Research," Psychological Review 70 (1963): 193-242.
[3] Hacking, I. "Slightly More Realistic Personal Probability," Philosophy of Science 34 (1967): 311-325.
[4] Hesse, M. The Structure of Scientific Inference. London: Macmillan, 1974.
[5] Keynes, J. M. A Treatise on Probability. London: Macmillan, 1921.
[6] Lakatos, I. "Falsification and the Methodology of Scientific Research Programmes." In Criticism and the Growth of Knowledge, I. Lakatos and A. Musgrave (eds.). Cambridge: Cambridge University Press, 1970. Pp. 91-196.
[7] Popper, K. "Truth, Rationality, and the Growth of Scientific Knowledge." In [8], pp. 215-250.
[8] Popper, K. Conjectures and Refutations. London: Routledge and Kegan Paul, 1969.
[9] Salmon, W. The Foundations of Scientific Inference. Pittsburgh: University of Pittsburgh Press, 1967.
[10] Watanabe, S. "Pattern Recognition as an Inductive Process." In Methodologies of Pattern Recognition, S. Watanabe (ed.). Pp. 521-534.