By Parallel Reasoning
By Parallel Reasoning The Construction and Evaluation of Analogical Arguments
PAUL F. A. BARTHA
2010
Oxford University Press, Inc., publishes works that further Oxford University’s objective of excellence in research, scholarship, and education.
Oxford New York
Auckland Cape Town Dar es Salaam Hong Kong Karachi Kuala Lumpur Madrid Melbourne Mexico City Nairobi New Delhi Shanghai Taipei Toronto
With offices in Argentina Austria Brazil Chile Czech Republic France Greece Guatemala Hungary Italy Japan Poland Portugal Singapore South Korea Switzerland Thailand Turkey Ukraine Vietnam
Copyright © 2010 by Oxford University Press, Inc.
Published by Oxford University Press, Inc. 198 Madison Avenue, New York, New York 10016 www.oup.com
Oxford is a registered trademark of Oxford University Press
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior permission of Oxford University Press.
Library of Congress Cataloging-in-Publication Data
Bartha, Paul F. A. By parallel reasoning : the construction and evaluation of analogical arguments / Paul F. A. Bartha. p. cm. Includes bibliographical references and index. ISBN 978-0-19-532553-9
1. Analogy. 2. Reasoning. I. Title.
BD190.B37 2009 169—dc22 2008032836
9 8 7 6 5 4 3 2 1 Printed in the United States of America on acid-free paper
Acknowledgments
This book has taken shape over many years, during the course of which many people have made significant contributions. Jeff Horty, Andrew Irvine, and James Kelleher read the entire work and offered many valuable suggestions. Other colleagues and graduate students have provided insightful comments and criticism or pointed me to interesting examples: John Beatty, Darren Bradley, Roger Clarke, Emma Esmaili, Chris Hitchcock, Richard Johns, Sonia Memetea, Margaret Schabas, and Steve Savitt. I would like to thank Keith Holyoak for encouragement and some excellent suggestions relating to chapters 4 and 8, and Mike Rivers-Bowerman for useful criticism of chapter 6. I deeply appreciate the encouragement of the editor at Oxford University Press and the terrific suggestions of two anonymous reviewers; those suggestions significantly improved the book. I owe a special debt of gratitude to Kevin Ashley, Nuel Belnap, Bruce Buchanan, Richard Gale, Ken Manders, and above all, Wesley Salmon for inspiring and guiding the original dissertation project from which this book emerged. Finally, my deepest and everlasting gratitude, for unwavering patience, support, and encouragement, goes to my parents, Peter and Jeannie, and to my wife, Patricia.
Preface
In 1966, Mary Hesse published Models and Analogies in Science. In a set of five essays, she raised fundamental questions about the importance of analogies in scientific thought. Are analogies needed to understand theoretical concepts, and are they indispensable to the advancement of science? What criteria separate good analogical arguments from bad ones? What philosophical justification can be provided for reasoning by analogy? What does Aristotle have to tell us about the role of analogies in predication? Is there a connection between metaphor and scientific explanation?

In offering answers to these questions, Hesse advanced our understanding of two aspects of the traditional “problem of analogy”: the inferential aspect, which concerns analogical arguments; and the semantic aspect, which has to do with the role of analogy (and metaphor) in the development of concepts. Previous thinkers had considered both issues, but Hesse was the first to work out detailed arguments that they were linked.

During the past twenty-five years or so, research on analogy has been transformed by the efforts and interests of AI researchers, psychologists, and cognitive scientists. There are now a large number of computational theories of analogy, implemented in computer programs that perform analogical reasoning and related tasks. The introduction to The Analogical Mind, a recent anthology of “state-of-the-art” essays (Gentner, Holyoak, and Kokinov 2001), gives a good idea of how cognitive scientists now understand the central issues concerning analogy. These issues are structured around a widely accepted decomposition of the “process of analogical thinking” into four steps:¹

• Retrieval or access of a relevant “source” analog
• Mapping that sets up systematic correspondences between the elements of the source and “target” analogs
• Analogical inference or transfer of information from source to target
• Learning of new categories or schemas in the aftermath of analogical reasoning

1. See also Holyoak and Thagard (1989), which lays out a similar decomposition.

Many researchers hope to offer a psychologically realistic model of one or more of these stages that is, at least in principle, capable of being translated into a computer program. The larger objective is to understand what people do at each of these stages when they use analogies and how they do it. Subsidiary objectives include the exploration of technical difficulties in implementing these ideas in computer programs, the application of analogy to tasks besides reasoning and problem-solving, and understanding how analogy is integrated into larger scale cognitive processes (Forbus 2001).

In short, the predominant concern in current research is to model analogy as a cognitive process. This has led to a climate in which there is little room for Hesse’s second and third questions—those concerned with the nature and justification of analogical reasoning. While there are now many models of analogical reasoning, there are few models for the critical evaluation of analogical reasoning. To be sure, cognitive scientists stress the need for evaluation at the analogical transfer stage, but that often amounts to checking that the models produce plausibility judgments that are psychologically realistic. The legitimacy of these plausibility judgments is not questioned further.

Yet we surely need to think about Hesse’s questions. Substantive computational models of analogical reasoning obviously embody normative principles, and those principles should be subjected to criticism.² A computer program might be eerily successful in duplicating human reasoning, yet produce flawed analogical arguments—just as humans do.
Or it might produce excellent results, in which case we would want to understand the underlying ideas and their philosophical basis.

2. Chapter 3 examines a selection of computational models, with a view to isolating and evaluating the tacit normative theories. In particular, that chapter examines the principle of systematicity employed by most current computational models.

By way of comparison, suppose that we want to understand the psychological processes people employ to come up with mathematical proofs. We write a computer program that breaks these processes into stages and duplicates human reasoning at each stage. That is a significant accomplishment. But we still demand a logic that discriminates between good proofs and bad ones. Somehow, when we turn to the literature on analogical reasoning, we find that this demand has all but disappeared. That is surprising, in light of empirical evidence that shows how easily people can be misled by analogical arguments.³

3. See Forbus, Gentner, and Law (1995) and Gick and Holyoak (1980, 1983) for evidence that people regularly focus on superficial similarities when evaluating analogies.

One reason for the fading urgency of the traditional questions is the growing conviction that much of what takes place under the heading “analogical reasoning” is not really inference but rather something akin to perception. The Analogical Mind (Gentner, Holyoak, and Kokinov 2001) includes essays on analogical thinking in young children and nonhuman primates. In an epilogue to the book, Hofstadter writes that reasoning and problem-solving lie “far indeed from the core of human thought,” and comments:

If analogy were merely a special variety of something [reasoning] that in itself lies way out on the peripheries, then it would be but an itty-bitty blip in the broad blue sky of cognition. To me, however, analogy is anything but a bitty blip—rather, it’s the very blue that fills the whole sky of cognition—analogy is everything, or very nearly so, in my view. (Gentner, Holyoak, and Kokinov 2001, 499)

These sentiments, and the more specific thesis that analogy-making is a process much like perception, are defended in Hofstadter’s (1995) book, Fluid Concepts and Creative Analogies. While I am sympathetic to much of what he says, I nevertheless believe in the importance of distinguishing between the process of recognizing or creating an analogy and its critical assessment.

Analogies play a vast and mysterious part in human thought. We rely on them when we identify complex patterns, bring past experience to bear on a problem, or strive to attain a perspective that unifies different areas of inquiry. There are plenty of big questions, many of them opened up by recent research. How do humans construct analogies? Do nonhuman animals use analogies in a similar way to humans? What role does analogy have in defining and understanding concepts? What is the relationship between analogies and metaphors? What prospect do computational models of analogy have for illuminating how humans reason (or ought to reason) by analogy? Are the analogies employed in scientific reasoning different from those we find in other kinds of inquiry?
In this book, I focus on a single thesis: good analogical arguments are an important means of establishing the plausibility of scientific hypotheses. I concentrate on two questions: (1) what criteria should we use to evaluate the analogical arguments used in science, and (2) how can we provide a philosophical justification for those criteria? These questions relate to only one stage in the reasoning process and are directed toward the specialized analogies that we find in science.

Yet can we really hope to answer such questions in isolation from a broader theory of analogy-making? The theory of conceptual metaphor, presented in George Lakoff and Mark Johnson’s Metaphors We Live By (1980), suggests a negative answer. According to the theory of conceptual metaphor, the source domain in any metaphor (or analogy) must always be some element of “direct physical experience” that can be grasped nonmetaphorically because it comes from our everyday activities as embodied human beings living in a particular culture. Typical examples include journeys (“Life is a journey”), spatial orientation (“He’s at the peak of his career”), and war (“He attacked every weak point in my argument”). In a recent application of Lakoff and Johnson’s theory to metaphors in science, Theodore Brown (2003) embraces their view that all metaphors must employ a source domain based upon everyday experience. He believes that this idea has significant potential for bridging the gap between scientist and nonscientist:

When we recognize that scientific reasoning is based on the same kinds of thought processes used in other arenas of thought, that scientists are constrained in their attempts to read nature by the same embodied and social understandings that everyone uses to get along in life, science is really not so mysterious after all. (2003, 196)

My view is that we need not be so restrictive about the choice of source domain. Metaphors based on “direct physical experience” are important in science, but so are sophisticated analogies that are drawn directly between two areas of scientific inquiry. I shall argue that a model of analogies within science, reasoning from one scientific domain to another, is both possible and valuable. I hope to show that this type of analogical reasoning is actually simpler in some respects than many of the examples that interest Lakoff and Johnson. There is room for a logic of analogical reasoning in which psychology plays only a limited role. Of course, we ultimately desire a broad theory of analogical reasoning, but my contention is that scientific analogies provide an excellent setting in which to start developing such a theory.
Contents
1. Analogical Arguments
2. Philosophical Theories
3. Computational Theories
4. The Articulation Model
5. Analogies in Mathematics
6. Similarity and Patterns of Generalization
7. Analogy and Epistemic Values
8. Analogy and Symmetry
9. A Wider Role for Analogies
Bibliography
Index
1 Analogical Arguments
An analogy is a comparison between two objects, or systems of objects, that highlights respects in which they are thought to be similar. Analogical reasoning is any type of thinking that relies upon an analogy. An analogical argument is an explicit representation of analogical reasoning that cites accepted similarities between two systems in support of the conclusion that some further similarity exists. This book proposes a theory of analogical arguments, concentrating on analogical arguments in science and mathematics. In this introductory chapter, I first explain why understanding such arguments is an important part of understanding analogical reasoning in general. I then provide an overview of my theory.
1.1 Magic, Science, or Common Sense?

Analogical reasoning is ancient and mysterious. Odysseus used it to interpret an omen from the gods, in an effort to rally the wavering Greek forces at Troy. Reminding the Greeks of an occasion when they had witnessed a serpent devouring eight young sparrows and their mother and then turning to stone, Odysseus argued that this was a sign of ultimate victory. The Greek army would prevail against Troy after nine long years of war (Homer Iliad 2.300). Many cultures have relied upon analogy in magical rites, particularly in the practice of homeopathic medicine. For example, Evans-Pritchard (1937) reported that the Azande treat elephantiasis by making incisions in the leg and rubbing in ashes from a burnt piece of elephant’s leg. The frequent affiliation between analogy and magic has prompted one anthropologist to suggest that the use of analogy is a distinctive feature of “nonscientific thought” (Lienhardt 1954, 106).

Yet the use of analogy, from antiquity onwards, has also been a distinctive feature of scientific thought. Analogical reasoning has played many important roles in the evolution of science. It is a powerful heuristic tool, aiding in the formulation of new theories. According to Priestley, a pioneer in chemistry and electricity, “analogy is our best guide in all philosophical investigations; and all discoveries, which were not made by mere accident, have been made by the help of it” (1769/1966, 14). One famous example is Maxwell’s discovery of the displacement current, around 1860, by modeling electromagnetic phenomena with a mechanical configuration of rotating vortices and idle wheels. In mathematics, too, striking breakthroughs have involved analogies between such diverse fields as logic and topology (Grosholz 1985).

Analogies also have a predictive use. When two physical systems are described by laws with identical mathematical form, one system can be used to model the other. For instance, hydrodynamic analogies, first pointed out by Kelvin, exploit mathematical similarities in the equations governing ideal fluid flow and torsional problems. The equations determining the stress function for a twisted bar are identical to those for determining the velocity of a viscous fluid moving along a tube with the same cross-section as the bar. To determine the stress in a planned structure in a relatively economical way, you can construct an appropriate fluid model and measure the fluid’s velocity.¹ This predictive role also exists for analogies that are not mathematically precise, as when an animal model is used to predict how a new drug might affect a human.

1. This example is discussed in Timoshenko and Goodier (1970). With the availability of powerful computational tools, analog models such as this are less important today than in the past.

Yet another use for analogies in science and mathematics is to pave the way for conceptual unification. Eighteenth-century analogies among “electrical fluid” in the laboratory, lightning, and “animal spirits” led to the gradual recognition that these were a single kind of “substance” (Pera 1992). In the nineteenth century, similarities between “substitution groups” (today’s permutation or symmetric groups) and a class of transformations on crystal structures led to the contemporary concept of a group (Kline 1972).

Finally, and overlapping with each of the above three uses, analogical reasoning appears to play a role in justification. Analogies are employed in all sciences to demonstrate the plausibility of hypotheses. In fields such as archaeology, where we lack direct means of testing, they may provide the strongest form of support available.

When we claim this justificatory role for analogical reasoning in science, however, we come up against three problems. The first is that even within science, many examples of analogical reasoning do seem like magic. Steiner (1998) argues that a series of deeply mysterious analogies played a fundamental role in breakthroughs in nineteenth- and twentieth-century physics. In his view, we cannot explain why these spectacular cases of analogical reasoning had the slightest plausibility, let alone why they succeeded in advancing physics. We find equally striking examples in other sciences. A pharmacologist in 1934 noticed that ingesting the compound meperidine, now known as Demerol, induced an S-shaped tail curvature in mice. Since morphine produced the same effect, he reasoned—correctly, as it happens—that meperidine might also be an effective painkiller (Reynolds and Randall 1975, 273). Science or magic? Our intuitions about such cases are not always clear.

The second problem is that the contribution of analogical reasoning to justification, although taken for granted, has rarely been subjected to extensive philosophical examination. People may be good at identifying the strengths and weaknesses of particular analogical arguments, but there have been few systematic efforts to distinguish between good and bad analogies in scientific reasoning or to explore the philosophical foundations for this form of justification. We have no substantive normative theory of analogical arguments.

The third problem, specific to the philosophy of science, is to find a way to incorporate analogical reasoning within a model of theoretical confirmation. Analogies join symmetry arguments, thought experiments, and other forms of reasoning that seem to be important in the history of science, yet cannot easily be reconciled with dominant philosophical views about confirmation.

Although focusing on scientific examples brings these three problems into sharp focus, very similar difficulties apply to “common sense” analogical reasoning and its application to nonscientific disciplines such as the law and philosophy. Here, too, we encounter mysterious cases, a lack of clear guidelines, and difficulty in formulating a coherent model of justification that accommodates the use of analogy. Analogical reasoning appears to have a common structure, and to face at least some common problems, across a variety of highly diverse settings.

If we accept these preliminary observations, then we should acknowledge two further points.
First, we should strive for a theory of analogical reasoning that combines broad unifying principles with means for adapting those principles to the diverse settings in which analogical reasoning is employed. The three problems noted above should be solved at a general level, but an adequate theory of analogical reasoning also needs to be able to address the details that emerge in specialized applications. Second, in developing a theory of analogy, we have to make a wise decision about where to start. Our focus should be on examples that can lead us to a theory that has both depth and generality.

There can be more than one good starting point. Much of the research on analogical reasoning over the past two decades has been oriented toward the development of computational models, and psychologically realistic models, of “common sense” analogical reasoning. I shall take a different approach. I see analogical arguments in mathematics and science as key or “leading” special cases that provide an excellent basis for a general normative theory.

My theory of analogical arguments rests on these two ideas. The theory combines two broad and generally applicable principles with detailed models for evaluating different forms of analogical arguments, mainly in mathematics and science. In this introductory chapter, section 1.2 explains why my theory concentrates on analogical arguments, and section 1.3 motivates the further restriction (for much of the book) to analogical arguments in mathematics and science. To set the stage, sections 1.4 through 1.6 characterize the argument form, discuss the meaning of plausibility, and review commonsense ideas about analogical arguments. The remaining sections of the chapter provide an overview of my theory.
1.2 Why Focus on Analogical Arguments?

An analogical argument is an explicit form of analogical reasoning, with premises and a conclusion. Why, if our ultimate goal is to understand analogical reasoning, should we focus our attention on analogical arguments? The basic reason is simple. Since my main objective is to understand how analogical reasoning can support or justify a conclusion, it is natural to represent this reasoning in the form of an argument. Yet there are at least two objections to narrowing our topic in this way.

First, reasoning is broader than argumentation. An argument is a precise, (usually) verbal reconstruction of earlier thinking that is often more like visual perception than propositional reasoning. This point is expressed vividly in many accounts of the process of discovery.² Explicit arguments come after the fact and may omit important elements of the reasoning process.

2. Hadamard’s (1949) monograph on the psychology of discovery in mathematics is a good example. Work on mental models and visualization, in connection with thought experiments, stresses these points (Arthur 1999, Gooding 1993, Nersessian 1992, 2002).

Second, even if we restrict our attention to arguments, analogical reasoning (broadly speaking) includes inferences that combine analogical arguments with other forms of argument. In tax law, for instance, the applicability of a predicate such as “principal residence” is determined in part by analogy to past cases and in part by statutory definitions (Sanders 1991). As a second example, we might imagine that the Rutherford model of the atom, which posits electrons orbiting around a positive nucleus, gains some plausibility from its analogy to the solar system.³ Any support the model might gain from the analogy is defeated, however, by its inconsistency with accepted physical theory. Orbiting electrons would radiate energy; the model is unstable. The worry here is that if we examine analogical arguments in isolation, we lose sight of the larger reasoning process in which such arguments are only part of the story.

3. The “solar system” analogy does not appear to have played any role in Rutherford’s thinking, even though it is often invoked to explain his model.

Both concerns are legitimate, yet neither undermines the importance of a good normative theory of analogical arguments. Let’s consider them in turn.

Good Analogical Reasoning Should Be Capable of Reconstruction in Argument Form

My response to the first objection is modeled on Norton’s critique of philosophical theories that treat thought experiments as visual, or nonpropositional, forms of reasoning. One of the most well-known thought experiments is Galileo’s refutation of Aristotle’s position that heavy bodies fall faster than light ones. Galileo considers what will happen when a heavy cannon ball is attached by a thread to a light musket ball. Aristotle’s position leads us to conclude that the combined object falls both faster than the cannon ball (since it is heavier) and slower than the cannon ball (since the musket ball will act as a drag). Without doing the experiment, we may conclude that something is wrong with Aristotle’s theory.

Because thought experiments take place in the “laboratory of the mind,” there is considerable debate about how to understand them. Norton’s position is that all thought experiments are arguments (Norton 1991, 2004). Actually, Norton distinguishes two versions of the “argument thesis.” The weaker version asserts that thought experiments “can be reconstructed as arguments” and “belief in the outcome-conclusion of the thought experiment is justified only insofar as the reconstructed argument can justify the conclusion.” The stronger thesis asserts that the “actual conduct of a thought experiment consists in the execution of an argument” (2004, 50). In the case of analogical reasoning, I want to defend only the analog of the weak thesis: a conclusion reached via analogical reasoning is justified only insofar as a reconstruction of the reasoning as an analogical argument can justify that conclusion.

Norton defends his two theses about thought experiments both on general philosophical grounds and by example. I hope that the many examples in this book will persuade the reader that an excellent way to evaluate analogical reasoning is indeed by reconstructing it in argument form. I do want, briefly, to advance one philosophical argument for this thesis. In the process of presenting this argument, I shall introduce a rhetorical device that will be useful throughout the book.

The philosophical argument is based on the assumption that justification for analogical reasoning, or at least the sort of justification that is of primary interest, should be public. It should be based on communicable experiences, models, and assumptions.
This requirement certainly supports the thesis that justifiable analogical reasoning is capable of representation in argument form. It does not rule out the inclusion of visual information, such as diagrams, in the argument.

The rhetorical device is to imagine that the analogical reasoning is presented by an enthusiastic advocate to a polite but moderately skeptical interlocutor, the critic. The reasoning succeeds if it survives the critic’s scrutiny. The framework of advocate and critic helps to set a standard of justification that can be varied to reflect the demands of different settings. It also provides a vivid way to appreciate the requirement of publicity. Finally, it gives us a way to temper the “argument thesis”: the analogical argument that reconstructs a piece of analogical reasoning never is, and need not be, a complete explication of that reasoning. The critic and the advocate can share many unstated assumptions. The reconstruction of the reasoning as an argument need only be good enough to satisfy the critic.

In fact, I think that the point of articulating an analogical argument in precise form is not just to persuade others to accept a view that you have already accepted on independent grounds. The exercise helps you to check your own reasoning; you can be your own critic. What is more, as we shall see, making the reasoning explicit can aid in the creative process by bringing out weaknesses and suggesting remedies.
Recent work on computational models poses a challenge to my thesis about analogical arguments, to the extent that the computer programs are conceived as carrying out good analogical reasoning. In many computational models, analogical reasoning is implemented by processes that look nothing like arguments. Yet the processes are arguably public, since the algorithms and the data are open to inspection. I postpone my response to this challenge until my discussion of computational theories in chapter 3, and again in the concluding chapter.
Analogical Arguments Can Be Considered in Isolation from Other Arguments

My response to the second objection is that while some “outside” factors should be incorporated into our assessment of an analogical argument, others may be set aside. In order to defend this position, we need a preliminary discussion of the aim of analogical arguments. In my view, the fundamental conclusion of an analogical argument is that a hypothesis is plausible. Section 1.5 discusses this idea in detail, but here we need a preliminary sketch. Salmon characterizes plausibility (and plausibility arguments) as follows:

Plausibility arguments serve to enhance or diminish the probability of a given hypothesis prior to—i.e., without reference to—the outcome of a particular observation or experiment. They are designed to answer the question, “Is this the kind of hypothesis that is likely to succeed in the scientific situation in which the scientist finds himself or herself?” (1990, 182)

To illustrate, Salmon cites an analogical argument. A study on rats established a significant statistical association between high saccharin consumption and bladder cancer.⁴ Salmon suggests that the study “lends considerable plausibility to the hypothesis that use of saccharin . . . increases the risk of bladder cancer in humans” (1990, 182).

4. The study is cited in Giere (1991).

Judgments of plausibility are made after a hypothesis has been formulated, but prior to testing. In addition to analogy, Salmon mentions other considerations that influence our assessment of plausibility: compatibility with accepted theories, simplicity, competence (of the person advocating the hypothesis), and a few additional criteria. One of these factors—compatibility with accepted theories—can and should be incorporated into our assessment of an analogical argument because, as we saw in the Rutherford/Bohr example, incompatibility will defeat a conclusion about plausibility. The other considerations, however, act not as outright defeaters but as factors that influence overall degree of plausibility.

There can be different models for assessing the overall degree of plausibility of a hypothesis. You might employ a weighting function that combines different components: analogical support, simplicity, and so forth. Alternatively, if you interpret plausibility judgments as probabilistic claims, then you might reach an overall judgment via successive acts of conditionalization that take different factors into account. Whether we adopt one of these models or some other approach, we seem to require means of assessing the independent contributions that each factor or component makes to plausibility, even though these contributions are combined in a single judgment.

One way to appreciate the preceding argument is to employ the rhetorical device of advocate and critic. It is legitimate for the critic to require compatibility with widely shared background assumptions—presumably they are shared by both advocate and critic. Accepted scientific theory is part of this shared background, but other considerations, such as simplicity and competence, are not. They are interpreted differently and given different weights by different individuals.⁵ Thus, it is important to evaluate the contribution of each component, including analogical arguments, independently.

One clarification is in order. In cases involving multiple analogical arguments, we can’t always consider each individual argument in isolation. Sometimes one analogical argument can defeat another. This happens regularly in legal reasoning. Case A is more “on point” than case B, roughly speaking, if the relevant resemblances between case B and the case at hand are a proper subset of the relevant resemblances between case A and the case at hand. In this situation, again roughly speaking, case B becomes an irrelevant precedent.⁶ The issue of reasoning with multiple analogical arguments will be taken up in chapter 4.

My conclusion, then, is that we can and should evaluate analogical arguments independently of other arguments, provided that our evaluation takes into account consistency with accepted theory and other relevant analogical arguments.
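The rough “on point” relation between precedents described above can be pictured as a proper-subset test on sets of resemblances. The following Python sketch is offered only as an illustration under simplifying assumptions: the function names and the resemblance labels are hypothetical, and the sketch ignores relevant differences, the authority of the courts, and the other complexities the text sets aside.

```python
# Illustrative sketch only: all names and sample data here are hypothetical.

def more_on_point(resemblances_a, resemblances_b):
    """Return True if precedent A is more "on point" than precedent B.

    Each argument is the set of relevant resemblances between a precedent
    and the case at hand. A is more on point when B's resemblances form a
    proper subset of A's.
    """
    return set(resemblances_b) < set(resemblances_a)


def undefeated(precedents):
    """Return the names of precedents that no other precedent outranks."""
    return sorted(
        name
        for name, features in precedents.items()
        if not any(
            more_on_point(other, features)
            for other_name, other in precedents.items()
            if other_name != name
        )
    )


# Hypothetical resemblances between three past cases and the case at hand.
precedents = {
    "case_a": {"owner_occupied", "single_dwelling", "used_all_year"},
    "case_b": {"owner_occupied", "single_dwelling"},
    "case_c": {"used_all_year", "rented_in_part"},
}

print(more_on_point(precedents["case_a"], precedents["case_b"]))
print(undefeated(precedents))
```

In this toy data, case_b drops out because case_a shares everything case_b shares and more, while case_a and case_c remain because neither set of resemblances contains the other. The relation is only a partial order, so several precedents can survive together.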
1.3 Why Focus on Science and Mathematics?
Analogies are important in many activities, from political speeches (such as Odysseus’ harangue) to philosophical reasoning. They are vitally important in legal reasoning, where similarities are drawn between a case under discussion and previously decided cases. Reasoning by analogy is also part of everyday life: we use it all the time to solve problems and make plans. As scientific thinking is continuous with commonsense reasoning, it might seem that the best way to develop a general theory of analogical reasoning is to start with examples from ordinary life. Despite the appeal of this approach, I am convinced that it is best to begin with analogies in science and mathematics. I have three principal reasons for this concentration.
5. These points are made with great clarity in Kuhn (1977).
6. “Roughly speaking” because we are not taking into account complexities such as the nature of the similarities, the existence of relevant differences, the relative authority of the courts, the relative ages of the precedents, and so forth.
A Rich Set of Intuitively Plausible Examples
In the first place, science provides a rich and varied set of analogical arguments. Many of them are controversial, but many nontrivial examples have great intuitive plausibility. To take one instance, the three medians of a triangle have a common intersection point. We might conjecture, by analogy, that the four medians of a tetrahedron have a common intersection point.7 This conjecture appears intuitively plausible, and it turns out to be true. A second example, cited above, is the inference that if high saccharin consumption causes cancer in rats, it is plausible to expect the same thing in humans. A good account of analogical arguments should be able to explain the appeal of such examples.
By contrast, nontrivial analogical arguments in other settings, such as the law or philosophy, are typically controversial. Consider the traditional philosophical argument for the existence of other minds. In simplest form, the argument runs thus: just as my behavior is associated with conscious mental activity, so, too, it is probable that similar behavior in others is associated with similar conscious mental activity.8 It sounds like straightforward commonsense reasoning. Yet, led by Wittgenstein, mid-twentieth-century philosophers such as Strawson (1959) and Malcolm (1963) utterly rejected this argument. Recently, updated versions are back in circulation.9
The absence of a large body of commonly accepted nontrivial analogical arguments in philosophy, or in everyday reasoning, marks an important difference between this argument form and simpler types of inductive inference. We more or less “get it right” in ordinary enumerative induction: the sun will rise tomorrow, bread will continue to be nourishing, a flame will always burn our finger. By contrast, there is no stable practice of everyday analogical reasoning.
Models of Relevance
Analogical arguments are based on relevant similarities between two systems. A second reason for limiting our attention to science is that, within the sciences, we have a decent grasp of many different types of relevance relation. We have models for causal, explanatory, logical, and statistical relevance, which constitute the relationships at the heart of most scientific analogies. Although our models may be tentative, they help us to determine which similarities and differences are relevant to an analogical argument. By contrast, we have no clear models for many types of relevance that are important in nonscientific analogies.
7. A median of a triangle is a line joining a vertex to the midpoint of the opposite side. The point of intersection of the three medians is called the centroid of the triangle. In a tetrahedron, the medians are the four lines joining each vertex with the centroid of the opposite face.
8. See, for instance, Russell’s (rather weak) endorsement of this argument in his work (1914, 103) or Mill’s version (1979).
9. See Heal (1998) and Sober (2000).
Similarity and Open-textured Predicates
As for relevance, so too for similarity: we have clear models and criteria for evaluating similarity in the sciences, more so than in other arenas where analogical arguments are employed. That provides a third reason to focus our attention on analogical arguments in science. This claim merits clarification and discussion.
A judgment of similarity is a judgment that two things are the same in some respect. It is commonly expressed by applying a single predicate to both things. Any such judgment can prompt a reasonable demand for justification if the application of the predicate is not a matter of definition or routine. In such a case, we are dealing with an open-textured predicate. Typically, the decision about whether the predicate applies involves a nontrivial comparison to paradigm cases or prototypes. There will be “easy cases” where there is general agreement that the predicate does or does not apply, and “hard cases” where applicability is open to debate.
The problem of open-textured predicates is familiar in legal reasoning, since most legal predicates are open-textured.10 To show that a predicate such as “unforeseeable circumstance” or “valid contract” is appropriate, a lawyer must argue that the case at hand bears a relevant resemblance to past cases where these predicates were applied. Open-textured predicates can lead to multiple layers of analogical reasoning. Suppose that an advocate in a civil law case wants to demonstrate the existence of a valid contract between the plaintiff and the defendant. Since an offer and an acceptance are prerequisites for a valid contract, subsidiary analogical arguments might be needed to establish that an offer was made and accepted.
Showing that all of these legal predicates apply to the case at hand might in turn be part of a larger argument, citing further precedents, that despite the existence of a valid contract, the defendant should not be penalized for failing to meet her obligations because of an unforeseeable disaster (e.g., a snow storm in Los Angeles).
Our theory of analogical arguments will be greatly simplified if we can set aside such multilayered arguments. My contention is that we can do so for most analogical arguments in the sciences. Specifically, the basic similarities (those upon which the analogical argument is founded) are not treated as open-textured in the context of that argument. These basic similarities may be established by formal definitions or by a widely accepted classification scheme. My main point is that they do not require subsidiary analogical justification as part of the argument.
10. Open-textured predicates in the law are discussed in Hart (1961, 121–32). A good recent discussion from the cognitive science perspective is Sanders (1991).
To be sure, there are open-textured predicates in science. Many people believe that the extension of kind terms is determined (at least in part) by reference to paradigm examples (Kripke 1972, Putnam 1975, Kuhn 1979). Hesse (1966) argues that there can be theoretical concepts whose very meaning appears to be determined by analogy or metaphor. One example is the ether, the medium in which light waves were once thought to propagate. Huygens (1690/1962) argued that light required a material medium by analogy with the fact that this was true of every other known wave motion.11 The meaning of the term “ether,” in this context, seems just to be that which is to light as air or water is to sound. There is an abundant literature arguing for the importance of metaphor in science, and metaphorical concepts are certainly open-textured because they rely upon analogy.
Despite the existence of open-textured predicates, individual analogical arguments in the sciences do not appear to involve multiple layers of analogical reasoning. We seldom see analogical arguments that depend upon similarities that must themselves be established and justified analogically. Rather, analogical arguments in science depend upon basic similarities that are supposed to be unproblematic. The relevant concepts have a clear meaning over a range of cases including those under discussion; they are not open-textured in the context of that argument.
By contrast, it is not uncommon for the conclusion of an analogical argument in science to extend an open-textured predicate to a new case. That is what happens when we argue by analogy for the existence of the ether as the medium in which light travels. In chapter 6, I discuss a series of analogies used to extend the concept of electricity from laboratory phenomena to lightning, to nerve impulses in animal motion, and eventually to molecules and atoms.
Very similar observations apply to mathematics. There are open-textured mathematical concepts. Lakatos (1976) famously reconstructs the reasoning that leads us to include or exclude certain objects from the category of regular polyhedra. Here, too, open-textured concepts are the ones under investigation. They figure in the conclusions of analogical arguments.
The starting similarities, however, are independently acceptable and the relevant predicates are not treated as open-textured.
Although this claim about analogical arguments rests mainly on examples, there is one general observation in its favor. As in section 1.2, let us distinguish between individual analogical arguments, which draw upon just one case, and multiple analogical arguments, based upon two or more cases. The observation is that a great deal of analogical reasoning in science involves individual arguments. Where there is no rich “library” of cases, there is no prospect of treating open-textured predicates as they are handled in the law. For individual analogical arguments, at least, the basic similarities cannot be justified by subsidiary analogical reasoning.
The upshot, if this claim about analogical arguments in science is valid, is that there is a clear advantage to beginning our study with scientific analogies. We can avoid the multilayered analogical arguments that we encounter in the law. What is more, even if this assumption is something of an idealization, it is a useful one. A first step in understanding multilayered analogical reasoning is to have a clear grasp of how a single layer of analogical argument functions. This point is acknowledged in the artificial intelligence (AI) literature on open-textured predicates in the law.12
11. The postulate of an ethereal substance goes back as far as Aristotle, who also relied on the analogy between light and sound (De Sensu 439a19).
12. Sanders (1991) notes that Ashley’s HYPO program (discussed in chapter 3) uses cases “to interpret a single open-textured predicate, ‘trade secret violation.’ ” In Ashley’s program, the legal predicates that enter into analogical arguments about trade secrets do not themselves require analogical support.
Objection: Analogies Generate Self-perpetuating Similarities
The foregoing discussion represents an analogical argument as proceeding from a set of accepted, or preexisting, similarities. A natural objection is that, in practice, analogies generate self-supporting similarities. For instance, Stepan (1996) suggests that nineteenth-century metaphors and analogies between race and gender made it possible for scientists to “see” similarities between (in their view) the “inferior” gender, female, and “inferior” nonwhite races. These “similarities,” which would not ever have been acknowledged without the background analogy, extended well beyond social inferiority to encompass physical and moral characteristics, including such far-fetched attributes as “susceptibility to disease.” Generalizing from this example, Stepan writes:
Because a metaphor or analogy does not directly present a pre-existing nature but instead helps “construct” that nature, the metaphor generates data that conform to it, and accommodates data that are in apparent contradiction to it, so that nature is seen via the metaphor and the metaphor becomes part of the logic of science itself. (1996, 133)
This role of a metaphor in the “logic” of science, she suggests, comes close to that of a Kuhnian paradigm. Such metaphors may prove to be helpful or they may (as in her example) prove to be harmful, but in any case they are indispensable. Essentially the same point about indispensability is made by Lakoff and Johnson:
The IDEAS ARE FOOD metaphor establishes similarities between ideas and food. Both can be digested, swallowed, devoured, and warmed over, and both can nourish you. These similarities do not exist independently of the metaphor. The concept of swallowing food is independent of the metaphor, but the concept of swallowing ideas arises only by virtue of the metaphor. (1980, 147–48)
For Lakoff and Johnson, some similarities are independent of metaphor, others are based on conventional metaphors that structure our everyday thinking (such as IDEAS ARE FOOD), and still others are created when novel metaphors build upon conventional ones. Stepan’s race/gender example is more pointed, however, not only because the metaphor is pernicious but also because it builds only upon itself rather than upon other independent metaphors.
It is important here to distinguish between the broad role of analogy, in providing an underlying metaphor for a sustained research program, and the narrow role of grounding an individual analogical argument. Stepan’s allusion to Kuhn is appropriate, since the ideas Kuhn introduced in The Structure of Scientific Revolutions help to make this distinction clear. An underlying metaphor can function, like a paradigm, as a constitutive element in scientific work. Metaphors can provide models and images that shape our perception of phenomena, and they can establish a grip that is difficult to shake. Individual analogical arguments, by contrast, are less grand. In Kuhnian terms, most of them extend an existing paradigm; they are employed in the context of normal science.
In light of this distinction, we can still maintain that individual analogical arguments in science depend upon previously accepted similarities. This is true even in Stepan’s example: accepted similarities form the basis for each extension of the analogy between race and gender. Similarities that may themselves at one time have been supported by analogical arguments become basic and are taken for granted when we turn to a new analogical argument.
These reflections, however, do not address Stepan’s deep concerns about the self-perpetuating nature of analogies. In her example, the narrow and broad roles for analogy are closely related. The underlying metaphor powers the entire series of individual analogical arguments that extend its reach. Even if the individual steps in the growth of the metaphor are technically noncircular, the process as a whole still looks suspicious. There is still a will to “see” similarities that reinforce and extend the metaphor. I agree with Stepan that metaphors and analogies can evolve in this way: sometimes for good, when stubborn adherence pays off, and sometimes for ill. The best response I can offer is this: all the more reason for wanting a good critical theory of analogical arguments.
Many current models of analogical reasoning are based on the idea that good analogical reasoning aims at maximizing the overall degree of similarity between two systems. Such models have very few resources to halt the self-perpetuating tendency that Stepan is worried about, since that tendency is regarded as the essence of analogical reasoning. Similarly, any model of analogical reasoning in which past “successes” of an analogy count as justification for any future use is vulnerable to the charge of self-perpetuation. The theory that I shall propose does not suffer from either of the two defects just noted. It does not generally take past successes of an analogy into account: each analogical argument is evaluated on its own merits. Nor does it treat justification as a by-product of maximal overall similarity. So there is some hope that my theory has the resources to arrest entrenched analogies that have gone astray. But Stepan’s example raises a further challenge. Suppose (unlikely as it may be) that we find a normative theory that does a perfect job of identifying strong individual analogical arguments. Can a series of such arguments, individually justified, eventually lead to an unreasonable conclusion? Must the weakness or inappropriateness of the underlying metaphor eventually manifest itself in some unacceptable individual analogical argument? For now, I leave these questions open. I hope to have said enough to motivate the goal of constructing a theory of individual analogical arguments.
1.4 Representation and Terminology
This section provides a general schema for representing individual analogical arguments, one that is meant to be broadly acceptable. We will see more sophisticated representations in later chapters, but this one suffices to introduce some basic notation and terminology that will be used throughout the book. As a first effort, an analogical argument has the following form:
(1) S is similar to T in certain (known) respects.
(2) S has some further feature Q.
(3) Therefore, T has the feature Q, or some feature Q* similar to Q.
Lines (1) and (2) are premises, while (3) is the conclusion of the argument. This argument form is inductive; the conclusion is not guaranteed to follow from the premises.
S and T are the source and target domains, respectively. A domain is a set of objects, properties, relations, and functions, together with a set of accepted statements about those objects, properties, relations, and functions. More formally, a domain consists of a set of objects and an interpreted theory about them. The language of the theory has symbols for the objects, properties, relations, and functions to which the theory refers. I shall adopt the convention of using unstarred symbols (a, P, R, f) for items in the source domain and starred symbols (a*, P*, R*, f*) for items in the target domain.
An analogy between S and T is a mapping φ between objects, properties, relations, and functions in S and those in T. Not all items in S and T need to be paired up. Commonly, the analogy mapping applies only to selected properties, relations, and functions, and not at all to particular objects (as in Huygens’s analogy, below). We say that relations R and R* correspond if φ(R) = R*, and similarly for properties and functions. There can be analogies in which several items in one domain correspond to a single item in the other. Most often, however, φ is one-to-one, and we normally limit our attention to such cases.
In practice, we omit all of these formalities and specify an analogy simply by indicating the most significant similarities and differences. Informally, the analogy mapping is extended to propositions by replacing terms pertaining to one domain with the corresponding terms that pertain to the other.13
Turning now to analogical arguments, we improve on our initial representation. We assume that the analogy relation is given. Hesse (1966) introduced what we shall call the tabular representation of an analogical argument.14 The idea is to place corresponding objects, properties, relations, and propositions side-by-side in a table of two columns, one for each domain. For instance, a simplified version of Huygens’s (1690/1962) argument that there must be an ethereal medium in which light travels, just as there is a material medium for sound, can be represented as follows:15
                       SOUND (S)                          LIGHT (T)
Known similarities:    echoes                             reflection
                       bending around corners             diffraction through slits
                       volume                             brightness
                       pitch                              color
Inferred similarity:   material medium (air or water)     ⇒ material medium (ether)
13. This is not always routine, but the characterization is adequate for present purposes.
14. A similar tabular representation appears in Holyoak and Thagard’s multiconstraint theory of analogical reasoning (1995), where correspondences are typically listed in order of increasing abstractness: first objects, then first-order predicates and relations, and then higher order relations.
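The analogy mapping, and its informal extension to propositions by term replacement, can be sketched in code. The following Python is a minimal illustration under stated assumptions: the dictionary `PHI`, the function names, and the representation of a proposition as a tuple of terms are hypothetical conveniences, not notation taken from Hesse or Huygens.

```python
# Hypothetical encoding of the analogy mapping for the simplified
# sound/light example: source items (keys) correspond to target items (values).
PHI = {
    "echoes": "reflection",
    "bending around corners": "diffraction through slits",
    "volume": "brightness",
    "pitch": "color",
}


def correspond(mapping, source_item, target_item):
    """True if the mapping sends source_item to target_item."""
    return mapping.get(source_item) == target_item


def is_one_to_one(mapping):
    """True if no two source items correspond to the same target item."""
    targets = list(mapping.values())
    return len(targets) == len(set(targets))


def translate(mapping, proposition):
    """Extend the mapping to a proposition (assumed here to be a tuple of
    terms) by replacing each mapped source term with its target counterpart.
    Unmapped terms are left unchanged."""
    return tuple(mapping.get(term, term) for term in proposition)


print(is_one_to_one(PHI))
print(translate(PHI, ("waves", "exhibit", "echoes")))
```

A dictionary can only send each source item to one target item, so this encoding already builds in part of the usual restriction; the one-to-one check covers the other direction.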
Hesse introduced some useful terminology based on this tabular format. The horizontal relations in an analogy are the relations of similarity (and difference) in the mapping between domains, while the vertical relations are those between the objects, relations, and properties within each domain. The correspondence between echoes and reflection is a horizontal relation; the relation between echoes and bending around corners (if any such relation exists) is vertical.
To complete our characterization of analogical arguments, we need one enhancement. We want to augment the tabular representation by adding a block that represents accepted (or known) differences. Although it is common for people to suppress points of difference between the domains being compared, I shall urge that some of them ought to be made explicit. For now, it suffices that this should be an option. To formulate the augmented representation, let’s introduce some additional terminology originally defined by Keynes (1921):
Positive analogy. Let P stand for a list of accepted propositions P1, . . . , Pn about the source domain S such that the corresponding propositions P1*, . . . , Pn*, abbreviated as P*, are all accepted as holding for the target domain T. P and P* represent accepted (or known) similarities. We refer to P as the positive analogy.
Negative analogy. Let A stand for a list of propositions A1, . . . , Ar accepted as holding in S, and B* for a list B1*, . . . , Bs* of propositions holding in T. Suppose the analogous propositions A* = A1*, . . . , Ar* fail to hold in T, and similarly the propositions B = B1, . . . , Bs fail to hold in S. We write A, ~A* and ~B, B* to represent accepted (or known) differences, and we refer to A and B as the negative analogy. Note that r and s can be 0.
Neutral analogy. The neutral analogy consists of accepted propositions about S for which it is not known whether an analog holds in T.
Finally we have:
Hypothetical analogy. The hypothetical analogy is simply the proposition Q in the neutral analogy that is the focus of our attention.
15. I have made slight modifications to Hesse’s representation.
Although these concepts involve cross-domain similarity relationships, the nature of the correspondence is irrelevant. Thus, it is convenient to define each concept as a set of propositions about the source domain. These concepts will play an important part in my theory. Here, they allow us to provide a general but compact schema for representing analogical arguments.
Schema (1.4.1) Augmented tabular representation
SOURCE (S)     TARGET (T)
P              P*                 [positive analogy]
A              ~A*                [negative analogy]
~B             B*                 [negative analogy]
Q              Q* (plausibly)
An analogical argument thus takes the form: It is plausible that Q* holds in the target because of certain known (or accepted) similarities with the source domain, despite certain known (or accepted) differences. I have introduced plausibility into the conclusion. Section 1.2 provided preliminary reasons for treating the conclusions of analogical arguments as assertions of plausibility. In order that our schema should be broadly acceptable, we interpret “plausibly” in (1.4.1) to mean “with some degree of support.” The next task is to refine our ideas about plausibility.
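One way to read schema (1.4.1) is as a sorting of propositions according to what is known about their analogs. The Python sketch below is illustrative only: the function `classify`, its parameters, and the use of plain string labels for propositions are assumptions made for the example. Following the convention stated above, every proposition is named in source-domain terms, and the two target sets record whether its analog is accepted or rejected in T.

```python
def classify(source_accepted, source_rejected, target_accepted, target_rejected):
    """Sort propositions into the blocks of the augmented tabular representation.

    All four arguments are sets of proposition labels stated in source-domain
    terms. The target sets contain the labels whose analogs are accepted
    (or rejected) in the target domain.
    """
    return {
        # P: holds in S, and the analog P* holds in T.
        "positive": source_accepted & target_accepted,
        # A: holds in S, but the analog A* fails in T.
        "negative_A": source_accepted & target_rejected,
        # B: fails in S, but the analog B* holds in T.
        "negative_B": source_rejected & target_accepted,
        # Holds in S; not known whether the analog holds in T.
        "neutral": source_accepted - target_accepted - target_rejected,
    }


# The simplified sound/light example, with no recorded differences.
sound = {"echoes", "bending around corners", "volume", "pitch", "material medium"}
light_known = {"echoes", "bending around corners", "volume", "pitch"}

blocks = classify(sound, set(), light_known, set())
print(blocks["neutral"])  # {'material medium'}
```

Here the single neutral proposition plays the role of the hypothetical analogy Q; in general, Q is whichever member of the neutral analogy the argument singles out.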
1.5 Two Conceptions of Plausibility
Plausibility is a complex notion. For one thing, plausibility seems to straddle the boundary between discovery and justification. To say that a hypothesis such as “Saccharin increases the risk of bladder cancer in humans” is plausible is to convey that it has epistemic support, that we have some reason to believe in it, even prior to testing. A second complication is that a plausibility judgment appears to have pragmatic connotations. To say that a hypothesis is plausible typically implies that we have good reason to investigate it (subject to the feasibility and value of investigation). Finally, there is a certain ambiguity as to whether an assertion of plausibility is categorical or a matter of degree. In my view, these observations point to the existence of two distinct conceptions of plausibility, both of which are important in analogical arguments.
The Probabilistic Conception (Degree of Plausibility)
On the probabilistic conception, plausibility is identified with reasonable credence or subjective degree of belief. We find this conception of plausibility explicitly in the earlier quotation from Salmon.16 We find it also in Polya, who writes: “We wish to use the calculus of probability to render more precise our views on plausible reasoning” (1954, 116). Polya goes on to specify a subjective reading of probability.
The probabilistic conception lets us represent many locutions about plausibility very naturally. For instance:
(High plausibility) It is highly plausible that p ⇔ p has a high probability;
(Comparative plausibility) p is more plausible than q ⇔ p has higher probability than q;
(Incremental plausibility) That argument/evidence makes p more plausible ⇔ That argument/evidence increases the probability of p.
The probabilistic conception has other advantages. It takes care of the idea that plausibility is a matter of degree. What is more, if we hold a probabilistic model of belief updating, then we have a ready-made connection between plausibility and confirmation, since subsequent testing serves to increase or decrease the probability of a hypothesis.
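The three locutions, and the link to belief updating, can be made concrete with a small numerical sketch. The Python below is an illustration under assumptions not found in the text: credences are plain floats, the cutoff of 0.9 for "high" probability is an arbitrary placeholder, and updating is modeled by Bayes' theorem for a single hypothesis and a single piece of evidence.

```python
def highly_plausible(prob_p, threshold=0.9):
    """High plausibility: p has a high probability (threshold is an assumed cutoff)."""
    return prob_p >= threshold


def more_plausible(prob_p, prob_q):
    """Comparative plausibility: p has higher probability than q."""
    return prob_p > prob_q


def conditionalize(prior, likelihood_if_true, likelihood_if_false):
    """Posterior probability of p after conditionalizing on evidence e,
    given P(e | p) and P(e | not-p), by Bayes' theorem."""
    evidence = prior * likelihood_if_true + (1 - prior) * likelihood_if_false
    if evidence == 0:
        raise ValueError("evidence has probability zero; cannot conditionalize")
    return prior * likelihood_if_true / evidence


def makes_more_plausible(prior, posterior):
    """Incremental plausibility: the evidence increases the probability of p."""
    return posterior > prior


# Made-up numbers, purely for illustration.
prior = 0.2
posterior = conditionalize(prior, likelihood_if_true=0.8, likelihood_if_false=0.4)
print(makes_more_plausible(prior, posterior))  # True
```

With these numbers the evidence is twice as likely if p is true as if it is false, so the credence in p rises from 0.2 to one third: more plausible than before, yet still far from highly plausible.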
16. Salmon quietly endorses, but does not satisfactorily develop, the possibility of interpreting “likelihood of success” as an objective probability, specifically as a frequency.
Modal Conception (Prima facie Plausibility)
On the modal conception, “it is plausible that p” is not a matter of degree. The meaning of “It is plausible that p,” roughly speaking, is this:
(Prima facie plausibility) It is plausible that p ⇔ There are sufficient grounds for taking p seriously.
To take a hypothesis seriously is to regard further investigation as reasonable, subject to feasibility and interest. It is to single out the hypothesis from an undifferentiated mass of logical possibilities. Although this conception, which I shall call prima facie plausibility (or sometimes minimal plausibility), is not so widely discussed as the other one, I believe that it is extremely important, especially when thinking about analogical arguments.
We find the modal conception in a number of places, particularly in discussions of analogy. The physicist N. R. Campbell writes:
But in order that a theory may be valuable it must . . . display an analogy. The propositions of the hypothesis must be analogous to some known laws . . . . In the view which is urged here analogies are not “aids” to the establishment of theories; they are an utterly essential part of theories, without which theories would be completely valueless and unworthy of the name. (1957, 129)
Campbell goes on to examine the role of analogy in Fourier’s theory of heat conduction, and writes: “Some analogy is essential to it; for it is only this analogy which distinguishes the theory from the multitude of others . . . which might also be proposed to explain the same laws” (142). Noting that Fourier’s theory has successfully predicted new laws, Campbell insists that, by virtue of analogy, the theory “was also distinguished from absurd theories at the time it was propounded and when these new laws were not known.” (For Campbell, an “absurd theory” is a set of meaningless equations devised purely to “derive” a known law.)
Campbell’s notion of a “valuable” theory is very close to my conception of a prima facie plausible hypothesis.17 Campbell evidently believes that analogy is indispensable for a valuable theory. That is a stronger thesis about analogy than I care to defend, but it seems clear that Campbell also holds a weaker thesis which I do, with one caveat, endorse: a good analogy is sufficient to establish that a hypothesis is “valuable,” even prior to testing. The caveat is that claims of prima facie plausibility based on good analogical arguments are defeasible. In particular, they are defeated if we discover any internal inconsistency or inconsistency with accepted theory, and they can be defeated by a rival analogical argument. This simply reiterates an observation made in section 1.2.
Campbell has nineteenth-century antecedents in Whewell and Herschel. As Snyder (2006) has pointed out, both of these thinkers rejected an unrestricted hypothetico-deductive model of confirmation, according to which any hypothesis receives support if its deductive consequences are verified by observation. To be capable of confirmation, a hypothesis must stipulate a vera causa, or “true cause.” A vera causa is a possible cause that is worthy of testing.
Herschel insisted that a vera causa had to be “analogous to causes that are already known to have produced similar effects in other cases” (Snyder 2006, 201); that is, a cause of a sort that is known to exist in nature. Whewell criticized this constraint as too demanding, but he did accept both the basic need to distinguish between frivolous hypotheses and verae causae, and the idea that analogy was sufficient to establish a vera causa. The notion of vera causa is, once again, close to my idea of prima facie plausibility.18
The point is that prima facie plausibility should not be taken for granted. In most investigations, it is irrational, from a practical standpoint, to be open-minded about every possible hypothesis. Some of them simply do not merit investigation and would probably not even occur to us. As Hanson has pointed out, Kepler would never have seriously considered the idea that Jupiter’s moons cause the apparent accelerations of Mars at 90° and 270° from aphelion (1958, 85).
17. Campbell’s notion, unlike mine, is restricted to physical theory and requires analogy to “known laws.”
18. Once again, I do not restrict the concept to causal hypotheses.
Any argument that a hypothesis is prima facie plausible should be constructive; it should provide reasons to think the hypothesis might be true. Summarizing, the modal conception captures a great deal of what we want from an account of plausibility. To say that a hypothesis is prima facie plausible is to say two things: (1) It has epistemic support: an appreciable likelihood of being true (or “successful”). (2) It has pragmatic importance: it is worth investigating, granted the feasibility of and interest in investigation.19 In short, given our current knowledge of the situation and our assessment of costs and benefits, the hypothesis passes some sort of initial screening procedure. Assuming investigation is feasible and worthwhile, we ought to investigate further. But as we noted, the modal conception cannot represent degrees of plausibility.
19. Point 2 depends, in a decision-theoretic sense, upon point 1 together with contextual information about costs and benefits. Absent that information, and absent a precise probability value, the two points are at least partially independent.

The Relationship between the Two Conceptions

If we hold a probabilistic interpretation of belief, then there must be a relationship between the modal and probabilistic conceptions of plausibility. Nevertheless, we have good methodological reasons to keep them distinct. One reason is that there is no easy way to translate prima facie plausibility into probabilistic language. The most natural strategy is this:

(Positive probability) It is (prima facie) plausible that p if and only if p has positive (noninfinitesimal) probability.

But this doesn’t work. Positive probability alone does not ensure that a hypothesis either has an appreciable likelihood of being true or is worth investigating. Both characteristics might require a moderately high probability value, one that lies above some threshold that depends upon the situation. Of course, for a probabilist, a prima facie plausible hypothesis must be assigned some positive probability value. But we should keep in mind the following points. First, there is no nonarbitrary and fully general method for obtaining this value. Second, and more important, the actual probability value seems to be a distraction. In company with Whewell, Herschel, and Campbell, I think that the point of many analogical arguments is just to establish a rudimentary sort of plausibility. To put everything in numerical terms is a secondary consideration. In many cases, the hypothesis is modeled as having a certain probability value because it is deemed prima facie plausible, not the other way around.

If this is right, then the most fundamental task is to investigate the “modal logic” of analogical arguments: to find criteria such that an analogical argument meeting those criteria has a prima facie plausible conclusion. One advantage of focusing on the modal conception of plausibility is that we do not tie ourselves to a specifically probabilistic (Bayesian) approach to confirmation. That is yet another reason to keep the two conceptions of plausibility distinct.

All the same, we must keep the probabilistic conception in mind for at least two reasons. First, going beyond minimal plausibility, we are interested in how analogical arguments are made stronger or weaker. My theory proposes qualitative means of assessing the strength of an analogical argument, building on the modal conception of plausibility. Second, it is ultimately important to find a place for analogical arguments in the Bayesian model of confirmation. In this book, I concentrate on prima facie plausibility, but I shall frequently discuss the probabilistic conception as well. The connection between the two is a major theme of chapter 8.
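The claim that positive probability alone does not make a hypothesis worth investigating can be illustrated with a small sketch. It assumes a deliberately simple expected-value model that is not part of the text: investigation has a fixed cost and a fixed benefit if the hypothesis is true, and the function names are hypothetical. Under that assumption the threshold depends on the situation (the ratio of cost to benefit), not on positivity as such.

```python
def investigation_threshold(cost: float, benefit: float) -> float:
    """Probability above which expected benefit exceeds cost.

    Assumed model (not from the text): investigating costs `cost`
    and pays `benefit` only if the hypothesis turns out to be true.
    """
    if cost < 0 or benefit <= 0:
        raise ValueError("cost must be >= 0 and benefit must be > 0")
    return cost / benefit


def worth_investigating(p: float, cost: float, benefit: float) -> bool:
    """True when the expected payoff p * benefit exceeds the cost."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    return p * benefit > cost


# Both hypotheses have positive probability, yet only one clears
# the situation-dependent threshold of 0.1.
print(investigation_threshold(cost=1.0, benefit=10.0))     # 0.1
print(worth_investigating(0.001, cost=1.0, benefit=10.0))  # False
print(worth_investigating(0.3, cost=1.0, benefit=10.0))    # True
```

Changing the cost or benefit moves the threshold, which is one way to read the remark that no fully general probability value corresponds to prima facie plausibility.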
1.6 Commonsense Guidelines and Inference Rules

Any normative theory for evaluating analogical arguments must propose criteria that, to some extent, reflect commonsense ideas. This section explores some of these ideas and explains why common sense falls short of providing adequate guidance.

Commonsense Guidelines

Logicians and philosophers of science have identified a number of “textbook” guidelines for evaluating analogical arguments. Variants of these principles have been around for a long time.20 Here are some of the most important ones (using “CS” to suggest common sense):

(CS1) The more similarities (between the two domains), the stronger the analogy.
(CS2) The more differences, the weaker the analogy.
(CS3) The greater the extent of our ignorance about the two domains, the weaker the analogy.
(CS4) The weaker the conclusion, the more plausible the analogy.
(CS5) Analogies involving causal relations are more plausible than those not involving causal relations.
(CS6) Structural analogies are stronger than those based on superficial similarities.
(CS7) The relevance of the similarities and differences to the conclusion (i.e., to the hypothetical analogy) must be taken into account.
(CS8) Multiple analogies supporting the same conclusion make the argument stronger.
20. See Aristotle’s Topics and Prior Analytics, Mill’s A System of Logic (1843/1930), Keynes’s A Treatise on Probability (1921), and numerous logic texts including Robinson (1930), Stebbing (1933), and Copi (1961).
These principles can be helpful in general, but they are frequently too vague to give us much insight. For instance, how do we count the similarities and differences mentioned in principles (CS1) and (CS2)? In connection with principles (CS5) through (CS7), which of the similarities identified in an analogical argument are relevant to the conclusion, and why are structural and causal analogies thought to be especially important? Which differences between domains are relevant? How are we to handle similarities and differences that have not been identified?

In light of our discussion of plausibility in the preceding section, we can identify a second major weakness with the above principles: all of them pertain to the degree of plausibility, and hence to the probabilistic conception. None of them addresses the question, “Is this analogical argument prima facie plausible or not?”

A third problem is what to do when the principles pull in different directions. To illustrate, consider an example that has been a favorite with logicians: Thomas Reid’s argument for life on other planets.21 Reid (1785) pointed out a number of similarities between Earth and the other planets in the solar system. All orbit and are illuminated by the sun, some have moons, all revolve on an axis, and all move in accordance with the law of gravity. In consequence, it is “not unreasonable to think, that those planets may, like our earth, be the habitation of various orders of living creatures” (1785, 24). Utilizing the tabular representation scheme, we can capture this argument as follows (putting Mars as the target):

                      EARTH (S)                   MARS (T)
Known similarities:   orbits the sun              orbits the sun
                      has a moon                  has moons
                      revolves on an axis         revolves on an axis
                      subject to law of gravity   subject to law of gravity
Inferred similarity:  supports life               ⇒ may support life
Philosophers disagree about the cogency of this argument. Stebbing (1933) finds Reid’s argument “suggestive” and “not unplausible” because the conclusion is weak (CS4), while Mill (1843/1930) appears to reject the argument on account of our vast ignorance of properties that might be relevant (CS3). This debate is far from over. Scientists still use analogical reasoning to support the conjecture that, even if Mars is now dead, it might once have supported life (McKay 1993). The Rover expeditions of 2003–04 bolstered this argument with the finding that ancient Mars possessed liquid water in abundance.

21. Variants of the argument are found in Stebbing (1933), Mill (1843/1930), Robinson (1930), and Copi (1961).

It might be argued at this juncture that the problems of vagueness and of weighing competing criteria simply reflect the reality of working with an imprecise and irreducibly subjective form of reasoning. Perhaps the above list of criteria or something similar is the best we can hope for. In any choice between scientific theories, individual interpretations of vague and sometimes conflicting criteria, and differences of opinion about the relative importance of those criteria, are unavoidable. This point certainly applies to analogical arguments, for individuals assess the merits of such arguments differently. Yet such observations have not stopped philosophers from attempting to shed light on the problem of theory choice. We can surely do better than these “textbook” guidelines. An adequate theory of analogical arguments should at least systematize and resolve conflicts between them.
Commonsense Inference Rules

There is a vivid way to see that commonsense ideas about analogical reasoning are inadequate. This is to recognize that there are no widely acknowledged commonsense inference rules for analogical arguments, despite the confidence with which such arguments are endorsed or rejected. This contrasts sharply with the situation in deductive logic, where we have plenty of unimpeachable inference rules, and with enumerative induction, where we have candidates such as the familiar “straight rule.”

Straight rule for enumerative induction. Let A and B be classes of objects or events. Suppose that out of n A’s that have been observed, m have been B’s. In the absence of defeating reasons, infer that the proportion of A’s that are B’s is approximately m/n.22

The straight rule need not be ultimately acceptable. My point, rather, is that it is a reasonable starting point for thinking about induction. There is no comparably simple inference rule that can serve as a reference point for reflection on analogical arguments.23 Let’s see why.

22. If we have reason to believe that our sampling procedure is not random, then we should not draw the general conclusion.
23. Carnap and his followers have made serious attempts to formulate principles of analogy in inductive logic. Carnap (1980 and elsewhere) uses concepts of “analogy by similarity” and “analogy by proximity” to incorporate prior beliefs about similarity into inductive logic. I shall not discuss this body of work because the project is very different from the one in this book.

Philosophers have developed two main approaches to evaluating analogical arguments, corresponding to two common conceptions of analogies. One view, most clearly expressed by Mill (1843/1930) but originating with Aristotle, is that an analogy is a conjunction of material (observable) resemblances between two domains. Mill characterized the evaluation of an analogical argument as “a competition between the known points of agreement and the known points of difference.” The basic principle for judging an analogical argument is to compare the extent of material similarity and difference. In essence, this view assimilates analogical arguments to enumerative induction. Just as we might sample a small population of A’s to find out the proportion that share the attribute B, we sample a small population of properties of the source domain to find out the proportion that belongs to the target domain. I will refer to this as the sampling or inductive conception of analogical arguments.

The second view, which I call the structuralist or deductive conception, assimilates analogy to a deductive argument form. One version, exemplified by Hempel (1965), stresses that analogies are founded on formal resemblance. An analogy is ideal if two domains are nomically isomorphic, that is, if they are described by formally identical mathematical laws. The fundamental criterion for assessing analogical arguments is to consider how closely the analogy approximates isomorphism. In the limit of perfect isomorphism, an analogical argument becomes deductively valid. A second version, exemplified by Weitzenfeld (1984) and Davies (1988), treats analogical arguments as deductive arguments that rely on extra (sometimes tacit) premises.

If either the inductive or deductive conception were correct, then finding an inference rule for analogical arguments would reduce to characterizing them in terms of these fundamental forms of inference (with modifications depending on the degree to which the analogical argument departs from the ideal). In chapter 2, I shall argue that neither conception of analogical reasoning is correct. For now, the observation that there seem to be analogical arguments of both types already suggests that neither account can be wholly adequate to accommodate all analogical arguments. But let’s take a brief look at what goes wrong when we attempt to formulate an inference rule along inductive lines. Using the straight rule as our model, consider the following candidate, rule (R). The rule presupposes that we have represented our analogical argument in accordance with schema (1.4.1).

(R) Suppose S and T are the source and target domains. Suppose P1, ..., Pn represents the positive analogy, while A1, ..., Ar and ~B1, ..., ~Bs represent the negative analogy and Q represents the hypothetical analogy, as in schema (1.4.1). In the absence of reasons for thinking otherwise, infer that Q* holds in the target domain with probability p > 0, where p is an increasing function of n and a decreasing function of r and s.

This is quite close to Mill’s view of analogical inference. But it is pretty clear that rule (R) is not a viable inference rule. The most basic objection is that p need not always be positive: we can construct analogical arguments that conform to the schema (1.4.1) but establish nothing at all about the probability of Q*. A few silly examples will illustrate this point.

First, rule (R) allows us to infer, with positive probability, that an unknown object on the other side of the door is cold. That object shares at least one property with ice (e.g., the property of being self-identical). By rule (R), there is a positive probability it also shares the property of being cold. But this analogy should clearly show nothing.

Next, Achinstein (1964, 328) observes that there is a formal analogy between swans and line segments if we take the relation “has the same color as” to
correspond to “is congruent with.” Both relations are reflexive, symmetric, and transitive. Yet it would be absurd to infer, with positive probability, that we are more likely to find congruent lines clustered in groups of two or more than alone just because swans of the same color are commonly found in groups. The features in the positive analogy are antecedently known to be irrelevant to the hypothetical analogy. In such a case, the analogical inference should be utterly rejected. As a final and more challenging example because it has historical significance, consider an argument used in 1611 by the Florentine astronomer Francesco Sizzi to refute Galileo’s claim that Jupiter had satellites: There are seven windows in the head, two nostrils, two ears, two eyes and a mouth; so in the heavens there are two favorable stars, two unpropitious, two luminaries, and Mercury alone undecided and indifferent. From which and many other similar phenomena of nature such as the seven metals, etc., which it were tedious to enumerate, we gather that the number of planets is necessarily seven . . . . Besides, the Jews and other ancient nations, as well as modern Europeans, have adopted the division of the week into seven days, and have named them from the seven planets: now if we increase the number of planets, this whole system falls to the ground.24 From a contemporary perspective, this argument should be rejected even though it can be made to conform to schema (1.4.1). The similarity between the planets and “windows in the head,” metals, and so forth consists in their being natural phenomena; the inference drawn by analogy is that they must also share the property of being seven in number. Since our current theories tell us that there is no connection at all between the known and inferred similarities, the argument should carry no weight for us. 
There is a separate question about what to make of the argument in its historical setting; I return to that question in the next section. These three examples show that rule (R) fails to make sense of prima facie plausibility. It also does a poor job with respect to the probabilistic conception of plausibility. The obvious objection is that not every similarity increases the probability of the conclusion and not every difference decreases it. Some similarities and differences are known to be (or accepted as being) completely irrelevant and should have no influence on our probability judgments. In short, rule (R) justifies too much. The rule must be supplemented with considerations of relevance. There are so many different types of relevance, however, that the hope for a single, simple rule of analogical inference is likely to be futile.
24. The quotation is taken from Holton and Brush (1973, 51). Sizzi’s seven “planets” evidently include the sun, moon, and the five planets visible in his day: Mercury, Venus, Mars, Jupiter, and Saturn.
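The contrast between the straight rule and rule (R) can be made concrete with a short sketch. The straight rule is coded as stated above. Rule (R) fixes no particular function, so the formula used for p here is an arbitrary assumption chosen only because it is positive for n ≥ 1, increasing in n, and decreasing in r and s; the function names are likewise hypothetical.

```python
def straight_rule(m: int, n: int) -> float:
    """Estimate the proportion of A's that are B's as m/n."""
    if n <= 0 or not 0 <= m <= n:
        raise ValueError("need n > 0 and 0 <= m <= n")
    return m / n


def rule_r(n: int, r: int, s: int) -> float:
    """One hypothetical instance of rule (R).

    n counts the positive analogy P1..Pn; r and s count the two
    parts of the negative analogy. The formula is an assumption:
    any function increasing in n and decreasing in r and s fits.
    """
    if n < 1 or r < 0 or s < 0:
        raise ValueError("need n >= 1 and r, s >= 0")
    return n / (n + r + s + 1)


# Straight rule: 8 of 10 observed A's were B's.
print(straight_rule(8, 10))  # 0.8

# Ice and the unknown object behind the door: one shared property
# (self-identity), no known differences. Rule (R) still returns a
# positive value, although the similarity is irrelevant to coldness.
print(rule_r(1, 0, 0))  # 0.5

# Swans and line segments: three shared formal properties
# (reflexive, symmetric, transitive) push the value higher still.
print(rule_r(3, 0, 0))  # 0.75
```

Nothing in the inputs records whether a similarity is relevant to the hypothetical analogy, which is why any function of this shape justifies too much.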
1.7 The Articulation Model: A Preview

One of my main objectives is to formulate principles for evaluating analogical arguments, principles that are general but superior to the commonsense guidelines of the preceding section. Yet even within the sciences, there is so great a variety of analogies and analogical arguments that no single evaluation scheme can hope to accommodate them all. My proposal is that we solve this problem with a two-part theory of analogical arguments that I call the articulation model. The first part consists of two general principles. The second part is a classification scheme that lets us apply those principles differently to different types of analogical arguments. The classification is based upon diverse forms of logical, causal, and explanatory relationships that occur across the sciences. In effect, I develop a set of models for evaluating different types of analogical argument, recognizing that relevance and similarity are determined by somewhat different criteria for these diverse types of analogical argument. In this section, I provide an overview of the articulation model.

The basic ideas can be explained with the help of concepts and terminology introduced earlier. Simplifying schema (1.4.1) slightly (by omitting the negative analogy), consider the representation of an analogical argument shown in figure 1.1. Illustrating with Darwin’s famous analogy between artificial and natural selection, we can take Q as the (well-confirmed) hypothesis of artificial selection and Q* as Darwin’s hypothesis of natural selection. P and P* describe accepted similarities, such as the presence of diversity among domestic varieties and natural species. Existing classifications of analogical reasoning have focused on the nature of the horizontal (similarity) relations.
For example, Gentner (1983) distinguishes between superficial “feature matches,” where the objects in the two domains share common properties, and deep “structural analogies,” based on shared high-level relations.25 Darwin’s analogy falls into the second category; Reid’s analogy between Earth and Mars appears to fall into the first. The account developed in this book, by contrast, proposes a classification scheme based on the different types of vertical relations in the source and target domains. The vertical relations provide the clue to determining which similarities and differences are relevant. Different sorts of vertical relations naturally lead to different assessments of relevant similarity.

Source Domain (S)                                        Target Domain (T)
        P          ←     Horizontal relations     →             P*            } Positive analogy
        ↑                                                       ↑
Vertical relations                                  Possible vertical relations
        ↓                                                       ↓
        Q          ← Possible horizontal relations →            Q*            } Possible further analogy

FIGURE 1.1. Simplified tabular representation.

25. Holyoak and Thagard (1995) distinguish attribute mappings, relational mappings and system mappings in a similar manner. The first and last of these correspond to Gentner’s feature matches and structural analogies, while relational mappings are analogies based on correspondences between first-order relations.

The first key concept in my theory is the prior association, defined as the vertical relation in the source domain that is the focus of the analogical argument. This yields a preliminary requirement for a seriously proposed argument by analogy, the first of my two general principles.
Principle 1: Requirement of Prior Association

The description of the source domain must include an explicitly stated vertical relation which the analogical argument is supposed to extend in some way to the target domain.

By restricting attention to analogies where we have well-defined models of the prior association, we obtain a classification that provides a basis for evaluating analogical arguments. In effect, the assessment of any analogical argument is mediated by a model of the prior association. As a preview of things to come, let me characterize two classes.

• Mathematical analogies. P refers to a set of assumptions, while Q is a theorem about the source domain. The prior association is a proof that P (together with other assumptions) entails Q. The analogical argument is intended to make it plausible that similar features P* of the target domain entail a similar conclusion Q* about the target domain.
• Explanatory analogies. Q refers to a hypothesis and P to observed consequences of Q. The prior association is that Q explains P. The analogical argument is meant to provide support for the idea that similar features P* in the target domain are explained by a similar hypothesis Q*. Darwin’s analogy exhibits this pattern.

Analogical arguments belonging to these two distinct groups demand different evaluation criteria. This leads to the second key concept in my theory, potential for generalization, once again formulated as a requirement. The terminology derives from Poincaré, who stated that analogies enable us to perceive the “possibility of generalization” (1952a, 28). The principle is as follows:

Principle 2: Requirement of Potential for Generalization

A good analogical argument is one where, at minimum, there is no compelling reason to deny that the prior association that obtains in the source domain could be generalized in a way that extends to the target domain.
Specifically, the minimal condition is this: features that play a central role in the prior association must have analogs in the target domain that are either known to hold, or at least not known not to hold, in order for an analogical argument to be prima facie plausible. If some crucial difference appears to block the possibility that this particular pattern of association could be generalized, then our analogical argument fails to demonstrate the plausibility of its conclusion. By extension, the strength of the argument increases to the extent that relevant features have analogs that are known (or believed) to hold in the target domain.

I maintain that these two general principles, prior association and potential for generalization, are common to the evaluation of analogical arguments in all fields. When supplemented with models for different forms of prior association, we obtain criteria for deciding which similarities of the source and target domains are relevant. The requirement of potential for generalization stipulates both minimum conditions for prima facie plausibility and a basis for assessing degree of plausibility. In this way, we obtain criteria that are more helpful than the “textbook” guidelines of section 1.6. These ideas can be illustrated by reference to our two selected classes of analogical argument.

• In mathematical analogies, the features that play a central role in the prior association are the assumptions used in the proof of the theorem Q. The analogical argument stands or falls depending on whether analogous assumptions are known to hold, or at least not known not to hold, in the target domain.
• In explanatory analogies, any known observable consequence of the hypothesis Q counts as central to the prior association. The strength of the argument depends upon the extent to which we know that these consequences have analogs in the target domain. This criterion directs us to search for such analogous consequences. If they are not found, then the argument is weakened or defeated.

The second situation is exemplified in the reaction to Darwin’s argument. The objection regarded by Darwin as possibly “the most obvious and gravest” was that just as we observe a nearly continuous transition among domestic varieties, we should expect to observe a “finely graduated” chain of forms in the fossil record. According to my theory, Darwin’s argument would have been initially plausible but weakened over time if a fossil record of intermediate transitions had not been recovered.
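The minimal condition in the two principles lends itself to a small sketch. What follows is only an illustration under stated assumptions: the class and function names are hypothetical, the feature labels for Darwin's argument are simplified paraphrases, and the numerical share used as a proxy for strength is an assumption, since the theory proposes qualitative assessments of strength.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class Status(Enum):
    """What is known about a feature's analog in the target domain."""
    KNOWN_TO_HOLD = "known to hold"
    NOT_KNOWN_NOT_TO_HOLD = "not known not to hold"
    KNOWN_NOT_TO_HOLD = "known not to hold"


@dataclass
class AnalogicalArgument:
    # Principle 1: an explicitly stated vertical relation in the source.
    prior_association: Optional[str]
    # Features central to the prior association, with the status of
    # their analogs in the target domain.
    critical_features: Dict[str, Status]


def prima_facie_plausible(arg: AnalogicalArgument) -> bool:
    """Apply the two requirements as a minimal screening test."""
    if not arg.prior_association:
        return False  # Principle 1 fails: no stated prior association
    # Principle 2 fails if any central feature is known not to carry over.
    return all(status is not Status.KNOWN_NOT_TO_HOLD
               for status in arg.critical_features.values())


def share_known_to_hold(arg: AnalogicalArgument) -> float:
    """Assumed crude proxy for strength: share of analogs known to hold."""
    features = arg.critical_features
    if not features:
        return 0.0
    known = sum(1 for s in features.values() if s is Status.KNOWN_TO_HOLD)
    return known / len(features)


darwin = AnalogicalArgument(
    prior_association="selection explains diversity among domestic varieties",
    critical_features={
        "diversity among forms": Status.KNOWN_TO_HOLD,
        "finely graduated chain of forms": Status.NOT_KNOWN_NOT_TO_HOLD,
    },
)
blocked = AnalogicalArgument(
    prior_association="Q explains P",
    critical_features={"central consequence": Status.KNOWN_NOT_TO_HOLD},
)

print(prima_facie_plausible(darwin), share_known_to_hold(darwin))  # True 0.5
print(prima_facie_plausible(blocked))                              # False
```

The three-valued status is the point of the sketch: an analog that is merely unconfirmed does not block plausibility, whereas one known to fail does.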
Historical Relativization

A project that sets out to propose standards for any form of scientific reasoning must confront the fact that such standards, and indeed the very examples taken to exemplify good scientific practice, are historically variable. Let’s consider two examples. The first is due to Aristotle. In the Meteorology, we find a charming explanation of earthquakes as exhalations of wind from the Earth: “We must
suppose the action of the wind in the earth to be analogous to the tremors and throbbings caused in us by the force of the wind contained in our bodies. Thus some earthquakes are a sort of tremor, others a sort of throbbing” (Meteor 366b14).26 The second example, just as colorful, is Sizzi’s argument quoted in section 1.6. What should a theory of analogical reasoning say about such arguments? I want to avoid two extremes. I don’t want to hold up these arguments as stock examples of bad analogical reasoning, even though they may look quaint or foolish by current standards. Nor do I want to strain too hard to make them respectable, or more generally to hold the position that every analogical argument ever proposed is good. That would make a critical study of analogical arguments pointless. In fact, the articulation model offers resources for taking historical variation into account without automatically endorsing all past analogical arguments as plausible “in context.” We can retain our two general principles of prior association and potential for generalization, while acknowledging historically and culturally relative standards for what counts as an acceptable prior association. On my theory, an argument by analogy starts from a set of vertical relations in the source domain—the prior association—and provides support for the conclusion that the same or similar connections obtain in the target domain. The argument can therefore be attacked in two different ways. We can criticize the prior association (in the source domain) or we can argue that the attempt to transfer that association to the target domain is blocked in some way. My suggestion is that historical sensitivity enters primarily at the stage of evaluating the prior association. Standards for what counts as an acceptable causal or explanatory relationship have varied greatly over time. 
By contrast, the evaluation of whether or not we have justification for transferring such a relationship to a new setting can be relatively independent of historical milieu. Aristotle’s and Sizzi’s arguments rely upon concepts of causation and explanation that have been repudiated. Yet it is possible that, setting aside concerns about the prior association, the analogical step in which causal and explanatory associations are transferred to the target domain is acceptable. If that were the case, we would have analogical arguments that established (in their historical setting) plausible conclusions, even though those conclusions turned out to be false. Goldman (1988) has distinguished between strong and weak epistemic justification. A belief is strongly justified if it is formed based on methods that are objectively reliable, and weakly justified if it fails to be strongly justified but the agent is nevertheless nonculpable because of the limited epistemic resources available to him or her. Goldman’s intention is to allow for a form of justification in “scientifically benighted” cultures that lack the resources to determine that they are employing unreliable methods. We might similarly distinguish here between strong and weak justification for analogical arguments. An analogical
argument is strongly justified if it employs proper forms of prior association and proper standards to transfer that association from source to target; it is weakly justified if the prior association is improper through no fault of the subject, but standards for analogical transfer are observed. Since just about every term in these definitions (“prior association,” “proper standards,” “transfer”) has yet to be defined carefully, we are not in a position to evaluate our two examples.

It is worth noting that I employ the same strategy to deal with another objection that might be raised to the idea of a general normative theory of analogical arguments: different standards are employed in different disciplines. As already suggested, my hope is that such variation can be located primarily in the different forms of prior association that occur in analogical arguments. There is a common logical core to analogical arguments. I shall defend that thesis by considering a variety of examples across disciplines and historical periods. In light of the observations in section 1.1, it is at least methodologically appropriate to aim for a unified account of analogical arguments.

26. All translated passages from Aristotle are from Aristotle (1984).
1.8 Philosophical Justification

There are many views on whether and how analogical arguments can be justified. Skeptics range from those who deny that this form of argument is even legitimate to those who despair of finding an explanation for its successes. Among people who have tried to justify analogical arguments, the usual strategy has been to reduce this form of argument to some other generally accepted form of inductive or deductive inference. These efforts are unconvincing, largely because (as suggested above) most analogical arguments are not properly cast in these simpler molds.

Many attempts to provide a foundation for analogical arguments suffer from an additional weakness: they make no attempt to differentiate between good and bad analogical arguments. They set out with some “commonsense” characterization of the argument form, such as rule (R) of section 1.6, that applies equally to plausible and implausible analogical arguments. If a philosophical justification for such a rule were to succeed, it would prove too much.

There are difficulties even in stating what it means to have a general philosophical justification for analogical arguments (as opposed to justification for a particular argument). One difficulty derives from diversity. As noted in section 1.6, there is no “method” of analogy, no single straightforward inference rule that characterizes the argument form. Another difficulty derives from the weakness of the argument form, particularly if we frame the conclusion in terms of prima facie plausibility. Even a good analogical argument may provide very limited support for its conclusion. These two difficulties appear to rule out a direct reliabilist form of justification. Reliabilists hold that a method or process is virtuous if it generally leads from true initial beliefs to true “output” beliefs (Goldman 1979). In the case of analogical arguments, we have neither a method nor reliability.
Despite all of these difficulties, this book does provide a general philosophical defense of analogical arguments; in fact, two such defenses. In order to deal with the problem of diversity, I exploit the two-part structure of my theory. The central idea is that to evaluate any analogical argument, we must bring it under a model and then apply the principles of prior association and potential for generalization. Diversity is found in the models, while the common “method,” to the extent that one exists, resides in these two general principles. The only way to provide a general justification for analogical reasoning is to abstract from the models and to focus on those principles. The challenge thus reduces to providing a justification for the following claim:

Conditional plausibility. If we have an acceptable model of the prior association and an analogical argument that meets the requirements of our two principles, then the conclusion of the argument is prima facie plausible.

This formulation restricts the problem of justification to analogical arguments that the theory takes to be good ones, and it isolates what I take to be the logical core of the argument form.

In order to deal with the problem of the weakness or unreliability of analogical arguments, I abandon any attempt to make a direct link to past or future successes. Instead, my strategy is to take broadly acceptable norms of scientific practice for granted and to show that the conditional plausibility claim follows from these norms. Chapter 7 offers a “top-down” justification that presupposes that standard pragmatic virtues guide us in the choice of scientific theories. Chapter 8 develops a completely independent “bottom-up” justification that presupposes the legitimacy of symmetry arguments. Here is a brief sketch of both approaches.

In my view, the most successful and natural general defense of analogical reasoning has been in a nonscientific setting: case law.
Judicial decisions are based on the verdicts and reasoning that have governed relevantly similar cases, according to the doctrine of stare decisis. The practice of following precedent can be justified by three main ethical considerations. First, the practice provides a relatively stable basis from which people can predict the actions of the courts and formulate their plans. Second, stare decisis serves as a check against arbitrary decisions by an inexperienced or foolish judge. In a word, a kind of ethical conservatism underwrites our use of analogical reasoning in the law. At the same time, and this is the third point in its favor, stare decisis allows for the gradual evolution of the law. The doctrine thus achieves a balance between conservative and progressive moral values. I argue (analogously) that in scientific thinking, analogical arguments are justified as a means for assessing the plausibility of hypotheses because they achieve an optimal balance between conservative epistemic values (such as simplicity and coherence with existing theory) and progressive epistemic values (such as fruitfulness and theoretical unification). This is the top-down (goal-oriented) justification that I shall develop in chapter 7.
Chapter 8 offers a bottom-up (constraint-based) justification founded on principles of symmetry: like problems demand a like solution. Symmetry acts as a constraint on our reasoning and lies at the heart of every analogical argument. My analysis focuses on the practical need to identify hypotheses that merit investigation. In a case of perfect symmetry, one has as much reason to treat the target hypothesis as worthy of investigation as one does the source hypothesis. In the case of an analogical argument that satisfies the requirements of my theory, one still has adequate reasons for doing so. Salmon, following Feigl, makes a useful distinction between two sorts of justification: validation and vindication. An inference is validated by showing that it is governed by an accepted rule. A rule of inference is justified by showing that it can be derived from other accepted rules or principles. There is, however, a second form of justification called “vindication.” This kind of justification consists in showing that a given decision, policy, or act is well adapted to achieving a certain end. (1957, 39) A vindication of a theory of analogical arguments might be a demonstration that a reasonable proportion of the arguments endorsed by that theory lead to true conclusions. Using this distinction, I can summarize my views about the justification of analogical reasoning. Direct validation of an analogical inference rule in terms of some more basic deductive or inductive rule is not promising. The prospects for direct vindication of any such rule in terms of its frequency of success are also dim, both in light of the difficulties noted above and on account of the traditional philosophical problem of induction. The two lines of justification I develop in this book are best described as partial validation of my theory, relative to accepted norms of scientific practice. They are partial because they take for granted certain models of explanation, causation, and so forth. 
They are relative because they presuppose that the “accepted norms” are themselves justified. I believe that this is the most that one can demand by way of philosophical justification for a general theory of analogical arguments.
1.9 Confirmation
Confirmation, in a broad sense, is the process by which a scientific hypothesis receives inductive support. As just explained, I think that a good analogical argument can make a hypothesis prima facie plausible. If the hypothesis was not previously taken seriously, that seems to count as confirmation. Furthermore, the idea that analogical arguments can confirm hypotheses in this broad sense seems to be independent of one’s theory of confirmation. But there are difficulties. That is because confirmation, in a narrow sense, is a certain relation between propositions. To quote a recent discussion: “Some propositions seem to
help make it rational to believe other propositions. When our current confidence in E helps make rational our current confidence in H, we say that E confirms H” (Christensen 1999, 441). In the Bayesian model, which is the most prominent theory of confirmation in current philosophy of science, “confidence” is represented in terms of personal probability. A Bayesian agent starts with an assignment of subjective probabilities to a class of propositions. Qualitative confirmation is understood to be a three-place relation:
Bayesian confirmation. E confirms H relative to K ⇔ Pr(H / E · K) > Pr(H / K).
In this relation, E represents a proposition about accepted evidence, H stands for a hypothesis, K for background knowledge, and Pr for the agent’s subjective probability function. To confirm H is to raise its probability, relative to K. For Bayesians, it may seem quite clear that an analogical argument cannot provide confirmation. In the first place, it is not obvious how to represent an analogical argument as an evidential proposition E. Second, even if we can find a proposition E that expresses the information about source and target domains used in the argument, that information is not new. It is “old evidence,” and therefore part of the background K. This implies that E · K is equivalent to K, and hence that Pr(H / E · K) = Pr(H / K). According to the definition, we don’t have confirmation. Instead, we have an instance of the familiar “problem of old evidence” (Glymour 1980). Third, and perhaps most important, analogical arguments are often applied to novel hypotheses H for which the “prior” probability Pr(H / K) is not even defined. Again, the definition is inapplicable. Something has to be wrong. It cannot be that a good analogical argument succeeds in establishing a conclusion as plausible in the modal sense, yet fails to show anything about its probability. If there is no sense to be made of the idea that analogical arguments contribute to Bayesian confirmation, then we should be suspicious about the modal claim. By the same token, if we are confident about the modal claim, then we need to take another look at the Bayesian conception of confirmation. I mean to show that analogical arguments do have a role in broadly Bayesian epistemology. Within the Bayesian framework, a well-known problem has been to justify assignments of prior probability. 
A number of writers (Jeffreys 1973, Salmon 1966, 1990, Shimony 1970) have argued, in various ways, that any “seriously proposed” hypothesis must have a sufficiently high prior probability to allow it to become preferred as the result of observation. Salmon in particular has suggested that the argument from analogy is one of the main tools for showing that a hypothesis is “serious” in this sense. My book affirms this idea, but I want to take it a little further.
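The definition of Bayesian confirmation and the old-evidence difficulty can be illustrated with a small numerical sketch. Everything in it is an assumption made for illustration: the toy probability assignments, the representation of "worlds" as triples of truth values for H, E, and K, and the function names `conditional` and `confirms` are not drawn from the text.

```python
from fractions import Fraction as F

def conditional(joint, event, given):
    """Return Pr(event / given) for a joint distribution over worlds, or None if undefined."""
    denom = sum(p for w, p in joint.items() if given(w))
    if denom == 0:
        return None  # conditioning on a probability-zero proposition
    num = sum(p for w, p in joint.items() if given(w) and event(w))
    return num / denom

def confirms(joint):
    """E confirms H relative to K iff Pr(H / E.K) > Pr(H / K)."""
    posterior = conditional(joint, H, lambda w: E(w) and K(w))
    prior = conditional(joint, H, K)
    if posterior is None or prior is None:
        return False  # the definition is inapplicable
    return posterior > prior

# A world is a triple (h, e, k) of truth values (illustrative encoding).
H = lambda w: w[0]
E = lambda w: w[1]
K = lambda w: w[2]

# Case 1: E is new information, so E can raise the probability of H.
joint_new = {
    (True, True, True): F(3, 10),
    (True, False, True): F(1, 10),
    (False, True, True): F(2, 10),
    (False, False, True): F(4, 10),
}

# Case 2: E is old evidence, already part of K, so every K-world is an E-world.
joint_old = {
    (True, True, True): F(4, 10),
    (False, True, True): F(6, 10),
}

print(confirms(joint_new))  # posterior 3/5 exceeds prior 2/5
print(confirms(joint_old))  # posterior equals prior, so no confirmation
```

In the second case, conditioning on E · K is the same as conditioning on K, which is the situation the text describes as the problem of old evidence.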
A “seriously proposed” hypothesis is just a prima facie plausible hypothesis. In section 1.5, I observed that prima facie plausibility is hard to characterize in probabilistic terms. In chapter 8, I use the symmetry-based approach to clarify the sense in which good analogical arguments show that novel hypotheses have “sufficiently high” or nonnegligible prior probability. This illuminates the connection between analogical reasoning and Bayesian confirmation.
1.10 Overview The first part of this book (chapters 2 and 3) is a critical review of existing ideas about analogies and analogical reasoning, including both philosophical and computational approaches. Chapters 4 through 6 present my theory of analogical arguments, including models for evaluating different types. I apply the theory to a variety of examples. Chapters 7 and 8 explore the philosophical basis for analogical arguments. Finally, chapter 9 discusses extensions and implications of my theory. Chapter 2 begins by tracing back to Aristotle a number of commonsense ideas about analogical arguments, and also the idea that the argument from analogy can be reduced to a form of deduction or enumerative induction. After a review of Hesse’s theory of analogy, I consider other theories that have adopted a reductionist strategy. I also respond to skeptical arguments about analogical reasoning. While the chapter aims to disabuse us of the idea that there can be any easy analysis of this argument form, its other purpose is to foreshadow a more sophisticated approach. Chapter 3 surveys computational theories of reasoning by analogy. Most of these theories fall into two groups: structuralist and case-based.27 Structuralist theories regard analogical reasoning as syntactically driven: a good analogy maximizes structural similarity between two domains.28 Case-based reasoning programs, by contrast, view analogical reasoning as a process of assessing the extent to which two domains share relevant, semantically related characteristics. These programs use a variety of interesting techniques for performing analogical reasoning in specific problem situations, some of which have independent philosophical interest. Chapter 4 presents the fundamental ideas of my theory of analogical arguments. 
The principle of prior association is the requirement that there be some sort of connection in the source domain—logical, causal, or statistical—between the known similarities and the further similarity that is projected to hold in the target domain. The second key idea, potential for generalization, is the extent to which similarities between the source and target domains contribute to, and differences detract from, the possibility of generalizing the prior association to
27. Case-based reasoning (CBR) is a recognized approach in the AI community. The term “structuralist” is nonstandard, but convenient for this book. 28. Prominent examples of the structuralist strategy are Gentner’s structure-mapping theory (first elaborated in Gentner 1983) and the connectionist approach developed by Holyoak and Thagard (1995).
cover the target domain. For a prima facie plausible analogical argument, there must be no compelling reason for denying that the same type of association could hold in the target as well. Taken together, these two principles define the articulation model of analogical reasoning. “Articulation” refers to the fact that our theory demands a precise statement of the nature of the prior association in the source domain. This leads to a set of models corresponding to general types of prior association that occur in such fields as mathematics, physics, biology, and archaeology. Distinct criteria for assessing analogical arguments—many of which reflect commonsense intuitions—are formulated for these different models. Chapters 5 and 6 explore the relationship between analogy and generalization in mathematics and select scientific examples. These chapters flesh out the thesis that analogical arguments are a prelude to generalization. It is evident that two sets of phenomena that fall under common laws, or two members of the same class, will be analogous in important ways. These chapters advance the opposite thesis: if there is a strong analogy between two objects or systems, then there is likely to be a nontrivial (fruitful) generalization. This thesis is the basis both for the models of chapter 4 and for the book’s analysis of justification in the final chapters. Chapter 5, on mathematical analogies, is crucial here because it provides the clearest support for the link between analogy and generalization. In many ways, mathematics provides the paradigm for a general theory of analogical arguments. Chapter 7 presents a top-down justification of analogical reasoning based on the assumption that scientists seek to promote epistemic values such as theoretical unification, simplicity, and fruitfulness. In making an initial assessment of plausibility, analogical reasoning achieves an optimal balance between these values. 
Chapter 8 develops an alternative bottom-up justification based on appeal to symmetry: like problems demand like solutions. Symmetry acts as a constraint upon our reasoning, and it lies at the heart of every analogical argument. This chapter also connects analogical arguments to a Bayesian model of confirmation, showing that such arguments, if they satisfy our fundamental criteria, can help to establish nonnegligible prior probability. Most of the book pertains to analogical reasoning within science, and specifically within what Kuhn (1962) calls “normal science.” This restriction is lifted in chapter 9, which explores the prospects for expanding the theory to deal with revolutionary analogical arguments and analogical arguments outside of science. The chapter also explores connections between analogical arguments and other modes of reasoning (notably, thought experiments) and reviews the implications of my theory for empirical and computational work on analogical reasoning.
2 Philosophical Theories
2.1 Introduction Three important objectives for any normative theory of analogical arguments are to characterize such arguments clearly, to propose guidelines for their evaluation, and to provide some form of philosophical justification. This chapter selectively reviews philosophical theories of analogy, with a focus on these objectives. There is some consensus among philosophers about the basic criteria for assessing analogical arguments. This consensus is reflected in commonsense principles such as the ones mentioned in section 1.6. Those principles allow us to speak of a commonsense model for evaluating analogical arguments. The first part of this chapter (sections 2.2 and 2.3) traces this model to Aristotle and explores its more sophisticated expression in Hesse (1966). Although I believe that Hesse’s model suffers from serious limitations, I regard my own theory as a refinement of hers. While there is some agreement about evaluation criteria, there is marked disagreement among philosophers about the best representation and justification for analogical arguments. Sections 2.4 and 2.5 concentrate on two distinct strategies. The first casts analogical argument in the mold of deductive inference. The second interprets it as a form of sampling or induction by enumeration. Each approach has merit but faces serious difficulties because it requires assumptions that apply only to certain types of analogical argument. In light of these difficulties, I suggest that to force analogical arguments into some simpler argument pattern is a mistake. Section 2.6 evaluates skeptical arguments purporting to show that no justification for analogical
reasoning is possible. These skeptical arguments fail because they, too, misrepresent the logical structure (and the conclusion) of analogical arguments. One of the main themes of this chapter is that all three aspects of the problem of analogy are linked. We should not be satisfied with an unsystematic list of commonsense criteria, nor should we be optimistic about a philosophical justification that starts from a characterization that applies equally to bad and good analogical arguments. In the concluding section, I argue that refinements to the commonsense model and to the different approaches to justification point toward my own articulation model.
2.2 Aristotle on Analogy In many ways, Aristotle sets the stage for subsequent theories of analogical arguments. Some preliminary remarks will be helpful in discussing his ideas. First, it is important to distinguish between theory and practice. As Lloyd (1966) has pointed out, early Greek thinkers felt unconstrained in using colorful analogies to explain phenomena in such diverse areas as perception, meteorology, astronomy, and geography.1 Aristotle takes his predecessors to task for inferences based on superficial resemblances (Meteor 357a24, GA 752b25), yet he is at times vulnerable to the same criticism, as in his discussion of earthquakes (Meteor 366b14).2 I shall focus on Aristotle’s theoretical reflections about analogy, along with a few judicious examples. Second, although Aristotle employs the term “analogy” (analogia) and talks about analogical predication, he never talks about analogical reasoning or analogical arguments per se. As we shall see, however, he identifies two argument forms—the argument from example (paradeigma) and the argument from likeness (homoiotes)—both of which are closely related to what we now understand as an analogical argument. Third, we may properly attribute to Aristotle the view that analogical arguments are a species of plausibility argument. While Aristotle lacks the two conceptions of plausibility discussed in chapter 1, the term pithanon, which he applies to the argument from likeness (Topics 156b10), is commonly translated as plausible or persuasive.3 The principal examination of the argument from likeness is found in Topics, a work on dialectical reasoning. For Aristotle, “dialectical propositions” occupy territory between those that “admit of no doubt” and those to which “no one would assent” (Topics 104a7), while “dialectical problems” treat of “subjects that should not border too closely upon the sphere of demonstration, nor yet be too far removed from it” (Topics 105a7). 
Furthermore, a dialectical thesis is the usual starting point for inquiry, which agrees well with the construal of prima facie plausibility developed in chapter 1. It seems
1. Lloyd’s book (1966) provides an outstanding survey and analysis of analogical reasoning in early Greek science. 2. See “The Articulation Model,” section 1.7. 3. There is less explicit textual support in the case of the paradeigma, but see note 5 (below).
reasonable, then, to regard these two Aristotelian versions of analogical reasoning (especially the argument from likeness) as plausibility arguments. With these remarks in mind, let us turn to Aristotle’s discussion of the argument from example (paradeigma) and the argument from likeness or similarity (homoiotes). The argument from example is described in the Rhetoric and the Prior Analytics: “Enthymemes based upon example are those which proceed from one or more similar cases, arrive at a general proposition, and then argue deductively to a particular inference” (Rhetoric 1402b15). To illustrate: Let A be evil, B making war against neighbours, C Athenians against Thebans, D Thebans against Phocians. If then we wish to prove that to fight with the Thebans is an evil, we must assume that to fight against neighbours is an evil. Conviction of this is obtained from similar cases, e.g. that the war against the Phocians was an evil to the Thebans. Since then to fight against neighbours is an evil, and to fight against the Thebans is to fight against neighbours, it is clear that to fight against the Thebans is an evil. (Pr. An. 69a1) Aristotle notes two differences between this argument form and induction (Pr. An. 69a15ff.): it “does not draw its proof from all the particular cases” (i.e., it is not a “complete” induction4), and it requires an additional (deductively valid) syllogism as the final step. The argument from example thus amounts to single-case induction followed by deductive inference. Figure 2.1 shows the structure (using · for conjunction and → for the conditional). Using the notation and terminology of section 1.4, S and T are the source and target domains, P is the positive analogy, and Q is the hypothetical analogy. P(S), Q(S) and P(T) are given. In Aristotle’s example, S is war between Phocians
FIGURE 2.1. Argument from example. In S (source domain): P(S) · Q(S), from which (x)(P(x) → Q(x)) is inferred inductively (dashed arrow). In T (target domain): P(T) → Q(T) follows by instantiation and, together with P(T), yields Q(T) (solid arrows).
4. Aristotle’s point here is that the paradeigma is not demonstrative; rather, it is a kind of incomplete or inferior syllogism.
and Thebans, T is war between Athenians and Thebans, P is war between neighbors, and Q is evil. The first inference (dashed arrow) is inductive; the second and third (solid arrows) are deductive. I return to this analysis in section 2.5. For now, I simply assert that this is a crude analysis of analogical arguments, appropriate in very few cases. The paradeigma does, however, have the interesting feature of being amenable to an alternative analysis as a deductive argument form. Let us concentrate on Aristotle’s assertion, “we must assume that to fight against neighbours is an evil.” Instead of regarding this intermediate step as something reached by induction from a single case, we might instead regard it as a hidden presupposition. This transforms the paradeigma into a syllogistic argument with a missing (or enthymematic) premise, and our attention shifts to possible means for establishing that premise (with single-case induction as one such means). Construed in this way, Aristotle’s paradeigma argument foreshadows deductive analyses of analogical reasoning. Let us turn to the argument from likeness, which seems to be closer than the paradeigma to our contemporary understanding of analogical arguments.5 This form of argument receives considerable attention in Topics I 17 and I 18 and again in VIII 1. The most important passage is the following one. Try to secure admissions by means of likeness; for such admissions are plausible, and the universal involved is less patent; e.g. that as knowledge and ignorance of contraries is the same, so too perception of contraries is the same; or vice versa, that since the perception is the same, so is the knowledge also. This argument resembles induction, but is not the same thing; for in induction it is the universal whose admission is secured from the particulars, whereas in arguments from likeness, what is secured is not the universal under which all the like cases fall. 
(Topics 156b10–17) This passage occurs in a section offering general advice about framing dialectical arguments when confronting a somewhat skeptical interlocutor. In such situations, it is best not to make one’s argument depend upon securing agreement about any universal proposition. The argument from likeness is thus clearly distinct from the paradeigma, where the universal proposition plays an essential role as an intermediate step in the argument. The argument from likeness, though logically less straightforward than the paradeigma, is exactly the sort of analogical reasoning we want when we are unsure about underlying generalizations. It is much better suited than the paradeigma to assessing the conjectures that we might entertain in the early stages of an inquiry. To understand the argument from likeness more fully, note that Aristotle defines likeness in terms of shared attributes. Those things are called like which have the same attributes in every respect, and those which have more attributes the same than different,
5. Here I depart from Lloyd, who maintains that the paradeigma “clearly represents what we should call argument from analogy” (1966, 406).
and those whose quality is one; and that which shares with another thing the greater number or the more important of the attributes . . . in respect of which things are capable of altering, is like that other thing. (Met 1018a15) In Topics I 17, Aristotle states that any shared attribute contributes some degree of likeness: “We should also look at things which belong to the same genus, to see if any identical attribute belongs to them all, e.g. to a man and a horse and a dog; for in so far as they have any identical attribute, in so far they are alike” (Topics 108a13). Since Aristotle criticizes his predecessors for arguments based on superficial similarities, it is natural to ask whether Aristotle explains how to judge when the degree of likeness between two things is sufficiently great to warrant inferring a further likeness. In other words, when does the argument from likeness succeed? Aristotle wisely does not offer any answer! Still, a clue to what he might have thought is provided by the way he justifies particular arguments from likeness. As Lloyd (1966) has observed, Aristotle typically justifies such arguments by articulating a causal principle that governs the two phenomena being compared. For example, Aristotle explains the saltiness of the sea, by analogy with the saltiness of sweat, as a kind of residual earthy stuff exuded in natural processes such as heating. The common principle is this: “everything that grows and is naturally generated always leaves a residue, like that of things burnt, consisting in this sort of earth” (Meteor 358a17). From this method of justification, we might conjecture that Aristotle believes that the “important” similarities are those that enter into these general causal principles. Summarizing the foregoing discussion, Aristotle’s accounts of the two types of analogical argument offer little help with the problem of philosophical justification. 
The paradeigma has a clear logical structure, but one that is too crude to fit most analogical arguments. The argument from likeness is described in detail, but its foundations are not analyzed. By contrast, Aristotle’s theory provides us with important and influential criteria for the evaluation of analogical arguments. Four stand out: 1. The strength of an analogy depends upon the number of similarities. Aristotle thought that every shared attribute contributes a degree of similarity. This view is repeated in many later accounts of analogical reasoning. For instance, Mill (1843/1930) writes that every resemblance not known to be irrelevant “affords some degree of probability, beyond what would otherwise exist, in favour of the conclusion.” 2. Similarity reduces to identical properties and relations. Aristotle’s definition of likeness effectively reduces similarity to a conjunction of identical shared attributes. Many current theories of analogical reasoning make the same assumption. 3. Good analogies derive from underlying common causes or general laws. Aristotle frequently articulates a general causal principle that applies to both domains. Versions of this idea play an important part in many later accounts.
4. A good analogical argument need not presuppose acquaintance with the underlying universal (generalization). This point distinguishes the argument from likeness and the argument from example. In general, analogical arguments do not depend upon our knowing the universal result in advance. If we did, we could rely on the generalization and dispense with the analogy. Nor must we articulate the underlying generalization as part of our argument. We can proceed directly from case to case. Many later theories agree on this last point. Mill writes that analogical arguments are evaluated “without any antecedent evidence of a connection” between the known resemblance and the further conjectured resemblance. Quine suggests that an analogy bypasses or “slurs over” an inductive generalization (Quine and Ullian 1970, 61). We might try to capture the generalization after the fact, but “the missing generalization is not always easy to supply” (62). These four principles form the core of a commonsense model for evaluating analogical arguments. Their sharpest contemporary expression is found in the work of Hesse, to which we now turn.
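The two-step structure of the paradeigma discussed in this section (single-case induction followed by deduction) can be sketched in a few lines. This is only an illustrative reading: representing a domain as a set of attribute labels, the function name `argument_from_example`, and the label strings are assumptions introduced here, not Aristotle’s or the author’s formalism.

```python
def argument_from_example(source, target, p, q):
    """Apply the paradeigma; return the conclusion drawn about the target, or None.

    source, target: sets of attribute labels known to hold in each domain.
    p: the shared attribute (positive analogy); q: the projected attribute.
    """
    # Given: P(S) and Q(S) must both hold in the source domain.
    if p not in source or q not in source:
        return None
    # Step 1 (inductive, fallible): from a single case, conjecture (x)(P(x) -> Q(x)).
    universal = (p, q)
    # Step 2 (deductive): instantiate the universal to P(T) -> Q(T).
    antecedent, consequent = universal
    # Step 3 (deductive): modus ponens requires P(T) as a given.
    if antecedent not in target:
        return None
    return consequent

# Aristotle's example, with hypothetical attribute labels.
thebans_vs_phocians = {"war against neighbours", "evil"}   # source S
athenians_vs_thebans = {"war against neighbours"}          # target T

conclusion = argument_from_example(
    thebans_vs_phocians, athenians_vs_thebans, "war against neighbours", "evil"
)
print(conclusion)
```

The sketch makes visible why the text calls this analysis crude: all of the risk sits in step 1, which generalizes from one case without asking whether the shared attribute is relevant to the projected one.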
2.3 Hesse’s Theory
Hesse’s classic, Models and Analogies in Science (1966), is a rich study of the role of analogies in theoretical confirmation and explanation. I focus here on chapters 2 and 3 of her book, which present criteria for plausibility and a discussion of the problem of justification.
Hesse’s Criteria
In chapter 2 of her book, Hesse formulates three requirements that an analogical argument must satisfy in order to be acceptable:6
1. Requirement of material analogy. The horizontal relations must include similarities between observable properties.
2. Causal condition. The vertical relations must be causal relations “in some acceptable scientific sense” (1966, 87).
3. No-essential-difference condition. The essential properties and causal relations of the source domain must not have been shown to be part of the negative analogy.
In addition, Hesse makes an important simplifying assumption:
6. The labels for these conditions are my own. I have substituted “source domain” and “target domain” for Hesse’s “model” and “explanandum.” See “Representation and Terminology,” section 1.4, for definitions of the horizontal and vertical relations, and of the positive and negative analogy.
4. Similarity-identity reduction. All similarities can be resolved into relationships of identity and difference. Let us first discuss the simplifying assumption, and then the three requirements.
SIMILARITY-IDENTITY REDUCTION.
Hesse supposes that in any analogical argument, the horizontal relations between the source and target domains can be adequately represented by a list of identities and differences. She offers the following argument: When similarities are recognized, they are described in some such way as, “Both analogues have property B, but whereas the first has property A, the second has instead property C.” It may be that when the nature of the similarity is pressed, it will be admitted that the analogues do not both have the identical property B, but two similar properties, say B and B´, in which case the analysis of the similarity of B and B´ repeats the same pattern. But if we suppose that at some point this analysis stops, with the open or tacit assumption that further consideration of difference between otherwise identical properties can be ignored, we have an analysis of similarity into relations of identity and difference. (1966, 70–71) The crucial premise of this argument is that at each step in the analysis, the only way to analyze similarity is in terms of near-identity and difference. Given that premise, the only possible stopping point for the analysis is when we are content to treat all nearly identical properties as identical. That implies a complete resolution of the original similarity into relations of identity and difference. I shall mention just two difficulties with Hesse’s argument. The first is that it is overly sanguine to suppose that the stopping point represents a situation where further differences can fairly “be ignored.” In practice, the advocate for an analogical argument often deliberately stops—indeed, often must stop—with identities that suppress important differences. A good illustration is provided by Reid’s argument for life on other planets (section 1.6). Reid begins by noting that Earth and other planets orbit and are illuminated by the sun, have moons, and rotate on an axis. 
This list of identities clearly suppresses differences in orbital distance, levels of radiation, numbers and sizes of moons, and rates of rotation. The right stopping point in the analysis of similarity may be a matter of dispute. The issue needs to be addressed by a good theory of analogical arguments. The second difficulty is that, contrary to Hesse’s key premise, alternative analyses of similarity may be available. This point is clearest for mathematical analogies. There are numerous similarities between two- and three-dimensional objects. For example, the formulas for the area and perimeter of a rectangle are similar to those for the volume and surface area of a box. These similarities are best expressed not by any list of identities and differences but, rather, by noting that the formulas are instances of a common generalization.
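The rectangle and box example can be made concrete. The sketch below assumes one natural candidate for the "common generalization": for an n-dimensional box with side lengths a_1, ..., a_n, the interior measure is the product of the sides, and the boundary measure is twice the sum, over each side, of the product of the remaining sides. The function names `content` and `boundary` are illustrative choices, not terminology from the text.

```python
from math import prod

def content(sides):
    """Interior measure of an n-dimensional box: area for n = 2, volume for n = 3."""
    return prod(sides)

def boundary(sides):
    """Boundary measure of an n-dimensional box: perimeter for n = 2, surface area for n = 3.

    Each of the n directions contributes two opposite faces, and each face is
    itself a box formed from the remaining n - 1 sides.
    """
    sides = tuple(sides)
    return 2 * sum(prod(sides[:i] + sides[i + 1:]) for i in range(len(sides)))

# Rectangle with sides 3 and 4: area a*b, perimeter 2(a + b).
print(content((3, 4)), boundary((3, 4)))

# Box with sides 2, 3, 4: volume a*b*c, surface area 2(ab + bc + ca).
print(content((2, 3, 4)), boundary((2, 3, 4)))
```

On this reading, the similarity between the two-dimensional and three-dimensional formulas is not a list of shared and unshared properties; both pairs of formulas fall out of one pair of definitions by fixing n.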
In short, the assumption that similarity can be unproblematically reduced to identities and differences is legitimate in important special cases, but should not be taken as generally valid. REQUIREMENT OF MATERIAL ANALOGY. Hesse insists that an acceptable analogical argument must include “observable similarities” between domains, which she refers to as material analogy. Material analogy is contrasted with formal analogy. Two domains are formally analogous if both are “interpretations of the same formal theory” (68). Heat and fluid flow exhibit formal analogy because the relevant physical laws have a common mathematical form. As a more pedestrian example, Hesse cites a political analogy, likening the relationship between a state and its citizens to the relationship between a father and his children. A purely formal analogy between two domains involves abstract similarities between relations within the source domain and relations within the target domain. By contrast, material analogy between two domains consists of what Hesse calls “observable” or “pre-theoretic” similarities, by which she means similarities that can be known independently of relations within the target domain (69). For example, in drawing an analogy between sound and light, we might begin by noting similarities between echoes and reflection. These similarities are accessible even without any detailed theory about the nature of light. Hesse’s requirement of material analogy puts her in opposition to most recent computational theories of analogical reasoning. As we shall see in chapter 3, a popular approach takes high-level relational correspondences or “systematicity” to be the heart and soul of analogical reasoning. Setting aside this debate for the moment, let us examine Hesse’s requirement. It is puzzling that Hesse should take this hard line against purely formal analogies. 
The puzzle dissolves, however, once we realize that, in Hesse’s view, an analogical inference transfers information from a well-understood source domain to a target about which very little is known. For Hesse, we generally have no theory of the target domain and, specifically, no theoretical knowledge of the vertical relations among objects in the target. If we had sufficient theoretical knowledge of the target to construct a formal isomorphism, then we could indeed “transfer” information from the source domain, but this would not constitute genuine analogical reasoning. There would be no novel or predictive use of the analogy; the transferred information would be derivable from what we already knew about the target, independently of the analogy. By contrast, in a case of genuine analogical reasoning, we begin with low-level observable similarities, such as those between echoes and reflection. These similarities (Hesse’s material analogy) suggest further correspondences, such as those between pitch and color, or between volume and brightness. We now make novel predictions: knowledge of mathematical formulas relating these parameters for the case of sound can be transferred to analogous relationships for light. Formal analogy is the end result, not the starting point, of analogical reasoning. I think that this reflects Hesse’s view and explains her motivation for the requirement of material analogy.
There are compelling reasons to reject these assumptions about analogical reasoning and to drop the requirement of material analogy. First, it is apparent that formal analogies are the starting point in many important analogical inferences. That is certainly true in mathematics, a field in which material analogy, in Hesse’s sense, plays no role at all. Analogical arguments based on formal analogy have also been extremely influential in physics (Steiner 1989, 1998). Second, Hesse faces great difficulty in justifying the weight she attaches to “pre-theoretic” similarities. There is no positive argument for giving our naïve judgments of resemblance a privileged status. This leads us to the third and most important point. Hesse’s model of analogical reasoning assumes too little knowledge of the target domain. Often, an analogical argument is employed to extend our theoretical knowledge of the target domain. The similarities that constitute the basis of the analogy need not be independent of our theories about the target. In fact, Hesse’s “pre-theoretic” associations may be less arbitrary if incorporated into a partial theory of the target. CAUSAL CONDITION.
Hesse requires that the hypothetical analogy, the feature transferred to the target domain, be causally related to the positive analogy. In her words, the essential requirement for a good argument from analogy is “a tendency to co-occurrence,” that is, a causal relationship. She states the requirement as follows: “The vertical relations in the model [source] are causal relations in some acceptable scientific sense, where there are no compelling a priori reasons for denying that causal relations of the same kind may hold between terms of the explanandum [target]” (1966, 87). Actually, this statement combines the causal condition with Hesse’s no-essential-difference condition, to be discussed shortly. The causal condition rules out analogical arguments where there is no causal knowledge of the source domain. It derives support from the observation that many analogies do appear to involve a transfer of causal knowledge. The causal condition is on the right track but, like Hesse’s other conditions, is too restrictive. The main difficulty is that the “tendency to co-occurrence,” as a characteristic of good analogical arguments, embraces more than causal connection. For example, the causal condition rules out analogical arguments in mathematics. Even if we limit attention to the empirical sciences, persuasive analogical arguments may be founded upon strong statistical correlation in the absence of any known causal connection. Consider Benjamin Franklin’s prediction that pointed metal rods would attract lightning, by analogy with the way they attracted the “electrical fluid” in the laboratory: Electrical fluid agrees with lightning in these particulars: 1. Giving light. 2. Color of the light. 3. Crooked direction. 4. Swift motion. 5. Being conducted by metals. 6. Crack or noise in exploding. 7. Subsisting in water or ice. 8. Rending bodies it passes through. 9. Destroying animals. 10. Melting metals. 11. Firing inflammable substances. 12.
Sulphureous smell.—The electrical fluid is attracted by points.—We do not know
whether this property is in lightning.—But since they agree in all the particulars wherein we can already compare them, is it not probable they agree likewise in this? Let the experiment be made. (Franklin 1941, 334) Franklin’s hypothesis was based on a long list of properties common to the target (lightning) and source (electrical fluid in the laboratory). There was no known causal connection between the twelve “particulars” and the thirteenth property, but there was a strong correlation. In light of these observations, we should admit that analogical arguments may be plausible even where there are no known causal relations. NO-ESSENTIAL-DIFFERENCE CONDITION. Hesse’s final requirement is that the “essential properties and causal relations of the [source] have not been shown to be part of the negative analogy” (1966, 91). Hesse does not provide a definition of “essential,” but suggests that a property or relation is essential if it is “causally closely related to the known positive analogy.” For instance, an analogy with fluid flow was extremely influential in developing the theory of heat conduction. Once it was discovered that heat was not conserved, however, the analogy became unacceptable (according to Hesse) because conservation was so central to the theory of fluid flow. Although this requirement appears plausible, once again it is too restrictive. It can lead to the rejection of a good analogical argument. Consider once again the analogy between a two-dimensional rectangle and a three-dimensional box. Broadening Hesse’s notion from causal to logical relationships, it seems that there are many “essential” differences between rectangles and boxes. For instance, a line can divide a rectangle, but not a box, into two disconnected sets. This does not mean that we should reject every analogy between rectangles and boxes out of hand.
The problem derives from the fact that Hesse’s condition is applied to the analogy relation independently of the use to which that relation is put. What counts as essential should vary with the analogical argument. Absent an inferential context, it is impossible to evaluate the importance or “essentiality” of similarities and differences. Taking this point a little further, Hesse’s requirement has the following counterintuitive consequence: the less extensive the positive analogy, the stronger the argument is likely to be. As the positive analogy expands, more properties of the source become essential on Hesse’s account, and the risk of an essential disanalogy increases. That is just what happens in the case of the analogy between heat and fluids. This consequence is counterintuitive because, all things being equal, a broad positive analogy contributes to the plausibility of an analogical argument. REFINING THE COMMONSENSE MODEL. Even though I reject Hesse’s assumption about similarity and her requirement of material analogy, I think that much is right about her “commonsense” theory. I have suggested replacing her causal condition with a more general requirement that the vertical relations in the source domain may be causal, logical, statistical, or other relations that reflect a “tendency to co-occurrence.” I have argued that her no-essential-difference
condition should be refined so that evaluation of what is essential depends upon the individual analogical argument. In the concluding section, I shall argue that these observations move us toward the principles of my own theory: prior association and potential for generalization. The remainder of the chapter discusses the problem of formulating a justification for analogical arguments, first in Hesse’s work and then in other philosophical accounts.
Hesse’s Analysis of Justification
Hesse is rather pessimistic about the prospects for providing a philosophical foundation for the argument from analogy (1966, 55). Her discussion of the problem, in her chapter 3, is very limited. One puzzling limitation is that the discussion makes no mention of Hesse’s own criteria for good analogical arguments. The challenge she poses amounts to finding a justification for every analogical argument that conforms to schema (1.4.1), rather than just those that satisfy her criteria.⁷ Such a justification would prove too much, as Hesse should accept given her suspicions about purely formal analogies. Rather than review her discussion at length, I concentrate on one argument that illustrates her approach to matters of justification. She raises the following problem (1966, 101–102): why is it more reasonable to select a hypothesis based on a model (i.e., on an analogy) than one that is not? Putting things in terms of schema (1.4.1), suppose that Q is a feature known to hold in source domain S, with Q* conjectured to hold in the target T. Since Hesse identifies Q and Q*, let’s write Q(S) and Q(T) for the hypotheses that Q holds in the source and target domains, respectively. The problem becomes: How can we show that it is more reasonable to accept the hypothesis Q(T) than a rival hypothesis X(T) that is not supported by any analogy? Hesse shows that some theories of confirmation provide a means of solving this problem. For example, drawing on Hempel’s theory of confirmation, Hesse shows that a single source domain S satisfying P(S) and Q(S) provides inductive support for the generalization $(x)(P(x) \rightarrow Q(x))$ and hence, given P(T), support for Q(T). The first step is confirmation by an instance; the second step relies on Hempel’s Special Consequence Condition (which asserts that if E confirms H, then E confirms any logical consequence of H).
By contrast, there is no inductive support for the rival generalization, $(x)(P(x) \rightarrow X(x))$. Unfortunately, this justification is exceptionally weak. The strategy requires that we treat the analogical argument as single-case induction, an approach that
7. For schema (1.4.1), see chapter 1.
I criticize (in section 2.5). Briefly, this analysis distorts the argument form by introducing an intermediate step (generalization from a single case) that is harder to justify than the original argument. Most of the purported justifications discussed by Hesse are of a similar nature. Hesse briefly explores the possibility of an a priori justification in her book (1966), but concludes that the required assumptions are too obscure. She returns to the a priori approach in a later work (1974). There, she adopts what she calls the “clustering postulate”: the assumption that our epistemic probability function ought to have a built-in bias toward generalization. Specifically, for any attributes P and Q, and for any finite set of objects $a_1, \ldots, a_n$ having the attribute P, the prior probability of finding a proportion of these objects with attribute Q is skewed in favor of the proportions 1 and 0.⁸ The objections to such global postulates of uniformity are well known (see Salmon 1966), but it is significant that Hesse, after such lengthy reflections about analogical reasoning, should arrive at the view that an a priori approach is unavoidable. Still, the later approach of Hesse (1974) does not succeed. If Hesse’s “clustering postulate” could be defended, then once again we would have a justification for every analogical argument that conforms to schema (1.4.1). As I have stressed, that is much too strong a conclusion.
2.4 Analogy as Deduction
A number of philosophers have suggested that, in at least some cases, background knowledge can be supplied to convert an analogical inference into a deductively valid argument. Some people believe that this is the most promising way to think about any analogical argument. Here I shall discuss two “deductivist” approaches. Russell and Davies’s model involves determination rules.⁹ Suppose that Q and $P_1, \ldots, P_m$ are variables, and we have background knowledge that the value of Q is determined by the values of $P_1, \ldots, P_m$. In the simplest case, where m = 1 and both P and Q are binary Boolean variables, this reduces to
$$(x)(P(x) \rightarrow Q(x)) \lor (x)(P(x) \rightarrow \neg Q(x)), \tag{2.4.1}$$
that is, P determines whether or not Q holds. More generally, the form of a determination rule is
$$(x)(y)[(P_1(x) = P_1(y)) \land \ldots \land (P_m(x) = P_m(y)) \rightarrow (Q(x) = Q(y))], \tag{2.4.2}$$
8. Readers will recognize, and Hesse explicitly acknowledges, Carnap as the inspiration for her “clustering postulate.”
9. These ideas are developed in Russell (1986a), Davies and Russell (1987), and Davies (1988).
where x and y are quantified over a domain of objects. This can be represented more compactly in functional notation as
$$Q = F(P_1, \ldots, P_m). \tag{2.4.3}$$
Such a relationship is called a determination rule. If we assume such a rule as part of our background knowledge, then there is a corresponding analogical argument that is deductively valid. If the source S matches the target T on all values $P_i$, then we may validly infer that Q(T) = Q(S). Adapting one of the examples given by Russell and Davies, let’s suppose that the value (Q) of a used car (relative to a particular buyer) is determined by its year, make, mileage, condition, color, and accident history (the variables $P_i$). It doesn’t matter if one or more of these factors are redundant or irrelevant. Provided two cars are indistinguishable on each of these points, they will have the same value (again, keeping the buyer fixed). There is a deductively valid analogical argument from the determination rule together with the value of one such car to the value of the other car. Russell and Davies claim three virtues for their account. First, it makes the role of background knowledge explicit. Second, it provides a clear and simple justification for the analogical argument. Third, the determination rule does not trivialize the role of the analogy: only by combining that rule with information about the source domain can we derive the value of Q(T). If our determination rule has the form of equation (2.4.1) and we want to infer Q(T), we need the additional information that P(S) · Q(S) holds in the source domain. Determination rules are interesting, and they can provide an illuminating analysis of some analogical arguments. But what promise do they hold for the justification of analogical arguments in general? Russell and Davies maintain that background knowledge of a determination rule underlies many of our analogical inferences. This assertion is questionable. Scientific analogies are commonly applied to problems where we do not possess useful determination rules. In many (perhaps most) cases, researchers are not aware of all relevant factors.
Medical researchers conduct drug tests on animals without knowing all attributes that might be relevant to the effects of the drug. Indeed, one of the main objectives of such testing is to guard against reactions unanticipated by theory. The justification for animal testing is the somewhat vague belief that “the most important life-regulating building blocks are the same in both man and animal” (Kobinger 1989, 33). There is no presumption that we have a comprehensive list of these building blocks. Most physiological processes in humans and animals are still inadequately explained (Lembeck 1989, 24). The “determination rule” interpretation of analogical arguments must either limit the scope of such arguments to cases where we have a well-supported determination rule or focus attention on formulating and justifying an appropriate determination rule. For cases such as animal testing, neither option seems realistic. Weitzenfeld (1984) proposes the slightly more general thesis that analogical arguments are deductive arguments with a missing premise that typically
amounts to a determination rule or relation. The main difference between his view and that of Russell and Davies is that he does not insist that the missing premise should be background knowledge. Instead, he suggests three ways in which the missing premise might be discovered and justified: enumeration, surveillance, and inference. Enumeration amounts to examining the source and target domains systematically. This process is “not promising” (146–47) because, in the course of such an examination, we are likely to find independent means for verifying or disproving the conclusion of the analogical argument, which then becomes redundant. Surveillance is a matter of perceiving that the same determining structures are present in both domains: “when an array of elements is displayed, either in a visual presentation or to other senses, an isomorphism may be immediately apparent” (147). The idea is that we have a holistic grasp of similarity that justifies the inference to particular similarities. But as Weitzenfeld concedes, “no one quite knows how it is done.” Finally, by inference Weitzenfeld appears to have in mind background theoretical considerations. In the case of drug testing on animals, for example, we have background knowledge of our common evolutionary history and similar functional organization. In the end, Weitzenfeld acknowledges, the premise that there are common determining relations is “based upon plausibility arguments” (148). If that is right, then we have come full circle. We have replaced an argument by analogy, requiring no commitment to any generalization, with a valid deductive argument that requires an extra premise that has to be supported with plausibility arguments. Recasting analogy as a deductive argument may help to bring out background assumptions, but it makes little headway with the problem of justification. 
That problem reappears as the need to establish the plausibility of the determination rule, and that is at least as difficult as justifying the original analogical argument. Despite the difficulties, there is a powerful intuition behind deductivist analyses of the argument from analogy. That intuition is symmetry. Two cases perfectly alike in all relevant attributes, such as two perfectly similar used cars, demand similar treatment. Can we rely upon symmetry to obtain a more realistic justification for analogical arguments, even for cases where we cannot assume that we have a determination rule? I say a little about this question in the concluding section of this chapter, and a lot more about it in chapter 8.
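The logical form of the used-car example from this section can be made concrete with a minimal Python sketch. It is not drawn from Russell and Davies: the helper `infer_by_determination`, the attribute names, and the price are all hypothetical, chosen only for illustration. The function returns the source's value of Q when source and target agree on every determining attribute, and otherwise declines to infer anything.

```python
from typing import Any, Mapping, Optional, Sequence

# Hypothetical determining attributes P_1, ..., P_m for the used-car example.
DETERMINANTS = ("year", "make", "mileage", "condition", "color", "accident_history")


def infer_by_determination(
    source: Mapping[str, Any],
    target: Mapping[str, Any],
    determinants: Sequence[str],
    q: str,
) -> Optional[Any]:
    """Return Q(S) as the inferred Q(T) if S and T match on every P_i.

    The inference is only as good as the assumed determination rule
    Q = F(P_1, ..., P_m); this code does not check that rule.
    """
    if all(source[p] == target[p] for p in determinants):
        return source[q]
    return None  # the rule licenses no conclusion when some P_i differs


# Invented data: two cars that agree on every determinant.
car_s = {"year": 2005, "make": "Volvo", "mileage": 90000, "condition": "good",
         "color": "blue", "accident_history": "none", "value": 7500}
car_t = {"year": 2005, "make": "Volvo", "mileage": 90000, "condition": "good",
         "color": "blue", "accident_history": "none"}

inferred_value = infer_by_determination(car_s, car_t, DETERMINANTS, "value")
```

Changing any one attribute of `car_t` makes the function return `None`. This mirrors the limitation noted above: the rule by itself says nothing about imperfect matches, and the whole inference rests on already knowing a complete list of determinants.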
2.5 Analogy as Enumerative Induction
Several philosophers have attempted to portray, and justify, analogical arguments in terms of some simpler inductive argument pattern. Here I consider two moderately popular strategies. The first treats analogical reasoning as generalization from a single case; the second treats it as a kind of sampling argument.
Single-case Induction
As we saw earlier, Aristotle’s argument from example (paradeigma) consists of generalization from a single case (the source) followed by a trivial deductive inference to the target case (see figure 2.1). I believe that this is a poor way to think about most analogical arguments, but let’s give it a hearing. As a preliminary point of clarification, we restrict our attention here to analogical arguments based on a single case. Historically, many people, including Hume (1739/1978) and Keynes (1921), refer to the “argument from analogy” even where there are multiple instances of a single type. There is no clear separation between arguments from analogy and those based on enumerative induction. If we accept the latter form of reasoning, then analogical arguments are (uninterestingly) justified whenever we have a sufficient number and variety of instances. My focus, however, remains the problem of justifying an analogical argument based on a single case, and I limit discussion accordingly. Can a single instance lead to justified generalization? It can. Cartwright argues that we can sometimes generalize from a single careful experiment: [In] the best experimental designs, where we have sufficient control of the materials and our knowledge of the requisite background assumptions is secure, one single instance can be enough . . . Clearly, in these physics experiments we are prepared to assume that the situation before us is of a very special kind: it is a situation in which the behaviour that occurs is repeatable. Whatever happens in this situation can be generalized. (1992, 51) If a sample of a chemical compound dissolves in water or some other solution, we justifiably infer that any sample of that compound will be soluble under similar conditions. We know from experience that compounds have stable natures that comprise properties like solubility. If the property belongs to one sample, then it belongs to all.
What are the “requisite background assumptions?” In Cartwright’s view, we must presuppose permanent, Aristotelian natures. In a similar spirit, Quine (1969) maintains that we can have instantial confirmation only for natural kinds. Kyburg attempts a general characterization of the required background knowledge for single-case generalization: A species S falls under a certain genus G; a property P belongs to a certain class of properties K. On the basis of previous knowledge, we know that all the species of that genus G are such that if one member of that species has a given property belonging to that class of properties [K ], then all members of that species have that same property. (1970, 121) Typically, this means that members of G are natural kinds and properties in K are essential properties of the kinds in G.
Based on the foregoing, the objection to understanding all analogical arguments as single-case induction should be obvious: it is simply too restrictive. Some analogical arguments may fit this pattern, but not most. In fact, successful analogical arguments may lead to the discovery of a natural kind, as we shall see in section 6.2. But we don’t know that we are dealing with a natural kind when we make the original analogical argument. Interpreting the argument from analogy as single-case generalization is also counterproductive in another way. The analysis does little to advance the search for criteria that help us to distinguish between relevant and irrelevant similarities, and hence between good and bad analogical arguments.
Sampling Arguments
On the sampling conception of analogical arguments, acknowledged similarities between two domains are statistically relevant evidence for further similarities. The most direct version of the sampling argument is due to Mill. An argument from analogy, he writes, is “a competition between the known points of agreement and the known points of difference.” Agreement of A and B in 9 out of 10 properties implies a probability of 0.9 that B will possess any other property of A: “we can reasonably expect resemblance in the same proportion” (Mill 1843/1930, 367). His only restriction has to do with sample size: we must be relatively knowledgeable about both A and B. If the extent of unascertained properties of A and B is large, similarity in a small sample is not a reliable guide. Thus, Mill rejects Reid’s argument that, on the basis of a few gross similarities between earth and moon, there is an appreciable probability of life on the moon. When we consider how “unmeasurably multitudinous are those of their properties which we are entirely ignorant of,” we must attach little importance to the few known similarities. By contrast, Mill saw no difficulty in using analogical reasoning to infer characteristics of newly discovered species of plants or animals, given our extensive knowledge of botany and zoology.¹⁰ A slightly more elaborate form of the sampling argument, presented by Harrod (1956), is based on the following idea. The known properties of S (the source domain) may be considered a random sample of all S’s properties: random, that is, with respect to the attribute of also belonging to T (the target domain). If the majority of known properties that belong to S also belong to T, then we should expect most other properties of S to belong to T, for it is unlikely that we would have come to know just the common properties.
Harrod indicates his reliance on what he calls the fair sampling postulate, which amounts to the supposition that if the proportion of B’s in a large population of A’s is p, then the chance that a random sample of size n contains exactly r B’s is given by the binomial distribution, $\binom{n}{r} p^r (1 - p)^{n - r}$.
10. In Mill’s defense, we should note that elsewhere he rejects this crude similarity measure and states that the probability value is usually “non-assignable.” Mill also develops a justification of analogy along lines similar to Keynes’s principle of the limitation of independent variety, a contribution explicitly acknowledged by Keynes (1921, 268).
The sort of problem to which this distribution standardly applies is drawing balls from an urn. The binomial distribution gives the chance of drawing r black balls in n selections (with replacement) from an urn in which the proportion of black balls is p. Applied to the argument from analogy, Harrod defines the “population” as all observable features of the two domains and the “sample” as the actually observed features of the source domain. If most properties in the sample are shared by both systems, and if we assume the fair sampling postulate, then simple combinatorial reasoning sanctions the inference that the two domains have further observable similarities. To illustrate, Harrod asks us to imagine that we have observed 26 features of one cat and 13 of a second cat. All 13 happen to be shared with the first cat. On the hypothesis that just those 13 features and no more are shared, what has been observed is highly improbable: only one out of $\binom{26}{13}$ “samples” of 13 features would be a perfect match. Hence, the hypothesis that there are further similarities becomes probable. Of course, this argument does not justify our inferring any specific additional similarity (the usual conclusion of an argument from analogy). There are grave difficulties with Harrod’s and Mill’s analyses. One obvious problem, noted earlier, is that the “population” is poorly defined. How are we to count similarities and differences? The ratio of shared to total known properties varies dramatically according to how we do this. Should properties logically implied by other properties be counted? Should some properties be weighted more heavily than others? Even Peirce, a staunch advocate of the view that induction and analogy were both “essentially inferences from sampling,” acknowledged an element of guesswork in estimating which properties are significant (Peirce, Hartshorne, and Weiss 1931–1935, 6: 31 and 6:526).
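The two-cat example and the counting problem can both be checked with simple arithmetic. The following minimal Python sketch is not taken from Harrod or Mill. It first computes the chance of the perfect match in the two-cat example, then shows how a Mill-style ratio of shared to known properties shifts when logically redundant properties are counted. The function name `mill_ratio` and the property counts in the second part are hypothetical.

```python
from fractions import Fraction
from math import comb

# Harrod's two-cat example: 13 observed features out of 26, where (by
# hypothesis) exactly 13 of the 26 are shared. Only one of the C(26, 13)
# equally likely samples consists of just the shared features.
n_samples = comb(26, 13)
p_perfect_match = Fraction(1, n_samples)


def mill_ratio(shared: int, known: int) -> Fraction:
    """Mill-style estimate: proportion of known properties that are shared."""
    return Fraction(shared, known)


# Hypothetical counts: 9 shared properties out of 10 known ones.
baseline = mill_ratio(9, 10)

# Count 5 further differences that are logically implied by the one known
# difference: no new information, but the ratio drops.
with_redundant_differences = mill_ratio(9, 15)

# Count 10 further similarities implied by the known ones: the ratio rises.
with_redundant_similarities = mill_ratio(19, 20)

print(n_samples, float(p_perfect_match))
print(baseline, with_redundant_differences, with_redundant_similarities)
```

Under these assumed counts the same evidence yields estimates of 9/10, 3/5, or 19/20, depending only on how the properties are individuated.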
If we can add similarities or differences at will, then these arguments yield inconsistent results. Sampling arguments have no value without guidelines for counting similarities, or better, a theory of relevance. A second serious difficulty lies in justifying the assumption that the sample of known features is random. The paradigm of repeated selection from an urn seems totally inappropriate. In the case of the urn, the selection process is arranged so that the result of each choice is not influenced by the agent’s intentions or purposes, or by prior choices. There is good reason to believe that samples are randomly selected. By contrast, the presentation of an analogical argument is always partisan. Bias enters into the initial representation of similarities and differences: the advocate highlights similarities, while the critic plays up differences. We have excellent reasons to reject a distribution based on random sampling. A final and somewhat more promising version of the sampling argument, owing to Stuart Russell (1988), avoids some of the difficulties just discussed. Russell’s innovation is to incorporate his ideas about determination rules (section 2.4). He is well aware of the problem of biased representation. Provided we have a fixed list that includes all determining factors, however, he shows how it is possible to make comparative use of a sampling argument. We can adjudicate between competing analogical arguments. The key idea is that the most similar source domain, measured in terms of the number of known matches on our fixed list, is “most likely to match the target on
the properties which are relevant to the query even though the identity of those properties is unknown” (1988, 257). Greater similarity increases the likelihood that the relevant determining factors are included among the set of shared features. In more detail, suppose that the different source domains and the target are described by assigning values to m attribute variables $P_1, \ldots, P_m$, and we know the value of the variable Q for the sources but not the target domain. Russell’s argument relies upon two crucial assumptions:
1. All features relevant to the value of Q appear among $P_1, \ldots, P_m$. These features determine the value of Q. Thus, the “fixed list” is $P_1, \ldots, P_m$.
2. Given no further information, we may assume each variable $P_i$ is equally likely to be relevant (i.e., one of the determining factors for Q).
Now suppose that exactly r of the attributes on our list are relevant to Q. Given any source domain, let s be the number of attributes $P_i$ such that $P_i$ has the same value (a match) in the source and the target. Russell takes s to be the degree of similarity between source and target. Let Pr(s, r) be the probability that a source with similarity s to the target matches it on the r relevant attributes. By (2), we have
$$\Pr(s, r) = \binom{s}{r} \bigg/ \binom{m}{r}.$$
The higher the number of similarities, the greater the chance of a perfect match on all relevant attributes. Furthermore, if source and target match on all relevant attributes, then by assumption 1, above, they also match on Q. This measure allows us to compare different source domains and select the best one. Russell’s argument makes it unnecessary to assume that we have a random sample of properties, but it still fails to solve the counting problem. The list of attributes $P_1, \ldots, P_m$ includes factors both relevant and irrelevant to Q. Unless this list is fixed in some nonarbitrary manner, the probability values change when we add irrelevant attributes.
The ordinal ranking of different sources by number of matches will be similarly sensitive. In addition, Russell’s argument depends upon the knowledge that our list of attributes is complete—that is, sufficient to determine the value of Q. As pointed out in section 2.4, analogies are often used in situations where we lack such knowledge.
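Russell’s measure, and its sensitivity to the choice of attribute list, can be illustrated with a short Python sketch. The formula is Pr(s, r) = C(s, r) / C(m, r), as in the text; the function name `match_probability` and all the numbers are hypothetical choices made only for illustration.

```python
from fractions import Fraction
from math import comb


def match_probability(s: int, m: int, r: int) -> Fraction:
    """Pr(s, r): chance that the r relevant attributes all lie among the s
    matched attributes, when every r-subset of the m listed attributes is
    equally likely to be the relevant one (assumption 2)."""
    if not 0 <= r <= m or not 0 <= s <= m:
        raise ValueError("need 0 <= r <= m and 0 <= s <= m")
    return Fraction(comb(s, r), comb(m, r))  # comb(s, r) is 0 when r > s


# Hypothetical list of m = 10 attributes, r = 3 of them relevant.
source_a = match_probability(s=6, m=10, r=3)   # 20/120 = 1/6
source_b = match_probability(s=5, m=10, r=3)   # 10/120 = 1/12

# Add 3 attributes that are irrelevant to Q; suppose B matches the target
# on all three and A on none. The relevant attributes are unchanged, yet
# the ranking of the two sources reverses.
source_a_padded = match_probability(s=6, m=13, r=3)   # 20/286 = 10/143
source_b_padded = match_probability(s=8, m=13, r=3)   # 56/286 = 28/143

print(source_a > source_b, source_a_padded > source_b_padded)  # True False
```

In this invented case source A is preferred on the original list and source B on the padded list, although nothing relevant to Q has changed. That is the sense in which both the probability values and the ordinal ranking depend on how the list is fixed.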
2.6 Skeptical Objections
Hesse’s reluctance in Models and Analogies in Science (1966) to probe deeply into the question of justification may reflect pessimism about the prospects for success, but she clearly believes that analogies play an important part in scientific inquiry. Agassi (1964, 1988) goes beyond pessimism. He formulates two skeptical arguments that purport to show that most analogical arguments cannot be justified. In his view, they show nothing at all except in the special case
where we have two isomorphic systems of laws and we transfer consequences from one system to the other. In such cases, which Agassi calls “hard analogies,” the analogical argument is deductively valid. The remaining “soft analogies” have no value. Agassi’s thesis, more specifically, is that all “soft” arguments by analogy are either ad hoc or redundant. Agassi’s first skeptical argument relies upon the idea that analogical arguments amount to single-case generalization.¹¹ He illustrates with a simple example. Tom is an Englishman and tall. Dick is an Englishman; so, by analogy, Dick is also tall. If this argument is warranted, then so is the inference that Harry, another Englishman, is tall. “In other words, all Englishmen are tall.” We cannot arbitrarily discriminate between Dick and Harry; we must give some reason: that Harry, unlike Tom and Dick, did not drink milk as a child. “In other words, all Englishmen who drank milk in childhood are tall.” In general, if P represents the positive analogy, Q is the hypothetical analogy, and T stands for the target domain, then P provides a warrant for Q(T) just in case it provides a warrant for the generalization $(x)(P(x) \rightarrow Q(x))$. But this generalization either has already been accepted on the basis of independent evidence, in which case the analogy is redundant, or is based on a single instance, in which case it is ad hoc. As pointed out earlier, it is misleading to construe analogical arguments as requiring an intermediate step of generalization from a single case. Let’s take a closer look at this point, which is plainly crucial to Agassi’s argument. Agassi commits a straightforward fallacy in supposing that a warrant for each instance of a generalization is also a warrant for the generalization. This supposition is a form of epistemic closure, and it fails for any argument form whose conclusions are probabilistic.
The simplest illustration is the Lottery Paradox: in a large lottery, evidence warrants the inference that my ticket will lose, your ticket will lose, and so on. But it does not warrant the generalization that every ticket will lose. Since the conclusion of an analogical argument is a claim about plausibility, we should expect closure to fail.
Agassi’s second skeptical argument starts from the observation that analogies are always proposed within a context of background assumptions. Spelling out this context is a crucial step in evaluating an analogical argument, as is clear from the example of the milk-drinking Englishmen. Agassi’s argument now proceeds:
1. There is no utterly context-less analogy.
2. Analogical arguments rely on methodological essentialism.
3. Methodological essentialism is false.
Premise 1 asserts that for any analogical argument, important assumptions will remain in the background. The other two premises rely upon “methodological essentialism.” By “essentialism,” Agassi means the view that there are natural
11. This argument is first presented in Agassi (1964), but is repeated in the later article.
kinds. Methodological essentialism is the assumption that human intuition reliably identifies natural kinds (1988, 417). Here is my interpretation of Agassi’s argument. Premise 1 asserts the existence of residual factors (the “context”) in any analogical argument. Premise 2 equates the assumption that those residual factors can safely be ignored with the assumption that the source and target domains belong to a common natural kind. Finally, we have premise 3, for which Agassi provides a very terse argument: “evolutionism makes the natural classification both open-ended and (ecological) context-dependent.” The point here is that our classification systems are heavily dependent on how we have evolved and on changing human purposes. We have no justification for the assumption of a common natural kind in any particular case because there is no reason to think that we have reliable intuitions about natural kinds in general. In responding to Agassi, we could criticize premise 3, which is not adequately defended. Instead, I shall concentrate on premise 2. As a preliminary observation, there is a strong prima facie case against Agassi’s claim that methodological essentialism is an indispensable assumption. The case law system provides an excellent example of a well-functioning system guided by analogical reasoning without any reliance upon methodological essentialism. Analogical arguments are also ubiquitous in mathematics (Polya 1954), a setting in which intuitions about natural kinds are not pertinent. These counter-examples suggest that we should re-examine Agassi’s analysis. We want a justification for analogical arguments that shows why methodological essentialism is unnecessary. Although we are not yet in a position to provide such a justification, we can clear the way for one by identifying the mistakes in Agassi’s argument. We need one new bit of terminology. 
As always, let P stand for the positive analogy between the source and target domains, and Q for the hypothetical analogy. I have argued that for a good analogical argument, we must suppose that there is some connection between P and Q; this is the prior association (see section 2.3). Let’s focus on the special case where the connection can be stated as a universal generalization. Let us call a generalization,
$(x)((P(x) \cdot C(x)) \rightarrow Q(x))$   (2.6.1)
a supporting generalization if C is common to the source and target. (A true supporting generalization guarantees a successful analogical argument.) There are more complex forms of prior association, but this one is sufficient for responding to Agassi.
Our criticism of Agassi’s first argument is that we need not commit ourselves to any particular supporting generalization when we argue by analogy. We might believe that there is some generalization that links drinking milk and becoming tall, but refuse to commit ourselves to any particular claim of the form (2.6.1). It suffices that we take seriously the possibility that some supporting generalization holds. This simple observation is also the key to answering Agassi’s second argument. Even if we lack a reliable instinct about natural
kinds, Agassi’s argument provides no reason to assign negligible probability to the existence of some supporting generalization. In order to explain this point, let’s consider an example. Suppose we are considering a new antibiotic that has proved to be effective on laboratory mice. The source domain is the mice undergoing the tests; the target domain is humans. Let P designate certain features taken to be common to mice and humans. Let Q signify recovery from infection. Let C consist of unspecified further properties of mice that, together with P, determine the positive response Q.12 We infer by analogy that, plausibly, humans who take the antibiotic will respond positively. It may be that there is a true supporting generalization; the factors in C may be common to mice and humans. Alternatively, there may be no supporting generalization. The first possibility is favorable to our analogical argument; the second is not. Agassi’s premise 2 could be motivated by two lines of reasoning: 1. Zero probability. Without methodological essentialism, there is no justification for assigning other than zero or infinitesimal probability to the favorable case that a supporting generalization is true. 2. Negligible probability. Without methodological essentialism, the favorable case has such a small positive probability that the risks of being guided by the analogy—and adopting the working hypothesis that humans might respond well to the medication—outweigh the possible benefits. Either way, the idea is that the favorable case lacks prima facie plausibility. Both lines of reasoning can be represented by the simple decision table shown in figure 2.2. The expectation if we adopt the working hypothesis is pB – (1 – p)C. A skeptic about analogical arguments holds that p is always either zero or so small that this quantity is negative. We always do better to reject the analogy (and the working hypothesis). 
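The decision table's arithmetic can be made concrete with a small sketch. The function names and the numerical stakes below are hypothetical, chosen only for illustration; the one formula taken from the text is the expectation pB – (1 – p)C. Setting that expectation to zero gives a break-even probability of C / (B + C), so the skeptic's claim amounts to saying that p always falls below this threshold.

```python
def expected_value(p: float, benefit: float, loss: float) -> float:
    """Expectation of adopting the working hypothesis: pB - (1 - p)C."""
    return p * benefit - (1.0 - p) * loss


def break_even_probability(benefit: float, loss: float) -> float:
    """Value of p at which adopting has expectation 0, from pB = (1 - p)C."""
    return loss / (benefit + loss)


# Hypothetical stakes (not from the text): a large possible benefit, a modest cost.
B, C = 100.0, 5.0
threshold = break_even_probability(B, C)

for p in (0.01, 0.05, 0.20):
    verdict = "adopt" if expected_value(p, B, C) > 0 else "do not adopt"
    print(f"p = {p:.2f}: expectation = {expected_value(p, B, C):+.2f} ({verdict})")
print(f"break-even p = {threshold:.4f}")
```

When the possible benefit is large relative to the possible loss, the threshold is low, so even a modest probability for a supporting generalization can make the expectation positive.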
Many Bayesians advocate strict coherence (or regularity), the policy that we should assign extreme probability values 0 and 1 only to contradictions and
                                      Supporting generalization (p)    No supporting generalization (1 – p)
Adopt the working hypothesis          Benefit: B                       Loss: –C
Don't adopt the working hypothesis    No benefit/loss                  No benefit/loss
FIGURE 2.2. Adopting a working hypothesis.
12. Again, for the sake of simplicity, we assume a deterministic response to the medication.
logical or mathematical truths. But this policy does not get us very far. If our point is to show that a particular hypothesis has appreciable probability, we cannot be content with an appeal to strict coherence. Agassi’s argument, however, appears to be that without appealing to instincts about natural kinds, there is no way to justify the assignment of appreciable probability to the existence of a supporting generalization. Indeed, the “negligible probability” argument seems to be a good way of construing Agassi’s concerns. His discomfort with reasoning by analogy is that it is a bad gamble:
In this situation one is, or may be, on the verge of a great scientific breakthrough, so that one vacillates between the highest stakes and total loss. (1988, 404)
The fascination, thus, is that of the gambler. What proposes that the gambler’s chances are better when the stakes are the values of analogies is the theory of natural classification, which proposes that we have an intuition leading us to the final, context-free, perfect analogy. This theory, methodological essentialism, is false. (1988, 417)
What is missing here is an evaluation of costs and benefits. Agassi neglects the fact that rejection of all working hypotheses has its own high cost: no progress can be made at all. By contrast, if we employ analogies or some other means of selecting a plausible hypothesis, there is a chance of significant gain in the form of a “great scientific breakthrough.” Agassi also seems to overestimate the potential loss in adopting a working hypothesis that turns out to be false. To adopt a working hypothesis, as discussed in chapter 1, is to accept it in a very limited sense. In the example we have been discussing, the analogy might lead us to conduct further tests on the new antibiotic.
The cost if the hypothesis turns out to be false is significant, but it is not the “total loss” or disaster we might face if we started mass-producing and marketing the drug (and it had unforeseen side effects). As noted above, Agassi’s view requires that the expected value of any working hypothesis supported by analogy must be negative. That assumption is unjustified, in part because the expected value calculation is impossible to perform, and in part because even an informal weighing of costs and benefits does not clearly tell against analogical arguments. I conclude that the way is open for a justification of analogical arguments that is not vulnerable to Agassi’s objections. To develop a convincing constructive justification, however, we first need to provide criteria for discriminating between good and bad analogical arguments and to show that these criteria do not depend on methodological essentialism. That is my objective in chapters 4 through 6. We also need to shift our attention from individual analogical arguments to epistemic policies. Agassi’s policy is universal mistrust: he advocates rejection of all (soft) analogical arguments. An alternative policy is critical analogical reasoning: employ analogical arguments that satisfy certain criteria. In chapter 7, I develop this policy-level justification in detail.
2.7 Conclusion
We should not force all analogical arguments into the mold of deduction or enumerative induction. To do so is both to distort them and to destroy any hope of making real progress on the problem of justification. It is striking that the assessment of which similarities are relevant, a standard step in the evaluation of any analogical argument, has no place in Hesse’s attempted justifications, the reduction to single-case generalization, or the “sampling” approach. Deductive approaches face the opposite problem: they presuppose full knowledge of all relevant similarities. Any approach to justification that leaves no room for critical assessment of relevance is doomed.
The main weakness of the approaches we have discussed derives from the idea that the best general characterization of an analogical argument is something like schema (1.4.1): a list of similarities and differences between our two domains. This representation provides no basis for discriminating between relevant and irrelevant similarities. If that is the best we can do, then any general philosophical justification must avoid the whole question of relevant similarity. That is just what happens with most existing approaches to justification, and it explains why they are bound to fall short. We need a better representation of analogical arguments, and correspondingly more refined evaluation criteria, if we are to make headway on the problem of justification.
In chapter 1 and again in section 2.3, I suggested that good analogical arguments must satisfy the following two criteria:
• Prior Association. There must be some connection in the source domain between the known similarities P (the positive analogy) and the further similarity Q that is projected to hold for the target domain.
• Potential for Generalization. There must be some evidence for, and no compelling evidence against, the generalization of this prior association to the target domain.
The principle of prior association tells us what is missing from schema (1.4.1). We represent an analogical argument not merely by listing similarities and differences, but also by indicating a prior association. The principle of potential for generalization guides us in comparing two domains to determine which similarities and differences are relevant. These two criteria form the core of my theory of analogical arguments, developed in later chapters. With these principles in place, we are in a better position to think about the justification of analogical reasoning. We need only show that some analogical arguments—namely, those that satisfy the two criteria—are justified. If we have evidence for some connection between P and Q, and no compelling reason to think that disanalogies prevent this connection from being generalized to a class that includes the target domain, then it seems that we have a promising basis for seriously considering the hypothesis Q* (the analogue of Q ).
In this chapter, we have seen clues about two possible forms of justification for analogical arguments. Our response to Agassi outlined a pragmatic approach to justification, based on the costs and benefits of adopting a working hypothesis. This idea is elaborated in chapter 7. Chapter 8 develops the idea that analogical reasoning is founded on symmetry arguments. Rather than establishing or presupposing a determination rule, as Russell and Weitzenfeld require, good analogical arguments depend upon the simple idea that like problems should be treated alike. In my view, the link between analogy and symmetry is the legitimate core of the “deductivist” analysis of analogical reasoning. Russell’s example of two used cars provides a simple illustration. If two used cars are indistinguishable as regards every attribute that I take to be relevant to establishing a fair price, then I should assign them the same value. More generally, to insist upon a prior association is to require that in a successful analogy there be a relationship, in the source domain, linking the positive analogy P and the feature Q to be transferred to the target. A successful analogical argument isolates the relevant properties and shows that we have a basis for thinking the same sort of relationship obtains in the target. That basis is symmetry. The two projects of finding criteria of plausibility and providing a philosophical foundation for analogical arguments come together.
3 Computational Theories
3.1 Introduction
Cognitive science and AI researchers have long recognized the importance of analogical reasoning in a wide range of human activities. In the past forty years, people have written computer programs that use analogies to solve problems, “understand” stories, make discoveries, and even provide cooking advice. A number of authors have suggested that such programs can contribute to our understanding of analogical reasoning as used in the sciences.
Ideally, each AI program is based on a computational theory or model of analogy: a set of assumptions, precise enough to implement in a computer program, about how the process of analogical reasoning does or should take place. In practice, these theories analyze restricted aspects of analogical reasoning and the assumptions are made only partly explicit. Furthermore, as those in the field readily acknowledge, some programs work on only “a handful of carefully chosen examples” (Falkenhainer, Forbus, and Gentner 1989/90, 2). In return for this limitation in scope, the principal gain is the ability to exploit detailed knowledge about the problem areas.
Our main question in this chapter is whether computational theories are broad enough to provide genuine insight into the sort of analogical reasoning that occurs in the sciences. To this end, we selectively review a number of analogical reasoning programs.
Most computational theories of analogy fall into two camps, which I call the structuralist and case-based views. We shall discuss a few examples of each type. On the structuralist view, analogies are founded on parallel representations of knowledge about two domains. For an ideal analogy, these representations are isomorphic, and the value of an
analogy is measured by the degree to which it approximates isomorphism. Within the structuralist camp, there is a division between connectionist and symbolic (propositional) approaches to representing knowledge structures, but the ideal of isomorphism unites these computational theories.
The alternative case-based view models analogies on the idea that two domains share a cluster of salient, or relevant, dimensions. These are characteristics we know to have been associated in the past with outcomes of interest or to have semantically interesting links to those outcomes. The strength of an analogy depends upon the extent of relevant overlap between the source and target domains. This approach has something in common with the sampling conception of analogy discussed in chapter 2, but as we shall see, case-based reasoners solve the “counting problem” by working with a predefined set of relevant features. Both approaches attempt to model the way humans use known solutions to solve new problems.
Structuralist and case-based approaches differ in three main respects:
• Scope. Structuralist programs tend to be general-purpose analogical reasoners. They can find and utilize analogies in virtually any context. By contrast, case-based reasoners are typically narrow; they work within a bounded “region” of fairly similar cases, that is, a set of cases whose most important features can be adequately described using a finite and preestablished vocabulary.1
• Flexibility of knowledge representation. Structuralist programs employ a general-purpose representation of data about the source and target domains, such as some form of predicate calculus or connectionist architecture. Accordingly, information is stored as sentences in a formal language or as patterns of activation over “semantic units.” Typically, there are many ways to describe the two domains in the formal language, and few guidelines for choosing between different representations.
The main alternative, used in case-based reasoning (CBR), is to represent domains, or cases, with a set of stereotypes that provide a uniform way to store information. For example, a stereotype for visiting a restaurant includes placeholders for the usual objects or agents we expect to find (waiters, tables, food), the usual goals we might have (to get a meal, to entertain a date), and the usual sequence of events likely to take place (calling the waiter, ordering a drink; see Barr and Feigenbaum 1981, 216). A particular case of visiting a restaurant is described by filling in some or all of these placeholders with definite values. Stereotypes may be
1. Cognitive science authors sometimes reserve the term “analogical reasoning” for inter-domain comparisons, such as between heat and fluids, while “case-based reasoning” refers to intra-domain comparisons, as between two meal plans or two legal cases involving contract violation (Seifert 1988, 125). Since the two types of comparison are clearly related and since there is no sure way to tell them apart, I think it is preferable to view both as varieties of analogical reasoning.
implemented in data structures known as scripts or frames. Choosing stereotypes ensures a uniform basis for comparing cases; however, it limits the scope of a program.
• Treatment of relevance. On the structuralist approach, the relevance of a predicate or function is determined primarily by systematicity: the extent to which it enters into complex networks of relationships (though other criteria play roles as well). By comparison, CBR programs are oriented around a set of indices that includes every factor deemed to be relevant. For instance, in Hammond’s Chef program (Hammond 1989), each case is a recipe. Recipes are indexed both by the goals they satisfy (“include beef,” “make a stir-fry dish”) and by the problems they avoid (“too much liquid,” “too little leavening”).
Cognitive science researchers have had a variety of purposes for developing computational models and their associated computer programs. Early models tended to be oriented toward “understanding the basic constraints that govern human analogical thinking” (Hummel and Holyoak 1997, 458). More recent models, which seek to duplicate both the strengths and the weaknesses of human analogical thinking, are directed toward uncovering the psychological mechanisms that come into play when we use analogies. Almost without exception, these projects are descriptive; the objective is to construct psychologically realistic models of cognitive operations that involve analogies.
By contrast, my project focuses on philosophical problems linked to the evaluation and justification of analogical arguments, with particular attention given to their use in scientific reasoning. My review of computational models will concentrate on their potential and their limitations for illuminating such problems. It is important to acknowledge, at the outset, that many of my criticisms may be traced at least in part to the fact that my aims diverge from the aims of the authors.
With this point cleared up, I propose to use four criteria to evaluate the computational models.
1. Predictiveness is the extent to which a computational theory makes definite predictions about which analogies are more plausible than others. Advocates of computational philosophy of science point to the clarity and analytical rigor of computational theories as important virtues in their favor (Thagard 1988b), and it is certainly true that, at the level of the computer program, decisions are absolutely precise. As the preceding discussion shows, however, a computational theory of analogical reasoning includes not just the program but also the way input and requisite knowledge are represented by the user. Typically, this representation is guided by wholly or partially implicit conventions. The AI community has long been aware of this point. Predictiveness is concerned with whether the evaluation criteria embodied in both the program and the conventions about representation can be made explicit and yield a clear verdict about plausibility. The less clear the conventions about representation, and the more sensitive the program is
to the details of representation, the weaker a theory’s claim to be predictive.
2. Applicability is a second important criterion for evaluating computational theories of analogy. Analogical reasoning, as we have emphasized, is used as a guide to making practical decisions. To be applicable, the evaluation criteria used by a program must be justified. Before we rely on the program, we need to be convinced that it uses reasonable procedures to discriminate between good and bad analogical reasoning. It is not enough that a program generates a “best” analogy from among those available. Even the best analogy may not be good enough to meet minimum standards for prima facie plausibility if the positive analogy between source and target domains is not sufficient. Applicability is thus a three-part criterion. It involves, first, a programmer’s definition of minimum standards for the problem area; second, an implementation of these standards in the program or representation conventions; and third, a philosophical or technical argument that the programmer’s definition is reasonable.
3. Scope has already been mentioned. A computational theory of analogy should apply to a large class of problems and explain diverse phenomena associated with analogical reasoning. For instance, a truly comprehensive theory should provide for analogies that encompass a single source and target domain as well as those involving multiple sources. It should also employ representation techniques general enough to handle a large set of analogies. This criterion, however, comes with a qualification. As the reader will shortly appreciate, many clever programs operate in rather narrow domains. The criterion of scope should not be understood as the injunction to produce all-purpose analogical reasoning programs but, rather, as the ideal of writing programs that embody principles capable of broad application.
For example, Hofstadter and Mitchell comment that their program, Copycat, is “designed to handle analogies in a particularly tiny domain,” but “all the features of the Copycat architecture were in fact designed with an eye to great generality” (Hofstadter 1995, 208).
4. Simplicity is the final virtue. A theory that can account for many different types of analogies using little conceptual apparatus is preferable to a more complicated theory of equal scope. Simplicity is a virtue of computational theories, just as it is a virtue of scientific theories. Since analogical reasoning is a complex phenomenon, however, the importance of simplicity should not be exaggerated.
In summary, when thinking about the role of analogies in science, the most significant issues in evaluating computational theories are predictiveness, applicability, scope, and simplicity. All are familiar from philosophical discussions of choice between scientific theories. The principal difficulty encountered by such theories is that there is an invariable trade-off between the first two virtues (predictiveness and applicability) and the third, scope. Some case-based
reasoning programs demonstrate impressive and credible performance, but only in highly specialized contexts not obviously capable of generalization. By contrast, general-purpose, structurally based analogical reasoners tend to rely on unspecified conventions about representation so that predictiveness and applicability are problematic. This chapter proceeds as follows. The next section discusses Evans’s ANALOGY, the first computer program to perform analogical reasoning. Evans introduced many ideas that influenced later computational theories. Section 3.3 discusses the leading structuralist approaches: Gentner’s structure-mapping theory, Holyoak and Thagard’s constraint-satisfaction model and its successor, the multiconstraint theory. Then section 3.4 provides an overview of case-based reasoning and a description of two representative programs: Ashley’s HYPO and Carbonell’s PRODIGY. Section 3.5 is devoted to Hofstadter and Mitchell’s Copycat program, which differs from both structuralist and case-based reasoning. Finally, the concluding section summarizes the relevance of the different computational theories to our analysis of analogical arguments in the sciences.
3.2 An Early Effort: Evans’s ANALOGY Program
In the mid-1960s, Evans (1968) created a program to solve geometric-analogy intelligence-test problems. The problems consisted of line drawings such as shown in figure 3.1. The task was to answer the question, “Figure A is to Figure B as Figure C is to which of the following figures?” In this case, the best answer is Figure 4. The same rule, “remove the smaller object from inside the larger object,” transforms Figure A into Figure B, and Figure C into Figure 4; no comparably simple rule taking A to B would take C to any of the other given figures.2 The objective of the
[Line drawings: Figures A, B, and C, and candidate answer Figures 1 through 5]
FIGURE 3.1. A geometric analogy problem.
2. Note that the more complex rule “Remove the larger object and then enlarge the smaller object” transforms Figure A to Figure B and Figure C to Figure 2.
ANALOGY program is to find the “best” (i.e., the simplest) set of rules transforming A into B that also transforms C into one of Figures 1 through 5.
Evans identified two central questions: how to represent the line figures and how to define the transformation rules. Like contemporary case-based reasoning advocates, Evans recognized the importance of describing his figures in a standardized way. If the program (rather than the program user) were really to do the work, there should be no leeway for arbitrary choices in the representation. For this reason, he developed a limited, low-level vocabulary with rules for describing line figures composed of dots, straight line segments, and circular arcs. This was a precursor to the frame idea; indeed, Evans even called his data structures “descriptive frameworks.” Similarly, transformation rules were defined via composition from a finite class of elementary mathematical transformations: Euclidean motions (rotation and reflection), scale changes, adding objects, and removing objects. The rule that transforms A into B, in simplified terms, is the following: REMOVE an object O1 that is inside another object O2.
ANALOGY is a two-part program. Part 1 takes the user-supplied description and elaborates upon it by computing a new description of each line drawing. Specifically, it runs through a finite set of relevant topological properties and relations such as INSIDE, LEFT, and ABOVE, and calculates whether each one applies. The result is a descriptive list of each figure that, as noted above, is similar to a filled-in script or frame. In addition, for each pair of objects in the eight figures, Part 1 determines whether one can be transformed into the other by Euclidean motions and scale changes. For instance, can the small square in Figure C be transformed into the large triangle in Figure 5 by Euclidean motions? The answer is no.
Part 2 involves three steps.
1. Calculate admissible rules for transforming Figure A into Figure B.
For instance, “remove small triangle from large triangle.” In Evans’s program, each transformation rule is a list of component elementary transformations.
2. For each such rule and each of the five possible solution figures X, attempt to generalize to a rule that takes Figure C to Figure X. For instance, if Figure X is 4, we know that the rule “remove small square from inside circle” transforms C to X. So the generalization “remove object from inside larger object” covers both transformations. In Evans’s program, generalizations are obtained by deleting statements in the A → B rule that do not apply to the C → X rule.
3. Select the “best” rule. ANALOGY defines the best or simplest rule to be the most specific (roughly, the longest) rule, the idea being that such a rule is the least altered from the original A → B rule. The program then outputs the figure corresponding to the best rule. Although a tie is possible, the answer is usually unique.
Evans expanded the program with more elaborate procedures to resolve ties and to deal with more complex figures. The program apparently exceeds the performance of the average college-bound student.
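The three steps of Part 2 can be illustrated with a deliberately simplified sketch. It assumes, purely for illustration, that each transformation rule is a set of statement strings, so that "deleting statements in the A → B rule that do not apply to the C → X rule" becomes a set intersection and "most specific" becomes "most statements retained." The statement vocabulary and the toy rules for Figures 2 and 4 are hypothetical and are not Evans's actual representation; tie-breaking is not modeled.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

# A rule is modeled as a set of statements (an illustrative assumption).
Rule = FrozenSet[str]


def generalize(rule_ab: Rule, rule_cx: Rule) -> Rule:
    """Step 2: keep only the A -> B statements that also hold of the C -> X rule."""
    return rule_ab & rule_cx


def best_answer(
    rules_ab: List[Rule], rules_cx: Dict[int, List[Rule]]
) -> Tuple[Optional[int], Rule]:
    """Step 3: pick the figure whose generalized rule is longest (most specific)."""
    best_figure: Optional[int] = None
    best_rule: Rule = frozenset()
    for figure, candidates in rules_cx.items():
        for rule_ab in rules_ab:
            for rule_cx in candidates:
                general = generalize(rule_ab, rule_cx)
                if len(general) > len(best_rule):  # ties keep the earlier figure
                    best_figure, best_rule = figure, general
    return best_figure, best_rule


# Step 1 (given here by hand): one admissible A -> B rule.
rules_ab = [frozenset({"REMOVE(o1)", "INSIDE(o1, o2)", "SMALLER(o1, o2)",
                       "TRIANGLE(o1)", "TRIANGLE(o2)"})]
# Hypothetical C -> X rules for two of the candidate figures.
rules_cx = {
    2: [frozenset({"REMOVE(o2)", "SCALE(o1)", "INSIDE(o1, o2)",
                   "SQUARE(o1)", "CIRCLE(o2)"})],
    4: [frozenset({"REMOVE(o1)", "INSIDE(o1, o2)", "SMALLER(o1, o2)",
                   "SQUARE(o1)", "CIRCLE(o2)"})],
}

figure, rule = best_answer(rules_ab, rules_cx)
print(figure, sorted(rule))
```

In this toy setup the shape statements are dropped during generalization, and the rule retained for Figure 4 corresponds to "remove object from inside larger object."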
While Evans did not develop a theory of analogical reasoning, many of his ideas are fruitful even beyond the limited domain of geometric analogies. Three ideas stand out. First, he offered a syntactic analysis of similarity. He thought that each object involved could be represented in such a way that similarity recognition “has a natural connection to and can be simply stated in terms of the syntactic structure of the representation” (1968, 346). This foreshadowed the structuralist view of analogy, to be discussed in the next section. Second, he suggested that competing candidates for the conclusion X of the analogy should be evaluated by attempting to generalize (through abstraction) the connection between A and B. This is precisely what most case-based reasoners do to make inferences by analogy. The suggestion also corresponds closely to the main ideas I expressed in the last chapter: that we should make the connection (prior association) between known and projected similarities explicit, and that this connection should be capable of generalization for a good analogy. Finally, Evans’s selection criterion—to prefer the solution requiring minimal generalization of the connection between A and B—offers an ingenious way to adjudicate between competing analogical inferences. His selection criterion is based on a type of conservatism or parsimony: it favors the analogy that requires minimal modification. Relative to our purposes, the limitations of Evans’s program all have to do with its scope. His simple domain has nice features that are not available in most cases of analogical reasoning. In particular, consider the condition that all similarities can be represented structurally or syntactically in terms of a predefined set of predicates. This condition is satisfied for Evans’s geometric analogies, since both the figures and the transformations relating them are simple, precisely defined in advance, and finite in number. 
It fails in most scientific investigations, however, as will be argued in the next section. A second limitation, observed by Kedar-Cabelli (1988b, 97), is that in Evans’s geometric analogies, all features present in the description, and no features absent from it, are relevant. However, Evans’s program provides no insight into one of the central problems of analogical reasoning—namely, sorting out which items of the given information, and which items of contextual background information, are relevant.
3.3 Structuralist Approaches
The Structure-mapping Theory
Following Evans, many computational theories of analogy have given a prominent role to syntactic or structural considerations. The most extreme proposal, and the most influential, has been Gentner’s structure-mapping theory, which assesses the strength of all analogies on structural grounds.3 The theory, together
3. Gentner acknowledges that pragmatic factors shape the way that source and target domains are represented in an analogy, but she originally maintained that given a representation, finding and evaluating the best analogy was a purely syntactical matter. Later versions of the theory incorporate other constraints (Forbus, Ferguson, and Gentner 1994; Forbus 2001).
with the associated computational systems SME (mapping) and MAC/FAC (retrieval), is developed in Gentner (1983); Falkenhainer, Forbus, and Gentner (1989/90); Forbus, Gentner, and Law (1995); and numerous other papers. The central idea of Gentner’s theory is that good analogies are characterized by systematic relational correspondences: “Analogies are about relations, rather than simple features. No matter what kind of knowledge (causal models, plans, stories, etc.), it is the structural properties (i.e., the interrelationships between the facts) that determine the content of an analogy” (Falkenhainer, Forbus, and Gentner 1989/90, 3). Gentner makes a fundamental distinction between properties, or monadic predicates, and relations, which have multiple arguments. She also distinguishes different orders of relations. The order of a relation (or function) is one plus the maximum order of its arguments, where objects are considered to have order 0. Consider the sentence, “Gravitational attraction between the sun and a planet, and the fact that the mass of the sun is much greater than that of the planet, causes the planet to orbit the sun.” Gentner represents this in the following form:
CAUSE(AND[ATTRACTS(sun, planet), GREATER(MASS(sun), MASS(planet))], REVOLVE-AROUND(planet, sun)).
In this sentence, REVOLVE-AROUND and ATTRACTS are first-order relations, GREATER is second-order, AND is third-order, and CAUSE is fourth-order. MASS is a first-order function.4
The structure-mapping theory assumes a propositional representation of both source and target domains. An analogy mapping M is a one-to-one function from the items in the source domain to those in the target. It may be a partial mapping; not every item in the source domain need have a target image. Gentner imposes three further constraints on M:
1. Identicality. Only identical relational predicates can be matched, although nonidentical objects, functions, and monadic predicates may be matched.5
2. n-ary restriction. M must map objects to objects, n-place functions to n-place functions, and n-place predicates to n-place predicates.
3. Consistency. Whenever M maps P to P*, it must map the arguments of P to the corresponding arguments of P*.
The best mapping M is determined by systematicity: the extent to which it places higher order relations, and items that are nested in higher order relations, in correspondence. In particular, properties and functions of objects are unimportant
4. The full example is given below. Logicians treat AND as a connective rather than a relation, but no harm is done here by treating it as a relation.
5. The identicality condition is relaxed in later work; see Forbus 2001. The n-ary restriction is also relaxed in some structuralist models.
unless they are part of a relational network. Gentner’s Systematicity Principle (1983, 163) states:
(SP) A predicate that belongs to a mappable system of mutually interconnecting relationships is more likely to be imported into the target than is an isolated predicate. By the same token, predicates occurring in statements involving higher order relations are more likely to be imported into the target than those occurring only in lower order relations.
This criterion depends solely upon the syntax of the given representations and not at all upon the content. Gentner and her colleagues have applied the structure-mapping theory to numerous examples, one of them being the putative early twentieth-century analogy between the solar system and the atom. The description of the two domains is reproduced here as figure 3.2.6 While there is nothing wrong with this representation, it may seem that the end-result is preordained. As Hofstadter (1995, 282) writes about this example, “the analogy is already effectively given in the representations.” There are a few irrelevant assertions about the temperatures of the sun and the planet, but Gentner’s program SME (the structure-mapping engine) successfully navigates around such lower order statements. The preference for systematicity generates the “right” analogy, which sanctions the inference that attraction between the nucleus and electron, combined with the greater mass of the nucleus, causes the electron to orbit the nucleus. The most attractive feature of the structure-mapping theory is its simplicity. If, as Evans hoped, a purely syntactical approach proves adequate for evaluating all (or even a large class of) analogies, that is a major achievement. At the present stage of research, however, this hope may be overambitious. The structure-mapping theory faces severe difficulties, most easily described in terms of the evaluation criteria noted in section 3.1. The first and most obvious problem has to do with predictiveness.
The point is nicely expressed by Gentner’s own warning (quoted at the beginning of this chapter) to avoid carefully chosen examples. The warning applies as much to representation as to content. A purely structural theory of analogy is highly sensitive to the details of representation. The structure-mapping theory appears to rely on the assumption that formulating domain representations is a small job compared to finding the best analogy mapping. In fact, the hardest task may be to represent known information so that a consistent, nontrivial analogy can be exhibited. Since this is now a familiar criticism of SME and its descendants (Mitchell 1993, Hofstadter 1995), I will illustrate with a single example. Consider the solar system/atom analogy. Suppose we represent our knowledge about the attractive forces in the two domains as
6. From Falkenhainer, Forbus, and Gentner (1989/90, 56). The representation is substantially the same in Gentner (1983). As Holyoak and Thagard note (1995, 189), this analogy does not appear to have played any significant role in Rutherford’s thinking, even though it may be pedagogically useful.
B.2. Solar system/Rutherford atom
B.2.1. Solar system
(defEntity sun :type inanimate)
(defEntity planet :type inanimate)
(defDescription solar-system
  entities (sun planet)
  expressions (((mass sun) :name mass-sun)
               ((mass planet) :name mass-planet)
               ((greater mass-sun mass-planet) :name >mass)
               ((attracts sun planet) :name attracts)
               ((revolve-around planet sun) :name revolve)
               ((and >mass attracts) :name and1)
               ((cause and1 revolve) :name cause-revolve)
               ((temperature sun) :name temp-sun)
               ((temperature planet) :name temp-planet)
               ((greater temp-sun temp-planet) :name >temp)
               ((gravity mass-sun mass-planet) :name force-gravity)
               ((cause force-gravity attracts) :name why-attracts)))
B.2.2. Rutherford atom
(defEntity nucleus :type inanimate)
(defEntity electron :type inanimate)
(defDescription rutherford-atom
  entities (nucleus electron)
  expressions (((mass nucleus) :name mass-n)
               ((mass electron) :name mass-e)
               ((greater mass-n mass-e) :name >mass)
               ((attracts nucleus electron) :name attracts)
               ((revolve-around electron nucleus) :name revolve)
               ((charge electron) :name q-electron)
               ((charge nucleus) :name q-nucleus)
               ((opposite-sign q-nucleus q-electron) :name >charge)
               ((cause >charge attracts) :name why-attracts)))
FIGURE 3.2. The solar-system/atom analogy.
GRAV-FORCE(sun, planet) and ELECTRICAL-FORCE(nucleus, electron, attractive). We have added a third term for the valence of the electrical force that, unlike gravity, can be either attractive or repulsive. As a result of the n-ary restriction, the structure-mapping theory is unable to find any analogy between the two-place and the three-place relations.7 If we represent the source domain in too much or too little detail compared to the target, then a purely structural approach will fail. The structure-mapping theory, unlike Evans’s earlier effort, makes no provision for the elaboration that is essential in such cases. Gentner has acknowledged this problem, but (to my knowledge) there is no adequate solution. The next problem relates to the theory’s scope. The structure-mapping theory appears to be universal and has been applied to many examples. Yet it is not always appropriate to give priority to systematic, high-level relational matches. What Gentner calls “superficial feature matches” can be extremely important in certain types of analogical reasoning in science. For instance, archeologists employ a method called ethnographic analogy (section 4.8) to determine the probable use of tools and artifacts from ancient cultures. The idea is to compare the artifacts in the archeological record to similar items in existing cultures. The strength of these analogies is based, to a large degree, on surface resemblances between the two artifacts, regardless of whether these resemblances are known to participate in elaborate relational networks. The most interesting difficulties with the structure-mapping theory, however, relate to its applicability. Let us examine Gentner’s central thesis that analogies are all about structure. That thesis persists in later versions of SME and its various offshoots. A recent paper describes the “systematicity bias” as “the preference for alignments that form deeply interconnected structures” (Gentner, Bowdle, Wolff, and Boronat 2001).
On the surface this is a descriptive thesis, and that is often how Gentner and her colleagues portray it. They write that the Systematicity Principle “fits with evidence that people naturally interpret analogy and metaphor by mapping connected systems of belief, rather than independent features” (208). Yet research by Gentner and others shows that something different happens in psychological experiments: “unless subjects are given extensive training, examples, or hints, they will be much more likely to choose superficial features than deep structural features when using analogies” (Dunbar 2001, 313; see also Forbus, Gentner, and Law 1995; Gick and Holyoak 1983). Such superficiality is obviously not limited to the laboratory; it is equally
7. The n-ary restriction is extremely problematic for mathematical analogies. Natural analogies emerge between two- and three-place relations when comparing two- and three-dimensional geometry. This point will be developed in chapter 5.
characteristic of the sorts of analogies that have guided the practice of magic and divination for thousands of years. Systematic analogical thinking is exemplified in the work of mathematicians and scientists (Dunbar 2001) and in many metaphorical constructions (Lakoff and Johnson 1980), but it cannot be taken for granted. The Systematicity Principle is normative, not descriptive. It is a principle meant to characterize good or plausible analogical reasoning, and it is the most important normative principle underlying current work on analogies in cognitive science. But what does it really mean? Let us distinguish between two different versions: (SP 1) Individual predicates that are plausibly mapped (transferred) to the target domain via analogical inference are typically embedded in complex (high-order) relations. (SP 2) A predicate that is embedded in a complex (high-order) relation is more plausibly transferred (via analogical inference) than one that is isolated, or embedded only in low-order relations. It follows that an analogical inference has a degree of plausibility that increases monotonically with its level of systematicity. The principle (SP 1) is more or less acceptable. Analogical reasoning does typically involve the transfer of features embedded within systematic relationships. Gentner’s theory, however, requires principle (SP 2). SME prefers the mapping that puts the highest possible level of relations into correspondence, and everything else follows from this principle. I have three criticisms of principle (SP 2). First, the n-ary restriction compels us to discount certain kinds of systematicity, with the result that a less systematic analogy may be more plausible. In other words, SME might lead us to overlook a fine analogical argument. The example above, in which electrical force is represented as a three-place relation, illustrates the point. 
It is likely that any intuitively plausible analogy can be represented in a form such that SME produces the desired mapping, but then it is the user of SME (rather than the program) that has grasped the analogy’s plausibility. A similar problem pertains to the identicality restriction, though (as noted) Gentner and her colleagues have relaxed it. Second, increased systematicity is not sufficient for increased plausibility. An implausible analogy can be represented in a form congenial to SME, because in a purely syntactical approach, high-order relations are cheap. Taking up Achinstein’s example of analogy between swans and line segments (section 1.6), the relations “has the same color as” and “is congruent to” are analogous because both are equivalence relations. Provided our representation of the two domains includes this irrelevant high-level information, we may be able to sanction the frivolous analogical inference that congruent lines tend to cluster, just like swans of the same color. Both of these difficulties have to do with representation, but principle (SP 2) has a third and, in my view, more damaging weakness—one that applies to any
theory that simply equates degree of structural isomorphism with plausibility. The Systematicity Principle takes no account of valence, or the direction of relevance. In some analogical arguments, reducing high-level overlap can actually contribute to plausibility if what is eliminated is a preventive cause rather than a contributing cause. To illustrate, let us consider the analogical argument that life exists or has existed on Mars. Even recently, people have advanced versions of this argument. The best source domains for this analogy are frozen lakes in Antarctica or glaciers in Greenland, where microbes have been found to thrive despite the cold. The word “despite” is crucial here. Freezing temperatures are preventive or counteracting causes; they are negatively relevant to the existence of life. Some researchers believe that the climate of Mars was more favorable to life 3.5 billion years ago than it is today, because temperatures were warmer. Keeping all other relevant factors fixed for the sake of argument, the analogy between Antarctica and present-day Mars is more systematic than the analogy between Antarctica and ancient Mars. According to the Systematicity Principle, the analogy with Antarctica should provide stronger support for life on Mars today than it does for life on ancient Mars. Once we appreciate the problem, similar examples are easy to construct. Sir Edmund Hillary and Tenzing Norgay reached the summit of Mount Everest in 1953, despite wearing 30-pound packs. The weight of the packs was negatively relevant to the success of the expedition; it figures in relevant, high-level causal relations. Later mountaineers would profit from Hillary and Norgay’s example even without the heavy packs. But to maximize high-level relational correspondence between a current Everest expedition and the 1953 climb, mountaineers should take a 30-pound pack. 
Increased systematicity does not always increase plausibility; reduced systematicity does not always decrease it. The elimination of systematic overlap contributes to plausibility when what is eliminated is a counteracting cause. The focus on counteracting causes makes this very clear, but the more general point is that without taking into account the nature of the relationships of various factors to the hypothetical analogy, systematicity can be misleading. With this observation in mind, let us take another look at principles (SP 1) and (SP 2). Principle (SP 1) asserts that predicates that are amenable (to plausible transfer by analogy) tend to be systematic (i.e., to participate in high-level relations); principle (SP 2) asserts just the reverse. Interpreted as statistical claims, principles (SP 1) and (SP 2) are equivalent because positive statistical relevance is a symmetrical relationship. I think that the statistical interpretation of principle (SP 1) is acceptable, and therefore I accept a statistical interpretation of principle (SP 2) as well. In fact, to acknowledge this point allows us to explain why the Systematicity Principle enjoys a measure of success. Systematicity can be a rough statistical indicator of plausible analogical arguments. What the examples of the preceding paragraphs demonstrate, however, is that we should reject the idea that systematicity per se produces or explains the plausibility of an analogical argument.
When we reason by analogy that some feature of the source domain is present in the target domain, we must first determine which features of the source domain are relevant and how they relate to the analogical conclusion. Then we investigate whether the crucial contributing features (or something similar) are present in the target domain. That is what explains the plausibility (or implausibility) of the argument. There is no syntactic short-cut via systematicity.
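Two of the structural notions used in this section, the order of a relation and the n-ary restriction, are simple enough to state as code. The sketch below assumes that expressions are nested tuples whose first element is the predicate or function name and that objects are plain strings; the helper names `order` and `satisfies_nary` are hypothetical and do not come from SME.

```python
# Hypothetical sketch: expressions are nested tuples, objects are strings.

def order(expr):
    """Objects have order 0; a relation or function has order one plus
    the maximum order of its arguments."""
    if isinstance(expr, str):
        return 0
    _name, *args = expr
    return 1 + max(order(arg) for arg in args)


def satisfies_nary(src, tgt):
    """n-ary restriction: an n-place item may only be paired with an n-place item."""
    return len(src) == len(tgt)


attracts = ("ATTRACTS", "sun", "planet")
greater = ("GREATER", ("MASS", "sun"), ("MASS", "planet"))
conj = ("AND", attracts, greater)
cause = ("CAUSE", conj, ("REVOLVE-AROUND", "planet", "sun"))

print([order(e) for e in (attracts, greater, conj, cause)])

# The alternative representation of the two forces discussed in the text.
grav = ("GRAV-FORCE", "sun", "planet")
elec = ("ELECTRICAL-FORCE", "nucleus", "electron", "attractive")
print(satisfies_nary(grav, elec))
```

The computed orders agree with those given for the solar-system sentence, and the arity check rejects the pairing of the two-place and three-place force relations. Nothing in either function consults what the predicates mean, which is the sense in which the approach is purely syntactic.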
A Constraint-satisfaction Model
Holyoak and Thagard have attempted to address some of the problems with SME while maintaining a structuralist theory of analogy. Their constraint-satisfaction model of analogy, like Gentner’s theory, focuses on the construction of the analogy mapping. Like Gentner, they stress criteria such as consistency and systematicity. Unlike Gentner, they acknowledge the importance of pragmatic factors:
The view here is that analogy, and inference in general, must be understood pragmatically, taking into account the goals and purposes of the cognitive system. . . . Analogies are virtually always used to serve some known purpose, and the purpose will guide selection [of the aspects of the source relevant to the analogy]. (Holyoak and Thagard 1989, 296–97)
They replace Gentner’s restriction that analogical matching can occur only between identical relations with a notion of semantic similarity. Semantic similarity, in Holyoak and Thagard’s theory, is a numerical estimate of the degree to which people regard two properties or relations as being alike. Holyoak and Thagard maintain that structural, semantic, and pragmatic constraints provide “pressures” supporting or weakening pairwise correspondences. A global analogy mapping between domains emerges from these pairwise correspondences. The theory is implemented in a connectionist program called the Analogical Constraint Mapping Engine (ACME) that works in the following manner. The user-supplied input has three components:
1. Structural description: predicate-calculus representations of the source and target domain, rather like those used by Gentner
2. Semantic information: optional pre-assigned numerical similarity weights for potentially corresponding pairs of concepts in the source and target domains
3. Pragmatic information: optional statements identifying predicates as important (of interest) or indicating that certain correspondences are strongly presumed to hold
The program works by first constructing a network of links between each possible pair of source-target elements and assigning an “activation value” to
each link. ACME then runs through a series of cycles in which each link has its activation level updated, based on how well it coheres with the previous cycle’s activation levels for all other pairings. High levels for consistent pairings raise a link’s activation value; high levels for inconsistent pairings lower it. Semantic and pragmatic information provide parallel pressure to elevate the activation levels for pairs that have been either identified as highly similar or are presumed to correspond. Eventually, the activation values stabilize and an analogy map emerges as the conjunction of all “best” pairings—those with high activation values. As a psychological model of how humans actually construct analogies, Holyoak and Thagard’s theory has a number of positive features. It seems natural that in many cases we develop analogies “from the bottom up,” beginning with individual matches between domain elements, rather than starting with high-level structural correspondence. The introduction of semantic and pragmatic criteria allows the program to take into account the sorts of biases that inevitably influence which analogies we find persuasive. The “cyclical” program structure reflects the human tendency to revise one’s ideas about the different portions of an analogy in a gradual manner. Although structural (syntactic) constraints play the paramount role, all three types of constraint—syntactic, semantic, and pragmatic—are viewed as pressures influencing the final selection of an analogy mapping, rather than as absolute requirements. One other interesting feature of the theory is the way in which it automatically takes care of competing analogies. ACME does not implement any normative principles for adjudication; instead, the preferred analogy emerges under the pressure of competing constraints. On predictiveness and applicability, it appears that the constraint-satisfaction model fares little better than the structure-mapping theory. 
Sensitivity to the initial representation is even more of a problem, since in addition to the propositional description of the domains, the user must provide specific similarity weights and pragmatic cues. The idea that the user can supply pragmatic and semantic weights may be too simple.8 How can we guard against manipulation to get the desired result? Holyoak and Thagard provide no guidelines about these inputs. If we have doubts, then we are unlikely to accept any inference made by the program. Since this objection is essentially the same as the first one lodged against SME, I will not dwell on it further. ACME also appears to be vulnerable to the three objections raised against Gentner’s Systematicity Principle, since the semantic and pragmatic constraints offer no way to handle preventive factors. Thagard claims that his updated model, Drama, minimizes concerns about hand-tailored input (Eliasmith and Thagard 2001) and that the problem is no worse for Drama than it is “for any model of analogy handling equally complex real-world analogies.” It is, however, precisely for such general-purpose analogical reasoning programs that the problem of tailored input is so acute. Let us pass on to Holyoak and Thagard’s book-length treatment of their theory.
8. The program Analog Retrieval by Constraint Satisfaction (ARCS), also developed by Thagard, Holyoak, Nelson, and Gochfeld (1990), employs a semantic network to select potential source analogs based on similarity. But it is not clear that such networks can be used to calculate the input values for ACME.
The Multiconstraint Theory
In Mental Leaps (Holyoak and Thagard 1995), the fundamental ideas implemented in ACME and related programs are presented in popular, nontechnical fashion. The authors apply their theory of analogical reasoning to an enormous range of applications: concept formation and reasoning in children and animals, political and legal decision-making, theological and scientific reasoning. Examples are analyzed both as evidence for and as applications of the multiconstraint theory. The multiconstraint theory does not differ significantly from the earlier constraint-satisfaction model; in fact, the ACME program (along with other associated programs) is presented as the computational implementation of the theory. Once again, the core idea of the theory is that three types of constraint (similarity, structure, and purpose) influence the process of analogical reasoning. The authors stress that these are “soft constraints” or pressures that guide reasoning along the connectionist lines of ACME, rather than “hard constraints” or absolute requirements. One of the appeals of the connectionist model, as already noted, is that there is no need to turn these constraints into sharp principles. Unfortunately, this feature of the model becomes a defect if our project is to articulate a clear normative account of analogical arguments. From this perspective, there is very little that can be said about the three types of constraint.
• Similarity. The authors note that many analogies are influenced by a perception of “global similarity,” based on sensory features. They also suggest that similarity is assessed on the basis of semantic networks (indicating inheritance relationships and the like). In practice, however, the authors continue to supply numerical measures of similarity as inputs to ACME.
• Structure. Information about domains is represented in the same way as in SME and ACME, using a form of predicate calculus with relations of different orders. The authors distinguish three types of analogies, based on the highest order of the properties and relations placed in correspondence: (1) attribute mappings, which only pair up monadic predicates; (2) relational mappings, which pair up first-order relations; and (3) system mappings, which develop correspondences between relations of all orders. Such mappings should ideally satisfy the requirements to be isomorphisms (1-to-1 and consistent), but these are not mandatory. Systematicity is the primary criterion for updating semantic linkages as ACME runs through various iterations.
• Purpose. Holyoak and Thagard indicate that goals provide important constraints upon analogical reasoning. Relative to our goals, certain information becomes salient. Following Hesse, the authors agree that causal information is often of central importance; however, they provide a list of additional higher order relations typically relevant to goals, such as relations of explanation and implication. In practice, a goal is implemented in ACME as a special boost to the weight of one or more semantic links, reflecting the desire to favor one or more correspondences.
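The way soft constraints settle into a mapping can be illustrated with a toy relaxation network. This is only a sketch under strong assumptions: the additive update rule, the starting activation, the learning rate, and the names `build_units`, `consistent`, `relax`, and `best_mapping` are all invented for illustration and are not ACME's actual algorithm. Only the one-to-one structural constraint and a single pragmatic boost are modeled.

```python
from itertools import product


def build_units(source, target):
    """One unit per possible source-target pairing."""
    return list(product(source, target))


def consistent(u, v):
    """Two pairings cohere if together they respect a one-to-one mapping."""
    return u[0] != v[0] and u[1] != v[1]


def relax(units, boost=None, cycles=50, rate=0.1):
    """Update every activation from the previous cycle's values.

    Consistent pairings raise a unit's activation, inconsistent ones
    lower it, and an optional boost stands in for pragmatic pressure.
    """
    boost = boost or {}
    act = {u: 0.1 for u in units}
    for _ in range(cycles):
        new = {}
        for u in units:
            net = boost.get(u, 0.0)
            for v in units:
                if v != u:
                    net += act[v] if consistent(u, v) else -act[v]
            new[u] = min(1.0, max(0.0, act[u] + rate * net))
        act = new
    return act


def best_mapping(act):
    """For each source element, keep the pairing with the highest activation."""
    best = {}
    for (s, t), a in act.items():
        if s not in best or a > act[(s, best[s])]:
            best[s] = t
    return best


units = build_units(["sun", "planet"], ["nucleus", "electron"])
act = relax(units, boost={("sun", "nucleus"): 0.5})
print(best_mapping(act))
```

Note that the outcome here is driven entirely by the hand-supplied boost, which is a small-scale version of the worry about tailored input raised above.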
Holyoak and Thagard claim both that the multiconstraint theory is a substantive, normative account of analogical reasoning and that, as a descriptive account of human reasoning practices, it draws empirical support from a wide variety of contexts as well as from the agreement of ACME and related programs with actual human decisions. Both of these claims can be disputed. So far as I can tell, the authors make only the following normative recommendations:
1. Employ system mappings (rather than attribute or relational mappings) based on higher order relations such as “causes,” “facilitates,” and “is incompatible with” (36, 145).
2. Try to work with analogies that can be evaluated with respect to purpose and similarity (36).
3. Where possible, construct multiple mappings to different source analogs (146).
4. Evaluate the results or recommendations of an analogical argument independently (146).
Point 4 is not really part of the logic of reasoning by analogy, though it is certainly a sound methodological suggestion. The other three recommendations do not constitute a substantive normative theory. It might be argued that the authors have merely failed to articulate the substantive normative theory embedded in ACME, and that the success of the program in generating plausible analogical inferences provides an indication that it has normative content. But does the multiconstraint theory really have this empirical support? To answer this question it helps to distinguish two versions of the theory. The “light” version is summed up in the primary thesis that the three types of constraint identified by the authors all have a role to play in analogical reasoning, and in the secondary thesis that system mappings are the strongest kind of analogy (cf. Gentner’s systematicity principle). The “heavy” version is the program ACME together with the conventions employed by the authors in creating the program’s input.
Throughout most of Mental Leaps, talk of confirmation for the multiconstraint theory appears to be limited to the light version, and indeed that theory is quite plausible. But it is too thin to count as a substantive normative theory. As for ACME, the difficulty with confirmation (as pointed out above) is that the conventions for representing input to the program are not made explicit. Since we don’t really know what the “heavy” theory is, we cannot properly say that ACME’s putative successes provide confirmation.9 Hofstadter (1995, 298) has written that SME and ACME work in “hollow ‘real-world’ domains.” That is, they are able to perform analogical reasoning in all sorts of real-world settings, but the programs “know virtually nothing about concepts.” This is not an objection, since both SME and ACME are based on the thesis that analogical reasoning is syntactical (or primarily syntactical) so that no knowledge of concepts should be required. But Hofstadter’s observation reflects a concern to understand more clearly what substantive normative principles are guiding these programs.
Structuralist programs provide attractive models for integrating a great deal of information and handling multiple analogical arguments. Apart from the familiar concerns about representation, I have suggested two major challenges. The first is to defend the link between structure and plausibility (taking into account concerns about whether factors that enter into complex structural relations are positively relevant to the analogical conclusion). The second is to articulate the evaluative norms embodied in the algorithms and input representation. As we shall see, these problems do not arise for case-based analogical reasoning programs, which tend to be based on rigid representation structures. The price is that such programs give up pretensions to full generality and to real-world analogical reasoning.
9. Drama, while it employs a mapping algorithm that appears to improve upon ACME, is still based on the multiconstraint theory (Eliasmith and Thagard 2001). Thus, it remains difficult to articulate what norms for analogical reasoning are embodied in the program.
3.4 Case-based Reasoning

The fundamental idea of case-based reasoning (CBR) is to solve or analyze new problems by adapting solutions, or solution methods, used in similar previous problems (Riesbeck and Schank 1989, 25). One key assumption is that past experience can be represented in terms of stereotypical frames or scripts. A second assumption is that we have accumulated a moderately large set of past cases, large enough so that for any new case there will be a past case that resembles it closely in relevant respects. Given these two assumptions, we can try to solve a new problem of type X by modifying a similar past instance of the script for type X. “Not too much thinking has to be done, by a computer or a person, if the right script is available” (Riesbeck and Schank 1989, 4).

The novelty of the case-based reasoning approach is to treat all thinking in the same way: “Taken as a strong hypothesis about the nature of human thought, it is possible to imagine that thinking is no more than applying scripts, no matter what the situation” (1989, 4). The hypothesis appears to imply that any domain of human thought can be modeled with a suitably complex set of scripts, together with rules for applying them. I find this hypothesis inherently implausible. In fact, even the hypothesis that all analogical reasoning can be modeled with scripts appears to be false. One grave difficulty is that many analogical arguments involve single source and target domains, while scripts seem to require a good number of past cases. Since it is not my present purpose to evaluate Riesbeck and Schank’s hypothesis, and since the “script” hypothesis has been widely criticized by others, I will not press the point. I simply want to indicate that we cannot expect CBR to offer a complete picture of analogical reasoning.

In this section, I assess the extent to which case-based reasoning can model analogical reasoning. To begin, every case-based reasoning program has four components:

1. A library of past cases
2. An indexing scheme
3. A method for matching a new case with similar past cases
4. A method of adapting past solutions

Developing each component can be laborious and expensive, but I will focus on what are often considered the most difficult tasks: choosing a suitable indexing scheme and developing good techniques for adapting past cases.

An index is a feature of past cases considered to be important; it is used both for retrieving past cases and for assessing relevant similarities and differences between cases. For instance, blood pressure and weight might be two indices in a program designed to diagnose and treat heart diseases. The challenge is to find a set of indices that reflect all important similarities and differences likely to arise between cases. The use of indices marks the sharpest contrast between CBR and structuralist approaches. Structuralist programs, such as Gentner’s SME, determine what is relevant dynamically, during the operation of the program. The CBR programs, by comparison, rely on a preestablished rigid list of relevant features.10

Four types of index are commonly employed in case-based reasoning programs:11

1. Correlated factors or dimensions. These have been positively or negatively associated with a particular outcome in the past. Ashley’s HYPO program, described below, identifies a number of factors that are important in trade secrets law. A factor that might favor the plaintiff, for instance, is the existence of a valid agreement by the defendant not to reveal product information.
2. Causal factors or relationships. These are like dimensions, but supplemented with a causal model of the domain in question. For instance, the above-mentioned imaginary program to diagnose heart disease would likely make use of a model in which blood pressure and weight were causal factors.
3. Teleological or goal-related concepts. In problem solving, the most important cases to retrieve may be those which share a goal or subgoal with the present problem. Kolodner’s Julia program (a sort of chef’s assistant) uses a library of cases indexed by culinary goals: vegetarian main course, nondairy dessert recipe, etc. (Kolodner 1993).
4. Derivational indices. These are features of attempted solutions to a past case, such as plans or rules that were applied. Each step in the solution is indexed by these features. For example, if each case is a proof in propositional logic, then one index is the inference rule used at each step. Carbonell has used this approach to develop the Prodigy program, which reasons analogically about simple mathematical problems and planning tasks.
10. It may be that there is a formal equivalence between structuralist and CBR programs: that any structuralist program could be implemented as a CBR program, and vice versa. In practice, though, the emphasis and range of applications differ along the lines indicated.
11. This classification is modified from Ashley (1988a, 197).
To choose a set of indices is to define in advance the set of relevant features for a set of cases. Many CBR programs perform well in specialized problem areas because of a well-designed indexing scheme. Still, any set of indices is only suitable for a carefully circumscribed set of cases since we must know in advance that these indices are the only factors relevant to conclusions of interest. The second difficult part of a CBR program is to develop methods for adapting past cases to solve present problems. The problem involves coming up with adaptation rules that strike an appropriate level of generality. One of the simplest techniques is straight substitution: find a common frame that applies to both the old and new cases, and simply substitute new items for the corresponding old ones (Riesbeck and Schank 1989). In CBR programs, cases may be instances of many different frames at varying levels of generality. For example, the CHEF program might store a particular beef-broccoli stir-fry recipe under “beef dish” and “meat dish.” To produce a pork stir-fry recipe, the more general frame “meat dish” would be used, with pork substituted for beef. General frames should not, however, be contrived to “solve” a specific problem. The Mediator program (Simpson 1985) illustrates this difficulty. Mediator proposes compromises to resolve two-party disputes. One of the cases in Mediator’s library involves settling a dispute between two children over an orange. A proposal to divide the orange equally failed because one child wanted the peel while the other wanted the fruit. The correct strategy, stored in script form, was “divide into different, but compatible goals” and give each child the part she wanted. Mediator later adopted this strategy to propose an analogous resolution to the Egypt-Israel dispute over the Sinai Peninsula. The solution was to give Israel military control and Egypt political and economic control. 
As one critic has commented, however, the program only “discovers” the analogy between dividing the orange and dividing the Sinai “because of an explicit [frame] structure that characterizes both as disputes where the disputants have different, separately satisfiable goals” (Burstein 1988a, 136). The frames are so specialized as to seem tailor-made for this example. Like indices, general frames should strive for an intermediate level of generality to be convincing. This explains the need to assume a case base sufficiently large so that for any new case there will be a closely similar past one. Without this assumption, the straight substitution technique will not work effectively.

Reinstantiation is a second method for adapting past cases to solve current problems. Rather than trying to make substitutions or otherwise modify an old solution directly, the idea is to reapply the strategies that worked in the old solutions. This method is only available for programs with derivational indices.

To illustrate how indices and adaptation techniques work, I briefly describe two case-based reasoning programs: Ashley’s HYPO and Carbonell’s Prodigy.
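Before turning to those programs, the retrieval and straight-substitution steps described above can be made concrete. The following is only a minimal sketch under stated assumptions: the names `Case`, `ITEM_FRAMES`, `retrieve`, and `adapt`, the two-case library, and the recipe steps are all hypothetical illustrations, not the data structures of CHEF or of any other program discussed here. Matching simply counts shared index values, and adaptation is permitted only when the old and new items fall under a common frame.

```python
from dataclasses import dataclass

# Hypothetical taxonomy: the frames each ingredient falls under.
ITEM_FRAMES = {
    "beef": {"beef dish", "meat dish"},
    "pork": {"pork dish", "meat dish"},
}


@dataclass
class Case:
    """A stored case: a name, its index values, and its solution steps."""
    name: str
    indices: dict
    solution: tuple


def similarity(case, query):
    """Count the indices on which a stored case agrees with the new problem."""
    return sum(1 for key, value in query.items() if case.indices.get(key) == value)


def retrieve(library, query):
    """Return the stored case that agrees with the query on the most indices."""
    return max(library, key=lambda case: similarity(case, query))


def common_frame(old_item, new_item):
    """Return a frame covering both items, or None if there is none."""
    shared = ITEM_FRAMES.get(old_item, set()) & ITEM_FRAMES.get(new_item, set())
    return min(shared) if shared else None


def adapt(case, slot, new_item):
    """Straight substitution: swap the new item for the old one in every step."""
    old_item = case.indices[slot]
    if common_frame(old_item, new_item) is None:
        raise ValueError(f"no common frame for {old_item!r} and {new_item!r}")
    return tuple(step.replace(old_item, new_item) for step in case.solution)


# Illustrative two-case library (invented for this sketch).
LIBRARY = [
    Case("beef-broccoli stir-fry",
         {"method": "stir-fry", "main": "beef"},
         ("slice beef", "stir-fry beef with broccoli")),
    Case("fruit salad",
         {"method": "toss", "main": "fruit"},
         ("chop fruit", "toss fruit with dressing")),
]

query = {"method": "stir-fry", "main": "pork"}
best = retrieve(LIBRARY, query)
recipe = adapt(best, "main", "pork")
print(best.name, "->", recipe)
```

The sketch also shows why a dense case library matters: with no stored case under a frame shared with the new item, `adapt` has nothing to substitute into and can only fail.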
Hypo

Ashley’s HYPO software performs case-based legal reasoning in the domain of trade secrets law (1990). In such cases, a plaintiff (usually a corporation) makes an accusation of trade secrets violation against a defendant (usually an employee). A library of past cases is indexed by thirteen dimensions: features of a case that tend to strengthen or weaken a legal claim. The existence of a nondisclosure agreement between plaintiff and defendant is one important factor in deciding whether a trade secret has been misappropriated. A second key consideration is the extent of voluntary outside disclosures. The fact that the plaintiff has freely provided information about the product to outsiders will weaken the claim that a confidence has been violated. The thirteen dimensions (and nothing else) constitute the relevant features in determining the important similarities and differences between a precedent and a current case. Each precedent is reduced to its simplest terms: the set of relevant features and the outcome of the case, which is a ruling in favor of the plaintiff or the defendant.

The user (presumably a lawyer or judge) provides a structured, low-level description of the case to be decided. HYPO’s output consists of arguments; in fact, it consists of multiple levels of argumentation, corresponding to the sorts of reasoning that interest advocates on both sides of the case. HYPO’s first step is to determine which of the thirteen dimensions apply. Each dimension is defined by a set of preconditions, stated in terms of the low-level input description. This stage of Ashley’s program is comparable to Evans’s elaboration phase. The second step is to retrieve all cases that share at least one dimension with the current case, and to partially order these cases by the degree of overlap between their lists of dimensions and the dimensions of the new case. HYPO then cites the best, or “most on-point,” cases for each side. The best cases for a side are precedents that are maximal in the partial ordering, among those which were decided in favor of that side. This is the first level of argument that HYPO produces. To illustrate, suppose that the situation is as indicated below, where d1, d2, . . . , d8 are different dimensions, Def indicates that a past case was decided in favor of the defendant, and Pla stands for a decision for the plaintiff.

New case      Case A      Case B        Case C        Case D
d1, d4, d5    d3, d4      d1, d4, d8    d1, d5, d6    d5
              Pla         Def           Pla           Pla
Cases A and C are both maximally on-point for the plaintiff, case B for the defendant. Case D is not maximally on-point because it shares only d5 with the new case, whereas case C shares both d5 and d1. HYPO next constructs, for each maximally on-point case, possible counterarguments for the other side. These counterarguments suggest ways to distinguish, or disqualify, the precedent because of unshared dimensions, differing magnitudes along dimensions such as “number of outside disclosures,” or, most disastrously, “trumping” cases. That is, case 2 trumps case 1 if (a) the overlap (common dimensions) between case 2 and the case at hand includes the overlap between case 1 and the case at hand; and (b) cases 1 and 2 were decided differently. In the schematic example of the preceding paragraph, case A for the plaintiff is trumped by case B for the defendant: case A overlaps the new case only on d4, which case B also shares, and d3 is irrelevant because it is absent from the new case. HYPO also provides
a third tier of argumentation, namely rebuttals to the counterarguments. For example, HYPO will suggest that the plaintiff could attempt to distinguish the trumping case B by arguing that dimension d8 was essential to the decision in favor of the defendant but is absent in the case at hand.

Finally, what gives HYPO its name is its ability to posit hypothetical or “what if” scenarios. The program can counterfactually modify the case to be decided by adding or subtracting dimensions. Posing hypothetical questions is an essential legal skill. For example, the lawyer for the defendant in a trade-secrets case might want to know whether she would have a stronger case (and the plaintiff a weaker case) if the plaintiff had voluntarily disclosed information (to its customers, in trade journals, or elsewhere), even if she presently lacks credible evidence of such disclosure.

HYPO has many striking features: a clear definition of what counts as relevant (the thirteen dimensions), a straightforward treatment of similarity in terms of matching along dimensions, and a model for adversarial case-based reasoning. There is no need to adapt past solutions. Instead, HYPO combines the information from all possible precedents to produce its arguments. Ashley’s approach to reasoning with multiple sources is especially innovative. Rather than attempt to identify the unique best analogy, HYPO finds the best arguments for opposing positions and highlights the relevant factors in these arguments.

In connection with this point, it is worth noting that in Ashley’s view, HYPO’s inability to assign different weights to the dimensions (or even to ignore some of them) depending on context is a weakness. But this characteristic might actually be a virtue if we think about how analogies function in science. In writing about the early stages of theory choice, Kuhn suggests that science “requires a decision process which permits rational men to disagree” (1977, 332; emphasis in original).
Values such as simplicity, fruitfulness, scope, and consistency with existing theory are highly influential, but these values frequently clash. Scientists diverge both on how to interpret each of these values and on what weight to assign them in cases of conflict, but this type of conflict is healthy for science: “opposing maxims alter the nature of the decision to be made, highlight the essential issues it presents, and point to those remaining aspects of the decision for which each individual must take responsibility” (330). It seems that what HYPO models, with considerable success, is a comparable process that ought to be the initial stage of a legal investigation. A determinate weighting of dimensions is not appropriate. There are, however, at least three important reasons to think that the ideas used in HYPO could not easily be extended to scientific reasoning by analogy. First, as already mentioned, in scientific analogies there is often only one source and one target, rather than a large body of related cases. More generally, there is often no large “class” of cases that includes both source and target domains. Second, HYPO works with a fixed set of thirteen dimensions that determine everything else, but nothing like this obtains for most scientific analogies. Finally, most scientific analogies are based on specific causal or logical relationships, rather than mere correlation between characteristic features and outcomes. These causal or logical relationships must be considered to determine what is relevant, yet HYPO has no way to model them.
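The ordering by overlap and the trumping relation described earlier in this subsection can be checked mechanically against the schematic example of cases A through D. What follows is a minimal sketch, not Ashley’s implementation: the function names `overlap`, `most_on_point`, and `trumps` are hypothetical, dimensions are treated as bare labels with no magnitudes, and “includes” in condition (a) is read as set inclusion.

```python
# Dimensions of the new case and of the four precedents in the example.
NEW_CASE = frozenset({"d1", "d4", "d5"})

PRECEDENTS = {
    "A": (frozenset({"d3", "d4"}), "Pla"),
    "B": (frozenset({"d1", "d4", "d8"}), "Def"),
    "C": (frozenset({"d1", "d5", "d6"}), "Pla"),
    "D": (frozenset({"d5"}), "Pla"),
}


def overlap(name):
    """Dimensions that a precedent shares with the new case."""
    dimensions, _outcome = PRECEDENTS[name]
    return dimensions & NEW_CASE


def most_on_point(side):
    """Precedents decided for `side` whose overlap is maximal under inclusion."""
    candidates = [
        name for name, (_dims, outcome) in PRECEDENTS.items()
        if outcome == side and overlap(name)
    ]
    return sorted(
        name for name in candidates
        if not any(overlap(name) < overlap(other) for other in candidates)
    )


def trumps(case2, case1):
    """Case 2 trumps case 1: its overlap includes case 1's, and outcomes differ."""
    return (overlap(case2) >= overlap(case1)
            and PRECEDENTS[case2][1] != PRECEDENTS[case1][1])


print(most_on_point("Pla"), most_on_point("Def"), trumps("B", "A"))
```

Because the ordering is only partial, cases A and C are both maximal for the plaintiff even though their overlaps with the new case are disjoint; nothing in the sketch weighs one dimension against another.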
The Prodigy Program and Derivational Analogy

Derivational analogy, an approach implemented in Carbonell’s Prodigy program, is based on the idea that the reasoning processes used to derive past solutions, and not just the resultant problem solutions, provide useful information for solving new problems (Carbonell 1986; Carbonell and Veloso 1988, 1993).12 Carbonell (1983) first developed a procedure called transformational analogy in which a target problem was solved by transforming the solution to a source problem through a series of transformation operators. One difficulty with this approach was that similar problem solutions can vary considerably in the way they are represented. For example, the same sorting algorithm is coded quite differently in two programming languages. This makes transformation from one solution to another extremely complicated.

Derivational analogy was introduced to overcome this problem. A derivation for Carbonell means a “justified solution path” (Carbonell and Veloso 1988) and includes all dead-end lines of reasoning as well as the steps that actually have led to a successful solution. Derivational analogy works in conjunction with other problem-solving methods, such as heuristic search or plan instantiation. The procedure begins by using one of these methods, and then retrieves all past cases whose early solution paths parallel that of the new problem. Prodigy then tries to “replay” the subsequent reasoning steps from these past problems by applying them to the new problem. To use derivational analogy, each step in a past derivation must be stored and classified under several indices:

• Rules or operators used in the step
• Pre-conditions required for these rules/operators to be used
• Subgoals (if any) the step was attempting to satisfy
• Reasons that step was taken rather than alternatives
• Reasons we abandon a train of thought
Consider a derivation, or proof, in propositional logic. To construct a complex proof, we typically work both backwards from the desired conclusion (“means-end analysis” in AI) and forwards from what we have already derived. Here, there is a fixed and finite set of rules for reaching the solution. This is a general characteristic of derivational analogy. As Carbonell and Veloso explain (1993, 252), we can use derivational analogy only in domains where problem solutions can be derived via “a set of operators and inference rules [with definite preconditions and effects].” We must also have a complete grasp of all features that might be relevant in a derivation, in order to produce a comprehensive set of derivational indices. This apparently limits the scope of derivational analogy to artificial tasks or simple mathematical problems.13 Mathematical analogies of even moderate complexity appear to lie beyond its present scope.
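The idea of replaying a stored derivation can be illustrated with the propositional example just mentioned. The sketch below is an assumption-laden toy, not Prodigy’s algorithm: the `Step` record and the `replay` function are hypothetical names, facts are plain strings, and only four of the indices listed earlier are recorded (reasons for abandoning a line of thought are omitted). Replay proceeds step by step and stops as soon as a stored step’s preconditions fail in the new problem, at which point some other problem-solving method would have to take over.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One stored derivation step, classified under derivational indices."""
    operator: str             # rule or operator used in the step
    preconditions: frozenset  # facts required before the operator applies
    effects: frozenset        # facts the step adds when it succeeds
    subgoal: str              # subgoal the step was trying to satisfy
    reason: str               # why this step was chosen over alternatives


def replay(derivation, initial_facts):
    """Replay stored steps in order while their preconditions still hold.

    Returns the subgoals reached and the resulting set of facts. Replay stops
    at the first step whose preconditions fail in the new problem.
    """
    facts = set(initial_facts)
    reached = []
    for step in derivation:
        if not step.preconditions <= facts:
            break
        facts |= step.effects
        reached.append(step.subgoal)
    return reached, facts


# A stored two-step proof of r from p, p->q, q->r (invented for this sketch).
PAST_PROOF = [
    Step("modus ponens", frozenset({"p", "p->q"}), frozenset({"q"}),
         "q", "q is needed to reach r"),
    Step("modus ponens", frozenset({"q", "q->r"}), frozenset({"r"}),
         "r", "r is the desired conclusion"),
]

full, facts_full = replay(PAST_PROOF, {"p", "p->q", "q->r"})
partial, facts_partial = replay(PAST_PROOF, {"p", "p->q"})
print(full, partial)
```

The second call lacks the premise q->r, so only the first stored step transfers. The example works only because the operators have definite preconditions and effects, which is the restriction Carbonell and Veloso note.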
12. Veloso (1992) applies derivational analogy to a variety of problems.
13. In the articles cited, Carbonell and Veloso have applied derivational analogy to simple problems in linear algebra, an artificial transportation problem, and job-planning in the Strips robot domain.
Case-based Reasoning: Limitations and Wider Relevance

These two examples show that case-based reasoning programs can perform well in specialized problem-solving areas. The explanation for their success should be clear. Analogical reasoning requires the ability to identify which similarities to past cases are relevant to a new case. Case-based reasoning programs solve this problem through specialization. By restricting the problem domain, we are able to identify a fixed set of determining factors, which become the indices.14 Every indexed feature, and nothing else, is relevant. As we might expect, the limitations of the case-based approach to analogical reasoning duplicate what we saw in our discussion of determination rules in chapter 2.

1. Complete set of determining factors. The case-based reasoning model does not appear suited to modeling analogies in settings where we cannot identify a complete set of determining factors. Such settings arise in at least three very common situations. First, we may lack detailed knowledge (typically causal models) of the source domain. We cannot hope to specify a complete set of determining factors that govern animal and human responses to drugs. Second, the range of problems that arise may be too great to depend upon a fixed set of determining factors. We should not expect to identify a complete list of features relevant to analogies between plane and solid geometry. Third, we may know very little about the target domain. These three situations are common in scientific analogies, so that the case-based approach faces a severe limitation.
2. Dense library of cases. As noted earlier, case-based reasoning is inappropriate when dealing with single-source analogies. The case library must be large enough so that, for any new case, it is possible to find past cases that are quite similar. But analogies with a single source are common in scientific reasoning.
3. No plausibility criteria. Most CBR programs shed no light on the fundamental problem of evaluating the plausibility of analogical arguments. In fact, they appear to avoid this problem by assuming: (a) norms for good analogical reasoning are embedded in the case library; and (b) the case library is dense enough so that the closest analogues sanction a legitimate inference. In evaluating scientific analogies, however, we cannot take plausibility for granted even if we have a large set of background cases.

An obvious response to the first two observations is that we can expand a set of indices to accommodate new problems, and we can increase the case library. These steps can certainly broaden the scope and reliability of case-based reasoning. But my point is that no finite indexing scheme can ever provide
14. Some CBR programs implement a form of dynamic indexing. One example is discussed in (Leng, Buchanan, and Nicholas 1994).
an adequate basis for analogical reasoning about an expanding, open-ended body of knowledge, a description that applies to scientific settings where analogical reasoning is most useful. Furthermore, restating the second observation, many analogical arguments are unique; there is no class of similar cases. While these difficulties make it unlikely that a case-based reasoning approach can ever hope to provide a model that captures all aspects of analogical reasoning in science, individual CBR programs incorporate insights that have wide application. For example, I shall argue (in the next chapter) that HYPO’s strategy for dealing with competing precedents can be generalized and applied broadly.
3.5 The Copycat Program

Hofstadter and Mitchell’s Copycat program (Hofstadter 1995, Mitchell 1993) is, in the authors’ words, “designed to discover insightful analogies, and to do so in a psychologically realistic way” (Hofstadter 1995, 205). The program handles the following type of problem:

1. Suppose the letter-string abc were changed to abd; how would you change the letter-string ijk in “the same way”?

Most people would answer ijl, since it is natural to think that abc was changed to abd by replacing the rightmost letter with its successor. Alternative answers are possible: ijd (replace the rightmost letter with d), ijk (replace each c with d), or abd (replace the entire string with abd). But these alternatives are not compelling. We just have a sense of what counts as the natural analogy in this case. Copycat operates in the domain of such letter strings, using concepts like successor, same, leftmost, rightmost, alphabetic first (which applies only to a), alphabetic last (which applies only to z), group (for a group of related letters), and a number of others (about sixty in total). As Hofstadter and Mitchell point out, surprisingly interesting problems can arise in this tiny world. Here are two more examples:

2. Suppose the letter-string aabc were changed to aabd; how would you change the letter-string ijkk in “the same way”?
3. Suppose the letter-string abc were changed to abd; how would you change the letter-string mrrjjj in “the same way”?

In each case, the rule “replace the rightmost letter with its successor” could describe the initial transformation, but neither ijkl (in problem 2) nor mrrjjk (in problem 3) seems like a good answer. In problem 2, we are inclined to regard the groups aa and kk as significant. A more natural answer here is ijll, where the transformation rule is now “replace the rightmost group of letters with its successor.” The same idea might lead us to the solution mrrkkk for problem 3, but we might also notice that the group lengths in mrrjjj have the same successor relation (1-2-3) as do the letters in abc. This observation should incline us to the solution mrrjjjj, where we have replaced the length of the rightmost group with its
successor. Interestingly, Hofstadter and Mitchell reject the solution mrrkkkk as overkill (Hofstadter 1995, 241): it “blends the alphabetic view with the numerical view in an inappropriate manner.”

In many ways, the problem domain of Copycat bears a closer resemblance to Evans’s geometric analogies than to the domain of any of the more recent programs considered here (even though Copycat has to figure out its answers from scratch, rather than select the best from a set of given choices). Indeed, Hofstadter seems to feel that Evans’s idea of “integrating perceptual processing with mapping, and doing so in an elegant microworld” (1995, 273) did more to advance understanding of analogical reasoning mechanisms than much of the later research. The reason is his conviction that analogy-making is in large part about the perception of novel patterns, and his equally firm belief that such perception requires concepts with “fluid” boundaries (such as the generalized successor-sequence of problem 3). He writes, “No true-life situation comes with hard and fast boundaries. . . . Despite this, most artificial-intelligence models of analogy-making involve situations that come prepackaged as tight little bundles of facts with perfectly crisp edges” (69). His rejection of structuralist approaches such as SME and ACME derives in part from the thought that by the time we have “crisp” representations such as the one in figure 3.2, most of the work is done. Genuine analogy-making involves modification or “slippage” of concepts; structuralist programs work with fixed concepts or, more accurately, with no concepts at all. For the same reason, he rejects most case-based models of analogical reasoning:

My intuition about what is missing in CBR and CYC,15 and would even be missing in a hypothetical splice of the two systems’ best aspects, is a deep model of concepts. . . . For me, a model of concepts implies something like the overlapping, emergent halos that arise in a Copycat- or Tabletop-style Slipnet. (371)

15. CYC is Doug Lenat’s project of assembling a vast database of commonsense knowledge.

I share Hofstadter’s philosophical views about the central role of perception in analogy-making, though I think that both case-based reasoners and structuralist programs shed light on aspects of the process. In the remainder of this section, I attempt briefly to summarize the alternative architecture of Copycat and discuss the implications of the program for our account of analogical reasoning in science. To begin, the Copycat program has three components:

1. The Slipnet is a network or graph with a fixed set of explicit or “Platonic” concepts pertaining to letter sequences (successor, leftmost, and so forth). Nodes are linked together by edges whose weights represent “conceptual distance,” which reflects the probability that one concept will “slip” into another while trying to find an analogy. To the nodes themselves are attached weights called activation levels, representing their perceived relevance and importance to the problem at hand. Both activation levels and conceptual distances have preset values but change dynamically as the program runs through a series of iterations. For example, if leftmost is currently highly active, the activation level of “conceptually close” neighbors such as (perhaps) rightmost will increase. Each node also has an a priori “conceptual depth” value that does not change: this value is intended to capture the generality and importance of the concept in solving Copycat problems, in the view of the programmers. For example, opposite is a conceptually deep concept. While the nodes represent “Platonic” (explicit, core) concepts, Hofstadter and Mitchell stress that associated with each node is a probabilistic cloud or “halo” including close neighbors and links to these neighbors. These halos function as a second type of concept associated with each node. Core concepts are static, isolated and sharp; halos are dynamic, mutually overlapping, and fluid.
2. The Workspace stores temporary “perceptual structures” as the program struggles to find a coherent solution to the problem. For example, it constructs and stores “bridges” between pairs of objects (individual letters or groups). In addition, the Workspace gradually builds a description for each object, which includes its relationships to other objects.
3. The Coderack contains a set of constantly changing agents or “codelets.” These are procedures that modify an object’s description, add bonds to create groups of letters, make bridges between separated groups, and so forth. In addition, there are special “scout” codelets that do not actually make any changes, but investigate what might happen if changes are made. Codelets are selected and run in random order, but each is assigned an urgency value that determines its probability of being selected. New codelets are constantly being placed in the Coderack as “follow-ups” to active codelets and as the result of activities in the Slipnet.
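The Coderack’s selection rule, random choice biased by urgency, can be illustrated in a few lines. This is a sketch under explicit assumptions: the codelet names and urgency values are invented, and selection probability is taken to be directly proportional to urgency, which is one simple reading of the description above rather than a claim about Copycat’s actual formula.

```python
import random
from collections import Counter


def pick_codelet(coderack, rng):
    """Choose one codelet name with probability proportional to its urgency."""
    names = list(coderack)
    urgencies = [coderack[name] for name in names]
    return rng.choices(names, weights=urgencies, k=1)[0]


# Hypothetical Coderack: codelet name -> urgency (values invented).
CODERACK = {"bond-scout": 1.0, "group-builder": 3.0, "bridge-builder": 6.0}

rng = random.Random(0)  # fixed seed so repeated runs draw the same sequence
counts = Counter(pick_codelet(CODERACK, rng) for _ in range(10_000))
print(counts.most_common())
```

Low-urgency codelets are still chosen occasionally, so unpromising routes continue to receive some exploration while most effort goes to the urgent ones.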
Rather than employing conventional search strategies, the Copycat program utilizes “parallel terraced scan”: scout codelets allow the program to send out many “fingers” along different possible routes, at different speeds. The pressures from different possible solutions gradually push things in a coherent direction. More and more structure is built up in the Workspace and increasingly deep concepts become activated in the Slipnet. Halting is determined probabilistically from the temperature, a parameter that drops as the amount of order in the Workspace increases. Hofstadter and Mitchell argue that what makes Copycat unique is the fluidity of its concepts (halos) and, indeed, the fluidity that permeates almost every aspect of the program. This allows for a kind of open-mindedness “not possible in an architecture with frozen representations” (267): any concept might turn out to be relevant, and different runs on the same problem can produce distinct solutions. Nevertheless, orderly structures do emerge and the program produces plausible solutions. In the authors’ view, this emergence of order out of random low-level processes is the essence of what takes place in perception. Copycat thus vindicates the thesis that analogy-making can be modeled as a process akin to perception, even if the program employs mechanisms distinct
from those in human perception (and the authors make no claim that the mechanisms are similar). This is a significant achievement.

Copycat fares well in terms of the criteria laid out in section 3.1; the program makes definite predictions (with explicit conventions about its input) and employs plausible mechanisms to produce good solutions. Neither Copycat nor the underlying model of concepts is simple, but the idea of modeling analogy-making on perception has strong appeal. What of scope? The program works in a tiny environment, and Hofstadter and Mitchell concede that the architecture would be invalidated “if it could be shown to depend on the relative smallness of its repertoire of Platonic concepts and the relatively few instances of those concepts that appear in a typical problem” (263). But they vehemently resist this idea: fluid concepts and mechanisms such as the parallel terraced scan are the essence of Copycat and have nothing to do with its small domain.

It is likely that these ideas have much wider scope than the problem domain of Copycat, but making this claim precise represents a challenge. In the Copycat architecture “every possible concept, fact, method, object and so on is in principle available at all times” (264) in analogy-making, where the “possible” items are all of those in Copycat’s repertoire. The authors stress that the same openness to possibilities characterizes creative human analogies, though we have a much larger repertoire. There is a real difference, however, in being able to keep tabs on a finite set of possibly relevant notions (in the Copycat world) and never being sure whether something left out (in the real world) might turn out to be crucially important. In objecting to Darwin’s analogy between artificial and natural selection, for example, critics pointed out that contemporary geological evidence would not support a sufficiently great age for the earth to permit evolution on the scale that Darwin claimed.
The objection drew on information outside the range of Darwin’s original discussion. The difficulty here is the same as the “expanding domains” problem that faces CBR. The type of open-endedness needed to handle such cases, however, is probably beyond the reach of any currently feasible computer program.

Let us turn the tables and consider now whether Copycat poses a challenge to any logical or quasi-logical theory of analogical reasoning. If Hofstadter and his co-authors are right, most analogical cognition involves microprocesses akin to those of Copycat. These processes operate below the level of abstract reasoning. High-level logical appraisal is, to employ an apt cliche, merely the tip of the iceberg. Attempts to model and evaluate analogical arguments with logical criteria, including the efforts described in the previous chapter and in the remainder of this book, might be hopelessly misguided.16 This objection, which I shall term psychologism about analogies, deserves serious attention whether or not one is convinced of the generality of Copycat.

16. I am not attributing this argument to Hofstadter, but it is an argument that might be made by someone committed to a perceptual model of analogical cognition.

In my view, Copycat presents a powerful case that analogy-making is in large part the perception of patterns, and that it is properly modeled with low-level cognitive processes. But in no way does this eliminate the need for a logical model of analogical reasoning. To appreciate this point, note first that even in Copycat’s own domain, we can examine the program’s output and ask: is that a good answer? Hofstadter and Mitchell offer no general theory of what makes a good analogy, but in explaining particular cases, they appeal to principles similar to Evans’s. The idea seems to be: find a general rule that covers both cases but is as conservative (elegant, simple, etc.) a departure from the original transformation as possible. These norms emerge in their discussion of problems 2 and 3. Regardless of how an answer is produced, we are entitled to ask for and receive a justification. If we respond to the question, “Why is this a good analogy?” with “It is the output of the Copycat program,” or (analogously) “It is just what I came up with,” we confuse causes with justifications.

To appreciate this point clearly, consider a parallel case. The psychological processes that lead to mathematical insight are deeply mysterious. Imagine some future descendant of Copycat that solves problems in Euclidean geometry through a combination of clever perception-like processes and parallel terraced scans. Nobody would claim that the workings of the program provide justification for the mathematical inferences it produces. We still need to apply independent standards of justification.

These reflections bring us back to the “argument thesis” introduced in section 1.2. I asserted there that any conclusion reached via analogical reasoning is justified only insofar as a reconstruction of that reasoning as an analogical argument justifies that conclusion. I noted the challenge posed by computational models of analogy.
What I have been suggesting here is that while Hofstadter may have demonstrated that computer programs intended to generate creative analogies should take the perceptual aspect of analogy-making seriously, we still need a normative theory of evaluation and justification.
Before we leave this issue, it is worth considering two additional objections that could be made on behalf of psychologism. First, some philosophers have maintained that ethical decision-making is largely a matter of carefully developed perceptual capacities. If you have been properly brought up, you will “see” that something counts or does not count as a moral reason to act.17 This is sometimes extended to the view that there is no more to moral justification than this properly cultivated perception applied to particular cases. How do we know that analogical decision-making is not just like this? Second, consider Kuhn’s arguments, in the later chapters of his Structure of Scientific Revolutions (Kuhn 1962 and Kuhn 1970), that there is no algorithm for paradigm choice. Such choices are not the result of logic, but rather of success or failure in coming to perceive things in a novel way. Analogical reasoning, if it is primarily a perceptual activity, might not be amenable to criticism or justification.
17. Honderich (1985) contains a number of papers discussing the analogy between moral values and secondary qualities.
I have already suggested a response to the second of these objections in section 1.3 by introducing a distinction between constitutive analogies—metaphors and analogies that underlie an entire research program—and the
more mundane analogies of “normal science.” Although the distinction may not be sharp or easy to make, there are plenty of analogical arguments that we routinely criticize and evaluate. Even constitutive analogies can be subject to criticism. Kuhn’s point is not that paradigm choice is closed to rational debate; it is just that there can be no compelling argument either way.18 As for the perceptual model of ethical decision-making, whether or not the view is tenable, there are deep differences between analogical reasoning in science and ethical decisions. The similarities and relevance relations in the sciences are sharper and clearer. Analogies are part of the process of theory confirmation or acceptance, and in disciplines such as archaeology, analogical inference plays a central role. If we abandon the idea that we can apply logical criteria to analogies, we must abandon the idea that analogies play any part in the confirmation of scientific theories. We should not be willing to take that step until we are convinced that no acceptable criteria exist. Copycat offers a bottom-up approach to analogical cognition. The theory is embedded in the way analogies emerge from microprocesses. My approach, which elaborates two very general principles into criteria for evaluating analogies in different contexts, is top-down. I am convinced that there is room for both approaches.
3.6 Conclusion: The Philosopher and the Cognitive Scientist
Can computational theories help us to understand analogical reasoning in the sciences? Case-based reasoning programs can model analogical inferences within specialized branches of science where we can identify a comprehensive set of relevant factors. No existing computational theory, however, provides a good general normative theory of analogical arguments in science. Nonetheless, our review of these theories does lead to a sharper appreciation of what it takes to have an adequate model of analogical reasoning.
Taking the negative point first, I have argued that neither the structuralist nor the case-based approach can be extended to a general understanding of analogical arguments. Both types of model encounter difficulties in identifying relevant similarities and differences, and both fail to provide adequate norms for analogical arguments. Structuralist approaches derive from the view of analogies as isomorphisms. Relevant similarities and differences are identified primarily from considerations of structure (and explicit user input). Insofar as these computational models incorporate any norm, it is the Systematicity Principle. I have argued that systematicity is at best statistically correlated with plausibility. Nothing rules out deeply systematic analogical arguments that are implausible, or good analogies that depart from systematicity. By contrast, case-based approaches are founded on experience of correlations between a set of key factors and various
18. These clarifications are found in the 1969 postscript in Kuhn (1970), as well as in Kuhn (1977).
outcomes. Relevant similarity is confined to matches on predefined dimensions. So case-based reasoning models are confined to well-understood problem domains with a rich case history. Despite these limitations, the work in cognitive science provides insights into the processes involved in any analogical argument, and these insights are directly relevant to our search for normative criteria. Four ideas seem particularly important and might never have emerged clearly without the computational models. First, we must recognize the importance of focusing on the conventions governing how a domain is represented by the user. Analogies hold between descriptions of domains and can be strengthened or weakened as these descriptions are altered. Based on the examples we have reviewed, it seems reasonable to aim at standards somewhere between the precision of scripts and the flexibility of predicate-calculus descriptions. A philosophical theory should articulate minimal standards for such representations: standards that provide insight and rule out frivolous arguments, but still leave considerable latitude. The second idea, embodied in Ashley’s HYPO, Evans’s Analogy, and Hofstadter and Mitchell’s Copycat, is that we need a clear account of how the initial representations of source and target domains should be elaborated, prior to determining correspondences and evaluating an analogical argument. In the next chapter, I combine these two ideas by proposing that different patterns or “canonical forms” of representation—which I refer to as the prior association—furnish a classification of different types of analogical argument. In addition to imposing standard forms of representation, these patterns are helpful in highlighting the relevant features. The patterns function like high-level stereotypes for the different kinds of association that occur in scientific analogies. 
The third insight is that there are a variety of ways to deal with multiple analogies, and that it is not necessarily the best strategy to try to pick a unique winner. Fourth, and finally, the computational models provide a big clue as to how we should approach the problem of identifying relevant similarities and differences. Siding with structuralists, we should do this dynamically rather than working with a static list. Siding with case-based reasoners, however, we should incorporate prior knowledge about relevance (if available), and we should base our judgments of relevance on the nature and strength of the vertical relations, rather than on their systematicity. All of these points are consistent with the conclusions of chapter 2. There is evidently much philosophical work to be done.
4 The Articulation Model
4.1 Introduction
What does it take for an analogical argument to make its conclusion plausible? This question is important not just for philosophers but also for anybody who makes use of analogical arguments. Yet as we have seen, most philosophical theories do not discriminate effectively between good and bad analogies, while computational models tend to dismiss the idea of a normative account altogether. In this chapter, I shall answer the question by proposing what I call the articulation model for evaluating analogical arguments.
This first section deals with some preliminary matters. To begin, I summarize what I take to be the most important general requirements for a normative theory of analogical arguments. Next, I provide a brief outline of the articulation model and indicate how it will meet those requirements. Finally, I identify some important restrictions that apply to the model.
The most basic task of a normative theory of analogical reasoning is to deliver a verdict as to whether a given analogical argument establishes its conclusion as prima facie plausible. The theory should also, if possible, provide a basis for making qualitative and comparative judgments about the strengths of different analogical arguments.1 Based on the surveys in the preceding two chapters, I propose that such a theory should meet the following requirements.
1. I postpone until much later (chapter 8) questions about how (and whether) a theory of analogical arguments might be linked to quantitative (probabilistic) measures of plausibility.
1. Clarity and consistency. Our theory must provide clear (but flexible) standards of representation and clear, consistent criteria of evaluation. That is, we need to be clear both about how an analogical argument is initially represented and about the principles used for its evaluation. The guidelines for representation, however, must be flexible enough to accommodate the enormous variety of analogical arguments. The consistency requirement means that our evaluation criteria must not yield contradictory verdicts about a single argument represented in a fixed way. Consistency does not mean that the verdict must remain invariant under changes in representation, not even when two representations are logically equivalent. What is more, we should allow that two or more incompatible hypotheses can be made plausible by distinct analogical arguments.
2. Applicability/predictive adequacy. Our theory must provide criteria that plausibly discriminate between good and bad analogical arguments. A satisfactory theory should not be vulnerable to obvious counterexamples, and should either agree with our considered judgments about particular analogical arguments or offer defensible corrections.
3. Explanatory power. Our theory must provide criteria amenable to philosophical justification. A successful theory should show how good analogies actually contribute to plausibility. It should exhibit the logical mechanisms at work, rather than (merely) the psychological mechanisms.
4. Scope. Our theory should apply to the widest possible variety of analogical arguments. I take this requirement to mean at least four things. First, a comprehensive theory should accommodate the wide range of fields in which analogical arguments are used. A theory restricted to causal relations cannot hope to account for analogies in mathematics. Second, a theory of analogical reasoning should identify relevant similarities and differences dynamically. As we have seen (in the discussion of case-based reasoning), theories that work with a fixed list of relevant features have a limited range of application. Third, a comprehensive theory should handle both individual (single-source) analogical arguments and those involving multiple sources. Finally, an adequate theory should integrate analogical arguments into a broader account of inference and confirmation.
5. Simplicity. Our theory should be as economical as possible in its basic concepts and assumptions.
6. Nontriviality. Our account should not make analogies irrelevant to justification. We restrict attention to cases where the analogy plays a nontrivial role in the argument. An account of analogical arguments that presupposes an underlying generalization (see section 2.4) can trivialize an analogical argument if it makes information about the source domain unnecessary. The requirement of nontriviality rules out this sort of analysis.
It is difficult to find a theory of analogical arguments that meets all of these requirements. I suggest, however, that the difficulty is exacerbated by the way that current theories classify and evaluate analogies. Most current classification
schemes are based on the overall nature of the similarities between domains— what Hesse refers to as horizontal relations. Such schemes lead to the view that analogical arguments should be assessed on the basis of overall similarity between source and target domains. The articulation model, by contrast, classifies analogical arguments on the basis of the vertical relations within each domain. This new approach leads to evaluation criteria based on relevant, rather than overall, similarity, and that proves to be crucial in attempting to meet the requirements listed above. To motivate this shift in strategy, let us recall that three of the most prominent approaches to analogy, each of which is oriented toward overall similarity, face very similar difficulties. First, there are computational theories founded on systematicity, the idea that analogical inferences involving complex, high-level networks of relations and attributes are inherently the most plausible ones. I argued in section 3.3 that systematicity by itself does not produce or explain the plausibility of any analogical argument. Second, there is the sampling view: if two domains share a high degree of overall similarity, then it is probable that they agree in all characteristics relevant to the attribute that is the focus of the analogical argument. As we saw in section 2.5, this approach suffers both from vagueness (about which similarities count) and from being overly liberal (in licensing too many analogical arguments). Third, there is Hesse’s view (see section 2.3): the plausibility of an individual analogical inference derives from material analogy, or “pre-theoretic” similarities between domains. Both in her 1966 book and in later work, Hesse argues that analogical arguments are founded on metaphorical relationships between the source and target domains. 
Scientific language is pervaded by metaphors, and these metaphors are “pre-theoretic” because they draw on personal and cultural experience.2 As Hesse (1966) notes, correspondences between properties of sound and light, such as pitch and color, volume and intensity, and so on, were recognized long before the mathematics of wave theories had been developed. In light of such observations, it may seem appropriate to evaluate an analogical inference by starting with the underlying metaphor.
2. In addition to Hesse (1966, 1988), Lakoff and his colleagues have argued that human experience is framed in terms of broad metaphors (Lakoff and Johnson 1980, Lakoff 1987, Lakoff and Turner 1989). Brown (2003) argues for the prevalence in sophisticated scientific reasoning of metaphors based on ordinary experience.
The problem once again is that overall similarity, now construed in terms of metaphorical correspondences, is an unreliable guide to the plausibility of an analogical argument. No metaphorical connotations, no fixed set of pre-theoretic similarities and differences, can be appropriate for evaluating all conclusions toward which an analogy between two systems might be directed. If there were such a set, we should have to accept either every analogical argument involving the two systems or none. Yet, as pointed out in the discussion of Hesse’s approach, the analogy between heat and fluids licenses some inferences but not others. In every case, the salience (relevance or irrelevance) of similarities is relative to the individual inference in which the analogy is used. This is the substance of my criticism of all classifications and theories of analogical
reasoning based on overall similarity, and it hinges on the fact that these theories misrepresent the basic logical structure of analogical arguments. In contrast to Hesse and others, I believe that the key to understanding and evaluating an analogical argument is to look first at the nature of the vertical relations. An indispensable requirement for a good analogical argument is a clear connection, in the source domain, between the known similarities (the positive analogy) and the further similarity that is projected to hold in the target domain (the hypothetical analogy). I call this the prior association. Making the prior association explicit is the first order of business because it allows us to proceed to exactly the right issue: whether there is reason to think the same kind of connection could obtain in the target domain. Accordingly, I classify and evaluate analogical arguments on the basis of this prior association. By taking this new approach, we can meet all of the desiderata for a theory of analogical arguments. Our model achieves clarity by setting standards for representing each distinct type of prior association and it achieves scope by encompassing a broad range of types. It combines a classification scheme (section 4.2) with a general theory (section 4.3), and then fleshes out the general theory with subordinate models (or submodels) for the different types of analogical argument (sections 4.4–4.9). The general theory applies to all analogical arguments. The subordinate models are geared toward scientific analogies, but are capable of wider application. Furthermore, additional models can be developed to extend the theory to analogical arguments outside science (though I elaborate on this point only in the concluding chapter). In this way, the scope of my theory remains broad even though I concentrate on examples from mathematics and science. An important attraction of this two-component theory is that it is open to improvement. 
While the submodels outlined in this chapter capture some of the main features of analogical arguments, they are capable of refinement. In fact, chapter 5 is devoted to exploring a variety of mathematical analogies, expanding upon the basic model offered in section 4.4. To conclude this preliminary discussion, let me note three important restrictions. First, the theory as formulated in section 4.3 concerns individual analogical arguments—those that involve a single source domain and a single target domain. This type of argument is both an important special case and the key to understanding arguments involving multiple analogies. I defend that position in section 4.10, where I expand the articulation model to accommodate arguments involving multiple analogies. Second, my model for evaluating analogical arguments is presented in isolation from other factors that contribute to our assessment of the overall plausibility of a hypothesis. Although it provides an account of what it takes for an analogical argument to establish the prima facie plausibility of its conclusion, other criteria also influence our plausibility judgments. I postpone discussion of how analogical arguments are integrated into a broader pattern of inference until chapters 7 and 8. Finally, in presenting the basic model, I take interdomain correspondences— relationships of similarity and difference—for granted. This sets my project apart from work on analogy in cognitive science, where the focus of research is on the generation of mappings between domains. Here, I take the correspondences as
given and (for now) assume that they are legitimate. Chapters 5 and 6 discuss the critical evaluation of such correspondences. These three restrictions are idealizations. They allow us to concentrate on how the evaluation of an analogical argument depends upon the prior association. The idealizations are acceptable because most of the insights we gain from studying analogical arguments in isolation are valid even when we move to a wider inferential context. The factors set aside here—multiple analogies, additional determinants of plausibility, similarity—are, with some exceptions, reintroduced later. I shall provide arguments for the case of multiple analogies in section 4.10, while later chapters lift the other restrictions.
4.2 Classification and Examples
Consider the following four examples of analogical arguments.
Example 4.1 (triangles and tetrahedra). Suppose we have proved that the three medians of any triangle have a common intersection point. By analogy, we conjecture that the medians of any tetrahedron—the lines joining each vertex with the center of the opposite face—have a common intersection.
Example 4.2 (spherical shells). In 1769, Joseph Priestley was the first to suggest, by analogy with gravitational force, that the absence of electrical influence inside a hollow charged spherical shell was evidence that charges attract and repel each other with an inverse square force (Priestley 1769, 1775/1966, 374). It was already well known to be a mathematical consequence of the inverse square law of gravitation that a shell of uniform density exerts zero force inside. This persuasive analogy was particularly significant for scientists of the period because it was extremely difficult to test directly the inverse square hypothesis.
Example 4.3 (walking on stilts). The species Hallucigenia sparsa, which flourished about 530 million years ago, was discovered by Charles Walcott in the Burgess Shale of British Columbia. The animal has been described as a “bizarre creature” with “seven pairs of stiltlike legs, supporting an elongate trunk with a cylindrical cross section” (Conway Morris 1977). Given that no animal like H. sparsa is known today, it was puzzling even to determine its method of locomotion. Various living organisms were taken as “more-or-less appropriate analogs,” including tripod fish that rest and walk on elongated pectoral and tail spines, and elasipodid holothurians “supported by elongate tube feet” (Briggs and Conway Morris 1986). We also encounter species of echinoids that are able to support themselves effectively on similar appendages, but move slowly on soft surfaces. Based on these and other analogies, the conclusion was that H. sparsa “probably did not progress rapidly over rocks or mud, and much of its time may have been spent stationary” (Conway Morris 1977).
Example 4.4 (morphine and meperidine). In 1934, the pharmacologist Schaumann was testing synthetic compounds for their antispasmodic effect. These drugs had a chemical structure similar to morphine. He observed that one of the compounds—meperidine, now also known as Demerol—had a physical effect on mice that was previously observed only with morphine: it induced an
S-shaped tail curvature. By analogy, he conjectured that the drug might also share morphine’s narcotic effects. Testing on rats, rabbits, and dogs showed that meperidine was in fact an effective painkiller, and had the same sort of influence as morphine on respiration and body temperature. Employing a second analogy, between these animals and humans, he argued that the drug would have similar effects on humans (Lembeck 1989, 11; Reynolds and Randall 1975, 273).
On the surface, all four examples conform to the same pattern: from observed similarities, infer a further similarity. They all fit schema (1.4.1) from chapter 1. Despite this resemblance, the arguments differ in fundamental ways. The most important, perhaps, is the direction of the prior association between the positive analogy, P, and the hypothetical analogy, Q. The concept of direction is clearest if the prior association is logical (entailment), causal, or explanatory in nature, because these relations are typically asymmetric. In such cases, there are two possibilities: from P to Q if P causes (entails, explains, etc.) Q, or from Q to P if the reverse holds. There are two further possibilities: a bi-directional prior association, where relations run both ways, and no direction, if there is no known priority. We thus obtain a general classification consisting of four types of analogical argument.
1. Predictive analogies ($P \rightarrow Q$). P is prior to Q in the association. In Example 4.1, the direction of association is from P to Q. S (the source) is the domain of plane geometry and T (the target) is the domain of solid geometry. The fact Q that the medians of a triangle meet is a consequence of known facts P about triangles. These facts have analogs (P*) in three dimensions; we predict that Q* is a consequence.
2. Explanatory analogies ($Q \rightarrow P$). Q is prior to P in the association. In Example 4.2, the direction of association is from Q to P. Here, S stands for the gravitational and T for the electrostatic set-up. P (the absence of gravitational force inside a hollow sphere) is a consequence of Q (the inverse-square law of gravitation). Since Q explains P, it is plausible that something similar, Q*, explains a similar phenomenon P*. We call this an explanatory analogical argument as it is most often used to confer plausibility on a proposed explanatory hypothesis.
3. Functional analogies ($P \leftrightarrow Q$). The association runs in both directions. Example 4.3 compares the structure and behavior of an existing organism (S) with H. sparsa (T). The prior association is an explanation of physical form P in terms of an alleged function Q (locomotion). It does not quite fit the pattern of Example 4.2, as form is both prior to and posterior to function. That is, (a) the appendages (in tripod fish) have a form capable of performing locomotion, and (b) the legs have the form that they do, at least in part, because of (for the sake of) that function. As Aristotle would say, P is prior as efficient cause while Q is prior as final cause.3 Plausibly, the same relationships hold between P* and Q*.
3. This simplified account of functional explanation will be improved in section 4.8.
Functional analogies are the only type of bi-directional association that we consider in this chapter (hence the label for this category). We can certainly imagine other sorts of bi-directional associations. For example, we might encounter a mathematical analogy in which P and Q are logically equivalent (relative to background mathematical assumptions). In this situation, however, usually only one direction matters for the analogical inference, so that the argument counts as a predictive analogy.4
4. Correlative analogies ($P \downarrow Q$). The association is symmetric; there is no direction of priority. The relationship here is statistical correlation. In Example 4.4, let S stand for morphine and T for the new compound, meperidine. P (the S-shaped tail curvature) and Q (the narcotic effects) are both observed consequences of morphine injection. There is no reason to think that either causes (or explains) the other; rather, they are two distinct bodily reactions to morphine. The conclusion of the argument, Q*, is that meperidine might have similar narcotic effects because Q* is correlated with P* (S-shaped tail curvature) just as Q is with P. The down arrow ($\downarrow$) represents the fact that we lack a causal or explanatory model that establishes priority for either P or Q. Of course, there will often be a (symmetrical) statistical correlation between P and Q within the first three categories of analogy. This fourth category is meant to include only cases where we have no information about the direction of the prior association.
Apart from the direction of the prior association, the four examples differ in a second respect. This is the mode of association between P and Q, which may be either deductive or inductive. The association is deductive if, in conjunction with appropriate background assumptions, either P entails Q (as in Example 4.1) or Q entails P (Example 4.2). This kind of association is prevalent in mathematics or mathematically formulated theories, and is limited to predictive and explanatory analogies. Inductive associations, by contrast, are compatible with all four basic types of analogy. Once we subdivide the predictive and explanatory categories, we obtain a total of six varieties of analogical argument.5 Table 4.1 summarizes the distinctions.
The table merits a few remarks. First, the nomenclature is meant to reflect either typical applications or the logical character of the association. Analogical arguments in the first category, for example, typically turn on mathematical relationships. Second, most of these associations (except for some mathematical examples) involve causal relationships. This is not reflected in the terminology because it would not help to distinguish the different forms. Third, the qualifier “probabilistic” is applied to two of the labels, but statistical relationships also figure in functional and correlative analogical arguments.
4. The reader may wonder whether the category of functional analogies is reducible in some way to the first two categories. In chapter 6, I suggest that we can distinguish between elementary and complex patterns of analogical reasoning, the latter involving some combination of elementary forms. Here I treat functional analogies as a separate category, partly because they play a prominent role in inquiry and partly to provide one example of a complex type of analogical argument.
5. Examples within all six categories will be discussed in sections 4.4–4.9.
DIRECTION OF PRIOR ASSOCIATION (columns) by MODE (rows)
MODE      | Predictive (From P to Q) | Explanatory (From Q to P) | Functional (Both directions) | Correlative (No direction)
Deductive | Mathematical             | Abductive                 | —                            | —
Inductive | Predictive/Probabilistic | Abductive/Probabilistic   | Functional                   | Correlative
TABLE 4.1. Types of analogical argument.
Roughly speaking, we obtain a ranking of analogical arguments by decreasing strength if we proceed in top-down, left-to-right fashion. The ranking is rough because it refers only to the maximum possible strength of analogical arguments within each category. Within all six categories, we encounter both good and bad analogical arguments. The point is just that the strength of the prior association strongly influences the plausibility of the analogical argument. The maximum possible strength tends to decrease as we proceed left to right and top to bottom. Since that maximum is rarely attained, however, this rank ordering is only a rough guide to the degree of plausibility that an analogical argument conveys. Finally, I reiterate a point made in the previous section: this classification and the models based upon it are provisional. They are capable of refinement. Before developing those models, however, let’s first explain the general theory that is common to them all.
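The two-way classification just described can be sketched as a small lookup. The following Python fragment is only an illustration, not part of the theory: the names `Direction`, `Mode`, `TABLE_4_1`, and `classify` are hypothetical, and the assignment of modes to Examples 4.3 and 4.4 assumes (as the text implies) that functional and correlative associations are always inductive.

```python
from enum import Enum
from typing import Optional


class Direction(Enum):
    """Direction of the prior association between P and Q."""
    PREDICTIVE = "from P to Q"
    EXPLANATORY = "from Q to P"
    FUNCTIONAL = "both directions"
    CORRELATIVE = "no direction"


class Mode(Enum):
    """Mode of the prior association."""
    DEDUCTIVE = "deductive"
    INDUCTIVE = "inductive"


# The six non-empty cells of Table 4.1. Deductive associations are limited
# to predictive and explanatory analogies, so two cells have no entry.
TABLE_4_1 = {
    (Mode.DEDUCTIVE, Direction.PREDICTIVE): "Mathematical",
    (Mode.DEDUCTIVE, Direction.EXPLANATORY): "Abductive",
    (Mode.INDUCTIVE, Direction.PREDICTIVE): "Predictive/Probabilistic",
    (Mode.INDUCTIVE, Direction.EXPLANATORY): "Abductive/Probabilistic",
    (Mode.INDUCTIVE, Direction.FUNCTIONAL): "Functional",
    (Mode.INDUCTIVE, Direction.CORRELATIVE): "Correlative",
}


def classify(direction: Direction, mode: Mode) -> Optional[str]:
    """Return the Table 4.1 label, or None for an empty cell."""
    return TABLE_4_1.get((mode, direction))


# The four examples of this section, with modes as read off the text
# (the modes for Examples 4.3 and 4.4 are an assumption, see above).
EXAMPLES = {
    "4.1 triangles and tetrahedra": (Direction.PREDICTIVE, Mode.DEDUCTIVE),
    "4.2 spherical shells": (Direction.EXPLANATORY, Mode.DEDUCTIVE),
    "4.3 walking on stilts": (Direction.FUNCTIONAL, Mode.INDUCTIVE),
    "4.4 morphine and meperidine": (Direction.CORRELATIVE, Mode.INDUCTIVE),
}

for name, (direction, mode) in EXAMPLES.items():
    print(name, "->", classify(direction, mode))
```

A lookup table is used rather than branching logic because the classification is a finite grid; the empty cells simply have no key.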
4.3 The Articulation Model
The articulation model is based on the two fundamental principles mentioned in previous chapters. First, there must be a prior association, a clear connection, in the source domain, between the known similarities and the further similarity that is projected to hold in the target domain. As we have just seen, this connection can be logical, causal, explanatory, or a correlation. Second, there must be a potential for generalization of this association from the source domain to the target, which would support the conclusion of the analogical argument. This section explains how these principles provide an abstract characterization of plausible analogical arguments. I present the model first in a formal and then an informal version.
Formal Version
An assessment of plausibility takes place against a background of potentially relevant factors $f_1, f_2, \ldots$. These factors may be variables, assumptions, or conditions, depending on the setting. In general, given a fixed analogical argument, we define the set F of potentially relevant factors as follows.
DEFINITION: Potentially Relevant Factors
The set F of potentially relevant factors for an analogical argument is the union of the following sets:
• The set $\phi$ of all factors (other than the conclusion) that appear explicitly in the analogical argument
• All sets C of factors (other than the conclusions) that appear in other salient analogical arguments advanced in favor of the same or rival conclusions
• A set B of unstated background factors (also referred to as the background context)
Given the idealizations noted in section 4.1, we assume here that C is empty (a restriction dropped in section 4.10). That is, we are examining a single analogical argument.
We begin by supposing that our analogical argument identifies certain factors that are present or absent in the source and target domains. As always, we use * for factors in the target domain, flagging their correspondence to properties in the source domain. An overbar denotes absence, so that $\bar{f}_4$ signifies the known absence of $f_4$ from the source domain. Throughout this section, we use the following schematic example to illustrate our definitions and ideas. In the case of $f_5$ and $f_6^*$, we do not know whether a corresponding factor is present in the other domain, as shown in table 4.2.
What is the set F of potentially relevant factors for the argument? We can simplify things in light of our earlier assumption (section 4.1) that correspondences between the source and target domains are given. Wherever we have a pair of corresponding factors, we include only the first (the factor pertaining to the source domain) in $\phi$. With this simplification, the set of potentially relevant factors for our example becomes $\phi = \{f_1, f_2, f_3, f_4, f_5, f_6^*\}$, together with background factors B. The conclusion of the analogical argument is that Q* is plausible, where Q* is a proposition about the target that corresponds to Q.
An important requirement of our theory is that every analogical argument must specify a definite prior association, typically falling under one of the types noted in section 4.2. An acceptable analogical argument does not simply put forward an unstructured list, as in the preceding paragraph and as in schema (1.4.1), but rather presents the relevant factors in some definite relationship to each other.
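TABLE 4.2. Schematic example

Source Domain      Target Domain
f1                 f1*
f2                 f2*
f3                 $\bar{f}_3^{*}$
$\bar{f}_4$        f4*
f5                 ?
?                  f6*
Q

(An overbar marks a factor known to be absent; ? marks a factor about which nothing is known.)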
DEFINITION: Prior Association
The prior association is a relation, $R(f_1, \ldots, f_m, \bar{p}_1, \ldots, \bar{p}_n, Q)$, where each fi and pj belongs to j.

A prior association may incorporate factors that are present (fi) or absent ($\bar{p}_j$), although the latter are often omitted.6 To complete our artificial example, let us suppose that the prior association here takes the form R(f1, f3, f5, Q). For each type of prior association, our theory defines a canonical form, reflecting basic conventions about representation, and preconditions, reflecting basic standards for that type of association. These details pertain to the submodels; they are not part of the general theory.

Next, we consider two different ways of carving up the set j. The first employs the distinctions between positive, negative, and neutral analogies introduced in chapter 1.

DEFINITION: Similarity Partition
$j = P \cup N \cup O$, where we have:
• P is the positive analogy, consisting of all members f in j represented as belonging to both the source and target domains.7
• N is the negative analogy, consisting of all members f in j represented as belonging to one domain with $\bar{f}$ in the other.
• O is the neutral analogy, consisting of all members f of j represented as belonging to one domain with no information about whether f* belongs to the other domain.

In the example, P = {f1, f2}, N = {f3, f4} and O = {f5, f6*}. Note that there is no overlap among these sets.

Our second way of partitioning j is based on a distinction between critical and secondary relevance. The critical factors, roughly speaking, are those elements of the prior association represented as playing an essential part in the circumstances. All other factors in the prior association are secondary. In the sections to come, we provide different characterizations of this distinction for the different categories of analogical argument.

DEFINITION: Relevance Partition
$j = j_C \cup j_S \cup j_I$, where we have:
• jC is the set of critical (or critically relevant) factors.
• jS is the set of secondary (or secondarily relevant) factors.
• jI is the set of irrelevant factors, consisting of members of j that do not appear in the prior association.

6. I shall generally represent factors omitted from the source domain as $\bar{p}_1$, $\bar{p}_2$, etc., but sometimes (as in our example) use $\bar{f}_1$, $\bar{f}_2$, and so forth.
7. That is, f belongs to the source and f* to the target. Similar clarifications apply to the definitions of the negative and neutral analogy.
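The similarity partition can be traced mechanically for the schematic example of table 4.2. The following is a minimal sketch, not part of the theory itself: it assumes a hypothetical encoding in which each factor label maps to a pair (status in source, status in target), with True for present, False for known absent, and None for no information. The names TABLE_4_2 and similarity_partition are illustrative only.

```python
# Hypothetical encoding of table 4.2: label -> (source status, target status).
# True = present, False = known absent, None = no information.
TABLE_4_2 = {
    "f1": (True, True),
    "f2": (True, True),
    "f3": (True, False),
    "f4": (False, True),
    "f5": (True, None),
    "f6*": (None, True),
}

def similarity_partition(table):
    """Split factor labels into positive (P), negative (N), and neutral (O) analogy."""
    P, N, O = set(), set(), set()
    for factor, (in_source, in_target) in table.items():
        if in_source is None or in_target is None:
            O.add(factor)      # no information about one of the domains
        elif in_source and in_target:
            P.add(factor)      # represented as present in both domains
        elif in_source != in_target:
            N.add(factor)      # present in one domain, known absent in the other
        # A factor absent from both domains is left unclassified here;
        # the definitions in the text do not address that case.
    return P, N, O

P, N, O = similarity_partition(TABLE_4_2)
print(sorted(P), sorted(N), sorted(O))
```

Under this encoding the three sets agree with those stated in the text, and they are pairwise disjoint by construction, since each label is placed in at most one set.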
Often, the set jI is empty, as there is no point in including idle factors in an analogical argument. In fact, the set jI plays no further role in our theory. We speak of $j_C \cup j_S$ as the set of all relevant factors. In our example, let us suppose that jC = {f1, f3}, jS = {f5} and jI = {f2, f4, f6*}. Then the set of relevant factors is {f1, f3, f5}. We have defined these two distinctions (the similarity partition and the relevance partition) for j, the set of factors appearing explicitly in the argument. Both distinctions, however, must sometimes be extended into B, the set of background factors. I suppress this point in the above definition, but in fact, there is no loss of generality. Later in the chapter, I show how significant background factors may be incorporated into an augmented version of the prior association, and thus into an enlarged set j of explicit relevant factors. Augmentation will be an important feature of the theory for some categories of analogical argument. We need one final distinction. In cases where the prior association is a causal relationship, it may be that some of the relevant factors contribute to Q and some do not.8 We shall say that the valence of a factor is positive if it is a contributing cause for Q, negative if it is a counteracting cause for Q, and neutral if it neither contributes nor counteracts. Let j+ denote the subset of relevant factors consisting of those that are both present and have positive valence, while $j^{-}$ consists of the relevant factors that are present and have negative valence. (If the concept of valence is inapplicable, j+ is identical to j.) In our example, let us suppose that f1, f3, and f5 are all contributing factors. Using these ideas, we propose two levels of appraisal for analogical arguments, both of which implement the idea of potential for generalization. The first is a screening test for minimal or prima facie plausibility. The idea is that an argument that passes the test should be taken seriously.
As I explained in chapter 1, prima facie plausibility is a modal notion rather than a matter of degree.

(4.3.1) Prima facie Plausibility
An analogical argument meets the requirements for prima facie plausibility if:
1. Overlap. $j^{+} \cap P \neq \emptyset$ (where $\emptyset$ is the empty set).
2. No-critical-difference. $j_C \cap N = \emptyset$.

The positive (overlap) condition states that there must be some relevant (contributing) factor common to the source and target domains; in other words, that the positive analogy plays a relevant part in the prior association. The negative (no-critical-difference) condition states that no critical factors belong to the negative analogy. For analogical arguments that are prima facie plausible, there is a second set of criteria for assessing qualitative plausibility.
8. It is useful to include counteracting causal factors in “how possibly” analogical arguments, as first noted in section 3.3 and discussed further in section 4.5.
(4.3.2) Qualitative Plausibility
The plausibility of an analogical argument depends on three criteria:
1. Strength of the prior association. This will be discussed in the next section.
2. Extent of the positive analogy. An analogical argument becomes stronger when critical factors are shifted from neutral to positive analogy, and secondary factors are shifted either from negative to neutral or from neutral to positive analogy.
3. Multiple analogies. Favorable analogies support the conclusion; competing analogies may weaken or undermine it.9

Applying these ideas to our example, we see that it satisfies the overlap condition because $f_1 \in j^{+} \cap P$, but it violates the no-critical-difference condition because $f_3 \in j_C \cap N$. This analogical argument thus fails to show that its conclusion (Q*) is prima facie plausible.
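The screening test (4.3.1) can be applied step by step to the schematic example. The sketch below is an illustration only, assuming that the factor sets are encoded as Python sets of labels; the function name prima_facie_test and the variable names are hypothetical and do not come from the text.

```python
def prima_facie_test(j_plus, critical, P, N):
    """Apply the two conditions of (4.3.1) to sets of factor labels."""
    overlap = len(j_plus & P) > 0                    # j+ and P share a factor
    no_critical_difference = len(critical & N) == 0  # no critical factor lies in N
    return overlap, no_critical_difference

# Values taken from the schematic example in the text.
P = {"f1", "f2"}             # positive analogy
N = {"f3", "f4"}             # negative analogy
critical = {"f1", "f3"}      # jC
j_plus = {"f1", "f3", "f5"}  # relevant, present, positive valence

overlap, no_critical_difference = prima_facie_test(j_plus, critical, P, N)
print(overlap, no_critical_difference)  # True False
```

The argument counts as prima facie plausible only when both values are true, so the example fails on the second condition, as stated above. The qualitative criteria of (4.3.2) are not modeled here, since they are comparative rather than pass/fail.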
Informal Version: A Three-step Evaluation Procedure

I now offer a friendlier, and more clearly motivated, description of the preceding theory. To that end, two ideas are helpful. The first is the rhetorical device of advocate and critic, introduced in chapter 1. This device motivates many features of my theory. Recall that we imagine that an enthusiastic advocate presents the analogical argument to a polite but moderately skeptical critic. Introducing this framework highlights the need to balance two competing pressures at work in representing and evaluating arguments from analogy: explicitness and economy. On the one hand, the critic wants the argument to be as explicit as possible, noting every factor that might be relevant to the conclusion, since the inclusion of detail increases the chance of exposing a weakness in the argument. On the other hand, the advocate wants to be economical about what counts as relevant. After all, a successful analogical argument shows that some differences don’t matter.

The second idea is that it helps to apply the elements of the articulation model via a three-step evaluation procedure. The three steps, and their connection to the formal model, are as follows.

1. Elaborate the prior association (canonical forms and preconditions). This step consists in ensuring that the prior association meets certain standards. For each of our six categories of analogical argument, these standards are encapsulated both in a canonical form, which is a quasi-formal template for representing the prior association, and in certain preconditions that the prior association must satisfy.

2. Determine relevance (critical and secondary features). The next step is to sort out which features of the source and target domains are relevant to the conclusion of the argument, and to determine their degree of relevance.
9. In light of our restriction to individual analogical arguments, we set aside this point until section 4.10.
Remarkably, the distinction between critical and secondary relevance is adequate to account for many features of analogical arguments. In some cases, augmentation (further elaboration) of the prior association is necessary at this step.

3. Assess the potential for generalization (plausibility screening). The prospects for generalizing the prior association are evaluated by assessing both positive and negative evidence, as captured in the criteria for minimal and qualitative plausibility.

The interaction between advocate and critic sheds light on each of these steps, which merit further discussion.

STEP 1: PRIOR ASSOCIATION AND PRECONDITIONS. I propose canonical forms for representing each type of prior association, drawn from philosophical models of explanation, causation, and so forth. These forms are meant to balance the competing pressures of explicitness and economy. They should ensure sufficient precision to satisfy the critic but be flexible enough to model any reasonable analogical argument of the appropriate type. Each type of prior association has its own set of standards, which we call preconditions. For instance, if the association is deductive (as in Examples 4.1 and 4.2, above), the implication from P to Q or from Q to P should be valid. If the association is a causal explanation, the preconditions relate to matters such as possible defeating conditions. The preconditions are independent standards for that type of relationship; they have nothing to do with the analogical argument per se. An analogical argument is a nonstarter if its prior association fails to meet these preconditions. The canonical forms and preconditions, then, establish minimal standards for the prior association. Obviously, the advocate wants to represent the situation in the source domain in terms that favor the analogical argument, but these elements of our model impose reasonable constraints.
In fact, the constraints should be welcomed by the advocate, since they actually strengthen the argument. Analogical inference is a parasitical form of reasoning. The prior association in the source domain imposes an upper bound on the strength of the analogical inference: at best, we may conclude that an association of similar strength holds in the target domain.10 I call this the upper bound thesis. The canonical forms and preconditions not only ensure a viable argument but also help to maximize the potential plausibility of the conclusion. The particular canonical forms put forward in the sections to come are provisional. Different choices would lead to somewhat different evaluation criteria. The fundamental strategy of articulating the prior association in a standardized form is independent of these choices.

STEP 2: DETERMINATION OF RELEVANCE. The second step is to decide which features of our two domains are relevant in assessing the analogical argument.
10. I return to this idea of an upper bound in chapter 8.
There are two main problems. First, there is the problem of explicit relevance: which features among those explicitly represented in the argument are relevant, and to what extent? This leads to two separate concerns.
• Critic: The positive analogy might be overstated, including irrelevant similarities.
• Advocate: The negative analogy might be overstated, containing irrelevant differences.

Second, there is the problem of implicit relevance: are there additional relevant features that are not represented in the analogical argument? This again leads to two separate concerns.
• Critic: The negative analogy might be understated, suppressing important differences that are relevant to the conclusion.
• Advocate: The positive analogy might be understated, omitting relevant similarities.

These concerns about overstating or understating the similarities and differences suggest, correctly I think, that the issue is not whether our analogical argument distorts objective facts about relevance but, rather, whether it distorts what is known by the critic or the advocate to be relevant. An acceptable representation of the analogical argument must somehow reflect both parties’ knowledge of the two domains. That an analogical argument fails to take account of totally unknown, but objectively relevant, factors can have no bearing on our assessment of its plausibility.11 Bearing this in mind, my proposed solution to the problem of implicit relevance comes in two parts. First, the preconditions for the prior association impose upon the advocate a burden (but not an undue burden) of honesty in making relevant factors an explicit part of the analogical argument. Second, at this second stage of the evaluation process we may need to augment the prior association by making relevant background factors an explicit part of the argument. We shall see how this works when we come to the submodels. For the problem of explicit relevance, I again suggest a two-part solution.
First, as noted in the formal exposition, we distinguish between critical features, which play an essential part in the prior association, and secondary features, which figure in the prior association but are not essential. This distinction will be developed somewhat differently for each class of analogical argument. Second, I maintain that this problem has limited significance so long as our primary objective is to account for prima facie plausibility, rather than to measure the strength of analogical arguments.

STEP 3: ASSESSMENT OF POTENTIAL FOR GENERALIZATION. When people reason by analogy, they may be regarded as seeking a generalization that embraces the two
11. Objective relevance becomes a concern if we seek to justify analogical reasoning in reliabilist terms, a point to which I return in later chapters.
domains. As noted earlier, this view is common among mathematicians (Poincaré 1952a; Polya 1954). In legal reasoning, the ideal of generalization is reflected in the judge’s announcement of a “rule of the case” covering the situation at hand as well as the precedents taken to be most on point. My most fundamental claim is that assessing the potential for generalization captures the essential logic of analogical reasoning in all of its applications. I also maintain that our two requirements (overlap and no-critical-difference) provide an adequate test of prima facie plausibility. Even though people often put forward analogical arguments with no intention of generalizing, and even though efforts to generalize frequently (and instructively) fail, the two requirements focus our attention on the most fundamental features of the source domain and provide the best guide to formulating more specific evaluation criteria within each category of analogical argument. In each case, we shall see that assessing plausibility involves positive factors (reasons to believe that generalization is possible) and negative factors (reasons to think that generalization might be blocked). The prima facie plausibility and qualitative plausibility tests provide this type of evaluation. It is appropriate to say a little more about what I mean by generalization in the phrase, “potential for generalization.” When we speak of generalization, what come to mind are most likely categorical assertions (“All A are B”) or statements of conditional probability (Prob(B/A) = p). But as I argued in chapter 2, analogical arguments cannot always be equated with straightforward forms of generalization. Polya, writing about generalization in mathematics, makes a helpful distinction: You should not forget, however, that there are two kinds of generalizations. One is cheap and the other is valuable. It is easy to generalize by diluting; it is important to generalize by condensing . . . .
Generalization by condensing compresses into one concept of wide scope several ideas which appeared widely scattered before. (1954, 30) In a scientific setting, generalization by condensing is commonly referred to as unification. Kitcher writes: Science advances our understanding of nature by showing us how to derive descriptions of many phenomena, using the same patterns of derivation again and again, and, in demonstrating this, it teaches us how to reduce the number of types of facts we have to accept as ultimate (or brute). (1989, 432) When I speak of potential for generalization, I intend this notion of condensing or unification: finding a common pattern instantiated in both the source and target domains.
Analogical Arguments and Analogical Reasoning

I have presented the articulation model in abstract form. The next few sections fill in the details and apply the model to different kinds of analogical argument.
Still, one preliminary objection that might already have occurred to the reader is that the theory is too restrictive. That is, it leads to the rejection of too many analogical arguments. The objection here is that the no-critical-difference condition is too restrictive for prima facie plausibility. Perhaps an analogical argument can make its conclusion plausible even when there are critical differences between the source and target domains. As a point about human psychology, this observation is correct. People derive confidence from deeply flawed analogies. Nevertheless, we can respond to this objection by considering how an advocate of an analogical argument deals with alleged critical differences. When a critical disanalogy is pointed out, the response is always to try to save the argument by showing that the disanalogy is either merely apparent or less than critical. This suggests that the no-critical-difference condition really is a fundamental part of the logic of analogy. The condition might also be thought too restrictive because it can lead scientists to make unjustified assumptions in order to preserve an analogy. In a discussion of great analogies in physics, Oppenheimer (1956) points out that in the nineteenth century, scientific experience suggested that a material medium was essential for the propagation of any wave. This led physicists to postulate (by analogy) a luminiferous ether, an assumption that was something of a hindrance to progress until the advent of special relativity.12 This looks like a case where scientists were hampered by allegiance to the no-critical-difference condition: to preserve the analogy between light and other wave-like phenomena, an unwarranted assumption was made. If we attend to the history carefully, however, we see that the no-critical-difference condition could have helped, rather than impeded, investigation. It often happens that crucial tests of similarity are deferred. 
In terms of our model, this occurs when critical properties lie in the neutral analogy. While initially such analogies may seem plausible, the plausibility evaporates over time if efforts to shift these critical properties to the positive analogy fail. This phenomenon, which will be examined more fully in the discussion of the “acoustical analogy” (Example 4.7), helps us to understand the role of analogical reasoning in the history of “ether” theories. This leads to what I consider the best response to the objection. The no-critical-difference condition is just right for analogical arguments because it leads to good analogical reasoning. Analogical reasoning involves the use of an analogy to explore new territory. That means careful sculpting of an initially crude analogy, together with gradual recognition of its limitations. A good theory of analogical arguments is the keystone for a fruitful model of analogical reasoning. We want a theory that rejects careless analogical arguments and tells us why they fail. Such a theory provides guidance about how to refine an analogy (or whether to abandon it), with the ultimate objective of achieving greater understanding of the target domain. These ideas will be explored further as we turn to the next task, developing models for our six categories of analogical argument. Sections 4.4 through 4.6 provide detailed discussion, while the presentation in sections 4.7 through 4.9 is more compressed. The main argument resumes in section 4.10, with the extension of the general theory to reasoning with multiple analogies.

12. Indeed, the special theory of relativity showed that the analogy between light and sound breaks down in a number of ways. The most important is the fact that the speed of light is constant in different frames of reference, while the speed of sound is modified according to ordinary Galilean transformations.
4.4 Mathematical Analogies

In a predictive analogical argument, the known similarities tend to promote or bring about the projected similarity in the source domain. The argument depends upon the possibility of extrapolating this tendency to the new (target) situation. We discuss separately the cases where the association can be represented in logical and statistical terms. When the association is a relationship of entailment, we have a mathematical analogy. Mathematical analogies occur both in pure mathematics and in mathematical sciences. For this category of analogical argument, the general theory of the preceding section yields, at least initially, a clear and simple model. We explore additional complications, examples and philosophical issues in chapter 5. According to our theory of analogical arguments, the first step is to supply a model for the prior association: a “canonical form” together with relevant restrictions or “preconditions” on that form. Here, the prior association is a mathematical derivation. We represent it in the following simple form.

(4.4.1) Prior Association for Mathematical Analogy
$j \vdash Q$

Q is the proposition whose analog is projected to hold in the target, and j is the set of explicit assumptions used in the proof.13 (Unstated background assumptions are not part of the prior association.) The prior association is thus that Q follows from j via a mathematical derivation. The analogical argument suggests that a similar entailment relationship implies an analogous proposition (Q*) in the target domain. The precondition here is just that the proof be mathematically acceptable. It must satisfy a competent mathematician. The prior association can be any proof that meets this basic (but informal) standard.14

At this point, let us reintroduce Example 4.1. A definite example helps to fix our definitions and to see where we are headed.

Example 4.1 (triangles and tetrahedra, continued).
In this geometrical analogy, the source domain is plane geometry and the feature Q is that the three medians of a triangle are concurrent—that is, they intersect in a common point. The prior association is any proof of this fact. Each proof—or even different ways
13. The turnstile ($\vdash$) symbolizes entailment within some logical system. Statement (4.4.1) abuses this notation slightly because we do not require the prior association to be a proof in first-order logic. What is intended is simply that we have an acceptable proof.
14. In chapter 5, I introduce restrictions on the way proofs are written, but the complications are not important here.
FIGURE 4.1. Ceva’s theorem. [Diagram of triangle ABC with the points X, Y, Z, and O marked.]
of writing what is substantially the same proof—constitutes a different prior association. There are at least two ways to prove that the medians AY, BZ, and CX of triangle ABC have a point of intersection (see figure 4.1), and as we shall see, these distinct proofs make a vast difference when it comes to assessing the plausibility of the analogical argument that the four medians of a tetrahedron have an intersection point.15

First, there is a purely geometric proof that uses Ceva’s Theorem. Since AX = XB, BY = YC, and CZ = ZA (as X, Y, and Z are midpoints),

$$\frac{AX}{XB} \cdot \frac{BY}{YC} \cdot \frac{CZ}{ZA} = 1.$$

That is precisely the condition Ceva’s Theorem requires for the line segments AY, BZ, and CX to be concurrent in a point.

An alternative proof uses analytic geometry. Represent the vertices A, B, C as ordered pairs in the usual Cartesian coordinate system. Then the mid-points are

$$X = \frac{A+B}{2}, \quad Y = \frac{B+C}{2}, \quad \text{and} \quad Z = \frac{A+C}{2},$$

and the medians are the sets of points $\{(1-t)C + tX : 0 \le t \le 1\}$ and so on. The point

$$\frac{A+B+C}{3}$$

lies on each median, as can be seen by taking $t = \frac{2}{3}$. This proves that the medians are concurrent.
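The analytic proof can be checked on a concrete triangle. The sketch below is an illustration under stated assumptions, not part of the proof: the sample coordinates and the function names (midpoint, point_on_segment, medians_meet_at_centroid) are hypothetical, and exact rational arithmetic is used so that equality comparisons are reliable.

```python
from fractions import Fraction as F

def midpoint(p, q):
    """Midpoint of two points given as tuples of Fractions."""
    return tuple((a + b) / 2 for a, b in zip(p, q))

def point_on_segment(p, q, t):
    """Return (1 - t) * p + t * q, the point at parameter t from p toward q."""
    return tuple((1 - t) * a + t * b for a, b in zip(p, q))

def medians_meet_at_centroid(A, B, C):
    """Check that (A + B + C) / 3 lies on each median at parameter t = 2/3."""
    centroid = tuple((a + b + c) / 3 for a, b, c in zip(A, B, C))
    X, Y, Z = midpoint(A, B), midpoint(B, C), midpoint(A, C)
    t = F(2, 3)
    # Medians run from each vertex to the midpoint of the opposite side.
    medians = [(C, X), (A, Y), (B, Z)]
    return all(point_on_segment(v, m, t) == centroid for v, m in medians)

# Sample triangle (an arbitrary choice made for this illustration).
A = (F(0), F(0))
B = (F(4), F(0))
C = (F(1), F(3))
print(medians_meet_at_centroid(A, B, C))  # True
```

A single numerical instance does not replace the algebraic argument, which holds for arbitrary A, B, and C; it only makes the role of the parameter value 2/3 concrete.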
15. For a tetrahedron, the medians are the four lines joining the vertices to the centroids of the opposite faces.
Now let us turn to the second step in our general model, the determination of relevance. In section 4.3, I characterized critical factors as those that play an essential part in the prior association. In mathematical analogies, the critical factors are just those that are used in the proof; that is, jC = j. All factors in j are critically relevant because they are all essential to the proof as given. Applying this analysis to our example, we find that the critical factors are the following:
1. X, Y, and Z are mid-points.
2. The medians join vertices to mid-points.
3a. Ceva’s Theorem (for the first proof); or
3b. the existence of 1-parameter algebraic representations for the medians (for the second proof).
Nothing else is critical. Background assumptions do not count as critical, nor even as secondary.16 This does not mean that they are unimportant. As we shall see here and in chapter 5, analogical arguments in mathematics may fail precisely because of some disanalogy in these background assumptions. On my theory, that should not count against an initial judgment of plausibility, provided that judgment is based on a proof that is adequately explicit. It is natural to express a different worry. Some proofs are not parsimonious— they make use of facts that are not needed to derive the conclusion. An alternative approach would define the critical factors to be those that appear in a parsimonious proof. After discussing our example, I shall explain why I do not believe that this alternative approach is appropriate. The first two steps—laying out the prior association and identifying the critical factors—are carried out solely within the source domain. The third and final element of the articulation model is the plausibility principle, (4.3.1), which assesses the potential for extending the prior association to cover the target domain. The important idea, once again, is that analogies are a precursor to generalization. In mathematical analogies, to generalize is to formulate and prove a result in a setting that comprehends both the source and target domains. To treat analogies as the first step in such a process seems as reasonable in mathematics as anywhere, since most (perhaps all) successful analogies do lead to generalizations. That strategy is certainly vindicated by Example 4.1, since one can prove that the medians of n-dimensional simplexes (generalized tetrahedra) intersect for any n. But, of course, when we argue by analogy we cannot appeal to (and are often unaware of ) any such generalization. So let us turn to the test for plausibility. Combining principle (4.3.1) with the above analysis of critical relevance, we obtain the following specific form of the test:
16. The background assumptions may be set-theoretic, logical, or even arithmetic axioms. It is often obvious that they hold for the target domain as well as the source, so they do not need to be spelled out. When that is not obvious, however, they may cease to count as background assumptions. If we are applying analogical reasoning to a problem in set theory, for example, then the set-theoretic assumptions employed must be made explicit. I say more about the role of background assumptions in chapter 5.
(4.4.2) Prima facie Plausibility for Mathematical Analogies
1. Overlap. Some fact used in the source proof must belong to the positive analogy, P.
2. No-critical-difference. Nothing used in the source proof can correspond to something known to be false in the target domain. That is, no critical assumption belongs to the negative analogy, N.

If a general theorem holds in an abstract setting that includes the source and target domains as special cases, then the general proof could be adapted, step-by-step, to yield proofs in our two domains. So if, for any fact used in the proof of Q, either there is no meaningful analog in the target T or the analog is false, then the no-critical-difference condition fails. There is then no good reason to expect a generalization, and the analogical argument is not plausible. Let us apply these criteria to Example 4.1. In either version of the argument, the concepts analogous to “triangle,” “midpoint,” and “median” are “tetrahedron,” “centroid,” and “median.” So facts 1 and 2 identified above as relevant have true analogs in solid geometry by the set-up of the problem. There is no clear three-dimensional analog to Ceva’s Theorem, however, so the no-critical-difference condition fails for the first proof.17 On the other hand, the second proof passes the test. The algebraic representation for the centroids and medians of a tetrahedron is similar to that used for a triangle. For instance, if the vertices of the tetrahedron are A, B, C, and D, then the centroids are

$$\frac{A+B+C}{3}, \quad \frac{A+B+D}{3},$$

and so on, exhibiting a form and symmetry like the two-dimensional case.18 So the analogical argument is plausible. The analogical argument fails with one proof but succeeds with another, and that is consistent with our view that the particular proof employed is an integral part of the analogical argument.
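The way the algebraic representation carries over from triangles to tetrahedra, and to the n-dimensional simplexes mentioned earlier, can be illustrated numerically. The sketch below rests on assumptions of my own: the function names (mean, medians_concurrent) and the sample vertices are hypothetical, and the parameter value t = n/(n + 1) is the natural counterpart of the value 2/3 used for the triangle. It checks, in exact arithmetic, that the mean of all vertices lies on every median.

```python
from fractions import Fraction as F

def mean(points):
    """Coordinate-wise mean of a non-empty list of points, as exact Fractions."""
    n = len(points)
    dim = len(points[0])
    return tuple(F(sum(p[i] for p in points), n) for i in range(dim))

def medians_concurrent(vertices):
    """Check that the mean of all vertices lies on each median of a simplex.

    A median joins a vertex to the centroid of the opposite face. For a
    simplex with n + 1 vertices, the common point is reached at t = n / (n + 1).
    """
    n = len(vertices) - 1
    common_point = mean(vertices)
    t = F(n, n + 1)
    for k, v in enumerate(vertices):
        opposite_face = vertices[:k] + vertices[k + 1:]
        c = mean(opposite_face)
        point = tuple((1 - t) * a + t * b for a, b in zip(v, c))
        if point != common_point:
            return False
    return True

# Sample vertices (arbitrary choices made for this illustration).
triangle = [(0, 0), (4, 0), (1, 3)]
tetrahedron = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
print(medians_concurrent(triangle), medians_concurrent(tetrahedron))  # True True
```

The check succeeds for the same algebraic reason in every dimension, which is the sense in which the second proof, unlike the one based on Ceva’s Theorem, has no critical assumption in the negative analogy.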
In summary, for a mathematical analogy to satisfy our requirement of potential for generalization, everything used in the source proof must correspond to something not known to be false in the target domain. Otherwise, we have no reason to hope that there is a more general proposition of which the source and target theorems are special cases. If the overlap and no-critical-difference conditions are satisfied, then there is scope for strengthening the argument in two ways. First, we can make the argument better by moving more of the relevant properties from the neutral to the positive analogy. Second, we can make the nature of the correspondence sharper, as I shall do in chapter 5.
17. Nothing like Ceva’s Theorem comes to mind. Of course, one response to the apparent disanalogy might be to try to find (and prove) a three-dimensional analog of the theorem. 18. Lacking an account of similarity at this point, I simply assert that there is a natural correspondence here. In the terminology of chapter 5, this is a case of geometric similarity.
Let us return now to the earlier concern about idle assumptions. It might happen that an analogical argument should be rejected according to principle (4.4.2) because an assumption used in the source domain (i.e., in the proof) has no analog (or a false analog) in the target, yet that assumption is actually unnecessary and could be eliminated in a parsimonious proof. On the basis of such cases, the objection is that my account of plausibility is too strict. My first response is to repeat that distinct proofs lead to distinct analogical arguments. If the analogical argument as presented is not amenable to generalization, then it ought to be defeated. The burden rests with the advocate, who has every incentive to formulate as parsimonious a proof as possible. It is perfectly in order for the critic to single out any feature of the proof as given that does not seem to carry over to the target. But there is a more interesting response. The art of analogical reasoning in mathematics is to “see” a common pattern in two domains. Often, this means representing or reformulating a proof in a way that removes disanalogies. To this end, it is fruitful to define the critical factors relative to the actual proof, rather than relative to some ideal, parsimonious proof. The latter concept is not only difficult to define but also would, if incorporated into our theory, move it away from its practical orientation. The approach we have adopted allows for flexibility in what counts as an acceptable proof. It also provides clear guidelines about what to do when an analogy fails: either abandon it or refine the proof to eliminate the disanalogy. A very similar objection can be raised in connection with the overlap condition. That condition can be trivialized, given that we are not imposing a requirement of parsimony. To any proof $j \vdash Q$, the advocate need only add some trivial proposition B that holds for both source and target domains.
Then φ ∪ {B} ⊢ Q is still an acceptable prior association, B counts as relevant by our definition—indeed, as critically relevant—and B belongs to the positive analogy. Once again, I think it would be a mistake to respond to this objection by imposing a requirement of parsimony. Suppose we stipulate that the positive analogy must not be logically trivial. The prior association can then be manipulated so that B plays a nontrivial role.19 A better response is to fall back on the rhetorical device of advocate and critic: an intelligent critic will reject a prior association gerrymandered to satisfy the overlap condition. I resist building this into the preconditions because (as we shall see in chapter 5) many creative analogies in mathematics do involve manipulations that are hard to distinguish from specious moves. It is best to acknowledge that prior associations containing trivial additions are valid mathematical proofs, but should be rejected as the basis for an analogical argument. I will not attempt to formalize this idea here, but I return to these issues in chapter 5.
19. Replace any assumption φ used in the proof with the two assumptions B, B ⊃ φ.
BY PARALLEL REASONING
4.5 Predictive/Probabilistic Analogies
We often use analogies to make predictions even when the relations involved are causal or statistical, rather than relationships of entailment. We may have only a partial understanding of the relevant processes, there may be exceptions, or the processes may be genuinely stochastic. In each of these cases, we encounter analogical arguments that attempt to transfer the relationships in the source domain to a new setting in order to make plausible predictions. In this section, we discuss these predictive/probabilistic analogical arguments. Two examples illustrate the idea.
Example 4.5 (beta-blockers). Reasoning from analogy is the basis of all medical experimentation with animals. Any inference about the effects a drug might have on humans is analogical because it extrapolates observed results based on the similarities between humans and different animal species. The most persuasive analogies are those involving physiological processes known to be shared by humans and laboratory animals, although weaker analogies are also possible (see section 4.9). Although alternatives have become available, animal models are still regarded as indispensable for testing the efficacy, safety, and proper dosage of drugs (Lembeck 1989, 24; Goldberg and Frazier 1989, 24). When (and whether) animal testing is ethically justified is, of course, a matter of debate.
In the late 1940s, the pharmacologist R. P. Ahlquist performed experiments on several different species of animals to determine the effects of adrenaline. He discovered that adrenaline bonded chemically with cell components that he dubbed “beta-receptors.” Among other effects, the bonding stimulated cardiac demand for oxygen; to meet this demand, heart rate increased. In the following decade, J. W. Black discovered a class of drugs that worked as antagonists, bonding strongly with the beta-receptors and impeding the effects of adrenaline. 
These first beta-blockers, pronethalol and propranolol, were observed to decrease the heart rate of rabbits, dogs, and other species. Black hypothesized that the drugs decreased the heart’s demand for oxygen, and (by analogy) could be used to decrease the heart rate of humans and thus to alleviate conditions such as angina and hypertension, both of which were known to arise from an imbalance between oxygen supply and demand (Kobinger 1989, 32; Shanks 1984, 37).
Example 4.6 (life on other worlds). Since Thomas Reid first proposed the argument, analogy has played a prominent role in speculations about life on other worlds.20 John Stuart Mill (1843/1930, 89–90) discusses whether an analogical argument supports the hypothesis that there are inhabitants on the moon. Both Earth and the moon are “solid, opaque, nearly spherical substances, appearing to contain, or to have contained, active volcanoes”; both revolve about their axes; and both (according to Mill) “receive heat and light from the sun in
20. Versions of this argument are discussed in earlier chapters (sections 1.6, 2.5, and 3.3).
about the same quantity.” If this were all we knew about the moon, he continues, it would increase the probability of its having inhabitants. Of course, as Mill points out, this account is incomplete. Most important, the moon has “no atmosphere . . . , no clouds, and (it is therefore concluded) no water.” This totally destroys the argument, since as far as we know these conditions are essential for recognizable forms of life. He concludes that if life does exist on the moon, “it must be as an effect of causes totally different from those on which it depends here.” By contrast, the analogical argument that life exists or has existed on other planets, particularly Mars, is still considered viable (McKay 1993). Some researchers believe that the climate of Mars was favorable to life some 3.5 billion years ago. Outflow channels and networks of valleys suggest that water once flowed there, an impression confirmed by the Mars Rover expeditions of 2004. There is evidence that the planet once possessed a warm carbon dioxide atmosphere. There is an even better analogy between Mars and frozen lakes in Antarctica or glaciers in Greenland, where microbes have been found to thrive despite the cold. Of course, there are still important differences, such as Mars’ lack of a large moon. If tidal forces played an indispensable role in the genesis of life on Earth, then the presence of a large moon could be a critical disanalogy. Both examples concern causal relations, and in both cases the direction of causation is from P (the positive analogy) to Q (the hypothetical analogy). In Example 4.5, we have some grasp of the physiological mechanism (beta-receptors) common to humans and animals. In Example 4.6, by contrast, we have less precise knowledge of the relevant causal processes. We do not know exactly what produced life on Earth, although we believe that we know some of the crucial factors. 
Let us see how the components of the articulation model can be tailored for evaluating arguments of this type. Our treatment of mathematical analogies in the preceding section provides a point of departure, but this second category turns out to be considerably more delicate.
Prior Association and Preconditions
For this type of analogical argument, the prior association is a causal explanation. The explanation cites causal factors, some positively relevant to the conclusion and others (such as frigid Antarctic temperatures) negatively relevant. Such explanations are typically partial: they provide some of the factors that produced the effect Q, but do not pretend to offer the full story. They may or may not be formulated in probabilistic terms. We may or may not have a causal model, in the style of Pearl (2000) or Spirtes, Glymour, and Scheines (2000).
As a general model for this type of prior association, I shall make use of Humphreys’s notion of “aleatory explanation” (1981, 1989). There are numerous accounts of causal explanation, but Humphreys’s approach has the virtue of not requiring definite probability values. This is an important advantage since, as we have seen in the examples, such values are not generally provided in analogical arguments. What is more, aleatory explanations can also be used to model causal explanations that are not explicitly stochastic, such as Example 4.5.
An aleatory explanation of a particular event A takes the form “A because F, despite C”, where F is a nonempty set of contributing, or positively relevant, causal factors and C is a (possibly empty) set of counteracting, or negatively relevant, factors. Humphreys offers a simple example: “The car skidded off the road because it was traveling at an excessive rate of speed, despite the fact that the road was free of ice” (1981, 227). Contributing causes (high speed) raise the probability of the event to be explained; counteracting causes (the absence of ice) lower it. I amend this slightly to conform to the notation of section 4.3 and to accommodate the causal relevance of absent factors, whose significance will shortly become clear. I suggest that the prior association for a predictive/probabilistic analogy has the canonical form:
(4.5.1) Prior Association for Predictive/Probabilistic Analogy
Q because φ+ and none of p1, p2, . . . , despite φ−.
Here, Q is the hypothetical analogy while φ+ and φ− are, respectively, sets (or lists) of relevant contributing and counteracting causal factors present in the source domain.21 {p1, p2, . . .} is a set (or list) of factors that are absent from the source.22 Typically the positive analogy includes at least one contributing cause (a consequence of the overlap requirement). As in the preceding section, it is helpful to introduce a background context B of unstated assumptions, although B is not formally part of the prior association.
Refinements of the canonical form (4.5.1) are certainly possible. If our explanation allows us to estimate the probability of Q, this information can be included in the representation (4.5.1) as well. If we have a causal model in the style of Pearl (2000), then we can offer a more precise representation. 
Although there is scope for expanding the theory in that direction, I limit my attention to Humphreys’s model because it is both flexible and pitched at exactly the right level of detail for many analogical arguments.23
For Example 4.5, we have: beta-blockers lower heart rate in animals (Q) because (1) animals have beta-receptors; (2) heart rate depends on adrenaline bonding with the beta-receptors; and (3) beta-blockers bond with the receptors, reducing the effect of adrenaline. All three factors here are contributing causes. The positive analogy consists of items (1) and (2), explicitly identified physiological mechanisms common to humans and animals. For Example 4.6, the initial version of the prior association is that life on
21. Humphreys’s model depends upon the assumption that one can label every causal factor as either contributing or counteracting. That assumption has been criticized. For example, Hitchcock (1993) suggests that ascriptions of positive and negative relevance are always relative to other factors. The causal relevance of receiving a dose of medicine to recovery might be positive relative to getting no treatment, but negative when compared to receiving a larger dose. Even if Hitchcock is right, assertions of binary causal relevance may generally be employed as shorthand for relative (ternary) claims, and the incorporation of defeating conditions into our account of relevance (see below) will allow us to handle variable causal valences.
22. Here I depart from Humphreys, who explicitly excludes absent factors from explanations. The reason for including them will soon be made clear. I will continue to refer to the amended version as an aleatory explanation.
23. As we shall see shortly, Humphreys’s framework accommodates a kind of first-order counterfactual reasoning. More sophisticated causal models permit more sophisticated appraisal of this type of reasoning.
Earth is due to the atmosphere, water, distance from the sun, and so forth—all contributing causes. We also have a version where the explanandum (Q) is the existence of life in Antarctica, and counteracting causes are included (extreme cold, lack of liquid water). What preconditions should be satisfied by an aleatory explanation? Consider Example 4.6. There are many possible aleatory explanations for life on Earth with varying lists of contributing (and counteracting) causes. Which of these provide an acceptable starting point for an analogical argument about life on another world? The crucial question is this: How complete must these explanations be? In particular, can they omit contributing factors (present or absent) that we (advocate and/or critic) know to be relevant? The need for some type of epistemic completeness is familiar from decades of thinking about statistical explanations. In Humphreys’s approach, the completeness requirement is rather modest, but I believe that it is just what we need. Humphreys’s requirement is that no defeating condition can be known to hold. A defeating condition is a circumstance that, if it were to obtain, would convert a contributing cause to a counteracting cause (or vice versa). I extend this to any condition that nullifies a contributing or counteracting cause by rendering it irrelevant in the circumstances. Consider the following simple example: Mrs. Thompson’s garden flourished because it received plenty of sun and she watered it regularly, despite the cool weather. We know that excessive watering would destroy the garden. So the explanation is legitimate only on condition that there is not excessive watering. I adopt Humphreys’s requirement as a precondition for an acceptable prior association, as follows. (4.5.2) Completeness Condition for Predictive/Probabilistic Analogy No defeating condition for any contributing cause in the explanation of Q may be known to hold in the source domain. 
This condition requires some clarification. As already explained, aleatory explanations need not mention every contributing cause. Nor need they include, for each contributing cause mentioned, statements about all known defeating conditions. Completeness just means that for each contributing cause that is included, no defeating condition is known to obtain in the source domain. In particular, condition (4.5.2) ensures that if our aleatory explanation omits some factor known to be essential for Q, that factor may be taken for granted as present in the source domain (since any circumstance known to make the explanandum (Q ) impossible counts as a defeating condition). Let us see how this applies to Example 4.6, the argument for life on the moon. Mill’s objection may be put thus: the argument fails because the prior association deliberately suppresses features widely known to be essential to life on Earth. On my account, this analogical argument will also fail, but not at this first stage of formulating an acceptable prior association. An acceptable aleatory explanation for life on Earth may omit the presence of liquid water and an atmosphere, even though we know that both are essential requirements for life. The failure will emerge later: we will identify these factors as critical,
whether or not we include them initially in the prior association. We shall see that there is no advantage to be gained by deliberately leaving them out. Still, as stressed in section 4.3, it is important that all standards (i.e., preconditions) that apply to the prior association be independent of the analogical argument.
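The canonical form (4.5.1) and the completeness condition (4.5.2) can be pictured with a small Python sketch. This is only an illustration under stated assumptions, not part of the theory: the class name `AleatoryExplanation`, the function `satisfies_completeness`, and the encoding of causal factors as plain strings are all hypothetical choices, and the table of known defeaters stands in for the shared background knowledge of advocate and critic. The data follow the garden example given above.

```python
from dataclasses import dataclass, field


@dataclass
class AleatoryExplanation:
    """Hypothetical encoding of form (4.5.1): Q because contributing causes
    and none of the absent factors, despite counteracting causes."""
    outcome: str                                           # Q, the explanandum
    contributing: set[str]                                 # contributing causes (nonempty)
    counteracting: set[str] = field(default_factory=set)   # counteracting causes (may be empty)
    absent: set[str] = field(default_factory=set)          # factors cited as absent from the source

    def __post_init__(self) -> None:
        if not self.contributing:
            raise ValueError("at least one contributing cause is required")


def satisfies_completeness(explanation, known_defeaters, holds_in_source):
    """Condition (4.5.2): for each cited contributing cause, no known
    defeating condition is known to hold in the source domain."""
    for cause in explanation.contributing:
        if known_defeaters.get(cause, set()) & holds_in_source:
            return False
    return True


garden = AleatoryExplanation(
    outcome="garden flourished",
    contributing={"plenty of sun", "regular watering"},
    counteracting={"cool weather"},
)
# Assumed background knowledge: excessive watering defeats regular watering.
known_defeaters = {"regular watering": {"excessive watering"}}

print(satisfies_completeness(garden, known_defeaters, holds_in_source=set()))
print(satisfies_completeness(garden, known_defeaters, holds_in_source={"excessive watering"}))
```

The first call reports that the explanation is acceptable when no defeater is known to hold; the second reports that it is not once excessive watering is known to have occurred. Note that the check ranges only over the contributing causes actually cited, which mirrors the modest character of the requirement.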
Relevance
Suppose now that we have a representation of the prior association and that it satisfies the completeness condition (4.5.2). The next step is to identify the relevant factors, critical and secondary, in the source and target domains. My proposal, referring to the canonical form (4.5.1), is that the relevant factors are of three sorts:
• All identified contributing causal factors (φ+) are critical.
• All salient defeating conditions (in P) for these contributing causal factors are critical (that is, their absence is critical).
• All identified counteracting causal factors (φ−) and all nonsalient defeating conditions (in P) are secondary.
This proposal, however, applies not to the prior association but rather to its augmentation, defined in terms of salient defeating conditions that we now describe. Given a set of contributing causes, a salient defeating condition is a defeating condition for one of those causes that we (advocate or critic) judge either to be present or possibly present in the target domain. In other words, it would belong to the negative or neutral analogy if we extended these notions into the background context, B. The augmented prior association is simply the original prior association supplemented with all salient defeating conditions. It still matches the canonical form (4.5.1),
Q because φ+ and none of p1, p2, . . . , pn, despite φ−,
where φ+ and p1 through pn—henceforth summarized as P—include all salient defeating conditions for all contributing causes.24 The additional factors must be drawn out by going through the initial list of contributing causes, reflecting upon known defeating conditions, and considering whether those conditions might hold in the target domain. Nonsalient defeaters can remain in the background context. The advocate should, in principle, have no objection to the augmented version. Since the original prior association satisfies condition (4.5.2), it can accommodate the additional factors. 
In practice, though, we may expect debate as to what counts as a salient defeating condition. The shift to the augmented prior association represents a compromise. The advocate is allowed great latitude in the initial aleatory explanation, but we use the augmented version to prevent misleading suppression of relevant factors.
24. In general, we have to add both contributing causes (e.g., the presence of liquid water) and the absence of defeating conditions (e.g., no extraordinary levels of radiation).
Consider once more Example 4.6, the argument for life on the moon. If the original prior association is skimpy, the augmented version must include all identified contributing factors as well as the presence of liquid water and an atmosphere—indeed, these are all critical factors. But the new version need not mention the absence of a large black hole in the vicinity of Earth, which (given our epistemic situation) is not salient. It is natural to raise questions about this analysis of relevance. On the one hand, our theory counts all identified contributing factors as critical, even though some of them might play a rather small role in bringing about Q. On the other hand, even the augmented prior association might omit important causal factors. I shall address these concerns in the discussion section.
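The move from a prior association to its augmentation can be sketched in a few lines of Python. This is a rough illustration only: the function names, the string labels for factors, and the choice to attach all three defeaters to a single contributing cause are assumptions made for brevity, and the judgment about what is present or possibly present in the target is simply supplied as input, since the text leaves it to advocate and critic.

```python
def salient_defeaters(contributing, known_defeaters, present_or_possible_in_target):
    """Known defeaters of the cited contributing causes that are judged
    present, or possibly present, in the target domain."""
    salient = set()
    for cause in contributing:
        salient |= known_defeaters.get(cause, set()) & present_or_possible_in_target
    return salient


def augment(contributing, absent, known_defeaters, present_or_possible_in_target):
    """Return the contributing causes unchanged, together with the absent-factor
    list extended by every salient defeating condition."""
    extra = salient_defeaters(contributing, known_defeaters, present_or_possible_in_target)
    return set(contributing), set(absent) | extra


# A skimpy explanation of life on Earth (illustrative labels).
contributing = {"heat and light from the sun"}
known_defeaters = {
    "heat and light from the sun": {
        "no atmosphere",
        "no liquid water",
        "large black hole nearby",
    }
}
# Conditions judged present or possibly present on the moon (assumed input).
moon_judgments = {"no atmosphere", "no liquid water"}

_, augmented_absent = augment(contributing, set(), known_defeaters, moon_judgments)
print(sorted(augmented_absent))
```

With these inputs the augmented version requires the absence of "no atmosphere" and "no liquid water", that is, the presence of an atmosphere and of liquid water, while the black hole stays in the background context because nobody judges it possibly present in the target.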
Potential for Generalization
The final step is plausibility screening, which assesses the potential for generalizing the (augmented) prior association. As a preliminary question, what sort of generalization might result if an analogy of this type is successful? One natural objective of inquiry is to identify appropriate causal contexts (reference classes) and, relative to those contexts, to determine the values of important conditional probabilities.25 The relevant “generalization” toward which predictive/probabilistic analogical arguments are directed may thus be a conditional probability statement or (more generally) a probability distribution. A more ambitious objective might be to elucidate a common causal mechanism (or set of mechanisms) at work in both domains, and in this sense, too, predictive/probabilistic analogies can pave the way for generalization.
Consider our two examples. In the case of Example 4.5 (beta-blockers), our preliminary goal might be to find out about the probability of reduced heart rate, given various doses of the medication. We might also have the more ambitious aim of understanding the mechanisms, common to humans and laboratory animals, by which beta-blockers halt the effects of adrenaline. Similarly, in Example 4.6, our immediate concern may be to show that the probability for life, given certain conditions (common to Earth and ancient Mars), is nonnegligible, while our larger objective may be to deepen our understanding of the conditions under which life originates.
The analysis of section 4.3, combined with the preceding analysis of critical relevance, provides specific criteria of plausibility for predictive/probabilistic analogies. Since the critical factors are contributing causes plus salient defeating conditions, we have the following test.
(4.5.3) Prima facie Plausibility for Predictive/Probabilistic Analogies
1. Overlap. Some contributing causal factor (in φ+) must belong to the positive analogy, P.
25. In the deterministic case, the objective is to find the causal contexts within which the probabilities are 1 or 0.
2. No-critical-difference. Each identified contributing causal factor (in φ+) must not be known to be absent in the target, and each salient defeater (in P) must not be known to be present in the target.
The rationale for this test in terms of the preceding discussion of generalization is as follows. If the source domain lacked any of the contributing factors in φ+, or if any defeating condition in P were to obtain, that would vitiate the particular aleatory explanation. So if analogs of these conditions are known to fail in the target domain, there is no potential for generalization—we will have failed to make it plausible that there is a causal context or mechanism common to the source and target domains.
To illustrate these ideas, consider Example 4.6 once more. A number of distinct analogical arguments are contained in this example, depending upon how we formulate the prior association and also depending upon whether our target is the moon or Mars. These choices affect the plausibility of the argument. In the first place, the advocate is free to adopt a more or less detailed explanation of the origin of life on Earth, including or omitting Earth’s history of volcanic activity and its large moon. Both of these factors, as Mill notes, may have contributed to the existence of life on Earth, but they count as critical only if explicitly identified as contributing causes. By contrast, the presence of an atmosphere and of liquid water on Earth count as critical if their presence (on the target planet or moon) is in doubt. They become part of the augmented prior association no matter what. If the target object is the moon, it makes no difference whether we adopt the more or less detailed explanation for life on Earth. Our test leads to the unsurprising rejection of the analogical argument, since the moon is known to lack an atmosphere and liquid water. 
By contrast, if the target object is ancient Mars, the verdict of our analogical argument is sensitive to exactly which critical factors are included. The less detailed version, which omits all mention of Earth’s large moon, passes the test for prima facie plausibility as all critical factors belong to the neutral or positive analogy. By contrast, if our prior association mentions tidal forces (and Earth’s large moon), the argument fails. The beta-blocker argument, Example 4.5, passes the plausibility test and, indeed, appears to be highly persuasive for several reasons. The action of the drugs is well understood for the case of the animal model. The relevant physiological processes have analogs in humans, so that all contributing causes belong to the positive analogy. There do not appear to be any defeating conditions obviously present in humans. Although we have not yet addressed the issue of multiple analogies, the fact that the same effects are produced in multiple species greatly strengthens the argument. It appears that the analogy could only fail because of some previously unknown fact about humans—for instance, the existence of a protein that neutralizes beta-blockers. The criteria for plausibility are eminently satisfied.
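The two clauses of test (4.5.3) can be written out as a short Python sketch. It is a simplified illustration, not a definitive rendering: factors are plain strings, each factor's standing in the target is one of three assumed labels ("present", "absent", "unknown"), essential conditions such as an atmosphere are entered directly as contributing causes of the augmented explanation, and the status assignments for ancient Mars and the moon are illustrative inputs rather than established facts.

```python
PRESENT, ABSENT, UNKNOWN = "present", "absent", "unknown"


def prima_facie_plausible(contributing, salient_defeaters, target):
    """Apply the overlap and no-critical-difference clauses of (4.5.3).
    `target` maps a factor to its known status in the target domain;
    factors not listed are treated as unknown (neutral analogy)."""
    def status(factor):
        return target.get(factor, UNKNOWN)

    # 1. Overlap: some contributing cause belongs to the positive analogy.
    overlap = any(status(c) == PRESENT for c in contributing)
    # 2. No critical difference: no contributing cause known absent,
    #    and no salient defeater known present, in the target.
    no_critical_difference = (
        all(status(c) != ABSENT for c in contributing)
        and all(status(d) != PRESENT for d in salient_defeaters)
    )
    return overlap and no_critical_difference


basic = {"atmosphere", "liquid water", "heat and light from the sun"}
detailed = basic | {"large moon (tidal forces)"}

ancient_mars = {
    "atmosphere": PRESENT,
    "liquid water": PRESENT,
    "large moon (tidal forces)": ABSENT,
}
moon = {
    "atmosphere": ABSENT,
    "liquid water": ABSENT,
    "heat and light from the sun": PRESENT,
}

print(prima_facie_plausible(basic, set(), ancient_mars))     # less detailed version, Mars
print(prima_facie_plausible(detailed, set(), ancient_mars))  # version citing tidal forces
print(prima_facie_plausible(basic, set(), moon))             # Mill's lunar argument
```

Under these assumed inputs only the first call succeeds, which matches the verdicts reached in the text: the less detailed Earth-Mars argument passes, the version that cites tidal forces fails, and the lunar argument fails whichever explanation is used.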
Discussion
In this section, I first review some objections to our test (4.5.3) for plausibility, and then I discuss criteria for assessing qualitative plausibility.
To begin with, the analysis we have provided of the Earth-Mars analogy reveals an important general feature of our theory: the initial representation of an analogical argument involves trade-offs. The more contributing causes we include in the explanation, the stronger the association and the greater the potential for a strong analogical argument, according to the upper bound thesis of section 4.3. Yet each contributing cause listed counts as critical and thus increases the likelihood of finding a disanalogy between the source and the target domains. I think that this is exactly right, but it does appear to have puzzling implications.
One of these puzzles can be put in the form of an objection. The articulation model appears to reward ignorance and to encourage deception. Out of ignorance or a deliberate intent to mislead, we might omit from our initial causal explanation (i.e., the prior association) some important factor that is in fact a critical disanalogy—one that defeats the analogical argument. Let us treat the two cases (ignorance and deception) separately.
I have already suggested that if we omit an important factor out of ignorance—ignorance shared by advocate and critic—then there is no problem with our theory’s finding an analogical argument plausible. That verdict is relative to our epistemic situation. Suppose it turned out that, for humans, beta-receptors had a much greater affinity for adrenaline than for chemicals such as pronethalol and propranolol, so that the analogy of Example 4.5 ultimately failed. We would have had no way to know of this difference in advance, and the original analogical argument would have been justified. Similarly, with reference to Example 4.6, suppose it turned out that the value of the gravitational constant played an important role in the origin of life on Earth. 
We might know in advance that Earth’s gravitational constant has a value roughly 2.5 times as great as that of Mars but have no idea that this fact is relevant. Here again, if everything else points to a plausible analogy, then the hypothesis that Mars supports or once supported life is worth pursuing. In fact, following such an analogy is never a waste of time since it may reveal a previously unknown relevant difference. Rather than rewarding ignorance, the analogy’s initial plausibility helps to remove it. The case of deception, where a difference known to be important is deliberately concealed, seems to be more serious. One remedy is to evaluate such arguments in an openly adversarial setting such as a court of law, where we expect advocates to downplay characteristics that would distinguish a favorable precedent. To some extent, peer review and debate play a similar role in the scientific arena. But there is no need to bring in external remedies to answer the objection. The most direct response is that even if much is held back, our theory brings to light any concealed difference (salient defeater) that is known to be essential in the circumstances. If the resulting analogical argument passes the test, it still has some plausibility. As a further response, the problem of deception largely disappears when we move beyond individual analogical arguments to the broader context of analogical reasoning. Analogical reasoning makes use of multiple versions of an analogical argument to gauge the plausibility of the conclusion. We discover that the Earth-Mars analogy succeeds if we omit the contribution of tidal forces and fails
if we include them. We conclude that the analogy with Earth provides only limited plausibility for life on ancient Mars.
Let us turn now to a second objection, parallel to one that occurred in our discussion of mathematical analogies: our theory is too strict about predictive/probabilistic analogies. We take every identified contributing factor to be critical and reject an argument if even one of those factors belongs to the negative analogy. Yet it may be that the problematic factor has almost negligible importance. My response to this objection is virtually the same as the response given to the parallel objection for mathematical analogies. First, judgments of plausibility vary with the prior association. Any factor cited is fair game for criticism. Second, taking this strict line about critical disanalogies is fruitful in that it helps us both to refine a failed analogical argument and to gauge the plausibility of the conclusion.
Let us now move beyond our test for prima facie plausibility to consider how predictive/probabilistic analogical arguments may be strengthened or weakened, according to the criteria for qualitative plausibility outlined in section 4.3. To begin with, we identified several reasons for the strength of Example 4.5, the beta-blocker analogy. By comparison, when we apply those reasons to the analogical argument for life on ancient Mars, it appears relatively weak. In the first place, the origins of life are something of a mystery. Numerous theories have been proposed, with varying degrees of plausibility.26 Second, as noted already, some of these theories rely on causal processes which had no analog on ancient Mars. If the first cells evolved in tidal pools, as some theories maintain, then the absence of large tidal effects on Mars would count as a critical difference. Third, there is only one favorable source domain: Earth. Finally, certain features that seem to be part of almost any possible explanation fail to have analogs on Mars. 
On Earth, the origin of life required the sustained presence of warm temperatures, liquid water, and an atmosphere. In a relatively short period of time, Mars appears to have become cold and dry, and lost most of its atmosphere. Life might not have had time to emerge even if conditions were initially favorable. The plausibility of the Earth-Mars analogy turns on this condition. Defenders of this analogy have not yet shown that the characteristics believed to be essential for life persisted on Mars for sufficiently long to permit its development. At best, one can maintain that at present these factors belong to the neutral analogy, so that the hypothesis of life on Mars is still (weakly) plausible.
Guided by this discussion of the two examples, I suggest that there are five important determinants of plausibility for this type of analogical argument.
1. Strength of the prior association. Is the aleatory explanation well understood and well-confirmed by evidence? How detailed is it? Are the proposed causal processes based on observation or largely hypothetical? Are there plausible alternative explanations? In accordance with the upper bound thesis, a stronger prior association makes for a stronger analogical argument.27
2. Extent of correspondence. The contributing causes and defeaters in the proposed account are the critical factors; counteracting causes are secondary. To what extent do both types of relevant factors belong to the positive analogy?
3. Multiple favorable analogs. Source domains in which similar causal accounts can be given strengthen the analogy.
4. Multiple competing analogs. Source domains in which the analog to Q fails to hold tend to weaken the analogy. In fact, such domains can be seen as providing defeaters for the contributing causes (as will become clear in section 4.10).
5. Counteracting causes. These played no role in our discussion of prima facie plausibility, but they are important when it comes to assessing the strength of an analogical argument. If there are known counteracting causal factors in the target, an analogical argument is strengthened if we can formulate the prior association so as to incorporate them into the neutral analogy or, better still, the positive analogy. In other words, we need to find a source domain in which the effect was produced despite these factors, demonstrating that they are not defeating conditions.
26. Some of these theories are discussed in Cairns-Smith (1985).
To clarify this last point, a large overlap between the positive analogy, P, and the set of counteracting causal factors φ− can strengthen an analogy, provided our objective is merely to establish a weak conditional probability claim (showing that something is possible). Since Q holds in spite of reasons to think it might not, opponents cannot cite such reasons as decisive against the plausibility of Q*. Those factors are evidently not defeaters. This is illustrated in Example 4.6: there is life in Antarctica because of an atmosphere and water, despite extreme cold and dry conditions. By building in the counteracting causes, we make the argument more plausible.
We should assess predictive/probabilistic analogical arguments in all of these respects. For the strongest analogical arguments in this category (such as Example 4.5), there is only one plausible causal account of Q; critical and secondary causes belong to the positive analogy P; there are multiple supporting source domains; and no analogies supporting a contrary conclusion. The weakest acceptable analogical arguments only just satisfy the overlap and no-critical-difference conditions: contributing causes and their defeaters belong to the neutral analogy (with at least some overlap with the positive analogy). Such arguments become more persuasive as critical factors are shifted to the positive analogy.
27. This issue does not arise for mathematical analogies where the prior association always gives us an entailment relation (which is maximally strong).
4.6 Abductive Analogies
It is a mistake to treat all analogical inferences as predictive, as if it were always a matter of accumulating enough similar causes to justify thinking that corresponding effects will be produced in the source and target domains. We have to evaluate analogical arguments differently when the direction of the prior association is reversed. If Q explains P (the positive analogy), then the analogical argument aims to show that a similar hypothesis Q* is plausible because it would explain a similar phenomenon P*. Justification in such a case is, roughly, a matter of accumulating enough similar effects to support the inference to similar causes. I distinguish two cases, abductive and abductive/probabilistic, according to whether the relationship is characterized as entailment or in probabilistic terms. We begin with the former category.
Examples, Prior Association, and Preconditions
In an abductive analogy, the prior association is a relationship in which some observable result E is mathematically derived from a hypothesis Q together with other assumptions. Let us represent the other assumptions that appear explicitly in the derivation as C.28
(4.6.1) Prior Association for Abductive Analogy
C, Q ⊢ E.
Informally, we say that E follows from Q.29 Typically, all or part of E lies in the positive analogy, and Q plays a nontrivial role in the derivation. The additional assumptions C include boundary conditions and auxiliary hypotheses (restrictions, idealizations, and so forth). The positive analogy may extend to some of these additional assumptions as well. Based on a similar relationship in the target domain, the objective of the analogical argument is to convey plausibility upon an explanatory hypothesis Q*.
28. As usual, we suppose a background context B of unstated assumptions.
29. As in section 4.4, we use the turnstile to signify an acceptable derivation, but not necessarily a valid proof in first-order logic.
To illustrate these concepts, let us review Example 4.2 and introduce a further example.
Example 4.2 (spherical shells, continued). In Priestley’s analogy, the source domain S is a spherical shell of uniform density and the target T is a hollow charged sphere. The (accepted) law of gravitation (Q) explains the observed absence of force inside a shell (E), provided we assume uniform density (C). The prior association here is just the derivation of E from Q and C. The point of the argument is to show that a similar hypothesis about electrical force would explain a similar consequence: the absence of force inside a shell of uniformly distributed charge. In this case, the positive analogy is extensive. It includes the
observed facts (E ), some of the initial conditions (hollow sphere), and the main auxiliary hypothesis (uniform density). All have analogs in the target domain. Example 4.7 (acoustical analogy). An acoustical analogy was employed by many nineteenth-century physicists investigating the spectral lines of different elements. The physicist G. G. Stokes was the first to suggest explicitly that, apart from the fact that light was a transverse vibration and required a special medium (the ether), spectral phenomena were “completely analogous to the acoustical situation, with atoms (and/or molecules) serving as oscillators originating or absorbing the vibrations in the manner of resonant tuning forks” (Maier 1981, 51). Just as the production of a note on a stringed instrument or tuning fork leads to harmonic overtones, “would it not be reasonable to expect the discovery of corresponding harmonic overtone relationships between the lines of the spectra?” The similarities between sound and light appeared to lend the conjecture some plausibility. In a number of cases, it proved possible to relate the frequencies of visible lines in ratios of whole numbers, although there were often unexplained gaps. For instance, G. J. Stoney in 1871 found that the first, second, and fourth spectral lines of hydrogen had frequencies in the ratio of the numbers 20, 27, and 32. When it proved impossible to resolve all the lines into a fundamental frequency and its overtones, physicists tried to group the spectra into different series, each consisting of harmonics of a different fundamental vibration. This, too, failed. An additional difficulty was that the members of each series converged to a limit, whereas there was “no known case in acoustics where frequency overtones did converge to an upper limit” (Maier 1981, 131). Nevertheless, efforts persisted to make the analogy work because, as Maier points out, in the early stages of spectral analysis there was really no alternative model. 
The acoustical analogy was not abandoned until after Balmer, in 1885, showed that the formula for the successive lines of hydrogen obeyed a totally different mathematical relationship: λ = hm²/(m² − n²), where h is a constant, n = 2, and m = 3, 4, 5, . . .
In this analogy, the prior association is the standard mathematical derivation of harmonic vibration frequencies for a fixed string.30 In the derivation, the initial conditions are that the string is fixed at both ends, has constant (positive) tension, and has constant mass density. The most important auxiliary assumptions are that there are only relatively small transverse displacements of the string and that tension is the only appreciable net force on any element of the string after it is released.31
The prior association for this type of analogy, then, is a derivation of certain observable facts, represented as E, in the source domain. There are two simple preconditions that this relationship should satisfy.
(4.6.2) Preconditions for Abductive Analogy
1. The derivation must be valid.
2. The additional assumptions must be justified.
30. Or tuning fork, but we will settle on the vibrating string as the source domain. 31. Derivations can be found in many physics textbooks, e.g., Halliday and Resnick (1981).
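The contrast between Balmer's relationship and the harmonic picture can be checked with a little arithmetic. The following is only an illustrative sketch: the numerical value of the constant h (taken here as 364.56 nm) is an assumption not given in the text, and the function names are hypothetical. It evaluates λ = hm²/(m² − n²) for n = 2, shows that the wavelengths approach h as m grows (the convergence to a limit that acoustical overtones lack), and recovers Stoney's whole-number ratio for the first, second, and fourth lines (m = 3, 4, 6) from the same formula.

```python
import math
from fractions import Fraction

# Assumption: Balmer's constant h in nanometres; the text does not give a value.
H_NM = 364.56


def balmer_wavelength_nm(m: int, n: int = 2) -> float:
    """Wavelength from the relation lambda = h * m^2 / (m^2 - n^2)."""
    return H_NM * m * m / (m * m - n * n)


def relative_frequency(m: int, n: int = 2) -> Fraction:
    """Frequency is proportional to 1/lambda, hence to (m^2 - n^2) / m^2."""
    return Fraction(m * m - n * n, m * m)


def whole_number_ratio(ms: tuple[int, ...]) -> list[int]:
    """Scale the relative frequencies of the chosen lines to whole numbers."""
    freqs = [relative_frequency(m) for m in ms]
    scale = 1
    for f in freqs:
        scale = math.lcm(scale, f.denominator)  # least common denominator
    return [int(f * scale) for f in freqs]


if __name__ == "__main__":
    for m in range(3, 7):
        print(f"m = {m}: {balmer_wavelength_nm(m):.2f} nm")
    # Wavelengths shrink toward h; ideal string overtones k * f1 have no such limit.
    print(f"m = 1000: {balmer_wavelength_nm(1000):.3f} nm (limit {H_NM} nm)")
    print("first, second, fourth lines:", whole_number_ratio((3, 4, 6)))
```

On this reading, the whole-number ratios that encouraged the acoustical analogy are a by-product of Balmer's relationship rather than evidence of a common fundamental vibration.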
Every empirical hypothesis requires simplifications and idealizations for derivations to be possible. No real string has uniform density and tension, but variations may be negligible. It can be difficult to identify such idealizations, let alone to say when they are justified, and thus it can be difficult to determine whether the derivation required as part of the prior association (4.6.1) is valid. I shall not attempt to address these matters.32 Instead, I simply assume that advocate and critic can reach agreement about whether a particular argument meets the conditions in (4.6.2).
Now we come to a marked contrast between explanatory analogies and the mathematical analogies of section 4.4. Both types of analogy involve the formulation of a hypothetical derivation in the target domain, but in the case of an explanatory analogy we also need to establish that this derivation is valid. In other words, we must meet, in the target domain, the following additional condition:
(4.6.3) Precondition on the Target Association
C*, Q* ⊢ E* (valid derivation)
Here C* represents boundary conditions and auxiliary hypotheses that apply to the target domain, while E* represents an observable result. The target association must satisfy its own version of condition (4.6.2), part 1. There is no such precondition for predictive analogies because there the conclusion of the analogical argument is that it is plausible that such an entailment holds. By contrast, for explanatory analogies, the fact that an analogous entailment holds is part of the foundation of the analogical argument.
In our two examples, the derivation in the target domain is provided, at least in principle. Priestley was aware that an inverse-square law of electrostatic attraction would entail the observed absence of force inside the shell. Similarly, Stoney used the acoustical model to derive at least some of the observed spectral lines, though in the end the model could not account for all of them.
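The target derivation in the spherical-shell example can be illustrated numerically. The sketch below is not Priestley's argument; it is a rough check, under stated assumptions, that an inverse-square force law yields no net force inside a uniform shell while other power laws do not. The function name, the unit shell radius, and the midpoint-rule integration are all choices made here for illustration. The shell is sliced into rings indexed by u = cos θ; each ring contributes an axial force proportional to (u − a)/d^(p+1), where d is the distance from the ring to a test body at distance a from the center.

```python
def net_axial_force(p: float, a: float, steps: int = 200_000) -> float:
    """Net axial force, up to a positive constant, on a test body at distance
    a < 1 from the centre of a uniform unit shell, for a force law 1/d**p."""
    total = 0.0
    du = 2.0 / steps
    for i in range(steps):
        u = -1.0 + (i + 0.5) * du          # u = cos(theta), midpoint of slice
        d2 = 1.0 + a * a - 2.0 * a * u     # squared distance to the ring at u
        # axial component (u - a)/d times magnitude 1/d**p
        total += (u - a) / d2 ** ((p + 1) / 2) * du
    return total


if __name__ == "__main__":
    for p in (1.0, 2.0, 3.0):
        print(f"p = {p}: net force at a = 0.5 is {net_axial_force(p, 0.5):+.6f}")
```

Only the exponent p = 2 gives a result that vanishes (to within integration error) at an interior point, which is why the observed absence of force inside a charged shell counts as a derivable consequence E* of the hypothesis Q*.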
32. Regarding idealizations, see Sklar (2000) and Batterman (2005).
The existence of this precondition for the target association is puzzling because it seems at first to make the analogical argument dispensable. If we have a derivation of E* from Q* (together with C*), then it appears that Q* becomes plausible on grounds that pertain solely to the target domain. Indeed, according to a crude formulation of the hypothetico-deductive (H-D) model of confirmation, any hypothesis receives confirmation if it entails some result that is actually observed. If we know that Q* meets this condition, then what purpose does the analogy serve?
Part of the solution to this puzzle is to recall that even on the H-D model of confirmation, we need to justify the additional assumptions C* before concluding anything. A better response, however, is that we should not embrace this view of confirmation. One familiar shortcoming of the H-D model is that it fails to distinguish seriously proposed from ad hoc hypotheses. Many hypotheses that are confirmed on a narrowly construed H-D model are rightly ignored as ad hoc or spurious. As we saw in chapter 1, nineteenth-century philosophers and scientists tried to solve this problem by imposing additional requirements for confirmation. The problem also vanishes if we move to a Bayesian framework. In order to be
confirmed, a hypothesis must be independently plausible, which can be interpreted in terms of its prior probability (see section 1.9). The point, then, is that explanatory analogical arguments provide an important way to distinguish serious hypotheses from frivolous ones.33 Ideally, analogical arguments provide us with something like an estimate of the prior probability of the hypothesis Q*, taking the evidence E* into account, though we shall not pursue this point here.34
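The role of the prior probability can be made explicit with Bayes' theorem. The following is a standard identity rather than anything specific to this account; as a simplifying assumption, the additional assumptions C* and the background are suppressed, and Q* is taken to entail E*.

```latex
P(Q^{*} \mid E^{*})
  \;=\; \frac{P(E^{*} \mid Q^{*})\, P(Q^{*})}{P(E^{*})}
  \;=\; \frac{P(Q^{*})}{P(E^{*})}
  \qquad \text{when } P(E^{*} \mid Q^{*}) = 1 .
```

When the hypothesis entails the evidence, the posterior is simply the prior scaled by 1/P(E*). An ad hoc hypothesis with a negligible prior therefore remains implausible even though it entails E*, which is where an analogical argument bearing on the prior can do its work.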
Relevance
The next step is to determine which information is relevant. In keeping with our general theory, the set of potentially relevant factors φ includes everything explicitly used in the prior association (except Q), which means the derived result, E, together with the boundary conditions and any auxiliary hypotheses, C. But it turns out that the plausibility of analogical arguments of this type depends both on a careful comparison between E and E* and on comparisons involving a broader class of observable consequences. Just as in our analysis of predictive/probabilistic analogies, these additional critical factors require us to augment the prior association. It is convenient to identify three classes of critically relevant factors:
1. All identified additional assumptions C are critically relevant, but (as it turns out) of limited importance.
2. The observable results E in the prior association are critically relevant.
3. Factors in scope(Q), the set of salient observable consequences of Q, are critically relevant.
If we augment the prior association by expanding E to include scope(Q) (to be defined shortly), then the third item merges with the second.
First, let me defend the claim that the set C has limited importance in evaluating this type of analogical argument. As a preliminary observation, criticism of these analogical arguments tends to focus on failures of correspondence between observable consequences rather than on discrepancies about boundary conditions or auxiliary hypotheses. That is because abductive analogical arguments are generally employed in settings where we need to be highly inventive, as our two examples illustrate. This makes it appropriate to be especially open-minded about what might correspond to the assumptions in C, that is, to allow great latitude in devising mechanisms analogous to those of the source domain. This latitude extends to the idealizations and boundary conditions that accompany a novel theoretical hypothesis.
Stoney’s idea is that something like a tuning fork or vibrating string determines the frequency of atomic spectra, and
33. That is, a hypothesis supported by a plausible analogy with a well-confirmed hypothesis is prima facie serious. This is not to say that every seriously proposed hypothesis must be analogous to some well-confirmed hypothesis, although some people have maintained this position. 34. This idea is developed in chapter 8.
we should not be too picky about what corresponds to string tension or fixed endpoints. The upshot is that we should allow the analogous assumptions, C*, to be stated in somewhat imprecise terms. Given this indeterminacy, the constraint that they should belong to the neutral or positive analogy (as implied by the no-critical-difference condition) is easily satisfied. They still count as critical, but they do not play a major role in evaluating the analogical argument.
This constitutes a major difference from the way we treat such assumptions in a predictive analogical argument. With predictive analogical arguments, every assumption used to make a derivation go through or to explain some effect plays a critical part because the point is to show that the same (or a similar) effect will be produced in the target. But in the case of explanatory analogical arguments, we already know that similar effects have been produced in the source and target domains. There is little need to find a close correspondence between the background assumptions C and C*.
The most significant role in evaluating abductive analogical arguments is played by the observable consequences of Q and of Q*.
(4.6.4) Observational scope of Q: scope(Q)
Scope(Q) consists of E and every other salient observable consequence of Q together with C.
The notion of “consequence” here depends not only on C but also on the unstated background assumptions B. The intention, however, is to rule out trivial consequences, such as those that follow from B alone; hence the reference to salient observable consequences. Definition (4.6.4) is (unavoidably) imprecise, but the examples of the next section should help to fix the concept. Close correspondence between observational consequences provides the crucial test of potential for generalization.
If an observed consequence of Q + C fails to correspond closely to any analogous effect in the target, or has an analog that conflicts with what is actually observed in the target, the possibility of generalization is blocked.35 To appreciate this point, we formulate the final step in the model and then return to our two examples.
Potential for Generalization
As usual, we begin by considering what sort of generalization might follow upon a successful abductive analogy. Our two examples suggest that the answer is a theoretical formalism that unifies the two domains. For instance, the absence of force inside a spherical shell of uniform density follows for any inverse-square law force, a result that belongs to a general theory of potentials. The existence of
35. Not every observation about the source domain requires an analog in the target. A spherical shell of uniform mass density does not repel any other body, but the analogous observational claim about an electrostatically charged shell fails if an object of opposite charge is nearby. The intention in restricting attention to scope(Q) is to block the derivation of these irrelevant consequences, for a suitable choice of C. For example, the observation about the lack of repulsion depends upon additional assumptions not in C.
discrete vibration frequencies that are integral multiples of a fundamental frequency follows for any standing wave. The historical fates of our two examples illustrate practical ways to test for the possibility of generalization. Scientists found Priestley’s argument highly convincing. For one thing, gravitation was the only other known force acting at a distance, so the choice of analogs was limited. Second, nearly every relevant fact about the gravitational case corresponded perfectly to an analogous fact for the electrostatic set-up: the observable consequences, most of the boundary conditions describing the set-up, and the auxiliary hypotheses. Important differences between electricity and gravity were known, such as the fact that electrical force can be repulsive as well as attractive, but these differences were not seen as relevant. On the current proposal, such a difference truly is irrelevant because it plays no role in the prior association. The argument, in Example 4.7, that spectral lines were produced by a type of standing-wave oscillation was more critically received—even though, as in the gravity-electricity analogy, there was no serious alternative for explaining discrete frequencies. The objections stemmed mainly from failures of correspondence in the observable consequences of the two domains (i.e., in what I have called scope (Q )). There should be an observable harmonic for each integer multiple of a fundamental frequency, as in the source domain. In the case of the target domain, however, there were both too few and too many observed spectral lines: too few because successive lines could not be put into correspondence with successive integers, and too many because, for many elements, no harmonic series could be found at all. In addition, the fact that each series of spectral lines converged to a limit did not correspond to the acoustical situation where there was nothing remotely like a limit in the sequence of overtones. 
In terms of our model, the analogical argument succeeded in Example 4.2 (spherical shells) and failed in Example 4.7 (acoustical analogy) because, in the latter case, people realized that factors in scope(Q) belonged to the negative analogy.
These remarks show that we cannot take a static approach to evaluating analogical arguments. Initially, when just a few spectral lines had been recorded, the acoustical analogy must have appeared plausible to many people. The “gaps” in the evidence could be attributed to an incomplete observation record. Over a period of time, however, the required series of spectral lines were not found. Nor did scientists find source analogs of the convergence that occurred in the target domain, signaling another deep disanalogy. At this point, continued adherence to the acoustical model could not be justified by the original analogical argument. In fairness, however, the decision about when to shift a factor from the neutral to the negative analogy is not always obvious.
We summarize with the following test for plausibility, which combines the general theory of section 4.3 with our account of critical relevance.
(4.6.5) Prima facie Plausibility for Abductive Analogies
1. Overlap. Some factor in E must belong to the positive analogy, P.
2. No-critical-difference. First, factors in C must not belong to the negative analogy. Second, E and more generally the elements of scope(Q) must not
belong to the negative analogy; furthermore, investigation must eventually show that elements of this set can be shifted to the positive analogy. In summary, the evaluation of plausibility for abductive analogical arguments depends upon three things: 1. The validity of the derivations in the source and target domains 2. The reasonableness of the additional assumptions in both domains 3. The fact that no critical features (C or scope(Q)) belong to the negative analogy. In assessing the qualitative plausibility of an abductive analogy, we turn to the three factors identified in section 4.3. First, as always, we must consider whether critical factors belong to the neutral or the positive analogy. Thus, Priestley’s argument is particularly strong in light of the almost exact correspondence of the initial set-up and auxiliary hypotheses. These critical factors are all known to belong to the positive analogy. This was definitely not the case with the acoustical analogy, and as noted above, the situation did not improve over time. Second, the strength of the prior association plays an important part in assessing plausibility. The strength of the prior association is measured in terms of the degree to which E confirms Q. Without going into any detailed discussion of how to measure degree of confirmation, let us single out one factor for special attention: the degree to which E would be expected if Q were not the case. In general, the more precise the statement of E, the greater the degree to which it confirms Q and the stronger the analogical argument becomes. Priestley’s argument fares well because it is based on a precise analogy between observed effects. The acoustical analogy also fares well on this score since the observed effect is stated in precise terms (discrete frequencies that are integral multiples of a fundamental frequency). The existence of multiple analogies is of limited importance for abductive analogical arguments. 
That is because, as noted earlier, explanatory analogies are often used to advance a relatively new field in which it is difficult to come up with any clear hypothesis at all, let alone multiple analogies for competing hypotheses.
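The two conditions of test (4.6.5) lend themselves to a simple bookkeeping sketch. The code below is only an illustration under stated assumptions: the class, its field names, and the factor labels assigned to the two examples are hypothetical conveniences, not part of the model as stated, and the requirement that investigation eventually shift elements of scope(Q) to the positive analogy is deliberately left out because it concerns inquiry over time rather than a single snapshot.

```python
from dataclasses import dataclass, field


@dataclass
class AbductiveAnalogy:
    """Hypothetical snapshot of an abductive analogy; all names are illustrative."""
    evidence: set[str]                                   # E
    assumptions: set[str]                                # C
    scope: set[str]                                      # scope(Q), includes E
    positive: set[str] = field(default_factory=set)      # positive analogy P
    negative: set[str] = field(default_factory=set)      # negative analogy


def prima_facie_plausible(a: AbductiveAnalogy) -> bool:
    """Check overlap and no-critical-difference for the current snapshot."""
    overlap = bool(a.evidence & a.positive)
    critical = a.assumptions | a.evidence | a.scope
    no_critical_difference = not (critical & a.negative)
    return overlap and no_critical_difference


# Illustrative factor labels only; the classification follows the discussion above.
shells = AbductiveAnalogy(
    evidence={"no force inside shell"},
    assumptions={"uniform distribution"},
    scope={"no force inside shell"},
    positive={"no force inside shell", "uniform distribution"},
)
acoustics = AbductiveAnalogy(
    evidence={"discrete frequencies"},
    assumptions={"fixed endpoints", "constant tension"},
    scope={"discrete frequencies", "line at every integer multiple",
           "no convergence to a limit"},
    positive={"discrete frequencies"},
    negative={"line at every integer multiple", "no convergence to a limit"},
)

if __name__ == "__main__":
    print("spherical shells:", prima_facie_plausible(shells))
    print("acoustical analogy:", prima_facie_plausible(acoustics))
```

Because the classification of factors changes as evidence accumulates, the same analogy can pass this check at one time and fail it later, as happened with the acoustical analogy.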
4.7 Abductive/Probabilistic Analogies
In many explanatory analogical arguments, the explanatory connection is best represented not as entailment but, rather, in statistical terms or by means of a causal model. We can evaluate this type of argument by merging our analyses of the previous two types, abductive and predictive/probabilistic analogies. The relevant concepts are defined in those earlier analyses and so the discussion here will be somewhat abbreviated.
Prior Association, Preconditions and an Example
First, consider the prior association. Typically, Q is part of a causal explanation for some phenomenon E, and we think that a similar cause Q* can help to explain a similar phenomenon E*. As in section 4.5, we employ Humphreys’s aleatory model of probabilistic explanation. We distinguish an observed result E, boundary conditions and auxiliary hypotheses C, contributing causes φ⁺ with defeaters P, counteracting causes φ⁻, and background knowledge B. The prior association takes the following form:
(4.7.1) Prior Association for Abductive/Probabilistic Analogy
E because φ⁺ and C and […], despite φ⁻, where Q ∈ φ⁺.
In short, we have an aleatory explanation for E that includes Q in the explanans. Q may be the only contributing causal factor, but in general there are others. The explanation will also invoke additional assumptions C and may mention counteracting causes. As with the abductive analogies of the previous section, the positive analogy P typically overlaps significantly with the set E of observable consequences and may extend to other elements of the explanans.
The preconditions on the prior association are drawn from those for predictive/probabilistic and abductive analogies. We want to ensure that the association (4.7.1) represents an acceptable aleatory explanation, and we want to take the abductive character of our argument into account. Thus, I propose the following two requirements:
(4.7.2) Preconditions for Abductive/Probabilistic Analogy
1. No defeating condition for any contributing cause in the explanation may be known to hold in the source domain.
2. The additional assumptions C must be justified.
For the rest of this section, I concentrate on one example of this pattern of analogical reasoning: Darwin’s famous analogy between artificial and natural selection.
Example 4.8 (artificial and natural selection).
In a letter to Henslow (May 1860), Darwin reflects upon the role of analogy in establishing the initial plausibility of his theory: Why may I not invent the hypothesis of Natural Selection (which from the analogy of domestic productions, and from what we know of the struggle of existence and of the variability of organic beings, is, in some very slight degree, in itself probable) and try whether this hypothesis of Natural Selection does not explain (as I think it does) a large number of facts. (1903, letter 100) Here the source domain S is any instance of “domestic production,” while the target T is any species found in nature. Artificial selection or breeding (Q ), by
propagating traits pleasing or useful to the breeder, increases the probability that selected traits become predominant (E ). By analogy, the advantages that useful variations convey on individual plants or animals raise their probability of being passed along to offspring. Over a long period, natural selection can explain the predominance of useful traits and the diversity of species. In the source domain, the “prior association” is the entire complex causal process of mutation, inheritance, and artificial selection.36 The additional assumptions here have to do with the particular varieties, their characteristics prior to breeding, the methods and conditions of breeding, and the time span involved.
Relevance and Potential for Generalization
Which factors count as critical in this sort of analogy? What might block a possible generalization (in Darwin’s case, a common explanation of diversity for species and varieties in terms of selection)? To answer these questions, we once again combine our ideas for predictive/probabilistic and abductive analogical arguments. All factors in the prior association (except counteracting causes) are critical, but we have to augment that association in both directions by adding causes and effects. That is, salient defeating conditions and additional observable consequences of Q, both of which may be omitted from the original version of the prior association, count as critical. The analogical argument is defeated if any of these factors belongs to the negative analogy.
Consider Example 4.8. In an illuminating discussion, Sober (1984, 18–19) points out that the analogy between natural and artificial selection faced three serious obstacles: “one concerned the observed effects of artificial selection, the other two its causes.” The first obstacle was that breeders had been able to “tinker and fine-tune,” but in Darwin’s day, had never produced a new species. The second obstacle was that artificial selection is “the product of intelligent manipulation,” whereas natural selection is supposed to function without causal input from any conscious agent. The final obstacle was that artificial selection is a deterministic process, while natural selection is a probabilistic one. The breeder stipulates exactly which organisms are permitted to propagate, while natural selection supposes only that certain individuals have greater fitness and hence a greater chance of propagating.37
Let us first consider the comparison of effects.
36. Humphreys’s aleatory explanation is well suited to modeling the process in varying levels of detail.
37. Although artificial breeding is deterministic in one sense (i.e., whether a particular organism is allowed to propagate), the process is still probabilistic as regards whether offspring are produced and which traits are passed on. Thus, it is still appropriate to use a probabilistic causal model for the source domain.
When critics pointed to disanalogies between domesticated varieties and natural species, they commonly focused on perceived failures of correspondence in the observable consequences of artificial selection and (supposedly) of natural selection. Apart from the obstacle about creation of new species pointed out by Sober, the first and greatest problem, as Darwin himself acknowledged (1860, chapters 6 and 10), was the absence of “insensibly fine gradations,” or transitional forms, in the fossil record as well as among contemporary species. Implicit in this objection is the fact that artificial selection is known to produce gradual transitions and hence an approximately
continuous series of varieties, in which each new form exhibits slight modifications from its predecessor. The problem is remarkably similar to that faced by advocates of the acoustical analogy (Example 4.7): how can we account for gaps in the observation record where the analogy leads us to expect something? Darwin countered this “gap” objection by arguing that the fossil record was incomplete (in effect, shifting this factor to the neutral analogy), and that the present-day absence of intermediate species could be explained by their vulnerability to extinction. As Sober notes (1984, 18), the “new species” objection was answered by appealing to time scale: “if artificial selection has achieved so much in the comparatively short time of recorded human history, then natural selection can be expected to generate even more impressive results on the time scale of natural history” (emphasis in original). Here the strategy is to appeal to a difference in the boundary conditions or auxiliary hypotheses. Darwin (1860, chapter 6) noted one further disanalogy in comparing the effects of natural and artificial selection, also related to the “new species” problem: “How can we account for species, when crossed, being sterile [or] producing sterile offspring, whereas, when varieties are crossed, their fertility is unimpaired?” He provided two responses. He pointed out that human selection is oriented toward traits useful to man, never toward reproductive potential; as a result, it is hardly surprising that different varieties of dogs, cattle, and other domestic animals remain fertile. More interestingly, he cited numerous examples as evidence that the distinction between species and varieties on grounds of fertility was a gradual one. Many different species of plants, and some animals, are amenable to cross-fertilization, and even the hybrid offspring prove fertile. Now let us shift our attention from disanalogous effects to disanalogous causal factors. 
Darwin (1860, chapter 10) takes seriously the notorious objection that “time will not have sufficed for so great an amount of organic change, all changes having been effected very slowly through natural selection.” This point is closely related to Sober’s observations about consciousness and the contrast between deterministic and probabilistic processes: given an extremely large interval of time, a probabilistic process relying on slight statistical advantages could produce the same diversity as a deterministic process depending upon conscious choice over a brief time span. Indeterminate natural processes can serve as the analog for deliberate choices by conscious agents. The time required for small variations owing to artificial selection, though long enough for change to be almost imperceptible, is still on a human scale. The corresponding time required for the vast diversification and evolution that Darwin proposed appeared implausible to his contemporaries. The lack of sufficient time would count as a defeating condition.
As suggested, all of these objections concerning disanalogies between causes and effects can be accommodated by our account, which combines features from our analyses of predictive/probabilistic and abductive analogies.38
38. I claim only that my model captures the right general features of this type of analogical reasoning. The detailed analysis, of course, depends upon what goes into the prior association.
First, the critical factors in an abductive/probabilistic analogy are as follows:
• Effects. Let scope(Q) refer to probable consequences of Q together with the additional assumptions in the prior association. The elements of scope(Q) are critical and should be compared as precisely as possible with analogs in the target domain.
• Causes. Salient defeating conditions P are critical. The auxiliary assumptions C and other contributing causes are also critical, but as with abductive analogies, their analogs are typically formulated in speculative terms and located in the neutral analogy.
Combining this with our general model yields the following test for plausibility:
(4.7.3) Prima facie Plausibility for Abductive/Probabilistic Analogy
1. Overlap. There must be some overlap between E and the positive analogy, P.
2. No-critical-difference.
a. Observable effects (scope(Q)) must not belong to the negative analogy; furthermore, investigation must eventually shift elements of this set to the positive analogy.
b. No defeating conditions may be known to hold in the target domain. The additional critical factors C and φ⁺ must not belong to the negative analogy.
Clause (a) embraces the objections made to Darwin’s analogy on the basis of discrepancies between the effects of artificial and natural selection. As indicated above, a reasonable initial response would be to suggest that the fossil record is incomplete. In contrast to the fate of the acoustical analogy, where the gaps were never filled, Darwin’s faith that a more complete record would furnish evidence of transitional forms has been substantiated. But when first proposed, Darwin’s argument was only slightly better off in this respect than the acoustical analogy (slightly better off because Darwin could provide reasons for expecting a sparse record).
Clause (b) addresses objections made on the basis of disanalogous causal factors. Most of these relate to the vast time-scale required for natural selection to produce the diversity currently observed.
Darwin’s best defense here was to point out that the age of the Earth was undeniably great and that humans were largely ignorant both about the precise age and the rate of evolution under natural selection: With respect to the lapse of time not having been sufficient since our planet was consolidated for the assumed amount of organic change, and this objection, as urged by Sir William Thompson, is probably one of the gravest as yet advanced, I can only say, firstly, that we do not know at what rate species change as measured by years, and secondly, that many philosophers are not as yet willing to admit that we know enough of the constitution of the universe and of the interior of our globe to speculate with safety on its past duration. (1860, chapter 15) Darwin’s treatment of disanalogies based on causes and those based on effects is similar. In both cases, he appeals to the limits of knowledge, in effect
urging his contemporaries to locate the critical features in the neutral analogy. His argument became more convincing (with accumulating evidence about transitional forms and about the age of the earth) as relevant properties were shifted from the neutral to the positive analogy.39 The other major determinant of plausibility is the strength of the prior association. In concluding our discussion of abductive arguments in section 4.6, we noted that the derivation of a precise description of the observed consequences from an explanatory hypothesis strengthens the argument. The same point applies to this category of analogical arguments. That is reflected in the greater plausibility attached to Darwin’s model following the development of a mathematical framework (the “evolutionary synthesis”) that permits precise calculations of changes in allele frequency as a result of the process of natural (or artificial) selection.
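The test (4.7.3) can be read as a pair of set conditions. The Python sketch below is one illustrative encoding and not part of the original account: the class name `AbductiveAnalogy`, its field names, and the string labels attached to Darwin's analogy are all assumptions introduced here. The dynamic requirement in clause 2(a), that investigation eventually shift effects into the positive analogy, is deliberately left unmodeled.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AbductiveAnalogy:
    """Hypothetical encoding of the factors named in test (4.7.3)."""
    effects: frozenset                 # observable effects, scope(Q)
    auxiliary: frozenset               # C and other contributing causes
    defeaters: frozenset               # salient defeating conditions
    positive: frozenset                # factors known to be shared with the target
    negative: frozenset                # factors known to differ in the target
    known_target_defeaters: frozenset  # defeaters known to hold in the target


def prima_facie_plausible(a: AbductiveAnalogy) -> bool:
    # 1. Overlap: some observed effect lies in the positive analogy.
    overlap = bool(a.effects & a.positive)
    # 2(a). No observable effect lies in the negative analogy.
    effects_ok = not (a.effects & a.negative)
    # 2(b). No defeating condition is known to hold in the target,
    #       and no auxiliary factor lies in the negative analogy.
    causes_ok = (not (a.defeaters & a.known_target_defeaters)
                 and not (a.auxiliary & a.negative))
    return overlap and effects_ok and causes_ok


# Illustrative labels only (assumed for this sketch).
darwin = AbductiveAnalogy(
    effects=frozenset({"divergence of varieties", "transitional forms"}),
    auxiliary=frozenset({"heritable variation"}),
    defeaters=frozenset({"insufficient time"}),
    positive=frozenset({"divergence of varieties"}),
    negative=frozenset(),              # transitional forms sit in the neutral analogy
    known_target_defeaters=frozenset(),
)

# The objection from insufficient time, if it were established for the target.
objection = replace(darwin, known_target_defeaters=frozenset({"insufficient time"}))

print(prima_facie_plausible(darwin))
print(prima_facie_plausible(objection))
```

On this encoding, Darwin's appeal to the limits of knowledge amounts to keeping "transitional forms" and "insufficient time" out of the negative analogy and the known defeaters, so that the argument passes the test until evidence settles the matter.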
4.8 Functional Analogies
Functional analogies are used to infer similarities in function from similarities in form. Example 4.3 is an inference that H. sparsa’s “stiltlike” appendages, similar in form to those of other organisms, supported a similar kind of locomotion. The analogy is employed to assign a function to an unusual body part of an extinct creature. The known role of similar body parts in existing species supplies the likely answer. This type of inference is important in archaeology where it is referred to as the method of ethnographic analogy: “a method of interpreting the nonobservable behavior of the ancient inhabitants of an archaeological site (or ancient culture) based on the similarity of their artifacts to those used by living peoples” (Hunter and Whitten 1976, 147). Ethnographic analogy is commonly used to infer the probable function of an artifact based on the known function of a similar item in a similar culture. The next example illustrates this.
Example 4.9 (Peruvian pots). Donnan (1971) and Shelley (1999) describe the use of ethnographic analogy to determine the significance of odd marks or incisions on the necks of Moche clay pots found in the Peruvian Andes.40 The marks are “small and inconspicuous, highly variable, and apparently made without care with a blunt instrument or a fingertip” (Shelley 1999, 585). The clue to interpreting these strange marks came from noticing that contemporary potters in central Peru place very similar marks, which they call signáles, on the necks of their pots in order to indicate ownership. The signáles enable the potters to identify and reclaim their pots whenever several potters share a kiln or storage facility. As Shelley notes (1999, 587), the application of this conclusion to the Moche potters is supported by many reasons: the “strong visual similarity of the modern
39. I shall not here consider the objection based on the absence of intelligent manipulation in the case of natural selection.
Certainly, this counts as a critical disanalogy for some formulations of the analogical argument.
40. The Mochica culture flourished in northern Peru from about A.D. 200 to 700.
signáles with the marks found on Moche pots,” evidence of a possibly “continuous tradition of signáles in the region,” and the fact that the markings are relatively infrequent both among contemporary pots and Moche pots. According to Hunter and Whitten (1976, 147), ethnographic analogy may be employed in two ways: “as a direct historical approach in which the modern peoples must be historical descendants of the ancient ones; and as a nonhistorical (or comparative) approach in which the ancient and modern peoples need not be historically connected but must manipulate similar environments in similar ways.” Example 4.9 illustrates the historical approach, while the following example illustrates the non-historical approach.
Example 4.10 (slab lamps). It is frequently possible to find equivalents of Stone Age or Ice Age technology among the Inuit. Archeologists in southern Europe have found bowl-shaped lamps, carved from rock, in which animal fat was burned. They have also found flat, uncarved slabs bearing traces of burned fat. Why would Ice Age people ever have used flat lamps that allow fuel to spill over the sides? A strong clue is provided by near-contemporary Inuit groups that employ both kinds of lamp. They occasionally use flat slabs as makeshift lamps, when traveling and pressed for time (de Beaune and White 1993, 108). It is reasonable to think that the European slab lamps served the same function.
In each of these examples, analogies are used to make an explanatory hypothesis plausible. Functional analogies are evidently a type of explanatory analogy. Rather than subsuming them under our earlier account, however, I treat them as a separate category. The main reason is that the explanations upon which such analogies are based are teleological. That calls for a more elaborate form of prior association than we saw in the representation patterns (4.6.1) and (4.7.1).
The other special characteristic of these analogies is that they are often applied to hypotheses about the remote past, as in the three examples above, which are drawn from palaeontology and archaeology. Independent testing of such hypotheses may be impossible; analogies may provide the best available evidence. For this reason as well, functional analogies constitute a particularly important and interesting special category.
Prior Association and Preconditions
Our first step is to characterize the prior association, which in this case is a teleological explanation. I shall begin with a simple account due to Wright (1976). Wright observes that man-made artifacts, such as pots or light switches, have a particular form because that form is capable of fulfilling the desired function. He makes this the paradigm for all functional explanations. Restating his account slightly, the function of object X is Z if
(W1) X is able to do Z under appropriate conditions41
(W2) X is there [or has the form it does] because it does (results in) Z.
41. Wright’s statement is that Z is a consequence (result) of X’s being there, but he glosses this as I have put it.
If the object has multiple functions Z, Z′, . . . , then these may be substituted for Z in conditions (W1) and (W2). To say X is there, or X has a particular form, because it achieves some function is, according to Wright, shorthand for giving a consequence etiology for X. This consequence etiology requires selection criteria (e.g., individual advantage or preference) that indicate why X is there and why the function Z is “wanted.” Many of the limitations of Wright’s approach have to do with concerns peripheral to ours. For instance, Wright’s account has been criticized because it does not allow dysfunctional tokens to count as instances of a functional type. The function of my lawnmower is to cut the grass, yet it is currently broken and Wright’s clause (W1) is violated. When imputing functional roles to archaeological artifacts, however, we care only about functional type. The concern about dysfunctional tokens is therefore immaterial, provided we reformulate Wright’s account in terms of types rather than tokens. In addition to making this change, we expand Wright’s condition (W2) by explicitly including selection criteria. We do this in order to make a clear connection between teleological explanations and the more straightforward aleatory explanations of previous sections. Let f characterize the form of a class of objects, and let Q indicate the function of objects of type f. Let C represent certain environmental conditions, and let G stand for relevant selection criteria.
(4.8.1) Prior Association for Functional Analogy
(F1) Objects of type f can have the function Q, given C.
(F2) The function Q is required by selection criteria G.
(F3) Objects of type f are there because of Q and C.
Thus, condition (F1) is a type-level version of (W1); conditions (F2) and (F3) replace condition (W2). The prior association here is composite because all three of these conditions take the form of explanations.
That has important implications for how functional analogies should be evaluated. First, condition (F1) may appear to be no more than a matter of observation. That stone slabs can function as lamps is obvious because they do function this way in near-contemporary cultures. Similar remarks apply to signáles, and to the unusual appendages that existing species of animals use for locomotion (Example 4.3). In the context of an analogical argument, however, condition (F1) is not trivial because we want to extend the relationship between form and function to the target domain. In the case of Example 4.3, we have to worry about whether H. sparsa’s overall body structure, or features of its environment, might prevent its “stiltlike” appendages from functioning as legs. In other words, we are looking for salient defeating conditions. Thus, condition (F1) is appropriately modeled as an explanation, and this element of the analogical argument should be evaluated according to the criteria outlined in section 4.5.
Second, condition (F2) states that the relevant function is selected for by the criteria G. This, too, is a type of explanation, but rather than provide an analysis, I indicate (below) how its extension to the target may be finessed in terms of uniformities.
Finally, condition (F3) has the form of an explanation of form in terms of function. In considering whether a similar relationship could apply to the target domain, we should apply the criteria developed in our discussion of abductive/probabilistic analogies, in section 4.7. Combining these remarks with our analyses of predictive/probabilistic and abductive/probabilistic analogies (sections 4.5 and 4.7), I suggest the following preconditions.
(4.8.2) Preconditions for Functional Analogy
1. No defeating condition for (F1) or (F3) may be known to hold in the source domain.
2. The assumptions C must be justified.
3. (F2) should be justified as a uniformity.
The proposals in (4.8.1) and (4.8.2), and in particular the appeal to uniformities, require us to consider an important issue that has sparked debate within the field of archaeology. R. A. Gould (Gould and Watson 1982) argues that analogical reasoning in archeology should be shunned in favor of an approach based on uniformitarian generalizations. Inferences about the function of objects in antique cultures should proceed via generalization from present cultures, followed by application to the past.42 By contrast, Watson (Gould and Watson 1982), Wylie (1982, 1985), and others insist that ethnographic analogy is indispensable, although they concede that uniformitarian principles should be incorporated into analogical thinking. In Gould’s view, it is primarily general “lawlike” patterns that are projectible into the past. If we are interested in projecting a more specific claim about the function of an artifact, it is a mistake to proceed directly by analogy with a single contemporary culture in which, referring to (4.8.1), objects of a very similar type f are there because they have the function Q.
To project this functional role onto some ancient culture, we must first justify a “special” generalization that under certain environmental or social conditions, cultures tend to use objects of type f to fulfill function Q. This principle is then applied to the ancient culture. One can view Gould’s methodological reform as restricting the method of ethnographic analogy: first derive a law or “uniformity” by looking at existing cultures, and then, by analogy, project it into the past (Wylie 1982). As applied to our theory, Gould’s position appears to imply that a well-confirmed special uniformity is a precondition for a plausible analogical argument. We should reject this extreme version of uniformitarianism. Not every viable functional analogy should require prior knowledge of a “special” uniformity. To restrict ethnographic analogy in this way would impose too great a limit upon our ability to make plausible inferences about the past. In particular, such a restriction would imply that no analogical argument based on a single source can
42. The logic here is similar to that of Aristotle’s paradeigma (section 2.2).
be plausible. We should not concede this last point, because we do encounter intuitively plausible inferences based on one (or relatively few) cultures, as in the Peruvian pots example. What is more, we often have no underlying uniformity in view and may be reluctant to commit ourselves to a specific principle. Yet we should not stop making inferences about the past. The preconditions (4.8.2) provide a moderate version of Gould’s position. The condition (F2), relating selection criteria and function, represents a uniformity, but it pertains only to general features present in the explanation. In the case of Example 4.3, for instance, we have the almost trivial assertion that possessing a means of locomotion is conducive to survival. Such uniformities still play an important role in establishing the plausibility of an analogical argument, but they are only part of the story.
Relevance and Potential for Generalization
Based on the above representation of the prior association and our earlier analyses of explanatory analogy, the following factors emerge as critical:
• Aspects of physical form f, selection criteria G and environmental conditions C, identified explicitly in the prior association
• Salient defeating conditions for the explanation
• Salient observable consequences of Q and C
An argument of this type meets the requirements for prima facie plausibility if some of these factors belong to the positive analogy and none belongs to the negative analogy. In general, the positive analogy P combines aspects of the physical form f, the selection criteria G, and even the environmental conditions C.
While this test looks complicated, it can be simplified by attending to special features of this type of analogical argument. The most significant simplifications relate to the selection criteria G and salient defeating conditions, which may often be set aside. Evaluation of such arguments can rest almost entirely upon the historical record: artifacts, fossils, and so forth. My argument depends upon three ideas. First, the resemblances in physical form that constitute the basis of the analogy may be highly specific, as in the case of the signáles on the Peruvian pots. Second, the source and target domains may be historically linked, or (as in the example of slab lamps) we may have independent evidence of very similar environmental conditions. Finally, given the first two assumptions, defeating conditions and selection criteria cease to be independent factors. When retrieved artifacts are both unusual in form and extremely similar to analogs in familiar cultures, it becomes plausible not merely that they were used for similar purposes but also that similar selection pressures were present. This inference is only threatened by differences in the historical record, which count as salient observable consequences.
For instance, we should expect to find the flat slab lamps of Example 4.10 only outside settled communities.
In our discussion of qualitative plausibility in section 4.3, we identified three factors: the extent of the positive analogy, the strength of the prior association, and multiple analogies. The above remarks make it clear that specific resemblances contribute greatly to overall plausibility by maximizing the positive analogy. As regards the strength of the prior association and multiple analogies, it is helpful to note Wylie’s remark that an ethnographic analogy becomes more cogent when we “work on the left-hand side,” which amounts to articulating and defending relevant uniformities (1982, 397). Adding uniformitarian principles to the prior association, supported by multiple analogies, strengthens the conditions (F1) and (F3). Wylie points out that in a (rarely achieved) ideal situation, uniformitarian principles can establish an “if-and-only-if” relationship between form and function: using a particular form of tool might be the only way some end has ever been known to be achieved, and that end may be the only use of the tool. Archaeologists have referred to this as a hypothetico-deductive (H-D) strategy for confirming archeological hypotheses: based on the uniformity, one can make predictions about the past and these can be tested against the archeological record. In the limit, this pattern of analogical reasoning could fall under both the predictive/probabilistic and abductive models. Uniformitarian principles that fall short of this ideal can still strengthen a functional analogy.
4.9 Correlative Analogies
We consider, finally, analogical arguments in which the prior association is merely statistical correlation. As we shall see, this category of arguments poses a special challenge to the articulation model, precisely because the prior association gives us very little information. In chapter 6, I propose an alternative strategy for dealing with correlative analogies.
With a correlative analogy, we start by supposing that in the source domain, a set f of attributes is correlated with Q. We encounter similar attributes f* and infer that a similar attribute Q* is likely to be present. When is this type of analogical argument plausible? Let us state the problem in slightly more general terms. We may assume that the correlation exists within some reference class C (for example, a set of experimental trials). The prior association is as follows.
(4.9.1) Prior Association for Correlative Analogy
Pr(Q / f·C) > Pr(Q / C)
(f is positively relevant to Q, given C.)43
Here, Pr represents objective probability, as indicated by observed frequency. The positive analogy P generally includes f and some of the features that characterize the reference class C.
43. As explained in section 4.2, the prior association here is symmetrical and could be stated in symmetrical fashion: Pr(Q·f / C) > Pr(Q / C) Pr(f / C).
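Since Pr in (4.9.1) is read off observed frequencies, the condition can be checked directly from a table of trials. The sketch below is a hedged illustration: the function name `positively_relevant` and the trial counts are invented for the example and are not Schaumann's data. It evaluates both the conditional form in (4.9.1) and the symmetrical form in note 43, which agree whenever f occurs at least once in C.

```python
from fractions import Fraction


def positively_relevant(trials):
    """Check (4.9.1) on observed frequencies.

    trials: list of (has_f, has_Q) pairs, one per member of the
    reference class C. Returns a pair of booleans:
    (conditional form, symmetrical form from note 43).
    """
    n = len(trials)
    n_f = sum(1 for f, _ in trials if f)
    n_q = sum(1 for _, q in trials if q)
    n_fq = sum(1 for f, q in trials if f and q)
    if n == 0 or n_f == 0:
        raise ValueError("need a nonempty class C with at least one f-case")

    pr_q = Fraction(n_q, n)             # Pr(Q / C)
    pr_f = Fraction(n_f, n)             # Pr(f / C)
    pr_fq = Fraction(n_fq, n)           # Pr(Q·f / C)
    pr_q_given_f = Fraction(n_fq, n_f)  # Pr(Q / f·C)

    return pr_q_given_f > pr_q, pr_fq > pr_q * pr_f


# Invented counts: one compound shows f and Q, four show Q without f,
# and ninety-five show neither.
trials = [(True, True)] * 1 + [(False, True)] * 4 + [(False, False)] * 95

# A class in which f and Q are statistically independent.
independent = [(True, True), (True, False), (False, True), (False, False)]

print(positively_relevant(trials))
print(positively_relevant(independent))
```

Exact fractions are used instead of floating point so that the boundary case of independence, where the two sides are equal, is not misjudged by rounding.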
The conclusion of the analogical argument is, first of all, a prediction that an analogous relationship holds in the target domain:
Pr(Q* / f*·C*) > Pr(Q* / C*).
Here, C* is a reference class in the target domain. Since f* applies to the target domain, the analogy supports the conclusion that Q* is plausible there.
Example 4.4 (morphine and meperidine, continued). For the source domain, the reference class C is a large class of past trials of potential analgesics on mice. In the target domain, C* consists of trials with new drugs. The factors f and f* stand for the unusual tail curvature; Q and Q* represent sedative and analgesic effects. The two attributes f and Q are strongly correlated: Pr(Q / f·C) = 1. Indeed, morphine was the only drug known to produce the tail curvature, although Schaumann could not explain the correlation. The initial conclusion supported by analogy is that Pr(Q* / f*·C*) > Pr(Q* / C*). The analogy also appears to support the more specific hypothesis that meperidine has the same narcotic effects as morphine. The analogical argument is not meant to show that meperidine is likely to have the desired narcotic effects, but only that these effects are somewhat plausible.
There is more to be said about this example, but let us first see how our general model applies to correlative analogies. The first step is to identify standard conditions to ensure that the association (4.9.1) represents a genuine correlation.
(4.9.2) Preconditions for Correlative Analogy
1. The correlation in (4.9.1) should be statistically significant (i.e., the class C should be adequately large and appropriately defined).
2. Prior to testing, C should be homogeneous with respect to factors known to be causally relevant to Q (i.e., there should be no known way to partition C into subclasses within which the frequency of Q varies).
Both of these conditions are met in our example, where C consists of a class of trials with various compounds.
Schaumann had no prior causal knowledge that would have allowed him to partition this class.
We now come to the crucial step of identifying critical factors and assessing plausibility. It is clear that both the reference class C and the relevant attributes f cited in the association (4.9.1) count as critical, since both play an essential part in the prior association. But what does it mean for a correlative analogy to satisfy our test for plausibility? Our theory offers only the following uninformative test.
(4.9.3) Prima facie Plausibility for Correlative Analogy
1. Overlap. Some factor in f must belong to the positive analogy.
2. No-critical-difference. Attributes corresponding to C and f must not be known to be absent in the target reference class, C*.
This test is unhelpful because it appears to be either unsatisfiable or trivially satisfied. It is unsatisfiable if what is required is that the target class C* be indistinguishable from C. Such a strict requirement would, for instance, rule out analogical inferences based on animal testing, since there are obvious differences between nonhuman species and humans. By contrast, the conditions (4.9.3) are trivially satisfied if we require only “correspondence” between C and C*. Any correlative analogy would count as plausible with such a loose requirement. Let us try to determine why our theory has let us down and attempt to find a remedy. In section 4.1, we adopted the idealized assumption that the similarity relations (correspondences) in an analogy could be taken for granted. That allowed us to concentrate on the role of the prior association in evaluating analogical arguments. The strategy works well for highly structured types of analogical argument, but breaks down for correlative analogies, which have almost no structure. The remedy is to return to the general thesis that analogical reasoning is based on potential for generalization. For a correlative analogy, the relevant sort of generalization seems to be a natural kind. In the case of Example 4.4, Schaumann’s experiments led to the discovery of a whole class of morphine-like compounds. Let us consider that example a little more closely. It is striking that when it comes to formulating hypotheses about natural kinds, a bizarre effect common to morphine and meperidine—the S-shaped tail curvature—is a more reliable indicator than similar chemical structure. One discussion of Schaumann’s experiments offers a cautionary message: Although many thousands of compounds have been synthesized which appear to have the predicted structural characteristics of a potent analgesic, only relatively few have been shown by biological tests to have useful properties . . . . 
The estimate of one chance in a thousand of making a new compound to order which will have useful analgesic properties appears to be a very low probability in statistical terms but is probably a low estimate in fact. (Reynolds and Randall 1975, 376) Of course, Schaumann restricted his attention to compounds with structural similarity to morphine, but our problem is to explain why he acted reasonably in taking tail curvature to be a significant indication that morphine and meperidine belonged to a common family. In order to evaluate correlative analogical arguments properly, we need a means of identifying traits that suggest when two domains might belong to a natural kind. That analysis does not belong in this chapter, but the need for one signals that it is time to take a closer look at the horizontal similarity relationships and their impact on the plausibility of analogical arguments. I shall return to the analysis of correlative analogies in chapter 6, where I argue in more detail for the thesis that these analogies are linked to inferences about natural kinds.
I conclude with some final remarks about the qualitative plausibility of correlative analogies. Apart from the strength of the correlation, one recognized determinant of plausibility appears to be the past record of analogies between the two domains and the record of similar analogies in other settings. For instance, in the absence of a detailed physiological model, any inference from drug testing on animals to probable effects on humans counts as a correlative analogy. The persuasiveness of such inferences depends partly on whether past analogies of a similar nature have been successful. The individual inference receives indirect support as a member of a class of inferences that has proved to be plausible. As de Finetti has argued, The very belief in the plausibility of extending certain conclusions concerning certain medical treatments to certain others can be explained in turn by the observation of analogies in a broader and vaguer sense . . . it turns essentially on similar observations made for other medical treatments. (de Finetti 1980) Of course, this “inductive” confirmation of the value of analogies applies within each of our categories of analogical argument. Even within mathematics, if a class of analogies (such as those between plane and solid geometry) has proved useful many times in the past, experience tends to bolster our confidence in analogies of that type. In my view, however, this type of inductive support has limited importance. As I have maintained throughout the book, each analogical argument has unique characteristics and must be assessed on its own terms. Overall resemblance to some category of successful arguments should be discounted if we can bring direct considerations to bear on the argument in ways outlined in the preceding sections.
4.10 Multiple Analogies
Thus far, I have focused on criteria for the evaluation of individual analogies: arguments involving a single source domain. I have asserted that such arguments are more fundamental than multiple analogies, arguments that involve multiple sources.44 In the first part of this section, I defend that claim. I then outline how the formal theory of section 4.3 can be extended to accommodate reasoning with multiple analogies.
There are three basic reasons to regard individual analogical arguments (rather than multiple analogies) as the most basic “units” of analogical reasoning.
1. Ubiquity. Individual analogical arguments are common. Any normative theory of analogical reasoning should be able to handle them.
44. This terminology is due to Shelley (2003).
2. Screening. Reasoning with multiple analogies requires a preliminary step of evaluating the constituent individual analogies for prima facie plausibility. In many cases, this step is all that is wanted from analogical reasoning; the analogies are used to identify plausible hypotheses rather than to adjudicate between them.
3. Independence (rough). The assessment of prima facie plausibility for each individual analogical argument can and should be largely independent of the other analogies.
The crucial idea here, as in the theory of individual analogies, is that there is an important role for prima facie plausibility. While I believe that these arguments are basically correct and establish the primacy of individual analogical arguments, the formulation of Independence needs refinement. In cases where we have a number of analogies, but no obvious way of ranking any one as superior to any other, I believe that we may evaluate each argument separately from the others. The same goes for cases where the possible rankings are open to debate. There is an important objection to Independence, however, when we do have a clear ranking on our set of analogies. One analogical argument that meets the criteria for prima facie plausibility when evaluated in isolation may be trumped by a second, more highly ranked analogical argument that supports an incompatible conclusion. Independence fails; the first argument no longer possesses any measure of plausibility.
Ashley’s HYPO program, discussed in section 3.4, illustrates how this situation arises in the legal arena. Past precedents are partially ordered by degree of overlap with the case at hand, along a specified list of relevant dimensions. HYPO identifies the optimal, or most on-point, cases for each side in a legal dispute. The most on-point cases are maximally similar to the current case, according to the partial ordering.
A past case decided in favor of the plaintiff, if it bears relevant similarity to the case at hand, normally establishes some plausibility in finding for the plaintiff in the case at hand. But we make an exception if some other case bears even greater resemblance to the case at hand, yet was decided for the defendant. The second precedent trumps the first, and the trumped analogical argument may be discounted.45 In light of these observations, I suggest the following revision to our rough formulation. 3. Independence. A provisional assessment of prima facie plausibility is made independently for each individual analogy under consideration. This assessment remains unchanged for optimal analogical arguments, but is defeated if there is a more highly ranked analogical argument that supports an incompatible conclusion.
45. I am setting aside complexities such as the age of the two precedents, the authority of the two courts, and other matters.
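The revised Independence condition lends itself to a small computational sketch. The code below is an illustration under stated assumptions and not Ashley's HYPO implementation: the `Precedent` class, the dimension labels `d1` to `d4`, and the reading of "more highly ranked" as strict set inclusion of shared dimensions are all choices made for this example. Following the wording of the condition literally, the sketch does not require the trumping argument itself to be provisionally plausible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Precedent:
    """Hypothetical record for one analogical argument from a past case."""
    name: str
    shared: frozenset         # relevant dimensions shared with the case at hand
    outcome: str              # conclusion supported: "plaintiff" or "defendant"
    provisional: bool = True  # prima facie plausibility, assessed in isolation


def more_on_point(a: Precedent, b: Precedent) -> bool:
    # Partial order: a outranks b if a shares strictly more dimensions.
    # Precedents with incomparable dimension sets are simply unranked.
    return b.shared < a.shared


def final_plausibility(precedents):
    """Apply the revised Independence condition to each argument."""
    result = {}
    for p in precedents:
        trumped = any(
            more_on_point(q, p) and q.outcome != p.outcome
            for q in precedents
        )
        # The provisional assessment stands unless the argument is trumped.
        result[p.name] = p.provisional and not trumped
    return result


cases = [
    Precedent("P1", frozenset({"d1", "d2"}), "plaintiff"),
    Precedent("P2", frozenset({"d1", "d2", "d3"}), "defendant"),
    Precedent("P3", frozenset({"d4"}), "plaintiff"),
]

verdicts = final_plausibility(cases)
print(verdicts)
```

Here P2 trumps P1 because it is strictly more on-point and supports the opposite side, while P3 is incomparable with both and keeps its provisional assessment, as the condition requires for optimal arguments.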
The formal presentation below clarifies these ideas and explains how the presence of multiple analogies can alter our opinions about which factors are relevant to an individual analogical argument, without threatening independence. The case against the primacy of individual analogical arguments appears stronger in situations when the goal of analogical reasoning goes beyond screening. Analogies are sometimes used to establish high probability (as opposed to mere plausibility) for a hypothesis, or to show that one hypothesis has more support than its rivals. These roles for analogy are important if no alternative method of confirmation is available. It seems that multiple analogies may be essential in such cases (Shelley 2003). My response is that we should replace “essential” with “helpful.” In the first place, an individual analogy can sometimes provide strong support on its own. Second, the initial screening process, and hence the theory for evaluating individual analogical arguments, is still fundamental. Finally, many of the criteria that go into evaluating the plausibility of individual analogical arguments apply to reasoning with multiple analogies. Let me now suggest how to extend the theory presented in section 4.3 in order to take multiple analogies into account in a way that respects all of these ideas. To set things up, let us suppose we have a set G of analogical arguments {A1, . . . , An} with respective source domains S1, . . . , Sn (not necessarily distinct), a fixed target domain T, and supported hypotheses Q1*, . . . , Qn* (again not necessarily distinct). As usual, we use the * notation to refer to features of the target domain. In accordance with our theory, each individual argument must articulate a prior association that meets relevant preconditions. We face two tasks. 
First, we need to indicate whether (and how) the presence of multiple analogies should influence our criteria for assessing the prima facie plausibility of each individual argument. Second, we need to show how we should combine multiple analogies in assessing the plausibility of the hypotheses Q1*, . . . , Qn*. In what follows, I tackle the first of these tasks systematically, but I confine myself to limited remarks about the second.
Prima facie Plausibility

According to our general theory, an analogical argument is prima facie plausible if some of its critical factors belong to the positive analogy and none belongs to the negative analogy. The crucial question is how the presence of multiple analogies (the set G) alters the set of critical factors for each individual analogy in G. The position I shall defend is roughly this: not at all for “optimal” members of G, but quite a lot for “suboptimal” members. The net effect on prima facie plausibility is summarized by the Independence condition above.

Let’s fix our attention on one individual argument, for instance A1, the first member of G. Referring back to section 4.3, we see that the set of potentially relevant factors consists of those that appear explicitly in our argument together
with some background factors,46 and we must now add the set (represented as in section 4.3) of all factors present in any of the other analogical arguments in G. There is no change to the definitions of the positive, negative, and neutral analogy, except that this division now applies to the enlarged set of potentially relevant factors.

We now come to the key step of identifying which factors are critical to each constituent individual analogy. First, we consider each of these arguments in isolation from the others (applying the models of earlier sections), and identify sets of critical factors $\phi_1^C, \dots, \phi_n^C$. Next, we want to determine the critical factors relative to G, written as $\phi_{G,1}^C, \dots, \phi_{G,n}^C$. There are three obvious candidates.

• Unchanged. $\phi_{G,i}^C = \phi_i^C$ for each i. There is no change at all to the set of critical factors. In assessing prima facie plausibility, other analogies in G are irrelevant.
• Union. $\phi_{G,i}^C = \phi_1^C \cup \dots \cup \phi_n^C$ for each i. All n arguments have the same critical factors, namely, all factors that are critical in at least one of the individual arguments.
• Intersection. $\phi_{G,i}^C = \phi_1^C \cap \dots \cap \phi_n^C$ for each i. All n arguments have the same critical factors, namely, all factors that are critical in every one of the individual arguments.

But none of these proposals is satisfactory. On the first proposal, where the critical factors are unchanged, each analogical argument is totally independent of the others. That is too much independence, in light of our discussion of trumping.

The second proposal takes too many factors as critical, with the result that we are led to reject too many analogical arguments. Earlier, I suggested that assessments of prima facie plausibility should be independent in cases where we have no rank-ordering of multiple analogies. Consider a near-future version of Example 4.6.
A newly discovered planet Bellona, orbiting a star much like our sun, has an atmosphere, water, and a large moon, but a rather cold surface temperature. Earth shares the first three characteristics and supports life; (ancient) Mars shares all but the third characteristic and does not support life.47

The second proposal would have us count the single known disanalogy in each case as a critical difference that blocks the analogical argument. Thus, the second proposal fails to respect the integrity of the individual analogical arguments in G. Each is based on an independently acceptable prior association (some sort of derivation or explanation). Yet on this proposal, any factor that is critical for even one argument in G effectively becomes a salient defeating condition for the prior associations in all of the other arguments. That is a serious mistake. In our simple example, neither the absence of a large moon nor a cold surface temperature should be regarded as a defeating condition.

46. As in earlier sections, we suppose that important background factors can be incorporated into an augmented prior association.
47. For the sake of argument, I am supposing that ancient Mars had liquid water but did not support life.
Multiple analogies might bring defeating conditions to light, but not in so mechanical a fashion.

The third proposal suffers from the same sort of difficulties as the second, although in this case the problem is that it counts too few factors as critical (and too many analogical arguments as plausible). Imagine a case where two wholly distinct combinations of factors, {f1, f2} or {f3, f4}, can bring about the result Q. Suppose we have f1, f2, and Q in source S1, and we have f3, f4, and Q in source S2. On this proposal, no factor is critical, and Q is trivially plausible in any target domain whatsoever! Once again, the trouble comes from failing to respect the integrity of the prior associations in each of the source domains.

As a first step in formulating a better proposal, we suppose that there is a ranking on the set of multiple analogies under consideration.

DEFINITION: Ranking on G
A ranking ≺ on G is a partial ordering of the arguments in G: A ≺ A′ if A′ is superior to A.

The point of such a ranking is captured in the following proposal about critical factors.

DEFINITION: Critical Factors Relative to G and ≺
For each i, the set of critical factors $\phi_{G,i}^C$ is the union of:
• $\phi_i^C$
• Every $\phi_j^C$ where $A_i \prec A_j$

For optimal arguments (those not outranked by any other), the set of critical factors is unchanged. For suboptimal arguments, every critical factor in every more highly ranked argument now counts as critical. (Note that in the trivial case of an empty ranking, every argument counts as optimal and there is no change.)

Our definition of prima facie plausibility (see section 4.3) needs no adjustment, but when combined with the above definition of critical factors, the result is exactly as stated in our revised formulation of Independence. For optimal arguments in G, independence holds: prima facie plausibility is assessed without regard to the remaining members of G. For suboptimal arguments, independence fails, and they establish nothing about plausibility.
It follows that suboptimal arguments may be ignored. The standard for ranking one argument above another must be very high. It should be completely clear that the suboptimal argument can be set aside. For the sake of generality, I leave open the possibility that such a ranking could be based partly on domain similarity and partly on external factors (such as level of court jurisdiction, in the case of legal reasoning). Employing a modified version of Ashley’s ideas, however, I propose a standard ranking based on matching critical factors.
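The candidate proposals and the ranked definition of critical factors can be compared on small sets. What follows is a minimal Python sketch added for illustration, not the author's formalism: the function names, the dictionary encoding of domains, and the choice of which factors are critical in the two-source Bellona example (taken here to be each source's positive analogy) are all assumptions.

```python
from functools import reduce


def union_proposal(critical_sets):
    """Second candidate: every argument inherits all critical factors."""
    return set().union(*critical_sets)


def intersection_proposal(critical_sets):
    """Third candidate: only factors critical in every argument remain."""
    return reduce(lambda a, b: a & b, critical_sets)


def critical_relative_to(i, critical_sets, outranked_by):
    """Ranked proposal: argument i keeps its own critical factors and adds
    those of every argument ranked above it. `outranked_by[i]` is the set of
    indices j with A_i < A_j (a hypothetical encoding of the ranking)."""
    result = set(critical_sets[i])
    for j in outranked_by.get(i, set()):
        result |= critical_sets[j]
    return result


def prima_facie_plausible(critical, source, target):
    """Some critical factor lies in the positive analogy and none lies in
    the negative analogy."""
    shared = {f for f in critical if source[f] == target[f]}
    differs = {f for f in critical if source[f] != target[f]}
    return bool(shared) and not differs


# Assumed encoding of the two-source Bellona example.
bellona = {"atmosphere": True, "water": True, "large_moon": True, "temperature": "cold"}
earth = {"atmosphere": True, "water": True, "large_moon": True, "temperature": "warm"}
mars = {"atmosphere": True, "water": True, "large_moon": False, "temperature": "cold"}
critical = [
    {"atmosphere", "water", "large_moon"},   # Earth argument, in isolation
    {"atmosphere", "water", "temperature"},  # Mars argument, in isolation
]

# Union proposal: each source's single disanalogy becomes critical,
# so both arguments are blocked.
everything = union_proposal(critical)
union_blocks_earth = not prima_facie_plausible(everything, earth, bellona)
union_blocks_mars = not prima_facie_plausible(everything, mars, bellona)

# Intersection proposal: two disjoint routes to Q leave nothing critical.
nothing = intersection_proposal([{"f1", "f2"}, {"f3", "f4"}])

# Ranked proposal with an empty ranking: every argument is optimal.
unchanged = critical_relative_to(0, critical, {})
```

With an empty ranking the ranked proposal coincides with the first candidate, which is the behavior the Independence condition requires for optimal arguments.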
DEFINITION: Standard Ranking
Suppose A1 and A2 are arguments in G, and $\phi_1^C$ and $\phi_2^C$ represent the respective critical factors. Then A1 ≺ A2 if:
1. $\phi_1^C \cap P \subseteq \phi_2^C \cap P$.
2. $\phi_1^C \cap N = \phi_2^C \cap N = \emptyset$.

The first condition, which is the important one, states that the critical positive analogy for the first argument is a subset of that for the second argument. The second condition restricts the ranking to arguments that satisfy the no-critical-difference condition. If A1 ≺ A2 on this ranking, and the two arguments support incompatible hypotheses, then it is entirely in keeping with our theory to disregard A1. The existence of a better analogy that supports an incompatible conclusion shows that there is no potential for generalizing the prior association in A1.

To illustrate how all of these ideas may be applied, let us expand the “Bellona” example above. Assume that we know of yet one more planet, Proserpina, that closely resembles Earth but does not support (and to the best of our knowledge has never supported) life. Suppose that table 4.3 represents the three analogical arguments; the initial critical factors are italicized.48

Here, G consists of three analogical arguments. The analogy with Earth is advanced to support the hypothesis that Bellona supports (or has supported) life; the other two support the contrary hypothesis. The table represents the first step in determining the critical factors for each argument in isolation. The second step is to determine the standard ranking of our three analogies. All three initially satisfy the no-critical-difference condition, and in all three cases, the italicized critical factors coincide exactly with the positive analogy. Consequently, the Proserpina analogy ranks above the Earth analogy, but nothing can be said about how either compares with Mars. It follows that we must add “no volcanoes” as a critical factor (in this case, part of the negative analogy)
TABLE 4.3. Multiple analogies

Source Domain | Target Domain
1. Earth [supports life]: atmosphere, water, large moon, warm, volcanoes | Bellona: atmosphere, water, large moon, cold, no volcanoes
2. Ancient Mars [no life]: atmosphere, water, no large moon, cold, volcanoes |
3. Proserpina [no life]: atmosphere, water, large moon, warm, no volcanoes |
48. Not all factors that are critical in one argument must be critical in other arguments. In each of the three analogical arguments, the prior association may omit certain planetary characteristics and still be acceptable.
for Earth. The Earth analogy no longer satisfies the no-critical-difference condition and may be disregarded. We are left with two prima facie plausible arguments that Bellona does not support life. We could salvage the Earth analogy by finding some further relevant characteristic shared by Earth and Bellona, but not by Proserpina. As always, our assessment of plausibility depends upon the particular form in which the analogical argument (or arguments) is stated. This concludes our formal account of prima facie plausibility. Setting aside the details, let us review why having such an account is important and, in many cases, good enough. In the first place, there is no need for a theory of analogical arguments to declare a winner every time. Frequently, several hypotheses may be worthy of investigation. This is especially true in subject areas where we have independent means of testing hypotheses. Second, a great deal of our reasoning with multiple analogies can be usefully modeled without going beyond prima facie plausibility. For instance, there are two especially effective strategies for criticizing an analogical argument: uncovering some critical disanalogy with the target, or trumping the argument by finding a better analogy that supports a rival hypothesis. Both of these strategies are ways to defeat the argument’s prima facie plausibility, so both of them have a place in our theory. In a similar vein, our theory of prima facie plausibility allows us to extend our ideas about the exploratory uses of analogy, a topic broached in the final paragraph of section 4.3. Analogical reasoning is more than an adversarial contest. Being able to draw on multiple analogies, some successful and some not, helps us to refine our model of the target domain.
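Returning to the expanded Bellona example of table 4.3, the two-step procedure (critical factors in isolation, then the standard ranking) can be traced mechanically. The following Python sketch is an illustration under stated assumptions: the boolean encoding of planetary characteristics and all function names are hypothetical, the critical factors in isolation are taken to coincide with each positive analogy (as the text states for this example), and the subset test is read as non-strict inclusion.

```python
# Assumed encoding: "warm" False means cold; "volcanoes" False means none.
bellona = {"atmosphere": True, "water": True, "large_moon": True,
           "warm": False, "volcanoes": False}
sources = {
    "Earth": {"atmosphere": True, "water": True, "large_moon": True,
              "warm": True, "volcanoes": True},
    "Ancient Mars": {"atmosphere": True, "water": True, "large_moon": False,
                     "warm": False, "volcanoes": True},
    "Proserpina": {"atmosphere": True, "water": True, "large_moon": True,
                   "warm": True, "volcanoes": False},
}


def positive(name):
    """Factors on which the source agrees with the target."""
    return {f for f, v in sources[name].items() if v == bellona[f]}


def negative(name):
    """Factors on which the source is known to differ from the target."""
    return {f for f, v in sources[name].items() if v != bellona[f]}


# Step 1: critical factors in isolation (here, the positive analogy).
critical = {name: positive(name) for name in sources}


def ranks_below(a, b):
    """Standard ranking: a is ranked below b if a's critical positive
    analogy is included in b's and neither has a critical difference."""
    subset = (critical[a] & positive(a)) <= (critical[b] & positive(b))
    no_difference = not (critical[a] & negative(a)) and not (critical[b] & negative(b))
    return subset and no_difference


def critical_relative_to_G(name):
    """Own critical factors plus those of every more highly ranked argument."""
    result = set(critical[name])
    for other in sources:
        if other != name and ranks_below(name, other):
            result |= critical[other]
    return result


def prima_facie_plausible(name):
    crit = critical_relative_to_G(name)
    return bool(crit & positive(name)) and not (crit & negative(name))


# Step 2: apply the ranking and see which arguments survive.
surviving = [name for name in sources if prima_facie_plausible(name)]
```

On this encoding only the Earth analogy is outranked (by Proserpina), it acquires "volcanoes" as a critical factor in its negative analogy, and the two surviving arguments are those from Ancient Mars and Proserpina.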
Qualitative Plausibility

Despite the preceding discussion, a complete theory of analogical reasoning should go beyond prima facie plausibility to explain how multiple analogies can be combined in an overall assessment of rival hypotheses. Such a theory would be especially useful in subject areas where we have no independent means of testing a hypothesis. That includes not only sciences that deal with the remote past but also philosophy and the law. I do not have such a theory, however, and will limit myself to making a few observations.

Our fundamental guiding principle, that the strength of an analogy is based on its potential for generalization, can help us in thinking about overall plausibility. Plainly, multiple analogies supporting the same hypothesis can strengthen that hypothesis, but in my view, two other considerations are more significant. The first is just the familiar point that the larger the extent of the positive analogy, the stronger the argument. Even if there is no natural way to measure this overlap, the guideline is still qualitatively useful. The second, and more important, basis for assessing overall plausibility is the level of detail in the prior associations of the relevant source domains.
It is helpful here to recall Evans’s ANALOGY program (section 3.2), which solves geometric analogy problems by looking for the most specific rule that transforms one picture into another. The more specific and detailed the prior association, the greater the potential for generalization. What is more, the models developed in the middle part of this chapter provide clues to the areas in which we should look for detail in the prior association. For instance, we saw that explanatory analogies are often vague about background assumptions and boundary conditions. We should focus on these details when comparing explanatory analogies for rival hypotheses. Our theory of individual analogical arguments concentrates on prima facie plausibility. I have not emphasized detail in the prior association, beyond meeting basic standards (the preconditions), except in selected discussions about ways to increase the qualitative plausibility of analogical arguments. As the remarks about generalization in the preceding paragraph suggest, it is possible to expand our theory along these lines. But these initial reflections also suggest that this expansion might best be accomplished within the various models for different types of analogical argument, rather than at the general level.
4.11 Conclusion

Analogical arguments exhibit enormous variety in strength, usefulness, and subject matter. In this chapter, I have argued that there is also considerable variation in logical type, as reflected in the classification scheme. I proposed a general theory for evaluating analogical arguments, based on two fundamental principles: prior association and potential for generalization. These principles govern all analogical arguments but are applied differently within each category. They need to be supplemented with subordinate models that both provide a basis for initial assessment and indicate fruitful directions for further study of an analogical argument. In this closing section, I discuss some of the main positive conclusions of the chapter as well as some limitations of the theory.

I have defended three main theses in this chapter. First, I have claimed that we can make headway in understanding analogical reasoning through a better understanding of individual analogical arguments. Second, I have maintained that to understand individual analogical arguments, we should focus first not on measuring overall similarity or constructing mappings between two domains, but rather on clarifying the exact nature of the relations within the source domain that are to be transferred to the target domain. That is what I have called the prior association. Finally, I have suggested that the most important objective for a normative theory of analogical arguments is not to introduce a quantitative measure of overall strength or even an algorithm for adjudicating between rival analogical arguments; rather, it is to define workable criteria for prima facie plausibility, which I interpret as demonstrable potential for generalizing the prior association.

What emerges from the chapter, I believe, is that there is considerable support for these theses, but (so far) the degree of support depends upon the
type of analogy. Support is strongest in the first four categories and weakest when we come to correlative analogies. As the prior association becomes less precise and less informative, it becomes harder to find in it much of a basis for evaluating plausibility. Increasingly, we fall back on something like physical similarity. Once noted, this limitation can be expanded into a broader criticism of the articulation model. Our criteria for evaluating analogical arguments depend upon relatively precise and generally acceptable models for proof, explanation, and so forth. It may be objected that the models I have proposed are neither precise nor acceptable. My reply has two parts. First, I reiterate a point made earlier: the models are intended to be acceptable to a reasonable critic. Certainly, if they fail to meet that standard, then they must be modified. The general approach, however, transcends the particular models. Indeed, I made it clear from the outset that the theory is open to revisions, in the form of substantive alteration or greater precision in representing a form of prior association, or entirely new species of analogical argument. But I remain convinced that if there are no relatively clear standards for the prior association, then there can be little meaningful discussion about the plausibility of an analogical argument. Second, taking up a theme broached in the preface, I emphasize that the objective of this book is to develop a normative theory of analogical arguments rather than a psychological model of a cognitive process. Given that objective, it is legitimate to take our paradigm cases of analogical reasoning to be highly structured arguments with a clear form of prior association. In this chapter, we began with mathematics and worked outwards to less structured cases of analogical reasoning. 
In order to address concerns about whether my theory has much to say about the least structured cases, however, we need to shift our attention from Hesse’s vertical relations (the prior association) to horizontal relations (similarity). In the next chapter, I identify some prominent patterns of similarity in mathematical analogies. These ideas are extended, in chapter 6, to a broader analysis of similarity and its role in analogical arguments. In both chapters, I show a strong connection between similarity and the second fundamental principle, potential for generalization. Together, the analysis and examples of the present chapter and the next two provide solid support for both of our fundamental principles.
5 Analogies in Mathematics
5.1 Introduction

This chapter is devoted to analogical arguments in mathematics. These analogies can be fascinating in their own right, but they also play a special role in my theory. They provide especially clear illustrations of the articulation model presented in chapter 4. At the same time, they allow us to refine and expand that theory in ways that serve as models for topics to be considered in future chapters: the role of similarity in evaluating analogical arguments, the philosophical foundations for analogical reasoning, and the uses of analogy in research programs.

First, consider similarity. Chapter 4 proposed evaluation criteria for analogical arguments based entirely on the prior association: the logical, statistical, or causal relation in the source domain around which the analogical argument is built. That chapter placed no restrictions on how we identify corresponding features of the source and target domains. Yet it is obvious that without restrictions on what counts as legitimate similarity, any theory of analogy will run into trouble. Section 5.2 presents some nontrivial mathematical examples that illustrate this problem.

Mathematics is an excellent place to start thinking about similarity and its role in analogical reasoning because mathematical similarities can be stated in precise terms. As Polya (1954) puts it, two systems are analogous “if they agree in clearly definable relations of their respective parts” (13). But even when we confine our attention to mathematics, the problem of modeling similarity can be overwhelming. The first few sections of this chapter show that we need to accommodate many different sorts of similarity in evaluating mathematical analogies. We should not expect to handle all of them with a single strategy, and I do not attempt to provide a comprehensive theory. Instead, sections 5.4 through 5.6 develop models for three important types of similarity, in effect giving us three special varieties of
mathematical analogy. By complementing (and refining) the basic theory of chapter 4 (specifically, section 4.4), these models enable us both to address the difficulties raised in section 5.2 and to deepen our understanding of analogical reasoning in mathematics. (In chapter 6, they provide a foundation for extending our analysis of similarity beyond mathematics.) Next, consider the problem of providing a philosophical foundation for analogical arguments. In chapter 2, we reviewed several unsuccessful attempts to solve this problem. Some of these attempts rest upon causal notions such as natural kinds, or causal assumptions such as Hesse’s “clustering postulate.” But causal notions and assumptions can play no part in the justification for analogical reasoning in mathematics. Consequently, if there is to be a common philosophical basis for analogical arguments in mathematics and other (empirical) disciplines, we should not look to causal principles for the core ideas. Apart from cutting off unpromising approaches to the problem, there are two additional reasons mathematics is an attractive setting to begin serious reflection about the justification of analogical arguments. The first is widespread agreement that analogical reasoning in mathematics works—that careful analogical arguments really do make their conclusions plausible. The second is the clear link between successful analogies and generalization. Mathematicians often view their work as part of an ongoing effort to understand and develop a relationship of analogy between two structures, with the hope of culminating in a general theory. My articulation model bases the plausibility of an analogical argument on its demonstration of just such a potential for generalization. If this theory is to be capable of justification, a promising place to start is with its mathematical applications. In section 5.7, I propose that the plausibility of analogical arguments in mathematics derives from considerations of symmetry. 
This foreshadows a general approach to justification that will be developed in chapter 8.

Finally, in section 5.8, I go beyond individual arguments to discuss how analogies play a larger intellectual role in mathematical research. In this regard, Polya’s remark about “clearly definable relations” stands in sharp contrast to Poincaré’s (1952a) opinion that analogies “not expressible by formula” are the most valuable ones, and that principles guiding discovery “must be felt rather than formulated” (57). The two views do not necessarily conflict, since Poincaré is reflecting on the psychological process of discovery while Polya’s emphasis, like mine, is on justifying assertions of plausibility. Still, the two activities are not entirely disconnected. I shall argue that having a sharp theory of analogical arguments is not merely compatible with Poincaré’s views but also illuminates the value of analogies in exploration.

Let me comment briefly on the examples in this chapter. For the most part, I have tried to use accessible examples. For cases that require moderately complicated mathematics, I try to summarize their implications for my theory in nontechnical terms. A chapter appendix provides additional applications and illustrations.
5.2 Three Challenges

This section raises three difficulties for the account of plausible mathematical analogies developed in chapter 4 (section 4.4). All of them signal the need to supplement that account by incorporating an analysis of the similarities between the source and target domains into our plausibility criteria.

In general, a mathematical conjecture is plausible if, in light of available evidence but short of decisive proof, there is enough support to warrant investigation. In the case of an analogical argument, the supporting evidence is that a similar result has already been proved. In chapter 4, I proposed that we should evaluate such arguments by assessing the potential for generalizing the proof used in the source domain to fit the target domain. As Polya (1954) puts it, that proof must be “fit for imitation” (46). I put forward two specific criteria for prima facie plausibility.

(5.2.1) Plausibility I
1. Overlap. Some explicit assumption in the proof must correspond to a fact known to be true in the target domain. (The positive analogy is nontrivial.)
2. No-critical-difference. No explicit assumption in the proof can correspond to something known to be false in the target domain. (No critical assumption belongs to the negative analogy.)

I have clarified the earlier formulation slightly by replacing “everything used in the proof” with “every explicit assumption in the proof.” If all such assumptions belong to the neutral or positive analogy, then the analogical argument counts as prima facie plausible; that is, there is a solid case for investigating the conjecture.

As I stressed in chapter 4, this test for plausibility is somewhat ambiguous. It is relative to our representation of the two domains; in particular, it is relative to how the proof is formulated in the source domain.
That proof has to be mathematically acceptable, but I have not tried to spell out what counts as a mathematically acceptable proof or, more generally, as an acceptable representation of facts about the source and target domains. Nor have I indicated how to identify the explicit assumptions in the proof. In my view, this lack of precision is tolerable because it reflects genuine ambiguity in what counts as a good analogical argument. Mathematicians can and do disagree about questions of plausibility, and the disagreements can often be traced to choices about representation. Our theory of analogical reasoning intentionally leaves such matters open. We cannot be equally sanguine, however, about ambiguity in identifying legitimate correspondence between features of the source and target domains. Without some restrictions, the theory is vulnerable to counterexamples. Indeed, it becomes trivial. The remedy for this unsatisfactory state of affairs, which will occupy us for much of the chapter, is to supply constraints on correspondence— that is, standards for acceptable or, as I shall say, admissible, similarity between our two domains.
Before embarking on this project, let me say something about the level of detail at which I shall deal with the problem of similarity. As we saw in chapter 3, computational theories devote considerable attention to the representation of similarity; they are preoccupied with theoretical and practical issues related to analogical mapping. I shall not attempt to match the level of detail in those theories. In order to explain what I hope to achieve, recall the rhetorical framework of advocate and critic presented in section 4.3. In essence, an acceptable analogical argument should satisfy an imaginary critic, whom I characterized as intelligent and sympathetic, but somewhat skeptical. In what follows, I have just this sort of critic in mind when setting standards for admissible similarity. By contrast, for most computational theories, we might imagine that the intended critic is unintelligent and indifferent, so that the standard of precision is correspondingly higher. These computational theories aim to construct a theory of similarity from scratch. My criteria for admissibility are not geared toward blocking all possible counterexamples, and they take for granted a lot of common ground between the advocate and the critic. I shall be concerned mainly with three problems that I take to be of particular importance and interest.
Specious Resemblance

The first problem is that an analogical argument may satisfy Plausibility I, yet be quite implausible because the resemblances between the two domains derive from contrived manipulations. In order to see that this is an interesting problem, let us begin by considering the following example, which I take to be a good analogical argument.

Example 5.1 (rectangles and boxes). Our source domain, S, consists of rectangles in the plane, with sides x, y > 0. Let Q be the theorem that of all rectangles with a fixed perimeter, the square has maximum area. The target, T, consists of rectangular boxes with sides x, y, z > 0. Let Q* be the conjecture that if perimeter is fixed, the largest such box is a cube. We argue by analogy that Q* is plausible.

According to Plausibility I, the first step in evaluating this analogical argument is to supply a proof of Q. Our assumption is that perimeter is constant, that is, x + y = c for some constant c. So the area xy is maximized when the product x(c − x) is maximized. A theorem of calculus (the First Derivative Test) tells us that this occurs when x = c/2. In conjunction with our assumption, this implies that y = c/2, and thus we have proved that x = y (i.e., that the rectangle with maximum area is a square).

The critical facts, those used explicitly in the proof, are the formulas for area and perimeter, the assumption that perimeter is constant, and the First Derivative Test. Looking at the target domain, we see that there are analogous formulas for volume and perimeter, and an analogous theorem about maxima for functions of two variables. This information is summarized in table 5.1. From Plausibility I, it follows that we have a plausible analogical argument. (In fact, the conclusion Q* is true.)
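The calculus behind the source proof, and the analogous computation in the target, can be written out explicitly. This derivation is a sketch added for illustration: the constant k (the fixed value of x + y + z when the box's perimeter is held constant) is notation introduced here, and the target computation is one way of confirming that Q* is true, not necessarily the argument the author has in mind.

```latex
\begin{align*}
\text{Source:}\quad & A(x) = x(c - x), \qquad A'(x) = c - 2x = 0
  \;\Longrightarrow\; x = \tfrac{c}{2}, \quad y = c - x = \tfrac{c}{2}. \\[4pt]
\text{Target:}\quad & V(x, y) = xy\,(k - x - y), \\
 & \frac{\partial V}{\partial x} = y\,(k - 2x - y) = 0, \qquad
   \frac{\partial V}{\partial y} = x\,(k - x - 2y) = 0, \\
 & 2x + y = k, \quad x + 2y = k
  \;\Longrightarrow\; x = y = \tfrac{k}{3}, \quad z = k - x - y = \tfrac{k}{3}.
\end{align*}
```

Since V vanishes whenever a side shrinks to zero and is positive for any genuine box, this single interior critical point is the maximum, so the box of largest volume is a cube.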
TABLE 5.1. Rectangles and boxes

Source (two-dimensional rectangles) | Target (three-dimensional boxes)
Area = x · y | Volume = x · y · z
Perimeter = 2(x + y) | Perimeter = 4(x + y + z)
First derivative test (one variable): at a maximum, Df(x) = 0 | First derivative test (two variables): at a maximum, Df(x, y) = 0
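The two criteria of Plausibility I amount to a simple check over the explicit assumptions of the source proof. The sketch below is a hypothetical Python encoding, not the author's formalism: each assumption is mapped to True (its analog is known to hold in the target), False (its analog is known to fail), or None (unknown), and the entries for Example 5.1 paraphrase table 5.1.

```python
def plausibility_I(correspondence):
    """Overlap: at least one assumption has an analog known to be true.
    No-critical-difference: no assumption has an analog known to be false."""
    statuses = list(correspondence.values())
    overlap = any(status is True for status in statuses)
    no_critical_difference = not any(status is False for status in statuses)
    return overlap and no_critical_difference


# Example 5.1, with assumption labels chosen for illustration.
rectangles_to_boxes = {
    "area formula (analog: volume formula)": True,
    "perimeter formula": True,
    "perimeter is held constant": True,
    "First Derivative Test (analog: two-variable version)": True,
}

verdict = plausibility_I(rectangles_to_boxes)
```

The check is only as good as the correspondence fed into it: nothing in the function constrains which target fact may be counted as the analog of a given assumption.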
Our test for plausibility seems to work quite well here. Yet our analysis takes it for granted that volume is the natural analog to area, and there are other possibilities. Let’s rewrite the formula for the area of a rectangle with sides x and y as

$$\mathrm{AREA}(x, y) = 3x^{2-2} + x^{2-1} y^{2-1} - 3\sin^{2-2} y,$$

which is, of course, equal to xy. Now we obtain a new three-dimensional analog:

$$\mathrm{VOLUME}(x, y, z) = 3x^{3-2} + x^{3-1} y^{3-1} z^{3-1} - 3\sin^{3-2} z.$$

The “resemblance” here is that both AREA and VOLUME are instances of the formula

$$3x_1^{n-2} + x_1^{n-1} \cdots x_n^{n-1} - 3\sin^{n-2} x_n,$$

putting n = 2 and n = 3, respectively. That gives us an analogical argument that of all boxes with a fixed perimeter, the one that maximizes VOLUME is a cube. What is more, the argument appears to satisfy the criteria of Plausibility I, although it is manifestly implausible. Let’s refer to this variant as Example 5.1a.

This is a clear case of specious resemblance, resting on a technical trick. Nobody would take the argument seriously. But how can we rule out such tricks in a principled way when we have stressed the importance of allowing flexibility in representing an analogical argument? If the example seems far-fetched, consider the following case.

Example 5.2 (Euler characteristic formula). Euler’s formula states that if the number of faces, edges, and vertices of a (convex) polyhedron are F, E, and V, respectively, then F + V = E + 2. For example, the equation holds good for a cube, which has 6 faces, 8 vertices, and 12 edges. It also works for tetrahedra, dodecahedra, and other solids.1 There is an even simpler formula for the two-dimensional case: V = E. Any polygon has the same number of edges and vertices. Polya (1954, 43) rewrites the two formulas in an ingenious way to exhibit an analogy:

$V - E + F = 1$ (polygons: 2 dimensions)
$V - E + F - S = 1$ (polyhedra: 3 dimensions)

1. See Lakatos, Worrall, and Zahar (1976) for the history and limitations of this formula.
The left-hand side of each equation is an alternating sum beginning with the number of vertices (0-dimensional elements), and proceeding through the number of edges (1-dimensional), faces (2-dimensional), and solids (3-dimensional elements).2 This looks like a case of specious resemblance, where new terms are introduced and old ones are manipulated in order to “create” an analogy. Yet it turns out to be an illuminating way to rewrite the two formulas. Indeed, it suggests analogous Euler formulas for higher dimensions, and these analogous formulas happen to be correct.3 A “manufactured” resemblance thus leads to an insightful analogy. In some ways, our difficulty is reminiscent of Goodman’s well-known puzzle: how do we distinguish between purely qualitative predicates (like blue and green) that are suitable for inductive reasoning, and predicates (like grue and bleen) that are artificial (Goodman 1965)?4 Goodman offers a pragmatic solution, relying on the concept of entrenchment or successful past projection. In essence, inductive generalization is restricted to predicates that have been successfully projected in the past. But entrenchment seems to have little place in creative mathematics. If we insist upon a “standard way” of doing mathematics, we might never find the innovative reformulation of the Euler formula.
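Returning to Example 5.2, the rewritten formulas can be checked with a few lines of Python. This is only an illustrative sketch: the helper names (`alternating_sum`, `cube_face_counts`) are hypothetical, the polyhedron counts are the standard ones, and the higher-dimensional check uses the n-dimensional cube, whose number of k-dimensional faces is assumed to be C(n, k) · 2^(n−k).

```python
from math import comb

def alternating_sum(face_counts):
    """V - E + F - S + ... for counts listed by dimension 0, 1, 2, ..."""
    return sum((-1) ** k * f for k, f in enumerate(face_counts))

# (V, E, F, S) for some convex polyhedra; S = 1 counts the solid itself.
polyhedra = {
    "tetrahedron": (4, 6, 4, 1),
    "cube": (8, 12, 6, 1),
    "octahedron": (6, 12, 8, 1),
    "dodecahedron": (20, 30, 12, 1),
}

def cube_face_counts(n):
    """Number of k-dimensional faces of the n-cube, for k = 0..n."""
    return [comb(n, k) * 2 ** (n - k) for k in range(n + 1)]

for name, counts in polyhedra.items():
    print(name, alternating_sum(counts))

# The same alternating sum for cubes of dimension 1 through 6.
for n in range(1, 7):
    print(n, cube_face_counts(n), alternating_sum(cube_face_counts(n)))
```

Every alternating sum printed here equals 1, which agrees with the claim that the rewritten formula extends to higher dimensions. A check on one family of solids is of course not a proof.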
Separated Domains Our second problem is how to deal with analogical arguments based on similarity between “widely separated” domains. An analogical inference between two neighboring dimensions is prima facie stronger than one between dimensions one and three, one and four, and so on. Perhaps “gappy” analogical arguments don’t reveal anything about plausibility, as in the following (slightly artificial) example.
[FIGURE 5.1. Punctured balls in one and three dimensions. Panels: punctured interval (from −1 to 1, with the point 0 removed); punctured sphere.]

2. For a polygon, F (the number of faces) is 1; for a polyhedron, S (the number of solid elements) is 1.
3. The full generalization involves concepts of algebraic topology; see Munkres (1984).
4. Grue is defined relative to an arbitrary fixed time t as green and examined before t, or blue and not examined before t. Bleen is defined similarly, reversing the positions of green and blue.
Example 5.3 (punctured balls): A punctured set is a set with one point removed, such as the interval (−1, 1) without the point 0 (pictured in figure 5.1). A set is disconnected if it can be written as the union of two disjoint nonempty open sets. Roughly, this means it has two or more separate, nontouching pieces. Lastly, the unit ball in $\mathbb{R}^n$ is the set of points whose distance to the origin is less than 1 unit. It is easy to show that, in one dimension, the punctured unit ball $(-1, 0) \cup (0, 1)$ is disconnected into two components. What should we say about arguing, by analogy, that the punctured unit ball in $\mathbb{R}^3$ is disconnected? It seems that we should reject the argument: one should look first at the two-dimensional case (where the punctured unit ball is not disconnected). Yet Plausibility I provides no restrictions that require looking first at “closer” cases.

Our test seems to fail us. This argument appears to satisfy Plausibility I, yet it is intuitively implausible. Once again, there is no quick way to fix the problem. Suppose we rule out any analogy between domains that are not “neighbors” (although as yet, we have no precise concept of neighboring domains). That will imply something much too strong: analogical arguments that depend upon parity are a priori implausible. But we know that some mathematical properties do depend upon parity. For example, polynomial equations of odd degree with real coefficients always have at least one real root, and that is not so if the degree is even. The Hairy Ball Theorem asserts that only in real vector spaces of even dimension is it possible to define a continuous, nowhere-vanishing tangent vector field on the surface of the unit ball. More picturesquely, in even dimensions we can comb a hairy ball flat without parting the hair in any way. We can’t do this if the dimension is odd. Sometimes, in thinking about a problem in three dimensions, the one-dimensional case might be a better guide than the two-dimensional case.
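The parity dependence in the Hairy Ball Theorem can be made concrete. The sketch below is not from the text: it uses the standard "rotate each coordinate pair" construction, with hypothetical helper names, to build a tangent field on the unit sphere when the ambient dimension is even. It illustrates only the positive half of the theorem; the impossibility in odd dimensions is not something a sample check can show.

```python
import math
import random

def rotate_pairs(x):
    """Map (x1, x2, x3, x4, ...) to (-x2, x1, -x4, x3, ...); needs even length."""
    if len(x) % 2:
        raise ValueError("pairing construction needs an even number of coordinates")
    v = []
    for i in range(0, len(x), 2):
        v.extend([-x[i + 1], x[i]])
    return v

def random_unit_vector(n, rng):
    """A random point on the unit sphere in n-dimensional space."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(c * c for c in x))
    return [c / norm for c in x]

rng = random.Random(0)
for n in (2, 4, 6):
    x = random_unit_vector(n, rng)
    v = rotate_pairs(x)
    dot = sum(a * b for a, b in zip(x, v))        # tangency: should be 0
    length = math.sqrt(sum(c * c for c in v))     # nonvanishing: should be 1
    print(n, dot, length)
```

The field is tangent (its dot product with the position vector is zero) and has unit length everywhere, so it never vanishes. With an odd number of coordinates, one coordinate is left without a partner and the construction is unavailable.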
Limits Similarities involving limits play a central part in many interesting mathematical analogies. The problem is that our theory seems to reject most of them because of the no-critical-difference requirement. Unlike the previous two problems, in which our theory gives intuitively implausible arguments a passing grade, the difficulty here is that our theory appears to be too strict. An excellent illustration of this type of analogy, and of the challenge it poses for our theory, is Euler’s beautiful exploitation of similarities between polynomials and power series. My presentation is based on Polya (1954, 17–21).

Example 5.4 (polynomials and power series). A polynomial of degree n is a function of the form $f(x) = a_0 + a_1 x + \dots + a_n x^n$. A power series is a function defined by an infinite sum of the form

$$f(x) = \sum_{n=0}^{\infty} a_n x^n;$$

for each x, the value f(x) is defined if the partial sums

$$\sum_{n=0}^{M} a_n x^n$$

converge to some limit. Each of these partial sums is a polynomial function, so that a power series is the limit of a sequence of polynomial functions. The concept of a limit is essential for the precise characterization of the correspondence between polynomials and power series. Euler uses the analogy between polynomials (the source domain) and power series (the target domain) to support two conjectures and to obtain a striking result, namely the infinite sum $1 + 1/4 + 1/9 + 1/16 + \dots = \pi^2/6$, or in concise notation:

$$\sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}.$$
Euler’s first conjecture equates an infinite sum and an infinite product. In the source domain (polynomials), it is not difficult to prove the following result:

$$a_0 + a_1 x + \dots + a_n x^n = a_0 (1 - x/\alpha_1) \cdots (1 - x/\alpha_n).$$

Here, the polynomial $f(x) = a_0 + a_1 x + \dots + a_n x^n$ has the n roots $\alpha_1, \dots, \alpha_n$ (counting multiplicities). So we can equate a finite sum (left side) with a finite product (right side). By analogy, if f is defined by a power series and has roots $\alpha_i$ (counting multiplicities), Euler conjectures that f is expressible as the infinite product

$$a_0 \prod_{i=1}^{\infty} (1 - x/\alpha_i).$$

Since sin x/x has the power series $\sin x / x = 1 - x^2/3! + x^4/5! - \dots$ and has roots $\pm k\pi$ for $k \neq 0$, Euler’s conjecture yields:

$$\sin x / x = (1 - x/\pi)(1 + x/\pi)(1 - x/2\pi)(1 + x/2\pi) \cdots = (1 - x^2/\pi^2)(1 - x^2/4\pi^2) \cdots$$

Euler’s second conjecture is to derive an identity from this first step. In the source domain, begin with the equation just discussed:

$$f(x) = a_0 + a_1 x + \dots + a_n x^n = a_0 (1 - x/\alpha_1) \cdots (1 - x/\alpha_n).$$

The coefficient of x in these two expressions must be the same, which gives us the relation:
$$a_1 = -a_0 (1/\alpha_1 + \dots + 1/\alpha_n).$$

The expression in parentheses is the sum of the reciprocals of all roots of f. Euler conjectures that the same relationship between the coefficients $a_0$ and $a_1$ holds for the infinite case, that is, for power series. Applying it to sin x/x, regarded as a power series in $x^2$ whose roots are $\pi^2, 4\pi^2, 9\pi^2, \dots$, he obtains

$$-1/3! = -(1)(1/\pi^2 + 1/4\pi^2 + \dots),$$

which, after simplification, gives the desired series for $\pi^2/6$.

Euler’s argument poses a challenge to our theory for the following reason. In proving the equality of the two representations of a polynomial, as finite sum and finite product, we use facts about the degree of the polynomial. But there is no analog for degree, and hence nothing corresponding to these results, for an infinite power series. According to Plausibility I, this counts as a critical difference. Thus, on our theory, it looks as if Euler’s argument is blocked, and consequently establishes nothing about plausibility. Other analogical arguments based on taking limits are blocked in similar ways. That looks bad for our theory, since Euler’s argument appears prima facie plausible. Again, there is no simple way to fix our theory. The no-critical-difference condition leads to trouble here, but it is at the heart of our account of plausible analogical arguments and is not easily modified.

To meet all three of the challenges raised in this section, we need to incorporate constraints on similarity into our evaluation criteria. As a first step, let’s make the following simple adjustment.

(5.2.2) Plausibility II
1. Overlap. Some explicit assumption in the proof must correspond admissibly to a fact known to be true in the target domain. (The positive analogy is nontrivial.)
2. No-critical-difference. No explicit assumption in the proof can correspond admissibly to something known to be false in the target domain. (No critical assumption belongs to the negative analogy.)
The only change from Plausibility I is that the correspondences have to be admissible—that is, an acceptable basis for an analogical argument. In sections 5.4 through 5.6, we shall develop models that help us to decide when the similarities in an analogical argument are admissible. These models will permit us to amend our basic test for plausibility and to deal with the three challenges just raised.
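As a numerical aside on Example 5.4, both of Euler’s conjectures for sin x/x can be compared against direct computation. The sketch below is only a sanity check under stated assumptions: the function names, the sample point x = 1.3, and the truncation at 100,000 terms are arbitrary choices, and agreement to a few decimal places is evidence, not proof.

```python
import math

def sinc_product(x, terms):
    """Truncated Euler product: prod over k = 1..terms of (1 - x^2 / (k^2 pi^2))."""
    p = 1.0
    for k in range(1, terms + 1):
        p *= 1.0 - x * x / (k * k * math.pi ** 2)
    return p

def basel_partial_sum(terms):
    """Partial sum 1 + 1/4 + 1/9 + ... + 1/terms^2."""
    return sum(1.0 / (n * n) for n in range(1, terms + 1))

x = 1.3  # arbitrary sample point
print(math.sin(x) / x, sinc_product(x, 100_000))
print(math.pi ** 2 / 6, basel_partial_sum(100_000))
```

In each printed pair the two numbers agree closely, which is the kind of confirmation that makes the argument look plausible even though, on Plausibility I, the lack of an analog for degree would block it.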
5.3 Isomorphism Before developing multiple models for different types of mathematical similarity, let us consider briefly whether a single strategy might succeed. Many people take
the concept of isomorphism to set the appropriate standard, the ideal that should guide all our thinking about similarity and its role in analogical reasoning. While this idea has the appeal of simplicity, it proves too rigid to accommodate many intuitively plausible analogical arguments, even within the field of mathematics.

Borrowed from model theory, the concept of isomorphism certainly does give us a precise way to capture the idea that two structures look alike. Suppose that I and I* are interpretations for a first-order language L, with respective domains D and D*. Let $R^I$, $f^I$, and $c^I$ be the denotations of relation, function, and constant symbols under I, and similarly for I*. We say that I and I* are isomorphic if there is a 1-1 function φ, mapping the objects in D onto the objects in D*, such that for all $x_1, \dots, x_n$ in D:

(5.3.1)
(I1) For each n-place relation symbol R in the language L, $R^I(x_1, \dots, x_n)$ if and only if $R^{I*}(\varphi(x_1), \dots, \varphi(x_n))$;
(I2) For each n-place function symbol f in the language L, $\varphi(f^I(x_1, \dots, x_n)) = f^{I*}(\varphi(x_1), \dots, \varphi(x_n))$; and
(I3) For each constant c in the language L, $\varphi(c^I) = c^{I*}$.
(Chang and Keisler 1977, 20–21)

For example, if our language L consists of a single two-place predicate, <, then two isomorphic interpretations are the positive integers with < interpreted as “less than,” and the negative integers with < interpreted as “greater than.” The mapping φ takes 1 to −1, 2 to −2, and so forth. One structure is the mirror image of the other.

This definition can be applied to analogies, but it has to be modified slightly. Instead of a common language with two interpretations, we have an analogy mapping φ defined on a select set of relations, functions, and objects of the source domain, S, and taking them to corresponding relations, functions, and objects in the target domain, T. Thus φ must map n-place relations to n-place relations, and similarly for functions.
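The mirror-image example can be spot-checked on a finite sample. The following is a minimal sketch with hypothetical names. It tests clause (I1) for the single predicate < on the integers 1 to 49 only, so it illustrates the definition and does not establish the isomorphism.

```python
from itertools import product

def phi(n):
    """Candidate isomorphism: send the positive integer n to -n."""
    return -n

def less_than(x, y):
    """Interpretation of "<" in the positive integers."""
    return x < y

def greater_than(x, y):
    """Interpretation of "<" in the negative integers."""
    return x > y

sample = range(1, 50)

# Clause (I1): the relation holds of (x, y) exactly when its
# counterpart holds of (phi(x), phi(y)).
preserves_relation = all(
    less_than(x, y) == greater_than(phi(x), phi(y))
    for x, y in product(sample, repeat=2)
)

# phi is 1-1 on the sample: distinct inputs give distinct images.
images = [phi(x) for x in sample]
injective_on_sample = len(set(images)) == len(images)

print(preserves_relation, injective_on_sample)
```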
The first modification, for each clause of definition (5.3.1), is to replace “in the language L” with “in the source domain.” Next, in clause (I1), replace $R^I$ and $R^{I*}$ with R and φ(R). With similar modifications to the other two clauses, we obtain a definition of analogy as an isomorphism relation:

(5.3.2)
(I1) for each n-place relation R in the source domain, $R(x_1, \dots, x_n)$ if and only if $\varphi(R)(\varphi(x_1), \dots, \varphi(x_n))$;
(I2) for each n-place function f in the source domain, $\varphi(f(x_1, \dots, x_n)) = \varphi(f)(\varphi(x_1), \dots, \varphi(x_n))$; and
(I3) for each designated object c in the source domain, φ(c) = φ(c).

Isomorphisms are perfect analogies. Two isomorphic domains have identical structure, as in the mirror-image example involving the positive and negative integers. Despite its precision, isomorphism holds little promise as a general model for analogy, even within mathematics. There are at least three difficulties. In the
first place, if two domains S and T are isomorphic, then any proof in S can be mapped to a proof in T. An analogical argument between isomorphic domains gives us certainty rather than plausibility. Since we want a nontrivial theory of analogical arguments (see section 4.1), we have good reason not to take isomorphism as the paradigm for the analogy relation.

A second difficulty, of a technical nature, is that analogy relations do not usually require correspondence between all objects of the two domains, but isomorphisms require a bijective (1-1 and onto) correspondence, the function φ in definition (5.3.2). There are many analogies between the real and complex numbers, including the fact that the usual arithmetic operations and many limit processes work in similar ways (since both sets are closed fields). Yet the real and complex numbers are not isomorphic because there is no bijection φ linking the two sets in such a way as to satisfy definition (5.3.2).

The third and most important reason to reject the “isomorphism” picture of analogy is that it is too restrictive to accommodate the diverse forms of similarity that enter into mathematical analogies. In particular, it cannot model geometric analogies such as those that occur in Example 5.1. In that example, we have a correspondence between a two-place function (area) and a three-place function (volume), which is not possible under an isomorphism. A similar problem applies to correspondences obtained by taking limits, as in Example 5.4. It is too restrictive to think of all similarities in mathematical analogies in terms of isomorphisms.

There is, however, one feature of isomorphisms that is untouched by the objections we have raised, and which seems to be a reasonable constraint on any analogy relation.

(5.3.3) Consistent Mapping Condition
An admissible analogy relation φ must be consistent: any designated relation, function, or object of the source domain S has at most one analog in the target domain.
In other words, φ is a function from designated relations, functions, and objects of S to those of T. The consistent mapping condition applies to a single analogical argument. Plainly, different analogical arguments between the same two domains can place different relations, functions, or objects in correspondence. In the next three sections my approach is to model “local” similarities that relate clusters of relevant properties and relations in the two domains, rather than thinking in terms of a single global mapping. Despite the failure of isomorphism as a general model for similarity, we shall see its influence in the syntactic characterization of algebraic and (to some extent) geometric similarity to which we now turn.
5.4 Algebraic Similarity An algebraic similarity is a mathematical property shared by a cluster of relations, functions, and objects in the two domains. For example, addition and multiplication
(of real numbers) are similar in that both operations are commutative (x + y = y + x and x · y = y · x) and associative ((x + y) + z = x + (y + z) and (x · y) · z = x · (y · z)). We characterize such similarities as second-order identities, in the following manner.

(5.4.1) Algebraic Similarity
An algebraic similarity between relations $R_1, \dots, R_m$, functions $F_1, \dots, F_n$, and constants $c_1, \dots, c_p$ of S, and corresponding $R_1^*, \dots, R_m^*$, $F_1^*, \dots, F_n^*$, and $c_1^*, \dots, c_p^*$ of T is expressed by a (second-order) propositional function

$$\Psi(\mathbf{R}_1, \dots, \mathbf{R}_m, \mathbf{F}_1, \dots, \mathbf{F}_n, \mathbf{c}_1, \dots, \mathbf{c}_p),$$

such that the two sentences

$$\Psi(R_1, \dots, R_m, F_1, \dots, F_n, c_1, \dots, c_p) \quad \text{and} \quad \Psi(R_1^*, \dots, R_m^*, F_1^*, \dots, F_n^*, c_1^*, \dots, c_p^*)$$

that result from substituting the respective relations, functions, and constants for the variables are true in the respective domains S and T.5

For example: commutativity of both addition (x + y = y + x) and multiplication (x · y = y · x) is an algebraic similarity expressed by the function

$$\Psi: (x)(y)(\mathbf{F}xy = \mathbf{F}yx).$$

Here, F is a two-place function variable; Ψ is true when either operation (+ or ·) is substituted for F. As a second example: distributivity of addition and multiplication, x(y + z) = xy + xz, is expressed by

$$\Psi: (x)(y)(z)(\mathbf{F}_1(x, \mathbf{F}_2(y, z)) = \mathbf{F}_2(\mathbf{F}_1(x, y), \mathbf{F}_1(x, z))).$$

That the distributive property holds in two domains, such as the real numbers and the complex numbers, counts as an algebraic similarity. If we substitute multiplication for $F_1$ and addition for $F_2$ we obtain a proposition that holds in both domains. In practice, there is no need to identify algebraic similarities at this level of abstraction. Instead, we simply rely on the obvious resemblance between the corresponding first-order propositions. The following example illustrates how algebraic similarity, in conjunction with Plausibility II, can be used to evaluate an analogical argument.
5. The bold variables (representing relations, functions, and constants) are free in Ψ. The other relation, function, and constant symbols denote items in the source and target domains.
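The second-order propositional functions of definition (5.4.1) behave like higher-order functions: they take an operation as argument and return a truth value. The sketch below makes that reading concrete. It rests on assumptions that are not in the text: the function names are hypothetical, the quantifiers range over a small finite sample of integers rather than the whole domain, and subtraction is included only as a contrasting non-example.

```python
from itertools import product
from operator import add, mul, sub

# Hypothetical finite sample standing in for the domain of quantification.
SAMPLE = [-3, -1, 0, 2, 5]

def commutative(F):
    """Psi(F): for all x, y, F(x, y) = F(y, x), tested on SAMPLE only."""
    return all(F(x, y) == F(y, x) for x, y in product(SAMPLE, repeat=2))

def distributes_over(F1, F2):
    """Psi(F1, F2): for all x, y, z, F1(x, F2(y, z)) = F2(F1(x, y), F1(x, z))."""
    return all(
        F1(x, F2(y, z)) == F2(F1(x, y), F1(x, z))
        for x, y, z in product(SAMPLE, repeat=3)
    )

# Substituting + or * for F makes the commutativity schema true; - does not.
print(commutative(add), commutative(mul), commutative(sub))

# Multiplication distributes over addition, but not the other way round.
print(distributes_over(mul, add), distributes_over(add, mul))
```

An algebraic similarity in this sense is just the fact that one and the same schema returns a true sentence for the operations substituted from each domain.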
[FIGURE 5.2. The circle and the hyperbola. Panels: the unit circle, with arc length a, sector area ½a, and coordinates cos a and sin a; the unit hyperbola $x^2 - y^2 = 1$, with sector area ½a and coordinates cosh a and sinh a.]

Example 5.5 (trigonometric and hyperbolic functions). The eighteenth-century mathematician John Playfair (1778), in an attempt to justify arithmetical operations on imaginary numbers, developed an interesting analogy between the unit hyperbola and the unit circle, and in particular between the hyperbolic and trigonometric functions. (Playfair’s geometric interpretation is illustrated in figure 5.2.) He began by noting a number of interesting correspondences, as shown in table 5.2.

TABLE 5.2. The hyperbola and the circle

| Unit Hyperbola | Unit Circle |
|---|---|
| Sectors, area ½a | Arcs, length a (or sectors, area ½a) |
| $\cosh(a) = \dfrac{e^{a} + e^{-a}}{2}$ | $\cos(a) = \dfrac{e^{ia} + e^{-ia}}{2}$ |
| $\sinh(a) = \dfrac{e^{a} - e^{-a}}{2}$ | $\sin(a) = \dfrac{e^{ia} - e^{-ia}}{2i}$ |
| 1 (constant) | i (constant) |
It was well known that propositions about real numbers could be “proven” by formal manipulations involving complex numbers, but there was considerable debate about the merits of such proofs in light of philosophical misgivings about the status of imaginary numbers. For example, the double angle formula for cosines can be derived from the above equations as follows:

$$\cos(2a) + 1 = \frac{e^{2ia} + e^{-2ia} + 2}{2} = \frac{(e^{ia} + e^{-ia})^2}{2} = 2\cos^2 a. \qquad (5.4.2)$$

Playfair provided an interesting defense for this type of derivation. He wrote that any such deduction is “unintelligible” per se, but “points out a perfectly
precise” valid deduction for the hyperbolic functions, making no use of imaginaries. This valid deduction can be obtained by direct substitution of 1 for i in the original derivation. For example, the above derivation is readily transformed into a proof of an expansion formula for hyperbolic cosine:

$$\cosh(2a) + 1 = \frac{e^{2a} + e^{-2a} + 2}{2} = \frac{(e^{a} + e^{-a})^2}{2} = 2\cosh^2 a. \qquad (5.4.3)$$

In Playfair’s view, this identity for hyperbolic functions provides analogical support for the original theorem (and a justification for the formal manipulations with imaginary numbers). The same sort of relationship holds for other trigonometric identities, such as

$$\sin a \cos b = \tfrac{1}{2}[\sin(a + b) + \sin(a - b)] \qquad (5.4.4)$$

and

$$\sinh a \cosh b = \tfrac{1}{2}[\sinh(a + b) + \sinh(a - b)]. \qquad (5.4.5)$$

Now as Playfair noted, the procedure sometimes fails. For instance,

$$\sin a \sin b = \tfrac{1}{2}[\cos(a - b) - \cos(a + b)] \qquad (5.4.6)$$

is true, but

$$\sinh a \sinh b = \tfrac{1}{2}[\cosh(a - b) - \cosh(a + b)] \qquad (5.4.7)$$

is false. Similarly,

$$\sin^2 a + \cos^2 a = 1 \qquad (5.4.8)$$

is true, while

$$\cosh^2 a + \sinh^2 a = 1 \qquad (5.4.9)$$

is false. What goes wrong in these cases is that the derivations for the identities (5.4.6) and (5.4.8) make use of the assumption $i^2 = -1$. Since $1^2 \neq -1$, Playfair’s substitution procedure fails to yield valid derivations for the equations (5.4.7) and (5.4.9).

Let me now connect this example to my theory. I am interested in the analogy that runs from the trigonometric functions to the hyperbolic functions
(opposite to the direction on which Playfair focuses). This analogy concerns whether a formally correct derivation of a trigonometric identity such as equation (5.4.2), (5.4.4), (5.4.6) or (5.4.8) can be transformed, line by line, into a derivation for the analogous hyperbolic identity. Our theory correctly indicates that the identities (5.4.3) and (5.4.5) are plausible, because every critical assumption belongs to the positive analogy. By contrast, the analogical arguments for the conjectured identities (5.4.7) and (5.4.9) are blocked by the no-critical-difference condition. They require the critical assumption $i^2 = -1$, which belongs to the negative analogy.

This example provides a good illustration for our amended theory of plausibility, Plausibility II. Playfair’s analogy rests entirely upon algebraic similarities, reflected in the parallel structure of the relevant functions and assumptions. In this case, our test for plausibility works well, correctly sorting out which identities can be transferred from the trigonometric to the hyperbolic case. That is as much as we can hope for in a theory of analogical arguments.

The example suggests that algebraic similarity can be integrated with our basic theory in a relatively straightforward manner. That might seem surprising, since the definition of algebraic similarity appears to allow for trivial or specious resemblances. For instance, the operations of adding two numbers and taking the minimum of two numbers are both commutative. This similarity appears slight, yet according to our definition it counts as a legitimate algebraic similarity. We might worry that this leads to a theory that finds specious analogical arguments to be plausible. Our theory takes care of this worry by offering an independent basis for determining which assumptions are critical: namely, those that occur explicitly in the proof.
Thus, if we are extrapolating a result proved using only the commutativity of addition, then x + y and minimum (x, y) are appropriately similar. Otherwise, this similarity is either irrelevant or insufficient to secure the plausibility of the analogical argument. Every algebraic similarity is admissible. This type of similarity is quite unproblematic—and when we reflect on the fact that it is really a form of identity, that is not surprising. To bring out the difficulties noted in section 5.2 and to examine more interesting analogical arguments, we turn to other kinds of mathematical similarity.
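Looking back at Example 5.5, the sorting of the four hyperbolic identities can be confirmed numerically. The sketch below is an illustration only: the helper `holds`, the sample points, and the tolerance are hypothetical choices, and each identity is written as a pair (left side, right side) exactly as it appears in equations (5.4.3), (5.4.5), (5.4.7), and (5.4.9).

```python
import math

def holds(identity, samples, tol=1e-9):
    """True if the two sides agree (within tol) at every sample point."""
    return all(
        math.isclose(*identity(a, b), rel_tol=tol, abs_tol=tol)
        for a, b in samples
    )

samples = [(0.3, 1.1), (-0.7, 0.4), (1.5, -2.0)]  # arbitrary test points

identities = {
    "5.4.3": lambda a, b: (math.cosh(2 * a) + 1, 2 * math.cosh(a) ** 2),
    "5.4.5": lambda a, b: (math.sinh(a) * math.cosh(b),
                           0.5 * (math.sinh(a + b) + math.sinh(a - b))),
    "5.4.7": lambda a, b: (math.sinh(a) * math.sinh(b),
                           0.5 * (math.cosh(a - b) - math.cosh(a + b))),
    "5.4.9": lambda a, b: (math.cosh(a) ** 2 + math.sinh(a) ** 2, 1.0),
}

for label, identity in identities.items():
    print(label, holds(identity, samples))
```

The two identities whose trigonometric derivations avoid $i^2 = -1$ pass the check, and the two that depend on it fail. This matches the verdicts delivered by the no-critical-difference condition.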
5.5 Geometric Similarity Two functional or relational expressions (henceforth just expressions) are geometrically similar if they look the same up to a change in the value of some parameter or parameters. In characterizing them, I limit my attention to cases involving a single parameter. It helps to begin with a few examples of expressions that exhibit geometric similarity:
• Area(x, y) = x · y and Volume(x, y, z) = x · y · z, from Example 5.1
• Length(x, y) = $(x^2 + y^2)^{1/2}$ and Length(x, y, z) = $(x^2 + y^2 + z^2)^{1/2}$ (2- and 3-dimensional length)
• $f(x) = x^2$ and $g(x) = x^4$
• $f(x) = x^{n/4}$ and $g(x) = x^{n/2}$

In these examples, the parameter can be a positive integer or a real number. Furthermore, the parameter may indicate the number of arguments, as in the first two examples, or it may function as an individual constant, as in the last two. That gives us three types of geometric similarity (since number of arguments must be an integer).

(5.5.1) Geometric Similarity
Two expressions F and F* are geometrically similar if one of the following three cases obtains:
a. $F = \mathbf{F}(m)$ and $F^* = \mathbf{F}(n)$, where $\mathbf{F}(k)$ is an expression parameterized by a positive integer k in some range including m and n.
b. $F = \mathbf{F}(a)$ and $F^* = \mathbf{F}(b)$, where $\mathbf{F}(t)$ is an expression parameterized by a real number t in some range including a and b.
c. $F = \mathbf{F}(x_1, \dots, x_m)$ and $F^* = \mathbf{F}(x_1, \dots, x_n)$, where $\mathbf{F}(x_1, \dots, x_k)$ is an expression with k arguments, defined for a range of values of k including m and n.

Each definition involves a one-parameter family of relational or functional expressions. The similarity is captured by the common form, $\mathbf{F}$. The two similar expressions are obtained by supplying distinct values for the parameter. The parameter space may be a set of integers, as in cases (a) and (c), or a set of real numbers, as in case (b). As examples of each case, we have the following pairs:
a. $f(x) = x^2$ and $f^*(x) = x^4$ (both instances of $\mathbf{F}(k) = x^k$);
b. $f(x) = x^{n/4}$ and $f^*(x) = x^{n/2}$ (both instances of $\mathbf{F}(t) = x^{tn}$);
c. Area(x, y) = x · y and Volume(x, y, z) = x · y · z (both instances of $\mathbf{F}(x_1, \dots, x_k) = x_1 \cdots x_k$).

Geometric similarity is highly sensitive to form. It is not preserved under mathematical and logical equivalence. For instance, $F(x_1, x_2) = x_1 x_2$ and $F^*(x_1, x_2, x_3) = x_1 \sin(x_2 x_3)$ are mathematically equivalent to the instances n = 2 and n = 3 of

$$\mathbf{F}(x_1, \dots, x_n) = x_1 \cdots x_n + (n - 2)[x_1 \sin(x_2 \cdots x_n) - x_1 \cdots x_n].$$
But F and F* are not themselves instances of $\mathbf{F}$, so they are not geometrically similar. Without this restriction, any two expressions could be shown to be geometrically similar and the relation would be trivial.

Let’s look at how geometric similarity functions in analogical arguments. It will quickly become clear that our purely syntactic definition is inadequate, and for the very reasons that emerged in the challenges of section 5.2. We begin with a set of examples of which the first few have already been introduced.

Example 5.1 (rectangles and boxes; from section 5.2). This analogy rests on geometric similarities. The formulas for two- and three-dimensional perimeter are related as instances of Perimeter(n) = $2^{n-1}(x_1 + \dots + x_n)$, with n = 2 and n = 3. Similarly, area and volume are instances of generalized volume, Volume(n) = $x_1 \cdots x_n$. Assuming the geometric similarities are admissible, the analogical argument is plausible. So far, so good. But trouble arises right away. In section 5.2, we presented a superficially similar but intuitively implausible analogical argument, Example 5.1a. That example was constructed so that all of the relevant functions satisfy our purely syntactic definition of geometric similarity. This is the problem of specious resemblance.

Example 5.2 (Euler formula; from section 5.2). The similarities between the different formulas for polygons, polyhedra, and so forth can be fully represented in terms of a single parameter (dimension), and again they satisfy our definition of geometric similarity.

Example 5.3 (punctured sphere; from section 5.2). This analogy between punctured intervals in one dimension and punctured spheres in three dimensions once again satisfies our definition of geometric similarity, yet is not obviously plausible. This is the problem of separated domains.
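As an aside on definition (5.5.1), case (c), a one-parameter family indexed by the number of arguments corresponds naturally to a function that accepts any number of arguments. The sketch below uses hypothetical function names and arbitrary sample values. It writes out the Perimeter and Volume families from Example 5.1, together with the contrived family whose n = 2 and n = 3 instances are mathematically equivalent to $x_1 x_2$ and $x_1 \sin(x_2 x_3)$.

```python
import math

def perimeter(*sides):
    """Perimeter(n) = 2^(n-1) * (x1 + ... + xn) for an n-dimensional box."""
    return 2 ** (len(sides) - 1) * sum(sides)

def volume(*sides):
    """Volume(n) = x1 * ... * xn (ordinary area when n = 2)."""
    return math.prod(sides)

def contrived(*xs):
    """x1...xn + (n - 2) * [x1 * sin(x2...xn) - x1...xn]."""
    n = len(xs)
    full = math.prod(xs)
    return full + (n - 2) * (xs[0] * math.sin(math.prod(xs[1:])) - full)

print(perimeter(2, 3), perimeter(2, 3, 4))
print(volume(2, 3), volume(2, 3, 4))

# The contrived family agrees with x1*x2 at n = 2 and x1*sin(x2*x3) at n = 3.
print(contrived(2.0, 3.0), 2.0 * 3.0)
print(contrived(2.0, 3.0, 4.0), 2.0 * math.sin(3.0 * 4.0))
```

The contrived family shows how cheaply a common form can be manufactured for two unrelated expressions, which is why the definition is tied to the written form of the expressions and not to their values.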
These examples make it clear that geometric similarity must meet substantive criteria beyond definition (5.5.1) to count as admissible, to be an acceptable basis for a plausible analogical argument. But precisely what these criteria should be is a difficult question, and one that I shall not entirely resolve. Before taking on this task, I offer one final example. The example is important, in part because it is an analogical argument that fails even though it is (according to my theory) plausible.

Example 5.6 (abelian groups). This example relies on elementary group theory, so I start with some mathematical background. A group is a set closed under an operation (multiplication) that satisfies some basic properties: associativity, the existence of an identity, and the existence of inverses. For example, the set of rotations around the origin in the plane is a group. To multiply two rotations is to follow one with another. This operation satisfies all of the group properties: it is associative, the identity element is rotation through zero degrees, and the inverse of a rotation of k degrees is a rotation of −k degrees. A group G is said to be abelian, or commutative, if ab = ba for any two elements a, b of G. For example, the group of two-dimensional rotations is abelian: the net effect of two rotations is the same regardless of which is performed first. Finally, if x is a member of a group G and m is a positive integer, then $x^m$ signifies x multiplied by itself m times. For example, if x is a rotation through 20 degrees, then $x^3$ is 3 rotations through 20 degrees, which is equivalent to rotation through 60 degrees.

Now to the analogical argument, which is based on the following elementary problem in group theory (Herstein 1975): if G is a group such that $(ab)^3 = a^3 b^3$ for all a, b in G, must G be abelian? Analogical reasoning seems to make a positive answer plausible. For we can prove the analogous result that if $(ab)^2 = a^2 b^2$ for all a, b in G, then G is abelian. Here is the proof. If $a, b \in G$, then:

$abab = aabb$ (by assumption and associativity)
$\Rightarrow bab = abb$ (using the left cancellation law: $ax = ay \Rightarrow x = y$)
$\Rightarrow ba = ab$ (using the right cancellation law: $xb = yb \Rightarrow x = y$).

TABLE 5.3. Abelian groups

| Source (S) | Target (T) |
|---|---|
| For all a, b, $(ab)^2 = a^2 b^2$ | For all a, b, $(ab)^3 = a^3 b^3$ |
| Associative | Associative |
| Cancellation Laws | Cancellation Laws |
| Q: Abelian | ⇒ Q*: Abelian |
The proof uses only the definition of exponentiation, associativity, and the two cancellation laws. According to my theory, these are the critical facts. My theory suggests that we have a good analogical argument. The source and target domains are groups satisfying analogous conditions, as shown in table 5.3. Q is the statement that S is abelian, which (by the above proof ) is true. Q* is the conjecture that T is abelian. There are both algebraic and geometric similarities between S and T. The general properties of groups, associativity, and the cancellation laws count as algebraic similarities. The two identities at the head of each column exhibit geometric similarity. Assuming that all of these similarities are admissible, and noting that everything that is critically relevant in our proof of Q has a true analog in T, our theory tells us that Q* is plausible. In fact, Q* is false.6 This example is important for three reasons. In the first place, we have an analogical argument that is both putatively plausible and a failure. That establishes the nontriviality of our theory, for plausibility should not always line up with success. Second, as a closely related point, the example reminds us that we need to characterize plausibility in a way that makes it at least partly independent of success. Our theory suggests that, short of attempting a proof, this analogical argument counts as plausible because there are no obvious critical disanalogies between the source and target domains. Third and finally, this example reinforces the lesson we have already drawn from Example 5.1 and its variant Example 5.1a: we need an account of admissible geometric similarity that
6. A modified result is true: if $(ab)^3 = a^3 b^3$ for all a, b in T, and the number of elements in T is not divisible by 3, then T is abelian.
supplements the purely syntactic definition (5.5.1). Toward that end, I begin by proposing solutions to the first two problems from section 5.2.
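The claim in Example 5.6 that Q* is false can be confirmed by brute force. The counterexample used below is not given in the text; it is one standard choice, the Heisenberg group over the integers modulo 3, with elements coded as triples and with hypothetical function names. It has 27 elements, consistent with the modified result, which needs the group order not to be divisible by 3.

```python
from itertools import product

P = 3  # all arithmetic is modulo 3

def mul(g, h):
    """Product in the Heisenberg group over Z_3, elements coded as triples."""
    a, b, c = g
    d, e, f = h
    return ((a + d) % P, (b + e) % P, (c + f + a * e) % P)

def power(g, m):
    """g multiplied by itself m times, starting from the identity."""
    result = (0, 0, 0)
    for _ in range(m):
        result = mul(result, g)
    return result

G = list(product(range(P), repeat=3))  # all 27 elements

def satisfies_power_law(m):
    """Does (ab)^m = a^m b^m hold for every pair a, b in G?"""
    return all(
        power(mul(a, b), m) == mul(power(a, m), power(b, m))
        for a in G for b in G
    )

is_abelian = all(mul(a, b) == mul(b, a) for a in G for b in G)
print(satisfies_power_law(3), satisfies_power_law(2), is_abelian)
```

The group satisfies the cube condition without being abelian, so the target conjecture fails. It does not satisfy the square condition, as the proved source result requires of any nonabelian group.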
Specious Resemblance

The problem of specious resemblance is that two expressions with no meaningful similarity can be rewritten so as to satisfy the purely syntactic criterion (5.5.1). Yet creative forms of representation are an important part of analogical reasoning in mathematics. We seek a balance that allows us to distinguish between legitimate and specious manipulations. Table 5.4 summarizes facts (columns 1 and 4) and intuitions (columns 2 and 3) about the examples that bear on this problem. To “solve” the problem of specious resemblance, we need to account for the facts and intuitions in the table. We must explain why, for instance, the manipulations of the Euler formula may be legitimate while those in the highly contrived Example 5.1a are not.

My proposed solution is roughly as follows. An admissible geometric similarity must both satisfy the syntactic conditions of definition (5.5.1) and be motivated by the proof that serves as the prior association on which the analogy is based. Thus, in Example 5.1a, AREA and VOLUME meet criteria (5.5.1) for geometric similarity, and AREA is mathematically equivalent to Area, as shown in table 5.5. Yet if we consider the proof (section 5.2) for the theorem that of all rectangles with fixed perimeter, the square has maximum Area, we find no motivation for rewriting Area as AREA. I suggest, then, that the way to deal with specious resemblance is with the following requirement of internal coherence:

(5.5.2) Internal Coherence
For a geometric similarity to count as admissible, the relevant relations and functions should be expressed using standard representation, unless some justification internal to the domain can be given for a nonstandard representation. More specifically, any novel representation should be justified in terms of the proof that is the prior association for the analogical argument.

TABLE 5.4. Specious resemblance: summary of results

| Example | Satisfies def. (5.5.1)? | Legitimate manipulation? | Plausible analogy? | Successful analogy? |
|---|---|---|---|---|
| Example 5.1 (boxes) | Yes | Yes | Yes | Yes |
| Example 5.1a (boxes: variant) | Yes | No | No | No |
| Example 5.2 (Euler formula) | Yes | Maybe | Maybe | Yes |
| Example 5.6 (abelian groups) | Yes | Yes | Yes | No |
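TABLE 5.5. Contrived version of area/volume analogy

| Two Dimensions | Three Dimensions |
|---|---|
| $\text{Area} = x \cdot y$ | $\text{Volume} = x \cdot y \cdot z$ |
| $\text{AREA} = 3x^{2-2} + x^{2-1}y^{2-1} - 3\sin^{2-2} y$ | $\text{VOLUME} = 3x^{3-2} + x^{3-1}y^{3-1}z^{3-1} - 3\sin^{3-2} z$ |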
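The contrived expressions in table 5.5 can be checked numerically. The sketch below is only an illustration: it assumes the reading AREA = 3x^(2-2) + x^(2-1)y^(2-1) - 3 sin^(2-2) y and VOLUME = 3x^(3-2) + x^(3-1)y^(3-1)z^(3-1) - 3 sin^(3-2) z, and the function names are hypothetical. On that reading, AREA agrees with Area at every sampled point, whereas VOLUME differs from Volume and changes when x and y are interchanged.

```python
import math


def area(x, y):
    return x * y


def area_contrived(x, y):
    # AREA with parameter n = 2 (assumed reading of table 5.5).
    n = 2
    return (3 * x ** (n - 2)
            + x ** (n - 1) * y ** (n - 1)
            - 3 * math.sin(y) ** (n - 2))


def volume(x, y, z):
    return x * y * z


def volume_contrived(x, y, z):
    # VOLUME with parameter n = 3 (assumed reading of table 5.5).
    n = 3
    return (3 * x ** (n - 2)
            + x ** (n - 1) * y ** (n - 1) * z ** (n - 1)
            - 3 * math.sin(z) ** (n - 2))


samples = [(1.0, 2.0, 3.0), (0.5, 4.0, 1.5), (2.0, 2.5, 0.25)]

for x, y, z in samples:
    # AREA reduces to 3 + x*y - 3, so it is Area in disguise.
    assert math.isclose(area_contrived(x, y), area(x, y))
    # VOLUME is a genuinely different function from Volume ...
    assert not math.isclose(volume_contrived(x, y, z), volume(x, y, z))
    # ... and it is not invariant under swapping x and y (here x != y).
    assert not math.isclose(volume_contrived(x, y, z),
                            volume_contrived(y, x, z))
```

The check says nothing about motivation, which is the real issue in the text; it only confirms that the syntactic match between AREA and VOLUME is bought at the price of an expression unrelated to Volume.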
The real problem lies not with representations that are complex or novel but, rather, with those that have no independent motivation. The proposal is that any novel way of representing features of the source or target domain must have a motivation that is independent of the analogy and its purposes, a rationale internal to the domain. More specifically, the novel representation should be linked to the proof that is the basis of the analogical argument. In Example 5.1a, the only way to motivate the artificial formula AREA is to point to the function VOLUME and say, “I want the formula for area to look like that (so as to provide analogical support for my conjecture).” The motivation is external to the source domain, which makes the reformulation illegitimate. There is another way to argue for requirement (5.5.2). The problem that it is meant to solve can be viewed as a problem about hidden assumptions. The expressions for AREA and VOLUME lack an important property possessed by the original expressions for Area and Volume: symmetry between x and y. The formulas for Area and Volume remain the same if we interchange x and y, but that does not hold for AREA and VOLUME. Symmetry is exploited in the proof on which the analogy in Example 5.1 is based, but the manipulations destroy the symmetry. That raises an important general problem: in evaluating an analogical argument, how do we know whether or not there are relevant assumptions that are not explicit? Features such as symmetry may be important in a mathematical proof, and highly relevant to the correct assessment of an analogical argument, even though they are not made explicit. The problem is that our theory seems to make no provision for identifying these tacit yet critical disanalogies. The problem is interesting because we might well think that demanding inferential explicitness in a proof is all that we could possibly want when it comes to mathematical clarity. 
A proof is inferentially explicit if each step is (or can be) justified as following from earlier steps by the application of an elementary rule or principle. This type of explicitness, familiar to logicians, is adequate so long as we are only interested in assessing the validity of a mathematical argument. When we turn to the inductive side of mathematics, however, and specifically to analogy, we have to acknowledge the importance of tacit features. We may not have noticed them, yet they can be critical to an analogical argument. We might conjecture that there is an ideal of pragmatic explicitness, important for inductive arguments in mathematics, that complements the traditional ideal of inferential explicitness in representing deductive arguments. A proof is pragmatically explicit if it spells out all important hidden assumptions (such as symmetry). It should be clear, however, that this idea is a nonstarter. The ideal of pragmatic explicitness is unattainable, or at least something that we can never be sure of having attained. We can agree that a proof is sufficiently explicit to count as valid, but we can never be sure that we have attained pragmatic explicitness.
Accordingly, our theory defines critical relevance using the standard of inferential explicitness. It would be a mistake to resolve the problem of specious resemblance (or hidden assumptions) by insisting that all proofs be pragmatically explicit, even if we could be clear about what that meant. In the first place, any proof will almost always have interesting hidden features. In the second place, the success of an analogical argument often hinges on the fact that some or all of those tacit features are irrelevant—that is, no obstacle to generalization. To require that we identify every tacit feature about a proof in the source domain would hobble analogical reasoning. It is enough to require that all assumptions used in an inferentially explicit proof be capable of imitation in the target. When an analogical argument meets this standard yet supports a conjecture that happens to be false, that failure brings out the importance of the hidden features but it does not show that the original conjecture was implausible. Our best hope, then, is the somewhat vague criterion of internal coherence, requirement (5.5.2). We have seen that this requirement lets us deal effectively with Example 5.1 and Example 5.1a. In my view, it also succeeds with Example 5.6 (abelian groups). There is no specious resemblance, so the similarity is admissible and the analogical argument counts as plausible, in spite of the fact that it fails. What about the Euler formula, Example 5.2? Polya’s manipulations (see section 5.2) are certainly motivated by the desire to display an analogy between two and three dimensions, but that is acceptable only if they also have a natural motivation independent of the analogical argument. Let us see what our theory says about this example. In the first place, what is the analogical argument? 
Let us focus on the argument from the two- and three-dimensional Euler formulas to the analogous formula in four dimensions:

$$V(\text{vertices}) - E(\text{edges}) + F(\text{faces}) - S(\text{solids}) + K(\text{4-dimensional objects}) = 1$$

If we have no more to go on than the formulas for two and three dimensions (see the initial presentation of Example 5.2 in section 5.2), with inductive support but no formal proof, then there is no prior association and, according to our theory, the analogical argument establishes nothing at all about plausibility.7

The first person to prove the three-dimensional Euler formula in a rigorous manner was Cauchy. Cauchy’s proof, restricted to convex solids, involves first collapsing the solid to a two-dimensional structure of vertices, edges, and regions, then adding edges until all regions in the complex are triangular, and finally deleting edges and vertices down to a single triangle, all the while ensuring that each transformation makes no difference to the sum of $V - E + F$. I submit that relative to such a proof, Polya’s way of writing the Euler formula is sufficiently motivated, so that the similarity between the three- and four-dimensional cases may be considered admissible. Our theory does a fair job of explaining our intuitions about the Euler example.

Of course, we can find plenty of objections to the requirement of internal coherence. With a little ingenuity, one could probably produce an “internal” justification even for a case as artificial as Example 5.1a. I acknowledge the problem, but still maintain that the requirement imposes the right kind of constraint on representation.

A second objection is that requiring internal coherence is too conservative. Too many geometric similarities become inadmissible because they depend upon unusual representations. Recall, however, that the advocate has considerable latitude in drawing up the proof, and can probably motivate any reasonable change in representation. Remember that our critic is moderately sympathetic.

In the end, it must be conceded that I have not completely resolved the problem. The requirement of internal coherence is not a precise condition, but only a general guideline about representation. Disagreement about the plausibility of a mathematical analogy may well turn on whether some novel representation can be made to appear legitimate, and the theory I have put forward does not settle all such disputes.

7. If this strikes you as too rigid, I ask you to defer judgment until section 5.8.
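The alternating-sum form of the Euler formula discussed in this subsection can be spot-checked on two familiar families of polytopes. The sketch below is an illustration only, not a proof and not Cauchy's argument. It assumes that the formula counts the top-dimensional object itself once (so a polygon has F = 1 and a solid has S = 1), and it uses the standard face counts for simplexes and hypercubes; the function names are hypothetical.

```python
from math import comb


def simplex_face_counts(n):
    """Numbers of k-dimensional faces (k = 0..n) of an n-dimensional simplex."""
    return [comb(n + 1, k + 1) for k in range(n + 1)]


def hypercube_face_counts(n):
    """Numbers of k-dimensional faces (k = 0..n) of an n-dimensional hypercube."""
    return [2 ** (n - k) * comb(n, k) for k in range(n + 1)]


def alternating_sum(counts):
    """V - E + F - S + ..., with the top-dimensional object counted once."""
    return sum((-1) ** k * c for k, c in enumerate(counts))


for n in (2, 3, 4):
    for counts in (simplex_face_counts(n), hypercube_face_counts(n)):
        # Each line shows the dimension, the face counts, and the alternating sum.
        print(n, counts, alternating_sum(counts))
```

For the triangle, square, tetrahedron, cube, 4-simplex, and tesseract, the alternating sum is 1 in every case, which is the pattern the four-dimensional conjecture extends.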
Separated Domains

Our second problem is to deal with geometric similarity between “widely separated” domains. In Example 5.3 (punctured balls), the analogical inference that proceeds from the one-dimensional case to the three-dimensional case appears to be implausible, because the natural thing to do is to look first at the two-dimensional case. Yet the argument seems to qualify as plausible according to our theory, which so far provides no restrictions along these lines. Furthermore, as noted in section 5.2, complications arise because there can be legitimate analogies that skip over intermediate domains.

Happily, we have the resources to deal with this problem. We need only combine our treatment of multiple analogies (section 4.10) with the present analysis of geometric similarity. Any case of geometric similarity is likely to be a situation in which we have multiple analogical arguments, corresponding to different possible parameter values. In Example 5.3, for instance, we are considering an analogy between one and three dimensions, but there is clearly a two-dimensional analog to consider as well. Our theory of multiple analogies, supplemented with a very reasonable proposal, enables us to deal with this type of example.

We need only two ideas from the theory of multiple analogies. First, we need to define a ranking, a partial ordering (denoted by ≺) on the set of analogical arguments. In section 4.10, we introduced the standard ranking, according to which argument A2 is superior to argument A1 if the positive analogy of A2 includes all critical factors in the positive analogy of A1. Informally, everything important that A1 shares with the target domain is also present in A2. But in cases of geometric similarity, we have an additional basis for our ranking: both sorts of parameters, real numbers and integers, have a natural ordering. It is natural to think that $\mathbb{R}^2$ would be a better model for $\mathbb{R}^3$ than $\mathbb{R}$ would be. Now we
come to the “reasonable proposal”: to refine the standard ranking by making use of this natural ordering.

(5.5.3) Ranking Proposal
Suppose A1 and A2 are possible analogical arguments for some conjecture about a target domain T. Suppose that the geometrically similar expressions in the three relevant domains are characterized by parameters a1 for A1, a2 for A2, and a for T. Then A1 ≺ A2 (A2 is strictly superior to A1) if and only if:
a. A2 is strictly superior to A1 on the standard ranking (based on critical factors); or
b. A2 and A1 are equally good on the standard ranking, but the parameter value a2 is closer than a1 to a.

The parameter-based ordering never overrides the standard ordering, but it can break ties.8 Thus, in Example 5.3, the two-dimensional punctured disk is strictly superior to the punctured interval as an analog for the three-dimensional case.

The second idea we take from the theory of multiple analogies in section 4.10 is independence. Independence states that for optimal arguments (with respect to the ranking ≺), plausibility is assessed independently of other competing analogies, while suboptimal arguments may be ignored. Applying this result to our example, we are led to reject the analogy between the punctured interval and punctured sphere. This is not because the relevant geometric similarities fail to be admissible; rather, we reject the analogy because of the existence of a competitor that is manifestly superior.

Our ranking proposal can handle cases where it is legitimate to skip intermediate parameter values, as in the parity-based analogies discussed in section 5.2. In such cases, we may find a critical factor shared by two separated domains (e.g., n = 2 and n = 4) but not present in the intermediate case (n = 3). Because proximity in parameter value is secondary to overlap in critical factors in determining the overall ranking, our proposal allows for cases where the analogy between separated domains is superior.
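The two clauses of the ranking proposal can be sketched as a small comparison function. This is a hedged illustration rather than part of the theory: it assumes that "strictly superior on the standard ranking" can be modeled as proper inclusion of sets of critical factors and "equally good" as equality of those sets, and the class name, field names, and factor labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnalogicalArgument:
    name: str
    parameter: float             # e.g. the dimension of the source domain
    critical_factors: frozenset  # critical factors in the positive analogy


def strictly_superior(a2, a1, target_parameter):
    """Return True when a1 ≺ a2 under this sketch of proposal (5.5.3)."""
    # Clause (a): a2 shares every critical factor a1 shares, and more.
    if a1.critical_factors < a2.critical_factors:
        return True
    # Clause (b): a tie on critical factors is broken by parameter proximity.
    if a1.critical_factors == a2.critical_factors:
        return (abs(a2.parameter - target_parameter)
                < abs(a1.parameter - target_parameter))
    # Otherwise a2 is not ranked above a1 (the ordering is only partial).
    return False


# Example 5.3: same critical factors, so dimension 2 beats dimension 1
# as an analog for a three-dimensional target.
interval = AnalogicalArgument("punctured interval", 1, frozenset({"p"}))
disk = AnalogicalArgument("punctured disk", 2, frozenset({"p"}))
print(strictly_superior(disk, interval, target_parameter=3))

# Parity case: n = 2 shares an extra critical factor with the target n = 4,
# so it outranks n = 3 even though 3 is closer to 4.
even = AnalogicalArgument("n = 2", 2, frozenset({"p", "even parity"}))
odd = AnalogicalArgument("n = 3", 3, frozenset({"p"}))
print(strictly_superior(even, odd, target_parameter=4))
```

Because clause (b) is consulted only when the factor sets coincide, proximity in parameter value can break ties but never override the standard ranking, as the text requires.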
What if we recognize the existence of intermediate analogs, but lack relevant knowledge about them? Suppose that we are contemplating a conjecture about three-dimensional geometry. We have an analogical argument based on a proven result for one dimension, and it meets all of our criteria to count as a plausible individual argument. We are ignorant about whether the analogous result holds in two dimensions. In this case, our theory does not dictate that we should ignore the one-dimensional analogy. We have a plausible analogical argument, at least until such time as we know more about the two-dimensional case.
8. The proposal can be generalized to cases where geometric similarity is defined using more than one parameter, although this requires the introduction of a metric.
Admissible Geometric Similarity

Whereas algebraic similarity is a purely syntactic notion, geometric similarity between two domains combines common syntactic structure with the substitution of distinct parameters. The result is a more complex form of similarity, as reflected both in the examples and in our resolution of two philosophical difficulties: the problem of specious resemblance and the problem of separated domains. We have introduced an additional requirement for a geometric similarity to count as admissible: internal coherence (5.5.2). But we have also seen that a satisfactory discussion of geometric similarity requires us to consider multiple, rather than just individual, analogies. Apart from offering a way to think about a form of similarity that is important in many analogical arguments, the analysis of this section supports a broad conclusion. Here and in earlier chapters, I have rejected theories of analogy that model similarity as isomorphism, theories that treat similarity as a primitive semantic relation, and theories that reduce similarity to identity. Our analysis of geometric similarity shows that we can find different and philosophically viable alternatives for modeling similarity, alternatives that let us raise, and at least partially answer, interesting questions.
5.6 Asymptotic Similarity

Similarities between the finite and the infinite, between discrete and continuous systems, and other similarities involving limits are at the heart of some of the most interesting mathematical analogies. These asymptotic similarities build on the two previous types, but the use of limits introduces a new level of complexity. As noted in section 5.2, arguments of this type pose a special challenge for our theory. My principal objective in this section is to show how the articulation model can be refined so as to handle this type of analogical argument.

I shall discuss only two examples (with an additional illustration in the chapter appendix). The first is Example 5.4, Euler’s ingenious analogy between polynomials and power series. A polynomial of degree n has the form $f(x) = a_0 + a_1 x + \dots + a_n x^n$; a power series has the form

$$f(x) = \sum_{n=0}^{\infty} a_n x^n,$$

and is defined at x if the partial sums

$$\sum_{n=0}^{M} a_n x^n$$

converge to some limit. So a power series is the pointwise limit of a sequence of polynomial functions (in fact, the sequence converges uniformly inside some
interval). Euler’s argument depends on the idea that mathematical operations behave similarly for polynomials and power series.

The second example, to be presented shortly, is an analogy between finite and infinite matrices. A finite matrix with m rows and n columns has m · n elements. We refer to the element in row i and column j as $a_{ij}$. An infinite matrix is a matrix A of the form

$$\begin{pmatrix} a_{11} & a_{12} & a_{13} & \cdots \\ a_{21} & a_{22} & a_{23} & \cdots \\ a_{31} & a_{32} & a_{33} & \cdots \\ \vdots & \vdots & \vdots & \end{pmatrix}$$

which has an element $a_{ij}$ for each pair of positive integers i and j. The similarity between infinite and finite matrices is once again best described in terms of limits. An infinite matrix may be conceived as the limit of a sequence of finite matrices. The above infinite matrix A is the “limit” of the sequence whose nth member is the square matrix $A_n$ consisting of the elements belonging both to the first n rows and to the first n columns. $A_2$ is the 2 × 2 matrix in the upper left corner of the figure.

Because limits and convergence are topological notions, we need to introduce some topological ideas in order to define asymptotic similarity. A topology $\tau$ on a set X is a family of subsets (called open sets) that must include X, the empty set, all unions of members of $\tau$, and all finite intersections of members of $\tau$. Specifying a topology is equivalent to specifying which sequences converge: the sequence $x_1, x_2, x_3, \dots$ converges to the limit x if and only if every open set containing x also contains all but finitely many $x_n$. For example, if X is the set of real numbers, the sequence 1/2, 1/4, 1/8, . . . converges to 0 because the terms of that sequence eventually become as close to 0 as one wishes, provided we go far enough along. More generally, a sequence of real numbers $x_n$ converges to a limit x if for any non-empty open interval $(x - \varepsilon, x + \varepsilon)$ around x, all but finitely many terms of the sequence belong to that interval.
This definition of convergence is equivalent to specifying that the open sets (i.e., the topology) for the real numbers consist of all open intervals or unions of open intervals. We need one final topological notion. Given a topology on X, a subset D is dense in X if for every element x in X, there is a sequence
of elements of D that converges to x. This means that every element in X can be closely approximated by a nearby object in D. For example, the rational numbers are a dense subset of the real numbers. For any real number x, any open interval containing x, no matter how small, contains rational numbers. Equivalently, for any x, there is a sequence d1, d2, . . . of rational numbers that converges to x. We really need these mathematical notions to have a realistic hope of defining asymptotic similarity. Let me explain, informally, the definition that I shall propose, referring to Example 5.4. In the first place, when we talk about similarities between polynomials and power series, we don’t primarily mean that a particular polynomial is similar to a particular power series. Rather, we mean that the two classes share similar properties: operations such as addition and
factorization (if Euler is right) behave similarly for polynomials and power series. That is the notion of asymptotic similarity that we want to capture. The next thing to notice is that the set of polynomials can be regarded as a dense subset of the set of power series that converge on some closed interval $[-a, a]$. A polynomial of degree n is simply a power series,

$$f(x) = \sum_{n=0}^{\infty} a_n x^n,$$

with coefficients $a_{n+1} = a_{n+2} = \dots = 0$. The appropriate topology here is uniform convergence. A power series f converges on $[-a, a]$ if and only if there is a sequence of polynomials, with $P_n$ of degree n, such that for each $\varepsilon > 0$ there is an N such that $n \geq N$ implies $|P_n(x) - f(x)| < \varepsilon$ for all x in $[-a, a]$. The polynomials thus constitute a dense subset.

My suggestion is that asymptotic similarity combines limit operations with the two elementary types of similarity already discussed. An asymptotic similarity between a source domain S and a target domain T is founded upon algebraic or geometric similarities between S and an appropriate dense subset D of the target. (In the power series example, S and D are the same, but that need not be the case in general.) In figure 5.3, the full analogy (dotted double arrow) is obtained by “lifting” these similarities to T. Frequently, it turns out that conjectures based on such an analogy can be rigorously proven using the limit construction and the “leading special case” where the conjecture is restricted to D. This pattern of reasoning is familiar to every mathematician.

Using these ideas, we are finally in a position to define asymptotic similarity of relations and functions defined on S and T. The definition is relative to the choice of (appropriate) topology on the target domain, and it makes use of the concept of the restriction of a function or relation to D. If R* is a relation on the objects of T, R*|D is R* restricted to objects in D; similarly, if F* is a function on T, then F*|D is F* restricted to D. These notions let us characterize the relations illustrated by figure 5.3 in precise terms.
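[Figure 5.3: diagram of the three domains S, D, and T. S and D are joined by algebraic and geometric similarities, D is joined to T by limit operations, and S and T are joined by asymptotic similarity (dotted double arrow).]
FIGURE 5.3. Asymptotic similarity.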
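The sense in which polynomials approximate a power series uniformly on a closed interval can be illustrated numerically. The sketch below is an assumption-laden example, not taken from the text: it uses the exponential series as the power series, its Taylor polynomials as the approximating sequence, and the largest error on a finite grid as a stand-in for the true supremum over the interval. The function names are hypothetical.

```python
import math


def taylor_exp(n, x):
    """Degree-n Taylor polynomial of exp, evaluated at x."""
    return sum(x ** k / math.factorial(k) for k in range(n + 1))


def sup_error(n, a, grid_points=2001):
    """Largest |P_n(x) - exp(x)| over an evenly spaced grid on [-a, a]."""
    xs = [-a + 2 * a * i / (grid_points - 1) for i in range(grid_points)]
    return max(abs(taylor_exp(n, x) - math.exp(x)) for x in xs)


for n in (2, 5, 10):
    # The grid maximum shrinks as the degree grows.
    print(n, sup_error(n, a=1.0))
```

For any tolerance one picks, a sufficiently high degree brings the whole polynomial within that tolerance of the series across the interval, which is the density claim the definition relies on.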
(5.6.1) Asymptotic Similarity
Relations R and R* (or functions F and F*) defined on S and T, respectively, are asymptotically similar if there is an (appropriate) dense subset D of T such that R and R*|D (or F and F*|D) are algebraically or geometrically similar.

There is no special requirement for an asymptotic similarity to count as admissible, other than that the constituent geometric similarities (if any) must be admissible, and that the choice of topology must be appropriate.

Let us see how these ideas apply to Example 5.4 (power series). As already noted, the class of polynomials, D = S, is a dense subset of the class of convergent power series. Any relation or function on S is (trivially) identical to one defined on D, which counts as a trivial case of either algebraic or geometric similarity. In keeping with our theory, Euler’s elaborate analogical argument (described in section 5.2) should count as plausible. In order to reach that conclusion, however, we need to make one final change to our test for plausibility. The no-critical-difference condition in Plausibility II characterizes plausibility in terms of correspondences between the source and target domains. For analogies involving limits, we must substitute correspondence between S and a dense subset D of T. Here is the final version of our plausibility test for mathematical analogies.

(5.6.2) Plausibility III
1. Overlap. Some explicit assumption in the proof must correspond admissibly to a fact known to be true in the target domain (or an appropriate dense subset, in the case of asymptotic similarity). (The positive analogy is nontrivial.)
2. No-critical-difference. No explicit assumption in the proof can correspond admissibly to something known to be false in the target domain (or an appropriate dense subset, in the case of asymptotic similarity). (No critical assumption belongs to the negative analogy.)
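The two clauses of Plausibility III can be read as a simple check over the explicit assumptions of the source proof. The sketch below is a schematic illustration only: it presupposes that the admissibility of each correspondence has already been settled, and the function name, the three-valued encoding, and the assumption labels are all hypothetical.

```python
from typing import Mapping, Optional


def plausibility_iii(status: Mapping[str, Optional[bool]]) -> bool:
    """Schematic version of test (5.6.2).

    `status` maps each explicit assumption of the source proof to
      True  if its admissible analog is known to hold in the target
            (or in the dense subset D),
      False if its admissible analog is known to fail there,
      None  if nothing is known either way.
    """
    # Clause 1 (overlap): at least one assumption has a known true analog.
    overlap = any(value is True for value in status.values())
    # Clause 2 (no-critical-difference): no assumption has a known false analog.
    no_critical_difference = not any(value is False for value in status.values())
    return overlap and no_critical_difference


print(plausibility_iii({"commutativity": True, "associativity": True}))
print(plausibility_iii({"commutativity": True, "convexity": False}))
print(plausibility_iii({"commutativity": None}))
```

The first call passes both clauses, the second fails the no-critical-difference clause, and the third fails the overlap clause because nothing is known about the only assumption.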
The modified clauses capture the familiar heuristic principle, “what holds up to the limit is also true at the limit” (Polya 1954, 205). To apply this principle, it suffices that any element of T can be approximated by a sequence of elements in D, each of which exhibits no known critical difference from elements in the source domain. What is true for polynomials of arbitrarily high degree is true for power series. With this final revision, Euler’s analogical argument counts as plausible. It might be objected here that mathematics is full of examples where Polya’s maxim fails. We can rearrange the order of terms in any finite series and get the same sum, but that is not generally true for infinite series. A function that is the pointwise limit of a sequence of continuous functions need not itself be continuous. And so on. In spite of the many exceptions, however, the maxim is still an excellent guide to plausible conjectures; occasional failure is not a decisive obstacle. Furthermore, our definition of asymptotic similarity and its
incorporation into the test for plausibility leads to an approach that is actually more refined than Polya’s maxim. By combining ordinary analogies with limit operations, we are able to avoid many obvious counterexamples. We illustrate this point by returning to Example 5.7, the analogy between finite and infinite matrices introduced at the start of this section. Here, Polya’s maxim fails, but our test leads to a correct result.

Example 5.7 (finite and infinite matrices). As noted earlier, there is an analogy between the set S of finite matrices and the class T of infinite matrices, with real number entries. For both domains, let $a_{ij}$ represent the element in row i and column j. Consider the following analogical argument. When we sum up the elements of a finite matrix (i.e., a matrix in S), the order of addition is unimportant. The result is the same whether we first compute the sum for each row, or whether we first compute the sum for each column:

$$\sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij} = \sum_{j=1}^{n} \sum_{i=1}^{m} a_{ij}.$$

By analogy, we might conjecture that a similar result holds for T:

$$\sum_{i=1}^{\infty} \sum_{j=1}^{\infty} a_{ij} = \sum_{j=1}^{\infty} \sum_{i=1}^{\infty} a_{ij}.$$
What does our theory tell us about this conjecture? Is it plausible? That equality holds in the finite case is trivial to prove. In fact, it follows from the commutativity and associativity of addition. If we can put S into appropriate correspondence with a dense subset D of T, then by our theory, the conjecture will be plausible. A natural choice for D might be the set of infinite matrices whose entries all become 0 beyond some row and beyond some column (that is, there exist integers M and N such that $a_{ij} = 0$ if $i > M$ or $j > N$). Plainly S (finite matrices) and D (infinite matrices whose rows/columns are eventually 0) are isomorphic, so that the relevant algebraic similarities obtain. The difficulty, however, is to specify an appropriate topology on T for which D is a dense subset. We might try to define the distance between two matrices with entries $\{a_{ij}\}$ and $\{b_{ij}\}$ as

$$\sum_{i=1}^{\infty} \sum_{j=1}^{\infty} |a_{ij} - b_{ij}|,$$

with the intention of defining convergence (and hence a topology) relative to this distance function.9 But there is no guarantee that this series will converge, and in general it does not converge. If we can find no plausible definition of convergence with respect to which D is a dense set, then the analogy fails to provide support for its conclusion.

9. If this sum of positive terms converges to a finite limit, then it is absolutely convergent and so the order of summation is unimportant.

In fact, the conclusion is false. Let A be the following infinite matrix:

$$\begin{pmatrix} -1 & 0 & 0 & 0 & \cdots \\ 1/2 & -1 & 0 & 0 & \cdots \\ 1/4 & 1/2 & -1 & 0 & \cdots \\ 1/8 & 1/4 & 1/2 & -1 & \cdots \end{pmatrix}$$

Then

$$\sum_{i=1}^{\infty} \sum_{j=1}^{\infty} a_{ij} = -2, \quad \text{but} \quad \sum_{j=1}^{\infty} \sum_{i=1}^{\infty} a_{ij} = 0,$$
so that the order of summation affects the result.

At this point, our theory suggests an interesting strategy. Let T consist not of all infinite matrices, but only of those for which the sum

$$\sum_{i=1}^{\infty} \sum_{j=1}^{\infty} |a_{ij}|$$

is finite. The distance function proposed above is well-defined and induces a topology on T, and further, D is a dense subset of T. So according to our theory, the conjecture about the order of summation becomes plausible for this restricted target domain. Indeed, it is true.

The example is interesting for a number of reasons. Most important, it shows that our theory can avoid problematic analogical inferences from the finite to the infinite by forcing us to define precisely how objects in the target domain are obtained as limits from objects in the source domain. If we are unable to articulate this connection, then we should reject the argument. The example also points to an important limitation of our account: we say nothing about what counts as an “appropriate” topology. If we adopt the so-called trivial topology on T, according to which the only open sets are the empty set and T itself, then any nonempty subset of T will be dense. But the trivial topology is manifestly not appropriate in most settings. Since I offer no account of what constitutes an appropriate topology, I concede that I have only the beginnings of a theory of asymptotic similarity and limit-based analogical reasoning. Still, the analysis of the two examples in this section, and Example 5.10 in the chapter appendix, suggest that our theory has promise.

This concludes our discussion of similarity in mathematical analogies. Plainly, much remains to be said about these and other important forms of similarity, but my objective in this chapter has been modest: to refine our understanding of mathematical analogies, setting the stage for further work on nonmathematical analogies. In the remainder of this chapter, we shall see that
mathematical analogies also provide clues to understanding other aspects of analogical reasoning.
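The counterexample matrix in Example 5.7 can be examined with exact rational arithmetic. The sketch below rests on an assumption about the matrix: that A has -1 on the diagonal, 1/2, 1/4, 1/8, . . . running down each column below the diagonal, and 0 above it. The function names are hypothetical. On that reading, the row sums accumulate toward -2, every column's partial sums shrink toward 0, and any finite upper-left block gives the same total in either order.

```python
from fractions import Fraction


def a(i, j):
    """Entry in row i, column j (1-based) of the assumed matrix A."""
    if i == j:
        return Fraction(-1)
    if i > j:
        return Fraction(1, 2 ** (i - j))
    return Fraction(0)


def rows_first(n_rows):
    """Sum of the first n_rows row sums (row i is zero beyond column i)."""
    return sum(sum(a(i, j) for j in range(1, i + 1))
               for i in range(1, n_rows + 1))


def column_partial_sum(j, depth):
    """Partial sum of column j from the diagonal down to row j + depth."""
    return sum(a(i, j) for i in range(j, j + depth + 1))


def block_sums(n):
    """Both orders of summation over the finite n-by-n upper-left block."""
    by_rows = sum(sum(a(i, j) for j in range(1, n + 1)) for i in range(1, n + 1))
    by_cols = sum(sum(a(i, j) for i in range(1, n + 1)) for j in range(1, n + 1))
    return by_rows, by_cols


print(rows_first(10))             # close to -2
print(column_partial_sum(3, 10))  # close to 0
print(block_sums(6))              # the two finite totals agree
```

The finite blocks behave exactly as the source domain predicts; the disagreement appears only when each inner sum is carried to its own limit before the outer sum is taken, which is why the choice of topology on T matters.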
5.7 Plausibility and Analogy in Mathematics

The question I begin to answer in this section is this: why should analogical arguments satisfying our test be regarded as plausible? The “test” (Plausibility III) combines the theory of chapter 4 with the models of similarity put forward in the present chapter. Our question is thus: why should an analogical argument that satisfies the overlap and no-critical-difference conditions make its conclusion plausible? I shall first clarify the question by reviewing what I mean by plausibility, and then outline a preliminary response.

Section 1.5 distinguished between modal and probabilistic conceptions of plausibility. In section 5.2, I wrote that a mathematical conjecture is plausible if, in light of available evidence, but short of decisive proof, it has enough support to warrant investigation. This is the modal conception, that is, prima facie plausibility. It has been our focus up until now. But references to “available evidence” and “support” suggest that plausibility in mathematics, as elsewhere, is a matter of degree. It is certainly worth exploring the prospects for a probabilistic interpretation of plausibility. Polya (1954), Hacking (1967), and more recently Gaifman (2004) have endorsed approaches along these lines.

To introduce a probabilistic approach within mathematics, however, brings up the problem of logical omniscience. Any coherent probability assignment must assign equal probability to logically equivalent statements, or to statements whose equivalence is implied by accepted axioms. Coherence is widely regarded as a condition of rationality, so it seems to follow that logical omniscience is a requirement of rationality. Yet logical omniscience appears to rule out the assignment of probabilities other than 1 or 0 to any provable or disprovable statement of mathematics. The logical equivalences are there; so much the worse for us if we are not aware of them.
In dealing with problems of Bayesian epistemology (specifically, the problem of old evidence), philosophers have evolved various strategies to model nonomniscient agents who can come to learn about logical relationships (Hacking 1967, Good 1968, Garber 1983, Jeffrey 1992, Gaifman 2004). Common to many of these approaches is the idea that the beliefs of the nonomniscient agent are initially represented in an impoverished setting. The logical equivalences only exist in some enriched framework, and it may even be the case that the agent does not, strictly speaking, have a probability model at all. Gaifman is quite casual about scenarios of the latter type: A mathematician, having deliberated a problem, may have a strong conviction that a certain sentence is true and be willing to bet on it with odds, say 3:1. This can be interpreted, in probabilistic terms, as a subjective probability that is 0.75. . . . Subjective probability assignments to mathematical statements are not worse in this respect than
subjective probabilities in general. Justifying the particular values is not at issue; we may not be able to give an account that explains the expert’s values, except for gesturing at the expert’s past experience, or at some analogy with other cases. The framework of subjective probability does not require that such an account be given. (2004, p. 108) Since Gaifman is keenly aware of the problem of logical omniscience, the situation he describes must (I think) be one in which the mathematician is considering a lone statement or small fragment of mathematics, with no pretense to having a proper probability model. There is small scope for failures of coherence and no real concern about Dutch Books. It seems to me that Gaifman is basically right, although in such circumstances, “probabilities” are really just betting quotients and not probabilities at all.10 So long as we are not worried about this, we can take two further steps. First, there might be situations in which one cannot lay sharp odds on the truth of a statement (i.e., no bet or set of bets seems fair), but one can nevertheless compare the “relative likelihood” of two statements. Elsewhere (Bartha and Johns 2001, Bartha 2004) I have argued that this is what happens in the de Finetti lottery. Imagine a lottery with one ticket printed for each positive integer, where you believe that each ticket is equally likely to win. You cannot assign any value to the probability that an individual ticket wins (0 is too small; any positive number is too large). Still, you regard as fair a bet in which you win $1 if ticket i wins, lose $1 if ticket j wins, and otherwise no money changes hands. Your relative probability—or better, your relative betting quotient—for the pair of statements (ticket i wins, ticket j wins) is 1. The idea can be generalized: holding two tickets makes you twice as likely to win as holding just one, and so forth. 
Second, there might be situations in which one is willing to lay odds only on a conditional bet. For example, suppose that you are throwing a dart at a square board that measures one meter by one meter. Suppose that the point of the dart is equally likely to hit any point on the board. Since the entire board has unit area, the probability that the dart will land in a specified region is just its area, if that area is well defined. In particular, any two regions of equal area are equally likely to contain the point where the dart hits. Consider the proposition M that the dart lands on a given horizontal line crossing the dartboard, as shown in figure 5.4. There is zero probability that a dart will land exactly along this line: Pr(M) = 0. But nothing prevents us from raising the question: what is the conditional probability of L, the dart’s landing on the left half of M, given that it lands on M? Intuitively, the answer is 1/2, and even if the conditional probability is not well defined, that might be the most reasonable conditional betting quotient.11 In these examples, it seems clear that the only possible justification for the betting quotients is symmetry. In the lottery case, we have perfect symmetry of the outcomes where ticket i wins and ticket j wins; nothing favors one outcome
10. Bayesian subjective probabilities are defined on an algebra, are subject to coherence, and are measured by (rather than being identified with) betting quotients. 11. This argument is taken largely from Bartha and Johns (2001).
[FIGURE 5.4. The dart board. Labels in the figure: M, L.]

over the other, so there can be no justification for a relative betting quotient other than 1. In the dartboard case, similarly, we have (by hypothesis) symmetry between the left and right sides of the dartboard.

I suggest that the plausibility of analogical reasoning is often best construed not precisely in terms of probability but, rather, in terms of relative conditional betting quotients (we might call them quasi-probabilities); and further, that these betting quotients are symmetry-based. At the heart of an analogical argument is the judgment that a conditional bet on Q* given P* is approximately equal in value to a conditional bet on Q given P, even though one may not have an exact value in mind.12 And the basis for that judgment is the symmetry of the two situations: the reasoning that pertains to your evaluations of these two conditional bets must be the same.

Let us assess this position for the best-case scenario of a perfect analogical argument, where all correspondences are legitimate and every assumption used in the proof of Q is known to have a true analog in T. That is, the positive analogy P contains every critically relevant factor. This situation is not uncommon; it occurs in Example 5.1, Example 5.5, and Example 5.6. A quick way to put the symmetry argument is this: what basis could there be for evaluating the relevant conditional bets differently? The source and target domains are alike in all relevant known respects, short of actually attempting the proof of the target conjecture, Q*.

A very natural objection is that the analogical argument may have overlooked some relevant differences. Such differences, of course, can exist even for arguments that satisfy Plausibility III. Consider Example 5.6 (abelian groups), a perfect analogical argument whose conclusion turns out to be false.
According to our test, the inference is plausible, but obviously there really are relevant differences between the source and target domains since the theorem is true in one case but false in the other. The proof in the source domain makes only one use of the cancellation laws, and that is what leads to the failure of the analogy. That does not count as a relevant difference on the present account, and I think
12. As usual, P and P* refer to the positive analogy, and Q and Q* to the analogous hypotheses or conjectures that are the focus of the analogical argument.
that is right. The inference really is plausible. It is reasonable to expect a generalization, even though it turns out that there is none (see note 6). As my purpose in this section is mainly to foreshadow the approach to justification that will be developed in chapter 8, I will not develop the argument further at this point. But I do want to acknowledge two very natural objections and suggest how they might be met. First, how does the above reasoning apply to imperfect analogical arguments, where not all critical assumptions belong to the positive analogy? Some belong to the neutral analogy; we are ignorant of whether or not they have true analogs in the target domain. In such cases, the symmetry argument for the equal value of the conditional bets (for Q* given P* and for Q given P) still applies, but these conditional bets become less of a reliable measure of the plausibility of Q and of Q* as the positive analogy shrinks to represent a smaller share of the relevant factors. Broadly speaking, the solution to this problem is that it is appropriate to accept imperfect analogical arguments that meet a reasonable standard. Second, there are longstanding objections to basing probabilistic (or quasiprobabilistic) judgments on appeals to symmetry. This strategy raises the specter of the Principle of Indifference and its associated paradoxes. Paradoxes arise when different symmetries, all apparently equally legitimate, generate conflicting probability assignments. How can we be sure that different analogical arguments, or even one analogy represented in different ways, will not lead to inconsistent betting quotients? I suggest that there are two sorts of situations where the threat of paradox can be countered. The first, exemplified by the de Finetti lottery and the dartboard, is in a highly idealized model. 
In such models, one can sometimes identify the relevant class of symmetries on a system and show that this class satisfies nontrivial conditions that guarantee a well-defined probability (or quasiprobability) model.13 The second type of situation, exemplified by analogical reasoning and the sort of ad hoc probability assignments Gaifman describes, is just the opposite: a relatively unstructured framework in which symmetries are applied across problem domains, rather than within a system. A helpful picture to keep in mind is that one uses the sorts of betting quotients we have been discussing to build, in gradual stages, a probability model (with limiting values of 1 and 0, in the case of mathematics). The paradoxes associated with the Principle of Indifference require a certain amount of structure: competing symmetries must be identified and represented as plausible. Analogical arguments are unique in two respects that help to diminish the risk of paradox. First, in the early stages of an inquiry, one has relatively little structure: perhaps just one, or at any rate a small number, of analogical arguments can be brought to bear on a conjecture. Second, distinct analogical arguments can generate multiple relative conditional betting
13. Sophisticated techniques along these general lines have been developed by Jaynes and his students; see Jaynes (1973) and Villegas (1977, 1981).
quotients without explicit contradiction. In the case of multiple analogies, a superior argument renders the inferior one irrelevant; where no ranking is possible, betting quotients conditional on different sets of critical factors are not obviously in conflict. A contradiction would arise in the case of two optimal analogical arguments that share the same positive analogy but diverge in their conclusions, but I am not aware of any such case in mathematics. In short, where there is no ambition to proceed straight from symmetries to a probability model, the danger of paradox subsides.
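The relative betting quotients of the de Finetti lottery discussed earlier in this section can be illustrated with a rough sketch. The sketch assumes finite truncations of the lottery (a fair lottery with n tickets), which is an illustrative device and not the construction used in the text; the function names are hypothetical. It shows that the probability of an individual ticket shrinks toward 0 as n grows, while the relative quotient for two single tickets stays at 1, the quotient for two tickets against one stays at 2, and the bet "win $1 if ticket i wins, lose $1 if ticket j wins" has expected value 0 at every stage.

```python
from fractions import Fraction


def win_probability(n_tickets: int, held: int) -> Fraction:
    """Chance that one of `held` tickets wins a fair lottery with n_tickets tickets."""
    if not 0 <= held <= n_tickets:
        raise ValueError("held must be between 0 and n_tickets")
    return Fraction(held, n_tickets)


def relative_quotient(n_tickets: int, held_a: int, held_b: int) -> Fraction:
    """Ratio of the two winning chances; the lottery size cancels out."""
    return win_probability(n_tickets, held_a) / win_probability(n_tickets, held_b)


def expected_gain(n_tickets: int) -> Fraction:
    """Expected value of: win $1 if ticket i wins, lose $1 if ticket j wins (i != j)."""
    p = win_probability(n_tickets, 1)
    return p * 1 - p * 1


# Finite truncations (an assumption of this sketch, not part of the original argument).
for n in (10, 1_000, 1_000_000):
    print(n, win_probability(n, 1), relative_quotient(n, 1, 1),
          relative_quotient(n, 2, 1), expected_gain(n))
```

The point of the sketch is only that the ratio is stable under every truncation, which is one way to see why a relative quotient can be meaningful where no sharp individual quotient is available.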
5.8 A Larger Role for Mathematical Analogies

Analogies appear to have two principal roles in mathematics. One widely recognized role is heuristic.14 Striking discoveries in mathematics have involved analogies between such diverse fields as number theory and complex function theory. More commonly, similarities within the same field of mathematics provide useful guidelines for solving particular problems. The term “heuristic” actually embraces two distinct functions of analogies: the psychological role of triggering discoveries and the logical role of establishing the plausibility of conjectures that have been formulated but not yet proved or disproved. In this section, I reserve the term for the latter role, which has consistently been our focus in this chapter.

Analogies also play a systematizing or unifying role in mathematics. Descartes’ (1637/1954) correlation between geometry and algebra, for example, made it possible to reduce a whole class of geometric problems to finding the roots of a polynomial equation. Descartes provided a method for systematically handling geometrical problems that had long been recognized as analogous, since they corresponded to equations of the same degree. In the last century, the similarities between “substitution groups” (today’s permutation or symmetric groups) and isometric linear transformations on crystal structures led to the concept of a group (Kline 1972).

The two roles are related through the idea of generalization. As I have been suggesting in this chapter and the last, the goal of generalization drives analogical reasoning in the early stages of an investigation. By contrast, after a general result has been obtained, the analogy sheds light on fruitful connections between two previously independent areas. Between these two stages, an analogy functions as a guide in an ongoing research program aimed at generalization.
Indeed, mathematicians often portray their work as part of a prolonged effort to understand and develop a relationship of analogy between two structures.15 For instance, analogies between logic and topology were refined by Mostowski,
14. Polya (1954), Poincaré (1952), and Hadamard (1949) have all stressed the importance of analogies in mathematical discovery.
15. An informal survey of current Mathematical Reviews (peer reviews of current work by mathematicians) uncovered many cases in which the reviewers explain the significance of a result in terms of how it extends or develops a well-known analogy.
Kleene, Addison, and others, culminating in the theory of the arithmetical and analytical hierarchies. A more recent example is the attempt to develop systematically the analogy between the p-adic numbers and real-closed fields (Macintyre 1986). We might speculate that an analogy closely identified with such a research project has a kind of life history. Initially, it rests on a few points of similarity. It guides researchers in formulating new concepts and theorems, and in developing more complex similarities. An analogy that is successfully developed in this way culminates in a deductive theory; a failed analogy, weakened by too many points of difference, will not lead to any such breakthrough. This large role appears to contrast with the humble analogies that help the mathematician to solve a particular problem. Indeed, within this dynamic conception of analogical reasoning, the individual analogical arguments that have been our focus appear as temporal cross-sections, snapshots in an evolving drama. That leads to two interesting questions. First, should the systematic or unifying function of analogy somehow be taken into account in our model for evaluating individual arguments—that is, our heuristic model? Second, does our heuristic model shed any light on the problem of understanding the dynamic conception of analogical reasoning? As regards the first question, the simple answer is that we need a “static” theory of individual analogical arguments that works independently of the big picture. For one thing, when we consider analogies in terms of their systematic or unifying function, the ultimate “justification” of an analogy is the successful construction of a comprehensive deductive theory. But in its heuristic role, analogy is an aid to inductive reasoning, and any heuristic model of analogical reasoning must recognize this practical orientation. 
We need guidelines that help us decide what to do next: to decide which conjectures are likely to be worth investigating and which proof strategies might turn out to be fruitful. Criteria used to evaluate the plausibility of analogical arguments must not depend on the final mathematical theory—otherwise they would be, practically speaking, useless. Furthermore, realistic criteria should be fallible: they should pronounce some conjectures plausible even though they turn out to be false, and other conjectures implausible although they turn out to be true. All of these remarks suggest that our theory of individual analogical arguments should be independent of an account that explains their role in a broader research context. Certainly, we can find reasons in favor of the opposite position, that our heuristic model must take the big picture into account. We can point to examples of analogical arguments that appear plausible not because they conform to the models presented in this chapter but, rather, because they are located within an evolving program of research. We have already seen an example in the Euler characteristic formula, Example 5.2. As noted in section 5.5, we can set up an analogy among three and higher dimensions that appears plausible, even though the most basic criterion of our theory (a rigorous proof for the source domain) is not met. In order to respond to this argument, let me turn things around by suggesting that our heuristic model contributes to our understanding of the dynamic
role of analogies. Our theory proposes that an individual analogical argument is plausible if it has a reasonable potential for generalization, and the theory offers an interpretation of this potential in terms of two basic criteria. If an analogy that looks likely to be fruitful in light of experience somehow fails to meet those criteria, our model does not mandate total rejection of the argument—it leaves us with the option of modification. It is at this point that historical considerations exert their influence. We may have a vague sense that the analogy can be refined. We may have found this sort of analogy to be so reliable that we are confident that some version will succeed. In the case of the Euler characteristic formula, experience may suggest to us that a rigorous proof for the three-dimensional case will almost certainly involve removing and adding vertices, edges, and faces. In all such cases, broadly speaking, we retain the basic intuition that the analogy exhibits reasonable potential for generalization. If we decide to persist with the analogy, then our model provides clear guidance about how it may be restricted or reformulated in order to meet our criteria of plausibility. Actual practice illustrates some of the most basic ways in which this can happen. When mathematicians discover a critical difference, they respond in one of three ways: they reformulate the proof (in the source domain) to dispense with the problematic assumption; they restrict the target domain so that the disanalogy disappears; or they give up on the analogy. The first two responses are natural strategies for restoring the potential for generalization. Several of our examples (in the main part of the chapter and the chapter appendix) illustrate how our theory provides this sort of guidance. Just as deductive logic and the probability calculus constrain individual arguments in science, we need a basic model for assessing individual analogical arguments. 
And just as deductive logic and the probability calculus have important roles to play within a dynamic model of mathematical (or scientific) evolution, so too a basic model for assessing individual analogical arguments contributes to a larger understanding of analogical reasoning.
5.9 Conclusion

Analogical arguments are important in mathematics, and analogical arguments in mathematics are important for anybody who wants to understand analogical reasoning. Within mathematics, we find a rich set of examples and an ideal starting point for developing a general theory of analogical arguments. In this section, I highlight some important general conclusions about analogies in mathematics to be drawn from this chapter, and then explain why these conclusions will prove valuable outside mathematics.

Chapter 4 proposed a simple test for a good analogical argument in mathematics: every assumption used in the proof in the source domain must correspond to something true, or not known to be false, in the target domain. One of the main objectives of this chapter has been to show, principally through the use of examples, that this test is clear, actionable, nontrivial, and reasonably reliable. The murkiest part of the test, however, is the notion of correspondence between
domains. In order to address that problem, this chapter proposed models for three types of similarity that are common in mathematical analogies. Setting aside the specific strengths and weaknesses of those models, I believe that they have sufficient merit to justify two important general conclusions. First, they demonstrate that it is both feasible and valuable for theories of analogical reasoning to avoid three prevalent and misleading assumptions: (1) that relations of similarity can always be reduced to identity and difference, (2) that interdomain similarities can best be modeled by the notion of isomorphism, and (3) that similarity is a non-analyzable primitive relation. Most theories of analogical reasoning presented in chapters 2 and 3 make one of these simplifying assumptions. Instead, we should move to a pluralistic approach that aims for a set of precise, but diverse, models of similarity. As we have seen, such models provide the tools for faithful representation of analogical arguments in mathematics, and they allow us to raise—and partially to answer—some interesting philosophical questions. The second significant conclusion suggested by our work on similarity is that, although a theory of analogical arguments can strive for precise models, we should not expect to eliminate entirely the role of good judgment in the evaluation of analogical arguments. Our analyses of geometric and asymptotic similarity, for instance, appeal to ideas of natural or “standard” representation and “appropriate” topology. While this limitation might dampen our hopes of automating the process of analogical reasoning, there is no reason to be pessimistic if our aim is simply to obtain guidelines as sharp as possible for evaluating the plausibility of individual arguments, and to flag clearly the areas where discretion is needed. 
The latter sections of this chapter examined the philosophical justification for analogical arguments in mathematics, and the relationship between the roles that analogies play in individual arguments and in mathematical research programs. Here the most important point is that our theory of analogical arguments, relating plausibility to potential for generalization, is not merely compatible with treating these broader issues but also capable of furnishing valuable insights. On the issue of justification in particular, I argued that we can, even in mathematics, think of plausibility in terms of subjective probabilities (or quasi-probabilities). The idea that potential for generalization establishes plausibility is an application of an older idea that symmetry somehow constrains probability assignments. The application of symmetry to analogical reasoning is meant to be less ambitious than in the classical theory of probability and so, it is hoped, free of many of the classical difficulties. I have stated all of these conclusions in general terms because I believe that they apply to nonmathematical as well as to mathematical analogies. The next chapter continues the pluralistic approach to similarity initiated here and makes use of some of the same models. On the problem of philosophical justification, concentrating on mathematical analogies helps us to steer clear of unpromising strategies. As already noted, mathematical analogies do not rely upon any grand causal principle for their plausibility, and we should bear this in mind in reflecting upon the justification for analogical arguments in general. There are
grounds for further optimism if the approach sketched in this chapter, connecting analogical arguments to symmetry-based reasoning, does not rely upon special characteristics of mathematics. Finally, we briefly discussed how our theory of individual analogical arguments sheds light on the ways in which analogies guide sustained mathematical investigation and on how we should respond to “defective” analogies. These ideas do not depend upon anything peculiar to mathematics, and the concluding chapter explores how they might be extended outside the mathematical setting.
Appendix to Chapter 5: Three Additional Examples

The examples in this Appendix provide further illustrations of the three types of similarity. They also support the view that our test, while aimed at assessing the plausibility of individual analogical arguments, provides a helpful framework for using analogies to guide an evolving investigation.

Example 5.8 (complex numbers and operators). Let S be the set of complex numbers, and T the set of linear operators on V, where V is a finite-dimensional inner product space. Operations with complex numbers (addition, multiplication, and conjugation) are algebraically similar to operations (addition, function composition, and the adjoint map) with linear operators.16 Algebraic similarities between the corresponding operations include those shown in table 5.6.

TABLE 5.6. Complex numbers and operators

Complex Numbers                                  Linear Operators
(1) $\overline{z + w} = \bar{z} + \bar{w}$       (B + A)* = B* + A*
(2) $\overline{\bar{z}} = z$                     A** = A
(3) $\overline{zw} = \bar{w}\,\bar{z}$           (BA)* = A*B*

One important and well-known disanalogy is the failure of commutativity for operator composition. In general, BA ≠ AB, even though zw = wz always holds. Note that we could characterize each of these algebraic similarities (and the one difference) as required by definition (5.4.1), but we omit this formality.

This analogy can be useful because we can assess the plausibility of a conjecture Q* about linear operators by considering a corresponding claim Q about complex numbers. In many cases where we have a proof for Q, the analogous proposition Q* is plausible. By the no-critical-difference condition, however, if the proof of Q requires commutativity, then the analogical inference to Q* is blocked.

To illustrate these ideas, consider the set of symmetric operators, consisting of those A such that A* = A. This class is analogous to the subset of real numbers in C, that is, numbers such that $\bar{z} = z$. We know that the sum and the product of
16. Complex conjugation is defined as usual: if $z = a + bi$, then $\bar{z} = a - bi$ is the conjugate of z, where a and b are real numbers. If A is a linear operator, then its adjoint A* is defined, via the inner product function $\langle\ ,\ \rangle$, as the unique operator such that $\langle Av, w \rangle = \langle v, A^{*}w \rangle$ for all $v, w \in V$. A* is standard notation for the adjoint. Although we have been using * to signify features of the target domain, this special usage is confined to this example.
two real numbers are real. Is it plausible to conjecture by analogy that the sum and the product of two symmetric operators are symmetric? In the case of sums, the source proof is trivial: if $\bar{z} = z$ and $\bar{w} = w$, then $\overline{z + w} = \bar{z} + \bar{w} = z + w$; so $z + w$ is real. The critical facts here are the above characterization of real numbers as those for which $\bar{z} = z$ and the identity (1) above. Since the corresponding statements are true for linear operators, the conjecture that the sum of symmetric operators is symmetric is strongly plausible (and, indeed, just as trivial to prove).

By contrast, a proof that the product of two real numbers is real uses identity (3) and the commutativity of multiplication:

$$\overline{zw} = \bar{w}\,\bar{z} = wz = zw.$$

Because commutativity does not hold for operators, the analogy fails (according to our theory) to show that it is plausible that BA is symmetric whenever B and A are symmetric. In fact, the best result is that if B and A commute and are symmetric, then BA is symmetric. This conjecture is strongly plausible (again, according to our theory), since the single relevant difference is eliminated by restricting or weakening the conjecture.

Consider a second conjecture, which requires a few preliminary definitions. U is called a unitary operator if U*U = I, where I is the identity operator (that is, I(v) = v for all $v \in V$). The condition U*U = I is algebraically similar to the condition $\bar{z}z = 1$, which means that z lies on the unit circle. P is called a positive operator if P = A*A for some A. The analogous property for a complex number r is that $r = \bar{z}z$ for some complex number z, which means that r is a nonnegative real number. It is easy to prove the so-called polar decomposition formula: any complex number z can be written as z = ru, where $r \geq 0$ and u lies on the unit circle. Call this result Q. Does our analogy between complex numbers and operators make it plausible that any (nonzero) operator A can be written in the form A = PU where P is positive and U is unitary? This is the conjecture Q* in which we are interested: a polar decomposition formula for operators.

We have to identify the critical assumptions in the proof of Q. If $z = a + bi$, with $i^2 = -1$ and a, b real numbers, then

$$|z| = (a^2 + b^2)^{1/2} = (\bar{z}z)^{1/2}$$

is the modulus of z. It is simple to prove that z = ru, where $r \geq 0$ is a real number and |u| = 1. Put

$$r = |z| = (\bar{z}z)^{1/2}$$

and, if $r \neq 0$, put u = z/r; if r = 0, put u = 1. Then $\bar{u}u = (\bar{z}z)/r^2 = 1$, establishing that |u| = 1. This proof uses the following facts: every positive real number has a positive square root; every positive real number r has a positive inverse 1/r; and if
$r = (\bar{z}z)^{1/2}$, then 1/r commutes with $z$ and $\bar{z}$. Now not all of the corresponding statements hold for operators. In particular, not every positive operator has an inverse. That is a critical difference. The analogical inference to Q* is blocked; that is, the analogy fails to show that Q* is plausible.17

The analogy between complex numbers and operators is an important reasoning tool. Our plausibility criteria make it clear when the analogy can be applied in a straightforward way, when certain modifications must be made (e.g., restriction to the commutative case), and when the analogy fails altogether to establish plausibility.

17. Since Q* happens to be true, one might say that Plausibility I has led us astray. But this is misleading; our conclusion was merely that a particular analogical inference failed to establish plausibility.

Example 5.9 (series convergence). Analogies can be useful in trying to decide whether an infinite series, or sum of terms, converges to a finite limit. The series

$$\sum_{n=1}^{\infty} 2^{-n} = 1/2 + 1/4 + 1/8 + \dots$$

converges (to 1), while

$$\sum_{n=1}^{\infty} 2^{n} = 2 + 4 + 8 + \dots$$

evidently diverges. Also,

$$\sum_{n=1}^{\infty} 2^{0 \cdot n} = 1 + 1 + 1 + \dots$$

diverges. But sometimes it is not so obvious whether or not a series converges. Consider the question: when does

$$\sum_{n=1}^{\infty} 2^{pn} = 2^{p} + 2^{2p} + \dots$$

converge? First, observe that for any real value p, $2^{pn}$ is geometrically similar to the expressions in all three of the above series. The parameter space here consists of real numbers x in some range including −1, 0 and 1. Suppose p > 0. Then the analogy with the case p = 0 is better than the analogy with the case p = −1, and in fact the case where p = −1 is actually irrelevant (by the discussion of ranking in section 5.5). So it is plausible that if 0 < p, the series diverges. (Divergence here is actually obvious, since each term is larger than 1.) If p < −1, it is plausible that
$$\sum_{n=1}^{\infty} 2^{pn}$$

converges, by analogy with the case p = −1. (Once again, the result is obvious since each term $2^{pn}$ of the series is less than $2^{-n}$ and $\sum_{n=1}^{\infty} 2^{-n}$ converges.) If −1 < p < 0, we can draw an analogy with either of the cases p = 0 and p = −1, obtaining the conflicting hypotheses that the series does and does not converge. If the similarities are admissible in both cases, then we have two plausible but conflicting conjectures.

Example 5.10 (the $l^p$ and $L^p$ spaces). Our final example provides an excellent illustration of asymptotic similarity. The space $l^p$ is a set of sequences $\langle x_n \rangle$ of real numbers, for each $1 \leq p < \infty$, defined as follows:

$$l^p = \Big\{ \langle x_n \rangle \;\Big|\; \sum_{i=1}^{\infty} |x_i|^p < \infty \Big\}.$$

A sequence belongs to the set $l^p$ if the sum of the absolute values of all its terms raised to power p is finite. Any sequence that has only finitely many nonzero terms is in $l^p$ for all $1 \leq p < \infty$. The sequence of reciprocals, $x_n = 1/n$, does not belong to $l^1$ because the sum

$$\sum_{n=1}^{\infty} \frac{1}{n}$$

diverges, but it belongs to $l^p$ for p > 1.

The space $L^p$ is a set of measurable functions $f: \mathbb{R} \to \mathbb{R}$, for each $1 \leq p < \infty$. The definition of measurable functions involves the concept of Lebesgue measure, m, but for our purposes it is enough to state that the notion of measurability is a generalization of continuity. We then define (for a given $p \geq 1$):

$$L^p = \Big\{ f: \mathbb{R} \to \mathbb{R} \;\Big|\; f \text{ is measurable and } \int |f|^p \, dm < \infty \Big\}.$$

A function f belongs to $L^p$ if the Lebesgue integral of $|f|^p$ over the entire space $\mathbb{R}$ is finite. $f(x) = \exp(-x^2)$, for example, is in $L^p$ for all $p \geq 1$. Any probability density function F is in $L^1$, since $\int |F| \, dm = 1$.

In the analogy between $l^p$ and $L^p$, sequences correspond to functions, but there is no mapping from particular sequences to particular functions. The idea, once again, is that operations on the two spaces have similar properties. In particular, summation in $l^p$ is closely analogous to integration on $L^p$; indeed,
within a general theory of $L^p$ spaces, summation is a special type of integration. The analogy between these two structures is extremely useful because it is easier to test conjectures in $l^p$ than in $L^p$. Sequences are simpler objects than functions of a real variable. If (for a given p) some result is true for $l^p$, it is usually, though not always, a good bet that the corresponding result is true for $L^p$.

To illustrate this point, consider the concept of separability. A space X with a metric (or distance function) is separable if it has a countable, dense subset. That is, there is a countable subset E such that if we pick an arbitrary element x of X, we can find an element of E as close to x as we please. Formally, for every $x \in X$ and any $\varepsilon > 0$ there is an element $e \in E$ such that the distance $\rho(x, e) < \varepsilon$.

We can prove that $l^p$ is separable. Let E consist of all sequences with only finitely many nonzero terms, all of which are rational numbers. A typical element of E might be:

$$\langle 1, \tfrac{1}{2}, \tfrac{14}{97}, 12, 0, 0, \ldots \rangle.$$

E is countable because the rationals are countable. To show that E is dense, pick any sequence $\langle x_i \rangle$ such that

$$\sum_{i=1}^{\infty} |x_i|^p < \infty.$$

Since the sum is finite, there is some k such that adding the first k terms brings us close to that finite sum. By the density of the rational numbers, there are rationals $e_1, \ldots, e_k$ close to the first k terms. Let $e_{k+1} = e_{k+2} = \ldots = 0$. Then $\langle e_i \rangle$ belongs to E, and the total distance between $\langle x_i \rangle$ and $\langle e_i \rangle$, defined as

$$\sum_{i=1}^{\infty} |x_i - e_i|^p,$$

is small. This distance can be made as small as we please. The critical assumptions in this argument include the following:

1. Familiar facts about countable sets (the rationals and sets constructed from the rationals by countable operations are countable);
2. The fact that if an infinite sum of positive terms converges, then the tail end of the sequence can be made as small as desired; and
3. The fact that a terminating series can be as closely approximated as desired by a terminating series of rational numbers (derived from the density of the rationals).

In the target domain, $L^p$, the assumptions (1) are unchanged. According to our theory of analogies, the conjecture that $L^p$ is separable is plausible if statements corresponding to (2) and (3) are true (or could be true) for $L^p$, or for a dense subset D of $L^p$ (since we are dealing here with asymptotic similarity).
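The source-domain argument above (truncate the sequence, then replace the first k terms by nearby rationals) can be sketched numerically. This is a minimal illustration under stated assumptions, not part of the original text: the sequence $x_n = \sqrt{2}/2^n$ is chosen only because its tail sum has a closed form, floating-point numbers stand in for real numbers, and all function names are hypothetical. The quantity computed is the sum $\sum_i |x_i - e_i|^p$ used in the argument.

```python
from fractions import Fraction
import math


def term(n: int) -> float:
    """n-th term (n >= 1) of the illustrative sequence x_n = sqrt(2) / 2**n."""
    return math.sqrt(2) / 2**n


def tail_sum(k: int, p: float) -> float:
    """Closed form of sum over n > k of |x_n|**p for the geometric sequence above."""
    ratio = 2.0 ** (-p)
    return (math.sqrt(2) ** p) * ratio ** (k + 1) / (1 - ratio)


def approximate(k: int, max_den: int) -> list:
    """Member of E: first k terms rounded to rationals, every later term equal to 0."""
    return [Fraction(term(n)).limit_denominator(max_den) for n in range(1, k + 1)]


def distance_p(k: int, max_den: int, p: float) -> float:
    """sum_i |x_i - e_i|**p, split into the rounded head and the discarded tail."""
    e = approximate(k, max_den)
    head = sum(abs(term(n) - float(e[n - 1])) ** p for n in range(1, k + 1))
    return head + tail_sum(k, p)


# Keeping more terms and using finer rationals makes the distance smaller.
print(distance_p(5, 10, 2.0))
print(distance_p(20, 10**6, 2.0))
```

The two parameters correspond to critical assumptions (2) and (3): increasing k shrinks the tail, and increasing the denominator bound shrinks the rounding error on the head.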
Corresponding to (2), we can prove that if $\int |f|^p < \infty$, then for any $\varepsilon > 0$ there is a positive $N$ such that the portion of the integral evaluated outside the interval $[-N, N]$ is less than $\varepsilon$. Corresponding to (3) is the result that there is indeed a dense subset $D$ of $L^p$ such that for any $f$ in $L^p$, there is a function $g$ belonging to $D$ such that $g$ approximates $f$ closely:
$$\int_{-N}^{+N} |f - g|^p < \varepsilon.$$
To evaluate the analogical argument amounts to evaluating the plausibility of this last statement about $L^p$. This requires going a little further into the similarity between summation and integration. The Lebesgue integral can be defined in terms of step functions. A step function $\Psi$ is defined on $[a, b]$ by taking points $a = x_0 < x_1 < \cdots < x_n = b$ and setting $\Psi$ to be a constant value $c_i$ on $[x_i, x_{i+1})$. A measurable function $f$ is integrable if $f$ can be as well approximated (for purposes of integration) by a step function as desired; that is, if for every $\varepsilon > 0$ there is a step function $\Psi$ such that $\int |f - \Psi| < \varepsilon$. The integrals of step functions are finite sums, so that in general the integral of $f$ is a limit of such sums. Thus, there is an asymptotic similarity between sums and integrals. Taking one further step, define $D$ to be the set of step functions formed from partitions whose end points $x_0 < x_1 < \cdots < x_n$ are all rational numbers, and whose values $c_i$ are also rational. Any step function $\Psi$ can be closely approximated by a member $\varphi$ of $D$; that is, the integral
$$\int_{-N}^{+N} |\Psi - \varphi|^p$$
can be made as small as desired. And $D$ is countable by the way it is constructed from the rationals. Analogs of the critical assumptions hold for $D$, and $D$ is a countable dense subset, completing our argument for the plausibility of the conjecture that $L^p$ is separable. In this example, the analogy guides us at every step not only in assessing plausibility, but also in devising the proof that $L^p$ is separable. Constructing that proof is not a simple process. We cannot simply transfer the proof from the source $l^p$ to the target domain, $L^p$. The example illustrates how the basic principles that are used in simpler analogies can still operate at a moderately complex level, even though they must be applied in a careful manner.
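The target-domain step can be illustrated in the same spirit. The following is a rough sketch, not a proof, and it rests on assumptions of my own: the target function $e^{-t^2}$, the interval $[-3, 3]$, $p = 2$, equal-width pieces, and a midpoint-rule estimate of the integral. It builds a step function with rational end points and rational values and estimates the integral of the $p$-th power of its distance from $f$.

```python
import math
from fractions import Fraction

def rational_step_function(f, n_pieces, N, max_den=1000):
    """Step function on [-N, N] with rational end points and rational values."""
    edges = [Fraction(-N) + Fraction(2 * N * i, n_pieces) for i in range(n_pieces + 1)]
    values = [Fraction(f(float(lo + hi) / 2)).limit_denominator(max_den)
              for lo, hi in zip(edges, edges[1:])]
    return edges, values

def integral_abs_diff_p(f, edges, values, p, samples=20):
    """Midpoint-rule estimate of the integral of |f - step|^p over [-N, N]."""
    total = 0.0
    for lo, hi, c in zip(edges, edges[1:], values):
        lo, hi, c = float(lo), float(hi), float(c)
        h = (hi - lo) / samples
        total += sum(abs(f(lo + (j + 0.5) * h) - c) ** p for j in range(samples)) * h
    return total

def f(t):
    """Hypothetical target function (an assumption made for illustration)."""
    return math.exp(-t * t)

p, N = 2, 3
errors = {}
for n in (8, 64, 512):
    edges, values = rational_step_function(f, n, N)
    errors[n] = integral_abs_diff_p(f, edges, values, p)
    print(f"{n:3d} pieces: estimated integral of |f - phi|^p = {errors[n]:.3e}")
```

The estimate falls as the rational partition is refined, mirroring the role that rational truncation plays for sequences in the source domain.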
6 Similarity and Patterns of Generalization
6.1 Introduction
In chapter 4, I developed a set of models for evaluating different types of analogical arguments. These models are based upon careful analysis of the prior association, the vertical relationship within the source domain that one hopes to extend to the target. By contrast, they place no constraints upon the horizontal (cross-domain) relationships. Correspondences between the two domains are taken for granted. Chapter 5 proposed such constraints for mathematical analogies. In this chapter, I extend some of those ideas beyond mathematics, concentrating as always on scientific examples. The central idea of the articulation model is that the proper way to evaluate an analogical argument is to assess its potential for generalization. I continue to put that idea to work here. As in chapter 5, I characterize distinct types of similarity and show how they are aligned with different forms of generalization. I limit my attention to three common types of similarity. We have seen examples of each type in earlier chapters.
• Feature matching (section 6.2). Two observable features match if they resemble each other to a high degree. For example, doses of morphine and meperidine cause identical S-shaped tail curvature in mice (Example 4.4).
• Formal similarity (section 6.3). Two features are formally similar if they occupy corresponding positions in formally analogous theories. For example, pitch in the theory of sound corresponds to color in the theory of light (Example 4.7).
• Parametric similarity (section 6.4). Two features exhibit parametric similarity if they can be appropriately represented by the
assignment of distinct values to some variable (or set of variables), and if they are linked by a range of intermediate features corresponding to intermediate variable values. For instance, the characteristics of a culture and its historical ancestor can be linked in a continuous tradition (Example 4.9). Of course, an analogical argument often relies upon more than one sort of similarity, but commonly one of the above types is the predominant relationship between critical factors in the source and target domains. Indeed, as we saw in earlier chapters, many people classify analogies according to whether the most important cross-domain relationships are feature matches or structural (formal) similarities. My starting point is the observation that successful analogical arguments based upon these three very different types of similarity tend to result in different forms of generalization. Analogical reasoning from feature matches often leads to the discovery or extension of a kind. Reasoning from formal similarity results in a common mathematical formalization. Arguments based upon parametric similarity lead to or extend a uniformity or invariant relationship: a pattern of counterfactual dependence. I shall continue to apply the term “generalization” to these three types of achievement, although philosophers also refer to them as forms of unification (Morrison 2000, Woodward 2003). In keeping with my broad thesis, I suggest that a fruitful way to assess analogical arguments is to treat them as aiming at these sorts of generalization (whether or not that aim is explicit). Of course, analogies do not always produce breakthroughs. My point is that a plausible analogy must be consistent with and show some promise of generalization, and that it gains in strength to the extent that it demonstrates such potential. We should evaluate the similarities and differences between our two domains with the appropriate type of generalization in mind. 
Furthermore, I shall argue that these forms of generalization tend to line up with the classification scheme of chapter 4, so that their consideration enriches our theory of analogical arguments. To close this introductory section, let me once again contrast my approach to similarity with approaches commonly adopted in computational models of analogical reasoning. As we saw in chapter 3, most computational models are built around the task of constructing a mapping between the source and target domains. The most popular approaches, following the lead of Gentner and her colleagues or of Holyoak and Thagard, treat analogical mapping as a process driven by the objective of maximizing overall similarity (structural and semantic). Although these models allow for initial similarity judgments (and acknowledge their influence when people are looking for possible source analogs), individual features of the source and target domains ultimately count as similar if placed in correspondence by the “best” global mapping. This is a coherentist and psychologically motivated model of similarity judgments. On my account, the goal of an analogical argument is to generalize a particular logical or explanatory relationship rather than to maximize overall similarity. This contrast reflects my orientation toward normative concerns. It may be psychologically realistic to suppose that what triggers analogical reasoning is an initial perception of striking similarities, and that what drives
it toward a conclusion is a passion for global similarity. From the normative point of view, however, it is appropriate to focus on the endpoint of a successful analogy—namely, a broadened concept or unified theory—and whether there is any reason to expect that such an objective can be attained. Individual similarities and differences must be scrutinized not for contributing to overall similarity but for their role in supporting or blocking generalization.
6.2 Feature Matching and Correlative Analogies
The most basic form of similarity is sharing a common feature or belonging to a common kind.1 Let us use the term “matching” for resemblances that are naturally expressed in perceptual terms. In this section, I concentrate on analogical arguments that appear to rest primarily upon a list of matching features. The following argument, due to Benjamin Franklin, is a splendid illustration.
Example 6.1 (Franklin on lightning). In 1749, Benjamin Franklin predicted that pointed metal rods would attract lightning by analogy with the way they attracted the “electrical fluid” in the laboratory:
Electrical fluid agrees with lightning in these particulars: 1. Giving light. 2. Color of the light. 3. Crooked direction. 4. Swift motion. 5. Being conducted by metals. 6. Crack or noise in exploding. 7. Subsisting in water or ice. 8. Rending bodies it passes through. 9. Destroying animals. 10. Melting metals. 11. Firing inflammable substances. 12. Sulphureous smell. -- The electrical fluid is attracted by points. -- We do not know whether this property is in lightning. -- But since they agree in all the particulars wherein we can already compare them, is it not probable they agree likewise in this? Let the experiment be made. (Franklin 1941, 334)
According to Franklin’s theory, charge was carried by a subtle (i.e., weightless) fluid. All bodies have a natural amount of this fluid; an increase in the amount produces a net positive charge while a decrease leads to a negative charge. His hypothesis, that lightning would be attracted by points, was based on an extensive list of properties common to lightning and “electrical fluid.” There was, however, no known connection between the twelve common properties and the thirteenth projected similarity. This sort of argument poses a challenge for any theory of analogical reasoning, and a particular puzzle for my theory.
1. As Quine puts it, “the notion of kind and the notion of similarity seem[ed] to be substantially one notion” (1969, 119).
In general, the challenge is to explain how arguments such as Franklin’s can be so compelling. As we saw in chapters 2 and 3, many people deprecate analogies based purely upon matching features, rather than upon similarities at the level of deep (typically causal) structure. In the case of my own theory, the difficulty is that Franklin’s argument fits the pattern that I call “correlative” analogical arguments (section 4.9): there is no known prior association
other than statistical correlation among the properties. As I argued in section 4.9, statistical correlation provides too little structure to identify critical factors and provide guidance in assessing plausibility, in marked contrast to what we find with other forms of analogical argument. In order to respond to this challenge, I return to my broad thesis that to evaluate an analogical argument, we should consider its potential for generalization. I suggest that a correlative analogical argument is cogent if it provides reason to infer that the source and target domains are likely to belong to a common kind, corresponding to a common nature that is responsible for the cited and hypothetical similarities. The argument should be evaluated with this idea in mind. The idea seems to apply well to Example 6.1 (Franklin on lightning). Franklin’s point appears to be that the “electrical fluid” in the laboratory and lightning in a thunderstorm are the same substance. The purpose of the analogy is to enlarge a recognized kind. The prediction that a pointed conductor would attract lightning is a consequence of this larger conclusion: if lightning is electricity, it must possess this known characteristic of electrical fluid. Of course, Franklin’s prediction was borne out when the experiment was carried out several times (with occasionally fatal results) over the next two to three years. Two further examples help to clarify and support the idea that correlative analogies (and feature matching) are bound up with inference to a common kind. Example 6.2 (animal spirits). The concept of electricity was further extended, via analogy, to explain animal motions and sensations. 
One researcher cited a long list of similarities to support the conjecture that “electrical vapor” had the “same nature as animal spirits”: The speed with which the electrical vapor moves, changes direction, stops and races forth again seems consistent with the speed and changes in animal sensations and motions. The singular ease of its travel—in general, through electrical bodies by communication, and in particular, through the nervous and muscular parts of animals—is consistent with the ease with which the mutations induced in organs by various objects are conveyed to the seat of sentience; it is also consistent with the agility with which other motions correspondingly ensue in the body. And the contractions and dilations caused in the muscles by an electrical spark or electrical shock are arguments, perhaps even decisive ones, for the abovementioned conjecture. (Beccaria 1753, 126–27, as quoted in Pera 1992, 58) A modern commentator (Pera) has interpreted the first part of the argument as follows: “if the electrical fluid possesses some of the relevant properties that the nervous fluid should also possess—such as transmission speed, force, and penetration power—then, by analogy, there are grounds for proposing the hypothesis that the two fluids are identical.” In both this and Franklin’s example, analogy is used to argue that what have been thought to be two substances are in fact one.2 2. Faraday employed the same type of analogical argument in extending the scope of electrical interactions to the molecular scale. His experiments with electrolysis prompted him to write, “I have such conviction that the power which governs electrodecomposition and ordinary chemical attractions is the same”; see Mahan (1975).
Example 6.3 (Peruvian bark). Analogical reasoning was employed in the discovery of salicin, a precursor of aspirin. It was well known that Cinchona, or “Peruvian bark,” was an effective cure for fevers. Rev. Edward Stone wrote to the president of the Royal Society of London in 1763 to relate his discovery of similar medicinal properties in the bark of the English willow:
There is a bark of an English tree [willow], which I have found by experience to be a powerful astringent, and very efficacious in curing agues and intermitting disorders. About six years ago, I accidentally tasted it, and was surprised at its extraordinary bitterness; which immediately raised in me a suspicion of its having the properties of the Peruvian bark. (Collier 1984, 564)
Willow bark contains salicin, which, as was discovered later, the body converts into salicylic acid, the compound from which aspirin (acetylsalicylic acid) is derived. It is likely that Stone believed that the same substance, responsible for the bitter taste, was found in both barks. Alternatively, we might interpret the analogical argument as the first step in defining a potential new kind (perhaps a genus of chemicals) that includes both substances. Since there is no direct causal link between taste and medicinal qualities, one of these indirect links seems to be the best available grounds for the argument.3
I suggest that all three examples can be analyzed as involving two stages of argument. The first stage is generalization: from a list of shared properties, we infer that two things belong to a nontrivial common kind. As the second stage, we infer some further similarity because the relevant additional feature is congruent with, or strongly associated with, members of that kind. Where there is little or no understanding of the causal relationships behind the correlation, what justifies its extrapolation to the target domain is the promise of a common kind that has the right causal properties.
This pattern of reasoning is characteristic of analogical arguments based on feature matching. The reasoning pattern merits close attention because frequently, as in the case of Example 6.3 (Peruvian bark), the intermediate step is not explicit and we have no clear idea of the appropriate common kind. My objective is to show that the ideal of generalization can nevertheless provide at least rudimentary guidelines.
3. Stone had an additional reason to think that willow bark would prove efficacious in treating fevers. He put the observation that the willow tree “delights in a moist or wet soil, where agues chiefly abound” together with “the general maxim that many natural maladies carry their cures along with them, or that their remedies lie not far from their causes.” Since these propositions, even if conceded, provide very little support for the particular inference under consideration, I shall pay them no further attention.
4. I avoid restriction to natural kinds. As Hacking (1990) points out, even if we can make sense of the notion, many fruitful kinds may not qualify as natural.
The problem, of course, is to determine when an unstructured list of matching features supports an inference to a common kind. There is no good general philosophical solution to this problem. Quine (1969, 1973) suggests that natural selection has equipped us with an instinct for making good similarity judgments and singling out interesting kinds.4 There is certainly some promise for a naturalistic approach to this type of analogical reasoning, but there are also grounds for skepticism. Even without the kind of global skepticism voiced by
Agassi (see section 2.6), it is clear that we make many errors about the significance of properties and kinds. For centuries, people believed that objects superficially resembling the symptoms of an illness could be useful in treating that illness.5 Perhaps nature has equipped us for survival and reproduction, but we need better arguments to persuade us that natural selection furnishes us with a capacity for making the sophisticated judgments about kinds required in scientific reasoning. To sharpen this skeptical objection to Quine (or rather, to an overly ambitious application of Quine’s ideas), let us turn to the field of paleontology, in which inferences about kinds, many of them analogical, play a crucial role. Paleontologists must infer genealogical relationships from lists of anatomical similarities and differences (as observed in the fossil record). Stephen Jay Gould puts the difficulty as follows: Evolutionary and genealogical inferences rest upon the study and meaning of similarities and differences, and the basic task is neither simple nor obvious. If we could just compile a long list of features, count the likenesses and unlikenesses, gin up a number to express an overall level of resemblances, and then equate evolutionary relationship with measured similarity, we could almost switch to automatic pilot and entrust our basic job to a computer. The world, as usual, is not so simple. . . . Similarities come in many forms: some are guides to genealogical inferences; others are pitfalls and dangers. (1989, 213) To illustrate the dangers, Gould makes two important distinctions. The first is the basic biological distinction between homology, or similarity arising from inheritance of features present in common ancestors, and (biological) analogy or homoplasy, or similarity arising from separate evolution for the same function. The dorsal fins of dolphins and certain fishes, or the wings on bats and birds, have features that are merely homoplastic. 
By contrast, the presence of seven neck vertebrae in giraffes, humans, and bats constitutes homology because this characteristic has been inherited by nearly all mammals. Homoplasy might support plausible analogical inferences about function (see Example 4.3), but it is utterly misleading when it comes to drawing conclusions about phylogenetic relationships. Gould writes: [The Burgess Shale species] Anomalocaris may have used its overlapping lateral flaps to swim by undulation, much as certain fishes with continuous lateral fins or flattened body edges do—but this functional similarity, evolved from different anatomical foundations, indicates nothing about genealogical relationship. (213–14)
5. See the example of elephantiasis in section 1.1, and Stone’s reasoning in note 3 above. Very similar principles are still influential in forms of alternative medicine such as homeopathy (Hahnemann 1921). Some skepticism about our “natural” instincts is appropriate.
Gould’s second distinction, within the category of homology, is between shared-but-primitive (symplesiomorphic) and shared-and-derivative (synapomorphic) characteristics. The former category includes similarities such as the possession of a vertebral column or a bivalved carapace, which are inherited but so broadly shared as to be of no value in defining informative genealogical groups. The latter category is typified by more specialized similarities such as the possession of bodily hair (limited to mammals) or opposable thumbs. Making use of these two distinctions, Gould tells us that the basic rule for paleontologists is to exclude both homoplasy and shared-but-primitive homologies in genealogical inferences.6 Gould’s analysis makes it very clear that simple-minded naturalism about scientific kinds (and scientific analogies) is a nonstarter.
Let us take stock. I have suggested that correlative analogies, based on lists of matching features, should be assessed on the basis of whether the features cited provide plausible support for a two-stage inference: first to a common kind, and second to the additional similarity being present in the target. Scientific practice suggests that it is naïve to seek a simple naturalistic justification for such inferences. Very well, then; why not just defer to scientific practice? Following Gould’s lead, we might propose that each science must develop its own sense of which features are appropriate for classification, and hence which are suitable for use in correlative analogies. This strategy has much in common with Goodman’s solution to the problem of induction.
Strongly entrenched predicates, those which have been successfully employed in explanations and predictions in the past, contribute most to the plausibility of grouping together two objects under a common kind.7 In a sense, this more sophisticated form of naturalism provides the best possible solution to the puzzle raised at the start of this section: how to account for the power of certain “merely correlative” analogies. Not just in paleontology, but also in fields such as pharmacology (see Example 4.4, morphine and meperidine), we encounter powerful analogical arguments made without specific causal knowledge. We may attribute the cogency of these arguments to their meeting criteria solidly based on scientific experience, and (with Gould) we can note that these criteria are highly specific to each scientific field or subfield. From a philosophical perspective, however, something is missing. One way to appreciate the difficulty is to recall Kuhn’s famous distinction in The Structure of Scientific Revolutions between inquiry within and outside the context of “normal science.” Work that is done either prior to the establishment of a research paradigm or in periods of “revolutionary” science is not governed by the same
6. As Gould points out, even this principle may be too casual when dealing with certain periods in the distant past (e.g., 530 million years ago, the era of the Burgess Shale specimens), when “unparalleled experimentation” by nature led to the continual re-emergence in distinct genealogical groups of traits that we would ordinarily classify as shared and derivative. In other words, the principle is misleading where homoplasy is common. But it is still a reasonable starting point.
7. Gould’s principle is more subtle than this, since homoplastic and symplesiomorphic traits are arguably entrenched (they play a role in some kinds of explanations) but nevertheless excluded from consideration in genealogical inferences.
principles as normal science. Yet it is in such settings that we often encounter correlative analogies; indeed, the three examples described earlier in this section all fall outside normal science. This suggests that there is still a need to develop, at an intermediate level of generality (above the level of Gould’s principle), basic principles for evaluating correlative analogies. These principles should be conformable to the demands of different disciplines (and thus compatible with more determinate principles such as Gould’s), yet general enough to be applicable even outside normal science.
Allow me to dwell just a little longer, however, on the special case of evolutionary biology. Let’s take another look at Gould’s principle that genealogical inferences should exclude features that are homoplastic and symplesiomorphic. Stated this way, the principle looks circular. Starting merely with observed similarities between two fossil specimens, how can we classify them as homoplastic or symplesiomorphic unless we already know the genealogy? Evolutionary biologists have formulated a number of principles for avoiding this circularity and distinguishing between these different types of similarity at a general level.
• Homology tends to be associated with basic body plans. For instance, merostomes, a group of marine arthropods that includes horseshoe crabs, have strong head shields, trunks of several segments of equal width, and a narrow tail. This body type constitutes a shared-but-primitive homology.
• Homologous features display detailed similarities in structure and position, whereas homoplastic similarities are frequently superficial.
• Homologous features tend to occur in congruent clusters.
In regard to the last item, the importance of clustering to classification was noted very clearly by Darwin:
The importance, for classification, of trifling characters, mainly depends on their being correlated with several other characters of more or less importance.
The value indeed of an aggregate of characters is very evident . . . a classification founded on any single character, however important that may be, has always failed. (Darwin 1860, chapter 13) These ideas are put to work in an analytical technique known as parsimony analysis that allows biologists to infer genealogical relationships directly from similarities. The method does not presuppose any prior classification of a given similarity as homoplasy or homology.8 Parsimony analysis is used to construct cladograms, or phylogenetic trees, that trace the evolution of different taxa. As a simple example, suppose we are trying to sort out the ancestry of humans, northern alligator lizards (which give birth to live offspring), frogs, and dogs.9 Let us consider just two of the many possible phylogenetic trees, where each fork represents a historical branch point (see figure 6.1).
8. We do, however, need to make a general assumption that homoplasy occurs much less frequently than homology; otherwise, the method gives misleading results.
9. This example is a modified version of one provided in Embley et al. (2005).
FIGURE 6.1. Parsimony analysis. [Two candidate phylogenetic trees for the four taxa. Tip order in Tree 1: Human, Dog, Lizard, Frog. Tip order in Tree 2: Human, Frog, Lizard, Dog.]
We begin with a set of characters (or traits). For our example, take the following list: hair, lactation, tail, and live birth. We construct a table indicating whether each character is possessed (+) or absent (-) in each of the taxa. Each character is assigned a fit on each tree, defined as the minimum number of steps (changes) required to explain the presence or absence of the character for all taxa in the tree. This is equivalent to the number of times the trait would have had to evolve (or disappear) independently. Tail has a fit of 1 on Tree 2, since only one change (at the initial fork) is required to explain its evolutionary history; by contrast, on Tree 1, its fit has the value 2 because two steps (evolution at both later forks) are required. The overall tree length is the sum of the fit values over all characters. If one tree has shorter tree length than another, it is taken to be more parsimonious and a more probable representation of actual evolutionary history. In this example, and relative to the given set of characters, Tree 1 is more parsimonious, suggesting that the absence of a tail is a merely homoplastic resemblance between humans and frogs. There are numerous elaborations and complications to parsimony analysis, but I wish to focus on points that I think will be of general value for this type of analogical argument. The main point is that judgments of parsimony are relative to the initial list of characters, so that it is crucial to determine what sorts of characters (singly or in combination) are appropriate. This brings us full circle back to the problem of determining which matching features offer the promise of a common kind. In the broadest sense, the answer is this: features that are informative. 
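The fit and tree-length calculation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than a reconstruction of the original table or figure: the tree shapes are read off the discussion of the tail character, the character states are my own entries based on ordinary biological knowledge (with live birth in the lizard as stated in the text), and the counting rule is the standard Fitch procedure for binary characters, which the text does not name.

```python
# Nested pairs encode the two candidate trees (topology assumed from the text's
# remarks about where the tail character changes).
TREE_1 = (("human", "dog"), ("lizard", "frog"))
TREE_2 = (("human", "frog"), ("lizard", "dog"))

# Assumed character table: True = present (+), False = absent (-).
CHARACTERS = {
    "hair":       {"human": True,  "dog": True, "lizard": False, "frog": False},
    "lactation":  {"human": True,  "dog": True, "lizard": False, "frog": False},
    "tail":       {"human": False, "dog": True, "lizard": True,  "frog": False},
    "live birth": {"human": True,  "dog": True, "lizard": True,  "frog": False},
}

def fitch(tree, states):
    """Return (possible ancestral states, minimum number of changes) for one character."""
    if isinstance(tree, str):            # a leaf: its state is observed
        return {states[tree]}, 0
    left_states, left_cost = fitch(tree[0], states)
    right_states, right_cost = fitch(tree[1], states)
    shared = left_states & right_states
    if shared:                           # the two branches can agree: no change here
        return shared, left_cost + right_cost
    return left_states | right_states, left_cost + right_cost + 1

def fit(tree, character):
    """Minimum number of steps needed to explain one character on a tree."""
    return fitch(tree, CHARACTERS[character])[1]

def tree_length(tree):
    """Sum of the fit values over all characters."""
    return sum(fit(tree, c) for c in CHARACTERS)

for name, tree in (("Tree 1", TREE_1), ("Tree 2", TREE_2)):
    fits = {c: fit(tree, c) for c in CHARACTERS}
    print(name, fits, "length =", tree_length(tree))
```

With these assumed entries, the tail character has fit 2 on Tree 1 and fit 1 on Tree 2, as in the text, and Tree 1 has the shorter overall length. Deleting or adding rows in the table can change which tree comes out shorter, so the verdict depends on which characters are informative enough to include.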
This point appears to be valid for any classificatory enterprise.10 Put simply, ubiquitous features and features known in advance to be misleading provide no helpful information; by contrast, determinate structural properties are highly informative. We can go a little further. Informativeness can be broken down into the following criteria for evaluating analogical arguments based on matching features.
10. Indeed, Kitching et al. (1998, 1) write that cladistics is a general method of classification that can be employed to organize any comparative data (e.g., in linguistics).
(6.2.1) Scarcity (of each individual feature)
The more uncommon the cited similarity, the greater the likelihood that it picks out a distinct kind. In particular, complex structural features are likely
to be rare and therefore significant (as noted above in Gould’s insistence on derived rather than primitive shared traits).
(6.2.2) Congruence (scarcity of a cluster of factors)
By extension of the first point (and with reference to Darwin’s observation), a cluster of matching features that is comparatively rare contributes greatly to the plausibility of a common kind.
(6.2.3) Independence
We should exclude similarities that appear redundant relative to others on the list. Independence provides a means of countering concerns about the arbitrariness of the list. Texts on cladistics refer to the need for rejecting overlapping characters, though this represents an ideal: “in reality, we have a sliding scale from widely overlapping characters to widely disjunct characters” (Kitching et al. 1998, 22).
(6.2.4) Qualitativeness and Discreteness
Qualitative and discrete characters are more reliable indicators of a stable group than quantitative and continuously varying characters. Most cladistic analyses reject characters that are continuous and quantitative (e.g., bone lengths) and employ characters that are discrete and qualitative (e.g., lactation). Such characters are considered more reliable indicators of a clade than quantitative characters, and they permit clear judgments of presence or absence.
(6.2.5) Discipline-specific Constraints
The general criteria (6.2.1)–(6.2.4) have to be balanced against discipline-specific guidelines that help to distinguish between misleading and informative similarities, as in Gould’s injunction to set aside homoplastic similarity. These guidelines inevitably rest upon experience.
All five criteria pertain to the first step of the two-stage process mentioned earlier: inferring the existence of a common kind. Three of the criteria, (6.2.1), (6.2.4), and (6.2.5), also pertain to the second step of transferring a further similarity to the target domain.
The argument becomes more plausible to the extent that this further feature of the target is discrete/qualitative, comparatively rare, and of the sort whose transfer in analogical inferences has often been successful.11 Apart from the support that these criteria derive from our lengthy discussion of the biological example, there are two other means of defending them along with the more general idea that matching features should be informative. We can view classification as a species of statistical reasoning whose foundation may be
11. By way of clarification, there is a trivial sense in which an analogical argument gains in plausibility if the focal feature (the one being transferred by analogy) is ubiquitous, rather than rare. The conclusion that this feature applies to the target domain is certainly more probable if that feature occurs commonly than if it is scarce. But we should not confuse incremental confirmation with overall probability: that a ubiquitous feature is likely to be found in the target domain has a high antecedent probability, to which the analogical argument may contribute nothing.
SIMILARITY AND PATTERNS OF GENERALIZATION
something like the Principle of the Common Cause. What has to be explained is the improbable congruence of certain factors in two distinct domains; the best explanation is the presence of a common cause or common nature. Salmon provides a clear general statement of this principle (due originally to Reichenbach): “The principle of the common cause states, roughly, that when apparent coincidences occur that are too improbable to be attributed to chance, they can be explained by reference to a common causal antecedent” (1984, 158). Salmon formulates the principle in terms of events: “Given two effects, A and B, that occur together more frequently than they would if they were statistically independent of one another, there is some prior event C, which is a cause of A and is also a cause of B, that explains the lack of independence between A and B” (1984, 160). As a stock illustration, when several members of a theatrical company all become ill, we legitimately infer some common cause—perhaps food poisoning from a shared meal. Parsimony analysis seems to rely upon an analogous version of common-cause reasoning, where A and B refer to qualitative traits or characters, and C refers to a biological kind. The common-cause principle plays a fundamental role in the justification of analogical arguments based on matching features. It makes the connection between our list of criteria for features that are informative and the promise of generalization to a common kind. It provides a basis for evaluating the strength of such arguments. Finally, it establishes the potential for this type of analogical argument to be strengthened (to a predictive/probabilistic analogy) by supplying the underlying causal relationship. This point can be illustrated by reference to the earlier Example 4.4 (morphine and meperidine). 
The strength of that argument, which is based entirely on matching features, is due to the highly unusual S-shaped tail curvature produced by both morphine and meperidine, and the thought that this curvature and sedative effects are produced by some common cause, a chemical nature common to the two drugs. It is not my purpose here to defend any version of the Principle of the Common Cause, which is difficult to state in acceptable form and admits of important limitations. I do think that some version of the principle is needed both to make sense of how analogical arguments in the category under discussion are to be assessed, and ultimately to provide such justification as they are capable of. Let us concentrate, however, on the five criteria suggested above. Specifically (and this is the second way of justifying the criteria), I want to show that they work well in evaluating our earlier (nonbiological) examples. I also wish to discuss how we might handle situations in which they seem to pull in opposite directions. By appealing to our criteria, we can see that Franklin’s argument (Example 6.1) fares much better than Stone’s (Example 6.3). Franklin notes a whole cluster of shared properties, discrete in nature, individually uncommon, and for the most part independent of each other. In particular, the fact that metal conducts both lightning and the “electrical fluid” is a rare matching feature. Apart from these, only heat was known to be conducted by metal, and at a much slower rate. Even if we rule out some of the items on his list (“destroying animals” seems to be implied by “rending bodies”), the argument is very strong.
By contrast, Stone’s willow bark analogy is weak to the point of implausibility. It relies upon a single similar feature, bitter taste, which occurs commonly in many plants and herbs (and perhaps most bark)12 and thus is not particularly informative. Although Stone’s conjecture turned out to be correct, it seems to have been more a matter of good luck than good analogical reasoning. One might argue that the bitter taste common to willow bark and cinchona (Peruvian bark) is sufficiently striking to warrant further exploration of the analogy. The analogical argument could certainly be improved by identifying more notable and rare properties common to the two types of bark, in the manner of Franklin’s example. But we should probably reject Stone’s argument in its original form. The criteria we have listed sometimes appear to conflict. A feature may appear to be comparatively rare, in line with (6.2.1), yet be excluded by disciplinary tradition, in accordance with criterion (6.2.5). Franklin invokes color in his analogy because the vivid color common to lightning and electrical sparks in the laboratory is rare and striking. Yet we know that color plays little role in contemporary physical explanations. My response is that it was acceptable, both in the absence of a real theory of electricity and because of the exceptional nature of the phenomenon, to view color as significant and the similarity as an indication of a common nature. As a second example of this type of conflict, one might suggest in connection with Example 4.4 that the striking S-shaped tail curvature observed in mice injected with either morphine or meperidine is rare, but tradition should not accord such an odd similarity any great weight. To the contrary, muscle spasms are a highly important category of physiological response, often observed as consequences of drug treatment. 
In this case, understanding that this rare reaction belongs to a traditionally important category allows us to avoid conflict between criterion (6.2.1) and criterion (6.2.5). The contrast between this example and Stone’s willow bark case is quite interesting. We have two superficially similar analogical arguments, both reasoning from a single gross similarity to the conclusion that some substance has a medicinal virtue. Yet when we consider the key similarity in light of criteria (6.2.1) and (6.2.5), we find a very strong analogical argument in the case of morphine and meperidine, and a very poor one in the case of Peruvian bark and willow bark.

A further concern is that many of our criteria are historically relative. Franklin puts colors and sounds near the top of his list; genetic sequence figures prominently in current biological reasoning. In my view, this historical component to the evaluation of analogical arguments is just fine. Classification schemes, and therefore analogical arguments based on feature matching, may be regarded differently at different times. The basic criteria (6.2.1) through (6.2.5) remain, even though their interpretation and application vary, so that we may expect different judgments about the virtues of an analogical argument.13 Our theory gives us room to make historically sensitive judgments.

12. In fact, quinine and salicin, the chemicals responsible for the respective bitter tastes of cinchona and willow bark, are structurally quite different.
13. This echoes the discussion of relativization in section 1.7.
Franklin’s lightning argument was a good one for his day, while Stone’s willow bark argument (I submit) was not. The examples in this section all appear to support the idea that we can usefully view correlative analogies as either extending an existing kind or defining a new kind. Still, it might be objected that not every correlative analogy fits this pattern. Analogical reasoning from animals to humans appears to provide many counterexamples. Medical experimenters regard a mouse or rat as a suitable “animal model” for humans on the basis of similarities. The genetic constitution and physiology of mice are, in many respects, close to what we find in humans. Recall that in Example 4.4 (morphine and meperidine), there are actually two analogical arguments. Following the analogical inference that meperidine might have the same analgesic properties as morphine on rats, there is an analogical inference that meperidine may be expected to have analgesic effects on humans. Such an inference, it might be argued, requires no postulation of a common kind. We pass from the effects of drugs on rats to their likely effects on humans by appealing directly to similarity, not to any common kind. So goes the objection. My initial reply is that our five evaluation criteria are still relevant to establishing the plausibility of these and other similar analogies. It is still important that the shared properties on which the argument rests be rare, independent, congruent, and so forth. Thinking in terms of a common kind is a convenient way to summarize these criteria. Furthermore, as noted already, we can strengthen an analogy of this type if we are able to upgrade the prior association from mere correlation to causal explanation. Informative characteristics, as indicated by our evaluation criteria, are likely to play a role in such a causal explanation. I believe that we can offer a stronger response to the objection. 
Analogical reasoning from animals to humans really does require the possibility of a nontrivial common kind, even though this may not be a standard biological grouping. One essayist has suggested that scientists are fast becoming convinced of the fundamental symmetry among species that licenses not merely traditional inferences from animals to humans but also inferences in the reverse direction. The author writes:

The same genes organize the body plan and fundamental structure-building and metabolic mechanisms across vertebrates and even invertebrates . . . brain structures and organization develop in a highly coordinated fashion . . . [and] these structural similarities are echoed in behavioral patterns. (Bradshaw and Finlay 2005, 149)

Such observations point the way to a “more coherent picture of both human and non-human species,” a picture that stresses what is common.
6.3 Formal Similarity and Pythagorean Analogies

Formal Analogy and Formal Similarity

Hesse defines formal analogy as “one-to-one correspondence between different interpretations of the same formal theory” (1966, 68). Hempel writes that two
sets of phenomena exhibit nomic isomorphism if there is a “syntactic isomorphism between [the] corresponding sets of laws” (1965a, 436). The essential idea is that the two sets of physical laws have a common mathematical form and may be obtained by assigning different physical interpretations to the symbols that appear in that common form.14 In this section, my objectives are to introduce a slight (but important) generalization of this idea, to present an interesting and important challenge to a whole class of formal analogies as put forward by Steiner (1989, 1998), and to show how my model of analogical arguments offers a way to respond to that challenge. Hempel’s definition echoes ideas that we find in nineteenth-century physicists such as Kelvin and Maxwell. Instead of nomic isomorphism, Maxwell speaks of physical analogy. In his 1855 paper “On Faraday’s Lines of Force,” he defines this as “resemblance in form” between “the laws of one science and those of another” (1890, 156), which is very close to Hempel and Hesse’s conception of formal analogy. Maxwell identifies numerous formal (or physical) analogies between heat, fluid flow, and inverse square law forces. The following passage, which discusses the analogy between attractive forces and heat conduction, is a typical example: The laws of the conduction of heat in uniform media appear at first sight among the most different in their physical relations from those relating to attractions. The quantities which enter into them are temperature, flow of heat, conductivity. The word force is foreign to the subject. Yet we find that the mathematical laws of the uniform motion of heat in homogeneous media are identical in form with those of attractions varying inversely as the square of the distance.
We have only to substitute source of heat for centre of attraction, flow of heat for accelerating effect of attraction at any point, and temperature for potential, and the solution of a problem in attractions is transformed into that of a problem in heat. (1890, 157)

The mathematical law governing the electric field $E(x)$ due to an electric potential $\varphi(x)$ at a point $x$ due to a system of charged particles is (for constant $k_1$)

$$E(x) = -k_1 \nabla \varphi(x),$$

where $\nabla$ stands for the gradient operator. The heat flow (or heat flux vector) $H(x)$ due to a temperature distribution $T(x)$ is (for constant $k_2$)

$$H(x) = -k_2 \nabla T(x).$$

So both equations have the form
14. We don’t distinguish here between laws and law statements, but this distinction should be clear from the context.
$$A(x) = -k \nabla f(x),$$

where $k$ is a constant. Another classic example, discussed by Hempel, is the analogy between the flow of electric current in a wire and fluid in a pipe. Ohm’s law,

$$\Delta v = iR,$$

stating that voltage difference along the wire equals current times a constant resistance, is mathematically identical to Poiseuille’s law (for ideal or Newtonian fluids),

$$\Delta p = \dot{V} k,$$

or pressure difference along the pipe equals volume of flow per unit time times a constant. The two domains are formally analogous. Both can be represented by a common equation whose symbols are interpreted differently.

As these examples illustrate, we have nomic isomorphism when two laws share a common mathematical form. The first point I want to stress, however, is that formal analogy should not be limited to nomic isomorphism. There can be formal analogy between the laws of two theories or between (statements describing) empirical phenomena that are consequences of the laws. We can also have both sorts of analogy at once. To illustrate, consider Priestley’s analogy between charged spherical shells and shells of uniform mass density (Example 4.2) and the acoustical analogy between spectral lines and harmonic overtones of a stringed instrument (Example 4.7). In both of these cases, we may say that the formal analogy exists initially between observable phenomena, rather than between physical laws. There is a formal analogy with the potential for nomic isomorphism.

I shall say, then, that we have a formal analogy between two domains whenever we can abstract a common mathematical form (typically an equation or set of equations), regardless of whether this relates laws or descriptions of empirical phenomena. Two objects, relations, or functions are formally similar if they correspond within the context of a formal analogy, that is, if they are interpretations of the same symbol in the common abstract form.
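The definition of formal similarity can be made concrete with the two pairs of laws given above. The following is only an illustrative sketch: the first table uses the symbols of the common form already stated in the text, while the abstract labels in the second table ($\Delta u$, $q$, $r$) are hypothetical names introduced here for illustration and do not appear in the original.

```latex
% Common form (from the text):  A(x) = -k \nabla f(x)
\[
\begin{array}{lll}
\text{abstract symbol} & \text{electrostatics} & \text{heat conduction} \\ \hline
A(x) & E(x) \ \text{(electric field)} & H(x) \ \text{(heat flow)} \\
f(x) & \varphi(x) \ \text{(potential)} & T(x) \ \text{(temperature)} \\
k    & k_1 & k_2
\end{array}
\]

% Common form with hypothetical labels:  \Delta u = q\, r
\[
\begin{array}{lll}
\text{abstract symbol} & \text{Ohm's law} & \text{Poiseuille's law} \\ \hline
\Delta u & \Delta v \ \text{(voltage difference)} & \Delta p \ \text{(pressure difference)} \\
q & i \ \text{(current)} & \dot{V} \ \text{(volume of flow per unit time)} \\
r & R \ \text{(resistance)} & k \ \text{(constant)}
\end{array}
\]
```

Read row by row, each table lists items that count as formally similar in the sense just defined: $E$ and $H$ interpret the same symbol $A$, potential and temperature interpret $f$, and so on.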
Formal similarity is closely related to the notion of algebraic similarity, defined in section 5.4 for mathematical analogies. As in that section, it is not usually necessary to isolate the abstract form. It is important, however, to note some significant issues about formal similarity and formal analogies that emerge when we move from a purely mathematical to an empirical setting. In the first place, in the mathematical setting we have one basic pattern of analogical argument, where we reason from formally similar assumptions to the plausibility of similar consequences. In empirical settings, there are two basic patterns associated with formal analogies: mathematical analogies, used to argue from similar causes to similar effects, and abductive analogies, used to argue from formally similar effects to formally similar causes. It is precisely to make sense of this second category that we need to liberalize the notion of formal analogy to include cases of similarity in form between observable phenomena. If formal
similarity were limited to resemblance between the laws of two different systems, then it could not play any role in abductive analogies. This point is illustrated by Priestley’s analogical argument (Example 4.2). The known formal analogy exists between the experimental setups (two hollow spherical shells with uniform mass or charge distribution), and between the (indirectly) observed consequences (the absence of force inside the shell). The plausible conclusion of the argument is that there is a similar explanation for these observations: that electrostatic attraction (like gravitation) obeys an inverse square law.

With these preliminary observations out of the way, we come to an interesting question: can a purely formal analogy contribute to the plausibility of an empirical hypothesis?15 Maxwell, Hempel, and Hesse all concentrate on examples in which a formal analogy has a physical basis, despite its mathematical characterization. Indeed, in all of the examples thus far, the formal analogies have clear and well-understood physical interpretations. Yet we have characterized formal analogy in abstract, mathematical terms. This raises the problematic possibility of using a purely formal analogy, with no physical interpretation, to reason about a physical system. It is tempting to accept the following thesis (which implies a negative answer to our question):

(6.3.1) Requirement of Physical Analogy
A necessary condition for an analogical argument in the empirical sciences to be plausible is that the relevant similarities that constitute the basis for the argument have known physical significance.

Steiner (1989, 1998) rejects this requirement, at least to the extent of identifying a whole class of purely formal analogies that have been successful, and influential, in the history of physics.16 Although I believe that the requirement of physical analogy, as stated in condition (6.3.1), is not quite correct, I shall propose and defend a similar principle.
Pythagorean Analogies and Steiner’s Problem

We need to say a little more about what it means for a formal analogy to have, or to lack, physical significance. For Steiner, this point is the basis for an important distinction between two categories of formal analogy. A first-order formal analogy is one with a physical basis: the relevant mathematical similarities correspond to physical properties. Steiner’s example is the mathematical property of linearity, which corresponds to the physical property of superposition. Steiner is not very interested in such analogies, which he regards as relatively unproblematic. I think that even first-order analogies are somewhat mysterious, but Steiner

15. Some people, for example Morrison (2000), might question whether physical analogy plays a role in plausibility arguments. I believe that historical examples such as the acoustical analogy strongly support an affirmative answer, provided we extend the notion of physical analogy to observable phenomena as I have been suggesting.
16. It would be possible for Steiner to retain condition (6.3.1) by arguing that none of these analogies is plausible.
restricts his attention to a deeply mysterious second category which he calls Pythagorean analogies. A Pythagorean analogy relies on purely mathematical similarities that have no known physical interpretation at the time they are proposed, even though for successful analogies, a physical basis tends to emerge later. In Steiner’s words, a “Pythagorean analogy at time t” is “a mathematical analogy between physical laws (or other descriptions) not paraphrasable at t into nonmathematical language” (1998, 54). The label refers to the pre-Socratic philosopher Pythagoras and his followers—mathematical mystics who were disposed to “find” analogies between natural phenomena and mathematical or musical relationships. Aristotle provides a good description of Pythagorean reasoning: all the properties of numbers and scales which they [the Pythagoreans] could show to agree with the attributes and parts and the whole arrangement of the heavens, they collected and fitted into their scheme; and if there was a gap anywhere, they readily made additions so as to make their whole theory coherent. E.g. as the number 10 is thought to be perfect and to comprise the whole nature of numbers, they say that the bodies which move through the heavens are ten, but as the visible bodies are only nine, to meet this they invent a tenth—the “counter-earth.” (Met 986a4–11) In Steiner’s view, modern-day physicists, who share the astonishing faith that purely mathematical analogies can lead to physical theories that truly describe reality (or at least lead to precise, accurate predictions), are latter-day descendants of Pythagoras—although their analogies are markedly more successful than his. Steiner argues that Pythagorean analogies have been employed with great success in nineteenth- and twentieth-century physics. We shall shortly consider three examples. 
First, however, it is important to appreciate Steiner’s interest in Pythagorean analogies, which partly coincides with their interest for my theory. They generate two puzzles. First, analogies based on formal similarities with no known physical interpretation “should not work at all” (Steiner 1989, 454), but many of them do succeed. The formal similarities are “sufficiently irrelevant that their success evokes, or should evoke, puzzlement.” Second, for similar reasons, it seems that Pythagorean analogies should not even confer plausibility on a hypothesis. Philosophers thus face the daunting task of explaining both why Pythagorean analogies are sometimes justified and why they are so often successful.

These reflections prompt Steiner to reject what he calls “naturalism,” roughly the view that humans have no special place in the cosmos. He reasons as follows:

1. Pythagorean analogies have played a crucial role in many fundamental physical discoveries.
2. Pythagorean analogies are anthropocentric. They depend on mathematical resemblances selected on the basis of aesthetic appeal rather than physical significance.17

17. That is, mathematicians single out these features because they appeal to their (very human) sense of symmetry, beauty, or simplicity.
3. Naturalism cannot accommodate any form of anthropocentric reasoning about physical reality.

Hence, recent discoveries in physics provide a severe challenge to naturalism.

I don’t wish to discuss Steiner’s argument in this form because I don’t want to embark on a discussion of naturalism and anthropocentrism. What is most interesting, in my view, is the claim that a vitally important class of analogical arguments is so deeply mysterious that success appears to be a miracle. So let me restate Steiner’s problem in the following way. Consider the following propositions:

(1) Pythagorean analogies have played a crucial role in many fundamental physical discoveries.
(2) Pythagorean analogies contribute nothing to the plausibility of a physical hypothesis.
(2a) Pythagorean analogies are based on mathematical similarities with no known physical significance.
(2b) Reasoning about the physical world based on purely mathematical similarities contributes nothing to the plausibility of a hypothesis.
(3) The fundamental discoveries were in fact based on (somewhat) plausible arguments.

The version of Steiner’s problem I shall consider is that propositions (1), (2), and (3) are incompatible. Nobody wants to reject (3) and attribute the breakthroughs in physics to chance. That leaves two types of solution. A Pythagorean solution rejects (2), and faces the task of explaining how a Pythagorean analogy can provide plausibility. This forces us to give up criterion (6.3.1), the requirement of physical analogy. By contrast, a non-Pythagorean solution rejects (1) by showing that Steiner’s analogies are not, in fact, Pythagorean. I shall offer a non-Pythagorean solution to Steiner’s problem. In order to clarify further the nature of Pythagorean analogies, and to take the first steps toward solving Steiner’s problem, let’s consider some examples.

Example 6.4 (electromagnetic radiation). Maxwell made use of analogical reasoning in his prediction of electromagnetic radiation.
This famous example has received a great deal of attention. Here is how Steiner describes Maxwell’s reasoning in 1873:

Maxwell noted that [existing laws of electricity and magnetism] contradicted the conservation of charge, though the phenomenological laws were strictly in accord with the evidence then available. Yet, by tinkering with Ampère’s law, adding to it the “displacement current,” Maxwell succeeded in getting the laws actually to imply charge conservation. With no other empirical warrant (Ampère’s law stood up well experimentally; on the other hand, there was ‘very little experimental warrant’ for the physical existence of the “displacement current”), Maxwell changed Ampère’s law to read that (the “curl” of) the magnetic field is given by the sum of the “real” current and the “displacement current.” Ignoring the empirical basis for Ampère’s law (magnetism is caused by an
electric current), but by formal mathematical analogy, Maxwell now asserted the law even for a zero “real” current! Thus did Maxwell predict electromagnetic radiation, produced later by Hertz. (Steiner 1989, 458)

Steiner is following a common interpretation according to which Maxwell’s “tinkering” was motivated by the need to restore conservation of charge.18 Ampère’s law for the magnetic field $B$ (and current $J$),

$$\operatorname{curl} B = 4\pi J,$$

had a great deal of empirical support, but is valid only if the charge distribution does not vary over time. Maxwell modified the equation to

$$\operatorname{curl} B = 4\pi J + \frac{1}{c^2}\frac{\partial E}{\partial t},$$

adding a term (the displacement current) that ensures charge conservation.19 The new term matters only if the charge distribution (represented by the electric field $E$) is not constant. Maxwell then noted the possibility of electromagnetic radiation even where $J$ is zero.

Steiner counts Maxwell’s prediction of electromagnetic radiation as a Pythagorean analogy. The analogy here is between known solutions to Maxwell’s (and Ampère’s) equation where $J$ is nonzero and $\partial E/\partial t$ is zero, and conjectured solutions where $J$ is zero and $\partial E/\partial t$ is nonzero. According to Steiner, the two cases could have had (in 1873) only a mathematical resemblance: namely, they solve the same equation. That makes the analogy Pythagorean. To appreciate more clearly why we have a Pythagorean analogy, let us represent Maxwell’s reasoning (by Steiner’s lights) in figure 6.2.
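[FIGURE 6.2. Maxwell’s Pythagorean analogy. Diagram: Maxwell’s equation, $\operatorname{curl} B = 4\pi J + (1/c^2)\,\partial E/\partial t$, branches into Ampère’s law, $\operatorname{curl} B = 4\pi J$ (for $\partial E/\partial t = 0$), with observed solutions, and the no-current case, $\operatorname{curl} B = (1/c^2)\,\partial E/\partial t$ (for $J = 0$), with predicted solutions (electromagnetic radiation).]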
18. Maxwell’s reasoning is often presented this way in textbooks on the subject; see Purcell (1985) or Feynman (1963).
19. Charge conservation requires $\operatorname{div} J + \partial\rho/\partial t = 0$, where $\rho$ stands for charge density. Ampère’s law implies that $\operatorname{div} J = 0$ (applying the divergence operator to both sides and using the fact that $\operatorname{div} \operatorname{curl} B = 0$), but that is valid only for a closed circuit. When we apply the divergence operator to both sides of Maxwell’s equation and make use of Gauss’s law ($\operatorname{div} E = 4\pi c^2 \rho$), we get the correct result for charge conservation.
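The calculation summarized in note 19 can be spelled out step by step. What follows is a sketch in the unit conventions of the text itself (Gauss’s law written as $\operatorname{div} E = 4\pi c^2\rho$); the only added assumption is that the divergence and the time derivative may be interchanged.

```latex
\begin{align*}
0 = \operatorname{div}(\operatorname{curl} B)
  &= 4\pi \operatorname{div} J
     + \frac{1}{c^2}\,\frac{\partial}{\partial t}\bigl(\operatorname{div} E\bigr)
     && \text{(divergence of Maxwell's equation)} \\
  &= 4\pi \operatorname{div} J
     + \frac{1}{c^2}\,\frac{\partial}{\partial t}\bigl(4\pi c^2 \rho\bigr)
     && \text{(Gauss's law)} \\
  &= 4\pi\left(\operatorname{div} J + \frac{\partial \rho}{\partial t}\right).
\end{align*}
% Hence div J + \partial\rho/\partial t = 0, which is charge conservation.
% Without the displacement term the same step gives only div J = 0.
```

Dropping the displacement term leaves $0 = 4\pi \operatorname{div} J$, which is the restricted result the note attributes to Ampère’s law.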
The Pythagorean analogy is the dotted arrow between the observed and predicted solutions to Maxwell’s equation. The pattern of reasoning is to take a well-confirmed equation (in this case, Ampère’s law), generalize by removing a background condition (in this case, $\partial E/\partial t = 0$), and then conjecture the existence of some other solution. Such an analogy should convey no plausibility on the conjecture, because the available evidence supports only the initial specialized equation. All we have to go on is mathematical resemblance: both the observed and predicted phenomena are solutions to a common mathematical equation.

Example 6.5 (Schrödinger equation). According to Steiner, a similar style of reasoning was employed by Schrödinger in the discovery of his famous equation. Steiner represents Schrödinger’s reasoning as follows. Schrödinger “assumed that a particle of constant energy E corresponds to a wave of frequency” $\omega = E/\hbar$ (relying on Planck’s $E = \hbar\omega$). If we put

$$\Psi = A e^{i(kx - \omega t)},$$

then differentiating with respect to $t$ and re-arranging yields

$$i\hbar \frac{\partial \Psi}{\partial t} = E\Psi.$$

But we also have $p = \hbar k$ (from de Broglie), and so we get

$$\frac{\partial^2 \Psi}{\partial x^2} = -k^2 \Psi = -\frac{p^2}{\hbar^2}\Psi,$$

so that

$$-\frac{\hbar^2}{2m}\frac{\partial^2 \Psi}{\partial x^2} = \frac{p^2}{2m}\Psi = E\Psi,$$

so long as the potential energy is zero, since then $E = p^2/2m$. Putting the equations together yields

$$-\frac{\hbar^2}{2m}\frac{\partial^2 \Psi}{\partial x^2} = i\hbar\frac{\partial \Psi}{\partial t},$$

a special case of the Schrödinger equation with zero potential and constant energy. Schrödinger extended this to the general case where the potential energy $V$ is not zero:

$$\left(-\frac{\hbar^2}{2m}\frac{\partial^2}{\partial x^2} + V\right)\Psi = i\hbar\frac{\partial \Psi}{\partial t}.$$

He then conjectured that all the solutions to this general equation could be found in nature. For Steiner, this analogy between known and predicted solutions has the same logical structure as Maxwell’s argument: passing from examples that
support a restricted equation to an unrestricted (and unmotivated) general equation, and thence to the hypothesis that mathematical solutions of the general equation indicate real possibilities.

Example 6.6 (relativistic Schrödinger equation). Schrödinger made further use of a Pythagorean analogy to guess the form of the relativistic wave equation. The pattern of reasoning here is different from what we saw in the first two examples. To understand it, we once again invoke the neat relationship between the nonrelativistic Schrödinger equation and its classical analog. As we just saw, from Hamilton’s energy equation,20

$$E = \frac{p^2}{2m} + V, \tag{6.3.2}$$

we get the Schrödinger equation by substituting differential operators

$$\hat{E} = i\hbar\frac{\partial}{\partial t}$$

for energy ($E$) and

$$\hat{p} = -i\hbar\frac{\partial}{\partial x}$$

for momentum ($p$). This gives us the (one-dimensional) Schrödinger wave equation:

$$\hat{E} = \frac{\hat{p}^2}{2m} + V, \tag{6.3.3}$$

or in expanded form,

$$i\hbar\frac{\partial}{\partial t}\Psi(x,t) = -\frac{\hbar^2}{2m}\frac{\partial^2}{\partial x^2}\Psi(x,t) + V(x)\Psi(x,t).$$

Schrödinger suggested (by analogy) applying the same substitutions to Einstein’s relativistic mass-energy equation

$$E^2 - p^2c^2 = m^2c^4 \quad (c = \text{the speed of light}) \tag{6.3.4}$$

to obtain

$$\hat{E}^2 - \hat{p}^2c^2 = m^2c^4. \tag{6.3.5}$$

In expanded form, we have (in one dimension)
20. Here, E represents total energy, V is potential energy, m is mass, and p is momentum.
$$-\hbar^2\left(\frac{\partial^2}{\partial t^2} - c^2\frac{\partial^2}{\partial x^2}\right)\Psi(x,t) = m^2c^4\,\Psi(x,t).$$

The substitution that leads from equation (6.3.4) to (6.3.5) is motivated purely by formal analogy with the substitution that leads from equation (6.3.2) to (6.3.3). It yields a result known as the Klein-Gordon equation. Steiner comments that Schrödinger, and after him Dirac, had “faith that, despite the vast difference between Hamilton’s and Einstein’s energy equations, a substitution that ‘worked’ in one case should work in the other” (1998, 161). That faith is sustained not by any physical analogy but, rather, by the drive toward mathematical generalization and the idea that physical reality somehow comes along for the ride. The mathematical substitutions that worked in one setting might succeed in a “similar” setting: that is the characteristic structure of this type of Pythagorean analogy. It is Pythagorean because the only support for equation (6.3.5) is the mathematical similarity of the relation between (6.3.4) and (6.3.5) and the relation between (6.3.2) and (6.3.3).

With these three examples on board, we are now in a better position to appreciate Steiner’s argument. Many fundamentally important analogical arguments in physics have been Pythagorean: the arguments have been based on mathematical similarities with no known physical basis. Absent any known physical similarity, the basis for identifying the salient mathematical similarities can only be our sense of which mathematical properties are interesting, elegant, simple, or otherwise worthy of note. But unless we suppose that the aesthetic sensibilities of the human mind are somehow aligned with the natural world (this is where anthropocentrism comes in), there is no basis for expecting that they should provide even the weakest of guidance in seeking the fundamental laws of nature. Steiner puts the rhetorical question: “Does a sentence have an enhanced claim to be true because it is a palindrome?”
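To see concretely what the operator substitution in Example 6.6 accomplishes, it may help to apply (6.3.5) to a plane wave. This is a sketch only: it assumes the same wave form $\Psi = A e^{i(kx - \omega t)}$ used in Example 6.5, together with $E = \hbar\omega$ and $p = \hbar k$, and it is offered as an illustration rather than as part of Steiner’s or Schrödinger’s reasoning.

```latex
\begin{align*}
\hat{E}^2\,\Psi
  &= \left(i\hbar\frac{\partial}{\partial t}\right)^{\!2}\Psi
   = -\hbar^2\frac{\partial^2 \Psi}{\partial t^2}
   = \hbar^2\omega^2\,\Psi
   = E^2\,\Psi, \\
\hat{p}^2\,\Psi
  &= \left(-i\hbar\frac{\partial}{\partial x}\right)^{\!2}\Psi
   = -\hbar^2\frac{\partial^2 \Psi}{\partial x^2}
   = \hbar^2 k^2\,\Psi
   = p^2\,\Psi, \\
\bigl(\hat{E}^2 - \hat{p}^2 c^2\bigr)\Psi
  &= \bigl(E^2 - p^2 c^2\bigr)\Psi .
\end{align*}
% The plane wave therefore satisfies (6.3.5) exactly when
% E^2 - p^2 c^2 = m^2 c^4, i.e. when (6.3.4) holds.
```

On this wave form the operator equation reduces to Einstein’s relation, just as (6.3.3) reduces to Hamilton’s equation (6.3.2) in the nonrelativistic case; the check shows the formal parallel but does nothing by itself to supply a physical basis for it.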
Pythagorean Solutions to Steiner’s Problem

One response to Steiner’s argument runs as follows. We can concede a great deal (that many great analogies in physics really have been Pythagorean and that they really did derive from our aesthetic sensibilities) but still deny his conclusion. As Steiner himself observes, physicists have had moderate success with Pythagorean analogies. Reasoning by induction from this success, they are justified in continuing to employ this form of reasoning. Liston (2000) appears to offer an argument of this sort: “[The scientist] can admit that no one knows how [Pythagorean] reasoning works and argue that the very fact that similar strategies have worked well in the past is already reason enough to continue pursuing them hoping for success in the present instance” (200). In fact, Steiner anticipates and rejects this line of reasoning. What counts as a similar strategy? The inductive argument applies to a new case only if it
exemplifies a “similar strategy” to successful past cases. But that similarity can only be purely mathematical. So the analogy to past cases is itself Pythagorean! The inductive argument thus fails if we reject Pythagorean reasoning and succeeds if we accept it. Either way, the argument is idle. A second response to Steiner’s argument is to deny that mathematical concepts are anthropocentric. We can embrace Steiner’s proposal that the mathematical mind and the world are in alignment. One way to do this is to appeal to natural selection. It is no surprise that our mathematical instincts are reliable guides to the physical world because, like everything else, they evolved under the pressure of evolutionary processes. Again, Steiner anticipates the objection and provides a simple but effective response (akin to our response to a similar appeal to natural selection in section 6.2). An evolutionary explanation might succeed at the level of everyday mathematics and macroscopic physics, but there can be no such explanation for why recondite mathematical concepts should prove so effective in discovering the laws of the atomic and subatomic worlds. This reply casts serious doubt on whether natural selection can solve the problem. A third response is to appeal to some general theory of analogical reasoning that shows that Pythagorean analogies can, in fact, establish a measure of plausibility. I believe that many computational theories of analogical reasoning could accomplish this task. I have in mind the structuralist theories exemplified by some of the work of Gentner and her colleagues, and by Holyoak and Thagard. The problem, as explained in chapter 3, is that justification is not the central concern in such theories, and perhaps comes too easily if one constructs representations of two domains that exhibit similar syntactic structure. In short, Pythagorean solutions either fail or prove too much. 
The problem is that it’s simply not true that every Pythagorean analogy contributes to plausibility. Analogical arguments (whether Pythagorean or not) have to be considered individually, rather than tackled en masse. Consider once again Liston’s inductive argument. In discussing Schrödinger, Liston writes: “Schrödinger’s strategy is less blind faith in the formalism than it is faith that past successes of the formalism indicate that the formalism somehow encodes important physical facts” (2000, 203). This is still too general, but it is an interesting refinement of the simple inductive response that is on the right track. The suggestion amounts to this: an analogy that looks Pythagorean on the surface may in fact not rely on purely mathematical similarities. I would like to develop this suggestion. We don’t need either the inductive or the evolutionary arguments if we can show that a putatively Pythagorean analogy breaks down into elementary forms of analogical argument that are respectable because they are not, in fact, Pythagorean.
A Non-Pythagorean Solution

My response to Steiner’s argument takes the form of a strategy for dealing with individual allegedly Pythagorean analogies, rather than a single argument that takes them all on at once. The strategy is to show that the impression of
Pythagorean reasoning dissolves when we consider the fine structure of the argument. Upon close scrutiny (here is where my strategy takes on more shape), we find that these arguments are elementary types of analogical argument wrapped in complex physics (and mathematics), or else combinations of elementary types. They appear Pythagorean only if we lift part of the reasoning free of the full argument. My strategy can best be explained by examining our three examples of Pythagorean reasoning, with particular attention given to the first.

Example 6.4 (continued). Recall Steiner’s view of Maxwell’s argument, as illustrated earlier in figure 6.2. There are two possible reasons to view this as a Pythagorean analogy. The first might be that the generalization from Ampère’s law is a purely mathematical move, lacking any physical basis. The second might be that any instance of the following simple pattern of analogical reasoning (let’s call it the equation analogy) counts as Pythagorean:
Equation analogy

Differential equation
Observed solution (physical reality) | Mathematical solution (physical possibility)
Either reason, if cogent, would rule out any physical interpretation of the resemblance between observed and predicted solutions. In fact, neither reason is convincing. The generalization has clear physical motivation, and the equation analogy is a composite of accepted types of analogical reasoning. Let us take these points in turn. My first point is that if Steiner is correct and the generalization is motivated by the need to restore charge conservation, then it has a physical basis. The actual history is extremely complicated and the subject of much scholarly work (Bork 1963, Bromberg 1967, Hesse 1973, Morrison 2000), but in fact, it seems clear that in his earlier work, Maxwell motivated the generalization by an elaborate mechanical analogy. Siegel (1986,1991) takes Maxwell’s attempts to develop an elaborate mechanical model of the “magneto-electric medium” very seriously. Siegel notes that even in “Faraday’s Lines,” where Maxwell claims only a pedagogical value for mechanical analogies, he looks forward to proposing a theory in which “physical facts will be physically explained.” In his 1861 essay, “On Physical Lines of Force,” we no longer have a mere heuristic, but rather a “theory of molecular vortices.” Maxwell’s goal, writes Siegel, was “not a complete and consistent set of electromagnetic equations for its own sake, but rather a
complete and consistent mechanical model of the electromagnetic field” (1986, 113) in accordance with “the prevailing norms in British physics at the time.” In refusing to ignore this mechanical model, “rather than seeing Maxwell as a muddle-headed twentieth-century physicist who happened to live in the nineteenth century, one is able to see Maxwell as a clearheaded nineteenth-century physicist, whose accomplishments grew out of, and are comprehensible within, the nineteenth-century context” (1986, 145–46).

Rather than undertaking a lengthy technical explanation of Maxwell’s mechanical models, which Siegel and others have provided, I offer only a brief summary. Maxwell’s model of the magneto-electric medium initially consisted of closely packed rotating cells (molecular vortices) filled with fluid, and separated by a single layer of small, spherical particles that functioned rather like ball bearings. Maxwell recognized that if the vortices all rotated with equal angular velocity, the ball bearings would remain stationary, but if the rotation were not uniform, the small particles would be carried along. He “identified” the ball bearings as electricity and their flux density as the current density J; the magnetic field corresponded to the large cells, with direction given by the axis of rotation and magnitude given by the speed of rotation. Ampère’s law then correctly describes the relationship between the rotation of the vortices and the motion of the ball bearings. Yet there are some problems. To derive this analog to Ampère’s law, Maxwell had to assume that the vortices rotated as rigid bodies; on the other hand, it was important for other reasons to assume that they were fluid. Maxwell’s compromise in his final model (part III of “On Physical Lines of Force”) was to picture the vortices as made of an elastic material.
This change made it necessary to introduce a corrective term in the equation relating the motion of the ball bearings to the rotation of the vortices because deformations in the elastic material would change the motion of the ball bearings. The corrective term was precisely the displacement current.

In terms of the models of analogical reasoning I have proposed, we have a complex abductive analogy (proceeding from formally analogous effects to formally analogous explanatory hypotheses). The two domains are Maxwell’s system of rotating vortices and an electric current. The first of these domains exists only as a model, but that changes nothing. The starting point for the analogical argument is a formal similarity between effects, namely between the motion of the vortices and ball bearings, and phenomena characterized by Ampère’s law (as interpreted differently in the two domains). We infer, by analogy, similar hypotheses: that is, a microscopic mechanical medium with the corrective term, analogous to Maxwell’s system with the displacement current. This reasoning is not purely mathematical; it is based on physical similarities. It has the same structure as Priestley’s argument.

In his (1998), Steiner notes this historically based objection to his argument that Maxwell was employing a Pythagorean analogy. In his view, the mechanical analogies that Maxwell employed in the 1860s are largely irrelevant to the
“Pythagorean nature of the 1873 reasoning.” The 1873 reasoning was mathematical “tinkering” motivated only by the need to restore charge conservation—it had no physical basis, yet it led to real physical predictions (i.e., electromagnetic radiation). But we have clear evidence of a comprehensible (though elaborate) process of analogical reasoning that went on in the 1860s. Furthermore, even if Steiner is correct and Maxwell in 1873 is offering an argument that has no logical dependence on the mechanical analogy, it is still very clear that the generalization step—introducing the displacement current—has a physical basis. Steiner could still adopt the second strategy identified above, however, arguing that no matter where the equation comes from, the equation analogy is Pythagorean. Suppose we have an equation some of whose solutions have been observed and are regarded as providing confirmation for the equation. We then argue from the mathematical possibility of a second solution to its physical possibility. That looks Pythagorean. My suggestion is that this sort of reasoning is a special, though perfectly respectable, mathematical analogy. It conforms to the pattern described in section 4.4 and in chapter 5: from formally similar assumptions, infer formally similar consequences. Here, the formally similar assumptions are the identical differential equation, and the idea is that in combination with certain unknown boundary conditions in the target domain, the differential equation yields new solutions. That resembles what happens in typical mathematical analogies. When we look at things in this way, we can see how misleading it is to characterize Maxwell’s reasoning as a purely mathematical analogy. Steiner is right that the pattern of analogical reasoning used by Maxwell is not valid in general—if that pattern is characterized as arbitrary generalization followed by the derivation of a new solution. The generalization has to be physically motivated. 
Yet it is clear that Maxwell’s motivation is not purely mathematical. There is nothing “mathematically natural” about Maxwell’s generalization of Ampère’s law. Being a solution of Maxwell’s equation is not a natural mathematical kind that would satisfy the aesthetic cravings of a pure mathematician. On the analysis I am suggesting, there is a purely mathematical analogy involved as part of the argument (Steiner is correct on this point), but it is thoroughly entangled with the physical reasoning. If we focus solely on that analogy, the reasoning looks Pythagorean; indeed, it looks like an implausible piece of reasoning. But the general law has a physical basis: it is bolstered either by an abductive analogy or by the plausibility argument about conservation of charge. We can then combine it with the mathematical analogy and there is no difficulty. By breaking Maxwell’s reasoning down into two successive analogical arguments (an abductive analogy followed by a mathematical analogy), each of which is plausible, we avoid having to see it as a purely formal and inadequately motivated set of mathematical manipulations. We also have reason to qualify the requirement of physical analogy: to the extent that purely mathematical reasoning is part of the story, some of the analogies involved may be purely mathematical. Yet the overall argument must still have a physical basis.
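The plausibility argument about conservation of charge can be spelled out in a few lines. The following sketch uses modern vector notation in SI units; this notation is an assumption made for illustration and is neither Maxwell's own formulation nor necessarily the notation of figure 6.2.

```latex
\begin{align*}
&\text{Amp\`ere's law without the corrective term:}
  & \nabla \times \mathbf{B} &= \mu_0 \mathbf{J} \\
&\text{Divergence of both sides (the divergence of a curl vanishes):}
  & 0 &= \mu_0 \, \nabla \cdot \mathbf{J} \\
&\text{Conservation of charge requires:}
  & \nabla \cdot \mathbf{J} &= -\frac{\partial \rho}{\partial t} \\
&\text{Generalized law with the displacement current:}
  & \nabla \times \mathbf{B} &= \mu_0 \mathbf{J}
     + \mu_0 \varepsilon_0 \frac{\partial \mathbf{E}}{\partial t} \\
&\text{Divergence, using } \nabla \cdot \mathbf{E} = \rho / \varepsilon_0 \text{:}
  & 0 &= \mu_0 \left( \nabla \cdot \mathbf{J} + \frac{\partial \rho}{\partial t} \right)
\end{align*}
```

Without the corrective term, the law is consistent with charge conservation only when the charge density is constant in time. With the term, conservation of charge follows automatically, which is the sense in which the generalization answers to a physical constraint and not merely a mathematical one.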
The foregoing discussion prompts the following tentative reformulation of the requirement of physical analogy:

(6.3.6) Modified Requirement of Physical Analogy
A necessary condition for a composite analogical argument in the empirical sciences to be plausible is that in at least one of the component analogies, the relevant similarities that constitute the basis for the argument have known physical significance.

This is offered only as a necessary condition for plausibility. If correct, it provides a solution to Steiner’s problem because it rules out plausible Pythagorean analogies.

Example 6.5 (continued). I suggest (also tentatively) that a similar analysis of a complex argument into two separate analogies applies to the Schrödinger example, and that once again the first analogy is not Pythagorean. Just as Steiner downplays the mechanical analogies that led Maxwell to the displacement current, he downplays the optical analogy that inspired Schrödinger. As Liston notes, particle optics may be viewed as an approximation to more general wave optics, and Schrödinger reasoned that particle mechanics should be generalized to wave mechanics. Furthermore, just as wave optics can treat phenomena such as diffraction that cause difficulty for particle optics, so, too, it would not have been unreasonable to hope that wave mechanics might be able to explain analogous quantum phenomena. Thus, one can see the initial generalization from particle mechanics to wave mechanics as being supported by an abductive analogy with a physical basis. This is then followed by a mathematical analogy between solutions to the Schrödinger equation. Complications may cloud this analysis, but the approach (I hope) is promising.

These two examples exemplify my strategy for dealing with Pythagorean analogies in which a mathematical generalization of an empirically confirmed equation is followed by inference to the physical possibility of new solutions.
The generalization is supported by an abductive analogy with a physical basis, and this is then followed by a mathematical analogy. Only when we isolate that second step, or ignore the physical basis of the first step, do we appear to have a Pythagorean analogy. Of course, Steiner identifies other types of Pythagorean analogy. I do not have sufficient space to deal with all of them. But let’s consider the relativistic Schrödinger equation, a case that looks like a spectacular success for Pythagorean reasoning.

Example 6.6 (continued). Recall that Schrödinger took the substitutions

\[
E \to i\hbar \frac{\partial}{\partial t}, \qquad p \to -i\hbar \frac{\partial}{\partial x}
\]

that transformed Hamilton’s energy equation into the Schrödinger equation, and applied them to Einstein’s relativistic equation

\[
E^2 - p^2 c^2 = m^2 c^4
\]

to obtain the Klein-Gordon equation,

\[
-\hbar^2 \left( \frac{\partial^2}{\partial t^2} - c^2 \frac{\partial^2}{\partial x^2} \right) \Psi(x,t) = m^2 c^4 \, \Psi(x,t).
\]

Steiner argues that “no physical argument” justifies this substitution, which therefore counts as a Pythagorean analogy. Steiner is doubtless right that the success of this formal “guesswork” is something of a mystery. There does not appear to be any straightforward way to justify the faith of these theoreticians that the same formal manipulations that worked in the nonrelativistic case would succeed in a relativistic setting. But we need not see this faith as mathematical mysticism. Perhaps the mystery is very like what we encounter in any analogical argument. I suggest that while the physics may be very complicated, there is no reason the logic has to be. The reasoning in this example fits the pattern of abductive analogical reasoning. The wave equations in both the nonrelativistic and relativistic cases, equations (6.3.3) and (6.3.5) in the initial discussion of this example, can both be viewed as generalizations of the classical particle versions, equations (6.3.2) and (6.3.4), which may be derived as analogous well-confirmed consequences. The logic is essentially the same as in Priestley’s argument (Example 4.2), even though the complexity of the physics is much greater. The analogy does have a physical basis.

The challenge in coming up with a philosophical solution to Steiner’s problem is to explain why analogies such as those in our three examples can be plausible, while at the same time acknowledging that their success is somewhat mysterious. My proposal for meeting this challenge has been to show that complex, and apparently Pythagorean, analogical arguments either conform to a recognizable elementary pattern of analogical reasoning or involve a hybrid of such patterns, and that the arguments do in fact have a physical basis.
We see successful “Pythagorean” analogies only when we isolate a fragment of the historical reasoning—a point suggested by Steiner himself when he writes, “No scientist mentioned here could have formulated valuable theories without an enormous fund of empirical information and prior modeling . . . my point is that this empirical information was brought to bear on new cases through the medium of mathematical classification” (1989, 5). I am not persuaded that there is no physical content in the abstractions or mathematical classifications that play a large part in Steiner’s formal analogies. The mathematical manipulations are, at some point, anchored in physical analogies at the level of observation. The plausibility of any formal analogy in science rests on how the starting points, in this case observations and phenomenological laws, are represented. That leaves scope for manipulation and arbitrariness. People who champion rival theories might describe the “same” observations differently. In the acoustical analogy, the starting point is the observed analogy between discrete spectral lines and discrete overtones of
a vibrating string. Proponents of the analogy attempted to represent the lines as multiples of a fundamental frequency so that they would exhibit structural similarity with the vibrations of a string. When this proved impossible, they tried to save the analogy by proposing that there were several distinct series of lines, each with its own fundamental frequency and harmonics. Critics charged that the assumption of distinct series was arbitrary. The key problem for assessing formal similarity is to decide whether the chosen representation is legitimate or forced. The difficulty here is reminiscent of the problem of “specious resemblance” for mathematical analogies (section 5.2). And the solution here is similar to our solution in the mathematical case: to impose a requirement that our representations have a motivation that is independent of the analogy. That is why we need the modified requirement of physical analogy (6.3.6). I end this discussion of formal similarity on a historical note. As defined, formal similarity is applicable only to formalized theories, and in particular to theories expressed in mathematical terms. Steiner’s examples are far advanced in mathematical sophistication. Yet formal analogies existed before the nineteenth and twentieth centuries. There is a looser notion, identified by Hesse, according to which two concepts are formally similar if they play “corresponding” roles in different theories. Such similarities figure in pre-mathematical analogies between sound and light. For example, Aristotle suggested that the senses of sight and hearing (and, in his view, smell) operate via a medium: “In both cases the [perceived] object sets in movement only what lies between, and this in turn sets the [sense] organ in movement” (De Anima 419a26). He conjectured that what happens when sound produces echoes “must be analogous to what happens in the case of light” (419b28)—that is, reflection. 
From the observation that light is always reflected from the bodies it strikes (otherwise, “outside what was directly illuminated by the sun there would be blank darkness”), he argued by analogy that “in all generation of sound echo takes place, though it is frequently only indistinctly heard.” The parallels that Aristotle draws between echoes and light, and elsewhere between air or water as the medium of sound and an ethereal medium of light, are best described as formal similarities—even though Aristotle’s theories were not formulated in mathematical language. For historical theories, and perhaps for any nonmathematical theory, Aristotle’s formula for analogy of proportion might serve as an alternative definition of formal similarity: “as one is to one thing, so is another to another (e.g. as knowledge stands to the object of knowledge, so is perception related to the object of perception)” (Topics 108a8). This definition is inspired by simple arithmetic relationships, just as the contemporary definition is inspired by the notion of isomorphism. The standard of representation is an evolving one.
6.4 Parametric Similarity and Continuity Arguments

In many analogical arguments, the source and target domains are linked by a continuous, or gradually varying, series of intermediate cases. In many of these
arguments, the key idea is that the feature of interest changes not at all, or else changes at a steady and predictable rate, as we make the transition from source to target. Analogies of this sort are appealing to the extent that they can exploit plausible assumptions about change and continuity. Our objective in this section is to examine those assumptions and to formulate a systematic approach to the assessment of these continuity arguments.
Characterization and Examples

Let me begin with a simple characterization of the similarities that constitute the basis of a continuity argument. We represent them by means of parameters (or variables) that take distinct values in the source and target domains, and intermediate values for intermediate cases. I shall use the term parametric similarity to refer to any relationship between features that can be usefully represented in this way, that is, by the assignment of values to one or more continuous or gradually varying parameters.21 A continuity argument depends upon a cluster of such similarities, represented by a set of variables. In the source domain, these variables have a known causal relation to the feature that is the focus of the analogical argument.22 This causal connection, the prior association of earlier chapters, will occupy our attention shortly. First, let’s consider a few examples of continuity arguments, one new and three excerpted from chapter 4.

Example 6.7 (dinosaur speed). Structural similarities between different animal species may be represented in proportional terms. McNeill Alexander (1991) compares bone measurements, body masses, and stride lengths of contemporary and extinct animals to draw conclusions about the speed and agility of the dinosaurs. The two central concepts in his analysis are geometric and dynamic similarity. Two body shapes are geometrically similar “if one can become identical to the other by a uniform change in the scale of length.” Two motions are dynamically similar “if one can become identical to the other by uniform changes in the scales of length, time and force.” The author characterizes dynamic similarity using the ratio v²/gl, where v is running speed, g is the acceleration due to gravity, and l is leg length. Based on this system, walking speeds for a brontosaurus can be estimated using those of an elephant. Most of the similarities and differences between the two animals are left unstated.
The only ones that matter, on this analysis, are basic body shape (geometric similarity) and the quantities v and l that figure in the dimensionless ratio v²/gl (dynamic similarity).

Example 4.8 (artificial and natural selection). Darwin’s famous analogical argument depends upon the idea that if breeding can produce noticeable variation in a few generations, then evolution through natural selection can account
21. The concept is close to geometric similarity for mathematical analogies, presented in chapter 5, though here it is not tied to syntactic form.
22. Continuity arguments in mathematics, which are noncausal, are treated in chapter 5.
for great diversity over a very long time period. Again, the concept of scale plays a major part in the argument.

Example 4.9 (Peruvian pots). This analogical argument is based on the assumption that the social function of markings (signáles) engraved on clay pots is unlikely to have changed in a long continuous cultural tradition linking the pre-Columbian Mochica to present-day Peruvian potters.

Example 4.10 (slab lamps). In this argument, similarities between two Ice Age cultures (near-contemporary Inuit and prehistoric Europeans) make it plausible that just as the Inuit occasionally employed slabs of rock as makeshift lamps, such slabs played a similar role in Ice Age Europe.

These examples all rely upon the presumption that some causal relationship persists across a range of domains. But there are some important differences. In particular, let us note four points of contrast.

1. Direct and indirect analogies. Some analogical arguments proceed by generalizing from known cases to a uniformity from which we then infer the desired conclusion about the target domain. Let us distinguish between indirect analogies, where the conclusion is inferred from a uniformity in this way, and direct analogies, where no uniformity plays this role.

2. Connected and unconnected analogies. In the second and third examples, we have a set of intermediate cases indexed by time. In dinosaur speed, by contrast, there is no series linking the elephant to the brontosaurus. Let us distinguish between connected and unconnected analogies according to whether or not the two domains are linked by intermediate cases. For connected analogies, we can distinguish further between continuous and noncontinuous connection based on whether the relevant parameter for the intermediate cases is real-valued or integer-valued.

3. Historical and nonhistorical analogies.
As noted in section 4.8, archaeologists distinguish between historical analogies such as Peruvian pots, where the source and target domains are connected by a series of temporal intermediates, and nonhistorical analogies where this is not the case. Slab lamps is a nonhistorical analogy because there is no historical sequence of cultures linking Europeans of a few thousand years ago to recent North American Inuit cultures. This distinction is an important special case of the previous one. Clearly, historical analogies are always connected, but the reverse is not true. Two domains can be connected by a sequence of cases that are not historical intermediates. Scale models provide an illustration. A model ship is connected to its full-size target, at least in principle, by a series of models with similar shape.

4. Stability arguments and proportionality arguments. In the Peruvian pots argument, the crucial claim is that the variable of interest (the social function of signáles) remains constant through the historical sequence of cultures linking the Mochica to present-day Peru. Gradual changes in the
relevant parameters should produce no change. A similar claim seems to underlie the Slab lamps analogy, but without historical connection. We have a stability argument when the conclusion is that some variable has the same value in both the source and target domains. In proportionality arguments, the variable of interest is supposed to change in a manner proportional to changes in the other critical parameters. For example, Darwin’s analogy proceeds from proportionate effects (biological diversity in species and domestic varieties) to proportionate causes (evolution over different time scales).
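The proportional reasoning behind the dinosaur speed example (Example 6.7) can be made concrete with a small calculation. The sketch below, in Python, assumes that two dynamically similar animals share the same value of the ratio v²/gl, so that speed scales with the square root of leg length. The function names and all numerical values are hypothetical illustrations, not figures from Alexander (1991).

```python
import math

G = 9.81  # acceleration due to gravity, in m/s^2


def froude_ratio(speed, leg_length, g=G):
    """Dimensionless ratio v^2 / (g * l) used to characterize dynamic similarity."""
    return speed ** 2 / (g * leg_length)


def dynamically_similar_speed(source_speed, source_leg, target_leg):
    """Speed at which the target has the same v^2/(g*l) as the source.

    Setting v_s^2 / (g * l_s) = v_t^2 / (g * l_t) and solving for v_t
    gives v_t = v_s * sqrt(l_t / l_s); g cancels out.
    """
    return source_speed * math.sqrt(target_leg / source_leg)


# Hypothetical illustrative values (not measurements from the literature).
elephant_speed = 1.5  # walking speed of the source animal, m/s
elephant_leg = 2.0    # leg length of the source animal, m
sauropod_leg = 3.0    # leg length of the target animal, m

sauropod_speed = dynamically_similar_speed(elephant_speed, elephant_leg, sauropod_leg)
print(f"Estimated target walking speed: {sauropod_speed:.2f} m/s")
```

This is a proportionality argument in the sense just described: the variable of interest (speed) is taken to change in step with the critical parameter (leg length), while every other difference between the two animals is treated as irrelevant.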
Invariants

Let’s focus first on the contrast between direct and indirect analogies. My first point is that some plausible continuity arguments proceed by establishing a uniformity but others do not. Many successful analogical arguments ultimately help to establish a uniformity, but don’t presuppose one. I also want to make a second point: it is a mistake to restrict our attention to uniformities. I propose that a continuity argument is plausible if it supports the extension of an invariant relationship, or simply an invariant, to the target domain.

The contrast between invariants and uniformities (or laws) is explained in detail by Woodward (2003). Woodward represents causal relationships by means of structural equations that indicate how one variable depends counterfactually upon others. For example, the equation

\[
T = 2\pi \sqrt{l/g}
\]

describes the dependence of the period of a simple pendulum on its length and the gravitational constant. It tells us what would happen to the period, T, if the length were altered. In general, a structural equation takes the form

\[
Y = F(X_1, \ldots, X_n), \tag{6.4.1}
\]

or more compactly,

\[
Y = F(X) \quad (\text{where } X = (X_1, \ldots, X_n)).
\]

Here X1, ..., Xn and Y are variables and the structural equation provides information about what would happen to the value of Y if the values of some or all of the independent variables X1, ..., Xn were altered. It is understood that only relevant variables are included on the right-hand side. A variable Xi is relevant to Y if some intervention that changes the value of Xi would change the value of Y. An intervention, roughly, is a selective manipulation that changes the value of a variable. The relationship described by a structural equation is invariant if it is preserved over a suitable range of circumstances and interventions “in the neighborhood” of an actual situation, X = a (where a = (a1, ..., an) specifies a
definite value ai for each Xi).23 An invariant is like an unpretentious uniformity. It is clearly formulated and it holds over a range of possible cases, but it need not satisfy all of the philosophical criteria traditionally associated with the notion of a uniformity or law. In particular, an invariant relation can have fairly limited scope. Furthermore, a structural equation may count as invariant even if not all parameters relevant to Y appear as variables on the right-hand side. For example, g may be interpreted as a constant so that the equation for a pendulum has the form T = F(l). This relationship holds over a range of manipulations (those performed near the surface of the earth).

Invariants are more suitable than uniformities for the analysis of analogical arguments such as the one employed in the Peruvian pots example. The generalization asserting that the function of signáles on Peruvian pots is to indicate ownership counts as an invariant, or as implying the existence of an invariant relationship between cultural and environmental factors and the markings. We would be unlikely to regard such a relationship as a uniformity.

Woodward maintains that any singular causal claim implies that some invariant relationship obtains in a neighborhood of the case at hand. That is, the invariant relationship persists over some range of cases that differ slightly from the actual case. Singular causal claims count as explanatory for just that reason: they tell us what would happen under very similar conditions. The fact that singular causal claims and invariant relations have counterfactual consequences implies that they take some work to establish. Ideally, one observes what happens to Y under interventions on each of the variables X1, ..., Xn to justify a claim such as equation (6.4.1).

I suggest that invariants play a major role in continuity arguments.
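The notions of a structural equation, an intervention, and a relevant variable can be illustrated with the pendulum equation. The following Python sketch is only an illustration under simplifying assumptions: g is held constant (so the equation has the form T = F(l)), an intervention is modeled as resetting one variable while holding the others fixed, and the mass of the bob is included as a hypothetical example of a variable that makes no difference to T. The function names are assumptions made here and are not Woodward's terminology.

```python
import math

G = 9.81  # m/s^2, treated as a constant near the surface of the earth


def pendulum_period(length, mass=1.0):
    """Structural equation T = 2*pi*sqrt(l/g); mass is accepted but has no effect."""
    return 2 * math.pi * math.sqrt(length / G)


def intervene(actual, variable, new_value):
    """Return a copy of the actual situation with one variable reset."""
    changed = dict(actual)
    changed[variable] = new_value
    return changed


def changes_output(equation, actual, variable, new_value):
    """Test one intervention: does resetting `variable` change the output?

    A variable counts as relevant if SOME intervention on it changes the
    output, so a single False result here is evidence of irrelevance,
    not a proof of it.
    """
    before = equation(**actual)
    after = equation(**intervene(actual, variable, new_value))
    return not math.isclose(before, after)


actual = {"length": 1.0, "mass": 1.0}
length_matters = changes_output(pendulum_period, actual, "length", 1.2)
mass_matters = changes_output(pendulum_period, actual, "mass", 2.0)
print("length relevant:", length_matters)
print("mass relevant under this intervention:", mass_matters)
```

Repeating such checks over a range of nearby values is a rough computational analogue of establishing that the relationship holds in a neighborhood of the actual situation.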
Recall the two fundamental principles of my model of analogical arguments: any acceptable analogical argument must be based upon an explicit prior association in the source domain, and there must be potential for generalizing that association to the target domain. When we restrict our attention to causal analogies and continuity arguments (as in this section), it is plausible to think that the prior association is an invariant causal relationship. Furthermore, I claim that a continuity argument is plausible if, and to the extent that, it supports the extension of a relevant invariant relationship to the target domain. Let us take these two points in turn.

In chapter 4, for all types of causal analogical arguments, I proposed forms of prior association built upon Humphreys’s model of aleatory explanation. I believe that the main features of Humphreys’s model are compatible with Woodward’s ideas about invariants.24 To appreciate this, we need to acknowledge two different ways in which we might formulate an invariant. An indeterminate invariant is an abstract statement of functional dependence as in equation (6.4.1): Y = F(X1, . . . , Xn). This formulation indicates only that Y depends counterfactually on X1, . . . , Xn in some neighborhood of the actual situation. A determinate invariant requires a specific formulation of the functional dependence, provided by a formula or table of values. A Humphreys-style aleatory explanation of the form “Y = b because X1 = a1, despite X2 = a2” implies an indeterminate invariant Y = F(X1, X2), with a little additional information about the valence of the variables on the right-hand side.25 But Humphreys’s model is compatible with the existence of a determinate explanation, and hence a determinate invariant.

The distinction between these two forms is important when we turn to analogical arguments and the business of generalizing an invariant relationship to the target domain. Specifically, we are now in a position to clarify the distinction made above between indirect and direct analogies. An indirect analogical argument proceeds via justification of a determinate invariant, that is, a determinate functional relationship. By contrast, a direct analogy aims only at supporting the extension of an indeterminate invariant. The basic objective is to show that the same general pattern of dependence holds over the range of intervening cases and extends to the target domain, and that no other variables matter.

23. Woodward (2003) offers a detailed proposal in terms of “testing interventions,” but here I simply presume that the invariant relationship holds over a suitably large range of nearby cases.
24. Woodward himself acknowledges a broad similarity (2003, 12).
Justification of Continuity Arguments On my theory, for a prima facie plausible analogical argument, there must be some support for a generalization that applies to both the source and target domains. As I have just been arguing, for continuity arguments, the relevant type of generalization is an invariant relationship. Several strategies can provide support for this type of relationship, depending on the nature of the connection between source and target domains. Table 6.1 provides a summary (with selected examples). The strategies become more widely applicable, but weaker, as we move down the list. 1. GENERAL SURVEY. One unimpeachable method to justify an analogical argument is to collect good evidence, via a general survey, for a broad invariant relationship that we can then apply to the target domain. This strategy is exemplified by the “uniformitarian” approach to ethnographic analogy discussed in section 4.8, and we saw the same idea earlier in Weitzenfeld’s analysis of analogical arguments (section 2.4). The dinosaur-speed example provides a possible illustration. We might be able to justify conjectures about dinosaur speed if we can derive them from some formula that has broad inductive support. In essence, this approach converts a direct analogical argument into an indirect one. An indirect analogical argument is really just a conventional inductive argument. The source domain ceases to play any distinctive part. It becomes just one piece of evidence for the general invariant relationship. While there is nothing wrong with this way of handling analogical arguments, it has
25. Here I have simplified things by assuming just one positive and one negative factor.
limited applicability. For most analogical arguments, it is not possible in advance to articulate, let alone to justify, a determinate invariant. To illustrate this point, consider Example 4.4 (morphine and meperidine)— Schaumann’s analogical argument that meperidine shares morphine’s analgesic properties. The argument is based partly on a resemblance in the chemical structure of meperidine and morphine. But there is an almost limitless family of morphine-like molecules that share chemical groups and chemical structures (Reynolds and Randall 1975, 371). It is not possible to predict which ones will share morphine’s analgesic properties; even slight changes can have a drastic impact. One discussion summarizes existing knowledge as follows: Although the examination of the structures of known potent analgesics shows certain features in common, it is still a fact that one cannot predict the activity of any new synthetic compound or devise a new compound which will have useful analgesic properties by making the compound to order. . . . Because of the large modifications in the electronic properties of the various parts of the molecule which may result from apparently minor modification in structure, it is still necessary for the biologist to test multiples of a thousand compounds before finding useful drugs. (Reynolds and Randall 1975, 375–76) Analgesic properties are not stable under slight variations in chemical structure. Schaumann’s argument relies heavily upon the unusual physiological response (tail curvature) to both morphine and meperidine. But there is little prospect for establishing a determinate invariant linking chemical structure plus tail curvature to analgesic effects. The general survey is the strategy of choice for indirect analogies, where it is feasible to formulate and justify a determinate invariant relation. But we need different means to justify direct analogical arguments. 2. CONTINUOUS CONNECTION AND THE SIMPLE STABILITY ARGUMENT. 
Consider the Peruvian-pots argument, a historical analogy. The Peruvian-pots example is a good counterexample to the view that every plausible ethnographic analogy must proceed via a uniformity. It is a strong analogical argument that illustrates what can be accomplished when we have a continuous historical connection between the source and target domains. Any attempt to represent the argument as proceeding indirectly, via the defense of some explicit uniformity, is doubly misleading. First, as noted earlier, any generalization about marks on Peruvian pots is likely to be an invariant relation rather than a uniformity. Second, and more important, to introduce such a generalization as an intermediate step obscures the nature of the reasoning in this argument. The argument relies not on generalization, but on simple mathematical ideas about continuity. Suppose that F(x) is a continuous function on some interval containing distinct points a and b. Suppose further that the range of F is discrete: the set
of possible values of F is a set of distinct points. Then F(b) must be equal to F(a). For instance, consider

$$F(x) = \begin{cases} -1, & x < 0 \\ 0, & x = 0 \\ 1, & x > 0. \end{cases}$$

If F is continuous on an interval containing a and b, then it must be that F(b) = F(a). An analogous conclusion holds for a function of several variables.

I suggest that the Peruvian-pots argument relies on similar reasoning. Specifically, we have what I shall call a simple stability argument.

• Continuous function. The “social role” of the markings (señales) is a continuous function of various cultural and environmental parameters.
• Discrete values. The set of possible social roles is finite (and hence discrete).
• Continuous connection. Present-day Peru and pre-historic Mochica culture are historically linked by continuous cultural transition.

Consequently, the social function of the señales in Mochica culture is the same as in present-day Peru. A similar analysis applies to any analogical argument where we have historically (i.e., spatiotemporally) linked source and target domains and a function that is continuous and has a discrete range of possible values. These three conditions provide the ideal setup for a continuity argument.

The continuous function condition is certainly an idealization. What makes it acceptable in our example is that continuity, a local property, is plausible. At each point in time, if there is no jarring outside influence (e.g., the sudden intrusion of a practice from some other culture), there is no reason to expect a sudden change in the purpose of the señales. Another attractive feature of the example is that, while we must make some effort to spell out the relevant cultural parameters, there is no need for a complete list.

The discrete values condition is also an idealization. The designated social role of the señales, to indicate ownership of a pot, embraces a range of more definite possibilities. But once again, the idealization is justified.
Genuinely distinct functions, such as using the señales to indicate an artist’s identity or to take an inventory, are appropriately modeled as “distant” from each other. The discrete values assumption is a crucial feature of the simple stability argument, which can succeed only if the feature that is being transferred to the target domain is qualitative and discrete, rather than continuously variable. Qualitative features tend to be stable and well suited to be the focus of an analogical argument (as noted in section 6.2). The third condition, Continuous connection, is straightforward and requires no further comment.

Of course, it is important to recognize limits in applying the mathematical model to real-world analogical arguments. If we interpret the three conditions literally, then remoteness of the target from the source in space and time should
make no difference to the strength of the analogical argument. But clearly, the more remote the target, the greater the probability that the relevant function is not continuous over the range of intermediate cases.

3. EXTREME CASES / UPPER AND LOWER BOUNDS. The strategy of considering extreme cases is useful if there is no determinate invariant in the picture, the function F(X1, . . . , Xn) is not constant, and the analogical argument depends only upon establishing some constraints on F. We have an extreme case when the relevant parameter values fall outside the range that includes the source and target domains. An extreme case can be useful in establishing upper or lower bounds for a quantitative relationship when a precise equation is not to be had. Further, those bounds may be good enough for a plausible analogical argument. To illustrate, McNeill Alexander is concerned with an analogy between the elephant and brontosaurus, but he can usefully consider smaller animals to support or refine an inexact generalization about the relations between walking speed, leg length, and other physical characteristics. If we interpret his reasoning in this way, we are treating his argument as a direct analogy that does not depend upon any precise uniformity.

Arguably, we find the “extreme case” strategy in Darwin’s discussion of variation under domestication. In support of the “great power” of accumulative selection, he writes, “several of our eminent breeders have, even within a single lifetime, modified to a large extent their breeds of cattle and sheep” (1903, 22). Variation may often proceed at a leisurely pace, but the extreme cases help to secure the conclusion that (given enough time) even great diversity of species could be accounted for by selection. In summary, the strategy of introducing extreme cases contributes to the refinement and establishment of an invariant by providing information about the gross features of (and constraints upon) that invariant.

4. NEARLY CONTINUOUS CHAIN AND HYPOTHETICAL CASES. If there is no continuous series of intermediate cases between the source and target domains, then there is an obvious problem with the continuity argument. Still, we can approximate continuous connection with a series of transitional cases that vary gradually, and apply the same basic reasoning as in the simple stability argument. As a surrogate for continuity, and in keeping with our analysis of multiple analogies (in section 4.10), an appropriate requirement is that there be no known intermediate case that constitutes a counterexample. In schematic terms, if we are trying to show that F(b) = F(a), there must be no known c between a and b such that F(c) ≠ F(a).

An interesting variant on this strategy is the introduction of a hypothetical case in place of an actual intermediate. Hypothetical cases can be used either in support of or in opposition to an analogical argument, but the latter is more common. The “hypothetical case” strategy has received some attention in discussions of legal reasoning by analogy. Ashley (1990) notes two examples that illustrate the technique.
Example 6.8 (church and state). A 1984 U.S. Supreme Court decision concerning separation of church and state related to a complaint about a public Christmas display.26 City Hall had set up a display including a Christmas tree and a crèche with religious figures. The plaintiff argued that the display was unconstitutional. Part of the judge’s evaluation of the case involved raising hypothetical questions such as: “What if the display had included camels but no crèche?” and “What if there had only been a tiny picture of a crèche stapled on one of the Christmas tree ornaments?” These alternative scenarios, in which the plaintiff’s case would be less sure, could be connected to the actual scenario by a sequence of gradual transformations. They could also force the plaintiff to be clearer about the general principle that sparked the complaint.

Example 6.9 (VCR sales). In a landmark copyright case, a major studio wanted to prevent Sony from producing videocassette recorders on grounds that they could be used to record copyright-protected material.27 Sony argued that most television material (at the time) was not copyright-protected, so that most of the time, use of the VCR would be legal. The hypothetical question here was whether Sony should be prevented from selling VCRs even if only a tiny proportion (such as 1 percent, or 0.01 percent) of its intended use would be illegal.

As these examples suggest, the “hypothetical case” strategy can be useful in refuting an analogical argument. Certain thought experiments in the history of science provide illustrations. For example, Galileo’s famous refutation of the thesis that a heavy cannonball falls faster than a light musket ball involves the introduction of a hypothetical case where the two objects are attached (1638/1974). Here, the hypothetical case is arguably both intermediate and extreme (since the combined object is heavier than either), which leads to contradiction.

5. PROXIMATE TARGET. An analogical argument such as the one discussed in the slab-lamps example is appropriately classified as a continuity argument, even though there are no intermediate cases. The most natural strategy to support such an argument is to show that the source and target domains are proximate, that is, that the relevant conditions are nearly the same in the source and target domains.28 Here we exploit Woodward’s idea that if an invariant relation holds for some situation, then it holds locally. Provided the target domain is close to the source domain, it comes under the scope of the invariant and that suffices to make the analogical argument plausible. This is the style of reasoning that we find in the slab-lamps example. Even without historical connection, the relevant environmental and cultural factors between Ice Age Europe and recent Inuit societies are (it is argued) sufficiently close to justify the analogy.

26. Lynch v. Donnelly, in The Complete Oral Arguments of the Supreme Court of the United States (1988), Case No. 82–1256.
27. Sony Corp. v. Universal City Studios, in The Complete Oral Arguments of the Supreme Court of the United States (1988), Case No. 81–1687.
28. The relevant parameters are ascertained using the models developed in chapter 4.
6. INSENSITIVITY ARGUMENTS. The final, and weakest, strategy for supporting a continuity argument is to provide independent evidence that the focal parameter Y of the analogy is insensitive to changes in the values of X1, . . . , Xn as we pass from the source to the target domain. For example, we might argue that an artifact with highly specialized form could only serve one function, so that function is insensitive to any variation in cultural or environmental conditions. Two objects from remote cultures that share this specialized form are likely to play a similar cultural role. Let’s apply this idea to Example 4.6 (life on other worlds). The analogy between Earth and Mars is properly characterized as a continuity argument: the gist is that conditions supporting life on Earth are reasonably similar to what we would have found on ancient Mars. What makes this analogical argument a real challenge is the total absence of intermediate cases. Even if (as argued in section 4.5) we can find no critical disanalogy, how can we provide positive support for a general relationship that supports the hypothesis of life on ancient Mars? It is obvious that most of the strategies identified above fail. No general survey is possible, there are no intermediates, and there are no relevant extreme or hypothetical cases. Two strategies remain. We can employ a proximity argument. In fact, we do find arguments that Antarctica and Greenland are close analogs to ancient Mars (see section 4.5). In my view, this argument fails because Antarctica and ancient Mars remain a considerable distance apart. The target domain (Mars) lies outside the range of cases over which we could plausibly claim to have tested any invariant relationship about the conditions for life. The other strategy is to use an insensitivity argument, which means providing independent reasons to think that the existence of life is insensitive to the sorts of changes we find between Earth and Mars. 
The prospects for this argument are, once again, rather dim. My conclusion is that this particular analogical argument fails: there is no positive support for generalization, and therefore no basis for a plausible conclusion.
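Looking back at Strategies 2 and 4, the schematic requirement that there be no known c between a and b with F(c) ≠ F(a) can be expressed as a short check. The sketch below is only an illustration under stated assumptions: the function name, the case labels ("a", "c1", "c2") and the recorded values are hypothetical placeholders, and the check captures only the "no known counterexample" condition, not the judgment that the chain of cases varies gradually.

```python
def project_to_target(source, known_intermediates, observed_value):
    """Project the source's discrete value F(a) onto the target.

    Returns F(a) if no known intermediate case c has F(c) != F(a);
    returns None if some known intermediate is a counterexample.
    """
    source_value = observed_value(source)
    for case in known_intermediates:
        if observed_value(case) != source_value:
            return None  # a known c between a and b with F(c) != F(a)
    return source_value


# Hypothetical records: the social role observed in the source case "a"
# and in two transitional cases. Labels and values are placeholders.
observed = {"a": "ownership", "c1": "ownership", "c2": "ownership"}
with_counterexample = {"a": "ownership", "c1": "inventory", "c2": "ownership"}

if __name__ == "__main__":
    print(project_to_target("a", ["c1", "c2"], observed.get))
    print(project_to_target("a", ["c1", "c2"], with_counterexample.get))
```

Because the possible values are discrete labels, the comparison is exact equality. A continuously variable feature would not fit this pattern, which mirrors the point that the simple stability argument needs a qualitative, discrete feature.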
Proportionality Arguments

So far, I have concentrated on stability arguments where the objective is to show that the value of some parameter is constant (or nearly constant) in the source and target domains. In a proportionality argument, the objective is to show that the value of a parameter varies according to a proportional relationship. The above ideas provide a reasonable starting point for understanding proportionality arguments as well.

In an illuminating article on scale models and similarity-based reasoning, Sterrett (2006) discusses how scientists rely on theory, practice, and experience to establish criteria for similarity in specific settings. She pays special attention to two cases: hydrodynamical analogies (in fluid dynamics) and the law of corresponding states (from chemistry and thermodynamics). Hydrodynamical analogies play a crucial role in justifying the use of wind tunnels and water
channels to conduct tests, rather than employing full-size models. For such analogies, it is crucial to establish precise criteria of dynamical similarity. We need to know all of the parameters relevant to fluid flow, and we need to know how they should be adjusted so that a small-scale model gives reliable information about the much larger target.

As Sterrett explains, one crucial element for similarity in hydrodynamics is geometric similarity. For example, a small boat should have the same hull shape as the large ship that it is meant to model. A second determinant is the Reynolds number, a dimensionless quantity that combines several parameters relevant to fluid flow. This number is defined as the ratio of inertial forces (v·ρ) to viscous forces (μ/L):

$$\mathrm{Re} = \frac{v \cdot \rho}{\mu / L}.$$

Here, v stands for the average fluid velocity, μ for fluid viscosity, ρ for fluid density, and L for a characteristic length for the system (e.g., pipe diameter). If the model (source domain) has the same geometry and the same Reynolds number as the target, then its fluid flow behavior will be the same.29 It is worth noting that we see essentially the same ideas, and even the same terminology, in the dinosaur-speed example.

Extrapolating from this and other examples, Sterrett writes:

In general, if you have a governing equation describing a phenomenon available, a similarity principle can be obtained straightforwardly by manipulating the equation into a form such that the requisite nondimensional parameters are obtained by inspection of the equation. However, you do not need a governing equation in order to obtain a similarity principle . . . Both hydrodynamical similarity and the law of corresponding states can be derived from merely the knowledge of the list of quantities upon which the phenomenon depends. That means that you need only know the form of the equation in the most general sense, i.e., that the phenomenon is a function of quantity 1, quantity 2, etc.
As long as the list is complete and does not contain extraneous quantities, the method of dimensional analysis will show a valid similarity principle. (2006, 76)

The quoted passage describes two distinct ways in which similarity reasoning might be utilized. In the first case, we have the governing equation. In such a case, we can still employ reasoning by analogy, using a scale model to determine behavior. This might be an attractive option if we want to avoid solving the equation (especially in an era when numerical methods were unavailable). But I set this case aside, as analogical reasoning in such a case appears to be philosophically unproblematic: one already possesses a broad generalization that takes in both source and target domains.

29. There are complications, but this simplified description is adequate for present purposes.
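The Reynolds-number criterion described above lends itself to a short worked sketch. The code computes Re from the definition given in the text and compares a scale model with its target. The function names and the fluid properties (rough figures for water) are illustrative assumptions, and geometric similarity is simply taken for granted rather than checked.

```python
import math


def reynolds_number(velocity, density, viscosity, length):
    """Re = (v * rho) / (mu / L), following the definition in the text."""
    return (velocity * density) / (viscosity / length)


def same_reynolds_number(model, target, rel_tol=1e-9):
    """True if model and target have (numerically) equal Reynolds numbers.

    Geometric similarity is assumed here, not tested.
    """
    return math.isclose(reynolds_number(**model),
                        reynolds_number(**target),
                        rel_tol=rel_tol)


# Assumed, approximate properties of water in SI units (kg/m^3 and Pa*s).
WATER = {"density": 1000.0, "viscosity": 1.0e-3}

# A model one tenth the size of the target, run ten times faster
# in the same fluid (illustrative numbers only).
model = {"velocity": 10.0, "length": 1.0, **WATER}
target = {"velocity": 1.0, "length": 10.0, **WATER}

if __name__ == "__main__":
    print("Re (model): ", reynolds_number(**model))
    print("Re (target):", reynolds_number(**target))
    print("same Reynolds number:", same_reynolds_number(model, target))
```

Since only the combination of the four quantities matters, shrinking the length by some factor can be offset by raising the velocity by the same factor in the same fluid.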
Sterrett’s second case corresponds to knowledge of what I have called an indeterminate invariant, Y = F(X1, . . . , Xn). The parameters Xi are combined into a nondimensional characteristic that serves as a similarity principle, and this principle is used to draw analogical inferences from scale models. Of course, there might be many ways of combining these parameters into a nondimensional quantity and one has to choose between them. My concern, however, lies not with how one comes up with a plausible nondimensional characteristic like the Reynolds number or, for that matter, McNeill Alexander’s v²/gl in the dinosaur-speed example. I am happy to accept Sterrett’s view that this kind of insight rests on general theoretical principles, practical experience, and ingenuity. Instead, I am interested in the mode of analogical argument by which we make a quantitative conjecture about the target domain.

I suggest that this type of argument can be understood by adapting the simple stability argument (Strategy 2). We interpret Sterrett’s requirement of practical experience as implying that we have confirmed, over some range of cases and interventions, that Y = F(X1, . . . , Xn) is constant on surfaces characterized by K(X1, . . . , Xn) = c, where c is a constant and K stands for the nondimensional characteristic that combines the parameters X1, . . . , Xn. With the added assumption that F is defined over a neighborhood that includes the source and target domains, and certain other assumptions about the form of K,30 we can (locally) represent the surface K(X1, . . . , Xn) = c as Xn = G(X1, . . . , Xn−1) for some continuous function G. If we put H(X1, . . . , Xn−1) = F(X1, . . . , Xn−1, G(X1, . . . , Xn−1)), this gives us

$$H(X_1, \ldots, X_{n-1}) = \text{constant}$$

for all values of (X1, . . . , Xn−1) in some neighborhood including the source domain. The argument that H has this same constant value for the target domain is just the simple stability argument.
But the condition that H is constant is equivalent to the condition that F is constant over the surface K(X1, . . . , Xn) = c, which was our objective. I conclude that the simple stability argument may be regarded as the philosophical basis for the sort of reasoning we find in more complex modeling. Of course, scaling up can fail. A parameter of negligible importance in the source domain can turn out to be significant for the target. But this difficulty exists for any stability argument, and indeed for any analogical argument. So long as the difficulty is not inevitable, it is no objection to plausibility. I have focused on the example of hydrodynamical analogies, where we want a justification for expecting constant proportion in the value of Y = F(X1, . . . , Xn) for source and target domains. We can extend this reasoning to analogical arguments where the value of Y is only roughly proportional. Rather than develop this point, let’s look at a final case where we are not concerned with the exact form of the invariant relationship.
30. Notably, K must be differentiable with some nonzero partial derivative. Without loss of generality, we assume here that its partial derivative with respect to Xn is nonzero.
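As an illustration only, the reduction from F to H can be written out for the hydrodynamical case, taking K to be the Reynolds number with X1 = v, X2 = ρ, X3 = μ, and Xn = L. Choosing L as the variable to solve for, and restricting to nonzero v and ρ, are assumptions made for the sake of the example.

```latex
\begin{align*}
K(v, \rho, \mu, L) &= \frac{v\,\rho\,L}{\mu} = c, \\
\frac{\partial K}{\partial L} &= \frac{v\,\rho}{\mu} \neq 0
  \quad (v, \rho \neq 0), \\
L &= G(v, \rho, \mu) = \frac{c\,\mu}{v\,\rho}, \\
H(v, \rho, \mu) &= F\bigl(v, \rho, \mu, G(v, \rho, \mu)\bigr).
\end{align*}
```

On this reading, the claim that F is constant on the surface K = c becomes the claim that H is constant in (v, ρ, μ), which is the form required by the simple stability argument.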
If a continuity argument depends only on establishing upper or lower bounds on the proportional relationship, then the extreme-case strategy becomes relevant, as suggested above in our preliminary discussion of Example 4.8. Darwin anticipates the objection that examples of artificial selection are drawn from comparatively recent history, and that there is no reason to think that evolution has always proceeded at the same rate. He takes pains to point out that his theory requires “no fixed law of development, causing all the inhabitants of an area to change abruptly, or simultaneously, or to an equal degree” (1860, chapter 11). If his concern is merely to establish that gradual change can produce noticeable changes in a few generations, so that dramatic changes can be accounted for by selection over a great span of time, then appeal to extreme cases is perfectly reasonable.
6.5 Conclusion In this chapter, I considered three types of similarity that are prevalent in scientific analogies. I suggested that these similarities should be understood as contributing to different forms of generalization. Analogies based on matching features in the source and target domain are associated with the extension of a scientifically significant kind. Analogies based on formal similarity lead to the identification of a common mathematical formalism. Finally, analogies based on variables that take different values in the two domains are oriented towards formulating or extending an invariant relationship. One of my central principles is that a plausible analogical argument should exhibit potential for generalization. I have argued that the different types of generalization at which analogical arguments aim shape our assessment of similarity and provide guidelines and strategies for evaluating those arguments. I have also suggested that the different forms of similarity identified in this chapter line up at least roughly with the categories of chapter 4, allowing us to enrich the models developed there. In this chapter, we have seen that my theory provides a platform for raising, and sometimes for answering, interesting questions. What makes an analogical argument like Franklin’s (Example 6.1) compelling, given that so many people attach little value to analogical arguments founded on “feature matching”? Can we account for the plausibility of certain pivotal, but mysterious, analogical arguments in nineteenth- and twentieth-century physics without resorting to mathematical mysticism? What basic assumptions underlie continuity arguments? These are worthwhile questions and my theory begins to offer answers. This is a good place to acknowledge an important limitation of my theory. 
My objective is to develop a model for analogical arguments at a level of detail intermediate between an elementary “commonsense” description of analogical arguments and a meticulous case study. The argument forms in chapter 4 and the characterizations of similarity in the current chapter are pitched at this intermediate level of generality. Inevitably, a sophisticated analysis of a particular
analogical argument will be more illuminating than the criteria that I have proposed. Still, my hope is that both the general theory and the guidelines for different types of analogical argument are useful starting points in thinking about individual analogical arguments. In addition, as I shall argue in the next two chapters, my theory has sufficient content to shed light on the problem of providing a justification for analogical reasoning.
7 Analogy and Epistemic Values
7.1 Introduction Good analogical arguments share a common logical core, despite wide variation across disciplines and historical contexts. That common core, I believe, is captured by two simple and fundamental principles: prior association and potential for generalization. There must be a clearly articulated connection in the source domain, and there must be reason to believe that such a connection can be generalized to cover the target domain. Together, these principles define the articulation model. In previous chapters, I have explained how the two principles are to be interpreted and applied in a variety of settings. In this chapter and the next, I provide philosophical justification for these two principles as well as a deeper understanding of the role of analogical reasoning in scientific inquiry. Specifically, I hope to establish that if a hypothesis is supported by an analogical argument which satisfies the two fundamental principles, and if there is no independent reason to reject it, then that hypothesis is prima facie plausible. To identify a hypothesis as prima facie plausible is to draw it out from the mass of undistinguished possibilities compatible with the evidence; it is to assert that further investigation is reasonable, unless and until a clearly superior candidate is available. I develop three separate lines of argument in support of this conclusion. First, I argue that the articulation model is supported by considered judgments about what counts as good or bad analogical reasoning. Some of these considered judgments relate to individual analogical arguments. Chapters 4 through 6 show that the articulation model agrees with our intuitive judgments about some such arguments
and that it yields new insights about others. Considered judgments also take the form of general guidelines for evaluating analogical arguments, such as those presented in section 1.6. In section 7.2, I suggest that these general guidelines are appropriately summarized and corrected by the two principles of my theory. In order for an appeal to considered judgments of either sort to provide meaningful support for the articulation model, however, those judgments would require independent justification. Rather than suppose that they have it, I regard this part of the argument as establishing only two modest conclusions. First, despite their generality, the two fundamental principles are nonvacuous: they provide a useful framework for thinking about analogies. Second, the two principles offer a compact representation of collective “commonsense” wisdom about analogical arguments.

The second line of argument, presented in sections 7.3 through 7.5, is a top-down argument that analogical reasoning is justified because it is especially well suited to attain certain goals or values of scientific inquiry: values associated with Kuhnian “normal science.” In simple terms, analogical reasoning in conformity with the two principles strikes an excellent balance between the competing demands of stability and innovation. In order to explain this claim and to make it plausible, we explore (in section 7.4) the nature and justification of analogical reasoning in the parallel setting of the law. Stare decisis, the principle that like cases should be decided alike, lies at the foundation of legal reasoning. By contrast with the situation in science, the basic justification for employing this principle is straightforward. The doctrine of precedent promotes consistency and predictability while still permitting the evolution of the legal system. We seek an analogous balance in scientific reasoning.
I argue in section 7.5 that the articulation model is the appropriate analogue to stare decisis and that it can be given an analogous justification. Taking the comparison a step further, both legal and scientific reasoning are paradigmatic in nature. To solve new problems, we must inevitably extend the assumptions and methods used in past solutions. Analogical reasoning in accordance with the two principles is optimally suited to solving puzzles in this manner. That is the underlying reason for its appropriateness in both the law and science. The existence of this common structure leads to the third line of argument, presented in chapter 8. This is a bottom-up justification for analogical reasoning based on an appeal to symmetry.
7.2 Review of Commonsense Guidelines
In section 1.6, I identified a number of commonsense criteria for evaluating analogical arguments. For convenience, I repeat them here.
(CS1) The more similarities (between the two domains), the stronger the analogy.
(CS2) The more differences, the weaker the analogy.
(CS3) The greater the extent of our ignorance about the two domains, the weaker the analogy.
(CS4) The weaker the conclusion, the more plausible the analogy.
(CS5) Analogies involving causal relations are more plausible than those not involving causal relations.
(CS6) Structural analogies are stronger than those based on superficial similarities.
(CS7) The relevance of the similarities and differences to the conclusion (i.e., to the hypothetical analogy) must be taken into account.
(CS8) Multiple analogies supporting the same conclusion make the argument stronger.
The theory developed in chapters 4 through 6 both summarizes and improves upon these criteria. Consider criteria (CS1) and (CS2). The articulation model preserves and corrects these ideas: as we add relevant similarities and remove relevant differences, the potential for generalization increases. At the same time, there is room for nuance. Our principle of prior association, employed in conjunction with the models of chapter 4, provides a systematic way to determine whether similarities and differences are relevant to the conclusion at hand. We don’t have to rely upon a criterion of relevance that is not context-sensitive.1 Furthermore, what matters is not so much the number of similarities and differences as their importance, and that importance is taken into account by our second principle: potential for generalization. The models of chapter 4 allow us to distinguish between differences that are critical and those that are not.
Even for the weakest type of analogy, where the prior association is simply a statistical correlation, the two principles guide us in our application of criteria (CS1) and (CS2). I argued in section 6.2 that such analogies tend to be preliminary efforts at classification (as illustrated by examples drawn from physics and biology). Ultimately, the two items under comparison may be located within a common species or genus. The principle of potential for generalization should be applied with this point in mind.
Similarities strengthen the argument to the extent that the properties involved are causally relevant for the type of classification under consideration, and to the extent that the occurrence of these properties is informative. The morphine and meperidine case (Example 4.4) is a good illustration of the second criterion: the S-shaped tail curvature induced by this drug and morphine was unusual enough to provide support for the analogical inference that meperidine would share morphine’s analgesic effects.
Our two principles also allow us to refine criterion (CS3), which plays a central part in Mill’s work on analogical reasoning. The valid core of this criterion is that we can strengthen an analogy by shifting relevant properties from the neutral analogy (the area of ignorance) to the positive analogy (the area of known similarities). Under some circumstances, however, Mill’s principle is misleading. Analogical arguments may actually become weaker as our ignorance about two domains decreases. Arguments that might have been persuasive long ago are now dismissed.
1. The most popular one in textbook discussions is causal relevance; see below.
A good illustration, mentioned in chapter 1, is Aristotle’s argument that earthquakes must be a kind of exhalation of wind. As he puts it, “the action of the wind in the Earth” is “analogous to the tremors and throbbings caused in us by the force of the wind contained in our bodies.” In Aristotle’s time, this analogy might have had some merit; today, with greater knowledge, it appears childish. The reason is not simply that we have amassed greater knowledge of similarities and differences between our bodies and the Earth but, rather, that we have more sophisticated models of the processes involved (i.e., the prior association). In similar fashion, with increased scientific understanding of the processes relevant to the origin of life (and of conditions on the other planets), the analogical argument for life on other planets in our solar system is generally less plausible than in earlier times. Criterion (CS3), then, suffers from two weaknesses. The first is that increases in knowledge can shift features out of the neutral analogy in two directions: to the positive analogy or to the negative analogy. The analogical argument only becomes stronger if the shift is toward the positive analogy. The second weakness is that it fails to account for changing scientific standards as to what counts as a legitimate prior association. Changes in the way science is done can invalidate a previously acceptable analogical argument by ruling out certain kinds of relevance relations.2 We have also been able to refine other common views about analogies. Criterion (CS5) asserts the popular view that the strongest analogical arguments are those based on analogies involving causal relations. Hesse’s theory (section 2.3) holds that scientific analogies must involve a causal relation, and this assertion is repeated in many textbook discussions (e.g., Copi and Cohen 2002). 
Causal relevance certainly does play a central role in analogical arguments, and that point is recognized in the models of chapter 4. From the many examples of analogy that we have considered, however, it should be clear that we need to encompass other relevance relations in addition to causation and also that causal relations may sometimes fail to be relevant. A closely related opinion that has gained broad acceptance is criterion (CS6), which states that structural analogies support stronger conclusions than analogies based on superficial similarities. As with the claim about causation, this view is partially correct in that detailed knowledge of a prior association typically does require structural knowledge. But we have seen both that structural similarity can sometimes be irrelevant or misleading and that allegedly superficial similarities can be highly significant (as they are in ethnographic analogies and in Franklin’s argument about lightning, Example 6.1). As always, the key is to attend carefully to the prior association. The two fundamental principles thus condense and refine our commonsense intuitions about analogy. The models and ideas developed in chapters 4 through 6 provide a systematic way to evaluate the plausibility of individual analogical arguments that largely agrees with our considered judgments. These observations provide some support for the reasonableness of the two principles. Because there is so much latitude in the way that these two principles are applied, however, and because I do not wish to assume that our considered
2. The Sizzi example, from section 1.6, also illustrates this type of development.
judgments of plausibility are correct, we need to find a more satisfying justification. The place to look is among the core values that guide scientific inquiry.
7.3 Conservative and Progressive Values
William James’s two famous injunctions—“Shun error! Believe truth!”—express with admirable brevity the importance of both conservative and progressive objectives in scientific inquiry. Yet the slogans are a little too succinct. The need to balance conservative and progressive values in science goes beyond truth and error. The selection of a scientific theory is based partly upon its logical and confirmational virtues (such as internal consistency and evidential support) and partly upon pragmatic virtues (such as informativeness and economy). Conservative and progressive impulses apply to the latter virtues as much as to the former ones. Furthermore, the pragmatic virtues are highly important in assessing plausibility. That turns out to be crucial in providing a justification for analogical arguments.
Conservatism has sometimes been characterized as the position that the less rejection of current beliefs required to embrace a hypothesis, the more plausible it is, other things being equal.3 Since we need not be so strongly attached to all of our present beliefs, let us replace “current beliefs” with “current beliefs belonging to some privileged class.” Even so, the formulation is unsatisfactory for a number of reasons. In addition to its vagueness, it defines conservatism too narrowly by limiting its impact to situations where outright rejection of a currently held belief is at issue. Conservatism can exert its influence even in new areas of investigation.
Conservatism is also sometimes identified with the even narrower thesis that the mere acceptance of a belief, or of any belief within some privileged class, constitutes a measure of justification for that belief.4 It is not very clear, however, how to make this idea coherent. On a probabilistic model of belief, for instance, we might interpret acceptance of a belief as the assignment to it of probability 1.
But then what could be the effect of additional justification, since the probability is already maximal? Nor, if acceptance means something like the assignment of a very high probability (Kyburg and Teng 2001), does the mere fact of acceptance give us any reason to boost that probability further.5 More generally, acceptance already signals firm commitment to a belief or set of beliefs. Conservatism is concerned not so much with the justification for this commitment as with its implications about the appropriate attitude to take toward other beliefs.
Despite the difficulty of defining the notion, the motivations for embracing some form of epistemic conservatism are relatively clear. One reason is the safety
3. This description is found in Quine and Ullian (1970). See also Harman (1973, 159): “We attempt to make the least change in our antecedent view that will maximize explanatory coherence.”
4. For example, Chisholm’s formulation is as follows (1981, 14): “Necessarily, if S believes p and nothing else S believes explicitly contradicts p, then believing p has some presumption in its favor for S.”
5. In certain situations, such as Pascal’s wager, one might have a pragmatic reason to try to boost one’s degree of belief in a proposition. Even so, that reason is very different from mere acceptance.
inherent in incremental change—in particular, the reduced risk of falling into inconsistency. Another is the importance of not bringing basic assumptions constantly into question. And, of course, there is the hope that one’s most cherished beliefs are actually true. I shall use the term conservative in two senses. The first is to describe any system of beliefs and practices that includes a privileged subclass that one is especially reluctant to abandon or revise. The second is as a label for specific epistemic values that guide theory selection: coherence, simplicity, unification. These are retrospective or conservative values because they take existing beliefs for granted and consider how well a new theory or hypothesis fits in. Conservatism means commitment to a conservative system or to conservative epistemic values—usually both. Kuhn’s account (1962) of paradigm-governed normal science offers a detailed picture of a conservative practice and the conservative values that influence that practice, though the discussion of values is more explicit in his later work (1977). Scientists working within a paradigm accept a core set of assumptions, methods, and worked-out problem solutions: the “privileged class.” To accept the paradigm is to make a strong commitment indeed; discrepancies from anticipated results are viewed as failures on the part of the scientist, not refutations of the core assumptions. Without such a commitment, Kuhn argues, researchers would lack the confidence to carry out the kind of meticulous work that goes on in mature sciences. Despite, and indeed because of, this extreme resistance to novelty, normal science evolves steadily. Only a persistent and significant anomaly can force the eventual rejection of the paradigm, and only when an alternative paradigm is available. That is what occurs during scientific revolutions. Stability and conservative values are important to normal science, but so is innovation. 
I use the terms innovative and progressive primarily as labels for certain forward-looking epistemic values used in theory choice: fruitfulness, scope, and, once again, unification (which looks both to past and future accomplishment). We can also speak of a practice in which such values are important as innovative or progressive.
In an illuminating discussion of Kuhn’s Structure and subsequent work, McMullin (1993) distinguishes between the primary goals for a scientific theory, among which he gives preeminence to empirical adequacy and explanatory power, and secondary epistemic values, such as simplicity, fertility, unifying power, and coherence. We have already divided everything on the secondary list (with the addition of scope) into our two categories of conservative and progressive values.6 McMullin’s list of primary goals is debatable,7 but we certainly do need to include such values as accuracy, precision, and empirical adequacy (or truth). These are not properly characterized as either conservative or progressive, and we follow McMullin in simply calling them primary goals or values.
What is the connection between primary and secondary values? In McMullin’s view, the primary goals “define the activity of science,” while the secondary values
6. As noted, we can put unification on both lists.
7. A realist would want to subordinate empirical adequacy to obtaining a true picture of reality; an anti-realist might object to “explanatory power.”
are thought to serve as means to the primary goals, typically by providing “clues” to which theories are likely to achieve the primary goals. Some people reject the idea that secondary values are linked to empirical adequacy (or truth), or at least deny that any such connection can be justified in a respectable manner. Van Fraassen, for example, argues that simplicity is a purely pragmatic virtue: Simplicity . . . is obviously a criterion in theory choice, or at least a term in theory appraisal. For that reason, some writings on the subject of induction suggest that simple theories are more likely to be true. But it is surely absurd to think that the world is more likely to be simple than complicated (unless one has certain metaphysical or theological views not usually accepted as legitimate factors in scientific inference). The point is that the virtue, or patchwork of virtues, indicated by the term is a factor in theory appraisal, but does not indicate special features that make a theory more likely to be true (or empirically adequate). (1980, 90) On this view, secondary values are still important in theory choice, but they are independent of the primary goals and presumably carry less weight. We evaluate not just scientific theories or hypotheses but also methods according to epistemic values. We want methods that support theories that are accurate, empirically adequate, fruitful, and so forth. Now, just as we find disagreement about the merits of competing theories, we should expect disagreement about methods. As Kuhn has pointed out, even though the list of epistemic values important to science may be stable and widely shared, the precise interpretation and weight given to each such value varies from one scientist to another. We expect controversies about method, and we find them. 
Despite such variation, however, Kuhn identifies analogical reasoning as the main engine of expansion for normal science.8 Commitment to the core principles and problem-solving methods of a paradigm generates the confidence and creative force that leads scientists to extend those methods to new applications by means of analogies. In fact, the breakdown of analogical reasoning coincides with the point where normal science leaves off and scientific revolutions take place. But in normal science, analogical reasoning is the, or at least a, principal problem-solving method used by scientists.
Kuhn does not, of course, attempt any justification for analogical reasoning, and he would not regard it as a method. My view is that it can be usefully viewed as a method, albeit one that is characterized by very general principles. Those principles become determinate only in particular contexts and are applied somewhat differently by different people, but they are still the common denominator for good analogical arguments. I believe that we can combine Kuhn’s early work on normal science and his later work on scientific values to show that analogical arguments in accordance with the two principles are well suited to meet the imperatives of both conservatism and innovation (i.e., the secondary values).
8. In the 1969 Postscript to his Structure of Scientific Revolutions, Kuhn speaks of this idea as “the most novel and least understood aspect” of his book. As he puts the point there, “Scientists solve puzzles by modeling them on previous puzzle-solutions, often with only minimal recourse to symbolic generalizations” (Kuhn 1970, 189–190).
This claim is valid for a wide range of differing interpretations and weights assigned to those values. Furthermore, although the claim makes no mention of the primary values of science, that omission turns out to be appropriate! My main purpose in this section and the next two is to clarify and defend this connection between analogies and the competing values that influence normal science. In the final chapter, I suggest that a similar connection may exist even in scientific revolutions where analogical reasoning plays a part. The general point is that the sort of conservatism required by science is not merely compatible with but also mandates a modicum of open-mindedness, and this can even extend to paradigm shifts. For the moment, however, we limit our attention to normal science. Whether or not we accept every detail in Kuhn’s portrayal of normal science, the epistemic values we have identified do exert significant influence there. In order to answer our questions about analogies, conservatism and innovation, it is helpful first to consider an enterprise in which all of these matters are much more straightforward: the law.
7.4 Analogy in Legal Reasoning
Stare decisis
Stare decisis is the doctrine of precedent. I shall concentrate on the most rigid version of this doctrine, which can be found in the English common-law tradition. It has three principal features (Cross and Harris 1991, 5; italics added):
1. Respect is paid to single decisions of any superior court.
2. Any decision of such a court is a binding precedent on that court and any lower court.
3. Any decision of such a court is a persuasive precedent on higher courts.
A precedent is binding if the judge of the present case is obligated to decide it in the same way as the previous case, even if he or she has good reason not to do so. A precedent is persuasive if the present case must be decided in the same way as the previous case unless the judge can provide good reason not to do so. The description makes reference to the hierarchy of English courts, rising from the lowest level of County and Magistrates’ Courts up through High Courts, Courts of Appeal, and finally the House of Lords. There are important differences between the application of stare decisis in England and elsewhere, a point to which I return later.
To decide a present case in the same way as a previous case means to apply the ratio decidendi of the previous case to the present one. The ratio decidendi is the portion of the judgment in which a judge enunciates a general legal principle or principles on which the decision is based. Writers on jurisprudence agree that this part of the decision is not always easy to identify, and that it must always be understood in the context of the facts of the case (Cross and Harris 1991, Levi 1949). In a much-cited case in which the House of Lords found the manufacturer
of a bottle of ginger beer liable for damages to a consumer who became ill as a result of a dead snail in the bottle, the ratio decidendi is contained in the following passage: a manufacturer of products, which he sells in such a form as to show that he intends them to reach the ultimate consumer in the form in which they left him with no reasonable possibility of intermediate examination, and with the knowledge that the absence of reasonable care in the preparation or putting up of the products will result in an injury to the consumer’s life or property, owes a duty to the consumer to take that reasonable care. [Donoghue v. Stevenson 1932 AC 599] When there is no relevant difference between the current case and a previous case from a court of equal or higher jurisdiction, that past decision must be applied. Where relevant differences exist, however, the judge has to determine the generality of the ratio decidendi. There is room for debate about whether the duty of reasonable care just described is limited to certain types of product, how stringently to understand “no reasonable possibility of intermediate examination,” and so forth. The principle articulated in Donoghue v Stevenson was later applied to allow recovery for harm against the manufacturers of defective underpants9 and against an engineering firm whose negligent repair work (rather than manufacture) caused the collapse of a lift.10 Donoghue v Stevenson was not extended to a case in which a workman was injured by a defective crane, because the workman had ample opportunity to examine the crane and was, in fact, aware of the defects.11 In this case, the ratio decidendi was quite naturally construed narrowly enough so that the past case could be distinguished from the case at hand and did not constrain the decision. This flexibility about interpreting the scope of a past decision is crucial to the art of legal reasoning. 
Llewellyn (1960) has written eloquently about “loose” and “strict” interpretations of the authority of precedent. The loose view construes the ratio of a case broadly in order to apply it to diverse new cases. The strict view construes the ratio of a case narrowly, perhaps even to the point where there is no real possibility of application to any future case. In this way, the law can marginalize past decisions that have come to be viewed as inappropriate or erroneous, or based upon an unsatisfactory ratio: “[The strict view] is applied to unwelcome precedents. It is the recognized, legitimate, honorable technique for whittling precedents away, for making the lawyer, in his argument, and the court, in its decision, free of them. It is the surgeon’s knife” (1960, 67). With Llewellyn’s metaphor in mind, an apt illustration for the technique he discusses is Hillyer v St. Bartholomew’s Hospital.12 In this case, the court unanimously agreed not to find for a patient who attempted to sue a hospital when he sustained burns while under anaesthetic. The examining doctor, who had been
9. Grant v. Australian Knitting Mills [1936] AC 85.
10. Haseltine v. Daw [1941] 2 KB 343.
11. Farr v. Butters [1932] 2 KB 606.
12. [1909] 2 KB 820.
chosen by the patient, was not a hospital employee, but the hospital provided the operating theatre as well as nurses and porters. One judge argued on narrow grounds that the staff had effectively ceased to function as hospital employees while in the service of the doctor, and further that the burns could not be proved to be the fault of the hospital staff rather than the attending doctor. A second judge advanced the much broader and arguably reckless principle that a hospital’s sole responsibility to its patients is to use reasonable care in the selection of nurses and porters. Rather than allow this dangerous principle to be extended, a later decision limited the applicability of Hillyer to cases sharing its exact circumstances.13 The strict interpretation, then, can be used to limit the harm that might result from bad legal judgments. Of course, that interpretation gives rise to still more concerns about idiosyncratic reasoning, legal quibbling, and inexperience: since no two cases are ever identical, an unscrupulous or clueless judge might distinguish a case that ought not to be distinguished. In response to such concerns, Llewellyn argues that distinguishing past cases requires considerable effort and legal skill; thus, the technique is less likely to be used successfully by an inexperienced or foolish judge.14 As additional protection from the dangers of a bad precedent, stare decisis is subject to exceptions, the most notable of which is that the highest court (the House of Lords) has the right to depart from its own past decisions, though these are “normally binding.”15 Departure can be made “when it appears right to do so,” by which is meant that rigid adherence to a precedent would “lead to injustice” or “unduly restrict the proper development of the law,” as when a past decision is obsolete. Although we have focused on analogical reasoning in English law, it has great importance in every known legal system. The U.S. 
Supreme Court, though bound by stare decisis, has a more relaxed attitude than the British High Court to exceptions. This difference can be traced to such factors as the importance of the Constitution and the existence of fifty different state jurisdictions that the Supreme Court must oversee. The French legal system is based on civil and penal codes that are supposed to cover every conceivable situation, and there is no official doctrine of precedent at all. In practice, however, French judges pay close attention to decisions of their high court, the Cour de Cassation. A similar situation obtains in other countries with a civil-law, as opposed to a case-law, system. It is important to note that analogical reasoning in the law exemplifies our principles of prior association and potential for generalization. Both principles are embodied in the ratio decidendi, a legal principle that links the facts in the previous case to the judge’s decision and which is extended or generalized to justify a similar relationship in the new case.
13. Cassidy v. Minister of Health [1951] 2 KB 343. 14. And of course, there is no such thing as an unscrupulous judge. 15. This became official doctrine in 1966. See Cross and Harris (1991, 104).
Justification for Stare decisis
Stare decisis, when called into question at all, is traditionally justified with solidly pragmatic arguments. The doctrine has both costs and benefits, but it is universally agreed that the principal benefits—consistency, certainty, and finality in the law—outweigh the costs, which consist mainly of occasional harm or injustice as well as the danger of judgments out of step with changing social values. Arguing in the late nineteenth century that even the House of Lords should be bound by its past decisions, Lord Halsbury wrote:
cases of individual hardship may arise, and there may be a current of opinion in the profession that such and such a judgment was erroneous; but what is that occasional interference with what is perhaps abstract justice, as compared with the inconvenience—the disastrous inconvenience—of having each question subject to being re-argued and the dealings of mankind rendered doubtful by reason of different decisions, so that in truth and in fact there would be no real final court of appeal? (London Tramways v. London County Council 1898 AC 380)
Times change, however, and even the British legal system has moved to a more liberal interpretation of the doctrine of precedent. As noted, the House of Lords reserved the right to depart from precedent in its Practice Statement of 1966, though even that statement reaffirmed the traditional arguments for stare decisis: the doctrine provides a “degree of certainty” that allows people to regulate their conduct and allows for the “orderly development” of the legal system. The point about orderly development is important. Even the most routine decisions require some extension of principles invoked in past cases, so that gradual evolution is unavoidable and even desirable. More dramatic change occurs when a precedent regarded as anomalous or inappropriate is distinguished as outlined above.
The principal benefits cited for stare decisis, then, are stability, consistency, and the continuous and orderly development of the legal system. These are conservative political and moral values. Against these benefits we must weigh the principal costs attributed to the doctrine: an inflexibility that can lead to particular decisions that are unjust, and an inability to respond in a timely manner to changing social values. The criticism here is that lawyers and judges do things they know to be irrational “for no better reason than that they have seen some one else do them” (Radin 1933, 199). When we take into account the safety mechanisms that allow these costs to be mitigated, however, I suggest that the argument for stare decisis becomes compelling. Still, what does this argument really show? Because it will be the “source” on which we base our justification for analogical reasoning in science, we need to be as clear as possible about the conclusion. The first point to notice is that the justification proceeds at the level of policy. It is a rule-utilitarian argument. There is no attempt to show that stare decisis gives the best possible result in each particular case; indeed, such an assertion is implausible. Rather, the idea
is that the policy is, on the whole, good at balancing certain social objectives. Sophisticated versions of stare decisis attempt to fine-tune the doctrine so that problems (although admittedly inevitable) are kept to a minimum.
Second, we can achieve a little more clarity about what is being claimed for stare decisis (as opposed to alternative principles of legal reasoning) by modeling our cost-benefit analysis using decision theory. That means we need to identify a set of available options, a partition of possible circumstances or states, and premises about the probabilities of these circumstances and the values of the many possible outcomes. The rational choice is, then, the option with maximum expected value. Simplifying somewhat, we might suppose that the relevant options are to adopt a British-style case-law system, adopt some other case-law system, embrace a legal tradition based on statutes or codes, or allow the individual judge full discretion in decision-making. These are the historically available choices. Our premises are that we should attach high utility to conservative values such as stability, moderate value to flexibility, and relatively low probability to the occurrence of situations where a radical departure from precedent is appropriate. Tacitly, we assume that no other costs and benefits are relevant. Given these premises, case-law systems that require adherence to precedent with few exceptions come out well: we can expect them to do as well as or better than the alternatives.
We do not, however, want to be in the uncomfortable position of arguing that case-law traditions are superior to civil-law traditions, to say nothing of touting the British system as the pinnacle of achievement.16 Happily, we don’t have to make that argument. Each system has its strengths and weaknesses, reflecting different priorities in different legal traditions, and we need to modify our argument in a way that accommodates this cliché.
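The expected-value comparison described above can be made concrete with a minimal sketch in Python. Every number in it is a hypothetical placeholder chosen only to mirror the stated premises (high utility for stability, moderate value for flexibility, low probability that a radical departure from precedent is appropriate). The state names, option labels, and helper functions are likewise assumptions introduced for illustration, not estimates drawn from any legal system.

```python
# Illustrative sketch only: all probabilities and utilities are hypothetical
# placeholders, not estimates taken from the text or from any legal system.

# Partition of states: does the case at hand call for a radical departure
# from precedent? The premise is that such cases are relatively rare.
STATE_PROBS = {"routine": 0.9, "radical_departure_needed": 0.1}

# Hypothetical utility of each option in each state. Options that adhere
# closely to precedent score well in routine cases (stability) and poorly
# when flexibility is called for; full discretion shows the reverse pattern.
UTILITIES = {
    "british_style_case_law": {"routine": 10, "radical_departure_needed": 2},
    "other_case_law": {"routine": 9, "radical_departure_needed": 4},
    "statutes_or_codes": {"routine": 8, "radical_departure_needed": 5},
    "full_judicial_discretion": {"routine": 3, "radical_departure_needed": 8},
}


def expected_value(option, utilities=UTILITIES, probs=STATE_PROBS):
    """Probability-weighted sum of the option's utilities over all states."""
    return sum(probs[state] * utilities[option][state] for state in probs)


def best_option(utilities=UTILITIES, probs=STATE_PROBS):
    """Return the option with maximum expected value."""
    return max(utilities, key=lambda option: expected_value(option, utilities, probs))


if __name__ == "__main__":
    for option in UTILITIES:
        print(f"{option}: {expected_value(option):.2f}")
    print("maximum expected value:", best_option())
```

With these placeholder figures the precedent-bound options rank above full discretion, but any such ranking is only as good as the figures assumed.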
A much more plausible claim, sufficient for our purposes, is that existing case-law systems are optimal in a weaker sense: no alternative system is preferable in every important respect. To make sense of this claim, we move to a model in which the different dimensions of value are represented in vector form, rather than combined into a single number. Simplifying again, let us suppose that we rate competing legal policies along just two relevant dimensions of value, lumping the conservative objectives of the law in one group and progressive values in the other. The British doctrine of precedent might fare somewhat better than the American or French under the first heading, but somewhat worse under the second. Without assigning weights to these two dimensions, each of these legal systems is weakly optimal: no other available option fares better along both dimensions.17
16. Such a view would not be without precedent. In the words of one High Court justice, the strict version of stare decisis “hits the golden mean between too much flexibility and too much rigidity” (Holdsworth, 1903, 12: 146ff.).
17. A useful analogy here is the notion of Pareto-optimality. A social policy (e.g., on distribution) is Pareto optimal if no other available policy makes everyone better off. Our two dimensions are analogous to two individuals.
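Weak optimality in the two-dimensional model can be sketched as follows. The score pairs (conservative, progressive) and the option labels are hypothetical placeholders chosen for illustration. Only the dominance test reflects the definition given above, on which an option is weakly optimal when no other available option fares better along both dimensions.

```python
# Hypothetical (conservative, progressive) scores; higher is better.
SCORES = {
    "british_precedent": (9, 5),
    "american_precedent": (7, 7),
    "french_civil_law": (6, 8),
    "judicial_free_hand": (3, 6),
}


def beats_on_every_dimension(a, b):
    """True if score vector a is strictly higher than b on every dimension."""
    return all(x > y for x, y in zip(a, b))


def weakly_optimal(scores):
    """Names of options that no other option beats on every dimension."""
    optimal = []
    for name, vec in scores.items():
        dominated = any(
            beats_on_every_dimension(other_vec, vec)
            for other, other_vec in scores.items()
            if other != name
        )
        if not dominated:
            optimal.append(name)
    return sorted(optimal)


if __name__ == "__main__":
    print(weakly_optimal(SCORES))
```

No weights are assigned to the two dimensions. On these assumed scores the three precedent-respecting options all survive, while the free-hand option is excluded because another option beats it on both counts.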
I suggest that the argument provided above for stare decisis should be understood in one of two ways. We can argue that a particular case-law system incorporating that doctrine is weakly optimal in the sense just described.18 Alternatively, in view of the fact that all existing legal systems defer to precedent with varying degrees of stringency, we can argue that even if we do assign weights to our two dimensions to produce a single combined score, some version of a doctrine of precedent will be incorporated in any weakly optimal policy selected over the whole range of reasonable weights. In particular, no reasonable assignment would support a legal system in which judges are given a free hand. A third and final observation is important. It is striking that the justification for stare decisis, in company with most of the jurisprudential literature, ignores what we might traditionally expect to be cited as the primary value within the legal system: dispensing justice. Stability, orderliness, and even responsiveness to social change may be important, but they are surely secondary to the reliable administration of justice. Yet we do not find arguments that stare decisis directly promotes decisions that are just. Despite this omission, the connections between stare decisis and justice are obvious. First, there is an indirect link: if past decisions have been just, we can expect that like decisions will produce mainly just results. Second, there is a more direct connection: the secondary values promoted by stare decisis are related to justice either instrumentally or as constituent parts. Consider the following argument: There is a sense in which the court’s failure to follow one of its past decisions inevitably produces a failure of justice. Litigants are entitled to expect that like cases will be decided alike, and they are encouraged to do so by the courts. 
One litigant must necessarily be disappointed of his just expectations if the [court] holds that, although the facts of his case are as he contended and although, as he likewise contended, there is no reasonable legal distinction between those facts and the facts of a previous case . . . the case is one which ought not to be followed. (Cross and Harris 1991, 103–104) The point here is that stability of the common-law system serves justice because it is in fact a constituent of justice. Hart (1961) also argues for a close connection between stare decisis and the impartiality that is a normal feature of justice. Summarizing, the justification for stare decisis is that the policy (as instantiated in particular traditions) is weakly optimal in securing a balance of conservative and progressive social and ethical values. Justice plays only an indirect part in this argument. Let us accept this justification for stare decisis and return to the role of analogies in science.
18. I exclude utopian legal systems. There is always room for improvement in existing systems.
7.5 Law and Science
A High-level Analogy
I shall develop a high-level analogy between law and science that suggests that we can offer a similar justification for analogical reasoning in science.19 I suggest that the policy analogous to stare decisis is to accept as plausible (worthy of investigation) any conclusion supported by an analogical argument satisfying the two fundamental principles of the articulation model. Just as we stand by decided cases in the law, we should stand by accepted examples of proof, explanation, and so forth in the sciences. My claim is that this policy achieves a weakly optimal balance between competing epistemic values. I support this claim by identifying key similarities, and explaining the surprising irrelevance of the many differences, between law and science. By analogy with the justification for stare decisis, we are arguing for the acceptability of a policy or method. That is not quite the same conclusion as the one stated at the outset (section 7.1), which refers to the plausibility of individual analogical arguments. Moving to the policy level is the best way to handle both the generality of the two fundamental principles and the possibility of exceptional cases. We might, for instance, encounter a situation in which a conclusion supported by a good analogy must clearly be rejected because of a fundamental conflict with other parts of our theory. Such cases should not undermine our confidence in analogical reasoning any more than would obvious anomalies in the law.20 A policy-level justification for our method of analogical reasoning is appropriate in science, just as in law. In developing this high-level analogy between legal and scientific reasoning, we make use of our own procedure. We elaborate the prior association according to an appropriate model and identify the concepts relevant to that association. That task has been accomplished in the discussion (section 7.4) of the justification for stare decisis. 
Our next step is to assess the potential for generalization, that is, the possibility of transferring our argument from one domain to another by considering similarities and differences. There are notable similarities between law and science. The conservative legal values that support stare decisis have analogs in conservative epistemic values. We value stability and orderly growth in normal science; if Kuhn is right, we value these things highly. The stability of a body of theory and the orderly progress of inquiry are distinguishing features of a mature science. At the same time, we prize innovation within the current paradigm, and our list of forward-looking epistemic values is the analog to the progressive values that play a role in the law. Of course, the use of analogies in science can fail to yield correct problem solutions, and as we saw in chapter 1, there is a more subtle danger that
19. I refer to it as “the high-level analogy” between law and science to distinguish it from individual analogical arguments occurring within the law or within particular sciences.
20. Such possibilities motivated the clause about “independent reason to reject” in our original formulation.
analogies can perpetuate social or metaphysical biases that hinder science. These concerns, as we have seen, have their analogs in the law. In defending stare decisis, our response to such problems was twofold: the probability of this sort of failure is sufficiently low that we should still abide by precedent, and there are safety mechanisms that mitigate the problems. Following Kuhn, we can offer analogous responses in science. Scientists do not expect novelty; they attach low probability to encountering a situation where existing methods will fail to provide a solution. In fact, they tend to set aside intractable problems and focus on those they expect to yield to techniques available within the existing paradigm. Furthermore, science has even greater resources than the law for eventually recognizing and abandoning a misleading analogy. In summary, we find that the basic assumptions in our justification for stare decisis all have analogs in the practice of science. Finally, our conclusions are also closely analogous. We want to show not that every analogical argument in science conveys plausibility but only that those adhering to our two fundamental principles do so. As noted above, stare decisis implicitly incorporates both principles. Building them into our target conclusion maintains the analogy with the source domain.
Now we come to the many serious differences that appear to block our high-level analogical argument. Here are the most important ones.
1. Fundamental goals. The fundamental goal of legal institutions is the dispensation of justice. Science aims at truth or some surrogate such as empirical adequacy. This discrepancy seems to show that the pragmatic justification for stare decisis is unlikely to be transferred successfully to a scientific setting.
2. Normative vs. descriptive systems. A legal generalization is a norm imposed upon particular cases. By contrast, a generalization in science is descriptive: particular cases confirm or disconfirm it. 
Perpetuating a norm requires a moral or pragmatic justification, but projecting a descriptive regularity requires epistemic justification.
3. Importance of conservative values. Following up on the previous two points, the role of conservative values such as stability and gradual evolution is different in the law and in science. In the legal setting, these values are either constitutive of justice or instrumentally related to it, and therefore are ends to be promoted. In a scientific setting, by contrast, they do not constitute truth or empirical adequacy. They function as constraints on scientific practice rather than as ends. In addition, the value placed on innovation is very high. In short, conservative values count for less in science than in the law, which dims the prospects for our high-level analogy.
4. Range of alternatives. The justification for stare decisis is greatly simplified by the lack of serious alternative approaches to legal decision-making. By contrast, a scientist can draw on a range of resources other than analogies in coming to a decision about the plausibility of a hypothesis.
5. Nature of prior associations. As explained in chapter 4, scientific analogies depend upon a prior association that is typically a causal, explanatory, or logical relationship. Legal analogies are based on relationships of legal relevance. This difference might be an obstacle to drawing any deep connection between the two disciplines.
6. Prevalence of open-textured predicates. As we noted in section 1.3, the open-textured nature of many predicates is significant in the law but less so in science. It could be argued that it is the prevalence of open-textured predicates in legal reasoning that makes case-based reasoning so essential there: we need to draw on past cases to grasp the very meaning of the relevant concepts. We should expect analogies to play a less critical role in science.
7. Obligation vs. permission. Exceptions aside, the force of precedent in the law is binding. It tells judges how they must decide a case. By contrast, a successful analogical argument in science does not generally compel acceptance of a hypothesis but rather provides epistemic license or permission to investigate further.
8. Richness of case library. An advocate or judge typically has access to a large library of cases more or less similar to the one under scrutiny. A scientific hypothesis may be supported by multiple analogies, or there may be competing analogies supporting rival hypotheses, but we often see analogies with just one source domain. The richness of the case library makes the justification of analogical reasoning more natural in the law than in science.
These differences are deep but not devastating. In evaluating an analogical argument, what counts is not global similarity but whether the known differences are critical relative to a specific prior association. In my view, the differences just noted are either irrelevant or can be accommodated without destroying the argument. 
Before dealing with them, let us take a second look at the features common to both legal and scientific reasoning.
Elaboration: Analogy and Values in Science
The first and most important observation is that primary virtues play no direct part in our justification for analogical reasoning, either in the law or in science. As already noted, there is no reference to justice in the traditional justification for stare decisis. Nor do the primary goals of truth or empirical adequacy, accuracy, and precision enter directly into the proposed justification for analogical reasoning in science. Both primary and secondary goals are obviously important in evaluating scientific hypotheses. But if our focus is on methods for assessing plausibility, as it is during the preliminary stages of an inquiry, then the secondary goals become all-important. The primary goals either fade out or factor out, so that they need not be taken directly into account. Truth (or empirical adequacy) fades out. When considering a novel hypothesis, we are not in a position to assess veracity directly. That is a matter for further
testing (or rigorous proof, in the case of mathematical conjectures). Of course, if we agree with McMullin that secondary values offer a “hint” of truth, then there is an indirect link to veracity. Whether McMullin or van Fraassen is right on this point, however, is immaterial. What matters is that plausibility is directly assessed on the basis of the secondary virtues rather than on the basis of truth. Accuracy and precision, by contrast, factor out. These primary goals do bear directly on the plausibility of a hypothesis, but their contribution is orthogonal to that of the secondary values. We may have just one hypothesis, or we may be comparing different hypotheses that are all compatible with the observed facts (though they need not be empirically equivalent). Each must meet a basic threshold of accuracy and precision, but we are not to impose too stringent a demand. On grounds of simplicity, for example, we might prefer a linear relationship among observed variables, even though we could obtain an exact fit with a polynomial of higher degree.21 In these early stages of inquiry, accuracy and precision operate as basic constraints common to all methods, but not as primary selection criteria.22 Beyond the initial filtering operation, they do not help us to evaluate the plausibility of hypotheses or the merits of different methods for selecting an initial hypothesis. The identification of plausible hypotheses, then, and of appropriate methods for selecting them, is based upon secondary values. So there is no difficulty in the fact that our high-level analogy rests entirely upon similarities between secondary values in law and science. It follows that our proposed justification for analogical reasoning does not depend upon any deep metaphysical assumptions, but only upon our acceptance of Kuhn’s portrayal (or something like it) of normal science. 
On the one hand, if we regard the secondary values as standing on their own, without any connection to truth or empirical adequacy, then metaphysics is irrelevant. On the other hand, holding an instrumentalist conception of secondary values is compatible with most metaphysical positions. A realist might accept Kuhn’s ideas about normal science but reject his views about incommensurability and anti-realism, insisting that we have good reason to think that science progresses toward the truth. The secondary values advance us toward the attainment of a true and complete scientific worldview. A moderate anti-realist might claim that science progresses toward empirical adequacy rather than truth, but would put forward the same list of secondary values as serving this alternative primary goal. In the case of someone such as Laudan (1977, 1984), for whom the primary goal of science is to increase the number and range of solved problems, the secondary values again bear an instrumental relationship to that goal. In short, the primary goals vary with one’s metaphysical position but the list of secondary values and their instrumental worth remain constant. Provided analogical reasoning advances those secondary objectives, our argument succeeds.
21. Discussions of this complex issue go back a long way; Reichenbach (1938) and Jeffreys (1973) are two rich sources.
22. A similar remark might be made about common sense. Common sense is also a factor in estimating plausibility, but it applies uniformly across all secondary values. We return to this point below.
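The curve-fitting remark above, that a linear relationship may be preferred on grounds of simplicity even though a higher-degree polynomial would fit exactly, can be illustrated with a small sketch. The five data points are invented for the purpose, and the function names are hypothetical. The line is fitted by ordinary least squares, and the exact fit uses Lagrange interpolation, both standard techniques swapped in here for illustration.

```python
# Invented observations that lie close to, but not exactly on, a straight line.
XS = [0.0, 1.0, 2.0, 3.0, 4.0]
YS = [0.1, 0.9, 2.2, 2.9, 4.1]


def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def lagrange(xs, ys, x):
    """Value at x of the unique polynomial of degree < len(xs) through all points."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def max_line_residual(xs, ys):
    """Largest absolute gap between the fitted line and the observations."""
    slope, intercept = fit_line(xs, ys)
    return max(abs(y - (slope * x + intercept)) for x, y in zip(xs, ys))


if __name__ == "__main__":
    print(fit_line(XS, YS), max_line_residual(XS, YS))
```

The degree-four polynomial reproduces every observation, while the two-parameter line misses each by a small amount. If the threshold of accuracy tolerates residuals of that size, both hypotheses pass the initial filter, and the choice between them falls to secondary values such as simplicity.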
We have been equating plausible with “worthy of investigation.” Our analysis up to this point brings out an important point: plausibility, so defined, is severed from direct connection with truth. To my mind, this is all to the good: our defense of analogical reasoning depends only upon acceptance of the pragmatic virtues. Yet somebody might insist upon the objective of establishing analogical reasoning as plausible, where plausibility is linked to truth (or empirical adequacy). Our analysis still accomplishes part of that program by showing how analogical reasoning fits in with the pragmatic virtues.23 For those who accept some version of the instrumentalist conception, then, our justification for analogical reasoning divides into two steps. Step one is to show that analogical reasoning effectively promotes the secondary values. Step two is to justify the link between those values and our designated primary goal or goals (truth or some surrogate). The second step, as has been noted, involves the problem of induction. Even if experience suggests that simplicity, unification, and other secondary values have been indicators of empirical adequacy (or truth), to project this relationship into the future requires the use of induction. Our justification for analogical reasoning is limited to the first step. Recognizing this limitation frees us from having to deal with the familiar skeptical arguments that plague attempts to justify induction. On this point, incidentally, law has an advantage over science because there the connection between secondary and primary values is much tighter, so that step two is more straightforward. As we noted, stability is not merely instrumentally related to justice, but also constitutive of it. In my view, this helps to explain why the justification for analogical reasoning is so much easier in the law than in science. 
This difference does not defeat our high-level analogy, however, because in both cases only secondary values are of direct concern. In the previous section, I identified three key features in the justification for stare decisis: it operates at the level of policy, it is directly concerned only with secondary values, and its justification is best regarded as a claim of weak optimality. Having seen that we can transfer the first two features to the scientific setting, let us now consider the third. How are competing secondary objectives to be balanced? What do we hope for in the early stages of either a scientific investigation or a legal argument? In either case, we need to keep our options open. We want to identify not just one but many serious candidates for solutions to our problem. In law, these are the best competing arguments for both sides of the debate. In scientific reasoning, they are the hypotheses that we take to be prima facie plausible, that is, worth pursuing. My proposal is that the set of serious candidates consists of those supported by weakly optimal methods. Just as in the legal domain, a method of assessing overall worth is weakly optimal if it achieves a balance between conservative and progressive values, that is, if no other available method does better along every relevant dimension. This concept applies equally well to the scientific and social settings.
23. Chapter 8 explores these issues more fully, in a setting where beliefs and belief change are modeled probabilistically.
Weak optimality is certainly a necessary condition for plausibility. Suppose that two rival hypotheses are, respectively, supported by method A and method B, and that method B is superior in all respects to method A. Then we have good reason to investigate the hypothesis supported by method B rather than the one supported by method A. So method A cannot be used to establish plausibility. With one assumption and one qualification, weak optimality is also a sufficient condition for plausibility. The assumption is that we can and should perform an initial plausibility screening without assigning weights to the different epistemic values. Individual scientists do assign weights to simplicity, coherence, and so forth, and they make quite different judgments about plausibility. An initial screening, however, should be broad-minded. We want to allow, for instance, that two conjectures might both be plausible even if they could not both be true. This is only possible if the process for evaluating plausibility either abstracts entirely from weights or prescribes a range of admissible weights. The qualification is a requirement of balance. Any legitimate method for assessing plausibility must respect the importance of both conservative and progressive values. Selecting hypotheses entirely on the basis of a single factor, such as simplicity, is weakly optimal because no other method will score better on that single dimension, but such a strategy is too lopsided to count as establishing plausibility. The requirement of balance admits only methods that give positive weight to both conservative and progressive values. When I speak of a weakly optimal method henceforth, I shall assume that it satisfies this condition. With this requirement in place, there is little point in distinguishing between the two options mentioned at the end of the preceding paragraph. The simplest approach is to evaluate plausibility without reference to weights. 
But then weak optimality is the most that can be claimed for any method. For this initial plausibility assessment, then, hypotheses supported by weakly optimal methods are all on par. This is our set of serious candidates. It might be objected that even though this proposal identifies the best candidates, we can still reject all of them as implausible. That would preserve the symmetry of the situation. The best response to this objection is pragmatic: we have to start somewhere. Given a decision to work on a particular problem (and the conviction that a solution exists), we have to gamble on some hypothesis. The cost of doing nothing at all until some better idea comes along is too great. Let us take stock. I have argued that in the early stages of an inquiry, we seek an optimal balance of secondary values. Methods that are weakly optimal and that satisfy a modest requirement of balance yield plausible solutions—that is, ideas worth pursuing. The final step is to show that analogical reasoning, in accordance with our two fundamental principles, is in fact weakly optimal. Now just as there are many different versions of stare decisis, there are many different ways to implement the two fundamental principles. As explained in chapter 4, they yield strict criteria for analogical reasoning in some settings and relatively loose criteria in others. Still, we were able to present a convincing case for the weak optimality of stare decisis without going into these details, and I believe we can do the same thing for our model of analogical reasoning.
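The screening procedure described above, weak optimality together with the requirement of balance, can be sketched in a few lines. The method names and score pairs are hypothetical. Reading balance as a strictly positive score on both the conservative and the progressive dimension is a simplifying assumption of this sketch, not a definition given in the text.

```python
# Hypothetical (conservative, progressive) scores for methods of plausibility assessment.
METHODS = {
    "method_a": (4, 5),
    "method_b": (6, 7),        # superior to method_a in all respects
    "method_c": (8, 3),
    "single_factor": (10, 0),  # selects on one value only, e.g. simplicity
}


def dominated(name, methods):
    """True if some other method scores strictly higher on every dimension."""
    own = methods[name]
    return any(
        all(o > s for o, s in zip(other, own))
        for other_name, other in methods.items()
        if other_name != name
    )


def is_balanced(vec):
    """Assumed reading of balance: some positive score on every dimension."""
    return all(score > 0 for score in vec)


def serious_candidates(methods):
    """Methods that are weakly optimal and also satisfy the balance requirement."""
    return sorted(
        name
        for name, vec in methods.items()
        if not dominated(name, methods) and is_balanced(vec)
    )


if __name__ == "__main__":
    print(serious_candidates(METHODS))
```

On these assumed scores, method_a drops out because method_b beats it everywhere, which is the necessary-condition half of the argument. The single-factor method is weakly optimal, since nothing beats it on its one dimension, yet it fails the balance test. The remaining methods are all on a par for the initial screening.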
We are helped by the fact that there are few methods available for plausibility assessment. Return for a moment to the legal arena. The weak optimality of stare decisis was easy to defend because there are few alternatives. We found only two: civil-law traditions in which judgments are based on comprehensive legal codes, and legal systems based largely on idiosyncratic judgments. Neither of these may be expected to perform better in all respects than a case-law tradition built around stare decisis. In the case of scientific reasoning, on the one hand, our task is simplified because there is no option analogous to a civil-law tradition. Nobody can seriously entertain the fantasy of a complete set of general heuristic principles sufficient to guide all assessments of plausibility. There is just too much novelty even in normal science, to say nothing of scientific revolutions. On the other hand, idiosyncratic methods have to be taken more seriously in science than in the law. Such methods are more or less regular procedures for identifying plausible hypotheses on the basis of some combination of secondary values. Given our restrictions about plausibility screening, though, they must not depend upon any definite weighting of these values. Analogical reasoning in accordance with our two fundamental principles is one such method. What distinguishes it in science, as in law, is the extent to which it embraces and carefully balances nearly every important secondary value. As we shall see, this single method integrates fruitfulness, simplicity, coherence, and unification, rather than leaving them in tension. Only scope is not encompassed in the mix: that there is analogical support for a theory tells us nothing, either positive or negative, about its scope. No other method for assessing plausibility so successfully combines so many dimensions, let alone rating better along each of them. 
There is also a second distinguishing feature: when we confine our attention to unification, we are unlikely to encounter any method that can match the promise of analogy. These two distinguishing features justify our conclusion that analogical reasoning in accordance with the two fundamental principles is a weakly optimal method.24
24. One qualification here: if we have many competing analogical arguments and a clear way to rank them, as discussed in section 4.10, then the claim about weak optimality is restricted to those analogical arguments that are most on-point.
Let us first see how analogical reasoning promotes the values of coherence, simplicity, fruitfulness, and unification. Salmon (1990) relates coherence (which he calls consistency) explicitly to analogy. This criterion, he tells us, has two aspects: internal (or logical) consistency and external compatibility with other accepted theories. Compatibility in turn has a deductive component (logical compatibility) and an inductive component (fit or congruence). Since the deductive relationships are straightforward, the inductive aspect is the more interesting and important one. Salmon maintains that the chief criterion for evaluating the inductive fit of a new proposal is analogy with accepted theories. Simplicity is a complex idea that, again following Salmon, has “no precise measure.” He rejects the idea of any a priori characterization: judgments of simplicity vary with context and are based on “training and experience” in
particular areas of expertise. Shimony (1970) endorses Goodman’s (1961) idea that standards of simplicity rest upon our commonsense knowledge and our classificatory habits, but like Salmon, he supplements this with contextual expertise. He notes that without “guidelines suggested by antecedent knowledge,” the concept of simplicity becomes vacuous. Only experience could have taught us to recognize the simplicity of Lorentz invariance principles, for example. These views strongly suggest that analogical reasoning may be expected to yield ideas that score well for simplicity. For those ideas are by definition similar in relevant respects to the exemplary models that shape scientists’ conception of simplicity. Thus, even though simplicity goes well beyond analogy in incorporating commonsense knowledge and other factors, there is a positive relation between the two notions. Fruitfulness, like simplicity, has many different aspects. At least one of these, however, is closely related to analogy. Salmon writes: “Some theories prove fruitful by unifying a great many apparently different phenomena in terms of a few simple principles. The Newtonian synthesis is, perhaps, the outstanding example; Maxwellian electrodynamics is also an excellent case” (1990, 577). A theory that promises unification possesses this type of fruitfulness. I now turn to that central secondary value, unification. As Friedman (1974), Kitcher (1989), and others have suggested, unification is desirable in science because it allows for maximal explanatory power with a minimal set of principles, as in the case of the Newtonian synthesis.25 A similar point applies to unification, or generalization, in mathematics. In Kitcher’s work, the basic notion of unification is “seeing connections, common patterns, in what initially appeared to be different situations” (1989, 432). 
He identifies these common elements as patterns of derivation:
Science advances our understanding of nature by showing us how to derive descriptions of many phenomena, using the same patterns of derivation again and again, and, in demonstrating this, it teaches us how to reduce the number of types of facts we have to accept as ultimate (or brute). (432)
This general understanding of unification seems right, and it lies at the very heart of the articulation model of analogical arguments. Our two fundamental principles are aimed directly at the unification of diverse phenomena or problem areas. They force us to refine and evaluate a conjecture, in comparison to some accepted idea, until the common elements emerge. Prior association requires that there be a pattern and that we make it explicit. Potential for generalization requires, at minimum, that there be no barrier to transferring the pattern to the solution of a new problem. Our notion of a pattern is somewhat more general than Kitcher’s. He restricts himself to derivations because his overriding concern is with explanation, but as we saw in chapter 4, other types of relationship occur in analogical arguments.
25. Morrison is a dissenting voice; she argues for “the decoupling of unification and explanation” (2000, 1). Her target, however, is a set of metaphysical presuppositions about the “unity of nature” rather than the modest thesis that unification is at least sometimes linked to explanation.
In any event, by incorporating the two fundamental principles, we ensure that our model of analogical arguments performs exceptionally well along the important dimension of unification. That dimension is crucial to plausibility judgments, for a number of reasons. It has both conservative and progressive aspects; indeed, as I have suggested, advancing unification also advances most of the other secondary epistemic values. As a result, a focus on unification tends to lead to a balanced assessment of plausibility. By contrast, a theory of analogical inference that omits the two principles could actually score negatively along the dimension of unification. That is because potential for generalization is violated if and only if there is some known barrier to unification (as ultimately became clear in the case of the acoustical analogy, Example 4.8). An analogical argument that is acknowledged to be defective in this way will also be much less attractive from the point of view of other epistemic criteria that, as we have seen, are strongly linked to the central value of unification. In my view, such analogies do not establish anything about plausibility. Their only value lies in their potential for salvage: perhaps they can be taken apart and refashioned into sound analogical arguments. My theory of analogy is oriented toward this single value of unification in much the same way as other theories of plausibility center on simplicity or scope. Commenting on proposals such as those of Kemeny (1953) or Jeffreys (1973) that make simplicity the sole or primary criterion for assessing proposed (and unrefuted) hypotheses, Shimony writes: “this role is too great a burden for the concept of simplicity to bear” (1970, 155). Although Shimony is objecting here to a priori concepts of simplicity, the main problem is that simplicity is only one of many dimensions that should be used to assess the merits of a proposal. Popper’s approach to theory choice faces similar problems. 
To suggest that the boldest (unrefuted) conjectures are the best (1959, 1974) is to make scope the sole criterion of plausibility. Once again, this takes too narrow a view of how such decisions are to be made. The failure of such approaches, in fact, is what motivated our requirement of balance. Although our model of analogical reasoning fixes on unification as its central objective, that is an excellent choice because, as we have seen, it overlaps with nearly all of the other secondary values. Summarizing, my argument amounts to the following. In comparing different methods for assessing plausibility, we must look to secondary values. Prior to assigning weights to these values, weak optimality is all we have to go on. But there are relatively few methods available. The weak optimality of analogical reasoning in accordance with the two fundamental principles follows from two observations: no other method respects and even integrates so many different values, and no other method even comes close along the dimension of unification. Finally, analogical reasoning without the two fundamental principles does not clearly advance these values (though I return to the topic of defective analogies in the concluding chapter). We have taken the justification given for stare decisis in legal reasoning, and adapted it to the scientific setting. Let us now return to the eight objections noted earlier to the analogy between law and science.
Responses to Objections 1 through 4
As Thomas Aquinas would say, the responses to the first four objections are evident from the preceding discussion. The discrepancy between the fundamental aims of law and science is irrelevant because those fundamental aims do not figure directly in the justification for analogical reasoning in either discipline. The high-level analogy rests entirely upon similarities between secondary values in law and science.
Given the focus on secondary values, the contrast between normative and descriptive enterprises is also irrelevant. That difference pertains to the primary goals. Analogical reasoning serves normative ends in the law and descriptive ends in the sciences, but the reasoning itself can be distinguished from those ends and is only indirectly connected to them. While still on this point, we might be bothered by the very different way in which our two domains handle the relationship between generalizations and particular cases. In the law, generalizations are imposed on particular cases, whereas in science, they are tested by such cases. But this contrast only emerges in later stages of scientific investigation since, as we noted, evaluations of plausibility are being made between hypotheses that are all compatible with the data. So far as plausibility decisions are concerned, in fact, scientific reasoning has a normative aspect: it seeks to impose a pattern as the best way to solve a problem.
As to the third objection, we may concede that stability is more directly related to justice in law than to truth (or empirical adequacy) in science, and that this difference can explain why analogical reasoning is more easily justified in the former setting. But as we noted, our high-level analogy is not destroyed by this difference. Nor is the greater weight attached to conservative values in the law much of a problem, so long as both conservative and progressive values remain important in both domains.
If I am right that we need a stage of plausibility assessment prior to the assignment of weights to these values, then this difference is entirely irrelevant to our analogy. The fourth objection concerns the discrepancy between the range of methods available for legal reasoning and scientific reasoning. We have discussed this point extensively. All that matters, ultimately, is that in both the law and science there is a strong case for the weak optimality of analogical reasoning. The remaining four objections merit separate discussion.
Objection 5: Nature of Prior Association
I have maintained that in evaluating analogical arguments, different types of prior association require different models. The point of our high-level analogy, however, is to transfer a pragmatic (broadly decision-theoretic) justification for reasoning in accordance with our two principles from the legal arena to the sciences. That high-level analogy abstracts from the details of how to make assertions of legal relevance, and the justification presented for stare decisis does not depend upon those details. There is no reason to think that we cannot similarly abstract from the rather different types of relevance that occur in scientific arguments.
Objection 6: Open-textured Predicates
The prevalence of open-textured predicates in legal reasoning is irrelevant to our argument. There are two different routes to this conclusion, depending on one’s view about the prominence of open-textured concepts in science.
One route is to claim (despite my arguments to the contrary in section 1.3) that there really is no difference between law and science on this score, and hence no threat to our analogy. Scientific concepts are as open-textured as legal concepts. This position might receive support from a line of thought initiated by Hesse (1966) and developed prominently in recent years by Lakoff and Johnson (1980) and others. The idea is that scientific concepts are metaphorical in origin: they are extensions of concepts that have a more direct basis (e.g., in bodily experience). For example, high- and low-energy states are understood using a landscape metaphor in which more energy is up (Lakoff and Johnson 1980, 19). Many people think that metaphor is central to understanding how concepts function in scientific reasoning. Any claim about energy transitions, for example, draws tacitly upon the metaphor of upwards or downwards movement. This idea might be extended to the suggestion that understanding any scientific claim or argument, including analogical arguments, requires that we bring in secondary or semantic analogies: networks of examples and models which metaphorically define the concepts invoked in that claim or argument. That is precisely what we encounter in legal arguments, where we must employ such semantic analogies to justify the applicability of legal predicates. If science is like this too, then there is no disanalogy.
As I argued in chapter 1, we do not find layers of argument in which the similarities upon which a scientific analogy rests are themselves backed up with supporting semantic analogies. Franklin’s analogy (Example 6.1) extended the concept of electricity beyond its familiar boundaries, but no subsidiary analogies were required to justify the many points of similarity cited between lightning and “electrical fluid.” Nor, if electrical phenomena are cited as similarities between two domains in a subsequent analogical argument, do we need to justify this claim by invoking further analogies. These observations support a position I stated in section 1.3: open texture is pertinent to scientific concepts when they occur in the conclusion of an analogical argument, but not when they occur in the premises (i.e., in the positive analogy). This position gives us a different way to respond to the objection about open-textured predicates. We can acknowledge that many, perhaps even most, scientific concepts are metaphorical in origin, yet maintain that only when such concepts are being enlarged or extended in scope does their metaphorical nature need to be considered explicitly (by way of analogies). In other cases, the metaphor may play only a silent and relatively modest role.
The whole issue about open-textured predicates is thus irrelevant to our high-level analogy. For we can distinguish between a primary or justificatory analogy (intended to back the ruling in a case or support some scientific hypothesis) and the secondary or semantic analogies used to demonstrate the applicability of an open-textured predicate. Our high-level analogy is based solely on a comparison between primary or justificatory analogies in law and science. The prevalence of open-textured predicates in the law ensures a large role for secondary analogies, but this has no direct bearing on our argument.
Objection 7: Obligation and Permission
Stare decisis constrains decision-making; analogical reasoning in science just provides support for a possible line of investigation. This looks like a sharp contrast between obligation and permission. In fact, the difference is not so very great. There are elements of obligation and permission in both domains. When we focus on the preliminary stages of legal reasoning, analogical reasoning is permissive. As Ashley explains, advocates are interested in finding the most on-point precedents for both sides in a legal dispute, not only in order to construct their own arguments but also to anticipate those that will be offered by opposing counsel. So our idea of a stage at which many competing scientific hypotheses can be plausible is mirrored in legal reasoning.
Furthermore, if we move beyond analogical reasoning to the broader category of plausibility arguments, we find that they are as binding in science as in the law. There are two differences. First, analogy is the main form of plausibility reasoning available in the law while it is one of many prominent forms in science. Second, the set of serious candidates supported by some optimal method is typically much larger in scientific reasoning. The principle common to both domains is that we are constrained to adopt one of these serious candidates. That common idea is all that is needed for our high-level analogy.
Objection 8: Richness of Case Library
In thinking about a legal problem, there are typically many readily accessible sources. This is often not the case in solving problems in science. All this implies, however, is that analogical arguments are more widely available in the law than in science. When we confine our attention to cases in science where analogies can be found, this objection ceases to be relevant.
It is worth pausing here to relate this issue to Kuhn’s distinction between normal and revolutionary science. Analogies are quite common in the former, but less common in the latter. Although I have suggested that analogical arguments have value even in revolutions (an idea that I defend in the concluding chapter), we can already see that this is a problematic claim. Given the emphasis I have placed on unification as the primary virtue of analogical reasoning, and given Kuhn’s complete rejection of the idea that there could be unification across paradigms, we shall face a serious challenge in making a case for the value of analogical arguments in paradigm shifts.
7.6 Conclusion
To the original list of objections, we might add one more. If our concern is to provide a justification for analogical reasoning in science, then to employ an analogy with stare decisis looks like circular reasoning (since it presupposes the legitimacy of analogical reasoning in the law). If the analogy with law is meant to provide justification, then the reasoning is circular; if not, then it seems to be pointless.
This objection provides an opportunity to reflect on a more general puzzle about analogical reasoning: the disappearing analogy problem. Suppose we have a prior association between facts P and a conclusion Q about the source domain. We argue that there is a similar connection between known facts P* and some conjectured hypothesis Q* about the target. In this case, the evidence for Q* is just P*. The source domain plays no role. How, then, can analogies ever provide justification?
The solution to the general puzzle is that, in most cases, we need a preliminary step that identifies Q* as a prima facie plausible hypothesis before we can speak of evidence for it. That is what the analogy supplies. In a Bayesian framework, this idea has a probabilistic formulation. In the simplest case, Q* is a novel hypothesis whose prior probability is either zero or undefined. We cannot regard P* as evidence in the usual sense because the conditional probability Pr(Q*/P*) is undefined. The analogy lets us assign a value to this conditional probability, a value that functions as a kind of “relative prior” for Q*. This idea will be developed in the next chapter.
That is what happens in some cases. But in others, the story is a bit different. Guided by the analogy, we might succeed in constructing a detailed connection between P* and Q* that is completely independent of the source domain. Sometimes this occurs by way of generalization: we are able to work out the structure common to the source and target domains.
In the course of exploring an analogy between problems in two- and three-dimensional geometry, for instance, we might formulate and prove a general result in n dimensions that is good for any n. If we can justify an association at this general level, then the relationships in the source and target become instances. We have moved beyond analogy to full-fledged generalization. The source domain may still have heuristic value, but its justificatory importance evaporates. My response to the charge of circularity is that in developing the analogy between law and science, we do isolate a common general structure, and the justification for analogical reasoning pertains to that structure. The general structure is the problem of satisfying competing preferences without assigning weights, and the solution is to adopt weakly optimal methods. At the end of the day (but no earlier), the analogy with law does fade away. Yet it is only by means of that analogy that we could arrive at our general characterization of settings in which analogical reasoning establishes plausibility.
8 Analogy and Symmetry
8.1 Introduction
The previous chapter developed a top-down analogy between law and science. Analogical reasoning is justified because it achieves an optimal balance of conservative and progressive values. There is also a bottom-up analogy. In both domains, analogical arguments appeal to symmetry: like cases must be treated alike.
It is important to appreciate the contrast. The top-down approach of chapter 7 is based on the idea that analogical reasoning effectively promotes virtues such as simplicity and unification. That is very different from the thought that symmetry provides constraints upon our reasoning.
Suppose that an analogical argument satisfies the two requirements of the articulation model: prior association and potential for generalization. I maintain that such an argument establishes a relationship of symmetry between the source and target domains. That relationship, in conjunction with basic norms of symmetry-based reasoning, implies that we ought to treat the existence of an analogous association in the target domain as a serious possibility.
To develop this line of justification, we need an account of symmetry and an analysis of how it combines with our theory of analogical arguments. Section 8.2 meets these two objectives in a general setting that treats the notion “is a serious possibility” as the modal concept that I have been referring to as prima facie plausibility. It proposes a modal form of symmetry-based reasoning, and then applies this argument form to the articulation model of section 4.3.
The rest of the chapter aims to develop a probabilistic version of the argument. Moving to a probabilistic (more specifically, Bayesian) setting is important because it provides scope for objections and refinements. The central idea that I defend in this part of the chapter derives from Salmon (1966, 1990) and has considerable independent appeal: prior probabilities represent rational judgments about plausibility, and analogies can play a significant role in assigning prior probability values. My principal objective is to sharpen these points by showing, via appeal to symmetry, why an analogical argument that satisfies the requirements of our model should justify the assignment of nonnegligible prior probability to a hypothesis.
The Bayesian setting, then, provides a suitable framework for integrating my theory of analogical arguments into a broader account of confirmation and belief revision. This widens the appeal of the theory. I hope that it also enriches the Bayesian approach to confirmation: analogical arguments provide one means of countering familiar worries about the arbitrariness of prior probabilities.
One important limitation: I provide no justification for basic symmetry principles, other than to deflect some of the most severe criticisms leveled against them. My strategy here parallels that of chapter 7, where I take standard pragmatic virtues (simplicity, fruitfulness, and so forth) for granted. In both chapters, the idea is to push back, rather than resolve completely, fundamental philosophical concerns about the justification of analogical reasoning, carving out for it a distinct place within a broadly acceptable framework for scientific reasoning. Obviously, that strategy fails, in this chapter, if all appeals to symmetry ought to be rejected. That is why I devote two sections (8.3 and 8.4) to defusing arguments that would lead to such universal rejection.
The project needs to be explained and defended with care. The basic link between symmetry and analogical reasoning (the “modal” argument) is developed in section 8.2.
I then move to a general discussion of prior probabilities in section 8.3 and an examination of the relationship between symmetry and prior probabilities in section 8.4. Section 8.5 adapts the modal argument to the Bayesian setting with an account of the connection between analogy and prior probabilities. The final sections explore implications of this approach to philosophical justification.
8.2 The Modal Argument
Symmetry Arguments
Symmetry is a precisely defined relationship of similarity between two objects. Typically, this relationship is represented by means of a transformation, or class of transformations, that leaves some set of features invariant. Two geometric figures related to one another by reflection, rotation, or translation exhibit symmetry. Reflections, rotations, and translations are isometries: they preserve the distance between any pair of points. Two geometric figures are symmetric if and only if there is a transformation from one to the other that preserves distances.1
1. Plainly, there can be other sorts of geometric symmetry.
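The claim that reflections, rotations, and translations preserve the distance between any pair of points can be checked numerically. The Python sketch below is illustrative only: the function names (`rotate`, `reflect`, `translate`, `scale`, `preserves_distances`) and the sample triangle are assumptions introduced here, and a check on a handful of points is evidence, not a proof.

```python
import math
from itertools import combinations

def rotate(p, theta):
    """Rotate point p about the origin by angle theta (radians)."""
    x, y = p
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))

def reflect(p):
    """Reflect point p across the x-axis."""
    return (p[0], -p[1])

def translate(p, dx, dy):
    """Shift point p by the vector (dx, dy)."""
    return (p[0] + dx, p[1] + dy)

def scale(p, k):
    """Stretch point p away from the origin by factor k (not an isometry for k != 1)."""
    return (k * p[0], k * p[1])

def preserves_distances(points, f):
    """True if f leaves every pairwise distance among the points unchanged."""
    return all(
        math.isclose(math.dist(p, q), math.dist(f(p), f(q)),
                     rel_tol=1e-9, abs_tol=1e-9)
        for p, q in combinations(points, 2)
    )

# Hypothetical test figure: a 3-4-5 right triangle.
triangle = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
```

Applied to `triangle`, the rotation, reflection, and translation all pass the check, whereas `scale` with a factor of 2 fails it: scaling preserves shape but not distance, which is one way to see that there can be other sorts of geometric symmetry than the isometric kind.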
In addition to external symmetries relating one object to another, we also speak of an object’s internal symmetries. These too can be understood in terms of transformations. An object has internal symmetry with respect to a class of transformations if it remains identical under those transformations. A playing card has internal symmetry with respect to 180 degree rotation; a square is symmetric under reflections and rotations of 90, 180, or 270 degrees.
A symmetry transformation, then, is a mapping that preserves some designated feature or class of features. We are generally interested in the class of all transformations that leave those features invariant. We can either specify the invariant features or the class of symmetry transformations; the two strategies are equivalent.2
2. The class of transformations has to satisfy certain formal properties. Typically, this class must be a group under function composition.
Two problems are symmetrical if their relevant or essential structure is identical. What counts as essential depends upon the type of problem. But even at this level of generality, we can already see a link between symmetry and analogy. Perfect symmetry between two problems is also perfect analogy. A perfect symmetry argument is an analogical argument in which all points of difference are known (or believed) to be irrelevant, and all features relevant to the solution belong to the positive analogy. The following examples illustrate this point.
Example 8.1 (hydrodynamic analogies). Hydrodynamic analogies (see section 6.4) depend upon isolating the right invariants. Given certain key parameters relevant to the stress on a twisted metal bar, engineers can construct a model using fluid in a tube. By measuring the velocity of the fluid at some point, they can predict the value of the stress in the metal bar at the corresponding point. The bar and the moving fluid are perfectly analogous systems, governed by equations of identical mathematical form (Timoshenko and Goodier 1970).
Example 8.2 (table-covering problem). Polya’s table-covering problem (1954, 23) is a charming illustration of symmetry-based reasoning. Two people take turns placing a penny on a rectangular table. The pennies must lie flat and not overlap with any penny already placed. The winner is the person who places the last penny. If each plays the best possible game, who will win? Polya provides two hints. First, in the special case where the table is so small that it can be covered by one penny, the player who goes first will win. Second, the problem becomes easier when we generalize to include any table with a center of symmetry. By extending the special case and exploiting the symmetries of the problem, we can solve the general case.3 The role of analogical reasoning in this case is subtle but important: recognizing the analogy between different cases is the key to solving the problem.
3. The player who goes first will always win. The first move should be to place a penny in the exact center of the table. After that, Player One’s strategy is simple: mirror each move made by Player Two, placing each penny directly opposite. This preserves the symmetry of the configuration (table plus pennies). So long as Player Two has space to place a penny, Player One is guaranteed to have space directly opposite for a response.
Example 8.3 (used-car purchase I). Suppose that two used cars are indistinguishable in make, model, age, odometer reading, color, and every other parameter that Tracy deems relevant to establishing a fair price. For her, they are perfectly analogous, or symmetrical, with respect to the matter of fixing a price.
If Tracy values the first car at $15,000, then she is constrained to value the second at $15,000. Note that even though the relationship between price and the relevant variables is subjective, there is still a constraint of consistency in applying subjective criteria. Tracy must not assign different values to the two vehicles in the absence of any identifiable difference that is relevant for her.4 These examples allow us to make two important contrasts. First, there is a contrast between objective and subjective symmetries. The reasoning in the first two examples is based on objective symmetries: the relevant set of invariant features is part of a widely shared (and objectively justifiable) model. By contrast, the basis for Tracy’s reasoning in Example 8.3 is a relationship of symmetry between her representations of the two cars. The “invariants” are simply the determinants of value for Tracy, which are subjective and need not be widely shared. Given that the two cars are indistinguishable on these parameters, Tracy must, if she is rational, assign the same value to both. She may assign an imprecise value or a range of possible values to both cars; she may even declare herself unable to assign any value. The important point is that both cars should be treated in the same way. The three examples also reveal a second contrast, between practical and theoretical applications of symmetry. Some symmetry arguments are oriented toward action while others are oriented toward theoretical belief (whether or not it leads to action). Polya’s example and the hydrodynamic analogies are primarily theoretical. Tracy’s reasoning, by contrast, is meant to help her decide which car to purchase. My conclusion at this stage is modest: subjective symmetries, whether objectively based or not, constrain practical reasoning in at least some cases. This is at least in part a fact about rationality, not psychology. 
Tracy has to value the two cars equally unless she can identify some relevant difference. The next two subsections aim, in a series of steps, to transform this initial observation into a convincing justification for analogical reasoning.
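Returning briefly to Example 8.2, the mirroring strategy for the table-covering problem lends itself to a mechanical check. The following Python sketch is a toy simulation under stated assumptions, not part of the original discussion: the table is a rectangle centered at the origin, a penny must lie entirely on the table, Player Two chooses moves by random sampling, and a cap on sampling attempts (`max_tries`) only approximates "no legal move remains." All names and dimensions are hypothetical. What the sketch is meant to show is that Player One's reply, placed directly opposite Player Two's move, is legal every time.

```python
import math
import random

def is_legal(p, pennies, half_w, half_h, r):
    """A penny centered at p is legal if it lies on the table and overlaps no other penny."""
    x, y = p
    on_table = abs(x) <= half_w - r and abs(y) <= half_h - r
    return on_table and all(math.dist(p, q) >= 2 * r for q in pennies)

def play_mirror_game(half_w=5.0, half_h=3.0, r=1.0, seed=0, max_tries=2000):
    """Simulate one game; return how many mirrored replies Player One made.

    The function returns only when Player Two fails to find a move, so in
    every completed run Player One has placed the last penny.
    """
    rng = random.Random(seed)
    pennies = [(0.0, 0.0)]  # Player One opens at the exact center
    mirrored_replies = 0
    while True:
        # Player Two searches for any legal spot by rejection sampling.
        move = None
        for _ in range(max_tries):
            candidate = (rng.uniform(-half_w, half_w), rng.uniform(-half_h, half_h))
            if is_legal(candidate, pennies, half_w, half_h, r):
                move = candidate
                break
        if move is None:
            return mirrored_replies
        pennies.append(move)
        # Player One mirrors the move through the center of the table.
        reply = (-move[0], -move[1])
        if not is_legal(reply, pennies, half_w, half_h, r):
            raise RuntimeError("mirrored reply was illegal")
        pennies.append(reply)
        mirrored_replies += 1
```

The `RuntimeError` branch is never expected to fire: after each of Player One's moves the configuration is symmetric about the center, so any spot open to Player Two has an open counterpart directly opposite.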
Symmetry Principles and Practical Reasoning
Van Fraassen (1989, 233) identifies two broad principles that characterize symmetry-based arguments. In this section, I argue that these principles constrain practical reasoning in simple cases, such as Example 8.3 (used-car purchase), where we possess all relevant information. I then extend these constraints to cases of partial information as an important step toward linking symmetry to analogical reasoning.5
4. The example is a modified version of one proposed by Russell (1986a) and by Davies and Russell (1987) to illustrate their concept of a determination rule (see section 2.4).
5. Van Fraassen’s attitude to symmetry arguments is complex, and I make no claim to represent his views.
Van Fraassen’s first principle depends upon positive knowledge that certain symmetries exist.
(8.2.1) Positive Symmetry Principle
Structurally similar problems must receive correspondingly similar solutions.
Hydrodynamic analogies (Example 8.1) exemplify this principle. The other symmetry principle starts from asymmetry.
(8.2.2) Negative Symmetry Principle
An asymmetry can only come from a preceding asymmetry.
The field of stereochemistry provides an excellent illustration. Polarized light moves in only one plane. Jean-Baptiste Biot, a nineteenth-century physicist, conducted experiments in which a beam of polarized light was passed through a tube containing a chemical solution. Typically, the light passed through with no change to the polarization, but in the case of certain solutions, such as camphor, the plane of polarization was rotated. Biot correctly conjectured that the internal molecular structure of these “optically active” substances must be asymmetrical in some manner (Nye 1996).
These examples are theoretical. What about practical reasoning? Our analysis of Example 8.3 (used-car purchase) suggests that van Fraassen’s principles constrain practical reasoning as well. The principles express the rationality of treating uniformly, and the irrationality of treating differently, two cases that appear identical in all subjectively relevant respects.
As indicated at the outset of this chapter, I shall not attempt to justify the basic symmetry principles (8.2.1) and (8.2.2). Before we can accept their application even to so simple a case as Example 8.3, however, we need to address an obvious objection. Experience has shown that symmetry arguments can lead to inconsistency. The greatest fears relate to probabilistic applications and the Principle of Indifference, which is partly why we are focusing initially on an example that has nothing to do with probability. Even for a case like Example 8.3, however, inconsistency can arise. Suppose that there are three used cars, identical in all parameters that Tracy deems relevant to her appraisal. Suppose further that she assigns car a the value of $15,000 and car b the value of $12,000. Then symmetry principles seem to dictate that the value assigned to car c should be both $15,000 and $12,000, which is absurd.
In this case, though, the problem is easy to resolve. The fault lies not with the symmetry principles but with Tracy’s past evaluations, which are already inconsistent. Symmetry does not generate a new inconsistency; rather, it exposes an old one. Tracy has to sort out her existing value assignments before applying symmetry reasoning to a new case.
I conclude that there is no serious obstacle to accepting that symmetry can provide rational constraints on assignments of value in the special case where two items agree on every factor that is (subjectively) deemed to be relevant.
(8.2.3) Symmetry Constraint with Full Information
If V = F(X1, . . . , Xn) describes one’s evaluation function, V(a) is the value (or range of values) assigned to a, and Xi(b) = Xi(a) for all Xi, then the value (or range of values) for V(b) should be the same as V(a).
Here, V = F(X1, . . . , Xn) is an indeterminate invariant relationship: it asserts that the value V depends entirely upon factors X1, . . . , Xn (over some range of cases), but does not necessarily assert a determinate form of functional dependence.6 Principle (8.2.3) says that if two objects exactly agree in all respects relevant to your evaluation, then you should assign them the same value (or range of values). It is meant to be an unproblematic symmetry norm.
The next step is to extend the idea of symmetry-based constraints to situations of partial information. Consider the following modification of the used-car example.
Example 8.4 (used-car purchase II). Suppose first that Tracy learns of two used cars that are indistinguishable in most respects: make, model, age, general condition, and odometer reading. She has not learned the color of the two cars, which is (for her) the only other determinant of value. The symmetry principles still constrain Tracy to treat the two cars in exactly the same way, although (as before) she may assign them either a definite value or a range of possible values. Suppose that Tracy now learns that the first car is blue and assigns it a value of $15,000. Although she is still ignorant of the color of the second car, symmetry dictates that the range of its possible values must include $15,000, since the second car will have to be assigned this value if it turns out to be blue. This range of possible values will be large if color is a major factor in Tracy’s appraisal and narrow if it is not.
In short, when evaluating the second car, if Tracy assigns an interval of possible values, then this interval must contain the value assigned to the first car. The general statement of this constraint is as follows:
(8.2.4) Symmetry Constraint with Partial Information
If V = F(X1, . . . , Xn) describes one’s evaluation function, V(a) is the value assigned to a, Xi(b) = Xi(a) for some of the Xi, and it is not believed that Xk(a) ≠ Xk(b) for any k, then the range of values for V(b) should include V(a).
This is a genuine constraint, but a very weak one; indeed, too weak to be of much use. In order to obtain a more useful symmetry constraint, let us reflect a little more carefully on Tracy’s objectives. A constraint such as principle (8.2.4) might be interesting if our goal is to gather as much information as possible about the form of Tracy’s evaluation function or preference ranking. But all Tracy cares about is whether she should take a closer look. Her goal is not to achieve perfect clarity about her preferences, but to buy a used car.
Let us introduce the notion of the set C of serious candidates. The members of set C are used cars that (1) Tracy knows about and (2) ought to be live options for her eventual purchase decision. The members of set C are cars that have either been deemed worthy to be in the running or ought to be investigated further. Tracy’s problem is to decide, on the basis of partial information about car b, whether to include it in the set C of serious candidates.
6. See section 6.4.
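Principles (8.2.3) and (8.2.4) can be read as simple consistency checks on a record of valuations. The Python sketch below is one possible rendering, offered under explicit assumptions: cars are represented as dictionaries over the factors named in Example 8.4, an unknown factor is recorded as `None`, "it is not believed that Xk(a) ≠ Xk(b)" is simplified to "no known factor differs," and every identifier and data value (`FACTORS`, `full_information_conflicts`, `partial_constraint_holds`, the sample car) is hypothetical.

```python
from itertools import combinations

# Determinants of value, following Example 8.4 (an illustrative choice).
FACTORS = ("make", "model", "age", "condition", "odometer", "color")

def agree_everywhere(x, y):
    """True if two fully described cars match on every factor."""
    return all(x[f] == y[f] for f in FACTORS)

def full_information_conflicts(cars, values):
    """Pairs of valued cars that match on every factor yet differ in value.

    A non-empty result signals that existing valuations already violate (8.2.3).
    """
    return [(m, n) for m, n in combinations(sorted(values), 2)
            if agree_everywhere(cars[m], cars[n]) and values[m] != values[n]]

def partial_constraint_holds(a, value_a, b, range_b):
    """Check (8.2.4): if b matches a on some known factor and no known factor
    differs, the range of values assigned to b must include V(a)."""
    known = [f for f in FACTORS if b.get(f) is not None]
    agrees_somewhere = any(b[f] == a[f] for f in known)
    known_difference = any(b[f] != a[f] for f in known)
    if not agrees_somewhere or known_difference:
        return True  # antecedent not met, so the principle imposes nothing
    low, high = range_b
    return low <= value_a <= high

# Hypothetical data: a fully described blue car and a car of unknown color.
blue_car = {"make": "ExampleMake", "model": "ExampleModel", "age": 5,
            "condition": "good", "odometer": 80000, "color": "blue"}
unknown_color_car = dict(blue_car, color=None)
```

On the three-car scenario discussed above (cars a, b, and c identical, with a valued at 15000 and b at 12000), `full_information_conflicts` reports the pair `("a", "b")`, which matches the diagnosis that symmetry exposes an old inconsistency and does not create a new one. For Example 8.4, a range of 13000 to 16000 for `unknown_color_car` satisfies the partial-information check against a value of 15000 for `blue_car`, while a range of 10000 to 14000 does not.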
I propose the following principle, modeled on principle (8.2.4):
(8.2.5) Modal Symmetry Constraint
Suppose V = F(X1, . . . , Xn) describes one’s evaluation function, (1) a belongs to the set C of serious candidates, (2) Xi(b) = Xi(a) for some of the Xi, and (3) it is not believed that Xk(a) ≠ Xk(b) for any k.7 Then b should belong to C.
Suppose, once again, that Tracy has a complete list of all parameters relevant to her evaluation. The principle says that if Tracy knows that two cars agree exactly on some of these parameters but has no information on whether they agree or differ on the remaining parameters, and the first car is a serious candidate for purchase, then the second car should also be a serious candidate.8
I won’t hide the fact that there is a link between this principle and the criteria for a prima facie plausible analogical argument that I advanced in section 4.3. The requirement that Xi(b) = Xi(a) for some of the Xi corresponds to my overlap condition: there must be a nontrivial positive analogy. The requirement that we not believe that Xk(a) ≠ Xk(b) for any k corresponds to my no-critical-difference condition: no critical factor should belong to the negative analogy.9 Here at last is a symmetry principle that has some prospect of meshing with my theory of analogical arguments. In fact, this special principle contains some of the most important ideas needed later.
In the remainder of this subsection, I hope to convince you to accept principle (8.2.5). If the principle seems obvious to you, please regard the argument that follows as a warm-up for later. The argument makes frequent reference to the used-car example, but does not depend upon the details of that example.
Suppose, then, that conditions are as in the antecedent of principle (8.2.5): V = F(X1, . . . , Xn) describes the evaluation function, a belongs to the set C of serious candidates, Xi(b) = Xi(a) for some of the Xi, and we do not know that Xk(a) ≠ Xk(b) for any k.
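Before turning to the argument, it may help to see the antecedent of principle (8.2.5) written out as a membership rule for the set C. The Python sketch below is a simplified illustration and not a statement of the principle's full content: it adopts the practical reading on which one is simply ignorant of the factors of b that are not known to match, it records unknown factors as `None`, and the names `should_join_candidates`, `FACTORS`, and the sample cars are hypothetical.

```python
# Determinants of value for the used-car example (an illustrative choice).
FACTORS = ("make", "model", "age", "condition", "odometer", "color")

def should_join_candidates(b, candidates, factors=FACTORS):
    """Return True if some serious candidate a matches b on at least one
    known factor (overlap) and differs from b on no known factor
    (no critical difference)."""
    known = [f for f in factors if b.get(f) is not None]
    for a in candidates:
        overlap = any(b[f] == a[f] for f in known)
        no_known_difference = all(b[f] == a[f] for f in known)
        if overlap and no_known_difference:
            return True
    return False

# Hypothetical data: car a is already a serious candidate.
car_a = {"make": "ExampleMake", "model": "ExampleModel", "age": 5,
         "condition": "good", "odometer": 80000, "color": "blue"}
car_b = dict(car_a, odometer=None, color=None)  # partial information only
car_d = dict(car_a, age=12, color=None)         # a known difference in age
```

With `car_a` as the only member of C, the rule admits `car_b`, about which nothing conflicting is known, and does not admit `car_d`, whose age is known to differ. The rule is silent on cars that fail the test; they are not thereby excluded, since the principle states only a sufficient condition for membership.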
For definiteness, suppose we know that Xi(a) = Xi(b) for i = 1 through m and that we are ignorant of Xk(b) for k = m+1 through n. We may suppose further that at any point in time, the list C is finite. It helps to focus on the used-car example: the members of set C are cars known to Tracy, clearly a finite set. We want to show that b should belong to this set.
7. Condition (3) should be interpreted as: nothing is believed that justifies either full belief or high probability for Xk(a) ≠ Xk(b) for any k. In practice, I shall simply assume that one is ignorant of the values Xk(b) other than the specified subset on which a and b are known to agree.
8. Here I assume that priority of consideration is irrelevant: we don’t discount the second car just because we already have a serious candidate.
9. In this situation, all of the factors Xi may be regarded as critical. More on this point shortly.
The argument for principle (8.2.5) proceeds by constructing a hypothetical earlier epistemic situation. Let us refer to the time of the present situation as t1 and the time of the hypothetical earlier situation as t0. At t0, Tracy is aware of X1(a), . . . , Xm(a) but ignorant of Xm+1(a), . . . , Xn(a). In other words, at this imaginary earlier time, she knows the values Xi(a) only for those Xi whose values she now knows for both a and b. Since t0 is a hypothetical situation, we assume that Tracy’s valuation function V is unchanged between t0 and t1. Finally, we may suppose she has no knowledge of b at that earlier time (or at least none that is relevant at t1). The situation is as depicted in table 8.1.
TABLE 8.1. Set-up for the modal argument
Time | Knowledge of a | Knowledge of b
t0 | X1(a), . . . , Xm(a) known; [Xm+1(a), . . . , Xn(a) unknown] | [b totally unknown]
t1 | X1(a), . . . , Xm(a), Xm+1(a), . . . , Xn(a) known; a is on the list C of serious candidates | X1(b) = X1(a), . . . , Xm(b) = Xm(a); [Xm+1(b), . . . , Xn(b) unknown]
I make two claims:
(1) At t0, car a should be on the list C.
(2) If (1) is granted, then by the negative symmetry principle (8.2.2), car b should be on the list C at t1.
The case for (2) above is simple. Absent any relevant difference between car b at t1 and car a at t0, the decision about inclusion on the list should be the same.
The argument for (1) is completely separate and owes nothing to symmetry. It derives from the fact that car a is on the list C at t1, and is thus a serious candidate at t1. Why does this imply that car a should be on the list at t0? Because of the pragmatic aspects of the situation. Tracy knows about a at t0, along with finitely many other cars. If car a were not on the list at t0, then Tracy would not deem it worthy of closer investigation (she would not believe that car a held any promise), and there is no foreseeable scenario on which car a would make it onto the list by t1. In this hypothetical construction, it is appropriate to cancel out accidental ways in which Tracy might come to learn more about car a without investigating. The net result is that if car a is not on the list at t0, then car a is effectively rejected at t0.
This argument for (1) can be viewed as an application of van Fraassen’s General Reflection principle. Reflection in its most general form is the requirement that one’s present beliefs and values be consistent with anticipated future beliefs and values. As van Fraassen puts the requirement: “My current opinion about event E must lie in the range spanned by the possible opinions I may come to have about E at later time t, as far as my present opinion is concerned” (1995, 16). Given the simple decision-making structure we have imposed on Tracy, we can state a modal version of General Reflection.
Suppose that each car belongs (at any point in time) to one of just three categories: on the list of serious candidates, rejected, or not yet considered. I propose: (8.2.6) Modal Reflection Principle If, so far as my present opinion at t0 is concerned, I may come to regard a as a serious candidate at later time t1, then I should regard a as a serious candidate at t0.
Modal Reflection requires that Tracy place car a on the list at t0 if she anticipates that she might want it on the list at t1. This is the only opinion that keeps open both future possibilities: rejection and inclusion.10 From this, I argue that if car a is on the list at t1, then Tracy must also put it on the list at the hypothetical earlier moment t0. We are not talking about an actual earlier moment. Tracy might never have come to learn just the restricted facts about a stipulated in our characterization of t0. Instead, Tracy is acknowledging, given that she now takes car a seriously, that she ought rationally to allow for this attitude in the hypothetical state t0, and that the only way to do so is to have a on the list at t0.
This connection with Reflection reinforces a point made above in connection with principle (8.2.4): the greater the known overlap between a and b, the smaller the range of “possible opinions” about the value of b. That observation can now be seen as a consequence of the fact that the range of possible values for a at t0 is narrow if one anticipates that learning the few remaining important facts about a is unlikely to produce much change in one’s opinion.
Let me address two objections to the argument I have given for claim (1) above. The first is that it proves too much. Focus on the actual epistemic situation t1. Suppose that car a is on the list of serious candidates, and Tracy knows of car b only that X1(b) = X1(a), that is, that a and b share a single relevant attribute. The argument just presented compels us to put b on the list of serious candidates. That is both independently implausible and practically unfeasible because it generates too many serious candidates.
In response to this objection, I concede that any context where our argument leads to an infinite or intractably long list of serious candidates is problematic. I acknowledge this limitation (which was at least partly taken into account in the requirement that set C be finite). Having conceded this much, I don’t see that it is independently implausible that b should be placed on the list on the basis of one relevant similarity if there are no known critical differences (i.e., nothing at all is known to suggest that Xk(b) ≠ Xk(a) for some k). Provided the list of cars does not become overly long, a single relevant similarity may be an acceptable basis for further investigation.11
In connection with this response, we might wonder whether it involves presupposing that Tracy has adopted a rule such as the following: “Put a car on the list as soon as I come to believe that it has at least one attribute Xi that surpasses some threshold value.” The symmetry argument does not require any such rule because it constrains only pairs of decisions, not individual decisions. This reflects a general truth about analogical arguments: they may reflect the application of an underlying rule or generalization to two separate cases, but they do not have to presuppose such a rule.
10. Here we should note that familiar objections to Reflection, which involve information loss or uncertainty about temporal location, pose no threat to this application.
11. Here I want to flag that, in the end, we shall need to restrict the use of Reflection when the positive analogy is very slight. This difficulty emerges clearly for the probabilistic version of our argument and will be discussed in section 8.5.
The second objection takes just the opposite tack. The objection is that there is no justification for requiring any positive analogy in principle (8.2.5), that is, for the requirement that Xi(b) = Xi(a) for some i. Suppose that Tracy knows everything about car a and nothing about car b. If the above argument proves that b should be taken seriously on the basis of one known similarity to a, doesn’t it prove the same thing even if there is no known similarity, so long as there is no known critical difference?
My response is that in a case where nothing relevant is known about b, b should not enter into Tracy’s deliberations. In terms of the above categories, b is “not yet considered.” The argument fails at the Modal Reflection step because if a is stripped of all attributes, then a, too, belongs in the “not yet considered” category. There is no basis for distinction between a car a with no known (relevant) attributes and a car of which Tracy is completely unaware. Similar restrictions apply to General Reflection: we should not be worried about future preferences concerning outcomes about which we know nothing and hold no opinions. In short, b must meet the minimal requirement of positive analogy to figure in Tracy’s decision.
Finally, let me address the question: in what sense does the Modal-Symmetry Constraint, principle (8.2.5), count as a constraint of rationality? My arguments above amount to this: if we take Tracy’s subjective evaluation function as fixed, then norms of rationality (the negative symmetry principle and the Modal Reflection principle) lead to principle (8.2.5). It is natural to invoke a wider conception of rationality according to which Tracy’s decision, if it is to count as rational, must be answerable to a broad range of criticisms, including criticisms about whether her evaluation function includes all (and only) factors appropriate for determining the value of a used car.
Since we have supposed that Tracy’s evaluation function is not open to this sort of criticism, her decision and the Modal-Symmetry Constraint reflect a restricted form of rationality. Happily, I believe that this restriction can be lifted when we generalize the modal argument, as we shall now do.
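As a rough illustration, principle (8.2.5) can be read as a simple membership test. The sketch below is only a toy model under stated assumptions: the names `Car` and `should_be_candidate`, the parameter labels, and the sample values are all hypothetical, and condition (3) is simplified, following note 7, to "there is no parameter on which a and b are known to differ."

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass
class Car:
    """A car together with the parameter values X_i that Tracy knows for it."""
    name: str
    known: Dict[str, object] = field(default_factory=dict)  # missing key = unknown


def should_be_candidate(a: Car, b: Car, parameters: Iterable[str],
                        candidates: Set[str]) -> bool:
    """Toy reading of the Modal Symmetry Constraint (8.2.5).

    Returns True when the principle licenses adding b to the candidate set.
    A False result means only that the principle is silent about b.
    """
    # Condition (1): a is already a serious candidate.
    if a.name not in candidates:
        return False
    both_known = [x for x in parameters if x in a.known and x in b.known]
    # Condition (2): a and b are known to agree on at least one parameter.
    agrees = any(a.known[x] == b.known[x] for x in both_known)
    # Condition (3), simplified: no parameter on which they are known to differ.
    differs = any(a.known[x] != b.known[x] for x in both_known)
    return agrees and not differs


# Hypothetical parameters and cars, chosen only for illustration.
PARAMS = ["price", "mileage", "year"]
car_a = Car("a", {"price": 9000, "mileage": 60000, "year": 2005})
car_b = Car("b", {"price": 9000})                 # agrees on price, rest unknown
car_c = Car("c", {"price": 9000, "year": 1995})   # known difference on year
car_d = Car("d")                                  # nothing relevant known

serious = {"a"}
print(should_be_candidate(car_a, car_b, PARAMS, serious))  # True
print(should_be_candidate(car_a, car_c, PARAMS, serious))  # False
print(should_be_candidate(car_a, car_d, PARAMS, serious))  # False
```

Car b is admitted on the strength of a single known agreement, which is the situation raised by the first objection. Car d, about which nothing relevant is known, is not admitted, matching the reply to the second objection that such a car remains "not yet considered."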
Application to Analogical Reasoning: The General Modal Argument
Principle (8.2.5) is a modal principle. Instead of licensing the assignment of a value or range of values to object b, the principle licenses the conclusion that b is a serious candidate. I shall refer to the argument for principle (8.2.5) as the special modal argument and to the argument I am now going to develop as the general modal argument. The argument for principle (8.2.5) is a leading special case: it generalizes to a justification for any analogical argument that meets the requirements elaborated in chapter 4.
It is convenient to repeat those requirements, but first we need a quick review of concepts and terminology.12 My theory requires that for a good analogical argument, there must be a prior association (typically a logical, causal, or statistical relationship) that relates certain factors present or absent in the source domain to a proposition Q (called the hypothetical analogy) that is the focus of the argument. The conclusion is that an analogous proposition about the target, Q*, is to some degree plausible. The prior association may be represented abstractly as R(f1, . . . , fm, p1, . . . , pn, Q), where fi and pj refer to factors (present and absent) that play an explicit role in the relationship. We then identify a subset φC of these factors, those that have an essential part in the prior association, as critical.13 The positive analogy P consists of critical factors that have known analogs in the target domain; the negative analogy N is the set of critical factors known to be present in one domain but absent in the other. Finally, we distinguish (for causal relationships) between contributing and counteracting factors, and we write φ+ for the subset of contributing critical factors.
12. Sections 1.4 and 4.3 provide a more complete presentation.
Using these concepts, let’s restate the criteria for prima facie plausibility, from section 4.3, in the form of the following claim.
(8.2.7) Prima facie Plausibility
Suppose R(f1, . . . , fm, p1, . . . , pn, Q) describes an acceptable prior association such that the following conditions are satisfied:
(1) Plausible analog. Q belongs to the set C of prima facie plausible hypotheses.
(2) Overlap. φ+ ∩ P ≠ ∅ (non-trivial positive analogy).
(3) No-critical-difference. φC ∩ N = ∅ (no critical factor in the negative analogy).
Then Q* should belong to C.
In short, an analogy between Q and Q* shows that Q* is prima facie plausible if we have an acceptable prior association, Q itself is plausible, and there are relevant similarities but no critical differences. It is helpful to see the parallels with principle (8.2.5).
In passing from (8.2.5) to (8.2.7), the first step is to replace the set C of cars that are serious candidates for purchase with the set C of hypotheses that are serious candidates for investigation by an epistemic agent, whom we may as well continue to call Tracy. We are dealing here with prima facie plausibility. The members of set C are hypotheses that (1) Tracy knows about and (2) are or ought to be live possibilities that Tracy takes into account in making decisions. The hypotheses in C may have already been accepted, they may be under investigation, or they may simply be hypotheses that ought (in Tracy’s view) to be investigated. Tracy’s problem is to decide, on the basis of an analogical argument, whether to include some novel hypothesis Q* in the set C.
13. In some cases, we need to elaborate the prior association to identify these critical factors.
The second step is to replace the valuation function V = F(X1, . . . , Xn) that appears in principle (8.2.5) with the more general notion of a prior association. This complicates matters, but the complications are manageable.
1. The substitution complicates the identification of critical factors. In the special case V = F(X1, . . . , Xn), all Xi are critical to the subjective evaluation function V and nothing else is relevant. For a prior association in general, neither of these assumptions is legitimate. In their place, we have the models of chapter 4 that allow us first to ensure that a given prior association meets standards of acceptability and then to identify the set of critical factors.14 These critical factors are drawn from the prior association, from the background context, and from other salient analogical arguments. Here, I abstract from these details and simply take for granted that for every prior association there is an associated set of critical factors, and that this set provides an acceptable “first-order” model of all that is relevant to evaluating the analogical argument.
2. A second complication is that a prior association (unlike a valuation function) may include both contributing and counteracting factors. This is taken into account in principle (8.2.7) by introducing the subset φ+ of contributing critical factors in the overlap condition.
3. The third and final complication has to do with the positive analogy P and the negative analogy N. We replace equality and inequality of parameters Xi in principle (8.2.5) with more complex interdomain relations: similarities involving corresponding factors in the positive analogy, and differences involving corresponding factors in the negative analogy. The present argument again abstracts from the analysis of similarity and difference in earlier chapters.
If these points are granted, then the modal argument for Prima facie Plausibility, principle (8.2.7), is basically the same as for its special case, principle (8.2.5). At t1, in the source domain, Tracy accepts Q and (a fortiori) includes Q in her set C of prima facie plausible candidates.15 In a hypothetical earlier situation t0 where φ+ is reduced to just those factors that belong to the positive analogy (and no other changes are made), Tracy must also, by Modal Reflection, include Q in the set C.
By symmetry between Tracy’s representation of the source domain at t0 and the target domain at t1, she is rationally required to include Q* in the set C of serious candidates as well.
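The two-stage structure of this argument can be sketched with ordinary set operations. This is only an illustration under stated assumptions: the function name, the representation of factors as strings, and the sample factor labels are hypothetical. Deciding which factors are critical, and which similarities count, is the job of the models of chapter 4 and is simply taken as given here.

```python
from typing import Dict, Set


def general_modal_argument(q_accepted: bool,
                           contributing: Set[str],  # phi+ : contributing critical factors
                           critical: Set[str],      # phi_C: all critical factors
                           positive: Set[str],      # P    : positive analogy
                           negative: Set[str]       # N    : negative analogy
                           ) -> Dict[str, bool]:
    """Toy decomposition of the general modal argument for (8.2.7)."""
    # Stage 1 (Modal Reflection): in the hypothetical state t0, phi+ is cut
    # down to the factors in the positive analogy. Q remains a serious
    # candidate at t0 only if Q is accepted at t1 and something is left,
    # which is the overlap condition phi+ ∩ P ≠ ∅.
    t0_factors = contributing & positive
    reflection = q_accepted and bool(t0_factors)

    # Stage 2 (symmetry): the source at t0 and the target at t1 must not
    # differ on any critical factor, i.e. phi_C ∩ N = ∅.
    symmetry = not (critical & negative)

    return {
        "reflection": reflection,
        "symmetry": symmetry,
        "q_star_plausible": reflection and symmetry,
    }


# Hypothetical factor labels, for illustration only.
ok = general_modal_argument(True, {"f1", "f2"}, {"f1", "f2", "f3"},
                            positive={"f1"}, negative=set())
blocked = general_modal_argument(True, {"f1", "f2"}, {"f1", "f2", "f3"},
                                 positive={"f1"}, negative={"f3"})
no_overlap = general_modal_argument(True, {"f1", "f2"}, {"f1", "f2", "f3"},
                                    positive=set(), negative=set())
print(ok["q_star_plausible"], blocked["q_star_plausible"],
      no_overlap["q_star_plausible"])  # True False False
```

Reporting the two stages separately shows where an argument breaks down: a known critical difference defeats the symmetry step, whereas an empty positive analogy defeats the reflection step.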
14. See section 4.2 for a summary and sections 4.4–4.9 for the details.
15. Surrogates for acceptance are fine. The strength of commitment to Q at t1 sets an upper bound to the strength of the argument for Q*.
Comments and Objections
The argument just given relies upon the models of chapters 4 through 6 to identify the set of critical factors. That leads to a first objection: the general modal argument must be weaker than the special version used in support of (8.2.5). There are three good reasons, however, to welcome this dependence upon the earlier work.
In the first place, our argument is not supposed to justify all analogical arguments, only those that satisfy the requirements of our theory. Since the theory includes the models, they must enter into the argument in some way. In fact, their role is exactly as described in the above discussion: they define symmetry relations by determining the critical factors and acceptable similarities. It is perfectly appropriate that these models should limit what a general justification for analogical reasoning can accomplish.
The second point, closely connected to the first, is that our argument has a modular structure. The particular models of earlier chapters can be criticized, modified, or rejected; new models can be added. Such changes can be made without any need to alter the general modal argument.
The third point has to do with the crucial issue of whether the Prima facie Plausibility principle (8.2.7) represents constraints that are rational or merely psychological. In this regard, Prima facie Plausibility enjoys an advantage over the special case, principle (8.2.5). I conceded earlier that although the special case follows from basic norms of rationality, it fails a wider test of rationality because it assumes a fixed subjective evaluation function that is not open to criticism. By contrast, Prima facie Plausibility describes constraints that are rational in both the narrow and the wide sense. The principle derives from the same basic norms of rationality, but this time the list of critical factors is also answerable to criticism. In fact, the procedure for identifying these factors is supposed to be acceptable to both an advocate and a critic. Debate over whether the subsidiary models identify the right factors is entirely legitimate. Another way to put the point is this: the models of chapter 4 are meant to prescribe generally acceptable ways to represent different types of prior association and to extract critical factors.
Prima facie Plausibility says: “you can be the critic!” If an analogical argument counts as acceptable under one of the models, then unless you can find something wrong with the model, you ought (rationally) to take the conclusion as a serious candidate. Moving beyond these concerns about dependence on the details of the subsidiary models, a second objection applies to the final step in the general modal argument: the appeal to symmetry. That step, which is unproblematic for the used-car example and for principle (8.2.5), is open to criticism when applied to analogical arguments in general. The key issue is whether the interdomain similarities are “close enough” to preserve the relevant invariants. My response is similar to what I said about the first objection. Rather than defeating the general argument, this objection indicates the need to supplement it with effective constraints on similarity, as discussed in chapters 5 and 6. To summarize my response to both objections, given that our theory consists of two broad principles supplemented by a set of subsidiary models, the general modal argument offers as much as we can hope for by way of justification. Its central idea is to decompose every analogical argument into two stages: the construction of a hypothetical epistemic state to which we apply the Modal Reflection principle, followed by the application of symmetry to link that hypothetical state to the target domain. The decomposition succeeds for analogical arguments that satisfy our requirements for prima facie plausibility and fails for those that do not.
Extensions of the Basic Theory
In chapter 4, I discussed two extensions of the articulation model beyond the basic definition of prima facie plausibility. First, I proposed criteria for assessing qualitative plausibility (section 4.3). Second, I extended the model to handle reasoning with multiple analogies (section 4.10). The modal argument accommodates both of these extensions, but it does a better job with the latter than the former.
First, consider reasoning with multiple analogies. In section 4.10, the basic idea for handling such cases is to partially order these analogical arguments by looking at the extent of the positive analogy with the target domain. As a rough (very compressed) summary, one such argument is “better” than another if its (relevant) overlap with the target is more extensive. Furthermore, if analogical argument A2 is better than analogical argument A1, then any critical factors for A2 automatically become critical for A1. With these amendments in place, if A1 and A2 support incompatible hypotheses, then we can disregard A1: the argument ceases to satisfy the requirements for prima facie plausibility.
It might seem that the general modal argument leads to conclusions that conflict with this earlier analysis of multiple analogies. To illustrate the problem, consider the schematic representation in table 8.2 of two source domains that are potential analogs for the same target. Here, f1, f2, and f3 are critical factors (with analogous factors in the target denoted by *). Q1 and Q2 are incompatible hypotheses, and we have prior associations as indicated by the arrows in the two source domains. Assuming that there are meaningful analogous hypotheses Q1* and Q2* about the target domain, the question is whether either or both of them can be made plausible by the corresponding analogical arguments, which we denote as A1 and A2.
TABLE 8.2. The modal argument and multiple analogies
Source Domain | Target Domain
1. f1, f2, f3 ⇒ Q1 | f1*, f2*, f3*
2. f1, f2, f3 ⇒ Q2 |
Our theory of multiple analogies provides at least a partial answer: since A2 is better than A1, Q1* should receive no support from A1. A hasty application of the general modal argument suggests that it yields a different conclusion. Let t1 be the situation represented in the above table. Construct a hypothetical earlier epistemic situation t0 in which Tracy knows only f1. By the Modal Reflection argument, Q1 is a serious candidate at t0. By the symmetry argument, Q1* is a serious candidate for the target domain at t1. So it looks as though Q1* does receive support from A1, according to the modal argument.
The error in this reasoning is not hard to spot. The analysis ignores the no-critical-difference condition, clause (3) of principle (8.2.7). The factor f2 is automatically critical for A1, given that it is critical for the better argument A2. The reflection step of the argument succeeds (Q1 is a serious candidate at t0), but the symmetry step fails because the source domain at t0 and the target at t1 differ on the presence of the critical factor f2. The symmetry argument only works when these two domains aren’t represented as different in any crucial way.
I conclude with a brief discussion of the criteria for qualitative plausibility from section 4.3, singling out the following clause: “An analogical argument becomes stronger when critical factors are shifted from neutral to positive analogy.” The neutral analogy consists of factors represented in the argument as belonging to one domain, with no information about the other. This proposal about relative strength is at least compatible with the modal argument. It is in keeping with the earlier discussion of principle (8.2.4) and the used-car example: the greater the known overlap between cars a and b, the smaller the range of “possible opinions” about the value of b. It seems likely that we can broaden this point so that it applies to all analogical arguments.
As things stand, however, the modal argument can at best deliver the conclusion that the conjecture Q* is prima facie plausible. As I have stressed, I think that providing a basic argument for prima facie plausibility is a primary function for analogical reasoning. Still, it is clear that a modal approach is incapable of delivering any qualitative (or quantitative) measure of degree of plausibility. This provides an incentive to move to a more sophisticated probabilistic story about how symmetry and analogical reasoning are related.
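The treatment of multiple analogies discussed above can be illustrated in the same spirit. The sketch assumes, purely for illustration, that argument A1 has only f1 in its positive analogy and that its source is known to differ from the target on f2, while A2 overlaps with the target on f1, f2, and f3. "Better" is simplified to strict inclusion of positive analogies, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class Analogy:
    name: str
    critical: FrozenSet[str]      # critical factors of its own prior association
    contributing: FrozenSet[str]  # contributing critical factors
    positive: FrozenSet[str]      # factors with known analogs in the target
    negative: FrozenSet[str]      # factors known to differ from the target


def is_better(x: Analogy, y: Analogy) -> bool:
    """Simplifying assumption: x is better than y if its overlap strictly includes y's."""
    return y.positive < x.positive


def effective_critical(arg: Analogy, rivals: Iterable[Analogy]) -> FrozenSet[str]:
    """Critical factors of any better rival automatically become critical for arg."""
    crit = set(arg.critical)
    for rival in rivals:
        if is_better(rival, arg):
            crit |= rival.critical
    return frozenset(crit)


def prima_facie_plausible(arg: Analogy, rivals: Iterable[Analogy] = ()) -> bool:
    crit = effective_critical(arg, rivals)
    overlap = bool(arg.contributing & arg.positive)  # reflection step
    no_critical_difference = not (crit & arg.negative)  # symmetry step
    return overlap and no_critical_difference


# Illustrative assignment of factors (an assumption, not taken from table 8.2).
A1 = Analogy("A1", critical=frozenset({"f1"}), contributing=frozenset({"f1"}),
             positive=frozenset({"f1"}), negative=frozenset({"f2"}))
A2 = Analogy("A2", critical=frozenset({"f1", "f2", "f3"}),
             contributing=frozenset({"f1", "f2", "f3"}),
             positive=frozenset({"f1", "f2", "f3"}), negative=frozenset())

print(prima_facie_plausible(A1))        # True: the hasty application, A1 alone
print(prima_facie_plausible(A1, [A2]))  # False: f2 is now critical for A1
print(prima_facie_plausible(A2, [A1]))  # True
```

Taken in isolation, A1 passes both steps, which is the hasty conclusion described above. Once A2 is on the table, f2 counts as critical for A1, and the symmetry step fails while the reflection step still succeeds.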
8.3 Prior Probabilities
How can analogical reasoning be accommodated within a probabilistic, broadly Bayesian framework for belief updating? That is the main question that will occupy us for the remainder of the chapter, and it is an important question. One reason for its importance is the richness (and wide acceptance) of Bayesian epistemology. A second reason is that we ought to be able to express the modal argument of the previous section in probabilistic terms. If we can’t, then there is something suspicious about that argument. A third, as noted in the chapter introduction and at the end of the preceding section, is the promise of enriching both our theory and the Bayesian model of confirmation.
In my view, the basic ideas utilized in the modal argument can be adapted to a probabilistic setting. First, taking a hypothesis to be prima facie plausible can be construed as assigning it nonnegligible prior probability; second, analogical arguments can justify such prior probability assignments. I develop these ideas in section 8.5, stressing the complexities that emerge in making the transition to a probabilistic approach. (Note: this section and the next one are intended chiefly to show that it is not crazy to propose symmetry-based constraints on prior probabilities. The main line of argument resumes in section 8.5. Some readers may wish to skip to that section, which builds directly upon the ideas and arguments of section 8.2.)
In this section, I set aside analogies (and symmetry arguments) to talk generally about prior probabilities. This preliminary discussion is necessary for two reasons. First, among some orthodox Bayesians there is little sympathy for the idea of clear and substantive rational constraints on prior probabilities, other than coherence. This attitude casts doubt on the very idea that analogies might rationally constrain prior probabilities. Second, the one additional constraint that enjoys broad support among orthodox Bayesians is strict coherence, or regularity. Strict coherence mandates a nonzero prior probability assignment for any noncontradictory proposition. Ironically, if we adopt this one extra constraint, then it may seem pointless to worry about whether an analogical argument can justify a positive prior probability. All of this means that in order to make room for analogical arguments in the Bayesian framework, I have to refute or at least temper extreme positions that would deprive them of any justificatory role. That is my main objective in this section. To begin, let’s review the basic elements of Bayesian epistemology. Most Bayesians interpret probabilities subjectively, as representing an agent’s degrees of belief (or credences). A Bayesian considering a family of possible hypotheses begins with a prior probability distribution and a set of likelihoods. Subsequent posterior probabilities are obtained by conditionalizing on accepted evidence in accordance with Bayes’ Theorem.16 All Bayesians acknowledge the constraint of coherence—the set of prior probability values must conform to the basic mathematical axioms of the probability calculus—but there is disagreement about the appropriateness of additional constraints. Orthodox subjective Bayesians place no constraints upon prior probabilities other than coherence.17 Prior probabilities, as the starting points in arriving at informed opinion, reflect individual judgments. 
They are not subject to criticism provided they are compatible with the axioms of the probability calculus. The point is not that these initial probability assignments are completely arbitrary but rather that they lie outside the scope of the “logic” of rational belief and decision. Indeed, subjectivists are fond of drawing an analogy between the probability calculus and deductive logic. Logic provides rules about entailment without imposing any constraints upon one’s initial set of premises other than consistency.18 Yet it is no part of logic that those premises are to be selected haphazardly.
16. There are other widely recognized mechanisms for updating, such as Jeffrey conditionalization, but they are not relevant to this discussion. I also omit discussion of another widely accepted constraint, Lewis’s Principal Principle (1980).
17. This is the point of view espoused by de Finetti (1964, 1974, 1980). See also Howson and Urbach (1993).
18. This analogy is found in de Finetti (1972) and in Howson and Urbach (1993).
Of course, many subjectivists do acknowledge the need for constraints that go beyond probabilistic coherence, but for the moment let us focus on those who feel that they are unnecessary. They offer a twofold defense. First, any substantive statistical theory will include ineliminably subjective elements, “whether consciously or not,” but they are typically concealed (de Finetti and Savage 1972, 144). In accepting subjectivism about prior probabilities, Bayesians simply ensure that we lay our cards on the table. That is the point of the analogy with deductive logic. Second, the particular choice of prior probabilities matters little because of a set of mathematical results referred to as the convergence theorems.19 As the convergence theorems tell us, “a relatively sharp, or precise, measurement may lead to a final distribution for a parameter (or set of parameters) that depends but little on the initial distribution of the parameters provided only that this initial distribution is sufficiently gentle, or diffuse” (de Finetti and Savage 1972, 145). Suppes puts the point as follows:
If a moderate number of observations is taken in the experiment, the conclusions drawn will be relatively robust, that is, relatively indifferent to moderate variations in the prior distribution. . . . [Given] any two prior distributions drawn from a large class of possible distributions, there is, for a broad class of experiments, a sufficiently large number of observations to bring the two posterior distributions as close together as desired. (1966, 204)
Eventually, one’s probabilities come to reflect the accumulated evidence, and variations in initial prior probability assignments become insignificant. We encounter the same line of reasoning elsewhere (Howson and Urbach 1993).
Critics are unsatisfied with these arguments. First, the analogy with deductive logic fails when the Bayesian approach is put forward not merely as a mechanism for updating our degrees of belief but also as a substantive model for the confirmation of scientific hypotheses. Salmon and Earman write: “If our probabilities are supposed to represent reasonable degrees of conviction some stronger restrictions [than coherence] appear to be needed” (1992, 84). The relevant comparison is not between the probability calculus and deductive logic, but between probabilistic models and deductive models.
These models include, respectively, prior probabilities and premises, which are therefore legitimate objects of critical scrutiny. As Salmon (1966) points out, we want our probabilities to have practical predictive significance, for they reflect the sorts of bets we are prepared to make. An individual who, without any evidence of physical bias, assigns prior probability 0, or nearly 0, to a result of heads on the next coin toss is being foolish. While such an assignment is not mandated by orthodox subjectivism, it is tolerated. But it seems perfectly legitimate to demand justification for such a biased prior— and, by extension, for any prior. The second argument, based upon convergence, provides only limited reassurance and is vulnerable to a number of criticisms. For one thing, the convergence theorems assume a “sufficiently gentle” initial distribution. However mild that assumption may be, it is still some constraint on the prior distribution. Furthermore, the influence of the prior distribution does not disappear in cases
19. Savage (1972) and Edwards, Lindman, and Savage (1963) are standard sources for the convergence theorems. See also Gaifman and Snir (1982).
where little evidence is available for updating one’s beliefs. In the extreme case of zero prior probability, no amount of favorable evidence can ever change our opinion. If the prior probability assignments are imprecise, then conditionalizing even upon extensive evidence may lead to more imprecise probabilities (Herron, Seidenfeld, and Wasserman 1994, 1997). There is an even more fundamental problem with the idea that convergence makes prior probability assignments unimportant. As Harold Jeffreys pointed out, “gentle” distributions can be utterly inappropriate. The problem arises whenever the probability distribution involves one or more real-valued parameters, as the following (imaginary) example illustrates.

Example 8.5 (electrostatic attraction): Suppose that we believe that the force of electrostatic attraction between two point charges $q_1$ and $q_2$ a distance $r$ apart has magnitude expressed in the form $F = k q_1 q_2 / r^n$. We are trying to determine the value of the exponent $n$. Initially, we know that $1 \le n \le 3$, and we take all real values in this interval to be possible. We cannot assign $\Pr(n = c) > 0$ for each possible value $c$, but only for countably many values. The flat or uniform prior distribution $\Pr(n < c) = (c - 1)/2$ assigns prior probability zero to every hypothesis $n = c$, including $n = 2$. With this prior, no evidence could ever lead us to assign positive probability to Coulomb’s Law (which requires that $n = 2$).20

Jeffreys argued that the policy of always adopting a uniform prior distribution is disastrous: The fatal objection to the universal application of the uniform distribution is that it would make any significance test impossible. If a new parameter is being considered, the uniform distribution of the prior probability for it would practically always lead to the result that the most probable value is different from zero—the exceptional case being that of a remarkable numerical coincidence. 
Thus any law expressed in terms of a finite number of parameters would always be rejected when the number of observations comes to be more than the number of parameters determined . . . . [The] only ground that we can possibly have for not always rejecting the simple law is that we believe that it is quite likely to be true . . . We do not assert it as certain, but we do seriously consider that it may be true—in other words, it has a non-zero prior probability, which is the prior probability that the new parameter, which is the coefficient of a new function, is zero. But that is a recognition
20. Of course, starting with the uniform prior, as evidence accumulates, we will see a spike around n = 2. That is not the same thing, as the lengthy passage from Jeffreys (below) makes clear.
that for the purpose of significance tests, at least, the uniform distribution of the prior probability is invalid. (1973, 118–19)21 This reasoning led Jeffreys to his famous simplicity postulate (1973, 129): “Any clearly stated [law] has a finite prior probability, and therefore an appreciable posterior probability until there is definite evidence against it.” Jeffreys’s arguments may not constitute justification for a positive prior probability assignment, but they give us strong motivation for wanting such a justification. Taking Jeffreys’s point together with the earlier criticisms of the orthodox view, we have good reason to reject the position that additional constraints on prior probabilities, over and above coherence, are unnecessary. Of course, we are still a long ways from finding and justifying any such constraints. We should admit that the orthodox view is the default position if we fail in this task. So let us turn next to approaches that acknowledge the need for additional constraints. The first, which I shall call the mish-mash view, is nicely described in the following passage from Richard Jeffrey: Modes of judgment (in particular, probabilizing) and their attendant standards of rationality are cultural artifacts, bodies of practice modified by discovery or invention of broad features seen as grounds to stand on . . . In practice, justification—what makes the histopathologist’s personally cogent Bayes factors cogent for her colleagues as well—is a mish-mash including the sort of certification attested by her framed diploma and her reputation among relevant cognoscenti. (1992, 11)22 There may be no crisp formal constraints upon our prior probabilities besides coherence, but it is nevertheless perfectly legitimate to demand justification for particular assignments. Prior probabilities should incorporate all available relevant information, even though it is hopeless to formalize this process. 
Suppes writes: The vague and subjective prior distribution is of importance primarily in summarizing all the information about the experiment or proposed test which lies outside the narrow framework of the experiment itself, but which is still relevant in varying degrees. The assumption of a prior distribution is a systematic way of summarizing a great deal of heterogeneous information. (1966, 203) Shimony (1970) and Salmon (1990) similarly maintain that we should assign positive prior probability to “seriously proposed” hypotheses, with the positive appraisal depending upon a complex set of background factors.
21. Jeffreys’s argument is stated in terms of significance tests, but it should be clear that versions of the same problem afflict both the error statistician and the Bayesian.
22. The Bayes factor for A and B is the ratio of odds between A and B after and before the effects of experience and is equivalent to the likelihood ratio.
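Jeffreys’s objection in Example 8.5 can be checked numerically. The following is only a minimal sketch under stated assumptions: the separations, the noise level `SIGMA`, the Gaussian error model, and the units in which $k q_1 q_2 = 1$ are all hypothetical choices made for illustration and are not taken from the text. It compares a purely uniform prior on the exponent $n$ with a prior that reserves some mass for the sharp hypothesis $n = 2$.

```python
import math

# Hypothetical, noise-free force readings at several separations r, in units
# where k*q1*q2 = 1, generated from the inverse-square law (n = 2).
SEPARATIONS = [1.0, 1.5, 2.0, 2.5, 3.0]
FORCES = [r ** -2.0 for r in SEPARATIONS]
SIGMA = 0.05  # assumed standard deviation of Gaussian measurement error


def likelihood(n):
    """Likelihood of the readings given exponent n (up to a constant factor)."""
    sq_err = sum((f - r ** -n) ** 2 for r, f in zip(SEPARATIONS, FORCES))
    return math.exp(-sq_err / (2 * SIGMA ** 2))


def marginal_uniform(steps=2000):
    """Average likelihood when n is uniform on [1, 3] (midpoint rule)."""
    width = 2.0 / steps
    total = sum(likelihood(1.0 + (i + 0.5) * width) for i in range(steps))
    return total * width / 2.0  # divide by the interval length 2


def posterior_n_equals_2(point_mass):
    """Posterior probability of n = 2 when the prior puts `point_mass` on
    n = 2 and spreads the remaining probability uniformly over [1, 3]."""
    top = point_mass * likelihood(2.0)
    return top / (top + (1.0 - point_mass) * marginal_uniform())


print(posterior_n_equals_2(0.0))  # purely uniform prior: n = 2 stays at zero
print(posterior_n_equals_2(0.5))  # Jeffreys-style prior: n = 2 gains support
```

With no point mass, the posterior for $n = 2$ is zero however well the readings fit the inverse-square law. With a positive point mass, the same readings raise the probability of $n = 2$ above its prior value, which is the behavior the simplicity postulate is meant to secure.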
The second approach, sometimes called logical Bayesianism, advances exactly the opposite point of view: prior probability assignments may at times be determined on the basis of inside information about an experimental setup. This approach is associated with Jeffreys (1973) and Jaynes (1968, 1973), and especially with the idea that prior probability assignments may be determined, in full or in part, by considerations of symmetry. In cases where nothing is known about the experimental setup, symmetry is taken to imply something like a flat or uniform prior distribution. That idea is expressed in the classical Principle of Indifference and, in a more sophisticated way, in Jaynes’s entropy and cross-entropy principles. These general principles are controversial and vulnerable to various technical objections. In special cases, however, where we have knowledge of invariant structures or (what is the same thing) internal symmetries, the arguments can be compelling. The internal symmetries either constrain or uniquely determine our prior probability assignments. Both the mish-mash view and the logical approaches have something to offer and both have weaknesses. The weaknesses of the logical approach are well-known, but I postpone discussion until the next section. The mish-mash view, though plausible, provides no real guidance in evaluating prior probability assignments. It may be natural to say, “It depends upon the particular case,” but we should try to do better. To see why, let us focus on the problem of extreme bias, that is, assignments of zero prior probability. The best general response the mish-mash view can offer is the policy of strict coherence (or regularity): don’t limit your options by assigning extreme probability values. In particular, assign probability 1 only to logical truths and 0 only to contradictions.23 The idea is to ensure that anything possible could eventually come to have high probability, given enough evidence. 
The policy cannot be taken literally, however, except in a highly artificial setting. For one thing, standard conditionalization, which is the principal engine of belief revision, requires the assignment of probability 1 to propositions describing accepted evidence, even though they are not logical truths. For another, propositions that one has not entertained or that one cannot even formulate within the current probability model cannot have positive probability and may have no well-defined probability at all.24 Finally, as we saw earlier, any prior distribution involving one or more real-valued parameters cannot assign positive probability to each possible set of parameter values. We could introduce infinitesimal probabilities, but even if strict coherence could be salvaged in this way, it would be useless without some way to identify serious possibilities—what I have been calling hypotheses with nonnegligible prior probability. The best characterization of this notion, following Jeffreys once again (1973, 129), is that a hypothesis has nonnegligible prior probability if a moderate amount of evidence could lead to an appreciable posterior probability. A generic policy of strict coherence provides no help with the problem of extreme bias
23. This constraint is rejected by de Finetti because of the “random integer” problem, discussed in section 8.4.
24. These are obviously general difficulties with strict coherence, not problems for the mish-mash view per se.
because it does not give guidance in selecting hypotheses to be assigned nonnegligible prior probability. Where does all this leave us in the search for constraints on prior probabilities? Orthodox subjectivists can point to the defects of “logical” principles and the vagueness of Suppes’s notion of “heterogeneous information.” But there is a middle ground between unacceptably broad logical axioms and blunt appeals to experience as a whole. The way is still open for “local” justification of prior probabilities, based on guidelines pitched at an intermediate level of generality. We can seek relatively clear principles and criteria that assist us in making probability assignments while leaving room for differences of opinion. Of course, that is just what I have been urging in the case of analogical arguments. Analogical reasoning appeals to adherents of both the mish-mash and the logical approaches, since it combines elements of both styles. On the one hand, analogical arguments bring outside considerations to bear on a problem while, on the other, they appeal to our sense of symmetry. Analogies are symmetries in a wide sense—they are symmetries applied to our entire knowledge situation. I shall develop this idea in the next two sections.
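The two claims about convergence reviewed earlier in this section can be illustrated with a small computation. This is a sketch only, under assumptions that are not in the text: a coin whose bias is restricted to a hypothetical grid of nineteen values, a record in which exactly 70% of tosses land heads, and three illustrative priors named `flat`, `skewed`, and `dogmatic`. It shows two “gentle” priors being driven together by the evidence, and a prior that assigns zero to one bias never recovering.

```python
import math

THETAS = [i / 20 for i in range(1, 20)]  # candidate coin biases 0.05 ... 0.95


def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]


def posterior(prior, heads, tails):
    """Conditionalize a prior over THETAS on a record of heads and tails."""
    log_w = []
    for p, t in zip(prior, THETAS):
        if p == 0.0:
            log_w.append(-math.inf)  # a zero prior stays zero whatever the evidence
        else:
            log_w.append(math.log(p) + heads * math.log(t) + tails * math.log(1 - t))
    top = max(log_w)  # subtract the maximum to avoid underflow
    return normalize([math.exp(lw - top) for lw in log_w])


def total_variation(p, q):
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


flat = normalize([1.0] * len(THETAS))
skewed = normalize([(1 - t) ** 4 for t in THETAS])  # favors low biases
# Index 13 is the bias 0.7; the dogmatic prior rules it out in advance.
dogmatic = normalize([0.0 if i == 13 else 1.0 for i in range(len(THETAS))])


def gap_after(n):
    """Distance between the flat and skewed posteriors after n tosses, 70% heads."""
    heads = 7 * n // 10
    return total_variation(posterior(flat, heads, n - heads),
                           posterior(skewed, heads, n - heads))


for n in (0, 100, 1000):
    print(n, round(gap_after(n), 4))
print(posterior(dogmatic, 700, 300)[13])  # 0.0
```

The gap between the flat and skewed posteriors shrinks as tosses accumulate, in line with the convergence results. The dogmatic prior shows the limit of that reassurance: the bias it excluded stays at probability zero even after a thousand tosses that favor it.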
8.4 Symmetry Arguments and Subjective Probability

Trouble can arise when we apply symmetry arguments to subjective probabilities. Presented with an apparently normal die and lacking any special knowledge of asymmetries, it seems reasonable to assign probability 1/6 to each face’s coming up. This inference employs the classical Principle of Indifference, which may be stated as follows (where e is a statement describing the relevant available evidence):

If there are n mutually exclusive possibilities $h_1, \ldots, h_n$, and e gives no more reason to believe any one of these more likely to be true than any other, then $P(h_i / e)$ is the same for all i. (Howson and Urbach 1993, 52)

This principle, as stated, is rightly rejected because it leads to well-known paradoxes.

Example 8.6 (steel cubes). A machine produces steel cubes with edges between 1 and 3 cm long. Given no further information, what is the probability that a given cube will have edge length between 1 and 2 cm? A naïve application of the Principle of Indifference yields the answer $(2 - 1)/(3 - 1) = 1/2$, on grounds that the edge length is as likely to lie between 1 and 2 cm as between 2 and 3 cm. If we consider the equivalent problem of determining the probability that the volume of the cube is between 1 and 8 cm³, however, the Principle of Indifference now seems to give the answer $(8 - 1)/(27 - 1) = 7/26$. This time, the reasoning is that the volume is as likely to fall between 1 and 2 cm³ as between 2 and 3 cm³ or in any of the 26 equal intervals compatible with a volume between 1 and 27 cm³.
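The arithmetic of Example 8.6 can be reproduced exactly. The sketch below assumes nothing beyond the figures given in the example; the helper name `uniform_probability` is introduced here purely for illustration. It applies the same rule of indifference to the two descriptions of the cube and returns the two conflicting answers.

```python
from fractions import Fraction


def uniform_probability(low, high, a, b):
    """Probability that a quantity spread uniformly over [low, high]
    falls in the sub-interval [a, b]."""
    return Fraction(b - a, high - low)


# Indifference over edge length (cm): edge in [1, 3], event "edge in [1, 2]".
p_edge = uniform_probability(1, 3, 1, 2)

# Indifference over volume (cm^3): volume in [1, 27], event "volume in [1, 8]".
# Because 1**3 == 1, 2**3 == 8 and 3**3 == 27, this is the same event as above.
p_volume = uniform_probability(1, 27, 1, 8)

print(p_edge, p_volume)  # 1/2 7/26
```

The two calls describe one and the same event, yet the answers differ. The conflict arises because a distribution that is uniform in edge length is not uniform in volume, so "indifference" does not pick out a single prior until the parameterization is fixed.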
The existence of such paradoxes is a familiar objection to naïve reliance on symmetry principles. We need to be careful in drawing lessons from failures of the Principle of Indifference. One view is that such failures show that symmetry principles impose no rational constraint whatsoever upon probabilistic inference. Though widely applied, symmetry principles are a matter of psychology rather than logic. De Finetti (1972) and Howson and Urbach (1993) are forceful advocates for this position. For Jeffreys and Jaynes, however, the lesson to be drawn from failures of the Principle of Indifference is that many problems are not well-posed (Jaynes 1973). While there is no remedy for this difficulty in some cases, there are others (even Example 8.6) where careful analysis shows that symmetry does lead to justifiable probability assignments. Constraints can arise from symmetry. Between these positions lies the mish-mash view, including a number of people whose attitude to symmetry arguments is sympathetic but wary. Jeffrey, Shimony, and Salmon do not try to formalize the reasoning that leads to prior probability assignments. Van Fraassen, as noted earlier, countenances some applications of symmetry to probability. My objective is not to defend any grand symmetry principle. Instead, I seek to make room for modest principles by countering arguments that lead to the complete dismissal of symmetry as a rational basis for probabilities. Despite the paradoxes and other objections, it can still be legitimate to employ symmetry-based reasoning in subjective probability assignments.25 It is important, for the remainder of this chapter, to appreciate how it is that legitimate symmetry arguments function. The rejection of the idea that symmetry ever provides rational constraints on our credences derives from two great fears, together with the confidence that symmetry may safely be relegated to psychology. The first great fear is inconsistency. 
The paradoxes already mentioned show that appeals to symmetry can lead to incompatible conclusions.26 The second great fear is rationalism. Symmetry arguments can seem creepy. Howson and Urbach, dismissing van Fraassen’s use of a symmetry argument in support of conditionalization, write: “In our opinion all that we should infer from these demonstrations is that symmetry principles, whatever their fertility in generating successful scientific theories, as in modern particle physics, should not be regarded as having axiomatic status” (1993, 110). After pointing out numerous technical difficulties with and ultimately rejecting Jaynes’s symmetry-based Principle of Minimum Cross-Entropy, they write: “We seem to be back in the shadowy, if not downright shady, world of the synthetic a priori” (1993, 112). While the charge is directed at one general
25. This point is not new. We even encounter appeals to a “restricted” Principle of Indifference in recent work on the Sleeping Beauty problem (Elga 2000). Such appeals are respectable, though controversial.
26. See van Fraassen (1989) for a discussion of many paradoxes arising from uncritical application of the Principle of Indifference and the prospects for their resolution.
principle, the larger message seems to be that cogent symmetry arguments require clearly formulated axioms, and any clearly formulated symmetry axiom must inevitably be synthetic a priori and thus unacceptable. The final argument for rejecting symmetry-based constraints on probabilistic inference is that a psychological interpretation of symmetry gives us everything we could possibly want. Symmetry arguments are fine so long as we understand them merely as describing, rather than constraining, our thinking about subjective probabilities. De Finetti is an eloquent advocate for this point of view. He writes:

[Use] of such words as symmetry, trial, and perfect coin does not in itself entail properties of probabilities; on the contrary, if we would use these words meaningfully, they must be defined in terms of the probabilistic properties, and hence as subjective notions, because they express properties not of the events but of the probabilities associated with the events by some person, that is, of his opinion about the events. To say, for example, that A and B are independent for someone means no more nor less than that $P(AB) = P(A)\,P(B)$ in his probability evaluation. (1972, 190)

On this approach, symmetry is defined in terms of the sort of probability assignments one is disposed to make. This may be what de Finetti had in mind when he wrote that subjectivists could “fruitfully adopt” (with modifications) many “non-subjectivistic” ideas from people like Harold Jeffreys. De Finetti carried out that program brilliantly through his concepts of exchangeability and partial exchangeability.27 Despite the fact that the fears are serious and the alternative program appears attractive, these three arguments are far from conclusive. The concern about inconsistency can be confronted head-on. 
The first comment here is that inconsistency poses as much trouble for subjectivists like de Finetti, who wish to incorporate a psychological version of symmetry into prior probability assignments, as for anybody else. When de Finetti writes about the probabilities assigned “by some person,” he means idealized probabilities that have been adjusted to eliminate incoherence, for otherwise that person’s probabilities (or betting quotients) are inadmissible. In order to model subjective assumptions about symmetry, we need to ensure that they do not lead to contradictory probability assignments. But then there is no advantage to a psychological interpretation of symmetry. Furthermore, those who take symmetry arguments to justify probabilistic claims are well aware of the dangers of inconsistency. There are well-defined, if restricted, results about when a proposed set of symmetries generates consistent
27. Franklin (2001) argues that incorporating such constraints makes de Finetti a logical probabilist. That is a distortion of de Finetti’s position, but I am sympathetic to Franklin’s view that a purely psychological construal of constraints such as exchangeability is unsatisfactory.
probability assignments.28 In many examples, such as Buffon’s needle problem, symmetry arguments can be employed without fear.29

Consider next the concern that adherence to any symmetry principle is a form of rationalism. While the rejection of naïve applications of the Principle of Indifference is entirely justified, universal dismissal of symmetry principles is not. Empiricists can accept circumscribed symmetry principles as well as van Fraassen’s general rules. In the case of the hydrodynamic analogies (Example 8.1), for instance, the relevant principle is this: when the behavior of two systems is perfectly described by equations of identical form and we know the value of some parameter in one system, then we have a justified claim about the corresponding value in the other system. The principle is a priori but not synthetic. Useful application of the principle requires empirical input: evidence that two systems are in fact governed by laws with isomorphic structure, together with a measurement in one of the two systems. In van Fraassen’s view, a very similar analysis applies to any legitimate symmetry argument: “[Symmetry arguments] are a priori, and powerful; but they carry us forward from an initial position of empirical risk, to a final point with still exactly the same risk. The degree of empirical fallibility remains invariant” (1989, 261). The empirical risk comes in the assumption that we have a genuine case of symmetry (a perfect analogy, in Example 8.1). The symmetry principle is a priori, but its application takes for granted that the models for the two systems are adequate. That these models are adequate reflects hard-won empirical knowledge. Symmetry arguments thus depend upon both a priori and empirical premises, but symmetry principles proper are not synthetic a priori. In sum, we can counter the charge of rationalism, just as we countered worries about inconsistency, through careful restriction and analysis of symmetry principles. 
Despite these reassurances, suspicion is bound to arise whenever we apply symmetry principles to probabilistic reasoning. These suspicions are best directed not toward symmetry principles in general but toward the premise that we have identified an appropriate set of symmetries. It is right to worry about whether the set of symmetries is too ill-defined to yield meaningful information or too large to yield consistent probability assignments. With this point in mind, the contrast between the used-car example and the steel-cubes example illustrates an important difference between external and internal symmetry arguments. An internal symmetry argument, such as the one applied to the steel cubes example, works with a fixed domain. The symmetry argument is meant to deliver precise probabilities. For such an argument to succeed, we need to specify clearly the full set of symmetries and prove that they do not lead to inconsistency. As Jaynes would say, we need a well-posed problem. By contrast, an external symmetry argument, such as the one employed in the
28. The most impressive results require definability of a Haar measure on the group of symmetry transformations; see Villegas (1977, 1981). For a different approach, see my papers (Bartha and Johns 2001; Bartha 2004).
29. Many discussions of Buffon’s Needle (e.g., Wikipedia) don’t even mention the application of symmetry.
used-car example, enlarges the domain by adding some new object or problem, and then relates this new case to an existing case (or cases). Inconsistency is less of a worry because we are building a group of symmetry transformations “on the fly” by adding just one symmetry relation to an existing set. Furthermore, the goal of an external symmetry argument is frequently just to establish a lower or upper bound on some parameter rather than a precise probability value. These considerations suggest that there can be two acceptable forms of symmetry argument with probabilistic conclusions. One form is exemplified by Buffon’s needle problem, where internal symmetries lead to demonstrably consistent and precise probability values. The other form is an external symmetry argument whose conclusions may be imprecise. My view is that analogical arguments belong to this second category: on the basis of likeness between a novel case and an existing case (or cases), they aim to establish a lower bound for some prior probability. These ideas will be developed in section 8.5. The final task in this section is to consider de Finetti’s program of reducing symmetry to psychology. I shall argue that this strategy is not philosophically satisfactory because it denies the legitimacy of a perfectly reasonable demand for justification. Consider a series of tosses of a single coin by one person. Alan assigns equal probability to all outcomes of m heads in n tosses, regardless of the order in which heads and tails occur. In de Finetti’s terminology, his subjective probability function for these outcomes is exchangeable. When pressed for justification, he says that he can find no relevant difference between any two such outcomes (i.e., it makes no difference whether the m heads all come at the beginning of the sequence of tosses, all at the end, or are interspersed throughout the sequence). By contrast, suppose that Brian rejects exchangeability, insisting that there are relevant differences. 
The precise sequence of heads and tails might influence the probability of the outcome, albeit ever so slightly. Perhaps molecules are lost on the bottom side of the coin when it scrapes the table. Brian appeals to asymmetry as a reason for his prior probability assignment. Now consider Chris, who rejects exchangeability but agrees with Alan that there is no relevant difference (no asymmetry) between any two sequences producing the same number of heads. I think that Chris is irrational. His asymmetrical probability assignment violates principle (8.2.2): it has no basis in any preceding asymmetry. But if symmetry is a purely psychological concept, then there is no way to make sense of this charge. The point of resisting the reduction of symmetry to psychology (to a property of one’s personal probability assignments) is that the two judgments—symmetry and probability—are conceptually independent. Furthermore, judgments of symmetry have explanatory priority. The reduction collapses the two types of judgment into one and denies the legitimacy of the request for justification. If we grant the reduction, then we have no business demanding justification from Chris (or Brian or Alan). Nor, in the used-car example, can we criticize Tracy if she assigns different values to two used cars that are identical by her lights. Judgments of symmetry or asymmetry count as reasons for making probability assignments and they precede those assignments. We may not know if the
physical setup justifies those judgments, and we can argue about that. For the reductionist, however, this process cannot occur at all. Prior probabilities are the starting point for reasoning and the stopping point in requests for justification. It is helpful here to consider de Finetti’s own appeal to symmetry in his rejection of countable additivity.30 In discussing the example of a positive integer “chosen at random,” de Finetti insists that we should be able to assign equal probability to each integer. But if the probability assigned to integer $n$ is $p_n$ and if we impose the requirement of countable additivity, then we must have $p_1 + p_2 + \cdots = 1$, which makes it impossible to attach the same value to each $p_n$. He concludes:

Therefore, a probability evaluation which is completely [i.e., countably] additive . . . represents an extremely asymmetrical opinion concerning the probability of the different cases. If this lack of symmetry does not reflect the actual judgment of the subject, perhaps because he is indifferent toward all the possible outcomes, how could we then include in the definition of consistency (in a purely formal sense) a condition which does not allow him to assign equal values (necessarily zero) to all the probabilities $p_n$? Should we force him, against his own judgment, to assign practically the entire probability to some finite set of events, perhaps chosen arbitrarily? (1972, 91)

Although the argument is couched in psychological terms, it derives its force from the idea that judgment of the symmetries and asymmetries inherent in a problem precedes and even constrains our subjective probability assignments. Symmetry arguments can provide reasons for probability assignments. Judicious symmetry principles can be employed without inconsistency and without embracing rationalism. The next task is to show how such principles allow us to make a connection between analogical arguments and prior probabilities.
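The step in de Finetti’s random-integer argument that rules out equal values can be written out in one line. This is a sketch of the standard reasoning rather than a quotation; the symbol $c$ for the common value of the $p_n$ is introduced here for illustration.

```latex
\[
  p_n = c \ \text{for all } n
  \quad\Longrightarrow\quad
  \sum_{n=1}^{\infty} p_n \;=\;
  \begin{cases}
    0,      & \text{if } c = 0,\\[2pt]
    \infty, & \text{if } c > 0,
  \end{cases}
  \qquad\text{so in neither case does } \sum_{n=1}^{\infty} p_n = 1 .
\]
```

Under countable additivity, then, no uniform assignment over the positive integers is available. If only finite additivity is required, the choice $c = 0$ remains open, which is why the quoted passage describes the equal values as "necessarily zero."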
8.5 Analogical Arguments and Prior Probabilities

In the introductory chapter, I quoted Priestley: “analogy is our best guide in all philosophical investigations; and all discoveries, which were not made by mere accident, have been made by the help of it” (1769, 1775/1966, 14). Similar sentiments about the indispensability of analogies for discovery are expressed by nineteenth- and twentieth-century writers, as well as by contemporary cognitive scientists. I have been concerned not with the necessity of analogies but with their sufficiency for rational inquiry, especially in the sciences. Earlier chapters proposed a theory to help us decide which analogical arguments are likely to be good “guides to investigation.” I believe that analogical arguments framed in
30. The argument is developed in a number of places, e.g., 1972, pp. 91–92, and 1974, 122. For further discussion, see Williamson (1999), Bartha and Johns (2001), and Bartha (2004).
accordance with that theory have more than heuristic value: they provide a measure of justification for their conclusion. In this section, I refine that claim by relating my theory of analogy to the Bayesian model of confirmation. Salmon (1966, 1990) makes an appealing proposal about the role of analogies within a broadly Bayesian approach to confirmation. First, he suggests that Bayesians can usefully separate testing from plausibility arguments. Testing leads to updating via conditionalization. Plausibility arguments, by contrast, are directed toward the choice of a reasonable set of prior probability assignments. Similar ideas have been expressed more recently by Hawthorne, who writes: It is arguably a virtue of probabilistic induction that it provides a place for [plausibility] assessments to figure into the full evaluation of hypotheses. Although prior probabilities may be subjective in the sense that agents may disagree on the relative strengths of plausibility arguments—and so disagree on the plausibilities of various hypotheses—priors are far from being mere subjective whims. (2008) Salmon maintains that analogies constitute a particularly important variety of plausibility argument. I think that this view is essentially correct, but that it should be restricted to analogical arguments that meet certain standards. What is more, we need to explore the intellectual “mechanisms” that link these cogent analogical arguments to prior probabilities. I don’t believe that we can make this link perfectly precise, but we can make considerable headway. My strategy is to show that the modal argument of section 8.2 can be recast in Bayesian terms. The result, which I call the factorization argument, provides the basic link between analogies and prior probabilities. 
The argument is not a proof that any analogical argument satisfying our criteria will succeed; rather, it is a demonstration that such arguments are sanctioned by broadly acceptable norms of symmetry and Reflection.
Partial Probability Models and Nonnegligible Probability

We want to show that an analogical argument that begins with a credible proposition and meets our criteria for plausibility justifies the assignment of nonnegligible probability to its conclusion. My immediate objective is simply to provide a clear statement of this claim by defining the italicized phrases. The easiest way is to begin with the modal version from section 8.2.

(8.2.7) Prima facie Plausibility
Suppose $R(f_1, \ldots, f_m, p_1, \ldots, p_n, Q)$ describes an acceptable prior association such that (1) Q belongs to the set C of prima facie plausible hypotheses; (2) $\varphi^{+} \cap P \neq \emptyset$; and (3) $\varphi^{C} \cap N = \emptyset$. Then Q* should belong to C.

We want a version of this principle that replaces prima facie plausibility with some probabilistic analog. The new version will have much in common with principle (8.2.7). Let’s concentrate on a single analogical argument (i.e., no rival analogies). In terms
of the rough formulation of the previous paragraph, the credible proposition Q is (as before) a known, accepted or seriously considered proposition about the source domain. The conclusion Q* is the analogous proposition about the target domain. The analogical argument meets our criteria for prima facie plausibility if it has an acceptable prior association R, a nontrivial positive analogy (clause (2)), and no known critical difference from the target domain (clause (3)). None of these things needs to change. The important modification will be to replace “Q belongs to C” and “Q* should belong to C” with probabilistic notions.

To make this change seems to require a probability model: an outcome space, a set of events or propositions to which probabilities are attached, and a well-behaved probability function Pr. In most cases of analogical reasoning, however, we have no such model! In fact, as hinted in section 8.4, analogical reasoning is often used to extend a partial model. I offer the following informal characterization of partial probability models and the probability values within these models that hold the most interest. The characterization assumes that we have fixed our attention on a particular analogical argument. It runs as follows:

1. Critical factors. As in section 8.2, we suppose that we can identify a set $\varphi^{C}$ of critical factors sufficient for a “first-order” assessment of our analogical argument.
2. Probability function. The object of interest is a subjective probability function Pr that assigns conditional probabilities relative to the presence and absence of the critical factors. Write $\Pr(Q / f_1 \cdot f_2 \cdot \bar{p}_1)$ for the probability of Q given that factors $f_1$, $f_2$ are present and $p_1$ is absent.31 Write A and B for conjunctions $f_1 \cdots f_r$ and $\bar{p}_1 \cdots \bar{p}_s$ in the prior association, and write $\Pr(Q / A \cdot B)$ for the corresponding conditional probability. 
As usual, we use * to denote analogous factors and propositions in the target domain: Pr(Q* / A*) stands for Pr(Q* / f1* . . . fr*).
3. Extensions. The propositions to which we attach a probability express features of the source and target domains. We typically do not have well-defined probabilities (or conditional probabilities) for all such propositions. To extend the model is to extend the class of propositions with a probability assignment.
4. Relative priors: source domain. An analogical argument presupposes not an absolutely prior probability Pr(Q), but rather the probability of Q conditional upon all critical factors known to be present or absent in the source domain. If F = f1 . . . fm and P = p̄1 . . . p̄n represent those factors (as in (8.2.7)), then our starting point is Pr(Q / F P). I shall refer to this as a relative prior probability.
5. Probabilities in the target domain. We don’t presuppose any probability assignments for the target domain. Most such assignments count as extensions of our model, although alterations of existing assignments are also possible.
31. This is a slight abuse of notation: fi and p̄j represent both factors and assertions about the presence/absence of those factors.
We need one last bit of terminology. As always, let P stand for the positive analogy, represented as a conjunction of present and absent critical factors in the source domain, and let P* represent the analogous conjunction believed to hold in the target domain. Let K represent all residual critical factors, that is, the factors that remain after deleting those that belong to P. Then P K represents the complete conjunction that tells us which critical factors are present and absent in the source domain. It is exactly the same as F P, but the new notation will prove to be useful.
I can now state the probabilistic analog of principle (8.2.7), my main claim about the connection between analogical arguments and prior probabilities.
(8.5.1) Nonnegligible Probability
Suppose R(f1, . . . , fm, p̄1, . . . , p̄n, Q) describes an acceptable prior association such that the following conditions are satisfied:
(1) Plausible analog. Pr(Q / P K) is appreciable.
(2) Overlap. φ+ ∩ P ≠ ∅.
(3) No-critical-difference. φC ∩ N = ∅.
(4) Residual factors constraint. Pr(K / P) is not extremely low.
Then Pr(Q* / P*) should be nonnegligible relative to Pr(Q / P K).
The parallel with the modal version, principle (8.2.7), is useful and helps to bring out the importance of moving to a probabilistic formulation. Much remains the same, but I have made two substitutions to eliminate reference to the set C of prima facie plausible hypotheses:
(a) “Pr(Q / P K) is appreciable” for “Q belongs to C”
(b) “Pr(Q* / P*) should be nonnegligible relative to Pr(Q / P K)” for “Q* should belong to C”
The other notable change is the addition of the Residual factors constraint. Let me say a little about the motivation for the two substitutions. I discuss the new constraint a bit later.
For a good analogical argument, it is not enough that we currently believe Q or assign it a high probability in light of all other beliefs.
The prior association itself must provide substantial reasons for belief in Q. In probabilistic terms, I interpret this as the requirement that Pr(Q /P K) be appreciable. This is substitution (a). For substitution (b), let me first explain why we should care about Pr(Q*/P*). Recall that P* stands for the analogs in the target domain of factors that belong to the positive analogy. From clauses (2) and (3) above, P* is nonempty, and there are no known critical differences. It follows that the only critical factors in the target domain about which we know anything definite are those that belong to P*. P* thus represents our relevant background knowledge of the target domain. Hence, Pr(Q*/P*) is the appropriate “relative” prior probability for Q*.
The biggest departure from the modal version concerns the notion of nonnegligible probability. Section 8.3 introduced Jeffreys’s idea that a prior probability is nonnegligible if a moderate amount of evidence could lead to an appreciable posterior probability. Here, we explicitly define nonnegligible probability as a relative notion. Nonnegligible probability is defined relative to some other “appreciable” probability value. The conclusion of principle (8.5.1) is that the conditional probabilities for Q* and Q should be in the same epistemic ballpark. Given a reasonable amount of evidence, the probability for Q* could then rise to the level of Q.32 From these remarks, we can already see two significant contrasts between principles (8.2.7) and (8.5.1). The most obvious is that the conclusion now allows for a variable degree of plausibility in Pr(Q*/P*), as opposed to delivering the blunt result that Q* is prima facie plausible. The other important contrast is that in the probabilistic version, the strength of the conclusion is relative to the strength of the prior association. This idea is absent from the modal version, where the conclusion is not stated in relational terms. Defining the strength of the conclusion relationally is a clear improvement, and even brings out a weakness in the modal version. A series of analogical arguments, each acceptable, might produce a sequence of increasingly less probable hypotheses Q, Q1, Q2, . . . and so forth. It could be that this sequence eventually reaches a hypothesis that we would not count as prima facie plausible. There is no way for the modal argument to rule out such a possibility. By contrast, the probabilistic version, principle (8.5.1), is not vulnerable to this problem because it makes an assertion only about the degree of plausibility of the next member of the sequence relative to the previous member. One final preliminary point concerns arguments involving multiple analogies. 
As I explained in section 8.2, the modal argument can handle complications that arise when multiple source domains are introduced: we either confine our attention to “optimal” arguments or we enlarge our set of critical factors. Since the very same adjustments can be made for the probabilistic version, I shall continue to focus on individual analogical arguments.
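The relational character of principle (8.5.1), discussed above, can be illustrated with a small numerical sketch. The code assumes, purely for illustration, that "nonnegligible relative to" can be read as "at least a fixed fraction of" the earlier value. The fraction `RELATIVE_FLOOR`, the starting value, and the function name are hypothetical; the principle itself leaves the notion informal and fixes no threshold.

```python
# Illustrative sketch only: principle (8.5.1) does not specify a numerical
# threshold. RELATIVE_FLOOR is a hypothetical stand-in for
# "nonnegligible relative to the previous value".
RELATIVE_FLOOR = 0.5


def chain_lower_bounds(initial: float, steps: int,
                       floor: float = RELATIVE_FLOOR) -> list[float]:
    """Lower bounds guaranteed along a chain of analogical arguments.

    Each argument in the chain guarantees only that the next hypothesis has
    probability at least `floor` times the bound of its predecessor.
    """
    bounds = [initial]
    for _ in range(steps):
        bounds.append(bounds[-1] * floor)
    return bounds


# A chain Q, Q1, ..., Q5 starting from an appreciable value (assumed 0.8).
bounds = chain_lower_bounds(initial=0.8, steps=5)
for i, b in enumerate(bounds):
    print(f"hypothesis {i}: guaranteed lower bound {b:.4f}")
```

On this reading, every link in the chain satisfies the relative guarantee even though the absolute lower bound keeps shrinking. The relational conclusion never asserts that a late member of the sequence is plausible outright, which is the point of contrast with the modal version.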
32. This conclusion would be defensible even if we dropped the requirement that Pr(Q / P K) be appreciable. Principle (8.5.1) as we have stated it builds in the assumption that we start from a relatively strong prior association. But the general point is that the principle conforms to the upper bound thesis of section 4.3: the prior association in the source domain imposes an upper bound on the strength of the analogical argument.
The Factorization Argument
Suppose that we can identify prima facie plausibility with nonnegligible prior probability as suggested above. Suppose we also grant that we have an adequate justification for principle (8.2.7) via the modal argument of section 8.2. Then it seems that the factorization argument, which is its probabilistic analog, should provide an adequate justification for principle (8.5.1).
The factorization argument proceeds in two steps, just like the modal argument. Let’s continue to let Tracy be our agent. Suppose that at t1, Tracy has an analogical argument for Q* meeting all of our requirements: an acceptable prior association, an appreciable probability value Pr(Q / P K), nontrivial positive analogy, and no known critical differences. Introduce a hypothetical earlier situation t0 where the only known factors are those in the positive analogy, P. By Reflection, Tracy should assign nonnegligible value to Pr(Q / P) relative to Pr(Q / P K).33 By symmetry between the source domain at t0 and the target at t1, Tracy should assign comparable values to Pr(Q / P) and Pr(Q* / P*). Combining the two steps, Pr(Q* / P*) should be nonnegligible relative to Pr(Q / P K). Figure 8.1 summarizes the reasoning.
FIGURE 8.1. The factorization argument. [Diagram with two columns, Source Domain (S) and Target Domain (T). Within the source domain, Step 1 (Reflection) leads from t1: Pr(Q / P · K) >> 0 (appreciable probability) to t0: Pr(Q / P) nonnegligible. Step 2 (Symmetry) leads from the source domain at t0 to Pr(Q* / P*) nonnegligible in the target domain.]
33. Since (by hypothesis) the only difference between t0 and t1 is the addition of K, we can manage with a single probability function Pr rather than having to introduce separate Pr1 and Pr2.
Although I endorse this argument, it faces a serious challenge: a difficulty that is obscure in the modal version but easily grasped in the probabilistic version. Of course, if there were no such difficulty, there would be little point in going beyond the modal argument. The Residual factors constraint, already included in principle (8.5.1), remedies the problem, but at the cost of introducing imprecision. Let’s consider the two steps of the factorization argument in more detail.
STEP 1: REFLECTION (from Pr(Q / P K) >> 0 to Pr(Q / P) nonnegligible)
The argument for step 1 is modeled on the parallel step for the modal version. The assignment of Pr(Q / P) (at t0) is an extension of Tracy’s probability model. Appealing to Reflection, that assignment should anticipate her high value for Pr(Q / P K) at t1. Pragmatic considerations dictate a nonnegligible value for Pr(Q / P): if we rule out discoveries by “mere accident” (in Priestley’s phrase) that might boost her probability for Q, Tracy must either assign a nonnegligible probability to that hypothesis at t0 or drop it from serious consideration.
The main objection to this reasoning is simple. An appreciable value for Pr(Q / P K) is compatible with a very small (or negligible) value for Pr(Q / P), especially when the positive analogy P is slight. By the conjunction principle,
Pr(Q / P K) Pr(K / P) = Pr(Q / P) Pr(K / Q P).
Both Pr(K / P) and Pr(Q / P) could be very small even though Pr(Q / P K) is large. Indeed, Pr(K / P) will be small if K is unexpected given just P. It may be that K (rather than P) is responsible for the high value of Pr(Q / P K). The problem can be illustrated with a simple example:
P: Fred is reading a book from a university library.
Q: The book that Fred is reading contains the sentence “7 + 5 = 12.”
K: Fred is a Kant scholar.
Here, both Pr(K / P) and Pr(Q / P) are tiny even though Pr(Q / P K) is moderately high.
Let’s call this the residual-factors objection, as it arises from the potential significance of the residual factors K. Although this has the appearance of a technical objection, I believe it points to an important issue for Bayesians: the tension between pragmatic and epistemic considerations in making plausibility judgments. The modal argument of section 8.2 stresses the pragmatic element. Reflection requires that we treat a hypothesis with solid analogical support as prima facie plausible in order to keep our options open. But once we put things in terms of subjective probabilities, previously neglected epistemic considerations (such as the value of Pr(K / P)) become salient. In order to rescue the argument, we need to find a way of balancing these two influences.
To make this problem more vivid and to set the stage for a solution, let’s look at three examples. The first example is Priestley’s analogy between gravitation and electrostatic attraction, as presented in chapter 4 (Example 4.2). On the basis of analogous effects (namely, the absence of force inside a hollow shell of uniform mass or charge density), Priestley conjectured that electrostatic attraction, like gravitational attraction, obeys an inverse square law. This is a very strong analogical argument because the positive analogy P is extensive. In fact, P represents the whole of the prior association: there are no residual critical factors K.
Priestley was aware of important differences between electricity and gravitation, but none of them has any bearing on his argument. In this case, the Reflection step is not needed and the residual-factors objection does not arise.
The second example is Example 4.6, the analogy about life on other worlds. That argument starts with an explanation for the existence of life on earth and concludes that there is nonnegligible probability that life exists on another planet such as Mars (or ancient Mars). The initial explanation, which is supposed to provide appreciable probability for its explanandum, incorporates a number of factors that account for life on earth. As a result, we should expect that the positive analogy P between earth and Mars will be slight and that there will be many residual factors. At the Reflection step, can we legitimately reason backwards to nonnegligible value for Pr(life on earth / P)?
The third example is Example 4.7, the acoustical analogy employed by nineteenth-century physicists in an attempt to explain discrete atomic emission spectra by analogy with harmonic oscillators. Here, Q is the hypothesis that an
oscillator such as a tuning fork or vibrating string vibrates at discrete overtones that are integral multiples of a fundamental frequency. Although Q is well supported, much of that support derives from residual factors K that are not part of the positive analogy P between vibrating strings and atoms. In fact, the positive analogy is slight, consisting merely of general resemblances between sound and light and the fact that both systems manifest discrete frequencies of vibration. Despite a high value of Pr(Q /P K), it could be argued that there is no case to be made for assigning other than negligible value to Pr(Q/P). Again, the Reflection step appears to be in trouble. The Reflection argument, then, is unnecessary in the first example and seems to fail for examples such as the acoustical analogy and life on other worlds, where residual factors loom large. But let’s consider the two problematic examples more carefully. If the modal argument is remiss because it neglects epistemic considerations, then the above discussion of the two problem cases is equally one-sided in ignoring pragmatic considerations. In this regard, the examples share two notable features. First, both examples occur in epistemically impoverished settings. There is little doubt, from a historical perspective, that the appeal of explaining emission spectra through the acoustical analogy gained from the fact that there was no other plausible explanation in the running. Standing waves, harmonic overtones, and the like were the only known wavelike phenomena exhibiting discrete vibration frequencies, and thus the only plausible source domain for conjectures about discrete atomic spectra. A similar point applies to the argument for life on other worlds: we lack authoritative knowledge about possible source domains besides earth. 
Second, in both examples, we find advocates who are aware of the weakness of their argument and working hard to increase the positive analogy.34 Physicists went to great lengths to interpret spectral lines as overtones. Those who look favorably on the hypothesis of life on ancient Mars deliberately incorporate counteracting causes into the positive analogy: microbes exist in frozen Antarctic lakes despite intense cold and the absence of liquid water. We know that such efforts ultimately failed in the case of the acoustical analogy. We don’t yet have a verdict on Mars. The important point, illustrated by both cases, is that we can sometimes suspend judgment while we attempt to increase the positive analogy. The decision to pursue a hypothesis this far already implies a serious commitment that has to be reflected in an assignment of nonnegligible prior probability. From this discussion, we take away a nuanced understanding of the Reflection step. We start with an appreciable value for Pr(Q /P K). At one extreme, we find strong analogical arguments, such as Priestley’s, where the critical factors all belong to the positive analogy and the set K of residual factors is irrelevant. At the other extreme, we encounter weak but passable analogical arguments, such as the acoustical analogy, where most of the critical factors belong to the neutral analogy. These are borderline arguments whose plausibility hangs in the
34. Chapter 4 provides detailed discussion of this point.
balance: we may suspend judgment while trying to enlarge the positive analogy, so long as we lack plausible rivals. Typically, situations where the positive analogy is small are also situations where we have few rivals.
All of this leads to the Residual factors constraint in principle (8.5.1): for an analogical argument to establish nonnegligible prior probability, there should be no independent reason to assign a low value to Pr(K / P). The idea is really that, from the vantage point of t0, there has to be a serious possibility of a prior association at t1 if we are to apply the Reflection step in the argument. In some ways, this new constraint has been implicit all along in the no-critical-difference condition. That condition requires that the residual factors K belong to the neutral, rather than the negative, analogy. The Residual factors constraint is an obvious extension of that principle: we should not have independent reason to think that the critical factors K probably belong to the negative analogy.35
What has emerged from our discussion is the difference between cases such as the acoustical analogy or life on other worlds, where we can reasonably suspend judgment, and the case of the Kant scholar, where we antecedently believe that the residual factors have a very low conditional probability of being present. It is appropriate to make the condition explicit, and to acknowledge that the judgment about the residual factors is at least partly subjective. It is also important to recognize that some version of the Residual factors constraint should be incorporated into principle (8.2.7), the account of prima facie plausibility. Still, this modification does not represent a fundamental change to our theory of analogical arguments.
STEP 2: SYMMETRY (Pr(Q / P) > α ⇒ Pr(Q* / P*) > α − ε)
The symmetry step rests on two assumptions. The first is that it is appropriate to regard the epistemic situations in which these two conditional probabilities are assigned as symmetric. The second is that the existence of symmetry mandates the assignment of comparable conditional probability. I represent this as the assignment of approximately the same lower bound for Pr(Q* / P*) as for Pr(Q / P).
The first assumption is the same here as for the modal version of the argument. The idea is that the models of chapters 4 through 6 do their job. They identify an appropriate set of critical factors and appropriate standards of similarity for justified symmetry judgments. Certainly, they are fallible, but the models need only represent reasonable standards.
The second assumption rests on van Fraassen’s Negative Symmetry Principle, principle (8.2.2). If we can’t exhibit any relevant difference between the hypothetical source domain at t0 and the target at t1, then the conditional probabilities should receive approximately the same value (or range of values).
35. Footnote 7 anticipates this way of interpreting the no-critical-difference condition.
The inclusion of the factor ε is meant to acknowledge that in most cases, the symmetry is not perfect.
The principal worry here is that we become vulnerable to the inconsistency and arbitrariness that plague naïve applications of the Principle of Indifference. How do we know that our appeals to symmetry will not lead to incompatible probability assignments, as in the case of steel cubes (Example 8.6)? My response combines our discussion of symmetry and subjective probability (section 8.4) with the idea that analogical arguments are employed to extend partial probability models. In section 8.4, I noted two styles of symmetry argument that can evade inconsistency. There are tight arguments where internal symmetries generate a unique and consistent set of prior probabilities, but there are also loose arguments that appeal to external symmetry. This second style is the right one here. With analogical reasoning, we are in open country. We appeal to analogy (and symmetry) not to force a precise probability assignment; rather, the analogy merely provides a lower bound as we extend our partial probability model. There are too few symmetries to produce worries about inconsistency.
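The residual-factors objection to step 1, and the role of the Residual factors constraint in answering it, can be checked numerically. The sketch below uses invented numbers for the Fred example (they appear nowhere in the text and are assumptions chosen only to make Pr(K / P) tiny and Pr(Q / P K) moderately high), and the helper names `prob` and `cond` are hypothetical. It verifies the conjunction principle and the lower bound Pr(Q / P) ≥ Pr(Q / P K) Pr(K / P), which follows from that principle because Pr(K / Q P) ≤ 1.

```python
import math

# Hypothetical joint probabilities over (K, Q), all conditional on P
# ("Fred is reading a book from a university library"). The numbers are
# invented for illustration only.
joint = {
    (True, True): 0.0006,       # Kant scholar, book contains "7 + 5 = 12"
    (True, False): 0.0004,      # Kant scholar, book does not contain it
    (False, True): 0.0004995,   # not a Kant scholar, book contains it
    (False, False): 0.9985005,  # not a Kant scholar, book does not contain it
}
assert math.isclose(sum(joint.values()), 1.0)


def prob(event):
    """Probability (given P) of the (k, q) cells that satisfy `event`."""
    return sum(p for (k, q), p in joint.items() if event(k, q))


def cond(event, given):
    """Conditional probability of `event` given `given`, both relative to P."""
    return prob(lambda k, q: event(k, q) and given(k, q)) / prob(given)


def is_k(k, q):
    return k


def is_q(k, q):
    return q


pr_k_given_p = prob(is_k)         # Pr(K / P)
pr_q_given_p = prob(is_q)         # Pr(Q / P)
pr_q_given_pk = cond(is_q, is_k)  # Pr(Q / P K)
pr_k_given_qp = cond(is_k, is_q)  # Pr(K / Q P)

# Conjunction principle: both sides equal Pr(Q K / P).
lhs = pr_q_given_pk * pr_k_given_p
rhs = pr_q_given_p * pr_k_given_qp

# Because Pr(K / Q P) <= 1, the left-hand side is a lower bound on Pr(Q / P).
lower_bound = lhs

print(f"Pr(K / P)   = {pr_k_given_p:.6f}")
print(f"Pr(Q / P)   = {pr_q_given_p:.6f}")
print(f"Pr(Q / P K) = {pr_q_given_pk:.6f}")
print(f"identity holds: {math.isclose(lhs, rhs)}")
print(f"lower bound on Pr(Q / P): {lower_bound:.6f}")
```

With these assumed numbers, the appreciable value of Pr(Q / P K) sits alongside tiny values for both Pr(K / P) and Pr(Q / P), as the objection predicts. The lower bound also suggests why the constraint helps: if Pr(K / P) is not extremely low and Pr(Q / P K) is appreciable, their product keeps Pr(Q / P) from being negligible.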
8.6 Implications and Objections
The relationship between analogies and prior probabilities outlined in the preceding section has some important consequences. The analysis also gives us the means of responding to a number of important objections.
Degrees of Plausibility
Among analogical arguments that meet our criteria for prima facie plausibility, some are better than others. At the end of section 8.2, I noted that the modal approach is powerless to accommodate distinctions in degree of plausibility. With a probabilistic approach, we have the beginnings of a framework that lets us compare the relative strengths of different analogical arguments.
Quantitative measures enter into the factorization argument at two points: the initial value of Pr(Q / P K), and the Reflection step where we pass to a value for Pr(Q / P). This gives us two different measures of strength. First, certain categories of analogical argument tend to be stronger because they are associated with large values for Pr(Q / P K). In particular, in a mathematical analogy P K implies Q, and therefore Pr(Q / P K) = 1. Second, an analogical argument becomes stronger as we increase the positive analogy and shrink the set K of residual factors. This idea was illustrated in the previous section, in our comparison of Priestley’s argument and the acoustical analogy.
We maximize on both measures if Pr(Q / P K) = 1 and there are no residual factors. It is worth noting that even analogical arguments of this caliber can fail. A good illustration is Example 5.6 (abelian groups), an analogical argument in support of a rather plausible mathematical conjecture (by my lights) that turns out to be false.
In terms of assessing degree of plausibility, however, the biggest deficiency is that my theory does not incorporate any measure of degree of similarity between
source and target domains. Because of this, the theory remains first and foremost a modal analysis: its main business lies with the initial screening of hypotheses for prima facie plausibility.
Pragmatic Arguments and Dutch Books
How can an analogical argument, which brings no new evidence to bear, raise the probability of a hypothesis? It seems impossible, for a Bayesian, that there should be any incremental confirmation. As noted in chapter 1 (section 1.9), this looks like a version of the problem of old evidence. I want to focus on a different (but related) problem.
Van Fraassen (1989) has a well-known objection to any updating rule that assigns probability to a hypothesis in excess of what accrues from ordinary conditionalization, and thus an objection to any rule that boosts probability when there is no new evidence at all. The criticism, made vivid by the tale of Bayesian Peter, is that any such “ampliative” rule is vulnerable to a Dutch Book. Every time Peter contemplates the adoption of such a rule, he realizes that it would lead him to acknowledge as fair a system of bets that foreseeably leads to certain loss. Whatever one thinks of Dutch Book arguments, vulnerability of this sort provides strong reason to think that one is operating with a defective policy. I accept this point completely.
Suppose that, despite van Fraassen’s arguments, we remain convinced that some non-evidential theoretical virtue contributes to confirmation. One strategy, illustrated in recent work on unification (Myrvold 2003), is to show that the theoretical virtue is already taken into account by conditionalization. An alternative strategy is to show that the theoretical virtue may justifiably influence our prior probability assignments, though not in any way that can be encapsulated as an updating rule. Principle (8.5.1), Non-negligible Probability, falls under this alternative strategy. It is not an updating rule. It is a guideline for extending a partial probability model by adding a new prior probability for a newly considered hypothesis. No foreseeable Dutch Book can arise where there is no antecedent subjective probability.
This way of avoiding difficulty, however, is a little too easy because analogical reasoning is not limited to situations where one entertains a proposition for the first time. Darwin’s analogy between artificial and natural selection (Example 4.8) was directed toward a hypothesis that had already attracted considerable attention, as he explains in the opening pages of The Origin of Species. In cases such as this, analogical arguments do aim to shift existing probabilistic opinion without marshaling new evidence. To legitimize this use of analogy, we need a different way of avoiding van Fraassen’s Dutch Book argument. Before going further, it is worth noting that the problem does not seem to arise for the simpler principle (8.2.7), Prima facie plausibility. Dutch Books are not an issue if we are operating in a pre-probabilistic framework. In a way, this is not surprising: within the rigorous framework of the probability calculus, we seem to require a clear mechanism to explain how analogical arguments can
influence our prior probabilities, but no such mechanism can be provided. As Hawthorne notes, some Bayesians accept that ongoing revision of prior probabilities (based on plausibility arguments) can be rational, but “the logic of Bayesian induction has nothing to say about what values the prior plausibility assessments for hypotheses should have; and it places no restrictions on how they might change” (2008). One easy solution to our problem is just to take for granted the rationality of appealing to analogical arguments to revise prior probabilities. But this still leaves us with doubts about how analogical arguments mesh with the Bayesian apparatus. The best solution, in my view, is suggested by van Fraassen himself. Like Hawthorne, he accepts that the orthodox Bayesian view, which recognizes only updating by conditionalization, is too rigid to accommodate all types of belief change. His proposal is that we can be voluntarists about our (probabilistic) beliefs. We can change our subjective probabilities in ways other than conditionalization. Such changes can be rational so long as they meet three conditions. The first is synchronic rationality: our beliefs must be coherent at any given time. The second is that such radical belief shifts cannot be based upon a policy or rule articulated in advance, since this would make us vulnerable to Dutch Books. The third condition is conformity to Reflection: present opinion must lie in the range of anticipated future opinions. Principle (8.5.1), Non-negligible Probability, is an expression of this voluntarism. It applies equally to situations of complete novelty and situations of radical revision, where one effectively adopts new priors relevant to some particular problem. It does not dictate what one’s updated probabilities should be; rather, it provides constraints of rationality which are largely the expression of van Fraassen’s own General Reflection and symmetry principles. 
Perhaps the most important point to emphasize here, once again, is that my theory supplies only a constraint on pairs of conditional probability assignments (in the two domains). It mandates that we bring two conditional probabilities into line, ensuring that one probability is nonnegligible with respect to the other. It does not tell us how to accomplish this. It does not propose any mechanism or rule for equalizing the two conditional probabilities.
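The third of the voluntarist conditions described above, conformity to Reflection, lends itself to a very small sketch. The function below is a deliberately simplified reading: it treats "the range of anticipated future opinions" as the closed interval between the smallest and largest anticipated values, which is an assumption made for illustration. The function name and the numbers are hypothetical.

```python
def satisfies_reflection(current: float, anticipated: list[float]) -> bool:
    """Check whether a present opinion lies within the anticipated range.

    Simplifying assumption: the range of anticipated future opinions is the
    closed interval from the smallest to the largest anticipated value.
    """
    if not anticipated:
        raise ValueError("at least one anticipated future opinion is required")
    return min(anticipated) <= current <= max(anticipated)


# Hypothetical values: the opinions about Q that Tracy anticipates holding
# later, depending on what she learns about the residual factors K.
anticipated_future = [0.05, 0.6]

# A present opinion inside the anticipated range conforms to Reflection.
print(satisfies_reflection(0.2, anticipated_future))

# A negligible present opinion, below every anticipated value, does not.
print(satisfies_reflection(0.001, anticipated_future))
```

On this simplified reading, a negligible present value fails the check whenever every anticipated later opinion is appreciable. The check constrains where the present value may lie; it does not single out a value or supply a rule for reaching one, in keeping with the point that the theory proposes no updating mechanism.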
Mathematical Analogies
The connection between analogy and symmetry is as important in mathematics as anywhere else. But there is a well-known difficulty in applying the probabilistic apparatus in this case: coherence requires that all subjective probabilities of mathematical statements be 0 or 1. One possible response is to retreat to the general modal argument of section 8.2, which offers a nonprobabilistic account linking symmetry and analogical reasoning.
I prefer not to take that route. Plausibility comes in degrees in mathematics, just as it does everywhere else. Mathematicians don’t hesitate to speak in terms of probabilities when talking about unproved conjectures because probabilities have many of the right features for characterizing plausibility reasoning (Polya 1954, Hacking 1967, Gaifman 2004). A major thesis of this
book is that the basic mechanisms of analogical reasoning are common to mathematics and the empirical sciences. That thesis provides strong impetus not to adopt fundamentally different approaches to analogical reasoning within and outside mathematics. There are technical resources that can help with this problem, akin to the partial probability models of the present chapter. The basic idea (sketched in section 5.7) is that symmetry and analogy arguments are applied to a fragment of mathematics. I shall not here pursue the matter further.
Disappearing Analogies
The probabilistic approach gives us an interesting way to resolve the “disappearing analogy” problem mentioned at the end of chapter 7. The problem, once again, takes the form of a dilemma. In the target domain, the positive analogy P* either counts as evidence for the hypothesis Q* or it does not. Either way, facts about the source domain can make no difference to our justification for Q* because they provide no evidence over and above the support (if any) provided by P*. For instance, in Priestley’s analogy, P* includes the absence of force inside a hollow shell of charge. Either this constitutes evidence for the inverse square law of electrostatics (Q*) or not. No extra support comes from citing analogous facts about gravitational force.
This objection is a variant of the Bayesian Peter objection already considered, and the response is similar. If Q* is novel, then we cannot regard P* as evidence in the Bayesian sense because we have no prior probability to update via conditionalization. The point of the analogy is to provide guidance in assigning the (relative) prior probability Pr(Q* / P*). If Q* is not novel, it is nonetheless true that P* is not evidence in the usual sense because the point of the analogy is to motivate a revised (relative) prior probability assignment.
There is an important general point here that goes beyond Bayesianism and may be traced at least to nineteenth-century ideas about confirmation. As we noted in chapter 1, Whewell and Herschel rejected an unrestricted hypothetico-deductive model of confirmation, according to which any hypothesis receives support if its deductive consequences are observed. Both thinkers insisted that a hypothesis capable of confirmation must involve a vera causa, which I interpreted as a prima facie plausible hypothesis. The Bayesian analog is a hypothesis with nonnegligible prior probability (relative to some other hypothesis).
In arguing that justification is required for a nonnegligible prior probability assignment, we have a clear role for analogies in confirmation and that resolves the disappearing analogy problem.
8.7 Conclusion
An analogical argument constructs a link between two domains. The analysis of this chapter offers ways to understand this link both intrinsically (“from the inside”) and extrinsically (“in the large”). From the inside, an analogical argument
that satisfies our criteria is justified by a two-step factorization involving Reflection and a symmetry argument. The factorization can be given either a modal or a Bayesian formulation. From the outside, an analogical argument is located within the larger project of extending a partial probability model by means of symmetries between domains. Although I have argued that analogical arguments may be traced to norms of rationality, we still face the question: what reason do we have to expect analogical arguments to work? The best answer I can give is that our models of analogical reasoning provide a forum that lets us debate about, and ultimately identify, the “right” critical factors, and hence the appropriate invariants for establishing symmetry between two domains. We can only hope that experience provides a reliable set of models and a corresponding sense of what counts toward symmetry in normal investigations. Our models may not be applicable to non-scientific inquiries, however, or to the analogical arguments that figure in scientific revolutions. Revolutionary science requires profound changes to the way in which we solve problems. In the final chapter, I explore the extent to which our theory might be pushed beyond the bounds of normal science.
9 A Wider Role for Analogies
9.1 Introduction: Beyond Mathematics and Normal Science
In the introduction to this book, I stressed the goal of seeking a theory of analogical arguments that provides unifying principles yet recognizes the diversity of this form of reasoning. In early chapters, I suggested that analogical reasoning has its own logic and should not be assimilated to any standard deductive or inductive argument pattern. I then argued that my articulation model—two general principles together with a set of models—provides criteria for evaluating analogical arguments. Briefly, for an analogical argument to be good, it must clearly articulate a relationship (the prior association) in the source domain and then demonstrate the potential for generalizing that relationship to the target domain. Chapters 7 and 8 proposed two separate philosophical justifications for this theory. The first is based on the idea that good analogical arguments achieve an optimal balance between competing epistemic values; the second traces plausible analogical reasoning to modest symmetry principles.
In presenting these ideas, most of the examples and discussions have been oriented toward normal scientific inquiry.1 Indeed, two salient features of normal science are closely aligned with the two fundamental principles of my theory. First, normal scientific reasoning exhibits regular structure. We can agree upon standard forms for representing common sorts of causal, logical, or functional relationships. These are the relationships that constitute the prior association at the heart of an analogical argument. The second salient feature of
1. See section 7.3 for a summary of Kuhn’s concept of normal science.
normal science is the importance of various forms of generalization: classification systems, abstract models, common causes, and invariant relations. In my theory of analogical reasoning, this point is reflected in the principle of potential for generalization. These two principles, and the corresponding features of normal science, are central to my model for evaluating analogical arguments. Both characteristics are also readily apparent in mathematics. As in normal science, and more generally as in any puzzle-solving activity, there is structure (clear rules) and the best solutions are the most general ones that fit the case at hand. Yet not all cognitive activities in which analogies play a role can be characterized as structured puzzle-solving. As we move to other settings, the characteristics just described begin to disappear. These other settings include the use of analogies in scientific revolutions and in arguments outside of science. As an extreme case, the alleged use of analogical reasoning by nonhuman animals appears to be neither rule-governed nor oriented toward generalization.2 A unified perspective on analogy may be our objective, but it is natural to wonder whether my theory applies beyond mathematics and normal science. What can we say when the two features noted above are diminished or absent? Can the articulation model be adapted or must we settle for a fragmented account of analogical arguments—an account according to which plausible analogical arguments are one thing in the case of normal science and mathematics, and another thing everywhere else? I argue that it is possible to generalize my theory to accommodate applications outside normal science, although the resulting theory is less sharp. In this chapter, I also consider two larger questions. Does my theory of analogical arguments have connections to other forms of reasoning and argumentation? I argue that it helps to shed light on the topic of thought experiments. 
Does it engage with empirical research on the broad topic of analogical cognition? Again, I offer a positive answer, based on the idea that my theory encapsulates commonsense norms that should be reflected in any realistic model of analogical reasoning. The chapter is organized in a series of circles moving out from the core setting of normal science: scientific revolutions, analogical arguments outside science, thought experiments, and links to empirical research on analogy.
9.2 Revolutionary Analogies
A scientific revolution, as described by Kuhn (1962, 1970), is the transition from one set of concepts, experimental and theoretical techniques, fundamental assumptions, and model problem solutions—in short, a Kuhnian paradigm—to a very different such set. Revolution is one possible response by the scientific community to persistent failures or anomalies for the theory—that is, to the breakdown of normal scientific practice—but a revolution can occur only when a new theory with some promise has emerged. The transition from one paradigm to another is both
2. See Oden, Thompson, and Premack (2001) for experiments on analogical reasoning by chimpanzees.
historically and philosophically complicated. Kuhn devotes an entire book (1957) to his classic example, the Copernican revolution. A vast literature, including several anthologies, deals with scientific paradigms and revolutions.3 Kuhn (1962, 1970) argues that a scientific revolution constitutes a change in worldview and that successive scientific paradigms are “incommensurable.” In part, this means that there is never an argument that, on pain of irrationality, compels individual scientists to abandon an old paradigm for a new one. Attempts to produce such arguments in favor of either the new or the old paradigm must inevitably appear circular because they incorporate paradigm-specific interpretations of the relevant evidence. The difficulty of comparing two paradigms is compounded by the fact that old concepts acquire new meanings during a paradigm shift. A good example is the concept of mass, which is a property of objects in classical Newtonian physics but relative to a frame of reference in the successor paradigm, special relativity (Feyerabend 1963). If the variation in meaning is significant, then translation of sentences from one paradigm into the other becomes difficult or impossible. This is a second and more radical sense in which rival paradigms are incommensurable: if the claims of one paradigm can’t be faithfully represented using the language and concepts of its rival, then a comparative assessment is out of the question. Paradigm shifts do occur, however, and Kuhn insists, especially in his later work (1977), that an individual scientist’s decision to embrace a new theory can be rational (although not compelled). Another important historical phenomenon, closely related to scientific revolution, is the transition from pre-paradigmatic to normal science. The transition from Ptolemaic astronomy to Copernicus’s heliocentric system counts as a revolution because the earlier Ptolemaic system was already paradigmatic science. 
By contrast, Franklin’s theory was the original paradigm that signaled the emergence of the new science of electricity. In this section, I shall consider two questions. First, can analogical arguments support plausibility judgments either during revolutions or in the development of an original paradigm? Second, if such revolutionary analogies do exist, can their role be explained by the theory that I put forward for normal science? The first question is less straightforward than it sounds. Revolutionary analogies obviously exist. They are as famous as the historical episodes themselves: Darwin’s analogy between natural and artificial selection; Maxwell’s analogy between electromagnetic phenomena and a mechanical system of rotating vortices and idle wheels; Huygens’s analogy between light and sound. Analogies are also common in the emergence of new sciences, as illustrated by our discussion (in chapter 6) of analogies in the early history of electrical theory. Serious evaluation of these historically complex analogies lies beyond the scope of this book, but we can ask what role they play. I want to defend, first, the point that such analogies are employed in plausibility arguments, and, second, the view that the model of analogical reasoning that I have put forward can be modified to accommodate these arguments.
3. Gutting (1980), Horwich (1993b), and Nickles (2003) are three examples.
To begin, then, analogies are employed to make revolutionary hypotheses plausible. A more fundamental point is that revolutionary hypotheses always require the backing of plausibility arguments. I argued this point for novel hypotheses within normal science in section 8.3, and plausibility arguments are even more important for revolutionary science. Thus, both in his prefatory letter and in the First Book of De Revolutionibus, Copernicus opens with arguments designed “at least to make it possible for his contemporaries to conceive the earth’s motion” (Kuhn 1957). Copernicus’s initial plausibility arguments, for the most part not analogical, are an important prelude to the substantive achievements of the later Books. The situation is similar for the three revolutionary episodes cited above, although here the plausibility arguments definitely involve analogies. The link between analogy and plausibility is clearest in Darwin’s case. In a May 1860 letter to Henslow, Darwin reflects upon the logical role of his analogy: Why may I not invent the hypothesis of Natural Selection (which from the analogy of domestic productions, and from what we know of the struggle of existence and of the variability of organic beings, is, in some very slight degree, in itself probable) and try whether this hypothesis of Natural Selection does not explain (as I think it does) a large number of facts? (1903, letter 100) I argued in chapter 6 that physical analogy plays a similar role for Maxwell.4 Finally, in the opening pages of his Treatise on Light, Huygens remarks: “It is in some other way [than the way in which a shot or an arrow traverses the air] that light spreads; and that which can lead us to comprehend it is the knowledge which we have of the spreading of Sound in the air” (1690/1962, 4). 
4. As noted in chapter 6, this interpretation of Maxwell has both supporters and critics.
From the Preface to the Treatise, it is apparent that Huygens sees the arguments in the rest of his book as cementing the case for the “spherical wave” hypothesis:
When things which have been demonstrated by the Principles that have been assumed correspond perfectly to the phenomena which experiment has brought under observation; especially when there are a great number of them, and further, principally, when one can imagine and foresee new phenomena which ought to follow from the hypotheses which one employs, and when one finds that therein the fact corresponds to our prevision . . . this ought to be a very strong confirmation of the success of my inquiry. (1690/1962, vi–vii)
The analogy with sound, then, serves to make the wave hypothesis initially plausible. In all three cases, we are dealing with what I have been calling prima facie plausibility. Kuhn rightly observes that the fate of a new paradigm rests upon concrete achievements—specifically, detailed solutions to problems that proved awkward or intractable for the old theory. But a potential convert to a new paradigm must first be satisfied that the novel theory is not completely ridiculous. Only then will the achievements of the new paradigm be considered seriously. Conversion rests on the second stage; nonetheless, plausibility arguments, and analogical arguments in particular, constitute an important first stage in making the case for a new paradigm.
Let me now turn to the question of whether my theory of analogical reasoning can accommodate revolutionary analogies. To begin with, it is important to realize that the issue of incommensurability is not directly relevant. To be sure, if the new paradigm is incommensurable with the old one, then we cannot draw upon the old paradigm as our source domain. In part, this is because the new paradigm rejects the problem-solving apparatus of the old one; in part, it reflects the prominent role that my theory assigns to the potential for generalization in assessing an analogical argument. There can be no generalization across incommensurable paradigms. But none of this is any cause for concern, since no revolutionary analogy ever employs the old paradigm as its source domain. Scientists want to see support for the novel elements of the new paradigm. These novel elements relate to problems that can’t be successfully resolved in the old paradigm. It follows that any analogical support must come from a source domain other than the old paradigm.5 Kuhn himself suggests that, both in the emergence of a first paradigm and in times of revolution, scientists look to “externally supplied” facts and beliefs to support novel ideas. These externally supplied facts may be lumped into two categories: facts drawn from remote branches of science, and heterogeneous facts and beliefs (consisting of casual observations or experiments, technical expertise, and philosophical views).
Corresponding to this suggestion, we might expect to find that the analogical arguments employed in scientific revolutions have source domains located either within a remote science or outside science altogether. That is just what we find for our three examples. Maxwell employs a mechanical model and Huygens draws on the theory of sound; Darwin appeals to the experience of breeders. That these two kinds of analogical argument can be used to support a novel hypothesis is problematic for my theory of analogical reasoning. Where the source domain is based upon casual observation or technical expertise, as in Darwin’s case or in early theories of electricity, there is no established pattern of explanation or prediction that can satisfy my requirement of prior association. Where the source domain is drawn from a remote science, as in Maxwell’s case, the requirement of potential generalization appears untenable. I consider these two cases separately, and then examine a form of analogical reasoning in which both requirements are problematic.
5. There is a well-known problem of demonstrating how a new paradigm (e.g., relativistic mechanics) in some way preserves the achievements of the old one (e.g., Newtonian mechanics). Nickles (1975) suggests that relationships between the old and new paradigms should be construed in terms of analogies, and that this helps to avoid problems of meaning incommensurability. So analogy may have some application to this problem. As my concern is with the different problem of supporting the novel aspects of the new paradigm, we can set this issue aside.
Nonscientific Source Domain
Consider, first, a revolutionary analogy such as Darwin’s, where the source domain is some nonscientific phenomenon. In this case, the source domain is our past experience with selective propagation of animals and plants. I discussed Darwin’s analogy in chapter 4 without even remarking upon its revolutionary status. I classified the prior association6 and evaluated the argument accordingly. But if there is no “science” of breeding, then the classification seems unjustified because my theory applies only to prior associations that can be articulated in conformity with some generally accepted pattern of explanation or derivation. That requirement must now be weakened or dropped. There are two ways to do this that would count as natural extensions of the models of analogical reasoning worked out for normal science in chapter 4. The first way is to slot the prior association into one of the existing forms used to model scientific practice. That strategy, arguably, works well in Darwin’s case; the substance of the argument is presented in chapter 4. Taking that route leads to a general concept of selection that embraces both the natural and artificial varieties. The second way is to wing it. More precisely, this strategy is to craft a new form of prior association that fits the case at hand, either by adapting a standard pattern or devising a brand new one. I employed just this strategy in chapter 7 in developing the high-level analogy between law and science. In the course of that argument, I cast the prior association into a decision-theoretic mold. We should appreciate that this strategy moves us toward a more general theory of analogical reasoning than what I have developed thus far. Instead of the requirement that the prior association conform to a recognized type, the new idea is that it should instantiate some valid general pattern.
That is an appealing idea, but it also represents a weakening of the theory, since we have no template for developing these general patterns. There is no prospect of working out all of the possibilities in advance.7 This proposal for generalizing the theory is at least consistent with my portrayal of analogical reasoning as a debate between an advocate and a moderately skeptical critic.8 The advocate may base the analogical argument upon a nonscientific source domain, but must still accept the critic’s invitation to elaborate the prior association. I have stressed that every form of prior association is provisional and subject to improvement. The motivation for the canonical representations of chapter 4 was to impose reasonable requirements of honesty and explicitness in representing the similarities and differences upon which an analogical argument depends. Those requirements can still be met, even when the source domain is not so well structured as we might wish, by careful elaboration of a new form of association.
6. In the “abductive/probabilistic” category.
7. Arguably, this problem arises in the law and the solution there is some form of case-based reasoning. But my concern is largely with individual analogical arguments in settings where we lack a rich library of cases.
8. See sections 1.2 and 4.3.
The impact of this modification to my theory, however, is significant. It adds a preliminary step to analogical reasoning: inventing a broadly acceptable pattern of association, based on the case at hand. Such analogical arguments are likely to be complex, and perhaps less persuasive than what we encounter when we follow recognized patterns of association. I shall return to this idea in section 9.3.
Remote Source Domain
A revolutionary idea, such as Huygens’s proposal of a wave theory of light or Maxwell’s introduction of the displacement current, is sometimes supported by analogy with a scientific theory in an apparently remote domain: sound in the case of Huygens, a mechanical configuration of rotating vortices for Maxwell. The challenge here is that there seems to be scant prospect for any generalization embracing two such different domains. Yet according to my theory, where there is no potential for generalization, there is no plausibility—or at least none shown by that analogical argument. This problem arises from a very natural assumption about analogical reasoning: the more dissimilar the domains, the weaker the argument. That assumption is clearly stated by Hume in his Dialogues Concerning Natural Religion: “Wherever you depart, in the least, from the similarity of the cases, you diminish proportionably the evidence; and may at last bring it to a very weak analogy, which is confessedly liable to error and uncertainty” (1779/1947, 144). If we focus just on our two examples, however, it looks as if current computational theories of analogy can solve the problem. These theories share Hume’s assumption, but with an important refinement: “Computational models of analogy have assumed that the strength of an inductive inference about the target is based directly on similarity of the analogs, and in particular on shared higher-order relations” (Lee and Holyoak, 2008). Approaches such as Gentner’s structure-mapping theory, or Holyoak and Thagard’s multiconstraint theory,9 assign a high degree of similarity to two domains represented as having the same “higher order” logical structure, regardless of differences in subject matter. Huygens’s and Maxwell’s analogies pose no special difficulty: since the source and target domains have the same high-level formal structure, they count as very similar.
I agree with the computational theorists’ view that the domains in these two examples, with respect to these particular analogical arguments, are not remote. But my agreement is not based on a fixed similarity metric that assigns great weight to shared higher order relations. Instead, it derives from the fact that Huygens’s and Maxwell’s analogical arguments fall into the same category as many others in normal science, such as Priestley’s spherical-shells argument. These are abductive analogical arguments, which rely upon a formal analogy to make an explanatory hypothesis plausible.10
9. See section 3.3.
10. See section 4.6.
In section 6.3 (and earlier in section 4.6), I argued at length that for an argument of this type to succeed, what unites the two domains cannot just be a common formalism. A “Pythagorean” analogy—a purely logical or mathematical resemblance in form, devoid of physical meaning—cannot support a novel scientific hypothesis. The analogical argument must rest upon similarities between observable effects in the two domains. Such similarities are represented in mathematical terms, but nevertheless have a physical interpretation. Indeed, for this category of analogical argument, it turns out that all of the critical features have a mathematical representation. Because our evaluation scheme restricts our attention to critical features, it is possible for two domains that deal with completely different subject matters to count as very similar. In short, these examples pose no threat to the requirement that a plausible analogical argument must show some potential for generalization. There is, however, an interesting historical contrast between the two examples. In Huygens’s case, the generalization (wave theory) continues to play an important role, whereas in Maxwell’s case, the intended generalization was probably a general mechanical model applicable both to the electromagnetic medium and to the imaginary system of vortices. The mechanical model eventually ceased to play a role even in Maxwell’s own thinking. This difference in the fate of the two generalizations does not undercut the importance of there being some initial prospect for uniting the two domains under a common generalization. So far, so good—but doubts remain. Can’t there be revolutionary analogies between remote domains where there is no genuine prospect of generalization? One excellent candidate is Hume’s extensive use of analogies between money and fluid—either fluid in general or “electric fluid” specifically. 
As Schabas writes: Hume may have been prompted to explore such properties as conservation, diffusion, capacity, condensability, ubiquity, and vitality [of money] because of his knowledge of recent investigations on the electric fluid. This would help account for the remarkably original character of his monetary analysis. Previous thinkers had compared money to water or blood, but Hume was unprecedented in his command of monetary fluctuations, the rapid flow of money from one region to another, and the return to a state of equilibrium both locally and globally. (2006, 73) We have here an analogy that was hailed as innovative (and a fortiori plausible), despite the apparent impossibility of any general “theory of fluids” that might unite the diverse cases of water, money, and electricity. This seems especially ironical in light of Hume’s indictment of the analogies of religion, and more generally his dim view (quoted above) of analogies between highly dissimilar phenomena. But the case is not so simple. Schabas continues: It is my belief that Hume was influenced by the widespread contemporary effort to treat common but mysterious substances as if they were fluids and that this in turn accounts for his ideas on the behavior of
money. The fact that the theory of money is also the area in which Hume most often conducts thought experiments may be additional evidence that the broader context of his political economy was that of natural philosophy. (2006, 73–74) If we locate Hume’s investigations in the “broader context” of natural philosophy, then we restore the possibility of generalization between the different types of fluids—just as by locating Maxwell’s investigation of electromagnetic phenomena within a broadly mechanical tradition, we restore the possibility of generalization for his physical analogies. I cannot prove that all historically plausible scientific analogies satisfy my requirement that there must be serious potential for generalization. But the thesis is both historically and philosophically viable. It provides the only means I can think of for making the link between analogical arguments and plausibility.
Art and Science
In a recent article, Lehrer speculates about the influence of cubist painting on Niels Bohr. After noting Bohr’s well-known proof that the “solar system” model of the atom was unstable, Lehrer suggests that Bohr adopted cubism as a “new metaphor”: “For Bohr, the allure of cubism was that it shattered the certainty of the object. The art revealed the fissures in everything, turning the solidity of matter into a surreal blur. Bohr’s discerning conviction was that the invisible world of the electron was essentially a cubist world” (2008). Lehrer goes on to argue that in order to make progress, scientists need models drawn from the arts, in no small part because of their imprecision:
But the surreal nature of physics is precisely why it needs the help of artists. The science has progressed beyond our ability to understand it, at least in any literal sense . . . . This is the irony of modern physics: it seeks reality in its most fundamental form, and yet we are utterly incapable of comprehending these fundaments beyond the math we use to represent them. The only way to know the universe is through analogy. (2008)
Visual analogies, metaphors, and models are essential for scientific progress, and especially for revolutionary breakthroughs. Stafford (1999) also makes a case for the relevance of the visual arts to understanding analogical thinking. Indeed, she makes the sweeping claim that all analogical thinking is inherently visual activity. Her main argument appears to be that in visual perception, and especially in our perception of visual art, we constantly dissolve and reform boundaries in order to process images. It is precisely this ability to play with traditional categories that is the key to analogical thinking—an idea that is certainly shared by some cognitive scientists, notably Hofstadter (1995). Prominent scientists and mathematicians express similar sentiments.
In chapter 5, we noted Poincaré’s opinion that analogies “not expressible by formula” are the most valuable ones, and that principles guiding
discovery “must be felt rather than formulated” (1952a, 57). A leading theme in Hadamard’s (1949) Essay on the Psychology of Invention in the Mathematical Field is the indispensable role of imprecise visual representation in creative thinking. Suppose we tone down these ideas to the more modest claim that there exist visual analogies—vague, imprecise, and difficult to characterize—and that these visual analogies at least sometimes play a large role in innovative scientific thinking. Even this modest claim, and certainly the bolder versions that assert the indispensability of visual analogies, seems to raise important questions for my theory and for my analysis of revolutionary analogies in particular. I have been unwilling to weaken the requirements of my theory: a plausible analogical argument must be based on a clearly articulated prior association that has demonstrated potential for generalization. But this theory seems to exclude all “reasoning” via visual analogy, which must inevitably fail to meet both requirements. In fact, I am sympathetic to these ideas, and I don’t believe that my theory banishes the visual arts and visual analogies from scientific thinking. My reasons are the same as in my discussion of Hofstadter’s Copycat program in section 3.5. I wrote: Copycat presents a powerful case that analogy-making is in large part the perception of patterns, and that it is properly modeled with low-level cognitive processes. But in no way does this eliminate the need for a logical model of analogical reasoning. To appreciate this point, note first that even in Copycat’s own domain, we can examine the program’s output and ask: is that a good answer? Most theories of visual analogy belong with perceptual models of analogical cognition. They cannot provide a basis for critical assessment. However we produce a piece of analogical reasoning, we still need guidelines for its evaluation; such reasoning is far from infallible. 
If we abandon the idea that we can apply logical criteria to analogical reasoning, we must abandon the idea that analogies play any part in the rational selection and confirmation of scientific theories. Scientists have drawn, and should continue to draw, on all sorts of remote and inchoate sources for inspiration. But they should be critical of their muses, and so should others. If analogical arguments appear in print, they need to be presented in a justifiable form. Perceptual and logical models of analogical reasoning are complementary: neither should be excluded.
9.3 Analogical Arguments Outside of Science
The importance of analogical arguments outside of science is obvious, both in everyday life and in disciplines such as the law and philosophy. What is more, analogical reasoning existed for many thousands of years before science arrived on the scene. It is widely accepted that scientific inference evolved from
commonsense patterns of reasoning. Consequently, we should expect continuity between analogical reasoning within and outside science. This point should be reflected not only in psychological models of analogical reasoning but also in normative theories. My objective in this section is to outline how my theory, given its initial orientation toward scientific analogies, can be broadened. There is an obvious danger of treating analogical arguments outside of science as second class. But that need not be an inevitable consequence of starting with analogies in science. My initial limitation to analogical arguments where we have a clearly defined form of prior association represents an idealization. An idealization is an assumption made in order to develop a model that gives us partial understanding of some phenomenon. In chapter 4, I formulated a set of canonical forms, or general patterns, and assumed that an acceptable prior association would take one of those forms. That counts as an idealization because those forms provided the framework on which I constructed my core theory of plausible analogical arguments. The assumption is not much of a stretch for scientific analogies, since those general forms are quite flexible. I now want to explain why this theory can also give due respect to non-scientific analogical arguments. In the first place, the classification scheme of chapter 4 is, in one sense, fully comprehensive. The general thesis of that chapter is that every plausible analogical argument is based on a relationship (in the source domain) of counterfactual dependence involving factors in the positive analogy P, the hypothetical analogy Q, and residual factors K. Different varieties of analogical argument correspond to different directions of counterfactual dependence: Q depends on P and K in predictive analogies, whereas P depends on Q and K in explanatory analogies. 
My models incorporate several forms that this dependence might take: logical, causal, and statistical. Since these forms of dependence figure in analogical arguments outside science, my theory is not limited to scientific analogies. What the theory lacks are models tailored to the many types of counterfactual dependence that we find in analogical arguments in the law, politics, philosophy, and so forth. By the same token, however, sharper models than mine could be constructed for applications within science as well. These are not objections to the theory but areas for further work. In the second place, I can outline an extended, or liberalized, version of my theory that replaces the requirement of a prior association conforming to some predefined template with the requirement that there be a prior association that instantiates some valid general pattern that we seek to extrapolate to the target domain. To show that an analogical argument is plausible, you have to be able to construct a general model. Such models require justification, but they may be provisional. There is no favoritism here for scientific models. This liberalized theory of analogical arguments may be programmatic, but it still has noteworthy philosophical consequences. The first and most significant point is that it retains the importance of scrutinizing the “vertical relationships” within each domain that are the basis for an analogical argument. By contrast, Humean views, which assess the strength of an analogical argument in terms of overall similarity between domains (or “mapping potential”), do not require such scrutiny.
A second and closely related point is that an acceptable theory of analogical reasoning should meet the “threshold” criterion: it should impose some threshold for prima facie plausibility. I borrow this idea from Woodward’s discussion of scientific explanation. He writes:

[We] need a theory [of explanation] that captures several different possibilities. On the one hand, there are generalizations and associated putative explanations . . . that are not explanatory at all; they fall below the threshold of explanatoriness. On the other hand, above this threshold there is something like a continuum: a generalization can be explanatory but provide less deep or good explanations than some alternative. (2003, 368)

It seems to me that exactly the same point applies to a theory of analogical reasoning. It should provide both a threshold below which an argument is not plausible at all and, for arguments above this threshold, a continuum of better and worse. Most existing theories focus on the continuum but provide no threshold. By contrast, my core theory focuses on the threshold and allows an analogical argument to be strengthened by increasing the positive analogy. That idea is preserved in my extended theory by insisting upon a prior association capable of generalization.

A third important point is that certain commonsense norms that have emerged in our analysis of scientific analogies are preserved in the extended theory. I develop this point in section 9.6.

Finally, in the extended theory, plausible analogical arguments are still open to generalization and thus amenable to the two lines of justification developed in chapters 7 and 8. That contrasts sharply with views on which analogical reasoning is either unjustified or rests on some crude form of induction.

There are, of course, objections that can be raised to the general view of analogical arguments that I am suggesting. I mention just two. First, there may still be concerns about the scope of the theory.
Where there is no prospect of articulating a form of prior association, there can be no plausible analogical argument. That seems to rule out plausible analogical reasoning in animals and young children. Let’s call this the “elitism” objection. Second, there is an obvious objection that the liberalized theory is too vague: it calls for general models without providing any guidance as to what counts as an acceptable model. In response to the elitism objection, consider how theories of causal modeling might counter an analogous objection: children and animals employ causal reasoning, yet lack the capacity for constructing abstract causal models. One interesting reply to this objection is that evolution does, in fact, hard-wire us for causal modeling. Based on studies of causal learning in infants, Plotkin writes, “we humans come into the world primed or predisposed to learn certain features of the causal structure of that world” (1997, 189). There have also been studies of analogical reasoning that devote considerable attention to the role of structured representations of the source and target domains—though largely with a view to evaluating the plausibility of
structure-mapping theories of analogy.11 That is, these studies tend to concentrate on the relative importance of high-level relations and superficial feature matching in such tasks as retrieval of a relevant source domain and analogical mapping. There is certainly potential here for further studies, perhaps even in young children or animals, that might tell us whether some representation of the direction and strength of intradomain relations (the two dimensions that matter in my theory) might influence the kinds of analogical inferences subjects are disposed to make. There is also a simpler response available to causal modeling theorists, provided we concentrate on the normative side of their theories. Particular causal judgments are assessed relative to a causal model, regardless of whether the agent making that judgment is able to articulate the model. In just the same way, we need a model to assess the validity of any analogical argument, regardless of whether its original advocate is capable of providing one. The second objection concerns the vagueness of my liberalized theory of analogical reasoning. Here I have no satisfactory answer. I have no recipe for formulating models. The best that I can do is point to prototypes in the six categories of chapter 4 and in the decision-theoretic analogy of chapter 7. I also offer one illustration of how the general theory might be put to work in the analysis of a well-known example from the philosophy of science.
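The threshold-and-continuum picture from this section can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the record `AnalogicalArgument`, its field names, and especially the numerical measure (the fraction of critical factors that belong to the positive analogy) are hypothetical and are not part of the theory as stated, which offers no such formula. The sketch only shows the shape of the idea: without a prior association capable of generalization there is no plausibility at all, and above that threshold a larger positive analogy makes for a stronger argument.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AnalogicalArgument:
    # Hypothetical record; field names are illustrative, not the book's notation.
    has_prior_association: bool   # is there a prior association capable of generalization?
    critical_factors: frozenset   # factors that play a role in the prior association
    shared_factors: frozenset     # factors known to hold in the target domain as well


def plausibility(arg: AnalogicalArgument) -> Optional[float]:
    """Return None below the threshold, otherwise a rough strength in [0, 1].

    The ratio used here is an assumed stand-in for "a continuum of better
    and worse"; it is not a measure proposed in the text.
    """
    if not arg.has_prior_association or not arg.critical_factors:
        return None  # below the threshold: not plausible at all
    positive_analogy = arg.shared_factors & arg.critical_factors
    return len(positive_analogy) / len(arg.critical_factors)


if __name__ == "__main__":
    no_association = AnalogicalArgument(False, frozenset({"a"}), frozenset({"a"}))
    partial = AnalogicalArgument(True, frozenset({"a", "b"}), frozenset({"a"}))
    print(plausibility(no_association))  # None
    print(plausibility(partial))         # 0.5
```

Returning `None` rather than `0.0` is a deliberate choice in this sketch: it keeps "falls below the threshold" distinct from "is above the threshold but weak".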
9.4 An Analogical Argument for Entity Realism

Hacking’s (1983) argument for entity realism—the view that certain unobservable entities of physical theory really exist—is intriguing. In its simplest form, the argument runs: we (experimentalists) spray, shoot, and otherwise manipulate electrons; things that we manipulate exist; hence, electrons exist. Is this an elaborate form of G. E. Moore’s famous proof of the existence of an external world—here is one hand (electron), and here is another? Is it a disguised version of the argument for realism from the success of science? I shall argue that it can be usefully construed as an argument from analogy—and a good one, on my theory.

Two aspects of Hacking’s brand of entity realism are of particular importance, especially to the reader not versed in the literature on scientific realism. First, entity realists claim that some class of (unobservable) entities is as real as some other, widely accepted class. The strategy is to take for granted the existence of ordinary, middle-sized physical objects and argue that some kind of unobservable entity is just as real.12 Hacking’s argument focuses on the reality of electrons.

11. See Gentner, Holyoak, and Kokinov (2001) for a survey of some of this work.
12. Entity realists, then, are not tackling the venerable philosophical problem of skepticism about the reality of the external world.

Second, entity realism depends on our being able to make sense of the existence of entities like electrons independently of theory, or in a way that is minimally dependent on theory. Otherwise, the position collapses to theory
realism, or the view that our scientific theories are (or at least should aim to be) true (or approximately true). Hacking’s argument begins by noting the pragmatic attitude most experimentalists adopt towards physical theories: “Various properties are confidently ascribed to electrons, but most of the confident properties are expressed in numerous different theories or models about which an experimenter can be rather agnostic . . . Models good for calculations on one aspect of electrons will be poor for others” (1983, 263–64). Certain “sufficiently well understood” causal properties are ascribed to electrons, but since these properties can be “embedded” in many different theories, the experimenter can remain agnostic. Crucially, the causal properties assigned are just those that allow the experimentalist to manipulate electrons in experiments used to investigate other sorts of entities. That is significant, since Hacking’s criterion for taking something to be real is expressed in terms of causation and intervention: “Reality has to do with causation and our notions of reality are formed from our ability to change the world . . . We shall count as real what we can use to intervene in the world to affect something else, or what the world can use to affect us” (1983, 46). Now let’s consider Hacking’s argument. As Hacking points out, we judge ordinary objects to be real because of our causal interactions with them: “why else are we (non-sceptics) sure of the reality of even macroscopic objects, but because of what we do with them, what we do to them, and what they do to us?” (1982, 76). Notice: “we non-sceptics.” The appeal to manipulation could not demonstrate the reality of chairs and tables, much less electrons, to a total skeptic. Non-skeptics, however, accept the world of middle-sized objects. 
They do so not because that assumption has epistemological virtues—for instance, because positing the reality of middle-sized objects simplifies, organizes or explains our perceptual experience—but rather because our manipulations and interactions presuppose their reality. In consequence, we are entitled to take unobservable entities to be as real as ordinary objects when we come to interact with them in similar ways. We have the very same reasons for believing in certain unobservable entities—those we manipulate experimentally—as we do for believing in the entities of the commonsense, middle-sized world.

As Hacking explains, experiments such as Millikan’s measurement of the charge on an electron do not count as “manipulation” of electrons. What he has in mind are experiments that use electrons to investigate properties of other entities. He cites an experiment where physicists “spray” electrons to test hypotheses involving neutral bosons. Entities become real “when we use them to investigate something else” (1983, 272); this experiment supports the reality of electrons, not bosons.

Resnik (1994, 401) has summarized Hacking’s argument as follows:

1. We are entitled to believe that a theoretical entity is real if and only if we can use that entity to do things to the world.
2. We can use some theoretical entities, e.g., electrons, to do things to the world, e.g., change the charges of niobium balls.
3. Hence, we are entitled to believe that some theoretical entities, e.g., electrons, are real.

Resnik suggests that, read in this way, the argument is vulnerable to the objection that it is either dogmatic or a disguised form of inference to the best explanation. That is the familiar argument that we are entitled to believe that theoretical entities exist (and that our theories are approximately true) as their reality is the best (or perhaps the only) explanation for the success of our theories. Hacking himself rejects this argument. But, Resnik asks, how do we justify premise 2, which makes a claim about our causal powers? Either we assert it dogmatically, or else we justify it as we justify any causal claim: as the best explanation of some regularity, in this case our experimental success. The experimenter who reliably conducts experiments, eliminates “noise,” and overcomes “bugs” accepts premise 2 as the best explanation for experimental success. Hence, Hacking’s argument is a form of inference to the best explanation.

It is not my intention to evaluate this objection in detail, but I think that it must be wrong. Hacking explicitly rejects such a view because it “gets the time-order wrong” (1983, 265). We do not infer the reality of electrons from our experimental success; rather, we presuppose it in designing the experiments that reliably produce new phenomena. Conceivably, to the outsider the argument looks like inference to the best explanation. But to the experimenter, with her “immense practical knowledge of how to manipulate electrons,” the inferential relationship is quite different.

If we reject inference to the best explanation as the justification for premise 2, though, how can we avoid seeing its acceptance as dogmatic? To answer this question, we have to take seriously Hacking’s pragmatism: the goal is to show that electrons are as real as middle-sized objects, based on how our actions presuppose their existence.
I suggest that Hacking’s argument is best understood as an analogy, summarized in figure 9.1 (with the analogical inference represented by the wide arrow).

FIGURE 9.1. Interpreting Hacking’s argument as an analogy. [Diagram: in the source domain (macroscopic objects), “Causal interactions” is linked by a double arrow to “Real entities”; in the target domain (‘unobservable’ entities), “Causal manipulation” is linked by a double arrow to “Real entities?”; a wide arrow (⇒) leads from “Real entities” in the source to “Real entities?” in the target.]

How does this help? The source domain for the analogy is the world of macroscopic objects. Hacking does not clarify the nature of the relation between our causal interactions with objects and our belief in their reality—the prior
association—but it is plain from the earlier quotation that he does not take it to be inference to the best explanation. Most likely, there is a bi-directional relationship here, as represented in the double arrow: our causal interactions reinforce our sense that we are dealing with real entities (down arrow); our interventions and manipulations presuppose this realism (up arrow). We need not make this relationship entirely clear. It suffices that we take causal interaction and manipulation, rather than inference to the best explanation, as the basis for realism at this level. Realism about tables and chairs, in Hacking’s sense, is presupposed whenever we move them around. The point of casting this argument as an analogy is that Hacking can now argue that we, or at least the skilled experimentalists among us, have the very same sort of causal interactions with electrons when we “spray” or otherwise manipulate them (hence the double arrow in the target domain). Here, too, it is a matter of learning how to move things around. What basis do we have for conceding real existence in one case, but not the other? The unobservability of electrons, though a disanalogy, is not relevant because the observability of chairs plays no critical part in the prior association. There is no obstacle to generalization over the two domains. The analogical argument satisfies both of our fundamental principles. Hacking’s argument derives strength from the orneriness of the things being manipulated, and in particular from the intensive “debugging” or tinkering needed to get things to work just right. Physicists have to employ “crude mechanical means” to filter out electrons that scatter in the wrong direction. The correct manipulation of electrons is a craft that has to be painstakingly learned. This strengthens the positive analogy to the experience of working with ordinary macroscopic objects, and that is just the sort of thing my theory requires. 
Another way to appreciate the argument (and to find the relevant generalization) is to see that Hacking has proposed a new way to conceive of the division between problematic and unproblematic kinds of entities. The division should be made in terms of what is manipulable and what is not manipulable. Putting his point in these terms should help to make it clear that it begs the question to insist that the fundamental division is between observable and unobservable phenomena, and that Hacking’s notion of manipulability must reduce to observability.
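The point about unobservability can be made concrete with a small sketch. It assumes, purely for illustration, that each domain can be summarized as a set of labelled features and that the prior association can be summarized by the set of features that play a critical part in it; the function name and the feature labels are hypothetical. On that simplification, a difference between the domains counts against the argument only if it involves a critical feature.

```python
def relevant_disanalogies(source_features, target_features, critical_factors):
    """Return the differences between the domains that involve a critical factor.

    Differences that play no part in the prior association are ignored,
    which is the (assumed) reading of "not relevant" used in this sketch.
    """
    differences = set(source_features) ^ set(target_features)  # symmetric difference
    return differences & set(critical_factors)


# Illustrative labels only; they paraphrase the discussion above.
macroscopic_objects = {"causal interaction", "manipulation", "observable"}
electrons = {"causal interaction", "manipulation"}
prior_association = {"causal interaction", "manipulation"}

print(relevant_disanalogies(macroscopic_objects, electrons, prior_association))
# prints: set()
```

If observability were instead treated as critical to the prior association, the same call would return `{"observable"}`, which is in effect the assumption that Hacking’s manipulability criterion is meant to challenge.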
9.5 Thought Experiments and Hypothetical Models

In recent years, a number of philosophers have explored connections between thought experiments and analogical reasoning (Gooding 1993, Nersessian 1993, Palmieri 2003). Nersessian (1993) offers a way to understand the connection via mental modeling. According to her, the first step in a thought experiment is the construction of a mental model, which she characterizes in the following terms: A mental model is a structural analog of a real-world or imaginary situation, event, or process that the mind constructs to reason with. What it means for a mental model to be a structural analog is that it
embodies a representation of the spatial and temporal relations among and the causal structure connecting the entities and events depicted. (1993, 293) A thought experiment involves two further steps: the “manipulation” of this mental model to derive some result, followed by an inference that transfers that result back to the phenomenon being modeled. The mental model thus functions as the “source domain” in a piece of analogical reasoning. Nersessian writes elsewhere: “Because the simulation complies with the . . . constraints of the system it represents, performing a simulation with a model enables inferences about real-world phenomena” (2002, 141). A prominent example in Nersessian’s work is Maxwell’s use of “physical analogy” to derive his idea of the displacement current (Nersessian 2002). I am in agreement with Nersessian that Maxwell’s use of physical analogy is not merely heuristic.13 As his mechanical configuration of rotating vortices and idle wheels is an imaginary system, his use of this system arguably counts as reasoning with a mental model. The analogical step here is somewhat complex, since it involves not just the link between a mental model and a real phenomenon, but also a further analogy between mechanical and electromagnetic phenomena. Still, we do appear to have an example that is both a thought experiment and an analogical argument. An argument? For Nersessian, mental models are nonpropositional representations and reasoning with mental models is not conventional argumentation. The manipulation of a mental model, unlike the construction of an argument, involves operations that are “tacit.” The process is akin to perception and can be much faster than propositional reasoning. Nersessian concedes, however, a well-known thesis due to Norton: “thought experiments can often be reconstructed as arguments” (Nersessian 1993, 297). 
Norton’s position (1991, 2004), more precisely, is that cogent thought experiments can always be reconstructed as good arguments. I agree with Norton not only on this point, but also in his observation (in 2004) that if our concern is with the cogency (rather than the psychology) of a thought experiment, then reconstruction as an argument is the essential point and the focus on mental modeling may be a distraction.14 Norton also notes that not all thought experiments depend upon mental models. It is more appropriate to say that they all involve hypothetical models.

13. See my discussion of this example in section 6.3. As always, we need to acknowledge the wide range of opinions about the significance of this analogy in Maxwell’s thinking.
14. To repeat a point from my discussion of visual analogies in section 9.2, psychological and normative theories of analogical reasoning (and of thought experiments) can have complementary aims. To focus on the normative aspect is not to deny the interest of mental models.

These reflections suggest an analysis of thought experiments, reconstructed as analogical arguments, which combines Nersessian’s and Norton’s views. Replace “mental model” with “hypothetical model,” and “manipulation” with “derivation of a dependence relation.” A (reconstructed) thought experiment involves three steps: the construction of the hypothetical model, a derivation of some dependence result in that model, and an analogical inference back to the
phenomenon being modeled. Perhaps this formulation takes in cases that we would not ordinarily regard as thought experiments. Provided that it does apply to thought experiments, however, my theory shows how to represent them as analogical arguments. That gives us a specific version of Norton’s thesis: all good thought experiments may be reconstructed as good analogical arguments. In the remainder of this section, I draw attention to a number of advantages of this characterization of thought experiments. The first is that it accommodates a wide variety of thought experiments. The analogical argument is based on the dependence relations (i.e., the prior association) in the hypothetical model. For different types of prior association, we should expect different kinds of thought experiments. For instance, we have seen that there is a big difference between predictive and explanatory analogical arguments. We should expect differences between predictive and explanatory thought experiments. Galileo’s famous thought experiment showing that all bodies must fall at the same speed counts as predictive, while Maxwell’s (as argued in section 6.3) counts as explanatory. In the first case, we transfer a consequence derived in the hypothetical model; in the second, we transfer an explanatory hypothesis. We should evaluate such arguments differently. Taking this analysis a little further, our reformulation in terms of analogical arguments allows for thought experiments that are tentative, or exploratory. A hypothetical model might merely show that some result is a serious possibility rather than that it is inevitable or highly probable. Lennox thinks that many thought experiments in evolutionary biology play precisely this role. He writes: “Thought experiments are intended as tests, not of the truth of the statements comprising the theory, but of the explanatory potential of the theory. 
They are designed either to display, or to challenge, a theory’s ability to explain the full range of phenomena it claims for its domain” (1991, 223). According to Lennox, Darwin cooks up examples that illustrate “how the struggle for existence could produce a variety by means of selection” (Lennox 1991, 230). Darwin does not always intend his scenarios to be taken as serious attempts at explanation. Given the weak prior association, we have a weak analogical argument. An analogous result might be true in the real world, and that is all Darwin cares about. Exploratory thought experiments are important both in science and in other disciplines, such as philosophy and the law. Our characterization in terms of analogy allows us to locate this type of reasoning within the same genus as more compelling thought experiments. A second advantage of our characterization of thought experiments is that it sheds light on how we might assess the degree of plausibility of thought experiments. Nersessian writes that a thought experiment is often “so compelling that even in those cases where it is possible to carry it out, the reader feels no need to do so” (1991, 296). In terms of the characterization above and my theory of analogical reasoning, we can see that some thought experiments are compelling because they combine a strong prior association with a strong analogical argument. In order to appreciate these points, we should first note that because of the way in which a hypothetical model is constructed, the positive analogy is always
maximal. That is, every feature that plays a critical role in the hypothetical model (the source domain of the thought experiment) is known to be present in the target domain. In the case of the most compelling thought experiments, the features present in the hypothetical model make the derived result appear either inevitable (e.g., if it is entailed mathematically) or highly probable. Thus, we have both a strong prior association and a strong analogy.

Nevertheless, as Nersessian and many others have pointed out, thought experiments are fallible. That is compatible with our analysis: even with a strong prior association and a strong analogy, the hypothetical model may fail to incorporate factors present in the target domain that have unsuspected relevance.

There is one final advantage of characterizing thought experiments as analogical arguments that merits attention. It pertains to the justification of thought experiments. One of the most enduring puzzles about thought experiments is to explain how they can provide new knowledge about the world. As Kuhn puts the problem:

[If] we have to do with a real thought experiment, the empirical data on which it rests must have been both well-known and generally accepted before the experiment was even conceived. How, then, relying exclusively upon familiar data, can a thought experiment lead to new knowledge or to new understanding of nature? (1964/1977, 241)

If our analysis of thought experiments as analogical arguments is adopted, this becomes the more general question: how can an analogical argument based on a hypothetical source domain provide support for its conclusion? This in turn is a special case of the broader question that has been our focus in the last two chapters: how can any analogical argument (relying solely upon old evidence) provide support for its conclusion? I think that we gain something by moving to this broader formulation.
We have already had to confront the problem that Kuhn raises for thought experiments in making a case for analogical reasoning. We have faced (in chapter 8) the objections that analogical reasoning—if meant to provide justification rather than just to serve as a heuristic device—is “spooky,” rationalistic, unnecessary, and contrary to standard norms of belief revision. We might hope that the answers to those objections can shed light on the justification of thought experiments. The main idea of chapter 8 is that good analogical arguments impose symmetry-based constraints on pairs of judgments. Such arguments provide a reason to treat two analogous hypotheses with rough parity: if one holds that the hypothesis in the source domain is plausible, then one ought to regard the analogous hypothesis in the target domain as plausible. I think that this must be part of the solution to the puzzle about thought experiments. A result supported in the hypothetical model should receive comparable support in our theory of the target domain. As we saw in chapter 8, two sorts of complications arise for this argument. The first is that the positive analogy between source and target domains may not be extensive. It may be that many critical factors—those that play a crucial role in
the prior association—belong to the neutral analogy.15 For an acceptable thought experiment, however, I have suggested that the positive analogy is always maximal (and the neutral analogy is empty). So this first complication does not arise. That’s good news.

The other complication is to find an epistemological account that makes sense of this solution, avoiding familiar worries about incoherence and (in the probabilistic case) Dutch Books. Analogies, symmetry arguments, thought experiments, old evidence—all of these are troublesome when we try to incorporate them into a probabilistic framework. They all provide reasons to adjust one’s probabilities, but no defensible means of making the adjustments.

Before going further into this topic, let’s look at how Kuhn answers his own question. He first considers the following proposal: “Because it embodies no new information about the world, a thought experiment . . . can teach nothing about the world. Instead, it teaches the scientist about his mental apparatus. Its function is limited to the correction of previous conceptual mistakes” (1964/1977, 252). The suggestion is that thought experiments confront a scientist “with a contradiction or conflict implicit in his mode of thought” (261). Thus, Galileo’s famous thought experiment about falling bodies exposes contradictions that were already present in the Aristotelian system. The requirement of consistency signals the need for a change in the theory. This view is like my proposal, although I focus on symmetry (rather than logical consistency) as the source of constraint.

Kuhn rejects the proposal, though, because the “contradictions” that emerge in a thought experiment are not intrinsic to the previous scheme: “[The problems] arose, that is, not from his [the scientist’s] mental equipment alone but from difficulties discovered in the attempt to fit that equipment to previously unassimilated experience” (1964/1977, 261).
The thought experiment brings the contradiction to light by making relevant some class of experiences or set of problems that had previously been shunted aside. In Kuhn’s terms, a good thought experiment brings out a striking anomaly that is impossible to ignore, and ultimately triggers a (possibly minor) scientific revolution. The result is both a change in conceptual apparatus and a kind of learning about the world.

15. Recall that the neutral analogy consists of critical factors in the source domain whose analogs are not represented as either holding or not holding in the target domain.

While this picture may be correct for many thought experiments, I don’t think that all of them need to be linked to anomalies. I agree with Kuhn that the shifts in belief produced by a thought experiment imply both continuity and discontinuity with past practice, but I find this idea easiest to express in probabilistic terms. As I argued in chapter 8, a successful analogical argument leads to novel probability assignments, or to modifications of existing assignments, that can’t be expressed in terms of ordinary conditionalization. Such changes do resemble what goes on in the transition to paradigm science or in a small revolution. They can’t be rationalized in terms of purely evidential considerations because they have an irreducibly pragmatic component: they reflect a particular
response to norms of symmetry that govern how we should treat similar cases. That is what provides continuity with the past and entitles us to regard the modifications as rational. I can offer no clear mechanism for incorporating analogical arguments into a Bayesian framework because there can be no such mechanism.16 I did, in chapter 8, suggest an image: analogical arguments are used to extend (or alter) partial probability models. I think the same point applies to thought experiments.

To complete this sketch linking thought experiments to my theory of analogical arguments, we need to ask: doesn’t it matter that the source domain in a thought experiment is purely hypothetical? Doesn’t this point undercut all of the foregoing remarks about evaluation and justification? My first response is that my theory of analogical arguments says nothing that rules out hypothetical source domains. In particular, the symmetry-based justification of chapter 8 does not depend upon a real source domain in which the result to be transferred to the target is actually true. The starting assumption is that Q (the result to be transferred) is a serious possibility (modal version), or that Pr(Q / P · K) is appreciable (probabilistic version).17 The way is open not only to hypothetical models but also to source domains where Q might have been expected but is in fact known to be false.

Still, a little reflection suggests at least two important differences between analogical arguments with real and merely hypothetical source domains, and explains our preference for the former. The first is the great difficulty of making a convincing case for most types of prior association in a purely imaginary source domain. In this book, we have surveyed analogical arguments in which this association is variously of a logical/mathematical, causal or statistical nature. Only the first of these can be independently established in a hypothetical domain, through purely conceptual analysis.
To justify positing causal or statistical relationships in a hypothetical model, we have to rely either upon commonly accepted background assumptions or upon the extraction of detailed knowledge of the target domain. The great advantage of a real source domain is that causal and statistical relationships can be independently verified; they have a life of their own, independently of the analogy. This observation accounts for the fact that most compelling thought experiments belong to the formal sciences. The second major difference lies in the potential of analogies for guiding an extended program of research. In an ongoing investigation, an analogy with a purely imaginary source domain is generally much less useful than one that has a real source domain. This point is specifically directed at thought experiments involving causal and statistical relations. These arguments are typically tailored to support a specific conclusion and may have little further use. It is easy to see why, in light of our first observation: since most of the elements that go into the
16. Apart from Kuhn’s observations, another argument for this position is provided by van Fraassen’s reflections on updating rules, discussed in section 8.6.
17. Here, P represents the positive analogy and K stands for residual factors in the neutral analogy.
imaginary domain come from the target, we can’t expect it to be a source of independent conjectures.
9.6 Empirical Studies and Computational Models
Over the past twenty-five years or so, research on analogical reasoning has been heavily influenced by the structure-mapping approach pioneered by Gentner (1983) and refined in later computational theories. I surveyed some of this work in chapter 3. My aim in this section is to explore some connections between my own theory and what has come to be the dominant approach for cognitive scientists who work on analogical reasoning.
In structuralist theories, the primary object of interest is an analogy mapping between two structures. Given some type of structured representation for two domains, the analogy mapping puts elements of those domains into correspondence. The best mappings are those that maximize overall similarity between the source and target domains, with special importance attached to how well these mappings preserve the structure of higher order relations. In short, the best mappings are those that approximate an isomorphism. Analogical reasoning is identified with the construction of a mapping that incorporates a novel hypothesis about the target domain in order to maximize interdomain similarity. The more coherent or “systematic” the mapping, the better the reasoning.
It is important that views of this sort imply both a descriptive psychological theory, a characterization of how people actually perform analogical reasoning, and a normative theory, a characterization of good analogical reasoning. I have stressed the legitimacy of investigating separately these two aspects of analogy. My principal theme in this section, though, is that the normative and descriptive strands cannot be kept entirely separate and that researchers need to acknowledge the connection. Insofar as it does that, the structure-mapping approach is a reasonable starting point for understanding analogical reasoning. In what follows, I first discuss how Gentner’s structure-mapping theory has evolved.
People have brought both empirical and normative criticisms to bear on that theory. I argue that very limited progress has been made as regards the latter. I then examine why structuralist theories have so much difficulty with the normative side of analogical reasoning. Finally, I have a few suggestions about how a reorientation toward normative issues, and more specifically how some of the ideas in this book, might contribute to future research on analogical reasoning.
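As a rough illustration of the structuralist picture described above, the following Python sketch searches for the one-to-one object mapping that preserves the most relations between two toy domains and then reads off the candidate inference about the target. It is a minimal brute-force sketch, not the SME algorithm or any published model: the solar-system and atom facts, the function names, and the scoring rule (a bare count of preserved first-order relations, with no special weight for higher order relations) are all assumptions made for illustration.

```python
from itertools import permutations

# Hypothetical toy encoding: each domain is a set of relational facts.
# A fact is (relation_name, arg1, arg2, ...); all names are illustrative.
SOLAR = {
    ("attracts", "sun", "planet"),
    ("more_massive", "sun", "planet"),
    ("revolves_around", "planet", "sun"),
}
ATOM = {
    ("attracts", "nucleus", "electron"),
    ("more_massive", "nucleus", "electron"),
}

def objects(facts):
    """Collect every object mentioned as an argument of some fact."""
    return sorted({arg for fact in facts for arg in fact[1:]})

def translate(fact, mapping):
    """Carry a source fact into the target vocabulary."""
    return (fact[0],) + tuple(mapping[a] for a in fact[1:])

def best_mapping(source, target):
    """Brute-force search for the one-to-one object mapping that preserves
    the largest number of source relations in the target.
    Returns (None, -1) if the target has fewer objects than the source."""
    src, tgt = objects(source), objects(target)
    best, best_score = None, -1
    for perm in permutations(tgt, len(src)):
        mapping = dict(zip(src, perm))
        score = sum(translate(f, mapping) in target for f in source)
        if score > best_score:
            best, best_score = mapping, score
    return best, best_score

def candidate_inferences(source, target, mapping):
    """Source facts with no counterpart in the target: the novel
    hypotheses that the mapping projects onto the target domain."""
    return {translate(f, mapping) for f in source} - target

mapping, score = best_mapping(SOLAR, ATOM)
print(mapping, score)
print(candidate_inferences(SOLAR, ATOM, mapping))
```

Note that the score ranks mappings but is silent on whether the projected hypothesis is plausible.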
The Structure-mapping Engine and Its Successors
The original structure-mapping theory, implemented in the structure-mapping engine (SME) algorithm, is geared exclusively toward syntactic isomorphism. It ignores what Gentner refers to as “superficial feature matches” and takes no account of the meaning of the higher order relational structures that it puts into
correspondence. Defenders of SME provide an empirical justification for this approach: experts prefer analogies that map deep structures, “connected systems of belief, rather than independent features” (Gentner et al. 2001, 208). But the theory has come under pressure on both empirical and normative grounds. On the empirical side, Dunbar (2001) showed that people rely upon superficial matches in low-level problem-solving, and that such matches often play an important role in our retrieval of source domains for constructing analogies. There are also plenty of scientific analogies in which such features are important.18 On the normative side, cognitive science researchers have acknowledged that Gentner’s purely syntactic approach to mapping higher order relations does not do justice to the special significance of causal relationships in analogical reasoning or to the particular goals of the person performing the reasoning.
Successor theories have made improvements. Gentner and her colleagues combine SME with the MAC/FAC retrieval program, which takes feature-matching into account. Holyoak and Thagard’s multiconstraint theory (1989, 1995), with its associated ACME and ARCS programs, allows a role for individual features (and more broadly semantic constraints) in assessing overall similarity, and flags causal relations and other “pragmatically central” constraints as important. Some theories add a post-analogical “verification” step (Falkenhainer, Forbus, and Gentner 1989/90) that checks the conclusion of the analogical argument. Hummel and Holyoak’s LISA model (1997, 2003) recognizes the important link between analogies and “schema induction.”
It is fair to say, however, that both the psychological literature and the computational models have devoted considerably more attention to the empirical adequacy of their theories than to normative aspects of analogical reasoning.
Some attention has been given to “empirical norms,” the practices that appear to guide our reasoning. These empirical norms are treated as influences on the mapping process rather than as logical requirements for plausibility. To flag causal relations or “pragmatically central” features of the domain as important is not the same as evaluating whether or not we have a plausible analogical argument. To appreciate the importance of this point, we need only imagine a computational theory of deductive or probabilistic reasoning that models widely shared practices (including common fallacies) as “pressures” that help to produce judgments. Such a theory will fail to represent logical norms adequately.
Diagnosis
The failure of structuralist theories to take the normative side of analogical reasoning seriously is not too surprising, given both the empirical orientation of most of the research and the absence of a satisfactory “logic” of analogical arguments. Even taking these points into account, however, structuralist approaches still face a serious challenge in accommodating normative principles. The basic problem is this: for structuralists, there is really no such thing as an
18. See section 6.2.
analogical argument. And if there are no analogical arguments, it becomes very difficult to find a theoretical role for norms. Let’s take these points in turn. Of course, nobody denies that people construct analogical arguments. But as we have seen, structuralists regard analogical mappings, rather than arguments, as the fundamental psychological constructs. Arguments reduce to mappings. Causal constraints replace inferential norms. High-level norms must be interpreted as low-level causal “pressures” that support or inhibit pairwise correspondences between the source and target domains. The distinctly normative character of the analogical argument disappears. In section 9.3, I argued that an acceptable theory of analogical arguments should provide both a threshold below which an argument is not plausible at all and, for arguments above this threshold, a continuum of better and worse. That suggests a simple distinction between threshold norms, which govern the prima facie plausibility of an argument, and norms of degree. My specific concern is that structuralist theories have no place for threshold norms. “Pressures” on the mapping process might be able to account for norms of degree, but they cannot explain why some analogical arguments are acceptable and others are not.19 Structuralists have at least two ways to respond to this criticism. One way is to deny that we need threshold norms: every analogical argument is at least somewhat plausible. I don’t see this response as promising, both because it does not seem true to life and because (as I have argued repeatedly) analogical arguments are used in science to screen hypotheses for prima facie plausibility. The other response is to find a way to model threshold norms via constraints at the mapping level. 
Some structuralist programs do incorporate at least one such constraint, the n-ary restriction: an analogical relation must map objects to objects, n-place functions to n-place functions, and n-place predicates to n-place predicates. Most structuralists think this restriction is too rigid, and it is relaxed in recent theories (Forbus 2001; Hummel and Holyoak 1997, 2003). In any event, this is not the sort of threshold norm that I have in mind. The issue turns on whether there can be critical disanalogies that invalidate an analogical argument, as discussed in chapters 4 through 6. It is hard to see how a constraint-based approach can accommodate them.
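The n-ary restriction can be stated as a simple check on a proposed mapping. The Python sketch below is illustrative only: the signature tables, the symbol names, and the `plausibility` helper are hypothetical and are not drawn from any of the programs cited. It contrasts a mapping-level constraint of this kind with a threshold norm that operates on the argument, where a single critical disanalogy rules the argument out regardless of how well the mapping scores.

```python
# Hypothetical signatures: symbol -> (kind, arity). Objects have arity 0.
SOURCE_SIG = {
    "sun": ("object", 0),
    "mass": ("function", 1),
    "attracts": ("predicate", 2),
}
TARGET_SIG = {
    "nucleus": ("object", 0),
    "charge": ("function", 1),
    "attracts": ("predicate", 2),
    "between": ("predicate", 3),
}

def respects_nary_restriction(mapping, source_sig, target_sig):
    """True if every symbol is mapped to a symbol of the same kind and arity."""
    return all(source_sig[s] == target_sig[t] for s, t in mapping.items())

def plausibility(critical_disanalogies, degree_score):
    """Threshold norm first, norm of degree second.
    Returns None when the argument fails the threshold, whatever its score."""
    if critical_disanalogies:
        return None
    return degree_score

good = {"sun": "nucleus", "mass": "charge", "attracts": "attracts"}
bad = {"sun": "nucleus", "attracts": "between"}  # 2-place mapped to 3-place

print(respects_nary_restriction(good, SOURCE_SIG, TARGET_SIG))
print(respects_nary_restriction(bad, SOURCE_SIG, TARGET_SIG))
print(plausibility([], 0.7))
print(plausibility(["defeater present in target"], 0.9))
```

The first function inspects only the syntactic form of the correspondence; the second needs a list of critical disanalogies as input, and nothing at the mapping level says where that list would come from.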
19. It’s interesting that this problem does not arise for case-based reasoning programs, such as Ashley’s HYPO, discussed in section 3.4. Ashley’s program treats arguments, rather than mappings, as fundamental.
Looking Ahead
In order of increasing specificity, I have three suggestions about why psychological and computational models of analogical reasoning require at least a rudimentary normative account of analogical arguments, and how some of the ideas in this book might be helpful.
The first suggestion has already been made: we need a basic theory of analogical arguments to define a threshold for prima facie plausibility. I have advanced such a theory, principally in chapter 4 and in section 8.2, but my point here is only that some such analysis is needed. Less forcefully, although I think
that there are good philosophical grounds for believing that there is a threshold of acceptability for analogical arguments, this position can be treated as an empirical claim about our actual practice of reasoning by analogy. If we find that the “threshold” picture receives empirical support, then our models of analogical reasoning need to account for it. My second suggestion is that the two broad principles of my theory—prior association and potential for generalization—may be helpful in empirical work on analogical reasoning. Even if our objective is a psychologically realistic model rather than a normative theory, we should still expect certain commonsense norms to be widely reflected in analogical reasoning. I believe that the two broad principles are widely reflected not just in mathematical and scientific reasoning, but also in the law, in philosophy, and in everyday analogical deliberation. If that is true, then a realistic psychological model should incorporate these principles in some manner. It might appear that structuralist theories already incorporate both principles. By insisting upon a detailed representation of source and target domains, they respect the requirement of prior association. By favoring systematic analogy mappings, they respect the requirement of potential for generalization. These responses miss the mark, however, because they don’t provide guidance in identifying the critical factors that, on my theory, determine the success or failure of an analogical argument.20 To explain this point, it helps to focus on the category of predictive causal analogies. In a recent paper on the role of causal knowledge in analogical reasoning, Lee and Holyoak (2008) discuss the fact that increased systematicity does not always translate into greater inductive strength.21 As a simple illustration, they consider how an analogy between the earth and the moon might be used to argue for the presence of iron deposits on the moon. 
As they observe, “the fact that no mining operations have so far been conducted on the moon (a mismatch with a property of the source) seems to strengthen the analogical inference that iron deposits remain to be found on the moon” (297). More generally, “When a preventive causal relation present in the source analog is absent from the target, the strength of an inductive inference is correspondingly increased” (301). Lee and Holyoak have conducted empirical studies to show that most people reason in accordance with this “commonsense norm” that takes the distinction between contributing and preventive causes seriously. In their view, “[current computational models] are unable to grasp that the two types of causes are important in different ways, with very different implications for analogical inference” (301).
Generalizing from this observation, my point is that when our objective is to construct a mapping that preserves high-level structure, we tend to ignore the details of what goes on inside the prior association. We must rely upon syntactic properties (such as the order of a relation) or user-supplied “hints” to determine which features are most important to the argument. By contrast, the articulation model ensures that we represent the prior association in enough detail to determine which features and relations are critical and how they affect the feature being transferred to the target.
20. These views about the importance of identifying critical factors in determining whether an analogical argument is acceptable may, once again, be treated as empirical claims.
21. See sections 3.3 and 4.5, where I elaborate on this point.
This discussion leads to my third and most specific suggestion. My analysis of analogical arguments provides a number of specific “commonsense” norms that should arguably be reflected in a realistic theory of analogical reasoning. Lee and Holyoak have concentrated on one such norm, relating to contributing and preventive causes. But my theory prescribes different norms for different types of prior association, as outlined in the middle chapters of the book. In causal analogies, for example, the presence or absence of defeaters may be critical. In explanatory analogical arguments, gaps in the observational record may count as critical. Neither point is captured by systematicity. These specialized norms of analogical reasoning may be of interest independently of the rest of my theory.
The three suggestions just made reflect what I take to be the most important connections between psychological models of analogical reasoning and the modal version of my theory, which is directed toward criteria for prima facie plausibility. There is also potential for connection to the probabilistic version. Chapter 8 proposed links between symmetry, analogical arguments, and prior probability assignments. If those proposals hold up, they, too, might be reflected in psychological theories of analogical reasoning.
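Returning to the earth and moon illustration discussed earlier in this section, the point about preventive causes can be made concrete with a small calculation. The Python sketch below assumes a noisy-OR combination for contributing causes and a noisy-AND-NOT combination for preventive causes, a common parameterization in work on causal learning. It is not offered as Lee and Holyoak’s own model, and the causal strengths and names are invented for illustration.

```python
from math import prod

def effect_probability(generative, preventive):
    """Noisy-OR over generative causes, damped by noisy-AND-NOT over
    preventive causes. Each argument is a list of causal strengths in
    [0, 1] for the causes that are present in the domain."""
    p_generated = 1.0 - prod(1.0 - w for w in generative)
    p_not_prevented = prod(1.0 - w for w in preventive)
    return p_generated * p_not_prevented

# Made-up strengths, for illustration only.
FORMATION = 0.8  # generative: processes that produce iron deposits
MINING = 0.6     # preventive: mining removes deposits

# Source (earth): the preventive cause is present.
earth = effect_probability([FORMATION], [MINING])
# Target (moon): the preventive cause is absent, a mismatch with the source.
moon = effect_probability([FORMATION], [])

print(f"source (earth): {earth:.2f}")
print(f"target (moon):  {moon:.2f}")
```

On these assumptions, dropping the preventive cause reduces the number of matched properties between source and target yet raises the probability assigned to the conclusion, which is the pattern a purely similarity-based score cannot reproduce.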
9.7 Defective Analogical Arguments
This book provides a theoretical framework for evaluating analogical arguments. That framework presupposes a fixed representation for the argument at a moment in time. Even if we accept that theory, there is still an obvious question: what should we do when the theory tells us that an analogical argument is defective? Just as first-order logic does not tell us what to do with an invalid argument, my theory provides no advice about faulty analogical arguments beyond the need to revise or reject them. The costs of giving up too quickly are evident, but historical examples (circular orbits, ether theories, and the acoustical analogy) illustrate the dangers of hanging on too long.
In cases where an analogy has been vital to a research program, the problem of when to give up on a flawed analogical argument is very close to the problem of when to abandon a paradigm. The analogy has a life distinct from its application to a particular argument. As Kuhn emphatically observes, anomalies never lead to the rejection of a paradigm in the absence of an alternative. It seems as if the same thing is often true of flawed analogical arguments. Rather than abandon them, we reformulate them, look for missing evidence, or alter our auxiliary assumptions. Furthermore, the observation applies quite widely. The best strategy for dealing with the initial failure of an analogical argument is often to try to make it work by finding a reformulation rather than to abandon it. I believe that my theory of analogical reasoning is compatible with this claim. After explaining the point, I return to the question of when to call it quits.
There are three main reasons for persisting with an analogical argument even if it fails to meet the requirements of my theory. The first, as already noted, is that we often have no plausible alternative hypothesis. Of course, on my theory, the failed analogical argument establishes nothing about plausibility, so what reason do we have to persist? The answer is that every analogy has potential value: this is the second reason for persistence. It is important to distinguish between an analogy and the particular arguments in which that analogy is employed. Provided we have an extensive set of similarities between two domains, there will always be some analogical inferences that pass muster. My theory provides guidelines for reformulating an unsuccessful analogical argument so that it becomes acceptable. Confronted with critical differences, we must either find a novel representation of the prior association or modify the conclusion of the argument, so that the problematic features cease to be critical. The clearest illustrations of both strategies can be found in mathematics. In Example 4.1 (triangles and tetrahedra), we see an analogical argument that fails initially but succeeds when we move to a different proof of the very same result. In Example 5.8 (complex numbers and operators), by contrast, we discussed a case where we were able to salvage a faulty analogical argument by modifying the conjecture about the target domain until it met our criteria for plausibility. Establishing the modified result can sometimes even be an important intermediate step in proving the original conjecture. The third reason for persistence is that even if the two strategies just described fail to produce a plausible analogical argument, that failure can be instructive in highlighting critical differences between the source and target domains. This phenomenon too is clearly exhibited in many mathematical examples. 
Persistence with a failed analogy can also be a virtue in the empirical sciences. Notably, new evidence can sometimes eliminate what appear to be critical differences and increase the positive analogy. This occurred with respect to the fossil record in the case of Darwin’s analogy (Example 4.8) and with the discovery of evidence for abundant water on ancient Mars (Example 4.6). It failed to occur with atomic spectra in the case of the acoustical analogy (Example 4.7). In short, attempting to salvage a failed analogical argument is a good strategy if we have a clear prior association, extensive similarity between two domains, and no alternative plausible hypothesis. We can try to eliminate critical differences by modifying the prior association, restricting the conclusion of the argument or seeking new evidence. There is no sharp rule about when to give up. We just have to evaluate how our analogical argument is shaping up. Is the relevant positive analogy increasing, while the neutral and negative analogies decrease? Is the conclusion being reduced to a claim of no real consequence? Have other plausible hypotheses emerged? Such questions have to be left to individual judgment.
Bibliography
Achinstein, P. 1983. The Nature of Explanation. Oxford: Oxford University Press.
———. 1964. “Models, Analogies and Theories.” Philosophy of Science 31: 328–49.
Agassi, J. 1988. “Analogies Hard and Soft.” In Analogical Reasoning: Perspectives of Artificial Intelligence, Cognitive Science, and Philosophy, ed. D. Helman (pp. 401–19). Dordrecht: Kluwer Academic Publishers.
———. 1964. “Discussion: Analogies as Generalizations.” Philosophy of Science 31: 351–56.
Aristotle. 1984. The Complete Works of Aristotle, ed. J. Barnes. Princeton: Princeton University Press.
Armstrong, D. M. 1978. Universals and Scientific Realism, 2 vols. Cambridge: Cambridge University Press.
Arthur, R. 1999. “On Thought Experiments as a priori Science.” International Studies in the Philosophy of Science 13: 215–29.
Arveson, W. 1976. An Invitation to C*-algebras. New York: Springer-Verlag.
Ashley, K. D. 1990. Modeling Legal Argument: Reasoning with Cases and Hypotheticals. Cambridge, MA: MIT Press/Bradford Books.
———. 1988a. “Indexing and Analytic Models.” In Case-based Reasoning: Proceedings of a Workshop on Case-Based Reasoning, ed. J. Kolodner (pp. 197–202). Los Altos, CA: Morgan Kaufmann.
———. 1988b. “Arguing by Analogy in Law: A Case-Based Model.” In Analogical Reasoning: Perspectives of Artificial Intelligence, Cognitive Science, and Philosophy, ed. D. Helman (pp. 205–24). Dordrecht: Kluwer Academic Publishers.
———. 1985. “Reasoning by Analogy: A Survey of Selected A.I. Research with Implications for Legal Expert Systems.” In Computing Power and Legal Reasoning, ed. C. Walter (pp. 105–27). St. Paul: West Publishing.
Barr, A., and E. Feigenbaum, eds. 1981. Handbook of Artificial Intelligence. Los Altos, CA: William Kaufmann.
Bartha, P. 2004. “Countable Additivity and the de Finetti Lottery.” British Journal for the Philosophy of Science 55: 301–21.
Bartha, P., and R. Johns. 2001. “Probability and Symmetry.” Philosophy of Science 68 (Proceedings): S109–22.
Batterman, R. 2005. “Critical Phenomena and Breaking Drops: Infinite Idealizations in Physics.” Studies in History and Philosophy of Modern Physics 36 (2): 225–44.
Beccaria, G. 1753. Dell’elettricismo artificiale e naturale libri due. Turin: Nella Stampa di Filippo Antonio Campana.
Berkeley, G. 1965. Principles, Dialogues, and Philosophical Correspondence, ed. C. M. Turbayne. Indianapolis: Bobbs-Merrill.
Bertrand, J. 1889. Calcul des probabilités. Paris: Gauthier-Villars.
Biela, A. 1991. Analogy in Science. Frankfurt: Peter Lang.
Black, M. 1962. Models and Metaphors. Ithaca, NY: Cornell University Press.
Bork, A. 1963. “Maxwell, Displacement Current, and Symmetry.” American Journal of Physics 31: 854–59.
Bradshaw, G. A., and B. L. Finlay. 2005. “Natural Symmetry.” Nature 435 (May 2005): 149.
Branting, L. K. 1991. “Building Explanations from Rules and Structured Cases.” International Journal of Man-Machine Studies 34 (6): 797–837.
Briggs, D., and S. Conway Morris. 1986. “Problematica from the Middle Cambrian Burgess Shale of British Columbia.” In Problematic Fossil Taxa, ed. A. Hoffman and M. Nitecki (pp. 167–83). Oxford: Oxford University Press.
Bromberg, J. 1967. “Maxwell’s Displacement Current and His Theory of Light.” Archive for History of Exact Sciences 4: 218–34.
Brown, T. 2003. Making Truth: Metaphor in Science. Urbana and Chicago: University of Illinois Press.
Burstein, M. 1988a. “Analogy vs. CBR: The Purpose of Mapping.” In Case-based Reasoning: Proceedings of a Workshop on Case-Based Reasoning, ed. J. Kolodner (pp. 133–36). Los Altos, CA: Morgan Kaufmann.
———. 1988b. “Combining Analogies in Mental Models.” In Analogical Reasoning: Perspectives of Artificial Intelligence, Cognitive Science, and Philosophy, ed. D. Helman (pp. 179–203). Dordrecht: Kluwer Academic Publishers.
Cairns-Smith, A. G. 1985. Seven Clues to the Origin of Life. Cambridge: Cambridge University Press.
Campbell, N. R. 1957. Foundations of Science. New York: Dover.
———. 1920. Physics: The Elements. Cambridge: Cambridge University Press.
Carbonell, J. G. 1986. “Derivational Analogy: A Theory of Reconstructive Problem Solving and Expertise Acquisition.” In Machine Learning: An Artificial Intelligence Approach, vol. 2, ed. J. Carbonell, R. Michalski, and T. Mitchell (pp. 371–92). Los Altos, CA: Morgan Kaufmann.
———. 1983. “Learning by Analogy: Formulating and Generalizing Plans from Past Experience.” In Machine Learning: An Artificial Intelligence Approach, vol. 1, ed. R. Michalski, J. Carbonell, and T. Mitchell (pp. 137–62). Palo Alto, CA: Tioga.
Carbonell, J. G., and M. Veloso. 1993. “Derivational Analogy in PRODIGY: Automating Case Acquisition, Storage and Utilization.” Machine Learning 10: 249–78.
———. 1988. “Integrating Derivational Analogy into a General Problem Solving Architecture.” In Case-based Reasoning: Proceedings of a Workshop on Case-Based Reasoning, ed. J. Kolodner (pp. 104–24). Los Altos, CA: Morgan Kaufmann.
Carnap, R. 1950. Logical Foundations of Probability. London: Routledge & Kegan Paul.
———. 1980. “A Basic System of Inductive Logic Part II.” In Studies in Inductive Logic and Probability, vol. 2, ed. R. C. Jeffrey (pp. 7–155). Berkeley: University of California Press.
Cartwright, N. 1992. “Aristotelian Natures and the Modern Experimental Method.” In Inference, Explanation, and Other Frustrations, ed. J. Earman (pp. 44–71). Berkeley: University of California Press.
Chang, C. C., and H. J. Keisler. 1977. Model Theory, 2nd ed. Amsterdam: North Holland Publishing.
Chisholm, Roderick. 1981. Foundations of Knowing. Minneapolis: University of Minnesota Press.
Christensen, D. 1999. “Measuring Confirmation.” Journal of Philosophy 96 (9): 437–61.
Collier, H. O. J. 1984. “The Story of Aspirin.” In Discoveries in Pharmacology, vol. 2, ed. M. J. Parnham and J. Bruinvels (pp. 555–94). Amsterdam: Elsevier.
Complete Oral Arguments of the Supreme Court of the United States. 1988. Frederick, MD: North American Reporting.
Conway Morris, S. 1977. “A New Metazoan from the Cambrian Burgess Shale of British Columbia.” Palaeontology 20 (3): 623–40.
Copi, I. 1961. Introduction to Logic, 2nd ed. New York: Macmillan.
Copi, I., and C. Cohen. 2002. Introduction to Logic, 11th ed. Upper Saddle River, NJ: Prentice Hall.
Cross, Rupert, and J. W. Harris. 1991. Precedent in English Law, 4th ed. Oxford: Clarendon Press.
Darden, L., and J. Cain. 1989. “Selection Type Theories.” Philosophy of Science 56: 106–29.
Darden, L., and R. Rada. 1988a. “Hypothesis Formation Using Part-Whole Interrelations.” In Analogical Reasoning: Perspectives of Artificial Intelligence, Cognitive Science, and Philosophy, ed. D. Helman (pp. 341–75). Dordrecht: Kluwer Academic Publishers.
———. 1988b. “Hypothesis Formation via Interrelations.” In Analogica, ed. A. Prieditis (pp. 109–27). London: Pitman.
Darwin, C. 1903. More Letters of Charles Darwin, vol. 1, ed. F. Darwin. New York: D. Appleton.
———. 1860. On the Origin of Species. New York: D. Appleton.
Davies, T. R. 1988. “Determination, Uniformity, and Relevance: Normative Criteria for Generalization and Reasoning by Analogy.” In Analogical Reasoning: Perspectives of Artificial Intelligence, Cognitive Science, and Philosophy, ed. D. Helman (pp. 227–50). Dordrecht: Kluwer Academic Publishers.
Davies, T. R., and S. Russell. 1987. “A Logical Approach to Reasoning by Analogy.” In IJCAI 87: Proceedings of the Tenth International Joint Conference on Artificial Intelligence, ed. J. McDermott (pp. 264–70). Los Altos, CA: Morgan Kaufmann.
de Beaune, S., and R. White. 1993. “Ice Age Lamps.” Scientific American 266 (March): 108–13.
De Finetti, B. 1980. “On the Condition of Partial Exchangeability.” In Studies in Inductive Logic and Probability, vol. 2, ed. R. C. Jeffrey (pp. 193–205). Berkeley: University of California Press.
———. 1974. Theory of Probability, vols. 1 and 2, trans. A. Machí and A. Smith. New York: John Wiley.
———. 1972. Probability, Induction and Statistics. New York: John Wiley.
———. 1964. “Foresight: Its Logical Laws, Its Subjective Sources.” In Studies in Subjective Probability, ed. H. Kyburg and H. Smokler (pp. 93–158). New York: John Wiley.
De Finetti, B., and L. J. Savage. 1972. “How to Choose the Initial Probabilities.” In Probability, Induction and Statistics, B. de Finetti (pp. 143–46). New York: John Wiley.
Descartes, R. 1637/1954. The Geometry of René Descartes, trans. D. E. Smith and M. L. Latham. New York: Dover.
Donnan, C. B. 1971. “Ancient Peruvian Potter’s Marks and Their Interpretation Through Ethnographic Analogy.” American Antiquity 36 (4): 460–66.
Duhem, P. 1954. The Aim and Structure of Physical Theory, trans. P. Wiener. Princeton, NJ: Princeton University Press.
Dunbar, K. 2001. “The Analogical Paradox: Why Analogy Is so Easy in Naturalistic Settings, Yet so Difficult in the Psychological Laboratory.” In The Analogical Mind: Perspectives from Cognitive Science, ed. D. Gentner, K. Holyoak, and B. Kokinov (pp. 313–34). Cambridge, MA: MIT Press.
Edwards, W., H. Lindman, and L. J. Savage. 1963. “Bayesian Statistical Inference for Psychological Research.” Psychological Review 70: 193–242.
Eells, E. 1991. Probabilistic Causality. Cambridge: Cambridge University Press.
Eells, E., and B. Fitelson. 2000. “Measuring Confirmation and Evidence.” Journal of Philosophy 97 (12): 663–72.
Elga, A. 2000. “Self-Locating Belief and the Sleeping Beauty Problem.” Analysis 60: 143–47.
Eliasmith, C., and P. Thagard. 2001. “Integrating Structure and Meaning: A Distributed Model of Analogical Mapping.” Cognitive Science 25: 245–86.
Embley, M., M. Wilkinson, J. McInerny, R. Hirt, and P. Foster. 2005. “Molecular Systematics and Evolution Course: Introduction to Characters and Parsimony Analysis.” Natural History Museum, London and NUI Maynooth, Ireland. Available at: www.bioinf.org/molsys/index.html.
Evans, T. G. 1968. “A Program for the Solution of Geometric-Analogy Intelligence-Test Questions.” In Semantic Information Processing, ed. M. L. Minsky (pp. 271–353). Cambridge, MA: MIT Press.
Evans-Pritchard, E. E., ed. 1954. The Institutions of Primitive Society. Oxford, UK: Basil Blackwell.
———. 1937. Witchcraft, Oracles and Magic among the Azande. Oxford: Clarendon Press.
Falkenhainer, B., K. Forbus, and D. Gentner. 1989/90. “The Structure-Mapping Engine: Algorithm and Examples.” Artificial Intelligence 41: 2–63.
Feyerabend, P. 1963. “How to Be a Good Empiricist—A Plea for Tolerance in Matters Epistemological.” In Philosophy of Science, The Delaware Seminar, vol. 2, ed. B. Baumrin (pp. 3–39). New York: Interscience Publishers.
Feynman, R. 1965. The Character of Physical Law. Cambridge, MA: MIT Press.
Feynman, R., R. Leighton, and M. Sands. 1963. The Feynman Lectures on Physics. Reading, MA: Addison-Wesley.
Fitelson, B., and J. Hawthorne. In press. “How Bayesian Confirmation Theory Handles the Paradox of the Ravens.” In The Place of Probability in Science, ed. E. Eells and J. Fetzer. Chicago: Open Court Press.
Forbus, K. 2001. “Exploring Analogy in the Large.” In The Analogical Mind: Perspectives from Cognitive Science, ed. D. Gentner, K. Holyoak, and B. Kokinov (pp. 23–58). Cambridge, MA: MIT Press.
Forbus, K., R. Ferguson, and D. Gentner. 1994. “Incremental Structure-mapping.” In Proceedings of the Sixteenth Annual Conference of the Cognitive Science Society, ed. A. Ram and K. Eiselt (pp. 313–18). Hillsdale, NJ: Lawrence Erlbaum.
Forbus, K., D. Gentner, and K. Law. 1995. “MAC/FAC: A Model of Similarity-based Retrieval.” Cognitive Science 19: 141–205.
Franklin, B. 1941. Benjamin Franklin’s Experiments, ed. I. B. Cohen. Cambridge, MA: Harvard University Press.
Franklin, J. 2001. “Resurrecting Logical Probability.” Erkenntnis 55: 277–305.
Friedman, M. 1974. “Explanation and Scientific Understanding.” Journal of Philosophy 71: 5–19.
Gaifman, H. 2004. “Reasoning with Limited Resources and Assigning Probabilities to Arithmetical Statements.” Synthese 140: 97–119.
Gaifman, H., and M. Snir. 1982. “Probabilities over Rich Languages, Testing and Randomness.” Journal of Symbolic Logic 47: 495–548.
Galileo Galilei. 1638/1974. Discorsi, Two New Sciences, trans. S. Drake. Madison: University of Wisconsin Press.
Garber, D. 1983. “Old Evidence and Logical Omniscience in Bayesian Confirmation Theory.” In Testing Scientific Theories, Minnesota Studies in the Philosophy of Science, vol. 10, ed. J. Earman (pp. 99–131). Minneapolis: University of Minnesota Press.
Gentner, D. 1988. “Analogical Inference and Analogical Access.” In Analogica, ed. A. Prieditis (pp. 63–68). London: Pitman.
———. 1983. “Structure-Mapping: A Theoretical Framework for Analogy.” Cognitive Science 7: 155–70.
Gentner, D., B. Bowdle, P. Wolff, and C. Boronat. 2001. “Metaphor Is Like Analogy.” In The Analogical Mind: Perspectives from Cognitive Science, ed. D. Gentner, K. Holyoak, and B. Kokinov (pp. 199–254). Cambridge, MA: MIT Press.
Gentner, D., B. Falkenhainer, and J. Skorstad. 1988. “Viewing Metaphor as Analogy.” In Analogical Reasoning: Perspectives of Artificial Intelligence, Cognitive Science, and Philosophy, ed. D. Helman (pp. 171–77). Dordrecht: Kluwer Academic Publishers.
Gentner, D., K. Holyoak, and B. Kokinov. 2001. The Analogical Mind: Perspectives from Cognitive Science. Cambridge, MA: MIT Press.
BIBLIOGRAPHY
Gick, M. L., and K. J. Holyoak. 1983. “Schema Induction and Analogical Transfer.” Cognitive Psychology 15: 1–38.
———. 1980. “Analogical Problem Solving.” Cognitive Psychology 12: 306–55.
Giere, R. 1991. Understanding Scientific Reasoning, 3rd ed. Fort Worth, TX: Holt, Rinehart, and Winston.
Ginsberg, M., ed. 1987. Readings in Nonmonotonic Reasoning. Los Altos, CA: Morgan Kaufmann.
Glymour, C. 1980. Theory and Evidence. Princeton, NJ: Princeton University Press.
Goldberg, A. M., and J. M. Frazier. 1989. “Alternatives to Animals in Toxicity Testing.” Scientific American 261 (August): 24–30.
Goldman, A. 1988. “Strong and Weak Justification.” In Philosophical Perspectives, ed. J. Tomberlin (pp. 51–69). Atascadero, CA: Ridgeview.
———. 1979. “What Is Justified Belief?” In Justification and Knowledge, ed. G. Pappas (pp. 1–23). Dordrecht: Reidel.
Good, I. J. 1968. “Corroboration, Explanation, Evolving Probability, Simplicity and a Sharpened Razor.” British Journal for the Philosophy of Science 19: 123–43.
Gooding, D. 1993. “What Is Experimental about Thought Experiments?” In PSA 1992: Proceedings of the Biennial Meetings of the Philosophy of Science Association (pp. 280–90). Chicago: University of Chicago Press.
Goodman, N. 1972. Problems and Projects (pp. 437–47). Indianapolis: Bobbs-Merrill.
———. 1966. The Structure of Appearance. Indianapolis: Bobbs-Merrill.
———. 1965. Fact, Fiction and Forecast. Indianapolis: Bobbs-Merrill.
———. 1961. “Safety, Strength, Simplicity.” Philosophy of Science 28: 150–51.
Gould, R. A., and P. J. Watson. 1982. “A Dialogue on the Meaning and Use of Analogy in Ethnoarchaeological Reasoning.” Journal of Anthropological Archaeology 1: 355–81.
Gould, S. 1989. Wonderful Life. New York: W.W. Norton.
Greiner, R. 1988. “Abstraction-based Analogical Inference.” In Analogical Reasoning: Perspectives of Artificial Intelligence, Cognitive Science, and Philosophy, ed. D. Helman (pp. 147–70). Dordrecht: Kluwer Academic Publishers.
———. 1985. Learning by Understanding Analogies. Ph.D. thesis, Stanford University, Palo Alto, CA. Technical Report STAN-CS-85-1071.
Grosholz, E. R. 1985. “Two Episodes in the Unification of Logic and Topology.” British Journal for the Philosophy of Science 36: 147–57.
Gutting, G., ed. 1980. Paradigms and Revolutions. Notre Dame: University of Notre Dame Press.
Hacking, I. 1993. “Do Thought Experiments Have a Life of Their Own?” In PSA 1992: Proceedings of the Biennial Meetings of the Philosophy of Science Association (pp. 302–308). Chicago: University of Chicago Press.
———. 1990. “Natural Kinds.” In Perspectives on Quine, ed. R. Barrett and R. Gibson (pp. 129–141). Cambridge, UK: Blackwell.
———. 1983. Representing and Intervening. Cambridge: Cambridge University Press.
———. 1982. “Experiments and Scientific Realism.” Philosophical Topics 13: 71–88.
———. 1967. “Slightly More Realistic Personal Probability.” Philosophy of Science 34: 311–25.
Hadamard, J. 1949. An Essay on the Psychology of Invention in the Mathematical Field. Princeton, NJ: Princeton University Press.
Hahnemann, S. 1921. Organon der rationellen Heilkunde (Organon of the Healing Art), 6th ed., trans. W. Boericke. Philadelphia: Boericke & Tafel.
Halliday, D., and R. Resnick. 1981. Foundations of Physics, 2nd ed. New York: John Wiley.
Hammond, K. J. 1989. “CHEF.” In Inside Case-Based Reasoning, ed. C. K. Riesbeck and R. C. Schank (p. 165). Hillsdale, NJ: Lawrence Erlbaum.
Hanson, N. R. 1958. Patterns of Discovery. Cambridge: Cambridge University Press.
Harman, Gilbert. 1973. Thought. Princeton, NJ: Princeton University Press.
Harre, R. 1960. An Introduction to the Logic of the Sciences. London: Macmillan.
Harrod, R. F. 1956. Foundations of Inductive Logic. London: Macmillan.
Hart, H. L. A. 1961. The Concept of Law. Oxford: Clarendon Press.
Hartmann, Stephen. 1998. “Idealization in Quantum Field Theory.” In Idealization IX: Idealization in Contemporary Physics, ed. N. Shanks (pp. 99–122). Amsterdam: Rodopi.
Hawthorne, J. 2008. “Inductive Logic.” Stanford Encyclopedia of Philosophy, 2008 ed., ed. E. Zalta. http://plato.stanford.edu/entries/logic-inductive/.
Heal, J. 1998. “Understanding Other Minds from the Inside.” In Current Issues in Philosophy of Mind, ed. A. O’Hear (pp. 83–99). New York: Cambridge University Press.
Helman, D. H., ed. 1988. Analogical Reasoning: Perspectives of Artificial Intelligence, Cognitive Science, and Philosophy. Dordrecht: Kluwer Academic Publishers.
Hempel, C. G. 1965a. “Aspects of Scientific Explanation.” In Aspects of Scientific Explanation and Other Essays in the Philosophy of Science (pp. 331–496). New York: Free Press.
———. 1965b. “The Logic of Functional Analysis.” In Aspects of Scientific Explanation and Other Essays in the Philosophy of Science (pp. 297–330). New York: Free Press.
———. 1965c. “Fundamentals of Taxonomy.” In Aspects of Scientific Explanation and Other Essays in the Philosophy of Science (pp. 137–54). New York: Free Press.
Herron, T., T. Seidenfeld, and L. Wasserman. 1997. “Divisive Conditioning: Further Results on Dilation.” Philosophy of Science 64 (3): 411–44.
———. 1994. “The Extent of Dilation of Sets of Probabilities and the Asymptotics of Robust Bayesian Inference.” In PSA 1994: Proceedings of the Biennial Meetings of the Philosophy of Science Association (pp. 250–59). Chicago: University of Chicago Press.
Herstein, I. N. 1975. Topics in Algebra, 2nd ed. New York: John Wiley.
Hesse, M. B. 1988. “Theories, Family Resemblances and Analogy.” In Analogical Reasoning: Perspectives of Artificial Intelligence, Cognitive Science, and Philosophy, ed. D. H. Helman (pp. 317–40). Dordrecht: Kluwer Academic Publishers.
———. 1974. The Structure of Scientific Inference. Berkeley: University of California Press.
———. 1973. “Logic of Discovery in Maxwell’s Electromagnetic Theory.” In Foundations of Scientific Method: The Nineteenth Century, ed. R. Giere and R. Westfall (pp. 86–114). Bloomington: University of Indiana Press.
———. 1966. Models and Analogies in Science. Notre Dame: University of Notre Dame Press.
Hesse, M. B. 1964. “Analogy and Confirmation Theory.” Philosophy of Science 31: 319–27.
Hintikka, J., and P. Suppes, eds. 1966. Aspects of Inductive Logic. Amsterdam: North-Holland Press.
Hitchcock, C. R. 1993. “A Generalized Probabilistic Theory of Causal Relevance.” Synthese 97: 335–64.
Hofstadter, D. 1995. Fluid Concepts and Creative Analogies. New York: Basic Books.
Holdsworth, W. S. 1903. A History of English Law. London: Methuen.
Holton, G., and S. Brush. 1973. Introduction to Concepts and Theories in Physical Science, 2nd ed. Reading, MA: Addison-Wesley.
Holyoak, K., and P. Thagard. 1995. Mental Leaps: Analogy in Creative Thought. Cambridge, MA: MIT Press.
———. 1989. “Analogical Mapping by Constraint Satisfaction.” Cognitive Science 13: 295–355.
Honderich, T. 1985. Morality and Objectivity. London: Routledge & Kegan Paul.
Horwich, P. 1993a. “Wittgensteinian Bayesianism.” In Midwest Studies in Philosophy, vol. 18, ed. P. A. French, T. E. Uehling, and H. K. Wettstein (pp. 62–77). Notre Dame: University of Notre Dame Press.
———, ed. 1993b. World Changes: Thomas Kuhn and the Nature of Science. Cambridge, MA: MIT Press.
Howson, C., and P. Urbach. 1993. Scientific Reasoning: The Bayesian Approach, 2nd ed. Peru, IL: Open Court.
Hull, D. L. 1973. Darwin and his Critics. Cambridge, MA: Harvard University Press.
Hume, D. 1779/1947. Dialogues Concerning Natural Religion. Indianapolis: Bobbs-Merrill.
———. 1748/1977. An Enquiry Concerning Human Understanding. Indianapolis: Hackett.
———. 1739/1978. A Treatise of Human Nature, ed. A. Selby-Bigge. Oxford: Clarendon Press.
Hummel, J., and K. Holyoak. 2003. “A Symbolic-Connectionist Theory of Relational Inference and Generalization.” Psychological Review 110: 220–64.
———. 1997. “Distributed Representations of Structure: A Theory of Analogical Access and Mapping.” Psychological Review 104 (3): 427–66.
Humphreys, P. 1989. The Chances of Explanation: Causal Explanation in the Social, Medical and Physical Sciences. Princeton, NJ: Princeton University Press.
———. 1981. “Aleatory Explanations.” Synthese 48: 225–32.
Hunter, D., and P. Whitten, eds. 1976. Encyclopedia of Anthropology. New York: Harper & Row.
Huygens, C. 1690/1962. Treatise on Light, trans. S. Thompson. New York: Dover.
Indurkhya, B. 1992. Metaphor and Cognition. Dordrecht: Kluwer Academic Publishers.
James, W. 1979. The Will to Believe and Other Essays in Popular Philosophy. Cambridge, MA: Harvard University Press.
Jaynes, E. T. 1973. “The Well-Posed Problem.” Foundations of Physics 3: 477–92.
———. 1968. “Prior Probabilities.” IEEE Transactions of the Society of Systems Sciences Cybernetics SSC-4: 227–41.
Jeffrey, R. C. 1992. Probability and the Art of Judgment. Cambridge: Cambridge University Press.
———, ed. 1980. Studies in Inductive Logic and Probability. Berkeley, CA: University of California Press.
Jeffreys, H. 1973. Scientific Inference, 3rd ed. Cambridge: Cambridge University Press.
Jenkin, F. 1973. “The Origin of Species” [Review article, 1867] In Darwin and his Critics, ed. D. L. Hull (pp. 302–50). Cambridge, MA: Harvard University Press.
Jevons, W. S. 1877/1924. The Principles of Science, 2nd ed. London: Macmillan.
Jones, S., and D. Pilbeam, eds. 1992. The Cambridge Encyclopedia of Human Evolution. Cambridge: Cambridge University Press.
Kanigel, R. 1991. The Man Who Knew Infinity: A Life of the Genius Ramanujan. New York: C. Scribner’s Sons.
Kedar-Cabelli, S. 1988a. “Towards a Computational Model of Purpose-Directed Analogy.” In Analogica, ed. A. Prieditis (pp. 89–108). London: Pitman.
———. 1988b. “Analogy—from a Unified Perspective.” In Analogical Reasoning: Perspectives of Artificial Intelligence, Cognitive Science, and Philosophy, ed. D. H. Helman (pp. 65–103). Dordrecht: Kluwer Academic Publishers.
Kelly, K. 1995. The Logic of Reliable Inquiry. Oxford: Oxford University Press.
Kemeny, J. 1953. “The Use of Simplicity in Induction.” Philosophical Review 62: 391–408.
Keynes, J. M. 1921. A Treatise on Probability. London: Macmillan.
Kitcher, P. 1993. The Advancement of Science. Oxford, UK: Oxford University Press.
———. 1989. “Explanatory Unification and the Causal Structure of the World.” In Scientific Explanation, Minnesota Studies in the Philosophy of Science, vol. 13, ed. P. Kitcher and W. Salmon (pp. 410–505). Minneapolis: University of Minnesota Press.
———. 1983. The Nature of Mathematical Knowledge. Oxford: Oxford University Press.
Kitching, I., P. Forey, C. Humphries, and D. Williams. 1998. Cladistics: The Theory and Practice of Parsimony Analysis, 2nd ed. Oxford: Oxford University Press.
Kline, M. 1972. Mathematical Thought from Ancient to Modern Times. New York: Oxford University Press.
Kolodner, J. L. 1993. Case-Based Reasoning. Palo Alto, CA: Morgan Kaufmann.
Kobinger, W. 1989. “From Mouse to Man: Transferability of the Results of Animal Experiments to Man.” In Scientific Alternatives to Animal Experiments, ed. F. Lembeck (pp. 30–37). Chichester: Ellis Horwood.
Kripke, S. 1972. Naming and Necessity. Cambridge, MA: Harvard University Press.
Kroes, P. 1989. “Structural Analogies between Physical Systems.” British Journal for the Philosophy of Science 40: 145–54.
Kruse, M. 2000. “Invariance, Symmetry and Rationality.” Synthese 122: 337–57.
Kuhn, T. S. 1979. “Metaphor in Science.” In Metaphor and Thought, ed. A. Ortony (pp. 409–19). Cambridge: Cambridge University Press.
———. 1977. “Objectivity, Value Judgment, and Theory Choice.” In The Essential Tension (pp. 320–39). Chicago: University of Chicago Press.
———. 1970. The Structure of Scientific Revolutions, 2nd ed. Chicago: University of Chicago Press.
Kuhn, T. S. 1964/1977. “A Function for Thought Experiments.” In The Essential Tension (pp. 240–65). Chicago: University of Chicago Press.
Kuhn, T. S. 1962. The Structure of Scientific Revolutions, 1st ed. Chicago: University of Chicago Press.
———. 1957. The Copernican Revolution. Cambridge, MA: Harvard University Press.
Kyburg, H. 1970. Probability and Inductive Logic. New York: Macmillan.
Kyburg, H., and C. M. Teng. 2001. Uncertain Inference. Cambridge: Cambridge University Press.
Lakatos, I. 1976. Proofs and Refutations: The Logic of Mathematical Discovery. Ed. J. Worrall and E. Zahar. Cambridge: Cambridge University Press.
Lakoff, G. 1987. Women, Fire and Dangerous Things: What Categories Reveal about the Mind. Chicago: University of Chicago Press.
Lakoff, G., and M. Johnson. 1980. Metaphors We Live By. Chicago: University of Chicago Press.
Lakoff, G., and M. Turner. 1989. More than Cool Reason: A Field Guide to Poetic Metaphor. Chicago: University of Chicago Press.
Laudan, L. 1984. Science and Values: The Aims of Science and Their Role in Scientific Debate. Berkeley: University of California Press.
———. 1977. Progress and Its Problems: Toward a Theory of Scientific Growth. London: Routledge and Kegan Paul.
Leatherdale, W. H. 1974. The Role of Analogy, Model, and Metaphor in Science. Amsterdam: North-Holland Publishing.
Lee, H. S., and K. J. Holyoak. 2008. “Absence Makes the Thought Grow Stronger: Reducing Structural Overlap Can Increase Inductive Strength.” In Proceedings of the Thirtieth Annual Conference of the Cognitive Science Society, ed. V. Sloutsky, B. Love, and K. McRae (pp. 297–302). Austin, TX: Cognitive Science Society.
Lehrer, J. 2008. “The Future of Science . . . Is it Art?” Seed (online magazine), January, www.seedmagazine.com.
Lembeck, F. 1989. Scientific Alternatives to Animal Experiments. Chichester: Ellis Horwood.
Lenat, D. 1983. “The Role of Heuristics in Learning by Discovery: Three Case Studies.” In Machine Learning: An Artificial Intelligence Approach, vol. 1, ed. R. Michalski, J. Carbonell, and T. Mitchell (pp. 243–304). Palo Alto, CA: Tioga.
Leng, B., B. Buchanan, and H. Nicholas. 1994. “Protein Secondary Structure Prediction Using Two-Level Case-Based Reasoning.” Journal of Computational Biology 1 (1): 25–38.
Lennox, J. G. 1991. “Darwinian Thought-Experiments: A Function for Just-so Stories.” In Thought Experiments in Science and Philosophy, ed. T. Horowitz and G. Massey (pp. 223–45). Savage, MD: Rowman & Littlefield.
———. 1983. “Aristotle’s Lantern.” Journal of Hellenic Studies 103: 147–51.
Levi, E. 1949. An Introduction to Legal Reasoning. Chicago: University of Chicago Press.
Lewis, D. 1980. “A Subjectivist’s Guide to Objective Chance.” In Studies in Inductive Logic and Probability, vol. 2, ed. R. C. Jeffrey (pp. 263–93). Berkeley: University of California Press.
Lewis, D. 1973. Counterfactuals. Cambridge, MA: Harvard University Press.
Liebig, J. 1964. Animal Chemistry. New York: Johnson Reprint.
Lienhardt, G. 1954. “Modes of Thought.” In The Institutions of Primitive Society, ed. E. E. Evans-Pritchard (pp. 95–107). Oxford: Oxford University Press.
Liston, M. 2000. “Critical Discussion of Mark Steiner’s The Applicability of Mathematics as a Philosophical Problem.” Philosophia Mathematica 3 (8): 190–207.
Llewellyn, K. 1960. The Bramble Bush: On Our Law and its Study. New York: Oceana.
Lloyd, G. E. R. 1966. Polarity and Analogy. Cambridge, UK: Cambridge University Press.
Macintyre, A. 1986. “Twenty Years of P-adic Model Theory.” In Logic Colloquium 1984, ed. J. B. Paris, A. J. Wilkie, and G. M. Wilmers (pp. 121–53). Amsterdam: North-Holland Publishing.
Mahan, B. 1975. University Chemistry, 3rd ed. Addison-Wesley.
Maier, C. L. 1981. The Role of Spectroscopy in the Acceptance of the Internally Structured Atom 1860–1920. New York: Arno Press.
Malcolm, N. 1963. Knowledge and Certainty. Englewood Cliffs, NJ: Prentice-Hall.
Maxwell, J. C. 1890. Scientific Papers of James Clerk Maxwell, vol. I, ed. W. D. Niven. Cambridge: Cambridge University Press.
McCarty, L. T. 1977. “Reflections on TAXMAN: An Experiment in Artificial Intelligence and Legal Reasoning.” Harvard Law Review 90: 837–93.
McCarty, L. T., and N. S. Sridharan. 1981. “The Representation of an Evolving System of Legal Concepts: II. Prototypes and Deformations.” LRP-TR-11, Laboratory for Computer Science Research, Rutgers University.
McKay, C. P. 1993. “Did Mars Once Have Martians?” Astronomy 21 (9): 26–33.
McMullin, Ernan. 1993. “Rationality and Paradigm Change in Science.” In World Changes: Thomas Kuhn and the Nature of Science, ed. P. Horwich (pp. 55–78). Cambridge, MA: MIT Press.
McNeill Alexander, R. 1991. “How Dinosaurs Ran.” Scientific American 264 (April): 130–36.
Mill, J. S. 1843/1930. A System of Logic. London: Longmans-Green.
———. 1979. Collected Works of John Stuart Mill, Vol. 9: An Examination of Sir William Hamilton’s Philosophy. New York: Longmans, Green & Co.
Mitchell, M. 1993. Analogy-Making as Perception. Cambridge, MA: Bradford Books/MIT Press.
Morrison, M. 2000. Unifying Scientific Theories. Cambridge: Cambridge University Press.
Munkres, J. 1984. Elements of Algebraic Topology. Menlo Park, CA: Addison-Wesley.
Myrvold, W. C. 2003. “A Bayesian Account of the Virtue of Unification.” Philosophy of Science 70: 399–423.
Nagel, E. 1961. The Structure of Science. New York: Harcourt, Brace & World.
Nerhot, P., ed. 1991. Legal Knowledge and Analogy: Fragments of Legal Epistemology, Hermeneutics and Linguistics. Dordrecht: Kluwer Academic Publishers.
Nersessian, N. 2002. “Maxwell and ‘the Method of Physical Analogy’: Model-Based Reasoning, Generic Abstraction, and Conceptual Change.” In Reading Natural Philosophy, ed. D. Malament (pp. 129–66). Chicago: Open Court.
Nersessian, N. 1993. “In the Theoretician’s Laboratory: Thought Experimenting as Mental Modeling.” PSA 1992: Proceedings of the Biennial Meetings of the Philosophy of Science Association (pp. 291–301). Chicago: University of Chicago Press.
Nersessian, N. 1992. “How do Scientists Think?” In Cognitive Models of Science, Minnesota Studies in the Philosophy of Science, vol. 15, ed. R. Giere (pp. 5–22). Minneapolis: University of Minnesota Press.
Nickles, T., ed. 2003. Thomas Kuhn. Cambridge: Cambridge University Press.
———. 1975. “Two Concepts of Intertheoretic Reduction.” Journal of Philosophy 70: 181–201.
Niiniluoto, I. 1988. “Analogy and Similarity in Scientific Reasoning.” In Analogical Reasoning: Perspectives of Artificial Intelligence, Cognitive Science, and Philosophy, ed. D. H. Helman (pp. 271–98). Dordrecht: Kluwer Academic Publishers.
Norton, J. 2004. “Why Thought Experiments Do Not Transcend Empiricism.” In Contemporary Debates in Philosophy of Science, ed. C. Hitchcock (pp. 44–66). Oxford, UK: Blackwell Publishing.
———. 1991. “Thought Experiments in Einstein’s Work.” In Thought Experiments in Science and Philosophy, ed. T. Horowitz and G. Massey (pp. 129–48). Savage, MD: Rowman and Littlefield.
Nye, M. J. 1996. Before Big Science: The Pursuit of Modern Chemistry and Physics 1800–1940. New York: Twayne.
Oden, D., R. Thompson, and D. Premack. 2001. “Can an Ape Reason Analogically? Comprehension and Production of Analogical Problems by Sarah, a Chimpanzee.” In The Analogical Mind: Perspectives from Cognitive Science, ed. D. Gentner, K. Holyoak, and B. Kokinov (pp. 471–97). Cambridge, MA: MIT Press.
Oppenheimer, R. 1956. “Analogy in Science.” American Psychologist 11 (3): 127–35.
Ortony, A., ed. 1979. Metaphor and Thought. Cambridge: Cambridge University Press.
Palmieri, P. 2003. “Mental Models in Galileo’s Early Mathematization of Nature.” Studies in the History and Philosophy of Science, Part A 34 (2): 229–64.
Parnham, M. J., and J. Bruinvels, eds. 1984. Discoveries in Pharmacology, vol. 2. Amsterdam: Elsevier.
Pearl, J. 2000. Causality: Models, Reasoning and Inference. Cambridge: Cambridge University Press.
———. 1988. Probabilistic Reasoning in Intelligent Systems. San Mateo, CA: Morgan Kaufmann.
Peirce, C. S. 1931–1935. Collected Papers, ed. C. Hartshorne and P. Weiss. Cambridge, MA: Harvard University Press.
Pera, M. 1992. The Ambiguous Frog, trans. J. Mandelbaum. Princeton, NJ: Princeton University Press.
Pietarinen, J. 1972. Lawlikeness, Analogy and Inductive Logic. Amsterdam: North-Holland.
Plato. 1971. Collected Dialogues, ed. E. Hamilton and H. Cairns. Princeton, NJ: Princeton University Press.
Playfair, J. 1778. “On the Arithmetic of Impossible Quantities.” Royal Society of London Philosophical Transactions 68: 318–43.
Plotkin, H. 1997. Evolution in Mind. London: Penguin.
Poincaré, H. 1952a. Science and Method, trans. F. Maitland. New York: Dover.
———. 1952b. Science and Hypothesis, trans. W. J. Greenstreet. New York: Dover.
Polya, G. 1954. Mathematics and Plausible Reasoning, 2 vols. Princeton, NJ: Princeton University Press.
Popper, K. 1974. Objective Knowledge: An Evolutionary Approach. Oxford: Clarendon Press.
———. 1959. The Logic of Scientific Discovery, 2nd ed. New York: Basic Books.
Posin, D. Q. 1948. Mendeleyev: The Story of a Great Scientist. New York: McGraw-Hill.
Prieditis, A., ed. 1988. Analogica. London: Pitman.
Priestley, J. 1769, 1775/1966. The History and Present State of Electricity, 2 vols. New York: Johnson Reprint.
Provine, W. B. 1971. The Origins of Theoretical Population Genetics. Chicago: University of Chicago Press.
Purcell, E. 1985. Electricity and Magnetism. New York: McGraw-Hill.
Putnam, H. 1975. “The Meaning of Meaning.” In Language, Mind, and Knowledge, Minnesota Studies in the Philosophy of Science, vol. 7, ed. K. Gunderson (pp. 157–98). Minneapolis: University of Minnesota Press.
Quine, W. V. 1973. The Roots of Reference. La Salle, IL: Open Court.
———. 1969. “Natural Kinds.” In Ontological Relativity and Other Essays (pp. 114–38). New York: Columbia University Press.
Quine, W. V., and J. S. Ullian. 1970. The Web of Belief. New York: Random House.
Radin, M. 1933. “Case Law and Stare Decisis.” Columbia Law Review 33 (February): 199.
Railton, P. 1981. “Probability, Explanation, and Information.” Synthese 48: 233–56.
Reichenbach, H. 1949. The Theory of Probability. Los Angeles: University of California Press.
———. 1938. Experience and Prediction. Chicago: University of Chicago Press.
Reid, T. 1785/1895. Essays on the Intellectual Powers of Man. The Works of Thomas Reid, vol. 3, 8th ed., ed. Sir William Hamilton. Edinburgh: James Thin.
Resnik, D. 1994. “Hacking’s Experimental Realism.” Canadian Journal of Philosophy 24: 395–412.
Reynolds, A. K., and L. O. Randall. 1975. Morphine and Related Drugs. Toronto: University of Toronto Press.
Riesbeck, C. K., and R. C. Schank. 1989. Inside Case-Based Reasoning. Hillsdale, NJ: Lawrence Erlbaum.
Rissland, E. 1985. “Argument Moves and Hypotheticals.” In Computing Power and Legal Reasoning, ed. C. Walter (pp. 129–43). St. Paul, MN: West.
———. 1986. “Learning How to Argue: Using Hypotheticals.” In Experience, Memory, and Reasoning, ed. J. L. Kolodner and C. K. Riesbeck (pp. 115–26). Hillsdale, NJ: Lawrence Erlbaum.
Rissland, E., and D. Skalak. 1991. “CABARET: Rule Interpretation in a Hybrid Architecture.” International Journal of Man-Machine Studies 34 (6): 839–87.
Robinson, D. S. 1930. The Principles of Reasoning, 2nd ed. New York: D. Appleton.
Rudin, W. 1976. Principles of Mathematical Analysis, 3rd ed. New York: McGraw-Hill.
Rudner, R. 1966. Philosophy of Social Science. Englewood Cliffs, NJ: Prentice-Hall.
Russell, B. 1914. Our Knowledge of the External World. London: Allen & Unwin.
Russell, B. 1912. “On the Relation of Universals and Particulars.” Proceedings of the Aristotelian Society 12: 1–24.
Russell, S. 1988. “Analogy by Similarity.” In Analogical Reasoning: Perspectives of Artificial Intelligence, Cognitive Science, and Philosophy, ed. D. Helman (pp. 251–69). Dordrecht: Kluwer Academic Publishers.
Russell, S. 1987. “Analogy and Single-Instance Generalization.” In Proceedings of the Fourth International Machine Learning Workshop, ed. P. Langley (pp. 390–97). Palo Alto, CA: Morgan Kaufmann.
———. 1986a. Analogical and Inductive Reasoning. Ph.D. thesis, Stanford University, Department of Computer Science, Palo Alto, CA.
———. 1986b. “Preliminary Steps Towards the Automation of Induction.” In Proceedings of the Fifth National Conference on Artificial Intelligence, ed. T. Kehler (pp. 477–84). Palo Alto, CA: Morgan Kaufmann.
Russell, S., and B. Grosof. 1987. “A Declarative Approach to Bias in Concept Learning.” In Proceedings of the Sixth National Conference on Artificial Intelligence (pp. 505–10). Palo Alto, CA: Morgan Kaufmann.
Rutherford, E. 1963. Collected Papers. London: Allen and Unwin.
Salmon, W. 1990. “Rationality and Objectivity in Science, or Tom Kuhn Meets Tom Bayes.” In Scientific Theories: Minnesota Studies in the Philosophy of Science, vol. 14, ed. C. Wade Savage (pp. 175–204). Minneapolis: University of Minnesota Press.
———. 1984. Scientific Explanation and the Causal Structure of the World. Princeton, NJ: Princeton University Press.
———. 1978. “Religion and Science: A New Look at Hume’s Dialogues.” Philosophical Studies 33: 143–76.
———. 1966. The Foundations of Scientific Inference. Pittsburgh: University of Pittsburgh Press.
———. 1957. “Should We Attempt to Justify Induction?” Philosophical Studies 8 (3): 33–48.
Salmon, W., and J. Earman. 1992. “The Confirmation of Scientific Hypotheses.” In Introduction to the Philosophy of Science, M. Salmon, J. Earman, C. Glymour, J. Lennox, P. Machamer et al. (pp. 42–103). Upper Saddle River, NJ: Prentice-Hall.
Salmon, M., J. Earman, C. Glymour, J. Lennox, P. Machamer et al. 1992. Introduction to the Philosophy of Science. Upper Saddle River, NJ: Prentice-Hall.
Sanders, K. 1991. “Representing and Reasoning about Open-textured Predicates.” Proceedings of the Third International Conference on Artificial Intelligence and Law (pp. 137–44). New York: ACM.
Savage, L. J. 1972. Foundations of Statistics. New York: Dover.
Schabas, M. 2006. The Natural Origins of Economics. Chicago: University of Chicago Press.
Schaffner, K. 1986. “Exemplar Reasoning about Biological Models and Diseases: A Relation between the Philosophy of Medicine and Philosophy of Science.” Journal of Medical Philosophy 11 (February): 63–80.
———. 1972. Nineteenth-Century Aether Theories. Oxford, UK: Pergamon Press.
Schank, R. C. 1982. Dynamic Memory: A Theory of Reminding and Learning in Computers and People. Cambridge: Cambridge University Press.
Seifert, C. 1988. “Analogy and Case-Based Reasoning.” In Case-based Reasoning: Proceedings of a Workshop on Case-Based Reasoning, ed. J. Kolodner (pp. 125–29). Palo Alto, CA: Morgan Kaufmann.
Sellars, W. 1965. “Scientific Realism or Irenic Instrumentalism.” In Boston Studies in the Philosophy of Science, vol. 2, ed. R. S. Cohen and M. W. Wartofsky (pp. 171–204). Dordrecht: Synthese.
———. 1956. “Empiricism and the Philosophy of Mind.” In The Foundations of Science and the Concepts of Psychoanalysis, Minnesota Studies in the Philosophy of Science, vol. 1, ed. H. Feigl and M. Scriven (pp. 253–329). Minneapolis: University of Minnesota Press.
Shanks, R. G. 1984. “The Discovery of Beta Adrenoceptor Blocking Drugs.” In Discoveries in Pharmacology, vol. 2, ed. M. J. Parnham and J. Bruinvels (pp. 38–72). Amsterdam: Elsevier.
Shapere, D. 1966. “Plausibility and Justification in the Development of Science.” Journal of Philosophy 63 (20): 611–21.
Shelley, C. 2003. Multiple Analogies in Science and Philosophy. Amsterdam: John Benjamins.
———. 1999. “Multiple Analogies in Archaeology.” Philosophy of Science 66: 579–605.
Shimony, A. 1970. “Scientific Inference.” In The Nature and Function of Scientific Theories, ed. R. Colodny (pp. 79–172). Pittsburgh: University of Pittsburgh Press.
Siegel, D. M. 1991. Innovation in Maxwell’s Electromagnetic Theory. Cambridge: Cambridge University Press.
———. 1986. “The Origin of the Displacement Current.” Historical Studies in the Physical and Biological Sciences 17: 99–146.
Simpson, R. L. 1985. “A Computer Model of Case-Based Reasoning in Problem Solving: An Investigation in the Domain of Dispute Mediation.” Technical Report GIT-ICS85/18, Georgia Institute of Technology, School of Information and Computer Science, Atlanta, GA.
Sklar, L. 2000. Theory and Truth: Philosophical Critique within Foundational Science. Oxford: Oxford University Press.
Snyder, L. 2006. Reforming Philosophy: A Victorian Debate on Science and Society. Chicago: University of Chicago Press.
Sober, E. 2000. “Evolution and the Problem of Other Minds.” Journal of Philosophy 97: 365–86.
———. 1984. The Nature of Selection. Cambridge, MA: MIT Press.
Spirtes, P., C. Glymour, and R. Scheines. 2000. Causation, Prediction and Search, 2nd ed. Cambridge, MA: MIT Press.
Spivak, M. 1980. Calculus, 2nd ed. Berkeley, CA: Publish or Perish.
Stafford, B. M. 1999. Visual Analogy: Consciousness as the Art of Connecting. Cambridge, MA: MIT Press.
Stebbing, L. S. 1933. A Modern Introduction to Logic, 2nd ed. London: Methuen.
Steiner, M. 1998. The Applicability of Mathematics as a Philosophical Problem. Cambridge, MA: Harvard University Press.
Steiner, M. 1989. “The Application of Mathematics to Natural Science.” Journal of Philosophy 86: 449–80.
Stepan, N. 1996. “Race and Gender: The Role of Analogy in Science.” In Feminism and Science, ed. E. F. Keller and H. Longino (pp. 121–36). Oxford: Oxford University Press.
Sterrett, S. 2006. “Models of Machines and Models of Phenomena.” International Studies in the Philosophy of Science 20 (March): 69–80.
Strawson, P. 1959. Individuals. London: Methuen. Subramanian, D. 1988. A Theory of Justified Reformulations. Ph.D. thesis, Stanford University, Palo Alto, CA. Suppes, P. 1966. “A Bayesian Approach to the Paradoxes of Confirmation.” In Aspects of Inductive Logic, ed. J. Hintikka and P. Suppes (pp. 198–207). Amsterdam: NorthHolland Press. Talalay, L. E. 1987. “Rethinking the Function of Clay Figurine Legs from Neolithic Greece: An Argument by Analogy.” American Journal of Archaeology 91 (2): 161–69. Taylor, G. J. 1994. “The Scientific Legacy of Apollo.” Scientific American 271 (July): 40–47. Thagard, P. 1989. “Explanatory Coherence.” Behavioural and Brain Science 12: 435–502. ———. 1988a. “Dimensions of Analogy.” In Analogical Reasoning: Perspectives of Artificial Intelligence, Cognitive Science, and Philosophy, ed. D. Helman (pp. 105–24). Dordrecht: Kluwer Academic Publishers. ———. 1988b. Computational Philosophy of Science. Cambridge, MA: MIT Press. Thagard, P., K. Holyoak, G. Nelson, and D. Gochfeld. 1990. “Analog Retrieval by Constraint Satisfaction.” Artificial Intelligence 46: 259–310. Timoshenko, S., and Goodier, J. 1970. Theory of Elasticity, 3rd ed. New York: McGraw-Hill. Van Fraassen, Bas. 1995. “Belief and the Problem of Ulysses and the Sirens.” Philosophical Studies 77: 7–37. ———. 1989. Laws and Symmetry. Oxford: Clarendon Press. ———. 1984. “Belief and the Will.” Journal of Philosophy 81: 235–56. ———. 1980. The Scientific Image. Oxford: Clarendon Press. Veloso, M. 1992. Learning by Analogical Reasoning in General Problem Solving. PhD thesis, Carnegie Mellon University, Pittsburgh, PA. Villegas, C. 1981. “Inner Statistical Inference II.” Annals of Statistics 9 (4): 768–76. ———.1977. “Inner Statistical Inference.” Journal of the American Statistical Association 72: 453–58. von Wright, G. H. 1951. A Treatise on Induction and Probability. London: Routledge and Kegan Paul. Weitzenfeld, J. S. 1984. “Valid Reasoning by Analogy.” Philosophy of Science 51: 137–49. 
Williamson, J. 1999. “Countable Additivity and Subjective Probability.” British Journal for the Philosophy of Science 50: 401–16. Wilson, D. 1983. Rutherford: Simple Genius. London: Hodder and Stoughton. Winston, P. 1980. “Learning and Reasoning by Analogy.” Communications of the ACM 23 (12): 689–703. Woods, J., A. Irvine, and D. Walton. 2004. Argument: Critical Thinking, Logic and the Fallacies, 2nd ed. Toronto: Pearson Education Canada. Woodward, J. 2003. Making Things Happen: A Theory of Causal Explanation. Oxford: Oxford University Press. Wright, L. 1976. Teleological Explanations. Berkeley: University of California Press. Wylie, A. 1985. “The Reaction Against Analogy.” Advances in Archaeological Method and Theory 8: 63–111. ———. 1982. “An Analogy by Any Other Name Is Just as Analogical.” Journal of Anthropological Archaeology 1: 382–401.
Index
In the index, analogical reasoning is often abbreviated as AR. abductive analogies, 122–128; prior association for, 122–23; preconditions for, 123–25; critical factors for, 125; prima facie plausibility of, 127–28 abductive/probabilistic analogies, 128–33; prior association for, 129; preconditions for, 129–30; critical factors in, 131–32; and prima facie plausibility, 132; and determinants of qualitative plausibility, 133 abelian groups: as example of analogical reasoning, 167–68, 169–71, 182, 299 Achinstein, P., 22–23 ACME program, 72–73 acoustical analogy: as example of analogical reasoning, 123, 125–28, 209, 222, 260, 296–98, 331 advocate and critic: as device for evaluating analogical arguments, 5, 7, 102, 104, 154 Agassi, J., 52–56, 200
Ahlquist, R. P., 112 aleatory explanation, 113–14, 227–28; defeating conditions for, 115–116 algebraic similarity, 161–165 analogical arguments, 13–14; and argument from example (paradeigma), 36–38; and argument from likeness (homoiotes), 36, 38–39; and articulation model, 24–30, 33; basic evaluation criteria for, 19–21, 39, 40–44; defective, when to give up on, 330–1; defined, 1; individual and multiple, 141–42; outside of science, 314–17; and prior probabilities, 290–99; representation and terminology, 13–15; varieties of, 96–98. See also analogical arguments, varieties of analogical arguments, varieties of: abductive, 122–28, 209, 219, 222; abductive/probabilistic, 128–33; correlative, 97, 138–41, 197–207; functional, 96–97, 133–38; mathematical, 107–11, 209, 220; predictive/probabilistic, 112–21 analogical reasoning (AR), examples of: abelian groups, 167–68, 169–71, 182, 299; acoustical analogy, 123, 125–28,
209, 222, 260, 296–98, 331; animal spirits, 198; animal testing, 47, 207; argument for other minds, 8; Aristotle on earthquakes, 26, 242; artificial and natural selection, 24, 26, 129–33, 224–25, 236, 300, 307–13; beta-blockers, 112, 114, 117–21; church and state, 232; complex numbers and operators, 188–90, 331; dinosaur speed, 224; electric current and fluid flow, 209; electromagnetic radiation, 212–14, 218–20, 307–13, 321; electrostatic attraction, 282; ether analogy, 9–10, 106; Euler characteristic formula, 155, 167, 169, 171–72, 185–86; finite and infinite matrices, 175, 178–79; fluid flow and heat conduction, 44, 93; Franklin on lightning, 43, 197–98, 205–7, 242, 262; heat and electricity, 208; hydrodynamic analogies, 267, 288; Kelvin and hydrodynamic analogies, 2; life on other worlds, 20, 25, 41, 50, 71, 112–13, 114, 115, 117–21, 146–47, 233, 296–98, 331; ℓ^p and L^p spaces, 191; money and fluid, 312–13; morphine and meperidine, 3, 95–96, 139–41, 205–7, 195, 229, 241; Peruvian bark, 199, 205–7; Peruvian pots, 133–38, 196, 225, 227; polynomials and power series, 157–159, 174–77; punctured sphere, 157, 167; race-gender analogies, 11–12; rectangles and boxes, 41, 44, 154, 167, 169–71; relativistic Schrödinger equation, 215–16, 221–22; Rutherford model of atom, 4; saccharin and cancer, 6; Schrödinger equation, 214–16, 221; series convergence, 190–91; Sizzi’s analogy, 23, 27; slab lamps, 134, 137, 225–28; sound and light, 195, 307–13; spherical shells, 95, 122–23, 126–28, 296–98, 209; triangles and tetrahedra, 8, 95, 107–110, 331; trigonometric and hyperbolic functions, 163–65; used car purchase, 47, 58, 267–269; used car purchase II, 270; VCR sales, 232; walking on stilts, 95, 133, 135
analogical reasoning: and analogical arguments, 4–7; characteristics of, 1–4; and commonsense guidelines, 19–21, 240–43; and commonsense inference rules, 21–23; commonsense model of, 44–45; and confirmation of scientific theories, 31–32; deductive conception of, 22, 46–48; defined, 1; and historical relativization, 26–28; inductive/sampling conception of, 21–22, 48–52, 93; normative aspects and computational models of, 61, 327; normative theories of, viii, 3, 91–92; and normative theories of confirmation, viii, 3, 61, 91–92; and philosophical justification, 28–30, 33; and plausibility, 15–19; perceptual models of, ix, 84, 86–88, 313–14; public nature of, 5; role of in research programs, 11–12, 87–88, 330–31; in science and mathematics, 2, 7–12; and scientific revolutions, 306–14; skeptical objections to, 52–56; structuralist approaches to, 65–72, 88; and values in science, 254–61; and visual arts, 313–14 analogical thinking, decomposition of in computational models, vii–viii analogy: constitutive, 11–12, 87; defined, 1, 13; and disappearing analogy problem, 124–25, 264, 302; ethnographic, 133–34, 136–38; explanatory, 25–26, 96; formal, 42, 207–10; hydrodynamical, 233–35; hypothetical, 14–15; larger role of, in mathematics, 184–86; between law and science, 252–54; material (Hesse), 42, 93; negative, 14, 100; neutral, 14, 100; physical, 208, 210, 221, 312; positive, 14, 100; predictive, 96, 112–21; Pythagorean, 211; self-perpetuating, 11–12; vertical and horizontal relations in, 14, 25. See also multiple analogies ANALOGY program, 63–65, 89, 148 animal spirits: as example of AR, 198 animal testing: as example of AR, 47, 207 argument for other minds: as example of AR, 8
argument thesis, 5–6, 87, 321–22 Aristotle, 4–5, 10n. 11, 21, 223; analysis of analogical arguments, 36–40; on causal principles, 39; on earthquakes, as example of AR, 26, 242 articulation model: desiderata for, 92–94; difficulty in handling correlative analogies, 138–40; formal version of, 98–102; informal version of, 102–5; extended to multiple analogies, 143–48; extended to nonscientific domains, 310, 315; preview of, 24–30; restrictions on, 94–95; and revolutionary analogies, 309–13 Ashley, K. D., 11n. 12, 78–80, 89, 145, 263 assumptions, hidden: problem of, 170–72 asymptotic similarity, 174–80; defined, 177 Bayesianism: role for analogical reasoning in, 19, 30–32, 266, 291, 300, 325; and strict coherence, 55, 280, 284–5; and confirmation, 30–32, 291; logical, 284; mish-mash view, 283; orthodox view, 280; and prior probability, 279–85. See also Dutch books; old evidence problem; prior probability Beccaria, G., 198 beta-blockers: as example of AR, 112, 114, 117–21 Black, J. W., 112 Brown, T., ix–x Campbell, N.R., 16–18 Carbonell, J. G., 81 Carnap, R., 21n. 23, 46n. 8 Cartwright, N., 49 case-based reasoning, 76–83, 88 causal condition (Hesse): and evaluation of analogical arguments, 40, 43–44, 242 Ceva’s theorem, 108 CHEF program, 61, 78 Chisholm, R., 243n. 4 church and state: as example of AR, 232 clustering postulate, 46 common sense: and analogical reasoning, 3, 19–23; guidelines, 19, 240–42; norms, 329–30
complex numbers and operators: as example of AR, 188–90, 331 computational theories: case-based, 60; general characterization, vii, 59; structuralist, 59 confirmation: Bayesian approach to, 30–32, 124–25, 279, 291; hypothetico-deductive model of, 124 conservatism, epistemic, 243–44 consistent mapping condition, 161 constraint-satisfaction model, 72–76 continuity arguments, 223–24; characterization and examples, 224–226; and invariants, 226–28; justification of, 228–33 Copernicus, 308 COPYCAT program, 62, 83–88, 89, 314 correlative analogies, 97, 138–141, 197–207; as challenge to articulation model, 138–140; evaluation criteria, 140–141, 202–7; preconditions for, 139–41; prima facie plausibility for, 139; prior association for, 138–39 corresponding states, law of, 233 critical factors: as defined in articulation model, 100, 101, 104 Darwin, C., 24, 26, 129–133, 224–226, 236, 307–13, 322 Davies, T.R., 46–48 De Finetti, B., 141, 280–281, 286–287, 289–290 De Finetti lottery, 181, 183 Descartes, R., 184 dinosaur speed: as example of AR, 224 domains: hypothetical (in thought experiments), 325; source and target, 13 DRAMA program, 73 Dunbar, K., 69, 327 Dutch books, 300–301, 324 electric current and fluid flow: as example of AR, 209 electromagnetic radiation: as example of AR, 212–14, 218–20, 307–13, 321 electrostatic attraction: as example of AR, 282 empirical norms: in analogical reasoning, 327
entity realism: analogical argument for, 317–20 enumerative induction: straight rule for, 21 epistemic closure: failure for plausibility arguments, 53 epistemic values, conservative, 243–46, 252; and justification for analogical reasoning, 254–260; primary and secondary, 244–46; progressive, 244, 252 ether analogy: as example of AR, 9–10, 106 ethnographic analogy, 69, 133–34, 136–38 Euler characteristic formula: as example of AR, 155, 167, 169, 171–72, 174, 185–86 Evans, T. G., 63–65, 84, 89 evolutionary and genealogical inference from similarities and differences, 200–4 explicitness in proofs: inferential and pragmatic, 170–71 factorization argument, 294–99 feature matching: and correlative analogies, 197–207 finite and infinite matrices: as example of AR, 175, 178–9 fluid flow and heat conduction: as example of AR, 44, 93 formal analogies. See analogy: formal Franklin on lightning: as example of AR, 43, 197–98, 205–7, 242, 262 Franklin, J., 287 n. 27 Friedman, M., 259 functional analogies, 96–7, 133–38; critical factors for, 137; and determinants of qualitative plausibility, 138; preconditions for, 136–37; prior association for, 135 Gaifman, H., 180–81, 183, 281 n. 19 Galileo, 232, 322; and thought experiments, 4–5 generalization, potential for: as requirement in articulation model, 25–26, 32–33, 57, 101, 104–5, 148, 196 geometric similarity, 165–174; admissible, 174; defined, 166; and problem of
separated domains, 172–74; and problem of specious resemblance, 169–72; and ranking rival analogical arguments, 173 Gentner, D. B., 24, 32n. 28, 65–72, 217, 326 Goldman, A., 27–28 Gould, R.A., 136–37 Gould, S.J., 200–2 Hacking, I., 199 n. 4, 317–20 Hadamard, J., 4 n. 2, 314 Hammond, K. J., 61 Hanson, N. R., 17 Harrod, R.F., 50–51 Hawthorne, J., 291, 301 heat and electricity: as example of AR, 208 Hempel, C. G., 22, 45, 207–210 Herschel, J., 17, 302 Hesse, M., vii–viii, 9, 13–14, 207–8, 242; and criteria for evaluation of analogical arguments, 40–45; on justification for analogical reasoning, 45–46; on material analogy and pre-theoretic similarity, 42–43, 93–94 Hitchcock, C. R., 114n. 21 Hofstadter, D., ix, 67, 75, 83–88, 89, 314 Holyoak, K., 24n. 25, 32n. 28, 72–76, 217, 327, 329–30 horizontal relations: in analogy, 14, 24–25 Howson, C., 280–81, 285–86 Hume, D., 49, 311–313 Hummel, J., 327 Humphreys, P., 113–15, 129, 130 n. 36, 227–28 Huygens, C., 10, 13–14, 307–13 hydrodynamic analogies: as example of AR, 267, 288 HYPO program, 11 n. 12, 78–80, 89, 142 hypothetical cases: role in analogical reasoning, 231–32 hypothetical models and thought experiments, 320–26 induction, single-case, as analysis of analogical reasoning, 37, 45, 49–50 invariants, 226–28; determinate and indeterminate, 227–28 isomorphism, 159–61; difficulties of, as model for analogy, 160–61; nomic,
22, 208; in computational theories of analogical reasoning, 59–60, 71, 326 James, W., 243 Jaynes, E. T., 284, 286 Jeffrey, R., 283, 286 Jeffreys, H., 255 n. 21, 260, 282–84, 286 Johnson, M., ix–x, 11, 262 JULIA program, 77 justification of analogical reasoning, 28–30; as deductive argument form, 46–48; from epistemic values, 29, 56, 254–60; in Hesse’s theory, 45–46; as inductive argument form, 48–52; in the legal setting, 29, 249–51; from symmetry-based reasoning, 30, 48, 180–84, 271–77, 294–99 Kedar-Cabelli, S., 65 Kelvin: and hydrodynamic analogies, 2 Kepler, J., 17 Keynes, J.M., 14, 49, 50 n. 10 kinds: common, 198–200; and correlative analogies, 197–207; natural, 49–50, 140; sophisticated naturalism about, 201. See also evolutionary and genealogical inference from similarities and differences Kitcher, P., 105, 259 Kolodner, J. L., 77 Kuhn, T. S., 11–12, 33, 80, 87, 201, 244–46, 263, 306–9, 324, 330 Kyburg, H., 49 Lakatos, I., 10, 155 n. 1 Lakoff, G., ix–x, 11, 93 n. 2, 262 law and science, analogy between, 252–53; and major disanalogies, 253–54, 261–63 Lee, H. S., 329–30 Lennox, J. G., 322 life on other worlds (Reid): as example of AR, 20, 25, 41, 50, 71, 112–13, 114, 115, 117–21, 146–47, 233, 296–98, 331 limits, problem for mathematical analogies, 157–59 Liston, M., 216–17, 221 Llewellyn, K., 247–8 Lloyd, G.E.R., 36–39
logical omniscience, problem of, 180–81 Lottery Paradox, 53 MAC/FAC program, 66, 327 magic: and analogy, 1–2 material analogy: requirement for, in evaluation of analogical arguments, 40, 42–43, 93–94 mathematical analogies: basic analysis, 25–26, 107–11; critical factors for, 109; and idle assumptions, 109, 111; larger role in mathematical inquiry, 184–86; prima facie plausibility of, 110–11, 153; prima facie plausibility, version II, 159; prima facie plausibility, version III, 177; prior association for, 107; and probabilistic reasoning, 301–2 matrices, finite and infinite: as example of AR, 175, 178–79 Maxwell, J. C., 2, 208, 212–14, 218–20, 307–13, 321 McMullin, E., 244, 255 MEDIATOR program, 78 mental modeling: and thought experiments, 320–21 metaphor, and analogy, ix–x, 93–94; and open-textured predicates, 9–10, 262–3; and perception of similarity, 11–12; and visual analogy, 313–14 Mill, J. S., 8 n. 8, 19 n. 20, 20–22, 39–40, 50, 112–13, 115, 118, 241 Mitchell, M., 83–88, 89 modal argument and symmetry-based justification for analogical reasoning: and extensions of articulation model, 278–79; general version of, 274–76; objections to, 276–77; special version of, 271–74 money and fluid: as example of AR, 312–13 morphine and meperidine: as example of AR, 3, 95–96, 139–41, 229, 205–7, 195, 241 Morrison, M., 196, 210 n. 15, 259 n. 25 multiple analogies, 141–148; and application to problem of separated domains, 172–73; critical factors relative to, 145; and justification via the modal argument, 278–79; and
primacy of individual analogical arguments, 141–43; prima facie plausibility relative to, 143–47; and qualitative plausibility, 147–48; ranking of, 145–47, 172–73 Myrvold, W.C., 300 natural and artificial selection: as example of AR, 24, 26, 129–33, 224–25, 236, 300, 307–13 natural kinds. See kinds: natural Nersessian, N., 320–23 no-critical-difference condition: and prima facie plausibility, 101, 106 nomic isomorphism. See isomorphism: nomic no-essential-difference condition (Hesse): and evaluation of analogical arguments, 40, 44 normal science, 12, 33, 88, 201–2, 240, 244–46, 255, 305–7 Nickles, T., 309 n. 5 Norton, J.: and argument thesis, 321–22; on thought experiments, 4–5 Ohm’s law, 209 old evidence problem, 31, 180 open-textured predicates, 9–11, 254, 262–3 Oppenheimer, R., 106 optimality, weak: and ethical values, 250; and epistemic values, 256–60 paleontology: evolutionary and genealogical inferences in, 200–4 Pareto-optimality, 250 n. 17 parsimony analysis, 202–3 Peirce, C. S., 51 Pera, M., 2, 198 Peruvian bark: as example of AR, 199, 205–7 Peruvian pots: as example of AR, 133–38, 196, 225, 227 phylogenetic trees, 202–3 physical analogy. See analogy: physical plausibility: of scientific hypotheses, 2–3, 6–7; conceptions of, 15–19; connection to truth, 256; modal conception of, 16–18; probabilistic conception of, 15–16, 180–84, 187;
relationship between modal and probabilistic conceptions of, 18–19, 299–300; of revolutionary scientific hypotheses, 308. See also prima facie plausibility Playfair, J., 163–65 Poincaré, H., 25, 152, 313 Poiseuille’s law, 209 Polya, G., 16, 105, 151–153, 155–56, 177, 267 polynomials and power series: as example of AR, 157–159, 174–77 Popper, K., 260 potential for generalization. See generalization, potential for potentially relevant factors: for an analogical argument, in articulation model, 99 preconditions: in articulation model, 100, 102–3. See also analogical arguments, varieties of predictive/probabilistic analogies, 112–21; and aleatory explanations, 114; causal factors in, 113–15; critical factors for, 116; defeating conditions for, 115–116; prima facie plausibility for, 117–18; prior association for, 113–15; and qualitative plausibility, 120–21 Priestley, J., 2, 95, 124, 290, 296 prima facie plausibility, 16–18, 32–33, 101–2, 275–76, 291–93; for abductive analogies, 127–28; for abductive/probabilistic analogies, 132; for correlative analogies, 139–40; for functional analogies, 137–38; threshold for, 316, 328–29; for mathematical analogies, 110–11; for multiple analogies, 143–47; for predictive/probabilistic analogies, 117–18 Principle of Indifference, 183, 269, 284–6, 288, 299 Principle of Common Cause, 205 prior association: as requirement in articulation model, 25–26, 32–33, 57, 94, 100–103, 274–75; for abductive analogies, 122–25; for abductive/probabilistic analogies, 129; augmented, 101, 116; canonical
forms, 100, 103; for correlative analogies, 138–39; direction of, 96–97; for functional analogies, 135–36; and historical relativization of standards, 26–28; and mathematical analogies, 107–9; mode of, 97; for predictive/probabilistic analogy, 114; and upper bound thesis, 98, 103, 299 prior probability, 279–85; and extreme bias, 284; nonnegligible, 266, 279, 284, 291, 293–94; problems with uniform distribution, 282–83; relative, 292 probabilistic analogies. See abductive/probabilistic analogies; predictive/probabilistic analogies probability models, 292; partial, 291–293 PRODIGY program, 81 proportionality arguments, 225–26, 233–36 punctured sphere: as example of AR, 157, 167 Pythagorean analogies: defined, 211; and Steiner’s problem, 211–12 Quine, W. V., 40, 49–50, 197 n. 1, 199–200, 243 n. 3 race-gender analogies: as example of AR, 11–12 rectangles and boxes: as example of AR, 41, 44, 154, 167, 169–71 reflection principle: general, 272; modal, 272–74 Reichenbach, H., 205, 255 n. 21 Reid, T., 20, 41 relativistic Schrödinger equation: as example of AR, 215–16, 221–22 reliabilism, 28 residual factors constraint, 293, 298 Resnik, D., 318–19 Reynolds number, 234–5 Russell, Stuart, 46–48, 51–52 Rutherford model of atom: as example of AR, 4 saccharin and cancer, 6 Salmon, W., 6, 16 n. 16, 30–31, 205, 258–59, 266, 281, 291
scale models, 233–35 Schabas, M., 312–13 Schaumann, O., 95–6, 139–40 schema: for representing analogical arguments. See tabular representation of AR Schrödinger equation: as example of AR, 214–16, 221 scientific revolutions: and AR, 306–14 scripts: in case-based reasoning, 61, 76 separated domains, problem of, 156–57, 172 series convergence, 175; as example of AR, 190–91 Shelley, C., 133–4, 141 n. 44, 143 Shimony, A., 259–60 Siegel, D.M., 218–19 similarity, 93–4, 195–97; algebraic, 161–65; Aristotle on, 39; asymptotic, 174–80; dynamic, 224, 234; and feature matches, 195, 197ff.; formal, 195, 207–10; geometric, 165–74, 224, 234; in mathematical analogies generally, 151–54, 187; and open-textured predicates, 9–11; parametric, 195–96, 223ff.; relational vs. surface features, 24, 66, 69; self-perpetuating, 11–12; syntactic approach to, 65. See also evolutionary and genealogical inference; formal analogies; isomorphism; material analogy similarity-identity reduction: in Aristotle, 39; in Hesse’s theory, 41–42 simplicity, of theories of analogical reasoning, 62, 92; of a scientific hypothesis or theory, 244–5, 258–59, 260; postulate of (Jeffreys), 283 Sizzi, F., 23 Sizzi’s analogy: as example of AR, 23, 27 slab lamps: as example of AR, 134, 137, 225–28 Sleeping Beauty problem, 286 n. 25 Snyder, 17 Sober, E., 130–31 sound and light: as example of AR, 195, 307–13 specious resemblance, problem of, 154–56, 169–72, 223
spherical shells: as example of AR, 95, 122–23, 126–28, 296–98, 209 stability arguments, 225–26; simple, 229–31 Stafford, B.M., 313 stare decisis, 29, 240, 246–51; justification for, 29, 249–51 Stebbing, L. S., 20 Steiner, M., 2, 43, 210–23 Stepan, N., 11–12 Sterrett, S., 233–35 Stone, E., 199, 200, 205–6 structural equations, 226 structure-mapping theory (and SME), 65–72, 326–30 Suppes, P., 281, 283, 285 symmetry, 187–88; basic principles of, 268–269; and betting quotients, 181–84; and constraints on practical reasoning, 267–74; and countable additivity, 290; implicit role of in mathematical proofs, 170; internal and external, 267; modal constraint, 271; psychological interpretation of, 286–87, 289–90; role of in problem-solving, 48, 58, 267–68; and subjective probability, 285–290; and thought experiments, 323. See also justification of analogical reasoning; modal argument; Principle of Indifference symmetry arguments, 266–68; internal and external, 288–89; a priori status of, 286–89 systematicity (and systematicity principle), viii n. 2, 42, 61, 65–72, 88, 93, 326–30 table-covering problem, 267 tabular representation of AR, 13–14; augmented, 15
Thagard, P., 13 n. 14, 24 n. 25, 32 n. 28, 61, 67 n. 6, 72–76, 217, 327 thought experiments, 4–5, 232, 320–26; and symmetry, 323 topology, 175–179 triangles and tetrahedra: as example of AR, 8, 95, 107–110, 163–65, 331 trigonometric and hyperbolic functions: as example of AR, 163–65 unification, 2, 105, 196; and generalization, 196; and justification of AR, 258–60 uniformities and uniformitarianism, 136–38, 196; contrasted with invariants, 225–27 upper bound thesis, 98, 103, 119–20, 294 n. 32, 299 Urbach, P., 280–81, 285–86 used car purchase: as example of AR, 47, 58, 267–269; version II, 270 Van Fraassen, B., 245, 255, 268, 272, 286 n. 26, 288, 300–1 VCR sales: as example of AR, 232 vera causa, 17, 302 vertical relations: in analogy, 14, 24–25 walking on stilts: as example of AR, 95, 133, 135 Watson, P.J., 136–7 Weitzenfeld, J. S., 22, 47–48, 228 Whewell, W., 17, 302 Woodward, J., 226–27, 232, 316 Wright, L., 134–35 Wylie, A., 136, 138