The Foundations of Statistics
LEONARD J. SAVAGE
Late Eugene Higgins Professor of Statistics, Yale University
SECOND REVISED EDITION
DOVER PUBLICATIONS, INC. NEW YORK
Copyright © 1972 by Dover Publications, Inc. Copyright © 1954 by I. Richard Savage. All rights reserved under Pan American and International Copyright Conventions.
This Dover edition, first published in 1972, is a revised and enlarged version of the work originally published by John Wiley & Sons in 1954.
International Standard Book Number: 0-486-62349-1
Library of Congress Catalog Card Number: 7"1181'"
Manufactured in the United States of America
Dover Publications, Inc.
180 Varick Street
New York, N.Y. 10014
TO MY FATHER
Preface to the Dover Edition
CONTINUING INTEREST HAS ENCOURAGED PUBLICATION OF A SECOND edition of this book. Because revising it to fit my present thinking and the new climate of opinion about the foundations of statistics would obliterate rather than restore, I have limited myself in the preparation of this edition much as though dealing with the work of another.
The objective errors that have come to my attention, mainly through the generosity of readers, of whom Peter Fishburn has my special thanks, have been corrected, of course. Minor and mechanical ones, such as a name misspelled or an inequality that had persisted in pointing in the wrong direction, have been silently eliminated. Other changes are conspicuous as additions. They consist mainly of this Preface, Appendix 4: Bibliographic Supplement, and several footnotes identified as new by the symbol +.
To enable you to pursue the many new developments since 1954 according to the intensity and direction of your own interests, a number of new references leading to many more are listed in the Bibliographic Supplement, and the principal advances known to me are pointed out in new footnotes or in comments on the new references.
Citations to the bibliography in the original Appendix 3 are made by a compact, but otherwise ill-advised, letter and number code; those to the new Appendix 4 are made by a now popular system, which is effective, informative, and flexible. Example: The historic papers (Borel 1924) and [D2] have been translated by Kyburg and Smokler (1964).
The following paragraphs are intended to help you approach this book with a more current perspective. To some extent, they will be intelligible and useful even to a novice in the foundations of statistics, but they are necessarily somewhat technical and will therefore take on new meaning if you return to them as your reading in this book and elsewhere progresses.
The book falls into two parts. The first, ending with Chapter 7, is a general introduction to the personalistic tradition in probability and utility. Were this part to be done over, radical revision would not be required, though I would now supplement the line of argument centering around a system of postulates by other less formal approaches, each convincing in its own way, that converge to the general conclusion that personal (or subjective) probability is a good key, and the best yet
known, to all our valid ideas about the applications of probability. There would also be many new works to report on and analyze more thoroughly than can be done in footnotes.
The original aim of the second part of the book, beginning with Chapter 8, is all too plainly stated in the second complete paragraph on page 4. There, a personalistic justification is promised for the popular body of devices developed by the enthusiastically frequentistic schools that then occupied almost the whole statistical scene and still dominate it, though less completely. The second part of the book is indeed devoted to personalistic discussion of frequentistic devices, but for one after another it reluctantly admits that justification has not been found. Freud alone could explain how the rash and unfulfilled promise on page 4 went unamended through so many revisions of the manuscript.
Today, as I see it, the theory of personal probability applied to statistics shows that many of the prominent frequentistic devices can at best lead to accidental and approximate, not systematic and cogent, success, as is expanded upon, perhaps more optimistically, by Pratt (1965). Among the ill-founded frequentistic devices are minimax rules, almost all tail-area tests, tolerance intervals, and, in a sort of class by itself, fiducial probability.
If I have lost faith in the devices of the frequentistic schools, I have learned new respect for some of their general theoretical ideas. Let me amplify first in connection with the Neyman-Pearson school. While insisting on long-run frequency as the basis of probability, that school wisely emphasizes the ultimate subjectivity of statistical inference or behavior within the objective constraint of "admissibility," as in (Lehmann 1958; Wolfowitz 1962). But careful study of admissibility leads almost inexorably to the recognition of personal probabilities and their central role in statistics (Savage 1961, Section 4; 1962, pp. 170-175), so personalistic statistics appears as a natural late development of the Neyman-Pearson ideas.
One consequence of this sort of analysis of admissibility is the extremely important likelihood principle, a corollary of Bayes' theorem, of which I was not even aware when writing the first edition of this book. This principle, inferable from, though nominally at variance with, Neyman-Pearson ideas (Birnbaum 1962), was first put forward by Barnard (1947) and by Fisher (1955), members of what might be called the Fisher school of frequentists. See also (Barnard 1965; Barnard et al. 1962; Cornfield 1966).
The views just expressed are evidently controversial, and if I have permitted myself such expressions as "show" and "inexorably," they are not meant with mathematical finality. Yet, controversial though
they may be, they are today shared by a number of statisticians, who may be called personalistic Bayesians, or simply personalists.
This book has played, and continues to play, a role in the personalistic movement, but the movement itself has other sources apart from those from which this book itself was drawn. One with great impact on practical statistics and scientific management is a book by Robert Schlaifer (1959). This is a welcome opportunity to say that his ideas were developed wholly independently of the present book, and indeed of other personalistic literature. They are in full harmony with the ideas in this book but are more down to earth and less spellbound by tradition.
L. J. SAVAGE
Yale University
June, 1971
Preface to the First Edition
A BOOK ABOUT SO CONTROVERSIAL A SUBJECT AS THE FOUNDATIONS of statistics may have some value in the classroom, as I hope this one will; but it cannot be a textbook, or manual of instruction, stating the accepted facts about its subject, for there scarcely are any. Openly, or coyly screened behind the polite conventions of what we call a disinterested approach, it must, even more than other books, be an airing of its author's current opinions.
One who so airs his opinions has serious misgivings that (as may be judged from other prefaces) he often tries to communicate along with his book. First, he longs to know, for reasons that are not altogether noble, whether he is really making a valuable contribution. His own conceit, the encouragement of friends, and the confidence of his publisher have given him hope, but he knows that the hopes of others in his position have seldom been fully realized. Again, what he has written is far from perfect, even to his biased eye. He has stopped revising and called the book finished, because one must sooner or later. Finally, he fears that he himself, and still more such public as he has, will forget that the book is tentative, that an author's most recent word need not be his last word.
The application of statistics interests some workers in almost every field of empirical investigation, not only in science, but also in commerce and industry. Moreover, the foundations of statistics are connected conceptually with many disciplines outside of statistics itself, particularly mathematics, philosophy, economics, and psychology, a situation that, incidentally, must augment the natural misgivings of an author in this field about his own competence. Those who read in this book may, therefore, be diverse in background and interests. With this consideration in mind, I have endeavored to keep the book as free from technical prerequisites as its subject matter and its restriction to a reasonable size permit.
Technical knowledge of statistics is nowhere assumed, but the reader who has some general knowledge of statistics will be much better prepared to understand and appraise this book. The books Statistics, by L. H. C. Tippett, and On the Principles of Statistical Inference, by
A. Wald, listed in the Bibliography at the end of Appendix 3, are short authoritative introductions to statistics, either of which would provide some statistical background for this book. The books of Tippett and Wald are so different in tone and emphasis that it would by no means be wasteful to read them both, in that order.
Any but the most casual reader should have some formal preparation in the theory of mathematical probability. Those acquainted with moderately advanced theoretical statistics will automatically have this preparation; others may acquire it, for example, by reading Theory of Probability, by M. E. Munroe, or selected parts of An Introduction to Probability Theory and Its Applications, by W. Feller, according to their taste. In Feller's book, a thorough reading of the Introduction and Chapter 1, and a casual reading of Chapters 5, 7, and 8 would be sufficient.
The explicit mathematical prerequisites are not great; a year of calculus would in principle be more than enough. But, in practice, readers without some training in formal logic or one of the abstract branches of mathematics usually taught only after calculus will, I fear, find some of the long though elementary mathematical deductions quite forbidding. For the sake of such readers, I therefore take the liberty of giving some pedagogical advice here and elsewhere that mathematically more mature readers will find superfluous and possibly irritating.
In the first place, it cannot be too strongly emphasized that a long mathematical argument can be fully understood on first reading only when it is very elementary indeed, relative to the reader's mathematical knowledge. If one wants only the gist of it, he may read such material once only; but otherwise he must expect to read it at least once again. Serious reading of mathematics is best done sitting bolt upright on a hard chair at a desk. Pencil and paper are nearly indispensable; for there are always figures to be sketched and steps in the argument to be verified by calculation.
In this book, as in many mathematical books, when exercises are indicated, it is absolutely essential that they be read and nearly essential that they be worked, because they constitute part of the exposition, the exercise form being adopted where it seems to the author best for conveying the particular information at hand.
To some mathematicians, and even more to logicians, I must say a word of apology for what they may consider lapses of rigor, such as using the same symbol with more than one meaning and failing to distinguish uniformly between the use and the mention of a symbol; but they will understand that these lapses are sacrifices to what I take to be general intelligibility and will have, I hope, no real difficulty in repairing them.
Few will wish to read the whole book; therefore introductions to the chapters and sections have been so written as not only to provide orientation but also to facilitate skipping. In particular, safe detours are indicated around mathematically advanced topics and other digressions.
A few words in explanation of the conventions, such as those by which internal and external references are made in this book, may be useful. The abbreviation § 3.4 means Section 4 of Chapter 3; within Chapter 3 itself, this would be abbreviated still further to § 4. The abbreviation (3.4.1) means the first numbered and displayed equation or other expression in § 3.4; within Chapter 3, this would be abbreviated still further to (4.1) and within § 3.4 simply to (1). Theorems, lemmas, exercises, corollaries, figures, and tables are named by a similar system, e.g., Theorem 3.4.1, Theorem 4.1, Theorem 1. Incidentally, the proofs of theorems are terminated with the special punctuation mark ◆, a device borrowed from Halmos' Measure Theory.
Seven postulates, P1, P2, etc., are introduced over the course of several chapters. For ready reference these are, with some explanatory material, reproduced on the end papers.
Entries in the Bibliography at the end of Appendix 3 are designated by a self-explanatory notation in square brackets. For example, the works of Tippett, Wald, Munroe, Feller, and Halmos, already referred to, are [TJ], [W1], [M6], [F1], and [H2], respectively. I often allude to a set of key references to a given topic. This means a set of external references intended to lead the reader that wishes to pursue that particular topic to the fullest and most recent bibliographies; it has nothing to do with the merit or importance of the works referred to.
Technical terms (except for non-verbal symbols) that are defined in this book are printed in bold face or italics (depending on the importance of the term for this book or for established usage) in the context where the term is defined. These special fonts are occasionally used for other purposes as well. Terms are sometimes used informally, even in unofficial definitions, before being officially defined. Even the official definitions are sometimes of necessity very loose, corresponding to the well-known principle that, in a formal theory, some terms must in strict logic be left undefined.
L. J. SAVAGE
Acknowledgement
I HAVE MANY FRIENDS, FEW OF WHOM SHARE MY PRESENT OPINIONS, to thank for criticism and encouragement. Though the list seems long, I cannot refrain from explicitly mentioning: I. Bross, A. Burks, R. Carnap, B. de Finetti, M. Flood, I. J. Good, P. R. Halmos, O. Helmer, C. Hildreth, T. Koopmans, W. Kruskal, C. F. Mosteller, I. R. Savage, W. A. Wallis, and M. A. Woodbury. Wallis as chairman of my department and close friend has particularly encouraged me to write the book and facilitated my doing so in many ways.
Mrs. Janet Lowrey and Miss Louise Forsyth typed and retyped and did so many other painstaking tasks so well that it would be inadequate to call their help secretarial.
My work on the book was made possible by four organizations to which I herewith express thanks. During the years l~ through 1954 I worked on it at the University of Chicago, where the work was supported by the Office of Naval Research and the University itself, which also supported it during the summer of 1952. During the academic year 1951-52 I worked on it as a research scholar in France under the Fulbright Act (Public Law 584, 79th Congress), and during the whole of that year as a fellow of the John Simon Guggenheim Memorial Foundation.
L. J. S.
Contents
Postulates of a personalistic theory of decision (end papers)
1. INTRODUCTION
  1. The role of foundations
  2. Historical background
  3. General outline of this book
2. PRELIMINARY CONSIDERATIONS ON DECISION IN THE FACE OF UNCERTAINTY
  1. Introduction
  2. The person
  3. The world, and states of the world
  4. Events
  5. Consequences, acts, and decisions
  6. The simple ordering of acts with respect to preference
  7. The sure-thing principle
3. PERSONAL PROBABILITY
  1. Introduction
  2. Qualitative personal probability
  3. Quantitative personal probability
  4. Some mathematical details
  5. Conditional probability, qualitative and quantitative
  6. The approach to certainty through experience
  7. Symmetric sequences of events
4. CRITICAL COMMENTS ON PERSONAL PROBABILITY
  1. Introduction
  2. Some shortcomings of the personalistic view
  3. Connection with other views
  4. Criticism of other views
  5. The role of symmetry in probability
  6. How can science use a personalistic view of probability?
5. UTILITY
  1. Introduction
  2. Gambles
  3. Utility, and preference among gambles
  4. The extension of utility to more general acts
  5. Small worlds
  6. Historical and critical comments on utility
6. OBSERVATION
  1. Introduction
  2. What an observation is
  3. Multiple observations, and extensions of observations and of sets of acts
7. PARTITION PROBLEMS
  1. Introduction
  2. Structure of (twofold) partition problems
  3. The value of observation
8. STATISTICS PROPER
  1. Introduction
  2. What is statistics proper?
  3. Multipersonal problems
  4. The minimax theory
9. INTRODUCTION TO THE MINIMAX THEORY
  1. Introduction
  2. The behavioralistic outlook
  3. Mixed acts
  4. Income and loss
  5. The minimax rule, and the principle of admissibility
  6. Illustrations of the minimax rule
  7. Objectivistic motivation of the minimax rule
  8. Loss as opposed to negative income in the minimax rule
10. A PERSONALISTIC REINTERPRETATION OF THE MINIMAX THEORY
  1. Introduction
  2. A model of group decision
  3. The group minimax rule, and the group principle of admissibility
  4. Critique of the group minimax rule
11. THE PARALLELISM BETWEEN THE MINIMAX THEORY AND THE THEORY OF TWO-PERSON GAMES
  1. Introduction
  2. Standard games
  3. Minimax play
  4. Parallelism and contrast with the minimax theories
12. THE MATHEMATICS OF MINIMAX PROBLEMS
  1. Introduction
  2. Abstract games
  3. Bilinear games
  4. An example of a bilinear game
13. OBJECTIONS TO THE MINIMAX RULES
  1. Introduction
  2. A confusion between loss and negative income
  3. Utility and the minimax rule
  4. Almost sub-minimax acts
  5. The minimax rule does not generate a simple ordering
14. THE MINIMAX THEORY APPLIED TO OBSERVATIONS
  1. Introduction
  2. Recapitulation of partition problems
  3. Sufficient statistics
  4. Simple dichotomy, an example
  5. The approach to certainty
  6. Cost of observation
  7. Sequential probability ratio procedures
  8. Randomization
15. POINT ESTIMATION
  1. Introduction
  2. The verbalistic concept of point estimation
  3. Examples of problems of point estimation
  4. Criteria that have been proposed for point estimates
  5. A behavioralistic review of the criteria for point estimation
  6. A behavioralistic review, continued
  7. A behavioralistic review, concluded
16. TESTING
  1. Introduction
  2. A theory of testing
  3. Testing in practice
17. INTERVAL ESTIMATION AND RELATED TOPICS
  1. Estimates of the accuracy of estimates
  2. Interval estimation and confidence intervals
  3. Tolerance intervals
  4. Fiducial probability
APPENDIX 1. EXPECTED VALUE
APPENDIX 2. CONVEX FUNCTIONS
APPENDIX 3. BIBLIOGRAPHIC MATERIAL
APPENDIX 4. BIBLIOGRAPHIC SUPPLEMENT
GENERAL INDEX
CHAPTER 1
Introduction
1 The role of foundations
It is often argued academically that no science can be more secure than its foundations, and that, if there is controversy about the foundations, there must be even greater controversy about the higher parts of the science. As a matter of fact, the foundations are the most controversial parts of many, if not all, sciences. Physics and pure mathematics are excellent examples of this phenomenon. As for statistics, the foundations include, on any interpretation of which I have ever heard, the foundations of probability, as controversial a subject as one could name. As in other sciences, controversies over the foundations of statistics reflect themselves to some extent in everyday practice, but not nearly so catastrophically as one might imagine. I believe that here, as elsewhere, catastrophe is avoided, primarily because in practical situations common sense generally saves all but the most pedantic of us from flagrant error. It is hard to judge, however, to what extent the relative calm of modern statistics is due to its domination by a vigorous school relatively well agreed within itself about the foundations.
Although study of the foundations of a science does not have the role that would be assigned to it by naive first-things-firstism, it has a certain continuing importance as the science develops, influencing, and being influenced by, the more immediately practical parts of the science.
2 Historical background
The concept and problem of inductive inference have been prominent in philosophy at least since Aristotle. Mathematical work on some aspects of the problem of inference dates back at least to the early eighteenth century. Leibniz is said to be the first to publish a suggestion in that direction, but Jacob Bernoulli's posthumous Ars Conjectandi (1713) [B12] seems to be the first concerted effort.† This mathematical work has always revolved around the concept of probability; but, though there was active interest in probability for nearly a century before the publication of Ars Conjectandi, earlier activity seems not to have been concerned with inductive inference.
† Valuable information on this and other topics of the early philosophic history of probability is attractively presented in Keynes' treatise [K4], especially in Chapters VII, XXIII, and the bibliography.
In the present century there has been and continues to be extraordinary interest in mathematical treatment of problems of inductive inference. For reasons I cannot and need not analyze here, this activity has been strikingly concentrated in the English-speaking world. It is known under several names, most of which stress some aspect of the subject that seemed of overwhelming importance at the moment when the name was coined. "Mathematical statistics," one of its earliest names, is still the most popular. In this name, "mathematical" seems to be intended to connote rational, theoretical, or perhaps mathematically advanced, to distinguish the subject from those problems of gathering and condensing numerical data that can be considered apart from the problem of inductive inference, the mathematical treatment of which is generally relatively trivial. The name "statistical inference" recognizes that the subject is concerned with inductive inference. The name "statistical decision" reflects the idea that inductive inference is not always, if ever, concerned with what to believe in the face of inconclusive evidence, but that at least sometimes it is concerned with what action to decide upon under such circumstances. Within this book, there will be no harm in adopting the shortest possible name,
"statistics."
It is unanimously agreed that statistics depends somehow on probability. But, as to what probability is and how it is connected with statistics, there has seldom been such complete disagreement and breakdown of communication since the Tower of Babel. There must be dozens of different interpretations of probability defended by living authorities, and some authorities hold that several different interpretations may be useful, that is, that the concept of probability may have different meaningful senses in different contexts. Doubtless, much of the disagreement is merely terminological and would disappear under sufficiently sharp analysis. Some believe that it would all disappear, or even that they have themselves already made the necessary analysis.
Considering the confusion about the foundations of statistics, it is surprising, and certainly gratifying, to find that almost everyone is agreed on what the purely mathematical properties of probability are. Virtually all controversy therefore centers on questions of interpreting the generally accepted axiomatic concept of probability, that is, of determining the extramathematical properties of probability.
The widely accepted axiomatic concept referred to is commonly ascribed to Kolmogoroff [K7] and goes by his name. It should be mentioned that there is some dissension from it on the part of a small group led by von Mises [V2]. There are also a few minor technical variations on the Kolmogoroff system that are sometimes of interest; they will be discussed in § 3.4.
I would distinguish three main classes of views on the interpretation of probability, for the purposes of this book, calling them objectivistic, personalistic, and necessary. Condensed descriptions of these three classes of views seem called for here. If some readers find these descriptions condensed to the point of unintelligibility, let them be assured that fuller ones will gradually be developed as the book proceeds.
Objectivistic views hold that some repetitive events, such as tosses of a penny, prove to be in reasonably close agreement with the mathematical concept of independently repeated random events, all with the same probability. According to such views, evidence for the quality of agreement between the behavior of the repetitive event and the mathematical concept, and for the magnitude of the probability that applies (in case any does), is to be obtained by observation of some repetitions of the event, and from no other source whatsoever.
Personalistic views hold that probability measures the confidence that a particular individual has in the truth of a particular proposition, for example, the proposition that it will rain tomorrow. These views postulate that the individual concerned is in some ways "reasonable," but they do not deny the possibility that two reasonable individuals faced with the same evidence may have different degrees of confidence in the truth of the same proposition.
Necessary views hold that probability measures the extent to which one set of propositions, out of logical necessity and apart from human opinion, confirms the truth of another. They are generally regarded by their holders as extensions of logic, which tells when one set of propositions necessitates the truth of another.
After what has been said about the intensity and complexity of the controversy over the probability concept, you must realize that the short taxonomy above is bound to infuriate any expert on the foundations of probability, but I trust it may do the less learned more good than harm.
The great burst of statistical research in the English-speaking world in the present century has revolved around objectivistic views on the interpretation of probability. As will shortly be explained, any purely objectivistic view entails a severe difficulty for statistics. This difficulty is recognized by members of the British-American School, if I
may use that name without its being taken too literally or at all nationalistically, and is regarded by them as a great, though not insurmountable, obstacle; indeed, some of them see it as the central problem of statistics.
The difficulty in the objectivistic position is this. In any objectivistic view, probabilities can apply fruitfully only to repetitive events, that is, to certain processes; and (depending on the view in question) it is either meaningless to talk about the probability that a given proposition is true, or this probability can be only 1 or 0, according as the proposition is in fact true or false. Under neither interpretation can probability serve as a measure of the trust to be put in the proposition. Thus the existence of evidence for a proposition can never, on an objectivistic view, be expressed by saying that the proposition is true with a certain probability. Again, if one must choose among several courses of action in the light of experimental evidence, it is not meaningful, in terms of objective probability, to compute which of these actions is most promising, that is, which has the highest expected income. Holders of objectivistic views have, therefore, no recourse but to argue that it is not reasonable to assign probabilities to the truth of propositions or to calculate which of several actions is the most promising, and that the need expressed by the attempt to set up such concepts must be met in other ways, if at all.
The British-American School has had great success in several respects. The number of its adherents has rapidly increased. It has contributed many procedures of strong intuitive appeal and (one feels) of lasting worth. These have found widespread application in many sciences, in industry, and in commerce. The success of the school may pragmatically be taken as evidence for the correctness of the general view on which it is based. Indeed, anyone who overthrows that view must either discredit the procedures to which it has led, or show, as I hope to show in this book, that they are on the whole consistent with the alternative proposed.
Some, I among them, hold that the grounds for adopting an objectivistic view are not overwhelmingly strong; that there are serious logical objections to any such view; and, most important of all, that the difficulty a strictly objectivistic view meets in statistics reflects real inadequacy.
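The calculation of "which of several actions is the most promising" mentioned above can be made concrete once probabilities are allowed to attach to propositions, as on a personalistic view. The following is only a minimal sketch under that assumption: the states, actions, incomes, probabilities, and the function names `expected_income` and `most_promising` are all hypothetical and are not taken from the text.

```python
# Hypothetical illustration: every state, action, income, and probability
# below is invented for this sketch and does not come from the book.

def expected_income(incomes, probabilities):
    """Probability-weighted average income of one action over the states."""
    return sum(probabilities[state] * incomes[state] for state in probabilities)


def most_promising(actions, probabilities):
    """Name of the action with the highest expected income."""
    return max(actions, key=lambda name: expected_income(actions[name], probabilities))


# Income of each action in each state of the world (hypothetical units).
ACTIONS = {
    "carry umbrella": {"rain": 0.0, "no rain": -1.0},
    "leave umbrella": {"rain": -10.0, "no rain": 0.0},
}

# Two persons facing the same evidence with different degrees of confidence
# in the proposition that it will rain tomorrow.
PERSON_A = {"rain": 0.5, "no rain": 0.5}
PERSON_B = {"rain": 0.05, "no rain": 0.95}

if __name__ == "__main__":
    for label, probabilities in (("A", PERSON_A), ("B", PERSON_B)):
        print(label, most_promising(ACTIONS, probabilities))
```

With these invented figures the two persons are led to different actions, which is the kind of disagreement a personalistic view permits and a strictly objectivistic view has no means of expressing.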
3 General outline of this book
This book presents a theory of the foundations of statistics which is based on a personalistic view of probability derived mainly from the work of Bruno de Finetti, as expressed for example in [D2]. The theory
is presented in a tentative spirit, for I realize that the serious blemishes in it apparent to me are not the only ones that will be discovered by critical readers. A theory of the foundations of statistics that appears contrary to the teaching of the most productive statisticians will properly be regarded with extraordinary caution. Other views on probability will, of course, be discussed in this book, partly for their own interest and partly to explain the relationship between the personalistic view on which this book is based and other views.
The book is organized into seventeen chapters, of which the present introduction is the first. Chapters 2-7 are, so to speak, concerned with the foundations at a relatively deep level. They develop, explain, and defend a certain abstract theory of the behavior of a highly idealized person faced with uncertainty. That theory is shown to have as implications a theory of personal probability, corresponding to the personalistic view of probability basic to this book, and also a theory of utility due, in its modern form, to von Neumann and Morgenstern [V4].
There is a transition, occurring in Chapter 8 and maintained throughout the rest of the book, to a shallower level of the foundations of statistics; I might say from pre-statistics to statistics proper. In those later chapters, it is recognized that the theory developed in the earlier ones is too highly idealized for immediate application. Some compromises have to be made, and the appropriate ones are sought in an analysis of some of the inventions and ideas of the British-American School. It will, I hope, be demonstrated thereby that the superficially incompatible systems of ideas associated on the one hand with a personalistic view of probability and on the other with the objectivistically inspired developments of the British-American School do in fact lend each other mutual support and clarification.
CHAPTER 2
Preliminary Considerations on Decision in the Face of Uncertainty

1 Introduction

Decisions made in the face of uncertainty pervade the life of every individual and organization. Even animals might be said continually to make such decisions, and the psychological mechanisms by which men decide may have much in common with those by which animals do so. But formal reasoning presumably plays no role in the decisions of animals, little in those of children, and less than might be wished in those of men. It may be said to be the purpose of this book, and indeed of statistics generally, to discuss the implications of reasoning for the making of decisions.

Reasoning is commonly associated with logic, but it is obvious, as many have pointed out, that the implications of what is ordinarily called logic are meager indeed when uncertainty is to be faced. It has therefore often been asked whether logic cannot be extended, by principles as acceptable as those of logic itself, to bear more fully on uncertainty. An attempt to extend logic in this way will be begun in this chapter, differing in two important respects from most, but not all, other attempts.

First, since logic is concerned with implications among propositions, many have thought it natural to extend logic by setting up criteria for the extent to which one proposition tends to imply, or provide evidence for, another. It seems to me obvious, however, that what is ultimately wanted is criteria for deciding among possible courses of action; and, therefore, generalization of the relation of implication seems at best a roundabout method of attack. It must be admitted that logic itself does lead to some criteria for decision, because what is implied by a proposition known to be true is in turn true and sometimes relevant to making a decision. Should some notion of partial implication be demonstrably even better articulated with decision than is implication itself, that would be excellent; but how is such a notion to be sought except by explicitly studying decision? Ramsey's discussion in [R1] of the point at issue here is especially forceful.

Second, it is appealing to suppose that, if two individuals in the same situation, having the same tastes and supplied with the same information, act reasonably, they will act in the same way. Such agreement, belief in which amounts to a necessary (as opposed to a personalistic) view of probability, is certainly worth looking for. Personally, I believe that it does not correspond even roughly with reality, but, having at the moment no strong argument behind my pessimism on this point, I do not insist on it. But I do insist that, until the contrary be demonstrated, we must be prepared to find reasoning inadequate to bring about complete agreement. In particular, the extensions of logic to be adduced in this book will not bring about complete agreement; and whether enough additional principles to do so, or indeed any additional principles of much consequence, can be adduced, I do not know. It may be, and indeed I believe, that there is an element in decision apart from taste, about which, like taste itself, there is no disputing.

The next four sections of this chapter build up a formal model, or scheme, of the situation in which a person is faced with uncertainty; the final two, in terms of this model, motivate and state some of the few principles that seem to me entitled to be taken as postulates for rational decision.

2 The person

I am about to build up a highly idealized theory of the behavior of a "rational" person with respect to decisions. In doing so I will, of course, have to ask you to agree with me that such and such maxims of behavior are "rational." In so far as "rational" means logical, there is no live question; and, if I ask your leave there at all, it is only as a matter of form.† But our person is going to have to make up his mind in situations in which criteria beyond the ordinary ones of logic will be necessary. So, when certain maxims are presented for your consideration, you must ask yourself whether you try to behave in accordance with them, or, to put it differently, how you would react if you noticed yourself violating them.

† The assumption that a person's behavior is logical is, of course, far from vacuous. In particular, such a person cannot be uncertain about decidable mathematical propositions. This suggests, at least to me, that the tempting program sketched by Polya [P5] of establishing a theory of the probability of mathematical conjectures cannot be fully successful in that it cannot lead to a truly formal theory, but de Finetti (W) seems more optimistic about the program.+

+ Polya has greatly elaborated his program, but not in the direction of seeking a formal theory. A curious early work by Cérésole (1915) is somewhat pertinent, and Hacking (1967) argues for the possibility of including mathematical uncertainty in a formal theory.
It is brought out in economic theory that organizations sometimes behave like individual people, so that a theory originally intended to apply to people may also apply to (or may even apply better to) such units as families, corporations, or nations. In view of this possibility, economic theorists are sometimes reluctant to use the word "person," or even "individual," for the behaving units to which they refer; but for our purpose "person" threatens no confusion, though the possibility of using it in an extended sense may well be borne in mind.
3 The world, and states of the world
A formal description, or model, of what the person is uncertain about will be needed. To motivate this formal description, let me begin informally by considering a list of examples. The person might be uncertain about:

1. Whether a particular egg is rotten.
2. Which, if any, in a particular dozen eggs are rotten.
3. The temperature at noon in Chicago yesterday.
4. What the temperature was and will be in the place now covered by Chicago each noon from January 1, 1 A.D., to January 1, 4000 A.D.
5. The infinite sequence of heads and tails that will result from repeated t
mate of complexity in Example 7. Example 6 is a little anomalous in that anything the person does not know about the description of $\pi$ he could know in principle by thinking sufficiently hard about it, that is, by logic alone. This point, banal to some readers, needs explanation
for others. If, for example, $\pi$ is understood to be the area of a circle of unit radius, it follows by logic alone that $\pi$ is not greater than the area of a square circumscribing the unit circle, that is, $\pi \le 4$. By an elaboration of this method $\pi$ can be computed to any degree of accuracy, and by other purely logical methods many other facts about $\pi$ can be established, such as the fact that $\pi$ is not a rational number.

In connection with the concepts suggested by the preceding paragraph, the following nomenclature is proposed as brief, suggestive, and in reasonable harmony with the usages of statistics and ordinary discourse.

Term                             Definition
the world                        the object about which the person is concerned
a state (of the world)           a description of the world, leaving no relevant aspect undescribed
the true state (of the world)    the state that does in fact obtain, i.e., the true description of the world
In application of the theory, the question will arise as to which world to use in a given context. Thus, if the person is interested in the only brown egg in a dozen, should that egg or the whole dozen be taken as the world? It will be seen as the theory is developed that in principle no harm is done by taking the larger of two worlds as a model of the situation. One is therefore tempted to adopt, once and for all, one world sufficiently large, say Example 7. The most serious objection to this is that Example 7 is vague, and some mathematical and philosophical experience suggests that the vagueness cannot be removed without ruining the universality of the example. It may also be added that the use of modest little worlds, tailored to particular contexts, is often a simplification, the advantage of which is justified by a considerable body of mathematical experience with related ideas.

The sense in which the world of a dozen eggs is larger than the world of the one brown egg in the dozen is in some respects obvious. It may be well, however, to emphasize that a state of the smaller world corresponds not to one state of the larger, but to a set of states. Thus, "The brown egg is rotten" describes the smaller world completely, and therefore is a state of it; but the same statement leaves much about the larger world unsaid and corresponds to a set of $2^{11}$ states of it. In the sense under discussion a smaller world is derived from a larger by neglecting some distinctions between states, not by ignoring some states outright. The latter sort of contraction may be useful in case certain states are regarded by the person as virtually impossible so that they can be ignored.

4 Events
An event is a set of states. For example, in connection with the world of Example 2, the person might well be concerned with the event that exactly one egg in the dozen is rotten (an event having 12 states as elements), or, a little less academically, that at least one of the eggs is rotten (an event having $2^{12} - 1$ states as elements, i.e., all the states in the world but one). In connection with the world of Example 3, the person might be concerned with the event, having an infinite number of states, that the temperature at noon in Chicago yesterday was below freezing. To give a final illustration, of a more mathematical flavor, consider in connection with Example 5 the event that the ratio of the number of heads to tails approaches 3 as the sequence progresses to infinity.

In connection with any given world, there are two events that are of the utmost logical importance, though in ordinary discourse it may seem banal even to mention their existence. These are the universal and the vacuous events. The universal event, here to be symbolized by $S$, is the event having every state of the world as element. In so far as "world" has a real technical meaning, $S$ is the world. The vacuous event, which can here be safely enough symbolized by the 0 of arithmetic, is the event having no states as elements. To illustrate, in Example 1 the event that the egg is rotten or good is the universal event, and that it is both rotten and good is the vacuous event.

It is important to be able to express the idea that a given event contains the true state among its elements. English usage seems to offer no alternative to the rather stuffy expression, "the event obtains."

The theory under development makes no formal reference to time. In particular, the concept of event as here formulated is timeless, though temporal ideas may be employed in the description of particular events. Thus, it would not be said that Lincoln's assassination is an event that occurred in 1865 and that the next return of Halley's comet is one that will occur in 1985, but that Lincoln's assassination in 1865 and the return of Halley's comet in, but not before, 1985 are events that obtain.

Modern mathematical usage, especially that of a branch of mathematics called Boolean algebra, suggests the following table of definitions in connection with the concepts of state and event. Some of these are synonyms, others abbreviations, and still others new terms compounded out of old.
TABLE 1. MATHEMATICAL NOMENCLATURE PERTAINING TO STATES AND EVENTS

Term                                      Definition

(Basic terms)
set                                       event
$A, B, C, \ldots$                         generic symbols for events
$s, s', s'', \ldots$                      generic symbols for states
$S$                                       the universal event
$0$                                       the vacuous event
$s \in A$                                 $s$ is an element of $A$, i.e., a state in $A$.†
$A \subset B$ (or $B \supset A$)          $A$ is contained in $B$, i.e., every element of $A$ is an element of $B$.
$A = B$                                   $A$ equals $B$, i.e., $A$ is the same set as $B$, i.e., $A$ and $B$ have exactly the same elements.

(Compound terms)
the complement of $A$ with respect to $S$ those elements of $S$ that are not in $A$
$\sim A$                                  the complement of $A$ with respect to $S$
the union of the $A_i$'s                  those elements of $S$ that are elements of at least one of the sets $A_1$, $A_2$, etc.
$\bigcup_i A_i$                           the union of the $A_i$'s
$A \cup B$                                the union of $A$ and $B$, i.e., those elements of $S$ that are elements of $A$ or $B$ (possibly of both)
the intersection of the $A_i$'s           those elements of $S$ that are elements of each of the sets $A_1$, $A_2$, etc.
$\bigcap_i A_i$                           the intersection of the $A_i$'s
$A \cap B$                                the intersection of $A$ and $B$, i.e., those elements of $S$ that are elements of both $A$ and $B$

† Typographical note: The Porson font of the Greek alphabet is the one almost always printed, at least in America, when mathematical constants and variables are denoted by Greek letters. The symbol used in this and some other publications to denote "element of" is, however, the epsilon of the Vertical font. Some publications use the special symbol $\in$; some, the Porson epsilon, presumably because of its resemblance to $\in$. The latter usage entails either using $\epsilon$ for two different purposes or else mixing fonts in mixed alphabets when constants and variables are denoted by Greek letters.

Though the notations introduced in Table 1 are very elementary and of great utility, they are not ordinarily taught except in connection with logic or relatively advanced mathematics. A set of exercises illustrating their use is therefore given below in the form of a numbered list of statements. These statements are true whatever the sets $A$, $B$, $C$ may be. Mathematicians would for the most part verify them by translating them into English and appealing to common sense, though in complicated cases explicit use might be made of Exercise 9. Diagrams, called Venn diagrams, in which sets are symbolized by areas, as illustrated by Figure 1, are often suggestive.
Figure 1
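The counts of states quoted for the world of a dozen eggs can be checked by brute force. The following is only a minimal sketch: the encoding of a state as a tuple of twelve booleans, the helper `event`, and the choice of egg 0 as the brown egg are assumptions made for illustration and are not part of the text.

```python
from itertools import product

N_EGGS = 12  # the dozen eggs of Example 2

# Illustrative encoding (an assumption): a state is a tuple of 12 booleans,
# True meaning that the corresponding egg is rotten.
S = frozenset(product([False, True], repeat=N_EGGS))  # the universal event
VACUOUS = frozenset()                                  # the vacuous event


def event(predicate):
    """Return the event, i.e. the set of states, on which `predicate` holds."""
    return frozenset(s for s in S if predicate(s))


exactly_one_rotten = event(lambda s: sum(s) == 1)
at_least_one_rotten = event(lambda s: any(s))
# Assume, for illustration only, that egg 0 is the one brown egg.
brown_egg_rotten = event(lambda s: s[0])

print(len(S))                    # 4096 == 2**12
print(len(exactly_one_rotten))   # 12
print(len(at_least_one_rotten))  # 4095 == 2**12 - 1
print(len(brown_egg_rotten))     # 2048 == 2**11

# An event obtains when it contains the true state.
true_state = (False,) * N_EGGS   # hypothetical: every egg is in fact good
print(true_state in at_least_one_rotten)        # False
print(true_state in (S - at_least_one_rotten))  # True: the complement obtains
```

A single state of the small world of the brown egg thus shows up here as an event of $2^{11}$ states of the larger world, which is the sense in which the smaller world neglects distinctions rather than states.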
It is a remarkable and useful fact that any universally valid statement about sets remains so if, throughout, $\cup$ is interchanged with $\cap$, $0$ with $S$, and $\subset$ with $\supset$. The dual in this sense of each exercise should be studied along with the exercise itself. For example, the dual of Exercise 7 is: $A \supset B$, if and only if $A = A \cup B$. Note that the first parts of Exercises 1 through 6 are dual to the second parts.

It may be remarked that, if Exercises 1-6 are taken as axioms and 7 as a definition, Exercises 8-21 and also the duality principle follow formally from them. For example, 10 can be proved thus: By 7, if $A \cap B$ is $A$, then $A \subset B$; but, by 1, $A \cap A$ is $A$; therefore $A \subset A$. Again, 8 can be proved, using 6, 3, 2, 1, 3, and 6 in that order, thus:

\[
\begin{aligned}
0 \cap A &= (A \cap \sim A) \cap A \\
         &= (\sim A \cap A) \cap A \\
         &= \sim A \cap (A \cap A) \\
         &= \sim A \cap A \\
         &= A \cap \sim A \\
         &= 0.
\end{aligned}
\tag{1}
\]
Such formal demonstration is fun and helps develop mathematical skill. In the present exercises the novice, however, should consider it as a possible supplement to, but not as a substitute for, demonstration by interpretation. If the exercises fail to render the notations familiar, it would be best to talk with someone to whom they are already familiar or, failing that, to read in any elementary book where the subject is treated, for example, Chapter II, "The Boole-Schroeder Algebra," in the text of Lewis and Langford [L7].
Exercises illustrating Boolean algebra

1. $A \cap A = A = A \cup A$.
2. $(A \cap B) \cap C = A \cap (B \cap C)$; $(A \cup B) \cup C = A \cup (B \cup C)$. (These facts often render parentheses superfluous.)
3. $A \cap B = B \cap A$; $A \cup B = B \cup A$.
4. $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$; $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$.
5. $S \cap A = A$; $0 \cup A = A$.
6. $A \cap (\sim A) = 0$; $A \cup (\sim A) = S$.
7. $A \subset B$, if and only if $A = A \cap B$.
8. $0 \cap A = 0$.
9. $A = B$, if and only if $A \subset B$ and $B \subset A$.
10. $A \subset A$.
11. $(A \cap B) \subset A$.
12. If $A \subset B$, then $(A \cap C) \subset (B \cap C)$, and $(A \cup C) \subset (B \cup C)$.
13. $(A \cup B) \subset C$, if and only if $A \subset C$ and $B \subset C$.
14. $0 \subset A \subset S$.
15. $A \cap (A \cup B) = A$.
16. $\sim(\sim A) = A$.
17. $\sim(A \cup B) = (\sim A) \cap (\sim B)$ (De Morgan's theorem).
18. $\sim 0 = S$.
19. $A \cap (\sim A \cup B) = A \cap B$.
20. $A \subset B$, if and only if $(\sim B) \subset (\sim A)$.
21. $A \subset B$, if and only if $A \cap (\sim B) = 0$.
22. $\sim(\bigcup_i A_i) = \bigcap_i (\sim A_i)$ (General De Morgan's theorem).
23. $A \cup (\bigcap_i B_i) = \bigcap_i (A \cup B_i)$.
24. $A \cap$
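Demonstration by interpretation can be supplemented by a mechanical check on a small universal event. The sketch below is not a proof: it merely tests a handful of the exercises, and the dual of Exercise 7, for every choice of events in a three-state world. The three-state world and the helper names `comp` and `valid` are assumptions introduced for the illustration.

```python
from itertools import combinations, product

S = frozenset(range(3))   # a small universal event, chosen only for illustration
EMPTY = frozenset()       # the vacuous event, written 0 in the text

# Every event, that is, every subset of S.
EVENTS = [frozenset(c) for r in range(len(S) + 1)
          for c in combinations(sorted(S), r)]


def comp(a):
    """Complement of the event a with respect to S."""
    return S - a


def valid(statement, arity):
    """True if `statement` holds for every choice of `arity` events."""
    return all(statement(*args) for args in product(EVENTS, repeat=arity))


# For frozensets, & is intersection, | is union, and <= is containment.
results = {
    "4": valid(lambda a, b, c: a & (b | c) == (a & b) | (a & c)
               and a | (b & c) == (a | b) & (a | c), 3),
    "7": valid(lambda a, b: (a <= b) == (a == a & b), 2),
    "8": valid(lambda a: EMPTY & a == EMPTY, 1),
    "17": valid(lambda a, b: comp(a | b) == comp(a) & comp(b), 2),
    "19": valid(lambda a, b: a & (comp(a) | b) == a & b, 2),
    "21": valid(lambda a, b: (a <= b) == (a & comp(b) == EMPTY), 2),
    # Dual of Exercise 7: A contains B if and only if A equals A union B.
    "7 (dual)": valid(lambda a, b: (a >= b) == (a == a | b), 2),
}

for name, ok in results.items():
    print(name, ok)
```

A check over one small world cannot establish universal validity, but a failure here would expose a mistranscribed exercise at once.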
5 Consequences, acts, and decisions
To say that a decision is to be made is to say that one of two or more acts is to be chosen, or decided on. In deciding on an act, account must be taken of the possible states of the world, and also of the consequences implicit in each act for each possible state of the world. A consequence is anything that may happen to the person.

Consider an example. Your wife has just broken five good eggs into a bowl when you come in and volunteer to finish making the omelet. A sixth egg, which for some reason must either be used for the omelet or wasted altogether, lies unbroken beside the bowl. You must decide what to do with this unbroken egg. Perhaps it is not too great an oversimplification to say that you must decide among three acts only, namely, to break it into the bowl containing the other five, to break it into a saucer for inspection, or to throw it away without inspection.
Depending on the state of the egg, each of these three acts will have some consequence of concern to you, say that indicated by Table 1.

                                              State
Act                  Good                                           Rotten
break into bowl      six-egg omelet                                 no omelet, and five good eggs destroyed
break into saucer    six-egg omelet, and a saucer to wash           five-egg omelet, and a saucer to wash
throw away           five-egg omelet, and one good egg destroyed    five-egg omelet
Even the little example concerning the omelet suggests how varied the things, or experiences, regarded as consequences can be. They might in general involve money, life, state of health, approval of friends, well-being of others, the will of God, or anything at all about which the person could possibly be concerned. Consequences might appropriately be called states of the person, as opposed to states of the world. They might also be referred to, with some extension of the economic notion of income, as the possible incomes of the person. In any one problem, the set of consequences envisaged will be denoted by $F$, and the individual consequences will be denoted by $f$, $g$, $h$, etc. In the omelet example, $F$ consists of the six consequences tabulated in Table 1: six-egg omelet; no omelet, and five good eggs destroyed; etc.

If two different acts had the same consequences in every state of the world, there would from the present point of view be no point in considering them two different acts at all. An act may therefore be identified with its possible consequences. Or, more formally, an act is a function attaching a consequence to each state of the world. The notation $\mathbf{f}$ will be used to denote an act, that is, a function, attaching the consequence $f(s)$ to the state $s$. The notation $\mathbf{f}$ is logically a better name for a function than the more customary $f(s)$ for exactly the same reason that the word "logarithm" is a better term for logarithm than "logarithm of $x$" would be. The notational distinction involved here is often justifiably neglected in mathematical work, but we will have special need to observe it, at least in connection with acts, as will soon be explained. When several acts are to be discussed at once, they may be
denoted by different letters thus: $\mathbf{f}$, $\mathbf{g}$, $\mathbf{h}$; by the use of primes thus: $\mathbf{f}$, $\mathbf{f}'$, $\mathbf{f}''$; or by subscripts thus: $\mathbf{f}_1$, $\mathbf{f}_2$. The set of all acts available in a given situation will be denoted by $\mathbf{F}$ or a similar symbol. In the example of the omelet, $\mathbf{F}$ has three acts as elements. If, for example, $\mathbf{f}$ denotes the first of the three acts listed in Table 1, then $\mathbf{f}$ is defined thus:

\[
\begin{aligned}
f(\text{good}) &= \text{six-egg omelet}; \\
f(\text{rotten}) &= \text{no omelet, and five good eggs destroyed}.
\end{aligned}
\tag{1}
\]
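The identification of an act with a function from states to consequences can be made concrete for the omelet. This is a minimal sketch in which each act is stored as a dictionary keyed by state; the names `STATES`, `ACTS`, `consequence`, and `crack_on_rim_of_bowl` are hypothetical and chosen only for the illustration.

```python
STATES = ("good", "rotten")

# Each act attaches a consequence to each state of the world (Table 1).
ACTS = {
    "break into bowl": {
        "good": "six-egg omelet",
        "rotten": "no omelet, and five good eggs destroyed",
    },
    "break into saucer": {
        "good": "six-egg omelet, and a saucer to wash",
        "rotten": "five-egg omelet, and a saucer to wash",
    },
    "throw away": {
        "good": "five-egg omelet, and one good egg destroyed",
        "rotten": "five-egg omelet",
    },
}


def consequence(act, state):
    """Return f(s): the consequence that the act attaches to the state."""
    return ACTS[act][state]


# The set of consequences envisaged in the example.
CONSEQUENCES = {c for f in ACTS.values() for c in f.values()}
print(len(CONSEQUENCES))                          # 6
print(consequence("break into bowl", "rotten"))   # equation (1), second line

# A differently described act with the same consequence in every state
# is, from the present point of view, the same act.
crack_on_rim_of_bowl = dict(ACTS["break into bowl"])  # hypothetical description
print(crack_on_rim_of_bowl == ACTS["break into bowl"])  # True
```

Comparing the dictionaries themselves, rather than the labels attached to them, mirrors the remark that an act is identified with its possible consequences.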
The argument might be raised that the formal description of decision that has thus been erected seems inadequate because a person may not know the consequences of the acts open to him in each state of the world. He might be so ignorant, for example, as not to be sure whether one rotten egg will spoil a six-egg omelet. But in that case nothing could be simpler than to admit that there are four states in the world corresponding to the two states of the egg and the two conceivable answers to the culinary question whether one bad egg will spoil a six-egg omelet. It seems to me obvious that this solution works in the greatest generality, though a thoroughgoing analysis might not be trivial. A reader interested in the technicalities of this point or that of the succeeding paragraph will find an extensive discussion of a similar problem in Chapter II of [V4], where von Neumann and Morgenstern discuss the reduction of a general game to its reduced form.

Again, the formal description might seem inadequate in that it does not provide explicitly for the possibility that one decision may lead to another. Thus, if the omelet should be spoiled by breaking a rotten egg into it, new questions might arise about what to substitute for breakfast and how to appease your justifiably furious wife. But, just as in the preceding paragraph an apparent shortcoming of the proposed mode of description was attributed to an incomplete analysis of the possible states, here I would say that the list of available acts envisaged in Table 1 is inadequate for the interpretation that has just been put on the problem. Where the single act "break into bowl" now stands, there should be several, such as: "break into bowl, and in case of disaster have toast," "break into bowl, and in case of disaster take family to a neighboring restaurant for breakfast." Appropriate consequences of these new acts can easily be imagined.
As has just been suggested, what in the ordinary way of thinking might be regarded as a chain of decisions, one leading to the other in time, is in the formal description proposed here regarded as a single decision. To put it a little differently, it is proposed that the choice of a
policy or plan be regarded as a single decision. This point of view, though not always in so explicit a form, has played a prominent role in the statistical advances of the present century. For example, the great majority of experimentalists, even today, suppose that the function of statistics and of statisticians is to decide what conclusions to draw from data gathered in an experiment or other observational program. But statisticians hold it to be lacking in foresight to gather data without a view to the method of analysis to be employed, that is, they hold that the design and analysis of an experiment should be decided upon as an articulated whole.

The point of view under discussion may be symbolized by the proverb, "Look before you leap," and the one to which it is opposed by the proverb, "You can cross that bridge when you come to it." When two proverbs conflict in this way, it is proverbially true that there is some truth in both of them, but rarely, if ever, can their common truth be captured by a single pat proverb. One must indeed look before he leaps, in so far as the looking is not unreasonably time-consuming and otherwise expensive; but there are innumerable bridges one cannot afford to cross, unless he happens to come to them.

Carried to its logical extreme, the "Look before you leap" principle demands that one envisage every conceivable policy for the government of his whole life (at least from now on) in its most minute details, in the light of the vast number of unknown states of the world, and decide here and now on one policy. This is utterly ridiculous, not, as some might think, because there might later be cause for regret, if things did not turn out as had been anticipated, but because the task implied in making such a decision is not even remotely resembled by human possibility. It is even utterly beyond our power to plan a picnic or to play a game of chess in accordance with the principle, even when the world of states and the set of available acts to be envisaged are artificially reduced to the narrowest reasonable limits.

Though the "Look before you leap" principle is preposterous if carried to extremes, I would none the less argue that it is the proper subject of our further discussion, because to cross one's bridges when one comes to them means to attack relatively simple problems of decision by artificially confining attention to so small a world that the "Look before you leap" principle can be applied there. I am unable to formulate criteria for selecting these small worlds and indeed believe that their selection may be a matter of judgment and experience about which it is impossible to enunciate complete and sharply defined general principles, though something more will be said in this connection in § 5.5. On the other hand, it is an operation in which we all necessarily have
much experience, and one in which there is in practice considerable agreement.

In view of the "Look before you leap" principle, acts and decisions, like events, are timeless. The person decides "now" once for all; there is nothing for him to wait for, because his one decision provides for all contingencies. None the less, temporal modes of description, though translatable into atemporal ones, are often suggestive. Thus, there will be occasion to analyze and make frequent use of the idea of deferring a decision until an observation relevant to it has been made.

6 The simple ordering of acts with respect to preference

Of two acts $\mathbf{f}$ and $\mathbf{g}$, it is possible that the person prefers $\mathbf{f}$ to $\mathbf{g}$. Loosely speaking, this means that, if he were required to decide between $\mathbf{f}$ and $\mathbf{g}$, no other acts being available, he would decide on $\mathbf{f}$. This procedure for testing preference is not entirely adequate, if only because it fails to take account of, or even define, the possibility that the person may not really have any preference between $\mathbf{f}$ and $\mathbf{g}$, regarding them as equivalent; in which case his choice of $\mathbf{f}$ should not be regarded as significant. If the person really does regard $\mathbf{f}$ and $\mathbf{g}$ as equivalent, that is, if he is indifferent between them, then, if $\mathbf{f}$ or $\mathbf{g}$ were modified by attaching an arbitrarily small bonus to its consequences in every state, the person's decision would presumably be for whichever act was thus modified. This test for indifference does not provide an altogether satisfactory definition, since it begs the question to some extent by postulating in effect that the tester knows what constitutes a small bonus. Another attempted solution would be to say that the person knows by introspection whether he has decided haphazardly or in response to a definite feeling of preference. This sort of solution seems to me especially objectionable, because I think it of great importance that preference, and indifference, between $\mathbf{f}$ and $\mathbf{g}$ be determined, at least in principle, by decisions between acts and not by response to introspective questions. In spite of the difficulty of distinguishing between preference and indifference, I think enough has been said for us to proceed to a postulational treatment of them.

The very meaning of the relationship of preference that I have attempted to establish in the preceding paragraph implies that the person cannot simultaneously prefer $\mathbf{f}$ to $\mathbf{g}$ and $\mathbf{g}$ to $\mathbf{f}$. In the postulational treatment of the relationships of preference and indifference, it will be technically convenient to work with the relation "is not preferred to" rather than directly with its complementary relation "is preferred to." Thus, rather than say that it is impossible that both $\mathbf{f}$ is preferred to $\mathbf{g}$ and $\mathbf{g}$ to $\mathbf{f}$, I might say that, of any two acts $\mathbf{f}$ and $\mathbf{g}$, $\mathbf{f}$ is not preferred
to $\mathbf{g}$ or $\mathbf{g}$ is not preferred to $\mathbf{f}$, possibly both. Again, the definition of preference suggests that, if $\mathbf{f}$ is not preferred to $\mathbf{g}$, and $\mathbf{g}$ is not preferred to $\mathbf{h}$, then it is impossible that $\mathbf{f}$ should be preferred to $\mathbf{h}$.
The two assumptions just made about the relation "is not preferred to" are sometimes expressed in ordinary mathematical usage by saying that the relation is a simple ordering among acts. Formally, a relation $\mathrel{\le\cdot}$ among a set of elements $x, y, z, \ldots$ is called a simple ordering in this book, if and only if for every $x$, $y$, and $z$:

1. Either $x \mathrel{\le\cdot} y$, or $y \mathrel{\le\cdot} x$.
2. If $x \mathrel{\le\cdot} y$, and $y \mathrel{\le\cdot} z$, then $x \mathrel{\le\cdot} z$.

Borrowing from arithmetic the suggestive abbreviation $\le$ for the relation "is not preferred to," the assumption that $\le$ is a simple ordering can be expressed formally by a postulate, thus:

P1 The relation $\le$ is a simple ordering among acts.
It is noteworthy that P1 makes no explicit reference to states of the world. Except possibly for mathematical refinements,† it seems to me that no additional postulates can be formulated without making such reference; at any rate none will be in this book. P1 by itself is not very rich in consequences, but one easily proved theorem following from it may be mentioned.

THEOREM 1 If $\mathbf{F}$ is a finite set of acts, there exist $\mathbf{f}$ and $\mathbf{h}$ in $\mathbf{F}$ such that for all $\mathbf{g}$ in $\mathbf{F}$, $\mathbf{f} \le \mathbf{g} \le \mathbf{h}$.
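For a finite set of acts, both parts of the definition of simple ordering and the conclusion of Theorem 1 can be checked by exhaustive search. The sketch below assumes a purely illustrative numerical ranking of the three omelet acts; P1 itself presupposes no such numbers, and the names `RANK`, `is_simple_ordering`, and `extremes` are hypothetical.

```python
from itertools import product


def is_simple_ordering(elements, leq):
    """Check both parts of the definition of simple ordering by brute force."""
    comparable = all(leq(x, y) or leq(y, x)
                     for x, y in product(elements, repeat=2))
    transitive = all(leq(x, z)
                     for x, y, z in product(elements, repeat=3)
                     if leq(x, y) and leq(y, z))
    return comparable and transitive


def extremes(elements, leq):
    """Return acts (f, h) with f <= g <= h for every g, as in Theorem 1."""
    f = next(a for a in elements if all(leq(a, g) for g in elements))
    h = next(a for a in elements if all(leq(g, a) for g in elements))
    return f, h


# Hypothetical ranking, for illustration only: larger means more preferred.
RANK = {"throw away": 1, "break into bowl": 2, "break into saucer": 3}
ACTS = list(RANK)


def not_preferred(f, g):
    """f is not preferred to g, under the illustrative ranking."""
    return RANK[f] <= RANK[g]


print(is_simple_ordering(ACTS, not_preferred))  # True
print(extremes(ACTS, not_preferred))            # ('throw away', 'break into saucer')

# A cyclic relation satisfies part 1 but violates part 2 (transitivity).
CYCLE = {("rock", "paper"), ("paper", "scissors"), ("scissors", "rock")}


def cyclic(x, y):
    return x == y or (x, y) in CYCLE


print(is_simple_ordering(["rock", "paper", "scissors"], cyclic))  # False
```

The cyclic counterexample shows why the second part of the definition matters: without it no act need stand at either end of the set.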
Theorem 1 is especially relevant to application of the theory of decision, because I interpret the theory to imply that, if $\mathbf{F}$ is finite, the person will decide on an act $\mathbf{h}$ in $\mathbf{F}$ to which no other act in $\mathbf{F}$ is preferred, the existence of at least one such $\mathbf{h}$ being guaranteed by the theorem.

It is often appropriate to consider infinite sets of available acts. In economic contexts, for example, it is generally an inappropriate complication to take explicit account of the possibility that all transactions must be in integral numbers of pennies. If infinite sets of available acts are set up and interpreted without some mathematical tact, unrealistic conclusions are likely to follow. Suppose, for example, that you were free to choose any income, provided it be definitely less than $100,000 per year. Precisely which income would you choose, abstracting from the indivisibility of pennies?
† For example, such topological assumptions about the space with neighborhoods defined in terms of $\le$ as connectedness, local compactness, or density.
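It is sometimes convenient to supplement the relation $\le$ by other relations derived from it in accordance with the definitions in Table 1, analogous definitions being applicable to any simple ordering. The assumption of simple ordering, P1, has several implications for the derived relations $\ge$, $<$, $>$, and $\doteq$. These are generally strongly suggested by the properties of the corresponding relations in arithmetic.

TABLE 1. TABLE OF RELATIONS DERIVED FROM $\le$

New Relation                                                          Definition
$\mathbf{f} \ge \mathbf{g}$.                                          $\mathbf{g} \le \mathbf{f}$.
$\mathbf{f} < \mathbf{g}$, i.e., $\mathbf{g}$ is preferred to $\mathbf{f}$.    It is false that $\mathbf{g} \le \mathbf{f}$.
$\mathbf{f} > \mathbf{g}$.                                            $\mathbf{g} < \mathbf{f}$.
$\mathbf{f} \doteq \mathbf{g}$, i.e., $\mathbf{f}$ is equivalent to (or indifferent with respect to) $\mathbf{g}$.    $\mathbf{f} \le \mathbf{g}$, and $\mathbf{g} \le \mathbf{f}$.
$\mathbf{g}$ is between $\mathbf{f}$ and $\mathbf{h}$.                $\mathbf{f} \le \mathbf{g} \le \mathbf{h}$, or $\mathbf{h} \le \mathbf{g} \le \mathbf{f}$.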
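The derived relations can each be written directly in terms of "is not preferred to." The following sketch does so for an illustrative ordering on three acts, two of which are taken to be equivalent; the ranking `RANK` and the function names are assumptions made for the example. It then confirms two elementary consequences on this one example: that exactly one of $<$, $\doteq$, $>$ holds between any pair, and that at least one of any three acts is between the other two.

```python
from itertools import product

# Hypothetical ranking used only to supply a simple ordering; ties are allowed.
RANK = {"f": 1, "g": 1, "h": 2}
ACTS = list(RANK)


def leq(a, b):
    """a is not preferred to b."""
    return RANK[a] <= RANK[b]


def geq(a, b):
    return leq(b, a)


def lt(a, b):
    """b is preferred to a: it is false that b <= a."""
    return not leq(b, a)


def gt(a, b):
    return lt(b, a)


def equiv(a, b):
    """a is equivalent to (indifferent with respect to) b."""
    return leq(a, b) and leq(b, a)


def between(g, f, h):
    """g is between f and h."""
    return (leq(f, g) and leq(g, h)) or (leq(h, g) and leq(g, f))


# Exactly one of <, equivalence, > holds between any pair of acts.
trichotomy = all(lt(a, b) + equiv(a, b) + gt(a, b) == 1
                 for a, b in product(ACTS, repeat=2))

# At least one of any three acts is between the other two.
some_between = all(between(a, b, c) or between(b, a, c) or between(c, a, b)
                   for a, b, c in product(ACTS, repeat=3))

print(trichotomy, some_between)   # True True
print(equiv("f", "g"), lt("f", "h"), gt("h", "g"))   # True True True
```

Because `f` and `g` are equivalent here, each of them is between the other two acts, which bears on the question of when more than one act can be between the others.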
A few such implications of P1 are listed below, with no intention of completeness, as exercises for those who may not already be familiar with the elementary properties of simple ordering.
Exercises

1. The relation $\ge$ is also a simple ordering.
2. All the relations $\le$, $\ge$, $<$, $>$, and $\doteq$ are transitive, that is, they can be validly substituted for $\mathrel{\le\cdot}$ in the second part of the definition of simple ordering.
3. Between any pair of acts $\mathbf{f}$, $\mathbf{g}$, one and only one of the three relations $<$, $\doteq$, and $>$ holds.
4. If $\mathbf{f} < \mathbf{g}$, and $\mathbf{g} \doteq \mathbf{h}$, then $\mathbf{f} < \mathbf{h}$.
5. If $\mathbf{f} \doteq \mathbf{g}$, then $\mathbf{g} \doteq \mathbf{f}$.
6. For any $\mathbf{f}$, $\mathbf{f} \doteq \mathbf{f}$.
7. At least one of three acts $\mathbf{f}$, $\mathbf{g}$, $\mathbf{h}$ is between the other two. When can there be more than one such?

Two very different sorts of interpretations can be made of P1 and the other postulates to be adduced later. First, P1 can be regarded as a prediction about the behavior of people, or animals, in decision situations. Second, it can be regarded as a logic-like criterion of consistency in decision situations. For us, the second interpretation is the only one of direct relevance, but it may be fruitful to discuss both, calling the first empirical and the second normative.
Logic itself admits an empirical as well as a normative interpretation. Thus, if an experimental subject believes certain propositions, it is to be expected that he will also believe their logical consequences and disbelieve the negations of these consequences. This theory of human psychology has some validity and is of great practical utility in our everyday dealings with other people, though it is very crude and approximate. For one thing, people often do make elementary mistakes in logic; more refined theories would attribute these mistakes to such things as accident or subconscious motivation. For another, if anyone who believed the axioms of mathematics also believed all that they imply and nothing that they contradict, mathematical study would be superfluous for him; such a person would, as has been explained, be able to state the ten-thousandth or any other digit in the decimal expansion of $\pi$ on demand. To summarize, logic can be interpreted as a crude but sometimes handy empirical psychological theory. The principal value of logic, however, is in connection with its normative interpretation, that is, as a set of criteria by which to detect, with sufficient trouble, any inconsistencies there may be among our beliefs, and to derive from the beliefs we already hold such new ones as consistency demands. It does not seem appropriate here to attempt an analysis of why and in what contexts we wish to be consistent; it is sufficient to allude to the fact that we often do wish to be so.

Analogously, P1 together with the postulates to be adduced later can be interpreted as a crude and shallow empirical theory predicting the behavior of people making decisions. This theory is practical in suitably limited domains, and everyone in fact makes use of at least some aspects of it in predicting the behavior of others. At the same time, the behavior of people is often at variance with the theory.
The departure is sometimes flagrant, in which case our attitude toward it is much like that we hold toward a slip in logic, calling the departure a mistake and attributing it to such things as accident and subconscious motivation. Or, the departure may be detectable only by a long chain of argument or calculation, the possibilities becoming increasingly complicated as new postulates are brought to stand beside P1.

Pursuing the analogy with logic, the main use I would make of P1 and its successors is normative, to police my own decisions for consistency and, where possible, to make complicated decisions depend on simpler ones. Here it is more pertinent than it was in connection with logic that something be said of why and when consistency is a desideratum, though I cannot say much. Suppose someone says to me, "I am a rational person, that is to say, I seldom, if ever, make mistakes in logic. But I behave in flagrant disagreement with your postulates, because they violate my personal taste, and it seems to me more sensible to cater to my taste than to a theory arbitrarily concocted by you." I don't see how I could really controvert him, but I would be inclined to match his introspection with some of my own. I would, in particular, tell him that, when it is explicitly brought to my attention that I have shown a preference for $\mathbf{f}$ as compared with $\mathbf{g}$, for $\mathbf{g}$ as compared with $\mathbf{h}$, and for $\mathbf{h}$ as compared with $\mathbf{f}$, I feel uncomfortable in much the same way that I do when it is brought to my attention that some of my beliefs are logically contradictory. Whenever I examine such a triple of preferences on my own part, I find that it is not at all difficult to reverse one of them. In fact, I find on contemplating the three alleged preferences side by side that at least one among them is not a preference at all, at any rate not any more.

There is some temptation to explore the possibilities of analyzing preference among acts as a partial ordering, that is, in effect to replace part 1 of the definition of simple ordering by the very weak proposition $\mathbf{f} \le \mathbf{f}$, admitting that some pairs of acts are incomparable. This would seem to give expression to introspective sensations of indecision or vacillation, which we may be reluctant to identify with indifference. My own conjecture is that it would prove a blind alley losing much in power and advancing little, if at all, in realism; but only an enthusiastic exploration could shed real light on the question.
7 The sure-thing principle
A businessman contemplates buying a certain piece of property. He considers the outcome of the next presidential election relevant to the attractiveness of the purchase. So, to clarify the matter for himself, he asks whether he would buy if he knew that the Republican candidate were going to win, and decides that he would do so. Similarly, he considers whether he would buy if he knew that the Democratic candidate were going to win, and again finds that he would do so. Seeing that he would buy in either event, he decides that he should buy, even though he does not know which event obtains, or will obtain, as we would ordinarily say. It is all too seldom that a decision can be arrived at on the basis of the principle used by this businessman, but, except possibly for the assumption of simple ordering, I know of no other extralogical principle governing decisions that finds such ready acceptance.

Having suggested what I shall tentatively call the sure-thing principle, let me give it relatively formal statement thus: If the person would not prefer $\mathbf{f}$ to $\mathbf{g}$, either knowing that the event $B$ obtained, or knowing that the event ${\sim}B$ obtained, then he does not prefer $\mathbf{f}$ to $\mathbf{g}$.
Moreover (provided he does not regard $B$ as virtually impossible) if he would definitely prefer $\mathbf{g}$ to $\mathbf{f}$, knowing that $B$ obtained, and, if he would not prefer $\mathbf{f}$ to $\mathbf{g}$, knowing that $B$ did not obtain, then he definitely prefers $\mathbf{g}$ to $\mathbf{f}$.

The sure-thing principle cannot appropriately be accepted as a postulate in the sense that P1 is, because it would introduce new undefined technical terms referring to knowledge and possibility that would render it mathematically useless without still more postulates governing these terms. It will be preferable to regard the principle as a loose one that suggests certain formal postulates well articulated with P1.

What technical interpretation can be attached to the idea that $\mathbf{f}$ would be preferred to $\mathbf{g}$, if $B$ were known to obtain? Under any reasonable interpretation, the matter would seem not to depend on the values $\mathbf{f}$ and $\mathbf{g}$ assume at states outside of $B$. There is, then, no loss of generality in supposing that $\mathbf{f}$ and $\mathbf{g}$ agree with each other except in $B$, that is, that $f(s) = g(s)$ for all $s \in {\sim}B$. Under this unrestrictive assumption, $\mathbf{f}$ and $\mathbf{g}$ are surely to be regarded as equivalent given ${\sim}B$; that is, they would be considered equivalent, if it were known that $B$ did not obtain. The first part of the sure-thing principle can now be interpreted thus: If, after being modified so as to agree with one another outside of $B$, $\mathbf{f}$ is not preferred to $\mathbf{g}$; then $\mathbf{f}$ would not be preferred to $\mathbf{g}$, if $B$ were known. The notion will be expressed formally by saying that $\mathbf{f} \le \mathbf{g}$ given $B$.+

It is implicit in the argument that has just led to the definition of $\mathbf{f} \le \mathbf{g}$ given $B$ that, if two acts $\mathbf{f}$ and $\mathbf{g}$ are so modified in ${\sim}B$ as to agree with each other, then the order of preference obtaining between the modified acts will not depend on which of the permitted modifications was actually carried out. Equivalently, if $\mathbf{f}$ and $\mathbf{g}$ are two acts that do agree with each other in ${\sim}B$, and $\mathbf{f} \le \mathbf{g}$; then, if $\mathbf{f}$ and $\mathbf{g}$ are modified in ${\sim}B$ in any way such that the modified acts $\mathbf{f}'$ and $\mathbf{g}'$ continue to agree with each other in ${\sim}B$, it will also be so that $\mathbf{f}' \le \mathbf{g}'$. This assumption is made formally in the postulate P2 below and illustrated schematically in Figure 1, a kind of diagram I find suggestive in many such contexts.

In Figure 1, the set $S$ of all states $s$ and the set $F$ of all consequences $f$ are represented by horizontal and vertical intervals respectively. In any such diagram an act $\mathbf{f}$, being a function attaching a value $f(s) \in F$ to each $s \in S$, is represented by a graph. This particular diagram graphs two acts $\mathbf{f}$ and $\mathbf{g}$ that agree with each other in ${\sim}B$, and two other acts $\mathbf{f}'$ and $\mathbf{g}'$ that also agree with each other in ${\sim}B$ and arise by modifying $\mathbf{f}$ and $\mathbf{g}$ respectively only in ${\sim}B$, that is, acts agreeing with $\mathbf{f}$ and $\mathbf{g}$ respectively in $B$.

+ In this edition, the corresponding definition D1 on the end papers has been slightly strengthened to compensate an inadvertent weakness in the end paper version of P2, pointed out to me by Peter Fishburn.
[Figure 1: graphs of the acts $\mathbf{f}$, $\mathbf{g}$, $\mathbf{f}'$, $\mathbf{g}'$ over the set $S$ of states]
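P2  If $\mathbf{f}$, $\mathbf{g}$, and $\mathbf{f}'$, $\mathbf{g}'$ are such that:
1. in ${\sim}B$, $\mathbf{f}$ agrees with $\mathbf{g}$, and $\mathbf{f}'$ agrees with $\mathbf{g}'$,
2. in $B$, $\mathbf{f}$ agrees with $\mathbf{f}'$, and $\mathbf{g}$ agrees with $\mathbf{g}'$,
3. $\mathbf{f} \le \mathbf{g}$;
then $\mathbf{f}' \le \mathbf{g}'$.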
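The content of P2 can be checked by brute force on a toy example. The sketch below is only that: it assumes four states, three numerically coded consequences, and a ranking of acts by a weighted sum, none of which is part of the text at this point. The names `WEIGHT`, `splice`, `leq_given_B`, and `p2_holds` are hypothetical.

```python
from itertools import product

STATES = (0, 1, 2, 3)
B = {0, 1}                                  # the event B; its complement is {2, 3}
WEIGHT = {0: 1, 1: 2, 2: 3, 3: 4}           # hypothetical integer weights on states
CONSEQUENCES = (0, 1, 2)                    # consequences coded as numbers (assumption)

def leq(f, g):
    """f <= g under the assumed weighted-sum ranking; acts are tuples indexed by state."""
    return (sum(WEIGHT[s] * f[s] for s in STATES)
            <= sum(WEIGHT[s] * g[s] for s in STATES))

def splice(on_B, off_B):
    """The act that agrees with on_B in B and with off_B outside B."""
    return tuple(on_B[s] if s in B else off_B[s] for s in STATES)

def leq_given_B(f, g):
    """f <= g given B: modify g to agree with f outside B, then compare."""
    return leq(f, splice(g, f))

def p2_holds():
    """Brute-force check that the comparison ignores the common values outside B."""
    acts = list(product(CONSEQUENCES, repeat=len(STATES)))
    for f, g in product(acts, repeat=2):
        if any(f[s] != g[s] for s in STATES if s not in B):
            continue                        # condition 1: f and g must agree outside B
        for tail in acts:
            f2, g2 = splice(f, tail), splice(g, tail)   # condition 2: unchanged in B
            if leq(f, g) and not leq(f2, g2):
                return False                # f <= g without f' <= g' would violate P2
    return True
```

For instance, with `f = (2, 2, 0, 0)` and `g = (0, 0, 2, 2)` the assumed ranking gives `leq(f, g)` but not `leq_given_B(f, g)`, which illustrates that preference given $B$ looks only at what the acts do inside $B$.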
Each of the relations "$\le$ given $B$" is now easily seen to be a simple ordering, and the relations "$\ge$, $<$, $>$, $\doteq$ given $B$" are to be defined mutatis mutandis. It is noteworthy though obvious that, if $f(s) = g(s)$ for all $s \in B$, then $\mathbf{f} \doteq \mathbf{g}$ given $B$.

It is now possible and instructive to give an atemporal analysis of the following temporally described decision situation: The person must decide between $\mathbf{f}$ and $\mathbf{g}$ after he finds out, that is, observes, whether $B$ obtains; what will his decision be if he finds out that $B$ does in fact obtain? Atemporally, the person can submit himself to the consequences of $\mathbf{f}$ or else of $\mathbf{g}$ for all $s \in B$, and, independently, he can submit himself to the consequences of $\mathbf{f}$ or else of $\mathbf{g}$ for all $s \in {\sim}B$; which alternative will he decide upon for the $s$'s in $B$? Finally, describing the situation not only atemporally but also quite formally, the person must decide among four acts defined thus:

$\mathbf{h}_{00}$ agrees with $\mathbf{f}$ on $B$ and with $\mathbf{f}$ on ${\sim}B$,
$\mathbf{h}_{01}$ agrees with $\mathbf{f}$ on $B$ and with $\mathbf{g}$ on ${\sim}B$,
$\mathbf{h}_{10}$ agrees with $\mathbf{g}$ on $B$ and with $\mathbf{f}$ on ${\sim}B$,
$\mathbf{h}_{11}$ agrees with $\mathbf{g}$ on $B$ and with $\mathbf{g}$ on ${\sim}B$.

The question at issue now takes this form. Supposing that none of the four functions is preferred to the particular one $\mathbf{h}_{ij}$, is $i = 0$, or is $i = 1$; that is, does $\mathbf{h}_{ij}$ agree with $\mathbf{f}$ on $B$ or with $\mathbf{g}$ on $B$? It is not hard to see that $i$ can be 1, if and only if $\mathbf{f} \le \mathbf{g}$ given $B$. Indeed, if $i = 1$, $\mathbf{h}_{0j} \le \mathbf{h}_{ij}$, which means that $\mathbf{f} \le \mathbf{g}$ given $B$. Arguing in the opposite direction, if $\mathbf{f} \le \mathbf{g}$ given $B$; then $\mathbf{h}_{00}$ [...]
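THEOREM 1
1. The vacuous event, $0$, is null.
2. $B$ is null, if and only if, for every $\mathbf{f}$ and $\mathbf{g}$, $\mathbf{f} \doteq \mathbf{g}$ given $B$.
3. If $B$ is null, and $B \supset C$; then $C$ is null.
4. If ${\sim}B$ is null; $\mathbf{f} \le \mathbf{g}$ given $B$, if and only if $\mathbf{f} \le \mathbf{g}$.
5. $\mathbf{f} \le \mathbf{g}$ given $S$, if and only if $\mathbf{f} \le \mathbf{g}$.
6. If $S$ is null, $\mathbf{f} \doteq \mathbf{g}$ for every $\mathbf{f}$ and $\mathbf{g}$.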
Component 6 of Theorem 1 requires comment, because it corresponds to a pathological situation. In case $S$ is null, it is not really intuitive to say that $S$ (and therefore every event) is virtually impossible. The interpretation is rather that the person simply doesn't care what happens to him. This is imaginable, especially under a suitably restricted interpretation of $F$, but it is uninteresting and will accordingly be ruled out by a later postulate, P5.

A finite set of events $B_i$ is a partition of $B$; if $B_i \cap B_j = 0$, for $i \ne j$, and $\bigcup_i B_i = B$. With this definition, it is easily proved by arithmetic induction that

THEOREM 2  If $B_i$ is a partition of $B$, and $\mathbf{f} \le \mathbf{g}$ given $B_i$ for each $i$, then $\mathbf{f} \le \mathbf{g}$ given $B$. If, in addition, $\mathbf{f} < \mathbf{g}$ given $B_j$ for at least one $j$, then $\mathbf{f} < \mathbf{g}$ given $B$.

The union of any finite number of null events is null.
There are still other interesting consequences of Theorem 2, which may be most conveniently mentioned informally. If, in Theorem 2, $B = S$ (or, more generally, if ${\sim}B$ is null), it is superfluous to say "given $B$" in the conclusions of the theorem. If $\mathbf{f} \doteq \mathbf{g}$ given $B_i$ for each $i$, then $\mathbf{f} \doteq \mathbf{g}$ given $B$. So much for the consequences of P2.

Acts that are constant, that is, acts whose consequences are independent of the state of the world, are of special interest. In particular, they lead to a natural definition of preference among consequences in terms of preference among acts. Following ordinary mathematical usage, $\mathbf{f} \equiv g$ will mean that $\mathbf{f}$ is identically $g$, that is, for every $s$, $f(s) = g$. A formal definition of preference among consequences can now conveniently be expressed thus. For any consequences $g$ and $g'$, $g \le g'$; if and only if, when $\mathbf{f} \equiv g$ and $\mathbf{f}' \equiv g'$, $\mathbf{f} \le \mathbf{f}'$.

In the same spirit, meaning can be assigned to such expressions as $\mathbf{f} \le g$, $g \le \mathbf{f}$ given $B$, etc., and I will freely use such expressions without defining them explicitly. In particular, $f \le g$ given $B$ has a natural meaning, but one that is rendered superfluous by the next postulate, P3.

Incidentally, it is now evident how awkward for us it would be to use $f(s)$ for $\mathbf{f}$; because $f(s) \le g(s)$ is a statement about the consequences $f(s)$ and $g(s)$, whereas $\mathbf{f} \le \mathbf{g}$ is a statement about acts, and we will have frequent need for both sorts of statements.

Suppose that $\mathbf{f} \equiv g$, and $\mathbf{f}' \equiv g'$, and that $g \le g'$, is it reasonable to admit that, for some $B$, $\mathbf{f} > \mathbf{f}'$ given $B$? That depends largely on the interpretation we choose to make of our technical terms, as an example helps to bring out.+ Before going on a picnic with friends, a person decides to buy a bathing suit or a tennis racket, not having at the moment enough money for both. If we call possession of the tennis racket and possession of the bathing suit consequences, then we must say that the consequences of his decision will be independent of where the picnic is actually held. If the person prefers the bathing suit, this decision would presumably be reversed, if he learned that the picnic were not going to be held near water. Thus the question whether it can happen that $\mathbf{f} > \mathbf{f}'$ given $B$ would be answered in the affirmative. But, under the interpretation of "act" and "consequence" [...] to assume the negative answer as a postulate. The postulate is so couched as not only to assert that knowledge of an event cannot establish a new preference among consequences or reverse an old one, but also to assert that, if the event is not null, no preference among consequences can be reduced to indifference by knowledge of an event.

+ The role of such freedom throughout science is brilliantly discussed by Quine (1951).

P3  If $\mathbf{f} \equiv g$, $\mathbf{f}' \equiv g'$, and $B$ is not null; then $\mathbf{f} \le \mathbf{f}'$ given $B$, if and only if $g \le g'$.
Applying Theorem 2, it is obvious that

THEOREM 3  If $B_i$ is a partition of $B$; and if (for all $i$ and $s$) $f_i \le g_i$, $f(s) = f_i$, and $g(s) = g_i$ when $s \in B_i$; then $\mathbf{f} \le \mathbf{g}$ given $B$. If, in addition, $f_j < g_j$ for some $j$ for which $B_j$ is not null, then $\mathbf{f} < \mathbf{g}$ given $B$.
Theorem 3 is logically equivalent to P3 in the presence of P1 and P2, and Theorem 3 can as easily be given an intuitive basis as the postulate P3. Therefore the assumption of P3 as a postulate instead of Theorem 3 is only a matter of taste. Theorem 3 has been widely accepted by the British-American School of statisticians, special emphasis having been given to it, in connection with his notion of admissibility, by the late Abraham Wald. I believe, as will be more fully explained later, that much of its particular significance for that school stems from the implication that, if several different people agree in their preferences among consequences, then they must also agree in their preferences among certain acts.

This brings the present chapter to a natural conclusion, since the further postulates to be proposed can be more conveniently introduced in connection with the uses to which they are put in later chapters.
CHAPTER 3

Personal Probability

1 Introduction

I personally consider it more probable that a Republican president will be elected in 1996 than that it will snow in Chicago sometime in the month of May, 1994. But even this late spring snow seems to me more probable than that Adolf Hitler is still alive. Many, after careful consideration, are convinced that such statements about probability to a person mean precisely nothing, or at any rate that they mean nothing precisely. At the opposite extreme, others hold the meaning to be so self-evident as to be unanalyzable. An intermediate position is taken in this chapter, where a particular interpretation of probability to a person is given in terms of the theory of consistent decision in the face of uncertainty, the exposition of which was begun in the last chapter. Much as I hope that the notion of probability defined here is consistent with ordinary usage, it should be judged by the contribution it makes to the theory of decision, not by the accuracy with which it analyzes ordinary usage.

Perhaps the first way that suggests itself to find out which of two events a person considers more probable is simply to ask him. It might even be argued, though I think fallaciously, that, since the question concerns what is inside the person's head, there can be no other method, just as we have little, if any, access to a person's dreams except through his verbal report. Attempts to define the relative probability of a pair of events in terms of the answers people give to direct interrogation have justifiably met with antipathy from most statistical theorists. In the first place, many doubt that the concept "more probable to me than" is an intuitive one, open to no ambiguity and yet admitting no further analysis.
Even if the concept were so completely intuitive, which might justify direct interrogation as a subject worthy of some psychological study, what could such interrogation have to do with the behavior of a person in the face of uncertainty, except of course for his verbal behavior under interrogation? If the state of mind in question is not capable of manifesting itself in some sort of extraverbal behavior, it is extraneous to our main interest. If, on the other hand, it does manifest itself through more material behavior, that should, at least in principle, imply the possibility of testing whether a person holds one event to be more probable than another, by some behavior expressing, and giving meaning to, his judgment. It would, in short, be preferable, at least in principle, to interrogate the person, not literally through his verbal answer to verbal questions, but rather in a figurative sense somewhat reminiscent of that in which a scientific experiment is sometimes spoken of as an interrogation of nature. Several schemes of behavioral, as opposed to direct, interrogation have been proposed. The one introduced below was suggested to me by a passage of de Finetti's (on pp. 5-6 of [D2]), though the passage itself does not emphasize behavioral interrogation.

To illustrate the scheme, our idealized person has just taken two eggs from his icebox and holds them unbroken in his hand. We wonder whether he thinks it more probable that the brown one is good than that the white one is. Our curiosity being real, we are prepared to pay, if necessary, to have it satisfied. We therefore address him thus: "We see that you are about to open those eggs. If you will be so cooperative as to guess that one or the other egg is good, we will pay you a dollar, should your guess prove correct. If incorrect, you and we are quits, except that we will in any event exchange your two eggs for two of guaranteed goodness." If under these circumstances the person stakes his chance for the dollar on the brown egg, it seems to me to correspond well with ordinary usage to say that it is more probable to him that the brown one is good than that the white one is.

Though, of course, I hope for your agreement on this analysis of ordinary usage, I repeat that it is not really fundamental to the subsequent argument, as indeed no such lexicographical point could be; for the utility of a construct or definition depends only secondarily on the aptness of the expression in terms of which it is couched.

There is a mode of interrogation intermediate between what I have called the behavioral and the direct. One can, namely, ask the person, not how he feels, but what he would do in such and such a situation. In so far as the theory of decision under development is regarded as an empirical one, the intermediate mode is a compromise between economy and rigor. But, in the theory's more important normative interpretation as a set of criteria of consistency for us to apply to our own decisions, the intermediate mode seems to me to be just the right one.

Though it entails digression from the main theme, some readers may be interested in a few words about actual experimentation on strictly empirical behavioral interrogation. Some key references bearing on the subject are [Mol], [Ra], and [W8]. In the first place, a little reflection shows that an experiment in which human subjects are required to decide among actual acts may be very expensive in time, money, and effort, especially if the consequences envisaged are expensive to provide, a point discussed in detail in [W8]. Questions of morality, and even of legality, toward the subject may further complicate the investigation. For example, Mosteller and Nogee, as described in Section 3B of [M4], made certain that every subject in one experiment of theirs would be financially benefited, though they kept this security secret from the subjects.

There is also a difficulty in principle. Suppose that I wish to discover a person's preferences among several acts; three acts $\mathbf{f}$, $\mathbf{g}$, and $\mathbf{h}$ are sufficient to bring out the difficulty. If I in good faith offer him the opportunity to decide among all three, and he decides on $\mathbf{f}$; then there is no further possibility of discovering what his preference was between $\mathbf{g}$ and $\mathbf{h}$. Suppose, for example, that a hot man actually prefers a swim, a shower, and a glass of beer, in that order. Once he decides on, and thereby becomes entitled to, the swim, he can no longer appropriately be asked to decide between shower and beer. A naive attempt to do so would result in his deciding between a swim and shower on the one hand, and a swim and beer on the other, an altogether different situation from the one intended.

The difficulty can sometimes be met by special devices. For example, the investigator might wait for a different, but "similar" occasion. But W. Allen Wallis has mentioned to me an interesting and very general device, which will now be described, with his permission.† Suppose that the hot man is instructed to rank the three acts in order, subject to the consideration that two of them will be drawn at random (e.g., by card drawing or dice rolling), and that he is then to have whichever of these two acts he has assigned the lower rank. He is thus called on to select one of six acts, that is, one of the six possible rankings. If he does, for example, select the ranking {swim, shower, beer}, it follows easily from the theory of decision thus far developed that for him swim $\ge$ shower $\ge$ beer, barring the farfetched possibility that he regards one or more of the three drawings as virtually impossible and provided that his preference among the three acts swim, shower, beer given any of the three drawings is the same as his original preference. The investigator could in practice design the drawing in such a way as to be well satisfied that the required "irrelevance" obtained, except for very "superstitious" people. This ends the present digression on actual behavioral interrogation.

† I have since seen this same device used by M. Allais.

The purpose of this chapter is to explore the concept of personal probability† that was indicated in the example about the two eggs. The concept will be put on a formal basis in § 2 by introducing two new postulates, P4 and P5, to be used in conjunction with P1-3. This will lead to a formal analysis of the notion that one event is no more probable than another. Several deductions about this notion reminiscent of mathematical properties ordinarily attributed to probability will be made; but only in § 3, after adjunction of still another postulate, P6, can the notion be connected quantitatively with what mathematicians ordinarily call mathematical probability. Section 4 is devoted to some mathematically technical criticisms of the notion of personal probability, which can safely be skipped or skimmed by those not interested in such matters. Section 5 discusses conditional personal probability; 6, the approach to certainty through a long sequence of conditionally independent relevant observations; and 7, an extension of the concept of a sequence of independent events, particularly interesting from the viewpoint of personal probability.
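Returning for a moment to the ranking device described in the digression above, its logic can be illustrated by enumeration. The following Python sketch is hypothetical throughout: it encodes the hot man's preferences by made-up utility numbers (the text has introduced no utilities at this stage), and it reads "the lower rank" as the position nearer the head of the ranking. The names `UTILITY`, `award`, `payoffs`, and `dominates` are not from the text.

```python
from itertools import combinations, permutations

UTILITY = {"swim": 3, "shower": 2, "beer": 1}   # hypothetical numbers encoding swim > shower > beer
DRAWS = list(combinations(sorted(UTILITY), 2))  # the three possible pairs of acts

def award(ranking, pair):
    """Of the two acts drawn, return the one listed earlier in the ranking."""
    return min(pair, key=ranking.index)

def payoffs(ranking):
    """Utility received under each of the three possible drawings."""
    return tuple(UTILITY[award(ranking, pair)] for pair in DRAWS)

RANKINGS = list(permutations(UTILITY))          # the six acts open to the person
TRUTHFUL = ("swim", "shower", "beer")

def dominates(r1, r2):
    """r1 does at least as well as r2 under every drawing."""
    return all(a >= b for a, b in zip(payoffs(r1), payoffs(r2)))
```

Under these assumptions the truthful ranking does at least as well as each of the other five under every drawing, and strictly better under at least one, so no assumption about how likely the drawings are is needed beyond each being regarded as possible.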
2 Qualitative personal probability
When I spoke in the introductory section of offering the person a dollar if his guess about the egg proved correct, it was tacitly assumed that his guess would not be affected by the amount of the prize offered. That seems to me correct in principle. It would, for example, seem unreasonable for the person with the two eggs to reverse his decision if the prize were reduced from a dollar to a penny. He might reverse himself in going from a penny to a dollar, because he might not have found it worth his trouble to give careful consideration for too small a prize. I think the anomaly can best be met by deliberately pretending that consideration costs the person nothing, though that is far from the truth in actual complicated situations. It might, on the other hand, be stimulating, and it is certainly more realistic, to think of consideration or calculation as itself an act on which the person must decide. Though I have not explored the latter possibility carefully, I suspect that any attempt to do so formally leads to fruitless and endless regression.
† The term "personal probability" was suggested to me orally by Thornton C. Fry. Some other terms suggested for the same concept are "subjective probability," "psychological probability," and "degree of conviction."
To offer a prize in case $A$ obtains means to make available to the person an act $\mathbf{f}_A$ such that

$$
\begin{aligned}
f_A(s) &= f  && \text{for } s \in A,\\
f_A(s) &= f' && \text{for } s \in {\sim}A,
\end{aligned}
\tag{1}
$$

where $f' < f$. The assumption that on which of two events the person will choose to stake a given prize does not depend on the prize itself is expressed by the following postulate, which looks formidable only because it contains four definitions like (1). The reader may find it helpful to graph an instance of the postulate in the spirit of Figure 2.7.1.
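P4  If $f$, $f'$, $g$, $g'$; $A$, $B$; $\mathbf{f}_A$, $\mathbf{f}_B$, $\mathbf{g}_A$, $\mathbf{g}_B$ are such that:
1. $f' < f$, $g' < g$;
2a. $f_A(s) = f$, $g_A(s) = g$ for $s \in A$,
    $f_A(s) = f'$, $g_A(s) = g'$ for $s \in {\sim}A$;
2b. $f_B(s) = f$, $g_B(s) = g$ for $s \in B$,
    $f_B(s) = f'$, $g_B(s) = g'$ for $s \in {\sim}B$;
3. $\mathbf{f}_A \le \mathbf{f}_B$;
then $\mathbf{g}_A \le \mathbf{g}_B$.

In the light of P4, it will be said that $A$ is not more probable than $B$, abbreviated $A \le B$; if and only if when $f' < f$ and $\mathbf{f}_A$, $\mathbf{f}_B$ are such that

$$
\begin{aligned}
f_A(s) &= f  && \text{for } s \in A, & f_A(s) &= f' && \text{for } s \in {\sim}A,\\
f_B(s) &= f  && \text{for } s \in B, & f_B(s) &= f' && \text{for } s \in {\sim}B;
\end{aligned}
$$

then $\mathbf{f}_A \le \mathbf{f}_B$.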
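To see concretely what P4 asks for, one can test a particular ranking of acts for prize-independence. The sketch below is illustrative only; it assumes four states carrying made-up integer weights and ranks acts by a weighted sum of numerically coded consequences, which is an assumption of the example and not something the text has established. The names `WEIGHT`, `prize_act`, `act_leq`, and `not_more_probable` are hypothetical.

```python
from itertools import combinations

WEIGHT = {"s1": 1, "s2": 2, "s3": 3, "s4": 4}   # hypothetical integer weights on states
STATES = tuple(WEIGHT)

def prize_act(event, prize, consolation):
    """The act f_A of (1): the prize on the event, the lesser consequence elsewhere."""
    return {s: prize if s in event else consolation for s in STATES}

def act_leq(f, g):
    """f <= g under the assumed weighted-sum ranking of acts."""
    return (sum(WEIGHT[s] * f[s] for s in STATES)
            <= sum(WEIGHT[s] * g[s] for s in STATES))

def not_more_probable(A, B, prize=1, consolation=0):
    """A <= B in the sense of the definition: f_A <= f_B for a prize with f' < f."""
    if not consolation < prize:
        raise ValueError("the definition requires f' < f")
    return act_leq(prize_act(A, prize, consolation), prize_act(B, prize, consolation))

EVENTS = [frozenset(c) for r in range(len(STATES) + 1) for c in combinations(STATES, r)]
```

For this ranking, the answer returned by `not_more_probable(A, B)` is the same whether the prize is `(1, 0)` or, say, `(100, -5)`, for every pair of events in `EVENTS`. That is the independence P4 postulates, here a consequence of the assumed form of the ranking.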
The assumption that there is at least one worth-while prize is innocuous; for, though a context failing to satisfy it might arise, such a context would be too trivial to merit study. I therefore propose the following postulate.

P5  There is at least one pair of consequences $f$, $f'$ such that $f' < f$.

All the implications to be deduced from P1-5 for some time to come are themselves implications of the three easily established conclusions, which are introduced by the following definition and theorem.
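A relation $\mathrel{\le\cdot}$ between events is a qualitative probability; if and only if, for all events $B$, $C$, $D$,
1. $\mathrel{\le\cdot}$ is a simple ordering,
2. $B \mathrel{\le\cdot} C$, if and only if $B \cup D \mathrel{\le\cdot} C \cup D$, provided $B \cap D = C \cap D = 0$,
3. $0 \mathrel{\le\cdot} B$, $0 \mathrel{<\cdot} S$.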
It may be helpful to remark that the second part of the above definition says, in effect, that it will not affect the person's guess to offer him a consolation prize in case neither $B$ nor $C$ obtains, but $D$ happens to.
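The three parts of the definition lend themselves to a direct check on a small finite set of states. The following Python sketch is a hypothetical illustration, not part of the text: it tests two made-up relations on the subsets of a three-element set, one induced by additive weights and one induced by the largest weight in each event. The names `is_qualitative_probability`, `by_measure`, and `by_max` are assumptions of the example.

```python
from itertools import combinations, product

def powerset(states):
    """All events, that is, all subsets of the finite set of states."""
    return [frozenset(c) for r in range(len(states) + 1) for c in combinations(states, r)]

def is_qualitative_probability(states, leq):
    """Check parts 1-3 of the definition for a relation leq(B, C) between events."""
    events = powerset(states)
    empty, whole = frozenset(), frozenset(states)
    comparable = all(leq(b, c) or leq(c, b) for b, c in product(events, repeat=2))
    transitive = all(leq(b, d)
                     for b, c, d in product(events, repeat=3)
                     if leq(b, c) and leq(c, d))
    # Part 2: adjoining D to both sides changes nothing when D is disjoint from B and C.
    additive = all(leq(b, c) == leq(b | d, c | d)
                   for b, c, d in product(events, repeat=3)
                   if not (b & d) and not (c & d))
    # Part 3: the vacuous event is at the bottom and strictly below S.
    bounded = all(leq(empty, b) for b in events) and not leq(whole, empty)
    return comparable and transitive and additive and bounded

W = {"a": 1, "b": 2, "c": 4}                    # hypothetical weights

def by_measure(b, c):
    return sum(W[s] for s in b) <= sum(W[s] for s in c)

def by_max(b, c):
    return max((W[s] for s in b), default=0) <= max((W[s] for s in c), default=0)
```

Here `by_measure` passes all three parts, while `by_max` is a simple ordering that fails part 2: the events `{"b"}` and `{"a"}` are strictly ordered, yet adjoining `{"c"}` to both makes them equivalent.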
1 probability.
The reJation
<
as applied to e\pents is a
qualitat~ive
You will have no difficulty in proving that Theorem 1 follows from P1-5. Theorem 1 has many consequences of the sort one would expect if $\le$ meant "not more probable than" in any sense having the mathematical properties ordinarily attributed to numerical probability. This is illustrated by the following list of exercises, which should not only be proved formally, but also interpreted intuitively. One easy exercise not included in the list below, because it is not strictly a consequence of Theorem 1 alone, is to show that $B \doteq 0$, if and only if $B$ is a null event.

Exercises
1. If $B \subset C$, then $0 \le B \le C \le S$.
2a. If $B \cap D = C \cap D = 0$; then $B < C$, if and only if $B \cup D < C \cup D$.
2b. If $0 < C$, and $B \cap C = 0$; then $B < B \cup C$.
3. If $B \le C$, then ${\sim}C \le {\sim}B$; and conversely. Hint: Draw a Venn diagram of the fourfold partition $B \cap C$, ${\sim}B \cap C$, $B \cap {\sim}C$, ${\sim}B \cap {\sim}C$.
4a. If $B \le C$, and $C \cap D = 0$; then $B \cup D \le C \cup D$.
4b. If $B \le 0$; then $B \cup C \doteq C$, and $B \doteq 0$.
4c. If $S \le B$; then $B \cap C \doteq C$, and $B \doteq S$.
4d. If $B \cup D \le C \cup D$, and $B \cap D = 0$; then $B \le C$.
5a. If $B_1 \le C_1$, $B_2 \le C_2$, and $C_1 \cap C_2 = 0$; then $B_1 \cup B_2 \le C_1 \cup C_2$. Hint: Exhibit $B_2$ and $C_1$ in the form $B_2 = B_2' \cup Q$, $C_1 = C_1' \cup Q$ with $B_2'$, $C_1'$, $Q$ disjoint. Justify the following calculation, step by step.

$$B_1 \cup B_2' \le C_1 \cup B_2' = C_1' \cup B_2 \le C_1' \cup C_2,$$

whence $B_1 \cup B_2 \le C_1 \cup C_2$.
5b. If $B_1 \cup B_2 \le C_1 \cup C_2$ and $B_1 \cap B_2 = 0$; then $B_1 \le C_1$ or $B_2 \le C_2$.
6. If $B \le {\sim}B$ and $C \ge {\sim}C$, then $B \le C$; equality holding in the conclusion, if and only if it holds in both parts of the hypothesis.
3 Quantitative personal probability
As I have said, the exercises terminating the preceding section suggest a close mathematical parallelism between personal probability and the mathematical properties ordinarily attributed to probability, though the postulates assumed thus far do not (as could easily be demonstrated) make it possible to deduce from this parallelism the unambiguous assignment of a numerical probability to each event. But, if, for example (following de Finetti [D2]), a new postulate asserting that $S$ can be partitioned into an arbitrarily large number of equivalent subsets were assumed, it is pretty clear (and de Finetti explicitly shows in [D2]) that numerical probabilities could be so assigned. It might fairly be objected that such a postulate would be flagrantly ad hoc. On the other hand, such a postulate could be made relatively acceptable by observing that it will obtain if, for example, in all the world there is a coin that the person is firmly convinced is fair, that is, a coin such that any finite sequence of heads and tails is for him no more probable than any other sequence of the same length; though such a coin is, to be sure, a considerable idealization.

After some general and abstract discussion of the mathematical connection between qualitative and quantitative probability, a postulate, P6, will be proposed, which, though logically actually stronger than the assumption that there are partitions of $S$ into equivalent events, seems to me even easier to accept. Once P6 is accepted, there will scarcely again be any need to refer directly to qualitative probability.

To begin with, let me say precisely what is meant, in the present context, by a probability measure, this being the standard term for what I would here otherwise prefer to call a quantitative probability, and what it means for a probability measure to be in agreement with a qualitative probability.

A probability measure on a set $S$ is a function $P(B)$ attaching to each $B \subset S$ a real number such that:
1. $P(B) \ge 0$ for every $B$.
2. If $B \cap C = 0$, $P(B \cup C) = P(B) + P(C)$.
3. $P(S) = 1$.
This definition, or something very like it, is at the root of all ordinary mathematical work in probability.
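For a finite set $S$ the three conditions can be verified exhaustively. The Python sketch below is a hypothetical illustration under that finiteness assumption; the weights, the set function `p_squared`, and the name `is_probability_measure` are inventions of the example.

```python
from fractions import Fraction
from itertools import combinations, product

def powerset(states):
    """All subsets B of the finite set S."""
    return [frozenset(c) for r in range(len(states) + 1) for c in combinations(states, r)]

def is_probability_measure(states, P):
    """Check conditions 1-3 of the definition for a set function P on a finite S."""
    events = powerset(states)
    nonnegative = all(P(b) >= 0 for b in events)
    additive = all(P(b | c) == P(b) + P(c)
                   for b, c in product(events, repeat=2)
                   if not (b & c))
    normalized = P(frozenset(states)) == 1
    return nonnegative and additive and normalized

S = ["a", "b", "c"]
W = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}   # hypothetical weights

def p_additive(b):
    return sum((W[s] for s in b), Fraction(0))

def p_squared(b):
    # Nonnegative and equal to 1 on S, but not additive.
    return Fraction(len(b), len(S)) ** 2
```

Exact rational arithmetic is used so that the additivity test in condition 2 is not disturbed by rounding. `p_additive` satisfies all three conditions, whereas `p_squared` satisfies conditions 1 and 3 only.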
If S carries a probability measure P and a qualitative probability $\le\cdot$ such that, for every B, C, $P(B) \le P(C)$, if and only if $B \le\cdot C$; then P (strictly) agrees with $\le\cdot$. If $B \le\cdot C$ implies $P(B) \le P(C)$, then P almost agrees with $\le\cdot$. This terminology is obviously consistent in that, if P agrees, that is, strictly agrees, with $\le\cdot$, P also almost agrees with $\le\cdot$. It is also easily seen that, if P agrees with $\le\cdot$, then knowledge of P implies knowledge of $\le\cdot$. But, if P only almost agrees with $\le\cdot$, it may happen, as examples in § 4 show, that $P(B) = P(C)$, though $B <\cdot C$, so that knowledge of P may imply only imperfect knowledge of $\le\cdot$.

The rest of this section is mainly a study of qualitative probabilities generally, with a view to discovering interesting conditions under which there is a probability measure that agrees, either strictly or almost, with a given qualitative probability. These conditions suggest a new postulate governing the special qualitative probability $\le$. The work is necessarily rather tedious and burdened with detail. It will, therefore, be wise for most readers to skim over the material, omitting the proofs but noticing the more obvious logical connections among the theorems and definitions. Some may then find themselves sufficiently interested in the details to return and read or supply the proofs, as the case may require. Others may safely go forward.

Here, as elsewhere, technical terms of interest for the moment only are introduced with italics rather than boldface. An n-fold almost uniform partition of B is an n-fold partition of B such that the union of no r elements of the partition is more probable than that of any r + 1 elements.

THEOREM 1 If there exist n-fold almost uniform partitions of B for arbitrarily large values of n, then there exist m-fold almost uniform partitions for every positive integer m.
PROOF. Let $B_i$, $i = 1, \ldots, n$, be an n-fold almost uniform partition (of B) with $n \ge m^2$. Using the euclidean algorithm, let n be written $n = am + b$, where a and b are integers such that $m \le a$ and $0 \le b < m$. Now let $C_j$, $j = 1, \ldots, m$, be any m-fold partition such that each $C_j$ is the union of a or a + 1 of the $B_i$'s. The union of any r of the $C_j$'s, $r < m$, is the union of from $ar$ to $(a + 1)r$ of the $B_i$'s; and the union of r + 1 of the $C_j$'s is that of from $a(r + 1)$ to $(a + 1)(r + 1)$ of the $B_i$'s. Since $r < m \le a$, $(a + 1)r = ar + r < ar + a = a(r + 1)$. ◆
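The counting step of this proof can be sketched in a few lines. This is only an illustration under the proof's own hypothesis $n \ge m^2$; the function names `group_sizes` and `counting_condition_holds` are hypothetical, and the code checks the arithmetic, not the probabilistic comparison itself.

```python
def group_sizes(n, m):
    """Split n cells into m groups of size a or a + 1, where n = a*m + b, 0 <= b < m."""
    a, b = divmod(n, m)
    return [a + 1] * b + [a] * (m - b)


def counting_condition_holds(n, m):
    """Check (a + 1) * r < a * (r + 1) for every r with 1 <= r < m."""
    a, _ = divmod(n, m)
    return all((a + 1) * r < a * (r + 1) for r in range(1, m))


print(group_sizes(10, 3))               # [4, 3, 3]
print(counting_condition_holds(10, 3))  # True, since 10 >= 3**2
print(counting_condition_holds(5, 3))   # False, since 5 < 3**2
```

When the inequality holds, any r of the coarse elements contain strictly fewer of the fine elements than any r + 1 of them, which is the fact the proof relies on.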
THEOREM 2 If there exist n-fold almost uniform partitions of S for arbitrarily large values of n, then there is one and only one probability measure P that almost agrees with $\le\cdot$. Furthermore, for any $\rho$, $0 \le \rho \le 1$, any $B \subset S$, and the unique P just defined, there exists $C \subset B$ such that $P(C) = \rho P(B)$.†
PROOF. The proof is broken into a sequence of easy steps, left, for the most part, to the reader. These steps are grouped in blocks, only the last step in each being needed in the proof of later steps.
1. There exist n-fold almost uniform partitions of S for every positive n.
2a. If $p_1, \ldots, p_n$ are real numbers such that $0 \le p_1 \le p_2 \le \cdots \le p_n$, and $\sum p_i = 1$; then
(1) $\sum_{i=1}^{r} p_i \le r/n$, $r = 1, \ldots, n$.
2b. If further
$\sum_{i=1}^{r+1} p_i \ge \sum_{k=n-r+1}^{n} p_k$ for $r = 1, \ldots, n - 1$,
then
(2) $\sum_{i=1}^{r} p_i \ge (r - 1)/n$, and $\sum_{i=n-r+1}^{n} p_i \le (r + 1)/n$.
2c. The sum of any r of the $p_i$'s lies between $(r - 1)/n$ and $(r + 1)/n$.
2d. If P almost agrees with $\le\cdot$, and C(r, n) denotes here and later in this proof any union of r elements of any n-fold almost uniform partition (not necessarily the same from one context to another), then
(3) $(r - 1)/n \le P(C(r, n)) \le (r + 1)/n$.
3. Let k(B, n) denote the largest integer r (possibly zero) such that some C(r, n) is not more probable than B. The function k(B, n) is well-defined, and $0 \le k(B, n) \le n$.
4a. For any P that almost agrees with $\le\cdot$,
(4) $(k(B, n) - 1)/n \le P(B) \le (k(B, n) + 2)/n$.
4b. At most one P can almost agree with $\le\cdot$.
5a. If $B_i$ and $C_i$ are n-fold partitions (not necessarily almost uniform) so indexed that $B_1 \le\cdot B_2 \le\cdot \cdots \le\cdot B_n$, and $C_1 \ge\cdot C_2 \ge\cdot \cdots \ge\cdot C_n$; then
(5) $\bigcup_{i=n-r}^{n} B_i \ge\cdot \bigcup_{i=n-r}^{n} C_i$, $r = 0, \ldots, n - 1$.

† Technical note: The mathematical essence of the terminal conclusion of this theorem, and other conclusions related to it, are given by Sobczyk and Hammer [S15]. It might be conjectured, in analogy with countably additive measures, that this conclusion means only that P is nonatomic, but that conjecture is false [Nil].+
+ A key reference for further information on the structure of finitely additive measures is (Dubins 1969). Sustained use of finitely additive probability is illustrated in (Dubins and Savage 1965).
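5b. If in addition the two partitions are almost uniform, then
(6) $\bigcup_{i=1}^{r} C_i \le\cdot \bigcup_{i=1}^{r+2} B_i$, $r = 1, \ldots, n - 2$.
(Proof. $\bigcup_{i=1}^{r+2} B_i \ge\cdot \bigcup_{i=n-r}^{n} B_i \ge\cdot \bigcup_{i=n-r}^{n} C_i \ge\cdot \bigcup_{i=1}^{r} C_i$.)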
5c. The union of any r elements of one almost uniform n-fold partition is not more probable than the union of any r + 2 elements of another.
5d. If $B \cap C = 0$, then
(7) $k(B, n) + k(C, n) - 2 \le k(B \cup C, n) \le k(B, n) + k(C, n) + 1$.
6a. If a C(r, m) is not more probable than a C(s, n), then
(8) $\frac{r - 1}{m} \le \frac{s + 1}{n} + \frac{2}{mn}$.
(Consider an mn-fold almost uniform partition, and use the easily established fact that the union of any t + 2 elements of an almost uniform partition is actually more probable than that of any t elements.)
k(B, ~ _ k(B, n)
6b.
m
n
I ~ ! + 3 + 2-. Imnm"
6c. It is meaningful to define P(B) by
(9) $P(B) =_{Df} \lim_{n \to \infty} k(B, n)/n$,
that is, the limit exists.
7. P(B), as just defined, is a probability measure and the only one that almost agrees with $\le\cdot$.
8a. There exist two infinite sequences of sets $C_n$ and $D_n$ contained in B such that:
   1. $C_n \cap D_n = 0$.
   2. $C_n \subset C_{n+1}$, and $D_n \subset D_{n+1}$.
   3. $P(C_n) \ge \rho P(B) - n^{-1}$.
   4. $P(D_n) \ge (1 - \rho)P(B) - n^{-1}$.
8b. $P(\bigcup_n C_n) \ge \rho P(B)$, $P(\bigcup_n D_n) \ge (1 - \rho)P(B)$, and $(\bigcup_n C_n) \cap (\bigcup_n D_n) = 0$.
8c. $P(\bigcup_n C_n) = \rho P(B)$. ◆
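The way the counting function k(B, n) pins down P(B) can be illustrated under a strong simplifying assumption that the theorem itself does not make: suppose the n-fold partitions are exactly uniform and P strictly agrees with the qualitative probability, so that a union of r cells has probability r/n and k(B, n) is the integer part of nP(B). The function name `k` and the value chosen for P(B) are hypothetical.

```python
from fractions import Fraction
from math import floor


def k(p_B, n):
    """Largest r with r/n <= P(B), assuming an exactly uniform n-fold partition."""
    return floor(p_B * n)


p_B = Fraction(2, 7)  # hypothetical value of P(B)
for n in (1, 10, 100, 1000):
    r = k(p_B, n)
    # Inequality (4) of the proof: (k - 1)/n <= P(B) <= (k + 2)/n.
    assert Fraction(r - 1, n) <= p_B <= Fraction(r + 2, n)
    print(n, r, float(Fraction(r, n)))
```

In this idealized case the ratios k(B, n)/n approach P(B) from below with an error smaller than 1/n; the theorem asserts that the limit still exists when the partitions are only almost uniform.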
A few technical terms of localized interest only are now introduced. If and only if, for every $B >\cdot 0$, there is a partition of S, no element of which is as probable as B; $\le\cdot$ is fine.+ B and C are almost equivalent, written $B \diamond\cdot C$; if and only if for all non-null G and H such that $B \cap G = C \cap H = 0$, $B \cup G \ge\cdot C$ and $C \cup H \ge\cdot B$. It is obvious that equivalent events are also almost equivalent. Finally, if and only if every pair of almost equivalent events are equivalent, $\le\cdot$ is tight.

THEOREM 3
HYP. $\le\cdot$ is fine.
CONCL.
1. If $B >\cdot 0$, and $C >\cdot 0$; there exists $D \subset C$ such that $0 <\cdot D <\cdot B$.
2. If $B \diamond\cdot G$, $C \diamond\cdot H$, and $B \cap C = G \cap H = 0$; then $B \cup C \diamond\cdot G \cup H$.
3. If $B \diamond\cdot C$, $G \diamond\cdot H$, $B \cup C \diamond\cdot G \cup H$, and $B \cap C = G \cap H = 0$; then $B \diamond\cdot G$.
4. Any partition of S into almost equivalent events is an almost uniform partition.
5. Any event can be partitioned into two almost equivalent events.
6. Any event can be partitioned into $2^n$ almost equivalent events, for any non-negative integer n.
7. There exists one and only one P that almost agrees with $\le\cdot$. For any B, $\rho$ ($0 \le \rho \le 1$), and the unique P just defined, there exists $C \subset B$ such that $P(C) = \rho P(B)$. If $B >\cdot 0$, $P(B) > 0$. Finally, $B \diamond\cdot C$, if and only if $P(B) = P(C)$.
PROOF. The parts of the conclusion are so arranged that each is easy to prove in the light of its predecessors, but proofs for Parts 3 and 5 are given below. It may be remarked that all parts are trivial consequences of the last one and have therefore relatively little importance in themselves.

Part 3. Suppose, for example, $B \cup E <\cdot G$, $B \cap E = 0$, and $E >\cdot 0$; and consider two cases:
(a) If $B \cup C <\cdot S$, it may be assumed without loss of generality that $C \cap E = 0$, whence $(B \cup C) \cup E \ge\cdot G \cup H$. Therefore, $C >\cdot H$. Let E be partitioned into two non-null events $E_1$ and $E_2$; then (since it is absurd to suppose that the part of G outside of C is null, which would imply $C \ge\cdot G >\cdot B \cup E$) there is in G an $E'$ such that $C \cap E' = 0 <\cdot E' \le\cdot E_2$. Now $C \cup E' >\cdot H \cup E' \ge\cdot G >\cdot (B \cup E_1) \cup E_2$, whence $C >\cdot B \cup E_1$, which is absurd.
(b) If $B \cup C =\cdot S$, it can (setting aside the easy special case $C \cap G =\cdot 0$) be shown successively that: $H \cup G =\cdot S$; $C <\cdot B \cup E <\cdot G$, where $E >\cdot 0$ and $E \subset C \cap G$; $(B \cap H) \cup E <\cdot (G \cap C)$; $(C \cap H) <\cdot (G \cap B)$; and $H \cup E <\cdot G$, which establishes a contradiction.

+ In the first edition, this definition was a trifle too weak, as pointed out by Malcolm Pike.
Part 5. There exists a sequence of threefold partitions of B, say $C_n$, $D_n$, and $G_n$, such that:
1. $C_n \cup G_n \ge\cdot D_n$, and $D_n \cup G_n \ge\cdot C_n$,
2. $C_{n+1} \supset C_n$, $D_{n+1} \supset D_n$, and $G_{n+1} \subset G_n$,
3. $\sim G_{n+1} \cap G_n \ge\cdot G_{n+1}$; whence $G_n$ contains two disjoint events each at least as probable as $G_{n+1}$.
For any $H >\cdot 0$, $G_n \le\cdot H$ for sufficiently large n, as may be seen by considering some m-fold partition no element of which is more probable than H, and letting n be such that $2^{n-1} > m$. If $G_n$ were more probable than H and therefore more probable than each element of the partition, it would follow that the union of all elements of the partition, namely S, is less probable than $G_1$, which would be absurd. The two events $B_1 = \bigcup_n C_n$, $B_2 = (\bigcup_n D_n) \cup (\bigcap_n G_n)$ partition B in the required fashion. ◆
COROLLARY 1 If $\le\cdot$ is both fine and tight; the only probability measure that almost agrees with $\le\cdot$ strictly agrees with it, and there exist partitions of S into arbitrarily many equivalent events.

THEOREM 4 $\le\cdot$ is both fine and tight, if and only if, for every $B <\cdot C$, there exists a partition of S the union of each element of which with B is less probable than C.

The proof of this theorem is easy.
In the light of Theorems 3 and 4, I tentatively propose the following postulate, P6', governing the relation $\le$ among events, and thereby the relation $\le$ among acts.

P6' If $B < C$, there exists a partition of S the union of each element of which with B is less probable than C.
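In numerical terms, P6' asks for a partition fine enough that adding any one element to B cannot close the gap between B and C. The sketch below assumes, purely for illustration, that numerical probabilities are already available and that the partition is generated by n tosses of a fair coin, so that each element has probability $2^{-n}$; the function name `tosses_needed` is hypothetical. It uses only the bound $P(B \cup E) \le P(B) + P(E)$.

```python
from fractions import Fraction


def tosses_needed(p_B, p_C):
    """Smallest n with P(B) + 2**(-n) < P(C), assuming P(B) < P(C)."""
    if not p_B < p_C:
        raise ValueError("P6' only concerns the case B < C")
    n = 0
    while p_B + Fraction(1, 2 ** n) >= p_C:
        n += 1
    return n


print(tosses_needed(Fraction(3, 10), Fraction(2, 5)))  # 4, since 1/16 < 1/10 <= 1/8
```

The closer B is to C in probability, the longer the sequence of tosses required, but some finite n always suffices.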
It seems to me rather easier to justify the assumption of P6', which says in effect that $\le$ is both fine and tight, than to justify the assumption, which was made by de Finetti [D2] and by Koopman [K9], [K10], [K11] in closely related contexts, that there exist partitions of S into arbitrarily many equivalent events, though logically P6' implies that assumption and somewhat more. Suppose, for example, that you yourself consider $B < C$, that is, that you would definitely rather stake a gain in your fortune on C than on B. Consider the partition of your own world into $2^n$ events each of which corresponds to a particular sequence of n heads and tails, thrown by yourself, with a coin of your own choosing. It seems to me that you could easily choose such a coin and choose n sufficiently large so that you would continue to prefer to stake your gain on C, rather than on the union of B and any particular sequence of n heads and tails. For you to be able to do so, you need by no means consider every sequence of heads and tails equally probable.

It would, however, be disingenuous not to mention that some who have worked on a closely related concept of probability, notably Keynes [K4] and Koopman [K9], [K10], [K11], would object to P6' precisely because it implies that the agreement between numerical probability and qualitative probability is strict. Koopman, for example, holds that, if $A \supset B$ and $A \ne B$, then A is necessarily more probable than B, though the numerical probability of A may well be the same as that of B. Thus, if a marksman shoots at a wall, it is logically contradictory that his bullet should fall nowhere at all, but it is logically consistent that a prescribed mathematically ideal point on the bullet should strike a prescribed mathematically ideal line on the wall. Since the event of the prescribed point hitting a prescribed line is logically possible, Koopman would insist that the event is more probable than the vacuous event, namely that the bullet goes nowhere, though the numerical probability of both events is zero. I do not take direct issue with Koopman, because he is presumably talking about a somewhat different concept of probability from the particular relation $\le$; but I do not think it appropriate to suppose that the person would distinctly rather stake a gain on the line than on the null set. The issue is not really either an empirical or a normative one, because the point and line in question are mathematical idealizations. If the point and line are replaced by a dot and a band, respectively, then, of course, no matter how small the dot and band may be, the probability of the one hitting the other is greater than that of the vacuous event. But it seems to me entirely a matter of taste, conditioned by mathematical experience, to decide what idealization to make if the dot and band are replaced by their idealized limits.

So much for hair splitting. As far as the theory of probability per se is concerned, postulate P6' is all that need be assumed, but in Chapter 5 a slightly stronger assumption will be needed that bears on acts generally, not only on those very special acts by which probability is defined. Therefore, I am about to propose a postulate, P6, that obviously implies P6' and will therefore supersede it. This stronger postulate seems to me acceptable for the same reason that P6' itself does.
P6 If $g < h$, and f is any consequence; then there exists a partition of S such that, if g or h is so modified on any one element of the partition as to take the value f at every s there, other values being undisturbed; then the modified g remains less than h, or g remains less than the modified h, as the case may require.
4 Some mathematical details
Are there qualitative probabilities that are both fine and tight, that are fine but not tight, that are tight but not fine, that are neither fine nor tight but do have one and only one almost agreeing probability measure? Examples answering all these questions in the affirmative will be exhibited in this section.

To indicate a different topic that will also be treated here, those of you who have had more than elementary experience with mathematical treatments of probability know that it is not usual to suppose, as has been done here, that all sets have a numerical probability, but rather that a sufficiently rich class of sets do so, the remainder being considered unmeasurable. Again, it is usual to suppose that, if each of an infinite sequence of disjoint sets is measurable, the probability of their union is the sum of their probabilities, that is, probability measures are generally assumed to be countably additive. But the theory being developed here does assume that probability is defined for all events, that is, for all sets of states, and it does not imply countable additivity, but only finite additivity. The present section not only answers the questions raised in the preceding paragraph, but also discusses the relation of the notions of limited domain of definition and of countable additivity to the theory of probability developed here. The general conclusions of this discussion are: First, there is no technical obstacle to working with a limited domain of definition, and, except for expository complications, it might have been mildly preferable to have done so throughout. Second, it is a little better not to assume countable additivity as a postulate, but rather as a special hypothesis in certain contexts. A different and much more extensive treatment of these questions has been given by de Finetti [D4].

Finally, before entering upon the main technical work of this section, one easy question about the relation between qualitative and quantitative probability will be answered and several as yet unanswered ones will be raised. Are there qualitative probabilities without any strictly agreeing measure? Yes, because any qualitative probability that is fine but not tight is easily shown to provide an example. It is, however, an open question, stressed by de Finetti [D5], whether a qualitative probability on a finite S always has a strictly agreeing measure. It would also be technically interesting to know about the existence of almost agreeing measures in the same context.+

+ Even this has since been answered in the negative by Kraft, Pratt, and Seidenberg (1959). See (Fishburn 1970, pp. 210-211).
The matters to be treated in the rest of this section are rather technical mathematically, and, though I would not delete them altogether, it does not seem justifiable to lay the necessary groundwork for presenting them in an elementary fashion. Some may, therefore, find it necessary to skip the rest of this section altogether, or to skim it rather lightly.

It is well known that there does not exist a countably additive probability measure defined for every subset of the unit interval, agreeing with Lebesgue measure on those sets where Lebesgue measure is defined, and assigning the same measure to each pair of congruent sets+ (Problem (b), p. 276 of [H2]). On the other hand, there do exist finitely additive probability measures agreeing with Lebesgue measure on those sets for which Lebesgue measure is defined, and assigning the same measure to each pair of congruent sets; cf. p. 32 of [B4]. The existence of such measures shows, among other things, that a finitely additive measure need not be countably additive. Again, calling such a finitely additive extension of Lebesgue measure P and defining $B \le\cdot C$ to mean $P(B) \le P(C)$, we see an example of a qualitative probability that is both fine and tight.

An example of a qualitative probability that is tight but not fine may be constructed by taking for S two unit intervals, $S_1$ and $S_2$, in each of which finitely additive extensions of Lebesgue measure, $P_1$ and $P_2$, are defined. The generic set B in this example is therefore partitioned into $B_1 = B \cap S_1$ and $B_2 = B \cap S_2$, respectively. For this example, let $B <\cdot C$; if, and only if $P_1(B_1) < P_1(C_1)$, or else $P_1(B_1) = P_1(C_1)$, and $P_2(B_2) < P_2(C_2)$. This $\le\cdot$ is not fine, because, for example, S cannot be partitioned into events none of which is more probable than $S_2$. On the other hand, it is easily seen to be tight.

Next, take S to be the union of $S_1$ and $S_2$ with the measures $P_1$ and $P_2$ as defined in the preceding example, but modify the definition of $\le\cdot$, saying $B <\cdot C$; if and only if $P_1(B_1) + P_2(B_2) < P_1(C_1) + P_2(C_2)$, or else $P_1(B_1) + P_2(B_2) = P_1(C_1) + P_2(C_2)$, and $P_1(B_1) < P_1(C_1)$. This is an example of a qualitative probability that is fine but not tight.

Combining the ideas of the two preceding examples, it is easy to exhibit a qualitative probability that is neither fine nor tight but is such that S can be divided into arbitrarily many equally probable events. Thus all the questions raised in the opening paragraph of this section are answered in the affirmative.

+ S. Ulam (1930) shows that any nonatomic, countably additive probability measure defined on all the subsets of the unit interval is inconsistent with the continuum hypothesis.
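The two lexicographic orderings used in these examples can be sketched by representing an event B only through the pair $(P_1(B_1), P_2(B_2))$. This representation and the function names are illustrative assumptions; the code compares pairs of numbers and does not construct the underlying finitely additive measures.

```python
from fractions import Fraction


def less_tight_not_fine(b, c):
    """B <. C in the first example: compare P1 first, then P2 to break ties."""
    return b[0] < c[0] or (b[0] == c[0] and b[1] < c[1])


def less_fine_not_tight(b, c):
    """B <. C in the second example: compare P1 + P2 first, then P1 to break ties."""
    return sum(b) < sum(c) or (sum(b) == sum(c) and b[0] < c[0])


S2 = (Fraction(0), Fraction(1))           # the whole second interval
sliver = (Fraction(1, 1000), Fraction(0))  # a small piece of the first interval
# In the first ordering even a sliver of S1 is more probable than all of S2.
print(less_tight_not_fine(S2, sliver))

half_of_S1 = (Fraction(1, 2), Fraction(0))
half_of_S2 = (Fraction(0), Fraction(1, 2))
# In the second ordering these have equal totals yet are not equivalent.
print(less_fine_not_tight(half_of_S2, half_of_S1))
```

The first comparison shows why no partition of S can have every element at most as probable as $S_2$; the second shows two events that the total $P_1 + P_2$ cannot distinguish although the ordering does.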
To get a feeling for the question whether literally all sets should be regarded as measurable, suppose that S is a cube of unit volume and that the probability measure P that strictly agrees with $\le$ is such that the probability of a parallelepiped is equal to its volume. It follows that the probability of any set having Jordan content is its Jordan content, but, if a set has not Jordan content, a continuum of possibilities is still open. Though other possibilities are conceivable, it is not unnatural to consider an idealized person for whom the numerical probability attached to each Borel set, or even each Lebesgue measurable set, is its Lebesgue measure. To go further and take seriously comparisons between sets that are not Lebesgue measurable, or even between those that are not Borel measurable, seems to me to be without any implication bearing on reality. I suppose it might be argued, on the contrary, that there is no feature of reality that can properly be interpreted by postulating that the person is able to compare only sets from a sufficiently narrow field, so that it is simpler and more elegant to admit all sets. The question seems to be one of taste, but the following remark illustrates what I consider an awkwardness in supposing probability to be attached to all sets. It would seem, at first glance, that the person should be able, if he is so constituted, to regard all pairs of geometrically congruent sets for which he makes any comparison at all as equivalent, but the famous Banach-Tarski paradox [B5] shows that this cannot be done if all sets are regarded as measurable. I think it a little more graceful to abstain from comparison between the more bizarre sets than to give up, or even much modify, my everyday notions about the symmetry of such probability problems associated with geometry.

If one is unwilling to insist on comparison between every pair of sets, or events; then, in the same spirit, it is inappropriate to insist on comparison between every pair of acts. All that has been, or is to be, formally deduced in this book concerning preferences among sets, could be modified, mutatis mutandis, so that the class of events would not be the class of all subsets of S, but rather a Borel field, that is, a σ-algebra, on S; the set of all consequences would be a measurable space, that is, a set with a particular σ-algebra singled out; and an act would be a measurable function from the measurable space of events to the measurable space of consequences. Indeed, the whole thing could be done for abstract σ-algebras without reference to sets at all, and this might have some actual advantage, since it would make possible the identification of events with propositions in almost any formal language, even one unable to formulate at all the complete descriptions I call states.
It may seem peculiar to insist on σ-algebras as opposed to finitely additive algebras even in a context where finitely additive measures are the central object, but countable unions do seem to be essential to some of the theorems of § 3, for example, the terminal conclusions of Theorem 3.2 and Part 5 of Theorem 3.3.

So much of the modern mathematical theory of probability depends on the assumption that the probability measures at hand are countably additive that one is strongly tempted to assume countable additivity, or its logical equivalent, as a postulate to be adjoined to P1-6.+ But I am inclined to agree with de Finetti [D2], [D4] and Koopman [K9], [K10], [K11] that, however convenient countable additivity may be, it, like any other assumption, ought not be listed among the postulates for a concept of personal probability unless we actually feel that its violation deserves to be called inconsistent or unreasonable. I know of no argument leading to the requirement of countable additivity, and many of us have a strong intuitive tendency to regard as natural probability problems about the necessarily only finitely additive uniform probability densities on the integers, on the line, and on the plane. It therefore seems better not to assume countable additivity outright as a postulate, but to recognize it as a special hypothesis yielding, where applicable, a large class of useful theorems.
5 Conditional probability, qualitative and quantitative

Conditional preferences among acts in the light of a given event were introduced in § 2.7. Since the relation $\le$ among events has been defined in terms of the corresponding relation among acts, we may well expect to attach meaning to statements of the form $B \le C$ given D, provided that D is not null. The natural way to do so is to take a pair of acts f and g that test whether $B \le C$ (as prescribed by the definition of $\le$ between acts in § 2) and say that $B \le C$ given D, if and only if $f \le g$ given D. Since there is more than one pair of acts f, g by which the proposition $B \le C$ can be tested, it is at first sight conceivable that not all such pairs would be in the same order given D, which would frustrate the proposed definition of $\le$ given D. However, it is easily seen that for any f, g testing $B \le C$, $f \le g$ given D (D not null) is equivalent to $B \cap D \le C \cap D$. Thus it is seen not only that the proposed definition is unambiguous, but also that it is expressible in terms of probability comparisons among sets, without direct reference to acts at all, and, still further, that the postulates P1-6 apply to the conditional preference relation $\le$ given D among acts. This preamble sufficiently motivates the following definition and easy theorem about qualitative probability relations generally.

+ Carried out by Villegas (1964).
If $\le\cdot$ is a qualitative probability, and $0 <\cdot D$; then $B \le\cdot C$ given D, if and only if $B \cap D \le\cdot C \cap D$.

THEOREM 1 If $\le\cdot$ is a qualitative probability, then so is $\le\cdot$ given D. If in addition $\le\cdot$ is fine or tight, then $\le\cdot$ given D is correspondingly fine or tight.

If $\le\cdot$ is fine, then, for any D that is not null, there exists, in view of Theorem 3.3, one and only one probability measure $P(B \mid D)$, the (conditional) probability of B given D, that almost agrees with $\le\cdot$ given D. But, just as one would expect from the traditional study of numerical probability, and as may be easily verified, $P(B \cap D)/P(D)$ considered as a function of B for fixed D is a probability measure that almost agrees with $\le\cdot$ given D. Therefore,
(1) $P(B \mid D) = P(B \cap D)/P(D)$.
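Formula (1) can be illustrated on a finite set of states. The sketch below assumes a hypothetical world of two tosses of a fair coin; the names `prob` and `conditional` and the weights are not from the text.

```python
from fractions import Fraction


def prob(event, weights):
    """P(event) as the sum of the weights of its states."""
    return sum((weights[s] for s in event), Fraction(0))


def conditional(B, D, weights):
    """P(B | D) = P(B ∩ D) / P(D), defined only when D is not null."""
    p_D = prob(D, weights)
    if p_D == 0:
        raise ValueError("D is null, so P(B | D) is undefined")
    return prob(B & D, weights) / p_D


# Hypothetical example: two tosses of a fair coin.
weights = {s: Fraction(1, 4) for s in ("HH", "HT", "TH", "TT")}
D = frozenset({"HH", "HT", "TH"})  # at least one head
B = frozenset({"HH"})              # two heads
print(conditional(B, D, weights))  # 1/3
```

For fixed D the function `conditional(·, D, weights)` is itself a probability measure on the subsets of S: it is non-negative, additive on disjoint events, and assigns 1 to S.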
As was explained in § 2.7, preference among acts given B can suggestively be expressed in temporal terms. Analogously, the comparison among events given B and, therefore, conditional probability given B can be expressed temporally. Thus $P(C \mid B)$ can be regarded as the probability the person would assign to C after he had observed that B obtains. It is conditional probability that gives expression in the theory of personal probability to the phenomenon of learning by experience.

In accordance with established usage, a pair of events B, C are called independent if $P(B \cap C) = P(B)P(C)$. More generally, a set of events are called independent, if for every finite set of them, say $B_1, \ldots, B_n$,
(2) $P(\bigcap_i B_i) = \prod_i P(B_i)$.
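Condition (2) must hold for every finite subfamily, not merely for pairs. A minimal sketch, again on the hypothetical two-toss world, shows three events that are independent in pairs but not as a set; the function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations


def prob(event, weights):
    """P(event) as the sum of the weights of its states."""
    return sum((weights[s] for s in event), Fraction(0))


def independent(events, weights):
    """Check equation (2) for every subfamily of two or more of the given events."""
    for r in range(2, len(events) + 1):
        for family in combinations(events, r):
            joint = frozenset.intersection(*family)
            product = Fraction(1)
            for B in family:
                product *= prob(B, weights)
            if prob(joint, weights) != product:
                return False
    return True


weights = {s: Fraction(1, 4) for s in ("HH", "HT", "TH", "TT")}
first_head = frozenset({"HH", "HT"})
second_head = frozenset({"HH", "TH"})
tosses_agree = frozenset({"HH", "TT"})
print(independent([first_head, second_head], weights))                # True
print(independent([first_head, second_head, tosses_agree], weights))  # False
```

Each pair of the three events satisfies the product rule, but their common intersection has probability 1/4 rather than 1/8.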
Obviously, if D is not null, B and D are independent; if and only if $P(B \mid D) = P(B)$, in which case D may fairly be called irrelevant to B. The notions of independence and irrelevance have, so far as I can see, no analogues in qualitative probability; this is surprising and unfortunate, for these notions seem to evoke a strong intuitive response. The absence of these analogues is traceable to the absence of a qualitative analogue for propositions of the form $P(B \mid C) \le P(G \mid H)$. Working under a rather different motivation from that which guides this book, B. O. Koopman [K9], [K10], and [K11] has developed a system of qualitative probability in which it is meaningful to compare B given C with G given H. It is true also that for qualitative probability, even as it is defined here, some interconditional comparisons might be naturally defined. If, for example, $B \le\cdot \sim B$ given C and $\sim G \le\cdot G$ given H, it would not be unreasonable to establish the convention that B given C $\le\cdot$ G given H. This sort of extension is not, however, highly pertinent to my purpose, for here I have little interest in qualitative probabilities, except as a foundation for quantitative probability.

The following partition formula is well known and easy to prove:
(3) $P(C) = \sum_j P(C \mid B_j) P(B_j)$,
where $B_i$ is a partition of S into non-null sets. If, further, C is not null, it is also trivial to derive the celebrated Bayes' rule (or theorem),
(4) $P(B_i \mid C) = \frac{P(C \mid B_i) P(B_i)}{P(C)} = \frac{P(C \mid B_i) P(B_i)}{\sum_j P(C \mid B_j) P(B_j)}$.
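A small numerical sketch of the partition formula (3) and Bayes' rule (4) follows. The three-element partition, its prior probabilities, and the conditional probabilities of C are invented for illustration, as are the function names.

```python
from fractions import Fraction


def total_probability(prior, likelihood):
    """Partition formula (3): P(C) = sum over j of P(C | B_j) P(B_j)."""
    return sum((likelihood[j] * prior[j] for j in prior), Fraction(0))


def bayes(prior, likelihood):
    """Bayes' rule (4): P(B_i | C) for each element B_i of the partition."""
    p_C = total_probability(prior, likelihood)
    if p_C == 0:
        raise ValueError("C is null, so the conditional probabilities are undefined")
    return {i: likelihood[i] * prior[i] / p_C for i in prior}


# Hypothetical values of P(B_i) and P(C | B_i).
prior = {"B1": Fraction(1, 2), "B2": Fraction(1, 3), "B3": Fraction(1, 6)}
likelihood = {"B1": Fraction(1, 10), "B2": Fraction(1, 2), "B3": Fraction(9, 10)}

print(total_probability(prior, likelihood))  # 11/30
posterior = bayes(prior, likelihood)
print(posterior["B1"], posterior["B2"], posterior["B3"])  # 3/22 5/11 9/22
```

The posterior probabilities sum to 1, as they must, and the element that was a priori most probable becomes the least probable once C is observed.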
Illustrations of these formulas are found in all elementary textbooks on probability, as well as in later sections of this book. Finally, if neither B nor C is null,
(5) $\frac{P(B \mid C)}{P(B)} = \frac{P(C \mid B)}{P(C)} = \frac{P(B \cap C)}{P(B)P(C)}$,
which may be given the suggestive reading: Knowledge of C modifies the probability of B by the same factor by which knowledge of B modifies the probability of C.

The concept of random variable enters into almost any discussion of probability. Experts are fairly well agreed on the following definition. A random variable is a function x attaching a value x(s) in some set X to every s in a set S on which a probability measure P is defined.† Such an S together with the measure P is called a probability space. Real-valued random variables are the most familiar, though in general the values X can be things of any sort. If, for example, x and y, with values in X and Y, respectively, are random variables on the same measure space, a new random variable $z = \{x, y\}$ is defined by setting $z(s) = \{x(s), y(s)\}$. The values of z are thus elements of what is called $X \times Y$ (read the cartesian product of X and Y), the set of ordered pairs with first element in X and second in Y. The same sort of thing can be done, of course, for ordered n-tuples and also for infinite sequences of random variables.

† In many applications of the theory of probability, not all subsets of S or of X are considered measurable. It is then required as part of the definition of random variable that x be measurable, i.e., that for every measurable $Y \subset X$, the set of s's such that $x(s) \in Y$ be measurable.
Two random variables x and y defined on the same measure space S are called (statistically) independent; if and only if, for every $X_0 \subset X$ and $Y_0 \subset Y$, the two events (i.e., subsets of S) defined by the conditions $x(s) \in X_0$ and $y(s) \in Y_0$, respectively, are independent.† The extension of this definition from pairs to any number of random variables is obvious.
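On a finite probability space the definition can be tested exhaustively. The sketch below treats a random variable as a dictionary from states to values and tries every pair of subsets $X_0$, $Y_0$ of the values actually taken; the two-toss world and all names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import chain, combinations


def subsets(values):
    """All subsets of a finite set of values."""
    items = list(values)
    return [set(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def prob(event, weights):
    """P(event) for an event given as a set of states."""
    return sum((weights[s] for s in event), Fraction(0))


def independent_variables(x, y, weights):
    """x and y map each state s to a value; test every pair X0, Y0."""
    for X0 in subsets(set(x.values())):
        for Y0 in subsets(set(y.values())):
            event_x = {s for s in weights if x[s] in X0}
            event_y = {s for s in weights if y[s] in Y0}
            joint = prob(event_x & event_y, weights)
            if joint != prob(event_x, weights) * prob(event_y, weights):
                return False
    return True


weights = {s: Fraction(1, 4) for s in ("HH", "HT", "TH", "TT")}
first = {s: s[0] for s in weights}            # outcome of the first toss
second = {s: s[1] for s in weights}           # outcome of the second toss
heads = {s: s.count("H") for s in weights}    # number of heads

print(independent_variables(first, second, weights))  # True
print(independent_variables(first, heads, weights))   # False
```

The two tosses are independent of each other, while the first toss and the number of heads are not, since learning the one changes the probabilities of the other.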
6 The approach to certainty through experience
In § 3, the theory of personal probability was, from the purely mathematical point of view, reduced to that of probability measures, a subject that has been elaborately studied, more or less explicitly, for centuries. Any mathematical problem concerning personal probability is necessarily a problem concerning probability measures, the study of which is currently called by mathematicians mathematical probability, and conversely. The particular outlook and interpretation implicit in a personalistic concept of probability leads, however, to problems that, though perfectly meaningful for mathematical probability, might not otherwise have been emphasized. This section and the succeeding one each briefly discuss one such problem. These two problems are selected from among many possibilities for the insight they provide into the concept of personal probability. Before studying these problems, it is necessary to be conversant with the material in Appendixes 1 and 2, which is used in the immediate sequel and often throughout the rest of this book.

As was brought out in § 5, the person learns by experience. The purpose of the present section is to explore with a moderate degree of generality how he typically becomes almost certain of the truth, when the amount of his experience increases indefinitely. To be specific, suppose that the person is about to observe a large number of random variables, all of which are independent given $B_i$ for each i, where the $B_i$ are a partition of S. It is to be expected intuitively, and will soon be shown, that under general conditions the person is very sure that after making the observation he will attach a probability of nearly 1 to whichever element of the partition actually obtains.

To describe the situation formally, let $B_i$ be a partition of S with $P(B_i) = \beta(i)$. Let $x_r$, $r = 1, 2, \ldots$, be a sequence of random variables, each taking on only a finite number of values (which can without loss of generality be thought of as integers). The restriction to a finite set of values could be removed, but to do so would raise problems of mathematical technique that, however interesting, are rather beside the point of this book. Let x denote the first n of the random variables $x_r$. It is to be borne in mind that x depends on n, so, strictly speaking, it should be written x(n). The assumption that, given $B_i$, the $x_r$'s all have the same distribution is expressed by
(1) $P(x_r(s) = x_r \mid B_i) = \xi(x_r \mid i)$,
where $\xi(x_r \mid i)$ is defined by the context.

† Where not all sets are measurable, $X_0$ and $Y_0$ must, of course, be required to be measurable.
Combining (1) with the aumptioD that the Z,'8 are independent given B t , (2)
I
P(z B i )
8&-
111
- DI P(z(s)
lXI, ' .• , %nll B i )
-
-
II ~(x..1 i),
where a conventional symbol has been used for equal by definition. These hypotheses having been laid down, it follows from Bayes' rule and the partition formula, (5.3) and (5.2), that

$$P(B_i \mid x) = \frac{P(x \mid B_i)\,P(B_i)}{P(x)} = \frac{\beta(i) \prod_r \xi(x_r \mid i)}{P(x)} \tag{3}$$

and

$$P(x) = \sum_i \beta(i) \prod_r \xi(x_r \mid i). \tag{4}$$
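Equations (3) and (4) can be tried out numerically. The following is only a minimal sketch under stated assumptions: the function name `posterior`, the three-element partition, and all the numbers are hypothetical illustrations, not taken from the text.

```python
from math import prod

def posterior(prior, likelihoods, observations):
    """Return P(B_i | x) for each i, following equations (3) and (4).

    prior[i]          -- beta(i) = P(B_i)
    likelihoods[i][v] -- xi(v | i), the probability of the value v given B_i
    observations      -- the observed values x_1, ..., x_n
    """
    # Numerator of (3): beta(i) times the product over r of xi(x_r | i).
    joint = [b * prod(lik[v] for v in observations)
             for b, lik in zip(prior, likelihoods)]
    total = sum(joint)  # equation (4): P(x)
    if total == 0:
        raise ValueError("the observation has probability zero")
    return [j / total for j in joint]

# Hypothetical example: three hypotheses, the third with prior probability 0.
prior = [0.5, 0.5, 0.0]
likelihoods = [{0: 0.5, 1: 0.5}, {0: 0.2, 1: 0.8}, {0: 0.9, 1: 0.1}]
post = posterior(prior, likelihoods, [1, 1, 0, 1])
print(post)
```

In this illustration the third hypothesis starts with probability 0 and necessarily ends with probability 0, whatever is observed.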
In connection with (3), it may be observed in passing that, if the a priori probability, $\beta(i)$, of $B_i$ is 0, then, no matter what value $x$ is observed, the a posteriori probability of $B_i$, $P(B_i \mid x)$, is also 0. This is an example of the general principle that, if some event is regarded as virtually impossible, then no evidence whatsoever can lend it credibility. Similarly, (3) implies the equally common-sense principle that, if an observation $x$ is virtually impossible on the hypothesis (i.e., given) $B_i$, and $x$ is observed, then $B_i$ becomes virtually impossible a posteriori.

It is particularly interesting to compare the probability of two elements of the partition, say $B_1$ and $B_2$ for definiteness, in the light of $x$.

$$\frac{P(B_1 \mid x)}{P(B_2 \mid x)} = \frac{\beta(1)}{\beta(2)} \prod_r \frac{\xi(x_r \mid 1)}{\xi(x_r \mid 2)} = \frac{\beta(1)}{\beta(2)} \prod_r R'(x_r) = \frac{\beta(1)}{\beta(2)} R(x), \tag{5}$$
where self-explanatory abbreviations have been introduced. Equation (5) is meaningless, if both the numerator and denominator of its left-hand side vanish. If the denominator alone vanishes, the fraction may properly be regarded as infinite. This will happen, if and only if $B_2$ is null, and $B_1$ is not null, given $x$. That is, it will happen if and only if $\beta(1) \neq 0$, $\beta(2) = 0$, or if $\beta(1) \neq 0$, and $R(x) = \infty$.

In modern statistical usage, $R'(x_r)$ and $R(x)$ are the likelihood ratios of $B_1$ to $B_2$ given $x_r$ and $x$, respectively, quantities of importance in many theoretical contexts. If a person contemplates making the observation $x$, that is, finding out the value of $x(s)$ for the $s$ that is the true state of the world, it may properly be asked how probable he considers it that $R$ will turn out to have a particular value. It will be shown, barring two banal exceptions, that, for $n$ sufficiently large, the probability, given $B_1$, that $R$ is greater than any preassigned number is almost 1. The possibility $P(B_1) = 0$ is to be excepted, for then the conditional probability in question is meaningless. The other exception occurs when $\xi(x_r \mid 1) = \xi(x_r \mid 2)$ for every $x_r$, that is, when the common distribution of $x_r$ given $B_1$ is the same as it is given $B_2$; for then observation of $x_r$ is simply irrelevant in distinguishing $B_1$ from $B_2$, or, a little more technically, $x_r$ is irrelevant to $B_1$ given $B_1 \cup B_2$, and

(6)

Formally, it is to be demonstrated that, unless $P(B_1) = 0$, or (6) holds,

$$\lim_{n \to \infty} P(R(x) > \rho \mid B_1) = 1 \quad \text{for } 0 \le \rho < \infty. \tag{7}$$
The problem is quite simple when account is taken of the fact that $R(x)$ is the product of $n$ random variables, $R'(x_r)$, that are independent given $B_1$. In attacking the problem, two cases are to be distinguished, according as there are or are not values of $x_r$ that have positive probability given $B_1$ but zero probability given $B_2$. It is in practice rather fortunate to find instances of the first case, for then (7) applies with a vengeance. Indeed, suppose that

$$P(R'(x_r) < \infty \mid B_1) = \phi. \tag{8}$$

Then, since $R$ is finite only if every one of its $n$ independent factors is finite,

$$P(R = \infty \mid B_1) = 1 - \phi^n, \tag{9}$$

which obviously approaches 1 with increasing $n$.
The second case, namely $\phi = 1$, is more interesting. Since much is known about sums of identically distributed independent random variables, it is natural to investigate

$$\log R(x) = \sum_r \log R'(x_r), \tag{10}$$

thereby replacing a product by a sum. It is easily seen from the definition of $R'(x_r)$ that $P(R'(x_r) > 0 \mid B_1) = 1$, so, in the case now at hand, the functions $\log R'(x_r)$ are independent real bounded random variables. Letting

$$I = E(\log R'(x_r) \mid B_1), \tag{11}$$

the weak law of large numbers† implies that, for any $\epsilon > 0$,

$$\lim_{n \to \infty} P(\log R(x) \ge n(I - \epsilon) \mid B_1) = 1, \tag{12}$$

equivalently,

$$\lim_{n \to \infty} P(R(x) \ge e^{n(I - \epsilon)} \mid B_1) = 1. \tag{13}$$

The objective will therefore be achieved, if it is demonstrated that
$I > 0$ unless (6) holds. But

$$I = E(\log R'(x_r) \mid B_1) \ge -\log E(R'^{-1}(x_r) \mid B_1) = -\log 1 = 0 \tag{14}$$

may be argued thus: The inequality in the above calculation is assigned as Exercise 8 in Appendix 2, together with the fact that equality can hold in (14) if and only if $R'^{-1}(x_r)$ is constant with probability one given $B_1$. But the expected value of $R'^{-1}(x_r)$ given $B_1$ is equal to 1, as (14) asserts and as may be easily verified from the definition of $R'^{-1}(x_r)$. So, barring the exceptions provided for, $I > 0$, and the demonstration of (7) is complete.

Before the observation, the probability that the probability given $x$ of whichever element of the partition actually obtains will be greater than $\alpha$ is

$$\sum_i \beta(i)\, P(P(B_i \mid x) > \alpha \mid B_i), \tag{15}$$

where summation is confined to those $i$'s for which $\beta(i) \neq 0$. Application of (14) (extended to arbitrary pairs of $i$'s) shows that the coefficients of each $\beta(i)$ in the quantity (15), and therefore the quantity itself, approaches 1 as $n$ increases; provided only that no two functions $\xi(x_r \mid i)$ and $\xi(x_r \mid i')$ are the same, if $\beta(i)$ and $\beta(i')$ are both different from zero.

† For the definition of this law, see, if necessary, p. … of Feller's book [F1].

To summarize informally, it has now been shown that, with the observation of an abundance of relevant data, the person is almost certain to become highly convinced of the truth, and it has also been shown that he himself knows this to be the case. It may be remarked, for those familiar with certain theorems, that many refinements of (7) and its consequences could be worked out by application of the strong law of large numbers, the central limit theorem, and the law of the iterated logarithm to $R'(x_r)$.

The quantity $I$ is coming to be called the information of the distribution of $x_r$ given $B_1$ with respect to the distribution of $x_r$ given $B_2$. More generally, if $P$ and $Q$ are probability measures, confined (for simplicity) to a finite set $X$ with elements $x$, the information of $P$ with respect to $Q$ is defined by

$$\sum_x P(x) \log \frac{P(x)}{Q(x)}. \tag{16}$$
This usage stems from work of Claude Shannon in communication engineering, a good account of which is given in [S11]; and also from independent work of Norbert Wiener in a related context [W10]. The ideas of Shannon and of Wiener, though concerned with probability, seem rather far from statistics. It is, therefore, something of an accident that the term "information" coined by them should be not altogether inappropriate in statistics. The situation is still further confused, because, as long ago as 1925, R. A. Fisher emphasized an important notion, which he called "information," in connection with the theory of estimation (Paper 11, Theory of statistical estimation, in [F6]). At first glance, Fisher's notion seems quite different from that of Shannon and Wiener, but, as a matter of fact, his is a limiting form of theirs. A useful but rather technical exposition relating the several senses of "information" is given by Kullback and Leibler [K15], and I return to the topic in § 15.6.+
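The connection between the information (16) and the growth of the likelihood ratio in (12) can be illustrated numerically. The sketch below is an assumption-laden illustration, not part of the original argument: the two distributions `xi1` and `xi2`, the sample size, and the helper names are all hypothetical. It computes the information of `xi1` with respect to `xi2` and compares it with the observed value of $\log R(x)/n$ for data drawn from `xi1`.

```python
import math
import random

def information(p, q):
    """Equation (16): sum over x of P(x) * log(P(x) / Q(x)) on a finite set.

    Terms with P(x) = 0 contribute nothing. If Q(x) = 0 where P(x) > 0,
    the result is infinite (the 'first case' of the text).
    """
    total = 0.0
    for x, px in p.items():
        if px == 0:
            continue
        if q.get(x, 0) == 0:
            return math.inf
        total += px * math.log(px / q[x])
    return total

def mean_log_ratio(p, q, n, rng):
    """Draw n independent observations from p and return log R(x) / n."""
    values = list(p)
    weights = [p[v] for v in values]
    draws = rng.choices(values, weights=weights, k=n)
    return sum(math.log(p[v] / q[v]) for v in draws) / n

# Hypothetical distributions of x_r given B_1 and given B_2.
xi1 = {0: 0.5, 1: 0.5}
xi2 = {0: 0.2, 1: 0.8}

info = information(xi1, xi2)
estimate = mean_log_ratio(xi1, xi2, 20000, random.Random(0))
print(info, estimate)
```

Under these assumptions the two printed numbers should be close, in keeping with the weak law of large numbers, and `information(xi1, xi1)` is exactly zero, the excepted case in which the two distributions coincide.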
+ See also (Kullback 1959).

7 Symmetric sequences of events

A problem often posed by statisticians is to estimate from a sequence of observations the unknown probability $p$ that repeated trials of some sort are successful. On an objectivistic view, this problem is natural and important, for on such a view the probability that a coin falls heads, for example, is a property of the coin that can be determined by experimentation with the coin and in no other way. But on a personalistic view of probability, strictly interpreted, no probability is unknown to the person concerned, or, at any rate, he can determine a probability only by interrogating himself, not by reference to the external world. This situation has been interpreted to imply that the personalistic view is wrong, or at any rate inadequate, because it apparently cannot even express one of the most natural and typical problems of statistics.

Thus far in this book, I have not argued against the possibility of defining some useful notion of objective probability, but have contented myself with presenting a particular notion of personal probability. Therefore, at this point it might be tempting to seek a dualistic theory admitting both objective and personal probabilities in some kind of articulation with one another. De Finetti [D3] has shown, however, that it is not necessary to do so, that the notion of a coin with unknown probability $p$ can be reinterpreted in terms of personal probability alone.

The present section is devoted to outlining this development due to de Finetti. In the organization of the book as a whole, it plays no logically essential part; it is, rather, a digression intended to give a clearer understanding of the notion of personal probability, especially in relation to objectivistic views. The ideas presented here are but a fragment of those on the same subject in [D2].

Let $x_r$ be a sequence of random variables taking only the values 0 and 1. The $x_r$'s are, to all intents and purposes, a sequence of events, the $r$th of which is the event that $x_r(s) = 1$. To say that these events are independent, each occurring with probability $p$, is to say that the probability of any finite pattern, $x_1, \ldots, x_n$, initiating the sequence $x_r(s)$ is given by the formula

$$P(x_r(s) = x_r;\ r = 1, \ldots, n \mid p) = p^y (1 - p)^{n-y}, \tag{1}$$
where $y$ is the number of 1's among the $x_r$'s for $r = 1, \ldots, n$.

Mixtures, in a certain sense, of sequences of random variables are often of interest, as they already have been in the preceding section. Suppose, to be explicit, that the world is partitioned by $B_i$ and that, given $B_i$, the $x_r$'s are independent with $P(x_r(s) = 1 \mid B_i)$ having some fixed value $p(i)$. Then the unconditional probability of a particular initial sequence is a mixture of the probabilities given by (1) thus:

$$P(x_r(s) = x_r;\ r = 1, \ldots, n) = \sum_i p(i)^y (1 - p(i))^{n-y} P(B_i). \tag{2}$$

It is natural to generalize (2) formally thus:

$$P(x_r(s) = x_r;\ r = 1, \ldots, n) = \int p^y (1 - p)^{n-y}\, dM(p), \tag{3}$$

where $M$ is a probability measure on the real numbers in the interval [0, 1]. It is noteworthy that equation (3), understood to apply for every $n$, is equivalent to the condition that the probability that every one of each prescribed set of $n$ of the $x_r$'s takes the value 1 is

$$\int p^n\, dM(p). \tag{4}$$

This follows by arithmetic induction from the obvious formula

$$P(x_r(s) = x_r;\ r = 1, \ldots, n) = P(x_r(s) = x_r;\ r = 1, \ldots, n;\ x_{n+1}(s) = 0) + P(x_r(s) = x_r;\ r = 1, \ldots, n;\ x_{n+1}(s) = 1), \tag{5}$$
which applies to any sequence of random variables taking on only the values 0 and 1.

Equation (3) can very well have an interpretation in such terms that the measure $M$ is not merely an abstract probability measure, but is actually a personal probability. Thus, if $p$ is a random variable that is (for a given person) distributed according to $M$, and, if for each $p$ the conditional distribution of the $x_r$'s given $p$ is independent, with $P(x_r(s) = 1) = p$; then (3) obtains. Strictly speaking, the notion of conditional probability as it occurs in the preceding sentence is used in a somewhat wider sense than has been defined in this book, for the probability of any particular $p$ will typically be zero. At least for countably additive measures, the necessary extension of conditional probability and conditional expectation is presented by Kolmogoroff in [K7]; it is a concept of the greatest value in advanced mathematical statistics and in probability generally.

However, in most contexts where objectivists speak of an unknown probability $p$, there is, so far as an exclusively personalistic view of probability is concerned, no unknown parameter that can play the role of $p$ in (3). Examination of situations in which "unknown" probability is appealed to, whether justifiably or not, shows that, from the personalistic standpoint, they always refer to symmetric sequences of events in the sense of the following definition.

The sequence of random variables $x_r$, taking only the values 0 and 1, is a symmetric† sequence, if and only if the probability that any $b$ of the $x_r(s)$'s equal 1 and any $c$ other $x_r(s)$'s equal 0 depends only on the integers $b$ and $c$.
† De Finetti uses the French word for "equivalent."+

+ He and others now prefer "exchangeable." The concept seems to have been first introduced by Jules Haag (1928).
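The definition can be made concrete with a discrete stand-in for the measure $M$. The following sketch rests on assumptions that are not in the text: `mixing` is a hypothetical two-point measure, and only patterns that initiate the sequence are examined. It checks that every rearrangement of a pattern has the same probability under the mixture (2), while the events are nonetheless not independent.

```python
from itertools import permutations

def mixture_prob(pattern, mixing):
    """Equation (2): probability of an initial 0/1 pattern under a mixture.

    mixing -- list of (p, weight) pairs, a discrete stand-in for M.
    """
    n, y = len(pattern), sum(pattern)
    return sum(w * p**y * (1 - p)**(n - y) for p, w in mixing)

# Hypothetical measure M: weight 1/2 on p = 0.2 and weight 1/2 on p = 0.8.
mixing = [(0.2, 0.5), (0.8, 0.5)]

# Symmetry: every rearrangement of a pattern has the same probability.
probs = {mixture_prob(perm, mixing) for perm in permutations((1, 1, 0, 0))}

# Dependence: P(x_1 = 1 and x_2 = 1) differs from P(x_1 = 1) * P(x_2 = 1).
p_one = mixture_prob((1,), mixing)
p_both = mixture_prob((1, 1), mixing)
print(len(probs), p_one, p_both, p_one**2)
```

With these numbers a first success makes a second more probable, since it favors the larger value of $p$; the sequence is symmetric without being independent.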
It is easy to verify that any mixture of independent sequences in the sense of (3) is a symmetric sequence. De Finetti has discovered that the converse is also true. These conclusions can be formally summarized thus:

THEOREM 1 A sequence of random variables $x_r$, taking only the values 0 and 1, is symmetric, if and only if there exists a probability measure $M$ on the interval [0, 1] such that the probability that any prescribed $n$ of the $x_r(s)$'s equal 1 is given by (4). Two such measures, $M$ and $M'$, must be essentially the same,† in the sense that, if $B$ is a subinterval of [0, 1], then $M(B) = M'(B)$.

Considering that de Finetti has published a proof of Theorem 1 in [D2] based on the Fourier integral, that any proof of it must be rather technical, and that the theorem is not the basis of any formal inference later in this book, it seems best not to prove it here.‡

† Technical note: If "probability measure" were here understood to mean a countably additive probability measure on the Borel sets of [0, 1], the theorem would remain true, and the essential uniqueness of $M$ would become true uniqueness.

‡ Technical note: Theorem 1 can be proved very quickly and naturally by applying the theory of the Hausdorff moment problem (pp. 8-9 of [S13]), but that method does not seem to generalize readily.+

+ New and general methods are in Hewitt and Savage (1955) and Ryll-Nardzewski (1961). For related work see Bühlmann (1960), Freedman (1962, 1963), Milicer-Gruzewska (1949, 1950), and Rényi and Révész (1963).

It is Theorem 1 that makes it possible to express propositions referring to unknown probabilities in purely personalistic terms. If, for example, a statistician were to say, "I do not know the $p$ of this coin, but I am sure it is at most one half," that would mean in personalistic terms, "I regard the sequence of tosses of this coin as a symmetric sequence, the measure $M$ of which assigns unit measure to the interval [0, 1/2]." This condition on $M$ means in turn that for every $n$ the (personal) probability of $n$ consecutive heads is at most $2^{-n}$, as is easily verified. I do not insist that propositions couched in terms of a fictitious unknown probability are bad, if understood as suggestive abbreviations, but only that the meaningfulness of such propositions does not constitute an inadequacy of the personalistic view of probability.

The mathematical concept of probability measure or, a trifle more generally, bounded measure is fundamental to mathematics generally. Probability measures, often under other names, are, therefore, employed in many parts of pure and applied mathematics completely unrelated to probability proper. For example, the distribution of mass in a not necessarily rigid body is expressed by a bounded measure that tells how much of the body is in each region of space. We must, therefore, not be surprised if, even in studying probability itself, we come across some probability measures used not to measure probability proper but only for auxiliary purposes. In the event that $p$ is not actually an unknown parameter, the measure $M$ presented by Theorem 1 seems at first sight to be such a purely auxiliary measure, but, as a matter of fact, $M$ does measure certain interesting probabilities, at least approximately. For example, letting

$$\bar{x}_n = \frac{1}{n} \sum_{r=1}^{n} x_r, \tag{6}$$

it can be shown that

$$\lim_{n \to \infty} P(\bar{x}_n(s) \le \rho) = M(p \le \rho). \tag{7}$$

In words, the person considers the average of any large number of future observations to be distributed approximately the way $p$ is distributed by $M$. This is an extension of the ordinary weak law of large numbers, proved in [D2] along with a corresponding extension of the strong law.
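A small simulation may make (7) more tangible. It is only a sketch under assumptions introduced here: the two-point measure `mixing`, the sample sizes, and the seed are hypothetical, and a simulation illustrates rather than proves the limit. Each trial draws a value of $p$ according to $M$, tosses a coin with that $p$ many times, and records the average.

```python
import random

def average_of_tosses(p, n, rng):
    """Average of n independent 0/1 trials, each equal to 1 with probability p."""
    return sum(rng.random() < p for _ in range(n)) / n

def simulate(mixing, n, trials, seed=0):
    """Averages of n terms of a symmetric sequence built as a mixture.

    mixing -- list of (p, weight) pairs, a discrete stand-in for M.
    """
    rng = random.Random(seed)
    ps = [p for p, _ in mixing]
    ws = [w for _, w in mixing]
    averages = []
    for _ in range(trials):
        p = rng.choices(ps, weights=ws, k=1)[0]  # draw p according to M
        averages.append(average_of_tosses(p, n, rng))
    return averages

# Hypothetical measure M: weight 0.3 on p = 0.2 and weight 0.7 on p = 0.8.
mixing = [(0.2, 0.3), (0.8, 0.7)]
averages = simulate(mixing, n=400, trials=2000)

# Fraction of averages at most 0.5, to be compared with M(p <= 0.5) = 0.3.
frac_low = sum(a <= 0.5 for a in averages) / len(averages)
print(frac_low)
```

The threshold 0.5 is chosen away from the two support points of the hypothetical $M$, where the approximation in (7) is expected to be good.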
If the first $n$ terms of a symmetric sequence are observed, how does the rest of the sequence appear to the person in the light of this observation? In the first place, it also is a symmetric sequence but generally of a structure different from that of the original sequence, as may be shown thus: Let

$$\pi(y, n - y) =_{\mathrm{Df}} P(x_r(s) = x_r;\ r = 1, \ldots, n), \tag{8}$$

as one may for a symmetric sequence. Then

$$P(x_q(s) = x_q;\ q = n + 1, \ldots, n + m \mid x_r(s) = x_r;\ r = 1, \ldots, n) = \frac{P(x_p(s) = x_p;\ p = 1, \ldots, n + m)}{P(x_r(s) = x_r;\ r = 1, \ldots, n)} = \frac{\pi(y + z, (n - y) + (m - z))}{\pi(y, n - y)}, \tag{9}$$

where $z$ is the number of 1's among the $x_q$'s, $q = n + 1, \ldots, n + m$. Equation (9) shows that the sequence $x_q$, $q > n$, given that $x_r(s) = x_r$, $r = 1, \ldots, n$, is a new symmetric sequence characterized by

$$\pi'(z, m - z) = \frac{\pi(y + z, (n - y) + (m - z))}{\pi(y, n - y)}. \tag{10}$$
The measure $M'$ associated with the new sequence is, according to Theorem 1, essentially determined by the condition that

$$\int p^m\, dM'(p) = \pi'(m, 0) = \frac{\pi(m + y, n - y)}{\pi(y, n - y)} = \frac{\int p^{m+y} (1 - p)^{n-y}\, dM(p)}{\pi(y, n - y)} = \int p^m\, \frac{p^y (1 - p)^{n-y}}{\pi(y, n - y)}\, dM(p). \tag{11}$$

Equation (11) makes it plausible that, except for the slight ambiguity permitted by Theorem 1, $M'$ is defined (for Borel sets $B$) by

$$M'(B) = \pi^{-1}(y, n - y) \int_B p^y (1 - p)^{n-y}\, dM(p), \tag{12}$$

and this can in fact be demonstrated with some appeal to slightly advanced methods pertaining to the Hausdorff moment problem (pp. 8-9 of [S13]). It is noteworthy that, if $M(B) = 0$, then $M'(B) = 0$ also. In the event that $p$ really is an unknown parameter, this means that, if the person is virtually certain that the true $p$ is not in $B$, no amount of evidence can alter that opinion.

Equation (12) shows that $M'$ is generally different from $M$. Indeed, for fixed $n \ge 1$, $M'$ is clearly the same as $M$ for every $y$ for which $\pi(y, n - y) > 0$, if and only if $M$ assigns the measure 1 to some one value of $p$. That is, the person regards evidence drawn from a symmetric sequence as irrelevant to the future behavior of the sequence, if and only if at the outset he regards the sequence not merely as symmetric but also as independent.

It can be shown that the person regards it as highly probable that, if he observes a sufficiently long segment of a symmetric sequence, the continuation of the sequence will then be one for which the conditional variance of $p$,

$$\int p^2\, dM'(p) - \left( \int p\, dM'(p) \right)^2, \tag{13}$$

will be small. In the event that $p$ is really an unknown parameter, this implies that the person is very sure that after a long sequence of observations he will assign nearly unit probability to the immediate neighborhood of the value of $p$ that actually obtains, a parallel to the approach to certainty discussed in § 6.
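For a measure $M$ concentrated on finitely many values of $p$, equation (12) reduces to a simple reweighting, which can be sketched as follows. The function name `update`, the three-point prior, and the observed counts are hypothetical choices made for illustration only.

```python
def update(mixing, y, n):
    """Equation (12) for a discrete M: reweight by p**y * (1 - p)**(n - y).

    mixing -- list of (p, weight) pairs standing in for M
    y      -- number of 1's among the n observed terms
    """
    weighted = [(p, w * p**y * (1 - p)**(n - y)) for p, w in mixing]
    norm = sum(w for _, w in weighted)  # this is pi(y, n - y)
    if norm == 0:
        raise ValueError("the observed pattern has probability zero")
    return [(p, w / norm) for p, w in weighted]

# Hypothetical M: weight 0 on p = 0.5, the rest split between 0.2 and 0.8.
prior = [(0.2, 0.5), (0.5, 0.0), (0.8, 0.5)]
post = update(prior, y=8, n=10)

# If M gives measure 1 to a single value of p, observation changes nothing.
point = update([(0.3, 1.0)], y=8, n=10)
print(post, point)
```

The value $p = 0.5$, given measure zero by the hypothetical $M$, keeps measure zero under $M'$, and the one-point measure is left unchanged, matching the two remarks made in the text about (12).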
CHAPTER 4

Critical Comments on Personal Probability

1 Introduction

It is my tentative view that the concept of personal probability introduced and illustrated in the preceding chapter is, except possibly for slight modifications, the only probability concept essential to science and other activities that call upon probability. I propose in this chapter to discuss the shortcomings I see in that particular personalistic view of probability, which, for brevity, shall here be called simply "the personalistic view"; to point out briefly the relationships between it and other views; to criticize other views in the light of it; and to discuss the criticisms holders of other views have raised, or may be expected to raise, against it.

From the standpoint of strict logical organization such critical remarks are somewhat premature, because the personalistic view itself insists that probability is concerned with consistent action in the face of uncertainty. Consequently, until the theory of such action has been completely outlined in later chapters, the view to be criticized cannot even be considered to have been wholly presented. Practically, however, it seems wise not to confine critical comment to the one part of the text that logic may suggest as appropriate, but rather to touch on criticism from time to time, even at the cost of some repetition. Thus, some of what is to be said here has already been said in the introductory chapter and elsewhere, and some of it will be said again.

Views other than the personalistic view are to be discussed here, but it cannot be too distinctly emphasized that the account given of them will be very superficial.† One function of discussing other views is to provide the reader with at least some orientation in the large and diversified body of ideas pertaining to the foundation of statistics that have been accumulated. A less obvious, but I think no less important and legitimate, function is to cast new light on the personalistic view, especially for those who already hold, or tend to hold, other views.

† Much more extensive comparative material is given by Keynes [K4], by Nagel [N1], and by Carnap [C1]. Koopman [K12] should also be mentioned in this connection.
2 Shortcomings of the personalistic view

I can answer, to my own satisfaction, some criticisms of the personalistic view that have been brought to my attention. These points are discussed later in the chapter, but in this section I state and discuss as clearly as I can those that I find more difficult and confusing to answer.

According to the personalistic view, the role of the mathematical theory of probability is to enable the person using it to detect inconsistencies in his own real or envisaged behavior. It is also understood that, having detected an inconsistency, he will remove it. An inconsistency is typically removable in many different ways, among which the theory gives no guidance for choosing. Silence on this point does not seem altogether appropriate, so there may be room to improve the theory here.

Consider an example: The person finds on interrogating himself about the possible outcome of tossing a particular coin five times that he considers each of the thirty-two possibilities equally probable, so each has for him the numerical probability 1/32. He also finds that he considers it more probable that there will be four or five heads in the five tosses than that the first two tosses will both be heads. Now, reference to the mathematical theory of probability soon shows the person that, if the probability of each of the thirty-two possibilities is 1/32, then the probability of four or five heads out of five is 6/32, and the probability that the first two tosses will be heads is 8/32, so the person has caught himself in an inconsistency. The theory does not tell him how to resolve the inconsistency; there are literally an infinite number of possibilities among which he must choose.
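The arithmetic of the example can be checked by direct enumeration. The sketch below is an illustration only; the names `outcomes`, `weight`, and `probability` are hypothetical, and exact fractions are used so that 6/32 and 8/32 are compared without rounding.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=5))  # the thirty-two possibilities
weight = Fraction(1, len(outcomes))       # each judged equally probable

def probability(event):
    """Total weight of the outcomes for which event(outcome) is true."""
    return sum(weight for o in outcomes if event(o))

four_or_five_heads = probability(lambda o: o.count("H") >= 4)
first_two_heads = probability(lambda o: o[0] == "H" and o[1] == "H")
print(four_or_five_heads, first_two_heads)
```

The comparison `four_or_five_heads < first_two_heads` holds, which is the reverse of the person's direct judgment and so exhibits the inconsistency described.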
In this particular example, the choice that first comes to my mind, and I imagine to yours, is to hold fast to the position that all thirty-two possibilities are equally likely and to accept the implications of that position, including the implication that four or five heads out of five is less probable than two heads out of two. I do not think that there is any justification for that choice implicit in the example as formally stated, but rather that in the sort of actual situation of which the example is a crude schematization there generally are considerations not incorporated in the example that do justify, or at any rate elicit, the choice.

To approach the matter in a somewhat different way, there seem to be some probability relations about which we feel relatively "sure" as compared with others. When our opinions, as reflected in real or envisaged action, are inconsistent, we sacrifice the unsure opinions to the sure ones. The notion of "sure" and "unsure" introduced here is vague, and my complaint is precisely that neither the theory of personal probability, as it is developed in this book, nor any other device known to me renders the notion less vague.+

There is some temptation to introduce probabilities of a second order so that the person would find himself saying such things as "the probability that B is more probable than C is greater than the probability that F is more probable than G." But such a program seems to meet insurmountable difficulties. The first of these, pointed out to me by Max Woodbury, is this. If the primary probability of an event $B$ were a random variable $b$ with respect to secondary probability, then $B$ would have a "composite" probability, by which I mean the (secondary) expectation of $b$. Composite probability would then play the allegedly villainous role that secondary probability was intended to obviate, and nothing would have
been accomplished.

Again, once second order probabilities are introduced, the introduction of an endless hierarchy seems inescapable. Such a hierarchy seems very difficult to interpret, and it seems at best to make the theory less realistic, not more.

Finally, the objection concerning composite probability would seem to apply, even if an endless hierarchy of higher order probabilities were introduced. The composite probability of $B$ would here be the limit of a sequence of numbers, $E_n(E_{n-1}(\cdots E_2(P_1(B)) \cdots))$, a limit that could scarcely be postulated not to exist in any interpretable theory of this sort.

The reader may wish to evaluate for himself the arguments in favor of such a hierarchy put forward by Reichenbach (Chapter 8, [R2]), taking proper account of the differences between Reichenbach's overall view, and his mathematical theory, of probability on one hand and, on the other, the personalistic view and measure-theoretic mathematical theory that are the basis of my critique of higher order probabilities.

The interplay between the "sure" and "unsure" is interestingly expressed by de Finetti (p. 60, [D2]) thus: "The fact that a direct estimate of a probability is not always possible is just the reason that the logical rules of probability are useful. The practical object of these rules is simply to reduce an evaluation, scarcely accessible directly, to others by means of which the determination is rendered easier and more precise."

It may be clarifying, especially for some readers under the sway of the objectivistic tradition, to mention that, if a person is "sure" that the probability of heads on the first toss of a certain penny is 1/2, it does not at all follow that he considers the coin fair. He might, to take an extreme example, be convinced that the penny is a trick one that always falls heads or always falls tails.

+ One tempting representation of the unsure is to replace the person's single probability measure $P$ by a set of such measures, especially a convex set. Some explorations of this are Dempster (1968), Good (1962), and Smith (1961).

Logic, to which the theory of personal probability can be closely paralleled, is similarly incomplete. Thus, if my beliefs are inconsistent with each other, logic insists that I amend them, without telling me how to do so. This is not a derogatory criticism of logic but simply a part of the truism that logic alone is not a complete guide to life. Since the theory of personal probability is more complete than logic in some respects, it may be somewhat disappointing to find that it represents no improvement in the particular direction now in question.

A second difficulty, perhaps closely associated with the first one, stems from the vagueness associated with judgments of the magnitude of personal probability. The postulates of personal probability imply that I can determine, to any degree of accuracy whatsoever, the probability (for me) that the next president will be a Democrat. Now, it is manifest that I cannot really determine that number with great accuracy, but only roughly. Since, as is widely recognized, all the interesting and useful theories of modern science, for example, geometry, relativity, quantum mechanics, Mendelism, and the theory of perfect competition, are inexact; it may not at first sight seem disquieting that the theory of personal probability should also be somewhat inexact. As will immediately be explained, however, the theory of personal probability cannot safely be compared with ordinary scientific theories in this respect. I am not familiar with any serious analysis of the notion that a theory is only slightly inexact or is almost true, though philosophers of science have perhaps presented some.
Even if valid analyses of the notion have been made, or are made in the future, for the ordinary theories of science, it is not to be expected that those analyses will be immediately applicable to the theory of personal probability, normatively interpreted; because that theory is a code of consistency for the person applying it, not a system of predictions about the world around him.

The difficulty experienced in § 2.6 with defining indifference seems closely associated with the difficulty about vagueness raised here.

Another difficulty with the theory of personal probability (or, more properly, with that larger theory of the behavior of a person in the face of uncertainty, of which the theory of personal probability is a part) is that the statement of the theory is not yet necessarily complete. Thus we shall in the next chapter come upon another proposition that demands acceptance as a postulate, and, since even this leaves the person a great deal of freedom, there is no telling when someone will come upon still another postulate that clamors to be adjoined to the others. Strictly speaking, this is not so much an objection to the theory as a warning about what to expect of its future development.
3 Connection with other ft8WI All view. of probability are rather intimately connected with one another. For example, any necessary view can be reprded 88 an extreme persooaliatic view in which 10 many criteria of consistency have been invoked that there is no role lelt for the peJ'8OD'S individual judptent. Again, objectivistic views can be regarded as penonalistic views according to which comparisou of probability can be made only for very special pairs of events, and then only a.coording to such eriteria that all (right-minded) people agree in their comparisons. From a different ~dpointt personalistic views lie not between. but beside, necetRry and objectivistic views; for bot.h Decessary aDd objectivistic view8 may, in contrast to personalistic views, be called objective in that they do not concem individual judgment. ,
Criticism of other vi....
It will throw some light on the personalistic vie\v to say briefly bow some other views seem to compare unfavorably \\~itb it. It is one of my fundamental tenets that any satisfactory account of probability must deal ,lith the problem of action in the face of uncertainty. Indeed, almost everyone who seriously considers probability, especially if be has practical experience with statistica, does sooner or later deal with that problem, though often only tacitly. Even BOme pel"8OllaJistie view8 seem to me too remote from the problem of action, or decision. For example, de FiDetti in (021 gives two approaches to personal probability. Of these, one is almost exactly like the view 8pOD8Ored here, except onJy that the notion "more probable than" is supposed to be intuitively evident to the person, without reference to any problem of decision. The other is more satisfactory in this . . apect, being couched in terms of betting beha\ior, but it seems to me a BOmewhat Ie. satisfactory approach than the one apoD8Ol'ed here, because it must assume either that the beta are for infiniteeimalaumB oranticipating the language of the next chapter-that the utility of money is linear. The theory expressed by Koopman in [K9), [KIO], and [KIll and that expressed by Good in [02] are both peraoDaiiatic views that tend to ignore decision, or at any rate keep it out of the foreground; but the personalistic view expressed by Ramsey in [Rl1 like the one 8pOIl8OJ'ed here, takes decision as fundamental. If any neceasary view
can be formulated at all, it might well be possible to formulate it in terms of decision, but, so far as I know, the notion of decision has not appeared fundamental to the holders of any necessary view. It seems fair to say that objectivistic views, by their very nature, must in principle regard decision as secondary to probability, if relevant at all. Yet, the objectivist A. Wald has done more than anyone else to popularize the notion of decision.
As has already been indicated, from the position of the personalistic view, there is no fundamental objection to the possibility of constructing a necessary view, but it is my impression that that possibility has not yet been realized, and, though unable to verbalize reasons, I conjecture that the possibility is not real. Two of the most prominent enthusiasts of necessary views are Keynes, represented by [K4], and Carnap, who has begun in [C1] to state what he hopes will prove a satisfactory necessary (or nearly necessary) view of probability. Keynes indicated in the closing pages of [K4] that he was not fully satisfied that he had solved his problem and even suggested that some element of objectivistic views might have to be accepted to achieve a satisfactory theory, and Carnap regards [C1] as only a step toward the establishment of a satisfactory necessary view, in the existence of which he declares confidence. That these men express any doubt at all about the possibility of narrowing a personalistic view to the point where it becomes a necessary one, after such extensive and careful labor directed toward proving this possibility, speaks loudly for their integrity; at the same time it indicates that the task they have set themselves, if possible at all, is not a light one.
Keynes, writing in 1921 of what are here called objectivistic views, complained, "The absence of a recent exposition of the logical basis of the frequency theory by any of its adherents has been a great disadvantage to me in criticizing it." (Chap. VIII, Sec. 17, of [K4]). I believe that his complaint applies as aptly to my position today as to his then, though I cannot pretend to have combed the intervening literature with anything like the thoroughness Keynes himself would have employed. Reichenbach, to be sure, presents in great detail an interesting view that must be classified as objectivistic [R2], but it seems far removed from those that dominate modern statistical theory and form the main subject of the following discussion.
Whatever objectivistic views may be, they seem, to holders of necessary and personalistic views alike, subject to two major lines of criticism. In the first place, objectivistic views typically attach probability only to very special events. Thus, on no ordinary objectivistic view would it be meaningful, let alone true, to say that on the basis of the available evidence it
is very improbable, though not impossible, that France will become a monarchy within the next decade. Many who hold objectivistic views admit that such everyday statements may have a meaning, but they insist, depending on the extremity of their positions, that that meaning is not relevant to mathematical concepts of probability or even to science generally. The personalistic view claims, however, to analyze such statements in terms of mathematical probability, and it considers them important in science and other human activities.
Secondly, objectivistic views are, and I think fairly, charged with circularity. They are generally predicated on the existence in nature of processes that may, to a sufficient degree of approximation, be represented by a purely mathematical object, namely an infinite sequence of independent events. This idealization is said, by the objectivists who rely on it, to be analogous to the treatment of the vague and extended mark of a carpenter's pencil as a geometrical point, which is so fruitful in certain contexts. When it is pointed out to the objectivist that he uses the very theory of probability in determining the quality of the approximation to which he refers, he retorts that the applied geometer (a fictitious character whose reputation for solidity in science is unquestioned) likewise uses geometry in determining the quality of his approximations. Let the geometer then be challenged, and he replies with a threefold reference to experience, saying, "It is a common experience that with sufficient experience one develops good judgment in the use of geometry and thenceforth generally experiences success in the predictions he bases on it." "Now," says the objectivist, "the geometer's answer is my answer."
But it seems to critics of objectivistic views that, though the geometer may be entitled to make as many allusions to experience as he pleases, the probabilist is not free to do so, precisely because it is the business of the probabilist to analyze the concept of experience. He, therefore, cannot properly support his position by alluding to experience until he has analyzed that concept, though he can, of course, allude to as many experiences as he wishes.
Two sorts of mixed views call for special comment here. First, some (among them Carnap [C1]; Koopman [K9], [K10], and [K11]; and Nagel [N1]) hold that two probability concepts play a role in inference, an objectivistic one and a personalistic or a necessary one. This dualism is typically justified as necessary to the analysis of such a concept as that of a coin with unknown probability of falling heads. But, as § 3.7 explains, de Finetti has provided a satisfactory analysis on the basis of personal probability alone. Second, others (for example, van Dantzig [V1] and Feraud [F2]), finding the conventional objectivistic views circular for the reasons I
have cited, try to break the circle by relatively isolated use of subjective ideas. Very crudely, it seems to be their position that in any one context it is allowable for a person to act as though some one event of sufficiently small (objective) probability, chosen at his discretion, were impossible. Quite apart from the relatively technical question of whether any consistent mixed view of this kind can be constructed, holders of personalistic and necessary views alike criticize them as unnecessarily timid, for they embrace subjective ideas, but only gingerly.
5 The role of symmetry in probability
An important and highly controversial question in the foundations of probability is whether and, if so, how symmetry considerations can determine the probabilities of at least some events.
Symmetry considerations have always been important in the study of probability. Indeed, early work in probability was dominated by the notion of symmetry, for it was usually either concerned with, or directly inspired by, symmetrical gambling apparatus such as dice or cards. To illustrate those classical problems, suppose that a gambler is offered several bets concerning the possible outcome of rolling three dice, where it is to be understood that refraining from any bets at all may be among the available "bets." Which of the available bets should the gambler choose?
Perhaps I distort history somewhat in insisting that early problems were framed in terms of choice among bets, for many, if not most, of them were framed in terms of equity, that is, they asked which of two players, if either, would have the advantage in a hypothetical bet. But, especially from the point of view of the earlier probabilists, such a question of equity is tantamount to a question of choice among bets, for to ask which of two "equal" betters has the advantage is to ask which of them has the preferable alternative, as was pointed out quite explicitly by D. Bernoulli in [B10].
In effect, the classical workers recommended the following solution to the problem of three dice, with corresponding solutions to other gambling problems:
1. Attach equal mathematical probabilities to each of the 216 (= $6^3$) possible outcomes of rolling the three dice. (There are $6^3$ possibilities, because the first, second, and third dice can each show any of six scores, all combinations being possible.)
2. Under the mathematical probability established in Step 1, compute the expected winnings (possibly negative) of the gambler for each available bet.
3. Choose a bet that has the largest expected winnings among those available.
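The three steps of the classical recommendation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the particular bets and their odds are hypothetical and are not taken from the text; only the uniform measure on the 216 outcomes and the rule of maximizing expected winnings come from the recommendation itself.

```python
from fractions import Fraction
from itertools import product

# Step 1: the 6**3 = 216 ordered outcomes, each given equal probability.
OUTCOMES = list(product(range(1, 7), repeat=3))
P = Fraction(1, len(OUTCOMES))

def expected_winnings(payoff):
    """Step 2: expected winnings of a bet described by a payoff function."""
    return sum(P * payoff(outcome) for outcome in OUTCOMES)

# Hypothetical bets (illustrative assumptions); payoffs are in dollars.
bets = {
    "no bet": lambda o: 0,
    "even money on sum >= 11": lambda o: 1 if sum(o) >= 11 else -1,
    "2-to-1 on at least one six": lambda o: 2 if 6 in o else -1,
    "30-to-1 on triple six": lambda o: 30 if o == (6, 6, 6) else -1,
}

expectations = {name: expected_winnings(f) for name, f in bets.items()}

# Step 3: choose a bet with the largest expected winnings.
best = max(expectations, key=expectations.get)
```

Exact fractions are used so that ties between bets (such as refraining and a fair even-money bet) are detected exactly rather than blurred by rounding.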
At present it is appropriate to refrain from criticisms of the use made of expected winnings until the next chapter and to concentrate discussion on the notion that the 216 possibilities should be considered equally probable, which can conveniently be done by drastically reducing the class of bets considered to be available. Say, for definiteness, that the only bets to be considered are simply even-money bets of one dollar, that the triple of scores falls in a preassigned subset of the 216 possibilities. When attention is focused on this restricted class of bets, the total recommendation is seen to imply that the probability measure defined in the first step of the recommendation be adopted as the personal probability of the gambler. To put it differently, a gambler who adopts the recommendation will hold the 216 possible outcomes equally probable not only in some abstract sense, but also in the sense of personal probability as defined in § 3.2.
The notion that the 216 possibilities should be regarded as equally probable is familiar to everyone; for it is taken for granted wherever gentlemen gamble as well as in the standard high-school algebra courses, where it serves to illustrate the theory of combinations and permutations. Traditionally, the equality of the probabilities was supposed to be established by what was called the principle of insufficient reason,† thus: Suppose that there is an argument leading to the conclusion that one of the possible combinations of ordered scores, say {1, 2, 3}, is more probable than some other, say {6, 3, 4}. Then the information on which that hypothetical argument is based has such symmetry as to permit a completely parallel, and therefore equally valid, argument leading to the conclusion that {6, 3, 4} is more probable than {1, 2, 3}. Therefore, it was asserted, the probabilities of all combinations must be equal.
† Perhaps what I here call the principle of insufficient reason should be called the principle of cogent reason. See Section 3 of [B11] for the distinction involved.
The principle of insufficient reason has been and, I think, will continue to be a most fertile idea in the theory of probability; but it is not so simple as it may appear at first sight, and criticism has frequently and justly been brought against it. Holders of necessary views typically attempt to put the principle on a rigorous basis by modifying it in such a way as to take account of such criticism. Holders of personalistic and objectivistic views typically regard the criticism as not altogether refutable, so they do not attempt to establish a formal postulate corresponding to the principle but content themselves, as I shall here, with exhibiting an element of truth in it.
One of the first criticisms is that the principle is not strictly applicable for a person who has had any experience with the apparatus in question, or even with similar apparatus. Thus, attempts to use the principle, as I have stated it, to prove that there is no such thing as a run of luck at dice, as actually played, are invalid. The person may have had relevant experience, directly or vicariously, not only with gambling apparatus itself, but also with people who make and handle it, including cheaters.
It is not always obvious what the symmetry of the information is in a situation in which one wishes to invoke the principle of insufficient reason. For example, d'Alembert, an otherwise great eighteenth-century mathematician, is supposed to have argued seriously that the probability of obtaining at least one head in two tosses of a fair coin is 2/3 rather than 3/4. (Cf. [T3], Art. 464.) Heads, as he said, might appear on the first toss, or, failing that, it might appear on the second, or, finally, might not appear on either. D'Alembert considered the three possibilities equally likely.
It seems reasonable to suppose that, if the principle of insufficient reason were formulated and applied with sufficient care, the conclusion of d'Alembert would appear simply as a mistake. There are, however, more serious examples. Suppose, to take a famous one, that it is known of an urn only that it contains either two white balls, two black balls, or a white ball and a black ball. The principle of insufficient reason has been invoked to conclude that the three possibilities are equally probable, so that in particular the probability of one white and one black ball is concluded to be 1/3. But the principle has also been applied to conclude that there are four equally probable possibilities, namely, that the first ball is white and the second also, that the first is white and the second black, etc. On that basis, the probability of one white and one black ball is, of course, 1/2.
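The dependence of the answer on the partition held to be uniform can be made concrete by direct enumeration. The following Python sketch is illustrative only; the helper name `uniform_probability` and the string encodings of the cases are assumptions introduced here, not part of the text.

```python
from fractions import Fraction
from itertools import product

def uniform_probability(cases, event):
    """Probability of `event` when every element of `cases` is held equally probable."""
    favorable = [c for c in cases if event(c)]
    return Fraction(len(favorable), len(cases))

# At least one head in two tosses.
# d'Alembert's three cases: head at once; tail then head; no head at all.
dalembert_cases = ["H", "TH", "TT"]
p_dalembert = uniform_probability(dalembert_cases, lambda c: "H" in c)

# The four ordered results of two tosses.
ordered_tosses = ["".join(t) for t in product("HT", repeat=2)]
p_ordered = uniform_probability(ordered_tosses, lambda c: "H" in c)

# The urn: probability of one white and one black ball.
# Three unordered compositions held equally probable.
compositions = [("W", "W"), ("B", "B"), ("W", "B")]
p_urn_three = uniform_probability(compositions, lambda c: set(c) == {"W", "B"})

# Four ordered colourings (first ball, second ball) held equally probable.
colourings = list(product("WB", repeat=2))
p_urn_four = uniform_probability(colourings, lambda c: set(c) == {"W", "B"})
```

The computation itself is the same in every case; what changes the result is only the choice of which list of cases is treated as uniform, which is exactly the point at issue.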
Personally, I do not try to arbitrate between the two conclusions but consider that the existence of the pair of them reflects doubt on the notion that a person's knowledge relevant to any matter admits any full and precise description in terms of propositions he knows to be true and others about which he knows nothing.
Most holders of personalistic views do not find the principle of insufficient reason compelling, because they envisage the possibility that a person may consider one event more probable than another without having any compelling argument for his attitude. Viewed practically, this position is closely associated with the first criticism of the principle of insufficient reason, for the holder of a personalistic view typically supposes that the person is under the influence of experience, and possibly even biologically determined inheritance, that expresses itself in his opinions, though not necessarily through compelling argument.
Holders of personalistic views do see some truth in the principle of insufficient reason, because they recognize that there are frequently partitions of the world, associated with symmetrical-looking gambling apparatus and the like, that many and diverse people all consider (very nearly) uniform partitions. As was illustrated in the preceding section, we often feel more "sure" about probabilities derived from the judgment that such partitions are uniform than we do about others. Such partitions are, moreover, very important in that they provide some events the probability of which to diverse people is in agreement. Though the events concerned are often of no importance in themselves, agreement about them can, through the statistical invention of randomization, contribute to agreement about all sorts of issues open to empirical investigation. Widespread though the agreement about the near uniformity of some partitions is, holders of personalistic views typically do not find the contexts in which such agreement obtains sufficiently definable to admit of expression in a postulate.
Holders of purely objectivistic views see no sense at all in the original formulation of the principle of insufficient reason, for it uses "probability" in a manner they consider meaningless. But they too see an element of truth in the principle, which they consider to be established as a part of empirical physics. Thus, for example, they regard it as an experimental fact, admitting some explanation in terms of theoretical physics, that three dice manufactured with reasonable symmetry will exhibit each of the 216 possible patterns with nearly equal frequency, if repeatedly rolled with sufficient violence on a suitable surface.
Holders of personalistic views agree that experiments or, more generally, experiences determine to a large extent when people employ the idea of insufficient reason. Thus, though experiments with gambling apparatus, quite apart from gambling itself, have a fascination that perhaps exceeds their real interest, such experiments are not altogether worthless.
On the one hand, they provide strong evidence that a person cannot expect to maintain a symmetrical attitude toward any piece of apparatus with which he has had long experience, unless he is virtually convinced at the outset that the possible states of the apparatus are equally probable and independent from trial to trial. To say it in the more familiar and sometimes more congenial language of objective probability, long experiments with coins, dice, cards, and the like have always shown some bias, and often some dependence from trial to trial. On the other hand (and this has the utmost practical importance), it has been shown that, with skill and experience, gambling apparatus, or its statistical equivalent, can be manufactured in which the bias and the dependence from trial to trial are extremely small. This implies
that groups of very diverse people can be brought to agree that repeated trials with certain apparatus are nearly uniform and nearly independent. Thus certain methods of obtaining random numbers and other outcomes of uniform and independent trials, which are vital to many sorts of experimentation, have justifiably found acceptance with the scientific public. A stimulating account of practical methods of obtaining random numbers, and random samples generally, is given by Kendall in Chapter 8 (Vol. I) of [K2].
6 How can science use a personalistic view of probability?
It is often argued by holders of necessary and objectivistic views alike that that ill-defined activity known as science or scientific method consists largely, if not exclusively, in finding out what is probably true, by criteria on which all reasonable men agree. The theory of probability relevant to science, they therefore argue, ought to be a codification of universally acceptable criteria. Holders of necessary views say that, just as there is no room for dispute as to whether one proposition is logically implied by others, there can be no dispute as to the extent to which one proposition is partially implied by others that are thought of as evidence bearing on it, for the exponents of necessary views regard probability as a generalization of implication. Holders of objectivistic views say that, after appropriate observations, two reasonable people can no more disagree about the probability with which trials in a sequence of coin tosses are heads than they can disagree about the length of a stick after measuring it by suitable methods, for they consider probability an objective property of certain physical systems in the same sense that length is generally considered an objective property of other physical systems, small errors of measurement being contemplated in both contexts. Neither the necessary nor the objectivistic outlook leaves any room for personal differences; both, therefore, look on any personalistic view of probability as, at best, an attempt to predict some of the behavior of abnormal, or at any rate unscientific,
people.
I would reply that the personalistic view incorporates all the universally acceptable criteria for reasonableness in judgment known to me and that, when any criteria that may have been overlooked are brought forward, they will be welcomed into the personalistic view. The criteria incorporated in the personalistic view do not guarantee agreement on all questions among all honest and freely communicating people, even in principle. That incompleteness, if one will call it such, does not distress me, for I think that at least some of the disagreement we see around us is due neither to dishonesty, to errors in reasoning, nor to
friction in communication, though the harmful effects of the latter are almost incapable of exaggeration.
As was mentioned in connection with symmetry, there are partitions that diverse people all consider nearly uniform, though not compelled to that agreement by any postulate of the theory of personal probability. As has also been mentioned and as will be explained later (especially in § 14.8), through the statistical invention of randomization, agreement about partitions pertaining to gambling apparatus of no importance in itself can be made to contribute to agreement in every part of empirical science.
Another mechanism that brings people having some, but not all, opinions in common into more complete agreement was illustrated in §§ 3.6-7. Indeed, it was there shown that in certain contexts any two opinions, provided that neither is extreme in a technical sense, are almost sure to be brought very close to one another by a sufficiently large body of evidence.
It has been countered,+ I believe, that, if experience systematically leads people with opinions originally different to hold a common opinion, then that common opinion, and it only, is the proper subject of scientific probability theory. There are two inaccuracies in this argument. In the first place, the conclusion of the personalistic view is not that evidence brings holders of different opinions to the same opinions, but rather to similar opinions. In the second place, it is typically true of any observational program, however extensive but prescribed in advance, that there exist pairs of opinions, neither of which can be called extreme in any precisely defined sense, but which cannot be expected, either by their holders or any other person, to be brought into close agreement after the observational program.
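The claim that a large body of evidence brings differing opinions to similar, though not identical, opinions can be illustrated numerically. The sketch below assumes, purely for illustration, that each person's opinion about the chance of heads is a Beta distribution updated on coin-toss data; the two priors and the observed counts are hypothetical, and this conjugate model is only one convenient setting, not the general argument of §§ 3.6-7.

```python
from fractions import Fraction

def posterior_mean(a, b, heads, tails):
    """Mean of the Beta(a + heads, b + tails) posterior for the chance of heads."""
    return Fraction(a + heads, a + b + heads + tails)

# Two hypothetical people with different, non-extreme Beta priors.
prior_1 = (1, 1)   # uniform over the chance of heads
prior_2 = (8, 2)   # initially leans strongly toward heads

def gap(heads, tails):
    """Absolute difference between the two people's posterior means."""
    m1 = posterior_mean(*prior_1, heads, tails)
    m2 = posterior_mean(*prior_2, heads, tails)
    return abs(m1 - m2)

gap_before = gap(0, 0)       # disagreement before any evidence
gap_after = gap(300, 700)    # disagreement after 1000 hypothetical tosses
```

Under these assumptions the gap shrinks from 3/10 to less than 1/100 but does not vanish, which matches the first point of the reply: the opinions become similar, not the same.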
I have, at least once, heard it objected against the personalistic view of probability that, according to that view, two people might be of different opinions, according as one is pessimistic and the other optimistic. I am not sure what position I would take in abstract discussion of whether that alleged property of personalistic views would be objectionable, but I think it is clear from the formal definition of qualitative probability that the particular personalistic view sponsored here does not leave room for optimism and pessimism, however these traits be interpreted, to play any role in the person's judgment of probabilities.
+ See (Fisher 1934), p. 287.
CHAPTER 5
Utility
1 Introduction
The postulates P4-6, introduced in Chapter 3, have already led to simplification of the relation $\leq$ in so far as it applies to acts of a special but important form. Indeed, through the introduction of numerical probability, those special comparisons have been reduced to ordinary arithmetic comparison of numbers in such a way that many relations among acts are deducible by simple and systematic arithmetic calculation. In this chapter it will be shown that the arithmetization of comparison among acts can, with the introduction of one mild new postulate, be extended to virtually all pairs of acts.
This far-reaching arithmetization of comparison among acts is achieved by attaching a number $U(f)$ to each consequence $f$ in such a way that $\mathbf{f} \leq \mathbf{g}$ if and only if the expected value of $U(\mathbf{f})$ is numerically less than or equal to that of $U(\mathbf{g})$, provided only that the real-valued functions $U(\mathbf{f})$ and $U(\mathbf{g})$ are essentially bounded. The provision can fail to be met only if there exist acts that are, so to speak, distinctly preferable to any fixed reward or distinctly worse than any fixed punishment. A function $U$ that thus arithmetizes the relation of preference among acts will be called a utility. It will be shown that the multiplicity of utilities is not complicated, every utility being simply related to every other.
I have chosen to use the name "utility" in preference to any other, in spite of some unfortunate connotations this name has in connection with economic theory, because it was adopted by von Neumann and Morgenstern when in [V4] they revived the concept to which it refers, in a most stimulating way. Their treatment has been of such widespread interest that the introduction of a name other than "utility" at the present time would cause more confusion than it could alleviate.
The next three sections are concerned with the technical exploration of the utility concept. I think readers interested in the details will find it best to read these sections twice as a unit, in the fashion I have been recommending for other material in which definitions and propositions
are interlarded with proofs; others will be content with a cursory reading, omitting proofs. Taking advantage of the simplicity afforded by the introduction of utility, I try in § 5 to make some progress with the problem, pointed out in § 2.5, of specifying criteria for the construction of "small worlds." Finally, § 6 briefly reports the history of the utility idea. A separate critical section is not necessary, because the criticisms of the theory of utility known to me are incorporated conveniently into the historical section.
2 Gambles
Before discussing utility, it is expedient to establish certain facts, the first being that at least among a rather rich class of acts, namely acts confined with probability one to a finite number of consequences, preference depends only on the probability distribution of the consequences of the acts.
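THEOREM 1
HYP. 1. $f_1, \ldots, f_n$ are $n$ elements of $F$, $n \geq 1$.
2. $\rho_1, \ldots, \rho_n$ are numbers such that $\sum \rho_i = 1$.
3. $\mathbf{g}$ and $\mathbf{h}$ are acts such that $P(\mathbf{g}(s) = f_i) = P(\mathbf{h}(s) = f_i) = \rho_i$, $i = 1, \ldots, n$.
CONCL. $\mathbf{g} \doteq \mathbf{h}$.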
PROOF. The theorem is obvious for $n = 1$. It will be proved by induction, supposing henceforth that $n > 1$. Let $B$ denote the intersection of the two events that $\mathbf{g}(s) = f_n$ and $\mathbf{h}(s) \neq f_n$, and let $C$ denote the intersection of the two events that $\mathbf{h}(s) = f_n$ and $\mathbf{g}(s) \neq f_n$. It is easy to see that $P(B) = P(C)$. $C$ can be partitioned into $C_0, C_1, \ldots, C_{n-1}$, where $C_0$ is a null event and $C_i$, $i = 1, \ldots, n - 1$, is the intersection of $C$ with the event that $\mathbf{g}(s) = f_i$. By repeated application of Conclusion 7 of Theorem 3.3.3, $B$ can be partitioned into events $B_0, B_1, \ldots, B_{n-1}$ such that $P(B_i) = P(C_i)$, $i = 0, \ldots, n - 1$. Let $\mathbf{g}_0 = \mathbf{g}$, and define $\mathbf{g}_{i+1}$ step by step for $i = 0, \ldots, n - 2$ thus:
\[
\mathbf{g}_{i+1}(s) =
\begin{cases}
f_n & \text{for } s \in C_{i+1}, \\
f_{i+1} & \text{for } s \in B_{i+1}, \\
\mathbf{g}_i(s) & \text{elsewhere.}
\end{cases}
\tag{1}
\]
It is easily seen from the facts of conditional probability that $\mathbf{g}_{i+1} \doteq \mathbf{g}_i$ given $B_{i+1} \cup C_{i+1}$, and it is even more obvious that $\mathbf{g}_{i+1} \doteq \mathbf{g}_i$ given $\sim(B_{i+1} \cup C_{i+1})$. Therefore $\mathbf{g}_{i+1} \doteq \mathbf{g}_i$, so $\mathbf{g}_{n-1} \doteq \mathbf{g}$. Next,
$P(\mathbf{g}_{i+1}(s) = f_j) = P(\mathbf{g}_i(s) = f_j) = \rho_j$, so $P(\mathbf{g}_{n-1}(s) = f_j) = \rho_j$, $j = 1, \ldots, n$. Thus $\mathbf{g}_{n-1}$ is not only equivalent to $\mathbf{g}$ but also satisfies the hypothesis of the theorem relative to $\mathbf{h}$, so it will suffice to prove the theorem for $\mathbf{g}_{n-1}$ and $\mathbf{h}$ in place of $\mathbf{g}$ and $\mathbf{h}$.
Now $\mathbf{g}_{n-1}$ has been constructed to equal $f_n$ in $C$, except on a null set. Therefore $\mathbf{g}_{n-1} \doteq \mathbf{h}$ given $C \cup D$, where $D$ is the subset of $\sim C$ on which $\mathbf{g}_{n-1} = \mathbf{h} = f_n$. It remains only to show that $\mathbf{g}_{n-1} \doteq \mathbf{h}$ given $\sim(C \cup D)$. If $\sim(C \cup D)$ is null, that is true automatically; henceforth concentrate on the less trivial situation. If $\sim(C \cup D)$ is not null, then $\leq$ given $\sim(C \cup D)$ satisfies all the postulates assumed thus far, and therefore the consequences $f_1, \ldots, f_{n-1}$; the numbers $\rho_i' = \rho_i/(1 - \rho_n)$, $i = 1, \ldots, n - 1$; the acts $\mathbf{g}_{n-1}$ and $\mathbf{h}$; and the relation $\leq$ given $\sim(C \cup D)$ satisfy the hypothesis of the theorem for a case in which it is supposed already to have been proved. ◆
In this chapter the notation $\sum \rho_i f_i$ will denote the class of all acts $\mathbf{f}$ for which there exist partitions $B_i$ of $S$ such that $P(B_i) = \rho_i$ and $\mathbf{f}(s) = f_i$ for $s \in B_i$. Here the $f_i$'s are a finite sequence of consequences (not necessarily distinct), and the $\rho_i$'s a corresponding sequence of non-negative real numbers such that $\sum \rho_i = 1$. In view of Conclusion 7 of Theorem 3.3.3, such a class of acts, which will in this chapter be referred to as a gamble and denoted by $\mathsf{f}$, $\mathsf{g}$, $\mathsf{h}$, or the like, always has at least one element. Theorem 1 says, in effect, that the person regards all elements of any gamble as equivalent. To put it differently, if the events $B_i$ of a partition have the probabilities $\rho_i$, and if the act $\mathbf{f}$ is such that the consequence $f_i$ will befall the person in case $B_i$ occurs, then the value of $\mathbf{f}$ is independent of how the partition $B_i$ is chosen.
Gambles can be mixed, in a sense, to make new gambles, thus: Let $\mathsf{f}_j$ be a finite sequence of gambles,
\[
\mathsf{f}_j = \sum_i \rho_{ij} f_{ij},
\tag{2}
\]
and $\sigma_j$ a corresponding sequence of non-negative real numbers such that $\sum \sigma_j = 1$. The mixture of the $\mathsf{f}_j$'s with weights $\sigma_j$, denoted $\sum \sigma_j \mathsf{f}_j$, is defined by
\[
\sum_j \sigma_j \mathsf{f}_j = \sum_j \sigma_j \Bigl\{ \sum_i \rho_{ij} f_{ij} \Bigr\} = \sum_{i,j} (\sigma_j \rho_{ij}) f_{ij},
\tag{3}
\]
which is meaningful, the $f_{ij}$'s being consequences and the $(\sigma_j \rho_{ij})$'s being numbers such that $\sum (\sigma_j \rho_{ij}) = 1$. Such mixtures are exemplified by an insurance policy in which the benefit is an annuity payable during the
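life of the beneficiary, and by a lottery in which the prizes are tickets in other lotteries.
In view of Theorem 1, it is natural to say that $\mathsf{f} \leq \mathbf{g}$ means that, for every act $\mathbf{f}$ in the class of acts corresponding to $\mathsf{f}$, $\mathbf{f} \leq \mathbf{g}$. Corresponding definitions are to be understood for $\mathsf{f} \leq \mathsf{g}$, $\mathsf{f} < \mathsf{g}$, $\mathsf{f} \doteq \mathsf{g}$, etc.
THEOREM 2
If $\mathsf{f}$, $\mathsf{g}$, and $\mathsf{h}$ are gambles, and $0 < \rho \leq 1$; then $\rho \mathsf{f} + (1 - \rho)\mathsf{h} \leq \rho \mathsf{g} + (1 - \rho)\mathsf{h}$, if and only if $\mathsf{f} \leq \mathsf{g}$.
PROOF. Let $\mathbf{f}$, $\mathbf{g}$; $f_i$, $g_j$; and $B_i$, $C_j$ be acts, consequences, and partitions such that $\mathbf{f}$ and $\mathbf{g}$ are among the acts represented by $\mathsf{f}$ and $\mathsf{g}$, respectively, with $\mathbf{f}(s) = f_i$ for $s \in B_i$ and $\mathbf{g}(s) = g_j$ for $s \in C_j$. Construct $D_{ij} \subset B_i \cap C_j$ such that $P(D_{ij}) = \rho P(B_i \cap C_j)$, and let $D = \bigcup D_{ij}$. Then $P(D) = \rho$, $P(B_i \mid D) = P(B_i)$, and $P(C_j \mid D) = P(C_j)$. What is to be proved is, in effect, that $\mathbf{f} \leq \mathbf{g}$ given $D$, if and only if $\mathbf{f} \leq \mathbf{g}$. In view of Theorem 1 it is clear that whether that is so or not for $\mathbf{f}$ and $\mathbf{g}$ does not depend on the particular choice of $D$; so, with an obvious temporary extension of terminology, it is to be proved that $\mathbf{f} \leq \mathbf{g}$ given $\rho$, if and only if $\mathbf{f} \leq \mathbf{g}$.
If $\mathbf{f} \doteq \mathbf{g}$ given $\alpha$ for every $0 < \alpha \leq 1$, there is nothing to prove. Otherwise it can be assumed without loss of generality that, for some $\alpha_0$, $\mathbf{f} < \mathbf{g}$ given $\alpha_0$.
In view of Theorem 2.7.2, if $\alpha + \beta \leq 1$, $\mathbf{f} \leq \mathbf{g}$ given $\alpha$, and $\mathbf{f} \leq \mathbf{g}$ given $\beta$; then $\mathbf{f} \leq \mathbf{g}$ given $(\alpha + \beta)$, and similarly $\mathbf{f} \leq \mathbf{g}$ given $\alpha/2$.
Making use of P6 and Theorem 2.7.2, it can easily be shown that, for any $\alpha$ sufficiently close to $\alpha_0$, $\mathbf{f} < \mathbf{g}$ given $\alpha$.
The preceding three paragraphs imply that, in the case at hand, $\mathbf{f} < \mathbf{g}$ given $\alpha$ for every $\alpha$, $0 < \alpha \leq 1$. ◆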
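The mixture of gambles defined in (3) is easy to compute once a gamble is represented by the probabilities it attaches to consequences. The following Python sketch is an illustration under stated assumptions: gambles are stored as dictionaries from consequence labels to exact probabilities, equal consequences are merged (which Theorem 1 suggests is harmless, since only the distribution of consequences matters), and the two lotteries are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

def mix(gambles, weights):
    """Mixture per (3): consequence f_ij receives probability sigma_j * rho_ij.

    Each gamble is a dict mapping a consequence to its probability.
    """
    assert sum(weights) == 1
    mixture = defaultdict(Fraction)  # missing consequences start at probability 0
    for gamble, sigma in zip(gambles, weights):
        assert sum(gamble.values()) == 1
        for consequence, rho in gamble.items():
            mixture[consequence] += sigma * rho
    return dict(mixture)

# A hypothetical lottery whose prizes are tickets in two other lotteries.
lottery_a = {"$100": Fraction(1, 10), "$0": Fraction(9, 10)}
lottery_b = {"$10": Fraction(1, 2), "$0": Fraction(1, 2)}
compound = mix([lottery_a, lottery_b], [Fraction(1, 4), Fraction(3, 4)])
```

The resulting probabilities again sum to one, which is the sense in which the right-hand side of (3) is itself a gamble.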
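THEOREM 3
If $\mathsf{f} < \mathsf{g}$, and $0 \leq \sigma < \rho \leq 1$, then $\rho \mathsf{f} + (1 - \rho)\mathsf{g} < \sigma \mathsf{f} + (1 - \sigma)\mathsf{g}$.
PROOF. In view of the immediately verifiable identities,
\[
\begin{aligned}
\rho \mathsf{f} + (1 - \rho)\mathsf{g} &= (\rho - \sigma)\mathsf{f} + [1 - (\rho - \sigma)] \times \Bigl\{ \frac{\sigma}{1 - (\rho - \sigma)}\,\mathsf{f} + \frac{1 - \rho}{1 - (\rho - \sigma)}\,\mathsf{g} \Bigr\}, \\
\sigma \mathsf{f} + (1 - \sigma)\mathsf{g} &= (\rho - \sigma)\mathsf{g} + [1 - (\rho - \sigma)] \times \Bigl\{ \frac{\sigma}{1 - (\rho - \sigma)}\,\mathsf{f} + \frac{1 - \rho}{1 - (\rho - \sigma)}\,\mathsf{g} \Bigr\},
\end{aligned}
\tag{4}
\]
this theorem is a special case of Theorem 2; unless $\rho = 1$, and $\sigma = 0$, in which case it is trivial. ◆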
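The two identities in (4) are described as immediately verifiable; a numerical spot check makes the verification explicit. The Python sketch below assumes gambles are represented as dictionaries from consequence labels to probabilities and uses hypothetical gambles and weights; it checks one instance and is not a proof.

```python
from fractions import Fraction

def combine(*terms):
    """Distribution of a mixture; each term is a (weight, gamble) pair."""
    out = {}
    for weight, gamble in terms:
        for consequence, p in gamble.items():
            out[consequence] = out.get(consequence, Fraction(0)) + weight * p
    return out

# Hypothetical gambles f and g, and weights with 0 <= sigma < rho <= 1.
f = {"a": Fraction(1, 3), "b": Fraction(2, 3)}
g = {"b": Fraction(1, 2), "c": Fraction(1, 2)}
rho, sigma = Fraction(3, 4), Fraction(1, 4)

d = rho - sigma
# The common component in braces in (4); requires d != 1.
h = combine((sigma / (1 - d), f), ((1 - rho) / (1 - d), g))

lhs_1 = combine((rho, f), (1 - rho, g))
rhs_1 = combine((d, f), (1 - d, h))
lhs_2 = combine((sigma, f), (1 - sigma, g))
rhs_2 = combine((d, g), (1 - d, h))
```

Both sides of each identity share the component `h` with the same weight, so Theorem 2 applies with weight `d` on `f` and on `g` respectively; the excluded case of weights 1 and 0 is exactly the one in which `1 - d` vanishes.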
THEOREM 4
If $\mathsf{f}_1 < \mathsf{f}_2$ and $\mathsf{f}_1 \leq \mathsf{g} \leq \mathsf{f}_2$, then there is one and only one $\rho$ such that $\rho \mathsf{f}_1 + (1 - \rho)\mathsf{f}_2 \doteq \mathsf{g}$.
PROOF. It follows immediately from Theorem 3 and the principle of the Dedekind cut † that there is one and only one $\rho_0$ such that
\[
\begin{aligned}
\sigma \mathsf{f}_1 + (1 - \sigma)\mathsf{f}_2 &< \mathsf{g}, \quad \text{if } \sigma > \rho_0, \\
\sigma \mathsf{f}_1 + (1 - \sigma)\mathsf{f}_2 &> \mathsf{g}, \quad \text{if } \sigma < \rho_0.
\end{aligned}
\tag{5}
\]
According to (5), no number, except possibly $\rho_0$, can satisfy the equivalence demanded by the theorem. Finally, using (5) and P6 (much as it was used in the proof of Theorem 2), it follows that $\rho_0$ does indeed satisfy the equivalence. ◆
3 Utility, and preference among gambles
The idea of utility can most conveniently be introduced in connection with gambles or, equivalently, acts that with probability one are confined to a finite number of consequences, thus: A utility is a function $U$ associating real numbers with consequences in such a way that, if $\mathsf{f} = \sum \rho_i f_i$ and $\mathsf{g} = \sum \sigma_j g_j$; then $\mathsf{f} \leq \mathsf{g}$, if and only if $\sum \rho_i U(f_i) \leq \sum \sigma_j U(g_j)$. Writing $U[\mathsf{f}]$ for $\sum \rho_i U(f_i)$, the condition takes the form $U[\mathsf{f}] \leq U[\mathsf{g}]$. Similarly, it is convenient to understand that, for an act $\mathbf{f}$,
\[
U[\mathbf{f}] = E(U(\mathbf{f})).
\tag{1}
\]
In this notation the following obvious theorem gives a slightly different characterization of utility.
TBBoRllll 1
A real-valued function of consequences, U, is a utility; if and only jf f S C is equivalent to U(f) < UflJ. provided t and C are both with probability one confined to a finite set of consequences.
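To make condition (1) concrete, here is a minimal sketch, assuming a gamble is given as a list of (probability, consequence) pairs and the utility as a lookup table. The consequence names and utility values are hypothetical and purely illustrative.

```python
from fractions import Fraction


def gamble_utility(gamble, U):
    """Return sum_i rho_i * U(f_i) for a gamble given as (probability, consequence) pairs."""
    if sum(p for p, _ in gamble) != 1:
        raise ValueError("the probabilities of a gamble must sum to 1")
    return sum(p * U[c] for p, c in gamble)


def not_preferred(f, g, U):
    """True when f <= g in the sense of (1), that is, g is at least as good as f."""
    return gamble_utility(f, U) <= gamble_utility(g, U)


# Hypothetical utility table (assumed values, not taken from the text).
U = {"nothing": Fraction(0), "sedan": Fraction(6), "convertible": Fraction(10)}
f = [(Fraction(1, 2), "nothing"), (Fraction(1, 2), "convertible")]
g = [(Fraction(1), "sedan")]
```

Here the even chance of the convertible has utility 5 and the sure sedan has utility 6, so under this assumed table the gamble `f` is not preferred to `g`.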
Do the postulates thus far assumed guarantee that any utilities exist at all? Can Theorem 1 be extended to an even wider class of acts? Does a great diversity of utilities exist, or does the relation $\le$ practically determine the function $U$? These questions, here mentioned in the order in which they most naturally arise, are manifestly of great importance in understanding utility. For technical reasons, they will be answered in a different order: the third followed by the first in this section, and the second in the next section.

† Cf., if necessary, any introduction to the theory of the real numbers for explanation of this principle, e.g., Chapter II of [G3].
If there is a utility at all, there is surely more than one, because a utility plus a constant and a utility times a positive constant are also obviously utilities; thus:

THEOREM 2 If $U$ is a utility, and $\rho$, $\sigma$ are real numbers with $\rho > 0$; then $U' = \rho U + \sigma$ is also a utility.

COROLLARY 1 If there exists a utility, and if $f < g$; then there exists a utility $U$ for which $U(f)$ and $U(g)$ are any preassigned pair of numbers, provided $U(f) < U(g)$.
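Corollary 1 amounts to solving two linear equations for $\rho$ and $\sigma$. The sketch below assumes a utility stored as a dictionary over hypothetical consequence labels; `rescale` is an illustrative helper, not anything defined in the text.

```python
from fractions import Fraction


def rescale(U, f, g, target_f, target_g):
    """Return (U', rho, sigma) with U' = rho*U + sigma, U'(f) = target_f, U'(g) = target_g."""
    if not U[f] < U[g]:
        raise ValueError("the corollary requires f < g, that is, U(f) < U(g)")
    if not target_f < target_g:
        raise ValueError("the preassigned values must satisfy U'(f) < U'(g)")
    rho = (target_g - target_f) / (U[g] - U[f])  # positive by the two checks above
    sigma = target_f - rho * U[f]
    return {c: rho * u + sigma for c, u in U.items()}, rho, sigma


# Assumed utility values for three consequences a, b, c.
U = {"a": Fraction(2), "b": Fraction(5), "c": Fraction(11)}
U_new, rho, sigma = rescale(U, "a", "c", Fraction(0), Fraction(1))
```

With these assumed numbers the rescaled utility runs from 0 at `a` to 1 at `c`, and `b` lands at 1/3, preserving the ordering as Theorem 2 promises.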
Theorem 2 says that any increasing linear function of a utility is a utility. The next theorem says that, conversely, any two utilities are necessarily increasing linear functions of one another.

THEOREM 3 If $U$ and $U'$ are utilities, there exist numbers $\rho$ and $\sigma$ such that $U' = \rho U + \sigma$, $\rho > 0$.
PROOF. The first step of the proof will be to demonstrate the following identity for the two utilities $U$ and $U'$ and for any three consequences $f$, $g$, $h$.

$$\begin{vmatrix} 1 & 1 & 1 \\ U(f) & U(g) & U(h) \\ U'(f) & U'(g) & U'(h) \end{vmatrix} = 0. \tag{2}$$

If any two of the consequences $f$, $g$, $h$ are equivalent, two columns of the determinant in question are equal, and therefore the determinant vanishes. It can be assumed, then, that no two of $f$, $g$, and $h$ are equivalent; and there is no loss in generality, as may be seen by permuting columns, in assuming $f < g < h$. Theorem 2.4 now permits the conclusion that there is a $\rho$ such that $\rho f + (1 - \rho)h \doteq g$. Therefore,

$$\begin{aligned}
1 &= \rho 1 + (1 - \rho)1, \\
U(g) &= \rho U(f) + (1 - \rho)U(h), \\
U'(g) &= \rho U'(f) + (1 - \rho)U'(h).
\end{aligned} \tag{3}$$

Thus the middle column of the determinant is linearly dependent on the other two, so the determinant vanishes, as was asserted.

Now let $g$ and $h$ be any fixed pair of consequences such that $g < h$, the existence of such a pair being assured by P5. Equation (2) can be successively rewritten, where $f$ is an arbitrary consequence, thus:

$$1\,[U(g)U'(h) - U(h)U'(g)] - U(f)[U'(h) - U'(g)] + U'(f)[U(h) - U(g)] = 0, \tag{4}$$

$$U'(f) = \frac{U'(h) - U'(g)}{U(h) - U(g)}\,U(f) - \frac{U(g)U'(h) - U(h)U'(g)}{U(h) - U(g)}, \tag{5}$$

which proves the theorem; for $U'(h) - U'(g)$ and $U(h) - U(g)$ are both positive. •

COROLLARY 2 If $U$ and $U'$ are utilities such that, for some $g < h$, $U(g) = U'(g)$ and $U(h) = U'(h)$; then $U$ and $U'$ are the same, that is, for every $f$, $U(f) = U'(f)$.
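The determinant identity (2) and the coefficients read off from (5) can be illustrated numerically. This is a sketch under assumptions: both utilities are given as dictionaries over three hypothetical consequences, and the second is constructed as $2U + 5$ so that the expected answer is known in advance.

```python
from fractions import Fraction


def det3(U, U_prime, f, g, h):
    """Expand the determinant in (2) along its first column, as in (4)."""
    a, b, c = U[f], U[g], U[h]
    a2, b2, c2 = U_prime[f], U_prime[g], U_prime[h]
    return (b * c2 - c * b2) - a * (c2 - b2) + a2 * (c - b)


def affine_coefficients(U, U_prime, g, h):
    """Read rho and sigma off equation (5), using a fixed pair g < h."""
    denom = U[h] - U[g]
    rho = (U_prime[h] - U_prime[g]) / denom
    sigma = -(U[g] * U_prime[h] - U[h] * U_prime[g]) / denom
    return rho, sigma


# Assumed utilities: U_prime is built as 2*U + 5 for the sake of the check.
U = {"f": Fraction(1), "g": Fraction(3), "h": Fraction(7)}
U_prime = {c: 2 * u + 5 for c, u in U.items()}

determinant = det3(U, U_prime, "f", "g", "h")
rho, sigma = affine_coefficients(U, U_prime, "g", "h")
```

The determinant vanishes and the recovered coefficients are the ones used to build `U_prime`, in line with the theorem.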
To summarize, if there is a utility at all, there are an infinite number, but the array of utilities is not complicated; for all can be generated from any one by increasing linear transformations. Turn now to the question of existence.

THEOREM 4 There exists a utility.
PROOF. Von Neumann and Morgenstern prove essentially this theorem, as well as the preceding one, in the appendix of [V4]. The following proof is theirs, expressed, as the teacher used to say, in my own words.

For this proof only, certain special nomenclature is introduced. A set of gambles $\mathcal{F}$ is convex; if and only if, for every $\mathsf{f}, \mathsf{g} \in \mathcal{F}$, and $\rho$, $0 \le \rho \le 1$, $\rho\mathsf{f} + (1 - \rho)\mathsf{g} \in \mathcal{F}$. An interval of gambles is the set of all gambles $\mathsf{f}$ such that, for some fixed $\mathsf{g}$ and $\mathsf{h}$ (which determine the interval), $\mathsf{g} \le \mathsf{f} \le \mathsf{h}$. A hyper-utility $V$ on a convex set $\mathcal{F}$ is a real-valued function of the gambles of $\mathcal{F}$, such that $\mathsf{f} \le \mathsf{g}$, if and only if $V(\mathsf{f}) \le V(\mathsf{g})$, and such that $V(\rho\mathsf{f} + (1 - \rho)\mathsf{g}) = \rho V(\mathsf{f}) + (1 - \rho)V(\mathsf{g})$.

The following remarks about this special nomenclature are obvious and will be repeatedly used in the proof, without explicit reference. The set of all gambles is convex. The intersection of two convex sets is convex. Every interval is convex. There is an interval containing any finite set of gambles. If there is a hyper-utility on the set of all gambles, it is a utility when confined to consequences. By the same method that led to the proofs of Theorems 2 and 3, if there is a hyper-utility on $\mathcal{F}$ containing $\mathsf{g}$ and $\mathsf{h}$, with $\mathsf{g} < \mathsf{h}$, then there is one and only one hyper-utility $V$ on $\mathcal{F}$ such that $V(\mathsf{g}) = 0$ and $V(\mathsf{h}) = 1$.
If $\mathcal{I}$ is the interval determined by $\mathsf{g} < \mathsf{h}$, then, according to Theorem 2.4, there is for every $\mathsf{f}$ in $\mathcal{I}$ a unique number, call it $V(\mathsf{f})$, such that

$$\mathsf{f} \doteq (1 - V(\mathsf{f}))\mathsf{g} + V(\mathsf{f})\mathsf{h}. \tag{6}$$

By repeated use of Theorem 2.2, it follows for any $\mathsf{f}, \mathsf{f}' \in \mathcal{I}$, that

$$\begin{aligned}
\rho\mathsf{f} + (1 - \rho)\mathsf{f}' &\doteq \rho\{(1 - V(\mathsf{f}))\mathsf{g} + V(\mathsf{f})\mathsf{h}\} + (1 - \rho)\{(1 - V(\mathsf{f}'))\mathsf{g} + V(\mathsf{f}')\mathsf{h}\} \\
&= \{1 - [\rho V(\mathsf{f}) + (1 - \rho)V(\mathsf{f}')]\}\mathsf{g} + [\rho V(\mathsf{f}) + (1 - \rho)V(\mathsf{f}')]\mathsf{h},
\end{aligned} \tag{7}$$

so $V$ is a hyper-utility on the convex set $\mathcal{I}$.

From here on in this proof, let $g$, $h$ be a fixed pair of consequences with $g < h$. Making use of the preceding two paragraphs, there is a unique hyper-utility assigning the values 0 and 1 to $g$ and $h$, respectively, on any one interval containing $g$ and $h$. The intersection of two such intervals is a convex set containing $g$ and $h$, and on the intersection the hyper-utilities associated with the two intervals are both hyper-utilities attaching 0 and 1 to $g$ and $h$, respectively; they must, therefore, be equal to one another on the intersection.

Any gamble $\mathsf{f}$ is an element of some interval containing $g$ and $h$. Let $V(\mathsf{f})$ be the common value assigned to $\mathsf{f}$ by all the hyper-utilities that are defined on intervals containing $\mathsf{f}$, $g$, and $h$ and that assign the values 0 and 1 to $g$ and $h$, respectively. Since there is always at least one such interval for any gamble $\mathsf{f}$, the function $V$ is defined for all gambles.

The proof will be complete when it is shown that $V$ is a hyper-utility for the convex set of all gambles. Let $\mathsf{f}$ and $\mathsf{f}'$ be any two gambles and $\rho$ a number, $0 \le \rho \le 1$. There is an interval containing $\mathsf{f}$, $\mathsf{f}'$, $g$, $h$, and $\rho\mathsf{f} + (1 - \rho)\mathsf{f}'$. In that interval the function $V$ is a hyper-utility. Therefore $V(\rho\mathsf{f} + (1 - \rho)\mathsf{f}') = \rho V(\mathsf{f}) + (1 - \rho)V(\mathsf{f}')$ and $V(\mathsf{f}) \le V(\mathsf{f}')$, if and only if $\mathsf{f} \le \mathsf{f}'$. •
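The number $V(\mathsf{f})$ in (6) is the point at which preference between $\mathsf{f}$ and a mixture of the two reference gambles switches, so it can be located by bisection, much as a Dedekind cut is located. The sketch below is an illustration only: the person's preferences are simulated by a hidden utility with made-up values, and `at_least_as_good_as_f` is a hypothetical oracle answering whether the mixture with weight `v` on the better gamble is at least as good as $\mathsf{f}$.

```python
def calibrate(at_least_as_good_as_f, tol=1e-9):
    """Bisect on v in [0, 1] to find where (1 - v)*g + v*h becomes at least as good as f."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if at_least_as_good_as_f(mid):
            hi = mid  # the mixture is already good enough, so V(f) <= mid
        else:
            lo = mid  # the mixture is still worse than f, so V(f) > mid
    return (lo + hi) / 2


# Simulated person (assumed numbers): hidden utilities of g, h and f with g < f < h.
u_g, u_h, u_f = 2.0, 10.0, 4.0


def oracle(v):
    """Answer as the simulated person would: is (1 - v)*g + v*h at least as good as f?"""
    return (1 - v) * u_g + v * u_h >= u_f


V_f = calibrate(oracle)
```

For the simulated person the procedure returns a value close to 0.25, which is the hidden utility of `f` rescaled so that `g` sits at 0 and `h` at 1.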
4 The extension of utility to more general acts

The requirement that an act have only a finite number of consequences may seem, from a practical point of view, almost no requirement at all. To illustrate, the number of time intervals that might possibly be the duration of a human life can be regarded as finite, if you agree that the duration may as well be rounded to the nearest minute, or second, or microsecond, and that there is almost no possibility of its exceeding a thousand years. More generally, it is plausible that, no matter what set of consequences is envisaged, each consequence can be practically identified with some element of a suitably chosen finite, though possibly enormous, subset. It might therefore seem of little or no importance to extend the concept of utility to acts having an infinite number of consequences. If that argument were valid, it could easily be extended to reach the conclusion that infinite sets are irrelevant to all practical affairs, and therefore to all parts of applied mathematics. But it is one of the most profound lessons of mathematical experience that infinite sets, tactfully handled, can lead to great simplification of situations that could, in principle, but only with enormous difficulty, be treated in terms of finite sets. How difficult it would be to study geometry if one made at the outset the "simplifying assumption" that to all intents and purposes at most $10^{1,000}$ points in space can be discriminated from one another! Again, it is generally more convenient and fruitful to think of the annual cash income of an individual or firm as a continuous variable with an infinite number of possible values than as a discrete variable confined to some large finite number of values, even if it is known that the income must be some integral number of cents less, say, than $10^{10}$.

One way to extend the concept of utility to acts with an infinite number of consequences would be to postulate: If $U[\mathbf{f}]$ and $U[\mathbf{g}]$ both exist (the values $+\infty$ and $-\infty$ being regarded as possible); $\mathbf{f} \le \mathbf{g}$, if and only if $U[\mathbf{f}] \le U[\mathbf{g}]$. I see no serious objection to making this assumption outright, though it might be complained that the assumption is motivated more by general mathematical intuition and experience than by intuitive standards of consistency among decisions, which I have tried to take as my sole guide thus far. A statement almost as strong as the one in question can, however, be derived on adjoining a new postulate, P7, more in the spirit of P1-6.

That rather technical program will be carried out in the next several paragraphs. Those not interested can safely skip to the paragraph following Corollary 1 on page 80.
Suppose that every possible consequence of the act $\mathbf{g}$ is at least as attractive to the person as the act $\mathbf{f}$ considered as a whole; then it seems to me within the spirit of the sure-thing principle to conclude that $\mathbf{f} \le \mathbf{g}$; the same might as fairly have been said for the relation $\ge$, and also for the two relations $\le$ given $B$ and $\ge$ given $B$. This idea is formalized in the following postulate, which, according to the conventions of mathematical double-talk, is to be interpreted as two propositions, one having $\le$ and the other $\ge$ throughout.

P7 If $\mathbf{f} \le (\ge)\ g(s)$ given $B$ for every $s \in B$, then $\mathbf{f} \le (\ge)\ \mathbf{g}$ given $B$.
Attention has been called to the mathematically useful fact that, if P1-6 apply to a relation $\le$, then they also apply to any relation $\le$ given $B$, provided $B$ is not null. It is obvious that the same is true for P1-7, a fact that will be used often. It is also noteworthy that P1-7 obviously imply the propositions that arise if in them every instance of the sign $\le$ is replaced by $\ge$ and every instance of $\ge$ is replaced by $\le$. Therefore in any deduction from P1-7 every instance of the signs $\le$ and $\ge$ can be reversed to produce a deduction that may be called the symmetric dual of the original deduction. This remark, a legitimate child of the principle of insufficient reason, has not been important heretofore, because almost all deductions thus far made have been their own symmetric duals. Since that will not be so of some of the lemmas in the present section, much needless writing and thinking can be saved by agreeing at the outset that, once a result is proved, it and its symmetric dual may be used as if both had been explicitly proved.

Before going to work with P7, some may wish to see an example of a mathematical structure satisfying P1-6 but not satisfying P7. Moreover, understanding of such an example will do much to clarify the uses to be made of P7. To construct the example, begin by letting $S$ be a set carrying a finitely additive probability measure $P$ under which $S$ can be partitioned into subsets of arbitrarily small probability. Let the set of consequences be the half-open interval of numbers $0 \le f < 1$. Let $U(f) = f$, $U[\mathbf{f}] = E(\mathbf{f})$, and

$$V[\mathbf{f}] = \lim_{\epsilon \to 0} P\{\mathbf{f}(s) \ge 1 - \epsilon\}. \tag{1}$$
Since the probability in (1) decreases with $\epsilon$, there is no question about the existence of the limit. Now let $W[\mathbf{f}] = U[\mathbf{f}] + V[\mathbf{f}]$, and define $\mathbf{f} \le \mathbf{g}$ to mean that $W[\mathbf{f}] \le W[\mathbf{g}]$. Checking postulates P1-6, it will be found that the $\le$ thus defined satisfies them all, and that what has here been called $U(f)$ is indeed a utility for $\le$. But if, for example, there is an $\mathbf{f}$ such that $U[\mathbf{f}] = V[\mathbf{f}] = 1/2$, P7 is violated, as can be seen by comparing $\mathbf{f}$ to the act that, for each $s$, takes as value the maximum of $1/2$ and $\mathbf{f}(s)$. Whether there can be such an $\mathbf{f}$, may, so far as I know, depend on the choice of $S$ and $P$. But, if the positive integers are taken as $S$, and $P$ is so chosen that though the probability of any one integer is 0 the probability of the set of even integers is $1/2$, a possibility assured by the note to Section 3 of Chapter II on p. 231 of [B4], the function equal to 0 at the odd integers and equal to $(1 - 1/n)$ at each even $n$ is such an $\mathbf{f}$. Finite, as opposed to countable, additivity seems to be essential to this example; perhaps, if the theory were worked out in a countably additive spirit from the start, little or no counterpart of P7 would be necessary.+

+ Fishburn (1970, Exercise 21, p. 213) has suggested an appropriate weakening of P7.
Several lemmas depending on P7 are now to be proved preparatory to proving that $U[\mathbf{f}]$ governs preference for a very large class of acts. It is to be understood throughout the section that $U$ is any fixed utility. The truth of each lemma is intuitively clear, in the sense that each could justifiably be accepted as a postulate if need be. Since they are also easy to prove and of secondary interest, condensed proofs will suffice.

LEMMA 1 If, for every consequence $h$, $\mathbf{f} \le h$, and $\mathbf{g} \le h$; then $\mathbf{f} \doteq \mathbf{g}$.

PROOF. Consider in the light of P7 that $\mathbf{f} \le g(s)$ and $\mathbf{g} \le f(s)$ for every $s$. •

LEMMA 2 If there exists a consequence $f_0$ such that $\mathbf{f} \le f_0$, and if $U(f(s)) \le U_0$ for every $s$, then there exists a gamble $\mathsf{g}$ such that $\mathbf{f} \le \mathsf{g}$ and $U[\mathsf{g}] \le U_0$.

PROOF. If $U(f_0) \le U_0$, then $\mathsf{g}$ can be taken to consist of $f_0$ alone. Otherwise, let $f_1$ be any consequence such that $U(f_1) \le U_0$ and let $\mathsf{g}$ be the unique mixture of $f_0$ and $f_1$ such that $U[\mathsf{g}] = U_0$. •

LEMMA 3
HYP. 1. The $B_i$'s, $i = 1, \ldots, n$, are a partition, and the $U_i$'s are corresponding numbers.
2. $\mathbf{f}$ is an act such that $U(f(s)) \le U_i$ for $s \in B_i$.
3. $\mathsf{f}$ is a gamble such that $\mathsf{f} \le \mathbf{f}$.
CONCL. $U[\mathsf{f}] \le \sum U_i P(B_i)$.

PROOF. If the lemma were false, it would be false even for some $\mathsf{f} < \mathbf{f}$. Then it may be assumed, modifying $\mathbf{f}$ if need be by means of P6 and Lemma 1, that there exists for each $i$ an $f_i$ such that $\mathbf{f} \le f_i$ given $B_i$. Now, in view of Lemma 2, there exists for each $i$ a $\mathsf{g}_i$ such that $\mathbf{f} \le \mathsf{g}_i$ given $B_i$ and $U[\mathsf{g}_i] \le U_i$. Let $\mathsf{g} = \sum P(B_i)\mathsf{g}_i$, and observe that $\mathbf{f} \le \mathsf{g}$. Therefore, $U[\mathsf{f}] \le U[\mathsf{g}] = \sum P(B_i)U[\mathsf{g}_i] \le \sum P(B_i)U_i$. •
An act will be called bounded if its utility is, according to ordinary mathematical usage, an essentially bounded random variable; the notion is put in a more formal and self-contained way as follows: A bounded act is an act $\mathbf{f}$ such that, for some two numbers $U_0$ and $U_1$, $P\{U_0 \le U(f(s)) \le U_1\} = 1$. The definition is clearly not dependent on the choice of $U$.

THEOREM 1 If $\mathbf{f}$ and $\mathbf{g}$ are bounded, then $\mathbf{f} \le \mathbf{g}$, if and only if $U[\mathbf{f}] \le U[\mathbf{g}]$.
PROOF. If there exist $g$ and $h$ such that $g \le \mathbf{f} \le h$, then there is, by Theorem 2.4, a mixture $\mathsf{g}$ of $g$ and $h$ such that $\mathsf{g} \doteq \mathbf{f}$. The null event on which $U(f(s))$ is not between $U_0$ and $U_1$ may as well be disregarded; the rest can be partitioned into $n + 1$ events $B_i$ defined by the condition that $s \in B_i$ if and only if $V_{i-1} \le U(f(s)) < V_i$, $i = 1, \ldots, n + 1$, where

$$V_i = U_0 + \frac{i}{n}(U_1 - U_0), \qquad i = 0, \ldots, n + 1. \tag{2}$$

Applying Lemma 3 and its symmetric dual,

$$\sum V_{i-1}P(B_i) \le U[\mathsf{g}] \le \sum V_i P(B_i). \tag{3}$$

Similarly, according to Exercise 3 of Appendix 1,

$$\sum V_{i-1}P(B_i) \le U[\mathbf{f}] \le \sum V_i P(B_i). \tag{4}$$

Therefore

$$|U[\mathbf{f}] - U[\mathsf{g}]| \le \sum (V_i - V_{i-1})P(B_i) = (U_1 - U_0)/n, \tag{5}$$

whence $U[\mathbf{f}] = U[\mathsf{g}]$.

To consider the remaining case, suppose that the bounded act $\mathbf{f}$ exceeds (is exceeded by) every consequence; call it for the moment big (little). According to Lemma 1, all big (and, dually, all little) acts are equivalent to one another. Furthermore, it is, for example, easily seen that, if an act is big, then for $\epsilon > 0$,

$$P\{U(f(s)) \ge \sup_f U(f) - \epsilon\} = 1. \tag{6}$$

(Some may be more familiar with the notation "LUB" and "GLB," read "least upper bound" and "greatest lower bound," than with the corresponding "sup" and "inf," read "supremum" and "infimum." If even these older terms are not familiar, see Exercise 4 of Appendix 2.) Therefore, if there are big (little) acts, they all have the same expected utility, namely $\sup U(f)$ ($\inf U(f)$).

Suppose now that $\mathbf{f} \le \mathbf{g}$. It is possible that $\mathbf{f}$ and $\mathbf{g}$ are both little; that $\mathbf{f}$ is little, and $\mathbf{g}$ is equivalent to some gamble; that $\mathbf{f}$ is little and $\mathbf{g}$ big; that $\mathbf{f}$ and $\mathbf{g}$ are each equivalent to some gamble; that $\mathbf{f}$ is equivalent to some gamble, and $\mathbf{g}$ is big; or, finally, that they are both big. In each of these cases, a simple argument shows that $U[\mathbf{f}] \le U[\mathbf{g}]$. The converse arguments are similar. •

COROLLARY 1 If $\mathbf{f}$ and $\mathbf{g}$ are bounded, and $P(B) > 0$, then $\mathbf{f} \le \mathbf{g}$ given $B$, if and only if $E(U(\mathbf{f}) - U(\mathbf{g}) \mid B) \le 0$.
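The squeeze used in (3) through (5) of the proof of Theorem 1 can be seen on a small discrete example. This is a sketch under assumptions: the act is summarized by a list of (probability, utility) pairs with utilities between $U_0$ and $U_1$, the grid is taken to be equally spaced with step $(U_1 - U_0)/n$, and `sandwich` is an illustrative helper name.

```python
from fractions import Fraction


def sandwich(outcomes, U0, U1, n):
    """Return (lower, upper) sums over the grid V_i = U0 + i*(U1 - U0)/n.

    outcomes: (probability, utility) pairs with U0 <= utility <= U1.
    Each utility u is assigned to the cell with V_{i-1} <= u < V_i.
    """
    step = (U1 - U0) / n
    lower = Fraction(0)
    upper = Fraction(0)
    for p, u in outcomes:
        i = int((u - U0) // step) + 1  # cell index, 1 <= i <= n + 1
        lower += p * (U0 + (i - 1) * step)
        upper += p * (U0 + i * step)
    return lower, upper


# Assumed distribution of the utility of a bounded act.
outcomes = [
    (Fraction(1, 4), Fraction(0)),
    (Fraction(1, 2), Fraction(3, 10)),
    (Fraction(1, 4), Fraction(1)),
]
U0, U1, n = Fraction(0), Fraction(1), 10
lower, upper = sandwich(outcomes, U0, U1, n)
expected = sum(p * u for p, u in outcomes)
```

The expected utility lies between the two sums, and their gap is exactly $(U_1 - U_0)/n$, which shrinks to zero as $n$ grows; that is the whole content of (5).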
+ Peter Fishburn (1970, pp. 194, 206-207) and I have since discovered to my surprise that these postulates imply bounded utility, which puts the next several paragraphs in a new light.
I do not think, however, that the question is sufficiently interesting to warrant attention here, especially since there is some reason, first stated by Gabriel Cramer in a letter partially reproduced in [B10], to postulate that there are upper and lower bounds to utility, in which case all acts would necessarily be bounded.

Even without P7, the postulates imply, in the following sense, that no gamble has infinite or minus infinite utility. An act $\mathbf{f}$ has infinite (minus infinite) utility; if and only if, for some $g < (>)\ h$ and for every $\epsilon > 0$, there is a $B$ with $P(B) \le \epsilon$ and such that the act equal to $\mathbf{f}$ on $B$ and to $g$ on $\sim B$ exceeds (is exceeded by) $h$.
A gamble or a consequence would be said to have infinite (minus infinite) utility, if one of the acts corresponding to it had infinite (minus infinite) utility. Indeed, Theorem 2.4, a deduction from P1-6, obviously implies that there are no infinite or minus infinite gambles or consequences. It may, however, be mentioned that Pascal held that, in just the sense at hand, salvation is an infinite consequence ([P2], pp. 189-191). Again, it is often said, in effect, that the utility to a person of immediate death is a consequence of minus infinite utility, but casual observation shows that this is not true of anyone, at least not of anyone who would cross the street to greet a friend. In the same vein, medicine often gives lip service to the idea that the death of a patient is of minus infinite utility, and, of course, doctors do go to great lengths to keep their patients alive; but a doctor who took the idea too seriously would make a nuisance of himself and soon find himself with no patients to treasure.

If the utility of consequences is unbounded, say from above,† then, even in the presence of P1-7, acts (though not gambles) of infinite utility can easily be constructed. My personal feeling is that, theological questions aside, there are no acts of infinite or minus infinite utility, and that one might reasonably so postulate, which would amount to assuming utility to be bounded.

Justifiable though it might be, that assumption would entail a certain mathematical awkwardness in many practical contexts. For example, as will be discussed at greater length in Chapter 15, it sometimes seems reasonable to suppose that the penalty for acting as though a particular unknown number were $\hat{\mu}$ instead of its true value, $\mu$, is proportional to $(\hat{\mu} - \mu)^2$. But, if the possible values of $\mu$ are unbounded, then so are the possible values of $(\hat{\mu} - \mu)^2$, so utility is here taken to be unbounded. On close scrutiny of such an example one always finds that it is not really reasonable to assume the penalty even roughly proportional for large values of $(\hat{\mu} - \mu)^2$, but rather that large values are so improbable that the error made in misappraising the penalty associated with them is negligible compared to the saving in simplicity resulting from the misappraisal. If the assumption of bounded utility were made part of the theory of personal probability, then any example in which unbounded utility is used for mathematical simplicity would be in contradiction to the postulates. I propose, therefore, not to assume bounded utility formally, but to remember that problems involving unbounded utility are to be handled cautiously.

† That is, if for every $V$, there is a consequence $f$ such that $V < U(f)$.

To take stock of the chapter thus far, utility having been established, it is now superfluous to consider that consequences may be of all sorts, since the postulates imply that in virtually every context a consequence is adequately characterized by its utility, some one utility function having been chosen from the linear family of possibilities. Therefore, unless the contrary is clearly indicated, $f$, $g$, and $h$ will henceforth mean not exactly consequences in the sense used to date, but rather real numbers measuring utility in units to be called utiles. Correspondingly, an act $\mathbf{f}$ will henceforth be understood to be a real-valued random variable. The entire theory of preference, at least for bounded acts, can now be summarized by the following résumé:

R  $\mathbf{f} \le \mathbf{g}$ given $B$, if and only if $P(B) = 0$, or $E(\mathbf{f} - \mathbf{g} \mid B) \le 0$.
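The résumé R translates directly into a computation on a finite world. The following sketch assumes a finite set of states with given probabilities and acts given as dictionaries of utilities in utiles; the state names and numbers are hypothetical.

```python
from fractions import Fraction


def conditional_expectation(x, P, B):
    """E(x | B) for a random variable x on a finite world; requires P(B) > 0."""
    p_B = sum(P[s] for s in B)
    return sum(P[s] * x[s] for s in B) / p_B


def not_preferred_given(f, g, P, B):
    """R: f <= g given B, if and only if P(B) = 0 or E(f - g | B) <= 0."""
    if sum(P[s] for s in B) == 0:
        return True  # every act is equivalent to every other given a null event
    difference = {s: f[s] - g[s] for s in P}
    return conditional_expectation(difference, P, B) <= 0


# Assumed finite world; s4 is a null state.
P = {"s1": Fraction(1, 2), "s2": Fraction(1, 4), "s3": Fraction(1, 4), "s4": Fraction(0)}
f = {"s1": 4, "s2": 0, "s3": 0, "s4": 100}
g = {"s1": 1, "s2": 2, "s3": 6, "s4": 0}
everything = list(P)
```

In this assumed world `f` is not preferred to `g` overall, since $E(\mathbf{f} - \mathbf{g}) = -1/2$, yet given the event `{"s1"}` the preference reverses, and given the null event `{"s4"}` the two acts are equivalent.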
From now on, though not formulated as a postulate, it is to be assumed without further quibbling that R holds, provided only that $E(\mathbf{f})$ and $E(\mathbf{g})$ exist and are finite; no attempt will be made to compare acts for which the expected value does not exist or is infinite.

If a person is free to decide among a set $\mathbf{F}$ of acts, he will presumably choose one the expectation of which is $v(\mathbf{F})$, where

$$v(\mathbf{F}) = \sup_{\mathbf{f} \in \mathbf{F}} E(\mathbf{f}), \tag{7}$$

provided that such a one exists. This provision must be mentioned, even though a set $\mathbf{F}$ for which $v(\mathbf{F}) = \infty$ will, by convention, not be considered to give rise to a valid decision problem; for, if $\mathbf{F}$ is infinite in number, there may be no act in $\mathbf{F}$ with expectation quite as great as $v(\mathbf{F})$. Nonetheless, $v(\mathbf{F})$ may, in a sense, be regarded as the value or utility of the set of acts $\mathbf{F}$, as is discussed in the penultimate paragraph of § 6.5.
5 Small worlds

Allusion was made in the penultimate paragraph of § 2.5 to the practical necessity of confining attention to, or isolating, relatively simple situations in almost all applications of the theory of decision developed in this book. As was mentioned there, I find it difficult to say with any completeness how such isolated situations are actually arrived at and justified. The purpose of the present section is to take some steps toward the solution of that problem or, at any rate, to set the problem forth as clearly as I can. This section, though important for a critical evaluation of the thesis of this book, is not essential to a casual reading.

Making an extreme idealization, which has in principle guided the whole argument of this book thus far, a person has only one decision to make in his whole life. He must, namely, decide how to live, and this he might in principle do once and for all. Though many, like myself, have found the concept of overall decision stimulating, it is certainly highly unrealistic and in many contexts unwieldy.† Any claim to realism made by this book, or indeed by almost any theory of personal decision of which I know, is predicated on the idea that some of the individual decision situations into which actual people tend to subdivide the single grand decision do recapitulate in microcosm the mechanism of the idealized grand decision. One application of the theory of utility to overall decisions has, however, been attempted by Milton Friedman in [F11].

The problem of this section is to say as clearly as possible what constitutes a satisfactory isolated decision situation. The general method of attack I propose to follow, for want of a better one, is to talk in terms of the grand situation (tongue in cheek) and in those terms to analyze and discuss isolated decision situations. I hope you will be able to agree, as the discussion proceeds, that I do not lean too heavily on the concept of the grand decision situation. Consider a simple example.
Jones is faced with the decision whether to buy a certain sedan for a thousand dollars, a certain convertible also for a thousand dollars, or to buy neither and continue carless. The simplest analysis, and the one generally assumed, is that Jones is deciding between three definite and sure enjoyments, that of the sedan, the convertible, or the thousand dollars. Chance and uncertainty are considered to have nothing to do with the situation. This simple analysis may well be appropriate in some contexts; however, it is not difficult to recognize that Jones must in fact take account of many uncertain future possibilities in actually making his choice. The relative fragility of the convertible will be compensated only if Jones's hope to arrange a long vacation in a warm and scenic part of the country actually materializes; Jones would not buy a car at all if he thought it likely that he would immediately be faced by a financial emergency arising out of the sickness of himself or of some member of his family; he would be glad to put the money into a car, or almost any durable goods, if he feared extensive inflation. This brings out the fact that what are often thought of as consequences (that is, sure experiences of the deciding person) in isolated decision situations typically are in reality highly uncertain. Indeed, in the final analysis, a consequence is an idealization that can perhaps never be well approximated. I therefore suggest that we must expect acts with actually uncertain consequences to play the role of sure consequences in typical isolated decision situations.

† Unrealistic though the concept is, it would be a mistake, arising out of elliptical presentation, to suppose that the concept predicates the choice of a complete life-long policy by new-born babies. If a person ever reached such a level of maturity as to be able to make a lifelong choice for his life from that time on, he would then become a person to whom the concept could be literally applied.

Suppose now, to elaborate the example, that Jones is presented with a choice between tickets in several different lotteries such that, whichever he chooses and whatever tickets are drawn, he will win either nothing, the sedan, the convertible, or a thousand dollars. None of these four consequences, not even "nothing," is actually a sure consequence in the strict sense, as I think you will now understand. I propose to analyze Jones's present decision situation in terms of a "small world." The more colloquial Greek word, microcosm, will be reserved for a special kind of small world to be described later. To describe the state of the small world is to say which prize is associated with each of the tickets offered to Jones. The small-world acts actually available to Jones are acceptance of one or another of the tickets. The generic small-world act is an arbitrary function taking as its value one of the four small-world consequences according to which small-world state obtains. It will be noticed that the small-world states are in fact events in the grand world, that indeed they constitute a partition of the grand world. If there are an infinite number of small-world states, as indeed there must be, if the small world is to satisfy the postulates P1-7, then the partition in question becomes an infinite partition.†

† Technical note: It is mathematically more general and elegant not to insist that the small world have states at all, but rather to speak of a special class of events as small-world events. This class should be closed under complements and finite unions. In short, the small-world events, and thereby the small world itself, constitute a Boolean subalgebra of the Boolean algebra of the grand-world events.

These considerations lead to the following technical definitions. Let the grand world $S$ be, as always, a set with elements $s, s', \ldots$. The grand-world consequences $F$ may as well be taken to be a bounded set of real numbers. The grand-world acts are then real-valued functions $\mathbf{f}, \mathbf{g}, \mathbf{h}, \ldots$. The preference ordering between acts is determined by the condition that $\mathbf{f} \le \mathbf{g}$ if and only if

$$E(\mathbf{f} - \mathbf{g}) \le 0, \tag{1}$$
where the expected value indicated in (1) is derived from a probability measure $P$ characteristic of the grand world or, to be more exact, of the person's attitude toward the grand world.

The construction of a small world $\bar{S}$ from the grand world $S$ begins with the partition of $S$ into subsets, or small-world states, $\bar{s}, \bar{s}', \ldots$ (not necessarily finite in number). Throughout this technical discussion, it will be necessary to bear in mind certain double interpretations such as that $\bar{s}$ is both an element of $\bar{S}$ and a subset of $S$. Strictly speaking, a small-world event $\bar{B}$ in $\bar{S}$ is a collection of subsets of $S$ and not itself a subset of $S$. However, the union of all the elements of $\bar{B}$, regarded as subsets of $S$, is an event in $S$; call it $[\bar{B}]$.

The small world, as I mean to define it, is determined not only by the definition of a state, but also by the definition of small-world consequences. A small-world consequence is a grand-world act. A set $\bar{F}$ of grand-world acts, regarded as small-world consequences, is thus part of the definition of any given small world. It will be mathematically simplest, and cost little if anything in insight, to suppose that the elements of $\bar{F}$ are finite in number. They will be denoted $\bar{f}, \bar{g}, \bar{h}, \ldots$; and, when the small-world consequence $\bar{f}$ is recognized as a grand-world act, $\bar{f}(s)$ will denote the grand-world consequence of $\bar{f}$ at the grand-world state $s$.

A small-world act $\bar{\mathbf{f}}$ is, of course, a function from small-world states $\bar{s}$ to small-world consequences $\bar{f}$. In this isolated technical discussion, we will hobble along with the notations $\bar{\mathbf{f}}(\bar{s})$ for the small-world consequence attached to $\bar{s}$ by $\bar{\mathbf{f}}$, and $\bar{\mathbf{f}}(s; \bar{s})$ for the grand-world consequence attached to $s$ by $\bar{\mathbf{f}}(\bar{s})$ recognized as a grand-world act. Each small-world act $\bar{\mathbf{f}}$ gives rise to a unique grand-world act $\hat{\mathbf{f}}$, defined thus:

$$\hat{\mathbf{f}}(s) = \bar{\mathbf{f}}(s; \bar{s}(s)), \tag{2}$$

where $\bar{s}(s)$ means that small-world state $\bar{s}$ of which the grand-world state $s$ is an element.
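The distinction between $\bar{\mathbf{f}}$ and $\hat{\mathbf{f}}$, like some other distinctions I have thought it worth while to make in the present complicated context, is perhaps pedantic. At any rate, it is to be understood as part of the definition of a small world that $\bar{\mathbf{f}} \le \bar{\mathbf{g}}$ if and only if $\hat{\mathbf{f}} \le \hat{\mathbf{g}}$, that is, in view of (1), if and only if $E(\hat{\mathbf{f}}) \le E(\hat{\mathbf{g}})$. In this connection, it is useful to note that

$$\begin{aligned}
E(\hat{\mathbf{f}}) &= \sum_{\bar{f}} E(\hat{\mathbf{f}} \mid \bar{\mathbf{f}}(\bar{s}(s)) = \bar{f})\, P(\bar{\mathbf{f}}(\bar{s}(s)) = \bar{f}) \\
&= \sum_{\bar{f}} E(\bar{f} \mid \bar{\mathbf{f}}(\bar{s}(s)) = \bar{f})\, P(\bar{\mathbf{f}}(\bar{s}(s)) = \bar{f}).
\end{aligned} \tag{3}$$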
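Equation (3) can be checked on a toy small world. Everything in the sketch below is an assumption made for illustration: six equally likely grand-world states, three small-world states forming a partition, and two small-world consequences ("car" and "goat") that are themselves grand-world acts with made-up utilities.

```python
from fractions import Fraction

# Grand world: six equally likely states (assumed).
P = {s: Fraction(1, 6) for s in range(1, 7)}
# Partition of the grand world into small-world states a, b, c.
small_state = {1: "a", 2: "a", 3: "b", 4: "b", 5: "c", 6: "c"}
# Small-world consequences are grand-world acts: utility at each grand-world state.
consequences = {
    "car": {1: 9, 2: 7, 3: 8, 4: 6, 5: 9, 6: 9},
    "goat": {1: 1, 2: 2, 3: 0, 4: 3, 5: 1, 6: 2},
}
# A small-world act: one small-world consequence per small-world state.
act = {"a": "car", "b": "goat", "c": "car"}


def induced(act):
    """The grand-world act induced by a small-world act, as in (2)."""
    return {s: consequences[act[small_state[s]]][s] for s in P}


def expectation(x):
    return sum(P[s] * x[s] for s in P)


def partition_formula(act):
    """Right-hand side of (3): sum over small-world consequences of E(. | event) * P(event)."""
    total = Fraction(0)
    for name, consequence in consequences.items():
        event = [s for s in P if act[small_state[s]] == name]
        p_event = sum(P[s] for s in event)
        if p_event == 0:
            continue  # a consequence the act never awards contributes nothing
        conditional = sum(P[s] * consequence[s] for s in event) / p_event
        total += conditional * p_event
    return total


direct = expectation(induced(act))
by_partition = partition_formula(act)
```

The direct expectation of the induced grand-world act and the sum over small-world consequences agree, as (3) asserts.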
It may be advantageous to review (3), and thereby the whole technical definition of a small world, in terms of an example. A small-world act, typified by the purchase of a lottery ticket, amounts to accepting the consequences of one of several ordinary grand-world acts according to which element of a partition does in fact obtain. For example, the participant in a lottery may drive away a car, lead away a goat, face a firing squad, or remain in the status quo, according to the terms of the lottery and according to which ticket he has in fact drawn. Letting the example of the lottery stand for the general situation, the expected utility of a lottery ticket can be computed by the partition formula (3.5.3) from the conditional expectation associated with each ticket, which is what (3) does.

It may fairly be said that a lottery prize is not an act, but rather the opportunity to choose from a number of acts. Thus a cash prize puts its possessor in a position to choose among many purchases he could not otherwise afford. I believe that analysis to be more nearly correct, but it is more complicated; and, if one thinks of each set of acts made available by a lottery prize as represented by a best act of that set, the more complicated analysis seems superfluous, at least in a first attack.

A small world is completely satisfactory for the use to which I mean to put it, if and only if it itself satisfies the seven postulates and leads to (more technically, agrees with) a probability $\bar{P}$ such that

$$\bar{P}(\bar{B}) = P([\bar{B}]) \tag{4}$$

for all $\bar{B} \subset \bar{S}$ and has a utility $\bar{U}$ such that

$$\bar{U}(\bar{f}) = E(\bar{f}) \tag{5}$$

for all $\bar{f} \in \bar{F}$. For the present context, call such a completely satisfactory small world a microcosm; if the small world satisfies the postulates, but does not necessarily admit $\bar{P}$ as its probability nor $\bar{U}$ as a utility, call it a pseudo-microcosm.

To display the circumstances under which a small world is a pseudo-microcosm, I shall briefly comment on each of the postulates in the form given on the end papers of this book, referring to them here as
$\bar{\mathrm{P}}$1-7, as opposed to P1-7, to emphasize that they are here being considered with respect to $\bar{S}$ and $\bar{F}$.
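$\bar{\mathrm{P}}$1 Simple ordering. Automatically satisfied. Indeed it is directly implied by P1.

$\bar{\mathrm{P}}$2 Conditional preference well defined. Automatic.

$\bar{\mathrm{P}}$3 Conditional preference does not affect consequences. Requires exactly that, for every $\bar{f}, \bar{f}' \in \bar{F}$ and $\bar{B} \subset \bar{S}$, either: $\bar{f} \le \bar{f}'$ given $[\bar{B}]$, if and only if $\bar{f} \le \bar{f}'$; or $\bar{g} \le \bar{h}$ given $[\bar{B}]$ for every $\bar{g}, \bar{h} \in \bar{F}$. In these inequalities the elements of $\bar{F}$ are of course interpreted as grand-world acts.

$\bar{\mathrm{P}}$4 Qualitative personal probability well defined. Requires exactly that, if $\bar{f} < \bar{g}$ and $h_B \le h_C$, where

(6) $h_B(s) = \bar{g}(s)$ for $s \in [\bar{B}]$, $h_B(s) = \bar{f}(s)$ for $s \in \sim[\bar{B}]$;
    $h_C(s) = \bar{g}(s)$ for $s \in [\bar{C}]$, $h_C(s) = \bar{f}(s)$ for $s \in \sim[\bar{C}]$;

then $h'_B \le h'_C$, where $h'_B$ and $h'_C$ are defined in terms of $\bar{f}', \bar{g}'$, $\bar{f}' < \bar{g}'$, in analogy with (6). This postulate is automatic in case $\bar{F}$ has at most two elements.

$\bar{\mathrm{P}}$5 The person has some definite preference. Requires $\bar{f} < \bar{g}$ for some $\bar{f}, \bar{g} \in \bar{F}$.

$\bar{\mathrm{P}}$6 Partition of worlds into tiny events.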
It is clear that this postulate is not automatic, that is, it is not implied by the validity of P1-7 for the grand world. It is not even implied by P1-7 together with $\bar{\mathrm{P}}$1-5, though in the presence of all these $\bar{\mathrm{P}}$6 could undoubtedly be weakened. There seems to be little to gain in the present context by reducing $\bar{\mathrm{P}}$6 to such minimal terms, nor by expressing it, as $\bar{\mathrm{P}}$1-5 have been expressed, in grand-world terms alone; for $\bar{\mathrm{P}}$6 does not lend itself easily to such treatment, though it would be easy to decide in any instance whether $\bar{\mathrm{P}}$6 obtained without undue reference to the grand world.
$\bar{\mathrm{P}}$7 Strong form of sure-thing principle. Automatic, in view of the explicit assumption that $\bar{F}$ has only a finite number of elements.

To summarize, a small world is a pseudo-microcosm, if and only if it satisfies $\bar{\mathrm{P}}$3-6. The possibility of enlarging an arbitrary small world in such a way as to satisfy those conditions has already been implicitly discussed in connection with P3-6. To recall the arguments that were adduced, one might review the example about the egg in § 3.1, and the further discussion of that example in the opening paragraph of § 3.2; the remark in § 3.2, introducing P5; and the example about the coin following P6' in § 3.3.

It is encouraging to possess the arguments just cited tending to show that any small world can without overwhelming difficulty be embedded in a somewhat larger small world that is a pseudo-microcosm. A pseudo-microcosm is, however, completely satisfactory, only if it is actually a microcosm, that is, only if it leads to a probability measure and a utility well articulated with those of the grand world. The problem of deciding under what circumstances that occurs is much facilitated by the fact that the probability measure and a utility of a pseudo-microcosm can be written down explicitly, as the next few paragraphs show.

To study the problem, suppose the small world is a pseudo-microcosm. Then, in view of $\bar{\mathrm{P}}$5, let $\bar{g}$, $\bar{h}$ be elements of $\bar{F}$ such that $\bar{g} < \bar{h}$, and let
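(7) $Q(\bar{B}) = \dfrac{E(\bar{h} - \bar{g} \mid [\bar{B}])\, P([\bar{B}])}{E(\bar{h} - \bar{g})} = E^{-1}(\bar{h} - \bar{g}) \displaystyle\int_{[\bar{B}]} \{\bar{h}(s) - \bar{g}(s)\}\, dP(s).$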
By using $\bar{\mathrm{P}}$3 to check the positivity, it is easily verified that $Q$ is a probability measure on $\bar{S}$. The probability measure $Q$ agrees with the relation $\le$ between small-world events, which is easily verified on rewriting (3) for the special small-world act $\bar{\mathbf{h}}_B$ that takes the value $\bar{h}$ for $\bar{s} \in \bar{B}$ and $\bar{g}$ for $\bar{s} \in \sim\bar{B}$ thus:

(8) $E(\bar{\mathbf{h}}_B) = E(\bar{h} \mid [\bar{B}])\, P([\bar{B}]) + E(\bar{g} \mid \sim[\bar{B}])\, P(\sim[\bar{B}])$
    $= E(\bar{h} - \bar{g} \mid [\bar{B}])\, P([\bar{B}]) + E(\bar{g})$
    $= E(\bar{h} - \bar{g})\, Q(\bar{B}) + E(\bar{g}).$

Since $\bar{g}$ and $\bar{h}$ are essentially arbitrary, there are many ways to construct a probability measure that agrees with the relation $\le$ between small-world events, but, in the presence of $\bar{\mathrm{P}}$1-6, all of them must (in view of Corollary 3.3.1) be the same as $Q$. That consideration leads to the formula

(9) $E(\bar{f} - \bar{f}' \mid [\bar{B}])\, P([\bar{B}]) = E(\bar{f} - \bar{f}')\, Q(\bar{B})$

for all $\bar{f}, \bar{f}' \in \bar{F}$ and $\bar{B} \subset \bar{S}$.
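Using (9) and recalling that $\bar{U}(\bar{f})$ has been defined as $E(\bar{f})$, (3) can be rewritten thus (with $\bar{f}'$ any fixed element of $\bar{F}$ and $\bar{f}_i$ ranging over the values of the small-world act $\bar{\mathbf{f}}$):

(10) $E(\bar{\mathbf{f}}) = E(\bar{f}') + \sum_i E(\bar{f}_i - \bar{f}' \mid \bar{\mathbf{f}}(\bar{s}(s)) = \bar{f}_i)\, P(\bar{\mathbf{f}}(\bar{s}(s)) = \bar{f}_i)$
     $= \sum_i \bar{U}(\bar{f}_i)\, Q(\bar{\mathbf{f}}(\bar{s}) = \bar{f}_i).$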
The question whether a given pseudo-microcosm is really a microcosm is the question whether $Q(\bar{B}) = P([\bar{B}])$ and whether $\bar{U}$ is a utility for the pseudo-microcosm. The answer to the second part is immediate and, I think, somewhat surprising, for (10) shows that for any pseudo-microcosm $\bar{U}$ is indeed a utility. Unfortunately, the condition $Q(\bar{B}) = P([\bar{B}])$ is not also automatic. The possibility of its failing to be satisfied is illustrated by the following simple mathematical example. Let $S$ be the unit square $0 \le x, y \le 1$, and let

(11) $E(f) = \displaystyle\int_0^1 \!\! \int_0^1 f(x, y)\, dx\, dy.$

It is of no real moment that the integral in (11), if understood in the Lebesgue or Riemann sense, is not defined for all bounded functions. Let the elements of $\bar{S}$ be the vertical line segments, $x$ = constant. Finally, suppose that the elements of $\bar{F}$ consist of the function zero and any finite number of non-negative multiples of a fixed positive function $h = h(x, y)$. It is easy to verify that $\bar{S}$ as thus defined is a pseudo-microcosm and that

(12) $Q(\bar{B}) = \displaystyle\int_{x' \in \bar{B}} q(x')\, dx',$

where, as follows from (7) with $\bar{g} = 0$,

(13) $q(x) = \dfrac{\int_0^1 h(x, y)\, dy}{\int_0^1 \!\! \int_0^1 h(x', y')\, dx'\, dy'}.$

Unless $q$ is 1 for every $x$, which will not at all typically be the case, $\bar{S}$ is not really a microcosm.
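The failure of $Q(\bar{B}) = P([\bar{B}])$ in this example can be checked numerically. The following Python sketch is only an illustration under stated assumptions: the particular function h(x, y) = 1 + x is a hypothetical choice (the text fixes no specific h), the small-world event is taken to be the strip of vertical segments with 0 <= x <= 1/2, and the integrals are approximated by a midpoint rule.

```python
# Numerical sketch of the unit-square example, equations (11)-(13).
# Assumption: h(x, y) = 1 + x is an illustrative positive function, not from the text.

def h(x, y):
    return 1.0 + x

def midpoint_integral(f, x_lo, x_hi, n=400):
    """Midpoint-rule approximation of the integral of f over [x_lo, x_hi] x [0, 1]."""
    dx = (x_hi - x_lo) / n
    dy = 1.0 / n
    total = 0.0
    for i in range(n):
        x = x_lo + (i + 0.5) * dx
        for j in range(n):
            y = (j + 0.5) * dy
            total += f(x, y)
    return total * dx * dy

def grand_world_probability(a, b):
    """P([B]) for the strip a <= x <= b: its area, by (11)."""
    return b - a

def small_world_probability(a, b):
    """Q(B) from (7) with g = 0: integral of h over the strip divided by E(h)."""
    return midpoint_integral(h, a, b) / midpoint_integral(h, 0.0, 1.0)

p_left = grand_world_probability(0.0, 0.5)
q_left = small_world_probability(0.0, 0.5)
print(p_left, q_left)  # the two values differ, so this small world is not a microcosm
```

With this h, the density q of (13) is (1 + x)/1.5, which is not identically 1; replacing h by a function that does not depend on x would make the two probabilities coincide.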
The general condition that a pseudo-microcosm be a microcosm, i.e., that $Q(\bar{B}) = P([\bar{B}])$, is evidently, in view of (9),

(14) $E(\bar{f} - \bar{f}' \mid [\bar{B}]) = E(\bar{f} - \bar{f}')$

for every $\bar{f}, \bar{f}' \in \bar{F}$, and every $\bar{B}$ for which $P([\bar{B}]) > 0$. Incidentally, that condition alone practically implies that a small world $\bar{S}$, not necessarily assumed to be a pseudo-microcosm, is a real microcosm. More exactly, it implies all the postulates $\bar{\mathrm{P}}$1-7, except $\bar{\mathrm{P}}$6; and it implies that the probability measure $\bar{P}$ agrees with the relation $\le$ between small-world events. Also, if a small world is a pseudo-microcosm, it is enough that (14) should hold for some pair of functions for which the right-hand side of the equation does not vanish. Equation (14) is, however, unsatisfactory in that it seems incapable of verification without taking the grand world much too seriously. Some consolation may derive from the fact that if $\bar{f}$ and $\bar{f}'$ are constants they automatically satisfy (14). Two such absolute, or grand-world, consequences would suffice, for, as has just been remarked, it is sufficient that (14) be satisfied for two materially different small-world consequences, in the presence of $\bar{\mathrm{P}}$1-7 (which are verifiable without any detailed knowledge of the grand world). It must, however, be admitted, as has already been mentioned, that the very idea of a grand-world consequence takes the grand world pretty seriously, a point forced into my reluctant mind by a conversation with Francesco Brambilla.

I feel, if I may be allowed to say so, that the possibility of being taken in by a pseudo-microcosm that is not a real microcosm is remote, but the difficulty I find in defining an operationally applicable criterion is, to say the least, ground for caution. There certainly seem to be cases in which one could confidently assume (14), though thus far formal analysis of the source of such security escapes me. Consider, for example, a lottery in which numbered tickets are drawn from a drum. It seems clear that for an ordinary person the outcome of the lottery is utterly irrelevant to his life, except through the rules of the lottery itself. In other terms equally loose, the value of a thousand dollars, or of a car, to a person would not ordinarily depend at all on what numbers were drawn in a lottery, unless the person himself (or perhaps some other person or organization with whom he had some degree of contact) held tickets in the lottery. A more precise formulation, which does indeed imply (14), is that the events that represent the outcome of the lottery are all statistically
independent of the grand-world acts, or functions, that typically enter as prizes in a lottery. This suggests once more that it would be desirable, if possible, to find a simple qualitative personal description of independence between events. (Compare the first paragraph after (3.5.2).)

6 Historical and critical comments on utility

A casual historical sketch of the concept of utility will perhaps have some interest as history. At any rate, most of the critical ideas pertaining to utility that I wish to discuss find their places in such a sketch as conveniently as in any other organization I can devise. Much more detailed material on the history of utility, especially in so far as the economics of risk bearing is concerned, is to be found in Arrow's review article [A6]. Stigler's historical study [S18] emphasizes the history of the now almost obsolete economic notion of utility in riskless situations, a notion still sometimes confused with the one under discussion.

As was mentioned in § 4.5, the earliest mathematical studies of probability were largely concerned with gambling, particularly with the question of which of several available cash gambles is most advantageous. Early probabilists advanced the maxim that the gamble with the highest expected winnings is best or, in terms of utility, that wealth measured in cash is a utility function. Some sense can be seen in that maxim, which will here be called by its traditional though misleading name, the principle of mathematical expectation. First, it has often been argued that the principle follows for the long run from the weak law of large numbers, applied to large numbers of independent bets, in each of which only sums that the gambler considers small are to be won or lost.
Second, Daniel Bernoulli, who, in [B10], was one of the first to introduce a general idea of utility corresponding to that developed in the preceding three sections, made the following analysis of the principle, which justifies its application in limited but important contexts. If the consequences $f$ to be considered are all quantities of cash, it is reasonable to suppose that $U(f)$ will change smoothly with changes in $f$. Therefore, if a person's present wealth is $f_0$, and he contemplates various gambles, none of which can greatly change his wealth, the utility function can, for his particular purpose, be approximated by its tangent at $f_0$, that is,

(1) $U(f) \approx U(f_0) + (f - f_0)\, U'(f_0),$

a linear function of $f$. Since a constant term is irrelevant to any comparison of expected values, the approximation amounts to regarding utility as proportional to wealth, that is, to following the principle of
mathematical expectation. So far as I know, the only other argument for the principle that has ever been advanced is one concerning equity between two players. As Bernoulli says, that argument is irrelevant at best; and neither of the relevant arguments justifies categorical acceptance of the principle. None the less, the principle was at first so categorically accepted that it seemed paradoxical to mathematicians of the early eighteenth century that presumably prudent individuals reject the principle in certain real and hypothetical decision situations. Daniel Bernoulli (1700-1782), in the paper [B10], seems to have been the first to point out that the principle is at best a rule of thumb, and he there suggested the maximization of expected utility as a more valid principle. Daniel Bernoulli's paper reproduces portions of a letter from Gabriel Cramer to Nicholas Bernoulli, which establishes Cramer's chronological priority to the idea of utility and most of the other main ideas of Bernoulli's paper. But it is Bernoulli's formulation together with some of the ideas that were specifically his that became popular and have had widespread influence to the present day. It is therefore appropriate to review Bernoulli's paper in some detail. Being unable to read Latin, I follow the German edition [B11]. Bernoulli begins by reminding his readers that the principle of mathematical expectation, though but weakly supported, had theretofore dominated the theory of behavior in the face of uncertainty. He says that, though many arguments had been given for the principle, they were all based on the irrelevant idea of equity among players. It is hard to believe that he had never heard the argument justifying the principle for the long run, even though the weak law of large numbers was then only in its mathematical infancy. Ars Conjectandi [B12],
then a fairly up-to-date and most eminent treatise on probability, does seem to give only the argument about equity, and that in countless forms. This treatise by Daniel's uncle, Jacob (= James) Bernoulli (1654-1705), incidentally, contains the first mathematical advance toward the weak law, proving it for the special case of repeated trials. Many examples show that the principle of mathematical expectation is not universally applicable. Daniel Bernoulli promptly presents one: "To justify these remarks, let us suppose a pauper happens to acquire a lottery ticket by which he may with equal probability win either nothing or 20,000 ducats. Will he have to evaluate the worth of the ticket as 10,000 ducats; and would he be acting foolishly, if he sold it for 9,000 ducats?" Other examples occur later in the paper as illustrations of the use of the utility concept. Thus a prudent merchant may insure his ship against loss at sea, though he understands perfectly well that he
is thereby increasing the insurance company's expected wealth, and to the same extent decreasing his own. Such behavior is in flagrant violation of the principle of mathematical expectation, and to one who held that principle categorically it would be as absurd to insure as to throw money away outright. But the principle is neither obvious nor deduced from other principles regarded as obvious; so it may be challenged, and must be, because everyone agrees that it is not really insane to insure.
Bernoulli cites a third, now very famous, example illustrating that men of prudence do not invariably obey the principle of mathematical expectation. This example, known as the St. Petersburg paradox (because of the journal in which Bernoulli's paper was published) had earlier been publicized by Nicholas Bernoulli, † and Daniel acknowledges it as the stimulus that led to his investigation of utility. Suppose, to state the St. Petersburg paradox succinctly, that a person could choose between an act leaving his wealth fixed at its present magnitude or one that would change his wealth at random, increasing it by $(2^n - f)$ dollars with probability $2^{-n}$ for every positive integer $n$. No matter how large the admission fee $f$ may be, the expected income of the random act is infinite, as may easily be verified. Therefore, according to the principle of mathematical expectation, the random act is to be preferred to the status quo. Numerical examples, however, soon convince any sincere person that he would prefer the status quo if $f$ is at all large. If $f$ is \$128, for example, there is only 1 chance in 64 that a person choosing the random act will so much as break even, and he will otherwise lose at least \$64, a jeopardy for which he can seek compensation only in the prodigiously improbable winning of a prodigiously high prize.

Appealing to intuition, Bernoulli says that the cash value of a person's wealth is not its true, or moral, worth to him. Thus, according to Bernoulli, the dollar that might be precious to a pauper would be nearly worthless to a millionaire, or, better, to the pauper himself were he to become a millionaire. Bernoulli then postulates that people do seek to maximize the expected value of moral worth, or what has been called moral expectation. Operationally, the moral worth of a person's wealth, so far as it concerns behavior in the face of uncertainty, is just what I would call the utility of the wealth, and moral expectation is expectation of utility.
† Daniel refers to this Nicholas Bernoulli as his uncle, but, in view of dates mentioned in the last section of Daniel's paper and the genealogy in Chapter 8 of [B9], I think he must have meant his elder cousin (1687-1759), perhaps using "uncle" as a term of deference.
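The arithmetic of the St. Petersburg example can be checked with a short Python sketch. It assumes exactly the gamble stated above (prize 2^n with probability 2^-n, less a fee f); the function names are hypothetical, and the truncation at a finite number of outcomes is an illustrative device for exhibiting the divergence of the expected prize, not part of the original argument.

```python
from fractions import Fraction

def break_even_probability(fee):
    """Probability that the prize 2**n is at least the fee, where P(n) = 2**-n."""
    n = 1
    while 2 ** n < fee:
        n += 1
    # P(N >= n) = sum over k >= n of 2**-k = 2**-(n - 1)
    return Fraction(1, 2 ** (n - 1))

def smallest_loss(fee):
    """Smallest loss among losing outcomes (assumes fee > 2, so that losing is possible)."""
    n = 1
    while 2 ** (n + 1) < fee:
        n += 1
    return fee - 2 ** n  # fee minus the largest prize below the fee

def truncated_expected_prize(max_n):
    """Expected prize counting only outcomes n = 1..max_n; each term contributes 1."""
    return sum(Fraction(1, 2 ** n) * 2 ** n for n in range(1, max_n + 1))

print(break_even_probability(128))   # chance of at least breaking even with a fee of 128
print(smallest_loss(128))            # least amount lost when the gamble is lost
print(truncated_expected_prize(50))  # grows without bound as max_n increases
```

Because every outcome contributes exactly one dollar to the expected prize, the truncated expectation equals the number of outcomes counted, which is why no finite fee makes the expected income finite.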
It seems mystical, however, to talk about moral worth apart from probability and, having done so, doubly mystical to postulate that this undefined quantity serves as a utility. These obvious criticisms have naturally led many to discredit the very idea of utility, but §§ 2-4 show (following von Neumann and Morgenstern) that there is a more cogent, though not altogether unobjectionable, path to that concept. Bernoulli argued, elaborating the example of the pauper and the millionaire, that a fixed increment of cash wealth typically results in an ever smaller increment of moral wealth as the basic cash wealth to which the increment applies is increased. He admitted the possibility of examples in which this law of diminishing marginal utility, as it has come to be called in the literature of economics, might fail. For example, a relatively small sum might be precious to a wealthy prisoner who required it to complete his ransom. But Bernoulli insisted that such examples are unusual and that as a general rule the law may be assumed. In mathematical terms, the law says that utility as a function of money is a concave (i.e., the negative of a convex) function. † It follows from the basic inequality concerning convex functions (Theorem 1 of Appendix 2) that a person to whom the law of diminishing marginal utility applies will always prefer the status quo to any fair gamble, that is, to any random act for which the change in his expected wealth is zero, and that he will always be willing to pay something in addition to its actuarial, or expected, value for insurance against any loss to himself. The law of diminishing marginal utility has been very popular, and few who have considered utility since Bernoulli have discarded it, or even realized that it was not necessarily part and parcel of the utility idea. Of course, the law has been embraced eagerly and uncritically by those who have a moral aversion to gambling.
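The two consequences of concavity just stated (refusal of fair gambles, and willingness to pay more than the actuarial value for insurance) can be illustrated numerically. The Python sketch below is an illustration only: it assumes the natural logarithm as the concave utility, and the wealth, stakes, and loss probability are hypothetical figures chosen for the example.

```python
import math

def expected_utility(utility, outcomes):
    """outcomes is a list of (probability, wealth) pairs."""
    return sum(p * utility(w) for p, w in outcomes)

wealth = 1000.0  # hypothetical present wealth

# A fair gamble: win or lose 100 with equal probability (expected change is zero).
fair_gamble = [(0.5, wealth - 100.0), (0.5, wealth + 100.0)]
u_status_quo = math.log(wealth)
u_gamble = expected_utility(math.log, fair_gamble)

# Insurance: a loss of 500 with probability 0.1, so the actuarial value is 50.
uninsured = [(0.1, wealth - 500.0), (0.9, wealth)]
actuarial_value = 0.1 * 500.0
u_uninsured = expected_utility(math.log, uninsured)
# Largest premium worth paying solves log(wealth - premium) = u_uninsured.
max_premium = wealth - math.exp(u_uninsured)

print(u_gamble < u_status_quo)        # the status quo is preferred to the fair gamble
print(max_premium > actuarial_value)  # more than the expected loss would be paid
```

Any strictly concave utility would give the same two qualitative conclusions; only the size of the acceptable premium depends on the particular function assumed.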
Bernoulli went further than the law of diminishing marginal utility and suggested that the slope of utility as a function of wealth might, at least as a rule of thumb, be supposed, not only to decrease with, but to be inversely proportional to, the cash value of wealth. This, he pointed out, is equivalent to postulating that utility is equal to the logarithm (to any base) of the cash value of wealth. To this day, no other function has been suggested as a better prototype for Everyman's utility function. None the less, as Cramer pointed out in his aforementioned letter, the logarithm has a serious disadvantage; for, if the logarithm were the utility of wealth, the St. Petersburg paradox could be
† Often the meanings of "convex" and "concave" as applied to functions are interchanged. A function is here called convex if it appears convex, in the ordinary sense of the word, when viewed from below. Such a function is, of course, also concave from above, whence the confusion. Cf. Appendix 2.
amended to produce a random act with an infinite expected utility (i.e., an infinite expected logarithm of income) that, again, no one would really prefer to the status quo. To take a less elaborate example, suppose that a man's total wealth, including an appraisal of his future earning power, were a million dollars. If the logarithm of wealth were actually his utility, he would as soon as not flip a coin to decide whether his wealth should be changed to ten thousand dollars (roughly \$500 per year) or a hundred million dollars. This seems preposterous to me. At any rate, I am sure you can construct an example along the same lines that will seem preposterous to you. Cramer therefore concluded, and I think rightly, that the utility of cash must be bounded, at least from above. It seems to me that a good argument can also be adduced for supposing utility to be bounded from below, for, however wealth may be interpreted, we all subject our total wealth to slight jeopardy daily for the sake of a large probability of avoiding more moderate losses. But the logarithm is unbounded both from above and from below; so, though it might be a reasonable approximation to a person's utility in a moderate range of wealth, it cannot be taken seriously over extreme ranges. Bernoulli's ideas were accepted wholeheartedly by Laplace [L1], who was very enthusiastic about the applications of probability to all sorts of decision problems. It is my casual impression, however, that from the time of Laplace until quite recently the idea of utility did not strongly influence either mathematical or practical probabilists. For a long period economists accepted Bernoulli's idea of moral wealth as the measurement of a person's well-being apart from any consideration of probability.
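The coin-flip example in the preceding paragraph rests on a simple identity: a million is the geometric mean of ten thousand and a hundred million, so logarithmic utility is exactly indifferent between the flip and the status quo. The Python sketch below checks this. The bounded function used for contrast, 1 - exp(-w/c) with c equal to one million, is purely a hypothetical illustration of a utility bounded from above; it is not a form proposed in the text.

```python
import math

wealth_now = 1_000_000
low, high = 10_000, 100_000_000

# Logarithmic utility: expected utility of the coin flip versus the status quo.
log_flip = 0.5 * math.log10(low) + 0.5 * math.log10(high)
log_keep = math.log10(wealth_now)

def bounded_utility(w, c=1_000_000.0):
    """Hypothetical utility bounded above by 1 (an assumption made for illustration)."""
    return 1.0 - math.exp(-w / c)

bounded_flip = 0.5 * bounded_utility(low) + 0.5 * bounded_utility(high)
bounded_keep = bounded_utility(wealth_now)

print(math.isclose(log_flip, log_keep))  # indifference under logarithmic utility
print(bounded_flip < bounded_keep)       # this bounded utility rejects the flip
```

The contrast shows only that boundedness can remove the indifference that seems preposterous; which bounded function, if any, describes a given person is left open.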
Though "utility" rather than "moral worth" has been the popular name for this concept among English-speaking economists, it is my impression that Bernoulli's paper is the principal, if not the sole, source of the notion for all economists, though the paper itself may often have been lost sight of. Economists were for a time enthusiastic about the principle of diminishing marginal utility, and they saw what they believed to be reflections of it in many aspects of everyday life. Why else, to paraphrase Alfred Marshall (pp. 19, 96 of [M2]), does a poor man walk in a rain that induces a rich man to take a cab? During the period when the probability-less idea of utility was popular with economists, they referred not only to the utility of money, but also to the utility of other consequences such as commodities (and services) and combinations (or, better, patterns of consumption) of commodities. The theory of choice among consequences was expressed by the idea that, among the available consequences, a person prefers those
that have the highest utility for him. Also, the idea of diminishing marginal utility was extended from money to other commodities. The probability-less idea of utility in economics has been completely discredited in the eyes of almost all economists, the following argument against it (originally advanced by Pareto in pp. 158-159 and the Mathematical Appendix of [P1]) being widely accepted. If utility is regarded as controlling only consequences, rather than acts, it is not true (as it is when acts, or at least gambles, are considered and the formal definition in § 3 is applied) that utility is determined except for a linear transformation. Indeed, confining attention to consequences, any strictly monotonically increasing function of one utility is another utility. Under these circumstances there is little, if any, value in talking about utility at all, unless, of course, special economic considerations should render one utility, or say a linear family of utilities, of particular interest. That possibility remains academic to date, though one attempt to exploit it was made by Irving Fisher, as is briefly discussed in the paragraph leading to Footnote 155 of [S18]. In particular, utility as a function of wealth can have any shape whatsoever in the probability-less context, provided only that the function in question is increasing with increasing wealth, the provision following from the casual observation that almost nobody throws money away. The history of probability-less utility has been thoroughly reported by Stigler [S18]. What, then, becomes of the intuitive arguments that led to the notion of diminishing marginal utility? To illustrate, consider the poor man and the rich man in the rain.
Those of us who consider diminishing marginal utility nonsensical in this context think it sufficient to say simply that it is a common observation that rich men spend money freely to avoid moderate physical suffering whereas poor men suffer freely rather than make corresponding expenditures of money; in other terms, that the rate of exchange between circumstances producing physical discomfort and money depends on the wealth of the person involved. In recent years there has been revived interest in Bernoulli's ideas of utility in the technical sense of §§ 2-4, that is, as a function that, so to speak, controls decisions among acts, or at least gambles. Ramsey's essays in [R1], which in spirit closely resemble the first five chapters of this book, present a relatively early example of this revival of interest. Ramsey improves on Bernoulli in that he defines utility operationally in terms of the behavior of a person constrained by certain postulates. Ramsey's essays, though now much appreciated, seem to have had relatively little influence. Between the time of Ramsey and that of von Neumann and Morgenstern there was interest in breaking away from the idea of maximizing expected utility, at least so far as economic theory was concerned (cf. [T1a]). This trend was supported by those who said that Bernoulli gives no reason for supposing that preferences correspond to the expected
value of some function, and that therefore much more general possibilities must be considered. Why should not the range, the variance, and the skewness, not to mention countless other features, of the distribution of some function join with the expected value in determining preference? The question was answered by the construction of Ramsey and again by that of von Neumann and Morgenstern, which has been slightly extended in §§ 2-4; it is simply a mathematical fact that, almost any theory of probability having been adopted and the sure-thing principle having been suitably extended, the existence of a function whose expected value controls choices can be deduced. That does not mean that as a theory of actual economic behavior the theory of utility is absolutely established and cannot be overthrown. Quite the contrary, it is a theory that makes factual predictions many of which can easily be observed to be false, but the theory may have some value in making economic predictions in certain contexts where the departures from it happen not to be devastating. Moreover, as I have been arguing, it may have value as a normative theory. Von Neumann and Morgenstern initiated among economists and, to a lesser extent, also among statisticians an intense revival of interest in the technical utility concept by their treatment of utility, which appears as a digression in [V4]. The von Neumann-Morgenstern theory of utility has produced this reaction, because it gives strong intuitive grounds for accepting the Bernoullian utility hypothesis as a consequence of well-accepted maxims of behavior. To give readers of this book some idea of the von Neumann-Morgenstern theory, I may repeat that the treatment of utility as applied to gambles presented in § 3 is virtually copied from their book [V4]. Indeed, their ideas on this subject are responsible for almost all of my own. One idea now held by me that I think von Neumann and Morgenstern do not explicitly support, and that so far as I know they might not wish to have attributed to them, is the normative interpretation of the theory.

Of course, much of the new interest in utility takes the form of criticism and controversy. The greater part of this discussion that has come to my attention has not yet been published. A list of references leading to most of that which has is [B7], [W14], [S1], [C4], [F13], [A2]. I shall successively discuss each of the recent major criticisms of the modern theory of utility known to me. My method in each case will
be first to state the criticism in a form resembling those in which it is typically put forward, regardless of whether I consider that form well chosen. I will then discuss the criticism, elaborating its meaning and indicating its rebuttal, when there seems to me to be one.

(a) Modern economic theorists have rigorously shown that there is no meaningful measure of utility. More specifically, if any function $U$ fulfills the role of a utility, then so does any strictly monotonically increasing function of $U$. It must, therefore, be an error to conclude that every utility is a linear function of every other.

This argument has been advanced with a seriousness that is surprising, considering that it concedes little intelligence or learning to the proponents of the utility theory under discussion and considering that it results, as will immediately be explained, from the baldest sort of a terminological confusion. To be fair, I must go on to say that I have never known the argument to be defended long in the presence of the explanation I am about to give. In ordinary economic usage, especially prior to the work of von Neumann and Morgenstern, a utility associated with gambles would presumably be simply a function $U$ associating numbers with gambles in such a way that $\mathbf{f} \le \mathbf{g}$, if and only if $U(\mathbf{f}) \le U(\mathbf{g})$; though economic discussion of utility was, prior to von Neumann and Morgenstern, almost exclusively confined to consequences rather than to gambles or to acts. It is unequivocally true, as I have already brought out, that any monotonic function of a utility in this wide classical sense is itself a utility. What von Neumann and Morgenstern have shown, and what has been recapitulated in § 3, is that, granting certain hypotheses, there exists at least one classical utility $V$ satisfying the very special condition
(2) $V(\alpha \mathbf{f} + \beta \mathbf{g}) = \alpha V(\mathbf{f}) + \beta V(\mathbf{g}),$

where $\mathbf{f}$ and $\mathbf{g}$ are any gambles and $\alpha$, $\beta$ are non-negative numbers such that $\alpha + \beta = 1$. Furthermore, if I may for the moment call a classical utility satisfying (2) a von Neumann-Morgenstern utility, every von Neumann-Morgenstern utility is an increasing linear function of every other. To put the point differently, the essential conclusion of the von Neumann-Morgenstern utility theory is that (2) can be satisfied by a classical utility, but not by very many. The confusion arises only because von Neumann and Morgenstern use the already pre-empted word "utility" for what I here call "von Neumann-Morgenstern utility." In retrospect, that seems to have been a mistake in tactics, but one of no long-range importance.
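The distinction between a classical utility and one satisfying (2) can be seen in a small numerical sketch. The Python below is an illustration under stated assumptions: the three consequences and the values assigned to them are hypothetical, gambles are represented as dictionaries of probabilities, and "value of a gamble" means the expectation of the given function over consequences.

```python
# Hypothetical utility values on three consequences, chosen for illustration only.
V = {"a": 0.0, "b": 0.6, "c": 1.0}

def gamble_value(utility, gamble):
    """Expected utility of a gamble given as {consequence: probability}."""
    return sum(p * utility[c] for c, p in gamble.items())

sure_b = {"b": 1.0}            # consequence b for certain
mix_ac = {"a": 0.5, "c": 0.5}  # even chance of a or c

linear = {c: 3.0 * v + 2.0 for c, v in V.items()}  # increasing linear function of V
cubed = {c: v ** 3 for c, v in V.items()}          # increasing, but not linear

def ranks_consequences_alike(u1, u2):
    """True if u1 and u2 order the consequences identically."""
    return sorted(u1, key=u1.get) == sorted(u2, key=u2.get)

def prefers_sure_b(utility):
    return gamble_value(utility, sure_b) > gamble_value(utility, mix_ac)

print(ranks_consequences_alike(V, linear), ranks_consequences_alike(V, cubed))
print(prefers_sure_b(V), prefers_sure_b(linear), prefers_sure_b(cubed))
```

All three functions order the consequences in the same way, which is the sense in which any increasing function of a classical utility is again one; but only the linear transform agrees with V about the gambles, which is what the restriction to increasing linear functions expresses.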
(b) The postulates leading to the von Neumann-Morgenstern concept of utility are arbitrary and gratuitous.

Such a view can, of course, always be held without the slightest fear of rigorous refutation, but a critic holding it might perhaps be persuaded away from it by a reformulation of the postulates that he might find more appealing than the original set, or by illuminating examples. In particular, P1-7 are quite different from, but imply, the postulates of von Neumann and Morgenstern. Incidentally, the main function of the von Neumann-Morgenstern postulates themselves is to put the essential content of Daniel Bernoulli's "postulate" into a form that is less gratuitous in appearance. At least one serious critic, who had at first found the system of von Neumann and Morgenstern gratuitous, changed his mind when the possibility of deriving certain aspects of that system from the sure-thing principle was pointed out to him.

(c) The sure-thing principle goes too far. For example, if two lotteries with cash prizes (not necessarily positive) are based on the same set of lottery tickets and so arranged that the prize that will be assigned to any ticket by the second lottery is at least as great as the prize assigned to that ticket by the first lottery, then there is no doubt that virtually any person would find a ticket in the first lottery not preferable to the same ticket in the second lottery. If, however, the prizes in each lottery are themselves lottery tickets, such that the prize associated with any ticket in the first lottery is not preferred by the person under study to the prize associated with the same ticket by the second lottery, the conclusion that the person will not prefer a ticket in the first lottery to the same ticket in the second is no longer compelling.

This point resembles the preceding one in that the intuitive appeal of an assumption can at most be indicated, not proved.
I do think it cogent, however, to stress in connection with this particular point that a cash pme is to a large extent a lottery ticket in that the uncertainty 88 to what will become of 8 person if he baa a gift of a thousand dollars is not in principle ditTerent from the uncertainty about what will J>e. come of him if he holds a lottery ticket of considerable actuarial value. Perhaps an adherent to the eriticism in question would think it relevant to reply thus: Though cash sums are indeed essentially lot~ry tickets, & sum of money is worth at least &8 much to a person as a. smaller IWD, in a peculiarly definite and objective 88D8e, because money can,. if one desires, always be quickly and quietly thrown away, thereby making any sum available to a pe1150n who already has a larger sum.
But I have never heard that reply made, nor do I here plead ita cogency.
(d) An actual systematic deviation from the sure-thing principle and, with it, from the von Neumann-Morgenstern theory of utility, can be exhibited. For example, a person might perfectly reasonably prefer to subsist on a packet of Army K rations per meal than on two ounces of the best caviar per meal. It is then to be expected, according to the sure-thing principle, that the person would prefer the K rations to a lottery ticket yielding the K rations with probability 9/10 and the caviar diet with probability 1/10. That expectation is no doubt fulfilled, if the lottery is understood to determine the person's year-long diet once and for all. But, if the person is able to have at each meal a lottery ticket offering him the K rations or the caviar with the indicated probabilities, it is not at all unlikely, granting that he likes caviar and has some storage facilities, that he will prefer this "lottery diet." This conclusion is in defiance of the principle that "the theory of consumer demand is a static theory." (Cf. [W14].)

I admit that the theory of utility is not static in the indicated sense, as the foregoing example conclusively shows. But there is not the slightest reason to think of a lottery producing either a steady diet of caviar or a steady diet of K rations as being the same lottery as one having a multitude of different prizes almost all of which are mixed chronological programs of caviar and K rations. The fact that a theory of consumer behavior in riskless situations happens to be static in the required sense (under certain special assumptions about storability and the linearity of prices) is no argument at all that the theory of consumer behavior in risky circumstances should be static in the same sense (as I mention in a note appended to [W14]).

(e) If the von Neumann-Morgenstern theory of utility is not static, it is not subject to repeated empirical observation and is therefore vacuous. (Cf. [W14].)
I think the discussion in § 3.1 of how to determine the preferences of a hot man for a swim, a shower, and a glass of beer, and the discussion in § 5 of the practicality of identifying pseudo-microcosms are steps toward showing how the theory can be put to empirical test without making repeated trials on any one person.

(f) Casual observation shows that real people frequently and flagrantly behave in disaccord with the utility theory, and that in fact behavior of that sort is not at all typically considered abnormal or irrational.

Two different topics call for discussion under this heading. In the first place, it is undoubtedly true that the behavior of people does often
flagrantly depart from the theory. None the less, all the world knows from the lessons of modern physics that a theory is not to be altogether rejected because it is not absolutely true. It seems not unreasonable to suppose, and examples could easily be cited to confirm, that in the extremely complicated subject of the behavior of people very crude theory can play a useful role in certain contexts.

Second, many apparent exceptions to the theory can be so reinterpreted as not to be exceptions at all. For example, a flier may be observed doing a stunt that risks his life, apparently for nothing. That seems to be in complete violation of the theory; but, if in addition it is known that the flier has a real and practical need to convince certain colleagues of his courage, then he is simply paying for advertising with the risk of his life, which is not in itself in contradiction to the theory. Or, suppose that it were known more or less objectively that the flier has a need to demonstrate his own courage to himself. The theory would again be rescued, but this time perhaps not so convincingly as before. In general, the reinterpretation needed to reconcile various sorts of behavior with the utility theory is sometimes quite acceptable and sometimes so strained as to lay whoever proposes it open to the charge of trying to save the theory by rendering it tautological. The same sort of thing arises in connection with many theories, and I think there is general agreement that no hard-and-fast rule can be laid down as to when it becomes inappropriate to make the necessary reinterpretation. For example, the law of the conservation of energy (or its atomic age variant, the law of the conservation of mass and energy) owes its success largely to its being an expression of remarkable and reliable facts of nature, but to some extent also to certain conventions by which new sorts of energy are so defined as to keep the law true.
A stimulating discussion of this delicate point in connection with the theory of utility is given by Samuelson in [S1].

(g) Introspection about certain hypothetical decision situations suggests that the sure-thing principle and, with it, the theory of utility are normatively unsatisfactory.

Consider an example based on two decision situations each involving two gambles.†

Situation 1. Choose between
Gamble 1. $500,000 with probability 1; and
Gamble 2. $2,500,000 with probability 0.1, $500,000 with probability 0.89, status quo with probability 0.01.

† This particular example is due to Allais [A2]. Another interesting example was presented somewhat earlier by Georges Morlat [C4].
Situation 2. Choose between
Gamble 3. $500,000 with probability 0.11, status quo with probability 0.89; and
Gamble 4. $2,500,000 with probability 0.1, status quo with probability 0.9.

Many people prefer Gamble 1 to Gamble 2, because, speaking qualitatively, they do not find the chance of winning a very large fortune in place of receiving a large fortune outright adequate compensation for even a small risk of being left in the status quo. Many of the same people prefer Gamble 4 to Gamble 3; because, speaking qualitatively, the chance of winning is nearly the same in both gambles, so the one with the much larger prize seems preferable. But the intuitively acceptable pair of preferences, Gamble 1 preferred to Gamble 2 and Gamble 4 to Gamble 3, is not compatible with the utility concept or, equivalently, the sure-thing principle. Indeed that pair of preferences implies the following inequalities for any hypothetical utility function.

\[
\begin{aligned}
U(\$500{,}000) &> 0.1\,U(\$2{,}500{,}000) + 0.89\,U(\$500{,}000) + 0.01\,U(\$0), \\
0.1\,U(\$2{,}500{,}000) + 0.9\,U(\$0) &> 0.11\,U(\$500{,}000) + 0.89\,U(\$0);
\end{aligned}
\tag{3}
\]

and these are obviously incompatible.
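The incompatibility of the two inequalities in (3) can also be checked numerically. The following is only a sketch in Python: the function names and the trial utilities are hypothetical, chosen for illustration, and exact fractions are used so that no rounding enters. It shows that the two expected-utility differences always add up to zero, so they cannot both be positive.

```python
from fractions import Fraction as F

# Gambles of Situations 1 and 2, written as {prize in dollars: probability}.
GAMBLES = {
    1: {500_000: F(1)},
    2: {2_500_000: F(10, 100), 500_000: F(89, 100), 0: F(1, 100)},
    3: {500_000: F(11, 100), 0: F(89, 100)},
    4: {2_500_000: F(10, 100), 0: F(90, 100)},
}


def expected_utility(gamble, utility):
    """Expected utility of a gamble under a utility table {prize: utility}."""
    return sum(prob * utility[prize] for prize, prob in gamble.items())


def preference_gaps(utility):
    """Return (EU(Gamble 1) - EU(Gamble 2), EU(Gamble 4) - EU(Gamble 3))."""
    eu = {k: expected_utility(g, utility) for k, g in GAMBLES.items()}
    return eu[1] - eu[2], eu[4] - eu[3]


def both_preferences_hold(utility):
    """True only if Gamble 1 beats Gamble 2 and Gamble 4 beats Gamble 3."""
    gap_12, gap_43 = preference_gaps(utility)
    return gap_12 > 0 and gap_43 > 0


# Hypothetical utilities, assumed here purely for illustration.
trial = {0: F(0), 500_000: F(10), 2_500_000: F(12)}
gap_12, gap_43 = preference_gaps(trial)
print(gap_12, gap_43, gap_12 + gap_43, both_preferences_hold(trial))
```

Whatever utilities are substituted for the trial values, the two gaps are negatives of one another, which is the algebraic content of the remark that the inequalities are incompatible.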
Examples† like the one cited do have a strong intuitive appeal; even if you do not personally feel a tendency to prefer Gamble 1 to Gamble 2 and simultaneously Gamble 4 to Gamble 3, I think that a few trials with other prizes and probabilities will provide you with an example appropriate to yourself.

If, after thorough deliberation, anyone maintains a pair of distinct preferences that are in conflict with the sure-thing principle, he must abandon, or modify, the principle; for that kind of discrepancy seems intolerable in a normative theory. Analogous circumstances forced D. Bernoulli to abandon the theory of mathematical expectation for that of utility [B10]. In general, a person who has tentatively accepted a normative theory must conscientiously study situations in which the theory seems to lead him astray; he must decide for each by reflection (deduction will typically be of little relevance) whether to retain his initial impression of the situation or to accept the implications of the theory for it.

To illustrate, let me record my own reactions to the example with
† Allais has announced (but not yet published) an empirical investigation of the responses of prudent, educated people to such examples [A2].
which this heading was introduced. When the two situations were first presented, I immediately expressed preference for Gamble 1 as opposed to Gamble 2 and for Gamble 4 as opposed to Gamble 3, and I still feel an intuitive attraction to those preferences. But I have since accepted the following way of looking at the two situations, which amounts to repeated use of the sure-thing principle.

One way in which Gambles 1-4 could be realized is by a lottery with a hundred numbered tickets and with prizes according to the schedule shown in Table 1.

TABLE 1. PRIZES IN UNITS OF $100,000 IN A LOTTERY REALIZING GAMBLES 1-4

                               Ticket number
                             1     2-11   12-100
Situation 1    Gamble 1      5       5       5
               Gamble 2      0      25       5
Situation 2    Gamble 3      5       5       0
               Gamble 4      0      25       0
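That the schedule of Table 1 does realize the four gambles can be confirmed mechanically. The sketch below is in Python, with hypothetical names, and it assumes that each of the hundred tickets is equally likely to be drawn. It recovers the prize distribution of each gamble from the table and lists the ticket columns on which two gambles of the same situation actually differ.

```python
from fractions import Fraction as F

# Number of tickets in each column of Table 1 (100 tickets in all).
TICKETS = {"1": 1, "2-11": 10, "12-100": 89}

# Prizes in units of $100,000, one row per gamble, as in Table 1.
SCHEDULE = {
    1: {"1": 5, "2-11": 5, "12-100": 5},
    2: {"1": 0, "2-11": 25, "12-100": 5},
    3: {"1": 5, "2-11": 5, "12-100": 0},
    4: {"1": 0, "2-11": 25, "12-100": 0},
}


def prize_distribution(gamble):
    """Probability of each prize, assuming every ticket is equally likely."""
    dist = {}
    for column, prize in SCHEDULE[gamble].items():
        dist[prize] = dist.get(prize, F(0)) + F(TICKETS[column], 100)
    return dist


def columns_that_matter(gamble_a, gamble_b):
    """Ticket columns on which the two gambles pay different prizes."""
    return [c for c in TICKETS if SCHEDULE[gamble_a][c] != SCHEDULE[gamble_b][c]]


for gamble in SCHEDULE:
    print(gamble, prize_distribution(gamble))
print(columns_that_matter(1, 2), columns_that_matter(3, 4))
```

In both situations only tickets 1 through 11 distinguish the two gambles, and on those tickets the rows for Gambles 1 and 2 coincide with the rows for Gambles 3 and 4 respectively.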
Now, if one of the tickets numbered from 12 through 100 is drawn, it will not matter, in either situation, which gamble I choose. I therefore focus on the possibility that one of the tickets numbered from 1 through 11 will be drawn, in which case Situations 1 and 2 are exactly parallel. The subsidiary decision depends in both situations on whether I would sell an outright gift of $500,000 for a 10-to-1 chance to win $2,500,000, a conclusion that I think has a claim to universality, or objectivity. Finally, consulting my purely personal taste, I find that I would prefer the gift of $500,000 and, accordingly, that I prefer Gamble 1 to Gamble 2 and (contrary to my initial reaction) Gamble 3 to Gamble 4.

It seems to me that in reversing my preference between Gambles 3 and 4 I have corrected an error. There is, of course, an important sense in which preferences, being entirely subjective, cannot be in error; but in a different, more subtle sense they can be. Let me illustrate by a simple example containing no reference to uncertainty. A man buying a car for $2,134.56 is tempted to order it with a radio installed, which will bring the total price to $2,228.41, feeling that the difference is trifling. But, when he reflects that, if he already had the car, he certainly would not spend $93.85 for a radio for it, he realizes that he has made an error.
One thing that should be mentioned before this chapter is closed is that the law of diminishing marginal utility plays no fundamental role
in the von Neumann-Morgenstern theory of utility, viewed either empirically or normatively. Therefore the possibility is left open that utility as a function of wealth may not be concave, at least in some intervals of wealth. Some economic-theoretical consequences of recognition of the possibility of non-concave segments of the utility function have been worked out by Friedman and myself [F12], and by Friedman alone [F11]. The work of Friedman and myself on this point is criticized by Markowitz [M1].+

+ See also Archibald (1959) and Hakansson (1970).
CHAPTER 6

Observation

1 Introduction

With the construction of utility, the theory of decision in the face of uncertainty is, in a sense, complete. I have no further postulates to propose, and those I have proposed have been shown to be equivalent to the assumption that the person always decides in favor of an act the expected utility of which is as large as possible, supposing for simplicity that only a finite number of acts are open to him. At the level of generality that has led to this conclusion there seems to be little or nothing left to say. To go further now means to go into more detail, to investigate special types of decision problems.

One type of decision problem of central importance is that in which the person is called upon to make an observation and then to choose some act in the light of the outcome of the observation. The consideration of such observational decision problems is a step toward those problems of great interest for statistics in which the person must decide what observation to make, that is, of course, what to look at, not what to see. They are the problems of designing experiments and other observational programs. Some remarks on observation were made in Chapter 3, but only now that the theory of utility is established is it possible to give a relatively complete analysis of the concept.

Observation is a concept essential to the study of statistics proper, most of what has been said thus far being preliminary to, but not really part of, statistics; even after this chapter and the next one, on observation, there will still remain a major transition. One important feature of much of what is ordinarily called statistics is, according to my analysis, concerned with the behavior not of an isolated person, but of a group of persons acting, for example, in concert.
In later chapters I will deal, so far as I am able, with the problem of group action, but preliminary considerations bearing on it will be made and pointed out from time to time in this chapter and the next.

Though the details of these two chapters may seem mathematically forbidding, drastic simplifying assumptions are made in them to keep
extraneous difficulties to a minimum. These typically take the form of assuming that certain sets of acts, events, and values of random variables are finite. Even in elementary applications of the theory, these simplifying assumptions seldom actually hold. In some contexts, it is quite elementary to relax them sufficiently; in others, serious mathematical effort has been required; and some are still at the frontier of research. Relaxations of the assumptions will be touched on from time to time, sometimes explicitly but sometimes only implicitly in the choice of suggestive notation and nomenclature.

Beyond this introduction, the present chapter is divided into four sections: § 2 analyzes informally and then formally the notion of a cost-free observation; §§ 3 and 4 discuss certain obvious but important conditions under which one observation, and similarly one set of acts, is more valuable than another; § 5 abstractly discusses problems of designing experiments or, perhaps more generally, observational programs.
2 What an observation is

To begin with an informal survey of observation, consider a decision problem, that is, a person faced with a decision among several acts. Calling it the basic decision problem and the acts associated with it the basic acts, a new decision problem would arise, if the person were informed before he made his decision that a particular event, say B, obtained. The new decision problem is related to the basic decision problem in a simple way; for the acts associated with it are also the basic acts, and the decision is to be made by computing the expected utility given B of the basic acts and deciding on one that maximizes the conditional expected utility.

The basic problem may be modified in still another, though closely related, way. Let the person say in advance, for each possible B_i, which of the basic acts he will decide on when he is informed, as he is to be, which element B_i of a given partition obtains. This will be called the derived decision problem arising from the basic decision problem and the observation of i, and its acts will be called derived acts. Technically speaking, the derived acts are determined by arbitrarily assigning one basic act to each element of the partition. For any state s, the consequence of a derived act is the consequence for s of the basic act associated with the particular B_i in which s lies. The terms informally introduced in this paragraph are defined formally later in the section.

A derived decision problem is not necessarily different in kind from the basic problem; indeed it is quite possible that the basic problem can itself be viewed as derived from some other basic problem and observation.
Formidable though the description of a derived problem may seem at first reading, its solution is, in a sense, easy and has already almost been given; for it is clear that, if P(B_i) > 0, the person will decide to associate with B_i a basic act the expected utility of which given B_i is as high as possible, and, if P(B_i) = 0, it is immaterial to the person which basic act is associated with B_i.

It is almost obvious that the value of a derived problem cannot be less, and typically is greater, than the value of the basic problem from which it is derived. After all, any basic act is among the derived acts, so that any expected utility that can be attained by deciding on a basic act can be attained by deciding on the same basic act considered as a derived act. In short, the person is free to ignore the observation. That obvious fact is the theory's expression of the commonplace that knowledge is not disadvantageous.

It sometimes happens that a real person avoids finding something out or that his friends feel duty bound to keep something from him, saying that what he doesn't know can't hurt him; the jealous spouse and the hypochondriac are familiar tragic examples. Such apparent exceptions to the principle that forewarned is forearmed call for analysis. At first sight, one might be inclined to say that the person who refuses freely proffered information is behaving irrationally and in violation of the postulates. But perhaps it is better to admit that information that seems free may prove expensive by doing psychological harm to its recipient. Consider, for example, a sick person who is certain that he has the best of medical care and is in a position to find out whether his sickness is mortal. He may decide that his own personality is such that, though he can continue with some cheer to live in the fear that he may possibly die soon, what is left of his life would be agony, if he knew that death were imminent. Under such circumstances, far from calling him irrational, we might extol the person's rationality, if he abstained from the information. On the other hand, such an interpretation may seem forced. (Cf. Criticism (f) of § 5.6.)

Examples of decisions based on observation are on every hand, but it will be worth while to examine one in some detail before undertaking an abstract mathematical analysis of such decisions. Any example would have to be highly idealized for simplicity, because the complexity of any real decision problem defies complete explicit description, but particular simplicity is in order here.

The person in the example is considering whether to buy some of the grapes he sees in a grocery store and, if so, in what quantity. To his taste, the grapes may be of any of three qualities, poor, fair, and excellent. Call the qualities Q generically and 1, 2, and 3 individually. From
what the person knows at the moment, including of course the appearance of the grapes, he cannot be certain of their quality, but he attaches personal probability to each of the three possibilities according to Table 1.

TABLE 1. P(Q)

Q(uality)          1      2      3
P(robability)     1/4    1/2    1/4
The person can decide to buy 0, 1, 2, or 3 pounds of grapes; these are the basic acts of the example. Taking one consideration with another, he finds the consequences of each act, measured in utiles, in each of the three possible events to be those given in the body of Table 2. The expected utilities in the right margin of Table 2 follow, of course, from Table 1 and the body of Table 2.

TABLE 2. UTILITY f(Q) FOR EACH f AND EACH Q

                Q
f        1      2      3      E(f)
0        0      0      0       0
1       -1      1      3       1
2       -3      0      5      1/2
3       -6     -2      6      -1
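The right margin of Table 2 can be recomputed from Table 1 and the body of Table 2. The following Python sketch does so with exact fractions; the variable and function names are hypothetical and serve only this illustration.

```python
from fractions import Fraction as F

# Table 1: personal probability of each quality Q (1 poor, 2 fair, 3 excellent).
P_Q = {1: F(1, 4), 2: F(1, 2), 3: F(1, 4)}

# Table 2: utility, in utiles, of buying 0 to 3 pounds when the quality is Q.
UTILITY = {
    0: {1: 0, 2: 0, 3: 0},
    1: {1: -1, 2: 1, 3: 3},
    2: {1: -3, 2: 0, 3: 5},
    3: {1: -6, 2: -2, 3: 6},
}


def expected_utility(pounds):
    """E(f) for the basic act of buying the given number of pounds."""
    return sum(P_Q[q] * UTILITY[pounds][q] for q in P_Q)


def best_basic_act():
    """The basic act with the largest expected utility."""
    return max(UTILITY, key=expected_utility)


for pounds in UTILITY:
    print(pounds, expected_utility(pounds))
print("best basic act:", best_basic_act(), "pound(s)")
```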
The entries in Table 2 have not been chosen haphazardly, but with an attempt at verisimilitude. Thus it is supposed that if the person buys grapes of poor quality his dissatisfaction with the bargain will accelerate rapidly with the amount bought, which seems reasonable, especially if the keeping quality of poor grapes is low. He is, of course, unaffected by the quality if he buys none. Again, buying a few fair grapes may be mildly desirable, but overbuying is not. Finally, excellent grapes are worth buying, even in large quantities, but the utility of the purchase increases less than proportionally to the amount bought.

The correct solution of the basic decision problem is to buy 1 pound of grapes; for that act has, according to the right margin of Table 2, an expected utility of 1, which is the largest that can be attained.

Now, suppose the person is free to make an observation, that is, a new observation in addition to those that may have contributed to the determination of the probabilities in the basic problem. It may be, for example, that the grocer invites him to eat a few of the grapes or that the person is going to ask the woman beside him how they look to her. Let there be five possible outcomes of his observation; call them x
generically and 1, 2, 3, 4, and 5 individually. I assume, though this feature is rather incidental to the example, that low values of x tend to be suggestive of low quality.

The joint distribution of x and Q, that is, the probability that x and Q simultaneously have any given pair of values, is of central technical importance. Those probabilities, each multiplied by 128 for simplicity of presentation, are given in the body of Table 3. The right-hand and bottom margins of the table give,

TABLE 3. 128 P(x ∩ Q)

                   Q
x            1      2      3     128P(x)
1           15      5      1       21
2           10     15      2       27
3            4     24      4       32
4            2     15     10       27
5            1      5     15       21
128P(Q)     32     64     32      128
also multiplied by 128, the probability of each value of x and of each value of Q. The marginal entries are, of course, obtained by adding rows and columns. As indicated in the lower right-hand corner of the table, the probabilities assumed do indeed add up to 1, and the bottom margin recapitulates Table 1.

Conditional probabilities can easily be read from Table 3. Thus, for example, the conditional probability that x is 2, given that Q is 3, is 2/32, and the conditional probability that Q is 2, given that x is 4, is 15/27. It will be seen in later sections that the distribution of x given Q is, in a sense, even more fundamental than the joint distribution of x and Q.

There are 4^5 = 1,024 derived acts, since one of the four basic acts can be assigned arbitrarily to each of the five possible outcomes of the observation. It is an easy exercise, using Tables 2 and 3, to verify Table 4, which shows the conditional expectation of the utility of each

TABLE 4. E(f | x)

                          x
f        1         2         3         4         5
0       0/21*     0/27      0/32      0/27      0/21
1      -7/21     11/27*    32/32*    43/27     49/21
2     -40/21    -20/27      8/32     44/27*    72/21
3     -94/21    -78/27    -48/32     18/27     74/21*

(Entries marked * are those set in italics in the printed table.)
basic act given each possible outcome of the observation. For each x, the highest expected utility, given that value of x, has been italicized. Thus, for example, only if x is 1 will the person refrain from buying grapes altogether, and only if x is 5 will he risk buying 3 pounds. In full, the best derived act, call it g, is to buy 0, 1, 1, 2, or 3 pounds, if x is 1, 2, 3, 4, or 5, respectively. The value of the derived problem is the expected value of g, which is computed thus:

\[
E(g) = \sum_x E(g \mid x)\,P(x) = (0 + 11 + 32 + 44 + 74)/128 = 161/128 \approx 1.26 \text{ utiles.}
\tag{1}
\]

Since the value of the basic problem is 1 utile, the envisaged observation is worth 0.26 utile; that is, the person would if necessary pay up to 0.26 utile for the observation.

Exercise

1. Suppose that the person could directly observe the quality of the grapes. Show that his best derived act would then yield 2 utiles, and show that it could not possibly lead him to buy 2 pounds of the grapes.
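The conditional expectations of Table 4 and the value computed in (1) can be reproduced from Tables 2 and 3. The Python sketch below is one way of doing so; the names are hypothetical, and the computation assumes, as the example does implicitly, that the utility of a purchase depends on the state only through the quality Q.

```python
from fractions import Fraction as F

# Table 2: utility of buying 0 to 3 pounds when the quality is Q.
UTILITY = {
    0: {1: 0, 2: 0, 3: 0},
    1: {1: -1, 2: 1, 3: 3},
    2: {1: -3, 2: 0, 3: 5},
    3: {1: -6, 2: -2, 3: 6},
}

# Table 3: 128 * P(x and Q), one row per observed value x, one column per Q.
JOINT_128 = {
    1: {1: 15, 2: 5, 3: 1},
    2: {1: 10, 2: 15, 3: 2},
    3: {1: 4, 2: 24, 3: 4},
    4: {1: 2, 2: 15, 3: 10},
    5: {1: 1, 2: 5, 3: 15},
}


def prob_x(x):
    """Marginal probability P(x), the right margin of Table 3 divided by 128."""
    return F(sum(JOINT_128[x].values()), 128)


def conditional_expected_utility(pounds, x):
    """E(f | x): an entry of Table 4."""
    row = JOINT_128[x]
    return F(sum(row[q] * UTILITY[pounds][q] for q in row), sum(row.values()))


def best_act_given(x):
    """The basic act with the highest expected utility given x."""
    return max(UTILITY, key=lambda pounds: conditional_expected_utility(pounds, x))


def value_of_derived_problem():
    """Expected utility of the best derived act, as in equation (1)."""
    return sum(
        prob_x(x) * conditional_expected_utility(best_act_given(x), x)
        for x in JOINT_128
    )


print([best_act_given(x) for x in JOINT_128])
print(value_of_derived_problem(), float(value_of_derived_problem()))
```

Subtracting the value of the basic problem, 1 utile, from the result gives the worth of the observation, 33/128, which rounds to the 0.26 utile quoted in the text.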
The notion of a decision problem based on an observation will now be formally described, with special reference to mathematical notation and other technical details.

1. There is a set of basic acts, F with elements f, f', etc. In the example of the grapes F consists of the four envisaged acts of buying 0, 1, 2, or 3 pounds of grapes. The convention laid down at the end of § 5.4, requiring that the consequences of acts be measured in utiles, will be adhered to, and it will be supposed that v(F) is finite.

2. The observation is a (not necessarily real) random variable 𝐱 associating with each state s an observed value x(s) in some set X of possible observed values x, x', etc. In the example of the grapes, the states (of which the postulates require that there be an infinite number) were never fully described, and consequently the random variable 𝐱 was not fully described either. In the same sense it may be said that the basic acts, which are also really random variables, were not fully described either. All that is really important, however, is to know the simultaneous distribution of the consequences of the acts in F and of the values of 𝐱. In the example of the grapes that information was implicit in Tables 2 and 3.
For mathematical simplicity in the formal work to follow, it will generally be assumed that X has only a finite number of elements, though the assumption can and must be relaxed in many practical situations. When X is assumed finite, the random variable 𝐱 is, for all purposes of the present context, simply a partition of S, namely, the partition into the sets on which 𝐱 is constant. Indeed, earlier in this section, the notion of observation was described in terms of a partition, but the description in terms of a random variable is more familiar in statistics and may have technical advantages, especially when the restriction that X be finite is relaxed.

3. The set of strategy functions is the set of all functions associating an element of F with each element x of X. Let the values of the generic strategy function be denoted by f(x) and the function itself by f(𝐱). The notion of strategy function was not introduced in the informal description of observation, nor in the example of the grapes, because it is but a mathematical intermediary to the definition of derived acts and did not seem to call for explicit expression in the less formal contexts.

4. To each strategy function f(𝐱) corresponds a derived act g, in the set of all derived acts F(𝐱), defined by

\[
g(s) = f(s;\, x(s)) \quad \text{for all } s \in S,
\tag{2}
\]

where f(s; x) denotes the consequence for the state s of the basic act f(x).

It was explained that in the example of the grapes there are 4^5 derived acts. In the same way, it can be seen in general that if X has ξ and F has φ elements there are φ^ξ derived acts.

5. The value of F given x,

\[
v(F \mid x) =_{Df} \sup_{f \in F} E(f \mid x).
\tag{3}
\]

This is the function of x indicated, for the example of the grapes, by italics in Table 4.
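The formal definitions can be exercised on the example of the grapes by brute force. The sketch below, in Python with hypothetical names, enumerates every strategy function, confirms that there are 4^5 of them, and checks that the best derived act attains the sum over x of P(x) v(F | x). It assumes, as in the example, that consequences depend on the state only through Q, so that the joint table of x and Q is all that is needed.

```python
from fractions import Fraction as F
from itertools import product

BASIC_ACTS = (0, 1, 2, 3)  # pounds of grapes bought

# Table 2: utility of each basic act for each quality Q.
UTILITY = {
    0: {1: 0, 2: 0, 3: 0},
    1: {1: -1, 2: 1, 3: 3},
    2: {1: -3, 2: 0, 3: 5},
    3: {1: -6, 2: -2, 3: 6},
}

# Table 3: 128 * P(x and Q).
JOINT_128 = {
    1: {1: 15, 2: 5, 3: 1},
    2: {1: 10, 2: 15, 3: 2},
    3: {1: 4, 2: 24, 3: 4},
    4: {1: 2, 2: 15, 3: 10},
    5: {1: 1, 2: 5, 3: 15},
}


def strategy_functions():
    """Every assignment of one basic act to each observed value x in X."""
    xs = sorted(JOINT_128)
    for choice in product(BASIC_ACTS, repeat=len(xs)):
        yield dict(zip(xs, choice))


def expected_utility_of_derived_act(strategy):
    """Expected utility of the derived act corresponding to a strategy function."""
    return sum(
        F(JOINT_128[x][q], 128) * UTILITY[strategy[x]][q]
        for x in JOINT_128
        for q in JOINT_128[x]
    )


def value_given(x):
    """v(F | x): the largest conditional expected utility attainable given x."""
    row = JOINT_128[x]
    total = sum(row.values())
    return max(F(sum(row[q] * UTILITY[f][q] for q in row), total) for f in BASIC_ACTS)


all_strategies = list(strategy_functions())
best = max(all_strategies, key=expected_utility_of_derived_act)
print(len(all_strategies), best, expected_utility_of_derived_act(best))
```

A strategy function that assigns the same basic act to every x reproduces that basic act, which is why the maximum over derived acts can never fall below the value of the basic problem.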
3 Multiple observations, and extensions of observations and of sets of acts

If several random variables 𝐱_1, ..., 𝐱_n, associating elements of S with elements of sets X_1, ..., X_n, are simultaneously under discussion, it is natural to form the new random variable, denoted 𝐱 = {𝐱_1, ..., 𝐱_n}, that associates with each element of S an ordered n-tuple of elements of X_1, ..., X_n, respectively. If the context is such that 𝐱_1, ..., 𝐱_n are thought of as observations, then 𝐱 can also be thought of as an observation and will sometimes be called a multiple observation, to
emphasize the manner of its formation. To illustrate, any item such as profession or body temperature that might be entered on a patient's history can be thought of as an observation; but the whole history, or a filing cabinet of histories, can also be thought of as an observation, the history being a multiple observation of items, and the cabinet a multiple observation of histories.

Consider two observations 𝐱 and 𝐲. It is an interesting possibility that 𝐱 and 𝐲 are so related to each other that knowledge of the value of 𝐱 would (almost certainly) imply (almost certain) knowledge of 𝐲. In that case, observation of 𝐱 implies essentially the observation of 𝐲 and generally something besides, which suggests the following three definitions.

If and only if 𝐱 and 𝐲 are observations such that, for all s and s' in some B of probability one, x(s) = x(s') implies y(s) = y(s'); then 𝐱 is an extension of 𝐲, and 𝐲 is a contraction of 𝐱. If 𝐱 is an extension of 𝐲, and 𝐲 is an extension of 𝐱, then 𝐱 and 𝐲 are equivalent.

Strictly speaking, one should say not that 𝐱 and 𝐲 are equivalent, but rather that they are equivalent regarded as observations, for this would not be a good concept of equivalence to apply to random variables regarded as such. For example, a pair of equivalent observations can obviously be a pair of real random variables with different expected values.

Some properties of the relations of extension, contraction, and equivalence between observations are given by the following easy but important exercises. Throughout this set of exercises it is unnecessary to suppose the observations confined to a finite set of values; in the case of Exercise 3b, it is impossible to do so.

Exercises

1. 𝐱 and 𝐲 are equivalent, if and only if 𝐱 is both an extension and a contraction of 𝐲.

2a. If P{x(s) = y(s)} = 1, 𝐱 and 𝐲 are equivalent.

2b. Any observation 𝐱 is equivalent to itself.

3a. If there is a value y_0 such that P{y(s) = y_0} = 1, then every 𝐱 is an extension of 𝐲, and any two such observations are equivalent. Such an observation, of course, amounts to observing nothing at all and will therefore be called a null observation.

3b. If x(s) = s for almost all s ∈ S, then 𝐱 extends every 𝐲.

4. If 𝐱 is an extension of 𝐲, and 𝐲 is an extension of 𝐳, then 𝐱 is an extension of 𝐳. State and verify the analogous fact about equivalence.

5a. If y is a function associating an element of Y with each element of X, and 𝐱 is an observation, then the observation 𝐲 such that 𝐲 = y(𝐱) is a contraction of 𝐱.
5b. If 𝐲 is a contraction of 𝐱, then there is a function y such that P{y(s) = y(x(s))} = 1. What freedom is there in the choice of the function y?

5c. What are the implications of Exercises 5a and 5b for equivalence between observations?

6. If 𝐱 and 𝐲 are observations and 𝐳 = {𝐱, 𝐲} is the corresponding double observation, then 𝐳 is an extension of 𝐱 and of 𝐲. (This exercise seems to call for a converse saying that every extension can be regarded as a double observation, but no really neat one suggests itself to me. None the less, in thinking about extensions and contractions, the sort brought out by the exercise is a typical and stimulating example.)

7. {𝐱, 𝐲} is equivalent to 𝐱, if and only if 𝐱 extends 𝐲.

The relations of extension, contraction, and equivalence have parallels for sets of acts, defined thus: If F and G are (non-vacuous) sets of acts such that, for some B of probability one, there is for each g ∈ G an f ∈ F with f(s) = g(s) for all s ∈ B; then F is an extension of G, and G is a contraction of F. If F is an extension of G, and G is an extension of F, then F and G are equivalent.

More exercises

8. If F is an extension of (equivalent to) G, then v(F) ≥ (=) v(G).

9. Discuss the analogues of Exercises 1, 2b, and 4 for sets of acts.

10. If F ⊃ G, then F extends G.

11. If F(𝐱) is derived from F on observation of 𝐱, then F(𝐱) extends F.

12. HYP.
F(𝐱) is derived from F on observation of 𝐱;
F(𝐲) is derived from F on observation of 𝐲;
F(𝐱, 𝐲) is derived from F on observation of {𝐱, 𝐲};
F(𝐱; 𝐲) is derived from F(𝐱) on observation of 𝐲.
CONCL.
1. F(𝐱, 𝐲) is equivalent to F(𝐱; 𝐲).
2. F(𝐱, 𝐲) extends F(𝐱) and F(𝐲).
3. If 𝐱 is equivalent to 𝐲, then F(𝐱) is equivalent to F(𝐲).
4. If 𝐲 extends 𝐱, then F(𝐱, 𝐲) is equivalent to F(𝐲), F(𝐲) is equivalent to F(𝐱; 𝐲), and F(𝐲) extends F(𝐱).
13a. Under the hypothesis of 12, the equivalences and relations of extension among the sets of acts arising out of two observations can, with evident conventions, be diagrammed thus:

[Diagram; legible node labels: "x, y", "x; y", "y; x", "x", "y", "0".]

13b. If $\mathbf{y}$ extends $\mathbf{x}$, the diagram becomes

[Diagram; legible node labels: "x, y", "x; y", "y; x", "y", "x", "0".]

13c. If $\mathbf{x}$ and $\mathbf{y}$ are equivalent, the diagram becomes

[Diagram; legible node labels: "x, y", "x; y", "x", "y; x", "y", "0".]
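14. If $\mathbf{F}(\mathbf{x})$ and $\mathbf{G}(\mathbf{x})$ are derived from $\mathbf{F}$ and $\mathbf{G}$, respectively, and if $\mathbf{F}$ extends $\mathbf{G}$, then $\mathbf{F}(\mathbf{x})$ extends $\mathbf{G}(\mathbf{x})$.

15. $$v(\mathbf{F}(\mathbf{x})) = E[v(\mathbf{F} \mid \mathbf{x})] = \int v(\mathbf{F} \mid \mathbf{x}(s))\, dP(s) \ge v(\mathbf{F}).$$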
4 Dominance and admissibility
According to Exercise 3.14, if one set of acts, regarded as basic, extends another, the first is at least as valuable as the second in the light of any observation whatever. This section explores a relation, dominance, which has the same property but is not so strict as extension. Dominance is of some importance for the theory of personal probability as it has been developed thus far. But its importance will be even greater in the study of statistics proper, where interpersonal agreement is of particular interest; for, as the definition shortly to be given will make clear, two people having different personal probabilities will agree as to whether one of two sets of acts dominates another, if only they agree which events have probability zero, a condition generally met in practice, and one that could if desired be dispensed with by a slight change in the definition of dominance.

It will be seen that dominance and notions related to it are intimately associated with the sure-thing principle. Indeed, probability being taken for granted, the basic facts about dominance seem to give a complete expression of the sure-thing principle. Dominance and related concepts were much stressed by Wald, in [W3] for example.
Two or three notions, the logical connections among them, and those between them and extension, are to be treated. The logical connections being many but simple, I think that the material lends itself better to formal than to expository treatment, for in such a context the reader who looks for the motivating ideas sees them himself more easily than he comprehends someone else's verbalization of them. This section will therefore consist primarily of a group of formal definitions and several exercises.

If and only if $P(\mathbf{f}(s) \ge \mathbf{g}(s)) = 1$, $\mathbf{f}$ dominates $\mathbf{g}$.
If and only if some (every) element of $\mathbf{F}$ dominates (is dominated by) $\mathbf{g}$, $\mathbf{F}$ dominates (is dominated by) $\mathbf{g}$.
If and only if $\mathbf{F}$ dominates every element of $\mathbf{G}$, $\mathbf{F}$ dominates $\mathbf{G}$.
If and only if $\mathbf{f}$ dominates $\mathbf{g}$, but $\mathbf{g}$ does not dominate $\mathbf{f}$, $\mathbf{f}$ strictly dominates $\mathbf{g}$.
If and only if $\mathbf{f} \in \mathbf{F}$, and $\mathbf{f}$ is not strictly dominated by any element of $\mathbf{F}$, $\mathbf{f}$ is admissible (with respect to $\mathbf{F}$).
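The definitions above can be tried out on a finite state space. What follows is only a sketch under stated assumptions: states are indexed 0, 1, 2, an act is a tuple giving its utility in each state, and the function names `dominates`, `strictly_dominates`, and `admissible` are illustrative choices, not notation from the text.

```python
from typing import List, Sequence

Act = Sequence[float]  # utility of the act in each state s


def dominates(f: Act, g: Act, prob: Sequence[float]) -> bool:
    """f dominates g iff P(f(s) >= g(s)) = 1.

    On a finite state space this means f(s) >= g(s) in every state of
    positive probability; states of probability zero are ignored.
    """
    return all(fs >= gs for fs, gs, p in zip(f, g, prob) if p > 0)


def strictly_dominates(f: Act, g: Act, prob: Sequence[float]) -> bool:
    """f strictly dominates g iff f dominates g but g does not dominate f."""
    return dominates(f, g, prob) and not dominates(g, f, prob)


def admissible(acts: List[Act], prob: Sequence[float]) -> List[Act]:
    """Elements of `acts` not strictly dominated by any element of `acts`."""
    return [f for f in acts
            if not any(strictly_dominates(g, f, prob) for g in acts)]


# Hypothetical example; the third state has probability zero.
prob = (0.5, 0.5, 0.0)
a = (3.0, 1.0, 0.0)
b = (2.0, 1.0, 9.0)  # better than a only on the null state
c = (0.0, 4.0, 0.0)
acts = [a, b, c]

print(strictly_dominates(a, b, prob))
print(admissible(acts, prob))
```

Because only states of positive probability are inspected, two people with different probabilities get the same answers from these functions whenever they agree on which states are null, which is the point about interpersonal agreement made earlier in this section.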
Involving as they do acts as well as sets of acts, the definitions, strictly speaking, introduce four different kinds of dominance. However, this complexity can be alleviated, with a slight lapse of logic, by identifying each act $\mathbf{f}$ with the set of acts of which $\mathbf{f}$ is the only element, for it is easily seen that this identification is in such harmony with the definition that, once it is made, the four kinds of dominance collapse into one.

Exercises

1a. Consider analogues of Exercises 3.2b and 3.4.

1b. When can two acts dominate each other?

2a. If $\mathbf{F}$ extends $\mathbf{G}$, then $\mathbf{F}$ dominates $\mathbf{G}$. Discuss the converse.

2b. $\mathbf{F}(\mathbf{x})$ dominates $\mathbf{F}$.

2c. If $\mathbf{F} \supset \mathbf{G}$, then $\mathbf{F}$ dominates $\mathbf{G}$.

3a. If $\mathbf{F} \subset \mathbf{G}$, and $\mathbf{F}$ dominates $\mathbf{G}$, then each admissible element of $\mathbf{G}$ dominates and is dominated by an element of $\mathbf{F}$.

3b. After any finite number of non-admissible elements is deleted from $\mathbf{F}$, what remains of any subset of $\mathbf{F}$ that dominated $\mathbf{F}$ continues to dominate $\mathbf{F}$.

3c. Though the set of admissible elements of $\mathbf{F}$ may in some instances dominate $\mathbf{F}$, no proper subset of the set of admissible elements can ever do so; but, if any other subset dominates $\mathbf{F}$, some proper subset of it also does so.

3d. If $\mathbf{F}$ is finite, the set of admissible elements of $\mathbf{F}$ dominates $\mathbf{F}$.

3e. Discuss the role of "finite" in 3b and 3d.

4a. If the set of admissible elements of $\mathbf{F}$ dominates $\mathbf{G}$, and $\mathbf{G}$ dominates $\mathbf{F}$, then the set of admissible elements of $\mathbf{F}$ is equivalent to the set of admissible elements of $\mathbf{G}$.

4b. If $\mathbf{F}$ and $\mathbf{G}$ dominate each other, and either is finite, then the sets of admissible elements of $\mathbf{F}$ and $\mathbf{G}$, respectively, are equivalent to each other, and each dominates both $\mathbf{F}$ and $\mathbf{G}$.

5. If $\mathbf{F}$ dominates $\mathbf{G}$, then $v(\mathbf{F}) \ge v(\mathbf{G})$.

6. If $\mathbf{F}$ dominates $\mathbf{G}$, then, for any observation $\mathbf{x}$, $\mathbf{F}(\mathbf{x})$ dominates $\mathbf{G}(\mathbf{x})$.
5 Outline of the design of experiments
Often, especially in statistics, a decision problem can be seen as the problem of deciding which of several experiments (or which of several observational programs, if that is really a more general term) to undertake. In this section the notion of the decision problem derived from a basic decision problem and an observation must be elaborated a little, because, as derived acts have been treated thus far, they correspond to the possibility of making an observation free of charge. Though observations are sometimes free, there is typically a cost associated with making them; information must typically be bought either from other people or, more often, from nature, so to speak. The cost of information may be money, trouble, one's own life, that of another, or any of innumerable possibilities, but all can in principle be measured in terms of utility. The cost of an observation in utility may be negative as well as zero or positive; witness the cook that tastes the broth.

In principle, if a number of experiments are available to a person, he has but to choose one whose set of derived acts has the greatest value to him, due account being taken of the cost of observation. That simple formulation, like some others in this book, is, in a sense, oversimple; it abstracts from the enormous variety of considerations that enter into the careful design of any experiment. The possibility of so abstracting from variety does not remove the ultimate necessity of studying some aspects of that variety in detail. R. A. Fisher's The Design of Experiments [F4], for example, is concerned almost exclusively with experiments based on a special technique called the analysis of variance, and it is but an introduction to even that important facet of statistics. Again, there is a growing literature (in which the work of A. Wald is outstanding) on sequential analysis, which is concerned in principle with all experiments in which later parts of the experiment are conducted in the light of what happens in earlier parts; but this literature has, by necessity, been confined to a relatively tiny part of that domain.
Before turning to a more formal recapitulation of the outline of the design of experiments, this may be a good place for a few speculative words about the difference, if any, between experiment and observation.

Some sciences are commonly called experimental as opposed to others that are called observational. Aerodynamics, the psychology of rote learning, and the genetics of fruit flies would typically be called experimental sciences; and, to take parallel examples, meteorology, the psychology of dreams, and human genetics would be called observational. But it is widely agreed, and the most casual consideration makes it clear, that any basic difference that may really be present resides not in the sciences themselves but in the methods typical of each. To illustrate the role of observation in sciences ordinarily considered experimental and vice versa, observations of wild populations of fruit flies have been useful in the study of the genetics of fruit flies; the effects of fatigue, for example, on dream content may well be the subject of an experiment; and, except for the atom, no topic in science is more popular today than experimental rain making. The illustrations could be extended indefinitely, and there is also a less direct sort exemplified by the discipline called experimental medicine, which typically studies experiments on animals with the hope, often justified, that the findings thus obtained can be extrapolated to humans.

The problem, then, is to distinguish an experiment from an observation. Except for brevity, it might be better to say mere observation, for, in general usage, an experiment would be considered a special sort of observation. The first apparent contrast that comes to mind is that experimentation is generally thought of as active and observation as passive. But, upon examination, it is seen that observation is also active, for observations are typically made by going somewhere to observe, or waiting attentively till something happens. Often it is not only the observer himself who must be transported and put in readiness to make an observation, but also a considerable body of apparatus. What demands more activity than the modern observation of a solar eclipse?

Another apparent contrast is that the experimenter acts on the thing he observes, whereas the observer acts only on himself and on instruments of observation that may be regarded as extensions of his own sense organs. If this criterion were accepted altogether naively, there would be no such thing as a physiological experiment on one's self; even sophisticated interpretations might find it difficult to embrace psychological experiments on one's self.

Finally, experiments as opposed to observations are commonly supposed to be characterized by reproducibility and repeatability. But
the observation of the angle between two stars is easily repeatable and with highly reproducible results in double contrast to an experiment to determine the effect of exploding an atomic bomb near a battleship. All in all, however useful the distinction between observation and experiment may be in ordinary practice, I do not yet see that it admits of any solid analysis. At any rate, no formal use of the distinction will be attempted in this book.

Return now to the notion of observation subject to cost. It may be that the value of the random variable $\mathbf{x}$ is observable but only at a cost $\mathbf{c}$, a real-valued random variable measured in utiles. If, as heretofore, $\mathbf{F}(\mathbf{x})$ denotes the set of acts derived from $\mathbf{F}$ on cost-free observation of $\mathbf{x}$, let $\mathbf{F}(\mathbf{x}) - \mathbf{c}$ denote the set of derived acts subject to the random cost $\mathbf{c}$. This notation is interpreted to mean that, if $\mathbf{f}$ is the generic element of $\mathbf{F}(\mathbf{x})$, then $\mathbf{f} - \mathbf{c}$ (which, being a utility-valued function of $s$, is an act) is the generic act of the set $\mathbf{F}(\mathbf{x}) - \mathbf{c}$. Very often the cost of an observation is independent of $s$, but not, for example, for him that tests the sharpness of a thorn with his finger. Since observations are typically paid for before, or simultaneously with, making the observation, the cost is typically observed along with the observation proper. Put differently, the cost $\mathbf{c}$ is typically a contraction of the observation $\mathbf{x}$. Thus, if in some special context any advantage were to be gained by so doing, it would not be drastic to assume the cost of observing $\mathbf{x}$ to be a function of the form $c'(\mathbf{x})$; but, as a matter of fact, no such advantage has come to my attention. It is not difficult to think of experiments to which the assumption does not apply. For example, in the present state of uncertainty about the long-term effects of x-rays, anyone conducting a short-term experiment in which young human beings were subjected to large doses of x-radiation would risk costs that might not overtly manifest themselves for half a century, or even for generations.

Much that would ordinarily be called observation cannot be described by saying that the random cost is simply to be subtracted from each derived act of the corresponding observation thought of as free of cost. Allowing that it may be legendary, the form of trial by ordeal in which the guilty floated safely to be hanged and the innocent drowned to be exonerated epitomizes such a situation; except in point of absurdity, ordinary industrial destructive testing of electric fuses and other products is much the same. Strictly speaking, discrepancy occurs even in the ordinary context in which the cost of observation is a fixed sum of money; for the utility of money is not strictly linear, so the cost of observation typically affects different derived acts somewhat differently. This sort of situation is indeed so common as to introduce at least a
slight error into almost every application of the notion of cost as a subtractive term. It would therefore be desirable to extend considerably the notion of cost of observation, but, thus far, I see no way to do so that does not destroy the mathematical advantage of singling problems of observation out of the class of decision problems generally.

It is convenient now to analyze the appropriateness of regarding the number $v(\mathbf{F})$ as a measure of the value of $\mathbf{F}$. As must already be clear to the reader, if a person is to make a preliminary decision limiting his next decision to one or another of several sets of acts, say, $\mathbf{F}$, $\mathbf{G}$, and $\mathbf{H}$, then his preliminary decision will select a set that has the highest value of $v$, and the preliminary and secondary decisions, regarded as a single grand decision, amount to the problem of deciding on an act from $\mathbf{F} \cup \mathbf{G} \cup \mathbf{H}$. So far as this use of $v$ is concerned, any increasing monotonic function of $v$ such as $v^3$ or $3^v$ would be equally satisfactory, but $v$ has an advantage in arithmetic simplicity when costs of observation are involved. Consider, for example, the problem of whether to make a particular observation at the random cost $\mathbf{c}$ or to make no observation at all. The two sets of acts involved may then be symbolized by $(\mathbf{F}(\mathbf{x}) - \mathbf{c})$ and $\mathbf{F}$, respectively. The peculiar simplicity of $v$ as a measure of the value of a set of acts, in this context, is exhibited by the almost obvious fact that $v(\mathbf{F}(\mathbf{x}) - \mathbf{c}) = v(\mathbf{F}(\mathbf{x})) - E(\mathbf{c})$. It may be remarked in passing that $v$ is a particularly good measure in any problem where $\mathbf{F}$, $\mathbf{G}$, or $\mathbf{H}$ is, so to speak, made available by lot, a possibility realized in (7.3.2), for example.

Finally, if one among several observations is to be chosen, each with its own random cost (possibly including the null observation), the person will choose an observation for which $v(\mathbf{F}(\mathbf{x})) - E(\mathbf{c})$ is as large as possible. If the number of observations among which decision is to be made is infinite, that function may not attain a maximum value, but the value of the situation to the person can reasonably be regarded as the supremum of the function; there are, of course, observations among those available for which the supremum is arbitrarily nearly attained.
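The identity $v(\mathbf{F}(\mathbf{x}) - \mathbf{c}) = v(\mathbf{F}(\mathbf{x})) - E(\mathbf{c})$ can be checked numerically on a small finite model. The sketch below rests on assumptions of its own: four equally likely states, acts and costs given as lists of utilities per state, and an observation given as a list assigning an observed value to each state. The helper names `value` and `value_after_observation` are illustrative only.

```python
from collections import defaultdict


def value(acts, prob):
    """v(F): the largest expected utility among the acts in F."""
    return max(sum(p * u for p, u in zip(prob, f)) for f in acts)


def value_after_observation(acts, prob, x, cost=None):
    """v(F(x) - c): the best basic act is chosen separately for each observed value.

    x[s] is the value observed in state s; cost[s] is the cost, in utiles,
    incurred in state s (zero everywhere if no cost is given).
    """
    if cost is None:
        cost = [0.0] * len(prob)
    cells = defaultdict(list)  # observed value -> states giving that value
    for s, xs in enumerate(x):
        cells[xs].append(s)
    total = 0.0
    for states in cells.values():
        total += max(sum(prob[s] * (f[s] - cost[s]) for s in states)
                     for f in acts)
    return total


# Hypothetical numbers, chosen to be exactly representable in binary.
prob = [0.25, 0.25, 0.25, 0.25]
acts = [(4.0, 0.0, 4.0, 0.0), (0.0, 4.0, 0.0, 4.0)]
x = [0, 1, 0, 1]              # the observation reveals which act pays off
cost = [0.5, 0.5, 1.5, 0.5]   # a random cost, not constant over states

v_basic = value(acts, prob)
v_free = value_after_observation(acts, prob, x)
v_paid = value_after_observation(acts, prob, x, cost)
expected_cost = sum(p * c for p, c in zip(prob, cost))

print(v_basic, v_free, v_paid, expected_cost)
```

In this example the person would choose to observe, since the value with the observation net of expected cost exceeds the value of the basic acts alone. The identity holds here even though the cost varies within the cells of the observation, because the cost does not depend on which basic act is chosen.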
CHAPTER 7

Partition Problems

1 Introduction

In the introduction of the preceding chapter it was explained that the treatment of decision problems in general had been carried to a logical conclusion, and that to study decision problems further it had become necessary to specialize. The notion of observation was accordingly chosen as the subject of specialization. The situation now repeats itself at a new level, for I have now covered the main points that occur to me about observation in general, though I see considerably more to say about a certain type of observation.

The type of observation problem to which the present chapter is devoted, though relatively special, is still very general. Indeed, its generality is suggested by the fact that no other type of problem is systematically treated in modern statistics. In objectivistic terms, it would be described as the type of decision problem in which the consequence of each basic act depends only on which of several (possibly infinitely many) probability distributions does in fact apply to the random variable to be observed. Modern statistics has no name for this type of problem, because it recognizes no other type; and no particularly suggestive name occurs to me. I am therefore tentatively adopting the noncommittal name "partition problem." Such motivation as there is for that name will be apparent when the concept is defined.

In non-objectivistic terms, a partition problem has the following structure. There are, of course, basic acts $\mathbf{F}$ and an observation $\mathbf{x}$. The peculiar feature is a random variable $\mathbf{b}$, which is typically not subject to observation, with the property that every $\mathbf{f}$ in $\mathbf{F}$ is constant given that $\mathbf{b}$ has any particular value $b$. In many practical problems $\mathbf{b}$ takes on an infinity, even a non-denumerable infinity, of values, but systematic consideration of such problems would involve those advanced mathematical techniques that are explicitly being avoided in this book. Glossing over such questions of technique for the moment, the state of the world, which is itself a
random variable, might play the role of $\mathbf{b}$; with respect to this $\mathbf{b}$, any observational decision problem would presumably be a partition problem. It may, therefore, be inaccurate to call partition problems special, but they are special whenever $\mathbf{b}$ is not equivalent to the state of the world.

As has just been mentioned, the general policy of this book with respect to mathematical technique restricts formal treatment of partition problems here to those in which $\mathbf{b}$ assumes only a finite number of different values, that is to say, those in which $\mathbf{b}$ is to all intents and purposes a partition $B_i$, whence the name "partition problem." For the reader who is not familiar with the elements of the geometry of $n$-dimensional convex bodies, there will be a distinct expository advantage in confining the formal treatment still further to twofold partitions. At the same time, by explicit statements and by the use of suggestive notation, all readers will be given at least some idea of the extension of the theory to $n$-fold partitions; indeed, a reader familiar, for example, with Sections 16.1-2 of [V4], or with [B20] will find the extension as plain as if it had been made explicitly. Thus the restriction to twofold as opposed to $n$-fold partitions will be to the advantage of some and to the disadvantage of none.

Partition problems are even closer than are observational problems generally to the subject matter of statistics proper. In particular, in the course of this chapter, multipersonal considerations will from time to time be pointed out in connection with partition problems.

2 Structure of (twofold) partition problems
A central feature of a twofold partition problem is, of course, a twofold partition, or dichotomy, $B_i$, $i = 1, 2$. By way of abbreviation let $\beta(i) = P(B_i)$, and $\beta = \{\beta(1), \beta(2)\}$. The $\beta(i)$'s can be any two numbers such that $\beta(i) > 0$ and $\sum \beta(i) = \beta(1) + \beta(2) = 1$. Since $\beta(2) = 1 - \beta(1)$, it might seem superfluous to have a special notation for $\beta(2)$; but this redundancy more than pays for itself in symmetry, especially in the extension of the theory to $n$-fold partitions. The possibility that one of the $\beta(i)$'s vanishes has been ruled out, for it is neither typical nor interesting, and its retention would mar the exposition of the theory.

Each basic act $\mathbf{f} \in \mathbf{F}$ is characterized by a pair of numbers $f_i$ such that
$$P(\mathbf{f}(s) = f_i \mid B_i) = 1 \tag{1}$$
for each $i$. The technical assumption will be made that as $\mathbf{f}$ ranges over $\mathbf{F}$ the numbers $f_i$ are bounded from above for each $i$, which is a little more stringent than the now familiar assumption that $v(\mathbf{F}) < \infty$.
The assumption expressed by (1) is made for definiteness and simplicity, though its full force will seldom be used. The possibility of relaxing (1) in certain contexts will be mentioned from time to time, especially since this possibility is of some interest even in the exploitation of (1) itself. In particular, for several pages now it will scarcely ever be necessary to assume anything about the structure of $\mathbf{F}$ relative to $B_i$, except that $E(\mathbf{f} \mid B_i)$ is bounded from above for each $i$; for making the abbreviation $f_i = E(\mathbf{f} \mid B_i)$, almost everything from here through Exercise 1 applies verbatim. The expected utility of any $\mathbf{f} \in \mathbf{F}$ can be computed in several forms thus:
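$$\begin{aligned}
E(\mathbf{f}) &= E(\mathbf{f} \mid B_1)P(B_1) + E(\mathbf{f} \mid B_2)P(B_2) \\
&= f_1\beta(1) + f_2\beta(2) \\
&= \sum_i f_i\beta(i) \\
&= f_2 + (f_1 - f_2)\beta(1).
\end{aligned} \tag{2}$$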
The first of these forms expresses the expected value in general terms; the second utilizes abbreviations; the third is an obvious mathematical transcription of the second, particularly suggestive of extension to the $n$-fold situation; the fourth sacrifices the symmetry exhibited by the preceding three in order to take advantage of the relation between $\beta(1)$ and $\beta(2)$.

From the fourth form of (2), it is clear that, for fixed $\mathbf{f}$, $E(\mathbf{f})$ is a linear function of $\beta(1)$. Henceforth that fact, for example, would be expressed in symmetric form by saying that $E(\mathbf{f})$ is linear in $\beta$, and the dependence of $E(\mathbf{f})$ on $\beta$ might be explicitly indicated by writing $E(\mathbf{f} \mid \beta)$.

Since in any one decision problem $\beta$ is constant, it might seem pointless to emphasize that $E(\mathbf{f} \mid \beta)$ is linear in $\beta$. But there are, in fact, two different reasons for being interested in variation of $\beta$. In the first place, once the observation $\mathbf{x}$ has been observed to have the value $x$, the basic, or a priori, decision problem is replaced by an a posteriori problem in which $P(B_i \mid x)$ plays the role originally played by $P(B_i) = \beta(i)$. Second, interest in comparing different people is becoming increasingly more explicit as the book proceeds. In particular, it is of interest to compare people who have available the same set of basic acts and who, at least so far as the distribution of $\mathbf{x}$ and the acts in $\mathbf{F}$ are concerned, have the same conditional personal probability given $B_i$, but who attach different probabilities $\beta(i)$ to the elements of the partition.
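The agreement of the symmetric and asymmetric forms of (2), and the linearity of the expected utility in $\beta(1)$, can be confirmed with a few lines of arithmetic. This is a sketch only; the representation of a dichotomous act as a pair `(f1, f2)` and the function names are assumptions made for illustration.

```python
def expected_utility(f, beta):
    """Second and third forms of (2): f_1 beta(1) + f_2 beta(2)."""
    return sum(fi * bi for fi, bi in zip(f, beta))


def expected_utility_asymmetric(f, b1):
    """Fourth form of (2): f_2 + (f_1 - f_2) beta(1)."""
    f1, f2 = f
    return f2 + (f1 - f2) * b1


f = (3.0, -1.0)  # hypothetical act: 3 utiles given B_1, -1 given B_2

# The two forms agree for any admissible beta.
for b1 in (0.25, 0.5, 0.75):
    beta = (b1, 1.0 - b1)
    print(b1, expected_utility(f, beta), expected_utility_asymmetric(f, b1))

# Linearity in beta(1): the value at a midpoint is the average of the end values.
lo, hi = 0.25, 0.75
mid_value = expected_utility_asymmetric(f, (lo + hi) / 2)
avg_value = (expected_utility_asymmetric(f, lo) + expected_utility_asymmetric(f, hi)) / 2
print(mid_value, avg_value)
```

Two people who share the act but attach different values of $\beta$ to the partition simply read the same straight line at different abscissas.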
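To emphasize its dependence on $\beta$, $v(\mathbf{F})$ will sometimes be written $v(\mathbf{F} \mid \beta)$; its computation in the following fashion is fundamental to the theory of partition problems.
$$v(\mathbf{F} \mid \beta) = \sup_{\mathbf{f} \in \mathbf{F}} E(\mathbf{f} \mid \beta) = \sup_{\mathbf{f} \in \mathbf{F}} \left[f_1\beta(1) + f_2\beta(2)\right] = k(\beta), \tag{3}$$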
where $k(\beta)$ is defined by the equation in which it occurs. According to Exercise 4 of Appendix 2, the function $k$ is convex in $\beta$, that is, $k$ is convex when recognized as a function of $\beta(1)$ alone. Interpreted as a pair of a priori probabilities, $\beta$ is confined to the open interval defined by $\sum \beta(i) = 1$, $\beta(i) > 0$, but it is valuable to recognize that $k$ is defined, convex, and continuous on the closed interval $\sum \beta(i) = 1$, $\beta(i) \ge 0$.

Many typical features of the relationship between $\mathbf{F}$ and $k$ are illustrated graphically by Figure 1.

[Figure 1]

The abscissa of that graph represents both $\beta(1)$ and $\beta(2)$, as indicated, and the ordinate is measured in utiles. The straight lines, the left ends of which are marked $a$, $b$, $c$, $d$, and $e$, graph as functions of $\beta$ the expected values of the five basic acts of the particular problem represented. The ordinates at their right and left ends, respectively, are the corresponding values of the $f_1$'s and $f_2$'s. The graph of $k$ is marked by heavy line segments. It is seen that the lines $a$, $c$, and $e$, and they alone, touch the graph of $k$, for they represent the only acts that are optimal for some value of $\beta$. The act represented by $d$ is inadmissible (if (1) is taken literally), being in fact strictly dominated by every other act except $e$, and it is therefore superfluous to the person, no matter what the value of $\beta$; $b$ is obviously equally superfluous, but for a different reason. In many typical problems in which $\mathbf{F}$ has an infinity of elements, $k$ is, unlike the $k$ in Figure 1, strictly convex; that is, its only intervals of linearity are point intervals.

Exercise

1. Compute and graph $k$ for the set $\mathbf{F}$ of dichotomous acts of the form
$$f_1(\phi) = \ldots, \qquad f_2(\phi) = 1 - (1 - \phi)^2.$$
Answer. $k(\beta) = [\beta(1) - \beta(2)]^2 = (2\beta(1) - 1)^2$.
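The function $k$ is an upper envelope of straight lines, and it is easy to compute numerically. The sketch below does two things under loudly stated assumptions. First, it uses five hypothetical acts, with numbers invented only to mimic the qualitative features described for Figure 1 (the figure's actual values are not given in the text). Second, it checks the stated answer to Exercise 1 for the family $f_1(\phi) = 1 - (1 + \phi)^2$, $f_2(\phi) = 1 - (1 - \phi)^2$, $-1 \le \phi \le 1$; this $f_1$ and this range of $\phi$ are assumptions, chosen because they reproduce the printed answer, and may not be the form intended in the exercise.

```python
def k(acts, b1):
    """k(beta) = sup over acts of f_1 beta(1) + f_2 beta(2), with beta(2) = 1 - beta(1)."""
    return max(f1 * b1 + f2 * (1.0 - b1) for f1, f2 in acts)


def touching(acts, grid, tol=1e-12):
    """Indices of acts whose expected-utility line meets k at some grid point."""
    hit = set()
    for b1 in grid:
        top = k(acts, b1)
        for j, (f1, f2) in enumerate(acts):
            if abs(f1 * b1 + f2 * (1.0 - b1) - top) <= tol:
                hit.add(j)
    return sorted(hit)


# Hypothetical acts a, b, c, d, e as pairs (f_1, f_2).
five = [(0.0, 4.0),   # a
        (1.0, 3.5),   # b: admissible, yet optimal for no beta
        (3.0, 3.0),   # c
        (-1.0, 1.0),  # d: strictly dominated by a, b, and c, but not by e
        (5.0, 0.0)]   # e
grid = [i / 20 for i in range(21)]
optimal_indices = touching(five, grid)
print(optimal_indices)

# Assumed family for Exercise 1, sampled at phi = -1.00, -0.99, ..., 1.00.
family = [(1.0 - (1.0 + t / 100) ** 2, 1.0 - (1.0 - t / 100) ** 2)
          for t in range(-100, 101)]
for b1 in (0.1, 0.3, 0.5, 0.9):
    print(b1, k(family, b1), (2 * b1 - 1) ** 2)
```

In the assumed family the maximizing $\phi$ is $\beta(2) - \beta(1)$, which lies on the sampling grid for the values of $\beta(1)$ printed, so the sampled maximum agrees with the closed form up to rounding.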
Turn now to the relations between an observation $\mathbf{x}$ and the dichotomy $B_i$. As before, it will be assumed for mathematical simplicity that the values of $\mathbf{x}$ are confined to a finite set $X$. The probability that $\mathbf{x}$ attains the value $x$ given $B_i$, written $P(x \mid B_i)$, is fundamental in connection with partition problems. For one thing, as has already been indicated, there is interest in considering people who, though differing with respect to $\beta$, agree with respect to $P(x \mid B_i)$. The probability $P(x, B_i)$ that $\mathbf{x}$ attains the value $x$ and that $B_i$ simultaneously obtains, the probability $P(x)$ that $\mathbf{x}$ attains the value $x$, and the probability $\beta(i \mid x)$ of $B_i$ given that $\mathbf{x}(s) = x$ are derived from $P(x \mid B_i)$ and $\beta$ by means of Bayes' rule (3.5.4) and the partition rule (3.5.3) thus:
$$P(x, B_i) = P(x \mid B_i)\beta(i). \tag{4}$$
$$P(x) = \sum_i P(x, B_i). \tag{5}$$
$$\beta(i \mid x) = \frac{P(x, B_i)}{P(x)}, \tag{6}$$
if $P(x) \ne 0$; and $\beta(i \mid x)$ is meaningless otherwise.

It must be remembered that $P(x, B_i)$, $P(x)$, and $\beta(i \mid x)$ depend on the value of $\beta$ and that a really complete notation would show that dependence. On the other hand, the condition that $P(x) \ne 0$ is independent of the value of $\beta$.

When a second observation $\mathbf{y}$ is to be discussed, $\beta(i \mid y)$ is, in defiance of strict logic, to be understood as the analogue of $\beta(i \mid x)$; that is, as the conditional probability of $B_i$ given that $\mathbf{y}(s) = y$, not as the same function as $\beta(i \mid x)$ with $y$ substituted for $x$. Corresponding conventions apply to $P(y)$, $P(y \mid B_i)$, and $P(y, B_i)$. Finally, free use will be made of such contractions as $\beta(x)$ for $\{\beta(1 \mid x), \beta(2 \mid x)\}$.

Equation (1) implies that
$$E(\mathbf{f} \mid B_i, x) = E(\mathbf{f} \mid B_i) \tag{7}$$
for all $\mathbf{f} \in \mathbf{F}$ and for all $x$ such that $P(x \mid B_i) > 0$. Equation (7) is the mathematical essence of the concept of a partition problem, and virtually all that is to be said about partition problems applies verbatim, if (7), even without (1), applies to such observations as may be under discussion. In view of (7), for all $\mathbf{f} \in \mathbf{F}$,
$$E(\mathbf{f} \mid \beta, x) = \sum_i E(\mathbf{f} \mid B_i, x)P(B_i \mid x) = \sum_i f_i\beta(i \mid x), \tag{8}$$
if $P(x) > 0$.
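The passage from a priori to a posteriori probabilities described in this section is a short computation. The following sketch assumes a hypothetical layout in which `likelihood[i][x]` holds $P(x \mid B_i)$ and values of the observation are indexed 0, 1, ...; the function names are illustrative only.

```python
def joint(likelihood, beta):
    """P(x, B_i) = P(x | B_i) beta(i), returned as joint[i][x]."""
    return [[p * b for p in lik_i] for lik_i, b in zip(likelihood, beta)]


def marginal(likelihood, beta):
    """P(x) = sum over i of P(x, B_i), returned as a list indexed by x."""
    j = joint(likelihood, beta)
    return [sum(row[x] for row in j) for x in range(len(likelihood[0]))]


def posterior(likelihood, beta, x):
    """beta(i | x) = P(x, B_i) / P(x); meaningless when P(x) = 0."""
    px = marginal(likelihood, beta)[x]
    if px == 0:
        raise ValueError("beta(i | x) is meaningless when P(x) = 0")
    return [lik_i[x] * b / px for lik_i, b in zip(likelihood, beta)]


def posterior_expected_utility(f, likelihood, beta, x):
    """E(f | beta, x) = sum over i of f_i beta(i | x), valid when P(x) > 0."""
    return sum(fi * bi for fi, bi in zip(f, posterior(likelihood, beta, x)))


# Hypothetical numbers, exactly representable in binary.
likelihood = [[0.75, 0.25],   # P(x | B_1) for x = 0, 1
              [0.25, 0.75]]   # P(x | B_2) for x = 0, 1
beta = (0.5, 0.5)
f = (4.0, 0.0)

print(marginal(likelihood, beta))
print(posterior(likelihood, beta, 0))
print(posterior_expected_utility(f, likelihood, beta, 0))
```

Two people who agree on `likelihood` but differ on `beta` would call the same functions with different second arguments, which is the comparison the text has in view.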
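3 The value of observation

If the observation $\mathbf{x}$ is made, and it is found that $\mathbf{x}(s) = x$, then the a posteriori value of the set of basic acts, written $v(\mathbf{F} \mid x)$, or more fully $v(\mathbf{F} \mid \beta, x)$, will typically be different from the a priori value $v(\mathbf{F} \mid \beta)$. Indeed, in view of (2.8),
$$v(\mathbf{F} \mid \beta, x) = \sup_{\mathbf{f} \in \mathbf{F}} E(\mathbf{f} \mid \beta, x) = v(\mathbf{F} \mid \beta(x)) = k(\beta(x)). \tag{1}$$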
This is the first illustration of the technical convenience of the function $k$. It is known on general principles that $v(\mathbf{F}(\mathbf{x})) \ge v(\mathbf{F})$, but there is some interest in reverifying the inequality in the present context; in particular, it is possible here to say in interesting terms just when equality can obtain.
$$v(\mathbf{F}(\mathbf{x}) \mid \beta) = E(v(\mathbf{F} \mid \beta(\mathbf{x})) \mid \beta) = E(k(\beta(\mathbf{x})) \mid \beta) \ge k(E(\beta(\mathbf{x}) \mid \beta)), \tag{2}$$
where the terminal inequality is an application of Theorem 1 of Appendix 2. To appreciate the inequality (2), it is necessary to calculate $E(\beta(i \mid \mathbf{x}) \mid \beta)$ explicitly. This calculation, typical of many the reader must henceforth be expected to make for himself, runs as follows, where it is to be understood that the summation with respect to $x$ applies only to those terms for which $P(x)$ is different from 0.
$$E(\beta(i \mid \mathbf{x}) \mid \beta) = \sum_x \beta(i \mid x)P(x) = P(B_i) = \beta(i). \tag{3}$$
Substituting (3) into (2) leads to the anticipated conclusion that
$$v(\mathbf{F}(\mathbf{x}) \mid \beta) \ge k(\beta) = v(\mathbf{F} \mid \beta). \tag{4}$$
According to Theorem 1 of Appendix 2, $v(\mathbf{F}(\mathbf{x}) \mid \beta)$ is definitely greater than $v(\mathbf{F} \mid \beta)$ unless $\beta(\mathbf{x})$ is confined with probability one to some interval of linearity of $k$, in which case the observation $\mathbf{x}$ may fairly be called irrelevant to the basic decision problem at hand. If $\mathbf{x}$ is irrelevant, the interval of linearity to which $\beta(\mathbf{x})$ is confined must, in view of (3), contain $\beta$. In the particularly interesting case (and the only possible one, if $k(\beta)$ is strictly convex) in which $\beta(\mathbf{x})$ is with probability one equal to a constant value, that value must therefore be $\beta$. An observation for which $\beta(\mathbf{x})$ is with probability one equal to $\beta$ may fairly be called utterly irrelevant, because it is irrelevant no matter what set of basic acts is associated with the dichotomy. To say that $\mathbf{x}$ is utterly irrelevant is to say that, with probability one,
$$\beta(i \mid x) = \frac{P(x \mid B_i)\beta(i)}{P(x)} = \beta(i). \tag{5}$$
Since $\beta(i) > 0$, (5) is equivalent to the condition that
$$P(x \mid B_i) = P(x), \tag{6}$$
at least when $P(x) > 0$. Furthermore, it is obvious from (2.5), again noting that $\beta(i) > 0$, that, if $P(x) = 0$, then $P(x \mid B_i) = 0$. Therefore $\mathbf{x}$ is utterly irrelevant, if and only if (6) holds for all $x$ and $i$; that is, if and only if the distribution of $\mathbf{x}$ given $B_i$ is independent of $i$. This form of the condition is intuitively evoked by the words "utterly irrelevant" and has the advantage of not involving $\beta$. It is noteworthy that whether an observation is utterly irrelevant depends neither on the particular set of basic acts, nor on the value of $\beta$, so people will agree on what is utterly irrelevant independent of their personal a priori probabilities and the acts among which they are free to choose.

The greatest lower bound in $\mathbf{x}$ of $v(\mathbf{F}(\mathbf{x}) \mid \beta)$, namely $v(\mathbf{F} \mid \beta)$, and the circumstances under which this bound is attained having been established, it is natural to turn to a parallel investigation of the least upper bound. A foothold for that investigation is found in the remark that the chord joining the ends of the graph of $k$ never lies below the graph.
Analytically,
$$k(\beta) \le \beta(1)k(1, 0) + \beta(2)k(0, 1) = l(\beta), \tag{7}$$
where $l(\beta)$ is defined by the context. Unless one of the $\beta(i)$'s vanishes, equality holds in (7), if and only if $k(\beta)$ is a linear function. In view of (7) and (3),
$$v(\mathbf{F}(\mathbf{x}) \mid \beta) = E(k(\beta(\mathbf{x})) \mid \beta) \le E(l(\beta(\mathbf{x})) \mid \beta) = l(\beta). \tag{8}$$
The inequality (8) gives an upper bound for $v(\mathbf{F}(\mathbf{x}))$. In graphical terms it says that, for any $\beta$, no observation can add more to the value $v(\mathbf{F})$ of $\mathbf{F}$ than the vertical distance at $\beta$ between the graph of $k$ and the graph of the chord joining the ends of $k$.

Equality obtains in (8), if $k$ is linear, in which case the upper and lower bounds are equal to each other irrespective of the value of $\beta$ and the nature of the observation. If $\mathbf{F}$ is dominated by a single $\mathbf{f}$, that is, if there is a single $\mathbf{f}$ optimal given $B_i$ for both values of $i$, then $k$ is linear. It can easily be verified that, provided $\mathbf{F}$ is finite and (2.1) actually obtains, this is indeed the only circumstance under which $k$ is linear, and, even if these provisions are not satisfied, the possibilities are not much more interesting.

Suppose, then, that $k$ is not linear; equality can hold in (8), if and only if $\beta(\mathbf{x})$ is with probability one confined to the ends of the interval, a condition that does not depend at all on $\mathbf{F}$. By simple considerations, which have by now been rendered familiar, this condition on $\mathbf{x}$ is equivalent to the condition that
$$P(x \mid B_1)P(x \mid B_2) = 0, \tag{9}$$
for all $x$. An observation satisfying (9) may fairly be called definitive, because, if (2.1) obtains, such an observation removes all uncertainty about the outcome of each $\mathbf{f} \in \mathbf{F}$, no matter what $\beta$ may be. Perhaps many of the observations made in everyday life are definitive, or practically so. Once Old Mother Hubbard looked in the cupboard, her doubts were reduced to the vanishing point. None the less, definitive observations do not play an important part in statistical theory, precisely because statistics is mainly concerned with uncertainty, and there is no uncertainty once an observation definitive for the context at hand has been made.
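The two extremes discussed in this section can be exhibited side by side: an utterly irrelevant observation leaves the value at $k(\beta)$, and a definitive one raises it to the chord $l(\beta)$. The sketch assumes a finite set of dichotomous acts given as pairs `(f1, f2)` and a table `likelihood[i][x]` for $P(x \mid B_i)$; the function names and the numbers are hypothetical.

```python
def k(acts, beta):
    """k(beta) = sup over acts of f_1 beta(1) + f_2 beta(2)."""
    return max(f1 * beta[0] + f2 * beta[1] for f1, f2 in acts)


def chord(acts, beta):
    """l(beta) = beta(1) k(1, 0) + beta(2) k(0, 1)."""
    return beta[0] * k(acts, (1.0, 0.0)) + beta[1] * k(acts, (0.0, 1.0))


def value_with_observation(acts, likelihood, beta):
    """v(F(x) | beta) = sum over x with P(x) > 0 of k(beta(x)) P(x)."""
    total = 0.0
    for x in range(len(likelihood[0])):
        px = sum(lik_i[x] * b for lik_i, b in zip(likelihood, beta))
        if px > 0:
            post = tuple(lik_i[x] * b / px for lik_i, b in zip(likelihood, beta))
            total += px * k(acts, post)
    return total


def utterly_irrelevant(likelihood):
    """True iff the distribution of x given B_i is the same for both i."""
    return all(p1 == p2 for p1, p2 in zip(likelihood[0], likelihood[1]))


def definitive(likelihood):
    """True iff P(x | B_1) P(x | B_2) = 0 for every x."""
    return all(p1 * p2 == 0 for p1, p2 in zip(likelihood[0], likelihood[1]))


acts = [(4.0, 0.0), (0.0, 4.0), (3.0, 3.0)]
beta = (0.5, 0.5)

irrelevant_obs = [[0.25, 0.75], [0.25, 0.75]]
definitive_obs = [[1.0, 0.0], [0.0, 1.0]]
ordinary_obs = [[0.75, 0.25], [0.25, 0.75]]

lower, upper = k(acts, beta), chord(acts, beta)
print(lower, upper)
for obs in (irrelevant_obs, ordinary_obs, definitive_obs):
    print(utterly_irrelevant(obs), definitive(obs),
          value_with_observation(acts, obs, beta))
```

Neither test function takes `acts` or `beta` as an argument, which mirrors the remark that these two properties of an observation depend on neither the basic acts nor the a priori probabilities.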
4 Extensions of observations, and sufficient statistics
It was shown in § 6.4 that a ltatfatic, or contraction, J of an obaervation J: is never worth more than z and is typiea1ly worth I.e. The purpose of the preeent section is to explore the relation between aD observation and a contraction of itself in the case of & partition problem, especially to explore the special eonditioDS in that cue under which the statistic is 18 valuable as the observation itaelf. Let % and J be two obeervationa such that J is a statistic of :I, that is, such that, for some function 1', 11(') =- ,,'(:e(,» with probability one. The values of F(%} and P(y) can be compared by the following ~ tiOD, which in the light of the preceding section wiU Deed but little explanation. v(P(s» ~ B(l~(s» IIJ)
(1)
- 1: E(k~(z» I /J, y)P(1/) .
•
I y) ~ k(E<-'(z» I(l, r»,
(2)
if pcr)
E(k(ft(:I) (l,
> o.
I I fJ, 'II)
(3)
E(fJ(i z)
=z
I: fj(i I z)P(z I y) s
= .E {J(i I%)P(z, 1/} • if P(u)
P(y)
s
> o.
Because of the special relationship between $x$ and $y$, $P(x, y) = 0$ unless $y'(x) = y$, in which case $P(x, y) = P(x)$. Understanding that the summation indicated by $\sum'$ in (4) below extends only over those values of $x$ for which $y'(x) = y$, the calculation is continued thus:

$$E(\beta(i \mid x) \mid \beta, y) = {\sum_x}' \frac{P(x, B_i)}{P(x)}\,\frac{P(x)}{P(y)} = {\sum_x}' \frac{P(x, B_i)}{P(y)} = \frac{P(y, B_i)}{P(y)} = \beta(i \mid y). \tag{4}$$
Therefore,

$$v(F(x) \mid \beta) \ge \sum_y k(\beta(y))\,P(y) = v(F(y) \mid \beta). \tag{5}$$
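The inequality (5) can be checked numerically on a small example. The following Python sketch is only an illustration under stated assumptions: the a priori probabilities, the three acts, and the success probabilities are hypothetical values, and the observation is taken to be three independent trials with outcomes 0 or 1. It computes $v(F(y) \mid \beta) = \sum_y k(\beta(y))P(y)$ for the full observation, for a contraction of it, and for the null observation.

```python
from itertools import product

# Hypothetical twofold partition problem (all numbers are illustrative).
beta = (0.4, 0.6)                             # a priori beta(1), beta(2)
acts = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.6)]   # (E(f | B_1), E(f | B_2)) for each f in F
p = (0.8, 0.3)                                # P(x_r = 1 | B_i)
n = 3                                         # number of independent trials

def k(b):
    """k(beta): the best expected utility attainable from F at probabilities b."""
    return max(u1 * b[0] + u2 * b[1] for u1, u2 in acts)

def likelihood(x, i):
    """P(x | B_i) for a tuple x of zeros and ones."""
    ones = sum(x)
    return p[i] ** ones * (1 - p[i]) ** (len(x) - ones)

def value(statistic):
    """v(F(y) | beta) = sum_y k(beta(y)) P(y), where y = statistic(x)."""
    joint = {}                                # y -> [P(y, B_1), P(y, B_2)]
    for x in product((0, 1), repeat=n):
        cell = joint.setdefault(statistic(x), [0.0, 0.0])
        for i in range(2):
            cell[i] += likelihood(x, i) * beta[i]
    total = 0.0
    for py_b1, py_b2 in joint.values():
        py = py_b1 + py_b2
        total += k((py_b1 / py, py_b2 / py)) * py
    return total

v_x = value(lambda x: x)          # the observation itself
v_first = value(lambda x: x[0])   # a contraction: the first trial only
v_null = value(lambda x: 0)       # the null observation, worth k(beta)
```

In this sketch `v_x`, `v_first`, and `v_null` should appear in non-increasing order, the last being simply $k(\beta)$.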
After the preceding section, it seems almost superfluous to explain that the point of the calculation above is not to obtain the inequality (5), which has already been derived with less labor and greater generality in Exercises 6.3.8 and 6.3.13b, but to be able to discuss when equality holds in (5). The calculation makes it clear that equality holds in (5), if and only if equality holds in (2) for every $y$ of positive probability. This in turn is equivalent to the condition that, given $y$, $\beta(x)$ is confined with probability one to an interval of linearity of $k$. A sufficient condition for that is that, given $y$, $\beta(x)$ be confined with probability one to a single value, which cannot be other than $\beta(y)$; if $k$ is strictly convex, the almost certain confinement of $\beta(x)$ to $\beta(y)$ is also necessary. Now, if, for every $y$ of positive probability, $P(\beta(x(s)) = \beta(y) \mid y) = 1$, then it is true that $\beta(x) = \beta(y)$ with unconditional probability one, that is,

$$P(\beta(x(s)) = \beta(y(s))) = 1. \tag{6}$$
The condition (6) clearly does not depend on $F$, and the following calculation so expresses it as to make clear that it does not depend on $\beta$ either. Equation (6) is satisfied, if and only if

$$\frac{P(x \mid B_i)\,\beta(i)}{P(x)} = \frac{P(y'(x) \mid B_i)\,\beta(i)}{P(y'(x))} \tag{7}$$

when $P(x) > 0$; or, if and only if

$$\frac{P(x \mid B_i)}{P(y \mid B_i)} = \frac{P(x)}{P(y)} \tag{8}$$

when $P(x, B_i) > 0$; or, again, if and only if

$$P(x \mid B_i, y) = P(x \mid y), \tag{9}$$

when $P(y \mid B_i) > 0$; or finally if and only if $P(x \mid B_i, y)$ is independent of $i$ for those values of $i$ for which it is defined.

In this form, and yet another to be derived in connection with (10), the condition is widely studied in modern statistical theory and a statistic satisfying the condition is there called a sufficient statistic. The name is well justified; for, as has just been shown, it is sufficient, for any purpose to which $x$ might be put, to know $y$, if and only if $y$ is a sufficient statistic for $x$.

A different, and perhaps more congenial, approach to sufficient statistics is the following. If the person observes the particular value $y$ of $y$, his original basic decision problem is replaced by a new one with the same basic acts, but with $\beta$ replaced by $\beta(y)$. Strictly speaking, this will fail to be a partition problem, in case $\beta(y)$ is $(0, 1)$ or $(1, 0)$, or, for brevity, if $\beta(y)$ is extreme. To see whether $v(F(x) \mid \beta)$ is really greater
than $v(F(y) \mid \beta)$, it is enough to investigate whether, for some $y$ of positive probability for which $\beta(y)$ is not extreme, $x$ is relevant to the partition problem based on $\beta(y)$, for if $\beta(y)$ is extreme there can be no value in following the observation that $y$ has occurred by the observation of $x$. Therefore, $x$ will be a worthless addition to $y$, if, for every $y$ for which $\beta(y)$ is not extreme, $x$ is utterly irrelevant, that is, if $y$ is sufficient for $x$. If $k$ is strictly convex, the condition is also necessary.

The recognition of sufficient statistics in explicit problems is often facilitated by the following factorability criterion. A statistic $y$ is sufficient for $x$ if and only if there exists at least one pair of functions $R$ and $S$ such that

$$P(x \mid B_i) = R(y'(x); i)\,S(x). \tag{10}$$
The necessity of the condition follows from the exhibition of a particular $R$ and $S$ for a sufficient statistic thus:

$$P(x \mid B_i) = \sum_y P(x \mid B_i, y)\,P(y \mid B_i) = \sum_y P(x \mid y)\,P(y \mid B_i) = P(y'(x) \mid B_i)\,P(x \mid y'(x)). \tag{11}$$
On the other hand, if $P(x \mid B_i)$ can be expressed in the form (10), $y$ can be seen to be sufficient for $x$ thus: If $P(x \mid B_i, y)$ is meaningful, it is given by

$$P(x \mid B_i, y) = \frac{P(x, y \mid B_i)}{P(y \mid B_i)} = \begin{cases} 0, & \text{if } y'(x) \ne y,\\[4pt] \dfrac{P(x \mid B_i)}{P(y \mid B_i)} = \dfrac{S(x)}{{\sum_{x'}}' S(x')}, & \text{if } y'(x) = y, \end{cases} \tag{12}$$
which is independent of $i$. The reader may be interested in asking himself, as an exercise, what freedom there is in choosing $R$ and $S$ when at least one such pair of factors exists.

Interest in sufficient statistics is not confined, of course, to twofold, or even finite, partitions. With that in mind, the various criteria for sufficient statistics have been given in such terms as to be valid for any finite partition and the usual infinite ones. They require some modification if the observations are not confined to a finite, or at any rate denumerable, set of values, but formal details of that important extension will not be given here. Elementary treatments are given in most textbooks of mathematical statistics; more advanced and general treatments are given in [B2], [L6], and [H3].

There are several examples of sufficient statistics in the exercises below, others are given in almost any fairly advanced textbook on statistics (in particular, in [C9]), and one other general example of extraordinary importance is treated in the next section.
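The defining property of a sufficient statistic, that $P(x \mid B_i, y)$ be independent of $i$, lends itself to a direct numerical check when $x$ has only finitely many values. The Python sketch below is an illustration only; it assumes a multiple observation of four independent trials with outcomes 0 or 1, and the three values of $p_i$ are hypothetical. The helper names are not taken from the text.

```python
from itertools import product
from math import comb, isclose

n = 4
p = (0.2, 0.5, 0.9)   # hypothetical P(x_r = 1 | B_i), one value for each B_i

def likelihood(x, i):
    """P(x | B_i) for a tuple x of n zeros and ones."""
    ones = sum(x)
    return p[i] ** ones * (1 - p[i]) ** (n - ones)

def conditional_given_statistic(x, i, statistic):
    """P(x | B_i, y) with y = statistic(x), computed from the definition."""
    y = statistic(x)
    p_y = sum(likelihood(z, i)
              for z in product((0, 1), repeat=n) if statistic(z) == y)
    return likelihood(x, i) / p_y

def is_sufficient(statistic):
    """True if P(x | B_i, y) is the same for every i, whatever x may be."""
    for x in product((0, 1), repeat=n):
        vals = [conditional_given_statistic(x, i, statistic)
                for i in range(len(p))]
        if not all(isclose(v, vals[0], rel_tol=1e-9) for v in vals):
            return False
    return True

sum_is_sufficient = is_sufficient(sum)                  # y(x) = sum of the x_r
first_is_sufficient = is_sufficient(lambda x: x[0])     # y(x) = x_1 alone
example = conditional_given_statistic((1, 0, 0, 1), 0, sum)
uniform_share = 1 / comb(n, 2)                          # one of C(4, 2) arrangements
```

Under these assumptions the sum of the trials passes the check and the first trial alone does not; `example` should agree with `uniform_share`.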
Exercises

In these exercises, let $x$ denote a multiple observation $x = \{x_1, \dots, x_n\}$, where, given $B_i$, the $x_r$'s are independent and identically distributed. There will be no real advantage here in thinking of the partition as twofold, or even finite, and for some of the exercises it will be impractical to do so.

1. Let

$$P(x_r \mid B_i) = \begin{cases} p_i, & \text{if } x_r = 1,\\ q_i, & \text{if } x_r = 0,\\ 0, & \text{otherwise,} \end{cases}$$

where $p_i + q_i = 1$; and let $y(x) = \sum_r x_r$. Show that:

(a) $P(x \mid B_i) = p_i^{\,y} q_i^{\,n-y}$;

(b) $y$ is sufficient for $x$, using the factorability criterion;

(c) $P(y \mid B_i) = \binom{n}{y} p_i^{\,y} q_i^{\,n-y}$, where, as always, $\binom{n}{y} = n!/y!\,(n - y)!$;

(d) $P(x \mid y'(x)) = \binom{n}{y(x)}^{-1}$.
2. For each positive integer $i$, let

$$P(x_r \mid B_i) = \begin{cases} i^{-1}, & \text{if } x_r \le i,\\ 0, & \text{otherwise,} \end{cases}$$

where the values of $x_r$ are confined to the positive integers; and let $y(x) = \max_r x_r$. Show that:

(a) $P(x \mid B_i) = i^{-n}$, if $y \le i$, and $= 0$, otherwise;

(b) $y$ is sufficient for $x$.
3. In the two exercises above it has been possible to choose the factor $S$ identically equal to 1. To exhibit a more typical example, let $i$, $x_r$, and $y$ be confined to the positive integers with $y(x) = \max_r x_r$, as in the preceding exercise, and let

$$P(x_r \mid B_i) = \begin{cases} \dfrac{2x_r}{i(i + 1)}, & \text{if } x_r \le i,\\[4pt] 0, & \text{otherwise.} \end{cases}$$

Show that:

(a) $P(x \mid B_i) = \left(\dfrac{2}{i(i + 1)}\right)^{n} \prod_r x_r$, if $y \le i$, and $= 0$, otherwise;

(b) $y$ is sufficient for $x$.

4. Put no restriction on the conditional distributions $P(x_r \mid B_i)$, except that $x_r$ be confined with probability one to some fixed finite set. Say, for the moment, that two values $x$ and $x'$ of $x$ are team mates, if one arises from the other by permutation of the component observations. This divides the possible values of $x$ into teams, and, academic though it may seem, the team to which $x$ belongs can be taken as $y(x)$. Show that the probability of $x$ given $y'(x)$ and $B_i$ is independent of $i$ (if it is defined at all), so that the statistic $y(x)$ is sufficient for $x$.

If the values of the $x_r$'s happen to be real numbers, then for any $x$ it is possible to permute the component observations to obtain a non-decreasing sequence of $n$ (not necessarily distinct) numbers, and only one such non-decreasing sequence can be so obtained from each $x$. The sequence thus attached through $x$ to each $s$ is called in statistical usage the sequence of order statistics corresponding to $s$. Since team mates, and only team mates, have the same order statistics, the set of order statistics regarded as a single statistic is equivalent to the team statistic $y(x)$ defined more generally in the paragraph above and is therefore sufficient.

5. Let $x_r$ given $B_i$ be subject to the normal probability density with mean $\mu_i$ and variance $\sigma_i^2$, that is,

$$\phi(x_r \mid B_i) = (2\pi\sigma_i^2)^{-1/2} \exp\{-(x_r - \mu_i)^2 / 2\sigma_i^2\}. \tag{13}$$
This situation, though elementary, does not fall within the technical scope of this book, because $x_r$ is not confined to a finite set of values. The reader familiar with probability densities will see, however, that the density of $x$ is

$$\phi(x_1, \dots, x_n \mid B_i) = (2\pi\sigma_i^2)^{-n/2} \exp\left\{ -\frac{\sum x_r^2}{2\sigma_i^2} + \frac{\mu_i \sum x_r}{\sigma_i^2} - \frac{n\mu_i^2}{2\sigma_i^2} \right\}, \tag{14}$$

which suggests that $y$ defined by

$$y(x) = \left\{ \sum x_r,\ \sum x_r^2 \right\} \tag{15}$$

may fairly be called a sufficient statistic for $x$. Show in the same heuristic way that, if $\sigma_i$ is independent of $i$, then $y(x) = \sum x_r$ defines a sufficient statistic; and that, if $\mu_i$ is independent of $i$, with common value $\mu$, then $y(x) = \sum (x_r - \mu)^2$ does so.

6. If $w$ and $z$ are observations independent of each other given $B_i$, under what conditions can $w$ be sufficient for $\{w, z\}$?

7. To break away from independent observations, suppose that, in the event $B_i$, $n$ cards are dealt from a thoroughly shuffled deck of $n + i$ cards each bearing a different serial number from 1 through $n + i$. Let $w_r$ be the number on the $r$th card dealt and $w = \{w_1, \dots, w_n\}$. Show that $\max_r w_r$ defines a sufficient statistic for $w$ and that the $w_r$'s are not independent.
8. If $z$ extends $w$, and $w$ is sufficient for $y$, then $z$ is also sufficient for $y$.

9. If $y$ is sufficient for $w$, and $z$ is independent of both $y$ and $w$, then $\{z, y\}$ is sufficient for $\{w, z\}$.

10. Every definitive statistic is sufficient.
In virtually all statistics texts it would be said that the $y$ defined by (15) constitutes not one statistic, but two; similarly, the set of order statistics would ordinarily be referred to as $n$ statistics rather than as one. There are contexts in which it is appropriate to try to count statistics in that fashion, but, so far as the theory of sufficient statistics is concerned, it often seems fruitless, if not positively detrimental, to do so.

The concept of sufficient statistics has proved of great value in statistical theory and practice. The reason for this does not seem to me altogether easy to analyze, but, as the exercises above illustrate, the families of distributions most frequently studied in statistics are generally rich in sufficient statistics. It is hard to separate cause from effect here; for the distributions that are most studied tend to be those having the greatest mathematical simplicity, and the presence of striking sufficient statistics, such as those exhibited by Exercises 1, 2, 3, 5, and 7, is among the sources of mathematical simplicity most often met in the study of particular families of distributions.

It must be emphasized that sufficient statistics often provide a significant saving in the mechanical labor of storing and presenting data. Thus, in any experiment faithfully represented by Exercise 1, it is sufficient, in both the technical and ordinary senses of the word, to record a single integer in place of the list of $x_r$'s, which might well be very long. Several of the other exercises would in principle also lead to great savings of this sort, but Exercise 5 is the only other that arises frequently in practice.
The concept of sufficient statistics was introduced, together with much of the theory associated with it, by R. A. Fisher (cf. index, [F3]). The subject has been one of continuing interest and has been explored in several directions; key references are [B2], [E1], [L6], [H3], [K16], and [M5], and (LeCam 1964).
5 Likelihood ratios

The random variable $\beta(x)$ has played so important a role in preceding sections that the reader will probably not be surprised to find that $\beta(x)$ is a sufficient statistic for $x$, a conclusion that, in the light of the factorability criterion (4.10), can be seen thus:

$$P(x \mid B_i) = \frac{P(B_i \mid x)\,P(x)}{\beta(i)}. \tag{1}$$
If a statistic is sufficient, it is sufficient irrespective of the value of $\beta$; moreover, any multiple of it by a non-zero constant is also sufficient. Therefore, (1) implies that for any numbers $\alpha(j)$, such that $\alpha(j) > 0$, the multiple observation $r(\alpha)$ defined by

$$r_i(x; \alpha) =_{Df} \frac{P(x \mid B_i)}{\sum_j \alpha(j)\,P(x \mid B_j)}, \tag{2}$$

$$r(x; \alpha) =_{Df} \{r_1(x; \alpha),\ r_2(x; \alpha)\}, \tag{3}$$

is a sufficient statistic for $x$. Since

$$\sum_j \alpha(j)\,r_j(x; \alpha) = 1,$$
there is some redundancy in retaining both components, but this redundancy is more than compensated by the advantage of retaining symmetry, especially when $n$-fold partitions are contemplated.

Formally, the $r(\alpha)$'s are an infinite family of sufficient statistics, one for each $\alpha$; but to all intents and purposes they represent but one sufficient statistic, for any $r(\alpha)$ is equivalent to any other, say $r(\alpha')$, as can be demonstrated thus:

$$r_i(x; \alpha) = \frac{P(x \mid B_i) \big/ \sum_k \alpha'(k)\,P(x \mid B_k)}{\sum_j \alpha(j)\,\{P(x \mid B_j) \big/ \sum_k \alpha'(k)\,P(x \mid B_k)\}} = \frac{r_i(x; \alpha')}{\sum_j \alpha(j)\,r_j(x; \alpha')}. \tag{4}$$

Having such a multiplicity of forms for what is essentially one important statistic is rather embarrassing, so there is some incentive to pick a standard form. Setting each $\alpha(j) = 1$ recommends itself as convenient and leads to the particular statistic $r = \{r_1, r_2\}$, where

$$r_i(x) = \frac{P(x \mid B_i)}{\sum_j P(x \mid B_j)}. \tag{5}$$
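The equivalence of the various forms $r(\alpha)$ is easy to see at work in a small computation. The following Python sketch is illustrative only: the two likelihood values and the weights $\alpha$ are hypothetical, and the function names `r` and `convert` are introduced here for the purpose.

```python
from math import isclose

def r(likelihoods, alpha):
    """r_i(x; alpha) = P(x | B_i) / sum_j alpha(j) P(x | B_j)."""
    denom = sum(a * l for a, l in zip(alpha, likelihoods))
    return [l / denom for l in likelihoods]

def convert(r_other, alpha):
    """Recover r(x; alpha) from r(x; alpha'), for any alpha', by renormalizing."""
    denom = sum(a * rj for a, rj in zip(alpha, r_other))
    return [ri / denom for ri in r_other]

lik = [0.03, 0.12]             # hypothetical P(x | B_1), P(x | B_2)
alpha = [1 / 3, 2 / 3]         # hypothetical positive weights
standard = r(lik, [1, 1])      # the standard form, each alpha(j) = 1
weighted = r(lik, alpha)
recovered = convert(standard, alpha)
```

Here `recovered` should coincide with `weighted`, and both forms preserve the quotient of the two likelihoods, which is why any one of them serves as well as any other.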
This form is indeed convenient for twofold and, more generally, for $n$-fold partitions, but, where infinite partitions are to be dealt with, its apparent naturalness is misleading, for the sum in the denominator of (5) is then typically infinite. In the case of twofold partitions, a convenient form for the statistic is that of a likelihood ratio, in the sense introduced in § 3.6, for it is easy to see that, infinite numbers being admitted, $P(x \mid B_1)/P(x \mid B_2)$ is equivalent to $r$. Henceforth, any statistic equivalent to $r$ will be called a likelihood ratio of $x$ with respect to the partition $B_i$, a definition that does not seriously conflict with ordinary statistical usage of the term.

Figure 1 illustrates a geometric interpretation of likelihood ratios that is sometimes valuable. The figure can best be described by telling how to draw it. First draw a pair of cartesian coordinate axes for variables $u_1$ and $u_2$. Next draw the two line segments represented by $u_1 + u_2 = 1$ and $\alpha(1)u_1 + \alpha(2)u_2 = 1$ with the $u_i$'s non-negative. The left ends of these segments are indicated in Figure 1 by $a$ and $b$, respectively, the particular value $\alpha = (1/3, 2/3)$ being used for illustration. Now plot the point $\{P(x \mid B_1), P(x \mid B_2)\}$. If $x$ has positive probability (for any, and therefore for all, $\beta$), this point will be different from the origin 0, so it will be possible to draw the (dashed) line connecting the origin with the point $\{P(x \mid B_1), P(x \mid B_2)\}$. This line (or ray through the origin, as it is often called) must necessarily pierce the line segments $a$ and $b$. The important geometrical fact, which the reader will have no difficulty in verifying, is that these intersections occur at the points $\{r_1(x), r_2(x)\}$ and $\{r_1(x, \alpha), r_2(x, \alpha)\}$, respectively.
Figure 1
It is also obvious that the ratio $P(x \mid B_1)/P(x \mid B_2)$ is the reciprocal of the slope of the ray. Since, to each $x$ that occurs with positive probability, there corresponds a ray through the origin, the ray can be taken as a statistic; according to the geometrical construction of the preceding paragraph, this statistic is equivalent to $r$ and is therefore a likelihood ratio of $x$ with respect to the partition $B_i$.

The ray connecting the origin with a point $\{u_1, u_2\}$ can conveniently be represented by the suggestive notation $u_1 : u_2$, though, of course, different pairs of numbers can represent the same ray. More explicitly, if $\lambda$ is any number different from 0, $\lambda u_1 : \lambda u_2$ represents the same ray as $u_1 : u_2$. In analytical projective geometry any pair of numbers representing a ray in this fashion is called a set of homogeneous coordinates of the ray. The redundancy of the notation $u_1 : u_2$ may be removed by, for example, characterizing the ray by the reciprocal of its slope $u_1/u_2$. Such non-homogeneous coordinatization entails a sacrifice in symmetry and the necessity of admitting infinity as a meaningful value of the quotient; both losses are quite troublesome in extension of these geometric concepts to cartesian space of $n$ dimensions, which is necessary in connection with $n$-fold partitions. In homogeneous coordinates the likelihood ratio can conveniently be represented by any of the equally good sets of homogeneous coordinates, $P(x \mid B_1) : P(x \mid B_2)$, $r_1(x) : r_2(x)$, and $r_1(x, \alpha) : r_2(x, \alpha)$. Finally, it may be remarked that $P(x \mid B_1)/P(x \mid B_2)$ is a non-homogeneous coordinate. Thus the many equivalent forms in which the likelihood ratio statistics can be naturally expressed correspond to the many different notations by which a ray through the origin can be naturally designated.

The most remarkable fact about the likelihood ratio considered as a statistic is that it is necessary, so to speak, as well as sufficient. By that I mean that to have the advantages of knowing $x$ it is necessary as well as sufficient to know the likelihood ratio. The point can be put formally thus:

THEOREM 1 If $y$ is sufficient for $x$, then $y$ is an extension of $r$.
PROOF. The theorem is virtually obvious in terms of the factorability criterion for sufficient statistics, for in the notation of (4.10)

$$r_i(x) = \frac{R(y(x), i)}{\sum_j R(y(x), j)} \tag{6}$$

with probability one, exhibiting $r_i$ as a function of $y$. ◆
COROLLARY 1 If $y$ is sufficient for $x$, and if every $z$ sufficient for $x$ is an extension of $y$, then $y$ is equivalent to $r$.

By ordinary analytic standards, the likelihood ratio seems to be a rather complicated statistic, at least in the case of $n$-fold partitions, where $n$ is at all large; for, to one who takes seriously the idea that a multiple statistic should not also be regarded as a single statistic, the likelihood ratio seems at first sight to be $n$, or perhaps $(n - 1)$, statistics. Yet Theorem 1 and its corollary show that the likelihood ratio is, in a fundamental sense, the most compact sufficient statistic that a partition problem admits.

As an explicit example of a likelihood ratio, consider the twofold partition problem arising from Exercise 4.1 on confining attention to two different values of $p$, say $p_1$ and $p_2$. The likelihood ratio $r$ is easily computed thus:

$$P(x \mid B_i) = p_i^{\,y(x)} (1 - p_i)^{n - y(x)} = q_i^{\,n} \left(\frac{p_i}{q_i}\right)^{y(x)}, \tag{7}$$

so

$$r_i(x) = \frac{q_i^{\,n}\,(p_i/q_i)^{y(x)}}{\sum_j q_j^{\,n}\,(p_j/q_j)^{y(x)}}. \tag{8}$$
Theorem 1 is thereby verified in the present instance; for (8) exhibits $r$ explicitly as a contraction of $y$, and $y$ is easily exhibited as a contraction of $r$ thus:

$$y(x) = \frac{\log\{r_1(x)/r_2(x)\} - n \log(q_1/q_2)}{\log(p_1 q_2 / p_2 q_1)}. \tag{9}$$

In this example, $y$ is, in view of (8) and (9), equivalent to the likelihood ratio.
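The equivalence just described can be traced numerically. The Python sketch below is an illustration under stated assumptions: the values of $n$, $p_1$, and $p_2$ are hypothetical, the non-homogeneous form $P(x \mid B_1)/P(x \mid B_2)$ is used for the likelihood ratio, and the function names are introduced here for the purpose.

```python
from math import log

n = 10
p1, p2 = 0.7, 0.4              # hypothetical values; they must differ
q1, q2 = 1 - p1, 1 - p2

def ratio_from_y(y):
    """P(x | B_1) / P(x | B_2) as a function of y, the number of successes."""
    return (q1 / q2) ** n * (p1 * q2 / (p2 * q1)) ** y

def y_from_ratio(ratio):
    """Invert ratio_from_y; the inversion is possible because p1 != p2."""
    return (log(ratio) - n * log(q1 / q2)) / log(p1 * q2 / (p2 * q1))

# Passing from y to the ratio and back again should return each y unchanged.
recovered = [round(y_from_ratio(ratio_from_y(y))) for y in range(n + 1)]
```

Since each of the two quantities is a function of the other, neither conveys more about the partition than the other does.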
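Exercises

1. Express $k(\beta(x))$ and $v(F(x))$ in terms of the likelihood ratio thus:

$$\beta(i; r) =_{Df} \frac{r_i\,\beta(i)}{\sum_j r_j\,\beta(j)}, \tag{10}$$

$$k(\beta(x)) = k(\beta(r(x))), \tag{11}$$

$$v(F(x) \mid \beta) = \sum_r k(\beta(r)) \Big[ \sum_j P(r \mid B_j)\,\beta(j) \Big]. \tag{12}$$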
2. This extended exercise develops the personalistic and behavioralistic theory of what, following the objectivistic and verbalistic traditions of statistics, is called the testing of a simple dichotomy, a type of decision problem that, though seldom very realistic, is a popular and instructive example with important implications for more realistic problems. Verbalistically such a problem is described as that of making the best guess on the basis of an observation as to whether it is $B_1$ or $B_2$ that obtains. Behavioralistically, this is generally interpreted as the problem of deciding, on the basis of observation, between two primary acts one of which is preferable to the other if $B_1$ obtains and vice versa if $B_2$ does. Here is one topic in which the assumption that $i$ is confined to two values is rather more than simply a pedagogical simplification; a reader interested in relaxing the assumption will find pages 127-130 of [W3] stimulating.

Suppose that $F$ contains only two acts $f_1$ and $f_2$ and is dominated by neither. Let $\phi_{ij} =_{Df} E(f_i \mid B_j)$. There is no loss of generality in supposing

$$\delta_1 =_{Df} \frac{\phi_{11} - \phi_{21}}{2} > 0, \qquad \delta_2 =_{Df} \frac{\phi_{22} - \phi_{12}}{2} > 0, \tag{13}$$

which will henceforth be done. That is, it will be supposed that $f_1$ is appropriate only to $B_1$ and vice versa.
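(b) Show that

$$k(\beta) = \begin{cases} \sum_j \phi_{1j}\,\beta(j), & \text{for } \beta(1) \ge \delta_2/(\delta_1 + \delta_2) = \beta_0(1),\\[4pt] \sum_j \phi_{2j}\,\beta(j), & \text{for } \beta(2) \ge \delta_1/(\delta_1 + \delta_2) = \beta_0(2), \end{cases} \tag{14}$$

$$k(\beta) = \tfrac{1}{2}(\phi_{11} + \phi_{21})\beta(1) + \tfrac{1}{2}(\phi_{12} + \phi_{22})\beta(2) + |\delta_2\beta(2) - \delta_1\beta(1)| = \sum_j \bar\phi_j\,\beta(j) + |\delta_2\beta(2) - \delta_1\beta(1)|,$$

where $\beta_0$ and the $\bar\phi_j$'s are defined by the context.

(c) $E(f_i \mid \beta) = k(\beta)$, if and only if $\beta(i) \ge \beta_0(i)$. This condition obtains for both $i$'s simultaneously, if and only if $\beta = \beta_0$.

(d) Show that

$$k(\beta(r)) = \frac{\sum_j \bar\phi_j\,r_j\,\beta(j) + |\delta_2 r_2 \beta(2) - \delta_1 r_1 \beta(1)|}{\sum_j r_j\,\beta(j)} = \sum_j \phi_{ij}\,\beta(j; r) \quad \text{for } r_i \ge r_i^*(\beta, \beta_0), \tag{15}$$

where

$$r_i^*(\beta, \beta_0) =_{Df} \frac{\beta_0(i)/\beta(i)}{\sum_j \beta_0(j)/\beta(j)}, \tag{16}$$

and that

$$\begin{aligned} v(F(x) \mid \beta) &= \sum_j \bar\phi_j\,\beta(j) + \sum_r |\delta_2 P(r \mid B_2)\beta(2) - \delta_1 P(r \mid B_1)\beta(1)|\\ &= \{\bar\phi_1 + \delta_1[1 - 2P(r_1 < r_1^*(\beta, \beta_0) \mid B_1) - P(r = r^*(\beta, \beta_0) \mid B_1)]\}\,\beta(1)\\ &\quad + \{\bar\phi_2 + \delta_2[1 - 2P(r_2 < r_2^*(\beta, \beta_0) \mid B_2) - P(r = r^*(\beta, \beta_0) \mid B_2)]\}\,\beta(2). \end{aligned} \tag{17}$$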
(e) Any derived act $f(x)$ determines a function $i$ assigning an $i$ to each $x$, $i$ being implicitly defined thus: $f(x) = f_{i(x)}$. Conversely any $i$ determines a derived act. Show that $E(f(x) \mid \beta) = v(F(x) \mid \beta)$, if and only if $r_{i(x)}(x) \ge r_{i(x)}^*(\beta, \beta_0)$ for every $x$. Such a function $i(x)$ is called a likelihood-ratio test associated with $r^*$. Show that at least one likelihood-ratio test is associated with every value of $r^*$, and that if $P(r = r^*) = 0$ (which is typically the case) there is only one.

(f) If $f(x)$ is determined by a function $i(x)$, the probability of deciding on the inappropriate value of $i$ in case $B_j$ obtains is generally called the probability of an error of the $j$-th kind. Analytically the probabilities of error of the first and second kind are, respectively,

$$\epsilon_1 =_{Df} P(i(x) = 2 \mid B_1), \qquad \epsilon_2 =_{Df} P(i(x) = 1 \mid B_2). \tag{18}$$
If $i^*$ is a likelihood-ratio test associated with $r^*$, show that its errors of the first and second kind are subject to the bounds

$$P(r_1 < r_1^* \mid B_1) \le \epsilon_1^* \le P(r_1 \le r_1^* \mid B_1), \tag{19}$$

$$P(r_1 > r_1^* \mid B_2) \le \epsilon_2^* \le P(r_1 \ge r_1^* \mid B_2). \tag{20}$$
What about the typical case that $P(r = r^*) = 0$?

(g) Show that, if $i$ is at least as good as $i^*$ in the sense that $\epsilon_i \le \epsilon_i^*$ for both $i$'s, then $i$ is a likelihood-ratio test and $i$ is virtually $i^*$ in that $\epsilon_i = \epsilon_i^*$ for both $i$'s. Hint: Consider an $F$ and a $\beta$ for which $r^*(\beta, \beta_0) = r^*$, showing that these exist, and note that, for this decision problem,

$$\begin{aligned} E(f_{i^*} \mid \beta) &= \{\bar\phi_1 + \delta_1(1 - 2\epsilon_1^*)\}\,\beta(1) + \{\bar\phi_2 + \delta_2(1 - 2\epsilon_2^*)\}\,\beta(2) = v(F(x) \mid \beta),\\ E(f_i \mid \beta) &= \{\bar\phi_1 + \delta_1(1 - 2\epsilon_1)\}\,\beta(1) + \{\bar\phi_2 + \delta_2(1 - 2\epsilon_2)\}\,\beta(2) \le v(F(x) \mid \beta), \end{aligned} \tag{21}$$

with equality if and only if $i$ is a likelihood-ratio test. This important conclusion about likelihood-ratio tests has been much emphasized, especially by the Neyman-Pearson school.
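The relation between the errors of the two kinds and the expected utility of a test can be illustrated with the binomial model of Exercise 4.1. The Python sketch below is only illustrative: the values of $n$, $p_1$, $p_2$, $\beta$, and the $\phi_{ij}$ are hypothetical, indices run from 0 rather than 1, and, since $y$ is sufficient, tests are expressed as functions of $y$ rather than of $x$.

```python
from math import comb

n = 12
p = (0.7, 0.4)                    # hypothetical P(x_r = 1 | B_1), P(x_r = 1 | B_2)
beta = (0.5, 0.5)                 # hypothetical a priori probabilities
phi = ((1.0, 0.0), (0.0, 1.0))    # phi[i][j] = E(f_i | B_j), indices from 0

def pmf(y, j):
    """P(y | B_j) for the number y of successes in n trials."""
    return comb(n, y) * p[j] ** y * (1 - p[j]) ** (n - y)

def errors(decide):
    """(eps_1, eps_2) for a test decide(y) in {0, 1}; 0 stands for the act f_1."""
    eps1 = sum(pmf(y, 0) for y in range(n + 1) if decide(y) == 1)
    eps2 = sum(pmf(y, 1) for y in range(n + 1) if decide(y) == 0)
    return eps1, eps2

def expected_utility(decide):
    """E(f(x) | beta) for the derived act determined by the test."""
    return sum(phi[decide(y)][j] * pmf(y, j) * beta[j]
               for y in range(n + 1) for j in range(2))

def bayes_test(y):
    """A likelihood-ratio test: the act with the larger a posteriori expectation."""
    u = [sum(phi[i][j] * pmf(y, j) * beta[j] for j in range(2)) for i in range(2)]
    return 0 if u[0] >= u[1] else 1

def other_test(y):
    """Some other test, with an arbitrarily chosen threshold."""
    return 0 if y >= 9 else 1

value_of_x = sum(max(sum(phi[i][j] * pmf(y, j) * beta[j] for j in range(2))
                     for i in range(2)) for y in range(n + 1))
eps = errors(bayes_test)
delta = ((phi[0][0] - phi[1][0]) / 2, (phi[1][1] - phi[0][1]) / 2)
phibar = ((phi[0][0] + phi[1][0]) / 2, (phi[0][1] + phi[1][1]) / 2)
by_formula = sum((phibar[j] + delta[j] * (1 - 2 * eps[j])) * beta[j]
                 for j in range(2))
```

In this sketch `by_formula`, `expected_utility(bayes_test)`, and `value_of_x` should agree, while `expected_utility(other_test)` cannot exceed them.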
The concept of likelihood ratio, sometimes simply called likelihood, is now one of the most pervasive concepts of statistical theory. It seems to have been introduced in 1922 by R. A. Fisher (cf. index of [F3]), who emphasized it in connection with the important method of estimation named by him "the method of maximum likelihood." Its use in testing hypotheses was apparently first emphasized by J. Neyman and E. S. Pearson (see Vol. II, p. 303 of [K2]).

In connection with likelihood ratios as necessary and sufficient statistics, mathematically advanced readers will be interested in Section 6 of [L6], [B2], and [M5]. One of the earliest contributions in this direction was made by C. A. B. Smith [S14].
6 Repeated observations

If $x(n) = \{x_1, \dots, x_n\}$, where, given $B_i$, the $x_r$'s are independent identically distributed random variables, then $v(F(x(n)))$ is a non-decreasing function of $n$, for the $(n + 1)$-tuple is an extension of the $n$-tuple. If $k(\beta)$ is strictly convex, a condition that you now recognize as interesting, $v(F(x(n)))$ is easily seen to be strictly increasing in $n$, unless the individual $x_r$'s are either utterly irrelevant or definitive.

It is to be expected, especially in the light of the approach to certainty discussed in § 3.6, that, as $n$ becomes very large, $x(n)$ will become practically definitive. Indeed, § 3.6 makes it possible to state and prove a formal theorem to that effect.

THEOREM 1

HYP. 1. $x(n) = \{x_1, \dots, x_n\}$, where, given $B_i$, the $x_r$'s are independent and identically distributed random variables.
2. The $x_r$'s are not utterly irrelevant to $B_i$.
3. $v(F \mid \beta) = k(\beta)$.

CONCL. $\lim_{n \to \infty} v(F(x(n)) \mid \beta) = l(\beta) =_{Df} \beta(1)k(1, 0) + \beta(2)k(0, 1)$ uniformly in $\beta$.
PROOF. Writing $x$ as short for $x(n)$,

$$v(F(x) \mid \beta) = E[k(\beta(x))]. \tag{1}$$

For an arbitrary $\epsilon > 0$, let the closed interval $I$ on which $k$ is defined be partitioned into two subsets $J$ and $K$, where $J$ is the set of those $\beta$'s such that

$$k(\beta) \ge l(\beta) - \epsilon, \tag{2}$$

and $K$ is the complement of $J$ relative to $I$. It follows from the continuity of the functions on each side of (2) that $\beta \in J$, if either component of $\beta$ is sufficiently large.

The computation initiated in (1) can now be carried forward thus:

$$\begin{aligned} E[k(\beta(x))] &= E[k(\beta(x)) \mid \beta(x) \in J]\,P(\beta(x) \in J) + E[k(\beta(x)) \mid \beta(x) \in K]\,P(\beta(x) \in K)\\ &\ge E[l(\beta(x)) \mid \beta(x) \in J]\,P(\beta(x) \in J) - \epsilon + \min_{\beta'} k(\beta')\,P(\beta(x) \in K)\\ &= E[l(\beta(x))] - \{E[l(\beta(x)) \mid \beta(x) \in K] - \min_{\beta'} k(\beta')\}\,P(\beta(x) \in K) - \epsilon\\ &\ge l(\beta) - 2 \max_{\beta'} |k(\beta')|\,P(\beta(x) \in K) - \epsilon. \end{aligned} \tag{3}$$

Now, in view of the paragraph in which (3.6.15) occurs and the fact that, if either component of $\beta$ is close to 1, $\beta \in J$; $P(\beta(x(n)) \in K)$ becomes arbitrarily small for sufficiently large $n$. ◆
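The approach of $v(F(x(n)) \mid \beta)$ to $l(\beta)$ can be watched in a simple case. The Python sketch below is an illustration under stated assumptions: the $x_r$'s take the values 0 and 1 with hypothetical probabilities, $F$ is taken to consist of two acts each paying 1 under its own $B_i$ and 0 otherwise, and the value is computed through the sufficient statistic $y = \sum x_r$.

```python
from math import comb

p = (0.6, 0.4)        # hypothetical P(x_r = 1 | B_i)
beta = (0.3, 0.7)     # hypothetical a priori probabilities

def k(b1):
    """k(beta) for two acts paying 1 under 'their' B_i and 0 otherwise."""
    return max(b1, 1 - b1)

def value(n):
    """v(F(x(n)) | beta), computed through the sufficient statistic y."""
    total = 0.0
    for y in range(n + 1):
        joint = [comb(n, y) * p[i] ** y * (1 - p[i]) ** (n - y) * beta[i]
                 for i in range(2)]
        py = joint[0] + joint[1]
        total += k(joint[0] / py) * py
    return total

l_beta = beta[0] * k(1.0) + beta[1] * k(0.0)      # l(beta), here equal to 1
values = [value(n) for n in (0, 1, 5, 25, 100, 400)]
```

The list `values` should be non-decreasing, beginning at $k(\beta)$ and ending close to `l_beta`.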
7 Sequential probability ratio procedures

The present section digresses to discuss an interesting application of the ideas presented in this chapter to what is called sequential analysis. Sequential analysis refers in principle to the theory of observational programs in which the selection of what observations to make in later phases of the program depends on what has been observed in earlier phases. Such behavior is commonplace in everyday life; for example, you look for something until you find it, but not longer. Statistics itself has always used sequential procedures. For example, it is not rare to conduct a preliminary experiment to determine how a main experiment should be carried out. Thus, if one were required to estimate with a roughly preassigned precision the mean of a normal distribution of unknown mean and unknown variance, one might reasonably begin by taking ten or twenty observations, which would give some idea of the variance and would therefore determine about how many observations are necessary for achieving the requisite precision.

Commonplace though problems with sequential features are, A. Wald was the first to develop (1943) a systematic theory of a considerable body of problems of this sort. For early history see the Introduction of [W2] and the Foreword of Section I of [S17].

Some later ideas on sequential analysis, due mainly to Wald and Wolfowitz, are the subject of this section. It will not be practical to proceed with full rigor, primarily because random variables capable of assuming an infinite number of values are necessarily involved. Full details are given in [W3] and more compactly in [A7], but not in Wald's book on sequential analysis [W2].

Let $x = \{x(1), \dots, x(n), \dots\}$, where the $x(n)$'s are conditionally an infinite sequence of independent, relevant, identically distributed random variables. Rather informally, a sequential observational program with respect to $x$ is a rule telling whether to observe $x(1)$ or whether to make no observation at all; if the particular value $x(1)$ is observed, whether to observe $x(2)$ or to discontinue observation; if the values $x(1)$ and $x(2)$ are observed, whether to observe $x(3)$ or to discontinue observation, etc. More formally, let $N$ be a function of the infinite sequence of values $x = \{x(1), \dots, x(n), \dots\}$ such that, if the sequence $x'$ agrees with $x$ in every component from the first through the $N(x)$th, then $N(x') = N(x)$. Such a function $N$ determines a sequential observational program, which is a contraction of $x$, call it $y(x; N)$, defined thus:

$$y(x; N) =_{Df} \{x(1), \dots, x(N(x))\}. \tag{1}$$
It is to be understood that, if $N(x)$ is zero for some $x$, it is identically zero, and that $y(x; 0)$ is a null observation.

It will be assumed that the random cost associated with a sequential observational program is proportional to the number of random variables observed, that is, $c = N(x)\gamma$, $\gamma > 0$. No categorical defense of this assumption is suggested, but clearly there are interesting problems in which it is met at least approximately. The domain of applicability of the theory can actually be considerably extended by modifying the assumption to include a fixed overhead cost that applies except in case $N$ is identically zero; this does not greatly complicate the analysis, as the interested reader will be able to see for himself. The theory would even remain virtually unchanged, if $c$ were only assumed to be of the form

$$c = \begin{cases} h + \sum_{r=1}^{N(x)} c(r), & \text{if } N > 0,\\[4pt] 0, & \text{if } N = 0, \end{cases} \tag{2}$$

where $h, c(1), c(2), \dots$ are independent with finite expected values $E(h) \ge 0$, $E(c(r)) > 0$, and the $c(r)$'s are identically distributed.

For any $F$ there are some values of $\beta$ for which it would be unwise to adopt any sequential observational program other than the null observation. Suppose, for example, that $\beta$ is so close to an extreme value that $l(\beta) - k(\beta) < \gamma$; under this circumstance the most that could be gained by observing even $x$ itself would be less than $\gamma$, but the cost of making so much as one observation is at least $\gamma$. Let the set of values of $\beta$ for which it is not justified to make any but the null observation be denoted for a while by $J(F; \gamma)$, or simply $J$, for short.

Now, if $\beta \in J$, the person's utility can, by the definition of $J$, be maximized by refraining from any observation but the null observation and accepting the utility $k(\beta)$; otherwise there will be some advantage to him in observing $x(1)$. If the person does observe the particular value $x(1)$ of $x(1)$, he finds himself with a posteriori probabilities $\beta(x(1))$ in place of the a priori $\beta$, he has paid (or at any rate entailed) a cost $\gamma$, and he must now decide whether to make any further observations. His new problem is simply the problem he would have faced at the outset had his a priori probabilities been $\beta(x(1))$ instead of $\beta$, except that all utilities are now reduced by $\gamma$. He justifiably accepts the utility $k(\beta(x(1))) - \gamma$, if $\beta(x(1)) \in J$; otherwise he will observe $x(2)$. Continuing this line of argument step after step, it follows that optimal action consists in observing successive $x(n)$'s until an a posteriori probability in $J$ occurs, and then adopting a basic act consistent with the a posteriori probability.
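The rule just arrived at, to observe until the a posteriori probability falls in $J$, can be written out as a short procedure once a set $J$ is given. The Python sketch below is purely illustrative: the set $J$, the conditional probabilities, and the sequence of observed values are all hypothetical, and nothing in it decides what $J$ ought to be.

```python
def posterior(b1, x, p):
    """Update beta(1) after one observation x in {0, 1}; p[i] = P(x = 1 | B_i)."""
    l1 = p[0] if x == 1 else 1 - p[0]
    l2 = p[1] if x == 1 else 1 - p[1]
    return b1 * l1 / (b1 * l1 + (1 - b1) * l2)

def run_program(b1, observations, p, in_J):
    """Observe successive values until the a posteriori beta(1) falls in J.

    Returns (N, final beta(1)), where N is the number of observations used.
    """
    n = 0
    for x in observations:
        if in_J(b1):           # stop as soon as the current probability is in J
            break
        b1 = posterior(b1, x, p)
        n += 1
    return n, b1

def in_J(b1):
    """Hypothetical stopping set: [0, 0.05] together with [0.95, 1]."""
    return b1 <= 0.05 or b1 >= 0.95

n_used, final = run_program(0.5, [1, 1, 0, 1, 1, 1, 1, 0, 1], (0.7, 0.3), in_J)
```

With these particular numbers each observed 1 multiplies the odds in favor of $B_1$ by 7/3 and each 0 divides them by the same factor, so the program stops once the 1's outnumber the 0's by four.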
In actual practice, it is far from easy to determine whether a particular value of $\beta$ belongs to $J(F; \gamma)$, because in principle the whole enormous variety of sequential observational programs has to be explored. If programs corresponding to all sets $J'$ that might serve as $J$ had to be examined, the process would still be mathematically impractical; indeed, in all but special cases, practical solutions have yet to be found. But, if any special conditions that $J$ must necessarily satisfy are discovered, only sets $J'$ satisfying those conditions need be examined. Some very general conditions are these: $J$ contains the extreme points of $I$; $J$ is topologically closed, that is, if a value $\beta_0$ is not in $J$, then the near neighbors of $\beta_0$ are also not in $J$. The first of these conditions requires no comment, and the second follows easily from the continuity as a function of $\beta$ of

$$E[k(\beta(y(x; N))) - \gamma N \mid \beta] - k(\beta). \tag{3}$$
L j
E(h , BJ)fJo(J)
> k(fjo),
for $h$ is supposed to be advantageous at $\beta_0$; and
$$\sum_j E(h \mid B_j)\,\beta_m(j) \le k(\beta_m), \qquad m = 1, 2, \tag{5}$$
for no derived act is supposed to be advantageous at $\beta_m$, since $\beta_m \in J$. Since $\beta_0$ is a weighted average, say $\sum_m \lambda_m \beta_m$, of the $\beta_m$'s, and since $k(\beta)$ is linear in the interval between $\beta_1$ and $\beta_2$, it follows from (4) and (5) that
$$\sum_j E(h \mid B_j)\,\beta_0(j) \le k(\beta_0), \tag{6}$$
contradicting (4). The supposition that $\beta_0 \in \sim J$ has thus been reduced to absurdity. The demonstration just given extends directly to $n$-fold problems. The general conclusion is that the intersection of $J$ with any domain of linearity of $k$ is convex, so that, if $k$ is polyhedral, $J$ is the union of a finite number of closed convex sets, each lying wholly in a domain of linearity of $k$. The practical implications of the conclusion are enormously greater for twofold than for higher-fold problems, because twofold problems lead to one-dimensional bounded, closed, convex sets, which present no great variety, all of them being closed bounded intervals. But threefold problems, for example, lead to closed bounded two-dimensional convex sets, a restriction that leaves great room for variety. If $k$ is polygonal, the variety of sets $J'$ to be surveyed is enormously reduced, for $J'$ must be the union of a known number of intervals, each of which is confined to a known interval. Suppose that this number is $m$; the class of sequential observational programs to be surveyed can be characterized by the two end points of each of the $m$ intervals, except that the possibility that some of the intervals are vacuous must be borne in mind. Since the extremes of $I$ are necessarily in $J$, and therefore necessarily appear as end points of intervals in $J$, the exploration has been reduced to a $2(m - 1)$ parameter family of possibilities. The possibility that $m = 1$, which almost means that $F$ is dominated by a single element of itself, is trivial; for then all $\beta$'s are in $J$, and observation is never called for. This can be seen in many ways. In particular, it follows as an illustration of the machinery that has just been developed, thus: The end points, or extremes, of $I$ are both in $J$, as always, and, since $m = 1$, they are both in the same interval of linearity of $k$; therefore the interval between them, namely every value of $\beta$, lies in $J$.
The possibility that $m = 2$ (in ordinary statistical usage, the sequential testing of a simple dichotomy) is of particular importance.
It occurs typically when $F$ is dominated by two acts, neither of which dominates the other, as in Exercise 5.2. One of the two acts is appropriate to one "hypothesis" $B_1$, and the other is appropriate to $B_2$. In case $m = 2$, it is easily seen, by methods that have now been indicated more than once, that each of the two closed intervals that constitute $J$ has as one end point one of the extremes of $I$. Neither of the two intervals can be vacuous, nor can either consist only of a single point. It is relatively easy to find, at least approximately, the two values of $\beta$ that determine $J(F; \gamma)$, and the theory of this situation has correspondingly been brought to a relatively high degree of perfection; for details, see [S17], [W2], [W3], and [A7]. Following (or at least paraphrasing) Wald [W2], a sequential observational program characterized by making successive observations until the a posteriori probabilities fall into some set $J$, followed by adopting a basic act appropriate to the a posteriori probability, is called a sequential probability ratio procedure. The reason for this nomenclature is that to observe until the a posteriori probabilities fall into $J$ is to observe until the numbers
$$\beta(i \mid x(1), \ldots, x(N)) = \frac{\beta(i)\,P(x(1), \ldots, x(N) \mid B_i)}{\sum_j \beta(j)\,P(x(1), \ldots, x(N) \mid B_j)} \tag{7}$$
lie in a certain set, or, what amounts to the same thing, satisfy certain conditions. But, the particular value of $\beta$ having been assigned, this is tantamount to requiring the ratios of probabilities
$$\frac{P(x(1), \ldots, x(N) \mid B_i)}{P(x(1), \ldots, x(N) \mid B_j)} \tag{8}$$
to satisfy certain conditions. Since (7) and (8) are ways of expressing the likelihood ratio, the observational program together with the act derived from it might also be referred to as a sequential likelihood-ratio procedure. Indeed, but for the precedent established by Wald, that would seem the better name.
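As a minimal sketch of the point just made, the following Python fragment computes the a posteriori probabilities of (7) from the a priori probabilities and the likelihoods, and illustrates that multiplying every likelihood by a common factor leaves the result unchanged, so that only the ratios (8) matter. The function name `posterior` and the numerical values are assumptions chosen for illustration; they do not come from the text.

```python
from fractions import Fraction


def posterior(prior, likelihoods):
    """Equation (7): beta(i | x) from beta(i) and P(x | B_i)."""
    weights = [b * p for b, p in zip(prior, likelihoods)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical twofold example (values are illustrative only).
prior = [Fraction(1, 2), Fraction(1, 2)]
likelihoods = [Fraction(1, 5), Fraction(1, 10)]

# Same ratios as `likelihoods`, different absolute size.
scaled = [7 * p for p in likelihoods]
```

Calling `posterior(prior, likelihoods)` and `posterior(prior, scaled)` gives the same pair of numbers, which is the sense in which a condition on (7) is a condition on the ratios (8) once $\beta$ is fixed.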
As an actual example of a sequential probability ratio procedure, suppose that the distribution of $x(\nu)$ given $B_i$ attaches the probabilities $p_i$ and $q_i = 1 - p_i$ to the values 1 and 0, respectively. The expression (8) can in any case be written in the factored form
$$\prod_{\nu} \left\{ \frac{P(x(\nu) \mid B_i)}{P(x(\nu) \mid B_j)} \right\}, \tag{9}$$
and in the present example this takes the special form
$$\left(\frac{p_i}{p_j}\right)^{y(N)} \left(\frac{q_i}{q_j}\right)^{N - y(N)}, \tag{10}$$
where
$$y(N) = \sum_{\nu = 1}^{N} x(\nu). \tag{11}$$
It is noteworthy, in connection with sufficient statistics, that the condition that the a posteriori probability be in $J$ is in this case expressible, according to (10), as a condition on $y(N)$ and $N$. Specializing the example further, suppose that $J$ is of the sort appropriate to testing a simple dichotomy. The condition that the a posteriori probability be in $\sim J$ is then expressed by each of the following equivalent pairs of inequalities, where $\alpha(1)$ and $\alpha(2)$ are positive numbers such that $\alpha(1) + \alpha(2) < 1$.
$$\begin{aligned} \beta(1 \mid x(1), \ldots, x(N)) &< 1 - \alpha(1), \\ \beta(2 \mid x(1), \ldots, x(N)) &< 1 - \alpha(2). \end{aligned} \tag{12}$$
$$\begin{aligned} \frac{\beta(1)Q}{\beta(1)Q + \beta(2)} &< 1 - \alpha(1), \\ \frac{\beta(2)}{\beta(1)Q + \beta(2)} &< 1 - \alpha(2), \end{aligned} \tag{13}$$
where $Q$ for the moment denotes the likelihood ratio (10) (with $i = 1$ and $j = 2$).
$$\begin{aligned} Q &< \frac{\beta(2)(1 - \alpha(1))}{\beta(1)\alpha(1)} = Q^*, \\ Q &> \frac{\beta(2)\alpha(2)}{\beta(1)(1 - \alpha(2))} = Q_*, \end{aligned} \tag{14}$$
where $Q^*$, $Q_*$ are defined by the context. Since, according to (13), the structure of $\sim J$ is superficially determined by three parameters, say by $\beta(1)$, $\alpha(1)$, and $\alpha(2)$, it is worthy of some note that the corresponding condition is ultimately expressed in terms of only two special parameters, $Q^*$ and $Q_*$; this is only natural, considering that $\sim J$ is an open interval determined by its two end points. The act that would be appropriate to $B_1$ is called for by values of $Q \ge Q^*$, and the one appropriate to $B_2$ is called for by values of $Q \le Q_*$.
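The stopping rule described by (10) and (14) can be sketched in Python as follows, assuming independent observations that take the values 1 and 0, with probability $p_1$ or $p_2$ of the value 1 under $B_1$ or $B_2$. The names `thresholds`, `likelihood_ratio`, and `sprt`, and the numbers in the usage note, are assumptions made for illustration; this is one possible rendering of the procedure, not a definitive implementation.

```python
def thresholds(beta1, alpha1, alpha2):
    """Return (Q_lower, Q_upper), i.e. Q_* and Q^* of (14), with beta(2) = 1 - beta(1)."""
    assert alpha1 > 0 and alpha2 > 0 and alpha1 + alpha2 < 1
    beta2 = 1.0 - beta1
    q_upper = beta2 * (1.0 - alpha1) / (beta1 * alpha1)
    q_lower = beta2 * alpha2 / (beta1 * (1.0 - alpha2))
    return q_lower, q_upper


def likelihood_ratio(y, n, p1, p2):
    """Equation (10) with i = 1, j = 2: y ones among n observations."""
    return (p1 / p2) ** y * ((1.0 - p1) / (1.0 - p2)) ** (n - y)


def sprt(observations, p1, p2, beta1, alpha1, alpha2):
    """Observe until Q leaves the open interval (Q_*, Q^*).

    Returns (decision, N), where decision is "B1", "B2", or "continue"
    (the last if the supplied observations run out first).
    """
    q_lower, q_upper = thresholds(beta1, alpha1, alpha2)
    y = n = 0
    stream = iter(observations)
    while True:
        q = likelihood_ratio(y, n, p1, p2)
        if q >= q_upper:
            return "B1", n
        if q <= q_lower:
            return "B2", n
        x = next(stream, None)
        if x is None:
            return "continue", n
        y += x
        n += 1
```

For instance, with $p_1 = 0.75$, $p_2 = 0.25$, $\beta(1) = 0.5$, and $\alpha(1) = \alpha(2) = 0.05$, the bounds are $Q^* = 19$ and $Q_* = 1/19$; each observed 1 multiplies $Q$ by 3, so three successive 1's give $Q = 27$ and call for the act appropriate to $B_1$.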
Thus far, the particular form (10) of the likelihood ratio has not really been exploited in the calculation, so (14) applies to the testing of simple dichotomies generally. Taking account of (10), (14) can by elementary manipulation be put in the following form.
$$\frac{\log Q_* + N \log (q_2/q_1)}{\log (p_1 q_2 / p_2 q_1)} < y(N) < \frac{\log Q^* + N \log (q_2/q_1)}{\log (p_1 q_2 / p_2 q_1)}, \tag{15}$$
where, for definiteness, it is supposed that $p_1 > p_2$. Thus, the region in the $(N, y)$ plane determined by $\sim J$, the region in which further observations are called for, is a band bounded by two parallel lines of positive slope.

8 Standard form, and absolute comparison between observations
If $x$ and $y$ are such that, for every $F$ and $\beta$, $v(F(x) \mid \beta) \ge v(F(y) \mid \beta)$; then $x$ imitates, so to speak, an extension of $y$, and it may appropriately be said that $x$ is a virtual extension of $y$. Correspondingly, if $x$ is a virtual extension of $y$, and $y$ is a virtual extension of $x$, it may be said that $x$ and $y$ are virtually equivalent. No matter what a priori probabilities a person may have, or what basic acts are available to him, he will have no preference between a pair of virtually equivalent observations. So virtually equivalent observations are indeed equivalent for many practical purposes. Where combinations of observations are under consideration, however, the relation of virtual equivalence does not resemble true equivalence. For example, if $x$ and $y$ are equivalent, then each is equivalent to the multiple observation $\{x, y\}$, but if $x$ and $y$ are only virtually equivalent, they may well be independent, in which case neither will typically be equivalent to $\{x, y\}$. This section explores the notions of virtual extension and virtual equivalence. In particular, an interesting standard representative of the class of observations virtually equivalent to a given observation $x$ is defined and discussed. This material is scarcely referred to later in the book, and it may without much loss be skipped or glanced over. It will be couched frankly in the language of $n$-fold as opposed to twofold partitions, but readers with the rest of the chapter behind them will easily be able to concentrate on the twofold situation, if they find it more understandable. Most of the ideas to be presented in this section were originated by H. F. Bohnenblust, L. S. Shapley, and S. Sherman in a private memorandum dated August 1949, which I was privileged to see at that time.
This work was extended and brought to the attention of the public by David Blackwell in [B18]. It is obvious that, if $y$ is a sufficient statistic for $x$, then $x$ and $y$ are virtually equivalent. In particular the likelihood ratio $r$ derived from $x$ is virtually equivalent to $x$. Moreover, the reader may anticipate, and it will be formally shown in the course of this section, that if and only if observations are virtually equivalent do their likelihood ratios have the same distribution for every value of $\beta$, or, what comes to the same thing, given each $B_i$, $i = 1, \ldots, n$. Thus the $n$ conditional distributions of the likelihood ratio given each $B_i$ could be taken to characterize the observations virtually equivalent to a given one, say $x$. Actually, as will be shown, the class of observations virtually equivalent to $x$ can be represented by the distribution of the likelihood ratio for any single non-extreme value of $\beta$. For definiteness, the particular value $\beta^* = \{1/n, \ldots, 1/n\}$ will be used, but the interested reader will find it a simple exercise to extend all the considerations based on $\beta^*$ to any other non-extreme $\beta$, as would be necessary in any extension of the theory
to infinite partitions. Let $m(r)$ be the probability that the likelihood ratio in the standard form (6.5) attains the particular value $r$ when $\beta = \beta^*$. With self-evident abbreviations,
$$\begin{aligned} m(r) &= P(r \mid \beta^*) \\ &= \sum_j P(r \mid B_j)(1/n) \\ &= n^{-1} \sum_{r(x) = r} \sum_i P(x \mid B_i). \end{aligned} \tag{1}$$
The second line of (1) exhibits $m(r)$ expressed in terms of the $n$ distributions $P(r \mid B_i)$. It is rather more interesting to see that those $n$ distributions can themselves all be expressed in terms of the single distribution $m$, as follows from the definition (6.5) of $r$ and the third line of (1) thus:
$$\begin{aligned} P(r \mid B_i) &= \sum_{r(x) = r} P(x \mid B_i) \\ &= \sum_{r(x) = r} r_i(x) \sum_j P(x \mid B_j) \\ &= n r_i m(r). \end{aligned} \tag{2}$$
Similarly,
$$P(r \mid \beta) = n \Big\{ \sum_j r_j \beta(j) \Big\} m(r). \tag{3}$$
Regarded as a probability measure on the set of all $n$-tuples of numbers $r$, $m$ has the following three important properties.
$$P(r_i \ge 0 \mid m) = 1; \qquad P\Big(\sum_i r_i = 1 \,\Big|\, m\Big) = 1; \qquad E(r_i \mid m) = n^{-1}. \tag{4}$$
Of these, the first two are obvious from the definition of $r$, and the third follows by calculation from (2) thus:
$$1 = \sum_r P(r \mid B_i) = n \sum_r r_i m(r). \tag{5}$$
Conversely, suppose that $m$ is any mathematical probability defined on the set of $n$-tuples $r$ of numbers, subject to the conditions (4); then, as can easily be verified, $n$ mathematical probabilities are formally defined by the equation $P(r \mid B_i) = n r_i m(r)$. Mathematically, $r$ distributed thus can be regarded as an observation. The following calculation demonstrates the expected conclusion that the likelihood ratio of this observation is the observation itself and that its distribution given $\beta^*$ is $m$.
$$\begin{aligned} \frac{P(r \mid B_i)}{\sum_j P(r \mid B_j)} &= \frac{n r_i m(r)}{n \sum_j r_j m(r)} = r_i, \\ P(r \mid \beta^*) &= \sum_i n r_i m(r)(1/n) = m(r). \end{aligned} \tag{6}$$
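The passage from an observation to its standard form can be illustrated by a small computation. The following Python sketch, under the usual simplifying assumption that $x$ takes finitely many values, builds $m$ from the conditional distributions $P(x \mid B_i)$ as in (1) and recovers $P(r \mid B_i)$ from $m$ as in (2). The names `standard_form` and `conditional_from_m`, and the particular twofold example, are hypothetical and serve only as illustration.

```python
from collections import defaultdict
from fractions import Fraction


def standard_form(cond):
    """cond[i][x] = P(x | B_i). Returns {r: m(r)} as in the third line of (1)."""
    n = len(cond)
    m = defaultdict(Fraction)
    for x in range(len(cond[0])):
        total = sum(cond[i][x] for i in range(n))
        if total == 0:
            continue  # a value of x that never occurs
        # Likelihood ratio in standard form: r_i(x) = P(x | B_i) / sum_j P(x | B_j).
        r = tuple(cond[i][x] / total for i in range(n))
        m[r] += total / n
    return dict(m)


def conditional_from_m(m, i):
    """P(r | B_i) = n r_i m(r), as in the last line of (2)."""
    n = len(next(iter(m)))
    return {r: n * r[i] * mass for r, mass in m.items()}


# Hypothetical twofold example with three possible values of x.
cond = [
    [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)],  # P(x | B_1)
    [Fraction(1, 4), Fraction(1, 8), Fraction(5, 8)],  # P(x | B_2)
]
m = standard_form(cond)
```

In this example the first two values of $x$ share the likelihood ratio $(2/3, 1/3)$ and are merged, so $m$ is carried by only two points; the three properties (4) can be checked directly on the result.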
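It is interesting and fruitful to compute $v(F(x) \mid \beta)$ in terms of $m$.
$$\begin{aligned} v(F(x) \mid \beta) &= E(k(\beta(x)) \mid \beta) \\ &= E\Big(k\Big(\Big\{ r_i \beta(i) \Big/ \sum_j r_j \beta(j) \Big\}\Big) \,\Big|\, \beta\Big) \\ &= n E\Big[k\Big(\Big\{ r_i \beta(i) \Big/ \sum_j r_j \beta(j) \Big\}\Big) \sum_j r_j \beta(j) \,\Big|\, m\Big]. \end{aligned} \tag{7}$$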
Temporarily adopt the convention that, if $\alpha$ is any $n$-tuple of positive numbers and $h$ any function of $r$ (not necessarily convex), $T(\alpha)h$ is a function of $r$ defined thus:
$$T(\alpha)h(r) =_{Df} h\Big(\Big\{ r_i \alpha(i) \Big/ \sum_j r_j \alpha(j) \Big\}\Big) \sum_j r_j \alpha(j). \tag{8}$$
Then (7) takes the abbreviated form
$$E(k(\beta(x)) \mid \beta) = n E(T(\beta)k(r) \mid m). \tag{9}$$
To see the implications of (9), it is necessary to know something about what the operation $T(\beta)$ does to the function $k$, in particular to know that $T(\beta)k$ is convex in $r$. The derivation of these necessary facts is straightforward and is left to the reader as a sequence of exercises.

Exercises
1a. $T(\alpha)T(\beta)h = T(\{\alpha(1)\beta(1), \ldots, \alpha(n)\beta(n)\})h = T(\beta)T(\alpha)h$.
1b. $h = T(\{\alpha(1)^{-1}, \ldots, \alpha(n)^{-1}\})T(\alpha)h$.
2. $T(\beta^*)h = \frac{1}{n} h$.
3. If $h(r) \ge g(r)$ for $r$ between $r'$ and $r''$; then $T(\alpha)h(r) \ge T(\alpha)g(r)$ for $r$ between $\{r'_i \alpha(i)^{-1} / \sum_j r'_j \alpha(j)^{-1}\}$ and $\{r''_i \alpha(i)^{-1} / \sum_j r''_j \alpha(j)^{-1}\}$.
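4. If $h$ is linear, then so is $T(\alpha)h$.
5. If $h$ is convex (strictly convex), then so is $T(\alpha)h$.

Exercise 5 is obvious in the light of Exercises 3 and 4, but some may prefer the demonstration given by the following calculation, where $\lambda + \mu = 1$; $\lambda, \mu \ge 0$; and obvious abbreviations are used ($\alpha r$ for the $n$-tuple $\{\alpha(i) r_i\}$ and $\alpha \cdot r$ for $\sum_j \alpha(j) r_j$).
$$\begin{aligned} T(\alpha)h(\lambda r + \mu r') &= h\left( \frac{\lambda\, \alpha \cdot r}{\alpha \cdot (\lambda r + \mu r')} \, \frac{\alpha r}{\alpha \cdot r} + \frac{\mu\, \alpha \cdot r'}{\alpha \cdot (\lambda r + \mu r')} \, \frac{\alpha r'}{\alpha \cdot r'} \right) \alpha \cdot (\lambda r + \mu r') \\ &\le \lambda h\left( \frac{\alpha r}{\alpha \cdot r} \right) \alpha \cdot r + \mu h\left( \frac{\alpha r'}{\alpha \cdot r'} \right) \alpha \cdot r' \\ &= \lambda T(\alpha)h(r) + \mu T(\alpha)h(r'). \end{aligned} \tag{10}$$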
It is amusing to establish once more that observation generally pays, this time by means of (10), (4), and Exercises 5 and 2.
$$n E(T(\beta)k(r) \mid m) \ge n T(\beta)k(E(r \mid m)) = n T(\beta)k(\beta^*) = k(\beta). \tag{11}$$
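The operator of (8) and the inequality (11) lend themselves to a direct numerical check. The sketch below defines $T(\alpha)$ in Python and evaluates both sides of (11) for a hypothetical twofold case in which $k(\beta) = \max_i \beta(i)$, a convex function, and $m$ is a two-point distribution satisfying the conditions (4). The names `T` and `expect`, the choice of $k$, and the numbers are all assumptions made for illustration.

```python
from fractions import Fraction as F


def T(alpha, h):
    """The operator of (8): T(alpha)h(r) = h({r_i a_i / sum_j r_j a_j}) * sum_j r_j a_j."""
    def Th(r):
        w = [ri * ai for ri, ai in zip(r, alpha)]
        s = sum(w)
        return h(tuple(wi / s for wi in w)) * s
    return Th


def expect(f, m):
    """E(f(r) | m) for a finitely supported m given as {r: m(r)}."""
    return sum(f(r) * mass for r, mass in m.items())


# Hypothetical data: a convex k and a distribution m satisfying (4) with n = 2.
k = max  # k(beta) = largest component of beta
m = {(F(2, 3), F(1, 3)): F(9, 16), (F(2, 7), F(5, 7)): F(7, 16)}
beta = (F(1, 2), F(1, 2))
n = 2

value_with_observation = n * expect(T(beta, k), m)  # left side of (11), via (9)
value_without_observation = k(beta)                 # right side of (11)
```

Here the left side of (11) comes to 11/16 and the right side to 1/2, in agreement with the inequality. Exercises 1a and 2 can be checked in the same way by applying `T` twice or with $\alpha = \beta^*$.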
If $x$ and $y$ are observations and $m$ and $m'$ are the corresponding distributions, it is now easy to say in terms of $m$ and $m'$ when $x$ is utterly irrelevant, when it is definitive, and when $x$ is virtually an extension of $y$.
More exercises
6. The observation $x$ is utterly irrelevant if and only if $P(r = \beta^* \mid m) = 1$.
7. The observation $x$ is definitive, if and only if $P(r_i = 1 \mid m) = 1/n$, or, equivalently, if and only if $P(r_i = 0 \mid m) = (n - 1)/n$.
8a. The observation $x$ is a virtual extension of $y$, if and only if, for every convex function $h$ defined for $r$,
$$E(h(r) \mid m) \ge E(h(r) \mid m'). \tag{12}$$
8b. The two observations are virtually equivalent, if and only if, for every convex function+ $h$,
$$E(h(r) \mid m) = E(h(r) \mid m'). \tag{13}$$
The conclusion reached in Exercise 8b can be much improved. Indeed, it will be shown that the two observations are virtually equivalent, if and only if $m$ and $m'$ are the same probability measures. This will be achieved if, for example, it is shown that $m$ and $m'$ have the same moments, for it is well known that two different countably additive probability measures confined to a bounded set of $n$-tuples of numbers cannot have the same moments.† The moments in question are expected values of monomials of the form
$$g(r) = \prod_i r_i^{t_i}, \tag{14}$$
where the $t_i$'s are non-negative integers. In general, $g$ will not be convex, so it cannot be concluded immediately that $g$ has the same expected value with respect to $m$ and $m'$. If, however, a highly convex function is added to $g$, then the sum will be convex and its expected value will be the same with respect to $m$ and $m'$. Since, by hypothesis, this is also true of the convex term of the sum, it must also be true of the not necessarily convex term. Specifically, let
$$h(r) = g(r) + \lambda \sum_j r_j^2, \tag{15}$$
where $\lambda$ is a positive number to be determined later. To test $h$ for convexity, let $s$ be for the moment an arbitrary $n$-tuple of numbers and $\sigma$ a real variable, and compute the second derivative of $h(r + \sigma s)$ with respect to $\sigma$ at $\sigma = 0$.
$$\frac{d^2 h(r + \sigma s)}{d\sigma^2}\bigg|_{\sigma = 0} = \sum_{i,j} \frac{\partial^2 g(r)}{\partial r_i\, \partial r_j}\, s_i s_j + 2\lambda \sum_j s_j^2. \tag{16}$$
Considering that each $r_i$ is between 0 and 1, the absolute values of the derivatives of $g$ that appear in (16) have a common upper bound, say $\mu$; if $\lambda \ge \mu n^2$, $h$ is convex in the region where each $r_i$ is between 0 and 1 and is a fortiori convex in the intersection of that region with the hyperplane $\sum_j r_j = 1$.

† See, for example, Corollary 1.1, p. 11, of [S11]. Under our usual simplifying assumption that $x$ is confined to a finite number of values, $m$ is certainly countably additive. Actually, the whole theory can be developed mutatis mutandis assuming only that the distribution of $x$ is countably additive on some suitable Borel field.
+ Morse and Sacksteder (1966) show, in effect, that the test can be confined to the very special convex functions $\max_i p_i r_i$, where the $p_i$ are arbitrary positive numbers.

Now that it has been established that $m$ and $m'$ represent virtually equivalent observations, if and only if $m$ and $m'$ are identical, it is apparent that $m$ or, more exactly, the set of conditional distributions $P(r \mid B_i) = n r_i m(r)$, is a unique standard form for all observations virtually equivalent to $x$. If $x$ virtually extends $y$, it is to be expected that, no matter what reasonable definition of "informative" may be suggested, $x$ will be at least as informative as $y$. In particular, it is to be expected that the information of $B_i$ with respect to $B_j$ (as defined in § 3.6) will be at least as large for $x$ as for $y$, which the following calculation verifies, supposing for simplicity that, for both observations, infinite information is impossible. The point in question depends on the convexity of the function $h$ defined by
$$h(r) = r_i(\log r_i - \log r_j), \tag{17}$$
because
$$J_{i,j} = E(\log r_i - \log r_j \mid B_i) = n E[r_i(\log r_i - \log r_j) \mid m]. \tag{18}$$
The required convexity can be demonstrated much as it was in (15)+ for a different function also momentarily called $h$:
$$\frac{d^2 h(r + \sigma s)}{d\sigma^2}\bigg|_{\sigma = 0} = \frac{(r_j s_i - r_i s_j)^2}{r_i r_j^2} \ge 0. \tag{19}$$
It would be interesting to know whether every virtual extension is realized by an actual extension, that is, whether whenever $x$ is a virtual extension of $y$ there exist random variables $x'$ and $y'$ such that $x$ and $x'$ are virtually equivalent, $y$ and $y'$ are virtually equivalent, and $x'$ extends $y'$. To the best of my knowledge that conclusion has thus far been established only in the case of twofold problems, the demonstration for that case being given by Blackwell in [B18].
+ Actually, the calculation depends only on the convexity of $(\log r_i - \log r_j)$ in $r_j/r_i$.
CHAPTER 8
Statistics Proper

1 Introduction
I think any professional statistician, whether or not he found himself in sympathy with the preceding chapters, would feel that, even allowing for the abstractness expected in a book on foundations, those chapters do not really discuss his profession. He would not, I hope, find the same shortcoming in this and the succeeding chapters, for they are concerned with what seems to me to be statistics proper. The purpose of the present short chapter is to explain this transition and to serve as a general introduction to its successors.

2 What is statistics proper?
So far as I can see, the feature peculiar to modern statistical activity is its effort to combat two inadequacies of the theory of decision, as I have thus far discussed it. In the first place, there are the vagueness difficulties associated with what in § 4.2 were called "unsure probabilities." Second, there are the special problems that arise from more than one person's participating in a decision. From the personalistic point of view, statistics proper can perhaps be defined as the art of dealing with vagueness and with interpersonal difference in decision situations. Whether this very tentative definition is justified, later sections and chapters will permit the statistical reader to judge. At any rate, vagueness and interpersonal difference are the concepts that, directly or indirectly, dominate the rest of this book. I will not try to discuss vagueness in this chapter, but something may profitably be said here about interpersonal differences.
3 Multipersonal problems
As I have already frequently said, it seems to me that multipersonal considerations constitute much of the essence of what is ordinarily called statistics, and that it is largely through such considerations that the achievements of the British-American School can be interpreted in
terms of personal probability. This is a view that can best be defended by illustration, and the requisite illustrations will be scattered throughout later chapters; but some support is lent to it by those critics of personal probability who say that personal probability is inadequate because it applies only to individual people, whereas the methods of science are, more or less by definition, those methods that are acceptable to all rational people. The sort of multipersonal problems I mean to call attention to are those arising out of differences of taste and judgment, as opposed to those, so familiar in economics, arising out of conflicting interests. As a matter of fact, the latter type of multipersonal situation can, if one chooses, be regarded as among the former; it may, for example, be said that you and I have different tastes for the process of taking a dollar from me and giving it to you. Though modern statisticians do not at all deny the existence of different tastes in different people, only occasionally do they take that difference explicitly into account. In particular, the theory of utility has scarcely ever entered explicitly into the works of statisticians. Our intellectual ancestors who believed in the principles of mathematical expectation were less tolerant than modern statisticians in so far as they denied rationality in those whose tastes departed from that principle, and some of their bigotry is occasionally met with today. In dealing with multipersonal situations, it is clearly valuable to recognize those in which the people involved may all reasonably be expected to have the same tastes, that is, utilities, with respect to the alternatives involved in the situation. Explicit attempts to discover general circumstances under which people's tastes will be identical are rare. The most important and fruitful attempt of this sort is represented by D. Bernoulli's idea that utility functions will typically be approximately linear within sufficiently confined ranges of income. Consciously or unconsciously, that principle is repeatedly appealed to throughout statistics; it was, for example, brought out in § 6.0 that the very idea of an observation depends for its practical value on Bernoulli's principle of approximate linearity. Relatively inexplicit exploitations of similarity of taste are sometimes made in statistics. The idea is often expressed, for example, that the penalty for making an estimate discrepant from the number to be estimated will, for everyone concerned, be proportional (within a reasonable range) to the square of the discrepancy; an argument for this principle as a rule of thumb appropriate to many contexts will be given in § 15.5. Again, there are situations in which it is agreed that the penalty will depend only on the discrepancy and not on the true value of
the number to be estimated. Of course, there are problems in which both rules are invoked simultaneously, the penalty being supposed to be proportional to the square of the discrepancy and independent of the value to be estimated. Turn now to differences in judgment, that is, to differences in the personal probability, for different people, of the same event. Though modern objectivistic statisticians may recognize the existence of differences of judgment, they argue in theoretical discussions that statistics must be pursued without reference to the existence of those differences, indeed without reference to judgment at all, in order that conclusions shall have scientific, or general, validity. To put the same idea in personalistic terms, I would say that statistics is largely devoted to exploiting similarities in the judgments of certain classes of people and in seeking devices, notably relevant observation, that tend to minimize their differences. The tendency of observation to bring about agreement has been illustrated in § 3.6. Some of the other general circumstances in which different people may be expected to agree, or at least nearly agree, in some of their judgments have also been mentioned. For example, it may well happen that different people are faced with partition problems that are the same in that the same variable is to be observed by each person, but differ in that each person has his own a priori probabilities $\beta$ and his own set of available acts $F$. If, however, the conditional distribution of $x$ given $B_i$ is the same for each person, then the people will, for example, agree as to whether a contraction $y$ of $x$ is sufficient, which is often of great practical value. Again, there are circumstances under which each of these same people will agree that certain derived acts are nearly optimal.
4 The minimax theory
In recent years there has been developed a theory of decision, here with due precedent to be called the minimax theory, that embraces so much of current statistical theory that the remaining chapters can largely be built around it. The minimax theory was originated and much developed by A. Wald, whose work on it is almost completely summarized in his book [W3]. Wald's minimax theory, of course, derives from, and reflects the body of statistical theory that had been developed by others, particularly the ideas associated with the names of J. Neyman and E. S. Pearson. It seems likely that, in the development of the minimax theory, Wald owed much to von Neumann's treatment of what von Neumann calls zero-sum two-person games, which though conceptually remote from statistics, is mathematically all but identical
with study of the minimax rule, the characteristic feature of the minimax theory. Wald in his publications, and even in conversation, held himself aloof from extramathematical questions of the foundations of statistics; and therefore many of the opinions expressed in later chapters on such points in connection with the minimax theory were neither supported nor opposed by him. It may fairly be said, however, that he was an objectivist and that his work was strongly motivated by objectivistic ideas. My policy here of holding difficulties of mathematical technique to a minimum by making stringent simplifying assumptions will be adhered to in connection with the minimax theory. A large part of Wald's book [W3] is concerned with overcoming the difficulties in technique that are here avoided by simplifying assumptions, but that must be faced in many practical problems. Despite Wald's able effort, important problems of analytic technique still remain in connection with the minimax theory. It should also be appreciated that the individual mathematical problems raised by applications of the minimax theory are often very awkward, even when stringent simplifying assumptions are complied with; consequently much work on specific applications of the theory is still in progress.
CHAPTER 9
Introduction to the Minimax Theory

1 Introduction
This chapter explains what the minimax theory is, almost without reference to the theory of personal probability. This course seems best, because the theory was originated from an objectivistic point of view and as the solution of an objectivistic problem. Moreover, a philosophically more neutral presentation seems to result, if the ideas of personal probability are here kept out of the foreground. The minimax theory begins with some of the ideas with which the theory of personal probability, as developed in this book, also begins. In particular, the notions of person, world, states of the world, events, consequences, acts, and decisions presented in §§ 2.2-5 apply as well to the minimax theory (from which they were in fact derived) as to the theory of personal probability. The point at which the two theories depart from each other is § 2.6, which postulates that the person's preferences establish a simple order among all acts. That assumption is necessarily rejected by objectivists, for it, together with the sure-thing principle (which they presumably accept), implies the existence of personal probability. For objectivists, of course, conditional probability does not apply to all ordered pairs of events. More specifically, it seems to be a tacit assumption of objectivistic statistics that the world envisaged in any one problem is partitioned into events with respect to each of which the conditional probabilities of all events (ignoring the mathematical technicality of measurability considerations) are defined, but that conditional probability with respect to sets other than unions of elements of the partition are not defined. That, incidentally, is why partition problems dominate objectivistic statistics. The partition in question is in general infinite, but, for mathematical simplicity, it will here be assumed to be a finite partition $B_i$.
The objectivistic position is not in principle opposed to the concept of utility. In particular, the minimax theory is predicated on the idea
that the consequences of those acts with which it deals are measured numerically by a quantity the expected value of which the person wishes to have as large as possible, whenever (from the objectivistic point of view) the concept of expected value applies. It will therefore be doing the minimax theory little or no injustice to postulate here, as elsewhere, that the consequences of acts are measured in utility. These preliminaries disposed of, the general objectivistic decision problem is to decide on an act $f$ in some given $F$, by criteria depending only on the conditional expectations $E(f \mid B_i)$, and therefore without reference to the "meaningless" $P(B_i)$. Taking any personalistic or necessary point of view literally, it is nonsensical to pose an objectivistic decision problem, that is, to ask which $f$ of $F$ is best for the person, without reference to the $P(B_i)$. On the other hand, many, if not all, holders of objectivistic views, like Wald, find themselves logically compelled by two widely held tenets to consider such problems meaningful. First, for reasons I have alluded to in Chapter 2 and will soon expand upon, many theoretical statisticians today agree, at least tacitly, that the object, or at any rate one object, of statistics is to recommend wise action in the face of uncertainty, a point of view that Wald was particularly active in bringing to the fore. Second, statisticians of the British-American School, of which Wald is to be considered a member, are objectivists and are therefore committed to the view that the probabilities $P(B_i)$ are meaningless, or, at any rate, that they cannot be legitimately used in solutions of statistical problems. So far as I know, Wald is the only one who has proposed any solution to the general objectivistic decision problem, barring minor variations. His proposal, which is here called the minimax theory, is rather complicated to state. In view of its complexity and the importance of this theory for the rest of this book, and for statistical theory generally, I hope the reader will have particular patience with the present chapter.
2 The behavioralistic outlook
Prior to Wald's formulation of what is here called the objectivistic decision problem, the problems of statistics were almost always thought of as problems of deciding what to say rather than what to do, though there had already been some interest in replacing the verbalistic by the behavioralistic outlook. The first emphasis of the behavioralistic outlook in statistics was apparently made by J. Neyman in 1938 in [N3], where he coined the term "inductive behavior" in opposition to "inductive inference." In the verbalistic outlook, which still dominates most everyday statistical thought, the basic acts are supposed to be
assertions; and schemes based on observation are sought that seldom lead to false, or at any rate grossly inaccurate, assertions. The verbalistic outlook in statistics seems to have its origin in the verbalistic outlook in probability criticized in § 2.1, which in turn is traceable to the ancient tradition in epistemology that deductive and inductive inference are closely analogous processes. I, and I believe others sympathetic with Wald's work, would analyze the verbalistic outlook in statistics thus: Whatever an assertion may be, it is an act; and deciding what to assert is an instance of deciding how to act. Therefore decision problems formulated in terms of acts are no less general than those formulated in terms of assertions. If, on the other hand, a sufficiently broad interpretation is put on the notion of assertion, perhaps every decision to adopt an act can be regarded as an assertion to the effect that that act is the best available, in which case the difference between the verbalistic and the behavioralistic outlooks is only terminological; but I do think that, even under such an interpretation, the behavioralistic outlook with its tendency to emphasize consequences offers the better terminology. Fallacious attempts to analyze away the difference between the verbalistic and behavioralistic viewpoints are also sometimes put forward, especially in informal discussion. For example, it is sometimes said that one should act as though his best estimate of a quantity were in fact the quantity itself. But on that basis few of us would buy life insurance for next year, for we do not typically estimate the year of our death to be so close. Other examples are discussed by Carnap in Section 50 of [C1].
If assertions are, indeed, to be interpreted as a special class of acts of particular importance to statistics, I have no clear idea what that class may be; but it would presumably exclude certain acts, such as the design of an experiment, that surely are of importance to statistics. Actually the verbalistic outlook has led to much confusion in the foundations of statistics, because the notion of assertion has been used in several different, but always ill-defined, senses, and because emphasis on assertion distracts from the indispensable concept of consequences.
I conclude that the behavioralistic outlook is clearer, fuller, and better unified than the verbalistic; and that such value as any verbalistic concept may have it owes to the possibility of one or more behavioralistic interpretations. This analysis is really too brief and must be supplemented by certain remarks. To begin with, the reader may wonder whether the verbalistic outlook has adherents who defend it against the behavioralistic, and if so what their arguments may be. Actually, the statistical public seems
to greet the behavioralistic outlook as a relatively new idea (how old it may actually be is beside the point here), which as such must be regarded with some skepticism. To the best of my knowledge, however, only one objection against the behavioralistic outlook has been presented. It must be discussed next.

It has been seen as an objection to the behavioralistic outlook that the consequences of some assertions, particularly those of pure science, are extremely subtle and difficult to appraise. As a function of the true but unknown velocity of light, what, for example, will be the consequences of asserting that the velocity of light is between $2.99 \times 10^{10}$ and $3.01 \times 10^{10}$ centimeters per second? But, if some acts do have subtle consequences, that difficulty cannot properly be met by denying that they are acts or by ignoring their consequences. Certain practical solutions of the difficulty are known. For example, considerations of symmetry or continuity may, as is illustrated in Chapters 14 and 15, make a wise decision possible even in some cases where the explicit consequences of the available acts are beyond human reckoning. Again, analysis sketched in the next two paragraphs tends to show that assertions with extremely subtle consequences play a smaller role in science and other affairs than might at first be thought.

No worker would actually publish (indeed no journal would accept) as research the hypothetical assertion about the velocity of light mentioned in the paragraph above. The consequences might be subtle, if he did; but they would not be very important, for no one would take him seriously. An actual worker would do as much as was practical to say what observations relevant to the velocity of light he, and perhaps others, had performed and what had been observed.
To be sure, his statement of the observations would typically be much condensed; he would resort to sufficient statistics or other devices to put his reader rapidly in position to act as though the reader himself had made the observations. Assertions about the velocity of light, and countless others of that sort, are of course published in textbooks and handbooks. These assertions do indeed have complicated consequences, so judgment is called for in the compilation of such books; but the seriousness of the consequences of their assertions is limited because of the possibility of referring to original research publications, a possibility serious textbooks and handbooks facilitate by the inclusion of bibliographies. On the other hand, it is obvious that many problems described according to the verbalistic outlook as calling for decisions between assertions really call only for decisions between much more down-to-earth acts, such as whether to issue single- or double-edged razors to an army,
how much postage to put on a parcel, or whether to have a watch readjusted. It is time now to turn back to objectivistic decision problems.
3 Mixed acts

Speaking with pedantic strictness, it might be said that Wald does not propose a solution for the general objectivistic decision problem, because, before undertaking a solution, he insists that $\mathbf{F}$ be subject to a certain condition. On the other hand, he argues that the condition is typically met in practice; he might fairly have insisted that it is the very heart of much actual statistical practice. Before discussing the issue in detail, let me give a small but typical illustration of it.

Suppose that in a rental library I am confronted with the choice between two detective stories, each of which looks more horrifying than the other. At first sight it would seem that only two acts are open to me, namely, to rent one book or the other, but Wald points out that there are other possibilities, not ordinarily thought of as such. In particular, I can eliminate one of the books by flipping a coin. More accurately and more generally, I can let my choice depend on the outcome of a random variable that is utterly irrelevant to the fundamental partition; in this example, a random variable the outcome of which is independent of the relative merits of the two books. The random variable may as well be confined at the outset to two values corresponding to the rental of one or the other of the books, and random variables assigning the same probabilities to the books are equivalent for the purpose at hand. In practice, especially serious statistical practice, such random variables are, taking reasonable precautions, readily provided
by coins, cards, dice, tables of random numbers, and other devices.

In terms of the general objectivistic decision problem, Wald's point can (except for mathematical technicalities) be formulated thus: If $\mathbf{f}_r$ represents a finite number of elements of $\mathbf{F}$, and $\phi(r)$ is a corresponding set of non-negative numbers such that $\sum_r \phi(r) = 1$, then the person can make the mixed act

$$\mathbf{f} = \sum_r \phi(r)\,\mathbf{f}_r \tag{1}$$

available to himself by observing at no appreciable cost a random variable taking the values $r$ with corresponding probabilities $\phi(r)$ irrespective of which $B_i$ obtains, so $\mathbf{F}$ may be assumed to include $\mathbf{f}$. Technically, the sum in (1) should, for full generality, be replaced by an integral with respect to a probability measure. But such integrals become superfluous under the simplifying assumption, which is herewith made,
that there are in $\mathbf{F}$ a finite set of acts $\mathbf{f}_r$, to be called primary acts, with respect to which every act in $\mathbf{F}$ can be represented in the form (1). In the rental-library example, the two acts corresponding to the two books can be regarded as primary.

Since mixed acts are also available from the personalistic point of view, it may well be asked whether it is advantageous to consider them in connection with that point of view, and, if not, how they can be of advantage from one point of view but not the other. The answer to the first part of the question is easy. Indeed, if $\mathbf{f}$ is defined by (1) then it is personalistically impossible that $\mathbf{f}$ should be definitely preferred to every $\mathbf{f}_r$, that is, that

$$E(\mathbf{f}) = \sum_r \phi(r)\,E(\mathbf{f}_r) > \max_r E(\mathbf{f}_r), \tag{2}$$

for a weighted mean cannot be greater than all its terms. Technical explanation of the efficacy of mixed acts from the objectivistic point of view can best be presented after the whole statement of the minimax rule, but those at all familiar with modern statistical practice will derive some insight from the remark that the usual preference of statisticians for random samples represents a preference for certain mixed acts.
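The remark that a weighted mean cannot exceed all of its terms can be checked numerically. The following is only a sketch in Python: the three expected utilities are hypothetical numbers chosen for illustration, and the mixtures tried are restricted to weights that are multiples of 1/10.

```python
from fractions import Fraction

def mixture_value(weights, values):
    """Expected utility of a mixed act: the sum of phi(r) * E(f_r)."""
    assert all(w >= 0 for w in weights) and sum(weights) == 1
    return sum(w * v for w, v in zip(weights, values))

# Hypothetical expected utilities E(f_r) of three primary acts.
primary_values = [Fraction(2), Fraction(5), Fraction(3)]

# Try every mixture whose weights are multiples of 1/10.
n = 10
best_mixed = max(
    mixture_value(
        [Fraction(a, n), Fraction(b, n), Fraction(n - a - b, n)],
        primary_values,
    )
    for a in range(n + 1)
    for b in range(n + 1 - a)
)

# No mixture does better than the best primary act.
print(best_mixed, max(primary_values))
```

On this grid the best mixture is simply the best primary act taken with weight 1, which is the personalistic point made in the text.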
4 Income and loss
It is sometimes suggestive, and in conformity with some statistical (though not quite with economic) usage, to refer to $E(\mathbf{f} \mid B_i)$ as the income of $\mathbf{f}$ when $B_i$ obtains, and, correspondingly, to use the notation $I(\mathbf{f}; i)$. An important concept associated with the income is that which I shall refer to as the loss (symbolized by $L(\mathbf{f}; i)$) incurred by the act $\mathbf{f}$ when $B_i$ obtains. By that I mean the difference between the income the person could attain if he were able to act with the certain knowledge that $B_i$ obtained and that which he will attain if he decides on $\mathbf{f}$ when $B_i$ does in fact obtain. Formally,

$$L(\mathbf{f}; i) =_{\mathrm{Df}} \max_{\mathbf{f}'} I(\mathbf{f}'; i) - I(\mathbf{f}; i). \tag{1}$$

If the person decides on $\mathbf{f}$ when $B_i$ obtains, $L(\mathbf{f}; i)$ measures in terms of income the error he has made. If he were himself informed of $B_i$ after $\mathbf{f}$ had been chosen, which is not typically the case, $L(\mathbf{f}; i)$ would, so to speak, measure his cause for regret. On that account, some have proposed to call loss "regret," but that term seems to me charged with emotion and liable to lead to such misinterpretation as that the loss necessarily becomes known to the person. On the other hand, the
term "loss" has been used by Wald in the sense of negative income, but in contexts where loss as defined here is, of the two senses, the only defensible one, as will be explained in § 8. I hope the sense proposed here will not cause serious confusion.

Exercises
1. For each $i$, there is at least one primary act $\mathbf{f}_r$ such that

$$I(\mathbf{f}_r; i) = \max_{\mathbf{f}} I(\mathbf{f}; i). \tag{2}$$

Such a primary act may fairly be called correct for $i$.

2. $L(\mathbf{f}; i) = \sum_r \phi(r)\,L(\mathbf{f}_r; i) \ge 0$, equality holding if and only if $\mathbf{f}$ is a mixture of acts correct for $i$.

3. $L(\mathbf{f}; i) = \max_{r'} I(\mathbf{f}_{r'}; i) - I(\mathbf{f}; i)$.

4. $L(\mathbf{f}; i) = -I(\mathbf{f}; i)$, if and only if

$$\max_r I(\mathbf{f}_r; i) = 0. \tag{3}$$
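The definition of loss and Exercises 2 and 3 can be illustrated with a short computation. This is a sketch only: the income table below is hypothetical, with three primary acts as rows and two elements of the partition as columns, and the function names are illustrative.

```python
from fractions import Fraction

def loss_table(income):
    """Turn income[r][i] = I(f_r; i) into loss[r][i] = max_r' I(f_r'; i) - I(f_r; i)."""
    n_events = len(income[0])
    best = [max(row[i] for row in income) for i in range(n_events)]
    return [[best[i] - row[i] for i in range(n_events)] for row in income]

def mix(phi, table, i):
    """Value at event i of the mixed act with weights phi over the rows of table."""
    return sum(p * row[i] for p, row in zip(phi, table))

# Hypothetical income table: three primary acts (rows), two events (columns).
income = [[3, 0], [1, 2], [0, 1]]
loss = loss_table(income)  # [[0, 2], [2, 0], [3, 1]]

# A mixed act with weights phi(r) on the three primary acts.
phi = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
for i in range(2):
    best_i = max(row[i] for row in income)
    # Exercise 2: the loss of a mixture equals the mixture of the losses,
    # and it is never negative.
    assert best_i - mix(phi, income, i) == mix(phi, loss, i)
    assert mix(phi, loss, i) >= 0
```

The maximum is taken over primary acts only, which Exercise 3 justifies, since no mixture can have a larger income than the best primary act.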
5 The minimax rule, and the principle of admissibility
The most characteristic feature of the minimax theory is a certain rule of behavior, or recommendation to the person. This rule, to be called the minimax rule, can now be formulated thus: Decide on an act $\mathbf{f}'$, such that

$$\max_i L(\mathbf{f}'; i) = \min_{\mathbf{f}} \max_i L(\mathbf{f}; i), \tag{1}$$

where $\mathbf{f}$ and $\mathbf{f}'$ are, of course, confined to $\mathbf{F}$. In words, the minimax rule recommends the choice of such an act that the greatest loss that can possibly accrue to it shall be as small as possible. An $\mathbf{f}$ satisfying the recommendation of the minimax rule will be called a minimax act, and the greatest loss that can accrue to a minimax act will be called the minimax value of the (objectivistic) decision problem and written $L^*$. Under the simplifying assumptions that have been made, it is not technically difficult to show that at least one minimax act exists. The statement of the rule can be reasonably extended to mathematically more general situations, but a digression about this possibility is not appropriate here. The name of the rule is presumably derived from the abbreviation "min max" in (1) or from the Latin phrase "minimum maximorum" thus abbreviated.
It may well happen that $\mathbf{F}$ contains more than one act that is minimax for the problem, in which case the minimax rule recommends, not a particular act, but only that the choice be narrowed to the set of minimax acts. Some other criterion must then be invoked to narrow the choice further. In particular, it can be shown that at least one of the minimax acts is admissible, in the sense of § 6.4. As Wald indicates, it would, therefore, be an inexcusable violation of the sure-thing principle not to narrow the choice to admissible acts. This application of the sure-thing principle will be called the principle of admissibility. The minimax rule and the principle of admissibility constitute the subject matter of, and thereby define, the minimax theory.

6 Illustrations of the minimax rule

It would be hard to imagine an objectivistic decision problem simpler than that of whether to make an even-money (or more accurately, even-utility) bet in favor of a certain event or to refrain from betting. That problem, therefore, provides a convenient first example of the minimax rule and the concepts associated with it. Supposing, as one may without loss of generality, that the bet is for one utile, the objectivistic decision problem is completely described by Table 1, which gives the income of each of the two primary acts for each of the two elements of the partition corresponding to the event in question and its complement.

TABLE 1. INCOME OF AN EVEN-MONEY BET, $I(\mathbf{f}_r; i)$

                                 Event
Act                             1      2
Bet, $\mathbf{f}_1$             1     -1
Don't bet, $\mathbf{f}_2$       0      0

In view of Exercises 4.2 and 4.3 the corresponding loss function is described by Table 2.

TABLE 2. LOSS OF AN EVEN-MONEY BET, $L(\mathbf{f}_r; i)$

                                 Event
Act                             1      2
Bet, $\mathbf{f}_1$             0      1
Don't bet, $\mathbf{f}_2$       1      0

Therefore,

$$\max_i L(\mathbf{f}; i) = \max_i \sum_r \phi(r)\,L(\mathbf{f}_r; i) = \max\{\phi(2), \phi(1)\} \ge \tfrac{1}{2}, \tag{1}$$

equality obtaining if and only if $\phi(1) = \phi(2) = \tfrac{1}{2}$. Therefore, $L^* = \tfrac{1}{2}$, and the only minimax act is $\mathbf{f} = \tfrac{1}{2}\mathbf{f}_1 + \tfrac{1}{2}\mathbf{f}_2$.
In this problem, therefore, the minimax rule recommends that the person decide, in effect, by flipping a fair coin. If the odds in the bet had not been even, the minimax rule would have recommended the use of a coin with a certain bias; this more general example will be worked out in detail in § 12.4. It is noteworthy in connection with the present problem (for it happens in many others) that, for the minimax act $\mathbf{f}$, $L(\mathbf{f}; i) = L^*$ for every value of $i$.
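The even-money bet is small enough to check by direct search. The sketch below, in Python, builds the loss table from the income table of the bet and searches mixtures $p\,\mathbf{f}_1 + (1 - p)\,\mathbf{f}_2$ over a grid of values of $p$. The grid search is an illustrative device assumed here, not a method proposed in the text.

```python
from fractions import Fraction

# Income of the even-money bet: rows are acts, columns are the two events.
income = [[1, -1],   # f_1: bet
          [0, 0]]    # f_2: don't bet
best = [max(row[i] for row in income) for i in range(2)]
loss = [[best[i] - row[i] for i in range(2)] for row in income]

def worst_loss(p):
    """Greatest loss of the mixture p*f_1 + (1 - p)*f_2 over both events."""
    return max(p * loss[0][i] + (1 - p) * loss[1][i] for i in range(2))

# Search mixtures with p a multiple of 1/100.
grid = [Fraction(k, 100) for k in range(101)]
p_star = min(grid, key=worst_loss)

print(loss)                        # [[0, 1], [1, 0]]
print(p_star, worst_loss(p_star))  # 1/2 1/2
```

At the minimizing mixture the loss is the same under both events, in agreement with the closing remark above.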
The following more elaborate example, illustrating the mechanism of observation, is paraphrased from a slightly incorrect example in [S2]. Of three numbered coins, two are pennies and one is a dime, or else one is a penny and two are dimes. This gives rise to a sixfold partition $B_i$, because any of the three coins may be the singular one, and in two ways. The available primary acts are described in two stages thus: First, the person may select one of the coins by number for observation, or he may refrain from so doing; second, he must guess at the denomination of the singular coin. His income in utiles is defined by the following conditions:

1. If the singular coin is a penny, he must pay a tax of 10; if it is a dime, he receives a bonus of 20.
2. If he chooses to observe a coin, he must pay an inspection fee of 1, regardless of the particular coin selected for observation.
3. If his guess is incorrect he pays a penalty of 8.

It is easy to see that the first of the three terms in the person's income is irrelevant to his loss, since his decision does not affect the magnitude of that term. His loss is therefore the sum of two terms. The first of these is 1 or 0 depending on whether he decides to make an observation; the second is 0 or 8, depending on whether his guess is correct.
If the person chooses not to pay the inspection fee, it is clear from the preceding example that, no matter what he does, his loss may be as high as 4, and that it is certain to be that small if and only if he governs his guess (essentially) by the flip of a fair coin.

Suppose next that the person decides to make an observation. If he selects any particular coin for observation, he is as badly off as he was before the observation, and he has in addition incurred the inspection fee. Thus, even if the person knows that the first coin is a penny, there is nothing he can do to be sure that his total loss will not be more than 5, and, as before, he can guarantee that small a loss only by governing his guess with the flip of a fair coin.

I think every practicing statistician would say that, if an observation is to be made at all, one of the three coins should be selected at random (i.e., the probability 1/3 should be attached to observing each of them) and after the observation the person should guess that the singular coin is opposite in denomination to the one observed. It will be shown in the next paragraph that this common-sense act is minimax.

In the first place, the loss $L(\mathbf{f}_0; i)$ for the act $\mathbf{f}_0$ in question is, for each $i$, equal to $1 + \tfrac{1}{3} \times 8 = 3\tfrac{2}{3}$, which is less than 4; for the inspection fee is 1 and the probability of making a wrong guess, which would result in the loss of 8, is 1/3. To show that $\mathbf{f}_0$ is minimax, it will be enough to show that every act can result in a loss of at least $3\tfrac{2}{3}$. One possibility for doing this (which in § 12.3 will be shown to be a natural one to try) is to show that, for a certain set of weights, the weighted average of $L(\mathbf{f}; i)$ with respect to $i$ is at least $3\tfrac{2}{3}$ for all $\mathbf{f}$. In fact, it is sufficient, in view of Exercise 4.2, to establish such an inequality for the primary acts. In the present example, it happens that the weights can be chosen to be equal. What is to be shown, then, is that the following inequality obtains for every primary $\mathbf{f}$.

$$\bar{L}(\mathbf{f}) =_{\mathrm{Df}} \tfrac{1}{6} \sum_i L(\mathbf{f}; i) \ge 3\tfrac{2}{3}. \tag{1}$$
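The inequality just stated can also be checked by brute-force enumeration of the primary acts. The Python sketch below encodes the six elements of the partition and the fourteen primary acts under the loss described above (inspection fee 1, penalty 8 for a wrong guess). The encoding and all names are illustrative assumptions, not part of the original example.

```python
from fractions import Fraction
from itertools import product

DENOMS = ("penny", "dime")
other = {"penny": "dime", "dime": "penny"}

# The six states: which coin (0, 1, 2) is singular, and its denomination.
states = list(product(range(3), DENOMS))

def coin_denomination(state, j):
    """Denomination of coin j when the singular coin and its kind are given."""
    k, d = state
    return d if j == k else other[d]

# Guessing rules map the observed denomination to a guess.
rules = {
    "always penny": lambda seen: "penny",
    "always dime": lambda seen: "dime",
    "same": lambda seen: seen,
    "opposite": lambda seen: other[seen],
}

def loss(act, state):
    """Inspection fee (1 if a coin is observed) plus penalty (8 if the guess is wrong)."""
    coin, rule = act
    if coin is None:
        fee, guess = 0, rule  # without observation, rule is simply the fixed guess
    else:
        fee, guess = 1, rules[rule](coin_denomination(state, coin))
    return fee + (0 if guess == state[1] else 8)

primary_acts = [(None, d) for d in DENOMS] + [
    (j, name) for j in range(3) for name in rules
]

def average_loss(act):
    """Equal-weight average of the loss over the six states."""
    return sum(Fraction(loss(act, s)) for s in states) / len(states)

averages = {act: average_loss(act) for act in primary_acts}

# f_0: observe a coin chosen with probability 1/3 each, then guess the opposite.
f0_losses = [
    sum(Fraction(loss((j, "opposite"), s), 3) for j in range(3)) for s in states
]

print(min(averages.values()))  # 11/3, that is, 3 2/3
print(set(f0_losses))          # {Fraction(11, 3)}
```

No primary act has an average loss below $3\tfrac{2}{3}$, and the randomized act attains exactly that loss in every state.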
Now, if the primary act $\mathbf{f}$ does not involve observation, $\bar{L}(\mathbf{f}) = 4$; because three of the six terms to be averaged are then 8, and the other three are 0. Suppose next, for definiteness, that $\mathbf{f}$ involves the observation of the first coin; there are then three possibilities to consider. First, the guess is made without regard for the denomination observed, in which case the observation is, so to speak, thrown away, making $\bar{L}(\mathbf{f}) = 5$. Second, the denomination guessed may be the same as the denomination observed, in which case the guess will be wrong for four of the six values of $i$, making $\bar{L}(\mathbf{f}) = 6\tfrac{1}{3}$. Finally, the denomination guessed may be the opposite of the one observed, in which case the guess will be wrong for two of the six values of $i$, making $\bar{L}(\mathbf{f}) = 3\tfrac{2}{3}$. This argument shows that $L^* \ge 3\tfrac{2}{3}$; and, since $L(\mathbf{f}_0; i) = 3\tfrac{2}{3}$ for every $i$, $\mathbf{f}_0$ is a minimax act and $L^* = 3\tfrac{2}{3}$. It would not be difficult to show that $\mathbf{f}_0$ is the only minimax act for this problem.

7 Objectivistic motivation of the minimax rule
The minimax rule recommends an act for the person to choose; more strictly, it recommends a sharp narrowing of his choice. But how can this particular recommendation be motivated? To the best of my knowledge no objectivistic motivation of the minimax rule has ever been published. In particular, Wald in his works always frankly put the rule forward without any motivation, saying simply that it might appeal to some. Though my heart is no longer in the objectivistic point of view, I will in the next few paragraphs suggest a relatively objectivistic motivation of the rule.
I evolved this far from satisfactory argument at a time when I took the objectivistic view for granted. Now, as a personalist, it still seems interesting to me in that it shows, or at least suggests, how statistical devices combat vagueness, a topic I find very difficult to discuss directly. On a different level, the argument may shed light on the personalistic view by suggesting how personalistic ideas entered the mind of at least one objectivist.

A categorical defense of the minimax rule seems definitely out of the question. Suppose, for example, that the person is offered an even-money bet for five dollars (or, to be ultra-rigorous, for five utiles) that internal combustion engines in American automobiles will be obsolete by 1970. If there is any event to which an objectivist would refuse to attach probability, that corresponding to the obsolescence in question is one. As the example centering around Tables 6.1-2 makes clear, the minimax rule recommends that the bet be taken or rejected according as a fair coin falls heads or tails. Yet, I think I may say without presumption that you would regard the bet against obsolescence as a very sound investment, agreeing that provision for adequate interest and compensation for changes in the value of money is implicit in measurement of income in utiles.

On the other hand, there are practical circumstances in which one might well be willing to accept the rule, even one who, like myself, holds a personalistic view of probability. It is hard to state the circumstances precisely; indeed they seem vague almost of necessity. But, roughly, the rule tends to seem acceptable when $L^*$ is quite small compared with the values of $L(\mathbf{f}; i)$ for some acts $\mathbf{f}$ that merit serious consideration and some values of $i$ that do not in common sense seem
nearly incredible. Suppose, for example, that I were faced with such a decision problem, in which it may be assumed for simplicity that there is only one minimax act $\mathbf{f}$, and consider how I might defend the choice of that act to someone who proposed another to me. He might, for example, tell me that he knows from long experience, or by a tip from his broker, that some act $\mathbf{f}'$ is preferable to $\mathbf{f}$. "Well," I might say, "I have all the respect in the world for you and your sources of information, but you can see for yourself (for it is objectively so) that the most I can lose if I adopt $\mathbf{f}$ is $L^*$." He will not be able to say the same for $\mathbf{f}'$, and in many actual situations the greatest possible loss under $\mathbf{f}'$ may be many times as great as $L^*$ and of such a magnitude as to make a serious difference to me should it occur, which may well end the argument so far as I am concerned.

It is of interest, however, to imagine that my challenger presses me more closely, reminding me that I am a believer in personal probability, and that in fact I myself attach an expected loss $L$ to $\mathbf{f}'$ that is several times smaller than $L^*$. Even then, depending on the circumstances, I might answer frankly that in practice the theory of personal probability is supposed to be an idealization of one's own standards of behavior; that the idealization is often imperfect in such a way that an aura of vagueness is attached to many judgments of personal probability; that indeed in the present situation I do not feel I know my own mind well enough to act definitely on the idea that the expected loss for $\mathbf{f}'$ really is $L$; but that I do, of course, feel perfectly confident that $\mathbf{f}$ cannot result in a loss greater than $L^*$, a prospect that in the case at hand does not distress me much.

It seems to me that any motivation of the minimax principle, objectivistic or personalistic, depends on the idea that decision problems with relatively small values of $L^*$ often occur in practice. The mechanism responsible for this is the possibility of observation. The cost of a particular observation typically does not depend at all on the uses to which it is to be put, so when large issues are at stake an act incorporating a relatively cheap observation may sometimes have a relatively small maximum loss. In particular, the income, so to speak, from an important scientific observation may accrue copiously to all mankind generation after generation.
8 Loss as opposed to negative income in the minimax rule

As a variant to the minimax rule as I have stated (or perhaps I should say interpreted) it, one might consider the possibility of letting the negative of income play the role of the loss in (5.1). Indeed, strictly speaking, Wald himself always proposed the minimax rule in that
form. I believe he never made written allusion to the rule formulated in terms of loss (as "loss" is defined here); orally he took the position that loss and the form of the minimax rule based on it were inventions of mine, toward which he was tentatively sympathetic. There is virtually no mathematical difference between the two rules, and it was characteristic of Wald's approach to the foundations of statistics to be reluctant to commit himself with respect to any other differences.

Though the minimax rule founded on the negative of income seems altogether untenable, as will soon be explained, and though no one but myself seems to question that I originated the variant of the theory based on loss, little or no originality is attributable to me in this respect. Wald more than foreshadowed the idea, for, though he based his minimax rule on the negative of income, he made it clear in publications, including [W3], that he regarded as typical problems in which the income has, for every $i$, the property specified in Exercise 4.4. Therefore, in the situations Wald regarded as typical, the distinction between the two forms of the rule vanishes, so, until hearing his explicit disavowal, I considered the idea of loss as opposed to negative income his.

To see that the minimax rule founded on the negative of income is utterly untenable for statistics, consider, for example, a twofold partition problem with two primary acts in which the income is as in Table 1.

TABLE 1. $I(\mathbf{f}_r; i)$

                      Event
Act                 $B_1$    $B_2$
$\mathbf{f}_1$        -1       -1
$\mathbf{f}_2$       -10        1

Now, if the person were interested in minimizing the maximum of the negative income, he would have no recourse but to decide on $\mathbf{f}_1$, in which case (but in no other) he could be sure that the negative income would be at most 1, whichever $B_i$ obtained. This may not in itself seem objectionable, but suppose now that the person has available free of cost an observation, however relevant to $B_i$. Then, no matter what derived act he chooses, if $B_1$ obtains, his negative income will be at least 1 utile; and, to be sure that it is not more, he again has no recourse but
to decide on $\mathbf{f}_1$. In short, for the problem at hand, the person's behavior would not be influenced by any observation, however relevant. This seems to me absurd on the face of it, but perhaps the absurdity can be brought out by a less abstract situation paralleling the example just given.

A person has a ladder, and, just as he is about to use it, it occurs to him that the ladder may possibly be dangerously defective. He envisages two basic primary acts: $\mathbf{f}_1$, to throw the ladder away and buy a new one, which will cost 1 utile in either event; and $\mathbf{f}_2$, to use the ladder, which will, if the ladder is defective, result in his injury to the extent of 10 utiles, and will, if the ladder is sound, accomplish his object, which is worth 1 utile. Now, if the person acts on the principle of minimizing the maximum of negative income, he will throw the ladder away, no matter what tests tend to show that it is sound.
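The contrast between the two forms of the rule can be seen numerically in the ladder example. The Python sketch below covers only the case without observation: it compares, over a grid of mixtures $p\,\mathbf{f}_1 + (1 - p)\,\mathbf{f}_2$, the minimax of negative income with the minimax of loss. The grid and the function names are assumptions made for illustration.

```python
from fractions import Fraction

# Income I(f_r; i) in the ladder example; columns are B_1 (defective), B_2 (sound).
income = {"f1 (buy new ladder)": [-1, -1], "f2 (use ladder)": [-10, 1]}
names = list(income)

def mixed(p, table, i):
    """Value at event i of the mixture p*f1 + (1 - p)*f2 for a per-act table."""
    return p * table[names[0]][i] + (1 - p) * table[names[1]][i]

# The two candidate criteria: negative income, and loss as defined in Section 4.
neg_income = {k: [-x for x in v] for k, v in income.items()}
best = [max(v[i] for v in income.values()) for i in range(2)]
loss = {k: [best[i] - v[i] for i in range(2)] for k, v in income.items()}

# Denominator 110 is chosen so that the grid contains 9/11 exactly.
grid = [Fraction(k, 110) for k in range(111)]
p_neg = min(grid, key=lambda p: max(mixed(p, neg_income, i) for i in range(2)))
p_loss = min(grid, key=lambda p: max(mixed(p, loss, i) for i in range(2)))

print(p_neg)   # 1: always throw the ladder away
print(p_loss)  # 9/11
print(max(mixed(p_loss, loss, i) for i in range(2)))  # 18/11
```

Minimizing the maximum negative income puts all weight on buying a new ladder, because under $B_1$ the negative income is at least 1 whatever is done. The loss-based rule instead balances the two possible errors.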
CHAPTER 10

A Personalistic Reinterpretation of the Minimax Theory

1 Introduction
In this chapter a reinterpretation of the minimax theory, based on the theory of personal probability and the idea that statistical problems are typically multipersonal, is tentatively put forward. The reinterpretation is based on a model or scheme that captures, I believe, much of the essence of actual statistical situations, but it may be possible to effect that end with other equally simple and even more realistic models; for the one to be presented here leaves much to be desired.

In structure, this chapter is kept roughly parallel with Chapter 9, to enable the reader to examine as closely as he may wish the parallelism between the objectivistic interpretation given there and the personalistic one given here. In particular, the liberty is taken of giving old symbols new meanings in order to bring out the parallelism between the two interpretations.
2 A model of group decision

Consider a group of people, indexed by numbers $i$. These people are supposed to have the same utility function, at least for the consequences to be considered in the present context, but their personal probabilities are not necessarily the same. The group of people is placed in a situation in which it must, acting in concert, choose an act $\mathbf{f}$ from a finite set of available acts $\mathbf{F}$, the consequences of the acts being measured in terms of the common utility of the members of the group.

The situation just described will be called a group decision problem. It is epitomized by a jury. The members of the jury, in legal theory, are supposed to have common value judgments in connection with the legal matters at hand; for these are incorporated in the law as stated in the instructions of the court. But it is part of the very concept of a jury that its members may be of different opinions; that their judgments
as to questions of fact may differ; that, to put it technically, they may have different systems of personal probability. Still other situations resembling the group decision problem are widespread in science and industry, though the group decision problem does by no means represent the only sort of social interaction tending to make the theory of personal probability, confined to a single person, inadequate. Whenever a hospital or a factory modifies its procedures, whenever a doctrine is adopted with little reservation by virtually all the workers in a science, or whenever a panel of experts drafts a report, something like group decision is taking place.

Since the members of the group in a group decision problem, though required to act in concert, typically differ from one another in their probability judgments, it is too much to expect that any rule can be formulated that will be acceptable to, or in any sound sense proper for, all groups under all circumstances. On the other hand, there may be one or more rules of thumb that will lead the group to an acceptable compromise in many practical circumstances. Two such suggestions, the group minimax rule and the group principle of admissibility, will be made and explored in the next section.
3 The group minimax rule, and the group principle of admissibility
In the first place, the possibility of using mixed acts is to be pointed out. If, for example, you and I, walking together, disagree about which branch of a fork in the road leads home, we can, and in fact may, decide which to try by flipping a coin. In general, mixed acts are available in a group decision problem for reasons analogous to their availability in objectivistic decision problems, for, though the members of a group may generally differ in the probabilities they personally assign to some events, there is in practice an abundance of events associated with coins, cards, random numbers, and the like that make it possible for the group to mix the primary acts in any proportion, all members of the group being in agreement about what the proportions are. The example of the fork in the road illustrates how the use of mixed acts can effect such a compromise as to make decision possible in what might otherwise be an impasse. As in the account of the objectivistic decision problems, it will therefore be taken for granted from now on that $\mathbf{F}$ contains all mixtures of its elements, and once more, for mathematical simplicity, it will be assumed that there are a finite number of primary acts $\mathbf{f}_r$ in $\mathbf{F}$, of which all others are mixtures.

The $i$th person in the group attaches a certain expected utility, or (personal) income, to the act $\mathbf{f}$; call it $I(\mathbf{f}; i)$. In the judgment of the $i$th person, the act $\mathbf{f}$ entails a loss

$$L(\mathbf{f}; i) = \max_{\mathbf{f}'} I(\mathbf{f}'; i) - I(\mathbf{f}; i) \tag{1}$$

(possibly zero) as compared with the income or expected utility that in his opinion would result from an act he considers most promising.

The group minimax rule is the suggestion that an act be adopted such that the largest loss faced by any member of the group will be as small as possible. To put it formally, the suggestion is that an $\mathbf{f}'$ be adopted such that

$$\max_i L(\mathbf{f}'; i) = L^* =_{\mathrm{Df}} \min_{\mathbf{f}} \max_i L(\mathbf{f}; i). \tag{2}$$
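The fork-in-the-road example can be worked through as a small group decision problem. In the Python sketch below the two walkers share a utility (1 for taking the branch that leads home, 0 otherwise) but hold different personal probabilities. Those probabilities, the grid of mixtures, and all names are hypothetical choices made only for illustration.

```python
from fractions import Fraction

# Common utility of each primary act under the two states:
# state 0 = home lies to the left, state 1 = home lies to the right.
utility = {"left": [1, 0], "right": [0, 1]}

# Hypothetical personal probabilities of the two walkers over the two states.
opinions = [
    [Fraction(9, 10), Fraction(1, 10)],  # person 0 is fairly sure of "left"
    [Fraction(1, 5), Fraction(4, 5)],    # person 1 leans toward "right"
]

def income(act, i):
    """Expected utility I(f; i) of a primary act in the opinion of person i."""
    return sum(p * u for p, u in zip(opinions[i], utility[act]))

def loss(p_left, i):
    """Loss to person i of the mixed act: go left with probability p_left."""
    best = max(income(a, i) for a in utility)
    mixed = p_left * income("left", i) + (1 - p_left) * income("right", i)
    return best - mixed

# Denominator 70 is chosen so that the grid contains 4/7 exactly.
grid = [Fraction(k, 70) for k in range(71)]
p_star = min(grid, key=lambda p: max(loss(p, i) for i in range(2)))
L_star = max(loss(p_star, i) for i in range(2))

print(p_star, L_star)  # 4/7 12/35
```

With these numbers the group minimax act is a coin weighted 4 to 3 in favor of the left branch, and both walkers face the same loss of 12/35 in their own opinions.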
The parallelism between the group minimax rule and the minimax rule stated in § 9.5 is great. In particular, (2) is identical in appearance with (9.5.1). This is really only a pun, though a fruitful one, because $L$, $i$, and even $\mathbf{f}$ have altogether different meanings in the two contexts.

As indicated at the outset, it cannot be expected that the group minimax rule will, or reasonably should, be accepted by every group faced with every problem. But, much as in the corresponding objectivistic decision problems, it may happen that, if $L^*$ is small, in a rather vague sense, the group will accept the group minimax rule. Indeed, if $L^*$ is small, the group minimax rule requires no member of the group to face a large loss, so no member will feel that the suggestion is a serious mistake. In any event, no member of the group can suggest an alternative that will not make some member's loss as great as $L^*$, for there is none. Moreover, in many problems the group minimax rule will lead to the same loss $L^*$ for every member of the group (as is explained in § 12.3), a circumstance which, when it occurs, may add to the acceptability of the suggestion by making it seem fair.

Of course it is possible that, as in the objectivistic interpretation, more than one act fulfilling the minimax principle exists. Here, a paraphrase of the principle of admissibility will further narrow the choice, for if

$$L(\mathbf{f}'; i) \le L(\mathbf{f}; i) \tag{3}$$

for every $i$, with strict inequality obtaining for some $i$, the group cannot seriously consider $\mathbf{f}$.

4 Critique of the group minimax rule
Some of the criticisms that have been, or may be, raised against the minimax rule can as well be discussed in connection with one interpretation as with the other, and Chapter 13 will be devoted to such criticisms. But some that bear specifically on the multipersonal interpretation in this chapter should be discussed here.

In the first place, the group minimax rule is flagrantly undemocratic. In particular, the influence of an opinion, under the group minimax rule, is altogether independent of how many people in the group hold that opinion. In general, it is difficult to give a formal analysis of the concept of democratic decision, a point discussed at length by Arrow [A5], Hildreth [H4a], and others. Perhaps, considering that the people in the group are postulated to have a common utility function, a satisfactory analysis of democratic decisions could be given in the case of a group problem by some such procedure as minimizing the average with respect to $i$ of $L(f; i)$. But, in many situations in which I envisage application of the group minimax principle, the group will in fact be a rather nebulous body of people, for example the group of all specialists in some field. The principle would in such a case be administered by a single member of the group somewhat in the following fashion. In planning an investigation, the results of which he intends to publish, he will endeavor to take account of all opinions, so far as he can know or guess them, that are considered at all reasonable in his field of investigation. And when he publishes his results he will say, in effect, "Whatever reasonable opinions have heretofore been held by members of this specialty, in the light of my investigation and the minimax rule, it is now proper for the members of the specialty, in so far as they are called upon to act in concert, to agree to such and such an action." To put it a little differently, in such an application the group is rather fictitious, and the individual investigator is admitting as reasonable a rather large class of opinions, but excluding many that he is sure his confreres will agree are utterly absurd. He will, for example, feel quite free to exclude those opinions that almost all educated people regard as superstitious.

The group minimax rule is also objectionable in some contexts, because, if one were to try to apply it in a real situation, the members of the group might well lie about their true probability judgments, in order to influence the decision generated by the minimax rule in the direction each considers correct. This objection is, however, scarcely serious in the fictitious sort of application suggested above.

It is appropriate, in terminating this section, to discuss a certain distinction, neglect of which can, as was pointed out to me orally by Bruno de Finetti, lead to serious misunderstanding of the group minimax rule.
Voluminous observation typically tends to make any one person almost certain of the truth, and also, when a group of people is involved, it typically tends to make $L^*$ small. These two tendencies, though related, are separate phenomena, as an illustration will bring out.

Suppose that Peter and Paul are required to bet 1 utile in concert either that the majority of a large electorate has voted for, or that it has voted against, a certain issue; but that before betting they are to be allowed to examine a random sample of 1,001 ballots. If specific opinions about the division of the electorate are attributed to Peter and Paul, the situation can be regarded as a group decision problem. To start with an interesting extreme possibility, suppose that it is Peter's unequivocal opinion that 55% of the electorate is for and 45% is against the issue and Paul's, that the division is 45% for and 55% against; that is, Peter, for example, is supposed to act as though he knows that the division is 55%-45%. If, finally, it is understood that the group decision problem consists in the two people, Peter and Paul, deciding, before the sample is actually observed, how their bet is to be determined by the composition of the sample; then the unique minimax act is to bet that the electorate majority is whatever the sample majority happens to be. Granting this easily established solution of the minimax problem, it is obvious that the two people both face the minimax loss $L^*$. Peter, to be specific, regards $L^*$ as the probability that through random fluctuation the sample will accidentally fail to corroborate his "knowledge" that the majority is for the issue. Numerically, $L^*$ is about 0.0008. Peter and Paul, recognizing that the possibility of observing the sample reduces the minimax loss to about 0.0008 as compared with the 0.5 that it would be if no sample were available, may well find the minimax act a satisfactory compromise; at any rate, it is hard to see in this situation how they could arrive at any other.
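The figure of about 0.0008 can be checked directly. The sketch below assumes, as the text indicates, that Peter's loss is the probability that at most 500 of the 1,001 sampled ballots are for the issue when 55% of the electorate is for it; the helper name `binom_cdf` and the log-space summation are illustrative choices, not anything prescribed by the text.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with each term formed in log space
    so that the large binomial coefficients cannot overflow."""
    total = 0.0
    for j in range(k + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                    + j * math.log(p) + (n - j) * math.log(1 - p))
        total += math.exp(log_term)
    return total


# The sample majority contradicts Peter if at most 500 of 1,001 ballots are for.
l_star = binom_cdf(500, 1001, 0.55)
print(round(l_star, 4))
```

By symmetry the same number serves for Paul, with the roles of "for" and "against" exchanged.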
Though the incorporation of the sample into the problem has greatly reduced $L^*$, observation of the sample does not affect the opinion of either person in the slightest, for unequivocal opinions such as they hold are not subject to modification in the light of evidence. At least one of the two people is immovably wrong, and the observation of no sample, however large, can bring them both close to the truth. This brings out a contrast between the reduction of $L^*$ and the approach to certainty of the truth, both of which typically occur with the accumulation of evidence. The same contrast is expressed by remarking that, though the two people may readily adopt the minimax act, each feeling that at the expense of a small risk he is diverting the obstinacy of his colleague to their common good; after the observation of the sample, one or the other of them is bound to feel that the prize has been lost by a sad and improbable accident.
The wary will ask, "Who will feel how, when the actual majority is disclosed and settlement made? What if Peter's unequivocal opinion turns out to be false?" Such questions suggest that paradox lurks in an example in which different people unequivocally hold mutually inconsistent opinions, so there is some interest in considering a modification of the example, free of that objectionable feature. Suppose then that Peter and Paul, though strongly opinionated about the division of the electorate, are not absolutely unequivocal in their opinions. To be quite definite, suppose that Peter attaches probability $1 - 10^{-10}$ to the division 55%-45% and probability $10^{-10}$ to the division 45%-55%, and that Paul attaches the same probabilities but in the opposite order to the two divisions. Here, as in the example of the unequivocal opinions, the unique minimax act is to let the bet be chosen in accordance with the sample majority; $L^*$ is a trifle lower than before. Observation of the sample does now generally affect the opinions of the two people, but, though it radically reduces the minimax loss, it does not typically bring the two people into close agreement. If, for example, the division is in fact 45%-55%, Paul's strong a priori belief that that is the actual division is almost sure to be strengthened by the sample, and Peter's equally strong but false belief is almost sure to be weakened. Still, the probability is only about 1/2 that Peter will be led by the sample to attach an a posteriori probability even as great as 0.05 to the actual division.
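The probability of about 1/2 just quoted can be reproduced by Bayes' rule. The following is a rough sketch, assuming only the numbers given in the example (a sample of 1,001 ballots, prior probability $10^{-10}$ on the actual division 45%-55%); the function names are hypothetical. Each ballot "for" multiplies Peter's odds on the actual division by 0.45/0.55 and each ballot "against" by 0.55/0.45.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed in log space."""
    total = 0.0
    for j in range(k + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                    + j * math.log(p) + (n - j) * math.log(1 - p))
        total += math.exp(log_term)
    return total


N = 1001
PRIOR_TRUE = 1e-10  # Peter's prior probability for the actual division, 45%-55%


def posterior_true(x_for):
    """Peter's posterior probability of 45%-55% after x_for of N ballots are for."""
    log_odds = (math.log(PRIOR_TRUE / (1 - PRIOR_TRUE))
                + (N - 2 * x_for) * math.log(0.55 / 0.45))
    return 1 / (1 + math.exp(-log_odds))


# Largest count of "for" ballots that still leaves Peter's posterior at 0.05 or more.
x_max = max(x for x in range(N + 1) if posterior_true(x) >= 0.05)
# Chance of such a sample when the division really is 45%-55%.
prob = binom_cdf(x_max, N, 0.45)
print(x_max, round(prob, 3))
```

The threshold falls almost exactly at the expected number of "for" ballots, which is why the probability comes out close to one half.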
Thus, speaking loosely but practically, the approach to certainty of the truth is here not typically nearly so far advanced by observation as is the reduction of the minimax loss.+

It may not be superfluous to point out that the preceding paragraph alludes not only to the two different personal probability systems of Peter and of Paul, but also to certain conditional probabilities that you and I have accepted hypothetically in setting up the example.

Whichever division does actually obtain, it is rather probable that, once the sample is observed, either Peter or Paul will wish he could break his contract. This seems to me to reflect a serious objection to the group minimax principle, especially in those applications in which the members of the group are not literally consulted, for people cannot be expected to abide by disappointing contracts they might have made but didn't.

For other approaches to the group decision problem see de Finetti [D6], [D7a], de Finetti (1954), Staël von Holstein (1970, p. 65 and ff.), and Winkler (1968).

+ As de Finetti has remarked, the separation between the two phenomena is more clearly brought out if Peter and Paul decide which bet to make on the basis of a tennis match between themselves. For, if each thinks himself much the superior player, $L^*$ will be depressed, though the opinions of Peter and Paul about the election remain completely unaffected by the outcome of the match.
CHAPTER
11
The Parallelism between the Minimax Theory and the Theory of Two-Person Games

1 Introduction

John von Neumann, in 1928 [V3], developed a theory of games in which two people play each other for money.† This theory is mathematically so closely akin to that of the minimax rule and has had such influence on its development that it would be artificial to give an exposition of the minimax rule without saying something of the theory of what von Neumann calls zero-sum two-person games, though the account given here must necessarily be highly compressed. The most convenient references in English to the theory of zero-sum two-person games, should the reader be interested in a fuller account, are [B18], [M3], and Chapters II and III of [V4]; though those who read German may find it best to start with the expository sections of the paper [V3] in which von Neumann first discussed the subject.

The sort of systematic punning by which the formal parallelism between the objectivistic and personalistic minimax theories was emphasized in Chapter 10 will be used once more, to bring out the formal parallelism between those theories and that of zero-sum two-person games. Logic will be still further sacrificed to clarity and convenience by calling the two people who play the game "you" and "I."

† In this completely independent development he was to some extent anticipated by Emil Borel. Consult [?], [F10], and [B21] for details and further references.

2 Standard games

A certain sort of game, here called a standard game, is defined thus: You secretly choose a number $r$ from a finite set of possibilities, and I secretly choose a number $i$, also from a finite set of possibilities. The numbers $r$ and $i$ having been chosen, you pay me the sum of money (possibly negative) $L(r; i)$, where $L$ is an arbitrary function of $r$ and $i$, known to both of us. It is assumed that, for the sums involved, each of us finds money proportional to utility.
At first sight, standard games look very dull, though it is immediately recognized that some such games are played. A tiny but typical example is the game of "Button, button, who's got the button?"; "Stone, paper, scissors" is almost as familiar an example; and others could be mentioned. But, and this seems remarkable at first, any game, except possibly those dependent on physical skill, can be viewed as a standard game. The great generality of standard games is demonstrated in detail in Chapter II of [V4], but informal discussion of a single example will render the idea intuitively clear.

Suppose then that you and I are to play a game of poker (of a specified variety). At first sight poker does not seem to be a standard game, because it involves several random events, and several decisions on the part of each of us, some to be made in the light of others. But, it can be argued, there are only a finite number of different situations that can arise in the course of a game of poker. You could, therefore, in principle write into a notebook exactly which choice you would make in each of the possible situations with which you might be faced in playing poker with me. The number of possible ways of compiling such notebooks, or policies of play, is finite; so, except for limitations of time and patience, you will be at no disadvantage in playing one game with me, if you simply choose once and for all that one of the many possible policies of play that seems best to you. Similarly, from my point of view, the game consists, in principle, in choosing one policy of play. Once you have chosen one of the policies possible for you, say the $r$th, and I have chosen one of the policies possible for me, say the $i$th, the amount you will have to pay me at the termination of the game is a random variable. Since it is agreed that the payments are effectively in utiles for both of us, your payment to me is effectively the expected value of this random variable, which may be called $L(r; i)$ and which is in principle known to both of us as a function of $r$ and $i$. The elaborate game of two-person poker is thus exhibited, at some expense to realism, as a standard game.

Regarding the choice of an $r$ by you or an $i$ by me as a primary act, both of us are at liberty to use mixed acts. Indeed, explicit attention apparently was first called to the possibility of using mixed acts by Borel (see [B21]), in just this context. Let $f$ and $g$ represent mixed acts assigning probabilities $\phi(r)$ and $\gamma(i)$ to the values $r$ and $i$, respectively. The standard game is now replaced by a somewhat different game in which you choose an $f$; I choose a $g$; and you pay me the amount $L(f; g)$, where

$$L(f; g) =_{Df} \sum_{r, i} L(r; i)\,\phi(r)\,\gamma(i). \tag{1}$$
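The payment under mixed acts is simply a probability-weighted double sum, as the following small sketch illustrates for "Stone, paper, scissors." The stakes of one unit per game, the ordering of the choices, and the name `mixed_payment` are assumptions made for illustration only.

```python
from fractions import Fraction


def mixed_payment(L, phi, gamma):
    """Expected payment from you to me: sum over r, i of L[r][i] * phi[r] * gamma[i]."""
    return sum(L[r][i] * phi[r] * gamma[i]
               for r in range(len(L)) for i in range(len(L[0])))


# L[r][i] is what you pay me when you choose r and I choose i
# (order: stone, paper, scissors; assumed stake of one unit).
L = [[0, 1, -1],
     [-1, 0, 1],
     [1, -1, 0]]

uniform = [Fraction(1, 3)] * 3
pure_choices = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

# Against your uniform mixture, every pure choice of mine yields the same payment.
payments = [mixed_payment(L, uniform, g) for g in pure_choices]
print(payments)
```

Exact fractions are used so that the equality of the three payments is not blurred by rounding.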
3 Minimax play
Von Neumann adduces an argument, the statement of which will be briefly postponed, that, if you have respect for my intelligence, you will see to it that the most I can possibly take from you shall be as small as possible, that is, you will choose an $f'$ for which

$$\max_g L(f'; g) = L^* =_{Df} \min_f \max_g L(f; g). \tag{1}$$

Symmetrically, according to his argument, I should choose a $g'$ such that

$$\min_f L(f; g') = L_* =_{Df} \max_g \min_f L(f; g). \tag{2}$$
Since, making the recommended choice, you are sure that you will not pay me more than $L^*$, and I am correspondingly sure that you will not pay me less than $L_*$; it follows that $L_* \le L^*$. This inequality would, of course, have obtained even if mixed acts were not permitted. It is a remarkable mathematical fact (not to be proved in this book) that, permitting mixed acts, equality always obtains; so the special symbol $L_*$ is superfluous here.

The argument for the recommended choices rests on the equality of $L^*$ and $L_*$. You realize that I can take at least $L^*$ from you and that, if you are not careful, I may take more. On the other hand, I realize that you can prevent my taking more than $L^*$ from you and that, if I am not careful, I may get less. This suggests to many that a pair of intelligent players, each respecting the intelligence of the other, will each adopt one of the recommended acts.

4 Parallelism and contrast with the minimax theories

Some formal parallelism between the minimax theories of decision and the theory of zero-sum two-person games is evident, but the parallelism is much more complete than may appear at first sight. The mixtures $g$ are without counterpart in the two minimax theories of decision, and the appearance of $g$ in (3.1) at the place where $i$ appears in (9.5.1) may seem to mar the parallelism between these two equations. But, letting

$$L(f; i) =_{Df} \sum_r L(r; i)\,\phi(r), \tag{1}$$

in the game theory (in close parallelism with the decision theories),

$$L(f; g) = \sum_i L(f; i)\,\gamma(i) \le \max_i L(f; i), \tag{2}$$

and

$$\max_g L(f; g) = \max_i L(f; i). \tag{3}$$

Therefore (3.1) is equivalent to

$$\max_i L(f'; i) = \min_f \max_i L(f; i) = L^*. \tag{4}$$
Thus from the point of view of the minimax theories of decision the $g$'s represent no material innovation and are at worst useless baggage. Actually, though of little if any relevance in the interpretation of the minimax theories, the $g$'s constitute a useful mathematical device. Their usefulness has in fact been illustrated in working out the second example in § 9.6 and will be systematically demonstrated in the next chapter, along with the usefulness of the apparently irrelevant "maximin" problem posed by (3.2) and of the fact that $L_* = L^*$. Some remarks on the possibility of interpreting the $g$'s in the minimax theories are postponed to the end of this section.

In the game theory, $L$ may be any function whatsoever of its arguments $r$ and $i$, but, in the decision theories, $L$ is subject to the condition that, for every $i$,

$$\min_r L(r; i) = 0, \tag{5}$$
where $L(r; i)$ is of course to be interpreted as $L(f_r; i)$. Here is the only mathematical difference between the game theory and the decision theories, the former being mathematically slightly more general than the latter.

Though the mathematical differences are negligible, the intellectual difference between the situations leading to the game theory on the one hand and to the decision theories on the other is great. Serious misunderstandings of the (objectivistic) minimax theory have often resulted from identifying it with the game theory. Among other things, loss is then confounded with negative income, and the misconception that the (objectivistic) minimax rule is ultrapessimistic is created. I have even heard it stated on this account that the minimax rule amounts to the assumption that nature is malevolently opposed to the interests of the deciding person.

Though mathematical convenience seems to be the basic reason for introducing the $g$'s in the minimax theories, it is tempting to ask whether the $g$'s have also some natural interpretation in those theories. At the moment, I do not see a convincing interpretation in either theory, but completeness demands an account of an interpretation suggested by Wald for his version of the objectivistic theory, especially since this interpretation influenced some of Wald's most widely used terminology.

The objectivistic problem of deciding on an act in ignorance of which partition element $B_i$ obtains, the $P(B_i)$ being regarded as meaningless, suggests a new problem that may perhaps also be called objectivistic. The new problem arises on postulating that $P(B_i)$ is meaningful but utterly unknown, that is, $P(B_i) = \gamma(i)$, where the $\gamma(i)$'s are the components of a $g$ here interpreted as the a priori distribution unknown to the deciding person. Since for Wald "loss" was synonymous with "negative expected income," he naturally calculated the loss of the new problem thus:

$$L(f; g) = -E(f) = \sum_i -E(f \mid B_i)\,P(B_i) = \sum_i L(f; i)\,\gamma(i), \tag{6}$$

arriving thus at the very function suggested by the game theory. In Wald's version of the theory, the new problem therefore amounts to the formal introduction of the $g$'s in connection with the old one, which neatly fulfills the reasonable expectation that there should be no material difference between regarding $P(B_i)$ as meaningless and regarding it as meaningful but utterly unknown.

The suggested interpretation of a $g$ as an unknown (or, to mirror Wald more faithfully, fictitious) a priori distribution does not work, however, if the loss function of the new problem is defined by (9.4.1), for the new function $\bar{L}(f; g)$ is not then generally the same as the function $L(f; g)$ suggested by the game theory; thus

$$\begin{aligned}
\bar{L}(f; g) &= \max_{f'} E(f' - f) \\
&= \max_{f'} \sum_i \{L(f; i) - L(f'; i)\}\,\gamma(i) \\
&= L(f; g) - \min_{f'} L(f'; g) \\
&\le L(f; g),
\end{aligned} \tag{7}$$

equality holding for a typical $g$ (i.e., a $g$ such that $\gamma(i) > 0$ for every $i$) only in the altogether trivial situation that $F$ is dominated by one of its elements.
Does this mean that, contrary to expectation, there is a material difference between the new problem with loss $\bar{L}$ and the old one? The following exercises show that it does not.

Exercises

1. $\max_g L(f; g) = \max_g \bar{L}(f; g)$.

2. $\min_f \max_g \bar{L}(f; g) = L^*$.

3. $\max_g L(f; g) = L^*$, if and only if $\max_g \bar{L}(f; g) = L^*$.
CHAPTER
12
The Mathematics of Minimax Problems

1 Introduction

Since the two different minimax decision theories and the theory of zero-sum two-person games have a common mathematical core, it will be worth while to digress for a chapter, even at the expense of some repetition, to discuss this common core mathematically, that is, virtually without reference to its various possible interpretations. The discussion will have to be drastically confined relative to the large body of relevant literature, but the reader who wishes to pursue the subject much further will find [B18], [V4], [V3], and [M3] to be key references.

2 Abstract games
To begin with a very general situation, which will later be specialized to the one of main interest, let $f$ and $g$ denote generic elements of any two abstract sets, and let $L(f; g)$ be the value of an essentially arbitrary real-valued function. It will, however, be assumed for simplicity that for every $f'$ and $g'$ the quantities

$$\begin{gathered}
\max_g L(f'; g), \qquad \min_f L(f; g'), \\
L^* =_{Df} \min_f \max_g L(f; g), \qquad L_* =_{Df} \max_g \min_f L(f; g)
\end{gathered} \tag{1}$$

exist. To say that a maximum, for example, exists is not only to say that the function in question is bounded from above, but also that the maximum value is actually attained for at least one value of the argument.

For want of a more neutral term, call the function $L(f; g)$ an abstract game. An $f'$ is called minimax, if and only if

$$\max_g L(f'; g) = L^*; \tag{2}$$

and a $g'$ is called maximin, if and only if

$$\min_f L(f; g') = L_*. \tag{3}$$
The existence of minimax and maximin values of the variables is implicit in (1). It is an easy exercise to show that $f'$ is minimax, if and only if

$$L(f'; g) \le L^* \tag{4}$$

for every $g$. The corresponding characterization of maximin $g'$'s as those such that

$$L(f; g') \ge L_* \tag{5}$$

for every $f$ could similarly be shown. But the symmetry of the situation is such that it would be superfluous to derive this characterization of a maximin explicitly. Indeed, every theorem, or general conclusion, about $L(f; g)$ obviously has a dual, which arises on applying the theorem to the new abstract game $\tilde{L}(g; f)$ with $\tilde{L}(g; f) = -L(f; g)$. This is typical of what is known in mathematics as a duality principle. Henceforth the duals of demonstrated conclusions, even when not explicitly stated, will be as freely used as the demonstrated conclusions themselves. Some conclusions are of course self dual. Incidentally, another example of a duality principle was used in § 5.4, and a very important one was pointed out in connection with Boolean algebra in § 2.4.

An argument showing that $L_* \le L^*$ was given in connection with the theory of games. More formally, if $f'$ and $g'$ are, respectively, minimax and maximin, then from (4) and (5)

$$L_* \le L(f'; g') \le L^*. \tag{6}$$
It is possible, indeed typical, that $L_* < L^*$. Suppose, for example, that $f$ and $g$ are variables that take only two values and that $L(f; g)$ is described by Table 1.

TABLE 1. $L(f; g)$

            g = 1    g = 2
   f = 1      0        1
   f = 2      1        0

Here, as the reader should verify, both $f$'s and both $g$'s are minimax and maximin, respectively, and $L^* = 1$, $L_* = 0$.
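The verification asked of the reader for Table 1 can also be carried out mechanically. The sketch below is only an illustration; the function names are hypothetical, and the two values of each variable are indexed 0 and 1 rather than 1 and 2.

```python
def upper_value(L):
    """L^* = min over f of (max over g of L[f][g])."""
    return min(max(row) for row in L)


def lower_value(L):
    """L_* = max over g of (min over f of L[f][g])."""
    return max(min(row[g] for row in L) for g in range(len(L[0])))


table1 = [[0, 1],
          [1, 0]]

l_upper = upper_value(table1)
l_lower = lower_value(table1)
# An f is minimax when its row maximum equals L^*; a g is maximin when
# its column minimum equals L_*.
minimax_fs = [f for f, row in enumerate(table1) if max(row) == l_upper]
maximin_gs = [g for g in range(len(table1[0]))
              if min(row[g] for row in table1) == l_lower]
print(l_upper, l_lower, minimax_fs, maximin_gs)
```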
The following theorem is frequently applicable to the identification of minimax and maximin values of $f$ and $g$, and of $L^*$ and $L_*$.

THEOREM 1 If $f'$, $g'$, and the number $C$ are such that $L(f'; g) \le C \le L(f; g')$ for every $f$ and $g$; then $L^* = L_* = C = L(f'; g')$, $f'$ is minimax, and $g'$ is maximin.

PROOF. First, $C \ge L^*$, because

$$C \ge \max_g L(f'; g) \ge \min_f \max_g L(f; g) = L^*; \tag{7}$$

and, dually, $C \le L_*$. But $L_* \le L^*$; so $C \le L_* \le L^* \le C$, that is, $L^* = L_* = C$. Now (4) and (5) apply. ∎
COROLLARY 1 If $f'$ and $g'$ are such that $L(f'; g) \le L(f; g')$ for every $f$ and $g$, then $f'$ and $g'$ are, respectively, minimax and maximin, and $L^* = L_* = L(f'; g')$.
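For a finite abstract game the hypothesis of Corollary 1 can be tested pair by pair. The following sketch is illustrative only: the function name `corollary_pairs` and the three-by-three game `saddle_game` are invented for the purpose, and values are indexed from 0.

```python
def corollary_pairs(L):
    """All pairs (f0, g0) with L[f0][g] <= L[f][g0] for every f and g."""
    rows, cols = range(len(L)), range(len(L[0]))
    return [(f0, g0) for f0 in rows for g0 in cols
            if all(L[f0][g] <= L[f][g0] for f in rows for g in cols)]


# Hypothetical game in which the entry 2 is largest in its row and smallest in its column.
saddle_game = [[1, 2, 0],
               [3, 4, 5],
               [0, 3, 1]]
# The game of Table 1, for which L_* < L^*, so no such pair can exist.
table1 = [[0, 1],
          [1, 0]]

print(corollary_pairs(saddle_game))
print(corollary_pairs(table1))
```

When a pair is found, the common value of $L^*$ and $L_*$ is the entry at that pair; when $L_* < L^*$, as in Table 1, the search necessarily comes back empty.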
3 Bilinear games

If one stumbles somehow onto a pair $f'$, $g'$ satisfying the hypothesis of Corollary 2.1, then he has discovered a minimax, a maximin, and the values (in this case equal to each other) of $L^*$ and $L_*$. But that possibility of discovery does not exist unless $L^* = L_*$, which at the level of generality of the last section is unusual. Almost all real interest, however, centers on a very special class of abstract games, here to be called bilinear games, for which it is demonstrable that $L^*$ is invariably equal to $L_*$.

The definition of bilinear games involves several steps. First, consider an abstract game, $L(r; i)$, based on a pair of variables, $r$ and $i$. The two variables are here assumed for simplicity to have only a finite number of possible values, an assumption that can, and for statistics must, be considerably relaxed. Next, let $f$ and $g$ be non-negative functions of $r$ and $i$, respectively, arbitrary except for the constraint that

$$\sum_r f(r) = \sum_i g(i) = 1, \tag{1}$$

in short, probability measures on the $r$'s and $i$'s, respectively. Finally, the bilinear game $L(f; g)$ is defined thus.

$$L(f; g) =_{Df} \sum_{r, i} L(r; i)\,f(r)\,g(i). \tag{2}$$

It is important to recognize that the duality principle continues to hold, that is, if $L(f; g)$ is a bilinear game, then $\tilde{L}(g; f) = -L(f; g)$ is also one.
In terms of the auxiliary functions

$$\begin{aligned}
L(f; i) &=_{Df} \sum_r L(r; i)\,f(r), \\
L(r; g) &=_{Df} \sum_i L(r; i)\,g(i),
\end{aligned} \tag{3}$$

the following equalities and inequalities can easily be verified by the reader.

$$\begin{aligned}
\max_g L(f; g) &= \max_i L(f; i), \\
\min_f L(f; g) &= \min_r L(r; g).
\end{aligned} \tag{4}$$

$$\min_r \max_i L(r; i) \ge \min_f \max_i L(f; i) = L^* \ge L_* = \max_g \min_r L(r; g) \ge \max_i \min_r L(r; i). \tag{5}$$
But more can be said in connection with (5), for it has been shown by von Neumann [V3] that for the special class of functions now under discussion $L^*$ is actually equal to $L_*$. This important equality cannot conveniently be proved here, but the interested reader can refer to the relatively simple proof given by von Neumann and Morgenstern in Section 17.6 of [V4] (reading first, if necessary, the introduction to the mathematics of convex sets that constitutes Chapter 16 of that book) or to the version of it presented in [B18].

In the light of the equality of $L^*$ and $L_*$, (5) becomes

$$\min_r \max_i L(r; i) \ge \min_f \max_i L(f; i) = L^* = \max_g \min_r L(r; g) \ge \max_i \min_r L(r; i). \tag{6}$$
In view of (4) and (6), Theorem 2.1 can be much improved upon for bilinear games:

THEOREM 1 For bilinear games, the following three conditions on $f'$, $g'$, and $C$ are equivalent:

1. $f'$ is minimax, $g'$ is maximin, and $L^* = C$.
2. $L(f'; g) \le C \le L(f; g')$ for every $f$ and $g$.
3. $L(f'; i) \le C \le L(r; g')$ for every $i$ and $r$.

PROOF. Condition 2 implies 1, by Theorem 2.1; 1 implies 3 by (6); and 3 implies 2 by (4). ∎
COROLLARY 1 A necessary and sufficient condition that $f$ be minimax is that, for some $g$, $L(f; i) \le L(r; g)$ for every $r$ and $i$. Under that condition $L^* = L(f; g)$, and $g$ is maximin.
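The condition of Corollary 1 involves only finitely many inequalities, so a guessed pair can be checked mechanically. The following sketch assumes, purely for illustration, the game of "Stone, paper, scissors" with unit stakes and the guess that both players mix uniformly; the function names are hypothetical.

```python
from fractions import Fraction


def loss_f_i(L, f, i):
    """L(f; i) = sum over r of L[r][i] * f[r]."""
    return sum(L[r][i] * f[r] for r in range(len(L)))


def loss_r_g(L, r, g):
    """L(r; g) = sum over i of L[r][i] * g[i]."""
    return sum(L[r][i] * g[i] for i in range(len(L[0])))


def verifies_corollary(L, f, g):
    """True if L(f; i) <= L(r; g) for every r and i."""
    largest = max(loss_f_i(L, f, i) for i in range(len(L[0])))
    smallest = min(loss_r_g(L, r, g) for r in range(len(L)))
    return largest <= smallest


# Assumed payments from you to me (order: stone, paper, scissors).
L = [[0, 1, -1],
     [-1, 0, 1],
     [1, -1, 0]]
uniform = [Fraction(1, 3)] * 3

good_guess = verifies_corollary(L, uniform, uniform)     # uniform f with uniform g
bad_guess = verifies_corollary(L, [1, 0, 0], uniform)    # always stone, with uniform g
print(good_guess, bad_guess)
```

A failed check for one particular $g$ does not by itself show that $f$ is not minimax, since the corollary asks only that some $g$ exist; here, however, always choosing stone exposes you to a payment of 1, which exceeds the value 0 secured by the uniform mixture.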
Corollary 1 is an especially appropriate expression of Theorem 1 in connection with the minimax decision theories, where the $g$'s are, after all, not really of interest in themselves.

Theorem 1, and equivalently Corollary 1, are of great practical value. To be sure, there are algorithms, or rules (given by Shapley and Snow in [S12]), by which $L^*$ and all minimax values of $f$ can in principle be computed, but these algorithms are so awkward to apply that in practice one generally guesses one or more minimax $f$'s, and also a maximin $g$, on the basis of some clues, verifying the guess and evaluating $L^*$ by Corollary 1. To finish the job, one then finds, if one can, an argument to show that the minimax $f$'s thus discovered are all there are. This rather imperfect procedure is especially important, since it can relatively easily be extended to many situations in which $r$ and $i$ are not confined to finite ranges, as does not seem to be true of the algorithms.

As was mentioned in § 10.3 and as the examples that have been given illustrate, if $f$ is minimax, then $L(f; i)$ is in practice often actually equal to $L^*$ for all, or at least many, values of $i$. Insight into that phenomenon is given by the following theorem.

THEOREM 2 If $i$ is such that there exists a maximin $g$ for which $g(i) > 0$, then $L(f; i) = L^*$ for every minimax $f$.
PROOF. $L(f; i) \le L^*$, because $f$ is minimax. Therefore $L(f; g)$, being a weighted average of the $L(f; i)$'s, is at most $L^*$; and it is actually less, if any term with positive weight is not equal to $L^*$. But $L(f; g) \ge L^*$, because $g$ is maximin. ∎

It can happen, and in statistical practice it often does happen, that every $i$ satisfies the hypothesis of Theorem 2, in which case $L(f; i) = L^*$ for every $i$ and every minimax $f$.

Theorem 2 often provides a basis for guessing a minimax $f$, a maximin $g$, and the value of $L^*$, which can then be checked by application of Corollary 1. To take a simple example, suppose that there are $n$ values of $r$, and $n$ of $i$. There may be some reason to conjecture that each $i$ is used by some maximin $g$, that is, that each $i$ satisfies the hypothesis of Theorem 2. If the conjecture is in fact true, then $f(r)$ and $L^*$ satisfy the system of equations

$$\begin{aligned}
\sum_r 1 \cdot f(r) + 0 \cdot L^* &= 1, \\
\sum_r L(r; i)\,f(r) - 1 \cdot L^* &= 0.
\end{aligned} \tag{7}$$
Typically, (7), as a system of $n + 1$ linear equations in $n + 1$ variables, will have exactly one solution $(f(r), L^*)$. This solution, if the conjecture is valid, will actually consist of the components of a minimax $f$ (in this case the only one) and the value of $L^*$. But the conjecture is not yet confirmed. In particular, if any $f(r)$ in the solution of (7) is negative, it is contradicted; if not, the investigation can proceed. The candidates for maximin values of $g$ are now, by the dual of Theorem 2, among the solutions of the system

$$\begin{aligned}
\sum_i 1 \cdot g(i) + 0 \cdot L^* &= 1, \\
\sum_i L(r; i)\,g(i) - 1 \cdot L^* &= 0,
\end{aligned} \tag{8}$$
where $r$ is confined to the values for which $f(r) > 0$. To consider only the simplest and most typical case, suppose $f(r) > 0$ for every $r$. Regarding $L^*$ as known, (8) consists of $n + 1$ equations for $n$ variables, which at first sight might be expected generally to have no solution. To put the matter differently, if one forgets for the moment that $L^*$ has been determined by (7), it might seem possible that (8) could lead to a different value, say $L^{*\prime}$. But, using the latter part of (8) and then the first part of (7), it is seen that

$$\sum_{r, i} L(r; i)\,f(r)\,g(i) = \sum_r f(r)\,L^{*\prime} = L^{*\prime}, \tag{9}$$

and dually the double sum equals $L^*$; so discrepancy between $L^*$ and $L^{*\prime}$ is not among the real snags in the tentative program, irrespective of the number of $r$'s participating in (8). Finally, if (8) leads to even one set of positive $g(i)$'s, it follows from Corollary 1 that the $f$ and $L^*$ derived from (7) are the unique minimax and the true value of $L^*$, respectively.

The converse of Theorem 2 has been proved by Bohnenblust, Karlin, and Shapley in [B19], though their proof cannot be reproduced here. As is pointed out by these authors, the converse does not extend at all readily to situations involving infinite ranges of $r$ and $i$. Theorem 2 and its converse can be summarized thus:
THEOREM 3 There exists a maximin $g$ for which $g(i) > 0$, if and only if $L(f; i) = L^*$ for every minimax $f$.
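The tentative program built on systems (7) and (8) can be followed through on a small case. The sketch below is a hedged illustration, not a general algorithm: the two-by-two game is invented, every $f(r)$ is assumed positive so that all $r$ participate in (8), and the exact Gauss-Jordan routine `solve` and the helper `system` are hypothetical names.

```python
from fractions import Fraction


def solve(A, b):
    """Solve the square system A x = b exactly by Gauss-Jordan elimination.
    Raises StopIteration if the matrix is singular."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            factor = M[r][col]
            if r != col and factor != 0:
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [row[n] for row in M]


def system(L):
    """Coefficients of (7): unknowns f(0), ..., f(n-1), L*."""
    n = len(L)
    A = [[1] * n + [0]] + [[L[r][i] for r in range(n)] + [-1] for i in range(n)]
    return A, [1] + [0] * n


# Assumed game with min over r of L[r][i] equal to 0 for each i.
L = [[2, 0],
     [0, 1]]
transpose = [list(col) for col in zip(*L)]

*f, l_star = solve(*system(L))               # system (7)
*g, l_star_dual = solve(*system(transpose))  # system (8), all r participating

# Check by Corollary 1: L(f; i) <= L(r; g) for every r and i.
lhs = max(sum(L[r][i] * f[r] for r in range(2)) for i in range(2))
rhs = min(sum(L[r][i] * g[i] for i in range(2)) for r in range(2))
print(f, g, l_star, l_star_dual, lhs <= rhs)
```

In this instance both solutions are positive and the two systems return the same value, so the conjecture is confirmed for the assumed game; a negative component in either solution would have contradicted it.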
4 An example of a bilinear game

It is now convenient to discuss a certain example, or rather a class of examples, of bilinear games, namely those in which $i$ takes only two values, say 1 and 2. Two preliminary remarks will help to orient the discussion. First, bilinear games in which $i$ takes only one value are devoid of interest, for the minimax problem in that case is simply a problem of finding an ordinary minimum. Second, the discussion of bilinear games in which $i$ takes only two values includes, in effect, because of the duality principle, the discussion of those in which $r$ takes only two values.
If $i$ takes only the two values 1 and 2, the values $g = \{g(1), g(2)\}$ can be represented graphically by points on an interval, as illustrated at the foot of Figure 1. For every $r$, $L(r; g)$ is linear as a function of $g$, as is $L(f; g)$ for every $f$. It is, of course, just because the $L(f; g)$ of a bilinear game is linear in this sense and its dual that I use the term "bilinear." In Figure 1 the five slanting solid lines represent the five linear functions $L(r; g)$ of a bilinear game in which $r$ (for illustration) takes five values and $i$ takes two. The dashed lines represent two values of $f$, each of which has for simplicity been so chosen as to use, or mix, only two values of $r$. As may be verified by inspection, the particular bilinear game represented by Figure 1 has the special property that $\min_r L(r; i) = 0$ for each $i$, which is the distinguishing property of those bilinear games that arise in connection with the minimax decision theories described in Chapters 9 and 10.

[Figure 1. Horizontal axis: $g(1)$ and $g(2)$.]

Figure 1 bears a more than accidental resemblance to Figure 7.2.1. In particular, the concave function

$$\min_r L(r; g) \tag{1}$$
marked by heavy line segments in Figure 1 is closely analogous to the convex function so marked in Figure 7.2.1. The particular g emphasized by Figure 1 is that for which the function (1) attains its maximum value, which according to (3.6) is L*. This g is therefore the unique maximin. It has been shown quite generally in [B19] that bilinear games with more than one minimax or maximin are, in a sense, unusual; Figure 1 makes it graphically clear that the special bilinear games now under consideration do usually have a unique maximin, because there is more than one maximin only in case (1) happens to have a horizontal segment.

What are the minimax f's for the bilinear game represented by Figure 1? According to the dual of Theorem 3.2, an r cannot be used in the formation of a minimax f unless L(r; g) = L* for the (in this case unique) maximin g. That consideration eliminates all but two of the r's from consideration, and it is graphically clear that this will usually be the case for bilinear games in which i takes only two values. Theorem 3.2 itself, applied to the particular game under discussion, shows that the graph of L(f; g) as a function of g must be horizontal for any minimax f. The two preceding conditions together eliminate all values of f except the one corresponding to the horizontal dashed line in Figure 1; and that f is indeed minimax, because L(f; i) = L* for both values of i.

To specialize still further, suppose that r as well as i takes only two values. Such a game can, of course, be represented graphically in the spirit of Figure 1. Several qualitatively different situations can occur, which might, for example, be classified by the relation of the two linear functions L(r; g) to each other. The reader should graph and consider many or all of these possibilities for himself. The only one treated here will be that in which the two functions cross each other at an interior g, with one function sloping up and the other down. It is graphically clear that there will then be a unique minimax and a unique maximin, as will now be shown analytically. The condition postulated can be expressed without loss of generality thus:

(2)  $L(1; 2) > L(1; 1), \quad L(2; 1) > L(2; 2), \quad L(2; 1) > L(1; 1), \quad L(1; 2) > L(2; 2).$
Or, more mnemonically,

(3)  $L(1; 2),\ L(2; 1) > L(1; 1),\ L(2; 2).$
It is conjectured, in this case on graphical grounds, that the program outlined in connection with (3.7-8) applies, and the reader can indeed verify that that program leads to the conclusion

(4)  $L^* = \{L(1; 2)L(2; 1) - L(1; 1)L(2; 2)\}/\Delta,$

where

(5)  $\Delta =_{\mathrm{Df}} L(1; 2) + L(2; 1) - L(1; 1) - L(2; 2);$

and that the unique minimax f and maximin g are

(6)  $f(1) = [L(2; 1) - L(2; 2)]/\Delta, \quad f(2) = [L(1; 2) - L(1; 1)]/\Delta,$

(7)  $g(1) = [L(1; 2) - L(2; 2)]/\Delta, \quad g(2) = [L(2; 1) - L(1; 1)]/\Delta.$
If the game arises from an application of the minimax decision theory, (3) almost always applies. More precisely, in this case, except possibly for the order of numbering,

(8)  $L(1; 1) = L(2; 2) = 0 \quad \text{and} \quad L(1; 2),\ L(2; 1) \ge 0;$

so, if only the inequalities in (8) are both strict, (3) applies. Then (4-7) specialize to

(9)  $L^* = L(1; 2)L(2; 1)/\Delta,$

where

(10)  $\Delta = L(1; 2) + L(2; 1);$

(11)  $f(1) = L(2; 1)/\Delta, \quad f(2) = L(1; 2)/\Delta,$

(12)  $g(1) = L(1; 2)/\Delta, \quad g(2) = L(2; 1)/\Delta.$
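
The closed forms (4) through (7) can be checked mechanically. The following is a minimal sketch, not part of the original text: the function name `solve_2x2` and the numerical losses are hypothetical, indices are 0-based where the text is 1-based, and exact rational arithmetic is used so that the check involves no rounding.

```python
from fractions import Fraction


def solve_2x2(L):
    """Closed-form solution (4)-(7) of a 2x2 bilinear game.

    L[r][i] is the loss L(r+1; i+1) of the text. The formulas are
    valid only when condition (3) holds, which is checked first.
    """
    a, b = Fraction(L[0][0]), Fraction(L[0][1])  # L(1;1), L(1;2)
    c, d = Fraction(L[1][0]), Fraction(L[1][1])  # L(2;1), L(2;2)
    if not min(b, c) > max(a, d):
        raise ValueError("condition (3) fails; (4)-(7) do not apply")
    delta = b + c - a - d                        # equation (5)
    value = (b * c - a * d) / delta              # equation (4)
    f = ((c - d) / delta, (b - a) / delta)       # equation (6)
    g = ((b - d) / delta, (c - a) / delta)       # equation (7)
    return value, f, g


# Hypothetical game of the special form (8): zero losses on the diagonal.
value, f, g = solve_2x2([[0, 3], [1, 0]])

# The minimax f equalizes the loss over both values of i,
# and the maximin g equalizes it over both values of r.
loss_against_i = [f[0] * [0, 3][i] + f[1] * [1, 0][i] for i in range(2)]
loss_against_r = [g[0] * row[0] + g[1] * row[1] for row in ([0, 3], [1, 0])]
```

In this illustration both equalized losses coincide with the value returned by (4), which is the graphical statement that the minimax f corresponds to a horizontal line at height L*.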
5 Bilinear games exhibiting symmetry
Mathematically the solution of a bilinear game is often simplified by considerations of symmetry. For statistical applications, the implications of symmetry for bilinear games are of fundamental importance in so far as they represent a counterpart in the minimax theory of the disreputable but irrepressible principle of insufficient reason. This section discusses these implications in an elementary, but formal, way. It can be skimmed over or skipped outright without much detriment to the understanding of later sections.

Any discussion of symmetry involves, at least implicitly, the branch of mathematics known as the theory of groups. Though what is to be said here about games exhibiting symmetry is intended to be clear without prior knowledge of the theory of groups, it may be mentioned that introductions to that subject are to be found in many places, for example in [B14].

It can, and in practice often does, happen that a bilinear game has some symmetry.† This means that there are permutations, here symbolized by T, T', etc., of the values of r among themselves and the values of i among themselves such that

(1)  $L(Tr; Ti) = L(r; i)$
for every r and i, where, of course, Tr and Ti are the values into which T carries r and i respectively. Permutations satisfying (1) are said to leave the game invariant, or to belong to the group (of symmetries) of the game. The permutation U that leaves every r and every i fixed must be counted among the permutations in the group of the game, but the game has no symmetry (worthy of the name) unless there are other permutations besides U in its group. An example of a game with high symmetry is the game implicit in the second example of § 9.6, for, to any permutation whatsoever of the six i's in that game among themselves, there is a corresponding permutation of the r's such that the two permutations taken together leave the game invariant. It was, of course, the exploitation of symmetry that made the treatment of that example relatively simple.

Returning to bilinear games in general, if T and T' are in the group of the game, then the product TT' defined by the condition that

(2)  $(TT')r =_{\mathrm{Df}} T(T'r), \quad (TT')i =_{\mathrm{Df}} T(T'i)$

is obviously also a permutation in the group of the game. This multiplication of permutations somewhat resembles the ordinary multiplication of numbers. In particular, (TT')T'' is evidently the same as T(T'T''), though it is not necessarily true that TT' = T'T. Relative to this multiplication the permutation U plays the role of the unit, or number 1, in arithmetic, for it is obvious that TU = UT = T for any permutation T. For every permutation T, there is evidently a permutation $T^{-1}$, and one only, that undoes T, that is, one such that $T^{-1}T = U$. It is easy to see also that $TT^{-1} = U$ and that, if T is in the group of the game, $T^{-1}$ is too. The notation is of course motivated by the consideration that, relative to the multiplication of permutations, $T^{-1}$ plays the role of the reciprocal of T.

† This concept must not be confused with that of "symmetrical games," which are symmetrical in the sense that the equation L(r; i) = -L(i; r) is meaningful and true for every r and i.

It will be adopted as a definition that Tf and Tg are the functions such that $Tf(r) = f(T^{-1}r)$ and $Tg(i) = g(T^{-1}i)$ for every permutation T and for every r and i. The intervention of $T^{-1}$ in this definition may at first seem arbitrary, but it is motivated by the following considerations. First, if f is, for example, the function such that $f(r_0) = 1$ and f(r) = 0 for $r \ne r_0$, then Tf should be such that $Tf(Tr_0) = 1$ and Tf(r) = 0 for $r \ne Tr_0$. Second, S(Tf) should be (ST)f rather than (TS)f. The definition having been adopted, L(Tf; Tg) can be calculated thus:
(3)  $L(Tf; Tg) = \sum_{r, i} L(r; i) f(T^{-1}r) g(T^{-1}i) = \sum_{r, i} L(Tr; Ti) f(r) g(i),$

where the basic fact is exploited that, if r, i runs once through all pairs of values, then Tr, Ti also does so. It follows from (1) and (3) that, if T is in the group of the game, then

(4)  $L(Tf; Tg) = L(f; g).$
An f (g) is called invariant under the group of the game, if and only if Tf = f (Tg = g) for every T in the group. There is a natural way to construct from any f an $\bar f$ invariant under the group, and dually for g. Namely, let

(5)  $\bar f =_{\mathrm{Df}} \frac{1}{n} \sum_T Tf,$

where (here and throughout this section) n is the number of elements in the group and the summation is over all elements of the group. The definition (5) accomplishes its objective, because

(6)  $\sum_r \bar f(r) = \frac{1}{n} \sum_T \sum_r Tf(r) = \frac{n}{n} = 1,$

and

(7)  $T'\bar f(r) = \bar f(T'^{-1}r) = \frac{1}{n} \sum_T f(T^{-1}T'^{-1}r) = \frac{1}{n} \sum_T T'Tf(r) = \bar f(r)$

for every r and for every T' in the group. In (7) use is made of the easily established facts that $T^{-1}T'^{-1} = (T'T)^{-1}$ and that as T runs once through the group so does T'T. The justification of $\bar g$ is, of course, dual to that of $\bar f$. It is noteworthy that $f = \bar f$, if and only if f is invariant under the group of the game.
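
The averaging construction (5) is easy to carry out by machine. The sketch below is an illustration only, under assumptions of my own: a permutation T of the r's is encoded as a tuple `perm` with `perm[r] = Tr`, only the action on the r's is represented (which is all that f requires), and the three cyclic shifts used as the group are hypothetical.

```python
from fractions import Fraction


def apply(perm, f):
    """Return Tf, where perm[r] = Tr and Tf(r) = f(T^{-1} r)."""
    out = [None] * len(f)
    for r, weight in enumerate(f):
        out[perm[r]] = weight  # equivalent form of the definition: Tf(Tr) = f(r)
    return out


def symmetrize(group, f):
    """Return f-bar = (1/n) * (sum of Tf over all T in the group), as in (5)."""
    n = len(group)
    images = [apply(perm, f) for perm in group]
    return [sum(column) / n for column in zip(*images)]


# Hypothetical group: the identity U and the two cyclic shifts of r = 0, 1, 2.
group = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]
f = [Fraction(1, 2), Fraction(1, 2), Fraction(0)]
f_bar = symmetrize(group, f)
```

Here `f_bar` is again a probability distribution, as (6) asserts, and is left fixed by every element of the group, as (7) asserts. The list passed as `group` must contain every element of the group, including U, for the average to be invariant.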
Suppose R (I) is a set of the r's (i's). Then, by definition, $r \in TR$ ($i \in TI$), if and only if $T^{-1}r \in R$ ($T^{-1}i \in I$); and the set R (I) is invariant under the group of the game, if and only if TR = R (TI = I) for every T in the group.
Exercises

1a. If R is invariant, so is $\sim R$.
1b. If R and R' are invariant, so are $R \cap R'$ and $R \cup R'$.
1c. The vacuous set and the set of all r's are invariant.
2. For every R, let $\bar R =_{\mathrm{Df}} \bigcup_T TR$, where T is of course confined to the group; and, for every r, define the trajectory of r as $\overline{[r]}$, where [r] is, as is customary, the set whose only element is r.
(a) $\bar R$ is the smallest invariant set containing R.
(b) $\bar R$ is the intersection of all invariant sets containing R.
(c) $\bar R = \bigcup_{r \in R} \overline{[r]}$.
(d) $\overline{[r]}$ is the smallest invariant set of which r is an element.
3a. If R is invariant, and $R \cap \overline{[r]} \ne 0$, then $R \supset \overline{[r]}$.
3b. If R is invariant, and $r \in R$, then $R \supset \overline{[r]}$.
3c. If $\overline{[r]} \cap \overline{[r']} \ne 0$, then $\overline{[r]} = \overline{[r']}$.
4a. The following conditions are equivalent:
α. R is invariant.
β. $R = \bar R$.
γ. For every $r \in R$, $\overline{[r]} \subset R$.
δ. R is partitioned into sets each of which is a trajectory.
4b. The following conditions are equivalent:
α. f is invariant.
β. The set of r's for which f takes any given value is invariant.
γ. f is constant on every trajectory.
5a. If T'r = r, then $(TT'T^{-1})Tr = Tr$.
5b. If {r} denotes the number of elements of the group that leave r fixed, then {r} = {Tr}.
5c. If $\|r\|$ denotes the number of elements in $\overline{[r]}$, then $\{r\}\,\|r\| = n$.
5d. Both {r} and $\|r\|$ are divisors of n.
5e. The value of $\bar f$ everywhere on the trajectory of r is

(8)  $\frac{1}{\|r\|} \sum_{r' \in \overline{[r]}} f(r').$

6. Note the dual of each of the preceding exercises.
In the establishment of all these preliminaries, the theory of bilinear games has been almost lost sight of, but it is now possible to say much about the significance of invariant functions and sets for bilinear games. I begin with a theorem valued for some of its corollaries rather than for any charm of its own.

THEOREM 1 If $L(f'; Tg) \le L(f''; Tg)$ for every T, then $L(\overline{f'}; g) \le L(f''; \bar g)$. If in addition $L(f'; g) < L(f''; g)$, then $L(\overline{f'}; g) < L(f''; \bar g)$.

PROOF.

(9)  $L(T^{-1}f'; g) = L(f'; Tg) \le L(f''; Tg).$

Therefore

(10)  $L(\overline{f'}; g) = \frac{1}{n} \sum_T L(T^{-1}f'; g) \le \frac{1}{n} \sum_T L(f''; Tg) = L(f''; \bar g).$

If $L(f'; g) < L(f''; g)$, then (9) is strict for T = U, and therefore (10) is also strict.

COROLLARY 1 If $L(f'; Tg) = L(f''; Tg)$ for every T, then $L(\overline{f'}; g) = L(f''; \bar g)$.

COROLLARY 2 If $L(f'; g) = L(f''; g)$ for every g, then $L(\overline{f'}; g) = L(\overline{f''}; g)$ for every g.

COROLLARY 3 $L(\bar f; g) = L(f; \bar g) = L(\bar f; \bar g)$ for every f and g.

COROLLARY 4 If f is invariant under the group of the game, $L(f; g) = L(f; \bar g)$ for every g.
Paraphrasing some of the nomenclature of § 6.4, if $L(f'; g) \le L(f''; g)$ for every g, say that f' dominates f''; if f' dominates f'', but f'' does not dominate f', say that f' strictly dominates f''; if f' dominates f'', and f'' dominates f', say that f' and f'' are equivalent; if f' is not strictly dominated by any f, say that f' is admissible.

COROLLARY 5 If f' dominates, strictly dominates, or is equivalent to f'', then $\overline{f'}$ dominates, strictly dominates, or is equivalent to $\overline{f''}$, respectively.

COROLLARY 6 If $L(f; Tg) \le L(\bar f; Tg)$ for every T, then $L(f; g) = L(\bar f; g)$.

COROLLARY 7 If $L(f; i) \le L(\bar f; i)$ for every $i \in I$, where I is invariant under the group of the game, then $L(f; i) = L(\bar f; i)$ for $i \in I$.

COROLLARY 8 It is impossible that f strictly dominates $\bar f$.
THEOREM 2 $\max_g L(\bar f; g) \le \max_g L(f; g)$, equality holding, if and only if the right-hand maximum is attained for a g invariant under the group of the game.

PROOF.

(11)  $\max_g L(\bar f; g) = \max_g L(f; \bar g) \le \max_g L(f; g).$

The inequality in (11) follows from the fact that every $\bar g$ is a g; equality holds, if and only if the final maximum is attained for some $\bar g$, that is, for some invariant g.

COROLLARY 9 If f is minimax, so is $\bar f$.

COROLLARY 10 There exists a minimax f invariant under the group of the game.
If a game has more than one minimax f, it is tempting to suppose that in statistical, if not in all, applications of the theory an invariant, or symmetrical, minimax f would recommend itself at least as highly as any other minimax f. This supposition, being vague, cannot be really proved, but certain facts tend to support it. In particular, the following theorem is a reassuring improvement of Corollary 10.

THEOREM 3 There is at least one admissible, invariant, minimax f.
PROOF. It is a direct consequence of a theorem (Theorem 2.22, p. 04, of [W3]) of Wald's, too technical for statement or proof here, that at least one invariant minimax f is strictly dominated by no invariant f'. If that f were strictly dominated by any f'' (invariant or not), it would also, according to Corollary 5, be dominated by $\overline{f''}$, which is impossible. Therefore f is admissible.

If the bilinear game has high symmetry or, more explicitly, if the number of trajectories into which the r's or the i's, or both, are partitioned is small, the search for invariant minimax f's and invariant maximin g's is relatively simple. An invariant minimax is characterized as an invariant f' such that

(12)  $\max_g L(f'; g) = \min_f \max_g L(f; g) = L^*.$
But, since at least one invariant minimax exists, the criterion (12) is not changed if the minimization on its right side is confined to invariant f's; with f so confined, the criterion remains unchanged, if both maximizations are confined to invariant g's (as Corollary 3 shows). Thus the search for invariant minimax f's and invariant maximin g's amounts to the solution of an abstract game that arises from the original bilinear game by ruling out certain values of f and g, namely the un-invariant ones. This new and smaller abstract game can be exhibited as a bilinear game thus: Let it be understood for the moment that r' ranges over such a set of the r's that there is exactly one r' in every trajectory; dually for i'. For invariant f and g,

(13)  $L(f; g) = \sum_r \sum_i L(r; i) f(r) g(i) = \sum_{r'} \sum_{i'} \sum_{r \in \overline{[r']}} \sum_{i \in \overline{[i']}} L(r; i) f(r) g(i) = \sum_{r'} \sum_{i'} L'(r'; i') f'(r') g'(i'),$

where

(14)  $L'(r'; i') =_{\mathrm{Df}} \frac{1}{\|r'\|\,\|i'\|} \sum_{r \in \overline{[r']}} \sum_{i \in \overline{[i']}} L(r; i)$

and

(15)  $f'(r') =_{\mathrm{Df}} \|r'\| f(r'); \quad g'(i') =_{\mathrm{Df}} \|i'\| g(i').$
Finally, it is easily verified that, except for the conditions $f'(r') \ge 0$, $g'(i') \ge 0$, and $\sum f'(r') = \sum g'(i') = 1$, the coefficients f'(r') and g'(i') are arbitrary. The new game is therefore to all intents and purposes a bilinear game with only as many values of r' and i' as there are r-trajectories and i-trajectories, respectively, in the original game. The new game, incidentally, may well have symmetry of its own.

If there is only one r- or one i-trajectory, the new game is so simple it scarcely deserves to be called a game. This occurs, for example, in the second example of § 9.6, where there is only one i-trajectory. In that situation there is only one invariant g, and it is equal at every i to the reciprocal of the total number of i's (which is here the value of $\|i\|$ for every i). That g must therefore be an admissible maximin. The value of L* is therefore given by

(16)  $L^* = \min_r \frac{1}{\|i\|} \sum_i L(r; i).$

The invariant minimax f's are those and only those invariant f's such that f(r) = 0 for every r that fails to minimize the sum in (16). Moreover, here the minimax f's (invariant or not) are all equivalent, as can be argued thus: Any invariant minimax f is such that

(17)  $L(f; g) = L(f; \bar g) = L^*$

for every g. If any minimax f whatsoever failed to satisfy (17), it would strictly dominate $\bar f$; but according to Corollary 8 that is impossible. Therefore in the very special situation at hand all minimax f's satisfy (17) and are accordingly equivalent.

It is, of course, important to extend consideration of symmetry to bilinear games with infinite sets of r's and i's, and infinite groups of symmetries, but the task has not yet proved straightforward. Two key references bearing on it are [U] and [B17].
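
The reduction of a symmetric game to a game on trajectories can be sketched as follows. This is an illustration under stated assumptions, not a construction taken from the text: the helper names, the encoding of a group element as a pair `(perm_r, perm_i)` with `perm[x] = Tx`, and the 4-by-3 loss table are all hypothetical, and the reduced loss is computed as the average of L over a pair of trajectories, which is the coefficient that the regrouping of the double sum for invariant f and g requires.

```python
from fractions import Fraction


def leaves_invariant(L, perm_r, perm_i):
    """Check L(Tr; Ti) = L(r; i) for every r and i."""
    return all(L[perm_r[r]][perm_i[i]] == L[r][i]
               for r in range(len(L)) for i in range(len(L[0])))


def trajectories(perms, size):
    """Partition range(size) into trajectories; perms must list the whole group."""
    seen, orbits = set(), []
    for start in range(size):
        if start not in seen:
            orbit = sorted({perm[start] for perm in perms})
            seen.update(orbit)
            orbits.append(orbit)
    return orbits


def reduced_game(L, group):
    """Average L over each pair of an r-trajectory and an i-trajectory."""
    r_orbits = trajectories([pr for pr, _ in group], len(L))
    i_orbits = trajectories([pi for _, pi in group], len(L[0]))
    reduced = [[Fraction(sum(L[r][i] for r in R for i in I), len(R) * len(I))
                for I in i_orbits] for R in r_orbits]
    return r_orbits, i_orbits, reduced


# Hypothetical game: r = 0, 1, 2 are guesses of i (loss 3 when wrong),
# r = 3 is a hedging act with loss 1 whatever i may be.
L = [[0, 3, 3], [3, 0, 3], [3, 3, 0], [1, 1, 1]]
group = [((0, 1, 2, 3), (0, 1, 2)),
         ((1, 2, 0, 3), (1, 2, 0)),
         ((2, 0, 1, 3), (2, 0, 1))]
r_orbits, i_orbits, reduced = reduced_game(L, group)
# With a single i-trajectory the value is the smallest entry of the one column.
value = min(row[0] for row in reduced)
```

In this example the twelve entries of L collapse to a 2-by-1 table, and the value is attained by the r-trajectory consisting of the hedging act alone.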
CHAPTER 13

Objections to the Minimax Rules

1 Introduction

I have already expressed and supported my opinion that neither the objectivistic nor the personalistic minimax rule can be categorically defended (§ 9.7 and § 10.3). On the other hand, certain objections have been leveled against the objectivistic rule (that being the well-known one) that seem to me to call for reinterpretation, if not outright refutation.
2 A confusion between loss and negative income

Some objections valid against the minimax rule based on negative income are irrelevant to that based on loss. The notions that the minimax rule is ultrapessimistic and that it can lead to the ignoring of even extensive evidence have already been discussed as examples of such objections. Another example I would put in the same category has been suggested by Hodges and Lehmann [H6]. In this example a person who has observed n independent tosses of a coin for which the probability of heads has an unknown value p is required to predict the outcome of the (n + 1)th toss. Hodges and Lehmann here interpret prediction in the following somewhat sophisticated, but reasonable, sense. The person is, in the light of his observation, required to choose a number $\hat p$ between 0 and 1 and to pay a fine of $(1 - \hat p)^2$ or $\hat p^2$ according as the (n + 1)th toss is in fact heads or tails. Thus the (expected) income attached to the primary act $\hat p$ and event p is

(1)  $I(\hat p; p) = -p(1 - \hat p)^2 - (1 - p)\hat p^2 = -(\hat p - p)^2 - p(1 - p).$
As Hodges and Lehmann show, the only derived act (mixed or pure) that yields the minimax of the negative income is to set $\hat p = 1/2$ irrespective of the observation. But it is, in common sense, absurd thus to ignore the observation of the first n tosses. In view of this absurdity, almost everyone would agree that applying the minimax rule directly to the negative of (1) is a foolish act for the person to employ.

The absurdity of minimizing the maximum of negative income in this example is of course no valid argument against minimizing the maximum loss. It is easy to see that the loss corresponding to (1) is

(2)  $L(\hat p; p) = (\hat p - p)^2.$
As Hodges and Lehmann happen to show in the same paper [H6] (though in a different context), and as will be discussed in some detail in § 4, the unique minimax derived act does use the observations to advantage, resulting in a loss of

(3)  $\frac{1}{4(1 + \sqrt{n})^2}$

irrespective of p. The absurd act of setting $\hat p = 1/2$ irrespective of the observation results in the loss $(p - 1/2)^2$, which in any ordinary context would be inferior to (3), especially for large n. Incidentally, the minimax derived from (2), though not nearly so bad as setting $\hat p$ identically equal to 1/2, is itself open to a serious objection, which will be explained in § 4.
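
The passage from income to loss in this example can be verified numerically. The following is a minimal sketch, assuming (as elsewhere in this book) that the loss at p is the best income attainable at p less the income actually received; the grid of twenty-one values and the function names are hypothetical conveniences, and exact fractions are used so that equalities can be tested without rounding.

```python
from fractions import Fraction


def income(p_hat, p):
    """Expected income (1): minus the expected fine for announcing p_hat."""
    return -p * (1 - p_hat) ** 2 - (1 - p) * p_hat ** 2


def loss(p_hat, p, announcements):
    """Best income attainable at p, over the given announcements, less the income of p_hat."""
    return max(income(q, p) for q in announcements) - income(p_hat, p)


grid = [Fraction(k, 20) for k in range(21)]
half = Fraction(1, 2)

# Negative income of the constant act p_hat = 1/2, tabulated against p.
negative_income_of_half = {p: -income(half, p) for p in grid}

# Loss compared with the squared error (p_hat - p)^2 at every grid point.
loss_is_squared_error = all(loss(q, p, grid) == (q - p) ** 2
                            for q in grid for p in grid)
```

On this grid the negative income of the constant act does not depend on p at all, which is why that act equalizes, and here minimizes, the maximum negative income, while its loss $(p - 1/2)^2$ plainly does depend on p.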
3 Utility and the minimax rule
Some objections to the objectivistic, and mutatis mutandis to the group, minimax rule are in effect objections to the concept of utility, which underlies the minimax rules. Criticisms of the concept of utility have already been discussed in Chapter 5, particularly in § 5.6, but certain aspects of the discussion need to be continued here.

It is often said, and I think with justice, that, even granting the validity of the utility concept in principle, a person can seldom write down his income function I(r; i) with much accuracy. This idea is put forward sometimes with one interpretation and sometimes with another. Of these, only the first is strictly an objection to the utility concept. That one is a dilemma raised by the phenomenon of vagueness.

Vagueness may so blur a person's utility judgments that he cannot accurately write down his income function. I suppose that no one will seriously deny this; I would be particularly embarrassed to do so, for it is almost a recapitulation of the very argument that leads me, though in principle a personalist, to see some sense in the objectivistic decision problem. On the other horn, if all meaning is denied to utility (or some extension of that notion) no unification of statistics seems possible.
Three special circumstances are known to me under which escape from the dilemma is possible. First, there are problems in which some straightforward commodity, such as money, lives, man hours, hospital bed days, or submarines sighted, is obviously so nearly proportional to utility as to be substitutable for it. Second, there are problems in which exact or approximate minimax decisions can be calculated on the basis of only relatively little, and easily available, information about the income function, such as symmetry, monotoneity, or smoothness.

The possibility of cheap extensive observation, which (when it occurs) makes the minimax principle attractive, also tends to make many decision problems fall into both of the two types in which the difficulty of vagueness is alleviated. For example, in a monetary decision problem with cheap observation available, it often happens that the weak law of large numbers, and the like, can be invoked to justify regarding cash income as proportional to utility income. Third, there are many important problems, not necessarily lacking in richness of structure, in which there are exactly two consequences, typified by overall success or failure in a venture. In such a problem, as I have heard J. von Neumann stress, the utility can, without loss of generality, be set equal to 0 on the less desired and equal to 1 on the more desired of the two consequences.

The second sense in which it may, though not quite properly, be said to be impossible to write down the income function is typified by this example. A manufacturer of small short-lived objects, say paper napkins, is faced with the problem of deciding on a program of sampling to control the quality of his product. He complains that, though for this problem his utility is adequately measured by money, he cannot write down his income function because he does not know how the public will react to various levels of quality; that, in particular, the minimax rule does not tell him at all how much he ought to spend on the sampling program, though it may say how any given amount can best be employed. The manufacturer has a real difficulty, though he expresses it inaccurately. He forgets that the lack of knowledge that gives rise to the decision problem involves not only the state of his product, but also the state of the public; taking the state of the public into account, there is no real difficulty in writing down the income function.
But, if it is not practical for the manufacturer to make observations bearing on the state of the public as well as those bearing on the state of the product, the minimax rule is not a practical solution to his problem; for, rigorously applied, it would remove him from the paper-napkin business. I believe that in practice the personalistic method often is, and must be, used to deal with the unknown state of the public, while objectivistic methods, particularly the minimax principle, are now increasingly often used to deal with the state of the product, a sort of dualism having some parallel in almost all serious applications of statistics. This is not to deny that relatively objectivistic methods of market research can sometimes be used, nor that there are personalistic elements aside from those concerning the state of the public in much of even the most advanced quality control practice.
4 Almost sub-minimax acts
Another sort of objection to the objectivistic minimax rule is illustrated by the following example attributed to Herman Rubin and published by Hodges and Lehmann [H5]. An integer-valued random variable x subject to the binomial distribution

(1)  $P(x \mid p) = \binom{n}{x} p^x (1 - p)^{n - x}$

is observed by a person who knows n but not p. His decision problem is to decide on a function $\hat p$ of x subject to the loss function:

(2)  $L(\hat p; p) = E((\hat p - p)^2 \mid p) = \sum_x (\hat p(x) - p)^2 \binom{n}{x} p^x (1 - p)^{n - x}.$

In other terms, he must estimate p on the basis of an observation of x and subject to a loss equal to the square of his error. The traditional estimate of p is defined by $\hat p_0(x) = x/n$. This estimate has many virtues; it is the maximum-likelihood estimate, the only unbiased estimate, and (as is shown in [G1]) the only minimax estimate for a somewhat different problem from that posed by (2). But for (2) the unique minimax is (as is shown in [H6]) defined by

(3)  $\hat p_1(x) = \frac{\sqrt{n}}{1 + \sqrt{n}}\, \hat p_0(x) + \frac{1}{2(1 + \sqrt{n})}.$

It is straightforward to verify for every p,

(4)  $L(\hat p_0; p) = \frac{p(1 - p)}{n}$

and

(5)  $L(\hat p_1; p) = \frac{1}{4(1 + \sqrt{n})^2},$

which constant is, therefore, L*. The ratio of the first of these functions to the second is

(6)  $4p(1 - p)(1 + n^{-1/2})^2,$

the maximum of which occurs at p = 1/2 and is

(7)  $(1 + n^{-1/2})^2.$

Thus, for large n, the maximum loss of $\hat p_0$ is larger than L* by only a slight fraction. Moreover, the loss of $\hat p_0$ is less than L* except when p lies in the interval where

(8)  $4p(1 - p)(1 + n^{-1/2})^2 > 1,$

that is, where

(9)  $|p - \tfrac{1}{2}| < \tfrac{1}{2}\{1 - (1 + n^{-1/2})^{-2}\}^{1/2} \simeq (4n)^{-1/4}.$
To take a numerical example, consider $n = 10^5$ (which the practical will note is rather big for a sample). The advantage of $\hat p_1$ over $\hat p_0$ at 1/2 is then only 0.64%, and, once p departs by as much as 0.04 from 1/2 in either direction, the advantage is with $\hat p_0$. It amounts, for example, to 3.5%, 15.5%, ∞% in favor of $\hat p_0$, when p is 0.6, 0.8, 1.0, respectively.

Many agree that in such an example good judgment will, under ordinary circumstances, prefer $\hat p_0$ to the recommendation of the minimax rule, $\hat p_1$. To my mind, this example constitutes a valid objection against the minimax rule, in the sense that it demonstrates once more that, whatever value that rule may have, it is at best a rule of thumb.

The example is a good illustration of the role of personal probability in ordinary statistical thinking, for the source of the dissatisfaction a person would ordinarily feel for $\hat p_1$ as opposed to $\hat p_0$ stems from the fact that he would not ordinarily attach enough personal probability to the immediate neighborhood of p = 1/2 to justify preference for $\hat p_1$. It follows from the numbers given above, for example, that, if the person attaches a probability of less than 0.84 to the interval [0.4, 0.6], he will prefer $\hat p_0$ to $\hat p_1$; the same conclusion can be derived from the supposition that the standard deviation of the personal distribution of p is at least 0.04. Of course, situations can be imagined in which the personal probabilities would be so concentrated about 1/2 as to justify preference for $\hat p_1$; the point of the example is only that there are situations in which that would clearly not be the case.

Interesting material and important references bearing on the phenomenon illustrated by the decision problem under discussion are given by Wolfowitz in [W17]. It seems to be suggested there that the difficulty can be met by postulating some small amount ε by which the person does not mind having his income decreased. Taken literally, this postulate implies on repeated application that all incomes are equivalent for the person, but Wolfowitz makes it clear that he does not mean to propose the postulate in a sense that allows repeated applications. The idea is reminiscent of those theories of probability that permit the neglect of an occasional improbable event (mentioned in the last paragraph of § 4.4) and seems to me open to an objection similar to the one raised in connection with them. In particular, the choice of the ε would be not only personal, but ill defined as well.
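
The two loss functions compared in this section can be computed directly from the binomial distribution. The sketch below is an illustration under assumptions of my own: n = 16 is chosen only because it is a perfect square, so that the minimax estimate, taken here in the standard form $(x + \sqrt{n}/2)/(n + \sqrt{n})$, stays rational and exact fractions can be used; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb, isqrt


def risk(estimate, n, p):
    """Expected squared error of an estimate x -> p_hat(x) when x is Binomial(n, p)."""
    return sum((estimate(x) - p) ** 2 * comb(n, x) * p ** x * (1 - p) ** (n - x)
               for x in range(n + 1))


n = 16
root = isqrt(n)  # exact square root, since n is a perfect square


def p0(x):
    """Traditional estimate x/n."""
    return Fraction(x, n)


def p1(x):
    """Minimax estimate (x + sqrt(n)/2) / (n + sqrt(n))."""
    return Fraction(2 * x + root, 2 * (n + root))


def ratio(n, p):
    """Loss of x/n divided by the constant loss of the minimax estimate."""
    return 4 * p * (1 - p) * (1 + n ** -0.5) ** 2


ps = [Fraction(k, 10) for k in range(11)]
risks_p0 = [risk(p0, n, p) for p in ps]
risks_p1 = [risk(p1, n, p) for p in ps]
```

Evaluating `ratio(10**5, p)` for a few values of p then shows how narrow the region around 1/2 is in which the minimax estimate has the smaller loss; no particular percentages are asserted here beyond what the text itself reports.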
5 The minimax rule does not generate a simple ordering
Finally, an objection made by Chernoff [C7] to the objectivistic minimax theory must be discussed. This will entail statement and illustration of the phenomenon on which the objection is based, and statement and analysis of the objection itself.

The phenomenon pertains to the relation between two objectivistic decision problems, to be called for the moment the narrow and the wide problems. The narrow problem is determined by certain primary acts $f_r$; and the wide one is determined by those primary acts and one more, say $f_0$. In other words, the wide problem presents the person with one more choice than the narrow. Calling the two income functions I(f; i) and $I_0(f; i)$, it is to be understood, of course, that $I(f; i) = I_0(f; i)$ for any f that does not use, that is, give positive weight to, $f_0$. The corresponding equation does not necessarily obtain for the loss functions; indeed it clearly does so, if and only if the maximum of $I_0(f; i)$ in f can be attained for each i without using $f_0$. Even in case no minimax of the wide game uses $f_0$, it is therefore to be expected that the minimax f's of the wide game will be different from those of the narrow game. In fact, it can happen that no minimax of the wide game uses either $f_0$ or any $f_r$ used by a minimax of the narrow game; this is the phenomenon to be discussed in this section.

To see how the phenomenon can occur, suppose that Figure 12.4.1 represents the loss function of the narrow problem; and consider what the corresponding figure is for the wide problem, supposing that $f_0$ is such that

(1)  $\Delta =_{\mathrm{Df}} I(f_0; 2) - \max_r I(f_r; 2) > 0, \quad \Sigma =_{\mathrm{Df}} \max_r I(f_r; 1) - I(f_0; 1) > 0.$
It is clear that Δ and Σ can attain any positive values, irrespective of the structure of the narrow problem. The figure for the wide problem is constructed thus: The graph corresponding to each $f_r$ is left fixed at its right end and raised by the amount Δ at its left, and $f_0$ is represented by a line sloping up with slope Σ from the lower left-hand corner. It is easy to see that the raising of the left ends of the graphs of the $f_r$'s can make any $f_r$ with a positive slope horizontal. If, further, such an $f_r$ minimizes L(f; g) for some g, it can be made a minimax by choosing Σ sufficiently large. Thus, speaking specifically of Figure 12.4.1, the $f_r$ corresponding to the left segment of the heavy concave graph, which is not used in the minimax of the narrow problem, can become the unique minimax.

Figure 12.4.1 is a little special in that the heavy concave graph has only one vertex to the left of the maximin of the narrow problem. If there were more than one, the phenomenon could also be exhibited by making the second vertex to the left the unique maximin, which would occur for all Δ's and Σ's in a certain range. Thus the phenomenon occurs not only for isolated values of Δ and Σ but typically for whole domains of values.

Suppose, to take a striking case, that one $f_r$, say $f_{r'}$, is the unique minimax for the narrow problem and a different one, $f_{r''}$, is the unique minimax for the wide problem. It is absurd, as Chernoff says in effect,
'FI
to recommend as the best act among the f/s when only the f,.'8 are available and then to recommend f,." as the beet for an even wider class of posaibilitiee. Fancy saying to the butcher, "Seeing that you have geese, I'n take a duck instead of a chicken or a bam. U It is absurd, then, to contend that the objectivistic iidnimax rule aelecta the beat available act. But that is not 80 devaatatiDg to the rule as might at first appear, for it is not contended by anyone known to me that the rule does select the best. On the contrary, the rule is invoked only as a sometimes practical rule of thumb in contexte where the concept of "best" is impractical-impractical for the objectivist, where it amounta to the cODcept of personal probability, in which he does not believe at all; aDd for the personaJist, where the diflieulty of vaguenef.8 becomes overwhelming. To have a conaiateDt eoDcept of "best," that is, to have a mode of decision that doea not exhibit the phenomenon, amount8, as Chernoff hjmee1f points out, to the eatabliabment of a simple ordering of preference amoDg acta. In 80 far .. that can be done consistently with the sute-thing principle, personal probability is practically defined thereby. If the 8UJ'eootb jng principle is violated, the ordering is absurd 88 an expreaaiOD of pmerence. For example, the rule of minimizing the maximum of the negative of iDoome
does not exhibit the phenomenon. It amounts to considering $f \le f'$, if and only if
(1)    $\min_i I(f; i) \le \min_i I(f'; i).$
This establishes a simple ordering, but one that violates the sure-thing principle by violating P2.
The phenomenon has a particularly natural interpretation for the group minimax rule. It would not be strange, for example, if a banquet committee about to agree to buy chicken should, on being informed that goose is also available, finally compromise on duck.
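The butcher-shop phenomenon can be reproduced in a deliberately simplified setting. The following is only a sketch under stated assumptions: it is restricted to pure acts (the discussion above concerns mixed acts as well), the two-state income figures are invented for illustration, and the names `minimax_regret`, `narrow`, and `wide` are hypothetical.

```python
def minimax_regret(incomes):
    """Return (chosen act, {act: maximum loss}) among pure acts only.

    incomes maps each act to a tuple of incomes, one per state (or person) i.
    The loss of an act in state i is the best income available in i minus its own.
    """
    acts = list(incomes)
    n_states = len(incomes[acts[0]])
    best = [max(incomes[a][s] for a in acts) for s in range(n_states)]
    worst_loss = {
        a: max(best[s] - incomes[a][s] for s in range(n_states)) for a in acts
    }
    choice = min(acts, key=lambda a: worst_loss[a])
    return choice, worst_loss


# Invented incomes in two states; "goose" is the act added in the wide problem.
narrow = {"chicken": (5, 2), "duck": (2, 4)}
wide = dict(narrow, goose=(0, 8))

print(minimax_regret(narrow))
print(minimax_regret(wide))
```

With these figures the narrow problem selects chicken, while merely adding goose to the menu shifts the selection to duck, because the new act raises the best attainable income in the second state and so changes every loss there.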
CHAPTER
14
The Minimax Theory Applied to Observations
1 Introduction
In this chapter the concept of observation is re-explored from the point of view of the minimax rule. In principle, objectivistic and group minimax problems should here be treated on an equal footing. But, since mathematically the two theories are identical, it seems wisest to focus on one, interjecting occasional digressions about the other. I have chosen to focus on the objectivistic problems. That choice, being in accordance with other literature on the minimax rule, will facilitate the reader's further study of the subject, and it also renders more obvious the intimate connection between the minimax rules and the theory of partition problems presented in Chapter 7. The present chapter can indeed be regarded largely as a paraphrase of Chapter 7, so there will unavoidably be many references to the notations and conclusions of that chapter.
2 Recapitulation of partition problems
Paralleling the treatment of observation in Chapters 6 and 7, an objectivistic observational problem will be roughly defined to consist of an objectivistic problem, regarded as basic; an observation; and a second objectivistic problem, derived from the basic one and the observation.
More explicitly, the basic problem may be any objectivistic problem. It will be characterized by the values of $E(f \mid B_i)$, where $f$ ranges over a set of acts $F$ subject to the conditions laid down in § 9.3, and $B_i$ is a partition. The observation is a random variable $x$ (confined, as usual in this book, to a finite set of values), subject to the conditional distributions $P(x \mid B_i)$, and so articulated with $F$ that $E(f \mid B_i, x) = E(f \mid B_i)$ for every $x$ such that $P(x \mid B_i) > 0$. The last condition is (7.2.7); as mentioned in connection with that equation, the condition will in particular be met, if every $f$ is constant on every $B_i$, a specialization costing but little in real generality.
The derived problem (paralleling § 6.2) consists of $F(x)$, the set of all functions assigning elements $f$ of the basic acts $F$ to values $x$ of the observation. The values of $E(f(x) \mid B_i)$ for $f(x) \in F(x)$ are computable from the $E(f \mid B_i)$ and the $P(x \mid B_i)$ thus:
(1)
$$\begin{aligned}
E(f(x) \mid B_i) &= E(E(f(x) \mid B_i, x) \mid B_i) \\
&= \sum_x E(f(x) \mid B_i, x) P(x \mid B_i) \\
&= \sum_x E(f(x) \mid B_i) P(x \mid B_i).
\end{aligned}$$
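As a minimal sketch of how (1) is used, the expected income of a derived act can be computed directly from the two tables it mentions. The numbers and the names `E`, `P`, `derived_income`, and `rule` are illustrative assumptions, not taken from the text; a derived act is represented by a dictionary saying which basic act $f_r$ is adopted for each observed value $x$.

```python
from fractions import Fraction as Fr

# Hypothetical basic problem: E[r][i] = E(f_r | B_i) for two basic acts and two B_i.
E = {1: {1: Fr(1), 2: Fr(0)},
     2: {1: Fr(0), 2: Fr(1)}}

# Hypothetical observation: P[i][x] = P(x | B_i).
P = {1: {1: Fr(1, 2), 2: Fr(1, 2)},
     2: {1: Fr(1, 4), 2: Fr(3, 4)}}


def derived_income(rule, i):
    """Equation (1): E(f(x) | B_i) = sum_x E(f_{rule[x]} | B_i) P(x | B_i)."""
    return sum(E[rule[x]][i] * P[i][x] for x in P[i])


rule = {1: 1, 2: 2}  # adopt f_1 on observing x = 1 and f_2 on observing x = 2
for i in (1, 2):
    print(i, derived_income(rule, i))
```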
It will now be shown that the set of derived acts $F(x)$ satisfies the technical conditions imposed on the set of basic acts $F$, so that the derived problem is also an objectivistic decision problem. In fact, if every $f \in F$ is expressible in the form $\sum f(r) f_r$ (with the usual condition on the $f(r)$'s), primary acts for $F(x)$ analogous to the $f_r$'s can be defined by attaching to every function $\mathbf{r} = r(x)$ an element $f(x; \mathbf{r})$ of $F(x)$, where
(2)    $f(x; \mathbf{r}) =_{Df} f_{r(x)}.$
There are only a finite number of $f(x; \mathbf{r})$'s, and all elements of $F(x)$ are expressible as weighted averages of them; the first assertion is obvious, and the second poses the problem of finding, for any system of probability measures $\phi(r; x)$ on the $r$'s, at least one probability measure on the set of functions $\mathbf{r}$ with respect to which $P(r(x) = r) = \phi(r; x)$ for every $r$ and $x$. The problem typically has many solutions; the simplest is to let the $r(x)$'s, regarded for each $x$ as functions of $\mathbf{r}$, be independent random variables on the set of $\mathbf{r}$'s considered as a probability space, that is, to set $P(\mathbf{r}) = \prod_x \phi(r(x); x)$. Formally, this particular solution leads to the identity
(3)
$$f(x) = \sum_r \phi(r; x) f_r = \sum_{\mathbf{r}} \Big\{ \prod_{x'} \phi(r(x'); x') \Big\} f_{r(x)}.$$
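The product construction behind (3) can be checked numerically on a small case. The sketch below assumes two primary acts and three observable values, with invented probabilities `phi`; the names `product_weights` and `marginal` are hypothetical. It builds the weight of every function from observed values to act indices and confirms that the weights add up to 1 and reproduce each $\phi(r; x)$ as a marginal.

```python
from fractions import Fraction as Fr
from itertools import product

R = (1, 2)      # indices r of the primary basic acts f_r (hypothetical)
X = (1, 2, 3)   # values x of the observation (hypothetical)

# phi[x][r]: probability with which the mixed derived act uses f_r on seeing x.
phi = {1: {1: Fr(1, 3), 2: Fr(2, 3)},
       2: {1: Fr(1, 2), 2: Fr(1, 2)},
       3: {1: Fr(1), 2: Fr(0)}}


def product_weights():
    """Weight of each function r(.), stored as the tuple (r(1), r(2), r(3))."""
    weights = {}
    for values in product(R, repeat=len(X)):
        w = Fr(1)
        for x, r in zip(X, values):
            w *= phi[x][r]          # independence across the values of x
        weights[values] = w
    return weights


def marginal(weights, x0, r0):
    """Total weight of the functions r(.) for which r(x0) = r0."""
    k = X.index(x0)
    return sum(w for values, w in weights.items() if values[k] == r0)


weights = product_weights()
print(sum(weights.values()))
print(all(marginal(weights, x, r) == phi[x][r] for x in X for r in R))
```

Exact rational arithmetic is used so that the two checks are equalities rather than approximate comparisons.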
The identity and the fact that the coefficients in braces are non-negative and add up to 1, are easy to check analytically, if it is recognized that summation with respect to $\mathbf{r}$ means multiple summation with respect to $r(1), r(2), \ldots$ (the $x$'s being for definiteness supposed to take integral values). Equation (3) shows incidentally that it is immaterial whether it is before or after the observation that mixed acts are introduced.
Turn momentarily to the idea of observation in group decision problems. Here the $E(f \mid B_i)$'s are replaced by $I(f; i)$'s, the expected income of $f$ in the opinion of the $i$th person. There is no partition $B_i$, except in a special, though theoretically important, case, namely that of the $i$th person holding unequivocally that $B_i$ obtains. The $P(x \mid B_i)$'s are here replaced by $P(x; i)$, the personal distribution of $x$ for the $i$th person. It is postulated that, for each person, the conditional expectation of $f$ is unaffected by knowledge of $x$. The derived acts are formally the same as for an objectivistic decision problem, and the income function of the derived group decision problem is
(4)    $I(f(x); i) = \sum_x I(f(x); i) P(x; i).$
Returning to objectivistic problems, (9.4.1) defines the loss function of the basic objectivistic problem and, mutatis mutandis, that of the derived problem also, thus:
(5)    $L(f(x); i) = \max_{f'(x)} E(f'(x) \mid B_i) - E(f(x) \mid B_i).$
The right side of (5) admits some simplification, for, if the person knew which $B_i$ obtained, observation would be valueless to him. Accordingly,
(6)    $L(f(x); i) = \max_{f'} E(f' \mid B_i) - E(f(x) \mid B_i).$
Analytically, the simplification is justified thus:
(7)
$$\begin{aligned}
\max_f E(f \mid B_i) &\le \max_{f(x)} E(f(x) \mid B_i) \\
&= \max_{f(x)} \sum_x E(f(x) \mid B_i) P(x \mid B_i) \\
&\le \max_f E(f \mid B_i).
\end{aligned}$$
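The content of (6) and (7) can be confirmed by brute force on a toy problem: enumerating every derived act shows that none does better under a given $B_i$ than the best basic act. This is only a sketch; the tables `E` and `P` are invented, only pure derived acts are enumerated, and the helper names are hypothetical.

```python
from fractions import Fraction as Fr
from itertools import product

ACTS, XS, STATES = (1, 2), (1, 2), (1, 2)

E = {1: {1: Fr(3), 2: Fr(0)},          # E[r][i] = E(f_r | B_i), hypothetical
     2: {1: Fr(1), 2: Fr(2)}}
P = {1: {1: Fr(1, 2), 2: Fr(1, 2)},    # P[i][x] = P(x | B_i), hypothetical
     2: {1: Fr(1, 4), 2: Fr(3, 4)}}


def derived_income(rule, i):
    """E(f(x) | B_i) for the derived act that adopts f_{rule[x]} on seeing x."""
    return sum(E[rule[x]][i] * P[i][x] for x in XS)


def all_rules():
    """Every pure derived act, as a dict from observed value to act index."""
    return [dict(zip(XS, choice)) for choice in product(ACTS, repeat=len(XS))]


def loss(rule, i):
    """Equation (6): best basic income under B_i minus the derived act's income."""
    return max(E[r][i] for r in ACTS) - derived_income(rule, i)


for i in STATES:
    best_derived = max(derived_income(rule, i) for rule in all_rules())
    best_basic = max(E[r][i] for r in ACTS)
    print(i, best_derived, best_basic)
```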
In discussing application of the minimax rule to the basic and derived loss functions, it is doubly advantageous to introduce mixtures of the $i$'s, for thereby the theory of bilinear games presented in Chapter 12 and that of partition problems (with some reinterpretation) can both be brought to bear. Letting $\beta$ denote a generic system of weights $\beta(i)$, $\beta(i) \ge 0$ and $\sum \beta(i) = 1$, and using the notation of Chapter 7, the bilinear games associated with the primary and derived problems are, respectively,
(8)    $L(f; \beta) = l(\beta) - E(f \mid \beta),$
(9)
$$\begin{aligned}
L(f(x); \beta) &= l(\beta) - E(f(x) \mid \beta) \\
&= l(\beta) - \sum_i \sum_x E(f(x) \mid B_i) P(x \mid B_i) \beta(i) \\
&= l(\beta) - \sum_x E(f(x) \mid \beta, x) P(x \mid \beta).
\end{aligned}$$
If necessary, (9) can be interpreted and verified by comparison with (7.3.7) and (7.2.8), in that order.
In Chapter 7, $\beta(i)$ was generally required not only to be non-negative, but also strictly positive; on examination, this slight difference from the present context will be found innocuous. Again, in Chapter 7, the statement and derivation of conclusions were, for simplicity, nominally confined to twofold partition problems. Here the extension of those conclusions to n-fold problems will be freely used, though some readers may prefer here, as there, to focus on twofold problems.
Letting $L^*$ denote the minimax (and maximin) value of the basic, and $L^*(x)$ that of the derived problem, it is obvious, since $F(x) \supset F$, that $L^*(x) \le L^*$; but there is some interest in viewing this inequality as a consequence of (7.3.4):
(10)
$$\begin{aligned}
L^*(x) &= \max_\beta \min_{f(x)} L(f(x); \beta) \\
&= \max_\beta \, [l(\beta) - v(F(x) \mid \beta)] \\
&\le \max_\beta \, [l(\beta) - v(F \mid \beta)] \\
&= \max_\beta \min_f L(f; \beta) = L^*.
\end{aligned}$$
It is clear that the maximin $\beta$'s for the basic and derived problems are the $\beta$'s that maximize the concave functions
(11)    $h(\beta) =_{Df} l(\beta) - v(F \mid \beta) = l(\beta) - k(\beta)$
and
(12)    $h(\beta; x) =_{Df} l(\beta) - v(F(x) \mid \beta) = l(\beta) - E(k(\beta(x)) \mid \beta),$
respectively. The search for minimax $f(x)$'s, for example, is greatly narrowed by the consideration that, if $f(x)$ is minimax, $E(f(x) \mid \beta) = v(F(x) \mid \beta)$ for some $\beta$, indeed for every maximin $\beta$.
According to § 7.3, equality obtains in (10), if and only if there is a maximin $\beta_0$ of the basic problem such that
(13)
$$\beta_0(x) =_{Df} \left\{ \frac{P(x \mid B_i) \beta_0(i)}{\sum_j P(x \mid B_j) \beta_0(j)} \right\}$$
is also a maximin of the basic problem for every $x$ such that $\sum_j P(x \mid B_j) \beta_0(j) > 0$.
The most typical possibility, and the only one to be explored here, is that the basic problem has a unique maximin $\beta_0$ with $\beta_0(j) > 0$ for all $j$. Under this assumption, $L^*(x) = L^*$, if and only if $x$ is utterly irrelevant, as is easily shown. In the same spirit, as can easily be shown, $L^*(x) = 0$, if $x$ is definitive, but not typically otherwise; and, if $x$ extends $y$, then $L^*(x) \le L^*(y)$ with equality if, and typically only if, $y$ is sufficient for $x$.
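The system of weights in (13) is the familiar posterior form of $\beta_0$, and the role of irrelevance can be seen by computing it. The sketch below uses invented likelihood tables and a hypothetical helper `posterior`; it shows that an informative observation moves the weights, whereas an utterly irrelevant one (the same distribution under every $B_i$) returns $\beta_0$ itself, which is what a unique maximin requires for equality in (10).

```python
from fractions import Fraction as Fr


def posterior(beta0, likelihood, x):
    """Equation (13): weights proportional to P(x | B_i) * beta0(i)."""
    joint = {i: likelihood[i][x] * beta0[i] for i in beta0}
    total = sum(joint.values())
    if total == 0:
        raise ValueError("x has probability zero under beta0")
    return {i: w / total for i, w in joint.items()}


beta0 = {1: Fr(1, 2), 2: Fr(1, 2)}

# Hypothetical informative observation: P[i][x] = P(x | B_i).
informative = {1: {1: Fr(1, 2), 2: Fr(1, 2)},
               2: {1: Fr(1, 4), 2: Fr(3, 4)}}

# Hypothetical utterly irrelevant observation: same distribution under both B_i.
irrelevant = {1: {1: Fr(1, 3), 2: Fr(2, 3)},
              2: {1: Fr(1, 3), 2: Fr(2, 3)}}

print(posterior(beta0, informative, 1))
print(posterior(beta0, irrelevant, 1))
```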
3 Sufficient statistics
Digressing from the minimax rule for a moment, something more fundamental can be said about a sufficient statistic $y$ of $x$. Namely, for every $f(x) \in F(x)$, there exists an $f(y) \in F(y)$ such that $I(f(y); i) = I(f(x); i)$ for every $i$. Indeed $f(y) = \sum_x f(x) P(x \mid y)$ defines such an act. Without appeal to so weak a step as the minimax rule, this remark demonstrates that even an objectivist loses nothing by exchanging knowledge of an observation for knowledge of a sufficient statistic of it. The remark might as well have been expressed in § 7.4, except that there it would have involved some circumlocution, mixed acts not yet having been introduced.
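A small numerical illustration of the remark, offered as a sketch under assumptions of my own choosing: the observation is a pair of independent Bernoulli trials whose success probability is fixed by $B_i$, so that the number of successes $y$ is sufficient. All tables and names (`P_SUCCESS`, `E`, `income_x`, `income_y`, `rule`) are hypothetical.

```python
from fractions import Fraction as Fr
from itertools import product

P_SUCCESS = {1: Fr(1, 4), 2: Fr(3, 4)}   # hypothetical: B_i fixes a Bernoulli p
E = {"a": {1: Fr(2), 2: Fr(0)},          # E[r][i] = E(f_r | B_i), hypothetical
     "b": {1: Fr(0), 2: Fr(1)}}
XS = list(product((0, 1), repeat=2))     # x = (x1, x2), two independent trials


def p_x(x, i):
    p = P_SUCCESS[i]
    return (p if x[0] else 1 - p) * (p if x[1] else 1 - p)


def p_y(y, i):
    return sum(p_x(x, i) for x in XS if sum(x) == y)


def p_x_given_y(x, y):
    """P(x | y); it does not involve i because y = x1 + x2 is sufficient."""
    same = [z for z in XS if sum(z) == y]
    return Fr(1, len(same)) if x in same else Fr(0)


def income_x(rule, i):
    """Income of the derived act f(x) that adopts f_{rule[x]} on seeing x."""
    return sum(E[rule[x]][i] * p_x(x, i) for x in XS)


def income_y(rule, i):
    """Income of f(y) = sum_x f(x) P(x | y), a mixed act depending on y only."""
    return sum(p_y(y, i) * p_x_given_y(x, y) * E[rule[x]][i]
               for y in (0, 1, 2) for x in XS)


rule = {(0, 0): "a", (0, 1): "a", (1, 0): "b", (1, 1): "b"}
for i in (1, 2):
    print(i, income_x(rule, i), income_y(rule, i))
```

The derived act `rule` deliberately depends on the order of the two trials, which $y$ discards; the mixed act based on $y$ alone nevertheless has the same income under each $B_i$.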
4 Simple dichotomy, an example
Much of what has been said thus far is well illustrated by the minimax counterpart of Exercise 7.5.2. The reader is accordingly asked to review that exercise and continue it thus:
Exercises
1. For the problem in question:
ACP) - 1,JJ(1)
(a>
(b) l(p; 1:)
¥(l) ).
+ 3.6(2) - ~ I ".rJlJ(2) - lar.6(l) I {~p(r I BJ)} 1t(2P(r. < rl·(fJ, *'0>1 B 1) + Per =- r·(fJ, 1Jo) I B 1)]#(I) + 'd2P(r. < rl·(JJ, flo) I B,) + P(r -= ,.~, ~) I B2)~(2).
- 1,JJ(1)
-
+ ' 1R(2) -1'16(2) -
2&. A ~ is maximin, if and only if r·(ft, (1)
~)
is such that
'sPe"l < "I-(ft, fJo) I B1) S ,IP(,., S "I-eft, ~o) I Bt )
and
(2)
'sP(r. S
"I -(fJ, 1Jo) I Bt )
~ &tP(rt
< 1', *(JJ, 60) 1B 2 ).
2b. There is typically only one maximin, but there may be a closed interval of them.
3. Though the acts of $F$ and $F(x)$ as defined by Exercise 7.5.2 do not provide for mixed acts, it will suffice to consider mixtures of the $f(x)$'s. Each of these will be determined by an $i$, and nothing will be lost by requiring $i$ to be of the form $i(r(x))$.
4a. Any minimax will be equivalent to a mixture of $f(x)$'s each corresponding to a likelihood-ratio test associated with $r^*(\beta, \beta_0)$ for every
muimin IJ. 4b. In view of Exercise 3, the only likelihood-ratio teats that need be cODlidered for a minimax ~ are: i(,.) - 1, jf and only if r, i(,.) - 1, if and only if'l
< rl*(lJt fJo}. < '1·(IJ, (0).
These are not necessarily different tests.
5a. If the maximin $\beta$ is unique, the minimax act is unique (except possibly for equivalent acts) and is a mixture of exactly two $f(x)$'s corresponding to the two likelihood-ratio tests defined in Exercise 4b.
This conclusion calls for some comment, for, in ordinary statistical practice, one or the other of the extreme likelihood-ratio tests is used, never a mixture. This practice is not in serious conflict with the minimax rule, because the maximum loss associated with either extreme is typically only slightly greater than $L^*(x)$. Moreover, vagueness about the exact magnitude of $l_1$ and $l_2$ would usually frustrate any attempt to calculate the coefficients of the mixture. Incidentally, mixture is not called for at all when $r$ is continuously distributed, for $h(\beta; x)$ is then smooth rather than polygonal; that is, if $P(r = r' \mid B_i) = 0$ for every $r'$ and both $i$'s, then $h(\beta; x)$ has a continuous first derivative in $\beta$. To show this and to show that the derivative is $l_1 P(r_1 \le r_1^* \mid B_1) - l_2 P(r_2 \le r_2^* \mid B_2)$ may be taken as an exercise only slightly beyond the usual mathematical level of this book.
5b. If there is more than one maximin $\beta$, then any one that is not extreme has only one likelihood-ratio test associated with it, and the same one for all. The $f(x)$ corresponding to that test is essentially the only minimax.
5 The approach to certainty +
In concluding the paraphrase of §§ 7.1-6 that has thus far been the subject of the present chapter, it should be mentioned that the approach to certainty studied in § 7.6 obviously implies that the corresponding $L^*(x(n))$ approaches zero with increasing $n$.
6 Cost of observation
A cost $c$ associated with an objectivistic observational problem diminishes the income by $E(c \mid B_i)$ for each $i$, regardless of $f$; that is, allowing for the cost, $I(f; i) = E(f - c \mid B_i)$. But the cost, being unavoidable, does not affect the loss function, so the minimax problem associated with the observation is independent of the cost.
The costs do intervene, however, in an essential way in the problem of deciding which to choose of several available observations, say $x_a$ at cost $c_a$; it is important to bear in mind in connection with this problem that a null observation at zero cost is typically among the choices available in real life.
The generic act of this compound problem can conveniently be symbolized by $\sum \lambda(a) f(x_a)$, or sometimes simply by $\lambda$. Here, of course, $\lambda(a) \ge 0$, $\sum \lambda(a) = 1$; for choice of $\lambda$ means choice, for each $a$, of the probability $\lambda(a)$ that the $a$th observation $x_a$ will be made and also choice of the derived act $f(x_a)$ to be adopted in case $x_a$ is made. It is intuitively evident, and follows easily from (1) below, that the mixture of several $\lambda$'s is also a $\lambda$ so far as income is concerned, so mixtures of $\lambda$'s do not require explicit consideration.
The income function can be written
(1)    $I(\lambda; i) = \sum_a \lambda(a) E(f(x_a) - c_a \mid B_i).$
Whence
(2)    $\max_\lambda I(\lambda; i) = \max_f E(f \mid B_i) - \min_a E(c_a \mid B_i).$
The loss function is accordingly
(3)    $L(\lambda; \beta) = \sum_a \lambda(a) \, [L_a(f(x_a); \beta) + d_a(\beta)],$
where
(4)    $d_a(\beta) =_{Df} \sum_i \, [E(c_a \mid B_i) - \min_{a'} E(c_{a'} \mid B_i)] \, \beta(i),$
and $L_a(f(x_a); \beta)$ is the loss function of the observational problem derived from the $a$th observation.
The compound minimax problem is intimately related to the concave functions $h(\beta; x_a)$ and the linear functions $d_a(\beta)$, as is explained by the following exercises.
+ Some recent references appropriate to this title are Blackwell and Dubins (1962), Chao (1970), Fabius (1964), and Freedman (1965).
Exercises
1. Show that
(5)    $h_\lambda(\beta) =_{Df} \min_\lambda L(\lambda; \beta) = \min_a \, [h(\beta; x_a) + d_a(\beta)].$
2. If $\lambda = 1 \cdot f'(x_{a'})$, then $L(\lambda; \beta) = h_\lambda(\beta)$, if and only if: first,
(6)    $L_{a'}(f'(x_{a'}); \beta) = h(\beta; x_{a'})$
(in which case $f'(x_{a'})$ will be called well adapted to $x_{a'}$ and $\beta$); and, second,
(7)    $h(\beta; x_{a'}) + d_{a'}(\beta) = \min_a \, [h(\beta; x_a) + d_a(\beta)]$
(in which case $x_{a'}$ will be called well adapted to $\beta$).
3a. Show that
(8)    $L_\lambda^* =_{Df} \min_\lambda \max_\beta L(\lambda; \beta) = \max_\beta h_\lambda(\beta) \le \min_a \max_\beta \, [h(\beta; x_a) + d_a(\beta)].$
3b. Under the important special condition that the $d_a(\beta)$ are equal to constants $d_a$, (8) specializes to
(9)    $L_\lambda^* \le \min_a \, [L^*(x_a) + d_a].$
3c. When can equality hold in (8) and (9)?
3d. $\beta'$ is maximin, if and only if $h_\lambda(\beta') = L_\lambda^*$.
4. A $\lambda = \sum \lambda(a) f(x_a)$ is minimax, if and only if:
(α) For every $a$ for which $\lambda(a) > 0$, $x_a$ is well adapted to every maximin $\beta$, and $f(x_a)$ is well adapted to $x_a$ and every maximin $\beta$.
(β) $L(\lambda; i) \le L_\lambda^*$ for every $i$.
(Of course (β) is alone necessary and sufficient; the point of the exercise is that the necessary condition (α) may conveniently confine the search for minimax $\lambda$'s to relatively few candidates.)
5. Suppose that: (α) $r$ and $i$ are confined to the values 1 and 2, and $L(f_r; i) = |r - i|$; (β) $x$ is confined to the values 1 and 2, and $P(1 \mid B_1) = 1/2$, $P(1 \mid B_2) = 1/4$; (γ) $a$ is confined to the values 1 and 2, and the $\lambda$'s of the compound problem attach weight $\lambda(1)$ to a basic act at zero cost and $\lambda(2)$ to an act derived from $x$ at a non-negative constant cost $d$. Compute and graph: $h(\beta)$, $h(\beta; x)$, and (for various values of $d$) $h_\lambda(\beta)$. Graph $L_\lambda^*$ as a function of $d$, and discuss the minimax $\lambda$'s for various values of $d$.
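The computational part of Exercise 5 can be sketched as follows. The data are exactly those stated in the exercise; everything else is my own framing. In particular, $\beta$ is parametrized by $b = \beta(1)$, the functions are evaluated in exact rational arithmetic, and the maximum over $\beta$ is taken over a finite grid, so for a general $d$ the value returned by the hypothetical helper `compound_value` is only an approximation to $L_\lambda^*$ (it is exact when the maximizing $b$ lies on the grid).

```python
from fractions import Fraction as Fr

# Exercise 5 data: L(f_r; i) = |r - i| and P[i][x] = P(x | B_i).
LOSS = {(r, i): abs(r - i) for r in (1, 2) for i in (1, 2)}
P = {1: {1: Fr(1, 2), 2: Fr(1, 2)},
     2: {1: Fr(1, 4), 2: Fr(3, 4)}}


def h_basic(b):
    """h(beta) = min_r L(f_r; beta), with beta = (b, 1 - b)."""
    beta = {1: b, 2: 1 - b}
    return min(sum(LOSS[r, i] * beta[i] for i in (1, 2)) for r in (1, 2))


def h_derived(b):
    """h(beta; x) = sum_x min_r sum_i L(f_r; i) P(x | B_i) beta(i)."""
    beta = {1: b, 2: 1 - b}
    return sum(
        min(sum(LOSS[r, i] * P[i][x] * beta[i] for i in (1, 2)) for r in (1, 2))
        for x in (1, 2)
    )


def h_compound(b, d):
    """Equation (5) with a null observation at cost 0 and x at constant cost d."""
    return min(h_basic(b), h_derived(b) + d)


def compound_value(d, n=6000):
    """Grid approximation to max over beta of h_compound, as in (8)."""
    return max(h_compound(Fr(k, n), d) for k in range(n + 1))


for d in (Fr(0), Fr(1, 16), Fr(1, 8), Fr(1, 4)):
    print(d, compound_value(d))
```

Plotting `h_basic`, `h_derived`, and `h_compound` against `b` gives the graphs the exercise asks for; the interpretation of the minimax $\lambda$'s is left, as in the text, to the reader.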
7 Sequential probability ratio procedures
The type of decision problem that in § 7.7 led to the concept of a sequential probability ratio procedure has an intimate counterpart in an important type of compound objectivistic decision problem, for which the concept was in fact originally developed by Wald [W2]. The $x_a$'s of a problem of this type range over the enormous variety of sequential observational programs associated with a sequence of (conditionally) identically distributed random variables $x(1), x(2), \ldots$. The technical assumption that the $a$'s have a finite range is not fulfilled; but, as in § 7.7, I proceed with some lapse of rigor, referring to Wald's book [W3] or [A7] for the full details.
Exercise 6.4 shows that attention may be confined to $a$'s that are well adapted to at least one $\beta$, and that for those $a$'s it may be confined to $f(x_a)$'s that are well adapted to $x_a$ and the corresponding $\beta$. The way is paved by § 7.7, which states sharply restrictive properties of the $x_a$'s and $f(x_a)$'s that are so adapted. In some cases, recognition of these properties contributes greatly to the possibility of actually computing minimax, or nearly minimax, procedures for sequential problems.
8 Randomization
Another important type of compound problem is illustrated by the second example of § 9.6. A generalization of part of that example is presented here to show how the minimax rule explains, or implies, the process called randomization, which is one of the most striking features of modern statistics, and one long antedating the minimax rule. Randomization represents the only important use of mixed acts that has thus far found favor with practicing statisticians, as will be discussed in the next section.
The exact meaning of randomization seems a little elusive; no sharp definition is attempted here. But, roughly, randomization is the selection of an observation at random; that is, of a $\lambda$ with more than one $\lambda(a)$ actually positive, the choice of the $\lambda(a)$'s and of the derived acts being governed largely by symmetry. The following example provides at least a fairly general illustration of the concept. To set the stage and provide motivation for a formal statement, the example will first be stated in language that is suggestive though a little vague.
The consequences of the basic acts in the example depend on the composition of a population of $n$ objects, which may be thought of as numbered from 1 through $n$. It may be known of some compositions that they cannot occur; but, if a composition is considered possible, all populations having that composition (irrespective of ordering) are also considered possible. Each observation in the compound problem consists in the cost-free observation of some $m$ of the objects, every subset of exactly $m$ objects being available for observation.
Formally, the index $i$ of the partition $B_i$ runs over a certain set $I$ of n-tuples, $\{i_1, \ldots, i_n\}$, of elements considered for definiteness to be integers. If $i = \{i_1, \ldots, i_n\} \in I$, then any permutation $Ti$ of $i$ is also in $I$. It is assumed that
(1)
for every $f \in F$, $i \in I$, and permutation $T$. To every subset $A$ of $m$ integers, $1 \le a_1(A) < a_2(A) < \cdots < a_m(A) \le n$, there corresponds an observation $x(A)$ the possible values of which are m-tuples $\{x_1(A), \ldots, x_m(A)\}$. The conditional
(a) Letting $\lambda(A)$ be independent of $A$, and therefore equal to $\binom{n}{m}^{-1}$ for every $A$; that is, letting every sample of size $m$ have the same probability of being chosen, or randomizing, as it is said.
(b) Letting $f(x_1(A), \ldots, x_m(A))$ be symmetric in its $m$ arguments and independent of $A$.
It can in fact be shown, by the method illustrated in the second example of § 9.6 and discussed more generally in § 12.5, that there is at least one minimax satisfying (a) and (b), and even that there is an admissible one. Typically, if $m$ is large, but small compared to $n$, $L_\lambda^*$ is much smaller than the common value of the $L^*(x(A))$'s.
The importance of randomization in applied statistics can scarcely be exaggerated. From the personalistic viewpoint it is one of the most important ways to bring groups of people into virtual unanimity; from the objectivistic viewpoint it not only makes possible great reductions in maximum loss, but it is seen as an invention by which the theory of probability is brought to bear on situations to which probability on first (objectivistic) sight would seem irrelevant.+
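The reduction in maximum loss that randomization makes possible can be seen in a toy version of the example. The following sketch rests on assumptions that are mine, not the text's: the population consists of $n = 6$ objects each marked 0 or 1, every composition is considered possible, $m = 3$ objects are observed, the act is to estimate the population mean by the sample mean (which is symmetric in its arguments and independent of the subset, as in (b)), and the loss is squared error. The sample mean is not claimed to be minimax; the point is only the comparison between a fixed subset and an equiprobable one, as in (a).

```python
from fractions import Fraction as Fr
from itertools import combinations, product

N, M = 6, 3   # hypothetical population size and sample size


def squared_error(population, sample):
    """Loss of estimating the population mean by the mean of the observed objects."""
    estimate = Fr(sum(population[k] for k in sample), len(sample))
    truth = Fr(sum(population), len(population))
    return (estimate - truth) ** 2


def max_loss_fixed(sample):
    """Worst case over all 0/1 populations when the same subset is always observed."""
    return max(squared_error(pop, sample) for pop in product((0, 1), repeat=N))


def max_loss_randomized():
    """Worst case of the expected loss when every subset of size M is equally likely."""
    subsets = list(combinations(range(N), M))
    return max(
        sum(squared_error(pop, s) for s in subsets) / len(subsets)
        for pop in product((0, 1), repeat=N)
    )


print(max_loss_fixed((0, 1, 2)))
print(max_loss_randomized())
```

Against a fixed subset, the worst population is one whose observed objects are all unlike the unobserved ones; under randomization no ordering of a given composition is worse than any other, and the maximum expected loss falls accordingly.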
+ I would express myself very differently today (Savage 1962, pp. 33-34).
9 Mixed acts in statistics
Many have commented that modern applied statistics makes one, but only one, important use of mixed acts, namely in deciding, through the process of randomization, what to observe. Thus, for example, once the observation has been made, the derived act is in practice almost always chosen, without mixing, from a set of basic acts natural to the problem. This might seem to imply a sharp conflict between the minimax rule and ordinary statistical practice; but actually it reflects agreement, for mixed acts greatly reduce the minimax loss in decision-problem interpretations of typical practical statistical situations, when and only when ordinary practice calls for mixed acts of the same sort, namely when randomization is called for.
There are certain mechanisms that systematically tend to make mixed acts have relatively little, or even absolutely no, advantage over unmixed acts. In the following discussion of these mechanisms, let $L(r; i)$ be the abstract game on which a bilinear game $L(f; \beta)$ is based.
In the first place, supposing that $L(r; i)$ is non-negative for every $r$ and $i$ (as is appropriate to the context now at hand), (12.3.6) can be completed, so to speak, thus:
(1)    $L^* \min(R, I) \ge \min_r \max_i L(r; i),$
where $R$ and $I$ denote for the moment the number of values of $r$ and $i$, respectively, and $\min(R, I)$ is of course the minimum of the two integers $R$ and $I$. An inequality stronger than (1) will actually be proved. Consider a minimax $f$ for which the smallest possible number $R'$ of the $f(r)$'s are actually positive:
(2)    $R' L^* = \max_i R' \sum_r L(r; i) f(r) \ge \max_i L(r'; i) \ge \min_r \max_i L(r; i),$
where $r'$ is so chosen that $R' f(r') \ge 1$, as can obviously be done. It is known [B19] that $R' \le \min(R, I)$. The important lesson of (1) is that, unless $R$ and $I$ are both large, the introduction of mixed acts cannot reduce the minimax loss to a very small fraction of the value it would otherwise have.
To mention a different mechanism, Figure 12.4.1 suggests that, if there are many $r$'s, the corners of the concave function emphasized in that figure may well be very blunt, in which case a minimax mixed act has almost as high a maximum loss as any one of its components. When the number of $r$'s is infinite, the concave function may well be differentiable, in which case mixed acts have absolutely no advantage. The remark appended to Exercise 4.5a is pertinent here. This mechanism can be related to a certain large class of infinite abstract (i.e., not necessarily bilinear) games, discovered by Kakutani [K1], for which $L^* = L_*$. Bilinear games are but a special case of these, and numerous others seem to arise frequently in applications.
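Inequality (1) can be illustrated on a family of games whose value is known without solving a linear program. The sketch assumes a hypothetical $n \times n$ abstract game with loss 0 when $r = i$ and 1 otherwise; for it, the uniform mixture of acts gives an upper bound on $L^*$ and the uniform $\beta$ gives a lower bound, and the two coincide. The helper names are my own.

```python
from fractions import Fraction as Fr


def game(n):
    """Hypothetical abstract game: L(r; i) = 0 if r == i, else 1."""
    return [[0 if r == i else 1 for i in range(n)] for r in range(n)]


def pure_minimax(L):
    """min over r of max over i of L(r; i), the value without mixed acts."""
    return min(max(row) for row in L)


def uniform_bounds(L):
    """(lower, upper) bounds on L* from the uniform beta and the uniform f."""
    n_r, n_i = len(L), len(L[0])
    upper = max(Fr(sum(L[r][i] for r in range(n_r)), n_r) for i in range(n_i))
    lower = min(Fr(sum(L[r][i] for i in range(n_i)), n_i) for r in range(n_r))
    return lower, upper


for n in range(2, 6):
    L = game(n)
    lower, upper = uniform_bounds(L)
    # When lower == upper, that common value is L*.
    print(n, lower, upper, upper * min(len(L), len(L[0])), pure_minimax(L))
```

In this family the bounds meet, so $L^* = (n - 1)/n$ and $L^* \min(R, I) = n - 1$, which is at least the pure-act value 1, with equality only for $n = 2$.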
If $L^* = L_*$ for an abstract game, nothing at all can be gained by adjoining mixed acts, as (12.3.5) shows.
Finally, it may be mentioned that in many cases where an observation $x$ might be followed by a mixed derived act, the same, or nearly the same, consequences can often be realized by a pure act. Speaking a little loosely, this occurs whenever $x$ has a continuous or nearly continuous contraction $y$ that is irrelevant, or nearly irrelevant, for then $y$ can play the role in selecting a basic derived act that would otherwise be assigned to a table of random numbers. If, for example, $x$ is continuous, $y(x)$ can be taken as the last few digits in the decimal expansion of $x$ to an extravagant number of places. Again if, conditionally, $x = \{x_1, \ldots, x_n\}$ is an n-tuple of continuously, identically, and independently distributed real random variables, $y(x)$ may be taken as the permutation that ranks the $x$'s in ascending order, provided that $n!$ is fairly large: 10! should satisfy almost any need. A recent technical reference on the superfluousness of mixed acts in the presence of continuous observations is [D13].
I have occasionally heard it conjectured that any mixed act made after the observation (in an observational decision problem) is inappropriate in principle. I would argue that the conjecture is mistaken thus: Any observational problem that calls for randomization can be simulated, so far as its loss function $L(r; i)$ is concerned, by a basic problem. A mixed act will be as appropriate to the basic problem as it was to the observational problem from which the basic one was derived. In this way a great variety of situations calling for mixed acts having nothing to do with choice of observation can be constructed, though they seem to be atypical in practice. Moreover, any basic problem can obviously occur as the decision problem remaining after some particular value $x$ of an observation has been observed, so the situations just constructed lead to closely related ones calling for mixed acts after observation. Less abstractly, consider a person choosing from a tray of assorted French pastries. Even after extensive visual observation and interrogation of the waiter, the person might justifiably introduce considerable mixture into his choice.
I think that the conjecture that mixed acts are necessarily inappropriate after observations stems partly from the mechanisms that do tend to make such acts inappropriate or unimportant in many typical cases and partly from justifiable dissatisfaction with specific mixed acts that have from time to time been suggested by statisticians. For example, the suggestion that ties in rank arising in non-parametric tests be removed by ranking the tied observations at random may in many, or perhaps all, cases fairly be regarded with suspicion.
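The device mentioned earlier in this section, of letting the permutation that ranks a continuous sample stand in for a table of random numbers, can be sketched as follows. The encoding of the permutation as an integer (a Lehmer-code index) and the helper names are my own illustrative choices; note that reducing the index modulo the number of options selects them with exactly equal frequency only when that number divides $n!$.

```python
import math


def rank_permutation(xs):
    """Indices of xs arranged so that the corresponding values ascend."""
    return tuple(sorted(range(len(xs)), key=lambda k: xs[k]))


def permutation_index(perm):
    """Lehmer-code index of a permutation of 0..n-1, an integer in range(n!)."""
    n = len(perm)
    index = 0
    for pos in range(n):
        smaller = sum(1 for later in perm[pos + 1:] if later < perm[pos])
        index += smaller * math.factorial(n - 1 - pos)
    return index


def choose(xs, options):
    """Select one of the options using only the ranking of the observations."""
    return options[permutation_index(rank_permutation(xs)) % len(options)]


sample = [0.3, 0.1, 0.2]   # hypothetical continuous observations without ties
print(rank_permutation(sample), choose(sample, ["f1", "f2"]))
```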
CHAPTER
15
Point Estimation
1 Introduction
This chapter discusses point estimation, and the next two discuss the testing of hypotheses and interval estimation, respectively. Definitions of these processes must be sought in due course; but, for the moment, whatever notions about them you happen to have will afford sufficient background for certain introductory remarks applying equally well to both kinds of estimation and to testing.
Estimating and testing have been, and inertia alone would insure that they will long continue to be, cornerstones of practical statistics. Their development has until recently been almost exclusively in the verbalistic tradition, or outlook. For example, testing and interval estimation have often been expressed as problems of making assertions, on the basis of evidence, according to systems that lead, with high probability, to true assertions, and point estimation has even been decried as ill-conceived because it is not so expressible. Wald's minimax theory has, as was explained in § 9.2, stimulated interest in the interpretation of problems of estimation and testing in behavioralistic terms; to objectivists this has, of course, meant interpretation as objectivistic decision problems. For reasons discussed in § 9.2, it does seem to me that any verbalistic concept in statistics owes whatever value it may have to the possibility of one or more behavioralistic interpretations.
The task of any such interpretation from one framework of ideas to another is necessarily delicate. In the present instance, there is a particular temptation to force the interpretation, namely, so that criteria proposed by the verbalistic outlook are translated into applications of the minimax theory, that is, of the minimax rule and the sure-thing principle (as expressed by the criterion of admissibility), for these are the only general criteria thus far proposed and seriously maintained for the solution of objectivistic decision problems. Of course it is to be expected, and I hope later sections of this chapter and the next demonstrate, that unforced interpretations do often translate verbalistic criteria into applications of the behavioralistic ones. In evaluating any such interpretations, it must be borne in mind that an analogy of great mathematical value may be valueless as an interpretation; correspondingly, what is put forward as mere analogy should not be taken to be an interpretation, much less branded as a forced one. For example, attention has already been called (in § 11.4) to the danger of regarding the analogy between the theory of two-person games and that of the minimax rule for objectivistic decision problems as an interpretation. In fact, minimax problems are of such mathematical generality that they arise, even within statistics, in contexts other than direct application of the minimax rule to objectivistic decision problems; a striking, though technical, example is Theorem 2.26 of Wald's book [W3].
The literature of estimation and testing is vast; indeed it has, I think, been seriously contended that statistics treats of no other subjects. This chapter and the next two cannot, therefore, pretend to present a complete digest of that literature, even so far as it pertains to the foundations of statistics. For further reading certain chapters of Kendall's treatise [K2] may be recommended as a key reference to the verbalistic tradition (Chapters 17 and 18 for point estimation; 19 and 20 for interval estimation; 21, 26, and 27 for testing). Many newer aspects are treated in Wald's book [W3]; and a recent review of testing by Lehmann [L4] is recommended.
2 The verbalistic concept of point estimation
Abstractly and very generally, but in verbalistic language (which is necessarily vague), the problem of point estimation is this: Knowing $P(x \mid B_i)$ for every $i$ and having observed the value $x$, guess the value $\lambda$ of a prescribed function, or parameter as it is often called, $\lambda(i)$ with values in a set $\Lambda$. Semi-behavioralistically this is, I think universally, understood to mean that a function $l$ associating a value $l(x) \in \Lambda$ with each $x$ (or possibly a mixture of such functions) is to be decided on, the function $l$ being called an estimate (or, to be complete, a point estimate) of the parameter $\lambda$. A problem of point estimation has, thus, some of the structure of an objectivistic observational problem; but, since nothing has yet been said about the income, or consequence, resulting from the act $l$ in case $B_i$ obtains, it is at the moment impossible to advance criteria for the choice of $l$.
3 Examples of problems of point estimation
It will now be well to present some examples after a few words of preparation. For simplicity, $\Lambda$ will henceforth generally be supposed to be an interval (possibly unbounded) of real numbers. If $\lambda(i) = \lambda(i')$ implies $i = i'$, then $\lambda$ rather than $i$ can be used to index the partition; such an estimation problem is said to be free of nuisance parameters. This usage corresponds to the fact that the $i$'s can typically be represented as ordered couples $(\lambda, \theta)$, where $\lambda$ is of course $\lambda(i)$ and $\theta$ is called the nuisance parameter; if $\theta$ in turn happens to be represented as an ordered n-tuple, ordinary usage calls $\theta$ an n-tuple of nuisance parameters.
It must be recognized as atypical in estimation problems for $i$ or $\lambda$ to be confined to a finite set of values, and often $x$ is not so confined either. It will therefore be necessary to proceed heuristically into domains where the mathematically limited theory developed in this book does not rigorously apply.
The specific estimation problems most commonly cited as examples, and most important in practice, are summarized in Table 1, together with their maximum-likelihood estimates, that is, estimates constructed in accordance with a rule to be defined in § 4. All but the last two examples of Table 1 are free of nuisance parameters.

TABLE 1. SOME COMMON ESTIMATION PROBLEMS
Each entry gives, in order: name; conventional symbol for $i$; conventional symbol for $\lambda(i)$; range of $x$; probability or density of $x$; maximum-likelihood estimate. Throughout, $\phi(z) =_{Df} (2\pi)^{-1/2} e^{-z^2/2}$ and $\bar{x} =_{Df} \frac{1}{n} \sum x_i$.
(a) Binomial (of size $n$); $p$; $p$; integers 0 to $n$; $\binom{n}{x} p^x q^{n-x}$; $x/n$.
(b) Poisson; $\mu$; $\mu$; non-negative integers; $e^{-\mu} \mu^x / x!$; $x$.
(c) Normal mean (for sample of $n$ observations with variance one); $\mu$; $\mu$; n-tuples $x_i$ of real numbers; $\phi(x_1 - \mu) \cdots \phi(x_n - \mu)$; $\bar{x}$.
(d) Normal variance (for sample of $n$ observations with mean zero); $\sigma^2$; $\sigma^2$; n-tuples $x_i$ of real numbers; $\sigma^{-n} \phi(x_1/\sigma) \cdots \phi(x_n/\sigma)$; $\frac{1}{n} \sum x_i^2$.
(e) Normal mean (for sample of $n$ observations with variance unknown); $(\mu, \sigma^2)$; $\mu$; n-tuples $x_i$ of real numbers; $\sigma^{-n} \phi\big(\frac{x_1 - \mu}{\sigma}\big) \cdots \phi\big(\frac{x_n - \mu}{\sigma}\big)$; $\bar{x}$.
(f) Normal variance (for sample of $n$ observations with mean unknown); $(\mu, \sigma^2)$; $\sigma^2$; n-tuples $x_i$ of real numbers; $\sigma^{-n} \phi\big(\frac{x_1 - \mu}{\sigma}\big) \cdots \phi\big(\frac{x_n - \mu}{\sigma}\big)$; $\frac{1}{n} \sum (x_i - \bar{x})^2$.
4 Criteria that have been proposed for point estimates
As a matter of fact, verbalistic treatments typically do give some inkling of the consequence of the act l when B_i obtains. Thus, in the examples commonly cited, such as those in Table 3.1, Λ is a set of real numbers or a set of n-tuples of real numbers and, therefore, a set of objects between which the notion of proximity has some meaning. Work in the verbalistic tradition has made it clear in connection with such examples that, if l = λ(i) for the B_i that obtains, the guess is considered perfect and that, roughly speaking, it is considered rather poor if l is far from λ.

In spite of the apparently hopeless indefiniteness of estimation problems even as thus formulated, various criteria, or desiderata, for estimates have been suggested. A list of these criteria, intended to be essentially complete, is now presented. Each item is annotated and illustrated to make its meaning clear, and sometimes to call attention to related criteria not explicitly listed; motivation and criticism are, however, deferred until later sections, where they are treated in connection with explicit hypotheses about the consequences of misestimation. No attempt is made to include criteria like intellectual simplicity or facility of computation that depend not only on the estimate but also on the capabilities of the people who contemplate using it.

The list is in a sense logically inhomogeneous. For example, no one really considers it a virtue in itself for an estimate to be a maximum-likelihood estimate (Criterion 4); rather, it is believed that such estimates do typically have real virtues.
It has, to begin the list of criteria, been suggested by one person or another that:

1. If y is sufficient, nothing is to be lost by requiring the estimate l to be a contraction of y.

It will be instructive to bear in mind that necessary and sufficient statistics of the examples (a)-(f) in Table 3.1 are, respectively, x, x, x̄, Σ x_r², (x̄, Σ x_r²), (x̄, Σ x_r²).

2. If, of two estimates l and l',

(1) $E([l - \lambda(i)]^2 \mid B_i) \le E([l' - \lambda(i)]^2 \mid B_i)$

for every i, with strict inequality for some i, then l is better than l'.
There are countless variants of this idea. In particular, the square of the difference may be replaced by any other positive power of the absolute difference. Again, (1) may be imposed at only one value of i, if l and l' are subjected to some other condition, freedom from bias (Criterion 6 below) being the popular one.

Example (f) gives rise to a good illustration of this criterion, which is also interesting in a later connection. Letting Q =Df Σ (x_r − x̄)², it is well known that E(Q | μ, σ²) = (n − 1)σ² and that E(Q² | μ, σ²) = (n² − 1)σ⁴. Therefore

(2) $E([aQ - \sigma^2]^2 \mid \mu, \sigma^2) = \{a^2(n^2 - 1) - 2a(n - 1) + 1\}\,\sigma^4 = \left\{\left(a - \frac{1}{n+1}\right)^2 (n^2 - 1) + \frac{2}{n+1}\right\}\sigma^4 \ge \frac{2\sigma^4}{n+1}$

for all real a, with equality if and only if a = (n + 1)⁻¹, omitting the pathological but trivial case that n = 1. By the criterion in question, Q/(n + 1) is therefore better than any other estimate of the form aQ, including the maximum-likelihood estimate Q/n and the unbiased estimate Q/(n − 1).+
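The comparison in (2) can be checked with exact arithmetic. The following is only a sketch: the function name `mse_factor` and the choice n = 10 are illustrative assumptions, and the two moments of Q are taken from the text rather than derived.

```python
from fractions import Fraction


def mse_factor(a, n):
    """Return E([aQ - sigma^2]^2) / sigma^4 as in (2).

    Uses the moments quoted in the text:
    E(Q) = (n - 1) sigma^2 and E(Q^2) = (n^2 - 1) sigma^4.
    """
    return a * a * (n * n - 1) - 2 * a * (n - 1) + 1


n = 10  # hypothetical sample size, chosen only for illustration
candidates = {
    "Q/(n+1)": Fraction(1, n + 1),
    "Q/n (maximum likelihood)": Fraction(1, n),
    "Q/(n-1) (unbiased)": Fraction(1, n - 1),
}
factors = {name: mse_factor(a, n) for name, a in candidates.items()}

for name, value in factors.items():
    print(f"{name}: expected squared error = {value} * sigma^4")
```

For this n the three multipliers come out as 2/11, 19/100, and 2/9, so the divisor n + 1 gives the smallest expected squared error among the three, in agreement with (2).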
3. If, of two estimates l and l',

(3) $P(-\epsilon_1 \le l(x) - \lambda(i) \le \epsilon_2 \mid B_i) \ge P(-\epsilon_1 \le l'(x) - \lambda(i) \le \epsilon_2 \mid B_i)$

for every non-negative ε₁ and ε₂ and for every i, with strict inequality for some ε₁, ε₂, and some i, then l is better than l'.

+ This example was given by Leo A. Goodman (1963).
Acceptance of this criterion is obviously implied by acceptance of Criterion 2, of which it may therefore be regarded as a skeptical counterpart; formal demonstration of a much more general assertion will be given in connection with (5.2-4). The criterion implies, for example, in connection with (e) of Table 3.1 that x̄ is superior to any other weighted average of the x_r's. A more interesting example will be mentioned in connection with Criterion 5. That modification of Criterion 3 in which it is concluded only that l is at least as good as l' is of some technical interest. Incidentally, if equality held identically in (3), there would presumably be nothing to choose between the two estimates by any reasonable criterion, for they would then both have the same system of conditional distributions.
4. A maximum-likelihood estimate is often a rather good estimate.

A maximum-likelihood estimate is an estimate l such that, for some function î of x, l(x) = λ(î(x)) and

(4) $P(x \mid B_{\hat{\imath}(x)}) \ge P(x \mid B_i)$

for every i and x. In many natural problems there is only one maximum-likelihood estimate. Taking into account the analogy between probabilities and values of probability densities, the reader should verify that the estimates listed in Table 3.1 are indeed the unique maximum-likelihood estimates of the problems to which they refer. When there is a unique maximum-likelihood estimate, it is obviously a contraction of the likelihood ratios and, therefore, of any sufficient statistic; which fits neatly with Criterion 1.

5. A good estimate should have the same symmetry as the problem. More precisely, if a permutation T of the i's and the x's is such that

(5) $P(Tx \mid B_{Ti}) = P(x \mid B_i)$

and such that λ(i) = λ(i') implies λ(Ti) = λ(Ti'); then l should be such that, if l(x) = λ(i), l(Tx) = λ(Ti).

For example, adopting also Criterion 1, a good estimate for μ in (c) may be sought of the form l(x̄). Symmetry then dictates l(x̄ + a) = l(x̄) + a and l(−x̄) = −l(x̄); in short, l(x̄) = x̄. The same conclusion can be drawn for (e), though with a little more trouble. The criterion applied to (f) leads to estimates of the form aQ. The constant a might be fixed by appealing, for example, to Criterion 2, 4, or 6. These alone give three slightly different determinations: a⁻¹ = (n + 1), n, and (n − 1), respectively.
Again, it can be shown for Examples (c) and (e) that, among all estimates satisfying Criterion 5, x̄ is best according to Criterion 3.
6. It is desirable that the estimate be unbiased.

An estimate l is called unbiased, if and only if

(6) $E(l \mid B_i) = \lambda(i)$

for every i. It is easy to verify that the maximum-likelihood estimates of (a)-(e) in Table 3.1 are all unbiased; that of (f), however, is not, for E(Q n⁻¹ | μ, σ²) = (1 − 1/n)σ² instead of σ². Again, if l is a maximum-likelihood estimate of λ, l² is a maximum-likelihood estimate of λ². But, if l is not definitive, and l is an unbiased estimate of λ, l² is not an unbiased estimate of λ², as Theorem 1 of Appendix 2 implies.
7. If

$P(|l - \lambda(i)| < |l' - \lambda(i)| \mid B_i) > 1/2$

for every i, then l is better than l'.

Any resemblance between this criterion and Criterion 3 is sure to be dispelled by the following example. Suppose that, for every i, P(l − λ(i) = a, l' − λ(i) = b | B_i) equals 2/11 if a and b are integers such that 0 ≤ a < b ≤ 2, equals 5/11 if a and b are 2 and 0 respectively, and equals 0 otherwise. According to Criterion 7, l is better than l', because 6/11 > 1/2; but, according to Criterion 3, l' is better than l, because 5/11 > 4/11 and 7/11 > 6/11. The example can easily be modified to suit any taste for symmetry and continuity. But, if l and l' are conditionally independent (which is not a natural assumption), and l is better than l' according to Criterion 7; then, as may easily be shown, l' cannot be better than l by Criterion 3.
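The arithmetic of the example under Criterion 7 can be confirmed by direct enumeration. This is only a sketch: the dictionary `joint` transcribes the joint distribution of the two errors stated in the text, while the helper name `cdf` and the variable names are illustrative assumptions.

```python
from fractions import Fraction

# Joint distribution of the errors (a, b) = (l - lambda(i), l' - lambda(i))
# exactly as stated in the example; all other pairs have probability 0.
joint = {
    (0, 1): Fraction(2, 11),
    (0, 2): Fraction(2, 11),
    (1, 2): Fraction(2, 11),
    (2, 0): Fraction(5, 11),
}

# Criterion 7: probability that l is strictly closer to lambda(i) than l'.
p_l_closer = sum(p for (a, b), p in joint.items() if abs(a) < abs(b))


def cdf(index, bound):
    """P(error <= bound) for estimate 0 (l) or estimate 1 (l').

    Both errors are non-negative here, so for any eps1 >= 0 this equals
    P(-eps1 <= error <= bound), the quantity compared in Criterion 3.
    """
    return sum(p for errors, p in joint.items() if errors[index] <= bound)


coverage = {eps: (cdf(0, eps), cdf(1, eps)) for eps in (0, 1, 2)}

print("P(l closer than l') =", p_l_closer)
for eps, (for_l, for_l_prime) in coverage.items():
    print(f"eps2 = {eps}: l gives {for_l}, l' gives {for_l_prime}")
```

The enumeration reproduces the figures quoted: l is the closer estimate with probability 6/11, yet l' has the larger probability of falling in the error intervals, 5/11 against 4/11 and 7/11 against 6/11.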
The list of criteria is here interrupted by several paragraphs of explanation in preparation for two concluding criteria.

The approach to certainty treated in §§ 3.6 and 7.6 has its counterpart in the theory of estimation. In particular, if x(n) = {x_1, ..., x_n} is an n-tuple of conditionally independent and identically distributed observations, there will typically exist sequences of estimates l(n) based on x(n), such that

(7) $\lim_{n \to \infty} P(|l(x(n); n) - \lambda(i)| \le \epsilon \mid B_i) = 1$

for every positive ε and every i. A sequence of estimates satisfying (7) relative to any sequence of observations x(n) (not necessarily n-tuples of conditionally independent observations) is called consistent.
The condition of consistency is often realized in a very special way, namely that the error [l(x(n); n) − λ(i)] is, for every B_i and for large n, practically normally distributed about zero with variance inversely proportional to n. More formally, a sequence of estimates may be such that

(8) $\lim_{n \to \infty} P\left(\frac{n^{1/2}[l(x(n); n) - \lambda(i)]}{\sigma(i)} \le a \,\Big|\, B_i\right) = (2\pi)^{-1/2}\int_{-\infty}^{a} e^{-z^2/2}\,dz$

for every i and a, where σ(i) is some positive function of i; it is then said that n^{1/2}[l(x(n); n) − λ(i)] is asymptotically normal about zero with asymptotic variance σ²(i). If, in addition, for every i, σ⁻²(i) is not less than a certain function, the differential information, to be defined in § 6, then the sequence l(n) is called efficient.

There is a possible pitfall in connection with the idea of asymptotic normality. Though (8) implies that, for large n, the distribution of the error is, in a sense, almost the normal distribution with zero mean and variance σ²(i)/n, it does not imply that the mean of the error is close to zero, or even finite or well defined. Similarly, the variance of the error may be much larger than σ²(i)/n, infinite, or ill defined; but it cannot, for large n, be smaller than a fixed fraction (less than one) of σ²(i)/n.

Much literature on estimation has concentrated on sequences of estimation problems in which x(n) is an n-tuple consisting of the first n elements of an infinite sequence of conditionally independent and conditionally identically distributed random variables or, as it will be called in the present chapter, a standard sequence; because these are the simplest examples of sequences of increasingly informative observations. Examples (c)-(f) in Table 3.1 refer directly to standard sequences; the binomial distributions (a) can be regarded as the distribution of the sufficient statistic Σ x_r of the standard sequence x(n) in which each x_r takes the values 1 and 0 with probabilities p and 1 − p, respectively (cf. Exercise 7.4.1); again, if each x_r is Poisson-distributed with parameter μ, then Σ x_r is sufficient for x(n) and is itself Poisson-distributed with parameter nμ. Thus, all the examples in Table 3.1 give rise more or less directly to examples of standard sequences.
In speaking of standard, and occasionally of other, sequences the ellipsis of referring to a sequence of estimates simply as "an estimate" has been widely adopted, so one reads recommendations that "an estimate" should be consistent or efficient. This ellipsis, though often convenient, sometimes proves dangerous. It distracts from the fact that a person is called upon to make an estimate, not a sequence of estimates; so that the question of what constitutes a good sequence does not arise. Again, it makes one feel that if an estimate, say l(13), has been defined for x(13), then the definition of l(14) is thereby implied. One forgets, for example, that "the" average of n observations is a whole sequence of statistics, a sequence singled out by human tastes and interests, rather than by any mathematical necessity. In short, the ellipsis establishes the atmosphere of the logically nonsensical (though perhaps psychologically revealing) questions on intelligence tests such as: "What are the two missing terms in the sequence _ _ 1 8 2 8 1 8 2 8?" †

The recommendations of consistency and efficiency quoted above can be added to the numbered list of suggestions, in a form that avoids the ellipsis:
8. If each l(n) is a good estimate for the corresponding x(n) of a standard sequence, then the sequence l(n) is consistent.

The sequences of maximum-likelihood estimates for the sequences of problems (a), (c)-(f) are consistent; and, for the sequence of problems of estimating μ from an observation y_n Poisson-distributed with parameter nμ, the maximum-likelihood estimates y_n/n are consistent.

If there is one consistent sequence of estimates for a sequence of problems, there is a plethora. Each term of a consistent sequence can, for example, be multiplied by (1 + n^{-1/2}) without destroying consistency. Again, the sample medians ‡ are in (c) a consistent sequence different from the sequence of maximum-likelihood estimates.

9. Under the hypothesis of Criterion 8, the sequence l(n) is efficient, at least if any efficient sequence of estimates exists.
The six sequences of maximum-likelihood estimates mentioned under Criterion 8 are all well known to be efficient, as sequences of maximum-likelihood estimates for standard sequences typically are. The asymptotic variances and certain other interesting quantities associated with these six sequences are presented in Table 1. It is remarkable that, for each of the examples in Table 1, the expected values of the estimates approach the estimated parameter; n times the variance of the estimate, and n times the expected squared error, both approach the asymptotic variance of n^{1/2} times the error. For the first five examples the relations mentioned hold, indeed, not only in the limit, but exactly, for all n. All six examples are rather special, or magical, but the limiting relations just mentioned may fairly be expected to hold in some generality, though they are not (as has already been mentioned) really implied by the asymptotic normality of the sequence of errors times n^{1/2}.
To illustrate the exceptions that can occur, x̄⁻¹ is, in (c), the maximum-likelihood estimate of μ⁻¹ for μ ≠ 0; this sequence of estimates is efficient; and n^{1/2} ...

† e = 2.7182818285 to eleven significant figures.
‡ See any statistics text for definition, if necessary.

TABLE 1. EXAMPLES OF BEHAVIOR OF MAXIMUM-LIKELIHOOD ESTIMATES

Sequence   Mean           n × variance     n × expected        Asymptotic variance
                                           square of error     of n^{1/2} × error
(a)        p              pq               pq                  pq
Poisson    μ              μ                μ                   μ
(c)        μ              1                1                   1
(d)        σ²             2σ⁴              2σ⁴                 2σ⁴
(e)        μ              σ²               σ²                  σ²
(f)        (1 − 1/n)σ²    2(1 − 1/n)σ⁴     (2 − 1/n)σ⁴         2σ⁴
As in the case of consistency, where there is one efficient sequence, there are many, but efficiency is, of course, a much more restrictive property than consistency. For example, multiplication by (1 + n^{-1/2}) typically destroys efficiency, though multiplication by (1 + n⁻¹) never does. Again, the consistent sequence of medians mentioned under Criterion 8 is not efficient. Indeed, it is well known of that sequence that the sequence of errors times n^{1/2} is asymptotically normal about zero with asymptotic variance π/2 rather than 1.
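The contrast between the sample mean and the sample median in example (c) can be illustrated by simulation. The following is a rough sketch, not a proof: the sample size, the number of replications, the seed, and the function name `scaled_variances` are all arbitrary assumptions, and the results are subject to sampling error.

```python
import random
import statistics


def scaled_variances(n=101, replications=4000, seed=0):
    """Estimate n * variance of the sample mean and of the sample median.

    Each replication draws n independent normal observations with mean 0
    and variance 1, as in example (c) with mu = 0.
    """
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(replications):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(sum(sample) / n)
        medians.append(statistics.median(sample))
    return n * statistics.pvariance(means), n * statistics.pvariance(medians)


mean_var, median_var = scaled_variances()
print(f"n x variance of the mean:   {mean_var:.3f}  (asymptotic value 1)")
print(f"n x variance of the median: {median_var:.3f}  (asymptotic value pi/2)")
```

With these settings the first figure should fall near 1 and the second near π/2, about 1.57, which is the sense in which the medians are consistent but not efficient.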
5 A behavioralistic review of the criteria for point estimation

It is time now to introduce the notion of consequences, or (equivalently, I believe) of loss, thereby interpreting estimation problems as decision problems. Let it be said then that an estimation decision problem is an observational decision problem with the following distinguishing feature. There is a one-to-one correspondence between the basic acts f and the values attained by a real-valued function λ(i), such that L(f; i) = 0, if f is the act that corresponds with λ(i). It is simpler, more suggestive, and harmless to let the number l that corresponds to f replace f itself in all further discussion of estimation decision problems. To illustrate the new notation, it may be said that L(l; i) = 0, if l = λ(i).

I believe that any situation ordinarily said to call for (point) estimation can be analyzed as an estimation decision problem. For example, estimating how much paint will cover a wall may, depending on circumstances, mean deciding: how much paint to buy, what to bid for a contract, or what number to enter in a guessing pool. Under each of those interpretations there will be zero loss, if and, typically, only if the estimate is "correct," as one says.

The consequences of an estimate may, like those of many real life decisions, be difficult to appraise. It is hard to say even in relatively concrete situations what it costs to misestimate the speed of light, a particular mortality rate, or the national income. If, to revert to an example already discussed, the estimate is to be published somewhere for the use of whoever has a use for it, the consequences of publication may seem beyond all reckoning. None the less, I reaffirm the conviction that the concept of consequence measured in income or loss is valuable in dealing with such situations, as I hope the present treatment of estimation will illustrate.+

Incidentally, it seems indifferent, as I have already said, whether loss or income is taken as the starting point. It is easily shown that the decisions of the idealized person of the personalistic probability theory will be the same in two problems having possibly different income, but the same loss, functions. This feature I would expect to be acceptable even to objectivists, and I also think it appropriate to theories of group decision.

I know of nothing interesting that distinguishes estimation decision problems as a class from observational decision problems generally. But actual estimation situations suggest certain relatively wide classes of estimation decision problems about which interesting and valuable conclusions can be drawn. Indeed, it will be shown in this and the next two sections that seven of the nine listed criteria for estimation can be justified to some extent as flowing from application of the principle of admissibility and the minimax rule to such classes of estimation decision problems.

Before making any real specialization, it may be most systematic to mention that Criterion 1 is simply an instance of the general principle, which we have now studied from several points of view, that nothing is lost by confining attention to sufficient statistics, at least if mixtures
are allowed.

It is clear in almost any estimation situation, even in those for which the notion of loss is vaguest, that if two errors have the same sign the larger entails at least as great a loss as the smaller. Analytically,

(1) $L(l; i) \le L(l'; i)$

for λ(i) ≤ l < l' and for λ(i) ≥ l > l'. Situations to which (1) fails to apply can readily be imagined. William Tell, for example, in estimating the angle by which to elevate his cross-bow for the apple shot might have preferred a downward error of 10° to one of 1°; but such circumstances seem exceptional. Furthermore, it is usually justifiable to assume that strict inequality holds in (1), though there are many exceptions in which, for example, "a miss is as good as a mile" or one hit is as good as another.

As is, I think, intuitively evident, when strict inequality holds in (1), Criterion 3 is simply an application of the principle of admissibility. That conclusion can be shown in complete generality without serious difficulty, but, in compliance with the usual mathematical limitations of this book, it will here be shown only under the assumption that x is confined to a finite number of values.

+ This idea was expressed by Gauss (1821, Section 6).

What is to be shown is this: If l and l' are a pair of estimates satisfying the hypothesis of Criterion 3, and if (1) holds with strict inequality; then the expected losses satisfy L(l; i) − L(l'; i) ≤ 0 for every i, with strict inequality for some i. To begin the proof, write $\mathbf{l}$ and $\mathbf{l}'$ for the estimates regarded as functions of x, and l for a numerical value they may take, and calculate thus:

(2) $L(\mathbf{l}; i) - L(\mathbf{l}'; i) = \sum_{l} L(l; i)\,[P(\mathbf{l}(x) = l \mid B_i) - P(\mathbf{l}'(x) = l \mid B_i)]$
$\qquad = \sum_{l} L(l; i)\,Q(l; i)$
$\qquad = \sum_{l < \lambda(i)} L(l; i)\,Q(l; i) + \sum_{l > \lambda(i)} L(l; i)\,Q(l; i),$
where the definition of Q(l; i) is clear from the context, and where it has been taken into account that L(λ(i); i) = 0. It will be shown that both sums in the last part of (2) are non-positive and that for some i at least one of them is negative. Focus, for definiteness, on the second sum. Let l_0 = λ(i) and l_1, l_2, ... be, in order of increasing magnitude, the values of l > λ(i) for which Q(l; i) ≠ 0. With the abbreviations L(k) =Df L(l_k; i), Δ(k) =Df L(k) − L(k − 1), and Q(k) =Df Q(l_k; i), the sum to be investigated is

(3) $\sum_{0 < k} L(k)\,Q(k) = \sum_{0 < k} Q(k) \sum_{0 < k' \le k} \Delta(k') = \sum_{0 < k'} \Delta(k') \sum_{k' \le k} Q(k).$

(This rearrangement may seem bizarre on first encounter, but it is widely used in mathematics generally and is in fact an exact analogue, for sums, of the more familiar integration by parts, for integrals.) It follows from (1) read with strict inequality that Δ(k) > 0; and it follows from the hypothesis of Criterion 3 that each tail sum Σ_{k' ≤ k} Q(k) ≤ 0, and that some such sum (or an analogous term associated with the first sum in the last line of (2)) is strictly negative for some i. This completes the deduction of Criterion 3 from the strict form of (1) and the principle of admissibility. Essentially the same argument leads from (1) as actually written to the modification mentioned in the note under Criterion 3.

A very slight strengthening of (1), together with the minimax rule, provides a widely applicable justification of Criterion 8 (consistency), as will now be explained. Suppose that (1) not only holds but also is strict, if l = λ(i); that is, in addition to (1) suppose only that L(l'; i) > 0 for all l' ≠ λ(i). In this context, let x(n) be a sequence of observations such that the minimax L*(n) of the corresponding estimation problems approaches zero with increasing n; then any sequence of minimax estimates l(n) is consistent. Indeed, if the sequence l(n) is not consistent, then, for some i, and some positive ε and δ:

(4) $P(|l(x(n); n) - \lambda(i)| \ge \epsilon \mid B_i) \ge \delta$

for some arbitrarily large values of n. This implies

(5) $L^*(n) \ge L(l(n); i) \ge \delta \min\{L(\lambda(i) + \epsilon; i),\ L(\lambda(i) - \epsilon; i)\} > 0,$

which contradicts the hypothesis that L*(n) approaches zero.
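The rearrangement used in (3), summation by parts, can be checked on a small numerical case. This is only a sketch: the values in the lists `L` and `Q` are hypothetical, chosen so that L(0) = 0, the losses increase with k, and every tail sum of Q is non-positive even though one individual Q(k) is positive.

```python
from fractions import Fraction


def direct_sum(L, Q):
    """Left side of (3): sum over k >= 1 of L(k) Q(k)."""
    return sum(L[k] * Q[k] for k in range(1, len(L)))


def rearranged_sum(L, Q):
    """Right side of (3): sum over k' >= 1 of Delta(k') times the tail sum of Q."""
    total = Fraction(0)
    for k_prime in range(1, len(L)):
        delta = L[k_prime] - L[k_prime - 1]          # Delta(k') = L(k') - L(k' - 1)
        tail = sum(Q[k] for k in range(k_prime, len(Q)))  # sum of Q(k) for k >= k'
        total += delta * tail
    return total


# Hypothetical values; index 0 corresponds to l_0 = lambda(i), where L(0) = 0.
L = [Fraction(0), Fraction(1, 2), Fraction(2), Fraction(7, 2)]
Q = [Fraction(0), Fraction(1, 10), Fraction(-1, 20), Fraction(-3, 20)]

tails = [sum(Q[k] for k in range(k_prime, len(Q))) for k_prime in range(1, len(Q))]

print("direct sum:    ", direct_sum(L, Q))
print("rearranged sum:", rearranged_sum(L, Q))
print("tail sums of Q:", tails)
```

Both forms give the same negative value. The sign is visible at once in the rearranged form, since every Δ(k') is positive and every tail sum is non-positive, which is the point of the rearrangement in the proof.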
them off. Suppose that a person must estimate the amount λ of shelving for books, priced at $1.00 per foot, to be ordered for some purpose. It is possible that the following economic analysis of the situation would be sufficiently realistic. The person holds every foot of shelving less than the number of feet, λ, of books to be shelved to be worth $α, α > 1, but superfluous shelving he holds to be worthless. Formally,

(6) $L(l; \lambda) = (\alpha - 1)(\lambda - l)$ for l ≤ λ,
$\quad\ L(l; \lambda) = (l - \lambda)$ for l > λ.

There is then a corner, or kink, at l = λ; so differentiation, even once, is impossible.

But the following analysis is much more likely to be sufficiently realistic. The urgency of the shelving of the books is variable. Some would be worth shelving, even if the cost of shelving were very high; at the other extreme, there are some that would not be worth shelving unless the cost were very low. More fully, the value of l feet of shelving is a function i(l) that presumably has the following features. It is monotonically increasing, strictly concave, and twice differentiable in l; i(0) = 0; i(∞) < ∞; i'(0) > 1. The income attached to ordering l feet of shelving, at the price $1.00 per foot, is clearly

(7) $I(l; i) = i(l) - l.$

It is maximized at the one and only value λ for which di(λ)/dλ = 1, so that

(8) $L(l; i) = [i(\lambda) - \lambda] - [i(l) - l],$

which is of course twice differentiable in l.

The moral of these two possible economic analyses of one example is of wide applicability, as is well known among economists. Where a superficial analysis suggests a kink, or even a discontinuity, in an income function, deeper analysis will often show that the function is smoothed out by various economic phenomena such as the inhomogeneity and the mutual substitutability of commodities.

To return from the digression, if L is twice differentiable in l (at least when l is close to λ), L can be expanded in a Taylor series thus:

(9)
$L(l; i) = L(\lambda; i) + (l - \lambda)\,\frac{\partial}{\partial l} L(l; i)\Big|_{l = \lambda(i)} + \frac{1}{2}(l - \lambda)^2\,\frac{\partial^2}{\partial l^2} L(l; i)\Big|_{l = \lambda(i)} + o((l - \lambda)^2),$

where, following standard usage, o((l − λ)²) is a function of l and i, not necessarily the same from one context to another, such that o((l − λ)²) ÷ (l − λ)² approaches zero as l approaches λ(i) for fixed i. The first term on the right side of (9) vanishes by the definition of estimation; the second must vanish also, for otherwise L could be negative. Therefore,

(10) $L(l; i) = \frac{1}{2}(l - \lambda)^2\,\frac{\partial^2}{\partial l^2} L(l; i)\Big|_{l = \lambda(i)} + o((l - \lambda)^2) = (l - \lambda(i))^2\,\alpha(i) + o((l - \lambda)^2),$

where α(i) is defined by the context. In view of (10), it is plausible that L may, in many problems where estimates of great accuracy are possible, be supposed to be practically of the form

(11) $L(l; i) = (l - \lambda(i))^2\,\alpha(i),$

where α(i) > 0 for every i. This does not exactly mean that a reasonable L can be closely approximated by functions of the form (11) for all l. In particular, the absurd assumption that L is unbounded (which such approximation would typically imply) is not to be made. It means, rather, that under favorable circumstances (11) may lead to a reasonably good evaluation of L(l; i).

In so far as the form (11) can be supposed adequately to represent L, Criterion 2 is obviously an application of the principle of admissibility. An interesting discussion and application of (11) is given by Yates [Y2].

6 A behavioralistic review, continued
Thus far, Criteria 1, 2, 3, 5, and 8 have been discussed in behavioralistic terms. In fact, under suitable hypotheses, each has been found to have considerable behavioralistic justification. Criteria 4 and 9 also have such justification, but my discussion of them is so bulky it had better be isolated in a special section. As for Criteria 6 and 7, the only ones remaining, they do not seem to me to have any serious justification at all, as will be discussed in still another section.

Criterion 4, the recommendation of maximum-likelihood estimates, is of extraordinary interest, for, of all the criteria of the verbalistic tradition, it is essentially the only one that selects a unique estimate in almost every estimation situation of practical importance. The present section demonstrates that, in the presence of extensive observation, maximum-likelihood estimates are often almost minimax estimates; it also gives some analysis of Criterion 9, which refers to efficiency.

The way to these goals is roundabout; it begins with a study of information in the technical sense mentioned in § 3.6. In this section it will be assumed for mathematical simplicity that each observation under discussion is confined to a finite number of values, each having positive probability for every element of whatever partition is under discussion.

If B_i and B_j are elements of a partition, not necessarily finite, and x is an observation, say, in the spirit of (3.6.11), that the information of j relative to i for the observation x is

(1) $J(i, j; x) =_{Df} E\left(\log \frac{P(x \mid B_i)}{P(x \mid B_j)} \,\Big|\, B_i\right) = -E\left(\log \frac{P(x \mid B_j)}{P(x \mid B_i)} \,\Big|\, B_i\right).$
The expression of J in terms of likelihood ratios is important, especially for the extension of the discussion to more general observations than those contemplated here. The reader should, therefore, try to bear in mind that the whole discussion could be carried on in terms of likelihood ratios; I refrain from so doing only for momentary reasons of notational convenience.

The theory of J can conveniently be presented in a series of exercises.

Exercises

1a. If y is a contraction of x, then J(i, j; x) ≥ J(i, j; y). With equality when? Hint:

(2) $-E\left(\log \frac{P(x \mid B_j)}{P(x \mid B_i)} \,\Big|\, B_i, y\right) \ge -\log \frac{P(y \mid B_j)}{P(y \mid B_i)}.$

1b. J(i, j; x) ≥ 0. With equality when?

2a. If x_1, ..., x_n are conditionally independent, then

(3) $J(i, j; x_1, \ldots, x_n) = \sum_r J(i, j; x_r).$

2b. If in addition the x_r's are conditionally identically distributed, then

(4) $J(i, j; x_1, \ldots, x_n) = n\,J(i, j; x_1).$
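Exercises 1 and 2 can be tried out numerically on a small finite partition. The sketch below assumes that J is the expected log likelihood ratio E(log[P(x | B_i)/P(x | B_j)] | B_i), the reading of (1) adopted here; the two distributions, the contraction y(x) = min(x, 1), and all function names are hypothetical choices made only for illustration.

```python
import math
from itertools import product


def information(p_i, p_j):
    """J(i, j; x) = E(log[P(x|B_i) / P(x|B_j)] | B_i) for finite distributions.

    Each distribution is a dict mapping values of x to positive probabilities.
    """
    return sum(p * math.log(p / p_j[x]) for x, p in p_i.items())


def contract(p, y):
    """Distribution of the contraction y(x) when x has distribution p."""
    out = {}
    for x, prob in p.items():
        out[y(x)] = out.get(y(x), 0.0) + prob
    return out


def independent_pair(p):
    """Joint distribution of two independent observations, each distributed as p."""
    return {(a, b): p[a] * p[b] for a, b in product(p, repeat=2)}


p_i = {0: 0.5, 1: 0.3, 2: 0.2}  # hypothetical P(x | B_i)
p_j = {0: 0.2, 1: 0.3, 2: 0.5}  # hypothetical P(x | B_j)


def lump(x):
    """Hypothetical contraction: merge the values 1 and 2."""
    return min(x, 1)


J_x = information(p_i, p_j)
J_y = information(contract(p_i, lump), contract(p_j, lump))
J_pair = information(independent_pair(p_i), independent_pair(p_j))

print(f"J for x:               {J_x:.5f}")
print(f"J for contraction y:   {J_y:.5f}")
print(f"J for two observations: {J_pair:.5f}")
```

In this instance the contraction loses information (Exercise 1a), J is non-negative and vanishes when the two distributions coincide (Exercise 1b), and two independent, identically distributed observations carry exactly twice the information of one (Exercises 2a and 2b).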
It is interesting to evaluate the information J(λ, λ + Δλ; x) where λ and λ + Δλ are two closely neighboring values of the parameter of an estimation problem, supposed, for simplicity, to be free of nuisance parameters. If P(x | λ) is continuous in λ, it is almost obvious that J(λ, λ + Δλ; x) approaches zero as Δλ approaches zero. If P(x | λ) is differentiable in λ, it is easy to show further (considering that J is non-negative) that even J(λ, λ + Δλ; x)/Δλ approaches zero as Δλ approaches zero. But in this case much more can and will be shown, namely,

(5) $\lim_{\Delta\lambda \to 0} \frac{J(\lambda, \lambda + \Delta\lambda; x)}{\Delta\lambda^2} = \frac{1}{2} H(\lambda; x) =_{Df} \frac{1}{2}\,E\left(\left[\frac{\partial \log P(x \mid \lambda)}{\partial\lambda}\right]^2 \,\Big|\, \lambda\right).$
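The function H is generally, following Fisher, called information, but here we had better call it differential information. Chronologically, as explained at the end of § 3.6, the concept of differential information is older than that here called simply information and of which it is, according to (5), a limiting case.

The demonstration of (5) begins with the consideration that

(6) $\log(1 + t) = t - \tfrac{1}{2}t^2 + o(t^2).$

Therefore,

(7) $\log \frac{P(x \mid \lambda + \Delta\lambda)}{P(x \mid \lambda)} = \log\left\{1 + \frac{P(x \mid \lambda + \Delta\lambda) - P(x \mid \lambda)}{P(x \mid \lambda)}\right\}$
$\qquad = \frac{P(x \mid \lambda + \Delta\lambda) - P(x \mid \lambda)}{P(x \mid \lambda)}$
$\qquad\quad - \frac{1}{2}\left\{\frac{P(x \mid \lambda + \Delta\lambda) - P(x \mid \lambda)}{P(x \mid \lambda)}\right\}^2 + o(\Delta\lambda^2).$

Since the expected value given λ of the term in the second line of (7) is easily seen to be exactly zero, it will be tactful to leave that term alone; but the second may be approximated thus:

(8) $\left\{\frac{P(x \mid \lambda + \Delta\lambda) - P(x \mid \lambda)}{P(x \mid \lambda)}\right\}^2 = \left\{\frac{\Delta\lambda}{P(x \mid \lambda)}\,\frac{\partial P(x \mid \lambda)}{\partial\lambda} + o(\Delta\lambda)\right\}^2 = \Delta\lambda^2\left\{\frac{\partial \log P(x \mid \lambda)}{\partial\lambda}\right\}^2 + o(\Delta\lambda^2).$

Therefore,

(9) $J(\lambda, \lambda + \Delta\lambda; x) = \tfrac{1}{2} H(\lambda; x)\,\Delta\lambda^2 + o(\Delta\lambda^2),$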
which establishes (5).

More exercises

3. If the kth derivative (k > 0) with respect to λ of P(x | λ) exists for every x, then

(10) $E\left(\frac{1}{P(x \mid \lambda)}\,\frac{\partial^k}{\partial\lambda^k} P(x \mid \lambda) \,\Big|\, \lambda\right) = \frac{\partial^k}{\partial\lambda^k}\left(\sum_x P(x \mid \lambda)\right) = 0.$

4. If the requisite second derivative exists, then

(11) $H(\lambda; x) = -E\left(\frac{\partial^2}{\partial\lambda^2} \log P(x \mid \lambda) \,\Big|\, \lambda\right).$

5. If y is a contraction of x (and H(λ; x) is well defined), then H(λ; y) ≤ H(λ; x). Remark: The inequality is obvious in the light of Exercise 1a and the first part of (5). But it can also be derived from the following application of Theorem 1 of Appendix 2, which is useful in the next exercise.

(12) $\left\{\frac{1}{P(y \mid \lambda)}\,\frac{\partial P(y \mid \lambda)}{\partial\lambda}\right\}^2 = E^2\left(\frac{1}{P(x \mid \lambda)}\,\frac{\partial P(x \mid \lambda)}{\partial\lambda} \,\Big|\, y, \lambda\right) \le E\left(\left\{\frac{1}{P(x \mid \lambda)}\,\frac{\partial P(x \mid \lambda)}{\partial\lambda}\right\}^2 \,\Big|\, y, \lambda\right),$

with equality for every y and λ, if and only if ∂/∂λ log P(x | λ) can be expressed as a function of y and λ alone.

6a. If y is a contraction of x, H(λ; x) = H(λ; y) for every λ, if and only if y is sufficient for x.

6b. H(λ; x) = 0 for every λ, if and only if x is utterly irrelevant.

7a. If x_1, ..., x_n are independent given λ, then

(13) $H(\lambda; x_1, \ldots, x_n) = \sum_r H(\lambda; x_r).$

7b. If, in addition, the x_r's are identically distributed given λ, then

(14) $H(\lambda; x_1, \ldots, x_n) = n\,H(\lambda; x_1).$

8. If l is a real-valued contraction of x, and H(λ; x) is well defined, then

(a)
(15) $\frac{\partial}{\partial\lambda} E(l \mid \lambda) = E\left(l(x)\,\frac{\partial \log P(l(x) \mid \lambda)}{\partial\lambda} \,\Big|\, \lambda\right).$

(b)
(16) $E([l - \lambda]^2 \mid \lambda)\,H(\lambda; l) \ge \left\{\frac{\partial}{\partial\lambda} E(l \mid \lambda)\right\}^2,$

with equality if and only if

(17) $\frac{\partial}{\partial\lambda} \log P(l \mid \lambda) = (l - \lambda)\,k$

for some constant k. Hint: Use Exercise 3 and apply the Schwarz inequality to (15).

(c) If H(λ; x) > 0, then

(18) $E([l - \lambda]^2 \mid \lambda) \ge \left\{\frac{\partial}{\partial\lambda} E(l \mid \lambda)\right\}^2 \Big/ H(\lambda; x).$
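The limit (5) and the inequality (18) can both be illustrated on the binomial example (a) of Table 3.1. This is a sketch under stated assumptions: the size n = 10, the value p = 3/10, the increment 1/1000, the second estimate (x + 1)/(n + 2), and every function name are hypothetical choices; exact fractions are used wherever no logarithm is needed.

```python
import math
from fractions import Fraction


def binom_pmf(n, p):
    """Probabilities P(x | p) for x = 0, ..., n."""
    return [math.comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]


def differential_information(n, p):
    """H(p; x) = E([d log P(x|p)/dp]^2 | p), with d log P/dp = x/p - (n - x)/(1 - p)."""
    return sum(prob * (Fraction(x) / p - Fraction(n - x) / (1 - p)) ** 2
               for x, prob in enumerate(binom_pmf(n, p)))


def information_J(n, p, dp):
    """J(p, p + dp; x) = E(log[P(x|p) / P(x|p+dp)] | p), in floating point."""
    here = [float(v) for v in binom_pmf(n, p)]
    there = [float(v) for v in binom_pmf(n, p + dp)]
    return sum(a * math.log(a / b) for a, b in zip(here, there))


def mse(n, p, estimate):
    """Expected squared error E([l - p]^2 | p) of the estimate l(x)."""
    return sum(prob * (estimate(x) - p) ** 2 for x, prob in enumerate(binom_pmf(n, p)))


n, p = 10, Fraction(3, 10)
H = differential_information(n, p)

# Limit (5): J / (dp)^2 should be close to H / 2 for a small increment dp.
dp = Fraction(1, 1000)
ratio = information_J(n, p, dp) / float(dp) ** 2

# Inequality (18) for two estimates; the slopes are dE(l|p)/dp computed by hand.
mle_mse = mse(n, p, lambda x: Fraction(x, n))              # E(l|p) = p, slope 1
mle_bound = Fraction(1) ** 2 / H
shrunk_mse = mse(n, p, lambda x: Fraction(x + 1, n + 2))   # slope n/(n + 2)
shrunk_bound = Fraction(n, n + 2) ** 2 / H

print("H =", H, " J/dp^2 =", round(ratio, 3))
print("x/n:         mse", mle_mse, ">= bound", mle_bound)
print("(x+1)/(n+2): mse", shrunk_mse, ">= bound", shrunk_bound)
```

Here H equals n/(pq), the ratio J/Δλ² is already within one percent of H/2, the estimate x/n attains the bound in (18) exactly, and the biased estimate (x + 1)/(n + 2) exceeds its smaller bound by the square of its bias.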
Exercise 8c is an important, and now famous, inequality. It, together with its n-dimensional generalization, has been called the Cramér-Rao inequality because of its independent publication by Rao and Cramér in 1945 and 1946 respectively (see [H6]). But the name is not at all well justified historically. Fréchet presented the inequality in 1943 [FS], and Darmois extended Fréchet's inequality to n dimensions, at least for unbiased estimates, in a publication [D1] not later than Rao's. The inequality has also, though I think erroneously, been attributed to an early paper by Aitken and Silverstone [A1], and to one by Doob [D10]. My point is, of course, not to give a definitive history of the inequality, but merely to suggest that for the time being an impersonal name would be better. I tentatively propose calling it the information inequality.

Some recent references pertinent to the information inequality and other topics treated thus far in this section are [W16], [M5], [06], and [H6]. The techniques used in the remainder of this section, which revolve around the information inequality, were published posthumously by Wald [W5].

The information inequality has an important bearing on application of the minimax rule to estimation, of which the following theorem may, in view of (5.11), be taken as a first illustration.
THEOREII
HTP.
1. For every A in a closed interval of lezaph I, H().; z) where H is a constant. 2. 1 is a real-valued contraction of z.
(
H,
2)-1·
For aome A in the interval, B«(l - A)2l A) ~ HH + i
CoMCL.
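PROOF. Suppose that the theorem is false. Then, according to Exercise 8c,
$$\left|\frac{d}{d\lambda} E(l \mid \lambda)\right| < H^{1/2}\left(H^{1/2} + \frac{2}{\delta}\right)^{-1} \tag{19}$$
for every $\lambda$ in the interval. Therefore,
$$\frac{d}{d\lambda}\,[\lambda - E(l \mid \lambda)] > 1 - \frac{H^{1/2}}{H^{1/2} + 2/\delta} = \frac{2}{\delta H^{1/2} + 2} \tag{20}$$
for every $\lambda$ in the interval. Therefore, at one end of the interval or the other,
$$|\lambda - E(l \mid \lambda)| > \frac{\delta}{\delta H^{1/2} + 2} = \left(H^{1/2} + \frac{2}{\delta}\right)^{-1}. \tag{21}$$
This leads to a contradiction through the well-known inequality
$$E([l - \lambda]^2 \mid \lambda) \ge |E(l - \lambda \mid \lambda)|^2 = |\lambda - E(l \mid \lambda)|^2, \tag{22}$$
which can be derived as a direct application of Theorem 1 of Appendix 2, or of the Schwarz inequality, or of the useful identity
$$E([l - \lambda]^2 \mid \lambda) = V(l \mid \lambda) + \{E(l - \lambda \mid \lambda)\}^2. \tag{23}$$
•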
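The conclusion of Theorem 1 can be illustrated numerically. The sketch below is only an illustration under stated assumptions: $x$ is taken to be binomial with $n$ trials, the closed form $H(p; x) = n/p(1 - p)$ is assumed for the information, the interval is $[0.4, 0.6]$, and the three estimates compared are arbitrary choices. The helper names are hypothetical.

```python
from math import comb, sqrt


def mse(n, p, estimate):
    """Mean square error at p of an estimate given as a function of x."""
    return sum(
        comb(n, x) * p**x * (1 - p) ** (n - x) * (estimate(x) - p) ** 2
        for x in range(n + 1)
    )


def theorem1_bound(n, lo, hi):
    """(H^(1/2) + 2/delta)^(-2), with H the largest information on [lo, hi]."""
    # Assumed binomial closed form n / (p (1 - p)); it is convex in p,
    # so its maximum over the interval is attained at an endpoint.
    big_h = max(n / (p * (1 - p)) for p in (lo, hi))
    delta = hi - lo
    return (sqrt(big_h) + 2 / delta) ** -2


def worst_mse(n, lo, hi, estimate, steps=200):
    """Largest mean square error over an evenly spaced grid on [lo, hi]."""
    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return max(mse(n, p, estimate) for p in grid)


n, lo, hi = 20, 0.4, 0.6
bound = theorem1_bound(n, lo, hi)
for name, est in [
    ("x/n", lambda x: x / n),
    ("constant 1/2", lambda x: 0.5),
    ("(x+1)/(n+2)", lambda x: (x + 1) / (n + 2)),
]:
    print(name, worst_mse(n, lo, hi, est) >= bound)
```

Each estimate, including the constant one that ignores the observation, has mean square error at least as large as the bound somewhere in the interval, as the theorem requires.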
In the remaining portion of this section, let it be understood that:
1. The $x_r$'s are an infinite sequence of observations that are, given $\lambda$, identically distributed and independent.
2. $x(n) = \{x_1, \ldots, x_n\}$ for $n = 1, 2, \ldots$.
3. $l(n)$ is a real-valued contraction of $x(n)$. The contraction $l(n)$ is to be thought of as an estimate of $\lambda$ based on observation of $x(n)$. In the spirit of the minimax theory it is really mixed, rather than ordinary, estimates that should be treated here. But this entails no essential change in the following discussion once it is recognized that a mixed estimate is, in effect, an ordinary estimate based on observation of $y(n) =_{Df} \{l(n), x(n)\}$, where $x(n)$ is sufficient for $y(n)$, so that $H(\lambda; y(n)) = H(\lambda; x(n))$ for all $\lambda$.
4. $\epsilon$ and $\delta$ are positive numbers.
5. $\Lambda_0$ is a closed interval of length $\delta$ contained in the range of $\lambda$ and including a given value $\lambda_0$.
The next theorem shows that, if $L(l; \lambda)$ is of the form (5.11), $L(l(n); \lambda)$ cannot ordinarily be kept much smaller than $\alpha(\lambda_0)/nH(\lambda_0; x_1)$ for large $n$, even in a small interval about $\lambda_0$.
THEOREM 2
If $H(\lambda; x_1)$ is continuous and positive at $\lambda_0$, and if $\alpha(\lambda)$ is a non-negative function continuous at $\lambda_0$, then, for sufficiently large $n$, $E([l(n) - \lambda]^2 \alpha(\lambda) \mid \lambda) \ge (1 - \epsilon)\alpha(\lambda_0)/nH(\lambda_0; x_1)$ for some $\lambda \in \Lambda_0$.
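PROOF. There is no loss of generality in supposing that $\epsilon < 1$ and that $\Lambda_0$ is such that, for $\lambda \in \Lambda_0$, $\alpha(\lambda) \ge \alpha(\lambda_0)(1 - \epsilon)^{1/2}$ and $H(\lambda; x_1)^{1/2} \le H(\lambda_0; x_1)^{1/2}\,(1 + (1 - \epsilon)^{-1/4})/2$. Using Exercise 7b,
$$H(\lambda; x(n))^{1/2} = n^{1/2} H(\lambda; x_1)^{1/2} \le \frac{n^{1/2}}{2}\, H(\lambda_0; x_1)^{1/2}\,[1 + (1 - \epsilon)^{-1/4}] \tag{24}$$
for $\lambda \in \Lambda_0$. By Theorem 1, if $n \ge 16/\delta^2 H(\lambda_0; x_1)[(1 - \epsilon)^{-1/4} - 1]^2$, then
$$E([l(n) - \lambda]^2 \mid \lambda) \ge \left\{\frac{n^{1/2}}{2}\, H(\lambda_0; x_1)^{1/2}\,[1 + (1 - \epsilon)^{-1/4}] + \frac{2}{\delta}\right\}^{-2} \ge \frac{(1 - \epsilon)^{1/2}}{nH(\lambda_0; x_1)} \tag{25}$$
for some $\lambda \in \Lambda_0$. •
The next theorem extends Theorem 2 to practically any loss function that is twice differentiable in $l$ for $l$ and $\lambda$ close to $\lambda_0$.
THEOREM 3
HYP.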
1. $H(\lambda; x_1)$ is positive and continuous at $\lambda_0$.
2. $\alpha(\lambda) =_{Df} \frac{1}{2} \left.\frac{\partial^2}{\partial l^2} L(l; \lambda)\right|_{l = \lambda}$ is continuous at $\lambda_0$.
3. Inequality (5.1) holds for $\lambda$ in $\Lambda_0$.
CONCL. For sufficiently large $n$, $L(l(n); \lambda) \ge (1 - \epsilon)\alpha(\lambda_0)/nH(\lambda_0; x_1)$ for some $\lambda \in \Lambda_0$.
PROOF. It may be supposed without loss of generality that $\epsilon < 1$ and that, for $l, \lambda \in \Lambda_0$, $L(l; \lambda) \ge (1 - \epsilon)^{1/2}\alpha(\lambda)(l - \lambda)^2$. It may also be supposed that $l(x; n) \in \Lambda_0$. This is so, because it would suffice to prove the theorem for a new estimate $l'(n)$, where $l'(x; n)$ is defined to be the number in $\Lambda_0$ closest to $l(x; n)$, which in turn follows from the fact that $L(l'(n); \lambda) \le L(l(n); \lambda)$ for $\lambda \in \Lambda_0$. These suppositions having been made, the theorem is a direct consequence of Theorem 2. •
COROLLARY 1
If $L(l; \lambda)$ satisfies (5.1) and has two derivatives with respect to $l$ continuous in $\lambda$ for every $\lambda$ and for every $l$ sufficiently close to $\lambda$, and if $H(\lambda; x_1)$ is continuous and positive, then, for sufficiently large $n$,
$$L^*(n) \ge (1 - \epsilon) \sup_\lambda\, \alpha(\lambda)/nH(\lambda; x_1), \tag{26}$$
where $L^*(n)$ is the minimax value of the estimation decision problem derived from $L(l; \lambda)$ and $x(n)$, unless the supremum in question is infinite, in which case $nL^*(n)$ approaches infinity.
Of course, it would be enough to assume only that $L(l; \lambda)$ and $H(\lambda; x_1)$ are well behaved at some sequence of values of $\lambda$ on which the supremum in question is approached. In particular, if the supremum is actually attained at some $\lambda$, they need only be well behaved there.
Now, turning to the sequence of maximum-likelihood estimates, let them be denoted for the moment by $\hat{l}(n)$. It is known that under rather general hypotheses $n^{1/2}(\hat{l}(n) - \lambda)$ is asymptotically normal about zero with asymptotic variance $1/H(\lambda; x_1)$.† This suggests, and examples tend to confirm, that, under some supplementary conditions,
$$\lim_{n \to \infty} nE([\hat{l}(n) - \lambda]^2 \mid \lambda) = \frac{1}{H(\lambda; x_1)}. \tag{27}$$
Indeed, one set of conditions implying (27) is stated in [W5], but one that seems difficult to apply. It can be shown that (27), together with the usual asymptotic behavior of $\hat{l}(n)$, implies
$$\lim_{n \to \infty} nL(\hat{l}(n); \lambda) = \frac{\alpha(\lambda)}{H(\lambda; x_1)}, \tag{28}$$
provided, for example, that $L(l; \lambda)$ is bounded for each $\lambda$ and that the second derivative of $L(l; \lambda)$ with respect to $l$ exists when $l = \lambda$. Easily applied rigorous theorems implying (28), much less (27), do not seem to have been formulated yet; but examples suggest that, under conditions general enough for many applications, (28) actually does hold uniformly, in the sense that, for $n$ sufficiently large,
$$\frac{(1 - \epsilon)\alpha(\lambda)}{nH(\lambda; x_1)} \le L(\hat{l}(n); \lambda) \le \frac{(1 + \epsilon)\alpha(\lambda)}{nH(\lambda; x_1)} \tag{29}$$
for all $\lambda$ simultaneously. If (29) holds, then, in view of Corollary 1, $\hat{l}(n)$ is nearly minimax for large $n$, in the sense that
$$L^*(n) \ge (1 - \epsilon) \sup_\lambda L(\hat{l}(n); \lambda). \tag{30}$$
Good examples can be based on (a) of Tables 3.1 and 4.1, letting $L(l; p)$ be any loss function having two continuous derivatives in $l$ throughout $0 \le l, p \le 1$. In particular, the example discussed in § 13.4 arises, if $L(l; p) = (l - p)^2$. It can be argued that the phenomenon discussed in connection with that example is probably not rare;
† Some key references for the asymptotic behavior of $\hat{l}(n)$ are [K2], [C9], [L3], [WIG], [N.]. The literature on this subject is extraordinarily complicated. There are acknowledged mathematical mistakes in some of its most sophisticated publications; others prove much less than any but the most attentive reader would be led to suppose; few give an adequate statement of their relations to their predecessors; and those that make serious pretensions to rigor involve complicated hypotheses. For documentation of this lament see [N4], [W.j], and [L3].
because, for minimax $l(n)$, $L(l(n); \lambda)$ is, judging from examples, often constant and, therefore, nearly equal to $\sup_\lambda \alpha(\lambda)/nH(\lambda; x_1)$, but $L(\hat{l}; \lambda)$ closely follows the rise and fall of $\alpha(\lambda)/nH(\lambda; x_1)$.
Turn now to Criterion 9, efficiency. It seems difficult to defend the criterion as it has been defined in connection with (4.8); for what virtue is there in the asymptotic normality required by (4.8)? It is perhaps noteworthy that the sequence of minimax estimates, $l(n)$, arising in connection with § 13.4 does not satisfy (4.8). Indeed, (13.4.3) implies that $n^{1/2}(l(n) - p)$ is asymptotically normal not about zero, but about $(\frac{1}{2} - p)$. It is my impression that the essence of the efficiency concept resides not in asymptotic normality, but in the overall behavior of the mean square error of a sequence of estimates. I therefore propose tentatively to modify the definition and to call a sequence of estimates $l(n)$ efficient, if and only if its mean square error behaves at least as well as can typically be expected for a sequence of maximum-likelihood estimates. Formally, I propose to call $l(n)$ efficient, if and only if, for $n$ sufficiently large,
$$E([l(n) - \lambda]^2 \mid \lambda) \le \frac{(1 + \epsilon)}{nH(\lambda; x_1)} \tag{31}$$
for every $\lambda$ simultaneously.
I think the main objection that is likely to be raised to this proposed definition is associated with the possibility that in some problems of theoretical, and perhaps also of practical, importance (31) is not satisfied by any sequence of estimates whatsoever, though the maximum-likelihood sequence is efficient in the "official" sense. In such a problem, are the maximum-likelihood estimates not as good for all practical purposes for sufficiently large $n$ as though their variances were actually equal to those of the normal distributions to which they approximate? It is natural to think so by analogy with other contexts in the theory of probability, but approximate normality is actually no substitute for (31) in the present context. The next paragraph is devoted to an example illustrating the inadequacy of asymptotic variance as a measure of asymptotic loss. It can be skipped without loss by anyone not interested in such technicalities.
The best example I have been able to construct is derived from a sequence of observations that is not a standard sequence. Whether the interesting features that it exhibits can actually be realized by standard sequences, I do not know; but the example will do to illustrate the issue. Let $y(n)$ be any real random variable subject to the density
14.6) .~(
BEHAVIORALISTIC REVIEW OF ESTIMATION
-
248
~)nHi n), defined thus: .;(z; n) is the standard uormal deDBity
iDSide the interval [-I(n), &(n)J, ,en) being such that the standard normal probability of this interval is (1 - n-1); .CZj n) =a ,-2'(2n)/4 for 1(27&) ~ I z I s ft~; +(z; n) is 80 defined elsewhere &8 to be a symmetric positive probability density with the first two momenta finite, with a bounded derivative approaching zero like z~ with increasing z, and with unique abeolute'maximum at z -=- O. It is evident that ftH (J(n) - A) is asymptoticaUy normal about aero with unit variance. The iDformation H().; y{n» is well defined (even accOrdiDg to tbe strict cODditioDl imposed by Cram&, Lemma 1, Section 32.2 of [09]). The maximum-likelihood estimates of A are 1{n), and these are also (accordiDg to Theorem 3.3 of IGll) minimax for the simple quadratic 1088 function (I - ).)2. But (32)
E{(y(n) - ).J2 ] ).) .. E(y(n)21 0)
~ 2nH
f
1
"a.{ynH; 1l) dll
I('~)t,-"
==
In-~[l
- 1(2n)n-H ] 8(2n),
which does not satisfy (31). Even for the bounded, and therefore more realistic, loss function,
$$L(l; \lambda) = \min\{1, [l - \lambda]^2\}, \tag{33}$$
it follows easily from Theorem 3.3 of [G1] that every estimate must somewhere incur a loss at least as great as the lower bound established by (32). To summarize, there are no estimates efficient in the sense of (31), nor even in the sense that would arise from (31) on replacing the simple quadratic loss function by a bounded loss function; the sequence of estimates $y(n)$ is efficient in the official sense, so to speak, but does not, of course, result in losses of the order of $n^{-1}$.
What can be said in positive justification of the criterion of efficiency as defined by (31) or the like? Roughly, the elements of such a sequence nearly dominate every estimate for every smooth loss function. A little more precisely, for large $n$, the loss associated with an element of a sequence efficient in the sense of (31) is at most larger by a small fraction than that of any other estimate, except possibly in some short intervals.† The maximum loss of such an element is at most larger by a small fraction than the minimax loss, so the elements of the sequence are typically nearly minimax. Moreover, they typically have considerably smaller losses than any minimax estimate, except in short intervals that are typically very improbable a priori in the personal sense. Thus the principle of admissibility, the minimax rule, and the personalistic concept of probability combine to suggest that efficiency as defined by (31) is a promising guide in the search for good estimates.
† It has actually been demonstrated that the total length of these exceptional intervals (within any fixed interval) is small [L3].
An extensive critique of the concept of efficiency, including much material on its history, has been given by LeCam in [L3], which unfortunately was not available to me in its entirety as I wrote this section. R. A. Fisher's name is the most prominent in the history of maximum-likelihood estimation and efficiency. Some historical details are given in [N4] and on p. 45 of Vol. II of [K2].
7 A behavioralistic review, concluded
Criteria 6 (unbiasedness) and 7 are now the only ones in the list for which I have not suggested some justification in terms of the theory of decision problems, and, indeed, I cannot. Unbiased estimates fascinate
many theoretical statisticians, including myself, and the study of them undoubtedly has certain valuable by-products. Yet it is now widely agreed that a serious reason to prefer unbiased estimates seems never to have been proposed. Three weak defenses are sometimes heard. First, unbiasedness is asserted to have an intuitive appeal; whether it does or not depends, of course, on the experience of the intuiter. Second, averages of increasingly many unbiased estimates are typically consistent. If this is a virtue, it is a limited one and pertains to the unbiased estimate not as an estimate, but as a step in the definition of other estimates. Third, an allusion is made to equity. If, for example, it has been agreed that one party will buy a sack of sugar from another at so much per pound, it seems fair that the nominal weight of the sack be determined by unbiased estimate. This ethical conclusion could perhaps be given some justification in terms of approximately linear utility functions or a long-run argument, though there is danger of falling into such pitfalls as the conclusion that accuracy is unimportant for equity; and it might find some application in the theory of barter; but it seems, at best, tangential to estimation in the sense of the present chapter.
For a proper appraisal of the criterion of unbiasedness it should be realized that, even if $\lambda$ admits an unbiased estimate, many not-at-all pathological functions of $\lambda$ (which can in turn be regarded as parameters) may fail to do so, and that such unbiased estimates as $\lambda$ does admit may be preposterous. These phenomena are both illustrated by the following simple example. Let $x$ be confined to two values, say 1 and 2; let $P(1 \mid \lambda) = 1 - P(2 \mid \lambda) = \lambda$; and let $\lambda$ be confined to the interval
$[1/3, 2/3]$. Then, by definition, $l$ is an unbiased estimate of $\phi(\lambda)$, if and only if $l(1)\lambda + l(2)(1 - \lambda) = l(2) + (l(1) - l(2))\lambda = \phi(\lambda)$, a condition that can be met, if and only if $\phi$ is linear. Suppose, for example, $\phi(\lambda) = \lambda$ for every $\lambda$; then $l(1) = 1$, $l(2) = 0$ defines the only unbiased estimate of $\phi(\lambda)$. This estimate is worse, according to an emphatic variant of Criterion 3, than the biased estimate $l'$ such that $l'(1) = 2/3$ and $l'(2) = 1/3$; for $l'$ (when it errs at all) errs in the same direction as $l$, but never nearly as far.
As for Criterion 7, it is on first encounter appealing to postulate that, if $l$ is usually closer to $\lambda$ than $l'$ is, then $l$ is better than $l'$. But, speaking at least for myself, the initial appeal of Criterion 7 seems to have been bound up with the conjecture that Criterion 7 is in some sense of the same sort as Criterion 3. The example given under Criterion 7 almost entirely evaporates the conjecture, and with it the appeal. In the paper [P5] in which the criterion is put forward for consideration and exploration, Pitman mentions that the criterion seems acceptable in contexts where "the devil takes the hindmost." This allusion to the devil seems to offer no justification for the criterion as a criterion of estimation, for I understand the allusion to refer only to the following kind of decision problem, which is quite remote from estimation as ordinarily understood and is hardly ever encountered: A person must choose between $l$ and $l'$, winning a prize if the estimate of his choice falls closer to $\lambda$ than does the other one. According to Pitman, the relationship of "better than," or "closer than" as he calls it, defined by Criterion 7, is not necessarily transitive. He argues, I think with some justice, that this breakdown of transitivity does not in itself invalidate the criterion when the criterion is applied to select the "best" from some prescribed class of estimates; but "best" cannot here be taken literally.
Criterion 7 is unusual in that it depends on the joint conditional distributions of pairs of estimates rather than on the distributions of each estimate considered separately. On any ordinary interpretation of estimation known to me, it can be argued (as it was under Criterion 3) that no criterion need depend on more than the separate distributions.
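The two-valued example of unbiased estimation given earlier in this section is small enough to check exhaustively. The sketch below is an illustration only; the grid of parameter values and the dictionary representation of the two estimates are assumptions made for the sake of the example.

```python
from fractions import Fraction

# x takes the values 1 and 2 with P(1 | lam) = lam and P(2 | lam) = 1 - lam.
unbiased = {1: Fraction(1), 2: Fraction(0)}      # l(1) = 1, l(2) = 0
biased = {1: Fraction(2, 3), 2: Fraction(1, 3)}  # l'(1) = 2/3, l'(2) = 1/3


def expectation(est, lam):
    """E(est | lam) for an estimate stored as a dict keyed by x."""
    return est[1] * lam + est[2] * (1 - lam)


def errors(lam):
    """For each x, the pair (error of l, error of l') at the value lam."""
    return {x: (unbiased[x] - lam, biased[x] - lam) for x in (1, 2)}


# An assumed grid of 101 parameter values running from 1/3 to 2/3.
grid = [Fraction(1, 3) + Fraction(k, 300) for k in range(101)]

for lam in grid:
    assert expectation(unbiased, lam) == lam          # l is unbiased
    for e, e_prime in errors(lam).values():
        assert e * e_prime >= 0                       # same direction, or no error
        assert abs(e) - abs(e_prime) == Fraction(1, 3)
print("l' is closer than l by exactly 1/3 for every x and every lam on the grid")
```

Whatever is observed, and whatever the parameter value in the interval, the biased estimate lies exactly 1/3 closer to the parameter than the unbiased one.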
CHAPTER 16
Testing
1 Introduction
In principle, this chapter on the statistical process of testing (often referred to more fully as making tests of hypotheses or significance tests) might have been organized on the pattern of the preceding chapter on point estimation: a statement of verbalistic ideas, followed by motivation and criticism in terms of behavioralistic ideas. But I am dissuaded from repeating that pattern by several considerations. It would, in the first place, be needlessly repetitious. Thus, in the presence of the preceding chapter I need mention only in passing that sufficient statistics and symmetry play the same role in testing as in other observational decision problems, and that a certain scheme of testing, closely related to maximum-likelihood estimation, has asymptotic, or large sample, virtues. Again, the pattern of the preceding chapter is less attractive here, because the criteria for tests developed in the verbalistic tradition do not on the whole seem to have such satisfying behavioralistic motivation as do their counterparts in the theory of point estimation. Finally, it is inappropriate to attempt anything like a complete list of verbalistic criteria for tests here, especially in view of the availability of two excellent and mutually complementary key references (Chapters 21, 26, and 71 of [K2]; and [L4]).
The organization actually adopted is this: First, testing and criteria for tests are discussed from a frankly behavioralistic viewpoint. In this discussion ideas stemming from the verbalistic tradition are used freely, and some criteria of the verbalistic tradition are criticized. Second, an attempt is made to analyze some of the important statistical situations to which the theory of testing is ordinarily applied. It is becoming increasingly recognized that many of these applications are very crude, and that their replacement by sounder procedures constitutes some of the most important and provocative statistical problems of today.
Terms introduced in boldface in this chapter are among the most frequent in ordinary statistical usage. The definitions given are intended to be in reasonable accord with that usage, but some small concessions are made to the particular form in which the theory of testing is expressed here.
2 A theory of testing
Verbalistically, the problem of testing means to guess, on the basis of observation, which of two disjoint and mutually exhaustive hypotheses obtains. Behavioralistically, this would generally be agreed to point to the definition: A testing problem is an observational decision problem derived from exactly two basic acts $f_0$ and $f_1$. These two basic acts are called (for a reason that will soon be clear) accepting and rejecting the null hypothesis, respectively.
Considered abstractly as bilinear games, testing problems may, so far as I know, have no special feature beyond the uninteresting one that one of two $f$'s is appropriate to each $i$. But, considered as observational problems, testing problems do present some interesting special features. In the first place, since at least one of the two basic acts is appropriate to each $i$, the set $I$ of all $i$'s can be partitioned into three sets, $H_0$, $H_1$, and $N$, defined thus:
$$\begin{aligned} L(f_0; i) &= 0 \text{ and } L(f_1; i) > 0 && \text{for } i \in H_0, \\ L(f_0; i) &> 0 \text{ and } L(f_1; i) = 0 && \text{for } i \in H_1, \\ L(f_0; i) &= 0 \text{ and } L(f_1; i) = 0 && \text{for } i \in N. \end{aligned} \tag{1}$$
When it is recalled that the $i$'s correspond to a partition $B_i$ of $S$, the sets $H_0$, $H_1$, and $N$ may, with a slight clash of logical gears, be regarded as three events partitioning $S$. The traditional names of $H_0$ and $H_1$ are the null and the alternative hypothesis, respectively; $N$, being quite unimportant and often either ignored or made vacuous by some trick of definition, has no such name. Rejecting the null hypothesis when it does in fact obtain and accepting it when it does not obtain are called errors, more specifically errors of the first and second kind, respectively.
A test is a derived act of a testing problem. A test may conveniently be identified with the real-valued contraction $z$ of the observation $x$, such that $z(x)$ is the probability prescribed by the test for rejection of the null hypothesis in case $x$ is observed. An unmixed test (which was until recently the only kind contemplated) corresponds to a $z$ confined to the two values 0 and 1, which respectively imply outright acceptance and rejection of the null hypothesis.
The loss associated with the test $z$ when $i$ obtains is clearly
$$\begin{aligned} L(z; i) &= L(f_0; i)E(1 - z \mid i) + L(f_1; i)E(z \mid i) \\ &= L(f_1; i)E(z \mid i) && \text{for } i \in H_0 \\ &= L(f_0; i)[1 - E(z \mid i)] && \text{for } i \in H_1 \\ &= 0 && \text{for } i \in N. \end{aligned} \tag{2}$$
The functions $E(z \mid i)$ and $[1 - E(z \mid i)]$ are, respectively, the probability of rejecting and accepting the null hypothesis with the test $z$ when $i$ obtains. There is obviously nothing to choose between them in importance or convenience, each being equivalent to the other. They are commonly called the power function and operating characteristic, respectively.
In view of (2), one test $z$ dominates another $z'$, if and only if
$$\begin{aligned} E(z \mid i) &\le E(z' \mid i) && \text{for } i \in H_0, \\ E(z \mid i) &\ge E(z' \mid i) && \text{for } i \in H_1; \end{aligned} \tag{3}$$
or, again, if and only if the probability of error with $z'$ is at least as great as with $z$ for every $i$. Thus, dominance, admissibility, and equivalence depend on the basic loss function, $L(f_r; i)$, only in so far as that function determines $H_0$ and $H_1$. This is not only remarkable but also useful; for $H_0$ and $H_1$ may well be clearly defined in contexts where the basic loss is vague, or otherwise ill determined.
If $z$ is admissible in the spirit of (3) relative to a pair of sets $H_0$ and $H_1$, then (if $\infty$ is for the moment admitted as a possible value for a loss) there exists a basic loss function leading to $H_0$ and $H_1$ and having $z$ as its essentially unique minimax. Indeed, let
$$\begin{aligned} L(f_0; i) &= [1 - E(z \mid i)]^{-1} && \text{for } i \in H_1 \\ &= 0 && \text{elsewhere;} \\ L(f_1; i) &= E(z \mid i)^{-1} && \text{for } i \in H_0 \\ &= 0 && \text{elsewhere.} \end{aligned} \tag{4}$$
With this loss and reckoning $0 \cdot \infty = 0$ (as is appropriate here), $L(z; i) = 1$ or 0, according as there is or is not positive probability of making an error at $i$ with $z$. In view of (2) and (4), any minimax $z'$ not equivalent to $z$ would strictly dominate $z$, contrary to the assumption that $z$ is admissible.
The moral of that conclusion can be put thus: Without special assumptions about the basic loss, the principle of admissibility and the minimax rule lead to no criteria expressible solely in terms of $H_0$, $H_1$, and the conditional distributions of the observation $x$ other than that of admissibility itself. Whether some other objectivistic principle could justify such criteria may be considered an open question, but, as I have already said (in § 15.1), no other general objectivistic principles have been seriously maintained.
It is natural, for example, to demand that $z$ have the same symmetry as $P(x \mid i)$ and $H_0$ and $H_1$; but that criterion can surely not be justified at all, unless the basic loss is also assumed to have the same symmetry, the justifiability of which in turn depends on the case.
To take another important example, it is often proposed that a satisfactory test must be unbiased,† that is, its power function must never be higher in $H_0$ than in $H_1$. More formally, the test $z$ is unbiased, if and only if
$$E(z \mid i_0) \le E(z \mid i_1) \tag{5}$$
for every $i_0 \in H_0$ and every $i_1 \in H_1$.
Assuming that $L(f_0; i)$ and $L(f_1; i)$ are constant in $H_1$ and $H_0$, respectively, it will be shown that any minimax must be unbiased. As a step toward that demonstration, consider a testing problem as a minimax problem, without any special assumption about the basic loss. It is possible that $L^* = 0$, in which case the minimax tests are all equivalent and all unbiased. Putting that possibility aside, I assert, and will show, that (under the usual mathematical simplifications)
$$\max_{i \in H_0} L(z; i) = \max_{i \in H_1} L(z; i) = L^* \tag{6}$$
for any minimax $z$. It is obvious that neither maximum exceeds $L^*$, and also that one or the other must equal $L^*$. But suppose, for example, that the second maximum were actually less than $L^*$, and consider $z' = \alpha z$ with $0 < \alpha < 1$. According to (2), if $z'$ is substituted for $z$, the first maximum in (6) will be depressed, and, for $\alpha$ sufficiently close to 1, the second would remain actually less than $L^*$, which contradicts the assumption that $z$ is minimax, establishing (6).
Now make the special assumption that
$$\begin{aligned} L(f_0; i) &= A && \text{for } i \in H_1, \\ L(f_1; i) &= B && \text{for } i \in H_0, \end{aligned} \tag{7}$$
and suppose that $z$ could be minimax but biased. There would then
† A definition unifying the various concepts of unbiasedness in statistics is put forward in [L5].
exist $i_0 \in H_0$ and $i_1 \in H_1$ such that
$$L^* = L(z; i_0) = BE(z \mid i_0) = A - AE(z \mid i_1) = L(z; i_1), \tag{8}$$
and such that $E(z \mid i_0) > E(z \mid i_1)$. But consideration of the test that simply assigns to every $x$ the number $\beta$ midway between $E(z \mid i_0)$ and $E(z \mid i_1)$ shows that $z$ could not be minimax.
The condition (7) is a reasonable assumption in some testing problems, and, where (7) is satisfied, the criterion of unbiasedness has such support as the minimax rule can give. In many other typical testing problems, however, there are borderline errors that hardly matter at all but can scarcely be prevented, and serious errors that can largely be prevented. The following example, which can be varied to suit diverse tastes, shows that it can be folly to insist on unbiasedness in such problems. Let $i$ take the three values 0, 1, 2, and let $x$ take the values 0 and 1 with conditional probabilities defined thus:
$$P(0 \mid 0) = 99/100, \qquad P(0 \mid 1) = 0, \qquad P(0 \mid 2) = 1. \tag{9}$$
Let the basic loss be defined by the condition that $i \in H_0$ or $i \in H_1$, according as $i = 0$ or not, and by
$$L(f_1; 0) = 1, \qquad L(f_0; 1) = 1, \qquad L(f_0; 2) = 1/101. \tag{10}$$
Then
$$\begin{aligned} L(z; 0) &= [99z(0) + z(1)]/100 \\ L(z; 1) &= 1 - z(1) \\ L(z; 2) &= [1 - z(0)]/101. \end{aligned} \tag{11}$$
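The losses (11) and the unbiasedness condition (5) are easy to explore by brute force in this three-valued example. The sketch below is an illustration, not a proof: it assumes a grid restricted to tests whose values $z(0)$ and $z(1)$ are multiples of 1/101, and the function names are hypothetical.

```python
from fractions import Fraction


def losses(z0, z1):
    """Losses (11) at i = 0, 1, 2 of the test with z(0) = z0 and z(1) = z1."""
    return ((99 * z0 + z1) / 100, 1 - z1, (1 - z0) / 101)


def rejection_probabilities(z0, z1):
    """E(z | i) for i = 0, 1, 2 under the conditional probabilities (9)."""
    return (Fraction(99, 100) * z0 + Fraction(1, 100) * z1, z1, z0)


def is_unbiased(z0, z1):
    """Condition (5): here H0 consists of i = 0, and H1 of i = 1 and i = 2."""
    e0, e1, e2 = rejection_probabilities(z0, z1)
    return e0 <= e1 and e0 <= e2


# Assumed search grid: every pair of multiples of 1/101 in [0, 1].
grid = [Fraction(k, 101) for k in range(102)]
worst = {(z0, z1): max(losses(z0, z1)) for z0 in grid for z1 in grid}

best = min(worst.values())
minimax_tests = [z for z, w in worst.items() if w == best]
unbiased_tests = [z for z in worst if is_unbiased(*z)]

print(best, minimax_tests)
print(all(z0 == z1 for z0, z1 in unbiased_tests))
```

On this grid the smallest maximum loss is 1/101, attained only at $z(0) = 0$, $z(1) = 100/101$, where all three losses are equal, and every unbiased test has $z(0) = z(1)$.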
It is easily verified that the only minimax $z^*$ is defined by $z^*(0) = 0$, $z^*(1) = 100/101$, and that $L(z^*; i) = L^* = 1/101$ for every $i$. But it is also easily verified that the only unbiased tests are absurd in that they ignore the observation $x$; they are in fact just those for which $z(0) = z(1)$.
It has until quite recently been said by many that attention should be confined to tests such that there is a fixed probability $\alpha$ (called the size of the test) of making an error of the first kind for every $i \in H_0$. Indeed, the criterion of size has often been taken so seriously as to be incorporated into the very definition of a test. Though many important tests happen to have a size, others equally important do not; so it now seems to be recognized [L4] that the possession of a size cannot be taken seriously as a criterion.† To take an everyday example, consider the binomial distributions
$$P(x \mid p) = \binom{101}{x} p^x (1 - p)^{101 - x}, \tag{12}$$
where the parameter $p$ confined to $[0, 1]$ plays the role of $i$ and $x = 0, \ldots, 101$; and suppose that $H_0$ is the hypothesis that $p < 1/2$. A test of size $\alpha$ is a test for which
$$\sum_x z(x) \binom{101}{x} p^x (1 - p)^{101 - x} = \alpha \tag{13}$$
for all $p < 1/2$. This obviously implies
$$\sum_x\, [z(x) - \alpha] \binom{101}{x} \left(\frac{p}{1 - p}\right)^x = 0 \tag{14}$$
for all $p < 1/2$, whence $z(x) = \alpha$ for every $x$. So only absurd tests have size, in this example, though there are clearly tests here that are quite satisfactory for many applications; for example, let $z(x)$ equal 0 or 1 according as $x \le 50$ or $x > 50$.
In view of the criticism just made, there is a tendency to redefine size so that any test has a size $\alpha$, namely,
$$\alpha =_{Df} \max_{i \in H_0} E(z \mid i). \tag{15}$$
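The binomial example can be made concrete by computing the power function of the test that rejects when $x > 50$. This is a minimal sketch under the assumptions of the example (101 trials, $H_0$: $p < 1/2$); the function name `power` and the particular values of $p$ are illustrative choices.

```python
from fractions import Fraction
from math import comb

N = 101  # number of trials in the example


def power(p, z=lambda x: 1 if x > 50 else 0):
    """E(z | p): the probability of rejecting H0 (p < 1/2) when p obtains."""
    return sum(z(x) * comb(N, x) * p**x * (1 - p) ** (N - x) for x in range(N + 1))


# The rejection probability varies with p inside H0, so this test has no
# fixed size in the older sense.
for p in (0.1, 0.3, 0.45, 0.49):
    print(p, power(p))

# At the boundary p = 1/2 the rejection probability is exactly 1/2, so the
# rejection probabilities inside H0 approach, without attaining, 1/2.
print(power(Fraction(1, 2)))

# A constant test z(x) = alpha rejects with probability alpha for every p.
print(power(Fraction(1, 3), z=lambda x: Fraction(1, 20)))
```

Under the redefinition (15), the size of this test would be governed by its behavior near $p = 1/2$, where the rejection probability comes arbitrarily close to 1/2.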
In terms of this definition of size, a concept of testing somewhat different from that proposed in this section has been defined and defended (Wald, p. 21 of [W3], and Lehmann, pp. li-l8 of [L4]); namely, it is postulated that a test is to be chosen not from among all possible tests, but only from among those having a size $\alpha$ (in the sense of (15)) given as part of the testing problem.‡ This concept of testing is not defended to the exclusion of the one proposed here, but it is asserted by the authors cited to be more realistic for some problems. The arguments of both authors on this point are similar and, I think, quite weak in two crucial places, for the advantage is supposed to flow in some unspecified way from the undemonstrated impossibility of comparing preferences for consequences of qualitatively different kinds. It seems, if I may be allowed such a conjecture, that the concept of testing under a
† Statisticians interested in the Behrens-Fisher problem may be interested [...] pp.
‡ The constraint actually imposed, especially by Lehmann [L4], is that the size be at most $\alpha$. But, as Lehmann explains, this difference is more apparent than real.
constraint of size represents a Procrustean attempt to fit the (older) Neyman-Pearson theory of testing hypotheses too closely with the (newer) minimax theory. It is not to be denied, of course, that there may sometimes be a mathematical advantage in studying and comparing tests of given size.
It should be mentioned, before concluding the subject, that any theory taking size seriously introduces an asymmetry of the theory with respect to $H_0$ and $H_1$, an asymmetry that is surely not always appropriate.
Significance level, or level of significance, is a synonym (neglecting a slight distinction made in [L4]) of size, probably more widely used than size itself.
3 Testing in practice
The theory of testing admits some fairly realistic applications, but the present state of statistics is such that the theory of testing is invoked more often than not in problems on which it does not bear squarely. This section discusses typical applications of the theory, pointing out the shortcomings I am aware of.
The development of the theory of testing has been much influenced by the special problem of simple dichotomy, that is, testing problems in which $H_0$ and $H_1$ have exactly one element each. Simple dichotomy is susceptible of neat and full analysis (as in Exercise 7.5.2 and in § 1",.4), likelihood-ratio tests here being the only admissible tests; and simple dichotomy often gives insight into more complicated problems, though the point is not explicitly illustrated in this book.
Coin and ball examples of simple dichotomy are easy to construct, but instances seem rare in real life. The astronomical observations made to distinguish between the Newtonian and Einsteinian hypotheses are a good, but not perfect, example, and I suppose that research in Mendelian genetics sometimes leads to others. There is, however, a tradition of applying the concept of simple dichotomy to some situations to which it is, to say the best, only crudely adapted. Consider, for example, the decision problem of a person who must buy, $f_0$, or refuse to buy, $f_1$, a lot of manufactured articles on the basis of an observation $x$. Suppose that $i$ is the difference between the value of the lot to the person and the price at which the lot is offered for sale, and that $P(x \mid i)$ is known to the person. Clearly, $H_0$, $H_1$, and $N$ are sets characterized respectively by $i > 0$, $i < 0$, $i = 0$. This analysis of this, and similar, problems has recently been explored in terms of the minimax rule, for example by Sprowls [S16] and a little more fully by Rudy [R4], and by Allen [AS]. It seems to me natural and promising for many fields of
application, but it is not a traditional analysis. On the contrary, much
literature recommends, in effect, that the person pretend that only two values of $i$, $i_0 > 0$ and $i_1 < 0$, are possible and that the person then choose a test for the resulting simple dichotomy. The selection of the two values $i_0$ and $i_1$ is left to the person, though they are sometimes supposed to correspond to the person's judgment of what constitutes good quality and poor quality, terms really quite without definition. The emphasis on simple dichotomy is tempered in some acceptance-sampling literature, where it is recommended that the person choose among available tests by some largely unspecified overall consideration of operating characteristics and costs, and that he facilitate his survey of the available tests by focusing on a pair of points that happen to interest him and considering the test whose operating characteristic passes (economically, in the case of sequential testing) through the pair of points. These traditional analyses are certainly inferior in the theoretical framework of the present discussion, and I think they will be found inferior in practice.
To make a small digression, there is a complication in connection with testing whether to buy that is not ordinarily envisaged by statistical theory; namely, the economic reaction between the buyer and the supplier. If, for example, the supplier knows the test the buyer is going to apply, that knowledge will influence the quality of the lot supplied. There seems to be little, if any, successful work on the economic problem thus raised about the game-like behavior of the two people involved (cf. pp. 331, 340, and 346 of [W6]).
The problem whether to buy a lot obviously has many formal counterparts in other domains. In some of them it is particularly clear that purely objectivistic methods do not suffice.
To illustrate, imagine two experiments: one designed to determine whether it is advantageous to add a certain small amount of sodium fluoride to the drinking water of children, the other to determine whether the same amount of oil of peppermint is advantageous. Granting that each of the two additions can be made at the same cash cost for labor and material and that the designs of the two hypothetical experiments differ only in the interchange of the roles of sodium fluoride and oil of peppermint, the corresponding testing problems are objectivistically completely parallel, that is, the same with regard to loss function and conditional probability of the observations. But it must be acknowledged, I think, that the people actually charged with the decision in either of these two cases would and should take into account opinions they had before the observation. For example, they might originally have considered it nearly impossible that the oil of peppermint could result in any hygienic advantage large
enough to compensate for even the small cost of its administration, but, in view of recent dental researches on the subject, they might not have considered it at all unlikely that the sodium fluoride should have an overall advantage. In that case, parallel observations in the two experiments would not always lead to parallel decisions. Objectivists typically admit such a possibility but go on to say that it is unreasonable to isolate the experiment and that it is the totality of information bearing on the subject that should be treated objectivistically. If objectivists could give a more detailed discussion of how to deal with such a totality of information, it might do much to clarify their position.

I turn now to a different and, at least for me, delicate topic in connection with applications of the theory of testing. Much attention is given in the literature of statistics to what purport to be tests of hypotheses, in which the null hypothesis is such that it would not really be accepted by anyone. The following three propositions, though playful in content, are typical in form of these extreme null hypotheses, as I shall call them for the moment.

A  The mean noise output of the cereal Krakl is a linear function of the atmospheric pressure, in the range from 900 to 1,100 millibars.
B  The basal metabolic consumption of sperm whales is normally distributed [W11].
C  New York taxi drivers of Irish, Jewish, and Scandinavian extraction are equally proficient in abusive language.

Literally to test such hypotheses as these is preposterous. If, for example, the loss associated with $f_1$ is zero, except in case Hypothesis A is exactly satisfied, what possible experience with Krakl could dissuade you from adopting $f_1$? The unacceptability of extreme null hypotheses is perfectly well known; it is closely related to the often heard maxim that science disproves, but never proves, hypotheses. The role of extreme hypotheses in science and other statistical activities seems to be important but obscure. In particular, though I, like everyone who practices statistics, have often "tested" extreme hypotheses, I cannot give a very satisfactory analysis of the process, nor say clearly how it is related to testing as defined in this chapter and other theoretical discussions. None the less, it seems worth while to explore the subject tentatively; I will do so largely in terms of two examples.

Consider first the problem of a cereal dynamicist who must estimate the noise output of Krakl at each of ten atmospheric pressures between 900 and 1,100 millibars. It may well be that he can properly regard the
problem as that of estimating the ten parameters in question, in which case there is no question of testing. But suppose, for example, that one or both of the following considerations apply. First, the engineer and his colleagues may attach considerable personal probability to the possibility that A is very nearly satisfied; very nearly, that is, in terms of the dispersion of his measurements. Second, the administrative, computational, and other incidental costs of using ten individual estimates might be considerably greater than those of using a linear formula.

It might be impractical to deal with either of these considerations very rigorously. One rough attack is for the engineer first to examine the observed data $x$ and then to proceed either as though he actually believed Hypothesis A or else in some other way. The other way might be to make the estimate according to the objectivistic formulae that would have been used had there been no complicating considerations, or it might take into account different but related complicating considerations not explicitly mentioned here, such as the advantage of using a quadratic approximation. It is artificial and inadequate to regard this decision between one class of basic acts or another as a test, but that is what in current practice we seem to do. The choice of which test to adopt in such a context is at least partly motivated by the vague idea that the test should readily accept, that is, result in acting as though the extreme null hypotheses were true, in the farfetched case that the null hypothesis is indeed true, and that the worse the approximation of the null hypotheses to the truth the less probable should be the acceptance.

The method just outlined is crude, to say the best. It is often modified in accordance with common sense, especially so far as the second consideration is concerned.
Thus, if the measurements are sufficiently precise, no ordinary test might accept the null hypotheses, for the experiment will lead to a clear and sure idea of just what the departures from the null hypotheses actually are. But, if the engineer considers those departures unimportant for the context at hand, he will justifiably decide to neglect them.

Rejection of an extreme null hypothesis, in the sense of the foregoing discussion, typically gives rise to a complicated subsidiary decision problem. Some aspects of this situation have recently been explored, for example by Paulson [P3], [P4]; Duncan [D11], [D12]; Tukey [T4], [T5]; Scheffé [S7]; and W. D. Fisher [F7].

To summarize abstractly, I would say that, in current practice, so-called tests of extreme hypotheses are resorted to when at least a little credence is attached to the possibility that the null hypothesis is very nearly true and when there is some special advantage to behaving as
though it were true.

One other illustration will make it clear that point estimation is not essential to the situation and that belief in the approximate truth of the null hypothesis alone does not always justify testing. Consider the personnel manager of a great New York taxi company. Wishing, of course, that his drivers should be as proficient as possible, he would, under simple circumstances, hire exclusively from the national-extraction group that had obtained the highest mean score in a standard proficiency examination; for why should he not be guided by a positive indication, however slight? A statistical test of the extreme Hypothesis C would not, therefore, be called for, as has been pointed out in general terms by Bahadur and Robbins [B2]. Even strong belief that ethnic differences are extremely small in the respect in question would not alone be any reason for departing from this simple policy, dictated by the principle of admissibility, quite in contrast to the example framed around Hypothesis A. If, however, public opinion, a shortage of labor, or administrative difficulty militates against any discrimination at all, the manager may resort to a test based on the examination scores.

In practice, tests of extreme hypotheses are typically chosen from a relatively small arsenal of standard types, or families, each family consisting of one unmixed test at every significance level (as it is always called in this context). In publications, it is standard practice not simply to report the result of a test, but rather to report that level of significance for which the corresponding test of the relevant family would be on the borderline between acceptance and rejection. The rationale usually given for this procedure is that it enables each user of the publication to make his own test at the significance level he deems appropriate to his particular problem. Thus the significance level is supposed to play much the same practical role as a sufficient statistic. An interesting contribution to the theory of extreme hypotheses is given by Bahadur [B1] in the special context of the two-sided t-test.
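The reporting practice just described can be made concrete with a small numerical sketch. It assumes, purely for illustration, a family of two-sided tests based on a statistic that is standard normal under the null hypothesis; the text does not single out this family, and the function names are hypothetical.

```python
import math


def borderline_level(z):
    """Significance level at which the two-sided normal test is exactly borderline.

    Assumes the test statistic z is standard normal under the null hypothesis.
    """
    # P(|Z| >= |z|) for Z ~ N(0, 1), via the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2.0))


def accepts(z, level):
    """A reader's own test: accept when the reported borderline level exceeds his level."""
    return borderline_level(z) > level
```

A reader who is handed only the value of `borderline_level(z)` can carry out the test of this family at whatever level he chooses, which is the sense in which the reported level is said to serve much as a sufficient statistic.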
CHAPTER 17

Interval Estimation and Related Topics

1 Estimates of the accuracy of estimates

The doctrine is often expressed that a point estimate is of little, or no, value unless accompanied by an estimate of its own accuracy. This doctrine, which for the moment I will call the doctrine of accuracy estimation, may be a little old-fashioned, but I think some critical discussion of it here is in order for two reasons. In the first place, the doctrine is still widely considered to contain more than a grain of truth. For example, many readers will think it strange, and even remiss, that I have written a long chapter (Chapter 15) on estimation without even suggesting that an estimate should be accompanied by an estimate of its accuracy. In the second place, it seems to me that the concept of interval estimation, which is the subject of the next section, has largely evolved from the doctrine of accuracy estimation and that discussion of the doctrine will, for some, pave the way for discussion of interval estimation.

The doctrine of accuracy estimation is vague, even by the standards of the verbalistic tradition, for it does not say what should be taken as a measure of accuracy, that is, what an estimate of accuracy should estimate. Any measure would be rather arbitrary; a typical one, here adopted for definiteness, is the root-mean-square error,

(1)  $E^{1/2}([\mathbf{l} - \lambda(i)]^2 \mid B_i) = \{V(\mathbf{l} \mid B_i) + [E(\mathbf{l} \mid B_i) - \lambda(i)]^2\}^{1/2}$,

using (15.6.23). The root-mean-square error reduces to the standard deviation, $V^{1/2}(\mathbf{l} \mid B_i)$, in case the estimate $\mathbf{l}$ is unbiased.

Taking the doctrine literally, it evidently leads to endless regression, for an estimate of the accuracy of an estimate should presumably be accompanied by an estimate of its own accuracy, and so on forever.

Even supposing that the doctrine were somehow purged of vagueness and endless regression, it would still be in clear conflict with the behavioralistic concept of estimation studied in Chapter 15. If a decision
problem consists in deciding on a number in the light of an observation, the person concerned wants to adopt an $\mathbf{l}$ that is, in some sense or other, as good as possible; but, since he must make some decision, it could at most satisfy idle curiosity to know how good the best is; idle, I say, because, his decision once made, there is no way to use knowledge of its accuracy. Since it seems to me that the kind of problem envisaged in Chapter 15 is of frequent occurrence and may properly be called estimation, I am inclined to say that the doctrine of accuracy estimation is erroneous. However, it is possible that someone should point out a different class of problems, also properly called problems of estimation, with respect to which the doctrine has some validity; though, so far as I know, this has not yet occurred.

One sort of situation that might, through what I would consider faulty analysis, seem to support the doctrine of accuracy estimation is illustrated by the following, highly schematized example. A person has to estimate the number $n$ of replacement parts of a certain sort that should be carried by an expedition. He can conduct a trial the outcome of which will, let us say, be an observation $x$ distributed in the Poisson distribution with mean equal to $acn$; that is,

(2)  $P(x \mid n) = e^{-acn}(acn)^x / x!$,

where $a$ is a known constant and $c$, which the person can choose, is the cost (beyond overhead) of the trial. Under reasonable hypotheses, once $c$ has been chosen and the value $x$ observed, $\hat{n}(x) = x/ac$ is a good estimate of $n$; and in so far as the problem is of the type envisaged in Chapter 15, that is the end of the matter. But there may be features of the problem that have not yet been stated, though in principle they should have been. In particular, it may be that the person is free to conduct a second trial, though there will typically be a high penalty for doing so. One rough, but sometimes natural and practical, step toward deciding whether a second trial is called for is to remark that $(\hat{n}/ac)^{1/2}$ is a good estimate of the root-mean-square error of $\hat{n}$ and may give a fairly good basis on which to judge whether the risk of misestimation warrants the expense of a second trial.

My own conviction is that we should frankly regard such a problem as has just been described as a special problem in sequential analysis and treat it as an organic whole. Viewed thus, $c$ is to be chosen in the light of the possibility of making a second trial. The decision to be based on $x$ is the complex one of whether to go to the expense of a second trial; if so, of what magnitude; and, if not, what estimate of $n$ to adopt.
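The claim that the estimate $x/ac$ has root-mean-square error $(n/ac)^{1/2}$ can be checked directly from the Poisson distribution (2). The following is a minimal sketch, assuming illustrative values of $n$, $a$, and $c$ that are not taken from the text; the function names are hypothetical, and the infinite sums are truncated at a point where the remaining probability is negligible for those values.

```python
import math


def poisson_pmf(x, mean):
    """P(x | mean) for the Poisson distribution, computed in log space for stability."""
    return math.exp(-mean + x * math.log(mean) - math.lgamma(x + 1))


def estimator_summary(n, a, c, terms=400):
    """Expected value and root-mean-square error of the estimate x / (a * c).

    The sums over x are truncated after `terms` values, an assumption that is
    harmless only when a * c * n is far below `terms`.
    """
    mean = a * c * n
    scale = a * c
    expected = sum((x / scale) * poisson_pmf(x, mean) for x in range(terms))
    mse = sum((x / scale - n) ** 2 * poisson_pmf(x, mean) for x in range(terms))
    return expected, math.sqrt(mse)
```

With, say, `estimator_summary(10, 0.5, 4)` the estimate is unbiased and its root-mean-square error agrees with $(n/ac)^{1/2}$; doubling the cost $c$ of the trial reduces that error by the factor $2^{1/2}$, which is the trade-off the person weighs in choosing $c$.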
Another sort of situation that seems to have stimulated the doctrine of accuracy estimation is the following. Suppose that a research worker has observed $x_1, \ldots, x_n$, which are independent and normally distributed about the mean $\mu$ with variance $\sigma^2$ given $\mu$ and $\sigma$. If he wishes to publish the results of his investigation for all concerned to use as their own needs and opinions may dictate, he should, ideally, publish a sufficient statistic of his observation, stating how it is distributed given $\mu$ and $\sigma$. Any other course may deprive some reader of some information he might be able to put to use. So far as the primary aim is concerned, all sufficient statistics are equivalent, but secondary considerations greatly narrow the research worker's choice. To illustrate, consider the five sufficient statistics the values of which for $\{x_1, \ldots, x_n\}$ are:

(a) $\{x_1, \ldots, x_n\}$.
(b) The $n$ order statistics of $\{x_1, \ldots, x_n\}$.
(c) $\sum x_i$ and $\sum x_i^2$.
(d) $\bar{x} = \sum x_i / n$ and $s^2 = (\sum x_i^2 - \bar{x} \sum x_i)/(n - 1)$.
(e) $\bar{x}$ and $s/n^{1/2}$.

If $n$ is at all large, (c), (d), and (e) are cheaper to publish than (a) and (b). Moreover, for almost any use to which a reader might wish to put the data, (c), (d), and (e) will save him a considerable amount of computation. In so far as it is true that almost any reader who has a use for the data at all will use $\bar{x}$, but not necessarily $\sum x_i$, statistics like (d) and (e) are slightly preferable to (c). There is something to be said both for (d) and for (e), in view of the ready availability of certain tables; but, at least when $n$ is very large, there is a slight advantage to (e) for those calculations a reader is most likely to perform. In particular, a reader using (e) can, when $n$ is large, often ignore the actual value of $n$. Even if the distributions of the $x_1, \ldots, x_n$ are not exactly normal, (c), (d), and (e) often can play almost the same role as sufficient statistics. It is no wonder then that (e) is often chosen as a convenient way to present data. But, in my opinion, it is a mistake to lay great theoretical emphasis on the fact that (e) happens to consist of what is ordinarily a good estimate of $\mu$, namely $\bar{x}$, together with what is ordinarily a good estimate of the root-mean-square error of that estimate, namely $s/n^{1/2}$.
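The summaries (c), (d), and (e) listed above are easily computed from one another. The sketch below is only an illustration, with a hypothetical function name; it follows the formulas as reconstructed in the list, in particular taking $s^2$ with the divisor $n - 1$.

```python
import math


def summaries(xs):
    """Return the statistics (c), (d), and (e) for a list of observations."""
    n = len(xs)
    total = sum(xs)                      # sum of the x_i
    total_sq = sum(x * x for x in xs)    # sum of the x_i squared
    mean = total / n
    s2 = (total_sq - mean * total) / (n - 1)
    return {
        "c": (total, total_sq),
        "d": (mean, s2),
        "e": (mean, math.sqrt(s2 / n)),  # s / n^(1/2)
    }
```

A reader who wants (d) from (c) or from (e) must know $n$ as well, so in practice $n$ is published alongside whichever summary is chosen.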
2 Interval estimation and confidence intervals

The verbalistic tradition has suggested a procedure different from point estimation but somehow related to it. This other procedure, here called interval estimation, can be defined as follows, though the definition is necessarily vague. Where $\mathbf{x}$ is an observation subject to the
conditional distributions $P(x \mid B_i)$ and $\lambda(i)$ is a function of $i$, guess that $\lambda(i)$ lies in some set $M(x)$ (to be called an interval estimate) determined for each value of $x$. It is almost a part of the definition to say that the function $M(x)$ is to be so chosen that $P(\lambda(i) \in M(\mathbf{x}) \mid B_i)$ shall be nearly 1 for every $i$ and that $M(x)$ should tend to be small and "close knit" in a geometrical sense, some compromise being effected between these two conflicting desiderata. The parameter $\lambda(i)$ could in principle be a very general function, but it will here be enough to suppose for definiteness and simplicity that $\lambda(i)$ is real. Though more general possibilities are contemplated in principle, the set $M(x)$ is in practice typically a bounded interval, which corresponds with what I meant in saying that $M(x)$ is supposed to be "close knit."

The idea of interval estimation is complicated; an example is in order. Suppose that, for each $\lambda$, $\mathbf{x}$ is a real random variable normally distributed about $\lambda$ with unit variance; then, as is very easy to see with the aid of a table of the normal distribution, if $M(x)$ is taken to be the interval $[x - 1.9600, x + 1.9600]$, then

(1)  $P(\lambda \in M(\mathbf{x}) \mid \lambda) = \alpha$,

where $\alpha$ is constant and almost equal to 0.95. It is usually thought necessary to warn the novice that such an equation as (1) does not concern the probability that a random variable $\lambda$ lies in a fixed set $M(x)$. Of course, $\lambda$ is given and therefore not random in the context at hand; and, given $\lambda$, $\alpha$ is the probability that $M(\mathbf{x})$, which is a contraction of $\mathbf{x}$, has as its value an interval that contains $\lambda$.

Why seek an interval estimate? One sort of verbalistic answer runs like this: At first glance, the problem of estimation seems to require that a person guess, on observing that $\mathbf{x}$ takes the value $x$, that $\lambda(i)$ has some particular value $l(x)$; but, since it is virtually impossible that such a guess should be correct, it seems better to try something else. In particular, it is often possible to assert that $\lambda(i)$ is in a comparatively narrow interval $M(x)$, chosen according to such a system that it is very improbable for each $i$ that the assertion will be false. Less extreme verbalistic explanations tend to give the impression that point estimation need not be altogether rejected, but that interval estimation satisfies a parallel need.

The first part of the explanation just cited is specious, since no one really expects a point estimate to be correct, and since, when one really is obliged by circumstances to make a point estimate in the behavioralistic sense, there is no escaping it. None the less, that part of the explanation does seem to give some insight into the appeal of interval estimation. The second part of the explanation is a sort of fiction; for it
will be found that whenever its advocates talk of making assertions that have high probability, whether in connection with testing or estimation, they do not actually make such assertions themselves, but endlessly pass the buck, saying in effect, "This assertion has arisen according to a system that will seldom lead you to make false assertions, if you adopt it. As for myself, I assert nothing but the properties of the system."

From the behavioralistic point of view, I maintain that point estimation fulfills an important function. On the other hand, I can cite no important behavioralistic interpretation of interval estimation. Moreover, in such direct and indirect contact as I have had with actual statistical practice, I have (with but one extraordinary exception, which will soon be discussed) encountered no applications of interval estimation that seemed convincing to me as anything more than an informal device for exploring data or crudely summarizing it for others. In short, not being convinced myself, I am in no position to present convincing evidence for the usefulness of interval estimation as a direct step in decision. The reader should know, however, that few are as pessimistic as I am about interval estimation and that most leaders in statistical theory have a long-standing enthusiasm for the idea, which may have more solid grounds than I now know.

The following is a schematized example of one sort of decision problem that does call for something like interval estimation. An observation $x$ bears on the position $\lambda$ of a lifeboat, the occupants of which will be saved or lost according as the boat is or is not sighted by a searching aircraft before nightfall. The decision problem is, therefore, to choose, from all the domains that the airplane could search in time, one domain $M(x)$; and the loss must, in effect, be reckoned as 0 or 1 according as $M(x)$ does or does not contain $\lambda$. This type of problem seems,
however, too rare and too special to be taken as representative of those for which interval estimation is so widely advocated.

Many criteria have been put forward for interval estimation, but I am of course in no position to discuss them critically. J. Neyman has gone about the search for criteria systematically, setting up a parallelism between the theory of interval estimation and of testing. In particular, paralleling the criterion of fixed size for tests, he has emphasized interval estimates such that

(2)  $P(\lambda(i) \in M(\mathbf{x}) \mid B_i) = \alpha$

for a fixed $\alpha$ (typically close to 1) and for every $i$. Such interval estimates are called confidence intervals at the confidence level $\alpha$. The interval estimate mentioned in connection with (1) is obviously a confidence interval. Wald [W3] sought to include the theory of confidence intervals in the minimax theory, but in my opinion he did not succeed in giving interval estimation a behavioralistic interpretation.

Though I am in no position to criticize any criterion of interval estimation, I venture to ask whether (2) is not gratuitous, as I have more positively asserted of its analogue in the theory of testing.

Chapters 19 and 20 of [K2] will serve as key references for interval estimation.

3 Tolerance intervals

There has recently been considerable study of what are called tolerance intervals (or limits). They are related to the problem of guessing the actual value of a real random variable $\mathbf{y}$ on the basis of an observation of $\mathbf{x}$. A tolerance interval for $\mathbf{y}$ at tolerance level $\alpha$ and confidence level $\beta$ is an interval-valued function $Y(x)$ such that

(1)  $P\{P(\mathbf{y} \in Y(x) \mid B_i, x) > \alpha \mid B_i\} = \beta$

for every $i$. The concept expressed by (1) is a slippery one; perhaps it will help to express it in words thus: For every $B_i$, there is probability $\beta$ that $\mathbf{x}$ is such that $\mathbf{y}$ will fall in $Y(x)$ with probability at least $\alpha$, given $B_i$ and $x$. In typical applications $\mathbf{y}$ is independent of $\mathbf{x}$; this permits a slight simplification of the definition.

The notion of tolerance interval seems to me at least as unamenable to behavioralistic interpretation as that of confidence interval, and I therefore venture no discussion of it here. Key references are [B22] and [W7].
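Because the two nested probabilities in the definition of a tolerance interval are easy to confuse, a small simulation may help. The sketch below rests on assumptions that are not in the text: $\mathbf{x}$ is summarized by the mean of $n$ independent unit-variance normal observations with unknown mean $\lambda$, $\mathbf{y}$ is a further independent observation from the same distribution, and the interval is the mean plus or minus a fixed half-width $k$. All names are hypothetical.

```python
import math
import random


def normal_cdf(z):
    """Distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def content(xbar, lam, k):
    """Inner probability: P(y in [xbar - k, xbar + k] | lam, xbar) for y ~ N(lam, 1)."""
    return normal_cdf(xbar + k - lam) - normal_cdf(xbar - k - lam)


def estimated_confidence(lam, n, k, alpha, trials=20000, seed=0):
    """Outer probability: fraction of simulated samples whose content exceeds alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # The mean of n unit-variance normal observations has variance 1 / n.
        xbar = rng.gauss(lam, 1.0 / math.sqrt(n))
        if content(xbar, lam, k) > alpha:
            hits += 1
    return hits / trials
```

Here `content` plays the role of the inner probability in (1), which depends on the observed sample, and `estimated_confidence` approximates the outer probability $\beta$ for a given tolerance level $\alpha$. Under these assumptions the result does not depend on $\lambda$ apart from sampling and rounding error, as the definition requires.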
4 Fiducial probability

This is not really a section on fiducial probability, but rather an apology for not having such a section. The concept of fiducial probability put forward and stressed by R. A. Fisher is the most disputed technical concept of modern statistics, and, since the concept is largely concerned with interval estimation, I wanted to discuss it here. I have, however, been privileged to see certain as yet unpublished manuscripts of R. M. Williams [W12] and J. W. Tukey which convince me that such discussion by me now would be premature. Some key references to fiducial probability and to the Behrens-Fisher problem, which is the most disputed field of application of fiducial probability, are Fisher's own papers, especially [F5], and Papers 22, 25, 26, 27, and 35 of the collection [F6]; Kendall [K2], Chap. 20; Yates [Y1]; Owen [O1]; Segal [S9]; Bartlett [B6]; Scheffé [S6], [S6]; Walsh [W9]; and Chand [C5].+
+ And I can now add Barnard (1963), Dempster (1964), Fisher (1956, Sections III 3, IV 6, V 5, V 8, VI 8, VI 12), Linnik (1968, Chapters VIII-X), Patil (1965), Scheffé (1970), Tukey (1957), and Williams (1966).
APPENDIX 1

Expected Value

This appendix, a brief account of some relatively elementary aspects of the badly named mathematical concept, expected value, is presented for those who might otherwise be handicapped in reading this book. No proofs are given here, but the reader who needs this appendix will probably be willing and able to accept the facts cited without proof, especially if he acquires intuition for the subject by working the suggested exercises. The requisite proofs are, however, given implicitly in any standard work on integration or measure (e.g., Chapters I-V of [H2]).

Throughout this appendix, let $S$ be a set with elements $s$, and subsets $A, B, C, \ldots$ on which a (finitely additive) probability measure $P$ is defined. Bounded real random variables, that is, bounded real-valued functions, defined for each $s \in S$, will here be denoted by $\mathbf{x}, \mathbf{y}, \ldots$, and real numbers by $x$, $y$, and lower-case Greek letters.

The expected value of $\mathbf{x}$, generally written $E(\mathbf{x})$, is characterized as the one and only function attaching a real number to every bounded random variable $\mathbf{x}$, subject to the following three conditions for every $\mathbf{x}$, $\mathbf{y}$, $\rho$, $\sigma$, and $B$:

(1)  $E(\rho\mathbf{x} + \sigma\mathbf{y}) = \rho E(\mathbf{x}) + \sigma E(\mathbf{y})$.

(2)  $E(\mathbf{x}) \ge 0$ whenever $P(x(s) < 0) = 0$.

(3)  $E(c(\cdot \mid B)) = P(B)$.

In (3), $c(\cdot \mid B)$ is the characteristic function of $B$, that is, $c(s \mid B) = 1$, if $s \in B$, and $c(s \mid B) = 0$, if $s \in \sim B$. In mathematical contexts remote from the topics in this book, the term "characteristic function" has at least two other meanings virtually unconnected with the one at hand, one in connection with linear operators on function spaces, and another in connection with the Fourier analysis of distributions.

Often the expected value of $\mathbf{x}$ is referred to as the integral of $\mathbf{x}$ over $S$, in which case it is generally written $\int x(s)\,dP(s)$.
Exercises

1. If $\mathbf{x}$ takes only a finite number of values, $x_1, \ldots, x_n$, except on a set of probability zero; then

(4)  $E(\mathbf{x}) = \sum_{i=1}^{n} x_i P(x(s) = x_i)$,

that is, the average of the $x_i$'s, each weighted by the probability of its occurrence.

2. If $P(x(s) < y(s)) = 0$, $E(\mathbf{x}) \ge E(\mathbf{y})$; and if, in addition, $P(x(s) > y(s) + \epsilon) > 0$ for some $\epsilon > 0$, then $E(\mathbf{x}) > E(\mathbf{y})$.†

3. If $\mathbf{x}$ is a real random variable, $B_i$ a partition, $\rho_i$ and $\sigma_i$ real numbers such that $\rho_i \le x(s) \le \sigma_i$ for $s \in B_i$, then

(5)  $\sum \rho_i P(B_i) \le E(\mathbf{x}) \le \sum \sigma_i P(B_i)$.

4. $c(s \mid A \cap B) = c(s \mid A)\,c(s \mid B)$.
As is explained in texts on measure theory, the expected value can (at least for countably additive measures), and in practice must, be extended to many unbounded random variables.
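For a finite set $S$ the defining conditions and the first exercises can be checked by direct computation. The sketch below assumes a hypothetical sample space, two tosses of a fair coin, which is not taken from the text; exact rational arithmetic is used so that the identities hold without rounding.

```python
from fractions import Fraction

# Hypothetical finite sample space: two tosses of a fair coin.
P = {"HH": Fraction(1, 4), "HT": Fraction(1, 4),
     "TH": Fraction(1, 4), "TT": Fraction(1, 4)}


def expect(x):
    """E(x): the sum over s of x(s) P(s), for a random variable x given as a function."""
    return sum(x(s) * p for s, p in P.items())


def indicator(B):
    """Characteristic function c(. | B) of the event B, given as a set of points."""
    return lambda s: 1 if s in B else 0


def heads(s):
    """Number of heads in the outcome s."""
    return s.count("H")


def expect_by_values(x):
    """Formula (4): each value of x weighted by the probability of its occurrence."""
    values = {x(s) for s in P}
    return sum(v * expect(indicator({s for s in P if x(s) == v})) for v in values)
```

Here `expect(indicator(B))` returns $P(B)$, which is condition (3), and `expect_by_values` agrees with `expect`, which is Exercise 1.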
Since, provided $P(B) > 0$, the conditional probability, defined by $P(C \mid B) = P(C \cap B)/P(B)$, is itself a probability measure, the expectation of $\mathbf{x}$ with respect to a conditional probability is a meaningful concept. This conditional expectation is written $E(\mathbf{x} \mid B)$ and read "the expected value of $\mathbf{x}$ given $B$."

More exercises

5. $E(\mathbf{x} \mid B) = E(\mathbf{x}\,c(\cdot \mid B))/P(B)$. Hint: It suffices to verify that the expression on the right satisfies the three conditions parallel to (1-3) that define $E(\mathbf{x} \mid B)$.

6. If $B_i$ is a partition of $S$, then

(6)  $\sum_i c(s \mid B_i) = 1$

for every $s$.

7. $E(\mathbf{x}) = \sum_i E(\mathbf{x} \mid B_i) P(B_i)$. Hint: Use $\mathbf{x} = \sum_i \mathbf{x}\,c(\cdot \mid B_i)$.

† Technical note: In the event that $P$ is countably additive, $P(x(s) > y(s)) > 0$ implies the existence of a suitable $\epsilon$, so then $\epsilon$ need not be mentioned at all.
Suppose $\mathbf{y}$ is a (not necessarily real) random variable that takes on only a finite number of values. It will be understood that $E(\mathbf{x} \mid y)$ is the expected value of $\mathbf{x}$ given that $y(s) = y$, provided $y$ is such that this event has positive probability. Furthermore, it will be understood that $E(\mathbf{x} \mid \mathbf{y})$ is a bounded real random variable that for each $s$ takes the value $E(\mathbf{x} \mid y(s))$. The definition leaves $E(\mathbf{x} \mid \mathbf{y})$ undefined on the null set of those points $s$ where $y(s)$ is a value that $\mathbf{y}$ takes on with probability zero. It is immaterial how this blemish is removed; in particular $E(\mathbf{x} \mid \mathbf{y})$ may as well be set equal to 0, where it has not already been defined.

Still more exercises

8. $E(E(\mathbf{x} \mid \mathbf{y})) = E(\mathbf{x})$.

9. If $f$ is a real-valued function defined on the values of $\mathbf{y}$; then $f(\mathbf{y})$ is a bounded real random variable, and

(7)  $E(f(\mathbf{y})\mathbf{x}) = E(f(\mathbf{y})E(\mathbf{x} \mid \mathbf{y}))$.

10. If $h(\mathbf{y})$ is such that, for all $f$,

(8)  $E(f(\mathbf{y})\mathbf{x}) = E(f(\mathbf{y})h(\mathbf{y}))$,

then $h(y(s)) = E(\mathbf{x} \mid y(s))$, except possibly on a set of $s$'s of probability zero.

Exercise 9 and its corollary, 8, present the most frequently used properties of conditional expectation. Exercise 10 shows that the property presented in 9 characterizes conditional expectation. Through this characterization Kolmogoroff [K7] extends the ideas of conditional expectation and also of conditional probability (for countably additive measures) to random variables $\mathbf{y}$ not necessarily confined to a finite or even denumerable set of values; though the definition in terms of ordinary conditional probability then breaks down completely, the probability that $y(s) = y$ often being 0 for every $y$.
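Exercises 8 and 9 can be verified exactly on a small finite example. The sketch below assumes a hypothetical sample space of two fair dice, with $\mathbf{x}$ the total and $\mathbf{y}$ the first die; none of this is from the text, and the function names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Hypothetical finite sample space: ordered outcomes of two fair dice.
P = {(a, b): Fraction(1, 36) for a, b in product(range(1, 7), repeat=2)}


def expect(x):
    """E(x) for a random variable x given as a function on the sample space."""
    return sum(x(s) * p for s, p in P.items())


def cond_expect(x, y):
    """E(x | y) as a random variable, for y taking finitely many values.

    Every value of y has positive probability in this example, so the
    convention for null sets mentioned in the text is never needed.
    """
    table = {}
    for v in {y(s) for s in P}:
        event = [s for s in P if y(s) == v]
        p_event = sum(P[s] for s in event)
        table[v] = sum(x(s) * P[s] for s in event) / p_event
    return lambda s: table[y(s)]
```

With `g = cond_expect(total, first)`, the value `expect(g)` equals `expect(total)`, which is Exercise 8, and multiplying both by any function of the first die before taking expectations preserves the equality, which is Exercise 9.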
APPENDIX 2

Convex Functions

This appendix gives a brief account of convex functions in the same spirit as the preceding one gives an account of expected value. Reasonable facsimiles of the proofs omitted here are scattered through [B,), where they may be found by anyone not content to skip them.

An interval is a set $I$ of real numbers such that, if $x, z \in I$ and $x \le y \le z$, then $y \in I$. It is not difficult to see that intervals can be classified according to Table 1, where it is to be understood that $\rho < \sigma$.

TABLE 1. THE VARIOUS TYPES OF INTERVALS

A real-valued function $t$ defined for $x$ in an interval $I$ is convex, if and only if the graph of the function never rises above any chord of itself. Analytically, if $\rho$ and $\sigma$ are positive, $\rho + \sigma = 1$, and $x, y \in I$; then

(1)  $t(\rho x + \sigma y) \le \rho t(x) + \sigma t(y)$.

If equality holds in (1) for some $\rho$; then, as is easily verified, it holds for every $\rho$, and $t$ is linear, i.e., of the form $\alpha x + \beta$, in the closed interval $[x, y]$. An interval in which $t$ is linear will here be called an interval of linearity. If and only if there are no intervals of linearity other than the one-point and vacuous intervals, $t$ is strictly convex.
Exercises

1. Verify, at least graphically, that the following functions are convex in the indicated intervals; discuss their intervals of linearity; and say which are strictly convex.

$I = (-\infty, +\infty)$:
(a) $\rho x$ for every $\rho$,
(b) $x^2 + \rho x + \sigma$ for every $\rho$ and $\sigma$,
(c) $|x|$,
(d) $|x|^{\rho}$ for $\rho \ge 1$,
(e) $e^x$.

$I = (0, \infty)$:
(f) $-\log x$,
(g) $x^{\rho}$ for $-\infty < \rho < 0$.

$I = (-1, +1)$:
(h) $(1 - x^2)^{-1/2}$,
(i) $1 - \cos(\pi x/2)$.

2. In an interval where $t$ is convex, if $d^2t(x)/dx^2$ exists at $x$, then $d^2t(x)/dx^2 \ge 0$; and if, for every $x$ in an interval $I$, $d^2t(x)/dx^2$ exists and is non-negative, then $t$ is convex in $I$.

3. Re-explore Exercise 1 in the light of 2.

4. Let $T$ be a non-vacuous set of functions, $t, t', \ldots$, convex in $I$, and let

(2)  $t^*(x) = \sup_{t \in T} t(x)$.

In (2), as always in mathematics, the sup, or supremum, of a set of numbers is the least number, possibly $\infty$, that is not less than any element of the set. If $t^*(x) < \infty$ for every $x \in I$, then $t^*$ is convex in $I$. Explore the proposition just stated, first graphically, especially for a finite set of linear $t$'s, and then analytically. What if the elements of $T$ are all strictly convex?

5. In an open interval where $t$ is convex, it is also continuous. What are the facts for closed and half-closed intervals?

6. If $t$ is convex in $I$, $x_i \in I$, $\rho_i > 0$, and $\sum \rho_i = 1$, where $i = 1, \ldots, r$; then

(3)  $\sum \rho_i t(x_i) \ge t(\sum \rho_i x_i)$.

Equality obtains, if and only if all the $x_i$'s are in a single interval of linearity of $t$.
(a) Interpret the propositions above in terms of probability.
(b) Prove them by arithmetic induction on $r$.
(c) What if $t$ is strictly convex?
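Inequality (3) of Exercise 6, together with its condition for equality, can be explored numerically. The sketch below uses a hypothetical helper name and illustrative points and weights that are not from the text.

```python
def jensen_gap(t, xs, weights):
    """Return sum(rho_i * t(x_i)) - t(sum(rho_i * x_i)).

    The gap is non-negative whenever t is convex on an interval containing
    the x_i; the weights must be positive and sum to one.
    """
    assert all(w > 0 for w in weights)
    assert abs(sum(weights) - 1.0) < 1e-12
    mean_x = sum(w * x for w, x in zip(weights, xs))
    return sum(w * t(x) for w, x in zip(weights, xs)) - t(mean_x)
```

For $t(x) = x^2$, which is strictly convex, the gap is positive unless all the points coincide. For $t(x) = |x|$ the gap vanishes when the points all lie in $[0, \infty)$, a single interval of linearity, and is positive when they straddle the origin.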
Exercise 6 suggesta, and indeed proves a special case of, the followiDg well-known and moat uaeful theorem, which caDDot be proved here in full pnerality.
TBmBEK 1 If t is convex and bounded in the interval 1, and z(a) .1 for all B 18, then (4)
B(I(%»
~ '(8(%».
Equality obtains if and only if the values of x are with probability one contained in a single interval of linearity of t. Here and throughout this appendix, such conditions for equality are to be understood to apply only in the event that either P is countably additive or the random variable is with probability one confined to a finite set of values; the general situation for finitely additive measures is a little more complicated.

More exercises

7. The variance of x, often written V(x), is defined thus:
(5)    $V(x) = E([x - E(x)]^2)$.
Show that (6)
with equality if and only if $P\{x(s) = E(x)\} = 1$.
8. Show that, if x is never smaller than some positive number,
(7)    $\log [E(x^{-1})]^{-1} \le E(\log x) \le \log E(x)$.
When can either equality obtain? Write the analogue of (7) suggested by (3), and show thereby that (7) is a generalization of the familiar fact that the arithmetic mean (of positive numbers) is at least as great as the geometric mean and the geometric mean is at least as great as the harmonic mean.
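The chain of inequalities in Exercise 8 can be checked on a small example. The sketch below is illustrative only: the function name `means` and the sample values and probabilities are assumptions chosen here. It computes the three weighted means corresponding to the three members of (7), namely $[E(x^{-1})]^{-1}$, $\exp E(\log x)$, and $E(x)$, for a random variable confined to finitely many positive values.

```python
import math


def means(values, probs):
    """Return (harmonic, geometric, arithmetic) means of positive values
    weighted by probabilities that sum to one."""
    if any(v <= 0 for v in values):
        raise ValueError("all values must be positive")
    if abs(sum(probs) - 1.0) > 1e-12:
        raise ValueError("probabilities must sum to one")
    arithmetic = sum(p * v for p, v in zip(probs, values))                      # E(x)
    geometric = math.exp(sum(p * math.log(v) for p, v in zip(probs, values)))   # exp E(log x)
    harmonic = 1.0 / sum(p / v for p, v in zip(probs, values))                  # 1 / E(1/x)
    return harmonic, geometric, arithmetic


# Hypothetical example: x takes the values 1, 2, 4 with probabilities 1/2, 1/4, 1/4.
h, g, a = means([1.0, 2.0, 4.0], [0.5, 0.25, 0.25])
print(h, g, a)

# When x is constant with probability one, the three means coincide.
print(means([3.0, 3.0], [0.5, 0.5]))
```

Taking logarithms of the three returned numbers reproduces the three members of (7) in order.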
One of the most famous of all inequalities is the Schwarz inequality, which can, though not quite obviously, be derived from Theorem 1, and which can be stated in terms of expected values thus:
(8)    $E^2(xy) \le E(x^2)E(y^2)$,
with equality obtaining if and only if for some numbers $\rho$ and $\sigma$ not both zero
(9)    $P(\rho x(s) = \sigma y(s)) = 1$.
Note that (9) expresses (perhaps too compactly) that, except on some set of probability zero, either x or y vanishes identically or else each is a fixed multiple of the other. Statistically speaking, the Schwarz inequality expresses, in effect, the familiar fact that any correlation coefficient must lie between +1 and -1, one of the extremes occurring if and only if at least one of the two random variables involved is a linear function of the other.

The concept of convex functions and its implications can easily be extended to real-valued functions defined on vectors in an n-dimensional vector space, the role of intervals there being replaced by convex subsets of the vector space; but an understanding of this extension, though desirable, is not absolutely essential in reading this book. One good introduction to convex subsets of vector spaces is Sections 16.1-2 of [V4], and another especially adapted to statistical applications is incorporated in [B18]. The standard treatise on the topic is that of Bonnesen and Fenchel [B20].
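The Schwarz inequality (8) and its statistical reading in terms of correlation can be illustrated numerically. The following Python sketch is a hedged illustration for random variables with finitely many values; the helper names `expect`, `schwarz_gap`, and `correlation`, and the sample data, are assumptions made here for the example.

```python
import math


def expect(values, probs):
    """Expected value of a random variable with finitely many values."""
    return sum(p * v for p, v in zip(probs, values))


def schwarz_gap(x, y, probs):
    """E(x^2) E(y^2) - E(xy)^2, which (8) asserts to be non-negative."""
    exy = expect([a * b for a, b in zip(x, y)], probs)
    ex2 = expect([a * a for a in x], probs)
    ey2 = expect([b * b for b in y], probs)
    return ex2 * ey2 - exy ** 2


def correlation(x, y, probs):
    """Correlation coefficient, i.e. (8) applied to x - E(x) and y - E(y)."""
    mx, my = expect(x, probs), expect(y, probs)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    cov = expect([a * b for a, b in zip(dx, dy)], probs)
    return cov / math.sqrt(expect([a * a for a in dx], probs) *
                           expect([b * b for b in dy], probs))


# Hypothetical data: four equally probable states.
PROBS = [0.25, 0.25, 0.25, 0.25]
X = [1.0, 2.0, 3.0, 4.0]
Y = [2.0, 1.0, 4.0, 3.0]

print(schwarz_gap(X, Y, PROBS))                          # strictly positive here
print(schwarz_gap(X, [2 * a for a in X], PROBS))         # y is a fixed multiple of x
print(correlation(X, Y, PROBS))                          # strictly between -1 and +1
print(correlation(X, [3 * a + 1 for a in X], PROBS))     # y is a linear function of x
```

The second and fourth calls show the equality cases: a fixed multiple as in (9), and a linear function giving an extreme correlation.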
APPENDIX 3

Bibliographic Material

The bibliography of about 170 items that terminates this appendix lists not only all works referred to in this book but also some others, for it is intended to serve not only as a mechanical aid to reference but also as a briefly and informally annotated list of suggested readings in the foundations of statistics. In addition to the notes incorporated into the bibliography, information about many of the works listed there is given in other parts of the book, where it can be found by reference to the author's name in the author index. References that have come to my attention since the first edition are in Appendix 4: Bibliographic Supplement. They are cited by the convention according to which the first of them is called (Aczel 1966).

Todhunter has abundant references scattered in chronological order through [T3], emphasizing the mathematical aspects of probability up through the period of Laplace. Keynes, in [K4], gives a formal bibliography which purposely does not overlap Todhunter's material very extensively, the emphasis being on more philosophical aspects of probability and on the period between Laplace and Keynes. Carnap in [C1] also gives a formal bibliography, which emphasizes publications since Keynes. Carnap promises an even fuller bibliography in the projected second volume of his work, and he recommends the bibliography of Georg Henrik von Wright in [V5].

Bibliographies of statistics proper are of some, though diluted, relevance. Of these, the most useful is that of M. G. Kendall in Vol. II of [K2]. Carnap at the beginning of his bibliography gives reference to some other statistical bibliographies. The enormous work of O. K. Buros in statistical bibliography, [B23], [B24], and [B25], should also be mentioned. His volumes bring together pointed excerpts from reviews of statistical books. Buros also directed a bibliographic department, entitled "Statistical Methodology," in the Journal of the American Statistical Association, from September 1945 to September 1948, listing current articles, books, theses, and chapters dealing with statistics. In Volume 20 (1949) of the Annals of Mathematical Statistics, an important journal of statistical theory, there are two cumulative indexes of Volumes 1-20, one arranged by author, the other by subject.
Bibliography

Aitken, A. C., and H. Silverstone
[A1] "On the estimation of statistical parameters," Proceedings of the Royal Society of Edinburgh, 61 (1941-43), 186-194 (issued separately April 2, 1942).

Allais, Maurice
[A2] "Le comportement de l'homme rationnel devant le risque: Critique des postulats et axiomes de l'école Américaine," Econometrica, 21 (1953), 503-546.
Allen, S. G., Jr.
[A3] "A class of minimax tests for one sided composite hypotheses," Annals of Mathematical Statistics, 24 (1953), 295-298.
Anscombe, F. J.
[A4] "Mr. Kneale on probability and induction," Mind, 60 (1951), 299-309.
Says much of general interest on the foundations of statistics, in the course of comments on [K5].

Arrow, Kenneth J.
[A5] Social Choice and Individual Values, Cowles Commission Monograph No. 12, New York, John Wiley & Sons, 1951. (Second edition, 1963.)
[A6] "Alternative approaches to the theory of choice in risk-taking situations," Econometrica, 19 (1951), 404-437.

Arrow, K. J., David Blackwell, and M. A. Girshick
[A7] "Bayes and minimax solutions of sequential decision problems," Econometrica, 17 (1949), 213-244.

Bahadur, Raghu Raj
[B1] "A property of the t statistic," Sankhyā, 12 (1952), 79-88.
[B2] "Sufficiency and statistical decision functions," Annals of Mathematical Statistics, 25 (1954), (to appear).

Bahadur, Raghu Raj, and Herbert Robbins
[B3] "The problem of the greater mean," Annals of Mathematical Statistics, 21 (1950), 469-487.

Banach, S.
[B4] Théorie des opérations linéaires, Warsaw, Fundusz Kultury Narodowej, 1932.

Banach, S., and A. Tarski
[B5] "Sur la décomposition des ensembles de points en parties respectivement congruentes," Fundamenta Mathematicae, 6 (1924), 244-277.

Bartlett, M. S.
[B6] "Completely simultaneous fiducial distributions," Annals of Mathematical Statistics, 10 (1939), 128-138.

Baumol, William J.
[B7] "The Neumann-Morgenstern utility index-an ordinalist view," Journal of Political Economy, 59 (1951), 61-66.

Bayes, Thomas
[B8] Facsimiles of Two Papers by Bayes: i. An Essay Toward Solving a Problem in the Doctrine of Chances, With Richard Price's Foreword and Discussion; Phil. Trans. Royal Soc., pp. 370-418, 1763. With a Commentary by Edward C. Molina. ii. A Letter on Asymptotic Series from Bayes to John Canton; pp. 269-271 of the Same Volume. With a Commentary by W. Edwards Deming, ed. W. Edwards Deming, Washington, D.C., The Graduate School, The Department of Agriculture, 1940; republished as (Bayes 1958).
The first of these two papers, in which a special case of what is now called Bayes' rule is introduced, figures prominently in controversies about the foundations of probability, for this paper first put several of the major issues in the limelight.

Bell, E. T.
[B9] Men of Mathematics, New York, Simon and Schuster, 1937.
Bernoulli, Daniel
[B10] "Specimen theoriae novae de mensura sortis," Commentarii academiae scientiarum imperialis Petropolitanae (for 1730 and 1731), 5 (1738), 175-192.
[B11] Die Grundlage der modernen Wertlehre. Versuch einer neuen Theorie der Wertbestimmung von Glücksfällen (German translation of [B10] by Alfred Pringsheim, with introduction by Ludwig Fick), Leipzig, Duncker & Humblot, 1896.
[B11a] "Exposition of a new theory on the measurement of risk" (English translation of [B10] by Louise Sommer), Econometrica, 22 (1954), 23-36.

Bernoulli, Jacob (= James)
[B12] Ars Conjectandi, Basel, 1713.
[B13] Wahrscheinlichkeitsrechnung (German translation of [B12] by R. Haussner), Ostwald's Klassiker der Exakten Wissenschaften, Nos. 107 and 108, Leipzig, W. Engelmann, 1899.
Contains, besides much of primary mathematical interest, what I understand to be the first extended discussion of the application of probability to the problem of inference. Unfortunately, the German translation is said to be incomplete.

Birkhoff, G., and S. MacLane
[B14] A Survey of Modern Algebra, New York, The Macmillan Co., 1941.

Bizley, M. T. L.
[B15] "Some notes on probability," Journal of the Institute of Actuaries Students' Society, 10 (1951), 161-203.

Blackwell, David
[B16] "Comparison of experiments," pp. 93-102 of Proceedings of the Second [1950] Berkeley Symposium on Mathematical Statistics and Probability, ed. Jerzy Neyman, Berkeley, University of California Press, 1951.
[B17] "On the translation parameter problem for discrete variables," Annals of Mathematical Statistics, 22 (1951), 393-399.

Blackwell, David, and M. A. Girshick
[B18] Theory of Games and Statistical Decisions, New York, John Wiley & Sons, 1954.

Bohnenblust, H. F., S. Karlin, and L. S. Shapley
[B19] "Solutions of discrete two-person games," pp. 51-72 of [K13].

Bonnesen, T., and W. Fenchel
[B20] Theorie der konvexen Körper, Ergebnisse der Mathematik und ihrer Grenzgebiete, Vol. III, Part 1, Berlin, J. Springer, 1934; reprinted, New York, Chelsea Publishing Co., 1948.

Borel, Émile
[B21] "The theory of play and integral equations with skew symmetric kernels; On games that involve chance and the skill of the players; On systems of linear forms of skew symmetric determinant and the general theory of play (translated by Leonard J. Savage)," Econometrica, 21 (1953), 97-124.

Bowker, A. H.
[B22] "Tolerance limits for normal distributions," Chapter 2, pp. 95-110 of Techniques of Statistical Analysis, by the Statistical Research Group, Columbia University, New York, McGraw-Hill Book Co., 1947.

Buros, O. K. (ed.)
[B23] Research and Statistical Methodology, Books and Reviews (1933-38), New Brunswick, New Jersey, Rutgers University Press, 1938.
[B24] The Second Yearbook of Research and Statistical Methodology, Books and Reviews, Highland Park, New Jersey, The Gryphon Press, 1941.
[B25] Statistical Methodology Reviews 1941-1950, New York, John Wiley & Sons, 1951.

Carnap, Rudolf
[C1] Logical Foundations of Probability, Chicago, University of Chicago Press, 1950.
This is the first of a projected pair of volumes designed to demonstrate meticulously the author's contention that a certain almost necessary view of probability is essential to science, not denying the meaningfulness of the objectivistic concept. Reviewed by me in [S4].
[C2] The Nature and Application of Inductive Logic, Chicago, University of Chicago Press, 1951.
A reprint of selected sections of [C1].
[C3] The Continuum of Inductive Methods, Chicago, University of Chicago Press, 1952.
Essentially a chapter of the second volume of the projected pair referred to under [C1].
Centre National de la Recherche Scientifique
[C4] Fondements et applications de la théorie du risque en économétrie, Paris, Centre National de la Recherche Scientifique, 1954.
Report of an international econometric colloquium on risk, in which there was much discussion of utility, held in Paris, May 12-17, 1952.

Chand, Uttam
[C5] "Distributions related to comparison of two means and two regression coefficients," Annals of Mathematical Statistics, 21 (1950), 507-522.

Chapman, Douglas G., and Herbert Robbins
[C6] "Minimum variance estimation without regularity assumptions," Annals of Mathematical Statistics, 22 (1951), 581-586.
Chernoff, Herman
[C7] "Remarks on a Rational Selection of a Decision Function," Cowles Commission Discussion Paper, Statistics, No. 326 (January 10, 1949). Unpublished.
Churchman, C. West
[C8] Theory of Experimental Inference, New York, The Macmillan Co., 1948.
A discussion of current statistics from the viewpoint of technical philosophy.

Cramér, Harald
[C9] Mathematical Methods of Statistics, Princeton, Princeton University Press, 1946.
By far the most comprehensive rigorous book on mathematical methods of statistics.
Darmois, G.
[D1] "Sur les limites de la dispersion de certaines estimations," Revue de l'Institut International de Statistique, 13 (1945), 9-15.
de Finetti, Bruno
[D2] "La prévision: ses lois logiques, ses sources subjectives," Annales de l'Institut Henri Poincaré, 7 (1937), 1-68. (Translated in (Kyburg and Smokler 1964).)
The first two and the final chapters of this paper give a statement, which I found particularly stimulating, of the view (a personalistic one) of the foundations of probability. The three intervening chapters are mathematically rather technical. A bibliography of the author's work on the foundations of probability up to sometime in 1937 is included.
[D3] "Le vrai et le probable," Dialectica, 3 (1949), 78-93.
[D4] "Sull' impostazione assiomatica del calcolo delle probabilità," Annali Triestini, Series 2, 19 (1949), 29-81.
[D5] "La 'logica del plausibile' secondo la concezione di Polya," Atti della XLII Riunione della Società Italiana per il Progresso delle Scienze (Novembre 1949), Rome, Società Italiana per il Progresso delle Scienze, 1951 (10 pages).
[D6] "Recent suggestions for the reconciliations of theories of probability," pp. 217-228 of Proceedings of the Second [1950] Berkeley Symposium on Mathematical Statistics and Probability, ed. Jerzy Neyman, Berkeley, University of California Press, 1951.
Especially through the suggestions it makes about multipersonal problems, an advance manuscript of [D6] has had much influence on this book.
[D7] "Sulla preferibilità," Giornale degli Economisti e Annali di Economia, 11 (1952), 685-709.
[D7a] "La notion de 'distribution d'opinion' comme base d'un essai d'interprétation de la statistique," Publications de l'Institut de Statistique de l'Université de Paris, 1 (1952), 1-19.
Delorme, S. (ed.)
[D8] Colloque de calcul des probabilités (Actualités scientifiques et industrielles, 1146), Paris, Hermann et Cie., 1951.
A collection of papers by several authors, mostly on the philosophy of probability, read in a colloquium held at the 13th International Congress of Philosophy of Science, in Paris, 1949. There is an overall review by M. Fréchet, president of the colloquium. All papers are in French, except one in English.

[D9] Dialectica, Vol. 3 (1949), Nos. 9-10.
This issue of Dialectica, a quarterly "international review of the philosophy of knowledge," is devoted to probability, and mainly to its foundations. It is comprised of papers by several authors, each in English, French, or German.
Doob, J. L.
[D10] "Statistical estimation," Transactions of the American Mathematical Society, 39 (1936), 410-421.

Duncan, D. B.
[D11] "A significance test for differences between ranked treatments in an analysis of variance," Virginia Journal of Science, 2 (1951), 171-189.
[D12] "On the properties of the multiple comparisons test," Virginia Journal of Science, 3 (1952), 49-67.
Dvoretzky, A., A. Wald, and J. Wolfowitz
[D13] "Elimination of randomization in certain statistical decision procedures and zero-sum two-person games," Annals of Mathematical Statistics, 22 (1951), 1-21.

Elfving, G.
[E1] "Sufficiency and completeness in decision function theory," Annales Academiae Scientiarum Fennicae, Series A.I., 135 (1952), 9 pages.

Feller, William
[F1] An Introduction to Probability Theory and Its Applications, Vol. 1, New York, John Wiley & Sons, 1950. (Third edition, 1968; Vol. 2, 1966 and 1971.)
A sophisticated introduction to the mathematics of probability. Certain relatively advanced mathematical techniques are avoided by a severe restriction of the material treated, which is none the less extensive and varied. A second volume, removing the restriction, is promised.

Féraud, L.
[F2] "Induction amplifiante et inférence statistique," Dialectica, 3 (1949), 127-152.

Fisher, R. A.
[F3] Statistical Methods for Research Workers, Edinburgh and London, Oliver and Boyd, 1925; and later editions.
The author is the outstanding member of the British-American School, and this book of his has had far more influence on the development of statistics in the current century than any other publication.
[F4] The Design of Experiments, Edinburgh and London, Oliver and Boyd, 1935; and later editions.
Second only to [F3] in the extent of its influence.
[F5] "A note on fiducial inference," Annals of Mathematical Statistics, 10 (1939), 383-388.
[F6] Contributions to Mathematical Statistics, New York, John Wiley & Sons, 1950.
A collection of Fisher's papers selected and annotated by himself. With a biography of Fisher by P. C. Mahalanobis. Reviewed in [N4].
Fisher, Walter D.
[F7] "On a pooling problem from the statistical decision viewpoint," Econometrica, 21 (1953), 567-585.

Fréchet, Maurice
[F8] "Sur l'extension de certaines évaluations statistiques au cas de petits échantillons," Revue de l'Institut International de Statistique, 11 (1943), 182-205.
[F9] "Emile Borel, initiator of the theory of psychological games and its application," Econometrica, 21 (1953), 95-96.

Fréchet, Maurice, and J. von Neumann
[F10] "Commentary on the Borel notes," Econometrica, 21 (1953), 118-127.

Friedman, Milton
[F11] "Choice, chance, and personal distribution of income," Journal of Political Economy, 61 (1953), 277-290.

Friedman, Milton, and L. J. Savage
[F12] "The utility analysis of choices involving risk," Journal of Political Economy, 56 (1948), 279-304; reprinted, with a correction, in [S19].
[F13] "The expected-utility hypothesis and the measurability of utility," Journal of Political Economy, 60 (1952), 463-474.
Girshick, M. A., and L. J. Savage
[G1] "Bayes and minimax estimates for quadratic loss functions," pp. 53-74 in Proceedings of the Second [1950] Berkeley Symposium on Mathematical Statistics and Probability, ed. Jerzy Neyman, Berkeley, University of California Press, 1951.

Good, I. J.
[G2] Probability and the Weighing of Evidence, London, Charles Griffin and Co., and New York, Hafner Publishing Co., 1950.
Presents, with many interesting examples and arguments, Good's view, a personalistic one, on the foundations of probability. Reviewed by me in [S3].

Graves, Lawrence M.
[G3] The Theory of Functions of Real Variables, New York, McGraw-Hill Book Co., 1946.

Halmos, Paul R.
[H1] "The foundations of probability," American Mathematical Monthly, 51 (1944), 493-510.
Short, easy exposition of the Kolmogoroff probability concept.
[H2] Measure Theory, New York, Van Nostrand Co., 1950.

Halmos, Paul R., and L. J. Savage
[H3] "Application of the Radon-Nikodym theorem to the theory of sufficient statistics," Annals of Mathematical Statistics, 20 (1949), 225-241.

Hardy, G. H., J. E. Littlewood, and G. Polya
[H4] Inequalities, Cambridge, Cambridge University Press, 1934.

Hildreth, Clifford
[H4a] "Alternative conditions for social orderings," Econometrica, 21 (1953), 81-94.

Hodges, J. L., Jr., and E. L. Lehmann
[H5] "Some problems in minimax point estimation," Annals of Mathematical Statistics, 21 (1950), 182-197.
[H6] "Some applications of the Cramér-Rao inequality," pp. 13-22 in Proceedings of the Second [1950] Berkeley Symposium on Mathematical Statistics and Probability, ed. Jerzy Neyman, Berkeley, University of California Press, 1951.

Hume, David
[H7] An Enquiry Concerning Human Understanding, London, 1748; and later editions.
An early and famous presentation of the philosophical problem of inductive inference, around which almost all later discussion of the problem pivots.

Jeffreys, Harold
[J1] Theory of Probability (Second edition), Oxford, Clarendon Press, 1948.
An ingenious and vigorous defense of a necessary view, similar to, but more sophisticated than, Laplace's. (Third edition, 1961.)

Kakutani, S.
[K1] "A generalization of Brouwer's fixed-point theorem," Duke Mathematical Journal, 8 (1941), 457-459.

Kendall, Maurice G.
[K2] The Advanced Theory of Statistics, Vol. I, 1947, Vol. II, 1948, London, Charles Griffin and Co.
Virtually an encyclopedia of statistical theory, history, and bibliography (as of 1943).
[K3] "On the reconciliation of the theories of probability," Biometrika, 36 (1949), 101-116.
Keynes, John Maynard
[K4] A Treatise on Probability, London and New York, Macmillan & Co., 1921; second edition, 1929.
A long, but often entertaining, account of Keynes' view, a necessary one. The historical passages and bibliography are of special interest.

Kneale, William
[K5] Probability and Induction, Oxford, Clarendon Press, 1949.
An ingenious presentation, philosophical in spirit and background, of a new view with objectivistic and necessary aspects. Reviewed in [A4].
[K6] "Probability and induction," Mind, 60 (1951), 310-317.
A reply to [A4].

Kolmogoroff, A. N.
[K7] Grundbegriffe der Wahrscheinlichkeitsrechnung, Berlin, J. Springer, 1933.
[K8] Foundations of the Theory of Probability (English translation of [K7] edited by Nathan Morrison), New York, Chelsea Publishing Co., 1950.
Statement and very compact development of the Kolmogoroff concept of mathematical probability. Excellent reading, though mathematically rather advanced.

Koopman, B. O.
[K9] "The axioms and algebra of intuitive probability," Annals of Mathematics, Ser. 2, 41 (1940), 269-292.
[K10] "The bases of probability," Bulletin of the American Mathematical Society, 46 (1940), 763-774.
[K11] "Intuitive probabilities and sequences," Annals of Mathematics, Ser. 2, 42 (1941), 169-187.
These three papers present the personalistic view that Koopman holds along with an objectivistic one.
[K12] Reviews of eleven papers, Mathematical Reviews, 7 (1946), 186-193; and 8 (1947), 245-247.
A connected sequence of reviews of papers, by several authors, that were published as a symposium in Philosophy and Phenomenological Research, Vols. 5 and 6 (1945-46).

Kuhn, H. W., and A. W. Tucker (eds.)
[K13] Contributions to the Theory of Games, Vol. I (Annals of Mathematics Study No. 24), Princeton, Princeton University Press, 1950.
[K14] Contributions to the Theory of Games, Vol. II (Annals of Mathematics Study No. 28), Princeton, Princeton University Press, 1953.
Loosely coordinated collection of articles on the theory of games by several authors, with bibliographies.

Kullback, S., and R. A. Leibler
[K15] "On information and sufficiency," Annals of Mathematical Statistics, 22 (1951), 79-86.

Laplace, Pierre Simon de
[L1] Essai philosophique sur les probabilités (First edition), Paris, 1814; and several subsequent editions, of which the Fifth, 1825, was the last to be revised by Laplace.
[L2] A Philosophical Essay on Probabilities (English translation of [L1], Sixth edition), New York, John Wiley & Sons, 1917; reprinted, New York, Dover Publications, 1952.
This essay, published both separately and as the preface of the author's great technical treatise, Théorie analytique des probabilités, is, save for a few dull spots, one of the most delightful and stimulating classics of probability. Laplace's view is a naive necessary one. His concept of the domain of applicability of the theory of probability is inspiring, if not always realistic.
LeCam, Lucien
[L3] "On some asymptotic properties of maximum likelihood estimates and related Bayes' estimates," pp. 277-329 of University of California Publications in Statistics, Vol. 1, No. 11, Berkeley and Los Angeles, University of California Press, 1953.

Lehmann, E. L.
[L4] "Some principles of the theory of testing hypotheses," Annals of Mathematical Statistics, 21 (1950), 1-26.
[L5] "A general concept of unbiasedness," Annals of Mathematical Statistics, 22 (1951), 587-597.

Lehmann, E. L., and Henry Scheffé
[L6] "Completeness, similar regions, and unbiased estimation, Part I," Sankhyā, 10 (1950), 305-340.

Lewis, Clarence Irving, and Cooper Harold Langford
[L7] Symbolic Logic, New York and London, The Century Company, 1932.

Lindley, D. V.
[L8] "Statistical inference," Journal of the Royal Statistical Society, Ser. B, 15 (1953), 30-76.
Excellent reading in connection with Chapters 14-17 of this book. Unfortunately, I did not see the paper in time to reflect its contents in those chapters.

Markowitz, Harry
[M1] "The utility of wealth," Journal of Political Economy, 60 (1952), 151-158.

Marshall, Alfred
[M2] Principles of Economics (First edition), London, Macmillan & Co., 1890; and many subsequent editions of which the Eighth (1927) is standard.

McKinsey, J. C. C.
[M3] Introduction to the Theory of Games, New York, McGraw-Hill Book Co., 1952.

Mosteller, Frederick, and Philip Nogee
[M4] "An experimental measurement of utility," Journal of Political Economy, 59 (1951), 371-404.

Mourier, Edith
[M5] "Tests de choix entre divers lois de probabilité," Trabajos de estadística, 2 (1951), 234-259.
Munroe, M. E.
[M6] Theory of Probability, New York, McGraw-Hill Book Co., 1951.
An elementary modern text on the mathematics of probability. Easier and more general than [F1], but not so penetrating.

Nagel, Ernest
[N1] "Principles of the theory of probability," International Encyclopedia of Unified Science, Vol. I, No. 6, Chicago, University of Chicago Press, 1939.

Neyman, Jerzy
[N2] "Outline of a theory of statistical estimation based on the classical theory of probability," Philosophical Transactions of the Royal Society, Ser. A, 236 (1937), 333-380.
[N3] "L'estimation statistique, traitée comme un problème classique de probabilité," pp. 25-57 of Actualités scientifiques et industrielles, No. 739, Paris, Hermann et Cie., 1938.
Representative works of the Neyman-Pearson School, a major branch of what in this book is called the British-American School.
[N4] "Fisher's collected papers," Scientific Monthly, 72 (1951), 406-408.

Nunke, R. J., and L. J. Savage
[N5] "On the set of values of a nonatomic, finitely additive, finite measure," Proceedings of the American Mathematical Society, 3 (1952), 217-218.
Owen, A. R. G.
[O1] "Ancillary statistics and fiducial distributions," Sankhyā, 9 (1948-49), 1-18.

Pareto, Vilfredo
[P1] Manuel d'économie politique (Second edition), Paris, Giard, 1927. (First edition 1909. Based on a still earlier book in Italian.)

Pascal, Blaise
[P2] Pensées, with introduction and notes by Louis Lafuma, Paris, Delmas, 1947.

Paulson, Edward
[P3] "A multiple decision procedure for certain problems in the analysis of variance," Annals of Mathematical Statistics, 20 (1949), 95-98.
[P4] "On comparison of several experimental categories with a control," Annals of Mathematical Statistics, 23 (1952), 239-246.

Pitman, E. J. G.
[P5] "The 'closest' estimates of statistical parameters," Proceedings of the Cambridge Philosophical Society, 33 (1937), 212-222.

Polya, G.
[P6] "Preliminary remarks on a logic of plausible inference," Dialectica, 3 (1949), 28-35.

Ramsey, Frank P.
[R1] "Truth and probability" (1926), and "Further considerations" (1928), in The Foundations of Mathematics and Other Logical Essays, London, Kegan Paul, and New York, Harcourt, Brace and Co., 1931.
Penetrating development of a personalistic view of probability and utility. Ramsey's concepts of probability and utility are essentially the same as those presented in this book, but his logical development of them is an interesting alternative to the one used here, his definitions of probability and utility being simultaneous and interdependent.

Reichenbach, Hans
[R2] The Theory of Probability, Berkeley and Los Angeles, University of California Press, 1949.
The most recent and complete statement of Reichenbach's elaborately worked out objectivistic view.

Richter, Hans
[R2a] "Zur Grundlegung der Wahrscheinlichkeitstheorie, I, II, III, IV," Mathematische Annalen, 125 (1952), 129-139; 125 (1953), 223-224, 335-343; and 126 (1953), 362-374.
A personalistic theory of probability with a physical orientation, pertaining not to behavior but to "feeling of expectation."

Rousseas, Stephen W., and Albert G. Hart
[R3] "Experimental verification of a composite indifference map," Journal of Political Economy, 59 (1951), 288-318.
Rudy, Norman
[R4] "Some problems in the economics of industrial sampling inspection," as yet unpublished dissertation submitted to the University of Chicago in 1952.

Samuelson, Paul A.
[S1] "Probability, utility, and the independence axiom," Econometrica, 20 (1952), 670-678.

Savage, L. J.
[S2] "The theory of statistical decision," Journal of the American Statistical Association, 46 (1951), 55-67.
[S3] Review of I. J. Good's Probability and the Weighing of Evidence, in Journal of the American Statistical Association, 46 (1951), 383-384.
[S4] Review of Rudolf Carnap's Logical Foundations of Probability, in Econometrica, 20 (1952), 688-690.

Scheffé, Henry
[S5] "On solutions of the Behrens-Fisher problem, based on the t-distribution," Annals of Mathematical Statistics, 14 (1943), 35-44.
[S6] "A note on the Behrens-Fisher problem," Annals of Mathematical Statistics, 15 (1944), 430-432.
[S7] "A method of judging all contrasts in the analysis of variance," Biometrika, 40 (1953), 87-104.
Searle, S. R.
[S8] "Probability-the difficulties of definition," Journal of the Institute of Actuaries Students' Society, 10 (1951), 204-212.

Segal, I. E.
[S9] "Fiducial distributions of several parameters with application to a normal system," Proceedings of the Cambridge Philosophical Society, 34 (1938), 41-47.

Shackle, G. L. S.
[S10] Expectation in Economics, Cambridge, Cambridge University Press, 1949.

Shannon, Claude E., and Warren Weaver
[S11] The Mathematical Theory of Communication, Urbana, University of Illinois Press, 1949.

Shapley, L. S., and R. N. Snow
[S12] "Basic solutions of discrete games," pp. 27-35 of [K13].

Shohat, J. A., and J. D. Tamarkin
[S13] The Problem of Moments (Mathematical Surveys, No. 1), American Mathematical Society, New York, 1943; reprinted with small changes in 1950.

Smith, Cedric A. B.
[S14] "Some examples of discrimination," Annals of Eugenics, 13 (1947), 272-282.

Sobczyk, A., and P. C. Hammer
[S15] "The ranges of additive set functions," Duke Mathematical Journal, 11 (1944), 847-851.
Sprowls, R. Clay
[S16] "Statistical decision by the method of minimum risk: an application," Journal of the American Statistical Association, 45 (1950), 238-248.

Statistical Research Group, Columbia University
[S17] Sequential Analysis of Statistical Data: Applications, New York, Columbia University Press, 1945.

Stigler, George J.
[S18] "The development of utility theory," Journal of Political Economy, Part I, 58 (1950), 307-327; Part II, 58 (1950), 373-396.

Stigler, George J., and Kenneth E. Boulding (eds.)
[S19] Readings in Price Theory, Chicago, Richard D. Irwin, 1952.
Thrall, Robert M., Clyde H. Coombs, and Robert L. Davis (eds.)
[T1] Decision Processes, New York, John Wiley & Sons, 1954.

Tintner, Gerhard
[T1a] "A contribution to the non-static theory of choice," Quarterly Journal of Economics, 56 (1942), 274-306.

Tippett, L. H. C.
[T2] Statistics, London, Oxford University Press, 1947.
A short, easy introduction to the ideas of applied statistics, with emphasis on the social sciences.

Todhunter, I.
[T3] A History of the Mathematical Theory of Probability from the Time of Pascal to that of Laplace, Cambridge and London, Macmillan & Co., 1865; reprinted, New York, G. E. Stechert, 1931; New York, Chelsea Publishing Co., 1949.

Tukey, John W.
[T4] "Comparing individual means in the analysis of variance," Biometrics, 5 (1949), 99-114.
[T5] "Quick and dirty methods in statistics. Part II. Simple analyses for standard designs," Proceedings of the Fifth Annual Convention of the American Society for Quality Control, 1951, pp. 189-197.

van Dantzig, D.
[V1] "Sur l'analyse logique des relations entre le calcul des probabilités et ses applications," pp. 49-68 of [D8].

von Mises, Richard
[V2] Probability, Statistics and Truth, London, William Hodge and Co., 1939.
English translation of the original German. Interesting for its presumably authoritative statement of von Mises' unusual view on the mathematical foundations of probability. His view on the intellectual foundations is objectivistic. (Second edition, 1957.)

von Neumann, John
[V3] "Zur Theorie der Gesellschaftsspiele," Mathematische Annalen, 100 (1928), 295-320.

von Neumann, John, and Oskar Morgenstern
[V4] Theory of Games and Economic Behavior (Second edition), Princeton, Princeton University Press, 1947.
Contains, as a digression, the important new treatment of utility from which the treatment of utility in this book derives. The second edition contains more than the first on this subject, especially a technical appendix. Also, the idea of regarding multiple choices as overall choices is discussed in great detail. Finally, the chapters on "zero-sum two-person" games are mathematically intimately connected with the statistical minimax theory.
von Wright, Georg Henrik
[V5] The Logical Problem of Induction (Acta Philosophica Fennica, Fasc. 3), Helsinki, 1941. (Second edition, 1957.)
Wald, Abraham
[W1] On the Principles of Statistical Inference, Notre Dame, Indiana, 1942.
A compact exposition of the dominant formal ideas of the British-American School.
[W2] Sequential Analysis, New York, John Wiley & Sons, 1947.
[W3] Statistical Decision Functions, New York, John Wiley & Sons, 1950.
Wald's great contributions to statistical theory are well represented by this book, but it is not for beginners.
[W4] "Note on the consistency of the maximum likelihood estimate," Annals of Mathematical Statistics, 20 (1949), 595-601.
[WI) "Aaymptotic minima s IOhltioDl of -auentW point eetimatioa probleml," pp. 1-12 of Prot.lIi.,. of 1M &i»rttJ [lNO] B..w., ~ .. 11,... ~ 8I4lilla oM Probobililrl. eeL Jeny Neymu., Berkeley. UDiveNity of
Califomla Prell, 1961. Wallis, W. ADen (W8) "SWldard IUDPliDa-iDepection procedU1'el," p~ of • 1~ 8CtIIiatical Cort/~, 3 (1947), 331-3150. [W7J ''TOJeI'&ll08 intervale for Jineuo reat t . . . . . . pp. 43-62 of p~ tI ..
8«tJNJ (l950J B",""" S"""oeiu.... 01'1 JI~ &tIM. . . tJNl ~, eel. Jersy NeymaD, Berkeley, UDivenity of California Pre., 1961. WaDis, w. ADen, and Milton Friedman [WS) ''The empirical derivation of iDdiIereDce lunct.Ioua." pp. 175-189 ia BtuMa itt M~ ~ _ BeonoJl'&flrice, ed. O. I..,. d caL. Cbieap, URivenity of Chicap Pre., 1M2. WalIh. John E. [W9) "On the power function of the 'ba' w.- aolutiOD of the Bebre...ftIher problem," AtUIGle of M~ 8~., 20 (1949), 616-818.
Wiener, Norbert
[W10] Cybernetics, New York, John Wiley & Sons, 1948.
White, Paul D., Robert L. King, and James Jenks, Jr.
[W11] "The relation of heart size to the time intervals of the heart beat, with particular reference to the elephant and the whale," The New England Journal of Medicine, 248 (1953), 69-70.
Williams, R. M.
[W12] "The use of fiducial distributions with special reference to the Behrens-Fisher problem," Part II of an unpublished dissertation submitted to the University of Cambridge and filed in 1949 in the University of Cambridge Library as Ph.D. 1671.
Wisdom, John Oulton
[W13] Foundations of Inference in Natural Science, London, Methuen, 1952. A new book in the philosophical tradition, but motivated by the idea of examining how inductive inference is actually used. Though almost entirely verbalistic in outlook, behavioralistic ideas including the sure-thing principle play an important role in the final and culminating chapter.
Wold, H.
[W14] "Ordinal preferences or cardinal utility," Econometrica, 20 (1952), 661-664.
Wolfowitz, J.
[W15] "The efficiency of sequential estimates and Wald's equation for sequential processes," Annals of Mathematical Statistics, 18 (1947), 215-230.
[W16] "On Wald's proof of the consistency of the maximum likelihood estimate," Annals of Mathematical Statistics, 20 (1949), 601-602.
[W17] "On ε-complete classes of decision functions," Annals of Mathematical Statistics, 22 (1951), 461-465.
Yates, F.
[Y1] "An apparent inconsistency arising from tests of significance based on fiducial distributions of unknown parameters," Proceedings of the Cambridge Philosophical Society, 35 (1939), 579-591.
[Y2] "Principles governing the amount of experimentation in developmental work," Nature, 170 (1952), 138-140.
APPENDIX 4

Bibliographic Supplement

Since the publication of the first edition of this book, the literature of the foundations of statistics, like that of all science, has been growing with awesome rapidity. The relatively short list of about 180 items below includes a few older references overlooked in the first edition, but most are more recent. They are chosen in the spirit of those in the first edition, Appendix 3: Bibliographic Material. Some support new assertions made in this edition, some bring up to date reading lists as key references for certain topics, and some are selected for their quality and originality. Pages in this edition that cite a given entry in the list below are shown by italic numbers following the entry, a neglected invention going back at least to (Coolidge 1940). Where there is neither such a page number nor a comment, the entry is supposed to speak for itself.

Some of the new entries are special bibliographies (Edwards 1969; George 1968; Greenwood et al. 1962; Joiner et al. 1970; Lancaster 1968, 1969, 1970; Miller 1969; Savage 1970; Wasserman and Silander 1958, 1964). Bibliographies of statistics itself, not to mention those of related fields, have so proliferated that Lancaster (1968, 1969, 1970) has already published three bibliographies of statistical bibliographies, one of book length. Several important journals have published cumulative indexes as shown by the table below.
[Table of cumulative journal indexes. Columns: Journal; Years (and Volumes); Principal types of coverage. Only the rows below are legible; further rows follow in the original.]
Annals of Mathematical Statistics; 1930-1960 (1-31); citation, author, subject, tables
Biometrika; 1901-1950 (1-37); subject
Biometrika; 1901-1961 (1-48); author
Journal of the American Statistical Association; 1888-1939 (1-34); author, subject, book reviews indexed by author of book
Journal of the American Statistical Association; 1940-1955 (35-50); author, subject, book reviews indexed by author of book and by subject
Of these, the thirty-year index of the Annals of Mathematical Statistics (Greenwood et al. 1962) is a landmark in bibliographic technique and is still very useful, though no longer recent. The index of several journals compiled by Joiner et al. (1970) is timely and of a very useful type, probably much cheaper to compile than that exemplified by (Greenwood et al. 1962).

Since 1964 the Science Citation Index (the Institute for Scientific Information, Philadelphia) has been published with ever increasing coverage. This is an enormous enterprise showing who has cited whom in about 3,000 different journals, which makes it relatively easy to find recent literature on any scientific topic for which one or two older references are known. Though the coverage for statistics and related fields may not yet be very complete, this facility is already useful.
Additional bibliography

Aczél, János 1966 Lectures on Functional Equations and Their Applications, New York, Academic Press. Section 7.1.4 is a key reference for a mathematical approach that investigates the consequences of functional equations that somehow suggest themselves as axioms for probability but does not seek to interpret probability.
Anscombe, Francis J. 1961 "Bayesian statistics," The American Statistician, 15, No. 1, 21-24.
Archibald, G. C. 1959 "Utility, risk, and linearity," Journal of Political Economy, 67, 437-450. 104
Barnard, George A. 1947 "A review of 'Sequential Analysis' by Abraham Wald," Journal of the American Statistical Association, 42, 658-664. iv
Barnard, George A. 1963 "Some logical aspects of the fiducial argument," Journal of the Royal Statistical Society, Series B, 25, 111-114. iv
Barnard, George A. 1965 "The use of the likelihood function in statistical practice," Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1, 27-40. iv
Barnard, G. A., G. M. Jenkins, and C. B. Winsten 1962 "Likelihood, inference, and time series," Journal of the Royal Statistical Society, Series A, 125, 321-372. iv
Bayes, Thomas 1958 "Essay toward solving a problem in the doctrine of chances: with a biographical note by G. A. Barnard," Biometrika, 45, 293-315. (Also published separately by the Biometrika Office, University College, London.) New edition of [B8].
Birnbaum, Allan 1962 "On the foundations of statistical inference," Journal of the American Statistical Association, 57, 269-306. iv
Birnbaum, Allan 1969 "Concepts of statistical evidence," pp. 112-143 in Essays in Honor of Ernest Nagel, eds. Sidney Morgenbesser, Patrick Suppes, and Morton White, New York, St. Martin's Press.
Blackwell, D., and L. Dubins 1962 "Merging of opinions with increasing information," Annals of Mathematical Statistics, 33, 882-887. 114
Blum, Julius, and Judah Rosenblatt 1967 "On partial a priori information in statistical inference," Annals of Mathematical Statistics, 38, 1671-1678. Seeks a compromise between the personalistic and frequentistic approaches.
Borel, Émile 1924 "À propos d'un traité de probabilités," Revue Philosophique, 98, 321-336; reprinted in Valeur Pratique et Philosophie des Probabilités by Borel, 1939, Paris, Gauthier-Villars; translated in (Kyburg and Smokler 1964). This review of [K4] contains the earliest account of the modern concept of personal probability known to me.
Box, George E. P., and G. C. Tiao 1962 "A further look at robustness via Bayes' theorem," Biometrika, 49, 419-433. A personalistic account of an important general problem in statistics.
Broad, C. D. 1968 Induction, Probability, and Causation, Dordrecht, Holland, D. Reidel Publishing Company. Collected papers of a famous philosopher on the title themes.
Bross, Irwin D. J. 1963 "Linguistic analysis of a statistical controversy," The American Statistician, 17, 18-21. Antipersonalistic commentary.
Bühlmann, H. 1960 "Austauschbare stochastische Variablen und ihre Grenzwertsätze," University of California Publications in Statistics, 3, 1-38. 13
Carnap, Rudolf 1962 "The aim of inductive logic," pp. 303-318 in (Nagel, Suppes, and Tarski 1962).
Ceresole, Pierre 1915 "L'irréductibilité de l'intuition des probabilités et l'existence de propositions mathématiques indémontrables," Archives de Psychologie, 15, 255-305. Remarkable early ideas about the subjectivity of probability and about the elusive concept of the probability of mathematical propositions.
Chambers, Michael L. 1970 "A simple problem with strikingly different frequentistic and Bayesian solutions," Journal of the Royal Statistical Society, Series B, 32, 278-282.
Chao, M. T. 1970 "The asymptotic behavior of Bayes estimators," Annals of Mathematical Statistics, 41, 601-608. 114
Clarke, R. D. 1954 "The concept of probability," Journal of the Institute of Actuaries, 80, 1-13 (followed by abstract of discussion, 14-31). A personalistic view from an actuarial standpoint.
Coolidge, Julian L. 1940 A History of Geometrical Methods, New York, Oxford University Press. (Reprinted: Dover Publications, New York, 1963.) 283
Cornfield, Jerome 1966 "Sequential trials, sequential analysis, and the likelihood principle," The American Statistician, 20, No. 2, 18-23. iv
Cornfield, Jerome 1969 "The Bayesian outlook and its applications," Biometrics, 25, 617-657. A significant statement of the personalistic position in statistics.
Costantini, Domenico 1970 Fondamenti del Calcolo delle Probabilità, Milano, Giangiacomo Feltrinelli Editore. A recent survey of the foundations of probability by an eclectically inclined author.
de Finetti, Bruno 1954 "Media di decisioni e media di opinioni," Bulletin de l'Institut International de Statistique, 28th session, 34, 144-157.
de Finetti, Bruno 1961 "The Bayesian approach to the rejection of outliers," pp. 199-210 in Vol. 1 of Proceedings of the Fourth [1960] Berkeley Symposium on Mathematical Statistics and Probability, ed. Jerzy Neyman, Berkeley and Los Angeles, University of California Press.
de Finetti, Bruno 1968 "Probability: Interpretations," pp. 496-505 in the International Encyclopedia of the Social Sciences, New York, Macmillan. A penetrating overview of the philosophy of probability.
de Finetti, Bruno, and Leonard J. Savage 1962 "Sul modo di scegliere le probabilità iniziali," Sui Fondamenti della Statistica, Biblioteca del Metron, Serie C, 1, 81-147. (A fairly extensive English summary, 148-151.) Discusses at length many facets of the interpretation and application of the personalistic position.
de Jouvenel, Bertrand 1967 The Art of Conjecture, New York, Basic Books. A historical and literary approach to intelligent and imaginative guessing.
De Zeeuw, G., C. A. J. Vlek, and W. A. Wagenaar 1970 Subjective Probability: Theory, Experiments, Applications (consists of Issue No. 2/3, Vol. 34 of Acta Psychologica), Amsterdam, North-Holland Publishing Co.
Dempster, Arthur P. 1964 "On the difficulties inherent in Fisher's fiducial argument," Journal of the American Statistical Association, 59, 56-66. 262
Dempster, Arthur P. 1968 "A generalization of Bayesian inference," Journal of the Royal Statistical Society, Series B, 30, 205-247. 58
Dickey, James M. 1971 "The weighted likelihood ratio, linear hypotheses on normal location parameters," Annals of Mathematical Statistics, 42, 204-223. A key reference for the personalistic approach in multivariate statistics.
Drèze, Jacques 1961 "Fondements logiques de la probabilité subjective et de l'utilité," pp. 73-87 in La Décision, Paris, Centre National de la Recherche Scientifique. Flees the tyranny of consequences that do not depend on states and states that do not depend on acts.
Dubins, Lester E. 1969 "An elementary proof of Bochner's finitely additive Radon-Nikodym theorem," American Mathematical Monthly, 76, No. 5.
Dubins, Lester E., and Leonard J. Savage 1965 How to Gamble If You Must: Inequalities for Stochastic Processes, New York, McGraw-Hill Book Co.
Edwards, A. W. F. 1969 "Statistical methods in scientific inference," Nature, 222, 1233-1237. Champions a notion of prior likelihood as opposed to prior probability. Contains interesting critical and historical sidelights.
Edwards, Ward 1969 A Bibliography of Research on Behavioral Decision Processes to 1968, Human Performance Center Memorandum Report No. 7, Ann Arbor, University of Michigan Press. 283
Edwards, Ward, Harold Lindman, and Leonard J. Savage 1963 "Bayesian statistical inference for psychological research," Psychological Review, 70, 193-242.
Elementary but serious discussion of the personalistic position in statistics. Not particularly confined to psychology.
Edwards, Ward, L. D. Phillips, W. L. Hays, and B. C. Goodman 1968 "Probability information processing systems: Design and evaluation," IEEE Transactions on Systems Science and Cybernetics, SSC-4, 248-265. Concerned with the practical use of personal probabilities involving the opinions of different people, each about his own area of competence.
Edwards, Ward, and A. Tversky 1967 Decision Making, Baltimore, Penguin Books. A small anthology centering around empirical aspects of decision making, including computer-aided improvement of decision.
Ellsberg, Daniel, William Fellner, and Howard Raiffa 1961 "Symposium: Decisions under uncertainty," Quarterly Journal of Economics, 75, 643-694. A key reference for a certain type of departure from the theory of personal probability and utility in this book.
Ericson, William A. 1969 "Subjective Bayesian models in sampling finite populations," Journal of the Royal Statistical Society, Series B, 31, 195-224. A key reference for a new line of thinking about the theory of sampling.
Fabius, J. 1964 "Asymptotic behavior of Bayes' estimates," Annals of Mathematical Statistics, 35, 846-856. 214
Fishburn, Peter C. 1964 Decision and Value Theory, New York, John Wiley and Sons. Treats decision and preference extensively in a fashion harmonious with this book, and with a view to applications more to management than to statistics.
Fishburn, Peter C. 1970 Utility Theory for Decision Making, New York, John Wiley and Sons. 40, 78, 80 Lucidly treats a great variety of axiomatic approaches to preference, with and without uncertainty, including the axiomatic aspects of this book and later developments.
Fisher, Ronald A. 1934 "Two new properties of mathematical likelihood," Proceedings of the Royal Society, Series A, 144, 205-221. (Paper 14 of [F6].) 68
Fisher, Ronald A. 1955 "Statistical methods and scientific induction," Journal of the Royal Statistical Society, Series B, 17, 69-78. iv
Fisher, Ronald A. 1956 Statistical Methods and Scientific Inference, New York, Hafner.
Fraser, Donald A. S. 1968 The Structure of Inference, New York, John Wiley and Sons. A theory of statistical inference that seems to derive from ideas of fiducial probability and of necessary reliance on symmetry and ignorance.
Fréchet, Maurice 1955 "Sur l'importance en économétrie de la distinction entre les probabilités rationnelles et irrationnelles," Econometrica, 23, 303-308.
Freedman, David 1962 "Invariants under mixing which generalize de Finetti's theorem," Annals of Mathematical Statistics, 33, 916-924. 53
Freedman, David 1963 "Invariants under mixing which generalize de Finetti's theorem: Continuous time parameter," Annals of Mathematical Statistics, 34, 1194-1216. 53
Freedman, David 1965 "On the asymptotic behavior of Bayes estimates in the discrete case II," Annals of Mathematical Statistics, 36, 454-456. 214
Gauss, Carl Friedrich 1821 "Theoria combinationis observationum erroribus minimis obnoxiae," Commentationes Societatis Regiae Scientiarum Gottingensis Recentiores, 5, 33-90. In German translation, A. Börsch and P. Simon, Abhandlungen zur Methode der kleinsten Quadrate, Berlin, Westdruckerei Joachim Frickert, 1887 (reprinted Würzburg: Physica-Verlag, 1964). In French translation, J. Bertrand, Méthode des Moindres Carrés, Paris, Mallet-Bachelier, 1855. 230
George, Stephen L. 1968 An Annotated Bibliography on the Foundations of Statistical Inference, Institute of Statistics Mimeograph Series No. 572, Raleigh, North Carolina State College Press. 283
Giere, Ronald N. 1969 "Bayesian statistics and biased procedures," Synthese, 20, 371-387. A philosophic account of differences between Bayesian and Neyman-Pearsonian statistics, concluding in favor of the latter.
Gnedenko, B. V. 1962 The Theory of Probability, New York, Chelsea Publishing Co. This edition, more than later ones, contains some philosophical passages, a few of them referring to dialectical materialism.
Good, Irving John 1959 "Kinds of probability," Science, 129, 443-447.
Good, Irving John 1962 "Subjective probability as the measure of a non-measurable set," pp. 319-329 in (Nagel, Suppes, and Tarski 1962). 58
Good, Irving John 1965 The Estimation of Probabilities: An Essay on Modern Bayesian Methods, Cambridge, Massachusetts Institute of Technology Press.
Goodman, Leo A. 1953 "A simple method for improving some estimators," Annals of Mathematical Statistics, 24, 114-117. 114
Grayson, C. Jackson, Jr. 1960 Decisions Under Uncertainty: Drilling Decisions by Oil and Gas Operators, Cambridge, Harvard University Press. Interesting discussion of possible applications of personal probability and utility to a kind of business decision greatly affected by uncertainty.
Greenwood, J. Arthur, Ingram Olkin, and I. Richard Savage (eds.) 1962 Annals of Mathematical Statistics Indexes to Volumes 1-31: 1930-1960, St. Paul, Minn., North Central Publishing Co. 283, 284 A sophisticated bibliographic apparatus for the period covered, including not only author and detailed subject indexes for the Annals of Mathematical Statistics, but also including citations to the Annals and citations made in the Annals to other journals.
Haag, Jules 1928 "Sur un problème général de probabilités et ses diverses applications," pp. 659-674 in Proceedings of the International Congress of Mathematicians, Toronto 1924, Toronto, University of Toronto Press. 61
Hacking, Ian 1965 Logic of Statistical Inference, Cambridge, Basic Books.
Hacking, Ian 1967 "Slightly more realistic personal probability," Philosophy of Science, 34, 311-325. Interesting for its own thesis but also for its sensitive interpretation of statistical literature.
Hájek, Jaroslav 1965 "On basic concepts of statistics," Proceedings of the Fifth [1965/66] Berkeley Symposium on Mathematical Statistics and Probability, 1, 139-162.
Hakansson, Nils H. 1970 "Friedman-Savage utility functions consistent with risk aversion," Quarterly Journal of Economics, 84, 472-487. 104
Halphen, Étienne 1955 "La notion de vraisemblance," Publications de l'Institut de Statistique de l'Université de Paris, 4, 41-92. Thoughts of an original, and too little known, figure on the foundations of probability.
Harrod, Roy 1956 Foundations of Inductive Logic, New York, Harcourt, Brace and Co.
Hewitt, Edwin, and Leonard J. Savage 1955 "Symmetric measures on Cartesian products," Transactions of the American Mathematical Society, 80, 470-501. 59
Hildreth, C. 1963 "Bayesian statisticians and remote clients," Econometrica, 31, 422-439. Brings out, and reflects upon, an important point never made explicit in this book: the person in personalistic statistics is hardly ever a statistician or even the scientific investigator but is often one among the public who appraise a scientific publication.
Hill, Bruce M. 1963 "The three-parameter lognormal distribution and Bayesian analysis of a point-source epidemic," Journal of the American Statistical Association, 58, 72-85.
Hill, Bruce M. 1969 "Foundations for the theory of least squares," Journal of the Royal Statistical Society, Series B, 31, 89-97. Personalistic treatment of a topic central to modern statistics.
Hilpinen, Risto 1968 Rules of Acceptance and Inductive Logic (Acta Philosophica Fennica, Issue 22), Amsterdam, North-Holland Publishing Company.
Hirshleifer, Jack 1961 "The Bayesian approach to statistical decision: An exposition," The Journal of Business, 34, 471-489.
Hoadley, Bruce 1970 "A Bayesian look at inverse linear regression," Journal of the American Statistical Association, 65, 356-369. An example of personalistic ideas in the study of a specific problem in statistics.
Huber, Peter J. 1964 "Robust estimation of a location parameter," Annals of Mathematical Statistics, 35, 73-101. An important nonpersonalistic advance in the central problem of statistical robustness.
Jeffrey, Richard C. 1965 The Logic of Decision, New York, McGraw-Hill Book Co. An interesting departure from the theory of personal probability and utility as represented by this book.
Jeffreys, Harold 1955 "The present position of the theory of probability," British Journal for the Philosophy of Science, 5, 275-289.
Joiner, Brian L., N. F. Laubscher, Eleanor S. Brown, and Bert Levy 1970 An Author and Permuted Title Index to Selected Statistical Journals, National Bureau of Standards Special Publication 321, Washington, United States Department of Commerce. 283, 284 Brings the cumulative indexes of seven journals up through a little later than 1968 in a form that is convenient and powerful.
Kendall, M. G., and A. Stuart 1958 The Advanced Theory of Statistics: Vol. 1, Distribution Theory, London, Charles Griffin and Co.
Kendall, M. G., and A. Stuart 1961 The Advanced Theory of Statistics: Vol. 2, Inference and Relationship, London, Charles Griffin and Co. This item and the one before, together with a projected third volume, Planning and Analysis, and Time-Series, will constitute a radically new edition of [K2].
Kraft, C. H., John W. Pratt, and A. Seidenberg 1959 "Intuitive probability on finite sets," Annals of Mathematical Statistics, 30, 408-419.
Kullback, Solomon 1961 Information Theory and Statistics, New York, John Wiley and Sons.
Kyburg, Henry E., Jr. 1961 Probability and the Logic of Rational Belief, Middletown, Conn., Wesleyan University Press.
40 SO
Kyburg, Henry E., Jr., and Ernest Nagel (eds.) 1963 Induction: Some Current Issues, Middletown, Wesleyan University Press.
Kyburg, Henry E., Jr., and Howard E. Smokler 1964 Studies in Subjective Probability, New York, John Wiley and Sons. An anthology consisting of an extract from John Venn, translations of (Borel 1924) and [D2], and reproductions of [R1], [K10], and (Savage 1961). Has a large bibliography.
Lancaster, Henry O. 1968 Bibliography of Statistical Bibliographies, Edinburgh, Oliver and Boyd. 283
Lancaster, Henry O. 1969 "A bibliography of statistical bibliographies: A second list," Review of the International Statistical Institute, 37, 57-67. 283
Lancaster, Henry O. 1970 "A bibliography of statistical bibliographies: A third list," Review of the International Statistical Institute, 38, 258-267. 283
LeCam, Lucien 1964 "Sufficiency and approximate sufficiency," Annals of Mathematical Statistics, 35, 1419-1455.
Lehmann, E. L. 1958 "Significance level and power," Annals of Mathematical Statistics, 29, 1167-1176.
Lehmann, E. L. 1959 Testing Statistical Hypotheses, New York, John Wiley and Sons. Excellent illustration of Neyman-Pearson theory, showing older and newer aspects in tension with each other.
Lindley, Dennis V. 1958 "Professor Hogben's 'Crisis' - A survey of the foundations of statistics," Applied Statistics, 7, 186-198.
Lindley, Dennis V. 1965 Introduction to Probability and Statistics: From a Bayesian Viewpoint: Part 1, Probability; Part 2, Inference, Cambridge, Cambridge University Press.
Linnik, Yu. V. 1968 Statistical Problems with Nuisance Parameters, Translations of Mathematical Monographs, 20, Providence, American Mathematical Society. 261
Luce, R. Duncan, Robert R. Bush, and Eugene Galanter (eds.) 1965 Readings in Mathematical Psychology, Volume II, New York, John Wiley and Sons. Parts V and VI contain relevant readings.
Luce, R. Duncan, and Howard Raiffa 1957 Games and Decisions, New York, John Wiley and Sons. Good account of the theory of games and its contacts with the normative theory of decision.
Luce, R. Duncan, and Patrick Suppes 1965 "Preference, utility, and subjective probability," pp. 249-410 in Vol. 3 of Handbook of Mathematical Psychology, eds. R. Duncan Luce, Robert R. Bush, and Eugene Galanter, New York, London, and Sydney, John Wiley and Sons. Valuable for itself and as a key reference.
Łukasiewicz, Jan 1970 "Logical foundations of probability theory," in Jan Łukasiewicz: Selected Works, ed. L. Borkowski, Amsterdam and London, North-Holland Publishing Company, and Warsaw, PWN, Polish Scientific Publishers. English translations of the original German version of 1913. Eloquently presents an early formal-logic approach to probability. Valuable for insights and history.
Lusted, Lee B. 1968 Introduction to Medical Decision Making, Springfield, Illinois, C. C. Thomas. Hopeful but realistic discussion of a promising field of application for personalistic ideas.
Maritz, J. S. 1970 Empirical Bayes Methods, London, Methuen. Key reference for a branch of frequentistic statistics with Bayesian elements.
Markowitz, Harry M. 1959 Portfolio Selection, Cowles Commission Monograph No. 16, New York, John Wiley and Sons. Application of the theory of utility to the theory of investment.
Marschak, Jacob 1968 "Decision making: Economic aspects," pp. 42-46 in the International Encyclopedia of the Social Sciences, New York, Macmillan.
Milicer-Gruzewska, Halina 1949 "On the law of probability and the characteristic function of the standardized sum of equivalent variables," Towarzystwo Naukowe Warszawskie, 42, 99-142. 53
Milicer-Gruzewska, Halina 1950 "Sulla legge limite delle variabili casuali equivalenti," Atti della Accademia Nazionale dei Lincei, Serie 8, 2. 53
Miller, Robert B. 1969 A Selected Bayesian Statistics Bibliography, Department of Statistics Technical Paper No. 214, Madison, University of Wisconsin Press. 283
Morse, Norman, and Richard Sacksteder 1966 "Statistical isomorphism," Annals of Mathematical Statistics, 37, 203-214.
Mosteller, Frederick, and David L. Wallace 1964 Inference and Disputed Authorship: The Federalist, Reading, Massachusetts, Addison-Wesley. A thorough and illuminating application of statistics illustrating both personalistic and nonpersonalistic approaches.
Nagel, Ernest, Patrick Suppes, and Alfred Tarski (eds.) 1962 Logic, Methodology and Philosophy of Science, Stanford, Stanford University Press.
Neyman, Jerzy 1952 Lectures and Conferences on Mathematical Statistics and Probability, Washington, Graduate School of the United States Department of Agriculture. (First edition 1938, mimeographed.)
Neyman, Jerzy 1957 "Current problems of mathematical statistics," pp. 349-370 in Vol. 1 of Proceedings of the International Congress of Mathematicians [Amsterdam, 1954], Groningen, E. P. Noordhoff.
Neyman, Jerzy 1967 A Selection of Early Statistical Papers of J. Neyman, Cambridge, Cambridge University Press.
Neyman, Jerzy, and Egon S. Pearson 1967 Joint Statistical Papers of J. Neyman and E. S. Pearson, Berkeley, University of California Press.
Patil, Venkutai H. 1965 "Approximation to the Behrens-Fisher distributions," Biometrika, 52, 267-271. 261
Pearson, Egon S. 1966 The Selected Papers of E. S. Pearson, Berkeley and Los Angeles, University of California Press.
Pfanzagl, Johann 1968 Theory of Measurement, New York, John Wiley and Sons. Measurement of utility and personal probability is typical of the concerns of this highly mathematical book.
Plackett, Robert L. 1966 "Current trends in statistical inference," Journal of the Royal Statistical Society, Series A, 129, 249-267.
Pólya, G. 1954 Mathematics and Plausible Reasoning: Vol. I, Induction and Analogy in Mathematics; Vol. II, Patterns of Plausible Inference, Princeton, Princeton University Press, and London, Oxford University Press.
Popper, Karl 1959 The Logic of Scientific Discovery, London, Hutchinson. (Reprinted: Science Editions, New York, 1961.)
Pratt, John W. 1961 "Review of E. L. Lehmann's 'Testing Statistical Hypotheses,'" Journal of the American Statistical Association, 56, 163-167. A personalistic review of the highly respected frequentistic work (Lehmann 1959).
Pratt, John W. 1964 "Risk aversion in the small and in the large," Econometrica, 32, No. 1-2, 122-136. A valuable contribution to the theory of utility.
Pratt, John W. 1965 "Bayesian interpretation of standard inference statements," Journal of the Royal Statistical Society, Series B, 27, 169-203. iv
Quine, W. V. 1951 "Two dogmas of empiricism," Philosophical Review, 60, 20-43. (Reprinted in From a Logical Point of View, 1953, Cambridge, Harvard University Press.)
Raiffa, Howard 1968 Decision Analysis: Introductory Lectures on Choices under Uncertainty, Reading, Massachusetts, Addison-Wesley.
Raiffa, Howard, and Robert Schlaifer 1961 Applied Statistical Decision Theory, Cambridge, Harvard University Press. A rather advanced textbook in personalistic statistics.
Rényi, Alfréd 1970 Foundations of Probability, San Francisco, Holden-Day. More mathematical than philosophical but important for the serious philosophical students of probability.
Rényi, Alfréd, and P. Révész 1963 "A study of sequences of equivalent events as special stable sequences," Publicationes Mathematicae, Debrecen, 10, 319-325. 53
Roberts, Harry V. 1963 "Risk, ambiguity, and the Savage axioms: Comment," Quarterly Journal of Economics, 77, 327-342.
Ryll-Nardzewski, C. 1957 "On stationary sequences of random variables and the de Finetti's equivalence," Colloquium Mathematicum, 4, 149-156. 53
Salmon, Wesley C. 1966 The Foundations of Scientific Inference, Pittsburgh, University of Pittsburgh Press. Lucid review and study of the problem of induction bequeathed to us by Hume.
Savage, I. Richard 1968 Statistics: Uncertainty and Behavior, Boston, Houghton Mifflin Company. An elementary statistical textbook in the personalistic Bayesian spirit.
Savage, Leonard J. 1960 "Recent tendencies in the foundations of statistics," pp. 540-544 in Proceedings of the International Congress of Mathematicians [Edinburgh, 1958], Cambridge, Cambridge University Press.
Savage, Leonard J. 1961 "The foundations of statistics reconsidered," pp. 575-586 in Vol. I of Proceedings of the Fourth [1960] Berkeley Symposium on Mathematical Statistics and Probability, ed. Jerzy Neyman, Berkeley, University of California Press. iv
Savage, Leonard J. 1962 "Bayesian statistics," pp. 161-194 in Recent Developments in Decision and Information Processes, eds. Robert E. Machol and Paul Gray, New York, Macmillan Co. iv
Savage, Leonard J. 1962 "Subjective probability and statistical practice," pp. 9-36 in (Savage et al. 1962). 217
APPENDIX 4 Sa. ., Lecmani J. 1987 "DiIlealtiee in the theory of perlow probability,"
P"il~O'-lI
of 8~, 34, 306-310.
Savage, Leonard J. 1967 "Implications of personal probability for induction," Journal of Philosophy, 64, 593-607.
Savage, Leonard J. 1970 "Reading suggestions for the foundations of statistics," The American Statistician, 24, No. 4, 23-27. Somewhat overlaps, but is much shorter than, the present Bibliographic Supplement. 188
Savage, Leonard J. 1971 "Elicitation of personal probabilities and expectations," Journal of the American Statistical Association, 66, 783-801. See (Staël von Holstein 1970).
Savage, Leonard J., et al. 1962 The Foundations of Statistical Inference: A Discussion, New York, John Wiley and Sons. Valuable for the interchange of ideas among students of diverse experience and outlook.
Scheffé, Henry 1970 "Practical solutions of the Behrens-Fisher problem," Journal of the American Statistical Association, 65, 1501-1508.
Schelling, Thomas C. 1960 The Strategy of Conflict, Cambridge, Harvard University Press. Pertinent because conflict and group decision are aspects of the same thing. Extramathematical and particularly stimulating.
Schlaifer, Robert 1959 Probability and Statistics for Business Decisions, New York, McGraw-Hill Book Co.
Schmitt, Samuel A. 1969 Measuring Uncertainty: An Elementary Introduction to Bayesian Statistics, Reading, Massachusetts, Addison-Wesley.
Shelly, Maynard W., II, and Glenn L. Bryan (eds.) 1964 Human Judgments and Optimality, New York, John Wiley and Sons. An organized collection of essays by many authors. Interesting in itself and useful as an extensive key reference.
Smith, Cedric A. B. 1961 "Consistency in statistical inference and decision," Journal of the Royal Statistical Society, Series B, 23, 1-25.
Smith, Cedric A. B. 1965 "Personal probability and statistical analysis," Journal of the Royal Statistical Society, Series A, 128, 469-489.
Staël von Holstein, Carl-Axel S. 1970 Assessment and Evaluation of Subjective Probability Distributions, Stockholm, The Economic Research Institute at the Stockholm School of Economics. Excellent monograph on how to elicit personal probabilities and what to do with them. Reviews and enriches a considerable literature. Related references are (Savage 1971; Winkler 1968).
Stone, Mervyn 1970 "The role of experimental randomization in Bayesian statistics: Finite sampling and two Bayesians," Biometrika, 56, 681-683.
Strawderman, William E. 1971 "Proper Bayes minimax estimators of the multivariate normal mean," Annals of Mathematical Statistics, 42, 385-388. Key reference for a challenging theoretical development initiated by Charles Stein.
Suppes, Patrick 1960 "Some open problems in the foundations of subjective probability," pp. 162-169 in Information and Decision Processes, ed. Robert E. Machol, New York, McGraw-Hill Book Co.
Tavanec, P. V., ed. 1970 Problems of the Logic of Scientific Knowledge, translated by T. J. Blakeley, New York, Humanities Press. A rare opportunity to read in English the ideas of some modern Soviet philosophers about probability and induction.
Tribe, Laurence H. 1971 "Trial by mathematics: Precision and ritual in the legal process," Harvard Law Review, 84, 1329-1393. A key reference on the possible applicability of probabilistic ideas in the courts, which the author does not find promising.
Tribus, Myron 1969 Rational Descriptions, Decisions and Designs, New York, Pergamon Press. A necessarian approach.
Tukey, John W. 1957 "Some examples with fiducial relevance," Annals of Mathematical Statistics, 28, 687-695.
Tukey, John W. 1962 "The future of data analysis," Annals of Mathematical Statistics, 33, 1-67.
Ulam, Stanislaw 1930 "Zur Masstheorie in der allgemeinen Mengenlehre," Fundamenta Mathematicae, 16, 140-150.
van Dantzig, David 1950-1 "Review of Carnap's Logical Foundations of Probability," Synthese, 8, 459-470.
van Dantzig, David 1957 "Statistical priesthood: Savage on personal probabilities," Statistica Neerlandica, 11, 1-16.
van Dantzig, David 1957 "Statistical priesthood II: Sir Ronald on scientific inference," Statistica Neerlandica, 11, 185-200. The three preceding references review three different views of the foundations of probability and statistics from the standpoint of a fourth.
Vetter, Hermann 1967 Wahrscheinlichkeit und logischer Spielraum, Tübingen, J. C. B. Mohr (Paul Siebeck).
Villegas, C. 1964 "On qualitative probability σ-algebras," Annals of Mathematical Statistics, 35, 1787-1796.
von Mises, Richard 1942 "On the correct use of Bayes' formula," Annals of Mathematical Statistics, 13, 156-165. Illustrates an approach unusual for a frequentist.
von Wright, Georg Henrik 1962 "Remarks on epistemology of subjective probability," pp. 330-339 in (Nagel, Suppes, and Tarski 1962).
Wald, Abraham (ed.) 1955 Selected Papers in Statistics and Probability, New York, McGraw-Hill Book Co.
Wasserman, Paul, and Fred S. Silander 1958 Decision Making: An Annotated Bibliography, Ithaca, Cornell University Press. 181
Wasserman, Paul, and Fred S. Silander 1964 Decision-Making: An Annotated Bibliography: Supplement, 1958-63, Ithaca, Cornell University Press.
Watts, Donald G. (ed.) 1967 The Future of Statistics, Proceedings of a Conference on the Future of Statistics held at the University of Wisconsin, June 1967, New York and London, Academic Press.
Wetherill, G. B. 1961 "Bayesian sequential analysis," Biometrika, 48, 281-292.
Whittle, Peter 1957 "Curve and periodogram smoothing," Journal of the Royal Statistical Society, Series B, 19, 38-47.
Whittle, Peter 1958 "On the smoothing of probability density functions," Journal of the Royal Statistical Society, Series B, 20, 334-343. These two references are suggestive for personalistic techniques.
Williams, J. S. 1966 "The role of probability in fiducial inference," Sankhyā, Series A, 28, 271-296. 262
Winkler, Robert L. 1968 "The consensus of subjective probability distributions," Management Science, 15, 2, B61-B75. 171
Wolfowitz, J. 1962 "Bayesian inference and axioms of consistent decision," Econometrica, 30, 470-479.
Wolfowitz, J. 1970 "Reflections on the future of mathematical statistics," pp. 739-750 in Essays in Probability and Statistics, eds. R. C. Bose et al., Chapel Hill, University of North Carolina Press.
Technical Symbols
This index is intended to lead to the definitions of all technical symbols that are defined in the text and used extensively. Some symbols have more than one page reference, corresponding to their use in more than one sense, depending on context.
A. B, 0,11,"
1,lIL 85
4(., B), c
-r
Bl
, .I f), 2S5 .Ita .),.~ ',It /,', A, I., II ',I, I, " 81 .,14. I, It ,,14, 82 I, i, 85 I, •• It, 71 GLB, 80 B.,B.. M7 B('A; Z), 288 1of,80 l(fi l), 188, 178 ',71 J(.; 'r). J, 148
(:).131 P,33
II. R. (;i, 195
.. S, lao 1', 186 .up, 287 8, .. 11,_
't
86
I, I',
as
T, 'l", 198
r 1,1M U.l93 V,.
.,.I
VCz),268
I(lP 6), 128 1(. ,. s), ~ ~),
I
l.-
'., 149
k,128
~z).I25
LOB. 80 z,., 1M. 174. 180, UN L., 180, 181
.... 11
£(r. t'), L, 178
L(f; Il. 171, 18), 1M, 181 L(f;". USa. 114, 180, 187 1,(... al, 181 100,127 1.1, 121
TCA),2'1 ftt(r), 141 N,Si'
~,I(,t),
121
/l(i Is), 124
-Dl . ''I f.-
-, ::l. C, 11
~.11
U,
n, 11
:S, 18, IS, 81, 7'J S·,18 ~,
<. >, ., 19
-,26 0-
, 87
+, is
1.
Author Index
This index is intended to lead to every reference made in the text to an author's work or opinion. Only a few of the authors referred to do not have works listed in the bibliography (p. 271). A few examples illustrate the use of this index: F. J. Anscombe is not referred to in the text proper, but there is a reference to him, beyond the mere listing of his name, in the bibliography under [A4]. A paper of which Blackwell is a coauthor, but whose first-named author is Kenneth J. Arrow, is somewhere referred to without mention of Blackwell's name, but only a bibliographic symbol of the form [Aa]. A work of S. B. Seade is listed in the bibliography, but not otherwise mentioned.
Dam
AKba, A. C., 238 AlIaII, Maurioe, 21, W. 101, 102 ADa, a G., Jr., 252
A......, P. J., (A4J Arrow, Kenneth J., 91, 142, 146, 1'11,
CbemoIf, BerDIID, a, 208 Churchman, C. Welt, [C8I Coombs, Qyde B., . . Robert M. Thrall Cramer, Gabriel, 81, 92, ' " 86 0JazMr, Harald, 181, 288, J61, ta, (eaJ
218
Bah."., RaPu Baj, 131, 1M, 140, 2M . . . . . S., 41, a, 78 BMlmoI, WilHam J., f1'1 Bayes, 'Daomaa. IB8J Bell, B. T., 93 BenaouIIi, Daniel, 63. 81, 91, 92 aDd fl., 91, 102, 166 BemouDl, Jaoob (- James), I, 92, [BIll BemoullI, };lcllolae, 93 Bl!'Jrhol, G., 193 BIaJay, M. T. L., It BIactweD, David, loi9, 158, 178, 1M, 187,
navle, Robert L. t , . Robert M. Thrall De FiDetti, Bruno, 4, 1, 28, 33, 38, 40, G, 51, 52, 58, 60, 62, 175, 177, [DI], [D6J Delorme, 8., (D8] Dem;ng, W. Edwanla, [B8)
Doob, J., 238 Duncan, D. B., 255 Dvoretzky, A., 219 EllviDl, 0., 1M
Feller, William, viii, 49, (Fl]
199 . . . KeDDetJi J. Arrow BobJl8DblUlt, H. p .. 148, 189, 191. 218 Donnz.ra, T., 121,269 Borel, Emil, 178, 179 BouIcb& ~b E., ace 8tiaIer in
~"'If Bowker. A. B., 262 BrambU1a, Franoeeco, 90 Buroe, O. X., 270
D' AJembert, Jeu Ie Bond, 85 Darmaia, G., 288
Fenchel, W., . . T.
BODneIIN,
F6rawJ, D., 62 Fisher, InriD" 98
Fisher, R. A., SO, 116, 1M, 2a6, 244, 251, 262, [F3], {F4}, (F6] Piaher, Walter D., 255 ~, Maurice, 178, 238, (D8] Frick, Lad";c. (Btl) Friedman, t.filton, 83, 97. 104 Fry, Thornton C., 30
Ginbick, M. A., 203, 243 . . tJlao David Blaekwell 301
Good, L J., ~ (G21 Graves, u.wrenoe M., 73
MorpDItem, Oekar. ~ l§. ~ ~ @L "- ~ 121, 269 . . ol.IO Jolm YOD Neumaam
Hallll08, Paul R., ix, 41. 131, 134.268,
MarIat, Geor... 1n1
Hammer, P. C., 36 Bardy, G. Ita 266
Morriaoa. Nathan, (K8) MOIteller, FrederIck C., 29 Mourier, Edith, JI8
Ha.....r, R., [Bt3)
MUDrOe, M. E.,
(Ht)
Hndreth, CUlord,
rm
BodaeI, 1. L., Jr., 200. 201. 203. 288 Hame. David, lH7I
NUDb. IL J., II
a,
~
~ ~ ~
270, tK4J
11M
Paul D. White
Kneale, W1l1iam, (A4J Kolmoaorotf, J.. N., 3, ~ 2M, (K81 Koopman, B. 0., M. ~ ft ~ Ma
~
~
(Kll), [K12) Kuhn, lL W., (K13J. {K14]
K'IJlback. s., ~ 1H
'."aford, Cooper Harold, 12 lAp. . . Pierre Simon de, H. (L2J
I.e Cam. Luaen, 241, 243, 2H LehmYD, E. L., 131, 134. 140, 200, 201,
203.221,20&6.249,250,251.252 . . oleo J. L. Bodpe, Jr. Leib1er, R. J.., ~ 1M . . olID S. KuBbaek Leibnia, 1 Lewia, Clarence Irving, 12 Undley, D. V.• [LSJ Littlewood, J. E., _ O. H. Hardy
MacLane. S., lD3. _ olao G. Birkhol Mahuano... P. C., (F6) Markowita, Harry, 1M ManbaU, Alfted, 96 McKiD8ey, J. C. C., 178, 1M Molina, Edward C., lB8)
1S9.240. 2'1,
244, 261, ,"Jt (NIl Nope, PbDip, 2D
KakutaDl, 8., 218 Karlin, 18& . . oleo H. P. Bohnenbluat Kendall, Maurice G., §t 14.0. 221. 241,
King, Robert L.,
~
.m, (MG)
Napl,~~62 N8J1D&Il. Jersy, ~ 115t\
J.trreya. Harold, (Jl) Jerab. James, Jr., . . Paul D. White
262, 270, lK2) KeJ1*, lohn Mayu.rd. !,
71.
Pareto, V'dfNdo. 98 Pucal, Blei. . 81 P"'n, Edward, 256 PeerIOD, E. 8., 1«). lse Pitman, E. J. G., 7&6 Po.,.., G., t . . . . G. lL IIanfy Price, Richard, (B8) PriDpheim, Alfred, (Btl] Rameey, Frank P .., it ~ ~ ~ [al]
RaG, C. R., ~ Beichenbach, Bane, ~ ~ [B2J Richter, BUB, (R2a] RobbiDl, Berber\, 266 Rubin, 'BenDul,
m
Rudy, Norman. 162
Samue1eon, Paul A., ~ 10.1 81..... L. J .., 100. 166, (821) . . al.o Milton Friedman, M. A. Girabick, Paul It. Halmce, cmd B... J .
NUDke 8cbd6, Hemy, 2M . . tUo E. L. LebmMn
Beane, 8. a, ~p., 8b&ckJe, G. L. a, ~,., Shannon. Claude E., liO Shapley, L. S., ~ 188, 189 . , oleo H.. F .. BoImeDblU8t Sbermu,8..Uil Shohat, J. J.., ~ ~ W 8il'feNtoae,
Its -
Smith, Cedric A. B., 1iQ
Snow. R. N.,188 8obcayk, A., 36 Sommer. LouiIe, (B11..1 ~ R. ClaY. 2S2 8tMiat1cal ~ Group, Columbia Uldvet1lity, Ie. l.46 StiPr, Georp J., ~ 96
von Neumann, John, §s 15. 69, 7S. 94, ~ ~ ~ ~ 178. 179, 184, 187, 202 von Wright, Georg Henrik. 2m Wal~ ~
Abraham. viii. ?&a 61, 114, 116,
142. ~ 156, 159, 162, 184. 165, ~ 169. 170, 182, ~ 216, 220, 221, ~ 241. 2&1. 262
'hmarkiD, J. D., 1M J. A. Shohat T. .Id, A.. tee S. 8aDacb TInIl, Robert M., bibliDgrophfl TintDer, Gerhard, i7 Tippett, L. H. C., vii, tT2) Todhunter, b ~ 2m 'nicker, A. W., . . H.. W. Kuhn Toby. John W., 2M, 262
al,o A... Dvoretzky Wallis, W. Allen, xi, 29, 253, 262 Weaver, Warren, "' Claude E. Sh&nnon White, Paul D., 254 WieGel', Norbert, liO Winiams, R. M., 282 Wudom, John Oulton, [W13) Wold, H., ~ 100 WoUowitz, J., 142. 206, 238, 2i1 . . tIlao A. Dvoretaky
yaa D. .~, D.,62
Woodbury, Max, 68
.,~
\'OIl
M1eee, Richard, I
Yates, F., ZH
General Index See a1eo TdnieGl S",."., p. 299, and lndc of Aut.ur., pp. acu-a Acoeptanoe II&IDpIiDa, 263 A.ocIpUD& 247 A.ecaraq estimation, doctriDe of, 2571 Act, defiDition 01, U egmplee of" 14 pDeric symbols for, ~ 14 proper interpretation of, li Acta, CODItant, 26 equivaleDt (or iocWffftAt), 19 pDeric symbol for seta of, a iD.6ni. eeta of, 18 Actuarial ftlue, 9j AdmMaibiUty. 116 in theory of pmeI, 197 of a teet, 148 principle of, ~ ~ 231 1'fOUP, 17' Apeement bet.ween ada, on events, 22 Apeemellt between a probability meuwe aDd • quantitative probability, 3i ApeemeDt between people, ~ 661f, 11.. 128, 127, 217 aa to judpnent, 156 .. to utility, lIi5 complete, t Almoet equivalent events, 3I Almoet eDCt acienoe, lD1 Almost uniform partition, H Alt.emative bypotheeie, 247 of variance, 116 AMGl.t 0/ JI~ 81G1:i1tJe., 272 A posteriori probability, j7 Approach to certainty, leI, 176«. 21.,
Anal,.
228 A priori probability, 47 Ariatotle, 1 A,.. t:tmdtdtJltdi, h ~ 92
m
Aliymptotic IlOI1D&Iity, Aqmptotic vari&aoe, m
Banach-Tanki paradox, 42 Basic act, 106 definition of, l.lO Baaic dedaioa problem, ~ 208 Bayes' rule (or theorem), fli Behavioral iDtAtnoptioD, 28 strictly empirieat., ~ 29 BehavioraUstic and verbaliatic outlooks, 17. Behavioraliatic outlook, ~ 1691. 220, 261 applied to point. -jmation, 2291 BehreDa-PiIber problem, 251, 262 Betl, ~ M Betweenneee, 19 Bibliographies, 270, Binomial distribution, 131, 1461, 222 definition of, Boolean aJcebra, 11 duality principle of, 12 exercisea in, I1ff Borel fiekI, 42 Bounded &Ct, m Bounded utility, ~ 82 Britilb-Ameriean School, ~ ~ ~ 1M, 155, 169
m
ma
m
Certainty, approach
fA),
I-ll, 1781'. 214,
2m fUDct~
263 Copnt I8IIOD, prineiple of, M Communication, 88 Complement of an event, 11 Compouad problem, 2U Coaeaw function. I i Condit.icm&l apectatioD, 2M CoIlditioD&l preference, 22 amonc oonaequenoee IUperftUOUI, 25, 26 Cooditiooal prob&bility, 43« 80S Cbaracteriatic
CoDclitiouJ probability, wide __, 62 Confidence interval, 281 Con&delloe level, 261 of a toleraDce int.ftl, _ CoIIIequence, ~ l j pneric aymboJa lor, U varieQ' of, 14 Coa8equeao., iporance of, 1& I1mbol for let of, U CcmsideratloD. _ of. 30 Qonaietent eequee08 of .'rna., 226 Co_,nt . . , 26 Coo_niDI ennta, 11 Ccmtn.cdoa, 1281 of aD ot.. ..tioa, U2 of a . . of acta, 111 CoDftX
function, ~ . .
1triadr,287 Convex . . of pmblea, 7.6 CoDYU--,_ Con. . act, 1H Correct _imafe. 230 Colt of ~ 116. llJ. 21', 214 Countable additivity. ~ A 78 Cramh-Rao iDequality. 23& DeciIioD, 13 after obIervation, 21 lo(poaDd,8 DeeItion problem, poup, 1". UId o.....tioa, 210 object.ivlltio, 172ft DecIIIioDI, co.eeutift. Ua 18
De8aitive ot..rvation, 127. 133. 212 Dearee of ocmYictioa, m Democracy, In
11
Eftlcient eequence of fStimatAII, 227. 2GIr Empirieal iDtapNtatiOD of poet""tea,
19,2) Epeilon, Penon, U
vertica1,ll Equal ewnta, U Equity, A 92 Equiftleuee, of eat. 01 of teIta, 148 Equivalent acte, 19
acta. ua
. . .,. Root-maa ~ error afttI 8quarecl em>r of AD eetim,te, de6nitioD of, m Errora 01 &as aDd IeOODd kiDd, 140.
M7 FatJmatJon, interval, 2:i9 poim,Z1OI' de&DitioG of, 221 Fajmatloa den'doll plOb1em, 22M Eftnt., oomplemeat of, U deflnltioD of, 10 esamplee 01, 10 _ric eymboll for, 11 Jlull (or vtrtua11y ilDpOllib1e), 2i
ani"""
10
vacuous, 10
Eftllte, almoet equivalent, 37 oontaiDiDI. 11
equaJ,l1
defiDition of, U1 Derived deeiliOD problem, 108
m
Derived problem, n.icn of elpelimeDt., ~ 106, 118 Dicbot.omy, 121 Ditferential iDformatioo. 2381 ~ment between people, Dominance, 111 in theory of pmeI~ 197 of ODe teet by uaother, US
ea
Duelity principle, 1M of Doo'en alpbra, 12 01 peMm&l probability, 78 of theory or PmeI, 185. 186
EqulftleDt. obaervatioaI. 112 EqaiftleD& eequeeoe of fmDte, 62 El'J'Or, meeD equare, ~
eo. of oon"da'atioD, ao
De Morpn'. theorem, poeral, 11 Derived act, 10D
D"aHatic views OD probebility, ~ ~ ~
~
as
iDtereectioD of, 11 UDioD of, 11 EspeetatioD, oooditiODAl, W Ezpeeted ftlue, 21M defiDitioD of, 283 Experience, ~ ~ ~ 82
EzpedmeDt aDd ~ 111. 118 ExteDaion, of aD ot.rvation, 112
of • • 01 .... m EDreme II. J.29
1_
J'utonhiJl~,. aftedoG for . .~.
NroolD, II JlduaW prolJUiIlty, 212
Pme.1L •
IDdaaUft ___.. 2 I..........
ID&mum.1Il
Infhdte letala ..,.,u.d met,heeetl., rl
Foad..... 01 __- . role of, 1
Infinite
FoaadatioDa of atatiItiea, deep, I hittory of, 1«
lDfonnaifoD, ~ !A 2861 cBenmtial. 2811 Iafonaatioa iDeqaalit,., 218 r,..at6a.t,plOll, principle of, 193
tlaaDow,l
~
utiJit7, &1
~
M.
Intepoal, 263 lIlt.enapt.ioD, hebaftnl, •
of deei8ioDs 180ft matbemMiCi of, 184. tbeorr of, US, 178ff Givea,
~M
GlaDd world, IY
Gteek fODta, 11 Group, mathematical. 193 OIOUP &etioa. 105 Group deeiIIoD problem. 1721 ad o_nation, 210 Group minimax rule, 317 llauldortf moment problem, & M. 112 Homopaeou ooordiaateB, 1&1 B,per-uWity. 76 Bypotheaia. altemative, 247 estreme null, 2M
.d,M7 Iaoome, UIii _ t l... ~~llD UId - . 182, Dl peI.onN,123 IaOOllliatency, ~ II, 6I ladeailion, 21 lad.epeuda0l ill qualitative probability. "tIl Iadepeadent ewnte, M IDClepeQdeDt rudom ~ j8
lDdIa. . . . lL69 cU8loulty of t-dar. 1.7 laducdft behavior. 1M
intermediate mode oft .28 atrictly empiriaal. 2& 29 IntAaectioa of ewau, 11 Iot.enal, 261 IDterYal eetimatioD, m definition of, . , 28Q IDtezTaI of pm.... D Interval of HllMri'7• •7 InvariaDee of & pme, UN' IDvariaat minima, lt1. 191 Irrelevant, 126
utterly, 126 Imtlevaut even', ~
JountIIl oJ A~ &a,.-• • A . . . . liM, 2m Judcm-t,l66 IMp numbers, atroac law of, M weak law of, A M. Q1 Learnia&~M
. . GltoEsperi_oe Lebeepe m.eure, jJ. I.ikelD.oocl rMio, A 18M, 225
Ukelibaodfttlo .... IlL 211 l'near I1IDotloD, . , ~C,a . . - "01' . . . . ,
empbiea1 iD~ of, 2D
erl\W1m 01, m iDcompJet.eae. 01, & JlOI'ID&tift iDterpretatloll of, to Loalcal behavior. Implieatiou of, t ~ 2D "Look before 70U leap prineip1e/' 11 critkiIm of, lI. 11
Lea. 111. 1M. 1& un perIOIIAI, 114
Loaa, uniformity of, 188. 174 Lea and neptift iDoome, 182, 2m
MaraiDal utility, 103. 1M diminiahi"" Od
Mat.hemMieal expectation. priDdpIe of. ~91
.Iuimin, !H Muimum-Ubliboocl eatimate, 140, 203. 222&,241 deinition of, 226
Mean-lqU&N error,
~
. . ol.o ~ ~ error tIII4 Squared error Meuurable random variable. 44
MectiM,228 Microooem, 86 MiDimV-j 18i
1(.'."7 statiatic, 137. 2:H
Meet e111"7 . . . of probUilit,y. &•
81.
II Neptlft lnoame. 184. 169. 170 uad 10M, 181, D Ne1JllAD-Peanca ICbool, UO Ne,maD-P.noD dleory 01 ~ 262 aoa-ArchiIDlKleaD probebiDt.y, 19
1.
Normal cli8tribu\km, 182. 222 Normative interpretadou, of postuIa&ee. of tbeotJ of utility, 91 Normatift theory, 102 NuiIaD. puameter, 223 Null eveat, It, 1& Nail hJpot,heli,. 14.7 edreme, 2M
Nun . . . . .tioD, 112
MiDimu~.lM
Minimax equali'Y, 179, 187 Mhdmu _mete, 212, ~ 211 Mlnjmu rule. 157. 1801 ADd eimp1e o.rderiDc, _ poap,17ttr.m objeetivieUc, 1641
deftnitioD of, 1M
iJIuItratioDa of, IMI objectiYiatic mot.ivatiOD of, I§B. 189. Minimax rules, criticism of, 200Jr MiDimu test, 249, 2liO Minim" tbeon-. met.laematiCl of, IMt1 AfiDimu tbeoIy, 1. objectivil&io, de6DitioD of, 186 objectiYiat.ic approach to, 1581 Minimax tbeoIy aDd cae. ,atioa, D lWoimax value, 1M MiDcl act, IA !R ill croup deci8ioa problem, 173 Mised acta in atatist.icI, 218. 216. 217« Mixture of pmble8, n Momell' problem, 1IawIdodr, ~ H, W
Moral expectation, ~ 94 Moral worth, 91ft' Multiper8OD&l OODIIideratiou, 122, 1M, 126, 127. ~ lMlr, 17'21
... . . Apeemeat, Certaint)', tIftd
J>ila&reemeat
Mal...., o. . . . .tioD (or atatiaUc), W oouati", of, 111
Object.iYiado deciliOJl problem, llII Obit_riItic oIarvatioul problem. 2DI ObjeotiYiltlc . . . of probebiltJ'. ~ ~ ~g253,2H
ceatra1 dilicult)' of, , Pl'Obabilt7 of iao1ated propoeitiODll 1UIder,j
ot.t .atioD, 10611. 18 COlt of, lltt UI.. 189. 214. 214 c:IeciIiOD after, 23 cleftnitioll of. l.1O ObeeIYat.ioD8l problem, objectaftldo. 2D8 Ot.erYatioa aDd aperim8Ilt. 117. 118 0 ... ,eclft1ue, un ObtaiDl, 1D ()peIam.c obanctaiatJc. HI Optiri-l, 88 Ordar ItfAiIIie, 111
Parameter, 221 m"nGe, 113
Partial onleriD& 21
Partition, 2A aI!DOG UDifOl'lD, M Partition formaIa. 4Ii P&rtitioD pI~ 1JOI' PfII'IIODaIiItio view, 18
cti8icuItieI wit.b, 17 po_ble ill.... , . . . of, II P....alIIUo . . . of prot.bIIlV, J, 87 Pea8OMI ......... ~ 10
PaIODal probability, cridclam of . . . . . lade approaeh to, ~ 28 .... termI for, •
Pemoa IS eeoaomio UDi~ a
'D
,E""I
I.m..,. !.
ao SIll.
Repet.1Sl
BeJectiD& ~7 Root-~un enor, m .... Meaa«I'JAN enorGJWlBqaazed
PIa . . . .np cleeiliOll, 16 oriticiWD of, !l.. II PaiDt 81&1matioD, 220« de6Dittoo of, 221.
errm'
Po. COIl distribution, 222 Power faaotion, 21.8
St. Petenburs paradox. 981 8chwart. iDequalit:v. . . Science, aImoIt eaet, 101 8equea.tia1 ~ ll§. 1421. 215, 216 8equeat.ial obwa vatiODal pJOIIUD, 1G
PrefereDOe, 17
.. limpJe 0IderiDc. 11 .. pertial ordeIiD& 21
CODditioDal, 22 IUpeliooaa for
oo..-queD~
a§. 26
lne8aivity 01, 17 traDIlttri\y of, 18 PreI. .ce amoDI OOD8lqueDceI, 26 diatlnp'ehed from preference &mODI
....
PIe ItatiltiCl, 6 Primary act, 18& Prbe, 81 PrcabiUtlea of hiper emler, R ProbMJUitJ, methemati-l propertiee of,
M
UDbow:a, mperflUOUilllI. of iD perIODaliatie theory, ~ U Yiewa OIl, d,N.,iC, 3a §!. ~ 83
Ja eo. 61. 87 obJeedvlldc, I, ~ lI. fl, 2S8, 2M perlODa1Wc, Iz trl DeClEIII!7,
_ . . PerllClftl1jatio ~
P!obabBi\y 1Deaaare, 33 ProbMJUit7 ...." 46 PropoIItIoDa, probabQit 1 oi, UDder objectiNtie . . . . f, 2t §L 12 Pa.Jdo.mioroooem, sa PaJOhololiea1 probability,
m
Qualitative probability, defiDitioa of, 32 egm., 28 be hot Jd tiabt, ~ neither fiDe DOl' tiFt, i1
tlaht but not be, a Quantitative probebiJit1.
Random variable, 46 reel, 211 Batjona1 be":rior, t Ray. 1M
aa
Bandomi.tkm, . . 183. 216. m Raadom Dumbera, 67
8equeatial probability ratio proeed.are 146 SipifiCIDce lewl, 252 zeportinc of, 266 Sipificance .... HIlI 8hnpIe dichotomy, ~ 141, Mt. ~
212. 213, 212 Simple orderiDi. 18 aDd the miniIOU rule, D exera- on, 19 abe of a teat, 2m SmNI world, it ~ 821 8qaared ernJI', It 2U
. . . . MeeIHquare error IDfJIYHqUAI"8 error
"., . .
Standard deviatioD, 2U StMdvd pale. 17M 8tudard aequeoce of oIanatloaa, 22t State. 9 true, a State&, pDeric I)'Dlbola fOl', U Statilltic, 128 Btatiati-. other DIINII for. 2 . . . ot.2 Btatiltiea proper, A. 10L 114, 121 defiDitioD of, 1M 8tratea1 fanctioD, 111 Strict.b' convex faactioD, 7111 Subjective probability, :I) SdicieDt ItAtiItie. 1201, 212, ~ tao, 287. ~ 2Ha 259 factorabOlty eri\eftoo for. 1801 8uptemum, ~ 2IJ1 Sure pellODAl ~ §7.. II. 1& 8are-tbiDc prinelple, 2~. 114. 2Irl
&,mmetric dual, 78 Symmetric BlueDoe of eYeD'" _ Symmetry. 232, 246 in probability, 831 of pmee, 193&
Tutea,l65
Te&m mate, 182 Teat., definition of, 247 of hypotheees, 24:61
Utility• • aDd \he mjnlmay ruleI, toUr bounded, 91
critidlm of, Id defiDitioa of, 78
history of, our
loprithmic, M. 91 probability.., Ot, 95. 91 Utterly lrreleYa1lt o'beer9atba, 1" 211. 23'1
TeetiDa. 221
Tedina problem, "7
n. in rank, 219
Tipt, 'KI, 40 Time in theory of deeimOD, 10, 11, II, 44
VMilIation, 21 VacuOlil tmDt, 10
..,...bollor, 10, t 1 Vap.u II, 18, 118, u,e
Value of
0_ ,.1iaD,
111
Toleranoe interval. 262 Tolerance level, 262 Topolosica1 "'mptioDa poeaible for a 1imp1e orden... 18
Variaaoe. 218 VeDA diapam, 12 Verbelietio ud behavioral. . . oatJooh,
Traaaitivity. 19 True state, 9
Verbelistio outlook, 1_. 2m, 280, III iudequacy oIia cJe6ajUoo 01 ptI'IIOUI probUliIity,2'7,28 Vutual exteDlioD, IC8 Vutually equlftleD\ acta, I . Vutually im~ l.hIe eYellt. 2C
UDbiaeed eetimate, ma, 224, m. 246 definition of, 228 UDbi.ued teet, 249 ortticlam of, 250 Uniform distributioD, 1St UDiOD of eventa, 11
Uoiveral event, 10 symbol for, 11 Ut.ile,82
17
World, choice of. g de6Dition 01, 9 eumplt. of, 8 1f'&Ild, M small, 9, 16, 821