If φ is convex, then φ(px + (1 − p)y) ≤ pφ(x) + (1 − p)φ(y) for 0 ≤ p ≤ 1. Jensen's inequality is

(5.29)  φ(E[X]) ≤ E[φ(X)],

valid if φ is convex on an interval containing the range of X.

Hölder's Inequality. Suppose that

(5.30)  p⁻¹ + q⁻¹ = 1,  p > 1, q > 1.

Hölder's inequality is

(5.31)  E[|XY|] ≤ E^{1/p}[|X|^p] · E^{1/q}[|Y|^q].

If, say, the first factor on the right vanishes, then X = 0 with probability 1, hence XY = 0 with probability 1, and hence the left side vanishes also. Assume then that the right side of (5.31) is positive. If a and b are positive, there exist s and t such that a = e^{s/p} and b = e^{t/q}. Since e^x is convex, e^{p⁻¹s + q⁻¹t} ≤ p⁻¹e^s + q⁻¹e^t, or

ab ≤ a^p/p + b^q/q.

This obviously holds for nonnegative as well as for positive a and b. Let u and v be the two factors on the right in (5.31). For each ω,

|X(ω)Y(ω)|/(uv) ≤ |X(ω)|^p/(p u^p) + |Y(ω)|^q/(q v^q).

Taking expected values and applying (5.30) leads to (5.31). If p = q = 2, Hölder's inequality becomes Schwarz's inequality:

(5.32)  E[|XY|] ≤ E^{1/2}[X²] · E^{1/2}[Y²].

Lyapounov's Inequality. Suppose that 0 < α < β. In (5.31) take p = β/α, q = β/(β − α), Y(ω) ≡ 1, and replace X by |X|^α. The result is E[|X|^α] ≤ E^{α/β}[|X|^β], or Lyapounov's inequality:

(5.33)  E^{1/α}[|X|^α] ≤ E^{1/β}[|X|^β],  0 < α < β.
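As a quick sanity check on (5.31) and (5.33), the short Python sketch below (an added illustration, not part of the text; the distribution and exponents are arbitrary choices) compares both sides of Hölder's and Lyapounov's inequalities for a simple pair of random variables.

```python
# Numerical check of Hölder's (5.31) and Lyapounov's (5.33) inequalities
# for a simple (finitely many valued) pair of random variables.
# The distribution below is an arbitrary example, not from the text.

values = [(-1.0, 2.0, 0.2), (0.5, -1.0, 0.3), (2.0, 0.5, 0.5)]  # (x, y, prob)

def expect(f):
    """E[f(X, Y)] for the simple distribution above."""
    return sum(prob * f(x, y) for x, y, prob in values)

p, q = 3.0, 1.5                       # conjugate exponents: 1/p + 1/q = 1
lhs = expect(lambda x, y: abs(x * y))
rhs = (expect(lambda x, y: abs(x) ** p) ** (1 / p)
       * expect(lambda x, y: abs(y) ** q) ** (1 / q))
print("Holder:", lhs, "<=", rhs)      # (5.31)

alpha, beta = 1.0, 2.5                # 0 < alpha < beta
la = expect(lambda x, y: abs(x) ** alpha) ** (1 / alpha)
lb = expect(lambda x, y: abs(x) ** beta) ** (1 / beta)
print("Lyapounov:", la, "<=", lb)     # (5.33)
```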
PROBLEMS

5.1. (a) Show that X is measurable with respect to the σ-field 𝒢 if and only if σ(X) ⊂ 𝒢. (b) Show that, if 𝒢 = {∅, Ω}, then X is measurable 𝒢 if and only if X is constant. (c) Show that, if P(A) is 0 or 1 for every A in 𝒢 and X is measurable 𝒢, then P[X = c] = 1 for a constant c. (d) Complement Theorem 5.1(ii) by showing that Y is measurable σ(X₁, ..., Xₙ) if and only if for all ω and ω′, Xᵢ(ω) = Xᵢ(ω′), i ≤ n, implies that Y(ω) = Y(ω′).

5.2. 2.17 ↑ Show that the unit interval can be replaced by any nonatomic probability measure space in the proof of Theorem 5.2.

5.3. Show that m = E[X] minimizes E[(X − m)²].

5.4. Suppose that X assumes the values m − a, m, m + a with probabilities p, 1 − 2p, p, and show that there is equality in (5.28). Thus Chebyshev's inequality cannot be improved without special assumptions on X.

5.5. Suppose that X has mean m and variance σ².
(a) Prove Cantelli's inequality

P[X − m ≥ a] ≤ σ²/(σ² + a²),  a ≥ 0.

(b) Show that P[|X − m| ≥ a] ≤ 2σ²/(a² + σ²). When is this better than Chebyshev's inequality?
(c) By considering a random variable assuming two values, show that Cantelli's inequality is sharp.

5.6. The polynomial E[(t|X| + |Y|)²] in t has at most one real zero. Deduce Schwarz's inequality once more.

5.7. (a) Write (5.33) in the form E^{β/α}[|X|^α] ≤ E[(|X|^α)^{β/α}] and deduce it directly from Jensen's inequality.
(b) Prove that E[1/X^p] ≥ 1/E^p[X] for p > 0 and X a positive random variable.

5.8. (a) Let f be a convex real function on a convex set C in the plane. Suppose that (X(ω), Y(ω)) ∈ C for all ω and prove a two-dimensional Jensen's
inequality:

(5.34)  f(E[X], E[Y]) ≤ E[f(X, Y)].

(b) Show that f is convex if it has continuous second derivatives that satisfy

(5.35)  f₁₁ ≥ 0,  f₂₂ ≥ 0,  f₁₁ f₂₂ − f₁₂² ≥ 0.

5.9. ↑ Hölder's inequality is equivalent to E[X^{1/p} Y^{1/q}] ≤ E^{1/p}[X] E^{1/q}[Y] (p⁻¹ + q⁻¹ = 1), where X and Y are nonnegative random variables. Derive this from (5.34).

5.10. ↑ Minkowski's inequality is

(5.36)  E^{1/p}[|X + Y|^p] ≤ E^{1/p}[|X|^p] + E^{1/p}[|Y|^p],

valid for p ≥ 1. It is enough to prove that E[(X^{1/p} + Y^{1/p})^p] ≤ (E^{1/p}[X] + E^{1/p}[Y])^p for nonnegative X and Y. Use (5.34).

5.11. The essential supremum of |X| is ess sup |X| = inf[x: P[|X| ≤ x] = 1].
(a) Show that E^{1/p}[|X|^p] ↑ ess sup |X| as p ↑ ∞.
(b) Show that E[|XY|] ≤ E[|X|] ess sup |Y| and ess sup |X + Y| ≤ ess sup |X| + ess sup |Y|. If one formally writes ess sup |X| = E^{1/∞}[|X|^∞], the first of these inequalities is Hölder's for p = 1 and q = ∞ (formally 1⁻¹ + ∞⁻¹ = 1) and the second is Minkowski's inequality (5.36) for p = ∞.
(c) Check (5.33) for the case β = ∞.

5.12. For events A₁, A₂, ..., not necessarily independent, let Nₙ = Σₖ₌₁ⁿ I_{Aₖ} be the number to occur among the first n. Let

(5.37)  αₙ = (1/n) Σₖ₌₁ⁿ P(Aₖ),  βₙ = (1/(n(n − 1))) Σ_{j ≠ k; j,k ≤ n} P(Aⱼ ∩ Aₖ).

Show that

(5.38)  Var[n⁻¹Nₙ] = βₙ − αₙ² + (αₙ − βₙ)/n.

Thus Var[n⁻¹Nₙ] → 0 if and only if βₙ − αₙ² → 0, which holds if the Aₙ are independent and P(Aₙ) = p (Bernoulli trials) because then αₙ = p and βₙ = p² = αₙ².

5.13. (a) Let S be the number of elements left fixed by a random permutation on n letters. Calculate E[S] and Var[S]. (b) A fair r-sided die is rolled n times. Let N be the number of pairs of trials on which the same face results. Calculate E[N] and Var[N]. Hint: In neither case consider the distribution of the random variable; instead, represent it as a sum of random variables assuming values 0 and 1.

5.14. Show that, if X has nonnegative integers as values, then E[X] = Σₙ₌₁^∞ P[X ≥ n].
5.15. Let Iᵢ = I_{Aᵢ} be the indicators of n events having union A. Let Sₖ = Σ I_{i₁} ··· I_{iₖ}, where the summation extends over all k-tuples satisfying 1 ≤ i₁ < ··· < iₖ ≤ n. Then sₖ = E[Sₖ] are the terms in the inclusion-exclusion formula P(A) = s₁ − s₂ + ··· ± sₙ. Deduce the inclusion-exclusion formula from I_A = S₁ − S₂ + ··· ± Sₙ. Prove the latter formula by expanding the product Πᵢ₌₁ⁿ (1 − Iᵢ).

5.16. 2.15 ↑ For integers m and primes p, let α_p(m) be the exact power of p in the prime factorization of m: m = Π_p p^{α_p(m)}. Let δ_p(m) be 1 or 0 as p divides m or not. Under each Pₙ (see (2.20)) the α_p and δ_p are random variables. Show that for distinct primes p₁, ..., p_u,

(5.39)  Pₙ[α_{pᵢ} ≥ kᵢ, i ≤ u] = (1/n) ⌊n/(p₁^{k₁} ··· p_u^{k_u})⌋ → 1/(p₁^{k₁} ··· p_u^{k_u})

and

(5.40)  Pₙ[α_{pᵢ} = kᵢ, i ≤ u] → Πᵢ₌₁ᵘ (1/pᵢ^{kᵢ})(1 − 1/pᵢ).

Similarly,

(5.41)  Pₙ[δ_{pᵢ} = 1, i ≤ u] = (1/n) ⌊n/(p₁ ··· p_u)⌋ → 1/(p₁ ··· p_u).

According to (5.40), the α_p are for large n approximately independent under Pₙ, and according to (5.41), the same is true of the δ_p. For a function f of positive integers, let

(5.42)  Eₙ[f] = (1/n) Σ_{m=1}ⁿ f(m)

be its expected value under the probability measure Pₙ. Show that

(5.43)  Eₙ[α_p] = Σₖ₌₁^∞ (1/n) ⌊n/pᵏ⌋ → 1/(p − 1);

this says roughly that (p − 1)⁻¹ is the average power of p in the factorization of large integers.

5.17. ↑ (a) From Stirling's formula, deduce

(5.44)  Eₙ[log] = log n + O(1).

From this, the inequality Eₙ[α_p] ≤ 2/p, and the relation log m = Σ_p α_p(m) log p, conclude that Σ_p p⁻¹ log p diverges and that there are infinitely many primes.
(b) Let log* m = Σ_p δ_p(m) log p. Show that

(5.45)  Eₙ[log*] = Σ_p (1/n) ⌊n/p⌋ log p = log n + O(1).

(c) Show that ⌊2n/p⌋ − 2⌊n/p⌋ is always nonnegative and equals 1 in the range n < p ≤ 2n. Deduce E_{2n}[log*] − Eₙ[log*] = O(1) and conclude that

(5.46)  Σ_{p ≤ x} log p = O(x).

Use this to estimate the error removing the integral-part brackets introduces into (5.45) and show that

(5.47)  Σ_{p ≤ x} p⁻¹ log p = log x + O(1).

(d) Restrict the range of summation in (5.47) to θx < p ≤ x for an appropriate θ and conclude that

(5.48)  Σ_{p ≤ x} log p ≍ x,

in the sense that the ratio of the two sides is bounded away from 0 and ∞.

(e) Use (5.48) and truncation arguments to prove for the number π(x) of primes not exceeding x that

(5.49)  π(x) ≍ x/log x.

(By the prime number theorem the ratio of the two sides in fact goes to 1.) Conclude that the rth prime p_r satisfies p_r ≍ r log r and that

(5.50)  Σ_p 1/p = ∞.

(A short numerical illustration of (5.48) and (5.49) follows the problem list.)

5.18. Let fₙ(x) be n²x or 2n − n²x or 0 according as 0 ≤ x ≤ n⁻¹ or n⁻¹ ≤ x ≤ 2n⁻¹ or 2n⁻¹ ≤ x ≤ 1. This gives a standard example of a sequence of continuous functions that converges to 0 but not uniformly. Note that ∫₀¹ fₙ(x) dx does not converge to 0; relate to Example 5.6.

5.19. 4.18 ↑ Prove Theorem 5.2 for the case lₙ = 2 by the result in Problem 4.18.
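The following short Python sketch (added here as an illustration; not part of the text, and the cutoffs chosen are arbitrary) tabulates Σ_{p ≤ x} log p and π(x) for a few values of x, so the orders of magnitude asserted in (5.48) and (5.49) can be seen numerically.

```python
# Numerical illustration of (5.48) and (5.49): the sum of log p over primes
# p <= x is of order x, and the number pi(x) of primes up to x is of order
# x / log x.
import math

def primes_up_to(n):
    """Simple sieve of Eratosthenes."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i:n + 1:i] = [False] * len(range(i * i, n + 1, i))
    return [i for i, is_p in enumerate(sieve) if is_p]

for x in (10 ** 3, 10 ** 4, 10 ** 5):
    ps = primes_up_to(x)
    theta = sum(math.log(p) for p in ps)          # sum_{p <= x} log p
    print(x, theta / x, len(ps) / (x / math.log(x)))
    # Both ratios stay bounded away from 0 and infinity (in fact they tend to 1).
```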
SECTION 6. THE LAW OF LARGE NUMBERS

The Strong Law
Let X₁, X₂, ... be a sequence of simple random variables on some probability space (Ω, ℱ, P). They are identically distributed if their distributions (in the sense of (5.9)) are all the same. Define Sₙ = X₁ + ··· + Xₙ.

Theorem 6.1. If the Xₙ are independent and identically distributed and E[Xₙ] = m, then

(6.1)  P[limₙ n⁻¹Sₙ = m] = 1.

PROOF. The conclusion is that n⁻¹Sₙ − m = n⁻¹ Σᵢ₌₁ⁿ (Xᵢ − m) → 0 except on an ℱ-set of probability 0. Replacing Xᵢ by Xᵢ − m shows that there is no loss of generality in assuming m = 0.

Let E[Xᵢ²] = σ² and E[Xᵢ⁴] = ξ⁴. The proof is like that for Theorem 1.2. In analogy with (1.25), E[Sₙ⁴] = Σ E[X_α X_β X_γ X_δ], the four indices ranging independently from 1 to n. Since E[Xₙ] = 0, it follows by the product rule (5.21) for independent random variables that the summand vanishes if there is one index different from the three others. This leaves terms of the form E[Xᵢ⁴] = ξ⁴, of which there are n, and terms of the form E[Xᵢ²Xⱼ²] = E[Xᵢ²]E[Xⱼ²] = σ⁴ for i ≠ j, of which there are 3n(n − 1). Hence

(6.2)  E[Sₙ⁴] = nξ⁴ + 3n(n − 1)σ⁴ ≤ Kn²,

where K does not depend on n. By Markov's inequality (5.27) for k = 4, P[|Sₙ| ≥ nε] ≤ Kn⁻²ε⁻⁴, and so by the first Borel-Cantelli lemma, P[|n⁻¹Sₙ| ≥ ε i.o.] = 0 for each positive ε. Now

(6.3)  [limₙ n⁻¹Sₙ = 0]ᶜ = ⋃_ε [|n⁻¹Sₙ| ≥ ε i.o.],

where the union extends over positive rational ε. This shows in the first place that the set on the left in (6.3) lies in ℱ and in the second place that its probability is 0. Taking complements gives (6.1). •

Example 6.1. The classical example is the strong law of large numbers for Bernoulli trials. Here P[Xₙ = 1] = p, P[Xₙ = 0] = 1 − p, m = p; Sₙ represents the number of successes in n trials, and n⁻¹Sₙ → p with probability 1. The idea of probability as frequency depends on the long-range stability of the success ratio Sₙ/n. •
Example 6.2. Theorem 1.2 is the case of Example 6.1 in which (Ω, ℱ, P) is the unit interval and the Xₙ(ω) are the digits dₙ(ω) of the dyadic expansion of ω. Here p = ½. The set (1.20) of normal numbers in the unit interval has by (6.1) Lebesgue measure 1; its complement has measure 0 (and so in the terminology of Section 1 is negligible). See Example 4.12. •
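For readers who want to see the strong law in action, here is a minimal Python sketch (added for illustration; the sample sizes and seed are arbitrary) that tracks the success ratio Sₙ/n for simulated Bernoulli trials, the setting of Examples 6.1 and 6.2 with p = ½.

```python
# Empirical illustration of Theorem 6.1 / Example 6.1: the success ratio
# S_n / n of Bernoulli(p) trials settles down near p as n grows.
import random

random.seed(0)          # arbitrary seed, for reproducibility
p = 0.5                 # p = 1/2 corresponds to dyadic digits (Example 6.2)
s = 0
for n in range(1, 100001):
    s += 1 if random.random() < p else 0
    if n in (10, 100, 1000, 10000, 100000):
        print(n, s / n)   # the ratios approach p
```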
The Weak Law

If the set in (6.3) has probability 0, so has [|n⁻¹Sₙ| ≥ ε i.o.] for each positive ε, and Theorem 4.1 implies that P[|n⁻¹Sₙ| ≥ ε] → 0. (Compare the discussion centering on (4.31) and (4.32).) But this follows directly from Chebyshev's inequality (5.28), which also leads to a more flexible result.

Suppose that (Ωₙ, ℱₙ, Pₙ) is a probability space for each n and that X_{n1}, ..., X_{n rₙ} are simple random variables on this space. Let Sₙ = X_{n1} + ··· + X_{n rₙ}. If

(6.4)  E[X_{nk}] = m_{nk},  Var[X_{nk}] = σ_{nk}²,

and if X_{n1}, ..., X_{n rₙ} are independent, then

(6.5)  E[Sₙ] = mₙ = Σₖ₌₁^{rₙ} m_{nk},  Var[Sₙ] = σₙ² = Σₖ₌₁^{rₙ} σ_{nk}².

Chebyshev's inequality immediately gives this form of the weak law of large numbers:

Theorem 6.2. Suppose that for each n, X_{n1}, ..., X_{n rₙ} are independent and vₙ > 0. If σₙ/vₙ → 0, then

(6.6)  Pₙ[|Sₙ − mₙ| ≥ εvₙ] → 0

for all positive ε.
Example 6.3. Suppose that the spaces (Ωₙ, ℱₙ, Pₙ) are all the same, that rₙ = n, and that X_{nk} = Xₖ, where X₁, X₂, ... is an infinite sequence of independent random variables. If vₙ = n, E[Xₖ] = m, and Var[Xₖ] = σ², then σₙ/vₙ = σ/√n → 0. This gives the weak law corresponding to Theorem 6.1. •

Example 6.4. Let Ωₙ consist of the n! permutations of 1, 2, ..., n, all equally probable, and let X_{nk}(ω) be 1 or 0 according as the kth element in
the cyclic representation of ω ∈ Ωₙ completes a cycle or not. This is Example 5.5, although there the dependence on n was suppressed in the notation. Here rₙ = n, X_{n1}, ..., X_{nn} are independent, and Sₙ = X_{n1} + ··· + X_{nn} is the number of cycles. The mean m_{nk} of X_{nk} is the probability that it equals 1, namely (n − k + 1)⁻¹, and its variance is σ_{nk}² = m_{nk}(1 − m_{nk}). If Lₙ = Σₖ₌₁ⁿ k⁻¹, then Sₙ has mean Σₖ₌₁ⁿ m_{nk} = Lₙ and variance Σₖ₌₁ⁿ m_{nk}(1 − m_{nk}) ≤ Lₙ. By Chebyshev's inequality,

Pₙ[|Sₙ − Lₙ| ≥ εLₙ] ≤ 1/(ε²Lₙ).

Of the n! permutations on n letters, a proportion exceeding 1 − ε⁻²Lₙ⁻¹ thus have their cycle number in the range (1 ± ε)Lₙ. Since Lₙ = log n + O(1), most permutations on n letters have about log n cycles. For a refinement, see Example 27.3. Since Ωₙ changes with n, it is in the nature of the case that there cannot be a strong law corresponding to this result. •
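As a quick empirical companion to Example 6.4, the Python sketch below (an added illustration; the permutation size and number of trials are arbitrary) counts cycles of uniformly random permutations and compares the average count with Lₙ ≈ log n.

```python
# Example 6.4, empirically: the number of cycles of a random permutation
# of 1..n concentrates around L_n = 1 + 1/2 + ... + 1/n ~ log n.
import random

def cycle_count(perm):
    """Number of cycles in a permutation given as a list of images of 0..n-1."""
    seen, cycles = [False] * len(perm), 0
    for start in range(len(perm)):
        if not seen[start]:
            cycles += 1
            j = start
            while not seen[j]:
                seen[j] = True
                j = perm[j]
    return cycles

random.seed(0)
n, trials = 1000, 2000
counts = []
for _ in range(trials):
    perm = list(range(n))
    random.shuffle(perm)
    counts.append(cycle_count(perm))
L_n = sum(1.0 / k for k in range(1, n + 1))
print("average cycles:", sum(counts) / trials, "  L_n:", L_n)
```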
Bernstein's Theorem

Some theorems that can be stated without reference to probability nonetheless have simple probabilistic proofs, as the last example shows. Bernstein's approach to the Weierstrass approximation theorem is another example. Let f be a function on [0, 1]. The Bernstein polynomial of degree n associated with f is

(6.7)  Bₙ(x) = Σₖ₌₀ⁿ f(k/n) \binom{n}{k} xᵏ(1 − x)^{n−k}.

Theorem 6.3. If f is continuous, Bₙ(x) converges to f(x) uniformly on [0, 1].

According to the Weierstrass approximation theorem, f can be uniformly approximated by polynomials; Bernstein's result goes further and specifies an approximating sequence.

PROOF. Let M = sup_x |f(x)|, and let δ(ε) = sup[|f(x) − f(y)|: |x − y| ≤ ε] be the modulus of continuity of f. It will be shown that

(6.8)  sup_x |f(x) − Bₙ(x)| ≤ δ(ε) + 2M/(nε²).
By the uniform continuity of f, lim_{ε→0} δ(ε) = 0, and so this inequality (for ε = n^{−1/3}, say) will give the theorem.

Fix n ≥ 1 and x ∈ [0, 1] for the moment. Let X₁, ..., Xₙ be independent random variables (on some probability space) such that P[Xᵢ = 1] = x and P[Xᵢ = 0] = 1 − x; put S = X₁ + ··· + Xₙ. Since P[S = k] = \binom{n}{k} xᵏ(1 − x)^{n−k}, the formula (5.16) for calculating expected values of functions of random variables gives E[f(S/n)] = Bₙ(x). By the law of large numbers, there should be high probability that S/n is near x and hence (f being continuous) that f(S/n) is near f(x); E[f(S/n)] should therefore be near f(x). This is the probabilistic idea behind the proof and, indeed, behind the definition (6.7) itself.

Bound |f(n⁻¹S) − f(x)| by δ(ε) on the set [|n⁻¹S − x| ≤ ε] and by 2M on the complementary set, and use (5.19) and (5.20) as in the proof of Theorem 5.3. Since E[S] = nx, Chebyshev's inequality gives

|Bₙ(x) − f(x)| ≤ E[|f(n⁻¹S) − f(x)|] ≤ δ(ε)P[|n⁻¹S − x| ≤ ε] + 2MP[|n⁻¹S − x| ≥ ε] ≤ δ(ε) + 2M Var[S]/(n²ε²);

since Var[S] = nx(1 − x) ≤ n, (6.8) follows. •
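A short Python sketch of the construction in (6.7) follows (added as an illustration; the test function and degrees are arbitrary choices): it evaluates Bₙ for a continuous f and reports the maximum error on a grid, which shrinks as n grows, as Theorem 6.3 asserts.

```python
# Bernstein polynomials (6.7): B_n(x) = sum_k f(k/n) C(n,k) x^k (1-x)^(n-k).
# The error sup_x |f(x) - B_n(x)| shrinks as n grows (Theorem 6.3).
from math import comb, sin, pi

def bernstein(f, n, x):
    return sum(f(k / n) * comb(n, k) * x ** k * (1 - x) ** (n - k)
               for k in range(n + 1))

f = lambda x: sin(2 * pi * x) + abs(x - 0.3)   # an arbitrary continuous test function
grid = [i / 200 for i in range(201)]
for n in (10, 50, 250):
    err = max(abs(f(x) - bernstein(f, n, x)) for x in grid)
    print(n, err)
```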
Refinement of the Second Borel-Cantelli Lemma

For a sequence A₁, A₂, ... of events, consider the number Nₙ = I_{A₁} + ··· + I_{Aₙ} of occurrences among A₁, ..., Aₙ. Since [Aₙ i.o.] = [ω: supₙ Nₙ(ω) = ∞], P[Aₙ i.o.] can be studied by means of the random variables Nₙ.

Suppose that the Aₙ are independent. Put pₖ = P(Aₖ) and mₙ = p₁ + ··· + pₙ. From E[I_{Aₖ}] = pₖ and Var[I_{Aₖ}] = pₖ(1 − pₖ) ≤ pₖ follow E[Nₙ] = mₙ and Var[Nₙ] = Σₖ₌₁ⁿ Var[I_{Aₖ}] ≤ mₙ. If mₙ > x, then

(6.9)  P[Nₙ ≤ x] ≤ P[|Nₙ − mₙ| ≥ mₙ − x] ≤ Var[Nₙ]/(mₙ − x)² ≤ mₙ/(mₙ − x)².

If Σ pₙ = ∞, so that mₙ → ∞, it follows that limₙ P[Nₙ ≤ x] = 0 for each x. Since

(6.10)  P[supₖ Nₖ ≤ x] ≤ P[Nₙ ≤ x],
P[supₖ Nₖ ≤ x] = 0 and hence (take the union over x = 1, 2, ...) P[supₖ Nₖ < ∞] = 0. Thus P[Aₙ i.o.] = P[supₙ Nₙ = ∞] = 1 if the Aₙ are independent and Σ pₙ = ∞, which proves the second Borel-Cantelli lemma once again.

Independence was used in this argument only to estimate Var[Nₙ]. Even without independence, E[Nₙ] = mₙ and the first two inequalities in (6.9) hold.

Theorem 6.4.† If Σ P(Aₙ) diverges and

(6.11)  lim infₙ (Σ_{j,k ≤ n} P(Aⱼ ∩ Aₖ)) / (Σ_{k ≤ n} P(Aₖ))² ≤ 1,

then P[Aₙ i.o.] = 1.

† For an extension, see Problem 6.15.

As the proof will show, the ratio in (6.11) is at least 1; if (6.11) holds, the inequality must therefore be an equality.
PROOF. Let θₙ denote the ratio in (6.11). In the notation above,

Var[Nₙ] = E[Nₙ²] − mₙ² = E[Σ_{j,k ≤ n} I_{Aⱼ} I_{Aₖ}] − mₙ² = Σ_{j,k ≤ n} P(Aⱼ ∩ Aₖ) − mₙ² = (θₙ − 1)mₙ²

(and θₙ − 1 ≥ 0). Hence (see (6.9)) P[Nₙ ≤ x] ≤ (θₙ − 1)mₙ²/(mₙ − x)² for x < mₙ. Since mₙ²/(mₙ − x)² → 1, (6.11) implies that lim infₙ P[Nₙ ≤ x] = 0. It still follows by (6.10) that P[supₖ Nₖ ≤ x] = 0, and the rest of the argument is as before. •
Example 6.5. If, as in the second Borel-Cantelli lemma, the Aₙ are independent (or even if they are merely independent in pairs), the ratio in (6.11) is 1 + Σ_{k ≤ n}(pₖ − pₖ²)/mₙ², so that Σ P(Aₙ) = ∞ implies (6.11). •
Example 6.6. Return once again to the run lengths lₙ(ω) of Section 4. It was shown in Example 4.19 that {rₙ} is an outer boundary (P[lₙ ≥ rₙ i.o.] = 0) if Σ 2^{−rₙ} < ∞. It was also shown that {rₙ} is an inner boundary
(P[lₙ ≥ rₙ i.o.] = 1) if rₙ is nondecreasing and Σ 2^{−rₙ}rₙ⁻¹ = ∞, but Theorem 6.4 can be used to prove this under the sole assumption that Σ 2^{−rₙ} = ∞. As usual, the rₙ can be taken to be positive integers. Let Aₙ = [lₙ ≥ rₙ] = [dₙ = ··· = d_{n+rₙ−1} = 0]. If j + rⱼ ≤ k, then Aⱼ and Aₖ are independent. Suppose that j < k < j + rⱼ, and put m = max{j + rⱼ, k + rₖ}; then Aⱼ ∩ Aₖ = [dⱼ = ··· = d_{m−1} = 0] and so P(Aⱼ ∩ Aₖ) = 2^{−(m−j)} ≤ 2^{−(k−j)}2^{−rₖ} = 2^{−(k−j)}P(Aₖ). Therefore,

Σ_{j,k ≤ n} P(Aⱼ ∩ Aₖ) ≤ Σ_{k ≤ n} P(Aₖ) + 2 Σ_{j < k ≤ n} P(Aⱼ)P(Aₖ) + 2 Σ_{j < k ≤ n} 2^{−(k−j)}P(Aₖ)
  ≤ Σ_{k ≤ n} P(Aₖ) + (Σ_{k ≤ n} P(Aₖ))² + 2 Σ_{k ≤ n} P(Aₖ).

If Σ P(Aₙ) = Σ 2^{−rₙ} diverges, then (6.11) follows. Thus {rₙ} is an outer or an inner boundary according as Σ 2^{−rₙ} converges or diverges, which completely settles the issue. In particular, rₙ = log₂n + θ log₂log₂n gives an outer boundary for θ > 1 and an inner boundary for θ ≤ 1. •
Example 6.7. It is now possible to complete the analysis in Examples 4.11 and 4.15 of the relative error eₙ(ω) in the approximation of ω by Σₖ₌₁ⁿ dₖ(ω)2^{−k}. If lₙ(ω) ≥ −log₂ xₙ (0 < xₙ ≤ 1), then eₙ(ω) ≤ xₙ by (4.22). By the preceding example for the case rₙ = −log₂ xₙ, Σ xₙ = ∞ implies that P[ω: eₙ(ω) ≤ xₙ i.o.] = 1. By this and Example 4.11, [ω: eₙ(ω) ≤ xₙ i.o.] has Lebesgue measure 0 or 1 according as Σ xₙ converges or diverges. •
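To make the boundary dichotomy of Example 6.6 concrete, here is a small Python sketch (an added illustration with an arbitrary seed and horizon): it tracks the run length lₙ of zeros starting at position n in a random binary sequence and records how often lₙ ≥ log₂n + θ log₂log₂n for θ below and above 1.

```python
# Example 6.6, empirically: how often the run length l_n of zeros starting at
# position n exceeds r_n = log2(n) + theta * log2(log2(n)).
import math, random

random.seed(1)
N = 200000
bits = [random.randint(0, 1) for _ in range(N + 100)]

def run_length(n):
    """Length of the run of zeros starting at position n (0-indexed)."""
    k = 0
    while n + k < len(bits) and bits[n + k] == 0:
        k += 1
    return k

for theta in (0.5, 2.0):
    hits = 0
    for n in range(4, N):
        r_n = math.log2(n) + theta * math.log2(math.log2(n))
        if run_length(n) >= r_n:
            hits += 1
    print("theta =", theta, "exceedances:", hits)
# Exceedances keep occurring for theta <= 1 (inner boundary) and become
# rare for theta > 1 (outer boundary).
```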
PROBLEMS

6.1. Show that Zₙ → Z with probability 1 if and only if for every positive ε there exists an n such that P[|Zₖ − Z| < ε, n ≤ k ≤ m] > 1 − ε for all m exceeding n. This describes convergence with probability 1 in "finite" terms.

6.2. Show in Example 6.4 that P[|Sₙ − Lₙ| ≥ Lₙ^{1/2+ε}] → 0.

6.3. As in Examples 5.5 and 6.4, let ω be a random permutation of 1, 2, ..., n. Each k, 1 ≤ k ≤ n, occupies some position in the bottom row of the permu-
tation ω; let X_{nk}(ω) be the number of smaller elements (between 1 and k − 1) lying to the right of k in the bottom row. The sum Sₙ = X_{n1} + ··· + X_{nn} is the total number of inversions, the number of pairs appearing in the bottom row in reverse order of size. For the permutation in Example 5.5 the values of X_{71}, ..., X_{77} are 0, 0, 0, 2, 4, 2, 4, and S₇ = 12. Show that X_{n1}, ..., X_{nn} are independent and P[X_{nk} = i] = k⁻¹ for 0 ≤ i < k. Calculate E[Sₙ] and Var[Sₙ]. Show that Sₙ is likely to be near n²/4.

6.4. For a function f on [0, 1] write ‖f‖ = sup_x |f(x)|. Show that, if f has a continuous derivative f′, then ‖f − Bₙ‖ ≤ ε‖f′‖ + 2‖f‖/(nε²). Conclude that ‖f − Bₙ‖ = O(n^{−1/3}).

6.5. From Theorem 6.2 deduce Poisson's theorem: If A₁, A₂, ... are independent events and P(Aₙ) = pₙ, if p̄ₙ = (p₁ + ··· + pₙ)/n, and if Nₙ = Σₖ₌₁ⁿ I_{Aₖ}, then P[|n⁻¹Nₙ − p̄ₙ| ≥ ε] → 0.
In the following problems Sn = X1 + · · · + Xn .
6.6. Prove Cantelli's theorem: If X₁, X₂, ... are independent, E[Xₙ] = 0, and E[Xₙ⁴] is bounded, then n⁻¹Sₙ → 0 with probability 1. The Xₙ need not be identically distributed.

6.7. (a) Let x₁, x₂, ... be a sequence of real numbers, and put sₙ = x₁ + ··· + xₙ. Suppose that n⁻²s_{n²} → 0 and that the xₙ are bounded, and show that n⁻¹sₙ → 0.
(b) Suppose that n⁻²S_{n²} → 0 with probability 1 and that the Xₙ are uniformly bounded (sup_{n,ω}|Xₙ(ω)| < ∞). Show that n⁻¹Sₙ → 0 with probability 1. Here the Xₙ need not be identically distributed or even independent.
6.8. ↑ Suppose that X₁, X₂, ... are independent and uniformly bounded and E[Xₙ] = 0. Using only the preceding result, the first Borel-Cantelli lemma, and Chebyshev's inequality, prove that n⁻¹Sₙ → 0 with probability 1.

6.9. ↑ Use the ideas of Problem 6.8 to give a new proof of Borel's normal number theorem, Theorem 1.2. The point is to return to first principles and use only negligibility and the other ideas of Section 1, not the apparatus of Sections 2 through 6; in particular, P(A) is to be taken as defined only if A is a finite, disjoint union of intervals.

6.10. 5.12 6.7 ↑ Suppose that (in the notation of (5.37)) βₙ − αₙ² = O(1/n). Show that n⁻¹Nₙ − αₙ → 0 with probability 1. What condition on βₙ − αₙ² will imply a weak law? Note that independence is not assumed here.
6.11. Suppose that X₁, X₂, ... are m-dependent in the sense that random variables more than m apart in the sequence are independent. More precisely, assume that σ(X_{j₁}, ..., X_{k₁}), ..., σ(X_{jₗ}, ..., X_{kₗ}) are independent if k_{i−1} + m < jᵢ for i = 2, ..., l. (Independent random variables are 0-dependent.) Suppose that the Xₙ have this property and are uniformly bounded and that E[Xₙ] = 0. Show that n⁻¹Sₙ → 0. Hint: Consider the subsequences Xᵢ, X_{i+m+1}, X_{i+2(m+1)}, ... for 1 ≤ i ≤ m + 1.

6.12. ↑ Suppose that the Xₙ are independent and assume the values x₁, ..., xₗ with probabilities p(x₁), ..., p(xₗ). For u₁, ..., uₖ a k-tuple of the xᵢ's, let
Nₙ(u₁, ..., uₖ) be the frequency of the k-tuple in the first n + k − 1 trials, that is, the number of m such that 1 ≤ m ≤ n and Xₘ = u₁, ..., X_{m+k−1} = uₖ. Show that with probability 1, all asymptotic relative frequencies are what they should be: with probability 1, n⁻¹Nₙ(u₁, ..., uₖ) → p(u₁) ··· p(uₖ) for every k and every k-tuple u₁, ..., uₖ.

6.13. ↑ A number ω in the unit interval is completely normal if, for every base b and every k and every k-tuple of base-b digits, the k-tuple appears in the base-b expansion of ω with asymptotic relative frequency b^{−k}. Show that the set of completely normal numbers has Lebesgue measure 1.
6.14. Shannon's theorem. Suppose that X₁, X₂, ... are independent, identically distributed random variables taking on the values 1, ..., r with positive probabilities p₁, ..., p_r. If pₙ(i₁, ..., iₙ) = p_{i₁} ··· p_{iₙ} and if pₙ(ω) = pₙ(X₁(ω), ..., Xₙ(ω)), then pₙ(ω) is the probability that a new sequence of n trials would produce the particular sequence X₁(ω), ..., Xₙ(ω) of outcomes that happens actually to have been observed. Show that

−n⁻¹ log pₙ(ω) → h = −Σᵢ₌₁ʳ pᵢ log pᵢ

with probability 1. In information theory 1, ..., r are interpreted as the letters of an alphabet, X₁, X₂, ... are the successive letters produced by an information source, and h is the entropy of the source. Prove the asymptotic equipartition property: For large n there is probability exceeding 1 − ε that the probability pₙ(ω) of the observed n-long sequence lies between e^{−n(h+ε)} and e^{−n(h−ε)}.
6.15. The Rényi-Lamperti lemma. Let the limit inferior in (6.11) be c. Show that P[Aₙ i.o.] ≥ 2 − c. Note that c cannot be less than 1.

6.16. In the terminology of Example 6.6, show that log₂n + log₂log₂n + θ log₂log₂log₂n is an outer or inner boundary as θ > 1 or θ ≤ 1. Generalize. (Compare Problem 4.16.)

6.17. 5.17 ↑ Let g(m) = Σ_p δ_p(m) be the number of distinct prime divisors of m. For aₙ = Eₙ[g] (see (5.42)) show that aₙ → ∞. Show that
(6.12)  Eₙ[(δ_p − Eₙ[δ_p])(δ_q − Eₙ[δ_q])] ≤ 1/(np) + 1/(nq)

for p ≠ q and hence that the variance of g under Pₙ satisfies

(6.13)  Varₙ[g] = O(aₙ).

Prove the Hardy-Ramanujan theorem:

(6.14)  Pₙ[|g − aₙ| ≥ εaₙ] → 0 for each positive ε.
Since aₙ ∼ log log n (see Problem 18.18), most integers under n have something like log log n distinct prime divisors. Since log log 10⁷ is a little less than 3, the typical integer under 10⁷ has about three prime factors, remarkably few.

6.18. Suppose that X₁, X₂, ... are independent and P[Xₙ = 0] = p. Let Lₙ be the length of the run of 0's starting at the nth place: Lₙ = k if Xₙ = ··· = X_{n+k−1} = 0 ≠ X_{n+k}. Show that P[Lₙ ≥ rₙ i.o.] is 0 or 1 according as Σₙ p^{rₙ} converges or diverges. Example 6.6 covers the case p = ½.
• • •
=
=
=
=
=
=
=
SECTION 7. GAMBLING SYSTEMS
Let X₁, X₂, ... be an independent sequence of random variables (on some (Ω, ℱ, P)) taking on the two values +1 and −1 with probabilities P[Xₙ = +1] = p and P[Xₙ = −1] = q = 1 − p. Throughout the section Xₙ will be viewed as the gambler's gain on the nth of a series of plays at unit stakes. The game is favorable to the gambler if p > ½, fair if p = ½, and unfavorable if p < ½. The case p ≤ ½ will be called the subfair case.

After the classical gambler's ruin problem has been solved, it will be shown that every gambling system is in certain respects without effect and that some gambling systems are in other respects optimal. Gambling problems of the sort considered here have inspired many ideas in the mathematical theory of probability, ideas that carry far beyond their origin.

Red-and-black will provide numerical examples. Of the 38 spaces on a roulette wheel, 18 are red, 18 are black, and 2 are green. In betting either on red or on black the chance of winning is 18/38.
.
.
Gambler's Ruin
Suppose that the gambler enters the casino with capital a and adopts the strategy of continuing to bet at unit stakes until his fortune increases to c or his funds are exhausted. What is the probability of ruin, the probability that he will lose his capital, a? What is the probability he will achieve his goal, c? Here a and c are integers. Let (7 . 1 )
Sn = X1 + · · · + Xn '
The gambler's fortune after ( 7 .2)
S0 = 0.
n plays is a + Sn. The event
A a . n = (a + Sn = c, O < a + Sk < c, k < n )
SECTION
7.
89
GAMBLING SYSTEMS
represents success for the gambler at time n , and
Ba. n = ( a + Sn =
( 7 .3)
represents ruin at time success, then
n.
< a + S* <
0� 0
s (a) (
If
c,
k
< n)
denotes the probability of ultimate
(7 .4) for 0 < a < c. Fix c and let a vary. For n > 1 and 0 < a < c, define A a . n by (7.2), and adopt the conventions Aa.o = 0 for 0 < a < c and A,.o = 0 (success is impossible at time 0 if a < c and certain if a = c), as well as Ao. n = Ac. n = 0 for n � 1 (play never starts if a is 0 or c). By these conventions, s c (O) = 0 and s c (c) = 1 . Because of independence and the fact that the sequence X2 , X3, is a probabilistic replica of X1 , X2 , , it seems clear that the chance of success for a gambler with initial fortune a must be the chance of winning the first wager times the chance of success for an initial fortune a + 1 , plus the chance of losing the first wager times the chance of success for an initial fortune a 1 . It thus seems intuitively clear that •
•
•
•
•
•
-
sc ( a ) = psc ( a + 1 ) + qs ( a - 1) ,
(7.5)
0
c
< a < c.
For a rigorous argument, define A � . n just as Aa. n but with s; = X2 + . . . + Xn + l in place of sn in ( 7 . 2 ) . Now P[ X; = X;, i < n ] = P[ X; + I = X;, i s n] for each sequence x 1 , . . . , xn of + 1's and - 1's, and therefore n P[(X1 , . . . , Xn) e H ] = P[( X2 , . n. . , Xn + 1)e H) for H c R . Take H to be the set of X = (x 1 , . . . x n ) in R satisfying X ; = ± 1, a + X I + . . . + x n = ' c, and 0 < a + x 1 + +x k < c for k < n . It follows then that · · ·
(7.6) Moreover, Aa , n = ( [ X1 = + 1] n A �+ l , n - l ) U ( [ X1 = - 1] n A � - l , n - 1 ) for n � 1 and 0 < a < c. By independence and ( 7 . 6 ), P( Aa , n) = pP( Aa + 1, n_ 1) + qP ( A a _ 1 n - 1) ; adding over n now gives (7.5). Note that this argument mvolves the entire infinite sequence X1 , X2 , It remains to solve the difference equation (7.5) with the side conditions rc(O) = O, s c (c) = 1 . Let p = qjp be the odds against the gambler. Then [Al9] there exist constants A and B such that, for 0 � a < c, sc (a ) = A + Bp11 if p ¢ q and sc(a) = A + Ba if p = q. The requirements s c (O) = 0 and sc(c) = 1 determine A and B, which gives the solution: •
t
•
•
•
•
90
PROBABILITY
The probability that the gambler can, before ruin, attain his goal of c from an initial capital of a is

(7.7)  s_c(a) = (ρᵃ − 1)/(ρᶜ − 1),  0 ≤ a ≤ c, if ρ = q/p ≠ 1;  s_c(a) = a/c,  0 ≤ a ≤ c, if p = q = ½.
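A minimal Python sketch of (7.7) follows (an added illustration; the arguments used below are the capitals and goals appearing in Examples 7.1 and 7.2, and the algebraic rearrangement in the unfavorable case is only to keep floating-point arithmetic from overflowing).

```python
# Gambler's ruin, formula (7.7): probability of reaching c before 0
# starting from capital a, with win probability p per unit-stake play.
def success_probability(a, c, p):
    """s_c(a) from (7.7)."""
    q = 1.0 - p
    if abs(p - q) < 1e-12:            # fair case: s_c(a) = a / c
        return a / c
    rho = q / p                       # odds against the gambler
    if rho < 1.0:                     # favorable game: direct formula is stable
        return (rho ** a - 1.0) / (rho ** c - 1.0)
    # unfavorable game (rho > 1): rewrite to avoid overflow for large c
    return rho ** (a - c) * (1.0 - rho ** (-a)) / (1.0 - rho ** (-c))

print(success_probability(900, 1000, 0.5))       # 0.9 (fair game)
print(success_probability(900, 1000, 18 / 38))   # about 3e-5 (red-and-black)
print(success_probability(100, 20000, 18 / 38))  # about 1e-911; underflows to 0.0
```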
The gambler's initial capital is $900 and his goal is $1000. If p = � , his chance of success is very good: s 1000 ( 900) = .9. At red-and black, p = �= and hence p = i� ; in this case his chance of success as • computed by (7.7) is only about .0000 3 . Example 7. 1.
It is the gambler's desperate intention to convert his $100 into $20,000. For a game in which p = � (no casino has one), his chance of success is 100/20,000 = .005 ; at red-and-black it is minute-about 3 X • 10 - 9 11 . Example 7. 2.
In the analysis leading to (7.7), replace (7.2) by (7 . 3) . It follows that (7.7) with p and q interchanged (p goes to p - 1) and a and c - a interchanged a gives the probability rc( a ) of ruin for the gambler: rc ( a ) = ( p - (£ - ) 1 )/( P - c - 1 ) if p ¢ 1 and rc( a ) = (c - a)jc if p = 1. Hence sc( a ) + rc(a) = 1 holds in all cases: The probability is 0 that play continues forever. For positive integers a and b, let Ha. b = U �_ 1 [Sn = b, - a < Sk < b, k < n] be the event that Sn reaches + b before reaching - a. Its probability is simply (7.7) with c = a + b: P( Ha. b) = sa+b(a). Now let Hh = U�_ 1 Ha . b = U �_ 1 [Sn = b] = [supnSn > b] be the event that Sn ever reaches +b. Since Ha. h i Hb as a --+ oo , P( Hb) = lim as a+ h ( a ) ; this is 1 if p = 1 or p < 1, and it is 1/ph if p > 1 . Thus if p >
q, ( 7 .8) if p < q. This is the probability that a gambler with unlimited capital can ultimately gain b units. The gambler in Example 7.1 has capital 900 and the goal of winning b = 100; in Example 7.2 he has capital 100 and b is 19,900. Suppose, instead, that his capital is infinite. If p = �, the chance of achieving his goal increases from .9 to 1 in the first example and from .005 to 1 in the second. At red-and-black, however, the two probabilities .9 100 and .9 1 9900 remain essentially what they were before ( . 0000 3 and 3 x 10 - 9 1 1 ) Example 7.3.
•
•
SECTION
7.
91
GAMB LING SYSTEMS
Selection Systems
Players often try to improve their luck by betting only when in the preceding trials the wins and losses form an auspicious pattern. Perhaps the gambler bets on the nth trial only when among X1 , , Xn - l there are many more + 1 's than - 1 's, the idea being to ride winning streaks (he is " in the vein"). Or he may bet only when there are many more - 1's than + 1's, the idea being it is then surely time a + 1 came along (the " maturity of the chances"). There is a mathematical theorem which translated into gaming language says that all such systems are futile. •
•
•
, X,, _ 1 there is an It might be argued that it is sensible to bet if among X1 , excess of + l's, on the ground that it is evidence of a high value of p. But it is assumed throughout that statistical inference is not at issue: p is fixed-at * , for example, in the case of red-and-black-and is known to the gambler, or should be. •
•
•
The gambler' s strategy is described by random variables B 1 , B2 , taking the two values 0 and 1 : If Bn = 1, the gambler places a bet on the nth trial; if Bn = 0, he skips that trial. If Bn were ( Xn + 1)/2, so that Bn = 1 for Xn = + 1 and Bn = 0 for Xn = - 1 , the gambler would win every time he bet, but of course such a system requires he be prescient-he must know the outcome Xn in advance. For this reason the value of Bn is assumed to depend only on the values of X1 , , Xn 1 : there exists some n 1 function bn : R - l --+ R such that •
•
•
•
•
•
_
(7 .9) (Here B 1 is constant.) Thus the mathematics avoids, as it must, the question of whether prescience is actually possible. Define
{�
( x• . . . . . xn ) . F0 - {O, D } .
(7 .10 ) The
0
n
=
1 , 2, . . . ,
a-field � 1 generated by X1 , , Xn 1 corresponds to a knowledge of the outcomes of the first n - 1 trials. The requirement (7.9) ensures that Bn is mea surable � - I (Theorem 5.1) and so depends only on the information actually available to the gambler just before the n th trial. For n = 1, 2, . . . , let Nn be the time at which the gambler places his n th bet. This n th bet is placed at time k or earlier if and only if the number E�_ 1 B; of bets placed up to and including time k is n or more; in fact, Nn is the smallest k for which E�_ 1 B; = n. Thus the event [ Nn < k ] coincides _
•
•
•
_
92
PROBABILITY
[E�_ 1B; > n]; by (7.9) this latter o( XI , . . . ' xk - 1 > = !Fk - 1 · Therefore, with
event lies in o( B 1 ,
•
•
•
, Bk ) c
(7 .1 1 )
(Even though [Nn = k ] lies in .?Fk - I and hence in .?F, Nn is as a function on 0 generally not a simple random variable because it has infinite range. This makes no difference because expected values of the Nn will play no role; (7 .11) is the essential property.) To ensure that play continues forever (stopping rules will be considered later) and that the Nn have finite values with probability 1, make the further assumption that P ( Bn
( 7 .12 )
=
1 i.o]
=
1.
A sequence { Bn } of random variables assuming the values 0 and 1, having the form (7 . 9), and satisfying (7.12) is a selection system. Let Yn be the gambler's gain on the nth of the trials at which he does bet: Yn = XN . It is only on the set [Bn = 1 i.o.] that all the Nn and hence all the Yn are well defined. To complete the definition, set Yn = - 1, say, on [Bn = 1 i.o.] c; since this set has probabili�y 0 by (7.12), it really makes no difference how Yn is defined on it. Now Yn is a complicated function on 0 because Yn (w) = XN,. "' > (w). < Nonetheless, II
( w : Yn( w ) = 1 ]
=
00
U ( ( w : Nn ( w ) = k ] n ( w : Xk ( w ) = 1 ] )
k-1
lies in .fF and so does its complement simple random variable.
[ w: Yn ( w) = - 1 ] .
Hence
Yn
is a
An example will fix these ideas. Suppose that the rule is always to bet on the first trial, to bet on the second trial if and only if X1 = + 1, to bet on the third trial if and only if X1 = X2 , and to bet on all subsequent trials. Here B 1 = 1, [B2 = 1] = [ X1 = + 1], [B3 = 1 ] = [ X1 = X2 ], and B4 = B5 = · · · = 1. The table shows the ways the gambling can start out. A dot represents a value undetermined by X1 , X2 , X3 Ignore the rightmost column for the moment. Example 7.4.
•
SECTION
xl
x2
x3
-1
-1 -1 +1 +1 -1 -1 +1 +1
-1 +1 -1 +1 -1 +1 -1 +1
-1 -1 -1 +1 +1 +1 +1
7.
GAMBLING SYSTEMS
B t B2 B3 Nt N2 N3 N4 1 1 1 1 1 1 1 1
0 0 0 0 1 1 1 1
1 1 0 0 0 0 1 1
1 1 1 1 1 1 1 1
3 3 4 4 2 2 2 2
4 4 5 5 4 4 3 3
5 5 6 6 5 5 4 4
93
yl
y2
y3
-1 -1 -1 -1 +1 +1 +1 +1
-1 +1
1 1 1 1 2 2 -1 3 +1 .
-1 -1 +1 +1
T
In the evolution represented by the first line of the table, the second bet is placed on the third trial ( N2 = 3), which results in a loss because y2 == XN == x3 = - 1 . Since x3 = - 1, the gambler was " wrong" to bet. 2 But remember that before the third trial he does not know X3( w) (much less "' itself); he knows only X1(w) and X2 (w). See the discussion in Example • s.� Selection systems achieve nothing because { Yn } has the same structure as { Xn } : 1beorem 7.1. For any selection + 1] = p , P [ Yn = - 1 ] = q.
system, { Yn } is independent and P[Yn =
PROOF.
Since random variables with indices that are themselves random variables are conceptually confusing on first aquaintance, the w ' s here will not be suppressed as they have been in previous proofs. Relabel p and q as p ( + 1) and p( - 1), so that P[w: Xk (w) = x ] = p(x) for x == ± 1 . If A e �k - l ' then A and [w: Xk (w) = x ] are independent, and so P( A n [w: Xk (w) = x ]) = P( A) p ( x ). Therefore, by (7.1 1),
==
00
L P ( w : Nn ( w ) = k , Xk ( w ) = x ]
k-1
= p(x) .
00
= L P ( w : Nn ( w ) = k ] p ( x ) k�l
94
PROBABILITY
More generally, for any sequence
P [w:
==
x1,
•
•
•
, x n of
± l ' s,
Y; ( w ) = x ; , i < n ] = P [ w : XN, < ., > ( w ) = x; , i < n ] kl <
...
E
n
P [ w : N;( w ) = k ; , Xk, ( w ) = x; , i < n ] ,
where the sum extends over n-tuples of positive integers satisfying k 1 < . < k n . The event [w: N; (w) = k ; , i < n] n [w: xk (w) = X; , i < n] lies in :Fk n 1 (note that there is no condition on Xk ,(w)), and therefore
..
I
-
P (w:
Y; ( w ) = X; , i < n ] kl <
Summing
...
E
n
P( [w : N;( w ) = k; , i < n ]
k n over k n_ 1 + 1, k n_ 1 + 2, . . .
E
k 1 < · · · < kn - 1
brings this last sum to
P [ w : N;( w ) = k ; , xk; ( w ) = X; , i < n ] p ( x n)
= P [ w : XN, < .,> ( w ) = X; , i < n ) p ( x n) = P (w : Y; ( w ) = X; , i < n ] p ( x n ) .
It follows by induction that
P (w : Y; ( w ) = X; , i < n ] = 0 p ( x ;) = 0 P ( w : Y; ( w ) = X ; ] , i s. n i
•
Gambling Policies There are schemes that go beyond selection systems and tell the gambler not only whether to bet but how much. Gamblers frequently contrive or adopt such schemes in the confident expectation that they can, by pure force of arithmetic, counter the most adverse workings of chance. If the wager specified for the n th trial is in the amount Wn and the gambler cannot see into the future, then Wn must depend only on X1 , , Xn_ 1 Assume •
•
•
•
SECTION
7.
GAMBLING SYSTEMS
95
wn is a nonnegative function of these random variables: there therefore that n 1 such that 1 R --+ R : fn is an
(7 .13) Apart from nonnegativity there are at the outset no constraints on the fn , although in an actual casino their values must be integral multiples of a basic unit. Such a sequence { Wn } is a betting system. Since Wn = 0 corresponds to a decision not to bet at all, betting systems in effect include n selection systems. In the double-or-nothing system, Wn = 2 - 1 if X1 = · · · = Xn _ 1 = - 1 ( W1 = 1) and Wn = 0 otherwise. The amount the gambler wins on the nth play is Wn Xn . If his fortune at time n is Fn , then
(7 . 14)
This also holds for n = 1 if F0 is taken as his initial (nonrandom) fortune. It is convenient to let Wn depend on F0 as well as the past history of play and hence to generalize (7.13) to
(7 . 15 ) for a function
gn : R n
(7 . 16 )
F:0
1
In expanded notation, Wn ( w ) = Bn( F0, X1 ( w ), . . . , Xn _ 1 ( w )) . The symbol Wn does not show the dependence on w or on F0 either. For each fixed initial fortune F0, Wn is a simple random variable; by (7.15) it is measurable � 1 • Similarly, Fn is a function as of X = F (w): F (w), . . . , X of F0 as well n n ( F0, w). n 1 If F0 = 0 and Bn = 1 , the Fn reduce to the partial sums (7.1). Since � _ 1 and o( Xn ) are independent, and since Wn is measurable �- 1 (for each fixed F0), Wn and Xn are independent. Therefore, E [ W,. Xn] == E[Wn ] · E [ Xn 1 · Now E[ Xn 1 = p - q < 0 in the subfair case ( p < �), with equality in the fair case ( p = � ). Since E [ Wn 1 > 0, (7.14) implies that E(Fn ] � E [ Fn _ 1 ]. Therefore, --+ R •
> ··· > ··· > E ( Fn ] > E ( F1 ] -
in the subfair case, and
(7 .1 7 ) the fair case. (If p < q and P [ Wn > 0] > 0, there is strict inequality in (7 . 1 6). ) Thus no betting system can convert a subfair game into a profitable enterprise.
in
96
PROBABILITY
Suppose that in addition to a betting system, the gambler adopts some policy for quitting. Perhaps he stops when his fortune reaches a set target, or his funds are exhausted, or the auguries are in some way dissuasive. The decision to stop must depend only on the initial fortune and the history of play up to the present. Let T ( F0, "' ) be a nonnegative integer for each "' in 0 and each F0 � 0. If T = n , the gambler plays on the n th trial (betting Wn ) and then stops; if T = 0, he does not begin gambling in the first place. The event [ w: T ( F0, "' ) = n ] represents the decision to stop just after the n th trial, and so, whatever value F0 may have, it must depend only on X1 , , Xn . Therefore, assume that •
•
•
n = 0, 1 , 2, . . .
(7 .1 8)
A T satisfying this requirement is a stopping time. (In general it has infinite range and hence is not a simple random variable; as expected values of T play no role here, this does not matter.) It is technically necessary to let T ( F0, "' ) be undefined or infinite on an w-set of probability 0. This has no effect on the requirement (7.1 8 ), which must hold for each finite n. Since T is finite with probability 1, play is certain to terminate. A betting system together with a stopping time is a gambling policy. Let 'IT denote such a policy. Suppose that the betting system is given by Wn = Bn , with Bn as in Example 7.4. Suppose that the stopping rule is to quit after the first loss of a wager. Then [ T = n] = u k_ 1 [Nk = n, y1 = . . . = yk 1 = + 1, Yk = 1 ]. For j � k , [Nk = n , lj = x ] = u:._ 1 [Nk = n , � = m,- Xm = x] lies in � by (7.11); hence T is a stopping time. The values of T are shown • in the rightmost column of the table. Example 7.5.
-
The sequence of fortunes is governed by (7.14) until play terminates, and then the fortune remains for all future time fixed at FT (with value FT< Fo , w ("' )). Therefore, the gambler's fortune at time n is
>
(7 .19 )
F* n =
{ FFnT
if T > n , if T < n .
Note that the case T = n is covered by both clauses here. If n 1 < n � T, then Fn* = Fn = Fn 1 + wn xn = Fn*- 1 + WnXn ; if T � n 1 < n, then Fn* = FT = Fn*_ 1 . Therefore, if W,.* = I[ .,. � n J Wn , then -
-
( 7 .20 )
SECTION
7.
97
GAMBLING SYSTEMS
But this is the equation for a new betting system in which the wager placed at time n is Wn* · If , > n (play has not already terminated), W,1* is the old amount Wn; if T < n (play has terminated), W,,* is 0. Now by (7.18), ( T � n] = [ T < n] c lies in � - 1• Thus Ir T , 1 is measurable � - 1 , so that wn* as well as wn is measurable � - 1' and { wn* } represents a legitimate betting system. Therefore, (7.16) and (7.17) apply to the new system: �
(7 . 21 )
F,0 = F,0* > -
E ( F1* ] ->
> ··· ··· > E ( Fn* ] -
if p < � , and
(7 .22) if p = t . The gambler' s ultimate fortune is FT . Now lim n Fn* = 1 since in fact Fn* = FT for n > T . If
(7 .23)
linm E ( Fn* )
=
E ( FT ),
then (7.21) and (7.22), respectively, imply that E[FT ] < According to Theorem 5.3, (7.23) does hold if the bounded. Call the policy bounded by M (M nonrandom) if
(7.24 )
FT with probability
n = 0, 1 , 2, . . .
F0 and E[ FT ] = F0• Fn* are uniformly
.
If Fn* is not bounded above, the gambler' s adversary must have infinite capital. A negative Fn* represents a debt, and if Fn* is not bounded below, the gambler must have a patron of infinite wealth and generosity from whom to borrow and so must in effect have infinite capital. In case Fn* is bounded below, 0 is the convenient lower bound-the gambler is assumed to have in hand all the capital to which he has access. In any real case, (7 .24) holds and (7.23) follows. (There is a technical point that arises because the general theory of integration has been postponed: FT must be assumed to have finite range so that it will be a simple random variable and hence have an expected value in the sense of Section 5.) The argument has led to this result :
Theorem 7.2. For anypolicy, (1.21) holds ifp < If the policy is bounded (and F., has finite range), and E [ FT ] = F0 for p = t.
} and (1.22) holds ifp = }. then E[F.,] < F0 for p < �
98
PROBABILITY
The gambler has initial capital a and plays at unit stakes until his capital increases to c (0 < a < c) or he is ruined. Here F0 = a and w,, 1 , and so Fn = a + sn. The policy is bounded by c and FT is c or 0 according as the gambler succeeds or fails. If p = t and if s is the probability of success, then a = F0 = E[FT ] = sc. Thus s = ajc. This gives a new derivation of (7.7) for the case p = ! . The argument assumes however that play is certain to terminate. If p < t, Theorem 7.2 only gives s < ajc, • which is weaker than (7.7). Example 7. 6. =
Example 7. 7. Suppose as before that F0 = a and Wn = 1, so that = a + Sn, but suppose the stopping rule is to quit as soon as Fn reaches
Fn a + b.
Here Fn* is bounded above by a + b but is not bounded below. If p = � , the gambler is by (7.8) certain to achieve his goal, so that FT = a + b. In this case F0 = a < a + b = E[F,.]. This illustrates the effect of infinite capital. It also illustrates the need for uniform boundedness in Theorem 5.3 (compare Example 5.6). • For some other systems (gamblers call them " martingales"), see the problems. For most such systems there is a large chance of a small gain and a small chance of a large loss. Bold Play *
The formula (7. 7) gives the chance that a gambler betting unit stakes can increase his fortune from a to c before being ruined. Suppose that a and c happen to be even and that at each trial the wager is two units instead of one. Since this has the effect of halving a and c, the chance of success is now pa/2 pc/2
_
-
1 1
-
pa pc
_
-
1 1
pc/2
+1 pa/2 + 1 '
g_ =
p
p
¢
1.
If p > 1 ( p < � ), the second factor on the right exceeds 1: Doubling the stakes increases the probability of success in the unfavorable case p > 1. In case p = 1, the probability remains the same. There is a sense in which large stakes are optimal. It will be convenient to rescale so that the initial fortune satisfies 0 < F0 < 1 and the goal is 1. The policy of bold play is this: At each stage the gambler bets his entire fortune, unless a win would carry him past his goal of 1, in which case he bets just * This topic may be omitted.
SECTION
7.
99
GAMBLING SYSTEMS
enough that a win would exactly achieve that goal: Wn =
( 7 .25)
{ Fn - 1
1 - Fn - 1
if 0 < F,, - 1 < 1 if ! < Fn 1 < 1 .
'
_
(It is convenient to allow even irrational fortunes.) As for stopping, the policy is to quit as soon as Fn reaches 0 or 1. Suppose that play has not terminated by time k - 1 ; under the policy (7.25), if play is not to terminate at time k, then Xk must be + 1 or - 1 according as Fk 1 < ! or Fk 1 > ! , and the conditional probability of this is at most m = max{ p, q }. It follows by induction that the probability that n bold play continues beyond time n is at most m , and so play is certain to terminate ( T is finite with probability 1). It will be shown that in the sub fair case, bold play maximizes the probability of successfully reaching the goal of 1. This is the Dubins-Savage theorem. It will further be shown that there are other policies that are also optimal in this sense, and this maximum probability will be calculated. Bold play can be substantially better than betting at constant stakes. This contrasts with Theorems 7.1 and 7.2 concerning respects in which gambling systems are worthless. From now on, consider only policies 'IT that are bounded by 1 (see (7.24)). Suppose further that play stops as soon as Fn reaches 0 or 1 and that this is certain eventually to happen. Since FT assumes the values 0 and 1, and since [ FT = x] = U:0_ 0[T = n] n [Fn = x] for x = 0 and x = 1, FT is a simple random variable. Bold play is one such policy 'IT. The policy 'IT leads to success if F., = 1. Let Q.,(x) be the probability of this for an initial fortune F0 = x: _
(7 .26)
_
Q., ( x ) = P [ F., = 1 ] ,
F0 = x.
Since Fn is a function l/l n (F0, X1 (w), . . . , Xn (w)) = Vn (F0, w ), (7.26) in expanded notation is Q.,(x) = P[w: VT( x, '-J ) (x, w) = 1]. As 'IT specifies that play stops at the boundaries 0 and 1,
{ Q.,(O)
Q ., (1) = 1, 0 < Q ., ( x ) < 1, 0 < X < 1.
(7 .27 )
= 0,
Let Q be the Q ., for bold play. (The notation does not show the dependence of Q and Q ., on p, which is fixed.) Theorem
7.3.
In the subfair case, Q .,(x) < Q (x) for all and all x. 'IT
100 PROOF.
( 7.28)
PROBABILITY
Under the assumption
p s q, it will be shown later that
Q( x ) > pQ( x + t ) + qQ(x - t ) , 0<X-
t < X < X + t < 1.
This can be interpreted as saying that the chance of success under bold play starting at x is at least as great as the chance of success if the amount t is wagered and bold play then pursued from x + t in case of a win and from x - t in case of a loss. Under the assumption of (7.28), optimality can be proved as follows. Consider a policy 'IT, and let Fn and Fn* be the simple random variables defined by (7.14) and (7.19) for this policy. Now Q(x) is a real function, and so Q ( Fn*) is also a simple random variable; it can be interpreted as the conditional chance of success if 'IT is replaced by bold play after time n . By (7.20), Fn* = x + tXn if Fn*_ 1 = x and W,.* = t. Therefore, X.I
where x and t vary over the (finite) ranges of Fn*_ 1 and Wn*, respectively. For each x and t, the indicator above is measurable � - 1 and Q ( x + tX. n ) is measurable o( Xn ); since the Xn are independent, (5.21) and (5.14) gtve X. I
By (7.28), E [Q(x + tXn )] < Q ( x ) if 0 < x - t S x < x + t S 1. As it is assumed of 'IT that Fn* lies in [0, 1] (that is, Wn* S min{ Fn*_ 1 , 1 - Fn*_ 1 } ), the probability in (7.29) is 0 unless x and t satisfy this constraint. There fore, X. I
X
This is true for each n, and so E[Q( Fn*)] � E[Q( F0*)] = Q(F0). Since Q( Fn*) = Q( FT ) for n > T, Theorem 5.3 implies that E[Q(FT )] � Q (F0 ). Since x = 1 implies that Q ( x ) = 1, P[FT = 1] s E[Q ( F,.)] < Q( F0). Thus Q"( F0 ) � Q ( F0 ) for the policy 'IT, whatever F0 may be.
SECTION
7.
101
GAMBLING SYSTEMS
Q ) �X< {pQ(2x , ) Q ( X p + qQ(2x (7 �X< x x Q(O)x. x < ! Q(1) x; if p) x + x 2x Q(2x)); x x; p) q) x x) 2x Q(2x Q(x) x x� Q(x) x.
It remains to analyze functional equation .30)
and prove (7.28). Everything hinges on the
=
0 i
- 1),
i, 1.
= 0 and = 1. The For = 0 and = 1 this is obvious because , the first stake idea is this: Suppose that the initial fortune is If under bold play is the gambler is to succeed in reaching 1, he must win the first trial (probability = and then from his new fortune go this makes the first half of (7.30) on to succeed (probability plausible. If � t, the first stake is 1 the gambler can succeed either by winning the first trial (probability or by losing the first trial (probabil = ity and then going on from his new fortune - (1 - 1 to succeed (probability - 1)); this makes the second half of (7.30) plausible. must be an increasing function of It is also intuitively clear that (0 � 1): the more money the gambler starts with, the better off he is. Finally, it is intuitively clear that ought to be a continuous function of the initial fortune
A formal proof of (7.30) can be constructed as for the difference equation (7.5). If /l ( x ) is x for x s ! and 1 - x for x � t , then under bold play W, == {J( F, _ 1 ). Starting from l0 ( x ) == x, recursively define
Then F,
== In ( Jb; x1 ' . . . ' xn ) . Now define
If Fo == x, then T, ( x) == [ gn (x; X1 , . . . , Xn ) == 1] is the event that bold play
will
by time n successfully increase the gambler's fortune to 1. From the recursive defini tion it follows by induction on n that for n � 1, In (x; x1 , . . . , xn ) == fn - 1 (x + /l( x ) x1 ; x 2 , , xn ) and hence that gn (x; x1 , . . . , xn ) == max { x, gn _ 1 (x + /l( x)x1 ; x 2 , , xn ) }. Since x == 1 implies gn _ 1 ( x + fj (x) x1 ; x 2 , , xn ) > x + /l( x) x1 == 1, T, ( x) == [ gn _ 1 ( x + {j(x) X1 ; X2 , , Xn ) == 1], and since the X; are independent and identically distributed, P( T, ( x)) == P(( X1 == + 1] n T,(x)) + P([ X1 == - 1] n T, ( x)) == pP [ gn _ 1 (x + fj (x); X2 , , Xn ) == 1] + qP [ gn _ 1 (x /l( x); X2 , , Xn ) == 1] == pP( T, _ 1 (x + fj (x))) + qP(T, _ 1 (x - fj (x ))). Letting n -+ oo now gives Q (x) == pQ (x + fj ( x)) + qQ (x - fj (x)), which reduces to (7.30) because Q (O) == 0 and Q (1) == 1. Suppose that y == ln - 1 (x; x1 , . . . , xn _ 1 ) is nondecreasing in x. If xn = + 1 , then /, ( X; X 1 , . . . , Xn ) is 2 y if 0 S y < t and 1 if t S y S 1 ; if Xn = - 1, then •
•
•
•
•
•
•
•
•
•
•
•
•
• • •
•
•
102
PROBABILITY
-
In ( x; x 1 , , xn ) is 0 if 0 < y < ! and 2 y 1 if ! < y < 1. In any case, fn ( x; x 1 , , xn ) is also nondecreasing in x, and by induction this is true for every n. It follows that the same is true of gn (x; x1 , , xn ), of P(T,(x)), and of Q (x). Thus Q (x) is nondecreasing. Since Q (1) = 1, (7.30) implies that Q (!) = p Q ( 1) = p, Q ( i) = pQ (!) = p 2 , Q ( i ) = p + qQ ( ! ) = p + pq. More generally, if Po = p and p 1 = q, then • • •
• • •
k (7.31) Q ,. 2
• • •
( )
-
L
[
n
Pu1
•
•
•
Pu. :
U;
;E. 2;
]
k
< " ' 2
n
> 1,
the sum extending over n-tuples ( u1 , , un ) of O's and 1 's satisfying the condition indicated. Indeed, it is easy to see that (7.31) is the same thing as • • •
( 7 .32) n for each dyadic rational . u1 • • un of rank n . If . u1 • • un +n 2- < ! , then u1 = 0 and by (7.30) the difference in (7.32) is p0[ Q (. u 2 un + 2- + t ) - Q (. u 2 • un )). But (7.32) follows inductively from this and a similar relation for the case . u 1 un > t. n n n Therefore Q ( k2 - n ) - Q (( k - 1)2- ) is bounded by max{ p , q }, and so by monotonicity Q is continuous. Since (7.32) is positive, it follows that Q is strictly increasing over [0, 1]. •
•
•
•
•
•
•
•
•
•
Thus Q is continuous and increasing and satisfies (7.30). The inequality (7 .28) is still to be proved. It is equivalent to the assertion that
a ( r, s ) = Q(a) - pQ(s) - qQ(r ) > 0 if 0 < r � s < 1, where a stands for the average: a = �( r + s). Since Q is n continuous, it suffices to prove the inequality for r and s of the form kj2 , and this will be done by induction on n . Checking all cases disposes of n = 0. Assume that the inequality holds for a particular n , and that r and s n have the form kj2 + t . There are four cases to consider.
CASE 1. s < ! . By the first part of (7.30), d(r, s) = p4(2r, 2s). Since 2r n and 2s have the form kj2 , the induction hypothesis implies that d(2r, 2s) > 0. CAsE
2.
t<
r. By the second part of (7.30), a (r, s ) = qa ( 2r - 1, 2s - 1 ) > 0.
CASE 3.
r < a � � � s. By (7.30), a ( r, s ) = p Q(2a ) - p ( p + qQ(2s - 1 ) ] - q [ p Q( 2r ) ] .
SECTION
7.
Since � s s � r + s = 2a < 1, - � < i , Q(2a - � ) = pQ (4a - i ), and it fellows that
&( r, s ) = q (Q( 2a -
103
GAMBLING SYSTEMS
Q(2a) = p + qQ(4a - 1); since 0 < 2a - 1). Therefore, p Q (2a) = p 2 + qQ(2a � ) - p Q( 2s
- 1 ) - p Q(2r )] .
Since p < q, the right side does not increase if either of the two p ' s is changed to q. Hence
& ( r, s ) > q max[�( 2r, 2s - 1 ) , & ( 2s - 1, 2r ) ] . The induction hypothesis applies to 2r < 2s - 1 or to 2s - 1 < 2r, as the case may be, and so one of the two & ' s on the right is nonnegative. CASE
4. r <
�
�
a < s. By (7.30),
& ( r, s ) = pq + qQ(2a - 1 ) - pqQ( 2s - 1 ) - pqQ( 2r ) . Since 0 � 2a - 1 = r + s - 1 < � ' Q (2a - 1) = p Q (4a - 2); since � < 2a - � = r + s - � < 1, Q (2a - � ) = p + qQ (4a - 2). Therefore, qQ (2a - 1) = p Q (2a - � ) - p 2, and it follows that
&( r, s ) = p ( q - p + Q ( 2a I f 2s
-1
<
� ) - qQ( 2s
- 1 ) - qQ( 2r )] .
2r, the right side here is p [( q - p ) (l - Q( 2r ) ) + &( 2s - 1, 2r )] > 0.
If
2r < 2s - 1, the right side is p [ ( q - p)(1 - Q(2 s - 1 )) + & { 2r, 2s - 1 )]
> 0.
This completes the proof of (7 .28) and hence of Theorem 7 .3.
•
The equation (7.31) has an interesting interpretation. Let Z1, Z2 be independent random variables satisfying P[Zn = 0] = Po = p and P(Zn = n ; 1 ] = p 1 = q.; From P[Z 1 i.o.] = 1 and E; n Z; 2 - < 2 - it follows that n = n n n ; ; < ] 2 [E 2 < k2 2 < k2 ] < P[E Z < k2 ]. Z P[E Z ; ; ; 7P �t �t t n Since by (7.31) the middle term is Q(k2 - ). ,
>
( 7 .33 )
•
•
•
104
PROBABILITY
holds for dyadic rational x and hence by continuity holds for all x. In Section 31, Q will reappear as a continuous, strictly increasing function singular in the sense of Lebesgue. On p. 428 is a graph for the case Po = .25. Note that Q( x) = x in the fair case p = t. In fact, for a bounded policy Theorem 7.2 implies that E[FT ] = F0 in the fair case, and if the policy is to stop as soon as the fortune reaches 0 or 1, then the chance of successfully reaching 1 is P[FT = 1 ] = E[FT ] = F0• Thus in the fair case with initial fortune x, the chance of success is x for every policy that stops at the boundaries, and x is an upper bound even if stopping earlier is allowed. Example 7. 8.
The gambler of Example 7.1 has capital $900 and goal $1000. For a fair game ( p = !) his chance of success is .9 whether he bets unit stakes or adopts bold play. At red-and-black ( p = tf), his chance of success with unit stakes is .0000 3 ; an approximate calculation based on (7.31) shows that under bold play his chance Q(.9) of success increases to • about .88, which compares well with the fair case. In Example 7.2 the capital is $100 and the goal $20,000. 11 At unit stakes the chance of successes is .005 for p = � and 3 x 10 - 9 for p = tf. Another approximate calculation shows that bold play at red-and black gives the gambler probability about .003 of success, which again compares well with the fair case. This example illustrates the point of Theorem 7.3. The gambler enters the casino knowing that he must by dawn convert his $100 into $20,000 or face certain death at the hands of criminals to whom he owes that amount. Only red-and-black is available to him. The question is not whether to gamble-he must gamble. The question is how to gamble so as to maximize the chance • of survival, and bold play is the answer. Example 7. 9.
Example 7.9. There are policies other than the bold one which achieve the maximum success probability Q(x). Suppose that as long as the gambler's fortune x is less than 1/2 he bets x for x ≤ 1/4 and 1/2 - x for 1/4 < x ≤ 1/2. This is, in effect, the bold-play strategy scaled down to the interval [0, 1/2], and so the chance he ever reaches 1/2 is Q(2x) for an initial fortune of x. Suppose further that if he does reach the goal of 1/2, or if he starts with fortune at least 1/2 in the first place, he continues with ordinary bold play. For an initial fortune x ≥ 1/2, the overall chance of success is of course Q(x), and for an initial fortune x < 1/2, it is Q(2x)Q(1/2) = pQ(2x) = Q(x). The success probability is indeed Q(x) as for bold play, although the policy is different. With this example in mind, one can generate a whole series of distinct optimal policies. •
Timid Play *
The optimality of bold play seems reasonable when one considers the effect of its opposite, timid play. Let the ε-timid policy be to bet W_n = min{ε, F_{n-1}, 1 - F_{n-1}} and stop when F_n reaches 0 or 1. Suppose that p < q, fix an initial fortune x = F_0 with 0 < x < 1, and consider what happens as ε → 0. By the strong law of large numbers, lim_n n^{-1}S_n = E[X_1] = p - q < 0. There is therefore probability 1 that sup_k S_k < ∞ and lim_n S_n = -∞. Given η > 0, choose ε so that P[sup_k (x + εS_k) < 1] > 1 - η. Since P(∪_{n=1}^∞ [x + εS_n < 0]) = 1, with probability at least 1 - η there exists an n such that x + εS_n < 0 and max_{k≤n} (x + εS_k) < 1. But under the ε-timid policy the gambler is in this circumstance ruined. If Q_ε(x) is the probability of success under the ε-timid policy, then lim_{ε→0} Q_ε(x) = 0 for 0 < x < 1. The law of large numbers carries the timid player to his ruin.
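A quick simulation makes the effect vivid. The sketch below is illustrative only (the trial count, the stakes, and the choice p = 18/38 are arbitrary, not from the text); it estimates Q_ε(x) for a few values of ε.

```python
# A brief simulation sketch of epsilon-timid play: bet min(eps, F, 1 - F) until
# the fortune F hits 0 or 1, and estimate the success probability.
import random

def timid_success(x, p, eps, trials=2000, seed=0):
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        f = x
        while 0.0 < f < 1.0:
            bet = min(eps, f, 1.0 - f)
            f += bet if rng.random() < p else -bet
        wins += (f >= 1.0)
    return wins / trials

p, x = 18 / 38, 0.5
for eps in (0.25, 0.05, 0.01):
    print(eps, timid_success(x, p, eps))   # the success probability shrinks with eps
```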
PROBLEMS

7.1. A gambler with initial capital a plays until his fortune increases b units or he is ruined. Suppose that p > 1/2. The chance of success is multiplied by 1 + θ if his initial capital is infinite instead of a. Show that 0 < θ ≤ ((p/q)^a - 1)^{-1} ≤ (a(p/q - 1))^{-1}; relate to Example 7.3.
7.2. As shown on p. 90, there is probability 1 that the gambler either achieves his goal of c or is ruined. For p ≠ q, deduce this directly from the strong law of large numbers. Deduce it (for all p) via the Borel-Cantelli lemma from the fact that if play never terminates, there can never occur c successive +1's.

7.3. 6.12 ↑ If V_n is the set of n-long sequences of ±1's, the function b_n in (7.9) maps V_{n-1} into {0, 1}. A selection system is a sequence of such maps. Although there are uncountably many selection systems, how many have an effective description in the sense of an algorithm or finite set of instructions by means of which a deputy (perhaps a machine) could operate the system for the gambler? An analysis of the question is a matter for mathematical logic, but one can see that there can be only countably many algorithms or finite sets of rules expressed in finite alphabets. Let Y_1^{(σ)}, Y_2^{(σ)}, ... be the random variables of Theorem 7.1 for a particular system σ, and let C_σ be the ω-set where every k-tuple of ±1's (k arbitrary) occurs in Y_1^{(σ)}(ω), Y_2^{(σ)}(ω), ... with the right asymptotic relative frequency (in the sense of Problem 6.12). Let C be the intersection of C_σ over all effective selection systems σ. Show that C lies in ℱ (the σ-field in the probability space (Ω, ℱ, P) on which the X_n are defined) and that P(C) = 1. A sequence (X_1(ω), X_2(ω), ...) for ω in C is called a collective: a subsequence chosen by any of the effective rules σ contains all k-tuples in the correct proportions.
* This topic may be omitted.
7.4. Let D_n be 1 or 0 according as X_{2n-1} ≠ X_{2n} or not, and let M_k be the time of the kth 1, that is, the smallest n such that Σ_{i=1}^n D_i = k. Let Z_k = X_{2M_k}. In other words, look at successive nonoverlapping pairs (X_{2n-1}, X_{2n}), discard accordant (X_{2n-1} = X_{2n}) pairs, and keep the second element of discordant (X_{2n-1} ≠ X_{2n}) pairs. Show that this process simulates a fair coin: Z_1, Z_2, ... are independent and identically distributed and P[Z_k = +1] = P[Z_k = -1] = 1/2, whatever p may be. Follow the proof of Theorem 7.1; it is instructive to write out the proof in all detail, suppressing no ω's.
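This pairing scheme is easy to try numerically. The sketch below is illustrative only (the function names, the bias, and the sample size are arbitrary choices, not from the text); it applies the rule of Problem 7.4 to a biased ±1 sequence.

```python
# Discard accordant pairs, keep the second element of discordant pairs, and
# check that the kept values are close to evenly split.
import random

def biased_steps(n, p):
    """n independent steps X_i with P[X_i = +1] = p, P[X_i = -1] = 1 - p."""
    return [1 if random.random() < p else -1 for _ in range(n)]

def extract_fair(xs):
    """Keep X_{2n} from each discordant pair (X_{2n-1}, X_{2n})."""
    return [b for a, b in zip(xs[::2], xs[1::2]) if a != b]

random.seed(0)
zs = extract_fair(biased_steps(200000, p=0.7))
print(sum(1 for z in zs if z == 1) / len(zs))   # close to 1/2 despite p = 0.7
```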
7.5. Suppose that a gambler with initial fortune 1 stakes a proportion θ (0 < θ < 1) of his current fortune: F_0 = 1 and W_n = θF_{n-1}. Show that F_n = Π_{k=1}^n (1 + θX_k) and hence that

log F_n = (n/2) log(1 - θ²) + (S_n/2) log((1 + θ)/(1 - θ)).

Show that F_n → 0 with probability 1 in the subfair case.

7.6. In "doubling," W_1 = 1, W_n = 2W_{n-1}, and the rule is to stop after the first win. For any positive p, play is certain to terminate. Here F_T = F_0 + 1, but of course infinite capital is required. If F_0 = 2^k - 1 and W_n cannot exceed F_{n-1}, the probability of F_T = F_0 + 1 in the fair case is 1 - 2^{-k}. Prove this via Theorem 7.2 and also directly.

7.7. In "progress and pinch," the wager, initially some integer, is increased by 1 after a loss and decreased by 1 after a win, the stopping rule being to quit if the next bet is 0. Show that play is certain to terminate if and only if p ≥ 1/2. Show that F_T = F_0 + W_1²/2 + (T - 1)/2. Infinite capital is required.

7.8. Here is a common martingale. At each stage the gambler has before him a pattern x_1, ..., x_k of positive numbers. He bets x_1 + x_k, or x_1 in case k = 1. If he loses, at the next stage he uses the pattern x_1, ..., x_k, x_1 + x_k. If he wins, at the next stage he uses the pattern x_2, ..., x_{k-1}, unless k is 1 or 2, in which case he quits. Show that play is certain to terminate if p ≥ 1/3 and that the ultimate gain is the sum of the numbers in the initial pattern. Infinite capital is again required.
_
Suppose that wk = 1 , so that Fk = Fo + sk . Suppose that p > q and T is a stopping time such that 1 < T < n with probability 1 . Show that E[ F, ] < E [ F, ], with equality in case p = q. Interpret this result in terms of a stock option that must be exercised by time n , Fo + s�.. representing the price of the stock at time k.
7. 10. Let u be a real function on [0, 1], u( x) representing the utility of the fortune x. Consider policies bounded by 1 ; see (7 .24) . Let Q ( Fo ) = E [ u ( F, ) ] ; this represents the expected utility under the policy , of an initial fortune Jb . Suppose of a policy w0 that 'IT
( 7 .34)
0 < X < 1,
SECTION
8.
107
MARKOV CHAINS
and that
( 7 .35) 0 < X - 1 < X < X + t < 1. Show that Q ( x) < Q ( x) for all x and all policies , . Such a w0 is optimal. Theorem 7.3 is the ;pecial case of this result for p < i , bold play in the role of w0 , and u(x) = 1 or u ( x ) = 0 according as x = 1 or x < 1. Condition (7.34) says that gambling with policy w0 is at least as good as not gambling at all; (7 .35) says that, although the prospects even under w0 become on the average less sanguine as time passes, it is better to use w0 now than to use some other policy for one step and then change to w0 . 'IT
'IT
7.1 1. The functional equation (7.30) and the assumption that Q is bounded suffice to determine Q completely. First, Q(O) and Q(1) must be 0 and 1, respec tively, and so ( 7.3 1) holds. Let 1Qx = !x and T1 x = }x + ! ; let f0 x = px and /1 x = p + qx. Then Q( Tu � x) = fu fu Q( x). If the binary expansions of x and y both b�gin with the diiits u 1 , � , u,r , they have the form x = T Tu,.x' and y = T Tu ,. y'. If K bounds Q and if m = max{ p , q }, it follows that IQ( x) - Q(y) l < Km n . Therefore, Q is continuous and satisfies (7.31). · · ·
"•
· · ·
· · ·
"•
• •
· · ·
SECTION 8. MARKOV CHAINS
As Markov chains illustrate well the connection between probability and measure, their basic properties are here developed in a measure-theoretic setting.

Definitions
Let S be a finite or countable set. Suppose that to each pair i and j in S there is assigned a nonnegative number piJ and that these numbers satisfy the constraint
L P;j = l ,
(8.1 )
jES
i E S.
Let X0, X1 , X2, be a sequence of random variables whose ranges are contained in S. The sequence is a Markov chain or Markov process if •
•
•
P ( Xn + t = JIXo = i o ,
(8.2 )
· · · ,
Xn = i n ]
= P ( Xn+ t = ii Xn = i n ] = P;nJ for every
n
and every sequence i0,
•
•
•
, i n in S for which P [ X0 = i0,
•
•
•
, Xn
108
PROBABILITY
= i n ] > 0. The set S is the state space or phase space of the process, and the p iJ are the transition probabilities. Part of the defining condition (8.2) is that the transition probability (8 .3) does not vary with n. t The elements of S are thought of as the possible states of a system, Xn representing the state at time n. The sequence or process X0, X1 , X2 , then represents the history of the system, which evolves in accordance with the probability law (8.2). The conditional distribution of the next state Xn + t given the present state Xn must not further depend on the past X0 , , Xn - t · This is what (8.2) requires, and it leads to a copious theory. The initial probabilities are •
•
'IT;
(8 .4)
•
•
•
•
= P [ X0 = i] .
The 'IT; are nonnegative and add to 1, but the definition of Markov chain places no further restrictions on them. Example 8. 1.
The Bernoulli-Lap/ace model of diffusion. Imagine r black
balls and r white balls distributed between two boxes, with the constraint that each box contains r balls. The state of the system is specified by the number of white balls in the first box, so that the state space is S = {0, 1, . . . , r } . The transition mechanism is this: at each stage one ball is chosen at random from each box and the two are interchanged. If the present state is i, the chance of a transition to i - 1 is the chance i/r of drawing one of the i white balls from the first box times the chance i/r of drawing one of the i black balls from the second box. Together with similar arguments for the other possibilities, this shows that the transition probabil ities are
P;, ; - t =
( � r.
r ; 2 Pi , i +l = r '
(
)
i r i ( ) = 2 Pu r2 '
the others being 0. This is the probabilistic analogue of the model for the • ftow of two liquids between two containers. * t Sometimes in the definition of Markov chain P [ Xn + 1 = jl Xn = i ] is allowed to depend on n . A chain satisfying (8.3) is then said to have stationary transition probabilities, a phrase that will be omitted here because (8.3) will always be assumed. * For an excellent collection of examples from physics and biology, see FELLER, Volume 1 , Chapter XV.
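To make the transition mechanism concrete, here is a small sketch (not from the text) that writes out the Bernoulli-Laplace transition probabilities for a small r and checks the row constraint (8.1); the value of r and the variable names are illustrative.

```python
# Build the Bernoulli-Laplace transition matrix and verify (8.1).
r = 4
S = range(r + 1)                              # number of white balls in the first box
P = [[0.0] * (r + 1) for _ in S]
for i in S:
    if i > 0:
        P[i][i - 1] = (i / r) ** 2            # white drawn from box 1, black from box 2
    if i < r:
        P[i][i + 1] = ((r - i) / r) ** 2      # black drawn from box 1, white from box 2
    P[i][i] = 2 * i * (r - i) / r ** 2        # the exchanged balls match in color
print(all(abs(sum(row) - 1.0) < 1e-12 for row in P))   # (8.1): each row sums to 1
```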
SECTION
8.
109
MARKOV CHAINS
The p_{ij} form the transition matrix P = [p_{ij}] of the process. A stochastic matrix is one whose entries are nonnegative and satisfy (8.1); the transition matrix of course has this property.

Example 8.2.
{ O , l, . . . , r } and
Random walk with absorbing barriers. 1
0 0
q •
•
0 0 0
•
•
0 0 0
0 0
p
q
0 P=
0
p
0
•
•
•
0 0 0
•
•
•
0 0 0
•
•
•
•
•
•
...
•
•
•
•
•
... •
•
•
•
•
•
0 0 0
0 0 0 •
•
•
q
0 0
•
•
•
0 0 0 •
0
q
0
•
•
p 0 0
S=
Suppose that 0 0 0
•
•
•
0
p 1
That is, Pi, i + l = p and P;, ; - t = q = 1 p for 0 < i < r and p00 = p,, = 1. The chain represents a particle in random walk. The particle moves one unit to the right or left, the respective probabilities being p and q, except that each of 0 and r is an absorbing state-once the particle enters, it cannot leave. The state can also be viewed as a gambler's fortune; absorption in 0 represents ruin for the gambler, absorption in r ruin for his adversary (see Section 7). The gambler's initial fortune is usually regarded as nonrandom, so that (see (8.4)) 'IT; = 1 for some i. • -
Example 8. 3. Unrestricted random walk. Let S consist of all the integers p. This chain i = 0, ± 1, ± 2, . . . , and take P;, ; + t = p and P;, ; - I = q = 1 -
represents a random walk without barriers, the particle being free to move • anywhere on the integer lattice. The walk is symmetric if p = q.
The state space may, as in the preceding example, be countably infinite. If so, the Markov chain consists of functions Xn on a probability space (D, !F, P ), but these will have infinite range and hence will not be random variables in the sense of the preceding sections. This will cause no difficulty however, because expected values of the Xn will not be considered. All that is required is that for each i E S the set [w: Xn (w) = i] lie in !F and hence have a probability. Let S consist of the integer lattice points in k-dimensional Euclidean space R k ; x = (x 1 , , x k ) lies in S if the coordinates are all integers. Now x has 2k neighbors, points of the form y = (x 1 , , X ; + 1, . . . , x k ); for each such y let Px_v = (2k) - 1 . Example 8.4.
Symmetric random walk in space.
•
•
.
.
•
•
110
PROBABILITY
The chain represents a particle moving randomly in space; for k = 1 it reduces to Example 8.3 with p = q = � . The cases k < 2 and k > 3 exhibit an interesting difference. If k s 2, the particle is certain to return to its • initial position, but this is not so if k > 3; see Example 8.6. Since the state space in this example is not a subset of the line, the X0 , X1 , do not assume real values. This is immaterial because expected values of the Xn play no role. All that is necessary is that Xn be a mapping from 0 into S (finite or countable) such that [ w: Xn ( w) = i] E � for i E S. There will be expected values E[/(Xn )1 for real functions f on S with finite range, but then /( Xn (w)) is a simple random variable as defined before. •
•
•
Example 8.5. A selection problem. A princess must choose from among r suitors. She is definite in her preferences and if presented with all r at once could choose her favorite and could even rank the whole group. They are ushered into her presence one by one in random order, however, and she must at each stage either stop and accept the suitor or else reject him and proceed in the hope that a better one will come along. What strategy will maximize her chance of stopping with the best suitor of all? Shorn of some details, the analysis is this. Let S1 , S2 , , Sr be the suitors in order of presentation; this sequence is a random permutation of the set of suitors. Let X1 == 1, and let X2 , X3 , be the successive positions of suitors who dominate (are preferable to) all their predecessors. Thus X2 == 4 and X3 == 6 means that S1 dominates S2 and � but S4 dominates S1 , S2 , � , and that S4 dominates S5 but S6 dominates S1 , , S5 • There can be at most r of these dominant suitors; if there are exactly m , xm + 1 == xm + 2 == == r + 1 by convention. As the suitors arrive in random order, the chance that S; ranks highest among S1 , , S; is (i - 1)!/i ! == 1/i. The chance that � ranks highest among S1 , � and S; ranks next is ( j - 2)!/j ! == 1/j ( j - 1). This leads to a chain with transition probabilitiest • • •
• • •
• • •
•
•
•
•
•
•
•
P [ X, + 1 = JI X,
(8.5 )
i i ... ] ... j( j _ l ) ,
•
(8.6)
•
p [ xn + 1 == r + 11 xn == i ] == !_ ' r
•
,
1 < i < j < r.
If Xn == i, then Xn + 1 == r + 1 means that S; dominates S; + 1 , S1 , , S; , and the conditional probability of this is •
•
• • •
, Sr
as
well
as
1 < i < r.
As downward transitions are impossible and r + 1 is absorbing, this specifies a transition matrix for S == {1, 2, . . . , r + 1 }. It is quite clear that in maximizing her chance of selecting the best suitor of all, the princess should reject those who do not dominate their predecessors. Her strategy therefore will be to stop with the suitor in position XT , where T is a random t The details can be found in DYNKIN & YUSHKEVICH. Chapter III.
SECTION
8.
111
MARKOV CHAINS
variable representing her strategy. Since her decision to stop must depend only on the suitors she has seen thus far, the event [ T == n ] must lie in a( X1 , , Xn ) . If X., == i , then by (8.6) the conditional probability of success is /(i) == ifr. The probability of success is therefore E[/( X., )], and the problem is to choose the • strategy T so as to maximize it. For the solution, see Example 8.16.t • • •
Higher-Order Transitions
The properties of the Markov chain are entirely determined by the transi tion and initial probabilities. The chain rule (4.2) for conditional probabili ties gives
Similarly, (8. 7)
for any sequence Further, (8. 8 )
i i 0'
1'
. . . ' i m of states.
P ( Xm + t = }1 , 1
<
t < n i Xs = i s , O < s < m ]
as follows by expressing the conditional probability as a ratio and applying (8. 7) to numerator and denominator. Adding out the intermediate states now gives the formula (8.9 )
P � �> = P ( Xm + n = J· I Xm = i ] '}
(the k1 range over S ) for the nth order transition probabilities. t With the princess replaced by an executive and the suitors by applicants for an office job, this is known as the secretary problem.
112
PROBABILITY
n
Notice thatp f? > is the entry in position (i, j ) of p , the n th power of the transition matrix P. If S is infinite, P is a matrix with infinitely many rows and columns; as the terms in (8.9) are nonnegative, there are no conver gence problems. It is natural to put
{
1 P ��> = 8 . . = 0 I}
I}
if i = j , if i ¢ j .
Then P0 is the identity, as it should be. From (8.1) and (8.9) follow n + <� � l � > � . . � � ) P p = p p p = 8.10 ( ) LJ 1 , "} LJ 1 , 'II} > ' I} ,
,
An Existence Theorem
Theorem 8.1. Suppose that P = [p_{ij}] is a stochastic matrix and that π_i are nonnegative numbers satisfying Σ_{i∈S} π_i = 1. There exists on some (Ω, ℱ, P) a Markov chain X_0, X_1, X_2, ... with initial probabilities π_i and transition probabilities p_{ij}.
e
Reconsider the proof of Theorem 5.2. There the space ( 0, ofF, P) was the unit interval, and the central part of the argument was the construction of the decompositions (5.10). Suppose for the moment that S = { 1 , 2, . . . ) . First construct a partition Jf0>, 1�0>, . . . of (0, 1] into counta bly manyt subintervals of lengths ( P is again Lebesgue measure) P( I/0> ) = 'IT;· Next decompose each 1/0> into subintervals I;�> of lengths P( l;�>) = 'IT; P ;J · Continuing inductively gives a sequence of partitions { /;�� �. ;,. : i 0 , , i n = 1 , 2, . . . } such that each refines the preceding and P( l;�� > . ;,. ) = 'IT;0 pi 0 i1 pin in Put Xn ( w) = i if w E U ;0 . . . ;,. _ 1 1;�� �. ;,. _ 1 ; . It follows just as in the proof of Theorem 5.2 that the set [ X0 = i 0 , , Xn = i n ] coincides with the interval 1;��� . ;,. · Thus P[ X0 = i , , Xn = in] = 'IT;0 P ;0 ; 1 P;,._ 1 ; ,.· From this, (8.2) 0 and (8.4) follow immediately. That completes the construction for the case S = { 1 , 2, . . . } . For the general countably infinite S, let g be a one-to-one mapping of { 1 , 2, . . . } onto S and replace the Xn as already constructed by g( Xn ); the assumption S = { 1, 2, . . . } was merely for notational convenience. The same argument • obviously works if S is finite. PROOF.
•
•
•
•
•
•
-I
•
•
•
•
•
•
•
•
•
•
Although strictly speaking the Markov chain is the sequence X , X1 � . . . , one often speaks as though the chain were the matrix P together0 = b - a and B; � 0 , then I; = (b - E1 s , 81 , b - E1 < ;B1 ), i = 1 , 2 decompose ( a , b ] into intervals of lengths B, .
t 1r
81 + 82 +
·
·
·
• . . .
,
SECTION
8.
113
MARKOV CHAINS
with the initial probabilities 'IT; or even P with some unspecified set of 'IT; . Theorem 8.1 justifies this attitude: For given P and 'IT; the corresponding Xn do exist, and the apparatus of probability theory-the Borel-Cantelli lemmas and so on-is available for the study of P and of systems evolving in accordance with the Markov rule. From now on fix a chain X0 , X1 , • • • satisfying 'IT; > 0 for all i. Denote by P; probabilities conditional on [ X0 = i]: P;(A) = P[A I X0 = i]. Thus
( 8. 11)
--
P. [ X = z· 1 < t < n ] = p I
I
I'
.
II1
..
p I 1 I 2 • • • P I· n - 1 I· n
by (8.8). The interest centers on these conditional probabilities, and the actual initial probabilities are now largely irrelevant. From (8.11) follows
(8. 12)
Suppose that I is a set (finite or infinite) of m-long sequences of states, J is a set of n-long sequences of states, and every sequence in I ends in j. Adding both sides of (8.12) for ( i 1 , • • • , i m ) ranging over I and ( j1, • • • , in ) ranging over J gives
( 8 . 13)
For this to hold it is essential that each sequence in (8.12) and (8.13) are of central importance.
I end in j.
Formulas
Transience and Persistence
Let
(8.14)
-
be the probability and let
(8.1 5 )
of a first visit to j at time
tij
= P;
n
for a system that starts in
( CJl x = i 1 ) = f /;�" > n-1
..
n-1
i,
1 14
PROBABILITY
be the probability of an eventual visit. A state i is persistent if a system starting at i is certain sometime to return to i: /;; = 1. The state is transient in the opposite case: /;; < 1. < n k and Suppose that n 1 , • • • , n k are integers satisfying 1 < n 1 < consider the event that the system visits j at times n 1 , • • • , n k but not in between; this event is determined by the conditions · · ·
XI
=I= j '
. . . ' xn l - I
=I= j '
xn l
= j'
. . . ' Xn 2 - I =I= j ' Xn 2 = j ' .................. . .......
Xn l + I
=I= j '
Repeated application of (8.13) shows that under P; the probability of this n n n �}n �< - n �< - •> . Add this over the k-tuples n 1 , • • • , n k : event is /;S · >�} 2 - 1 > 1 the P;-probability that Xn = j for at least k different values of n is /;1�J - • Letting k --+ oo therefore gives · · ·
if lI.jj. < if lI.jj. =
( 8 .16 ) Recall that
i.o. means infinitely often. Taking i = j gives if /; ; < if }1,. . =
( 8 .17 ) Thus
4.5),
1' 1.
1' 1.
P;[ Xn = i
i.o.] is either 0 or 1; compare the zero-one law (Theorem but note that the events [ Xn = i ] here are not in general independent.
Theorem 8.2. (i) Transience ofi is equivalent to P;[ Xn = i i.o.] = 0 and to Ln Pft> < oo . (ii) Persistence of i is equivalent to P;[ Xn = i i.o.] = 1 and to En pfr > = oo .
By the first Borel-Cantelli lemma, L n Pfr > < 00 implies P;[ xn = i i.o.] = 0, which by (8.17) in tum implies /;; < 1. The entire theorem will be proved if it is shown that /;; < 1 implies Ln Pfr > < oo . PROOF.
SECTION
8.
115
MARKOV CHAINS
The proof uses a first-passage argument : by (8.13),
n- 1
p!j > = P; [ xn = j ) = L P; [ xl * j , . . . ' xn - s - 1 * j , s-O
n- 1
- L
s-o
=
n-1 � �
s-o
xn - s = j , xn = j )
P; [ X1 * j , . . . , xn - s - 1 * i , xn - s = j ] lj [ Xs = j]
l_ (_n - s )p �� ) . lrJ JJ
Therefore,
n
n t- 1
t - s >p : > > = /;� : L L L P:f t- 1 t- 1 s-O n- 1
n
s-o
t-s+ 1
- L P :t > L
/;� t - s ) �
n
L P Jt >/; ; ·
s-o
Thus (1 - /;;) E7_ 1 p1f> � /;;, and if /;; < 1, this puts a bound on the partial � n ;(t) Surns f-t• 1P ; . Polya 's theorem. For the symmetric k-dimensional ran dom walk (Example . 8.4), all states are persistent if k = 1 or k = 2, and all states are transient if k > 3. To prove this, note first that the probability pfr > of return in n steps is the same for all states i; denote this probability by a�k> to indicate the dependence on the dimension k. Clearly, a ��>+ 1 = 0. Suppose that k = 1. Since return in 2n steps means n steps east and n steps west, Example 8.6.
a 2< ln) =
( 2nn ) _!_ . 22n
Stirling's formula, a�1J - ( 'ITn) - 1 12 • Therefore, En a �1 > = are persistent by Theorem 8.2. By
oo
and all states
1 16
PROBABILITY
In the plane, a return to the starting point in 2n steps means equal numbers of steps east and west as well as equal numbers north and south:
(2n )! 1 f. u!u! ( n - u ) ! ( n - u ) ! 4 2 n u == O
a(22n> _
( )
It can be seen on combinatorial grounds that the last sum is 2nn , and so a �2J = (a �1J ) 2 - ( w n) - I . Again, En a �2 ) = oo and every state is persistent. For three dimensions,
a 2( 3n) = L
( 2n ) ! u!u!v!v! ( n - u - v ) ! ( n - u - v ) !
the sum extending over nonnegative reduces to
u
and
v
1 6 2n '
satisfying
u + v < n.
This
) 2 n - 2/ ( 2 ) 2 / ) ( ( 1 2 a 2< 3>n - L � 3 3 a ��� u a W , n
(8 . 18)
_
/- 1
-
2
as can be checked by substitution. (To see the probabilistic meaning of this formula, condition on there being 2n - 21 steps parallel to the vertical axis and 21 steps parallel to the horizontal plane.) It will be shown that a �3J = O(n - 312 ), which will imply that En a�3> < oo . The terms in (8 . 18) for I = 0 and I = n are each OI( n - 312 ) and hence can be omitted. Now a �1 > < Ku - 1 12 and a�2 > < Ku - , as already seen, and so the sum in question is at most
n ) ) ) 2/ 2 2 / 1 ( ( ( 2 1 n2 3 K L ;� 3 ( 2n - 21 ) - 1 /2 ( 21 ) /-1
-•.
(2n - 21 ) - 1 12 < 2n 1 12 ( 2n - 21 ) - 1 � 4n 1 12 ( 2n - 21 + (21) - 1 � 2(21 + 1) - 1 , this is at most a constant times Since
! 2 n ) ( /2 I n 2n + 2 ( )! Thus L, n a�3> <
oo ,
1 ( 2n + 2 ) ( ! ) 2n - 2/ 1�1 2/ + 1 3
n-
+
I
1) - 1 and
( 2 ) 2/ I - O ( n - 3/2 ) . 3
+
and the states are transient. The same is true for
k=
SECTION
4, 5, . . . , since 0( n - " 12 ) .
8.
1 17
MARKOV CHAINS
an inductive extension of the argument shows that a �k >
=
•
It is possible for a system starting in i to reach j (f,.1 > 0) if and only if p1j > > 0 for some n. If this is true for all i and j, the Markov chain is
irreducible.
If the Markov chain is irreducible, then one of the following two alternatives holds. (i) All states are transient, P,. (U 1[ Xn = j i.o.]) = 0 for all i, and Ln Pfj > < oo for all i and j. (ii) All states are persistent, P,. (n1[ Xn = j i.o.]) = 1 for all i, and En p1j > = oo for all i and}. Theorem 8.3.
The irreducible chain itself can accordingly be called persistent or transient. In the persistent case the system visits every state infinitely often. In the transient case it visits each state only finitely often, hence visits each finite set only finitely often, and so may be said to go to infinity. PROOF. >
P�� > 0. }I
For each Now
i
and
j
there exist
r
and
P � � + s + n ) p ��>p � � >p � � )
(8. 19 )
II
> -
1)
}}
)1
s
such that
>
pf; > 0
and
'
and since p I}� � >p}�I� > > 0' En p I�I� > < oo implies that En p}}� � > < oo · if one state is transient, they all are. In this case (8.16) gives P,. [ Xn = j i.o.] = 0 for all i and j , so that P,.(U1[ Xn = i i.o.]) = O for all i. Since E�_ 1 p 1j > = n I(•)P ( n - • ) - �00 / ( P)�oom P (Jm ) < L.,�00m-== OP (Jm ) , 1· t f0llOWS that 1· f }· � ao � L.,.,._ I J ij L., -o { �n - 1L.,.,- 1 J ij 11 J is transient, then (Theorem 8.2) En pfj converges for every i. The other possibility is that all states are persistent. In this case lj[ Xn = j i.o.] = 1 by Theorem 8.2, and it follows by (8.13) that •
pJ,.m > = lj ( ( Xm = i ] n ( Xn = j i.o. ] ) < _
E
n>m � i..J
n>m
lj [ Xm = i, xm + l =1:- j,
•
•
•
, xn - 1 =1:- j, Xn = J ]
p � � )Jl,(_n - m ) = p � � >Jl. . . }I
I}
}I
I}
There is an m for which pJ,.m > > 0, and therefore fiJ = 1 . By (8.16), P, [ Xn = j i.o.] = /;1 = 1 . If En pfj > were to converge for some i and j, it would follow by the first Borel-Cantelli lemma that P,. [ Xn = j i.o.] = 0. •
1 18
PROBABILITY
Example B. 7.
impossible if
S
Since E1 pfj > = 1 , the first alternative in Theorem 8.3 is is finite: a finite, irreducible Markov chain is persistent.
The chain in Polya's theorem is certainly irreducible. If the dimension is 1 or 2, there is probability 1 that a particle in symmetric random walk visits every state infinitely often. If the dimension is 3 or more, • the particle goes to infinity. Example 8. 8.
Example 8. 9.
Consider the unrestricted random walk on the line (Ex ample 8.3). According to the ruin calculation (7.8), lot = pjq for p < q. Since the chain is irreducible, all states are transient. By symmetry, of course, the chain is also transient if p > q, although in this case (7.8) gives lot = 1. Thus /; = 1 (i ¢ j ) is possible in the transient case.t If p = q = , the chain is persistent by Polya' s theorem. If n and j - i have the same parity,
(
n
n +j - ; 1 li - i l < n · 2 This is maximal if j i or j i ± 1, and by Stirling' s formula the maximal value is of order n - t /2 • Therefore, lim n p �n > 0, which always holds in the =
=
=
transient case but is thus possible in the persistent case as well ( see Theorem
8.8).
•
Another Criterion for Persistence
Let Q = [ qiJ ] be a matrix with rows and columns indexed by the elements of a finite or countable set U. Suppose it is substochastic in the sense that q;1 > 0 and E1 q;1 < 1. Let Q n = [ q;)n >] be the n th power, so that � q . ,q"l<� > ' q1)� � + t ) = i..J ( 8.20 ) ,
,
Consider the row sums O.,( n ) =
( 8.21 ) From
( 8.22 )
(8.20) follows
o .< n + t ) I
=
� q��)•
i..J j
1)
� . . (J � n ) • q i..J j
1) J
t out for each j there must be some i =F j for which fij
<
1 ; see Problem 8.9.
8. MARKOV CHAINS 01.< l l � 1 , and hence o.< n + l )
1 19
SECTION
Si ncen Q is substochastic, n l r q I�JI >oJ1< ) � o.I< > Therefore the monotone limits � 'II • '
I
=
n lq .
EJ.E J1qI�JI
Jlj =
( 8.23) exist. By (8.22) and the Weierstrass M-test solve the system X;
(8.24 )
=
E jE U
[A28], o; E1q;1o1. Thus the o; =
q J x1 , i e U, i
O < x; < 1 ,
; e u.
n � � ;1 - o;( I ) , and X; < o;( ) for < L.t1q L.t1q;1x1 n n n all i implies that X; < E1q;1o} > o;< + I ) by (8.22). Thus X; < o;< > for all n by induction, and so X; < o;. Thus the o; give the maximal solution to (8.24): . · For an arb1 trary so1utton, X;
=
For a substochastic matrix solution of (8.24). Lemma
1.
Q
the limits (8.23) are the maximal
Now suppose that U is a subset of the state space S. The p iJ for i and j n in U give a substochastic matrix Q. The row sums (8.21) are o/ > = Ep;11 P1LI 2 • • • pin - Lin' where the j1 , . . . , in range over U, and so o/ n > = P; [ X1 e U, t < n ]. Let n --+ oo :
(8.25 )
o;
=
P; ( X1
E
U, t 1 , 2, . . . ) =
,
i E U.
this case, o; is thus the probability that the system remains forever in U, given that it starts at i. The following theorem is now an immediate consequence of Lemma 1. In
Theorem
8.4.
of the system
( 8.26 )
For U c S the probabilities (8.25) are the maximal solution X;
=
E jE U
piJ x1 , i e U,
0 < X; < 1 ,
i E U.
120
PROBABILITY
For the random walk on the line consider the set The system (8.26) is
Example 8.10.
{0, 1, 2, . . . }.
{ X; == PpxX; +. 1 + qxi - 1 '
Xo 1 xn = A + An
i>
U
=
1,
n
It follows [A19] that if p = q and x n = A - A(qjp) + 1 if p ¢ q. The only bounded solution is x n = 0 if q > p, and in this case there is probability 0 of staying forever among the nonnegative integers. If q < p, A = 1 gives the maximal solution xn = 1 - (qjp) n + 1 . Compare (7.8) and • Example 8.9. Now consider the system (8.26) with single state i 0 :
U = S - { i0 }
for an arbitrary
X; = E P;jxj, i ¢ i 0, j '=�= io 0 � X; < 1 ,
( 8 .27 )
There is always the trivial solution-the one for which
X; = 0.
An irreducible chain is transient zf and only if (8.27) has a nontrivial solution.
Theorem 8.5. PROOF.
The probabilities
( 8 .28 ) are by Theorem 8.4 the maximal solution of (8.27). Therefore (8.27) has a nontrivial solution if and only if /;;0 < 1 for some i ¢ i 0 • If the chain is persistent, this is impossible by Theorem 8.3(ii). Suppose the chain is transient. Since 00
P;o [ X1 = i, x2 ¢ i o , . . . ' Xn - 1 ¢ i o , xn = i o] hoio = P;o [ X1 = i o] + nE=2 iE -:l= i0 = P i0 i0 + L Pi0 i fi io ' ; '=�= io and since /;0 ;0 < 1, it follows that /;;0 < 1 for some i ¢ i 0• • Since the equations in (8.27) are homogeneous, the issue is whether they have a solution that is nonnegative, nontrivial, and bounded. If they do, 0 � x ; � 1 can be arranged by rescaling. t t see Problem 8. 1 1 .
SECTION
8.
121
MARKOV CHAIN S
In the simplest of queueing models the state space is and the transition matrix has the form
Example 8. 11.
{ 0, 1, 2, . . . }
ao aI a 2 0 0 0 ao aI a 2 0 0 0 0 ao al a 2 0 0 0 0 ao aI a 2 0 0 0 0 ao aI a 2
...
••• ••• ...
•••
.........................
If there are i customers in the queue and i > 1 , the customer at the head of the queue is served and leaves, and then 0, 1, or 2 new customers arrive (probabilities a0, a 1 , a 2 ), which leaves a queue of length i - 1, i, or i + 1. If i = 0, no one is served, and the new customers bring the queue length to 0, 1, or 2. Assume that a0 and a 2 are positive, so that the chain is irreducible. For i0 = 0 the system (8.27) is
(8 . 29 )
{ x i == aai xxi + a+2xa2 x, xk
o k-1
k > 2.
l k + a 2x k + 1 '
Since a0, a 1 , a 2 have the form q( 1 - a), a, p( 1 - a) for appropriate p, q, a, the second line of (8.29) has the form x k = px k + I + qx k _ 1 , k > 2. Now the solution [A19] is A + B( q/p) k = A + B( a0ja 2 ) k if a 0 ¢ a 2 ( p ¢ q) and A + Bk if a0 = a 2 ( p = q), and A can be expressed in terms of B because of the first equation in (8.29). The result is if a 0 ¢ a2 , if a 0 = a 2 • There is a nontrivial solution if a0 < a 2 but not if a 0 > a 2 . If a 0 < a 2 , the chain is thus transient, and the queue size goes to infinity with probability 1. If a0 � a 2 , the chain is persistent. For a nonempty queue the expected i_ncrease in queue length in one step is a 2 - a0, and the • queue goes out of control if and only if this is positive. Stationary Distributions
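A simulation makes the dichotomy visible. The sketch below is illustrative only (the particular values of a_0, a_1, a_2 and the run length are arbitrary): when a_0 > a_2 the simulated queue stays short, and when a_0 < a_2 it grows without bound.

```python
# Simulate the queueing chain of Example 8.11: from a nonempty queue the length
# moves down, stays, or moves up with probabilities a0, a1, a2; from an empty
# queue no one is served.
import random

def run_queue(a0, a1, a2, steps=20000, seed=1):
    random.seed(seed)
    length = 0
    for _ in range(steps):
        u = random.random()
        arrivals = 0 if u < a0 else (1 if u < a0 + a1 else 2)
        served = 1 if length > 0 else 0
        length += arrivals - served
    return length

print(run_queue(0.5, 0.3, 0.2))   # a0 > a2: persistent case, the queue stays short
print(run_queue(0.2, 0.3, 0.5))   # a0 < a2: transient case, the queue length keeps growing
```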
Suppose that the initial probabilities for the chain satisfy
(8 .30)
E 'IT; P ij = 'IT} '
iES
j E
S.
122
PROBABILITY
It then follows by induction that
� 'fT.p � �> 'fT.,
( 8 .31 )
i..J iES
I
1)
=
J
jE
S,
n
=
0, 1 , 2, . . . .
If 'IT; is the probability that X0 = i, then the left side of (8.31) is the probability that xn = j, and thus (8.30) implies that the probability of [ Xn = j] is the same for all n. A set of probabilities satisfying (8.30) is for this reason called a stationary distribution. The existence of such a distribu tion implies that the chain is very stable. To discuss this requires the notion of periodicity. The state j has period t if p}1n > > 0 implies that t divides n and if t is the largest integer with this property. In other words, the period of j is the greatest common divisor of the set of integers
( 8 .32 ) If the chain is irreducible, then for each pair i and j there exist for which p fj > and p}t > are positive, and of course
�� + s + n) > p � � >p � �>p � � ) PI I }} 1) )1
( 8 .33 )
-
r and s
•
Let t ; and t1 be the periods of i and j. Taking n = 0 in this inequality shows that t ; divides r + s; and now it follows by the inequality that p}1n > > 0 implies that t; divides r + s + n and hence divides n. Thus t; divides each integer in the set (8.32), and so t; � t1 . Since i and j can be interchanged in this argument, i and j have the same period. One can thus speak of the period of the chain itself in the irreducible case. The random walk on the line has period 2, for example. If the period is 1, the chain is
aperiodic.
In an irreducible, aperiodic chain, for each i and}, P0n > > 0 for all n exceeding some n0(i, j). PROOF. Since p1jm + n > > p1jm>pJ1n > , if M is the set (8.32) then m e M and n e M together imply m + n e M. But it is a fact of number theory [A21] Lemma
2.
that if a set of positive integers is closed under addition and has greatest common divisor 1, then it contains all integers exceeding some n 1 . Given i and J.' choose r so that p1)� � > > 0 If n > n 0 = n I + r then pI)� � > > pI}� � >p}}� �- r > > 0. • •
'
Suppose of an irreducible, aperiodic chain that there exists a stationary distribution-a solution of (8.30) satisfying 'IT; � 0 and L; 'IT; 1 . Theorem 8.6.
=
SECTION
Then the chain is persistent, ( 8.34 )
8.
MARKOV CHAINS
123
� ) = 'fT.} lim p � I} n
for all i and}, the 'ITi are all positive, and the stationary distribution is unique. n The main point of the conclusion is that, since p� > approaches 'ITi , the effect of the initial state i wears off. n PROOF. If the chain is transient, then p� > --+ 0 for all i and j by Theorem 8.3 , and it follows by (8.31) and the M-test that 'ITi is identically 0, which contradicts L; 'IT; = 1. The existence of a stationary distribution therefore
implies that the chain is persistent. Consider now a Markov chain with state space S X S and transition probabilities p(ij, k/) = P; k Pil (it is easy to verify that these form a stochas tic matrix). Call· this the coupled chain; it describes the joint behavior of a pair of independent systems, each evolving according to the laws of the original Markov chain. By Theorem 8.1 there exists a Markov chain ( Xn , Yn ), n = 0, 1, . . . , having positive initial probabilities and transition probabilities
P ( ( Xn + l ' Yn + t ) =
( k, / ) 1( Xn , Yn) = ( i, J ) ) = p ( ij, kl ) . n For n exceeding some n 0 depending on i, }, k, /, p< n >(ij, kl) = pf: >pi� > is positive by Lemma 2. Therefore, the coupled chain is irreducible. (The
proof of this fact requires only the assumptions that the original chain is irreducible and aperiodic.) It is easy to check that 'IT( ij) = 'IT; 'ITi forms a set of stationary initial probabilities for the coupled chain, which, like the original one, must therefore be persistent. It follows that, for an arbitrary initial state ( i, }) for the chain {( Xn , Yn )} and an arbitrary i0 in S, P;i [( Xn , Yn ) = ( i 0 , i0) i.o.] = 1. If T is the smallest integer such that XT = YT = i0, then T is finite with probability 1 under P;j· The idea of the proof is now this: Xn starts in i and yn starts in }; once xn = yn = i o occurs, xn and yn follow identical probability laws, and hence the initial states i and j will lose their influence. By (8.13) applied to the coupled chain, if m < n, then P;j [ ( xn ' yn ) =
(k' I)' T = m] t < m , ( Xm , Ym ) = ( i o , i o )] X P;o;o ( ( Xn - m ' Yn - m ) = ( k , / )]
= Pij ( ( X, , Y, ) ¢ ( i o , i o ) ,
�n - m)p �n - m > . = PI.J. ( T = m ] pt0A. 10/
124
PROBABILITY
= =I = = = k, < ] = = < ] = k ] < = k, < n ] + > ] = = k , < ] + PiJ > ] ] = k] +
Adding out I gives PiJ [ X, k , 'T m ] m] P.1 1 [ T out k oives PIJ. . [ Yn ' 'T probabilities, and add over m 1, . . . , n : =
ca
pi} [ xn
From this follows P;1 ( Xn
'T
n
PiJ ( Xn
PiJ ( Yn
=
=
=
P;1 [ T
=
m ] p 1c:ic- m ), and adding m ] p�1 11/" - m > • Take k = I' equate
pi} [ yn
k'
T
'T
< PiJ ( Yn
n
PiJ ( T >
'T
n .
P;1 ( 'T
n
( 'T
n
n .
This and the same inequality with X and Y interchanged give
Since T is finite with probability 1 ,
p�n) - p�n) lim I k l jk I n
(8 .35)
=
0•
(This proof of (8.35) requires only that the coupled chain be persistent.) By (8.31) , 'ITk - pj; > L; 'IT; ( pf; > - pj; >), and this goes to 0 by the M-test if (8.35) holds. Thus Iim n p}; > 'ITk . As this holds for each stationary distribution, there can be only one of them. It remains to show that the 'ITJ are all strictly positive. Choose r and s so that pf; > and p}t > are positive. Letting n --+ oo in (8.33) shows that 'IT; is positive if 'IT1 is; since some '"J is positive (they add to 1) , all the 'IT; must be • positive.
=
=
For the queueing model in Example 8.1 1 the equations
Example 8. 12.
(8.30) are
'ITo 'IT1 'IT2 'ITk
== 'IT'IToaao ++ 'IT1aa1o , 2a , o t 'IT1 + 'IT o = '1Toa2 + 'IT1a2 + 'IT2a1 + 'IT3ao , = 'ITk - 1a2 + 'ITkai + 'ITk + tao ,
k > 3.
Again write a 0, a 1 , a 2 , as q(1 - a), a , p(1 - a ). Since the last equation here is 'ITk q 'ITk + 1 + p 'ITk _ 1 , the solution is
=
AA Bk 'ITk = { + +
B ( p/q )
k
= A + B ( a 2ja0 ) k
if a0 if a 0
=I=
=
a2 , a2 ,
SECTION
8.
125
MARKOV CHAINS
for k > 2. If a 0 < a2 and Ewk converges, then wk 0, and hence there is no stationary distribution; but this is not new, because it was shown in Example 8.11 that the chain is transient in this case. If a 0 = a 2, there is again no stationary distribution, and this is new because the chain was in Example 8.11 shown to be persistent in this case. If a 0 > a2, then Ewk converges, provided A = 0. Solving for w0 and w1 in the first two equations of the system above gives w0 = Ba2 and w1 = Ba 2 ( 1 - a0 )/a0 • From Ekwk = 1 it now follows that B = (a0 - a 2 )/a 2, and the wk can be written down explicitly. Since wk = B(a2ja0) k for k � 2, there is small chance of a large queue length. • =
If a 0 = a2 in this queueing model, the chain is persistent (Example 8.11) but has no stationary distribution (Example 8.12). The next theorem de scribes the asymptotic behavior of the p!1n > in this case. 1beorem 8.7.
bution, then ( 8 .36) for all i and j.
If an irreducible, aperiodic chain has no stationary distri lim pI)� � > = 0
n
If the chain is transient, (8.36) follows from Theorem interesting here is the persistent case.
8.3.
What is
By the argument in the proof of Theorem 8.6, the coupled chain is irreducible. If it is transient, then E ( p�n> ) 2 converges by Theorem 8.2, and n the conclusion follows. Suppose, on the other hand, that the coupled chain is persistent. Then the stopping-time argument leading to (8.35) goes through as before. If the p�n> do not all go to 0, then there is an increasing sequence { n u } of integers along which some p�n> is bounded away from 0. By the diagonal method [A14], it is possible by passing to a subsequence of { n u } to ensure that each p�n��> converges to a li�t, which by (8.35) must be independent of i. Therefore, there is a sequence { n u } such that limup�n��> = a1 exists for all i and j, where a1 is nonnegative for all j and positive for some j. If M is a finite set of states, then E1e MaJ = limuEJ e MP�n��> < 1, and hence 0 < a = E1a1 < 1. Now Ek e MPf:��>pkJ < p�n��+ I> = Ek PikP�j��>; it is possible to pass to the limit ( u --+ oo) inside the first sum (if M is finite) and inside the second sum (by the M-test), and hence Ek e MakpkJ � EkPi kaJ = a1 . There fore, EkakpkJ � a1; if one of these inequalities were strict, it would follow that Ekak = E1EkakpkJ < E1a1, which is impossible. Therefore Eka k p k J = aj for all }, and the ratios w1 = a1ja give a stationary distribution, contrary • to the hypothesis. PROOF.
126
PROBABILITY
The limits In times. Let
(8.34) and (8.36) can be described in terms of mean return 00
n) . l. < . � n = i..J !ljj ' ,-} n� l if the series diverges, write p.i = oo. In the persistent case, this sum is to be thought of as the average number of steps to first return to j, given that Xo = j.t Lemma 3. Suppose that} is persistent and that lim n p}in > = u. Then u > 0 if and only if p.i < oo, in which case u = 1/p.i .
( 8.37)
u .
Under the convention that 0 = 1/oo, the case u = 0 and p.i = oo is consistent with the equation u = 1/p.i . n PROOF . For k > 0 let p k = En k fi} >; the notation does not show the dependence on j, which is fixed. Consider the double series 1.(2 ) + ljj 1.(3) + . . . 1_(1 ) + ljj ljj 1_(.2) + ljj 1_(3) + . . . + ljj 1_(.3 ) + + ljj >
n n to sums column th n the and 0) > k ( p to sums row The k th k fi} > (n > 1), and so [A27] the series in (8.37) converges if and only if E k p k does, in which case 00
( 8.38 )
P.j = L P k · k�O Since j is persistent, the lj-probability that the system does not hit j up to time n is the probability that it hits j after time n, and this is Pn· Therefore, 1 - pi)n > = Pi [ Xn ¢ j ) n- 1
= Pj [ Xl ¢ j, . . . , Xn ¢ J ) + L lj [ Xk = j, xk + l ¢ j, . . . , Xn ¢ J ) n-1
k-1
k = Pn + L Pj) )Pn - k ' k-1
t Since in general there is no upper bound to the number of steps to first return, it is not a simple random variable. It does come under the general theory in Chapter 4, and its expected value is indeed P.j (and (8.38) is just (5.25)) , but for the present the interpretation of P.j as an average is informal.
SECTION
and since
127
MARKOV CHAINS
1,
=
Po
8.
1
=
p0 pJJ��> + p 1 pJJ� � - 1 > +
·
· ·
+ pn 1 pJJP > + -
pn pJ�J�> •
Keep only the first k + 1 terms on the right here, and let n --+ oo; the result + P k )u. Therefore u > 0 implies that Ek p k converges, so is 1 � (p0 + that p.1 < oo . n Write x n k = P k PJ) - k ) for 0 < k � n and x n k = 0 for n < k. Then 0 < x n k < P k and Iimn x n k = p k u. If p.1 < oo , then E k p k converges and it follows by the M-test that 1 = Er- ox n k --+ Er_0 p k u. By (8.38), 1 = p.1 u, so • tha t u > 0 and u = 1jp.1 . ·
·
·
The law of large numbers bears on the relation u = 1jp.1 in the persistent case. Let Vn be the number of visits to state j up to time n. If the time from one visit to the next is about p.1 , then V,. should be about njp.1 : Vnfn 1jp.1 . But (if X0 = } ) Vn/n has expected value n - 1 EZ _ 1 pjf > , which goes to u under the hypotheses of Lemma 3 [A30]. Consider an irreducible, aperiodic, persistent chain. There are two possi bilities. If there is a stationary distribution, then the limits (8.34) are positive, and the chain is called positive persistent. It then follows by Lemma 3 that p.1 < oo and 'ITJ = 1jp.1 for all }. In this case, it is not actually necessary to assume persistence, since this follows from the existence of a stationary distribution. On the other hand, if the chain has no stationary distribution, then the limits (8.36) are all 0, and the chain is called null persistent. It then follows by Lemma 3 that p.1 = oo for all }. This, taken together with Theorem 8.3, provides a complete classification: =
For an irreducible, aperiodic chain there are three possibili The chain is transient; then for all i and}, lim n p�n > 0 and in fact
Theorem
ties:
8.8.
(i) �� n p '1� � > < 00 (ii) The chain
=
·
is persistent but there existsn no stationary distribution (the null persistent case); then for all i and}, p� > goes to 0 but so slowly that Ln P!j > = 00, and p.1 = oo. (iii) There exist stationary probabilities 'ITJ and (hence) the chain is persistent ( the positive persistent case); then for all i and}, lim n p!j > 'ITJ > 0 and p.i 1/'nj· < oo. Since the asymptotic properties of the p!j > are distinct in the three cases, =
these asymptotic properties in fact characterize the three cases.
=
128
PROBABILITY
Example 8. 13.
matrix is
Suppose that the states are 0, 1 , 2, . . . and the transition
o Po 0 q1 0 P 1
q
.
0 0
q2
.
.. ..
0 0 P2 .. .... ......... ...
where P ; and q; are positive. The state i represents the length of a success run, the conditional chance of a further success being P; · Clearly the chain is irreducible and aperiodic. A solution of the system (8.27) for testing for transience (with i0 = 0) must have the form x k = x 1 jp 1 P k - 1 . Hence there is a bounded, nontrivial solution, and the chain is transient, if and only if the limit a of p0 • • • Pn is positive. But the chance of no return to 0 (for initial state 0) in n steps is clearly p0 • • • Pn - 1 ; hence /00 = 1 - a , which checks: the chain is persistent if and only a = 0. Every solution of the steady-state equations (8.30) has the form 'ITk = 'IT0 p 0 • • • P k - 1 . Hence there is a stationary distribution if and only if E k p0 • • • p k converges; this is the positive persistent case. The null per p k --+ 0 but Ek Po p k diverges (which sistent case is that in which Po happens, for example, if qk = 1/k for k > 1). Since the chance of no return to 0 in n steps is p0 • • • Pn _ 1 , in the p k _ 1 . In the null persistent persistent case (8.38) gives p. 0 = Ek-o Po case this checks with p.0 = oo ; in the positive persistent case it gives P. o = Er-o'1Tk/'IT0 = 1/'IT0 , which again is consistent. • ·
·
·
·
·
·
·
·
·
·
·
·
Exponential Convergence *
In the finite case pfj > converges to
'ITi at an exponential rate: Theorem 8.9. If the state space is finite and the chain is irreducible and aperiodic, then there is a stationary distribution { 'IT; } , and lp � � ) 'IT · I < Apn I}
where A
>0
-
}
-
'
and 0 < p < 1.
From this it follows that a finite, irreducible, aperiodic chain cannot be null persistent. This also follows from Theorem 8.8 ; see Example 8.7. * This topic may be omitted.
SECTION
8.
129
MARKOV CHAINS
m(J n > = min . p � �> and MJ� n > = max · P � � > • By (8 • 10) ' n > = m (n ) � p . � . ( min <�> p > m ( n + l ) = min m p . i..J i..J I
Let
PROOF .
}
1)
I
I
1 11
J1
11)
-
I
J1
•
I JI
1)
}
}
'
n > = M� n > • . � . � � <�> p M � n + t > = max < max p M p . � . i..J "J J }
I
Sin ce obviously
J1
-
m} n > < M} n > , < ··· < m (1 > _ < m (2> _ 0_ }
}
( 8 .39)
1,
I
< -
J1
I JI
}
M � 2> < M � 1 > }
-
}
" �
1.
Suppose temporarily that all the P;J are positive. Let r be the number of states and let B = minii p;1 . Since E1 pii > rB, 0 < B < r - 1 • Fix states u and v for the moment; let E' denote the summation over j in S satisfying PuJ > PvJ and let E" denote summation over j satisfying PuJ < PvJ · Then (8 .40) Since E'pvj +
E"puJ > rB,
(8 .41 ) Apply (8.40) and then (8.41):
n n n _ + l + l . �( ( ( ) ) . � p p p p = Pu k vk VJ ) Jk ) i..J UJ 1
"( _ � ' ( PuJ _ PvJ ) Mk( n ) + � < i..J i..J PuJ
=
n n ) ' Ml m� p - >) L ( uj Pvj) (
< (1 Since
u
and
v
n ( m ) PvJ k )
rB ) ( Ml n > - m�n > ) .
are arbitrary,
Th erefore, Ml n > - m�n > < (1 - rB) n . It follows by (8.39) that M) n > have a common limit 'ITJ and that (8 .42 )
m5 n > and
130
PROBABILITY
Take A = 1 and p = 1 - r8. Passing to the limit in (8.10) shows that the 'IT; are stationary probabilities. (Note that the proof thus far makes almost no use of the preceding theory.) If the p;1 are not all positive, apply Lemma 2: Since there are only finitely many states, there exists an m such that pfj > > 0 for all i and j. By the case just treated M} m t > - m} m t ) < p1 Take A = p - I and then replace p by pl / m . • •
Example 8. 14.
Suppose that
P=
Po P t Pr - 1 Po •
•
•
•
•
•
• • • • • •
•
•
•
•
•
• • •
P1
•
•
Pr - 1 Pr - 2
•
•
•
•
Po
The rows of P are the cyclic permutations of the first row: piJ = p1 _; , j - i reduced modulo r. Since the columns of P add to 1 as well as the rows, the steady-state equations (8.30) have the solution 'IT; = r - 1 • If the P; are all 1 n positive, the theorem implies that p� > converges to r- at an exponential rate. If X0 , Y1 , Y2 , . . . are independent random variables with range {0, 1, . . . , r - 1 } , if each Yn has distribution { p0 , , p, _ 1 } , and if Xn = X0 + Y1 + + Yn , where the sum is reduced modulo r, then P[ Xn = j] --+ r - 1 • The Xn describe a random walk on a circle of points, and whatever the • initial distribution, the positions become equally likely in the limit. ·
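The convergence in Example 8.14 can be checked by iterating the distribution. The sketch below is illustrative only; the step distribution is an arbitrary choice with all p_i positive.

```python
# Random walk on a circle of r points: iterate the distribution under P and
# watch it approach the uniform distribution 1/r.
r = 5
step = [0.4, 0.3, 0.1, 0.1, 0.1]                                # p_0, ..., p_{r-1}, all positive
P = [[step[(j - i) % r] for j in range(r)] for i in range(r)]   # each row a cyclic shift

dist = [1.0] + [0.0] * (r - 1)                                  # start in state 0
for _ in range(100):
    dist = [sum(dist[i] * P[i][j] for i in range(r)) for j in range(r)]
print([round(x, 6) for x in dist])                              # each entry close to 1/r = 0.2
```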
·
•
·
•
•
Optimal Stopping *
Assume throughout the rest of the section that S is finite. Consider a function T on 0 for which T ( w ) is a nonnegative integer for each w. Let � = o( X0 , X1 , . . . , Xn ); T is a stopping time or a Markov time if
(8.4 3 )
[w:
T( w )
= n]
e�
for n = 0, 1, . . . . This is analogous to the condition (7. 1 8) on the gambler' s stopping time. It will be necessary to allow T ( w ) to assume the special value oo , but only on a set of probability 0. This has no effect on the requirement (8.43), which concerns finite n only. If f is a real function on the state space, then /( X0 ), /( X1 ), . . . are simple random variables. Imagine an observer who follows the successive * This topic may be omitted.
SECTION
8.
13 1
MARKOV CHAINS
X0 , X1 , X,.< c.) > ( w )), and
of the system. He stops at time T, when the state is XT (or receives an reward or payoff /( XT ). The condition (8.43) prevents prevision on the part of the observer. This is a kind of game, the stopping time is a strategy, and the problem is to find a strategy that maximizes the expected payoff E[/( XT )]. The problem in Example 8.5 had this form; there S = {1, 2, . . . , r + 1 }, and the payoff function is /(i) = i/r for i < r (set /(r + 1) = 0). If P (A) > 0 and Y = E1 y1 18 is a simple random variable, the B1 1 forming a finite decomposition of 0 into jli:sets, the conditional expected value of Y given A is defined by states
• • •
E [ Y I A ] = E Y;P( BJI A ) . 1
Denote by
E; conditional expected values for the case A E;[ Y ] = E [ YI X0 = i ] = Ey1 P;( B1 ) .
=
[ X0 = i]:
1
The stopping-time problem is to choose T so as to maximize simultaneously E;[/( XT )] for all initial states i. If x lies in the range of f, which is finite, and if T is everywhere finite, then (w: /( XT( w ) (w)) = X ) = U�-o (w: T( W ) = n, f( Xn ( w )) = x] lies in � ' and so /( XT ) is a simple random variable. In order that this always hold, put /( XT ( w ) (w)) = 0, say, if T(w) = oo (which happens only on a set of probability 0). The game with payoff function f has at i the value
(8 .44 ) the supremum extending over all Markov times T. It will tum out that the supremum here is achieved: there always exists an optimal stopping time. It will also tum out that there is an optimal T that works for all initial states i. The problem is to calculate v(i) and find the best T. If the chain is irreducible, the system must pass through every state, and the best strategy is obviously to wait until the system enters a state for which f is maximal. This describes an optimal T, and v(i ) = max / for all i. For this reason the interesting cases are those in which some states are transient and others are absorbing ( p ;; = 1 ) . A function q> on S is excessive, or superharmonic, if t
(8 .45)
E Pij <J> ( j) , j
t compare the conditions (7.28) and (7.35).
i E S.
132 In
PROBABI LITY
terms of conditional expectation the requirement is
cp(i) > E;[ cp ( X1 )).
The value function v is excessive. PROOF. Given choose for each j in S a "good" stopping time 'lj Xn ) E 11, ] satisfying E1 [f( X'i )] > v ( j ) - By (8.43), [ 'lj = n ] = [( X0 , for some set I1 n of (n + 1)-long sequences of states. Set T = n + 1 (n > 0) on the set ( X1 = j ) n (( X1 , . . . , Xn+ l ) E 11n ) ; that is, take one step and then from the new state X1 add on the "good" stopping time for that state. Then Lemma 4.
£
T
£.
•
•
•
,
is a stopping time and 00
E; [ f( XT )] = E E EP; [ XI = j , ( X1 , . . . , Xn + l ) E /Jn , Xn + l = k ] f ( k ) n -o j k 00
- E E E P;J lJ [ ( Xo, . . . , Xn ) E /Jn ' Xn = k ) f ( k ) n-o j k
= E Pij EA !( x,)] . 1
Therefore, v(i) � E; [f( XT )] > was arbitrary, v is excessive.
E1 piJ ( v ( j ) -
£
)
= E1 piJ v ( j ) -
Lemma 5. Suppose that cp is excessive. (i) For all stopping times T, cp(i) > E; [ cp( XT )] .
£.
Since
£
•
For all pairs of stopping times satisfying E; [ cp( XT )] .
o<
T,
E;[ cp( Xa)] >
Part (i) says that for an excessive payoff function optimal strategy.
T
0
represents an
(ii)
PROOF.
( 8.46 )
To prove (i), put
TN
=
= min{ T, N } . Then TN is a stopping time, and
E; [ cp ( XTN )] = E E P; [ T = n ' xn = k ] cp ( k ) N- 1
n-O k
+
EP; [ T > N, XN = k]cp ( k ) . k
8. MARKOV CHAINS Since [ T > N] = [ T < N]c e �N - 1 , the final sum here is by (8.13)
133
SECTION
E E P; [ T > N, XN - 1 = j , XN = k]cp(k) k j
= E E P; [ T > N, XN - 1 = j ] PJkcp ( k ) < E P; [ T > N, XN - 1 = j ] cp ( j ) . 1
k j
Substituting this into (8.46) leads to E;[ cp ( XTN )] < E; [ cp( XTN _ 1 )]. Since To = 0 and E;[ cp ( X0 )] = cp (i), E; [ cp ( XTN )] < cp(i ) for all N. But for T( w) finite, q> ( XTN < "' > ( w )) --+ cp( XT< "' > (w)) (there is equality for large N), and so E; [ cp( XTN )] --+ E; [ cp( XT )] by Theorem 5.3. The proof of (ii) is essentially the same. If TN = min{ T, o + N ) , then TN is a stopping time, and oo
N- 1
E; [ cp ( XTN ) ] = E E E P;( o = m , T = m + n , xm + n = k ]cp( k ) m=O n=O
k
00
+ E E P; ( o = m , T > m + N, Xm + N = k] cp ( k ) . m=O
k
Since [o = m, T > m + N] = [o = m] - [o = m, T < m + N] e �+ N - 1 , again E;[cp ( XTN )] < E; [ q> ( XTN _ 1 )] < E; [cp( XT0 )]. Since T0 = o, part (ii) fol lows from part (i) by another passage to the limit. •
If an excessive function cp dominates the payofffunction f, then it dominates the value function v as well. Lemma 6.
all
By definition, to say that
i.
By Lemma 5, cp(i) > and so cp(i ) > v(i) for all i.
PROOF.
,. ,
g
dominates
8.10.
dominating f.
is to say that
g(i) > h(i)
for
E; (cp( XT )] > E; [f( XT )] for all Markov times
Since T = 0 is a stopping time, immediately characterize v: Theorem
h
•
v
dominates
f.
Lemmas
4
and
6
The value function v is the minimal excessive function
There remains the problem of constructing the optimal strategy T. Let M be the set of states i for which v(i) = f(i); M, the support set, is nonempty
134
PROBABILITY
since it at least contains those i that maximize f. Let A = n:o= o [ Xn � M] be the event that the system never enters M. The following argument shows that P; (A) = 0 for each i. As this is trivial if M = S, assume that M ¢ S. Choose 8 > 0 so that /(i) < v(i) - 8 for i e S - M. Now E;[/( XT )] = E�-o E k P; [T = n, Xn = k]f(k); replacing the /(k) by v(k) or v(k) - 8 according as k e M or k e S - M gives E; [/( XT )] < E; [v( XT )] - 8P; [XT E S - M ] < E; [v( XT )] - 8P; (A) < v(i) - 8P; (A), the last inequality by Lemmas 4 and 5. Since this holds for every Markov time, taking the supremum over T gives P; (A) = 0. Whatever the initial state, the system is thus certain to enter the support set M. Let T0 ( w ) = min[n: Xn (w) E M] be the hitting time for M. Then T0 is a Markov time and To = 0 if X0 E M. It may be that Xn (w) � M for all n, in which case T0 ( w) = oo , but as just shown, the probability of this is 0.
Theorem 8.11. The hitting time τ_0 is optimal: E_i[f(X_{τ_0})] = v(i) for all i.

PROOF. By the definition of τ_0, f(X_{τ_0}) = v(X_{τ_0}). Put φ(i) = E_i[f(X_{τ_0})] = E_i[v(X_{τ_0})]. The first step is to show that φ is excessive. If τ_1 = min[n: n ≥ 1, X_n ∈ M], then τ_1 is a Markov time and

E_i[v(X_{τ_1})] = Σ_{n=1}^{∞} Σ_{k∈M} P_i[X_1 ∉ M, ..., X_{n−1} ∉ M, X_n = k] v(k)
             = Σ_{n=1}^{∞} Σ_{k∈M} Σ_{j∈S} p_ij P_j[X_0 ∉ M, ..., X_{n−2} ∉ M, X_{n−1} = k] v(k)
             = Σ_j p_ij E_j[v(X_{τ_0})].

Since τ_0 ≤ τ_1, E_i[v(X_{τ_0})] ≥ E_i[v(X_{τ_1})] by Lemmas 4 and 5. This shows that φ is excessive. And φ(i) ≤ v(i) by the definition (8.44). If φ(i) ≥ f(i) is proved, it will follow by Theorem 8.10 that φ(i) ≥ v(i) and hence that φ(i) = v(i).

Since τ_0 = 0 for X_0 ∈ M, i ∈ M implies that φ(i) = E_i[f(X_0)] = f(i). Suppose that φ(i) < f(i) for some values of i in S − M, and choose i_0 to maximize f(i) − φ(i). Then ψ(i) = φ(i) + f(i_0) − φ(i_0) dominates f and is excessive, being the sum of a constant and an excessive function. By Theorem 8.10, ψ must dominate v, so that ψ(i_0) ≥ v(i_0), or f(i_0) ≥ v(i_0). But this implies that i_0 ∈ M, a contradiction. ∎

The optimal strategy need not be unique. If f is constant, for example, all strategies have the same value.
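Theorems 8.10 and 8.11 suggest a simple way to compute v and the support set M numerically for a finite chain. The sketch below is my own illustration, not part of the text (it assumes the numpy library): it iterates v ← max(f, Pv), which for finite S converges upward from f to the minimal excessive function dominating f, and then reads off M as the set where v = f. Running it on the absorbing random walk of Examples 8.2 and 8.15 below, with f the indicator of state r, returns v(i) ≈ i/r and M = {0, r}.

```python
import numpy as np

def value_function(P, f, tol=1e-12, max_iter=100_000):
    """Iterate v <- max(f, P v) to approximate the minimal excessive
    function dominating f (the value function of Theorem 8.10)."""
    f = np.asarray(f, dtype=float)
    v = f.copy()
    for _ in range(max_iter):
        w = np.maximum(f, P @ v)          # one step of the recursion
        if np.max(np.abs(w - v)) < tol:
            return w
        v = w
    return v

# Symmetric random walk on {0, 1, ..., r} with absorbing barriers,
# payoff f = indicator of state r (Examples 8.2 and 8.15).
r = 6
P = np.zeros((r + 1, r + 1))
P[0, 0] = P[r, r] = 1.0
for i in range(1, r):
    P[i, i - 1] = P[i, i + 1] = 0.5
f = np.zeros(r + 1)
f[r] = 1.0

v = value_function(P, f)
M = [i for i in range(r + 1) if abs(v[i] - f[i]) < 1e-9]   # support set
print(v)   # approximately i/r
print(M)   # [0, 6]
```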
Example 8.15. For the symmetric random walk with absorbing barriers at 0 and r (Example 8.2), a function φ on S = {0, 1, ..., r} is excessive if φ(i) ≥ ½φ(i − 1) + ½φ(i + 1) for 1 ≤ i ≤ r − 1. The requirement is that φ give a concave function when extended by linear interpolation from S to the entire interval [0, r]. Hence v thus extended is the minimal concave function dominating f. The figure shows the geometry: the ordinates of the dots over 0, 1, ..., r are the values of f, and the polygonal line describes v. The optimal strategy is to stop at a state for which the dot lies on the polygon.

If f(r) = 1 and f(i) = 0 for i < r, v is a straight line: v(i) = i/r. The optimal Markov time τ_0 is the hitting time for M = {0, r}, and v(i) = E_i[f(X_{τ_0})] is the probability of absorption in the state r. This gives another solution of the gambler's ruin problem for the symmetric case. ∎
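Since Example 8.15 identifies v with the minimal concave majorant of f, one can also compute it directly, without iterating the chain, by taking the upper concave envelope of the points (i, f(i)). The sketch below is my own illustration under that reading of the example, not part of the text; it assumes numpy.

```python
import numpy as np

def minimal_concave_majorant(f):
    """Upper concave envelope of the points (i, f(i)), i = 0, ..., r.
    By Example 8.15 this is the value function v for the symmetric
    random walk with absorbing barriers; M is where it touches f."""
    hull = []                                  # envelope vertices, left to right
    for x, y in enumerate(f):
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # drop the middle point if it lies on or below the new chord
            if (y2 - y1) * (x - x2) <= (y - y2) * (x2 - x1):
                hull.pop()
            else:
                break
        hull.append((x, y))
    xs, ys = zip(*hull)
    v = np.interp(np.arange(len(f)), xs, ys)   # evaluate the polygon at each state
    M = [i for i in range(len(f)) if abs(v[i] - f[i]) < 1e-12]
    return v, M

f = [0.0, 0.3, 0.1, 0.8, 0.2, 0.4, 1.0]        # an arbitrary payoff on {0, ..., 6}
v, M = minimal_concave_majorant(f)
print(v, M)                                     # stop exactly at the states in M
```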
Example 8.16. For the selection problem in Example 8.5, the p_ij are given by (8.5) and (8.6) for 1 ≤ i ≤ r, while p_{r+1, r+1} = 1. The payoff is f(i) = i/r for i ≤ r and f(r + 1) = 0. Thus v(r + 1) = 0, and since v is excessive,

(8.47)    v(i) ≥ g(i) = Σ_{j=i+1}^{r} [i/(j(j − 1))] v(j),    1 ≤ i < r.

By Theorem 8.10, v is the smallest function satisfying (8.47) and v(i) ≥ f(i) = i/r, 1 ≤ i ≤ r. Since (8.47) puts no lower limit on v(r), v(r) = f(r) = 1, and r lies in the support set M. By minimality,

(8.48)    v(i) = max{f(i), g(i)},    1 ≤ i ≤ r.

If i ∈ M, then f(i) = v(i) ≥ g(i) ≥ Σ_{j=i+1}^{r} i j^{−1}(j − 1)^{−1} f(j) = f(i) Σ_{j=i+1}^{r} (j − 1)^{−1}, and hence Σ_{j=i+1}^{r} (j − 1)^{−1} ≤ 1. On the other hand, if this inequality holds and i + 1, ..., r all lie in M, then g(i) = Σ_{j=i+1}^{r} i j^{−1}(j − 1)^{−1} f(j) = f(i) Σ_{j=i+1}^{r} (j − 1)^{−1} ≤ f(i), so that i ∈ M by (8.48). Therefore, M = {i_r, i_r + 1, ..., r, r + 1}, where i_r is determined by

(8.49)    1/i_r + ⋯ + 1/(r − 1) ≤ 1 < 1/(i_r − 1) + 1/i_r + ⋯ + 1/(r − 1).

If i < i_r, so that i ∉ M, then v(i) > f(i) and so, by (8.48),

v(i) = g(i) = Σ_{j=i+1}^{i_r − 1} [i/(j(j − 1))] v(j) + (i/r)(1/(i_r − 1) + ⋯ + 1/(r − 1)).
It follows by backward induction starting with i = i_r − 1 that

(8.50)    v(i) = p_r ≡ ((i_r − 1)/r)(1/(i_r − 1) + 1/i_r + ⋯ + 1/(r − 1))

is constant for 1 ≤ i < i_r. In the selection problem as originally posed, X_1 ≡ 1. The optimal strategy is to stop with the first X_n that lies in M. The princess should therefore reject the first i_r − 1 suitors and accept the next one who is preferable to all his predecessors (is dominant). The probability of success is p_r as given by (8.50). Failure can happen in two ways. Perhaps the first dominant suitor after i_r is not the best of all suitors; in this case the princess will be unaware of failure. Perhaps no dominant suitor comes after i_r; in this case the princess is obliged to take the last suitor of all and may be well aware of failure. Recall that the problem was to maximize the chance of getting the best suitor of all rather than, say, the chance of getting a suitor in the top half.

If r is large, (8.49) essentially requires that log r − log i_r be near 1, so that i_r ≈ r/e. In this case, p_r ≈ 1/e. Note that although the system starts in state 1 in the original problem, its resolution by means of the preceding theory requires consideration of all possible initial states. ∎
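A short computation, mine rather than the text's, confirms the threshold rule numerically: it locates i_r from (8.49) by accumulating the harmonic sums with exact fractions and evaluates p_r from (8.50), which indeed approaches 1/e as r grows.

```python
from fractions import Fraction

def optimal_threshold(r):
    """Smallest i_r with 1/i_r + ... + 1/(r-1) <= 1, as in (8.49); r >= 3."""
    s, i = Fraction(0), r
    while i > 1 and s + Fraction(1, i - 1) <= 1:
        i -= 1
        s += Fraction(1, i)
    return i, s          # s = 1/i_r + ... + 1/(r-1)

def success_probability(r):
    """p_r of (8.50): ((i_r - 1)/r) * (1/(i_r - 1) + ... + 1/(r - 1))."""
    i_r, s = optimal_threshold(r)
    return i_r, Fraction(i_r - 1, r) * (Fraction(1, i_r - 1) + s)

for r in (4, 10, 100, 1000):
    i_r, p_r = success_probability(r)
    print(r, i_r, float(p_r))   # i_r is near r/e, p_r is near 1/e = 0.3679...
```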
This theory carries over in part to the case of infinite S, although this requires the general theory of expected values, since f(X_τ) may not be a simple random variable. Theorem 8.10 holds for infinite S if the payoff function is nonnegative and the value function is finite.† But then problems arise: optimal strategies may not exist, and the probability of hitting the support set M may be less than 1. Even if this probability is 1, the strategy of stopping on first entering M may be the worst one of all.‡

†The only essential change in the argument is that Fatou's lemma (Theorem 16.3) must be used in place of Theorem 5.3 in the proof of Lemma 5.
‡See Problems 8.34 and 8.35.

PROBLEMS
8.1. Show that (8.2) holds if the first and third terms are equal for all n, i_0, ..., i_n, and j.

8.2. Let Y_0, Y_1, ... be independent and identically distributed with P[Y_n = 1] = p, P[Y_n = 0] = q = 1 − p, p ≠ q. Put X_n = Y_n + Y_{n+1} (mod 2). Show that X_0, X_1, ... is not a Markov chain even though P[X_{n+1} = j | X_{n−1} = i] = P[X_{n+1} = j]. Does this last relation hold for all Markov chains? Why?

8.3. Show by example that a function f(X_0), f(X_1), ... of a Markov chain need not be a Markov chain.

8.4. 4.18 ↑ Given a 2 × 2 stochastic matrix and initial probabilities w_0 and w_1, define P for cylinders in sequence space by P[ω: a_i(ω) = u_i, 1 ≤ i ≤ n] = w_{u_1} p_{u_1 u_2} ⋯ p_{u_{n−1} u_n} for sequences u_1, ..., u_n of 0's and 1's (Problem 4.18 is the case p_ij = p_j). Extend P as a probability measure to ℱ_0 and then to ℱ. Show that a_1(ω), a_2(ω), ... is a Markov chain with transition probabilities p_ij.

8.5. Show that
•
•
+
+
8.3. 8.4.
•
· · ·
8.5.
• • •
/,. . I)
oo
� �
k -0
p))�� ) ==
oo
� �
n
� �
n-1 m=1
j,.< !" >p � � - m ) == 1) ))
and prove that if j is transient, then E n P�j' > Show that if j is transient, then
8.6.
8.7.
• •
<
oo
oo
� �
n-1
p� n ) ' 1)
(compare Theorem 8.3(i)).
specialize to the case i == j: in addition to implying that i is transient (Theorem 8.2(i)), a finite value for E�_ 1 p1r > suffices to determine /;; exactly. Call { X; } a subsolution of (8.24) if X; < E 1 qiJx and 0 < X; < 1, i e U. Extending Lemma 1, show that a subsolution (X; } satisfies X; < o; : the solution { o; } of (8.24) dominates all subsolutions as well as all solutions. Show that if X; == E1 qiJ xi and - 1 < X; < 1, then { 1 X; 1 } is a subsolution of (8.24). Find solutions to (8.24) that are not maximal.
138
8.8.
8.9.
PROBAB ILITY
Show by solving (8.27) that the unrestricted random walk on the line (Example 8.3) is persistent if and only if p == t . (a) Generalize an argument in the proof of Theorem 8.5 to show that /; k = p; k + EJ k piJJjk Generalize this further to •
•
JiI k == Jit(lk ) + . . . +Jit(kn ) + E P; [ XI
i• k
¢
k ' . . . ' xn - I
¢
k ' xn == j ] Jjk •
Take k == i. Show that /, > 0 if and only if P; [ XI -:1= i, . . . ' xn - 1 ¢ i, xn == j] > 0 for some n, and conclude that i is transient if and only if � ; < 1 for some j ¢ i such that fiJ > 0. (c) Show that an irreducible chain is transient if and only if for each i there is a j ¢ i such that Jj; < 1. 8. 10. Suppose that S == (0, 1, 2, . . . }, p00 == 1, and /;0 > 0 for all i. (a) Show that P; (Uj.... 1 [ Xn == j i.o.]) == 0 for all i. (b) Regard the state as the size of a population and interpret the conditions p00 == 1 and /;0 > 0 and the conclusion in part (a). 8.1 1. 8.6 f Show for an irreducible chain that (8.27) has a nontrivial solution if and only if there exists a nontrivial, bounded sequence { X; } (not necessarily nonnegative) satisfying X; == EJ piJ xJ , i -:1= i0• (See the remark following the proof of Theorem 8.5.) (b)
• ;0
8.12.
f Show that an irreducible chain is transient if and only if (for arbitrary i0 ) the system Y; == EJ PJ.J.Y.i ' i ¢ i0 (sum over all j) has a bounded, noncon stant solution { Y; , i e .s } .
8.13. Show that the P;-probabilities of ever leaving U for i e U are the minimal solution of the system
Z; == L PijZj + L Pi} ' i E U,
{ 8.51 )
jE U
0 S Z; < 1 ,
jE U
i E U.
The constraint z; < 1 can be dropped: the minimal solution automatically satisfies it, since z; 1 is a solution. 8.14. Show that supiJ n0(i, j) == oo is possible in Lemma 2. 8.15. Suppose that { 'IT; } solves (8.30), where it is assumed that E; l w; l < oo , so that the left side is well defined. Show in the irreducible case that the 'IT; are either all positive or all negative or all 0. Stationary probabilities thus exist in the irreducible case if and only if (8.30) has a nontrivial solution { 'IT; } (L, w, absolutely convergent). 8.16. Show by example that the coupled chain in the proof of Theorem 8.6 need not be irreducible if the original chain is not aperiodic. =
SECTION
8.
139
M.AllKOV CHAINS
8. 17. Suppose that S consists of all the integers and - 13 ' - 1 - p0. 0 - p0, + 1 Pk . k - 1 == q , Pk . k + 1 == p , Pk . k - 1 == p , Pk . k + 1 == q , n
.rO,
k < -1 k > 1. -
'
Show that the chain is irreducible and aperiodic. For which p's is the chain persistent? For which p 's are there stationary probabilities? 8.18. Show that the period of j is the greatest common divisor of the set ( 8.52) 8.19.
f
s
be nonnegative numbers with I == EC:.. 1 In Recurrent events. Let 11 , 12 , 1. Define u 1 , u 2 , recursively by u 1 == 11 and • •
• • •
•
( 8.53) (a) Show that I < 1 if and only if En un < oo .
(b)
Assume that I == 1, set
( 8.54)
p.
== E�_ 1 nln , and assume that
gcd[ n : n
�
1 , In > 0] == 1 .
Prove the renewal theorem: Under these assumptions, the limit u == lim n u n exists, and u > 0 if and only if p. < oo, in which case u == 1 Ip.. Although these definitions and facts are stated in purely analytical terms, they have a probabilistic interpretation: Imagine an event I that may occur at times 1, 2, . . . . Suppose In is the probability I occurs first at time n. Suppose further that at each occurrence of I the system starts anew, so that In is the probability that I next occurs n steps later. Such an I is called a recurrent event. If un is the probability that I occurs at time n , then (8.53) holds. The rec11rrent event I is called transient or persistent according as I < 1 or I == 1, it is called aperiodic if (8.54) holds, and if I == 1, p. is interpreted as the mean recurrence time. 8.20. Show for a two-state chain with S == {0, 1 } that pbQ> == p 1 0 + ( p00 p 1 0 ) p6{; _ 1+,. Derive an explicit formula for pbQ> and show that it converges to p 1 0/( p01 p 1 0 ) if all p, are positive. Calculate the other P1j> and verify that the limits satisfy (8. JO). Verify that the rate of convergence is exponen tial. 8.21. A thinker who owns r umbrellas travels back and forth between home and office, taking along an umbrella (if there is one at hand) in rain (probability p) but not in shine (probability q). Let the state be the number of umbrellas at hand, irrespective of whether the thinker is at home or at work. Set up the transition matrix and find the stationary probabilities. Find the steady-state probability of his getting wet, and show that five umbrellas will protect him at the 5% level against any climate (any p ) . •
140
PROBABILITY
8.22. (a) A transition matrix is doub�v stochastic if E; p, 1 == 1 for each j. For a finite, irreducible, aperiodic chain with doubly stochastic transition matrix, show that the stationary probabilities are all equal. (b) Generalize Example 8.14: Let S be a finite group, let p ( i ) be probabili I ties, and put p; 1 == p(j · ; - ) , where product and inverse refer to the group operation. Show that, if all p ( i ) are positive, the states are all equally likely in the 1imi t. (c) Let S be the symmetric group on 52 elements. What has (b) to say about card shuffling? 8.23. A set e in S is closed if E1 c pi J == 1 for i e e: once the system enters C it cannot leave. Show that a chain is irreducible if and only if S has no proper closed subset. 8.24. f Let T be the set of transient states and define persistent states i and j (if there are any) to be equivalent if /;1 > 0. Show that this is an equivalence relation on S - T and decomposes it into equivalence classes e1 , C2 , , so that S == T u C1 u e2 u . . . . Show that each em is closed and that /;; == 1 for i and j in the same em . 8.25. 8.13 8.23 f Let T be the set of transient states and let e be any closed set of persistent states. Show that the P;-probabilities of eventual absorption in C for i e T are the minimal solution of ·
e
• • •
Y; ==
( 8.55)
E
jE T
P;J J'.i
0 < Y; < 1 ,
+ E Pi } ' jE C
iE
T,
ie
T.
8.26. Suppose that an irreducible chain has period t > 1. Show that S decomposes into sets 50 , . . . , S,_ 1 such that piJ > 0 only if i E S, and j E S,+ for some 11 ( 11 + .1 reduced modulo t). Thus the system passes through the S, in cyclic successton. 8.27. f Suppose that an irreducible chain of period t > 1 has a stationary distribution { '17) } · Show that, if i E s, and j E s,+ cr ( 11 + a reduced modulo t). ' then lim n p� �r + cr ) == 'fT. Show that lim n n - I�"-- mn -= 1 p � �) == w.jt for all i and 1
1·
IJ
J"
'l
J
8.28. Suppose that { Xn } is a Markov chain with state space S and put Y, == ( Xn , Xn + 1 ). Let T be the set of pairs ( i , j ) such that P;1 > 0 and show that Y, is a Markov chain with state space T. Write aown the transition probabilities. Show that, if { Xn } is irreducible and aperiodic, so is { Y,, } . Show that, if "'; are stationary probabilities for { Xn }, then "'; piJ are stationary probabilities for { Y, }. 8.29. 6.10 8.28 f Suppose that the chain is finite, irreducible, and aperiodic and that the initial probabilities are the stationary ones. Fix a state i, let A n == [ Xn == i ], and let Nn be the number of passages through i in the first n steps. Calculate an and fln as defined by (5.37). Show that fln - a; = 0(1/n ) , so that n - 1 Nn --+ with probability 1 . Show for a function f on the state space that n - 1 E'ic .... 1 /( Xk ) --+ E;w;/(i) with probability 1. Show that n - 1 E'ic .... l g( Xk , xk + l > --+ E j wi pi j g,i , j ) for functions g on s X s. 'IT ;
i
SECTION
8.30.
8.
141
MARKOV CHAINS
, in , put p, ( w ) 8. 2 9 f If X0 ( w ) == i0 , , X, ( w ) == i, for states i0 == w, p, ; · · · p, , , so that p, ( w ) is the probability of the observation obse�e�� Show ffi�i - n - • tog p, ( w ) --+ h == - E, 1 P; log p, 1 with probabil ity 1 if the chain is finite, irreducible, and aperiodic. & tend to this case the notions of source, entropy, and asymptotic equipartition. A sequence { Xn } is a Markov chain of second order if P [ Xn + 1 == j l X0 == i0 , , Xn == in ] = P[ Xn + t == j ! Xn - t == i n_ 1 , Xn == in ] = P;n_ 1 ;n: 1 Show that nothing really new is involved because the sequence of pairs ( xn ' xn + 1 ) is an ordinary Markov chain (of first order). Compare Problem 8.28. Generalize this idea into chains of order r. Consider a chain on S == {0, 1, . . . , r }, where 0 and r are absorbing states and Pi , i + 1 == P; > 0, P; , ; - 1 == q; == 1 - P; > 0 for 0 < i < r. Identify state i with a point z; on the line, where 0 == z0 < · · · < z, and the distance from z; to z; + 1 is q;/P; times that from z; _ 1 to z; . Given a function q> on S, consider the associated function ip on [0, z, ] defined at the z; by ip(z; ) == q> ( i ) and in between by linear interpolation. Show that q> is excessive if and only if ip is concave. Show that the probability of absorption in r for initial state i is t; _ 1 /t, _ 1 , where I; == E� _0q1 Pk · Deduce (7.1). Show that qk /p 1 in the new scale the expected distance moved on each step is 0. Suppose that a finite chain is irreducible and aperiodic. Show by Theorem 8.9 that an excessive function must be constant. Alter the chain in Example 8.13 so that q0 == 1 - p0 == 1 (the other P; and q; still positive). Let {J == limn p 1 p, and assume that {J > 0. Define a payoff function by /(0) = 1 and /(i) == 1 - /;0 for i > 0. If X0 , , Xn are positive, put a, == n ; otherwise let an be the smallest k such that X,._ == 0. Show that E; [/( Xa )] --+ 1 as n --+ oo , so that v(i) 1 . Thus the support set is M = {0} , and for an initial state i > 0 the probability of ever hitting M is /; o < 1. For an arbitrary finite stopping time T, choose n so that P; [ T < n = a, ] > 0. Then E; [/( XT )] � 1 - h + n . o P; [ T < n == a, ] < 1. Thus no strategy achieves the value v(i) (except of course for i == 0). f Let the chain be as in the preceding problem, but assume that {J == 0, so that /;0 == 1 for all i. Suppose that A 1 , A 2 , exceed 1 and that A 1 A n --+ A < oo ; put /(0) == 0 and /(i) == A 1 A; _ 1 /p 1 P t · For an arbitrary (finite) stopping time T, the event [ T == n ] must have the- form [( X0 , , Xn ) e /n ) for some set In of ( n + 1)-long sequences of states. Show that for each i there is at most one n � 0 such that (i, i + 1 , . . . , i + n ) e ln. If there is no such n , then E; [ /( XT )] == 0. If there is one, then 6.14
• • •
• • • •
'IT;
8.31.
•
8.32.
•
•
•
•
8.33. 8.34.t
•
•
•
•
•
•
•
•
•
•
•
=
8.35.
•
•
•
•
•
•
•
•
•
•
•
•
;
•
•
.
•
.
and hence the only possible values of E; [ f( XT )] are 0 , /( i ) , P;/( i + 1 )
= /( i) A ; ,
P; P; + 1 /( i + 2) == /( i) A; A; + 1 ,
•
†The last two problems involve expected values for random variables with infinite range.
.
Thus v(i) = f(i)λ/λ_1 ⋯ λ_{i−1} for i ≥ 1; no strategy achieves this value. The support set is M = {0}, and the hitting time τ_0 for M is finite, but E_i[f(X_{τ_0})] = 0.

SECTION 9. LARGE DEVIATIONS AND THE LAW OF THE ITERATED LOGARITHM*
It is interesting in connection with the strong law of large numbers to estimate the rate at which S_n/n converges to the mean m. The proof of the strong law used upper bounds for the probabilities P[|S_n − m| ≥ a] for large a. Accurate upper and lower bounds for these probabilities will lead to the law of the iterated logarithm, a theorem giving very precise rates for S_n/n → m. The first concern will be to estimate the probability of large deviations from the mean, which will require the method of moment generating functions. The estimates will be applied first to a problem in statistics and then to the law of the iterated logarithm.

Moment Generating Functions
Let X be a simple random variable assuming values x_1, ..., x_l with respective probabilities p_1, ..., p_l. Its moment generating function is

(9.1)    M(t) = E[e^{tX}] = Σ_{i=1}^{l} p_i e^{t x_i}.

(See (5.16) for expected values of functions of random variables.) This function, defined for all real t, can be regarded as associated with X itself or as associated with its distribution, that is, with the measure on the line having mass p_i at x_i (see (5.9)). If c = max_i |x_i|, the partial sums of the series e^{tX} = Σ_{k=0}^{∞} t^k X^k/k! are bounded by e^{|t|c}, and so the corollary to Theorem 5.3 applies:

(9.2)    M(t) = Σ_{k=0}^{∞} (t^k/k!) E[X^k].

Thus M(t) has a Taylor expansion, and as follows from the general theory [A29], the coefficient of t^k must be M^{(k)}(0)/k!. Thus

(9.3)    M^{(k)}(0) = E[X^k].

*This section may be omitted.
9.
LARGE DEVIATIONS
143
Furthermore, term-by-term differentiation in (9.1) gives
M
==
I
L P;x fe'x; E [ X ke' x ] ;
i- 1
==
taking t = 0 here gives (9.3) again. Thus the moments of X can be calculated by successive differentiation, whence M( t ) gets its name. Note that M(O) = 1. If X assumes the values 1 and 0 with probabilities p and q = 1 - p , as in Bernoulli trials, its moment generating function is M(t) == pe 1 + q. The first2 two moments are M '(O) == p and M "(O) == p, and the variance is p - p == pq. • Example 9. 1.
If XI , . . . ' xn are independent, then for each t (see the argument follow ing (5.7)), e 1 x1 , • • • , e' x, are also independent. Let M and M1 , • • • , Mn be the respective moment generating functions of s == xl + . . . + xn and of X1, • • • , Xn ; of course, e 1 5 == II ; e 1 x'. Since by (5.21) expected values multiply for independent random variables, there results the fundamental relation
(9 . 4) This is an effective way of calculating the moment generating function of the sum S. The real interest, however, centers on the distribution of S, and so it is important to know that distributions can in principle be recovered from their moment generating functions. Consider along with (9.1) another finite exponential sum N(t) == E1q1e 'Y1, and suppose that M( t) == N( t) for all t. If X;0 == max X; and y10 == max y1 , then M(t) - P ;0e 1x;o and N(t) - q10e'YJo as t --+ oo , and so X;0 = y10 and 1x; == E ,. ioq1e'Y1, and The same argument now applies to E; .,. e = • ; ; ; P 0 1 P10 q0 it follows inductively that with appropriate relabeling, X; == Y; and P; == q; for each i. Thus the function (9.1) does uniquely determine the X; and P; · If X1 , • • • , Xn are independent, each assuming values 1 and 0 with probabilities p and q, then s == xl + . . . + xn is the number of successes in n Bernoulli trials. By (9.4) and Example 9.1, S has the moment generating function EXIIIple II 9.2.
144
PROBABILITY
The right-hand form shows this to be the moment generating function of a n distribution with mass ( = )p kq - k at the integer k, 0 < k < n. The unique ness just established therefore yields the standard fact that P[S = k ]
= ( =)p kq n - k . The
•
cumulant generating function of X (or of its distribution) is C ( t ) = log M ( t ) = log E ( e' x ).
( 9.5 )
(Note that M( t) is strictly positive.) Since - ( M') 2 )/M 2, and since M(O) = 1 ,
C' = M'/M and C " = (MM"
C (O) = 0, C' (O ) = E ( X ] , C " (O) = Var( X ] . Let m k = E [ X k ]. The leading term in (9.2) is m 0 = 1 , and so expansion of the logarithm in (9.5) gives v v+1 1 oo ) oo k . t C( t ) = E ( ( 9.7 ) E� k- 1 k . (9.6)
v-1
)
(
V
a formal
1 as t � 0, this expansion is valid for t in some neighbor Since M( t ) hood of 0. By the theory of series, the powers on the right can be expanded ; and terms with a common factor t collected together. This gives an . expansion .-.
( 9.8 )
c( t )
=
00
� i..J
,. _
1
C;
.
7f t ' , I.
valid in some neighborhood of 0. The C; are the cumulants of X. Equating coefficients in the expansions (9.7) and (9.8) leads to c1 = m 1 and c2 = m 2 - m�, which checks with (9.6). Each c; can be expressed as a polynomial in m 1 , , m ; and conversely, although the calculations soon become tedious. If E[ X] = 0, however, so that m 1 = c 1 = 0, it is not hard to check that •
.
•
( 9.9) Taking logarithms converts the multiplicative relation (9.4) into the additive relation
(9.10)
SECTION
9.
LARGE DEVIATIONS
145
M (t)
--�----�----� � T
for the corresponding cumulant generating functions; it is valid in the presence of independence. By this and the definition (9.8), it follows that cumulants add for independent random variables. Clearly, M "(t) = E[ X 2e 1 x ] > 0. Since ( M'(t)) 2 = E 2 [ Xe 1 x ] < E[e 1 x ] E [ X 2e 1 x ] = M(t)M "(t) by Schwarz's inequality (5.32), C"(t) > 0. Thus ·
the moment generating function
convex.
and the cumulant generating function are both
Large Deviations Let Y be a simple random variable assuming values y1 with probabilities p1 . The probl�m i� to estimate P[Y > a] when Y has mean 0 and a is positive. It is not.ationally convenient to subtract a away from Y and instead estimate P[Y > 0] when Y has negative mean. Assume then that
(9.11 )
P ( Y > O] > O, the second assumption to avoid trivialities. Let M(t) = E1 p1 e'Y1 be the moment generating function of Y. Then M '(0) < 0 by the first assumption in (9.11), and M(t) � oo as t --+ oo by the second. Since M(t) is convex, it has its minimum p at a positive argument T: (9.12 ) inf M ( t ) = M ( T ) = p , 0 < p < 1 , T > 0. E ( Y ] < O,
I
Construct (on an entirely irrelevant probability space) an auxiliary ran dom variable Z such that
(9.13 ) for each y1 in the range of Y. Note that the probabilities on the right do add to 1. The moment generating function of Z is
(9.1 4 )
146
PROBABILITY
and therefore
( 9.15)
M' T ) ( E ( Z ) = p = 0,
For all positive t, (5.27), and hence
T ) M" ( 2 2 s = E ( Z ] = p > 0.
P[Y 0] = P[e 1 Y > 1] < M(t) by Markov's inequality �
P ( Y > 0)
( 9.16 )
<
p.
Inequalities in the other direction are harder to obtain. If summation over those indices j for which yj > 0, then
Put the final sum here in the form e - 1, and let p = P[Z > � 0. Since log x is concave, Jensen's inequality (5.29) gives fJ - IJ
= log E' e - "Yip - 1P ( Z = yi ] > E' ( - Tyj) p - 1P ( Z = yi ]
�
= - Tsp - 1 L' P [ Z = yi ] By
E'
0].
denotes
By
(9.16),
+ log p + log p
+ log p .
(9.15) and Lyapounov's inequality (5.33), Y· 1 E li Z < 1 E 112 z 2 ] = 1 . = z ( P < y j I] ; [ ] ; E' :
The last two inequalities give
(9.18 )
0 � IJ �
TS ) - log P ( Z > 0) . � P[Z O
This proves the following result.
Suppose that Y satisfies (9.11). Define p and T by (9.12), let variable with distribution (9.13), and define s 2 by (9.15). Then P[Y > 0] = pe - 9, where fJ satisfies (9.18).
Theorem 9. 1. Z be a random
To use (9.18) requires a lower bound for P[Z
> 0].
SECTION
9.
LARGE DEVIATIONS
1beorem 9.2. If E[Z] = 0, E[ Z 2 ] = � 0] � s 4/4 � 4 .t
s 2, and E[ Z 4 ] = � 4 > 0, then
147
P[Z
Let z + = Z/[ Z � OJ and z- = - Z/[ Z < OJ · Then z+ and z- are nonnegative, Z = z + _ z-, Z2 = (Z + )2 + (Z-) 2 , and PROOF.
(9. 1 9) Let p
= P[ Z
�
0]. By Schwarz's inequality (5.32),
By Holder's inequality (5.31) (for p = f and
Since E[ Z ] = q j) gives ==
= 3)
0, another application of Holder's inequality (for p = 4 and
Combining these three inequalities with
(Epl/4 ) 213� 4/3
q
=
2 ptl2� 2 .
(9.1 9)
gives
s2
�
p1 12� 2 +
•
Chernoff's Theorem* 1beorem
9.3.
Let xl, x2 , . . . be independent, identically distributed simple
random variables satisfying E [ Xn ] < 0 and P[ Xn > 0] > 0, let M(t) be their common moment generating function, and put p = inf M(t ) . Then ,
(9.20)
.!. tog P [ X1 + lim n � oo n
· · ·
+ Xn > 0 ) = log p .
t For a related result, see Problem 25.21 . *This theorem is not needed for the law of the iterated logarithm, Theorem 9.5.
148
PROBABILITY
Put Yn = X1 + · · · + Xn. Then E[ Yn] < 0 and P[ Yn > 0] > P " [ X1 > 0 ] > 0, and so the hypotheses of Theorem 9.1 are satisfied. Define Pn and Tn by inf,Mn ( t ) = Mn ( Tn ) = Pn, where Mn(t ) is the moment generating n fu�ction of Yn . Since Mn( t ) == M ( t ), it follows that Pn = p" and Tn = T, where M( T) == p. Let Zn be the analogue for Yn of the Z described by (9.13). Its moment n n generating function (see (9.14)) is Mn ( T + t)/p = ( M( T + t)/p) . This is also the moment generating function of V1 + · · · + V, for independent random variables V1 , . . . , V, each having moment generating function M( T + t )jp . Now each v; has (see (9.15)) mean 0 and some positive variance o 2 and fourth moment � 4 independent of i. Since Z, must have the same moments as V1 + · · · + V,, it has mean 0, variance s; = n o 2 , and fourth moment �! == n� 4 + 3n(n - 1) o 4 = O(n 2 ) (see (6.2)). By Theorem 9.2, P[ Zn > 0] > s:/4�! > a for some positive a independent of n. By Theo n 9 > rem 9.1 then, P[ Yn 0] = p e - , where 0 < on < Tnsna - 1 - log a = Ta - 1o /n - log a. This gives (9.20), and shows, in fact, that the rate of • convergence is 0( n - 1 /2 ). PROOF.
"
This result is important in the theory of statistical hypothesis testing. An informal treatment of the Bernoulli case will illustrate the connection. Suppose Sn == X1 + · · · + Xn , where the X; are independent and assume the values 1 and O with probabilities p and q. Now P( Sn > na ] == P(EZ= 1 ( X�c - a ) > 0], and Chernoff's theorem applies if p < a < 1. In this case M ( t ) = E ( e '< x1 - a ) ] = e - 'a ( pe ' + q). Minimizing this shows that the p of Chernoff's theorem satisfies
a
b
- log p == K( a , p) = a log - + b log - , p q where b == 1 - a. By (9.20), n- 1 log P[Sn > na ] -+ - K(a, p ); express this as (9 .21) Suppose now that p is unknown and that there are two competing hypotheses concerning its value, the hypothesis H1 that p = p 1 and the hypothesis H2 that p = P2 ' where P I < P2 . Given the observed results x. ' . . . ' xn of n Bernoulli trials, one decides in favor of H2 if sn > na and in favor of HI if s, < na ' where a is some number satisfying p 1 < a < p2 • The problem is to find an advantageous value for the threshold a. By (9.21), (9 .22) where the notation indicates that the probability is calculated for p = p 1 -that is, under the assumption of H1 • By symmetry, (9.23)
SECTION
9.
LARGE DEVIATIONS
149
The left sides of (9.22) and (9.23) are the probabilities of erroneously deciding in favor of H2 when H1 is, in fact, true and of erroneously deciding in favor of H1 when H2 is, in fact, true-the probabilities describing the level and power of the tes t. Suppose a is chosen so that K(a, p 1 ) = K(a, p 2 ), which makes the two error probabilities approximately equal. This constraint gives for a a linear equation with solution (9 .24) where q; = 1 - P; · The common error probability is approximately e - n K for this value of a , and so the larger K (a, p 1 ) is, the easier it is to distinguish statistically between p 1 and p2 • Although K( a(p . , p2 ), p 1 ) is a complicated function, it has 3a simple approxima tion for p 1 near p2 . As x -+ 0, log(1 + x) = x - !x 2 + O(x ). Using this in the definition of K and collecting terms gives (9 .25) Fix
x2 K ( p + x , p ) = 2pq + O ( x 3 ) ,
X -+ 0.
p 1 = p, and let p2 = p + t; (9.24) becomes a function 1/!(t) of t, and expanding
the logarithms gives {9.26)
t -+ 0
after some reductions. Finally, (9.25) and (9.26) together imply that (9.27)
t -+ 0.
In distinguishing p 1 == p from p2 == p + t for small t, if a is chosen to equalize the two error probabilities, then their common value is about e - n t 2 /Sp q. For t fixed, the nearer p is to ! , the larger this probability is and the more difficult it is to distin,uish p from p + t. As an example, compare p == .1 with p == .5. Now .36nt /8(.1)(.9) == nt 2/8(.5)(.5). With a sample only 36 percent as large, .1 can therefore be distinguished from .1 + t with about the same precision as .5 can be distinguished from .5 + t. 1be
Law of the Iterated Logarithm
analysis of the rate at which S,jn approaches the mean depends on the following variant of the theorem on large deviations.
The
Let sn = xl + . . . xn , where the xn are independent and identically distributed simple random variables with mean 0 and variance 1. If
Theorem 9.4.
150
PROBABILITY
a n are constants satisfying (9 .28 ) then ( 9. 2 9) for a sequence rn going to 0. PROOF. Put yn = sn - a nrn = E�- l ( Xk - a n! ..fi). Then E[Yn1 < 0. Since X1 has mean 0 and variance 1, P[X1 > 0] > 0, and it follows by (9.28) that P[nX1 > a,/ {i] > 0 for n sufficiently large, in which case P [Yn > 0] > p [ X1 - a n/..fn > 0] > 0. Thus Theorem 9.1 applies to Yn for all large enough n. Let Mn (t), Pn ' Tn , and zn be associated with yn as in the theorem. If m(t) and c( t ) are the moment and cumulant generating functions of the Xn , a then Mn( t ) is the n th power of the moment generating function e - t ,/�m(t) of xl - a nl rn ' and so yn has cumulant generating function (9 . 30 ) Since Tn is the unique minimum of Cn( t ), and since c;(t) = - a nrn + nc'(t ), Tn is determined by the equation c'( Tn ) = a n! rn . Since xl has mean 0 and variance 1, it follows by (9.6) that c (O ) = c' (O ) = 0, c ( O) = 1 . ( 9 . 31 ) Now c'( t ) is nondecreasing because c(t) is convex, and since c'( Tn ) = a n! rn goes tO 0, Tn must therefore go tO 0 as Well and must in fact be 0( a n/ rn ) . By the second-order mean-value theorem for c'( t ) , a nl rn = c'( Tn ) = Tn + 0( Tn2 ), from which follows "
( 9.32 )
'�'n =
an 7n
+
(0 a; ) --;;
·
c( t ), log pn = Cn( Tn ) = - Tn a nrn + nc ( Tn ) = - Tn a nrn + n 'l'n2 + o( ;
By the third-order mean-value theorem for
Applying (9.32) gives
( 9.33 )
[I
.,.
)] .
SECTION
9.
151
LARGE DEVIATIONS
Now (see (9 .14)) zn has moment generating function Mn ( Tn + t)/ Pn and (see (9.30)) cumulant generating function Dn (t) = Cn (Tn + t) - log pn = - (Tn + t)a nln + nc(t + Tn ) - log pn. The mean of zn is D�(O) = 0. Its variance s; is D�'(O); by (9.31), this is
(9 . 3 4 )
s; = nc"( Tn ) = n ( c"(O) + 0 ( Tn ) ) = n ( 1
+
o (1)) .
The fourth cumulant of zn is D�" ' (O) = nc""( Tn ) = 0( n ) . By formula (9.9) relating moments and cumulants (applicable because E[Zn1 = 0), E[Z:] = 3s: + D�" ' (O). Therefore, E[Z:];s: 3, and it follows by Theorem 9.2 that there exists an a such that P [Zn > 0] > a > 0 for all sufficiently large n. By Theorem 9. 1 , P[Yn � 0] = Pne - B, with 0 < (Jn < Tn sna - 1 + log a. By (Jn = O(a n ) = o(a;), and it follows by (9.33) that (9.28), (9.32), and (9.34), a�( l + o( 1 ))/2 . y = e � 0 ] n • [ P -+
The law of the interated logarithm is this:
Let sn = x1 + . . . + xn, where the xn are independent, identically distributed simple random variables with mean 0 and variance 1. Then
Theorem 9.5.
(9.35 )
P
[
Sn lim sup n J2 n log log n
=
1
]
=
1.
Equivalent to (9.35) is the assertion that for positive £
(9 .3 6 )
P [ Sn > ( 1 + £ ) J2n log log n
i.o. ]
=
0
P [ Sn � ( 1 - £ ) J2n log log n
i.o. ]
=
1.
and
(9. 3 7 )
The set in (9.35) is, in fact, the intersection over positive rational £ of the sets in (9.37) minus the union over positive rational £ of the sets in (9.36). The idea of the proof is this. Write
(9. 3 8 )
q> ( n ) = J2n log log n .
If A ! = [Sn > (1 ± £)f/>(n)], then by (9.29) , P ( A !) is near (log n) - < 1 ± '>2• If n k increases exponentially, say n k - (J k for (J > 1, then P (A�) is of the order k - < 1 ± '>2• Now Ekk -< 1 ± '>2 converges if the sign is + and diverges if
152
PROBABILITY
the sign is - . It will follow by the first Borel-Cantelli lemma that there is probability 0 that A� occurs for infinitely many k. In proving (9.36), an extra argument is required to get around the fact that the A; for n =I= nk must also be accounted for (this involves choosing (J near 1). If the A; were independent, it would follow by the second Borel-Cantelli lemma that with probability 1, A n " occurs for infinitely many k, which would in tum imply (9.37). An extra argument is required to get around the fact that the A n " are dependent (this involves choosing 8 large). For the proof of (9.36) a preliminary result is needed. Put Mk = max{ S0 , S1 , . . . Sk }, where S0 = 0. ,
Theorem 9.6.
for a � {2,
P
(9.39) PROOF.
If the Xk are independent with mean 0 and variance l , then
[ � > a] � [-!;- > a - ] . 2P
n:
If Aj == ( � _ 1 < a..fn < Mj], then
Since Sn - � has variance n - j, it follows by independence and Chebyshev's inequality that the probability in the sum is at most
(9.36). Given £ , choose (J so that (J > 1 but 8 2 < k 1 [(J ] and xk == 8(2 log log n k ) 12• By (9.29) and (9.39),
PROOF OF
nk
==
1
+
£.
Let
SECTION
9.
153
LARGE DEVIATIONS
where � k --+ 0. The negative of the exponent is asymptotically 8 2 log k and hence for large k exceeds 8 log k, so that
Since 8 > 1, it follows by the first Borel-Cantelli lemma that there is probability 0 that (see (9.38))
(9.40) for infinitely manx k. Suppose that nk _ 1 < n < n k and that
sn > ( 1
(9.41 )
+ £ ) q, ( n )
.
1 Now q, ( n ) � q, ( n k _ 1 ) - 8 - 12q,(n k ); hence, by the choice of 8, (1 + £)q,( n ) > IJq,( nk) if k is large enough. Thus for sufficiently large k, (9.41) implies (9.40) (if n k _ 1 < n < n k ), and there is therefore probability 0 that (9.41) • holds for infinitely many n. 1 PROOF OF (9.37). Given £, choose an integer (J so large that 38 - 12 < £. k Take nk = fJ . Now n k - n k _ 1 --+ oo , and (9.29) applies with n = n k - nk _ 1 1 and a n = xk/ Jn k - n k _ 1 , where xk = ( 1 - 8 - )q,( n k ). It follows that
P ( Sn k - Sn k - 1
> xk ]
=
P [ Sn - n 1c - 1 �<
> xk
]=
[
x k2 1 exp - 1 + �k ) , 2 nk - nk 1 ( -
-
]
1 where � k 0. The negative of the exponent is asymptotically ( 1 (J - )log k and so for large k is less than log k, in which case P[Sn k - Sn k _ 1 > xk ] > k - 1 • The events here being independent, it follows by the second Borel- Cantelli lemma that with probability 1, Sn " - Sn k - 1 > x k for infinitely many k. On the other hand, by (9.36) applied to { - Xn }, there is probability 1 that - Sn " _ 1 < 2q,( n k _ 1 ) < 2fJ - 1 12q,(n k ) for all but finitely many k. These two inequalities give sn k > x k - 2fJ - lllq,(n k ) > ( 1 • t ) � (n k ), the last inequality because of the choice of 8. .-.
That completes the proof of Theorem 9.5.
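A small simulation, my own illustration rather than part of the text (it assumes numpy), shows how slowly the bound in (9.35) is approached for the symmetric ±1 walk: the running maximum of S_n/√(2n log log n) creeps upward and stays below about 1 even for millions of steps.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 2_000_000
steps = rng.choice([-1, 1], size=N)          # X_k = +/-1: mean 0, variance 1
S = np.cumsum(steps)

n = np.arange(3, N + 1)                      # start at 3 so that log log n > 0
norm = S[2:] / np.sqrt(2 * n * np.log(np.log(n)))
running_max = np.maximum.accumulate(norm)
print(running_max[[10**3, 10**5, N - 3]])    # slowly increasing, below about 1
```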
154
PROBABILITY
PROBLEMS 9.1. Prove (6.2) by using (9.9) and the fact that cumulants add in the presence of independence.
9.2. Let Y be a simple random variable with moment generating function M( t ) . Show that P[ Y > 0] == 0 implies that P [ Y � 0] == inf M ( t ) . Is the infimum ,
achieved in this case?
9.3. In the Bernoulli case, (9.21) gives
where p < a < 1 and
xn == n ( a - p ). Theorem 9.4 gives
where xn == a nlnpq . Resolve the apparent discrepancy. Use (9.25) to compare the two expressions in case x fn is small. See Problem 27. 17. n
9.4. Relabel the binomial parameter p as fJ == I( p ) , where I is increasing and continuously differentiable. Show by (9.27) that the distin�uishability of fJ from fJ + �fJ, as measured by K, is ( �6 ) 2j8p (1 - p)(l'( p )) + 0( �6 ) 3 • The leading coefficient is independent of fJ if I( p ) == arc sin {i. 9.5. From (9.35) and the same result for { - Xn } , together with the uniform boundedness of the xn ' deduce that with the probability 1 the set of limit points of the sequence { sn (2 n log log n ) - 1 12 } is the closed interval from - 1 to + 1. 9.6.
Suppose Xn takes the values ± 1 with probability ! each and show that P[ Sn = 0 i.o.] == 1 (This gives still another proof of the persistence of symmet ric random walk on the line (Example 8.3).) Show more generally that, if the Xn are bounded by M, then P [ I Sn l s M i.o.] == 1.
f
9.7. Weakened versions of (9.36) are quite easy to prove. By a fourth moment argument (see (6.2)), show that P[Sn > n 314 (log n ) < 1 + f )/4 i.o.] == 0. Use (9.29) to give a simple proof that P[ Sn > (3n log n ) 1 12 i.o.] == 0. 9.8. What happens in Theorem 9.6 if fi is replaced by fJ (0 < fJ < a)? 9.9. Show that (9.35) is true if sn is replaced by I sn I or maxk n sk or maxk n I sk I · s
s
CHAPTER
2
Measure
SECI'ION 10. GENERAL MEASURES
Lebesgue measure on the unit interval was central to the ideas in Chapter 1. Lebesgue measure on the entire real line is important in probability as well as in analysis generally, and a uniform treatment of this and other examples requires a notion of measure for which infinite values are possible. The present chapter extends the ideas of Sections 2 and 3 to this more general
case.
Classes
of Sets
The σ-field of Borel sets in (0, 1] played an essential role in Chapter 1, and it is necessary to construct the analogous classes for the entire real line and for k-dimensional Euclidean space.

Example 10.1. Let x = (x_1, ..., x_k) be the generic point of Euclidean k-space R^k. The bounded rectangles
(10.1)    [x = (x_1, ..., x_k): a_i < x_i ≤ b_i, i = 1, ..., k]
155
156
MEASURE
many rational rectangles, this is a countable union. Thus f11 k contains the open sets. Since a closed set has open complement, f11 k also contains the closed sets. Just as fJI contains all the sets in (0, 1] that actually come up in ordinary analysis and probability theory, !1/k contains all the sets in R k that actually come up. The a-field gJ k is generated by subclasses other than the class of rectangles. If An is the x-set where a; < x; < b; + n - 1 , i = 1, . . . , k, then An is open and (10.1) is nnAn. Thus !1/ k is generated by the open sets. Similarly, it is generated by the closed sets. Now an open set is a countable union of rational rectangles. Therefore, the (countable) class of rational
rectangles generates f11 k.
•
The a-field !A 1 on the line R 1 is by definition generated by the finite intervals. The a-field fJI in (0, 1] is generated by the subintervals of (0, 1 ]. The question naturally arises whether the elements of !!4 are the elements of !1/ 1 that happen to lie inside (0, 1]. The answer is yes, as shown by the following technical but frequently useful result. Theorem 10.1. (i) If !F is
Oo.
Suppose that 0 0 c 0. a a-field in 0, then �0 = [ A
n
0 0:
A e �]
is a a-field in
If .JJI generates the a-field � in 0, then ..ra10 = [ A n 0 0 : A e ..raf] generates the a-field � = [A n 0 0 : A e � ] in 0 0• PROOF. Of course, 0 0 = 0 n 0 0 e �0• If B = A n 0 0 and A e �' then 0 0 - B = (0 - A ) n 0 0 and 0 - A e �- If Bn = An n 0 0 and An e �' then U n Bn = (U n An) n O o and U nAn E �- Thus � is a a-field in O o . Let a0( �0 ) be the a-field ..ra10 generates in 0 0• Since �0 is a a-field in 0 0 , and since .JJ/0 c � ' certainly a0( d0 ) c �0• Now �0 c a0 ( .JJ/0 ) will follow if it is shown that A e � implies A n 0 0 e a0 ( ..ra10 ), or, to put it another way, if it is shown that � is contained in f§ = [ A : A n O o E ao ( ..rafo )]. Since A E d implies that A n Oo E ..rafo c a0 ( .JJ/0 ), it follows that d c �- It is therefore enough to show that f§ is a a-field in D. But if A E f§, then (0 - A) n O o = O o - ( A n O o ) E ao (do) and hence D - A E �- If An E f§ for all n, then (UnAn) n Oo = Un(An n 0 0 ) E a0(�0 ) and hence U nAn E f§. • (ii)
If 0 0 e /F, then �0 = [ A : A c 0 0 , A e � ]. If 0 = R1, 0 0 = (0, 1 ] , and 1 � = fA , and if .JJI is the class of finite intervals, then ..ra10 is the class of subintervals of (0, 1] and fJI = a0( d0) is given by
( 1 0 .2)
ffl = ( A : A
c (O, 1] , A
e !11 1 ] .
SECTION
10.
GENERAL MEASURES
157
A
subset of (0, 1] is thus a Borel set (lies in ffl) if and only if it is a linear Borel set (lies in 91 1 ), and the distinction in terminology can be dropped. Conventions Involving oo
Measures assume values in the set [0, oo] consisting of the ordinary nonnegative reals and the special value oo , and some arithmetic conventions are called for. For x, y e [0, oo ], x < y means that y == oo or else x and y are finite (that is, are ordinary real numbers) and x < y holds in the usual sense. Similarly, x < y means that y == oo and x is finite or else x and y are both finite and x < y holds in the usual sense. For a finite or infinite sequence x, x 1 , x 2 , . in [0, oo ], • •
(10 .3) means that either (i) x == oo and xk == oo for some k, or (ii) x == oo and xk < oo for all k and :E k xk is an ordinary divergent infinite series or (iii) x < oo and xk < oo for all k and (10.3) holds in the usual sense for l:k xk an ordinary finite sum or convergent infinite series. By these conventions and Dirichlet's theorem [A26], the order of summation in (10.3) has no effect on the sum. For an infinite sequence x, x 1 , x 2 , in [0, oo ], • • •
(10.4) means in the first place that xk < xk 1 s x and in the second place that either (i) x < oo and there is convergence in the usual sense, or (ii) xk == oo for some k, or (iii) x == oo and the xk are finite reals converging to infinity in the usual sense. +
Measures
A set function p. on a field .fF in D is a
measure
if it satisfies these
conditions: (i) p.( A ) E [0, oo] for A E ofF; (ii) ,., (0) = 0; (iii) if A I , A 2, . . . is a disjoint sequence of jli:sets and if uk_ l Ak E .?F, then (see (10.3))
The measure p. is finite or infinite as p.(O) < oo or p.(O) = oo ; it is a probability measure if p.(D) = 1, as in Chapter 1. I f D = A 1 u A 2 u · · · for some finite or countable sequence of jli:sets satisfying p. ( A k) < oo, then p. is a-finite. The significance of this concept
158
MEASURE
will be seen later. A finite measure is by definition a-finite; a a-finite measure may be finite or infinite. If SJI is a subclass of !F, p. is a-finite on SJI if D = U k A k for some finite or infinite sequence of �sets satisfying p.( A k ) < oo . It is not required that these sets A k be disjoint. Note that if D is not a finite or countable union of �sets, then no measure can be a-finite on .JJI . If p. is a measure on a a-field � in D, the triple (0, �' p.) is a measure space. (This term is not used if � is merely a field.) It is an infinite, a a-finite, a finite, or a probability measure space according as p. has the corresponding properties. If p.( Ac) = 0 for an jli:set A , then A is a support of p., and p. is concentrated on A. For a finite measure, A is a support if and only if p. ( A ) = p.(D). The pair ( D, � ) itself is a measurable space if � is a a-field in D. To say that p. is a measure on (0, �) indicates clearly both the space and the class of sets involved. As in the case of probability measures, (iii) above is the condition of countable additivity, and it implies finite additivity: If A 1 , . . . , A n are disjoint jli:sets, then
As in the case of probability measures, if this holds for extends inductively to all n.
n = 2,
then it
A measure p. on (D, �) is discrete if there are finitely or countably many points w; in D and masses m ; in [0, oo] such that p. ( A ) = E"'. A m ; for A e �- It is an infinite, a finite, or a probability measure as E ; m ; diverges, or converges, or converges to 1; the last case was treated in Example 2.9. If � contains each singleton { w ; } , then p. is a-finite if and • only if m ; < oo for all i. Example 10. 2
I
e
Example 10.3. Let ℱ be the σ-field of all subsets of an arbitrary Ω, and let μ(A) be the number of points in A, where μ(A) = ∞ if A is not finite. This μ is counting measure; it is finite if and only if Ω is finite and is σ-finite if and only if Ω is countable. Even if ℱ does not contain every subset of Ω, counting measure is well defined on ℱ. ∎
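The discrete and counting measures of Examples 10.2 and 10.3 are easy to mimic numerically. The following sketch is my own illustration, not part of the text: it represents a discrete measure by its point masses and checks additivity and monotonicity on a few sets (the class name and the particular masses are arbitrary choices).

```python
class DiscreteMeasure:
    """mu(A) = sum of the masses m_i over the points omega_i in A (Example 10.2)."""
    def __init__(self, masses):
        self.masses = dict(masses)          # point -> nonnegative mass
    def __call__(self, A):
        return sum(self.masses.get(w, 0.0) for w in A)

# counting measure on a countable Omega, restricted here to finite sets (Example 10.3)
counting = DiscreteMeasure({n: 1.0 for n in range(10_000)})

mu = DiscreteMeasure({0: 0.5, 1: 0.25, 2: 0.25})   # a probability measure
A, B = {0}, {1, 2}
assert mu(A | B) == mu(A) + mu(B)                  # additivity for disjoint sets
assert mu(A) <= mu(A | B)                          # monotonicity
print(counting(set(range(5))), mu({0, 1, 2}))      # 5.0  1.0
```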
Specifying a measure includes specifying its domain. If p. is a measure on a field � and �0 is a field contained in �, then the restriction p. 0 of p. to .F0 is also a measure. Although often denoted by th e Example 10.4.
SECTION
10.
159
GENERAL MEASURES
same symbol, p.0 is really a different measure from p. unless �0 = �- Its properties may be different: If p. is counting measure on the a-field � of all subsets of a countably infinite 0, p. is a-finite, but its restriction to the • a-field $0 = { 0, 0 } is not a-finite. Certain properties of probability measures carry over immediately to the general case. First, p. is monotone: p.(A) � p.( B) if A c B. This is derived, just as its special case (2.4), from p.(A) + p.(B - A) = p.(B). But it is possible to go on and write p.(B - A) = p.(B) - p.(A) only if p.( B) < oo . If p, ( B) = oo and p.(A) < oo, then p.(B - A) = oo ; but for every a e (0, oo] there are cases where p.(A) = p.(B) = oo and p.( B - A) = a. The inclusion-exclusion formula (2.7) also carries over without change to jli:sets of finite measure:
(10 .5)
P.( lJ Ak ) = E p.( A;) - E p. ( A ; n Aj ) k-1
The proof of
i
��
+
···
finite subadditivity also goes through just as before:
here the A k need not have finite measure. 1beorem 10.2. Let p. be a measure on a field (i) Continuity from below: If A n and A p,( A n ) i p.( A).
�lie in � and A n j A, thent
Continuity from above: If A n and A lie in !F and An � A, and if I'( A 1 ) < oo, then p.( A n) � p ( A). (iii) Countable subadditivity: If A 1 , A 2 , and U k-t A k lie in �, then (ii)
•
•
•
If p. is a-finite on !F, then � cannot contain an uncountable, disjoint collection of sets of positive p.-measure. (iv)
tSee (10.4).
160
MEASURE
The proofs of (i) and (iii) are exactly as for the corresponding parts of Theorem 2.1 . The same is essentially true of (ii): If p.( A1) < oo , subtrac tion is possible and A1 - A n i A1 - A implies that p. ( A 1 ) - p. ( A n ) = p. ( A1 - A n ) i p.( At - A ) = p. ( At) - p. ( A ). There remains (iv). Let [ B6: 8 E 8 ] be a disjoint collection of jli:sets satisfying p. ( B6 ) > 0. Consider an jli:set A for which p.( A ) < oo . If 81, , 8n are distinct indices satisfying p. ( A n B6.) > £ > 0, then n £ < E7_ 1p.( A n B6. ) � p.( A ), and so n < p.( A)j£. Thus the index set [8: p. ( A n B6) > £] is finite, and hence (take the union over positive rational £) [8: p. ( A n B6) > 0] is countable. Since p. is a-finite, 0 = UkAk for some finite or countable sequence of jli:sets Ak satisfying p.(Ak) < oo . But then ek = [8: p.(Ak n B6 ) > 0] is countable for each k. If p.(Ak n B6 ) = 0 for all k, then p. ( B6 ) < • Ekp. ( Ak n B6 ) = 0. Therefore, 8 = U ke k , and so 9 is countable. PROOF.
•
•
•
I
I
Uniqueness
According to Theorem 3.3, probability measures agreeing on a w-system agree on a(fJ'). There is an extension to the general case.
9'
Suppose that p.1 and p.2 are measures on a(fJ'), where 9' is a w-system, and suppose they are ,a-finite on 9'. If p.1 and p. 2 agree on 9', then they agree on a ( 9' ). PROOF. Suppose B E 9' and p.1( B) = p.2( B ) oo , and let !1'8 be the class of sets A in a(fJ') for which p.1( B n A ) = p. 2 ( B n A ). Then !1'8 is a A-system containing 9' and hence (Theorem 3.2) containing a(fJ'). By a-finiteness there exist .9'-sets Bk satisfying 0 = UkBk and p.1( Bk ) = Theorem 10.3.
<:
p.2( Bk ) < oo . By the inclusion-exclusion formula (10 . 5),
P. a
(u
i-1
)
( BJ ' A ) =
E P. a ( B; n A ) -
1 sisn
E
1 s i<j < n
P. a ( B; n Bj n A ) + . . .
for a = 1 , 2 and all n. Since 9' is a w-system containing the B;, it contains the B; n Bi, and so on. Whatever a ( fJ' )-set A may be, the terms on the right above are therefore the same for a = 1 as for a = 2. The left side is then the • same for a = 1 as for a = 2; letting n --+ oo gives p.1( A ) = p. 2( A ) .
Suppose p.1 and p. 2 are finite measures on a(fJ'), where 9' is a w-system and 0 is a finite or countable union of sets in 9'. If p.1 and p. 2 agree on 9', then they agree on a(fJ'). Theorem 10.4.
SECTION
10.
161
GENERAL MEASURES
By hypothesis, 0 = U k Bk for .9'-sets Bk , and of course P. a( Bk) < P.a ( D ) < oo , a = 1, 2. Thus p.1 and p.2 are a-finite on gJ, and Theorem 10.3 • applies. PROOF.
If 9' consists of the empty set alone, then it is a 'IT-system and a(fJ') = {0, 0 }. Any two finite measures agree on 9', but of course they need not agree on a(fJ'). Theorem 10.4 does not apply in this case because 0 is not a countable union of sets in 9'. For the same reason, no measure on a(fJ') is a-finite on 9', and hence Theorem 10.3 does not • apply. Example 10.5.
1
Suppose (0, �) = ( R\ !1/ ) and 9' consists of the half infinite intervals ( oo , x]. By Theorem 10.4, two finite measures on � that agree on 9' also agree on �. The 91'-sets of finite measure required in the • definition of a-finiteness cannot in this example be made disjoint. Example 10.6.
-
If a measure on (0, � ) is a-finite on a subfield �0 of !F, then 0 = U kBk for disjoint �0-sets Bk of finite measure: if they are not • disjoint, replace Bk by Bk n Bf · · · n Bk- t · Example 10. 7.
The proof of Theorem 10.3 simplifies slightly if 0 = U k Bk for disjoint .9'-sets with p.1( Bk) = p.2( Bk) < oo , because additivity itself can be used in place of the inclusion-exclusion formula.
PROBLEMS 10.1. 10.2.
10.3. 10.4.
Show that if conditions (i) and (iii) in the definition of measure hold, and if p.( A ) < oo for some A e !F, then condition (ii) holds. Let ( Qn , � , P.n ) be a measure space, n == 1, 2, . . . , and suppose that the sets Qn are disjoint. Let Q == UnQn , let !F consist of the sets of the form A == UnAn with An � ' and for such an A let p.( A ) == E n P.n ( A n >· Show that ( 0 , !F, p.) is a measure space. Under what conditions is it a-finite? Finite? On the a-field of all subsets of Q == {1, 2, . . . }, put p.( A ) == E k e 2- k if A is finite and p.( A ) == oo otherwise. Is p. finitely additive? Countably additi.ve.? (a) In connection with Theorem 10.2(ii), show that if An � A and p.( A k ) < oo for some k, then I'( An ) � p.( A ). (b) Find an example in which An � A , p.( A n ) oo , and A == 0.
E
A
=
162 10.5.
MEASURE
The natural generalization of (4.10) is ( 10.6)
1'( lim infn A,. )
.s
lim inf n I' ( A,. )
.s
lim sup I'( A,. )
n
.s
(
}
I' lim sup A,. .
n
Show that the left-hand inequality always holds. Show that the right-hand inequality holds if p.(U k nA k ) < oo for some n but can fail otherwise. 3.5 t A measure space ( 0, !F, p.) is complete if A c B, B e !F, and p. ( B ) == 0 together imply that A e !F- the definition is just as in the probability case. Use the ideas of Problem 3.5 to construct a complete measure space ( 0, r, p. + ) such that !F c r and p. and p. + agree on !F. �
10.6.
The condition in Theorem 10.2(iv) essentially characterizes a-finiteness. (a) Suppose that (Q, !F, p.) has no "infinite atoms," in the sense that for every A in !F, if p. ( A ) == oo , then there is in !F a B such that B c A and 0 < p. ( B ) < oo . Show that if !F does not contain an uncountable, disjoint collection of sets of positive measure, then p. is a-finite. (Use Zorn's lemma.) (b) Show by example that this is false without the condition that there are no " infinite atoms." 10.8. Deduce Theorem 3.3 from Theorem 10.3 and from Theorem 10.4. 10.9. Example 10.5 shows that Theorem 10.3 fails without the a-finiteness condi tion. Construct other examples of this kind. 10.10. Suppose that p.1 and p.2 are measures on a a-field !F and are a-finite on a field � generating !F. (a) Show that, if p. 1 ( A ) s p.2 ( A ) holds for all A in 'a , then it holds for all A in !F. (b) Show that this fails if � is only a w-system. 10.7.
SECI'ION 11. OUTER MEASURE Outer Measure
An outer measure is a set function p.* that is defined for all subsets of a space D and has these four properties: (i) p.*( A ) e (0, oo] for every A c 0; (ii) p.*(O) = 0; (iii) p.* is monotone: A c B implies p.*( A ) � p.*( B); ( iv) p.* is countably subadditive: p.*(UnA n) � Enp.*( An)·
SECTION
11.
OUTER MEASURE
163
The set function P* defined by (3.1) is an example, one that generalizes : Example 11.1. Let p be a set function on a class .JJI in 0. Assume that 0 e � and p(O) = 0, and that p(A) e [0, oo] for A e SJI; p and � are otherwise arbitrary. Put (1 1 .1) n
where the infimum extends over all finite and countable coverings of A by �sets A n . If no such covering exists, take p.*(A) = oo in accordance with the convention that the infimum over an empty set is oo. That p.* satisfies (i), (ii), and (iii) is clear. If p.*(A n ) = oo for some n, then obviously p.*( U n A n ) � En p.*(A n ) Otherwise, cover each A n by �sets Bn k satisfying Ek p ( Bn k) < p.*(A n ) + £/2 n ; then p.*( U n A n ) � E n , k P ( Bn k) < • E n p.*( A n ) + £. Thus p.* is an outer measure. ·
Define A to be p.*-measurab/e if p.* ( A n E ) + p.* ( A c n E ) = p.* ( E )
(1 1 .2)
for every E. This is the general version of the definition (3.4) used in Section 3. It is by subadditivity equivalent to (1 1 .3 )
Denote by .l(p.* ) the class of p.*-measurable sets. The extension property for probability measures in Theorem 3.1 was proved by means of a sequence of four lemmas. These lemmas carry over directly to the case of the general outer measure: if P* is replaced by p.* and .,1( by .l(p.*) at each occurrence, the proofs hold word for word, symbol for symbol. In particular, an examination of the arguments shows that oo as a possible value for p.* does not necessitate any changes. The fourth lemma in Section 3 becomes this: If p.* is an outer measure, then .l(p.*) is a a-field, and p.* restricted to .l(p.* ) is a measure.
1beorem
1 1.1.
This will be used to prove an extension theorem, but it has other applications as well; see Sections 19 and 29.
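For a finite Ω the definitions (11.1) and (11.2) can be checked by brute force, which makes the roles of the covering class and of the measurability criterion concrete. The sketch below is my own illustration, not part of the text; the set function rho, the space Ω = {1, 2, 3}, and the test sets are arbitrary choices made for the example.

```python
from itertools import chain, combinations

def outer_measure(rho, A):
    """mu*(A) as in (11.1): infimum of sum rho(A_n) over coverings of A
    by sets in the domain of rho (everything finite, so brute force works)."""
    best = float('inf')
    sets = list(rho)
    for r in range(len(sets) + 1):
        for cover in combinations(sets, r):
            if set(A) <= set().union(*cover):
                best = min(best, sum(rho[C] for C in cover))
    return best

def caratheodory_measurable(omega, rho, A):
    """The criterion (11.2): mu*(A ∩ E) + mu*(A^c ∩ E) = mu*(E) for every E."""
    A = set(A)
    all_E = chain.from_iterable(combinations(sorted(omega), r) for r in range(len(omega) + 1))
    for E in map(set, all_E):
        if abs(outer_measure(rho, A & E) + outer_measure(rho, E - A) - outer_measure(rho, E)) > 1e-9:
            return False
    return True

omega = {1, 2, 3}
rho = {frozenset(): 0.0, frozenset({1}): 0.3, frozenset({2, 3}): 0.7, frozenset(omega): 1.0}
print(caratheodory_measurable(omega, rho, {1}))   # True:  {1} splits every cover cleanly
print(caratheodory_measurable(omega, rho, {2}))   # False: mu*({2}) + mu*({3}) > mu*({2, 3})
```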
164
MEASURE
Extension Theorem 1 1.2.
a-field.
A
measure on a field has an extension to the generated
If the original measure on the field is σ-finite, then it follows by Theorem 10.3 that the extension is unique. Theorem 11.2 can be deduced from Theorem 11.1 by the arguments used in the proof of Theorem 3.1.† It is unnecessary to retrace the steps, however, because the ideas will appear in stronger form in the proof of the next result, which generalizes Theorem 11.2.

Define a class 𝒜 of subsets of Ω to be a semiring if (i) ∅ ∈ 𝒜; (ii) A, B ∈ 𝒜 implies A ∩ B ∈ 𝒜; (iii) if A, B ∈ 𝒜 and A ⊂ B, then there exist disjoint 𝒜-sets C₁, ..., Cₙ such that B − A = ⋃ₖ₌₁ⁿ Cₖ. The class of finite intervals in Ω = R¹ and the class of subintervals of Ω = (0, 1] are the simplest examples of semirings. Note that a semiring need not contain Ω.

Example 11.2. If A and B lie in a semiring 𝒜, then since A ∩ B ∈ 𝒜 by (ii), A ∩ Bᶜ = A − A ∩ B has by (iii) the form ⋃ₖ₌₁ⁿ Cₖ for disjoint 𝒜-sets Cₖ. If A, B₁, ..., B_{j+1} are in 𝒜 and A ∩ B₁ᶜ ∩ ··· ∩ B_jᶜ has this form, then A ∩ B₁ᶜ ∩ ··· ∩ B_{j+1}ᶜ = ⋃ₖ₌₁ⁿ (Cₖ ∩ B_{j+1}ᶜ), and each Cₖ ∩ B_{j+1}ᶜ is a finite disjoint union of 𝒜-sets. It follows by induction that for any sets A, B₁, ..., B_j in 𝒜, there are in 𝒜 disjoint sets Cₖ such that A ∩ B₁ᶜ ∩ ··· ∩ B_jᶜ = ⋃ₖ₌₁ⁿ Cₖ. ∎
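For instance, in the semiring of subintervals of (0, 1], if A = (a, b] ⊂ B = (c, d] (so that c ≤ a ≤ b ≤ d), then B − A = (c, a] ∪ (b, d], a disjoint union of at most two intervals, either of which may be empty; and the intersection of two intervals is again an interval, so conditions (ii) and (iii) hold.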
Theorem 11.3. Suppose that μ is a set function on a semiring 𝒜. Suppose that μ has values in [0, ∞], that μ(∅) = 0, and that μ is finitely additive and countably subadditive. Then μ extends to a measure on σ(𝒜).*

This contains Theorem 11.2, because the conditions are all satisfied if 𝒜 is a field and μ is a measure on it. If Ω = ⋃ₖ Aₖ for a sequence of 𝒜-sets satisfying μ(Aₖ) < ∞, then it follows by Theorem 10.3 that the extension is unique.
t See also Problem 11.1.
*And so p. must have been countably additive on � in the first place; see Problem 1 1 .4.
PROOF. If A, B, and the Cₖ are related as in condition (iii) in the definition of semiring, then by finite additivity μ(B) = μ(A) + Σₖ₌₁ⁿ μ(Cₖ) ≥ μ(A). Thus μ is monotone. Define an outer measure μ* by (11.1) for ρ = μ:

(11.4)  μ*(A) = inf Σₙ μ(Aₙ),

the infimum extending over coverings of A by 𝒜-sets. The first step is to show that 𝒜 ⊂ ℳ(μ*). Suppose that A ∈ 𝒜. If μ*(E) = ∞, then (11.3) holds trivially. If μ*(E) < ∞, for given ε choose 𝒜-sets Aₙ such that E ⊂ ⋃ₙ Aₙ and Σₙ μ(Aₙ) < μ*(E) + ε. Since 𝒜 is a semiring, Bₙ = A ∩ Aₙ lies in 𝒜 and Aᶜ ∩ Aₙ = Aₙ − Bₙ has the form ⋃ₖ₌₁^{mₙ} Cₙₖ for disjoint 𝒜-sets Cₙₖ. Note that Aₙ = Bₙ ∪ ⋃ₖ₌₁^{mₙ} Cₙₖ, where the union is disjoint, and that A ∩ E ⊂ ⋃ₙ Bₙ and Aᶜ ∩ E ⊂ ⋃ₙ ⋃ₖ₌₁^{mₙ} Cₙₖ. By the definition of μ* and the assumed finite additivity of μ,

μ*(A ∩ E) + μ*(Aᶜ ∩ E) ≤ Σₙ μ(Bₙ) + Σₙ Σₖ₌₁^{mₙ} μ(Cₙₖ) = Σₙ μ(Aₙ) < μ*(E) + ε.

Since ε is arbitrary, (11.3) follows. Thus 𝒜 ⊂ ℳ(μ*).

The next step is to show that μ* and μ agree on 𝒜. If A ⊂ ⋃ₙ Aₙ for 𝒜-sets A and Aₙ, then by the assumed countable subadditivity of μ and the monotonicity established above, μ(A) ≤ Σₙ μ(A ∩ Aₙ) ≤ Σₙ μ(Aₙ). Therefore, A ∈ 𝒜 implies that μ(A) ≤ μ*(A) and hence, since the reverse inequality is an immediate consequence of (11.4), μ(A) = μ*(A). Thus μ* agrees with μ on 𝒜.

Since 𝒜 ⊂ ℳ(μ*) and ℳ(μ*) is a σ-field (Theorem 11.1), σ(𝒜) ⊂ ℳ(μ*). Since μ* is countably additive on ℳ(μ*) (Theorem 11.1 again), μ* restricted to σ(𝒜) is an extension of μ on 𝒜, as required. ∎

Example 11.3. For 𝒜 take the semiring of subintervals of Ω = (0, 1] together with the empty set. For μ take length λ: λ(a, b] = b − a. The finite additivity of λ follows by Theorem 2.2 and the countable subadditivity by Lemma 2 to that theorem.† By Theorem 11.3, λ extends to a measure on the class σ(𝒜) = ℬ of Borel sets in (0, 1]. ∎
t On a field, countable additivity implies countable subadditivity, but � is merely a semiring. Hence the necessity of this observation; but see Problem 1 1 .4.
This gives a second construction of Lebesgue measure in the unit interval. In the first construction λ was extended first from the class of intervals to the field ℬ₀ of finite disjoint unions of intervals (see (2.11)) and then by Theorem 11.2 (in its special form Theorem 3.1) from ℬ₀ to ℬ = σ(ℬ₀). Using Theorem 11.3 instead of Theorem 11.2 effects a slight economy, since the extension then goes from 𝒜 directly to ℬ without the intermediate stop at ℬ₀, and the arguments involving (2.12) and (2.13) become unnecessary.
Example 11.4. In Theorem 11.3 take for 𝒜 the semiring of finite intervals on the real line R¹, and consider λ₁(a, b] = b − a. The arguments for Theorem 2.2 and its two lemmas in no way require that the (finite) intervals in question be contained in (0, 1], and so λ₁ is finitely additive and countably subadditive on this class 𝒜. Hence λ₁ extends to the σ-field 𝓡¹ of linear Borel sets, which is by definition generated by 𝒜. This defines Lebesgue measure λ₁ over the whole real line. ∎
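Note that λ₁ is σ-finite on this semiring: R¹ = ⋃ₙ (−n, n] and λ₁(−n, n] = 2n < ∞, so by Theorem 10.3 the extension just constructed is the only measure on 𝓡¹ giving each finite interval its length.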
A subset of (0, 1] lies in ℬ if and only if it lies in 𝓡¹ (see (10.2)). Now λ₁(A) = λ(A) for subintervals A of (0, 1], and it follows by uniqueness (Theorem 3.3) that λ₁(A) = λ(A) for all A in ℬ. Thus there is no inconsistency in dropping λ₁ and using λ to denote Lebesgue measure on 𝓡¹ as well as on ℬ.

Example 11.5. The class of bounded rectangles in R^k is a semiring, a fact needed in the next section. Suppose A = [x: xᵢ ∈ Iᵢ, i ≤ k] and B = [x: xᵢ ∈ Jᵢ, i ≤ k] are nonempty rectangles, the Iᵢ and Jᵢ being finite intervals. If A ⊂ B, then Iᵢ ⊂ Jᵢ, so that Jᵢ − Iᵢ is a disjoint union Iᵢ′ ∪ Iᵢ″ of intervals (possibly empty). Consider the 3^k disjoint rectangles [x: xᵢ ∈ ψᵢ, i ≤ k], where for each i, ψᵢ is Iᵢ or Iᵢ′ or Iᵢ″. One of these rectangles is A itself, and B − A is the union of the others. The rectangles thus form a semiring. ∎
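For instance, in the plane take A = (0, 1] × (0, 1] and B = (−1, 2] × (−1, 2]. Each side satisfies Jᵢ − Iᵢ = (−1, 0] ∪ (1, 2], so there are 3² = 9 disjoint rectangles ψ₁ × ψ₂ with ψᵢ one of (0, 1], (−1, 0], (1, 2]; one of them is A, and B − A is the disjoint union of the other eight.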
An Approximation Theorem
If 𝒜 is a semiring, then by Theorem 10.3 a measure on σ(𝒜) is determined by its values on 𝒜 if it is σ-finite there. The following theorem shows more explicitly how the measure of a σ(𝒜)-set can be approximated by the measures of 𝒜-sets.
Theorem 11.4. Suppose that 𝒜 is a semiring, μ is a measure on ℱ = σ(𝒜), and μ is σ-finite on 𝒜.
(i) If B ∈ ℱ and ε > 0, there exists a finite or infinite disjoint sequence A₁, A₂, ... of 𝒜-sets such that B ⊂ ⋃ₖ Aₖ and μ((⋃ₖ Aₖ) − B) < ε.
(ii) If B ∈ ℱ and ε > 0, and if μ(B) < ∞, then there exists a finite disjoint sequence A₁, ..., Aₙ of 𝒜-sets such that μ(B △ (⋃ₖ₌₁ⁿ Aₖ)) < ε.
PROOF. Return to the proof of Theorem 11.3. If μ* is the outer measure defined by (11.4), then ℱ ⊂ ℳ(μ*) and μ* agrees with μ on 𝒜, as was shown. Since μ* restricted to ℱ is a measure, it follows by Theorem 10.3 that μ* agrees with μ on ℱ as well.

Suppose now that B lies in ℱ and μ(B) = μ*(B) < ∞. There exist 𝒜-sets Aₖ such that B ⊂ ⋃ₖ Aₖ and μ(⋃ₖ Aₖ) ≤ Σₖ μ(Aₖ) < μ(B) + ε; but then μ((⋃ₖ Aₖ) − B) < ε. To make the sequence {Aₖ} disjoint, replace Aₖ by Aₖ ∩ A₁ᶜ ∩ ··· ∩ Aₖ₋₁ᶜ; each of these sets is (Example 11.2) a finite disjoint union of sets in 𝒜.

Next suppose that B lies in ℱ and μ(B) = μ*(B) = ∞. By σ-finiteness there exist 𝒜-sets Cₘ such that Ω = ⋃ₘ Cₘ and μ(Cₘ) < ∞. By what has just been shown, there exist 𝒜-sets Aₘₖ such that B ∩ Cₘ ⊂ ⋃ₖ Aₘₖ and μ((⋃ₖ Aₘₖ) − (B ∩ Cₘ)) < ε/2^m. The sets Aₘₖ taken all together provide a sequence A₁, A₂, ... of 𝒜-sets satisfying B ⊂ ⋃ₖ Aₖ and μ((⋃ₖ Aₖ) − B) < ε. As before, the Aₖ can be made disjoint.

To prove part (ii), consider the Aₖ of part (i). If B has finite measure, so has A = ⋃ₖ Aₖ, and hence by continuity from above (Theorem 10.2(ii)), μ(A − ⋃_{k≤n} Aₖ) < ε for some n. But then μ(B △ (⋃ₖ₌₁ⁿ Aₖ)) < 2ε. ∎
If, for example, B is a linear Borel set of finite Lebesgue measure, then λ(B △ (⋃ₖ₌₁ⁿ Aₖ)) < ε for some disjoint collection of finite intervals A₁, ..., Aₙ.
Corollary. If μ is a finite measure on a σ-field ℱ generated by a field ℱ₀, then for each ℱ-set A and each positive ε there is an ℱ₀-set B such that μ(A △ B) < ε.
PROOF. This is of course an immediate consequence of part (ii) of the theorem, but there is a simple direct argument. Let 𝒢 be the class of ℱ-sets with the required property. Since Aᶜ △ Bᶜ = A △ B, 𝒢 is closed under complementation. If A = ⋃ₙ Aₙ, where Aₙ ∈ 𝒢, given ε choose n₀ so that μ(A − ⋃_{n≤n₀} Aₙ) < ε, and then choose ℱ₀-sets Bₙ, n ≤ n₀, so that μ(Aₙ △ Bₙ) < ε/n₀. Since (⋃_{n≤n₀} Aₙ) △ (⋃_{n≤n₀} Bₙ) ⊂ ⋃_{n≤n₀} (Aₙ △ Bₙ), the ℱ₀-set B = ⋃_{n≤n₀} Bₙ satisfies μ(A △ B) < 2ε. Of course ℱ₀ ⊂ 𝒢; since 𝒢 is a σ-field, ℱ ⊂ 𝒢, as required. ∎
Caratheodory's Condition*
Suppose that μ* is an outer measure on the class of all subsets of R^k. It satisfies Carathéodory's condition if

(11.5)  μ*(A ∪ B) = μ*(A) + μ*(B)

whenever A and B are at positive distance in the sense that dist(A, B) = inf[|x − y|: x ∈ A, y ∈ B] is positive.

Theorem 11.5. If μ* is an outer measure on R^k satisfying Carathéodory's condition, then 𝓡^k ⊂ ℳ(μ*).
PROOF. Since ℳ(μ*) is a σ-field (Theorem 11.1) and the closed sets generate 𝓡^k (Example 10.1), it is enough to show that each closed set is μ*-measurable. The problem is thus to prove (11.3) for A closed and E arbitrary. Let B = A ∩ E and C = Aᶜ ∩ E. Let Cₙ = [x ∈ C: dist(x, A) ≥ n⁻¹], so that Cₙ ↑ C and dist(Cₙ, A) ≥ n⁻¹. By Carathéodory's condition, μ*(E) = μ*(B ∪ C) ≥ μ*(B ∪ Cₙ) = μ*(B) + μ*(Cₙ). Hence it suffices to prove that μ*(Cₙ) → μ*(C), or, since Cₙ ↑ C, that μ*(C) ≤ limₙ μ*(Cₙ).

Let Dₙ = Cₙ₊₁ − Cₙ; if x ∈ Dₙ₊₁ and |x − y| < n⁻¹(n + 1)⁻¹, then dist(y, A) ≤ |y − x| + dist(x, A) < n⁻¹, so that y ∉ Cₙ. Thus |x − y| ≥ n⁻¹(n + 1)⁻¹ if x ∈ Dₙ₊₁ and y ∈ Cₙ, and so Dₙ₊₁ and Cₙ are at positive distance if they are nonempty. By the Carathéodory condition extended inductively,

(11.6)  μ*(C₂ₙ₊₁) ≥ Σₖ₌₁ⁿ μ*(D₂ₖ)

and

(11.7)  μ*(C₂ₙ) ≥ Σₖ₌₁ⁿ μ*(D₂ₖ₋₁).

By subadditivity,

(11.8)  μ*(C) ≤ μ*(C₂ₙ) + Σ_{k=n}^{∞} μ*(D₂ₖ) + Σ_{k=n+1}^{∞} μ*(D₂ₖ₋₁).

If Σₖ μ*(D₂ₖ) or Σₖ μ*(D₂ₖ₋₁) diverges, μ*(C) ≤ limₙ μ*(Cₙ) follows from (11.6) or (11.7); otherwise, it follows from (11.8). ∎

*This topic may be omitted; it is needed only for the construction of Hausdorff measure in
Section 19.
Example 11.6. For A ⊂ R¹ define λ*(A) = inf Σₙ (bₙ − aₙ), where the infimum extends over countable coverings of A by intervals (aₙ, bₙ]. Then λ* is an outer measure (Example 11.1). Suppose that A and B are at positive distance, and consider any covering of A ∪ B by intervals (aₙ, bₙ]; (11.5) will follow if Σ(bₙ − aₙ) ≥ λ*(A) + λ*(B) necessarily holds. The sum here is unchanged if (aₙ, bₙ] is replaced by a disjoint union of subintervals each of length less than dist(A, B). But if bₙ − aₙ < dist(A, B), then (aₙ, bₙ] cannot meet both A and B. Let M and N be the sets of n for which (aₙ, bₙ] meets A and B, respectively. Then M and N are disjoint, A ⊂ ⋃_{n∈M} (aₙ, bₙ], and B ⊂ ⋃_{n∈N} (aₙ, bₙ]; hence Σ(bₙ − aₙ) ≥ Σ_{n∈M} (bₙ − aₙ) + Σ_{n∈N} (bₙ − aₙ) ≥ λ*(A) + λ*(B). Thus λ* satisfies Carathéodory's condition. By Theorem 11.5, 𝓡¹ ⊂ ℳ(λ*), and so λ* restricted to 𝓡¹ is a measure by Theorem 11.1. By Lemma 2 in Section 2 (p. 24), λ*(a, b] = b − a. Thus λ* restricted to 𝓡¹ is λ, which gives still another construction of Lebesgue measure. ∎
PROBLEMS

11.1. The proof of Theorem 3.1 obviously applies if the probability measure is replaced by a finite measure, since this is only a matter of rescaling. Take as a starting point then the fact that a finite measure on a field extends uniquely to the generated σ-field. By the following steps prove Theorem 11.2, that is, remove the assumption of finiteness.
(a) Let μ be a measure (not necessarily even σ-finite) on a field ℱ₀ and let ℱ = σ(ℱ₀). If A ∈ ℱ₀ and μ(A) < ∞, restrict μ to a finite measure μ_A on the field ℱ₀(A) = [B: B ⊂ A, B ∈ ℱ₀] in A, and extend μ_A to a finite measure (still denoted μ_A) on the σ-field ℱ(A) = [B: B ⊂ A, B ∈ ℱ] generated in A by ℱ₀(A).
(b) Suppose that E ∈ ℱ. If there exist disjoint ℱ₀-sets Aₙ such that μ(Aₙ) < ∞ and E ⊂ ⋃ₙ Aₙ, put μ(E) = Σₙ μ_{Aₙ}(E ∩ Aₙ) and prove consistency; otherwise, put μ(E) = ∞.
(c) Show that μ is a measure on ℱ and agrees with the original μ on ℱ₀.

11.2. (a) If μ* is an outer measure and ν*(E) = μ*(E ∩ A), then ν* is an outer measure.
(b) If μₙ* are outer measures and μ*(E) = Σₙ μₙ*(E) or μ*(E) = supₙ μₙ*(E), then μ* is an outer measure.

11.3. 2.5 10.10 ↑ Suppose 𝒜 is a semiring that contains Ω.
(a) Show that the field generated by 𝒜 consists of the finite disjoint unions of elements of 𝒜.
(b) Suppose that measures μ₁ and μ₂ on σ(𝒜) are σ-finite on 𝒜 and that μ₁(A) ≤ μ₂(A) for all A in 𝒜. Show that μ₁(A) ≤ μ₂(A) for all A in σ(𝒜).
11.4.
Suppose that .JJI is a semiring and I' is a finitely additive set function on .JJI having values in [0, oo] and satisfying 1'(0) 0. (a) Suppose that A , A1 , , Am are �sets. Show that Ek'- 11'(Ak ) S I'( A ) if Uk'- 1 A k c A and the Ak are disjoint. Show that I'( A ) s Ek'- 1 1'(A k ) if A c Uk'- 1 A k (here the Ak need not be disjoint). (b) Show that I' is countably additive on .JJI if and only if it is countably subadditive. 11.5. Measure theory is sometimes based on the notion of a ring. A class .JJI is a ring if it contains the empty set and is closed under the formation of differences and finite unions; it is a a-ring if it is also closed under the formation of countable unions. (a) Show that a ring ( a-ring) containing Q is a field ( a-field). Find a ring that is not a field, a a-ring that is not a a-field, a a-ring that is not a field. (b) Show that a ring is closed under the formation of symmetric dif ferences and finite intersections. Show that a a-ring is closed under the formation of countable intersections. (c) Show that the field f( .JJI ) generated (see Problem 2.5) by a ring .JJI consists of the sets A for which either A e .JJI or A c e .JJI. Show that /( .JJI ) a( .JJI ) if .JJI is a a-ring. (d) Show that a (countably additive) measure on a ring .JJI extends to /( .JJI ) . Of course this follows by Theorem 11.3, but prove it directly. (e) Let .JJI be a ring and let .JJI ' be the a-ring generated (define) by .JJI. Show that a measure on .JJI extends to a measure on .JJI' and that the extension is unique if the original measure is a-finite on .JJI . Hint: To prove uniqueness, first show by adapting the proof of Theorem 3.4 that a monotone class containing .JJI must contain .JJI ' . 11.6. Call .JJI a strong semiring if it contains the empty set and is closed under the formation of finite intersections and if A , B e .JJI and A c B imply that there exist �sets A 0 , , A n such that A A0 c A1 c · · c A n B and A k - A k - l e .JJI, k 1, . . . , n . k (a) Show that the class of bounded rectangles in R is a strong semiring. (b) Show that a semiring need not be a strong semiring. (c) Call I' on .JJI pairwise additive if A , B, A u B e .JJI and A n B 0 imply that I'( A u B) I'( A ) + I'( B). It can be shown that, if I' is pairwise additive on a strong semiring, then it is finitely additive there. Show that this is false if .JJI is only a semiring. 11.7. Suppose that I' is a finite measure on the a-field !F generated by the field -Fa · Suppose that A e !F and that A1 , . . . , A n is a decomposition of A into Jli:sets. Show that there exist an �-set B and a decomposition B1 , , Bn of B into �-sets such that I' ( A k � Bk ) < £, k 1, . . . , n . Show that B may be taken as A if A e -Fo11.8. Suppose that I' is a-finite on a field -Fa generating the a-field !F. Show that for A !F there exist -Fa-sets Cn ; and Dn i such that, if c U n ni cn l and D n n U; Dn ; ' then C c A c D and I' ( D - C ) 0. 11.9. Show that a-finiteness is essential in Theorem 11.4. Show that part (ii) can fail if I'( B) oo. 11.10. Here is another approach to Theorem 11.4(i). Suppose for simplicity that p. is a finite measure on the a-field !F generated by a field �- Let t§ be the • • •
class of ℱ-sets G such that for each ε there exist ℱ₀-sets Aₖ and Bₖ satisfying ⋂ₖ Aₖ ⊂ G ⊂ ⋃ₖ Bₖ and μ((⋃ₖ Bₖ) − (⋂ₖ Aₖ)) < ε. Show that 𝒢 is a σ-field containing ℱ₀.

11.11. This and Problems 11.12, 16.25, 17.16, 17.17, and 17.18 lead to proofs of the Daniell-Stone and Riesz representation theorems. Let Λ be a real linear functional on a vector lattice ℒ of (finite) real functions on a space Ω. This means that if f and g lie in ℒ, then so do f ∨ g and f ∧ g (with values max{f(ω), g(ω)} and min{f(ω), g(ω)}), as well as αf + βg, and Λ(αf + βg) = αΛ(f) + βΛ(g). Assume further of ℒ that f ∈ ℒ implies f ∧ 1 ∈ ℒ (where 1 denotes the function identically equal to 1). Assume further of Λ that it is positive in the sense that f ≥ 0 (pointwise) implies Λ(f) ≥ 0 and continuous from above at 0 in the sense that fₙ ↓ 0 (pointwise) implies Λ(fₙ) → 0.
(a) If f ≤ g (f, g ∈ ℒ), define in Ω × R¹ an "interval"
(11.9)  (f, g] = [(ω, t): f(ω) < t ≤ g(ω)].

Show that these sets form a semiring 𝒜₀.
(b) Define a set function μ₀ on 𝒜₀ by

(11.10)  μ₀(f, g] = Λ(g − f).

Show that μ₀ is finitely additive and countably subadditive on 𝒜₀.

11.12. ↑ (a) Assume f ∈ ℒ and let fₙ = (n(f − f ∧ 1)) ∧ 1. Show that f(ω) ≤ 1 implies fₙ(ω) = 0 for all n and f(ω) > 1 implies fₙ(ω) = 1 for all sufficiently large n. Conclude that for x > 0,

(11.11)  (0, x fₙ] ↑ [ω: f(ω) > 1] × (0, x].
(b) Let !F be the smallest a-field with respect to which every f in !R is measurable: !F== a[r 1 H: f e !R, H e 911 ]. Let 'a be the class of A in !F for which A X (0, 1] e a ( Jaf0 ) . Show that 'a is a semiring and that !F== a('a). (c) Let be' the extension of �'o ( see (11.10)) to a(Jaf0) and for A e 'a define l'o ( A ) == P ( A X (0, 1]). Show that l'o is finitely additive and count ably subadditive on the semiring �.,
SECTION 12. MEASURES IN EUCLIDEAN SPACE

Lebesgue Measure
In Example 11.4 Lebesgue measure λ was constructed on the class 𝓡¹ of linear Borel sets. By Theorem 10.3, λ is the only measure on 𝓡¹ satisfying λ(a, b] = b − a for all intervals. There is in k-space an analogous k-dimensional Lebesgue measure λ_k on the class 𝓡^k of k-dimensional Borel sets (Example 10.1). It is specified by the requirement that bounded rectangles
have measure

(12.1)  λ_k[x: aᵢ < xᵢ ≤ bᵢ, i = 1, ..., k] = ∏ᵢ₌₁ᵏ (bᵢ − aᵢ).
This is ordinary volume, that is, length (k = 1), area (k = 2), volume (k = 3), or hypervolume (k ≥ 4). Since an intersection of rectangles is again a rectangle, the uniqueness theorem shows that (12.1) completely determines λ_k. That there does exist such a measure on 𝓡^k can be proved in several ways. One is to use the ideas involved in the case k = 1. A second construction is given in Theorem 12.5. A third, independent, construction uses the general theory of product measures; this is carried out in Section 18.† For the moment, assume the existence on 𝓡^k of a measure λ_k satisfying (12.1). Of course, λ_k is σ-finite.

A basic property of λ_k is translation invariance.*
Theorem 12.1. If A ∈ 𝓡^k, then A + x = [a + x: a ∈ A] ∈ 𝓡^k and λ_k(A) = λ_k(A + x) for all x.

PROOF. If 𝒢 is the class of A such that A + x is in 𝓡^k for all x, then 𝒢 is a σ-field containing the bounded rectangles, and so 𝒢 ⊃ 𝓡^k. Thus A + x ∈ 𝓡^k for A ∈ 𝓡^k. For fixed x define a measure μ on 𝓡^k by μ(A) = λ_k(A + x). Then μ and λ_k agree on the π-system of bounded rectangles and so agree for all Borel sets. ∎
If A is a (k − 1)-dimensional subspace and x lies outside A, the hyperplanes A + tx for real t are disjoint and, by Theorem 12.1, all have the same measure. Since only countably many disjoint sets can have positive measure (Theorem 10.2(iv)), the measure common to the A + tx must be 0. Every (k − 1)-dimensional hyperplane has k-dimensional Lebesgue measure 0.

The Lebesgue measure of a rectangle is its ordinary volume. The following theorem makes it possible to calculate the measures of simple figures.
Theorem 12.2. If T: R^k → R^k is linear and nonsingular, then A ∈ 𝓡^k implies that TA ∈ 𝓡^k and

(12.2)  λ_k(TA) = |det T| · λ_k(A).
t See also Problems 12. 1 5, 1 7.18, and 20.4.
*An analogous fact was used in the construction of a nonmeasurable set at the end of Section 3.
As a parallelepiped is the image of a rectangle under a linear transformation, (12.2) can be used to compute its volume. If T is a rotation or a reflection (an orthogonal or a unitary transformation), det T = ±1, and so λ_k(TA) = λ_k(A). Hence every rigid transformation (a rotation followed by a translation) preserves Lebesgue measure.

PROOF OF THE THEOREM. Since T⋃ₙ Aₙ = ⋃ₙ TAₙ and TAᶜ = (TA)ᶜ because of the assumed nonsingularity of T, the class 𝒢 = [A: TA ∈ 𝓡^k] is a σ-field. Since TA is open for open A, again by the assumed nonsingularity of T, 𝒢 contains all the open sets and hence (Example 10.1) all the Borel sets. Therefore, A ∈ 𝓡^k implies that TA ∈ 𝓡^k.

For A ∈ 𝓡^k, set μ₁(A) = λ_k(TA) and μ₂(A) = |det T| · λ_k(A). Then μ₁ and μ₂ are measures, and by Theorem 10.3 they will agree on 𝓡^k (which is the assertion (12.2)) if they agree on the π-system consisting of the rectangles [x: aᵢ < xᵢ ≤ bᵢ, i = 1, ..., k] for which the aᵢ and the bᵢ are all rational (Example 10.1). It suffices therefore to prove (12.2) for rectangles with sides of rational length. Since such a rectangle is a finite disjoint union of cubes and λ_k is translation-invariant, it is enough to check (12.2) for cubes

(12.3)  A = [x: 0 < xᵢ ≤ c, i = 1, ..., k]
that have their lower corner at the origin.

Now the general T can by elementary row and column operations† be represented as a product of linear transformations of these three special forms:

(1°) T(x₁, ..., x_k) = (x_{π1}, ..., x_{πk}), where π is a permutation of the set {1, 2, ..., k};
(2°) T(x₁, ..., x_k) = (ax₁, x₂, ..., x_k);
(3°) T(x₁, ..., x_k) = (x₁ + x₂, x₂, ..., x_k).
Because of the rule for multiplying determinants, it suffices to check (12.2) for T of these three forms. And, as observed, for each such T it suffices to consider cubes (12.3).

(1°) Such a T is a permutation matrix, and so det T = ±1. Since (12.3) is invariant under T, (12.2) is in this case obvious.

(2°) Here det T = a, and TA = [x: x₁ ∈ H, 0 < xᵢ ≤ c, i = 2, ..., k], where H = (0, ac] if a > 0, H = {0} if a = 0, and H = [ac, 0) if a < 0. In each case, λ_k(TA) = |a| · c^k = |a| · λ_k(A).

†BIRKHOFF & MAC LANE, Section 8.9.
(3°) Here det T = 1. Let B = [x: 0 < xᵢ ≤ c, i = 3, ..., k], where B = R^k if k < 3, and define

B₁ = [x: 0 < x₁ ≤ x₂ ≤ c] ∩ B,
B₂ = [x: 0 < x₂ < x₁ ≤ c] ∩ B,
B₃ = [x: c < x₁ ≤ c + x₂, 0 < x₂ ≤ c] ∩ B.

Then A = B₁ ∪ B₂, TA = B₂ ∪ B₃, and B₁ + (c, 0, ..., 0) = B₃. Since λ_k(B₁) = λ_k(B₃) by translation invariance, (12.2) follows by additivity. ∎
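The scaling law (12.2) is easy to check numerically in low dimensions. The following sketch is only an illustration, not part of the development; the particular matrix T and the sample size are arbitrary choices. It estimates λ₂(TA) for the unit square A by Monte Carlo, using the fact that a point y lies in TA exactly when T⁻¹y lies in A, and compares the estimate with |det T| · λ₂(A) = |det T|.

```python
# Numerical check of (12.2) for the unit square A = (0,1] x (0,1].
# Illustrative sketch only; T and the sample size are arbitrary choices.
import numpy as np

rng = np.random.default_rng(0)
T = np.array([[2.0, 1.0],
              [0.0, 3.0]])          # a nonsingular linear map of the plane
Tinv = np.linalg.inv(T)

# Bounding box of T(A): the parallelogram TA is the convex hull of the
# images of the four corners of the unit square.
corners = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)
images = corners @ T.T
lo, hi = images.min(axis=0), images.max(axis=0)
box_area = np.prod(hi - lo)

# Monte Carlo: y lies in T(A) exactly when T^{-1} y lies in A.
n = 1_000_000
y = lo + (hi - lo) * rng.random((n, 2))
x = y @ Tinv.T
inside = np.all((x > 0.0) & (x <= 1.0), axis=1)
estimate = inside.mean() * box_area

print("Monte Carlo estimate of lambda_2(TA):", estimate)
print("|det T| * lambda_2(A):              ", abs(np.linalg.det(T)))
```

With this T the estimate is close to 6, which is |det T|, as (12.2) predicts.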
If T is singular, then det T = 0 and TA lies in a (k − 1)-dimensional subspace. Since such a subspace has measure 0, (12.2) holds if A and TA lie in 𝓡^k. The surprising thing is that A ∈ 𝓡^k need not imply that TA ∈ 𝓡^k if T is singular. For even a very simple transformation such as the projection T(x₁, x₂) = (x₁, 0) in the plane, there exist Borel sets A for which TA is not a Borel set.† But if A is a rectangle, TA is a Borel set by the argument involving (1°), (2°), and (3°); note that a can be 0 in (2°).
Regularity
Important among measures on 𝓡^k are those assigning finite measure to bounded sets. They share with λ_k the property of regularity:
Theorem 12.3. Suppose that μ is a measure on 𝓡^k such that μ(A) < ∞ if A is bounded.
(i) For A ∈ 𝓡^k and ε > 0, there exist a closed C and an open G such that C ⊂ A ⊂ G and μ(G − C) < ε.
(ii) If μ(A) < ∞, then μ(A) = sup μ(K), the supremum extending over the compact subsets K of A.
PROOF. The second part of the theorem follows from the first: μ(A) < ∞ implies that μ(A − A₀) < ε for a bounded subset A₀ of A, and it then
tSee HAUSDORFF, p. 241 .
follows from the first part that μ(A₀ − K) < ε for a closed and hence compact subset K of A₀.

To prove (i) consider first a bounded rectangle A = [x: aᵢ < xᵢ ≤ bᵢ, i ≤ k]. The set Gₙ = [x: aᵢ < xᵢ < bᵢ + n⁻¹, i ≤ k] is open and Gₙ ↓ A. Since μ(G₁) is finite by hypothesis, it follows by continuity from above that μ(Gₙ − A) < ε for large n. A bounded rectangle can therefore be approximated from the outside by open sets.

The rectangles form a semiring (Example 11.5). For an arbitrary set A in 𝓡^k, by Theorem 11.4(i) there exist bounded rectangles Aₖ such that A ⊂ ⋃ₖ Aₖ and μ((⋃ₖ Aₖ) − A) < ε. Choose open sets Gₖ such that Aₖ ⊂ Gₖ and μ(Gₖ − Aₖ) < ε/2^k. Then G = ⋃ₖ Gₖ is open and μ(G − A) < 2ε. Thus the general k-dimensional Borel set can be approximated from the outside by open sets. To approximate from the inside by closed sets, pass to complements. ∎
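For instance, with μ = λ and A the set of rationals in (0, 1], enumerated as r₁, r₂, ..., the open set G = ⋃ₙ (rₙ − ε2⁻ⁿ⁻¹, rₙ + ε2⁻ⁿ⁻¹) contains A and has λ(G) ≤ ε, while the closed set C = ∅ satisfies C ⊂ A ⊂ G and λ(G − C) ≤ ε; and in (ii) every compact subset of A is countable and so has measure 0 = λ(A).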
There are on the line many measures other than A which are important for probability theory. There is a useful way to describe the collection of all measures on 91 1 that assign finite measure to each bounded set. If p. is such a measure, define a real function F by (12.4 )
F( x )
=
{
p.(O, x] - p.( x, o ]
if X > 0, if X S 0.
It is because μ(A) < ∞ for bounded A that F is a finite function. Clearly, F is nondecreasing. Suppose that xₙ ↓ x. If x > 0, apply part (ii) of Theorem 10.2, and if x ≤ 0, apply part (i); in either case, F(xₙ) → F(x) follows. Thus F is continuous from the right. Finally,

(12.5)  μ(a, b] = F(b) − F(a)

for every bounded interval (a, b]. If μ is Lebesgue measure, then (12.4) gives F(x) = x.

The finite intervals form a π-system generating 𝓡¹, and therefore by Theorem 10.3 the function F completely determines μ through the relation (12.5). But (12.5) and μ do not determine F: if F(x) satisfies (12.5), then so does F(x) + c. On the other hand, for a given μ, (12.5) certainly determines F to within such an additive constant. For finite μ, it is customary to standardize F by defining it not by (12.4) but by

(12.6)  F(x) = μ(−∞, x];
then lim_{x→−∞} F(x) = 0 and lim_{x→∞} F(x) = μ(R¹). If μ is a probability measure, F is called a distribution function (the adjective cumulative is sometimes added).

Measures μ are often specified by means of the function F. The following theorem ensures that to each F there does exist a μ.
Theorem 12.4. If F is a nondecreasing, right-continuous real function on the line, there exists on 𝓡¹ a unique measure μ satisfying (12.5) for all a and b.
As noted above, uniqueness is a simple consequence of Theorem 10.3. The proof of existence is almost the same as the construction of Lebesgue measure, the case F(x) = x. This proof is not carried through at this point, because it is contained in a parallel, more general construction for k-dimensional space in the next theorem. For a very simple argument establishing Theorem 12.4, see the second proof of Theorem 14.1.

Specifying Measures in R^k
The a-field !Jt k of k-dimensional Borel sets is generated by the class of bounded rectangles =
[x:
a ; < x; s b; , i 1 , . . . , k ] =
(12.7)
A
(Example 10.1). If
I; (a ; , b; ], =
A has the form of a cartesian product
(12 .8) Consider the sets of the special form (12.9 )
sx
=
[y:
Y; s
X; '
i 1' . . . ' k]; =
Sx consists of the points " southwest" of x ( x 1 , , x k ) ; in the case k 1 it is the half-infinite interval ( oo, x]. Now Sx is closed, and (12.7) has the =
•
•
=
•
-
form
Therefore, the class of sets (12.9) generates gt k_ This class is a w-system. The objective is to find a version of Theorem 12.4 for k-space. This will in particular give k-dimensional Lebesgue measure. The first problem is to find the analogue of (12.5). A bounded rectangle (12.7) has 2 k vertices-the points x = (x 1 , , x k ) for which each X; is either a ; or b; . Let sgn A x, the signum of the vertex, be •
•
•
SECTION
12.
177
MEASURES IN EUCLI DEAN SPACE
1 or - 1, according as the number of i k(1 < i < k ) satisfying X; = a ; is even or odd. For a real function F on R , the difference of F around the vertices of A is &AF = E sgn x F( x) , the sum extending over the 2 k vertices x of A . In the case k = 1, A = ( a , b ] and &AF = F( b ) - F( a) . In the case k = 2, &AF = F( b 1 , b2 ) - F( b 1 , a 2 ) - F(a 1 , b2 ) + F( a 1 , a 2 ) . Since the k-dimensional analogue of (12.4) is complicated, suppose at first that p. is a finite measure on !1/ k and consider instead the analogue of (1 2.6), namely +
A
F( X ) = ,., [ y : Y; < Xi ' i = 1 ' . . . ' k ] .
(1 2.11) Suppose that Then
·
Sx is defined by (12.9) and A is a bounded rectangle (12.7).
(12.12) To see this, apply to the union on the right in (12.10) the inclusion-exclusion formula (10.5). The k sets in the union give 2 k - 1 intersections, and these are the sets Sx for x ranging over the vertices of A other than ( b 1 , , b k ). Taking into account the signs in (10.5) leads to (12.12). •
+
-
+
-
+
+
-
+
+
-
-
+
-
+
-
+
•
•
+
+
Suppose x < n > � x in the sense that xf n > � X; as n --+ oo for each i = 1, . . . , k. Then Sx < " ) � Sx , and hence F(x < n >) --+ F(x ) by Theorem 10.2(ii). In this sense, F is continuous from above.
Suppose that the real function F on R k is continuous from above and satisfies &AF > 0 for bounded rectangles A . Then there exists a unique measure p. on !1/ k satisfying (12.12) for bounded rectangles A .
Theorem 12.5.
The empty set can be taken as a bounded rectangle (12.7) for which a 1 = h; for some i, and for such a set A, &A F = 0. Thus (12.12) defines a finite-valued set function p. on the class of bounded rectangles. The point of the theorem is that p. extends uniquely to a measure on !1/ k . The uniqueness is an immediate consequence of Theorem 10.3, since the bounded rectangles form a w-system generating !1/ k .
178
MEASURE
If F is bounded, then p. will be a finite measure. But the theorem does not require that F be bounded. The most important unbounded F is F(x) = x 1 • • • x k . Here &AF = (b 1 - a 1 ) • • • (b k - a k ) for A given by (12.7). This is the ordinary volume of A as specified by (12.1). The corresponding measure extended to !Ji k is k-dimensional Lebesgue measure as described at the beginning of this section. 12.5. As already observed, the uniqueness of the extension is easy to prove. To prove its existence it will first be shown that p. as defined by (12.12) is finitely additive on the class of bounded rectangles. Suppose that each side I; = ( a ; , b; ) of a bounded rectangle (12.7) is partitioned into n ; subintervals JiJ == ( t;. 1 _ 1 , t! � ], j == 1, . . . , n ; , where a ; == t; o < til < · · · < t; n ; = b;. The n 1 n 2 · · · n k rectanpes PROOF OF THEOREM
{ 12.13) then partition A . Call such a partition regular. It will first be shown that regular partitions: ( 12.14)
,.,. ( A )
== E ,.,. ( �1 ... it
ik
• . •
p.
adds for
j )• lc
The right side of (12.14) is E8Ex sgn8x · F(x), where the outer sum extends over the rectangles B of the form (12.13) and the inner sum extends over the vertices x of B. Now ( 12.15 )
E E sgn8x · F( x ) B
X
== L F( x ) L sgn8x , X
B
where on the right the outer sum extends over each x that is a vertex of one or more of the B 's, and for fixed x the inner sum extends over the B 's of which it is a vertex. Suppose that x is a vertex of one or more of the B 's but is not a vertex of A . Then there must be an i such that X; is neither a; nor b;. There may be several such i, but fix on one of them and suppose for notational convenience that it is i == 1. Then x1 == t1 1 with 0 < j < n 1 • The rectangles (12.13) of which x is a vertex therefore come in pairs B' == � 2 · · ·) and B " == B 1 • 1 2 ilc ' and sgn 8 x = - sgn 8 ,x. Thus the inner sum on t'&e n&bt in (12.15) is 6 if x is not a vertex of A . On the other hand, if x is a vertex of A as well as of at least one B, then for each i either X; == a ; == t; o or X; == b; = t; n . · In this case x is a vertex of only one B of the form (12.13)- the one for which j; == 1 or j; == n ; , according as X; = a; or X; == b, -and sgn8x == sgn A x. Thus the right side of (12.15) reduces to ll A F, which proves (12.14). +
,
• • .
'
!
� -- -
-- - -- - -- ----
I
I
·
-
-
-
12.
179 Now suppose that A == U: _ 1 A u, where A is the bounded rectangle (12.8), X lk u for u == 1, . . . , n , and the A u are disjoint. For each i (1 < i < 11u X Au k ) , the intervals /i l , . . . , I; n have I; as their union, although they need not be disj oint. But their endpoints split I; into disjoint subintervals .Ii i , . . . , .f; n such that each l; u is the union of certain of the JiJ . The rectangles B of the form (i2.13) are a regular partition of A as before; furthermore, the B 's contained in a single A u form a regular partition of A u . Since the A u are disjoint, it follows by (12.14) that SECTION
=
•
•
MEASURES IN EUCLIDEAN SPACE
·
n
n
It ( A ) == E �t { B) == E E It { B ) == E It { A u ) · B
u- 1
B c A11
u= 1
Therefore, I' is finitely additive on the class J k of bounded k-dimensional rectangles. As shown in Example 11 .5, Jk is a semiring, and so Theorem 11.3 applies. Now suppose that U:_ 1 A u e A , where A and the A u are in Jk and the latter are disjoint. As shown in Example 11.2, A - U:_ 1 A u is a finite disjoint union u:_ 1 Bv of sets in Jk . Therefore, E: .... I �t ( Aj ) + E� t i' ( Bv ) == �t ( A ), and so n
n
U
E It { A u ) S It { A ) if
{ 12.16)
u- 1
u -= 1
Au e A ,
e U: _ 1 A u, where A and the A u are in Jk . Let B1 == A n A1 and Bu == A n A u n A} n · · · n A � _ 1 • Again, each Bu is a finite disjoint union Ut, Bu of elements of Jk . The Bu are disjoint, and so the Buv taken all together are also disjoint. They have union A , so that I'( A ) == EuEv�t ( But, ). Since Ut, Buv == Bu e A u , an application of (12.16) gives
Now suppose that A l'
n
�£ { A ) S E �t ( A u ) if A
{ 12.17)
u- 1
e
n
U
Au.
u- 1
Of
course, (12.16) and (12.17) give back finite additivity again. To apply Theorem 11.3 requires showing that I' is countably subadditive on Jk . Suppose then that A e U�_ 1 A u , where A and the A u are in Jk . The problem is to prove that �t { A )
(12.18)
<
00
E �t ( A u ) ·
u- 1
Suppose that £ > 0. If A is given by (12.7) and B == [ x: a ; + B < X; < b; , i s k ], then I' ( B) > I' ( A ) £ for small enough positive B because I' is defined by (12.12) and F is continuous from above. Note that A contains the closure B- == [x: a; + B < x, < b, , i < k] of B. Similarly, for each u there is in Jk a set Bu == [x: u a i u < X; < b u + Bu , i < k ] such that �t ( Bu ) < �t ( A u ) + £/2 and A u is in the interior B: == [ x: a,u < X; < b;u + Bu , i < k] of Bu . Since B- e A e U:'_ 1 A u e U�_ 1 B:, it follows by the Heine-Bore) theorem that B c B- e u : _ 1 B: e U:_ 1 Bu for some n . Now (12.17) applies, and so �t( A ) - £ < p. ( B ) < E: .... 1 �t ( Bu ) < EC:- 1 �t ( A u ) + £. Since £ was arbitrary, the proof of (12.18) is complete. -
,
MEASURE 180 Thus p. as defined by (12.12) is finitely additive and countably subadditive on the semiring J* . By Theorem 11.3, p. extends to a measure on at" = a ( J " ). •
PROBLEMS 12.1. 12.2.
Suppose that p. is a measure on 9P1 which is finite for bounded sets and is translation-invariant: p.(A + x) p.(A ) . Show that p. ( A ) a A ( A ) for some a > 0. Extend to R*. Suppose that A e 9P1 , A ( A ) > 0, and 0 < 8 < 1. Show that there is a bounded open interval I such that A( A n I) > 8A( I). Hint: Show that A( A ) may be assumed finite and choose an open G such that A c G and A( A ) > 6A(G). Now G U,, In for disjoint open intervals In [A12], and E n A(A n In ) > 8EnA(In ); use an In. f If A e 911 and A ( A ) > 0, then the origin is interior to the difference set D( A ) [ x - y : x, y e A ]. Hint: Choose a bounded open interval I as in Problem 12.2 for 6 � . Suppose that l z 1 < A( I)/2 ; since A n I and (A n I) + z are contained in an interval of length less than 3A( I)/2 and hence cannot be disjoint, z e D( A ). 4.13 12.3 f The following construction leads to a subset H of the unit interval that is nonmeasurable in the extreme sense that its inner and outer Lebesgue measures are 0 and 1 : A .( H ) 0 and A* ( H ) 1 (see (3.9) and (3 .1 0)). Complete the details. The ideas are those in the construction of a nonmeasurable set at the end of Section 3. It will be convenient to work in G [0, 1); let E9 and e denote addition and subtraction modulo 1 in G, which is a group with identity 0. (a) Fix an irrational 6 in G and for n 0, ± 1, ± 2, . . . let (Jn be n8 reduced modulo 1. Show that 6n E9 6m 6n + m , 6n e 6m 6n - m , and the (Jn are distinct. Show that { 62 n : n 0, ± 1, . . . } and { 62 n + 1 : n 0, + 1, . . . } are dense in G. (b) Take x and y to be equivalent if x e y lies in { 6n : n 0, ± 1, . . . }, which is a subgroup. Let S contain one representive from each equivalence class (each coset). Show that G Un (S E9 6n ), where the union is disjoint. Put H Un (S E9 62 n ) and show that G - H H E9 6. (c) Suppose that A is a Borel set contained in H. If A( A ) > 0, then D( A ) contains an interval (0, £); but then some 62 k + I lies in (0, £) c D( A ) c D( H), and so 62 k + t h 1 - h 2 h 1 e h 2 ( s 1 E9 62 n ) e (s 2 E9 62 n ) for some h 1 , h 2 in H and some s 1 , s 2 in S. Deduce that s : s2 and ob tain a contradiction. Conclude that A.(H) 0. 1. (d) Show that A .( H E9 6) 0 and A*( H) f The construction here gives sets Hn such that Hn f G and A .( Hn ) 0. If Jn G - Hn , then Jn � 0 and A*( Jn ) 1. (a) Let Hn U Z , ( S E9 61c ), so that Hn f G. Show that the sets Hn E9 �2 n + !} v are disjoint for different v . (b) Suppose that A is a Borel set contained in Hn . Show that A and indeed all the A E9 6< 2 , + I ) v have Lebesgue measure 0.
==
==
==
12.3.
12.4.
==
==
==
==
== == == ==
==
12.5.
==
== ==
==
_ _
==
==
== == == ==
==
== == == ==
==
SECTION
12.6.
12.7.
12.8. 12.9.
12.10. 12.11.
12.12.
U.13.
12.14.
1 2.
MEASURES IN EUCLIDEAN SPACE
181
Suppose that p. is nonnegative and finitely additive on �k and p.( R k ) < oo . Suppose further that p.(A ) == sup p.( K), where K ranges over the compact subsets of A . Show that p. is countably additive. (Compare Theorem 12.3(ii).) Suppose p. is a measure on � k such that bounded sets have finite measure. Given A , show that there exist an F, set U (a countable union of closed sets) and a G8 set V (a countable intersection of open sets) such that U c A c V and p.( V - U) == 0. 2.17 f Suppose that p. is a nonatomic probability measure on ( R k , � k ) and that p.(A) > 0. Show that there is an uncountable compact set K such that K c A and p.( K ) == 0. The closed support of a measure p. on � k is a closed set C0 such that CO c C for closed C if and only if C supports p.. Prove its existence and uniqueness. Characterize the points of CO as those x such that p. ( U) > 0 for every neighborhood U of x. If k == 1 and if p. and the function F(x) are related by (12.5), the condition is F(x - £ ) < F(x + £) for all £; x is in this case called a point of increase of F. Let t§ be the class of sets in � 2 of the form [(x1 , x 2 ): x1 e H) for some H in �1 . Show that t§ is a a-field in � 2 and that two-dimensional Lebesgue measure is not a-finite on t§ even though it is a-finite on � 2 • Of minor interest is the k-dimensional analogue of (12.4). Let I, be (0, t ) for t > 0 and ( t, O] for t < 0, and let Ax == lx 1 X · · · X lx ,<" Let cp(x) be + 1 or - 1 according as the number of i, 1 < i < k , for which X; < 0 is even or odd. Show that, if F(x) == cp(x)p.(Ax ), then (12.12) holds for bounded rectangles A . Call F degenerate if it is a function of some k - 1 of the coordinates, the requirement in the case k == 1 being that F is constant. Show that ll A F == 0 for every bounded rectangle if and only if F is a finite sum of degenerate functions; (12.12) determines F to within addition of a function of this sort. Let G be a nondecreasing, right-continuous function on the line and put F(x, y ) = min{ G(x), y }. Show that F satisfies the conditions of Theorem 1 12.5, that the curve C == [(x, G(x)): x e R ] supports the corresponding measure, and that A 2 (C) == 0. Let F1 and F2 be nondecreasing, right-continuous functions on the line and put F(x. , x 2 ) == F1 (x 1 )F2 (x 2 ). Show that F satisfies the conditions of Theorem 12.5. Let p., p. 1 , p. 2 be the measures corresponding to F, F1 , F2 , and prove that p.(A 1 X A 2 ) == p. 1 (A 1 )p. 2 (A 2 ) for intervals A 1 and A 2 • This p. is the product of p. 1 and p. 2 ; products are studied in a general setting in Section 18. Let fl' be the class of sets A such that there exist an open G and a closed C such that C c A c G and p.(G - C) < £. Prove Theorem 12.3(i) for the case of a finite p. by showing that .JII' is a a-field containing the closed rectangles.
182
MEASURE
12.15. In the case F(x)
x 1 x" , the proof of Theorem 12.5 shows that k dimensional volume (see (12.1)) is finitely additive and countably subad ditive for bounded rectangles. Starting with this fact, use the ideas in Example 11.6 to give another construction of """ . 12.16. Identifying "' · Take as a starting point the fact that each point (x, y ) of the plane other than the origin has a unique polar representation ( x, y ) == 1 ( rcos 8, r sin 8), where 0 < 8 < 2 w and r == (x 2 + y 2 ) 12 . Let D, be the closed disc [( x, y): x 2 + y 2 < r 2 ] and define w0 == A 2 ( D1 ). The problem is to ideo tify w0 , an area, with the , of trigonometry. (a) Let T" be the linear map with matrix ==
•
•
•
- sin q> COS q>
]
Show by the addition formulas for sin and cos that T" rotates the plane through the angle q>. Show that it preserves area. (b) Consider the sector S9 1 9 consisting of the points with polar representa2 tions satisfying 61 < 6 < 62 • Show that A 2 ( D1 n S9 9 2 ) = w0 ( 82 - 81 )/2w by considering in sequence the cases where (i) (62 - f1 )/2w is the recipro cal of an integer, (ii) a rational, (iii) arbitrary. (c) Show that the linear map taking (x, y ) to ( rx, ry) multiplies areas by r 2 • Let A ,1 , be the annulus D, - D,1 , and sltow that 2
2
(12.19) Problem 18.24 shows how to prove w0 D1 n So9 in two different ways.
==
, by calculating the area of
SECI'ION 13. MEASURABLE FUNCitONS AND MAPPINGS
If a real function X on 0 has finite range, it is by the definition in Section 5 a simple random variable if [ w: X( w) = x] lies in the basic a-field � for each x. The requirement appropriate for the general real function X is stronger; namely, [ w: X( w) e H] must lie in � for each linear Borel set H. An abstract version of this definition greatly simplifies the theory of such functions . .,
Measurable Mappings
Let ( 0, $ ) and ( 0', � ') be two measurable spaces. For a mapping T: 0 0', consider the inverse images r- 1A' = [ w E 0: Tw E A'] for A' c 0'. (See [A 7] for the properties of inverse images.) The mapping T is measur able $j$' if T- �, E $ for each A' E $'. -+
SECTION
1 3.
MEASURABLE FUNCTIONS AND MAPPINGS
183
For a real function /, the image space 0' is the line R 1 , and in this case 91 1 is always tacitly understood to play the role of $i" ' . A real function f on 0 is thus measurable $i" (or simply measurable if it is clear from the context what !F is involved) if it is measurable fFjgJ 1 -that is, if f- 1H = [ w: f(w) e H] E !F for every H e qJ 1 . In probability contexts, a real measur able function is called a random variable. The point of the definition is to ensure that [ w: /( w) e H] has a measure or probability for all sufficiently regular sets H of real numbers-that is, for all Borel sets H.
f with finite range is measurable if a condition to f- 1 { x } e !F for each singleton { x }, but this is too weak impose on the general f. (It is satisfied if ( 0, !F) = ( R 1 , qJ 1 ) and f is any one-to-one map of the line into itself; but in this case f- 1H even for so simple a set H as an interval can for an appropriately chosen f be any uncountable set, say the non-Borel set constructed in Section 3.) On the other hand, for a measurable f with finite range, f- 1H e !F for every H c R 1 ; but this is too strong a condition to impose on the general f. (For ( D, /F) = ( R 1 , 9l 1 ), even f(x) = x fails to satisfy it.) Notice that nothing is • required of fA ; it need not lie in qJ 1 for A in !F. Example 13. 1.
A real function
If in addition to (0, !F), ( 0', !F'), and the map T: 0 --+ 0' there is a third measurable space (0", !F ") and a map T': 0' --+ 0", the composition T'T = T' o T is the mapping 0 --+ 0" that carries w to T'(T( w )).
If T- 1A' e !F for each A' e d ' and d' generates !F', then T is measurable !Fj!F'. (ii) If T is measurable !Fj!F' and T' is measurable !F 'j!F", then T'T is measurable !Fj!F ". PROOF. Since T- 1 (0' - A') = 0 - T - 1A' and T- 1 (U n A�) = U n T- 1A�, and since !F is a a-field in 0, the class [A': T - 1A' e !F] is a a-field in 0'. If Theorem 13.1. (i)
this a-field contains d ', it must also contain a( d '), and (i) follows. As for (ii), it follows by the hypotheses that A" e !F " implies that (T') - 1A" e $ ', which in tum implies that (T'T) - 1A" = [w: T'Tw e A"] = [w: Tw E (T') - 1A"] = T - 1 ((T') - 1A") E !F. • By part (i), if f is a real function such that [w: /(w) < x] lies in !F for all x, then f is measurable !F. This condition is usually easy to check. Mappings into Rk
For a mapping f: 0 --+ R k carrying D into k-space, fA k is always under stood to be the a-field in the image space. In probabilistic contexts, a
184
MEASURE
measurable mapping into R k is called a random vector. Now f must have the form ( 1 3 .1 ) for real functions h ( w ). Since the sets (12.9) (the "southwest regions") generate 9t k , Theorem 13.1(i) implies that f is measurable � if and only if the set ( 1 3 .2 )
[ w : /l ( w ) < x l , . . . , fk ( w )
�
k
x k ] = n [ w : h ( w ) < xj ] j- 1
lies in � for each (x 1 , , x k ). This condition holds if each h is measur able !F. On the other hand, if xj = x is fixed and x 1 = · · · = xi - I = xi+ I = · · · = x k = n goes to oo, the sets (13.2) increase to [w: h (w) � x]; the condition thus implies that each h is measurable. Therefore, f is measurable !F if and only if each component function h is measurable �. This provides a practical criterion for mappings into R k . A mapping f: R ; --+ R k is defined to be measurable if it is measurable 91 ;!9/ k . Such functions are often called Borel functions. To sum up, T: D D' is measurable �/�' if T- 1A' e � for all A' e � '; f: 0 --+ R k is measurable !F if it is measurable �!9t k ; and f: . R ; --+ R k is measurable (a Borel function) if it is measurable gJ ijgJ k . If H lies outside gJ1, then IH ( i = k = 1) is not a Borel function. •
•
•
-+
Theorem 13.2.
Iff: R ; --+ R k is continuous, then it is measurable.
As noted above, it suffices to check that each set (13.2) lies in gJ i. • But each is closed because of continuity.
PROOF.
If h = D --+ R 1 is measurable � ' j = 1, . . . , k, then g( / ( w ), . . . , /k (w)) is measurable � if g: R k --+ R 1 is measurable-in 1 particular, if it is continuous. Theorem 13.3.
If the h are measurable, then so is (13.1), so that the result follows • by Theorem 13.1(ii). PROOF.
Taking g( xl , . . . ' x k ) to be E�- IX;, n�- IX;, and max{ X I , . . . ' x k } in tum shows that sums, products, and maxima of measurable functions are mea surable. If /(w ) is teal and measurable, then so are sin /(w), e'f< w > , and so on, and if f ( w ) never vanishes, then 1 If( w ) is measurable as well.
SECTION
13.
MEASURABLE FUNCTIONS AND MAPPINGS
185
Limits and MeasurabUity
For a real function f it is often convenient to admit the artificial values oo and - oo - to work with the extended rea/ line [ - oo, oo]. Such an f is by definition measurable j&' if [ w: /( w) e H] lies in j&' for each Borel set H of (finite) real numbers and if [w: /(w) = oo] and [w: /( w) = - oo] both lie in .F. This extension of the notion of measurability is convenient in connection with limits and suprema, which need not be finite.
Suppose that /1 , /2 , are real functions measurable j&' _ (i) The functions supnfn , infn fn , lim supnfn , and lim infn/n are measur
'lbeorem 13.4.
able !F. (ii) (iii) (iv)
PROOF.
•
•
•
If lim nfn exists everywhere, then it is measurable j&'. The w-set where { /n ( w )} converges lies in j&'. Iff is measurable j&', then the w-set where fn ( w) --+ /( w) lies in !F. Clearly, (supnfn � x) = n n ( fn < x) lies in j&' even for X = 00 and
oo, and so supn fn is measurable. The measurability of infn /n follows the same way, and hence lim supn fn = inf n sup k n fk and lim inf n/n = supn inf k n fk are measurable. If limn fn exists, it coincides with these last two functions and hence is measurable. Finally, the set in (iii) is the set where lim supn /n (w) = lim infn/n ( w ), and that in (iv) is the set where this common value is /( w ). • x = -
�
�
Special cases of this theorem have been encountered before-part (iv), for example, in connection with the strong law of large numbers. The last three parts of the theorem obviously carry over to mappings into R k . A simple real function is one with finite range; it can be put in the form
(13 .3 ) where the A; form a finite decomposition of �. It is measurable !F if each A; lies in !F. The simple random variables of Section 5 have this form. Many results concerning measurable functions are most easily proved first for simple functions and then, by an appeal to the next theorem and a passage to the limit, for the general measurable function.
Iff is real and measurable !F, there exists a sequence { In } of simple functions, each measurable !F, such that
Theorem 13.5.
(13.4)
iff( w ) > 0
186
MEASURE
and ill( w ) < O.
(13 .5 ) PROOF.
Define
( 13.6 ) In( W ) =
-n - ( k - 1 )2 - n ( k - 1 )2 - n n
-n, < - ( k - 1 )2 - n , n, n2 < 1 < k n n if ( k - 1 ) 2 - < l( w ) < k2 - , 1 < k < n2 n , if n < I ( w ) < oo . if - oo < l( w) < n if -k2 - < l ( w )
•
This sequence has the required properties.
Note that (13.6) covers the possibilities l(w) = oo and l(w) = - oo . I f A e !F, a function I defined only on A is by definition measurable if [w e A : l(w) e H] lies in !F for H e !1/ 1 and for H = { oo } and H =
{ - 00 }.
Transfonnations of Measures
Let (D, !F) and (D', !F ') be measurable spaces and suppose that the mapping T: D --+ D' is measurable !Fj!F'. Given a measure p. on !F, define a set function p.T - 1 on !F ' by
( 13.7 )
A' E !F '.
p.T - 1 assigns value p.(T- 1A') to the set A'. If A' e �', then T - 1A' e !F because T is measurable, and hence the set function p.T - 1 is well defined on � '. Since T- 1 U n A� = U n T- 1A� and the T - 1A� are disjoint sets in D if the A� are disjoint sets in D', the countable additivity of p.T - 1 follows from that of p.. Thus p.T - 1 is a measure. This way of transferring a That is,
measure from D to D' will prove useful in a number of ways. If p. is finite, so is p. T- 1 ; if p. is a probability measure, so is p. T - 1 • t t But see Problem 1 3.22.
SECTION
13.
187
MEASURABLE FUNCTIONS AND MAPPINGS
PROBLEMS 13.1.
Show by examples in which Q == 0' is a space of two points that measur ability .Fj.F' is compatible with all but one of the eight statements A A A A
13.2. 13.3. 13.4.
13.5.
13.6. 13.7.
13.8. 13.9.
e .F and e .F and e .F and e .F and
rA rA rA rA
A' A' A' A'
e .F', e .F', e .F', e .F',
and r- 1A' E .F, and r- 1A' e .F, e .F' and r- 1A E .F e .F' and 1 1A' e .F.
E .F' E .F'
'
'
Although an indicator lA is measurable .F if and only if A e .F, the simple function (13.3) can be measurable .F even if some of the A; lie outside .F. Show that a monotone real function is measurable. Functions are often defined in pieces (for example, let l( x) be x3 or x- 1 as x � 0 or x < 0), and the following result shows that the function is measurable if the pieces are. Consider measurable spaces (Q, .F) and ( 0', .F') and a map T: Q -+ 0'. Let A 1 , A 2 , • • • be a countable covering of Q by $sets. Consider the a-field � == [ A : A c An , A e .F) in A n and the restriction T, of r to An. Show that T is measurable .Fj.F' if and only if T, is measurable !F,/.F' for each n . (a) For a map r and a-fields .F and .F' , define r- 1-F' == (r- 1A' : A' E .F'] and r.F== [ A' : 1 1A' E .F). Show that r- 1-F' and r.F are a-fields and that pteasurability .Fj.F' is equivalent to r- 1-F' c .F and to !F' c T.F. (b) For given !F', r- 1-F', which is the smallest a-field for which r is measurable .Fj.F', is by definition the a-field generated by r. For simple random variables describe a( XI ' . . . ' xn ) in these terms. (c) Let a'( J/1') be the a-field in 0' generated by J/1'. Show that a(r- 1JJI') == r- 1 ( a'( J/1' )). Prove Theorem 10.1 by taking r to be the identity map from no to n. f Suppose that 1: Q R1 • Show that I is measurable r- 1!F' if and only if there exists a map q>: 0' R1 such that q> is measurable .F' and I == q>T. Hint: First consider simple functions and then use Theorem 13 . 5. f Relate the result in Problem 13.6 to Theorem 5 . 1(ii). Show of real functions I and g that I( w ) + g( w) < x if and only if there exist rationals r and s such that r + s < x, I< w) < r, and g( w) < s. Prove directly that I + g is measurable .F if I and g are. Let .F be a a-field in R 1 • Show that 911 1c .F if and only if every continuous function is measurable !F. Thus 9P is the smallest a-field with respect to which all the continuous functions are measurable. -+
-+
188
MEASURE
13.10. Consider on R 1 the smallest class
13.1 1. 13. 12. 13.13.
13. 14. 13.15.
13.16.
(that is, the intersection of all classes) of real functions containing all the continuous functions and closed under pointwise passages to the limit. The elements of !I are called Baire functions. Show that Baire functions and Borel functions are the same thing. A real function I on the line is upper semicontinuous at x if for each there is a B such that l x - Y l < B implies that l( y ) < /( x) + £ . Show that. if I is everywhere upper semicontinuous, then it is measurable. If I is bounded and measurable !F, then there exist simple functions, each measurable !F, such that /,. ( w ) --+ I( w ) uniformly in w. Suppose that In and I are finite-valued, jlt:measurable functions such that In < w ) --+' I< w ) for w e A , where p. (A ) < oo ( p. a measure on !F ). Prove Egoroff s theorem : For each £ there exists a subset B of A such that I'( B) < £ and In < w ) --+ I< w ) uniformly on A -1 B. Hint: Let B�' d be the set of w in A such that ll( w ) - /; ( w ) l > k - for some i > n . Show that 00 B
£
1 +
r-
=
""
p .( A ).
Show that, if � is a field generating !F, if I is measurable !F, and if I'( Q ) < oo , then for positive ( there exists a simple function g == E; x; IA such that A ; e � and I' [ w : II( w ) - g( w ) 1 > (] < (. Hint: First ap� proximate I by a simple function measurable !F, and then use Problem 11.7. 13.18. 2.8 f Show that, if I is measurable a ( JJI), then there exists a countable subclass � of JJI such that I is measurable a( � ). 13.19. Circular Lebesgue measure. Let C be the unit circle in the complex plane and define T: [0, 1) --+ C by Tw == e 2 w i . Let !I consist of the Borel subsets of [0, 1) and let A be Lebesfue measure on !1. Show that � == (A : r- 1A e !I] consists of the sets in � (identify R 2 with the complex plane) that are contained in C. Show that � is generated by the arcs of C. Circular Lebesgue measure is defined as I' == A 1 1 • Show that I' is invariant under rotations: 1'[6z: z e A ] == I'( A ) for A e � and 6 e C. 13.20. f Suppose that the circular Lebesgue measure of A satisfies I'( A ) > 1 n - t and that B contains at most n points. Show that some rotation carries B into A : 6 B c A for some 6 in C. 13.17. 11.7 f
w
SECTION
14.
DISTRIBUTION FUNCTIONS
13.21. In connection with (13.7), show that p.(T'T) - 1 = ( p.T- 1 )(T') - 1 •
189
1 13.22. Show by example that p. a-finite does not imply p.T- a-finite.
13.23. Consider Lebesgue measure A restricted to the class !I of Borel sets in
(0, 1]. For a fixed permutation n1 , n 2 , • • • of the positive integers, if x has dyadic expansion . x 1 x 2 •1 • • , take Tx == • xn1 x, 2 • • • • Show that T is measura ble �/!I and that AT- == A. SECI10N 14. DISTRIBUTION FUNCI10NS
Distribution Functions

A random variable as defined in Section 13 is a measurable real function X on a probability measure space (Ω, ℱ, P). The distribution or law of the random variable is the probability measure μ on (R¹, 𝓡¹) defined by

(14.1)  μ(A) = P[X ∈ A],  A ∈ 𝓡¹.
As in the case of the simple random variables in Chapter 1, the argument ω is usually omitted: P[X ∈ A] is short for P[ω: X(ω) ∈ A]. In the notation (13.7), the distribution is PX⁻¹. For simple random variables the distribution was defined in Section 5; see (5.9). There μ was defined for every subset of the line, however; from now on μ will be defined only for Borel sets, because unless X is simple, one cannot in general be sure that [X ∈ A] has a probability for A outside 𝓡¹. The distribution function of X is defined by

(14.2)  F(x) = μ(−∞, x] = P[X ≤ x]
for real x. By continuity from above (Theorem 10.2(ii)) for μ, F is right-continuous. Since F is nondecreasing, the left-hand limit F(x−) = lim_{y↑x} F(y) exists, and by continuity from below (Theorem 10.2(i)) for μ,

(14.3)  F(x−) = μ(−∞, x) = P[X < x].

Thus the jump or saltus in F at x is

F(x) − F(x−) = μ{x} = P[X = x].

Therefore (Theorem 10.2(iv))
190
MEASURE
discontinuity. Clearly,
(14.4)
lim F( x ) = O,
x � - oo
lim F( x ) = 1 .
A function with these properties must, in fact, be the distribution
function of some random variable:
a nondecreasing, right-continuous function satisfying (14.4), then there exists on some probability space a random variable X for which F(x) = P[ X < x]. FIRST PROOF. By Theorem 12.4, if F is nondecreasing and right-continu ous, there is on ( R 1 , 91 1 ) a measure p. for which p.(a, b] = F(b) - F( a). But lim x _. _ 00 F( x ) = 0 implies that p.( - oo, x] = F(x) , and lim x _. 00F(x) = 1 implies that p. ( R 1 ) = 1. For the probability space take (0, � ' P) = ( R 1 , !1/ 1 , p. ) and for X take the identity function: X(w) = w. Then P[ X < • x ] = p. [w e R 1 : w < x] = F(x). Theorem 14.1.
If F is
There is a proof that uses only the existence of Lebesgue measure on the unit interval and does not require Theorem 12.4. For the probability space take the open unit interval: 0 is (0, 1), � consists of the Borel subsets of (0, 1), and P(A) is the Lebesgue measure of A. To understand the method, suppose at first that F is continuous and strictly increasing. Then F is a one- to-one mapping of R 1 onto (0, 1); let cp : (0, 1) --+ R 1 be the inverse mapping. For 0 < w < 1, let X( w) = cp ( w ) . Since cp is increasing, certainly X is measurable �- If 0 < u < 1, then cp ( u) < x if and only if u < F(x) . Since P is Lebesgue measure, P[ X < x] = P[w e (0, 1): cp ( w ) < x] = P[w e (0, 1): w < F(x)] = F(x), as required. SECOND PROOF.
[Figure: the graph of F(x) and the quantile function φ(u).]

If F has discontinuities or is not strictly increasing, define†

(14.5)    φ(u) = inf[x: u ≤ F(x)]

for 0 < u < 1. Since F is nondecreasing, [x: u ≤ F(x)] is an interval stretching to ∞; since F is right-continuous, this interval is closed on the left. For 0 < u < 1, therefore, [x: u ≤ F(x)] = [φ(u), ∞), and so φ(u) ≤ x if and only if u ≤ F(x). If X(ω) = φ(ω) for 0 < ω < 1, then by the same reasoning as before, X is a random variable with P[X ≤ x] = F(x). •

† This is called the quantile function.

This second argument actually provides a simple proof of Theorem 12.4 for a probability distribution† F: the distribution μ (as defined by (14.1)) of the random variable just constructed satisfies μ(−∞, x] = F(x) and hence μ(a, b] = F(b) − F(a).

† For the general case, see Problem 14.2.
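The construction in the second proof is exactly the inverse-transform method for simulating a random variable with a prescribed distribution function. Below is a minimal illustrative sketch, not from the text: it uses the quantile function (14.5) of the exponential distribution function F(x) = 1 − e^{−αx} (see (14.7) below), for which φ has the explicit form −α⁻¹ log(1 − u); the rate α = 2.0 and the sample size are arbitrary choices.

```python
import math
import random

ALPHA = 2.0  # arbitrary rate parameter

def exp_cdf(x, alpha=ALPHA):
    # F(x) = 1 - e^{-alpha x} for x >= 0, as in (14.7)
    return 1.0 - math.exp(-alpha * x) if x >= 0 else 0.0

def exp_quantile(u, alpha=ALPHA):
    # phi(u) = inf[x: u <= F(x)]; for this F it is the explicit inverse of F
    return -math.log(1.0 - u) / alpha

# X = phi(U) with U uniform on (0, 1) has distribution function F
random.seed(0)
sample = [exp_quantile(random.random()) for _ in range(100_000)]
for x in (0.25, 0.5, 1.0, 2.0):
    empirical = sum(1 for s in sample if s <= x) / len(sample)
    print(f"x={x}: empirical={empirical:.4f}  F(x)={exp_cdf(x):.4f}")
```

The empirical distribution function of the simulated values agrees with F to within sampling error, which is just the identity P[φ(U) ≤ x] = P[U ≤ F(x)] = F(x) in numerical form.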
Exponential Distributions
There are a number of results which for their interpretation require random variables, independence, and other probabilistic concepts, but which can be discussed technically in terms of distribution functions alone and do not require the apparatus of measure theory. Suppose as an example that F is the distribution function of the waiting time to the occurrence of some event, say the arrival of the next customer at a queue or the next call at a telephone exchange. As the waiting time must be positive, assume that F(0) = 0. Suppose that F(x) < 1 for all x and furthermore suppose that

(14.6)    (1 − F(x + y))/(1 − F(x)) = 1 − F(y),    x, y ≥ 0.

The right side of this equation is the probability that the waiting time exceeds y; by the definition (4.1) of conditional probability, the left side is the probability that the waiting time exceeds x + y given that it exceeds x. Thus (14.6) attributes to the waiting-time mechanism a kind of lack of memory or aftereffect: If after a lapse of x units of time the event has not yet occurred, the waiting time still remaining is conditionally distributed just as the entire waiting time from the beginning. For reasons that will emerge later (see Section 23), waiting times often have this property.

The condition (14.6) completely determines the form of F. If U(x) = 1 − F(x), (14.6) is U(x + y) = U(x)U(y). This is a form of Cauchy's equation [A20], and since U is bounded, U(x) = e^{−αx} for some α. Since lim_{x→∞} U(x) = 0, α must be positive. Thus (14.6) implies that F has the exponential form

(14.7)    F(x) = 0 if x < 0,    F(x) = 1 − e^{−αx} if x ≥ 0,

and conversely.

Weak Convergence
Random variables X₁, ..., X_n are defined to be independent if the events [X₁ ∈ A₁], ..., [X_n ∈ A_n] are independent for all Borel sets A₁, ..., A_n, so that P[X_i ∈ A_i, i = 1, ..., n] = Π_{i=1}^n P[X_i ∈ A_i]. To find the distribution function of the maximum M_n = max{X₁, ..., X_n}, take A₁ = ··· = A_n = (−∞, x]. This gives P[M_n ≤ x] = Π_{i=1}^n P[X_i ≤ x]. If the X_i are independent and have common distribution function G and M_n has distribution function F_n, then

(14.8)    F_n(x) = Gⁿ(x).

It is possible without any appeal to measure theory to study the real function F_n solely by means of the relation (14.8), which can indeed be taken as defining F_n. It is possible in particular to study the asymptotic properties of F_n:

Example 14.1. Consider a stream or sequence of events, say arrivals of calls at a telephone exchange. Suppose that the times between successive events, the interarrival times, are independent and that each has the exponential form (14.7) with a common value of α. By (14.8) the maximum M_n among the first n interarrival times has distribution function F_n(x) = (1 − e^{−αx})ⁿ, x ≥ 0. For each x, lim_n F_n(x) = 0, which means that M_n tends to be large for n large. But P[M_n − α⁻¹ log n ≤ x] = F_n(x + α⁻¹ log n). This is the distribution function of M_n − α⁻¹ log n, and it satisfies

(14.9)    F_n(x + α⁻¹ log n) = (1 − n⁻¹e^{−αx})ⁿ → e^{−e^{−αx}}

as n → ∞; the equality here holds if α⁻¹ log n ≥ −x, and so the limit holds for all x. This gives for large n the approximate distribution of the normalized random variable M_n − α⁻¹ log n. •
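A quick numerical check of (14.9), offered only as an illustration: the sketch below compares F_n(x + α⁻¹ log n) = (1 − n⁻¹e^{−αx})ⁿ with the limit exp(−e^{−αx}) for a few values of n and x; the choice α = 1.0 is arbitrary.

```python
import math

ALPHA = 1.0  # arbitrary rate

def Fn_shifted(x, n):
    # F_n(x + alpha^{-1} log n) = (1 - e^{-alpha x}/n)^n once the argument is >= 0
    t = x + math.log(n) / ALPHA
    return (1.0 - math.exp(-ALPHA * t)) ** n if t >= 0 else 0.0

def gumbel(x):
    # the limit in (14.9)
    return math.exp(-math.exp(-ALPHA * x))

for x in (-1.0, 0.0, 1.0, 2.0):
    row = "  ".join(f"n={n}: {Fn_shifted(x, n):.4f}" for n in (10, 100, 10_000))
    print(f"x={x:+.1f}  {row}  limit={gumbel(x):.4f}")
```

Already for moderate n the shifted distribution function is close to its limit, which is the practical content of the approximation at the end of the example.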
If F_n and F are distribution functions, then by definition, F_n converges weakly to F, written F_n ⇒ F, if

(14.10)    lim_n F_n(x) = F(x)

for each x at which F is continuous.†

† For the role of continuity, see Example 14.4.

To study the approximate distribution of a random variable Y_n it is often necessary to study instead the normalized or rescaled random variable (Y_n − b_n)/a_n for appropriate constants a_n and b_n. If Y_n has distribution function F_n and if a_n > 0, then P[(Y_n − b_n)/a_n ≤ x] = P[Y_n ≤ a_n x + b_n], and therefore (Y_n − b_n)/a_n has distribution function F_n(a_n x + b_n). For this reason weak convergence often appears in the form*

(14.11)    F_n(a_n x + b_n) ⇒ F(x).

An example of this is (14.9): there a_n = 1, b_n = α⁻¹ log n, and F(x) = e^{−e^{−αx}}.

* To write F_n(a_n x + b_n) ⇒ F(x) ignores the distinction between a function and its value at an unspecified value of its argument, but the meaning of course is that F_n(a_n x + b_n) → F(x) at continuity points x of F.

Example 14.2. Consider again the distribution function (14.8) of the maximum, but suppose that G has the form

G(x) = 0 if x < 1,    G(x) = 1 − x^{−α} if x ≥ 1,

where α > 0. Here F_n(n^{1/α}x) = (1 − n⁻¹x^{−α})ⁿ for x ≥ n^{−1/α}, and therefore

lim_n F_n(n^{1/α}x) = 0 if x ≤ 0,    lim_n F_n(n^{1/α}x) = e^{−x^{−α}} if x > 0.

This is an example of (14.11) in which a_n = n^{1/α} and b_n = 0. •

Example 14.3. Consider (14.8) once more, but for

G(x) = 0 if x < 0,    G(x) = 1 − (1 − x)^α if 0 ≤ x ≤ 1,    G(x) = 1 if x > 1,

where α > 0. This time F_n(n^{−1/α}x + 1) = (1 − n⁻¹(−x)^α)ⁿ if −n^{1/α} ≤ x ≤ 0. Therefore,

lim_n F_n(n^{−1/α}x + 1) = e^{−(−x)^α} if x ≤ 0,    lim_n F_n(n^{−1/α}x + 1) = 1 if x > 0,

a case of (14.11) in which a_n = n^{−1/α} and b_n = 1. •
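As with Example 14.1, the limits in Examples 14.2 and 14.3 are easy to check numerically. The sketch below is illustrative only: it evaluates F_n = Gⁿ at the rescaled points and compares with the limiting values e^{−x^{−α}} and e^{−(−x)^α}; the shape parameter α = 2.0 and the evaluation points are arbitrary choices.

```python
import math

ALPHA = 2.0  # arbitrary shape parameter

def G2(x):  # Example 14.2: G(x) = 1 - x^{-alpha} for x >= 1, else 0
    return 1.0 - x ** -ALPHA if x >= 1 else 0.0

def G3(x):  # Example 14.3: G(x) = 1 - (1 - x)^alpha on [0, 1]
    return 0.0 if x < 0 else (1.0 if x > 1 else 1.0 - (1.0 - x) ** ALPHA)

x = 1.5   # a point with x > 0 for Example 14.2
y = -0.5  # a point with y < 0 for Example 14.3
for n in (10, 100, 10_000):
    fn2 = G2(n ** (1 / ALPHA) * x) ** n        # F_n(n^{1/alpha} x)
    fn3 = G3(n ** (-1 / ALPHA) * y + 1) ** n   # F_n(n^{-1/alpha} y + 1)
    print(n,
          round(fn2, 4), round(math.exp(-x ** -ALPHA), 4),
          round(fn3, 4), round(math.exp(-(-y) ** ALPHA), 4))
```

In each case the rescaled F_n settles quickly onto the limit named in the example.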
Let Δ be the distribution function with a unit jump at the origin:

(14.12)    Δ(x) = 0 if x < 0,    Δ(x) = 1 if x ≥ 0.

If X(ω) ≡ 0, then X has distribution function Δ.

Example 14.4. Let X₁, X₂, ... be independent random variables for which P[X_k = 1] = P[X_k = −1] = 1/2, and put S_n = X₁ + ··· + X_n. By the weak law of large numbers,

(14.13)    lim_n P[|n⁻¹S_n| ≥ ε] = 0

for ε > 0. Let F_n be the distribution function of n⁻¹S_n. If x > 0, then F_n(x) = 1 − P[n⁻¹S_n > x] → 1; if x < 0, then F_n(x) ≤ P[|n⁻¹S_n| ≥ |x|] → 0. As this accounts for all the continuity points of Δ, F_n ⇒ Δ. It is easy to turn the argument around and deduce (14.13) from F_n ⇒ Δ. Thus the weak law of large numbers is equivalent to the assertion that the distribution function of n⁻¹S_n converges weakly to Δ.

If n is odd, so that S_n = 0 is impossible, then by symmetry the events [S_n ≤ 0] and [S_n > 0] each have probability 1/2 and hence F_n(0) = 1/2. Thus F_n(0) does not converge to Δ(0) = 1, but because Δ is discontinuous at 0, the definition of weak convergence does not require this. •
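As an illustration (not part of the text), F_n here can be computed exactly from the binomial distribution, since S_n = 2B_n − n with B_n binomial(n, 1/2). The sketch below shows F_n(x) approaching Δ(x) for x ≠ 0 while F_n(0) stays at 1/2 for odd n, exactly as in the example.

```python
from math import comb

def F_n(x, n):
    # F_n(x) = P[S_n/n <= x] with S_n = 2B - n, B ~ binomial(n, 1/2);
    # S_n/n <= x  iff  B <= n(1 + x)/2
    k_max = int(n * (1 + x) / 2 + 1e-12)  # floor, guarding against float error
    if k_max < 0:
        return 0.0
    k_max = min(k_max, n)
    return sum(comb(n, k) for k in range(k_max + 1)) / 2 ** n

for n in (11, 101, 1001):
    print(n, [round(F_n(x, n), 4) for x in (-0.2, 0.0, 0.2)])
```

The values at x = −0.2 and x = 0.2 move to 0 and 1, while the value at the discontinuity point x = 0 remains 1/2, which is why (14.10) is only required at continuity points.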
Allowing (14.10) to fail at discontinuity points x of F thus makes it possible to bring the weak law of large numbers under the theory of weak convergence. But if (14.10) need hold only for certain values of x, there arises the question of whether weak limits are unique. Suppose that F_n ⇒ F and F_n ⇒ G. Then F(x) = lim_n F_n(x) = G(x) if F and G are both continuous at x. Since F and G each have only countably many points of discontinuity,† the set of common continuity points is dense, and it follows by right continuity that F and G are identical. A sequence can thus have at most one weak limit.

Convergence of distribution functions is studied in detail in Chapter 5. The remainder of this section is devoted to some weak-convergence theorems which are interesting both for themselves and for the reason that they require so little technical machinery.

† The proof following (14.3) uses measure theory, but this is not necessary: If the saltus σ(x) = F(x) − F(x−) exceeds ε at x₁ < ··· < x_n, then F(x_i) − F(x_{i−1}) > ε (take x₀ < x₁), and so nε < F(x_n) − F(x₀) ≤ 1; hence [x: σ(x) > ε] is finite and [x: σ(x) > 0] is countable.
Convergence of Types*

* This topic may be omitted.

Distribution functions F and G are of the same type if there exist constants a and b, a > 0, such that F(ax + b) = G(x) for all x. A distribution function is degenerate if it has the form Δ(x − x₀) (see (14.12)) for some x₀; otherwise, it is nondegenerate.

Theorem 14.2. Suppose that F_n(u_n x + v_n) ⇒ F(x) and F_n(a_n x + b_n) ⇒ G(x), where u_n > 0, a_n > 0, and F and G are nondegenerate. Then there exist a and b, a > 0, such that a_n/u_n → a, (b_n − v_n)/u_n → b, and F(ax + b) = G(x).

Thus there can be only one possible limit type and essentially only one possible sequence of norming constants. The proof of the theorem is for clarity set out in a sequence of lemmas. In all of them, a and the a_n are assumed to be positive.

Lemma 1. If F_n ⇒ F, a_n → a, and b_n → b, then F_n(a_n x + b_n) ⇒ F(ax + b).

PROOF. If x is a continuity point of F(ax + b) and ε > 0, choose continuity points u and v of F so that u < ax + b < v and F(v) − F(u) < ε; this is possible because F has only countably many discontinuities. For large enough n, u < a_n x + b_n < v, |F_n(u) − F(u)| < ε, and |F_n(v) − F(v)| < ε; but then F(ax + b) − 2ε < F(u) − ε < F_n(u) ≤ F_n(a_n x + b_n) ≤ F_n(v) < F(v) + ε < F(ax + b) + 2ε. •

Lemma 2. If F_n ⇒ F and a_n → ∞, then F_n(a_n x) ⇒ Δ(x).

PROOF. Given ε choose a continuity point u of F so large that F(u) > 1 − ε. If x > 0, then for all large enough n, a_n x > u and |F_n(u) − F(u)| < ε, so that F_n(a_n x) ≥ F_n(u) > F(u) − ε > 1 − 2ε. Thus lim_n F_n(a_n x) = 1 for x > 0; similarly, lim_n F_n(a_n x) = 0 for x < 0. •

Lemma 3. If F_n ⇒ F and b_n is unbounded, then F_n(x + b_n) cannot converge weakly.

PROOF. Suppose that b_n is unbounded and that b_n → ∞ along some subsequence (the case b_n → −∞ is similar). Suppose that F_n(x + b_n) ⇒ G(x). Given ε choose a continuity point u of F so that F(u) > 1 − ε. Whatever x may be, for n far enough out in the subsequence, x + b_n > u and F_n(u) > 1 − 2ε, so that F_n(x + b_n) > 1 − 2ε. Thus G(x) = lim_n F_n(x + b_n) = 1 for all continuity points x of G, which is impossible. •

Lemma 4. If F_n(x) ⇒ F(x) and F_n(a_n x + b_n) ⇒ G(x), where F and G are nondegenerate, then

(14.14)    0 < inf_n a_n ≤ sup_n a_n < ∞,    sup_n |b_n| < ∞.

PROOF. Suppose that a_n is not bounded above. Arrange by passing to a subsequence that a_n → ∞. Then by Lemma 2, F_n(a_n x) ⇒ Δ(x), and by Lemma 3 and the hypothesis, b_n is bounded along this subsequence. Arrange by passing to a further subsequence that b_n converges to some b. Then by Lemma 1, F_n(a_n x + b_n) ⇒ Δ(x + b), and so by hypothesis G(x) = Δ(x + b), contrary to the assumption that G is nondegenerate. Thus a_n is bounded above.

If G_n(x) = F_n(a_n x + b_n), then G_n(x) ⇒ G(x) and G_n(a_n⁻¹x − a_n⁻¹b_n) = F_n(x) ⇒ F(x). The result just proved shows that a_n⁻¹ is bounded. Thus a_n is bounded away from 0 and ∞. If b_n is not bounded, pass to a subsequence along which b_n → ±∞ and a_n converges to some a. Since F_n(a_n x) ⇒ F(ax) along the subsequence by Lemma 1, b_n → ±∞ is by Lemma 3 incompatible with the hypothesis that F_n(a_n x + b_n) ⇒ G(x). •

Lemma 5. If F(x) = F(ax + b) for all x and F is nondegenerate, then a = 1 and b = 0.

PROOF. Since F(x) = F(aⁿx + (a^{n−1} + ··· + a + 1)b), it follows by Lemma 4 that aⁿ is bounded away from 0 and ∞, so that a = 1, and it then follows that nb is bounded, so that b = 0. •

PROOF OF THEOREM 14.2. Suppose first that u_n = 1 and v_n = 0. Then (14.14) holds. Fix any subsequence along which a_n converges to some positive a and b_n converges to some b. By Lemma 1, F_n(a_n x + b_n) ⇒ F(ax + b) along this subsequence, and the hypothesis gives F(ax + b) = G(x). Suppose that along some other sequence, a_n → u > 0 and b_n → v. Then F(ux + v) = G(x) and F(ax + b) = G(x) both hold, so that u = a and v = b by Lemma 5. Every convergent subsequence of {(a_n, b_n)} thus converges to (a, b), and so the entire sequence does.

For the general case, let H_n(x) = F_n(u_n x + v_n). Then H_n(x) ⇒ F(x) and H_n(a_n u_n⁻¹x + (b_n − v_n)u_n⁻¹) ⇒ G(x), and so by the case already treated, a_n u_n⁻¹ converges to some positive a and (b_n − v_n)u_n⁻¹ to some b, and as before, F(ax + b) = G(x). •
Extremal Distributions*

* This topic may be omitted.

A distribution function F is extremal if it is nondegenerate and if, for some distribution function G and constants a_n (a_n > 0) and b_n,

(14.15)    Gⁿ(a_n x + b_n) ⇒ F(x).

These are the possible limiting distributions of normalized maxima (see (14.8)), and Examples 14.1, 14.2, and 14.3 give three specimens. The following analysis shows that these three examples exhaust the possible types.

Assume that F is extremal. From (14.15) follow G^{nk}(a_n x + b_n) ⇒ F^k(x) and G^{nk}(a_{nk}x + b_{nk}) ⇒ F(x), and so by Theorem 14.2 there exist constants c_k and d_k such that c_k is positive and

(14.16)    F^k(x) = F(c_k x + d_k).

From F(c_{jk}x + d_{jk}) = F^{jk}(x) = F^j(c_k x + d_k) = F(c_j(c_k x + d_k) + d_j) follow (Lemma 5) the relations

(14.17)    c_{jk} = c_j c_k,    d_{jk} = c_j d_k + d_j = c_k d_j + d_k.

Of course, c₁ = 1 and d₁ = 0. There are three cases to be considered separately.

CASE 1. Suppose that c_k = 1 for all k. Then

(14.18)    F^k(x) = F(x + d_k).

This implies that F^{j/k}(x) = F(x + d_j − d_k). For positive rational r = j/k, put δ_r = d_j − d_k; (14.17) implies that the definition is consistent, and F^r(x) = F(x + δ_r). Since F is nondegenerate, there is an x such that 0 < F(x) < 1, and it follows by (14.18) that d_k is decreasing in k, so that δ_r is strictly decreasing in r. For positive real t let φ(t) = inf_{0<r<t} δ_r (r rational in the infimum). Then φ(t) is decreasing in t and

(14.19)    F^t(x) = F(x + φ(t))

for all x and all positive t. Further, (14.17) implies that φ(st) = φ(s) + φ(t), so that by the theorem on Cauchy's equation [A20] applied to φ(e^x), φ(t) = −β log t, where β > 0 because φ(t) is strictly decreasing. Now (14.19) with t = e^{x/β} gives F(x) = exp{e^{−x/β} log F(0)}, and so F must be of the same type as

(14.20)    F₁(x) = exp(−e^{−x}).

Example 14.1 shows that this distribution function can arise as a limit of distributions of maxima; that is, F₁ is indeed extremal.

CASE 2. Suppose that c_{k₀} ≠ 1 for some k₀, which necessarily exceeds 1. Then there exists an x′ such that c_{k₀}x′ + d_{k₀} = x′; but (14.16) then gives F^{k₀}(x′) = F(x′), so that F(x′) is 0 or 1. (In Case 1, F has the type (14.20) and so never assumes the values 0 and 1.) Now suppose further that, in fact, F(x′) = 0.

Let x₀ be the supremum of those x for which F(x) = 0. By passing to a new F of the same type one can arrange that x₀ = 0; then F(x) = 0 for x ≤ 0 and F(x) > 0 for x > 0. The new F will satisfy (14.16), but with new constants d_k. If a (new) d_k is distinct from 0, then there is an x near 0 for which the arguments on the two sides of (14.16) have opposite signs. Therefore, d_k = 0 for all k, and

(14.21)    F^k(x) = F(c_k x)

for all k and x. This implies that F^{j/k}(x) = F(xc_j/c_k). For positive rational r = j/k, put γ_r = c_j/c_k. The definition is by (14.17) again consistent, and F^r(x) = F(γ_r x). Since 0 < F(x) < 1 for some x, necessarily positive, it follows by (14.21) that c_k is decreasing in k, so that γ_r is strictly decreasing in r. Put ψ(t) = inf_{0<r<t} γ_r for positive real t. From (14.17) follows ψ(st) = ψ(s)ψ(t), and by the corollary to the theorem on Cauchy's equation [A20] applied to ψ(e^x), ψ(t) = t^{−ξ} for some ξ > 0. Since F^t(x) = F(ψ(t)x) for all x and for t positive, F(x) = exp{x^{−1/ξ} log F(1)} for x > 0. Thus (take α = 1/ξ) F is of the same type as

(14.22)    F_{2,α}(x) = 0 if x ≤ 0,    F_{2,α}(x) = exp(−x^{−α}) if x > 0.

Example 14.2 shows that this case can arise.

CASE 3. Suppose as in Case 2 that c_{k₀} ≠ 1 for some k₀, so that F(x′) is 0 or 1 for some x′, but this time suppose that F(x′) = 1. Let x₁ be the infimum of those x for which F(x) = 1. By passing to a new F of the same type, arrange that x₁ = 0; F(x) < 1 for x < 0 and F(x) = 1 for x ≥ 0. If d_k ≠ 0, then for some x near 0, one side of (14.16) is 1 and the other is not. Thus d_k = 0 for all k, and (14.21) again holds. And again γ_{j/k} = c_j/c_k consistently defines a function satisfying F^r(x) = F(γ_r x). Since F is nondegenerate, 0 < F(x) < 1 for some x, but this time x is necessarily negative, so that c_k is increasing. The same analysis as before shows that there is a positive ξ such that F^t(x) = F(t^ξ x) for all x and for t positive. Thus F(x) = exp{(−x)^{1/ξ} log F(−1)} for x < 0, and F is of the type F_{3,α}, where

(14.23)    F_{3,α}(x) = exp(−(−x)^α) if x ≤ 0,    F_{3,α}(x) = 1 if x > 0.

Example 14.3 shows that this distribution function is indeed extremal.

This completely characterizes the class of extremal distributions:

Theorem 14.3. The class of extremal distribution functions consists exactly of the distribution functions of the types (14.20), (14.22), and (14.23).
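The three types are max-stable: each satisfies (14.15) with G = F itself for suitable a_n and b_n, which is one way to see directly that they are extremal. The sketch below is an illustration, not part of the text; it checks numerically the exact identities F₁ⁿ(x + log n) = F₁(x), F_{2,α}ⁿ(n^{1/α}x) = F_{2,α}(x), and F_{3,α}ⁿ(n^{−1/α}x) = F_{3,α}(x), with the arbitrary choices α = 2.0, x = 0.7.

```python
import math

ALPHA = 2.0  # arbitrary shape parameter

def F1(x):                       # (14.20)
    return math.exp(-math.exp(-x))

def F2(x):                       # (14.22)
    return math.exp(-x ** -ALPHA) if x > 0 else 0.0

def F3(x):                       # (14.23)
    return math.exp(-(-x) ** ALPHA) if x <= 0 else 1.0

x = 0.7
for n in (2, 10, 1000):
    print(n,
          round(F1(x + math.log(n)) ** n - F1(x), 12),
          round(F2(n ** (1 / ALPHA) * x) ** n - F2(x), 12),
          round(F3(n ** (-1 / ALPHA) * (-x)) ** n - F3(-x), 12))
```

Each printed difference is zero up to floating-point error, reflecting the fact that the identities hold exactly, not just in the limit.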
It is possible to go on and characterize the domains of attraction. That is, it is possible for each extremal distribution function F to describe the class of G satisfying (14.15) for some constants a_n and b_n, the class of G attracted to F.†

† This theory is associated with the names of Fisher, Fréchet, Gnedenko, and Tippett. For further information, see GALAMBOS.

PROBLEMS

14.1. The general nondecreasing function F has at most countably many discontinuities. Prove this by considering the open intervals (sup_{u<x} F(u), inf_{v>x} F(v)); each nonempty one contains a rational.

14.2. For distribution functions F, the second proof of Theorem 14.1 shows how to construct a measure μ on (R¹, ℛ¹) such that μ(a, b] = F(b) − F(a). (a) Extend to the case of bounded F. (b) Extend to the general case. Hint: Let F_n(x) be −n or F(x) or n as F(x) < −n or −n ≤ F(x) ≤ n or n ≤ F(x). Construct the corresponding μ_n and define μ(A) = lim_n μ_n(A).

14.3. Suppose that X assumes positive integers as values. Show that P[X > n + m | X > n] = P[X > m] if and only if X has a Pascal distribution: P[X = n] = q^{n−1}p, n = 1, 2, .... This is the discrete analogue of the exponential distribution.
]00
14.4.
MEASURE
(a) Suppose that X has a continuous, strictly increasing distribution
function F. Show that the random variable F( X) is uniformly distributed over the unit interval in the sense that P[ F( X) < u ] == u for 0 � u < 1. Passing from X to F( X) is called the probability transformation. (b) Show that the function q>( u ) defined by ( 1 4.5) satisfies F( q>( u ) - ) < u < F( q> ( u)) and that, if F is continuous (but not necessarily strictly increas ing), then F( q>( u)) == u for 0 < u < 1. (c) Show that P[ F( X) < u] == F( q> ( u ) - ) and hence that the result in part (a) holds as long as F is continuous.
14.5.
f Let C be the set of continuity points of F. (a) Show that for every Borel set A , P[ F( X) e A , X e C] is at most the Lebesgue measure of A. (b) Show that if F is continuous at each point of F" 1A , then P[ F( X) e A ] is at most the Lebesgue measure of A.
14.6.
A distribution function F can be described by its derivative or density f = F', provided that f exists and can be integrated back to F. One thinks in terms of the approximation
( 14.24)
P[ x < X < x
+
d.x ]
=
f( x ) dx.
(a) Show that, if X has distribution function with density f and g is
14.7.
increasing and differentiable, then the distribution function of g( X) has density /(g - 1 (x))/g'(g - 1 (x)). (b) Check part (a) of Problem 14.4 under the assumption that F is differentiable. (a) Suppose that X1 , , Xn are independent and that each has distribu tion function F. Let � kJ be the k th order statistic, the k th smallest among the X;. Show that � k > bas distribution function •
•
•
-
n
Gk ( X ) == E k
u-
( : ) ( F( X ) )
u
( 1 - F( X ) )
n-u
.
(b) Assume that F has density f and show that Gk has density
;:
F( x )) k - I /( x ) ( 1 - F( x )) " - k ( ( k - 1) n - k ) !
14.8.
•
Interpret this in terms of (14.24). Hint: Do not differentiate directly. Let A be the event that k - 1 of the X; lie in ( - oo , x], one lies in (x, x + h ], and n - k lie in (x, oo) . Show that P[x < � k ) < x + h ] - P (A ) is non negative and does not exceed the probability that two of the X, lie in (x, x + h ]. Estimate the latter probability, calculate P(A ) exactly, and let h go to 0. Show for every distribution function F that there exist distribution func tions F, such that F, � F and (a) F, is continuous; (b) F, is constant over each interval (( k - 1) n - 1 , kn - 1 ], k 0, ± 1, ± 2, . . . . =
201 14.9. The Levy distance d( F, G) between two distribution functions is the in fimum of those £ such that G(x - £) - £ < F( x) < G(x + £) + £ for all x. Verify that this is a metric on the set of distribution functions. Show that a necessary and sufficient condition for F, � F is that d( F, , F) --+ 0. 14.10. 12.3 f A Borel function satisfying Cauchy's equation [A20] is automati cally bounded in some interval and hence satisfies l(x) xl(1) Hint: Take K large enough that A[x: x > 0, ll(x) l < K ] > 0. Apply Problem 12.3 and conclude that I is bounded in some interval to the right of 0. 14. 1 1. f Consider sets S of reals that are linearly independent over the field of rationals in the sense that n 1 x1 + · · · + n k xk == 0 for distinct points X; in S and integers n; (positive or negative) is impossible unless n; 0. (a) By Zorn's lemma find a maximal such S. Show that it is a Hamel basis. That is, show that each real x can be written uniquely as x == n 1 x 1 + · · · + n k xk for distinct points X; in S and integers n; . (b) Define I arbitrarily on S and define it elsewhere by I( n 1 x 1 + · · · + n k xk ) == n 1 l(x 1 ) + · · · + n k l(xk ) Show that I satisfies Cauchy's equation but .need not satisfy l(x) = xl(1). (c) By means of Problem 14.10 give a new construction of a nonmeasur able function and a nonmeasurable set. SECTION
14 .
DISTRIBUTION FUNCTIONS
=
.
=
.
14.12. 14.9 f
(a) Show that if a distribution function F is everywhere continu
ous, then it is uniformly continuous. (b) Let BF (£) == sup[ F(x) - F(y): l x - Y l < £] be the modulus of con tinuity of F. Show that d( F, G) < £ implies that supx I F( x) - G ( x) I < £ + BF ( £). (c) Show that, if F, � F and F is everywhere continuous, then F, (x) --+ F( x) uniformly in x. What if F is continuous over a closed interval? 14.13. Show that (14.22) and (14.23) are everywhere infinitely differentiable, al though not analytic.
CHAPTER
3
Integration
SECTION 15. THE INTEGRAL
Expected values of simple random variables and Riemann integrals of continuous functions can be brought together with other related concepts under a general theory of integration, and this theory is the subject of the present chapter. Definition
Throughout this section, f, g, and so on will denote real measurable functions, the values ± oo allowed, on a measure space (0, ofF, p.). The object is to define and study the definite integral
Suppose first that f is nonnegative. For each finite decomposition of D into �sets, consider the sum
( 15 .1 ) In computing the products here, the conventions about infinity are
( 15 .2 ) 202
{ X0
. 00 = 00 .
0 = 0,
00 = 00 · X = 00 if 0 < X < 00 , 00 . 00 = 00 . •
{ A; }
SECTION
15.
203
THE INTEGRAL
The reasons for these conventions will become clear later. Also in force are the conventions of Section 10 for sums and limits involving infinity; see (10.3) and (10.4). If A; is empty, the infimum in (15 . 1) is by the standard convention oo ; but then p.(A;) = 0, so that by the convention (15.2), this term makes no contribution to the sum (15.1). The integral of f is defined as the supremum of the sums (15.1): (15 .3) The supremum here extends over all finite decompositions JEsets. For general /, consider its positive part,
its
(15 .5)
of 0 into
if 0 < / ( w ) < 00 ' if - oo < / ( w ) < 0
(15 .4) and
{ A; }
negative part, if - 00 < / ( w ) < 0, if 0 < / ( w ) < 00 .
These functions are nonnegative and measurable, and f = j+ - f- . The general integral is defined by ( 15 .6) unless f j+ dp. = f /- dp. = oo, in which case f has no integral. If f j+ dp. and f /- dp. are both finite, f is integrable, or integrable p., or summable, and has (15.6) as its definite integral. If f j+ dp. = oo and f 1- dp. < oo , f is not integrable but is in accordance with (15.6) assigned + oo as its definite integral. Similarly, if f j dp. < oo and f /- dp. = oo, f is not integrable but has definite integral - oo. Note that f can have a definite integral without being integrable; it fails to have a definite integral if and only if its positive and negative parts both have infinite integrals. The really important case of (15.6) is that in which f j+ dp. and f f- dp. are both finite. Allowing infinite integrals is a convention that simplifies the statements of various theorems, especially theorems involving nonnegative functions. Note that (15.6) is defined unless it involves "oo - oo; " if one term on the right is oo and the other is a finite real x, the difference is defined by the conventions oo - x = oo and x - oo = - oo.
204
INTEGRATION
The extension of the integral from the nonnegative case to the general case is consistent: (15.6) agrees with (15.3) if f is nonnegative because then
!-= 0.
Nonnegative Functions
It is convenient first to analyze nonnegative functions.
If f = E;x ; IA is a nonnegative simple function, { A; } being a finite decomposition of 0 into �sets, then f fdp. = E;x ;P. ( A;). (ii) If 0 � /( w ) � g(w) for all w, then f fdp. < f gdp.. (iii) If 0 � /n (w) j /( w) for all w, then 0 < f fdp. i f fdp.. (iv) For nonnegative functions f and g and nonnegative constants a and {J, j ( af + {Jg) dp. = af fdp. + {Jf g dp..
Theorem 15.1
(i)
I
In part (iii) the essential point is that f fdp. = lim n f In dp., and it is important to understand that both sides of this equation may be oo . If In = /A and f = /A , where A n i A, the conclusion is that p. is continuous from below (Theorem 10.2(i)): li m n p.(A n ) = p.(A); this equation often takes the form oo = oo . II
(i). Let { B1 } be a finite decomposition of 0 and let P1 be the infimum of f over B1 . If A ; n B1 ¢ 0, then {J1 s x;; therefore, E1P1p.(B1 ) = E;1p1p.((A ; n B1 ) s E;1x;p.(A; n B1 ) = E;x;p.(A;) · On the other hand, there • is equality here if { B1 } coincides with { A; }. PROOF OF
PROOF OF
by g.
(ii). The sums (15.1) obviously do not decrease if f is replaced
•
(iii) . By (ii) the sequence f In dp. is nondecreasing and bounded above by f f dp.. It therefore suffices to show that f f dp. S lim n f In dp., or that PROOF OF
m lim n /fn dfl > S = L V;fl ( A; )
( 15.7 )
i- 1
is any decomposition of 0 into �sets and V; = inf w e A f(w) . In order to see the essential idea of the proof, which is quite simple , suppose first that S is finite and all the v; and p.(A;) are positive and finite . Fix an £ that is positive and less than each v; , and put A;n = [w E A ;: fn ( w) > V; £]. Since In i /, A ;n i A;. Decompose 0 into A1 n , . . . , A m n and if
A1,
•
•
•
, Am -
I
SECTION
15.
THE INTEGRAL
the complement of their union, and observe that
( 15 .8 )
jf, dfl > iE- 1 ( v; - £ ) #' ( A ; n ) -+ iE= l ( v; - £ ) �L ( A; ) m
m
m
= s - £ E #' ( A ; ) . i- 1
Since the #' ( A ; ) are all finite, letting £ -+ 0 gives (15.7). Now suppose only that S is finite. Each product V; #L( A ; ) is then finite; suppose it is positive for i < m 0 and 0 for i > m 0• (Here m 0 � m; if the product is 0 for all i, then S = 0 and (15.7) is trivial.) Now v; and #'(A ; ) are positive and finite for i < m 0 (one or the other may be oo for i > m 0 ). Define A ; n as before, but only for i < m 0• This time decompose 0 into .A1 n , . . . , A m o n and the complement of their union. Replace m by m0 in (15.8) and complete the proof as before. Finally, suppose that S = oo . Then V;0 #'(A;0 ) = oo for some i0, so that v10 and #' ( A ; ) are both positive and at least one is oo . Suppose 0 < x < V; 0 0 � oo and 0 < y < #'(A; ) � oo, and put A; n = [w E A; : fn (w) > x]. Since 0 0 0 /,. f /, A;0 n i A;0 ; hence #'(A;0 n ) > y for n exceeding some n 0• But then (decompose 0 into A;0 n and its complement) f In dl' > X�L ( A ;0 n ) > xy for n > n 0 , and therefore li m n / In dl' > xy. If V;0 = oo, let x -+ oo , and if p( A;0 ) = oo, let y -+ oo . In either case (15.7) follows: lim n f In dl' = oo . • PROOF OF (iv). Suppose at first that f = E ; x ; IA and g = Ej.y./8 are j simple. Then a/ + {Jg = E ;1 (ax; + {Jy1 )IA , n B1 , and so J
I
j( af + fJg ) dfl = L. ( ax; + fJyj ) ll ( A; n Bi ) I)
j
= a E X ;fl ( A; ) + fJ LYifl ( Bi ) = a fdfl + I
}
fJ Jg dfl .
Note that the argument is valid if some of a , {J , X;, y1 are infinite. Apart from this possibility, the ideas are as in the proof of (5.18). For general nonnegative f and g, there exist by Theorem 13.5 simple functions In and Bn such that 0 S In i f and 0 < Bn i g . But then 0 < a.fn + /lg,. i af + {J g and f ( afn + {Jgn ) dl' = a f In dl' + {Jf 8n d#', so that (iv) fol lows via (iii) . •
15.1, the expected values of simple random Variables in Chapter 1 are integrals: E [ X] = f X( w)P( d w) . This also covers By part (i) of Theorem
206
INTEGRATION
the step functions in Section 1 (see (1.5)). The relation between the Riemann integral and the integral as defined here will be studied in Section 17.
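Before turning to examples, here is a small illustrative sketch, not from the text, of how the sums (15.1) behave: for f(x) = x² on Ω = (0, 1] with Lebesgue measure it evaluates (15.1) for the level-set decompositions A_{n,k} = [ω: (k−1)2^{−n} ≤ f(ω) < k2^{−n}], whose values increase toward ∫f dλ = 1/3 in accordance with (15.3) and Theorem 15.1(iii).

```python
import math

def lower_sum(n):
    # the sum (15.1) for the decomposition by the level sets
    # A_{n,k} = [x in (0,1]: (k-1)/2^n <= x^2 < k/2^n], k = 1, ..., 2^n,
    # on which inf f = (k-1)/2^n; the measure of A_{n,k} under Lebesgue
    # measure is sqrt(min(k/2^n, 1)) - sqrt(min((k-1)/2^n, 1))
    total = 0.0
    for k in range(1, 2 ** n + 1):
        lo, hi = (k - 1) / 2 ** n, k / 2 ** n
        measure = math.sqrt(min(hi, 1.0)) - math.sqrt(min(lo, 1.0))
        total += lo * measure
    return total

for n in (2, 4, 8, 12):
    print(n, round(lower_sum(n), 6))   # increases toward 1/3
```

The successive values are nondecreasing and converge to 1/3, which is the supremum (15.3) in this case.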
Consider the line ( R 1 , 91 1 , A) with Lebesgue measure. Suppose that - oo < a 0 � a 1 S · · · < a m < oo and let f be the function with nonnegative value X; on (a; _ 1 , a;], i = 1, . . . , m , and value 0 on ( - oo, a 0 ] and ( a m , oo). By part (i) of Theorem 15.1, f fdA = E7' 1 x;(a,. a ; _ 1 ) because of the convention 0 · oo = O-see (15.2). If the "area under the curve" to the left of a 0 and to the right of am is to be 0, this convention is inevitable. It must then work the other way as well : oo · 0 = 0, so that f fdA = 0 if f is oo at a single point (say) and 0 elsewhere. If f = I( a , oo ) ' the area-under-the-curve point of view makes f fdp. = oo natural. Hence the second convention in (15.2), which also requires that the • integral be infinite if f is oo on a nonempty interval and 0 elsewhere. Example 15. 1.
Recall that
almost everywhere means outside a set of measure 0.
Theorem 15.2. Suppose that f and g are nonnegative. (i) Iff = 0 almost everywhere, then f fdp. = 0. (ii) If p.(w: /(w) > 0] > 0, then f fdp. > 0.
If f I dp. < oo, then f < oo almost everywhere. Iff < g almost everywhere, then ffdp. < f g dp.. If I = g almost everywhere, then f fdp. = f gdp.. Suppose that f = 0 almost everywhere. If A; meets [ w: /( w ) = 0], PROOF. then the infimum in (15.1) is 0; otherwise, p. ( A ; ) = 0. Hence each sum (15.1) is 0, and (i) follows. If A = [w: /(w) > £], then A( i [w: /(w) > 0] as £ � 0, so that under the hypothesis of (ii) there is a positive £ for which p. ( A( ) > 0. Decomposing 0 into A ( and its complement shows that f fdp. > £p.(A() > 0. If p.[/ = oo ] > 0, decompose D into [/ = oo ] and its complement: f fdp. � oo · p.[/ = oo] = oo by the conventions. Hence (iii) . To prove (iv), let G = [/ < g]. For any finite decomposition { A 1 , , A m } (iii) (iv) (v)
(
•
of 0,
[ ] p. ( A ; ) = E [ in�I ] p. ( A ; n G )
E inf I A,
A,
s
[
E inf
A, n G
]
•
•
I p. ( A ; n G )
SECTION
15.
207
THE INTEGRAL
where the last inequality comes from a consideration of the decomposition A 1 n G, . . . , A m n G, Gc. This proves (iv), and (v) follows immediately. • Suppose that I = g almost everywhere, where I and g may not be nonnegative. If I has a definite integral, then since I+ = g + and 1- = g almost everywhere, it follows by Theorem 15 . 2(v) that g also has a definite integral and f l dp. = f g dp. . Uniqueness Although there are various ways to frame the definition of the integral, they are all equivalent- they all assign the same value to lldp. . This is because the integral is uniquely determined by certain simple properties it is natural to require of it. It is natural to want the integral to have properties (i) and (iii) of Theorem 15.1. But these uniquely determine the integral for nonnegative functions: For I nonnega tive, there exist by Theorem 13.5 simple functions In such that 0 < In f I; by (iii) , I Idp, must be limn I In" dp. , and (i) determines the value of each I In dp.. Property (i) can itself by derived from (iv) (linearity) together with the assumption that I lA dp. == p. ( A ) for indicators /A : I (E; x; IA ) dp. == E ; x; l lA dp. == E; X;p. ( A ; ). If (iv) of Theorem 15.1 is to persist when the integral is extended beyond the class of nonnegative functions, I1 dp. must be I
I
PROBLEMS These problems outline alternative definitions of the integral. In all of them I is assumed measurable and nonnegative. Call (15.3) the lower integral and write it as
{15 .9) to
distinguish it from the upper integral
(15 .10) The infimum in (15.10), like the supremum in (15.9), extends over all finite partitions { A; } of Q into Jli:sets. 15.1.
Show that I * ldp. == oo if p. [ w : l( w ) > 0] == oo or if p. [ w : l( w ) > x ] all (finite) x .
As there are many I 's of this last kind that ought to be integrable,
>
0 for
(15.10) is
inappropriate as a definition of Ildp,. The only trouble with (15.10), however, is that
INTEGRATION
it treats infinity incorrectly. To see this, and to focus on essentials, assume in the following problems that p.(O) < oo and that I is bounded and nonnegative. 15.2.
f (a) Show that
� [��/("')] I'( A, ) < 7 [���/("') ] I'( B; )
{ B1 } refines { A ; }. Prove a dual relation for the sums in (15.10) and conclude that I * ldp. s I * ldp.. (b) Suppose that M bounds I and consider the partition A ; == [ w : i£ � if
I< w ) < ( i + 1)£], i that
0, . . . , N, where N is large enough that N£ > M. Show
=
Conclude that
( 15.11 ) To define the integral as the common value in (15.11) is the Darboux- Young approach. The advantage of (15.3) as a definition is that it applies at once to unbounded I and infinite p.. 15.3. 3.2 15.2 f For A c Q, define p.*(A) and p. .( A ) by (3.9) and (3.10) with p. in place of P. Suspend for the moment the assumption that I is measurable and show that I * lA dp. == p.*( A ) and I * lA dp. == p. .( A ) for every A . Hence (15.11) can fail if I is not measurable. (Where was measurability used in the proof of (15.11)?) Even though the definition (15.3) makes formal sense for all nonnegative functions, it is thus idle to apply it to nonmeasurable ones. Assume again that I is measurable (as well as nonnegative and bounded, and that p. is finite). 15.4.
f Show that for positive £ there exists a finite partition { A } such that , if { � } is any finer partition and w1 e � , then ;
Jfd#L - 'L, /( "'i ) #L ( B; )
< f.
j
15.5.
f
Show that fd" == lim n
f
•
n 2" k -
L
k-l
2"
1
[P. "'
:
k-
2"
1
< t< "'>
k
< 2"
]
·
The limit on the right here is Lebesgue 's definition of the integral.
SECTION
15.6.
16.
PROPERTIES OF THE INTEGRAL
209
↑ Suppose that the integral is defined for simple functions by ∫(Σ_i x_i I_{A_i}) dμ = Σ_i x_i μ(A_i). Suppose that f_n and g_n are simple and nondecreasing and have a common limit: 0 ≤ f_n ↑ f and 0 ≤ g_n ↑ f. Adapt the arguments used to prove Theorem 15.1(iii) and show that lim_n ∫f_n dμ = lim_n ∫g_n dμ. Thus ∫f dμ can (Theorem 13.5) consistently be defined as lim_n ∫f_n dμ for simple functions for which 0 ≤ f_n ↑ f.

SECTION 16. PROPERTIES OF THE INTEGRAL
Equalities and Inequalities
f is that f f+ dp. and ff- dp. both be finite, which is the same as the requirement that f /+ dp. + Jr dp. < oo and hence is the same as the requirement that / ( /+ + f- ) dp. < oo (Theorem 15.1(iv)). Since /+ + f+ = 1/1 , f is integrable if and only if
By definition, the requirement for integrability of
jill dp. < 00 .
( 16.1)
l g l almost everywhere and g is integrable, then integrable as well. If p.(D) < oo , a bounded f is integrable. It follows that if
11leorem
1/ 1
16.1 (i)
everywhere, then
<
f is
Monotonicity : Iff and g are integrable and/ s g almost
jIdp. s jg dp. .
( 16 .2 )
Linearity: Iff and g are integrable and a, P are finite real numbers, then af + pg is integrable and (ii)
j ( af + Pg ) dp. = a jfdp. + P jg dp..
( 1 6.3) PROOF
OF
PROoF
OF
(i). For nonnegative f and g such that f < g almost everywhere, (16.2) follows by Theorem 15.2(iv). And for general f and g, if f < g almost everywhere, then j+ < g + and /- > g - almost everywhere, and so (16.2) follows by the definition (15.6). • (ii). First,
af + pg is integrable because, by Theorem 15.1,
j Ia/ + Ps i dp. s j (Ia I · Ill + I P I · l s i ) dp. = I a I j Ill dp. + I P I j l s i dp. < oo .
210
INTEGRATION
By Theorem 15.1(iv) and the definition (15.6), f (a/) dp. = af f dp.-consider separately the cases a > 0 and a < 0. Therefore, it is enough to check (16.3) for the case a = P = 1. By definition, ( / + g ) + - ( / + g ) - = f + g = f+ 1- + g + - g- and therefore ( I + g ) + + /- + g - = ( I + g ) - + /+ + g + . All these functions being nonnegative, f ( f + g ) + dp. + f /- dp. + f g - dp. f ( / + g ) - dp. + f /+ dp. + f g + dp., which can be rearranged to give f ( / + g) + dp. - f ( I + g ) - dp. = f /+ dp. - f /- dp. + f g + dp. - f g - dp.. But this reduces to (16.3). • =
Since - 1/1
<
f < 1/1 , it follows by Theorem 16.1 that
(16 .4) for integrable f. Applying this to integrable f and g gives (16 .5) Suppose that 0 is countable, that � consists of all the subsets of D, and that p. is counting measure: each singleton has measure 1. To be definite, take D = { 1, 2, . . . }. A function is then a sequence x1, x 2 , If xnm is xm or 0 as m s n or m > n , the function corresponding to xn l ' Xn 2 ' . . . has integral E:, _ l xm by Theorem 15.1(i) (consider the decom position { 1 }, . . . , { n }, { n + 1, n + 2, . . . }). It follows by Theorem 15.1(iii) that in the nonnegative case the integral of the function given by { x m } is the sum E mxm (finite or infinite) of the corresponding infinite series. In the general case the function is integrable if and only if E: _ 1 1 x m 1 is a convergent infinite series, in which case the integral is E:_ 1 x� - E:_ 1 x� . The function xm = ( - 1) m + l m - 1 is not integrable by this definition and even fails to have a definite integral, since E:_ 1 x� = E:_ 1 x m = oo . This invites comparison with the ordinary theory of infinite series, according to which the alternating harmonic series does converge in the sense that lim M E !'_ 1 ( - 1) m + l m - 1 = log 2. But since this says that the sum of the first M terms has a limit, it requires that the elements of the space 0 be ordered. If D consists not of the positive integers but, say, of the integer lattice points in 3-space, it has no canonical linear ordering. And if E mxm is to have th e same finite value no matter what the order of summation, the series must be absolutely convergent.t This helps to explain why f is defined to be • integrable only if f /+ dp. and f f- dp. are both finite. Example 16.1.
•
t RUDIN , p. 76.
•
•
•
16. PROPERTIES OF THE INTEGRAL 211 Example 16.2. In connection with Example 15.1, consider the function I = 3/( a , oo ) - 2 /( - oo , a ) · There is no natural value for f fdA (it is "oo - oo " ), and none is assigned by the definition. If a function f is bounded on bounded intervals, then each function I, = fl( - n , n > is integrable with respect to A. Since f = lim n fn , the limit of I In dA, if it exists, is sometimes called the " principal value" of the integral of f. Although it is natural for some purposes to integrate symmetrically about the origin, this is not the right definition of the integral in the context of general measure theory. The functions gn = fl( - n , n + 1 > for example also converge to f, and f gn dA may have some other limit, or none at all; f(x) = x is a case in point. There is no general reason why In should take precedence over gn . As in the preceding example, I = Er_ 1 ( - 1) kk - 1I( k , k + 1 has no integral, > even though the f In dA above converge. • SECTION
Integration to the Limit
The first result, the Theorem 15.1(iii).
monotone convergence theorem, essentially restates
If 0 S In i f almost everywhere, then f In dp. i f / dp. . If 0 < In j f on a set A with p.(Ac) = 0, then 0 < fn/A i f/A holds
1beorem 16.2. PROOF.
everywhere, and it follows by Theorem 15.1(iii) and the remark following Theorem 15.2 that f In dp. = fIn/A dp. i f f/A dp. = f fdp.. • As the functions in Theorem 16.2 are nonnegative almost everywhere, all the integrals exist. The conclusion of the theorem is that lim n f In dp. and I Idp. are both infinite or both finite and in the latter case are equal. Consider the space { 1, 2, . . . } together with counting measure, as in Example 16.1. If for each m, 0 < x n m j x m as n --+ oo, then lim n E m x n m = E m x m , a standard result about infinite series. • Example 16.3.
If p. is a measure on � and �0 is a a-field contained in S', then the restriction p.0 of p. to �0 is another measure (Example 10.4). If / == /A and A e �0, then Example 16.4.
the common value being p. ( A ) = p. 0 ( A ). The same is by linearity true for
212
INTEGRATION
nonnegative simple functions measurable �0• It holds by Theorem 16.2 for all nonnegative I measurable j&'0 because (Theorem 13.5) 0 < In i I for simple functions In measurable j&'0• For functions measurable �0, integra tion with respect to p. is thus the same thing as integration with respect to •
P. o·
In this example a property was extended by linearity from indicators to nonnegative simple functions and thence to the general nonnegative func tion by a monotone passage to the limit. This is a technique of very frequent application. For a finite or infinite sequence of measures P.n on j&', p.(A) = LnP.n ( A) defines another measure (countably additive because [A27] sums can be reversed in a nonnegative double series). For indicators I, Example 16.5.
and by linearity the same holds for simple I > 0. If 0 < lk i I for simple lk, then by Theorem 16.2 and Example 16.3, f I dp. = lim k f lk dp. lim k En f lk dp.n = En lim k f lk dp.n = En f ldP.n · The relation in question thus holds for all nonnegative 1. • =
An important consequence of the monotone convergence theorem is
Fatou 's lemma: Theorem 16.3.
For nonnegative In,
If gn = infk 'il!:. nlk , then 0 < gn j g = lim infnln , and the preceding two theorems give f In dp. � f 8n dp. -+ f g dp.. • PROOF.
.
On ( R1 , !1/ 1 , A), In = n 2/(o, n -t ) and I = 0 satisfy ln ( x ) -+ l ( x ) for each x, but / IdA = 0 and fln d A = n -+ oo . This shows that the inequality in (16.6) can be strict and that it is not always possible to integrate to the limit. This phenomenon has been encountered before; see • Examples 5.6 and 7.7. EXIIIple II 16. 6.
16. PROPERTIES OF THE INTEGRAL 213 Fatou' s lemma leads to Lebesgue 's dominated convergence theorem: SECTION
f
If Ifn i s g almost everywhere, where g is integrable, and if In f almost everywhere, then f and the In are integrable and fIn dp. f fdp.. PROOF. From the hypothesis 1/n l s g it follows that all the In as well as I * = lim supn In and f * = lim inf In are integrable. Since g + In and g - In neorem 16.4. -+
-+
are nonnegative, Fatou' s lemma gives
and
.S:
j
j
j
lim inf In dp. . n ( g - In ) dp. = g dp. - lim sup n
Therefore (16 . 7) .S:
j
j
lim sup In dp. .S: lim sup In dp. . n n
The only assumption used thus far is that the In are dominated by an integrable g. If In f almost everywhere, the extreme terms in (16.7) agree • with f fdp.. -+
Example 16.6 shows that this theorem can fail if no dominating EXIIIple II 16. 7.
g exists.
The Weierstrass M-test for series. Consider the space
{ 1, 2, . . . } together with counting measure, as in Example 16.1. If l x n m l < M, and E m Mm < oo, and if lim n x n m = x m for each m, then lim n E m x n m = E,.. x m . This follows by an application of Theorem 16.4 with the function given by the sequence M1 , M2 , in the role of g. This is another standard • result on infinite series [A28]. •
•
•
214
INTEGRATION
The next result, the bounded convergence theorem, is a special case of Theorem 16.4. It contains Theorem 5.3 as a further special case.
If p.(D) < oo and the In are uniformly bounded, then In --+ f almost everywhere implies fIn dp. --+ f Idp.. Theorem 16.5.
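As a concrete illustration, not from the text, of the bounded convergence theorem: take Ω = (0, 1] with Lebesgue measure and f_n(x) = xⁿ. The f_n are uniformly bounded by 1 and converge to 0 except at the single point x = 1, and ∫f_n dλ = 1/(n + 1) → 0 = ∫ lim_n f_n dλ. The sketch below checks this with a midpoint sum as a stand-in for the integral.

```python
def integral_xn(n, grid=100_000):
    # midpoint sum approximating the integral of x^n over (0, 1];
    # the exact value is 1/(n + 1)
    h = 1.0 / grid
    return sum(((i + 0.5) * h) ** n for i in range(grid)) * h

for n in (1, 5, 50, 500):
    print(n, round(integral_xn(n), 6), round(1 / (n + 1), 6))
```

The integrals go to 0 along with the limit function's integral, exactly as Theorem 16.5 guarantees.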
The next two theorems are simply the series versions of the monotone and the dominated convergence theorems. Theorem 16.6.
If fn > 0, then J Enfn dp. = En / fn dp..
The members of this last equation are both equal either to oo or to the same finite, nonnegative real number.
If En fn converges almost everywhere and IE� == tfk l < g almost everywhere, where g is integrable, then En fn and the In are integrable and f En ln dp. = En fIn dp.. Theorem 16.7.
If Enf Ifni dp. < oo , then Enfn converges absolutely almost ev erywhere and is integrable, and f En fn dp. = En fIn dp.. PROOF. The function g = En Ifni is integrable by Theorem 16.6 and is finite almost everywhere by Theorem 15.2(iii). Hence Enl/nl and En/n converge • almost everywhere and Theorem 16.7 applies. Corollary.
In place of a sequence { In } of real measurable functions on ( Q, !F, p. ) , consider a family [/,: 1 > 0] indexed by a continuous parameter 1. Suppose of a measurable f that ( 16.8) on a set A , where { 16.9)
lim /,( w ) == f( w )
t� oo
A e !F ,
A technical point arises here, since holds:
p. ( O - A ) == O. !F
need not contain the w-set where (16.8)
Ex1111le 1p 16.8. Let !F consist of the Borel subsets of Q == (0, 1], and let H be a nonmeasurable set-a subset of Q that does not lie in !F (see the end of Section 3). Define /, ( w ) == 1 if w equals the fractional part t [ 1] of 1 and their common value lies in H('; define /, ( w ) == 0 otherwise. Each /, is measurable !F, but if f( w ) 0, • then the w-set where (16.8) holds is exactly H. -
=
Because of such examples, the set A above must be assumed to lie in � (Because of Theorem 13.4, no such assumption is necessary in the case of sequences.)
215 The assumption that (16.8) holds on a set A satisfying (16. 9) can still be expressed as the assumption that (16.8) holds almost everywhere. If I, == f /, dp. converges to I == f f dp. as t --+ oo , then certainly I,n --+ I for each sequence { t, } going to infinity. But the converse holds as well: if I, does not converge to I, then there is a positive £ such that 1 1, n - II > £ for a sequence { tn } going to infinity. To the question of whether l,n converges to I the previous theorems apply. Suppose that (16.8) holds almost everywhere and that 1/, 1 < g almost every where, where g is integrable. By the dominated convergence theorem, f must then be integrable and I, --+ I for each sequence { tn } going to infinity. It follows that f/, dp. --+ f f dp.. In this result t could go continuously to 0 or to some other value instead of to infinity. SECTION
Theorem 16.8.
( a, b).
16.
PROPERTIES OF THE INTEGRAL
Suppose that f( w, t) is a measurable function of w for each t in
(i) Suppose that /( w, t) is almost everywhere continuous in t at t0; suppose
further that, for each t in some neighborhood of t0, 1/(w, t) l < g( w) almost every where, where g is integrable. Then f f( w, t)p.( dw) is continuous in t at t0• (ii) Suppose that for w e A , where A satisfies (16.9), /( w , t) has in (a, b) a derivative / '( w, t) with respect to t; suppose further that 1/ '( w, t) I < g( w ) for w e A and t e ( a , b), where g is integrable. Then f f( w, t)p.(dw) has derivative f /' ( w, t) p. ( d w) on (a, b). Part (i) is an immediate consequence of the preceding discussion. To prove part (ii), consider a fixed t. By the mean value theorem, PllOOF.
/( w , I + h ) - /( w , I) h
== / ' ( w , s ) '
where s lies between t and t + h. The left side goes to /'( w, t) as h --+ 0 and is by the hypothesis dominated by the integrable function g( w ) . Therefore,
� [ J!( w , t + h ) p.( dw ) - jf( w , t ) p. ( dw ) ] --+ jf' ( w , t) p. ( dw ) .
•
The condition involving g can be localized. It suffices to assume that for each t there is an integrable g( w, t) such that 1/ '( w, s) 1 < g( w, t) for all s in some neighborhood of t. Integration over Sets The integral of I over a set A in � is defined by
(16 . 10)
j fdp. = jJAfdp. . A
The definition applies if I is defined only on A in the first place (set I = 0
outside A ). Notice that
fA idp. = 0 if p. ( A ) = 0.
216
INTEGRATION
All
the concepts and theorems above carry over in an obvious way to integrals over A . Theorems 16.6 and 16.7 yield this result:
If A1, A 2, are disjoint, and if f is either nonnegative or integrable, then fu, A , fdp. = En fA , fdp.. 16.9. The integrals (16.10) suffice to determine /, provided p. 11 EX111ple is a-finite. Suppose that f and g are nonnegative and that fAfdp. < fAgdp. for all A in !F. If p. is a-finite, there are jli:sets A n such that A n i 0 and p. ( A n ) < oo . If Bn = [0 < g < f, g S n ], then the hypothesized inequality applied to A n n Bn implies fA , , fdp. s fA , , gdp. < 00 (since A n n Bn has finite measure and g is bounded there) and hence f IA , ,(I - g) dp. = Theorem 16.9.
• • •
nB
nB
n8
0. But then by Theorem 15.2(ii), the integrand is 0 almost everywhere, and so p. ( A n n Bn ) = 0. Therefore, p.(O s g < f, g < oo] = 0, so that I < g almost everywhere:
Iff and g are nonnegative and fAfdp, = fAg dp, for all A in � ' and if p. is a-.finite, then f = g almost everywhere. (i)
There is a simpler argument for integrable functions; it does not require that the functions be nonnegative or that p, be a-finite. Suppose that f and g are integrable and that fAfdp, < fAgdp. for all A in �. Then f - g) dp. = 0 and hence p,[g < / ] = 0 by Theorem 15.2(ii):
Ils
If f and g are integrable and fAfdp, = fA.g dp, for all A in � ' then f = g almost everywhere. (iii) Iff and g are integrable and fA fdp, = fA gdp, for all A in 9', where 9' is a 'IT-system generating � and D is a finite or countable union of 9'-sets, then f = g almost everywhere. If f and g are nonnegative, this follows from Theorem 10.4 and (ii) above. For the general case, first prove that /+ + g - = f- + g + almost (ii)
everywhere.
Densities
Suppose that 8 is a nonnegative function and define a measure " by (Theorem 16.9) (16.11)
v(A)
=
L8 dfl ,
A
e � '·
8 is not assumed integrable with respect to p.. Many measures arise in this way. Note that p. ( A ) = 0 implies that 11(A) = 0. Clearly, " is finite if and
217 16. PROPERTIES OF THE INTEGRAL only if 8 is integrable p.. Another function 8' gives rise to the same " if 8 = 8 ' almost everywhere. On the other hand, J1(A) = JAB' dp. and (16.11) together imply that 8 = 8 ' almost everywhere if p. is a-finite, as was shown SECTION
in the preceding example. t p.
The measure " defined by (16.11) is said to have density 8 with respect to A density is by definition nonnegative. Formal substitution dJI = 8 dp. gives formulas (16.12) and (16.13).
1beorem 16.10.
If has density 8 with respect to p., then Jl
jfdv = jf8 dfA
( 16.12)
for nonnegative f. Moreover, f (not necessarily nonnegative) is integra ble with respect to if and only if fB is integrable with respect to p., in which case (16.12) and holds
Jl
(16.13 ) both hold. For nonnegative f, (16.13) always holds. Here fB is to be taken as 0 if f = 0 or if 8 conventions (15.2). Note that J1( 8 = 0] = 0.
= 0; this is consistent with the
If f = lA, then f fdJI = J1( A), so that (16.12) reduces to the definition (16.11). If f is a simple nonnegative function, (16.12) then follows by linearity. If f is nonnegative, fIn dJI = f fn6 dp. for the simple functions /, of Theorem 13.5, and (1�.12) follows by a monotone passage to the limit -that is, by Theorem 16:! Note that both sides of (16.12) may be infinite. Even if f is not nonnegative, (16.12) applies to 1/1 , whence it follows that I is integrable with respect to " if and only if f6 is integrable with respect to p. And if f is integrable, (16.12) follows from differencing the same result • for j+ and /-. Replacing f by fiA leads from (16.12) to (16.13). PROOF.
If J1( A) = p.(A n A0), then (16.11) holds with 6 and (16.13) reduces to fAfdJI = fA n A o fdp.. Example 16. 10.
= lA o'
•
Theorem 16.10 has two features in common with a number of theorems about integration. (i) The relation in question, (16.12) in this case, in addition to holding for integrable functions, holds for all nonnegative 'This is true also if " is a-finite;
see
Problem 16.12.
218
INTEGRATION
functions- the point being that if one side of the equation is infinite, then so is the other, and if both are finite, then they have the same value. This is useful in checking for integrability in the first place. (ii) The result is proved first for indicator functions, then for simple functions, then for nonnegative functions, then for integrable functions. In this connection, see Examples 16.4 and 16.5. The next result is Scheffe ' s theorem. Theorem 16.11.
sities 8n and 8. If
Suppose that �'n ( A) = fA Bn dP. and I'( A) = fA B dp. for den
n = 1 , 2, . . . , and if Bn .-. 8 except on a set of p.-measure 0, then ( 16.14 )
( 16.15 )
The inequality in (16.15) of course follows from (16.5). Let gn = 8 - Bn . The positive part g: of gn converges to 0 except on a set of p.-measure 0. Moreover, 0 < g: < 8 and 8 is integrable, and so the dominated convergence theorem applies: f g: dp. --+ 0. But f gn dp. = 0 by (16.14), and therefore PROOF.
• A corollary concerning infinite series follows immediately- take p. as counting measure on 0 = {1, 2, . . . } .
If E mx n m = Emxm < oo, the terms being nonnegative, and if lim n x n m = xm for each m , then lim nE mlx n m - xml = 0. If Ym is bounded, then lim nL m YmX n m = LmYmX m . Corollary.
Change of Variable
Let ( 0, � ) and ( 0', �') be measurable spaces, and suppose that the mapping T: 0 --+ D' is measurable �/� '. For a measure p. on �, define a measure p. T 1 on � ' by -
( 16 .16 )
as at the end of Section 13.
A'
E
� ',
SECTION
16.
PROPERTIES OF THE INTEGRAL
219
Suppose f is a real function on 0' that is measurable §' ', so that the composition fT is a real function on 0 that is measurable §' (Theorem 13.1(ii)). The change-of-variable formulas are (16.17) and (16.18). If A' = 0', the second reduces to the first. Theorem 16.12.
Iff is nonnegative, then
( 16 .17)
1 /( Tw ) p. ( dw ) = 1 f ( w' ) p.T- 1 ( dw' ) . O'
n
function f (not necessarily nonnegative) is integrable with respect to p. T- 1 if and only if fT is integrable with respect to p. , in which case (16.17) and
A
(}1 1A'/( Tw ) p. ( dw ) = jA'f( w' ) p.T- 1 ( dw' )
(16 .18)
hold. For nonnegative f, (16.18) always holds. PROOF. If f = JA ,, then fT = 11 tA' ' and so (16.17) reduces to the defini
tion (16.16). By linearity, (16.17) holds for nonnegative simple functions. If /, are simple functions for which 0 < i then 0 < fn T i fT, and (16.17) follows by the monotone convergence theorem. An application of (16.17) to establishes the assertion about integrabil ity, and for integrable f, (16.17) follows by decomposition into positive and negative parts. Finally, if f is replaced by f/A,, (16.17) reduces to (16.18). •
/, /,
1/1
Suppose that ( 0', $' ') = ( R 1 , !1/ 1 ) and T = cp is an ordinary real function, measurable §'. If /(x) = x, (16.17) becomes
Example
( 16.1 9
16. 11.
) p.
If cp = E ; x ; IA is simple, then P.'P- I has mass ( A ; ) at X;, and each side of • (16.19) reduces to L;X;p.(A;). I
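A simple numerical illustration of (16.19), offered only as a sketch and not taken from the text: let Ω = (0, 1] with P Lebesgue measure and φ(ω) = ω². Then Pφ⁻¹ has distribution function √x on [0, 1], hence density 1/(2√x) there, and both sides of (16.19) equal 1/3. The code evaluates the two sides by midpoint sums.

```python
import math

N = 200_000
h = 1.0 / N
mid = [(i + 0.5) * h for i in range(N)]

# left side of (16.19): integral of phi(w) = w^2 over (0, 1] against Lebesgue measure
left = sum(w ** 2 for w in mid) * h

# right side: integral of x against P phi^{-1}, which has density 1/(2 sqrt(x)) on (0, 1]
right = sum(x / (2.0 * math.sqrt(x)) for x in mid) * h

print(round(left, 6), round(right, 6))   # both approximately 1/3
```

Computing an expectation either on the original space or against the induced distribution on the line is the practical content of the change-of-variable formula.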
Uniform Integrability
1/1 /ll/l
If I is integrable, then � a] goes to 0 almost everywhere as a --+ oo and is dominated by and hence
1/1,
( 1 6.20)
lim
a
.....
oo
j[ 1/l � a )1/1 dp.
=
0.
A sequence $\{f_n\}$ is uniformly integrable if (16.20) holds uniformly in $n$:

(16.21)    $\lim_{a \to \infty} \sup_n \int_{[|f_n| \ge a]} |f_n|\,d\mu = 0$.

If (16.21) holds and $\mu(\Omega) < \infty$, and if $a$ is large enough that the supremum in (16.21) is less than 1, then

(16.22)    $\int |f_n|\,d\mu \le a\mu(\Omega) + 1$,
and hence the $f_n$ are integrable. On the other hand, (16.21) always holds if the $f_n$ are uniformly bounded, but the $f_n$ need not in that case be integrable if $\mu(\Omega) = \infty$. For this reason the concept of uniform integrability is interesting only for $\mu$ finite.

If $h$ is the maximum of $|f|$ and $|g|$, then

$\int_{[|f+g| \ge 2a]} |f + g|\,d\mu \le 2\int_{[h \ge a]} h\,d\mu \le 2\int_{[|f| \ge a]} |f|\,d\mu + 2\int_{[|g| \ge a]} |g|\,d\mu$.

Therefore, if $\{f_n\}$ and $\{g_n\}$ are uniformly integrable, so is $\{f_n + g_n\}$.
Theorem 16.13. If $\mu(\Omega) < \infty$ and $f_n \to f$ almost everywhere, and if the $f_n$ are uniformly integrable, then $f$ is integrable and $\int f_n\,d\mu \to \int f\,d\mu$.

PROOF. Since $\int |f_n|\,d\mu$ is bounded (see (16.22)), it follows by Fatou's lemma that $f$ is integrable. Put $h_n = |f_n - f|$. Then the $h_n$ are uniformly integrable and $h_n \to 0$ almost everywhere. If $h_n^{(a)} = h_n I_{[h_n < a]}$, then for each $a$, $\lim_n h_n^{(a)} = 0$ and so $\lim_n \int h_n^{(a)}\,d\mu = 0$ by the bounded convergence theorem. Now

$\int h_n\,d\mu = \int_{[h_n \ge a]} h_n\,d\mu + \int h_n^{(a)}\,d\mu$.

Given $\epsilon$, choose $a$ so that the first term on the right is less than $\epsilon$ for all $n$; then choose $n_0$ so that the second term on the right is less than $\epsilon$ for all $n \ge n_0$. This shows that $\int h_n\,d\mu \to 0$.

Suppose that
(16.23)    $\sup_n \int |f_n|^{1+\epsilon}\,d\mu < \infty$

for a positive $\epsilon$. If $K$ is the supremum here, then

$\int_{[|f_n| \ge a]} |f_n|\,d\mu \le \frac{1}{a^{\epsilon}} \int_{[|f_n| \ge a]} |f_n|^{1+\epsilon}\,d\mu \le \frac{K}{a^{\epsilon}}$,

and so $\{f_n\}$ is uniformly integrable.
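A small numerical sketch of the criterion (16.23), not from the text: the sequence $f_n(x) = (1 + 1/n)x^{-1/2}$ on $([0,1], \lambda)$ is an illustrative choice that satisfies (16.23) for $\epsilon < 1$, and the tail integrals over $[|f_n| \ge a]$ are seen to be small uniformly in $n$.

import numpy as np

# Midpoint-rule approximation of the dx-integrals on (0, 1].
N = 200_000
x = (np.arange(1, N + 1) - 0.5) / N

def tail_integral(n, a):
    f = (1 + 1.0 / n) * x ** (-0.5)
    return np.mean(np.where(f >= a, f, 0.0))     # integral of |f_n| over [|f_n| >= a]

for a in (5.0, 10.0, 50.0):
    print(a, max(tail_integral(n, a) for n in range(1, 20)))
# The reported suprema over n decrease toward 0 as a grows, as uniform integrability requires.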
Complex Functions

A complex-valued function on $\Omega$ has the form $f(\omega) = g(\omega) + ih(\omega)$, where $g$ and $h$ are ordinary finite-valued real functions on $\Omega$. It is, by definition, measurable $\mathcal{F}$ if $g$ and $h$ are. If $g$ and $h$ are integrable, then $f$ is by definition integrable, and its integral is of course taken as

(16.24)    $\int (g + ih)\,d\mu = \int g\,d\mu + i\int h\,d\mu$.

Since $\max\{|g|, |h|\} \le |f| \le |g| + |h|$, $f$ is integrable if and only if $\int |f|\,d\mu < \infty$, just as in the real case. The linearity equation (16.3) extends to complex functions and coefficients; the proof requires only that everything be decomposed into real and imaginary parts. Consider the inequality (16.4) for the complex case:

(16.25)    $\left|\int f\,d\mu\right| \le \int |f|\,d\mu$.

If $f = g + ih$ and $g$ and $h$ are simple, the corresponding partitions can be taken to be the same ($g = \sum_k x_k I_{A_k}$ and $h = \sum_k y_k I_{A_k}$), and (16.25) follows by the triangle inequality. For the general integrable $f$, represent $g$ and $h$ as limits of simple functions dominated by $|f|$ and pass to the limit.

The results on integration to the limit extend as well. Suppose that $f_k = g_k + ih_k$ are complex functions satisfying $\sum_k \int |f_k|\,d\mu < \infty$. Then $\sum_k \int |g_k|\,d\mu < \infty$, and so by the corollary to Theorem 16.7, $\sum_k g_k$ is integrable and integrates to $\sum_k \int g_k\,d\mu$. The same is true of the imaginary parts, and hence $\sum_k f_k$ is integrable and

(16.26)    $\int \sum_k f_k\,d\mu = \sum_k \int f_k\,d\mu$.
PROBLEMS

16.1. Suppose that $\mu(A) < \infty$ and $\alpha \le f \le \beta$ almost everywhere on $A$. Show that $\alpha\mu(A) \le \int_A f\,d\mu \le \beta\mu(A)$.

16.2. (a) Deduce parts (i) and (ii) of Theorem 10.2 from Theorems 16.2 and 16.4. (b) Compare (16.7), (10.6), and (4.10).
13.13 Suppose that p.(Q) < oo and In are uniformly bounded. (a) Assume In --+ f uniformly and deduce I/,. dp. --+ I I dp. from (16.5). (b) Use part (a) and Egoroff's theorem to give another proof of Theorem 16.5. 16.4. Prove that if 0 < In --+ f almost everywhere and I In dp. < A < oo , then f is integrable and Ifdp. < A . (This is essentially the same as Fatou' s lemma and is sometimes called by that name.) 16.5. Suppose that p. is Lebesgue measure on the unit interval 0 and that ( a, b) = (0, 1) in Theorem 16.8. If /( w, t) is 1 or 0 according as w < t or w > t, then for each t, f'( w, t ) = 0 almost everywhere. But lf( w, t)p.( dw ) does not differentiate to 0. Why does this not contradict the theorem? Examine the proof for this case. 16.6 (a) Suppose that functions an , bn , fn converge almost everywhere to func tions a , b, f, respectively. Suppose that the first two sequences may be integrated to the limit-that is, the functions are all integrable and I an dp. --+ I a dp., I bn dp. --+ I b dp.. Suppose, finally, that the first two sequences enclose the third: an < In < bn almost everywhere. Show that the third may be integrated to the limit. (b) Deduce Lebesgue's dominated convergence theorem from part (a). 16.7. Show that, for f integrable and £ positive, there exists an integrable simple g such that I If - gl dp. < £ . If .JJ1 is a field generating !F, g can be taken as E; x; IA with A ; e .JJI. Hint: Use Theorems 13.5, 16.4, and 11.4. 16.8. Suppose that f( w, · ) is, for each w, a function on an open set W in the complex plane and that f( · , z ) is for z in W measurable !F. Suppose that A satisfies (16.9), that /( w, · ) is analytic in W for c.> in A , and that for each z0 in W there is an integrable g( · , z0 ) such that 1/( w, z ) I < g( w, z0 ) for all w e A and all z in some neighborhood of z0 • Show that I/( w, z )p.( dw) is analytic in W. 16.9. Suppose that In are integrable and supn l In dp. < oo . Show that, if In f /, then I is integrable and lfn dp. --+ lfdp.. This is Beppo Levi 's theorem. 16.10. Suppose of integrable In that am n = I lim - ln l dp. --+ 0 as m , n --+ 00 . Then there exists an integrable f such that /Jn = I If - In I dp. --+ 0. Prove this by the following steps. (a) Choose n k so that m , n > n k implies that am n < 2- k . Use the corollary to Theorem 16.7 to show that f = lim k In k exists and is integrable and lim k I If - Ink I d�t = o. (b> For n > n k , write I If - In I d�t < I If - In k I dp, + I lin k - In I d�t . 16. 1 1 (a) If p. is not a-finite, (i) in Example 16.9 can fail. Find an example. (b) Modifying the argument in Example 16.9, show that if f, g > 0, if IA fdp. < IA g dp. for all A e !F, and if [ g < oo,f > 0] is a countable union of sets of finite p.-measure, then f < g almost everywhere. (c) Show that the condition on [ g < oo , f > 0] in part (b) is satisfied if the measure v ( A ) = IA fdp. is a-finite. 16.12. f (a) Suppose that (16.11) holds ( B � 0). Show that v determines B up to a set of p.-measure 0 if p. or v is a-finite. 16.3.
I
SECTION
16.
223
PROPERTIES OF THE INTEGRAL
(b)
16. 13. 16. 14. 16.1 5. 16.16. 16. 17.
16.18. 16.19.
16.20.
16.21.
16.22.
16.13.
Show that, if p. is a-finite and p.( B = oo ] = 0, then ., is a-finite. (c) Show that if neither p. nor ., is a-finite then B may not be unique even up to sets of p.-measure 0. Show that the supremum in (16.15) is half the integral on the right. Show that if (16.14) holds and if lim infnBn > B almost everywhere, then (16.15) holds and lim infnBn = B almost everywhere. Show that Scheffe' s theorem still holds if (16.14) is weakened to �'n (g) --+ ., < n ) . f Suppose that In and f are integrable and that /,, --+ f almost every where. Show that / lfn I dp. --+ / 1/1 dp. if and only if / lfn - / 1 dp. --+ 0. (a) Show that if 1/n I < g and g is integrable, then { In } is uniformly integrable. Compare the hypotheses of Theorems 16.4 and 16.13. (b) On the unit interval with Lebesgue measure, let In = ( njlog n) /<0• n- • > for n > 3. Show that the In are uniformly integrable (and f In dp. --+ 0) although they are not dominated by any integrable g. (c) Show for In = nl
�
16.14.
•
•
.
�
•
•
Consider the vector lattice .ftJ and the functional A of Problems 11.11 and 11.12. Let p. be the extension (Theorem 11.3) to !F= a ( � ) of the set function p.0 on 'a. (a) Show by (1 1.11) that for positive x, y1 , y2 , v([ l > 1] X (0, x ]) = xp.0( 1 > 1] xp.[ l > 1] and v([y1 < I < Y2 J X (0, x]) = xp.[yt < I < Y2 ). (b) Show that if I e .ftJ, then I is integrable and
16.15. 11.12 f
=
A ( /) = jfdp. . (Consider first the case I � 0.) This is the Danie/1-Stone representation
theorem.
SECTION 17. INTEGRAL WITH RESPECT TO LEBESGUE MEASURE

The Lebesgue Integral on the Line

A real measurable function on the line is Lebesgue integrable if it is integrable with respect to Lebesgue measure $\lambda$, and its Lebesgue integral $\int f\,d\lambda$ is denoted by $\int f(x)\,dx$ or, in case of integration over an interval, by $\int_a^b f(x)\,dx$. It is instructive to compare it with the Riemann integral.
The Riemann Integral
A real function $f$ on an interval $(a, b]$ is by definition† Riemann integrable, with integral $r$, if this condition holds: For each $\epsilon$ there exists a $\delta$ with the property that

(17.1)    $\left| r - \sum_j f(x_j)\lambda(J_j) \right| < \epsilon$

if $\{J_j\}$ is any finite partition of $(a, b]$ into subintervals satisfying $\lambda(J_j) < \delta$ and if $x_j \in J_j$ for each $j$. The Riemann integral for step functions was used in Section 1.

Suppose that $f$ is Borel measurable, and suppose that $f$ is bounded, so that it is Lebesgue integrable. If $f$ is also Riemann integrable, the $r$ of (17.1) must coincide with the Lebesgue integral $\int_a^b f(x)\,dx$. To see this, first note that letting $x_j$ vary over $J_j$ leads from (17.1) to

(17.2)    $\left| r - \sum_j \sup_{x \in J_j} f(x) \cdot \lambda(J_j) \right| \le \epsilon$.

†There are of course a number of equivalent definitions.
Consider the simple function $g$ with value $\sup_{x \in J_j} f(x)$ on $J_j$. Now $f \le g$, and the sum in (17.2) is the Lebesgue integral of $g$. By monotonicity of the Lebesgue integral, $\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx \le r + \epsilon$. The reverse inequality follows in the same way, and so $\int_a^b f(x)\,dx = r$. Therefore, the Riemann integral when it exists coincides with the Lebesgue integral.

Suppose that $f$ is continuous on $[a, b]$. By uniform continuity, for each $\epsilon$ there exists a $\delta$ such that $|f(x) - f(y)| < \epsilon/(b - a)$ if $|x - y| < \delta$. If $\lambda(J_j) < \delta$ and $x_j \in J_j$, then $g = \sum_j f(x_j) I_{J_j}$ satisfies $|f - g| < \epsilon/(b - a)$ and hence $|\int_a^b f\,dx - \int_a^b g\,dx| < \epsilon$. But this is (17.1) with $r$ replaced (as it must be) by the Lebesgue integral $\int_a^b f\,dx$: A continuous function on a closed interval is Riemann integrable.

Example 17.1. If $f$ is the indicator of the set of rationals in (0, 1], then the Lebesgue integral $\int_0^1 f(x)\,dx$ is 0 because $f = 0$ almost everywhere. But for an arbitrary partition $\{J_j\}$ of (0, 1] into intervals, $\sum_j f(x_j)\lambda(J_j)$ for $x_j \in J_j$ is 1 if each $x_j$ is taken from the rationals and 0 if each $x_j$ is taken from the irrationals. Thus $f$ is not Riemann integrable.

Example 17.2. For the $f$ of Example 17.1, there exists a $g$ (namely, $g = 0$) such that $f = g$ almost everywhere and $g$ is Riemann integrable. To show that the Lebesgue theory is not reducible to the Riemann theory by the casting out of sets of measure 0, it is of interest to produce an $f$ (bounded and Borel measurable) for which no such $g$ exists. In Examples 3.1 and 3.2 there were constructed Borel subsets $A$ of (0, 1] such that $0 < \lambda(A) < 1$ and such that $\lambda(A \cap J) > 0$ for each subinterval $J$ of (0, 1]. Take $f = I_A$. Suppose that $f = g$ almost everywhere and that $\{J_j\}$ is a decomposition of (0, 1] into subintervals. Since $\lambda(J_j \cap A \cap [f = g]) = \lambda(J_j \cap A) > 0$, $g(y_j) = f(y_j) = 1$ for some $y_j$ in $J_j \cap A$, and

(17.3)    $\sum_j g(y_j)\lambda(J_j) = 1 > \lambda(A)$.

If $g$ were Riemann integrable, its Riemann integral would coincide with the Lebesgue integrals $\int g\,dx = \int f\,dx = \lambda(A)$, in contradiction to (17.3).

It is because of their extreme oscillations that the functions in Examples 17.1 and 17.2 fail to be Riemann integrable. (It can be shown that a bounded function on a bounded interval is Riemann integrable if and only if the set of its discontinuities has Lebesgue measure 0.†) This cannot happen in the case of the Lebesgue integral of a measurable function: if $f$ fails to be Lebesgue integrable, it is because its positive part or its negative part is too large, not because one or the other is too irregular.

†See Problems 17.5 and 17.6.
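Example 17.1 can be illustrated numerically. The sketch below is not from the text; it simply evaluates Riemann sums for the indicator of the rationals over one partition with rational tags and once with irrational tags, using the type of the tag as a stand-in for rationality.

from fractions import Fraction
from math import sqrt

# f is the indicator of the rationals in (0, 1]; Riemann sums over the same
# partition are 1 with rational tags and 0 with irrational tags, so f has no
# Riemann integral, while its Lebesgue integral is 0.
def f(x):
    # a tag is passed either as a Fraction (rational) or as a float chosen irrational
    return 1.0 if isinstance(x, Fraction) else 0.0

n = 1000
mesh = 1.0 / n
rational_tags = [Fraction(j, n) for j in range(1, n + 1)]                 # right endpoints
irrational_tags = [j / n - sqrt(2) / (10 * n) for j in range(1, n + 1)]   # shifted into each interval

print(sum(f(x) * mesh for x in rational_tags))     # 1.0
print(sum(f(x) * mesh for x in irrational_tags))   # 0.0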
Example 17.3. It is an important analytic fact that

(17.4)    $\lim_{t \to \infty} \int_0^t \frac{\sin x}{x}\,dx = \frac{\pi}{2}$.

The existence of the limit is simple to prove because $\int_{(n-1)\pi}^{n\pi} x^{-1}\sin x\,dx$ alternates in sign and its absolute value decreases to 0; the value of the limit will be identified in the next section (Example 18.3). On the other hand, $x^{-1}\sin x$ is not Lebesgue integrable over $(0, \infty)$ because its positive and negative parts integrate to $\infty$. Within the conventions of the Lebesgue theory, (17.4) thus cannot be written $\int_0^\infty x^{-1}\sin x\,dx = \pi/2$, although such "improper" integrals appear in calculus texts. It is, of course, just a question of choosing the terminology most convenient for the subject at hand.

The function in Example 17.2 is not equal almost everywhere to any Riemann integrable function. Every Lebesgue integrable function can, however, be approximated in a certain sense by Riemann integrable functions of two kinds.
Theorem 17.1. Suppose that $\int |f|\,dx < \infty$ and $\epsilon > 0$.
(i) There is a step function $g = \sum_{i=1}^k x_i I_{A_i}$, with bounded intervals as the $A_i$, such that $\int |f - g|\,dx < \epsilon$.
(ii) There is a continuous integrable $h$ such that $\int |f - h|\,dx < \epsilon$.

PROOF. By the construction (13.6) and the dominated convergence theorem, (i) holds if the $A_i$ are not required to be intervals; moreover, $\lambda(A_i) < \infty$ for each $i$ for which $x_i \ne 0$. By Theorem 11.4 there exists a finite disjoint union $B_i$ of intervals such that $\lambda(A_i \triangle B_i) < \epsilon/k|x_i|$. But then $\sum_i x_i I_{B_i}$ satisfies the requirements of (i) with $2\epsilon$ in place of $\epsilon$.

To prove (ii) it is only necessary to show that for the $g$ of (i) there is a continuous $h$ such that $\int |g - h|\,dx < \epsilon$. Suppose that $A_i = (a_i, b_i]$; let $h_i(x)$ be 1 on $(a_i, b_i]$ and 0 outside $(a_i - \delta, b_i + \delta]$, and let it increase linearly from 0 to 1 over $(a_i - \delta, a_i]$ and decrease linearly from 1 to 0 over $(b_i, b_i + \delta]$. Since $\int |I_{A_i} - h_i|\,dx \to 0$ as $\delta \to 0$, $h = \sum_i x_i h_i$ for sufficiently small $\delta$ will satisfy the requirements. Note that this $h$ vanishes outside a bounded interval.

The Lebesgue integral is thus determined by its values for continuous functions.†

†This provides another way of defining the Lebesgue integral on the line. See Problem 17.17.
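A numerical sketch in the spirit of Theorem 17.1(i), not from the text: the integrable function and the grid of intervals below are arbitrary illustrative choices, and the $L^1$ error of the step approximation is computed by a Riemann-type sum.

import numpy as np

# Approximate an integrable f in L1 by a step function constant on bounded intervals.
f = lambda x: np.exp(-np.abs(x))                     # integrable over the line
edges = np.linspace(-10, 10, 401)                    # bounded intervals A_i = (edges[i], edges[i+1]]
mid = 0.5 * (edges[:-1] + edges[1:])
heights = f(mid)                                     # constant value x_i on A_i

x = np.linspace(-30, 30, 600_001)                    # fine grid for the dx-integral
idx = np.clip(np.searchsorted(edges, x, side="left") - 1, 0, len(heights) - 1)
g = np.where((x > edges[0]) & (x <= edges[-1]), heights[idx], 0.0)

err = np.sum(np.abs(f(x) - g)) * (x[1] - x[0])       # approximates integral |f - g| dx
print(err)                                           # small, and it shrinks as the intervals are refined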
The Fundamental Theorem of Calculus

For positive $h$,

$\left| \frac{1}{h}\int_x^{x+h} f(y)\,dy - f(x) \right| \le \frac{1}{h}\int_x^{x+h} |f(y) - f(x)|\,dy \le \sup\left[\,|f(y) - f(x)|\colon x \le y \le x + h\,\right]$,

and the right side goes to 0 with $h$ if $f$ is continuous at $x$. The same thing holds for negative $h$,† and therefore $\int_a^x f(y)\,dy$ has derivative $f(x)$:

(17.5)    $\frac{d}{dx}\int_a^x f(y)\,dy = f(x)$

if $f$ is continuous at $x$. Suppose that $F$ is a function with continuous derivative $F' = f$; that is, suppose that $F$ is a primitive of the continuous function $f$. Then

(17.6)    $\int_a^b f(x)\,dx = \int_a^b F'(x)\,dx = F(b) - F(a)$,

as follows from the fact that $F(x) - F(a)$ and $\int_a^x f(y)\,dy$ agree at $x = a$ and by (17.5) have identical derivatives. For continuous $f$, (17.5) and (17.6) are two ways of stating the fundamental theorem of calculus. To the calculation of Lebesgue integrals the methods of elementary calculus thus apply.
outside a set of Lebesgue measure 0 if f is integrable-it need not be continuous. As the following example shows, however, (17.6) can fail for discontinuous f. E%111le 11p 1 7.4. Define F(x) == x 2 sin x - 2 for 0 < x s t and F(x) == 0 for x s 0 and for x � 1. Now for t < x < 1 define F(x) in such a way that F is continuously differentiable over (0, oo ). Then F is everywhere differentiable, but F'(O) 0 and F'( x) == 2x sin x- 2 - 2x- 1cos x- 2 for 0 < x < }. Thus F' is dis continuous at 0; F' is, in fact, not even integrable over (0, 1], which makes (17.6) impossible for a = 0. For a more extreme example, decompose (0, 1] into countably many subintervals (a n , bn 1 · Define G( x) = 0 for x < 0 and x > 1, and on (an , bn ] define G(x) = F((x - a n )/( bn - an )). Then G is everywhere differentiable but (17.6) is impossible for G if (a, b] contains any of the (an , bn ] because G is not integrable over any of them . =
t nus requires defining /� = - ft� for
u >
v.
•
Change of Variable
Suppose that $T$ is a continuous, increasing mapping of $(a, b)$ onto $(Ta, Tb)$. If $T$ has a continuous derivative $T'$, then the formal substitutions $y = T(x)$ and $dy = T'(x)\,dx$ give the usual formula for changing variables:

(17.7)    $\int_a^b f(T(x))T'(x)\,dx = \int_{Ta}^{Tb} f(y)\,dy$.

There is a more general formula. If $T$ has a continuous derivative on $(a, b)$ and is either increasing or decreasing, then

(17.8)    $\int_A f(T(x))|T'(x)|\,dx = \int_{TA} f(y)\,dy$

for Borel subsets $A$ of $(a, b)$. To prove this, define $\mu(A) = \int_A |T'(x)|\,dx$ for $A \subset (a, b)$. If $B$ is a subinterval of $(a, b)$, then

(17.9)    $\mu(B) = \lambda(TB)$,

as follows by the fundamental theorem of calculus (consider increasing and decreasing $T$ separately). By Theorem 10.4, (17.9) holds for all Borel sets $B$ in $(a, b)$. By the general change-of-variable formula (16.18), $\int_{T^{-1}B} f(Tx)\,\mu(dx) = \int_B f(y)\,dy$. Putting $B = TA$ ($T$ is one-to-one) and transforming the left side by the formula (16.13) for integration with respect to a density gives (17.8).† Note that $T'$ may vanish and $T^{-1}$ may be nondifferentiable at places, for example if $T(x) = x^3$. The intervals $(a, b)$ and $(Ta, Tb)$ may be infinite. Finally, (17.8) holds for all nonnegative $f$ as well as for integrable $f$.

†For another approach, see Problem 17.12.

Example 17.5. Put $T(x) = \sin x/\cos x$ on $(-\pi/2, \pi/2)$. Then $T'(x) = 1 + T^2(x)$, and (17.7) for $f(y) = (1 + y^2)^{-1}$ gives

(17.10)    $\int_{-\infty}^{\infty} \frac{dy}{1 + y^2} = \pi$.
The Lebesgue Integral in $R^k$

The $k$-dimensional Lebesgue integral, the integral in $(R^k, \mathcal{R}^k, \lambda_k)$, is denoted $\int f(x)\,dx$, it being understood that $x = (x_1, \ldots, x_k)$. In low-dimensional cases it is also denoted $\iint_A f(x_1, x_2)\,dx_1\,dx_2$, and so on.

A mapping $T\colon V \to R^k$ carrying a set $V$ in $R^k$ into $R^k$ has the form $T(x) = (t_1(x), \ldots, t_k(x))$. Suppose that the partial derivatives $t_{ij} = \partial t_i/\partial x_j$ exist, let $D_x$ be the $k \times k$ matrix with entries $t_{ij}(x)$, and let $J(x)$ denote the Jacobian:

(17.11)    $J(x) = \det D_x$.

Theorem 17.2. Let $T\colon V \to TV$ be a one-to-one mapping of an open set $V$ onto an open set $TV$. Suppose that $T$ is continuous and that $T$ has continuous partial derivatives $t_{ij}$. Then

(17.12)    $\int_V f(Tx)|J(x)|\,dx = \int_{TV} f(y)\,dy$.

As usual this holds for nonnegative $f$ and for $f$ integrable over $TV$. Replacing $f$ by $fI_{TA}$ for $A \subset V$ gives

(17.13)    $\int_A f(Tx)|J(x)|\,dx = \int_{TA} f(y)\,dy$.

In the case $k = 1$, this reduces to (17.8). If $T$ is linear, $D_x$ is for each $x$ the matrix of the transformation. If $T$ is identified with this matrix, (17.13) reduces to

(17.14)    $|\det T| \cdot \int_A f(Tx)\,dx = \int_{TA} f(y)\,dy$.

Only this special case will be proved here, because Theorem 19.3 contains Theorem 17.2.†

Suppose that $T$ is linear and nonsingular and $V = R^k$. It is enough to prove (17.14) for $A = R^k$. If $f$ is an indicator, this reduces to (12.2), and it follows in the usual way that it holds for simple $f$, then for nonnegative $f$, and finally for the general integrable $f$.

†The proof in Section 19 uses properties of Hausdorff measure. For an alternative argument, see, for example, RUDIN, p. 184. In the present book, as it happens, Theorem 17.2 is applied only to nonsingular linear transformations $T$ and to the $T$ of Example 17.6. For the latter case, there is an entirely different proof; see Problems 12.16, 17.15, and 18.24.
Example 17.6. If $T(r, \theta) = (r\cos\theta, r\sin\theta)$, then the Jacobian (17.11) is $J(r, \theta) = r$, which gives the formula for integrating in polar coordinates:

(17.15)    $\iint_{r>0,\ 0<\theta<2\pi} f(r\cos\theta, r\sin\theta)\, r\,dr\,d\theta = \iint_{R^2} f(x, y)\,dx\,dy$.
•
Stieltjes Integrals
Suppose that $F$ is a function on $R^k$ satisfying the hypotheses of Theorem 12.5, so that there exists a measure $\mu$ such that $\mu(A) = \Delta_A F$ for bounded rectangles $A$. In integrals with respect to $\mu$, $\mu(dx)$ is often replaced by $dF(x)$:

(17.16)    $\int_A f(x)\,dF(x) = \int_A f(x)\,\mu(dx)$.

The left side of this equation is the Stieltjes integral of $f$ with respect to $F$; since it is defined by the right side of the equation, nothing new is involved. Suppose that $f$ is uniformly continuous on a rectangle $A$, and suppose that $A$ is decomposed into rectangles $A_m$ small enough that $|f(x) - f(y)| < \epsilon/\mu(A)$ for $x, y \in A_m$. Then

$\left| \int_A f(x)\,dF(x) - \sum_m f(x_m)\Delta_{A_m}F \right| < \epsilon$

for $x_m \in A_m$. In this case the left side of (17.16) can be defined as the limit of these approximating sums without any reference to the general theory of measure and for historical reasons is sometimes called the Riemann-Stieltjes integral; (17.16) for the general $f$ is then called the Lebesgue-Stieltjes integral. Since these distinctions are unimportant in the context of general measure theory, $\int f(x)\,dF(x)$ and $\int f\,dF$ are best regarded as merely notational variants for $\int f(x)\,\mu(dx)$ and $\int f\,d\mu$.
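The approximating sums above are easy to compute. The following sketch is not from the text; $F(x) = x^2$ on (0, 1] and $f(x) = x$ are illustrative choices, for which the Lebesgue-Stieltjes value is $\int_0^1 x \cdot 2x\,dx = 2/3$.

import numpy as np

# Riemann-Stieltjes approximating sums sum_m f(x_m) * (F(b_m) - F(a_m)) on (0, 1].
F = lambda x: x ** 2
f = lambda x: x

for m in (10, 100, 1000, 10000):
    edges = np.linspace(0.0, 1.0, m + 1)
    tags = 0.5 * (edges[:-1] + edges[1:])            # any x_m in A_m will do
    s = np.sum(f(tags) * (F(edges[1:]) - F(edges[:-1])))
    print(m, s)                                      # approaches 2/3 as the mesh shrinks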
17.1
17.1. Extend Theorem 17.1 to $R^k$.

17.2. Show that if $f$ is integrable, then $\lim_{t \to 0} \int |f(x + t) - f(x)|\,dx = 0$. Use Theorem 17.1.
17.3.
Suppose that p. is a finite measure on qjA and A is closed. Show that p. ( x + A ) is upper semicontinuous in x and hence measurable.
17.4.
Stappose that /0 1/( x ) l dx < oo . Show that for each £ , A [ x: x > a , 1/( x ) l > � 1 � 0 as a --+ oo . Show by example that /( x) need not go to 0 as
X � 00 .
17.5.
A Riemann integrable function on (0, 1] is continuous almost everywhere. Let the function be f and l�t A( be the set of x such that for every 8 there are points y and z satisfying IY - x l < 8, l z - x l < 8, and 1/( y ) - /( z ) l > £ . Show that A( is closed and that the set D of discontinuities of f is the
union of the A ( for positive, rational £ . Suppose that [0, 1] is decomposed into intervals I, . If the interior of I; meets A( , choose Y; and z; in I; in such a way that /(Y; ) - /( z; ) > £; otherwise take Y; = z, e 1; . Show that E( /( Y; ) - /( z; )) A ( I; ) > £ A ( A( ) . Finally, show that if f is Riemann integra ble, then A ( D ) = 0.
17.6.
A bounded Borel function on (0, 1 ] that is continuous almost everywhere is Riemann in.tegrab/e. Let M bound 1/1 and suppose that A ( D ) = 0. Given find an open G such that D c G and A ( G) < £/M and take C = [0, 1] - G. For each x in C find a 8x such that 1/( y ) - /( x) l < £ if IY - x l < 8x . Cover C by finitely many ( xk - 8x /2, x" + 8x /2) and take 2 8 less than " the minimum 8x . Show that 1/( y) _! /( x) l < 2£ if IY - x l < 8 and x e C. If [0, 1] is deco�posed into intervals I; with A ( I; ) < 8 , and if Y; e I; , let g f
f,
be the function with value /( Y; ) on 1; . Let E' denote summation over those i for which I, meets C, and let I:" denote summation over the other i. Show that
17. 7.
Show that if f is integrable, there exist continuous, integrable functions gn such that gn (x) --+ /(x ) except on a set of Lebesgue measure 0. (Use Theorem 17.1(ii) with £ = n - 2 . )
17.8.
13.13 1 7.7 f Let f be a finite-valued Borel function over (0, 1). By the ' following steps, prove Lusin s theorem: For each £ there exists a continuous function g such that A [ x e (0, 1): /( x) =I= g( x)] < £ . (a) Show that f may be assumed integrable, or even bounded. (b) Let gn be continuous functions converging to f almost everywhere. Combine Egoroff's theorem and Theorem 12.3 to show that convergence is uniform on a compact set K such that A ((O, 1) - K ) < £. The limit lim n gn ( x) = /(x) must be continuous when restricted to K. (c) Exhibit (0, 1) - K as a disjoint union of open intervals Ik [A12] ; define g as f on K and define it by linear interpolation on each lk . n n Let fn ( x ) = x - 1 - 2x2 - 1 • Calculate and compare /JE':- 1 /n ( x) dx and EC:- 1 /J/n ( x) dx. Relate this to Theorem 16.6 and to the corollary to Theorem 16.7.
17.9.
17.10. Show that (1 + y 2 ) - 1 has equal integrals over ( - oo , - 1), ( - 1, 0), (0, 1), (1 , oo). Conclude from (17.10) that /J (1 + y 2 ) - 1 dy == w/4. Expand the integrand in a geometric series and deduce Leibniz's formula
1 1 1 'IT - == 1 - - + - - - + · · · 5 7 4 3 16.7 (note that its corollary does not apply). 17.11. Take T( x ) == sin x and l(y) = (1 - y 2 ) - 1 12 in (17.7) and show that t == /Ji0 1(1 - y 2 ) - 1 12 dy for 0 < t s wj2. From Newton's binomial for mula and two applications of Theorem 16.6, conclude that by Theorem
{ 17 .17) 17.12. 17.7 f
Suppose that T maps [ a, b] into [ u , v] arid has continuous deriva tive T'; T need not be monotonic, but assume for notational convenience that Ta s Tb. Suppose that I is a Borel function on [ u, v] for which J: I/( T( x )) T' ( x ) l dx < oo and Jl: II(Y) I dy < oo . Then (17.7) holds. (a) Prove this for continuous I by replacing b by a variable upper limit and differentiating. (b) Prove it for bounded I by exhibiting I as the limit almost everywhere of uniformly bounded continuous functions. (c) Prove it in general.
17.13. A function I on [0, 1] has Kurzweil integral· I if for each £ there exists a positive function B( · ) over [0, 1] such that, if x0 , , xn and � 1 , , �, •
•
•
•
•
•
satisfy
{ 17 .18)
{
O == x0 < x1 < · · · < xn == 1 ' X; - 1 < �; < X; ' X; - X; - 1 < B ( �; ) ,
i == 1 , . . . , n ,
then
( 17.19)
n
E l( �; ) ( x; - X; _ 1 ) - I < £ .
k- 1
Riemann's definition is narrower because it requires the function B( · ) to be constant. Show in fact by the following steps that, if a Borel function over [0, 1] is Lebesgue integrable, then it has a Kwzweil integral, namely its Lebesgue integral. (a) The Kunweil integral being obviously additive, assume that I > 0. Choose k0 so that J1 � k I dx < £, and for k == 1, 2, . . . choose open sets Gk such that Gk :::> A k = r(k - 1)£ < I < k£] and A ( Gk - A k ) < 1jkk5 2k . Define B(() == dist (�, Gk ) for E e A k .
(b) Suppose that (17.18) holds. Note that �; e A k implies that [ X; _ 1 , c Gk and show that E7- 1 /( �; )( x; - X; _ 1 ) < I + 2£ by splitting the sum according to which A k it is that �; lies in. k (c) Letk E< > denote summation over the i for which �; e Ak , and show that E< > ( x; - X; _ 1 ) 1 - E1 ., k E < I) ( x; - X; _ 1 ) > A ( Ak ) - k0 2 • Con clude that E7- 1 l(�; )( x; - X; _ 1 ) > k - 3£. Thus - 2£ > I
x,]
f dx
=
I dx
11< 0 I dx
(17.18) implies (17.19) for I = I Idx (with 3£ in place of £). 17. 14. f (a) Suppose that I is continuous on (0, 1] and ( 17 .20)
lim j11( x ) dx = I.
��o ,
Choose 1J so that t s 1J implies that I I - 1/1 dxl < £; define 80( ) over (0, 1] in such a way that 0 < x , y s 1 and lx - Y l < B0 ( x ) imply that ll( x ) - I( Y ) I < £, and put ·
B( x ) =
min{ � x , Bo ( x )} min{ 1 + 1 �( 0) I } 11 '
if O < X < 1 , if X
= 0.
From (17.18) deduce that � 1 = 0 and then that (17.19) holds with 3£ in place of £. (b) Let F be the function in Example 17.4, and put I = F' on [0, 1]. Show that I has a Kunweil integral even though it is not Lebesgue integrable. 17.15. 12.16 f Identifying w. Deduce from (12.19) that ( 17 .21 )
A2 ( A)
= 'ITo fJ rdrdfJ , 'IT B
where B is the set of ( r, 8 ) with (rcos 8, r sin 8 ) e A , and then that ( 17.22)
fJ f( x , y ) dx dy = � JJ R
2
f( rcos fJ , r sin fJ ) r dr dfJ .
r>O
0 < 9 < 2 tr
When w0 = has been established in Problem 18.24, this will give (17.15). 17.16. 16.25 f Let !R consist of the continuous functions on R1 with compact support. Show that !R is a vector lattice in the sense of Problem 11.11 and has the property that I e !R implies I 1\ 11 e !R (note that 1 E !R). Show that the a-field !F generated by !R is 9P • Suppose A is a positive linear functional on !R; show that A has the required continuity property if and only if ln (x) �O uniformly in x implies A ( /n ) � 0. Show under this ,
assumption on
A
{ 17 .23 )
that there is a measure p. on 911 such that A
( /)
=
ffdp. ,
f e !R.
Show that p. is a-finite and unique. This is a version of the Riesz representa
tion theorem .
17. 17.
f Let A ( /) be the Riemann integral of f, which does exist for f in !£'. Using the most elementary facts about Riemann integration, show that the p. determined by (17 .23) is Lebesgue measure. This gives still another way of constructing Lebesgue measure.
17.18. ↑ Extend the ideas in the preceding two problems to $R^k$.

SECTION 18.
PRODUCT MEASURE AND FUBINI'S THEOREM
Let ( X, !I) and ( Y, C!!J ) be measurable spaces . For given measures p. and v on these spaces, the problem is to construct on the cartesian product X X Y a product measure 'IT such that 'IT(A X B) = p.(A)v( B) for A c X and B c Y. In the case where p. and " are Lebesgue measure on the line, 'IT will be Lebesgue measure in the plane. The main result is Fubini 's theorem, according to which double integrals can be calculated as iterated integrals. Product Spaces
It is notationally convenient in this section to change from (Sl, �) to ( X, fi) and ( Y, C!!J ). In the product space X X Y a measurable rectangle is a product A X B for which A e fi and B e C!!J . The natural class of sets in X X Y to consider is the a-field fi X C!!J generated by the measurable rectangles. (Of course, fi X C!!J is not a cartesian product in the usual sense.) Suppose that X = Y = R 1 and f£= C!!J = !R 1 . Then a measurable rectangle is a cartesian product A X B in which A and B are linear Borel sets. The term rectangle has up to this point been reserved for cartesian products of intervals, and so a measurable rectangle is more general. As the measurable rectangles do include the ordinary ones and the latter generate !A 2 , !A 2 c !A 1 X 91 1 • On the other hand, i f A is an interval, [B: A X B e !A 2 ] contains R 1 (A X R 1 = U n ( A X ( - n, n ])e !A 2 ) and is closed under the formation of proper differences and countable unions; thus it is a a-field containing the intervals and hence the Borel sets. Therefore, if B is a Borel set, [ A : A X B e 91 2 ] contains the intervals and hence, being a Example 18. 1.
$\sigma$-field, contains the Borel sets. Thus all the measurable rectangles are in $\mathcal{R}^2$, and so $\mathcal{R}^1 \times \mathcal{R}^1 = \mathcal{R}^2$ consists exactly of the two-dimensional Borel sets.
As this example shows, fi X C!!J is in general much larger than the class of measurable rectangles.
Theorem 18.1. (i) If $E \in \mathcal{X} \times \mathcal{Y}$, then for each $x$ the set $[y\colon (x, y) \in E]$ lies in $\mathcal{Y}$ and for each $y$ the set $[x\colon (x, y) \in E]$ lies in $\mathcal{X}$.
(ii) If $f$ is measurable $\mathcal{X} \times \mathcal{Y}$, then for each fixed $x$ the function $f(x, y)$ with $y$ varying is measurable $\mathcal{Y}$, and for each fixed $y$ the function $f(x, y)$ with $x$ varying is measurable $\mathcal{X}$.

The set $[y\colon (x, y) \in E]$ is the section of $E$ determined by $x$, and $f(x, y)$ as a function of $y$ is the section of $f$ determined by $x$.
f(x, y)
x and consider the mapping Tx : Y --+ X X Y defined by TxY = (x, y). If E = A X B is a measurable rectangle, Tx 1E is B or 0 according as A contains x or not, and in either case Tx 1E e C!!J. By Theorem 13.1(i), Tx is measurable C!!Jj!IX C!!J . Hence [y: (x, y) e E] = Tx- 1E e C!!J for E e fiX C!!J. By Theorem 13.1(ii), if f is measurable fi X fl/91 1 , then fTx is measurable C!!f/Y't 1 • Hence f(x, y) = fTx ( Y ) as a function of y is measurable C!!J. The symmetric statements for fixed y are proved the PROOF.
Fix
-
-
•
same way.
Product Measure
Now suppose that ( X, !I, p.) and (Y, C!!J , ") are measure spaces, and suppose for the moment that p. and " are finite. By the theorem just proved Jl[y: (x, y) e E] is a well-defined function of x. If !l' is the class of E in fiX C!!J for which this function is measurable fi, it is not hard to show that !l' is a A-system. Since the function is IA(x)JI(B) for E = A X B, !l' contains the w-system consisting of the measurable rectangles. Hence !l' coincides with � X f!!/ by the 'IT-A theorem. It follows without difficulty that (18 . 1 )
'11' ' ( £ ) =
f: [y: ( x, y ) E E ) p(dx) ,
E e !I x C!!J ,
is a finite measure on fiX C!!J, and similarly for
(18 .2 )
'11' "( £ ) =
ft lx: ( x, y )
E
E ) P ( dy ) ,
E E fi X
C!!J .
For measurable rectangles,
w' ( A X B ) = w"( A X B ) = �t ( A ) · P ( B ) . The class of E in fiX C!!J for which w'(E) = w"( E) thus contains the measurable rectangles; since this class is a A-system, it contains :r X C!!J. The common value w'(E) = w"(E) is the product measure sought. To show that (18.1) and (18.2) also agree for a:.finite I' and ,, let { A m } and { Bn } be decompositions of X and Y into sets of finite measure, and put l' m (A) = �t(A n A m ) and �'n (B) = P( B n Bn > · Since P( B) = Lm �'m ( B), the integrand in (18.1) is measurable fi in the a-finite as well as in the finite case; hence w' is a well-defined measure on fi X C!!J and so is w". If w:,n and w:,:n are (18.1) and (18.2) for I' m and "n ' then by the finite case, already treated, w'( E ) = Lm nw:, n (E) = L m nw,::n ( E) = w"(E) . Thus (18.1) and (18.2) coincide in the a-finite case as well. Moreover, w'( A X B) = L m nl' m (A)Pn ( B) = #'(A)P( B).
(18.3 )
If ( X, !I, I') and ( Y, C!!J, ") are a-finite measure spaces, w( E ) = w'( E ) = w"(E) defines a a-finite measure on fi X C!!J; it is the only measure such that w(A X B) = �t(A) P(B) for measurable rectangles.
Theorem 18.2.
·
Only a-finiteness and uniqueness remain to be proved. The prod ucts Am X Bn for { A m } and { Bn } as above decompose X X Y into measurable rectangles of finite w-measure. This proves both a-finiteness and uniqueness, since the measurable rectangles form a w-system generating fi X f!!/ (Theorem 10.3). • PROOF.
The w thus defined is called product measure; it is usually denoted I' X " · Note that the integrands in (18.1) and (18.2) may be infinite for certain x which is one reason for introducing functions with infinite values. and Note also that (18.3) in some cases requires the conventions (15.2).
y,
Fubini's Theorem
Integrals with respect to $\pi$ are usually computed via the formulas

(18.4)    $\int_{X \times Y} f(x, y)\,\pi(d(x, y)) = \int_X \left[ \int_Y f(x, y)\,\nu(dy) \right] \mu(dx)$

and

(18.5)    $\int_{X \times Y} f(x, y)\,\pi(d(x, y)) = \int_Y \left[ \int_X f(x, y)\,\mu(dx) \right] \nu(dy)$.
The left side here is a double integral, and the right sides are iterated integrals. The formulas hold very generally, as the following argument shows. Consider (18.4). The inner integral on the right is
(18.6)    $\int_Y f(x, y)\,\nu(dy)$.
Because of Theorem 18.1(ii), for f measurable fi X CfY the integrand here is measurable d.Y; the question is whether the integral exists, whether (18.6) is measurable fi as a function of x , and whether it integrates to the left side of (18. 4). First consider nonnegative f. If f = I£, everything follows from Theorem 18.2: (18.6) is l'[ y : (x, y ) e E] and (18.4) reduces to 'IT ( E ) = 'IT '( E ). Because of linearity (Theorem 15.1(iv)), if f is a nonnegative simple function, then (18.6) is a linear combination of functions measurable fi and hence is itself measurable f£; further application of linearity to the two sides of (18.4) shows that (18.4) again holds. The general nonnegative f is the monotone limit of nonnegative simple functions; applying the monotone convergence theorem to (18.6) and then to each side of (18.4) shows that again f has the properties required. Thus for nonnegative f, (18.6) is a well-defined function of x (the value oo is not excluded), measurable fi , whose integral satisfies (18.4). If one side of (18.4) is infinite, so is the other; if both are finite, they have the same finite value. Now suppose that f, not necessarily nonnegative, is integrable with respect to 'IT. Then the two sides of (18.4) are finite if f is replaced by 1/1 . Now make the further assumption that
(18.7)    $\int_Y |f(x, y)|\,\nu(dy) < \infty$

for all $x$. Then
/j ( x , y )v (dy ) /j+ ( x , y )v ( dy) - J;- ( x , y)v( dy ) . =
functions on the right here are measurable fi and (since /+ , /- < 1/1 ) integrable with respect to p., and so the same is true of the function on the left. Integrating out the x and applying (18.4) to j+ and to f- gives (18.4) for I itself. The
The set $A_0$ of $x$ satisfying (18.7) need not coincide with $X$, but $\mu(X - A_0) = 0$ if $f$ is integrable with respect to $\pi$ because the function in (18.7) integrates to $\int |f|\,d\pi$ (Theorem 15.2(iii)). Now (18.8) holds on $A_0$, (18.6) is measurable $\mathcal{X}$ on $A_0$, and (18.4) again follows if the inner integral on the right is given some arbitrary constant value on $X - A_0$. The same analysis applies to (18.5):

Theorem 18.3. Under the hypotheses of Theorem 18.2, for nonnegative $f$ the functions

(18.9)    $\int_Y f(x, y)\,\nu(dy)$,    $\int_X f(x, y)\,\mu(dx)$
are measurable !I and C!!l, respectively, and (18.4) and (18.5) hold. Iff (not necessarily nonnegative) is integrable with respect to 'IT, then the two functions (1 8.9) are finite and measurable on A0 and on B0, respectively, where p. ( X - A0) = 1'( Y - B0) = 0, and again (18.4) and (18.5) hold. It is understood here that the inner integrals on the right in (18.4) and (1 8.5) are set equal to 0 (say) outside A0 and B0. t This is Fubini 's theorem; the part concerning nonnegative f is sometimes called Tonelli ' s theorem. Application of the theorem usually follows a two-step procedure that parallels its proof. First, one of the iterated in tegrals is computed (or estimated above) with 1/1 in place of f. If the result is finite, then the double integral (integral with respect to 'IT ) of 1/1 must be finite, so that f is integrable with respect to 'IT; then the value of the double integral of f is found by computing one of the iterated integrals of f. If the iterated integral of 1/1 is infinite, f is not integrable 'IT. Let I = f �«Je - x2 dx. By Fubini' s theorem applied in the plane and by the polar-coordinate formula (see (17.15)),
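Before the worked examples, here is a small numerical sketch of the two-step procedure on a purely finite product space; it is not from the text, and the weights and the function $f$ are arbitrary illustrative choices.

import numpy as np

# For measures with finitely many atoms, the double integral against pi = mu x nu
# and both iterated integrals of (18.4)-(18.5) coincide.
rng = np.random.default_rng(1)
mu = rng.random(5)            # masses of mu on X = {0,...,4}
nu = rng.random(7)            # masses of nu on Y = {0,...,6}
f = rng.standard_normal((5, 7))

double = np.sum(f * np.outer(mu, nu))             # integral against mu x nu
iter_xy = np.sum(mu * np.sum(f * nu, axis=1))     # integrate out y first, then x
iter_yx = np.sum(nu * np.sum(f.T * mu, axis=1))   # integrate out x first, then y
print(double, iter_xy, iter_yx)                   # all three agree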
Example 18.2. Let $I = \int_{-\infty}^{\infty} e^{-x^2}\,dx$. By Fubini's theorem applied in the plane and by the polar-coordinate formula (see (17.15)),

$I^2 = \iint_{R^2} e^{-(x^2 + y^2)}\,dx\,dy = \iint_{r>0,\ 0<\theta<2\pi} e^{-r^2} r\,dr\,d\theta$.
two functions that are equal almost everywhere have the same integral, the theory of integration could be extended to functions that are only defined almost everywhere; then A 0 and B0 would disappear from Theorem 1 8. 3 .
SECTION
1 8.
PRODUCT MEASURE AND FUBINI ' S THEOREM
239
another application of Fubini' s theorem, which leads to the famous formula
(18 .10) As
the integrand in this example is nonnegative, the question of integrability • does not arise.
E
ple
It is possible by means of Fubini' s theorem to identify the limit in (17 .4). First, xam
18.3.
l0 e - uxsin x dx = 1 +1 u [1 - e - u ( u sin 2
'
'
as follows by differentiation with respect to
t
+ cos t ) ]
.
t. Since
Fubini's theorem applies to the integration of e - u x sin x over (0, t)
11 sin x dx = 11sm. x [ l ooe - ux du ] dx 0 x
0
0
oo 1 = o
X (0, oo ) :
du 1+u 2
oo e - ul ( u sin t + cos t ) du. 1 o
1+u
2
next-to-last integral is 'IT/2 (see (17.10)), and a change of variable ut == s shows that the final integral goes to 0 as t --+ oo . Therefore,
The
( 18.11 ) lntearation by Parts
. 'IT 1 sm x 1 lim 0 dx = -2 . X 1 -.
oo
•
Let F and G be two nondecreasing, right-continuous functions on an . lllterval [a, b ], and let p. and " be the corresponding measures:
l&( x , y]
=
F ( y) - F(x ) ,
Jl( x, y) = G ( y) - G ( x ) ,
a < x < y < b.
INTEGRATION
dF(x) and dG(x) in place
In accordance with the convention (17.16), write of p.( dx) and P ( dx ).
If F and G have no common points of discontinuity in (a, b ],
Theorem 18.4.
then
( 1 8 .12)
j
(a, b)
G ( x ) dF( x )
=
F ( b ) G ( b ) - F( a ) G ( a )
In brief: f GdF = integration formula.
-f
F( x ) dG ( x ) .
(a, b]
&FG - f FdG. This is one version of the partial
Note first that replacing F(x) by F(x) - C leaves (18.12) un changed-it merely adds and subtracts CP(a, b) on the right. Hence (take C = F(a)) it is no restriction to assume that F(x) = p.(a, x] and no restriction to assume that G(x) = P(a, x] . If , = p. X , is product measure in the plane, then by Fubini's theorem, PROOF.
(18 .13)
'1T [(x, y) : a < y � x � b ] =
j
(a, b ]
v( a, x) �&(dx)
j
=
(a, b ]
G (x) dF( x )
and (18 .14)
'1T [(x. y) : a < x � y < b] =
j
�&( a, y) v( dy )
(a, b 1
=
j
F(y) dG ( y ) .
(a, b ]
The two sets on the left have as their union the square S = The diagonal of S has 'IT-measure
'1T [ ( x, y ) : a < x y < b] =
=
j
(a, b ]
(a, b] X (a, b ].
v { x } �& ( dx ) 0 =
because of the assumption that p. and , share no points of positive measure. Thus the left sides of (18.13) and (18.14) add to 'IT( S) = p.(a, b]P(a, b] =
F(b)G(b).
•
g
Suppose that " has a density with respect to Lebesgue measure and let Transform the right side of (18.12) by the formula c + (1 6.13) for integration with respect to a density; the result is
G(x ) =
J:g(t) dt.
(18 .15 )
J
(a, b 1
G( x ) dF(x )
= F(b)G(b) - F( a )G( a ) - j6F( x ) g ( x ) dx . a
A consideration of positive and negative parts shows that this holds for any g integrable over Suppose further that p. has a density I with respect to Lebesgue measure ' c + and let Then (18.15) further reduces to
(a, b ). F( x ) = J:l(t) dt. (18. 16 ) j6G( x ) f ( x ) dx = F(b )G(b) - F( a )G( a ) - j6F(x ) g (x) dx . a
a
Again, I can be any integrable function. This is the classical formula for integration by parts. Under the appropriate integrability conditions, can be replaced by an unbounded interval.
(a, b ]
Products of Higher Order
Suppose that ( X, !£, p. ), ( Y, �, "), and ( Z, .11' , "1) are three a-finite measure spaces. In the usual way, identify the products X X Y X Z and ( X X Y) X Z. Let !£X � X .11' be the a-field in X X Y X Z generated by the A X B X C with A, B, C in !£, �, .11' , respectively. For C in Z, let ric be the class of E e !I X � for which E X C e !£X � X .11' . Then ric is a a-field contain ing the measurable rectangles in X X Y, and so ric !£X �- Therefore, (� X '!?I ) X .11' c !£ X � X 11'. But the reverse relation is obvious, and so (.!'" X '!?/) X .11' !£ X � X .11' . Define the product p. X " X "1 on !£X � X .11' as ( p. X ") X "1. It gives to p.(A)11(B) 1J (C), and it is A X B X C the value (p. X 11) (A X B) · 1J (C) the only measure that does. The formulas (18.4) and (18.5) extend in the obvious way. Products of four or more components can clearly be treated in the same way. This leads in particular to another construction of Lebesgue measure k in R R 1 x · · · X R 1 (see Example 18.1) as the product A X · · · XA ( k
=
=
=
=
factors) on !Ji k = !1/ 1 x · · · X91 1 . Fubini' s theorem of course gives a practi cal way to calculate volumes: Let Vk be the volume of the sphere of radius 1 in R k ; by Theorem 12.2, a sphere in R k with radius r has volume r k Vk . Let A be the unit sphere in R k , let B = [(x 1 , x 2 ): xf + xi < 1], and let C(x 1 , x 2 ) = [(x 3 , , x k ): E�_ 3 xl < 1 - xf - xi]. By Fubini' s theorem, Example 18.4.
•
•
•
=
vk - 2
JJ (t - P2 ) < k - 2>f2 p dp dO
O < ll < 2w O
If V0 is taken as 1, this holds for k = 2 as well as for k > 3. Since V1 = follows by induction that
2(2w ) ;- t v2 ; - 1 = --------. 5 --. .� . ( 2i - 1- -) ' 1 .3� for i =
2, it
); v2 ; = -. (2w 2 4 . . . -( 2i ) ---------
1, 2, . . . .
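The recursion of Example 18.4 is easy to check against the closed form of Problem 18.22(d). The sketch below is not from the text and simply compares the two numerically.

import math

# V_k = V_{k-2} * 2*pi/k with V_0 = 1, V_1 = 2, versus pi**(k/2) / Gamma(k/2 + 1).
V = {0: 1.0, 1: 2.0}
for k in range(2, 11):
    V[k] = V[k - 2] * 2 * math.pi / k
for k in range(1, 11):
    closed = math.pi ** (k / 2) / math.gamma(k / 2 + 1)
    print(k, V[k], closed)          # the two columns agree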
• PROBLEMS
18.1. 18.2.
18.3. 18.4.
Show by Theorem 1 8.1 that if A E !I and B E tty_
A X B is nonempty and lies in !I X
tty , then
Suppose that X == Y is uncountable and !I == tty consists of the countable and the co-countable sets. Show that the diagonal E == [( x y ) : x == y ] does not lie in !I X tty , even though [y: ( x, y) E E] E t!JI and [ x : ( x, y) E E ) E !I for all x and y. 10.6 1 8.1 f Let ( X, !I, p. ) == ( Y, tty , v ) be the completion of ( R1 , 911 , A ) . Show that ( X X Y, !I X tty , p. X v) is not complete. 2.8 f
,
The assumption of a-finiteness in Theorem 1 8.2 is essential: Let p. be Lebesgue measure on the line, let v be counting measure on the line, and take E == [( x, y): x == y]. Then (1 8.1) and (1 8.2) do not agree.
SECTION
18.5.
18.
PRODUCT MEASURE AND FUBINI ' S THEOREM
243
Suppose Y is the real line and ., is Lebesgue measure. Suppose that, for each x in X, l(x, y) has with respect to y a derivative l '(x, y). Formally,
J' [ /Xf ' ( x . � ) p.( dx ) ] d� ... jX[ J( x , y ) - f( x , a ) ] p. ( dx ) a
and hence fxl(x, Y)l'( dx) has derivative fxl '(x, y)p.( dx). Use this idea to prove Theorem 16.8(ii).
18.6.
Suppose that I is nonnegative on a a-finite measure space ( n, !F, p. ) . Show that
Prove that the set on the right is measurable. This gives the " area under the curve." Given the existence of I' X A on n X R1 , one can use the right side of this equation as an alternative definition of the integral. 18.7.
Reconsider Problem 12.13.
18.8.
Suppose that v [y: (x , y) e E) == v[y: (x, y) e F) for all x and show that (I' X ., )( E) == (I' X ., )( F). This is a general version of Cavalieri ' s principle.
18.9.
(a) Suppose that p. is a-finite and prove the corollary to Theorem 16.7 by
18.10.
Fubini's theorem in the product of ( 0 , !F, p.) and {1 , 2, . . . } with counting measure. (b) Relate the series in Problem 17.9 to Fubini's theorem.
(a) Let p. == ., be counting measure on X == Y == { 1 , 2, . . . }. If
l( x, y )
==
2 - 2-x if X == y, - 2 + 2 - x if X == y + 1 , 0 otherwise,
then the iterated integrals exist but are unequal. Why does this not con tradict Fubini's theorem? (b) Show that xyj(x 2 + y 2 ) 2 is not integrable over the square [(x, y): I x I , IY I < 1 ] even though the iterated integrals exist and are equal. 18.11.
For an integrable function I on (0, 1 ], consider a Riemann sum in the form R == E}- 1 l(x1 )( x1 - x1 _ 1 ), where 0 == x0 < · · · < xn == 1 . Extend I to (0, 2] by setting I< x) == I< x - 1) for 1 < x < 2, and define n
R ( t) == E J- 1
I( x1 + t ) ( x1 - x1 1 ) . _
For 0 < t < 1 , the points x1 + t reduced modulo 1 give a partition of (0, 1], and R ( t) is essentially the corresponding Riemann sum, differing from it by at most three terms. If maxJ ( x1 - x 1 ) is small, then R ( t) is for most � values of t a good approximation to Jol(x) dx, even though R = R (O) may _
be a poor one. Show in fact that 1 ( 1 8 . 1 7) R ( t ) - t( x) dx
fo
f
... 10
n
1
L
j == 1
n
< L
1
x
1
XJ - l
[ /( x1 + t) - /( x + t ) ) dx dt
1 1 10 1f( x1 + t} - f( x 1
x
j-1
dt
I
x,
+
t )l dt dx.
Given £ choose a continuous g such that /J ll(x) - g(x) l dx < £ 2/3 and then choose B so that l x - Yl < B implies that l g(x) - g(y ) l < £ 2/3. Show that max(x1 - x1 _ 1 ) < B implies that the first integral in (1 8.17) is at most £ 2 and hence that I R ( t) - /JI(x) dx l < £ on a set of t (in (0, 1]) of Lebesgue measure exceeding 1 - £ .
18. 12. Exhibit a case in which (1 8.12) fails because F and G have a common point of discontinuity.
18.13. Prove (1 8.16) for the case in which all the functions are continuous by differentiating with respect to the upper limit of integration.
18.14. Prove for distribution functions F that /�00 ( F(x + c) - F(x)) dx = c.
t. 18.16. Suppose that a number In is defined for each n > n0 and put F(x) = E n o s n s x ln · Deduce from (1 8.1 5) that 18.15. Prove for continuous distribution functions that /� 00 F( x) dF( x )
L
{ 1 8 . 1 8)
G ( n )/,
n0 � n � x
=
F( x ) G ( x ) -
=
{n oF( t ) g( t ) dt
f1g( t) dt, which will hold if G has continuous derivative g. First assume that the In are nonnegative. 18. 17. f Take n0 1 , In 1 , and G(x) 1/x and derive E n s x n - 1 log x + y + 0( 1 /x ) , where y 1 - /f (t - [t]) t - 2 dt is Euler's constant. if
G(y) - G(x) =
=
=
=
=
==
18.18. 5.17 1 8.16 f such that ( 1 8 .19)
18.19. If A k
=
Use (1 8.1 8) and (5.47) to prove that there exists a constant
L .! p p �x E�_ 1 a; and Bk
=
==
log log x
E�_ 1 b; , then
n
L a k Bk
k-1
+c+
c
o( ) · 1 10g x
n
==
An Bn + L A k - l bk . k-2
This is Abel's partial summation formula. Derive it by arguing proof of Theorem 1 8.4.
as
in the
18. 20. (a) Let In == j012 (sin x)" dx . Show by partial integration that In
1)( 1n _ 2 - In ) and hence that In == ( n - 1) n - 1 1n _ 2 •
245 ==
(n -
(b) Show by induction that 1 r;, + 1 From
(17.17) deduce that 2 w - == 8
...
( - 1 ) " ( 2n + 1 )
{ -} ).
1 1 1 1 + -2 + -2 + -2 + · · · 3
5
7
1 22
1 32
1 42
and hence that
2 w 6
== 1 + - + - + - + · · ·
(c) Show by induction that
2 n - 1 2n - 3 . . . 3 1 'IT 2 n 2n - 2 . . . 4 2 == n 1 12 n == 2n 2n - 2 1 2 + 4 22' 2n + 1 2n - 1 S3· From /2 n I > 12 n > 12 , + 1 deduce that 'IT 22 · 4 2 · · · ( 2n ) 2 2n w < < 2 2n + 1 3 2 . 5 2 . . . ( 2n - 1 ) 2( 2n + 1 ) 2 ' and from this derive Wallis ' s formula, ,
18.21.
4466 - 1 3 3 -5 5 7
'IT 22 == - -
{ 18.20)
2
•
•
•
Stirling ' s formula. The area under the curve y == log x between the abscis sas 1 and n is An == n log n - n + 1 . The area under the corresponding inscribed polygon is Bn == log n ! - ! log n. y=
log
log
X
2
0
1
2
3
4
(a) Let Sk be the sliver-shaped region bounded by curve and polygon
between the abscissas k and k + 1, and let Tk be this region translated so that its right-hand vertex is at (2, log 2). Show that the Tk do not overlap and that T, u T, + 1 u · · · is contained in a triangle of area t log(1 + 1/n ). Conclude that log n ! =
{ 18.21 ) where
(b)
( n �) +
log n
- n + c + an ,
c is 1 minus the area of T1 u T2 u · · · and 1 1 1 0 < an < 2 log 1 + n < n .
)
(
By Wallis's formula (18.20) show that log
{f
Substitute
=
�
[
2 n log 2 +
2logn! - log(2n)! - � log( 2n + 1 ) ] .
(1 8.21) and show that c == log J2w . This gives Stirling's formula
18.22. Euler's gamma function is defined for positive t by f( t) == f0x ' - 1e - x dx. (a) Prove that r
(b) Show by partial integration that f( t + 1) == tf( t) and hence that f( n + 1) == n ! for integral n. (c) From (1 8.10) deduce f( ! ) == {; .
(d)
Show that the unit sphere in
Rk has volume (see Example 18.4)
( 18.22) 18.13. By partial integration prove that /0 (sin x )/x) 2 dx == wj2 and /� 00 (1 cos x ) x- 2 dx ==
18.24. 17.15 f and
w. Identifying
0 s cp <
If A8 is the set of ( rcos cp, r sin cp ) satisfying r < fJ, then by (17.21), 'If.
( 18.23 ) But by Fubini's theorem, if 0
{ 18.24)
< fJ < w/2, then
1 12 sin : x + 2 1 005' x 1 { ) dx 1 COS dx 1cosl
A 2 ( A ,) == 0 ==
11
1 . 2 ) 1 /2 dx. 1 x sm fJ cos fJ + ( 2 cosfJ
1
247 1 9 . HAUSDORFF MEASURE Equate the right sides of (18.23) and (18.24), differentiate with respect to 8, and conclude that w0 == w. Thus the factor w0/w can be dropped in (17.22), which gives the formula (17.15) for integrating in polar coordinates. 18.15. Suppose that 1-' is a probability measure on ( X, .t'") and that, for each x in X, "x is a probability measure on ( Y, '!V). Suppose further that, for each B in t!JI, "x ( B) is, as a function of x, measurable .t'". Regard the /-'( A ) as initial probabilities and the �'x ( B) as transition probabilities. (a) Show that, if E e .t'" X '!V, then �'x [y: ( x, y) e E] is measurable .t'". (b) Show that w( E) == Ix "x [ y: ( x, y) e E]/-'( dx) defines a probability measure on .t'"X '!V. If "x == v does not depend on x, this is just (18.1). (c) Show that if I is measurable !£ X 'lV and nonnegative, then I I( x, y ) vx ( dy) is measurable .t'". Show further that SECTION
y
which extenas Fubini's theorem (in the probability case). Consider also I 's that may be negative. (d) Let v(B) == lx �'x ( B)/-'(dx). Show that w( X X B) == v(B) and
SECTION 19. HAUSDORFF MEASURE*

*This section should be omitted on a first reading.

The theory of Lebesgue measure gives the area of a figure in the plane and the volume of a solid in 3-space, but how can one define the area of a curved surface in 3-space? This section is devoted to such geometric questions.

The Definition

Suppose that $A$ is a set in $R^k$. For positive $m$ and $\epsilon$ put

(19.1)    $h_{m,\epsilon}(A) = c_m \inf \sum_n (\operatorname{diam} B_n)^m$,

where the infimum extends over countable coverings of $A$ by sets $B_n$ with diameters $\operatorname{diam} B_n = \sup[\,|x - y|\colon x, y \in B_n\,]$ less than $\epsilon$. Here $c_m$ is a positive constant to be assigned later. As $\epsilon$ decreases, the infimum extends over smaller classes, and so $h_{m,\epsilon}(A)$ does not decrease. Thus $h_{m,\epsilon}(A)$ has a
limit $h_m(A)$, finite or infinite:

(19.2)    $h_m(A) = \lim_{\epsilon \downarrow 0} h_{m,\epsilon}(A)$.
Example 19.1. Suppose that m = k = 1. If a n = inf Bn and bn = sup Bn , then [a n , bn ] and Bn have the same diameter bn - a n . Thus (19.1) is in this
case unchanged if the Bn are taken to be intervals. Further, any sum Ln diam Bn remains unchanged if each interval Bn is split into subintervals of length less than £. Thus h 1 , ,(A) does not depend on £, and h 1 ( A ) = h 1 , ,(A) is for A c R 1 the ordinary outer Lebesgue measure A*( A) of A , provided that c 1 is taken to be 1, as it will. A planar segment S, = [(x, t): 0 < x < /], as an isometric copy of the interval [0, /], satisfies h 1 (S,) = I. This is a first indication that Hausdorff measure has the correct geometric properties. Since the segments S, are disjoint, one-dimensional Hausdorff measure in the plane is not o-finite. •
Example 19. 2. Consider the segments S0 and S, of the preceding exam ple with I = 1 and 0 < t < 1. Now h 1 (S0 U S, ) = 2, and in fact h 1 , ((S0 U S,) = 2 if £ < t. This shows the role of £: If h m ( A) were defined, not by ( 19 . 2), but as the infimum in (19.1) without the requirement diam Bn < then S0 U S, would have the wrong one-dimensional measure-diam ( S0 U • S,) = (1 + t 2 ) 1 12 instead of 2. £,
The Normalizing Constant

In the definition, $m$ can be any positive number, but suppose from now on that it is an integer between 1 and the dimension of the space $R^k$: $m = 1, \ldots, k$. The problem is to choose $c_m$ so as to satisfy the geometric intention of the definition. Let $V_m$ be the volume of a sphere of radius 1 in $R^m$; the value of $V_m$ is explicitly calculated in Example 18.4. And now take the $c_m$ in (19.1) as the volume of a sphere of diameter 1:
(19 .3 ) A sphere in R m of diameter d has volume em d m . It will be proved below that for this choice of em the k-dimensional Hausdorff and Lebesgue measures agree for k-dimensional Borel sets:
fJI/ m and B is a set in R k that is an isometric copy of A , it will theorem that h m ( B) = h m (A) = A m (A). The unit cube C in R k can be covered by n k cubes of side n - 1 and diameter k 1 1 2 n - 1 ; taking n > k 1 1 2£ - 1 shows that h k . f( C ) < 1 2 1 k 2 clc n k (k 1 n - ) = ek k k / ' and so h k (C) < oo If C c U n Bn and diam Bn = d,, enclose Bn in a sphere Sn of radius dn . Then C c U n Sn , and so 1 == A k (C) < Ln A k (Sn ) = Ln Vk d! ; hence h k (C) > ek/ Vk . Thus h k (C) is finite and positive. Put h k (C) = K, where again C is the unit cube in R k . Then of course If A e follow by this
·
( 1 9. 4 ) for fJ >
C. Consider the linear transformation carrying x to fJx, where 0. By Theorem 12.2, A k (fJA) = fJ k A k (A) for A e gJ k . Since diam fJA =
A =
B" diam
A , the definition (19.2) gives
(1 9. 5) Thus (19.4) holds for all cubes and hence by additivity holds for rectangles whose vertices have rational coordinates. The latter sets form a w-system generating fJit k , and so (19.4) holds for all A e gJ k , by Theorem 10.3. Thus Theorem 19.1 will follow once it has been shown that the h k-mea sure of the unit cube in R k , the K in (19.4), is 1. To establish this requires
INTEGRATION
two lemmas connecting measure with geometry. The first is a version of the
Vitali covering theorem.
Lemma 1. Suppose that G is a bounded open set in R^k and that ε > 0. Then there exists in G a disjoint sequence S_1, S_2, ... of closed spheres such that λ_k(G − ∪_n S_n) = 0 and 0 < diam S_n < ε.

PROOF. Let S_1 be any closed sphere in G satisfying 0 < diam S_1 < ε. Suppose that S_1, ..., S_n have already been constructed. Let 𝓢_n be the class of closed spheres in G that have diameter less than ε and do not meet S_1 ∪ ··· ∪ S_n. Let s_n be the supremum of the radii of the spheres in 𝓢_n, and let S_{n+1} be an element of 𝓢_n whose radius r_{n+1} exceeds s_n/2. Since the S_n are disjoint subsets of G, Σ_n V_k r_n^k = Σ_n λ_k(S_n) ≤ λ_k(G) < ∞, and hence

(19.6)    r_n → 0.

Put B = G − ∪_n S_n. It is to be shown that λ_k(B) = 0. Assume the opposite. Let S′_n be the sphere having the same center as S_n and having radius 4r_n. Then Σ_n λ_k(S′_n) = 4^k Σ_n λ_k(S_n) ≤ 4^k λ_k(G). Choose N so that Σ_{n>N} λ_k(S′_n) < λ_k(B). Then there is an x_0 such that

(19.7)    x_0 ∈ B,    x_0 ∉ ∪_{n>N} S′_n.

As the S_n are closed, there is about x_0 a closed sphere S with radius r small enough that

(19.8)    S ⊂ G,    diam S < ε,    S ∩ (S_1 ∪ ··· ∪ S_N) = ∅.

If S does not meet S_1 ∪ ··· ∪ S_n, then r ≤ s_n < 2r_{n+1}; this cannot hold for all n because of (19.6). Thus S meets some S_n; let S_{n_0} be the first of these:

(19.9)    S ∩ S_{n_0} ≠ ∅,    S ∩ (S_1 ∪ ··· ∪ S_{n_0−1}) = ∅.

Because of (19.8), n_0 > N, and by (19.7), x_0 ∉ S′_{n_0}. But since x_0 ∉ S′_{n_0} and S ∩ S_{n_0} ≠ ∅, S has radius r ≥ 3r_{n_0} > (3/2) s_{n_0−1}. By (19.9), r ≤ s_{n_0−1}. Thus s_{n_0−1} ≥ r > (3/2) s_{n_0−1}, a contradiction. ∎

Lemma 2. If A is a bounded set in 𝓡^k, then λ_k(A) ≤ c_k (diam A)^k. In other words, the volume of a set cannot exceed that of a sphere with the same diameter.
PROOF. Represent the general point of R^k as (t, y), where t ∈ R^1, y ∈ R^{k−1}. Let A_y = [t: (t, y) ∈ A]. Since A is bounded, each A_y is bounded and A_y = ∅ for y outside some bounded set. By the results of Section 18, A_y ∈ 𝓡^1 and λ(A_y) is measurable 𝓡^{k−1}. Let I_y be the open (perhaps empty) interval (−λ(A_y)/2, +λ(A_y)/2), and set S_1A = ∪_y [(t, y): t ∈ I_y]. If 0 ≤ f_n(y) ↑ λ(A_y)/2 and f_n is simple, then [(t, y): |t| < f_n(y)] is in 𝓡^k and increases to S_1A; thus S_1A ∈ 𝓡^k. By Fubini's theorem, λ_k(A) = λ_k(S_1A). The passage from A to S_1A is Steiner symmetrization with respect to the hyperplane H_1 = [x ∈ R^k: x_1 = 0].

Let J_y be the closed interval from inf A_y to sup A_y; it has length λ(J_y) ≥ λ(A_y). Let m_y be the midpoint of J_y. If s ∈ I_y and s′ ∈ I_{y′}, then |s − s′| ≤ ½λ(A_y) + ½λ(A_{y′}) ≤ |m_y − m_{y′}| + ½λ(J_y) + ½λ(J_{y′}), and this last sum is |t − t′| for appropriate endpoints t of J_y and t′ of J_{y′}. Thus for any points (s, y) and (s′, y′) of S_1A, there are points (t, y) and (t′, y′) that are at least as far apart and are limits of points in A. Hence diam S_1A ≤ diam A.

For each i there is a Steiner symmetrization S_i with respect to the hyperplane H_i = [x: x_i = 0]; λ_k(S_iA) = λ_k(A) and diam S_iA ≤ diam A. It is obvious by the construction that S_iA is symmetric with respect to H_i: if x′ is x with its ith coordinate reversed in sign, then x and x′ both lie in S_iA or neither does. If A is symmetric with respect to H_i (i > 1) and y′ is y with its (i − 1)st coordinate reversed in sign, then, in the notation above, A_y = A_{y′}, and so S_1A is symmetric with respect to H_i. More generally, if A is symmetric with respect to H_i and j ≠ i, then S_jA is also symmetric with respect to H_i.

Pass from A to SA = S_k ··· S_1A. This is central symmetrization. Now λ_k(SA) = λ_k(A), diam SA ≤ diam A, and SA is symmetric with respect to
each H_i, so that x ∈ SA implies −x ∈ SA. If |x| > ½ diam A, then |x − (−x)| > diam A ≥ diam SA, so that SA cannot contain both x and −x. Thus SA is contained in the closed sphere with center 0 and diameter diam A, and hence λ_k(A) = λ_k(SA) ≤ c_k (diam A)^k. ∎

PROOF OF THEOREM 19.1. It is only necessary to show that K = 1 in (19.4). By Lemma 1 the interior of the unit cube C in R^k contains disjoint spheres S_1, S_2, ... such that diam S_n < ε and h_{k,ε}(C − ∪_n S_n) ≤ h_k(C − ∪_n S_n) = K λ_k(C − ∪_n S_n) = 0. By (19.3), h_{k,ε}(C) ≤ h_{k,ε}(∪_n S_n) + h_{k,ε}(C − ∪_n S_n) = h_{k,ε}(∪_n S_n) ≤ c_k Σ_n (diam S_n)^k = Σ_n λ_k(S_n) ≤ λ_k(C) = 1. Hence h_k(C) ≤ 1. If C ⊂ ∪_n B_n, then by Lemma 2, 1 ≤ Σ_n λ_k(B_n) ≤ Σ_n c_k (diam B_n)^k, and hence h_k(C) ≥ 1. ∎
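As a quick plausibility check of Lemma 2 (an addition, not part of the text), the short Python sketch below compares the area of an equilateral triangle in the plane with c_2 (diam A)^2, where c_2 = π/4 is the area of a disc of diameter 1 as in (19.3); the particular set and numbers are illustrative only.

import math

c2 = math.pi / 4                   # c_2 from (19.3): area of a disc of diameter 1
s = 1.0                            # side of an equilateral triangle A
area = math.sqrt(3) / 4 * s ** 2   # lambda_2(A)
diam = s                           # the diameter of A is its side length
print(area <= c2 * diam ** 2)      # True: 0.433... <= 0.785...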
Change of Variable
If x_1, ..., x_m are points in R^k, let P(x_1, ..., x_m) be the parallelepiped [Σ_{i=1}^m a_i x_i: 0 ≤ a_1, ..., a_m ≤ 1]. Suppose that D is a k × m matrix, and let x_1, ..., x_m be its columns, viewed as points of R^k; define |D| = h_m(P(x_1, ..., x_m)). If C is the unit cube in R^m and D is viewed as a linear map from R^m to R^k, then P(x_1, ..., x_m) = DC, so that

(19.10)    |D| = h_m(DC).

In the case m = k it follows by Theorem 12.2 that

(19.11)    |D| = |det D|,    m = k.
Theorem 12.2 has an extension:

Theorem 19.2. Suppose that D is univalent.† If A ∈ 𝓡^m, then DA ∈ 𝓡^k and

(19.12)    h_m(DA) = |D| λ_m(A).

PROOF. Since D is univalent, D ∪_n A_n = ∪_n DA_n and D(R^m − A) = DR^m − DA; since DR^m is closed, it is in 𝓡^k. Thus [A: DA ∈ 𝓡^k] is a σ-field. Since D is continuous, DA is compact if A is. By Theorem 13.1(i), D is measurable 𝓡^m/𝓡^k. Each side of (19.12) is a measure as A varies over 𝓡^m. They agree by definition if A is the unit cube in R^m, and so by (19.5) they agree if A is any cube. By the usual argument they agree for all A in 𝓡^m. ∎

† That is, one-to-one: Du = 0 implies that u = 0. In this case m ≤ k.
If D is not univalent, A ∈ 𝓡^m need not imply that DA ∈ 𝓡^k, but (19.12) still holds because each side vanishes.

The Jacobian formula (17.12) for change of variable also generalizes. Suppose that V is an open set in R^m and T is a continuous, one-to-one mapping of V onto the set TV in R^k, where m ≤ k. Let t_i(x) be the ith coordinate of Tx, and suppose that t_{ij}(x) = ∂t_i(x)/∂x_j is continuous on V. Put

(19.13)    D_x = [t_{ij}(x)],

the k × m matrix whose entry in row i and column j is t_{ij}(x). Then |D_x| as defined by (19.10) plays the role of the modulus of the Jacobian (17.11) and by (19.11) coincides with it in case m = k.
Suppose that T is a continuous, one-to-one map of an open set V in R m into R k . If T has continuous partial derivatives and A c V, then
Theorem 19.3.
(19.14 ) and (19.15 ) By Theorem by
A m.
19.1, h", on the left in (19.14) and (19.15) can be replaced
This generalizes Theorem 17 .2. Several explanations are required. Since V is open, it is a countable union of compact sets Kn . If A is closed, T( A n Kn ) is compact because T is continuous, and so T( A n V) = U n T( A n Kn ) E £Il k . In particular, TV E £Il k , and it follows by the usual argument that TA e £Il k for Borel subsets A of V. During the course of the proof of the theorem it will be seen that IDx l is continuous in x. Thus the formulas (19.14) and (19.15) make sense if A and I are measurable. As usual, (19.15) holds if f is nonnegative and also if the two sides are finite when f is replaced by 1 /1 . And (19.15) follows from (1 9.14) by the standard argument starting with indicators. Hence only (1 9.14) need be proved. To see why (19.14) ought to hold, imagine that A is a rectangle split into many small rectangles A n ; choose x n in A n and approximate the integral in (1 9.1 4) by E I Dx l h m (A n ) · For y in A n , Ty has the linear approximation "
Tx n + Dx ,( Y - x n ), and so TA n is approximated by Tx n + Dx,(A n - x n ), an m-dimensional parallelepiped for which h m is I Dx, l h m (A n - x n ) = I Dx , l h m (A n ) by (19.12). Hence EI Dx, l h m (A n ) Eh m ( TA n ) = h m ( TA) . The first step in a proof based on these ideas is to split the region of integration into small sets (not necessarily rectangles) on each of which the Dx are all approximately equal to a linear transformation that in tum gives a good local approximation to T. Let V0 be the set of x in V for which Dx is univalent. ==
If (J > 1, then there exists a countable mdecomposition of V0 into Borel sets B1 such that, for some linear maps M1 R --+ R k , Lemma 3.
:
and Choose £ and 80 in such a way that fJ - 1 + £ < 80 1 < 1 < 80 < fJ - £. Let Jl be the set of k X m matrices with rational coefficients. For M in Jl and p a positive integer let E (M, p) be the set of x in V such that
PROOF.
(19.18) and (19.19)
y E V. Imposing these conditions for countable, dense sets of v ' s and y ' s shows that E( M, p) is a Borel set. The essential step is to show that the E (M, p) cover V0• If Dx is univalent, then 80 = inf[ I Dxv l : I vi = 1] is positive. Take 8 so that 8 < ( 80 - 1)80 and 8 � (1 - 80 1 )80 , andm then choose in Jt an M such that I Mv - Dxv l � 8 1 v l for all v E R . Then I Mv l < I Dxf'l + I Mv - Dxf1 1 < I Dxv l + 8 1 v l � 801 Dxf'l , and (19.18) follows from this and a similar in equality going the other way. As for (19.19), it follows from (19.18) that M is univalent, and so there is a positive ., such that 1J I VI � I Mv l for all v. Since T is differentiable at x, there exists a p such that IY - x l < p - 1 implies that I Ty - Tx - Dx ( Y x) l � £ 1J IY - x l � £ 1 M(y - x) l . This and (19.18) imply that I Ty - Tx l < I Dx (x - Y ) l + I Tx - Ty - Dx (x - y) l � fJ0IM(x - y ) l + £ 1 M(x - y ) l
19.
IJI M( x - y ) l . This inequality and a similar one in the other direction give (19.19). Thus the E( M, p) cover V0 • Now E( M, p) is a countable union of sets B such that diam B < p - 1 . These sets B (for all M in Jf and all p ), made disjoint in the usual way, together with the corresponding maps M, satisfy �
the requirements of the lemma.
•
Also needed is an extension of the fact that isometries preserve Hausdorff measure.
Suppose that A c R k , cp: A --+ R ;, andm 1/J: A --+ Ri. If lcpx fi>Y I < 8 1 1/J x - 1/IY I for X, y E A, then h m (cpA) < (J h m (l/IA). PROOF. Given £ and 8, cover 1/JA by sets Bn such that 8 diam Bn < £ and cm En1 (diam Bn ) m < h m ( 1/IA) + B. The sets cpl/J - 1Bn cover cpA and m diamcpl/J - Bn < IJ diam Bn < £, whence follows h m ,((cpA) < (J (h m (l/IA) + Lemma 4.
3).
•
19.3:
Suppose first that A is contained in the set V0 where Dx is univalent. Given 8 > 1, choose sets B1 as in Lemma 3 and put A1 = A n B1 • If C is the unit cube in R m , it follows by (19.16) and Lemma 4 that 9 - mh m (M1C) < h m (DxC) < 8 mh m (M1C) for X E A1; by (19.10), 9 - m i M1I S I Dx l < 8 mmi M1 1 for x E A1 t By (19.17) and Lemma 4, I h m (A1) 9 - mh m ( M1A1) < h m (TA I ) < IJ hmm (MIA I ). Now h m ( MIA I ) = I M1 m by Theorem 19.2, and so 8 - 2 h m (TA1) < fA,I Dx l h m ( dx ) < 8 2 h m ( TA1) . Adding over A 1 and letting (J --+ 1 gives (19.14). SECOND CASE. Suppose that A is contained in the set V - V0 where Dx is not univalent. It is to be shown that each side of (19.14) vanishes. Since the entries of (19.13) are continuous in x and V is a countable union of compact sets, it is no restriction to assume that there exists a constant K such that I Dxv l < K l v l for x e A, v e R m . It is also no restriction to assume that h m (A) = A m (A) is finite. Suppose that 0 < £ < K. Define T' : R m --+ R k X R m by T'x = ( Tx, £x). For the corresponding matrix D; of derivatives, D;v = ( Dxv, £ V ), and so D; is univalent. Fixing x in A for the moment, choose in R m an orthonormal set v 1 , • • • , vm such that Dxv 1 = 0, which is possible because x e V - V0 • Then I D;v 1 1 = 1 (0, £V 1 ) 1 = £ and ID.:V;I < I Dxv;l + £ I V; I < 2K. Since h m ( P( v l , . . . ' vm )) = 1, it follows by Theorem 19.2 that I D; I = h m ( D;P( v 1 , • • • , vm )). But the parallelepiped D;P( v 1 , • • • , vm ) of dimension PROOF OF THEOREM
FIRST CASE.
.
tsince the entries of Dx are continuous in x, if y is close enough to x, then fJ - 1 1 Dxv I < I Dyv I S fJ I Dx v I for v e R m , so that by the same argument, 8 - m I Dx I < I Dy I � (J m I Dx I ; hence I Dx I is continuous in x .
m in R k + m has one edge of length £ and m - m1 edges of length at most 2K and hence is isometric with a subset of [x E R : l x 1 1 < £, l x; l < 2mK, i = (19.14) for the univalent case 2, . . . , m]; therefore, I D� I < £(2mK) m - •. Now already treated gives h m (TA) S fA £(2mK ) m - lh m ( dx ). If w: R k X R m R k is defined by w(z, v) = z, then TA = wT'A, and Lemma 4 gives h m (TA) < h m (TA) = 0. h m (T'A) < £(2mK ) m - 1h m (A). Hence m If Dx is not univalent, Dx R is contained in an (m - I)-dimensional subspace of R k and hence is isometric to a set in R m of Lebesgue measure 0. Thus I Dx l = 0 for x e V - V0 , and both sides of (19.14) do vanish. • -+
Calculations
The h m on the left in (19.14) and (19.15) can be replaced by A m . To evaluate the integrals, however, still requires calculating I D I as defined by (19.10). Here only the case m = k - 1 will be considered. t If D is a k X ( k - 1) matrix, let z be the point in R k whose ith component is ( - 1) ; + 1 times the determinant of D with its i th row removed. Then IDI = 1 z 1 . (19.20 ) To see this, note first that for each x in R k the inner product x · z is the determinant of D augmented by adding x as its first column (expand by cofactors). Hence z is orthogonal to the columns x 1 , , x k _ 1 of D. If z =I= 0, let z 0 = z/ l z l ; otherwise, let z 0 be any unit vector orthogonal to x 1 , , x k _ 1 • In either case a rotation and an application of Fubini 's theorem shows that I D I = h k _ 1 (P(x 1 , , x k _ 1 )) = A k (P(z0, x 1 , , x k _ 1 )); this last quantity is by Theorem 12.2 the absolute value of the determinant of the matrix with columns z 0 , x 1 , , x k _ 1 • But, as above, this is z 0 z = 1 z I · •
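As a numerical aside (not in the original text), the sketch below implements the recipe behind (19.20) in Python: it forms the vector z of signed (k − 1) × (k − 1) minors of a k × (k − 1) matrix D and checks that |z| agrees with the Gram-determinant expression sqrt(det(DᵀD)) for the (k − 1)-dimensional volume of the parallelepiped spanned by the columns of D. The function name cofactor_vector and the random test matrix are illustrative choices only.

import numpy as np

def cofactor_vector(D):
    # z_i = (-1)^(i+1) det(D with its i-th row removed), as in (19.20)
    k = D.shape[0]
    return np.array([(-1.0) ** i * np.linalg.det(np.delete(D, i, axis=0))
                     for i in range(k)])

rng = np.random.default_rng(0)
k = 4
D = rng.normal(size=(k, k - 1))          # a k x (k-1) matrix

z = cofactor_vector(D)
gram = np.sqrt(np.linalg.det(D.T @ D))   # (k-1)-volume of P(x_1, ..., x_{k-1})

print(abs(np.linalg.norm(z) - gram) < 1e-10)   # True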
•
•
•
•
•
•
•
•
•
•
•
•
.
•
•
V be the1 unit sphere in R k - 1 . If t;(x) = X ; for i [1 - E�.:-lxf] 12 , then TV is the top half of the surface of the unit sphere in R k . The determinants of the ( k - 1) X ( k - 1) submatrices of Dx are easily calculated, and I Dx l comes out to t; 1 (x). If A k is the surface area of the unit sphere in R k , then by (19.14), A k = 2h k - 1 (TV) = 2 f vti 1 (x) dx. Integrating out one of the variables shows that A k = 2wVk _ 2 , where Vk _ 2 is the volume of the unit sphere in R k - 2 • The calculation in Example 18.4 now gives A 2 = 2w and i- I ' 2 2w (2 ( ) ) '17' .= A 2; - 1 = . . . . . 2 A ' 2 2 4 6 (2 i - 2) 1 3 5 ( . - 3) Example 19.3. < k and t k (x) =
Let
'
t For the general case, see SPIVAK.
I
•
•
•
•
•
19.
for i = 2, 3, . . . . Note that A k = k Vk for k = 1 , 2, . . . . By (19.5) the sphere in R k with radius r has surface area r k - tA k . • PROBLEMS 19.1. 19.2.
Describe S2 S1 A for the general triangle A in the plane. Show that in (19.1) finite coverings by open sets suffice if A is compact: for positive £ and B there exist finitely many m open sets Bn such that A c
Un Bn , diam Bn < £ , and cm En (diam Bn ) 19.3.
<
f
h m (A) + B. ,
f Let f be an arc or curve-a continuous mapping of an interval [a, b) into R k -and let /[a, b) = [/(t): a < t s b) be its trace or graph. The arc length L(f) of f is the supremum of the lengths of inscribed polygons:
L ( I)
=
n
sup E 1/ ( t; ) - I( t; - 1 ) I , i-1
where the supremum extends over sets { t; } for which a = t0 < t1 < · · · < tn == b. The curve is rectifiable if L (f) is finite. (a) Let /[ u , v] be the length of f restricted to a subinterval [ u , v ]. Show that I [ u , v] == I [ u, t] + I [ t, v] for u < t < v. (b) Let I ( t) be the length of f restricted to [ a, t ]. Show for s < t that 1/( t) - / (s) l < /[s, t] == /( t) - /(s) and deduce that h 1 ( / [a, b )) < L(f ): the one-dimensional measure of the trace is at most the length of the arc. (c) Show that h 1 (/[a, b]) < L( / ) is possible. (d) Show that 1/( b) - /(a) I < h 1 (/[a, b)). Hint: Use a compactness ar,fument-cover /[a, b) by finitely many open sets B1 , , BN such that
En _ 1 diam Bn < h 1 , ( ( / ( a, b]) + £.
•
•
•
(e) Show that h 1 (/[a, b]) = L(/) if the arc has no multiple points ( / is
19.4.
one-to-one). (f) Show that if the arc is rectifiable, then I ( t) is continuous. (g) Show that a rectifiable arc can be reparameterized in such a way that [a, b ] = [0, L(/)] and /( t) == t. Show that under this parameterization, h 1 ( A ) for a Borel subset A of the arc is the Lebesgue measure of r 1A , provided there are no multiple points. f Suppose the arc is smooth in the sense that /( t) == ( /1 ( t), . . . , fk ( t)) has continuous derivative / '( t) == ( /{( t), . . . , /f( t)). Show that L( /) == f: If' ( t) I dt. The curve is said to be parameterized by arc length if 1/ ' ( t) 1 1, because in that case f[s, t] has length t - s. f Show that a circle of radius r has perimeter 2wr: if /( t) == ( r cos t, r sin t) for O < t < 2w, then /� '" 1/ '( t) l dt = 2wr. The w here is that of trigonometry; for the connection with w as the area of the unit disc, see Problem 18.24. f Suppose that /(t) = (x( t), y( t)), a < t < b, is parameterized by arc length. For a < � s b, define l/;( �, ., ) = ( x( �), y( �), ., ). Interpret the trans formation geometrically and show that it preserves area and arc length. =
19.5.
19.6.
19.7.
19.8.
f (a) Calculate the length of the helix (sin 8, cos fJ, 8 ) , 0 < 8 < 2w. (b) Calculate the area of the helical surface ( t sin 8, t cos 8, 8 ) , 0 < 8 < 2 w, 0 < t s 1. Since arc length can be approximated by the lengths of inscribed polygons, it might be thought that surface area could be approximated by the areas of inscribed polyhedra The matter is very complicated, however, as an exam ple will show. Split a 1 by 2w rectangle into mn rectangles, each m- 1 by 2wn - 1 , and split each of these into four triangles by means of the two diagonals. Bend the large rectangle into a cylinder of height 1 and cir cumference 2w, and use the vertices of the 4mn triangles as the vertices of an inscribed polyhedron with 4 mn flat triangular faces. Show that the resulting polyhedron has area
Show that as m , n
[2w, oo ).
19.9.
--+ oo ,
the limit points of Am n fill out the interval
R3 4J2 . - /2 2 (18.22), (19.3) em 2 mwm ;r(1 m/2), m 0. Thus h m m.
A point on the unit sphere in is specified by its longitude 8, 0 < 8 < 2w, and its latitude 4J , - w/2 < 4J < w/2. Calculate the surface area of the sector 61 < 8 < 8 , 4J 1 < + <
19.10. By
can
be
written = is well defined for all
and this is Show that h 0 is
+
defined for all � counting measure. 19.11. f (a) Show that, if hm(A) < oo and B > 0, then hm+8(A) = 0. (b) Because of this, there exists a nonnegative number dim A , called the Hausdorff dimension of A , such that hm (A) = oo if m < dim A and hm (A) = 0 if m > dim A. Thus 0 < hm (A) < oo implies that dim A = m. Show that "most" sets in specified by m smooth parameters have (integral) dimension m. (c) Show that dim A S dim B if A C B and that dim Un A n = supn dim An. 19.12. f Show that hm (A) is well defined by (19.1) and (19.2) for sets A in an arbitrary metric space. Show that hm is an outer measure, whatever the space. Show that part (a) of the preceding problem goes through, so that Hausdorff dimension is defined for all sets in all metric spaces, and show that part (c) again holds.
Rk
Rk
19.13. Let x = (xb . . . , xk ) range over the surface of the unit sphere in and consider the set where au < x u < flu , 1 < u < t; here t < k - 2 and - 1 <
a u S flu � 1. Show that its area is ( 19.21 )
A k- r
[ Ak - tls 1 -
t
E
i-1
r ](k2) 2 / dx1 xl
•
•
•
dx , ,
where is as in Example 19.3 and S is the set where E�_ 1 xl < 1 and tlu < Xu < flu , 1 < U < t.
CHAPTER 4

Random Variables and Expected Values
SECTION 20. RANDOM VARIABLES AND DISTRIBUTIONS

This section and the next cover random variables and the machinery for dealing with them: expected values, distributions, moment generating functions, independence, convolution.

Random
Variables and Vectors
random variable on a probability space ( 0, �, P ) is a real-valued function X = X( w) measurable �. Sections 5 through 9 dealt with random
A
variables of a special kind, namely simple random variables, those with finite range. All concepts and facts concerning real measurable functions carry over to random variables; any changes are matters of viewpoint, notation, and terminology only. The positive and negative parts x+ and x- of X are defined as in (15.4) and (15.5). Theorem 13.5 also applies: Define
(20.1 )
1/l n ( x ) =
( k - 1 )2 - n n
If X is nonnegative and
Xn
=
- 1 ) 2 - n < X < k2 - n , 1 < k < n2 n , if x > n .
if ( k
1/1 n ( X), then 0
<
Xn i X.
If
X is not neces259
sarily nonnegative, define if if
( 20.2 )
X � 0, X s 0.
(This is the same as (13.6).) Then 0 < Xn ( w) i X( w) if X( w) > 0 and 0 � Xn (w) � X(w) if X(w) < 0; and IXn (w) l i I X(w) l for every w. The random variable Xn is in each case simple. A random vector is a mapping from 0 to R k measurable §'. Any mapping from 0 to R k must have the form w --+ X(w) = ( X1 (w), . . . , Xk ( w )), where each X;( w) is real; as shown in Section 13 (see (13.2)) , X is measurable if and only if each X; is. Thus a random vector is simply a k-tuple X = ( X1 , , Xk ) of random variables. •
•
•
Subfields
If f§ is a a-field for which f§ c §', a k-dimensional random vector X is of course measurable f§ if [ w: X( w) e H] e f§ for every H in gJ k _ The a-field a( X) generated by X is the smallest a-field with respect to which it is measurable. The a-field generated by a collection of random vectors is the smallest a-field with respect to which each one is measurable. As explained in Sections 4 and 5, a sub-a-field corresponds to partial information about w. The information contained in a( X) = a( X1 , , Xk ) consists of the k numbers X1 ( w ), . . . , Xk ( w ) t The following theorem is instructive in this connection, although only part (i) is used much. It is the analogue of Theorem 5.1, but there are technical complications in its proof. •
.
.
.
Theorem 20.1. Let X = (i) The a-field a( X)
( X1 , , Xk ) be a random vector. = a( X1 , , Xk ) consists exactly of the sets [Xe •
•
•
H] for H E fJII k. (ii) In order that a random variable Y be measurable a( X) a( X1 , , Xk ) it is necessary and sufficient that there exist a measurable map f: R k --+ R 1 such that Y( w) = /( X1 ( w ), . . . , Xk ( w )) for all w. PROOF. The class f§ of sets of the form [ X e H] for H e gJ k is a a-field. Since X is measurable a( X), f§ c a( X). Since X is measurable f§, a( X) c .
•
•
=
•
•
•
f§. Hence part (i). Measurability of f in part (ii) refers of course to measurability !1/ kj!l/ 1 • The sufficiency is easy: if such an f exists, Theorem 1 3. 1 (ii) implies that Y is measurable o ( X ). t The partition defined by (4.17) consists of the sets [ w: X( w )
=
x ] for x
e
Rk .
20.
To prove necessity,t suppose at first that Y is a simple random variable , Ym be its different possible values. Since A; [ w : Y( w ) y;] and let y 1 , lies in o( X) , it must by part (i) have the form [ w: X( w) E H;] for some H; in £Il k . Put I = L; Y;ln. ; certainly I is measurable. Since the A ; are disjoint, no X( w) can lie in more than one H; (even though the latter need not be disjoint), and hence I( X( w )) = Y( w ). To treat the general case, consider simple random variables Y,, such that Yn ( w) --+1 Y( w) for each w. For each n , there is a measurable function In : R k --+ R such that Yn( w ) = ln( X( w )) for all w. Let M be the set of x in R k for which { ln ( x ) } converges; by Theorem 1 3.4(iii), M lies in !1/k. Let f( x ) = lim nln(x) for x in M, and let l( x ) = 0 for x in R k - M. Since f = lim n ln iM and ln iM is measurable, I is measurable by Theorem 1 3.4(ii). For each w , Y(w) = lim nln( X( w )); this implies in the first place that X( w ) lies in M and in the second place that Y( w ) = lim n In ( X( w )) = I( X( w )). • •
•
=
•
=
Distributions
The distribution or law of a random variable X was in Section 14 defined as the probability measure on the line given by p. = PX- 1 (see (1 3.7)), or
(20 .3)
= P[X E A),
p. ( A )
A
E
£1/ 1 .
The distribution function of X was defined by (20 .4)
F( X ) = p.( - 00 ' X] = p [ X < X]
for real x. The left-hand limit of (20.5)
F satisfies
{ F(F X -- =F ,., (X)
( - 00 ' X ) = p [ X < X]' ( X ) = p. { X } = P [ X = X ) ,
)
and F has at most countably many discontinuities. Further, F is nonde creasing and right-continuous, and lim 00 F(x) = 0, lim 00F( x ) = 1 . By Theorem 14.1, for each F with these properties there exists on some probability space a random variable having F as its distribution function. A support for p. is a Borel set S for which p. ( S ) = 1 . A random variable, its distribution, and its distribution function are discrete if p. has a count able support S = { x 1 , x 2 , } . In this case p. is completely determined by the values p. { x 1 } , p. { x 2 } , x
•
•
•
•
_.
_
•
•
•
tpor a general version of this argument,
see
Problem 13.6.
x
_.
262
RANDOM VARIAB LES AND EXPECTED VA LUES
A familiar discrete distribution is the
binomial:
( 20.6) P [ X = r ] = p { r } = ( � ) p'(l - p r ' ,
r = 0, 1 , . . . , n .
There are many random variables, on many spaces,with this distribution: If { Xk } is an independent sequence such that P [ Xk = 1] = p and P [ Xk = 0] = 1 - p (see Theorem 5.2), then X could be E? 1X; , or Ef +;x;, or the sum of any n of the X; . Or 0 could be {0, 1 , . . . , n } if !F consists of all subsets, P { r } = p. { r } , r = 0, 1, . . . , n, and X( r) = r. Or again the space and ran dom variable could be those given by the construction in either of the two proofs of Theorem 14.1. These examples show that, although the distribu tion of a random variable X contains all the information about the probabilistic behavior of X itself, it contains beyond this no further information about the underlying probability space (D, �, P ) or about the interaction of X with other random variables on the space. Another common discrete distribution is the Poisson distribution with parameter A > 0: Ar " P [ X = r ] = p { r } = e- r! ,
( 20 .7)
r = 0, 1 , . . . .
A constant c can be regarded as a discrete random variable with X( w) = c. In this case P [ X = c] = p. { c } = 1 . For an artificial discrete example, let { } be an enumeration of the rationals and put
x1, x2 ,
•
•
•
(20 .8) the point of the example is that the support need not be contained in a lattice. A random variable and its distribution have density I with respect to Lebesgue measure if I is a nonnegative function on R 1 and (20.9)
P[ X e A) = p(A) =
j f ( x ) dx , A
A
E
!1/ 1 .
In other words, the requirement is that p. have a density with respect to A in the sense of (16.11). The density is assumed to be with respect to A if no other measure is specified. Taking A = R 1 in (20.9) shows that I must integrate to 1 . Note that I is determined only to within a set of Lebesgue measure 0: if 1 = g except on a set of Lebesgue measure 0, then g can also serve as a density for X and p..
20.
It follows by Theorem 3.3 that (20.9) holds for every Borel set A if it holds for every interval- that is, if
F( b ) - F(a )
=
bf x dx j ( ) a
holds for every a and b. Note that F need not differentiate to I everywhere (see (20.13), for example); all that is required is that I integrate properly- that (20.9) hold. On the other hand, if F does differentiate to I and I is continuous, it follows by the fundamental theorem of calculus that f is indeed a density for F.t For the exponential distribution with parameter a > 0, the density is if X < 0, if X > 0.
(20.10) The corresponding distribution function
F( x )
(20.1 1 )
=
{�
_
e - ax
if X S 0, if X > 0
was studied in Section 14. For the normal distribution with parameters (20.12)
f( x )
=
1 {ii exp 2w o
[ - ( x - m2 ) 2 ] , 2o
m and o,
o > 0,
- oo < x < oo ;
a change of variable together with (1 8.10) shows that I does integrate to 1 . For the standard normal distribution, m = 0 and o = 1 . For the uniform distribution over an interval (a, b ], (20.13)
l(x )
=
1
b-a
0
if
a < x s b,
otherwise.
The distribution function F is useful if it has a simple expression, as in (20.1 1). It is ordinarily simpler to describe p. by means of the density l(x) or the discrete probabilities ,., { x r } . If F comes from a density, it is continuous. In the discrete case, F increases in jumps; the example (20.8), in which the points of discontinuity ' The general question of the relation between differentiation and integration is taken up in Section 31 .
are dense, shows that it may nonetheless be very irregular. There exist distributions that are not discrete but are not continuous either. An example is p. (A) = �p.1( A) �p. 2 (A) for p. 1 discrete and p. 2 coming from a density; such mixed cases arise, but they are few. Section 31 has examples of a more interesting kind, namely functions F that are continuous but do not come from any density. These are the functions singular in the sense of Lebesgue; the Q( x ) describing bold play in gambling (see (7 .33)) turns out to be one of them. If has distribution p. and is a real function of a real variable,
+
X
g
(20.14)
g( X) p. g-1
in the notation (1 3.7). is Thus the distribution of In the case where there is a density, f and F are related by F(x )
(20.1 5 )
=
{'
- oo
f( t ) dt.
Hence f at its continuity points must be the derivative of F. As noted above, if F has a continuous derivative, this derivative can serve as the density f. Suppose that f is continuous and is increasing, and let T= The distribution function of is
g-1•
If
g(X)
P[ g ( X ) < x ) P[ X < T(x )) =
g
=
F( T( x )) .
T is differentiable, this differentiates to
�d P (g ( X ) < x )
( 20 .16 )
g(X)
=
/ ( T( x )) T'( x ) ,
which is thus the density for (as follows also by (17.8)). has normal density (20.12) and > 0, (20.16) shows that b If b and has the normal density with parameters Calculating (20.16) is generally more complicated than calculating the distribution function of X) from first principles and then differentiating, which works even if is many-to-one:
X
g(
Example
20.1.
If
a am +
ao .
X has the standard normal distribution, then
aX + g
for X > 0. Hence
20.
X2
has density
/ (x ) =
0
1 .� x - 1/2e x/2 -
v � 'IT
if X < 0, •
if X > 0 .
Multidimensional Distributions
For a k-dimensional random vector X = ( X1 , , Xk ), the distribution p. (a probability measure on gJ k ) and the distribution function F (a real func tion on R k ) are defined by . • •
(20.17)
{
= P ( ( X1 , , Xk ) E A ] , A e fJit k , F( x 1 , , x k ) = P( X1 < x 1 , , Xk < x k ] = p.(Sx ) ,
P. ( A )
• • •
• • •
• • •
where Sx = [ y: Y; < x ; , i = 1, . . . , k] consists of the points "southwest" of x. Often p. and F are called the joint distribution and joint distribution function of xl, . . . ' xk . Now F is nondecreasing in each variable, and &AF > 0 for bounded rectangles A (see (12.12)) As h decreases to 0, the set
s h=
+ h' i = 1' . . . ' k] decreases to Sx , and therefore (Theorem 2.1(ii)J F is continuous from above in the sense that lim h l 0 F(x 1 + h, . . . , x k + h) = F(x 1 , , x k ). Further, F( x , x k ) --+ 0 if x; --+ - oo for some i (the other coordinates held fixed), and F( x 1 , , x k ) --+ 1 if X; --+ oo for each i. For any F with these properties there is by Theorem 12.5 a unique probability measure p. on gJ k such that p. ( A ) = &AF for bounded rectangles A, and p. ( Sx ) = F(x) for Xt
1,
[ y : Y; < X;
• • •
• • •
all X.
• • •
As h decreases to 0, Sx , - h increases to the interior i = 1, . . . , k ] of Sx , and so
sxo = [ y:
Y; < X ;,
(20.18) Since F is nondecreasing in each variable, it is continuous at x if and only if it is continuous from below there in the sense that this last limit coincides with F(x). Thus F is continuous at x if and only if F(x) = p.(Sx ) = p. ( Sx0 ), which holds if and only if the boundary asx = sx - sxo (the y-set where Y; s; X; for all i and Y; = X; for some i) satisfies p.( asx ) = 0. If k > 1 , F can have discontinuity points even if p. has no point masses: if p. corre-
sponds to a uniform distribution of mass over the segment B = [(x, 0): 0 < x < 1] in the plane (p.( B) = A[x : 0 < x < 1, (x, O) e B]) then F is discontinuous at each point of B. This also shows that F can be discontinu ous at uncountably many points. On the other hand, for fixed x the boundaries asx , h are disjoint for different values of h, and so (Theorem 1 0.2(iv)) only countably many of them can have positive p.-measure. Thus x is the limit of points (x 1 + h, . . . , x k + h) at which F is continuous: the continuity points of F are dense. There is always a random vector having a given distribution and distribu tion function : Take (O, S&", P) = ( R k , f1t k , p.) and X(w) = w. This is the obvious extension of the construction in the first proof of Theorem 14.1. The distribution may as for the line be discrete in the sense of having countable support. It may have density I with respect to k-dimensional Lebesgue measure: p.(A) = fA I(x) dx. As in the case k = 1, the distribu tion p. is more fundamental than the distribution function F, and usually p. is described not by F but by a density or by discrete probabilities. If X is a k-dimensional random vector and g: R k .-. R ,. is measurable, then g( X) is an i-dimensional random vector; if the distribution of X is p., the distribution of g( X) is p.g - 1 , just as in the case k = 1 see (20.14). If gj : R k .-. R 1 is defined1 by gj (x 1 , , x k ) = xj, then gj ( X) is Xj , and its distribution P.j = p.gj is given by p.j (A) = p.[(x 1 , , x k ): xj e A] = P[ � e A] for A e 91 1 • The P.j are the marginal distributions of p.. If p. has a density I in R k , then P.j has over the line the density ,
-
•
•
•
•
•
•
Jj(x ) =
(20 .1 9 )
since by Fubini ' s theorem the right side integrated over A comes to p.[(x 1 , , x k ): xj e A]. Now suppose that g: U --+ V is one-to-one, where U and V are open subsets of R k . Suppose that T = g - 1 is continuously differentiable, and let J(x) be its Jacobian. If X has a density that vanishes outside U, then P[g( X) E A] = P[ X E TA] = frA I( y ) dy, and so by (17.1 3), •
•
(20.20 ) for A c
•
P ( g ( X ) e A ) = j f( Tx )IJ( x )l dx A
V. Thus the density for g( X) is I(Tx)I J(x) l on V and 0 elsewhere. t
tAlthough (20.20) is indispensible in many investigations, it is used in this book only for linear T and in Example 20.2; and here different approaches are possible-see ( 1 7. 14) and Proble m 1 8.24.
Example 20.2.
20. 2
20.
Suppose that ( X1 , X2 ) has density
and let g be the transformation to polar coordinates. Then U = R², and V and T are as in Example 17.6. If R and Θ are the polar coordinates of (X₁, X₂), then (R, Θ) = g(X₁, X₂) has density (2π)^{−1} r e^{−r²/2} in V. By (20.19), R has density r e^{−r²/2} on (0, ∞) and Θ is uniformly distributed over (0, 2π). ∎
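Example 20.2 can be run in reverse to simulate standard normal pairs; this is the classical Box–Muller device rather than anything stated in the text, and the sketch below (whose names and sample size are illustrative assumptions) draws R from the distribution function 1 − e^{−r²/2}, draws Θ uniformly over (0, 2π), and checks that R cos Θ and R sin Θ behave like standard normals.

import numpy as np

rng = np.random.default_rng(1)
n = 100_000

U = 1.0 - rng.uniform(size=n)      # uniform on (0, 1]
V = rng.uniform(size=n)
R = np.sqrt(-2.0 * np.log(U))      # P[R > r] = exp(-r^2/2)
Theta = 2.0 * np.pi * V            # uniform over (0, 2*pi), independent of R

X1 = R * np.cos(Theta)
X2 = R * np.sin(Theta)

# both coordinates should be approximately standard normal
print(round(X1.mean(), 2), round(X1.var(), 2))   # ~0.0, ~1.0
print(round(X2.mean(), 2), round(X2.var(), 2))   # ~0.0, ~1.0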
Independence
Random variables xl , . . . ' xk are defined to be independent if the a-fields , o ( Xk ) t�ey generate are independent in the sense of Section 4. a( X1 ), This concept for simple random variables was studied extensively in Chapter 1; the general case was touched on in Section 14. Since o ( X;) consists of the sets [ X; E H ] for H E 91 1 , xl , . . . ' xk are independent if and only if • • •
for all linear Borel sets H1 , , Hk . The definition (4.11) of independence requires that (20.21) hold also if some of the events [ X; e H; ] are sup pressed on each side, but this only means taking H; = R 1 Suppose that • • •
•
for all real x 1 , , xk; it then also holds if some of th� events [ X; < X;] are suppressed on each side (let X; --+ oo ). Since the intervals ( oo , x] form a tr-system generating !1/ 1 , the sets [ X; < x] form a w-system generating a ( X;). Therefore, by Theorem 4.2, (20.22) implies that X1 , Xk are independent. If, for example, the X; are integer-valued, it is enough that P [ X1 = n 1 , , Xk = n k ] = P [ X1 = n 1 ] P[ Xk = n k ] for integral " 1 ' . . , n k (see (5.6)). Let ( X1 , , Xk ) have distribution p. and distribution function F, and let the X; have distributions P. ; and distribution functions F; (the marginals). By (20.21), X1 , , Xk are independent if and only if p. is product measure in the sense of Section 18: • • •
-
, • • •
•
•
• • •
•
.
• • •
• • •
(20. 23 )
By (20.22), X1,
• • •
, Xk are independent if and only if
(20 .24) Suppose that each P. ; has density /;; by Fubini' s theorem, f1 (y 1 ) fk ( Yk ) X ( - oo , x k ] is just F1 (x 1 ) Fk (x k ), so integrated over ( oo, x 1 ] X that p. has density • • •
-
· · ·
· · ·
(20 .25 ) in the case of independence. If f§1 , , f§k are independent a-fields and X; is measurable f§;, i = 1 , . . . ' k, then certainly x1 , . . . ' xk are independent. If X; is a d;-dimensional random vector, i = 1, . . . , k, then X1 , , Xk are by definition independent if the a-fields a( X1 ), , a( Xk ) are indepen dent. The theory is just as for random variables: x1 , . . . ' xk are indepen dent if and only if (20.21) holds for H1 e gJd•, , Hk e gJd". Now ( X1 , , Xk ) can be regarded as a random vector of dimension d = E�_ 1 d;; X Rdk and P. ; is the distribution of if p. is its distribution in Rd = Rd1 X X; in Rd•, then, just as before, X1 , , Xk are independent if and only if X p. k . In none of this need the d; components of a single X; p. = p. 1 X be themselves independent random variables. An infinite collection of random variables or random vectors is by definition independent if each finite subcollection is. The argument follow ing (5.7) extends from collections of simple random variables to collections of random vectors: • • •
• • •
• • •
• • •
• • •
· · ·
• • •
·
·
·
Theorem 20.2.
(20 .26)
Suppose that X11 , X1 2 , X21 , x22 , . . . · · ·
•
•
•
•
•
•
•
•
•
is an independent collection of random vectors. If .fF; is the a-field generated by the ith row, then �1 , �2 , . . . are independent. Let �; consist of the finite intersections of sets of the form [ X;1 e H] with H a Borel set in a space of the appropriate dimension, and apply Theorem 4.2. The a-fields .fF; = a( �;), i = 1, . . . , n, are independent • for each n, and the result follows. PROOF.
Each row of (20.26) may be finite or infinite, and there may be finitely or infinitely many rows. As a matter of fact, rows may be uncountable and there may be uncountably many of them.
20.
Suppose that X and Y are independent random vectors with distribu tions p. and Jl in Ri and R k . Then ( X, Y ) has distribution p. X Jl in R i X R k = R i +k . Let x range over Ri and over R k . By Fubini's theorem,
y
(20 .2 7) ( p. X ) ( B ) = J1
Replace B by reduces to
(20 .2 8 )
( ll
X
f. [ y : ( x, y ) Rl
J1
E
B ) p. ( dx ) ,
(A X R k ) n B, where A E gJi and B E gJ J +k . Then (20.27) v)(( A X R k ) n B ) = j v[ y : ( x , y ) E B ) fl (dx) , A
Bx = [ y: ( x, y) E B] is the x-section of B, so that Bx E gJ k (Theorem 1 8 .1), then P[(x, Y) E B] = P[w: (x, Y( w )) E B] = P[w: Y(w) E Bx ] = •
gives this result: Theorem 20.3.
p,
and
Jl
(20 . 29 )
If X and Y are independent random vectors with distributions
in Ri and R k, then P [ ( X, Y )
E B) =
f. .P [( x , Y ) E B ) p. ( dx ) , RJ
B E gJ J +k
and
(20.30)
P [ X E A, ( X, Y ) E B ) =
£P [( x , Y ) E B ) fl ( dx ) , A E gJi, B E gJ J +k .
Suppose that X and Y are independent exponentially distributed random variables. By (20.29), P[Y/X > z] = ∫₀^∞ P[Y/x > z] αe^{−αx} dx = ∫₀^∞ e^{−αxz} αe^{−αx} dx = (1 + z)^{−1}. Thus Y/X has density (1 + z)^{−2} for z > 0. Since P[X > z₁, Y/X > z₂] = ∫_{z₁}^∞ P[Y/x > z₂] αe^{−αx} dx by (20.30), the joint distribution of X and Y/X can be calculated as well. ∎
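For a quick numerical check of this calculation (added here, not from the text; the parameter values are arbitrary), one can simulate independent exponentials and compare the empirical frequency of the event [Y/X > z] with (1 + z)^{−1}, which does not depend on α.

import numpy as np

rng = np.random.default_rng(2)
alpha, n, z = 2.0, 200_000, 1.5

# numpy parameterizes the exponential law by its scale 1/alpha
X = rng.exponential(scale=1.0 / alpha, size=n)
Y = rng.exponential(scale=1.0 / alpha, size=n)

print(round(np.mean(Y / X > z), 3))   # close to ...
print(round(1.0 / (1.0 + z), 3))      # ... 0.4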
Formulas (20.29) and (20.30) are constantly applied as in this example. There is no virtue in making an issue of each case, however, and the appeal to Theorem 20.3 is usually silent.
Here is a more complicated argument of the same sort. Let X1 , , Xn be independent random variables, each uniformly distributed over (0, I ] o Let Y" be the k th smallest among the X, , so that 0 < Y1 < · · · < Y,. < 1. The X, divide [0, 1 ] into n + 1 subintervals of lengths Y1 , Y2 - Y1 , , Yn - Yn - 1 , I - Y, ; let M be the maximum of these lengths. Define 1/1n ( I, a) = P [ M < a ]. The problem is to show that EXIIIle IIp 20.4. o
o
•
• • •
{ 20.31)
1/ln ( t, a )
( n 1 )( ) T:o ( - l) k z 1 - k �
n+l ...
"
+ ,
where x + ( x + l x l )/2 denotes positive part. Separate consideration of the possibilities 0 < a < 1/2, 1/2 < a < 1, and 1 < disposes of the case n 1. Suppose it is shown that the probability 1/Jn ( l, a) satisfies the recursion ==
a
==
( 20.32)
1/Jn ( l, a )
==
n a ( l - x ) - 1 dx nl l/Jn _ 1 ( 1 - x, a ) 1
1
0
°
Now (as follows by an integration together with Pascal' s identity for binomial coefficients) the right side of (20.31) satisfies this same recursion, and so it will follow by induction that (20.31) holds for all n. In intuitive form, the argument for (20.32) is this: If [ M < a] is to hold, the smallest of the X; must have some value x in [0, a ]. If X1 is the smallest of the X, , then x2 ' . . . ' xn must all lie in [x, 1] and divide it into subintervals of length at most a; the probability of nthis is (1 - xjl) n - 11/ln - 1 ( 1 - x, a ) because x2 , . . . ' xn have probability (1 - xjl) - l of all lying in [x, 1], and if they do, they are independent and uniformly distributed there. Now (20.32) results from integrating with respect to the density for X1 and multiplying by n to account for the fact that any of x1 ' . . . ' xn may be the smallest. To make this argument rigorous, apply (20.30) for j 1 and k n - 1. Let A be the interval [0, a], and let B consist of the points ( x 1 , . 0 0 , xn ) for which 0 < X; < I, x 1 is the minimum of x 1 , , xn , and x 2 , 0 0 . , xn divide [ x1 , 1 ] into subintervals of length at most a. Then P[ X1 min X;, M < a ] == P[ X1 e A , ( X1 , . . . , Xn ) e B). Take X1 for X and ( X2 , . . . , Xn ) for Y in (20.30)0 Since X1 has density 1/1, ==
• • •
(20 .33)
P ( X1
==
.
mm X; , M < a )
=
l0a
==
==
P ( ( x, X2 , . . . , Xn )
E
dx
B] I
o
n
If C is the event that x s X; < 1 for 2 < i < n , then P(C) (1 - xjl) - t o A simple calculation shows that P [ X; - X < s, , 2 < i s n i C] n;_ 2 (s;/(l - x )); in other WOrds, given C, X2 - X, . . Xn - X are conditionally independent and uni formly distributed over [0, I - X]. Now ( x2 ' 0 0 0 ' xn ) are random variables on some probability space ( Q, !F, P); replacing P by P( · I C) shows that the integrand in (20.33) is the same as that in (20032). The same argument holds with the index 1 o
,
=
==
20. RANDOM V ARIABLES AND DISTRIBUTIONS replaced by any k (1 < k < n ) which gives (20.32). (The events [ XI< y � a] are not disjoint, but any two intersect in a set of probability 0.) SECTION
,
271 = min
X, ,
•
Sequences of Random Variables Theorem 5.2 extends to general distributions P.n ·
{ If p. n } is a finite or infinite sequence of probability mea sures on !A 1 , there exists on some probability space ( 0, ofF, P) an independent sequence { xn } of random variables such that xn has distribution P.n ·
1beorem 20.4.
PROOF.
By Theorem 5.2 there exists on some probability space an indepen of random variables assuming the values 0 and 1 dent sequence Z 1 , Z2, with probabilities P[Zn = 0] = P[ Zn = 1] = ! . As a matter of fact, Theo rem 5.2 is not needed : take the space to be the unit interval and the Zn( w ) to be the digits of the dyadic expansion of w-the functions dn( w ) of Sections and 1 and 4. Relabel the countably many random variables Zn so that they form a double array, • • •
zl l zl 2 Z21 , Z22 ,
, · · ·
,
•
•
•
•
•
•
· · ·
•
•
•
the zn k are independent. Put un = Ef- I Zn k 2 - k . The series certainly converges, and Un is a random variable by Theorem 13.4. Further, U1 , U2 , is, by Theorem 20.2, an independent sequence. Now P[ Zn; = z; , 1 � i < k] = 2 - k for each sequence z 1 , , z k of O' s k and l's; hence the 2 k possible values j2 - k , 0 < j < 2 , of Sn k = E�_ 1Zn;2 - ; all have probability 2 - k . If 0 � x < 1, the number of the j2 - k that lie in [O,x ] is [2 k x] + 1, the brackets indicating integral part, and therefore P[S,.k � x ] = ([2 kx] + 1)/2 k . Since Sn k ( w ) i Un( w ) as k i oo , [ Sn k < x] � [Un k k S x ] as k t oo , and so P[Un � x ] = lim k P[Sn k < x ] = lim k ([2 x] + 1)/2 = X for 0 � X < 1. Thus Un is uniformly distributed over the unit interval. The construction thus far establishes the existence of an independent sequence of random variables Un each uniformly distributed over [0, 1]. Let F,. be the distribution function corresponding to P.n, and put cpn( u ) = inf[ x : u S Fn( x )] for 0 < u < 1. This is the inverse used in Section 14-see (14.5). Set q>,. (u) = 0, say, for u outside (0, 1), and put Xn( w ) = cpn(Un(w)). Since cp,.( u ) � x if and only if u < Fn( x )-see the argument following (14.5 ) -P[ Xn < x ] = P[Un < Fn(x)] = Fn(x). Thus Xn has distribution function Fn. And by Theorem 20.2, XI, x2, . . . are independent. • All
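The final step of the proof, X_n(ω) = φ_n(U_n(ω)), is easy to simulate. The sketch below is an added illustration, not part of the proof: it uses the exponential distribution function (20.11), for which the inverse φ(u) = −α^{−1} log(1 − u) is explicit, and checks the sample mean against 1/α; the names phi and alpha are illustrative.

import numpy as np

def phi(u, alpha):
    # inverse of F(x) = 1 - exp(-alpha*x): phi(u) = inf[x: u <= F(x)]
    return -np.log(1.0 - u) / alpha

rng = np.random.default_rng(3)
alpha = 2.0
U = rng.uniform(size=100_000)   # independent and uniform over [0, 1)
X = phi(U, alpha)               # has the exponential distribution (20.10)

print(round(X.mean(), 2))       # approximately 1/alpha = 0.5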
• • •
• • •
This theorem of course includes Theorem 5.2 as a special case, and its proof does not depend on the earlier result. Theorem 20.4 is a special case of Kolmogorov' s existence theorem in Section 36. Convolution
Let X and Y be independent random variables with distributions p. and " · Apply (20.27) and (20.29) to the planar set B = [(x, y): x + y e H] with
H E !1/ 1 :
j-oo P( H - x )p( dx ) oo = f oo P [ Y e H - x) p( dx ) . - oo
P [ X + Y e H) =
( 20.34 )
The
convolution of p. and " is the measure p. • " defined by
( 20.35 )
( p • ., )( H )
=
j-oo P( H - x )p ( dx ) , oo
H E !1/ 1 •
If X and Y are independent and have distributions p. and ,, (20.34) shows that X + Y has distribution p. • " · Since addition of random variables is commutative and associative, the same is true of convolution: p. • " = " • p. and p. • (" • '11 ) = ( p. • ") • '11 · If F and G are the distribution functions corresponding to p. and ,, the distribution function corresponding to p. • " is denoted F • G. Taking H ( - oo , y] in (20.35) shows that =
( F • G )( y) =
(20.36)
(See (17.16) for the notation
j-oo G(y - x ) dF( x). oo
dF(x).) If G has density g, then G(y - x) = JY--,;g(s) ds g(t x) dt, and so the right side of (20.36) is JY 00 fY 00 [ / 00 00 8( 1 - x) dF(x)] dt by Fubini's theorem. Thus F • G has density F • g, where =
(20.37) this holds if
G
has density
g.
If, in addition,
F
has density f,
(20.37) is
20.
denoted I • g and reduces by (16.12) to
oo ( / • g )(y) = j- g (y - x ) f(x ) dx . oo
(20. 38)
This defines convolution for densities, and p. • " has density I • g if p. and " have densities I and g. The formula (20.38) can be used for many explicit calculations. Let X1 , , Xk be independent random variables, each with the exponential density (20.10). Define gk by
EXIIIple II
• • •
20.5.
k- 1 ax ( ) - ax (20 .3 9) gk ( x ) = a ( k - l ) ! e ' put gk (x) = 0 for x < 0. Now
k=
X > 0,
1 , 2, . . . ;
1
Y = )( y) ( Y - x ) gl (x) dx, t k ( 8k - t * 8 t 8 0
which reduces to g,.(y). Thus gk = Bk - l • g 1 , and since g 1 coincides with (20. 1 0), it follows by induction that the sum XI + . . . + xk has density Bk · The corre·iponding distribution function is
X > 0, as
•
follows by differentiation.
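For a purely numerical illustration of (20.38) and (20.39) (added here, with an arbitrary grid and parameter choice), one can discretize the convolution integral of two exponential densities and compare it with g₂; the function name g is illustrative.

import math
import numpy as np

alpha = 1.0

def g(k, x):
    # the density (20.39): alpha*(alpha*x)^(k-1)*exp(-alpha*x)/(k-1)!, zero for x <= 0
    x = np.asarray(x, dtype=float)
    val = alpha * (alpha * x) ** (k - 1) * np.exp(-alpha * x) / math.factorial(k - 1)
    return np.where(x > 0, val, 0.0)

dx = 0.001
xs = np.arange(dx, 20.0, dx)
for y in (0.5, 1.0, 3.0):
    conv = np.sum(g(1, y - xs) * g(1, xs)) * dx        # Riemann sum for (20.38)
    print(round(conv, 3), round(float(g(2, y)), 3))    # the columns agree to about three decimals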
E
ple
Suppose that X has the normal density (20.12) with m == 0 and that Y has the same density with T in place of o . If X and Y are independent, then X + Y has density mm
20.6.
1
2 '1TO A
change of variable
[ /oo T - oo
exp -
u
]
( y - x ) 2 - x 2 dx . 2 2 2o
2T
= x(o 2 + T 2 ) 1 12joT reduces this to
m = 0 and with o 2 + T 2 in place
Thus X + Y has the normal density with of o 2 •
•
I f p. and " are arbitrary finite measures on the line, their convolution is
defined by
(20.35) even if
they are not probability measures.
Convergence in ProbabUity Random variables Xn converge in probability to X, written Xn lim P [I Xn - X I � £ ]
(20.41 )
n
--+ p X,
if
=0
1,
for each positive £. t If xn --+ X with probability then lim supn [ I xn - XI > £ ] has probability 0, and it follows by Theorem 4.1 that Xn --+ p X. But the
converse does not hold : There exist sets A n such that P( A n )
1.
--+ 0
and
For example in the unit interval, let A 1 , A be the 2 dyadic intervals of order one, A 3 , • • • , A 6, those of order two, and so on (see P(lim supn A n ) =
(4.30) or (1.31)). If = 0.
Xn =
lA
II
and X =
0,
then Xn
--+ p X
but P[lim n Xn =
X]
If Xn --+ X with probability I, then Xn --+ p X. (ii) A necessary and sufficient condition for Xn --+ p X is that each subse quence { Xnk } contain a further subsequence { Xnk
Theorem 20.5
PROOF.
The argument for (i) has been given.
{ nk }
{n
choose a subsequence k < ;> } so that k � k ( i ) implies that P [ I Xnk - XI � ; - 1 ] < By the first Borel-Cantelli lemma I f Xn
-+ p X,
given
there is probability
1
that I Xn k
2 - ;.
- XI <
) (l babill. ty . h pro Th ere fore, lim ; Xnk . = X Wit
;- I
1.
for
all
but finitely many
i.
If Xn does not �nverge to X in probability, there is some positive £ fo r
{ nk }·
which P [ I Xn - XI � £] > £ holds along some sequence No subse " quence of Xnk } can converge in probability to X, and hence none can
{
converge to X with probability It follows from
probability
1.
(ii)
1.
that if Xn
•
--+ p X
and Xn
--+ pY,
then X = Y with
It follows further that if I is continuous and xn
/( Xn ) -+ pf( X ).
--+ p X,
In nonprobabilistic contexts, convergence in probability becomes
gence in measure:
If In and
then
conver
f are real measurable functions on a measure
t This is often expressed p lim, X, - X.
20.
space ( 0, �' p.), and if p.[w: /, converges in measure to f.
lf(w) - fn (w) l
�
£] -+
0 for each £ > 0,
then
1be Glivenko-CanteUi Theorem*
The empirical distribution function for random variables XI , . . . ' distribution function Fn ( x, w) with a jump of n - I at each Xk ( w ):
xn
is the
(20.42 ) If the Xk have a common unknown distribution function F( x ), Fn ( x, w) is its natural estimate. The estimate has the right limiting behavior, according to the Glivenko-Cantelli theorem:
Suppose that X1 , X2 , are independent and have a com mon distribution function F; put Dn (w) = supx i Fn (x, w) - F(x) l . Then Dn � 0 with probability I. Theorem 20.6.
•
•
•
For each x, Fn ( x, w) as a function of w is a random variable. By right continuity, the supremum above is unchanged if x is restricted to the rationals, and therefore Dn is a random variable. The summands in (20.42) are independent, identically distributed simple random variables, and so by the strong law of large numbers (Theorem 6.1), for each x there is a set A x of probability 0 such that
(20 .43)
Fn( x, w) = F( x ) lim n
for "' � A x · But Theorem 20.6 says more, namely that (20.43) holds for w outside some set A of probability 0, where A does not depend on x-as there are uncountably many of the sets A x ' it is conceivable a priori that their union might necessarily have positive measure. Further, the conver gence in (20.43) is uniform in x. Of course, the theorem implies that with probability 1 there is weak convergence Fn ( x, w) � F( x ) in the sense of Section 14. As already observed, the set A x where (20.43) fails has probability 0. Another application of the strong law of large �umbers,. with /( - x ) in place of /( - x ] in (20.42), shows that (see (20.5)) lim n Fn ( x - , w) = F( x - ) except on a set Bx of probability 0. Let cp( u) =
PROOF OF THE THEOREM. 00 ,
*This topic may be omitted.
00 ,
inf[x: u ≤ F(x)] for 0 < u < 1 (see (14.5)), and put x_{m,k} = φ(k/m), m ≥ 1, 1 ≤ k ≤ m. It is not hard to see that F(φ(u)−) ≤ u ≤ F(φ(u)); hence F(x_{m,k}−) − F(x_{m,k−1}) ≤ m^{−1}, F(x_{m,1}−) ≤ m^{−1}, and F(x_{m,m}) ≥ 1 − m^{−1}. Let D_{m,n}(ω) be the maximum of the quantities |F_n(x_{m,k}, ω) − F(x_{m,k})| and |F_n(x_{m,k}−, ω) − F(x_{m,k}−)| for k = 1, ..., m. If x_{m,k−1} ≤ x < x_{m,k}, then F_n(x, ω) ≤ F_n(x_{m,k}−, ω) ≤ F(x_{m,k}−) + D_{m,n}(ω) ≤ F(x) + m^{−1} + D_{m,n}(ω) and F_n(x, ω) ≥ F_n(x_{m,k−1}, ω) ≥ F(x_{m,k−1}) − D_{m,n}(ω) ≥ F(x) − m^{−1} − D_{m,n}(ω). Together with similar arguments for the cases x < x_{m,1} and x ≥ x_{m,m}, this shows that

(20.44)    D_n(ω) ≤ D_{m,n}(ω) + m^{−1}.

If ω lies outside the union A of all the A_{x_{m,k}} and B_{x_{m,k}}, then lim_n D_{m,n}(ω) = 0 and hence lim_n D_n(ω) = 0 by (20.44). But A has probability 0. ∎
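As an added illustration (not part of the proof), D_n is easy to compute for simulated data, since for the step function F_n(·, ω) the supremum is attained at the order statistics of the sample; the choice of the standard exponential for F below is an arbitrary assumption, as are the names and sample sizes.

import numpy as np

def D_n(sample):
    # sup_x |F_n(x, w) - F(x)| for F(x) = 1 - exp(-x), checked at the order statistics
    x = np.sort(sample)
    n = len(x)
    F = 1.0 - np.exp(-x)
    upper = np.arange(1, n + 1) / n - F   # F_n(x_(i)) - F(x_(i))
    lower = F - np.arange(0, n) / n       # F(x_(i)) - F_n(x_(i)-)
    return max(upper.max(), lower.max())

rng = np.random.default_rng(4)
for n in (100, 1_000, 10_000, 100_000):
    print(n, round(D_n(rng.exponential(size=n)), 4))   # decreases toward 0, as the theorem predicts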
20.2. 20.3. 20.4.
20.5.
20.6. 20. 7. 20.8.
2. 10 f A necessary and sufficient condition for a a-field t§ to be countably generated is that t§ == a( X) for some random variable X. Hint: If t§ == k a(A 1 , A 2 , • • • ), consider X == Ej/_ 1 f( IA �< )/10 , where /(x) is 4 for x == 0 and 5 for x =F 0. If X is a positive random variable with density /, x- 1 has density /(1/x)jx 2 • Prove this by (20.16) and by a direct argument. Suppose that a two-dimensional distribution function F has a continuous density f. Show that /(x, y) == a 2 F(x, y)jax ay. The construction in Theorem 20.4 requires only Lebesgue measure on the Use the theorem to prove the existence of Lebesgue measure unit interval. k on R . First construct A k restricted to ( - n , n ] X · · · X ( - n , n ], and then pass to the limit ( n --+ oo ) . The idea is to argue from first principles, and not to use previous constructions, such as those in Theorems 12.5 and 18.2. and latitude of a random point on 19.9 f Let 8 and _, be the longitude 3 the surface of the unit sphere in R ; probability is proportional to surface area. Show that 8 and _, are independent, 8 is uniformly distributed over [0, 2 w ), and _, is distributed over [ - w/2 , + w/2] with density ! cos 4-. Suppose that A , B, and C are positive, independent random variables with distribution function F. Show that the quadratic Az 2 + Bz + C has real zeros with probability /0/0F(x 2 /4y) dF(x) dF(y). Show that X1 , X2 , are independent if a( X1 , , Xn _ 1 ) and a ( Xn ) are independent for each n. Let X0 , X1 , be a persistent, irreducible Markov chain and for a fixed state j let T1 , 72 , . . . be the times of the successive passages through j. Let Z1 == T1 and Zn == T, - T, _ 1 ,k n > 2. Show that Z1 , Z2 , . . . are indepen dent and that P [ Zn == k ] == Jj� > for n > 2. •
•
•
•
•
•
•
•
•
277 20. RANDOM VARIABLES AND DISTRIBUTIONS are non are distribution functions and p 1 , p2 20.9. Suppose that F1 , F2 , negative and add to 1. Show that F( x) = E�= 1 p,, F,, ( x ) is a distribution function. Show that, if F,(x) has density /,, ( x ) for each n, then F( x ) has density E�_ 1 p,, In ( x ). 20. 10. Let XI ' . . . ' xn be independent, identically distributed random variables and let , be a permutation of 1, 2, . . . , n. Show that P(( X1 , , X,, ) e H) == p [( xw l ' . . . ' x'IT, ) H) for H qJ". 20.11. i Ranks and records. Let X1 , X2 be independent random variables with a common continuous distribution function. Let B be the w-set where xm ( w ) == xn ( w) for some pair m ' n of distinct integers, and show that P( B) == 0. Remove B from the space 0 on which the Xn are defined. This leaves the joint distributions of the xn unchanged and makes ties impossi ble. n n Let r < > ( w ) == ( T1< n > ( w ), . . . , r,< > ( w )) be that permutation ( t 1 , , tn ) of (1, . . . , n) for which X, ( w ) < X, ( w ) < · · · < X, ( w ). Let Y, be the rank of xn among x. ' . . . ' xn : Y, == / if and only if i; < xn for exactly r - 1 values of i preceding n. n (a) Show that r< > is uniformly distributed over the n ! permutations. (b) Show that P[ Y, == r] == 1 /n , 1 < nr < n. (c) Show that Yk is measurable a( T< > ) for k < n. (d) Show that Y1 , Yi , . . . are independent. 20.12. f Record values. Let An be the event that a record occurs at time n: max k < n xk < xn . are independent and P(An) == 1/n. (a) Show that A 1 , A 2 , (b) Show that no record stands forever. (c) Let Nn be the time of the first record after time n. Show that P [ Nn == n + k] == n ( n + k - 1) - 1 ( n + k) - 1 • SECTION
• •
• • •
E
E
•
, •
•
.
•
•
•
•
•
•
•
• • •
20.13. Use Fubini's theorem to prove that convolution of finite measures is 20.14. 20.15.
20.16.
20.17. 20.18. 20.19.
commutative and associative. In Example 20.6, remove the assumption that m is 0 for each of the normal densities. Suppose that X and Y are independent and have densities. Use (20.20) to find the joint density for ( X + Y, X) and then use (20.19) to find the density for X + Y. Check with (20.38). If F( x - £) < F( x + £) for all positive £, x is a point of increase of F (see Problem 12. 9). If F( x - ) < F( x ), then x" is an atom of F. (a) Show that, if x and y are points of increase of F and G, then x + y is a point of increase of F • G. (b) Show that, if x and y are atoms of F and G, then x + y is an atom of F • G. Suppose that I' and " consist of masses an and fln at n , n == 0, 1, 2, . . . . Show that I' • " consists of a mass of E'k - oa k /Jn - k at n, n == 0, 1, 2, . . . i Show that two Poisson distributions (the parameters may differ) con volve to a Poisson distribution. Show that, if X1 , , Xn are independent and have the normal distribution (20.12) with m == 0, then the same is true of ( X1 + · · · + Xn )/ {n . • • •
20.20. The Cauchy distribution has density

(20.45)  $c_u(x) = \frac{1}{\pi}\,\frac{u}{u^2 + x^2}, \qquad -\infty < x < \infty,$

for u > 0. (By (17.10), the density integrates to 1.)
(a) Show that c_u ∗ c_v = c_{u+v}. Hint: Expand the convolution integrand in partial fractions.
(b) Show that, if X₁, …, X_n are independent and have density c_u, then (X₁ + ⋯ + X_n)/n has density c_u as well. Compare with Problem 20.19.

20.21. ↑ (a) Show that, if X and Y are independent and have the standard normal density, then X/Y has the Cauchy density with u = 1.
(b) Show that, if X has the uniform distribution over (−π/2, π/2), then tan X has the Cauchy distribution with u = 1.

20.22. 18.22 ↑ Let X₁, …, X_n be independent, each having the standard normal distribution. Show that

$\chi_n^2 = X_1^2 + \cdots + X_n^2$

has density

(20.46)  $\frac{1}{2^{n/2}\,\Gamma(n/2)}\, x^{(n/2)-1} e^{-x/2}$

over (0, ∞). This is called the chi-squared distribution with n degrees of freedom.

20.23. ↑ The gamma distribution has density

(20.47)  $f(x;\alpha,u) = \frac{\alpha^u}{\Gamma(u)}\, x^{u-1} e^{-\alpha x}$

over (0, ∞) for positive parameters α and u. Check that (20.47) integrates to 1. Show that

(20.48)  $f(\cdot\,;\alpha,u) \ast f(\cdot\,;\alpha,v) = f(\cdot\,;\alpha,u+v).$

Note that (20.46) is f(x; ½, n/2), and from (20.48) deduce again that (20.46) is the density of χ_n². Note that the exponential density (20.10) is f(x; α, 1), and from (20.48) deduce (20.39) once again.

20.24. ↑ The beta function is defined for positive u and v by

$B(u,v) = \int_0^1 x^{u-1}(1-x)^{v-1}\,dx.$

From (20.48) deduce that

$B(u,v) = \frac{\Gamma(u)\Gamma(v)}{\Gamma(u+v)}.$

Show that B(u, v) = (u − 1)!(v − 1)!/(u + v − 1)! for integral u and v.

20.25. 20.23 ↑ Let N, X₁, X₂, … be independent, where P[N = n] = q^{n−1}p, n ≥ 1, and each X_k has the exponential density f(x; α, 1). Show that X₁ + ⋯ + X_N has density f(x; αp, 1).

20.26. Let A_{n,m}(ε) = [|Z_k − Z| ≤ ε, n ≤ k ≤ m]. Show that Z_n → Z with probability 1 if and only if lim_n lim_m P(A_{n,m}(ε)) = 1 for all positive ε, whereas Z_n →_P Z if and only if lim_n P(A_{n,n}(ε)) = 1 for all positive ε.

20.27. (a) Suppose that f: R² → R¹ is continuous. Show that X_n →_P X and Y_n →_P Y imply f(X_n, Y_n) →_P f(X, Y).
(b) Show that addition and multiplication preserve convergence in probability.

20.28. Suppose that the sequence {X_n} is fundamental in probability in the sense that for ε positive there exists an N_ε such that P[|X_m − X_n| > ε] < ε for m, n ≥ N_ε.
(a) Prove there is a subsequence {X_{n_k}} and a random variable X such that lim_k X_{n_k} = X with probability 1. Hint: Choose increasing n_k such that P[|X_m − X_{n_k}| > 2^{−k}] < 2^{−k} for m, n ≥ n_k. Analyze P[|X_{n_{k+1}} − X_{n_k}| > 2^{−k}].
(b) Show that X_n →_P X.

20.29. (a) Suppose that X₁ ≤ X₂ ≤ ⋯ and that X_n →_P X. Show that X_n → X with probability 1.
(b) Show by example that in an infinite measure space functions can converge almost everywhere without converging in measure.

20.30. If X_n → 0 with probability 1, then n⁻¹ Σ_{k=1}^n X_k → 0 with probability 1 by the standard theorem on Cesàro means [A30]. Show by example that this is not so if convergence with probability 1 is replaced by convergence in probability.

20.31. 2.17 ↑ (a) Show that in a discrete probability space convergence in probability is equivalent to convergence with probability 1.
(b) Show that discrete spaces are essentially the only ones where this equivalence holds: Suppose that P has a nonatomic part in the sense that there is a set A such that P(A) > 0 and P(·|A) is nonatomic. Construct random variables X_n such that X_n →_P 0 but X_n does not converge to 0 with probability 1.
20.32. 20.28 20.31 ↑ Let d(X, Y) be the infimum of those positive ε for which P[|X − Y| ≥ ε] ≤ ε.
(a) Show that d(X, Y) = 0 if and only if X = Y with probability 1. Identify random variables that are equal with probability 1, and show that d is a metric on the resulting space.
(b) Show that X_n →_P X if and only if d(X_n, X) → 0.
(c) Show that the space is complete.
(d) Show that in general there is no metric d₀ on this space such that X_n → X with probability 1 if and only if d₀(X_n, X) → 0.

SECTION 21. EXPECTED VALUES
Expected Value as Integral

The expected value of a random variable X on (Ω, ℱ, P) is the integral of X with respect to the measure P:

$E[X] = \int X\,dP = \int_\Omega X(\omega)\,P(d\omega).$

All the definitions, conventions, and theorems of Chapter 3 apply. For nonnegative X, E[X] is always defined (it may be infinite); for the general X, E[X] is defined, or X has an expected value, if at least one of E[X⁺] and E[X⁻] is finite, in which case E[X] = E[X⁺] − E[X⁻]; and X is integrable if and only if E[|X|] < ∞. The integral ∫_A X dP over a set A is defined, as before, as E[I_A X]. In the case of simple random variables, the definition reduces to that used in Sections 5 through 9.

Expected Values and Distributions

Suppose that X has distribution μ. If g is a real function of a real variable, then by the change-of-variable formula (16.17),

(21.1)  $E[g(X)] = \int_{-\infty}^{\infty} g(x)\,\mu(dx).$

(In applying (16.17), replace T: Ω → Ω′ by X: Ω → R¹, μ by P, μT⁻¹ by μ, and f by g.) This formula holds in the sense explained in Theorem 16.12: It holds in the nonnegative case, so that

(21.2)  $E[|g(X)|] = \int_{-\infty}^{\infty} |g(x)|\,\mu(dx);$
if one side is infinite, then so is the other. And if the two sides of (21.2) are finite, then (21.1) holds.

If μ is discrete and μ{x₁, x₂, …} = 1, then (21.1) becomes (use Theorem 16.9)

(21.3)  $E[g(X)] = \sum_r g(x_r)\,\mu\{x_r\}.$

If X has density f, then (21.1) becomes (use Theorem 16.10)

(21.4)  $E[g(X)] = \int_{-\infty}^{\infty} g(x)f(x)\,dx.$

If F is the distribution function of X and μ, (21.1) can be written E[g(X)] = ∫_{−∞}^{∞} g(x) dF(x) in the notation (17.16).
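The special cases (21.3) and (21.4) are easy to check numerically. The following sketch is purely illustrative and is not part of the original text; it assumes the NumPy library and uses g(x) = x², a four-point discrete distribution, and the exponential density e^{−x} as examples of my own choosing.

```python
import numpy as np

rng = np.random.default_rng(0)
g = lambda x: x ** 2

# Discrete case, (21.3): X puts masses mu{x_r} on the points x_r.
xs = np.array([0.0, 1.0, 2.0, 3.0])
masses = np.array([0.1, 0.4, 0.3, 0.2])
print(np.sum(g(xs) * masses))                  # E[g(X)] = 3.4

# Density case, (21.4): X has the exponential density f(x) = e^{-x}, x > 0.
grid = np.linspace(0.0, 40.0, 400001)
dx = grid[1] - grid[0]
print(np.sum(g(grid) * np.exp(-grid)) * dx)    # Riemann sum, close to E[X^2] = 2

# Monte Carlo check of the same expectation: average g over simulated values.
sample = rng.exponential(scale=1.0, size=200000)
print(g(sample).mean())                        # also close to 2
```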
Moments

By (21.2), μ and F determine all the absolute moments of X:

(21.5)  $E[|X|^k] = \int_{-\infty}^{\infty} |x|^k\,\mu(dx) = \int_{-\infty}^{\infty} |x|^k\,dF(x), \qquad k = 1, 2, \ldots.$

Since j ≤ k implies that |x|^j ≤ 1 + |x|^k, if X has a finite absolute moment of order k, then it has finite absolute moments of orders 1, 2, …, k − 1 as well. For each k for which (21.5) is finite, X has kth moment

(21.6)  $E[X^k] = \int_{-\infty}^{\infty} x^k\,\mu(dx) = \int_{-\infty}^{\infty} x^k\,dF(x).$
These quantities are also referred to as the moments of μ and of F. They can be computed via (21.3) and (21.4) in the appropriate circumstances.

Example 21.1. Consider the normal density (20.12) with m = 0 and σ = 1. For each k, x^k e^{−x²/2} goes to 0 exponentially as x → ±∞, and so finite moments of all orders exist. Integration by parts shows that

$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} x^k e^{-x^2/2}\,dx = \frac{k-1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} x^{k-2} e^{-x^2/2}\,dx, \qquad k = 2, 3, \ldots.$

(Apply (18.16) to g(x) = x^{k−2} and f(x) = xe^{−x²/2}, and let a → −∞, b → ∞.) Of course, E[X] = 0 by symmetry and E[X⁰] = 1. It follows by induction that

(21.7)  $E[X^{2k}] = 1 \cdot 3 \cdot 5 \cdots (2k-1), \qquad k = 1, 2, \ldots,$

and that the odd moments all vanish. •

If the first two moments of X are finite and E[X] = m, then just as in Section 5, the variance is

(21.8)  $\mathrm{Var}[X] = E[(X-m)^2] = \int_{-\infty}^{\infty} (x-m)^2\,\mu(dx).$

From Example 21.1 and a change of variable, it follows that a random variable with the normal density (20.12) has mean m and variance σ².

Consider for nonnegative X the relation
(21.9)  $E[X] = \int_0^{\infty} P[X > t]\,dt = \int_0^{\infty} P[X \ge t]\,dt.$

Since P[X = t] can be positive for at most countably many values of t, the two integrands differ only on a set of Lebesgue measure 0, and hence the integrals are the same. For X simple and nonnegative, (21.9) was proved in Section 5; see (5.25). For the general nonnegative X, let X_n be simple random variables for which 0 ≤ X_n ↑ X (see (20.1)). By the monotone convergence theorem, E[X_n] ↑ E[X]; moreover, P[X_n > t] ↑ P[X > t], and so ∫₀^∞ P[X_n > t] dt ↑ ∫₀^∞ P[X > t] dt, again by the monotone convergence theorem. Since (21.9) holds for each X_n, a passage to the limit establishes (21.9) for X itself. Note that both sides of (21.9) may be infinite. If the integral on the right is finite, then X is integrable.

Replacing X by X·I_{[X > a]} leads from (21.9) to

(21.10)  $\int_{[X > a]} X\,dP = aP[X > a] + \int_a^{\infty} P[X > t]\,dt, \qquad a \ge 0.$

As long as a ≥ 0, this holds even if X is not nonnegative.
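As a purely illustrative aside (not in the original text), (21.9) can be checked by simulation; the sketch below assumes NumPy and uses an exponential variable with mean 2 as the example.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=100000)    # X >= 0 with E[X] = 2

ts = np.linspace(0.0, 25.0, 2501)
tail = np.array([(x > t).mean() for t in ts])  # empirical P[X > t]
print(x.mean(), tail.sum() * (ts[1] - ts[0]))  # both close to E[X] = 2
```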
Inequalities

Since the final term in (21.10) is nonnegative, aP[X > a] ≤ ∫_{[X > a]} X dP ≤ E[X]. Thus

(21.11)  $P[X \ge a] \le \frac{1}{a}\int_{[X \ge a]} X\,dP \le \frac{1}{a}\,E[X], \qquad a > 0,$

for nonnegative X. As in Section 5, there follows the inequality

(21.12)  $P[|X| \ge a] \le \frac{1}{a^k}\int_{[|X| \ge a]} |X|^k\,dP \le \frac{1}{a^k}\,E[|X|^k].$

It is the inequality between the two extreme terms here that usually goes under the name of Markov; but the left-hand inequality is often useful, too. As a special case there is Chebyshev's inequality,

(21.13)  $P[|X - m| \ge a] \le \frac{1}{a^2}\,\mathrm{Var}[X]$

(m = E[X]).

Jensen's inequality

(21.14)  $\varphi(E[X]) \le E[\varphi(X)]$

holds if φ is convex on an interval containing the range of X and if X and φ(X) both have expected values. To prove it, let l(x) = ax + b be a supporting line through (E[X], φ(E[X])), a line lying entirely under the graph of φ [A33]. Then aX(ω) + b ≤ φ(X(ω)), so that aE[X] + b ≤ E[φ(X)]. But the left side of this inequality is φ(E[X]).

Hölder's inequality is

(21.15)  $E[|XY|] \le E^{1/p}\!\left[|X|^p\right]\,E^{1/q}\!\left[|Y|^q\right], \qquad \frac{1}{p} + \frac{1}{q} = 1.$

For discrete random variables, this was proved in Section 5; see (5.31). For the general case, choose simple random variables X_n and Y_n satisfying 0 ≤ |X_n| ↑ |X| and 0 ≤ |Y_n| ↑ |Y|; see (20.2). Then (5.31) and the monotone convergence theorem give (21.15). Notice that (21.15) implies that if |X|^p and |Y|^q are integrable, then so is XY.

Schwarz's inequality is the case p = q = 2:

(21.16)  $E[|XY|] \le E^{1/2}[X^2]\,E^{1/2}[Y^2].$

If X and Y have second moments, then XY must have a first moment. The same reasoning shows that Lyapounov's inequality (5.33) carries over from the simple to the general case.
RANDOM VARIABLES AND EXPECTED VALUES
Joint Integrals
The relation (21 .1) extends to random vectors. Suppose that ( X1 , has distribution p. in k-space and g: R k -+ R 1 • By Theorem 16. 1 2,
•
•
•
, Xk )
(21 .17) with the usual provisos about infinite values. For example, E [ X;Xi ] = fR "x ;xip.(dx). If E[ X;] = m ; , the covariance of X; and Xi is
Independence and Expected Value
Suppose that X and Y are independent. If they are also simple, then E [ XY] = E[ X]E[ Y], as proved in Section 5 -see (5.21). Define Xn by (20.2) and similarly define Yn = 1/ln( Y+ ) - 1/l n( Y-). Then Xn and Yn are independent and simple, so that E[ I XnYn l 1 = E[ 1 Xn 1 1 E [ I Yn l ], and 0 < 1 Xn 1 i l XI , 0 � I Yn l i I Yl . If X and Y are integrable, then E[ I XnYn l 1 = E [ 1 Xn 1 1 E [ I Yn 1 1 i E[ I XI ] · E[ l Yl ], and it follows by the monotone conver gence theorem that E[ I XYI ] < oo ; since XnYn -+ XY and I XnYnl < I XYI , it follows further by the dominated convergence theorem that E [ XY] = lim n E [ XnYn] = limnE[Xn]E[ Yn] = E[ X]E [ Y ]. Therefore, XY is integrable if X and Y are (which is by no means true for dependent random variables) and E[ XY] = E[ X]E[Y]. This argument obviously extends inductively: If X1 , , Xk are indepen dent and integrable, then the product xl . . . xk is also integrable and •
•
•
(21 .18) Suppose that fl1 and fl2 are independent a-fields, A lies in fl1 , X1 is measurable G1, and X2 is measurable fl2• Then /A X1 and X2 are indepen dent, so that (21 .18) gives (21 .19)
£x. x2 dP = £ X1 dP . E [ X2 ]
if the random variables are integrable. In particular, (21 .20)
SECTION
21 .
EXPECTED VALUES
285
From (21 .1 8) it follows just as for simple random variables (see (5.24)) that variances add for sums of independent random variables. It is even enough that the random variables be independent in pairs. Moment Generating Functions
The
moment generating function is defined as
(21 .21 )
M ( s ) = E [ e•X] =
J oooo e•xfl ( dx ) = J oooo e •x dF( x ) -
-
for all s for which this is finite (note that the integrand is nonnegative). Section 9 shows in the case of simple random variables the power of moment generating function methods. This function is also called the Laplace transform of p., especially in nonprobabilistic contexts. Now f0e sxp.(dx) is finite for s < 0, and if it is finite for a positive s, then it is finite for all smaller s. Together with the corresponding result for the left half-line, this shows that M(s) is defined on some interval containing 0. If X is nonnegative, this interval contains ( - oo, 0] and perhaps part of (0, oo ); if X is nonpositive, it contains [0, oo) and perhaps part of ( - oo , 0). It is possible that the interval consists of 0 alone; this happens for example, if p. is concentrated on the integers and p. { n } = p. { - n } = Cjn 2 for n = 1, 2, . . . . Suppose that M(s) is defined throughout an interval ( -s0, s0), where So > 0. Since elsxl < esx + e-sx and the latter function is integrable ,., for l s i < s0, so is Er- o l sx l kjk! = e lsxl . By the corollary to Theorem 16.7, p. has finite moments of all orders and
Thus M(s ) has a Taylor expansion about 0 with positive radius of convergence if it is defined in some ( - s0 , s0), s0 > 0. If M(s) can somehow be calculated and expanded in a series E k a k s k , and if the coefficients ak can be identified, then, since ak must coincide with E[ X k ]jk!, the moments of X can be computed: E[X k ] = a k k ! It also follows from the theory of Taylor expansions [A29] that a k k ! is the k th derivative M < k > ( s ) evaluated at s = 0:
(21 .23) This holds if
M ( k l (O ) = E [ X k ] =
/-00
oo
X kfl ( dx ) .
M(s) exists in some neighborhood of 0.
286
RANDOM VARIABLES AND EXPECTED VALUES
Suppose now that M is defined in some neighborhood of s. If " has density e s xjM(s ) with respect to p. (see (16.11)), then " has moment generating function N( u ) = M(s + u)/M(s ) for u in some neighborhood of 0. But then by (21 . 23 ) N < k >(O) = j 00 00 X k v( dx ) = f 00 00 X ke sxp.( dx )/M(s ), and since N < k >(O) = M< k >(s )/M(s), ,
M < k l (s ) =
( 21 . 24 )
oo
j- oox ke sxp. ( dx ) .
This holds as long as the moment generating function exists in some neighborhood of s. If s = 0, this gives ( 21.23 ) again. Taking k = 2 shows that M( s) is convex in its interval of definition.
Example
For the standard normal density,
21. 2.
M( s ) =
oo
j J2w - oo 1
es:xe - :x z /2 dx =
and a change of variable gives
h e •z /2J
oo
- oo
e - < x - s )z /2 dx ,
( 21 .25 )
The moment generating function is in this case defined for all s. Since es2 1 2 has the expansion e • z/2 =
f _!_ ( � ) k = f
k -0
k!
2
1 · 3 · · · (2k (2k ) ! k -0
-
1 ) s 2k
'
the moments can be read off from (21.22), which proves ( 21 . 7 ) once more. •
Exllltlple
21.3.
function
In the exponential case
(20.10)
the moment generating
(21 .26) is defined for s < a. By (21.22) the k th moment is k !a - k . The mean and • variance are thus a - 1 and a - 2.
Example ( 21 .27 )
21.4.
For the Poisson distribution (20.7),
SECTION
s M(s) and M'(s ) = Ae ce in S moments are M '(O) = A and both A.
21 .
287
EXPECTED VALUES
M"(s) = (A2e 2 s + Ae s ) M(s), the first two M"(O) = A2 + A; the mean and variance are •
Let X₁, …, X_k be independent random variables, and suppose that each X_i has a moment generating function M_i(s) = E[e^{sX_i}] in (−s₀, s₀). For |s| < s₀, each exp(sX_i) is integrable and, since they are independent, their product exp(s Σ_{i=1}^k X_i) is also integrable (see (21.18)). The moment generating function of X₁ + ⋯ + X_k is therefore

(21.28)  $M(s) = M_1(s) \cdots M_k(s)$

in (−s₀, s₀). This relation for simple random variables was essential to the arguments in Section 9.

For simple random variables it was shown in Section 9 that the moment generating function determines the distribution. This will later be proved for general random variables; see Theorem 22.2 for the nonnegative case and Section 30 for the general case.
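As an illustrative aside (not part of the original text), (21.28) and (21.23) can be checked numerically. The sketch assumes NumPy and uses two independent exponential variables with parameter α = 2, for which the transform is α/(α − s) for s < α (cf. (21.26)); the parameter values are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(8)
alpha, s = 2.0, 0.5
x = rng.exponential(scale=1.0 / alpha, size=400000)
y = rng.exponential(scale=1.0 / alpha, size=400000)

# (21.28): the transform of the sum is the product of the transforms,
# here (alpha/(alpha - s))^2 for s < alpha.
lhs = np.exp(s * (x + y)).mean()                   # M_{X+Y}(s), empirically
rhs = np.exp(s * x).mean() * np.exp(s * y).mean()  # M_X(s) M_Y(s)
print(lhs, rhs, (alpha / (alpha - s)) ** 2)

# (21.23): numerical derivatives of M(s) = alpha/(alpha - s) at s = 0
# recover the moments E[X^k] = k!/alpha^k.
M = lambda t: alpha / (alpha - t)
h = 1e-5
print((M(h) - M(-h)) / (2 * h), 1.0 / alpha)                   # E[X]
print((M(h) - 2 * M(0.0) + M(-h)) / h ** 2, 2.0 / alpha ** 2)  # E[X^2]
```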
PROBLEMS 21.1.
Prove
1 J2w
11.2. 11.3. 21.4. 21.5. 11.6.
21.7.
1 00 e- x 12 dx -
,
oo
2
== 1 -
1 12 ,
differentiate k times with respect to t inside the integral (justify), and derive (21. 7) again. 2 n + 1 ] == X Show that, if X has the standard normal distribution, then E[ 1 1 n
2 n h/2/w . 16.23 f Suppose that X1 , X2 , . are identically distributed (not neces sarily independent) and E( I X1 1 ] < oo . Show that E[max k n i Xk 1 1 == o( n ). 20.12 f Records. Consider the sequence of records in the sense of Problem 20.12. Show that the expected waiting time to the next record is infinite. 20.20 f Show that the Cauchy distribution has no mean. Prove the first Borel-Cantelli lemma by applying Theorem 16.6 to indicator random variables. Why is Theorem 16.6 not enough for the second • •
s
Borel-Cantelli lemma? Derive (21.11) by imitating the proof of (5.26). Derive Jensen's inequality (21.14) for bounded X and q> by passing to the limit from the case of simple random variables (see (5.29)).
288
21.8. 21.9.
RANDOM VARIABLES AND EXPECTED VALUES
Prove (21.9) by Fubini' s theorem. Prove for integrable X that
E [ X) -
{ P[ X > t) dt - t "
0
- oo
P[ X < t) dt.
21.10. (a) Suppose that X and Y have first moments and prove
E [ Y ] - E [ X)
...
/-cooo ( P [ X < t <
Y) - P ( Y < t < X ] ) dt .
(b)
Let ( X, Y] be a nondegenerate random interval. Show that its expected length is the integral with respect to t of the probability that it covers t. 21.11. Suppose that X and Y are random variables with distribution functions F and G. (a) Show that if F and G have no common jumps, then E[ F( Y)] +
E[ G( X)] == 1. (b) If F is continuous, then E[ F( X)] == ! . (c) Even if F and G have common jumps, if X and Y are taken to be independent, then E[ F( Y)] + E[G( X)] == 1 + P[ X == Y]. (d) Even if F has jumps, E[ F( X)] == ! + 1E x P 2 ( X == x]. 21.12. Let X have distribution function F. (a) Show that E[ X+ ] < oo if and only if J: < - log F( t)) dt < oo for some a. (b) Show more precisely that, if
f
X> o
X dP < a ( - log F( a )) +
a
and F( a ) are positive, then
l eo ( - log F( t ) ) dt < F(l ) f G
a
X> o
X dP.
21.13. Suppose that X and Y are nonnegative random variables, r > 1, and
P[ Y � t] < t - 1/y:?. , X dP for t > 0. Use (21.9), Fubini's theorem, and Holder's inequality to prove E( Y' ] < (rj( r - 1))'E[ X' ). 21. 14. Let X1 , X2 , . be identically distributed random variables with finite sec ond moment. Show that nP[ I X1 1 > £{,;' ] � 0 and n - 1 12 max k < n i X�c l • •
� P o.
21.15. Random variables X and Y with second moments are uncorre/ated if
E[ XY ] == E[ X] E( Y].
(a) Show that independent random variables are uncorrelated but that the
converse is false. (b) Show that, if X1 , , Xn are uncorrelated (in pairs), then Var[ X1 + · · · + Xn ] == Var[ X1 ] + · · · + Var[ Xn 1 · 21.16. f Let X, Y, and Z be independent random variables such that X and Y assume the values 0, 1, 2 with probability t each and Z assumes the values 0 and 1 with probabilities l and 1 . Let X' == X and Y' == X + Z (mod 3). (a) Show that X', Y', and X' + Y' have the same one-dimensional distri butions as X, Y, and X + Y, respectively, even though ( X', Y') and ( X, Y) have different distributions. • • •
21 . EXPECTED VALUES 289 (b) Show that X' and Y' are dependent but uncorrelated. (c) Show that, despite dependence, the moment generating function of X' + Y' is the product of the moment generating functions of X' and Y'. Suppose that X and Y are independent, nonnegative random variables and that E[ X] == oo and E[ Y] == 0. What is the value common to E( XY] and E[ X] E[ Y]? Use the conventions (15.2) for both the product of the random variables and the product of their expected values. What if E[ X] == oo and 0 < E[ Y] < oo? Deduce (21.18) for k == 2 from (21.17), (20.23), and Fubini' s theorem. Do not in the nonnegative case assume integrability. Suppose that X and Y are independent and that f( x , y) is nonnegative. Put g(x ) == E[f(x, Y)] and show that E[ g( X)] == E( f( X, Y)]. Show more gen erally that Ix A g( X) dP == Ix A /( X, Y) dP. Extend to / that may be negative. f The integrability of X + Y does not imply that of X and Y separately. Show that it·does if X and Y are independent. SECTION
21 .17.
21.18. 21.19.
e
21.20.
&
21.21. The dominated convergence theorem (Theorem 16.4) and the theorem on
uniformly integrable functions (Theorem 16.13) of course apply to random variables. Show that they hold if convergence with probability 1 is replaced by convergence in probability.
f Show that if E( l Xn - XI ) -+ 0, then Xn -+ p X. Show that the converse is false but holds under the added hypothesis that the xn are uniformly integrable. If E[ I xn - X I ] -+ 0, then xn is said to converge to X in the mean . 21.13. f Write d1 ( X, Y) == E[ I X - Yl /(1 + I x· - Yl )]. Show that this is a met ric equivalent to the one in Problem 20.32. 21.24. 5.10 f Extend Minkowski's inequality (5.36) to arbitrary random variables.
21.22.
Suppose that p > 1. For a fixed probability space ( 0 , S', P), consider the set of random variables satisfying E[ l X I P ] < oo and identify random variables that differ only on a set1 of probability 0. This is the space L P == L P (Q, .F, P). Put dp ( X, Y) == E 1P ( I X - Y I P ] for X, y E LP. (a) Show that L P is a metric space under dP . (b) Show that L P is complete. 2.10 16.7 21.25 f Show that L P ( Q, S', P) is separable if S' is sep arable, or if §' is the completion of a separable a-field. 21 .25 f For X and Y in L2 put ( X, Y) == E[ XY]. Show that ( · , · { is an inner product for which d2 ( X, Y) == ( X - Y, X - Y). Thus L is a Hilbert space. 5.11 f Extend Problem 5.11 concerning essential suprema to random variables that need not be simple. Consider on (1, oo) densities that are multiples of e - ax , e - x2, x - 2e - ax , x - 2 ; also consider reflections of these through 0 and linear combinations of
21.25. 20.28 21.21 21.24 f
21.26. 21.17. 21.28. 21.29.
290
21.30. 21.31.
21.32. 21.33.
21.34. 21.35.
RANDOM VARIABLES AND EXPECTED VALUES
them. Show that the convergence interval for a moment generating function can be any interval containing 0. There are 16 possibilities, depending on whether the endpoints are 0, finite and different from 0, or infinite, and whether they are contained in the interval or not. For the density Cexp( - l x l 1 12 ), - oo < x < oo , show that moments of all orders exist but that the moment generating function exists only at s == 0. 16.8 f Show that a moment generating function M( s ) defined in ( - s0 , s0 ) , s0 > 0, can be extended to a function analytic in the strip [ z: - s0 < Re z < s0 ] . If M ( s ) is defined in [0, s0 ), s0 > 0, show that it can be extended to a function continuous in [ z: 0 < Re z < s0 ] and analytic in [z: 0 < Re z < s0 ]. Use (21 .28) to find the generating function of (20.39). The joint moment generating function of ( X, Y) is defined as M( s, t ) == E( esX + t Y]. Assume that it exists for ( s, t) in some neighborhood of the origin, expand it in a double series, and conclude that E [ xm yn ] is a m + nMjas m at n evaluated at s .. t == 0. For independent random variables having moment generating functions, show by (21 .28) that the variances add. 20.23 f Show that the gamma density (20.47) has moment generating function (1 - sja ) - uk for s < a. Show that the k th moment is u( u + 1) · · · ( u + k - 1 )/a . Show that the chi-squared distribution with n de grees of freedom has mean n and variance 2 n . SECI10N 22. SUMS OF INDEPENDENT RANDOM VARIABLES
Let X1 , X2, be a sequence of independent random variables on some probability space. It is natural to ask whether the infinite series E�_ 1 Xn converges with probability 1, or as in Section 6 whether n - 1 Ek_1 Xk converges to some limit with probability 1. It is to questions of this sort that the present section is devoted. Throughout the section, Sn will denote the partial sum Ek _ 1 Xk (S0 = 0) . •
•
•
The Strong Law of Large Numbers
The central result is a general version of Theorem 6.1 .
are independent and identically distributed and have finite mean, then S,jn --+ E [ X1 ] with probability 1. Theorem 22. 1.
If X1 , X2,
•
•
.
Formerly this theorem stood at the end of a chain of results. following argument, due to Etemadi, proceeds from first principles.
The
SECTION
22.
SUMS OF INDEPENDENT RANDOM VARIABLES
291
pJtOO F.
If the theorem holds for nonnegative random variables, then ,.- 1sn = n - 1Ei:_ 1 Xt - n - 1Ei:_ 1 X; -+ E[ X(] - E[ X}] = E[ X1 ] with probability 1. Assume then that Xk > 0. Consider the truncated random variables Yk = Xk ll x. k J and their par n tial sums Sn* = Ei:_ 1 Yk . For a > 1, temporarily fixed, let un = [a ], where the brackets denote integer part. The first step is to prove s
>£ <
(22 .1)
Since the
00 .
Xn are independent and identically distributed, n
n
k-1
k-1
Var[Sn* ] = E Var ( Yk ] < E E [ Yk2 ] n
= E E [ x.2J( X, s k ) ] k-1
s:
nE [ x.2I( x, s n d .
It follows by Chebyshev' s inequality that the sum in (22.1) is at most
Let K = 2aj(a - 1) and suppose x > 0. If N is the smallest n such that N u,. � x, then a > x, and since y � 2 [y] for y > 1,
E u ; 1 � 2 E a - n = Ka - N < Kx - 1 •
u,. � x
n�N
Therefore, E:_ 1 u n 1/[ x1 s u,. J < KX} 1 for X1 > 0, and the sum in (22.1) is at most K£ - 2E [ X1 ] < oo . From (22.1) it follows by the first Borel-Cantelli lemma (take a union over positive, rational £) that (Su* - E[Su* 1)/u n -+ 0 with probability 1 . But 1 1 by the consistency of Cesaro sunimation (:�30], n - E[Sn*1 = n - Ei:_ 1 E[Yk ] has the same limit as E[Yn ], namely, E[X1 ]. Therefore Su* fun -+ E[X1 ] with probability 1. Since II
00
00
E P [ Xn ¢ Yn ] = nE P [ X1 > n ] s: ,_ 1 -=1
1
00
0
P [ X1 > t ] dt = E [ X. J <
oo ,
292
RANDOM VARIABLES AND EXPECTED VALUES
another application of the first Borel-Cantelli lemma shows that Sn )/n --+ 0 and hence
(Sn* -
( 22 .2) with probability 1 . If u n < k < u n + l' then since
X; > 0,
u n + 1 /u n --+ a, and so it follows by (22.2) that ! E [ X1 ) � lim inf £k � lim sup £k < aE ( X1 ) a k k k k with probability 1. This is true for each a > 1 . Intersecting the correspond ing sets over rational a exceeding 1 gives Iim k Sk/k = E[X1 1 with probabil But
•
ity 1 .
Although the hypothesis that the xn all have the same distribution is used several times, independence is used only through the equation Var[ Sn* 1 = EZ - tVar[ Yk 1 , and for this it is enough that the xn be independent in pairs. The proof given for Theorem 6.1 of course extends beyond the case of simple random variables, but it requires E[Xt1 < oo .
Suppose that X1 , X2 , . . . are independent and identically distrib uted and E [ X1- 1 < oo, E[Xi1 = oo (so that E[ X1 1 = oo). Then n - 1EZ_ 1 Xk oo with probability 1 . By the theorem, n - 1 EZ_ 1 X; --+ E[ X1- 1 with probability 1, and so PROOF. it suffices to prove the corollary for the case X1 = X1+ � 0. If xn if 0 < xn � u, u xn< > = 0 if xn > u, then n - 1Ek_ 1 Xk � n - 1Ek_ 1 X�"> --+ E[Xf"> 1 by the theorem. Let u --+ oo . Corollary. -+
{
•
1be Weak Law and Moment Generating Functions
The statement and proof of Theorem 6.2, the weak law of large numbers, carry over word for word for the case of general random variables with second moments-only Chebyshev' s inequality is required. The idea can be
SECTION
22.
SUMS OF IN DEPENDENT RANDOM VARIABLES
293
used to prove in a very simple way that a distribution concentrated on [0 , oo ) is uniquely determined by its moment generating function or Laplace transform. For each A, let Y" be a random variable (on some probability space) having the Poisson distribution with parameter A. Since Y" has mean and variance A (Example 21 .4), Chebyshev' s inequality gives
Let
G"
Y"/A, so that [At] k A " E e - k ,. .
be the distribution function of
G"( t ) =
k -0
-
The result above can be restated as if t > 1 , if t < 1 .
( 22 .3)
In the notation of Section 14, G"(x) => &(x - 1) as A � oo . Now consider a probability distribution p. concentrated on [0, oo ). Let F be the corresponding distribution function. Define
s > 0;
(22 .4)
here 0 is included in the range of integration. This is the moment generating function (21 .21), but the argument has been reflected through the origin. It is a one-sided Laplace transform, defined for all nonnegative s. For positive s, (21.24) gives
.
00 k k > M< = (22 .5 ) ( s ) ( - 1 ) y ke - s vp. ( dy) . 0 Therefore, for positive x and s, k k ( 1 ( 1 sy ( ) ( ) kM< k ) ( s ) = - sy , fl ( dy ) e s (22.6) E E -� k. k.
1 0
k -0
= Fix
the
1 oo fooo GsA; ) fl( dy ) . SX
SX
k-0
x > 0. If t 0 < y < x, then Gsy (xjy) --. 1 as s � oo by (22.3); if y > x, limit is 0. If p. { x } = 0, the integrand on the right in (22.6) thus
t lf y = 0, the integrand in (22.5) is 1 for k = 0 and 0 for k in the middle term of (22.6) is 1.
>
1 ; hence for y = 0, the integrand
294
RANDOM VARIABLES AND EXPECTED VALUES
converges as s --+ oo to Iro. x 1 (y) except on a set of p.-measure bounded convergence theorem then gives [sx ]
(22 .7)
lim E
s -. 00
(
_
k 1) ,
k.
k -o
0.
The
s kM < k > ( s ) = p. (O, x ] = F( x ) .
Thus M( s) determines the value of F at x if x > 0 and p. { x } = 0, which covers all but countably many values of x in [0, oo ). Since F is right-con tinuous, F itself and hence p. are determined through (22. 7) by M( s ) . In fact p. is by (22. 7) determined by the values of M( s) for s beyond an arbitrary s0: Theorem 22.2.
Let p. and be probability measures on [0, oo ). If
where s 0 > 0, then p. Corollary.
Jl
= Jl.
Let ft and /2 be real functions on [0, oo ). If
where s0 > 0, then ft = /2 outside a set of Lebesgue measure 0. /; need not be nonnegative, and they need e - sx/;(x) must be integrable over [0, oo) for s > s0 • The
not be integrable, but
For the nonnegative case, apply the theorem to the probability densities g; (x) = e - so x/;(x)jm, where m = J:e - sox/;(x) dx, i = 1, 2. For • the general case, prove that It + /2 = fi + /1 almost everywhere. PROOF.
Example 22.1. If μ₁ ∗ μ₂ = μ₃, then the corresponding transforms (22.4) satisfy M₁(s)M₂(s) = M₃(s) for s ≥ 0. If μᵢ is the Poisson distribution with mean λᵢ, then (see (21.27)) Mᵢ(s) = exp[λᵢ(e^{−s} − 1)]. It follows by Theorem 22.2 that if two of the μᵢ are Poisson, so is the third, and λ₁ + λ₂ = λ₃. •
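As an illustrative aside (not part of the original text), the inversion formula (22.7) can be tried out numerically. For the exponential distribution with parameter α, the transform (22.4) is M(s) = α/(α + s) and M^{(k)}(s) = (−1)^k k! α/(α + s)^{k+1}, so the kth term of (22.7) reduces to (α/(α + s))(s/(α + s))^k; the sketch below (plain Python, with parameter values chosen only for illustration) sums these terms and watches them approach F(x) = 1 − e^{−αx} as s grows.

```python
import math

alpha, x = 2.0, 1.5
F_true = 1.0 - math.exp(-alpha * x)        # F(x) for the exponential law

for s in [5.0, 20.0, 100.0, 1000.0]:
    r = s / (alpha + s)
    # Sum over k <= [sx] of (-1)^k s^k M^(k)(s) / k!, in its simplified form.
    total = sum((alpha / (alpha + s)) * r ** k for k in range(int(s * x) + 1))
    print(s, total, F_true)                # the sum approaches F(x) as s grows
```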
•
•
Kolmogorov's Zero-One Law
Consider the set A of w for which n - t Ek- tXk (w) --+ 0 as n --+ oo . For each m , the values of Xt (w), . . . , Xm t (w) are irrelevant to the question of -
SECTION
22.
SUMS OF INDEPENDENT RANDOM VARIABLES
295
whether or not w lies in A, and so A ought to lie in the a-field o( Xm , X,. + 1 , ). In fact, limn n - 1EZ'-JXk (w) = 0 for fixed m, and hence w lies in l A if and only if lim n n - L� - m Xk ( w) = 0. Therefore, • • •
[
A = n U n w: n-1
(22 .8)
f
N�m n> N
Xk (w ) t k-m
<
t],
the first intersection extending over positive rational £ . The set on the inside lies in o ( Xm , Xm + 1 , ), and hence so does A. Similarly, the w-set where the series E n Xn (w) converges lies in each o ( Xm , Xm + 1 , ). The intersection !T= n :_ 1a ( Xn, Xn + 1 , ) is the tail a-field associated with the sequence X1 , X2 , ; its elements are tail events. In the case xn = IA ,' this is the a-field (4.29) studied in Section 4. The following general ' form of Kolmogorov s zero-one law extends Theorem 4.5. • • •
• • •
• • •
• • •
If { Xn } is independent, and if A E !T= n :_ 1 a(Xn , Xn + 1 , ), then either P(A) = 0 or P(A) = 1 . PROOF. Let $0 = Uk_ 1a( X1 , , Xk ). The first thing to establish is that � is a field generating the a-field a( X1 , X2 , ) If B and C lie in $0 , then B e a( X1 , . . . , �) and C e a ( X1 , . . . , Xk ) for some j and k; if m -= max { j, k }, then B and C both lie in a( X1 , . . . , Xm ), so that B U C e a( X1 , , Xm ) c $0 . Thus $0 is closed under the formation of finite unions; since it is similarly closed under complementation, $0 is a field. For H e fil l , [ Xn e H] e � c a($0 ), and hence Xn is measurable a ( � ) ; thus .F0 generates a (X1 , X2 , ) (which in general is much larger than $0 ). Suppose that A lies in !T. Then A lies in a( Xk +l ' Xk + 2 , ) for each k. Therefore, if B e a( X1 , , Xk ), then A and B are independent by Theo rem 20.2. Therefore, A is independent of $0 and hence by Theorem 4.2 is also independent of a( X1 , X2 , ) But then A is independent of itself: P( A n A) = P(A)P(A). Therefore, P(A) = P 2 (A), which implies that P(A) is either 0 or 1 . • Deorem 22.3. • • •
• • •
• • •
•
.
.
.
• • .
• • •
. • .
• • •
.
As noted above, the set where En Xn ( w) converges satisfies the hypothesis of Theorem 22.3, and so does the set where n - 1EZ _ 1 Xk (w) --+ 0. In many similar cases it is very easy to prove by this theorem that a set at hand must have probability either 0 or 1 . But to determine which of 0 and 1 is, in fact, the probability of the set may be extremely difficult. In connection with the problem of verifying the hypotheses of Theorem 22.3, consider this example: Let 0 = R l , let A be a subset of R 1 not contained in fll 1 (see the end of Section 3), and let $ be the a-field generated by fll 1 u { A }.
E
ple
xam
22. 2.
296
RANDOM VARIABLES AND EXPECTED VALUES
Take X1 ( w ) = IA (w) and X2 (w) = X3 (w) = · · · = w. Then A lies in a ( x1 , x2 , . . . ) Fix m. If one knows Xm (w), Xm + 1 (w), . . . , then he knows w itself and hence knows whether or not w lies in A. To put it another way, if X1 (w') = X1( w" ) for all j > m, then IA(w') = IA(w"); therefore, the values of X1 (w), . . . , Xm _ 1 (w) are irrelevant to the question of whether or not w E A. Does A lie in o( Xm , xm + 1 ' • • • )? For m � 2 the answer is no, because o( Xm , Xm + 1 , • • • ) = gJ 1 and A � gJ 1 • • .
Although the issue is a technical one of measurability, this example shows that in checking the hypothesis of Theorem 22.3 it is not enough to show for each m that A does not depend on the values of X1 , • • • , Xm _ 1 • It must for each m be shown by some construction like (22.8) that A does depend in a measurable way on xm , xm + 1 ' • • • alone. Maximal Inequalities
Essential to the study of random series are maximal inequalities-inequali ties concerning the maxima of partial sums. The best known is that of Kolmogorov.
Suppose that X1 , . • • , Xn are independent with mean 0 and finite variances. For > 0 Theorem 22.4.
a
( 22.9 ) Let A k be the set where are disjoint,
PROOF.
Ak
I Sk i
�
a
but
ISi l
<
a
for j' < k. Since the
n E [ s; ] > E J s; dP k=l
n - E
Ak
k-l
jA [ sf + 2Sk ( Sn - Sk ) + ( Sn - Sk ) 2] dP
k -= l
jA [ sf + 2Sk ( Sn - Sk ) ] dP.
n > E
k
k
A k and Sk are measurable o( X1 , • • • , Xk ) and Sn - Sk is measurable o( Xk+ 1 , • • • , Xn ), and since the means are all 0, it follows by (21.19) and Since
SECTION
22.
SUMS OF INDEPENDENT RANDOM VARIABLES
297
independence that fAkSk(Sn - Sk ) dP = 0. Therefore, n n E [s;] > L Sf dP > L a2P ( Ak ) k-1 k - 1 Ak
f
• By Chebyshev's inequality, P [ I Sn l > a ] < a - 2 Var[ Sn 1· That this can be strengthened to (22.9) is an instance of a general phenomenon: For sums of independent variables, if max k s n i Sk l is large, then I Sn l is probably large as well. Theorem 9.6 is an instance of this, and so is the following result, due to Etemadi.
Suppose that X1 ,
Theorem 22.5.
(22 .10 )
P
• • •
[ 1 ms kaxs n I Sk l � 4a]
, Xn
are independent. For a > 0
< 4 max P [ I Sk l � a] . n
1 sks
Let Bk be the set where I Sk i > 4 a but I Si l < 4 a for j < k. Since the Bk are disjoint, n-1 max I Sk l � 4 a < P [ I Sn l > 2 a ] + L P ( Bk n [ I Sn l < 2 a] ) 1 sksn k-1 n-1 < P [ I Sn l > 2a ] + L P ( Bk n [ I Sn - Sk i > 2 a] ) k-1 n-1 = P ( I Sn l > 2 a ] + L P ( Bk ) P ( I Sn - Sk i > 2 a ] k-1
PROOF.
]
P[
< 2 max P [ I Sn - Sk i > 2 a]
Os k < n
Now replace 4 a by 2 a and apply the same argument to the partial sums of the random variables in reverse order: Xn , . . . , X1 ; it follows that
P
[ Oms kax< n I Sn - Sk i > 2 a ] � 2 I ms kaxs n P [I Sk l > a ] .
These two inequalities give (22.10).
•
298
RANDOM VARIABLES AND EXPECTED VALUES
If the Xk have mean 0 and Chebyshev' s inequality is applied to the right side of (22.10), and if a is replaced by a/4, the result is Kolmogorov' s inequality (22.9) with an extra factor of 64 on the right side. For this reason, the two inequalities are equally useful for the applications in this section. Convergence of Random Series
For independent Xn , the probability that EXn converges is either 0 or 1 . It is natural to try and characterize the two cases in terms of the distributions of the individual xn .
Suppose that { Xn } is an independent sequence and E[Xn ] = 0. If E Var[Xn 1 < oo , then EXn converges with probability 1. PROOF. By (22.9),
Theorem 12.6.
Since the sets on the left are nondecreasing in
r, letting r --+
oo
gives
Since E Var[ Xn ] converges, (22 .1 1 )
for each £. Let E( n, £) be the set where supj , k ni Si - Sk i > 2£ , and put E( £) = n n E(n, £). Then E (n , £) � E(£), and (22 . 11 ) implies P( E(£)) = 0. Now U E ( £ ) , where the union extends over positive rational £, contains the set where the sequence { Sn } is not fundamental (does not have the Cauchy • property), and this set therefore has probability 0. �
(
Example 22.3. Let X_n(ω) = r_n(ω)a_n, where the r_n are the Rademacher functions on the unit interval (see (1.12)). Then X_n has variance a_n², and so Σ a_n² < ∞ implies that Σ r_n(ω)a_n converges with probability 1. An interesting special case is a_n = n⁻¹: if the signs in Σ ±n⁻¹ are chosen on the toss of a coin, then the series converges with probability 1. The alternating harmonic series 1 − 2⁻¹ + 3⁻¹ − ⋯ is thus typical in this respect. •
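As a purely illustrative aside (not part of the original text), the random signed series Σ ±1/n of Example 22.3 can be simulated; the sketch assumes NumPy, and different seeds give different, but convergent, sums.

```python
import numpy as np

N = 10 ** 6
for seed in range(3):
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 1.0], size=N)        # coin-tossed signs r_n
    partial = np.cumsum(signs / np.arange(1, N + 1))
    # The partial sums settle down: the change over the second half is tiny.
    print(partial[-1], abs(partial[-1] - partial[N // 2]))
```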
SECTION
22.
299
SUMS OF INDEPENDENT RANDOM VARIABLES
If EXn converges with probability 1, then Sn converges with probability 1 to some finite random variable S. By Theorem 20.5, this implies that S,. _... p S. The reverse implication of course does not hold in general, but it does if the summands are independent.
For an independent sequence { Xn }, the Sn converge with probability 1 if and only if they converge in probability. It is enough to show that if Sn --+ p S, then { Sn } is fundamental pJtOOF. with probability 1. Since 1beorem
22.7.
P ( I Sn +j - Sn l �
t)
[
s P l sn +j - S l
i] + [
>
P l sn - S l
>
i ].
Sn -+p S implies
(22. 12) But by
sup P ( I Sn +J - Sn l lim n }� 1
> £)
=
0.
(22.10), P
[
m � I Sn +J - Sn l
1 Sj S k
>£
] <4
[
m� P I Sn +J - Sn l
1 Sj S k
>
£ 4
],
and therefore
It now follows by before.
(22.12) that (22.11) holds, and the proof is completed as
•
The final result in this direction, the three series theorem, does provide necessary and sufficient conditions for the convergence of EXn in terms of the individual distributions of the xn . Let x� c) be xn truncated at c: X
Theorem 22.8.
aeries (22.1 3 ) In
Suppose that { Xn } is independent and consider the three
E P [ I Xn l
> c] ,
order that EXn converge with probability 1 it is necessary that the three Series converge for all positive c and sufficient that they converge for some positive c.
300
RANDOM VARIABLES AND EXPECTED VALUES
Suppose that the series (22.13) converge and put m�c > = E [ X�c>]. By Theorem 22.6, E( X�c> - m�c > ) converges with probabil ity 1 , and since Em�c> converges, so does EX�c >. Since P[ Xn ¢ x�c > i.o.] = 0 by the first Borel-Cantelli lemma, it follows finally that EXn converges with probability 1 . • PROOF OF SUFFICIENCY.
Although it is possible to prove necessity in the three series theorem by the methods of the present section, the simplest and clearest argument uses the central limit theorem as treated in Section 27. This involves no circular ity of reasoning, since the three series theorem is nowhere used in what follows. Suppose that EXn converges with probability 1 and fix c > 0. Since Xn --+ 0 with probability 1, it follows that Ex�c > converges with probability 1 and, by the second Borel-Cantelli lemma, that EP[ I xn I > c] < oo . Let M�c > and s!c > be the mean and standard deviation of s�c > = Ek - t x�c > . If s!c> --+ oo , then since the x�c> - m�c > are uniformly bounded, it follows by the central limit theorem (see Example 27.4) that PROOF OF NECESSITY.
( 22.14)
sn< c> - Mn< c>
And since
EX�c>
s�c >;s!c> --+
Y
2
2 1 1 dt. e j V2 '1T x
converges with probability 1, s!c> --+ 0 with probability 1, so that (Theorem 20.5)
limP n [ IS�c>;s!c> l
( 22.15 ) But
=
1
also implies
> £ ] = 0.
(22.14) and (22.15) stand in contradiction: p
oo
Since
< c> < c> - Mn< c> s s n n X< <£ < y, sn< c> sn< c >
is greater than or equal to the probability in (22.14) minus that in (22.15) , it is positive for all sufficiently large n (if x < y ) But then .
X-£<
- Mn< c>;sn< c> < y + £ '
and this cannot hold simultaneously for, say, (x - £, y + £) = ( - 1, 0) and (x - £, y + £ ) = (0, 1). Thus s!c> cannot go to oo , and the third series in (22.13) converges.
SECTION
22.
SUMS OF INDEPENDENT RAN DOM VARIABLES
301
And now it follows by Theorem 22.6 that E( X� ( ) - m �c) ) converges with probability 1 , so that the middle series in (22. 1 3) converges as well. • If Xn = rn a n , where rn are the Rademacher functions, then Ea; < oo implies that E Xn converges with probability 1 . If E Xn converges, then a n is bounded, and for large c the convergence of the series (22 . 1 3) implies E a; < oo : If the signs in E + a n are chosen on the toss of a coin, then the series converges with probability 1 or 0 according as Ea; converges or diverges. If Ea; converges but E l a nl diverges, then E + a n is • with probability 1 conditionally but not absolutely convergent. Example 22.4.
Theorems 22.6, 22.7, and 22.8 concern conditional convergence, and in the most interesting cases, E xn converges not because the xn go to 0 at a high rate but because they tend to cancel each other out.t Random Taylor Series *
Consider a power series E ± z n , where the signs are chosen on the toss of a coin. The radius of convergence being 1, the series represents an analytic function in the open unit disc D0 = [z: l z l < 1] in the complex plane. The question arises whether this function can be extended analytically beyond D0 • The answer is no: With probability 1 the unit circle is the natural boundary. 1beorem 22.9.
Let { xn } be an independent sequence such that n = 0, 1, . . . .
(22.16 )
There is probability 0 that F( w,
(22.17 )
Z
00
) = E Xn ( W ) z n n-O
coincides in D0 with a function analytic in an open set properly containing D0• It will be seen in the course of the proof that the w-set in question lies in a ( X0 , X 1 , ) and hence has a probability. It is intuitively clear that if the set is measurable at all, it must depend only on the Xn for large n and hence • • •
t See Problem 22.3 . *This topic, which requires complex variable theory, may be omitted.
302
RANDOM VARIABLES AND EXPECTED VALUES
must have probability either 0 or 1. PROOF.
Since n = 0, 1 , . . .
( 22.18 )
with probability 1, the series in (22.17) has radius of convergence 1 outside a set of measure 0. Consider an open disc D = [z: l z - r l < r ] , where r E Do and r > 0. Now (22.17) coincides in D0 with a function analytic in D0 u D if and only if its expansion
about r converges at least for this holds. The coefficient am ( "' ) =
�!
lz - rl
< r. Let
m
p< > ( "' · r ) =
AD be the set of w for which
E ( t':t)
X,. ( "' ) r n -
m
n-m
-
is a complex-valued random variable measurable o ( Xm, Xm + 1 , • • • ). By the m� �root test, w E AD if and only if lim supm Vl am ( w )I < r - 1 • For each m 0 , the condition for w E AD can thus be expressed in terms of a m 0( w ), a m o 1 ( w ), . . . alone, and so AD E (J ( Xm o' Xm o 1 ' • • • ). Thus AD has a prob ability, and in fact P(AD) is 0 or 1 by the zero-one law. Of course, P( A D) = 1 if D c D0• The central step in the proof is to show that P(AD) = 0 if D contains points not in D0 • Assume on the contrary that P(AD) = 1 for such a D. Consider that part of the cir cumference of the unit circle that lies in D, and let k be an integer large enough that this arc has length exceeding 2 '1Tjk. Define +
+
if n ¥= 0 ( mod k ) , if n = 0 ( mod k ) . Let BD be the w-set where the function
( 22.19 )
G(w, z ) = E Yn( w ) z n 00
n -O
coincides in D0 with a function analytic in D0 u D. The sequence { Y0 , Y1 , . . • } has the same structure as the original se quence: the Yn are independent and assume the values ± 1 with probabili ty t
SECTION
22.
303
SUMS OF INDEPENDENT RANDOM VARIABLES
each . Since BD is defined in terms of the Yn in the same way as AD is defined in terms of the Xn , it is intuitively clear that P( BD) and P(AD) must be the same. Assume for the moment the truth of this statement, which is somewhat more obvious than its proof. If for a particular w each of (22.17) and (22.19) coincides in D0 with a function analytic in D0 u D, the same must be true of
( 22 . 20 )
00
F( w, z ) - G ( w, z ) = 2 E xmk ( w ) z mk . m -0
Let D1 = [ze 2 "illk : z E D). Since replacing z by ze 2 "i /k leaves the function (22.20) unchanged, it can be extended analytically to each D0 u D1, I = 1 , 2, . . . . Because of the choice of k, it can therefore be extended analyti cally to [z: l z I < 1 + E] for some positive E; but this is impossible if (22.18) holds, since the radius of convergence must then be 1. Therefore, AD n BD cannot contain a point w satisfying (22.18). Since (22.18) holds with probability 1, this rules out the possibility P(AD) = P( BD) = 1 and by the zero-one law leaves only the possibility P(AD) = P( BD) = 0. Let A be the w-set where (22.17) extends to a function analytic in some open set larger than D0• Then w E A if and only if (22.17) extends to D0 U D for some D = [z: l z - r 1 < r] for which D - D0 ¢ 0, r is rational, and r has rational real and imaginary parts; in other words, A is the countable union of AD for such D. Therefore, A lies in o( X0, X1 , ) and has probability 0. It remains only to show that P(AD) = P(BD), and this is most easily done by comparing { xn } and { yn } with a canonical sequence having the same structure. Put Zn ( w ) = ( Xn ( w ) + 1)/2, and let Tw be E:- o zn ( w )2 - n - I on the w-set A* where this sum lies in (0, 1 ]; on 0 - A* let Tw be 1, say. Because of (22.16) P(A*) = 1. Let �= o( X0, X1 , ) and let � be the a-field of Borel subsets of (0, 1 ]; then T: 0 --+ (0, 1] is measurable �/!fl. Let rn(x) be the n th Rademacher function. If M = [x: r1(x ) = u ; , i = 1, . . . , n ], where u ; = ± 1 for each i, then P(T - 1M ) = P[ w : �((a,)) = u ; , i = 0, 1, . . . , n - 1] = 2 - n , which is the Lebesgue measure A(M) of M. Since these sets form a w-system generating ffl, P(T - 1M) = A(M ) for all M in fJI (Theorem 3.3). n Let MD be the set of x for which E�-o rn + 1 (x)z extends analytically to D0 U D. Then MD lies in ffl, this being a special case of the fact that AD lies in �. Moreover, if w E A*, then w E AD if and only if Tw E MD : 1 A * n AD = A* n T - MD. Since P(A*) = 1, it follows that P(AD) = A ( MD) . This argument only uses (22.16), and therefore it applies to { Yn } and BD • as well. Therefore, P(BD) = A( MD) = P(AD). • • •
• . •
304
RANDOM VARIABLES AND EXPECTED VALUES
The Hewitt-Savage Zero-One Law *
Kolmogorov's zero-one law does not apply to the events [Sn > an i.o.] and [ Sn = 0 i.o. ], which lie in each a-field a ( Sn , Sn + 1 , ) but not in each a ( X,. , Xn + 1 , ) . If the X,, are identically distributed as well as independent, however, these events come under a different zero-one law. Let �:· k consist of the sets H in � n + " that are symmetric in the first n coordinates in the sense that if ( x1 , , x,. + k ) lies in H then so does ( x '" 1 , , x'" n ' xn + 1 , , xn + k ) for every permutation of 1, . . . , n. Then �:· " is a a-field. Let 9:,, k consist of the sets [( XI ' . . . ' xn + k ) E H) for H E �:· k ' let 9:, be the a-field generated by Uk'.... 1 9:,, k , and let .9= n�_ 1 9:, . The information in 9:, corresponds to knowing the values xn + I ( w ) , xn + 2 ( w ), . . . exactly and knowing the values X1 ( w ) , . . . , Xn ( w) to within a permutation. If n < m , then [ Sm e Hm ] e .Ym . o c � m - n c � , and so [ Sm e H i.o.] comes under the Hewitt-Savage zero-one . m law: •
•
•
•
•
•
•
•
•
•
•
•
•
.
'IT
•
Theorem 22.10. If XI ' x2 ' . . are independent and identically distributed ' and if
A
e
.9,
.
then either P(A ) = 0 or P( A) = 1.
Theorem 11.4 there Since u�- I a ( XI ' . . . ' xn) is a field, by the corollary to n is for given £ an n and a set U = [( X1 , , Xn ) e H) ( H e � ) such that P( A�U ) < ( . Let v = [( xn + l ' . . . ' x2 n> E H). From A E .92 n + I and the fact that the xk are identically distributed as well as independent it will be shown that
PROOF.
•
( 22 .21 )
•
•
P( A�U ) = P ( A� V ) .
This will imply P(A � V) < £ and P(A�(U n V)) � P( A�U) + P(A � V) < 2£. Since U and V are independent and have the same probability, it will follow that P( .A ) is within £ of P(U) and hence P 2 (A) is within 2£ of P 2 (U) == P( U)P( V) = P( U n V), which is in tum within 2£ of P(A ). But then 1P 2 ( A ) - P(A ) I < 4£ for all £ , and so P( .A ) must be 0 or 1. It remains ntok prove (22.21). If B -E .92 n. k ' then B == [( X1 , , X2 n + k ) e J) for some J in &f 2 · and since •
s
•
•
'
it follows that P( B n U) == P( B n V). But since U k'_ 1 .92 n. k is a field generating .92 n , P( B n U ) == P( B n V) holds for all B in .92 n (Theorem 10.4). Therefore P(Ac n U) == P(Ac n V); the same argument shows that P(A n Uc) == P(A n • vc ) . Hence (22.21 ) . * This topic may be omitted.
SECTION
22.
SUMS OF INDEPENDENT RANDOM VARIAB LES
305
PROBLEMS 22. 1.
is an independent sequence and Y is measurable Suppose that X1 , X2 , ) for each n. Show that there exists a constant a such that a( Xn , Xn + 1 , P[ Y == a] == 1.
22.2.
21.12 f Let XI ' x2 ' . . . be independent random variables with distribution functions F1 , F2 , (a) Show that P[supn Xn < oo ] is 0 or 1 and is in fact 1 if and only if En (1 - F, ( x)) < oo for some x. (b) Suppose that X == supn Xn is finite with probability 1 and let F be its distribution function. Show that F( X ) == n � F, (X ) . (c) Show further that E[ x+ ] < oo if and only if En fx a Xn dP < oo for some a .
• • •
• • •
•
• •
•
-1
II
>
22.3.
Assume { Xn } independent and define X!c > as in Theorem 22.8. Prove that for E I xn I to converge with probability 1 it is necessary that EP[ I xn I > c) and E E( 1 X! c > 1 ] converge for all positive c and sufficient that they converge for some positive c. If the three series (22.13) converge but EE( I X!c > 1 1 == oo , then there is probability 1 that EXn converges conditionally but not absolutely.
22.4.
f (a) Generalize the Borel-Cantelli lemmas: Suppose Xn are nonnega tive. If E E( Xn ] < oo , then EXn converges with probability 1. If the Xn are independent and uniformly bounded, and if EE[ Xn ] = oo , then EXn di verges with probability 1. (b) Construct independent, nonnegative Xn such that EXn converges with probability 1 but EE[ Xn 1 diverges. For an extreme example, arrange that P[ Xn > 0 i.o. ) == 0 but E[ Xn ] = oo .
22.5.
Show under the hypotheses of Theorem 22.6 that EXn has finite variance and extend Theorem 22.4 to infinite sums.
22.6.
are independent, each with the Cauchy 20.20 f Suppose that X1 , X2 , distribution (20.45) for a common value of u . (a) Show that n - 1 EZ .... 1 Xk does not converge with probability 1 to a constant. Contrast with Theorem 22.1. (b) Show that P[ n - 1 max k s n xk < x] --+ e - ujwx for X > 0. Relate to Theorem 14.3.
12.7.
If X1 , X2 , are independent and identically distributed, and if P[ X1 > 0] == 1 and P[ X1 > 0] > 0, then En Xn == oo with probability 1. Deduce this from Theorem 22.1 and its corollary and also directly: find a positive £ such that xn > ( infinitely often with probability 1.
11.8.
Suppose that XI ' x2 ' . . . are independent and identically distributed and show that En Pl i Xn l > an ] == oo for each a , E[ I X1 1 ] == oo . Use (21.9) to 1 and conclude that supnn - 1 Xn l == oo with probability 1. Now show that supn n - 1 1 sn I == oo with probability 1. Compare with the corollary to Theo rem 22.1.
• •
•
• • •
306
22.9.
RANDOM VARIABLES AND EXPECTED VALUES
Wald ' s equation. Let XI ' x2 ' . . . be independent and identically distributed
with finite mean, and put sn == x. + . . . + xn . Suppose that T is a stopping time: T has positive integers as values and [ T == n ] E a( XI ' . . . ' xn ); see Section 7 for examples. Suppose also that E[ T] < oo. (a) Prove that (22.22)
22.10.
22.11.
22.12.
22. 13.
22.14. 22.15.
22. 16.
(b) Suppose that xn is ± 1 with probabilities p and q , p :F q , let T be the first n for which sn is - a or b ( a and b positive integers), and calculate E[ T ) . This gives the expected duration of the game in the gambler's ruin problem for unequal p and q. 20.12 f Let Zn be 1 or 0 according as at time n there is or is not a record in the sense of Problem 20.12. Let Rn == Z1 + · · · + Zn be the number of records up to time n. Show that R n/log n -+pl. 22.1 f (a) Show that for an independent sequence { Xn } the radius of n convergence of the random Taylor series En Xn z is r with probability 1 for some nonrandom r. (b) Suppose that the xn have the same distribution and P[ XI :F 0] > 0. Show that r is 1 or 0 according as log + 1 X1 1 has finite mean or not. Suppose that X0 , X1 , . . are independent and each is uniformly distributed n ;x over [0, 2w]. Show that with probability 1 the series Ene ,. z has the unit circle as its natural boundary. In the proof of Kolmogorov's zero-one law, use the corollary to Theorem 11.4 in place of Theorem 4.2. Hint: See the first part of the proof of the Hewitt-Savage zero-one law. Prove (what is essentially Kolmogorov's zero-one law) that if A is indepen dent of a w-system gJ and A e a(gJ), then P(A ) is either 0 or 1. 11.3 f Suppose that .JJI is a semiring containing Q. (a) Show that if P(A n B ) < bP( B) for all B e .JJI, and if b < 1 and A e a( .JJI ) , then P(A ) == 0. (b) Show that if P( A n B) S P( A)P( B) for all B e .JJI, and if A e a(..w') , then P( A ) is 0 or 1. (c) Show that if aP( B) < P(A n B) for all B e .JJI, and if a > 0 and A e a( .JJI ) , then P( A ) == 1. (d) Show that if P( A )P( B ) < P( A n B) for all B e .JJI, and if A e a(..w' ) , then P( A ) is 0 or 1. (e) Reconsider Problem 3.20. 22.14 f Burstin ' s theorem. Let I be a Borel function on [0, 1] with arbitrarily small periods: For each £ there is a p with 0 < p < £ such that l( x ) == l( x + p ) for 0 < x < 1 - p. Show that such an I is constant almost everywhere: (a) Show that it is enough to prove that P< r 1 B) is 0 or 1 for every Borel set B, where P is Lebesgue measure on the unit interval •
SECTION
23.
307
THE POISSON PROCESS
Show that r 1 B is independent of each interval [0, X], and conclude that P< r 1 B ) is 0 or 1. (c) Show by example that f need not be constant. (b)
SECTION 23. THE POISSON PROCESS

Characterization of the Exponential Distribution

Suppose that X has the exponential distribution with parameter α:

(23.1)  $P[X > x] = e^{-\alpha x}, \qquad x \ge 0.$

The definition (4.1) of conditional probability then gives

(23.2)  $P[X > x + y \mid X > x] = P[X > y], \qquad x, y \ge 0.$
Consider next a stream or sequence of events, say arrivals of calls at an exchange. Let X1 be the waiting time to the first event, let X2 be the waiting time between the first and second events, and so on. The formal model consists of an infinite sequence X1 , X2 , of random variables on some probability space, and sn = XI + . . . + xn represents the time of occurrence of the n th event; it is convenient to write S0 = 0. The stream of events itself remains intuitive and unformalized, and the mathematical definitions and arguments are framed exclusively in terms of the Xn. If no two of the events are to occur simultaneously, the Sn must be strictly increasing, and if only finitely many of the events are to occur in each finite interval of time, sn must go to infinity: •
•
•
n
308
RANDOM VARIABLES AND EXPECTED VALUES
This condition is the same thing as
(23.4 )
n
Throughout the section it will be assumed that these conditions hold everywhere-for every w. If they hold only on a set A of probability 1, and if Xn ( w) is redefined as Xn ( w) = 1, say, for w � A , then the conditions hold everywhere and the joint distributions of the xn and sn are unaffected.
Condition 0°.
For each w, (23.3) and (23.4) hold.
The arguments go through under the weaker condition that (23.3) and (23.4) hold with probability 1, but they then involve some fussy and uninteresting details. There are at the outset no further restrictions on the X;; they are not assumed independent, for example, or identically distrib uted. The number N1 of events that occur in the time interval [0, t] is the largest integer n such that sn < t:
(23.5)  N_t = max[n: S_n ≤ t].

Note that N_t = 0 if t < S_1 = X_1; in particular, N_0 = 0. The number of events in (s, t] is the increment N_t - N_s.

[Figure: a sample path of N_t, a step function starting at S_0 = 0 that jumps from 0 to 1 at S_1 = X_1, from 1 to 2 at S_2 = X_1 + X_2, and so on.]

From (23.5) follows the basic relation connecting the N_t with the S_n:

(23.6)  [N_t ≥ n] = [S_n ≤ t].

From this follows

(23.7)  [N_t = n] = [S_n ≤ t] - [S_{n+1} ≤ t] = [S_n ≤ t < S_{n+1}].

Each N_t is thus a random variable.
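The definitions (23.5) through (23.7) translate directly into code. The sketch below (an added illustration with arbitrary waiting times) computes N_t = max[n: S_n ≤ t] from the partial sums S_n and recovers S_n = inf[t: N_t ≥ n] back from the counts, anticipating the correspondence described in the next paragraphs.

    import numpy as np

    rng = np.random.default_rng(1)
    # Any positive waiting times satisfying (23.4) will do; exponentials are one choice.
    X = rng.exponential(scale=1.0, size=20)
    S = np.concatenate(([0.0], np.cumsum(X)))          # S_0 = 0, S_1, S_2, ...

    def N(t):
        """N_t = max[n: S_n <= t], as in (23.5)."""
        return int(np.searchsorted(S, t, side="right") - 1)

    # (23.6): the events [N_t >= n] and [S_n <= t] coincide.
    t, n = 3.7, 4
    assert (N(t) >= n) == (S[n] <= t)

    # Recover the arrival times from the counting function: S_n = inf[t: N_t >= n].
    grid = np.linspace(0.0, float(S[-1]), 200_001)
    counts = np.array([N(u) for u in grid])
    S_back = [grid[np.argmax(counts >= k)] for k in range(1, 6)]
    print(np.round(S[1:6], 3))
    print(np.round(S_back, 3))                         # agree up to the grid spacing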
The collection [N_t: t ≥ 0] is a stochastic process, that is, a collection of random variables indexed by a parameter regarded as time. Condition 0° can be restated in terms of this process:

Condition 0°. For each ω, N_t(ω) is a nonnegative integer for t ≥ 0, N_0(ω) = 0, and lim_{t→∞} N_t(ω) = ∞; further, for each ω, N_t(ω) as a function of t is nondecreasing and right-continuous, and at the points of discontinuity the saltus N_t(ω) - sup_{s<t} N_s(ω) is exactly 1.

It is easy to see that (23.3) and (23.4) and the definition (23.5) give random variables N_t having these properties. On the other hand, if the stochastic process [N_t: t ≥ 0] is given and does have these properties, and if random variables are defined by S_n(ω) = inf[t: N_t(ω) ≥ n] and X_n(ω) = S_n(ω) - S_{n-1}(ω), then (23.3) and (23.4) hold, and the definition (23.5) gives back the original N_t. Therefore, anything that can be said about the X_n can be stated in terms of the N_t, and conversely. The points S_1(ω), S_2(ω), ... of (0, ∞) are exactly the discontinuities of N_t(ω) as a function of t; because of the queueing example, it is natural to call them arrival times. The program is to study the joint distributions of the N_t under conditions on the waiting times X_n and vice versa. The most common model specifies the independence of the waiting times and the absence of aftereffect:
and so (23.3) and (23.4) hold with probability 1 ; to assume they hold everywhere (Condition 0°) is simply a convenient normalization. Under Condition 1 o, Sn has the distribution function specified by 1 (20.40), so that P[N1 � n] = Er:_ n ( ) / by (23.6), and
law of large numbers (Theorem
22.1),
e - at i ! " at ) ( , (23 .8) P [ Nt = n ] = e_,. n ! ' n = 0, 1 , . . . . Thus N, has the Poisson distribution with mean at. More will be proved 0
;
Presently.
Condition
N,1,
•
•
•
2°
, N," -
(i)
N,k - 1
For 0 < t1 < are independent.
· · ·
<
tk the increments
N11, N1 2 -
310
RAN DOM VARIABLES AND EXPECTED VALUES
The individual increments have the Poisson distribution: " t s)) a ( ( s P ( N, - Ns = n] = e - t ( 23 .9) n! , n = 0, 1 , . . . , 0 < s < t. (ii)
a(
)
t Poisson process,
Since N0 = 0, (23.8) is a special case of (23. 9). A collection [ N,: > 0] of random variables satisfying Condition 2° is called a and a is the of the process. As the increments are independent by (i), if < < then the distributions of Ns - N, and N, - Ns must convolve to that of N, - N,. But this requirement is consistent with (ii) because Poisson distributions with parameters and convolve to a Poisson distribution with parameter +
rate r s t,
u
u v. Conditions 1 ° and
Theorem 23. 1.
Condition 0°.
v
2°
are equivalent in the presence of
t
t.
1 o --+ 2° . Fix and consider the events that happen after time By (23.5), SN < < SN + I ' and the waiting time from to the first event following is SN + 1 - the waiting time between the first and second events following is XN + 2 ; and so on. Thus PROOF OF
t
t t
I
( 23 .10 ) Xfl) =
t
I
t;
I
I
SN + l - t, I
X
X
t.
I
·
m,
define the waiting times following By (23.6), N, + s - N, > or N, + s > N, + m , if and only if SN + m < + which is the same thing as x: t) + · · · + X�'> < Thus
s.
( 23 .11)
t s,
I
N, + s - N, = m [ m : ax
m
Xf'> + · · · + X�' > <
s] .
Hence [N, + s - N, = ] = [ Xf'> + · · · + X�> < s < Xf'> + · · · + X�'! 1]. A comparison of (23.11) and (23.5) shows that for fixed the random variables N, + s - N, for > 0 are defined in terms of the sequence (23.10) in exactly the same way as the Ns are defined in terms of the original sequence of waiting times. The idea now is to show that conditionally on the event [N, = ] the random variables (23.10) are independent and exponentially distributed. Because of the independence of the Xk and the basic property (23.2) of the exponential distribution, this seems intuitively clear. For a proof, app ly (20.30). Suppose y > 0; if Gn is the distribution function of Sn , then sin ce
s
t
n
SECTION
Xn + l
By
23.
31 1
THE POISSON PROCESS
h as the exponential distribution,
=
1
=
e - ay
the assumed independence of the
=
If H = ( y 1 , oo )
X
P [ Xn +l
x
1
t + y - x ) dGn( x )
>
P ( Xn + l > t - x) dGn( x )
x
X,,
P(Sn + l - t > yI ' S
11
<
-
t
<
Sn + l ] e - ay2 · · ·
· · · X ( y1 , oo ), this is
(23.12 ) P [ N, = n , ( Xf ' > , . . . , xp>) e n] =
By Theorem 10.4, all H in gJ i.
e - ayJ
P ( N, = n ) P [ ( X1
the equation extends from
, • . .
, �) E H] .
H of the special form above to
Now the event [Ns . = m ; , 1 < e H ] , where j = m + 1 and u
i < u] can be put in the form [( X , �) H is the set of x in R 1 for which x 1 + + x m < S ; < x 1 + · · · + x m + I' 1 < i < u. But then [( Xf t ) , . . . , X1� 1 > ) e H ] is by (23.11) the same as the event [N, + s - N, = m ; , 1 < i < u]. Thus (23.12) gives I
·
·
·
1,
•
I
• • •
I
I
P [ N, = n , Nt + s1 - N, = m ; , 1
<
i < u]
From this it follows by induction on k that if 0
(23 .13 ) P [ N, - N,1 - l = n,. , 1 I
<
i < k]
=
t0
<
<
···
0 P [ N, _ 11 - 1 = n ; ] · k
=
t1
i-1
I
<
tk , then
312
RANDOM VARIABLES AND EXPECTED VALUES
Thus Condition 1 o implies (23.13) and, as already seen, (23.8). But from (23.13) and (23.8) follow the two parts of Condition 2°. •
2° --+ 1°. If 2° holds, then by (23.6), P( X1 > t) = P(N1 = 0] = e - at, so that X1 is exponentially distributed. To find the joint distribution of X1 and X2 , suppose that 0 < s 1 < t 1 < s 2 < t 2 and perform the calculation PROOF OF
=
a ( t l - s . )( e -•u z - e - arz ) =
a 2e - <>Yz dyl dy2 . < yl < l1 s2 < Y2SI2
JJsl
Thus for a rectangle A contained in the open set G
=
[( y1 , y2 ): 0 < y1
< y2 ]
,
By inclusion-exclusion, this holds for finite unions of such rectangles and hence by a passage to the limit for countable ones. Therefore, it holds for A = G n G' if G' is open. Since the2 open sets form a w-system generating the Borel sets, (S1 , S2 ) has density a e - ay2 on G (of course, the density is 0 outside G). By a similar argument in R k (the notation only is more complicated), (S 1 , , Sk ) has density a ke - aYk on [y: 0 < y1 < · · · < Yk ]. If a linear transformation g( y) = X is defined by X; = Y; - Y; - I ' then ( xl ' . . . ' xk ) = g(S l , . . . ' Sk ) has by (20.20) the density n�- l ae - OtX; (the Jacobian is identi • cally 1). This proves Condition 1°. • • •
The Poisson Approximation
Other characterizations of the Poisson process depend on a generalization of the classical Poisson approximation to the binomial distribution.
Suppose that for each n, Zn 1 , , Zn r are independent random variables and Zn k assumes the values 1 and 0 with probabilities Pnk Theorem 23.2.
• • •
II
SECTION
and
1 - Pn k · If
23.
rll L Pn k --+ A � 0,
(23. 14 )
313
THE POISSON PROCESS
max
1 sk s
k-l
11r Pn k --+ 0,
then
i = 0, 1 , 2, . . . .
(23 .15 )
= 0, the limit in (23.15) is interpreted as 1 for i = 0 and 0 for i > 1. In the case where rn = and Pn k = Ajn , (23.15) is the Poisson approxima tion to the binomial. Note that if A > 0, then (23.14) implies rn --+ oo . If A
n
Io : 1
-
P
II :p
I
�1 1 o l�----� - ------��----------�--� J1 :e -P / ! J2 J0:e -P PROOF.
p1
The argument depends on a construction like that in the proof of Theorem 20.4. Let U1 , U2 , be independent random variables, each uni formly distributed over [0, 1). For each p, 0 � p < 1, split [0, 1) into the two intervals /0 ( p) [0, 1 - p) and /1 ( p) = [1 - p, 1), as well as into the . tervals J ( p ) I I I. , L.t - (�;�;i _ 1 e pp j;J· I. ) , 1 - 0 1 sequence of tn L.ti _ 1e - pp j1J ; Define vn k = 1 if uk E /t ( Pn k ) and v,. k = .0 - if uk E lo ( Pn k > · Then v,.l , . . . , vn r are independent and vn k assumes the values 1 and 0 with probabilities P[Uk E /1 ( Pn k )] = Pn k and P[Uk E /0 ( Pn k )] = 1 - Pn k· Since Zn r , (23.15) will V,.1, . . . , Vn r have the same joint distribution as Zn1 , follow if it is shown that V,. = Ek��- tvn k satisfies • • •
=
.
-
.
-
,
, • • •
•
II
• • • ,
II
II
(23.16 ) Wn k = i if Uk e l;( Pn k ), i = 0, 1, . . . . Then P[ Wn k = i] = e -P �cp!Ji ! - Wn k has the Poisson distribution with mean Pn k· Since the W,. k are independent, Wn = Ek��- 1W,. k has the Poisson distribution with mean A n = Ek��- tPn k· Since 1 - p < e P , J1 ( p) c /1 ( p) (see the diagram). Now define ..
-
Therefore,
= rnn k e - P 111c pn k -< rnn2k ' -
314
RANDOM VARIABLES AND EXPECTED VALUES
and r,.
P [ Vn ¢ Wn] < E P;k < A n 1 max Pn k --+ 0 < k < r,. k-l
by
(23.14). And now (23.16) and (23.15) follow because •
Other Characterizations of the Poisson Process
The condition (23.2) is an interesting characterization of the exponential distribution because it is essentially qualitative. There are qualitative char acterizations of the Poisson process as well. For each w , the function N,( "' ) has a discontinuity at t if and only if Sn ( "' ) = t for some n > 1; t is a fixed discontinuity if the probability of this is positive. The condition that there be no fixed discontinuities is therefore
( 23 .17 )
P ( Sn = t ] = 0,
t > 0,
n > 1;
that is, each of S 1 , S2 , has a continuous distribution function. Of course there is probability 1 (under Condition 0°) that N,( "' ) has a discontinuity somewhere (and indeed has infinitely many of them). But (23.17) ensures that a t specified in advance has probability 0 of being a discontinuity, or time of an arrival. The Poisson process satisfies this natural condition. •
•
•
If Condition 0° holds and [ N,: t > 0] has independent increments and no fixed discontinuities, then each increment has a Poisson distribution.
Theorem 23.3.
This is Prekopa's theorem. The conclusion is not that [ N,: t > 0] is a Poisson process, because the mean of N, - Ns need not be proportional to t - s. If cp is an arbitrary nondecreasing, continuous function on [0, oo) with cp (O) = 0, and [ N,: t > 0] is a Poisson process, then Ncp
t This is in fact the general process satisfying them; see Problem 23.9. * This proof carries over to general topological spaces ; see Timothy C. Brown, A mer. Math . Month(v, 91 ( 1984) 1 16- 1 23.
23. THE POISSON PROCESS The procedure is to construct a sequence of partitions
315
SECTION
(23 .18) of
t'
=
t n O < t nl < . . . < t n
rII
=
t"
[t ', t "] with three properties. First, each decomposition refines the preced
ing one: each t n k is a t n + 1 , j . Second, r,.
E p [ Nl,. k - Nln . k - 1
(23 .19)
>
k-l for
some finite
(2 3 .20 )
1] j A
A and max P [ N,nk - N,n,k - 1 > l
_ r,.
1 ) --+ 0.
Third,
(2 3 . 2 1 )
[
P max ( N1nk - N1 n. k- 1 ) > 2 1Sk< r,.
] --+ 0.
Once the partitions have been constructed, the rest of the proof is easy: Let Zn k be 1 or 0 according as N,nlr. - N,n,k- 1 is positive or not. Since [ N,: t � 0] has independent increments, the Zn k are independent for each n. By Theorem 23.2, therefore, (23.19) and (23.20) imply that Zn = Ek _ 1 Zn k satisfies P[ Zn = i] --+ e - A A;/i! Now N,, - N,, > Zn , and there is strict inequality if and only if N1nlr. - N,n k- 1 > 2 for some k. Thus (23.21) implies i A P[N,, - N,, ¢ Zn 1 --+ 0, and there1ore P[N,, - N,, = i] = e - A'/i! To construct the partitions, consider for each t the distance D, = infm � 1 1 t - Sm l from t to the nearest arrival time. Since Sm --+ oo , the infimum is achieved. Further, D, = 0 if and only if Sm = t for some m , and since by hypothesis there are no fixed discontinuities, the probability of this is 0: P [ D1 = 0] = 0. Choose 81 so that 0 < 81 < n - 1 and P[ D1 < 8,] < n - 1 • The intervals (t - 81 , t + 8,) for t' < t < t" cover [t', t "]. Choose a finite subcover, and in (23.18) take the t n k for 0 < k < rn to be the endpoints (of intervals in the subcover) that are contained in (t', t"). By the construction, ..
·
(23 .22 ) and the probability that ( t n, k 1 ' t n k 1 contains some sm is less than -
n - 1 . This
gives a sequence of partitions satisfying (23.20). Inserting more points in a Partition cannot increase the maxima in (23.20) and (23.22), and so it can be arranged that each partition refines the preceding one.
316
RANDOM VARIABLES AND EXPECTED VALUES
To prove (23.21) it is enough (Theorem 4.1) to show that the limit superior of the sets involved has probability It is in fact empty: If for infinitely many N, (w ) - N, - 1 (w ) > 2 holds for some k < n , then by discontinuity points (arrival (23.22), N,( w ) as a function of has in times) arbitrarily close together, which requires Sm ( w ) e for in finitely many in violation of Condition 0°. It remains to prove (23.19). If Zn k and Zn are defined as above and Pn k = P [ Zn k = 1], then the sum in (23.19) is E k Pn k = E[ Zn 1 · Since Zn + t ;::: Zn , E k Pn k is nondecreasing in Now
n,
""
"·"
t
0. [t', t"]
r [t ', t"]
m,
n.
P ( N,, - N,, = 0 ) = P ( Zn k = 0, k � rn ) =
r ,
.n
k�l
[ r, l
( 1 - Pn k ) < exp - E Pn k
·
k-1
If the left-hand side here is positive, this puts an upper bound on E k Pn k ' and (23.19) follows. But suppose P[N,, - N, , = = If is the midpoint of and then since the increments are independent, one of P[Ns - N,, = and P[N,, - Ns = must vanish. It is therefore possible to find a nested sequence of intervals [u m , vm ] such that vm - u m --+ and the event A m = [Nv - Nu > 1] has probability 1. But then P(n m A m ) = 1, and if t is the point common to the [ u m , vm ], there is an arrival at with probability • 1, contrary to the assumption that there are no fixed discontinuities.
t' 0]
0] 0. s
t",
m
0]
m
0
t
Theorem 23.3 in some cases makes the Poisson model quite plausible. The increments will be · essentially independent if the arrivals to time cannot seriously deplete the population of potential arrivals, so that Ns has for > negligible effect on N, - Ns . And the condition that there are no fixed discontinuities is entirely natural. These conditions hold for arrivals of calls at a telephone exchange if the rate of calls is small in comparison with the population of subscribers and calls are not placed at fixed, prede termined times. If the arrival rate is essentially constant, this leads to the following condition.
s
t s
Condition 3°. (i) For 0 < t1 < · · · < tk the increments N11, N12 N, , . . . , N1 - N1 are independent. (ii) The distribution of N1 - Ns depends only on the difference t - s. Theorem 23.4. Conditions 1°, 2°, and 3° are equivalent in the presence of Condition 0°. Obviously Condition 2° implies 3°. Suppose that Condition 3° holds. If J, is the saltus at t (J1 = N, - sups < 1 Ns ), then [N, - N, _ n- 1 1
k
k-1
PROOF.
2:
SECTION
23.
317
THE POISSON PROCESS
1 ] � [ J1 > 1 ], and it follows by (ii) of Condition 3° that P[ J, > 1] is the same
t.
for all But if the value common to P[ J, > 1] is positive, then by the indepe ndence of the increments and the second Borel-Cantelli lemma there is probability 1 that 11 > 1 for infinitely many rational in (0, 1 ), for example, which contradicts Condition 0° . By Theorem 23.3, then, the increments have Poisson distributions. If and is the mean of N,, then N, - Ns for < must have mean + thus = must by (ii) have mean Therefore, f satisfies Cauchy' s functional equation [ A20] and, being nondecreasing, must = for > 0. Condition 0° makes = 0 impossible . have the form •
t
s t f(t) - f(s) f(t - s); f(t) f(s) f(t - s). a f( t) at a
f( t)
One standard way of deriving the Poisson process is by differential equations.
Condition 4°. If 0 < t1 < tegers, then
···
<
tk and if n1, . . . , nk are nonnegative in
(23.23)
tmd (23.24 )
Moreover, [ N,: t > 0] has no fixed discontinuities. The occurrences of o( h) in (23.23) and (23.24) denote functions, say +1 ( h ), and f/>2 (h), such that h Iq,; ( h) -+ 0 as h � 0; the f/>; may a priori depend on k, t1, , tk, and n1, , nk as well as on h. It is assumed in as h � 0 .
-
•
•
•
•
•
•
(23.23) and (23.24) that the conditioning events have positive probability, so that the conditional probabilities are well defined.
1beorem
23.5.
condition 0° .
Conditions 1 ° through 4° are all equivalent in the presence of
2 ° -+ 4 °. For a Poisson process with rate a, the left-hand sides of (23.23) and (23.24) are e - ahah and 1 - e - ah - e - ahah, and these are «h + o(h) and o(h), respectively, because e - ah = 1 - ah + o(h). And by PROOF OF
the argument -·
PROOF
[�i
in the preceding proof, the process has no fixed discontinui-
ti,
ni;
.
OF 4° -+ 2 ° . Fix k, the and the denote by A the event j � k]; and for > 0 put = P [ N," + t - N," = i A ] . It will
== ni,
t
Pn (t)
n
318
RANDOM VARIABLES AND EXPECTED VALUES
be shown that (23 .25 )
r , (at _ ,. t = e n ( P ) n! '
n
This will also be proved for the case in which 2° will then follow by induction. If t > 0 and I t - s I < n - 1 , then
= 0, 1 , . . . . Pn(t) = P[N1 = n]. Condition
As n j oo , the right side here decreases to the probability of a discontinuity at t, which is 0 by hypothesis. Thus P[N, = n] is continuous at t. The same kind of argument works for conditional probabilities and for t = 0, and so Pn ( t) is continuous for t > 0. To simplify the notation, put D, = N,k + t - N,k . If D, + h = n, then D, = m for some m < n. If t > 0, then by the rules for conditional probabilities,
Pn( t + h ) = Pn( t ) P ( D, + h - D, = OIA n [ D, = n ) ] +pn_ 1 ( t ) P ( D, + h - D, = l i A n [ D, = n - 1 ]] n-2 + E Pm ( t ) P [ D, + h - D, = n - m i A n [ D, = m ]] . m =O
For n < 1, the final sum is absent, and for n = 0, the middle term is absent as well. This holds in the case Pn(t) = P[N, = n ] if D, = N, and A = 0. (If t = 0, some of the conditioning events here are empty; hence the assump tion t > 0.) By (23.24), the final sum is o( h) for each fixed n. Applying (23.23) and (23.24) now leads to
Pn ( t + h ) = Pn(t )( l - ah ) + Pn - l (t )ah + o ( h ) , and letting
h � 0 gives
( 23 .26)
In the case n = 0, take p _ 1 ( t) to be identically 0. In (23.26), t > 0 and p�( t) is a right-hand derivative. But since Pn( t) and the right side of th e equation are continuous on [0, oo ), (23.26) holds also for t = 0 and p� ( t) can be taken as a two-sided derivative for t > 0 [A22].
SECTION
23.
319
THE POISSON PROCESS
Now (23.26) gives [A23]
Since Pn(O) is 1 or 0 as
n = 0 or n > 0, (23.25) follows by induction on n. •
Stochastic Processes
stochastic process-that
t
The Poisson process [N1 : > 0] is one example of a is, a collection of random variables (on some probability space ( 0, �, P)) indexed by a parameter regarded as representing time. In the Poisson case, time is In some cases the time is Section 7 concerns the sequence { Fn } of a gambler's fortunes; there represents time, but time that increases in jumps. Part of the structure of a stochastic process is specified by its For any finite sequence 1 , , of time points, the k-dimensional random vector ( N1 , , N111 ) has a distribution p. 11 ... 111 over R k . These measures p. 1 1k are the finite-dimensional distributions of the process. Condition 2° of this section in effect specifies them for the Poisson case:
continuous.
discrete: n
dimensional distributions.
t
1
n1
n
1
•
•
•
•
•
tk
finite
•
•
t
t
n 0 t0
if 0 < < · · · < k and 0 < 1 < · · · < k (take = = 0). The finite-dimensional distributions do not, however, contain all the mathematically interesting information about the process in the case of continuous time. Because of (23.3), (23.4), and the definition (23.5), for each fixed w, N1 ( w) as a function of has the regularity properties given in the second version of Condition 0°. These properties are used in an essential way in the proofs. Suppose that is t or 0 according as is rational or irrational. Let N1 be defined as before, and let
t
/(t)
(23 .28)
t
M1 ( w ) = N1 (w ) + f ( t + X1 ( w ) ) .
If R is the set of rationals, then P[w: f(t + X1 (w)) ¢ 0] = P[w: X1 (w) e R - t ] = 0 for each t because R - t is countable and X1 has a density. Thus P[M1 = N1 ] = 1 for each t, and so the stochastic process [M,: t > 0] has the same finite-dimensional distributions as [N1 : t > 0]. For w fixed,
320
RANDOM VARIABLES AND EXPECTED VALUES
t
however, M1 ( w ) as a function of is everywhere discontinuous and is neither monotone nor exclusively integer-valued. The functions obtained by fixing w and letting vary are called the or of the process. The example above shows that the finite-dimensional distributions do not suffice to determine the character of the path functions. In specifying a stochastic process as a model for some phenomenon, it is natural to place conditions on the character of the sample paths as well as on the finite-dimensional distributions. Condition 0° was imposed throughout this section to ensure that the sample paths are nondecreasing, right-continuous, integer-valued step functions, a natural condition if N, is to represent the number of events in [0, Stochastic processes in continuous time are studied further in Chapter 7.
path
t
functions sample paths
t].
PROBLEMS
Assume the Poisson processes here satisfy Condition 0° as well as Condition 1°. 13.1. 13.2.
13.3.
Show that the minimum of independent exponential waiting times is again exponential and that the parameters add. 20.23 f Show that the time Sn of the nth event in a Poisson stream has the gamma density /( x; a, n) as defined by (20.47). This is sometimes called the Er/ang density. Let A , t - SN1 be the time back to the most recent event in the Poisson stream (or to 0) and let B, SN 1 - t be the time forward to the next event. Show that A, and B, are independent, that B, is distributed as X1 (exponentially with parameter a), and that A, is distributed as min{ X1 , t } : P [ A, S t] is 0, 1 - e - or 1 as x < 0, 0 < x < t, or x � t. f Let L, A + B, SN1 1 - SN1 be the length of the interarrival interval covering t. (a) Show that L, has density ==
==
,
+
ax ,
13.4.
==
==
,
+
if O < x < t, if X > I. (b)
13.5.
Show that E[ L,] converges to 2 E[ X1 ] as t --+ oo . This seems paradoxi cal because L, is one of the Xn. Give an intuitive resolution of the apparent paradox. Merging Poisson streams. Define a process { N, } by (23.5) for a sequence { Xn } of random variables satisfying (23.4). Let { X� } be a second sequence of random variables, on the same probability space, satisfying (23.4) and define { N,' } by N,' max[ n: X{ + · · · + X� < t ]. Define { N," } by N," N, + N,'. Show that, if a( X1 , X2 , ) and a( X{, X2 , . . . ) are independent ==
=
•
•
•
23.
SECTION
and { � } and { N, } are Poisson processes with respective rates a and fJ, then { N, } is a Poisson process with rate a + {J . f The n th and (n + 1)st events in the process { N, } occur at times Sn and '
"
2.1.6.
321
THE POISSON PROCESS
sn + 1 ·
(a) Find the distribution of the number Ns,.
+
1
- Ns,. of events in the other
process during this time interval. (b) Generalize to Ns - Ns,. . 23.7. For a Poisson stream and a bounded linear Borel set A , let N(A ) be the number of events that occur at times lying in A . Show that N( A ) is a random variable having the Poisson distribution with mean a times the Lebesgue measure of A . Show that if A and B are disjoint, then N( A ) and N( B) are independent. Suppose that X1 , X2 , are independent and exponentially distributed with 23.8. parameter a, so that (23.5) defines a Poisson process { N, }. Suppose that are independent and identically distributed and that Y1 , Y2 , a( x1 ' x2 ' . . . ) and a( y1 ' Yi ' . . . ) are independent. Put z, == Ek N yk . This l is the compound Poisson process. If, for example, the event at time Sn in the original process represents an insurance claim, and if Y, represents the amount of the claim, then Z, represents the total claims to time t. (a) If Yk == 1 with probability 1, then { Z, } is an ordinary Poisson process. (b) Show that Z, has independent increments and that Zs + r - Zs has the same distribution as Z, . (c) Show that, if Yk assumes the values 1 and 0 with probabilities p and 1 p (0 < p < 1), then { Z, } is a Poisson process with rate pa. 11.9. Suppose a process satisfies Condition 0° and has independent, Poisson-dis tributed increments and no fixed discontinuities. Show that it has the form { Ncr> < I ) } , where { N, } is a standard Poisson process and cp is a nondecreas ing, continuous function on [0, oo) with cp(O) == 0. 11. 10. If the waiting times Xn are independent and exponentially distributed with parameter a, then Sn/n --+ a - 1 with probability 1, by the strong law of large numbers. From lim, _ 00 N, == oo and SN < t < SN + 1 deduce that lim, - 00 N,/t == a with probability 1. 1.1.1 1. f (a) Suppose that X1 , X2 , . are positive and assume directly that Sn/n --+ m with probability 1, as happens if the Xn are independent and identically distributed with mean m. Show that lim , N,/t == 1/m with probability 1. (b) Suppose now that Sn/n --+ oo with probability 1, as happens if the Xn are independent and identically distributed and have infinite mean. Show that lim, N,/t == 0 with probability 1. The results in Problem 23.11 are theorems in renewal theory: A component of some mechanism is replaced each time it fails or wears out. The Xn are the lifetimes of the successive components, and N, is the number of replacements, or renewals, to time t. 11. 1 2. 20.8 23.11 f Consider a persistent, irreducible Markov chain and for a fixed state j let Nn be the number of passages through j up to time n. Show that Nn/n --+ 1 /m with probability 1, where m = Ek' 1 kfj�k > is the "'
•
•
•
•
•
•
s
-
1
• •
1
322
RANDOM VARIABLES AND EXPECTED VALUES
mean return time (replace 1/m by 0 if this mean is infinite). See Lemma in Section 8.
3
23.13. Suppose that X and Y have Poisson distributions with parameters a and {J. Show that I P( X = i] - P[Y = i] l < I a - fJ I . Hint: Suppose that a < fJ and represent Y as X + D, where X and D are independent and have Poisson distributions with parameters
23.14.
a and fJ - a.
f Use the methods in the proof of Theorem 23.2 to show that the error in (23.15) is bounded uniformly in i by l A - A n I + A n max k Pnk .
SECTION 24. QUEUES AND RANDOM WALK*
Results in queueing theory can be derived from facts about random walk, facts having an independent interest of their own.

The Single-Server Queue
Suppose that customers arrive at a server in sequence, their arrival times being 0, A 1 , A 1 + A 2 , The customers are numbered 0, 1, 2, . . . , and customer n arrives at time A 1 + +A n , customer 0 arriving by conven tion at time 0. Assume that the sequence { A n } of interarrival times is independent and that the An all have the same distribution concentrated on (0, oo ) . The A n may be exponentially distributed, in which case the arrivals form a Poisson process as in the preceding section. Another possibility is P [ A n = a] = 1 for a positive constant a; here the arrival stream is determin istic. At the outset, no special conditions are placed on the distribution common to the A n . Let B0, B 1 , . . . be the service times of the successive customers. The Bn are by assumption independent of each other and of the A n , and they have a common distribution concentrated on (0, oo ) . Nothing further is assumed initially of this distribution: it may be concentrated at a single point for example, or it may be exponential. Another possibility is that the service-time distribution is the convolution of several exponential distributions; this represents a service consisting of several stages, the service times of the constituent stages being dependent and themselves exponentially distrib uted. If a customer arrives to find the server busy, he joins the end of a queue and his service begins when all the preceding customers have been served. If he arrives to find the server free, his service begins immediately; the server is by convention free at time 0 when customer 0 arrives. • • •
•
· · ·
* This section may be omitted.
SECTION
24.
QUEUES AND RANDOM WALK
323
Airplanes coming into an airport form a queueing system. Here the A n are the times between arrivals at the holding pattern, Bn is the time the plane takes to land and clear the runway, and the queue is the holding pattern itself. Let Wo , WI , W2 , . . . be the waiting times of the successive customers: wn is the length of time customer n must wait until his service begins; thus W,. + Bn is the total time spent at the service facility. Because of the convention that the server is free at time 0, W0 = 0. Customer n arrives at time A1 + · · · + A n ; abbreviate this as t for the moment. His service begins at time t + Wn and ends at time t + Wn + Bn. Moreover, customer n + 1 arrives at time t + A n +1 . If customer n + 1 arrives before t + Wn + Bn (that is, if t + A n + t < t + Wn + Bn , or Wn + Bn - A n +1 > 0), then his waiting time is Wn + t = (t + Wn + Bn ) - (t + A n + t ) = Wn + Bn - A n +1 ; if he arrives after t + Wn + Bn , he finds the server free and his waiting time is W,. + 1 = 0. Therefo�e, Wn + t = max[O, Wn + Bn - A n + 1 ]. Define
(24.1)
n
=
1 , 2, . . . ;
the recursion then becomes
n = 0, 1 , . . . .
(24.2 ) Note that Wn is measurable
o( X1 ,
•
•
•
, Xn ) and hence is measurable
Bn - 1 ' A 1 , . . . ' A n ) . The equations (24.1) and ( 24.2) also describe inventory systems. Suppose
a( Bo , . . . '
that on day n the inventory at a storage facility is augmented by the amount Bn l and the day' s demand is A n . Since the facility cannot supply what it does not have, (24.1) and (24.2) describe the inventory Wn at the end of day n. A problem of obvious interest in queueing theory is to compute or approximate the distribution of the waiting time Wn. Write -
(24 . 3 )
S0 = 0,
The following induction argument shows that
(24 .4) this is obvious for n = 0, assume that it holds for a particular n. If wn + xn + 1 is nonnegative, it coincides with wn + 1 ' which is therefore the maximum of Sn - Sk + Xn + 1 = Sn + 1 - Sk over O < k < n; this maxi mum, being nonnegative, is unchanged if extended over 0 < k < n + 1, As
324
RANDOM VARIABLES AND EXPECTED VALUES
whence (24.4) follows for n + 1. On the other hand, if Wn + Xn + 1 < 0, then sn + 1 - sk = sn - sk + xn + 1 < 0 for 0 < k < n, so that max 0 � k � n + 1 ( Sn + 1 - Sk ) and Wn + 1 are both 0. Thus (24.4) holds for all n. Since the components of ( Xh X2 , , Xn ) are independent and have a common distribution, this random vector has the same distribution over R n as ( Xn , Xn h . . . , X1 ). Therefore, the partial sums S0, S 1 , S2 , , Sn have the same joint distribution as 0, Xn , Xn + xn 1 ' . . . ' xn + . . . + XI ; that is, they have the same joint distribution as Sn -- Sn , Sn - Sn _ 1 , Sn - Sn_ 2 , , Sn - S0• Therefore, (24.4) has the same distribution as max o � k � n sk . The following theorem sums up these facts. • • •
•
•
•
• • •
Let { Xn } be an independent sequence of random variables with common distribution, and define Wn by (24.2) and Sn by (24.3). Then (24.4) holds and wn has the same distribution as max o � k � n sk . Theorem 24.1.
In this and the other theorems of this section, the Xn are any indepen dent random variables with a common distribution; it is only in the queueing-theory context that they are assumed defined by (24.1) in terms of positive, independent random variables A n and Bn . The problem is to investigate the distribution function
( 24.5 )
Mn ( x )
=
P ( Wn s x ]
=
]
[
Sk < x , P max n � � O k
X > 0.
(Since wn > 0, M(x) = 0 for X < 0.) By the zero-one law (Theorem 22.3), the event [sup k Sk = oo"] has probability 0 or 1. In the latter case max0 � k � n sk --+ oo with probability 1, so that
( 24.6 ) for every x, however large. On the other hand, if [sup k Sk probability 0, then
( 24.7 )
M( x )
=
[
P sup Sk k>O
s
x
= oo ]
]
is a well-defined distribution function. Moreover, [max0 k � n Sk x] � [sup k Sk s x] for each x and hence <
( 24. 8)
nlimoo Mn ( x ) ......
=
has
M( x ) .
Thus M(x) provides an approximation for Mn (x).
s
24. QUEUES AND RANDOM WALK If the A n and Bn of queueing theory have finite means, the ratio SECTION
325
(24 .9 )
called the traffic intensity. The larger p is, the more congested is the queueing system. If p > 1, then E[ Xn1 > 0 by (24.1), and since n - 1sn --+ E[ X1 ] with probability 1 by the strong law of large numbers, supn Sn = oo with probability 1. In this case (24.6) holds: the waiting time for the n th customer has a distribution that may be said to move out to + oo ; the service time is so large with respect to the arrival rate that the server cannot accommodate the customers. It can be shown that supn Sn = oo with probability 1 also if p = 1 , in which case E[ Xn 1 = 0. The analysis to follow covers only the case p < 1 . Here the strong law gives n - 1Sn --+ E[ X1 ] < 0, Sn --+ - oo , and supn Sn < oo with probability 1, so that (24.8) applies. is
Random Walk and Ladder Indices
it helps to visualize the Sn as the successive positions in a random walk. The integer n is a ladder index for the random walk if Sn is the maximum position up to time n:
To calculate (24.7)
(24 .1 0 )
Note that the maximum is nonnegative because S0 = 0. Define a finite measure L by (23 .11) L ( A ) =
P[ max n sk n- l O
E
=
0<
]
sn E A ,
A
c
( 0, oo ) .
Thus L is a measure on the Borel sets of the line and has support (0, oo ). The probability that there is at least one ladder index is then (24.12 )
Let T1 , T2 , be the times between the ladder indices: T1 is the smallest n satisfying (24.10), T1 + T2 is the next smallest, and so on. To complete the definition, if there are only k ladder indices ( k may be 0), set Tk + I = Tk+ 2 · · · = oo . Define the k th ladder height by H1 + · · · + Hk = Sr1 + . .. + r,/ Thus H1 + · · · + Hk is the position in the random walk at the k th ladder •
•
•
•
326
RANDOM VARIABLES AND EXPECTED VALUES
index; if there are only k ladder indices, set Hk + I Hk + 2 = · · · = 0. As long as ladder indices occur, the ladder heights must increase and the Hk must be positive. Whether there are infinitely many ladder indices or only finitely many, the definitions are so framed that =
(24.13 )
sup Sk k
=
00
E Hn . n== l
Therefore, (24.7) can be studied through the Hn. The distribution of H1 consists of the measure L defined by (24.11) together with a mass of 1 - p at the point 0. If there is a ladder index and n is the first one ( T1 = n ), then H2 is the first positive term (if there is one) in the sequence Sn + 1 - Sn , Sn + 2 - sn , . . . ; but this sequence has the same structure as the original random walk. Therefore, under the condition that there is a ladder index and hence a first one, the distribution of H2 is the original distribution of H1 An extension of this idea is the key to understanding the distribution of (24.13). Let L n * denote the n-fold convolution of L with itself; L n * = L< n - 1 > * L, and L0 * is a unit mass at the point 0. Let 1/1 be the measure (see Example 16.5) •
•
00
( 24.14)
n -O
A c [O, oo ) .
Note that 1/1 has a unit mass at 0 and the rest of the mass is confined to (0, 00 ) . Theorem 24.2. (i)
If A 1 ,
•
•
•
, An
are subsets of (0, oo), then
( 24.15 ) If p = 1, then with probability 1 there are infinitely many ladder indices and supn Sn = oo. (iii) If p < 1, then with probability p n (l - p) there are exactly n ladder indices; with probability 1 there are only finitely many ladder indices and supn Sn < oo ; finally, (ii)
( 24.16 )
[ n >O
P sup Sn e A
] = ( 1 - p ) l/J ( A ) ,
A c [0, oo ) .
If A c (0, oo), then P[Hn e A] = p n - 1L(A), as will be seen in the course of the proof; thus (24.15) does not assert the independence of the Hn - obvi ously Hn = 0 implies that Hn + 1 = 0.
327 24. QUEUES AND RANDOM WALK (i) For n = 1, (24.15) is just the definition (24.1 1). Fix subsets PROOF. .A1 , . . . , A n of (0, oo) and for k > n - 1 let Bk be the w-set where H; A; for i � n - 1 and T1 + · · · + Tn - l k. Then Bk E o( X1, . . . , Xk ) and so independence and the assumption that the X; all have the same distribuSECTION
=
by tion,
E
Summing over m = 1, 2, . . . and then over k = n - 1, n, n + 1, . . . gives P[H; e A ; , i � n] = P [ H; E A ; , i < n - 1] L ( A n ). Thus (24.15) follows by induction. Taking A 1 = · · · = A n = (0, oo) in (24.15) shows that the probability of at least n ladder indices is P[H; > 0, i < n] = p n . (ii) If p = 1, then there must be infinitely many ladder indices with probability 1. In this case the Hn are independent random variables with distribution L concentrated on (0, oo ) ; since 0 < E[ Hn ] < oo, it follows by the strong law of large numbers that n - 1 E'ic,_1Hk converges to a positive number or to oo, so that (24.13) is infinite with probability 1. (iii) Suppose that p < 1. The chance that there are exactly n ladder indices is p n - p n +l = p n (1 - p) and with probability 1 there are only finitely many. Moreover, the chance that there are exactly n ladder indices and that H; E A ; for i < n is L(A 1 ) · • • L(A n ) - L(A 1 ) · • · L(A n )L(O, oo ) = L(A 1 ) • • • L(A n )(1 - p). Conditionally on there being exactly n ladder indices, H1 , . . . , Hn are therefore independent, each with distribution p - 1L. Thus the probability that there are exactly n ladder indices and H1 + · · · + Hn E A is (1 - p)L n *(A) for A c (0, oo). This 0 holds also for n = 0: (1 - p )L * consists of a mass 1 - p = P[sup k Sk = 0] at the point 0. Adding over n = 0, 1, 2, . . . gives (24.16). Note that in this case there are with probability 1 only finitely many positive terms on the • right in (24.13). Exponential Right Tail
Further information about the p and 1/1 in (24.16) can be obtained by a time-reversal argument like that leading to Theorem 24.1. As observed before, the two random vectors (S0, S1 , . • . , Sn ) and (Sn - Sn, Sn S,._ 1 , • • • , Sn - S0) have the same distribution over R n + t . Therefore, the event [max o k n sk < sn E A] has the same probability as the event <
<
328
RANDOM VARIABLES AND EXPECTED VALUES
[
A]
Sk > 0, Sn E Sk < Sn E A ] = P min P [ max l k n O k
�
00
00
00
L l/ln( A ) = E L P [ Tt +
n- 1
n - l j- l
···
<
<
+ 1j = n , H1 +
···
+ H1 e A ]
00
= E Li* ( A ) . j- l
Let 1/1 0 be a unit mass at 0; it follows by (24.14) that 00
E 1/J n( A ) = 1/J ( A ) ,
( 24. 1 8 )
n -O
A
c
[O, oo) .
Let F be the distribution function common to the Xn . If P. n is the distribution in the plane of (min 1 � k � n sk , Sn ), then by (20.30), P[min 1 � k � n Sk > 0, Sn +l � x] = /y1 > 0 F( x - y2 ) dp. n (y1 , y2 ). Identifying 1/1 n < A) with the right side of (24.17) givest
(24. 19) f'XJOOF( x - y ) t/ln ( dy) = p L � n sk > 0, sn +l � X ] . If the minimum is suppressed, this holds for n = 0 as well as for n > 1. Since convolution commutes, the left side here is J 00 oo l/l n ( - oo, x - y] dF(y) = /y � xl/1 n [O, x - y] dF(y ). Adding over n = 0, 1, . . . leads to this result: Theorem 24.3.
The measure (24. 1 4) satisfies
fy s x1/! [0, x - y) dF(y ) = nE- l P [ l smink < n sk > o , sn < x ] . 00
( 24.20 )
The important case is x � 0; here the sets on the right in (24.20) are disjoint and their UniOn is the set where Sn < 0 for SOme n > 1 and Sn � X t That /y1
>
0/(y2 ) dp.n (y1 , Yl )
indicators f.
==
/� 00/( y )1/Jn ( dy ) follows by the usual argument starting with
SECTION
24.
QUEUES AND RANDOM WALK
329
for the first such n. This theorem makes it possible to calculate (24.16) in a
case
of interest for queueing theory.
Theorem
24.4.
Suppose that E[X1 ] < 0 and that the right tail of F is
exponential: P ( X1 > x] = 1 - F( x ) = �e - A x , (24 .21 ) where 0 < � < 1 and A > 0. Then p < 1 and (24. 22) p SUp Sn > x = pe - A ( l -p ) x ,
[
n �O
]
X > 0,
X > 0.
Moreover, A(1 - p) is the unique root of the equation (24.23) in
the range 0 < s < A.
Because of the absence of memory (property (23.2) of the exponen tial distribution), it is intuitively clear that (even though only the right tail of F is exponential), if Sn t < 0 < Sn , then the amount by which Sn exceeds 0 should follow the exponential distribution. If so, PROOF.
L ( x , oo ) = pe - A x ,
(24.24 )
X > 0.
To prove this, apply (20.30):
[
P max Sk < o, sn > x k
] P [ max Sk < 0, xn > X - sn - 1 ] 1 �e - ><< x - S. - t l dP =
=
k
max S�c s O
k
Adding over n gives (24.24). Thus L is the exponential distribution multiplied by p. Therefore, L k * k k can be computed explicitly: L *[O, x] is p times the right side of (20.40) with A = a, so that
(24.25 )
330
RANDOM VARIABLES AND EXPECTED VALUES
Note that (24.24) and (24.25) follow from (24.21) alone-the condition E[ X1 ] < 0 has not been used. From E [ X1 ] < 0 it follows that Sn --+ - oo with probability 1 , so that p must be less than 1 . Summing (24.25) in the other reduces it to (24.26)
1
tP (0, X ) = 1 - p
P e - A ( l -p ) x , 1 -p
whence (24.22) follows via (24.16). It remains to analyze (24.23). Since Sn 1 for x = 0, and so by (24.26)
X > 0,
--+ - oo, the right side of (24.20) is
fy !f;O( 1 - pe><
( 24 .27)
Let l(s) denote the function on the left in (24.23). Because of (24.21), it is defined for 0 � s < A and (see (21 .26)) l(s) = �A( A - s ) - 1 + fx !f; 0esx dF( x ). From this, (24.27), and F(O) = 1 - �' it follows that I( A(l - p )) = 1. Now I has value 1 at s = 0 also. Since I is continuous on [0, A ) and l "(s) = f 00 00 x 2es x dF(x) is positive on (0, A ) because F does not con • centrate at 0, (24.23) cannot have more than one root in (0, A ). Suppose in the queuing context-see (24.1)- that the service times Bn are exponentially distributed with parameter P and that the A n have the distribution function A over (0, oo ). Assume that p- t < E[A 1 ], so that the traffic intensity (24.9) is less than 1 and E [ X1 ] < 0. Since P[ X1 > x] = e - flxJ0e - flY dA(y), Theorem 24.4 applies for A = p. The Bn have moment generating function fJ/(P - s) by (21.26); the moment gener ating function for F can by the product rule be expressed in terms of that for A. Therefore, (24.23) is in this case Example 24. 1.
(24.28)
1
o
00e - sy dA ( y ) = p {J- s .
This can be solved explicitly if the interarrival times An are also exponen tial, say with parameter a < p. The left side of (24.28) is then aj( a + s ) , the only positive root is P - a, and so p = a;p. In this case (see (24.8)) (24.29 )
P ( W, < x) = 1 - : e-
X�
0.
SECTION
24.
331
QUEUES AND RANDOM WALK
This limit is a mixture of discrete and continuous distributions: there is a mass of 1 - a.;p at 0 and a density ap - 1 ( P - a.) e -
Theorem 24.5. Suppose that left tail of F is exponential:
(24.20) through 0 makes it possible F is exponential.
E[X1 ] < 0, that F is continuous, and that the
(24.30) where
X<
0,
0 < � < 1 and A > 0. Then
(24 .3 1) and
L(O, x)
(24 .3 2 )
=
x
fo
F( x ) - F(O) + >.. (1 - F( y )) dy.
First assume not (24.30) but (24.21), and assume that E[ X1 ] > 0. Then Sn + oo , and so p = 1. As pointed out after the proof of (24.25), that relation follows from (24.21) alone; in the case p = 1 it reduces to (reverse the sum and use the Poisson mean) 1/J[O, x] = 1 + AX, x > 0. By Theorem 24.3, PROOF.
-+
[
(24 .33) 'f. P l � n Sk > 0, Sn < x n 1 00
= for
x <
] � < } 1 + >.. ( x - y )) dF( y) =
(1 +
0.
>.. x ) P ( X1 < x) - >.. f xX1 dP X1 <
Now if (24.30) holds and if E[ X1 ] < 0, the analysis above applies to { - xn }. If xn is replaced by - xn and X by - X, then (24.33) becomes
'f. P [ 1 max S < 0, Sn > x ] k n < sk n -= 1 00
valid
for
=
(1 -
>.. x ) P ( X1 > x ) + >..
f
X1 > x
X1 dP,
x > 0. Now F is assumed continuous, and it follows by the
332
RANDOM VARIABLES AND EXPECTED VALUES
convolution formula that the same must be true of the distribution of each 00 Sk . By (24.11) and (21.10), L(x, oo) = L[x, oo) = 1 - F(x) + >.. fx (1 F(y)) dy. By (24.30), fx1 0 X1 dP = - >.. - 1� = - >.. - 1F(O), and so /0(1 F( y )) dy = fx1 0 X1 dP = E[ X1 ] + A - 1F(O). Since p = L (0, oo ), this gives • (24.31) and (24.32). �
';!!
This theorem gives formulas for p and L and hence in principle determines the distribution of supn 0 Sn via (24.14) and (24.16). The rela tion becomes simpler if expressed in terms of the Laplace transforms 'i!!
and
.l ( s ) = E exp { -s sup sn ) . n �O
(
]
These are moment generating functions with the argument reflected through 0 as in (22.4); they exist for s > 0. Let �+ (s) denote the transform of the restriction to (0, oo) of the measure corresponding to F:
By (24.32), L is this last measure plus the measure with density A (1 over (0, oo ). An integration by parts shows that
=
- F(x))
� ( 1 - F(0)) + ( 1 - � ) !F+ ( s ) .
By (24.14) and (24.16), .l(s) = (1 - p)r./:_ 0!t' n (s); the series converges because !t'(s) � !t'(O) = p < 1. By (24.31),
( 24"34 ) .l ( s ) =
>.. E ( X1 )s ( s - " ) !F+ ( s ) - s + " ( 1 - F(O)) '
s > 0.
This, a form of the Khinchine-Po//aczek formula, is valid under the hypothe ses of Theorem 24.5. Because of Theorem 22.2, it determines the distribu tion of supn 0Sn . 'i!!
SECTION
24.
333
QUEUES AN D RAN DOM WALK
Example 24. 2.
Suppose that the queueing variables A ,, are exponen tially distributed with parameter a, so that the arrival stream is a Poisson process . Let b = E [ Bn], and assume that the traffic intensity p = ab is less than 1 . If the B, have distribution function B( x ) an d transform ffl( s ) = JOOe -sx dB ( x ), then F(x ) has densityt ae axdl( a ) ,
F (x) = '
ae ax
J
x < O
e - a v dB ( y ) ,
y>x
X>
0.
Note that F(O) = !11 ( a). Thus Theorem 24.5 applies for A = a, and revers ing integrals shows that �+ (s) = a( a - s ) - 1 ( 91(s) - F(O)). Therefore, (24. 3 4) becomes (24.35)
.K
(s ) =
(1 - p )s s - a + adl( s ) '
the
usual form of the Khinchine-Pollaczek formula. If the B,. are exponen tially distributed with parameter p, this checks with the Laplace transform • of the distribution function M derived in Example 24.1. Queue Size
Suppose that p < 1 and let Qn be the size of the queue left behind by customer n when his service terminates. His service terminates at time A I + . . . + A n + wn + Bn , and customer n + k arrives at time A l + A n + k · Therefore, Qn < k if and only if A 1 + · · + A n + k > A 1 + + + A n + Wn + Bn . (This requires the convention that a customer arriving exactly when service te rminates is not counted as being left behind in the queue.) Thus ·
·
·
·
·
·
·
k
> 1.
Since W,. is measurable o( B0, , Bn _ 1 , A 1 , , A n ) (see (24.2)), the three random variables Wn , Bn , and - ( A n + l + + A n + k ) are independent. If Mn , B, and Vk are the distribution functions of these random variables, then by the convolution formula, P[Qn < k ] = j 00 00 Mn ( -y )( B • Vk )( dy ). If M is the distribution function of supn Sn , then (see (24.8)) M,. converges to M and hence p [ Q n < k ] converges to I 00 00 M( -y )( B * vk )( dy ), which is the • • •
• • •
·
tJust
as
·
·
(20.37) gives the density for X + Y, f� 00 g( x - y ) dF( x ) gives the density for X - Y.
334
RANDOM VARIABLES AND EXPECTED VALUES
same as /00 00 Vk ( -y )( M * B )( dy) because convolution is commutative. Dif ferencing for successive values of k now shows that = k ) = qk , P(Qn n -+ oo
( 24.36)
lim
k > 1,
where
These relations hold under the sole assumption that p < 1. If the A n are exponential with parameter a, the integrand in (24.37) can be written down explicitly:
( 24.38) Example
Suppose that the A n and Bn are exponential with parameters p, as in Example 24.1. Then (see (24.29)) M corresponds to a mass of 1 - a;P = 1 - p at 0 together with a density p(p - a ) e - < IJ - a > x over (0, oo ) . By (20.37), M * B has density 24.3. a and
1
fJe - fl < y - x) dM ( x ) = ( fJ
[O. y ]
- a ) e - < fl - a) y .
By (24.38),
Use inductive integration by parts (or the gamma function) to reduce the last integral to k ! Therefore, qk = (1 - p)pk , a fundamental formula of • queuing theory. See Examples 8.11 and 8.12.
C HAP T ER
5
Convergence of Distributions
SECTION 25. WEAK CONVERGENCE
Many of the best known theorems in probability have to do with the asymptotic behavior of distributions. This chapter covers both general methods for deriving such theorems and specific applications. The present section concerns the general limit theory for distributions on the real line, and the methods of proof use in an essential way the order structure of the line. For the theory in R k , see Section 29. Definitions
Distribution functions Fn were in Section the distribution function F if
14 defined to converge weakly to
(25 .1 ) for every continuity point x of F; this is expressed by writing Fn � F. Examples 14.1, 14.2, and 14.3 illustrate this concept in connection with the asymptotic distribution of maxima. Example 14.4 shows the point of allowing (25.1) to fail if F is discontinuous at x; see also Example 25.4. Theorem 25.8 and Example 25.9 show why this exemption is essential to a useful theory. If p. n and p. are the probability measures on ( R 1 , !1/ 1 ) corresponding to Fn and F, then Fn � F if and only if lim p. n ( A )
(25 .2 ) for every A of the form
n
A
=
(
-
oo ,
=
p. ( A )
x] for which p. { x } = O-see
(20.5). In 335
336
CONVERGENCE OF DISTRIBUTIONS
this case the distributions themselves are said to converge weakly, which is expressed by writing P. n � p.. Thus Fn � F and P. n => p. are only different expressions of the same fact. From weak convergence it follows that (25.2) holds for many sets A besides half-infinite intervals; see Theorem 25.8.
Example
Let Fn be the distribution function corresponding to a unit mass at n: Fn = /[ n . oo > · Then lim n Fn ( x ) = 0 for every x, so that (25. 1 ) is satisfied if F( x) = 0. But Fn � F does not hold, because F is not a distribution function. Weak convergence is in this section defined only for functions Fn and F that rise from 0 at - oo to 1 at + oo -that is, it is defined only for probability measures p. n and p.. t For another example of the same sort, take Fn as the distribution function of the waiting time W, for the n th customer in a queue. As observed after (24.9), Fn (x) --+ 0 for each x if the traffic intensity exceeds 1 . 25. 1.
•
Example
Poisson approximation to the binomial. Let P. n be the binomial distribution (20.6) for p = 'A/n and let p. be the Poisson distribu tion (20.7). For nonnegative integers k , ( n ) "' k A n - k P. n { k } = k n 1 - n n k k-1 'A/n 'A 1 1 i ) ( x = n 1 -k n k! ( 1 - 'Ajn ) i = O 25. 2.
( )(
)
(
)
if n > k . As n --+ oo the second factor on the right goes to 1 for fixed k , and so P. n { k } --+ p. { k } ; this is a special case of Theorem 23.2. By the series form of Scheffe's theorem (the corollary to Theorem 16. 1 1 ), (25.2) holds for every set A of nonnegative integers. Since the nonnegative integers support p. and the P. n ' (25.2) even holds for every linear Borel set A. Certainly P. n converges • weakly to p. in this case.
Example 25.3. Let P. n correspond to a mass of n - 1 at each point kjn, k = 0, 1 , . . . , n - 1 ; let p. be Lebesgue measure confined to the unit interval. The corresponding distribution functions satisfy Fn (x) = ([nx] + 1 )/n --+ F(x) for 0 < x < 1 , and so Fn � F. In this case (25. 1) holds for every x, but (25.2) does not as in the preceding example hold for every Borel set
A:
t There is (see Section 28) a related notion of vague convergence in which p. may be defective in the sense that p.( R 1 ) < 1 . Weak convergence is in this context sometimes called complete convergence.
SECTION
25.
WEAK CONVERGENCE
337
is the set of rationals, then P. n (A) = 1 does not converge to p. (A) = 0. Despite this, P. n does converge weakly to p.. •
if A
Example
If P. n is a unit mass at x n and p. is a unit mass at x, then p , :::o p. if and only if x n --+ x. If x n > x for infinitely many n , then (25.1) • fails at the discontinuity point of F. 25.4.
Unifonn Distribution Modulo 1*
For a sequence x1 , x 2 , of real numbers, consider the corresponding sequence of their fractional parts { xn } == xn - [ xn ]. For each n , define a probability measure p. n •
•
•
by
(25 .3)
P.n ( A ) == -1 , # ( k : 1 < k < n , { xk } E A ] ; n
has mass n - 1 at the points { x1 }, , { xn }, and if several of these points coincide, the masses add. The problem is to find the weak 1imi t of { p. n } in number-theoretically interesting cases. Suppose that xn == nplq, where plq is a rational fraction in lowest terms. Then { kplq } and { k'plq } coincide if and only if (k - k')plq is an integer, and since p and q have no factor in common, this holds if and only if q divides k - k'. Thus for any q successive values of k the fractional parts { kp1q } consist of the q points
P.n
•
0 1
•
•
,..., q'q
(25 .4)
q-1 q
in
some order. If O < x < 1, then P.n [O, x] == P.n [O, y ], where y is the point ilq just left of x, namely y == [ qx]lq. If r == [ nlq], then the number of { kpjq } in [0, y] for 1 s k < n is between r([ qx] + 1) and ( r + 1)([ qx] + 1) and so differs from nq- 1 ([ qx] + 1) by at most 2 q. Therefore, {25 .5)
- [ qx 1 + 1 2q p." [ 0 ' x 1 < n ' q
0 < X < 1.
If _, consists of a mass of q - 1 at each of the points (25.4), then (25.5) implies that l'n � _,
. Suppose next that xn == n 8, where 8 is irrational. If pIq is a highly accurate rational approximation to 8 (q large), then the measures (25.3) corresponding to { n fJ } and to { npIq } should be close to one another. If ., is Lebesgue measure restricted to the unit interval, then .,
is for large q near ., (see Example 25.3), and so there is reason to believe that the measures (25.3) for { n8 } will converge weakly to
P.
If the #Ln defined by (25.3) converge weakly to Lebesgue measure restricted to the unit interval, the sequence x1 , x 2 , is said to be uniformly distributed modulo 1. In •
•
This topic may be omitted.
•
•
338
CONVERGENCE OF DISTRIBUTIONS
this case every subinterval has asymptotically its proportional share of the points { x n } ; by Theorem 25.8 below, the same is true of every subset whose boundary has Lebesgue measure 0.
Theorem 25.1. PROOF.
Define
For () irrational, { n6 } is uniformly distributed modulo 1 . #Ln by (25.3) for xn #Ln ( X , y]
( 25 .6)
n6. It suffices to show that
==
--+ y -
X,
0 < x < y < 1.
Suppose that 0 < £ < min { x , 1 - y } .
(25 .7)
It follows by the elementary theory of continued fractionst that for n exceeding some n 0 == n 0 ( £ , () ) there exists an irreducible fraction pIq (depending on £ , n , and () ) such that
p n 6 - -q < £ .
q<( ' n
-1 < ( '
q
If p.� is the measure (25.3) for the sequence
( 25 .8)
I ll�( x , y] - ( y - x) I <
{ np1q } , then by (25.5),
(
q
+
,t,.( x , y] - [ qy ] + 1 - [ qx 1 + 1 q q
2
+
-4qn
2
� -
q
)
< 6£ .
Now l k 6 - kplql < £ for k == 1 , . . . , n. If { k6 } lies in ( x, y ] , then k6 and kp/q have the same integral part because of (25 . 7), so that { kplq } lies in ( x - £ , y + £ ] . Thus P.n ( x, y ] < p.� ( x - £ , y + £ ] . Similarly, p.� ( x + £ , y - £ ] < P.n ( x , y ] . Two ap plications of (25.8) give I P.n ( x, y ] - ( y - x) l < 8£ . Since this holds for n > n 0 • • (25.6) follows. For another proof see Example 26 . 3 . Theorem 25.1 implies Kronecker's theorem. according to which { n () } is dense in the unit interval if () is irrational.
Convergence in Distribution
Let Xn and X be random variables with respective distribution function s Fn and F. If Fn => F, then Xn is said to converge in distribution or in law to X, written Xn => X. This dual use of the double arrow will cause no confu sion . t Let p,/q, be the convergents of 9 and choose , = "n so sufficiently large, this forces , to be large enough that q, > ( fractions, 1 9 - p,/q, l < 1 /q,q, + 1 < 1 /n ( q, < (/n . Take p/ q
-
that q, < n( < q, + l · If n i s 2 • By the theory of cont i n ued = p, /q, .
SECTION
25.
Because of the defining conditions (25.1) and (25.2),
(25.9) for every x such that
339
WEAK CONVERGENCE
lim n P [ Xn < x ] =
Xn => X if and only if
P[ X < x]
P[ X = x] = 0.
be independent random variables, each Example 25. 5. Let X1 , X ,. 2 with the exponential distribution: P[ Xn > x] = e - a x , x > 0. Put Mn = max{ XI , . . . ' xn } and bn = a - 1 log n. The relation (14.9) established in •
•
•
Example 14.1 can be restated as P[Mn - bn < x] --+ e - e . If X is any ax random variable with distribution function e - e - ' this can be written �-��x • - ax
One is usually interested in proving weak convergence of the distribu tions of some given sequence of random variables, such as the Mn - bn in this example, and the result is often most clearly expressed in terms of the random variables themselves rather than in terms of their distributions or distribution functions. Although the Mn - bn here arise naturally from the problem at hand, the random variable X is simply constructed to make it possible to express the asymptotic relation compactly by Mn - bn => X. Recall that by Theorem 14.1 there does exist a random variable with any prescribed distribution. For each n , let On be the space of n-tuples of O's and 1's, let .F, consist of all subsets of On, and let Pn assign probability ( 'Ajn ) k (1 n - 'Ajn ) - k to each w consisting of k 1 's and n - k O's. Let Xn(w) be the number of 1 's in w; then Xn, a random variable on (On, � ' Pn), represents the number of successes in n Bernoulli trials having probability 'A/n of success at each. Let X be a random variable, on some ( 0 , :F, P), having the Poisson • distribution with parameter 'A. According to Example 25.2, Xn => X. Exllllple l 25. 6.
As this example shows, the random variables Xn may be defined on entirely different probability spaces. To allow for this possibility, the P on the left in (25.9) really should be written Pn. Suppressing the n causes no confusion if it is understood that P refers to whatever probability space it is that Xn is defined on; the underlying probability space enters into the definition only via the distribution μn it induces on the line. Any instance of Fn ⇒ F or of μn ⇒ μ can be rewritten in terms of convergence in distribution: There exist random variables Xn and X (on some probability spaces) with distribution functions Fn and F, and Fn ⇒ F and Xn ⇒ X express the same fact.
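The convergence asserted in Example 25.6 can also be checked directly from the binomial and Poisson formulas. In the sketch below, λ = 2 and the range of k are illustrative choices, not part of the text.

```python
from math import comb, exp, factorial

lam = 2.0  # assumed Poisson parameter
for n in (10, 100, 1000):
    p = lam / n
    diffs = [
        comb(n, k) * p**k * (1 - p) ** (n - k) - exp(-lam) * lam**k / factorial(k)
        for k in range(5)
    ]
    print(n, "binomial minus Poisson probabilities, k = 0..4:",
          [round(d, 5) for d in diffs])
```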
Convergence in Probability
Suppose that X, X₁, X₂, ... are random variables all defined on the same probability space (Ω, ℱ, P). If Xn → X with probability 1, then P[|Xn − X| ≥ ε i.o.] = 0 for ε > 0, and hence

(25.10)   lim_n P[|Xn − X| ≥ ε] = 0
by Theorem 4.1. Thus there is convergence in probability Xn →_P X as discussed in Section 20; see (20.41). That convergence with probability 1 implies convergence in probability is part of Theorem 20.5. Suppose that (25.10) holds for each positive ε. Now P[X ≤ x − ε] − P[|Xn − X| ≥ ε] ≤ P[Xn ≤ x] ≤ P[X ≤ x + ε] + P[|Xn − X| ≥ ε]; letting n tend to ∞ and then letting ε tend to 0 shows that P[X < x] ≤ lim inf_n P[Xn ≤ x] ≤ lim sup_n P[Xn ≤ x] ≤ P[X ≤ x]. Thus P[Xn ≤ x] → P[X ≤ x] if P[X = x] = 0, and so Xn ⇒ X:
Theorem 25.2. Suppose that Xn and X are random variables on the same probability space. If Xn → X with probability 1, then Xn →_P X. If Xn →_P X, then Xn ⇒ X.
Of the two implications in this theorem, neither converse holds. As observed after (20.41), convergence in probability does not imply convergence with probability 1. Neither does convergence in distribution imply convergence in probability: if X and Y are independent and assume the values 0 and 1 with probability ½ each, and if Xn = Y, then Xn ⇒ X, but Xn →_P X cannot hold because P[|X − Y| = 1] = ½. What is more, (25.10) is impossible if X and the Xn are defined on different probability spaces, as may happen in the case of convergence in distribution.

Although (25.10) in general makes no sense unless X and the Xn are defined on the same probability space, suppose that X is replaced by a constant real number a, that is, suppose that X(ω) ≡ a. Then (25.10) becomes

(25.11)   lim_n P[|Xn − a| ≥ ε] = 0,
and this condition makes sense even if the space of Xn does vary with n. Now a can be regarded as a random variable (on any probability space at all), and it is easy to show that (25.11) implies that Xn ⇒ a: Put ε = |x − a|; if x > a, then P[Xn ≤ x] ≥ P[|Xn − a| < ε] → 1, and if x < a, then P[Xn ≤ x] ≤ P[|Xn − a| ≥ ε] → 0. If a is regarded as a random variable, its
distribution function is 0 for x < a and 1 for x ≥ a. Thus (25.11) implies that the distribution function of Xn converges weakly to that of a. Suppose, on the other hand, that Xn ⇒ a. Then P[|Xn − a| ≥ ε] ≤ P[Xn ≤ a − ε] + 1 − P[Xn < a + ε] → 0, so that (25.11) holds:

Theorem 25.3. Condition (25.11) holds for all positive ε if and only if Xn ⇒ a, that is, if and only if

lim_n P[Xn ≤ x] = 0 for x < a   and   lim_n P[Xn ≤ x] = 1 for x > a.

If (25.11) holds for all positive ε, Xn may be said to converge to a in probability. As this does not require that the Xn be defined on the same space, it is not really a special case of convergence in probability as defined by (25.10). Convergence in probability in this new sense will be denoted Xn ⇒ a, in accordance with the theorem just proved.

Example 14.4 restates the weak law of large numbers in terms of this concept. Indeed, if X₁, X₂, ... are independent, identically distributed random variables with finite mean m, and if Sn = X₁ + ··· + Xn, the weak law of large numbers is the assertion n^{−1}Sn ⇒ m. Example 6.4 provides another illustration: If Sn is the number of cycles in a random permutation on n letters, then Sn/log n ⇒ 1.
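The weak law cited here can be watched numerically. In this rough sketch the underlying distribution (exponential with mean m = 1), the tolerance ε, and the trial counts are all illustrative assumptions; the estimated probability P[|n^{−1}Sn − m| ≥ ε] shrinks as n grows, which is what n^{−1}Sn ⇒ m asserts.

```python
import random

random.seed(1)
m, eps, trials = 1.0, 0.05, 2000    # assumed mean, tolerance, repetitions

for n in (10, 100, 1000):
    bad = sum(
        1
        for _ in range(trials)
        if abs(sum(random.expovariate(1.0) for _ in range(n)) / n - m) >= eps
    )
    print(n, "estimated P[|S_n/n - m| >= eps]:", bad / trials)
```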
Example 25.7. Suppose that Xn ⇒ X and δn → 0. Given ε and η, choose x so that P[|X| ≥ x] < η and P[X = ±x] = 0, and then choose n₀ so that n ≥ n₀ implies that |δn| < ε/x and |P[Xn ≤ y] − P[X ≤ y]| < η for y = ±x. Then P[|δnXn| ≥ ε] ≤ 3η for n ≥ n₀. Thus Xn ⇒ X and δn → 0 imply that δnXn ⇒ 0, a restatement of Lemma 2 of Section 14 (p. 195). •

The asymptotic properties of a random variable should remain unaffected if it is altered by the addition of a random variable that goes to 0 in probability. Let (Xn, Yn) be a two-dimensional random vector.

Theorem 25.4. If Xn ⇒ X and Xn − Yn ⇒ 0, then Yn ⇒ X.

PROOF. Suppose that y′ < x < y″ and P[X = y′] = P[X = y″] = 0. If ε < x − y′ and ε < y″ − x, then

(25.12)   P[Xn ≤ y′] − P[|Xn − Yn| ≥ ε] ≤ P[Yn ≤ x] ≤ P[Xn ≤ y″] + P[|Xn − Yn| ≥ ε].
Since Xn ⇒ X, letting n → ∞ gives

(25.13)   P[X ≤ y′] ≤ lim inf_n P[Yn ≤ x] ≤ lim sup_n P[Yn ≤ x] ≤ P[X ≤ y″].
Since P[X = y] = 0 for all but countably many y, if P[X = x] = 0, then y′ and y″ can further be chosen so that P[X ≤ y′] and P[X ≤ y″] are arbitrarily near P[X ≤ x]; hence P[Yn ≤ x] → P[X ≤ x]. •

Theorem 25.4 has a useful extension. Suppose that (X_n^{(u)}, Yn) is a two-dimensional random vector.

Theorem 25.5. If X_n^{(u)} ⇒ X^{(u)} as n → ∞ for each u, if X^{(u)} ⇒ X as u → ∞, and if

(25.14)   lim_u lim sup_n P[|X_n^{(u)} − Yn| ≥ ε] = 0

for positive ε, then Yn ⇒ X.

PROOF. Replace Xn by X_n^{(u)} in (25.12). If P[X = y′] = P[X^{(u)} = y′] = 0 and P[X = y″] = P[X^{(u)} = y″] = 0, then letting n → ∞ in (25.12) and then letting u → ∞ with the help of (25.14) and X^{(u)} ⇒ X gives (25.13) as before, and the argument is completed as in the proof of Theorem 25.4. •
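Theorem 25.4 lends itself to a simple numerical illustration. In the sketch below everything concrete (the standard normal Xn, the perturbation Zn/√n, the point x, the sample sizes) is an illustrative assumption; the point is only that the perturbed variable has nearly the same limiting distribution function value.

```python
import math
import random

# Y_n = X_n + Z_n / sqrt(n), where Z_n is exponential, so X_n - Y_n => 0;
# by Theorem 25.4, Y_n has the same weak limit as X_n.
random.seed(2)
x, trials = 0.5, 20_000                            # assumed point, repetitions
target = 0.5 * (1 + math.erf(x / math.sqrt(2)))    # P[X <= x], X standard normal

for n in (10, 100, 10_000):
    count = 0
    for _ in range(trials):
        x_n = random.gauss(0.0, 1.0)
        y_n = x_n + random.expovariate(1.0) / math.sqrt(n)
        if y_n <= x:
            count += 1
    print(n, "P[Y_n <= x] ~", round(count / trials, 4), " limit", round(target, 4))
```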
Fundamental Theorems
Some of the fundamental properties of weak convergence were established in Section 14. It was shown there that a sequence cannot have two distinct weak limits: If Fn ⇒ F and Fn ⇒ G, then F = G. The proof is simple: The hypothesis implies that F and G agree at their common points of continuity, hence at all but countably many points, and hence by right continuity at all points. Another simple fact is this: If lim_n Fn(d) = F(d) for d in a set D dense in R¹, then Fn ⇒ F. Indeed, if F is continuous at x, there are in D points d′ and d″ such that d′ < x < d″ and F(d″) − F(d′) < ε, and it follows that the limits superior and inferior of Fn(x) are within ε of F(x).

For any probability measure on (R¹, ℛ¹) there is on some probability space a random variable having that measure as its distribution. Therefore, for probability measures satisfying μn ⇒ μ, there exist random variables Yn and Y having these measures as distributions and satisfying Yn ⇒ Y. According to the following theorem, the Yn and Y can be constructed on the same probability space, and even in such a way that Yn(ω) → Y(ω) for every ω, a condition which is of course much stronger than Yn ⇒ Y. This result, Skorohod's theorem, makes possible very simple and transparent proofs of many important facts.
Theorem 25.6. Suppose that μn and μ are probability measures on (R¹, ℛ¹) and μn ⇒ μ. There exist random variables Yn and Y on a common probability space (Ω, ℱ, P) such that Yn has distribution μn, Y has distribution μ, and Yn(ω) → Y(ω) for each ω.
PROOF. For the probability space (Ω, ℱ, P), take Ω = (0, 1), let ℱ consist of the Borel subsets of (0, 1), and for P(A) take the Lebesgue measure of A. The construction is related to that in the proofs of Theorems 14.1 and 20.4. Consider the distribution functions Fn and F corresponding to μn and μ. For 0 < ω < 1, put Yn(ω) = inf[x: ω ≤ Fn(x)] and Y(ω) = inf[x: ω ≤ F(x)]. Since ω ≤ Fn(x) if and only if Yn(ω) ≤ x (see the argument following (14.5)), P[ω: Yn(ω) ≤ x] = P[ω: ω ≤ Fn(x)] = Fn(x). Thus Yn has distribution function Fn; similarly, Y has distribution function F. It remains to show that Yn(ω) → Y(ω). The idea is that Yn and Y are essentially inverse functions to Fn and F; if the direct functions converge, so must the inverses.

Suppose that 0 < ω < 1. Given ε, choose x so that Y(ω) − ε < x < Y(ω) and μ{x} = 0. Then F(x) < ω; Fn(x) → F(x) now implies that, for n large enough, Fn(x) < ω and hence Y(ω) − ε < x < Yn(ω). Thus lim inf_n Yn(ω) ≥ Y(ω). If ω < ω′ and ε is positive, choose a y for which Y(ω′) < y < Y(ω′) + ε and μ{y} = 0. Now ω < ω′ ≤ F(Y(ω′)) ≤ F(y), and so, for n large enough, ω < Fn(y) and hence Yn(ω) ≤ y < Y(ω′) + ε. Thus lim sup_n Yn(ω) ≤ Y(ω′) if ω < ω′. Therefore, Yn(ω) → Y(ω) if Y is continuous at ω.

Since Y is nondecreasing on (0, 1), it has at most countably many discontinuities. At discontinuity points ω of Y, redefine Yn(ω) = Y(ω) = 0. With this change, Yn(ω) → Y(ω) for every ω. Since Y and the Yn have been altered only on a set of Lebesgue measure 0, their distributions are still μn and μ. •
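The quantile construction used in this proof can be carried out mechanically. The sketch below is a toy version under an assumed concrete family, not taken from the text: Fn is the normal distribution function with mean 1/n and F the standard normal one; the bisection routine computes Yn(ω) = inf[x: ω ≤ Fn(x)] and exhibits Yn(ω) → Y(ω) at a fixed ω.

```python
import math

def normal_cdf(x, mean=0.0):
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0)))

def quantile(cdf, w, lo=-50.0, hi=50.0):
    # Y(w) = inf[x : w <= cdf(x)], found by bisection.
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if cdf(mid) >= w:
            hi = mid
        else:
            lo = mid
    return hi

w = 0.3                                   # assumed point of (0, 1)
y = quantile(normal_cdf, w)
for n in (1, 10, 100, 1000):
    y_n = quantile(lambda t: normal_cdf(t, mean=1.0 / n), w)
    print(n, "Y_n(w) =", round(y_n, 6), "  Y(w) =", round(y, 6))
```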
Note that this proof uses the order structure of the real line in an essential way. The proof of the corresponding result in R^k (Theorem 29.6) is more complicated.

The following mapping theorem is of very frequent use.

Theorem 25.7. Suppose that h: R¹ → R¹ is measurable and that the set Dh of its discontinuities is measurable.† If μn ⇒ μ and μ(Dh) = 0, then μn h^{−1} ⇒ μh^{−1}.
†That Dh lies in ℛ¹ is generally obvious in applications. In point of fact, it always holds (even if h is not measurable): Let A(ε, δ) be the set of x for which there exist y and z such that |x − y| < δ, |x − z| < δ, and |h(y) − h(z)| ≥ ε. Then A(ε, δ) is open and Dh = ∪_ε ∩_δ A(ε, δ), where ε and δ range over the positive rationals.
Recall (see (13.7)) that μh^{−1} has value μ(h^{−1}A) at A.
PROOF. Consider the random variables Yn and Y of Theorem 25.6. Since Yn(ω) → Y(ω), Y(ω) ∉ Dh implies that h(Yn(ω)) → h(Y(ω)). Since P[ω: Y(ω) ∈ Dh] = μ(Dh) = 0, h(Yn(ω)) → h(Y(ω)) holds with probability 1. Hence h(Yn) ⇒ h(Y) by Theorem 25.2. Since P[h(Y) ∈ A] = P[Y ∈ h^{−1}A] = μ(h^{−1}A), h(Y) has distribution μh^{−1}; similarly, h(Yn) has distribution μn h^{−1}. Thus h(Yn) ⇒ h(Y) is the same thing as μn h^{−1} ⇒ μh^{−1}. •
Take
If Xn => X and P[ X E Dh] = 0, then h( Xn ) => h( X).
X = a:
Corollary 2.
If Xn � a and h is continuous at a, then h( Xn ) => h(a).
From Xn => X aXn + b => aX + b. Suppose also a ) Xn => 0 by Example 25.7, and now an Xn + bn => aX + b follows
it follows directly by the theorem that that a n � a and bn � b. Then (a n so (anXn + bn) - (aX, + b) => 0. And by Theorem 25 . 4 : If Xn => X, an � a, and bn � b, then anXn + bn => aX + b. This fact was stated and proved • differently in Section 14-see Le mma 1 on p. 195. Example 25.8.
By definition, p. n => p. means that the corresponding distribution func tions converge weakly. The following theorem characterizes weak conver gence without reference to distribution functions. The boundary aA of A consists of the points that are limits of sequences in A and are also limits of sequences in Ac; alternatively, aA is the closure of A minus its interior. A set A is a p.-continuity set if it is a Borel set and p.( a A ) = 0. Theorem 25.8. The following three conditions are equivalent. (i) p. n p.; (ii) f f dp. n � f fdp. for every bounded, continuous real function f; (iii) P.n( A ) � p.(A) for every p.-continuity set A. :::0
Suppose that P.n => p. and consider the random variables Yn and Y of Theorem 25.6. Suppose that f is a bounded function such that p.( D1) = 0, where D1 is the set of points of discontinuity of f. Since P [ Y E D1] = p.( D1 ) = 0, f( Yn) � f( Y) with probability 1, and so by change of variable (see (21 .1)) and the bounded convergence theorem, ffdp.n = E[/( Yn)] � PROOF.
SECTION
25.
WEAK CONVERGENCE
345
E[ /( Y )] = f fdp.. Thus P.n => p. and p.( D1 ) = 0 together imply that f fdp.n __. f f dp. if f is bounded. In particular, (i) implies (ii). Further, if f = IA ' then D1 = aA , and from p.( aA ) = O and P.n => p. follows p.,,( A ) = f fdp.n __. f I dp. = p. ( A ). Thus (i) also implies (iii). Since a( - 00 , x ] = { X } , obviously (iii) implies (i). It therefore remains only to deduce p. n => p. from (ii). Consider the corresponding distribution functions. Suppose that x < y, and let /( t ) be 1 for t < x, 0 for t > y, and interpolate linearly on [x, y]: /( t ) = ( y - t )/( y - x ) for x < t < y . Since Fn (x ) < f fdp. n and f fdp. < F( y ), it follows from (ii) that lim supn Fn(x ) < F( y ); letting y � x shows that lim supn Fn(x ) < F( x ). Similarly, F( u ) < lim infn Fn( x ) for u < x and hence F( x - ) < lim infn Fn( x). This implies • convergence at continuity points. The function f in this last part of the proof is uniformly continuous. Hence P.n => p. follows if J fdp.n � f fdp. for every bounded and uniformly continuous f. Example 25. 9. The distributions of Example 25.3 satisfy P.n => p., but Hence this A P n ( A ) does not converge to p.( A) if A is the set of rationals. • cannot be a p.-continuity set; in fact, of course, a A = R 1 •
The concept of weak convergence would be nearly useless if (25.2) were not allowed to fail when p.( aA ) > 0. Since F(x) - F(x - ) = p. { x } = p.( a( - 00 , X )), it is therefore natural in the original definition tO allow (25.}) to fail when x is not a continuity point of F. Belly's 1beorem
One of the most frequently used results in analysis is the Helly selection theorem: Theorem 25.9. For every sequence { Fn } of distribution functions there exists a subsequence { Fn " } and a nondecreasing, right-continuous function F such that lim k Fn "(x) = F(x) at continuity points x of F. An application of the diagonal method [A14] gives a sequence PROOF. { n k } of integers along which the limit G( r ) = lim k Fn"( r ) exists for every rational r. Define F( x) = inf(G( r ): x < r ] . Clearly F is nondecreasing. To each x and £ there is an r for which x < r and G ( r ) < F( x ) + £. If x s y < r, then F( y ) < G(r) < F(x) + £. Hence F is continuous from the right. If F is continuous at x, choose y < x so that F( x ) - £ < F( y ) ; now choose rational r and s so that y < r < x < s and G(s ) < F( x) + £. From
346
CONVERGENCE OF DISTRIBUTIONS
F( x ) - £ < G ( r ) < G(s ) < F(x ) + £ and Fn ( r ) < Fn (x ) < Fn( s ) it follows that as k goes to infinity Fn k(x) has limits superior and inferior within £ of F( x ). • The F in this theorem necessarily satisfies 0 < F(x) < 1 . But F need not be a distribution function; if Fn has a unit jump at n , for example, F( x ) 0 is the only possibility. It is important to have a condition which ensures that for some subsequence the limit F is a distribution function. A sequence of probability measures P.n on (R 1 , !1/ 1 ) is said to be tight if for each £ there exists a finite interval ( a, b] such that p. n( a, b] > 1 - £ for all n . In terms of the corresponding distribution functions Fn, the condition is that for each £ there exist x and y such that Fn(x) < £ and F,, ( y ) > 1 - £ for all n . If p. n is a unit mass at n , { p. n } is not tight in this sense- the mass of p. n "escapes to infinity." Tightness is a condition preventing this escape of mass. =
Theorem 25.10. Tightness is a necessary and sufficient condition that for every subsequence { p. n k } there exist a further subsequence { p. n } and a probability measure p. such that p. n k ( J ) � p. as j � oo . k(})
Only the sufficiency of the condition in this theorem is used follows.
in
what
Apply Helly' s theorem to the subsequence { F,, " } of corresponding distribution functions. There exists a further subsequence { Fn k j ) } such that lim 1.Fn k ( J )( x ) = F(x ) at continuity points of F, where F is � nonaecreasing and right-continuous. There exists by Theorem 12.4 a measure p. on ( R 1 , !1/ 1 ) such that p.(a, b] = F( b ) - F( a ) . Given £, choose a and b so that P. n( a , b ] > 1 - £ for all n , which is possible by tightness. By decreasing a and increasing b, one can ensure that they are continuity points of F. But then p. ( a, b] > 1 - £. Therefore, p. is a probability mea • sure, and of course p. n k ( J ) � p. . PROOF :
SUFFICIENCY.
If { p. n } is not tight, there exists a positive £ such that for each finite interval ( a, b ], p. n( a, b] < 1 - £ for some n . Choose n " so that P. n k ( - k , k ] < 1 - £. Suppose that some subsequence { P. n A ( J ) } of { P. n A } we re to converge weakly to some probability measure p. . Choose ( a, b] so that p. { a } = p. { b } = 0 and p. ( a, b ] > 1 - £. For large enough }, ( a , b) c ( - k ( j ), k ( j )], and so 1 - £ > P. n A (J ) ( - k ( j ), k ( j )] > p. , A ( J ) ( a, b] � p. ( a � b] . • Thus p. ( a, b ] < 1 - £, a contradiction. NECESSITY.
Corollary. If { P.n } is a tight sequence of probability measures, and if each subsequence that converges weakly at all converges weakly to the probability measure p. , then p. n � p. .
SECTION
25.
WEAK CONVERGENCE
347
By the theorem, each subsequence { p. n " } contains a further subsequence { p. n } converging weakly ( j � oo) to some limit, and that limit must by hypothesis be p.. Thus every subsequence { p., A } contains a further su bsequence { p. n lc ( J ) } converging weakly to p.. Suppose that p. n � p. is false. Then there exists some x such that p { x } = 0 but P. n( - oo, x] does not converge to p.( - oo , x]. But then there exists a positive £ such that I P. n "( - oo , x] - p. ( - oo , x ] l > £ for an infinite sequence { n k } of integers, and no subsequence of { P. n A } can converge • weakly to p.. This contradiction shows that P.n � p.. PROOF.
A(})
If ,., n is a unit mass at X n ' then { ,., n } is tight if and only if { X n } is bounded. The theorem above and its corollary reduce in this case to standard facts about the real line; see Example 25.4 and A10: tightness of sequences of probability measures is analogous to boundedness of sequences of real numbers. Let P.n be the normal distribution with mean m n and variance on2• If m n and on2 are bounded, then the second moment of P.n is bounded, and it follows by Markov' s inequality (21 .12) that { P.n } is tight. The conclusion of Theorem 25.10 can also be checked directly: If { n k (J) } is chosen so that lim1.m n lc ( J ) = m and lim 1.on2lc ( J ) = o 2, then P.n A t J ) � p., where p. is normal with mean m and variance o2 (a unit mass at m if o2 = 0). If m n > b, then P.n(b, oo) > ! ; if m n < a, then p. , ( - oo , a] > t · Hence { p. n } cannot be tight if m n is unbounded. If m n is bounded, say by K, then P n ( - oo , a ] > v ( - oo, ( a - K )on- 1 ], where v is the standard normal distri bution. If on is unbounded, then v ( - oo, (a - K )on- 1 ] � i along some subsequence, and { p. n } cannot be tight. Thus a sequence of normal distribu • tions is tight if and only if the means and variances are bounded. Example 25. 10.
Integration to the Limit 1beorem 25. 1 1.
If Xn � X, then E [ I XI ] < lim infn E [ 1 Xn 1 1 -
Apply Skorohod ' s Theorem 25.6 to the distributions of X, and X: There exist on a common probability space random variables Y,, and Y such that Y = lim n Yn with probability 1 , Yn has the distribution of Xn , and Y has the distribution of X. By Fatou ' s lemma, E[ I Yl ] < lim inf, E[ I Y,, l ]. Since 1 XI and 1 Yl have the same distribution, they have the same expected value (see (21 .6)), and similarly for 1 xn 1 and I Yn 1 • PROOF.
The random variables Xn are said to be uniformly integrable if (25 . 1 5 )
lim sup a , .....
oo
£
[ I xn 1 � a J
I X,. I dP
=
0;
348
CONVERGENCE OF DISTRIBUTIONS
see (16.21). This implies (see (16.22)) that sup E ( I Xn l ) < 00 .
( 25 .16 )
n
If Xn � X and the Xn are uniformly integrable, then X is
Theorem 25. 12.
integrable and ( 25 .17)
Construct random variables Yn and Y as in the preceding proof. Since Yn � Y with probability 1 and the Yn are uniformly integrable in the sense of (16.21 ), E [ Xn ] = E [ Yn ] � E [ Y ] = E [ X] by Theorem 16.13. • PROOF.
If supn E [ I xn 1 1 + ( ] < 00 for some positive integrable because
£'
then the xn are uniformly
(25 .18 )
Since Xn :::o X implies that x; � xr by Theorem 25.7, there is the follow ing consequence of the theorem.
Let r be a positive integer. If Xn :::o X and supn E [ I Xn l r + f ] < > 0, then E [ I X I r] < 00 and E [ x;] � E[ xr].
Corollary.
here
w
£
oo ,
The Xn are also uniformly integrable if there is an integrable random variable Z such that P[ I Xn l > t ] s P[ I Z I > t ] for t > 0, because then ( 2 1 . 1 0 ) gives
1
[ I X,.I � a)
I X,. I dP
<
1
[ IZI > a ]
I Z I dP.
From this the dominated convergence theorem follows again. PROBLEMS 25.1.
(a) Show by example that distribution functions having densities can
converge weakly even if the densities do not converge. Hint: Consider /,. ( x ) == 1 + cos 2wnx on [0, 1]. (b) Let /,. be 211 times the indicator of the set of x in the unit interval for which d ,. + (x) == · · · == d2 n ( x) == 0, where d" (x) is the k th dyadic digi t. 1
SECTION
25.
349
WEAK CONVERGENCE
Show that In ( x) --+ 0 except on a set of Lebesgue measure 0; on this exceptional set, redefine In ( x) == 0 for all n , so that /,. ( x) --+ 0 everywhere. Show that the distributions -corresponding to these densities converge weakly to Lebesgue measure confined to the unit interval. (c) Show that distributions with densities can converge weakly to a limit that has no density (even to a unit mass). (d) Show that discrete distributions can converge weakly to a distribution that has a density. (e) Construct an example, like that of Example 25.3, in which p.,. ( A ) --+ p.( A ) fails but in which all the measures come from continuous densities on 25.2. 25.3.
[0, 1 ]. 14.12 f Give a simple proof of the Glivenko-Cantelli theorem (Theorem 20.6) under the extra hypothesis that F is continuous. Initial digits. (a) Show that the first significant digit of a positive number x is d (in the scale of 10) if and only if {log1 0x } lies between log1 0 d and log1 0( d + 1), d == 1, . . . , 9, where the brackets denote fractional part. (b) For positive numbers x 1 , x2 , , let Nn ( d ) be the number among the first n that have initial digit d. Show that •
•
•
d == 1 , . . . , 9,
( 25 .19)
if the sequence log1 0xn , n == 1, 2, . .n. , is uniformly distributed modulo 1. This is true, for example, of xn == f if log1 08 is irrational. (c) Let Dn be the first significant digit of a positive random variable Xn . Show that
( 25 .20) lim P [ Dn == d ] == log 1 0( d + 1 ) - log1 0 d , n
25.4.
d == 1 , . . . , 9,
if {log1 0 Xn } � U, where U is uniformly distributed over the unit interval. Show that for each probability measure p. on the line there exist probability measures #Ln with finite support such that #Ln � p.. Show further that P.n{ x } can be taken rational and that each point in the support can be taken rational. Thus there exists a countable set of probability measures such that every p, is the weak limit of some sequence from the set. The space of distribution functions is thus separable in the Levy metric (see Problem
14.9).
25.5. 25.6.
25.7. 25.8.
Show that (25.10) implies that P([ X < X ]�[ xn < X ]) --+ 0 if P[ X == X ] == 0. For arbitrary random variables Xn there exist positive constants an such that an xn � 0. Generalize Example 25.8 by showing for three-dimensional random vectors ( A n , Bn , Xn ) and constants a and b, a > 0, that, if An � a, Bn � b, and Xn � X, then AnXn + Bn � aX + b . Suppose that Xn � X and that hn and h are Borel functions. Let E be the set of x for1 which hnxn --+ hx fails for some sequence xn --+ x. Suppose that E e 9P and P[ X e E] == 0. Show that hn Xn � hX.
JSO
CONVERGENCE OF DISTRIBUTIONS
Suppose that the distributions of random variables X,, and X have densities /,. and f. Show that if fn( x) --+ /( x ) for x outside a set of Lebesgue measure 0, then Xn � X. 25. 10. f Suppose that Xn assumes as values y, + k B,. , k == 0, + 1, . . . , where with n in Bn > 0. Suppose that Bn --+ 0 and that, if kn is an integer varying 1 such a way that Yn + knBn --+ x, then P[ Xn == Yn + knBn 1 Bn- --+ /( x), where I is the density of a random variable X. Show that xn = X. 25. 11. f Let Sn have the binomial distribution with parameters n and p. Assume as known that 25.9.
( 25 .21)
25.12. 25. 13.
25. 14.
P [ S,. = k n ] ( np ( l - p ))
1
--+ ..ff; e - ' 2 /2
if ( kn - np)( np(1 - p)) - 1 12 1--+ x. Deduce the DeMoivre-Laplace the orem: ( Sn - np)( np( 1 - p)) - 12 = N, where N has the standard normal distribution. This is a special case of the central limit theorem; see Section 27. Prove weak convergence in Example 25.3 by using Theorem 25.8 and the theory of the Riemann integral. (a) Show that probability measures satisfy #Ln = p. if P. n ( a , b] --+ p.( a , b) whenever I' { a } == I' { b } == 0. (b) Show that, if I fdp.n --+ I fdp. for all continuous f with bounded support, then ,.,. n = ,.,. . f Let I' be Lebesgue measure confined to the unit interval; let l'n COrrespond tO a mass Of Xn . Xn 1 at SOme point in ( Xn . Xn . ] where 0 == Xn o < xn 1 < · · · < xnn == 1. Show by considering the distribution functions that p.,. = p. if max ; < n( x, ; - xn . ; - 1 ) --+ 0. Deduce that a bounded Borel function continuous almost everywhere on the unit interval is Riemann integrable. See Problem 17.6. 2.15 5.16 f A function f of positive integers has distribution function F if F is the weak limit of the distribution function Pn [ m : /( m ) < x] of I under the measure having probability 1/n at each of 1, . . . , n (see (2.20)). In this case D [ m : /( m ) < x] == F(x) (see (2.21)) for continuity points x of F. Show that q>(m)/m (see (2.23)) has a distribution: (a) Show by the mapping theorem that it suffices to prove that f( m) .. log(q>( m )/m) == EPBP ( m)log(1 - 1/p) has a distribution. (b) Let fu ( m) == EP "B1 ( m )log(1 - 1/p), and show by (5.41) that fu has distribution function �\ X ) == P{E u Xp log(1 - 1/p) < x], where the XP are independent random variables (one for each prime p) such that P[ XP .. 1] == 1/p and P( XP == 0] == 1 - 1/p. (c) Show that EP XP log(1 - 1/p) converges with probability 1. Hint: Use Theorem 22.6. (d) Show that limu _ oo supn En l l/ - fu l 1 == 0 (see (5.42) for the notation). (e) Conclude by Markov's inequality and Theorem 25.5 that f has the distribution of the sum in (c). i
-
.
I_ 1,
I_
•.
25.15.
1 /2
s
s
I ,
26. CHARACTERISTIC FUNCTIONS 351 25.1 6. 5.16 f Suppose that /( n ) --+ x. Show that Pn [ m : 1/( m ) - x l > £] --+ 0. From Theorem 25.12 conclude that En [/] --+ x (see (5.42); use the fact that f is bounded). Deduce the consistency of Cesaro summation [A30]. 25 .1 7. For A E 911 and T > 0, put "T( A ) == A ([ - T, T] n A )/2T, where " is Lebesgue measure. The relative measure of A is SECTION
p( A)
( 25 .22)
==
lim AT ( A ) ,
r-. oo
provided that this limit exists. This is a continuous analogue of density (see (2.21)) for sets of integers. A Borel function f has a distribution under "T; if this converges weakly to F, then
p ( x : /( X )
( 25 .23 )
<
u]
==
F( u )
for continuity points u of F, and F is called the distribution function of f. Show that all periodic functions have distributions. 25.18. Suppose that supn / fdp.n < oo for a nonnegative f such that f(x) --+ oo as x --+ + oo . Show that { P.n } is tight. 25. 19. f Let f( x) == l x l cr , a > 0 ; this is an f of the kind in Problem 25.18. Construct a tight sequence for which f lx l aP.n ( dx) --+ oo ; construct one for which f l x l aP.n( dx) == 00 . 25.20. 23.4 t
Show that the random variables A, and L, in Problems 23.3 and 23.4 converge in distribution. Show that the moments converge. 25.21. In the applications of Theorem 9.2, only a weaker result is actually needed: For each K there exists a positive a == a( K) such that if E[ X] == 0, E[ X2 ] == 1, and E [ X4 ] < K, then P[ X > 0] > a. Prove this by using tightness and the corollary to Theorem 25.12. 25.22. Find uniformly integrable random variables Xn for which there is no integrable Z satisfying P[ I Xn l > t] � P[ I ZI > t] for t > 0. SECI'ION 26. CHARACI'ERISTIC FUNCI'IONS Definition
The characteristic function of a probability measure μ on the line is defined for real t by

φ(t) = ∫_{−∞}^{∞} e^{itx} μ(dx);
see the end of Section 16 for integrals of complex-valued functions.† A random variable X with distribution μ has characteristic function

φ(t) = E[e^{itX}] = ∫_{−∞}^{∞} e^{itx} μ(dx).
The characteristic function is thus defined as the moment generating func tion but with the real argument s replaced by it; it has the advantage that it always exists because e itx is bounded. The characteristic function in non probabilistic contexts is called the Fourier transform. The characteristic function has three fundamental properties to be estab lished here: (i) If p. 1 and p. 2 have respective characteristic functions cp 1 (t) and cp2 ( t ), then p.1 * p. 2 has characteristic function cp 1 ( t) cp2 ( t ). Although con volution is essential to the study of sums of independent random variables, it is a complicated operation, and it is often simpler to study the products of the corresponding characteristic functions. (ii) The characteristic function uniquely determines the distribution. This shows that in studying the products in (i), no information is lost. (iii) From the pointwise convergence of characteristic functions follows the weak convergence of the corresponding distributions. This makes it possible, for example, to investigate the asymptotic distributions of sums of independent random variables by means of their characteristic functions. Moments and Derivatives
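Property (i) and the normal case treated below in Example 26.1 can be checked by a small Monte Carlo computation. In this sketch the sample size, the value of t, and the uniform distribution chosen for Y are illustrative assumptions; the empirical characteristic function of X + Y is compared with the product of the individual ones, and the empirical characteristic function of a standard normal sample with e^{−t²/2}.

```python
import cmath
import math
import random

random.seed(3)
n, t = 100_000, 1.3                      # assumed sample size and argument
xs = [random.gauss(0.0, 1.0) for _ in range(n)]       # standard normal sample
ys = [random.uniform(-1.0, 1.0) for _ in range(n)]    # independent uniform sample

def empirical_cf(sample, t):
    return sum(cmath.exp(1j * t * v) for v in sample) / len(sample)

phi_x = empirical_cf(xs, t)
phi_y = empirical_cf(ys, t)
phi_sum = empirical_cf([u + v for u, v in zip(xs, ys)], t)

print("normal sample:      ", phi_x, "  exp(-t^2/2) =", math.exp(-t * t / 2))
print("product phi_x*phi_y:", phi_x * phi_y)
print("sample of X + Y:    ", phi_sum)
```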
It is convenient first to study the relation between a characteristic function and the moments of the distribution it comes from. Of course, cp(O) = 1, and by (16.25), l cp(t) l < 1 for all t. By Theorem 16.8(i), cp(t) is continuous in t. In fact, l cp(t + h) - cp(t) l < f l e i h x 1 1 p. ( dx ), and so it follows by the bounded convergence theorem that cp(t) is
uniformly continuous.
Integration by parts shows that
(26 .1)
1 (X X
0
S
n
) eu ds ·
=
n+1 n +1 's 1 I + ( X - S ) e ds, 1 1 n+ n+ 0 •
X
X
·
and it follows by induction that
n ( · k ·n + 1 n ) 1 ( x - s ) e ;s ds e ix (26 .2) + L '\ k. n. for n � 0. Replace n by n - 1 in (26.1), solve for the integral on the right, =
t From
k-o
1
1
x
o
complex variable theory only De Moivre's formula and the simplest properties of the
exponential function are needed here.
SECTION
26.
353
CHARACTERISTIC FUNCTIONS
and substitute this for the integral in (26.2); this gives (26 . 3)
e
ix
=
�
( ix ) k +
i..Jo k ' k·
(n
;n
_
1x( x - ) n -1 ( e is - 1) ds . 1) ' s
o
·
Estimating the integrals in (26.2) and (26.3) (consider separately the cases x � 0 and x < 0) now leads to
e ix -
(26 .4)
n ( ix ) k
E
k-o
k '.
<
.
nun
{ I In+1 ! , 21 In } X
( n + 1)
X
n '.
I
for n > 0. The first term on the right gives a sharp estimate for x I small, the second a sharp estimate for l x l large. For n = 0, 1 , 2, the inequality specializes to
l e ix - 1 1 <
min { l x l , 2} ;
{�
min x 2 , 2 1 xI
};
If X has a moment of order n , it follows that k 1 + n � . k [ ( ] E X < E nun (26.5) , i..Jo k'. ' 1) n + . k-
21
I eix - ( 1 + ix ) I �
For any
(26 . 6 )
) t cp t
(it )
[ { (I tXI
tXI n } ]
n '.
.
satisfying
cp( t) must therefore have the expansion (26. 7) compare (21.22). If
then (see (16.26)) (26.7) must hold. Thus generating function over the whole line.
(26.7) holds if
X has a moment
354
CONVERGENCE OF DISTRIBUTIONS
Since E[el t XI] < oo if X has the standard normal distri bution, by (26.7) and (21.7) its characteristic function is Example 26. 1.
(26 .8) cp ( t ) =
00 it ) 2 k 1 · 3 · · · (2 k - 1) = E ( E (2k ) 9. 00
k-0
This and (21 .25) formally coincide if s =
( -)
2 k 1 t 1 = e - 1 2 12 . k. 2 k -0
it.
•
If the power series expansion (26.7) holds, the moments of off from it:
X can be read
(26.9) This is the analogue of (21 .23). It holds, however, under the weakest possible assumption, namely that E[ 1 X k 1 ] < oo . Indeed,
l
hX i ( t + e h t ( )