An Introduction to Measure-Theoretic Probability Second edition
An Introduction to Measure-Theoretic Probability Second edition
by GEORGE G. ROUSSAS Department of Statistics University of California, Davis
AMSTERDAM • BOSTON • HEIDELBERG • LONDON NEW YORK • OXFORD • PARIS • SAN DIEGO SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO
Academic Press is an imprint of Elsevier
Academic Press is an imprint of Elsevier The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, UK Radarweg 29, PO Box 211, 1000 AE Amsterdam, The Netherlands 225 Wyman Street, Waltham, MA 02451, USA 525 B Street, Suite 1800, San Diego, CA 92101-4495, USA Second edition 2014 Copyright © 2014, 2005 Elsevier Inc. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means electronic, mechanical, photocopying, recording or otherwise without the prior written permission of the publisher. Permissions may be sought directly from Elsevier’s Science & Technology Rights Department in Oxford, UK: phone (+44) (0) 1865 843830; fax (+44) (0) 1865 853333; email:
[email protected]. Alternatively you can submit your request online by visiting the Elsevier web site at http://elsevier.com/locate/ permissions, and selecting Obtaining permission to use Elsevier material. Notice No responsibility is assumed by the publisher for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein. Because of rapid advances in the medical sciences, in particular, independent verification of diagnoses and drug dosages should be made. Library of Congress Cataloging-in-Publication Data Application submitted British Library Cataloguing in Publication Data A catalogue record for this book is available from the British Library For information on all Academic Press publications visit our web site at store.elsevier.com Printed and bound in USA 14 15 16 17 18 10 9 8 7 6 5 4 3 2 1 ISBN: 978-0-12-800042-7
This book is dedicated to the memory of Edward W. Barankin, the probabilist, mathematical statistician, classical scholar, and philosopher, for his role in stimulating my interest in probability with emphasis on detail and rigor. Also, to my dearest sisters, who provided material means in my needy student years, and unrelenting moral support throughout my career.
Pictured on the Cover Carathéodory, Constantine (1873–1950) He was born in Berlin to Greek parents and grew up in Brussels, Belgium. In high school, he twice won a prize as the best Mathematics student in Belgium. He studied Military Engineering in Belgium, and pursued graduate studies in Göttingen under the supervision of Hermann Minkowski. He is known for his contributions to the theory of functions, the calculus of variations, and measure theory. His name is identified with the theory of outer measure, an application of which in measure theory is the so-called Carathéodory Extension Theorem. Also, he did work on the foundations of thermodynamics, and in 1909, he published the “first axiomatic rigid foundation of thermodynamics,” which was acclaimed by Max Planck and Max Born. From correspondence between Albert Einstein and Constantine Carathéodory, it may be deduced that Carathéodory’s work helped Einstein in shaping some of his theories. In 1924, he was appointed professor of Mathematics at the University of Munich, where he stayed until his death in 1950.
xiii
Preface to First Edition This book in measure-theoretic probability has resulted from classroom lecture notes that this author has developed over a number of years, by teaching such a course at both the University of Wisconsin, Madison, and the University of California, Davis. The audience consisted of graduate students primarily in statistics and mathematics. There were always some students from engineering departments, and a handful of students from other disciplines such as economics. The book is not a comprehensive treatment of probability, nor is it meant to be one. Rather, it is an excursion in measure-theoretic probability with the objective of introducing the student to the basic tools in measure theory and probability as they are commonly used in statistics, mathematics, and other areas employing this kind of moderately advanced mathematical machinery. Furthermore, it must be emphasized that the approach adopted here is entirely classical. Thus, characteristic functions are a tool employed extensively; no use of martingale or empirical process techniques is made anywhere. The book does not commence with probabilistic concepts, and there is a good reason for it. As many of those engaged in teaching advanced probability and statistical theory know, very few students, if any, have been exposed to a measure theory course prior to attempting a course in advanced probability. This has been invariably the experience of this author throughout the years. This very fact necessitates the study of the basic measure-theoretic concepts and results—in particular, the study of those concepts and results that apply immediately to probability, and also in the form and shape they are used in probability. On the basis of such considerations, the framework of the material to be dealt with is therefore determined. It consists of a brief introduction to measure theory, and then the discussion of those probability results that constitute the backbone of the subject matter. There is minimal flexibility allowed, and that is exploited in the form of the final chapter of the book. From many interesting and important candidate topics, this author has chosen to present a brief discussion of some basic concepts and results of ergodic theory. From the very outset, there is one point that must be abundantly clarified, and that is the fact that everything is discussed in great detail with all proofs included; no room is allowed for summary unproven statements. This approach has at least two side benefits, as this author sees them. One is that students have at their disposal a comprehensive and detailed proof of what are often deep theorems. Second, the instructor may skip the reproduction of such proofs by assigning their study to students. In the experience of this author, there are no topics in this book which can be omitted, except perhaps for the final chapter. With this in mind, the material can be taught in two quarters, and perhaps even in one semester with appropriate calibration of the rate of presentation, and the omission of proofs of judiciously selected
xv
xvi
Preface to First Edition
theorems. With all details presented, one can also cover an entire year of instruction, perhaps with some supplementation. Most chapters are supplied with examples, and all chapters are concluded with a varying number of exercises. An unusual feature here is that an Answers Manual of all exercises will be made available to those instructors who adopt the book as the textbook in their course. Furthermore, an overview of each one of the 15 chapters is included in an appendix to the main body of the book. It is believed that the reader will benefit significantly by reviewing the overview of a chapter before the material in the chapter itself is discussed. The remainder of this preface is devoted to a brief presentation of the material discussed in the 15 chapters of the book, chapter-by-chapter. Chapter 1 commences with the introduction of the important classes of sets in an abstract space, which are those of a field, a σ-field, including the Borel σ-field, and a monotone class. They are illustrated by concrete examples, and their relationships are studied. Product spaces are also introduced, and some basic results are established. The discussion proceeds with the introduction of the concept of measurable functions, and in particular of random vectors and random variables. Some related results are also presented. This chapter is concluded with a fundamental theorem, Theorem 17, which provides for pointwise approximation of any random variable by a sequence of so-called simple random variables. Chapter 2 is devoted to the introduction of the concept of a measure, and the study of the most basic results associated with it. Although a field is the class over which a measure can be defined in an intuitively satisfying manner, it is a σ-field—the one generated by an underlying field—on which a measure must be defined. One way of carrying out the construction of a measure on a σ-field is to use as a tool the so-called outer measure. The concept of an outer measure is then introduced, and some of its properties are studied in the second section of the chapter. Thus, starting with a measure on a field, utilizing the associated outer measure and the powerful Carathéodory extension theorem, one ensures the definition of a measure over the σ-field generated by the underlying field. The chapter is concluded with a study of the relationship between a measure over the Borel σ-field in the real line and certain point functions. A measure always determines a class of point functions, which are nondecreasing and right-continuous. The important thing, however, is that each such point function uniquely determines a measure on the Borel σ-field. In Chapter 3, sequences of random variables are considered, and two basic kinds of convergences are introduced. One of them is the almost everywhere convergence, and the other is convergence in measure. The former convergence is essentially the familiar pointwise convergence, whereas convergence in measure is a mode of convergence not occurring in a calculus course. A precise expression of the set of pointwise convergence is established, which is used for formulating necessary and sufficient conditions for almost everywhere convergence. Convergence in measure is weaker than almost everywhere convergence, and the latter implies the former for finite measures. Almost everywhere convergence and mutual almost everywhere
Preface to First Edition
convergence are equivalent, as is easily seen. Although the same is true when convergence in measure is involved, its justification is fairly complicated and requires the introduction of the concept of almost uniform convergence. Actually, a substantial part of the chapter is devoted in proving the equivalence just stated. In closing, it is to be mentioned that, in the presence of a probability measure, almost everywhere convergence and convergence in measure become, respectively, almost sure convergence and convergence in probability. Chapter 4 is devoted to the introduction of the concept of the integral of a random variable with respect to a measure, and the proof of some fundamental properties of the integral. When the underlying measure is a probability measure, the integral of a random variable becomes its expectation. The procedure of defining the concept of the integral follows three steps. The integral is first defined for a simple random variable, then for a nonnegative random variable, and finally for any random variable, provided the last step produces a meaningful quantity. This chapter is concluded with a result, Theorem 13, which transforms integration of a function of a random variable on an abstract probability space into integration of a real-valued function defined on the real line with respect to a probability measure on the Borel σ-field, which is the probability distribution of the random variable involved. Chapter 5 is the first chapter where much of what was derived in the previous chapters is put to work. This chapter provides results that in a real sense constitute the workhorse whenever convergence of integrals is concerned, or differentiability under an integral sign is called for, or interchange of the order of integration is required. Some of the relevant theorems here are known by names such as the Lebesgue Monotone Convergence Theorem, the Fatou–Lebesgue Theorem, the Dominated Convergence Theorem, and the Fubini Theorem. Suitable modifications of the basic theorems in the chapter cover many important cases of both theoretical and applied interest. This is also the appropriate point to mention that many properties involving integrals are established by following a standard methodology; namely, the property in question is first proved for indicator functions, then for nonnegative simple random variables, next for nonnegative random variables, and finally for any random variables. Each step in this process relies heavily on the previous step, and the Lebesgue Monotone Convergence Theorem plays a central role. Chapter 6 is the next chapter in which results of great utilitarian value are established. These results include the standard inequalities (Hölder (Cauchy–Schwarz), Minkowski, cr, Jensen), and a combination of a probability/moment inequality, which produces the Markov and Tchebichev inequalities. A third kind of convergence—convergence in the rth mean—is also introduced and studied to a considerable extent. It is shown that convergence in the rth mean is equivalent to mutual convergence in the rth mean. Also, necessary and sufficient conditions for convergence in the rth mean are given. These conditions typically involve the concepts of uniform continuity and uniform integrability, which are important in their own right. It is an easy consequence of the Markov inequality that convergence in the rth mean
xvii
xviii
Preface to First Edition
implies convergence in probability. No direct relation may be established between convergence in the rth mean and almost sure convergence. In Chapter 7, the concept of absolute continuity of a measure relative to another measure is introduced, and the most important result from utilitarian viewpoint is derived; this is the Radon–Nikodym Theorem, Theorem 3. This theorem provides the representation of a dominated measure as the indefinite integral of a nonnegative random variable with respect to the dominating measure. Its corollary provides the justification for what is done routinely in statistics; namely, employing a probability density function in integration. The Radon–Nikodym Theorem follows easily from the Lebesgue Decomposition Theorem, which is a deep result, and this in turn is based on the Hahn–Jordan Decomposition Theorem. Although all these results are proved in great detail, this is an instance where an instructor may choose to give the outlines of the first two theorems, and assign to students the study of the details. Chapter 8 revolves around the concept of distribution functions and their basic properties. These properties include the fact that a distribution function is uniquely determined by its values on a set that is dense in the real line, that the discontinuities, being jumps only, are countably many, and that every distribution function is uniquely decomposed into two distribution functions, one of which is a step function and the other a continuous function. Next, the concepts of weak and complete convergence of a sequence of distribution functions are introduced, and it is shown that a sequence of distribution functions is weakly compact. In the final section of the chapter, the so-called Helly–Bray type results are established. This means that sufficient conditions are given under which weak or complete convergence of a sequence of distribution functions implies convergence of the integrals of a function with respect to the underlying distribution functions. The purpose of Chapter 9 is to introduce the concept of conditional expectation of a random variable in an abstract setting; the concept of conditional probability then follows as a special case. A first installment of basic properties of conditional expectations is presented, and then the discussion proceeds with the derivation of the conditional versions of the standard inequalities dealt with in Chapter 6. Conditional versions of some of the standard convergence theorems of Chapter 5 are also derived, and the chapter is concluded with the discussion of further properties of conditional expectations, and an application linking the abstract definition of conditional probability with its elementary definition. In Chapter 10, the concept of independence is considered first for events and then for σ-fields and random variables. A number of interesting results are discussed, including the fact that real-valued (measurable) functions of independent random variables are independent random variables, and that the expectation of the product of independent random variables is the product of the individual expectations. However, the most substantial result in this chapter is the fact that factorization of the joint distribution function of a finite number of random variables implies independence of the random variables involved. This result is essentially based on the fact that σ-fields generated by independent fields are themselves independent.
Preface to First Edition
Chapter 11 is devoted to characteristic functions, their basic properties, and their usage for probabilistic purposes. Once the concept of a characteristic function is defined, the fundamental result, referred to in the literature as the inversion formula, is established in a detailed manner, and several special cases are considered; also, the applicability of the formula is illustrated by means of two concrete examples. One of the main objectives in this chapter is that of establishing the Paul Lévy Continuity Theorem, thereby reducing the proof of weak convergence of a sequence of distribution functions to that of a sequence of characteristic functions, a problem much easier to deal with. This is done in Section 3, after a number of auxiliary results are first derived. The multidimensional version of the continuity theorem is essentially reduced to the one-dimensional case through the so-called Cramér–Wold device; this is done in Section 4. Convolution of two distribution functions and several related results are discussed in Section 5, whereas in the following section additional properties of characteristic functions are established. These properties include the expansion of a characteristic function in a Taylor-like formula around zero with a remainder given in three different forms. A direct application of this expansion produces the Weak Law of Large Numbers and the Central Limit Theorem. In Section 8, the significance of the moments of a random variable is dramatized by showing that, under certain conditions, these moments completely determine the distribution of the random variable through its characteristic function. The rigorous proof of the relevant theorem makes use of a number of results from complex analysis, which for convenient reference are cited in the final section of the chapter. In the next two chapters—Chapters 12 and 13—what may be considered as the backbone of classical probability is taken up: namely, the study of the central limit problem is considered under two settings, one for centered random variables and one for noncentered random variables. In both cases, a triangular array of row-wise independent random variables is considered, and, under some general and weak conditions, the totality of limiting laws—in the sense of weak convergence—is obtained for the row sums. As a very special case, necessary and sufficient conditions are given for convergence to the normal law for both the centered and the noncentered case. In the former case, sets of simpler sufficient conditions are also given for convergence to the normal law, whereas in the latter case, necessary and sufficient conditions are given for convergence to the Poisson law. The Central Limit Theorem in its usual simple form and the convergence of binomial probabilities to Poisson probabilities are also derived as very special cases of general results. The main objective of Chapter 14 is to present a complete discussion of the Kolmogorov Strong Law of Large Numbers. Before this can be attempted, a long series of other results must be established, the first of which is the Kolmogorov inequalities. The discussion proceeds with the presentation of sufficient conditions for a series of centered random variables to convergence almost surely, the Borel– Cantelli Lemma, the Borel Zero–One Criterion, and two analytical results known as the Toeplitz Lemma and the Kronecker Lemma. Still the discussion of another two results is needed—one being a weak partial version of the Kolmogorov Strong Law
xix
xx
Preface to First Edition
of Large Numbers, and the other providing estimates of the expectation of a random variable in terms of sums of probabilities—before the Kolmogorov Strong Law of Large Numbers, Theorem 7, is stated and proved. In Section 4, it is seen that, if the expectation of the underlying random variable is not finite, as is the case in Theorem 7, a version of Theorem 7 is still true. However, if said expectation does not exist, then the averages are unbounded with probability 1. The chapter is concluded with a brief discussion of the tail σ-field of a sequence of random variables and pertinent results, including the Kolmogorov Zero–One Law for independent random variables, and the so-called Three Series Criterion. Chapter 15 is not an entirely integral part of the body of basic and fundamental results of measure-theoretic probability. Rather, it is one of the many possible choices of topics that this author could have covered. It serves as a very brief introduction to an important class of discrete parameter stochastic processes—stationary and ergodic or nonergodic processes—with a view toward proving the fundamental result, the Ergodic Theorem. In this framework, the concept of a stationary stochastic process is introduced, and some characterizations of stationarity are presented. The convenient and useful coordinate process is also introduced at this point. Next, the concepts of a transformation as well as a measure-preserving transformation are discussed, and it is shown that a measure-preserving transformation along with an arbitrary random variable define a stationary process. Associated with each transformation is a class of invariant sets and a class of almost sure invariant sets, both of which are σ-fields. A special class of transformations is the class of ergodic transformations, which are defined at this point. Invariance with respect to a transformation can also be defined for a stationary sequence of random variables, and it is so done. At this point, all the required machinery is available for the formulation of the Ergodic Theorem; also, its proof is presented, after some additional preliminary results are established. In the final section of the chapter, invariance of sets and of random variables is defined relative to a stationary process. Also, an alternative form of the Ergodic Theorem is given for nonergodic as well as ergodic processes. In closing, it is to be pointed out that one direction of the Kolmogorov Strong Law of Large Numbers is a special case of the Ergodic Theorem, as a sequence of independent identically distributed random variables forms a stationary and ergodic process. Throughout the years, this author has drawn upon a number of sources in organizing his lectures. Some of those sources are among the books listed in the Selected References Section. However, the style and spirit of the discussions in this book lie closest to those of Loève’s book. At this point, I would like to mention a recent worthy addition to the literature in measure theory—the book by Eric Vestrup, not least because Eric was one of our Ph.D. students at the University of California, Davis. The lecture notes that eventually resulted in this book were revised, modified, and supplemented several times throughout the years; comments made by several of my students were very helpful in this respect. Unfortunately, they will have to remain anonymous, as I have not kept a complete record of them, and I do not want to provide an incomplete list. 
However, I do wish to thank my colleague and friend Costas Drossos for supplying a substantial number of exercises, mostly accompanied by
Preface to First Edition
answers. I would like to thank the following reviewers: Ibrahim Ahmad, University of Central Florida; Richard Johnson, University of Wisconsin; Madan Puri, Indiana University; Doraiswamy Ramachandran, California State University at Sacramento; and Zongwu Cai, University of North Carolina at Charlotte. Finally, thanks are due to my Project Assistant Newton Wai, who very skillfully turned my manuscript into an excellent typed text. George G. Roussas Davis, California November 2003
xxi
Preface to Second Edition This is a revised version of the first edition of the book with copyright year 2005. The basic character of the book remains the same, although its style is slightly different. Whatever changes were effected were made to correct misprints and oversights; add some clarifying points, as well as insert more references to previous parts of the book in support of arguments made; make minor modifications in the formulation, and in particular, the proof of some results; and supply additional exercises. Specifically, the formulation of Theorem 8 in Chapter 3 has been rearranged. The proof of Theorem 3, case 3, in Chapter 4, has been simplified. The proof of Theorem 12 in Chapter 5 has been modified. Proposition 1, replaces Remark 6(ii) in Chapter 7. The proof of Theorem 3(iii) in Chapter 8 has been modified, and so has the concluding part of the proof of Theorem 5 in the same chapter. Likewise for the proofs of Theorems 7 and 8 in the same chapter. Remark 2 was inserted in Chapter 9 in order to further illustrate the abstract definition and the significance of the conditional expectation. Section 3 of Chapter 11 has been restructured. Theorem 3 has been split into two parts, Theorem 3 and Theorem 3*. Part (i) is the same in both of these theorems, as well as in the original theorem. There is a difference, however, in the formulation of part (ii) in the new versions of the theorems. Theorem 3 here is formulated along the familiar lines involving distribution functions and characteristic functions of random variables. Its formulation is followed by two lemmas, which facilitate its proof. The formulation of the second part of Theorem 3* is more general, and along the same lines as the converse of the original Theorem 3. Theorem 3* is also followed by two lemmas and one proposition, which lead to its justification. This section is concluded with two propositions, Propositions 2 and 3, where some restrictions imposed in the formulation of Lemmas 1–4 and Proposition 1 are lifted. In the same chapter, Chapter 11, the proof of Theorem 8 is essentially split into two parts, with the insertion of a “Statement” (just a few lines below relation (11.28)), in order to emphasize the asserted uniformity in the convergence. In Chapter 12, Example 3 was added, first to illustrate the process of checking the Lindeberg-Feller condition, and second to provide some insight into this condition. In Chapter 14, the second part of Lemma 7 has been modified, and so has its proof . Finally, a new chapter, Chapter 16, has been added. This Chapter discusses some material on statistical inference, and it was added for the benefit of statistically oriented users of the book and upon the suggestion of a reviewer of the revised edition of the book. Its main purpose, however, is to demonstrate as to how some of the theorems, corollaries, etc. discussed in the book apply in establishing statistically inference results. For a chapter-by-chapter brief description of the material discussed in the book, and also advice us to how the book can be used, the reader should go over the preface of its first edition.
xxiii
xxiv
Preface to Second Edition
The Answers Manual has been revised along the same lines as the text of the book. Thus, misprints and oversights have been corrected, and a handful of solutions have been modified. Of course, solutions to all new exercises are supplied. Again, the Answers Manual, in its revised version, will be made available to all those instructors who adopt the book as the textbook in their course. Misprints and oversights were located in the usual way; that is, by teaching from the book. Many of the misprints and oversights were pointed out by attentive students. In this respect, special mention should be made of my students Qiuyan Xu and Gabriel Becker. Clarifications, modifications, and rearrangement of material, as described earlier, were also stimulated, to a large extent, by observations made and questions posed by students. Warm thanks are extended to all those who took my two-quarter course the last two offerings. Also, I am grateful to Stacy Hill and Paul Ressel for constructive comments. In particular, I am indebted to Michael McAssey, a formerly graduate student in the Department of Statistics, for the significant role he played toward the revision of the book and the Answers Manual. The accuracy and efficiency by which he handled the material was absolutely exemplary. Thanks are also due to Chu Shing (Randy) Lai for most efficiently implementing some corrections and inserting additional material into the book and the Answer Manual. In closing, I consider it imperative to mention the following facts. Each chapter is introduced by a brief summary, describing the content of the chapter. In addition, there is an appendix in the book, Appendix A, where a much more extensive description is provided, chapter-by-chapter. It is my opinion that the reader would benefit greatly by reading this appendix before embarking on the study of the chapters. In this revision, a new appendix, Appendix B, has been added, providing a brief review of the Riemann–Stieltjes integral, and its relationship to the Riemann integral and the Lebesgue integral on the real line. The Riemann–Stieltjes integral is used explicitly in parts of Chapter 8, and implicitly in part of Chapters 11 through 13. Finally, it is mentioned here that some notation and abbreviations have been added to refresh readers’ memory and ensure uniformity in notation. George G. Roussas Davis, California September 2013
CHAPTER
Certain Classes of Sets, Measurability, and Pointwise Approximation
1
In this introductory chapter, the concepts of a field and of a σ -field are introduced, they are illustrated by means of examples, and some relevant basic results are derived. Also, the concept of a monotone class is defined and its relationship to certain fields and σ -fields is investigated. Given a collection of measurable spaces, their product space is defined, and some basic properties are established. The concept of a measurable mapping is introduced, and its relation to certain σ -fields is studied. Finally, it is shown that any random variable is the pointwise limit of a sequence of simple random variables.
1.1 Measurable Spaces Let be an abstract set (or space) and let C be a class of subsets of ; i.e., C ⊆ P(), the class of all subsets of . Definition 1.
C is said to be a field, usually denoted by F, if
(i) C is nonempty. (ii) If A ∈ C, then Ac ∈ C. (iii) If A1 , A2 ∈ C, then A1 ∪ A2 ∈ C.
Remark 1. In view of (ii) and (iii), the union A1 ∪ A2 may be replaced by the intersection A1 ∩ A2 . Examples. (Recall that a set is countable if it is either finite or it has the same cardinality as the set of integers. In the latter case it is countably infinite. A set is uncountable if it has the same cardinality as the real numbers.) (1) C = {, } is a field called the trivial field. (It is the smallest possible field.) (2) C = {all subsets of } = P() is a field called the discrete field. (It is the largest possible field.) (3) C = {, A, Ac , } for some A with ⊂ A ⊂ . (4) Let be infinite (countably or not) and let C = {A ⊆ ; A is finite or Ac is finite}. Then C is a field. (5) Let C be the class of all (finite) sums (unions of pairwise disjoint sets) of the partitioning sets of a finite partition of an arbitrary set (see Definition 12 below). Then C is a field (induced or generated by the underlying partition). An Introduction to Measure-Theoretic Probability, Second Edition. http://dx.doi.org/10.1016/B978-0-12-800042-7.00001-3 Copyright © 2014 Elsevier Inc. All rights reserved.
1
2
CHAPTER 1 Certain Classes of Sets, Measurability
Remark 2. In Example 4, it is to be observed that if is finite rather than infinite, then C = P(). Consequences of Definition 1. (1) , ∈ F for every F. (2) If A j ∈ F, j = 1, . . . , n, then nj=1 A j ∈ F. (3) If A j ∈ F, j = 1, . . . , n, then nj=1 A j ∈ F. Remark 3. It is shown by examples that A j ∈ F, j ≥ 1, need not imply ∞ ∞ A ∈ F, and similarly for A (see Remark 5 below). j j j=1 j=1 Definition 2. C is said to be a σ -field , usually denoted by A, if it is a field and (iii) in Definition 1 is strengthened to (iii ) If A j ∈ C, j = 1, 2, . . ., then ∞ j=1 A j ∈ C. Remark 4. In view of (ii) and (iii), the union in (iii ) may be replaced by the intersection ∞ j=1 A j . Examples. (6) C = {, } is a σ -field called the trivial σ -field. (7) C = P() is a σ -field called the discrete σ -field. (8) Let be uncountable and let C = {A ⊆ ; A is countable or Ac is countable}. Then C is a σ -field. (Of course, if is countably infinite, then C = P()). (9) Let C be the class of all countable sums of the partitioning sets of a countable partition of an arbitrary set . Then C is a σ -field (induced or generated by the underlying partition). Remark 5. A σ -field is always a field, but a field need not be a σ -field. In fact, in Example 4 take = (real line), and let A j = {k integer; − j ≤ k ≤ j}, j = 0, 1, . . . Then A j ∈ C, nj=0 A j ∈ C for any n = 0, 1, . . . but ∞ j=0 A j (= set of all integers) ∈ / C. Let I be any index set. Then Theorem 1. (i) If F j , j ∈ I are fields, so is j∈I F j = {A ⊆ ; A ∈ F j , j ∈ I }. (ii) If A j , j ∈ I are σ -fields, so is j∈I A j = {A ⊆ ; A ∈ A j , j ∈ I }.
Proof.
Immediate.
Let C be any class of subsets of . Then Theorem 2. (i) There is a unique minimal field containing C. This is denoted by F(C) and is called the field generated by C.
1.1 Measurable Spaces
(ii) There is a unique minimal σ -field containing C. This is denoted by σ (C) and is called the σ -field generated by C. Proof. (i) F(C) = j∈I F j , where {F j , j ∈ I } is the nonempty class of all fields containing C. (ii) σ (C) = j∈I A j , where {A j , j ∈ I } is the nonempty class of all σ -fields containing C. Remark 6. Clearly, σ (F(C)) = σ (C). Indeed, C ⊆ F(C), which implies σ (C) ⊆ σ (F(C)). Also, for every σ -field Ai ⊇ C it holds Ai ⊇ F(C), since A i is a field (being a σ -field), and F(C) is the minimal field (over C). Hence σ (C) = i Ai ⊇ F(C). Since σ (C) is a σ -field, it contains the minimal σ -field over F(C), σ (F(C)); i.e., σ (C) ⊇ σ (F(C)). Hence σ (C) = σ (F(C)). Application 1. Let = and C0 = {all intervals in } = {(x, y), (x, y], [x, y), [x, y], (−∞, a), (−∞, a], (b, ∞), [b, ∞); x, y ∈ , x < y, a, b ∈ }. Then σ (C0 ) is denoted by B and is called the Borel σ -field over the real line. The sets in B are ¯ = ∪ {−∞, ∞}. ¯ is called the extended real line and the called Borel sets. Let _ σ -field B generated by B ∪ {−∞} ∪ {∞} the extended Borel σ -field. x, x + n1 with Remark 7. {x} ∈ B for every x ∈ . Indeed, {x} = ∞ n=1 1 x, x + n1 ∈ B. Hence ∞ n=1 x, x + n ∈ B, or {x} ∈ B. Alternatively, with a < x < b, we have {x} = (a, x] ∩ [x, b) ∈ B. Definition 3. The pair (, A) is called a measurable space and the sets in A _ ¯ B ) the measurable sets. In particular, (, B) is called the Borel real line, and (, extended Borel real line. Let C again be a class of subsets of . Then C is called a monotone class if A j ∈ C, j = 1, 2, . . . and A j ↑ (i.e., de f A1 ⊆ A2 ⊆ · · · ) or A j ↓ (i.e., A1 ⊇ A2 ⊇ · · · ), then lim j→∞ A j = ∞ j=1 A j ∈ C de f ∞ and lim j→∞ A j = j=1 A j ∈ C, respectively. Definition 4.
Theorem 3. A σ -field A is a monotone field (i.e., a field that is also a monotone class) and conversely. Proof. One direction is immediate. As for the other, let F be a monotone ∞field and A ∈ F. We have: let any A j ∈ F, j = 1, 2, . . .. To show that ∞ j j=1 j=1 A j = n B , where B = A1 ∪ (A1 ∪ A2 ) ∪ · · · ∪ (A1 ∪ · · · ∪ An ) ∪ · · · = ∞ n j=1 A j , ∞n=1 n and hence Bn ∈ F, n = 1, 2, . . . and Bn ↑. Thus n=1 Bn ∈ F. Theorem 4. If M j , j ∈ I , are monotone classes, so is j∈I M j = {A ⊆ ; A ∈ M j , j ∈ I }. Proof.
Immediate.
3
4
CHAPTER 1 Certain Classes of Sets, Measurability
There is a unique minimal monotone class M containing C. Proof. M = j∈I M j , where {M j , j ∈ I } is the nonempty class of all monotone classes containing C. Theorem 5.
Remark 8.
{M j , j ∈ I } is nonempty since σ (C) or P() belong in it.
Remark 9. It may be seen by means of examples (see Exercise 12) that a monotone class need not be a field. However, see the next lemma, as well as Theorem 6. Lemma 1. Let C be a field and M be the minimal monotone class containing C. Then M is a field. Proof. where
In order to prove that M is a field, it suffices to prove that relations (*) hold, ⎫ (i) A ∩ B ∈ M ⎬ (∗) for every A, B ∈ M, we have: (ii) Ac ∩ B ∈ M . ⎭ ⎩ (iii) A ∩ B c ∈ M ⎧ ⎨
(That is, for every A, B ∈ M, their intersection is in M, and so is the intersection of any one of them by the complement of the other.) In fact, M ⊇ C, implies ∈ M. Taking B = , we get that for every A ∈ M, Ac ∩ = Ac ∈ M (by (ii)). Since also A ∩ B ∈ M (by (i)) for all A, B ∈ M, the proof would be completed. In order to establish (*), we follow the following three steps: Step 1. For any A ∈ M, define M A = {B ∈ M; (*) holds}, so that M A ⊆ M. Obviously A ∈ M A , since ∈ M. It is asserted that M A is a monotone class. Let de f B j = B ∈ M A ; i.e., to show B j ∈ M A , j = 1, 2, . . . , B j ↑. To showthat ∞ j=1 that (*) holds. We have: A ∩ B = A ∩ ( j B j ) = j (A ∩ B j ) ∈ M, since M is monotone and A ∩ B j ↑. Next, Ac ∩ B = Ac ∩ ( j B j ) = j (Ac ∩B j ) ∈ M M, by (*)(ii), and Ac ∩ B j ↑. Finally, A ∩ B c = A ∩ ( j B j )c = since Ac ∩ B j ∈ c A ∩ ( j B j ) = j (A ∩ B cj ) with A ∩ B cj ∈ M by (*)(iii) and A ∩ B cj ↓, so that c j (A ∩ B j ) ∈ M since M is monotone. The case that B j ↓ is treated similarly, and the proof that M A is a monotone class is complete. Step 2. If A ∈ C, then M A = M. As already mentioned, M A ⊆ M. So it suffices to prove that M ⊆ M A . Let B ∈ C. Then (*) holds and hence B ∈ M A . Therefore C ⊆ M A . By step 1, M A is a monotone class and M is the minimal monotone class containing C. Thus M ⊆ M A and hence M A = M. Step 3. If A is any set in M, then M A = M. We show that C ⊆ M A , which implies M ⊆ M A since M A is a monotone class containing C and M is the minimal monotone class over C. Since also M A ⊆ M, the result M A = M would follow. To show C ⊆ M A , take B ∈ C and consider M B . Then M B = M by step 2. Since A ∈ M, we have A ∈ M B , which implies that B ∩ A, B c ∩ A, and B ∩ Ac all belong in M; or A ∩ B, Ac ∩ B, and A ∩ B c belong in M, which means that B ∈ M A . Theorem 6. Let C be a field and M be the minimal monotone class containing C. Then M = σ (C).
1.2 Product Measurable Spaces
Proof. Evidently, M ⊆ σ (C) since every σ -field is a monotone class. By Lemma 1, M is a field, and hence a σ -field, by Theorem 3. Thus, M ⊇ σ (C) and hence M = σ (C). Remark 10. Lemma 1 and Theorem 6 just discussed provide an illustration of the intricate relation of fields, monotone classes, and σ -fields in a certain setting. As will also be seen in several places in this book, monotone classes are often used as tools in arguments meant to establish results about σ -fields. In this kind of arguments, the roles of a field and of a monotone class may be substituted by the so-called π -systems and λ-systems, respectively. The definition of these concepts may be found, for example, in page 41 in Billingsley (1995). A result analogous to Theorem 6 is then Theorem 1.3 in page 5 of the reference just cited, which states that: If P is a π -system and G is a λ-system, then P ⊂ G implies σ (P) ⊂ G.
1.2 Product Measurable Spaces Consider the measurable spaces (1 , A1 ), (2 , A2 ). Then Definition 5. The product space of 1 , 2 , denoted by 1 × 2 , is defined as follows: 1 × 2 = {ω = (ω1 , ω2 ); ω1 ∈ 1 , ω2 ∈ 2 }. In particular, for A ∈ A1 , B ∈ A2 the product of A, B, denoted by A× B, is defined by: A× B = {ω = (ω1 , ω2 ); ω1 ∈ A, ω2 ∈ B}, and the subsets A × B of 1 × 2 for A ∈ A1 , B ∈ A2 are called (measurable) rectangles. A, B are called the sides of the rectangle. From Definition 5, one easily verifies the following lemma. Lemma 2. Consider the rectangle E = A × B. Then, with “+” denoting union of disjoint events, (i) E c = (A × B c ) + (Ac × 2 ) = (Ac × B) + (1 × B c ). Consider the rectangles E 1 = A1 × B1 , E 2 = A2 × B2 . Then (ii) E 1 ∩ E 2 = (A1 ∩ A2 ) × (B1 ∩ B2 ). Hence E 1 ∩ E 2 = if and only if at least one of the sets A1 ∩ A2 , B1 ∩ B2 is . Consider the rectangles E 1 , E 2 as above, and the rectangles F1 = A 1 × B1 , F2 = A 2 × B2 . Then
(iii) (E 1 ∩ F1 ) ∩ (E 2 ∩ F2 ) = [(A1 ∩ A1 ) × (B1 ∩ B1 )] ∩ [(A2 ∩ A2 ) ×(B2 ∩ B2 )] (by (ii)) = [(A1 ∩ A 1 ) ∩ (A2 ∩ A 2 )] ×[(B1 ∩ B1 ) ∩ (B2 ∩ B2 )]
(by (ii))
A2 ) ∩ (A 1
= [(A1 ∩ ×[(B1 ∩
∩ A 2 )] B2 ) ∩ (B1 ∩ B2 )].
Hence, the left-hand side is if and only if at least one of (A1 ∩ A2 ) ∩ (A 1 ∩ A 2 ), (B1 ∩ B2 ) ∩ (B1 ∩ B2 ) is . Theorem 7. Let C be the class of all finite sums (i.e., unions of pairwise disjoint) of rectangles A × B with A ∈ A1 , B ∈ A2 . Then C is a field (of subsets of 1 × 2 ).
5
6
CHAPTER 1 Certain Classes of Sets, Measurability
Proof. Clearly, C = . Next, let we show that E ∩ F ∈ C. In mE, F ∈ C. Then E i , F = nj=1 F j with E i = Ai × Bi , i = fact, E, F ∈ C implies that E = i=1 m n 1, . . . , m, F j = A j × B j , j = 1, . . . , n. Thus E ∩ F = i=1 j=1 (E i ∩ F j ) and
E i ∩ F j , E i ∩ F j are disjoint for (i, j) = (i , j ) by Lemma 2 (ii), (iii). Indeed, in Lemma 2 (iii), make the identification: A1 = Ai , B1 = Bi , A2 = Ai , B2 = Bi , A 1 = A j , B1 = B j , A 2 = A j , B2 = B j to get (E i ∩ F j ) ∩ (E i ∩ F j ) = [(Ai ∩ Ai ) ∩ (A j ∩ A j )] × [(Bi ∩ Bi ) ∩ (B j ∩ B j )] by the third line on the right-hand side in Lemma 2(iii), and at least one of (Ai ∩ Ai ) ∩ (A j ∩ A j ), (Bi ∩ Bi ) ∩ (B j ∩ B j ) is equal to . Then, by Lemma 2(iii) again, (E i ∩ F j ) ∩ (E i ∩ F j ) = , and therefore m n
E ∩ F = i=1 j=1 (E i ∩ F j ). However, E i ∩ F j = (Ai ∩ A j ) × (Bi ∩ B j ) (by Lemma 2(ii)), and Ai ∩ A j ∈ A1 , Bi ∩ B j ∈ A2 , i = 1, . . . , m, j = 1, . . . , n. Thus E ∩ F is the sum of finitely many rectangles and hence E ∩ F ∈ C. (By ∈ C, k = 1, . . . , , then induction it is also true that if E k k=1 E k ∈ C.) Finally, m m m E i )c = i=1 E ic = i=1 [(Ai × Bic ) + (Aic × 2 )] (by Lemma 2(i)), E c = ( i=1 and Ai × Bic , Aic × 2 are disjoint rectangles so that their sum is in C. But then so is their intersection over i = 1, . . . , m by the induction just mentioned. The proof is completed. Remark 11. Clearly, the theorem also holds true if we start out with fields F1 and F2 rather than σ -fields A1 and A2 . Definition 6. The σ -field generated by the field C is called the product σ -field of A1 , A2 and is denoted by A1 ×A2 . The pair (1 ×2 , A1 ×A2 ) is called the product measurable space of the (measurable) spaces (1 , A1 ), (2 , A2 ). If we have n ≥ 2 measurable spaces (i , Ai ), i = 1, . . . , n, the product measurable space (1 ×· · ·×n , A1 ×· · ·×An ) is defined in an analogous way. In particular, if 1 = · · · = n = and A1 = · · · = An = B, then the product space (n , B n ) is the n-dimensional Borel space, where n = × · · · × , B n = B × · · · × B (n factors), and B n is called the n-dimensional Borel σ -field. The members of B n are called the n-dimensional Borel sets. Now we consider the case of infinitely (countably or not) many measurable spaces (t , At ), t ∈ T , where the (= ) index set T will usually be the real line or the positive half of it or the unit interval (0, 1) or [0,1].
Definition 7. The product space of t , t ∈ T , denoted by t∈T t or T , is defined by T = t∈T t = {ω = (ωt , t ∈ T ); ωt ∈ t , t ∈ T }. By forming the point ω = (ωt , t ∈ T ) with ωt ∈ t , t ∈ T , we tacitly assume, by invoking the axiom of choice, that there exists a function on T into t∈T t with t = , t ∈ T , whose value at t, ωt , belongs in t . Now for T = {1, 2}, 1 × 2 = {ω = (ω1 , ω2 ); ω1 ∈ 1 , ω2 ∈ 2 }. Also, let f : T → 1 ∪ 2 such that f (1) ∈ 1 , f (2) ∈ 2 . Then ( f (1), f (2)) ∈ 1 × 2 . Conversely, any (ω1 , ω2 ) ∈ 1 × 2 is the (ordered) pair of values of a function f on T into 1 ∪ 2 with f (1) ∈ 1 , f (2) ∈ 2 ; namely, the function for which f (1) = ω1 , f (2) = ω2 . Thus, 1 × 2 may be looked upon as the collection of all functions f on T into 1 ∪ 2 with f (1) ∈ 1 , f (2) ∈ 2 . Similar interpretation
1.3 Measurable Functions and Random Variables
holds for any finite collection of (= ) i , i = 1, . . . , n, as well as
any collection of (= ) t , t ∈ T (= ) (by the axiom of choice). Thus, T = t∈T t = { f : T → t∈T t ; f (t) ∈ t , t ∈ T }. In particular, if T = and t = , t ∈ T , then T = t∈T t is the set of all real-valued functions defined on . Remark 12. In many
applications, we take T = [0, 1], t = , t ∈ T , and we consider subsets of t∈T t , such as the set of all continuous functions, denoted by C([0, 1]), or the set of all bounded and right-continuous functions, denoted by D([0, 1]). Next, for any
positive integer N , let TN = {t1 , . . . , t N } with ti ∈ T , i = 1, . . . , N , and let A TN = t∈TN At . Then A TN is a rectangle in t1 × · · · × t N . Furthermore,
Definition 8. The subset A TN × t∈T c t = t∈TN At × t∈T c t of t∈T t N N
is called a product cylinder in T = t∈T t with basis A TN and sides At ∈ At , t ∈ TN . Theorem 8. Let C be the class of all finite sums of all product cylinders. Then C is a field (of subsets of t∈T t ). The proof of this theorem is based on the same ideas as those used in proving Theorem 7. Definition 9. The σ -field generated by C is called the product σ -field of At , t ∈ T , and is denoted by AT = t∈T At . The pair (T = t∈T t , AT = t∈T At ) is called the product measurable space of the (measurable) spaces (t , At ), t ∈ T . The space (∞ , B ∞ ), the (countably) infinite dimensional Borel space, where ∞ = × × · · ·, and B ∞ = B × B × · · ·, is often of special interest. B ∞ is the (countably) infinite-dimensional Borel σ -field. The members of B ∞ are called (countably) infinite dimensional Borel sets. For more information, see also page 62 of Loève (1963).
1.3 Measurable Functions and Random Variables Let , be two spaces and let X be a mapping such that X : → . Then the set operator X −1 associated with the mapping X is defined as follows: Definition 10. X −1 : P( ) → P() and X −1 (A ) = A, where A = {ω ∈ ; X (ω) ∈ A }; X −1 (A ) is the inverse image of A under X . From Definition 10 it follows that Theorem 9. (i) (ii) (iii) (iv)
If A ∩ B = , then [X −1 (A )] ∩ [X −1 (B )] = . X −1 (A c ) = [X −1 (A )]c . X −1 ( j∈I A j ) = j∈I X −1 (A j ) and X −1 ( j∈I A j ) = j∈I X −1 (A j ). X −1 ( j∈I A j ) = j∈I X −1 (A j ).
7
8
CHAPTER 1 Certain Classes of Sets, Measurability
(v) X −1 (A − B ) = X −1 (A ) − X −1 (B ) (equivalently, X −1 (A ∩ B c ) = X −1 (A ) ∩ [X −1 (B )]c ). (vi) If A ⊆ B , then X −1 (A ) ⊆ X −1 (B ). (vii) C ⊆ C
, then X −1 (C ) ⊆ X −1 (C
), where X −1 (C ) = {A ⊆ ; A = X −1 (A ) for some A ∈ C }; and similarly for C
. Now let us assume that is supplied with a σ -field A . Then we have Theorem 10. Define the class C of subsets of as follows: C = X −1 (A ). Then C is a σ -field (i.e., the inverse image of a σ -field is a σ -field). Remark 13.
This σ -field is called the σ -field induced (in ) by X .
Proof of Theorem 10.
This is immediate from (ii) and (iii) of Theorem 9.
Next assume that is supplied with a σ -field A. Then Theorem 11. Define the class C of subsets of as follows: C = {A ⊆ ; X −1 (A ) ∈ A}. Then C is a σ -field. Proof.
Immediate from (ii) and (iii) of Theorem 9.
de f
Theorem 12. Let C be a class of subsets of and let A = σ (C ). Then A = σ [X −1 (C )] = X −1 (A ). Proof. We have: C ⊆ A implies X −1 (C ) ⊆ X −1 (A ), and this implies A ⊆ X −1 (A ) because X −1 (A ) is a σ -field by Theorem 10. Thus, to show X −1 (A ) ⊆ A. Define C ∗ as follows: C ∗ = {A ⊆ ; X −1 (A ) ∈ A}. Then, clearly, C ⊆ C ∗ , and C ∗ is a σ -field by Theorem 11, and the fact that X −1 (C ∗ ) ⊆ A. Hence A ⊆ C ∗ and therefore X −1 (A ) ⊆ X −1 (C ∗ ) ⊆ A. Thus X −1 (A ) ⊆ A and X −1 (A ) = A. Now assume that both and are supplied with σ -fields A and A , respectively. Then Definition 11. If X −1 (A ) ⊆ A we say that X is measurable with respect to A and A , or just measurable if no confusion is possible. In particular, if ( , A ) = (n , B n ) and X is measurable, we say that X is an n-dimensional random vector and if n = 1, a random variable (r.v.). If (, A) = (n , B n ), ( , A ) = (m , B m ) and f : →
is measurable, then f is called a Borel function, and for m = 1 a Baire function. The meaning and significance of Theorem 12 are this: if we want to check measurability of X , it suffices only to check that X −1 (C ) ⊆ A, where C is a class generating A . Indeed, if X : (, A) → ( , A ), then (A, A )− measurability of X means X −1 (A ) ⊆ A. Let X −1 (C ) ⊆ A and let A = σ (C ). Then σ [X −1 (C )] ⊆ A. But σ [X −1 (C )] = X −1 (A ). Thus X −1 (A ) ⊆ A. In particular, in the Borel real line, X is a r.v. if only X −1 (C0 ) or X −1 (C j ) or X −1 (C j ) ⊆ A, j = 1, . . . , 8, where C0 is as in Application 1, the classes C j , j = 1, . . . , 8, are the classes of intervals each consisting of intervals from C0 of one type, and C j is the class taken from C j when the endpoints of the intervals are restricted to be rational numbers j = 1, . . . , 8. Theorem 13. Let X : (, A) → ( , A ) be measurable and let f : ( , A ) → (
, A
) be measurable. Define f (X ) : →
as follows: f (X )(ω) = f [X (ω)].
1.3 Measurable Functions and Random Variables
Then the mapping f (X ) is measurable. That is, a measurable mapping of a measurable mapping is a measurable mapping. Proof. For A
∈ A
, [ f (X )]−1 (A
) = X −1 [ f −1 (A
)] = X −1 (A ) with A ∈ A . Thus X −1 (A ) = A ∈ A. Corollary 1. Proof.
Borel functions of random vectors are random vectors.
Take ( , A ) = (n , B n ), (
, A
) = (m , B m ).
(, A), ( , A )
and assume that and We now consider the measurable spaces are also provided with topologies T and T , respectively. (Recall that T is a topology for if T is a class of subsets of with the following properties: (i) , ∈ T , (ii) T is closed under finite intersections of members of T , and (iii) T is closed under arbitrary unions of members of T .) The pair (, T ) is called a topological space, and the members of T are called open sets. Also, f : (, T ) → ( , T ) is said to be continuous (with respect to the topologies T and T ), if f −1 (T ) ∈ T for every T ∈ T . Theorem 14. Let f : → be continuous and let that T ⊆ A, A = σ (T ). Then f is measurable. Proof. Continuity of f implies f −1 (T ) ∈ T , T ∈ T . Hence f −1 (T ) ⊆ A. Since T generates A , we have f −1 (A ) = σ [ f −1 (T )] ⊆ A by Theorem 12. Application 2. Recall that a class of sets in T is a base for T if every T in T is the union of members of this class. A topology T and the corresponding topological space are called separable if there exists a countable base for T . In the spaces (k , B k ), k ≥ 1, the “usual” topology Tk is the one with base the class of all finite open intervals (rectangles) or only the class of all open intervals (rectangles) with rational endpoints. This second base is countable and the topology Tk and the space (k , Tk ) are separable. Then, clearly, B k is generated by Tk (see Theorem 7, Definition 6, and the paragraph following it). Thus we have Corollary 2. Let X : (, A) → (n , B n ) be measurable and let f : (n , B n ) → (m , B m ) be continuous. Then f (X ) : → m is measurable (i.e., continuous functions of a random vector are random vectors). Proof.
Follows by the fact that Tm and Tn generate B m and B n , respectively.
This corollary implies that the usual operations applied on r.v.s, such as forming sums, products, or quotients, will give r.v.s. Now if X : → n , then X can be written as X = (X 1 , . . . , X n ). In connection with this we have Theorem 15. Let X = (X 1 , . . . , X n ) : (, A) → (n , B n ). Then X is a random vector (measurable function) if and only if X j , j = 1, . . . , n are r.v.s. Proof. Let Bi ∈ B, i = 1, . . . , n. Then X −1 (B1 × · · · × Bn ) = (X 1 , . . . , X n )−1 (B1 × · · · × Bn ) = (X 1 ∈ B1 ) ∩ · · · ∩ (X n ∈ Bn ) = [X 1−1 (B1 )] ∩ · · · ∩ [X n−1 (Bn )]. Thus, if X j , j = 1, . . . , n are r.v.s, then X −1 j (B j ) ∈ A for every j and hence so is
9
10
CHAPTER 1 Certain Classes of Sets, Measurability
n
X −1 (B j ). So, if X j , j = 1, . . . , n, are measurable, so is X (by the definition of the product σ -field B n ). Next, consider the projection functions f j : n → such that f j (x1 , . . . , xn ) = x j , j = 1, . . . , n. It is known that f j , j = 1, . . . , n, are continuous, hence measurable. Then X j = f j (X ), j = 1, . . . , n, and the measurability of X implies the measurability of X j , j = 1, . . . , n. j=1
Let X be a r.v. Then the positive part of X , denoted by X + , and the negative part of X , denoted by X − , are defined as follows: X if X ≥ 0 0 if X ≥ 0, + − X = ; X = 0 if X < 0 −X if X < 0 . Then, clearly, X = X + − X − and |X | = X + + X − . Now as a simple application of the Corollary to Theorem 14, we show that both X + and X − are r.v.s and then, of course, so is |X |. To this end, take n = m = 1 and define f by: f (x) = x + , which is a continuous function of x, and similarly for f (x) = x − . Directly, the measurability of X + is established as follows. In order to prove the measurability of X + , it suffices to show that (X + )−1 ((−∞, x]) ∈ A for x ∈ . For x < 0, (X + ≤ x) = . For x = 0, (X + ≤ 0) = (X ≤ 0) ∈ A. For x > 0, (X + ≤ x) = (X + = 0) ∪ (0 < X + ≤ x) = (X ≤ 0) ∪ (0 < X ≤ x) = (X ≤ x) ∈ A. (Recall that for a sequence {xn } of real numbers, and as n → ∞: xn = x¯ if for every ε > 0 there exists n(ε) > 0 (1) lim sup xn or lim n n
integer such that n ≥ n(ε) implies xn ≤ x¯ + ε, and xn > x¯ − ε for at least one n ≥ n(ε). (2) limn inf xn or lim xn = x if for every ε > 0 there exists n(ε) > 0 n
integer such that n ≥ n(ε) implies xn ≥ x − ε and xn < x + ε for at least one n ≥ n(ε). Also, f (3) lim xn = infn supi≥n xi = infn yn , yn de = supi≥n xi , n
so that yn ↓ and set infn yn de f
= limn yn = x. ¯ de f
(4)
lim xn = supn infi≥n xi = supn z n , z n = infi≥n xi , n
so that z n ↑ and set supn z n de f
= limn z n = x. ¯ If x ≥ x, ¯ For every n ≥ 1, z n ≤ yn so that supn z n ≤ infn yn or equivalently x ≤ x. then the common value x = x¯ = x is the limn of xn .)
1.3 Measurable Functions and Random Variables
Next let X n , n ≥ 1, be r.v.s. Then define the following mappings (which are assumed to be finite). The sup and inf are taken over n ≥ 1 and all limits are taken as n → ∞. ⎫ : supn X n ω = supn X n (ω) supn X n ⎪ ⎪ ⎪ ⎪ ⎪ infn X n : infn X n ω = infn X n (ω) ⎪ ⎪ ⎬ lim sup X n or lim X n : lim sup X n (ω) = lim sup X n (ω) ⎪ ω ∈ . ⎪ n n n ⎪ ⎪ n ⎪ ⎪ lim inf X n or lim X n : lim inf X n (ω) = lim inf X n (ω) ⎪ ⎭ n
n
n
n
Then lim infn X n ≤ lim supn X n and if lim infn X n = lim supn X n , this defines the mapping limn X n . Then we have the following: If X n , n ≥ 1, are r.v.s, then the mappings just defined are also ∞ Proof. We have (supn X n ≤ x) = (X n ≤ x, n ≥ 1) = n=1 (X n ≤ x) ∈ A. Thus supn X n is a r.v. Now infn X n = −sup(−X n ) and then the measurability of supn X n implies the measurability of infn X n . Next, lim supn X n = infn (sup j≥n X j ). Thus, if Yn = sup j≥n X j , then Yn , n ≥ 1, are r.v.s and then so is the infn Yn . Finally, lim inf n X n = −lim supn (−X n ), and then the previous result implies the measurability of lim infn X n . The measurability of limn X n , if the limit exists, is an immediate consequence of the last two results. _ ¯ B ), the extended Borel real line, is an A measurable mapping X on (, A) into (, extended r.v. Then Theorem 16 still holds true if the operations applied on X n , n ≥ 1, produce extended r.v.s. Theorem 16. r.v.s.
Definition 12. Consider the measurable space (Ω, A) and let {A_j, j ∈ I} be a collection of sets in A such that A_i ∩ A_j = ∅, i, j ∈ I, i ≠ j, and ∪_j A_j = Ω. Then this collection is called a (measurable) partition of Ω. The partition is finite if I is a finite set and infinite otherwise.

Definition 13. Let {A_j, j = 1, ..., n} be a (finite, measurable) partition of Ω, and define the mapping X: Ω → ℝ as follows: X = Σ_{j=1}^n α_j I_{A_j}, such that α_j ∈ ℝ, j = 1, ..., n (which may be assumed to be distinct). Then X is called a simple r.v. If {A_j, j = 1, 2, ...} is a (countably infinite, measurable) partition of Ω, then the mapping X: Ω → ℝ such that X = Σ_{j=1}^∞ α_j I_{A_j}, α_j ∈ ℝ, j = 1, 2, ... (which may be assumed to be distinct), is called an elementary r.v.

Remark 14. By I_A we denote the indicator of the set A; i.e., I_A(ω) = 1 if ω ∈ A, and I_A(ω) = 0 if ω ∈ Aᶜ.

It is evident that simple and elementary r.v.s are indeed r.v.s. What is more important, however, is that some kind of an inverse of this statement also holds true. More precisely,
Theorem 17. Every r.v. is the pointwise limit of a sequence of simple r.v.s.

Proof. Consider the r.v. X, the interval [−n, n), and define the sets:
A_{nj} = ((j−1)/2ⁿ ≤ X < j/2ⁿ), j = −n2ⁿ + 1, −n2ⁿ + 2, ..., n2ⁿ,
A′_n = (X < −n), A″_n = (X ≥ n), n = 1, 2, ...
Then, clearly, {A_{nj}, j = −n2ⁿ + 1, ..., n2ⁿ, A′_n, A″_n} is a (measurable) partition of Ω. Thus, if we define X_n by
X_n = Σ_{j=−n2ⁿ+1}^{n2ⁿ} ((j−1)/2ⁿ) I_{A_{nj}} + (−n) I_{A′_n} + n I_{A″_n},
then X_n is a simple r.v. We are going to show next that X_n(ω) → X(ω) as n → ∞ for every ω ∈ Ω.

Let ω ∈ Ω. Then there exists n₀ = n₀(ω) such that |X(ω)| < n₀. It is asserted that ω ∈ A_{nj} for n ≥ n₀ and some j = −n2ⁿ + 1, ..., n2ⁿ. This is so because, for n ≥ n₀, [−n, n) ⊇ [−n₀, n₀) and the intervals [(j−1)/2ⁿ, j/2ⁿ), j = −n2ⁿ + 1, ..., n2ⁿ, form a partition of [−n, n). Let ω ∈ A_{n j(n)}. Then (j(n)−1)/2ⁿ ≤ X(ω) < j(n)/2ⁿ. But then X_n(ω) = (j(n)−1)/2ⁿ, so that |X_n(ω) − X(ω)| < 1/2ⁿ. Thus X_n(ω) → X(ω) as n → ∞.

To this theorem we have the following:

Corollary 3. If the r.v. X ≥ 0, then there exists a sequence of simple r.v.s X_n such that 0 ≤ X_n ↑ X as n → ∞.

Proof. If X ≥ 0, then X_n of the theorem becomes as follows: X_n = Σ_{j=1}^{n2ⁿ} ((j−1)/2ⁿ) I_{A_{nj}} + n I_{A″_n}, so that 0 ≤ X_n → X as n → ∞. We will next show that X_n ↑. For each n, we have that [0, n) is divided into the n2ⁿ subintervals [(j−1)/2ⁿ, j/2ⁿ), j = 1, 2, ..., n2ⁿ, and for n + 1, [0, n + 1) is divided into the (n + 1)2ⁿ⁺¹ subintervals [(j−1)/2ⁿ⁺¹, j/2ⁿ⁺¹), j = 1, 2, ..., (n + 1)2ⁿ⁺¹, and each one of the intervals in the first class of intervals is split into two intervals in the second class of intervals. Thus X_n(ω) ≤ X_{n+1}(ω) for every ω ∈ Ω (see following picture).

[Figure: the level-n interval [(j−1)/2ⁿ, j/2ⁿ) is split into the two level-(n+1) intervals [(2j−2)/2ⁿ⁺¹, (2j−1)/2ⁿ⁺¹) and [(2j−1)/2ⁿ⁺¹, 2j/2ⁿ⁺¹); whichever half contains X(ω), the value X_{n+1}(ω) is at least X_n(ω).]

Remark 15. The significance of the corollary is that the nondecreasing simple r.v.s X_n are also ≥ 0. This point will be exploited later on in the so-called Lebesgue Monotone Convergence Theorem and elsewhere.

Remark 16. Theorem 17 and its corollary are, clearly, true even if X is an extended r.v.
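To make the construction in Theorem 17 and Corollary 3 concrete, the following small Python sketch, added here as an illustration, evaluates the approximating simple r.v.s X_n; the particular function playing the role of X is an arbitrary choice, not one used in the text.

```python
import math

def X(omega):
    # A hypothetical nonnegative r.v. on Omega = [0, 1), standing in for a general X.
    return math.sin(7 * omega) + 2.3

def X_n(omega, n):
    """The n-th dyadic approximation of Theorem 17:
    (j-1)/2^n on A_nj = ((j-1)/2^n <= X < j/2^n), -n on (X < -n), n on (X >= n)."""
    x = X(omega)
    if x < -n:
        return -n
    if x >= n:
        return n
    j = math.floor(x * 2 ** n) + 1          # the unique j with (j-1)/2^n <= x < j/2^n
    return (j - 1) / 2 ** n

omega = 0.37
for n in (1, 2, 4, 8, 16):
    # Since X >= 0 here (as in Corollary 3), X_n(omega) increases toward X(omega),
    # and |X_n(omega) - X(omega)| < 1/2^n once n exceeds X(omega).
    print(n, X_n(omega, n), X(omega) - X_n(omega, n))
```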
Exercises.
1. Consider the measurable space (Ω, A) and let A_n ∈ A, n = 1, 2, ... Then recall that
lim inf_{n→∞} A_n = ∪_{n=1}^∞ ∩_{j=n}^∞ A_j   and   lim sup_{n→∞} A_n = ∩_{n=1}^∞ ∪_{j=n}^∞ A_j.
(i) Show that lim inf_{n→∞} A_n ⊆ lim sup_{n→∞} A_n. (If also lim sup_{n→∞} A_n ⊆ lim inf_{n→∞} A_n, so that lim inf_{n→∞} A_n = lim sup_{n→∞} A_n, then this set is denoted by lim_{n→∞} A_n and is called the limit of the sequence {A_n}, n ≥ 1.)
(ii) Show that (lim inf_{n→∞} A_n)ᶜ = lim sup_{n→∞} A_nᶜ and (lim sup_{n→∞} A_n)ᶜ = lim inf_{n→∞} A_nᶜ. Conclude that if lim_{n→∞} A_n = A, then lim_{n→∞} A_nᶜ = Aᶜ.
(iii) Show that
lim inf_{n→∞} (A_n ∩ B_n) = lim inf_{n→∞} A_n ∩ lim inf_{n→∞} B_n,  and  lim sup_{n→∞} (A_n ∪ B_n) = lim sup_{n→∞} A_n ∪ lim sup_{n→∞} B_n.
(iv) Show that
lim sup_{n→∞} (A_n ∩ B_n) ⊆ lim sup_{n→∞} A_n ∩ lim sup_{n→∞} B_n,  and  lim inf_{n→∞} (A_n ∪ B_n) ⊇ lim inf_{n→∞} A_n ∪ lim inf_{n→∞} B_n.
(v) By a counterexample, show that the inverse inclusions in part (iv) do not hold, so that limn→∞ (An ∩ Bn ) need not be equal to limn→∞ An ∩ limn→∞ Bn , and limn→∞ (An ∪ Bn ) need not be equal to limn→∞ An ∪ limn→∞ Bn . (vi) If limn→∞ An = A and limn→∞ Bn = B, then show that limn→∞ (An ∩ Bn ) = A ∩ B and limn→∞ (An ∪ Bn ) = A ∪ B. (vii) If limn→∞ An = A, then show that for any set B, limn→∞ (An B) = A B, where An B is the symmetric difference of An and B. (viii) If A2 j−1 = B and A2 j = C, j = 1, 2, . . ., determine limn→∞ An and limn→∞ An . Under what condition on B and C does the limit exist, and what is it equal to? Hint: (i) Use the definition of limn→∞ An and limn→∞ An , and show that each side is contained in the other. (ii) Use the definition of limn→∞ An , limn→∞ An , and DeMorgan’s laws.
(iii), (iv) Use the definition of limn→∞ An and limn→∞ An , and then show that each side is included in the other. (v) A choice of the An s and Bn s that does this is if one takes A2 j−1 = A, A2 j = A0 , B2 j−1 = B, B2 j = B0 , j ≥ 1, then take = , and finally select A, A0 , B and B0 suitably. (vi) Use parts (iii) and (iv). (vii) It follows from parts (vi) and (ii). (viii) It follows from part (v). 2.
(i) Setting A = lim inf n→∞ An , A = lim supn→∞ An , and A = limn→∞ An if it exists, show that all A, A, and A are in A. (ii) If An ↑ as n → ∞, show that limn→∞ An exists and is equal to ∪∞ n=1 An , and if An ↓ as n → ∞, then limn→∞ An exists and is equal to ∩∞ n=1 An .
3. Carry out the details of the proof of Theorem 1. 4. By means of an example, showthat A j ∈ F, j ≥ 1, need not imply that ∞ ∞ j=1 A j ∈ F, and similarly for j=1 A j . 5. Let P = {An , n = 1, 2, . . .} be a partition of where An = , n ≥ 1, and let C be the class of all sums of members in P. Then show that C is the σ -field generated by the class P. 6. Let C0 be the class of all intervals in , and consider the eight classes C j , j = 1, . . . , 8, each of which consists of all intervals in C0 of one type. Then B = σ (C j ), j = 1, . . . , 8. Also, if C j denotes the class we get from C j by considering intervals with rational endpoints, then σ (C j ) = B, j = 1, . . . , 8. Hint: One may choose to carry out the detailed proof for just one of these classes, e.g., the class C1 = {(x, y); x, y ∈ , x < y} or the class C1 = {(x, y); x, y rationals in with x < y}. (i) If C is the class of all finite sums of intervals in (unions of pair7. wise disjoint intervals) of the form: (α, β], α, β ∈ , α < β; (−∞, α], α ∈ ; (β, ∞), β ∈ , , then C is a field and σ (C) = B. (ii) The same is true if C is the class of all finite sums of all kinds of intervals in . 8. Consider the space (, F) and for an arbitrary but fixed set A with ⊂ A ⊂ , define F A by: F A = {B ⊆ ; B = A ∩ C, C ∈ F}. Then F A is a field (of subsets of A). Hint: Notice that the complement of a set in F A is with respect to the set A rather than . 9. Consider the space (, A) and let A be as in Exercise 8. Define A A by A A = {B ⊆ ; B = A ∩ C, C ∈ A}. Then A A is a σ -field (of subsets of A). Furthermore, A A = σ (F A ), where F A is as in Exercise 8 and A = σ (F). Hint: First, show that A A is a σ -field and σ (F A ) ⊆ A A . Next, show that σ (F A ) ⊇ A A by showing that, for any σ -field A∗ of subsets of A, with A∗ ⊇ F A , it holds that A∗ ⊇ A A . This is done by defining M by M = {C ∈ A; A ∩ C ∈ A∗ } and showing that M is a monotone class.
10. Show that, if {An }, n ≥ 1, is a nondecreasing sequence of σ -fields, then ∞ n=1 An is always a field, but it may fail to be a σ -field. 11. Carry out the details of the proof of Theorem 4. 12. By means of an example, show that a monotone class need not be a field. 13. Carry out the details of the proof of Lemma 2. 14. Let 1 , 2 be two spaces and let A, Ai ⊆ 1 , B, Bi ⊆ 2 , i = 1, 2. Then show that (i) (A1 × B1 ) − (A2 × B2 ) = [(A1 ∩ A2 ) × (B1 − B2 )] + [(A1 − A2 ) × B1 ]. (ii) A × B = , if and only if at least one of A, B is . (iii) If Ai × Bi , i = 1, 2 are = , then A1 × B1 ⊆ A2 × B2 , if and only if A1 ⊆ A2 , B1 ⊆ B2 . (iv) If A1 × B1 = A2 × B2 = , then A1 = A2 and B1 = B2 . (v) Let A× B, Ai × Bi , i = 1, 2 be = . Then A× B = (A1 × B1 )+(A2 × B2 ), if and only if A = A1 + A2 and B = B1 = B2 , or A = A1 = A2 and B = B1 + B2 . 15.
(i) With A ⊆ 1 , and B ⊆ 2 , show that A × B = if and only if at least one of A or B is equal to . (ii) With A1 , A2 ⊆ 1 and B1 , B2 ⊆ 2 , set E 1 = A1 × B1 and E 2 = A2 × B2 and assume that E 1 and E 2 are = . Then E 1 ⊆ E 2 if and only if A1 ⊆ A2 and B1 ⊆ B2 . Explain why the assumption that E 1 and E 2 are = is essential.
16.
(i) Let Ai ⊆ i , i = 1, 2, . . . , n, and set E = A1 × · · · × An . Then E = if and only if at least one of Ai , i = 1, 2, . . . , n, is = . (ii) If also Bi ⊆ i , i = 1, 2, . . . , n, and F = B1 × · · · × Bn , then show that E ∩ F = (A1 × · · · × An ) ∩ (B1 × · · · × Bn ) = (A1 ∩ B1 ) × · · · × (An ∩ Bn ).
17. For i = 1, 2, . . . , n, let Ai , Bi , Ci ⊆ i and set E = A1 × · · · , ×An , F = B1 × · · · × Bn , G = C1 × · · · × Cn . Suppose that E, F, and G are all = and that E = F + G. Then show that there exists a j with 1 ≤ j ≤ n such that A j = B j + C j while Ai = Bi = Ci for all i = j. 18. In reference to Theorem 7, show that C is still a field, if Ai is replaced by a field Fi , i = 1, 2. 19. Consider the measurable spaces (i , Ai ), i = 1, 2, and let C be the class of all countable sums of rectangles (unions of pairwise disjoint rectangles) in the product space 1 × 2 . Then by an example, show that C need not be a σ -field. Remark: Compare it to Theorem 7 in this chapter. Hint: Take 1 = 2 = [0, 1] and show that the main diagonal D of the rectangle [0, 1] × [0, 1] belongs in the σ -field generated by the field of all finite rectangles, but it is not in C. 20. Carry out the details of the proof of Theorem 10. 21. Carry out the details of the proof of Theorem 11.
22. Consider the mapping X defined on (Ω, A) onto Ω′ = X(Ω), the image of Ω under X, and let C ⊆ P(Ω′) be defined as follows: C = {B ⊆ Ω′; B = X(A), A ∈ A}. Then, by means of an example, show that C need not be a σ-field.
Remark: Compare this result with Theorem 11 in this chapter.
23. Consider the measurable space (Ω, A) and let X be defined by X = Σ_{i=1}^n α_i I_{A_i} or X = Σ_{i=1}^∞ α_i I_{A_i}, where the α_i ∈ ℝ are distinct for all i and {A_1, ..., A_n} or {A_i, i ≥ 1} are partitions of Ω. Then show that X is a r.v. (a simple r.v. and an elementary r.v., respectively) if and only if the partitions are measurable (i.e., A_i ∈ A for all i).
24. If X and Y are mappings on Ω into ℝ, show that
{ω ∈ Ω; X(ω) + Y(ω) < x} = ∪_{r∈Q} ({ω ∈ Ω; X(ω) < r} ∩ {ω ∈ Ω; Y(ω) < x − r}),
where Q is the set of rationals in . 25. If X is a r.v. defined on the measurable space (, A), then |X | is also a r.v. By an example, show that the converse need not be true. 26. By a direct argument (that is, by using the definition of measurability), show that, if X and Y are r.v.s, then so are the mappings X ± Y , X Y , and X /Y (Y = 0 a.s.; i.e., Y = 0 with probability 1, or Y = 0 almost surely). 27. Carry out the details of the proof of the Corollary to Theorem 14. 28. If X and Y are r.v.s defined on (, A), show that (X + Y )+ ≤ X + + Y + and (X + Y )− ≤ X − + Y − . 29. Let A1 , A2 , . . . be arbitrary events in (, A), and define Bm by: Bm = “Am is the first event which occurs among the events A1 , A2 , . . .,” m ≥ 1. Then (i) Express Bm in terms of An s, m ≥ 1. (ii) Show that B1 , B2 , . . . are pairwise disjoint. ∞ (iii) Show that ∞ m=1 Bm = ∪n=1 An . 30. For a sequence of events {An }, n ≥ 1, show that (i) limn→∞ An = {ω ∈ ; ω ∈ An for all but finitely many ns}, (ii) limn→∞ An = {ω ∈ ; ω ∈ An for infinitely many ns} (to be denoted by (An i.o.) and read (An s occur infinitely often )). 31. If An and Bn are events such that An ⊆ Bn , n ≥ 1, then show that (An i.o.) ⊆ (Bn i.o.). 32. In , let Q be the set of rational numbers, and for n = 1, 2, . . ., let An be defined by 1 1 ; r∈Q . An = r ∈ 1 − ,1 + n+1 n Examine whether or not the limn→∞ An exists.
33. In , define the sets An , n = 1, 2, . . . as follows: 1 1 , A2n = 0, . A2n−1 = −1, 2n − 1 2n Examine whether or not the limn→∞ An exists. 34. Take = , and let An be the σ -field generated by the class {[0, 1), [1, 2), . . . , [n − 1, n)}, n ≥ 1. Then show that (i) An ⊆ An+1 , n ≥ 1, and indeed An ⊂ An+1 , n ≥ 1. (ii) The class ∪∞ n=1 An is not a σ -field. (iii) Describe explicitly A1 and A2 . 35. Let A1 , . . . , An be arbitrary subsets of an abstract set , and let Ai be either Ai or Aic , i = 1, . . . , n. Define the class C of subsets of as follows: C = {all unions of the intersections A 1 ∩ · · · ∩ A n }. Then show that (i) The class C is a field (generated by the sets A1 , . . . , An ). (ii) Compute the number of elements of C. 36. If f : → , then show that (i) f −1 [ f (A)] ⊇ A, A ⊆ . (ii) f [ f −1 (B)] ⊆ B, B ⊆ . (iii) By concrete examples, show that the relations in (i) and (ii) may be strict. 37.
(i) On the measurable space (, A), define the function X as follows: ⎧ ⎨ −1 on A1 , 1 on Ac1 ∩ A2 , X (ω) = ⎩ 0 on Ac1 ∩ Ac2 , where A1 , A2 ∈ A. Examine whether or not X is a r.v. (ii) On the measurable space (, A) with = {a, b, c, d} and A = {, {a, b}, {c, d}, }, define the function X as follows: X (a) = X (b) = −1, X (c) = 1, X (d) = 2. Examine whether or not X is a r.v. (iii) If = {−2, −1, 0, 1, 2} and X is defined on by X (ω) = ω, determine the field induced by X and that induced by X 2 . Verify that the latter is contained in the former.
38. For a sequence of r.v.s {X n }, n ≥ 1, set Bk = σ (X k , X k+1 , . . .), k ≥ 1. Then show that for every k and l with k < l, it holds that Bk ⊇ Bl . 39. For the r.v.s X 1 , X 2 , . . . , X n , set Sk = kj=1 X j , k = 1, . . . , n, and show that σ (X 1 , X 2 , . . . , X n ) = σ (S1 , S2 , . . . , Sn ). de f
40. For any set B ⊆ , the set B + c = Bc is defined by: Bc = {y ∈ ; y = x + c, x ∈ B}. Then show that if B is measurable, so is Bc .
41. Let Ω be an abstract set, and let C be an arbitrary nonempty class of subsets of Ω. Define the class F_1 to consist of all members of C as well as all of their complements; i.e.,
F_1 = {A ⊆ Ω; A ∈ C or A = Cᶜ with C ∈ C} = {A ⊆ Ω; A ∈ C or Aᶜ ∈ C} = C ∪ {Cᶜ; C ∈ C},
so that F_1 is closed under complementation. Next, define the class F_2 as follows:
F_2 = {all finite intersections of members of F_1} = {A ⊆ Ω; A = A_1 ∩ ··· ∩ A_m, A_i ∈ F_1, i = 1, ..., m, m ≥ 1}.
Also, define the class F_3 by
F_3 = {all finite unions of members of F_2} = {A ⊆ Ω; A = ∪_{i=1}^n A_i with A_i ∈ F_2, i = 1, ..., n, n ≥ 1}
= {A ⊆ Ω; A = ∪_{i=1}^n A_i with A_i = A_{i1} ∩ ··· ∩ A_{i m_i}, A_{i1}, ..., A_{i m_i} ∈ F_1, m_i ≥ 1 integers, i = 1, ..., n, n ≥ 1}.
Set F_3 = F and show that
(i) F is a field.
(ii) F is the field generated by C; i.e., F = F(C).
(A small computational illustration of this three-step construction is sketched right after Exercise 42.)
42. Refer to Exercise 41, and set A_1 = F_1. Then define the classes A_2 and A_3 instead of F_2 and F_3, respectively, by replacing finite intersections and finite unions by countable intersections and countable unions, respectively. Set A_3 = A and examine whether or not A is a σ-field.
Hint: For A ∈ A, check whether you can declare that Aᶜ ∈ A.
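The construction of Exercise 41 (adjoin complements, then take finite intersections, then finite unions) is easy to run by brute force on a small space. The Python sketch below is an added illustration with an arbitrarily chosen class C on a five-point set; it checks that the resulting class is closed under complements and pairwise unions.

```python
from itertools import chain, combinations

Omega = frozenset(range(5))
C = [frozenset({0, 1}), frozenset({1, 2, 3})]            # an arbitrary starting class of subsets

def nonempty_subcollections(classes):
    classes = list(classes)
    return chain.from_iterable(combinations(classes, r) for r in range(1, len(classes) + 1))

F1 = {S for S in C} | {Omega - S for S in C}                                    # C together with complements
F2 = {frozenset.intersection(*combo) for combo in nonempty_subcollections(F1)}  # finite intersections
F3 = {frozenset.union(*combo) for combo in nonempty_subcollections(F2)}         # finite unions
F = F3 | {Omega, frozenset()}

# F should be a field: closed under complementation and under unions of any two members.
assert all(Omega - S in F for S in F)
assert all(S | T in F for S in F for T in F)
print(len(F), "sets in the generated field")
```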
CHAPTER 2  Definition and Construction of a Measure and its Basic Properties
In this chapter, the concept of a measure is defined, and some of its basic properties are established. We then proceed with the introduction of an outer measure, study its relationship to the underlying measure, and determine the class of sets measurable with respect to the outer measure. These results are used as a basis toward obtaining an extension of a given measure from a field to the σ -field generated by this field. Next, by means of a measure, a class of point real-valued functions is defined, and their basic properties are studied. Finally, it is shown that any nondecreasing right-continuous function induces a unique measure in the Borel real line.
2.1 About Measures in General, and Probability Measures in Particular

Consider the measurable space (Ω, A). Then

Definition 1. A (set) function μ: A → ℝ̄ is said to be a measure if
(i) μ(A) ≥ 0 for every A ∈ A (μ is nonnegative).
(ii) μ(Σ_{j=1}^∞ A_j) = Σ_{j=1}^∞ μ(A_j), A_j ∈ A, j = 1, 2, ... (μ is σ-additive).
(iii) μ(∅) = 0.
μ is said to be infinite if μ(Ω) = ∞; σ-finite if μ(Ω) = ∞ but there exists a partition {A_j, j = 1, 2, ...} of Ω such that μ(A_j) < ∞, j = 1, 2, ...; finite if μ(Ω) < ∞; and a probability measure, denoted by P, if μ(Ω) = 1. The triple (Ω, A, μ) is called a measure space, and in case μ = P, (Ω, A, P) is called a probability space.

Remark 1. It is possible that μ(A) = ∞ for every ∅ ≠ A ∈ A, but this is a rather uninteresting case. So from now on, we will always assume that there exists at least one ∅ ≠ A ∈ A such that μ(A) < ∞. In such a case, μ(∅) = 0 is a consequence of (i) and (ii). In fact, let ∅ ≠ A ∈ A be such that μ(A) < ∞. Then A = Σ_{j=1}^∞ A_j, where A_1 = A, A_j = ∅, j = 2, 3, ... So μ(A) = μ(Σ_{j=1}^∞ A_j) = μ(A) + Σ_{j=2}^∞ μ(∅) implies μ(∅) = 0.
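As a toy illustration of Definition 1, and not a construction from the book, the following Python sketch builds a finite measure on the power set of a four-point space from nonnegative point masses and checks nonnegativity, additivity over a partition, and μ(∅) = 0; the particular masses are arbitrary.

```python
from itertools import chain, combinations

Omega = frozenset({"a", "b", "c", "d"})
mass = {"a": 0.5, "b": 1.25, "c": 0.0, "d": 3.0}          # arbitrary nonnegative point masses

def mu(A):
    # A measure on the discrete sigma-field P(Omega): mu(A) = sum of the masses of the points of A.
    return sum(mass[w] for w in A)

def subsets(S):
    S = list(S)
    return [frozenset(c) for c in chain.from_iterable(combinations(S, r) for r in range(len(S) + 1))]

assert mu(frozenset()) == 0                               # (iii) mu(empty set) = 0
assert all(mu(A) >= 0 for A in subsets(Omega))            # (i)   nonnegativity
partition = [frozenset({"a"}), frozenset({"b", "c"}), frozenset({"d"})]
assert abs(mu(Omega) - sum(mu(A) for A in partition)) < 1e-12   # (ii) additivity over a partition
print("mu(Omega) =", mu(Omega))    # 4.75; mu is finite, and mu/4.75 would be a probability measure
```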
Remark 2. Occasionally, we may be talking about a measure μ defined on a field F of subsets of Ω rather than a σ-field A. This means that
(i) μ(A) ≥ 0 for every A ∈ F.
(ii) μ(Σ_{j=1}^∞ A_j) = Σ_{j=1}^∞ μ(A_j) for those A_j ∈ F for which Σ_{j=1}^∞ A_j ∈ F.
(iii) μ(∅) = 0.
Then Theorem 1(i), which follows, shows that μ(Σ_{j=1}^n A_j) = Σ_{j=1}^n μ(A_j); i.e., μ is finitely additive on F.

Theorem 1. Consider the measure space (Ω, A, μ). Then
(i) μ is finitely additive; i.e., μ(Σ_{j=1}^n A_j) = Σ_{j=1}^n μ(A_j), A_j ∈ A, j = 1, ..., n.
(ii) μ is nondecreasing; i.e., μ(A_1) ≤ μ(A_2), A_1, A_2 ∈ A, A_1 ⊆ A_2.
(iii) μ is sub-σ-additive; i.e., μ(∪_{j=1}^∞ A_j) ≤ Σ_{j=1}^∞ μ(A_j), A_j ∈ A, j = 1, 2, ...

Proof.
(i) We have Σ_{j=1}^n A_j = Σ_{j=1}^∞ B_j, where B_j = A_j, j = 1, ..., n, and B_j = ∅, j = n + 1, ... Then μ(Σ_{j=1}^n A_j) = μ(Σ_{j=1}^∞ B_j) = Σ_{j=1}^∞ μ(B_j) = Σ_{j=1}^n μ(B_j) = Σ_{j=1}^n μ(A_j).
(ii) A_1 ⊆ A_2 implies A_2 = A_1 + (A_2 − A_1), so that μ(A_2) = μ[A_1 + (A_2 − A_1)] = μ(A_1) + μ(A_2 − A_1) ≥ μ(A_1). From this, it also follows that: A_1 ⊆ A_2 implies μ(A_2 − A_1) = μ(A_2) − μ(A_1), provided μ(A_1) is finite.
(iii) ∪_{j=1}^∞ A_j = A_1 + (A_1ᶜ ∩ A_2) + ··· + (A_1ᶜ ∩ ··· ∩ A_nᶜ ∩ A_{n+1}) + ···, so that
μ(∪_{j=1}^∞ A_j) = μ(A_1) + μ(A_1ᶜ ∩ A_2) + ··· + μ(A_1ᶜ ∩ ··· ∩ A_nᶜ ∩ A_{n+1}) + ···
≤ μ(A_1) + μ(A_2) + ··· + μ(A_{n+1}) + ··· = Σ_{j=1}^∞ μ(A_j).
Definition 2. Consider the measurable space (Ω, A) and let μ be a measure on A. We say that μ is continuous from below, if for every A_j ∈ A, j = 1, 2, ... with A_j ↑, we have μ(A_j) ↑ μ(lim_{j→∞} A_j) (= μ(∪_{j=1}^∞ A_j)). We say that μ is continuous from above, if for every A_j ∈ A, j = 1, 2, ... with A_j ↓ and for which there exists an A_n such that μ(A_n) < ∞, we have μ(A_j) ↓ μ(lim_{j→∞} A_j) (= μ(∩_{j=1}^∞ A_j)). μ is said to be continuous, if it is both continuous from below and continuous from above. We say that μ is continuous at ∅, if for every A_j ∈ A, j = 1, 2, ... with A_j ↓ ∅ and for which there exists an A_n such that μ(A_n) < ∞, we have μ(A_j) ↓ 0.

Remark 3. In defining continuity from above and continuity at ∅ for μ, one has got to assume that there exists an A_n such that μ(A_n) < ∞ (then, of course, μ(A_j) < ∞ for all j ≥ n). In fact, consider the sets A_j = [j, ∞), j = 1, 2, ... in the real line, with μ the Lebesgue measure (to be defined precisely later on, and which assigns as measure to each interval its length). Then μ([j, ∞)) = ∞, j = 1, 2, ..., but ∩_{j=1}^∞ [j, ∞) = ∅. Thus μ([j, ∞)) → ∞ as j → ∞, whereas μ(∅) = 0.
The following theorem relates the concepts of additivity and continuity.

Theorem 2.
(i) A measure μ is finitely additive and continuous.
(ii) If the set function μ is nonnegative, μ(∅) = 0, and finitely additive only, and either continuous from below, or finite and continuous at ∅, then μ is σ-additive (hence a measure).

Proof.
(i) The finite additivity of μ was proved in Theorem 1(i). Now we will prove continuity. Let first A_j ∈ A, j = 1, 2, ..., and A_j be ↑. If μ(A_n) = ∞ for some n, then μ(A_j) = ∞ for all j ≥ n, so that μ(∪_{j=1}^∞ A_j) = ∞. Thus μ(A_j) → μ(∪_{j=1}^∞ A_j) as j → ∞. So we may assume that μ(A_j) < ∞ for all j. Then
lim_{j→∞} A_j = ∪_{j=1}^∞ A_j = A_1 + (A_1ᶜ ∩ A_2) + ··· + (A_1ᶜ ∩ ··· ∩ A_{n−1}ᶜ ∩ A_n) + ···
= A_1 + (A_2 − A_1) + ··· + (A_n − A_{n−1}) + ···
Thus,
μ(lim_{j→∞} A_j) = μ[A_1 + (A_2 − A_1) + ··· + (A_n − A_{n−1}) + ···]
= μ(A_1) + μ(A_2 − A_1) + ··· + μ(A_n − A_{n−1}) + ···
= lim_{n→∞} [μ(A_1) + μ(A_2 − A_1) + ··· + μ(A_n − A_{n−1})]
= lim_{n→∞} [μ(A_1) + μ(A_2) − μ(A_1) + ··· + μ(A_n) − μ(A_{n−1})]
= lim_{n→∞} μ(A_n).
This establishes continuity from below. Let A_j now be ↓ as j → ∞ and let μ(A_{n₀}) < ∞. Then A_{n₀} − A_j is ↑ for j ≥ n₀, and ∪_{j=n₀}^∞ (A_{n₀} − A_j) = A_{n₀} − ∩_{j=n₀}^∞ A_j; i.e., as j → ∞, A_{n₀} − A_j ↑ A_{n₀} − ∩_{j=n₀}^∞ A_j, and thus μ(A_{n₀} − A_j) ↑ μ(A_{n₀} − ∩_{j=n₀}^∞ A_j) by the previous result. Since μ(A_{n₀} − A_j) = μ(A_{n₀}) − μ(A_j), and μ(A_{n₀} − ∩_{j=n₀}^∞ A_j) = μ(A_{n₀}) − μ(∩_{j=n₀}^∞ A_j), we have μ(A_j) ↓ μ(∩_{j=n₀}^∞ A_j) = μ(∩_{j=1}^∞ A_j).
(ii) Assume first that μ is continuous from below, and let A_j ∈ A, j = 1, 2, ..., be pairwise disjoint. Then, clearly, Σ_{j=1}^n A_j ↑ Σ_{j=1}^∞ A_j. Hence μ(Σ_{j=1}^n A_j) ↑ μ(Σ_{j=1}^∞ A_j) by continuity from below. But μ(Σ_{j=1}^n A_j) = Σ_{j=1}^n μ(A_j), by finite additivity, and lim_n Σ_{j=1}^n μ(A_j) = Σ_{j=1}^∞ μ(A_j). Thus Σ_{j=1}^∞ μ(A_j) = μ(Σ_{j=1}^∞ A_j); i.e., μ is σ-additive. Now assume that μ is finite and continuous at ∅, and let A_j ∈ A, j = 1, 2, ..., be pairwise disjoint. Then Σ_{j=1}^∞ A_j = Σ_{j=1}^n A_j + Σ_{j=n+1}^∞ A_j, which implies that
μ(Σ_{j=1}^∞ A_j) = μ(Σ_{j=1}^n A_j) + μ(Σ_{j=n+1}^∞ A_j) = Σ_{j=1}^n μ(A_j) + μ(Σ_{j=n+1}^∞ A_j),   (2.1)
by finite additivity. Next, Σ_{j=n+1}^∞ A_j ↓ ∅, evidently, because of the disjointness of the A_j's. Then μ(Σ_{j=n+1}^∞ A_j) ↓ μ(∅) = 0 by the finiteness of μ and its continuity at ∅. Thus, by taking the limits in (2.1), as n → ∞, we get
μ(Σ_{j=1}^∞ A_j) = lim_{n→∞} Σ_{j=1}^n μ(A_j) = Σ_{j=1}^∞ μ(A_j).

Remark 4. In the case of μ being continuous from below, if μ(A_j) = ∞ for at least one j, j₀ say, then μ(Σ_{j=1}^∞ A_j) = ∞, so that μ(Σ_{j=1}^∞ A_j) = Σ_{j=1}^∞ μ(A_j) (= ∞). So we may assume, if we wish, that μ(A_j) < ∞ for all j ≥ 1.
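A brief numerical illustration of the continuity properties in Theorem 2(i) may be helpful; the following Python sketch is an addition to these notes and simply uses interval length in the role of the measure. For A_n = (0, 1 − 1/n] ↑ (0, 1) the lengths increase to 1, while for A_n = (0, 1/n] ↓ ∅ they decrease to 0; by contrast, the sets [n, ∞) of Remark 3 have infinite length for every n, which is why the finiteness assumption is needed for continuity from above.

```python
# Interval length plays the role of the (Lebesgue) measure; intervals are (a, b] with length b - a.
def length(a, b):
    return max(b - a, 0.0)

# Continuity from below: A_n = (0, 1 - 1/n] increases to (0, 1), and the lengths increase to 1.
below = [length(0, 1 - 1 / n) for n in range(1, 10_000, 500)]
assert all(x <= y for x, y in zip(below, below[1:])) and abs(below[-1] - 1) < 1e-3

# Continuity from above: A_n = (0, 1/n] decreases to the empty set, and the lengths decrease to 0.
above = [length(0, 1 / n) for n in range(1, 10_000, 500)]
assert all(x >= y for x, y in zip(above, above[1:])) and above[-1] < 1e-3

print(below[-1], above[-1])
```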
2.2 Outer Measures

Again, let P(Ω) be the class of all subsets of Ω and let C, C′ be two subclasses of P(Ω). Let ϕ, ϕ′ also be two set functions defined on C, C′, respectively, and taking values in ℝ̄. Then

Definition 3. We say that ϕ′ is an extension of ϕ, and ϕ is a restriction of ϕ′, if C ⊂ C′ and ϕ′ = ϕ on C.
Definition 4. A set function μ°: P(Ω) → ℝ̄ is said to be an outer measure, if
(i) μ°(∅) = 0.
(ii) μ° is nondecreasing; i.e., A ⊂ B implies μ°(A) ≤ μ°(B).
(iii) μ° is sub-σ-additive; i.e., μ°(∪_{n=1}^∞ A_n) ≤ Σ_{n=1}^∞ μ°(A_n).

Remark 5.
(i) μ°(A) ≥ 0 for all A, since ∅ ⊆ A implies 0 = μ°(∅) ≤ μ°(A) by (i) and (ii).
(ii) It follows that μ° is finitely subadditive, since μ°(∪_{j=1}^n A_j) = μ°(∪_{j=1}^∞ B_j), where B_j = A_j, j = 1, ..., n, and B_j = ∅, j ≥ n + 1. Then
μ°(∪_{j=1}^n A_j) = μ°(∪_{j=1}^∞ B_j) ≤ Σ_{j=1}^∞ μ°(B_j) = Σ_{j=1}^n μ°(B_j) = Σ_{j=1}^n μ°(A_j).
(iii) A measure is an outer measure restricted to A ⊆ P(Ω).

Now let F be a field of subsets of Ω, let μ be a measure on F, and let μ*: P(Ω) → ℝ̄ be defined as follows:

Definition 5. For A ∈ P(Ω), μ*(A) = inf{Σ_{j=1}^∞ μ(A_j)}, where the inf is taken over all A_j ∈ F, j = 1, 2, ..., such that ∪_{j=1}^∞ A_j ⊇ A; i.e., over all countable coverings of A by unions of members of F. (Clearly, for every A ∈ P(Ω) there exists such a covering, since Ω ∈ F.)

Then we have the following theorem.
μ∗ is an extension of μ (from A to P()). μ∗ is an outer measure. If μ is σ -finite on F, then μ∗ is σ -finite on P(). If μ is finite on F, then μ∗ is finite on P().
Proof. (i) Let A ∈ F. Then A ⊆ A so that μ∗ (A) ≤ μ(A) by the definition of μ∗ . Thus, it suffices to show
that μ∗ (A) ≥ μ(A). Let A j ∈ F, j = 1, 2, . . ., be a A . At this point we notice that ∞ covering of A; i.e., A ⊆ ∞ j=1 j=1 A j need
∞j not belong in F and hence μ( j=1 A j ) need not be defined at all. So we work
∞
∩ A ), while A ∩ A j ∈ F, since as follows: A = A ∩ ( ∞ j=1 A j ) = j=1 (A
∞ j A, A j ∈ F, j = 1, 2, . . . Then μ(A) = μ[ j=1 (A ∩ A j )] ≤ ∞ j=1 μ(A ∩ A j ) ∞ (see Remark 2(ii) and Theorem 1(iii)), and this is ≤ j=1 μ(A j ); i.e., μ(A) ≤ ∞ ∗ ∗ j=1 μ(A j ) so that μ(A) ≤ μ (A). Thus μ = μ on F.
(ii) First, that μ∗ () = 0 follows from part (i). Next, let A ⊂ B. Since every covering of B is a covering of A, we get μ∗ (A) ≤ μ∗ (B). Thus it remains to prove sub-σ -additivity. Let A j ∈ P(), j = 1, 2, . . ., and let ε > 0. For each j, it follows from the definition of μ∗ (A j ) that there exists a covering ∞
ε > μ(A jk ). (2.2) 2j k=1
Now, from A j ⊆ ∞ A jk , j = 1, 2 . . ., it follows that ∞ Aj ⊆ ∞ k=1 j=1 j=1
∞
∞ k=1 A jk ; i.e., {A jk , j, k = 1, 2, . . .} is a covering of j=1 A j . Hence ⎛ ⎞ ∞ ∞ ∞
μ∗ ⎝ Aj⎠ ≤ μ(A jk ). (2.3) A jk ∈ F, k = 1, 2, . . . , such that μ∗ (A j ) +
j=1
From (2.2), we have ∞ ∞ j=1 k=1
μ(A jk ) ≤
∞ j=1
j=1 k=1
⎛
⎞ ∞ 1 μ∗ (A j ) + ε ⎝since = 1⎠ . 2j
(2.4)
j=1
From (2.3) and (2.4), we get ⎛ ⎞ ∞ ∞
Aj⎠ ≤ μ∗ (A j ) + ε. μ∗ ⎝ j=1
j=1
Letting ε → 0, we get the desired result. (iii) Since μ is σ -finite on F, there exists a partition {A j , j = 1, 2, . . .} of with A j ∈ F, j = 1, 2, . . ., such that μ(A j ) < ∞, j = 1, 2, . . . But A j ∈ F implies μ∗ (A j ) = μ(A j ), j = 1, 2, . . ., by (i). Thus μ∗ is also σ -finite. (iv) Finally, μ∗ () = μ() < ∞, by (i), since ∈ F. Theorem 3 exhibits the existence (and provides the construction) of an outer measure, namely μ∗ . Then we may denote μ∗ by μo . This outer measure μo is said to be induced on P() by μ defined on F. Definition 6. Let μo be an outer measure. Then a set A ⊆ is said to be μo measurable, if for every D ⊆ , we have: μo (D) = μo (A ∩ D) + μo (Ac ∩ D) (i.e., μo is additive for A ∩ D and Ac ∩ D).
[Figure: a set D ⊆ Ω is split by A into the two pieces A ∩ D and Aᶜ ∩ D.]
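The following Python sketch is an added finite toy illustration of Definitions 5 and 6, not an example from the book: on a four-point space with a small field F and a measure on it (both chosen arbitrarily), it computes μ*(A) as the cheapest covering of A by members of F, and then tests the measurability condition μ°(D) = μ°(A ∩ D) + μ°(Aᶜ ∩ D) for every D. With a finite field, the infimum of Definition 5 is attained by a finite subcollection, so brute force suffices.

```python
from itertools import chain, combinations

Omega = frozenset({1, 2, 3, 4})
# A small field F of subsets of Omega and a measure mu on it (values chosen for illustration only).
F = {frozenset(): 0.0, frozenset({1, 2}): 1.0, frozenset({3, 4}): 2.0, Omega: 3.0}

def subsets(S):
    S = list(S)
    return [frozenset(c) for c in chain.from_iterable(combinations(S, r) for r in range(len(S) + 1))]

def mu_star(A):
    # Definition 5 (finite version): the cheapest covering of A by members of F.
    covers = [c for c in subsets(F.keys()) if A <= frozenset().union(*c, frozenset())]
    return min(sum(F[B] for B in c) for c in covers)

def caratheodory_measurable(A):
    # Definition 6: A is measurable iff mu*(D) = mu*(A ∩ D) + mu*(Aᶜ ∩ D) for every D.
    return all(abs(mu_star(D) - mu_star(A & D) - mu_star(D - A)) < 1e-12 for D in subsets(Omega))

print(mu_star(frozenset({1})), caratheodory_measurable(frozenset({1})))        # 1.0 False
print(mu_star(frozenset({1, 2})), caratheodory_measurable(frozenset({1, 2})))  # 1.0 True
```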
Remark 6. (i) Since D = (A ∩ D) ∪ (Ac ∩ D) implies μo (D) ≤ μo (A ∩ D) + μo (Ac ∩ D), in order to check μo -measurability for A, it suffices to check that μo (D) ≥ μo (A ∩ D) + μo (Ac ∩ D)
for every D ⊆ .
(ii) There are μo -measurable sets. In fact and are such sets, because D = ( ∩ D) ∪ (c ∩ D) , and μo ( ∩ D) = μo (D), μo (c ∩ D) = μo () = 0, so that μo (D) = μo ( ∩ D) + μo (c ∩ D), and similarly for the set. Theorem 4.
Let μo be an outer measure. Then
(i) The class Ao of μo -measurable sets is a σ -field. (ii) μo , restricted on Ao , is a measure.
Proof. (i) We first prove that Ao is a field. Let A ∈ Ao . Then μo (D) = μo (A ∩ D) + μo (Ac ∩ D) = μo (Ac ∩ D) + μo [(Ac )c ∩ D] for every D ⊆ , and this shows that Ac ∈ Ao . Next, let A, B ∈ Ao . To show that (A ∩ B) ∈ Ao. Since B ∈ Ao , we get by writing μo (D) = μo (B ∩ D) + μo (B c ∩ D) and taking D to be A ∩ D and Ac ∩ D, successively, o μ (A ∩ D) = μo [B ∩ (A ∩ D)] + μo [B c ∩ (A ∩ D)], μo (Ac ∩ D) = μo [B ∩ (Ac ∩ D)] + μo [B c ∩ (Ac ∩ D)], so that μo (A ∩ D) + μo (Ac ∩ D) = μo [B ∩ (A ∩ D)] + μo [B c ∩ (A ∩ D)] + μo [B ∩ (Ac ∩ D)] + μo [B c ∩ (Ac ∩ D)].
(2.5)
But A ∈ Ao implies μo (D) = μo (A∩ D)+μo (Ac ∩ D). Taking this into consideration and the fact that μo is finitely subadditive, (2.5) becomes μo (D) ≥ μo (A ∩ B ∩ D) + μo [(A ∩ B c ∩ D) + (Ac ∩ B ∩ D) +(Ac ∩ B c ∩ D)]. Now (A ∩ B c ∩ D) + (Ac ∩ B ∩ D) + (Ac ∩ B c ∩ D) = D ∩ [(A ∩ B c ) + (Ac ∩ B) + (Ac ∩ B c )], = D ∩ [(A B) + (A ∪ B)c ] = D ∩ (A ∩ B)c . Therefore (2.6) becomes μo (D) ≥ μo [(A ∩ B) ∩ D] + μo [(A ∩ B)c ∩ D] so that (A ∩ B) ∈ Ao . So Ao is a field.
(2.6)
o o Finally, 2, . . ., and set
∞A j ∈ A , j = 1,
∞ we prove that A is a σo -field. Let A = j=1 A j . To show that A ∈ A . Since j=1 A j = A1 + (Ac1 ∩ A2 ) + · · · + (Ac1 ∩ · · · ∩ Acn−1 ∩ An ) + · · · and (Ac1 ∩ · · · ∩ Acn−1 ∩ An ) ∈ Ao , n = 2, 3, . . ., by the factthat Ao is a field, it suffices to assume that the A j s are pairwise disjoint. Set Bn = nj=1 A j , B0 = . Then Bn ∈ Ao , n = 1, 2, . . ., and therefore (2.7) μo (D) = μo (Bn ∩ D) + μo Bnc ∩ D for every D ⊆ .
Next, An ∈ Ao . Thus, by writing μo (D) = μo (An ∩ D) + μo (Acn ∩ D) and taking D to be Bn ∩ D, we have μo (Bn ∩ D) = μo [An ∩ (Bn ∩ D)] + μo Acn ∩ (Bn ∩ D) , = μo (An ∩ D) + μo (Bn−1 ∩ D), since An ⊆ Bn and Acn ∩ Bn = Acn ∩ (A1 + · · · + An ) = A1 + · · · + An−1 = Bn−1 . the same way with That is, μo (Bn ∩ D) = μo (An ∩ D)+μo (Bn−1 ∩ D). Working in o μ (Bn−1 ∩ D), etc. (or by using induction), we get μo (Bn ∩ D) = nj=1 μo (A j ∩ D). Then (2.7) becomes as follows: n
μo (D) =
μo (A j ∩ D) + μo Bnc ∩ D .
j=1
But μo (Bnc ∩ D) ≥ μo (Ac ∩ D), since A ⊇ Bn or equivalently Ac ⊆ Bnc , and μo is nondecreasing. Thus μ (D) ≥ o
n
μo (A j ∩ D) + μo (Ac ∩ D).
j=1
Letting n → ∞, we get μo (D) ≥
∞
μo (A j ∩ D) + μo (Ac ∩ D),
j=1
⎤ ∞ ≥ μo ⎣ (A j ∩ D)⎦ + μo (Ac ∩ D), ⎡
j=1
= μ (A ∩ D) + μo (Ac ∩ D). o
Then A ∈
Ao ,
(2.8)
and this completes the proof of part (i).
(ii) Consider the first line on the right-hand side in inequality (2.8): μo (D) ≥ ∞ o o c j=1 μ (A j ∩ D) + μ (A ∩ D), D ⊆ , and set A instead of D. Then μo (A) ≥
∞
μo (A j ∩ A) + μo () =
j=1
since A j ∩ A = A j for all j.
∞ j=1
μo (A j ),
Since the opposite inequality is always true, by sub-σ -additivity of μo , the proof is completed. (Observe that A0 may be contained strictly in P(); see Exercise 28(ii) in this chapter.) In the following section, an outer measure will be instrumental in extending a given measure from a field to the σ -field generated by it.
2.3 The Carathéodory Extension Theorem This section is devoted to the discussion of the Carathéodory extension theorem, which provides the basis for the construction and extension of measures. Theorem 5 (Carathéodory Extension Theorem). Let μ be a measure on a field F. Then (i) μ can be extended to the σ -field A generated by F. (ii) If μ is finite on F, then the extension is unique and finite. (iii) If μ is σ -finite on F, then the extension is unique and σ -finite.
Definition 7. The unique finite (σ -finite) extension of a finite (σ -finite) measure μ on F to A = σ (F) is called the Carathéodory extension. Proof of Theorem 5. (i) Let μ∗ be the set function defined just prior to Theorem 3. Then we saw that μ∗ is an outer measure on P(), is an extension of μ on F, and is σ -finite or finite, if μ is so, respectively. Also, μ∗ is a measure on the σ -field A∗ of μ∗ -measurable sets, by Theorem 4. Then, all we have to prove is that A ⊆ A∗ , or just that F ⊆ A∗ . Let A ∈ F and D ⊆ . To show μ∗ (D) ≥ μ∗ (A ∩ D) + μ∗ (Ac ∩ D). From the definition of μ∗ , for ε > 0, there exists a covering {A j , j = 1, 2, . . .} of D in F such that ∞ ∗ μ (D) + ε > μ(A j ). (2.9) j=1
Now μ(A j ) = μ[(A ∩ A j ) + (Ac ∩ A j )] = μ(A ∩ A j ) + μ(Ac ∩ A j ) = μ∗ (A ∩ A j ) + μ∗ (Ac ∩ A j ), since μ and μ∗ agree on F. Thus, (2.9) becomes μ∗ (D) + ε >
∞
μ∗ (A ∩ A j ) +
j=1
∞
μ∗ (Ac ∩ A j ).
j=1
(2.10)
∞ Next, ∞ D so that ( ∞ j=1 A j ⊇ j=1 A j ) ∩ A = j=1 (A ∩ A j ) ⊇ A ∩ D and
∞ c ∩ A ) ⊇ Ac ∩ D. Thus, (A ( j=1 A j ) ∩ Ac = ∞ j j=1 ∞ j=1
μ∗ (A ∩ A j ) ≥ μ∗
∞
(A ∩ A j ) ≥ μ∗ (A ∩ D),
j=1
∞ j=1
μ∗ (Ac ∩ A j ) ≥ μ∗
∞
(Ac ∩ A j ) ≥ μ∗ (Ac ∩ D).
j=1
Adding up these relationships and using also (2.10), we get μ∗ (D) + ε ≥ μ∗ (A ∩ D) + μ∗ (Ac ∩ D). Since this is true for every ε > 0, we get then μ∗ (D) ≥ μ∗ (A ∩ D) + μ∗ (Ac ∩ D). This proves that A is μ∗ -measurable and hence A ∈ A∗ . (ii) That the extension is finite has been seen in Theorem 3. So, all we have to do is to prove uniqueness. Let μ1 be the above seen extension and let μ2 be any other extension. Define M ⊆ A as follows: M = {A ∈ A; μ1 (A) = μ2 (A)}. We shall show that M is a monotone class and equals A. First F ⊆ M, since μ1 = μ2 = μ on F. Let now {An } be a monotone sequence of sets in M. Then μ1 (limn→∞ An ) = limn→∞ μ1 (An ) = limn→∞ μ2 (An ) = μ2 (limn→∞ An ) (by the finiteness of μ1 and μ2 ); i.e., μ1 (limn→∞ An ) = μ2 (limn→∞ An ) and hence limn→∞ An ∈ M. Thus, M is a monotone class. Then M contains the minimal monotone class over F that coincides with A (by Theorem 6 in Chapter 1). Thus, μ1 , μ2 coincide on A. (iii) Again, the σ -finiteness of the extension follows from Theorem 3, and we only have to establish uniqueness. The σ -finiteness of μ implies the existence of a partition {A j , j = 1, 2, . . .} of in F such that μ(A j ) < ∞, j = 1, 2, . . .. For each A j , consider the classes F A j = {A j ∩ B; B ∈ F}, A A j = {A j ∩ B; B ∈ A}. Then F A j is a field and A A j is a σ -field. Furthermore, A A j is the σ -field generated by F A j (see Exercises 8 and 9 in Chapter 1). Let μ1 , μ2 be as in (ii). Then μ1 = μ2 , and finite on A A j by (ii). Next, let A ∈ A. Then A = ∞ j=1 (A ∩ A j ), while A ∩ A j ∈ A A j , j = 1, 2, . . ., so that μ1 (A ∩ A j ) = μ2 (A ∩ A j ), j = 1, 2, . . . Thus μ1 (A) = ∞ ∞ j=1 μ1 (A ∩ A j ) = j=1 μ2 (A ∩ A j ) = μ2 (A). Therefore μ1 = μ2 on A. Special cases. (1) Let = and let C be the class of all finite sums of intervals in . Then C is a field (by Exercise 7(ii) in Chapter1), and B = σ (C). Let μ(I ) = length of I , where I is an interval, and let μ(A) = nj=1 μ(I j ), if A ∈ C and hence A = nj=1 I j , I j , j = 1, 2, . . . , n, intervals. The set function μ is σ -finite, since, for ∞ example, = ∞ n=0 (−n −1, −n]+(0, 1)+ n=1 [n, n +1) and μ((−n −1, −n]) = μ([n, n + 1)) = 1 (finite). Then, provided that μ is well defined and a measure on C—which we will show later on (Theorem 7)—the unique extension of μ on B is called the Lebesgue measure. Let us denote it by λ. (2) For n ≥ 2, let C be the class of all finite sums of rectangles in n . Then C is a field and B n = σ (C) (by Theorem 7 in Chapter 1 and its extension). If
B = A1 × · · · × An , A j Borel setsin , j = 1, . . . , n, define μ(B) as follows: μ(B) = nj=1 λ(A j ), and if E = mj=1 B j , B j , j = 1, . . . , m, rectangles in n , define μ(E) by μ(E) = mj=1 μ(B j ). It is easily seen that μ is σ -finite on C. Actually, there exists a denumerable partition of n by rectangles in n that are cartesian products of intervals in (recall that a set is denumerable if it has the same cardinality as the set of integers). Then, provided μ is well defined and a measure on C (that this is so is seen as a straightforward generalization of Theorem 7), the unique extension of μ on B n is called the n-dimension Lebesgue measure. Let us denote it by λn . Remark 7. (i) Special case 1 indicates why the natural original class C on which a measure μ is defined is a field. There is also another reason, and that is that, if the class C is not a field, then the extension of μ to the minimal σ -field over C need not be unique although μ may be finite (σ -finite) on C. (ii) If μ on C is not finite (σ -finite), then the Carathéodory extension on A = σ (C) need not be unique. (iii) An extension of μ to A = σ (C) may be σ -finite while μ is not so on C. In connection with these remarks, see also Exercises 12 and 13. Example 2 demonstrates the futility of starting out with an arbitrary class of sets in defining a measure on a field or a σ -field. Example 1 is a prelude to Example 2. Example 1. Let Fi , i = 1, 2, be fields of subsets of a set and define the classes Ci , i = 1, 2, 3, by: C1 = F1 ∪ F2 = {A ⊆ ;
A ∈ F1 or A ∈ F2 },
so that C1 is closed under complementation; C2 = {all finite intersections of members of C1 = {A ⊆ ; A = A1 ∩ · · · ∩ Am , Ai ∈ C1 , i = 1, . . . , m, m ≥ 1}; C3 = {all finite unions of members of C2 } n
= {A ⊆ ; A = Ai with Ai ∈ C2 , i = 1, . . . , n, n ≥ 1} = {A ⊆ ;
A=
i=1 n
Ai with Ai = Ai1 ∩ · · · ∩ Aim i , Ai1 , . . . , Aim i ∈ C1 ,
i=1
m i ≥ 1 integer, i = 1, . . . , n, n ≥ 1}. Set C3 = F. Then F is a field. This is so by Exercise 41(i) in Chapter 1, where the role of C and F1 is played by C1 , the role of F2 is played by C2 , and the role of F3 (= F) is played by C3 (= F).
Example 2. In reference to Example 1, take F_1 = {∅, A, Aᶜ, Ω} and F_2 = {∅, B, Bᶜ, Ω} (A ≠ B, A ∩ B ≠ ∅), so that C = C_1 = F_1 ∪ F_2 = {∅, A, Aᶜ, B, Bᶜ, Ω}. Then, as is easily seen,
C_2 = {∅, A, Aᶜ, B, Bᶜ, A ∩ B, A ∩ Bᶜ, Aᶜ ∩ B, Aᶜ ∩ Bᶜ, Ω};
also,
C_3 = {∅, A, Aᶜ, B, Bᶜ, A ∩ B, A ∩ Bᶜ, Aᶜ ∩ B, Aᶜ ∩ Bᶜ, A ∪ B, A ∪ Bᶜ, Aᶜ ∪ B, Aᶜ ∪ Bᶜ, (A ∩ B) ∪ (Aᶜ ∩ Bᶜ), (A ∩ Bᶜ) ∪ (Aᶜ ∩ B), Ω};
as can be verified. Set C_3 = F. Then F is a field on account of Example 1. Alternatively, the assertion is checked directly, since F is closed under complementation and under the union of any two of its members (see Exercise 35). Clearly, F is the field generated by C (= C_1 = F_1 ∪ F_2). This is so by Exercise 41(ii) cited above or by direct considerations.

On the class C, define μ_i, i = 1, 2, as follows: μ_i(∅) = 0, μ_i(A) = 0.40, μ_i(B) = 0.35, μ_i(Aᶜ) = 0.60, μ_i(Bᶜ) = 0.65, μ_i(Ω) = 1, so that μ_i, i = 1, 2, are (probability) measures on C. Next, extend these measures from C to the field F generated by C in the following manner:

μ_1(A ∪ B) = 0.50,    μ_2(A ∪ B) = 0.60,
μ_1(A ∪ Bᶜ) = 0.90,   μ_2(A ∪ Bᶜ) = 0.80,
μ_1(Aᶜ ∪ B) = 0.85,   μ_2(Aᶜ ∪ B) = 0.75,
μ_1(Aᶜ ∪ Bᶜ) = 0.75,  μ_2(Aᶜ ∪ Bᶜ) = 0.85,
μ_1(A ∩ B) = 0.25,    μ_2(A ∩ B) = 0.15,
μ_1(A ∩ Bᶜ) = 0.15,   μ_2(A ∩ Bᶜ) = 0.25,
μ_1(Aᶜ ∩ B) = 0.10,   μ_2(Aᶜ ∩ B) = 0.20,
μ_1(Aᶜ ∩ Bᶜ) = 0.50,  μ_2(Aᶜ ∩ Bᶜ) = 0.40.
The assigned values are legitimate, because they satisfy the ordering property (C ⊂ D implies μi (C) ≤ μi (D), i = 1, 2) and the additivity property. Furthermore, whereas μ1 and μ2 coincide on C, they do not coincide on F. That is, a measure defined on a class C (which is not a field) may assume more than one extension to a measure on the field generated by C. The same holds if F is replaced by a σ -field A. The interested reader may find a rather extensive treatment of measure theory in the reference Vestrup (2003).
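For readers who want to check the arithmetic of Example 2, the Python sketch below is an added aid that merely re-tabulates the values given above: it models Ω by the four atoms A ∩ B, A ∩ Bᶜ, Aᶜ ∩ B, Aᶜ ∩ Bᶜ (assuming, as the assigned positive values implicitly do, that all four atoms are nonempty), induces μ_1 and μ_2 from their atom values, and confirms that the two measures agree on C but disagree on the generated field F.

```python
# Atoms: 'ab' = A∩B, 'a' = A∩Bc, 'b' = Ac∩B, 'o' = Ac∩Bc.
Omega = frozenset({"ab", "a", "b", "o"})
A, B = frozenset({"ab", "a"}), frozenset({"ab", "b"})

atom_mass = {
    1: {"ab": 0.25, "a": 0.15, "b": 0.10, "o": 0.50},   # the atom values of mu_1 listed above
    2: {"ab": 0.15, "a": 0.25, "b": 0.20, "o": 0.40},   # the atom values of mu_2 listed above
}
def mu(i, S):
    return round(sum(atom_mass[i][w] for w in S), 10)

C = [frozenset(), A, Omega - A, B, Omega - B, Omega]     # the class C = F1 ∪ F2
assert all(mu(1, S) == mu(2, S) for S in C)              # mu_1 and mu_2 coincide on C ...
print(mu(1, A | B), mu(2, A | B))   # ... but differ on F: 0.5 versus 0.6
print(mu(1, A & B), mu(2, A & B))   # 0.25 versus 0.15
```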
2.4 Measures and (Point) Functions

Let μ be a measure on B, the Borel σ-field in ℝ, such that μ(finite interval) < ∞ (the Lebesgue measure, for example, does this). Then, for a constant c, define a function F_c = F: ℝ → ℝ as follows:
F(x) = c + μ((0, x]) if x ≥ 0, and F(x) = c − μ((x, 0]) if x < 0.
(Then F(0) = c, since (0, 0] = ∅ and μ(∅) = 0.) Then we have the following easy theorem.

Theorem 6. Let F be defined as above. Then F is
(i) Nondecreasing.
(ii) Continuous from the right.

Proof.
(i) Let 0 ≤ x_1 < x_2. Then F(x_1) = c + μ((0, x_1]) ≤ c + μ((0, x_2]) = F(x_2). Next, let x_1 < 0 ≤ x_2. Then F(x_1) = c − μ((x_1, 0]) ≤ c + μ((0, x_2]) = F(x_2). Finally, let x_1 < x_2 < 0. Then F(x_1) = c − μ((x_1, 0]) ≤ c − μ((x_2, 0]) = F(x_2).
(ii) Let x ≥ 0 and choose x_n ↓ x as n → ∞, here and in the sequel. Then (0, x_n] ↓ (0, x], so that μ((0, x_n]) ↓ μ((0, x]), or c + μ((0, x_n]) ↓ c + μ((0, x]); equivalently, F(x_n) ↓ F(x). Next, let x < 0, and pick x_n such that x_n ↓ x. Then (x_n, 0] ↑ (x, 0], so that μ((x_n, 0]) ↑ μ((x, 0]), or equivalently, −μ((x_n, 0]) ↓ −μ((x, 0]), or c − μ((x_n, 0]) ↓ c − μ((x, 0]); equivalently, F(x_n) ↓ F(x).

Thus, we proved that a measure μ on B with the property that μ(finite interval) < ∞ defines a class of (point) functions on ℝ → ℝ that are nondecreasing and continuous from the right. Each such function is called a distribution function (d.f.). If μ is finite, then each F is bounded. In particular, if μ is a probability measure and if we take c = μ((−∞, 0]), then F is a d.f. of a r.v. X (i.e., in addition to (i) and (ii), F(−∞) = lim_{x→−∞} F(x) = 0 and F(∞) = lim_{x→∞} F(x) = 1).

Now we will work the other way around. Namely, we will start with any function F that is nondecreasing and continuous from the right, and we will show that such a function induces a measure on B. To this end, define the class C ⊂ B as follows: C = {∅} ∪ {(α, β]; α, β ∈ ℝ, α < β}, and on this class define a set function (call it ℓ here) as follows: ℓ((α, β]) =def ℓ(α, β) = F(β) − F(α), ℓ(∅) = 0. Then we have the following easy lemma.

Lemma 1. Let C and ℓ be defined as above. Then
(i) ℓ ≥ 0.
(ii) ℓ(α, β) ↓ 0 as β ↓ α.
(iii) If α_1 ≤ α_2 ≤ ··· ≤ α_n, then Σ_{j=1}^{n−1} ℓ(α_j, α_{j+1}) = ℓ(α_1, α_n) = F(α_n) − F(α_1).
(iv) ℓ is nondecreasing.
Proof. (i) Obvious from the nondecreasing property of F. (ii) Obvious by the continuity from the right of F. n−1 n−1 (iii) j=1 (α j , α j+1 ) = j=1 [F(α j+1 ) − F(α j )] = F(αn ) − F(α1 ) = (α1 , αn ). (iv) If (α1 , α2 ] ⊃ (α3 , α4 ], then by (iii) and (i), (α1 , α2 ) = (α1 , α3 ) + (α3 , α4 ) + (α4 , a2 ) ≥ (α3 , α4 ). Next, we have the following less obvious lemma. Lemma 2. The function on C is a measure. That is, () = 0, ((α, β]) ≥ 0, ∞ (α , β ], it holds ((α, β]) = and for (α, β] = ∞ j j=1 j j=1 ((α j , β j ]). Proof. Since () = 0, and ≥ 0 (and is nondecreasing), all that remains ∞ (α , β ], then (α, β) = to prove is that is σ -additive; i.e., if (α, β] = j j j=1 j (α j , β j ). Consider the n intervals (α j , β j ], j = 1, . . . , n. These intervals are nonoverlapping and we may rearrange them (α j1 , β j1 ], (α j2 , β j2 ], . . . , (α jn , β jn ] so that n
α j1 < β j1 ≤ α j2 < β j2 ≤ · · · ≤ α jn < β jn .
Then i=1 (α ji , β ji ) ≤ (α j1 , β j1 ) + (β j1 , α j2 ) + · · · + (α jn , β jn ) = is ≤ (α, β), since (α j1 , β jn ] ⊆ (α, β]. So, (α j1 , β jn ), by Lemma n 1 (iii), and this (α ji , β ji ) = nj=1 (α j , β j ) ≤ (α, β), which implies for every finite n, i=1 ∞
(α j , β j ) ≤ (α, β).
(2.11)
j=1
We then have to establish the reverse inequality; i.e., ∞
(α j , β j ) ≥ (α, β).
(2.12)
j=1
Consider (α, β], choose 0 < ε < β − α, and look at the interval [α + ε, β]. By Lemma 1 (ii), (β j , β j + δ) ↓ 0 as δ ↓ 0. Thus, there exists δ j > 0 such that (β j , β j + δ j ) <
ε , j = 1, 2, . . . 2j
(2.13)
Also, (α j , β j + δ j ) = (α j , β j ) + (β j , β j + δ j ), by Lemma 1 (iii). Thus, by using (2.13), one gets (α j , β j + δ j ) < (α j , β j ) +
ε , j = 1, 2, . . . 2j
(2.14)
The intervals {(α j , β j + δ j ), j = 1, 2, . . .} evidently cover [α + ε, β]. Then there exists a finite number of them, n 0 , (by the Borel–Heine Theorem) covering [α + ε, β].
From these n 0 intervals, select m ≤ n 0 with the following properties: α j1 < α + ε,
β jm + δ jm > β,
α j,i−1 < α ji < β j,i−1 + δ j,i−1 < β ji + δ ji , i = 2, . . . , m. Now look at the following intervals: (α j1 , β j1 + δ j1 ], (β j1 + δ j1 , β j2 + δ j2 ], . . . , (β j,m−1 + δ j,m−1 , β jm + δ jm ] (see the following picture). αj1 (| ( | α α+ε
[Figure: the points α < α + ε ≤ α_{j_1} < β_{j_1} + δ_{j_1} ≤ β_{j_2} + δ_{j_2} ≤ ··· ≤ β_{j,m−1} + δ_{j,m−1} < β ≤ β_{j_m} + δ_{j_m} on the real line, with the successive half-open intervals marked.]
These intervals are nonoverlapping and their sum is (α j1 , β jm +δ jm ] ⊃ (α+ε, β]. Then (α + ε, β) ≤ (α j1 , β jm + δ jm ) = (α j1 , β j1 + δ j1 ) +
m
(β j,i−1 + δ j,i−1 , β ji + δ ji )
i=2
≤
m
(α ji , β ji + δ ji ),
i=1
because (β j,i−1 +δ j,i−1 , β ji +δ ji ] ⊂ (α ji , β ji +δ ji ] and is nondecreasing. That is, (α + ε, β) ≤
m
(α ji , β ji + δ ji ).
i=1
Next,
m
(α ji , β ji + δ ji ) ≤
j=1
∞
(α j , β j + δ j ),
j=1
(since (α ji , β ji + δ ji ], i = 1, . . . , m are only m of the intervals (α j , β j + δ j ], j = 1, 2, . . .) = ≤ =
∞
[(α j , β j ) + (β j , β j + δ j )]
j=1 ∞
(α j , β j ) +
j=1 ∞
ε (by (2.13)), 2j
(α j , β j ) + ε;
j=1
(α + ε, β) ≤
∞ j=1
(α j , β j ) + ε.
i.e.,
But (α+ε, β) = F(β)− F(α+ε) and hence F(β)− F(α+ε) ≤ ∞ j=1 (α j , β j )+ ε. Letting ε ↓ 0 and using Theorem 6(ii), we get then F(β) − F(α) = (α, β) ≤ ∞ (α , β ) which is (2.12). Therefore (2.11) and (2.12) show that is a measure j j j=1 on C. Theorem 7. Let F : → be a d.f. Then F uniquely determines a measure ν on B such that ν((α, β]) = F(β) − F(α), called the measure induced by F. If F is bounded, the measure is finite, and, in particular, if F is a d.f. of a r.v., then the measure is a probability measure. Proof. In terms of F, define as was done just prior to Lemma 1. Then consider the class C = C ∪ {(−∞, α]; α ∈ } ∪ {(β, ∞); β ∈ } and let C be the class of all finite sums in C . Then the class C is a field generating B (by Exercise 7(i) in Chapter 1), and, clearly, the elements of C are sums of countably many elements of C; i.e., if A ∈ C , then A = j I j with I j ∈ C, j = 1, 2, . . . On C , we define a function ν as follows: ν(A) = (I j ), if A = I j , I j ∈ C, j = 1, 2, . . . j
j
We will first show that ν is well defined; i.e., if A = i Ii and A = j I j , then (Ii ) = Clearly, Ii ∩ I j ∈ C, i, j = 1, 2, . . . Next, Ii = j (I j )(= ν(A)). i j Ii ∩ I j , since Ii ⊆ A = j I j so that (Ii ) = ( j Ii ∩ I j ) = j (Ii ∩ I j ), since is a measure on C (by Lemma 2). In a similar fashion, (I j ) = i (Ii ∩ I j ). Hence I j = I j ∩ Ii = Ii ∩ I j = (Ii ), j
j
i
i
j
i
as was asserted. Clearly, ν coincides with on C, and we will next show that ν is a measure on Let A j ∈ C , j = 1, 2, . . ., such C . It suffices to prove σ -additivity. that Ai ∩ A j = . To show that ν(A) = A ∈ C , i = j, and let A = j j j ν(A j ). Since A j ∈ C , j = 1, 2, . . ., we have that A j = i I ji with I ji ∈ C, i = 1, 2, . . . , j, i = 1, 2 . . .} form a partition of A. Thus ν(A) = Clearly, {I ji j,i (I ji ) = j i (I ji ) = j ν(A j ). The remaining assertions are obvious; e.g., uniqueness follows from Theorem 5 since ν is at least σ -finite. Remark 8. The preceding derivations also justify the questions left open in the two special cases discussed earlier (special cases right after the proof of Theorem 5). The d.f. F which induces the Lebesgue measure λ is defined by F : → , F(x) = x. Remark 9. If X is a r.v., its probability distribution (or just distribution) is usually denoted by PX and is defined by: PX (B) = P(X ∈ B), B ∈ B, so that PX is a probability measure on B. Next, if FX is the d.f. of X , then FX (x) = PX ((−∞, x]), x ∈ , so that FX is determined by PX . Theorem 7 shows that the converse is also true; that is, FX uniquely determines PX .
Exercises. 1. If is countable and μ is defined on P() by: μ(A) = number of points of A, show that μ is a measure. Furthermore, μ is finite or σ -finite, depending on whether is finite or denumerable, respectively. (This measure is called the counting measure.) 2. Refer to the field C of Example 4 in Chapter 1 and on C, define the set function P as follows: P(A) = 0 if A is finite, and P(A) = 1 if Ac is finite. Then show that (i) P is finitely additive. (ii) If is denumerable, P is not σ -additive. (iii) If is uncountable, P is σ -additive and a probability measure. 3. Refer to the σ -field C of Example 8 in Chapter 1 and on C, define the function P as follows: P(A) = 0 if A is countable, and P(A) = 1 if Ac is countable. Then show that P is a probability measure. 4. Let An , n = 1, 2, . . . be events in the probability space (, A, P) such that P(An ) = 1 for all n. Then show that P(∩∞ n=1 An ) = 1. 5. Let {Ai , i ∈ I } be an uncountable collection of pairwise disjoint events in the probability space (, A, P). Then show that P(Ai ) > 0 for countably many Ai s only. Hint: If In = {i ∈ I ; P(Ai ) > n1 }, then the cardinality of In is ≤ n − 1, n ≥ 2, and I0 = {i ∈ I ; P(Ai ) > 0} = ∪n≥2 In . 6. Let be an infinite set (countable or not) and let A be the discrete σ -field. Let ω2 , . . .} ⊂ , and with each ωn , associate a nonnegative number {ω1 , pn (such that ∞ ωn ∈A pn . n=1 pn ≤ ∞). On A, define the set function μ by: μ(A) = Then show that μ is a measure on A. 7. In the measure space (, A, μ) a set A ∈ A is called an atom, if μ(A) > 0 and for any B ⊆ A with B ∈ A, it follows that μ(B) = 0, or μ(B) = μ(A). In reference to Exercise 6, identify the atoms of A. 8. In any measure space (, A, μ) and with any An ∈ A, n = 1, 2, . . ., show that μ lim inf An ≤ lim inf μ(An ); also, μ lim sup An ≥ lim sup μ(An ), n→∞
provided μ
n→∞
∞ j=n
n→∞
n→∞
A j < ∞ for some n.
In Exercises 9 and 10, show that the set function μ◦ defined on P() is, indeed, an outer measure. 9. is an arbitrary set, ω0 is a fixed point of , and μ◦ is defined by: μ◦ (A) = I A (ω0 ). 10. is the set of 100 points arranged in a square array of 10 columns, each with 10 points, and μ◦ is defined by: μ◦ (A) = number of columns that contain at least one point of A.
11. For an arbitrary set containing at least two points, consider the trivial field F = {, }, and let (the finite measure) μ be defined on F by: μ() = 0 and μ() = 1. (i) Determine the set function μ∗ on P(), as is given in the definition just prior to Theorem 3 (which μ∗ is, actually, an outer measure, by Theorem 3(ii)). (ii) Show that the σ -field of μ∗ -measurable sets, A∗ , say, is the trivial σ -field, so that A∗ is strictly contained in P(). 12. Let = {ω1 , ω2 , ω3 , ω4 }, let C = {, {ω1 , ω2 }, {ω1 , ω3 }, {ω2 , ω4 }, {ω3 , ω4 }, }, and define μ on C as follows: μ({ω1 , ω2 }) = μ({ω1 , ω3 }) = μ({ω2 , ω4 }) = μ({ω3 , ω4 }) = 3, μ() = 6, μ() = 0. Next, on P(), define the measures μ1 and μ2 by taking μ1 ({ω1 }) = μ1 ({ω4 }) = μ2 ({ω2 }) = μ2 ({ω3 }) = 1, μ1 ({ω2 }) = μ1 ({ω3 }) = μ2 ({ω1 }) = μ2 ({ω4 }) = 2. Then show that C is not a field. μ is a measure on C. Both μ1 and μ2 are extensions of μ (from C to P()). Construct the outer measure μ∗ (as is defined in Definition 5) by means of μ defined on C. (v) Conclude that μ∗ = μ1 = μ2 (so that, if the class C is not a field, the extension of (even a finite measure μ on C) need not be unique).
(i) (ii) (iii) (iv)
13. Let = {0, 1, 2, . . .}, let A = {1, 3, 5, . . .}, and let C = {, A, Ac , }. Let μ be the counting measure on C, and let μ1 , μ2 be defined on P() by μ1 (B) = the number of points of B, μ2 (B) = 2μ1 (B). Then show that C is a field. μ is not σ -finite on C. Both μ1 and μ2 are extensions of μ and are also σ -finite. Determine the outer measure μ∗ (as is defined in Definition 5) by showing that μ∗ (B) = ∞ whenever B = . (v) Show that the σ -field of μ∗ -measurable sets, A∗ say, is equal to P().
(i) (ii) (iii) (iv)
(From this example, we conclude that if μ is not σ -finite on the field C, then there need not be a unique extension. Also, there may be σ -finite extensions, such as μ1 and μ2 here, when the original measure on C is not σ -finite.) 14. Construct additional examples to illustrate the points made in Exercises 11(ii), 12, and 13.
15. Consider the measure space (, A, μ), and define the classes A∗ and A¯ as follows: A∗ = {A M; A ∈ A, M ⊆ N with N ∈ A and μ(N ) = 0}, A¯ = {A ∪ M; A ∈ A, M ⊆ N with N ∈ A and μ(N ) = 0}. Then show that (i) A M = (A − N ) ∪ [N ∩ (A M)]. (ii) A ∪ M = (A − N ) [N ∩ (A ∪ M)]. From parts (i) and (ii), conclude that ¯ (iii) A∗ = A. ¯ defined in Exercise 15, and show that A¯ (and 16. Refer to the classes A∗ and A, hence A∗ ) is a σ -field. 17. Refer to Exercise 15 and on A∗ , define μ∗ by: μ∗ (A M) = μ(A). (i) Show that μ(A − N ) = μ(A). By Exercise 15(ii), μ∗ (A ∪ M) = μ(A − N ). Therefore, by part (i), we may define μ∗ on A¯ by: μ∗ (A ∪ M) = μ(A). (ii) Show that μ∗ so defined is well defined; that is, if A1 ∪ M1 = A2 ∪ M2 , where Ai and Mi , i = 1, 2 are as in Exercise 15, then μ(A1 ) = μ(A2 ). (iii) Show that μ∗ is a measure on A¯ (and hence on A∗ , by Exercise 15(iii)). ¯ μ∗ ) ( and (, A∗ , μ∗ )) is called completion Remark: The measure space (, A, of (, A, μ). ¯ defined in Exercise 15 contains the class Cˆ 18. (i) Show that the class A∗ (= A) defined by Cˆ = {B ⊆ ; either B = A for some A ∈ A, or B ⊆ N for some N ∈ A with μ(N ) = 0}. (That is, Cˆ is taken from A by supplementing A with all subsets of all null sets in A; that is, sets of measure 0.) ˆ def ¯ (ii) Also, show that σ (C) = Aˆ = A∗ (= A). 19. If (, A, μ) is a measure space, by means of an example, show that there may be subsets M ⊂ N , where N are null sets in A, with M ∈ / A. (If it so happens, however, that for every M ⊆ N for every null N ∈ A, then M ∈ A, we say that A is complete with respect to μ.) 20. If μ◦ is an outer measure and A◦ is the σ -field of μ◦ -measurable sets, then show that A◦ is complete (with respect to μ◦ ). 21. Consider the measure space (, B, μ) where μ(finite interval) < ∞, and for c ∈ , let Fc : → be defined by c + μ((0, x]) x ≥ 0, Fc (x) = c − μ((x, 0]) x < 0. Then, by means of an example, show that Fc need not be left-continuous.
22. Consider the measurable space (, A), and let μ be a set function defined on A and taking nonnegative values, and such that μ() = c with 0 < c < ∞. Then show that μ cannot be additive. 23. Let μ1 , . . . , μn be σ -finite n measures on (, A) and set μ = μ1 + · · · + μn (in μi (A), A ∈ A). Then show that μ is also σ -finite. the sense that μ(A) = i=1 24. In the probability space (, A, P), show that B)] = P(A) + P(B) − 2P(A ∩ B). (i) P[(A ∩ B c ) ∪ (Ac ∩ n P(Ai ) − (n − 1). (ii) P(A1 ∩ · · · ∩ An ) ≥ i=1 25. Consider the probability space (, A, P), where = {ω1 , ω2 , ω3 , ω4 }, A is the discrete σ -field, and P is defined by P({ω1 }) =
1 1 1 3 , P({ω2 }) = , P({ω3 }) = , P({ω4 }) = . 6 3 5 10
For n = 1, 2, . . ., define An by: A2n−1 = {ω1 , ω2 }, A2n = {ω2 , ω3 }. Determine the: limn→∞ An , limn→∞ An and compute the probabilities: P( lim An ), P( lim An ), lim P(An ), lim P(An ). n→∞
n→∞
n→∞
n→∞
26. Let = {ω1 , ω2 , . . .} and let A be a σ -field of subsets of . Then show that (i) A = P() if and only if {ωi } ∈ A for all ωi . (ii) A measure μ on A(= P()) is determined by defining μ({ωi }), i ≥ 1. 27. Suppose that = {ω1 , ω2 , . . .}, and let A be the discrete σ -field. On A, define the set function μ by 0 if A is finite, μ(A) = ∞ if A is infinite. Then show that (i) μ is nonnegative, μ() = 0 and is finitely additive, but it is not σ -additive. (ii) = limn→∞ An for a sequence {An }, with An ⊆ An+1 , n ≥ 1, and μ(An ) = 0 (and therefore μ(Acn ) = ∞), n ≥ 1. 28. Let = {1, 2, . . .}, and for any A ⊆ , let a = sup A. On the discrete σ -field, define μ0 by ⎧ a ⎨ a+1 if A is finite, μ0 (A) = 0 if A = , ⎩ 1 if A is infinite. Then (i) Show that μ0 is an outer measure. (ii) Determine the class A◦ of μ◦ -measurable sets. 29. Recall that m is a median of a r.v. X if P(X ≤ m) ≥ 21 and P(X ≥ m) ≥ 21 . Then show that, if m is a median of the r.v. X , then −m is a median of the r.v. −X .
30. Recall that a r.v. X is said to be symmetric about 0, if X and −X have the same distribution. If X is symmetric (about 0), then show that 0 is a median of X ; i.e., P(X ≤ 0) ≥ 21 and P(X ≥ 0) ≥ 21 . 31. Let μ0 be an outer measure, and suppose that μ0 (A) = 0 for some A ⊂ . Then show that μ0 (A ∪ B) = μ0 (B) for every B ⊆ . 32. Consider the measure space (, A, μ), and suppose that A is complete with respect to μ. Let f , g : → be such that μ( f = g) = 0. Then show that, if one of f or g is measurable, then so is the other. 33. Consider the measure space (, A, μ), and let A be complete with respect to μ. Let A ∈ A with μ(A) = 0, and let f be an arbitrary function f : A → . Then f is measurable. 34. (i) If μ1 and μ2 are two measures on (, A), then show that μ = μ1 + μ2 is also a measure, where μ(A) = μ1 (A) + μ2 (A), A ∈ A. (ii) If at least one of the measures μ1 and μ2 is complete, then so is μ. 35. Refer to Example 2 and: (i) Show that the class C3 consists of the unions of any two members of the class C2 . (ii) Also, show that C3 is closed under complementation and the formation of the union of any two of its members. (iii) Conclude that C3 is a field, and, indeed, the field generated by C.
CHAPTER 3  Some Modes of Convergence of Sequences of Random Variables and their Relationships
In this short chapter, we introduce two basic kinds of convergence, almost everywhere convergence and convergence in measure, and we then investigate their relationships. The mutual versions of these convergences are also introduced, and they are related to the respective convergences themselves. Furthermore, conditions for almost sure convergence are established. Finally, the concept of almost uniform convergence is defined and suitably exploited.
3.1 Almost Everywhere Convergence and Convergence in Measure

Consider the measure space (Ω, A, μ) and let {X_n}, n = 1, 2, ..., X be r.v.s. Then

Definition 1. We say that {X_n} converges almost everywhere (a.e.) to X, and write X_n →a.e. X as n → ∞, if X_n(ω) → X(ω) as n → ∞ except on a set N (∈ A) such that μ(N) = 0; i.e., a μ-null set. We also write μ(X_n ↛ X) = 0. In particular, if μ is a probability measure, this convergence is called almost sure (a.s.) convergence. We say that {X_n} converges mutually a.e., if μ(X_m − X_n ↛ 0 as m, n → ∞) = 0, or equivalently, X_{n+ν} − X_n →a.e. 0 as n → ∞, uniformly in ν ≥ 1 (see also Exercise 20). Then, by the Cauchy criterion for sequences of numbers, {X_n} converges a.e. to X if and only if it converges mutually a.e. If we modify X_n to X_n′ on a null set N_n and X to X′ on a null set N_0, then, since ∪_{n=1}^∞ N_n ∪ N_0 =def N ∈ A and μ(N) = 0, we have that: X_n′ →a.e. X′ if X_n →a.e. X, {X_n′} converges mutually a.e. if {X_n} converges mutually a.e., and the preceding statement about equivalence of a.e. convergence and mutual a.e. convergence of {X_n} is still valid. Of course, μ(X_n ≠ X_n′) = μ(X ≠ X′) = 0, n = 1, 2, ...

Definition 2. We say that {X_n} converges in measure to X, and write X_n →μ X as n → ∞, if, for every ε > 0, μ(|X_n − X| ≥ ε) → 0 as n → ∞. In particular, if μ is a probability measure, this convergence is called convergence in probability. We say that {X_n} converges mutually in measure, if for every ε > 0, μ(|X_m − X_n| ≥ ε) → 0 as m, n → ∞, or equivalently, μ(|X_{n+ν} − X_n| ≥ ε) → 0 as n → ∞, uniformly in ν ≥ 1 (see also Exercise 21). The convergence is mutual convergence in probability if μ is a probability measure. Obviously, these convergences remain intact when the X_n's and X are modified as before. We will show later on (see Theorems 2 and 6) that X_n →μ X if and only if {X_n} converges mutually in measure.
The mutual convergence is mutual convergence in probability if μ is a probability measure. Obviously, these convergences remain intact when the X_n's and X are modified as before. We will show later on (see Theorems 2 and 6) that X_n →_μ X if and only if {X_n} converges mutually in measure.
Clearly, if X_n → X a.e. and X_n → X′ a.e., then μ(X ≠ X′) = 0. This is also true for convergence in measure, but it is less obvious. So
Theorem 1. Let X_n →_μ X and X_n →_μ X′ as n → ∞. Then μ(X ≠ X′) = 0.
Proof. We have |X − X′| = |(X − X_n) + (X_n − X′)| ≤ |X_n − X| + |X_n − X′|, so that
(|X − X′| ≥ ε) ⊆ (|X_n − X| ≥ ε/2) ∪ (|X_n − X′| ≥ ε/2), and hence
μ(|X − X′| ≥ ε) ≤ μ(|X_n − X| ≥ ε/2) + μ(|X_n − X′| ≥ ε/2) → 0 as n → ∞.
Thus μ(|X − X′| ≥ ε) = 0 for every ε > 0, and hence μ(|X − X′| ≥ 1/k) = 0, k = 1, 2, .... But (X ≠ X′) = ∪_{k=1}^∞ (|X − X′| ≥ 1/k). Thus μ(X ≠ X′) ≤ ∑_{k=1}^∞ μ(|X − X′| ≥ 1/k) = 0; i.e., μ(X ≠ X′) = 0.
Theorem 2. Let X_n →_μ X as n → ∞. Then {X_n} converges mutually in measure. That is, convergence in measure implies mutual convergence in measure.
Proof. We have |X_m − X_n| ≤ |X_m − X| + |X_n − X|. Thus
(|X_m − X_n| ≥ ε) ⊆ (|X_m − X| ≥ ε/2) ∪ (|X_n − X| ≥ ε/2), so that
μ(|X_m − X_n| ≥ ε) ≤ μ(|X_m − X| ≥ ε/2) + μ(|X_n − X| ≥ ε/2) → 0 as m, n → ∞.
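The difference between the two kinds of convergence in Definitions 1 and 2 is worth seeing numerically. In the classical "sliding blocks" example on Ω = (0, 1] with Lebesgue (probability) measure (this particular example is our own choice of illustration, of the kind asked for in Exercise 2(ii) below), X_n is the indicator of an interval of length about 1/n that cycles through (0, 1]; then μ(|X_n| ≥ ε) → 0, so X_n →_μ 0, yet for every ω the sequence X_n(ω) equals 1 infinitely often, so there is no a.e. convergence. A minimal sketch, under these assumptions:

```python
# Sketch (our own example): the "sliding blocks" on (0,1].  Stage m is split into
# m blocks ((k-1)/m, k/m]; enumerating all blocks gives indicators X_n with
# P(X_n = 1) = 1/m_n -> 0 (convergence in probability to 0), while every omega
# falls into one block of every stage, so X_n(omega) = 1 infinitely often.

def block(n):
    m = 1
    while n > m:          # locate the stage m and position k of the n-th block
        n -= m
        m += 1
    return m, n           # block ((k-1)/m, k/m] with k = n

def X(n, omega):
    m, k = block(n)
    return 1.0 if (k - 1) / m < omega <= k / m else 0.0

omega = 0.3
print([n for n in range(1, 200) if X(n, omega) == 1.0][:8])   # 1's keep recurring: no pointwise limit
print([1.0 / block(n)[0] for n in (1, 10, 100, 1000)])        # mu(|X_n| >= eps) -> 0
```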
Remark 1. It is obvious that any one of the modes of convergence introduced so far carries over to any subsequence of a given sequence.
What we are going to do in the remaining part of this section is to find an expression for the set (X_n → X as n → ∞), and then use this result to formulate a criterion for a.e. convergence of X_n to X.
The set of convergence (X_n → X as n → ∞) consists of all those ω ∈ Ω for which: for every ε > 0, there is an n = n(ε, ω) ≥ 1 (integer) such that, for all ν ≥ 1, it holds that
|X_{n+ν}(ω) − X(ω)| < ε.   (3.1)
That (3.1) holds for all ν ≥ 1 means
ω ∈ ∩_{ν=1}^∞ (|X_{n+ν} − X| < ε) for some n ≥ 1, and every ε > 0.   (3.2)
That (3.2) holds for some n ≥ 1 means
ω ∈ ∪_{n=1}^∞ ∩_{ν=1}^∞ (|X_{n+ν} − X| < ε) for every ε > 0.   (3.3)
That (3.3) holds for every ε > 0 means
ω ∈ ∩_{ε>0} ∪_{n=1}^∞ ∩_{ν=1}^∞ (|X_{n+ν} − X| < ε).
Thus (X_n → X as n → ∞) = ∩_{ε>0} ∪_{n=1}^∞ ∩_{ν=1}^∞ (|X_{n+ν} − X| < ε). Clearly, ε > 0 may be replaced by ε_k ↓ 0, or by 1/k (↓ 0). Then
(X_n → X as n → ∞) = ∩_{k=1}^∞ ∪_{n=1}^∞ ∩_{ν=1}^∞ (|X_{n+ν} − X| < 1/k).
In a similar way we find that the set on which {X_n} converges mutually is
(X_{n+ν} − X_n → 0 as n → ∞, uniformly in ν ≥ 1) = ∩_{k=1}^∞ ∪_{n=1}^∞ ∩_{ν=1}^∞ (|X_{n+ν} − X_n| < 1/k).
Both of these sets are measurable, since they are obtained by countable operations on measurable sets. Thus we have
Theorem 3. We have
(X_n → X as n → ∞) = ∩_{k=1}^∞ ∪_{n=1}^∞ ∩_{ν=1}^∞ (|X_{n+ν} − X| < 1/k) ∈ A,
(X_{n+ν} − X_n → 0 as n → ∞, uniformly in ν ≥ 1) = ∩_{k=1}^∞ ∪_{n=1}^∞ ∩_{ν=1}^∞ (|X_{n+ν} − X_n| < 1/k) ∈ A,
and hence, taking complements,
(X_n ↛ X as n → ∞) = ∪_{k=1}^∞ ∩_{n=1}^∞ ∪_{ν=1}^∞ (|X_{n+ν} − X| ≥ 1/k) ∈ A,
(X_{n+ν} − X_n ↛ 0 as n → ∞, uniformly in ν ≥ 1) = ∪_{k=1}^∞ ∩_{n=1}^∞ ∪_{ν=1}^∞ (|X_{n+ν} − X_n| ≥ 1/k) ∈ A.
In the following, it is understood that k, n = 1, 2, ... and ν = 1, 2, ... (or ν = 0, 1, ...).
Now X_n → X a.e. as n → ∞ means μ(X_n ↛ X) = 0, or equivalently,
μ(∪_k ∩_n ∪_ν (|X_{n+ν} − X| ≥ 1/k)) = 0.
But μ(∪_k ∩_n ∪_ν (|X_{n+ν} − X| ≥ 1/k)) = 0 implies μ(∩_n ∪_ν (|X_{n+ν} − X| ≥ 1/k)) = 0, k = 1, 2, ..., since ∩_n ∪_ν (|X_{n+ν} − X| ≥ 1/k) ↑ ∪_k ∩_n ∪_ν (|X_{n+ν} − X| ≥ 1/k) as k → ∞. In the other direction, if for all k = 1, 2, ... these sets have measure zero, then the measure of (X_n ↛ X) is zero. Thus μ(X_n ↛ X) = 0 if and only if μ(∩_n ∪_ν (|X_{n+ν} − X| ≥ 1/k)) = 0, k = 1, 2, ..., and in a similar fashion μ(X_{n+ν} − X_n ↛ 0, uniformly in ν ≥ 1) = 0 if and only if μ(∩_n ∪_ν (|X_{n+ν} − X_n| ≥ 1/k)) = 0, k = 1, 2, .... These statements are true for any μ.
Assume now that μ is finite. We have then, for a fixed k,
∪_ν (|X_{n+ν} − X| ≥ 1/k) ↓ ∩_n ∪_ν (|X_{n+ν} − X| ≥ 1/k) as n → ∞, so that
μ(∪_ν (|X_{n+ν} − X| ≥ 1/k)) ↓ μ(∩_n ∪_ν (|X_{n+ν} − X| ≥ 1/k)) as n → ∞.
Therefore, in this case,
μ(X_n ↛ X) = 0 if and only if μ(∪_ν (|X_{n+ν} − X| ≥ 1/k)) → 0 as n → ∞, for each k = 1, 2, ...,
and in a similar way,
μ(X_{n+ν} − X_n ↛ 0, uniformly in ν ≥ 1) = 0 if and only if μ(∪_ν (|X_{n+ν} − X_n| ≥ 1/k)) → 0 as n → ∞, for each k = 1, 2, .... Thus we have the following theorem.
Theorem 4. For any measure μ,
μ(X_n ↛ X) = 0 if and only if μ(∩_n ∪_ν (|X_{n+ν} − X| ≥ 1/k)) = 0, k = 1, 2, ...,
and
μ(X_{n+ν} − X_n ↛ 0, uniformly in ν ≥ 1) = 0 if and only if μ(∩_n ∪_ν (|X_{n+ν} − X_n| ≥ 1/k)) = 0, k = 1, 2, ....   (3.4)
In particular, if μ is finite, then
μ(X_n ↛ X) = 0 if and only if μ(∪_ν (|X_{n+ν} − X| ≥ 1/k)) → 0 as n → ∞, k = 1, 2, ...,
and
μ(X_{n+ν} − X_n ↛ 0, uniformly in ν ≥ 1) = 0 if and only if μ(∪_ν (|X_{n+ν} − X_n| ≥ 1/k)) → 0 as n → ∞, k = 1, 2, ....
Corollary. If μ is finite, then X_n → X a.e. implies X_n →_μ X as n → ∞.
Proof. In fact, let X_n → X a.e., or μ(X_n ↛ X) = 0. However, μ(X_n ↛ X) = 0 is equivalent to
μ(∪_{ν≥0} (|X_{n+ν} − X| ≥ 1/k)) → 0 as n → ∞, k = 1, 2, ....
But
μ(|X_n − X| ≥ 1/k) ≤ μ(∪_{ν≥0} (|X_{n+ν} − X| ≥ 1/k)).
Thus μ(|X_n − X| ≥ 1/k) → 0 as n → ∞, k = 1, 2, ..., or equivalently, X_n →_μ X.
Remark 2. This need not be true if μ is not finite. Also, the converse need not be true even if μ is finite. These points are illustrated by examples (see Exercises 2(i) and 2(ii)).
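For a concrete numerical view of the first point in Remark 2 (the specific sequence is our own assumption, chosen only for illustration), take Ω = ℜ with Lebesgue measure and X_n = I_{[n, n+1]}: then X_n(ω) → 0 for every ω, yet μ(|X_n| ≥ ε) = 1 for all n and every ε ∈ (0, 1]. A minimal sketch:

```python
# Sketch (our own example, not from the text): on (R, B, lambda), X_n = indicator
# of [n, n+1] converges to 0 pointwise (hence a.e.), but
# lambda(|X_n - 0| >= eps) = lambda([n, n+1]) = 1 for every n and 0 < eps <= 1,
# so X_n does NOT converge to 0 in measure when mu is infinite.

def measure_of_exceedance(n, eps):
    # {|X_n| >= eps} = [n, n+1] whenever 0 < eps <= 1, an interval of length 1
    return 1.0 if 0 < eps <= 1 else 0.0

for n in (1, 10, 100, 1000):
    print(n, measure_of_exceedance(n, eps=0.5))   # stays 1.0, never tends to 0
```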
3.2 Convergence in Measure is Equivalent to Mutual Convergence in Measure It is first shown that convergence in measure ensures the existence of a subsequence that converges almost everywhere. Then, by means of this result and almost uniform convergence, it is shown that convergence in measure and mutual convergence in measure are equivalent, as is the case in almost everywhere convergence.
Theorem 5.
(i) If {X_n} converges mutually in measure, there is a subsequence {X_{n_k}} of {X_n} and a r.v. X such that X_{n_k} → X a.e. as k → ∞ (which implies n_k → ∞).
(ii) If X_n →_μ X as n → ∞, there is a subsequence {X_{n_k}} of {X_n} and a r.v. X′ such that X_{n_k} → X′ a.e. as k → ∞ and μ(X ≠ X′) = 0.
Proof.
(i) That {X_n} converges mutually in measure implies that
μ(|X_m − X_n| ≥ 1/2^k) < 1/2^k for m, n ≥ n(k), k = 1, 2, ....   (3.5)
Define
n_1 = n(1), n_2 = max{n_1 + 1, n(2)}, n_3 = max{n_2 + 1, n(3)}, ....
Then n_1 < n_2 < n_3 < ··· → ∞, since each term increases at least by 1. For k = 1, 2, ..., we set X_k′ = X_{n_k} and define
A_k = (|X_{k+1}′ − X_k′| ≥ 1/2^k), B_n = ∪_{k=n}^∞ A_k.
Then μ(A_k) = μ(|X_{n_{k+1}} − X_{n_k}| ≥ 1/2^k) < 1/2^k, by (3.5), since n_k, n_{k+1} ≥ n(k) by their own definition, and hence μ(B_n) ≤ ∑_{k=n}^∞ μ(A_k) ≤ 1/2^{n−1}; i.e.,
μ(B_n) ≤ 1/2^{n−1}.   (3.6)
We are aiming at showing that {X_k′} converges mutually a.e. To this end, for ε > 0, choose n_0 = n(ε) such that 1/2^{n_0−1} < ε. Then 1/2^{n−1} < ε for n ≥ n_0. Now, for k ≥ n ≥ n_0 and ν ≥ 1, one has
|X_{k+ν}′ − X_k′| = |X_{k+ν}′ − X_{k+ν−1}′ + X_{k+ν−1}′ − X_{k+ν−2}′ + ··· + X_{k+1}′ − X_k′|
≤ |X_{k+ν}′ − X_{k+ν−1}′| + |X_{k+ν−1}′ − X_{k+ν−2}′| + ··· + |X_{k+1}′ − X_k′|
= ∑_{j=k}^{k+ν−1} |X_{j+1}′ − X_j′| ≤ ∑_{j=n}^∞ |X_{j+1}′ − X_j′|.
Therefore, if ω ∈ B_n^c = ∩_{k=n}^∞ A_k^c, or equivalently, ω ∈ A_j^c for all j ≥ n, then |X_{j+1}′(ω) − X_j′(ω)| < 1/2^j, j ≥ n, which implies
∑_{j=n}^∞ |X_{j+1}′(ω) − X_j′(ω)| ≤ 1/2^{n−1} < ε,
and this implies in turn that
|X_{k+ν}′(ω) − X_k′(ω)| < ε, k ≥ n (≥ n_0), ν ≥ 1.   (3.7)
This, in turn, gives that (|X_{k+ν}′ − X_k′| ≥ ε) ⊆ B_n, k ≥ n (≥ n_0), ν = 1, 2, .... Hence
∪_{ν=1}^∞ (|X_{k+ν}′ − X_k′| ≥ ε) ⊆ B_n, k ≥ n (≥ n_0),
implies
∪_{k=n}^∞ ∪_{ν=1}^∞ (|X_{k+ν}′ − X_k′| ≥ ε) ⊆ B_n (n ≥ n_0),
and therefore, by (3.6),
μ(∪_{k≥n} ∪_{ν≥1} (|X_{k+ν}′ − X_k′| ≥ ε)) ≤ μ(B_n) ≤ 1/2^{n−1} (n ≥ n_0),
so that μ(∩_{k≥1} ∪_{ν≥1} (|X_{k+ν}′ − X_k′| ≥ ε)) ≤ 1/2^{n−1} (n ≥ n_0). Letting n → ∞, we get
μ(∩_{k≥1} ∪_{ν≥1} (|X_{k+ν}′ − X_k′| ≥ ε)) = 0 for every ε > 0, or equivalently,
μ(∩_{k≥1} ∪_{ν≥1} (|X_{k+ν}′ − X_k′| ≥ 1/m)) = 0, m = 1, 2, ... (see (3.4)).
Then Theorem 4 applies and gives that {X_k′} converges mutually a.e., and therefore there exists a r.v. X such that {X_k′} = {X_{n_k}} (a subsequence of {X_n}) converges to X a.e. as k → ∞.
(ii) Let X_n →_μ X as n → ∞. Then, by Theorem 2, {X_n} converges mutually in measure, and hence, by part (i), there exists {X_k′} ⊆ {X_n} such that X_k′ → X′ a.e. as k → ∞, for some r.v. X′. It remains for us to show that μ(X ≠ X′) = 0. This is done in the Proof of Theorem 5(ii) (continued later).
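The construction in the proof of Theorem 5 also lends itself to a simple computational sketch (the specific sequence below is our own illustrative assumption, not part of the text): for a sequence whose tail probabilities are known, one picks indices n_k along which the bound (3.5) holds with 2^{-k}, and the summability of these bounds (Exercise 3 below, Borel–Cantelli) is what drives a.s. convergence along the subsequence.

```python
# Illustrative sketch: X_n is the indicator of a sliding block of length 1/n in
# (0,1], so P(|X_n - 0| >= eps) = 1/n -> 0 (convergence in probability), while
# X_n(omega) has no pointwise limit.  Following Theorem 5, choose n_k with
# P(|X_{n_k}| >= 2**-k) < 2**-k, e.g. n_k = 2**k + 1; then the bounds are summable
# and, by a Borel-Cantelli argument, X_{n_k} -> 0 a.s. along the subsequence.

def prob_exceed(n, eps):            # exact, since the block has length 1/n
    return 1.0 / n if eps <= 1 else 0.0

subseq = [2 ** k + 1 for k in range(1, 21)]                      # n_k = 2^k + 1
bounds = [prob_exceed(n, 2.0 ** -k) for k, n in enumerate(subseq, start=1)]
print(subseq[:5])                   # [3, 5, 9, 17, 33]
print(sum(bounds))                  # partial sum of a convergent series (< 1)
```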
Remark 3. It is to be pointed out here that the subsequence constructed in Theorem 5, which converges a.e. to a r.v., will be used extensively in many instances and for various purposes.
Theorem 6. If {X_n} converges mutually in measure, then {X_n} converges in measure to a r.v. X as n → ∞.
Proof (for the case that μ is finite). By Theorem 5(i), there exists a subsequence {X_{n_k}} of {X_n} and a r.v. X such that X_{n_k} → X a.e. as k → ∞. Also X_{n_k} →_μ X as k → ∞, by the Corollary to Theorem 4 (since μ is finite). To show that X_n →_μ X as n → ∞. Indeed,
|X_n − X| ≤ |X_n − X_{n_k}| + |X_{n_k} − X| implies
(|X_n − X| ≥ ε) ⊆ (|X_n − X_{n_k}| ≥ ε/2) ∪ (|X_{n_k} − X| ≥ ε/2),
so that
μ(|X_n − X| ≥ ε) ≤ μ(|X_n − X_{n_k}| ≥ ε/2) + μ(|X_{n_k} − X| ≥ ε/2).
Letting n → ∞ (which implies that n_k → ∞), we get μ(|X_n − X_{n_k}| ≥ ε/2) → 0 from the mutual convergence in measure of {X_n}, and μ(|X_{n_k} − X| ≥ ε/2) → 0. Hence, as n → ∞, μ(|X_n − X| ≥ ε) → 0, or equivalently, X_n →_μ X.
Theorem 6 is valid even if μ is not finite, but for its proof we need some preliminary results. (See Proof of Theorem 6 (continued) later.)
Definition 3. We say that {X_n} converges almost uniformly (a.u.) to X, and write X_n →_{a.u.} X as n → ∞, if for every ε > 0 there exists A_ε ∈ A such that μ(A_ε) < ε and X_n → X uniformly on A_ε^c. We say that {X_n} converges mutually almost uniformly if for every ε > 0 there exists A_ε ∈ A such that μ(A_ε) < ε and X_m − X_n → 0 as m, n → ∞, uniformly on A_ε^c. Of course, X_n →_{a.u.} X if and only if {X_n} converges mutually almost uniformly. (Observe that there exist such sequences; e.g., the subsequence {X_{n_k}} constructed in Theorem 5 is such a sequence. See also Theorem 8 below.)
Theorem 7. If X_n →_{a.u.} X, then X_n → X a.e. and X_n →_μ X (as n → ∞).
Proof. We have that X_n →_{a.u.} X implies that for every 1/k there exists A_k such that μ(A_k) < 1/k and X_n → X uniformly on A_k^c, k = 1, 2, .... Set A = ∩_{k=1}^∞ A_k. Then A ∈ A and μ(A) ≤ μ(A_k) < 1/k. Letting k → ∞, we then get μ(A) = 0. On the other hand, A^c = ∪_{k=1}^∞ A_k^c, and therefore for every ω ∈ A^c it follows that ω ∈ A_k^c for some k = 1, 2, ..., so that X_n(ω) → X(ω). That is, X_n → X on A^c, or X_n → X a.e. as n → ∞.
Again, X_n →_{a.u.} X implies that for every ε > 0 there exists A_ε ∈ A such that μ(A_ε) < ε and X_n → X uniformly on A_ε^c as n → ∞. This second part means that for every δ > 0 there exists N(δ, ε) > 0 (independent of ω ∈ A_ε^c) such that, for n ≥ N(δ, ε), |X_n(ω) − X(ω)| < δ, ω ∈ A_ε^c. Thus, for n ≥ N(δ, ε), (|X_n − X| ≥ δ) ⊆ A_ε, so that μ(|X_n − X| ≥ δ) ≤ μ(A_ε) < ε; i.e., X_n →_μ X as n → ∞.
Remark 4. Compare with Remark 2, where X_n → X a.e. does not necessarily imply that X_n →_μ X.
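For intuition about Definition 3 and Theorem 7, consider (an example of our own choosing, not from the text) Ω = (0, 1] with Lebesgue measure and X_n(ω) = ω^n. Here X_n → 0 a.e., and the convergence is in fact almost uniform: removing a set A_ε = (1 − δ, 1] of measure δ < ε makes the convergence uniform on the rest (this is also the phenomenon behind Egorov's Theorem, Exercise 11 below). A short numerical check:

```python
# Sketch: X_n(omega) = omega**n on (0,1] with Lebesgue measure converges to 0 a.e.;
# deleting A_eps = (1 - delta, 1] (of measure delta) leaves
# sup_{omega <= 1-delta} |X_n(omega)| = (1 - delta)**n -> 0,
# i.e. uniform convergence off a set of small measure (Definition 3).

delta = 0.01                      # mu(A_eps) = 0.01 < eps
for n in (10, 100, 1000, 5000):
    sup_off_A = (1 - delta) ** n  # supremum of |X_n| on (0, 1 - delta]
    print(n, sup_off_A)           # decreases to 0, uniformly in omega
```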
Theorem 8.
(i) If {X_n} converges mutually in measure, then there exist {X_k′} ⊆ {X_n} and a r.v. X such that X_k′ →_{a.u.} X as k → ∞.
(ii) If X_n →_μ X as n → ∞, then there exist {X_k′} ⊆ {X_n} and a r.v. X′ such that X_k′ →_{a.u.} X′ as k → ∞ and μ(X ≠ X′) = 0.
Proof.
(i) Consider the subsequence {X_k′} constructed in Theorem 5(i). By the arguments which led to relation (3.7), for every ε > 0 there are an integer n_0 = n(ε) > 0 and sets B_n with μ(B_n) < ε, n ≥ n_0, such that, on B_n^c,
|X_{k+ν}′(ω) − X_k′(ω)| < ε, k ≥ n (≥ n_0), ν = 1, 2, ....
Applying the above for n = n_0, we get: for every ε > 0, there exist a positive integer n_0 = n(ε) and a set B_{n_0} with μ(B_{n_0}) < ε such that, on B_{n_0}^c,
|X_{k+ν}′(ω) − X_k′(ω)| < ε, k ≥ n_0, ν = 1, 2, ....   (3.8)
But (3.8) is the definition of mutual a.u. convergence. Then there exists a r.v. X such that X_k′ →_{a.u.} X as k → ∞.
(ii) If X_n →_μ X, then {X_n} converges mutually in measure (by Theorem 2). Then, by part (i), X_k′ →_{a.u.} X′ as k → ∞, for some r.v. X′. However, X_k′ →_μ X′ by Theorem 7. Since also X_k′ →_μ X (because {X_k′} ⊆ {X_n}), Theorem 1 gives μ(X ≠ X′) = 0.
Remark 5. See Exercise 11 for an implication of a.e. convergence with regards to a.u. convergence.
We are now in a position to complete the parts of various proofs left incomplete so far.
Proof of Theorem 5(ii) (continued). Recall that in Theorem 5(ii) we assumed X_n →_μ X and we proved the existence of a subsequence {X_k′} such that X_k′ → X′ a.e. as k → ∞, for a r.v. X′. What remained to be verified was that μ(X ≠ X′) = 0. This is done in Theorem 8(ii).
Proof of Theorem 6 (continued). We can now give the proof of Theorem 6 for a not necessarily finite μ. Consider the sequence {X_{n_k}} of Theorem 5 or Theorem 8. Then X_{n_k} →_{a.u.} X as k → ∞ implies (X_{n_k} → X a.e. and) X_{n_k} →_μ X by Theorem 7. Hence
μ(|X_n − X| ≥ ε) ≤ μ(|X_n − X_{n_k}| ≥ ε/2) + μ(|X_{n_k} − X| ≥ ε/2) → 0 + 0 = 0,
the first term by assumption and the second by the fact that X_{n_k} →_μ X as k → ∞.
Corollary to Theorems 2 and 6. X_n →_μ X as n → ∞ if and only if {X_n} converges mutually in measure.
A brief summary of the results obtained in this chapter is as follows. The sequence of r.v.s we refer to is {X_n}, and a limiting r.v. is X.
1. Convergence in measure implies (μ-)uniqueness of the limit (Theorem 1).
2. Convergence in measure is equivalent to mutual convergence in measure (Theorems 2 and 6).
3. Expressions of the set of pointwise convergence (and hence of nonconvergence), as well as of the set of pointwise mutual convergence (and hence of mutual nonconvergence) (Theorem 3).
4. Necessary and sufficient conditions for a.e. (and mutual a.e.) convergence for any μ, and, in particular, for finite μ (Theorem 4).
5. Almost everywhere convergence and finiteness of μ imply convergence in measure. The converse need not be true even for finite μ (Corollary to Theorem 4, Exercise 2(i), (ii)).
6. Convergence in measure (or mutual convergence in measure) implies the existence of a subsequence that converges a.e. (or converges mutually a.e.) (Theorem 5).
7. Almost uniform convergence implies a.e. convergence and convergence in measure (Theorem 7).
8. Convergence in measure (or mutual convergence in measure) implies the existence of a subsequence that converges a.u. to a r.v. (or converges mutually a.u.) (Theorem 8).
Exercises.
1. If X_n →_μ X as n → ∞, then show directly (that is, without reference to other results) that (X_n − X)^+ →_μ 0, (X_n − X)^− →_μ 0, X_n^+ →_μ X^+, and X_n^− →_μ X^−.
Hint: For the convergence X_n^+ →_μ X^+, show that |X_n^+ − X^+| ≤ |X_n − X|, and likewise for X_n^− →_μ X^−. To this end, use Exercise 28 in Chapter 1.
2. By means of examples, show that:
(i) X_n → X a.e. as n → ∞ need not imply X_n →_μ X if μ is not finite.
(ii) X_n →_μ X need not imply X_n → X a.e. even if μ is finite.
Hint: In part (i), take Ω = ℜ, A = B, μ = λ, the Lebesgue measure, and choose the r.v.s suitably. In part (ii), take Ω = (0, 1], A = B_{(0,1]}, μ = λ, the Lebesgue measure, and choose the r.v.s suitably.
3. For any sequence of events {A_n}, n ≥ 1, show that ∑_{n=1}^∞ P(A_n) < ∞ implies P(lim sup_{n→∞} A_n) = 0.
4. The sequence {X_n}, n ≥ 1, of r.v.s is said to converge completely to 0 if, for every ε > 0, ∑_{n=1}^∞ P(|X_n| ≥ ε) < ∞.
(i) Show that, if {X_n}, n ≥ 1, converges completely to 0, then X_n → 0 a.e. as n → ∞.
(ii) By means of an example, show that complete convergence is not necessary for a.s. convergence.
Hint: For part (i), use Exercise 3 here and Exercise 4 in Chapter 2. For part (ii), take Ω = (0, 1], A = B_{(0,1]}, P = λ, the Lebesgue measure, and choose the r.v.s suitably.
Note: The most common way of establishing that X_n → X a.e. as n → ∞ is to show that {X_n − X}, n ≥ 1, converges completely to 0. A numerical sketch of this route is given right after this note.
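As a sketch of that route (the specific tail bound below is our own assumption, chosen only for illustration), suppose P(|X_n − X| ≥ ε) ≤ 1/n² for every ε > 0. Then the series in Exercise 4 converges, and Exercise 3 (the Borel–Cantelli argument) yields a.s. convergence. The code only checks the summability condition numerically:

```python
# Sketch: verify numerically that sum_n P(|X_n - X| >= eps) converges when the
# tail probabilities are O(1/n^2) (our illustrative assumption); this is the
# "complete convergence" condition of Exercise 4, hence X_n -> X a.s.
import math

def tail_prob(n, eps):
    # assumed bound P(|X_n - X| >= eps) <= 1/n**2, independent of eps here
    return 1.0 / n ** 2

s, checkpoints = 0.0, []
for n in range(1, 100_001):
    s += tail_prob(n, eps=0.1)
    if n in (10, 1_000, 100_000):
        checkpoints.append((n, s))
print(checkpoints)                    # partial sums approach pi^2/6 ~ 1.645
print(math.pi ** 2 / 6)
```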
5. Show that X_n →_P X as n → ∞ if and only if, for every subsequence {n′} ⊆ {n}, there exists a further subsequence {n″} ⊆ {n′} such that X_{n″} → X a.s. as n″ → ∞, and any two limiting r.v.s are a.s. equal.
6. By means of an example, show that X_n → c ≠ 0 a.s. as n → ∞ need not imply that P(X_n ≠ 0) = 1 for all n.
7. If for some ε_n > 0 with ∑_{n=1}^∞ ε_n < ∞ it holds that ∑_{n=1}^∞ P(|X_{n+1} − X_n| ≥ ε_n) < ∞, then show that X_n converges a.s. to a r.v. as n → ∞.
Hint: It suffices to show that {X_n} converges mutually a.s. To this end, set A_n = (|X_{n+1} − X_n| ≥ ε_n), n ≥ 1, and use Exercise 3 in this chapter in order to conclude that P(lim sup_{n→∞} A_n) = 0. Then, by setting A = lim sup_{n→∞} A_n and N^c = A^c, it follows that the event N^c (with P(N^c) = 1) is the set over which {X_n} converges mutually.
8. For n = 1, 2, ..., let X_n and X be (real-valued) r.v.s, and let g: ℜ → ℜ be continuous. Then show that X_n →_P X implies g(X_n) →_P g(X) as n → ∞ (so that continuity preserves convergence in probability).
Hint: One way of approaching this problem is to use Exercise 5.
9. For n = 1, 2, ..., let X_n, Y_n, X, and Y be (real-valued) r.v.s, and let g: ℜ² → ℜ be continuous. Then show that X_n →_P X and Y_n →_P Y imply g(X_n, Y_n) →_P g(X, Y) as n → ∞ (so that continuity preserves convergence in probability).
10. Show that X_n → X a.s. as n → ∞ if and only if there is a sequence 0 < ε_n → 0 such that
P(∪_{k≥n} (|X_k − X| ≥ ε_k)) → 0 as n → ∞.
Hint: For the part with ε_n → 0 given, show that, for every ε > 0, there exists N = N(ε) such that k ≥ N and n ≥ N imply ∪_{k≥n} (|X_k − X| ≥ ε) ⊆ ∪_{k≥n} (|X_k − X| ≥ ε_k), and then use Theorem 4 suitably. For the part with X_n → X a.s. given, use Theorem 4 in order to conclude that P(∪_{k≥n} (|X_k − X| ≥ ε)) → 0 as n → ∞. Applying this conclusion for ε = 1/m, m ≥ 1, show that there exists a sequence n_m ↑ ∞ as m → ∞ such that
P(∪_{k≥n_m} (|X_k − X| ≥ 1/m)) < 1/2^m.
Finally, for n_m ≤ k < n_{m+1}, set ε_k = 1/m and show that
P(∪_{k≥n} (|X_k − X| ≥ ε_k)) ≤ P(∪_{k≥n_m} (|X_k − X| ≥ ε_k)) ≤ 1/2^{m−1}.
11. (Egorov's Theorem) Show that, if μ is finite, then X_n → X a.e. as n → ∞ implies that X_n →_{a.u.} X.
Hint: For an arbitrary ε > 0, to be kept fixed throughout, and k ≥ 1 integer, use Theorem 4 in order to conclude that there exists N_k = N(ε, k) > 0 such that μ(A_{ε,k}) < ε/2^k, k ≥ 1, where A_{ε,k} = ∪_{n≥N_k} (|X_n − X| ≥ 1/k). Thus, if A_ε = ∪_{k≥1} A_{ε,k}, then μ(A_ε) ≤ ε. Finally, show that X_n(ω) → X(ω) as n → ∞, uniformly in ω ∈ A_ε^c.
12. Show that the complement A^c of the set A of (pointwise) convergence of a sequence of r.v.s {X_n}, n ≥ 1, is expressed as follows:
A^c = ∪_{r,s} {ω ∈ Ω; lim inf_{n→∞} X_n(ω) ≤ r < s ≤ lim sup_{n→∞} X_n(ω)},
where the union is taken over all rationals r and s with r < s.
13. For n = 1, 2, ..., let X_n, X be r.v.s defined on the measurable space (Ω, A), and suppose that X_n → X (pointwise) as n → ∞. Then show that
(i) X is σ(X_1, X_2, ...)-measurable.
(ii) σ(X_1, X_2, ...) = σ(X_1, X_2, ..., X).
14. For a sequence of r.v.s {X_n}, n ≥ 1, show that the set A = {ω ∈ Ω; ∑_{n=1}^∞ X_n(ω) converges} is in σ(X_m, X_{m+1}, ...) for every m ≥ 1.
15. Let X_n, n ≥ 1, be r.v.s defined on the probability space (Ω, A, P), and suppose that ∑_{n=1}^∞ P(|X_n| > n) < ∞. Then show that lim sup_{n→∞} |X_n|/n ≤ 1 a.s.
Hint: Refer to Exercise 3 in this chapter.
16. For n = 1, 2, ..., let Z_n = (Z_{n1}, ..., Z_{nk}) and Z = (Z_1, ..., Z_k) be k-dimensional r.v.s. We say that Z_n →_P Z if ||Z_n − Z|| →_P 0, or (∑_{j=1}^k (Z_{nj} − Z_j)²)^{1/2} →_P 0, as n → ∞. Then show that Z_n →_P Z if and only if Z_{nj} →_P Z_j, j = 1, ..., k.
17. If X_n →_P X and g: ℜ → ℜ is continuous, then g(X_n) →_P g(X) (see Exercise 8). Show that this need not be true if g is not continuous.
Hint: Consider the function δ_c(x) = 0 for x < c, and δ_c(x) = 1 for x ≥ c, for some constant c ∈ ℜ. (A numerical illustration is sketched right after this exercise list.)
18. If X_n →_μ X and |X_n| ≤ Y a.e., n ≥ 1, then show that |X| ≤ Y a.e. with respect to the measure μ.
19. Refer to Exercise 13 and continue as follows: For n = 1, 2, ..., let X_n, X be r.v.s defined on the measure space (Ω, A, μ), and suppose that X_n → X a.e. as n → ∞. Then, by means of concrete examples, show that each of the following can occur:
(i) X is σ(X_1, X_2, ...)-measurable.
(ii) X is not σ(X_1, X_2, ...)-measurable.
(iii) If X_n → X a.e. as n → ∞, show that the X_n's and X can be modified into X_n′'s and X′, so that X_n′ → X′ pointwise as n → ∞, X′ is σ(X_1′, X_2′, ...)-measurable, and ∑_{n=1}^∞ μ(X_n′ ≠ X_n) = 0 = μ(X′ ≠ X). (As a consequence, instead of the X_n's and X one could use the X_n′'s and X′, without loss of generality, and also ensure that X′ is σ(X_1′, X_2′, ...)-measurable.)
(iv) Consider the measure space (Ω, A, μ), and suppose that, for some ω_0 ∈ Ω, {ω_0} ∈ A and μ({ω_0}) = 0. Define X_n(ω) = 0 on {ω_0}^c, and X_{2n−1}(ω_0) = 2, X_{2n}(ω_0) = 3, n ≥ 1; and X(ω) = 0 on {ω_0}^c, X(ω_0) = 1. Then verify that X_n → X a.e. as n → ∞. Furthermore, modify the X_n's and X as indicated in part (iii), so that the conclusions of that part hold.
20. Show that the convergence X_m − X_n → 0 a.e. as m, n → ∞ is equivalent to the convergence X_{n+ν} − X_n → 0 a.e. as n → ∞, uniformly in ν ≥ 1.
21. Show that, for every ε > 0: μ(|X_m − X_n| ≥ ε) → 0 as m, n → ∞ if and only if μ(|X_{n+ν} − X_n| ≥ ε) → 0 as n → ∞, uniformly in ν ≥ 1.
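A quick numerical illustration of the point of Exercise 17 (the particular sequence below is our own choice): with g = δ_c a step function, X_n →_P c does not force g(X_n) →_P g(c).

```python
# Sketch for Exercise 17: X_n = c - 1/n converges to c in probability (it is even
# deterministic), but with g = indicator of [c, infinity) we get g(X_n) = 0 for
# all n while g(c) = 1, so g(X_n) does not converge to g(X) in probability.
c = 0.0
def g(x):               # the discontinuous function delta_c of the hint
    return 1.0 if x >= c else 0.0

for n in (1, 10, 100, 1000):
    x_n = c - 1.0 / n
    print(n, abs(x_n - c), g(x_n), g(c))   # |X_n - c| -> 0, yet g(X_n) = 0 != 1 = g(c)
```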
CHAPTER 4
The Integral of a Random Variable and its Basic Properties
In this chapter, the concept of the integral of a r.v. X with respect to a measure μ is defined, and the basic properties of integrals are established. The definition of the integral is given first for simple r.v.s, then for nonnegative r.v.s, and finally for arbitrary r.v.s. These things are done in Section 4.1, whereas the basic properties of the integral are discussed in Section 4.2. A brief summary of the results in these sections is also presented at the end of the second section. In the next short and final section of the chapter, the measure space (Ω, A, μ) is replaced by the probability space (Ω, A, P), and the probability distribution P_X of a r.v. X is introduced. There is only one result here, Theorem 13, whose significance lies in that integration over an abstract space Ω is transformed into integration over the real line ℜ.
4.1 Definition of the Integral
Consider the measure space (Ω, A, μ) and let X be a nonnegative simple r.v.; i.e., X = ∑_{j=1}^n α_j I_{A_j}, where α_j ≥ 0, j = 1, ..., n, and {A_j, j = 1, ..., n} is a partition of Ω. To such an X, we assign a possibly extended number, denoted by I(X), as follows: I(X) = ∑_{j=1}^n α_j μ(A_j). We also make throughout the convention that ±∞ × 0 = 0. Then,
Theorem 1. The function I, defined as above on the class of nonnegative simple r.v.s into [0, ∞], is well defined.
Proof. If also X = ∑_{i=1}^m β_i I_{B_i} is another representation of X, we have to prove that ∑_{j=1}^n α_j μ(A_j) = ∑_{i=1}^m β_i μ(B_i). Clearly, {A_j ∩ B_i, j = 1, ..., n, i = 1, ..., m} is a partition of Ω. For j = 1, ..., n, i = 1, ..., m, define c_{ji} as follows:
c_{ji} = α_j = β_i if A_j ∩ B_i ≠ ∅; c_{ji} = whatever (e.g., = 0 for definiteness) if A_j ∩ B_i = ∅.
Then
∑_{j,i} c_{ji} μ(A_j ∩ B_i) = ∑_j ∑_i c_{ji} μ(A_j ∩ B_i) = ∑_j ∑_{i: A_j ∩ B_i ≠ ∅} c_{ji} μ(A_j ∩ B_i)
= ∑_j ∑_{i: A_j ∩ B_i ≠ ∅} α_j μ(A_j ∩ B_i) = ∑_j α_j ∑_i μ(A_j ∩ B_i) = ∑_j α_j μ(A_j);
i.e., ∑_{j,i} c_{ji} μ(A_j ∩ B_i) = ∑_j α_j μ(A_j), and in a similar fashion
∑_{j,i} c_{ji} μ(A_j ∩ B_i) = ∑_i β_i μ(B_i). Thus
∑_j α_j μ(A_j) = ∑_i β_i μ(B_i) (see also Exercise 15).
Theorem 2. Let X, Y be nonnegative simple r.v.s such that X ≥ Y. Then I(X) ≥ I(Y); furthermore, if the measure μ is finite, then I(X) > I(Y) if and only if X > Y on a set of positive measure. In particular, X ≥ 0 implies I(X) ≥ 0; furthermore, if the measure μ is finite, then I(X) > 0 if and only if X > 0 on a set of positive measure.
Proof. Let X = ∑_{j=1}^n α_j I_{A_j}, Y = ∑_{i=1}^m β_i I_{B_i}. Then, clearly, X = ∑_{j,i} α_j I_{A_j ∩ B_i}, Y = ∑_{i,j} β_i I_{A_j ∩ B_i} (since I_∅ = 0), and hence I(X) = ∑_{j,i} α_j μ(A_j ∩ B_i), I(Y) = ∑_{i,j} β_i μ(A_j ∩ B_i). Now X ≥ Y implies α_j ≥ β_i on A_j ∩ B_i (which are ≠ ∅), and hence I(X) ≥ I(Y).
Next, let μ be finite. If X > Y on a set E with μ(E) > 0, one has that α_j ≥ β_i on all A_j ∩ B_i (which are ≠ ∅), and α_j > β_i on E ∩ (A_j ∩ B_i) for some A_j ∩ B_i with μ[E ∩ (A_j ∩ B_i)] > 0. Since
α_j μ(A_j ∩ B_i) = α_j μ[E ∩ (A_j ∩ B_i)] + α_j μ[E^c ∩ (A_j ∩ B_i)] > β_i μ[E ∩ (A_j ∩ B_i)] + β_i μ[E^c ∩ (A_j ∩ B_i)] = β_i μ(A_j ∩ B_i),
we have that I(X) > I(Y). Finally, if I(X) > I(Y), there must exist (by contradiction) a set E such that μ[E ∩ (A_j ∩ B_i)] > 0 for some A_j ∩ B_i on which α_j > β_i (whereas always α_j ≥ β_i). Indeed, if α_j = β_i on all A_j ∩ B_i ≠ ∅, then I(X) = I(Y), a contradiction. If α_j > β_i on A_j ∩ B_i with μ(A_j ∩ B_i) = 0 and α_j = β_i on all A_j ∩ B_i with μ(A_j ∩ B_i) > 0, then again I(X) = I(Y), a contradiction. Thus, there must exist A_j ∩ B_i ≠ ∅ with μ(A_j ∩ B_i) > 0 on which α_j > β_i; i.e., X > Y on a set of positive measure, namely E = A_j ∩ B_i for some j, i. The special case follows by taking Y = 0.
(A figure in the original displays the sets A_j, B_i, and E.)
Next, let X be a nonnegative r.v. (not necessarily a simple r.v.). Then, by the Corollary to Theorem 17, Chapter 1, there exist nonnegative simple r.v.s X_n such that X_n ↑ X. To such an X we assign the quantity I(X) as follows: I(X) = lim_{n→∞} I(X_n). This limit exists and, clearly, I(X) ≥ 0 (but it may happen that I(X) = ∞). Then we have the following theorem.
Theorem 3. The (possibly extended) function I, defined as above on the class of nonnegative r.v.s into [0, ∞], is well defined.
Remark 1. Before we proceed with the proof of this theorem, we observe that the I just defined coincides with the I defined before on the class of nonnegative simple r.v.s, since in such a case we can take X_n = X, n = 1, 2, .... Thus this I is an extension from the class of nonnegative simple r.v.s to the class of nonnegative r.v.s.
Proof of Theorem 3. In order to prove the theorem, it suffices to prove that:
if Y is a nonnegative simple r.v. with Y ≤ X, then I(Y) ≤ lim I(Z_n) for any nonnegative simple r.v.s Z_n ↑ X,   (4.1)
where here and in the sequel, all limits are taken as n → ∞, unless otherwise specified. In fact, if (4.1) is true, then for 0 ≤ X n simple ↑ X , 0 ≤ Yn simple ↑ X , we have Yn ≤ X implies I (Yn ) ≤ lim I (X n ) and lim I (Yn ) ≤ lim I (X n ). Also X n ≤ X implies I (X n ) ≤ lim I (Yn ) and lim I (X n ) ≤ lim I (Yn ). Thus lim I (X n ) = lim I (Yn ). In order to establish (4.1) we distinguish three cases. de f 0. Case 1. μ() <∞, min Y = m > rn We have X n = i=1 αni I Ani , Y = sj=1 β j I B j . Choose ε > 0 such that ε < m and define Cn as follows: Cn = (X n > Y − ε). Then Cn ↑ , since X n ↑ X and X ≥ Y imply X > Y − ε. Indeed, from the definition of Cn and the fact that X n ↑ X , de f we have Cn ↑, so that lim Cn = C = n≥1 Cn . Then C = . If not, there exists ω ∈ but ω ∈ / C if and only if ω ∈ n≥1 Cnc , which implies X n (ω) ≤ Y (ω) − ε for every n, and hence X (ω) ≤ Y (ω) − ε, a contradiction. Next, rn rn I (X n ) = αni μ(Ani ) ≥ αni μ(Ani ∩ Cn ), i=1
i=1
since Ani ⊇ Ani ∩ Cn . n s Also μ(Ani ∩ Cn ) = sj=1 μ(Ani ∩ Cn ∩ B j ). Hence I (X n ) ≥ ri=1 j=1 αni × μ(Ani ∩ Cn ∩ B j ).Now,since ω ∈ (Ani ∩ Cn ∩ B j ) implies αni > β j − ε, rn s we get I (X n ) ≥ j=1 (β j − ε)μ(Ani ∩ C n ∩ B j ), and this is equal to: i=1 s (β − ε)μ(C ∩ B ); i.e., we have n j j=1 j I (X n ) ≥
s
(β j − ε)μ(Cn ∩ B j ).
j=1
(Notice that up to now no use of the finiteness of μ has been made.)
(4.2)
57
58
CHAPTER 4 The Integral of a Random Variable and its Basic Properties
Now, Cn ∩ B j = B j − B j ∩ Cnc . Hence μ(Cn ∩ B j ) = μ(B j ) − μ(Cnc ∩ B j ), since Cnc ∩ B j ⊆ B j and μ is finite. Thus I (X n ) ≥
s
(β j − ε)[μ(B j ) − μ(Cnc ∩ B j )]
j=1
=
s
β j μ(B j ) − ε
j=1
s
μ(B j ) −
j=1
= I (Y ) − εμ() −
s
s (β j − ε)μ(Cnc ∩ B j ) j=1
(β j − ε)μ(Cnc ∩ B j )
j=1
≥ I (Y ) − εμ() − (max Y )
s
μ(Cnc ∩ B j )
j=1
= I (Y ) − εμ() − (max Y )μ(Cnc ); i.e., I (X n ) ≥ I (Y ) − εμ() − (max Y )μ(Cnc ).
(4.3)
Now, Cn ↑ if and only if Cnc ↓ , which implies (since μ is finite) μ(Cnc ) ↓ μ() = 0. Therefore, letting n → ∞ in (4.3), we get lim I (X n ) ≥ I (Y ) − εμ(). Now letting ε → 0, we get I (Y ) ≤ lim I (X n ), which is (4.1). de f Case 2. μ() =∞, min Y = m > 0. s Then I (X n ) ≥ j=1 (β j − ε)μ(Cn ∩ B j ) according to (4.2), where no use of the finiteness of μ was made. Now, s s (β j − ε)μ(Cn ∩ B j ) ≥ (m − ε)μ(Cn ∩ B j ) j=1
j=1
= (m − ε)
s
μ(Cn ∩ B j )
j=1
= (m − ε)μ(Cn ); i.e., I (X n ) ≥ (m − ε)μ(Cn ).
(4.4)
Since μ(Cn ) ↑ μ() = ∞ and m − ε > 0, we get, by taking the limits in (4.4) as (4.1). n → ∞ : lim I (X n ) = ∞ and hence I (Y ) ≤ lim I (X n ), which is Case 3. μ() ≤ ∞, min Y = m = 0. Recall that 0 ≤ Y = sj=1 β j I B j ≤ X , n and we wish to show that for any 0 ≤ ri=1 αni I Ani = X n ↑ X , we have I (Y ) ≤ lim I (X n ). de f B jk where the If A = (Y > 0), then min A Y = m A > 0, whereas A = summation is over those jk s for which the corresponding β jk s are > 0, and of course,
Ac = B jl , where the summation is over those jl s for which the corresponding β jl s are = 0. Let X (A) be the restriction of X to A, and set Y (A) =
s
β j I B j ∩A ,
j=1
X n(A) =
rn
αni I Ani ∩A .
i=1
Then, clearly, 0 ≤ X n(A) simple r.v.s ↑ X (A) , since X n(A) = X n and X (A) = X on A, and X n ↑ X . Also, Y (A) ≤ X (A) because Y (A) = Y and X (A) = X on A, and Y ≤ X . Since m A = min Y (A) > 0, case 1 or case 2 applies (depending on whether (A) μ() < ∞ or μ() = ∞, respectively) and gives that I (Y (A) ) ≤ lim I (X n ). However, Y = β j IB j = β jk I B jk + β jl I B jl = β jk I B jk , j
so that I (Y ) = ≤
β jk μ(B jk ) =
β jk μ(B jk ∩ A), since B jk ∩ A = B jk
β j μ(B j ∩ A) = I (Y (A) ); i.e., I (Y ) ≤ I (Y (A) ).
j
Also, I (X n(A) ) =
αni μ(Ani ∩ A)
i
≤
αni μ(Ani ) = I (X n ),
i (A)
so that lim I (X n ) ≤ lim I (X n ). Combining the results obtained, we have then I (Y ) ≤ I (Y (A) ) ≤ lim I (X n(A) ) ≤ lim I (X n ), as was to be seen.
So if X is a ≥ 0 r.v. then I (X ) is well defined (but it may be = ∞). Now let X be any r.v. Then X = X + − X − , where X + , X − ≥ 0. Thus I (X + ), I (X − ) are well defined. If not both I (X + ), I (X − ) are = ∞, we define I (X ) by: I (X ) = I (X + ) − I (X − ). Clearly, if X ≥ 0 then X = X + and I (X ) = I (X + ), since I (0) = 0; and if X is a nonnegative simple r.v., then I (X ) coincides with the I (X ) as defined at the beginning of this section, since we can take X n = X , n ≥ 1. Thus this I on the class of all r.v.s for which I (X + ) − I (X − ) is defined is an extension of the I on the class of ≥ 0 r.v.s. Definition 1. The possibly extended function I defined on the class of r.v.s X for which I (X + ) − I (X − ) exists is calledthe integralof X over the space with respect is to the measure μ and is denoted by X dμ or X dμ, or X if no confusion possible. In particular, if μ is a probability measure P, then X dμ = X d P is denoted by E X and is called the (mathematical) expectation of X .
Thus, if X = ∑_{i=1}^n α_i I_{A_i} ≥ 0, then ∫X dμ = ∑_{i=1}^n α_i μ(A_i); if the r.v. X is ≥ 0 but not necessarily a simple r.v., then ∫X dμ = lim_{n→∞} ∫X_n dμ for any nonnegative simple r.v.s X_n ↑ X as n → ∞; and for any r.v. X, ∫X dμ = ∫X⁺ dμ − ∫X⁻ dμ, provided that at least one of ∫X⁺ dμ, ∫X⁻ dμ is < ∞. If now A ∈ A, then I_A is measurable, and hence so is X I_A for any r.v. X. Then,
Definition 2. The integral of the r.v. X over A ∈ A, denoted by ∫_A X dμ, is defined by ∫_A X dμ = ∫(X I_A) dμ, provided this latter integral exists.
Definition 3. We say that X is integrable if both ∫X⁺ and ∫X⁻ are < ∞; i.e., if ∫X exists and is finite.
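For concreteness (a toy example of our own, in the spirit of Exercises 10 and 11 later in this chapter), the definition of the integral of a simple r.v. can be carried out mechanically on a finite measure space:

```python
# Sketch: I(X) = sum_j alpha_j * mu(A_j) for a nonnegative simple r.v. X on a
# finite space Omega with the discrete sigma-field.  (Toy data of our own choosing;
# all masses are finite, so the convention 0 * infinity = 0 never comes into play.)
mu = {"w1": 2.0, "w2": 1.0, "w3": 3.0, "w4": 7.0}       # a measure on the discrete sigma-field

def integral_simple(values, mu):
    # values: dict omega -> X(omega); groups Omega into level sets A_j = (X = alpha_j)
    total = 0.0
    for alpha in set(values.values()):
        mass = sum(m for w, m in mu.items() if values[w] == alpha)
        total += alpha * mass
    return total

X = {"w1": 1.0, "w2": 1.0, "w3": 4.0, "w4": 0.0}        # X = 1*I_{A1} + 4*I_{A2} + 0*I_{A3}
print(integral_simple(X, mu))                            # 1*(2+1) + 4*3 + 0*7 = 15.0
```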
4.2 Basic Properties of the Integral In this section, we discuss the basic properties of the integral stated in the form theorems. Theorem 4. If X and Y are ≥ 0 r.v.s, then (X + Y ) = X + Y . m Proof. i=1 αi I Ai mis, X = n That m n First, let X and Y be≥n 0 simple r.v.s. α I and Y = β I = β I , i A ∩B j B j A ∩B i j j i j αi ≥ 0, i i=1 j=1 j=1 j=1 i=1 = 1, . . . , n. 1, . . . , m, β j ≥ 0, j m m n (αi + β j )I Ai ∩B j , and X = i=1 αi μ(Ai ), Y Then X + Y = i=1 j=1 n j=1 β j μ(B j ), and (X + Y ) = + +
β j μ(Ai i=1 j=1 m n βj
j=1
∩ Bj) =
μ(Ai ∩ B j ) =
i=1
= = =
m m n n (αi + β j )μ(Ai ∩ B j ) = αi μ(Ai ∩ B j ) i=1 j=1
m n
of
i=1 j=1 m i=1 m i=1
αi
n
μ(Ai ∩ B j )
j=1
αi μ(Ai ) +
n
β j μ(B j ) = X + Y .
j=1
Then, clearly, Next, let X n , Yn be ≥ 0 simple r.v.s such that X n ↑ X , Yn ↑ Y . n αni I Ani , X n +Yn are ≥ 0 simple r.v.s such that (X n +Yn ) ↑ (X +Y ). Let X n = rj=1 sn rn sn X n + Yn = Y = j=1 βnj I Bn j . Then i=1 j=1 (αni + βn j )I Ani ∩Bn j , and n just seen for the nonnegative simple r.v.s X (X n + Yn ) = X n + Yn , as was and Y ; i.e., (X n + Yn ) = X n + Yn . Taking the limits, as n → ∞, we then get (X + Y ) = X + Y . Remark 2. The theorem generalizes in an obvious manner to any finite number of nonnegative r.v.s. Theorem 5. Let A, B ∈ A with A ∩ B = . Then, (i) If A+B X exists, then A X , B X also exist and A+B X = A X + B X .
(ii) If A X , B X exist, exist (i.e., it is not of the form and A X + B X also ± ∞ ∓ ∞), then A+B X also exists and A+B X = A X + B X . Proof.
(i) Clearly
+ X I A+B = X + I A+B = X + I A + X + I B = (X I A )+ + (X I B )+ ,
since A ∩ B = . So
+ X I A+B = (X I A )+ + (X I B )+ and similarly, (4.5)
− X I A+B = (X I A )− + (X I B )− .
+ Now that A+B X = X I A+B exists implies that at least one of X I A+B ,
−
+ XI is < ∞. Let X I A+B < ∞. By (4.5) and Theorem 4, we have A+B + X I A+B = (X I A )+ + (X I B )+ and hence both (X I A )+ , (X I B )+ are < ∞.
− Thus X I A+B = (X I A )− + A X , B X exist. Again by (4.5) and Theorem 4, (X I B )− . Thus
+
− X= X I A+B = X I A+B − X I A+B A+B = (X I A )− + (X I B )− (X I A )+ + (X I B )+ − = (X I B )+ − (X I B )− (X I A )+ − (X I A )− + X+ X ; i.e., X= X+ X. = X IA + X IB = A
B
A+B
−
A
B
The same result follows if weassume that X I A+B < ∞. (ii) Now, let A X , B X , A X + B X exist. Then either (X I A )+ or (X I A )− is < ∞, or both. + − Also, either (X I B ) or (X I B ) is < ∞, or both. From the existence of A X + B X we then have X+ X = (X I A )+ − (X I A )− + (X I B )+ − (X I B )− . A
B
Set
(X I A )+ = a + , +
+
(X I B ) = b , Then
(X I A )− = a − , (X I B )− = b− .
X = a + − a − + b+ − b− ,
X+ A
B
and its existence means only the following cases may occur: ⎧ ± ± ⎨ a , b are all < ∞ a + and/or b+ = ∞ but a − and/or b− finite . ⎩ − a and/or b− = ∞ but a + and/or b+ finite − For any of these cases, the (a+ + b+ ) − (a − grouping + b ) is legitimate, and this + − is equal to A+B X − A+B X = A+B X . Thus A+B X exists and A+B X = A X + B X.
Remark 3. The theorem generalizes in an obvious manner to any finite number of pairwise disjoint events. (see also Exercise 5.) Corollary. If X exists, then for any A ∈ A, A X exists and X = A X + Ac X . Furthermore, if X is integrable over , then X is integrable over any A ∈ A. Proof. The first part of the corollary follows from Theorem 5(i) with A = A, c . As for the second part, we have: the fact that X is that B = A finite implies + − X , X < ∞. Since X + = A+Ac X + = A X + + Ac X + , X − = − = − − + − < ∞, and hence X A+Ac X A X + Ac X , we then get A X , A X is integrable over A. Theorem 6. If X exists and c ∈ , then cX exists and cX = c X . Proof. We distinguish the following cases: Case 1. Let c = −1. Then cX = −X . But:(−X )+ = X − , (−X )− = X + , so that X − = (−X )− − (−X )+ X = X+ − −X = − X . =− (−X )+ − (−X )− = − (−X ) or Case 2. Let c ≥ 0. If c = 0, then c × X = 0implies cX = 0 = 0 = 0 X , since 0 × (±∞) is also 0 (for the case that X = ±∞). We assume now that + + − − + − c > 0. Then (cX ) = cX , (cX ) = cX , so that cX = (cX ) − (cX ) = cX + − cX − ; i.e., cX =
cX + −
cX − ,
(4.6)
provided it exists. + hence0 ≤ cX n simple Since X + ≥ 0, there exist 0 ≤ X n simple r.v.s ↑ X and + r.v.s ↑ cX as n → ∞, which implies cX n → cX + . But cX n = c X n n→∞ + X , we then get cX + = for simple r.v.s, as is readily seen. Since also X n → n→∞ c X + . In a similar way, cX − = c X − , and then (4.6) exists and gives the desired result. Case 3. Let c < 0. Then cX = (−1)[(−c)X] = −1 (−c)X , by case 1, and this is = −(−c) X , by case 2, since −c ≥ 0; i.e., cX = c X .
Theorem 7.
If X ≥ 0, then
X ≥ 0.
r.v.s ↑ X as Proof. Let n → ∞ whereas 0 ≤ X n simple X , we get then X ≥ 0. Since also X n → n→∞
X n ≥ 0 by Theorem 2.
Theorem 8. (i) If X ≤ Y and X , Y exist, then X ≤ Y . (ii) If X ≤ Y and X − < ∞ (so that X exists), then Y exists and X ≤ Y by part (i). (iii) If X ≤ Y and Y + < ∞ (so that Y exists), then X exists and X ≤ Y bypart (i). (iv) If X exists, then X ≤ |X |. Proof. (i) Let 0 ≤ X ≤ Y . ThenY = X + (Y − X ) = X + Z , where Z = Y − X ≥ 0. Hence Z ≥ 0 by Theorem 7. Thus Y = X + Z by Theorem 4. Also, + ≤ Y + , X − ≥ Y − , clearly. Then X ≤ Y . In general, X ≤ Y implies X + ≤ − Y − , by what we have just X ≤ Y + , X − ≥ Y − or − X − + − − ≤ + − Y − or X Y X ≤ Y , since established. Therefore X X , Y exist. − − we (ii) From X − ≥ Y − above, get that X < ∞, which implies Y < ∞, so that Y exists and X ≤ Y by part (i). (iii) From X + ≤ Y + above, we get that Y + < ∞ so that X + < ∞, and hence X exists and X ≤ Y by part (i). X ≤ |X | |X | |X | |X | (iv) Indeed, − ≤ X ≤ implies − ≤ X ≤ or |X |. Theorem 9. If X = Y a.e. and X exists, then Y exists and X = Y . Remark 4. We first observe that if μ(B) = 0, for some B ∈ A, then B Z = 0, for + exist 0 ≤ Z n I B simple r.v.s ↑ Z + I B as any r.v. Z . In fact, Z ≥ 0 implies thatthere + + Z I B = B Z . But B Z n = 0, because, n → ∞; hence B Z n = Z n I B → n→∞ rn rn if Z n = αn j An j , then Z n I B = αn j I B∩An j and B Z n = Z n I B = j=1 j=1 rn α μ(B ∩ An j ) = 0. This implies that B Z + = 0. Similarly, B Z − = 0, so j=1 nj that B Z = 0. Proof of Theorem 9. Let A = (X = Y ). Then A = (X − Y = 0) and A is −1 measurable, since A = (X − Y ) ({0}) and X − Y is a r.v. Now, the existence of X implies the existence of A X , Ac X and X = A X + Ac X , by the Corollary to Theorem 5. But X+ X = X IA + X = Y IA + 0 c A Ac A = Y = Y +0= Y+ Y A
A
A
Ac
63
64
CHAPTER 4 The Integral of a Random Variable and its Basic Properties
(by 4), and this is equal to Remark Y = X.
Y by Theorem 5(ii). So
Y exists and
X is integrable if and only if |X | is integrable. + − + − Proof. Now X being integrable implies X , X < ∞. Since |X | = X + X , we get |X | = X + + X − by Theorem 4, and then |X | < ∞; i.e., |X | is Next, let |X | be integrable. We have |X | = X+ + X − and |X| = X + + integrable. − X by Theorem 4, and |X | < ∞ which implies X + < ∞ and X − < ∞. Thus X is integrable. Theorem 11. If |X | ≤ Y and Y < ∞, then X is integrable. Proof. |X | ≤ Y implies |X | ≤ Y by Theorem 8. But Y < ∞; thus |X | < ∞ if and only if X is finite, by Theorem 10. Theorem 12. If X , Y , and X + Y exist, then (X +Y ) exists and (X +Y ) = X + Y. Proof. The existence of X + Yimplies that if X = ∞, then −∞ < Y ≤ ∞, and if X = −∞, then ∞ > Y ≥ −∞. So if X = Y = ±∞, then X + Y = ±∞, and, by means of simple r.v.s, it is seen (see also Exercise 6) that (X + Y ) = ±∞. So (X + Y ) = X + Y in this case. Next assume that at least one of X , Y is finite and, in order to fix ideas, let Y be finite. This implies that A Y is finite for any A ∈ A by the Corollary to Theorem 5. Now, we consider the following partition of by the sets A j , j = 1, . . . , 6, defined as follows: Theorem 10.
A1 = X ≥ 0, Y ≥ 0, X + Y ≥ 0 A4 = X < 0, Y ≥ 0, X + Y ≥ 0 ,
A2 = X ≥ 0, Y < 0, X + Y ≥ 0 ; A5 = X < 0, Y ≥ 0, X + Y < 0 ,
A3 = X ≥ 0, Y < 0, X + Y < 0 A6 = X < 0, Y < 0, X + Y < 0 . The existence of X , Y implies the existence of A j X , A j Y , j = 1, . . . , 6, by the Corollary to Theorem 5. We will prove that A j (X + Y ) = A j X + A j Y , j = 1, . . . , 6, which will imply that
(X + Y ) =
(X + Y )
=
Aj
j
j
=
j
=
(X + Y ) Aj
j
X+ Aj
X+ Aj
Y Aj
j
Y Aj
=
X+
=
j
Aj
X+
Y
j
Aj
Y
by Remark 3 and Exercise 5. We have A1 : (X + Y )I A1 = X I A1 + Y I A1 with X I A1 , Y I A1 ≥ 0. Then A1 (X + Y ) = (X I A1 +Y I A1 ) = X I A1 + Y I A1 , by Theorem 4, and this is = A1 X + A1 Y . A2 : X I A2 = (X + Y )I A2+ (−Y )I A2 with (X + Y )I A2 , (−Y )I A2 ≥ 0. Then, by Theorem 4, A2 X = X I A2 = [(X + Y )I A2 + (−Y )I A2 ] = (X + Y )I A2 + −Y I A2 = A2 (X + Y ) − A2 Y . Since A2 Y is finite (by the Corollary to Theorem 5), we get A2 (X + Y ) = A2 X + A2 Y . A3 : −Y I A3 = −(X by + Y )I A3+ X I A3 with −(X + Y )I A3 , XI A3 ≥ 0. Then, Theorem 6, − A3 Y = − A3 (X + Y )+ A3 X or A3 Y = A3 (X +Y )− A3 X , and since A3 Y is finite, so is A3 X . Then we get A3 (X + Y ) = A3 X + A3 Y . A4 : Y I A4 = (X +Y )I A4 +(−X )I A4 and A4 Y = A4 (X +Y )+ A4 − X = A4 (X + Y ) − A4 X . Since A4 Y is finite, so is A4 X and A4 (X + Y ) = A4 X + A4 Y . A5 : −X I A5 = Y I A5 +(−X −Y )I A5 and − A5 X = A5 Y − A5 X +Y or − A5 X − X + Y. A5 Y = − A5 X + Y , or A5 X + A5 Y = A5 A6 : −(X + Y )I A6 = −X I A6 − Y I A6 and − A6 X + Y = − A6 X − A6 Y or A6 X + Y = A6 X + A6 Y . The interested reader may find a rather extensive treatment of integration in the reference Vestrup (2003). Here is a brief description of the results obtained in Sections 4.1 and 4.2. For a simple r.v. X , the quantity I (X ) was defined, to be called eventually the integral of X . For two nonnegative simple r.v.s X and Y with X ≥ Y , it follows that I (X ) ≥ I (Y ); and I (X ) > I (Y ) if and only if X > Y on a set of positive μ-measure in the case that μ is finite. For a nonnegative r.v. X , the quantity I (X ) is defined, and it is shown that it is well defined. and set I (X ) = X dμ or just For any r.v. X , define I (X), provided it exists, X . Also, for A ∈ A, define A X dμ or just A X , and also define integrability of a r.v. X . For two nonnegative r.v.s X , Y , it holds (X + Y ) = X + Y . For A, B in A with A ∩ B = ∅, and any r.v. X for which A+B X exists, it follows that A X , B X also exist and A+B X = A X + B X . Furthermore, if the assumption is that all three A X , B X , and A X + B X exist, then A+B X also exists and A+B X = A X + B X .
X exists, then so does A X for every A ∈ A. Furthermore, if X is finite, so is A X for every A ∈ A. If X exists, then cX exists for every c ∈ , and cX = c X . X ≥ 0 implies X ≥ 0. If X ≤ Y and X and Y exist, then X ≤ Y ; if X ≤ Y and X −< ∞, then + X exists and X ≤ Y ; if X ≤ Y and Y < ∞, then X exists and X ≤ Y . If X = Y a.s. and X exists, then Y also exists and X = Y. X < ∞ if and only if |X | < ∞. If |X | ≤ Y and Y < ∞, then X is integrable. If all three X , Y , and X + Y exist, then (X + Y ) exists and (X + Y ) = X + Y. If
4.3 Probability Distributions In this short section, we consider a r.v. X defined on the probability space (, A, P), and on B, define the set function PX as follows: PX (B) = P(X −1 (B)) = P(X ∈ B). Then it is easy to see that PX is a probability measure on B. Definition 4. The set function PX on B associated with the r.v. X is called the (probability) distribution of X . The point function FX : → [0, 1] defined by FX (x) = PX ((−∞, x]) = P(X ≤ x) is called the distribution function (d.f.) of X . (See Theorem 6 and the comments following it in Chapter 2.) It is easily seen that FX satisfies the following properties: (1) is nondecreasing; (2) de f
de f
is continuous from the right; (3) FX (+∞) = lim x→∞ FX (x) = 1; (4) FX (−∞) = lim x→−∞ FX (x) = 0. (See Theorem 6 and the comments following it in Chapter 2.) Conversely, a function on into [0, 1] that satisfies properties (1)–(4) uniquely defines a probability measure Q on B (by Theorem 7 in Chapter 2), and it is also true that there exists a r.v. X whose PX is this distribution Q. The simplest example would be that of taking (, A, P) = (, B, Q) and let X (ω) = ω , ω ∈ . Then, clearly, Q is the distribution of X , PX . Also, if Y ∼ U (0, 1) and X = F −1 (Y ), then X ∼ F, where F −1 (y) = inf{x ∈ ; F(x) ≥ y}. (See, e.g., Theorem 7, Chapter 2 in Roussas, 1997.) The following theorem is an important one in statistics, since it allows integration over the real line rather than over the abstract space . Theorem 13.
Let g : → be measurable. Then we have g(X )d P = g(x)d PX , E g(X ) =
also denoted by g(x)d FX , in the sense that if one side exists, so does the other, and they are equal. (See also the Appendix regrading the notation g(x)d FX .) Proof. We use the familiar method of proving the theorem in several steps starting with indicator functions. Let g = I B some B ∈ B. Then g(X ) = I A (X ), where
A = X −1 (B). Then, g(X )d P = I A (X )d P = P(A) = P(X −1 (B)) = PX (B) = I B (x)d PX = g(x)d PX .
n Next, let g be a nonnegative simple function; i.e., g(x) = i=1 αi I Bi (x) with αi > a (measurable) partition of , and let Ai = 0, i = 1, . . . , n, where {B1 , . . . , Bn } is n αi I Ai (X ), and by linearity of the integral X −1 (Bi ), i = 1, . . . , n. Then g(X ) = i=1 (see also Exercise 8) and the previous step, n n g(X )d P = αi I Ai (X )d P = αi I Bi (x)d PX
i=1
=
n
i=1
αi I Bi (x) d PX =
i=1
g(x)d PX .
Now, let rgn be nonnegative. Then there exist 0 ≤ gn (x) simple ↑ g(x); i.e., α I (x) (αni ≥ 0, i = 1, . . . , rn ), which implies that 0 ≤ gn (x) = rni=1 ni Bni −1 where Ani = X (Bni ). Then ↑ g(X ) as n → ∞ gn (X ) = i=1 αniI Ani (X ) simple → g(X )d P, gn (x)d PX → g(x)d PX , whereas gn (X ) gn (X )d P n→∞ n→∞ d P = gn (x)d PX for all n, by the previous step. Hence g(X )d P = g(x)d PX . + − + Finally, for any g, write g(x) = g (x) − g (x), which implies g(X ) = g (X ) − − g (X ). Now, if g(X )d P exists, it then follows that either g + (X )d P < ∞ or g − (X )d P < ∞
or both. Since g + (X )d P = g + (x)d PX and g − (X )d P = g − (x)d PX , by the previous step, it follows that either + g (x)d PX < ∞ or g − (x)d PX < ∞
or both, respectively. Thus, g(x)d PX exists and + g(X )d P = g (X )d P − g − (X )d P = g + (x)d PX − g − (x)d PX = g(x)d PX . Likewise, the existence of equality.
g(x)d PX implies the existence of
g(X )d P and their
Remark 5. The proof is exactly the same for k-dimensional random vectors. However, we do not intend to present details. Exercises. 1. Construct an example where r.v.s X n and X , defined on a probability space a.s.
P
n→∞
n→∞
(, A, P), are such that X n → X pointwise (hence X n → X and X n → X ), n→∞
but E X n E X .
n→∞ 2. If the r.v. X ≥ 0 and X dμ = 0, then show that μ(X = 0) = 0. 3. If the r.v. X takes on the values n, n = 1, 2, . . ., with probability 1, so that ∞ ∞ n=1 P(X = n) = 1, show that E X = n=1 P(X ≥ n). 4. (i) For a r.v. X and any two disjoint events A and B, show that
(X I A+B )+ = (X I A )+ + (X I B )+ , (X I A+B )− = (X I A )− + (X I B )− . (ii) More generally, for any finite collection of pairwise disjoint events Ai , i = 1, . . . , n, show that
n X Ii=1 Ai
+
=
n i=1
X I Ai
n −
+
− n X I Ai . , X I i=1 = Ai i=1
5. If for the pairwise disjoint events Ai , i = 1, . . . , n, the integral n Ai X exists, ni=1 then the integrals Ai X , i = 1, . . . , n, exist, and n Ai X = i=1 X. i=1 Ai 6. Let X and Y be two X = Y= ∞ or simple r.v.s such that X = Y = −∞. Then show that (X + Y ) exists and (X + Y ) = X + Y (= ∞ or −∞). be two measures on (, A) and suppose that X d(μ1 + μ2 ) 7. Let μ1 and μ 2 exists. Then X dμi exist, i = 1, 2, and X d(μ1 + μ2 ) = X dμ 1 + X dμ2 . n , i = 1, . . . , n, are integrable, then so is the r.v. i=1 X i and 8. If the r.v.s X i n n X X = . i i i=1 i=1 9. Let X and Y be r.v.s with finite second moments, and set E X = μ1 , EY = μ2 , 0 < V ar (X ) = σ12 , 0 < V ar (Y ) = σ22 . Then the covariance and the correlation coefficient of X and Y , denoted respectively, by Cov(X , Y ) and ρ(X , Y ), are defined by: Cov(X , Y ) = E[(X −μ1 )(Y −μ2 )] = E(X Y )−μ1 μ2 and ρ(X , Y ) = Cov(X , Y )/σ1 σ2 . (i) Then show that −σ1 σ2 ≤ Cov(X , Y ) ≤ σ1 σ2 , and Cov(X , Y ) = σ1 σ2 if and only if P Y = μ2 + σσ21 (X − μ1 ) = 1, and Cov(X , Y ) = −σ1 σ2 if and only if P Y = μ2 − σσ21 (X − μ1 ) = 1. (ii) Also, −1 ≤ ρ(X , Y ) ≤ 1, and ρ(X , Y ) = 1 if and only if σ2 P Y = μ2 + (X − μ1 ) = 1, σ1
and ρ(X , Y ) = −1 if and only if P Y = μ2 −
σ2 σ1 (X
− μ1 ) = 1.
Hint: Use Exercise 2. 10. Let = {−2, −1, 3, 7}, and let the measure μ be defined by: μ({−2}) = 2, μ({−1}) = 1, μ({3}) = 3, μ({7}) = 7. Let X be a r.v. defined by: X (−2) = X (−1) = −1, and X (3) = X (7) = 1. Then, for A = {−2, 3, 7}, compute the integral A X dμ. 11. Let (, A, μ) = ((−5, 5), B(−5,5) , λ) where λ is the Lebesgue measure, and let the r.v. X be defined as follows: ⎧1 , ω ∈ (−5, 2) ⎪ ⎪ ⎪ 21 ⎨ ,ω=2 X (ω) = 3 ⎪ 1, ω ∈ (2, 3] ⎪ ⎪ ⎩ 0, ω ∈ (3, 5). Then, for A = [−1, 4], compute the integral A X dμ. 12. Let = {0, 1, 2, . . .}, let A be the discrete σ -field of subsets of , and on A, define the function μ by: μ(A) = number of nonnegative integers in A. Then show that μ is a measure, and indeed, a σ -finite measure. 13. Let g : → (0, ∞) be nondecreasing in (0, ∞) and symmetric about 0 (g(−x) = g(x), x ∈ ), and let X be a r.v. such that E g(X ) < ∞. Then show that: P(|X | ≥ c) ≤ E g(X )/g(c) for every c > 0. 14. For a r.v. X , show that |X | 1 E for every c > 0. P(|X | ≥ c) ≤ 1 + c 1 + |X | Hint: Use Exercise 13 above. 15. Let X be a simple r.v. defined on (, A), so that (see Definition 13 in Chapter 1) X = nj=1 α j I A j , where {A j , j = 1, . . . , n} is a (measurable) partition of , m and the α j , j = 1, . . . , n, are assumed to be distinct. Next, let X = i=1 βi I Bi be any other representation of X . Then show that each A j , j = 1, . . . , n, is the sum of some Bi ’s. Remark 6. Under the assumption that the α j , j = 1, . . . , n, are distinct, the partition {A j , j = 1, . . . , n} may be called irreducible. For such a partition, there is a unique representation of the respective simple r.v. and consequently, Theorem 1 is rendered superfluous. 16. If X and Y are two identically distributed integrable r.v.s, then E X I(|X |≤c) = E Y I(|Y |≤c) for any constant c.
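To close the chapter, here is a small simulation sketch of our own (the exponential distribution is an assumption chosen purely for illustration) of the two facts used in Section 4.3: X = F⁻¹(Y) with Y ∼ U(0, 1) has d.f. F, and, by Theorem 13, E g(X) may be computed either on Ω or on ℜ through P_X.

```python
# Sketch: for F the Exp(1) d.f., F^{-1}(y) = -ln(1 - y); X = F^{-1}(U) with
# U ~ U(0,1) then has distribution P_X = Exp(1).  Theorem 13 says
# E g(X) = integral of g dP_X; for g(x) = x**2 the right-hand side equals 2
# (a known Exp(1) moment, used here only as a check).
import random, math

random.seed(0)
n = 200_000
sample = [-math.log(1.0 - random.random()) for _ in range(n)]   # X = F^{-1}(U)
lhs = sum(x * x for x in sample) / n                            # E g(X), by simulation on Omega
rhs = 2.0                                                       # integral of x^2 dP_X for Exp(1)
print(lhs, rhs)                                                 # lhs is close to 2.0
```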
CHAPTER 5
Standard Convergence Theorems, The Fubini Theorem
This chapter consists of two sections. In the first section, we discuss the standard convergence theorems, such as the Lebesgue Monotone Convergence Theorem, the Fatou–Lebesgue Theorem, and the Dominated Convergence Theorem. The Lebesgue Monotone Convergence Theorem and the Dominated Convergence Theorem provide conditions under which, from the limit of an integral, one can pass to the integral of the limit. The Fatou–Lebesgue Theorem deals with inequalities, involving lim inf and lim sup, rather than equalities as was the case in the former two theorems. As an application, two theorems are discussed, on the basis of which the interchange of the operations of differentiation and integration is valid. This section is concluded with the introduction of the concept of convergence in distribution. Its relation to convergence in probability is mentioned, and a result, particularly useful in Statistics, is stated, involving convergence in distribution of two sequences of r.v.s. The purpose of the second section is to establish the Fubini Theorem. This result gives conditions under which the interchange of the order of integration, involving a double integral, is valid. A number of auxiliary results have to be established first, and these include the so-called Product Measure Theorem.
5.1 Standard Convergence Theorems and Some of Their Ramifications Theorem 1 (Lebesgue Monotone Convergence Theorem; interchange of the order of lim and for ≥ 0 nondecreasing sequences If 0 ≤ X n ↑ X as n→ ∞, of r.v.s). where X may be an extended r.v., then X n ↑ X as n → ∞ (i.e., lim X n = n→∞ lim X n ). n→∞
Proof. In the sequel all limits are specified. taken as n → ∞ unlesss otherwise 0 ≤ X n ↑ X implies that X n , X exist, and X n ↑ implies that X n ↑. So, to show X n ↑ X , let 0 ≤ X nk simple r.v.s ↑ X n , k → ∞, and define the r.v.s Yk by Yk = max X nk , k = 1, 2, . . .: 1≤n≤k
X 11 X 12 · · ·
X 1k
X 21 X 22 · · · X 2k · · · · · · ·
···
↑ X1
· · · ↑ X2 · · · ·
X k1 X k2 · · ·
X kk
···
↑ Xk
· · · · · · · · · · · X n1 X n2 · · · X nk · · · ↑ X n ·
·
·
·
·
·
·
·
·
·
·
Then, clearly, 0 ≤ Yk simple ↑, X nk ≤ Yk , n = 1, . . . , k. Next, Yk ≤ X k . In fact, since Yk = max X nk , we have that for ω ∈ there exists 1 ≤ n o (ω) ≤ k such 1≤n≤k
that Yk (ω) = X n o (ω),k . But X n o (ω),k (ω) ≤ X n o (ω) (ω) since X n o,k ↑ X n o as k → ∞, and X n o (ω) (ω) ≤ X k (ω), since n o (ω) ≤ k and X n ↑. Thus Yk ≤ X k and therefore X nk ≤ Yk ≤ X k , n = 1, . . . , k. Keeping n fixed and letting k → ∞, we get X n ≤ lim Yk ≤ X . k→∞
Now letting n → ∞, we get X ≤ lim Yk ≤ X , so that lim Yk = X . So 0 ≤ Yk simpler r.v.s ↑ X k→∞ k→∞ implies that Yk ↑ X as k → ∞. From Yk ≤ X k ≤ X , we get Yk ≤ X k ≤ X so that lim Yk = X ≤ lim sup X k ≤ X . k→∞
k→∞
So X k → X as k → ∞, and of course, X k ↑ X . In other words, lim X n = lim X n . Remark 1. Clearly, the convergence X n ↑ X may be only a.e., and X n ≥ 0 may also be true only a.e. Corollary 1 (Interchanging the and in a series of ≥ 0 r.v.s). Xn. (i) If X n ≥ 0, n = 1, 2, . . . , then n Xn = n (ii) If the r.v. X ≥ 0 a.e. and {A j , j ≥ 1} are pairwise disjoint, then A j X = j j Aj X. Proof. ∞ (i) Let Yn = nj=1 X j . Then 0 ≤ Yn ↑ ∞ X j and hence Yn ↑ X . j=1 n n n ∞ j=1 j X X X X = . Thus lim = But Yn = j j j = j=1 j j=1 j=1 j=1 ∞ j=1 X j . (ii) = X = X IA j X = X I I A j j j X IA j = j j Aj j Aj (by part (i)) and this equals to j Aj X.
5.1 Standard Convergence Theorems and Some of Their Ramifications
a.e. Corollary 2. If Y ≤ X n ↑ X (a possibly extended r.v.) and Y < ∞, then a.e. X n ↑ X (i.e., here 0 is replaced by the integrable r.v. Y ). Proof. From Y ≤ X n a.e., we have Y + ≤ X n+ a.e. and X n− ≤ Y − a.e., so that X n exists. Next, Y ≤ X n ↑ X implies 0 ≤ X n − Y ↑ X − Y and this, by the theorem, a.e. a.e. a.e. a.e. implies (X n − Y ) ↑ (X − Y ) or (by Theorem 12 in Chapter 4) Xn ↑ X X n − Y ↑ X − Y or since Y < ∞. Theorem 2 (Fatou–Lebesgue Theorem; interchange of
and lim inf, lim sup, lim).
a.e.
(i) If the r.v.s Y , X n , n = 1, 2, . . . are such that Y ≤ ≤X n , n = 1, 2, . . . , and Y is integrable, then lim inf X n ≤ lim inf X n . n→∞ n→∞ Xn ≤ (ii) If X n ≤ Z , n = 1, 2, . . . , and Z is integrable, then lim sup a.e. n→∞ lim sup X n . n→∞ a.e.
a.e.
a.e.
(iii) If Y ≤ X n ≤ Z , n = 1, 2, . . . , where Y and Z are as above and X n → X , a n→∞ X and |X n | → |X | . possibly extended r.v., then X n → n→∞
Proof.
n→∞
In what follows, all limits are taken as n → ∞.
(i) Assume first that 0 ≤ X n , n = 1, 2, . . . , and define Yn by Yn = inf X k . Then k≥n
a.e.
lim Yn = lim inf X k = lim inf X n and this convergence is, clearly, ↑; i.e., k≥n
a.e.
0 ≤ Yn ↑ lim inf X n . Then the Lebesgue Monotone Convergence Theorem n applies and gives: Yn ↑ lim inf X n . On the other hand, Yn ≤ X n . Thus X n and lim Yn ≤ lim inf X n , so that Yn ≤ lim inf X n ≤ lim inf X n . a.e.
In the general case, we consider X n −Y that are ≥ 0, n = 1, 2, . . . , and apply the previous results. Specifically, Y ≤ lim inf X n a.e. implies (lim inf X n )− ≤ Y − a.e., so that lim inf X n exists. We then get lim inf(X n − Y ) ≤ lim inf (X n − Y ) or lim inf X n − Y ≤ lim inf X n − Y
73
74
CHAPTER 5 Standard Convergence Theorems, The Fubini Theorem
or
lim inf X n ≤ lim inf
X n , since
Y is finite.
Then, by part (i), (ii) X n ≤ Z a.e. or −Z ≤ −X n a.e. and −Z is integrable. lim inf(−X n ) ≤ lim inf −X n = lim inf(− X n ). But lim inf(−X n ) = − lim sup X n . Then
− lim sup X n ≤ lim inf − X n = − lim sup X n or
lim sup X n ≤ − lim sup X n or lim sup X n ≤ lim sup X n .
−
a.e. (iii) Y ≤ X n , n ≥ 1, Y < ∞, imply, by (i), lim inf X n ≤ lim inf X n . a.e. Also, X n ≤ Z , n ≥ 1, | Z | < ∞; thus, by (ii), lim sup Xn ≤ lim sup X n . Hence: lim inf X n = X ≤ lim inf X n ≤ lim sup X n ≤ lim sup X n = X . Thus X n → X . Next, Y ≤ X n ≤ Z or − Z ≤ −X n ≤ −Y so that 0 ≤ |X n | ≤ |Y | + |Z | a.e. and (|Y | + |Z |) < ∞. Since |X n | → |X |, we get |X n | → |X |.
We proceed with a definition and a lemma. Definition 1. Let μ be a measure on A and let ϕ be a set function on A. We say that ϕ is absolutely continuous with respect to μ, and write ϕ μ, if for every A ∈ A for which μ(A) = 0, we have ϕ(A) = 0. Lemma 1. Let ϕ be a nonnegative, σ -additive, and finite set function. Then ϕ μ if and only if, for every ε > 0, there exists δ = δ(ε) > 0 such that μ(A) < δ imply ϕ(A) < ε. For its proof, refer to Exercise 6. Also, see Theorem 27.1 on page 191 in Munroe (1953). Corollary. Let X be integrable and define the finite and ≥0 set function ϕ (actually, finite measure) as follows: ϕ(A) = A |X | dμ, A ∈ A. Then ϕ μ. Remark 2. ϕ as defined above is called the indefinite integral of |X |, and is σ additive on account of Corollary 1(ii) to Theorem 1. Proof of Corollary.
See Remark 4 in Chapter 4.
Alternatively, let the r.v.s $X_n$ be defined by
$$X_n(\omega) = \begin{cases} X(\omega) & \text{if } |X(\omega)| < n \\ n & \text{if } X(\omega) \ge n \\ -n & \text{if } X(\omega) \le -n, \end{cases} \qquad n = 1, 2, \ldots$$
(so that $|X_n| \le n$). Then, clearly, $0 \le |X_n| \uparrow |X|$ as $n \to \infty$ and hence $\int |X_n| \uparrow \int |X| < \infty$. Choose $n_0$ such that $\int |X| < \int |X_{n_0}| + \frac{\varepsilon}{2}$. Set $\delta = \delta(\varepsilon) = \frac{\varepsilon}{2 n_0}$ and let $A$ be such that $\mu(A) < \delta$. Then
$$\varphi(A) = \int_A |X| = \int_A \big(|X| - |X_{n_0}|\big) + \int_A |X_{n_0}| \le \frac{\varepsilon}{2} + n_0\,\mu(A),$$
by the fact that $|X_{n_0}| \le n_0$ and the choice of $n_0$. This is $< \frac{\varepsilon}{2} + n_0 \times \frac{\varepsilon}{2 n_0} = \varepsilon$; i.e., $\varphi(A) < \varepsilon$. $\square$

Theorem 3 (Dominated Convergence Theorem; interchange of $\int$ and $\lim$). If $|X_n| \le Y$ a.e., $n = 1, 2, \ldots$, with $Y$ integrable, and either (a) $X_n \xrightarrow[n\to\infty]{a.e.} X$ or (b) $X_n \xrightarrow[n\to\infty]{\mu} X$, then
(i) $\int_A X_n \to \int_A X$ uniformly in $A \in \mathcal{A}$;
(ii) $\int X_n \to \int X$;
(iii) $\int |X_n - X| \to 0$;
(iv) (i) and (iii) are equivalent.

Remark 3.
(i) In the proof we will use the following property: if $\int Z$ exists, then $|\int Z| \le \int |Z|$, as was seen in Theorem 8(iv) of Chapter 4.
(ii) Part (iv) is true under an integrability assumption only on the r.v.s involved. This integrability is ensured here by the assumptions made (see parts (a) and (b) in the proof of the theorem).

Proof of Theorem 3. All limits that follow are taken as $n \to \infty$ unless otherwise specified. We first establish (iv) under either mode of convergence of $X_n$ to $X$.
(iv) Let $\int |X_n - X| \to 0$. Then
$$\left|\int_A X_n - \int_A X\right| = \left|\int_A (X_n - X)\right| \le \int_A |X_n - X| \le \int |X_n - X| \to 0$$
independently of $A$. So, $\int_A X_n \to \int_A X$ uniformly in $A \in \mathcal{A}$. Next,
$$\int |X_n - X| = \int (X_n - X)^+ + \int (X_n - X)^- = \int_{(X_n - X \ge 0)} (X_n - X) - \int_{(X_n - X < 0)} (X_n - X) = \int_{A_n} X_n - \int_{A_n} X - \int_{A_n^c} X_n + \int_{A_n^c} X,$$
where $A_n = (X_n - X \ge 0)$. So, if $\int_A X_n \to \int_A X$ uniformly in $A \in \mathcal{A}$, then $\int_{A_n} X_n - \int_{A_n} X \to 0$ and $\int_{A_n^c} X_n - \int_{A_n^c} X \to 0$, which imply $\int |X_n - X| \to 0$.
Now, since (i) and (iii) are equivalent and (i) implies (ii), it suffices to show (iii) only.

(a) Assume first that $X_n \to X$ a.e. Then $|X_n| \le Y$ a.e. implies $|X| \le Y$ a.e., so that $0 \le |X_n - X| \le 2Y$ a.e. and $\int 2Y$ is finite. Then Theorem 2(iii) gives $\int |X_n - X| \to 0$, since $|X_n - X| \to 0$ a.e.

(b) Assume next that $X_n \xrightarrow{\mu} X$, or $X_n - X \xrightarrow{\mu} 0$, or $|X_n - X| \xrightarrow{\mu} 0$. By setting $Z_n = |X_n - X|$ and $Z = 2Y$, we have then: $0 \le Z_n \le Z$ a.e. (by Exercise 18 in Chapter 3), $Z$ integrable, $Z_n \xrightarrow{\mu} 0$, and we want to prove that $\int Z_n \to 0$. For $r = 1, 2, \ldots$, define $Y_r$ as follows:
$$Y_r(\omega) = \begin{cases} 1/r & \text{if } Z(\omega) > 1/r \\ Z(\omega) & \text{if } Z(\omega) \le 1/r. \end{cases} \tag{5.1}$$
Then, clearly, $0 \le Y_r \le Z$ and $Y_r \to 0$ as $r \to \infty$ (since $0 \le Y_r \le \frac1r$ everywhere), and this implies $\int Y_r \xrightarrow[r\to\infty]{} 0$ by Theorem 2(iii). Now, let $Z_n(\omega) < \frac1r$. Then, if $Z(\omega) > \frac1r$, it follows from (5.1) that $Y_r(\omega) = 1/r$ and hence $Z_n(\omega) \le Y_r(\omega)$. If $Z(\omega) \le \frac1r$, then $Y_r(\omega) = Z(\omega)$ by (5.1), and hence $Z_n(\omega) \le Y_r(\omega)$, since always $Z_n \le Z$. To summarize, $Z_n < \frac1r$ implies $Z_n \le Y_r$, and so
$$\int_{(Z_n < \frac1r)} Z_n \le \int_{(Z_n < \frac1r)} Y_r \le \int Y_r \xrightarrow[r\to\infty]{} 0. \tag{5.2}$$
Now, $Z_n \xrightarrow{\mu} 0$ implies $\mu(Z_n \ge 1/r) \xrightarrow[n\to\infty]{} 0$. So, if we define $\varphi(A) = \int_A Z\,d\mu$, $A \in \mathcal{A}$, then $\varphi \ll \mu$ and $\varphi$ is finite because $Z$ is integrable. Therefore, by Lemma 1,
$$\int_{(Z_n \ge \frac1r)} Z \xrightarrow[n\to\infty]{} 0. \tag{5.3}$$
By $Z_n \le Z$ and (5.2) and (5.3), we then get
$$\int Z_n = \int_{(Z_n < \frac1r)} Z_n + \int_{(Z_n \ge \frac1r)} Z_n \le \int Y_r + \int_{(Z_n \ge \frac1r)} Z \to 0,$$
by letting $n \to \infty$ first and then letting $r \to \infty$. $\square$
Remark 4. Theorems 1–3 remain true if the index set $\{1, 2, \ldots\}$ is replaced by $T \subseteq \Re$ and $n \to \infty$ is replaced by $t \to t_0$, $t, t_0 \in T$.

The following two theorems provide sufficient conditions that allow the interchange of the order of executing the operations of integration and differentiation.
Theorem 4 (Interchange of $\frac{d}{dt}$ and $\int$). Let $T \subseteq \Re$ and let $t_0$ be an interior point of $T$. Let $X = X(\omega, t)$ be a real-valued function on $\Omega \times T$ that is an integrable r.v. for each $t \in T$, and such that $\left(\frac{\partial X}{\partial t}\right)_{t_0}$ exists for a.e. $\omega \in \Omega$ and $\left|\frac{X(\cdot, t) - X(\cdot, t_0)}{t - t_0}\right| \le Y$ a.e., $Y$ integrable, for all $t$ in some neighborhood of $t_0$. Then $\frac{d}{dt}\int X(\cdot, t)$ and $\int \left(\frac{\partial X(\cdot, t)}{\partial t}\right)_{t_0}$ exist and are finite for $t = t_0$, and they are equal; i.e.,
$$\left[\frac{d}{dt}\int X(\cdot, t)\right]_{t_0} = \int \left(\frac{\partial X(\cdot, t)}{\partial t}\right)_{t_0}.$$

Proof. We have $\int \frac{X(\cdot, t) - X(\cdot, t_0)}{t - t_0} = \frac{1}{t - t_0}\left[\int X(\cdot, t) - \int X(\cdot, t_0)\right]$. Now $\frac{X(\cdot, t) - X(\cdot, t_0)}{t - t_0} \xrightarrow[t \to t_0]{a.e.} \left(\frac{\partial X(\cdot, t)}{\partial t}\right)_{t_0}$, since $\left(\frac{\partial X}{\partial t}\right)_{t_0}$ exists a.e., while for $t$s in a neighborhood of $t_0$, $\left|\frac{X(\cdot, t) - X(\cdot, t_0)}{t - t_0}\right| \le Y$ a.e., $Y$ integrable, so that $\left|\left(\frac{\partial X(\cdot, t)}{\partial t}\right)_{t_0}\right| \le Y$ a.e. and hence $\left(\frac{\partial X(\cdot, t)}{\partial t}\right)_{t_0}$ is integrable. Then Remark 4 applies and gives
$$\lim_{t \to t_0} \int \frac{X(\cdot, t) - X(\cdot, t_0)}{t - t_0} = \int \lim_{t \to t_0} \frac{X(\cdot, t) - X(\cdot, t_0)}{t - t_0} = \int \left(\frac{\partial X(\cdot, t)}{\partial t}\right)_{t_0}.$$
Since the left-hand side is equal to $\left[\frac{d}{dt}\int X(\cdot, t)\right]_{t_0}$, it follows that $\left[\frac{d}{dt}\int X(\cdot, t)\right]_{t_0}$ is finite, and the asserted equality holds. $\square$

Theorem 5 (Interchange of $\frac{d}{dt}$ and $\int$). Let $T = [\alpha, \beta] \subset \Re$ and let $X = X(\omega, t)$ be defined on $\Omega \times T$ into $\Re$ and be such that: $X$ is an integrable r.v. for each $t \in T$; $\frac{\partial X(\cdot, t)}{\partial t}$ exists for a.e. $\omega$ and all $t \in T$; and $\left|\frac{\partial X(\cdot, t)}{\partial t}\right| \le Y$ a.e., $Y$ integrable, for all $t \in T$. Then, for each $t_0 \in [\alpha, \beta]$,
$$\left[\frac{d}{dt}\int X(\cdot, t)\right]_{t_0} = \int \left(\frac{\partial X(\cdot, t)}{\partial t}\right)_{t_0}.$$

Remark 5. For $t = \alpha$ or $t = \beta$, we mean the derivative from the right or from the left, respectively.

Proof of Theorem 5. We have, by the Mean Value Theorem of Differential Calculus, a.e.:
$$X(\omega, t) - X(\omega, t_0) = (t - t_0)\left(\frac{\partial X(\omega, \cdot)}{\partial t}\right)_{t^*(\omega)},$$
where $t^*(\omega)$ lies between $t$ and $t_0$. Since $\left|\frac{\partial X(\cdot, t)}{\partial t}\right| \le Y$ a.e., $Y$ integrable, for all $t \in T$, we get $\left|\frac{X(\cdot, t) - X(\cdot, t_0)}{t - t_0}\right| \le Y$ a.e. Then Theorem 4 applies and gives the result. $\square$

Application 1. Results such as Theorems 4 and 5 have wide applicability in statistics. They are employed, e.g., when establishing the Cramér–Rao inequality, the asymptotic normality of the Maximum Likelihood Estimate, and in many other instances; a small numerical check of the interchange is sketched below.
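As an informal numerical illustration of Theorems 4 and 5 (ours, not part of the text), one may check the interchange for a concrete family, say the $N(t, 1)$ densities $f(x; t)$: differentiating $\int f(x; t)\,dx = 1$ under the integral sign gives $\int \frac{\partial}{\partial t} f(x; t)\,dx = 0$, i.e., the score has mean zero. The Python sketch below (all names are ours) compares a finite-difference derivative of the integral with the integral of the partial derivative.

```python
import numpy as np

def npdf(x, t):
    # N(t, 1) density
    return np.exp(-0.5 * (x - t) ** 2) / np.sqrt(2.0 * np.pi)

x = np.linspace(-20.0, 20.0, 200001)
dx = x[1] - x[0]
t0, h = 1.5, 1e-5

# d/dt of  integral f(x; t) dx  at t0, via a central finite difference
lhs = (np.sum(npdf(x, t0 + h)) - np.sum(npdf(x, t0 - h))) * dx / (2.0 * h)

# integral of (d/dt) f(x; t) dx at t0, where df/dt = (x - t) f(x; t)
rhs = np.sum((x - t0) * npdf(x, t0)) * dx

print(lhs, rhs)  # both agree and are numerically ~0, as Theorem 4 predicts
```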
This section is concluded with the concept of convergence in distribution, and two results involving convergence in distribution. To this end, let $X_n$, $n \ge 1$, and $X$ be r.v.s defined on the probability space $(\Omega, \mathcal{A}, P)$, and let $F_n$ and $F$ be their d.f.s, respectively. Also, let $C(F)$ be the continuity set of $F$ (the set on which $F$ is continuous). Then

Definition 2. We say that $\{X_n\}$ converges in distribution to $X$, and write $X_n \xrightarrow[n\to\infty]{d} X$, if $F_n(x) \to F(x)$ as $n \to \infty$ for all $x \in C(F)$. We also denote this convergence by writing $F_n \underset{n\to\infty}{\Longrightarrow} F$ and call it weak convergence of $\{F_n\}$ to $F$.
The following theorem relates convergence in probability and convergence in distribution.

Theorem 6. If $X_n \xrightarrow[n\to\infty]{P} X$, then $X_n \xrightarrow[n\to\infty]{d} X$. The converse is not true, in general. It is true, however, if $P(X = c) = 1$ for some constant $c$.
For its proof, see, e.g., page 168 in Loève (1963), or page 183 in Roussas (1997). At this point, it should be mentioned that, although the book Loève (1963) is used as a standard reference here, there are, of course, other books that present a thorough treatment of probability; e.g., Shiryaev (1995) is such a book. In the following example, convergence in distribution does not imply convergence in probability.

Example 1. Let $\Omega = \{1, 2, 3, 4\}$, $\mathcal{A} = \mathcal{P}(\Omega)$, $P(\{1\}) = P(\{2\}) = P(\{3\}) = P(\{4\}) = \frac14$. Define $X_n$, $n \ge 1$, and $X$ as follows:
$$X_n(1) = X_n(2) = 1,\quad X_n(3) = X_n(4) = 0,\ n \ge 1; \qquad X(1) = X(2) = 0,\quad X(3) = X(4) = 1.$$
Then $|X_n - X| = 1$ for all $\omega \in \Omega$. Hence $X_n$ does not converge in probability to $X$, clearly. Next,
$$F_{X_n}(x) = \begin{cases} 0, & x < 0 \\ \tfrac12, & 0 \le x < 1 \\ 1, & x \ge 1 \end{cases}; \qquad F_X(x) = \begin{cases} 0, & x < 0 \\ \tfrac12, & 0 \le x < 1 \\ 1, & x \ge 1 \end{cases};$$
i.e., $F_{X_n}(x) = F_X(x)$, $x \in \Re$, and hence $F_{X_n} \underset{n\to\infty}{\Longrightarrow} F_X$, while $X_n$ does not converge in probability to $X$.
The following theorem is very useful in statistics.

Theorem 7 (Slutsky). Let $X_n, Y_n$, $n \ge 1$, and $X$ be r.v.s such that $F_{X_n} \underset{n\to\infty}{\Longrightarrow} F_X$ (or $X_n \xrightarrow[n\to\infty]{d} X$) and $Y_n \xrightarrow[n\to\infty]{P} c$, a constant. Then
(i) $F_{X_n \pm Y_n} \underset{n\to\infty}{\Longrightarrow} F_{X \pm c}$, (ii) $F_{X_n Y_n} \underset{n\to\infty}{\Longrightarrow} F_{cX}$, (iii) $F_{X_n / Y_n} \underset{n\to\infty}{\Longrightarrow} F_{X/c}$, $c \ne 0$;
or, equivalently,
(i$'$) $X_n \pm Y_n \xrightarrow[n\to\infty]{d} X \pm c$, (ii$'$) $X_n Y_n \xrightarrow[n\to\infty]{d} cX$, (iii$'$) $\frac{X_n}{Y_n} \xrightarrow[n\to\infty]{d} \frac{X}{c}$, $c \ne 0$.

Remark 6. $\frac{X_n}{Y_n}$ is well defined with probability $\to 1$, since $Y_n \xrightarrow[n\to\infty]{P} c \ne 0$. A small simulation illustrating part (i$'$) is sketched below.
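As an informal illustration (ours, not part of the text): if $X_n = \sqrt{n}(\bar U_n - \tfrac12)$ for i.i.d. uniform r.v.s (so $X_n \xrightarrow{d} N(0, 1/12)$ by the CLT) and $Y_n = c + 1/n$ (so $Y_n \xrightarrow{P} c$), then $X_n + Y_n$ should behave like $N(c, 1/12)$ for large $n$. The Python sketch below (variable names are ours) compares the simulated mean and variance with these targets.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps, c = 10_000, 5_000, 2.0

# X_n = sqrt(n) * (mean of n uniforms - 1/2)  -->  N(0, 1/12) in distribution (CLT)
U = rng.uniform(size=(reps, n))
Xn = np.sqrt(n) * (U.mean(axis=1) - 0.5)

# Y_n = c + 1/n  -->  c in probability (here even deterministically)
Yn = c + 1.0 / n

S = Xn + Yn  # Slutsky (i'): X_n + Y_n --> N(c, 1/12) in distribution
print(S.mean(), S.var())  # roughly c = 2.0 and 1/12 ~ 0.0833
```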
For the proof of the theorem, see, e.g., pages 102–103 in Rao (1965), or page 201 in Roussas (1997).

Example 2. In this example, we use some of the results obtained so far to show that, under certain conditions and in a certain sense, convergence in distribution is preserved. To this end, let $X_n$, $n = 1, 2, \ldots$, be r.v.s, let $g : \Re \to \Re$ be differentiable, and let its derivative $g'(x)$ be continuous at a point $d$. Also, let $c_n$ be constants such that $0 \ne c_n \to \infty$ as $n \to \infty$, and let $c_n(X_n - d) \xrightarrow[n\to\infty]{d} X$, a r.v. Then $c_n[g(X_n) - g(d)] \xrightarrow[n\to\infty]{d} g'(d) X$.

All of the following limits are taken as $n \to \infty$. In the first place, by assumption, $c_n(X_n - d) \xrightarrow{d} X$ and $c_n^{-1} \to 0$, so that $X_n - d \xrightarrow{d} 0$, by Theorem 7(ii), or $X_n - d \xrightarrow{P} 0$, by Theorem 6. Hence $|X_n - d| \xrightarrow{P} 0$, by Exercise 8 in Chapter 3. Next, expand $g(X_n)$ around $d$ according to Taylor's formula in order to obtain $g(X_n) = g(d) + (X_n - d) g'(X_n^*)$, where $X_n^*$ is a r.v. lying between $d$ and $X_n$. Hence $c_n[g(X_n) - g(d)] = c_n(X_n - d) g'(X_n^*)$. However, $|X_n^* - d| \le |X_n - d| \xrightarrow{P} 0$, so that $X_n^* \xrightarrow{P} d$, and hence $g'(X_n^*) \xrightarrow{P} g'(d)$, by the continuity of $g'(x)$ at $d$ and the exercise cited above. Then, by Theorem 7(ii), $c_n(X_n - d) g'(X_n^*) \xrightarrow{d} g'(d) X$, and therefore $c_n[g(X_n) - g(d)] \xrightarrow{d} g'(d) X$.
An application of the result discussed in the previous example is given below.

Example 3.
(i) Let $X_1, \ldots, X_n$ be i.i.d. r.v.s with mean $\mu \in \Re$ and variance $\sigma^2 \in (0, \infty)$, and let $g : \Re \to \Re$ be differentiable with derivative continuous at $\mu$. Then
$$\sqrt{n}\,[g(\bar X_n) - g(\mu)] \xrightarrow[n\to\infty]{d} N\big(0, [\sigma g'(\mu)]^2\big),$$
where $\bar X_n$ is the sample mean of the $X_j$s.
(ii) In particular, if the $X_j$s are distributed as $B(1, p)$, then
$$\sqrt{n}\,[\bar X_n(1 - \bar X_n) - pq] \xrightarrow[n\to\infty]{d} N\big(0, pq(1 - 2p)^2\big),$$
where $q = 1 - p$.

(i) Indeed, the Central Limit Theorem (CLT) gives that $\sqrt{n}(\bar X_n - \mu) \xrightarrow[n\to\infty]{d} X \sim N(0, \sigma^2)$. Then the assumptions of Example 2 are fulfilled (with $c_n = \sqrt{n}$ and $d = \mu$), so that $\sqrt{n}[g(\bar X_n) - g(\mu)] \xrightarrow[n\to\infty]{d} g'(\mu) X \sim N(0, [\sigma g'(\mu)]^2)$.
(ii) Here $\mu = p$ and $\sigma^2 = pq$, and take $g(x) = x(1 - x)$, $0 < x < 1$. Then $g(\bar X_n) = \bar X_n(1 - \bar X_n)$, and $g'(x) = 1 - 2x$, so that $g'(p) = 1 - 2p$. The result then follows from part (i). (A quick simulation check of (ii) is sketched below.)
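An informal numerical check of Example 3(ii) (ours, not part of the text; all names below are ours): simulate Bernoulli samples, form $\sqrt{n}[\bar X_n(1-\bar X_n) - pq]$ over many replications, and compare its sample variance with the limiting variance $pq(1-2p)^2$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps, p = 5_000, 20_000, 0.3
q = 1.0 - p

# Bernoulli(p) data: reps independent samples of size n
X = rng.binomial(1, p, size=(reps, n))
xbar = X.mean(axis=1)

# delta-method quantity of Example 3(ii)
T = np.sqrt(n) * (xbar * (1.0 - xbar) - p * q)

print(T.var())                   # simulated variance
print(p * q * (1 - 2 * p) ** 2)  # limiting variance pq(1-2p)^2 = 0.21 * 0.16 = 0.0336
```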
The result obtained in Example 2 and its applications in Example 3 are often referred to as the delta method. See also Exercise 26 in Chapter 11. Below, a simple application of an instance of Theorem 7(iii) is given in a hypothesis testing problem.

Application 2. Let $X_1, \ldots, X_n$ be i.i.d. $N(\mu, \sigma^2)$ r.v.s. For testing $H : \mu = \mu_0$ against $A : \mu \ne \mu_0$, say, at level of significance $\alpha$ (where $\sigma$ is unknown), one uses the $t$-test and determines the cutoff point $c$ by the requirement that $P(|t_n| > c) = \alpha$, where
$$t_n = \frac{\sqrt{n}(\bar X_n - \mu_0)}{\sqrt{\frac{1}{n-1}\sum_j (X_j - \bar X_n)^2}}$$
is $t_{n-1}$-distributed under $H$. Now, whether the $X_j$s are normal or not, we write
$$t_n = \frac{\sqrt{n}(\bar X_n - \mu_0)/\sigma}{\sqrt{\frac{1}{\sigma^2}\cdot\frac{1}{n-1}\sum_j (X_j - \bar X_n)^2}}$$
and have, as $n \to \infty$ and under $H$,
$$\frac{\sqrt{n}(\bar X_n - \mu_0)}{\sigma} \xrightarrow{d} N(0, 1), \qquad \frac{1}{\sigma^2}\cdot\frac{1}{n-1}\sum_j (X_j - \bar X_n)^2 \xrightarrow{P} 1,$$
provided, of course, $\mu$ and $\sigma^2$ are finite (in the nonnormal case). Then, as $n \to \infty$ and under $H$, $t_n \xrightarrow{d} Z \sim N(0, 1)$, and hence the size $\alpha$ of the test remains essentially intact for large $n$, no matter whether the normality assumption is valid or not. This is known as the robustness property of the $t$-test. A simulation sketch of this robustness is given below.
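A rough Monte Carlo check of this robustness claim (ours, not the book's): draw samples from a markedly nonnormal distribution with finite variance, such as the exponential, and estimate the rejection rate of the nominal 5% two-sided $t$-test of $H : \mu = \mu_0$ at the true mean. For moderately large $n$ the estimated size should be close to 0.05.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, reps, alpha = 200, 20_000, 0.05
mu0 = 1.0                                        # true mean of Exp(1), so H holds

X = rng.exponential(scale=1.0, size=(reps, n))   # nonnormal data, finite variance
xbar = X.mean(axis=1)
s = X.std(axis=1, ddof=1)
t = np.sqrt(n) * (xbar - mu0) / s

c = stats.t.ppf(1 - alpha / 2, df=n - 1)         # t-test cutoff
print((np.abs(t) > c).mean())                    # estimated size; close to 0.05 for large n
```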
In a probability framework, convergence almost everywhere (a.e.) becomes almost sure (a.s.) convergence, and convergence in measure becomes convergence in probability. These modes of convergence and the convergence in distribution introduced earlier are related as follows, on the basis of the Corollary to Theorem 4 in Chapter 3, and Theorem 3 here:
$$X_n \xrightarrow{a.s.} X \ \text{implies}\ X_n \xrightarrow{P} X \ \text{implies}\ X_n \xrightarrow{d} X, \quad \text{as } n \to \infty.$$
Also, $X_n \xrightarrow{d} X$ implies $X_n \xrightarrow{P} X$ if $P(X = c) = 1$, but not otherwise, and $X_n \xrightarrow{P} X$ need not imply $X_n \xrightarrow{a.s.} X$, as $n \to \infty$. (For the last statement, see Exercise 2(ii) in Chapter 3.) A concrete construction illustrating this last point is sketched below.
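The last point can be made concrete with the usual "moving indicator" sequence on $([0, 1), \mathcal{B}, \lambda)$ (ours, not part of the text): write $n = 2^k + j$ with $0 \le j < 2^k$ and set $X_n = I_{[j/2^k, (j+1)/2^k)}$. Then $P(X_n \ne 0) = 2^{-k} \to 0$, so $X_n \xrightarrow{P} 0$, yet every $\omega$ lies in infinitely many of these intervals, so $X_n(\omega)$ converges for no $\omega$. The short Python sketch below computes both facts.

```python
import numpy as np

def interval(n):
    # write n = 2**k + j with 0 <= j < 2**k; the n-th indicator lives on [j/2^k, (j+1)/2^k)
    k = int(np.floor(np.log2(n)))
    j = n - 2 ** k
    return j / 2 ** k, (j + 1) / 2 ** k

# (1) P(X_n = 1) = interval length -> 0, so X_n -> 0 in probability
for n in (2, 10, 100, 1000, 10000):
    a, b = interval(n)
    print(n, b - a)

# (2) but a fixed omega is covered once in every dyadic block, hence infinitely often
omega = 0.3711
hits = [n for n in range(2, 20000) if interval(n)[0] <= omega < interval(n)[1]]
print(len(hits), hits[:8])  # omega keeps being hit, so X_n(omega) does not converge
```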
5.2 Sections, Product Measure Theorem, the Fubini Theorem

The content of this section is highly technical, and a brief outline of the basic concepts and results is as follows. Consider the σ-finite measure spaces $(\Omega_i, \mathcal{A}_i, \mu_i)$, $i = 1, 2$, and the product measurable space $(\Omega_1 \times \Omega_2, \mathcal{A}_1 \times \mathcal{A}_2)$.
First, for any $E \in \mathcal{A}_1 \times \mathcal{A}_2$, define the $\Omega_2$-sections of $E$ at $\omega_1$, to be denoted by $E_{\omega_1}$, and the $\Omega_1$-sections of $E$ at $\omega_2$, to be denoted by $E_{\omega_2}$, and show that they are measurable. Second, for a measurable function $f : \Omega_1 \times \Omega_2 \to \Re$, define its $\Omega_2$-section at $\omega_1$, $f_{\omega_1}(\cdot) : E_{\omega_1} \to \Re$, and the $\Omega_1$-section of $f$ at $\omega_2$, $f_{\omega_2}(\cdot) : E_{\omega_2} \to \Re$, and show that all these sections are measurable. Third, define the functions $f : \Omega_1 \to \Re$ and $g : \Omega_2 \to \Re$ by: $f(\omega_1) = \mu_2(E_{\omega_1})$ and $g(\omega_2) = \mu_1(E_{\omega_2})$. Then show that $f$ and $g$ are (nonnegative and) measurable, and that $\int f\,d\mu_1 = \int g\,d\mu_2$. Fourth, on $\mathcal{A}_1 \times \mathcal{A}_2$, define $\lambda$ by $\lambda(E) = \int f\,d\mu_1 = \int g\,d\mu_2$. Then show that $\lambda$ is a σ-finite measure, and that $\lambda(A_1 \times A_2) = \mu_1(A_1)\mu_2(A_2)$, $A_1 \in \mathcal{A}_1$, $A_2 \in \mathcal{A}_2$. Because of this, $\lambda$ is referred to as the product measure (of $\mu_1$ and $\mu_2$) and is denoted by $\mu_1 \times \mu_2$. Finally, consider the r.v. $X : (\Omega_1 \times \Omega_2, \mathcal{A}_1 \times \mathcal{A}_2, \mu_1 \times \mu_2) \to \Re$, and look at the following integrals (whose existence is assumed here):
$$\int X(\omega_1, \omega_2)\,d\lambda = \int X(\omega_1, \omega_2)\,d(\mu_1 \times \mu_2),\quad \int\!\!\int X\,d\mu_1\,d\mu_2 = \int\left[\int X(\omega_1, \omega_2)\,d\mu_1\right] d\mu_2,\quad \int\!\!\int X\,d\mu_2\,d\mu_1 = \int\left[\int X(\omega_1, \omega_2)\,d\mu_2\right] d\mu_1.$$
Then conditions are given under which the above three integrals exist and are all equal.

Definition 3. For $E \in \mathcal{A} = \mathcal{A}_1 \times \mathcal{A}_2$ and for $\omega_1 \in \Omega_1$, $\omega_2 \in \Omega_2$, we define
$$E_{\omega_1} = \{\omega_2 \in \Omega_2;\ (\omega_1, \omega_2) \in E\}, \qquad E_{\omega_2} = \{\omega_1 \in \Omega_1;\ (\omega_1, \omega_2) \in E\}.$$
Clearly, $E_{\omega_1} \subseteq \Omega_2$ and $E_{\omega_2} \subseteq \Omega_1$; $E_{\omega_1}$ is called an $\Omega_2$-section of $E$ at $\omega_1$, and $E_{\omega_2}$ is called an $\Omega_1$-section of $E$ at $\omega_2$.

[Figure: a set $E$ in the product space $\Omega_1 \times \Omega_2$, with the section $E_{\omega_1}$ shown as a vertical slice of $E$ at $\omega_1$ and the section $E_{\omega_2}$ as a horizontal slice of $E$ at $\omega_2$.]
Remark 7. Clearly, if $E = A \times B$, then $E_{\omega_1} = B$ or $\varnothing$ depending on whether $\omega_1 \in A$ or $\omega_1 \notin A$, and similarly $E_{\omega_2} = A$ or $\varnothing$ depending on whether $\omega_2 \in B$ or $\omega_2 \notin B$.
Theorem 8. Every section of a measurable set is measurable.

Proof. Let $\mathcal{C} = \{E \in \mathcal{A};\ \text{every section of } E \text{ is measurable}\}$. Then, by the previous remark, $\mathcal{C}$ contains all rectangles, and furthermore it is a σ-field. In fact, let $E_n \in \mathcal{C}$, $n = 1, 2, \ldots$, let $E = \bigcup_{n=1}^\infty E_n$, and let $\omega_1 \in \Omega_1$. Then $E_{\omega_1} = \bigcup_{n=1}^\infty E_{n, \omega_1}$, as is easily seen (see Exercise 7(iii)). Since $E_{n, \omega_1} \in \mathcal{A}_2$, $n = 1, 2, \ldots$, we have that $E_{\omega_1} \in \mathcal{A}_2$. Similarly, $E_{\omega_2} \in \mathcal{A}_1$, $\omega_2 \in \Omega_2$. Thus $E \in \mathcal{C}$. Now if $E \in \mathcal{C}$, then $E^c \in \mathcal{C}$. In fact, for $\omega_1 \in \Omega_1$ we have $(E^c)_{\omega_1} = (E_{\omega_1})^c$, as is easily seen (see Exercise 7(v)). Since $E_{\omega_1} \in \mathcal{A}_2$, we have $E_{\omega_1}^c \in \mathcal{A}_2$. Similarly, $E_{\omega_2}^c \in \mathcal{A}_1$, $\omega_2 \in \Omega_2$. Thus $E^c \in \mathcal{C}$. As mentioned already, $\mathcal{C}$ contains the class of all rectangles in $\Omega_1 \times \Omega_2$. Hence $\mathcal{A} \subseteq \mathcal{C}$. Since also $\mathcal{C} \subseteq \mathcal{A}$ by its definition, we have $\mathcal{C} = \mathcal{A}$. $\square$

Let $f : E \subseteq \Omega_1 \times \Omega_2 \to \Re$.

Definition 4. For $\omega_1 \in \Omega_1$, the function $f_{\omega_1}$, defined on $E_{\omega_1}$ into $\Re$ by $f_{\omega_1}(\omega_2) = f(\omega_1, \omega_2)$, $(\omega_1, \omega_2) \in E$, is called an $\Omega_2$-section of $f$ at $\omega_1$. Similarly, for $\omega_2 \in \Omega_2$, the function $f_{\omega_2}$, defined on $E_{\omega_2}$ into $\Re$ by $f_{\omega_2}(\omega_1) = f(\omega_1, \omega_2)$, $(\omega_1, \omega_2) \in E$, is called an $\Omega_1$-section of $f$ at $\omega_2$.

Theorem 9. If $f : (\Omega_1 \times \Omega_2, \mathcal{A}_1 \times \mathcal{A}_2) \to (\Re, \mathcal{B})$ is measurable, then every section of it is also measurable.

Proof. Let $B \in \mathcal{B}$. Then
$$f_{\omega_1}^{-1}(B) = \{\omega_2 \in \Omega_2;\ f_{\omega_1}(\omega_2) \in B\} = \{\omega_2 \in \Omega_2;\ f(\omega_1, \omega_2) \in B\} = \{\omega_2 \in \Omega_2;\ (\omega_1, \omega_2) \in f^{-1}(B)\} = (f^{-1}(B))_{\omega_1}.$$
Now $f^{-1}(B) \in \mathcal{A}_1 \times \mathcal{A}_2$ by the measurability of $f$, and $f_{\omega_1}^{-1}(B)$ is simply an $\Omega_2$-section of $f^{-1}(B)$ at $\omega_1$, thus an $\mathcal{A}_2$-measurable set. So $f_{\omega_1}$ is $\mathcal{A}_2$-measurable, and in a similar fashion $f_{\omega_2}$ is $\mathcal{A}_1$-measurable. $\square$

Consider now the σ-finite measure spaces $(\Omega_i, \mathcal{A}_i, \mu_i)$, $i = 1, 2$, and let $(\Omega_1 \times \Omega_2, \mathcal{A}_1 \times \mathcal{A}_2)$ be the product measurable space. For $E \in \mathcal{A}_1 \times \mathcal{A}_2$, define the functions $f$ and $g$ on $\Omega_1$ and $\Omega_2$, respectively, as follows:
$$f(\omega_1) = \mu_2(E_{\omega_1}), \qquad g(\omega_2) = \mu_1(E_{\omega_2}). \tag{5.4}$$
These functions can be defined since $E_{\omega_1}$ and $E_{\omega_2}$ are measurable (Theorem 8). With this notation, we have the following theorem.

Theorem 10. For every $E \in \mathcal{A} = \mathcal{A}_1 \times \mathcal{A}_2$, the functions $f$ and $g$ defined above are nonnegative, measurable, and $\int f\,d\mu_1 = \int g\,d\mu_2$.

Proof. Let $\mathcal{M} = \{E \in \mathcal{A}_1 \times \mathcal{A}_2;\ \text{for the respective functions } f \text{ and } g \text{ defined by (5.4), the theorem is true}\}$. Then
(i) $\mathcal{M} \ne \varnothing$, since clearly $\Omega_1 \times \Omega_2 \in \mathcal{M}$.
(ii) $\mathcal{M}$ is closed under countable sums. In fact, let $E_n \in \mathcal{M}$, $n = 1, 2, \ldots$, with $E_i \cap E_j = \varnothing$, $i \ne j$, and set $E = \sum_n E_n$. To show that $E \in \mathcal{M}$. Since $E_{\omega_1} = (\sum_n E_n)_{\omega_1} = \sum_n E_{n, \omega_1}$ (see also Exercise 7(iii)), we have
f (ω1 ) = μ2 (E ω1 ) = μ2 ( n E n,ω1 ) = n μ2 (E n,ω1 ) = n f n (ω1 ), where f n (ω1 ) = μ2 (E n,ω1 ). That is, f (ω 1) = n f n (ω1 ). Now E n ∈ M implies that f n is ≥ 0, measurable. Thus f = n f n ≥ 0 and since nk=1 f k → (↑) f and n→∞ n f is measurable, it follows that f is measurable. Also, f = k n fn = k=1 f by Corollary 1 to Theorem 1. In a similar way, g ≥ 0 and measure n n f n dμ1 = and g= n gn = n gn , where gn (ω2 ) =μ1 (E n,ω2 ).But gn dμ2 , n = 1, 2, . . . , because E n ∈ M. Hence f dμ1 = gdμ2 . (iii) If E = A × B, then E ∈ M. In fact, f (ω1 ) = μ2 (E ω1 ) = μ2 (B)I A (ω1 ), B (ω2 ). Thus f , g are ≥ 0, measurable. Next, g(ω2 ) = μ1 (E ω2 ) = μ1 (A)I f dμ1 = μ2 (B)μ1 (A) = gμ2 = μ1 (A)μ2 (B). Hence f dμ1 = gdμ2 . (iv) If C is the field (by Theorem 7 in Chapter 1) of all finite sums of measurable rectangles in 1 × 2 , then C ⊆ M. This follows from (ii) and (iii). (v) M is a monotone class. Let first E n ∈ M, n = 1, 2, . . . . with E n ↑. Then to de f show that E = lim E n = n E n ∈ M. First, we notice that E n ↑, implies n→∞ E n,ω1 ↑, ω1 ∈ 1 (see also Exercise 7(i)), and E ω1 = n E n ω1 = n E n,ω1 (see also Exercise 7(iii)) or E ω1 = ( lim E n )ω1 = lim E n,ω1 . Next, with the n→∞ n→∞ limits taken everywhere as n → ∞, f (ω1 ) = μ2 (E ω1 ) = μ2 lim E n,ω1 = lim μ2 E n,ω1 = lim f n (ω1 ), where f n (ω1 ) = μ2 (E n,ω1 ); i.e., f (ω1 ) = lim f n (ω1 ), ω1 ∈ 1 .
Since f n ≥ 0, measurable, so is f . But fn (ω1 ) = μ2(E n,ω1 ) ≤ μ2 E n+1,ω1 = f n+1 (ω1 ); i.e., 0 ≤ f n ↑ f , and this implies f n dμ1 ↑ f dμ1 . In a similar fashion, 0 ≤ gn ↑ g, and this implies gn dμ2 ↑ gdμ2 , where g(ω2 ) = μ1 (E ω2 ), gn (ω2 ) = μ1 (E n,ω2 ). But f n dμ1 = gn dμ2 , since E n ∈ M. Hence f dμ1 = gdμ2 implies E ∈ M. de f Let now E n ∈ M, n = 1, 2, . . . , E n ↓. To show that E = lim E n = n E n ∈ M. First, thatμ1 and μ2 are finite. Again, E n ↓ E implies E n,ω1 ↓, and assume E = n E n,ω1 or E ω1 = (limE n )ω1 = limE n,ω1 , ω1 ∈ 1 (see also E ω1 = n n ω1 Exercice 7(iv)). Next, f (ω1 ) = μ2 (E ω1 ) = μ2 (limE n )ω1 = μ2 limE n,ω1 = lim μ2 E n,ω1 (by finiteness of μ2 ), = lim f n (ω1 ); f (ω1 ) = lim f n (ω1 ), ω1 ∈ 1 . Since f n ≥ 0, measurable, so is f . But f n (ω1 ) = μ2 (E n,ω1 ) ≥ μ2 E n+1,ω1 = f n+1 (ω1 ), and f 1 (ω1 ) = μ2 (E 1,ω1 ) ≤ μ2 (2 ) < ∞, ω1 ∈ 1 . Thus 0 ≤ fn ≤ μ2 (2 ) < ∞ with f n → f . Then Theorem 2 (iii) implies f n dμ1 → f dμ1 . Similarly, 0 ≤ gn measurable ≤ μ1 (1 ) < ∞ and gn → g, where g(ω2 ) = μ1 (E ω2 ), gn (ω2 ) = μ1 (E n,ω2 ), ω2 ∈ 2 , and hence gn dμ2 →
gdμ2 . But f n dμ1 = gn dμ2 , since E n ∈ M. Hence f dμ1 = gdμ2 , which implies that E ∈ M. Now consider the case where μ1 and μ2 are σ -finite. Their σ -finiteness implies the existence of partitions {Ai , i = 1, 2, . . .} and {B j , j = 1, 2, . . .} of 1 and 2 , respectively, for which μ1 (Ai ) < ∞ and μ2 (B j ) < ∞ for all i and j. For each i and j, define on A1 and A2 , respectively, the finite measures μ1i (A) = μ1 (A ∩ Ai ) and μ2 j (B) = μ2 (B ∩ B j ) (see Exercise 8). For any E ∈ A1 × A2 , set f (ω1 ) = μ2 (E ω1 ),
f j (ω1 ) = μ2 j (E ω1 ) = μ2 E ∩ (Ai × B j ) ω with ω1 ∈ Ai 1
= μ2 (E ω1 ∩ B j ), g(ω2 ) = μ1 (E ω2 ),
gi (ω2 ) = μ1i (E ω2 ) = μ1 E ∩ (Ai × B j ) ω with ω2 ∈ B j 2
= μ1 (E ω2 ∩ Ai ). Then observe that f (ω1 ) =
∞
f j (ω1 ) and g(ω2 ) =
j=1
∞
gi (ω2 ).
(5.5)
i=1
It follows that f j , gi are ≥ 0, f j is A1 -measurable, gi is A2 -measurable, and on account of (5.5), so are f and g, respectively. Also, f j dμ1i = gi dμ2 j for all iand j. It follows that ∞ ∞ ∞ ∞ ∞ ∞ (5.6) f j dμ1i = gi dμ2 j = gi dμ2 j . i=1 j=1
i=1 j=1
j=1 i=1
However,
f j dμ1i =
⎛ ⎞ ⎝ f j ⎠ dμ1i (by Corollary 1 to Theorem 1)
j
=
j
(by the definition of f )
f dμ1i
=
(by Exercise 9),
f dμ1 Ai
and i
j
f j dμ1i =
i
f dμ1 = Ai
i
( f I Ai )dμ1 =
( f I Ai )dμ1 i
= f I Ai dμ1 i
= =
i Ai
gi dμ2 j =
i
and
gi dμ2 j =
i
gi dμ2 j =
gdμ2 j =
gdμ2 , Bj
j
(5.7)
i
j
(since {Ai ; i ≥ 1}is a partition of 1 ).
f dμ1
Likewise,
f dμ1 (by Corollary 1 to Theorem 1)
gdμ2 = Bj
j Bj
gdμ2 =
gdμ2 .
(5.8)
Relations (5.5)–(5.8) then yield $\int f\,d\mu_1 = \int g\,d\mu_2$. Since, as already mentioned, $f$ and $g$ are nonnegative and $\mathcal{A}_1$- and $\mathcal{A}_2$-measurable, respectively, the proof is completed. Thus, in all cases, $\int f\,d\mu_1 = \int g\,d\mu_2$, and this implies $E \in \mathcal{M}$. So we have that $\mathcal{M}$ is a nonempty monotone class containing $\mathcal{C}$; hence $\mathcal{M}$ contains the minimal monotone class over $\mathcal{C}$, which is $\mathcal{A}_1 \times \mathcal{A}_2$ (by Theorem 6 in Chapter 1). Since also $\mathcal{M} \subseteq \mathcal{A}_1 \times \mathcal{A}_2$, it follows that $\mathcal{M} = \mathcal{A}_1 \times \mathcal{A}_2$. Thus the theorem is true for every $E \in \mathcal{A}_1 \times \mathcal{A}_2$. $\square$

Remark 8. The theorem need not be true if $\mu_1, \mu_2$ are not σ-finite, as the following example shows. This fact also has repercussions for Theorems 11 and 12 later.

Example 4. Let $\Omega_1 = \Omega_2 = [0, 1]$, let $\mathcal{A}_1 = \mathcal{A}_2 = \mathcal{B}_{[0,1]}$, let $\mu_1$ be the Lebesgue measure on $\mathcal{A}_1$, and define the set function $\mu_2$ on $\mathcal{A}_2$ by: $\mu_2(A) =$ number of points in $A$. Then $\mu_2$ is an (infinite) measure, since $\mu_2 \ge 0$, $\mu_2(\varnothing) = 0$, and $\mu_2(\sum_{i=1}^\infty A_i) = \sum_{i=1}^\infty \mu_2(A_i)$ for any pairwise disjoint $A_i$s in $\mathcal{A}_2$. However, $\mu_2$ is not σ-finite. Indeed, if $\{A_1, A_2, \ldots\}$ were a partition of $[0, 1]$ with $\mu_2(A_i) < \infty$, then the $A_i$s would have to be finite sets and $\bigcup_{i=1}^\infty A_i = [0, 1]$, which is a contradiction, as $\bigcup_{i=1}^\infty A_i$ is countable. Next, define the functions $f^*$ and $g^*$ as follows: $f^*, g^* : [0, 1] \times [0, 1] \to [0, 1]$, $f^*(x, y) = x$, and $g^*(x, y) = y$. Then $f^*$ and $g^*$ are measurable, since, e.g., for any $0 \le x_1 < x_2 \le 1$, $(f^*)^{-1}([x_1, x_2]) = [x_1, x_2] \times [0, 1]$, which is $(\mathcal{A}_1 \times \mathcal{A}_2)$-measurable, and likewise for $g^*$. It follows that $h = f^* - g^*$ is also measurable. Now, let $D$ be the main diagonal of the square $[0, 1] \times [0, 1]$; i.e., $D = \{(x, y) \in [0, 1] \times [0, 1];\ x = y\}$. Then $D$ is measurable, because $h^{-1}(\{0\}) = D$. Next, for each $x \in [0, 1]$, the $[0, 1]$-section of $D$ at $x$ is $D_x = \{y \in [0, 1];\ (x, y) \in D\} = \{y\}$ (with $y = x$), and likewise $D_y = \{x\}$ (with $x = y$). Therefore the functions $f$ and $g$ defined in relation (5.4) are here $f(x) = \mu_2(D_x) = \mu_2(\{y\}) = 1$ and $g(y) = \mu_1(D_y) = \mu_1(\{x\}) = 0$.
So, $f(x) = 1$ for all $x \in [0, 1]$ and $g(y) = 0$ for all $y \in [0, 1]$. It follows that
$$\int_{[0,1]} f\,d\mu_1 = 1 \ne 0 = \int_{[0,1]} g\,d\mu_2$$
(by following the convention that $0 \times \infty = 0$).
Theorem 11 (Product Measure Theorem). Let $(\Omega_i, \mathcal{A}_i, \mu_i)$, $i = 1, 2$, be two σ-finite measure spaces. Define $\lambda$ on $\mathcal{A}_1 \times \mathcal{A}_2$ as follows: for $E \in \mathcal{A}_1 \times \mathcal{A}_2$,
$$\lambda(E) = \int \mu_2(E_{\omega_1})\,d\mu_1 = \int \mu_1(E_{\omega_2})\,d\mu_2.$$
Then
(i) $\lambda$ is a measure.
(ii) If $E = A \times B$, $A \in \mathcal{A}_1$, $B \in \mathcal{A}_2$, then $\lambda(A \times B) = \mu_1(A)\mu_2(B)$.
(iii) $\lambda$ is σ-finite.
(iv) If $\mu$ is defined on the rectangles $A \times B$, $A \in \mathcal{A}_1$ and $B \in \mathcal{A}_2$, by $\mu(A \times B) = \mu_1(A)\mu_2(B)$, and is extended to the field $\mathcal{C}$ of finite sums of (measurable) rectangles in $\Omega_1 \times \Omega_2$ by $\mu(E) = \sum_{i=1}^r \mu_1(A_i)\mu_2(B_i)$, where $E = \sum_{i=1}^r A_i \times B_i$, $A_i \in \mathcal{A}_1$, $B_i \in \mathcal{A}_2$, $i = 1, \ldots, r$, then $\mu$ is well defined on $\mathcal{C}$, is a σ-finite measure on $\mathcal{C}$, and $\lambda$ is the unique extension of $\mu$ from $\mathcal{C}$ to $\mathcal{A}_1 \times \mathcal{A}_2$.
Remark 9. It should be mentioned at this point that the measure λ is instrumental in establishing the properties of the set function μ asserted in part (iv) of Theorem 11. Proof of Theorem 11. (i) From Theorem 10, if f(ω1 ) = μ2 (E ω1 ) and g(ω2 ) = μ1 (E ω2 ), then f , g are ≥ 0, measurable, and f dμ1 = gdμ2 . Thus, for E ∈ A1 × A2 , λ(E) is well defined. Next, λ is a measure. It suffices to prove that λ has the following properties: (a) λ() = 0, (b) λ is σ -additive, (c) λ is nondecreasing or nonnegative. That λ() = 0 is obvious. ∞ Let now E n ∈ A1 ×A 2 , n ≥ 1, E i ∩E j = , i = j. To show that n=1 λ(E n ) = ∞ λ(E), where E = n=1 E n . Let f , f n be defined by f (ω1 ) = μ2 (E ω1 ), f n (ω1 ) = μ2 (E n,ω1 ). But, as in Theorem 10 (ii), ⎛ ⎞ ∞ ∞ ∞ ⎠ = μ2 μ2 (E ω1 ) = μ2 ⎝ En E n,ω1 = μ2 (E n,ω1 ); n=1
ω1
i.e., f (ω1 ) =
n=1 ∞
n=1
f n (ω1 ), ω1 ∈ 1 .
n=1
Then, ∞ ∞ f n dμ1 = f n dμ1 (by Corollary 1 to Theorem 1), f dμ1 = n=1
or λ(E) =
∞ n=1
λ(E n ).
n=1
Finally, let E, F ∈ A1 × A2 with E ⊂ F. To show that λ(E) ≤ λ(F). Indeed, E ⊂ F implies E ω1 ⊆ Fω1 (see Exercise 7(i)), so that μ2 (E ω1 ) ≤ μ2 (Fω1 ).
Hence,
μ2 (E ω1 )dμ1 ≤
μ2 (E ω1 )dμ1 or λ(E) ≤ λ(F).
(Alternatively, λ(E) ≥ 0, trivially.) (ii) Let E = A × B, A ∈ A1 , B ∈ A2 . Then f (ω1 ) = μ2 (B)I A (ω1 ), and hence λ(E) = f dμ1 = μ1 (A)μ2 (B). (iii) Let {Ai , i = 1, 2, . . .}, {B j , j = 1, 2, . . .} be partitions of 1 , 2 , respectively, such that μ1 (Ai ) < ∞, μ2 (B j ) < ∞, i, j = 1, 2, . . . . Then {Ai × B j , i, j = 1, 2, . . .} is a partition of 1 × 2 and λ(Ai × B j ) = μ1 (Ai ) μ2 (B j ) < ∞, i, j = 1, 2, . . . . (iv) By part (ii), μ(A× B) = λ(A× B)(= μ1 (A)μ2 (B)). Next, if E = ri=1 Ai × Bi with Ai ∈ A1 and Bi ∈ A2 , i = 1, . . . , r , then r
μ(Ai × Bi ) =
i=1
r
μ1 (Ai )μ2 (Bi ) =
i=1
=λ
r
r
λ(Ai × Bi )
i=1
Ai × Bi
i=1
(since λis a measure on A1 × A2 ⊃ C) = λ(E). Likewise, if E =
s j=1 s
Aj × B j , then
μ(Aj × B j ) =
j=1
=
s j=1 s
μ1 (Aj )μ2 (B j ) λ(Aj × B j )
j=1 s
= λ(
Aj × B j )
j=1
= λ(E) so that μ is well defined on C. Furthermore, μ is a measure on C because μ = λ on C and λ is a measure on A1 × A2 ⊃ C. Finally, μ is σ -finite (by the proof of part (iii)). Hence by the Carathéodory Extension Theorem (Theorem 5 in Chapter 2), λ is the unique extension of μ from C to A1 × A2 . Definition 5. The measure λ as defined previously is the product measure of μ1 and μ2 denoted by μ1 × μ2 .
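For a concrete finite illustration of the product measure (ours, not part of the text), take two small finite spaces carrying weighted counting measures and verify numerically that $\lambda(E) = \int \mu_2(E_{\omega_1})\,d\mu_1(\omega_1)$ agrees with the section sum taken in the other order, and that $\lambda(A \times B) = \mu_1(A)\mu_2(B)$ for rectangles. All names in the Python sketch below are ours.

```python
import numpy as np

# Two finite "measure spaces": points 0..2 and 0..3 with arbitrary nonnegative masses
mu1 = np.array([0.5, 1.0, 2.0])         # masses of the points of Omega_1
mu2 = np.array([1.5, 0.25, 0.75, 1.0])  # masses of the points of Omega_2

# A rectangle A x B, encoded as a boolean matrix indexed by (omega_1, omega_2)
A = np.array([True, False, True])
B = np.array([False, True, True, False])
E_rect = np.outer(A, B).astype(bool)

def lam(E):
    # lambda(E) = sum over omega_1 of mu1({omega_1}) * mu2(E_{omega_1})
    return sum(mu1[i] * mu2[E[i]].sum() for i in range(len(mu1)))

print(lam(E_rect), mu1[A].sum() * mu2[B].sum())  # equal: lambda(A x B) = mu1(A) mu2(B)

# For an arbitrary set E, both section sums give the same lambda(E) (Theorem 10/11)
rng = np.random.default_rng(3)
E = rng.random((3, 4)) < 0.5
via_omega1 = sum(mu1[i] * mu2[E[i]].sum() for i in range(3))
via_omega2 = sum(mu2[j] * mu1[E[:, j]].sum() for j in range(4))
print(via_omega1, via_omega2)
```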
Corollary. A set E ∈ A1 × A2 is λ-null (λ(E) = 0) if and only if almost every section of it is null. Proof.
We assume $\mu_1(\Omega_1), \mu_2(\Omega_2) > 0$, since otherwise $\lambda = 0$. We have
$$\lambda(E) = \int \mu_2(E_{\omega_1})\,d\mu_1 = \int \mu_1(E_{\omega_2})\,d\mu_2.$$
Hence $\lambda(E) = 0$ if and only if
$$\int \mu_2(E_{\omega_1})\,d\mu_1 = \int \mu_1(E_{\omega_2})\,d\mu_2 = 0,$$
and since the integrands are $\ge 0$, these relations imply $\mu_2(E_{\omega_1}) = 0$ a.e. $[\mu_1]$ and $\mu_1(E_{\omega_2}) = 0$ a.e. $[\mu_2]$. (See Exercise 2 in Chapter 4.) $\square$
5.2.1 Preliminaries for the Fubini Theorem

Let $(\Omega_i, \mathcal{A}_i, \mu_i)$, $i = 1, 2$, be two σ-finite measure spaces and consider the product measure space $(\Omega_1 \times \Omega_2, \mathcal{A}_1 \times \mathcal{A}_2, \mu_1 \times \mu_2 = \lambda)$. Let $X$ be a r.v. defined on $\Omega_1 \times \Omega_2$ and suppose that $\int X\,d\lambda = \int X\,d(\mu_1 \times \mu_2)$ exists. This integral is also called the double integral of $X$. Next, for fixed $\omega_1 \in \Omega_1$, set $X_{\omega_1}(\omega_2) = X(\omega_1, \omega_2)$, and for fixed $\omega_2 \in \Omega_2$, set $X_{\omega_2}(\omega_1) = X(\omega_1, \omega_2)$ (apply Definition 4 with $E = \Omega_1 \times \Omega_2$). By Theorem 9, $X_{\omega_1}(\cdot)$ and $X_{\omega_2}(\cdot)$ are $\mathcal{A}_2$- and $\mathcal{A}_1$-measurable, respectively. We assume now that $\int X_{\omega_1}(\cdot)\,d\mu_2$ and $\int X_{\omega_2}(\cdot)\,d\mu_1$ exist, and set $f(\omega_1) = \int X_{\omega_1}(\cdot)\,d\mu_2 = \int X(\omega_1, \cdot)\,d\mu_2$ and $g(\omega_2) = \int X_{\omega_2}(\cdot)\,d\mu_1 = \int X(\cdot, \omega_2)\,d\mu_1$. Then $f$ and $g$ are measurable, as is seen in the proof of Theorem 12 later, and we assume that $\int f\,d\mu_1 = \int\left[\int X(\omega_1, \omega_2)\,d\mu_2\right]d\mu_1$ and $\int g\,d\mu_2 = \int\left[\int X(\omega_1, \omega_2)\,d\mu_1\right]d\mu_2$ exist. These integrals are also called iterated integrals. The question then arises: under what conditions is
$$\int\!\!\int X(\omega_1, \omega_2)\,d\mu_1\,d\mu_2 \overset{\text{def}}{=} \int\left[\int X(\omega_1, \omega_2)\,d\mu_1\right]d\mu_2 = \int\left[\int X(\omega_1, \omega_2)\,d\mu_2\right]d\mu_1 \overset{\text{def}}{=} \int\!\!\int X(\omega_1, \omega_2)\,d\mu_2\,d\mu_1\,?$$
The answer to this question is given by the Fubini Theorem.

Theorem 12 (The Fubini Theorem). Consider the product σ-finite measure space $(\Omega_1 \times \Omega_2, \mathcal{A}_1 \times \mathcal{A}_2, \mu_1 \times \mu_2 = \lambda)$ and let $X$ be a r.v. defined on $\Omega_1 \times \Omega_2$ that is either nonnegative or λ-integrable. Then $\int\!\int X\,d\mu_1\,d\mu_2 = \int\!\int X\,d\mu_2\,d\mu_1$, and their common value is $\int X\,d\lambda$.
Proof. Assume first that X is nonnegative (Halmos). Then all X dλ, X dμ1 dμ2 , X dμ2 dμ1 exist (since X ≥ 0, X ω1 (·) ≥ 0, X ω2 (·) ≥ 0, f (ω1 ) ≥ 0, g(ω2 ) ≥ 0, and assuming that f and g are appropriately measurable). We will present the proof in three steps. Assume first that X = I E for some E ∈ A1 × A2 . Then X (ω , ω ) = I E (ω1 , ω2 ) = I E ω2 (ω1 ), measurable for each fixed ω2 , and hence 1 2 X (ω1 , ω2 )dμ1 = I E ω2 (ω1 )dμ1 = μ1 (E ω2 ) (measurable, by Theorem 10), and μ1 (E ω2 )dμ2 . Similarly, X (ω1 , ω2 )dμ2 dμ1 = X (ω1 , ω2 )dμ1 dμ2 = μ = X dλ by μ2 (E ω1 )dμ1 . But μ2 (E ω1 )dμ1 = 1 (E ω2 )dμ2 = λ(E) Theorems 10 and 11. Thus X dμ1 dμ2 = X dμ2 dμ1 = X dλ. By linearity of the integral, the theorem is then true for nonnegative simple r.v.s. Next, let X be any nonnegative r.v., and in the sequel, take all limits as n → ∞. Then there exist 0 ≤ X n simple r.v.s ↑ X , and by the Monotone Convergence Theorem, X dλ. (5.9) X n dλ → For each fixed ω2 ∈ 2 , X n (·, ω2 ) and X (·, ω2 ) are (A1 -)measurable and 0 ≤ X n (·, ω2 ) ↑ X (·, ω2 ), so that, by the Monotone Convergence Theorem, (0 ≤) X n (·, ω2 )dμ1 ↑ X (·, ω2 )dμ1 . (5.10) But X n (·, ω2 )dμ1 is (A2 -)measurable (as a finite linear combination of integrals of indicators) by the previous step; also, X (·, ω2 )dμ1 is (A2 -)measurable as a limit of (A2 -)measurable r.v.s. Then from (5.10) and the Monotone Convergence Theorem, it follows that X (ω1 , ω2 )dμ1 dμ2 ↑ X (ω1 , ω2 )dμ1 dμ2 , or
X n dμ1 dμ2 →
Likewise,
X dμ1 dμ2 .
(5.11)
X dμ2 dμ1 .
(5.12)
X n dμ2 dμ1 →
However, for each n ≥ 1, X n dμ1 dμ2 = X n dμ2 dμ1 = X n dλ. Then, from (5.9)–(5.13), we get X dμ1 dμ2 = X dμ2 dμ1 = X n dλ, as was to be seen.
(5.13)
Now suppose that X is λ-integrable (Loève). Since X = X + − X − and the theorem is true for X + and X − , it follows that it is true for X . In more detail, apply the result just obtained to X + and X − to get X + dμ1 dμ2 = X + dμ2 dμ1 = X + dλ, X − dμ1 dμ2 = X − dμ2 dμ1 = X − dλ, so that
(X + − X − )dμ1 dμ2 =
or
−
X dμ1 dμ2 − X dμ1 dμ2 = X + dμ2 dμ1 − X − dμ2 dμ1 = X + dλ − X − dλ,
or
+
(X + − X − )dμ2 dμ1 =
X dμ1 dμ2 =
(X + − X − )dλ,
X dμ2 dμ1 =
X dλ.
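A quick numerical illustration of the theorem (ours, not part of the text): for a nonnegative integrand on a rectangle equipped with Lebesgue measure, the two iterated integrals agree. The Python sketch below integrates $X(x, y) = x e^{-xy}$ over $[0, 2] \times [0, 3]$ in both orders with simple Riemann sums and compares with the closed-form value $2 - (1 - e^{-6})/3$.

```python
import numpy as np

def X(x, y):
    # nonnegative integrand on [0, 2] x [0, 3]
    return x * np.exp(-x * y)

xs = np.linspace(0.0, 2.0, 1001)
ys = np.linspace(0.0, 3.0, 1501)
dx, dy = xs[1] - xs[0], ys[1] - ys[0]
vals = X(xs[:, None], ys[None, :])            # grid of values, shape (len(xs), len(ys))

int_dy_dx = np.sum(np.sum(vals, axis=1) * dy * dx)   # inner integral over y, outer over x
int_dx_dy = np.sum(np.sum(vals, axis=0) * dx * dy)   # inner integral over x, outer over y

print(int_dy_dx, int_dx_dy)                   # the two iterated integrals agree
print(2.0 - (1.0 - np.exp(-6.0)) / 3.0)       # exact double integral, ~1.6675
```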
Remark 10. From the preceding elaboration, it also follows that theorem is the + dλ = ∞ or true even if both sides are either ∞ or −∞, depending on whether X − X dλ = ∞. Remark 11. The Fubini Theorem also holds (and is established in a similar fashion) for any finite number of product σ -finite measure spaces. Exercises. 1. If the r.v. X is integrable, then show that: n P(|X | ≥ n) → 0. By a counterexn→∞ ample, show that the converse need not be true. Hint: For the converse part, let X take on the values n = 3, 4, . . . , with probabilities pn proportional to (log n + 1)/(n log n)2 . 2. Let X be an integrable r.v. Then, for every ε > 0, there is a simple r.v. X ε such that |X − X ε | < ε. Hint: Write X = X + − X − , consider nonnegative sequences of simple r.v.s {X n } and {Yn } converging to X + and X − , appropriately employ the Dominated Convergence Theorem, and finally, define X ε in terms of X n s and Yn s. 3. Establish the following generalized version of part (iii) of Theorem 2 (Fatou– Lebesgue Theorem). Namely, for n = 1, 2, . . ., let X n , Un , Vn and X , U , V be r.v.s such that a.e. a.e. a.e. U n ≤ Xn ≤ Vn a.e., n ≥ 1, and as n → ∞, X n → X ,Un → U , Vn → V , X finite. Un → U finite and Vn → V finite. Then X n → n→∞
4. Establish the following generalized version of Theorem 3 (Dominated Convergence Theorem). Namely, for n = 1, 2, . . . , let X n , Un , X , U be r.v.s such that a.e. a.e. |X n | ≤ Un a.e., n ≥ 1, and as n → ∞, X n → X , Un → U , and Un → U finite. Then, as n → ∞: (i) X n → X finite; (ii) |X n − X | → 0; (iii) A X n → A X uniformly in A ∈ A. 5. In reference to Exercise 4, leave the setup and the assumptions intact except that a.e.
μ
a.e.
μ
the assumptions X n → X and Un → U are replaced by X n → X and Un → U as n → ∞. Then conclusions (i)–(iii) hold. Hint: For part (i), use the following fact: A sequence of real numbers {xn } converges to a real number x as n → ∞, if and only if for any subsequence {m} ⊆ {n} there exists a further subsequence {r } ⊆ {m} such that xr → x. r →∞
6. Let μ be a σ -finite measure and let ϕ be a nonnegative σ -additive and finite function, both defined on the measurable space (, A). Then show that ϕ μ is equivalent to the following: for every ε > 0, there exists δ = δ(ε) > 0 such that μ(A) < δ implies ϕ(A) < . Hint: If the assertion were not true when ϕ μ, there would exist an ε > 0 such that for every δ > 0 and some A with μ(A) < δ, we would have ϕ(A) ≥ ε. Apply this argument for δn = 1/2n , μ(An ) < 1/2n , and ϕ(An ) ≥ ε for some An , and set A = lim sup An in order to get μ(A) = 0 and ϕ(A) ≥ ε, a contradiction. n→∞
7. All sets figuring below are subsets of the product space 1 × 2 . Then show that (i) E ⊆ F implies E ω1 ⊆ Fω1 and E ω2 ⊆ Fω2 , ω1 ∈ 1 , ω2 ∈ 2 . (ii) E ∩ F = implies E ω1 ∩ Fω1 = E ω2 ∩ Fω2 = , ω1 ∈ 1 , ω2 ∈ 2 . (iii) For n = 1, 2, . . . , ∪ E n = ∪ E n,ω1 , ∪ E n = ∪ E n,ω2 , and in n n ω1 ω2 n n
En = E n,ω1 , En = E n,ω2 , ω1 ∈ 1 , particular, n
ω1
ω2 ∈ 2 . (iv) For n = 1, 2, . . . , ∩ E n n
n
ω1
ω2
n
= ∩ E n,ω1 , ∩ E n n
n
n
ω2
= ∩ E n,ω2 , ω1 ∈ 1 , n
ω2 ∈ 2 . (v) (E c )ω1 = (E ω1 )c , (E c )ω2 = (E ω2 )c , ω1 ∈ 1 , ω2 ∈ 2 . 8.
(i) Consider the measure space (, A, μ) and let C be a fixed set in A. On A, define the function μ◦ by: μ◦ (A) = μ(A ∩ C). Then show that μ◦ is a measure. (ii) Let X be ar.v. defined on (, A, μ) and X dμ exists. Then suppose that show that X dμ◦ also exists and that X dμ◦ = C X dμ.
9. Consider the (σ -finite) measure space (, A, μ), and let {Ai , i = 1, 2, . . .} be a (measurable) partition of . For each i, define the measure (see Exercise 8) on (, A, μ) for which μi by: μi (A) = μ(A ∩ Ai ). Then, if X is a r.v. defined the integral X dμ exists, show that the integrals X dμ i , i ≥ 1, also exist and ∞ X dμ = X dμ. i i=1
10. If X is a simple r.v. (not necessarily nonnegative), defined on the (product) space (1 × 2 , A1 × A2 , λ = μ1 × μ2 ) with μ1 and μ2 σ -finite, for which X dλ exists, then appealing to Theorem 12 and Remark show directly (i.e., without , ω )dμ dμ = X (ω , ω )dμ dμ = X (ω1 , ω2 )dλ = 10) that X (ω 1 2 1 2 1 2 2 1 X dλ. 11. Let X be a r.v. defined on the product space (1 × 2 , A1 × A2 , λ = μ1 × μ2 ) and suppose that X dλ exists. Then show that the Fubini theorem holds true. 12. If the r.v.s X 1 , . . . , X n are i.i.d. with E X 1 = μ ∈ and V ar (X 1 ) = σ 2 ∈ (0, ∞), √ ¯ d then by the CLT, n( X n −μ) → Z ∼ N (0, 1), where X¯ n = 1 nj=1 X j is the σ
n
n→∞
sample mean of the X j s. Show that the CLT implies the Weak Law of Large P Numbers (WLLN); i.e., X¯ n → μ. n→∞
13. Let (, A, μ) = ((0, 1], B(0,1] , λ) where λ is the Lebesgue measure, and consider the function f : (0, 1] → defined by 1 1 < x ≤ , n = 1, 2, . . . . n+1 n Then investigate whether or not the integral (0,1] f dλ exists. Also, compute it if it exists, as well as the (0,1] | f |dλ. 14. For n = 1, 2, . . . , consider the r.v.s X 1 , X 2 , . . . and X , and show that
|X n − X | P → 0. X n −→ X if and only if E n→∞ 1 + |X n − X | n→∞ f (x) = (−1)n n,
Hint: Refer to Exercise 14 in Chapter 4. 15. Take (, A, P) = ((0, 1), B(0,1) , λ), λ being the Lebesgue measure, and, for n = 0, 1, . . ., define the r.v.s X n by: X 2n+1 = I(0, 1 ) , 2
X 2n+2 = I[ 1 ,1) . 2
Then, as n → ∞, show that: lim inf X n ≤ lim inf X n , lim sup X n ≤ lim sup X n (as it should be, by Theorem 2, since 0 ≤ X n ≤ 1 for all n, and the bounds are integrable). 16. In reference to Exercise 12 in Chapter 4, define the r.v. X by X (ω) = 21ω and then show that X dμ = 2. 17. For n ≥ 1, let h n , gn , G n , and h, g, G be real-valued measurable functions defined on ( k , B k , λk ), where λk is the Lebesgue measure on B k , be such that: (i) As n → ∞, h n (x) → h(x), gn (x) → g(x), G n (x) → G(x) a.e. [λk ].
(ii) For all n ≥ 1, gn (x) ≤ h n (x) ≤ G n (x) a.e. [λk ]. (iii)
gn dλ → g dλ , G n dλ → k
k
g dλk , G dλk are finite. and k
k
k
k
k
k
G dλk , as n → ∞,
k
Then show that k h n dλk → k h dλk and k h dλk is finite. 18. For n ≥ 1, let X n , Yn , and X be r.v.s defined on the probability space (, A, P), d
and suppose that X n → X as n → ∞. Then, as n → ∞: P
d
(i) Yn − X n → c, a constant, implies Yn → X + c. P
d
(ii) Yn → c implies X n Yn → cX . 19. As a variant of Exercise 6, consider the following situation. Let (, A, P) be a probability space and let X be an integrable r.v. On A, define the (finite) measure ν by ν(A) = A X d P. Then show that ν << P if and only if for every ε > 0 there exists δ = δ(ε)(> 0) such that P(A) < δ implies ν(A) < ε. Hint: Use appropriately the Dominated Convergence Theorem. 20. Let X be a nonnegative integrable r.v. Then: (i) Use the Fubini Theorem to show that EX =
∞
P(X ≥ t)dt.
0
(ii) Apply this result in case the d.f. of X is F(x) = 1 − e−λx for x ≥ 0 and 0 for x < 0 (λ > 0); or F(x) = 0 for x < 0, F(x) = x for 0 ≤ x ≤ 1, and F(x) = 1 for x ≥ 1.
CHAPTER 6

Standard Moment and Probability Inequalities, Convergence in the rth Mean and its Implications
This chapter consists of two sections. The first section is devoted to the standard moment and probability inequalities. They include the Hölder (Cauchy–Schwarz) inequality, referring to the expectation of the product of r.v.s; the Minkowski and the cr -inequality, referring to the expectation of the sum of r.v.s; and the Jensen inequality, concerning convex functions. The probability inequality in Theorem 6 provides both an upper and a lower bound, and the upper bound gives the Markov inequality and the Tchebichev inequality, as special cases. In the second section, the concepts of convergence in the r th mean as well as mutual convergence in the r th mean are introduced, and it is shown that they are equivalent. Most of the remainder of this section is devoted to establishing various implications of convergence in the r th mean, and also to giving sufficient conditions that imply convergence in the r th mean. The concepts of uniform integrability and uniform continuity are instrumental in these derivations. These facts are summarized in the form of a table for easy reference.
6.1 Moment and Probability Inequalities

From now on, the measure spaces to be considered will be probability spaces $(\Omega, \mathcal{A}, P)$.

Theorem 1. Let $X$ be a r.v. whose $r$th absolute moment is finite; i.e., $E|X|^r < \infty$. Then $E|X|^{r'} < \infty$ for all $0 \le r' \le r$.
Proof. For each $\omega \in \Omega$, we have $|X(\omega)|^{r'} \le 1 + |X(\omega)|^r$ for $0 \le r' \le r$. In fact, this is clearly true if $|X(\omega)| \le 1$, while if $|X(\omega)| > 1$, it is also true because then $|X(\omega)|^{r'} \le |X(\omega)|^r$; this inequality implies $E|X|^{r'} \le 1 + E|X|^r < \infty$. $\square$

Theorem 2 (The Hölder inequality). Let $X, Y$ be two r.v.s and let $r > 1$. Then $E|XY| \le E^{1/r}|X|^r \times E^{1/s}|Y|^s$ for $s > 0$ such that $\frac1r + \frac1s = 1$ (where it is assumed that $E|X|^r, E|Y|^s < \infty$, because otherwise the inequality is trivially true). In particular, for $r = s = 2$, we have $E|XY| \le E^{1/2}|X|^2 \times E^{1/2}|Y|^2$, or $E^2|XY| \le E|X|^2\,E|Y|^2$ (which is known as the Cauchy–Schwarz inequality). A numerical sanity check is sketched below.
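Before the proof, a quick numerical sanity check of Theorem 2 and of the monotonicity of $E^{1/r}|X|^r$ established later in this section (the code and names below are ours, not the book's). Since the empirical distribution of a sample is itself a probability measure, the sample versions of these inequalities must hold exactly.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 100_000
X = rng.standard_normal(N) * 2.0 + 1.0
Y = rng.exponential(1.5, N)

r = 3.0
s = r / (r - 1.0)  # conjugate exponent: 1/r + 1/s = 1

lhs = np.mean(np.abs(X * Y))
rhs = np.mean(np.abs(X) ** r) ** (1 / r) * np.mean(np.abs(Y) ** s) ** (1 / s)
print(lhs <= rhs, lhs, rhs)  # Hölder inequality holds on the sample

# Lyapunov-type monotonicity: (E|X|^r)^(1/r) is nondecreasing in r
norms = [np.mean(np.abs(X) ** t) ** (1 / t) for t in (0.5, 1, 2, 3, 4)]
print(norms, all(a <= b + 1e-12 for a, b in zip(norms, norms[1:])))
```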
In proving Theorem 2, we need the following lemma.
Lemma 1. For x, y ∈ and r , s > 0 such that r1 + 1s = 1, we have |x y| ≤
|x|r r
+ |y|s . s
Proof. Since |x y| = |x| |y|, it suffices to prove the lemma for x, y > 0; i.e., to s r show x y ≤ xr + ys . For fixed x, consider the following function of y, f (y) = y s−1 (s−1)y s−2 xr . Thus f (y) = 0 yields s , y > 0. Hence f (y) = − r y 2 + s = r (s−1) y s ; from r1 + 1s = 1, it follows that r (s−1) = 1. So x r = y s . Next, s s s−3 r r f (y) = 2x + (s−1)(s−2)y , which for x r = y s becomes 2y + (s−1)(s−2) y s−3 > 0. s s r y3 r y3 Thus f (y) is minimized for y s = x r , and this minimum is xr ry xr
+
min f (y) = x. So, f (y) ≥ x or y>0
x r ys + ≥ x y. r s
Proof of Theorem 2. In the first place, if E |X |r = 0 then |X |r = 0 a.s. so that X = 0 a.s. and X Y = 0 a.s. or |X Y | = 0 a.s. Then E |X Y | = 0; thus the inequality is true. Similarly, if E |Y |s = 0. So, we assume that E |X |r, E |Y |r > 0, and set s r X (ω) Y (ω) x = E 1/r , y = E 1/s . Then |x y| ≤ |x|r + |y|s becomes |X |r |Y |s |X (ω)|r |Y (ω)|s |X (ω)Y (ω)| ≤ + for all ω ∈ ; hence r s r E 1/r |X | E 1/s |Y | r E |X | s E |Y |s E |X |r E |Y |s 1 1 E |X Y | ≤ + = + = 1 or r s r s 1/r 1/s E |X | E |Y | r E |X | sE |Y | r s E |X Y | ≤ E 1/r |X |r E 1/s |Y |s . Corollary. Proof.
E 1/r |X |r is nondecreasing in r (> 0).
Consider the Cauchy–Schwarz inequality E 2 |X Y | ≤ E |X |2 × E |Y |2 and
for 0 < r < r , replace X by |X |
r −r 2
and Y by |X |
r +r 2
. Then we get
E 2 |X |r ≤ E |X |r −r E |X |r +r , hence 2 log E |X |r ≤ log E |X |r −r 1 + log E |X |r +r or log E |X |r ≤ (log E |X |r −r + log E |X |r +r ). 2
Thus, if we consider log E |X |r as a function of r , g(r ) say, then g is convex, since the last relationship above is equivalent to g(r ) ≤ 21 [g(r − r ) + g(r + r )], where
+r ) r = (r −r )+(r and g is continuous. (To see that g is continuous, let r ↑ ro ; 2 r then |X | ≤ 1 + |X |ro (by Theorem 1), so that the Dominated Convergence Theorem applies and gives E |X |r → E |X |ro or g(r ) → g(ro ). Next, let r ↓ ro . Then |X |r ≤ r →r0
r →r0
1 + |X |r1 for some r1 > ro and all ro ≤ r ≤ r1 , so that the previous argument applies.) Now g(0) = 0 and the slope of the line through the points (0, 0) and ) 1 r (r , g(r )) is increasing in r . But this slope is g(r r (r > 0). Thus r log E |X | ↑ in r or 1
1
log E r |X |r ↑ in r or E r |X |r ↑ in r .
[Figure: the convex function $g(r) = \log E|X|^r$; the slope of the chord from $(0, 0)$ to $(r, g(r))$ is nondecreasing in $r$.]
(For the facts on convex functions stated earlier, see, e.g., page 73 in Hardy et al. (1967)) For r ≥ 1, we have
Theorem 3 (The Minkowski inequality). 1
1
1
E r |X + Y |r ≤ E r |X |r + E r |Y |r (where it is assumed that E |X |r , E |Y |r < ∞, because otherwise the inequality is trivially true). Proof.
For r = 1, we have |X + Y | ≤ |X | + |Y | so that E |X + Y | ≤ E |X | + E |Y | .
So the inequality is true. Now let r > 1. Then E |X + Y |r = E(|X + Y | |X + Y |r −1 ) ≤ E(|X | |X + Y |r −1 + |Y | |X + Y |r −1 ) = E(|X | |X + Y |r −1 ) + E(|Y | |X + Y |r −1 ). At this point, applying the Hölder inequality for the given r (> 1) and s such that 1 1 r + s = 1 (from which it follows that s = r /(r − 1)), we get 1
1
1
1
E |X + Y |r ≤ E r |X |r E s |X + Y |(r −1)s + E r |Y |r E s |X + Y |(r −1)s 1
1
1
= E s |X + Y |(r −1)s (E r |X |r + E r |Y |r ). Now, from
1 r
+
1 s
= 1 we get (r − 1)s = r . Thus 1
1
1
E |X + Y |r ≤ E s |X + Y |r (E r |X |r + E r |Y |r ). 1
Hence, if E |X + Y |r > 0, then divide both sides by E r |X + Y |r to get 1
1
1
E r |X + Y |r ≤ E r |X |r + E r |Y |r . If E |X + Y |r = 0, the inequality is trivially satisfied.
For r ≥ 0, we have
Theorem 4 (The cr -inequality).
E |X + Y |r ≤ cr (E |X |r + E |Y |r ), where 1 if r ≤ 1 cr = r −1 2 if r > 1.
For the proof of this theorem we need the following lemma. Lemma 2. For x, y ∈ and r ≥ 0, we have |x + y|r ≤ cr (|x|r + |y|r ), where cr is as in the theorem. Proof. Since |x + y| ≤ |x| + |y| implies |x + y|r ≤ (|x| + |y|)r , it suffices to show that (|x| + |y|)r ≤ cr (|x|r + |y|r ). From this it also follows that it suffices to prove the lemma for x, y > 0. Case 1: 0 ≤ r ≤ 1. We have y x , < 1 imply x+y x+y and
x x+y
r
x , > x+y
y x+y
r >
y x+y
x r + yr >1 (x + y)r
or (x + y)r ≤ x r + y r , as was to be seen. y x , 1 − p = x+y . Then p, 1 − p < 1. Set also f ( p) = Case 2: r > 1. Set p = x+y r r p + (1 − p) and minimize it. From f ( p) = r pr −1 − r (1 − p)r −1 = 0, we get pr −1 = (1 − p)r −1 ; hence p = 1 − p and p =
1 . 2
Next, f ( p) = r (r − 1)[ pr −2 + (1 − p)r −2 ] is > 0 for all p(> 0) and hence for p = 21 . Since min f ( p) = f ( 21 ) = 2r1−1 , we have then: 0< p<1
f ( p) ≥
1 2r −1
or
x x+y
r
+
y x+y
r ≥
1 2r −1
or (x + y)r ≤ 2r −1 (x r + y r ),
as was to be seen. Proof of Theorem 4.
For all ω ∈ , we have
|X (ω) + Y (ω)|r ≤ cr [|X (ω)|r + |Y (ω)|r ]. Hence E |X + Y |r ≤ cr (E |X |r + E |Y |r ).
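The moment inequalities just established (Hölder, Minkowski, and the $c_r$-inequality) can also be sanity-checked numerically on a sample, since the empirical distribution is a probability measure; the short Python sketch below (ours, not the book's) does this for $r = 2.5$.

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.standard_normal(200_000)
Y = rng.exponential(2.0, 200_000)
r = 2.5
cr = 2 ** (r - 1)  # c_r for r > 1

m = lambda Z, p: np.mean(np.abs(Z) ** p)  # empirical E|Z|^p

# Minkowski: E^{1/r}|X+Y|^r <= E^{1/r}|X|^r + E^{1/r}|Y|^r
print(m(X + Y, r) ** (1 / r) <= m(X, r) ** (1 / r) + m(Y, r) ** (1 / r))

# c_r-inequality: E|X+Y|^r <= c_r (E|X|^r + E|Y|^r)
print(m(X + Y, r) <= cr * (m(X, r) + m(Y, r)))
```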
Now, let I be an open interval ⊆ and assume g : I → , convex; i.e., for any x, x ∈ I and every α ∈ [0, 1], we have g[αx + (1 − α)x ] ≤ αg(x) + (1 − α)g(x ). For such a function g, we have the following facts: (1) g is continuous (hence measurable; see Exercise 2). (2) Through any point (xo , g(xo )) of the graph of g, there passes a straight line that stays beneath the graph of g or at most touches it. Such a line is called a line of support of the curve at the point (xo , g(xo )), and its equation is: y − g(x0 ) = λ(x0 )(x − x0 ), for some number λ(x0 ), or y = g(x0 ) + λ(x0 )(x − x0 ). But g(x) ≥ y, x ∈ I . Thus g(x) ≥ g(x0 ) + λ(x0 )(x − x0 ) or g(x) − g(x0 ) ≥ λ(x0 )(x − x0 ), x ∈ I . Theorem 5 (The Jensen inequality). Let g : I → be convex and for a r.v. X taking values in I , let E X ∈ I and E g(X ) exist. Then g(E X ) ≤ E g(X ).
Proof. In the first place, g is continuous hence measurable and therefore g(X ) is a r.v. Next, in g(x) − g(xo ) ≥ λ(xo )(x − xo ) replace xo by E X and x by X . We then have g(X ) − g(E X ) ≥ λ(E X )(X − E X ). Taking the expectations of both sides, one gets g(E X ) ≤ E g(X ). Applications. 1. Let g(x) = x 2 . Then g is convex and therefore E 2 X ≤ E X 2 (as also follows from the Cauchy–Schwartz inequality with Y = 1). 2. Let g(x) = x r , x ∈ (0, ∞), r ≥ 1. Then g is convex over the set (0, ∞) and r2
therefore E r |X | ≤ E |X |r . In particular, g(x) = x r1 , x ∈ (0, ∞) is convex in x for r2 ≥ r1 (>0). Replacing X by |X |r1 , we then get r2 1 1 r2 r2 E |X |r1 r1 ≤ E |X |r1 r1 or E r1 |X |r1 ≤ E |X |r2 or E r1 |X |r1 ≤ E r2 |X |r2 ; 1
i.e., E r |X |r ↑ in r (r > 0), as was seen before. Remark 1.
This was also established in the Corollary to Theorem 2.
Definition 1. α is said to be the almost sure sup of the r.v. X (α = a.s. sup X ) if P(X > α) = 0 and for any β < α, P(X > β) > 0. Theorem 6.
Let X be a r.v. and let g : → [0, ∞), Borel function. Then
(i) If g is even, and nondecreasing on [0, ∞), we have that E g(X ) − g(c) E g(X ) ≤ P(|X | ≥ c) ≤ , c > 0 constant, α g(c) and α = a.s. sup g(X ).
(ii) If g is nondecreasing on , then we have E g(X ) E g(X ) − g(c) ≤ P(X ≥ c) ≤ , c ∈ . α g(c) Remark 2.
∞ ∞
is interpreted as 0.
Proof. (i) Let A = (|X | ≥ c). Then for ω ∈ A we have |X (ω)| ≥ c; equivalently, X (ω) ≥ c or −X (ω) ≥ c. Hence g(−X (ω)) = g(X (ω)) ≥ g(c); i.e.,g(X ) ≥ g(c) on A whether X ≥ c or X ≤ −c. Similarly, g(X ) ≤ g(c) on Ac . Next, E g(X ) = g(X )dP = g(X )dP + g(X )dP Ac
A
and
g(c)P(A) ≤ A g(X )dP ≤ α P(A) . 0 ≤ Ac g(X )dP ≤ g(c)
Thus g(c)P(A) ≤ Eg(X ) ≤ α P(A) + g(c) and hence P(A) = P(|X | ≥ c) ≤
E g(X ) E g(X ) − g(c) and P(|X | ≥ c) ≥ . g(c) α
(ii) Let B = (X ≥ c). Then g(X ) ≥ g(c) on B and g(X ) ≤ g(c) on B c . Since E g(X ) = g(X )dP + g(X )dP, Bc
B
we get
g(c)P(B) ≤ B g(X )dP ≤ α P(B) , 0 ≤ B c g(X )dP ≤ g(c)
which leads to g(c)P(B) ≤ E g(X ) ≤ α P(B) + g(c), g(X ) and hence P(B) = P(X ≥ c) ≤ Eg(c) , P(X ≥ c) ≥ E g(X α)−g(c) .
Special Cases: By taking g(x) = |x|r , r > 0, we get from the right-hand side of |r the inequality in (i): P(|X | ≥ c) ≤ E |X cr , which is the Markov inequality.
Also, P(|X − E X | ≥ c) ≤ E |X −crE X | , by replacing X by X − E X (by assuming that E X is finite); in particular, for r = 2 we have the Tchebichev inequality: r
P(|X − E X | ≥ c) ≤
σ 2 (X ) . c2
This section is concluded with a simple example regarding the Tchebichev inequality.
Example 1. When the distribution of $X$ is not known, which is most often the case in statistics, the Tchebichev inequality lends itself handily to determining the smallest sample size $n$ so that the sample mean will lie within a prescribed multiple of standard deviations $\sigma$ from the population mean $\mu$ with probability no smaller than a preassigned value $p$. Thus, if $X_1, \ldots, X_n$ are i.i.d. r.v.s with expectation $\mu \in \Re$ and variance $\sigma^2 \in (0, \infty)$, then the sample mean of the $X_j$s is $\bar X_n = \frac1n \sum_{j=1}^n X_j$ with $E\bar X_n = \mu$ and $\sigma^2(\bar X_n) = \sigma^2/n$. Then $P(|\bar X_n - \mu| < k\sigma) \ge 1 - \frac{1}{nk^2}$, and if we set $1 - \frac{1}{nk^2} \ge p$, then the required sample size is the smallest value of $n$ that is greater than or equal to $\frac{1}{k^2(1 - p)}$. The Markov inequality provides an upper bound for the probability $P(|X| \ge c)$ when $E|X|^2 = \infty$ but $E|X|^r < \infty$ for some $(0 <) r < 2$. A small numerical illustration of the sample-size calculation is given below.
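A small computation based on Example 1 (ours, not the book's): for a tolerance of $k$ standard deviations and a guaranteed probability $p$, the Tchebichev bound requires $n \ge 1/(k^2(1 - p))$; a simulation then shows that the actual coverage typically far exceeds $p$, since the bound is conservative.

```python
import math
import numpy as np

k, p = 0.5, 0.95  # want P(|Xbar_n - mu| < k * sigma) >= p
n = math.ceil(1.0 / (k ** 2 * (1.0 - p)))
print(n)          # smallest n with 1 - 1/(n k^2) >= p : here n = 80

# check the actual coverage by simulation for, e.g., exponential data
rng = np.random.default_rng(5)
mu = sigma = 1.0  # Exp(1): mean 1, standard deviation 1
xbar = rng.exponential(mu, size=(50_000, n)).mean(axis=1)
print(np.mean(np.abs(xbar - mu) < k * sigma))  # well above 0.95 in practice
```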
6.2 Convergence in the r th Mean, Uniform Continuity, Uniform Integrability, and Their Relationships Definition 2. Let X , X n , n = 1, 2, . . ., be r.v.s such that E |X |r , E |X n |r < ∞, n = 1, 2, . . ., for some r > 0. We say that X n converges in the r th mean to X , and write (r ) X n → X , if E |X n − X |r → 0. n→∞
n→∞
(2)
For r = 2, the convergence is referred to as convergence in quadratic mean, X n → X , n→∞
q.m.
or X n → X .
n→∞
Remark 3. Since E |X n − X |r ≤ cr (E |X n |r +E |X |r ), we have that E |X n − X |r < ∞, n = 1, 2, . . . . At this point, it should be mentioned that the limit in the r th mean is a.s. uniquely defined. That is, we have the following Proposition 1.
(r )
(r )
n→∞
n→∞
Let X n → X and X n → Y or E |X n − X |r → 0 and
E |X n − Y |r → 0. Then X = Y a.s.
n→∞
n→∞
Proof.
Indeed, E |X − Y |r = E |(X n − Y ) − (X n − X )|r ≤ cr (E|X n − X |r + E|Yn − Y |r ) → 0, n→∞
so that E |X − Y |r = 0 and hence |X − Y |r = 0 a.s. or X = Y a.s.
The following theorem will prove useful in many cases. Theorem 7.
(r )
P
n→∞
n→∞
Let E |X n |r < ∞ for all n. Then X n → X implies X n → X and
E |X n |r → E |X |r finite. (However, see also Theorem 14.) n→∞
Proof. The first conclusion is immediate by the Markov inequality. As for the second, we have Case 1: 0 < r ≤ 1. By the cr -inequality we get E |X n |r = E |(X n − X ) + X |r ≤ E |X n − X |r + E |X |r E |X |r = E |(X n − X ) + X n |r ≤ E |X n − X |r + E |X n |r , so that E |X n |r − E |X |r ≤ E |X n − X |r E |X |r − E |X n |r ≤ E |X n − X |r , or − E|X n |r − E|X |r ≤ E|X n − X |r
and hence E |X n |r − E |X |r ≤ E |X n − X |r → 0 so that E |X n |r → E |X |r . n→∞
n→∞
Case 2: r > 1. By the Minkowski inequality we get 1 1 1 1 E r |X n |r = E r |(X n − X ) + X |r ≤ E r |X n − X |r + E r |X |r 1
1
1
1
E r |X |r = E r |(X n − X ) + X n |r ≤ E r |X n − X |r + E r |X n |r , so that ⎧ 1 ⎨ E r |X n |r − E r1 |X |r ≤ E r1 |X n − X |r
⎩ E r1 |X |r − E r1 |X n |r ≤ E r1 |X n − X |r , or − E r1 |X n |r − E r1 |X |r ≤ E r1 |X n − X |r
1 1 1 1 1
and hence E r |X n |r − E r |X |r ≤ E r |X n − X |r → 0; thus E r |X n |r → E r |X |r n→∞
or E |X n |r → E |X |r .
n→∞
Finiteness of E |X |r follows from E |X n |r − E |X |r ≤ E |X n − X |r (for 1 1
1 0 < r ≤ 1) or from E r |X n |r − E r |X |r ≤ E r |X n − X |r (for r > 1). n→∞
Theorem 7 is supplemented now by the following result. (r )
P
n→∞
n→∞
If X n → X , then X n → X (as was seen in Theorem 7). Conversely,
Theorem 8. P
(r )
n→∞
n→∞
if X n → X and P(|X n | ≤ M < ∞) = 1, then X n → X , for every r > 0. Proof.
P
Assume that X n → X and P(|X n | ≤ M) = 1. We have then P
a.s.
n→∞
n→∞
X n → X implies that there exists {m} ⊆ {n} such that X m → X (by Theorem 5 (ii) in Chapter 3). Hence P(|X n | ≤ M) = 1 implies P(|X | ≤ M) = 1 (by Exercise 18 in Chapter 3). Then, by Lemma 2, |X n − X |r ≤ cr (|X n |r + |X |r ) and P(|X n − X |r ≤ 2cr M r ) = 1,
and therefore
|X n − X |r dP +
E |X n − X |r =
(|X n −X |≥ε) ≤ 2cr M r P(|X n
|X n − X |r dP
(|X n −X |<ε) r
− X | ≥ ε) + ε .
Hence lim sup E |X n − X |r ≤ εr . Letting now ε → 0, we obtain E |X n − X |r → 0.
n→∞
n→∞
Remark 4.
(r )
In Definition 2 we defined X n → X by assuming that E |X n |r < ∞, n→∞
n = 1, 2, . . . , and E |X |r < ∞. Now let us assume only that E |X n |r < ∞, n = 1, 2, . . . , and let E |X n − X |r → 0 for some r.v. X . Then it follows that n→∞
E |X |r < ∞. In fact, E |X |r = E |(X − X n ) + X n |r ≤ cr (E |X n − X |r + E |X n |r ). Next, E |X n − X |r → 0 implies that for n ≥ N (some N ), E |X n − X |r < ∞. Thus E |X |r < ∞.
n→∞
Definition 3. Let E |X n |r < ∞, n = 1, 2, . . . . We say that {X n } converges mutually in the r th mean if E |X m − X n |r → 0. m,n→∞
Then we have the following theorem. (r )
X n → X , some r.v. X , if and n→∞
Theorem 9 (Completeness in the rth mean theorem). only if {X n } converges mutually in the r th mean. Proof.
(r )
Let X n → X . Then n→∞
E |X n − X m |r = E |(X n − X ) + (X − X m )|r ≤ cr (E |X n − X |r + E |X m − X |r ) Now let X m − X n
(r )
→
m,n→∞
0. Then X m − X n
P
→
→
m,n→∞
0.
0 by the Markov inequality; i.e.,
m,n→∞ P {X n } converges mutually in probability. Then X n → n→∞
X , some r.v. X (by Theorem 6 a.s.
in Chapter 3), which implies the existence of {k} ⊆ {n} such that X k → X or k→∞
a.s.
−X k → − X with P(X = X ) = 0. Henceforth we treat X as if it were X . Then, k→∞
a.s.
for every fixed m, we get X m − X k → X m − X . Thus we have k→∞
0 ≤ |X m − X k |r and lim inf |X m − X k |r = lim |X m − X k |r k→∞
k→∞
= |X m − X |r a.s.
Applying part (i) of the Fatou–Lebesgue Theorem, we get then r lim inf |X m − X k | ≤ lim inf |X m − X k |r , or k→∞ k→∞ lim |X m − X k |r ≤ lim inf |X m − X k |r , or k→∞ k→∞ |X m − X |r ≤ lim inf |X m − X k |r ; i.e., E |X m − X |r ≤ lim inf E k→∞
k→∞ |X m − X k |r
.
Letting also m → ∞, we get by our assumption: lim sup E |X m − X |r ≤ lim inf lim inf E |X m − X k |r m→∞
m→∞
= So E |X n − X
|r
k→∞
lim E |X m − X k |r = 0.
m,k→∞
→ 0.
n→∞
The following result is a characterization of integrability of a r.v. Namely, |X | dP → 0. Theorem 10. The r.v. X is integrable if and only if (|X |≥c)
c→∞
Proof. Let |X | dP < ∞. Then P(|X | < ∞) = 1, because otherwise |X |dP = ∞. a.s.
Also, |X | I(|X |≥c) ≤ |X | independent of c and integrable, and |X | I(|X |≥c) → 0 as c → ∞. Therefore [|X | I(|X |≥c) ]dP → 0 by the Dominated Convergence Theorem, or c→∞ |X | dP → 0. Next, if |X | dP → 0, there exists co sufficiently large c→∞ c→∞ |≥c) (|X |≥c) (|X |X | dP < 1. Thus such that (|X |≥co )
|X | dP = so that
|X | dP +
(|X |≥co )
|X | dP < 1 + co ,
(|X |
|X | dP is finite.
Remark 5. The theorem need not be true if the measure is not finite. For example, let (, A, μ) = (, B, λ), where λ is the Lebesgue measure, and let X = 1. Then |X | dλ = λ( ) = 0, whereas |X | dλ = ∞. for c > 1, (|X |≥c)
Now, replace X by a sequence {X n }, n ≥ 1, and give the following definition. Definition 4.
The r.v.s X n , n ≥ 1, are said to be uniformly integrable if |X n | → 0 uniformly in n ≥ 1. c→∞
(|X n |≥c)
The following concept will also be needed later.
Definition 5. If X n , n ≥ 1, are integrable, then |X n | ,n ≥ 1, are said to be uniformly (P−) absolutely continuous if P(A) → 0 implies A |X n | → 0 uniformly in n ≥ 1; i.e., for ε > 0, there exists δ(ε) independent of n such that |X n | < ε for every n ≥ 1. P(A) < δ(ε) implies A
In this definition, the index n may be replaced by t ∈ T ⊆ . (See also Exercise 6 in Chapter 5.) Theorem 11. The r.v.s X n , n ≥ 1, are uniformly integrable if and only if the integrals of their absolute values are bounded and uniformly continuous. Proof. Assume uniform integrability. Then for ε > 0, there exists c = c(ε) > 0 |X n | < ε for all n. Now large enough such that (|X n |≥c)
|X n | =
|X n | + (|X n |
i.e.,
|X n | ≤ c P(|X n | < c) + ε ≤ c + ε;
(|X n |≥c)
|X n | are bounded. Next, choose c > 0 such that
Take δ = δ(ε) =
ε 2c
(|X n |≥c)
|X n | <
ε 2
for all n.
and let A be such that P(A) < δ. Then
|X n | = A
|X n | +
A∩(|X n |
+ (|X n |≥c)
|X n | ≤
A∩(|X n |≥c)
|X n | A∩(|X n |
ε ε ε |X n | ≤ c P(A) + < c + = ε; 2 2c 2
i.e., P(A) < δ implies A |X n | < ε and hence we have uniform continuity. Now assume boundedness and uniform continuity. Boundedness implies |X n | ≤ M for all n, so that P(|X n | ≥ c) ≤ Mc for all n. Then, by uniform continuity, for ε > 0, there exists δ(ε) > 0, call it δ, such that if A has P(A) < δ, we have A |X n | < ε for all n. Taking A = (|X n | ≥ c) andc large enough to make Mc ≤ δ, and since then |X n | < ε for all n, which proves uniform P(|X n | ≥ c) < δ, we will have (|X n |≥c)
integrability. The following theorem provides a criterion of uniform integrability. (r )
If E|X n |r < ∞ for all n, and X n → X , then |X n |r are uniformly n→∞ r Proof. By Theorem 11, it suffices to prove that |X n | are bounded and uniformly (r ) continuous. Now X n → X implies |X n |r → |X |r < ∞, by Theorem 7, and
Theorem 12. integrable.
n→∞
n→∞
hence the E|X_n|^r are bounded. Next, for A ∈ A, X_n I_A = X I_A + (X_n − X) I_A and then, by the c_r-inequality, we get
|X_n I_A|^r ≤ c_r |X I_A|^r + c_r |(X_n − X) I_A|^r, or ∫_A |X_n|^r dP ≤ c_r ∫_A |X|^r dP + c_r ∫_A |X_n − X|^r dP.
Now ∫_A |X|^r dP is absolutely continuous, as is easily seen (see Exercise 10). Thus for ε > 0, there exists δ_0(ε) > 0 such that if P(A) < δ_0(ε), then ∫_A |X|^r dP < ε/(2c_r). From E|X_n − X|^r → 0 as n → ∞ we have that there exists n_0 such that E|X_n − X|^r < ε/(2c_r) for n > n_0. Therefore for n > n_0 and A such that P(A) < δ_0(ε), we have
∫_A |X_n|^r dP ≤ c_r · ε/(2c_r) + c_r · ε/(2c_r) = ε.
Next consider ∫_A |X_n|^r dP, n = 1, . . . , n_0. Then for ε > 0 there exists δ_n(ε), n = 1, . . . , n_0, such that P(A) < δ_n(ε) implies ∫_A |X_n|^r dP < ε, because each ∫_A |X_n|^r dP is absolutely continuous. Set δ(ε) = min{δ_0(ε), δ_1(ε), . . . , δ_{n_0}(ε)}. Then for ε > 0 and A such that P(A) < δ(ε), we have ∫_A |X_n|^r dP < ε for all n.
Theorem 13 (Necessary and sufficient conditions for convergence in the r-th mean). Let E|X_n|^r < ∞ for all n. Then
(i) X_n →(r) X as n → ∞ if and only if, either
(ii) X_n →P X as n → ∞ and the ∫ |X_n|^r dP are uniformly continuous, or
(ii') X_n →P X as n → ∞ and the ∫ |X_n − X|^r dP are uniformly continuous.
Proof. The theorem is established by showing that: (i) ⇒ (ii), (ii') ⇒ (i) (which implies that (ii') ⇒ (ii)), and (ii) ⇒ (ii') (which implies that (ii) ⇒ (i)). Then (i) and (ii) are equivalent, and (i) and (ii') are also equivalent. In the form of a diagram, we have (i) ⇒ (ii) ⇒ (ii') ⇒ (i).
Indeed, X_n →(r) X implies X_n →P X (by Theorem 8) and that the |X_n|^r, n ≥ 1, are uniformly integrable (by Theorem 12), which, in turn, implies that the ∫ |X_n|^r dP, n ≥ 1,
are uniformly continuous (and E|X_n|^r ≤ M (< ∞), n ≥ 1) (by Theorem 11). So (i) ⇒ (ii).
Next, (ii') ⇒ (i) because, with A_n = (|X_n − X| ≥ ε), for any ε > 0,
E|X_n − X|^r = ∫_{A_n} |X_n − X|^r dP + ∫_{A_n^c} |X_n − X|^r dP ≤ ε + ε^r,
by (ii'), for all n > n_0 = n_0(ε), so that E|X_n − X|^r → 0; i.e., (ii') ⇒ (i).
Finally, (ii) ⇒ (ii') because, by the c_r-inequality,
|(X_n − X) I_A|^r ≤ c_r |X_n I_A|^r + c_r |X I_A|^r, A ∈ A,
or
∫_A |X_n − X|^r dP ≤ c_r ∫_A |X_n|^r dP + c_r ∫_A |X|^r dP.   (6.1)
By (ii),
∫_A |X_n|^r dP < ε/(2c_r) for all n, provided P(A) < δ(ε), some suitable δ(ε) > 0.   (6.2)
Next, we show that
∫_A |X|^r dP < ε/(2c_r), for the same A as that in (6.2).   (6.3)
(This would be true if E|X|^r < ∞, which we do not know.) That (6.3) is true is seen as follows: By (ii), X_n →P X as n → ∞. Hence there exists {m} ⊆ {n} such that X_m →a.s. X' as m → ∞, or |X_m|^r I_A →a.s. |X'|^r I_A, A ∈ A, with P(X ≠ X') = 0. Henceforth we treat X' as if it were X. So, 0 ≤ |X_m|^r I_A →a.s. |X|^r I_A as m → ∞. Then, by part (iii) of the Fatou–Lebesgue Theorem (Theorem 2 in Chapter 5),
∫ |X|^r I_A dP = ∫ lim_{m→∞} |X_m|^r I_A dP = ∫ lim inf_{m→∞} |X_m|^r I_A dP ≤ lim inf_{m→∞} ∫ |X_m|^r I_A dP,
or
∫_A |X|^r dP ≤ lim inf_{m→∞} ∫_A |X_m|^r dP.   (6.4)
Again, by (ii) (see also (6.2)),
∫_A |X_m|^r dP ≤ ε/(2c_r) for all m, provided P(A) < δ(ε).   (6.5)
From (6.4) and (6.5), it follows that
∫_A |X|^r dP ≤ ε/(2c_r) for the same A as in (6.5).
Then (6.2) and (6.3) hold simultaneously, and hence (6.1) yields
∫_A |X_n − X|^r dP ≤ c_r · ε/(2c_r) + c_r · ε/(2c_r) = ε for all n,
as was to be seen.
Corollary 1. If X_n →(r) X as n → ∞, then X_n →(r') X as n → ∞ for all (0 <) r' ≤ r.
Proof. By the theorem, it suffices to prove that X_n →P X and that the ∫ |X_n − X|^{r'} dP are uniformly continuous. Since X_n →(r) X implies X_n →P X, we have to prove the latter
part only. Let A_n = (|X_n − X| ≥ 1). Then, on A_n, we have |X_n − X|^{r'} ≤ |X_n − X|^r. Thus, for any A ∈ A,
∫_A |X_n − X|^{r'} dP = ∫_{A∩A_n} |X_n − X|^{r'} dP + ∫_{A∩A_n^c} |X_n − X|^{r'} dP
≤ ∫_{A∩A_n} |X_n − X|^r dP + ∫_{A∩A_n^c} |X_n − X|^{r'} dP
≤ ∫_A |X_n − X|^r dP + P(A).
Hence, for ε > 0 and A with P(A) as small as required but < ε/2, we have ∫_A |X_n − X|^{r'} dP < ε, implied by ∫_A |X_n − X|^r dP < ε/2 for all n, by uniform continuity of the ∫ |X_n − X|^r dP (which follows from X_n →(r) X). The proof is completed.
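A quick numerical companion to Corollary 1 (an illustrative sketch, not part of the text): by Lyapunov's (Jensen's) inequality, E|Z|^{r'} ≤ (E|Z|^r)^{r'/r} for 0 < r' ≤ r, so smallness of the r-th moment of X_n − X forces smallness of every lower-order moment. The distribution and sample size below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
r, r_prime = 4.0, 2.0                     # any pair with 0 < r' <= r

# Z plays the role of X_n - X; the normal sample is an arbitrary illustrative choice.
Z = rng.normal(loc=0.0, scale=0.3, size=200_000)

m_r = np.mean(np.abs(Z) ** r)             # E|Z|^r
m_rp = np.mean(np.abs(Z) ** r_prime)      # E|Z|^{r'}

print(f"E|Z|^r'         = {m_rp:.6f}")
print(f"(E|Z|^r)^(r'/r) = {m_r ** (r_prime / r):.6f}")
# The first quantity does not exceed the second (up to Monte Carlo error),
# so E|X_n - X|^r -> 0 forces E|X_n - X|^{r'} -> 0.
```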
Corollary 2. Let X_n →P X as n → ∞ and E|X_n|^r ≤ M (< ∞), n ≥ 1. Then X_n →(r') X as n → ∞ for all (0 <) r' < r.
Proof. By the theorem, it suffices to prove that the ∫ |X_n|^{r'} dP are uniformly continuous. For c > 0, let A_n = (|X_n| ≥ c). Then, for |X_n| > 0, we get
|X_n|^{r'} = (|X_n|^{r'}/|X_n|^r) |X_n|^r = |X_n|^r / |X_n|^{r−r'}.
On A_n, 1/|X_n|^{r−r'} ≤ 1/c^{r−r'}. Hence
∫_A |X_n|^{r'} dP = ∫_{A∩A_n} |X_n|^{r'} dP + ∫_{A∩A_n^c} |X_n|^{r'} dP
= ∫_{A∩A_n} (1/|X_n|^{r−r'}) |X_n|^r dP + ∫_{A∩A_n^c} |X_n|^{r'} dP
≤ (1/c^{r−r'}) ∫_{A∩A_n} |X_n|^r dP + c^{r'} P(A) ≤ M/c^{r−r'} + c^{r'} P(A).
Then for ε > 0, take c sufficiently large so that M/c^{r−r'} < ε/2. Also choose A such that P(A) < ε/(2c^{r'}) = δ(ε). Thus, for ε > 0, there exists δ(ε) > 0 such that P(A) < δ(ε) implies ∫_A |X_n|^{r'} dP < ε.
Corollary 3 (Compare with the converse part of Theorem 8).
Let X_n →P X as n → ∞ and |X_n| ≤ Y a.s. for all n, with EY^r < ∞. Then X_n →(r) X as n → ∞.
Proof. |X_n| ≤ Y a.s. implies |X_n|^r ≤ Y^r a.s., and hence ∫_A |X_n|^r dP ≤ ∫_A Y^r dP. Then the absolute continuity of ∫_A Y^r dP (by Exercise 11) implies the uniform continuity of the ∫_A |X_n|^r dP, hence the result.
Lemma 3. Let 0 ≤ X_n and E X_n < ∞ for all n. Then X_n →(1) X as n → ∞ if and only if X_n →P X as n → ∞ and E X_n → E X finite.
Proof. Throughout the proof all limits are taken as n → ∞ unless otherwise specified. Let X_n →(1) X. Then X_n →P X and E X_n → E X finite by Theorem 7 (applied with r = 1). Hence it suffices to prove the converse. Now X_n →P X implies X_n − X →P 0, or X − X_n →P 0. Next, it is easily seen that Y_n →P Y implies g(Y_n) →P g(Y) for every continuous function g (see also Exercise 8 in Chapter 3). The function g(x) = x^+ is continuous. Hence X − X_n →P 0 implies (X − X_n)^+ →P 0. Now 0 ≤ X_n, n ≥ 1, and X_n →P X imply X ≥ 0 a.s. (passing to a subsequence {m} ⊆ {n} such that X_m →a.s. X, so that X ≥ 0 a.s. since X_n ≥ 0 a.s. for all n), and (X − X_n)^+ ≤ X a.s. So we have
(0 ≤) (X − X_n)^+ ≤ X with E X < ∞ and (X − X_n)^+ →P 0. Then, the Dominated Convergence Theorem gives
E(X − X_n)^+ → 0.   (6.6)
It is also given that E X_n → E X finite, which implies
E(X − X_n) → 0.   (6.7)
From (6.6) and (6.7) and the relation X − X_n = (X − X_n)^+ − (X − X_n)^− we get
E(X − X_n)^− → 0.   (6.8)
Adding (6.6) and (6.8), we get E|X − X_n| → 0, or X_n →(1) X.
Theorem 14 (Vitali's Theorem). Let E|X_n|^r < ∞ for all n. Then X_n →(r) X as n → ∞ if and only if X_n →P X as n → ∞ and E|X_n|^r → E|X|^r finite.
Proof. Throughout the proof all limits are taken as n → ∞. X_n →(r) X implies X_n →P X and E|X_n|^r → E|X|^r finite (by Theorem 7). So it suffices to show the converse.
Now X_n →P X implies |X_n|^r →P |X|^r, and we also have E|X_n|^r → E|X|^r finite. Set Y_n = |X_n|^r, Y = |X|^r. Then we have 0 ≤ Y_n, and EY_n < ∞ for all sufficiently large n, and Y_n →P Y, EY_n → EY finite. Then the lemma applies and gives Y_n →(1) Y, or |X_n|^r →(1) |X|^r. Hence the ∫ |X_n|^r dP are uniformly continuous by Theorem 13. So we have: X_n →P X and the ∫ |X_n|^r dP are uniformly continuous. Then X_n →(r) X, by Theorem 13 again.
Remark 6. Lemma 3 is true if P is replaced by a σ-finite measure μ and X_n, X by probability density functions f_n, f. Then the condition E X_n → E X is trivially true, since these quantities will be equal to 1. Then f_n →(1) f (i.e., ∫ |f_n − f| dμ → 0)
if and only if f_n → f in μ-measure. In this form the lemma is known as Scheffé's Theorem. (See Scheffé (1947).) More precisely, we have
Theorem 15 (Scheffé's Theorem). If, for n = 1, 2, . . ., f_n and f : ℜ^k → ℜ are p.d.f.s with respect to some σ-finite measure μ (e.g., the Lebesgue measure) in ℜ^k, then, as n → ∞, f_n →(1) f (i.e., ∫ |f_n − f| dμ → 0) if and only if f_n → f in μ-measure.
Proof. Markov's inequality is still true if P is replaced by μ, as is easily seen directly (see Exercise 12).
Thus, with all limits taken as n → ∞, f_n →(1) f implies f_n → f in μ-measure, since μ(|f_n − f| ≥ ε) ≤ ε^{-1} ∫ |f_n − f| dμ. Next, it is easily seen that f_n → f in μ-measure, or f − f_n → 0 in μ-measure, implies (f − f_n)^+ → 0 in μ-measure (see Exercise 1 in Chapter 3). As before, 0 ≤ (f − f_n)^+ ≤ f with (f − f_n)^+ → 0 in μ-measure and ∫ f dμ = 1 implies ∫ (f − f_n)^+ dμ → 0, by the Dominated Convergence Theorem, whereas ∫ f_n dμ = 1 → 1 = ∫ f dμ, or ∫ (f − f_n) dμ = 0. Thus ∫ (f − f_n)^− dμ → 0. So ∫ (f − f_n)^+ dμ → 0 and ∫ (f − f_n)^− dμ → 0 imply ∫ |f_n − f| dμ → 0, or f_n →(1) f.
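As a numerical companion to Scheffé's Theorem (an illustrative sketch, not from the text): the N(0, 1 + 1/n) densities converge pointwise to the N(0, 1) density, so their L1 distances must go to 0. The grid, the Riemann-sum integration, and the family of densities below are arbitrary choices made only for the demonstration.

```python
import numpy as np

def normal_pdf(x, var):
    return np.exp(-x**2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

x = np.linspace(-20.0, 20.0, 200_001)    # integration grid (illustrative choice)
dx = x[1] - x[0]
f = normal_pdf(x, 1.0)                   # limiting density f

for n in [1, 10, 100, 1000]:
    f_n = normal_pdf(x, 1.0 + 1.0 / n)   # f_n converges to f pointwise
    l1 = np.sum(np.abs(f_n - f)) * dx    # Riemann-sum approximation of ∫ |f_n − f| dλ
    print(f"n = {n:4d}:  ∫|f_n − f| dλ ≈ {l1:.6f}")
# The printed L1 distances decrease to 0, as Scheffé's Theorem guarantees.
```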
Remark 7. Notice that X_n →(r) X is equivalent to |X_n|^r →(1) |X|^r. Indeed, by Theorem 14, and with all limits taken as n → ∞, X_n →(r) X implies X_n →P X and E|X_n|^r → E|X|^r finite. Hence |X_n|^r →P |X|^r and E|X_n|^r → E|X|^r finite, and therefore, by Theorem 14 again, |X_n|^r →(1) |X|^r. Also, |X_n|^r →(1) |X|^r implies |X_n|^r →P |X|^r and E|X_n|^r → E|X|^r finite, and hence X_n →P X and E|X_n|^r → E|X|^r finite. Therefore X_n →(r) X.
Another consequence of the convergence X_n →(r) X is that ∫_A |X_n|^r dP → ∫_A |X|^r dP uniformly in A ∈ A. Indeed,
|∫_A (|X_n|^r − |X|^r) dP| ≤ ∫_A ||X_n|^r − |X|^r| dP ≤ E||X_n|^r − |X|^r|,
independent of A and converging to 0, since |X_n|^r →(1) |X|^r.
The following table presents in summary form the main implications of convergence in the r-th mean, as well as conditions under which convergence in the r-th mean holds. All limits are taken as n → ∞.
X_n →(r) X if and only if {X_n}, n ≥ 1, converges mutually in the r-th mean.
X_n →(r) X implies:
• X_n →P X;
• E|X_n|^r → E|X|^r finite;
• the |X_n|^r are uniformly integrable;
• the ∫ |X_n|^r dP are uniformly continuous;
• the ∫ |X_n − X|^r dP are uniformly continuous;
• X_n →(r') X, 0 < r' ≤ r.
X_n →P X together with any one of:
• P(|X_n| ≤ M < ∞) = 1, n ≥ 1;
• E|X_n|^r → E|X|^r finite;
• the ∫ |X_n|^r dP are uniformly continuous;
• the ∫ |X_n − X|^r dP are uniformly continuous;
• |X_n| ≤ Y a.s., n ≥ 1, and EY^r < ∞;
imply X_n →(r) X.
X_n →P X and E|X_n|^r ≤ M < ∞, n ≥ 1, imply X_n →(r') X, 0 < r' < r.
Finally, combining the convergence implications at the end of Section 5.2 in Chapter 5 with Theorem 8 here, we have that the four modes of convergence: a.s., in probability, in distribution, and in the r-th mean, are related as follows:
X_n →a.s. X ⇒ X_n →P X ⇒ X_n →d X, and also X_n →(r) X ⇒ X_n →P X.
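To connect the diagram with Theorem 8 numerically (an illustrative sketch, not part of the text): convergence in the r-th mean drags along convergence in probability because of the Markov-type bound P(|X_n − X| ≥ ε) ≤ E|X_n − X|^r / ε^r. The toy sequence, ε, r, and sample sizes below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)
eps, r = 0.1, 2.0

# Toy sequence: X_n = X + N(0, 1/n) noise, so E|X_n − X|^r = 1/n -> 0.
for n in [10, 100, 1000, 10_000]:
    diff = rng.normal(0.0, np.sqrt(1.0 / n), size=100_000)   # simulated X_n − X
    prob = np.mean(np.abs(diff) >= eps)                      # P(|X_n − X| >= eps)
    bound = np.mean(np.abs(diff) ** r) / eps ** r            # Markov/Chebyshev bound
    print(f"n = {n:6d}:  P(|X_n-X| >= {eps}) = {prob:.4f} <= bound = {bound:.4f}")
# Both the probability and the bound decrease to 0: r-th-mean convergence
# of X_n to X implies convergence in probability.
```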
Exercises.
1. Establish the following generalized version of Theorem 2 (the Hölder inequality). Namely, show that for any n (≥ 2) r.v.s X_1, . . . , X_n and any positive numbers r_1, . . . , r_n with 1/r_1 + · · · + 1/r_n = 1, it holds that
E|X_1 · · · X_n| ≤ E^{1/r_1}|X_1|^{r_1} · · · E^{1/r_n}|X_n|^{r_n}.
2. Let g : I (open interval in ℜ) → ℜ be convex; i.e., for every x, x′ ∈ I and every α ∈ [0, 1], it holds that g(αx + (1 − α)x′) ≤ αg(x) + (1 − α)g(x′). Then show that
(i) g is continuous.
(ii) For every x_0 ∈ I, there exists λ(x_0) ∈ ℜ such that g(x) − g(x_0) ≥ λ(x_0)(x − x_0), x ∈ I.
Hint: For part (i), choose x_1 < x_0 < x_2, set α = (x_2 − x_0)/(x_2 − x_1), β = (x_0 − x_1)/(x_2 − x_1), use convexity of g to get g(x_0) ≤ αg(x_1) + βg(x_2), and take the lim inf by letting x_2 ↓ x_0. Next, let x_1 < x_2 < x_0, take α = (x_0 − x_2)/(x_0 − x_1), β = (x_2 − x_1)/(x_0 − x_1), use convexity of g to get g(x_2) ≤ αg(x_1) + βg(x_0), and take the lim sup as x_2 ↑ x_0. For part (ii), let x_1 < x_0 < x, and with α = (x − x_0)/(x − x_1) and β = (x_0 − x_1)/(x − x_1), use convexity of g in order to get g(x) − g(x_0) ≥ [g(x_0) − g(x_1)](x − x_0)/(x_0 − x_1). The result then follows by taking x_1 = cx_0 for some c > 0 so that cx_0 ∈ I.
3. Establish the following generalized version of Theorem 3 (the Minkowski inequality). Namely, show that for any n (≥ 2) r.v.s X_1, . . . , X_n and any r ≥ 1, it holds that
E^{1/r}|X_1 + · · · + X_n|^r ≤ E^{1/r}|X_1|^r + · · · + E^{1/r}|X_n|^r.
4. If ∑_{n=1}^∞ E|X_n − X|^r < ∞ for some r > 0, then X_n →a.s. X as n → ∞.
Hint: Use Exercise 4 in Chapter 3.
5. If X_n →a.s. X and X_n →(r) Y for some r > 0, it follows that P(X ≠ Y) = 0.
6. Construct r.v.s X_n, n ≥ 1, and X on some probability space (Ω, A, P) such that X_n →P X, but X_n does not converge to X in the r-th mean for any r ≥ 1.
7. Construct r.v.s X_n, n ≥ 1, and X on some probability space (Ω, A, P) such that X_n does not converge a.s. to X, but X_n →(r) X for any r > 0.
8. Let X_n, n = 1, 2, . . ., be r.v.s such that E|X_n|^β ≤ M (< ∞), n ≥ 1, for some β > 0. Then show that the |X_n|^α, n ≥ 1, are uniformly integrable for all α with 0 < α < β.
9. For n = 1, 2, . . ., let X_n be r.v.s such that P(X_n = cn) = 1/n and P(X_n = 0) = 1 − 1/n, for some c > 0. Investigate whether or not these r.v.s are uniformly integrable.
10. If E|X | < ∞, show that |X | is absolutely continuous; that is, A |X |→0 when P(A)→0. 11. For n = 1, 2, . . ., let X n , Yn and X , Y be r.v.s such that P(X n ≥ Yn ≥ 0) = 1, P
P
n→∞
n→∞
X n → X , Yn → Y , and E X n → E X finite. Then show that E|Yn − Y | → 0. n→∞
n→∞
12. For any r.v. X defined on the measure space (, A, μ), show directly that the Markov inequality holds; namely, μ(|X | ≥ c) ≤ c−r |X |r dμ for any r > 0 and any c > 0. 13. For a r.v. X and some r > 0, show that E|X |r < ∞ if and only if E|X − c|r < ∞ for every (finite) constant c. 14. For a r.v. X with E X = μ ∈ and V ar (X ) = σ 2 = 0, show that P(X = μ) = 1. 15. For n = 1, 2, . . ., consider the r.v.s X n and X and show that P
(i) |X n | → 0 if and only if n→∞ P
(ii) |X n | → 0 if and only if n→∞
16.
P |X n | → 0. 1+|X n | n→∞ |X n | → 0. E 1+|X n | n→∞
Hint: For part (i), see Exercise 8 in Chapter 3. (i) For r > 1 and x j ∈ , j = 1, . . . , n, show that
r
n n
−1
−1
n
xj ≤ n |x j |r .
j=1 j=1 (ii) From part (i), deduce that
r
n
1 n
1 E
X j
≤ E|X j |r . n n
j=1
j=1
(On the other hand, by the Minkowski inequality,
r
n
n
1 1 1/r E 1/r
X j
≤ E |X j |r .) n n
j=1
j=1 17. For n = 1, 2, . . ., suppose that X n and X are r.v.s defined on the probability space P
(, A, P), and suppose that X n → X and {X n } are uniformly integrable. Then n→∞ show that A X n dP → A X dP uniformly in A ∈ A. n→∞
18. If for n = 1, 2, . . ., the r.v.s {X n } and {Yn } are uniformly integrable, show that {X n + Yn } are also uniformly integrable. 19. For n = 1, 2, . . ., let X n be identically distributed with E X n ∈ . Then show n that { X¯ n } are uniformly integrable, where X¯ n = n1 X j is the sample mean of X 1, . . . , X n .
j=1
a.s.
20. Let Y = sup |X n | with EY r < ∞ (for some r > 0) and X n → X . Then show n→∞
n≥1
that
E|X |r
(r )
< ∞ and X n → X . n→∞
21. For n = 1, 2, . . ., let X n be r.v.s such that E X n = μn ∈ and Var(X n ) = P
σn2 → 0. Then show that X n − μn → 0. n→∞
n→∞
22. Let the r.v.s X n , n ≥ 1, be defined as follows:X n (ω) = 2n if ω ∈ (0, n1 ), and X n (ω) = 0 otherwise, where (, A, P) = ((0, 1], B(0,1] , λ) and λ is the Lebesgue measure. Then show that X n → 0 pointwise, but E|X n |r to any n→∞
finite number for any r > 0; in fact, E|X n |r → ∞ for any r > 0.
n→∞
n→∞
23. For n = 1, 2, . . ., let the r.v. X n be defined by
⎧ 1 ⎪ ⎪ n c with probability ⎪ ⎪ n ⎨ 2 Xn = 0 with probability 1 − ⎪ n ⎪ ⎪ ⎪ ⎩ n −c with probability 1 , where c is a positive constant. n Then show that P
(i) X n → 0. (ii) (iii)
n→∞ E|X n |r → 0, 0 < n→∞ (r ) X n → 0, cr < 1. n→∞
cr < 1; E|X n |r → ∞, cr > 1. n→∞
24. Let E X n2 < ∞ for all n. Then, if E|X n − X |2 → 0, show that E|X n2 − X 2 | → 0. n→∞
n→∞
25. For n = 1, 2, . . ., let X n , Yn and X , Y be r.v.s defined on the probability space (r )
(s)
n→∞
n→∞
(, A, P), and suppose that X n → X , Yn → Y where r , s > 1 with 1 s
(1)
1 r
+
= 1. Then show that X n Yn → X Y . n→∞
26. Let (, A, P) = ((0, 1), B(0,1) , λ), where λ is the Lebesgue measure, and let X n be a r.v. defined by ⎧ 1 ⎪ ⎪ ⎨ 1, ω ∈ 0, n , n = 1, 2, . . . . Xn = 1 ⎪ ⎪ ,1 ⎩ 0, ω ∈ n (r )
a.s.
Then show that X n → 0 for any r > 0, and X n −→ 0; indeed, X n −→ 0 n→∞ n→∞ n→∞ pointwise. (Compare it with Exercise 7.)
(r )
27. If X n → X , show that there is a subsequence {X n k } ⊆ {X n } such that n→∞ a.s.
X nk → X . n→∞
28. For n = 1, 2, . . ., let the r.v. X n be defined by Xn =
en with probability n −2 0 with probability 1 − n −2 .
Then show that P
a.s.
n→∞ (r )
n→∞
(i) X n → 0 (indeed, X n → 0). (ii) X n 0 for any r > 0 (in fact, E|X n |r −→ ∞ for all r > 0). n→∞ n→∞ (Compare it with Exercises 6 and Theorem 8.) 29. For n = 1, 2, . . ., let the r.v. X n be defined by ⎧ cn with probability 2−n ⎨2 Xn = 0 with probability 1 − 2−n+1 ⎩ cn −2 with probability 2−n , for some positive constant c. a.s.
(i) Show that X n → 0. n→∞
(r )
(ii) Determine the condition c and r (> 0) must satisfy, so that X n → 0. n→∞ (Compare it with Exercises 6 and 8.) 30.
(i) For any r.v. X , show that P(|X | ≥ c) ≤ E|X |/c (c > 0), by using the obvious inequality |X | ≥ cI (|X | ≥ c). In particular, if X ≥ 0, then P(X ≥ c) ≤ E X /c. (ii) For any r.v. X and any c > 0, show that P(X ≥ c) ≤ e−tc Eet X (t > 0), P(X ≤ c) ≤ e−tc Eet X (t < 0). (iii) If E X 2 < ∞, then show that P(|X | > 0) ≥ (E|X |)2 /E X 2 . In particular, if X ≥ 0, then P(X > 0) ≥ (E X )2 /E X 2 .
31. For n ≥ 1, let X n be r.v.s with E X n = μn , σ 2 (X n ) = σn2 , and suppose that, as n → ∞, μn → μ ∈ , and σn2 → 0. Then show that E(X n − μ)2 → 0 and P
X n → μ. 32. For n ≥ 1, let X n and Yn be r.v.s defined on the probability space (, A, P), and P
suppose that |X n | and |Yn |, n ≥ 1, are uniformly integrable and X n − Yn → 0 as n → ∞. Then show that: (i) |X n − Yn |, n ≥ 1, are uniformly integrable. (ii) E|X n − Yn | −→ 0. n→∞
33. Consider the probability space (, A, P), and let be an open subset of . Let g(·; ·) : × → be (A × B )-measurable, where B is the σ -field of Borel subsets of . We say that g(·; θ ) is differentiable in q.m. at θ , if there exists a ˙ θ ) (the quadratic mean derivative of g(·; θ ) (A × B )-measurable function g(·; at θ ) such that q.m. ˙ θ ). h −1 [g(·; θ + h) − g(·; θ )] −→ g(·; n→∞
1 −|x−θ| ,x 2e
Now let p(x; θ ) = ∈ (θ ∈ ) be the double exponential probability density function, and for θ, θ ∗ ∈ , set 1 1 g(x; θ, θ ∗ ) = [ p(x; θ ∗ )/ p(x; θ )]1/2 = exp |x − θ | − |x − θ ∗ | . 2 2 (i) Show that g(x; θ, θ ∗ ) is not pointwise differentiable with respect to θ ∗ at (θ, θ ) for any θ = x ∈ . (ii) If the r.v. X is distributed according to p(·; θ ), show that g(X ; θ, θ ∗ ) is differentiable in q.m. with respect to θ ∗ at (θ, θ ), θ ∈ , with q.m. derivative g(X ˙ ; θ ) given by: ⎧ −1/2 if X < θ ⎨ g(X ˙ ; θ ) = 0 (for example) if X = θ . ⎩ 1/2 if X > θ 34. All r.v.s appearing below are defined on the probability space (, A, P). (i) If X ≥ 0, then E X ≥ 0 and E X = 0 only if P(X = 0) = 1. (ii) Let X ≥ Y with finite expectations. Then E X ≥ EY and E X = EY only if P(X = Y ) = 1. (iii) Let X > Y with finite expectations. Then E X > EY . (iv) Let g : I open subset of be strictly convex (i.e., g[αx + (1 − α)x ] < αg(x) + (1 − α)g(x ), x, x ∈ I , 0 < α < 1), let Z be a r.v. taking values in I , let E Z ∈ I , and let E g(Z ) exist. Then E g(Z ) > g(E Z ) unless P(Z = constant) = 1.
CHAPTER 7
The Hahn–Jordan Decomposition Theorem, The Lebesgue Decomposition Theorem, and the Radon–Nikodym Theorem
This chapter revolves around three classical theorems. In Section 7.1, it is shown that any σ -additive set function ϕ may be decomposed into the difference of two measures (i.e., ϕ is a signed measure). This is a brief description of the Hahn–Jordan Decomposition Theorem. In the following section, this decomposition is used in order to obtain the Lebesgue Decomposition Theorem. This theorem states that, if μ and ν are any two σ -finite measures, then ν can be decomposed uniquely into two components νc and νs . The component νc is absolutely continuous with respect to μ, and νs is singular with respect to μ. In the third and final section of the chapter, the Lebesgue Decomposition Theorem is specialized to the case where ν μ in order to establish the Radon–Nikodym Theorem. Namely, to show that ν is the indefinite integral, with respect to μ, of a r.v. that is nonnegative, a.e. [μ] finite, and a.e. [μ] unique. A corollary to this theorem justifies the replacement in integrals of a probability distribution by its probability density function (p.d.f.) with respect to a dominating measure, as is routinely done in Statistics.
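Before the formal development, here is a small finite-space preview (an illustrative sketch, not part of the text) of the objects the chapter studies: on a finite set every σ-additive set function is determined by point masses, the Hahn set and the Jordan parts can be read off directly, and a Radon–Nikodym derivative is just a ratio of point masses. The five-point space, the weights, and the event chosen below are arbitrary.

```python
import numpy as np

points = ["w1", "w2", "w3", "w4", "w5"]
phi = np.array([0.4, -0.1, 0.25, -0.3, 0.0])   # a signed measure given by point masses

# Hahn set D = {phi >= 0}; Jordan parts phi+ and phi- live on D and its complement.
D = phi >= 0
phi_plus = np.where(D, phi, 0.0)
phi_minus = np.where(~D, -phi, 0.0)
print("D =", [p for p, d in zip(points, D) if d])
print("phi+ =", phi_plus, " phi- =", phi_minus, " phi+ - phi- =", phi_plus - phi_minus)

# Radon–Nikodym on the same space: mu charges every point, nu(A) = sum_A f * mu,
# so f = d(nu)/d(mu) is simply the ratio of the point masses.
mu = np.array([0.2, 0.2, 0.2, 0.2, 0.2])
nu = np.array([0.1, 0.3, 0.05, 0.4, 0.15])      # nu << mu automatically here
f = nu / mu                                      # the Radon–Nikodym derivative
A = np.array([True, False, True, True, False])  # an arbitrary event
print("nu(A) =", nu[A].sum(), " equals  integral of f over A w.r.t. mu =", (f * mu)[A].sum())
```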
7.1 The Hahn–Jordan Decomposition Theorem
In all that follows, ϕ is a σ-additive set function defined on (Ω, A) and taking values in (−∞, ∞]. The value −∞ is excluded in order to avoid expressions of the form ∞ − ∞. It will also be assumed that ϕ is finite for at least one set A. Then from A = A + ∅ + ∅ + · · · and the σ-additivity of ϕ it follows that ϕ(∅) = 0. From this fact it also follows that ϕ is finitely additive.
Definition 1. The continuity of ϕ is defined the same way the continuity of a measure has been defined; i.e., ϕ is continuous from below if A_n ↑ A as n → ∞ implies ϕ(A_n) → ϕ(A); it is continuous from above if A_n ↓ A as n → ∞ with |ϕ(A_{n_0})| < ∞ for some n_0 implies ϕ(A_n) → ϕ(A); and it is continuous if it is continuous from both below and above.
Lemma 1. Every σ-additive set function ϕ is continuous.
Proof. Let A_n ↑ A as n → ∞ and suppose first that |ϕ(A_n)| < ∞ for all n. Then
A = ∪_{j=1}^∞ A_j = A_1 + ∑_{j=2}^∞ (A_j − A_{j−1}),
and hence
ϕ(A) = ϕ(A_1) + ∑_{j=2}^∞ ϕ(A_j − A_{j−1}).
But A_{j−1} ⊆ A_j implies that A_j = A_{j−1} + (A_j − A_{j−1}), so that ϕ(A_j) = ϕ(A_{j−1}) + ϕ(A_j − A_{j−1}), and since |ϕ(A_n)| < ∞ for all n, we get ϕ(A_j − A_{j−1}) = ϕ(A_j) − ϕ(A_{j−1}). Therefore
ϕ(A) = ϕ(A_1) + lim_{n→∞} ∑_{j=2}^n ϕ(A_j − A_{j−1}) = ϕ(A_1) + lim_{n→∞} [ϕ(A_n) − ϕ(A_1)] = lim_{n→∞} ϕ(A_n).
Now suppose that ϕ(A_{n_0}) = ∞ for some n_0. Then for all n ≥ n_0, A_n = A_{n_0} + (A_n − A_{n_0}), so that ϕ(A_n) = ϕ(A_{n_0}) + ϕ(A_n − A_{n_0}) = ∞. Also A = A_{n_0} + (A − A_{n_0}) implies ϕ(A) = ϕ(A_{n_0}) + ϕ(A − A_{n_0}) = ∞, so that ϕ(A_n) → ϕ(A) as n → ∞.
Next, let A_n ↓ A as n → ∞ and |ϕ(A_{n_0})| < ∞ for some n_0. Then A_{n_0} = A + ∑_{j=n_0}^∞ (A_j − A_{j+1}), so that
ϕ(A_{n_0}) = ϕ(A) + ∑_{j=n_0}^∞ ϕ(A_j − A_{j+1}) = ϕ(A) + lim_{n→∞} ∑_{j=n_0}^n ϕ(A_j − A_{j+1}).
For j ≥ n_0, A_j = A_{j+1} + (A_j − A_{j+1}), so that ϕ(A_j) = ϕ(A_{j+1}) + ϕ(A_j − A_{j+1}), and since ϕ(A_{n_0}) is finite, so are the ϕ(A_{j+1}) for all j ≥ n_0 (as follows by induction). Hence
ϕ(A_{n_0}) = ϕ(A) + lim_{n→∞} [ϕ(A_{n_0}) − ϕ(A_{n+1})] = ϕ(A) + ϕ(A_{n_0}) − lim_{n→∞} ϕ(A_n).
Thus lim_{n→∞} ϕ(A_n) = ϕ(A) and ϕ is continuous.
Definition 2. For a set function ϕ on (Ω, A), define the set functions ϕ^+ and ϕ^− on (Ω, A) as follows:
ϕ^+(A) = sup{ϕ(B); B ⊆ A, B ∈ A}, ϕ^−(A) = − inf{ϕ(B); B ⊆ A, B ∈ A}.
Remark 1. Both ϕ + (A) and ϕ − (A) are ≥ 0 for all A. This is so, because ϕ(∅) = 0 implies ϕ + (A) ≥ 0 for all A. Also, inf{ϕ(B); B ⊆ A} = − sup{−ϕ(B); B ⊆ A}, or − inf{ϕ(B); B ⊆ A} = sup{−ϕ(B); B ⊆ A}, and since −ϕ() = 0, we have sup{−ϕ(B); B ⊆ A} ≥ 0, and hence ϕ − (A) ≥ 0 for all A. Then we may formulate and prove the following result. Theorem 1 (Hahn–Jordan Decomposition Theorem). Let ϕ be a σ -additive function defined on (, A), let ϕ + , ϕ − be as in Definition 2, and let −∞ < m = inf{ϕ(A); A ∈ A} ≤ sup{ϕ(A); A ∈ A} = M ≤ ∞. Then (i) There exists at least one set D ∈ A such that ϕ(A) ≥ 0 for every A ⊆ D and ϕ(A) ≤ 0 (A ∈ A). for every A ⊆ D c (ii) ϕ + (A) = ϕ(A ∩ D), ϕ − (A) = −ϕ(A ∩ D c ). (iii) ϕ + , ϕ − , |ϕ| = ϕ + + ϕ − are measures and ϕ − is finite. (iv) ϕ = ϕ + − ϕ − (in the sense that ϕ(A) = ϕ + (A) − ϕ − (A), A ∈ A). Proof.
(i) For j = 1, 2, . . ., let ε_j > 0 with ∑_j ε_j < ∞ (e.g., ε_j = 1/j²). For each j, let A_j ∈ A be such that m ≤ ϕ(A_j) ≤ m + ε_j. Next, from
A_2 = (A_1 ∩ A_2) + (A_2 − A_1), A_1 ∪ A_2 = A_1 + (A_2 − A_1),
it follows that ϕ(A_1 ∩ A_2) = ϕ(A_2) + ϕ(A_1) − ϕ(A_1 ∪ A_2). Therefore
m ≤ ϕ(A_1 ∩ A_2) = ϕ(A_2) + ϕ(A_1) − ϕ(A_1 ∪ A_2) ≤ 2m + ε_1 + ε_2 − m = m + ε_1 + ε_2.
Thus, m ≤ ϕ(A_1 ∩ A_2) ≤ m + (ε_1 + ε_2) and, clearly, this is true for any two sets A_n, A_{n+1}. Furthermore, it is easily seen by induction that
m ≤ ϕ(∩_{j=n}^k A_j) ≤ m + ∑_{j=n}^k ε_j.
Letting k → ∞ and utilizing the continuity of ϕ, we obtain
m ≤ ϕ(∩_{j=n}^∞ A_j) ≤ m + ∑_{j=n}^∞ ε_j.   (7.1)
Now set D^c = lim inf_{n→∞} A_n = ∪_{n=1}^∞ ∩_{j=n}^∞ A_j, so that D = ∩_{n=1}^∞ ∪_{j=n}^∞ A_j^c = lim sup_{n→∞} A_n^c. Then, as n → ∞, ∩_{j=n}^∞ A_j ↑ D^c, and by continuity of ϕ,
ϕ(D^c) = ϕ(lim_{n→∞} ∩_{j=n}^∞ A_j) = lim_{n→∞} ϕ(∩_{j=n}^∞ A_j) = m
on account of (7.1). That is, ϕ(D c ) = m. Next, let A ⊆ D c . Then D c = A + (D c − A) and hence m = ϕ(D c ) = ϕ(A) + ϕ(D c − A) ≥ ϕ(A) + m, so that ϕ(A) ≤ 0. Finally, if A ⊆ D, then m ≤ ϕ(A + D c ) = ϕ(A) + ϕ(D c ) = ϕ(A) + m, so that ϕ(A) ≥ 0. This completes the proof of (i). (ii) Let B ⊆ A. Then ϕ(B) = ϕ[(B ∩ D) + (B ∩ D c )] = ϕ(B ∩ D) + ϕ(B ∩ D c ) ≤ ϕ(B ∩ D) (since ϕ(B ∩ D c ) ≤ 0) ≤ ϕ(B ∩ D) + ϕ[(A − B) ∩ D] (since ϕ[(A − B) ∩ D] ≥ 0) = ϕ(A ∩ D)
(since (B ∩ D) + (A − B) ∩ D = A ∩ D).
That is, for every B ⊆ A, ϕ(B) ≤ ϕ(A∩ D) and this is, in particular, true for B ⊆ (A ∩ D). Since in forming sup ϕ(B) we may restrict ourselves to B ⊆ A B⊆A
with B ⊆ A ∩ D (because ϕ(C ∩ D c ) ≤ 0), we have that ϕ(B) ≤ϕ(A ∩ D) for every B ⊆ A ∩ D and hence ϕ(A ∩ D) = sup ϕ(B) = ϕ + (A). B⊆A
Next, for B ⊆ A,
ϕ(B) = ϕ(B ∩ D) + ϕ(B ∩ D c ) ≥ ϕ(B ∩ D c ) (since ϕ(B ∩ D) ≥ 0) ≥ ϕ(B ∩ D c ) + ϕ[(A − B) ∩ D c ] (since ϕ[(A − B) ∩ D c ] ≤ 0) = ϕ(A ∩ D c ) (since (B ∩ D c ) + (A − B) ∩ D c = A ∩ D c ). That is, for every B ⊆ A, ϕ(B) ≥ ϕ(A ∩ D c ) and this is, in particular, true for B ⊆ A ∩ D c . Since in forming sup [−ϕ(B)] we may restrict ourselves to B⊆A
B ⊆ A with B ⊆ A ∩ D c (because ϕ(C ∩ D) ≥ 0), we have that −ϕ(B) ≤ −ϕ(A ∩ D c ) for every B ⊆ A ∩ D c ,
and hence −ϕ(A ∩ D c ) = sup [−ϕ(B)] = − inf ϕ(B) = ϕ − (A). B⊆A
B⊆A
(iii) That ϕ + is a measure follows from the fact that ϕ + (A) = ϕ(A ∩ D), ϕ() = 0, ϕ(B) ≥ 0 for every B ⊆ D and the σ -additivity of ϕ; similarly for ϕ − . Finally, from (−∞ <)m ≤ ϕ(A ∩ D c ) ≤ 0 it follows that 0 ≤ −ϕ(A ∩ D c ) = ϕ − (A) ≤ −m, so that ϕ − is finite. (iv) By (ii), ϕ(A) = ϕ[(A ∩ D) + (A ∩ D c )] = ϕ(A ∩ D) + ϕ(A ∩ D c ) = ϕ + (A) − ϕ − (A).
Remark 2. (i) The theorem is true without the assumption that m > −∞, but the proof is somewhat more complicated (see, e.g., pages 86–87 in Loève (1963), or pages 104–106 in Neveu (1965)). (ii) If μ1 , μ2 are two measures such that μ1 (A)−μ2 (A) is defined for every A ∈ A, then μ1 − μ2 is called a signed measure. Thus the theorem shows that every σ -additive function is a signed measure. (iii) For the set D, one has ϕ(D) = M and ϕ(D c ) = m. In fact, there exist Bn ∈ A such that ϕ(Bn ) → M. Then n→∞
ϕ(D) = ϕ(D ∩ Bn ) + ϕ(D ∩ Bnc ) ≥ ϕ(D ∩ Bn ) (since ϕ(D ∩ Bnc ) ≥ 0), ≥ ϕ(D ∩ Bn ) + ϕ(D c ∩ Bn ) (since ϕ(D c ∩ Bn ) ≤ 0) = ϕ(Bn ) (by finite additivity of ϕ). That is, ϕ(D) ≥ ϕ(Bn ) and hence, as n → ∞, ϕ(D) ≥ M. Since also ϕ(D) ≤ M, it follows that ϕ(D) = M. Next, recall that m = inf{ϕ(A); A ∈ A}. Then there exists Cn ∈ A such that ϕ(Cn ) → m as n → ∞. Furthermore, ϕ(D c ) = ϕ(D c ∩ Cn ) + ϕ(D c ∩ Cnc ) ≤ ϕ(D c ∩ Cn ) (since ϕ(D c ∩ Cnc ) ≤ 0), ≤ ϕ(D c ∩ Cn ) + ϕ(D ∩ Cn ) (since ϕ(D ∩ Cn ) ≥ 0), = ϕ(Cn ) (by finite additivity of ϕ). That is, ϕ(D c ) ≤ ϕ(Cn ) and hence, as n → ∞, ϕ(D c ) ≤ m. Since also ϕ(D c ) ≥ m, it follows that ϕ(D c ) = m. Corollary. Under the assumptions of the theorem, |ϕ| is bounded if and only if |ϕ|() < ∞, or if and only if ϕ + () < ∞.
Proof. The first assertion follows by the fact that |ϕ| is a measure, so that |ϕ|(A) ≤ |ϕ|() for all A ∈ A. The second assertion follows by the fact that ϕ − () ≤ −m < ∞, and the expression |ϕ|() = ϕ + () + ϕ − (). Definition 3. If μ, ν are two measures on A, we recall that ν is said to be μcontinuous (or absolutely continuous with respect to μ), denoted by ν μ, if μ(A) = 0 implies that ν(A) = 0. We also say that ν is dominated by μ or μ dominates ν. If ν μ and μ ν, then μ and ν are said to be mutually absolutely continuous and write μ ≈ ν. ν is said to be μ-singular (or singular with respect to μ) if there exists N ∈ A with μ(N ) = 0 such that ν(A) = ν(A ∩ N ) for every A ∈ A. We also say that ν and μ are orthogonal and write ν ⊥ μ.
7.2 The Lebesgue Decomposition Theorem Theorem 2 (Lebesgue Decomposition Theorem). Let μ, ν be two σ -finite measures on A. Then (i) There exists a decomposition of ν into a μ-continuous measure νc and a μsingular measure νs such that ν = νc + νs (in the sense that ν(A) = νc (A) + νs (A), A ∈ A). (ii) The decomposition in (i) is unique. r.v. X determined (iii) νc is the indefinite integral of a nonnegative, a.e. [μ] finite (A) = X dμ and if νc (A) = up to μ-equivalence (i.e., if for every A ∈ A, ν c A dμ for another r.v. X , then μ(X = X ) = 0). X A Proof. Case 1: μ, ν finite. (i) Let X = {X ≥ 0, a.e. [μ] finite r.v.s; A X dμ ≤ ν(A), A ∈ A}. Then X = 0 ∈ X , so that X = , and for every X ∈ X , (0 ≤) X dμ ≤ ν() < ∞. Thus de f X dμ = α < ∞ and there exists {X n } ⊆ X such that sup X ∈X X n dμ → α. n→∞
Define Yn by: Yn = max X k . Then Yn ↑ as n → ∞ and let X = lim Yn . Thus n→∞ 1≤k≤n 0 ≤ Yn ↑ X as n → ∞ and hence Y dμ ↑ X dμ as n → ∞. On the other n hand, X n ≤ Yn and hence X n dμ ≤ Yn dμ. Letting n → ∞, we get then α ≤ X dμ. Thus X dμ ≥ α. (7.2) Now define A1 = (Yn = X 1 ) A2 = (Yn = X 2 ) − A1
7.2 The Lebesgue Decomposition Theorem
A3 = (Yn = X 3 ) − (A1 + A2 ) . . . . . . . . . . An = (Yn = X n ) − (A1 + A2 + · · · + An−1 ). Then, clearly, {A1 , . . . , An } is a partition of and therefore for any A ∈ A, one has n n Yn dμ = Yn dμ = Yn dμ = X j dμ nj=1 A∩A j
A
≤
n j=1
j=1
A∩A j
j=1
A∩A j
ν(A ∩ A j ) = ν(A); Yn dμ ≤ ν(A) for every A ∈ A, and hence Yn ∈ X .
i.e., A
Thus Yn dμ ≤ α and since Yndμ ↑ X dμ as n → ∞, one has X dμ ≤ α. This result, together with (7.2), gives X dμ = α. Furthermore, Yn dμ ≤ ν(A) for every A ∈ A and Yn dμ ↑ X dμ as n → ∞ A
A
A
(as follows from 0 ≤ Yn ↑ X as n → ∞) imply that A X dμ ≤ ν(A) for every A ∈ A . Since (0 ≤) X dμ = α < ∞ implies that X is a.e. [μ] finite, we have that X ∈ X and X dμ = α. Now define νc on A as follows: νc (A) = A X dμ. Then νc is μ-continuous. Next, define νs by: νs (A) = ν(A) − νc (A), A ∈ A. Since X dμ ≥ 0, ν(A) − νc (A) = ν(A) − A
by the fact that X ∈ X , we have that νs (A) ≥ 0, A ∈ A. Since both ν and νc are measures, it follows that νs is itself a measure. The proof of (i) will be completed by showing that νs is μ-singular. To this end, for each n define the set function ϕn as follows: 1 ϕn (A) = νs (A) − μ(A), A ∈ A. n Then, clearly, ϕn is a finite σ -additive set function. Hence, by Theorem 1, there exists Dn ∈ A such that
Set D =
∞
ϕn (A ∩ Dn ) ≥ 0, ϕn (A ∩ Dnc ) ≤ 0,
c j=1 D j .
A∩
r j=1
A ∈ A.
Then, as r → ∞,
D cj ↓ A ∩ D and hence ϕn (A ∩
r j=1 j =n
D cj ) → ϕn (A ∩ D) r →∞
(7.3)
by Lemma 1. On the other hand, for r ≥ n, ϕn (A ∩
r
D cj ) = ϕn [(A ∩
j=1
r
D cj ) ∩ Dnc ] ≤ 0 by (7.3)
j=1 j =n
(where the role of A in (7.3) is played by A ∩ rj=1 D cj here) for all n ≤ r ; i.e., j =n
ϕn (A ∩ rj=1 D cj ) ≤ 0, n ≤ r , and hence, as r → ∞ ϕn (A ∩ D) ≤ 0 , for all n. Equivalently, νs (A ∩ D) −
1 1 μ(A ∩ D) ≤ 0 for all n, or νs (A ∩ D) ≤ μ(A ∩ D) n n
for all n. Letting n → ∞, we get νs (A ∩ D) = 0, A ∈ A. Since νs (A) = νs (A ∩ D) + νs (A ∩ D c ), we have that νs (A) = νs (A ∩ D c ). Thus νs will be μ-singular if we show that μ(D c ) = 0. To this end, we have νc (A) = ν(A) − νs (A) = ν(A) − νs (A ∩ D c ) ≤ ν(A) − νs (A ∩ Dn ) (since Dn ⊆ D c ). Thus
A
1 1 X + I Dn dμ = νc (A) + μ(A ∩ Dn ) n n (from the definition of νc ) 1 ≤ ν(A) − νs (A ∩ Dn ) + μ(A ∩ Dn ) n (by the previous inequality) 1 = ν(A) − [νs (A ∩ Dn ) − μ(A ∩ Dn )] n = ν(A) − ϕn (A ∩ Dn ) (from the definition of ϕn ),
≤ ν(A) (since ϕn (A ∩ Dn ) ≥ 0). So, for every A ∈ A, one has A X + n1 I Dn dμ ≤ ν(A), so that (X + n1 I Dn ) ∈ X . Hence (X + n1 I Dn )dμ ≤α. But A (X + n1 I Dn )dμ ≤ α + n1 μ(Dn ), since X dμ = α;
c therefore α+ n1 μ(Dn ) ≤ α, so that μ(Dn ) = 0, for all n. Since D = ∞ j=1 D j implies n c c j=1 D j ↑ D as n → ∞, we obtain that μ(D ) = 0. The proof of (i) is complete. (ii) In (i) we proved that ν = νc + νs , where X dμ = X dμ + X dμ νc (A) = A∩D A∩D c A = X dμ = νc (A ∩ D), A∩D
since μ(A ∩ D c ) ≤ μ(D c ) = 0. Also let ν = νc + νs , where νc (A) = νc (A ∩ D0 ), νs (A) = νs (A ∩ D0c ) with μ(D0c ) = 0. We have
νc (A) + νs (A) = νc (A) + νs (A) (= ν(A)) or νc (A) − νc (A) = νs (A) − νs (A) for every A ∈ A.
(7.4)
Set N = D c ∩ D0c . Then μ(N ) = 0 and hence νc (N ) = νc (N ) = 0, since they are both μ-continuous. For A ∈ A, write A = (A ∩ N ) + (A ∩ N c ). Then νc (A) − νc (A) = [νc (A ∩ N ) − νc (A ∩ N )] + [νc (A ∩ N c ) − νc (A ∩ N c )]
= [νc (A ∩ N ) − νc (A ∩ N )] + [νs (A ∩ N c ) − νs (A ∩ N c )] (by (7.4)).
But νc (A ∩ N ) = νc (A ∩ N ) = 0 since A ∩ N ⊆ N and μ(N ) = 0. Next, νs (A ∩ N c ) = νs (A ∩ N c ) = 0 because A ∩ N c = A ∩ D ∩ D0 ⊆ A ∩ D, A ∩ D0 and νs (A ∩ D) = νs (A ∩ D0 ) = 0. So νc = νc and therefore νs = νs . (iii) In (i) it was seen that νc (A) = A X dμ for every A ∈ A. Let also νc (A) = A X dμ for every A ∈ A. To show that X = X a.e.[μ]. In fact, if μ(X −X > 0) > 0, then there exists ε > 0 such that μ(X − X > ε) > 0. Hence (X −X >ε) (X − X )dμ ≥ εμ(X − X > ε) > 0, which is a contradic tion (to the assumption that A X dμ = A X dμ = νc (A) for every A A). Thus μ(X − X > 0) = 0 and similarly μ(X − X < 0) = 0, so that μ(X = X ) = 0. Case 2: μ, ν σ -finite. (i) From the σ -finiteness of μ, ν, there exist two countable partitions of {A j , j = 1, 2, . . .}, {Aj , j = 1, 2, . . .}, say, such that μ(A j ), ν(Aj ) < ∞, j ≥ 1. Consider the intersection of these two partitions, which is another partition of ; call it {B j , j ≥ 1}. Then μ(B j ), ν(B j ) < ∞, j ≥ 1. Consider the restrictions μn , νn of μ, ν on Bn ; i.e., μn (A) = μ(A ∩ Bn ) μ(A) = n μn (A) Then , A ∈ A. (7.5) νn (A) = μ(A ∩ Bn ) ν(A) = n νn (A). On each Bn , the theorem is true. Therefore νn = νnc +νns , where νnc is μn -continuous and νns is μn -singular. We assert that νnc is, actually, μ-continuous. In fact, let μ(A) = 0. Then μn (A) = 0 for all n and hence νnc (A) = 0 for all n. Next, from the μn -singularity of νns we have that there exists Nn ∈ A such that μn (Nn ) = 0 and νns (A) = νns (A ∩ Nn ), A ∈ A, so that νns (A ∩ Nnc ) = 0, A ∈ A. Look at Bnc . Then νn (A ∩ Bnc ) = μ(A ∩ Bnc ∩ Bn ) = μ() = 0, and similarly μn (A ∩ Bnc ) = 0. But νnc is μn -continuous. Hence νnc (A ∩ Bnc ) = 0 and this, together with νn (A ∩ Bnc ) = 0, implies that νns (A ∩ Bnc ) = 0. To summarize, νns (A ∩ Nnc ) = 0, νns (A ∩ Bnc ) = 0, A ∈ A.
(7.6)
We assert that νns is μ-singular. In fact, νns [A ∩ (Bn ∩ Nn )c ] = νns [A ∩ (Bnc ∪ Nnc )] = νns [(A ∩ Bnc ) ∪ (A ∩ Nnc )] = νns (A ∩ Bnc ) + νns (A ∩ Nnc ) −νns (A ∩ Bnc ∩ Nnc ) = 0 by (7.6). So for the set Mn = Bn ∩ Nn , we have that μn (Mn ) = 0 (since μn (Nn ) = 0) and νns (A ∩ Mnc ) = 0. But μn (Mn ) = μ(Mn ∩ Bn ) = μ(Mn ) from (7.5) and the fact that Mn ⊆ Bn . Thus μ(Mn ) = 0 and νns (A ∩ Mnc ) = 0, which is equivalent to saying that νns is μ-singular. Up to this point we have shown that νnc is μ-continuous and νns is μ-singular (on Bn ). The μ-singularity of νns implies the existence of a set Nn ∈ A such that μ(Nn ) = 0 and νns (A ∩ Nnc ) = 0. Set N = ∞ j=1 N j . Then μ(N ) = 0 and νns (A ∩ N ) = νns [A ∩ ( c
∞
N j )c ]
j=1
= νns [A ∩ (
∞
N jc )]
j=1
c
≤ νns (A ∩ Nn ) = 0; i.e., μ(N ) = 0, and hence νnc (N ) = 0, and νns (A ∩ N c ) = 0. Next, νnc (A) = νnc (A ∩ N c ) + νnc (A ∩ N ) = νnc (A ∩ N c ) = νnc (A ∩ N c ) + νns (A ∩ N c ) = νn (A ∩ N c ). That is, νnc (A) = νn (A ∩ N c ) and therefore de f
νc (A) =
νnc (A) =
n
νn (A ∩ N c ) = ν(A ∩ N c ) (by (7.5)).
(7.7)
n
Next, νns (A) = νns (A ∩ N ) + νns (A ∩ N c ) = νns (A ∩ N ) = νns (A ∩ N ) + νnc (A ∩ N ) = νn (A ∩ N ). That is, νns (A) = νn (A ∩ N ) and therefore de f
νs (A) =
n
νns (A) =
νn (A ∩ N ) = ν(A ∩ N ) (by (7.5)).
(7.8)
n
From (7.7) and (7.8), we have that ν(A) = νc (A) + νs (A), A ∈ A, whereas νc is μ-continuous since every νnc is μ-continuous. Furthermore, νs is μ-singular since νs (A ∩ N c ) = ν(A ∩ N c ∩ N ) = ν() = 0. This completes the proof of (i).
(ii) By (i), ν = νc + νs , where νc (A) = ν(A ∩ N c ), νs (A) = ν(A ∩ N ), A ∈ A, and μ(N ) = 0. Let ν = νc + νs be another decomposition of ν into a μ-continuous measure νc and a μ-singular measure νs . For each n, consider the and ν of ν, ν , ν , ν , and ν , respectively, to B . restrictions νn , νnc , νns , νnc c s n ns c s + ν and hence ν = ν , ν = ν by part Then we have νn = νnc + νns = νnc nc ns ns nc ns (ii) in Case 1. Since νc (A) =
νnc (A ∩ Bn ), νs (A) =
n
and νc (A) =
νns (A ∩ Bn )
n
νnc (A ∩ Bn ), νs (A) =
n
νns (A ∩ Bn )
n
we have νc = νc , νs = νs , as was to be seen. (iii) From part (i) of Case 1, we have X n dμn , νnc (A) =
A ∈ A,
A
where X n is ≥ 0 and an a.e. [μn ] finite r.v. Actually, since μn assigns measure 0 outside Bn , we may assume that X n is 0 on Bnc . In the course of the proof of (i) in the present case, it was seen that νnc (A ∩ Bnc ) = 0. Therefore νnc (A) = νnc (A ∩ Bn ) and hence X n dμn = X n dμn . A
A∩Bn
On , define X as follows: X = n X n . Then, clearly, X (ω) = X n (ω) for ω ∈ Bn , and X n dμn = X n dμn = X n dμ = X dμ. νnc (A) = A
Therefore νc (A) =
A∩Bn
A∩Bn
n
X dμ = A∩Bn
A∩Bn
X dμ. A
That X is a.e. [μ] well defined follows as in part (iii) of Case 1.
Remark 3. The theorem is true if ν is only a σ -additive, σ -finite set function (see page 132 in Loève (1963)), but then the r.v. X need not be ≥ 0. Remark 4. The theorem is still true if ν is not even a σ -finite set function; i.e., if ν is only a σ -additive set function the theorem is true, but then X need be neither ≥ 0 nor a.e. [μ] finite. Remark 5. measure.
The theorem need not be true if μ is not σ -finite even if ν is a finite
7.3 The Radon–Nikodym Theorem Theorem 3 (Radon–Nikodym Theorem). Let μ, ν be two σ -finite measures on A such that ν is μ-continuous (ν μ). Then ν is the indefinite integral of a nonnegative, a.e. [μ] finite r.v. X that is unique up to μ-equivalence. Proof. By Theorem 2, ν = νc + νs , where νc μ and νs is μ-singular. So there exists N ∈ A with μ(N ) = 0 such that νs (A) = νs (A ∩ N ), A ∈ A. Then, for A ∈ A, (7.9) ν(A) = νc (A) + νs (A). In (7.9), take A = N and use the assumption that ν μ and the fact that νc μ to obtain: 0 = ν(N ) = νc (N ) + νs (N ) = 0 + νs (N ) = νs (N ). That is, νs (N ) = 0 and hence νs (A) = νs (A ∩ N ) ≤ νs (N ) = 0. In other words, νs (A) = 0 for all A ∈ A, orto put it differently, νs = 0. It follows that ν = νc , and therefore ν(A) = νc (A) = A X dμ, A ∈ A, for a nonnegative r.v. X which is a.e. [μ] finite and a.e. [μ] uniquely determined. Remark 6. It is to be emphasized that a.e. [μ] uniqueness of X means that, if Y is another nonnegative, a.e. [μ] finite r.v. such that A X dμ = A Y dμ, A ∈ A, then X = Y a.e. [μ]. Actually, this is true for any two r.v.s X and Y (that or if extended are finite everywhere, r.v.s., are a.e. [μ] finite) for which the integrals X dμ and Y dμ exist. That is, the following result holds. Proposition 1. Let X and Y be finite, (real-valued) r.v.s, or extended r.v.s but a.e. [μ] and assume that the integrals X dμ and Y dμ exist. Then A X dμ = A Y dμ for all A ∈ A implies that X = Y a.e. [μ]. Proof. Indeed, the existence of X dμ and Y dμ implies the existence of A X dμ and A Y dμ, A ∈ A (by the Corollary to Theorem 5 in Chapter 4). Assume first since | A X dμ| = | X I A dμ| ≤ that X dμ is finite. Then A X dμ is also finite, + dμ and − dμ, because ∞ > X X |X |dμ = |X |dμ < ∞, and so are A A − + X dμ. X dμ + + From dμ is finite and then so are A +Y dμ − X dμ = Y dμ, we have that Y A, asbefore. Next, from A X dμ = A Y dμ, we get A X dμ− and A Y dμ, A∈ − dμ = + dμ− − + − X Y A A A Y dμ, or because offiniteness, A X dμ+ A Y dμ = + − + − + − A Y dμ + A X dμ, or A (X + Y )dμ = A (Y + X )dμ. Since this is true for all A ∈ A and the integrands are ≥ 0 (and a.e. [μ] finite), we obtain X + + Y − = Y + + X − a.e. [μ], or X + − X − = Y + − Y − a.e. [μ], or X = Y a.e. [μ]. Similarly if Y dμ is finite. Now, that X dμ = ∞. Then from ∞ = X dμ = X + dμ− X − dμ, suppose + X − dμ X + dμ = ∞ and < ∞. Then,for any A ∈ A, AX dμ ≤ ∞ and we get − dμ < ∞. From X dμ = Y dμ, we get Y + dμ = ∞ and Y − dμ < ∞, so X A − that A Y + Therefore the relation A X dμ = A Y dμ A−Y dμ < ∞. dμ +≤ ∞ and becomes A X dμ − A X dμ = A Y + dμ − A Y − dμ, or, because of finiteness of
− − + A X dμ and A Y dμ, A (X + − + − X + Y = Y + X a.e. [μ],
+ Y − )dμ = A (Y + + X − )dμ. Hence, as before, or X + − X − = Y + − Y − a.e. [μ], or X = Y a.e.
[μ]. Likewise if Y dμ = ∞. from −∞ = X dμ = X + dμ− X − dμ, it follows that If+ X dμ = −∞, then X dμ < ∞ and X − dμ = ∞. Therefore A X + dμ < ∞ and A X − dμ ≤ ∞ for A ∈ A. Once again −∞ = X dμ = Y dμ = Y + dμ − Y − dμ implies all + − + − < ∞ Y dμ < ∞ and A Y dμ ≤ ∞. and Y dμ = ∞, so that AY dμ + − the relation A X dμ = A Y dμbecomes A X dμ− A X dμ = A Y + dμ− Hence − + dμ and + − + becauseof finiteness of A Y dμ, A Y dμ, − A X dμ − A Y or, − AX + − + − + dμ = − A Y dμ − A X dμ, or A (X + Y )dμ = A (Y + X )dμ. It follows that X − + Y + = Y − + X + a.e. [μ], or X + − X − = Y + − Y − a.e. [μ], or X = Y a.e. [μ]. Similarly if Y dμ = −∞. So, in all cases as described above, A X dμ = A Y dμ for all A ∈ A implies X = Y a.e. [μ]. Remark 7. The r.v. X of the theorem is called a Radon–Nikodym derivative of dν ν with respect to μ and is denoted by X = dμ . Let now f : → and suppose x b that −∞ f (t)dt exists. Then for all x ∈ (−∞, b], −∞ f (t)dt = F(x) exists and d F(x) d x = f (x) at all continuity points x ∈ (−∞, b] of f . The point function F may also be thought of as a set function over the intervals (−∞, x], x ∈ (−∞, b]. Thus, dν generalizes the notation d F(x) the notation dμ dx . Remark 8. In most applications, (, A) = (n , B n ), ν = P and μ = λn , the n-dimension Lebesgue measure. This is the case, e.g., with all of the common distributions such as uniform, normal, gamma, beta, or Cauchy. In the discrete case, such as that of binomial, Poisson, negative binomial, or hypergeometric, the measure μ is the counting measure; i.e., the measure that assigns mass one to singletons. Corollary. Let μ and λ be σ -finite measures on A such that μ λ, and let X be a r.v. for which X dμ exists. Then
X dμ = A
X A
dμ dλ for every A ∈ A. dλ
Proof. First, let X = I B for some B ∈ A. Then
X dμ = A
I B dμ = A
dμ = μ(A ∩ B).
(7.10)
A∩B
Also, μ(A ∩ B) = A∩B
dμ dλ = dλ
(I B A
dμ )dλ = dλ
X A
dμ dλ. dλ
(7.11)
From (7.10) and (7.11), we have then A X dμ = let X = ri=1 αi I Ai , αi ≥ 0, i = 1, . . . , r . Then X dμ =
r
A
αi
I Ai dμ =
A
i=1
r
A
dμ X dλ, A ∈ A. Now, dλ
αi
i=1
A
I Ai
dμ dλ dλ
(by the previous step) r dμ dμ = X dλ, αi I Ai dλ = dλ dλ A A i=1
so, again, the conclusion holds true. Next, let X ≥ 0. Then there exist 0 ≤ X n simple r.v.s ↑ X as n → ∞, which implies 0 ≤ X n I A ↑ X I A as n → ∞. Therefore, by the Lebesgue Monotone Convergence Theorem, X n dμ → X dμ. (7.12) X I A dμ or X n I A dμ → n→∞
n→∞
A
A
However,
dμ dλ (by the previous step) dλ A A dμ dμ dμ ↑X as n → ∞ . (7.13) → X dλ since 0 ≤ X n n→∞ A dλ dλ dλ dμ dλ, A ∈ A. From (7.12) and (7.13), it follows that A X dμ = A X dλ Finally, for any r.v. X for which the X dμ exists, we have
X n dμ =
Xn
+
−
+ dμ
X dλ − X dμ − X dμ = dλ A A A dμ dμ (X + − X − ) X dλ = dλ. = dλ dλ A A Thus, in all cases, A X dμ = A (X dμ dλ )dλ, A ∈ A. X dμ =
A
X A
− dμ
dλ
dλ
Remark 9. The result stated in the preceding corollary is what lies behind the routine replacement in integrals of probability distributions by their p.d.f.s. Thus, let X be a r.v. defined on the probability space (, A, P) with probability distribution function PX , let B ∈ B and A = X −1 (B), and let g : → be (measurable and) such that E g(X ) exists. Then (by Theorem 13 in Chapter 4), d PX g(x) (x) dλ, g(X ) d P = g(x) d PX = dλ A B B
if PX << λ (which often is either the Lebesgue or the counting measure); i.e., d PX (x) dλ. g(X ) d P = g(x) dλ A B In the following remark a brief summary is presented of results leading to Theorem 3. Remark 10. From among Theorems 1–3, the one that is most often used is Theorem 3, the Radon–Nikodym Theorem. Its proof, however, depends on Theorem 2, the Lebesgue Decomposition Theorem, whose proof is long and depends on Theorem 1, the Hahn–Jordan Decomposition Theorem. A shortcut to Theorem 3 may be as follows: • Start out with a σ -additive set function ϕ defined on (, A) into (−∞, ∞] for which there is at least one A ∈ A such that |ϕ(A)| < ∞. • State Lemma 1, according to which ϕ is continuous (where continuity from above, continuity from below, and continuity of ϕ are as in Definition 1). • Define ϕ + and ϕ − as was done in Definition 2, and then state the following short version of the Hahn–Jordan Decomposition Theorem: 1. ϕ + , ϕ − and |ϕ| = ϕ + + ϕ − are measures and ϕ − is finite. 2. ϕ = ϕ + − ϕ − (in the sense that ϕ(A) = ϕ + (A) − ϕ − (A), A ∈ A). • Recall the definition of absolute continuity, mutual absolute continuity, and singularity (see Definition 3). • State the Lebesgue Decomposition Theorem, whose detailed proof is very long and is carried out first for the case that both μ and ν are finite, and then the case they are both σ -finite. • Finally, state and prove the Radon–Nikodym Theorem. This chapter is concluded with a useful inequality, both in information theory as well as in statistical inference. Proposition 2 (Shannon–Kolmogorov Information Inequality). Let X be a kdimensional random vector defined on the probability space (, A, P), and suppose that its probability distribution PX has one of two possible probability density functions f 0 or f 1 with respect to a σ -finite measure ν on B k . Let K ( f 0 , f 1 ) be the mutual entropy of f 0 and f 1 ; i.e., f 1 (X ) f 0 (X ) = E0 log , f 0 (X ) f 1 (X ) where E0 denotes expectation taken under f 0 , and log is the natural logarithm. Then K ( f 0 , f 1 ) exists, is ≥ 0, and is = 0 if and only if Pi [ f 0 (X ) = f 1 (X )] = 1, where Pi is the probability measure induced by f i , i = 0, 1. K ( f 0 , f 1 ) = −E0 log
Remark 11.
By making the usual conventions 00 = 0, ±∞ × 0 = 0, and writing f 0 (x) f 0 (X ) log = f 0 (x)dν, K ( f 0 , f 1 ) = E0 log f 1 (X ) f 1 (x) k
it is seen that K ( f 0 , f 1 ) ≥ 0, although it may be ∞.
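A discrete numerical sketch of the information inequality (not from the text, and using arbitrary probability vectors): for two p.m.f.s f_0, f_1 on a finite set, the sum form of K(f_0, f_1) can be evaluated directly and is nonnegative, vanishing only when f_0 = f_1.

```python
import numpy as np

def K(f0, f1):
    # K(f0, f1) = sum f0 * log(f0 / f1), with the convention 0 * log(0/..) = 0
    mask = f0 > 0
    return float(np.sum(f0[mask] * np.log(f0[mask] / f1[mask])))

f0 = np.array([0.5, 0.3, 0.2])     # arbitrary p.m.f.s on a three-point space
f1 = np.array([0.25, 0.25, 0.5])
print("K(f0, f1) =", K(f0, f1))    # strictly positive since f0 != f1
print("K(f0, f0) =", K(f0, f0))    # exactly 0
```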
Proof. (a) (b) (c) (d)
If If If If
From the last relation above and the conventions made, it follows that: f0 f0 f0 f0
> 0 and f 1 = 0, then K ( f 0 , f 1 ) = ∞. = 0 and f 1 > 0, then K ( f 0 , f 1 ) = 0. = f 1 = 0, then K ( f 0 , f 1 ) = 0. > 0 and f 1 > 0, then the integrand is finite.
Thus, it suffices to focus on this last case alone. By setting S0 = ( f 0 > 0) and S1 = ( f 1 > 0), we may assume that S0 = S1 , since the cases S0 − S1 = and S1 − S0 = are covered by (a) and (b) above. So, let S = ( f 0 > 0) = ( f 1 > 0), A = X −1 (S), so that Pi (A) = 1, i = 0, 1, where Pi is the probability measure induced by f i , i = 0, 1. The function g(z) = − log z (z > 0) is strictly convex (since g (z) = z −2 > 0). Then with Z = f 1 (X )/ f 0 (X ), g(Z ) is strictly convex on A (with Pi (A) = 1, i = 0, 1). Thus, g(E0 Z ) ≤ E0 g(Z ) and equality occurs if and only if P0 (Z = c, a constant) = 1. This is so by Exercise 35 in Chapter 6. Equivalently, f 1 (X ) f 1 (X ) f 1 (X ) ≤ E0 − log = −E0 log = K ( f 0 , f 1 ), − log E0 f 0 (X ) f 0 (X ) f 0 (X ) or −K ( f 0 , f 1 ) ≤ log E0
f 1 (X ) = log f 0 (X )
S
f 1 (x) f 0 (x)dν = log f 0 (x)
and −K ( f 0 , f 1 ) = 0 only if P0 From
f 1 (X ) f 0 (X )
f 1 (x)dν = log 1 = 0, S
f 1 (X ) = c = 1. f 0 (X )
= c a.s.[P0 ], we get f 1 (X ) c = E0 = f 0 (X )
S
f 1 (x) f 0 (x)dν = 1, f 0 (x)
so that, first, −K ( f 0 , f 1 ) ≤ 0, equivalently, K ( f 0 , f 1 ) ≥ 0, and secondly, K ( f 0 , f 1 ) = 0 only if P0 [ f 1 (X ) = f 0 (X )] = 1; also, P1 [ f 1 (X ) = f 0 (X )] = 1. The proof is completed. Exercises. 1. Verify the relation An 0 = A +
∞
− A j+1 ) used in the proof of Lemma 1; ∞
An , then for namely, if {An }, n ≥ 1, form a nonincreasing sequence and A = ∞ n=1 any n 0 ≥ 1, the given relation holds. In particular, A1 = n=1 (An − An+1 ) if ∞
An = .
n=1
j=n 0 (A j
2. Given the finite measures μn , n = 1, 2, . . ., on the measurable space (, A), define a probability measure μ such that μn μ for all n (i.e., μ(A) = 0 for A ∈ A implies μn (A) = 0, n ≥ 1).
3. Let C be the class of all probability measures defined on the measurable space (, A), and let d(P, Q) = P − Q = 2 sup{|P(A) − Q(A)|; A ∈ A}. (i) Then show that d is a distance in C; i.e., show that d(P, Q) ≥ 0 and d(P, Q) = 0 only if P = Q, d(P, Q) = d(Q, P), and d(P, Q) ≤ d(P, R) + d(R, Q) where R ∈ C. Next, let μ be a σ -finite measure in A such that P << μ for every P ∈ C, and for P and Q in C, let f = d P/dμ and g = d Q/dμ. (ii) Then show that
d(P, Q) =
| f − g|dμ .
Hint: For part (ii) and for any A ∈ A, ( f − g)dμ = ( f − g)dμ − A
A∩B
(g − f )dμ, A∩C
where B = ( f − g > 0) and C = (g − f > 0). 4. Let U (α, β) be the uniform distribution over the interval [α, β] (α < β), and for n ≥ 1, let Pn and Q n be the (probability) measures corresponding to U (− n1 , 1) and U (0, 1 + n1 ). Then show that Pn − Q n → 0 as n → ∞, by using part (ii) of Exercise 3. 5. Let P and Q be two probability measures defined on the measurable space (, A), and suppose that P ≈ Q (i.e., P << Q and Q << P). Let f = d P/dμ and g = d Q/dμ for some σ -finite measure μ dominating both P and Q (e.g., μ = P + Q), and let Z = log(g/ f ), where as always, log stands for the natural logarithm. Then (with reference to Exercise 3), show that P − Q ≤ 2(1 − e−ε ) + 2P(|Z | > ε) for every ε > 0. Hint: For B = ( f − g > 0), show that P − Q = 2[P(B) − Q(B)], and then work with P(B) and Q(B) by introducing C = (|Z | > ε). 6. Let P, Q and f , g be as in Exercise 3 and, without loss of generality, we suppose that the dominating measure μ is a probability measure (e.g., μ = (P + Q)/2). Define ρ by ρ = ( f g)1/2 dμ, and show that: (i) ρ ≤ 1. 2 1/2 (ii) 2(1 − ρ) ≤ d(P, Q) ≤ 2(1 − ρ ) , where (by Exercise 3), d(P, Q) = | f − g|dμ = P − Q. Replacing P, Q and f , g, ρ by Pn , Q n and f n , gn , ρn , n ≥ 1, part (ii) becomes: 2(1 − ρn ) ≤ d(Pn , Q n ) ≤ 2(1 − ρn2 )1/2 .
(iii) From this last relation, conclude that, as n → ∞, d(Pn , Q n ) → 0 if and only if ρn → 0. 7. For n ≥ 1, let f n be real-valued measurable functions defined on an open set E ⊂ containing 0, and suppose that | f n (x)| ≤ M(< ∞) for all n and all x ∈ E, and that f n (x) −→ 0 for x ∈ E. Then show that, for any sequence xn −→ 0, n→∞ n→∞ there exists a subsequence {xm } ⊆ {xn } and a λ-null subset N of E (which may depend on {xn }) such that f m (x + xm ) −→ 0, x ∈ E − N ; here λ is the Lebesgue m→∞ measure. Hint: Consider the integral | f n (x + xn )|d(x), where is the d.f. of the N (0, 1) distribution; refer to Exercise 17 in Chapter 5, and use the fact that the measure induced by and λ are mutually absolutely continuous. 8. For n ≥ 1, let f n be real-valued measurable functions defined on , and suppose that | f n (x)| ≤ M(< ∞) for all n and x ∈ , and that lim f n− (x) = 0 for x ∈ . n→∞
Then for any sequence {xn } with xn −→ 0 it holds lim sup f n (x + xn ) ≥ 0 a.e. n→∞
[λ], where λ is the Lebesgue measure.
n→∞
CHAPTER 8
Distribution Functions and Their Basic Properties, Helly–Bray Type Results
This chapter deals with some basic properties of d.f.s, the concept of weak and complete convergence of sequences of d.f.s, and some Helly–Bray type theorems. More precisely, in Section 8.1, it is shown that a d.f. is, actually, determined by its values on a dense set in ; that the set of its discontinuity points is countable; and that it is uniquely decomposed into three d.f.s, of which one is a step function, the second is a continuous d.f. whose induced measure is absolutely continuous with respect to the Lebesgue measure, and the third is a continuous d.f. whose induced measure is singular with respect to the Lebesgue measure. In the following section, the concepts of weak and complete convergence of a sequence of d.f.s is introduced, and their relationships are discussed. It is also shown that for any sequence of d.f.s, there is always a subsequence that converges weakly to a d.f. In the final section, the integral of a bounded and continuous real-valued function defined on with respect to a d.f. is considered; such integrals are to be interpreted in the Riemann–Stieltjes sense. Then conditions are given under which weak or complete convergence of a sequence of d.f.s implies convergence of associated integrals. Problems of this type arise often in research and applications.
8.1 Basic Properties of Distribution Functions Definition 1.
We say that F is a d.f. (not necessarily of a r.v.) if F : → and
(1) 0 ≤ F(x) ≤ 1, x ∈ . (2) F ↑. (3) F is continuous from the right. def
def
As usual, F(+∞) = lim F(x), F(−∞) = lim F(x). Then 0 ≤ F(−∞), x→∞
x→−∞
F(∞) ≤ 1, but F(−∞) need not be = 0 and F(+∞) need not be = 1. Proposition 1. Defining property (3) is not essential in defining a d.f. in the following sense: if F ∗ satisfies (1) and (2), then one can construct a d.f. F (i.e., F satisfies properties (1)–(3)) which coincides with F ∗ whenever F ∗ is continuous or only right continuous. An Introduction to Measure-Theoretic Probability, Second Edition. http://dx.doi.org/10.1016/B978-0-12-800042-7.00008-6 Copyright © 2014 Elsevier Inc. All rights reserved.
Proof. In the first place, F ∗ can have only jumps. Now if x ∈ C(F ∗ ), we set / C(F ∗ ), we set F(x) = F ∗ (x + 0) which exists F(x) = F ∗ (x), whereas if x ∈ because of (2). Then, clearly, F satisfies properties (1)–(3). Actually more is true, namely, Proposition 2. Let D be a dense subset of (such as the set of rational numbers), and let FD : D → satisfy (1) and (2). Then FD determines uniquely a d.f. F in . Proof. In the first place, we may assume that FD is also continuous from the right in D, by Proposition 1. Next, for x ∈ , let xn ∈ D such that xn ↓ x as n → ∞. Then define F : → [0, 1] as follows: F(x) = FD (x), x ∈ D, and F(x) = lim FD (xn ), x ∈ D c . Clearly, F satisfies (1)–(3). n→∞
If F1 , F2 are two d.f.s such that F1 (x) = F2 (x), x ∈ D, then F1 ≡ F2 .
Corollary.
Proof. Let x ∈ D c . Then there exists xn ∈ D such that xn ↓ x as n → ∞. It follows that, as n → ∞, F1 (xn ) ↓ F1 (x), F2 (xn ) ↓ F2 (x). But F1 (xn ) = F2 (xn ), n ≥ 1. Hence F1 (x) = F2 (x). We mentioned previously that a d.f. can have only discontinuities that are jumps. The following theorem refers to the number of these jumps. Theorem 1. Any d.f. F has a countable number of discontinuous points (which, of course, can be equal to 0). Proof. If F is continuous everywhere there is no problem. Let then F be discontinuous and let (α, β] be a finite interval in . Let x1 , . . . , xn be n points such that α < x1 < · · · < xn ≤ β at which F is discontinuous. ( α
]( x1
]( x2
] x3
...
...
]( xn
] β
Then we have F(α) ≤ F(x1− ) < F(x1 ) ≤ F(x2− ) < F(x2 ) ≤ · · · ≤ F(xn − ) < F(xn ) ≤ F(β). The lengths of the jumps are F(x j ) − F(x j − ),
j = 1, . . . , n.
Summing them up, we get n [F(x j ) − F(x j − )] = [F(x1 ) − F(x1− )] + [F(x2 ) − F(x2− )] j=1
+ · · · + [F(xn−1 ) − F(xn−1− )] + [F(xn ) − F(xn − )] ≤ [F(x1 ) − F(x1− )] + [F(x2− ) − F(x1 )] + [F(x2 ) − F(x2− )] + · · · + [F(xn − ) − F(xn−1 )] + [F(xn ) − F(xn − )]
(by adding some nonnegative numbers), = F(xn ) − F(x1− ) ≤ F(β) − F(α); i.e., n [F(x j ) − F(x j − )] ≤ F(β) − F(α).
(8.1)
j=1
Relation (8.1) implies the following: for ε > 0, the number of jumps in (α, β] with length of jump > ε cannot be greater than 1ε [F(β) − F(α)]. In fact, if this number is K , and K > 1ε [F(β) − F(α)], then the sum of lengths of the K jumps is ≥ K ε > F(β) − F(α), which contradicts (8.1). Applying this for m = 1, 2, . . . , we get the number of jumps in (α, β] with length of jump > m1 is ≤ m[F(b) − F(a)], thus finite. Hence the total number of jumps in (α, β] is countable. Since can be written as the sum of denumerably many intervals, we have that the total number of jumps in is countable. Definition 2. Consider the numbers xn , n = 1,2, . . . , and let the positive numbers p(xn ) be associated with xn ,n ≥ 1, such that n p(xn ) < ∞. Define F ∗ : → [0, ∞) as follows: F ∗ (x) = xn ≤x p(xn ). Then F ∗ is said to be a step function. Remark 1. F ∗ as just defined is not necessarily a step function in the usual sense; i.e., constant over intervals, since the xn s can form a set dense in ; e.g., the rational numbers. However, if {xn , n ≥ 1} is not dense in , these two concepts coincide. Theorem 2 (Decomposition Theorem). Any d.f. F is uniquely decomposed into two d.f.s Fc , Fd such that Fc is continuous, Fd is a step function, and F = Fc + Fd . Proof. Let xn , n ≥ 1, be the discontinuity points of F and set p(xn ) = F(xn ) − F(xn −) and p(x) = 0 for x = xn , n ≥ 1. Then, clearly, n p(xn ) < ∞(actually, ≤ 1, by (1) in Definition 1). Define: p(xn ), Fc (x) = F(x) − Fd (x). Fd (x) = xn ≤x
We show first that Fd is a d.f. In the first place, 0 ≤ Fd (x) ≤ 1, x ∈ (apply (8.1) for β = x and let α → −∞). Next, for y > x, we get p(xn ) ≥ 0, Fd (y) − Fd (x) = x<xn ≤y
so that Fd ↑. So it remains to show continuity from the right. We have, by (8.1), p(xn ) ≤ F(y) − F(x). (8.2) (0 ≤)Fd (y) − Fd (x) = x<xn ≤y
Letting y ↓ x and utilizing the right continuity of F, we get Fd (y) − Fd (x) → 0 or Fd (y) → Fd (x).
137
138
CHAPTER 8 Distribution Functions and Their Basic Properties
We next prove that Fc is a continuous d.f. Since F, Fd are both d.f.s and Fd (x) ≤ F(x) (by (8.1)), x ∈ , we have that 0 ≤ Fc (x) ≤ 1 (by (8.1)), x ∈ . Next, Fc is ↑. In fact, for x < y, we have, by (8.2), p(xn ) ≤ F(y) − F(x), by (8.1). Fd (y) − Fd (x) = x<xn ≤y
Hence Thus
F(y) − F(x) − Fd (y) − Fd (x) ≥ 0.
Fc (y) − Fc (x) = F(y) − Fd (y) − F(x) − Fd (x) = F(y) − F(x) − Fd (y) − Fd (x) ≥ 0,
so that Fc ↑. Now Fc is right continuous since both F, Fd are so. Thus it remains to prove that Fc is continuous from the left. Again for x < y, we have Fc (y) − Fc (x) = F(y) − Fd (y) − F(x) − Fd (x) = F(y) − F(x) − p(xn ) x<xn ≤y
= F(y) − F(x) − p(y) −
p(xn )
x<xn
= F(y) − F(x) − F(y) + F(y−) − = F(y−) − F(x) −
p(xn )
x<xn
p(xn ).
x<xn
Thus, as x ↑ y, we get F(y−) − F(x) → F(y−) − F(y−) = 0 and p(xn ) = p(xn ) − p(y) ≤ F(y) − F(x) x<xn
x<xn ≤y
− F(y) − F(y−) = F(y−) − F(x) → 0.
Finally, we justify the uniqueness argument. We have F = Fc + Fd and also step d.f.; i.e., Fd (x) = let F Fc + Fd , where Fc is a continuous d.f. and Fd is a =
p (xn ), where p (xn ), n ≥ 1, are positive numbers (with p (xn ) ≤ 1) assigned xn ≤x
xn
to the points x1 , x2 , . . . , and p (x) = 0 for x = xn , n ≥ 1. It will first be shown that Fd (x) − Fd (x−) = p(x) (the length of the jump of Fd at x), x ∈ . To this end, let ε > 0. Then p(xn ) Fd (x + ε) − Fd (x − ε) = x−ε<xn ≤x+ε
=
x−ε<xn ≤x
p(xn ) +
x<xn ≤x+ε
p(xn ).
8.1 Basic Properties of Distribution Functions
But, by (8.1),
p(xn ) ≤ F(x + ε) − F(x) → F(x) − F(x) = 0, ε→0
x<xn ≤x+ε
whereas p(x) ≤
p(xn ) ≤ F(x) − F(x − ε) → F(x) − F(x−) = p(x). ε→0
x−ε<xn ≤x
Therefore
x−ε<xn ≤x
p(xn ) → p(x). ε→0
However,
p(xn ) = Fd (x) − Fd (x − ε) → Fd (x) − Fd (x−). ε→0
x−ε<xn ≤x
Hence Fd (x) − Fd (x−) = p(x). Next, we show that, for every x ∈ : p (xn ) = Fd (x−), xn <x
Fd (x) − Fd (x−) = p (x), Fd (x) − Fd (x−) = p(x). Indeed, let = {x1 , x2 , . . .}, let A be the discrete σ -field, and on A, define the (finite) p (xn ). For k = 1, 2, . . . , set Ak = {xn ; xn ≤ x − k1 }, measure μ by μ(A) = xn ∈A ∞
def
so that, as k ↑ ∞, Ak ↑ ∪ Ak = A = {xn ; xn < x}. Then μ(Ak ) → μ(A), or k→∞ k=1 p (xn ) → p (xn ), or p (xn ) → p (xn ), or Fd (x − k1 ) → xn ∈Ak
xn <x
k→∞ x ∈A n
p (xn ). However, Fd (x −
Next, for ε > 0,
k→∞ x <x xn ≤x− 1k n 1
(x−). Hence ) → F k k→∞ d xn <x
Fd (x) − Fd (x − ε) =
xn ≤x
= =
p (xn ) −
k→∞
p (xn ) = Fd (x−).
p (xn )
xn ≤x−ε
p (xn ) + p (x) −
p (xn )
xn <x xn ≤x−ε
Fd (x−) + p (x) − Fd (x − ε) → Fd (x−) + p (x) − Fd (x−) ε→0
= p (x).
139
140
CHAPTER 8 Distribution Functions and Their Basic Properties
Also, Fd (x) − Fd (x − ε) → Fd (x) − Fd (x−), so that Fd (x) − Fd (x−) = p (x). ε→0
Finally, from Fc + Fd = F, we get Fd = F − Fc , so that Fd (x) − Fd (x − ε) = [F(x) − Fc (x)] − [F(x − ε) − Fc (x − ε)] = [F(x) − F(x − ε)] − [Fc (x) − Fc (x − ε)] → F(x) − F(x−) = p(x), ε→0
since Fc is continuous, so that Fd (x) − Fd (x−) = p(x). Combining the last two results, and using also the result that Fd (x) − Fd (x−) = p(x), we get Fd (x) − Fd (x−) = p (x) = p(x) = Fd (x) − Fd (x−). From this it follows that Fd and Fd have the same points of discontinuities and
the same lengths jumps. n ), n ≥ 1, so that of Thus, xn = xn , and p (xn ) = p(x
p (xn ) = p(xn ) = Fd (x), x ∈ , and then Fc (x) = Fc (x), x ∈ . Fd (x) = xn ≤x
xn ≤x
(See also Exercise 10.)
Corollary. Any d.f. F is uniquely decomposed into three d.f.s Fd , Fcc , Fcs such that F = Fd + Fcc + Fcs , where Fd is a step function, singular with respect to Lebesgue measure λ; Fcc is λ-absolute continuous; and Fcs is continuous and λ-singular; by λ-absolute continuity and λ-singularity, we mean that the measures induced by the d.f.s have these properties. Proof. Let μd , μc be the measures induced by Fd , Fc , respectively. We set N = {x1 , x2 , . . .} for the set of discontinuities of F, and we prove that μd is λ-singular by proving that μd (B) = μd (B ∩ N ), B ∈ B, while λ(N ) = 0. Since μd (B) = μd (B ∩ N ) + μd (B ∩ N c ), it suffices to prove that μd (B ∩ N c ) = 0, which would be the case if we would show that μd (N c ) = 0. We have μd (N c ) = μd ( − N ) = μd lim ( − {x1 , . . . , xn }) n→∞
= lim μd ( − {x1 , . . . , xn }) n→∞ = lim μd () − μd ({x1 , . . . , xn }) n→∞ = lim α − μd ({x1 , . . . , xn }) , n→∞
(where α = μd ()); and this is = α − lim μd ({x1 , . . . , xn }) = α − lim n→∞
=α−
∞ j=1
So, μd is λ-singular.
n→∞
n j=1
p(x j ) = α − Fd (∞) = α − α = 0.
p(x j )
8.2 Weak Convergence and Compactness
Next, μc is decomposed uniquely (by the Lebesgue Decomposition Theorem, Theorem 2 in Chapter 7) into a λ-continuous measure μcc and into a λ-singular measure μcs such that μc = μcc + μcs . Then, for every x ∈ , μc (−∞, x] = μcc (−∞, x] + μ cs (−∞, x] or Fc (x) = Fcc (x) + Fcs (x), for the corresponding d.f.s, and μcc (B) = B ϕ dλ, where ϕ is a measurable, nonnegative, finite a.e. [λ], and uniquely defined a.e. [λ] function, and Fcc (x) = (−∞,x] ϕ dλ. Now Fcc is continuous since for x < y, we have Fcc (y) − Fcc (x) = (x,y] ϕ dλ → 0 as x ↑ y, because |ϕ I(x,y] | ≤ ϕ independent of x, λ-integrable, Dominated and as x ↑ y, ϕ I(x,y] → ϕ I{y} . Then, by the Convergence Theorem, (x,y] ϕ dλ = ϕ I(x,y] dλ → ϕ I{y} dλ = {y} ϕ dλ = 0. From the expression Fc = Fcc + Fcs and continuity of Fcc , it follows that Fcs is also continuous. Thus F is decomposed into a step d.f. Fd , a continuous d.f. Fcc with density, and a continuous d.f. Fcs whose corresponding measure vanishes outside a λ-null set.
8.2 Weak Convergence and Compactness of a Sequence of Distribution Functions Definition 3. If F is a d.f. on , then its variation, V ar F, is defined to be V ar F = F(+∞) − F(−∞). Let now F and Fn , n = 1, 2, . . . , be d.f.s. We have defined weak convergence, Fn ⇒ F, to mean Fn (x) → F(x), x ∈ C(F). The convergence n→∞ n→∞ Fn ⇒ F need not imply that V ar Fn → V ar F. If this is so, we say that Fn n→∞
n→∞
c
converges completely to F and we denote it by writing Fn → F.
n→∞
c
Clearly, if F, Fn , n ≥ 1, are d.f.s of r.v.s, then Fn ⇒ F is equivalent to Fn → F. n→∞
Example 1.
n→∞
Let ⎧ ⎨ 0 if x < −n Fn (x) = 21 if − n ≤ x < n, n ≥ 1 ⎩ 1 if x ≥ n, c
and F(x) = 21 . Then Fn ⇒ F but Fn F. n→∞
n→∞
Remark 2. It is to be noted that if Fn ⇒ F, then F is uniquely determined. In n→∞ fact, we have the following Proposition 3. For two d.f.s F and F and as n → ∞, suppose Fn ⇒ F and Fn ⇒ F . Then F(x) = F (x), x ∈ . Proof.
Indeed, for x ∈ C(F) ∩ C(F ),
Fn (x) → F(x), n→∞
Fn (x) → F (x) n→∞
and hence F(x) = F (x) on C(F)∩C(F ). Next, [C(F)∩C(F )]c = C c (F)∪C c (F ) is countable, and for every x in this set there exists xn ∈ C(F)∩C(F ) such that xn ↓ x
141
142
CHAPTER 8 Distribution Functions and Their Basic Properties
as n → ∞. Then F(xn ) → F(x), F (xn ) → F (x). But F(xn ) = F (xn ), n ≥ 1. n→∞
n→∞
Hence F(x) = F (x), x ∈ .
The following theorem relates the variations of d.f.s and their values at ±∞. Theorem 3.
Let Fn ⇒ F. Then n→∞
(i) lim sup Fn (−∞) ≤ F(−∞) ≤ F(∞) ≤ lim inf Fn (∞). n→∞
n→∞
(ii) V ar F ≤ lim inf V ar Fn . n→∞
(iii) Fn (±∞) → F(±∞) if and only if V ar Fn → V ar F. n→∞
n→∞
c
Thus Fn ⇒ F implies Fn → F if and only if Fn (±∞) → F(±∞).) n→∞
n→∞
n→∞
Proof. (i) We have: Fn (−∞) ≤ Fn (x) ≤ Fn (∞). Let now x ∈ C(F) and take the limits as n → ∞. Then lim sup Fn (−∞) ≤ F(x) ≤ lim inf Fn (∞). Now let x ↓ −∞ through continuity points. Then lim sup Fn (−∞) ≤ F(−∞) ≤ lim inf Fn (∞). If x ↑ ∞ through continuity points, then lim sup Fn (−∞) ≤ F(∞) ≤ lim inf Fn (∞). Hence, we get (i). (ii) From (i) and the fact that for any sequences of real numbers {xn } and {yn }, lim sup xn = − lim inf (−xn ), and lim inf (xn + yn ) ≥ lim inf xn + lim inf yn , n→∞
n→∞
n→∞
n→∞
n→∞
we get V ar F = F(∞) − F(−∞) ≤ lim inf Fn (∞) − lim sup Fn (−∞) n→∞
n→∞
= lim inf Fn (∞) + lim inf [−Fn (−∞)] n→∞
n→∞
≤ lim inf [Fn (∞) − Fn (−∞)] = lim inf V ar Fn , n→∞
n→∞
as was to be seen. (iii) Assume first that V ar Fn → V ar F or Fn (∞) − Fn (−∞) → F(∞) − F(−∞). n→∞
n→∞
Then for ε > 0 and all sufficiently large n, we have Fn (∞) − Fn (−∞) − F(∞) − F(−∞) < ε,
8.2 Weak Convergence and Compactness
or Fn (∞) − F(∞) + F(−∞) − ε < Fn (−∞). Hence as n → ∞, lim inf Fn (∞) − F(∞) + F(−∞) − ε ≤ lim inf Fn (−∞) ≤ lim sup Fn (−∞), or, F(−∞) − ε ≤ lim inf Fn (−∞) ≤ lim sup Fn (−∞) ≤ F(−∞), since by part (i), lim inf Fn (∞) − F(∞) ≥ 0 and lim sup Fn (−∞) ≤ F(−∞). Letting ε → 0, we obtain lim inf Fn (−∞) = lim sup Fn (−∞) = F(−∞). Likewise, for ε > 0 and all sufficiently large n, −ε < Fn (−∞) − Fn (∞) + F(∞) − F(−∞) , or Fn (−∞) − F(−∞) + F(∞) + ε > Fn (∞). Hence as n → ∞, lim sup Fn (−∞) − F(−∞) + F(∞) + ε ≥ lim sup Fn (∞) ≥ lim inf Fn (∞), or F(∞) + ε ≥ lim sup Fn (∞) ≥ lim inf Fn (∞) ≥ F(∞), since by part (i), lim sup Fn (−∞) − F(−∞) ≤ 0 and lim inf Fn (∞) ≥ F(∞). Letting ε → 0, we obtain lim inf Fn (∞) = lim sup Fn (∞) = F(∞). The proof of the other direction is immediate. Theorem 4.
Fn ⇒ F if and only if Fn (x) → F(x) on a set D dense in . n→∞
n→∞
Proof. In all that follows, n → ∞ unless otherwise noted. If Fn ⇒ F, then Fn (x) → F(x), x ∈ C(F) which is a set dense in . Now let Fn (x) → F(x), x ∈ D. We show that this convergence is true for x ∈ C(F). For x ∈ C(F), there exists x , x
∈ D such that x < x < x
. Hence Fn (x ) ≤ Fn (x) ≤ Fn (x
). Letting n → ∞, we get F(x ) ≤ lim inf Fn (x) ≤ lim sup Fn (x) ≤ F(x
). Now let x ↑ x, x
↓ x through values of D. Then we get F(x−) = F(x) ≤ lim inf Fn (x) ≤ lim sup Fn (x) ≤ F(x+) = F(x). Thus Fn (x) → F(x).
Remark 3. It is to be emphasized that in the proof of Theorem 4 only continuity points of F are relevant; discontinuity points and right-continuity points play no role in the proof.
143
144
CHAPTER 8 Distribution Functions and Their Basic Properties
Now let {Fn } be any sequence of d.f.s. Of course, this sequence may or may not converge weakly. However, there is always a subsequence which converges weakly; i.e., Theorem 5 (Weak Compactness Theorem). If {Fn } is a sequence of d.f.s, there exists a subsequence that converges weakly to a d.f. Proof. Let D = {x1 , x2 , . . .} be a set dense in , and look at the bounded sequence {Fn (x1 )}, n ≥ 1. Then there exists a convergent subsequence; call it {Fn1 (x1 )}. Here, and in the remainder of this proof, all limits are taken as n → ∞. Next, look at the subsequence {Fn1 } (of {Fn }) and evaluate it at x2 to get the bounded sequence {Fn1 (x2 )}. Then there exists a convergent subsequence; call it {Fn2 (x2 )}. Since {Fn2 } ⊆ {Fn1 }, it follows that {Fn2 (x1 )} is also convergent. Proceeding like this, at the nth step we consider the subsequence {F1,n−1 , F2,n−1 , F3,n−1 , . . .} and evaluate it at xn to get the bounded sequence {F1,n−1 (xn ), F2,n−1 (xn ), F3,n−1 (xn ), . . .}. Then there exists a convergent subsequence; call it {F1n (xn ), F2n (xn ), F3n (xn ), . . .}. Since {F1n , F2n , F3n , . . .} ⊆ {F1,n−1 , F2,n−1 , F3,n−1 , . . .}, it follows that {F1n (x j ), F2n (x j ), F3n (x j ), . . .} is also convergent for j = 1, . . . , n − 1. The process continues in this manner indefinitely. In terms of an array, we have that F11 F12 . . F1n . .
F21 F22 . . F2n . .
F31 F32 . . F3n . .
··· ··· . . ··· . .
converges at converges at . . . . . converges at . . . . .
x1 x1 , x2 x1 , x2 , . . . , xn . .
At this point, consider the diagonal sequence {Fnn } and argue that {Fnn } converges at every point of D. To this end, let xn ∈ D and look at {Fnn (xn )}. Clearly, {Fnn } ⊆ {F1n , F2n , F3n , . . .} except for the first n − 1 terms F11 , F22 , . . . , Fn−1,n−1 . Hence {Fnn (xn )} converges, since so does {F1n (xn ), F2n (xn ), F3n (xn ), . . .}. Thus, {Fnn } converges on D and defines the function FD (x) = lim Fnn (x), x ∈ D. Clearly, FD satisfies the properties of being between 0 and 1 and nondecreasing, since 0 ≤ Fnn (x) ≤ 1, x ∈ , and Fnn is nondecreasing. Next, extend FD to F ∗ from D to by defining F ∗ thus: F ∗ (x) = FD (x), x ∈ D, and F ∗ (x) = limn→∞ FD (xn ) with xn ∈ D, xn ↓ x. Then, clearly, 0 ≤ F ∗ (x) ≤ 1, x ∈ , and F ∗ is nondecreasing. Furthermore, Fnn (x) −→ F ∗ (x), x ∈ C(F ∗ ). Indeed, for x ∈ C(F ∗ ), there exist x n→∞
and x
in D such that x < x < x
, so that Fnn (x ) ≤ Fnn (x) ≤ Fnn (x
), and this implies as n → ∞, lim Fnn (x ) = FD (x ) = F ∗ (x ) ≤ lim inf Fnn (x) ≤ lim sup Fnn (x) ≤ lim Fnn (x
) = FD (x
) = F ∗ (x
) (since F ∗ = FD on D). Thus F ∗ (x ) ≤ lim inf Fnn (x) ≤ lim sup Fnn (x) ≤ F ∗ (x
).
8.3 Helly–Bray Type Theorems for Distribution Functions
Next, let x ↑ x and x
↓ x whereas always x , x
∈ D. Then lim F ∗ (x ) = F ∗ (x−) = F ∗ (x) ≤ lim inf Fnn (x) ≤ lim sup Fnn (x) ≤ lim F ∗ (x
) = F ∗ (x+) = F ∗ (x) (since x ∈ C(F ∗ )). Therefore, as n → ∞, lim inf Fnn (x) = lim sup Fnn (x) = lim Fnn (x) = F ∗ (x); i.e., Fnn (x) −→ F ∗ (x) for x ∈ C(F ∗ ), as asserted. Finally, modify F ∗ into F as n→∞
follows: F(x) = F ∗ (x), x ∈ C(F ∗ ), and for x ∈ C(F ∗ ), F(x) = lim F ∗ (xn ) where xn ∈ C(F ∗ ) with xn ↓ x as n → ∞, so that F inherits the set of continuity points of F ∗ ; i.e., C(F) = C(F ∗ ), and is also right-continuous. Therefore F is a d.f. and Fnn ⇒ F, as was to be seen. Remark 4. The method employed in the proof for constructing the subsequence {Fnn } is known as the Cantor diagonal method.
8.3 Helly–Bray Type Theorems for Distribution Functions Remark 5. The following integrals are to be understood in the Riemann–Stieltjes sense. For some comments, regrading such integrals and related results, the reader is referred to Appendix B. As a prelude to this section, let us revisit briefly Chapter 5. In that chapter, it was seen that, if a sequence of r.v.s X n , n ≥ 1, converges to a r.v. X (as n → ∞), either a.e. (as is the case in Theorem 1 and its corollaries, Theorem 2(iii) and Theorem 3 under (a)), or in measure (as is thecase in Theorem 3 under (b)), then the integrals X n dμ converge to the integral X dμ, provided some additional requirements are met. Now, let Fn and F be the d.f.s of X n and X , respectively, and suppose that Fn ⇒ F c or Fn −→ F as n → ∞. Also, let g : → be a Borel function, sothat g(X n ) and g(X ) are r.v.s. Then, by Theorem 13 of Chapter 4, g(X n )dP = g(x)dP X n = g(x)d F , and g(X )dP = g(x)dP = n X g(x)d F (see also Remarks 6 of this chapter). A question which may then arise is this: under what further conditions is it true that g(x)d Fn −→ g(x)dF, over a set B ⊆ ? B
n→∞ B
It is this kind of questions for which the Helly–Bray type results provide an answer (see Theorems 6,7, and 8). Theorem 6 (Helly–Bray Lemma).
Let Fn ⇒ F. Then for every α, β, n→∞
−∞ < α < β < ∞, such that Fn (α) → F(α), n→∞
Fn (β) → F(β) n→∞
145
146
CHAPTER 8 Distribution Functions and Their Basic Properties
and every g : [α, β] → continuous (and hence bounded), we have gdFn → gdF. (α,β]
n→∞ (α,β]
Proof. Consider the partition α = x1 < x2 < · · · < xm+1 = β of [α, β] and on (α, β] define the function gm (x) =
m
g(ξ j )I(x j ,x j+1 ] (x),
j=1
where ξ j ∈ (x j , x j+1 ], j = 1, . . . , m. Then we assert that, as m → ∞, so that max x j+1 − x j → 0, we have gm (x) → g(x), x ∈ (α, β]. In fact, let x ∈ (α, β].
1≤ j≤m
Then x ∈ (xk , xk+1 ] for some k, while also ξk ∈ (xk , xk+1 ]. Now, as m → ∞,
max x j+1 − x j → 0 implies xk+1 − xk → 0 so that ξk → x 1≤ j≤m
and hence g(ξk ) → g(x) by continuity of g. But g(ξk ) = gm (x). Thus gm (x) → g(x).
m→∞
Next,
g dFn − g dF ≤ (g − gm )dFn
(α,β] (α,β] (α,β]
+ (gm − g)dF + gm dF − gm dFn
, (α,β]
(α,β]
(α,β]
clearly, while all these integrals exist either because the integrand is continuous or because it is a step (in the usual sense) function. Then the right-hand side of the relation above is bounded by
2 sup |gm (x) − g(x)| + gm dF − gm dFn
, (α,β]
x∈(α,β]
(α,β]
since F(β) − F(α) ≤ 1,
Fn (β) − Fn (α) ≤ 1.
Next, (α,β]
gm dF =
m j=1 (x j ,x j+1 ]
and similarly, m gm dFn = (α,β]
j=1 (x j ,x j+1 ]
gm dF =
m
g(ξ j ) F(x j+1 ) − F(x j ) ,
j=1
gm dFn =
m j=1
g(ξ j ) Fn (x j+1 ) − Fn (x j ) .
8.3 Helly–Bray Type Theorems for Distribution Functions
Hence
(α,β]
g dFn −
(α,β]
m
g(ξ j ) g dF
≤ 2 sup |gm (x) − g(x)| + x∈(α,β]
j=1
× Fn (x j+1 ) − F(x j+1 ) + Fn (x j ) − F(x j ) .
Taking the partitioning points x2 . . . , xm in C(F), using weak convergence and the assumption that Fn (α) → F(α) and Fn (β) → F(β), we have Fn (x j ) → F(x j ), n→∞ j = 1, . . . , m. Thus, the second summand at the right-hand side of the last inequality → 0. Next, n→∞ sup |gm (x) − g(x)| = max sup |gm (x) − g(x)| ; x ∈ (x j , x j+1 ] 1≤ j≤m
x∈(α,β]
sup g(ξ j ) − g(x) ; x ∈ (x j , x j+1 ] 1≤ j≤m
≤ max sup g(ξ j ) − g(x) ; x ∈ [x j , x j+1 ] 1≤ j≤m = max |g(ξ j ) − g(y j )|; some y j ∈ [x j , x j+1 ]
= max
1≤ j≤m
by continuity of g, and this is equal to |g(ξk ) − g(yk )| for some k = 1, . . . , m. As m → ∞, this last expression converges to
zero by the uniform continuity of g in [α, β] and the fact that max x j+1 − x j → 0 implies |ξk − yk | → 0. Thus, as 1≤ j≤m
n → ∞,
(α,β]
g dFn →
(α,β]
g dF.
Let Fn ⇒ F. Then, for every g :
Theorem 7 (Helly–Bray Extended Lemma).
n→∞
→ continuous and such that g(±∞) = 0 (in the sense that lim g(x) = 0), x→±∞ we have g dFn → g dF. n→∞ Proof. Since g is bounded and continuous on , g dFn , g dF exist (finite) and are taken as follows: g dFn = lim g dFn , g dF = lim g dF.
α→−∞ (α,β] β→∞
α→−∞ (α,β] β→∞
Next, for any −∞ < α < β < ∞, we have
g dFn −
≤
g dF g dF − g dF n
(α,β] (α,β]
+
g dF − g dF
(α,β]
+ g dFn − g dFn
.
(α,β]
(8.3)
147
148
CHAPTER 8 Distribution Functions and Their Basic Properties
In the sequel, α and β will be continuity points of F, and α and β will be taken as small as needed and as large as needed, respectively. Then Fn (α) → F(α), Fn (β) → n→∞
n→∞
F(β), and hence, by Theorem 6,
ε
g dFn − g dF
< for all sufficiently large n.
3 (α,β] (α,β] Also,
ε g dF
< . 3 (α,β]
g dFn
=
g dFn
≤
g dF −
Finally,
g dFn −
(α,β]
−(α,β]
(8.4)
(8.5)
sup |g(x)| <
−(α,β]
ε , 3
(8.6)
by also taking into consideration that g(x) → 0 as x → ±∞. Then from (8.3)–(8.6), we get
g dFn − g dF
< ε for n sufficiently large.
c
Theorem 8 (Helly–Bray Theorem). Let Fn → F. Then for every g : → n→∞ bounded and continuous, we have g dFn → g dF. n→∞ Proof. Since g is bounded and continuous on , g dFn , g dF exist (finite). Next,
g dFn −
g dF ≤ g dFn − g dF
(α,β] (α,β]
+ g dF − (8.7) g dF + g dFn − g dFn
.
(α,β]
(α,β]
In the sequel, α and β will be continuity points of F; also, α and β will be as small as needed and as large as needed, respectively. With this in mind, we have
ε
< (8.8) g dF − g dF n
3 (by Theorem 6), (α,β] (α,β] and, by the definition of
g dF,
g dF −
ε g dF
< . 3 (α,β]
(8.9)
8.3 Helly–Bray Type Theorems for Distribution Functions
Next,
g dFn −
g dFn = g dFn + g dFn
(α,β] (−∞,α] (β,∞) |g| dFn + |g| dFn ≤ M Fn (α) − Fn (−∞) ≤ (−∞,α] (β,∞) + Fn (∞) − Fn (β) = M Fn (∞) − Fn (−∞) − Fn (β) − Fn (α) = M V ar Fn − Fn (β) − Fn (α) , c
where |g(x)| ≤ M. Let n → ∞. From Fn → F it follows that V ar Fn → V ar F. Also, Fn (α) → F(α), Fn (β) → F(β), so that Fn (β) − Fn (α) → F(β) − F(α). Thus, for all sufficiently large n, V ar Fn − Fn (β) − Fn (α) ≤ |V ar Fn − V ar F|
+ V ar F − F(β) − F(α) + Fn (β) − Fn (α) ε ε ε ε + + = , − F(β) − F(α) < 9M 9M 9M 3M and therefore
ε
g dFn − g dF
< . (8.10)
3 (α,β] (α,β] By (8.8)–(8.10), (8.7) becomes
g dFn − g dF
< ε for all sufficiently large n.
Remark 6. From the proof of Theorems 6–8, it is easily seen that they are also true if in the d.f.s involved property (1) in Definition 1 is replaced by the boundedness property 0 ≤ F(x) ≤ B, x ∈ , some B > 0. Remark 7. In closing, it should be mentioned that the results discussed in this chapter, pertaining to weak convergence of d.f.s, are very special cases of the subject matter of weak convergence of probability measures. For an exposition of such material, see, e.g., Billingsley (1999). Exercises. 1. Let F : → [0, 1] be nondecreasing, right continuous, F(−∞) = 0 and F(∞) = 1 (i.e., a d.f. of a r.v.), and let F −1 be defined by F −1 (y) = inf{x ∈ ; F(x) ≥ y}, y ∈ [0, 1]. Next, consider the probability space (, A, P) = ([0, 1], B[0,1] , P), where P = λ is the Lebesgue measure, and on , define the function X by X (ω) = F −1 (ω). Then show that X is a r.v. and that its d.f. is F. Hint: Show that F −1 (y) ≤ t if and only if y ≤ F(t), t ∈ .
149
150
CHAPTER 8 Distribution Functions and Their Basic Properties
P
2. (i) If X n → X , then show that FX n ⇒ FX . n→∞
(ii)
3. 4. 5. 6.
n→∞
By an example, show that the converse statement in part (i) need not be true.
Hint: For part (i), show that, for every ε > 0 and any x ∈ , FX n (x) ≤ P(|X n − X | ≥ ε) + FX (x + ε), and FX (x − ε) ≤ P(|X n − X | ≥ ε) + FX n (x). Next, let x ∈ C(FX ) and take the limits first as n → ∞ and then as ε → 0 to obtain the desired result. Let Fn , n ≥ 1, be d.f.s of r.v.s and suppose that Fn ⇒ F where F is a d.f. with n→∞ V ar F = 1. Then show that F is the d.f. of a r.v. Show that the Weak Compactness Theorem (Theorem 5) still holds, if condition 1 in Definition 1 is replaced by: (0 ≤)F(x) ≤ B, x ∈ , n ≥ 1, for some B < ∞. Show that Theorems 6–8 hold true if, for all x ∈ , 0 ≤ F(x) ≤ B and 0 ≤ Fn (x) ≤ B, n = 1, 2, . . ., for some B > 0, as indicated in Remark 6. For n = 1, 2, . . ., consider the r.v.s X n , X and Y such that |X n | ≤ Y , EY r < ∞ d
(for some r > 0), and X n −→ X . Then show that E|X n |r −→ E|X |r . n→∞
n→∞
7. Let X be a r.v. with d.f. F; i.e., F(x) = P(X ≤ x), x ∈ . Then show that (i) P(X < x) = F(x−) (the left-hand side limit). (ii) F(x) is continuous at x if and only if P(X = x) = 0. 8. In the following expression, determine the constants α and β, so that the function F defined is the d.f. of a r.v. F(x) = 0, x ≤ 0, and F(x) = α + βe−x
2 /2
, x > 0. c
9. For n = 1, 2, . . ., let Fn and F be d.f.s such that Fn −→ F, and let F be n→∞
continuous. Then show that Fn (x) −→ F(x) uniformly in x ∈ . n→∞ Hint: For ε > 0, choose a and b sufficiently small and sufficiently large, respectively, so that F(a) < 3ε , F(∞) − F(b) < 3ε . Next, partition [a, b] by a = x0 < x1 < · · · < xk−1 < xk = b, so that F(x j ) − F(x j−1 ) < 3ε , j = 1, . . . , k. Finally, by taking x ∈ [a, b], or x < a, or x > b, show that |Fn (x) − F(x)| < ε for all x, provided n ≥ N some integer independent of x ∈ . 10. Let X be a r.v. with continuous d.f. FX , and define the r.v. Y as follows: Y = X if X ≤ C, and Y = C if X > C, where C is a constant. Then (i) Determine the d.f. of the r.v. Y , FY . (ii) Show that FY = F1 + F2 , where F1 is a continuous d.f. and F2 is a step function. 11. By a simple example demonstrate that it is possible that Fn , n ≥ 1, are d.f.s of r.v.s with Fn ⇒ F, a d.f., but not that of a r.v. n→∞
12. For n ≥ 1, let X n and X be r.v.s defined on the probability space (, A, P), and d
suppose that, as n → ∞, E|X n | → E|X | < ∞ and X n → X . Then show that |X n |, n ≥ 1, are uniformly integrable.
8.3 Helly–Bray Type Theorems for Distribution Functions
13. For n ≥ 1, let Fn and F be d.f.s of r.v.s, let f n be real-valued measurable functions defined on , let f : → be continuous, and let g : → [0, ∞) be continuous. Assume that: (i) Fn → F as n → ∞. (ii) | f n (x)| ≤ g(x), x ∈ , n ≥ 1. (iii) f n (x) −→ f (x) uniformly on finite intervals. n→∞ (iv) g dFn −→ g dF and g dF < ∞. n→∞ Then show that f n dFn −→ f dF and f dF is finite. n→∞ Hint: Use the Helly–Bray Lemma (Theorem 6 in this chapter).
151
CHAPTER
Conditional Expectation and Conditional Probability, and Related Properties and Results
9
This chapter is primarily about conditional expectations and conditional probabilities defined in terms of σ -fields. The definitions are based on Theorems 2 and 3 of Chapter 7. In Section 9.1, the concept of the conditional expectation, given a σ -field, is defined, and then the conditional probability is derived as a special case. In the following section, some basic properties of conditional expectations and of conditional probabilities are discussed. In Section 9.3, versions for conditional expectations of standard convergence theorems proved in Chapter 5 are established. The same is done for the moment inequalities discussed in Chapter 6. Some additional properties of conditional expectations and of conditional probabilities are presented in the fourth section of the chapter. The section is concluded with an application, illustrating that the elementary definition of a conditional probability density function coincides with the more general definition used here. In the course of derivations in Section 9.4, the concept of independence of r.v.s is needed, as well as a result regarding the expectations of the product of two independent r.v.s (see Lemma 1). An elaboration on independence and the proof of Lemma 1 are found in Chapter 10.
9.1 Definition of Conditional Expectation and Conditional Probability Consider the probability space (, A, P) and let X be a r.v. defined on this space with E |X | < ∞. Let B be a σ -field with B ⊆ A and denote by PB the restriction of P to B. On A, define the set functions ϕ + and ϕ − as follows: + + − ϕ (A) = X dP, ϕ (A) = X − dP, A ∈ A. A +
A
−
Denoting by ϕB and ϕB the restrictions of ϕ + , ϕ − to B, we clearly have + − (B) = X + dP, ϕB (B) = X − dP, B ∈ B. ϕB B
B
An Introduction to Measure-Theoretic Probability, Second Edition. http://dx.doi.org/10.1016/B978-0-12-800042-7.00009-8 Copyright © 2014 Elsevier Inc. All rights reserved.
153
154
CHAPTER 9 Conditional Expectation and Conditional Probability
+ − Then, clearly, ϕB , ϕB are PB -continuous (finite) measures (see also Exercise 1) and therefore, by Theorem 3 in Chapter 7, there exists a.s. [PB ] well defined, ≥ 0, B-measurable (a.s. [PB ] finite) r.v.s Y + , Y − such that + − ϕB (B) = Y + dPB , ϕB (B) = Y − dPB , B ∈ B. B
B
Then, by setting Y = Y + − Y − , we have that Y is a.s. [PB ] well defined, B-measurable, a.s. [PB ] finite, and Y dPB = X dP, B ∈ B. B
B
(However, see also Proposition 1 in Chapter 7.) Definition 1. Given an integrable r.v. X defined on the probability space (, A, P), and a σ -field B ⊆ A, the conditional expectation of X , given B, is any B-measurable r.v. Y , denoted by E B X (or E(X | B)) such that B E X dPB = X dP, B ∈ B. (9.1) B
B
EB I
By taking X = I A , A ∈ A, then A is called conditional probability of A, given B, is denoted by P B A, and we have B E I A dPB = P B A dPB B B = I A dP = P(A ∩ B), B ∈ B. (9.2) B
Special Cases: For B ∈ A with P(B) > 0 and a r.v. X with E |X | < ∞, define the r.v. E B X as follows: 1 (E B X )(ω) = X dP, ω ∈ . P(B) B Now let {B j , j = 1, 2, . . .} be a (measurable) partition of with P(B j ) > 0, all j, and let B be the σ -field generated by this partition. Define the function Y on as follows: ∞ 1 Y = (E B j X )I B j , E B j X = X dP, j ≥ 1. P(B j ) B j j=1 Then, clearly, Y is B-measurable and for any B ∈ B, one has B = i∈I Bi for some I ⊆ {1, 2, . . .}. Therefore, by Exercise 1, ⎡ ⎤ ⎣ (E B j X )I B j ⎦dPB Y dPB = Y dPB = Y dPB = i∈I Bi
B
=
i∈I
i∈I
Bi
i∈I
Bi
j
P(Bi ) (E Bi X )PB (Bi ) = (E Bi X )P(Bi ) = X dP P(Bi ) Bi i∈I
i∈I
9.1 Definition of Conditional Expectation and Conditional Probability
=
X dP = Bi
i∈I
i∈I Bi
i.e., B
Y dPB =
B
X dP =
X dP = B
E B X dPB ,
B
E B X dPB ;
B ∈ B,
and therefore Y = E B X a.s. (by Proposition 1 in Chapter 7). That is, in the case that B is generated by a countable (measurable) partition {B j , j ≥ 1} with P(B j ) > 0 1 for all j, then for any integrable r.v. X , one has that (E B X )(ω) = P(B X dP, for j) Bj a.a. (almost all) ω ∈ B j , j ≥ 1. In particular, if X = I A , A ∈ A, then
B P(A ∩ B j ) P A (ω) = , a.a. ω ∈ B j , P(B j )
j ≥ 1.
(9.3)
If the partition of consists of two sets B, B c ∈ A with 0 < P(B) < 1, then the σ -field generated by {B, B c } is B = {, B, B c , } and P(A∩B) a.a. ω ∈ B
B P(B) P A (ω) = P(A∩B c) a.a. ω ∈ B c . P(B c ) Thus for a.a. ω ∈ B, (P B A)(ω) = P(A | B), and for a.a. ω ∈ B c , (P B A)(ω) = P(A | B c ). Now suppose that the σ -field B ⊆ A is generated, as before, by a countable partition {B j , j ≥ 1} with P(B j ) > 0 for all j, and that for A ∈ A, the conditional probability of A, given B, is defined by (9.3). In this context, it will be convenient to write PωB A rather than (P B A)(ω). Thus, we have PωB A =
P(A ∩ B j ) , a.a. ω ∈ B j , P(B j )
j ≥ 1.
(9.4)
From (9.4), it is obvious that for every fixed ω ∈ (lying outside a null set), PωB as a function on A is a probability measure. Then, if X is a r.v., we may talk about the integral of X with respect to PωB , provided, of course, this integral exists. To this end, let X be a P-integrable r.v. Then for an arbitrary but fixed ω ∈ (lying outside a null set), we have that ω ∈ B j for exactly one j. Therefore, by Exercise 2, 1 B X dP = E B j X , ω ∈ B j , j ≥ 1, X (ω )dPω = P(B j ) B j and hence = (E B j X )I B j = EωB X , ω ∈ (lying outside a null set). (9.5) X (ω )dPB ω j
This result then shows that if B is generated as before, one may define conditional probability first and then get the conditional expectation as the integral of a P-integrable r.v. X with respect to the conditional probability.
155
156
CHAPTER 9 Conditional Expectation and Conditional Probability
Remark 1. The result just stated need not be true if B is not of the form just mentioned. (See, e.g., Loève (1963), pages 353–354.) This section is concluded with two examples illustrating some concepts discussed here. Remark 2. Here is a possible explanation of the meaning/usefulness of the conditional probability and conditional expectation in their abstract setting. From the definition of the conditional probability, we have, for A ∈ A, P B A dPB = I A dP = P(A ∩ B), B ∈ B, B
so that
B
1 P(B)
B
P(A ∩ B) = P(A|B) P(B)
P B A dPB =
Now, on the right-hand side in the last relation above, we have the conditional probability of A, given B, as is given by the familiar elementary definition. The left-hand side in the sane relation tells us that the actual calculation of P(A|B) is obtained through the integration of the (B-measurable) r.v. P B A over B, and norming it by the P(B). Next, from the definition of the conditional expectation of a r.v. X (E X ∈ ), we have, E B X dPB = X dP, B ∈ B, B
so that
1 P(B)
B
E B X dPB =
B
1 P(B)
X dP = B
X dP(·|B),
where P(·|B) is the elementary definition of the conditional probability, given B (see also Exercise 2). So, E B X has the property that it is the (B-measurable) r.v. whose weighted (by P(B)) integral over B gives the expectation of X with respect to the conditional probability measure P(·|B). Example 1. Let X 1 , . . . , X n be i.i.d. r.v.s defined on the probability space (, A, Pθ ), where the parameter θ takes values in the parameter space , an open subset of k , k ≥ 1, and set X = (X 1 , . . . , X n ). Also, let T = (T1 , . . . , Tm ) be an m-dimensional statistic of X; i.e., a measurable function of X not involving any unknown quantities. We say that T is sufficient for θ , if the conditional distribution of X, given the σ -field induced by T, BT , does not depend on θ . That is, if AX is the σ -field induced by X (so that AX ⊆ A and BT ⊆ AX since T = T(X)), then EθBT I A = PθBT A is independent of θ for all A ∈ AX . As an application of this concept,let X 1 , . . . , X n be i.i.d. r.v.s distributed as B(1, θ ), θ ∈ = (0, 1), and set T = nj=1 X j . Then T is sufficient for θ . To this end, set X = (X 1 , . . . , X n ) and observe that X takes on 2n values in n , call them xi = (xi1 , . . . , xin ), i = 1, . . . , 2n . Thus, if Ai = X−1 ({xi }), then AX is generated by {Ai , i = 1, . . . , 2n }. Next, T ∼ B(n, θ ) andBT is generated m by {Bt , t = 0, 1, . . . , n} where Bt = T −1 ({t}) = X−1 (Ct ) = j=1 Ai j , where Ct = {xi j = (xi j 1 , . . . , xi j n ) ∈ n , each coordinate being either 0 or 1 and their
9.1 Definition of Conditional Expectation and Conditional Probability
sum being equal to t, j = 1, . . . , m = nt }, since there are nt ways of choosing t positions out of n to place t 1s and 0s in the remaining n − t positions; if t = 0, then
n = 1, and this is the number of ways of placing 0s in all n places (and 1s nowhere). 0 With A ∈ AX , apply the special case following Definition 1 to get n Pθ (A ∩ Bt ) , t = 0, 1, . . . , n. (E Bt,θ I A )I Bt , E Bt,θ I A = Pθ (Bt ) t=0 p Clearly, for each A ∈ AX , A = k=1 A jk , for some p, so that ⎞
p ⎛ m q ⎠ ⎝ A ∩ Bt = A jk ∩ Ai j = Akl , for some q,
PθBT A = EθBT I A =
k=1
j=1
l=1
and Pθ (Akl ) = Pθ (X 1 = xkl 1 , . . . , X n = xkl n ) = Pθ (X 1 = xkl 1 ) . . . Pθ (X 1 = xkl n ) = θ xkl 1 (1 − θ )1−xkl 1 . . . θ xkl n (1 − θ )1−xkl n = θ t (1 − θ )n−t .
It follows that Pθ (A ∩ Bt ) = qθ t (1 − θ )n−t . Since Pθ (T = t) = nt θ t (1 − θ )n−t , it follows that qθ t (1 − θ )n−t Pθ (A ∩ Bt ) q = n t = n E Bt,θ I A = n−t Pθ (Bt ) θ (1 − θ ) t t independent of θ ; that is, E Bt,θ I A is independent of θ , and then so is PθBT I A . It follows that T is sufficient for θ . Example 2. Refer to the previous example, and suppose that the independent r.v.s θ ; i.e., X j ∼ P(θ ), θ ∈ = X 1 , . . . , X n are distributed as Poisson with parameter (0, ∞), j = 1, . . . , n. Then the statistic T = nj=1 X j is sufficient for θ . As in Example 1, X = (X 1 , . . . , X n ) takes values xi = (xi1 , . . . , xin ) in n where each xi j = 0, 1, . . . , j = 1, . . . , n. Thus, if Ai = X−1 ({xi }), then AX is and BT is generated by {B generated by {Ai , i = 1, 2, . . .}. Next, T ∼ P(nθ ) t, t = m A , where C = xi j = 0, 1, . . .} where Bt = T −1 ({t}) = X −1 (Ct ) = i t j j=1 (xi j 1 , . . . , xi j n ) ∈ n ; each coordinate being ≥ 0 integer and their sum being equal
to t, j = 1, . . . , m = n+t−1 . That m = n+t−1 follows from the fact that there are t t
n+t−1 ways of selecting t positions out of n to place xi j 1 , . . . , xi j n ≥ 0 integers with t xi j 1 + · · · + xi j n = t (see also Exercise 19). For A ∈ AX , we have, as in Example 1, PθBT A = EθBT I A =
∞
(E Bt ,θ I A )I Bt , E Bt ,θ I A =
t=0
Furthermore, again as in Example 1, A ∩ Bt =
q l=1
Pθ (A ∩ Bt ) , t = 0, 1, . . . . Pθ (Bt ) Akl for some q, and
Pθ (Akl ) = Pθ (X 1 = xkl 1 , . . . , X n = xkl n )
157
158
CHAPTER 9 Conditional Expectation and Conditional Probability
θ xkl 1 θ xkl n . . . e−θ x kl 1 ! x kl n ! t θ = e−nθ , x kl 1 ! . . . x kl n !
= e−θ
so that Pθ (A ∩ Bt ) = qe−nθ θ t /xkl 1 ! . . . xkl n !. Since Pθ (T = t) = e−nθ (nθ )t /t!, it follows that Pθ (A ∩ Bt ) qe−nθ θ t /xkl 1 ! . . . xkl n ! = Pθ (Bt ) e−nθ (nθ )t /t! = qt!/n t xkl 1 ! . . . xkl n !
E Bt,θ I Ai =
independent of θ . Then so is PθBT I A , and T is sufficient for θ .
9.2 Some Basic Theorems About Conditional Expectations and Conditional Probabilities In all that follows the σ -field B ⊆ A. Also, Proposition 1 in Chapter 7 will be used throughout without explicit reference to it. Theorem 1.
Let E |X | < ∞. Then
(i) E(E B X ) = E X . (ii) E A X = X a.s. (iii) If X is B-measurable then E B X = X a.s.
Proof. (i) From the definition of E B X , one has E B X dPB = X dP, B
B ∈ B.
B
Then for B = , we get B B E X dPB = E X dP = X dP or E(E B X ) = E X . (ii) Since both X and E A X are A-measurable and E A X dPA = X dPA , A
we have that E A X = X a.s.
A
A ∈ A,
9.2 Some Basic Theorems About Conditional Expectations
(iii) As in (ii), both E B X and X are B-measurable and B E X dPB = X dPB , B ∈ B, B
so that
EB X
B
= X a.s.
Theorem 2. (i) If X = c a.s., then E B X = c a.s. (ii) If X ≥ Y a.s. and E |X | , E |Y | < ∞, then E B X ≥ E B Y a.s. (iii) If for j = 1, . . . , n, X j are integrable and c j are constants, one has ⎛ ⎞ n n cj X j⎠ = c j E B X j a.s. EB ⎝ j=1
j=1
(iv) P B = 1 a.s., P B () = 0 a.s., P B A ≥ 0 a.s., E B ( n B j=1 c j P A j a.s.
n
j=1 c j I A j )
=
Proof. (i) For every B ∈ B, we have B E X dPB = X dP = c dP = c dPB (since X = c a.s.). B
B
That is,
B
B
B
E X dPB =
B
B
c dPB for every B ∈ B, and both E B X
and c are B-measurable. Hence E B X = c a.s. (ii) We have B E X dPB = X dP B B ≥ Y dP (since X ≥ Y a.s.) B
EBY
and this equals B dPB , B ∈ B. Since both E B X and E B Y are B-measurable, we obtain that E B X ≥ E B Y a.s. (see also Exercise 4). (iii) We have EB c j X j dPB = c j X j dP = cj X j dP B
=
j
B
j
cj
B
E B X j dPB =
j
B
j
j
c j E B X j dPB ,
B
B ∈ B.
159
160
CHAPTER 9 Conditional Expectation and Conditional Probability
B Since both E B ( j c j X j ) and j c j E X j are B-measurable, it follows that E B ( j c j X j ) = j c j E B X j a.s. (iv) We have B
P B d PB =
B
I dP = P(B) = PB (B) =
B
1 d PB ,
B ∈ B,
while both P B and 1 are B-measurable. Hence P B = 1 a.s.; B P dPB = I dP = 0 = 0 dPB , B ∈ B, B
B
B
and both P B and 0 are B-measurable. Thus P B = 0 a.s.; P B A dPB = I A dP = P(A ∩ B) ≥ 0 = 0 dPB , B
B
B ∈ B,
B
and both P B A and 0 are B-measurable. Hence P B A ≥ 0 a.s. (by Exercise 4). Finally, EB(
c j IA j ) =
j
c j E B I A j a.s. (by (iii)) and this equals
j
B
cj P Aj;
j
i.e., E B (
j
c j IA j ) =
j
c j P B A j a.s.
9.3 Convergence Theorems and Inequalities for Conditional Expectations Theorem 3 (Lebesgue Monotone Convergence Theorem for Conditional Expectaa.s.
tions). If, as n → ∞, 0 ≤ X n ↑ X a.s. and E X < ∞, then 0 ≤ E B X n ↑ E B X a.s. ∞ B In particular, P B ( ∞ j=1 A j ) = j=1 P A j a.s. Proof.
With all limits taken as n → ∞, we have 0 ≤ X n ≤ X n+1 , so that 0 ≤ E B X n ≤ E B X n+1 a.s.
Therefore E B X n ↑ Y (some Y ) a.s. and Y is B-measurable since it is the a.s. limit of B-measurable r.v.s. (See also Exercise 19 in Chapter 3.) Then E B X n dPB ↑ Y dPB , B ∈ B, B
B
9.3 Convergence Theorems and Inequalities for Conditional Expectations
by the classical Monotone Convergence Theorem. But B E X n dPB = X n dP and X n dP ↑ X dP = E B X dPB . B
B
B
Thus
B
Y dPB =
B
B
E B X dPB ,
B
B ∈ B,
a.s.
so that Y = E B X a.s. Therefore 0 ≤ E B X n ↑ E B X a.s. For the second part, we have ⎛ ⎞ n n n a.s. B A j ) = E B Inj=1 A j = E B ⎝ IA j ⎠ = E IA j PB( j=1
j=1
↑
a.s.
∞
EB IA j =
∞
j=1
j=1
PB A j .
j=1
On the other hand, n
Aj ↑
j=1
∞
A j implies
In
j=1
Aj
=
n
j=1
IA j ↑
j=1
, and E B Inj=1 A j ↑ E B I∞ I∞ j=1 A j j=1 A j
∞ j=1
IA j = ⎛
a.s., or P B ⎝
n
⎞ Aj⎠↑
j=1
⎛ ⎞ ∞ PB ⎝ A j ⎠ a.s. j=1
Comparing this with the last result obtained, we get ⎛ ⎞ ∞ ∞ Aj⎠ = P B (A j ) a.s. PB ⎝ j=1
Theorem 4.
Suppose that E |Y | , E |X Y | < ∞ and let X be B-measurable. Then E B (X Y ) = X E B Y a.s.
Proof.
j=1
E B (X Y )
E B (I
First let X = I A , A ∈ B. Then = A Y ) and hence E B (I A Y )dPB = I A Y dP = Y dP = E B Y dPB B B A∩B A∩B = (I A E B Y )dPB , B ∈ B, B
E B (I
EBY
a.s., since both E B (I A Y ) and I A E B Y are B-measurable, so that AY ) = IA B B or E (X Y ) = X E Y a.s.
161
162
CHAPTER 9 Conditional Expectation and Conditional Probability
Next, let X=
n
α j IA j ,
A j ∈ B,
j = 1, . . . , n.
j=1
Then E B (X Y ) = E B [(
α j I A j )Y ] = E B (
j a.s.
=
a.s.
α j Y IA j ) =
j
α j E B (Y I A j )
j
(by Theorem 2(iii)) α j I A j E B Y (by the previous step)
j
= X E B Y ; i.e., E B (X Y ) = X E B Y a.s. Next, suppose that X , Y ≥ 0 and let 0 ≤ X n simple r.v.s ↑ X as n → ∞. Then, as n → ∞, 0 ≤ X n Y ↑ X Y implies E B (X n Y ) ↑ E B (X Y ) a.s. But
E B (X n Y ) = X n E B Y a.s. and X n E B Y ↑ X E B Y a.s.
Thus E B (X Y ) = X E B Y a.s. Finally, for any r.v.s X and Y , X Y = (X + − X − )(Y + − Y − ) = X + Y + − X + Y − − X − Y + + X − Y − , and the previous results complete the proof of the theorem. Theorem 5 (Conditional Inequalities).
For r , s > 0, suppose that
E |X | , E |Y | , E |Y |s < ∞. r
r
Then (i) E B |X Y | ≤ (E B |X |r )1/r (E B |Y |s )1/s a.s., provided r1 + 1s = 1. (ii) (E B |X + Y |r )1/r ≤ (E B |X |r )1/r + (E B |Y |r )1/r a.s., provided r ≥ 1. (iii) E B |X + Y |r ≤ cr (E B |X |r + E B |Y |r ) a.s., where cr = 1 if (0 <)r ≤ 1, and c r = 2r −1 if r > 1. r (iv) E B X ≤ E B |X | a.s. and, more generally, E B X ≤ E B |X |r a.s., r ≥ 1. Proof. (i) Since E B |X |r and E B |Y |s are a.s. finite, the inequality is trivially true on a possibly exceptional null set; so we may assume that they are finite. More precisely, let A = {ω ∈ ; E B |X |r = 0}, B = {ω ∈ ; E B |Y |s = 0}. First, focus on the set A. If P(A) = 0, then there is no issue regarding ω ∈ A. If P(A) > 0, then since A ∈ B, we have E B |X |r dPB = |X |r dP, 0= A
A
which implies that |X |r = 0 a.s. on A (by Exercise 5(i)), or X = 0 a.s. on A, and this implies that X Y = 0 a.s. on A, of |X Y | = 0 a.s. on A. Hence E B |X Y | = 0
9.3 Convergence Theorems and Inequalities for Conditional Expectations
a.s. on A (by Exercise 5(ii)), and therefore the inequality is true on A. Likewise, the inequality is true on B, so that it is true on A ∪ B. Next, consider (A ∪ B)c = Ac ∩ B c on which both E B |X |r and E B |Y |s are positive. Then in the inequality |αβ| ≤
|β|s |α|r + , r s
and on Ac ∩ B c , set:
1/r 1/s
, β = Y / E B |Y |s . α = X / E B |X |r
Then, on Ac ∩ B c , |X Y | |X |r |Y |s
1/r
1/s ≤ B r + B s r E |X | sE |Y | E B |X |r E B |Y |s and hence, by taking the conditional expectations, given B, and using B-measurability of E B |X |r and E B |Y |s , we get, a.s. on Ac ∩ B c , E B |X Y | E B |X |r E B |Y |s 1 1
1/r
1/s ≤ B r + B s = + = 1. B r B s r s r E |X | sE |Y | E |X | E |Y | Therefore
E B |X Y | ≤ (E B |X |r )1/r (E B |Y |s )1/s
a.s. on Ac ∩ B c . (ii) For r = 1, we have |X + Y | ≤ |X | + |Y |, so that a.s.
E B |X + Y | ≤ E B (|X | + |Y |) = E B |X | + E B |Y | . a.s.
For r > 1, |X + Y |r = |X + Y | |X + Y |r −1 ≤ (|X | + |Y |) |X + Y |r −1 = |X | |X + Y |r −1 + |Y | |X + Y |r −1 . Hence, with s > 1 such that r1 + 1s = 1, we have a.s. E B |X + Y |r ≤ E B |X | |X + Y |r −1 + E B |Y | |X + Y |r −1 1/r 1/s a.s. B ≤ E |X |r × E B |X + Y |(r −1)s 1/r 1/s + E B |Y |r × E B |X + Y |(r −1)s 1/s 1/r B (r −1)s E B |X |r = E |X + Y | 1/r B r + E |Y | 1/s 1/r 1/r B r B r B r E |X | , = E |X + Y | + E |Y |
163
164
CHAPTER 9 Conditional Expectation and Conditional Probability
since (r − 1)s = r as follows from
1 r
+
1 s
= 1. That is,
E B |X + Y |r ≤ (E B |X + Y |r )1/s [(E B |X |r )1/r + (E B |Y |r )1/r ] a.s. Now, if (E B |X + Y |r )1/s = 0 on a set A, then E B |X + Y |r = 0 on A, whereas on Ac , (E B |X + Y |r )1/s > 0. Then on A, the desired inequality is true and so is on Ac by dividing through by (E B |X + Y |r )1/s . Thus we get the result. (iii) We have seen that |X + Y |r ≤ cr (|X |r + |Y |r ). Hence by Theorem 2 (ii), (iii), we get E B |X + Y |r ≤ cr (E B |X |r + E B |Y |r ) a.s. (iv) Let X = X + − X − . Since X + , X − ≥ 0, we get E B X + , E B X − ≥ 0 a.s. and a.s. a.s. a.s. E B X = E B (X + − X − ) = E B X + −E B X − , so that E B X ≤ E B X + +E B X − = E B (X + + X − ) = E B |X |. (For the case that r > 1, see Exercise 6.) a.s.
Theorem 6 (Convergence in the rth Mean for Conditional Expectations). For n ≥ 1, (r )
let X n , X be r.v.s such that E |X n | , E |X | < ∞. Then if X n → X , r ≥ 1, it follows n→∞
(r )
that E B X n → E B X .
n→∞
Proof.
We have r r E E B X n − E B X = E E B (X n − X ) ≤ E(E B |X n − X |r )
r since for r ≥ 1, E B Z ≤ E B |Z |r a.s., by Theorem 5 (iv), and the last expression equals E |X n − X |r → 0. n→∞
Theorem 7 (Fatou–Lebesgue Theorem, Dominated Convergence Theorem for Conditional Expectations). For n ≥ 1, let X n , X , Y , Z be integrable r.v.s. Then, with n → ∞ as appropriate: (i) Y ≤ X n a.s., n ≥ 1, implies E B (lim inf X n ) ≤ lim inf E B X n a.s., provided lim inf X n is integrable. (ii) X n ≤ Z a.s., n ≥ 1, implies lim sup E B X n ≤ E B lim sup X n a.s., provided lim sup X n is integrable. In particular, (i ) Y ≤ X n ↑ X a.s. implies E B X n ↑ E B X a.s. a.s.
a.s.
(ii ) Y ≤ X n ≤ Z a.s., n ≥ 1, and X n → X , imply E B X n → E B X . a.s.
a.s. (iii ) |X n | ≤ U a.s., n ≥ 1, U integrable and X n → X imply E B X n → E B X . (See also Exercise 11.)
Proof.
In all that follows, n → ∞ as appropriate.
(i) We have X n − Y ≥ 0 a.s., n ≥ 1, and a.s.
0 ≤ inf (X j − Y ) ↑ lim inf (X j − Y ) = lim inf(X n − Y ) = lim inf X n − Y . j≥n
j≥n
9.3 Convergence Theorems and Inequalities for Conditional Expectations
Hence, by Theorem 3,
B inf (X j − Y ) ↑ E B lim inf(X n − Y ) = E B lim inf X n E j≥n
−E B Y a.s. Next, inf j≥n (X j − Y )
≤
X n − Y , so that
EB
a.s.
inf (X j − Y ) ≤ j≥n
E B (X n − Y ) = E B X n − E B Y . Hence lim E B inf (X j − Y ) ≤ lim inf E B X n − E B Y a.s. or a.s.
j≥n
E B lim inf X n − E B Y ≤ lim inf E B X n − E B Y a.s. or
E B lim inf X n ≤ lim inf E B X n , since E |Y | < ∞, so that E B Y < ∞ a.s. (See also Exercise 7.) (ii) It is proved either similarly, or by utilizing the relations according to which X n ≤ Z a.s. is equivalent to −Z ≤ −X n a.s. and this implies that E B [lim inf(−X n )] ≤ lim inf E B (−X n ) a.s. But E B [lim inf(−X n )] = a.s. a.s. E B (− lim sup X n ) = −E B (lim sup X n ) and lim inf E B (−X n ) = lim inf(−E B X n ) = − lim sup E B X n . Thus, −E B (lim sup X n ) ≤ − lim sup E B X n a.s., or lim sup E B X n ≤ E B (lim sup X n ) a.s. (i ) We have Y ≤ X n ↑ X a.s. or 0 ≤ X n − Y ↑ X − Y a.s., so that E B (X n − Y ) ↑ E B (X − Y ) a.s. (by Theorem 3) or E B X n − E B Y ↑ E B X − E B Y a.s. and hence E B X n ↑ E B X , since E B Y < ∞ a.s.
(ii ) By (i) and (ii) we have
a.s. E B lim inf X n ≤ lim inf E B X n ≤ lim sup E B X n a.s. B
≤ E lim sup X n . But lim inf X n = lim sup X n = X a.s., a.s. a.s. so that E B X ≤ lim inf E B X n ≤ lim sup E B X n ≤ E B X . Thus,
EB Xn → EB X . a.s.
(iii ) Apply (ii ) with Y = −U and Z = U .
165
166
CHAPTER 9 Conditional Expectation and Conditional Probability
Theorem 8 (Jensen Inequality for Conditional Expectations). Let X be a r.v. taking values in I , an open interval in , with E X ∈ I , and let g : I → be convex nd such that E |g(X )| < ∞. Then g(E B X ) ≤ E B g(X ) a.s. In particular, if g is also nondecreasing, then Y ≤ E B X a.s. implies g(Y ) ≤ E B g(X ) a.s.
Proof. In the proof, treat I as if it were the entire for convenient writing. It is well known that g is continuous and either monotone in or there exists x0 ∈ such that g is nonincreasing in (−∞, x0 ] and nondecreasing in [x0 , ∞). (See Exercise 2 in Chapter 6, and the book by Hardy et al., (1967).) Consider this latter case first. Clearly, without loss of generality, we may assume that x0 = 0 and that g(x0 ) = 0. Thus g(x) ≥ 0 for all x ∈ . Next, let Z be a simple r.v.; i.e., Z = kj=1 z j I A j , where {A j , j = 1, . . . , k} is a (measurable) partition of . Then, clearly, ⎞ ⎛ z j IA j ⎠ = g(z j )I A j , g(Z ) = g ⎝ j
so that
⎡
E B g(Z ) = E B ⎣
j
⎤
j a.s.
=
a.s.
g(z j )I A j ⎦ =
⎛ a.s.
g(z j )E B I A j ≥ g ⎝
j
E B g(z j )I A j
j
⎞ z j E B I A j ⎠ (see Exercise 8)
j
(by the convexity of g and the fact that j E B I A j = j P B A j = 1 a.s.) ⎡ ⎛ ⎞⎤ a.s. ⎣ B ⎝ =g E z j I A j ⎠⎦ = g E B Z . j
That is,
E B g(Z ) ≥ g E B Z a.s.
(9.6)
Now for the given r.v. X , let {X n }, n ≥ 1, be a sequence of simple r.v.s such that, as n → ∞ here and in the remainder of the proof, X n → X , |X n | ≤ |X | , (|X n | ≤ n), and (0 ≤)g(X n ) ≤ g(X ). (In Theorem 17, Chapter 1, take X n = 2jn for j = −n2n + 1, . . . , 0, so that X ≤ X n , n and take X n = j−1 2n for j = 1, . . . , n2 , so that X n ≤ X .) We have then, by (9.6),
9.3 Convergence Theorems and Inequalities for Conditional Expectations
with Z replaced by X n , By the Dominated (Theorem 7 (iii)),
E B g(X n ) ≥ g E B X n a.s. Convergence
Theorem
for
(9.7) conditional
expectations
EB Xn → EB X . a.s.
g (x )
[ j −1
) j
2n
2n
0
Hence
[ j −1
) j
2n
2n
x
g(E B X n ) → g(E B X ). a.s.
Also
(9.8) a.s.
0 ≤ g(X n ) ≤ g(X ) and g(X n ) → g(X ). Thus the Fatou–Lebesgue Theorem for conditional expectations again (Theorem 7(ii ) or (iii )) gives a.s. (9.9) E B g(X n ) → E B g(X ). Relations (9.7)–(9.9) give that E B g(X ) ≥ g E B X a.s. Next, let g be monotone. In this case, {X n } may be chosen so that X n → X , |X n | ≤ |X | , (|X n | ≤ n), and |g(X n )| ≤ |g(X )| . Indeed, if g is either increasing or decreasing in , then it will look like either one of the accompanying first four figures. In either case, we may assume (by switching the curve) that it looks like one of the last two pictures. Consider the figure at the left-hand side, as the other figure is treated similarly. For j ≤ 0, take X n = j/2n .
167
168
CHAPTER 9 Conditional Expectation and Conditional Probability
Then X ≤ X n , g(X ) ≤ g(X n ), |X n | ≤ |X |, and |g(X n )| ≤ |g(X )|. For j ≥ 1, take X n = ( j − 1)/2n . Then |X n | = X n ≤ X = |X |, and |g(X n )| = g(X n ) ≤ g(X ) = |g(X )|. So, in all cases, |X n | ≤ |X | and |g(X n )| ≤ |g(X )|. g(x)
g(x)
x
0
x
0
g (x )
g (x )
x
0
x
0
g (x )
g(x)
[− j 2− 1 , 2j ) n
n
0
x
[
j−1, j 2n 2n
)
[
)
j −1 2n
j 2n
0
j −1 2n
j 2n
[
)
x
9.4 Further Properties of Conditional Expectations
As before, the convexity of g implies (9.7), and thus we reach the same conclusion as before. Likewise if g is nonincreasing in . For the second conclusion, one has Y ≤ E B X a.s. implies g(Y ) ≤ g E B X a.s.
Since g E B X ≤ E B g(X ) a.s., we obtain the result.
9.4 Further Properties of Conditional Expectations and Conditional Probabilities Theorem 9.
If B ⊆ B (⊆ A) and E |X | < ∞, then
E B (E B X ) = E B X = E B (E B X ) a.s.
Proof. The first equality follows by the fact that E B X is B -measurable, since it is B-measurable and B ⊆ B . As for the second equality, we have for all B ∈ B, B B E (E X )dPB = E B X dPB B B = X dP = E B X dPB ; B
i.e.,
B
E B (E B X )dPB =
B
B
E B X dPB ,
B ∈ B,
and hence E B (E B X ) = E B X a.s. since they are both B-measurable.
Now, let X , Y be two r.v.s defined on the probability space (, A, P), and set B X = X −1 (Borel σ -field in ), BY = Y −1 (Borel σ -field in ). Then the elementary definition of independence of the r.v.s. X , Y is equivalent to independence of B X and BY ; i.e., P(A ∩ B) = P(A)P(B), for all A ∈ B X and B ∈ BY . A σ -field B ⊆ A and Y are said to be independent if B and BY are independent. A formalization of these concepts and further elaboration on independence may be found in Section 1 of Chapter 10. The following result will be needed in the sequel; its proof is deferred to Chapter 10 (see Section 10.3). Lemma 1. If the r.v.s X and Y are independent and E |X | , E |Y | < ∞, then E |X Y | < ∞ and E(X Y ) = (E X )(EY ). Remark 3.
By induction, the lemma is true for any n independent integrable r.v.s.
169
170
CHAPTER 9 Conditional Expectation and Conditional Probability
Theorem 10. Let X be an integrable r.v. such that X and B are independent. Then E B X = E X a.s. Proof. For any B ∈ B, we have that I B and X are independent since {∅, B, B c , } ⊆ B. Next, B E X dPB = X dP = (X I B )dP = E(X I B ) = (E X )(E I B ) B
B
(by independence of X and I B ) E X dP = E X dPB . = (E X )P(B) = B
That is,
B
E B X dPB =
B
B
E X dPB ,
B ∈ B,
whereas both E B X and E X are B-measurable. Thus E B X = E X a.s.
and set Now let Y be a function defined on (, A, P) into BY = A ⊆ ; Y −1 (A ) = A for some A ∈ A . Then, BY is a σ -field of subsets of (by Theorem 11 in Chapter 1). Let also BY = Y −1 (BY ), so that BY is a σ -field ⊆ A. On BY and BY define the probability measures PY and PY , respectively, as follows: PY (B) = P(B), B ∈ BY ; PY (B ) = P[Y −1 (B )] = PY (B), B ∈ BY , B = Y −1 (B ). Definition 2. If X is an integrable r.v. and Y is as above, the conditional expectation of X , given Y , is denoted by E Y X or E(X | Y ) and is defined to be E BY X . The reason for this notation will be explained hereafter. For this purpose, we need the following result, which is, actually, closely related to Theorem 13 in Chapter 4. Lemma 2. Let (, BY , PY ), ( , BY , PY ), be as before and let g : ( , BY ) → ( , Borel σ -field) be measurable, so that g(Y ) : (, BY ) → ( , Borel σ -field) is measurable (a r.v.). Then g(y)dPY = g(Y ) dPY , B ∈ BY , B = Y −1 (B ), B
B
in the sense that, if one of these integrals exists, then so does the other, and both are equal. Proof. The proof follows the familiar line. First, let g(y) = I A (y), A ∈ BY . Then, clearly, g(Y ) = I A (Y ), A = Y −1 (A ). Therefore g(y)dPY = I A (y)dPY = dPY = PY (A ∩ B ) = P[Y −1 (A ∩ B )] B B A ∩B = P(A ∩ B) = PY (A ∩ B) = dPY = I A (Y ) dPY = g(Y ) dPY . A∩B
B
B
9.4 Further Properties of Conditional Expectations
That is, the theorem is true for indicators. Then it is true for the case that g is a nonnegative simple r.v. Next, it is true for the case that g is a ≥ 0 r.v., by the Lebesgue Monotone Convergence Theorem, and finally for any g, as described earlier, by writing g as g = g + −g − . (For the details, see Exercise 9; also, Theorem 13 in Chapter 4). Theorem 11. Let (, BY , PY ), ( , BY , PY ) be as before and let X be an integrable r.v. Then E Y X (= E BY X ) is a.s. ([P]) a function of Y ; i.e., it depends on ω ∈ only through Y for almost all (a.a.) ω ∈ . More precisely, there exists a (measurable) function g : ( , BY ) → Borel real line, such that (E Y X )(ω) = g(Y (ω)) a.s. [P]. Proof. On BY , define the finite measures ϕ + , ϕ − and the finite set function ϕ as follows: ϕ + (A) = X + dP, ϕ − (A) = X − dP, A A + − ϕ(A) = ϕ (A) − ϕ (A) = X dP. A
Then ϕ + , ϕ − , ϕ
are all PY -continuous finite measures and ϕ is a signed measure. On BY , define the measures ϕ + , ϕ − and the set function ϕ as follows: ϕ + (A ) = ϕ + (A), ϕ − (A ) = ϕ − (A), ϕ (A ) = ϕ(A),
A = Y −1 (A ).
Then ϕ + , ϕ − are finite measures, ϕ is a signed measure, and all three are PY -continuous. Then, by Theorem 3 in Chapter 7, there exist BY -measurable, nonnegative (a.s. [PY ] finite), and a.s. [PY ] well-defined r.v.s g+ , g− such that ϕ + (A ) = g+ dPY , ϕ − (A ) = g− dPY , A ∈ BY . A
A
By setting g = g+ − g− , we have that g is BY -measurable, (a.s. [PY ] finite), and a.s. [PY ] well defined, and such that ϕ (A ) = gdPY , A ∈ BY . A
But
A
g dPY = ϕ (A )[= ϕ + (A ) − ϕ − (A ) = ϕ + (A) − ϕ − (A)] = ϕ(A) = X dP = E Y X dPY . A
Thus,
A
On the other hand,
g dPY = A
A
A
g(y)dPY =
g(y)dPY =
E Y X dPY . A
g(y) dPY , A
171
172
CHAPTER 9 Conditional Expectation and Conditional Probability
by Lemma 2. Thus,
E X dPY =
g(Y ) dPY ,
Y
A
A ∈ BY .
A
Since both E Y X and g(Y ) are BY -measurable, we have that E Y X = g(Y ) a.s., as was to be seen. Remark 4. In the last theorem, we have shown that there exists a measurable function g on into , (a.s. [PY ] finite), and a.s. [PY ] well defined such that E Y X = E(X | Y ) = g(Y ) a.s. Now, let y be in the range of Y , let {y} ∈ BY , and set A y = Y −1 ({y}) (so that A y ∈ BY ). Then for ω ∈ A y , (E Y X )(ω) = g(y) a.s. This is what we mean by writing E(X | Y = y) = g(y). If A ∈ X −1 (Borel σ − field in ), it follows that A = X −1 (B) for some B ∈ (Borel σ -field in ). Then E Y I A = P Y A = P Y (X ∈ B), (since A = (X ∈ B)) to be denoted by P(X ∈ B | Y ) and the notation P(X ∈ B | Y = y) is used to denote the value of P(X ∈ B | Y ) at ω ∈ A y . Following is an example where Theorem 11 is employed along with other results obtained in this chapter. Example 3. Let X 1 , . . . , X n be i.i.d. r.v.s defined on the probability space (, A, Pθ ), where the parameter θ lies in the parameter space , an open subset of , and set X = (X 1 , . . . , X n ). Let U = U (X) be a statistic (a measurable function of X not involving any unknown quantities), and let T be a sufficient statistic for θ (as defined in Example 1 of this chapter). The statistic U , as an estimator of θ , is called unbiased, if Eθ U = θ for all θ ∈ . Consider the Eθ (U |T ). Then this conditional expectation has the following two properties: It is independent of θ (by sufficiency), and it depends on T only a.s. (by Theorem 11). Therefore, we may set Eθ (U |T ) = φ(T ). Then also, Eθ φ(T ) = Eθ [Eθ (U |T )] = Eθ U = θ for all θ , so that φ(T ) is also an unbiased estimator of θ . Furthermore, σθ2 [φ(T )] ≤ σθ2 (U ) for all θ , so that φ(T ) is at least as good as U in terms of variances (which are assumed to be finite). The variances inequality is seen as follows. On , consider the function g(x) = (x − θ )2 , which is convex (because g (x) = 2 > 0), and apply the Jensen inequality for conditional expectations (Theorem 8 in this chapter) to obtain [φ(T ) − θ ]2 = [Eθ (U |T ) − θ ]2 a.s.
= [Eθ (U − θ |T )]2 ≤ Eθ [(U − θ )2 |T ] a.s.
Taking the Eθ -expectations of both sides, we get σθ2 [φ(T )] = Eθ [Eθ (U |T ) − θ ]2 ≤ Eθ (U − θ )2 = σθ2 (U ) for all θ .
9.4 Further Properties of Conditional Expectations
Application 1. Let X and Y be discrete (real-valued) r.v.s defined on (, A, P), y with Y taking on the values y j with P(Y = y j ) > 0 for all j ≥ 1. In the sequel, will stand for such a value. Let BY be the σ -field generated by {y j }, j ≥ 1 , and let BY = Y −1 (BY ). Then, for every A ∈ BY , A = Y −1 (B ) for some B ∈ BY . Let PY be the restriction of P to BY , and let PY be the probability distribution of Y ; i.e., PY (B ) = PY (A) for every B ∈ BY , where A = Y −1 (B ). Let x be any one of the values x1 , x2 , . . . of X , set B = {x}, and let C = X −1 (B) = X −1 ({x}) = (X = x). Define ϕ as follows: ϕ(A) =
A ∈ BY .
IC dP,
(9.10)
A
Then,
IC dP(= P(A ∩ C)) = (E Y IC )dPY A A = E(IC |Y )dPY = P(C|Y )dPY .
ϕ(A) =
A
Define ϕ by
(9.11)
A
ϕ (B ) = ϕ(A),
B ∈ BY ,
A = Y −1 (B ).
(9.12)
Then ϕ is a finite measure on BY and PY -continuous (since PY (B ) = 0 implies PY (A) = 0 or P(A) = 0, hence ϕ(A) = 0, so that ϕ (B ) = 0). Then, by Theorem 3 in Chapter there exists a BY -measurable function g, nonnegative (a.s. [PY ] finite), 7, and a.s. PY well defined, such that ϕ (B ) = By Lemma 2,
B
g(y)dPY =
B
g(y)dPY ,
B ∈ BY .
g(Y )dPY (A = Y −1 (B )),
(9.14)
A
and therefore, by means of (9.10)–(9.14), g(Y )dPY = P(C|Y )dPY , A
(9.13)
A ∈ BY .
(9.15)
A
Since g(Y ) and P(C|Y ) are BY -measurable, relation (9.15) implies that P(C|Y ) = P(X = x|Y ) = g(Y ) a.s., and, in particular, P(C|Y )(ω) = P(X = x|Y )(ω) = g(y) a.s. for ω ∈ A y = Y −1 ({y}).
(9.16)
Next, for a fixed x denoting any one of the values x1 , x2 , . . . of X , P(X = x|Y = y) =
P(X = x, Y = y) is BY -measurable (see also Exercise 10) P(Y = y)
173
174
CHAPTER 9 Conditional Expectation and Conditional Probability
and by (9.11), B
=
P(X = x|Y = y)dPY =
P(X = x, Y = y) P(Y = y) P(Y = y)
y∈B
P(X = x, Y = y) = P(X = x, Y ∈ B )
y∈B
= P(A ∩ C) = ϕ(A) g(y)dPY (by (9.12) and (9.13)). = ϕ (B ) =
(9.17)
B
The BY -measurability of P(X = x|Y = y) and of g(y), and relation (9.17) imply that (9.18) P(X = x|Y = y) = g(y) a.s. PY . Therefore, by (9.16) and (9.18), P(X = x|Y = y) = P(X = x|Y )(ω) a.s. for ω ∈ A y = Y −1 ({y}).
(9.19)
Relation (9.19) states that P(X = x|Y = y) is a.s. equal to P(C|Y ) evaluated at each ω in A y . This result demonstrates that, in the present setup, the general definition and the elementary definition of a conditional p.d.f. coincide (a.s.). Exercises. 1. If X is an integrable r.v. and {An , n = 1, 2, . . .} are pairwise disjoint events, show that ∞ X dP = X dP. ∞ Ai
i=1
i=1
Ai
Hint: Split X into X + and X − and use Corollary 1 (ii) to the Lebesgue Monotone Convergence Theorem (Theorem 1 in Chapter 5). 2. Let X be an integrable r.v. and, for B ∈ A with P(B) > 0, consider the conditional probability on A, P(·|B). Then show that 1 X dP(·|B) = X dP. P(B) B Hint: Go through the familiar four steps that X is an indicator function, a simple r.v., a nonnegative r.v., any r.v. 3. Consider the probability space (, A, P) = ([0, 1], B[0,1] , λ), where λ is the Lebesgue measure, and let F be the σ -field generated by the class {[0, 41 ], ( 41 , 23 ], ( 23 , 1]}. Also, let X be the r.v. defined by: X (ω) = ω2 , ω ∈ . Then show that E(X |F) = α1 I[0, 1 ] + α2 I( 1 , 2 ] + α3 I( 2 ,1] , 4
4 3
3
and compute α1 , α2 , α3 . Hint: Refer to the special cases discussed right after Definition 1.
9.4 Further Properties of Conditional Expectations
4. Let X and Y be B-measurable and integrable r.v.s. We further assume that B X dP ≥ B Y dP for every B ∈ B(⊆ A). Then show that X ≥ Y a.s. Hint: By setting Z = Y − X , we have B Z dP ≤ 0 for all B ∈ B, and we wish to conclude that Z ≤ 0 a.s. Set C = (Z ≤ 0) and D = (Z > 0)(= C c ). Then it suffices to show D Z dP ≤ 0 and that P(D) = 0. By taking B = D, we have Z dP = (Z I )dP ≥ 0 since Z > 0 on D, so that Z dP D D D = 0. Thus, it suffices to show that, if for a r.v. Z with D = (Z > 0) it holds that D Z dP = 0, then P(D) = 0. This can be done through the four familiar steps. 5. (i) Let X ≥ 0 a.s. on a set A with P(A) > 0, and suppose that A X dP = 0. Then show that X = 0 a.s. on A. (ii) Let X ≥ 0, integrable, and X = 0 a.s. on a set A ∈ B(⊆ A), with P(A) > 0. Then show that E B X = 0 a.s. on A. Hint: (i) With C = (X > 0), we have 0 = A X dP = D X dP, D = A ∩ C. So, D X dP = 0 and X > 0 on D. Show that P(D) = 0 which would be equivalent to saying that X = 0 a.s. on A. Do it by going through the four familiar steps. (iii) Use the fact that B X dP = B E B X dPB for all B ∈ B, replace B by B ∩ A, and conclude that I A E B X = 0 a.s. This would imply that E B X = 0 a.s. on A. 6. If E|X |r < ∞, then show that |E B X |r ≤ E B |X |r a.s., r ≥ 1. Hint: One way of establishing this inequality is to use the Jensen inequality (Theorem 8). For this purpose, take g(x) = |x|r , r ≥ 1, and observe that it is convex. (It is convex for x ≥ 0 and symmetric about the y-axis, hence convex in .). 7. If the r.v. X is integrable, then E B X is finite a.s. 8. Recall that a function g : I (open interval) ⊆ → is said to be convex if g(α1 x1 + α2 x2 ) ≤ α1 g(x1 ) + α2 g(x2 ) for all α1 , α2 ≥ 0 with α1 + α2 = 1, and all x1 , x2 ∈ I . Prove the following generalization: if g is as above, then: g(α1 x1 + · · · + αn xn ) ≤ α1 g(x1 ) + · · · + αn g(xn )
(*)
for any n ≥ 2, any α1 , . . . , αn ≥ 0 with α1 +· · ·+αn = 1, and all x1 , . . . , xn ∈ I . Hint: Use the induction method. Inequality (*) is true for n = 2, assume it to be true for n = k and establish it for n = k + 1. In the expression g(α1 x1 + · · · + αk+1 xk+1 ) group the terms in two parts, one containing the first k terms and one containing the last term. In the first group, multiply and divide by 1 − αk+1 (assuming, without loss of generality, that αk+1 < 1), and use the induction hypothesis. 9. Fill in the details in proving Lemma 2. Hint: Refer to the proof of Theorem 13 in Chapter 4. 10. Let X and Y be discrete r.v.s and recall (from the application following Theorem 11) that BY is the σ -field of subsets of defined by BY = {B ⊆ ; Y −1 (B ) = A for some A ∈ A}. For x, y ∈ with P(Y = y) > 0, consider the
175
176
CHAPTER 9 Conditional Expectation and Conditional Probability
conditional probability P(X = x|Y = y) = P(X = x, Y = y)/P(Y = y) and show that, for each fixed x, the function P(X = x|Y = ·) is BY -measurable. 11. The Dominated Convergence Theorem in its classical form states: If |X n | ≤ Y a.s.
P
n→∞
n→∞
a.s., n ≥ 1, EY < ∞, and either X n → X or X n → X , then E X n → E X finite. n→∞
In the framework of conditional expectations, we have shown that: If |X n | ≤ Y a.s. a.s. a.s., n ≥ 1, EY < ∞, and X n → X with E X finite, then E B X n → E B X for n→∞
n→∞
any σ -field B ⊆ A (see Theorem 7 (iii )). a.s. By means of an example, show that the convergence X n → X cannot be n→∞
P
a.s.
n→∞
n→∞
replaced by X n → X and still conclude that E B X n → E B X . 12. Let the r.v.s X and Y have the Bivariate Normal distribution with parameters μ1 , μ2 in , 0 < σ1 , σ2 < ∞, and ρ ∈ [−1, 1], so that their joint probability density function (p.d.f.) is given by p X ,Y (x, y) = where 1 q= 1 − ρ2
+
1
2π σ1 σ2 1 − ρ 2
x − μ1 σ1
!2
y − μ2 σ2
− 2ρ !2 "
e−q/2 ,
x − μ1 σ1
!
y − μ2 σ2
!
, x, y ∈ .
(i) Show that the exponent may be written thus: ! ! ! x − μ1 2 y − μ2 x − μ1 2 2 2 −ρ (1 − ρ )q = + (1 − ρ ) σ2 σ1 σ1 !2 !2 y−b x − μ1 = + (1 − ρ 2 ) , σ2 σ1 ρσ2 where b = μ2 + (x − μ1 ). σ1 (ii) From part (i), it follows that:
" (x − μ1 )2 p X ,Y (x, y) = √ exp − 2σ12 2π σ1 " (y − b)2 1 exp − ×√ . 2π (σ2 1 − ρ 2 ) 2(σ2 1 − ρ 2 )2 1
9.4 Further Properties of Conditional Expectations
From this expression, and without any actual integration, conclude that the r.v. X is distributed as N (μ1 , σ12 ); i.e., X ∼ N (μ1 , σ12 ), and by symmetry, Y ∼ N (μ2 , σ22 ). 13. In reference to Exercise 12 (ii), and without any actual operations, conclude that the conditional distribution of the r.v. Y , give X = x, is N (b, σ22 (1 − ρ 2 )), where 2 b = μ2 + ρσ σ1 (x − μ1 ); and by symmetry, the conditional distribution of X , given 1 Y = y, is N (c, σ12 (1 − ρ 2 )), where c = μ1 + ρσ σ2 (y − μ2 ). 14. (i) In reference to Exercises 12 and 13, and by writing E(X Y ) = E[E(X Y )|X ], show that E(X Y ) = μ1 μ2 + ρσ1 σ2 . (ii) Use the definition of the covariance of two r.v.s X and Y (with finite second moments), Cov(X , Y ) = E[(X − E X )(Y − EY )] = E(X Y ) − (E X )(EY ), in order to conclude that, in the present case, Cov(X , Y ) = ρσ1 σ2 . (iii) From part (ii), conclude that, in the present case, the correlation coefficient of the r.v.s X and Y , ρ(X , Y ), is equal to ρ. 15. If the r.v.s X and Y have the Bivariate Normal distribution with parameters μ1 , μ2 in , 0 < σ1 , σ2 < ∞ and ρ ∈ [−1, 1], set U=
Y − μ2 X − μ1 and V = σ1 σ2
and show that the r.v.s U and V have the Bivariate Normal distribution with parameters 0, 0, 1, 1 and ρ, by transforming the joint p.d.f. of X and Y into the joint p.d.f. of U and V . 16. Let X and Y be r.v.s defined on the probability space (, A, P), and suppose that E X 2 < ∞. Then show that (i) The conditional variance of X , given Y , is given by the formula Var (X |Y ) = E{[X − E(X |Y )]2 |Y } = E(X 2 |Y ) − [E(X |Y )]2 a.s. (ii) Var(X ) = E[Var (X |Y )] + Var [E(X |Y )]. 17. (Wald) (i) Let X 1 , X 2 , . . . be r.v.s and let N be a r.v. taking the values 1, 2, . . ., all defined on the probability space (, A, P). Define the function X as X (ω) = X 1 (ω) + · · · + X N (ω) (ω), and show that X is a r.v. (ii) Now suppose that the X i s are independent and identically distributed with E X 1 = μ ∈ , that N is independent of the X i s, and that E N < ∞. Then show that E(X |N ) = μN , and therefore E X = μ(E N ).
177
178
CHAPTER 9 Conditional Expectation and Conditional Probability
(iii) If in addition to the assumptions made in part (ii), it also holds that Var(X 1 ) = σ 2 < ∞ and Var(N ) < ∞, then show that Var(X |N ) = σ 2 N . (iv) Use parts (ii) and (iii) here and part (ii) of Exercise 16 in order to conclude that Var(X ) = σ 2 (E N ) + μ2 Var(N ). Hint: For parts (ii) and (iii), use the special case right after Definition 1, and part (i) of Exercise 16. 18. Let B be a sub-σ -field in (, A, P) which is equivalent to the trivial σ -field {, } (in the sense that, for every B ∈ B, either P(B) = 0 or P(B) = 1), and let X be a B-measurable (integrable) r.v. Then show that X = E X a.s. 19. In reference to Example 2, show that, for each t = 0, 1, . . ., there are, indeed,
n+t−1 xi = (xi1 , . . . , xin ) where each xi1 , . . . , xin ranges from 0 to t, and t
xi1 + · · · + xin = t, i = 1, . . . , n+t−1 . t
CHAPTER
Independence
10
The concept of independence of two σ -fields and two r.v.s was introduced in Section 4 of the previous chapter, because it was needed in Lemma 1 there. What we do in this chapter is to elaborate to a considerable extent on the concept of independence and some of its consequences. In Section 10.1, the relevant definitions are given, and a result regarding independence of functions of independent r.v.s is stated and proved. The highlight of the chapter is Theorem 1, which states that the factorization of the joint d.f. of n r.v.s to the individual d.f.s implies independence of the r.v.s involved. Section 10.2 is devoted to establishing those auxiliary results, which are needed for the proof of Theorem 1. In the final section of the chapter, the proof of Theorem 1 is given, as well as the proof of Lemma 1 in Chapter 9.
10.1 Independence of Events, σ -Fields, and Random Variables Here, we recall the definition of independence of events, and then extend it to independence of classes of events (and in particular, fields or σ -fields), and independence of r.v.s. In all that follows, (, A, P) is the underlying probability space. Definition 1. Two events A1 , A2 are said to be independent (stochastically, statistically, or in the probability sense), if P(A1 ∩ A2 ) = P(A1 )P(A2 ).
(10.1)
For n ≥ 2, the events A1 , . . . , An are said to be independent, if for any k ≥ 2 and any (integers) n 1 , . . . , n k with 1 ≤ n 1 < n 2 < · · · < n k ≤ n, it holds that P(An 1 ∩ An 2 ∩ · · · ∩ An k ) = P(An 1 )P(An 2 ) · · · P(An k ).
(10.2)
Any collection of events, {Ai , i ∈ I }, is said to be independent, if any finite subcollection is a set of independent events. Definition 2. Two classes of events C1 and C2 with C j ⊆ A, j = 1, 2, are said to be independent, if for all choices of events A j ∈ C j , j = 1, 2, relation (10.1) holds. For An Introduction to Measure-Theoretic Probability, Second Edition. http://dx.doi.org/10.1016/B978-0-12-800042-7.00010-4 Copyright © 2014 Elsevier Inc. All rights reserved.
179
180
CHAPTER 10 Independence
n ≥ 2, the classes of events C j ⊆ A, j = 1, . . . , n, are said to be independent, if for all choices of events A j ∈ C j , j = 1, . . . , n, the events {A1 , . . . , An } are independent; i.e., relation (10.2) holds. Any collection of classes of events C j ⊆ A, j ∈ I , are said to be independent, if any finite subcollection out of these classes is a collection of independent classes. In particular, if the classes C j are fields F j or σ -fields A j , then we talk about independence of fields or σ -fields, respectively. Definition 3. Consider the r.v.s X 1 and X 2 , and let A j = X −1 j (B), j = 1, 2, be the σ -fields induced by them in (B being the Borel σ -field in ). Then we say that the r.v.s X 1 and X 2 are independent, if the σ -fields A1 and A2 are independent. The σ field A1 and the r.v. X are said to be independent if the σ -fields A1 and A2 = X −1 (B) are independent. For n ≥ 2, the r.v.s X j , j = 1, . . . , n, are said to be independent, if the σ -fields A j = X −1 j (B), j = 1, . . . , n, induced by them are independent. Finally, restricting ourselves to denumerably many r.v.s, we say that the r.v.s {X n , n ≥ 1} are independent, if any finite collection of them is a set of independent r.v.s. Remark 1. In order to establish independence of n fields F j or σ -fields A j , j = 1, . . . , n, it suffices to show that, for all choices of events A j ∈ F j or A j ∈ A j , j = 1, . . . , n, it holds that P(A1 ∩ · · · ∩ An ) = P(A1 ) · · · P(An ).
(10.3)
In other words, it is not necessary to check (10.2) for all subcollections of k events out of the n events. This is so, because by taking A j = for j = n 1 , . . . , n k , relation (10.3) reduces to (10.2). Remark 2. From Definition 2, it is immediate that subclasses of independent classes are also independent. We record below a simple but very useful result. Proposition 1. Borel functions of independent r.v.s are independent r.v.s. That is, if the r.v.s {X n , n ≥ 1} are independent and Yn = gn (X n ), where gn : → measurable, then the r.v.s {Yn , n ≥ 1} are independent. Proof. It follows immediately by Remark 2, because, if A X n are the σ -fields induced by the r.v.s X n , and AYn are the σ -fields induced by the r.v.s Yn , then AYn ⊆ A X n , n ≥ 1. This is so because Yn−1 (B) = [gn (X n )]−1 (B) = X n−1 [gn−1 (B)] ∈ A X n since gn−1 (B) ∈ B, B ∈ B. Now, consider the r.v.s X j , j = 1, . . . , n, and let A j , j = 1, . . . , n be the σ fields induced by them. Then by Definition 3 and Remark 1, independence of the r.v.s X 1 , . . . , X n amounts to the validity of relation (10.3) for all choices of the events A j ∈ A j , j = 1, . . . , n. But A j ∈ A j means that A j = X −1 j (B j ), some B j ∈ B. Therefore relation (10.3) becomes, equivalently, P(X 1 ∈ B1 , . . . , X n ∈ Bn ) = P(X 1 ∈ B1 ) . . . P(X n ∈ Bn )
(10.4)
10.2 Some Auxiliary Results
for all B j ∈ B, j = 1, . . . , n. Thus, establishing independence for the r.v.s X 1 , . . . , X n amounts to checking relation (10.4). However, in lower level textbooks, it is claimed that the r.v.s X 1 , . . . , X n are independent, if P X 1 ≤ x1 , . . . , X n ≤ xn = P(X 1 ≤ xn ) . . . P(X n ≤ xn ), or P X 1 ∈ (−∞, x1 ], . . . , X n ∈ (−∞, xn ] = P X 1 ∈ (−∞, x1 ] . . . P X n ∈ (−∞, xn ] (10.5) for all x1 , . . . , xn in . Relation (10.5) is a very special case of (10.4), taken from it for B j = (−∞, x j ], j = 1, . . . , n. The question then arises as to whether this claim is valid, as it should be. The justification of this claim is the content of the next result. Theorem 1. The r.v.s X 1 , . . . , X n are independent, if and only if relation (10.5) holds for all x1 , . . . , xn in . As already stated, the validity of (10.5) is a special case of (10.4), taken from it for B j = (−∞, x j ], x j ∈ ; or to put it differently, by taking B j s from the class C = {(−∞, x]; x ∈ }. We show below that (10.5) also holds when this class is enlarged to include , , and any interval in finite or not, and of any form. The proof of this is facilitated by the results established in the following section.
10.2 Some Auxiliary Results In this section, three lemmas and one proposition are established, on which the proof of Theorem 1 is based. Lemma 1. If relation (10.5) holds when the B j s in (10.4) are chosen from the class C, then it holds if the B j s are chosen from the class C0 : C0 = {(−∞, x], (−∞, x), (x, ∞), [x, ∞), (x, y], [x, y), (x, y), [x, y], , ; x, y ∈ } = {, , (x, ∞), (x, y], (−∞, x], [x, y], (x, y), [x, y); x, y ∈ } (listed in a convenient order). Proof. In (10.5), if at least one of the intervals (−∞, x j ], j = 1, . . . , n, is replaced by , then it becomes identity, 0 = 0. Thus, (10.5) holds with intervals (−∞, x j ], j = 1, . . . , n, replaced by . Next, P(X 1 ∈ , X j ≤ x j , j = 2, . . . , n) = P X1 ∈ (−∞, ym ], X j ≤ x j , j = 2, . . . , n where ym ↑ ∞, m → ∞ m
181
182
CHAPTER 10 Independence
=P (X 1 ≤ ym ), X j ≤ x j , j = 2, . . . , n m
(X 1 ≤ ym , X j ≤ x j , j = 2, . . . , n) =P
m
= P lim(X 1 ≤ ym , X j ≤ x j , j = 2, . . . , n) m
= lim P(X 1 ≤ ym , X j ≤ x j , j = 2, . . . , n) m
= lim P(X 1 ≤ ym )P(X 2 ≤ x2 ) · · · P(X n ≤ xn ) m
= P(X 1 ∈ )P(X 2 ≤ x2 ) · · · P(X n ≤ xn ). So, the factorization in (10.5) holds if one of the intervals (−∞, x j ], j = 1, . . . , n (which without loss of generality may be taken to be the interval (−∞, x1 ]), is replaced by . Assuming the factorization to be true when k of the preceding intervals (2 ≤ k < n) are replaced by , we show as before that it is also true if k + 1 intervals are replaced by . So, by the induction hypothesis, any number of intervals may be replaced by and the factorization in (10.5) holds. Next, P(X 1 > x, X j ≤ x j , j = 2, . . . , n) = P(X 1 ∈ ( − (−∞, x]), X j ≤ x j , j = 2, . . . , n) = P((X 1 ∈ , X j ≤ x j , j = 2, . . . , n) −(X 1 ≤ x, X j ≤ x j , j = 2, . . . , n)) = P(X 1 ∈ , X j ≤ x j , j = 2, . . . , n) −P(X 1 ≤ x, X j ≤ x j , j = 2, . . . , n) = P(X 1 ∈ )P(X 2 ≤ x2 ) · · · P(X n ≤ xn ) −P(X 1 ≤ x)P(X 2 ≤ x2 ) · · · P(X n ≤ xn ) = [P(X 1 ∈ ) − P(X 1 ≤ x)]P(X 2 ≤ x2 ) · · · P(X n ≤ xn ) = P(X 1 > x)P(X 2 ≤ x2 ) · · · P(X n ≤ xn ). As in the previous step, the factorization holds if any number of the intervals (−∞, x j ], j = 1, . . . , n, is replaced by intervals of the form (x, ∞). Next, P(x < X 1 ≤ y, X j ≤ x j , j = 2, . . . , n) = P(((X 1 ≤ y) − (X 1 ≤ x)), X j ≤ x j , j = 2, . . . , n) = P(X 1 ≤ y, X j ≤ x j , j = 2, . . . , n) −P(X 1 ≤ x, X j ≤ x j , j = 2, . . . , n) = P(X 1 ≤ y)P(X 2 ≤ x2 ) · · · P(X n ≤ xn ) −P(X 1 ≤ x)P(X 2 ≤ x2 ) · · · P(X n ≤ xn )
10.2 Some Auxiliary Results
= [P(X 1 ≤ y) − P(X 1 ≤ x)]P(X 2 ≤ x2 ) · · · P(X n ≤ xn ) = P(x < X 1 ≤ y)P(X 2 ≤ x2 ) · · · P(X n ≤ xn ), and the same is true, if any number of the (−∞, x j ], j = 1, . . . , n, is replaced by intervals of the form (x, y]. Next, P(X 1 < x, X j ≤ x j , j = 2, . . . , n) (−∞, ym ], X j ≤ x j , j = 2, . . . , n) where ym ↑ x, m → ∞ = P(X 1 ∈ m
(X 1 ≤ ym ), X j ≤ x j , j = 2, . . . , n =P m
(X 1 ≤ ym , X j ≤ x j , j = 2, . . . , n) =P
m
= P lim(X 1 ≤ ym , X j ≤ x j , j = 2, . . . , n) m
= lim P(X 1 ≤ ym , X j ≤ x j , j = 2, . . . , n) m
= lim P(X 1 ≤ ym )P(X 2 ≤ x2 ) · · · P(X n ≤ xn ) m
= P(X 1 < x)P(X 2 ≤ x2 ) · · · P(X n ≤ xn ). Then, arguing as in the previous steps, we conclude that (10.5) holds if any number of the (−∞, x j ], j = 1, . . . , n, is replaced by intervals of the form (−∞, x). Next, P(x ≤ X 1 ≤ y, X j ≤ x j , j = 2, . . . , n) = P(((X 1 ≤ y) − (X 1 < x)), X j ≤ x j , j = 2, . . . , n) = P(X 1 ≤ y, X j ≤ x j , j = 2, . . . , n) −P(X 1 < x, X j ≤ x j , j = 2, . . . , n) = P(X 1 ≤ y)P(X 2 ≤ x2 ) · · · P(X n ≤ xn ) −P(X 1 < x)P(X 2 ≤ x2 ) · · · P(X n ≤ xn ) = [P(X 1 ≤ y) − P(X 1 < x)]P(X 2 ≤ x2 ) · · · P(X n ≤ xn ) = P(x ≤ X 1 ≤ y)P(X 2 ≤ x2 ) · · · P(X n ≤ xn ), and arguing as before, we conclude that (10.5) holds if any number of the (−∞, x j ], j = 1, . . . , n, is replaced by intervals of the form [x, y]. Next, P(X 1 ≥ x, X j ≤ x j , j = 2, . . . , n) (ym , ∞), X j ≤ x j , j = 2, . . . , n where ym ↑ x, m → ∞ = P X1 ∈ m
183
184
CHAPTER 10 Independence
=P (X 1 > ym ), X j ≤ x j , j = 2, . . . , n m
(X 1 > ym , X j ≤ x j , j = 2, . . . , n) =P
m
= P lim(X 1 > ym , X j ≤ x j , j = 2, . . . , n) m
= lim P(X 1 > ym , X j ≤ x j , j = 2, . . . , n) m
= lim P(X 1 > ym )P(X 2 ≤ x2 ) · · · P(X n ≤ xn ) m
= P(X 1 ≥ x)P(X 2 ≤ x2 ) · · · P(X n ≤ xn ), and arguing as in previous steps, we conclude that (10.5) holds if any number of the (−∞, x j ], j = 1, . . . , n, is replaced by intervals of the form [x, ∞). Next, P(x < X 1 < y, X j ≤ x j , j = 2, . . . , n) (x, ym ], X j ≤ x j , j = 2, . . . , n where ym ↑ y, m → ∞ = P X1 ∈ =P
m
(x < X 1 ≤ ym ), X j ≤ x j , j = 2, . . . , n
m
=P
(x < X 1 ≤ ym , X j ≤ x j , j = 2, . . . , n)
m
= P lim(x < X 1 ≤ ym , X j ≤ x j , j = 2, . . . , n) m
= lim P(x < X 1 ≤ ym , X j ≤ x j , j = 2, . . . , n) m
= lim P(x < X 1 ≤ ym )P(X 2 ≤ x2 ) · · · P(X n ≤ xn ) m
= P(x < X 1 < y)P(X 2 ≤ x2 ) · · · P(X n ≤ xn ), and as in previous steps, it follows that (10.5) holds when any number of the (−∞, x j ], j = 1, . . . , n, is replaced by intervals of the form (x, y). Finally, P(x ≤ X 1 < y, X j ≤ x j , j = 2, . . . , n) = P X1 ∈ (ym , y), X j ≤ x j , j = 2, . . . , n where ym ↑ x, m → ∞ m
(ym < X 1 < y), X j ≤ x j , j = 2, . . . , n =P m
(ym < X 1 < y, X j ≤ x j , j = 2, . . . , n) =P m
10.2 Some Auxiliary Results
= P lim(ym < X 1 < y, X j ≤ x j , j = 2, . . . , n) m
= lim P(ym < X 1 < y, X j ≤ x j , j = 2, . . . , n) m
= lim P(ym < X 1 < y)P(X 2 ≤ x2 ) · · · P(X n ≤ xn ) m
= P(x ≤ X 1 < y)P(X 2 ≤ x2 ) · · · P(X n ≤ xn ), and as before, we conclude that (10.5) holds when any number of the intervals (−∞, x j ], j = 1, . . . , n, is replaced by intervals of the form [x, y). Finally, combining the 10 conclusions just reached, we have that the factorization (10.5) holds when the B j s in (10.4) are chosen in any way from the class C0 . This completes the proof of the lemma. Lemma 2. Let F be the field (by Exercise 7(ii) in Chapter 1) of all finite sums of members of C0 . Then (10.5) holds, if any number of intervals (−∞, x j ], j = 1, . . . , n, are replaced by elements of F.
m Ii with Ii ∈ C0 , and Proof. Let I ∈ F. Then I = i=1 P(X 1 ∈ I , X j ≤ x j , j = 2, . . . , n) = P X1 ∈ Ii , X j ≤ x j , j = 2, . . . , n i
=P (X 1 ∈ Ii ), X j ≤ x j , j = 2, . . . , n i
=P (X 1 ∈ Ii , X j ≤ x j , j = 2, . . . , n) . i
However,
(X 1 ∈ Ii , X j ≤ x j , j = 2, . . . , n) P
i
=
P X 1 ∈ Ii , X j ≤ x j , j = 2, . . . , n .
i
Therefore P(X 1 ∈ I , X j ≤ x j , j = 2, . . . , n) P(X 1 ∈ Ii )P(X 2 ≤ x2 ) · · · P(X n ≤ xn ) (by Lemma 1) = i
= P(X 2 ≤ x2 ) · · · P(X n ≤ xn )
i
= P(X 2 ≤ x2 ) · · · P(X n ≤ xn )P
P(X 1 ∈ Ii )
X1 ∈
i
Ii
185
186
CHAPTER 10 Independence
= P(X 2 ≤ x2 ) · · · P(X n ≤ xn )P(X 1 ∈ I ) = P(X 1 ∈ I )P(X 2 ≤ x2 ) · · · P(X n ≤ xn ). Thus, the factorization in (10.5) holds if one of the intervals (−∞, x j ], j = 1, . . . , n (which without loss of generality may be taken to be the interval (−∞, x1 ]), is replaced by a member I of F. Assuming it to be true when k of the intervals (2 ≤ k < n) are replaced by members of F, we show as before that the factorization also holds if k + 1 intervals are replaced by I s. So, by the induction hypothesis, any number of intervals may be replaced by I s and the factorization holds. This completes the proof. With F as in Lemma 2, we know that B = σ (F); i.e., B is generated by F. Let −1 F j = X −1 j (F) and A j = X j (B), j = 1, . . . , n. Then by Theorem 12 in Chapter 1, A j = σ (F j ), j = 1, . . . , n. A reformulation of Lemma 2 in terms of elements in the F j s is as follows: for all F j ∈ F j , j = 1, . . . , n, P(F1 ∩ · · · ∩ Fn ) = P(F1 ) · · · P(Fn ).
(10.6)
It is our intention to show that (10.6) holds when the F j s are arbitrary members of the respective A j s. To this effect, let M1 be defined as follows: M1 = {B ∈ A1 ; P(B ∩ F2 ∩ · · · ∩ Fn ) = P(B)P(F2 ) · · · P(Fn ) with F j ∈ F j , j = 2, . . . , n}.
(10.7)
Then, by (10.6) and (10.7), F1 ⊆ M1 ⊆ A1 , and we shall show that M1 is a monotone class. To this end, let {Am }, m ≥ 1, be a monotone sequence of elements of M1 with limit A. We shall show that A ∈ M1 , which amounts to showing that P(A ∩ F2 ∩ · · · ∩ Fn ) = P(A)P(F2 ) · · · P(Fn )
(10.8)
for all F j ∈ F j , j = 2, . . . , n. Indeed, as m → ∞,
P(A ∩ F2 ∩ · · · ∩ Fn ) = P lim Am ∩ F2 ∩ · · · ∩ Fn m
= P ∪ Am ∩ F2 ∩ · · · ∩ Fn , if Am ↑ m
= P ∪(Am ∩ F2 ∩ · · · ∩ Fn ) m
= P lim(Am ∩ F2 ∩ · · · ∩ Fn ) m
= lim P(Am ∩ F2 ∩ · · · ∩ Fn ) m
= lim P(Am )P(F2 ) · · · P(Fn ) m
= P(A)P(F2 ) · · · P(Fn ), and the same is true, if Am ↓, by replacing union by intersection. This justifies the assertion made. Next, define M2 by M2 = {B ∈ A2 ; P(A1 ∩ B ∩ F3 ∩ · · · ∩ Fn ) = P(A1 )P(B)P(F3 ) · · · P(Fn ) with A1 ∈ M1 and F j ∈ F j , j = 3, . . . , n}.
(10.9)
10.3 Proof of Theorem 1 and of Lemma 1 in Chapter 9
Then, by (10.8) and (10.9), F2 ⊆ M2 ⊆ A2 , and we shall show that M2 is a monotone class. This is done as in the previous step by letting {Am }, m ≥ 1, be a monotone sequence of elements of M2 with limit A, and showing that P(A1 ∩ A ∩ F3 ∩ · · · ∩ Fn ) = P(A1 )P(A)P(F3 ) · · · P(Fn )
(10.10)
for all A1 ∈ M1 and all F j ∈ F j , j = 3, . . . , n. Continuing on like this, let Mn be the class defined by Mn = {B ∈ An ; P(A1 ∩ · · · ∩ An−1 ∩ B) = P(A1 ) · · · P(An−1 )P(B) with A j ∈ M j , j = 1, . . . , n − 1}.
(10.11)
Then as before, Fn ⊆ Mn ⊆ An , and Mn is a monotone class, because if {Am }, m ≥ 1, is a monotone sequence of elements of Mn with limit A, then it is shown as in the previous steps that P(A1 ∩ · · · ∩ An−1 ∩ A) = P(A1 ) · · · P(An−1 )P(A)
(10.12)
for all A j ∈ M j , j = 1, . . . , n − 1. Gathering together the results just obtained, we then have the following Lemma 3. For j = 1, . . . , n, the classes M j defined by (10.7), (10.9), and (10.11) are monotone classes with the property that F j ⊆ M j ⊆ A j , and by (10.12), n n P(A j ), for all A j ∈ M j , j = 1, . . . , n. (10.13) P ∩ Aj = j=1
j=1
The following proposition is an immediate consequence of what has just been discussed. Proposition 2. If the fields F j , j = 1, . . . , n are independent, then so are the σ -fields generated by them, A j = σ (F j ), j = 1, . . . , n. Proof. In Lemma 3, it was proved that relation (10.6) implies relation (10.13) for all A j ∈ M j , where M j are monotone classes with F j ⊆ M j ⊆ A j , j = 1, . . . , n. But, by Theorem 6 in Chapter 1, M j = A j , j = 1, . . . , n. This completes the proof.
10.3 Proof of Theorem 1 and of Lemma 1 in Chapter 9 We are now ready to prove Theorem 1 here and Lemma 1 in the previous chapter. Proof of Theorem 1. If the r.v.s are independent, then n n P(A j ), for all A j ∈ A j , j = 1, . . . , n, P ∩ Aj = j=1
j=1
187
188
CHAPTER 10 Independence
and in particular, this is true for A j = X −1 j ((−∞, x j ]), j = 1, . . . , n, which gives (10.5). In the other way around, if (10.5) holds, then so does (10.13) by way of Lemma 3 and Proposition 2, for all A j ∈ A j , j = 1, . . . , n, which establishes independence of X j , j = 1, . . . , n. Remark 3.
Relation (10.5) can also be written as follows in terms of d.f.s: FX 1 ,...,X n (x1 , . . . , xn ) = FX 1 (x1 ) · · · FX n (xn )
for all x1 , . . . , xn in . This section is concluded with the proof of Lemma 1 in Chapter 9. Proof of Lemma 1 in Chapter 9. The proof follows the four familiar steps. It is established successively for indicators, simple r.v.s, nonnegative r.v.s, and any r.v.s. So, let X = I A , Y = I B , A, B ∈ A. Then the σ -fields B X = {∅, A, Ac , } and BY = {∅, B, B c , } are independent because of the independence of the r.v.s = X and Y . Next, X Y = I A∩B , so that E(X
Ym) = P(A ∩ B) = P(A)P(B) n (E X )(EY ). Now let X = (E I A )(E I B ) =
i=1 αi I Ai and Y = j=1 β j I B j , so that X Y = i j γi j I Ai ∩B j , where γi j = αi β j , if Ai ∩ B j = , and whatever; e.g., 0, if Ai ∩ B j = , and E(X Y ) = γi j P(Ai ∩ B j ) = αi β j P(Ai ∩ B j ) i
j
= =
i
j
i
j
αi β j P(Ai )P(B j )
αi P(Ai ) × β j P(B j ) = (E X )(EY ). i
j
Next, let X and Y be ≥ 0 r.v.s and let n → ∞ in the remainder of this proof. Then there exist simple r.v.s X n and Yn such that 0 ≤ X n ↑ X and 0 ≤ Yn ↑ Y , so that 0 ≤ X n Yn ↑ X Y . By the Lebesgue Monotone Convergence Theorem, we have then: E X n → E X , EYn → EY , so that (E X n )(EYn ) → (E X )(EY ), and E(X n Yn ) → E(X Y ). However, E(X n Yn ) = (E X n )(EYn ) by the previous step. Thus, E(X Y ) = (E X )(EY ) < ∞. Finally, for any r.v.s X and Y , we have X Y = (X + − X − )(Y + − Y − ) = X + Y + − X + Y − − X − Y + + X − Y − , and all expectations E X + , E X − , EY + , EY − are finite by the assumption that E |X | < ∞ and E |Y | < ∞. Then, by the previous step, all expectations E(X + Y + ), E(X + Y − ), E(X − Y + ), E(X − Y − ) are finite, and therefore E(X Y ) = E(X + Y + ) − E(X + Y − ) − E(X − Y + ) + E(X − Y − ) = (E X + )(EY + ) − (E X + )(EY − ) −(E X − )(EY + ) + (E X − )(EY − ) = (E X + )(EY + − EY − ) − (E X − )(EY + − EY − ) = (EY + − EY − )(E X + − E X − ) = (E X )(EY ) finite.
10.3 Proof of Theorem 1 and of Lemma 1 in Chapter 9
Exercises. 1. If the events A1 , . . . , An are independent, then so are the events A1 , . . . , An where Ai is either Ai or Aic , i = 1, . . . , n. Hint: The proof is done by double induction. See also Theorem 6 in Chapter 2 in Roussas (1997). 2. Consider the measurable space (, A), let Fi , i = 1, 2, be fields with Fi ⊆ A, i = 1, 2, and define F by F = {all finite unions of As with A ∈ A; A = A1 ∪ A2 or A = A1 ∩ A2 , Ai ∈ Fi , i = 1, 2}. Then show that F is a field. 3. For any real numbers p1 , . . . , pn such that 0 ≤ pi ≤ 1, i = 1, . . . , n, show that n n n 1 − exp − pi ≤ 1 − (1 − pi ) ≤ pi , m = 1, . . . , n. i=m
i=m
i=m
Hint: For the left-hand side, use the inequality e x ≥ 1 + x, x ∈ , and for the right-hand side employ the induction method. 4. Let An , n = 1, 2, . . ., be independent events in the probability space (, A, P). Then show that
(i) ∞ n=1 P(An ) = ∞ if and only if P(lim supn→∞ An ) = 1. ∞ (ii) n=1 P(An ) < ∞ if and only if P(lim supn→∞ An ) = 0. Hint: For part (i), use the left-hand side inequality in Exercise 3 here, and Exercise 3 in Chapter 3. For part (ii), use part (i) and Exercise 3 in Chapter 3 again. Remark: This result is referred to as the Borel Zero-One Criterion. 5. Let An , n = 1, 2, . . ., be independent events in the probability space (, A, P), and suppose that limn→∞ An exists, call it A. Then show that P(A) = 0 or P(A) = 1. Hint: Use Exercise 4. 6. Let X n , n ≥ 1, be independent r.v.s distributed as B(1, p), and set X¯ n =
n P a.s. n −1 i=1 X i . Then show that X¯ n → p and X¯ k 2 → p. n→∞
k→∞
Hint: For the second conclusion, use Exercise 4(i) here and Theorem 4 in Chapter 3. 7. (i) If X n , n = 1, 2, . . ., are independent r.v.s defined on the probability space
a.s. (, A, P), show that X n → 0 if and only if ∞ n=1 P(|X n | ≥ ε) < ∞ for n→∞ every ε > 0. (ii) Reconcile this result with the result in Exercise 4 in Chapter 3. Hint: Use Exercise 4(ii) here and Theorem 4 in Chapter 3. 8. Refer to Exercise 11 in Chapter 9 and construct a concrete example, by means of independent r.v.s (e.g., Binomially distributed r.v.s), to demonstrate the correctness of the assertion made there.
189
190
CHAPTER 10 Independence
9. The r.v.s X 1 , . . . , X k are independent if and only if in relation (10.4) all B j s are replaced by intervals (a j , b j ] with a j , b j in and a j < b j , j = 1, . . . , k. Hint: Work as in Lemma 1 in order to show that P(X ≤ x1 , a j < X j ≤ b j , j = 2, . . . , k) = P(X ≤ x1 )
k
P(a j < X j ≤ b j ),
j=2
and complete the factorization by replacing one of the remaining X j , j = 2, . . . , k, at a time. 10. Let X and Y be independent r.v.s and suppose that E X exists. For every B ∈ B, let A = Y −1 (B) and show that A X d P = (E X )P(A). 11. Show that a r.v. X is independent of itself if and only if P(X = c) = 1 for some (finite) constant c. 12. (i) For two r.v.s with finite second moments, it follows that, if X and Y are independent, then they are uncorrelated (i.e., ρ(X , Y ) = 0, or equivalently, Cov(X , Y ) = 0). Justify this statement. (ii) For the case that the r.v.s X and Y have the Bivariate Normal distribution, use Exercise 14(iii) in Chapter 9 in order to show that, if X and Y are uncorrelated, then they are independent. 13. If the r.v.s X 1 and X 2 are independent, then show that Eeit(X 1 +X 2 ) = Eeit X 1 × Eeit X 2 ,
t ∈ ,
and by induction, Eeit(X 1 +···+X k ) = Eeit X 1 × · · · × Eeit X k ,
t ∈ ,
for the independent r.v.s X 1 , . . . , X k . Hint: Write eit(X 1 +X 2 ) = eit X 1 × eit X 2 = [cos(t X 1 ) + i sin(t X 1 )] × [cos(t X 2 ) + i sin(t X 2 )] and use Proposition 1. 14. (i) If the r.v.s X and Y are independent, distributed as N (0, σ 2 ), and U = X + Y , V = X − Y , then show that U and V are independent, distributed as N (0, 2σ 2 ), by transforming the joint p.d.f. of X and Y into the joint p.d.f. of U and V . (ii) If the r.v.s X and Y are independent, distributed as N (μ1 , σ 2 ) and N (μ2 , σ 2 ), respectively, use part (i) in order to show that U and V are independent, distributed as N (μ1 + μ2 , 2σ 2 ) and N (μ1 − μ2 , 2σ 2 ), respectively. 15. Consider the probability space (, A, P), and let A1 , . . . , An be independent events with P(Ak ) = p, k = 1, . . . , n. Next, define the function X : → as follows: X (ω) = the number of A1 , . . . , An containing ω. Then show that (i) X is a r.v. (ii) The distribution of X is B(n, p) (i.e., Binomial with parameters n and p).
10.3 Proof of Theorem 1 and of Lemma 1 in Chapter 9
16. Consider the probability space (, A, P), let {An }, n ≥ 1, be a sequence of events, and set X n = I An . Then show that the events {A1 , A2 , . . .} are independent if and only if the r.v.s X 1 , X 2 , . . . are independent. 17. If X ∼ B(n, p), compute the probability that P(number of Hs = number of Ts + r ), r = 0, 1, . . . , n. 18. Let X 1 , . . . , X n be independent identically distributed (i.i.d.) r.v.s defined on the probability space (, A, P) and having d.f. F. Let Fn be the empirical d.f. defined in terms of the X i s; i.e., 1 Fn (x, ω) = [number of X 1 (ω), . . . , X n (ω) ≤ x]. n de f
Then show that sup{|Fn (x, ·) − F(x)|; x ∈ } = Dn (·) is a r.v. That is, although Dn (·) is arrived at through noncountable operations, it is still a r.v. de f Hint: Define Dn+ and Dn− by: Dn+ = Dn+ (·) = supx∈ [Fn (x, ·) − F(x)], Dn− = de f
Dn− (·) = supx∈ [F(x) − Fn (x, ·)], so that Dn = max{Dn+ , Dn− }. Next, show that Dn+ = max{max1≤i≤n [ ni − F(yi )], 0}, and Dn− = max{max1≤i≤n [F(yi − 0) − i−1 n ], 0}, where yi = x (i) , i = 1, . . . , n, and x (1) ≤ x (2) ≤ · · · ≤ x (n) are the ordered xi s, xi = X i (ω), and ω is an arbitrary but fixed ω ∈ . 19. (Glivenko–Cantelli). Refer to Exercise 18 and show that a.s.
sup{|Fn (x, ω) − F(x)|; x ∈ } −→ 0. n→∞
Hint: For 0 < p < 1, define x p by: x p = inf{x ∈ ; F(x) ≥ p}, so that F(x) ≥ p for x ≥ x p , and F(x) < p for x < x p , which implies F(x p − 0) ≤ p. Next, replace p by i/k (k ≥ 2 integer), i = 0, 1, . . . , k, to get the points xki , with −∞ ≤ xk0 < xk1 , and xk,k−1 < xkk ≤ ∞. Then, for x ∈ [ ki , i+1 k ), i = 0, 1, . . . , k − 1, it holds that i i +1 ≤ F(xki ) ≤ F(x) ≤ F(xk,i+1 − 0) ≤ , k k so that F(xk,i+1 −0)− F(xki ) ≤ k1 . Use this result and the nondecreasing property of F and Fn to obtain, for x ∈ and i = 0, 1, . . . , k: 1 Fn (x) − F(x) ≤ [Fn (xk,i+1 − 0) − F(xk,i+1 − 0)] + , k 1 Fn (x) − F(x) ≥ [Fn (xki ) − F(xki )] − , k so that |Fn (x) − F(x)| ≤ max{|Fn (xk,i+1 − 0) − F(xk,i+1 − 0)|, 1 |Fn (xki ) − F(xki )|; i = 0, 1, . . . , k − 1} + . k Finally, take the sup over x ∈ (which leaves the right-hand side intact), and use the SLLN to each one of the (finitely many) terms on the right-hand side to arrive at the asserted conclusion.
191
CHAPTER
Topics from the Theory of Characteristic Functions
11
This chapter is a rather extensive one consisting of nine sections. The main theme of the chapter is the introduction of the concept of a characteristic function (ch.f.) and the study of some of its properties, as well as some of the ways it is used for probabilistic purposes. It is to be emphasized that ch.f.s are not studied, to the extent they are, for their own sake; rather, they are looked upon as a powerful tool for the purpose of obtaining certain probability results. A brief description of the sections is as follows. In the first section, the concept of the ch.f. of a d.f. is defined and some of its basic properties are established. In the following section, the so-called inversion formula is proven in several forms. The significance of this formula is that it allows recovery of the distribution by means of its ch.f. The application of this inversion formula is illustrated by way of two simple examples. One of the basic convergences in probability is convergence in distribution, which, in practice, is not easy to check directly. The so-called Paul Lévy Continuity Theorem, which is the main result in Section 11.3, replaces convergence in distribution by convergence of ch.f.s; this latter convergence is much easier to handle. Convergence in distribution in higher than one-dimensional spaces is in essence replaced by convergence in distribution in the real line. This is done by way of the Cramér-Wold device, which is discussed in Section 11.4. The convolution of two d.f.s, its interpretation, and several related results are discussed in Section 11.5, whereas some technical properties of ch.f.s are studied in the following section. An application of some of these results yields the Weak Law of Large Numbers and the Central Limit Theorem; this is done in Section 11.7. The basic result discussed in the next section is that, under certain regularity conditions, the moments of a distribution uniquely determine the distribution. For its rigorous justification certain concepts and results from complex analysis are required, which are dealt with in the final section of the chapter.
11.1 Definition of the Characteristic Function of a Distribution and Basic Properties In all that follows, d.f.s are nonnegative, nondecreasing, right-continuous functions with finite variations; it is not assumed that the variations are necessarily bounded by 1 unless otherwise stated (see also Exercises 4 and 5 in Chapter 8). An Introduction to Measure-Theoretic Probability, Second Edition. http://dx.doi.org/10.1016/B978-0-12-800042-7.00011-6 Copyright © 2014 Elsevier Inc. All rights reserved.
193
194
CHAPTER 11 Topics from the Theory of Characteristic Functions
Definition 1. The characteristic function f of a d.f. F (in the sense of Definition 1 in Chapter 8; see also Remark 5 there) is, in general, a complex-valued function defined on by it x e d F(x) = cos t xd F(x) + i sin t xd F(x). (11.1) f (t) =
The integration in (11.1) is to be understood either in the sense of Riemann–Stieltjes, or as integration with respect to the measure induced by F (see also Appendix B). The integral is well defined for all t ∈ , since cos t x and sin t x are F-integrable. If F is the d.f. of a r.v. X , then (11.1) may be rewritten as f X (t) = Eeit X = E cos t X + iE sin t X .
(11.2)
Some basic properties of a ch.f. are gathered next in the form of a theorem. Theorem 1. (i) | f (t)| ≤ Var F, t ∈ , and f (0) = Var F. In particular, if f (0) = 1 and 0 ≤ F(x) ≤ 1, then f is the ch.f. of a r.v. (ii) f is uniformly continuous in . (iii) If f is the ch.f. of a r.v. X , then f α X +β (t) = eiβt f X (αt), t ∈ , where α and β are constants. (iv) If f is the ch.f. of a r.v. X , then f −X (t) = f X (t), t ∈ , where, for z = x + i y(x, y ∈ ), z¯ = x − i y. dn (v) If for some positive integer n the nth moment E X n is finite, then dt n f X (t)|t=0 = inE Xn. Remark 1. In the proof of the theorems, as well as in other cases, the following property is used: [g(x) + i h(x)]dμ ≤ |g(x) + i h(x)| dμ = [g 2 (x) + h 2 (x)]1/2 dμ,
where g and h are real-valued functions, and [g(x) + i h(x)]dμ = i h(x)dμ. Its justification is left as an exercise (see Exercise 1). Proof of Theorem 1.
g(x)dμ +
For convenience omit in the integration. Then
(i) | f (t)| = | eit x d F(x)| ≤ |eit x |d F(x) = Var F, and f (0) = d F(x) = Var F. If f (0) = 1, then Var F = 1, which together with 0 ≤ F(x) ≤ 1, x ∈ , implies F(−∞) = 0, F(∞) = 1, so that F is the d.f. of a r.v. (ii) | f (t + h) − f (t)| = | ei(t+h)x d F(x) − eit x d F(x)| = | [ei(t+h)x − eit x ] d F(x)| = | [eit x (ei hx − 1)]d F(x)| ≤ |eit x (ei hx − 1)|d F(x) = |ei hx − 1|d F(x). Now |ei hx − 1| ≤ 2, which is independent of h and F-integrable. Furthermore, ei hx − 1 → 0 as h → 0. Therefore the Dominated Convergence
11.2 The Inversion Formula
Theorem applies and gives |ei hx − 1|d F(x) → 0 as h → 0. So | f (t + h) − f (t)| is bounded by a quantity that is independent of t and → 0 as h → 0. This establishes uniform continuity for f . (iii) f α X +β (t) = Eeit(α X +β) = E[eiβt ei(αt)X ] = eiβt E[ei(αt)X ] = eiβt f X (αt). (iv) f −X (t) = Eeit(−X ) = Eei(−t)X = E[cos(−t X ) + i sin(−t X )] = E[cos(t X ) − i sin(t X )] = E cos t X − iE sin t X = E cos t X + iEsin t X = f X (t). (v) Consider, e.g., the interval [−r , r ] for some r > 0. Then, for t ∈ [−r , r ], ∂ it X = i X eit X exists, and | ∂t∂ eit X | ≤ |X |, independent of t and integrable. ∂t e Then, by Theorem 5 in Chapter 5, ∂ it X d d d F(x) f (t) = eit X d F(x) = e dt dt ∂t = i (X eit X )d F(x), d f (t)|t=0 = i (X eit X )|t=0 d F(x) = iE X . and, in particular, dt d k it X = i k X k eit X exists, and The same applies for any k, 1 ≤ k ≤ n, since dt ke k
d it X | dt e | ≤ |X k |, independent of t and integrable. In particular, k k k i X d F(x) = i k E X k .
dk dt k
f (t)|t=0 =
11.2 The Inversion Formula By means of relation (11.1), the d.f. F defines its ch.f. f . The converse is also true; i.e., if it is given that f is the ch.f. of a d.f. F, then F can be recovered by means of the so-called inversion formula. More precisely, we have the following result. Theorem 2.
Let F be a d.f. (not necessarily of a r.v.) and let f be its ch.f. Then
(i) For any a, b ∈ (a < b) and T > 0: F(a) + F(a−) F(b) + F(b−) − 2 2 T −ita 1 e − e−itb f (t)dt. = lim T →∞ 2π −T it
(11.3)
(ii) If a, b ∈ C(F) and T > 0: 1 T →∞ 2π
F(b) − F(a) = lim
T
−T
e−ita − e−itb f (t)dt. it
(11.4)
195
196
CHAPTER 11 Topics from the Theory of Characteristic Functions
(iii) If X is a discrete r.v. taking on the value x j with probability P(X = x j ) = p(x j ), j ≥ 1, then T 1 p(x j ) = lim e−it x j f (t)dt, j ≥ 1, T > 0. T →∞ 2T −T In the course of the proof of the theorem, the following two facts will be needed, which are recorded here for easy reference. b Fact 1. The Dirichlet integrals a sinx x d x are bounded uniformly in a, b ∈ . 0 ∞ Fact 2. −∞ sinx x d x = 0 sinx x d x = π2 . (See, e.g., integral 417 in Tallarida (1999).) Also, the following remarks are in order. Remark 2. (i) As it will be seen in the proof of the theorem, the integrals on the right-hand sides of relations (11.3) and (11.4) are real. (ii) It is to be noticed that in (11.3) and (11.4), we consider the Cauchy principal values of the integrals (i.e., integrals over intervals symmetric with respect to the origin); the integrals taken over arbitrary limits may fail to exist. (iii) It is to be pointed out here that, whereas a d.f. F determines the corresponding ch.f. uniquely, the converse is not true. That is, a ch.f. determines a class of d.f.s through the difference F(b) − F(a), a, b ∈ C(F) (Theorem 2(ii)) rather than a unique d.f. Any two such d.f.s differ by a constant. Indeed, let F and G be two such d.f.s. Then, for x and a in C(F) ∩ C(G) with a < x, we have F(x) − F(a) = G(x) − G(a)(= the right-hand side in (11.4)). Letting a ↓ −∞ through C(F) ∩ C(G), we get F(x) − F(−∞) = G(x) − G(−∞) or F(x) − G(x) = F(−∞) − G(−∞), a constant for x ∈ . Nevertheless, all these d.f.s define the same measure (Theorem 7 in Chapter 4). In particular, if f is the ch.f. of a r.v., then the corresponding measure is a probability measure (Theorem 1(i) here). Proof of Theorem 2. (i) Set g(t) =
e−ita − e−itb , it
J (T ) =
1 2π
T
−T
g(t) f (t)dt.
(11.5)
Expanding e−ita and e−itb around 0, up to terms of second order, dividing by t( = 0), and taking the limit as t → 0, we obtain limt→0 g(t) = b − a. Then define by continuity, g(0) = b − a. (See also Exercise 2.) Since b e−ita − e−itb = e−it y dy (both for t = 0 and t = 0), and it a ∞ f (t) = eit x d F(x), −∞
11.2 The Inversion Formula
we have 1 J (T ) = 2π =
1 2π
=
1 2π
T
−T T
b
a ∞
e
−it y
b
−T −∞ a ∞ b T −∞ a
−T
dy
∞
−∞
e
it x
d F(x) dt
eit(x−y) dyd F(x)dt eit(x−y) dtdyd F(x),
(11.6)
where the change of the order of the integration is allowed by the Fubini Theorem (Theorem 12 in Chapter 5), since |eit(x−y) | = 1 is integrable with respect to the d.f. F (over (−∞, ∞)), the Lebesgue measure (over (a, b)), and the Lebesgue measure again (over (−T , T )). Next, for x = y: T T 1 eit(x−y) dt = deit(x−y) i(x − y) −T −T 1 = [ei T (x−y) − e−i T (x−y) ]. i(x − y) On the other hand, from eiv = cos v + i sin v, e−iv = cos v − i sin v, we get: eiv − e−iv = 2i sin v or 2i1 (eiv − e−iv ) = sin v, so that 1 sin[T (x − y)] [ei T (x−y) − e−i T (x−y) ] = 2T . i(x − y) T (x − y) Since sinx x → 1 as x → 0, sinx x is defined to be 1 at x = 0 by continuity. But T T for x = y : −T eit(x−y) dt = −T dt = 2T . Thus, for all x, y, relation (11.6) becomes T ∞ b sin[T (x − y)] dyd F(x). (11.7) J (T ) = π −∞ a T (x − y) b (x−y)] Next, consider the integral πT a sin[T T (x−y) dy and set T (x − y) = −u, so that dy =
du T ,u
: T (a − x), T (b − x), to get 1 T (b−x) sin u de f T b sin[T (x − y)] dy = du = K T (x), π a T (x − y) π T (a−x) u
so that by (11.7) and (11.8), we have ∞ K T (x)d F(x). J (T ) = −∞
(11.8)
(11.9)
By Fact 1, K T (x) is bounded uniformly in T , a, b, and x, and its bound (a finite constant) is F-integrable. Next, by Fact 2, ⎧ ⎫ ⎨ 0 if x < a or x > b ⎬ de f lim K T (x) = 1 if a < x < b = K (x). (11.10) ⎩1 ⎭ T →∞ if x = a or x = b 2
197
198
CHAPTER 11 Topics from the Theory of Characteristic Functions
By the boundedness of K T (x) by a constant (independent of T ), F-integrable, and relation (11.10), the Dominated Convergence Theorem applies, and therefore (11.9) gives ∞ ∞ K T (x)d F(x) = K (x)d F(x) lim J (T ) = lim T →∞
= +
T →∞ −∞
a−
−∞ b
K (x)d F(x) +
K (x)d F(x) +
a
K (x)d F(x)
a+
∞
K (x)d F(x) +
b
−∞ b−
a
K (x)d F(x)
b+
1 = 0 + [F(a) − F(a−)] + 1 × [F(b−) − F(a)] 2 1 + [F(b) − F(b−)] + 0 2 1 1 = [F(b) + F(b−)] − [F(a) + F(a−)]. 2 2
(11.11)
Then relations (11.5) and (11.11) complete the proof of part (i). (ii) It follows from part (i), since F(a) = F(a−) and F(b) = F(b−). (iii) We have T T e−it x j f (t)dt = e−it x j eit xk p(xk ) dt −T
−T
= =
T
−T
k
p(xk )
k
k
e
it(xk −x j )
T
−T
p(xk ) dt
eit(xk −x j ) dt,
by the Fubini Theorem, because the integrand is integrable with respect to the counting measure (or the probability measure induced by { p(xk )}; k ≥ 1), and the Lebesgue measure over [−T , T ], and this is = p(x j ) × 2T + ∞ it(xk −x j ) dt. But for xk = x j , k = j −∞ e T T T it(xk −x j ) e dt = cos t(xk − x j )dt + i sin t(xk − x j )dt −T
−T T
−T
=
−T
cos t(xk − x j )dt
T 1 d sin t(xk − x j ) (xk − x j ) −T sin T (xk − x j ) − sin[−T (xk − x j )] 2 sin T (xk − x j ) = = . xk − x j xk − x j =
11.2 The Inversion Formula
Therefore, the original integral becomes T 1 1 p(x j ) × 2T e−it x j f (t)dt = 2T −T 2T 2 sin T (xk − x j ) 1 + p(xk ) 2T xk − x j xk =x j
= p(x j ) +
xk =x j
p(xk )
sin T (xk − x j ) . T (xk − x j )
Set T (xk − x j ) = x and expand sin x around 0 up to terms of order one to obtain sin x = sin 0 + (x − 0) cos x|x=x ∗ (for some x ∗ = θ x, |θ | ≤ 1) = x cos x ∗ , so that T (xk − x j ) cos x ∗ sin T (xk − x j ) = = cos x ∗ , and T (xk − x j ) T (xk − x j ) sin T (xk − x j ) ∗ T (x − x ) = | cos x | ≤ 1 k
j
independent of T and integrable (with respect to the measure { p(xk ), xk = x j }). Therefore, by the Dominated Convergence Theorem, lim
T →∞
p(xk )
xk =x j
sin T (xk − x j ) T (xk − x j )
sin T (xk − x j ) =0 T (xk − x j ) xk =x j sin T (xk − x j ) 1 ≤ since −→ 0 T (xk − x j ) |T (xk − x j )| T →∞ T −it x 1 j f (t)dt = p(x ). and hence lim T →∞ 2T j −T e =
p(xk ) lim
T →∞
To this theorem, there are the following three corollaries. Let a = x − h and b = x + h (h > 0) be continuity points of F. Then 1 T sin(th) −it x e f (t)dt. F(x + h) − F(x − h) = lim T →∞ π −T t
Corollary 1.
Proof.
For this choice of a and b, we have e−ita − e−itb = e−it(x−h) − e−it(x+h) = e−it x (eith − e−ith ) = e−it x × 2i sin(th),
199
200
CHAPTER 11 Topics from the Theory of Characteristic Functions
so that 1 2π
T −T
1 e−ita − e−itb f (t)dt = it π
T
−T
sin(th) −it x e f (t)dt. t
The result follows from part (ii). Corollary 2. given by
The d.f. F is differentiable and its derivative at a, 1 h→0 T →∞ 2π
p(a) = lim lim
T −T
F (a)
1 − e−ith −ita e f (t)dt ith
= p(a), is (11.12)
if and only if the right-hand side in (11.12) exists. Proof. In part (ii) of the theorem, set b = a + h (with both a and a + h continuity points of F, h > 0). Then e−ita − e−itb = e−ita − e−it(a+h) = e−ita (1 − e−ith ), and hence 1 1 [F(a + h) − F(a)] = lim T →∞ 2π h
T −T
1 − e−ith −ita e f (t)dt. ith
(11.13)
Suppose first that p(a) exists. Then, taking the limits in (11.13), as h → 0, we have the desired result. Next, if the limit (as h → 0) on the right-hand side of (11.13) exists, then p(a) exists and is taken as stated. Similarly for h < 0. ∞ Corollary 3. If −∞ | f (t)|dt < ∞, then the derivative F (x) = p(x) exists, is bounded and continuous in , and is given by ∞ 1 p(x) = e−it x f (t)dt. (11.14) 2π −∞ From the expansion e−ith = 1 − ithe z , with z = −ithθ, θ real, |θ | ≤ 1,
Proof.
1−e−ith z we have ith = e , (t −ith Also, 1−eith ≤ |e z | =
= 0), and
1−e−ith ith
is defined to be 1 for t = 0 by continuity.
1. Thus
∞ 1 − e−ith −ita 1 − e−ith −ita e e f (t)dt = f (t)I[−T ,T ] (t)dt ith ith −T −∞ −ith with 1−eith e−ita f (t)I[−T ,T ] (t) ≤ | f (t)| independent of T and (Lebesgue-) integrable, and 1 − e−ith −ita 1 − e−ith −ita lim e e f (t)I[−T ,T ] (t) = f (t). T →∞ ith ith T
Then the Dominated Convergence Theorem applies and gives ∞ T 1 − e−ith −ita 1 − e−ith −ita lim e e f (t)dt = f (t)dt. T →∞ −T ith ith −∞
(11.15)
11.2 The Inversion Formula
In the integrand on the right-hand side of (11.15), look at h as an index, and observe that −ith | 1−eith e−ita f (t)| ≤ | f (t)| independent of h, (Lebesgue-) integrable, and 1−e−ith −ita ith e
f (t) → e−ita f (t). Therefore, the Dominated Convergence Theorem h→0
again gives
lim
∞
h→0 −∞
1 − e−ith −ita e f (t)dt = ith
∞ −∞
e−ita f (t)dt.
Thus, from (11.15) and (11.16), T ∞ 1 1 − e−ith −ita 1 lim lim e f (t)dt = e−ita f (t)dt. h→0 T →∞ 2π −T ith 2π −∞
(11.16)
(11.17)
Since left-hand side of (11.17) is equal to p(a) by (11.12), have p(a) = ∞the−ita ∞ we 1 1 −it x f (t)dt. The e f (t)dt. Replacing a by x, we have p(x) = e 2π −∞ ∞ −it x 2π −∞ ∞ 1 1 f (t)dt| ≤ 2π boundedness of p(x) follows by p(x) = | 2π −∞ e −∞ | f (t)| (t)| 1 −it x dt < ∞, whereas continuity follows thus: | 2π e f (t)| ≤ | f2π independent of 1 −it x 1 −it x0 x, (Lebesgue-) integrable, and 2π e f (t) → 2π e f (t). Then the Dominated x→x0
Convergence Theorem completes the proof. Here are two examples that illustrate how the inversion formula applies. Example 1.
Let the r.v. X be distributed as B(n, p), so that its ch.f. is given by n n n itk n k n−k p q ( peit )k q n−k f (t) = e = k k k=0
k=0
= ( peit + q)n , t ∈ (q = 1 − p). Apply Theorem 2(iii) to recover p(x), x = 0, . . . , n. We have T T 1 1 −it x e f (t)dt = e−it x ( peit + q)n dt 2T −T 2T −T T n n 1 −it x it k n−k ( pe ) q = e dt k 2T −T k=0 T n n k n−k i(k−x)t 1 p q e = dt k 2T −T k=0 n 1 n k n−k T i(k−x)t p q e dt = k 2T −T k=0 n n k n−k T i(k−x)t 1 p q = e dt k 2T −T k=0,k =x 1 n x n−x p q + × 2T 2T x
201
202
CHAPTER 11 Topics from the Theory of Characteristic Functions
n n k n−k ei(k−x)T − e−i(k−x)T p q k 2i T (k − x)
= p(x) + = p(x) +
k=0,k =x n k=0,k =x
n k n−k sin(k − x)T p q , k (k − x)T
and by taking the limit, as T → ∞, the second term on the right-hand side above T −it x 1 1 → 0 as |x| → ∞). It follows that lim T →∞ 2T tends to 0 (since | sinx x | ≤ |x| −T e f (t)dt = p(x) as in Theorem 2(iii). Example 2. Let the r.v. X be distributed as N (0, 1), so that its ch.f. is f (t) = e−t ∞ 2 Since | f (t)|dt = −∞ e−t /2 dt < ∞, Corollary 3 applies. We have then ∞ ∞ 1 1 2 −it x e f (t)dt = e−it x e−t /2 dt 2π −∞ 2π −∞ ∞ 1 2 = e−(t +2it x)/2 dt 2π −∞ ∞ 1 2 2 2 = e−[t +2it x+(i x) ]/2 e(i x) /2 dt 2π −∞ 2 e−x /2 ∞ −(t+i x)2 /2 e dt = 2π −∞ 2 e−x /2 ∞ 1 −u 2 /2 du = √ √ e 2π −∞ 2π e−x /2 1 2 = √ × 1 = √ e−x /2 = p(x). 2π 2π
2 /2
.
2
11.3 Convergence in Distribution and Convergence of Characteristic Functions—The Paul Lévy Continuity Theorem In this section, two versions of the Paul Lévy continuity theorem are stated and proved, after a number of auxiliary results have been established. The significance of this theorem is that convergence in distribution of a sequence of r.v.s to a r.v. is reduced to convergence of ch.f.s. Convergence in distribution is not easy to deal with, whereas convergence of ch.f.s is amenable to a large body of analytical facts and techniques. Below, we define the so-called integral ch.f. of a d.f. F, which is needed as a tool in the sequel. To this end, let F be a d.f. (not necessarily of a r.v.) with a ch.f. f . In terms of f , we define the function fˆ as follows: t f (v)dv, t ∈ . (11.18) fˆ(t) = 0
11.3 Convergence in Distribution and Convergence
Then fˆ(t) =
t 0
e
ivx
d F(x) dv =
t 0
e
ivx
d F(x)dv =
t
0
eivx dv d F(x)
(by the Fubini Theorem, since the integrand is integrable). Now, for x = 0, we have
t
eivx dv =
0
while for x = 0, we have
t 0
eit x = 1 + it x +
1 ix
t
0
deivx =
1 ivx t eit x − 1 e |0 = , ix ix
dv = t. Expanding e(i x)t around t = 0, we get (it x)2 ∗ × eit x , for some t ∗ with |t ∗ | ≤ 1, 2
so that eit x − 1 t2 ∗ = t + (i x) eit x . ix 2 eit x −1 ix
is defined by continuity at x = 0 (which is what we usually do), then t = t, and this is what we get as a value of 0 eivx dv for x = 0. So, for any x, we have then it x e −1 ˆ d F(x). (11.19) f (t) = ix
Thus, if
eit x −1 i x |x=0
Definition 2. The function fˆ as defined in (11.18) or (11.19) is called the integral ch.f. of the d.f. F. Remark 3. There is a one-to-one correspondence between f and fˆ. In fact, f uniquely determines fˆ by means of (11.18). On the other hand (i.e., if we are told that fˆ is the integral ch.f. of a ch.f. f and we want to recover f ), since f is continuous, we have fˆ (t) = f (t), so that fˆ uniquely determines f . The simplest (and most commonly used) version of the continuity theorem is the following. Theorem 3 (Paul Lévy Continuity Theorem). For n ≥ 1, let Fn and F be d.f.s of r.v.s with respective ch.f.s f n and f . Then, as n → ∞, c
(i) Fn ⇒ F (or, equivalently, Fn →F) implies f n → f on . c
(ii) f n → f on implies Fn ⇒ F (or, equivalently, Fn →F).
For the proof of part (ii), we need the following two auxiliary results. Lemma 1. Let f n and f be as in Theorem 3, and let fˆn and fˆ be the respective integral ch.f.s. Then, as n → ∞, f n → f on implies fˆn → fˆ on .
203
204
CHAPTER 11 Topics from the Theory of Characteristic Functions
Proof. For t > 0 (and similarly for t < 0), | f n (v)I[0,t] (v)| ≤ I[0,t] (v) independent of n, (Lebesgue-) integrable, and f n (v)I[0,t] (v) → f (v)I[0,t] (v). Then, by the n→∞ Dominated Convergence Theorem, t t f n (v)dv = f n (v)I[0,t] (v)dv → f (v)I[0,t] (v)dv = f (v)dv, n→∞
0
0
or fˆn → fˆ on .
Lemma 2. Let $F_n$, $F$ and $f_n$, $f$ be as in Theorem 3, and let $\hat f_n$ and $\hat f$ be the respective integral ch.f.s of $f_n$ and $f$. Then, as $n \to \infty$, $\hat f_n \to \hat f$ on $\Re$ implies $F_n \Rightarrow F$ (or, equivalently, $F_n \stackrel{c}{\to} F$).

In the course of the proof of this lemma, as well as elsewhere, the following elementary fact is employed, which is stated here as a remark.

Remark 4. If $\{z_n\}$, $n \geq 1$, is a sequence of (real or complex) numbers, then $z_n \to z_0$ as $n \to \infty$, if and only if for any subsequence $\{m\} \subseteq \{n\}$ there exists a further subsequence $\{r\} \subseteq \{m\}$ such that $z_r \to z_0$ as $r \to \infty$.

Proof of Lemma 2. In this proof, all limits are taken as $\{n\}$ or subsequences thereof converge to $\infty$. We wish to show that $F_n(x) \to F(x)$, $x \in C(F)$. By Remark 4, it suffices to show that for every $\{m\} \subseteq \{n\}$ there exists $\{r\} \subseteq \{m\}$ such that $F_r(x) \to F(x)$, $x \in C(F)$. Since $\{F_m\}$ (evaluated at $x \in C(F)$) is bounded (by 1), there exists $\{r\} \subseteq \{m\}$ such that $F_r \Rightarrow F_0$, some d.f. on $\Re$. (This is so by Theorem 5 in Chapter 8.) Clearly, $0 \leq F_0(x) \leq 1$, $x \in \Re$, and let $f_0$ and $\hat f_0$ be the ch.f. and the integral ch.f. of $F_0$, respectively. We have $\hat f_r(t) = \int \frac{e^{itx}-1}{ix}\,dF_r(x)$ with $\frac{e^{itx}-1}{ix}$ continuous (in $x$) over $\Re$ (for each arbitrary but fixed $t \in \Re$) and $\left|\frac{e^{itx}-1}{ix}\right| \leq \frac{2}{|x|} \to 0$ as $|x| \to \infty$. Also, $F_r \Rightarrow F_0$. Then (by Theorem 7 in Chapter 8),
\[
\int \frac{e^{itx}-1}{ix}\,dF_r(x) \to \int \frac{e^{itx}-1}{ix}\,dF_0(x),
\]
or $\hat f_r \to \hat f_0$ on $\Re$. However, $\hat f_n \to \hat f$ on $\Re$, so that $\hat f_r \to \hat f$ and $\hat f_0 = \hat f$ on $\Re$. It follows that $f_0 = f$ on $\Re$, and therefore $F_0 - F = c$, some constant $c$ (see also Remark 2(iii)). We shall show that $c = 0$, which will establish the assertion that $F_n \Rightarrow F$. Indeed, from $f_0 = f$ on $\Re$, we have $f_0(0) = f(0)$, or $\mathrm{Var}\,F_0 = \mathrm{Var}\,F = 1$. So, $0 \leq F_0(x) \leq 1$, $x \in \Re$, and $\mathrm{Var}\,F_0 = 1$. Then $F_0(-\infty) = 0$ and $F_0(\infty) = 1$. Finally, from $F_0(x) - F(x) = c$, we get, as $x \to -\infty$, $F_0(-\infty) - F(-\infty) = c$, or $0 - 0 = c$; i.e., $c = 0$ and $F_0 = F$.

Proof of Theorem 3. (i) For each arbitrary $t \in \Re$, $e^{itx}$ is continuous (in $x$) and bounded (by 1). Since $F_n \stackrel{c}{\to} F$, Theorem 8 in Chapter 8 applies and gives that
\[
f_n(t) = \int e^{itx}\,dF_n(x) \to \int e^{itx}\,dF(x) = f(t),
\]
or $f_n \to f$ on $\Re$.
(ii) It follows from Lemmas 1 and 2. (Or, in more detail, from $f_n \to f$ on $\Re$, we have
\[
\hat f_n(t) = \int_0^t f_n(v)\,dv = \int f_n(v)I_{[0,t]}(v)\,dv
\]
with $|f_n(v)I_{[0,t]}(v)| \leq I_{[0,t]}(v)$ (since $|f_n(v)| \leq \mathrm{Var}\,F_n = 1$), independent of $n$ and (Lebesgue-)integrable. Also, $f_n(v)I_{[0,t]}(v) \to f(v)I_{[0,t]}(v)$. Hence
\[
\hat f_n(t) = \int f_n(v)I_{[0,t]}(v)\,dv \to \int f(v)I_{[0,t]}(v)\,dv = \hat f(t);
\]
i.e., $\hat f_n \to \hat f$ on $\Re$. Next, in order to show that $F_n \Rightarrow F$, it suffices to show that for every $\{F_m\} \subseteq \{F_n\}$, there is $\{F_r\} \subseteq \{F_m\}$ such that $F_r \Rightarrow F$. To this end, let $\{F_m\} \subseteq \{F_n\}$. Then there is $\{F_r\} \subseteq \{F_m\}$ such that $F_r \Rightarrow F_0$, some d.f. This is so by Theorem 6 in Chapter 8. Let $f_0$ and $\hat f_0$ be the ch.f. and the integral ch.f., respectively, of $F_0$. Then $F_r \Rightarrow F_0$ implies $\hat f_r \to \hat f_0$ on $\Re$, by part (i). However, $\hat f_n \to \hat f$, by what was just proved. Hence $\hat f_r \to \hat f$, and therefore $\hat f = \hat f_0$, which implies $f = f_0$. But then $F_0 - F = c$, a constant. We show that $c = 0$. Indeed, $0 \leq F_0 \leq 1$ (since $0 \leq F_n \leq 1$ for all $n$), and $1 = f(0) = f_0(0)$ implies that $\mathrm{Var}\,F_0 = 1$. Then $F_0(-\infty) = 0$. Since $F_0(x) - F(x) = c$, $x \in \Re$, letting $x \downarrow -\infty$, we get $0 - 0 = c$, so that $F_0 = F$, and therefore $F_n \Rightarrow F$.)

A version of Theorem 3, whose part (ii) is seemingly less restrictive than part (ii) of Theorem 3, is the following.

Theorem 3* (Paul Lévy Continuity Theorem). For $n \geq 1$, let $F_n$ be d.f.s of r.v.s with respective ch.f.s $f_n$. Then

(i) If $F_n \Rightarrow F$, a d.f. of a r.v. with ch.f. $f$, it follows that $f_n \to f$ on $\Re$, as $n \to \infty$.

(ii) Let $f_n \to g$, some function on $\Re$ continuous at the origin. Then $F_n \underset{n\to\infty}{\Longrightarrow} F$, where $F$ is a (uniquely determined) d.f. of a r.v.

Remark 5. Clearly, part (i) is the same in both Theorems 3 and 3*. In part (ii) of Theorem 3*, it is not required that the limit $g$ of $\{f_n\}$ be a ch.f. at all (even less the ch.f. of a r.v.), but it turns out that $g$ is, indeed, the ch.f. of a r.v. and that $F_n \underset{n\to\infty}{\Longrightarrow} F$, the uniquely determined d.f. corresponding to $f$. For the proof of part (ii) of Theorem 3*, we need the following auxiliary results.

Lemma 3. Let $f_n$ and $g$ be as in Theorem 3*, and let $\hat f_n$ be the integral ch.f. corresponding to $f_n$. Then, as $n \to \infty$, $f_n \to g$ on $\Re$ implies that $\hat f_n \to \hat g$ on $\Re$, where $\hat g(t) = \int_0^t g(v)\,dv$ (where the integral is to be understood in the Lebesgue sense).

Proof. It is the same as that of Lemma 1, where $f$ and $\hat f$ are replaced by $g$ and $\hat g$, respectively.
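As a quick numerical illustration of the continuity theorem (a sketch that is not part of the text), one can check pointwise convergence of ch.f.s directly in a familiar case: the ch.f.s of Binomial$(n, \lambda/n)$ converge to the Poisson$(\lambda)$ ch.f. (these two ch.f.s appear in Exercises 11 and 12 at the end of this chapter), reflecting the corresponding convergence in distribution. The value of $\lambda$ and the grid of $t$ values below are arbitrary choices.

```python
# A small numerical sketch (not from the text) of the continuity theorem
# (Theorem 3): the ch.f.s of Binomial(n, lambda/n) converge pointwise to the
# Poisson(lambda) ch.f., exp(lambda(e^{it}-1)), which reflects the classical
# convergence in distribution Binomial(n, lambda/n) => Poisson(lambda).
# The value lambda = 2 and the t grid are arbitrary illustrative choices.
import numpy as np

lam = 2.0
t = np.linspace(-3, 3, 7)
f_poisson = np.exp(lam * (np.exp(1j * t) - 1.0))
for n in (5, 50, 5000):
    p = lam / n
    f_binom = (1.0 - p + p * np.exp(1j * t)) ** n     # ch.f. of Binomial(n, p)
    print(f"n = {n:5d}: max |f_n(t) - f(t)| = {np.max(np.abs(f_binom - f_poisson)):.5f}")
```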
Before the second auxiliary result is formulated, we have to introduce the concept of weak (and complete) convergence of a sequence of d.f.s to a d.f. up to an additive constant. At this point, recall that, if $F_n$, $n \geq 1$, and $F$ are d.f.s (not necessarily of r.v.s), then weak convergence $F_n \Rightarrow F$ (as $n \to \infty$) means $F_n(x) \to F(x)$, $x \in C(F)$. In this section, the need for a modification of this convergence arises. This is due to the fact that there is no one-to-one correspondence between d.f.s and the corresponding ch.f.s (although there is such one-to-one correspondence between the measures induced by the d.f.s and the respective ch.f.s). From relation (11.1), it follows that it is, actually, an entire class of d.f.s that induce the same ch.f.; namely, a d.f. $F$ and any d.f. $F + c$, $c \in \Re$ constant, for which $F(x) + c \geq 0$, and any two members of this class differ by a constant. On the other hand, by means of the inversion formula, a ch.f. defines a class of d.f.s any two of which differ by a constant. So, for a d.f. $F$, all d.f.s $F + c$, $c \in \Re$, such that $F(x) + c \geq 0$, $x \in \Re$, have the same ch.f., and a given ch.f. $f$ determines a class of d.f.s of the form just described. These observations lead to the following definition.

Definition 3. For $n \geq 1$, let $F_n$ and $F$ be d.f.s (not necessarily of r.v.s). We say that $\{F_n\}$ converges weakly to $F$ up to an additive constant, and we write $F_n \Rightarrow F$ uac (as $n \to \infty$), if for every $\{n_1\} \subseteq \{n\}$ and $\{n_2\} \subseteq \{n\}$ with $F_{n_1} \Rightarrow F_1$ and $F_{n_2} \Rightarrow F_2$, d.f.s, it holds that $F_1 - F_2 = c$, some constant. Also, we say that $\{F_n\}$ converges completely to $F$ uac, and write $F_n \stackrel{c}{\to} F$ uac, if $F_n \Rightarrow F$ uac and $\mathrm{Var}\,F_n \to \mathrm{Var}\,F$. (The convergence $F_n \stackrel{c}{\to} F$, trivially, implies $F_n \stackrel{c}{\to} F$ uac.)

Remark 6.
(i) If $F_0$ is the limiting d.f. of a (weakly) converging subsequence of $\{F_n\}$, $n \geq 1$, then any other limiting d.f. $F$ is of the form $F = F_0 + c$, some $c \in \Re$.
(ii) Let $F_0$ and $F$ be as in part (i). Then, if $F(-\infty) = F_0(-\infty) = 0$, it follows that $F(x) = F_0(x)$, $x \in \Re$. Consequently, if $F_0$ is the d.f. of a r.v., then so is $F$. Indeed, $F(x) = F_0(x) + c$, and as $x \to -\infty$, $F(-\infty) = F_0(-\infty) + c$, or $0 = 0 + c$, and $c = 0$.
(iii) Clearly, all the limiting d.f.s $F$ determine the same ch.f. $f$.
(iv) If $F_n \Rightarrow F$, then, trivially, $F_n \Rightarrow F$ uac, and if $F_n \stackrel{c}{\to} F$, then $F_n \stackrel{c}{\to} F$ uac.

Lemma 4. For $n \geq 1$, let $F_n$ be d.f.s of r.v.s with respective ch.f.s and integral ch.f.s $f_n$ and $\hat f_n$. Let $\hat g$ be as in Lemma 3 (i.e., $\hat g(t) = \int_0^t g(v)\,dv$, $t \in \Re$, with $g$ defined on $\Re$ and being continuous at the origin), so that $\hat f_n \to \hat g$ on $\Re$. Then $F_n \Rightarrow F$ uac, some d.f. $F$ with ch.f. $f$, and $\hat f = \hat g$ on $\Re$.

Proof. In the proof, all limits are taken as $\{n\}$ or subsequences thereof tend to $\infty$. Let $\{n_1\}$ and $\{n_2\}$ be any subsequences of $\{n\}$ such that $F_{n_1} \Rightarrow F_1$ and $F_{n_2} \Rightarrow F_2$, some d.f.s. (Such subsequences exist, by Theorem 5 in Chapter 8, since the $F_n$s are bounded (by 1).) Let $f_1$, $f_2$, and $\hat f_1$, $\hat f_2$ be the respective ch.f.s and integral ch.f.s of $F_1$ and $F_2$. As in the proof of Lemma 2, $\hat f_{n_i} \to \hat f_i$ on $\Re$, $i = 1, 2$, and since $\hat f_n \to \hat g$ on $\Re$, we conclude that $\hat f_1 = \hat f_2$ $(= \hat g)$ on $\Re$, and hence $f_1 = f_2$ on $\Re$. Thus, all (weakly) convergent subsequences of $\{F_n\}$ determine the same ch.f., call it $f$. Therefore $F_1 - F_2 = c$, so that $F_n \Rightarrow F$ uac, where the d.f. $F$ is in the class of d.f.s determined by $f$. Finally, $\hat f = \hat f_1 = \hat f_2 = \hat g$ from above.

Proposition 1. For $n \geq 1$, let $F_n$ be d.f.s of r.v.s with respective ch.f.s and integral ch.f.s $f_n$ and $\hat f_n$, let $g$ be a function on $\Re$ continuous at the origin, and let $\hat g(t) = \int_0^t g(v)\,dv$, $t \in \Re$ (where the integral is to be understood in the Lebesgue sense). Suppose that $f_n \to g$ on $\Re$. Then $F_n \Rightarrow F$, a (uniquely determined) d.f. of a r.v.

Proof. With all limits taken as $n \to \infty$, we have that $f_n \to g$ on $\Re$ implies $\hat f_n \to \hat g$ on $\Re$ (by Lemma 3), and this, in turn, implies (by Lemma 4) that $F_n \Rightarrow F$ uac, some d.f. with ch.f. $f$ and $\hat f = \hat g$. From $\hat f = \hat g$ on $\Re$, we have
\[
\int_0^t f(v)\,dv = \int_0^t g(v)\,dv, \; t \in \Re, \quad \text{or} \quad \frac{1}{t}\int_0^t f(v)\,dv = \frac{1}{t}\int_0^t g(v)\,dv, \; t \neq 0.
\]
Taking the limits, as $t \to 0$, we get $f(0) = g(0)$ (by continuity at 0; see also Exercise 3). From $f_n \to g$ on $\Re$, we have $1 = f_n(0) \to g(0)$, so that $g(0) = f(0) = 1$. The d.f.s $F_1$ and $F_2$ in Lemma 4 take values in $[0, 1]$, as they are limits of such sequences. So, $0 \leq F_i(x) \leq 1$, $x \in \Re$, and $\mathrm{Var}\,F_i = f(0) = 1$, $i = 1, 2$. It follows that $F_i(-\infty) = 0$, so that $F_1(x) - F_2(x) = c$ yields, as $x \to -\infty$, $0 - 0 = c$. Then the limiting d.f. $F$ in Lemma 4 is uniquely determined, and is the d.f. of a r.v.

Proof of Theorem 3*. All we have to do is to justify part (ii). However, this is the conclusion of Proposition 1.

Convergence in distribution is preserved under continuous mappings, as the following example shows.

Example 3.
Let $X_1, X_2, \ldots$, and $X$ be r.v.s such that $X_n \stackrel{d}{\to} X$ as $n \to \infty$, and let $g : \Re \to \Re$ be continuous. Then $g(X_n) \stackrel{d}{\to} g(X)$ as $n \to \infty$.

Indeed, by Theorem 3, it suffices to show that $f_{g(X_n)}(t) \to f_{g(X)}(t)$ as $n \to \infty$. However,
\[
f_{g(X_n)}(t) = Ee^{itg(X_n)} = E\cos[tg(X_n)] + iE\sin[tg(X_n)]
= \int \cos[tg(x)]\,dF_{X_n}(x) + i\int \sin[tg(x)]\,dF_{X_n}(x)
\]
\[
\underset{n\to\infty}{\longrightarrow} \int \cos[tg(x)]\,dF_X(x) + i\int \sin[tg(x)]\,dF_X(x)
\]
(by Theorem 8 in Chapter 8, since the integrands are bounded and continuous on $\Re$)
\[
= Ee^{itg(X)} = f_{g(X)}(t).
\]
The foregoing convergence is valid, because $F_{X_n} \stackrel{c}{\to} F_X$, and $\cos[tg(x)]$ and $\sin[tg(x)]$ are bounded and continuous on $\Re$, so that Theorem 8 in Chapter 8 applies.
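The following is a minimal simulation sketch (not part of the text) of Example 3, taking $g(x) = x^2$, the chi-square application mentioned next. The construction of $X_n$, the sample size, and the grid are illustrative assumptions.

```python
# A minimal simulation sketch (not from the text) illustrating Example 3: if
# X_n converges in distribution to Z ~ N(0,1) and g is continuous, then g(X_n)
# converges in distribution to g(Z).  Here X_n = (1 + 1/n) Z + 1/n, which
# clearly converges in distribution to Z, and g(x) = x^2, so the limiting law
# of g(X_n) should be chi-square with 1 d.f.  Sample sizes are arbitrary choices.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def sample_Xn(n, size):
    z = rng.standard_normal(size)
    return (1.0 + 1.0 / n) * z + 1.0 / n

grid = np.linspace(0.05, 6.0, 200)
for n in (2, 10, 100):
    xn = sample_Xn(n, 200_000)
    gx = xn ** 2                              # g(X_n) with g(x) = x^2
    ecdf = np.searchsorted(np.sort(gx), grid) / gx.size
    err = np.max(np.abs(ecdf - stats.chi2.cdf(grid, df=1)))
    print(f"n = {n:4d}: sup-distance of empirical CDF of X_n^2 from chi2(1) = {err:.4f}")
```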
As a simple application, we have that $X_n \stackrel{d}{\to} Z \sim N(0,1)$ as $n \to \infty$ implies $X_n^2 \stackrel{d}{\to} Z^2$, whose distribution is the so-called chi-square with one degree of freedom, $Z^2 \sim \chi^2_1$.

It is of some importance to observe that suitable versions of the results incorporated in the Lemmas 1, 2, 3, 4, and Proposition 1 also hold under less restrictive conditions. This is the content of the following proposition.
Proposition 2. With n ≥ 1, let Fn be (uniformly) bounded d.f.s (not necessarily of r.v.s) with ch.f.s f n and integral ch.f.s fˆn . Then (i) If Fn =⇒ F uac, some d.f. F with ch.f. f and integral ch.f. fˆ, it follows that n→∞
fˆn → fˆ on . n→∞
ˆ some function on , it follows that there exists a d.f. F (not nec(ii) If fˆn → g, n→∞
essarily of a r.v.) with ch.f. f and integral ch.f. fˆ such that Fn =⇒ Fuac and n→∞
fˆ = gˆ on .
With all limits taken as {n} or subsequences thereof tend to ∞, we have
Proof.
(i) In order to show that fˆn → fˆ on , it suffices to show (by Remark 4) that for every {m} ⊆ {n} there exists {r } ⊆ {m} such that fˆr → fˆ on . Indeed, by looking at {Fm }, there exists a subsequence {Fr } ⊆ {Fm } (by the uniform boundedness of the Fn s and Theorem 5 in Chapter 8) such that Fr ⇒ F, some d.f. with ch.f. f and integral ch.f. fˆ. Then it x it x e −1 e −1 ˆ d Fr (x) → d F(x) = fˆ(r ), fr (t) = ix ix as in the proof of Lemma 2, or fˆr → fˆ on . It follows that fˆn → fˆ on . (ii) For any {n 1 } ⊆ {n} and {n 2 } ⊆ {n} with Fn 1 ⇒ F1 and Fn 2 ⇒ F2 , some d.f.s F1 and F2 , one has (by part (i)) that fˆn 1 → fˆ1 and fˆn 2 → fˆ2 on , where fˆ1 and fˆ2 are the integral ch.f.s corresponding to F1 and F2 . Since also fˆn 1 → gˆ ˆ and hence f 1 = f 2 for the and fˆn 2 → gˆ on , it follows that fˆ1 = fˆ2 (= g), respective ch.f.s. Then F1 − F2 = c, some constant c, and hence Fn ⇒ F uac, some d.f. F with ch.f. f (= f 1 = f 2 ). Corollary. For n ≥ 1, let Fn , f n and fˆn be as in the proposition, and suppose that f n → g a.e., (with respect to Lebesgue measure) on . Then Fn =⇒ Fuac, some n→∞ n→∞ d.f. F (not necessarily of the r.v.) with ch.f. f , and f = g a.e., Proof. For n ≥ 1 and t ∈ , | f n (t)| are uniformly bounded by a constant, and f n → g a.e., where here and in the sequel all limits are taken as n → ∞. Then t t t def f n (v)dv → g(v)dv, or fˆn (t) → g(t) ˆ = g(v)dv, t ∈ . 0
0
0
Then, by part (ii) of the proposition, Fn ⇒ F uac, some d.f. F with ch.f. f and integral ch.f. fˆ, and fˆ = gˆ on . From t t ˆ f (v)dv and g(t) ˆ = g(v)dv, f (t) = 0
0
it follows that fˆ = f on , and gˆ = g a.e., (see, e.g., Theorem 10 on page 107 of Royden (1988)). However, fˆ = gˆ on . Hence f = g a.e. Proposition 3. With n ≥ 1, let Fn be (uniformly) bounded d.f.s (not necessarily of r.v.s) with ch.f.s f n and integral ch.f.s fˆn . Then c
(i) If Fn → F uac, some d.f. F with ch.f. f , it follows that f n → f on . n→∞
n→∞
(ii) If f n → g, some function on continuous function at the origin, it follows that n→∞
c
there exists a d.f. F (not necessarily of a r.v.) with ch.f. f such that Fn → F n→∞ uac and f = g on . Proof.
With all limits taken as {n} or subsequences thereof tending to ∞, we have
(i) It suffices to prove that, for every {m} ⊆ {n}, there exists {r } ⊆ {m} such that fr → f on . Looking at {Fm }, there exists {Fr } ⊆ {Fm } such that Fr ⇒ F, some d.f. F with ch.f. f . Since for each arbitrary and fixed t ∈ , eit x is bounded and continuous in (as a function of x), it follows (by Theorem 8 in Chapter 8) that it x e d Fr (x) → eit x d F(x) = f (t), t ∈ , fr (t) =
so that f n → f on . (ii) For n ≥ 1 and t ∈ , | f n (t)| are uniformly bounded by a constant, and f n → g on . Then t t def f n (v)dv → g(v)dv = g(t), ˆ t ∈ . fˆn (t) = 0
0
Therefore, by part (ii) of Proposition 2, it follows that there exists a d.f. F (not necessarily of a r.v.) with ch.f. f and integral ch.f. fˆ such that Fn ⇒ F uac and fˆ = gˆ on . That is, t t 1 t 1 t f (v)dv = g(v)dv, t ∈ , or f (v)dv = g(v)dv, t = 0. t 0 t 0 0 0 By taking the limits as t → 0, we have then (see also Exercise 3) f (0) = g(0). Since f n → g on , we have f n (0) → g(0), or V ar Fn = f n (0) → g(0) = c f (0) = V ar F. Therefore Fn →F uac.
11.4 Convergence in Distribution in the Multidimensional Case—The Cramér–Wold Device

For $k \geq 2$, one may define a $k$-dimensional d.f. $F$ and establish properties similar to those stated and proved in Chapter 8 for a one-dimensional d.f. Also, one may define its ch.f. and establish properties and results analogous to the ones shown in the first two sections of this chapter. However, we will restrict ourselves only to the definition of the d.f. of a $k$-dimensional random vector and its ch.f., and state two theorems analogous to Theorems 2 and 3.

Definition 4.
(i) The d.f. of a $k$-dimensional random vector $X = (X_1, \ldots, X_k)$, or the joint d.f. of the r.v.s $X_1, \ldots, X_k$, is defined by $F_X(x) = F_{X_1,\ldots,X_k}(x_1, \ldots, x_k) = P(X_1 \leq x_1, \ldots, X_k \leq x_k)$, $x_1, \ldots, x_k \in \Re$.
(ii) The ch.f. of the random vector $X$, or the joint ch.f. of the r.v.s $X_1, \ldots, X_k$, is defined by $f_X(t) = Ee^{it'X} = Ee^{i(t_1X_1 + \cdots + t_kX_k)}$, $t_1, \ldots, t_k \in \Re$, where "$\,'\,$" denotes transpose.
(iii) For $n \geq 1$, let $F_n$ be the d.f. of the $k$-dimensional random vector $X_n = (X_{1n}, \ldots, X_{kn})$, and let $F$ be the d.f. of the random vector $X = (X_1, \ldots, X_k)$. Then $X_n \stackrel{d}{\to} X$ or $F_n \Rightarrow F$, as $n \to \infty$, if $F_{X_n}(x) \to F_X(x)$ for all continuity points $x$ of $F_X$.

A version of Theorem 2 for the $k$-dimensional case reads as follows.

Theorem 2'. Let $X$ be a $k$-dimensional random vector, $X = (X_1, \ldots, X_k)$, with d.f. $F$ and ch.f. $f$. Then, for continuity points $a = (a_1, \ldots, a_k)$ and $b = (b_1, \ldots, b_k)$ of $F$, it holds that
\[
P(a_j < X_j \leq b_j,\ j = 1, \ldots, k)
= \lim \left(\frac{1}{2\pi}\right)^{k} \int_{-T_1}^{T_1} \cdots \int_{-T_k}^{T_k} \prod_{j=1}^{k} \frac{e^{-it_j a_j} - e^{-it_j b_j}}{it_j}\; f(t_1, \ldots, t_k)\, dt_1 \cdots dt_k,
\]
as $(0 <)\,T_j \to \infty$, $j = 1, \ldots, k$.

Also, a version of Theorem 3 is as follows.

Theorem 3'. For $n \geq 1$, let $F_n$ be d.f.s of $k$-dimensional random vectors with ch.f.s $f_n$. Suppose $F_n \Rightarrow F$, a d.f. of a $k$-dimensional random vector with ch.f. $f$, as $n \to \infty$. Then $f_n \to f$ on $\Re^k$. Conversely, if $f_n \to f$ on $\Re^k$, then $F_n \Rightarrow F$. (Thus, if $F_n$, $n \geq 1$, and $F$ are d.f.s of $k$-dimensional random vectors with respective ch.f.s $f_n$ and $f$, then in order to show that $F_n \Rightarrow F$, it suffices to show that $f_n \to f$ on $\Re^k$.)
However, a certain device, stated as the Cramér–Wold Theorem next, although it makes use of Theorem 3', reduces the actual proof of weak convergence for the $k$-dimensional case to the one-dimensional case.

Theorem 4 (Cramér–Wold). Let $X_n$, $X$ be $k$-dimensional random vectors with respective d.f.s and ch.f.s $F_n$, $F$ and $f_n$, $f$. Then $X_n \stackrel{d}{\to} X$ (i.e., $F_n \Rightarrow F$) as $n \to \infty$ if and only if, for any $c_j \in \Re$, $j = 1, \ldots, k$,
\[
\sum_{j=1}^{k} c_j X_{jn} \;\overset{d}{\underset{n\to\infty}{\longrightarrow}}\; \sum_{j=1}^{k} c_j X_j, \quad \text{or} \quad c'X_n \;\overset{d}{\underset{n\to\infty}{\longrightarrow}}\; c'X,
\]
where $X_n = (X_{1n}, \ldots, X_{kn})$, $X = (X_1, \ldots, X_k)$, and $c = (c_1, \ldots, c_k)$.

Proof. With all limits taken as $n \to \infty$, let $X_n \stackrel{d}{\to} X$, equivalently, $F_n \Rightarrow F$. Then, by Theorem 3', $f_n \to f$ on $\Re^k$, or
\[
Ee^{it_1X_{1n} + \cdots + it_kX_{kn}} \to Ee^{it_1X_1 + \cdots + it_kX_k}, \quad t_j \in \Re, \; j = 1, \ldots, k.
\]
For any $c_j \in \Re$, take $t_j = c_j t$, $j = 1, \ldots, k$, any $t \in \Re$. Then the preceding relation is rewritten thus:
\[
Ee^{i(c_1X_{1n})t + \cdots + i(c_kX_{kn})t} \to Ee^{i(c_1X_1)t + \cdots + i(c_kX_k)t}, \quad \text{or} \quad
Ee^{i(c_1X_{1n} + \cdots + c_kX_{kn})t} \to Ee^{i(c_1X_1 + \cdots + c_kX_k)t},
\]
and then, by Theorem 3, $\sum_{j=1}^{k} c_j X_{jn} \stackrel{d}{\to} \sum_{j=1}^{k} c_j X_j$.

Next, let $\sum_{j=1}^{k} c_j X_{jn} \stackrel{d}{\to} \sum_{j=1}^{k} c_j X_j$ for any $c_j \in \Re$, $j = 1, \ldots, k$. Then, by Theorem 3,
\[
Ee^{i(c_1X_{1n} + \cdots + c_kX_{kn})t} \to Ee^{i(c_1X_1 + \cdots + c_kX_k)t}, \quad t \in \Re.
\]
Since for $t = 0$ both sides of this relation are equal to 1, suppose that $t \neq 0$, and for any $t_j \in \Re$, take $c_j = t_j/t$, $j = 1, \ldots, k$. Since $c_j t = t_j$, $j = 1, \ldots, k$, the last expression becomes
\[
Ee^{it_1X_{1n} + \cdots + it_kX_{kn}} \to Ee^{it_1X_1 + \cdots + it_kX_k},
\]
and this implies that $X_n \stackrel{d}{\to} X$ by Theorem 3'.
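A small simulation sketch (not part of the text) shows how the Cramér–Wold device is used in practice: convergence in distribution of two-dimensional vectors is checked through the one-dimensional projections $c'X_n$, here by comparing empirical ch.f.s along a few directions. The covariance matrix, noise term, directions, and sample sizes are arbitrary illustrative choices.

```python
# A small simulation sketch (not from the text) of the Cramér-Wold device:
# convergence in distribution of the vectors X_n is probed through the
# one-dimensional projections c'X_n.  Here X_n = X + U_n/n with X bivariate
# normal and U_n uniform noise, so X_n converges in distribution to X.
import numpy as np

rng = np.random.default_rng(1)
cov = np.array([[1.0, 0.5], [0.5, 2.0]])
L = np.linalg.cholesky(cov)

def sample_X(size):
    return rng.standard_normal((size, 2)) @ L.T

def sample_Xn(n, size):
    return sample_X(size) + rng.uniform(-1, 1, (size, 2)) / n

t_grid = np.linspace(-2, 2, 9)
for c in (np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, -1.0])):
    proj_limit = sample_X(100_000) @ c
    for n in (2, 50):
        proj_n = sample_Xn(n, 100_000) @ c
        # compare empirical ch.f.s of c'X_n and c'X on a small grid of t values
        diff = np.max(np.abs(np.mean(np.exp(1j * np.outer(t_grid, proj_n)), axis=1)
                             - np.mean(np.exp(1j * np.outer(t_grid, proj_limit)), axis=1)))
        print(f"c = {c}, n = {n:3d}: max |f_(c'Xn)(t) - f_(c'X)(t)| = {diff:.3f}")
```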
11.5 Convolution of Distribution Functions and Related Results

The d.f.s to be considered in this section are as described in Section 11.1 of this chapter, unless otherwise specified. To this end, let $F_1$ and $F_2$ be two d.f.s, and define the function $J$ by
\[
J(x) = \int F_1(x - y)\,dF_2(y), \quad x \in \Re. \tag{11.20}
\]
Definition 5. The function $J$ defined by (11.20) is called the convolution (or composition) of the d.f.s $F_1$ and $F_2$ and is denoted by $J = F_1 * F_2$.

Theorem 5. Let $J$ be defined by (11.20). Then $J$ is a d.f., and, in particular, it is a d.f. of a r.v. if $F_1$ and $F_2$ are d.f.s of r.v.s.

Proof. Clearly, $J(x) \geq 0$ on $\Re$, and
\[
J(x) = \int F_1(x - y)\,dF_2(y) \leq C_1 \int dF_2(y) = C_1 V_2, \tag{11.21}
\]
where $C_1$ is a bound for $F_1$ and $V_2$ is the variation of $F_2$. It is nondecreasing, since for $x_1 > x_2$,
\[
J(x_1) - J(x_2) = \int [F_1(x_1 - y) - F_1(x_2 - y)]\,dF_2(y) \geq 0,
\]
as $x_1 > x_2$ implies $F_1(x_1 - y) \geq F_1(x_2 - y)$. Hence $J(x_1) \geq J(x_2)$. Next, $J$ is continuous from the right. In fact, let $x_0 \in \Re$ and let $x \downarrow x_0$. Then $F_1(x - y) \downarrow F_1(x_0 - y)$ while $F_1(x - y) \leq C_1$, independent of $x$, and integrable. Hence, the Dominated Convergence Theorem gives
\[
J(x) = \int F_1(x - y)\,dF_2(y) \to \int F_1(x_0 - y)\,dF_2(y) = J(x_0).
\]
Finally, if $F_1$ and $F_2$ are d.f.s of r.v.s, then $J(x) \leq 1$, by (11.21), and $J(-\infty) = 0$, $J(\infty) = 1$. In fact, as $x \to -\infty$, then $F_1(x - y) \to 0$, $y \in \Re$. Since again $F_1(x - y) \leq 1$, integrable, we get, by the Dominated Convergence Theorem, $J(x) = \int F_1(x - y)\,dF_2(y) \to 0$ as $x \to -\infty$; i.e., $J(-\infty) = 0$. Next, $x \to \infty$ implies $F_1(x - y) \to 1$, $y \in \Re$. As before,
\[
\int F_1(x - y)\,dF_2(y) \;\underset{x\to\infty}{\longrightarrow}\; \int 1\,dF_2(y) = 1; \quad \text{i.e.,} \quad J(x) \to 1, \text{ or } J(\infty) = 1.
\]
Remark 7. If J ∗ (x) = F2 (x − y)d F1 (y), then J ∗ (x) is also a d.f., denoted by F2 ∗ F1 . Now let f , f 1 , f 2 be the ch.f.s corresponding to F, F1 , F2 . Then the following theorem is true.
Theorem 6. If F = F1 ∗ F2 , then f = f 1 × f 2 . Conversely, if f 1 and f 2 are the ch.f.s of the d.f.s F1 and F2 , respectively, and if we set f = f 1 × f 2 , then f is the ch.f. of the d.f. F, where F = F ∗ + c, for some constant c, and F ∗ = F1 ∗ F2 .
Proof. Let $F = F_1 * F_2$. Then
\[
f(t) = \int e^{itx}\,dF(x) = \lim_{\alpha\to-\infty,\,\beta\to\infty} \int_{(\alpha,\beta]} e^{itx}\,dF(x).
\]
For $\alpha < \beta$, look at $(\alpha, \beta]$ and consider, for each $n$, the partition
\[
\alpha = x_{n1} < x_{n2} < x_{n3} < \cdots < x_{nk_n} < x_{n,k_n+1} = \beta,
\]
where the partitioning points are chosen so that $\max_{j=1,\ldots,k_n}(x_{n,j+1} - x_{nj}) \to 0$ as $n \to \infty$. Then
\[
\int_{(\alpha,\beta]} e^{itx}\,dF(x) = \lim_{n\to\infty} \sum_{j=1}^{k_n} e^{itx_{nj}}[F(x_{n,j+1}) - F(x_{nj})]
= \lim_{n\to\infty} \sum_{j=1}^{k_n} e^{itx_{nj}} \left[\int F_1(x_{n,j+1} - y)\,dF_2(y) - \int F_1(x_{nj} - y)\,dF_2(y)\right]
\]
\[
= \lim_{n\to\infty} \sum_{j=1}^{k_n} \int e^{itx_{nj}}[F_1(x_{n,j+1} - y) - F_1(x_{nj} - y)]\,dF_2(y)
\]
\[
= \lim_{n\to\infty} \int \left\{ \sum_{j=1}^{k_n} e^{it(x_{nj} - y)}[F_1(x_{n,j+1} - y) - F_1(x_{nj} - y)] \right\} e^{ity}\,dF_2(y). \tag{11.22}
\]
But $\sum_{j=1}^{k_n} e^{it(x_{nj}-y)}[F_1(x_{n,j+1}-y) - F_1(x_{nj}-y)]$ are partial sums tending to the integral of $e^{itx}$ over the interval $(\alpha - y, \beta - y]$ with respect to $F_1$. Furthermore, these partial sums are bounded in absolute value by
\[
\sum_{j=1}^{k_n} [F_1(x_{n,j+1} - y) - F_1(x_{nj} - y)] = F_1(x_{n,k_n+1} - y) - F_1(x_{n1} - y) \leq V_1,
\]
the variation of $F_1$, independent of $n$, and $F_2$-integrable, whereas
\[
\lim_{n\to\infty} \sum_{j=1}^{k_n} e^{it(x_{nj}-y)}[F_1(x_{n,j+1}-y) - F_1(x_{nj}-y)]\,e^{ity}
= \left[\int_{(\alpha-y,\beta-y]} e^{itx}\,dF_1(x)\right] e^{ity}.
\]
Then the Dominated Convergence Theorem gives
\[
\lim_{n\to\infty} \int \left\{\sum_{j=1}^{k_n} e^{it(x_{nj}-y)}[F_1(x_{n,j+1}-y) - F_1(x_{nj}-y)]\right\} e^{ity}\,dF_2(y)
= \int \left\{\lim_{n\to\infty} \sum_{j=1}^{k_n} e^{it(x_{nj}-y)}[F_1(x_{n,j+1}-y) - F_1(x_{nj}-y)]\right\} e^{ity}\,dF_2(y)
\]
\[
= \int \left[\int_{(\alpha-y,\beta-y]} e^{itx}\,dF_1(x)\right] e^{ity}\,dF_2(y),
\]
or, by (11.22),
\[
\int_{(\alpha,\beta]} e^{itx}\,dF(x) = \int \left[\int_{(\alpha-y,\beta-y]} e^{itx}\,dF_1(x)\right] e^{ity}\,dF_2(y). \tag{11.23}
\]
Next,
\[
\left|\left[\int_{(\alpha-y,\beta-y]} e^{itx}\,dF_1(x)\right] e^{ity}\right| \leq V_1, \quad \text{independent of } \alpha, \beta, \text{ and } F_2\text{-integrable},
\]
whereas
\[
\lim_{\alpha\to-\infty,\,\beta\to\infty} \left[\int_{(\alpha-y,\beta-y]} e^{itx}\,dF_1(x)\right] e^{ity} = \left[\int e^{itx}\,dF_1(x)\right] e^{ity} = f_1(t)e^{ity}.
\]
Therefore, by the Dominated Convergence Theorem again, and (11.23),
\[
\lim_{\alpha\to-\infty,\,\beta\to\infty} \int_{(\alpha,\beta]} e^{itx}\,dF(x)
= \lim_{\alpha\to-\infty,\,\beta\to\infty} \int \left[\int_{(\alpha-y,\beta-y]} e^{itx}\,dF_1(x)\right] e^{ity}\,dF_2(y)
= \int \left[\lim_{\alpha\to-\infty,\,\beta\to\infty} \int_{(\alpha-y,\beta-y]} e^{itx}\,dF_1(x)\right] e^{ity}\,dF_2(y)
\]
\[
= \int f_1(t)e^{ity}\,dF_2(y) = f_1(t)\int e^{ity}\,dF_2(y) = f_1(t)f_2(t). \tag{11.24}
\]
Since the left-hand side of (11.24) is $\int e^{itx}\,dF(x) = f(t)$, we have then $f(t) = f_1(t)f_2(t)$. Thus, if $F = F_1 * F_2$, then $f = f_1 \times f_2$.

For the converse, from $F^* = F_1 * F_2$ and the direct part, we have $f^* = f_1 \times f_2$, where $f^*$ is the ch.f. of $F^*$. Also, $f = f_1 \times f_2$. Thus, $f = f^*$. Hence $f$ is a ch.f. and the corresponding d.f. is $F = F^* + c$, for some constant $c$.

To this theorem, there are the following three corollaries.

Corollary 1.
The product of two ch.f.s is a ch.f.
Proof. Let f 1 and f 2 be two ch.f.s. For j = 1, 2, the ch.f. f j determines a class C j of d.f.s, any two of which differ by a constant. Let F j ∈ C j , j = 1, 2, be any two d.f.s, and let J = F1 ∗ F2 . Then, by Theorem 5, J is a d.f., and let f be its ch.f. Then, by Theorem 6, f = f 1 × f 2 , so that the product f 1 × f 2 is a ch.f. Corollary 2. For any two d.f.s F1 and F2 , we have F1 ∗ F2 = F2 ∗ F1 uac, and F1 ∗ F2 = F2 ∗ F1 , if F1 and F2 are d.f.s of r.v.s. Proof. Let F1 ∗ F2 = J and F2 ∗ F1 = J ∗ with respective ch.f.s f and f ∗ . Then, by Theorem 6, f = f 1 × f 2 and f ∗ = f 2 × f 1 . Since f 1 × f 2 = f 2 × f 1 , we have f = f ∗ , so that J − J ∗ = c, for some constant c, which proves the first assertion. For the second assertion, we have that if both F1 and F2 are d.f.s of r.v.s, then J and J ∗ are d.f.s of r.v.s by Theorem 5. Since f = f ∗ , by the first part here, it follows that J = J ∗ (see Remark 6(ii)). Corollary 3. If F1 and F2 are, respectively, the d.f.s of the independent r.v.s X 1 and X 2 , then F = F1 ∗ F2 (= F2 ∗ F1 ) is the d.f. of the r.v. X 1 + X 2 . Proof.
By Lemma 1 in Chapter 10, f X 1 +X 2 (t) = Eeit(X 1 +X 2 ) = E(eit X 1 × eit X 2 ) = (Eeit X 1 )(Eeit X 2 ) = f X 1 (t) f X 2 (t) = f (t),
where f is the ch.f. of F1 ∗ F2 (by Corollary 1). Since F1 ∗ F2 is a d.f. of a r.v., it follows that it is the d.f. of X 1 + X 2 . This section is concluded with the definition of symmetry of a r.v., and some results related to it. Definition 6. The r.v. X is said to be symmetric about zero, if the r.v.s X and −X have the same distribution; i.e., P(X ≤ x) = P(−X ≤ x) = P(X ≥ −x), x ∈ .
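Before turning to symmetry, here is a brief numerical sketch (not part of the text) of Corollary 3: for independent absolutely continuous r.v.s, the density of $X_1 + X_2$ obtained by numerically convolving the two densities agrees with the empirical distribution of simulated sums. The choice of Gamma(2, 1) and N(0, 1), the grid, and the sample size are arbitrary assumptions.

```python
# A small numerical sketch (not from the text) of Corollary 3: if X1 and X2 are
# independent, the distribution of X1 + X2 is the convolution F1 * F2.  Here both
# distributions are absolutely continuous, so the convolution is carried out on
# the densities; distributions, grid, and sample size are illustrative choices.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
dx = 0.01
x = np.arange(-10, 20, dx)

p1 = stats.gamma.pdf(x, a=2.0)       # density of X1 ~ Gamma(2, 1)
p2 = stats.norm.pdf(x)               # density of X2 ~ N(0, 1)
p_sum = np.convolve(p1, p2) * dx     # numerical convolution of the densities
x_sum = 2 * x[0] + dx * np.arange(len(p_sum))

# Monte Carlo check: empirical density of X1 + X2 versus the convolved density
samples = rng.gamma(2.0, 1.0, 300_000) + rng.standard_normal(300_000)
hist, edges = np.histogram(samples, bins=200, range=(-5, 15), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
conv_at_centers = np.interp(centers, x_sum, p_sum)
print("max |histogram - convolved density| =", np.max(np.abs(hist - conv_at_centers)))
```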
Theorem 7. We have:

(i) For any ch.f. $f$, $f(-t) = \overline{f(t)}$, $t \in \Re$.
(ii) If $f_X$ is the ch.f. of the r.v. $X$, then $\overline{f_X}$ is the ch.f. of the r.v. $-X$.
(iii) $X$ is symmetric about zero if and only if its ch.f. $f_X$ is real.

Proof.
(i) Let $F$ be any d.f. corresponding to $f$. Then
\[
f(-t) = \int e^{-itx}\,dF(x) = \int [\cos(-tx) + i\sin(-tx)]\,dF(x) = \int [\cos(tx) - i\sin(tx)]\,dF(x)
= \overline{\int [\cos(tx) + i\sin(tx)]\,dF(x)} = \overline{\int e^{itx}\,dF(x)} = \overline{f(t)}.
\]
(See also Theorem 1(iv).)
(ii) By Theorem 1(iii), $f_{\alpha X + \beta}(t) = e^{i\beta t} f_X(\alpha t)$. For $\alpha = -1$, $\beta = 0$, this becomes $f_{-X}(t) = f_X(-t)$. But $f_X(-t) = \overline{f_X(t)}$. Thus $\overline{f_X(t)}$ is the ch.f. of $-X$. (Or, by part (i), $f_{-X}(t) = Ee^{it(-X)} = Ee^{i(-t)X} = f_X(-t) = \overline{f_X(t)}$.)
(iii) Let $X$ be symmetric about zero with d.f. $F_X$. Then
\[
f_X(t) = \int e^{itx}\,dF_X(x) = \int e^{itx}\,dF_{-X}(x) = f_{-X}(t) = \overline{f_X(t)}
\]
by part (ii); i.e., $f_X = \overline{f_X}$, so that $f_X$ is real. Next, let $f_X$ be real. Then $f_X = \overline{f_X}$. But $\overline{f_X} = f_{-X}$ by part (ii). Thus $f_X = f_{-X}$, or $F_X$ and $F_{-X}$ are the same; hence, $X$ is symmetric about zero.
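A quick empirical sketch (not part of the text) of Theorem 7(iii): for a r.v. symmetric about zero the empirical ch.f. is (up to Monte Carlo error) real, while for a skewed r.v. it has a visibly nonzero imaginary part. The two distributions, sample size, and grid below are arbitrary illustrative choices.

```python
# A quick numerical sketch (not from the text) of Theorem 7(iii): a r.v.
# symmetric about zero has a real ch.f.; a skewed r.v. does not.
import numpy as np

rng = np.random.default_rng(3)
t = np.linspace(-3, 3, 13)

sym = rng.laplace(size=500_000)              # symmetric about zero
skew = rng.exponential(size=500_000) - 1.0   # mean zero but not symmetric

for name, x in (("Laplace (symmetric)", sym), ("centered exponential", skew)):
    chf = np.mean(np.exp(1j * np.outer(t, x)), axis=1)   # empirical ch.f. on the grid t
    print(f"{name:25s}: max |Im f_X(t)| = {np.max(np.abs(chf.imag)):.4f}")
```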
11.6 Some Further Properties of Characteristic Functions

In this section, two main results (Theorems 8 and 9) are established pertaining to ch.f.s of r.v.s. Theorem 8 (and its corollary) may also be established for certain ch.f.s that are not necessarily ch.f.s of r.v.s.

Theorem 8. For $n = 1, 2, \ldots$, let $f_n$, $f$ be ch.f.s of r.v.s. Then, if $f_n \to f$ on $\Re$ as $n \to \infty$, it follows that the convergence is uniform in closed intervals in $\Re$.
Proof. It suffices to prove that f n (t) → f (t) uniform in t ∈ [−T , T ], T > 0. Let Fn , F be the d.f.s corresponding to f n , f . Then we have it x it x | f n (t) − f (t)| = e d Fn (x) − e d F(x) it x it x ≤ e d Fn (x) − e d F(x) (α,β] (α,β] + eit x d Fn (x) + eit x d F(x) . (11.25) −(α,β]
But
−(α,β]
eit x d F(x) ≤ d F(x) = 1 − [F(β) − F(α)], −(α,β] −(α,β] eit x d Fn (x) ≤ d Fn (x) = 1 − [Fn (β) − Fn (α)].
−(α,β]
−(α,β]
Thus, (11.25) becomes
| f n (t) − f (t)| ≤
(α,β]
e
it x
d Fn (x) −
(α,β]
e
it x
d F(x)
+{1 − [F(β) − F(α)]} + {1 − [Fn (β) − Fn (α)]}.
(11.26)
Pick α, β to be continuity points F and such that 1 − [F(β) − F(α)] <
ε . 7
(11.27)
Now, with n → ∞, f n → f implies Fn ⇒ F(by Theorem 3) so that Fn (α) → F(α), ε 7 by means of (11.27). Then
Fn (β) → F(β) and 1 − [Fn (β) − Fn (α)] < 1 − [F(β) − F(α)] + for n > n 1 = n 1 (ε), and the last expression is < (11.26) becomes it x e d Fn (x) − | f n (t) − f (t)| ≤ n ≥ n1.
(α,β]
2ε 7
(α,β]
e
it x
3ε d F(x) + , 7 (11.28)
The proof of the theorem would be completed (by means of (11.28)) if we knew that eit x d Fn (x) → eit x d F(x) uni f or mly in t ∈ [−T , T ]. (α,β]
n→∞ (α,β]
This convergence is true for each t (by the Helly–Bray Lemma, Theorem 6 in Chapter 8), and the uniformity in t ∈ [−T , T ] is the content of the following result.
Statement. Under the assumptions of Theorem 8 and with α and β being continuity points of F, it holds it x e d Fn (x) → eit x d F(x) uni f or mly in t ∈ [−T , T ]. n→∞ (α,β]
(α,β]
Proof.
Pick points α = x1 < x2 < · · · < x N < x N +1 = β
to be continuity points of F and such that max (xk+1 − xk ) ≤
k=1,...,N
ε , 7T
and on (α, β], define the function gt as follows: gt (x) = eit xk if x ∈ (xk , xk+1 ], k = 1, . . . , N . Pick n ≥ n 2 = n 2 (ε, N ), so that |Fn (xk ) − F(xk )| <
ε , k = 1, . . . , N + 1. 7(N + 1)
(11.29)
Next,
(α,β]
e
it x
d Fn (x) −
≤
(α,β]
(α,β]
e
it x
d F(x)
|eit x − gt (x)|d Fn (x)
|eit x − gt (x)|d F(x) + gt (x)d Fn (x) (α,β] (α,β] − gt (x)d F(x) .
+
(α,β]
(11.30)
But for x ∈ (α, β], |eit x − gt (x)| = |eit x − eit xk | (for some xk ) and this equals it xk it(x−xk ) − 1 = eit(x−xk ) − 1 ≤ |t(x − xk )| , e e since ei x − 1 ≤ |x|, x ∈ (see also Exercise 4). Then the preceding expression is ε ≤ T |x − xk | < T × 7T = 7ε . Thus, (11.30) becomes 2ε it x it x e d Fn (x) − e d F(x) < 7 (α,β] (α,β] (11.31) gt (x)d Fn (x) − gt (x)d F(x) . + (α,β]
(α,β]
But, by the definition of gt (x), g (x)d F (x) − t n (α,β]
(α,β]
gt (x)d F(x)
N N = eit xk [Fn (xk+1 ) − Fn (xk )] − eit xk [F(xk+1 ) − F(xk )] k=1
≤
N
k=1
|[Fn (xk+1 ) − Fn (xk )] − [F(xk+1 ) − F(xk )]|
k=1
=
N
|[Fn (xk+1 ) − F(xk+1 )] − [Fn (xk ) − F(xk )]|
k=1
≤2
N +1
|Fn (xk ) − F(xk )|
k=1
≤2(N + 1) ×
2ε ε = by (11.29). 7(N + 1) 7
Thus, (11.31) yields, for n ≥ n 2 = n 2 (ε, N ), 4ε it x it x < e d F (x) − e d F(x) , n 7 (α,β] (α,β]
(11.32)
as was to be seen. Completion of the Proof of Theorem 8. For t ∈ [−T , T ], (11.28) becomes, by means of (11.32), and for n ≥ n(ε) = max{n 1 (ε), n 2 (ε, N )}, | f n (t)− f (t)| ≤
3ε 4ε + = ε. 7 7
Next, we recall the definition of continuous convergence and derive a simple result to be employed in the corollary following. Definition 7. For n ≥ 1, let gn , g be functions defined on A ⊆ into . Then, as n → ∞, we say that gn → g continuously in A if gn (tn ) → g(t) whenever tn → t, tn , t ∈ A. Proposition 4. Let gn → g uniformly in A ⊆ and let g be continuous. Then n→∞ gn → g continuously in A. n→∞
Proof. We have |gn (tn ) − g(t)| ≤ |gn (tn ) − g(tn )| + |g(tn ) − g(t)|, and gn (tn ) − g(tn ) → 0 by uniform convergence of gn and g(tn ) − g(t) → 0 by continuity n→∞ n→∞ of g. Corollary to Theorem 8. If f n → f and tn → t, tn , t ∈ , then f n (tn ) → f (t) n→∞ n→∞ n→∞ (i.e., f n converges continuously in ).
Proof. With $n \to \infty$, let $t_n \to t$, $t_n, t \in [-T, T]$, $n \geq 1$, for some $T > 0$. Then $f_n \to f$ uniformly in $[-T, T]$ and $f$ is continuous. Hence, by Proposition 4, the convergence is continuous in $[-T, T]$. In particular, $f_n(t_n) \to f(t)$.

Lemma 5. Let $X$ be a r.v. with ch.f. $f$. Then, if $E|X|^n < \infty$ for some positive integer $n$, it follows that $f^{(n)}(t) = \frac{d^n}{dt^n}f(t)$ is continuous in $\Re$.

Proof. As was seen in the proof of Theorem 1(v), $f^{(n)}(t) = \int i^n e^{itx}x^n\,dF(x)$. Next,
\[
i^n e^{itx}x^n \;\underset{t\to t_0}{\longrightarrow}\; i^n e^{it_0x}x^n \quad \text{and} \quad |i^n e^{itx}x^n| \leq |x|^n,
\]
independent of $t$ and integrable. Hence the Dominated Convergence Theorem completes the proof.

Theorem 9. Let $X$ be a r.v. such that $E|X|^n < \infty$ for some positive integer $n$, and let $f$ be its ch.f. Set $m^{(k)} = EX^k$, $k = 0, 1, \ldots, n$. Then one has
\[
f(t) = \sum_{k=0}^{n-1} \frac{m^{(k)}}{k!}(it)^k + \rho_n(t), \quad t \in \Re,
\]
where

(i) $\rho_n(t) = t^n \int_0^1 \frac{(1-x)^{n-1}}{(n-1)!} f^{(n)}(tx)\,dx$ (where the integral exists because of Lemma 5), or

(ii) $\rho_n(t) = \frac{m^{(n)}}{n!}(it)^n + o(t^n)$, or
11.6 Some Further Properties of Characteristic Functions
t 1 (t − u)n−1 d f (n−1) (u) (n − 1)! 0 1 (n − 1) = (t − u)n−1 f (n−1) (u)|t0 + (n − 1)! (n − 1)! t × (t − u)n−2 f (n−1) (u)du
=
0
t n−1 f (n−1) (0) (n − 1)! t 1 (t − u)n−2 f (n−1) (u)du; + (n − 2)! 0 i.e., integrating by parts, we get t 1 n−2 f (n−1) (u)du. f (n−1) (0) + (n−2)! 0 (t − u) =−
n−1
t ρn (t) = − (n−1)! Also, t 1 n−2 f (n−1) (u)du = (n−2)! 0 (t − u)
=
t 1 n−2 d f (n−2) (u) (n−2)! 0 (t − u) 1 n−2 f (n−2) (u)|t 0 (n−2)! (t − u) (n−2) t n−3 (n−2) + (n−2)! 0 (t − u) f (u)du t n−2 (n−2)
=−
(n − 2)! t 1
+ (n−3)!
f
0 (t
(0)
− u)n−3 f (n−2) (u)du,
so that t n−1 t n−2 f (n−1) (0) − f (n−2) (0) (n − 1)! (n − 2)! t 1 (t − u)n−3 f (n−2) (u)du. + (n − 3)! 0
ρn (t) = −
Proceeding in this manner, the (n − 1)th integration by parts yields t t (1) t − f (0) + f (1) (u)du = − f (1) (0) + f (t) − 1, 1! 1! 0 so that ρn (t) = −1 −
t (1) t n−2 f (0) − · · · − f (n−2) (0) 1! (n − 2)!
t n−1 f (n−1) (0) + f (t) (n − 1)! n−1 k t (k) f (0) + f (t) =− k! −
k=0
=−
n−1 k t k=0
k!
i k m (k) + f (t)
(since f (k) (0) = i k m (k) ) =−
n−1 (k) m
k!
k=0
f (t) =
(it)k + f (t), so that
n−1 (k) m k=0
(ii) By part (i), ρn (t) =
1 tn (n−1)! 0 (1 −
(n − 1)! ρn (t) = tn
k!
(it)k + ρn (t), t ∈ .
x)n−1 f (n) (t x)d x, so that, for t = 0,
1
(1 − x)n−1 f (n) (t x)d x 0 1 = (1 − x)n−1 (iu)n eit xu d F(u) d x 0 dn since n f (λ) dλ n n ∂ iλu d iλu e d F(u) = e = d F(u) n dλn ∂λ = (iu)n eiλu d F(u) 1
(1 − x)n−1 u n eit xu d F(u)d x 1 n n n−1 it xu u (1 − x) e d x d F(u) =i = in
0
(11.33)
0
(by the Fubini Theorem, which applies since |(1 − x)n−1 u n eit xu | ≤ |u|n is Lebesgue × F-integrable over [0, 1] × ). Now, |(1 − x)n−1 eit xu | = |1 − x|n−1 ≤ 1 (over [0, 1]) independent of t and Lebesgue-integrable over [0, 1]. Furthermore, (1 − x)n−1 eit xu → (1 − x)n−1 , t→0
so that the Dominated Convergence Theorem yields 1 1 1 (1 − x)n−1 eit xu d x → (1 − x)n−1 d x = . t→0 0 n 0 Next, 1 n n−1 it xu u ≤ |u|n independent of t, and F (1 − x) e d x 0
−integrable.
Furthermore, u n
1 0
un . Therefore, by the Dominated Cont→0 n
(1−x)n−1 eit xu d x →
vergence Theorem, 1 n u d F(u) un (1 − x)n−1 eit xu d x d F(u) → t→0 n 0 1 = m (n) . n n 1 It follows that u [ 0 (1 − x)n−1 eit xu d x]d F(u) = n1 m (n) + o(1) (where o(1) → 0 as t → 0), and then by (11.33), (n − 1)! i n (n) m + o(1) or ρn (t) ρ (t) = n tn n m (n) m (n) (it)n + t n o(1) = (it)n + o(t n ), = n! n! as was to be seen. (iii) Again, as in part (ii) (see first and last lines on the right-hand side of relation (11.33)), 1 1 n n−1 it xu (1 − x)n−1 f (n) (t x)d x = i n u (1 − x) e d x d F(u) 0
≤
0 1
|u| (1 − x) d x d F(u) 0 1 1 = |u|n d F(u) = μ(n) , n n so that
|ρn (t)| =
tn (n − 1)!
0
n
n−1
1
|t|n (n) μ . (1 − x)n−1 f n (t x)d x ≤ n!
Then there exists θ = θ (n, t) with |θ | ≤ 1 such that ρn (t) = θ |t|n! × μ(n) = n
(n)
θ μn! |t|n .
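The expansion of Theorem 9 and the bound of part (iii) are easy to check numerically in a concrete case; the following sketch (not part of the text) uses $X \sim$ Uniform$(0,1)$, for which $m^{(k)} = 1/(k+1)$, $\mu^{(n)} = 1/(n+1)$, and $f(t) = (e^{it}-1)/(it)$. The values of $n$ and $t$ used are arbitrary choices.

```python
# A short numerical sketch (not from the text) of Theorem 9: for X ~ Uniform(0,1),
# m^(k) = 1/(k+1), mu^(n) = 1/(n+1), and f(t) = (e^{it} - 1)/(it).  The remainder
# rho_n(t) = f(t) - sum_{k<n} m^(k) (it)^k / k! should satisfy (by part (iii))
# |rho_n(t)| <= mu^(n) |t|^n / n!.
import numpy as np
from math import factorial

def f(t):                       # exact ch.f. of Uniform(0,1)
    return (np.exp(1j * t) - 1.0) / (1j * t)

for n in (1, 2, 4):
    for t in (0.5, 2.0, 5.0):
        partial = sum((1.0 / (k + 1)) * (1j * t) ** k / factorial(k) for k in range(n))
        rho = f(t) - partial
        bound = (1.0 / (n + 1)) * abs(t) ** n / factorial(n)
        print(f"n={n}, t={t:4.1f}: |rho_n(t)| = {abs(rho):.5f} <= bound {bound:.5f}",
              abs(rho) <= bound + 1e-12)
```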
11.7 Applications to the Weak Law of Large Numbers and the Central Limit Theorem

The following two results are applications of two theorems in the previous sections, Theorems 3 and 9.

Application 1 (Weak Law of Large Numbers, WLLN). Let $X_1, \ldots, X_n$ be i.i.d. r.v.s with $EX_1 = m^{(1)}$ finite. Then
\[
\bar X_n = \frac{X_1 + \cdots + X_n}{n} \;\overset{P}{\underset{n\to\infty}{\longrightarrow}}\; m^{(1)}.
\]
Discussion. By Theorem 9(ii), applied for $n = 1$, we get
\[
f(t) = 1 + \frac{m^{(1)}}{1!}(it) + o(t) \quad \text{(where $f$ is the ch.f. of the $X_i$s)}.
\]
Hence
\[
f_{\bar X_n}(t) = f^n\!\left(\frac{t}{n}\right) = \left[1 + m^{(1)}i\frac{t}{n} + o\!\left(\frac{t}{n}\right)\right]^n
= \left[1 + m^{(1)}\frac{it}{n} + \frac{t}{n}o(1)\right]^n = \left[1 + \frac{im^{(1)}t + t\,o(1)}{n}\right]^n,
\]
where, for a fixed $t$, $o(1) \to 0$ as $n \to \infty$, so that $im^{(1)}t + t\,o(1) \to im^{(1)}t$. This implies that
\[
\left[1 + \frac{im^{(1)}t + t\,o(1)}{n}\right]^n \;\underset{n\to\infty}{\longrightarrow}\; e^{im^{(1)}t}, \tag{11.34}
\]
so that $f_{\bar X_n}(t) \to e^{im^{(1)}t}$ as $n \to \infty$, which is the ch.f. of the r.v. $X$ that is equal to $m^{(1)}$ with probability one. Hence, by Theorem 3, $\bar X_n \stackrel{d}{\to} m^{(1)}$ or, equivalently, $\bar X_n \stackrel{P}{\to} m^{(1)}$, as $n \to \infty$.

Remark 8. In relation (11.34), we use the familiar result $\left(1 + \frac{c_n}{n}\right)^n \to e^c$ when $c_n \to c$ as $n \to \infty$ (see also Exercise 5).
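A brief simulation sketch (not part of the text) of Application 1: the sample means of i.i.d. r.v.s concentrate around $m^{(1)}$ as $n$ grows. The exponential distribution (for which $m^{(1)} = 1$), the sample sizes, and the tolerance 0.1 are arbitrary illustrative choices.

```python
# A brief simulation sketch (not from the text) of the WLLN (Application 1).
import numpy as np

rng = np.random.default_rng(4)
m1 = 1.0                                      # E X_1 for the standard exponential
for n in (10, 100, 10_000):
    xbar = rng.exponential(1.0, size=(1_000, n)).mean(axis=1)   # 1,000 replications of the sample mean
    print(f"n = {n:6d}: P(|mean - m^(1)| > 0.1) approx {np.mean(np.abs(xbar - m1) > 0.1):.3f}")
```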
Application 2 (Central Limit Theorem, CLT). Let $X_1, \ldots, X_n$ be i.i.d. r.v.s with $EX_1 = m^{(1)}$ finite and $\sigma^2(X_1) = \sigma^2 \in (0, \infty)$. Then
\[
\frac{S_n - ES_n}{\sigma(S_n)} = \frac{\sqrt{n}\,[\bar X_n - m^{(1)}]}{\sigma} \;\overset{d}{\underset{n\to\infty}{\longrightarrow}}\; Z \sim N(0,1),
\quad \text{where} \quad S_n = \sum_{j=1}^{n} X_j \quad \text{and} \quad \bar X_n = \frac{1}{n}\sum_{j=1}^{n} X_j.
\]
Discussion. Set $Y_j = \frac{X_j - m^{(1)}}{\sigma}$. Then the r.v.s $Y_1, \ldots, Y_n$ are i.i.d. with $EY_1 = 0$ and $\sigma^2(Y_1) = EY_1^2 = 1$. Let $f_1$ be the ch.f. of the $Y_i$s. Then, by Theorem 9(ii), applied for $n = 2$, we get
\[
f_1(t) = 1 + \frac{EY_1}{1!}(it) + \frac{EY_1^2}{2!}(it)^2 + o(t^2) = 1 - \frac{t^2}{2} + o(t^2) = 1 - \frac{t^2}{2} + t^2 o(1).
\]
Hence
\[
f_{\frac{S_n - ES_n}{\sigma(S_n)}}(t) = f_{\sum_{j=1}^{n}\frac{X_j - m^{(1)}}{\sigma\sqrt{n}}}(t) = f_{\sum_{j=1}^{n} Y_j/\sqrt{n}}(t) = f_1^n\!\left(\frac{t}{\sqrt{n}}\right)
= \left[1 - \frac{t^2}{2n} + \frac{t^2}{n}o(1)\right]^n = \left[1 + \frac{-\frac{t^2}{2} + t^2 o(1)}{n}\right]^n \;\underset{n\to\infty}{\longrightarrow}\; e^{-\frac{t^2}{2}},
\]
since $-\frac{t^2}{2} + t^2 o(1) \to -\frac{t^2}{2}$ as $n \to \infty$, $o(1) \to 0$ for a fixed $t$. The fact that $e^{-t^2/2}$ is the ch.f. of a r.v. $Z \sim N(0,1)$ completes the proof.
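Similarly, a brief simulation sketch (not part of the text) of Application 2: the standardized sums of i.i.d. exponential r.v.s are compared with the N(0, 1) distribution. The distribution, sample sizes, number of replications, and grid are arbitrary illustrative choices.

```python
# A brief simulation sketch (not from the text) of the CLT (Application 2).
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
grid = np.linspace(-3, 3, 61)
for n in (5, 50, 500):
    x = rng.exponential(1.0, size=(2_000, n))          # m^(1) = 1, sigma = 1
    z = (x.sum(axis=1) - n) / np.sqrt(n)                # standardized sums
    ecdf = np.searchsorted(np.sort(z), grid) / z.shape[0]
    print(f"n = {n:4d}: sup |ECDF - Phi| = {np.max(np.abs(ecdf - stats.norm.cdf(grid))):.4f}")
```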
11.8 The Moments of a Random Variable Determine its Distribution This section consists of two main results, Theorems 10 and 11. In the first of these results, a condition is given under which a ch.f. expands into an infinite series, and in the second, conditions are stated under which the moments of a r.v. completely determine its distribution. In the proof of Theorem 11, one needs a number of concepts and results from complex analysis, which are presented in the next section. Let X be a r.v. such that E|X |n < ∞ for n = 1, 2, . . ., and let f be m (n) n its ch.f. Then for any t ∈ for which the series ∞ n=0 n! (it) converges, one has ∞ m (n) f (t) = n=0 n! (it)n .
Theorem 10.
Proof.
Set Sn (t) =
n m (k) k=0
k!
(it)k ,
and observe that the assertion is true for t = 0 (00 = 1). So, let t0 = 0 be a point for (k) (2r ) which Sn (t0 ) converges. Then mk! (it0 )k → 0. In particular, m(2r )! (it0 )2r → 0, and k→∞
this is equivalent to
r →∞
μ(2r ) 2r |t0 | → 0, r →∞ (2r )!
since μ(2r ) = m (2r ) . From Theorem 9 (iii), applied for n = 2r , we have f (t0 ) =
2r −1 k=0
so that
m (k) μ(2r ) 2r μ(2r ) 2r (it0 )k + θ |t0 | = S2r −1 (t0 ) + θ |t0 | , k! (2r )! (2r )!
μ(2r ) μ(2r ) |t0 |2r ≤ |t |2r → 0 | f (t0 ) − S2r −1 (t0 )| = θ (2r )! 0 r →∞ (2r )!
(11.35)
on account of (11.35). Thus S2r −1 (t0 ) → f (t0 ).
(11.36)
r →∞
Next,
μ(2r ) m (2r ) |S2r (t0 ) − S2r −1 (t0 )| = (it0 )2r = |t |2r → 0 (2r )! 0 r →∞ (2r )!
by (11.35). Hence S2r (t0 ) → f (t0 ) and this together with (11.36) gives that r →∞ Sn (t0 ) → f (t0 ). n→∞ ∞ Remark 9. Recall that if n=0 cn z n is a power series where z is, in general, a complex variable, then the radius ρ of convergence of the series is given by ρ = 1/ lim sup |cn |1/n , n→∞
so that ρ −1 = lim supn→∞ |cn |1/n , and the series converges for |z| < ρ. Lemma 6. Let X be a r.v. such that μ(n) = E|X |n < ∞ for all n = 1, 2, . . . (so that m (n) = E X n are also finite). Then the series S(t) =
∞ m (n) n=0
n!
∗
(it) and S (t) = n
∞ μ(n) n=0
n!
|t|n
have the same radii of convergence. Let ρ1 and ρ2 be the radii of convergence of the series S(t) and S ∗ (t), (n) 1/n (n) 1/n respectively. Then, since mn! ≤ μn! (by the fact that |E X n | ≤ E|X |n ),
Proof.
it follows that equivalently,
1 ρ2
or ρ1 ≥ ρ2 . Thus, it suffices to show that ρ1 ≤ ρ2 , or (2n) 1/2n (2n) 1/2n ≤ ρ11 . To this end, we have μ(2n)! = m(2n)! , so that
1 ρ1
≤
1 ρ2 ,
μ(2n) lim sup (2n)! n→∞
1/2n
m (2n) = lim sup (2n)! n→∞
1/2n ≤
1 . ρ1
(11.37)
Next, [μ(2n−1) ]1/(2n−1) ≤ [μ(2n) ]1/2n (by the fact that E 1/r |X |r ↑ in r > 0), so that 1/(2n−1) 1/(2n) μ(2n) μ(2n−1) [(2n)!]1/2n ≤ × . (11.38) (2n − 1)! (2n)! [(2n − 1)!]1/(2n−1) At this point, assume for a moment that [(2n)!]1/2n = 1. n→∞ [(2n − 1)!]1/(2n−1) lim
(11.39)
Then (11.38) gives by means of (11.37)
μ(2n−1) lim sup (2n − 1)! n→∞
1/(2n−1)
Since
⎧ ⎨
1/n
1 . ρ1
μ(2n) = max lim sup ⎩ n→∞ (2n)! n! 1/(2n−1) ⎫ ⎬ μ(2n−1) lim sup , ⎭ (2n − 1)! n→∞
1 = lim sup ρ2 n→∞
μ(n)
≤
we have, from (11.37) and (11.40), that Remark 10.
1 ρ2
≤
1 ρ1 ,
(11.40)
1/(2n) ,
so that ρ1 = ρ2 .
The proof of (11.39) is left as an exercise (see Exercise 7).
Corollary to Lemma 6. A sufficient condition for the series S(t) to converge for some t0 = 0 is that m (n) 1/n [μ(n) ]1/n < ∞. lim sup < ∞ or lim sup n n→∞ n! n→∞ Proof. The first assertion is immediate and the second follows from the first by Stirling’s formula (see Exercise 8). Theorem 11. Let X be r.v. with d.f. F and ch.f. f , and suppose that E X n = m (n) ∈ , n ≥ 1, and let μ(n) = E|X |n . Then m (n) n (i) If the series S(t) = ∞ n=0 n! (it) converges for some t0 = 0, it follows that the distribution of X is uniquely determined (by the moments of X ). (ii) A sufficient condition for S(t) to converge for some t0 = 0 is that (n) 1/n lim supn→∞ μn! < ∞. Proof. (i) Let F0 be a d.f. of a r.v., potentially different from F, with corresponding ch.f. f 0 , such that
x n d F0 (x) = m (n) , n ≥ 0.
m (n) n Then the series S0 (t) = ∞ n=0 n! (it) (= S(t)) converges for t = t0 = 0, and hence for all t with |t| ≤ t0 , supposing without loss of generality that t0 > 0. For each such t, S(t) represents f (t), and S0 (t) represents f 0 (t) on account of
Theorem 10. Then, by Fact 4 in the next section, f (z) and f 0 (z) are defined and are analytic for |I m(z)| < t0 . Furthermore, f
(n)
∞ i k m (k) k−n (n) (z) = = f 0 (z), |z| < t0 , z (k − n)! k=n
and in particular, f (z) =
∞ k (k) i m k=0
k!
z k = f 0 (z), |z| < t0 .
(11.41)
Thus, f (z) and f 0 (z) are analytic for |I m(z)| < t0 , and f (z) = f 0 (z) for |z| < t0 , by means of (11.41). Then, by Fact 5, f (z) = f 0 (z) for |I m(z)| < t0 (by continuous extension). In particular, f (t) = f 0 (t) for t ∈ , so that F = F0 . (ii) Immediate by the Corollary to Lemma 6. The last theorem is illustrated by the following example. Let Z ∼ N (0, 1), so that m (2k) =
Example 4. Then f (t) =
∞ m (2k) k=0
(2k)!
(it)
2k
(2k)! , m (2k+1) 2k k!
= 0, k = 0, 1, . . ..
2 k ∞ ∞ t2 t 1 1 2k (it) = − = = e− 2 , k 2 k! k! 2 k=0
k=0
as was expected. This section is concluded with some comments and a result involving the expansion of a logarithmic function. Recall that if z is a complex number, then the log z is any complex number w such that ew = z. This relation defines a many-valued function, since if w is a solution of ew = z, then so is w + 2nπi, n = 0, ±1, . . ., because e2nπi = 1. Now, every complex number z may be written as z = |z|eiθ for some θ with −π < θ ≤ π . Then w = log |z| + iθ (|z| = 0), −π < θ ≤ π , is a solution of ew = z, since ew = elog |z|+iθ = |z|eiθ = z. This solution is called the principal branch of the logarithm of z and is usually denoted by log p z. In all that follows, we shall work with log p although we shall not indicate the p. The following result will prove useful in many situations. Lemma 7.
For any complex number z, one has log(1 + z) = z[1 + ε(z)] = z(1 + θ z), if |z| ≤
where |ε(z)| ≤ |z| and |θ | = |θ (z)| ≤ 1. Proof.
For |z| < 1, it is known that log(1 + z) = z −
z3 z4 z2 + − + ··· 2 3 4
1 , 2
∞ zn (−1)n+1 n n=1 z z2 z3 = z 1− + − + ··· 2 3 4 = z[1 + ε(z)], (z = 0),
=
where
(11.42)
z2 z3 z − + ··· ε(z) = − + 2 3 4
Now z z2 z3 |ε(z)| = − + − + · · · 2 3 4 z 2 2 2 = −1 + z − z + · · · 2 3 4 |z| 2 2 2 ≤ 1 + |z| + |z| + · · · . 2 3 4 |z| 1 + |z| + |z|2 + · · · ≤ 2 |z| 1 ≤ |z|, = 2 1 − |z|
(11.43)
provided |z| ≤ 21 . Thus, we have |ε(z)| ≤ |z| for |z| ≤ 21 by (11.43) and log(1 + z) = z[1 + ε(z)] by (11.42). Of course, the fact that |z| ≤ 21 implies |ε(z)| ≤ 21 ; it also implies that we can write ε(z) = θ z for some θ = θ (z) with |θ | ≤ 1; (i.e., θ (z) = ε(z) z ). Thus, log(1 + z) = z(1 + θ z).
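The bound of Lemma 7 is easy to probe numerically; the following sketch (not part of the text) samples complex points with $|z| \leq \frac{1}{2}$ and checks that $|\varepsilon(z)| \leq |z|$ for the principal branch of the logarithm. The sampling scheme and sample size are arbitrary choices.

```python
# A quick numerical sketch (not from the text) of the bound in Lemma 7:
# for |z| <= 1/2, log(1+z) = z(1 + eps(z)) with |eps(z)| <= |z| (principal branch).
import numpy as np

rng = np.random.default_rng(7)
r = 0.5 * np.sqrt(rng.uniform(1e-6, 1.0, 10_000))  # radii, so that 0 < |z| <= 1/2
phi = rng.uniform(0, 2 * np.pi, 10_000)
z = r * np.exp(1j * phi)

eps = np.log(1.0 + z) / z - 1.0                    # eps(z) from log(1+z) = z(1+eps(z))
print("max |eps(z)| / |z| over the sample:", np.max(np.abs(eps) / np.abs(z)))  # should be <= 1
```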
11.9 Some Basic Concepts and Results from Complex Analysis Employed in the Proof of Theorem 11 In what follows, C stands for the complex plane, and I m(z) stands for the imaginary part of the complex number z = x + i y; i.e., I m(z) = y, x, y ∈ . Definition 8. A function g : S ⊆ C → C is said to be differentiable at z 0 ∈ S with derivative g (z 0 ), if g(z) − g(z 0 ) → g (z 0 ) z − z0 as z tends to z 0 in all possible ways; g is differentiable in S, if it is differentiable at each z ∈ S. Definition 9. The function g is called analytic in S, if it is differentiable in S. If S = C and g is analytic, then it is called entire.
Fact 1. If g is analytic in S, then the derivatives of all orders g (n) , n ≥ 1, exist (and are given by a certain formula involving the Cauchy integral). Fact 2.
A function g represented by a power series g(z) =
∞
cn z n , |z| ≤ r (some r > 0)
n=0
is analytic in |z| < r . Fact 3. If g is analytic for |z| ≤ r (some r > 0) (or analytic on and inside a simple closed contour C), then, for every z with |z| < r (or every z inside C), g(z) can be represented by a power series. More specifically, for every a with |a| < r (or every a inside C), ∞ g(z) = g (n) (a)(z − a)n , |z − a| < δ, n=0
where δ is the distance of a from the nearest point of the circumference |z|(n)= r (or n the distance of a from the nearest point of C). In particular, g(z) = ∞ n=0 g (0)z , |z| < r . Facts 2 and 3 justify the following definition. Definition 10.
The ch.f. f is said to be r -analytic, if f (t) =
∞ an n=0
n!
t n , |t| < r (r > 0), an ∈ C,
and is called entire, if r = ∞. Let f be a ch.f. with corresponding d.f. F, and for z = x + i y, x, y, ∈ , define f (z) by i zu e d F(u) = ei xu × e−yu d F(u) f (z) = −yu e cos xud F(u) + i e−yu sin xud F(u), =
provided (see also Proposition 5 below), e−yu d F(u) < ∞.
(11.44)
Fact 4. Let f be a ch.f. of a r.v. with corresponding d.f. F, and assume that the moments m (n) = x n d F(x) are finite for all n = 1, 2, . . ., and that S(t0 ) given in Lemma 6 converges for some t0 = 0. Then f (t) is r -analytic for |t| < r (some r > 0). Also, f (z) is defined and is analytic for z in the strip defined by |I m(z)| = |y| < r .
This is so by Proposition 5, which follows. Furthermore, for z with |z| < r , all derivatives f (n) (z), n ≥ 1, exist and f (n) (z) =
∞ i k m (k) k−n z , (k − n)! k=n
and, in particular, f (z) =
∞ m (k) k=0
k!
(i z)k .
Remark 11. That f (z) is well defined for z with |I m(z)| = |y| < r is due to the fact that relation (11.44) is satisfied here, as Proposition 5 below shows. Fact 5. Let g, g0 : S → C (where {z ∈ C; |I m(z)| < r } ⊆ S ⊆ C), and suppose that g(z) and g0 (z) are analytic for z with |I m(z)| < r (r > 0). Furthermore, assume that g(z) = g0 (z) for |z| < r . Then g(z) = g0 (z) for z with |I m(z)| < r . m (n) n Proposition 5. Assume that the series S(t) = ∞ n=0 n! (it) converges for some t0 = 0 (as we do in the formulation of Theorem 11), and assume without loss of generality that t0 > 0. Then relation (11.44) is satisfied, so that f (z) is well defined for z with |I m(z)| = |y| < t0 . In the first place, convergence of the series S(t) for t with |t| < t0 implies μ(n) n convergence of the series S ∗ (t) = ∞ n=0 n! |t| ; this is so, by Lemma 6. Next, for 0 < t < t0 , ∞ ntn |x| et|x| d F(x) = d F(x) n=0 n! ∞ |x|n t n d F(x) = n!
Proof.
n=0
(by Corollary 1(ii) to Theorem 1 in Chapter 5) ∞ n ∞ t μ(n) n t < ∞, = |x|n d F(x) = n! n! n=0
n=0
as already pointed out. But et|x| d F(x) =
= so that
0 −∞
0
−∞ 0 −∞
∞
et|x| d F(x) +
et|x| d F(x)
0
e
−t x
e−t x d F(x) < ∞,
∞
d F(x) +
et x d F(x),
0
0
∞
et x d F(x) < ∞.
(11.45)
Next, for 0 < y < t0 , −yu e d F(u) =
≤
−∞ 0 −∞
whereas, for −t0 < y < 0, e−yu d F(u) =
=
0
0 −∞ 0 −∞
e
−yu
∞
d F(u) +
e−yu d F(u)
0
e−yu d F(u) + V ar F < ∞ (by (11.45)),
e−yu d F(u) +
∞
e−yu d F(u)
0
e
(−y)u
≤ V ar F +
∞
d F(u) + ∞
e(−y)u d F(u)
0
e(−y)u d F(u) < ∞ (by (11.45)).
0
Thus,
e
−yu d F(u)
< ∞ for |y| < t0 , which is (11.44).
Remark 12. Material pertaining to this section may be found in the classical reference Titchmarsch (1939). Exercises. 1. Let g and h be real-valued functions defined on for which g(x)dμ and h(x)dμ are finite, where μ is a (σ -finite) measure in . Then show that: [g(x) + i h(x)]dμ ≤ |g(x) + i h(x)| dμ.
In particular, |E Z | ≤ E|Z |, where Z is a complex-valued r.v.; i.e., Z = X + iY with X and Y real-valued r.v.s. Hint: Use polar coordinates. 2. In reference to the proof of Theorem 2, show that limt→0 g(t) = b −a as claimed there. 3. In reference to the proof of Proposition 1 (see also proof of Proposition 3), provide the details of the convergence 1 t g(v)dv → g(0) as t ↓ 0. t 0 4. Show that |ei x − 1| ≤ |x| for all x ∈ . Hint: Write ei x = cos x + i sin x, express cos x in terms of sin2 2x , and use the inequality | sin t| ≤ |t|, t ∈ (which you also must prove). 5. Show that (1 + cnn )n → ec , when cn → c, where cn ∈ C, the complex plane, n→∞ n→∞ n ≥ 1.
6. Let $X$ be a r.v. having the Cauchy distribution with parameters $\mu = 0$ and $\sigma = 1$ (i.e., the p.d.f. of $X$ is given by $p(x) = \frac{1}{\pi} \times \frac{1}{1+x^2}$, $x \in \Re$). Then show that:
   (i) The $EX$ does not exist.
   (ii) The ch.f. $f_X(t) = e^{-|t|}$, $t \in \Re$.
   Next, let $X_1, \ldots, X_n$ be independent r.v.s distributed as $X$ and set $S_n = X_1 + \cdots + X_n$. Then
   (iii) Identify the ch.f. $f_{S_n/n}(t)$.
   (iv) Show that $\frac{S_n}{n}$ does not converge in probability to 0 as $n \to \infty$, by showing that $\frac{S_n}{n}$ does not converge in distribution to 0. (Although, by intuition, one would expect such a convergence, because of symmetry about 0 of the Cauchy distribution!)
   Hint: For part (ii), use the result $\int_0^\infty \frac{\cos(tx)}{1+x^2}\,dx = \frac{\pi}{2}e^{-|t|}$ (see, e.g., integral 403 in Tallarida (1999); also see integral 635 in the same reference).
7. Show that $\frac{[(2n)!]^{1/2n}}{[(2n-1)!]^{1/(2n-1)}} \to 1$ as $n \to \infty$.
   Hint: Use the Stirling formula, which states that $n!\big/\big(\sqrt{2\pi}\, n^{n+\frac{1}{2}} e^{-n}\big)$ tends to 1 as $n \to \infty$.
8. Establish the validity of the relations claimed in the Corollary to Lemma 6.
   Hint: Use the Stirling formula cited in the hint of Exercise 7.
9. If $X_1, \ldots, X_n$ are i.i.d. r.v.s with $EX_1 = \mu \in \Re$ and $\sigma^2(X_1) = \sigma^2 \in (0, \infty)$, then (by Application 2 to Theorem 9 in this chapter) it follows that $\frac{S_n - n\mu}{\sigma\sqrt{n}} \Rightarrow Z \sim N(0,1)$ as $n \to \infty$, where $S_n = \sum_{j=1}^{n} X_j$. Show that $\frac{S_n - n\mu}{\sigma\sqrt{n}}$ does not converge in probability as $n \to \infty$.
   Hint: Set $Y_n = (S_n - n\mu)/\sigma\sqrt{n}$ and show that, as $n \to \infty$, $\{Y_n\}$ does not converge mutually in probability by showing that $\{Y_{2n} - Y_n\}$ does not converge in probability to 0.
10. According to the WLLN (Application 1 to Theorem 9), if the r.v.s $X_1, \ldots, X_n$ are i.i.d. with finite $EX_1$, then $\bar X_n = \frac{S_n}{n} \stackrel{P}{\to} EX_1$ as $n \to \infty$, where $S_n = \sum_{j=1}^{n} X_j$. The following example shows that it is possible for $\frac{S_n}{n}$ to converge in probability to a finite constant, as $n \to \infty$, even if the $EX_1$ does not exist. To this effect, for $j = 1, 2, \ldots$, let $X_j$ be i.i.d. r.v.s such that $P(X_j = -n) = P(X_j = n) = c/n^2\log n$, $n \geq 3$, where $c = 1\big/\big(2\sum_{n=3}^{\infty}\frac{1}{n^2\log n}\big)$. Then show that $EX_1$ does not exist, but $\frac{S_n}{n} \stackrel{P}{\to} 0$ as $n \to \infty$, where $S_n = \sum_{j=1}^{n} X_j$.
    Hint: Show that $EX_1$ does not exist by showing that $EX_1^+ = EX_1^- = \infty$. Next, set $X_{nj} = X_j$ if $|X_j| < n$, and $X_{nj} = 0$ otherwise, $j = 1, 2, \ldots, n$, $n \geq 3$, and let $S_n^* = \sum_{j=1}^{n} X_{nj}$. Then show that (i) $\frac{S_n}{n} - \frac{S_n^*}{n} \stackrel{P}{\to} 0$ by showing that $P\big(\frac{S_n}{n} \neq \frac{S_n^*}{n}\big) \to 0$ as $n \to \infty$; (ii) $E\frac{S_n^*}{n} = 0$; (iii) $\mathrm{Var}\,\frac{S_n^*}{n} \to 0$ as $n \to \infty$; (iv) from (ii) and (iii), conclude that $\frac{S_n^*}{n} \stackrel{P}{\to} 0$; then (i) and (iv) complete the proof.
In all Exercises 11–16, i is to be treated as a real number, subject, of course, to the requirement that i 2 = −1. 11. If X ∼ B(n, p), show that f X (t) = ( peit + q)n , q = 1 − p. 12. If X ∼ P(λ), show that f X (t) = eλe 13.
it −λ
.
2 e−t /2 .
(i) If Z ∼ N (0, 1), show that f Z (t) = (ii) If X ∼ N (μ, σ 2 ), use the fact that Z = order to show that f X (t) = eiμt−
X −μ σ
σ 2t2 2
∼ N (0, 1) and part (i) in
.
14. If X has the Gamma distribution with parameters α and β; i.e., if its p.d.f. is given by 1 x α−1 e−x/β , x > 0 (and 0 for x ≤ 0), Γ (α)β α ∞ where the Gamma function Γ (α) is given by Γ (α) = 0 y α−1 e−y dy (α, β > 0), then show that 1 . f X (t) = (1 − iβt)α In particular, for α = 1 and β = 1/λ, we get the ch.f. of the Negative Exponential distribution with parameter λ; i.e., f X (t) = 1/(1 − itλ ); and for α = r2 (r > 0 integer) and β = 2, we get the ch.f. of the chi-square distribution with r degrees of freedom; i.e., f X (t) = 1/(1 − 2it)r /2 . 15. If the r.v.s X and Y have the Bivariate Normal distribution with parameters μ1 , μ2 in , 0 < σ1 , σ2 < ∞, and ρ ∈ [−1, 1], show that their joint ch.f. is given by f X ,Y (t1 , t2 ) = exp iμ1 t1 + iμ2 t2 1 − σ12 t12 + 2ρσ1 σ2 t1 t2 + σ22 t22 . 2 p(x; α, β) =
For this purpose, do the following: (i) Assume first that μ1 = μ2 = 0 and σ1 = σ2 = 1, and use Exercises 12 (ii) in Chapter 9 and 13 (ii) in this chapter to show that: 12 2 f X ,Y (t1 , t2 ) = exp − t1 + 2ρt1 t2 + t2 . 2
11.9 Some Basic Concepts and Results from Complex Analysis
(ii) For the general case, use the transformations U = (X − μ1 )/σ1 , V = (Y − μ2 )/σ2 and verify that EU = E V = 0, V ar (U ) = V ar (V ) = 1, ρ(U , V ) = ρ(X , Y ) = ρ. Then use Exercise 15 in Chapter 9 and part (i) here to arrive at the desired expression for the ch.f. f X ,Y . 16. Let the r.v.s X and Y have the Bivariate Normal distribution with parameters μ1 , μ2 ∈ , 0 < σ1 , σ2 < ∞, and ρ ∈ [−1, 1], and set U = X +Y , V = X −Y . (i) Verify that EU = μ1 + μ2 , V ar (U ) = σ12 + σ22 + 2ρσ1 σ2 , E V = μ1 − μ2 , V ar (V ) = σ12 + σ22 − 2ρσ1 σ2 , and Cov(U , V ) = σ12 − σ22 (by using Exercises 12 (ii) and 14 (ii) in Chapter 9). (ii) Since fU ,V (t1 , t2 ) = Eeit1 U +it2 V = Eeit1 (X +Y )+it2 (X −Y ) = Eei(t1 +t2 )X +i(t1 −t2 )Y = f X ,Y (t1 + t2 , t1 − t2 ), use Exercise 15 in order to conclude that fU ,V (t1 , t2 ) = exp {i(μ1 + μ2 )t1 + i(μ1 − μ2 )t2 1 − (σ12 + σ22 + 2ρσ1 σ2 )t12 + 2(σ12 − σ22 )t1 t2 2 +(σ12 + σ22 − 2ρσ1 σ2 )t22 ]}. (iii) From part (ii) and Exercise 15, conclude that the r.v.s U and V have the Bivariate Normal distribution with parameters μ1 +μ2 , μ1 −μ2 , σ12 +σ22 + 2ρσ1 σ2 = τ12 , σ12 + σ22 − 2ρσ1 σ2 = τ22 , and ρ(U , V ) = (σ12 − σ22 )/τ1 τ2 . (iv) From part (iii) and Exercise 12 (ii) in Chapter 10, conclude that U and V are independent if and only if σ1 = σ2 . 17. In this exercise, the r.v.s X 1 , . . . , X k are independent with distributions as indicated, and X = X 1 + · · · + X k . Then use Exercise 13 in Chapter 10, the inversion formula (Theorem 2), and the appropriate ch.f.s in order to show that: (i) If X j ∼ B(n j , p), j = 1, . . . , k, then X ∼ B(n, p), where n = n 1 + · · · + nk . (ii) If X j ∼ P(λ j ), j = 1, . . . , k, then X ∼ P(λ), where λ = λ1 + · · · + λk . (iii) If X j ∼ N (μ j , σ j2 ), j = 1, . . . , k, then X ∼ N (μ, σ 2 ), where μ = μ1 + · · · + μk and σ 2 = σ12 + · · · + σk2 . Also, c1 X 1 + · · · + ck X k ∼ N (c1 μ1 + · · · + ck μk , c12 σ12 + · · · + ck2 σk2 ), where c1 , . . . , ck are constants. (iv) If X j ∼ Gamma with parameters α j and β, j = 1, . . . , k, then X ∼ Gamma with parameters α = α1 + · · · + αk and β. In particular, if X j has the Negative Exponential distribution with parameter λ, j = 1, . . . , k, then X ∼ Gamma with parameters α = k and β = 1/λ, whereas, if X j ∼ χr2j , j = 1, . . . , k, then X ∼ χr2 , where r = r1 + · · · + rk .
235
236
CHAPTER 11 Topics from the Theory of Characteristic Functions
18. Show that the r.v.s X 1 , . . . , X k are independent if and only if, for all t1 , . . . , tk in , f X 1 ,...,X k (t1 , . . . , tk ) = f X 1 (t1 ) × · · · × f X k (tk ). Hint: Use Proposition 1 in Chapter 10 (and work as in Exercise 13 in the same chapter), and Theorem 2 . 19. If f is a ch.f., then show that f is positive definite; i.e., for all tk , tl reals and all complex numbers z k , zl , we have n n
f (tk − tl )z k z¯l ≥ 0
k=1 l=1
for every integer n ≥ 1. 20. For n = 1, 2, . . ., let X n be a r.v. distributed as Poisson with parameter n, X n ∼ √ P(n) (so that E X n = V ar (X n ) = n), and set Yn = (X n − n)/ n. Then show d
that Yn −→ Z ∼ N (0, 1). n→∞
2
3
∗
Hint: Use the expansion: ei z = 1+i z − z2 − i z6 ei z for some complex number z ∗ . 21. Let F1 and F2 be d.f.s, and let G be their convolution, G = F1 ∗ F2 . Then: (i) If F1 is absolutely continuous with respect to Lebesgue measure with p.d.f. p1 , it follows that G is absolutely continuous with respect to Lebesgue measure with p.d.f. p given by ∞ p1 (u − y)d F2 (y). p(u) = −∞
(ii) Furthermore, if F2 is also absolutely continuous with respect to Lebesgue measure with p.d.f. p2 , then ∞ p(u) = p1 (u − y) p2 (y)dy. −∞
22. If the r.v.s X and Y are i.i.d. with ch.f. f , then the ch.f. of the r.v. X − Y is | f (t)|2 . 23. Let X 1 , . . . , X n be independent r.v.s, each of which is symmetric about 0. Then the r.v. X 1 + · · · + X n is also symmetric about 0. 24. For n ≥ 1, let gn and g be functions defined on E ⊆ k into , and recall that {gn } is said to converge continuously to g on E, if for every x ∈ E, gn (xn ) → g(x) whenever xn → x, as n → ∞. Then show that if {gn } converges continuously to g on E, it follows that g is continuous on E. 25. For n ≥ 1, let gn , g and E be as in Exercise 24. Then: (i) If {gn } converges to g uniformly on E and g is continuous on E, it follows that {gn } converges continuously to g on E. (ii) If E is compact and {gn } converges continuously to g on E, then the convergence is uniform. 26. For n ≥ 1, let X n , Yn and X , Y be r.v.s defined on the probability space (, A, P), let d1 , d2 and cn be constants with 0 = cn → ∞ as n → ∞, and suppose that
11.9 Some Basic Concepts and Results from Complex Analysis
d
cn (X n − d1 , Yn − d2 )→(X , Y ) as n → ∞. Let g : 2 → be differentiable, and suppose its (first-order) partial derivatives g˙ x , g˙ y are continuous at (d1 , d2 ). Then show that, as n → ∞, d
cn [g(X n , Yn ) − g(d1 , d2 )] −→ [g˙ x (d1 , d2 ) g˙ y (d1 , d2 )](X Y ) = g˙ x (d1 , d2 )X + g˙ y (d1 , d2 )Y .
237
CHAPTER 12

The Central Limit Problem: The Centered Case
In this chapter, we discuss what is perhaps the most important problem in probability theory. Namely, conditions are sought under which partial sums of independent r.v.s converge in distribution to some limiting law. Also, the determination of the totality of such laws is sought. A special but extremely important case is that where the limiting law is the Normal distribution, and a very special case of it is the Central Limit Theorem (CLT) in its simplest and popular form. In order to cover as many cases of practical importance as feasible, the approach used here is that of employing a triangular array of r.v.s. The r.v.s within each row are assumed to be independent, but no such assumption is made for those in distinct rows. Of course, it is assumed that the number of r.v.s in the rows tends to infinity as the rank of rows increases. A brief description of what is done in this chapter is as follows. After basic notation and assumptions are taken care of, necessary and sufficient conditions are given for the sequence of partial sums to converge in distribution to a N (0, 1) distributed r.v. This is the extremely useful Normal Convergence Criterion due to Lindeberg and Feller, and stated as Theorem 1. Its proof is derived as a special case of Theorem 2 stated and proved in Section 12.1, and is deferred to Section 12.3. In Section 12.2, the problem is cast in broad generality. Instead of seeking conditions under which the limiting law is normal, we investigate under what conditions there is convergence in distribution. Also, the totality of such limiting laws is identified. The culmination of these discussions is Theorem 2, which follows after a number of auxiliary results are established. In Section 12.3, first the proof of Theorem 1 is derived as a special case of that of Theorem 2. Then several sets of sufficient conditions are given for convergence in distribution to the N (0, 1) distribution. One such condition is expressed in terms of absolute moments and is the basis of the so-called Liapounov Theorem (Theorem 3). If the sum of variances of the r.v.s in each row is not equal to 1, which is part of the basic assumption made, then Theorem 1 assumes an easy modification, which is Theorem 4. The next modification of Theorem 1 is that where one is dealing with a single sequence of r.v.s, and leads to Theorem 5. Should the underlying r.v.s be also identically distributed, then one has the Corollary to Theorem 5, which is the CLT in its simplest and popular form. An Introduction to Measure-Theoretic Probability, Second Edition. http://dx.doi.org/10.1016/B978-0-12-800042-7.00012-8 Copyright © 2014 Elsevier Inc. All rights reserved.
This chapter is concluded with a final section, Section 12.4, where technical results stated in Section 12.2 are proved. In conclusion, it is to be pointed out that the term “centered” used in the title of this chapter simply indicates that all r.v.s involved have expectations zero. This is always achieved by subtracting off the expectations from the r.v.s, or by “centering” the r.v.s.
12.1 Convergence to the Normal Law (Central Limit Theorem, CLT)

In the present section, we shall concern ourselves with an array of r.v.s $X_{nj}$, $j = 1, \ldots, k_n$, where $k_n \xrightarrow[n\to\infty]{} \infty$, and for each $n$, the r.v.s $X_{n1}, \ldots, X_{nk_n}$ are independent, but not necessarily identically distributed:
\[
\begin{cases}
X_{11}, X_{12}, \ldots, X_{1k_1} & \text{independent}\\
X_{21}, X_{22}, \ldots, X_{2k_2} & \text{independent}\\
\quad\vdots & \\
X_{n1}, X_{n2}, \ldots, X_{nk_n} & \text{independent}\\
\quad\vdots &
\end{cases}
\]
However, r.v.s in different rows need not be independent. The probability spaces on which the rows are defined may even be different. We shall assume throughout that $EX_{nj}$, $j = 1, \ldots, k_n$, are finite for all $n$. Then we may consider the r.v.s $Y_{nj} = X_{nj} - EX_{nj}$ with $EY_{nj} = 0$. Thus, without loss of generality, we shall assume that $EX_{nj} = 0$, $j = 1, \ldots, k_n$, $n = 1, 2, \ldots$. To summarize: for each $n$, the r.v.s $X_{nj}$, $j = 1, \ldots, k_n$, are independent (within each row), $k_n \to \infty$, and $EX_{nj} = 0$, $j = 1, \ldots, k_n$, $n = 1, 2, \ldots$.

In the present section, we shall also assume that $\sigma_{nj}^2 = \sigma^2(X_{nj}) = EX_{nj}^2 < \infty$, $j = 1, \ldots, k_n$, $n = 1, 2, \ldots$, and we shall set
\[
s_n^2 = \sum_{j=1}^{k_n} \sigma_{nj}^2 \quad \text{and} \quad s_n = +\sqrt{s_n^2}.
\]
Then the problem to concern ourselves with is this: under what conditions is it true that
\[
\mathcal{L}\Bigl(\frac{S_n}{s_n}\Bigr) \underset{n\to\infty}{\Longrightarrow} N(0,1), \quad \text{where } S_n = \sum_{j=1}^{k_n} X_{nj}?
\]
Here and in the sequel, it is convenient to denote the distribution of a r.v. $X$ by $\mathcal{L}(X)$, and also read it as the (distribution) law of $X$. Then by $\mathcal{L}(X_n) \underset{n\to\infty}{\Longrightarrow} \mathcal{L}$ we understand the usual weak convergence of d.f.s, or convergence of r.v.s in distribution; that is, $F_{X_n} \underset{n\to\infty}{\Longrightarrow} F_X$, or $X_n \xrightarrow{d} X$ as $n \to \infty$.
Before we formulate the problem precisely and solve it, we notice that for $k_n = n$, the array of the r.v.s in question becomes a triangular array; i.e.,
\[
\begin{cases}
X_{11}\\
X_{21}, X_{22}\\
\quad\vdots\\
X_{n1}, X_{n2}, \ldots, X_{nn}\\
\quad\vdots
\end{cases}
\]
Furthermore, if $X_{nj} = X_j$, $j = 1, \ldots, n$, and they are also identically distributed, then we have $EX_j = 0$, $\sigma^2(X_j) = \sigma^2$, $s_n^2 = n\sigma^2$, and
\[
\begin{cases}
X_1\\
X_1, X_2\\
\quad\vdots\\
X_1, X_2, \ldots, X_n\\
\quad\vdots
\end{cases}
\]
with $S_n = \sum_{j=1}^n X_{nj} = \sum_{j=1}^n X_j$. Since $s_n = \sigma\sqrt{n}$, the problem of finding conditions under which
\[
\mathcal{L}\Bigl(\frac{S_n}{s_n}\Bigr) = \mathcal{L}\Bigl(\frac{\sum_{j=1}^n X_j}{\sigma\sqrt{n}}\Bigr) \underset{n\to\infty}{\Longrightarrow} N(0,1)
\]
becomes the classical CLT.

Going back to the problem we started out with, we have the r.v.s $X_{nj}$, $j = 1, \ldots, k_n \xrightarrow[n\to\infty]{} \infty$, independent in each row,
\[
EX_{nj} = 0, \quad \sigma_{nj}^2 = \sigma^2(X_{nj}) = EX_{nj}^2 < \infty, \quad s_n^2 = \sum_{j=1}^{k_n} \sigma_{nj}^2, \quad S_n = \sum_{j=1}^{k_n} X_{nj}, \quad \frac{S_n}{s_n} = \sum_{j=1}^{k_n} \frac{X_{nj}}{s_n}.
\]
Then
\[
E\Bigl(\frac{X_{nj}}{s_n}\Bigr) = 0 \quad \text{and} \quad \sum_{j=1}^{k_n} \sigma^2\Bigl(\frac{X_{nj}}{s_n}\Bigr) = \frac{1}{s_n^2}\sum_{j=1}^{k_n} \sigma_{nj}^2 = 1.
\]
Thus, replacing $X_{nj}$ by $Z_{nj} = X_{nj}/s_n$, we have $\sum_{j=1}^{k_n} \sigma^2(Z_{nj}) = 1$, and therefore, without loss of generality, we may assume that $s_n^2 = 1$.
In order to summarize, we have then: for each $n$, the r.v.s $X_{nj}$, $j = 1, \ldots, k_n \xrightarrow[n\to\infty]{} \infty$, are row-wise independent,
\[
EX_{nj} = 0, \quad \sigma_{nj}^2 = \sigma^2(X_{nj}) < \infty, \quad s_n^2 = \sum_{j=1}^{k_n} \sigma_{nj}^2 = 1.
\]
Set $S_n = \sum_{j=1}^{k_n} X_{nj}$ and find conditions under which $\mathcal{L}(S_n) \underset{n\to\infty}{\Longrightarrow} N(0,1)$.
The condition to be imposed upon the r.v.s $X_{nj}$ in order for $\mathcal{L}(S_n) \underset{n\to\infty}{\Longrightarrow} N(0,1)$ to hold true is of the following nature: it is assumed that the contribution of each one of the summands $X_{nj}$ in the sum $S_n$ is asymptotically negligible in the following sense.

Definition 1. The r.v.s $X_{nj}$, $j = 1, \ldots, k_n \xrightarrow[n\to\infty]{} \infty$, are said to be uniformly asymptotically negligible (u.a.n.) if, for every $\varepsilon > 0$,
\[
\max_{1 \le j \le k_n} P(|X_{nj}| > \varepsilon) \xrightarrow[n\to\infty]{} 0.
\]
Actually, we shall make an assumption that will imply u.a.n.; namely,
\[
\max_{1 \le j \le k_n} \sigma_{nj}^2 \xrightarrow[n\to\infty]{} 0.
\]
That this assumption implies u.a.n. is seen thus:
\[
\max_{1 \le j \le k_n} P(|X_{nj}| > \varepsilon) \le \max_{1 \le j \le k_n} \frac{EX_{nj}^2}{\varepsilon^2} = \frac{1}{\varepsilon^2}\max_{1 \le j \le k_n} \sigma_{nj}^2 \xrightarrow[n\to\infty]{} 0.
\]
We are going to show the following result, which is the general form of the CLT, and from which other variants follow.

Theorem 1 (Normal Convergence Criterion or Lindeberg–Feller Theorem). For each $n$, let the r.v.s $X_{nj}$, $j = 1, \ldots, k_n \xrightarrow[n\to\infty]{} \infty$, be row-wise independent, assume that $EX_{nj} = 0$ and $\sigma_{nj}^2 = \sigma^2(X_{nj}) < \infty$, and set $s_n^2 = \sum_{j=1}^{k_n} \sigma_{nj}^2 = 1$ (by assumption). Then
\[
\mathcal{L}\Bigl(\sum_{j=1}^{k_n} X_{nj}\Bigr) \underset{n\to\infty}{\Longrightarrow} N(0,1) \quad \text{and} \quad \max_{1 \le j \le k_n} \sigma_{nj}^2 \xrightarrow[n\to\infty]{} 0,
\]
if and only if, for every $\varepsilon > 0$,
\[
g_n(\varepsilon) = \sum_{j=1}^{k_n} \int_{(|x| \ge \varepsilon)} x^2\, dF_{nj} \xrightarrow[n\to\infty]{} 0,
\]
where $F_{nj}$ is the d.f. of the r.v. $X_{nj}$.
This theorem is a special case of a more general result to be obtained in the next section. The proof of Theorem 1 and of some variations of it are given in Section 12.3.
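Before turning to the examples, it may help to see the quantity $g_n(\varepsilon)$ numerically. The following Python sketch is a purely illustrative aside; the choice of exponential $Y_j$s, the sample sizes, and all variable names are assumptions made only for the illustration. It estimates $g_n(\varepsilon) = n\,E[X_{n1}^2 I(|X_{n1}| \ge \varepsilon)]$ by Monte Carlo for the standardized i.i.d. array $X_{nj} = (Y_j - \mu)/(\sigma\sqrt{n})$ of Example 1 below, and shows it decreasing toward $0$ as $n$ grows, in agreement with Theorem 1.

    import numpy as np

    rng = np.random.default_rng(0)
    eps, mu, sigma = 0.1, 1.0, 1.0          # Exp(1): mean 1, variance 1 (illustrative choice)

    def g_n(n, reps=200_000):
        # X_{n1} = (Y_1 - mu) / (sigma * sqrt(n)); g_n(eps) = n * E[X^2 * 1{|X| >= eps}]
        y = rng.exponential(scale=1.0, size=reps)
        x = (y - mu) / (sigma * np.sqrt(n))
        return n * np.mean(x**2 * (np.abs(x) >= eps))

    for n in [10, 100, 1000, 10_000]:
        print(n, round(g_n(n), 4))          # the Lindeberg quantity shrinks as n grows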
This section is concluded with three examples. The first illustrates the checking of the condition $g_n(\varepsilon) \to 0$; the second presents a case where the condition $g_n(\varepsilon) \to 0$ is not fulfilled; and the third is meant to shed some light on what lies behind the condition $g_n(\varepsilon) \to 0$ (all limits being taken as $n \to \infty$).
Example 1. Let $Y_1, \ldots, Y_n$ be i.i.d. r.v.s with $EY_1 = \mu \in \Re$ and $\sigma^2(Y_1) = \sigma^2 \in (0, \infty)$, and set $X_{nj} = \frac{Y_j - \mu}{\sigma\sqrt{n}}$, $j = 1, \ldots, n$. Then, for each $n$, $X_{n1}, \ldots, X_{nn}$ are i.i.d. with $EX_{n1} = 0$, $\sigma^2(X_{n1}) = \sigma_{n1}^2 = \frac{1}{n}$, so that $s_n^2 = \sum_{j=1}^n \sigma_{nj}^2 = 1$ (and $\max_{1\le j\le n} \sigma_{nj}^2 = \frac{1}{n} \to 0$). Then by Theorem 1, $\mathcal{L}\bigl(\sum_{j=1}^n X_{nj}\bigr) \underset{n\to\infty}{\Longrightarrow} N(0,1)$ if and only if $g_n(\varepsilon) = n\int_{(|x|\ge\varepsilon)} x^2\, dF_n(x) \to 0$ for every $\varepsilon > 0$, where $F_{nj} = F_n$ is the d.f. of the $X_{nj}$s. Actually, the asymptotic normality has been established in Application 2 of Chapter 11. So, it must hold that $g_n(\varepsilon) \to 0$ for every $\varepsilon > 0$. That this is, indeed, the case is seen as follows:
\[
g_n(\varepsilon) = n\int_{(|x|\ge\varepsilon)} x^2\, dF_n(x) = n\int_{(|x|\ge\varepsilon)} x^2\, dF(\mu + x\sigma\sqrt{n}),
\]
since $F_n(x) = P(X_{n1} \le x) = P(Y_1 \le \mu + x\sigma\sqrt{n}) = F(\mu + x\sigma\sqrt{n})$, where $F$ is the d.f. of the $Y_j$s. By setting $\mu + x\sigma\sqrt{n} = y$, we get
\begin{align*}
g_n(\varepsilon) &= \frac{n}{\sigma^2 n}\int_{(|y-\mu|\ge\varepsilon\sigma\sqrt{n})} (y-\mu)^2\, dF(y)\\
&= \frac{1}{\sigma^2}\int (y-\mu)^2\, dF(y) - \frac{1}{\sigma^2}\int_{(|y-\mu|<\varepsilon\sigma\sqrt{n})} (y-\mu)^2\, dF(y)\\
&= 1 - \frac{1}{\sigma^2}\int_{(|y-\mu|<\varepsilon\sigma\sqrt{n})} (y-\mu)^2\, dF(y)\\
&= 1 - \frac{1}{\sigma^2}\int (y-\mu)^2 I_{(|y-\mu|<\varepsilon\sigma\sqrt{n})}\, dF(y).
\end{align*}
However, $(y-\mu)^2 I_{(|y-\mu|<\varepsilon\sigma\sqrt{n})} \le (y-\mu)^2$, independent of $n$ and $F$-integrable. Also, $(y-\mu)^2 I_{(|y-\mu|<\varepsilon\sigma\sqrt{n})} \xrightarrow[n\to\infty]{} (y-\mu)^2$. Then, by the Dominated Convergence Theorem, $\int (y-\mu)^2 I_{(|y-\mu|<\varepsilon\sigma\sqrt{n})}\, dF(y) \to \int (y-\mu)^2\, dF(y) = \sigma^2$. It follows that $g_n(\varepsilon) \to 0$ for every $\varepsilon > 0$. (See also the Corollary to Theorem 5 later.)
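As a quick empirical check of Example 1 (again an illustrative aside; the exponential distribution for the $Y_j$s and the sample sizes are arbitrary assumptions), one may simulate $\sum_{j=1}^n X_{nj} = \sqrt{n}(\bar Y_n - \mu)/\sigma$ and compare a few of its quantiles with those of $N(0,1)$:

    import numpy as np

    rng = np.random.default_rng(1)
    n, reps = 1_000, 10_000
    mu, sigma = 1.0, 1.0                     # Exp(1) has mean 1 and variance 1

    y = rng.exponential(scale=1.0, size=(reps, n))
    s = np.sqrt(n) * (y.mean(axis=1) - mu) / sigma   # equals sum_j X_{nj}

    for q in [0.05, 0.25, 0.5, 0.75, 0.95]:
        print(q, round(np.quantile(s, q), 3))
    # compare with the N(0,1) quantiles -1.645, -0.674, 0, 0.674, 1.645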
Of course, the example just discussed also applies to the special case where the $Y_j$s have the Binomial distribution $B(1, p)$ $(0 < p < 1)$. Here is a version of this special case.

Example 2. For $j = 1, \ldots, n$, let $Y_{nj}$ be independently distributed as $B(1, p_n)$, where $0 < p_n < 1$ with $p_n \xrightarrow[n\to\infty]{} 0$, so that $np_n \xrightarrow[n\to\infty]{} \lambda \in (0, \infty)$. Then $EY_{nj} = p_n$ and $\sigma^2(Y_{nj}) = p_nq_n$, $j = 1, \ldots, n$, where $q_n = 1 - p_n$. Set $X_{nj} = (Y_{nj} - p_n)/\sqrt{np_nq_n}$, so that $X_{n1}, \ldots, X_{nn}$ are independent, $EX_{nj} = 0$, $\sigma^2(X_{nj}) = \sigma_{nj}^2 = 1/n$, and $s_n^2 = \sum_{j=1}^n \sigma_{nj}^2 = 1$ (and $\max_{1\le j\le n} \sigma_{nj}^2 = \frac{1}{n} \to 0$). Therefore, for each $n$, the r.v.s
$X_{n1}, \ldots, X_{nn}$ satisfy the conditions of Theorem 1, and therefore $\mathcal{L}(S_n) \underset{n\to\infty}{\Longrightarrow} N(0,1)$ if and only if $g_n(\varepsilon) = \sum_{j=1}^n \int_{(|x|\ge\varepsilon)} x^2\, dF_{nj}(x) = n\int_{(|x|\ge\varepsilon)} x^2\, dF_n(x) \xrightarrow[n\to\infty]{} 0$ for every $\varepsilon > 0$, where $S_n = \sum_{j=1}^n X_{nj}$ and $F_n$ is the d.f. of the $X_{nj}$s. It is shown here that for all (sufficiently small) $\varepsilon > 0$, $g_n(\varepsilon) \nrightarrow 0$, so that $\mathcal{L}(S_n) \nRightarrow N(0,1)$. To this end, observe that the r.v. $X_{n1}$ takes on the values $(1-p_n)/\sqrt{np_nq_n}$ and $-p_n/\sqrt{np_nq_n}$ with respective probabilities $p_n$ and $q_n$. Hence
\begin{align*}
g_n(\varepsilon) &= n\int_{(|x|\ge\varepsilon)} x^2\, dF_n(x)\\
&= n\int_{(x\le -\varepsilon)} x^2\, dF_n(x) + n\int_{(x\ge\varepsilon)} x^2\, dF_n(x)\\
&= n\int_{(x\ge\varepsilon)} x^2\, dF_n(x),
\end{align*}
because for sufficiently large $n$, $-\varepsilon < -p_n/\sqrt{np_nq_n}$, since $p_n/\sqrt{np_nq_n}$ tends to $0$ as $n \to \infty$. Next, $\frac{1-p_n}{\sqrt{np_nq_n}} = \frac{q_n}{\sqrt{np_nq_n}} \xrightarrow[n\to\infty]{} \frac{1}{\sqrt{\lambda}} > 2\varepsilon$ (for sufficiently small $\varepsilon$), so that $\frac{q_n}{\sqrt{np_nq_n}} > \varepsilon$ for sufficiently large $n$ (and sufficiently small $\varepsilon$). It follows that
\[
n\int_{(x\ge\varepsilon)} x^2\, dF_n(x) = n \times \frac{q_n^2}{np_nq_n}\times p_n = q_n \xrightarrow[n\to\infty]{} 1;
\]
i.e., $g_n(\varepsilon) \nrightarrow 0$ for all (sufficiently small) $\varepsilon > 0$, as asserted.
It is to be observed, however, that $S_n = a_nT_n + b_n$, where $T_n = \sum_{j=1}^n Y_{nj} \sim B(n, p_n)$, $a_n = 1/\sqrt{np_nq_n}$, $b_n = -np_n/\sqrt{np_nq_n}$, and that $P(T_n = t) \xrightarrow[n\to\infty]{} P(T = t)$, where $T \sim P(\lambda)$, $t = 0, 1, \ldots$ (see, e.g., Theorem 1, Chapter 3, in Roussas (1997)). It follows (by Theorem 7 in Chapter 5) that $a_nT_n + b_n \xrightarrow[n\to\infty]{d} \frac{1}{\sqrt{\lambda}}T - \sqrt{\lambda} \overset{\text{def}}{=} X$, where $X$ takes on the values $\frac{x}{\sqrt{\lambda}} - \sqrt{\lambda}$ with respective probabilities $e^{-\lambda}\frac{\lambda^x}{x!}$, $x = 0, 1, \ldots$. (Refer also to Example 4 here, as well as Application 1 in Chapter 13.)

Example 3. For $j = 1, \ldots, n$, let $X_{nj}$ be independent r.v.s distributed as $N(0, 1/n)$, so that $EX_{nj} = 0$, $\sigma^2(X_{nj}) = \sigma_{nj}^2 = \frac{1}{n}$, $s_n^2 = 1$, and $\max_{1\le j\le n} \sigma_{nj}^2 = \frac{1}{n} \to 0$. Thus, the contribution of each one of the r.v.s $X_{nj}$, $j = 1, \ldots, n$, to their sum $S_n = \sum_{j=1}^n X_{nj}$ is negligible. More precisely, the r.v.s $X_{nj}$, $j = 1, \ldots, n$, are u.a.n. Since $S_n \sim N(0,1)$, it follows that $g_n(\varepsilon) \to 0$ for every $\varepsilon > 0$ (by Theorem 1).
In order to gain some insight as to why $g_n(\varepsilon)$ should converge to $0$, and what this fact reflects, let us proceed with the computation of $g_n(\varepsilon)$. Here
\[
g_n(\varepsilon) = n\int_{(|x|\ge\varepsilon)} x^2\, dF_n(x) = n\int_{(|x|\ge\varepsilon)} x^2 p_n(x)\, dx, \quad \text{where } p_n(x) = \sqrt{\frac{n}{2\pi}}\,e^{-\frac{nx^2}{2}}
\]
and $F_n$ is the d.f. of $X_{n1}$. But
\[
\int_{(|x|\ge\varepsilon)} x^2 p_n(x)\, dx = \sqrt{\frac{n}{2\pi}}\int_{(|x|\ge\varepsilon)} x^2 e^{-\frac{nx^2}{2}}\, dx = \frac{2\sqrt{n}}{\sqrt{2\pi}}\int_{(x\ge\varepsilon)} x^2 e^{-\frac{nx^2}{2}}\, dx,
\]
and
\[
\int_{(x\ge\varepsilon)} x^2 e^{-\frac{nx^2}{2}}\, dx = \frac{\varepsilon}{n}e^{-\frac{n\varepsilon^2}{2}} + \frac{1}{n}\int_{(x\ge\varepsilon)} e^{-\frac{nx^2}{2}}\, dx
\]
(by integration by parts). Therefore,
\[
g_n(\varepsilon) = \frac{2\varepsilon}{\sqrt{2\pi}}\sqrt{n}\,e^{-\frac{n\varepsilon^2}{2}} + \frac{2\sqrt{n}}{\sqrt{2\pi}}\int_{(x\ge\varepsilon)} e^{-\frac{nx^2}{2}}\, dx.
\]
However, $\sqrt{n}\,e^{-n\varepsilon^2/2} \xrightarrow[n\to\infty]{} 0$, and
\begin{align*}
\frac{2\sqrt{n}}{\sqrt{2\pi}}\int_{(x\ge\varepsilon)} e^{-\frac{nx^2}{2}}\, dx &= 2\int_{(x\ge\varepsilon)} \frac{1}{\sqrt{2\pi}\,(1/\sqrt{n})}\,e^{-x^2/2(1/\sqrt{n})^2}\, dx\\
&= 2P(X_{n1} \ge \varepsilon) = P(|X_{n1}| \ge \varepsilon) = \max_{1\le j\le n} P(|X_{nj}| \ge \varepsilon)\\
&\le \frac{1}{n\varepsilon^2} \xrightarrow[n\to\infty]{} 0.
\end{align*}
So, $g_n(\varepsilon)$ consists essentially of the maximum probability of the individual $X_{nj}$s being outside $(-\varepsilon, \varepsilon)$, and this ought to be small, as the $X_{nj}$s are concentrated around $0$. Another way of looking at $g_n(\varepsilon)$ is to interpret it as the sum of the segments of the truncated variances of the $X_{nj}$s taken outside $(-\varepsilon, \varepsilon)$. Again, this quantity must be small, as the variances of the $X_{nj}$s are close to $0$.
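The last expression for $g_n(\varepsilon)$ can also be evaluated numerically. The short sketch below is an illustration only (the values of $\varepsilon$ and $n$ are arbitrary assumptions); it uses the closed form just derived, $g_n(\varepsilon) = \frac{2\varepsilon}{\sqrt{2\pi}}\sqrt{n}\,e^{-n\varepsilon^2/2} + P(|X_{n1}| \ge \varepsilon)$ with $X_{n1} \sim N(0, 1/n)$, to confirm the rapid convergence to $0$:

    import math

    def g_n(n, eps=0.1):
        term1 = 2 * eps / math.sqrt(2 * math.pi) * math.sqrt(n) * math.exp(-n * eps**2 / 2)
        term2 = math.erfc(eps * math.sqrt(n) / math.sqrt(2))   # = P(|X_{n1}| >= eps)
        return term1 + term2

    for n in [10, 100, 1000, 5000]:
        print(n, g_n(n))                     # tends to 0 quickly as n grows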
12.2 Limiting Laws of $\mathcal{L}(S_n)$ Under Conditions (C)

For reference, we gather together the basic assumptions made so far.
\[
\text{(C)}\quad
\begin{cases}
\text{For each } n, \text{ the r.v.s } X_{nj},\ j = 1, \ldots, k_n \xrightarrow[n\to\infty]{} \infty,\\
\text{are row-wise independent},\\
EX_{nj} = 0,\ \sigma_{nj}^2 = \sigma^2(X_{nj}) < \infty, \text{ and } s_n^2 = \sum_{j=1}^{k_n}\sigma_{nj}^2 = 1;\\
\text{also, } \max_{1\le j\le k_n}\sigma_{nj}^2 \xrightarrow[n\to\infty]{} 0.
\end{cases}
\tag{12.1}
\]
The purpose of this section is the identification of the limiting laws of $\mathcal{L}(S_n)$ under conditions (C). In all that follows, maxima, summations, and products are over $1 \le j \le k_n$; $F_{nj}$ is the d.f. of $X_{nj}$ and $f_{nj}$ is the ch.f. of $X_{nj}$. Also, $S_n = \sum_{j=1}^{k_n} X_{nj}$, and by $\log z$, $z$ complex, we mean $\log_p z$ (the principal branch of $\log z$). Limits and limit suprema are taken as $n \to \infty$, unless otherwise specified. Our problem is that of finding the limiting laws (in the weak sense) of $\mathcal{L}(S_n)$. The following lemma is established first.

Lemma 1. With the foregoing notation and under (C) in (12.1), one has
\[
\sum_j\{\log f_{nj}(t) - [f_{nj}(t) - 1]\} \to 0, \quad t \in \Re.
\]

Proof. By Theorem 9(iii) in Chapter 11, applied with $n = 2$, one has
\[
f_{nj}(t) = 1 + \frac{EX_{nj}}{1!}(it) + \theta_{nj}\frac{EX_{nj}^2}{2!}t^2 = 1 + \theta_{nj}\frac{\sigma_{nj}^2}{2}t^2, \quad |\theta_{nj}| = |\theta_{nj}(t)| \le 1.
\]
Hence
\[
|f_{nj}(t) - 1| = |\theta_{nj}|\frac{\sigma_{nj}^2}{2}t^2 \le \frac{\sigma_{nj}^2}{2}t^2, \tag{12.2}
\]
so that
\[
\max_j|f_{nj}(t) - 1| \le \frac{t^2}{2}\max_j\sigma_{nj}^2 \to 0 \quad (\text{for each } t).
\]
Therefore, for each $t \in \Re$ and for all $n \ge n(t)$ (independent of $j$), one has
\[
|f_{nj}(t) - 1| \le \frac{1}{2} \quad \text{uniformly in } j = 1, \ldots, k_n. \tag{12.3}
\]
Next, for $t$ as above and $n \ge n(t)$, we have, by means of (12.3) and Lemma 7 in Chapter 11,
\[
\log f_{nj}(t) = \log\{1 + [f_{nj}(t) - 1]\} = [f_{nj}(t) - 1] + \theta_{nj}^*[f_{nj}(t) - 1]^2
\]
for some $\theta_{nj}^* = \theta_{nj}^*(t)$ with $|\theta_{nj}^*| \le 1$. Thus, $\log f_{nj}(t) - [f_{nj}(t) - 1] = \theta_{nj}^*[f_{nj}(t) - 1]^2$, and
\[
|\log f_{nj}(t) - [f_{nj}(t) - 1]| \le |f_{nj}(t) - 1|^2, \quad j = 1, \ldots, k_n, \tag{12.4}
\]
since $|z^2| = |z|^2$ for a complex number $z$ (see also Exercise 10).
Hence, by employing (12.4) and (12.2), one has
\begin{align}
\Bigl|\sum_j\{\log f_{nj}(t) - [f_{nj}(t) - 1]\}\Bigr| &\le \sum_j|\log f_{nj}(t) - [f_{nj}(t) - 1]|\nonumber\\
&\le \sum_j|f_{nj}(t) - 1|^2\nonumber\\
&\le [\max_j|f_{nj}(t) - 1|]\sum_j|f_{nj}(t) - 1|\nonumber\\
&\le [\max_j|f_{nj}(t) - 1|]\,\frac{t^2}{2}\sum_j\sigma_{nj}^2\nonumber\\
&= \frac{t^2}{2}[\max_j|f_{nj}(t) - 1|]. \tag{12.5}
\end{align}
On the other hand, as was seen earlier,
\[
\max_j|f_{nj}(t) - 1| \le \frac{t^2}{2}\max_j\sigma_{nj}^2 \to 0. \tag{12.6}
\]
By means of (12.6), relation (12.5) then gives $\sum_j\{\log f_{nj}(t) - [f_{nj}(t) - 1]\} \to 0$, as was to be seen.
Remark 1. If $s_n^2 \le c$ rather than equal to $1$, then the right-hand side in (12.5) is bounded by $\frac{t^2}{2}c\max_j|f_{nj}(t) - 1| \le \frac{t^2}{2}c\,\frac{t^2}{2}\max_j\sigma_{nj}^2 \to 0$, so that the result is still true.

Corollary. Under (C) and with $s_n^2 = 1$ replaced by $s_n^2 \le c$, one has
\[
\log f_n(t) - \psi_n(t) \to 0, \quad t \in \Re,
\]
where
\[
f_n(t) = \prod_j f_{nj}(t) \quad \text{and} \quad \psi_n(t) = \sum_j\int(e^{itx} - 1)\,dF_{nj}(x).
\]

Proof. By independence, $f_n$ is the ch.f. of $S_n$, and by the definition of $f_n$,
\[
\sum_j\log f_{nj}(t) = \log f_n(t), \tag{12.7}
\]
\[
f_{nj}(t) - 1 = \int e^{itx}\,dF_{nj}(x) - 1 = \int e^{itx}\,dF_{nj}(x) - \int dF_{nj}(x) = \int(e^{itx} - 1)\,dF_{nj}(x),
\]
so that
\[
\sum_j[f_{nj}(t) - 1] = \sum_j\int(e^{itx} - 1)\,dF_{nj}(x) = \psi_n(t). \tag{12.8}
\]
Then the result follows from the lemma (along with Remark 1) and (12.7) and (12.8).

From the corollary, it follows that, if $\psi_n(t) \to \psi(t)$, where $e^{\psi(t)}$, $t \in \Re$, is a ch.f. of a r.v. (see Proposition 2(i) below), then $f_n(t) \to e^{\psi(t)}$, $t \in \Re$, and therefore $\mathcal{L}(S_n) \Rightarrow$ the law corresponding to $e^{\psi(t)}$. The converse is also true, by Theorem 3 in Chapter 11 and continuity of $\log_p z$. Therefore, in order to find the limiting laws of $\mathcal{L}(S_n)$ (in the weak convergence sense), it suffices to find the (pointwise) limits of $\psi_n(t)$, $t \in \Re$. This is the subject matter of this section and the content of Theorem 2 below. To this end, we have
\[
\psi_n(t) = \sum_j\int(e^{itx} - 1)\,dF_{nj}(x) = \sum_j\int(e^{itx} - 1 - itx)\,dF_{nj}(x),
\]
since $\int itx\,dF_{nj}(x) = it\int x\,dF_{nj}(x) = itEX_{nj} = 0$. Next,
\[
\frac{e^{itx} - 1 - itx}{x^2} \xrightarrow[x\to 0]{} -\frac{t^2}{2},
\]
by expanding $e^{itx}$ around $t = 0$ (see, e.g., the discussion following relation (11.18) in Chapter 11). Then by defining $\frac{e^{itx} - 1 - itx}{x^2}$ to be equal to $-\frac{t^2}{2}$ for $x = 0$ (i.e., by continuity), we get
\begin{align*}
\psi_n(t) &= \sum_j\int\frac{e^{itx} - 1 - itx}{x^2}\,x^2\,dF_{nj}(x)\\
&= \sum_j\int\frac{e^{itx} - 1 - itx}{x^2}\,d\Bigl(\int_{-\infty}^x y^2\,dF_{nj}(y)\Bigr)\\
&= \int\frac{e^{itx} - 1 - itx}{x^2}\,d\Bigl(\sum_j\int_{-\infty}^x y^2\,dF_{nj}(y)\Bigr),
\end{align*}
or
\[
\psi_n(t) = \int\frac{e^{itx} - 1 - itx}{x^2}\,dK_n(x), \quad K_n(x) = \sum_j\int_{-\infty}^x y^2\,dF_{nj}(y). \tag{12.9}
\]
In connection with the function $K_n$ defined in (12.9), we have

Proposition 1. The function $K_n$ defined in (12.9) is the d.f. of a r.v.

Proof. In fact, $0 \le K_n\uparrow$, clearly; $K_n(-\infty) = 0$, also clearly; and $K_n(\infty) = \sum_j\int_{-\infty}^{\infty} y^2\,dF_{nj}(y) = \sum_j\sigma_{nj}^2 = 1$. $K_n$ is also right-continuous. For this, it suffices to show that $\int_{-\infty}^x y^2\,dF_{nj}(y)$ is right-continuous. To this end, let $x \downarrow x_0$. Then
\[
\int_{-\infty}^x y^2\,dF_{nj}(y) - \int_{-\infty}^{x_0} y^2\,dF_{nj}(y) = \int_{(x_0, x]} y^2\,dF_{nj}(y) = \int y^2 I_{(x_0, x]}(y)\,dF_{nj}(y),
\]
and $y^2 I_{(x_0, x]}(y) \xrightarrow[x\downarrow x_0]{} y^2 I_{\varnothing}(y) = 0$, whereas $y^2 I_{(x_0, x]}(y) \le y^2$, independent of $x$ and $F_{nj}$-integrable. Thus, the Dominated Convergence Theorem implies that
\[
\int y^2 I_{(x_0, x]}(y)\,dF_{nj}(y) \xrightarrow[x\downarrow x_0]{} 0,
\]
or equivalently, $\int_{-\infty}^x y^2\,dF_{nj}(y) \xrightarrow[x\downarrow x_0]{} \int_{-\infty}^{x_0} y^2\,dF_{nj}(y)$.

Next,
\[
\frac{|e^{itx} - 1 - itx|}{x^2} \le \frac{|e^{itx} - 1| + |tx|}{|x|^2} \le \frac{2|tx|}{|x|^2} = \frac{2|t|}{|x|} \xrightarrow[0<|x|\to\infty]{} 0
\]
(by Exercise 4 in Chapter 11). Therefore $\frac{e^{itx} - 1 - itx}{x^2}$ is bounded and also continuous as a function of $x$ for each fixed $t$. It follows that for any d.f. $K$ (with finite variation) the integral below, to be denoted by $\psi(t)$,
\[
\psi(t) = \int\frac{e^{itx} - 1 - itx}{x^2}\,dK(x), \tag{12.10}
\]
exists and is finite (in norm).

In the following, two propositions, Propositions 2 and 3, and two lemmas, Lemmas 2 and 3, are stated before the main result of this section, Theorem 2, can be stated and proved. The proofs of the propositions are deferred to Section 12.4. The lemmas are purely complex analysis results and are stated here for the sake of completeness only.

Proposition 2. Let $K$ be a d.f. of bounded variation. Then the complex-valued function $\psi$ given in (12.10) is well defined as explained above. Furthermore:

(i) The function $e^{\psi(t)} \overset{\text{def}}{=} f(t)$ is the ch.f. of a r.v., $X$, say.
(ii) $EX = 0$ and $\sigma^2(X) = EX^2 = \operatorname{Var}K$.

Corollary 1. Let $K$ be a d.f. with variation $1$, and let the function $\psi$ be defined by (12.10), so that $e^{\psi(t)}$ is the ch.f. of a r.v. (by part (i) of the proposition). Then under conditions (C) in (12.1), $e^{\psi(t)}$ is the ch.f. of the sum of $n$ i.i.d. r.v.s. That is, for each $n$, there exist i.i.d. r.v.s $X_{nj}$, $j = 1, \ldots, n$, satisfying (C), such that $\mathcal{L}(S_n) = \mathcal{L}$ (corresponding to $e^{\psi(t)}$), and therefore $f_n(t) \to e^{\psi(t)}$ on $\Re$, where $S_n = \sum_j X_{nj}$ and $f_n$ is the ch.f. of $S_n$.
Corollary 2. If in (C) the condition $s_n^2 = \sum_{j=1}^{k_n}\sigma_{nj}^2 = 1$ is replaced by $s_n^2 \le c\,(<\infty)$, and $K$ is any d.f. of bounded variation, then the proposition and Corollary 1 still hold true.

Corollary 3. Let $K$ be a d.f. of bounded variation, and on $\Re$, define $\psi$ by (12.10). Then:

(i) The second derivative of $\psi(t)$ exists and is given by
\[
\psi''(t) = -\int e^{itx}\,dK(x), \quad t \in \Re.
\]
(ii) Furthermore, if $K(-\infty) = 0$, there is only one such $K$ that defines the $\psi$ in (12.10).

Proposition 3. For each $n$, let $K_n$ be a d.f. such that $\operatorname{Var}K_n \le c\,(<\infty)$, and let $\psi_n$ be defined by (12.10); i.e.,
\[
\psi_n(t) = \int\frac{e^{itx} - 1 - itx}{x^2}\,dK_n(x), \quad t \in \Re.
\]
(i) Suppose that $K_n \Rightarrow K$, a d.f. (with variation $\le c$, by Theorem 3(ii) in Chapter 8). Then $\psi_n(t) \to \psi(t)$, $t \in \Re$, where $\psi$ is given in (12.10); i.e.,
\[
\psi(t) = \int\frac{e^{itx} - 1 - itx}{x^2}\,dK(x).
\]
(ii) Conversely, if $\psi_n(t) \to \psi(t)$, $t \in \Re$, for some $\psi$, then $K_n \Rightarrow K$ uac (see Definition 3 in Chapter 11), a d.f. (with variation $\le c$), and $\psi$ is given as above.

The following two lemmas are complex analysis results.

Lemma 2. Let $g$ be a complex-valued function defined on $\Re$ such that $g(0) = 1$, $g$ is continuous, and $g(t) \ne 0$, $t \in \Re$. Then there exists a unique (single-valued) function $\lambda$ defined on $\Re$ into $\mathbb{C}$, with $\lambda(0) = 0$ and continuous on $\Re$, for which $g(t) = e^{\lambda(t)}$, $t \in \Re$.

Lemma 3. For each $n$, let $g_n$, $g$ be functions satisfying the assumptions of Lemma 2, and let $\lambda_n$, $\lambda$ be the corresponding functions for which the conclusion of the lemma holds. Then $g_n \to g$, $t \in \Re$, implies $\lambda_n \to \lambda$, $t \in \Re$.

We may proceed with the formulation and proof of the main result in this section; a special case of it is Theorem 1 as already stated.

Theorem 2. Let the r.v.s $X_{nj}$, $j = 1, \ldots, k_n$, $n \ge 1$, satisfy conditions (C) in (12.1), and let $F_{nj}$ and $f_{nj}$ be their respective d.f.s and ch.f.s. Set $S_n = \sum_j X_{nj}$ and
let $K_n$ be as in (12.9); i.e.,
\[
K_n(x) = \sum_j\int_{-\infty}^x y^2\,dF_{nj}(y). \tag{12.11}
\]
Then

(i) If $\mathcal{L}(S_n) \Rightarrow L$, a d.f. of a r.v. with ch.f. $f \ne 0$ on $\Re$, then $K_n \Rightarrow K$ uac, a d.f. with $\operatorname{Var}K \le 1$, and $f(t) = e^{\psi(t)}$, where
\[
\psi(t) = \int\frac{e^{itx} - 1 - itx}{x^2}\,dK(x). \tag{12.12}
\]
(ii) With $K_n$ given in (12.11), suppose $K_n \Rightarrow K$, a d.f. (with $\operatorname{Var}K \le 1$, by Theorem 3(ii) in Chapter 8), and let $\psi$ be defined by (12.12), so that $e^{\psi}$ is the ch.f. of a r.v. (by Proposition 2(i)). Then $\mathcal{L}(S_n) \Rightarrow L$, where $L$ is the d.f. of a r.v. corresponding to the ch.f. $e^{\psi}$.

Proof.
(i) Let $\mathcal{L}(S_n) \Rightarrow L$, a d.f. of a r.v. with ch.f. $f \ne 0$ on $\Re$. Then (by Theorem 3 in Chapter 11) $f_n(t) \to f(t)$ on $\Re$, where $f_n(t) = \prod_j f_{nj}(t)$. Under (C), the Corollary to Lemma 1 yields
\[
\log f_n(t) - \psi_n(t) \to 0, \quad t \in \Re, \tag{12.13}
\]
where $\psi_n(t)$ is given in (12.9) (see also (12.8)); i.e.,
\[
\psi_n(t) = \int\frac{e^{itx} - 1 - itx}{x^2}\,dK_n(x). \tag{12.14}
\]
From (12.13), we get by exponentiation
\[
f_n(t)e^{-\psi_n(t)} \to 1, \quad t \in \Re. \tag{12.15}
\]
Then $f^{-1}f_ne^{-\psi_n} \to f^{-1}$, or $e^{-\psi_n} \to f^{-1}$, or $e^{\psi_n} \to f$; i.e.,
\[
e^{\psi_n(t)} \to f(t), \quad t \in \Re. \tag{12.16}
\]
The ch.f. $f$ has the properties $f(0) = 1$, as the ch.f. of a r.v., and continuity (by Theorem 1(i), (ii) in Chapter 11), whereas $f(t) \ne 0$ on $\Re$ by assumption. Thus $f$ satisfies the assumptions of Lemma 2. So, there exists a unique function $\psi: \Re \to \mathbb{C}$, continuous on $\Re$, with $\psi(0) = 0$, and such that $f(t) = e^{\psi(t)}$ on $\Re$. Therefore, by (12.16),
\[
e^{\psi_n(t)} \to e^{\psi(t)} \text{ on } \Re. \tag{12.17}
\]
Set $g_n(t) = e^{\psi_n(t)}$ and $g(t) = e^{\psi(t)}\ (= f(t))$. From the derivations leading to relation (12.10), it is immediate that $\psi_n(0) = 0$, and $\psi_n(t)$ is continuous on $\Re$ for each $n \ge 1$. Indeed, $\psi_n(t) = \int\frac{e^{itx} - itx - 1}{x^2}\,dK_n(x)$, and
\[
\Bigl|\frac{e^{itx} - itx - 1}{x^2}\Bigr| \le \frac{|e^{itx} - 1| + |tx|}{x^2} \le \frac{|tx| + |tx|}{x^2} = \frac{2|t||x|}{x^2} = \frac{2|t|}{|x|} \le \frac{2|t_0|}{|x|}, \quad |t| \le |t_0| \ (\text{some } t_0 \in \Re),
\]
independent of $t \in [-|t_0|, |t_0|]$, $K_n$-integrable, and
\[
\frac{e^{itx} - itx - 1}{x^2} \xrightarrow[t\to t_0]{} \frac{e^{it_0x} - it_0x - 1}{x^2}.
\]
Then the Dominated Convergence Theorem implies that $\psi_n(t) \xrightarrow[t\to t_0]{} \psi_n(t_0)$. Therefore the functions $g_n$ $(n \ge 1)$ and $g$ satisfy the conditions of Lemma 2. This fact, along with convergence (12.17) and Lemma 3, yields $\psi_n(t) \to \psi(t)$ on $\Re$. So, $\psi_n$ is given by (12.14) and $\psi_n(t) \to \psi(t)$, $t \in \Re$, for some $\psi$ (as described earlier). Then, by Proposition 3(ii), $K_n \Rightarrow K$ uac, a d.f. with variation $\le 1$, and $\psi$ is given by (12.12).

(ii) By Proposition 1, $K_n$ is a d.f. of a r.v. $(n \ge 1)$, and the assumption is that $K_n \Rightarrow K$, some d.f. (with $\operatorname{Var}K \le 1$). Therefore Proposition 3(i) applies and yields that $\psi_n(t) \to \psi(t)$, $t \in \Re$, where $\psi_n(t)$ $(n \ge 1)$ and $\psi$ are given by (12.14) and (12.12), respectively; hence $e^{\psi_n(t)} \to e^{\psi(t)}$ on $\Re$. By this and (12.15) (which holds under conditions (C)), we have then $f_n(t) \to e^{\psi(t)}$ on $\Re$, where $e^{\psi(t)}$ is a ch.f. of a r.v., by Proposition 2(i). Thus $\mathcal{L}(S_n) \Rightarrow L$, where $L$ is the d.f. of a r.v. corresponding to $e^{\psi}$; this is so by the converse part of Theorem 3 in Chapter 11.

Corollary. In conditions (C) given in (12.1), replace $s_n^2 = 1$ by $s_n^2 \le c$. Then one has $\operatorname{Var}K_n \to \operatorname{Var}K$ if and only if $s_n^2 \to \sigma^2(L)$ (the variance of the distribution $L$).

Proof. From (12.11), we have
\[
\operatorname{Var}K_n = \sum_j\int y^2\,dF_{nj}(y) = \sum_j\sigma_{nj}^2 = s_n^2.
\]
On the other hand, Proposition 2(ii) gives $\operatorname{Var}K = \sigma^2(L)$ (equal to the second moment of $L$, the d.f. of a r.v. corresponding to $e^{\psi}$, where $\psi$ is determined by $K$ through (12.12), since its first moment is $0$). Hence the result.
12.3 Conditions for the Central Limit Theorem to Hold

In this section, we operate basically under conditions (C) given in (12.1), as has been the case in the previous section. We provide necessary and sufficient conditions for the CLT to hold; in other words, we establish Theorem 1, and also give several sets of sufficient conditions for the validity of the CLT. Here the d.f. $K$ is selected to accommodate the needs of Theorem 1; namely, $K$ is chosen as follows:
\[
K(x) = \begin{cases} 0, & x < 0\\ 1, & x \ge 0.\end{cases} \tag{12.18}
\]
So, $K$ is a d.f. of a r.v. with a jump of length $1$ at the origin (and it uniquely determines $\psi(t)$). Then the formula
\[
\psi(t) = \int\frac{e^{itx} - 1 - itx}{x^2}\,dK(x)
\]
gives
\[
\psi(t) = \int_{\{0\}}\Bigl(-\frac{t^2}{2}\Bigr)\,dK(0) = -\frac{t^2}{2},
\]
so that $e^{\psi(t)} = e^{-\frac{t^2}{2}}$, which is the ch.f. of $N(0,1)$. Recall that
\[
K_n(x) = \sum_j\int_{-\infty}^x y^2\,dF_{nj}(y), \tag{12.19}
\]
where $F_{nj}$ is the d.f. of $X_{nj}$. Then, by Theorem 2(ii), if $K_n \Rightarrow K$, it follows that $\mathcal{L}(S_n) \Rightarrow N(0,1)$, whereas (by part (i) of Theorem 2) if $\mathcal{L}(S_n) \Rightarrow N(0,1)$, then $K_n \Rightarrow K$ uac. Therefore, in order to ensure that $\mathcal{L}(S_n) \Rightarrow N(0,1)$, it suffices to find conditions under which $K_n \Rightarrow K$, where $K_n$ and $K$ are given by (12.19) and (12.18), respectively. This issue will be settled in the following two lemmas, the combination of which will also provide the proof of Theorem 1.

Lemma 4. For each $n \ge 1$, let $X_{nj}$, $j = 1, \ldots, k_n \xrightarrow[n\to\infty]{} \infty$, be row-wise independent r.v.s with $EX_{nj} = 0$, $\sigma_{nj}^2 = EX_{nj}^2 < \infty$, and $s_n^2 = \sum_{j=1}^{k_n}\sigma_{nj}^2 = 1$. Let $F_{nj}$ be the d.f. of $X_{nj}$ and let $S_n = \sum_{j=1}^{k_n}X_{nj}$. Finally, suppose that, for every $\varepsilon > 0$,
\[
g_n(\varepsilon) = \sum_{j=1}^{k_n}\int_{(|x|\ge\varepsilon)} x^2\,dF_{nj}(x) \to 0. \tag{12.20}
\]
Then

(i) $\max_{1\le j\le k_n}\sigma_{nj}^2 \to 0$ (so that conditions (C) in (12.1) are fulfilled).
(ii) $K_n \Rightarrow K$, where $K_n$ and $K$ are given in (12.19) and (12.18), respectively.
(iii) $\mathcal{L}(S_n) \Rightarrow N(0,1)$.

Proof.
(i) Clearly,
\[
\sigma_{nj}^2 = \int x^2\,dF_{nj}(x) = \int_{(|x|\ge\varepsilon)} x^2\,dF_{nj}(x) + \int_{(|x|<\varepsilon)} x^2\,dF_{nj}(x) \le g_n(\varepsilon) + \int_{(|x|<\varepsilon)} x^2\,dF_{nj}(x) \le g_n(\varepsilon) + \varepsilon^2,
\]
so that $\max_j\sigma_{nj}^2 \le g_n(\varepsilon) + \varepsilon^2$. Hence $\limsup_n \max_j\sigma_{nj}^2 \le \varepsilon^2$, by (12.20), and letting $\varepsilon \to 0$, we get the result.
(ii) We have to show that $K_n(-\varepsilon) \to 0$ and $K_n(\varepsilon) \to 1$ for every $\varepsilon > 0$. To this end,
\begin{align*}
K_n(-\varepsilon) &= \sum_j\int_{(-\infty,-\varepsilon]} x^2\,dF_{nj}(x)\\
&\le \sum_j\int_{(-\infty,-\varepsilon]} x^2\,dF_{nj}(x) + \sum_j\int_{[\varepsilon,\infty)} x^2\,dF_{nj}(x)\\
&= \sum_j\int_{(|x|\ge\varepsilon)} x^2\,dF_{nj}(x) = g_n(\varepsilon) \to 0.
\end{align*}
Also,
\begin{align*}
K_n(\varepsilon) &= \sum_j\int_{(-\infty,\varepsilon]} x^2\,dF_{nj}(x) = \sum_j\int_{(-\infty,\infty)} x^2\,dF_{nj}(x) - \sum_j\int_{(\varepsilon,\infty)} x^2\,dF_{nj}(x)\\
&= 1 - \sum_j\int_{(\varepsilon,\infty)} x^2\,dF_{nj}(x) \to 1,
\end{align*}
because $\sum_j\int_{(\varepsilon,\infty)} x^2\,dF_{nj}(x) \le g_n(\varepsilon) \to 0$. Thus $K_n \Rightarrow K$.

(iii) As has already been mentioned, part (i) ensures that conditions (C) in (12.1) hold. Then this, part (ii), and Theorem 2(ii) imply that $\mathcal{L}(S_n) \Rightarrow L$, where $L$ is the distribution corresponding to $e^{\psi(t)}$, which here is $e^{-\frac{t^2}{2}}$. In other words, $\mathcal{L}(S_n) \Rightarrow N(0,1)$.

Lemma 5. Assume conditions (C), let $F_{nj}$ be the d.f. of $X_{nj}$, and set $S_n = \sum_{j=1}^{k_n}X_{nj}$. Then, if $\mathcal{L}(S_n) \Rightarrow N(0,1)$, it follows that the convergence in (12.20) holds for every $\varepsilon > 0$.

Proof. Let $K_n$ and $K$ be given by (12.19) and (12.18), respectively. Then, by Theorem 2(i), $\mathcal{L}(S_n) \Rightarrow N(0,1)$ implies $K_n \Rightarrow K$ uac. Next,
\begin{align*}
g_n(\varepsilon) &= \sum_j\int_{(|x|\ge\varepsilon)} x^2\,dF_{nj}(x)\\
&= \sum_j\int_{(-\infty,-\varepsilon]} x^2\,dF_{nj}(x) + \sum_j\int_{[\varepsilon,\infty)} x^2\,dF_{nj}(x)\\
&\le \sum_j\int_{(-\infty,-\varepsilon]} x^2\,dF_{nj}(x) + \sum_j\int_{(\frac{\varepsilon}{2},\infty)} x^2\,dF_{nj}(x)\\
&= K_n(-\varepsilon) + 1 - K_n\Bigl(\frac{\varepsilon}{2}\Bigr),
\end{align*}
and we wish to show that $g_n(\varepsilon) \to 0$. It suffices to show that for every $\{m\} \subseteq \{n\}$ there exists $\{r\} \subseteq \{m\}$ such that $g_r(\varepsilon) \to 0$. By considering $\{K_m\}$, there exists $\{r\} \subseteq \{m\}$ such that $K_r \underset{r\to\infty}{\Longrightarrow} K^*$, a d.f. (by Theorem 5 in Chapter 8), and $K^* = K + c$, some constant $c$. Therefore
\begin{align*}
K_r(-\varepsilon) + 1 - K_r\Bigl(\frac{\varepsilon}{2}\Bigr) &\xrightarrow[r\to\infty]{} K^*(-\varepsilon) + 1 - K^*\Bigl(\frac{\varepsilon}{2}\Bigr)\\
&= K(-\varepsilon) + c + 1 - K\Bigl(\frac{\varepsilon}{2}\Bigr) - c = 0 + 1 - 1 = 0.
\end{align*}
Thus, $g_r(\varepsilon) \xrightarrow[r\to\infty]{} 0$.
We may now proceed with the proof of Theorem 1.

Proof of Theorem 1. Suppose that $g_n(\varepsilon) \to 0$ for every $\varepsilon > 0$. Then, by Lemma 4(i), $\max_j\sigma_{nj}^2 \to 0$, whereas by Lemma 4(iii), $\mathcal{L}(S_n) \Rightarrow N(0,1)$. On the other hand, under conditions (C), if $\mathcal{L}(S_n) \Rightarrow N(0,1)$, then $g_n(\varepsilon) \to 0$ for every $\varepsilon > 0$; this is so by Lemma 5. The proof is complete.

In reference to Example 2, we show that Theorem 1 does not provide the asymptotic distribution of $\mathcal{L}(S_n)$. However, the asymptotic distribution of $\mathcal{L}(S_n)$ may be obtained directly, as is done in the following example. As usual, all limits are taken as $n \to \infty$.

Example 4. In reference to Example 2, show directly, by means of ch.f.s, that $\mathcal{L}(S_n) \Rightarrow \mathcal{L}(cX + d)$, where the r.v. $X$ is distributed as $P(\lambda)$, $X \sim P(\lambda)$, $c = 1/\sqrt{\lambda}$, and $d = -\sqrt{\lambda}$.

Indeed, recalling that $S_n = \sum_{j=1}^n(Y_{nj} - p_n)/\sqrt{np_nq_n}$, we have
\begin{align*}
f_{S_n}(t) &= f_{\sum_{j=1}^n(Y_{nj}-p_n)/\sqrt{np_nq_n}}(t) = f_{\sum_{j=1}^n(Y_{nj}-p_n)}\bigl(t/\sqrt{np_nq_n}\bigr)\\
&= \prod_{j=1}^n f_{Y_{nj}-p_n}\bigl(t/\sqrt{np_nq_n}\bigr) = \prod_{j=1}^n e^{-ip_nt/\sqrt{np_nq_n}}\,f_{Y_{nj}}\bigl(t/\sqrt{np_nq_n}\bigr)\\
&= e^{-inp_nt/\sqrt{np_nq_n}}\bigl[f_{Y_{n1}}\bigl(t/\sqrt{np_nq_n}\bigr)\bigr]^n\\
&= e^{-inp_nt/\sqrt{np_nq_n}}\bigl(p_ne^{it/\sqrt{np_nq_n}} + q_n\bigr)^n \quad (\text{since } f_{Y_{n1}}(t) = p_ne^{it} + q_n)\\
&= q_n^n\,e^{-inp_nt/\sqrt{np_nq_n}}\Bigl(1 + \frac{np_ne^{it/\sqrt{np_nq_n}}/q_n}{n}\Bigr)^n.
\end{align*}
However,
\[
q_n^n = (1 - p_n)^n = \Bigl(1 + \frac{-np_n}{n}\Bigr)^n \to e^{-\lambda}, \qquad
e^{-inp_nt/\sqrt{np_nq_n}} \to e^{-i\lambda t/\sqrt{\lambda}} = e^{-it\sqrt{\lambda}},
\]
and
\[
\Bigl(1 + \frac{np_ne^{it/\sqrt{np_nq_n}}/q_n}{n}\Bigr)^n \to e^{\lambda e^{it/\sqrt{\lambda}}}, \quad \text{since} \quad \frac{np_ne^{it/\sqrt{np_nq_n}}}{q_n} \to \lambda e^{it/\sqrt{\lambda}}.
\]
It follows that
\[
f_{S_n}(t) \to e^{-\lambda}\times e^{-it\sqrt{\lambda}}\times e^{\lambda e^{it/\sqrt{\lambda}}},
\]
which is the ch.f. of the r.v. $cX + d = \frac{1}{\sqrt{\lambda}}X - \sqrt{\lambda}$, where $X \sim P(\lambda)$ with ch.f. $e^{\lambda e^{it}-\lambda}$.
with that to be arrived at in n Application 1 of Chapter 13, where it is shown that L j=1 Yn j ⇒ P(λ). As a consequence of Theorem 1, we have the following results. Theorem 3 (Liapounov Theorem). For each n ≥ 1, and with n → ∞ throughout, let the r.v.s X n j , j = 1, . . . , kn → ∞, be
row-wise independent and such that E X n j = 0 and σn2j = σ 2 (X n j ) < ∞, with sn2 = j σn2j = 1. Then, if for some δ > 0, E|X n j |2+δ → 0, (12.21) j
it follows that L(Sn ) ⇒ N (0, 1) and max σn2j → 0, where Sn = Proof.
j
j
(12.22)
Xnj .
By Theorem 1, it suffices to show that, for every ε > 0, gn (ε) = x 2 dFn j (x) → 0, j
(|x|≥ε)
where Fn j is the d.f. of X n j . We have E|X n j |2+δ = |x|2+δ dFn j (x) ≥ |x|2+δ dFn j (x) (−∞,∞) (|x|≥ε) 2 δ δ |x| |x| dFn j (x) ≥ ε x 2 dFn j (x), = (|x|≥ε)
so that
(|x|≥ε)
E|X n j |2+δ ≥ εδ gn (ε).
j
Therefore, relation (12.21) implies gn (ε) → 0, which ensures (12.22).
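For a concrete feel of condition (12.21) (an illustrative sketch only; the exponential choice for the underlying distribution and the value $\delta = 1$ are assumptions of the illustration), note that for the standardized i.i.d. array of Example 1 one has $\sum_j E|X_{nj}|^3 = E|Y_1 - \mu|^3/(\sigma^3\sqrt{n})$, which the following Monte Carlo estimate shows decreasing like $1/\sqrt{n}$:

    import numpy as np

    rng = np.random.default_rng(3)
    mu, sigma = 1.0, 1.0                     # Exp(1), an arbitrary illustrative choice

    def liapounov_sum(n, reps=200_000):
        # sum_j E|X_nj|^3 with X_nj = (Y_j - mu) / (sigma * sqrt(n))
        y = rng.exponential(1.0, size=reps)
        return n * np.mean(np.abs((y - mu) / (sigma * np.sqrt(n)))**3)

    for n in [10, 100, 1000, 10_000]:
        print(n, round(liapounov_sum(n), 4))  # decreases like 1/sqrt(n), so (12.21) holds with delta = 1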
The theorem is true, in particular, if δ = 1; i.e., if
Remark 2.
E|X n j |3 → 0.
j
Corollary. replaced by
The conclusion of the theorem holds true if condition (12.21) is |X n j | ≤ Mn j (< ∞) a.s., max Mn j → 0. j
Proof.
We have
|X n j | = 3
j
X n2 j |X n j |
a.s. ≤ max Mn j X n2 j , j
j
j
so that, by taking expectation of both sides, we obtain E|X n j |3 ≤ max Mn j E X n2 j = max Mn j → 0, j
j
j
j
2
2 2 since j E Xnj = j σ (X n j ) = sn = 1. Then Remark 2 applies and gives the result. Now Theorem 1 can be generalized as follows. Theorem 4. For each n ≥ 1, let the r.v.s X n j , j = 1, . . . , kn → ∞, be row-wise
n σn2j . independent and such that E X n j = 0 and σn2j = σ 2 (X n j ) < ∞. Set sn2 = kj=1 Then σn2j Sn ⇒ N (0, 1) and max 2 → 0 , L sn sn if and only if, for every ε > 0, gn (ε) =
1 x 2 dFn j (x) → 0, sn2 (|x|≥εsn ) j
where Sn = Proof.
j
X n j and Fn j is the d.f. of X n j .
Set Yn j =
Xnj , sn
j = 1, . . . , kn .
Then for each n, the r.v.s Yn j , j = 1, . . . , kn , are independent (within each row) and such that EYn j = 0, τn2j = σ 2 (Yn j ) =
σn2j sn2
< ∞, and set τn2 =
kn j=1
τn2j = 1.
That is, the basic assumptions of Theorem 1 are satisfied for the r.v.s Yn j , j = 1, . . . , kn . Thus, by setting Sn Yn j = and gn (ε) = x 2 dG n j (x), Tn = sn (|x|≥ε) j
j
where G n j is the d.f. of Yn j , we obtain L(Tn ) ⇒ N (0, 1) and max j τn2j → 0, if and only if gn (ε) → 0 for every ε > 0. However, Xnj ≤ x = P(X n j ≤ xsn ) = Fn j (xsn ), G n j (x) = P(Yn j ≤ x) = P sn so that gn (ε) =
(|x|≥ε)
j
x 2 dG n j (x) =
j
(|x|≥ε)
x 2 dFn j (xsn )
1 = 2 y 2 dFn j (y) (by setting xsn = y) sn (|y|≥εsn ) j 1 = 2 x 2 dFn j (x). sn (|x|≥εsn ) j
So, L(Tn ) ⇒ N (0, 1) and max j τn2j → 0, if and only if 1 x 2 dFn j (x) → 0, sn2 (|x|≥εsn ) j
or
σn2j ⇒ N (0, 1) and max 2 → 0, if and only if sn 1 x 2 dFn j (x) → 0 for every ε > 0. gn (ε) = 2 sn (|x|≥εsn ) L
Sn sn
j
Specializing the result obtained to a single sequence of r.v.s, we have the following theorem. Theorem 5. Let the r.v.s X j , j = 1, . . . , kn , be independent and such that E X j = 0
n σ j2 . Then and σ j2 = σ 2 (X j ) < ∞. Set sn2 = kj=1 σ j2 Sn ⇒ N (0, 1) and max 2 → 0 , L (12.23) j sn sn if and only if, for every ε > 0, gn (ε) = where Sn =
1 x 2 dF j (x) → 0 , sn2 (|x|≥εsn )
(12.24)
j
j
X j and F j is the d.f. of X j .
Proof. For each n, set Yn j = independent,
Xj sn
, j = 1, . . . , kn . Then Yn j , j = 1, . . . , kn , are
EYn j = 0, τn2j = σ 2 (Yn j ) =
σ j2 sn2
< ∞, and τn2 =
kn
τn2j = 1.
j=1
Sn With Tn = j Yn j = sn and G n j = F j being the d.f. of Yn j , j = 1, . . . , kn , Theorem 1 applies to the present Yn j s and gives the result, since gn (ε) = x 2 dG n j (x) = x 2 dF j (xsn ) j
=
1 sn2
(|x|≥ε)
j
(|y|≥εsn )
j
(|x|≥ε)
y 2 dF j (y) (by setting xsn = y)
1 = 2 x 2 dF j (x). sn (|x|≥εsn )
j
Corollary. Let the r.v.s X j , j = 1, . . . , n, be i.i.d. with E X j = 0 and σ 2 = σ 2 (X j ) < ∞. Then relation (12.24) is satisfied, and so is (12.23), which here becomes σ2 Sn Sn 1 ⇒ N (0, 1) (and =L L = → 0). √ 2 sn nσ n σ n We have sn2 = nσ 2 , so that 1 1 2 n x dF(x) = 2 x 2 dF(x), gn (ε) = nσ 2 (|x|≥εσ √n) σ (|x|≥εσ √n) where F is the common d.f. of the X j s. Now, by the fact that x 2 dF(x) < ∞, and the relations Proof.
$x^2 I_{(|x|\ge\varepsilon\sigma\sqrt{n})}(x) \le x^2$, $n \ge 1$, and $x^2 I_{(|x|\ge\varepsilon\sigma\sqrt{n})}(x) \to 0$, the Dominated Convergence Theorem applies and gives $\int_{(|x|\ge\varepsilon\sigma\sqrt{n})} x^2\,dF(x) \to 0$. Hence the result.
Remark 3. The corollary just established provides the CLT in its most common form.
This section is concluded with an example providing a couple of useful results. Example 5. Let X 1 , . . . , X n be i.i.d. r.v.s with E X 1 = μ ∈ and Var(X 1 ) = σ 2 ∈ (0, ∞), and let X¯ n be the sample mean of the X j s (which is used for esti
2 mating μ), X¯ n = n1 nj=1 X j . Then E X¯ n = μ, Var( X¯ n ) = σn , and by the CLT, √ √ d n( X¯ n −μ) d −→ Z ∼ N (0, 1), or n( X¯ n − μ) → Y ∼ N (0, σ 2 ). σ n→∞
n→∞
When μ is known, the sample variance Sn2 of the X j s (which is used for estimating √
¯ d σ 2 ) is Sn2 = n1 nj=1 (X j − μ)2 , and n( XSnn −μ) −→ Y ∼ N (0, 1). Setting Y j = n→∞
(X j − μ)2 , the r.v.s Y1 , . . . , Yn are i.i.d. with EY1 = Var(X 1 ) = σ 2 and Var(Y1 ) = EY12 −(EY1 )2 = E(X 1 −μ)4 −σ 4 = μ4 −σ 4 , where μ4 is the fourth central moment of X 1 , assuming, of course, that E X 14 < ∞. So, E Sn2 = σ 2 and Var(Sn2 ) = (μ4 − σ 4 )/n. By the CLT again, √ √ n(Y¯n − σ 2 ) n(S 2 − σ 2 ) d = n → Z ∼ N (0, 1), μ4 − σ 4 μ4 − σ 4 n→∞ or
√ d n(Sn2 − σ 2 ) −→ Y ∼ N (0, μ4 − σ 4 ).
n→∞
If μ is unknown, the sample variance used for estimating σ 2 is S¯n2 = nj=1 (X j −
X¯ n )2 /(n −1). Then, as is easily seen, nj=1 (X j − X¯ n )2 = nj=1 (X j −μ)2 −n( X¯ n − 2 μ)2 , and hence E S¯n2 = 1 (nσ 2 − n × σ ) = σ 2 . Furthermore, n−1
S¯n2 = so that √
n( S¯n2 − σ 2 ) =
As n → ∞,
n
n n S2 − ( X¯ n − μ)2 , n−1 n n−1
√ √ n √ ¯ n √ σ2 n − [ n( X n − μ)]2 . n(Sn2 − σ 2 ) + n−1 n−1 n−1
n √ 2 n−1 n(Sn
d
− σ 2 ) → Y ∼ N (0, μ4 − σ 4 ), by the result obtained n→∞ √ √ 2√ n earlier and Theorem 7 (ii) in Chapter 5, σn−1n −→ 0, and n−1 [ n( X¯ n − μ)]2 −→ 0, √ d by the fact that n( X¯ n − μ) −→ Y ∼ N (0, σ 2 ), Example 3 in Chapter 11, and Theorem 7 (ii) in Chapter 5 again. Then, by the theorem just cited, it follows that √ d n( S¯n2 − σ 2 ) −→ Y ∼ N (0, μ4 − σ 4 ).
12.4 Proof of Results in Section 12.2 In this section, we provide the proofs of Proposition 2 and its three corollaries stated in Section 12.2, as well as the proof of Proposition 3, also stated in Section 12.2. For the proof of Proposition 2, the following lemma is needed which is also of independent interest. To this effect, recall that from Theorem 1 and Lemma 5 of Chapter 11, we have that if E|X n | < ∞, then dn f (t)|t=0 = i n E X n dt n and f n (t) is continuous on . The following result is also true (and is used in the proof of Proposition 2).
Lemma 6. Let X be a r.v. with d.f. F and ch.f. f . Then, if f (2n) (0) exists and is finite (in norm), it follows that E|X k | < ∞ for all k ≤ 2n and E X k = i1k f (k) (0). Proof. For a complex-valued function g defined on , set g(u) = g(u + h) − g(u −h), h ∈ , and define (n) g(u) recursively. Then it can be seen (see Exercise 7) that 2n (2n) r 2n f [(2n − 2r )h], (12.25) f (0) = (−1)
r r =0
and
(2n) f (0) 1 o(h 2n ) 1 (2n) = f (0) + = f (2n) (0) + 2n o(1) 2n 2n 2n (2h) 2 h 2 with o(1) −→ 0. h→0
(12.26)
By replacing f by what it is equal to, (12.25) becomes as follows: 2n (2n) r 2n
f (0) = (−1) ei[(2n−2r )h]x dF(x) r r =0 2n r 2n −i hx r i hx 2n−r (e = (−1) ) (e ) dF(x) r r =0 2n e−i hx − ei hx dF(x) = 2n ei hx − e−i hx dF(x) = 2n 2n 2n [2i sin(hx)] dF(x) = i 2 [sin(hx)]2n dF(x) = = (−1)n 22n [sin(hx)]2n dF(x).
Thus
sin(hx) 2n 2n
(2n) f (0) n = (−1) x dF(x). (12.27) (2h)2n hx Letting h → 0 and utilizing (12.26), (12.27) becomes sin(hx) 2n 2n (2n) n f (0) = (−1) lim x dF(x). (12.28) h→0 hx 2n Now 0 ≤ sin(hx) x 2n (here is where the even-order derivative is employed!). hx Thus, by the Fatou–Lebesgue Theorem (Theorem 2 in Chapter 5), sin(hx) 2n 2n sin(hx) 2n 2n x dF(x) = lim inf x dF(x) lim h→0 h→0 hx hx
sin(hx) 2n 2n x dF(x) hx h→0 sin(hx) 2n 2n = lim x dF(x) h→0 hx = x 2n dF(x) = E X 2n = (−1)n f (2n) (0),
≥
lim inf
since
sin t → 1. t t→0
From this result and (12.28), it follows that E X 2n < ∞ and hence
E|X |k < ∞ for all k ≤ 2n. That E X k = Chapter 11. Proof of Proposition 2.
1 ik
f (k) (0) follows from Theorem 1(v) in
Recall that it x e − 1 − it x d K (x), ψ(t) = x2
(12.29)
where K is a d.f. of bounded variation. (i) As was pointed out just prior to the formulation of the proposition, for each it x x t ∈ , the function e −1−it is well defined for all x ∈ and also continuous x2 and bounded. Thus, the integral in (12.29) is well defined (and finite in norm) as a Riemann–Stieltjes integral and is taken as the limit of Riemann–Stieltjes sums corresponding to any arbitrarily chosen collection of division points and any points in the corresponding intervals. More precisely, for an < 0 < bn with an → −∞ and bn → ∞ where, which here and in the sequel all limits are taken as n → ∞, consider the interval (an , bn ] and divide it into subintervals by the points an = xn0 < xn1 < · · · < xn,rn −1 < xnrn = bn (rn → ∞), which are chosen to be = 0 and such that max (xnk − xn,k−1 ) → 0.
1≤k≤rn
For each subdivision, we have the following corresponding Riemann–Stieltjes sum de f
Tn (t) =
rn eit xnk −1−it x k=1
2 xnk
nk
K (xn,k−1 , xnk ]
−K (x n,k−1 ,x nk ] i = t+ xnk k
K (xn,k−1 ,xnk ] it xnk (e 2 xnk
− 1) ,
(12.30)
where K (xn,k−1 , xnk ] stands for the variation of K over the interval (xn,k−1 , xnk ]. By setting K (xn,k−1 , xnk ] , βnk = xnk , and xnk K (xn,k−1 , xnk ] = (≥ 0), 2 xnk
αnk = − λnk
relation (12.30) becomes as follows: iαnk t + λnk (eiβnk t − 1) . Tn (t) =
(12.31)
k
Now, for each n, let Ynk , k = 1, . . . , rn , be independent (within each row) r.v.s such that Ynk is P(λnk ), the Poisson distribution with parameter λnk , so that its ch.f. is given by it f Ynk (t) = eλnk (e −1) . Set Z nk = αnk + βnk Ynk . Then the r.v.s Z nk , k = 1, . . . , rn , are independent (within each row) and the ch.f. of the r.v. Z nk is given by f Z nk (t) = eiαnk t eλnk (e Thus
iβnk t −1)
= eiαnk t+λnk (e
iβnk t −1)
!
iαnk t + λnk (eiβnk t − 1) exp Tn (t) = exp = is a ch.f. (that of the r.v.
.
"
k
e
iαnk t+λnk (eiβnk t −1)
k
Z nk ). But it x e − 1 − it x Tn (t) → d K (x) = ψ(t), x2
so that
k
de f
de f
f n (t) = e Tn (t) → eψ(t) = f (t). eit x −1−it x x2
(12.32) t2
Define g(t; x) by: g(t; x) = for x = 0, and g(t; x) = − 2 for x = 0. Restrict |t| ≤ 1 and look at g(t; x) as a function of x. Then, for x = 0, |g(t; x)| = it x e −1−it x |eit x −1|+|t x| 2 ≤ 2|tx 2x| ≤ |x| → 0 as |x| → ∞, and, for x = 0, ≤ x2 x2 |g(x; t)| ≤ 1, so that |g(· ; t)| is bounded by a bound independent of t (|t| ≤ 1) and K -integrable. Also, the integrand tends to 0, as t → 0, for all x ∈ . Then, as t → 0, the Dominated Convergence Theorem yields ψ(t) → ψ(0) = 0, so that ψ is continuous at 0 and then so is f (t). Thus, we have that f n are ch.f.s of r.v.s and, by (12.32), f n → f in with f being continuous at the origin. Then the converse part of the Paul Lévy Continuity Theorem (Theorem 3 in Chapter 11) implies that f is the ch.f. of a uniquely determined d.f. of a r.v., call it X . This establishes part (i). Restrict attention to |t| ≤ 1, and observe that, for x = 0, | ∂t∂ g(t; x)| = it(ii) e x −1 |t x| i x ≤ |x| = |t| ≤ 1 independent of t and K -integrable. Since
ψ(t) =
g(t; x)d K (x) =
{0}
g(t; x)d K (x) +
−{0}
g(t; x)d K (x)
eit x − 1 − it x v0 2 d K (x), t + 2 x2 −{0} v0 = K (0) − K (0−), =−
Theorem 5 in Chapter 5 applies and yields eit x − 1 − it x d d K (x) ψ (t) = −v0 t + dt −{0} x2 ∂ eit x − 1 − it x d K (x) = −v0 t + x2 −{0} ∂t it x e −1 i d K (x). = −v0 t + x −{0}
(12.33)
In particular,
ψ (0) = 0. (12.34) 2 it x Next, for x = 0, | ∂t∂ 2 g(t; x)| = ∂t∂ i e x−1 = | − eit x | = 1, again independent of t and K -integrable. Then by the theorem just cited and (12.33), it x e −1 d i d K (x) ψ (t) = −v0 + dt −{0} x it x e −1 ∂ i d K (x) = −v0 + x −{0} ∂t it x = −v0 − e d K (x) = − eit x d K (x). (12.35) −{0}
In particular,
ψ (0) = −Var K .
(12.36)
Now, eψ(t) = f (t) is the ch.f. of the r.v. X , whose second-order derivative at 0 is eψ(t) |t=0 = ψ (t)eψ(t) |t=0 = ψ (t)eψ(t) + [ψ (t)]2 eψ(t) |t=0 = ψ (0) = −Var K finite, by (12.34) and (12.36) and the fact that ψ(0) = 0. Then Lemma 6 applies and gives 1 ψ(t) e |t=0 = ψ (t)eψ(t) |t=0 = 0, EX = i and
1 ψ(t) e |t=0 = −ψ (0) = Var K (by (12.36)). i2 The proof of the proposition is completed. E X2 =
Proof of Corollary 1.
From (12.29), we get
ψ(t) = n where, for each n,
eit x − 1 − it x K (x) , d x2 n ψ
K n
is a d.f. of bounded variation. Then, by Proposition 2(i), e n ψ
is the ch.f. of a r.v. Let X n j , j = 1, . . . , n, be row-wise i.i.d. r.v.s with ch.f. e n . ψ
Then the ch.f., f n , of Sn = nj=1 X n j is f n = nj=1 e n = eψ , so that L(Sn ) = L(corresponding to eψ ). It remains for us to check conditions (C). By Proposition ψ d ψ(t)/n d 2 ψ(t)/n e |t=0 = 0, dt |t=0 = 2(ii), applied with e n rather than eψ , we have dt 2e −Var Kn , by (12.34) and (12.36), so that the first and second moments of the distribution ψ
corresponding to the ch.f. e n are 0 and Var Kn , respectively. Thus, E X n j = 0, and σn2j = σ 2 (X n j ) = Var Kn = n1 Var K = n1 , j = 1, . . . , n, so that max σn2j = n1 → 0, j
and sn2j = j σn2j = 1. Proof of Corollary 2. Actually, the only point so far where the condition sn2 = 1 was used was in the proof of Lemma 1, which, however, also holds true under the K is condition sn2 ≤ c (see Remark 1). So, if K is a d.f. with Var K = v, then nv ψ
K also a d.f. with Var nv = n1 . Hence e n is the ch.f. of row-wise i.i.d. r.v.s X n j , j =
1, . . . , n, with E X n j = 0 and σ 2 (X n j ) = Var Kn = nv . It follows that j σn2j = v and max j σn2j = nv → 0.
Proof of Corollary 3. (i) For an arbitrary t ∈ , consider, e.g., the interval [t −1, t +1] and apply Theorem 5 in Chapter 5 to obtain, as in relation (12.35), ψ (t) = − eit x d K (x).
(ii) Let K ∗ be another d.f. as described in part (ii), and such that
eit x − 1 − it x d K ∗ (x) = ψ(t), t ∈ . x2
Then, as in part (i), ψ (t) = −
or − ψ (t) =
eit x d K (x) = −
eit x d K (x) =
eit x d K ∗ (x),
eit x d K ∗ (x).
(12.37)
In particular, de f
− ψ (0) = Var K = Var K ∗ = v0 ,
(12.38)
so that, by (12.37), − Thus, the d.f.s
K v0
and
ψ (t) = v0 K∗ v0
eit x d
K (x) = v0
eit x d
K ∗ (x) . v0
(12.39)
K K∗ v0 (−∞) = v0 (−∞) = 0. Since, ∗ that vK0 = Kv0 in , or K = K ∗ in .
have variation 1 and
(12.39), they have the same ch.f., it follows
by
Proof of Proposition 3. (i) As in the proof of Proposition 2(i), let g(t; x) = (eit x − 1 − it x)/x 2 for x = 0, and (see discussion following relation (12.32)) g(t; x) = −t 2 /2 for x = 0. Then for x = 0, |g(t; x)| ≤ 2|t| |x| → 0 as x → ∞, so that, for each t ∈ , g(t; x) is bounded and continuous in as a function of x. Thus, if K n ⇒ K , a d.f. (with Var K ≤ c), then the Helly–Bray Extended Lemma (Theorem 7 in Chapter 8; see also Remark 4 in the same chapter) applies and yields ψn (t) = g(t; x) d K n (x) → g(t; x) d K (x) = ψ(t), t ∈ . (12.40)
(ii) Consider the sequence {K n } and apply the Weak Compactness Theorem (Theorem 5 in Chapter 8) to obtain {K m } ⊆ {K n } such that K m ⇒ K ∗ , some d.f. m→∞
(with Var K ∗ necessarily ≤ c). Define ψ ∗ by it x e − 1 − it x ψ ∗ (t) = d K ∗ (x), t ∈ . 2 x
Then, by part (i), ψm → ψ ∗ in . Since also ψm → ψ in , it follows that m→∞
m→∞
ψ ∗ = ψ in . Next, let {K r } ⊆ {K n } distinct from {K m } such that K r ⇒ K ∗∗ , some d.f. (of variation necessarily ≤ c), and define ψ ∗∗ by it x e − 1 − it x ψ ∗∗ (t) = d K ∗∗ (x), t ∈ . x2
r →∞
Then, as before, ψr → ψ ∗∗ in , whereas ψr → ψ in also, so that ψ ∗∗ = ψ r →∞
r →∞
in . It follows that ψ ∗∗ = ψ ∗ (= ψ) in . On the other hand, by relation (12.37), −ψ (t) = eit x d K ∗ (x) = eit x d K ∗∗ (x),
so that the d.f.s K ∗ and K ∗∗ have the same ch.f. Then, by the inversion formula (Theorem 2 in Chapter 11) it follows that K ∗ = K ∗∗ +C, for a constant C. Hence (by Definition 3 in Chapter 11) K n ⇒ K uac, some d.f. K (with Var K ≤ c), and all of them determine the same ψ.
Exercises. 1. If the independent r.v.s X j , j ≥ 1, are distributed as U (− j, j), then show that the Lindeberg condition (see relation (12.24)) holds, so that L( Ssnn ) ⇒ N (0, 1), n→∞
where Sn = nj=1 X j and sn2 = nj=1 σ j2 , σ j2 = Var(X j ). Hint: Recall that X ∼ U (α, β) means that the r.v. X has the uniform distribution with parameters α and β (α < β), its probability density function is p(x) =
n α+β (α−β)2 1 2 j=1 j = β−α I[α,β] (x), E X = 2 and Var(X ) = 12 . Finally, recall that n(n + 1)(2n + 1)/6. 2. If the independent r.v.s X j , j ≥ 2, are distributed as follows: P(X j = − j α ) = P(X j = j α ) = = 1−
2 jβ
1 , jβ
P(X j = 0)
(α, β > 0),
show that the restriction β < 1 ensures that the Lindeberg condition (relation (12.24)) holds. Hint: Show that the restriction β < 1 ensures that the set of js with j = 1, . . . , n, and | ± j α | ≥ εsn , is empty for all ε > 0, for large n. For an arbitrary, but fixed β(< 1), it is to be understood that j ≥ j0 , where j0 = 21/β , if 21/β is an integer, or j0 = [21/β ] + 1 otherwise. This ensures that 1 − j2β is nonnegative.
3. Let the r.v.s X j , j ≥ 1, be distributed as follows: P(X j = ± j α ) =
1 , 6 j 2(α−1)
P(X j = 0) = 1 −
1 , 3 j 2(α−1)
α > 1. Show that the Lindeberg condition (relation (12.24)) holds, if and only if α < 23 .
Conclude that L( Ssnn ) ⇒ N (0, 1), where Sn = nj=1 X j and sn2 = nj=1 σ j2 , n→∞
σ j2 = Var(X j ). Hint: For α < 23 , show that j 2α < ε2 sn2 , j = 1, . . . , n, which is implied by n 2α <
ε ε2 sn2 for large n, so that gn (ε) = 0. Next, gn (ε) ≥ 1− 18 (1− k1 )(2− k1 ) εk2 s 2 k 3−2α , 2
2α
n
where k = [(εsn )1/α ], and conclude that the expression on the right-hand side does not converge to 0 for α ≥ 23 . 4. For j = 1, 2, . . . , n, let X j be independent r.v.s defined as follows: ⎧ 2 ⎨ ± j with probability 1/12 j 2 each X j = ± j with probability 1/12 each ⎩ 0 with probability 1 − 1/6 − 1/6 j 2 . Then show that the Lindeberg condition (condition (12.24)) does not hold.
Hint: Recall that
n
2 j=1 j = n(n + 1)(2n + 1)/6, and show that, for every
sn−2 nj=1 j 2 I(|± j|≥εsn ) = o(1), sn−2 kj=1 j 2 = o(1), k =
ε > 0 and large n: [(εsn )1/2 ], in order to conclude that gn (ε) 0. n→∞
5. Let X n , n ≥ 1, be independent r.v.s such that |X n | ≤ Mn a.s. with Mn = o(sn )
where sn2 = nj=1 σ j2 → ∞ and σ j2 = Var(X j ). n→∞
Set Sn = nj=1 X j and show that L
Sn − E Sn sn
⇒ N (0, 1).
n→∞
Hint: From the assumption Mn = o(sn ), it follows that εsn , n > n 0 (= n(ε)), ε > 0. Write
Mn → 0, so that sn n→∞
Mn <
n0 Sn − E Sn 1 = (X j − E X j ) sn sn j=1
1 + sn
n
(X j − E X j ) (n > n 0 ),
j=n 0 +1
and since the first term tends to 0, as n → ∞, work with the second term only. To this end, set Yn j = (X j − E X j )/τn , where τn2 = sn2 − sn20 , and show that the r.v.s Yn j , j = n 0 + 1, . . . , n, satisfy the Liapounov condition (for δ = 1) (see Theorem 3). 6. Let X n j , j = 1, . . . , kn → ∞ be row-wise independent r.v.s with E X n j = 0,
n σn2j = 1, and max1≤ j≤kn σn2j → 0. Then, σ 2 (X n j ) = σn2j < ∞, sn2 = kj=1 n→∞
n with Sn = kj=1 X n j , and under the assumption that L(Sn ) ⇒ N (0, 1), show
n P X n2 j → 1. that kj=1 n→∞
Hint: By Theorem 1, it follows that, for every ε > 0, gn (ε) = j (|x|≥εsn ) x 2 dFn j → 0, where Fn j is the d.f. of X n j . One way of proceeding is to show that n→∞ $
2# → 0, j P(|X n j | ≥ ε) → 0, j σ 2 X n2 j I(X 2 <1) j σ |X n j |I(|X n j |≥1) n→∞ nj n→∞
2
2 P 2 → 0, → 0, and, finally, P( j X n j j X n j I(X n2 j <1) − j E[X n j I(|X n2 j |<1) ]n→∞ n→∞
2 I(X 2 <1) = j X n j ) −→ 0. nj
n→∞
7. For a function g : → C, set g(u) = g(u + h) − g(u − h), h ∈ , and define
(n) g(u) recursively. Then (i) Show that
(m) g(u) =
m r =0
(−1)r
m g(u + (m − 2r )h). r
(ii) From part (i), and by expanding f (t) around 0 up to terms of order 2n, obtain o(h 2n ) (2n) f (0) = f (2n) (0) + 2n (2h) h 2n o(t) → 0 as t → 0), (where t so that f (2n) (0) = lim
(2n) f (0) (2h)2n
as h → 0.
Part (i) is# proved by induction in m. In the process of doing so, the relation #Hint: m $ #m $ m+1$ + = will be needed. In proving part (ii), the following relation is r +1 r r +1 required (which you may use here without proof; see, however, the next exercise): 2n
(−1)r
r =0
2n 0 k < 2n (2n − 2r )k = 22n (2n)! k = 2n. r
8. For m = 1, 2, . . . and k integer, show that m r m (m − 2r )k (−1) r r =0 0 for (0 ≤)k < m = 2m × m! k = m.
(12.41)
Hint: The proof is by induction. The result is easily checked for m = 1, 2, 3. Then assume (12.41) and show that m r =0
=
(−1)r
m+1 (m + 1 − 2r )k r
0 for k < m + 1 2m+1 × (m + 1)! k = m + 1.
(12.42)
# $ First, prove (12.42) for k < m + 1. For this purpose, use the identity m+1 = r #m $ # m $ + , and in the process of the subsequent parts of the proof, use the r r −1
k #k $ l k−l expansion (x +y)k = l=0 , along with (12.41). In establishing (12.42) l x y m+1 = (m + 1 − 2r )m (m + 1 − 2r ), use the for k = m#+ 1,$write (m + 1# − 2r ) m $ = (m + 1) r −1 , repeatedly use the expansion just mentioned, fact that r m+1 r and, of course, also employ relation (12.41). 9. For j = 1, . . . , kn → ∞, let X n j be row-wise independent r.v.s with respective n→∞ d.f.s Fn j . Then show that, for every ε > 0, the convergence P max |X n j | ≥ ε → 0 (12.43) 1≤ j≤kn
n→∞
is equivalent to
kn j=1 (|x|≥ε)
dFn j (x) → 0 n→∞
(12.44)
and implies the convergence max P(|X n j | ≥ ε) → 0.
1≤ j≤kn
n→∞
(12.45)
Hint: Use Exercise 3 in Chapter 10. 10. Show that, for every complex number z, it holds |z 2 | = |z|2 . 11. In the proof of Proposition 2 (see discussion following relation (12.36)), Lemma 6 is used for n = 1 to ensure finiteness of E X 2 , which, according to the lemma, follows from the existence and finiteness of f (2) (0). Prove this fact directly; i.e., show that the existence and finiteness of f (2) (0) implies finiteness of E X 2 .
CHAPTER 13
The Central Limit Problem: The Noncentered Case
This chapter is a continuation of the previous chapter. In the basic conditions (C), summarized in (12.1) of Chapter 12, the assumption that the r.v.s involved have expectations zero was included. In the present chapter, this assumption is dropped (hence the term “noncentered”). As it will be seen, however, this has ramifications that will be addressed in this chapter. In Section 13.1, conditions (C) are restated for convenience, and then they are modified to conditions (C ) listed in (13.1) by replacing the requirement sn2 = 1 with the less stringent requirement sn2 ≤ c(< ∞). It is then stated that the basic Theorem 2 in Chapter 12 still holds true under (C ). This statement is justified by carefully walking through all preceding results in Chapter 12. Then conditions (C ) are modified even further by dropping the requirement that E X n j = 0. This brings us to conditions (C ) listed in (13.2). These are the conditions under which the present chapter operates. The next step is to relate quantities, such as d.f. and ch.f., corresponding to the centered r.v.s X n j −E X n j and to the noncentered r.v.s X n j . This process and some observations bring us to relation (13.11), which is the starting point for the following section. In Section 13.2, the quantity ψ¯ plays the role of ψ of Chapter 12, and a new ψ is introduced as in relation (13.12). Then this section is devoted to stating and proving four propositions analogous to Proposition 2, its corollaries, and Proposition 3, in the previous chapter. The culmination of this section is Theorem 2 , analogous to Theorem 2 of Chapter 12. In Section 13.3, two important special cases of Theorem 2 are discussed. In the first one, the limiting law is the N (μ, σ 2 ) distribution, and in the second the limiting law is the Poisson distribution with parameter λ, P(λ). The section (and the chapter) is concluded with an application to the second special case just described. This application produces the familiar result that, under certain conditions, binomial probabilities are approximated by Poisson probabilities.
13.1 Notation and Preliminary Discussion

Recalling the basic setting from the previous chapter, we are dealing with the triangular arrays of r.v.s that are subject to conditions (C) in relation (12.1) of that chapter. We
reproduce them here for easy reference. ⎧ For each n ≥ 1, the r.v.s X n j , j = 1, . . . , kn → ∞, ⎪ ⎪ n→∞ ⎪ ⎪ ⎪ ⎪ ⎨ are independent within each row, kn (C) E X = 0, σ 2 = σ 2 (X ) < ∞ and s 2 = σn2j = 1; n j n j ⎪ n nj ⎪ ⎪ j=1 ⎪ ⎪ ⎪ ⎩ also, max σn2j → 0. 1≤ j≤kn
n→∞
As has already been mentioned in the introductory part, the central purpose here is to drop the assumption that E X n j = 0, and study its ramifications. First, however, an observation is in order, and that is that the fundamental Theorem 2 in Chapter 12 holds true under conditions (C) by suppressing sn2 = 1 and replacing it by sn2 ≤ c(< ∞) for all n. Again, for convenient reference, restate the resulting assumptions and denote them by (C ). That is, ⎧ For each n, the r.v.s X n j , j = 1, . . . , kn → ∞, ⎪ ⎪ n→∞ ⎪ ⎪ ⎪ ⎨ are row-wise independent, (13.1) (C ) E X = 0, σ 2 = σ 2 (X ) < ∞ and s 2 ≤ c(< ∞); nj nj ⎪ n nj ⎪ ⎪ ⎪ ⎪ ⎩ also, max σn2j → 0. 1≤ j≤kn
n→∞
The assertion then is that Statement. in (13.1).
Theorem 2 in Chapter 12 holds true under conditions (C ) listed
Justification. All one has to do is to go carefully through the proofs of the theorem and of other results used, and see what bearing on them the condition sn2 = 1 has. All references made here will be to Chapter 12. First, Lemma 1 and its corollary hold under the condition sn2 ≤ c as was pointed out in Remark 1. In Proposition 1, it is to be observed that now K n , n ≥ 1, are d.f.s with V ar K n ≤ c, and K n (−∞) = 0, but not necessarily d.f.s of r.v.s. Lemma 6 and Proposition 2 are not related to conditions (C). Corollary 1 to Proposition 2 holds true under the condition sn2 ≤ c, because of Corollary 2 to Proposition 2. Corollary 3 makes use of relation (12.35) pertaining to Proposition 2, which, however, is not related to conditions (C). Thus, Corollary 3 to Proposition 2 is not related to conditions (C) either. In Proposition 3, part (i) holds for d.f.s K n with V ar K n ≤ c (which amounts to sn2 ≤ c) and K n (−∞) = 0. Part (ii) makes use of K n s as just described and of Corollary 3 to Proposition 2, which is unrelated to conditions (C). In other words, Proposition 3 still holds, if sn2 ≤ c. Finally, regarding Theorem 2, the proof of part (i) makes use of (12.13), which also holds true for sn2 ≤ 1 (because of Remark 1). Also, it uses Lemmas 2 and 3, which are unrelated to conditions (C), and of Proposition 3(ii), which holds under sn2 ≤ c. The proof of part (ii) makes use of Proposition 3(i) and (12.15), both holding under sn2 ≤ c. This completes the justification of the statement.
Now, starting with conditions (C ) and dropping the requirement that E X n j = 0, we have conditions (C ): ⎧ → ∞, ⎪ ⎪ For each n, the r.v.s X n j , j = 1, . . . , kn n→∞ ⎪ ⎪ ⎪ ⎨ are row-wise independent, (C ) E X = α , say, finite, σ 2 = σ 2 (X ) < ∞ and (13.2) nj nj nj ⎪ nj ⎪ ⎪ 2 ⎪ ⎪ ⎩ sn ≤ c(< ∞); also, max σn2j → 0. n→∞
1≤ j≤kn
To economize on the notation, we mention once and for all that, for a r.v. X with E X = α, finite, σ 2 (X ) = σ 2 < ∞, with d.f. F and ch.f. f , we set X¯ = X − α, so that E X¯ = 0 and σ 2 ( X¯ ) = σ 2 ,
(13.3)
and F¯ and f¯, respectively, for the d.f. and the ch.f. of X¯ . For later reference, we mention here that
and
¯ f¯(t) = Eeit X = e−iαt f (t),
(13.4)
¯ F(x) = P( X¯ ≤ x) = P(X ≤ x + α) = F(x + α).
(13.5)
Also, all limits are taken as n → ∞ unless otherwise specified, and all maxima, summations and products are over j ranging from 1 to kn . The ch.f.s f¯n j and f n j of the r.v.s X¯ n j = X n j − αn j and X n j , respectively, are on account of (13.4), (13.6) f¯n j (t) = e−iαn j t f n j (t), so that f¯n (t) =
kn
f¯n j (t) = e−iαn t f n (t),
f n (t) =
j=1
kn j=1
f n j (t), αn =
kn
αn j .
(13.7)
j=1
The respective d.f.s F¯n j and Fn j of the r.v.s X¯ n j and X n j are related as follows, on account of (13.5): F¯n j (x) = P( X¯ n j ≤ x) = P(X n j ≤ x + αn j ) = Fn j (x + αn j ). Set K¯ n (x) =
kn
x
j=1 −∞
y 2 d F¯n j (y) =
kn
x
j=1 −∞
y 2 dFn j (y + αn j ),
(13.8)
(13.9)
and observe that K¯ n is a d.f. with K¯ n (−∞) = 0 and V ar K¯ n ≤ c. By means of K¯ n , define the function ψ¯ n by it x e − 1 − it x ¯ ψ¯ n (t) = d K n (x). (13.10) x2
273
274
CHAPTER 13 The Central Limit Problem: The Noncentered Case
It is immediate that the r.v.s X¯ n j , j = 1, . . . , kn , satisfy conditions (C ). Therefore Lemma 1 and its Corollary in Chapter 12 apply and give log f¯n (t) − ψ¯ n (t) → 0 on , and hence, for all t ∈ , ¯ ¯ f¯n (t)e−ψn (t) → 1, or e−iαn t f n (t)e−ψn (t) → 1, or f n (t)e−ψn (t) → 1, (13.11) n where ψn (t) = iαn t + ψ¯ n (t), αn = kj=1 αn j . This will be the starting point of the next section where results analogous to Proposition 2, its Corollaries 1 and 3, and Proposition 3 will be formulated. Also, a result analogous to the fundamental Theorem 2 in Chapter 12 will be formulated and proved, which is, actually, the main purpose of this chapter.
13.2 Limiting Laws of L(Sn ) Under Conditions (C )
n As has been the case throughout, Sn stands for partial sums; i.e., Sn = kj=1 X n j , and the purpose of the present section is to determine conditions under which L(Sn ) has a limiting distribution, and determine the class of all limiting laws of L(Sn ). This is done in stages with the formulation and proof of two propositions and two corollaries (Proposition 2 , its two corollaries–Corollary 1 and 3 —and Proposition 3 ) the way it was done in Section 12.2 (of Chapter 12). Proposition 2 . Define ψ by
Let K be a d.f. of bounded variation and let α be a real number.
¯ ¯ ψ(t) = iαt + ψ(t), where ψ(t) =
eit x − 1 − it x d K (x). x2
(13.12)
¯ Then eψ(t) is the ch.f. of a r.v. X¯ , say, and eψ(t) is the ch.f. of the r.v. X = X¯ + α. Furthermore, E X¯ = 0, σ 2 ( X¯ ) = V ar K , so that E X = α and σ 2 (X ) = V ar K . ¯ Proof. That eψ(t) is the ch.f. of a r.v. X¯ follows from Proposition 2(i) in Chapter 12. That eψ(t) is the ch.f. of the r.v. X = X¯ + α follows from the relation eψ(t) = ¯ eiαt eψ(t) . Finally, that E X¯ = 0 and σ 2 ( X¯ ) = V ar K follow from Proposition 2(ii) in Chapter 12. ¯ Corollary 1 . Let K be a d.f. with bounded variation, and let the function ψ be ¯
defined by (13.12), so that eψ(t) is the ch.f. of a r.v. (by part (i) of Proposition 2 in Chapter 12). Let α be a real number, and define ψ by (13.12). Then eψ is the ch.f. of a conditions r.v. Finally, for each n, there exist i.i.d. r.v.s X n j , j = 1, . . . , n, satisfying (C ), such that L(Sn ) = L(corresponding to eψ(t) ), where Sn = j X n j . ¯ Proof. As already mentioned, eψ is the ch.f. of a r.v. X¯ which, by Proposition 2(ii) in Chapter 12, has E X¯ = 0 and σ 2 ( X¯ ) = V ar K . Thus eψ is the ch.f. of the r.v.
13.2 Limiting Laws of L(Sn ) Under Conditions (C )
X = X¯ + α and E X = α and σ 2 (X ) = V ar K . This is so, because by (13.12), ¯
¯
eψ(t) = eiαt+ψ(t) = eiαt eψ(t) , t ∈ , and the right-hand side is the ch.f. of X + α, since α and X are independent. Next, ¯ ψ(t) let X¯ n j , j = 1, . . . , n, be i.i.d. r.v.s with ch.f. e n , and let X n j = X¯ n j + αn . Then the ¯ iαt ψ(t) ¯ ch.f. of X n j is e n + n , so that the ch.f. of Sn = j X n j is eiαt+ψ(t) = eψ(t) . That is, L(Sn ) = L (corresponding to eψ(t) ). Next, E X¯ n j = 0 and σ 2 ( X¯ n j ) = n1 V ar K by Proposition 2(ii) in Chapter 12, so that E X n j = αn and σ 2 (X n j ) = n1 V ar K . It follows that sn2 = V ar K and max j σn2j = n1 V ar K → 0. The conditions (C ) are fulfilled, and the proof is completed. Corollary 3 . Let K be a d.f. of bounded variation with K (−∞) = 0, and let α be a real number. Define ψ as in (13.12). Then the pair (α, K ) uniquely determines ψ, and vice versa. That is, if K ∗ is another d.f. as before and α ∗ ∈ determining the same ψ, then K = K ∗ and α = α ∗ . Proof. Indeed, let K ∗ be a d.f. of bounded variation with K ∗ (−∞) = 0 and let α ∗ ∈ , such that ¯ iαt + ψ(t) = ψ(t) = ψ ∗ (t) = iα ∗ t + ψ¯ ∗ (t), t ∈ , where
(13.13)
eit x − 1 − it x d K ∗ (x), and ψ¯ is as in (13.12). x2 Use (13.12), the defining relation for ψ¯ ∗ , and take the second-order derivative with respect to t as in relations (12.37) and (12.38) (of Chapter 12) in order to get V ar K = ψ¯ ∗ (t) =
de f
−ψ (0) = −ψ ∗ (0) = V ar K ∗ = v0 , or by (12.39) in Chapter 12, K K∗ ψ (t) = eit x d (x) = eit x d (x). − v0 v0 v0 Then, as in Corollary 3(ii) to Proposition 2 in Chapter 12, it follows that K ∗ = K on . Therefore, ψ¯ ∗ = ψ¯ in and hence α ∗ = α, by (13.13). Proposition 3 . For each n, let K¯ n be a d.f. such that V ar K¯ n ≤ c(< ∞), and let αn ∈ . Define ψ¯ n and ψn by ψ¯ n (t) =
eit x − 1 − it x ¯ d K n (x), ψn (t) = iαn t + ψ¯ n (t), t ∈ . x2
(13.14)
(i) Suppose that K¯ n ⇒ K , some d.f. (with V ar K necessarily ≤ c) and αn → α ∈ . Then ψn (t) → ψ(t), t ∈ , where ψ is defined by it x e − 1 − it x ¯ ¯ ψ(t) = iαt + ψ(t) and ψ(t) = d K (x). (13.15) x2
275
276
CHAPTER 13 The Central Limit Problem: The Noncentered Case
(ii) Let ψn (t) be defined by (13.14) and suppose that ψn (t) → ψ(t), t ∈ , some function on . Then K¯ n ⇒ K uac, some d.f. (of V ar K ≤ c), and αn → α ∈ . Furthermore, ψ corresponds to the pair (α, K ) and is defined as in (13.15), where K is any member of the family of limiting distribution of { K¯ n }. Proof. ¯ t ∈ , where ψ¯ is defined (i) The assumption K¯ n ⇒ K implies that ψ¯ n (t) → ψ(t), by (13.15). This follows by Proposition 3(i) in Chapter 12. Then that ψn → ψ follows from the fact that αn → α and relation (13.14). (ii) In order to show that αn → α ∈ , it suffices to show that for every {m} ⊆ {n} there exists {r } ⊆ {m} such that αr → α. By considering {m} ⊆ {n}, there r →∞ exists {r } ⊆ {m} such that K¯ r ⇒ K ∗ , some d.f. (with V ar K ∗ necessarily r →∞
≤ c), and define ψ¯ ∗ by ψ¯ ∗ (t) =
eit x − 1 − it x d K ∗ (x), t ∈ . it
(13.16)
Then ψ¯ r (t) → ψ¯ ∗ (t), by Proposition 3(i) in Chapter 12. Also, ψr (t) → ψ(t), r →∞ r →∞ by assumption. By (13.14), we have ψr (t) = iαr t + ψ¯r (t), so that for t = 0, ψ(t) − ψ¯ ∗ (t) de f ∗ ψr (t) − ψ¯ r (t) → = α ∈ , r →∞ it it independent of t.
αr =
(13.17)
Next, consider {s} ⊆ {n} distinct from {m}. Then there exists {ν} ⊆ {s} such that K¯ ν ⇒ K ∗∗ , some d.f. (with V ar K ∗∗ necessarily ≤ c), and define ψ¯ ∗∗ as ν→∞ in (13.16); i.e., it x e − 1 − it x ψ¯ ∗∗ (t) = d K ∗∗ (x), t ∈ . (13.18) it As before, ψ¯ ν (t) → ψ¯ ∗∗ (t) and ψν (t) → ψ(t), and for t = 0, ν→∞
ν→∞
ψ(t) − ψ¯ ∗∗ (t) de f ∗∗ ψν (t) − ψ¯ ν (t) → = α ∈ , ν→∞ it it independent of t.
αν =
From (13.17) and (13.19), we have ψ(t) = iα ∗ t + ψ¯ ∗ (t) = iα ∗∗ t + ψ¯ ∗∗ (t), so that
ψ (t) = ψ¯∗ (t) = ψ¯∗∗ (t).
(13.19)
13.2 Limiting Laws of L(Sn ) Under Conditions (C )
On the other hand, from (13.15), (13.16), and by (12.37) (in Chapter 12), it x ∗ ∗ ¯ ψ (t) = ψ (t) = − e d K (x) = − eit x d K ∗∗ (x) = ψ¯∗∗ (t),
so that the d.f.s K ∗ and K ∗∗ have the same ch.f. It follows that K ∗ = K ∗∗ + c, ¯ Thus, ψ¯∗ (t) = for some constant c, and hence K ∗ and K ∗∗ determine the same ψ. ∗ ∗∗ ψ¯ (t), t ∈ , and therefore (13.17) and (13.18) imply that α = α ∗∗ , call it α. Also, setting K for any limiting d.f. of { K¯n }, we have then shown that αn → α and K n ⇒ K uac, as was to be seen. We may now proceed with the formulation and proof of the main result in this section. Theorem 2 . Let the r.v.s X n j , j = 1, . . . , kn , n ≥ 1, satisfy conditions (C ) in (13.2), let Fn j and f n j be their respective d.f.s and ch.f.s, and let F¯n j and f¯n j be the respective d.f.s and ch.f.s of X¯ n j = X n j − αn j , E X n j = αn j ∈ . Let Sn = j X n j and let K¯ n be defined by x x y 2 d F¯n j (y) = y 2 dFn j (y + αn j ), K¯ n (x) = j
αn =
−∞
j
−∞
αn j ,
(13.20)
j
(where each K¯ n is a d.f. with V ar K¯ n ≤ c and K¯ n (−∞) = 0). (i) Let L(Sn ) ⇒ L, a d.f. of a r.v. with ch.f. f = 0 on . Then K¯n ⇒ K uac, a d.f. (with V ar K ≤ c, by Theorem 3(ii) in Chapter 8), and αn → α ∈ . Furthermore, f (t) = eψ(t) , where ψ is defined by ¯ ψ(t) = iαt + ψ(t), α ∈ , and it x e − 1 − it x ¯ d K (x), t ∈ . ψ(t) = x2
(13.21)
(ii) With K¯ n given in (13.20), suppose that K¯ n ⇒ K , a d.f. (with Var K ≤ c, by Theorem 3(ii) in Chapter 8), and αn → α ∈ . Then L(Sn ) ⇒ L, where L is the d.f. of a r.v. with ch.f. f given by f (t) = eψ(t) (by Proposition 2 ) and ψ is defined in (13.21). Proof. In the first place, by relation (13.11), we have ¯ f¯n (t)e−ψn (t) → 1 on ,
(13.22)
277
278
CHAPTER 13 The Central Limit Problem: The Noncentered Case
where
f¯n (t) = e−iαn t f n (t), ψ¯ n (t) =
f n (t) =
f n j (t),
j
eit x − 1 − it x ¯ d K n (x), x2
(13.23)
and K¯n and αn are given in (13.20). From (13.22) and (13.23) (see also (13.11)), we have f n (t)e−ψn (t) → 1 on , ψn (t) = iαn t + ψ¯ n (t).
(13.24)
We now proceed with the proof of the theorem. (i) The assumption L(Sn ) ⇒ L, a d.f. of a r.v., implies f n (t) → f (t) on (by Theorem 3 in Chapter 11), whereas f (t) = 0 on implies that eψn (t) → f (t) on ,
(13.25)
by (13.24) and the fact that f −1 (t) f n (t) → 1. Now, f (0) = 1 and f is continuous on as the ch.f. of a r.v., whereas f (t) = 0 on , by assumption. Then, by Lemma 2 in Chapter 12, f (t) = eψ(t) , where ψ is a uniquely defined function on into C with ψ(0) = 0 and continuous on . Next, ψn (0) = 0 and the ψn s are continuous in , as has been seen. Furthermore, (13.25) becomes eψn (t) → eψ(t) on .
(13.26)
The conditions of Lemma 3 in Chapter 12 are satisfied, and the convergence (13.26) implies that ψn (t) → ψ(t) on . On account of this and relations (13.23) and (13.24), Proposition 3 (ii) applies and gives K¯ n ⇒ K uac, some d.f. (of V ar K ≤ c), and αn → α ∈ . Furthermore, ψ corresponds to the pair (α, K ), where K is any member of the family of limiting distributions of { K¯ n }; i.e., ψ is given by (13.21). (ii) Now suppose that K¯ n ⇒ K , a d.f. as described in the theorem, and αn → α ∈ . By means of K and α, define ψ as in (13.21). Then, with ψn defined by (13.24), Proposition 3 (i) applies and gives ψn (t) → ψ(t) on , so that eψn (t) → eψ(t) on . From this last convergence and relation (13.24), we obtain f n (t) → eψ(t) on and eψ(t) is continuous at the origin. Then, by the converse part of Theorem 3 in Chapter 11, it follows that L(Sn ) ⇒ L, a d.f. of a r.v. with ch.f. f such that f (t) = eψ(t) on . This completes the proof.
13.3 Two Special Cases of the Limiting Laws of L(Sn ) In this section, two important special cases of Theorem 2 are considered that result from two specific choices of the d.f. K . In one of those cases, the limiting law is Normal, and in the other the limiting law is the Poisson distribution.
13.3 Two Special Cases of the Limiting Laws of L(Sn )
For the first case, choose the d.f. K as follows:
0 for x < 0, K(x) = σ 2 for x ≥ 0.
(13.27)
¯ defined in (13.21), becomes Then the corresponding ψ, ¯ ψ(t) =−
σ 2t 2 2 2
(by recalling that the integrand is equal to − t2 for x = 0). In the following, we will consider row-wise independent r.v.s X n j with E X n j = αn j and σ 2 (X n j ) = σn2j finite 2 2 2 with αn = j αn j → μ ∈ and sn = j σn j → σ < ∞. Define ψ, as in (13.21), by σ 2t 2 ¯ , (13.28) ψ(t) = iμt + ψ(t) = iμt − 2 and observe that eψ(t) is the ch.f. of the N (μ, σ 2 ) distribution. At this point, observe that, by Corollary 3 and with K as in (13.27), the pair (μ, K ) uniquely determines ψ. Then the following result holds. Theorem 1. For each n ≥ 1, let the r.v.s X n j , j = 1, . . . , kn → ∞, be row-wise independent with E X n j = αn j and σ 2 (X n j ) = σn2j finite, and such that αn =
kn
αn j → μ ∈ , sn2 =
j=1
Set Sn =
k n j=1
kn
σn2j → σ 2 < ∞.
j=1
X n j . Then L(Sn ) ⇒ N (μ, σ 2 ), and max σn2j → 0,
(13.29)
1≤ j≤kn
if and only if, for every ε > 0, h n (ε) =
kn j=1 (|x|≥ε)
x 2 dFn j (x + αn j ) → 0.
(13.30)
The proof of Theorem 1 is facilitated by the following lemmas. Lemma 1. For each n ≥ 1, let X n j , j = 1, . . . , kn → ∞, be row-wise independent r.v.s with E X n j = αn j and σ 2 (X n j ) = σn2j finite, and such that αn =
kn j=1
αn j → μ ∈ ,
sn2
=
kn j=1
σn2j → σ 2 < ∞.
279
280
CHAPTER 13 The Central Limit Problem: The Noncentered Case
n Let Fn j be the d.f. of X n j , and let Sn = kj=1 X n j . Finally, suppose that, for every ε > 0, kn h n (ε) = x 2 dFn j (x + αn j ) → 0. (13.31) j=1 (|x|≥ε)
Then (i) max1≤ j≤kn σn2j → 0 (so that conditions (C ) in (13.2) are satisfied). (ii) With K¯ n defined by K¯ n (y) =
kn
x
j=1 −∞
y 2 dFn j (y + αn j ),
(13.32)
it holds K¯ n ⇒ K , where K is given in (13.27). (iii) L(Sn ) ⇒ N (μ, σ 2 ). Proof. (i) Clearly, σn2j = (x − αn j )2 dFn j (x) = y 2 dFn j (y + αn j )
(by setting x − αn j = y; see also Appendix B for related properties) = x 2 dFn j (x + αn j ) (|x|≥ε) x 2 dFn j (x + αn j ) + (|x|<ε) x 2 dFn j (x + αn j ) ≤ h n (ε) + (|x|<ε)
≤ h n (ε) + ε2 . Hence lim sup max j σn2j ≤ ε2 , by (13.31), and letting ε → 0, we get the result. (ii) As in the proof of Lemma 4(ii) in Chapter 12, we have, for every ε > 0, x 2 dFn j (x + αn j ) K¯ n (−ε) = ≤ and K¯ n (ε) =
j
(−∞,−ε]
j
(|x|≥ε)
x 2 dFn j (x + αn j ) = h n (ε) → 0,
j
(−∞,ε]
x 2 dFn j (x + αn j )
13.3 Two Special Cases of the Limiting Laws of L(Sn )
=
sn2
−
(ε,∞)
j
x 2 dFn j (x + αn j ) → σ 2 ,
because j (ε,∞) x 2 dFn j (x + αn j ) ≤ j [ε,∞) x 2 dFn j (x + αn j ) ≤ h n (ε) → 0. Thus, K¯ n ⇒ K . (iii) By part (ii), K¯ n ⇒ K , and by assumption, αn → μ. Then, by Theorem 2 (ii), 2 2 ¯ L(Sn ) ⇒ L, where the d.f. L has ch.f. eψ(t) with ψ(t) = iμt+ψ(t) = iμt− σ t . It follows that L is the N (μ, σ 2 ), and the proof is completed.
2
Lemma 2. For each n ≥ 1, let X n j , j = 1, . . . , kn → ∞, be row-wise independent r.v.s with E X n j = αn j and σ 2 (X n j ) = σn2j finite, and such that αn =
kn
αn j → μ ∈ , sn2 =
j=1
Set Sn =
k n j=1
kn
σn2j → σ 2 < ∞.
j=1
X n j and suppose that L(Sn ) ⇒ N (μ, σ 2 ) and max σn2j → 0. 1≤ j≤kn
Then the convergence in (13.30) holds for every ε > 0. Proof. The assumption L(Sn ) ⇒ N (μ, σ 2 ) implies that K¯n ⇒ K uac, where K and K¯n are given by (13.27) and (13.32), respectively. This is so by Theorem 2 (i). Next, as in the proof of Lemma 5 in Chapter 12, h n (ε) = x 2 dFn j (x + αn j ) j
(|x|≥ε)
ε , ≤ K¯ n (−ε) + sn2 − K¯ n 2 and we wish to show that h n (ε) → 0. To this end, for every {m} ⊆ {n} there exists {r } ⊆ {m} such that K¯ r ⇒ K ∗ a d.f., and K ∗ = K + c, for some constant c. It r →∞ follows that ε ε K¯ r (−ε) + sr2 − K¯ r → K ∗ (−ε) + σ 2 − K ∗ 2 r →∞ 2 ε 2 −c = K (−ε) + c + σ − K 2 = 0 + σ 2 − σ 2 = 0, so that h r (ε) → 0. Then h n (ε) → 0.
Proof of Theorem 1.
r →∞
It follows from Lemmas 1 and 2.
281
282
CHAPTER 13 The Central Limit Problem: The Noncentered Case
For the second special case of Theorem 2 , choose the d.f. K as follows:
0 for x < 1, (13.33) K (x) = λ for x ≥ 1 (λ > 0). Then, the corresponding ψ¯ is given by it x e − 1 − it x ¯ ψ(t) = d K (x) = (eit − 1 − it)λ. x2 In (13.21), take α = λ. Then the ψ that corresponds to (λ, K ) is ψ(t) = itλ + (eit − 1 − it)λ = λ(eit − 1),
(13.34)
where we observe that eψ(t) is the ch.f. of P(λ), the Poisson distribution with parameter λ. Then, we have the following theorem. Theorem 2. For each n ≥ 1, let the r.v.s X n j , j = 1, . . . , kn → ∞, be rowwise independent with E X n j = αn j and σ 2 (X n j ) = σn2j finite, and such that αn = k n k n k n 2 2 j=1 αn j → λ, sn = j=1 σn j → λ (0 < λ < ∞). Set Sn = j=1 X n j . Then L(Sn ) ⇒ P(λ), and max σn2j → 0,
(13.35)
1≤ j≤kn
if and only if, for every ε > 0, h n (ε) =
kn j=1 (|x−1|≥ε)
x 2 dFn j (x + αn j ) → 0.
(13.36)
The proof of Theorem 2 is facilitated by the following lemmas. Lemma 3. For each n ≥ 1, let X n j , j = 1, . . . , kn → ∞, be row-wise independent r.v.s with E X n j = αn j and σ 2 (X n j ) = σn2j finite, and such that αn =
kn
αn j → λ ∈ , sn2 =
j=1
kn
σn2j → λ.
j=1
n Let Fn j be the d.f. of X n j , and set Sn = kj=1 X n j . Finally, suppose that, for every ε > 0, kn h n (ε) = x 2 dFn j (x + αn j ) → 0. (13.37) j=1 (|x−1|≥ε)
Then, (i) max1≤ j≤kn σn2j → 0, (so that conditions (C ) in (13.2) are satisfied).
13.3 Two Special Cases of the Limiting Laws of L(Sn )
(ii) With K¯ n and K defined by (13.32) and (13.33), respectively, it holds K¯ n ⇒ K . (iii) L(Sn ) ⇒ L, where L is the Poisson distribution with parameter λ. Proof. (i) Clearly,
σn2j = = ≤
(x − αn j )2 dFn j (x) =
y 2 dFn j (y + αn j ) (by setting x − αn j = y) x 2 dFn j (x + αn j ) + x 2 dFn j (x + αn j )
(|x−1|≥ε) h n (ε) + ε2 .
(|x−1|<ε)
Hence lim sup max j σn2j ≤ ε2 , by (13.37), and letting ε → 0, we get the result. (ii) We have to show that, for every ε > 0, K¯ n (1 − ε) → 0 and K¯ n (1 + ε) → λ. Indeed, ¯ x 2 dFn j (x + αn j ) ≤ h n (ε) → 0, K n (1 − ε) = (−∞,1−ε]
j
whereas 0 ≤ sn2 − =
j
≤
(−∞,1+ε]
j
(1+ε,∞)
j
[1+ε,∞)
x 2 dFn j (x + αn j )
x 2 dFn j (x + αn j ) x 2 dFn j (x + αn j ) ≤ h n (ε) → 0.
Therefore, since sn2 → λ, it follows that K¯ n (1 + ε) = j
(−∞,1+ε]
x 2 dFn j (x + αn j ) → λ.
(iii) By part (i), K¯ n ⇒ K , and by assumption αn → λ. Then Theorem 2 (ii) applies and gives that L(Sn ) ⇒ L, where the d.f. L has ch.f. eψ(t) where ψ(t) = ¯ iλt + ψ(t) = iλt + (eit − 1 − it)λ = λ(eit − 1), which is the ch.f. of the Poisson distribution with parameter λ. Lemma 4. For each n ≥ 1, let X n j , j = 1, . . . , kn → ∞, be row-wise independent r.v.s with E X n j = αn j and σ 2 (X n j ) = σn2j finite, and such that αn =
kn j=1
αn j → λ ∈ , sn2 =
kn j=1
σn2j → λ.
283
284
CHAPTER 13 The Central Limit Problem: The Noncentered Case
Set Sn =
k n
j=1 αn j
and suppose that L(Sn ) ⇒ P(λ), and max σn2j → 0. 1≤ j≤kn
Thus the convergence in (13.36) holds for every ε > 0. Proof. Since αn → λ and L(Sn ) ⇒ P(λ), Theorem 2 (i) implies that K¯ n ⇒ K uac. Next, x 2 dFn j (x + αn j ) h n (ε) = (|x−1|≥ε)
j
=
(−∞,1−ε]
j
+
j
≤
x 2 dFn j (x + αn j )
[1+ε,∞)
(−∞,1−ε]
j
+
x 2 dFn j (x + αn j )
x 2 dFn j (x + αn j )
j
(1+ 2ε ,∞)
x 2 dFn j (x + αn j )
ε . = K¯ n (1 − ε) + sn2 − K¯ n 1 + 2 Thus, as in the proof of Lemma 2, passing to a subsequence {r } ⊆ {n} for which K¯ r ⇒ K ∗ a d.f. with K ∗ = K + c, for some constant c, we have r →∞
ε ε → K ∗ (1 − ε) + λ − K ∗ 1 + K¯ r (1 − ε) + sr2 − K¯ r 1 + 2 r →∞ 2 = K (1 − ε) + c + λ ε −K 1 + −c 2 = 0 + λ − λ = 0,
so that h r (ε) → 0, and then h n (ε)→0.
Proof of Theorem 2.
r →∞
It follows from Lemmas 3 and 4.
As an application to Theorem 2, we present the familiar result, according to which Binomial probabilities are approximated by Poisson probabilities. Application 1 (Approximation of a Binomial distribution by a Poisson distribution). For n ≥ 1, let X n j , j = 1, . . . , n, be row-wise independent r.v.s distributed as with parameter pn ), such that pn → 0 and npn → λ ∈ (0, ∞). B(1, pn ) (Bernoulli Set Sn = nj=1 X n j (so that Sn is distributed as B(n, pn ), Binomial with parameters n and pn ). Then L(Sn ) ⇒ L(X ), where X is distributed as P(λ) (Poisson with parameter λ).
13.3 Two Special Cases of the Limiting Laws of L(Sn )
First, we verify nthat the conditions of Theorem 2 are satisfied here. Indeed, E X n j = with λ ∈ (0, ∞). Next, pn and αn = j=1 αn j = npn → λ, by assumption, σn2j = σ 2 (X n j ) = pn qn , where qn = 1 − pn . Also, sn2 = nj=1 σn2j = npn qn → λ, since npn → λ and qn → 1. Then, in order to show that L(Sn ) ⇒ L(X ) (and max j σn2j → 0), it suffices to show that, for every ε > 0, h n (ε) =
j
(|x−1|≥ε)
x 2 dFn j (x + pn ) → 0,
where Fn j = Fn is the d.f. of X n j . Since Fn (x + αn j ) = Fn (x + pn ), and Fn is given by ⎧ ⎨ 0 for x < 0, Fn (x) = qn for 0 ≤ x < 1, ⎩ 1 for x ≥ 1, we have
⎧ ⎨ 0 for x < − pn , Fn (x + pn ) = qn for − pn ≤ x < 1 − pn = qn , ⎩ 1 for x ≥ 1 − pn = qn , Fn (x + pn )
1
}p
n
qn
x = −pn
0
|
1 − ε qn
|
1
|
1+ε
x
285
286
CHAPTER 13 The Central Limit Problem: The Noncentered Case
(see figure). We have h n (ε) =
(|x−1|≥ε)
j
=n =n
(|x−1|≥ε)
x 2 dFn j (x + αn j )
x 2 dFn (x + pn )
(−∞,1−ε]
x 2 dFn (x + pn )
+n =n
[1+ε,∞)
(−∞,1−ε]
x 2 dFn (x + pn ) x 2 dFn (x + pn )
(since Fn (x + pn ) = 1 for x ≥ 1 + ε) = npn2 qn (since qn > 1 − ε for sufficiently large n, so that the only value taken by x with positive probability is − pn ) = npn × pn qn ≤ npn × pn → λ × 0 = 0. Thus, h n (ε) → 0, and therefore L(Sn ) ⇒ P(λ). (Also, max1≤ j≤n σn2j = pn qn ≤ pn → 0.) As a consequence of it, we have n x n−x λx pn qn → e−λ , x = 0, 1, . . . x x! In fact, for x = 0, 1, . . ., take x1 = x − 21 and x2 = x + 21 , so that x1 and x2 are continuity points of the d.f. F of X ∼ P(λ). Thus, FSn (xi ) → FX (xi ), i = 1, 2. On the other hand, P(Sn = x) = P(Sn ≤ x2 ) − P(Sn ≤ x1 ), and likewise P(X = x) = P(X ≤ x2 ) − P(X ≤ x1 ). Thus, P(Sn = x) = P(Sn ≤ x + 0.5) − P(Sn ≤ x − 0.5) → P(X ≤ x + 0.5) − P(X ≤ x − 0.5) = P(X ≤ x) − P(X ≤ x − 1) = P(X = x) = e−λ or
n x
λx , x!
pnx qnn−x → e−λ λx! . x
Exercises. 1. Under the assumption of Theorem 1, the convergence in (13.29) holds if and only if (13.30) holds for every ε > 0. With the notation X¯ n j = X n j − αn j and X¯ n j having d.f. F¯n j , and S¯n = nj=1 X¯ n j , show that the convergence in (13.29) follows from the condition kn x 2 d F¯n j (x) → 0 for every ε > 0. j=1 (|x|≥εsn )
n→∞
13.3 Two Special Cases of the Limiting Laws of L(Sn )
2. In Theorem 1 of this chapter, as well as Theorem 1 of Chapter 12, under some basic assumptions on the r.v.s X n j , j = 1, . . . , kn , given in the theorems cited, a necessary and sufficient condition is given (namely, gn (ε) −→ 0 for every ε > 0) n→∞ so that L(Sn ) ⇒ N (μ, σ 2 ), μ ∈ , σ 2 ∈ (0, ∞), n→∞ k n where Sn = j=1 X n j . Thus, P(Sn ≤ x) = FSn (x) −→ (x; μ, σ 2 ), x ∈ , n→∞
where (x; μ, σ 2 ) is the d.f. of the
N (μ, σ 2 ) distribution. Actually, the preceding convergence is uniform in x ∈ . That is, for every ε > 0 there exists N (ε) > 0, independent of x ∈ , such that n ≥ N (ε) implies |FSn (x) − (x; μ, σ 2 )| < ε for all x ∈ .
(∗ )
This result is a consequence of the following fact (known as Pólya’s Lemma). Namely, if F and {Fn } are d.f.s of r.v.s such that Fn (x) −→ F(x), x ∈ , and F n→∞ is continuous, then the convergence is uniform in x ∈ . Establish this fact. Hint: Refer to Exercise 9 in Chapter 8. See also Lemma 1, page 206, in Roussas (1997). Remark: The uniformity asserted in (*) is what legitimizes as simple an approximation as that in the Binomial case. Thus, if X 1 , . . . , X n are independent, distributed as B(1, p), and if a and b are with for sufficiently large
0≤
a < b ≤ n, then, integers n b−np a−np √ − , where S = n, P(a < Sn ≤ b) √ n j=1 X j , q = 1− p, and npq npq
b−np n √ (x) = (x; 0, 1). This is so, because P(Sn ≤ b) = P Z ≤ j j=1 npq , where
X − p b−np b−np n √ √ Z j = √1n √j pq , j = 1, . . . , n, and P j=1 Z j ≤ npq − npq < ε for n ≥ N (ε) independently of
b−np √ npq ;
and likewise for P(Sn ≤ a).
287
CHAPTER
Topics from Sequences of Independent Random Variables
14
This chapter consists of a selection of topics from sequences of r.v.s. The r.v.s considered are most often independent, although not necessarily identically distributed. The central result is the statement and proof of the Strong Law of Large Numbers (SLLN), Theorem 7. It is important to notice that all results prior to Theorem 7, consisting of six theorems and two lemmas, are used directly or indirectly in the proof of the SLLN. This may serve as a tribute to its depth and complexity. The chapter opens with the Kolmogorov inequalities, the detailed discussion of which takes up the entire first section. Kolmogorov inequalities provide upper and lower bounds for the probability of the maximum of the absolute value of partial sums of independent r.v.s. They are instrumental, albeit indirectly, in the proof of the SLLN. Section 14.2 is devoted to discussing four theorems. Theorem 2 gives a suf ficient condition, in terms of variances, for the series n≥1 (X n − E X n ) to converge a.s. Theorems 3 and 4 taken together state that, if n≥1 P(An ) < ∞, then P(lim supn→∞ An ) = 0, and if the former sum is ∞, then the latter probability is 1. The first conclusion (Borel–Cantelli Lemma) holds for any events, whereas the second conclusion (Borel Zero–One Criterion) requires independence. Theorem 5 discusses some technical consequences of the assumption that n≥1 P(X n = X n ) < ∞ for two sequences of r.v.s. The following two lemmas, Lemma 3 (Toeplitz) and Lemma 4 (Kronecker), are analytic results and lend themselves here and in many other situations. In Section 14.3, a precursor to the SLLN, Theorem 6, is discussed, as well as a lemma providing bounds for the E|X | of a r.v. X in terms of sums of certain probabilities, and, finally, the SLLN itself is proved. In the following section, it is shown that the SLLN essentially holds, even if the expectation of the underlying r.v.s is infinite. If, however, this expectation does not exist, then the averages Sn /n are unbounded with probability one. In the final section, two main results are discussed. One is the Kolmogorov Zero– One Law, which states that tail events defined on a sequence of independent r.v.s have probability either 0 or 1. This conclusion has important ramifications regarding the limits of sequences, series, etc., of independent r.v.s. The other result gives necessary and sufficient conditions for the a.s. convergence of a series of independent r.v.s, and because of its form, it is referred to as the three series criterion. These main results An Introduction to Measure-Theoretic Probability, Second Edition. http://dx.doi.org/10.1016/B978-0-12-800042-7.00014-1 Copyright © 2014 Elsevier Inc. All rights reserved.
289
290
CHAPTER 14 Topics from Sequences of Independent Random Variables
are obtained after the concept of the tail σ -field is introduced and a related result is established.
14.1 Kolmogorov Inequalities In this chapter, we consider certain topics related to sequences of independent r.v.s with a view of establishing the Strong Law of Large Numbers (SLLN) (Theorem 7). We start with Kolmogorov inequalities, which provide upper and lower bounds for probabilities of the maximum of partial sums of r.v.s centered at their expectations. Theorem 1 (Kolmogorov inequalities). Let X j , j = 1, . . . , n, be independent r.v.s (not necessarily i.d.), and let E X j be finite and |X j | ≤ c, where c is finite or ∞. Then, for every ε > 0, one has n 2 (ε + 2c)2 j=1 σ j 1 − n ≤ P max |S − E S | ≥ ε ≤ , (14.1) k k 2 1≤k≤n ε2 j=1 σ j where Sn =
n j=1
X j and σ j2 = σ 2 (X j ).
We observe that σ j2
∞ if and only if E X 2j
∞. Thus, if σ j2
Remark 1. = = = ∞ for some j, then c = ∞. Then the right-hand side of (14.1) becomes ∞ and the left-hand side becomes 0, interpreting ∞ ∞ as 1. Thus (14.1) is trivially true, and we may therefore 2 assume that σ j < ∞ for all j. Next, if c = ∞, the left-hand side of (14.1), where c appears, is −∞ and therefore we may assume that c < ∞, since the proof of the right-hand side of the inequality does not depend on the finiteness or not of c, as will be seen (see Lemma 1). Under this assumption, we have |X j | ≤ c, so that |E X j | ≤ c and therefore |X j − E X j | ≤ 2c. Thus, if we set Y j = X j − E X j , j = 1, . . . , n, then the Y j s are independent, EY j = 0, and |Y j | ≤ c∗ (= 2c) < ∞. Therefore, in carrying out the proof of the theorem, we may assume that E X j = 0 and |X j | ≤ 2c < ∞ (or work with the Y j s). The proof of the theorem is rather long, in particular the proof of the left-hand-side inequality. It might then be appropriate to split it in two parts in the form of lemmas. Lemma 1. Under the assumptions of Theorem 1 and the additional nonrestrictive assumption that E X j = 0 (but without assuming finiteness of the constant c), the right-hand side of (14.1) holds. Proof.
For ε > 0, we set Ak = max |S j | ≥ ε , k = 1, . . . , n, 1≤ j≤k
and
Bk =
A0 = ,
(14.2)
max |S j | < ε and |Sk | ≥ ε , k = 2, . . . , n,
1≤ j≤k−1
B1 = (|S1 | ≥ ε) = A1 , S0 = 0.
(14.3)
14.1 Kolmogorov Inequalities
Then it is clear that Bk ∩ Bl = , for k = l, and Aj =
j
Bk ,
j = 1, . . . , n.
(14.4)
k=1
Indeed, if C j = (|S j | ≥ ε), j = 1, . . . , n, then n
c An = ∪ C j = C1 + (C1c ∩ C2 ) + · · · + (C1c ∩ · · · Cn−1 ∩ Cn ) j=1
= B1 + B2 + · · · + Bn . Next,
Bk
Sn2 d P
=
Sn2 I Bk d P = E(Sn2 I Bk ) = E I Bk [Sk + (Sn − Sk )]2
= E[(I Bk Sk ) + I Bk (Sn − Sk )]2 = E(I Bk Sk2 ) + 2E[I Bk Sk (Sn − Sk )] + E[I Bk (Sn − Sk )2 ].
(14.5)
Clearly, Bk ∈ σ (S1 , . . . , Sk ), so that I Bk is σ (S1 , . . . , Sk )-measurable. It follows that I Bk Sk is σ (S1 , . . . , Sk )-measurable. Also, I Bk Sk and Sn − Sk are independent, since the former is defined in terms of the r.v.s X j , j = 1, . . . , k, whereas the latter is defined in terms of the r.v.s X j , j = k + 1, . . . , n. It follows that E[I Bk Sk (Sn − Sk )] = E(I Bk Sk )E(Sn − Sk ) = 0, so that (14.5) becomes Sn2 d P = E(I Bk Sk2 ) + E[I Bk (Sn − Sk )2 ] ≥ E(I Bk Sk2 ) Bk = Sk2 d P ≥ ε2 P(Bk ) by means of (14.3); i.e., B k Sn2 d P ≥ ε2 P(Bk ). Bk
Adding over k = 1, . . . , n, and taking into consideration relation (14.4), we get then Sn2 d P ≥ ε2 P(An ). (14.6) An
But
An
Therefore
Sn2 d P ≤
Sn2 d P = E Sn2 =
σ j2 by independence.
j
σ j2
≥ ε P(An ) and hence P(An ) ≤ 2
j
which is the right-hand side of the required inequality.
j
σ j2
ε2
,
291
292
CHAPTER 14 Topics from Sequences of Independent Random Variables
Lemma 2. Under the assumptions of Theorem 1 (including boundedness, |X j | ≤ c(< ∞), j = 1, . . . , n), the left-hand side of (14.1) holds. Proof.
From (14.2) and (14.3), it follows that Bk ⊆ Ak , Ak−1 ∩ Bk = and, by (14.4), Ak = (B1 + · · · + Bk−1 ) + Bk
(14.7)
= Ak−1 + Bk , so that Ak−1 = Ak − Bk , and I Ak−1 = I Ak − I Bk . Thus Sk−1 I Ak−1 + X k I Ak−1 = Sk I Ak−1 = Sk I Ak − Sk I Bk , by means of (14.7). Therefore, squaring out and taking the expectations, we obtain 2 I Ak−1 ) + E(X k2 I Ak−1 ) = E(Sk2 I Ak ) − E(Sk2 I Bk ), E(Sk−1
(14.8)
because E(Sk−1 I Ak−1 X k I Ak−1 ) = E[(Sk−1 I Ak−1 )X k ] = E(Sk−1 I Ak−1 )E X k = 0, and E(Sk I Ak Sk I Bk ) = E(Sk2 I Ak I Bk ) = E(Sk2 I Bk ) since Bk ⊆ Ak . Now I Ak−1 is defined in terms of X 1 , . . . , X k−1 and hence it is independent of X k . Therefore E(X k2 I Ak−1 ) = E X k2 E I Ak−1 = E X k2 P(Ak−1 ) = σk2 P(Ak−1 ). Then (14.8) becomes 2 I Ak−1 ) + σk2 P(Ak−1 ) = E(Sk2 I Ak ) − E(Sk2 I Bk ). E(Sk−1
(14.9)
But Sk I Bk = Sk−1 I Bk + X k I Bk , and hence |Sk I Bk | ≤ |Sk−1 I Bk | + |X k I Bk | < ε I Bk + 2cI Bk = (ε + 2c)I Bk ,
(14.10)
since |Sk−1 | < ε on Bk and |X k | ≤ 2c. Squaring out both sides of (14.10) and taking expectations, we get E(Sk2 I Bk ) ≤ (ε + 2c)2 P(Bk ). (14.11) Next, by means of (14.11), relation (14.9) becomes 2 I Ak−1 ) σk2 P(Ak−1 ) = E(Sk2 I Ak ) − E(Sk2 I Bk ) − E(Sk−1 2 ≥ E(Sk2 I Ak ) − E(Sk−1 I Ak−1 ) − (ε + 2c)2 P(Bk ).
(14.12)
14.1 Kolmogorov Inequalities
But Ak−1 ⊆ An , by virtue of (14.4), and therefore (14.12) becomes 2 σk2 P(An ) ≥ E(Sk2 I Ak ) − E(Sk−1 I Ak−1 ) − (ε + 2c)2 P(Bk ).
(14.13)
Summing over k = 1, . . . , n, on both sides of (14.13), we obtain, by means of (14.4),
σk2
P(An ) ≥ [E(S12 I A1 ) − E(S02 I A0 ) + E(S22 I A2 ) − E(S12 I A1 )
k
+ E(S32 I A3 ) − E(S22 I A2 ) + · · · + E(Sn2 I An ) 2 − E(Sn−1 I An−1 )] − (ε + 2c)2 P(An )
= [E(Sn2 I An ) − E(S02 I A0 )] − (ε + 2c)2 P(An ) = E(Sn2 I An ) − (ε + 2c)2 P(An ) (since S0 = 0) = E Sn2 (1 − I Acn ) − (ε + 2c)2 P(An ) = E Sn2 − E(Sn2 I Acn ) − (ε + 2c)2 P(An )
2 σk − E(Sn2 I Acn ) − (ε + 2c)2 P(An ); i.e., =
k
σk2
P(An ) ≥
k
σk2
− E(Sn2 I Acn ) − (ε + 2c)2 P(An ).
(14.14)
k
But |Sn | < ε on Acn , since Acn = ∩nj=1 C cj = ∩nj=1 (|S j | < ε). Therefore (14.14) becomes
2 2 σk P(An ) ≥ σk − ε2 P(Acn ) − (ε + 2c)2 P(An ) k
k
=
σk2
k
=
− ε2 P(Acn ) − (ε + 2c)2 [1 − P(Acn )]
σk2 − (ε + 2c)2
k
+[(ε + 2c)2 − ε2 ]P(Acn )
2 σk − (ε + 2c)2 ; i.e., ≥
k
k
σk2
P(An ) ≥
k
σk2
− (ε + 2c)2 .
293
294
CHAPTER 14 Topics from Sequences of Independent Random Variables
Hence (ε + 2c)2 P(An ) ≥ 1 − 2 , k σk
as was to be shown. Proof of Theorem 1.
It is the combination of Lemmas 1 and 2.
Remark 2. In Kolmogorov inequalities, it is clear that, in forming partial sums, we can sum between any two positive integers m + 1 and m + r with m + r ≤ n. Then, the inequalities become ⎞ ⎛ m+k m+k 2 (ε + 2c) ≤ P ⎝ max Xj −E X j ≥ ε⎠ 1 − m+r 1≤k≤r 2 j=m+1 j=m+1 σj j=m+1
⎞ m+k = P ⎝ max (X j − E X j ) ≥ ε⎠ ≤ 1≤k≤r j=m+1 ⎛
m+r j=m+1 ε2
σ j2 .
This actually amounts to reindexing the r.v.s. Indeed, by setting Y j = X j+m and τ 2j = σ 2 (Y j ), j = 1, . . . , r , we have that the Y j s are independent and |Y j | ≤ c. Then inequalities (14.1) hold; i.e., ⎞ ⎛ r k k 2 2 (ε + 2c) j=1 τ j ⎠ ⎝ 1 − r ≤ P max Yj − E Yj ≥ ε ≤ . 2 1≤k≤r ε2 j=1 τ j j=1
j=1
k m+k m+k r k 2 But j=1 Y j − E j=1 Y j = j=1 τ j = j=m+1 X j − E j=m+1 X j and m+r 2 j=m+1 σ j , and then the remark is valid.
14.2 More Important Results Toward Proving the Strong Law of Large Numbers In this section, a number of results are established, which will lead to the proof of the SLLN. These results (six in number) are important in their own right and are used in many other situations in probability and mathematical statistics. The next theorem states, in effect, that, if the variances of r.v.s do not vary wildly, then the series consisting of the r.v.s converges a.s. More precisely, we have Theorem 2. For n ≥ 1, let X n be independent r.v.s such that E X n is finite and ∞ 2 σn2 = σ 2 (X n ) < ∞. Then, if ∞ n=1 σn < ∞, it follows that n=1 (X n − E X n ) converges a.s.
14.2 More Important Results Toward Proving the Strong
Proof. Set Tk = kj=1 (X j − E X j ) and apply the right-hand side of the relation in Remark 2 in order to obtain m+r
max |Tm+k − Tm | ≥ ε ≤
P
1≤k≤r
σ j2
j=m+1 ε2
.
(14.15)
However,
max |Tm+k − Tm | ≥ ε
1≤k≤r
= |Tm+k − Tm | ≥ ε for at least one k with 1 ≤ k ≤ r =
r |Tm+k − Tm | ≥ ε . k=1
Thus, relation (14.15) becomes P
r
|Tm+k − Tm | ≥ ε
m+r
≤
k=1
σ j2
j=m+1 ε2
.
(14.16)
In (14.16), let r → ∞ and use continuity from below of the probability measure P to obtain ∞ σ j2
r j=m+1 |Tm+k − Tm | ≥ ε ≤ . (14.17) P ε2 k=1
By letting m → ∞ in (14.17), and using the fact that
2 n≥1 σn
< ∞, we get
∞ lim P |Tm+k − Tm | ≥ ε = 0 for every ε > 0.
m→∞
(14.18)
k=1
However, (14.18) is a necessary and sufficient condition for mutual a.s. convergence of {Tm }, by Theorem 4 in Chapter 3, and hence for a.s. convergence of Tm = mj=1 (X j − E X j ), as was to be seen. The following result states that, if the probabilities of a sequence of events are small, then the lim supn→∞ of the sequence has probability zero. Theorem 3 (Borel–Cantelli Lemma). Let {An }, n ≥ 1, be an arbitrary sequence of events. Then, if n P(An ) < ∞, it follows that P(lim supn→∞ An ) = 0.
295
296
CHAPTER 14 Topics from Sequences of Independent Random Variables
Proof.
With n and m tending to ∞, we have ⎛ ⎞ ⎛ ⎞ ∞ ∞ ∞ A j ⎠ = P ⎝lim Aj⎠ P(lim sup An ) = P ⎝ n
⎛
= lim P ⎝ n
∞
= lim lim P ⎝ n
However,
⎛ lim P ⎝ m
m
A j ⎠ ≤ lim m
lim lim P ⎝ n
j=1
m
m j=n
j=n
A j ⎠ = lim P ⎝lim n
m
m
⎞
m
⎞ Aj⎠
j=n
Aj⎠ .
(14.19)
j=n
⎞
⎛
∞
m
j=n
so that
⎛
⎞
j=n
⎛
since
n
n=1 j=n
m
P(A j ) =
j=n
∞
P(A j ),
j=n
⎞ A j ⎠ ≤ lim
∞
n
P(A j ) = 0,
(14.20)
j=n
P(A j ) < ∞. Relations (14.19) and (14.20) complete the proof.
The Borel Zero–One Criterion discussed later restates in its part (i) the Borel– Cantelli theorem for independent events. The important conclusion is part (ii), where it is stated what happens to the probability of the lim supn→∞ of the sequence when the individual probabilities are large. Theorem 4 (Borel Zero–One Criterion). Let the events An , n ≥ 1, be independent. Then (i) If n P(An ) < ∞, it follows that P(lim supn→∞ An ) = 0, and (ii) If n P(An ) = ∞, it follows that P(lim supn→∞ An ) = 1. Proof. (i) This is a special case of Theorem 3, which holds regardless of the independence or not of the events involved. (ii) For the proof of (ii), we need the following inequality n
1 − e−
j=1 x j
≤1−
n
(1 − x j ) for 0 ≤ x j ≤ 1, j = 1, . . . , n.
(14.21)
j=1
(For its justification, see Exercise 3 in Chapter 10.) Next, with n and m tending to ∞ we have, by (14.19), P(lim sup An ) = lim lim P(∪mj=n A j ) n
n
m
14.2 More Important Results Toward Proving the Strong
= lim lim[1 − P(∪mj=n A j )c ] n
m
= lim lim[1 − P(∩mj=n Acj )] n m ⎤ ⎡ m P(Acj )⎦ (by independence) = lim lim ⎣1 − n
m
j=n
⎧ ⎨
⎫ ⎬
m
= lim lim 1 − [1 − P(A j )] ; i.e., n m ⎩ ⎭ j=n ⎫ ⎧ m ⎬ ⎨ P(lim sup An ) = lim lim 1 − [1 − P(A j )] . n m ⎩ ⎭ n
(14.22)
j=n
Applying (14.21) with x1 = P(An ), x2 = P(An+1 ), . . . , xm−n+1 = P(Am ), we obtain 1−
m
[1 − P(A j )] ≥ 1 − e−
m j=n
P(A j )
j=n
→ 1 − e−
∞
j=n
P(A j )
m→∞
= 1 − e−∞ = 1. Therefore
⎫ ⎧ m ⎬ ⎨ [1 − P(A j )] ≥ 1, so that, by (14.22), lim lim 1 − n m ⎩ ⎭ j=n
P(lim sup An ) ≥ 1, hence P(lim sup An ) = 1. n
n
Remark 3. The significance of the Borel Zero–One Criterion is that, when dealing with independent events, there is no room for the value of the probability of their lim supn→∞ other than either 0 or 1; no values strictly between 0 and 1 can occur. The theorem just proved has the following three corollaries. Corollary 1. Suppose that the events An , n ≥ 1, are independent and An → A n→∞ (in the sense that lim inf An = n→∞
∞ ∞ n=1 j=n
Aj =
∞ ∞ n=1 j=n
A j = lim sup An = A). n→∞
Then P(A) is either 0 or 1. Proof. We and by the theorem, P(A) is either 0 or 1 according have A = limn An to whether n P(An ) < ∞ or n P(An ) = ∞.
297
298
CHAPTER 14 Topics from Sequences of Independent Random Variables
Corollary 2.
Proof.
For independent events An , n ≥ 1, we have ! 1 if n P(Acn ) < ∞, P(lim inf An ) = n→∞ 0 if n P(Acn ) = ∞.
In fact, P(lim inf An ) = P( n→∞
∞ ∞
⎛ Aj) = 1 − P ⎝
n=1 j=n
⎛
= 1− P⎝
∞ ∞
⎞
∞ ∞
⎞c Aj⎠
n=1 j=n
Acj ⎠
n=1 j=n
= 1 − P(lim sup Acn ) n→∞ " 1 − 0 = 1 if n P(Acn ) < ∞, = 1 − 1 = 0 if n P(Acn ) = ∞.
a.s.
Corollary 3. Consider the independent r.v.s X n , n ≥ 1, and suppose that X n → c, n→∞ constant. Then for every δ > 0, one has ∞
P(|X n − c| ≥ δ) < ∞.
n=1
Proof. Set An (δ) = An = (|X n − c| ≥ δ). Then the events An , n ≥ 1, are independent. Next, let N c be the set of points for which X n (ω) → c. Then P(N c ) = 1, n→∞
and for every ω ∈ N c there are finitely many ns for which ω ∈ An , because otherwise X n (ω) c. Since lim supn→∞ An consists of points belonging to infinitely many n→∞ An s (by Exercise 30(ii) in Chapter 1), it follows that lim supn→∞ An ⊆ N , so that ∞ P(A ) < ∞, because otherwise P(lim supn→∞ An ) = 0. Then n=1 n ∞ P(lim supn→∞ An ) = 1. Hence n=1 P(|X n − c| ≥ δ) < ∞ for every δ > 0. In the result discussed next, we consider two sequences of r.v.s, and the probabilities of the events over which respective r.v.s differ from each other. If these probabilities are small, then one can reach a number of conclusions, which is the content of the next theorem. Theorem 5.
For n ≥ 1, consider the r.v.s X n and X n , and set An = (X n = X n ),
A = (X n = X n i.o.) = lim sup An n→∞
(where i.o. is read “infinitely often”; see also Exercise 30 in Chapter 1). Then, if ∞ P(A ) < ∞, it follows that n=1 n (i) P(A) = 0. ∞ (ii) Let B and B be the sets of convergence of ∞ n=1 X n and n=1 X n , respectively. c c Then B ∩ A = B ∩ A and P(B ∩ A) = P(B ∩ A) = P(B ∩ B ∩ A) = 0
14.2 More Important Results Toward Proving the Strong
(i.e., the set of convergence of the two series is essentially the same, although the limits may be distinct). S (iii) The sets of convergence of the sequences { bSnn } and { bnn }, as n → ∞, differ only by a null set and the limits of the sequences are the same, where Sn =
n
X j , Sn =
j=1
n
X j and 0 = bn ↑ ∞ as n → ∞.
j=1
Proof. (i) Throughout the proof all limits are taken as n → ∞. Then, with An = (X n = X n ), n ≥ 1, and A = (X n = X n i.o.) = lim supn An , the result follows from Theorem 3. (The last equality in$Chapter 2.) $∞ from Exercise 30(ii) # follows ∞ #∞ c c = A , we have A (ii) From A = lim supn An = ∞ n=1 j=n j n=1 j=n A j = c c lim inf n An (see Exercise 30(i) in Chapter #2). Then,c for ω ∈ A or, equivalently, ω ∈ lim inf n Acn , we have ω ∈ ∞ j=n 0 +1 A j for some n 0 = n 0 (ω) c and An for all n > n 0 , hence X n (ω) n > n 0 . Since = X n (ω), then ω ∈ n0 n0 X n (ω) = X n (ω) + n>n 0 X n (ω), n X n (ω) = n n=1 n=1 X n (ω) + n 0 that n X n (ω) → n=1 X n (ω) + X (ω), if and only n>n 0 X n (ω), it follows n 0 if n X n (ω) → X n>n 0 X n (ω) and n=1 n (ω) + X (ω), where X (ω) = X (ω) = n>n 0 X n (ω). Thus, B ∩ Ac ⊆ B ∩ Ac and B ∩ Ac ⊆ B ∩ Ac , so that B ∩ Ac = B ∩ Ac , and P(B ∩ B ∩ A) ≤ P(B ∩ A) = P(B ∩ A) ≤ P(A) = 0. Ac
A
B∩B ∩A BΔB
B
B B ∩ Ac = B ∩ Ac
(iii) For each ω ∈ Ac , there exists n(ω) = n 0 such that X n (ω) = X n (ω) for all n > n 0 . For such ns and 0 = bn ↑ ∞, we write n 0 n n Sn (ω) j=1 X j (ω) j=n 0 +1 X j (ω) j=1 X j (ω) = = + bn bn bn bn and
Sn (ω) = bn
n j=1
X j (ω)
bn
n 0 =
j=1
X j (ω)
bn
n +
j=n 0 +1
bn
X j (ω)
,
299
300
CHAPTER 14 Topics from Sequences of Independent Random Variables
so that
n0 S (ω) 1 Sn (ω) X j (ω) − X j (ω) . − n = bn bn bn j=1 n 0 1 As j=1 X j (ω) − X j (ω) is independent of n and bn → 0, it follows that
Sn (ω) Sn Sn (ω) Sn c bn − bn → 0. Therefore { bn } and { bn } converge on the same subset of A , where c P(A ) = 1, and to the same limit. That is, if C and C are the sets of convergence of S
{ bSnn } and { bnn }, respectively, then C ∩ Ac = C ∩ Ac as in part (ii), and P(C ∩ A) = P(C ∩ A) = P(C ∩ C ∩ A) ≤ P(A) = 0. This section is closed with two lemmas, the Toeplitz lemma and the Kronecker lemma, which are analytic rather than probabilistic. They are used decisively toward the proof of the SLLN, as well as in many other instances in probability and mathematical statistics. Let αn j , j = 1, . . . , kn → ∞, n ≥ 1, be (real)
Lemma 3 (Toeplitz Lemma). numbers such that
n→∞
αn j → 0 for each j,
(14.23)
n→∞
and
kn
|αn j | ≤ c(< ∞) for all n.
(14.24)
j=1
Let {xn } be a sequence of (real) numbers and define the sequence {yn } as follows: yn =
kn
αn j x j , n ≥ 1.
j=1
Then one has (i) If xn → 0, then yn → 0. n→∞ n→∞ n (ii) If kj=1 αn j → 1 and xn → x finite, then yn → x. n→∞ n→∞ n→∞ (iii) For λn > 0, n ≥ 1, set bn = nj=1 λ j and suppose that bn ↑ ∞ as n → ∞. n λ j x j → x. Then, if xn → x finite, it follows that b1n kj=1 n→∞
n→∞
Proof. Throughout the proof all limits are taken as n → ∞. (i) Since xn → 0, it follows that for every ε > 0, there exists an integer n(ε) = n 0 > 0 such that for n > n 0 , |xn | < εc . Thus for all sufficiently large n (so that kn > n 0 ), kn n0 kn |yn | = αn j x j = αn j x j + αn j x j j=1 j=1 j=n 0 +1
14.2 More Important Results Toward Proving the Strong
n0 ≤ αn j x j j=1 kn ε |αn j | c j=n 0 +1 n0 αn j x j + ε (by (14.24)), and this is ≤ j=1
+
≤ ( max |x j |) 1≤ j≤n 0
n0
|αn j | + ε.
j=1
By taking now the limits and utilizing (14.23), we obtain lim sup |yn | ≤ ε for every ε > 0, so that yn → 0. n
(ii) We have yn =
kn
⎛ αn j x j = ⎝
j=1
Now
⎛ ⎝
⎞ αn j ⎠ x +
j
⎞ αn j ⎠ x → x since
j
αn j (x j − x).
j
αn j → 1, and
j
αn j (x j − x) → 0
j
by (i) with xn replaced by xn − x, so that xn − x → 0. Thus yn → x. (iii) Set λj , j = 1, . . . , n. αn j = bn Then j αn j = 1 and αn j → 0 for each fixed j, so that the assumptions in n→∞ (ii) are satisfied. Therefore λj 1 αn j x j = xj = λ j x j → x by (ii). bn bn j
j
j
Lemma 4 (Kronecker Lemma). Consider the sequence {xn }, n ≥ 1, of (real) numbers, and suppose that nj=1 x j → s finite. Let also 0 = bn ↑ ∞ as n → ∞. Then n→∞ 1 n b x → 0. j j j=1 bn n→∞ Proof. Let b0 = 0, and set sn+1 = nj=1 x j and s1 = 0. Then one has n n 1 1 bjxj = b j (s j+1 − s j ) bn bn j=1
j=1
301
302
CHAPTER 14 Topics from Sequences of Independent Random Variables
⎛ = But
n
1 ⎝ bn
b j s j+1 =
j=1
n
b j s j+1 −
j=1
n−1
n
⎞ bjsj⎠ .
(14.25)
j=1
b j s j+1 + bn sn+1 ,
j=1
and if we set j + 1 = r , so that r = 2, . . . , n, we obtain n
b j s j+1 =
n
br −1 sr + bn sn+1 =
r =2
j=1
n
br −1 sr + bn sn+1 ,
r =1
since b0 = 0. Therefore (14.25) becomes ⎛ ⎞ n n n 1 1 ⎝ bjxj = b j−1 s j + bn sn+1 − bjsj⎠ bn bn j=1 j=1 j=1 ⎡ ⎤ n 1 ⎣ = (b j − b j−1 )s j ⎦ bn sn+1 − bn j=1
n 1 = sn+1 − (b j − b j−1 )s j . bn
(14.26)
j=1
Thus, by setting λ j = b j − b j−1 > 0, j ≥ 1, we have bn = nj=1 λ j with bn ↑ ∞ as n → ∞. Since sn → s finite, part (iii) of Lemma 3 applies and gives that n→∞
n n 1 1 λjsj = (b j − b j−1 )s j → s. n→∞ bn bn j=1
j=1
This result, together with (14.26), implies that
1 bn
n
→ s j=1 b j x j n→∞
− s = 0.
14.3 Statement and Proof of the Strong Law of Large Numbers In this section, the SLLN is stated and proved. Before this is done, another theorem, Theorem 6, is established that is some kind of precursor to the SLLN. Also, an additional result is required for the proof of the SLLN, which is of independent interest; it is stated as a lemma. Theorem 6 (Kolmogorov).
For n ≥ 1, consider the independent r.v.s X n and σ2
∞ n < ∞, for suppose that E X n is finite and σn2 = σ 2 (X n ) < ∞. Then, if n=1 b2 n
14.3 Statement and Proof of the Strong Law of Large Numbers
0 < bn ↑ ∞ as n → ∞, it follows that Sn − E Sn a.s. → 0, where Sn = X j. n→∞ bn n
j=1
Proof.
We have
σ2 n
n bn2
Xn
Therefore
n
bn
=
Xn bn
Xn − E Xn bn n
n
−E
Xn bn
σ2
=
< ∞.
converges a.s., by Theorem 2. Thus, for ω in the set of convergence of this series, we E Xn have that n X n (ω)− converges. For an arbitrary, but fixed such an ω, set bn xn = We have then that and gives that
n j=1
X n (ω) − E X n , n ≥ 1. bn
x j → s(= s(ω)) finite. Then the Kronecker lemma applies n→∞
n 1 b j x j → 0, or equivalently, n→∞ bn
1 bn
j=1 n j=1
= Thus
Sn −E Sn → 0 bn n→∞
Remark 4.
bj
n X j (ω) − E X j 1 = [X j (ω) − E X j ] bj bn j=1
Sn (ω) − E Sn → 0. n→∞ bn
a.s.
For bn = n, we get
Sn −E Sn a.s. → 0, n n→∞
provided
σn2 n n2
< ∞ (which
happens, e.g., if σn2 = σ 2 for all n), and if E X n = μ ∈ for all n, then However, this result is true without assuming finiteness of σn2 .
Sn a.s. → μ. n n→∞
The following lemma will facilitate the proof of the SLLN in addition to being interesting in its own right. Lemma 5.
Let X be a r.v. and set An = (|X | ≥ n), n ≥ 1.
Then one has
∞
P(A j ) ≤ E|X | ≤ 1 +
j=1
(so that E|X | < ∞ if and only if
∞
j=1
∞ j=1
P(A j ) < ∞).
P(A j )
(14.27)
303
304
CHAPTER 14 Topics from Sequences of Independent Random Variables
Proof.
Let An , n ≥ 1, be defined by (14.27) and set A0 = . Then as n → ∞, de f An ↓ , Bn = An−1 − An = n − 1 ≤ |X | < n , Bi ∩ B j = , i = j,
∞
(14.28)
B j = .
j=1
Next,
(n − 1)[P(An−1 ) − P(An )] = (n − 1)P(Bn ) ≤
|X |d P Bn
= E(|X |I Bn ) ≤ n P(Bn ) = n[P(An−1 ) − P(An )],
(14.29)
and n P(An ) ≤
An
Now, for each r ≥ 1, we have r
|X |d P = E(|X |I An ).
(14.30)
P(A j ) = 0 × [P(A0 ) − P(A1 )] +
j=1
1 × [P(A1 ) − P(A2 )] + 2 × [P(A2 ) − P(A3 )] + ····················· + (r − 1) × [P(Ar −1 ) − P(Ar )] + r × P(Ar ) r ( j − 1)[P(A j−1 ) − P(A j )] + r P(Ar ), = j=1
which, by means of (14.29) and (14.30) suitably applied, becomes ≤
r
E(|X |I B j ) + E(|X |I Ar )
j=1
=
rj=1 B j
|X |d P +
|X |d P + (|X |≥r ) |X |d P = E|X |, =
|X |d P = Ar
(0≤|X |
|X |d P
since, clearly, r j=1
Bj =
r j=1
( j − 1 ≤ |X | < j) = (0 ≤ |X | < r );
(14.31)
14.3 Statement and Proof of the Strong Law of Large Numbers
i.e.,
r j=1
P(A j ) ≤ E|X | for all r ≥ 1, and hence ∞
P(A j ) ≤ E|X |.
(14.32)
j=1
Next, by the fact that P(A0 ) = P( ) = 1, one has 1+
r
P(A j ) = 1 × [P(A0 ) − P(A1 )] +
j=1
2 × [P(A1 ) − P(A2 )] + 3 × [P(A2 ) − P(A3 )] + ····················· + r × [P(Ar −1 ) − P(Ar )] + (r + 1) × P(Ar ) r j[P(A j−1 ) − P(A j )] + = j=1
(r + 1)P(Ar ) r j[P(A j−1 ) − P(A j )], ≥ j=1
which, by means of (14.29), becomes ≥
r
E(|X |I B j ) =
j=1
=
Arc
(0≤|X |
|X |d P (by (14.31))
|X |d P (by (14.27))
= E(|X |I Arc ); i.e., E(|X |I Arc ) ≤ 1 +
r
P(A j ).
(14.33)
j=1
But Ar ↓ implies that Arc ↑ , so that I Arc → 1. Thus, |X |I Arc ↑ |X | as r → ∞ r →∞ and therefore the Monotone Convergence Theorem gives that E(|X |I Arc ) ↑ E|X | as r → ∞. Therefore (14.33) becomes, as r → ∞, E|X | ≤ 1 +
∞ j=1
P(A j ).
(14.34)
305
306
CHAPTER 14 Topics from Sequences of Independent Random Variables
Combining (14.32) and (14.34), one has then ∞
P(A j ) ≤ E|X | ≤ 1 +
j=1
∞
P(A j ).
(14.35)
j=1
Finally, we are in a position to state and prove the SLLN. Theorem 7 (Strong Law of Large Numbers, SLLN; Kolmogorov). For n ≥ 1, let X n be i.i.d. r.v.s, let X be a r.v. distributed as the X n s, and set Sn = nj=1 X j . Then Sn a.s. −→ E X . n n→∞
if E|X | < ∞, it follows that
On the other hand, if
constant, it follows that E|X | < ∞ and c = E X .
Sn a.s. −→ c, n n→∞
a finite
Proof. Throughout the proof all limits are taken as n → ∞. Let An , n ≥ 1, be defined by (14.27) in terms of the r.v. X . Then one has relation (14.35). First, suppose that E|X | < ∞, and we will show that Snn → E X . To this end, define a.s. the truncated r.v.s X¯ n as follows: ! X¯ n =
X n if |X n | < n , n ≥ 1, 0 if |X n | ≥ n
(so that X¯ n = X n I Acn ). Also, set S¯n = n
n j=1
(14.36)
X¯ j . Then
P X n = X¯ n = P |X n | ≥ n = P(|X | ≥ n) = P An ≤ E|X | n
n
< ∞ (by
n
(14.35)). ¯
Then, by Theorem 5 (iii), { Snn } and { Snn } converge on events that differ with probability 0 and to the same limit. So, if
Sn a.s. S¯n a.s. n → E X , then n → E X . Thus, it suffices to show that
S¯n a.s. → E X. n
(14.37)
We have, by means of (14.36), E X¯ n = E X n I(|X n |
(14.38)
(see also Exercise 5) since |X I Acn | ≤ |X |, independent of n, integrable and X I Acn → X , ¯ so that the Dominated Convergence Theorem applies. Next, EnSn = n1 nj=1 E X¯ j , which is of the form b1n nj=1 λ j x j with x j = E X¯ j , λ j = 1, bn = n. Since also xn → E X , finite (by (14.38)), the third part of the Toeplitz Lemma (Lemma 3) applies and gives that E S¯n → E X. (14.39) n
14.3 Statement and Proof of the Strong Law of Large Numbers
By means of (14.39), the convergence in (14.37) would be true, if we would show that S¯n − E S¯n a.s. → 0. (14.40) n This would be true (by Theorem 6 applied with bn = n), if we would show that 2 % X¯ n & < ∞. To this end, consider (n − 1 ≤ |X | < n) = Bn = An−1 − An = σ n n c An−1 ∩ An , n ≥ 1. By the fact that Acn = (|X | < n), and Bm = (m − 1 ≤ |X | < m), it is clear that ) n
[
)
m−1
m
Acn ∩ Bm =
[
)
)
m−1
m
n
⎧ ⎨ ⎩
if m > n Bm if m ≤ n.
(14.41)
Thus, I Bm
∞
∞ X2 X2 c I I Ac ∩B = A n2 n n2 n m n=1
n=1 ∞
X2 I B (by means of (14.41)), n2 m n=m ' ( 1 1 + + · · · . = I Bm X 2 m2 (m + 1)2 =
If ω ∈ Bm , then m − 1 ≤ |X m (ω)| < m by the definition of Bm . Thus, in any case, the preceding expression is bounded by ' ( 1 1 + + · · · I Bm m 2 m2 (m + 1)2 " ' () 1 1 + + · · · . = I Bm 1 + m 2 (m + 1)2 (m + 2)2 Clearly (see also the picture at the end of this proof), ∞ 1 dx 1 1 + + · · · < = . 2 (m + 1)2 (m + 2)2 x m m Therefore I Bm
∞
X2 I Ac < (1 + m)I Bm = [2 + (m − 1)]I Bm n2 n n=1
≤ (2 + |X |)I Bm
307
308
CHAPTER 14 Topics from Sequences of Independent Random Variables
by the definition of Bm in (14.28). So ∞
X2 I Bm I Ac < (2 + |X |)I Bm . n2 n n=1
Hence
Bm
∞
X2 c I (2 + |X |)d P. d P ≤ A n2 n Bm n=1
Summing over m = 1, 2, . . ., and taking into consideration relation (14.28), we then obtain (by Corollary 1 to Theorem 1 in Chapter 5) ∞
X2 I Ac ≤ E(2 + |X |) < ∞. E (14.42) n2 n n=1
Next, n
¯ σ 2 ( X¯ n ) E X¯ 2 E(X 2 I Ac ) Xn n n = σ ≤ = (by (14.36)) 2 2 2 n n n n n n n
X2 I Ac < ∞ (by the corollary just cited, and (14.42)). =E n2 n n 2
That is,
σ 2 ( X¯ n ) n2
n
< ∞. ¯
¯ a.s.
Then Theorem 6 applies with bn = n and gives that Sn −nE Sn → 0, which is (14.40). To summarize: Sn a.s. → E X. (14.43) E|X | < ∞ implies n a.s.
Next, suppose that Snn −→ c, finite, and we will show that E|X | < ∞ and c = E X . By setting S0 = 0, we have Sn − Sn−1 Sn n−1 Sn−1 a.s. Xn = = − × −→ 0. n n n n n−1 Therefore, by taking δ = 1 in Corollary 3 of the Borel Zero–One Criterion (Theorem 4), we obtain X n P ≥ 1 < ∞, or P |X n | ≥ n = P(|X | ≥ n) < ∞, n n
or equivalently, That is,
n
n
n
P(An ) < ∞. Then inequality (14.35) implies that E|X | < ∞. Sn a.s. −→ c, finite, implies E|X | < ∞. n
(14.44)
14.4 A Version of the Strong Law of Large Numbers
a.s.
But then Snn −→ E X by (14.43). From (14.43) and (14.36), we have that c = E X and E|X | < ∞. y
1 x2
0 1 (m+1)2
+
1 (m+2)2
m
,x ≥m
m +1
m +2
m +3
x
+ · · · is the area of the orthogonals between m and ∞ which is
smaller than the area under the curve y = * ∞ dx 1 m x2 = m .
1 x2
between m and ∞, which is equal to
14.4 A Version of the Strong Law of Large Numbers for Random Variables with Infinite Expectation In Theorem 7, the a.s. limit of the average Sn /n was finite. In this section, a certain version of the theorem is established. Specifically, it is shown that, if the common expectation of the X j s exists, but it is either ∞ or −∞, the average Sn /n still converges (actually, diverges) a.s. to the expectation. However, if the common expectation does not exist, then the average Sn /n is unbounded with probability one. We start with the following lemma. Lemma 6. (i) Let X be a nonnegative r.v. with E X = ∞ and let X n , n ≥ 1, be independent a.s. r.v.s distributed as X . Then n1 nj=1 X j → ∞. n→∞
(ii) Let X be a nonpositive r.v. with E X = −∞ and let X n , n ≥ 1, be independent a.s. r.v.s distributed as X . Then n1 nj=1 X j → −∞. n→∞
309
310
CHAPTER 14 Topics from Sequences of Independent Random Variables
Proof. (i) For c > 0, define the truncated r.v. X c as follows: ! X if X ≤ c, Xc = c if X > c. Then, as c ↑ ∞, 0 ≤ X c ↑ X and hence, by the Monotone Convergence Theorem, E X c ↑ E X = ∞. Next, since X ≥ X c , we have n n 1 1 c Xj ≥ X j, n n j=1
j=1
so that n n 1 1 c X j ≥ lim inf Xj n→∞ n n
lim inf n→∞
j=1
j=1
n 1 c = lim X j = E X c a.s., n→∞ n j=1
by Theorem 7, since
|E X c |
=
E Xc
lim inf n→∞
< ∞. Thus,
n 1 X j ≥ E X c a.s. n j=1
for every c > 0. Letting c ↑ ∞ and utilizing the fact that E X c ↑ E X = ∞ (by Theorem 1 in Chapter 5), we then obtain that lim inf n→∞
so that
1 n
n j=1
n 1 X j ≥ ∞ a.s., n j=1
a.s.
X j → ∞. n→∞
(ii) By letting Y = −X , we have that Y satisfies the assumptions in (i). Thus n 1 a.s. (−X j ) → E(−X ) = ∞, n→∞ n j=1
or equivalently,
n 1 a.s. X j → −∞, n→∞ n j=1
as was to be seen. The next theorem is a straightforward application of the foregoing lemma and Theorem 7.
14.4 A Version of the Strong Law of Large Numbers
For n ≥ 1, let X n be i.i.d. r.v.s such that E X 1 exists but E|X 1 | = ∞. a.s. Then the SLLN still holds; i.e., n1 nj=1 X j → E X 1 (= ±∞).
Theorem 8.
n→∞
Proof.
X +j
X −j ,
We have X j = − so that ⎛ ⎞ ⎛ ⎞ n n n 1 1 1 a.s. Xj = ⎝ X +j ⎠ − ⎝ X −j ⎠ → E X 1+ − E X 1− n→∞ n n n j=1
j=1
j=1
by Theorem 7, applied to that one of E X 1+ , E X 1− which is < ∞, and Lemma 6, applied to the other one. Since E X 1+ − E X 1− = E X 1 , we obtain n 1 a.s. X j → E X 1, n→∞ n j=1
as was to be seen.
Finally, consider a r.v. X for which E X does not exist (e.g., X may be a Cauchy distributed r.v.). If the r.v.s X n , n ≥ 1, are independent and distributed as X , one would like to make some kind of a statement for n1 nj=1 X j . To this effect, one has the following result. Theorem 9. For n ≥ 1, let X n be i.i.d. r.v.s and suppose that E X 1 does not exist. Set Sn = nj=1 X j . Then the sequence { Snn } is unbounded with probability 1. (That & & % % is, for every M > 0, it holds P Snn > M i.o. = P lim supn→∞ Snn > M = 1.) Proof.
From
Xn n
=
Sn n
Sn−1 n−1 n × n−1 (and with S0 if { Xnn } is unbounded, then
−
= 0), it follows that | Xnn | ≤
n−1 | and therefore, so is { Snn }. Thus, it suffices to | Snn | + | Sn−1 show that { Xnn } is unbounded with probability 1. For M > 0, set |X n | >M . An (M) = n
Then { Xnn } is unbounded with probability 1, if and only if P[lim supn→∞ An (M)] = 1 for every M > 0. From Theorem 4 (Borel Zero–One Criterion), it follows that P[A P[limn→∞ sup An (M)] = 1 for every M > 0, if and only if ∞ n (M)] = ∞ n=1 for every M > 0. Now ∞ n=1
P(An (M)) =
∞
P(|X n | > n M) =
n=1
∞
P(|X | > n M)
n−1
(where the r.v. X is distributed as the X n s) ∞ ∞ = P k M < |X | ≤ (k + 1)M n=1 k=n
311
312
CHAPTER 14 Topics from Sequences of Independent Random Variables
=
∞
+ , k P k M < |X | ≤ (k + 1)M
k=1
∞ = k =
k=1 ∞
[k M<|X |≤(k+1)M]
dP
k
k=1
(k M<|x|≤(k+1)M)
d PX
(where PX is the probability distribution of X )
=
∞
k=1 (k M<|x|≤(k+1)M)
k d PX .
But k ≥ 1 implies 2k ≥ k + 1 and hence 2k M ≥ (k + 1)M. On the other hand, on |x| ≤ k. the set (k M < |x| ≤ (k + 1)M), |x| ≤ (k + 1)M, so that |x| ≤ 2k M or 2M Therefore the last expression is ∞ 1 |x|d PX ≥ 2M k=1 (k M<|x|≤(k+1)M) 1 1 = |x|d PX = |X |d P, 2M (|x|>M) 2M (|X |>M) and this is = ∞. This is so, because if it were < ∞, then E|X 1 | = |X 1 |d P = |X 1 |d P + |X 1 |d P
(|X 1 |>M) (|X 1 |≤M) |x|d PX 1 + |x|d PX 1 < ∞, = (|x|>M)
(|x|≤M)
which would imply that both E X 1+ and E X 1− are < ∞, a contradiction to the nonex∞ istence of the E X 1 . Thus, n=1 P(An (M)) = ∞ for every M > 0 and the desired result follows. To Theorems 7, 8, and 9, we then have the following. Corollary.
For n ≥ 1, let X n be i.i.d. r.v.s and set Sn = Sn a.s. → E X 1, n n→∞
is finite, one has that unbounded with probability 1. Proof.
n j=1
X j . Then, if E X 1
whereas if E|X 1 | = ∞, one has that { Snn } is
The first conclusion is one direction of Theorem 7. As for the second conSn a.s. → E X 1 , by Theorem 8, and E X 1 is n n→∞ % & |Sn | Sn a.s. → ±∞, so that P ≥ M i.o. = 1, and the conclun n→∞ n
clusion, if E X 1 exists and E|X 1 | = ∞, then either ∞ or −∞. Thus,
sion then follows. Finally, the only other case that E|X 1 | = ∞ may happen is when E X 1 does not exist (E X 1+ = E X 1− = ∞, so that E|X 1 | = ∞), but then Theorem 9 applies and yields the result.
14.5 Some Further Results on Sequences of Independent Random Variables

In this section, we discuss two main results, Theorems 10 and 11, the first specifying the tail σ-field defined on a sequence of r.v.s, and the second providing necessary and sufficient conditions for the convergence of a series of r.v.s. When talking about convergence here, it is to be understood that the limit, be it a number or a (nondegenerate) r.v., is finite. In these results, the concept of a tail σ-field is used. For its definition, and an associated result, see Definition 1 and Proposition 1 below.

First, we introduce the concept of the tail σ-field associated with a sequence of r.v.s, and discuss some operations on the r.v.s involved which produce r.v.s measurable with respect to the tail σ-field. To this end, let X_n, n = 1, 2, ..., be r.v.s defined on the probability space (Ω, A, P); these r.v.s need not be independent. For each n ≥ 1 and k ≥ 0, let A_{n,n+k} be the σ-field induced by the k+1 r.v.s X_n, ..., X_{n+k}; i.e., A_{n,n+k} = σ(X_n, ..., X_{n+k}). Clearly, for fixed n, the σ-fields A_{n,n+k} are nondecreasing in k. In fact, a set B^{k+1} in B^{k+1} (the Borel σ-field of ℜ^{k+1}) can be viewed as a set B^{k+2} in B^{k+2} of the form B^{k+2} = B^{k+1} × ℜ, and, clearly, {B^{k+2} ∈ B^{k+2}; B^{k+2} = B^{k+1} × ℜ for some B^{k+1} ∈ B^{k+1}} ⊆ B^{k+2}. Hence

A_{n,n+k} = (X_n, ..., X_{n+k})^{-1}(B^{k+1}) = (X_n, ..., X_{n+k}, X_{n+k+1})^{-1}(B^{k+1} × ℜ) ⊆ (X_n, ..., X_{n+k}, X_{n+k+1})^{-1}(B^{k+2}) = A_{n,n+k+1}.

Next, set F_n = ∪_{k≥0} A_{n,n+k}. Then F_n is a field (by Exercise 10 in Chapter 1), whereas it need not be a σ-field, and let A_n be the σ-field generated by F_n; i.e., A_n = σ(F_n). Clearly, A_n is the smallest σ-field with respect to which all the r.v.s X_n, X_{n+1}, ... are measurable. The σ-fields A_{n+1}, A_{n+2}, ... are constructed likewise. Evidently, F_n ⊇ F_{n+1} ⊇ ···, and this implies that A_n ⊇ A_{n+1} ⊇ ···. Set T = ∩_{n≥1} A_n, so that the σ-field T ⊆ A_n for all n ≥ 1.

Definition 1. T, as just defined, is a σ-field called the tail σ-field (of the sequence of r.v.s {X_n}, n ≥ 1). The members of T are called tail events, and r.v.s measurable with respect to T are called tail r.v.s.

Here are some T-measurable r.v.s resulting upon operating on a given sequence of any r.v.s X_n, n ≥ 1.
Proposition 1. Let {X_n}, n ≥ 1, be r.v.s. Then

(i) lim inf_{n→∞} X_n, lim sup_{n→∞} X_n, and lim_{n→∞} X_n, if it exists, are all T-measurable.
(ii) The set of convergence of the series ∑_{n≥1} X_n is T-measurable.

Proof. (i) lim inf_{n→∞} X_n = lim_{n→∞} (inf_{m≥n} X_m) = lim_{n→∞} Y_n, where Y_n = inf_{m≥n} X_m. But Y_n is A_n-measurable since it is defined in terms of the r.v.s X_m, m ≥ n. Likewise, Y_{n+1} is A_{n+1}-measurable and hence A_n-measurable, since A_{n+1} ⊆ A_n. Thus Y_n, Y_{n+1}, ... are all A_n-measurable and hence so is their limit, Y, say. So, Y is A_n-measurable for all n ≥ 1, and therefore it is T-measurable. In other words, lim inf_{n→∞} X_n is T-measurable. The T-measurability of lim sup_{n→∞} X_n follows similarly, since lim sup_{n→∞} X_n = lim_{n→∞} (sup_{m≥n} X_m). Alternatively, lim sup_{n→∞} X_n = −lim inf_{n→∞} (−X_n), and the previous proof applies. Finally, lim_{n→∞} X_n, if it exists, is T-measurable, since it is equal to lim sup_{n→∞} X_n (= lim inf_{n→∞} X_n).

(ii) Set S_{1,n} = ∑_{j=1}^n X_j, n ≥ 1. Then "the set of convergence of S_{1,n}" = (S_{1,n} converges, as n → ∞) = ∩_{m≥1} ∪_{n≥1} ∩_{ν≥1} (|S_{1,n+ν} − S_{1,n}| < 1/m) (by Theorem 3 in Chapter 3). However, the r.v.s S_{1,n} and S_{1,n+ν}, n ≥ 1, ν ≥ 1, are all A_1-measurable, so that the set (S_{1,n} converges, as n → ∞) is also A_1-measurable. Next, (S_{1,n} converges, as n → ∞) = (S_{k,n} converges, as n → ∞) for each k ≥ 1, where S_{k,n} = ∑_{j=k}^n X_j, and the set (S_{k,n} converges, as n → ∞) is A_k-measurable. Thus, (S_{1,n} converges, as n → ∞) is A_k-measurable for every k ≥ 1, or "the set of convergence of ∑_{n≥1} X_n" is A_k-measurable for all k ≥ 1, and hence T-measurable.

We now proceed with the statement and proof of the so-called Kolmogorov Zero–One Law.

Theorem 10 (Kolmogorov Zero–One Law). If the r.v.s X_n, n ≥ 1, are independent, then the tail σ-field T is equivalent to the trivial σ-field {∅, Ω} in the sense that, for every A ∈ T, we have P(A) = 0 or P(A) = 1.

Proof. The independence of the r.v.s X_n, n ≥ 1, implies that the σ-fields σ(X_1, ..., X_n) and σ(X_{n+1}, X_{n+2}, ...) are independent for all n. It follows that σ(X_1, ..., X_n) and T are independent, since T (= ∩_{n≥1} A_n) ⊆ σ(X_{n+1}, X_{n+2}, ...) for all n ≥ 0 (and T ⊆ σ(X_n, X_{n+1}, ...) for all n ≥ 1). Therefore the field F = ∪_{n≥1} σ(X_1, ..., X_n) and T are independent. But then the σ-fields σ(F) = σ(X_1, X_2, ...) and T are independent. This is so by Proposition 2 in Chapter 10. Since T ⊆ σ(X_1, X_2, ...), as already mentioned, it follows that T is independent of itself. Then for A ∈ T, we have P(A ∩ A) = P(A)P(A), which can happen only if P(A) = 0 or P(A) = 1.

To this theorem, there is the following important corollary, according to which sequences and series of independent r.v.s either converge a.s. or diverge a.s.; i.e., they cannot converge (or diverge) on a set A with 0 < P(A) < 1.

Corollary. If the r.v.s X_n, n ≥ 1, are independent, then X̲ =def lim inf_{n→∞} X_n and X̄ =def lim sup_{n→∞} X_n are (possibly extended) degenerate r.v.s with probability 1; i.e., they are constants (±∞ included) with probability 1. Moreover, the limits as n → ∞, if they exist, of {X_n}, {S_n}, and {S_n/b_n}, where S_n = ∑_{j=1}^n X_j and 0 < b_n ↑ ∞, are constants, as above, with probability 1.

Proof. In Proposition 1, it was proved that lim inf_{n→∞} X_n, lim sup_{n→∞} X_n, and lim X_n, if it exists, are all T-measurable r.v.s. The same was proved to be true for the set of convergence of the series ∑_{n≥1} X_n, or of the sequence {S_n}, from which the same follows for the sequence {S_n/b_n}. The proof will be completed by showing that, if a (possibly extended) r.v. X is T-measurable, then P(X = c) = 1 for some constant c. Indeed, if P(X ≤ x) = 1 for every x, then, by letting x ↓ −∞, we get P(X = −∞) = 1. Likewise, if P(X > x) = 1 for all x, then, by letting x ↑ ∞, we get P(X = ∞) = 1. The remaining case is that there exist a, b in ℜ with a < b such that P(X < a) = 0 and P(X > b) = 0. This implies that P(a ≤ X ≤ b) = 1. Then, by considering the intervals [a, (a+b)/2] and [(a+b)/2, b], we have that either P(a ≤ X < (a+b)/2) = 1 or P((a+b)/2 ≤ X ≤ b) = 1. Call A_1 that interval for which P(X ∈ A_1) = 1. Split A_1 into halves, and let A_2 be that half for which P(X ∈ A_2) = 1. Continuing on like this, we have a sequence A_n ↓ with P(X ∈ A_n) = 1, n ≥ 1. Clearly, as n → ∞, A_n ↓ {x_0}, some point in [a, b], and then P(X = x_0) = 1.
In establishing Theorem 11 later, we need two auxiliary results, which are also of independent interest.

Lemma 7. Let X_n, n ≥ 1, be independent r.v.s with finite expectations, and let σ_n² = σ²(X_n). Then

(i) If ∑_{n≥1} σ_n² < ∞, then ∑_{n≥1} (X_n − E X_n) converges a.s.
(ii) If ∑_{n≥1} σ_n² = ∞ and |X_n| ≤ c (< ∞) a.s., n ≥ 1, then ∑_{n≥1} (X_n − E X_n), with probability 1, does not converge.
(iii) If |X_n| ≤ c (< ∞) a.s., n ≥ 1, then ∑_{n≥1} (X_n − E X_n) converges a.s. if and only if ∑_{n≥1} σ_n² < ∞.

Proof.
(i) See the proof of Theorem 2.
(ii) As in part (i), set T_k = ∑_{j=1}^k (X_j − E X_j) and apply the left-hand side of the relation in Remark 2 to obtain

1 − (ε + 2c)² / ∑_{j=m+1}^{m+r} σ_j² ≤ P(max_{1≤k≤r} |T_{m+k} − T_m| ≥ ε).   (14.45)

Since

(max_{1≤k≤r} |T_{m+k} − T_m| ≥ ε) = (|T_{m+k} − T_m| ≥ ε for at least one k with 1 ≤ k ≤ r) = ∪_{k=1}^r (|T_{m+k} − T_m| ≥ ε),

relation (14.45) becomes

1 − (ε + 2c)² / ∑_{j=m+1}^{m+r} σ_j² ≤ P(∪_{k=1}^r (|T_{m+k} − T_m| ≥ ε)).   (14.46)

In (14.46), let r → ∞ and use the assumption that ∑_{n≥1} σ_n² = ∞ to obtain

1 ≤ P(∪_{k=1}^∞ (|T_{m+k} − T_m| ≥ ε)).

Hence

lim_{m→∞} P(∪_{k=1}^∞ (|T_{m+k} − T_m| ≥ ε)) = 1 for every ε > 0.   (14.47)

However, (14.47) is a necessary and sufficient condition for mutual a.s. nonconvergence of {T_m}, by Theorem 4 in Chapter 3, and hence a.s. nonconvergence of {T_m}.
(iii) Follows from parts (i) and (ii).

Lemma 8. For the independent r.v.s X_n, n ≥ 1, suppose that |X_n| ≤ c (< ∞) a.s., n ≥ 1, and that ∑_{n≥1} X_n converges a.s. Then both ∑_{n≥1} σ_n² and ∑_{n≥1} E X_n converge.

Proof. For each n ≥ 1, let Y_n be a r.v. having the same distribution as X_n and such that the r.v.s X_1, Y_1, X_2, Y_2, ... are independent. Set Z_n = X_n − Y_n, so that, for n ≥ 1, the r.v.s Z_n are independent, |Z_n| ≤ 2c a.s., E Z_n = 0, and σ²(Z_n) = 2σ²(X_n) = 2σ_n². Furthermore, the a.s. convergence of ∑_{n≥1} X_n implies a.s. convergence for ∑_{n≥1} Y_n as well as a.s. convergence for ∑_{n≥1} Z_n, since ∑_{n≥1} Z_n = ∑_{n≥1} X_n − ∑_{n≥1} Y_n. By the direct part of Lemma 7(iii), it follows that ∑_{n≥1} σ²(Z_n) < ∞, and hence ∑_{n≥1} σ²(X_n) < ∞. Then, by Lemma 7(i), ∑_{n≥1} (X_n − E X_n) converges a.s. Since ∑_{n≥1} E X_n = ∑_{n≥1} X_n − ∑_{n≥1} (X_n − E X_n), it follows that ∑_{n≥1} E X_n converges.

The following result provides necessary and sufficient conditions for a.s. convergence of a series of independent r.v.s.

Theorem 11 (Three Series Criterion; Kolmogorov). Let X_n, n ≥ 1, be independent r.v.s, and for some c > 0, let X_n^c be the truncation of X_n at c; i.e.,

X_n^c = X_n if |X_n| < c, and X_n^c = 0 if |X_n| ≥ c, n ≥ 1.

Consider the three series:

(i) ∑_{n≥1} P(|X_n| ≥ c),  (ii) ∑_{n≥1} σ²(X_n^c),  (iii) ∑_{n≥1} E X_n^c.

Then

(a) If for some c > 0, all three series (i)–(iii) converge, then the series ∑_{n≥1} X_n converges a.s.
(b) If ∑_{n≥1} X_n converges a.s., then all three series (i)–(iii) converge for every c > 0.

Proof.
(a) Clearly, (|X_n| ≥ c) = (X_n ≠ X_n^c), and by (i), ∑_{n≥1} P(X_n ≠ X_n^c) < ∞. Therefore, by Theorem 5(ii), the series ∑_{n≥1} X_n and ∑_{n≥1} X_n^c converge (essentially) on the same set. On account of (ii), Theorem 2 applies and gives that ∑_{n≥1} (X_n^c − E X_n^c) converges a.s. Then, by (iii), ∑_{n≥1} X_n^c converges a.s., and hence so does ∑_{n≥1} X_n.
(b) Now, a.s. convergence of ∑_{n≥1} X_n implies X_n → 0 a.s. as n → ∞. Then, by Corollary 3 to Theorem 4, it follows that, for every c > 0, ∑_{n≥1} P(|X_n| ≥ c) < ∞. Thus the series in (i) converges. Next, as noted in part (a), the series ∑_{n≥1} X_n and ∑_{n≥1} X_n^c converge (essentially) on the same set. Since ∑_{n≥1} X_n converges a.s., so does ∑_{n≥1} X_n^c. However, |X_n^c| < c, n ≥ 1. Then Lemma 8 applies and yields that ∑_{n≥1} σ²(X_n^c) < ∞ and ∑_{n≥1} E X_n^c converges. So, the series in (ii) and (iii) converge, and the proof is completed.
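The criterion is easy to try out numerically. In the sketch below (an editorial addition, assuming Python with NumPy), X_n = s_n/n with independent fair signs s_n = ±1 and c = 1: the series (i) reduces to the single term P(|X_1| ≥ 1) = 1, the series (ii) is dominated by ∑ 1/n², and the series (iii) is identically 0, so ∑ X_n converges a.s.; each simulated path of partial sums indeed settles down (to a path-dependent limit).

```python
# Illustrative sketch only: X_n = s_n / n with fair random signs s_n; by the
# Three Series Criterion (with c = 1), the series sum_n X_n converges almost surely.
import numpy as np

rng = np.random.default_rng(2)
n = 10**6
idx = np.arange(1, n + 1)
for run in range(3):
    signs = rng.choice([-1.0, 1.0], size=n)
    partial_sums = np.cumsum(signs / idx)
    # the partial sums barely move between n = 10**3 and n = 10**6
    print(run, partial_sums[10**3 - 1], partial_sums[-1])
```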
Exercises

1. In reference to Exercise 6 in Chapter 11, explain why S_n/n → 0 in probability as n → ∞.
2. For a r.v. X, show that E|X| < ∞ if and only if ∑_{n=1}^∞ P(|X| ≥ nc) < ∞ for some fixed constant c > 0.
3. Let X_1, X_2, ... be independent r.v.s, and let the r.v. X be A_n = σ(X_n, X_{n+1}, ...)-measurable for every n ≥ 1. Then show that P(X = c) = 1 for some (finite) constant c.
4. For n = 1, 2, ..., suppose that the r.v.s X_n are independent and that ∑_{n=1}^∞ E|X_n| < ∞. Then show that ∑_{n=1}^∞ |X_n| converges a.s.
   Hint: Use the Markov inequality, the special case of Theorem 6 in Chapter 6, and Theorem 11 here.
5. For n ≥ 1, let X_n and X be r.v.s defined on the measure space (Ω, A, μ). Then, by Theorem 4 in Chapter 3, X_n → X a.e. as n → ∞ if and only if μ(∩_{n=1}^∞ ∪_{ν=0}^∞ (|X_{n+ν} − X| ≥ 1/k)) = 0 for each arbitrary but fixed k = 1, 2, .... Replace μ by a probability measure P, and show that X_n → 0 a.s. as n → ∞ if and only if P(lim sup_{n→∞} A_n) = 0, where A_n = (|X_n| ≥ 1/k) for each arbitrary but fixed k = 1, 2, ....
6. Consider any events (independent or not) A_n, n ≥ 1, and suppose that ∑_{n=1}^∞ P(A_n) < ∞. Then P(lim sup_{n→∞} A_n) = 0. This is so by Theorem 3 in this chapter (see also Theorem 4(i)). Regarding the converse of this statement, we have that, if P(lim sup_{n→∞} A_n) = 0 and the events A_n, n ≥ 1, are independent, then ∑_{n=1}^∞ P(A_n) < ∞. Justify this assertion.
7. Regarding the converse stated in Exercise 6, if the events A_n, n ≥ 1, are not independent, then P(lim sup_{n→∞} A_n) = 0 need not imply that ∑_{n=1}^∞ P(A_n) < ∞. Give one or two concrete examples to demonstrate this assertion.
   Hint: Take (Ω, A, P) = ((0, 1), B_(0,1), λ), λ being the Lebesgue measure. Then
   (a) Take X_n = I_(0, 1/n), n ≥ 1, and show that X_n → 0 as n → ∞, so that X_n → 0 a.s. Then, by Exercise 5, P(lim sup_{n→∞} A_n) = 0, where A_n = (|X_n| ≥ 1/k) for any arbitrary but fixed k = 1, 2, .... Also, show that ∑_{n=1}^∞ P(A_n) = ∞.
   (b) Take X_n = I_(0, 1/n²), n ≥ 1, and show that X_n → 0 as n → ∞, so that X_n → 0 a.s. Again, P(lim sup_{n→∞} A_n) = 0, as in (a). Also, show that ∑_{n=1}^∞ P(A_n) < ∞.
CHAPTER 15
Topics from Ergodic Theory
The ultimate purpose of this chapter is the formulation and proof of the Ergodic Theorem (see Theorem 1 and its Corollaries 1 and 4, as well as Theorem 3 and its Corollaries 1 and 2). To this effect, several concepts must be introduced first, and a substantial number of results must also be established; most of them are of independent interest.

The chapter is organized in six sections. In Section 15.1, the basic concept of a discrete parameter (stochastic) process is introduced as well as the special case of the coordinate process. Also, the concept of (strict) stationarity is introduced, and some characterizations of it are discussed. In Section 15.2, the concept of a measurable measure-preserving transformation is introduced, as well as the special case of the shift transformation. Stationary processes are then defined by means of a measurable measure-preserving transformation and a r.v. It is shown that the coordinate process is defined by way of the shift transformation, and that the coordinate process is stationary if and only if the shift transformation is measure-preserving. Furthermore, the interplay between stationary processes and the coordinate process is studied.

The concepts of invariant and of almost sure invariant sets under a transformation are taken up in Section 15.3. It is shown, among other things, that invariant and almost invariant sets form σ-fields and one is the completion of the other. Also, the concept of ergodicity of a transformation is introduced here. In Section 15.4, the concept of invariance, relative to a transformation, is extended to a r.v., and it is shown that a r.v. is invariant if and only if it is measurable with respect to the σ-field of invariant sets. Also, it is shown that a transformation is ergodic if and only if every r.v., invariant relative to the underlying transformation, is equal to a constant with probability 1. In the subsequent section, the Ergodic Theorem as well as the so-called Maximal Ergodic Theorem are formulated and proved, and their forms are also noted under ergodicity of the underlying transformation.

In Section 15.6, a process X is considered, without being stipulated that it is defined by means of a measurable measure-preserving transformation, and the concepts of invariance of a set and of a r.v., relative to X, are defined. The invariant sets form a σ-field, and a r.v. is invariant if and only if it is measurable with respect to the σ-field of invariant sets. Ergodicity of X is also defined here. The Ergodic Theorem
is reformulated and proved, and its form is noted under ergodicity of X. The section is concluded with the derivation of processes by means of a given process X, and the study of properties inherited from X, such as stationarity, invariance, and ergodicity.
15.1 Stochastic Process, the Coordinate Process, Stationary Process, and Related Results

As has already been mentioned, the ultimate purpose of this chapter is to present a proof of the Ergodic Theorem. To this end, one will have first to build up the necessary mathematical machinery, and prove a series of auxiliary results.

To start with, let X = (X_1, X_2, ...) and X′ = (X′_1, X′_2, ...) be two (discrete time parameter) stochastic processes, or just processes (i.e., two infinite sequences of r.v.s), defined on the probability spaces (Ω, A, P) and (Ω′, A′, P′), respectively, and taking values in (ℜ^∞, B^∞), the infinite cartesian product of Borel real lines (ℜ, B).

Definition 1. The processes X and X′ just described have the same distribution if, for every B ∈ B^∞, one has P(X ∈ B) = P′(X′ ∈ B).

Clearly, two processes with the same distribution are indistinguishable from the probabilistic point of view. Thus, what counts is the distribution of a process rather than the probability space on which it is defined. It would then be desirable to replace the probability space of a given process by another one that would be easier to deal with. This can actually be done, as will be shown in the sequel.

Definition 2. Consider the measurable space (ℜ^∞, B^∞), where the points of ℜ^∞ are denoted by x = (x_1, x_2, ...). Then, for each n ≥ 1, define on ℜ^∞ the real-valued function X̂_n(x) = x_n. Then X̂_n is a r.v., and the process X̂ = (X̂_1, X̂_2, ...) is called the coordinate process.

That X̂_n is a r.v. is immediate, since for every set A in B, X̂_n^{-1}(A) is the cylinder ℜ × ··· × ℜ × A × ℜ × ···, with A in the nth position (preceded by n−1 copies of ℜ), which is in B^∞. We then have the following result.
Proposition 1. Let X = (X_1, X_2, ...) be a process defined on the probability space (Ω, A, P) and taking values in (ℜ^∞, B^∞). Then there is a probability measure P̂ on B^∞ such that the given process and the coordinate process X̂ = (X̂_1, X̂_2, ...) have the same distribution, under P and P̂, respectively.

Proof. The probability measure P̂ in question is simply the distribution of X under P; i.e., for B ∈ B^∞, P̂(B) = P(X ∈ B). Since, clearly, for every B ∈ B^∞, (X̂ ∈ B) = B, we have P(X ∈ B) = P̂(B) = P̂(X̂ ∈ B).
In view of this proposition, all definitions and results involving the distribution of a process may be given in terms of the coordinate process by employing the appropriate probability measure. Thus, the process X = (X_1, X_2, ...), defined on (Ω, A, P) into (ℜ^∞, B^∞), may be thought of as being the coordinate process, and then (Ω, A, P) = (ℜ^∞, B^∞, P̂). We now give the following definition.

Definition 3. The process X = (X_1, X_2, ...) is said to be (strictly) stationary if, for every m ≥ 1, every 1 ≤ n_1 < n_2 < ··· < n_m with n_1, n_2, ..., n_m integers, and every k ≥ 1, one has

P((X_{n_1}, X_{n_2}, ..., X_{n_m}) ∈ B) = P((X_{n_1+k}, X_{n_2+k}, ..., X_{n_m+k}) ∈ B(k)),   (15.1)

where B is any cylinder with base a Borel set in ∏_{j=1}^m B_{n_j} and all of whose sides are equal to ℜ, and B(k) is the cylinder with the same base as that of B but located in ∏_{j=1}^m B_{n_j+k}, and all of whose sides are equal to ℜ. In the sequel, we will write B rather than B(k), but the preceding explanation should be kept in mind. We shall also write B for the cylinder with base the Borel set B in the σ-field ∏_{j=1}^m B_{n_j}.

Since the distribution of the random vector (X_{n_1}, X_{n_2}, ..., X_{n_m}) is determined by the joint d.f. of X_{n_1}, X_{n_2}, ..., X_{n_m}, relation (15.1) is equivalent to the following one,

F_{X_{n_1},X_{n_2},...,X_{n_m}}(x_{n_1}, x_{n_2}, ..., x_{n_m}) = F_{X_{n_1+k},X_{n_2+k},...,X_{n_m+k}}(x_{n_1}, x_{n_2}, ..., x_{n_m}),   (15.2)

for all x_{n_1}, x_{n_2}, ..., x_{n_m} ∈ ℜ. From (15.1) (or (15.2)), it follows, in particular, that the r.v.s X_n, n ≥ 1, are identically distributed.

Interpreting n as time, the concept of stationarity is then clear: No matter what times they are associated with, the joint distribution of any finite number of r.v.s from a stationary process is the same, provided their relative distance (in time) remains the same. In particular, no matter at what time one starts making observations, the distribution of the outcomes is the same.

From (15.1), it also follows that, for every k ≥ 1 and any B ∈ B^∞,

P((X_1, X_2, ...) ∈ B) = P((X_{k+1}, X_{k+2}, ...) ∈ B).   (15.3)

This is so because all finite dimensional distributions of the processes X = (X_1, X_2, ...) and X(k) = (X_{k+1}, X_{k+2}, ...) are the same (by (15.1)), and then so are the induced infinite dimensional measures in (15.3) (by the Carathéodory Extension Theorem and the Kolmogorov Consistency Theorem; for the latter, see, e.g., Loève (1963), page 93). Actually, stationarity is characterized by the joint distributions of all finitely many consecutive r.v.s. More precisely, one has
Proposition 2. The process X = (X_1, X_2, ...) is stationary if and only if, for every m, k ≥ 1 and every B ∈ B^m, one has

P((X_1, X_2, ..., X_m) ∈ B) = P((X_{k+1}, X_{k+2}, ..., X_{k+m}) ∈ B).   (15.4)

Proof. One direction is obvious from (15.1) by taking n_j = j, j = 1, ..., m. So, it suffices to show the other direction. To this end, let 1 ≤ n_1 < n_2 < ··· < n_m. One has, by (15.4),

F_{X_1,X_2,...,X_{n_m}}(x_1, x_2, ..., x_{n_m}) = F_{X_{k+1},X_{k+2},...,X_{k+n_m}}(x_1, x_2, ..., x_{n_m})   (15.5)

for all x_1, x_2, ..., x_{n_m} ∈ ℜ. In (15.5), replace those x's that are different from x_{n_j}, j = 1, ..., m, by ∞ (in the sense of letting them → ∞). Then one has

F_{X_{n_1},X_{n_2},...,X_{n_m}}(x_{n_1}, x_{n_2}, ..., x_{n_m}) = F_{X_{n_1+k},X_{n_2+k},...,X_{n_m+k}}(x_{n_1}, x_{n_2}, ..., x_{n_m}),

which is (15.2). But (15.2) implies (15.1).

We also have the following result.

Proposition 3. For n ≥ 1, let X_n be independent r.v.s. Then X = (X_1, X_2, ...) is stationary if and only if the X_n's are identically distributed.

Proof. If X is stationary, the X_n's are always identically distributed. For the converse, we have

F_{X_1,...,X_n}(x_1, ..., x_n) = F_{X_1}(x_1) ··· F_{X_n}(x_n) = F_{X_{k+1}}(x_1) ··· F_{X_{k+n}}(x_n) = F_{X_{k+1},...,X_{k+n}}(x_1, ..., x_n)

for every k ≥ 1, and then Proposition 2 completes the proof.
Given a stationary process, one may derive any number of other stationary processes. This is so because of the following result.

Proposition 4. Let X = (X_1, X_2, ...) be a stationary process defined on (Ω, A, P) and taking values in (ℜ^∞, B^∞), and let φ_m be defined as follows: φ_m : (ℜ^m, B^m) → (ℜ, B), measurable, 1 ≤ m ≤ ∞. Set

Y_n = φ_m(X_n, X_{n+1}, ..., X_{n+m−1}), n ≥ 1.   (15.6)

Then the process Y = (Y_1, Y_2, ...) is stationary.

Proof. By Proposition 2, it suffices to prove that, for every n, k ≥ 1 and every B ∈ B^n, one has

P((Y_1, Y_2, ..., Y_n) ∈ B) = P((Y_{k+1}, Y_{k+2}, ..., Y_{k+n}) ∈ B).   (15.7)

Set y_j = φ_m(x_j, x_{j+1}, ..., x_{j+m−1}), j ≥ 1, and let A = {(x_1, x_2, ..., x_{n+m−1}) ∈ ℜ^{n+m−1}; (y_1, y_2, ..., y_n) ∈ B}. Then one has

P((Y_1, Y_2, ..., Y_n) ∈ B) = P((X_1, X_2, ..., X_{n+m−1}) ∈ A) = P((X_{k+1}, X_{k+2}, ..., X_{k+n+m−1}) ∈ A) (by stationarity of X) = P((Y_{k+1}, Y_{k+2}, ..., Y_{k+n}) ∈ B),

which is (15.7).

Corollary. If the r.v.s X_n, n ≥ 1, are i.i.d., then the process Y, as defined by (15.6), is stationary.

Proof. It follows from Propositions 3 and 4.
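As a concrete (editorial) illustration of Proposition 4 and its Corollary, assuming Python with NumPy: starting from an i.i.d. (hence stationary) sequence, a moving average Y_n = (X_n + X_{n+1} + X_{n+2})/3 is again stationary, so the empirical distribution of (Y_1, Y_2, ...) should match that of the shifted sequence (Y_{k+1}, Y_{k+2}, ...).

```python
# Illustrative sketch only: a moving average of an i.i.d. sequence is stationary,
# so Y_n and Y_{n+k} should have (empirically) the same distribution.
import numpy as np

rng = np.random.default_rng(3)
x = rng.exponential(size=10**6)
y = (x[:-2] + x[1:-1] + x[2:]) / 3.0          # Y_1, Y_2, ...
k = 5
a, b = y[:-k], y[k:]                          # (Y_1, ...) versus (Y_{k+1}, ...)
print(a.mean(), b.mean())                     # same mean
print(np.quantile(a, [0.25, 0.5, 0.75]))      # ...and essentially the same quantiles
print(np.quantile(b, [0.25, 0.5, 0.75]))
```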
15.2 Measure-Preserving Transformations, the Shift Transformation, and Related Results

Consider the probability space (Ω, A, P), and let T : (Ω, A) → (Ω, A), T^{-1}(A) ⊆ A; i.e., T is a measurable transformation of Ω into itself. T will always be assumed to be measurable, whether explicitly stated or not.

Definition 4. The transformation T is said to be measure-preserving if P(T^{-1}A) = P(A) for every A ∈ A.

Remark 1. Actually, it suffices to show that P(T^{-1}A) = P(A) for A belonging to a measure-determining class, such as the field F generating A. More precisely,

Proposition 5. Let F be a field such that σ(F) = A, and assume that, for every A ∈ F, P(T^{-1}A) = P(A). Then, for every A ∈ A, P(T^{-1}A) = P(A).

Proof. Let M = {A ∈ A; P(T^{-1}A) = P(A)}. Then M is a monotone class. Indeed, F ⊆ M by assumption, so that M ≠ ∅. Next, let A_n ∈ M, n ≥ 1, be monotone; we wish to show that lim A_n ∈ M, or equivalently, P(T^{-1}(lim A_n)) = P(lim A_n), where here and in the sequel the limits are taken as n → ∞. To this end, we have, for A_n ↑:

P(T^{-1}(lim A_n)) = P(T^{-1} ∪_{n≥1} A_n) = P(∪_{n≥1} T^{-1}A_n) = P(lim T^{-1}A_n) = lim P(T^{-1}A_n) = lim P(A_n) = P(lim A_n).
For A_n ↓:

P(T^{-1}(lim A_n)) = P(T^{-1} ∩_{n≥1} A_n) = P(∩_{n≥1} T^{-1}A_n) = P(lim T^{-1}A_n) = lim P(T^{-1}A_n) = lim P(A_n) = P(lim A_n).
(15.8) X n (ω) = X T n−1 ω , ω ∈ , where T 2 ω = T (T ω), T k ω = T (T k−1 ω), and T 0 = I , the identity in .
(15.9)
Then one has the following result. Proposition 6. Let T be a (measurable) measure-preserving transformation defined on (, A, P) into itself, and let X be a r.v. defined on this probability space. Let X n , n ≥ 1, be defined by (15.8) and (15.9). Then the process X = (X 1 , X 2 , . . .) is a stationary process. Proof. In the first place, the measurability of T implies the measurability of T 2 , since for A ∈ A, (T 2 )−1 (A) = T −1 [T −1 (A)], and by induction, T n is measurable for all n ≥ 1. Thus, X n , n ≥ 1, are r.v.s. It remains to prove stationarity. For B ∈ B n , we set A = ω ∈ ; X 1 (ω), . . . , X n (ω) ∈ B and A = ω ∈ ; X k+1 (ω), . . . , X k+n (ω) ∈ B and we wish to show that P(A) = P(A ). This will follow, by the measure-preserving property of T , by showing that Lemma 1. Proof.
With A and A as just defined, we have A = T −k A. de f
First, T −k A ⊆ A . Indeed, ω ∈ T −k A implies T k ω = ω ∈ A. Then B (X 1 (ω ), . . . , X n (ω )) = (X (T 0 ω ), . . . , X (T n−1 ω )) = (X (T 0 T k ω), . . . , X (T n−1 T k ω)) = (X (T k ω), . . . , X (T k+n−1 ω)) = (X k+1 (ω), . . . , X k+n (ω)), so that ω ∈ A ; i.e., T −k A ⊆ A .
15.2 Measure-Preserving Transformations, the Shift Transformation
de f
Next, A ⊆ T −k A. That is, for ω ∈ A , to show ω ∈ T −k A or T k ω = ω ∈ A. We have (X 1 (ω ), . . . , X n (ω )) = (X (T 0 ω ), . . . , X (T n−1 ω )) = (X (T 0 T k ω), . . . , X (T n−1 T k ω)) = (X (T k ω), . . . , X (T k+n−1 ω)) = (X k+1 (ω), . . . , X k+n (ω)) and this does belong in B since ω ∈ A . Thus, (X 1 (ω ), . . . , X n (ω )) ∈ B, which implies that ω ∈ A. Hence A ⊆ T −k A, and therefore A = T −k A. A certain transformation to be introduced next is of special interest. Definition 5.
The transformation S, defined as S : ∞ → ∞ , so that S(x1 , x2 , . . .) = (x2 , x3 , . . .),
is called the shift transformation. We then have the following result. The shift transformation S defined on (∞ , B ∞ ) is measurable.
Proposition 7. Proof.
In order to prove measurability for S, it suffices to show that S −1 I1 × · · · × In × × · · · ∈ B ∞ for any n ≥ 1,
and any intervals I j , j = 1, . . . , n, in . We have S −1 I1 × · · · × In × × · · · = x = (x1 , x2 , . . .) ∈ ∞ ; S x = (x2 , x3 , . . .) ∈ I1 × · · · × In × × · · · } = x = (x1 , x2 , . . .) ∈ ∞ ; x2 ∈ I1 , . . . , xn+1 ∈ In , x j ∈ , j ≥ n + 2 = × I1 × · · · × In × × · · · , which is in B ∞ .
(15.10)
ˆ = ( Xˆ 1 , Xˆ 2 , . . .) In terms of the shift transformation S, the coordinate process X can be expressed as
(15.11) S 0 is the identity in ∞ . Xˆ n (x) = Xˆ 1 S n−1 x , n ≥ 1 We now have the following result.
ˆ = Xˆ 1 , Xˆ 2 , . . . , defined on Proposition 8. If the coordinate process X ˆ is stationary, then the shift transformation S is measure-preserving, (∞ , B ∞ , P), and vice versa. Proof. Since the class of cylinders I1 × · · · × In × × · · · ; I j , j = 1, . . . , n, intervals in determines the measure Pˆ on B ∞ , by Remark 1 and Proposition 5, it
325
326
CHAPTER 15 Topics from Ergodic Theory
suffices to show that S is measure-preserving over this class of sets alone. We have Pˆ I1 × · · · × In × × · · · = Pˆ Xˆ 1 , . . . , Xˆ n ∈ I1 × · · · × In
= Pˆ Xˆ 1 ∈ I1 , . . . , Xˆ n ∈ In = Pˆ Xˆ 2 ∈ I1 , . . . , Xˆ n+1 ∈ In ˆ (by stationarity of X), = Pˆ × I1 × · · · × In × × · · · which, by (15.10), completes one direction of the proof. The converse is true by Proposition 6. Now Propositions 7 and 8 state, in effect, that every stationary process is essentially generated by a measure-preserving transformation. More precisely, we have Proposition 9. Let X = X 1 , X 2 , . . . be a stationary process defined on the probability space (, A, P), and let Pˆ be the distribution of X, under P. Then the ˆ is defined in terms of the shift transformation S, has the same coordinate process X distribution X, and is stationary; S is measure-preserving. It follows from (15.11), Propositions 8, and 1. Thus, from a probability point of view, if X = X 1 , X 2 , . . . is a stationary process defined on the probability space (, A, P), we may assume that there is a (measurable) measure-preserving transformation T on (, A, P) into itself such that X n = X (T n−1 ), n ≥ 1, for some r.v. X , in the sense that X n (ω) = X [T n−1 (ω)], ω ∈ , n ≥ 1.
Proof.
15.3 Invariant and Almost Sure Invariant Sets Relative to a Transformation, and Related Results The basic concepts in this section are those of an invariant and of an almost invariant set given below. Definition 6. Let T be a (measurable) transformation on (, A, P) into itself. A set A ∈ A is said to be invariant (under T ), if T −1 A = A. A set A ∈ A is said to be a.s. invariant, if P(A T −1 A) = 0, or equivalently, P(A ∩ T −1 A) = P(A) = P(T −1 A). Remark 3. It is easily seen that, if A is invariant, then P(T −n A) = P(A), and if T is a.s. invariant, then P(A T −n A) = 0, n ≥ 0. That is, Proposition 10. for n ≥ 0.
a.s.
If A is a.s. invariant, then P(A T −n A) = 0, so that A = T −n A a.s.
a.s.
a.s.
Proof. In the first place, we observe that A = B and B = C imply A = C. Indeed, a.s. a.s. A = B means that P(A B) = 0 or P(A ∩ B c ) = P(Ac ∩ B) = 0, and B = C
15.3 Invariant and Almost Sure Invariant Sets Relative to a Transformation
means that P(B C) = 0 or P(B ∩ C c ) = P(B c ∩ C) = 0. Then P(A ∩ C c ) = P[(A ∩ C c ) ∩ B] + P[(A ∩ C c ) ∩ B c ] = P[A ∩ (B ∩ C c )] + P[(A ∩ B c ) ∩ C c ] ≤ P(B ∩ C c ) + P(A ∩ B c ) = 0, P(Ac ∩ C) = P[(Ac ∩ C) ∩ B] + P[(Ac ∩ C) ∩ B c ] = P[(Ac ∩ B) ∩ C] + P[(B c ∩ C) ∩ Ac ] ≤ P(Ac ∩ B) + P(B c ∩ C) = 0. a.s.
Thus, P(A ∩ C c ) = P(Ac ∩ C) = 0 and hence P(A C) = 0 or A = C. Next, by the a.s. invariance of A, we have, P(A) = P(T −1 A) = P(A ∩ T −1 A), and by the measure-preserving property of T , P(T −1 A) = P(T −2 A) = P[T −1 (A ∩ T −1 A)] = P[T −1 A ∩ T −2 A]. Thus P(T −1 A) = P(T −2 A) = P[T −1 A ∩ T −2 A], hence P(T −1 A T −2 A) = 0 a.s. or T −1 A = T −2 A. a.s. a.s. Since also A = T −1 A, we have, by the introductory observation, A = T −2 A. a.s. −3 a.s. a.s. −n a.s. −n −2 Likewise, T A = T A = · · · = T A, so that A = T A, n ≥ 0. Proposition 11. Let T be a (measurable) measure-preserving transformation on (, A, P) into itself. Then one has (i) If A = B a.s. and B is a.s. invariant, then so is A. (ii) If B is a.s. invariant, then P(B c ∩T −n B) = 0 and P(B∩T −n B) = P(B), n ≥ 0. Proof.
a.s.
a.s.
First, show that A = B implies T −1 A = T −1 B. Indeed, P(T −1 A) = P(A)(by the measure-preserving property of T) a.s.
= P(B)(since A = B) = P(T −1 B)(by the measure-preserving property of T), and a.s.
P(A) = P(B) = P(A ∩ B)(since A = B) = P[T −1 (A ∩ B)] (by the measure-preserving property of T) = P(T −1 A ∩ T −1 B). So, P(T −1 A) = P(T −1 B) = P(T −1 A ∩ T −1 B), or P(T −1 A T −1 B) = 0, so that a.s. T −1 A = T −1 B.
We now proceed with the proof of the proposition. a.s.
a.s.
a.s.
(i) A = B implies T −1 A = T −1 B, by the introductory observation. But B = T −1 B a.s. a.s. a.s. a.s. by Proposition 10. Thus A = B = T −1 B = T −1 A, so that A = T −1 A, and A is a.s. invariant. a.s. (ii) By Proposition 10, we have B = T −n B, since B is a.s. invariant. Hence −n c P(B ∩ T B) = P(B) and P(B ∩ T −n B) = 0. We also have Proposition 12. The class J of invariant sets in A under a (measurable) transformation T is a σ -field. Proof.
We have that T −1 = , so that ∈ J and so J is not empty. Next,
c T −1 Ac = T −1 A = Ac , if A ∈ J , so that Ac ∈ J .
Finally,
⎛ ⎞ T −1 ⎝ A j ⎠ = T −1 A j = A j , if A j ∈ J , j
so that
j
j
Aj ∈ J.
j
Proposition 13. Let J be the class of all a.s. invariant sets in (, A, P) under the (measurable) measure-preserving transformation T . Then, for every A ∈ J , there exists a set A ∈ J such that A = A a.s.; i.e., P(A A ) = 0. −n C. Then Proof. Let C ∈ A and set B = ∞ n=0 T T −n B ⊆ B, n ≥ 0. In fact,
⎛ T −n B = T −n ⎝
⎞
∞
T − j C⎠ =
j=0
⊆
∞
∞
(15.12)
T −n− j C =
j=0
∞
T−jC
j=n
T − j C = B.
j=0
Next, set D=
∞
T −n B.
(15.13)
n=0
Then D is invariant. In fact, T
−1
D=T
−1
∞
n=0
T
−n
B
=
∞ n=0
T −n−1 B =
∞ n=1
T −n B
=
∞
T −n B
n=1 ∞
=
T −n B
B+
∞
T −n B
Bc
n=1
B
n=1
=
∞
T −n B = D,
n=0
because B c ∩ T −1 B = , since T −1 B ⊆ B, by (15.12). (Actually, this may be considered as a way of constructing invariant sets.) Suppose now that C ∈ J . Then we will show that D = C a.s. by showing that B = C a.s. and D = B a.s. In fact, since C ⊆ B, it suffices to show that P(B ∩ C c ) = 0. We have ∞ ∞ c −n c c −n P(B ∩ C ) = P C ∩T C T C ∩C = P n=0
≤
∞
n=0
P C c ∩ T −n C .
n=0
But
P C c ∩ T −n C = 0, by Proposition 11(ii).
Thus, B = C a.s., and hence B is a.s. invariant by Proposition 11(i). Now from (15.13), we have that D ⊆ B, so that ∞ c D B = B ∩ Dc = B ∩ T −n B = B∩ = B∩
n=0 ∞
n=0 ∞
T
−n
B
c
T
−n
B
c
n=0
=
∞ n=0
∞ B ∩ T −n B c = B ∩ T −n B c . n=1
But B ∩ T −n B c = B ∩ T −n ( − B) = B ∩ (T −n − T −n B) = B ∩ ( − T −n B) = B − B ∩ T −n B and P(B ∩ T −n B) = P(B),
by the fact that B is a.s.invariant, and Proposition 11 (ii). Thus, P(B ∩ T −n B c ) = 0, and P(D B) = 0, so that D = B a.s. and hence D = C a.s. To summarize, C is a.s. invariant, D is invariant, and C = D a.s. Hence, if we set A = C, then A = D. Proposition 14.
Let J be as in Proposition 12, and define the class J as follows:
J = {A ∈ A;
P(A B) = 0 for some B ∈ J } .
(15.14)
Then (i) Every A ∈ J is a.s. invariant. (ii) J is a σ -field. Proof. First, J = since, clearly, J ⊆ J by Proposition 13. Next, for A ∈ J , one has that there exists B ∈ J such that P(A B) = 0. Thus, A = B a.s. Since B is trivially a.s. invariant, it follows, by Proposition 11(i), that A is a.s. invariant. (ii) If A ∈ J , then for some B ∈ J , 0 = P(A B) = P(A ∩ B c ) + P(Ac ∩ B) = P[Ac ∩ (B c )c ] + P[(Ac )c ∩ B c ] = P(Ac B c ) and B c ∈ J , so that Ac ∈ J . Next, let A j ∈ J , j = 1, 2, . . . Then (i)
0 = P(A j B j ) = P(A j ∩ B cj ) + P(Acj ∩ B j ) for some B j ∈ J . We then set B = Bi and get i
⎛ ⎞ ⎛ ⎤ ⎞ ⎡⎛ ⎞c P ⎝ A j B ⎠ = P ⎝ A j ∩ B c ⎠ + P ⎣⎝ A j ⎠ ∩ B ⎦ j
j
j
⎛ ⎞ ⎛ ⎞ = P ⎝ Aj ∩ Bic ⎠ + P ⎝ Acj ∩ Bi ⎠ ⎡
j
i
j
i
⎤ c c ≤ P⎣ Aj ∩ Bj ⎦ + P Ai ∩ Bi ≤ Thus,
j
Aj ∈
J .
j
P A j ∩ B cj +
i
P Aic ∩ Bi = 0.
i
j
Definition 7. The σ -field J , as defined by (15.14), is called the completion of J with respect to A and P. (See also Exercise 17 in Chapter 2.) Proposition 15. Let J be the class of all a.s. invariant sets in A under the (measurable) measure-preserving transformation T . Then J is a σ -field and, indeed, J = J .
15.4 Measure-Preserving Ergodic Transformations, Invariant Random
Proof. By Proposition 13, J ⊆J , whereas, by Proposition 14(i), J ⊆J . Thus, J =J . Definition 8. Let T be a (measurable) transformation on (, A, P) into itself. Then T is said to be ergodic, if for every A ∈ J , one has P(A) = 0 or 1.
15.4 Measure-Preserving Ergodic Transformations, Invariant Random Variables Relative to a Transformation, and Related Results The concept of invariance is also defined for r.v.s. Definition 9. Let T be a (measurable) transformation on (, A, P) into itself, and let X be a r.v. on the same space. We say that X is invariant (relative to T ), if X (ω) = X (T ω), ω ∈ . Remark 4. For an invariant r.v. X , it is immediate that X (ω) = X (T n ω), n ≥ 0 (and for n = −1, −2, . . ., if T is one-to-one onto). Indeed, let ω = T −1 ω. Then X (T −1 ω) = X (ω ) = X (T ω ) = X (T (T −1 ω)) = X (ω), and likewise, if T −n ω = ω , then X (T −n ω) = X (ω ) = X (T ω ) = X (T (T −n ω)) = X (T −(n−1) ω) = X (ω), by the induction hypothesis. The fact that X (ω) = X (T n ω), ω ∈ , n = 0, ±1, . . ., means that X (ω) remains constant on the orbit T n ω, n = 0, ±1, . . . Now the question arises as to what r.v.s are invariant. The answer to this is given by the following result. Proposition 16. Let T be a (measurable) transformation on (, A, P) into itself, and let X be a r.v. on the same space. Then X is invariant (relative to T ), if and only if X is J -measurable, where J is the σ -field of (T -) invariant sets in A. Proof. Let X be invariant, and set A(x) = {ω ∈ ; X (ω) ≤ x} , x ∈ . Then T −1 A(x) = {ω ∈ ; T ω ∈ A(x)} = {ω ∈ ; X (T ω) ≤ x} = {ω ∈ ; X (ω) ≤ x} = A(x), since X (T ω) = X (ω). Thus, T −1 A(x) = A(x), x ∈ , and this establishes J measurability for X . Next, let X be J -measurable. We shall show that X is invariant. Since every X is the pointwise limit of simple r.v.s, it suffices to show the result for the case that X is
only an indicator function. So let X = I A with A ∈ J . Then X (T ω) = I A (T ω) = IT −1 A (ω) = I A (ω) = X (ω), since T −1 A = A. (See also Exercise 4.)
Ergodicity and invariant r.v.s are related as follows. Proposition 17. Let T be a (measurable) transformation on (, A, P) into itself. Then T is ergodic, if and only if every real-valued r.v. invariant relative to T , defined on the same probability space, is a.s. equal to a finite constant. Proof. Suppose that every invariant r.v. is a.s. equal to a constant, and let A be an arbitrary set in J . If we set X = I A , then X is J -measurable, and therefore invariant, by Proposition 16. Since X is equal to 1 or 0 with probability P(A) or P(Ac ), respectively, we have that P(A) = 1 or P(Ac ) = 1. So P(A) = 0 or P(A) = 1. Next, suppose that T is ergodic, so that P(A) = 0 or 1 for every A ∈ J . We shall prove that every invariant r.v. X is a.s. equal to a constant. Since X is invariant, it is J -measurable and hence P(X < x) = 0 or 1 for every x ∈ . On the other hand, P(X < x) → 1 as x → ∞. Hence P(X < x) = 1 for all x ≥ x1 , some sufficiently large x1 . Set x0 = inf {x ∈ ; P(X < x) = 1} ; x0 is finite; i.e., x0 > −∞, because otherwise P(X < x) = 1 for all x would imply 1 = P(X < x) ↓ P(X = −∞) as x ↓ −∞, a contradiction. Then P x0 − ε < X < x0 + ε = P X < x0 + ε −P X ≤ x0 − ε = 1 − 0 = 1 for every ε > 0. Letting ε ↓ 0 and observing that x0 − ε < X < x0 + ε ↓ X = x0 , we get P(X = x0 ) = 1.
15.5 The Ergodic Theorem, Preliminary Results

We are now in a position to formulate and prove the Ergodic Theorem.

Theorem 1 (Ergodic Theorem). Let T be a (measurable) measure-preserving transformation on (Ω, A, P) into itself, and let X be a r.v. on the same space such that E|X| < ∞. Define X_n, n ≥ 1, as X_n(ω) = X(T^{n-1}ω), ω ∈ Ω,
15.5 The Ergodic Theorem, Preliminary Results
and let J be the invariant σ -field (relative to T ). Then n n 1 1 a.s. X j (ω) = X (T j−1 ω) → E X |J . n→∞ n n j=1
(15.15)
j=1
For the proof of Theorem 1, we need the following results. Lemma 2. Let T be a (measurable) measure-preserving transformation on (, A, P) into itself, and let X be a r.v. on the same space for which E X exists. Then E X (ω) = E X (T ω). Proof. Since any r.v. is split into two nonnegative r.v.s, which are nondecreasing pointwise limits of nonnegative simple r.v.s, it suffices to show the result for the case that X is an indicator r.v. So, let X = I A , A ∈ A. Then E X (ω) = P(A) and E X (T ω) = E I A (T ω) = E IT −1 A (ω) = P(T −1 A) = P(A) by the measurepreserving property of T ; i.e., E X (ω) = E X (T ω). (See also Exercise 5.) Alternative Simpler Proof. Let A = X −1 (B), B ∈ B. Then (X T )−1 (B) = T −1 A, and P(A) = P(T −1 A). Thus, X and X T have the same distribution under P. But # # # # Xd P = xd Q X and (X T )d P = xd Q X T .
Since FX ≡ FX T , we obtain
$
Xd P =
$
Theorem 2 (Maximal Ergodic Theorem). rem 1, and, for ω ∈ , set Sk (ω) =
k
X j (ω) =
j=1
and
(X T )d P.
Let T , X and X n , n ≥ 1, be as in Theo-
k
X (T j−1 ω), k ≥ 1,
(15.16)
j=1
Mn (ω) = max 0, S1 (ω), . . . , Sn (ω) .
(15.17)
#
Then
(Mn >0)
Proof.
X d P ≥ 0.
(15.18)
For 1 ≤ k ≤ n, we have, by replacing in (15.16) and (15.17) ω by T ω, Mn (T ω) ≥ Sk (T ω),
so that, by (15.16), X (ω) + Mn (T ω) ≥ X (ω) + Sk (T ω) = Sk+1 (ω), since Sk (T ω) = · · · + X k+1 (ω). Thus,
k % j=1
X (T j−1 T ω) = Sk+1 (ω) − X (ω) =
k %
X (T j ω) = X 2 (ω) +
j=1
X (ω) ≥ Sk+1 (ω) − Mn (T ω), k = 1, . . . , n.
(15.19)
But S1 (ω) = X (ω) and Mn (T ω) ≥ 0, so that X (ω) ≥ S1 (ω) − Mn (T ω).
(15.20)
From (15.19) and (15.20), it follows that X (ω) ≥ max S1 (ω), . . . , Sn (ω) − Mn (T ω). Therefore #
#
(Mn >0)
X (ω)d P ≥
(Mn >0)
max S1 (ω), . . . , Sn (ω)
−Mn (T ω)} d P # = Mn (ω) − Mn (T ω) d P, by (15.17). (Mn >0)
But
#
# (Mn >0)
Mn (ω)d P =
#
= Therefore
#
(Mn >0)
(Mn ≥0)
Mn (ω)d P
Mn (ω)d P, since Mn ≥ 0.
Mn (ω) − Mn (T ω) d P =
by Lemma 2. It follows that
#
Mn (ω)d P # − Mn (T ω)d P (Mn >0) # ≥ Mn (ω)d P # − Mn (T ω)d P = 0,
$ (Mn >0)
X d P ≥ 0, which is (15.18).
Lemma 3. Let T , X , X n , n ≥ 1, and J be as in Theorem 1, and suppose that E X |J = 0 a.s. Then relation (15.15) holds true. Proof. Let Sn , n ≥ 1, be defined by (15.16), and let X¯ = lim sup and in the sequel all limits are taken as n → ∞. Then
Sn n ,
where here
Sn+1 (ω) − X (ω) Sn (T ω) = lim sup X¯ (T ω) = lim sup n ' n & Sn+1 (ω) Sn+1 (ω) X (ω) n + 1 + × = lim sup = X¯ (ω); = lim sup − n n n+1 n+1
i.e., X¯ (T ω) = X¯ (ω), which means that X¯ is invariant. Then X¯ is measurable with respect to J , and therefore, if for ε > 0 we set D = ( X¯ > ε), it follows that D is invariant. Now define the r.v. X ∗ as follows: (15.21) X ∗ (ω) = X (ω) − ε I D (ω). Let X n∗ = X ∗ (T n−1 ), n ≥ 1, and let Sk∗ and Mn∗ be the quantities defined by (15.16) and (15.17), respectively, when X is replaced by X ∗ . Then Theorem 2 implies that # X ∗ d P ≥ 0. (15.22) (Mn∗ >0)
Define the sets An , n ≥ 1, by Then, clearly,
An = Mn∗ > 0 .
An = Mn∗ > 0 =
Also, let the set A be defined by
max Sk∗ > 0 .
1≤k≤n
A=
sup Sk∗ k≥1
(15.23)
>0 .
(15.24)
Then, from (15.23) and (15.24), it clearly follows that An ↑ A. From the fact that D is invariant, we have that T − j D = D, j ≥ 0, so that X ∗ (T j ω) = X (T j ω) − ε I D (T j ω) = X (T j ω) − ε IT − j D (ω) = X (T j ω) − ε I D (ω); i.e., X ∗ (T j ω) = X (T j ω) − ε I D (ω). & ' Sk∗ (ω) Sk (ω) = − ε I D (ω), k k
Therefore
and
∗ S A = sup Sk∗ > 0 = sup k > 0 k≥1 k≥1 k Sk = sup − ε ID > 0 k k≥1 Sk > ε ∩ D, so that A ⊆ D. = sup k≥1 k
(15.25)
Now, from the definition of lim sup, it follows that lim sup
Sk Sk Sk ≤ sup ; i.e., X¯ ≤ sup . k k k≥1 k≥1 k
Since on D, X¯ > ε, it follows that on D, sup k≥1
Sk k
> ε, so that
Sk∗ Sk D ⊆ sup > ε ∩ D = sup >0 k≥1 k k≥1 k = sup Sk∗ > 0 = A. k≥1
Thus A = D. ( ∗ ( ∗ ∗ ∗ ( ( Therefore (15.25) gives An ↑ D. So we have X I An → X I D , X I An ≤ |X | ≤ |X | + ε, independent of n, and E |X | $+ ε < ∞. Hence the Dominated Conver$ $ gence Theorem applies and gives that An X ∗ d P → D X ∗ d P. But An X ∗ d P = $ $ ∗ ∗ (Mn∗ >0) X d P ≥ 0,by (15.23), so that D X d P ≥ 0. Next, one has, by means of (15.21), # # ∗ X − ε IDd P X dP = D #D # = X − ε dP = X d P − ε P(D) D #D = E X |J d P − ε P(D) D
= 0 − ε P(D) = −ε P(D). Therefore −ε P(D) ≥ 0, and hence P(D) = 0. That is to say, P( X¯ > ε) = 0, for every ε > 0, so that X¯ ≤ 0 a.s. So the conclusion so far has been that lim sup
Sn ≤ 0 a.s. n
(15.26)
Next, replacing the r.v. X by −X , the corresponding Sn is −Sn , and the same arguments above yield Sn Sn Sn = − lim inf or lim inf ≥ 0 a.s. a.s. 0 ≥ lim sup − n n n This result, together with (15.26), then gives that Sn a.s. → 0. n
Proof of Theorem 1. Let Y = X −E X |J . Then E Y |J = 0 a.s., and of course, E X |J is J -measurable, hence invariant, by Proposition 16. Then, by Remark 4, E X |J (T n ω) = E X |J (ω), n ≥ 0, ω ∈ . It follows that
1 n
%n
j=1 Y
T j−1 ω =
1 n
%n j=1
X T j−1 ω − E X |J , whereas
n 1 j−1 a.s. Y T ω → 0, n j=1
by Lemma 3. Thus
n 1 j−1 a.s. X T ω → E X |J . n j=1
Corollary 1. If T is also ergodic, then the right-hand side of (15.15) is equal to E X a.s. Proof. E X |J is J -measurable, and since T is ergodic, it follows that E X |J = c a.s. by Proposition 17. Then E E X |J = E X = c. Thus E X |J = E X a.s. Corollary 2.
If T is also ergodic, then for every A ∈ A, one has n 1 j−1 a.s. IA T ω → P(A). n→∞ n
(15.27)
j=1
Proof. P(A).
It follows by Corollary 1, and the fact that a.s. E I A |J = E I A =
Remark 5. Relation (15.27) says, in effect, that, if T is ergodic, then, for almost all ω ∈ , the proportion of the points ω, T 1 ω, T 2 ω, . . ., as n → ∞, which lie in any set A ∈ A, is equal to P(A). Definition 10. Consider the measurable space (, A), and let P1 , P2 be two probability measures on A. We say that P1 and P2 are orthogonal, and we write P1 ⊥ P2 , if there exists a set A ∈ A, such that P1 (A) = 1 and P2 (A) = 0 (so that P2 (Ac ) = 1). Corollary 3. Let T : (, A) → (, A) be a (measurable) measure-preserving transformation with respect to two probability measures P1 and P2 on A, and suppose that T is also ergodic with respect to both P1 and P2 . Then either P1 ≡ P2 or P1 ⊥ P2 . Proof. It suffices to prove that, if P1 = P2 , then P1 ⊥ P2 . To this end, let A ∈ A be such that P1 (A) = P2 (A). Then, by Corollary 1, n 1 j−1 IA T ω → P1 (A) n→∞ n j=1
except on a set B1 ∈ A such that P1 (B1 ) = 0. Also, n 1 j−1 IA T ω → P2 (A) n→∞ n j=1
= 0. Since P1 (A) = P2 (A), we have except on a set B2 ∈ A such that P2 (B2 ) % that B1c ∩ B2c = , since, for ω ∈ B1c , n1 nj=1 I A (T j−1 ω) → P1 (A) and, for n→∞ % ω ∈ B2c , n1 nj=1 I A (T j−1 ω) → P2 (A), and P1 (A) = P2 (A), so that B1 ⊇ B2c , and n→∞
therefore P2 (B1 ) ≥ P2 (B2c ) = 1; i.e., P1 (B1 ) = 0 or P1 (B1c ) = 1 and P2 (B1 ) = 1, so that P1 ⊥ P2 . Corollary 4.
Under the same assumptions as those of Theorem 1, one has n n 1 1 j−1 (1) X j (ω) = X T ω → E X |J . n→∞ n n j=1
j=1
Proof. As it follows from the proof of Theorem 1, we may assume, without loss of generality, that E X |J = 0 a.s. Then we set Yn (ω) =
n n 1 1 j−1 X j (ω) = X T ω , ω ∈ , n ≥ 1, n n j=1
j=1
and we shall show that, as n → ∞ (here as well as throughout the proof), E |Yn | → 0.
(15.28)
a.s.
By Theorem 1, we have that Yn → 0. Then Egorov’s Theorem (see Exercise 11 in Chapter 3) implies that, for every ε > 0 there exists A = A(ε) ∈ A such that P(A) ≤ ε and Yn →0 uniformly on Ac . Thus, for all sufficiently large n independent of ω ∈ Ac , one has #
#
E |Yn | =
|Yn | d P + A
# Ac
|Yn | d P ≤
|Yn | d P + ε P(Ac ) A
# n # 1 (( (( c X j d P + ε P(A ) = |X |d P + ε P(Ac ), ≤ n A A j=1
and therefore
# lim sup E |Yn | ≤
|X | d P + ε P(Ac ). A
(15.29)
But
#
# |X | d P = A
#
#
A∩(|X |>k)
#
A∩(|X |>k)
≤ ≤
(|X |>k)
and therefore (15.29) becomes lim sup E |Yn | ≤
|X | d P +
A∩(|X |≤k)
|X | d P
|X | d P + k P(A)
|X | d P + εk,
# (|X |>k)
Letting ε → 0, we obtain lim sup E |Yn | ≤
|X | d P + εk + ε P(Ac ). # (|X |>k)
|X | d P.
Now letting k → ∞ and utilizing the fact that E |X | < ∞, we get lim sup E |Yn | = 0. Thus, E |Yn | → 0, which is (15.28). Now, let X = X 1 , X 2 , . . . be a stationary process defined on (, A, P) into ∞ ( , B ∞ ), and let Pˆ be the distribution of X under P. Then the coordinate process ˆ = ( Xˆ 1 , Xˆ 2 , . . .) defined on (∞ , B ∞ , P) ˆ is also stationary (by Proposition 1); it X assumes the representation
Xˆ n (x) = Xˆ 1 S n−1 (x) (by (15.11)), where S is the shift transformation on ∞ (see Definition 5); and the shift transformation S is measurable and measure-preserving (by Proposition 8). The processes X and ˆ are equivalent probabilistically, in the sense that they have the same distribution X ˆ respectively). And, whereas X may not have a representation in (under P and P, ˆ does. It follows, terms of a measure-preserving transformation T on into itself, X in particular, that # # X 1d P = Xˆ 1 d Pˆ = E Xˆ 1 , E X1 = provided these expectations exist. Thus, if E |X 1 | < ∞, and the shift transformation is also ergodic, then we have that n 1 ˆ X j →E Xˆ 1 = E X 1 a.s. and in the first mean n j=1
(by Corollaries 1 and 3). Hence n 1 X j →E X 1 a.s. and in the first mean. n j=1
15.6 Invariant Sets and Random Variables Relative to a Process, Formulation of the Ergodic Theorem in Terms of Stationary Processes, Ergodic Processes Now, it would be desirable to have a definition of invariance and ergodicity, and a formulation of the Ergodic Theorem in terms of the original process without passing to the equivalent coordinate process and without the assumption that X is generated by a transformation T . This is actually done here. The appropriate definition of invariance of a set in A is the following one. Definition 11. A set A ∈ A is said to be invariant (relative to X), if there exists a set B ∈ B ∞ such that A = X−1 B and such that −1 A = X n , X n+1 , . . . B for all n ≥ 1. The class J of invariant sets in A is a σ -field.
Proposition 18.
Proof. J is nonempty since ∈ J , and closure of J under countable unions and complementation is immediate. (See also Exercise 6.) ˆ Remark 6. Consider the coordinate process X and the shift transformation S. Then the σ -field of invariant sets under S, J S , say, is defined by ) * J S = B ∈ B ∞ ; S −1 B = B . Because then S −n B = B, and S −n B = × · · · × ×B, n ≥ 0, n
we have that JS =
⎧ ⎨ ⎩
B ∈ B∞ ;
⎫ ⎬
B = × · · · × ×B, n ≥ 0 . ⎭ n
ˆ i.e., Next, let Jˆ be the σ -field of invariant sets relative to X; 1 2
−1 ∞ ˆ ˆ ˆ J = B∈B ; X n , X n+1 , . . . B = B, n ≥ 0 . But, clearly,
Xˆ n , Xˆ n+1 , . . .
−1
B = × · · · × ×B, n ≥ 1. n−1
Thus, Jˆ = Also,
⎧ ⎨ ⎩
B ∈ B∞ ;
⎫ ⎬
B = × · · · × ×B, n ≥ 0 , so that J S = Jˆ . ⎭ n
Definition 12. A r.v.Y on (, A) is said to be invariant (relative to X), if there exists a measurable mapping ϕ on (∞ , B ∞ ) into (, B) such that Y = ϕ X n , X n+1 , . . . for all n ≥ 1. The following definition and result will be needed later. Let Z 1 : (, A) → (1 , A 1 ), measurable, Z 2 : (, A) → (2 , A 2 ), measurable. Then Definition 13. We say that Z 2 is a function of Z 1 , if for all ω, ω ∈ , ω = ω for which Z 1 (ω) = Z 1 (ω ), it follows that Z 2 (ω) = Z 2 (ω ). Lemma 4. Let Z 1 , Z 2 be defined as before, and suppose that the σ -field induced by Z 2 , A2 ⊆ A1 , the σ -field induced by Z 1 . Then, if A2 contains the singletons of points in the range of Z 2 , it follows that (i) Z 2 is a function of Z 1 . (ii) There exists a unique function Z on (Z 1 (), A 1 ∩ Z 1 () to (2 , A 2 )) defined by Z (ω1 ) = Z 2 (ω; Z 1 (ω) = ω1 ), so that Z 2 = Z (Z 1 ). (iii) Z is A 1 ∩ Z 1 ()-measurable. The proof of this lemma is left as an exercise (Exercise 7). Proposition 19. Let Y be a r.v. defined on (, A). Then Y is invariant (relative to X), if and only if Y is J -measurable, where J is the σ -field of invariant sets (relative to X). Proof. Let Y be invariant. Then there exists a measurable mapping ϕ on (∞ , B ∞ ) into (, B) such that Y = ϕ X n , X n+1 , . . . for all n ≥ 1. −1 −1 −1 (ϕ C) = X n , X n+1 , . . . B, Let C ∈ B. Then A = Y −1 C = X n , X n+1 , . . . where B ∈ B ∞ and A ∈ A. Thus, for every A ∈ Y −1 B , there exists B ∈B ∞ such that −1 B for all n ≥ 1, so that A ∈ J ; A = X n , X n+1 , . . . i.e., Y is J -measurable. Next, let Y be J -measurable. Then Y −1 B ⊆ J . Make the following identification: (1 , A1 ) = (∞ , B ∞ ), (2 , A2 ) = (, B), = X 1, X 2, . . . , Z2 = Y .
Z1
From the definition of J , it follows that −1 ∞ B for all n ≥ 1. J ⊆ X n , X n+1 , . . .
Thus,
−1 ∞ Y −1 B ⊆ X 1 , X 2 , . . . B .
Since B contains all singletons in , it follows that the Definition 13 applies andgives the existence of a measurable mapping ϕ defined on the range of X 1 , X 2 , . . . into (, B) such that Y = ϕ X 1, X 2, . . . . By of X 1 , X2 , . . . is the same as that of stationarity of X, it follows that the range X n , X n+1 , . . . for all n ≥ 1. Thus, ϕ X n , X n+1 , . . . is well defined. In order to complete the proof, it suffices to show that ϕ X 1 , X 2 , . . . = ϕ X n , X n+1 , . . . for all n ≥ 1. To see this, let −1 Bx = ϕ −1 {x}, x ∈ , and let A x = X 1 , X 2 , . . . Bx . −1 Then A x = Y −1 {x}, so that A x ∈ J . But then A x = X n , X n+1 , . . . Bx for all n ≥ 1, so that for ω ∈ A x , X n (ω), X n+1 (ω), . . . ∈ Bx for all n ≥ 1, and hence ϕ X n (ω), X n+1 (ω), . . . = x for all n ≥ 1. This completes the proof. Now the Ergodic Theorem becomes as follows. Theorem 3 (Ergodic Theorem). Let X = X 1 , X 2 , . . . be a stationary process defined on (, A, P), and let J be the σ -field of invariant events (relative to X). Then, if |E X 1 | < ∞, one has n 1 a.s. X j → E X 1 |J . n→∞ n
(15.30)
j=1
ˆ be the coordinate process. Proof. Let Pˆ be the distribution of X under P, and let X ˆ and gives that Then Theorem 1 applies to the process X n
1 ˆ X j → Xˆ = E Xˆ 1 |JS a.s. Pˆ . n→∞ n j=1
ˆ 0 ) = 0, Let B0 ∈ B ∞ be the exceptional set, and let A0 = X−1 B0 . Then P(A0 ) = P(B 1 %n c and on A0 , n j=1 X j converges to X , say. We shall show that X = E X 1 |J a.s. We show that X is invariant, and hence J -measurable, by Proposition 19. To this end, set X n + · · · + X n+k−1 , n ≥ 1. Then X = lim ϕ1,k k→∞ k c on A0 .
ϕn,k =
But
X 1 + · · · + X n+k−1 n+k−1 X 1 + · · · + X n−1 × − k→∞ k→∞ n+k−1 k k X 1 + · · · + X n+k−1 = lim = X on Ac0 . k→∞ n+k−1 Thus, if we set limk→∞ ϕ1,k = ϕ X 1 , X 2 , . . . on Ac0 , then lim k→∞ ϕn,k = ϕ Xn , X n+1 , . . . on Ac0 , and both are equal to X ; i.e., (X =)ϕ X n , X n+1 , . . . = ϕ X 1 , X 2 , . . . , n ≥ 1 on Ac0 , and hence X is invariant. From the fact that X = ϕ X n , X n+1 , . . . , n ≥ 1, and the convergences lim ϕn,k = lim
n 1 X j → X on Ac0 , n→∞ n j=1
where
n 1 ˆ X j → Xˆ on B0c , n→∞ n j=1
Ac0 = X−1 B0c with P Ac0 = Pˆ B0 = 0,
we get that
Xˆ = ϕ Xˆ 1 , Xˆ 2 , . . . a.s. Pˆ .
(15.31)
Thus, for every A = X−1 B, B ∈ B ∞ , we get # # # #
Xd P = ϕ X 1, X 2, . . . d P = ϕ Xˆ 1 , Xˆ 2 , . . . d Pˆ = Xˆ d Pˆ A
A
B
B
is finite and ⎛ ⎞ ⎛ ⎞ # # # # n n 1 1 ⎝ ⎝ X j⎠ dP = X d P, Xˆ j ⎠ d Pˆ → Xˆ d Pˆ = n→∞ B n n A B A j=1
j=1
by Corollary 4 (in Section 15.5), because ( ( ( ⎡⎛ ( ⎞ ⎤ ( ( # ( (# n ( ( ( (1 n 1 ( ( ⎣⎝ ( ˆ ˆ ˆ ˆ ⎠ ⎦ X j − X d P( ≤ X j − X (( d P ( ( n A ( n j=1 ( ( ( A j=1 ( ( ( ( (1 n ( ( ˆ ˆ ≤E( X j − X (( → 0, ( n j=1 ( n→∞ so that
#
⎛
⎞ # n n # 1 1 ⎝ ⎠ Xj dP = X jdP → X d P. n→∞ A n n A A j=1
j=1
(15.32)
This is true, in particular, for A ∈ J . But then there exists B ∈ B ∞ such that −1 A = X k , X k+1 , . . . B, k ≥ 1. By stationarity, we then obtain # # # # Xk d P = Xk d P = X 1d P = X 1 d P. A
(X k ,X k+1 ,...)−1 B
Therefore implies that (15.32) X = E X 1 |J a.s.
$ A
X 1d P =
(X 1 ,X 2 ,...)−1 B
$ A
A
X d P for every A ∈ J . It follows that
Corollary 1.
The convergence in (15.30) holds true in the first mean. Proof. We have X = ϕ X 1 , X 2 , . . . , and X = E X 1 |J a.s., so that E X 1 |J = ϕ X 1 , X 2 , . . . a.s. Thus, ( ( ( ( ( ( ( ( (1 n (1 n ( ( ( ( ( X j − E X 1 |J ( = E ( X j − ϕ X 1 , X 2 , . . . (( E( ( n j=1 ( ( ( n j=1 ( ( ( ( ( # ( # (
(( (1 n (1 n ( = (( X j − ϕ X 1 , X 2 , . . . (( d P = (( Xˆ j − ϕ Xˆ 1 , Xˆ 2 , . . . (( d Pˆ ( n j=1 ( n j=1 ( ( ( ( ( # ( (1 n ( ( ˆ ˆ = ( X j − X (( d Pˆ ( n j=1 (
(by (15.31)), and this tends to 0 as n → ∞ by Corollary 4 (in Section 15.5). Definition 14. The stationary process X = X 1 , X 2 , . . . is said to be ergodic, if P(A) = 0 or 1 for every A ∈ J . We then have the following corollary to Theorem 3. Corollary 2. If X = X 1 , X 2 , . . . of Theorem 3 is also ergodic, then (15.30) holds true with the right-hand side being replaced by E X 1 . $ $ $ $ A E(X 1 |J )d P = $Proof. For A ∈ J , A E(X 1 |J )d P = A X 1 d $P. For P(A) = 1, E(X 1 |J )d $ E X 1 = E X 1 d P = A E X 1 d P; and for $ P = E[E(X 1 |J )] = P(A) = 0, A E(X 1 |J )d P = 0 = A E X 1 d P. Hence E(X 1 |J ) = E X 1 a.s. In Proposition 4, it was seen that functions defined on a stationary process also produce stationary processes. The same is true with regards to ergodicity. More precisely, one has Proposition 20. Let X = X 1 , X 2 , . . . be a stationary process defined on (, A, P) and ϕ be defined as follows: ϕ : (m , B m ) → (, B), measurable, 1 ≤ m ≤ ∞. Set Yn = ϕ X n , X n+1 , . . . , X n+m−1 , n ≥ 1, and Y = Y1 , Y2 , . . . .
.
15.6 Invariant Sets and Random Variables Relative to a Process
Then (i) The process Y is stationary. (ii) If J X and JY are the invariant σ -fields associated with the processes X and Y, respectively, it follows that JY ⊆ J X . (iii) If X is ergodic, so is Y. Proof. (i) It follows from Proposition 4. (ii)
−1 B, and Let A ∈JY . Then there exists B ∈ B ∞ such that A = Y1 , Y2 , . . . actually, −1 A = Yn , Yn+1 , . . . B, n ≥ 1. Set ϕ j = ϕ(x j , x j+1 , . . . , x j+m−1 ),
j ≥ 1, and let C =
xn , xn+1 , . . . ;
(ϕn , ϕn+1 , . . .) ∈ B} ∈ B ∞ . For n ≥ 1,
−1 B = ϕ(X n , . . . , X n+m−1 ), A = Yn , Yn+1 , . . . −1 B ϕ(X n+1 , . . . , X (n+1)+m−1 ), . . . = ϕ(X n , . . . , X n+m−1 ), ϕ(X n+1 , . . . , X (n+1)+m−1 ), . . . ∈ B −1 C. = X n , X n+1 , . . . ∈ C = X n , X n+1 , . . .
Thus, A is invariant with respect to X, and hence A ∈ JX . (iii) It is immediate from (ii) and ergodicity of X.
Exercises. 1. Let (, A, P) = ([0, 1), B[0,1) , λ) where λ is the Lebesgue measure, and let the transformation T be defined by & & 1 1 1 1 , T (x) = x − , x ∈ ,1 . T (x) = x + , x ∈ 0, 2 2 2 2 Then show that T is measurable and measure-preserving. 2. Let (, A) = ([0, 1), B[0,1) ), and let the transformation T be defined by T (x) = cx, x ∈ [0, 1), where c is a constant in (0, 1). Then show that there is no probability measure P on B[0,1) such that P({x}) = 0, x ∈ [0, 1), and for which the transformation T is measure-preserving. 3. Refer to Exercise 1 and examine the transformation T from the ergodicity viewpoint. 4. Complete the proof of the converse part of Proposition 16.
345
346
CHAPTER 15 Topics from Ergodic Theory
5. Complete the proof of Lemma 2. 6. Work out the details of the proof of Proposition 18. 7. Complete the proof of Lemma 4. Hint: (i) For ω, ω ∈ with ω = ω and Z 1 (ω) = Z 1 (ω ), suppose that Z 2 (ω) = Z 2 (ω ). Use the assumptions that A2 contains the singletons of points in Z 2 (), and that A2 ⊆ A1 , in order to arrive at a contradiction. (ii) By part (i), Z is well defined, and Z 2 = Z (Z 1 ). If also Z 2 = Z (Z 1 ), then show that Z (ω1 ) = Z (ω1 ) for all ω1 ∈ Z 1 (). (iii) For D ∈ A2 , we have A1 ⊆ A2 A = Z 2−1 (D) = Z 1−1 (B) with B = Z −1 (D) ⊆ Z 1 (), and A = Z 1−1 (C) for some C ∈ A1 . Conclude that B = C ∩ Z 1 (). 8. If X n , n ≥ 1, are i.i.d. r.v.s defined on the probability space (, A, P), then (i) Show that the process X = (X 1 , X 2 , . . .) is ergodic. (ii) Derive the Strong Law of Large Numbers under the assumption that E X 1 is finite. Hint: (i) By Proposition 3, it follows that X = (X 1 , X 2 , . . .) is stationary. Let J be the σ -field of invariant sets relative to X (see Definition 11), and let T be the tail σ -field defined on X = (X 1 , X 2 , . . .) (see Definition 1 and preceding discussion in Chapter 14). Then J ⊆ T . Conclude the discussion by using Theorem 10 in Chapter 14. (ii) Refer to Corollary 2 of Theorem 3.
CHAPTER
Two Cases of Statistical Inference: Estimation of a Real-Valued Parameter, Nonparametric Estimation of a Probability Density Function
16
The objective of this chapter is to present some cases, where probabilistic results obtained in previous chapters are used for statistical inference purposes. This should enhance the interest of statisticians in the probability part of the book. There are two instances of statistical inference considered here. One is estimation of a (real-valued) parameter by means of the so-called Maximum Likelihood Estimator (or Estimate) (MLE), and the other is the nonparametric estimation of an unknown probability density function (p.d.f.) by means of the kernel method of estimation. The former problem is discussed in Sections 16.1,16.2,16.3, and the latter in Sections 16.5 and 16.6.
16.1 Construction of an Estimate of a Real-Valued Parameter As a brief introduction, let X 1 , . . . , X n be independent identically distributed (i.i.d.) observations (real-valued r.v.s) defined on the probability space (, A, Pθ ), where here θ is a real-valued parameter taking values in the parameter space , an open subset of the real line . The problem is to construct on estimator (or estimate) of θ ; i.e., a (measurable) function of the X i s taking values in . There are several ways of going about it, and one of the most popular is that of using a MLE (should such an estimate exist). That is, form the so-called likelihood function of the X i s—which in the present set-up is their joint p.d.f.—and then attempt to maximize it with respect to the parameter θ over the parameter space . Should a unique maximizer exist, to be denoted by θˆn = θˆ (X 1 , . . . , X n ), then θˆn is proclaimed to be the MLE of θ . There are several reasons as to why such an estimate is desirable, but we are not going to elaborate on it here. Instead, we are going to give a set of conditions under which one may construct a sequence of roots, θ˜n = θ˜ (X 1 , . . . , X n ), of the likelihood functions, which is a strongly consistent estimate of θ ; i.e., θ˜n → θ a.s. [Pθ ], θ ∈ . n→∞
An Introduction to Measure-Theoretic Probability, Second Edition. http://dx.doi.org/10.1016/B978-0-12-800042-7.00016-5 Copyright © 2014 Elsevier Inc. All rights reserved.
347
348
CHAPTER 16 Two Cases of Statistical Inference
Under an enhanced set of assumptions, such an estimate is also asymptotically normal, when suitably normalized. These results also hold true for the MLE, should it exist.
16.2 Construction of a Strongly Consistent Estimate of a Real-Valued Parameter Let X 1 , . . . , X n be i.i.d. r.v.s defined on the probability space (, A, Pθ ), where the probability measure Pθ depends on a real-valued parameter θ lying in the parameter space , an open subset of . Let X be a r.v. distributed as the X i s, and suppose it has a p.d.f. f (·; θ ) (a Radon-Nikodym derivative with respect to Lebesgue measure; see Remark 1, Chapter 7). We intend to give sufficient conditions under which a strongly consistent estimate of θ may be constructed, based on the likelihood function. Below, we list perhaps the most economical conditions under which this happens. Assumptions. (A1) Assume to be an open interval in . (A2) Suppose that the positivity set of f (·; θ ) is independent of θ ; i.e., S = {x ∈ ; f (x; θ ) > 0} is independent of θ . (A3) Suppose that the p.d.f. f (·; θ ) is identifiable; i.e., for every θ1 and θ2 in with θ1 = θ2 , there exists a Borel set B(⊆S) with Pθ (X ∈ B) > 0 for all θ ∈ such that f (x; θ1 ) = f (x; θ2 ) for x ∈ B. ∂ f (x; θ ) exists for all θ ∈ and all x ∈ S. (A4) Suppose that the derivative ∂θ (A5) Let θ0 be the unknown but true value of the parameter θ , and for any θ ∈ , set K (θ0 , θ ) = Eθ0 log
f (X ; θ0 ) f (X ; θ )
(16.1)
(where log stands throughout for the natural logarithm). Then assume that K (θ0 , θ ) < ∞. In the proof of the consistency theorem stated below, the following result is used which is stated here as a lemma. Lemma 1. Let K (θ0 , θ ) be defined as in (16.1). Then, under assumptions (A2) and (A3), it follows that K (θ0 , θ ) > 0. 2
Proof. (Outline). Set y = − log x(x > 0). Then dd x y2 = x −2 > 0, so that this function is convex. Then apply Jensen’s inequality (see Theorem 15, Chapter 6) with g(X ) = − log [ f (X ; θ1 )/ f (X ; θ0 )] to obtain − log Eθ0 [ f (X ; θ1 )/ f (X ; θ0 )] ≤ Eθ0 {− log[ f (X ; θ1 )/ f (X ; θ0 )]} = K (θ0 , θ1 ). However, Eθ0 [ f (X ; θ1 )/ f (X ; θ0 )] =
S
f (x; θ1 ) f (x; θ0 ) d x f (x; θ0 )
16.2 Construction of a Strongly Consistent Estimate
=
f (x; θ1 )d x = S
f (x; θ1 )d x = 1,
so that − log 1 = 0 ≤ K (θ0 , θ1 ). By (A3), it is seen that K (θ0 , θ1 ) > 0.
In all that follows, we work on the set S (see Assumption (A2)), θ0 stands for the true (but unknown) value of the parameter θ, Xn stands for the random vector (X 1 , . . . , X n ),and δ is an arbitrary positive number. The likelihood funcn f (X i ; θ ), and the log-likelihood function is n (θ |Xn ) = tion is L n (θ |Xn ) = i=1 log L n (θ |Xn ). We may now state the consistency result alluded to earlier. Theorem 1. Under assumptions (A1)–(A5), we can construct a sequence { θn }, n ≥ 1, of estimates of θ0 , where θn is a root of the log-likelihood equation ∂ n ( θn |Xn ) = ∂θ
a.s. 0 on a set of Pθ0 -probability 1, such that θn −→θ0 as n→∞ (i.e., θn −→ θ0 on a set (Pθ0 )
n→∞
of Pθ0 -probability 1). Proof. The proof consists of two parts. First, it is shown that, for δ > 0 sufficiently small, there exists a positive integer n δ such that, for n ≥ n δ and on a set of Pθ0 probability 1, ∂ n (θδ |Xn ) = 0, |θδ − θ0 | < δ, ∂θ where θδ is a (local) maximum of the log-likelihood function n (θ |Xn ). Next, suitably exploiting this result, we proceed with the construction of the θn ’s as described. To this end, set Iδ = (θ0 − δ, θ0 + δ), I¯δ = [θ0 − δ, θ0 + δ], and take δ small enough, so that I¯δ ⊂ . Also, set f (x; θ0 + δ) f (X ; θ0 + δ) log = f (x; θ0 ) d x, J+δ = −K (θ0 , θ0 +δ) = Eθ0 log f (X ; θ0 ) f (x; θ0 ) S (16.2) so that J+δ ≤ 0 (by Lemma 1), and indeed, J+δ < 0 on account of (A3). Then n 1 f (X i ; θ0 + δ) 1 n (θ0 + δ|Xn ) − n (θ0 |Xn ) = log n n f (X i ; θ0 ) i=1
n 1
f (X i ; θ0 + δ) c = −→ J+δ on a set N+δ log , n f (X i ; θ0 ) n→∞ i=1
c ) = 1 (by the SLLN, see Theorem 4 in Chapter 14). Thus, for every say, with Pθ0 (N+δ c ω ∈ N+δ , there exists n +δ = n +δ (ω) positive integer such that 1 n (θ0 + δ|Xn ) − n (θ0 |Xn ) − J+δ < − 1 J+δ for n ≥ n +δ . (16.3) n 2 c and n ≥ n . The inequality in (16.3) From this point on, we work with ω ∈ N+δ +δ implies 1 1 n (θ0 + δ|Xn ) − n (θ0 |Xn ) < J+δ , n 2
349
350
CHAPTER 16 Two Cases of Statistical Inference
or n (θ0 + δ|Xn ) < n (θ0 |Xn ) +
n J+δ , 2
and hence n (θ0 + δ|Xn ) < n (θ0 |Xn ) (since J+δ < 0).
(16.4)
At this point, refer to relation (16.2), set J−δ = −K (θ0 , θ0 − δ), and work as above in order to conclude that n (θ0 − δ|Xn ) < n (θ0 |Xn )
(16.5)
c with P (N c ) = 1 and n ≥ n for ω ∈ N−δ θ0 −δ = n −δ (ω) positive integer. Set −δ c c c (so that P (N c ) = 1), and n = max(n , n ). Then, for ω ∈ N c Nδ = N+δ ∩ N−δ θ0 δ +δ −δ δ δ and n ≥ n δ , both relations (16.4) and (16.5) hold. Now, look at n (θ |Xn ) as a function of θ ∈ I¯δ (always with ω ∈ Nδc and n ≥ n δ ). By (A4), n (θ |Xn ) is continuous in θ (by differentiability). Then n (θ |Xn ) attains (at least) one (local) maximum; i.e., there is θδ = θδ (Xn ) with θδ ∈ Iδ (open interval, not closed I¯δ ), because of inequalities (16.4) and (16.5). So, n (θδ |Xn ) is a (local) maximum of n (θ |Xn ), θ ∈ Iδ , and of course, |θδ − θ0 | < δ. On the other hand, any local maxima are roots of the log-likelihood ∂ n (θ |Xn ) = 0 (because of the differentiability of n (θ |Xn )). Thus we equation ∂θ have ∂ n (θδ |Xn ) = 0, |θδ − θ | < δ (16.6) ∂θ (on the set Nδc with Pθ0 (Nδc ) = 1, and for n ≥ n δ ). We now embark on the construction of the desired sequence { θn }, n ≥ 1. To this end, for each k = 1, 2, . . ., select (0 <)δk < δ with δk ↓0, and consider the respective intervals Iδk , I¯δk , as well as the sets Nδck with Pθ0 (Nδck ) = 1 for all k. Set c c N c = ∩∞ k=1 Nδk , so that Pθ0 (N ) = 1 (see also Exercise 4, Chapter 2), and from this c point on, work with ω ∈ N . On the basis of the arguments in the first part of this proof, for each k ≥ 1, there exists n δk (ω) = n δk = n k positive integer, such that for n ≥ n k , there exists (at least) one (local) maximum θδk (Xn ) = θδk in the interval (θ0 − δk , θ0 + δk ) with the properties (see also relations in (16.6)),
∂ n θδk |Xn = 0, θδk − θ0 < δk . (16.7) ∂θ Consider the subsequence {n k }⊆{n}, and without loss of generality, we may assume that n 1 < n 2 < · · · ↑∞. Next, for each n = 1, 2, . . ., apply the following rule: Look θn = θδi . Then at n i < n i+1 , i ≥ 1, and for all those ns with n i ≤ n < n i+1 , take
∂ ∂ θn − θ0 = θδi − θ0 < δi . n n θδi |Xn = 0, θn |Xn = (16.8) ∂θ ∂θ Summarizing the results and taking into consideration relation (16.8), we have then θn }, n ≥ 1, of that, for ω ∈ N c with Pθ0 (N c ) = 1, there exists a sequence of roots, { θn →θ0 a.s. the log-likelihood equation such that θn →θ0 as n→∞. In other words, (with respect to Pθ0 -probability), as was to be seen. To the above theorem, we have the following corollary.
16.3 Some Preliminary Results
Suppose that for all sufficiently large n the MLE θn is the unique root of a.s. the log-likelihood equation. Then, under assumptions (A1)–(A5), θn −→θ0 as n→∞.
Corollary.
(Pθ0 )
Proof. All arguments employed in the proof of the theorem apply when θn is replaced by θn . In particular, relation (16.8) becomes, on a set of Pθ0 -probability 1,
∂ ∂ θn − θ0 = θδi − θ0 < δi . θn |Xn = n n θδi |Xn = 0, ∂θ ∂θ Given that δi →0 as i→∞, the proof follows.
Remark 1. If the parameter space is r -dimensional (r ≥ 1), then in assumption (A4) the derivative is replaced by the partial derivatives. The proof of Lemma 1 is independent of the dimensionality of . The absolute value in is replaced by the usual Euclidean distance, denote it by | · |, and all arguments in the proof go through. a.s. a.s. a.s. θn − θ0 |−→0, becomes | θn − θ0 |−→0 Then the conclusion θn −→θ0 , equivalently, | (Pθ0 )
(Pθ0 )
(Pθ0 )
in the new setting, as n→∞. In other words, the conclusion of the theorem still holds true.
16.3 Some Preliminary Results The lemma to be established below is employed in the proof of a certain uniform version of the SLLN, as well as in the proof of the asymptotic normality of the MLE. The lemma is formulated in a way to cover the case of random vectors, and the case that the parameter θ is multidimensional. Lemma 2. Let T and X be subsets of finite-dimensional Euclidean spaces, let ϕ be a real-valued function defined on the product space X × T , and let X be a random vector defined on the probability space (, A, P) and taking values in X . Make the following assumptions: (i) T is compact. (ii) For every x ∈ X , ϕ(x, t) is continuous in t (and measurable in x for every t ∈ T ). (iii) |ϕ(x, t)| ≤ h(x), t ∈ T , for some nonnegative (measurable) function h defined on X with Eh(X ) < ∞. Let X 1 , . . . , X n be independent random vectors distributed as X , and let μ(t) = Eϕ(X , t). Then, with probability 1, ⎧ ⎤⎫ ⎡ n ⎬ ⎨
1 limsup sup ⎣ ϕ(X j , t)⎦ ≤ sup μ(t). ⎭ t∈T n n→∞ ⎩ t∈T j=1
Proof. For some (small) ρ > 0, set ψ(x, t, ρ) = sup|t −t|<ρ ϕ(x, t ), where | · | is the usual distance in the Euclidean space X . Then ψ(x, t, ρ)↓ϕ(x, t) as ρ↓0, by
351
352
CHAPTER 16 Two Cases of Statistical Inference
the assumed continuity in t of ϕ(x, t). All arguments below hold for all elements of T . Thus, ϕ(x, t ) ≤ h(x) implies sup|t −t|<ρ ϕ(x, t ) ≤ h(x) or ψ(x, t, ρ) ≤ h(x), equivalently, −h(x) ≤ −ψ(x, t, ρ) and −ψ(x, t, ρ)↑ − ϕ(x, t) as ρ↓0. Replacing x by X , we have then −h(X ) ≤ −ψ(X , t, ρ) and −ψ(X , t, ρ)↑ − ϕ(X , t) as ρ↓0. Then E[−ψ(X , t, ρ)]↑E[−ϕ(X , t)], or equivalently, Eψ(X , t, ρ)↓Eϕ(X , t) as ρ↓0. (This is so by Corollary 2, Chapter 5) Or, Eψ(X , t, ρ)↓μ(t) as ρ↓0. Thus, for an arbitrary (small) ε > 0 and each t ∈ T , there exists a ρt > 0 sufficiently small, such that E ψ(X , t, ρt ) < μ(t) + ε. The collection of (balls) {St = {t ; |t − t| < ρt }, t ∈ T }, clearly, cover T . Then, by assumption (i), there is a finite number (of these balls) Sti , i = 1, . . . , m whose union is T . (This is so by the Heine–Borel covering theorem.) Now, it is clear that for each t ∈ T , there is at least one i = 1, . . . , m, such that t ∈ Sti . Then, from the definition of ψ(x, t, ρ), it follows that ϕ(x, t) ≤ ψ(x, ti , ρti ), t ∈ Sti . Replacing x by X j , we get ϕ(X j , t) ≤ ψ(X j , ti , ρti ), t ∈ Sti , j = 1, . . . , n, and hence
n n 1
1
ϕ(X j , t) ≤ ψ(X j , ti , ρti ), t ∈ Sti . n n j=1
j=1
It follows that ⎤ ⎤ ⎡ n n
1 1 sup ⎣ ϕ(X j , t)⎦ ≤ sup ⎣ ψ(X j , ti , ρti )⎦ n n t∈T t∈T j=1 j=1 ⎤ ⎡ n
1 = max ⎣ ψ(X j , ti , ρti )⎦ . 1≤i≤m n ⎡
j=1
However, by the SLLN, n 1
a.s. ψ(X j , ti , ρti ) −→ Eψ(X , ti , ρti ), i = 1, . . . , m, n→∞ n j=1
and hence ⎤ n
1 a.s. max ⎣ ψ(X j , ti , ρti )⎦ −→ max Eψ(X , ti , ρti ). n→∞ 1≤i≤m 1≤i≤m n ⎡
j=1
On the other hand, for sufficiently large n, E ψ(X , ti , ρti ) < μ(ti ) + ε, i = 1, . . . , m,
16.3 Some Preliminary Results
so that max Eψ(X , ti , ρti ) ≤ max [μ(ti ) + ε]
1≤i≤m
1≤i≤m
= max μ(ti ) + ε ≤ sup μ(t) + ε. 1≤i≤m
t∈T
Hence, for sufficiently large n, ⎤ ⎡ n
1 ψ(X j , ti , ρti )⎦ ≤ sup μ(t) + ε a.s., max ⎣ 1≤i≤m n t∈T j=1
and then
⎤ n
1 ϕ(X j , t)⎦ ≤ sup μ(t) + ε a.s. sup ⎣ n t∈T t∈T ⎡
j=1
Taking the limsup as n→∞, we get ⎧ ⎤⎫ ⎡ n ⎨ ⎬
1 limsup sup ⎣ ϕ(X j , t)⎦ ≤ sup μ(t) + ε a.s. ⎩ t∈T n ⎭ t∈T j=1
Finally, letting ε→0, we obtain ⎧ ⎤⎫ ⎡ n ⎨ ⎬
1 limsup sup ⎣ ϕ(X j , t)⎦ ≤ sup μ(t) a.s. ⎩ t∈T n ⎭ t∈T
j=1
The result below is a uniform version of the SLLN, which, of course, is interesting on its own right; it will also be employed in establishing asymptotic normality of the strongly consistent estimates of Theorem 1 and of the MLE, should it exist. Proposition 1. there, it holds
In the notation of Lemma 2 and under the assumptions (i)–(iii) made ⎤ n 1
⎣ lim ϕ(X j , t) − μ(t)⎦ = 0 a.s. sup n→∞ t∈T n j=1
(Thus, as n→∞, n1
⎡
n
j=1 ϕ(X j , t)
a.s.
−→ μ(t) uniformly in t ∈ T .)
Proof. In the first place, the function μ(t) = Eϕ(X , t) is continuous. Indeed, ϕ(X , t ∗ ) → ϕ(X , t ∗ ) as t ∗ → t (by assumption (ii)), and |ϕ(X , t ∗ )| ≤ h(X ), independent of t ∗ , with Eh(X ) < ∞ (by assumption (iii)). It follows that μ(t ∗ ) = Eϕ(X , t ∗ ) −→ Eϕ(X , t) = μ(t). ∗ t →t
353
354
CHAPTER 16 Two Cases of Statistical Inference
(This is so by the Dominated Convergence Theorem, Theorem 3, Chapter 5.) Next, suppose that the proposition is true when μ(t) = 0; i.e., we suppose that ⎤ ⎡ n 1
⎣ ϕ(X j , t)⎦ = 0 a.s., sup lim n→∞ t∈T n j=1 de f
and replace ϕ(x, t) by ψ(x, t) = ϕ(x, t) − μ(t). Clearly, ψ(x, t) satisfies assumption (ii), and |ψ(x, t)| = |ϕ(x, t) − μ(t)| ≤ |ϕ(x, t)| + |μ(t)| ≤ h(x) + Eh(X ), since |ϕ(X , t)| ≤ h(X ) is equivalent to −h(X ) ≤ ϕ(X , t) ≤ h(X ) and hence −Eh(X ) ≤ Eϕ(X , t) ≤ Eh(X ), or |Eϕ(X , t)| ≤ Eh(X ), equivalently, |μ(t)| ≤ Eh(X ). So ψ(x, t) also satisfies assumption (iii) (with h(x) replaced by h(x) + Eh(X )). Then ⎤ ⎡
1 n ⎣ ψ(X j , t)⎦ = 0 a.s., sup lim n→∞ t∈T n j=1
or
⎫ n 1 ⎬ lim ϕ(X j , t) − μ(t) sup n→∞ ⎩ t∈T n ⎭ j=1 ⎤ ⎡ n 1
⎣ = lim ϕ(X j , t) − μ(t)⎦ = 0 a.s. sup n→∞ n t∈T ⎧ ⎨
(16.9)
j=1
In the sequel, we work with ψ(x, t) and apply Lemma 2 to get ⎧ ⎤⎫ ⎡ n ⎬ ⎨
1 limsup sup ⎣ ψ(X j , t)⎦ ≤ 0 a.s. ⎭ n n→∞ ⎩ t∈T
(16.10)
j=1
Clearly, the function −ψ(x, t) also satisfies assumptions (ii)–(iii) (and E[−ψ(X , t)] = 0). Then Lemma 2 yields ⎧ ⎤⎫ ⎡ n ⎬ ⎨
1 ψ(X j , t)⎦ ≤ 0 a.s. (16.11) limsup sup ⎣− ⎭ n n→∞ ⎩ t∈T j=1
On the basis of the last two conclusions and Remark 2 below, we have ⎫ ⎧ ⎬ n ⎨ 1
ψ(X j , t) 0 ≤ limsup sup ⎩ n→∞ t∈T n j=1 ⎭
16.4 Asymptotic Normality of the Strongly Consistent Estimate
⎧ ⎤⎫ ⎤⎫ ⎡ n n ⎬ ⎬ ⎨
1 1 ≤ limsup sup ⎣ ψ(X j , t)⎦ + limsup sup ⎣− ψ(X j , t)⎦ ⎭ ⎭ n n n→∞ ⎩ t∈T n→∞ ⎩ t∈T ⎧ ⎨
⎡
j=1
j=1
≤ 0 a.s.
(16.12) Therefore the limn→∞ supt∈T n1 nj=1 ψ(X j , t) exists with probability 1 and equals 0. Reverting to the function ϕ(x, t), we obtain the desired result, as is explained in relation (16.9). The completion of the proof requires justification of the second inequality on the right-hand side in relation (16.12). This is done below.
Remark 2.
For any functions gn (t), t ∈ T , n ≥ 1, the following are true: sup |gn (t)| = max sup gn (t), sup[−gn (t)] , t∈T
t∈T
t∈T
and limsup sup |gn (t)| ≤ limsup sup gn (t) + limsup sup[−gn (t)] . n→∞
t∈T
n→∞
n→∞
t∈T
t∈T
The first assertion is immediate. As for the second, we have: Set αn = sup gn (t), βn = sup[−gn (t)], t∈T
t∈T
and α = limsup αn , β = limsup βn . Then, from the definition of the limsup, we n→∞
n→∞
n→∞
have that, for every ε > 0 and all sufficiently large n, αn ≤ α + ε, βn ≤ β + ε. Hence max{αn , βn } ≤ (α + β) + ε, and therefore limsup [max{αn , βn }] ≤ (α + β) + ε n→∞ = limsup αn + limsup βn + ε. n→∞
n→∞
Letting ε→0, we get the desired result.
16.4 Asymptotic Normality of the Strongly Consistent Estimate The objective here is to establish asymptotic normality for the strongly consistent estimate constructed in Theorem 1 and of the MLE, should it exist, under suitable conditions. For this purpose, we review the relevant notation and list conditions under which this result may be established. To this end, let X be a r.v. defined on the probability space (, A, Pθ ), where the parameter θ belongs in the parameter space .
355
356
CHAPTER 16 Two Cases of Statistical Inference
Assumptions. (B1) The parameter space is an open subset of . (B2) The set S = {x ∈ ; f (x; θ ) > 0} is independent of θ . (B3) The p.d.f. f (·; θ ) is identifiable; i.e., for every θ1 and θ2 in with θ1 = θ2 , there exists a set B(⊆S) with Pθ (X ∈ B) > 0 for all θ ∈ such that f (x; θ1 ) = f (x; θ2 ) for x ∈ B. ∂ f (x; θ ) exists for all θ and all x ∈ S and (B4) The derivative ∂θ ∂ ∂ f (x; θ )d x. f (x; θ )d x = ∂θ S ∂θ S (B5) Let θ0 be the (unknown) true value of the parameter θ , and set f (X ; θ0 ) , θ ∈ . K (θ0 , θ ) = Eθ0 log f (X ; θ ) Then it is assumed that 0 < K (θ0 , θ ) < ∞. ∂2 (B6) The derivative ∂θ 2 f (x; θ ) exists for all θ and all x ∈ S, is continuous in θ , and ∂2 ∂θ 2
f (x; θ )d x = S
S
(B7) Set I (θ0 ) = Eθ0
∂2 f (x; θ )d x. ∂θ 2
!2 ∂ log f (X ; θ ) ∂θ θ=θ0
for the Fisher information number. Then it is assumed that 0 < I (θ0 ) < ∞. (B8) For each θ ∈ , there is a compact 2 neighborhood N (θ ) with θ belonging in the ∂ interior of N (θ ), such that ∂θ 2 log f (x; θ ) ≤ H (x) for all θ in N (θ ) and all x ∈ S, for some bounding (measurable) function H with Eθ H (X ) < ∞. Notice that (B1)–(B3) are the same as (A1)–(A3), and (B4) is a strengthening of (A4). It has already been seen (see Theorem 1) that, under assumptions (B1)–(B5), there is at least one root θn of the likelihood equation, with probability 1, so that a.s. θn −→ θ0 as n→∞. Here it will be shown that this sequence is also asymptotically (Pθ0 )
normal. Namely, θn } be Theorem 2. Let θ0 be the (unknown) true value of the parameter θ , and let { a sequence as in Theorem 1. Then, under assumptions (B1)–(B8),
d √ n θn − θ0 −→ N (0, 1/I (θ0 )), as n→∞. (Pθ0 )
The following results, stated as lemmas, are used on many occasions, and either directly or indirectly in the proof of Theorem 2.
16.4 Asymptotic Normality of the Strongly Consistent Estimate
Lemma 3. Let X be a r.v. defined on the probability space (, A, Pθ ), θ ∈ , with p.d.f. f (·; θ ). Then: (i) Under assumptions (B1), (B2), and (B4), ∂ Eθ log f (X ; θ ) = 0. ∂θ (ii) Under assumptions (B1), (B2), and (B6),
2 2 ∂ ∂ log f (x; θ ) f (x; θ ) d x = Eθ log f (X ; θ ) I (θ ) = ∂θ ∂θ ∂ log f (X ; θ ) (by part (i)) = Varθ ∂θ 2 ∂ = −Eθ log f (X ; θ ) . ∂θ 2
(iii) Under assumptions (B1), (B2), (B4), and (B6), n 1 ∂ d log f (X j ; θ )−→ N (0, I (θ )), as n→∞, √ (Pθ0 ) ∂θ n j=1
where X 1 , . . . , X n are independent r.v.s distributed as the r.v. X . Proof. (i) We have f (x; θ ) d x, so that by differentiation, 1= S d ∂ f (x; θ )d x (by Theorem 4, Chapter 5) 0= f (x; θ ) d x = dθ S S ∂θ " ∂ f (x; θ ) = f (x; θ ) f (x; θ )d x ∂θ S ∂ ∂ log f (x; θ ) f (x; θ ) d x = Eθ log f (X ; θ ) ; i.e., = ∂θ S ∂θ ∂ Eθ ∂θ log f (X ; θ ) = 0, as was to be seen. (ii) Once again, f (x; θ ) d x, and hence 1= S
d2 0= dθ 2
f (x; θ ) d x = S
S
∂2 f (x; θ )d x (by Theorem 4, Chapter 5) ∂θ 2
357
358
CHAPTER 16 Two Cases of Statistical Inference
2 ∂ ∂2 log f (x; θ ) f (x; θ )d x − f (x; θ ) d x 2 S ∂θ S ∂θ 2 ∂ + log f (x; θ ) f (x; θ ) d x S ∂θ ⎧ ∂ 2 ⎫ ⎨ ∂2 f (x; θ ) f (x; θ ) ⎬ ∂θ 2 − ∂θ 2 = f (x; θ ) d x + I (θ ) f (x; θ ) ⎭ S ⎩ f (x; θ ) 2 ∂ 2 ∂ f (x; θ ) f (x; θ ) − ∂θ f (x; θ ) ∂θ 2 f (x; θ ) d x + I (θ ) = f 2 (x; θ ) S # !$ ∂ f (x; θ ) ∂ ∂θ = f (x; θ ) d x + I (θ ) ∂θ f (x; θ) S ∂ ∂ log f (x; θ ) f (x; θ ) d x + I (θ ) = S ∂θ ∂θ 2 ∂ = log f (x; θ ) f (x; θ ) d x + I (θ ) 2 S ∂θ 2 ∂ = Eθ log f (X ; θ ) + I (θ ), so that ∂θ 2 2 ∂ I (θ ) = −Eθ log f (X ; θ ) , as was to be seen. ∂θ 2
=
(iii) By parts (i) and (ii), ∂ ∂ log f (X ; θ ) = 0, Varθ log f (X ; θ ) = I (θ ). Eθ ∂θ ∂θ Then the CLT (Corollary to Theorem 2, Chapter 12) yields
∂ 1 d log f (X j ; θ ) −→ N (0, 1), n→∞ n I (θ ) j=1 ∂θ n
√
and hence (by Theorem 4 (ii), Chapter 5), n 1 ∂ d log f (X j ; θ ) −→ N (0, I (θ )). √ n→∞ ∂θ n j=1
Lemma 4.
Set
ψ(X ; θ ) =
∂ ∂2 ˙ ; θ) = log f (X ; θ ), and ψ(X log f (X ; θ ). ∂θ ∂θ 2
16.4 Asymptotic Normality of the Strongly Consistent Estimate
Also, let
1
Bn = − 0
n 1 ψ˙ X j ; θ0 + λ( θn − θ0 ) dλ. n
(16.13)
j=1
a.s.
Then, under assumptions (B1)–(B3), (B4), (B6)–(B8), it holds Bn −→ I (θ0 ), as n→∞. (Pθ0 )
Proof.
By assumption (B8), ψ(X ˙ ; θ ) ≤ H (X ), θ ∈ N (θ0 ), Eθ0 H (X ) < ∞,
˙ ˙ ; θ )→ψ(X ˙ ; θ0 ) as whereas by assumption (B6) (continuity of ψ(x; θ ) in θ ), ψ(X θ →θ0 . Then ˙ ; θ ) −→ Eθ0 ψ(X ˙ ; θ0 ) (= −I (θ0 ), by Lemma 3(ii)). Eθ0 ψ(X θ→θ0
(16.14)
(This is so by the Dominated Convergence Theorem, Theorem 3, Chapter 5.) For some ρ > 0, set Sρ = {θ ∈ ; |θ − θ0 | ≤ ρ}, and in Lemma 2 identify T with ˙ ; θ ). Then assumptions(i)–(iii) in the ˙ θ ), and μ(t) with Eθ0 ψ(X Sρ , ϕ(x, t) with ψ(x; lemma just cited are satisfied. Under these assumptions, Proposition 1 applies and yields n a.s. 1
˙ ˙ ψ(X j ; θ ) − E0 ψ(X ; θ ) −→0 as n→∞. sup θ∈Sρ n j=1 (Pθ0 ) This means that for every ω in an event E 1 with Pθ0 (E 1 ) = 1 the last convergence holds (pointwise) when the r.v.s X j are evaluated at ω; i.e.,
1 n ˙ ˙ ψ[X j (ω); θ ] − Eθ0 ψ(X ; θ ) −→0 as n→∞. sup θ∈Sρ n j=1 Hence, for every ε > 0 there exists N1 = N1 (ε, ω) > 0 integer, such that
1 n ε ˙ j (ω); θ ] − Eθ0 ψ(X ˙ ; θ ) < , n ≥ N 1 . sup ψ[X 2 θ∈Sρ n j=1
(16.15)
a.s. On the other hand, by Theorem 1, θn −→ θ0 as n→∞, which implies that, for every (Pθ0 )
ω in an event E 2 with Pθ0 (E 2 ) = 1, θn −→ θ0 . Thus, for ω ∈ E 2 and ρ > 0 as n→∞
above there exists N2 = N2 (ρ, ω) > 0 integer such that | θn (ω) − θ0 | < ρ, n ≥ N2 .
(16.16)
359
360
CHAPTER 16 Two Cases of Statistical Inference
Set N0 = N0 (ε, ρ, ω) = max{N1 , N2 } and restrict ω in E 1 ∩ E 2 , call it E 0 (with Pθ0 (E 0 ) = 1). Then both inequalities (16.15) and (16.16) hold, provided n ≥ N0 ; i.e.,
ε 1 n ˙ ˙ ψ[X j (ω); θ ] − Eθ0 ψ(X ; θ ) < and | θn (ω) − θ0 | < ρ, n ≥ N0 . sup θ∈Sρ n j=1 2 (16.17) From this point on, we work with ω in E 0 (with Pθ0 (E 0 ) = 1) and n ≥ N0 . Also, for notational convenience, omit the evaluation of r.v.s at ω. We have then 1
n 1 ˙ ψ X j ; θ0 + λ(θn − θ0 ) dλ − I (θ0 ) |Bn (θ0 ) − I (θ0 )| = − 0 n j=1 1
n 1 = θn − θ0 ) dλ + I (θ0 ) ψ˙ X j ; θ0 + λ( 0 n j=1 1
n 1 ˙ ; θ0 ) = θn − θ0 ) dλ − Eθ0 ψ(X ψ˙ X j ; θ0 + λ( 0 n j=1 1 n 1
˙ ; θ ) ≤ ψ˙ X j ; θ0 + λ( θn − θ0 ) dλ − Eθ0 ψ(X 0 n j=1 ˙ ; θ ) − Eθ0 ψ(X ˙ ; θ0 ) + Eθ0 ψ(X ⎫ ⎧ 1 ⎨ n ⎬
1 ˙ ; θ ) dλ = ψ˙ X j ; θ0 + λ( θn − θ0 ) − Eθ0 ψ(X ⎭ 0 ⎩ n j=1 ˙ ; θ ) − Eθ0 ψ(X ˙ ; θ0 ) . + Eθ0 ψ(X (16.18) But on E 0 and for n ≥ N0 , [θ0 + λ( θn − θ0 )| θn − θ0 )] − θ0 = |λ( ≤ |θn − θ0 | (since 0 ≤ λ ≤ 1) < ρ (by (16.17)), so that θ0 + λ( θn − θ0 ) (= θ (ω)), call it θ ∗ , lies in Sρ . Also, on the right-hand side of (16.18), replace the arbitrary θ by θ ∗ . We have then ⎤ ⎡ 1 n
⎣1 ˙ j ; θ ∗ ) − Eθ0 ψ(X ˙ ; θ ∗ )⎦ dλ ψ(X |Bn (θ0 ) − I (θ0 )| ≤ n 0 j=1 ∗ ˙ ; θ ) − Eθ0 ψ(X ˙ ; θ0 ) . + Eθ0 ψ(X (16.19)
16.4 Asymptotic Normality of the Strongly Consistent Estimate
However, by (16.17),
n ε 1
˙ j ; θ ∗ ) − Eθ0 ψ(X ˙ ; θ ∗ ) < ψ(X 2 n j=1
as long as θ ∗ stays in Sρ (which it does) and for n ≥ N0 , which implies that ⎤ ⎡ 1 n
1 ∗ ∗ ⎣ ˙ j ; θ ) − Eθ0 ψ(X ˙ ; θ )⎦ dλ ψ(X n 0 j=1 ⎡ ⎤ 1 n ε 1
∗ ∗ ⎣ ˙ j ; θ ) − Eθ0 ψ(X ˙ ; θ )⎦ dλ < . ≤ ψ(X (16.20) 2 n 0 j=1
Furthermore, by (16.14), for ε > 0 there exists δ(ε) > 0 such that ε Eθ ψ(X ˙ ; θ ∗ ) − Eθ0 ψ(X ˙ ; θ0 ) < , (16.21) 0 2 provided |θ ∗ −θ0 | < δ(ε). This will be, indeed, the case if ρ is chosen to be < δ(ε). So, for every ε > 0, choose ρ < δ(ε), so that (16.21) is satisfied. Inequality (16.20) is also satisfied for ω in E 0 (with Pθ0 (E 0 ) = 1) and n ≥ N0 . Combining relations (16.18), (16.20), and (16.21), we obtain |Bn (θ0 )− I (θ0 )| < ε on the event E 0 (with Pθ0 (E 0 ) = 1), provided n ≥ N0 . This is equivalent to saying that Bn (θ0 ) −→ I (θ0 ) as n→∞. (Pθ0 )
The remark below will be used in the proof of Theorem 2. Remark 3. When expanding a function according to Taylor’s formula, the following form of a remainder often proves convenient. To this effect, let g be a real-valued function defined on , and assume it has a continuous derivative to be denoted by g. ˙ Then 1
g(x + t) = g(x) + t
g(x ˙ + λt)dλ.
0
A similar expression holds when we assume the existence of higher-order derivatives, as well as when g is defined on k (k ≥ 2). Such formulas are exhibited in relation (8.14.3), page 186, in Dieudonné (1960). Proof of Theorem 2. For the independent r.v.s X 1 , . . . , X n with p.d.f. f (·; θ ), consider the likelihood function L n (θ |Xn ) = nj=1 f (X j ; θ ), where Xn = (X 1 , . . . , X n ), and set n
log f (X j ; θ ). n (θ )(= n (θ |Xn )) = log L n (θ |Xn ) = j=1
Then, by the notation introduced in Lemma 4,
∂ ψ(X j ; θ ), ˙n (θ ) = n (θ ) = ∂θ n
j=1
361
362
CHAPTER 16 Two Cases of Statistical Inference
and hence
∂ ˙ ˙ j ; θ ). ψ(X ¨n (θ ) = n (θ ) = ∂θ n
j=1
Consider ˙n (θ ) and, by using Remark 3 with g(x) replaced by ˙n (θ ), expand it to obtain 1 ¨n [θ0 + λ(θ − θ0 )] dλ ˙n (θ ) = ˙n (θ0 ) + (θ − θ0 ) 0 ⎫ ⎧ 1 ⎨
n ⎬ ˙ j ; θ0 + λ(θ − θ0 )] dλ. (16.22) ψ[X = ˙n (θ0 ) + (θ − θ0 ) ⎭ 0 ⎩ j=1
˙ In (16.22), replace √ θ by θn , recall that n (θn ) = 0 (with Pθ0 -probability 1), and divide both sides by n to get ⎫ ⎧ 1⎨ n ⎬ √ 1 ˙ 1
˙ j ; θ0 + λ( θn − θ0 ) θn − θ0 )] dλ − ψ[X √ n (θ0 ) = n( ⎭ n 0 ⎩ n j=1 √ = Bn (θ0 ) × n( θn − θ0 ) (by the definition of Bn (θ0 ) in (16.13)). a.s.
By Lemma 4, Bn (θ0 )−→ I (θ0 ) as n→∞. Therefore, with Pθ0 -probability 1 and suffi(Pθ0 )
a.s.
ciently large n, Bn−1 (θ0 ) exists and, clearly, Bn−1 (θ0 )−→ I −1 (θ0 ). Also, by Lemma 3 (Pθ0 )
(iii),
d √1 ˙n (θ0 ) −→ n (Pθ0 )
N (0, I (θ0 )). Then Slutsky’s theorem (Theorem 8(ii), Chapter 8)
implies that 1 1 d × N (0, I (θ0 )) = N (0, I −1 (θ0 )). Bn−1 (θ0 ) × √ ˙n (θ0 ) −→ (Pθ0 ) I (θ0 ) n √ √ From √1n ˙n (θ0 ) = Bn (θ0 )× n( θn −θ0 ), we get n( θn −θ0 ) = Bn−1 (θ0 )× √1n ˙n (θ0 ) (with Pθ0 -probability 1), so that, as n→∞, √
d n( θn − θ0 )−→ N (0, I −1 (θ0 )), (Pθ0 )
as was to be seen. To this theorem, there is the following corollary.
Corollary. Suppose that for all sufficiently large n and with Pθ0 -probability 1 there exists a unique MLE θn of θ0 . Then, under assumptions (B1)–(B8) and as n→∞, we have √ d n( θn − θ0 ) −→ N (0, I −1 (θ0 )). (Pθ0 )
16.4 Asymptotic Normality of the Strongly Consistent Estimate
Proof. It is immediate, since all arguments used in the proofs apply when θn is replaced by θn (see also Corollary to Theorem 1). Remark 4. When the parameter space is r -dimensional, assumptions (B1), (B4), (B6)–(B8) have got to be modified suitably. The proofs of Lemma 2 and Proposition 1 are independent of the dimensionality of T . However, the formulations of Theorem 2 and of Lemmas 3 and 4 must be modified, as well as their proofs. The conclusion of the theorem, nevertheless, remains valid, properly interpreted. Most of the relevant derivations for the multidimensional parameter case, both for Theorems 1 and 2, can be found in Theorem 17, page 114, and Theorem 18, page 121, in Ferguson (1996). In reference to Theorem 1, it should be pointed out that the assumptions made there are sufficient but not necessary—albeit economical—for the theorem to hold. The following example illustrates the point. Example 1. Let X ∼ U (0, θ ). Then assumptions (A2) and (A4) are not satisfied. On θn is given by θn = X (n) , the basis of a random sample of size n, X 1 , . . . , X n , the MLE θn − θ ). Then its and its p.d.f. is given by gn (t; θ ) = θnn t n−1 , 0 < t < θ . Set Yn = n(
n−1 p.d.f. is f Yn (y; θ ) = 1n θ + y , −nθ < y < 0, and the p.d.f. of Z n = θn − θ is f Z n (t; θ ) =
n θ n (θ
θ
n
+ t)n−1 , −θ < t < 0. It follows that, for every ε > 0,
Pθ (| θn − θ | > ε) = Pθ ( θn − θ > ε or θn − θ < −ε) = Pθ (θn − θ > ε) + Pθ ( θn − θ < −ε) = Pθ ( θn − θ < −ε) −ε −ε n n 1 n−1 n = n (θ + t) dt = n × (θ + t) θ −θ θ n −θ &n % 1 ε = n (θ − ε)n = 1 − . θ θ
∞ ε n By choosing (0 <)ε < θ , so that 0 < 1 − θε < 1, we get = n=1 1 − θ a.s. 1− θε θ−ε ε = (finite). It can be argued that this is a sufficient condition for θn −→ θ . 1− 1− θ
ε
(Pθ )
(This follows by combining the results in Exercise 4, Chapter 2, and Exercises 3, and 4 (i), Chapter 3.) So, in this case the conclusion of the theorem holds, while some of its assumptions fail to be satisfied. The same example also makes a case for Theorem 2. We have, for −nθ < y < 0, y &n−1 1 % f Yn (y; θ ) = n θ + θ n ' ( y n−1 n−1 1 1 n × θ = −→ e y/θ , y < 0, 1+ n→∞ θ θ n−1 which is the p.d.f. of the Negative Exponential distribution with parameter 1/θ , call it f Y (·; θ ). From the fact that f Yn (y; θ ) −→ f Y (y; θ ), y < 0, and by means of the n→∞
363
364
CHAPTER 16 Two Cases of Statistical Inference
Dominated Convergence Theorem (Theorem 3, Chapter 5), it follows that y y FYn (y) = f Yn (t; θ ) dt −→ f Y (t; θ ) dt, y < 0; −∞
d i.e., n( θn − θ )−→Y (Pθ )
n→∞ −∞
∼ Negative Exponential with parameter
1 θ,
and hence
√ d n( θn − θ )−→0. Thus, in this case, the conclusion of Theorem 2 fails to hold. (Pθ )
16.5 Nonparametric Estimation of a Probability Density Function The problem we are faced with here is the following: we are given n i.i.d. r.v.s X 1 , . . . , X n with p.d.f. f (of the continuous type), for which very little is known, and we are asked to construct a nonparametric estimate fˆ n (x) of f (x), for each x ∈ , based on the random sample X 1 , . . . , X n . The approach to be used here is the so-called kernel-estimation approach. According to this method, we select a (known) p.d.f. (of the continuous type) to be denoted by K and to be termed a kernel, subject to some rather minor requirements. Also, we choose a sequence of positive numbers, denoted by {h n }, which has the property that h n →0 as n→∞ and satisfies some additional requirements. The numbers h n , n ≥ 1, are referred to as bandwidths for a reason to be seen below (see Example 2). Then, on the basis of the random sample X 1 , . . . , X n , the kernel K , and the bandwidths h n , n ≥ 1, the proposed estimate of f (x) is fˆ n (x) given by: n 1
x − Xi . (16.23) K fˆ n (x) = nh n hn i=1
Remark 5. In the spirit of motivation for using the estimate in (16.23), observe first & % that h1n K x−y is a p.d.f. as a function of y for fixed x. Indeed, hn ∞ −∞ 1 1 x−y x−y dy = K K (t)(−h n )dt by setting =t hn hn hn −∞ h n ∞ ∞ K (t)dt = 1. = −∞
Next, evaluate
1 hn
% K
x−y hn
&
at y = X i , i = 1, . . . , n, and then form the average of
these values to produce fˆn (x). A further motivation for the proposed estimate is the following. Let F be the d.f. of the X i s, and let Fn be the empirical d.f. based on X 1 , . . . , X n , so that, for fixed x, Fn (y) takes on the value n1 at each one of the points y = X i , i = 1, . . . , n. Then & % weigh h1n K x−y by n1 and sum up from 1 to n (which is the same as integrating hn & % x−y 1 with respect to Fn ) obtain fˆn (x) again. hn K hn
16.5 Nonparametric Estimation of a Probability Density Function
Example 2. Construct the kernel estimate of f (x), for each x ∈ , by using the U (−1, 1) kernel; i.e., by taking K (x) =
1 , for −1 ≤ x ≤ 1, and 0, otherwise. 2
Here, it is convenient to use the indicator notation; namely, K (x) = I[−1,1] (x) (where, it is recalled, I A (x) = 1 if x ∈ A, and 0 if x ∈ Ac ). Then the estimate (16.23) becomes as follows: n x − Xi 1
, x ∈ . (16.24) fˆ n (x) = I[−1,1] nh n hn i=1
i So, I[−1,1] ( x−X h n ) = 1, if and only if x −h n ≤ X i ≤ x +h n ; in other words, in forming fˆ n (x), we use only those observations X i which lie in the window [x − h n , x + h n ]. The breadth of this window is, clearly, determined by h n , and this is the reason that h n is referred to as the bandwidth. Usually, the minimum of assumptions required of the kernel K and the bandwidth h n , in order for us to be able to establish some desirable properties of the estimate fˆ n (x) given in (16.23), are the following: ⎫ K is a bounded p.d.f.; i.e., sup {K (x); x ∈ } < ∞. ⎪ ⎪ ⎬ x K (x) tends to 0 as x→ ± ∞; i:e:, |x K (x)|−→ 0. (16.25) |x|→∞ ⎪ ⎪ ⎭ K is symmetric about 0; i.e., K (−x) = K (x), x ∈ . As n→∞ : (i) (0 <)h n → 0 . (16.26) (ii) nh n → ∞
Remark 6. Observe that requirements (16.25) are met for the kernel used in (16.24). Furthermore, the convergences in (16.26) are satisfied if one takes, e.g., h n = n −α with 0 < α < 1. Below, we record three (asymptotic) results regarding the estimate fˆ n (x) given in (16.23). Theorem 3. Under assumptions (16.25) and (16.26)(i), the estimate fˆn (x) given in (16.23) is an asymptotically unbiased estimate of f (x) for every x ∈ at which f is continuous; i.e., E fˆn (x)→ f (x) as n→∞. Theorem 4. Under assumptions (16.25) and (16.26)(i)–(ii), the estimate fˆ n (x) given in (16.23) is a consistent in quadratic mean estimate of f (x) for every x ∈ at which f is continuous; i.e., E[ fˆn (x) − f (x)]2 → 0 as n→∞. Theorem 5. Under assumptions (16.25) and (16.26)(i)–(ii), the estimate fˆ n (x) given in (16.23) is asymptotically normal, when properly normalized, for every x ∈
365
366
CHAPTER 16 Two Cases of Statistical Inference
at which f is continuous; i.e., fˆn (x) − E fˆn (x) d −→ Z ∼ N (0, 1). n→∞ σ [ fˆn (x)] At this point, it is only fitting to mention that the concept of kernel estimation of a p.d.f. was introduced by Murray Rosenblatt (1956), and it was popularized in a fundamental paper by Parzen (1962). There, one can find the proofs of the above theorems, along with other results. See references for more details. This section is concluded with a fundamental result (stated as Theorem A), which is needed in the proof of Theorems 3–5. Theorem A (Bochner).
Make the following assumptions:
*∞ (i) The function K : → is such that |K (y)| ≤ M(< ∞) for every y, −∞ |K (y)| dy < ∞, and |y K (y)|→0 as |y|→∞. *∞ (ii) The function g : → is such that −∞ |g(y)| dy < ∞. (iii) The (real) numbers h n , n = 1, 2, . . ., are such that 0 < h n →0 as n→∞. For each x ∈ , define gn by: ∞ ∞ 1 1 y x −t g(x − y)dy = g(t) dt, gn (x) = K K h n −∞ hn h n −∞ hn by settingx − y = t . Then, for every continuity point x of g, it holds ∞ gn (x) −→ g(x) K (y) dy. n→∞
−∞
Proof. ∞
(16.27)
In the first place, set h instead of h n throughout the proof, and observe that ∞ 1 ∞ %y& t t dt = dy by setting y = . K (y) dy = K K h h h −∞ h h −∞ −∞
Then ∞ gn (x) − g(x) K (y) dy −∞ ∞ %y& 1 1 ∞ % y & = g(x − y) dy − g(x) dy K K h h h −∞ h ∞−∞ % & y 1 = K [g(x − y) − g(x)] dy h h −∞ ∞ 1 % y & ≤ |g(x − y) − g(x)| K dy h h −∞
16.5 Nonparametric Estimation of a Probability Density Function
=
1 % y & |g(x − y) − g(x)| K dy h h (|y|≤δ) 1 % y & + |g(x − y) − g(x)| K dy h h (|y|>δ)
(for δ > 0) 1 % y & 1 % y & ≤ sup |g(x − y) − g(x)| dy + |g(x)| dy K K h h (|y|≤δ) h (|y|>δ) h |y|≤δ |g(x − y)| |y| % y & + × K dy |y| h h (|y|>δ) 1 % y & 1 % y & ≤ sup |g(x − y) − g(x)| K dy + |g(x)| K dy h h (|y|≤δ) h (|y|>δ) h |y|>δ 1 |y| % y & + |g(x − y)| × K dy δ (|y|>δ) h h 1 1 since |y| > δ is equivalent to < |y| δ ∞ % & y 1 1 % y & ≤ sup |g(x − y) − g(x)| K dy + |g(x)| K dy h h (|y|>δ) h −∞ h |y|≤δ 1 + |g(x − ht)||t K (t)|h dt δ (|t|>δ/h) % & y by setting = t in the last integral above h
= sup |g(x − y) − g(x)| |y|≤δ
∞
−∞
|K (t)| dt + |g(x)|
∞ 1 sup |t K (t)| |g(x − ht)|h dt δ |t|>δ/h −∞ & % y by setting = t in the first two integrals h
(|t|>δ/h)
|K (t)| dt
+
= max |g(x − y) − g(x)| |y|≤δ
1 + sup |t K (t)| δ |t|>δ/h
∞
−∞
∞
−∞
|K (t)| dt + |g(x)|
|g(y)| dy
(|t|>δ/h)
|K (t)| dt
(because the sup can be replaced by max,
by continuity of g at x, and by setting x − ht = y in the last integral). However, as n→∞ (which implies h→0), the following things happen: |K (t)| dt→0, following from : (|t|>δ/h)
367
368
CHAPTER 16 Two Cases of Statistical Inference
*∞
−∞ |K (t)| dt < ∞, |K (t)|I(|t|>δ/h) →0, K (t)I(|t|>δ/h) ≤ |K (t)| independent of n, and the Dominated Convergence Theorem (Theorem 3(ii), Chapter 5); and
sup |t K (t)|→0, following from |y K (y)|→0 as |y|→∞.
|t|>δ/h
Taking the limits, as n→∞, we obtain then ∞ K (y) dy ≤ max |g(x − y) − g(x)| limsup gn (x) − g(x) |y|≤δ
−∞
∞
−∞
|K (t)| dt.
Finally, letting δ→0, and using continuity of g at x, we have ∞ K (y) dy = 0, lim gn (x) − g(x) n→∞
−∞
and hence gn (x) −→ g(x) n→∞
∞
−∞
K (y) dy,
which is what relation (16.27) asserts.
Remark 7. Under the first assumption in (16.25) (i.e., K (x) ≤ M(< ∞), x ∈ *∞ ), it holds −∞ K r (x) d x < ∞ for every r > 1. This is so, because K r (x) = *∞ K r −1 (x)K (x) ≤ M r −1 K (x), so that −∞ K r (x) d x ≤ M r −1 . Corollary. Under the assumptions of Theorem A and with the r.v. X distributed as the X j s, it holds ∞ x−X 1 EK r K r (x)d x, −→ f (x) n→∞ hn hn −∞ for every x ∈ continuity point of f , r ≥ 1. Proof.
Writing h instead of h n , we have x−X 1 ∞ r x−y 1 EKr = f (y) dy K h h h −∞ h 1 ∞ r %z& = f (x − z) dy (by setting x − y = z) K h −∞ h ∞ −→ f (x) K r (z) dz, n→∞
−∞
by Theorem A applied with K replaced by K r and g by f .
16.6 Proof of Theorems 3–5 On the basis of Theorem A, its corollary, and standard probability arguments, we may now proceed with the proof of Theorems 3–5.
16.6 Proof of Theorems 3–5
Proof of Theorem 3.
By the fact that the X j s are i.i.d., we obtain x−X 1 x−X 1 × nEK = EK −→ f (x), E f n (x) = n→∞ nh h h h
by the Corollary to Theorem A applied with r = 1.
The following results are needed in the proof of Theorem 4 below. Lemma 5. Under assumptions (16.25) and (16.26) (i) and (ii), and for every x ∈ continuity point of f , it holds: (i) σ 2 [ f n (x)]→0. *∞ (ii) (nh n )σ 2 [ f n (x)]→ f (x) −∞ K 2 (z) dz. Proof. (i) Indeed, ⎡
⎤ n
x − X 1 j ⎦ σ 2[ f n (x)] = Var ⎣ K nh h j=1 x−X 1 × n Var K = (nh)2 h # $ x−X 2 1 2 x −X − EK = EK nh 2 h h 1 x−X 1 1 1 x−X 2 = × EK 2 − EK nh h h n h h ∞ −→ 0 × f (x) K 2 (z)dz − 0 × f 2 (x) = 0, (16.28) n→∞
−∞
by the Corollary to Theorem A, applied for r = 1 and r = 2, and by part (ii) in (16.26). (ii) From part (i), 1 x−X 2 1 2 x−X EK −h EK f n (x) = (nh)σ 2 h h h h ∞ −→ f (x) K 2 (z) dz − 0 × f 2 (x) n→∞ −∞ ∞ = f (x) K 2 (z) dz. −∞
Proof of Theorem 4. In the first place, for any r.v. U with EU 2 < ∞ and any constant c, the following identity is immediate, by adding and subtracting EU , E(U − c)2 = σ 2 (U ) + (EU − c)2 .
369
370
CHAPTER 16 Two Cases of Statistical Inference
Applying this identity with U = f n (x) and c = f (x), we get 2 2 f n (x) + E f n (x) − f (x) . E f n (x) − f (x) = σ 2 2 Then, for every x ∈ continuity point of f , it holds E f n (x) − f (x) −→ 0 on n→∞ account of Theorem 3 and Lemma 1(i). Corollary. Under the assumptions of Theorem 1 and for every x ∈ continuity point of f , it holds P f n (x) −→ f (x). n→∞
Proof.
Indeed, for every ε > 0, the Tchebichev inequality yields 2 P f n (x) − f (x) > ε ≤ ε−2 E f n (x) − f (x) −→ 0. n→∞
Proof of Theorem 5. For an arbitrary but fixed x ∈ (continuity point of f throughout the proof), and any n, set " x − Xj 1 x−X 1 K −E K Xnj = h h h h 1/2 √ x−X 1 K n Var , h h j = 1, . . . , n, where the r.v. X is distributed as the X j s. Then, as is easily seen by means of (16.23) and the third line on the right-hand side of (16.28), we have n
Xnj =
j=1
fˆn (x) − E fˆn (x) . σ fˆn (x)
Consider the triangular array of r.v.s X n j , j = 1, . . . , n, n ≥ 1, and observe that, within each row, they are i.i.d. with expectation 0 and variance σ 2 (X n j ) = 1/n, so that nj=1 σ 2 (X n j ) = 1 (and max1≤ j≤n σ 2 (X n j ) = n1 −→ 0). Thus, the conditions n→∞ of Theorem 1, Chapter 12, hold. Therefore a necessary and sufficient condition that n d j=1 X n j −→ Z ∼ N (0, 1) is that, for every ε > 0, n→∞
gn (ε) = n
(|x|≥ε)
x 2 d Fn (x) −→ 0, n→∞
where Fn is the d.f. of the X n j s, j = 1, . . . , n. On the other hand, a sufficient condition for gn (ε) −→ 0 is that (see Theorem 3, Chapter 12) n→∞
n
j=1
E|X n j |3 −→ 0. n→∞
16.6 Proof of Theorems 3–5
Since for any r.v. U ≥ 0, it is clearly true that |U − EU | ≤ U + EU , we have E|U − EU |3 ≤ E(U + EU )3 = EU 3 + 3(EU 2 )(EU ) + 3(EU )3 + (EU )3 = EU 3 + 3(EU 2 )(EU ) + 4(EU )3 , & % x−X j , and summing over j from 1 to n: we have, upon replacing U by h1 K h n
1 x − Xj 1 x − X 3 −E K E K h h h h j=1
3 n
x − Xj 1 1 x−X +E ≤ E K K h h h h j=1 # $ n n
x − Xj 3 x − Xj 2 1 1 K K = E +3 E h h h h j=1
j=1
3 n
x − Xj 1 x−X 1 E K +4 K × E h h h h j=1 # 3 2 $ 1 1 x−X x−X = nE + 3n E K K h h h h 3 1 x−X 1 x−X × E K + 4n E K h h h h 3 1 1 3 x−X 1 2 x−X + ×h =n × × EK EK h2 h h h2 h h $ x−X 4h 2 x−X 3 1 1 EK + 2 × EK × h h h h h n 1 3 x−X 1 2 x−X 1 x−X = 2 EK +3× EK ×h× EK h h h h h h h $ 3 x−X 1 de f EK +4h 2 × = I1n , h h or I1n = hn2 × J1n , where J1n is the quantity within the curly brackets in the last expression above. However, by the Corollary to Theorem A (applied with r = 3, r = 2, and r = 1), we have, as n→∞, ∞ ∞ 1 2 x−X 1 3 x−X 3 EK → f (x) → f (x) K (x) d x, EK K 2 (x) d x, h h h h −∞ −∞
371
372
CHAPTER 16 Two Cases of Statistical Inference
1 EK h
x−X h
→ f (x)
∞ −∞
K (x) d x = f (x),
so that
∞ K 3 (x) d x + 3 f (x) K 2 (x) d x × 0 × f (x) −∞ −∞ ∞ 3 3 +0 × [ f (x)] = f (x) K (x) d x. (16.29)
J1n → f (x)
∞
−∞
Also, set I2n =
√
n
= = = =
Var
1 K h
x−X h
1/2 !3
3/2 1 x−X K h h 2 !3/2 √ x−X 2 x−X 1 1 K K n n E − E h h h h # 2 $3/2 √ 1 x − X 1 x − X − EK n n EK 2 h2 h h h # √ 2 $3/2 x − X x − X n n − EK EK 2 h3 h h # √ $3/2 1 x−X 2 n n 1 2 x−X EK −h EK h 3/2 h h h h √ n n J2n , or h 3/2
√ =n n =
Var
√
I2n = nh 3/2n J2n , where J2n is the quantity within the curly brackets in the last expression above. However, by the Corollary to Theorem A (applied with r = 2, and r = 1), we have, as n→∞, ∞ x−X 1 2 x−X 1 2 EK → f (x) EK → f (x), K (x) d x, h h h h −∞ so that
J2n → f (x)
∞ −∞
K (x)d x + 0 × f (x) = f (x) 2
∞ −∞
K 2 (x)d x.
(16.30)
16.6 Proof of Theorems 3–5
Therefore n
j=1
E|X n j |3 ≤
I1n nh −2 J1n 1 J1n = √ −3/2 × =√ × −→0, I2n J2n J2n n nh nh
by means of (16.26)(i), (ii), (16.29), and (16.30). This implies that gn (ε) −→ 0 for n→∞ d every ε > 0, and therefore nj=1 X n j −→ Z ∼ N (0, 1). n→∞
373
APPENDIX
Brief Review of Chapters 1–16
A
The purpose of this appendix is to present a brief summary of the content of each chapter in this book. Before embarking on the study of each chapter, it would be advisable that the reader review the respective summary in this appendix. In this way, one obtains ahead of time an integrated picture of the entire chapter. This brief review also serves as a guide to the reader as to where various kinds of material are to be found.
Chapter 1 Certain Classes of Sets, Measurability, and Pointwise Approximation In this chapter, the concepts of a field, of a σ -field, and of a monotone class are introduced and their basic properties and relationships are studied (Definitions 1, 2, and 4, and Theorems 1–6). Measurable spaces and product measurable spaces are also introduced, as well as the concept of a measurable function, in general, and of a random vector and of a random variable (r.v.), in particular (Definitions 3, 5– 11, and Theorems 7–16). The chapter is concluded with the fundamental result of approximating pointwise a r.v. by a sequence of simple r.v.s (Theorem 17 and its Corollary).
Chapter 2 Definition and Construction of a Measure and its Basic Properties The basic concepts in this chapter are those of a measure, in general, and of a probability measure in particular (Definition 1). A number of basic results are also established (Theorems 1 and 2). The concept of the outer measure is introduced and is then used in going from a measure defined on a field to a measure defined on the σ -field generated by the field (Definitions 4 and 5, and Theorems 3 and 4); the Carathéodory Extension Theorem is instrumental here (Definition 6 and Theorem 5). The chapter is concluded with the relationship between a distribution function and the measure induced by it, and as a by-product of the discussion, we obtain the Lebesgue measure in the real line (Theorems 6 and 7). An Introduction to Measure-Theoretic Probability, Second Edition. http://dx.doi.org/10.1016/B978-0-12-800042-7.00024-4 Copyright © 2014 Elsevier Inc. All rights reserved.
375
376
APPENDIX A Brief Review of Chapters 1–16
Chapter 3 Some Modes of Convergence of Sequences of Random Variables and their Relationships In this chapter, we introduce two modes of convergence of a sequence of r.v.s: almost everywhere (a.e.) convergence, and convergence in measure, as well as their mutual versions (Definitions 1 and 2). It is shown that a.e. convergence and convergence in measure are equivalent to a.e. mutual convergence and mutual convergence in measure, respectively (Theorems 2 and 6). These convergences become almost sure (a.s.) convergence (a.s. mutual convergence) and convergence (mutual convergence) in probability when the underlying measure is a probability measure. It is also shown that for a finite measure, a.e. convergence implies convergence in measure (Corollary to Theorem 4). Furthermore, necessary and sufficient conditions are found for a.e. (and a.e. mutual) convergence (Theorem 4).
Chapter 4 The Integral of a Random Variable and its Basic Properties This chapter is devoted to the step-by-step definition of the integral of a r.v. X over an abstract measure space (, A, μ), X dμ (Theorems 1–3, and Definitions 1 and 2), and the proof of the basic properties of the integral (Theorems 4–12). The integral of X becomes the expectation of X (expected value of X , mean value of X ), E X , when μ is a probability measure P. Finally, the (probability) distribution of a r.v. X 4), and it is shown that, for under a probability measure P, PX , isdefined (Definition g : → measurable, it holds that g(X )d P = g(x)d PX (Theorem 13).
Chapter 5 Standard Convergence Theorems, The Fubini Theorem In this chapter, one finds the standard convergence theorems, such as the Lebesgue Monotone Convergence Theorem, the Fatou–Lebesgue Theorem, and the Dominated Convergence Theorem (Theorems 1–3). All these theorems, provide, in effect, conditions under which the limit can be taken under the integral sign. As an application of such theorems, one also establishes conditions for interchanging the operations of differentiation and integration (Theorems 4 and 5). Next, convergence in distribution of a sequence of r.v.s is defined (Definition 2), and the fact that this convergence is implied by convergence in probability is stated (Theorem 6), along with the very convenient Slutsky Theorem (Theorem 7). The second part of the chapter is devoted to a detailed discussion of the Product Measure Theorem (Theorem 11) and of the Fubini Theorem (Theorem 12), which gives conditions for interchanging the order of integration.
APPENDIX A Brief Review of Chapter 1–16
Chapter 6 Standard Moment and Probability Inequalities, Convergence in the rth Mean and its Implications The first part of this chapter is devoted to the discussion of the basic moment inequalities, such as the Hölder inequality (and its special case the Cauchy–Schwaz inequality) (Theorem 2), which provide bounds for the expectation of the product of two r.v.s by moments of the individual r.v.s; the Minkowski and the cr -inequality (Theorems 3 and 4), which provide bounds of a moment of the sum of two r.v.s by moments of the individual r.v.s; and the Jensen inequality (Theorem 5), where a convex function is involved. Moments are also used for providing bounds for certain probabilities, with the Markov and the Tchebichev inequalities being taken as special cases (Theorem 6). In the second part of the chapter, convergence in the r th mean is introduced (Definitions 2 and 3), as well as the concepts of uniform integrability and uniform (absolute) continuity (Definitions 4 and 5), along with a related result (Theorem 11). Some implications of convergence in the r th mean are studied (Theorems 7, 8, and 12), sufficient conditions for convergence in the r th mean are given (Theorems 8 and 13, and Corollaries 1–3), and, finally, necessary and sufficient conditions for convergence in the r th mean are discussed (Theorems 9, 13, and 14; see also Theorem 15).
Chapter 7 The Hahn–Jordan Decomposition Theorem, The Lebesgue Decomposition Theorem, and the Radon–Nikodym Theorem This chapter discusses three theorems, each of which is instrumental in establishing the next one. The first theorem is the Hahn–Jordan Decomposition Theorem (Theorem 1), which, in effect, says that a σ -additive set function φ defined on (, A) is uniquely written as a signed measure; i.e., as the difference of two measures, φ = φ + − φ − . In the Lebesgue Decomposition Theorem (Theorem 2), one starts out with two σ -finite measures μ and ν defined on (, A), and shows that ν is uniquely written as the sum of two measures, ν = νc + νs , where νs is μ-singular and νc is μ-continuous. Furthermore νc (A) = A X dμ, A ∈ A, where the r.v. X is nonnegative, and a.e.[μ] finite and unique. A specialization of the Lebesgue Decomposition Theorem yields the Radon–Nikodym Theorem (Theorem 3), according to which, if ν is already μ continuous and μ is λ-continuous, then ν(A) = A X dμ = A X ( dμ dλ )dλ, A ∈ A.
Chapter 8 Distribution Functions and Their Basic Properties, Helly–Bray Type Results This chapter deals with distribution functions (d.f.s) and sequences of d.f.s. Here a d.f. is a nonnegative, bounded, nondecreasing, and right-continuous function (Definition 1;
377
378
APPENDIX A Brief Review of Chapters 1–16
see also Remark 5). It is shown that a d.f. has several properties, some of which are the following. It is uniquely determined by a nonnegative, bounded, and nondecreasing function defined on a set dense in (Propositions 1 and 2); its discontinuities, if any, are jumps, which are countably many (Theorem 1); it is uniquely decomposed into the d.f.s, Fd , Fcc , and Fcs , say, where Fd is a step function (Definition 2), the measure induced by Fcc is λ-continuous, where λ is the Lebesgue measure, and the measure induced by Fcs is λ-singular (Theorem 2 and its Corollary). Next, a sequence {Fn }, n ≥ 1, of d.f.s is said to converge weekly to a d.f. F, if Fn (x) −→ F(x), for all continuity n→∞ points x of F (Definition 3; see also Theorem 4), and it is shown that, given an arbitrary (bounded) sequence of d.f.s, one can always extract a subsequence, which converges weakly to a d.f. (Theorem 5). Finally, the so-called Helly–Bray results are established, where conditions are given under which (α,β] g(x)d Fn (x) −→ (α,β] g(x)d F(x) n→∞ (Theorem 6), and g(x)d Fn (x) −→ g(x)d F(x) (Theorems 7 and 8). n→∞
Chapter 9 Conditional Expectation and Conditional Probability, and Related Properties and Results All r.v.s considered in this chapter are assumed to be integrable. For a r.v. X defined on (, A, P), its conditional expectation, given a σ -field B ⊆ A, is defined, E B X , by exploiting the Radon–Nikodym Theorem (Definition 1). The conditional probability of A ∈ A, given B, is taken as a special case for X = I A . The conditional expectation has numerous properties, some of which are established in this chapter. A group of properties of the conditional expectation encompass the following ones: the expectation of the conditional expectation is the expectation; the conditional expectation of a B-measurable r.v. X is X a.s.; the conditional expectation of a constant c is c a.s.; the conditional expectation preserves order a.s.; the conditional expectation preserves linearity a.s.; a B-measurable r.v. can be pulled out a.s. when taking the conditional expectation of the product of two r.v.s; when taking the conditional expectation successively with respect to two nested σ -field, one is left a.s. with the conditional expectation with respect to the smaller σ -field; and finally, when a r.v. and B are independent, then its conditional expectation is its expectation a.s. (Theorems 1, 2, 4, 9, and 10). Another group of properties provides versions of the Lebesgue Monotone Convergence Theorem (Theorem 3), of the Fatou–Lebesgue Theorem, and of the Dominated Convergence Theorem for the conditional expectation (Theorem 7). Still another group of properties establishes versions of the following inequalities for the conditional expectation: Hölder, Minkowski, cr (Theorem 5), and Jensen inequalities (Theorem 8). It is also shown that convergence in the r th mean implies the same convergence for the conditional expectations (Theorem 6). Finally, it is proved that the conditional expectation of the r.v. X , given another r.v. Y (i.e., given the σ -field induced by Y ), is a.s. equal to a function of Y (Theorem 11).
APPENDIX A Brief Review of Chapter 1–16
Chapter 10 Independence This brief chapter deals with the concept of independence. It starts with independence of a finite number of events (Definition 1), proceeds with independence of a finite number of classes of events (Definition 2), and concludes with the definition of independence of a finite number of r.v.s, which is reduced to that of the induced σ -fields (Definition 3). An easy result is established, according to which if X 1 , ..., X n are independent and Y j = g j (X j ) for some g j : → measurable, j = 1, ..., n, then Y1 , ..., Yn are also independent (Proposition 1). However, the highlight of this chapter is the justification of the hard part of the characterization of independence of n r.v.s through d.f.s. Namely, X 1 , ..., X n are independent if and only if FX 1 ,...,X n (x1 , ..., xn ) = FX 1 (x1 )...FX n (xn ) for all x1 , ..., xn in (Theorem 1). In the process, it is also shown that σ -fields generated by independent fields are independent (Proposition 2). The chapter is concluded with the proof of the familiar result that, under independence, the expectation of the product of two r.v.s is equal to the product of their expectations (proof of Lemma 1 stated in Chapter 9).
Chapter 11 Topics from the Theory of Characteristic Functions This is an extensive chapter on characteristic functions (ch.f.s) and their ramifications. The ch.f. of a general d.f. is defined, and we obtain as a special case the ch.f. of a d.f. of a r.v. (Definition 1). An installment of basic properties of a ch.f. is established, including its boundedness (in norm), its uniform continuity, and the relations of its derivatives to the moments of the respective r.v. (in case the ch.f. is that of a r.v.). Also established are the fact that a r.v. X is symmetric about 0 (Definition 7) if and only if its ch.f. is real, as well as the fact that, should the ch.f.s satisfy $f_n(t) \to f(t)$ as $n \to \infty$, where f is some ch.f., then this convergence is uniform over closed intervals in $\mathbb{R}$ (Theorems 1, 7, and 8). A d.f. determines a ch.f. through the definition of a ch.f. The important fact is that the converse is also true. This deep result is referred to as the inversion formula and is established here in great detail (Theorem 2 and its Corollaries); also, it is illustrated by a couple of examples (Examples 1 and 2). The next major undertaking is that of establishing the P. Lévy Continuity Theorem (Theorem 3; see also Theorem 3∗). The most useful and involved part of it states, in effect, that weak convergence of d.f.s, which is hard to handle, is reduced to convergence of the respective ch.f.s; this latter result is an analytical one and much more tractable. Next, a brief passage is made to the k-dimensional case, which includes formulation of the appropriate versions of the inversion formula (Theorem 2′) and of the Continuity Theorem (Theorem 3′). Also, it is shown how the Cramér–Wold device reduces, in effect, the k-dimensional case to the one-dimensional case (Theorem 4). The convolution of two d.f.s is defined (Definition 6), and it is shown to be a symmetric relation. Also, the relation of the convolution of two d.f.s to their ch.f.s is
discussed (Theorem 5, and Theorem 6 and its Corollaries 1 and 2). In case the d.f.s are those of r.v.s, it is shown that their convolution is simply the d.f. of the sum of two independent r.v.s (Corollary 3 to Theorem 6). Returning to a ch.f., it is shown how a ch.f. can be expanded in a Taylor-type manner with the remainder given in three different forms (Theorem 9). As a simple application of this expansion, we obtain the Weak Law of Large Numbers (WLLN) (Application 1), and the Central Limit Theorem (CLT) (Application 2). The last topic discussed here refers to the connection between the moments of a r.v. and its distribution. It is shown that, under certain conditions, the moments of a r.v. completely determine its distribution through its ch.f. (Theorem 11). This result elevates the significance of moments. In the last short section of this chapter, some basic concepts and results from complex analysis are reviewed, in order to provide the necessary support for some of the derivations.
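For ease of reference, the two central objects of the chapter may be recorded in one standard form (a sketch only; the precise statements, with their side conditions, are those of Definition 1 and Theorem 2): the ch.f. of a d.f. F is
$$f(t) = \int_{\mathbb{R}} e^{itx}\,dF(x), \quad t \in \mathbb{R},$$
and, for continuity points $a < b$ of F, one common form of the inversion formula reads
$$F(b) - F(a) = \lim_{T\to\infty} \frac{1}{2\pi} \int_{-T}^{T} \frac{e^{-ita} - e^{-itb}}{it}\, f(t)\,dt.$$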
Chapter 12 The Central Limit Problem: The Centered Case Indisputably, the most important theorem in probability is the Central Limit Theorem (CLT). This is the subject matter of this chapter in an expanded form. The problem is cast as follows: A triangular array of independent (within each row) r.v.s is considered, and the sequence of the sums of the row r.v.s is investigated from the weak convergence viewpoint. The basic assumption is made that the contribution of each r.v. to the row sum is negligible, in a sense that is made precise in terms of probabilities and variances. Then, under some additional minor assumptions, a necessary and sufficient condition is stated for the row sums to converge in distribution to a N(0, 1) distributed r.v. This result is the so-called Lindeberg–Feller Theorem (Theorem 1). Actually, the scope of this chapter is much broader than just stated. Specifically, under a set of very general conditions (which are summarized under (C) in relation (12.1)), the first question posed is, what is the class of all possible limit laws of the row sums (in the weak convergence sense), and the second, under what conditions do the row sums converge weakly to a given member of this class? The answers to these questions are provided by Theorem 2, and they constitute what is known as the Central Limit Problem. The proof of Theorem 2 is long and is deferred to the last section of the chapter. As one would suspect, Theorem 1 should fall out of Theorem 2, which it does, as is explicitly shown. Three more theorems revolve around the CLT: Theorem 3 (Liapounov) gives sufficient conditions in terms of moments for the CLT to hold; Theorem 4 is a variation of Theorem 1; and Theorem 5 discusses the case where the triangular array of r.v.s collapses into a sequence of r.v.s.
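For orientation, the flavor of the conditions involved may be indicated by the following standard special form (a sketch only, not the book's exact formulation): for independent r.v.s $X_{nj}$, $j = 1, \dots, k_n$, with $EX_{nj} = 0$ and $s_n^2 = \sum_{j=1}^{k_n} \sigma^2(X_{nj})$, the Lindeberg condition requires that, for every $\varepsilon > 0$,
$$\frac{1}{s_n^2}\sum_{j=1}^{k_n} E\big[X_{nj}^2\, I(|X_{nj}| > \varepsilon s_n)\big] \xrightarrow[n\to\infty]{} 0,$$
and under it the normalized row sums $\sum_{j=1}^{k_n} X_{nj}/s_n$ converge in distribution to a N(0, 1) distributed r.v.; the Lindeberg–Feller Theorem supplies a converse under the negligibility assumption described above.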
Chapter 13 The Central Limit Problem: The Non-centered Case The problem investigated in this chapter is pretty much the same as the problem considered in the previous chapter with the following two differences. The row r.v.s
are not assumed to be centered at their expectations, and the sum of their variances is not assumed to be equal to 1, but rather to be bounded by a (finite) constant. This necessitates carrying along the sum of the expectations of the row r.v.s, as well as the sum of their variances. Then a version of Theorem 2 of the previous chapter is established (Theorem 2′ in this chapter). As special cases of Theorem 2′, we state necessary and sufficient conditions for the row sums to converge weakly to a $N(\mu, \sigma^2)$ distributed r.v. (Theorem 1), as well as necessary and sufficient conditions for the row sums to converge weakly to a Poisson, $P(\lambda)$, distributed r.v. (Theorem 2). As an application of Theorem 2, the familiar result is established according to which, under suitable conditions, Binomial probabilities are approximated by (converge to) Poisson probabilities.
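To make the last statement concrete, here is a small numerical illustration (not taken from the book; the parameter values n and p below are assumed purely for the example) comparing B(n, p) probabilities with P(λ) probabilities for λ = np:

```python
# Hypothetical illustration (not from the book): for large n and small p with
# n*p = lam held moderate, Binomial(n, p) probabilities are close to Poisson(lam).
from math import comb, exp, factorial

def binom_pmf(k, n, p):
    # P(X = k) for X ~ B(n, p)
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    # P(Y = k) for Y ~ P(lam)
    return exp(-lam) * lam**k / factorial(k)

n, p = 1000, 0.003          # assumed values for the illustration
lam = n * p                 # lam = 3
for k in range(7):
    print(k, round(binom_pmf(k, n, p), 6), round(poisson_pmf(k, lam), 6))
```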
Chapter 14 Topics from Sequences of Independent Random Variables The ultimate purpose of this chapter is the proof of the Kolmogorov Strong Law of Large Numbers (SLLN) stated in Theorem 7. In the process of doing so, a number of auxiliary results are established that, however, are of interest in their own right. The first such result is the Kolmogorov inequality (Theorem 1). This is a two-sided inequality, which is useful in many circumstances. An application of the right-hand side of Kolmogorov's inequality provides a sufficient condition for the a.s. convergence of the series $\sum_{n=1}^{\infty}(X_n - EX_n)$ of independent r.v.s (Theorem 2). Next, the Borel–Cantelli Lemma (Theorem 3) states that, for any events $A_n$, $n \ge 1$, if $\sum_{n=1}^{\infty} P(A_n) < \infty$, then $P(\limsup_{n\to\infty} A_n) = 0$. Should the events be independent, $\sum_{n=1}^{\infty} P(A_n) = \infty$ implies that $P(\limsup_{n\to\infty} A_n) = 1$, which is one part of the Borel Zero–One Criterion (Theorem 4). An application of Theorem 3 yields the result that, for any two sequences $\{X_n\}$ and $\{X_n'\}$, $n \ge 1$, of r.v.s, the condition $\sum_{n=1}^{\infty} P(X_n \ne X_n') < \infty$ implies that the series $\sum_{n=1}^{\infty} X_n$ and $\sum_{n=1}^{\infty} X_n'$ converge essentially on the same set; also, the sequences $\{S_n/n\}$ and $\{S_n'/n\}$ converge essentially on the same set and have the same limit as $n \to \infty$, where $S_n = \sum_{j=1}^{n} X_j$ and likewise for $S_n'$. This is so by Theorem 5. The next two results are two analytical lemmas (Toeplitz and Kronecker), followed by a weak form of an SLLN for independent r.v.s (Theorem 6). Finally, before the actual justification of the Kolmogorov SLLN is undertaken, an interesting lemma is established, according to which, for a r.v. X, $\sum_{j=1}^{\infty} P(|X| \ge j) \le E|X| \le 1 + \sum_{j=1}^{\infty} P(|X| \ge j)$. This result provides, among other things, a necessary and sufficient condition for integrability of the r.v. X. For independent identically distributed (i.i.d.) r.v.s with expectation either $\infty$ or $-\infty$, Theorem 8 shows that the averages $S_n/n$ still converge (actually, diverge) a.s. to the expectation as $n \to \infty$. However, if the expectation does not exist, then the sequence $\{S_n/n\}$ is unbounded with probability 1; i.e., for every $M > 0$, $P[\limsup_{n\to\infty}(|S_n/n| > M)] = 1$. This is what Theorem 9 asserts. For independent r.v.s, it is shown that their tail σ-field $\mathcal{T}$ (Definition 1) obeys the Kolmogorov Zero–One Law (Theorem 10); i.e., $\mathcal{T}$ is equivalent to the trivial σ-field
$\{\emptyset, \Omega\}$, or, to put it differently, P(A) = 0 or 1 for every $A \in \mathcal{T}$. The chapter is concluded with the so-called Three Series Criterion (Theorem 11), which provides necessary and sufficient conditions for the a.s. convergence of the series $\sum_{n=1}^{\infty} X_n$ of independent r.v.s.
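In its standard formulation (recorded here only as a sketch for reference; the book's precise version is Theorem 11), the criterion reads: for independent r.v.s $X_n$ and a constant $c > 0$, set $X_n^c = X_n I(|X_n| \le c)$; then $\sum_{n=1}^{\infty} X_n$ converges a.s. if and only if the three series
$$\sum_{n=1}^{\infty} P(|X_n| > c), \qquad \sum_{n=1}^{\infty} E X_n^c, \qquad \sum_{n=1}^{\infty} \sigma^2(X_n^c)$$
all converge.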
Chapter 15 Topics from Ergodic Theory The ultimate purpose of this chapter is to state and prove the Ergodic Theorem in its a.s. form, as well as in its first-mean form. To this end, we have to introduce the necessary concepts and also establish a long list of auxiliary results that will allow us to prove the Ergodic Theorem. We start out with a discrete time parameter stationary (stochastic) process (Definition 3), and then provide a characterization of such processes (Proposition 2). It is easily seen that independent r.v.s constitute a stationary process if and only if they are i.i.d. (Proposition 3). Also, a general way of constructing a stationary process by means of a given stationary process is provided (Proposition 4). Since throughout the chapter we keep moving between an abstract probability space and the countably infinite product space of the Borel real line, we need the concept of the coordinate process, which is given in Definition 2. The second installment of notation and results starts out with a measurable transformation T defined on $(\Omega, \mathcal{A}, P)$ into itself; it is defined what it means for T to be measure-preserving (Definition 4), and it is shown that, if T is measure-preserving on a field $\mathcal{F}$, then it is measure-preserving on the σ-field $\sigma(\mathcal{F})$ (Proposition 5). By means of T and a r.v. X, one defines r.v.s as in relations (15.8) and (15.9), and then one shows that these r.v.s form a stationary sequence if and only if T is measure-preserving (Proposition 6). For the reason cited in the previous paragraph, we also introduce here the shift transformation (Definition 5), and show that this transformation is measurable (Proposition 7) and that the coordinate process is stationary if and only if the shift transformation is measure-preserving (Proposition 8). For a measurable transformation T as given earlier, we define invariance and a.s. invariance of an event relative to T (Definition 6), and we show that the classes of invariant and a.s. invariant events form σ-fields (Propositions 12 and 14). The transformation T is said to be ergodic if the σ-field $\mathcal{J}$ of invariant events (relative to T) is equivalent to $\{\emptyset, \Omega\}$ (Definition 8). We then proceed to define invariance of a r.v. X relative to a transformation T (Definition 9) and to show that X is invariant if and only if it is $\mathcal{J}$-measurable (Proposition 16). Then the transformation T is ergodic if and only if every $\mathcal{J}$-measurable real-valued r.v. is a.s. equal to a constant (Proposition 17). At this point, we have all we need to prove the so-called Maximal Ergodic Theorem (Theorem 2), and then the Ergodic Theorem in its a.s. form (Theorem 1), as well as the theorem in its first-mean form (Corollary 4). When the transformation T is ergodic, the limit in the Ergodic Theorem is a.s. equal to EX (Corollary 1).
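In compact form (a sketch only; the precise statements are those of Theorem 1 and Corollaries 1 and 4), the conclusion just described may be summarized as follows: for a measure-preserving transformation T on $(\Omega, \mathcal{A}, P)$ and an integrable r.v. X,
$$\frac{1}{n}\sum_{k=0}^{n-1} X(T^k\omega) \longrightarrow E^{\mathcal{J}}X \quad \text{a.s. and in the first mean, as } n \to \infty,$$
and, when T is ergodic, the limit is a.s. the constant EX.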
In the final section of this chapter, we consider a stationary process X, but we do not assume that X is generated by a measure-preserving transformation T. Then some of the concepts defined earlier in the chapter are redefined in terms of X. Thus, invariance of an event A relative to X is defined (Definition 11), and it is shown that the class $\mathcal{J}$ of invariant events is a σ-field (Proposition 19). Also, invariance of a r.v. Y relative to X is defined (Definition 12), and it is shown that Y is invariant if and only if it is $\mathcal{J}$-measurable (Proposition 20). Then, for a stationary process as considered here, the Ergodic Theorem is restated and proved in its a.s. form as well as in its first-mean form (Theorem 3 and Corollary 1). If the process X is ergodic, i.e., if $\mathcal{J}$ is equivalent to $\{\emptyset, \Omega\}$, then the limit in the Ergodic Theorem is a.s. equal to $EX_1$ (Corollary 2). Finally, by means of the $X_n$s, we define the r.v.s $Y_n$, as in Proposition 21, and show that the resulting process Y inherits the properties of X. That is, if X is stationary, so is Y, and if X is ergodic, so is Y.
Chapter 16 Two Cases of Statistical Inference: Estimation of a Real-Valued Parameter, Nonparametric Estimation of a Probability Density Function The purpose of this added chapter is to demonstrate how some of the theorems, corollaries, etc. discussed in the book apply in establishing statistical inference results. Two cases of statistical estimation are selected for this demonstration. The first is the maximum likelihood estimation of a real-valued parameter in an assumed (parametric) model, and the other is the nonparametric estimation of a probability density function via the kernel methodology. Actually, in the former case, it is shown that there is a sequence of roots of the likelihood equation which is almost surely consistent, as an estimate of the underlying parameter, as well as asymptotically normal. In the latter case, the proposed estimate is shown to be asymptotically unbiased, consistent in quadratic mean and in the probability sense, and also asymptotically normal.
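As a purely illustrative companion to the second case (not the book's code), the following minimal sketch computes the kernel density estimate $\hat f_n(x) = (n h_n)^{-1}\sum_{i=1}^{n} K((x - X_i)/h_n)$ with a standard normal kernel; the simulated sample, the sample size, and the bandwidth choice $h_n = n^{-1/5}$ are assumptions made only for the demonstration:

```python
# A minimal numerical sketch (not the book's code) of the kernel density estimate.
import math
import random

def kernel(u):
    # standard normal kernel K
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(x, sample, h):
    # f_hat_n(x) = (1/(n*h)) * sum_i K((x - X_i)/h)
    n = len(sample)
    return sum(kernel((x - xi) / h) for xi in sample) / (n * h)

random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(500)]   # X_1, ..., X_n i.i.d. N(0, 1) (assumed)
h = len(sample) ** (-1 / 5)                              # illustrative bandwidth: h_n -> 0, n*h_n -> infinity
for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(x, round(kde(x, sample, h), 4))                # compare with the N(0, 1) density
```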
APPENDIX B Brief Review of Riemann–Stieltjes Integral
In this part of the Appendix, a brief review is given of the Riemann–Stieltjes integral and its relation to the Riemann integral, as well as to the Lebesgue integral on the real line. To this effect, let $g : [a, b] \to \mathbb{R}$ $(-\infty < a < b < \infty)$, and let F be a d.f. A partition P of [a, b] is a set of points $\{x_0, x_1, \dots, x_n\}$ with $a = x_0 < x_1 < \cdots < x_n = b$, and another partition Q of [a, b] is said to be finer than P if $P \subset Q$. In terms of P and $\xi = \{\xi_1, \dots, \xi_n\}$, define the Riemann–Stieltjes sum
$$S(P, \xi) = \sum_{j=1}^{n} g(\xi_j)\,[F(x_j) - F(x_{j-1})],$$
where $\xi_j$ is an arbitrary point in $[x_{j-1}, x_j]$, $j = 1, \dots, n$. The function g is said to be Riemann–Stieltjes integrable with respect to F over the interval [a, b] if there is a number γ such that, for every ε > 0, there is a partition P with the property that, for every partition Q finer than P and any choice of $\xi_j \in [x_{j-1}, x_j]$, $j = 1, \dots, n$, it holds that $|S(Q, \xi) - \gamma| < \varepsilon$. Then γ is called the Riemann–Stieltjes integral of g (with respect to F over [a, b]) and is denoted by $\int_a^b g\,dF$ or $\int_a^b g(x)\,dF(x)$. Conditions ensuring the existence of $\int_a^b g\,dF$ can be found, e.g., in Theorem 9–19, page 206, and Theorem 9–26, page 211, in Apostol (1958). It is to be mentioned here that throughout this appendix the term "existence" of an integral always includes its being finite.

Remark 1. If the integrator is a monotone function, which is the case in this book, the Riemann–Stieltjes integral may be approached by means of the so-called upper and lower Stieltjes sums, and upper and lower Stieltjes integrals. Necessary and sufficient conditions are also given for the existence of the Riemann–Stieltjes integral by means of the quantities just mentioned. For these, see, e.g., Definitions 9–14, 9–16, 9–18, pages 204–206, and Theorem 9–19, page 206, in Apostol (1958).

The Riemann–Stieltjes integral has properties similar to those of the Riemann integral. See, e.g., Theorems 9–2, 9–3, 9–4, 9–6, 9–7, pages 193–196, and Theorem 9–29, page 213, in Apostol (1958). The following result provides a relation between the Riemann–Stieltjes and the Riemann integral of g.

Proposition 1. Suppose that the Riemann–Stieltjes integral $\int_a^b g(x)\,dF(x)$ exists and that F has a continuous derivative $F'$ on [a, b]. Then the Riemann integral $\int_a^b g(x)F'(x)\,dx$ exists and we have
$$\int_a^b g(x)\,dF(x) = \int_a^b g(x)F'(x)\,dx.$$

Proof. See, e.g., Theorem 9–8, page 197, in Apostol (1958).
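To make the definition and Proposition 1 concrete, here is a small numerical sketch (not from the book); the integrand $g(x) = x^2$ and the integrator $F(x) = x^3$ on [0, 1] are assumed purely for illustration, and, since $F'$ is continuous, the sums should approach $\int_0^1 g(x)F'(x)\,dx = 3/5$:

```python
# Riemann–Stieltjes sums S(P, xi) over successively finer uniform partitions
# (a hypothetical illustration; g and F are chosen only for the example).

def rs_sum(g, F, a, b, n):
    # uniform partition a = x_0 < ... < x_n = b, with xi_j the left endpoint of [x_{j-1}, x_j]
    xs = [a + (b - a) * j / n for j in range(n + 1)]
    return sum(g(xs[j - 1]) * (F(xs[j]) - F(xs[j - 1])) for j in range(1, n + 1))

g = lambda x: x ** 2
F = lambda x: x ** 3
for n in (10, 100, 1000, 10000):
    print(n, rs_sum(g, F, 0.0, 1.0, n))   # approaches 3/5 = 0.6 as the partition is refined
```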
The following version of the previous result provides a necessary and sufficient condition for the existence of $\int_a^b g(x)\,dF(x)$.

Proposition 2. Let g be a bounded function on [a, b], and let F have a continuous derivative $F'$ on [a, b]. Then $gF'$ is Riemann integrable over [a, b] if and only if g is Riemann–Stieltjes integrable with respect to F over [a, b], and then
$$\int_a^b g(x)\,dF(x) = \int_a^b g(x)F'(x)\,dx.$$

Proof. See, e.g., Proposition 2, page 705, in Fristedt and Gray (1997).

For an arbitrary $a \in \mathbb{R}$, the improper Riemann–Stieltjes integral $\int_a^{\infty} g(x)\,dF(x)$ is defined in a way similar to that of the improper Riemann integral $\int_a^{\infty} g(x)\,dx$. Specifically, let a be kept fixed and suppose that $\int_a^b g(x)\,dF(x)$ exists for every b > a. We also suppose that $\lim_{b\to\infty} \int_a^b g(x)\,dF(x)$ exists. Then this limit is denoted by $\int_a^{\infty} g(x)\,dF(x)$ and is called the improper Riemann–Stieltjes integral of g with respect to F over $[a, \infty)$. The result below gives a necessary and sufficient condition for the existence of $\int_a^{\infty} g(x)\,dF(x)$.
Proposition 3. Let $g : [a, \infty) \to \mathbb{R}$ be Riemann–Stieltjes integrable with respect to F on every interval [c, d] with $a \le c < d$. Then $\int_a^{\infty} g(x)\,dF(x)$ exists if and only if for every ε > 0 there exists M = M(ε) > 0 such that $|\int_c^d g(x)\,dF(x)| < \varepsilon$ for all $d > c \ge M$.

Proof. Assume that $\int_a^{\infty} g\,dF$ exists and denote it by I. This means that $I = \lim_{b\to\infty} \int_a^b g\,dF$. Then, for every ε > 0, choose M = M(ε) so that $|I - \int_a^b g\,dF| < \varepsilon$ for $b \ge M$. Next, for $d > c \ge M$,
$$\Big|\int_c^d g\,dF\Big| \le \Big|I - \int_a^c g\,dF\Big| + \Big|I - \int_a^d g\,dF\Big| \le \varepsilon + \varepsilon = 2\varepsilon,$$
so that the condition $|\int_c^d g\,dF| < 2\varepsilon$, $d > c \ge M$, is satisfied.

Next, assume that $|\int_c^d g\,dF| < \varepsilon$ for $d > c \ge M$, and let $(a \le)\, a_n \to \infty$ as $n \to \infty$. By setting $A_n = \int_a^{a_n} g\,dF$, it is easy to see that $\{A_n\}$, $n \ge 1$, is a Cauchy sequence. Therefore there is $A \in \mathbb{R}$ such that $\lim A_n = A$ as $n \to \infty$. Likewise, for $(a \le)\, b_n \to \infty$, set $B_n = \int_a^{b_n} g\,dF$, so that $\{B_n\}$, $n \ge 1$, is also a Cauchy sequence. Hence there is $B \in \mathbb{R}$ such that $\lim B_n = B$ as $n \to \infty$. But then
$$|A - B| \le |A - A_n| + |B - B_n| + |A_n - B_n| \qquad (B.1)$$
$$= |A - A_n| + |B - B_n| + \Big|\int_{a_n}^{b_n} g\,dF\Big| \qquad (B.2)$$
$$< \varepsilon + \varepsilon + \varepsilon = 3\varepsilon \qquad (B.3)$$
for sufficiently large n, so that $a_n, b_n \ge M$. It follows that A = B; i.e., the $\lim_{n\to\infty} \int_a^{a_n} g\,dF$ is independent of the sequence $(a \le)\, a_n \to \infty$, and hence $\lim_{b\to\infty} \int_a^b g\,dF = \int_a^{\infty} g\,dF$ exists.

The improper integral $\int_{-\infty}^a g\,dF$ is defined similarly. If both $\int_{-\infty}^a g\,dF$ and $\int_a^{\infty} g\,dF$ exist for some $a \in \mathbb{R}$, we say that the improper integral $\int_{-\infty}^{\infty} g\,dF$ exists, and it is defined by the sum
$$\int_{-\infty}^{\infty} g\,dF = \int_{-\infty}^a g\,dF + \int_a^{\infty} g\,dF.$$
The choice of a is clearly immaterial.

If $g : \mathbb{R} \to \mathbb{R}$ is a Borel function, so that g(X) is a r.v. if X is a r.v., it has been seen (see Theorem 13 in Chapter 4) that
$$E\,g(X) = \int_{\Omega} g(X)\,dP = \int_{\mathbb{R}} g(x)\,dP_X,$$
in the sense that, if one side exists, so does the other, and they are equal. Also, the further notation $\int g(x)\,dF_X$ was used to denote $\int g(x)\,dP_X$. Here $P_X$ is the (probability) distribution of X, under P, and $F_X$ is its d.f. However, if $\int_{\mathbb{R}} g(x)\,dF_X$ is employed as a Riemann–Stieltjes integral (as is the case in the Helly–Bray type results in Section 8.3 of Chapter 8), one should ensure conditions under which it is, indeed, true that $\int_{\mathbb{R}} g(x)\,dP_X = \int_{\mathbb{R}} g(x)\,dF_X$. The results below answer this question.

Proposition 4. Suppose that $g : \mathbb{R} \to \mathbb{R}$ Borel is Riemann–Stieltjes integrable on every interval [a, b] with $-\infty < a < b < \infty$ with respect to the d.f. $F_X$. Then, if $\int g(x)\,dP_X$ exists, it follows that
$$\int_{\mathbb{R}} g(x)\,dP_X = \int_{-\infty}^{\infty} g(x)\,dF_X \ \Big(= \int_{\mathbb{R}} g(x)\,dF_X\Big).$$

Proof. See, e.g., Theorem 14, page 111, in Fristedt and Gray (1997).
The last proposition above requires the existence of the Riemann–Stieltjes integral $\int_a^b g(x)\,dF_X$ over every interval [a, b] $(-\infty < a < b < \infty)$ for the conclusion to be reached. The result below provides conditions for this to be the case.
Proposition 5. If $g : [a, b] \to \mathbb{R}$ is continuous, then the Riemann–Stieltjes integral $\int_a^b g(x)\,dF_X(x)$ exists.

Proof.
See, e.g., Theorem 9–26, page 211, in Apostol (1958).
Remark 2. Use of suitable combinations of the results provided in this Appendix fully justifies the arguments employed in discussing Theorems 6–8 in Chapter 8.
APPENDIX C Notation and Abbreviations

Table C.1 Notation and Abbreviations

$\mathcal{F}$, $\mathcal{A}$ : field, σ-field
$\mathcal{F}(\mathcal{C})$, $\sigma(\mathcal{C})$ : field generated by the class $\mathcal{C}$, σ-field generated by the class $\mathcal{C}$
$\mathcal{A}_A$ : σ-field of members of $\mathcal{A}$ which are subsets of A
$(\Omega, \mathcal{A})$ : measurable space
$\mathbb{R}^k$, $\mathcal{B}^k$, $k \ge 1$ : k-dimensional Euclidean space, Borel σ-field
$(\mathbb{R}^1, \mathcal{B}^1) = (\mathbb{R}, \mathcal{B})$ : Borel real line
↑, ↓ : increasing (nondecreasing), decreasing (nonincreasing)
P, $(\Omega, \mathcal{A}, P)$ : probability measure, probability space
$I_A$ : indicator of the set A
$X^{-1}(B)$ : inverse image of the set B under X
$\mathcal{A}_X$ or $X^{-1}$(σ-field) : σ-field induced by X
$(X \in B) = X^{-1}(B)$ : the set of points in $\Omega$ for which X takes values in B
r.v., r. vector : random variable, random vector
$B(n, p)$ : Binomial distribution (or r.v.) with parameters n, p
$P(\lambda)$ : Poisson distribution (or r.v.) with parameter λ
$N(\mu, \sigma^2)$ : Normal distribution (or r.v.) with parameters μ, σ²
$\Phi$ : distribution function of N(0, 1)
EX or $\mu(X)$ or $\mu_X$ or just μ : expectation (mean value, mean) of X
$E^{\mathcal{B}}X$ : conditional expectation of the r.v. X, given the σ-field $\mathcal{B}$
$P^{\mathcal{B}}A$ : conditional probability of the event A, given the σ-field $\mathcal{B}$
$\sigma^2(X)$ ($\sigma(X)$) or $\sigma_X^2$ ($\sigma_X$) or just $\sigma^2$ (σ) : variance (standard deviation) of X
Cov(X, Y), $\rho(X, Y)$ : covariance of X, Y, correlation coefficient of X, Y
$\varphi_X$ or $\varphi_{X_1,\dots,X_n}$ or just φ : characteristic function (ch.f.) of X, or joint ch.f. of $X_1, \dots, X_n$
d.f. : distribution function
$\xrightarrow{\text{a.s.}}$ : almost sure (a.s.) convergence (convergence with probability 1)
$\xrightarrow{\text{a.e.}}$ : almost everywhere (a.e.) convergence
$\xrightarrow{\text{a.u.}}$ : almost uniform convergence
$\xrightarrow{c}$ : complete convergence (of d.f.s)
$\xrightarrow{w}$ : weak convergence (of d.f.s)
$\xrightarrow{d}$, $\xrightarrow{\mu}$, $\xrightarrow{P}$ : convergence in distribution, in measure, in probability
$\xrightarrow{\text{q.m.}}$, $\xrightarrow{(r)}$ : convergence in quadratic mean, in the rth mean
$(\Omega, \mathcal{A}, P_\theta)$ : probability space
θ : parameter
Θ : parameter space
MLE : Maximum Likelihood Estimate
$\tilde{\theta}_n$ or $\hat{\theta}_n$ : estimate of θ
$I(\theta)$ : Fisher information
p.d.f. : probability density function
Selected References
[1] Apostol, Tom M. (1958) Mathematical Analysis, second printing. Addison-Wesley Publishing Company, Reading, MA.
[2] Billingsley, Patrick (1995) Probability and Measure, 3rd ed. John Wiley & Sons, New York.
[3] Billingsley, Patrick (1999) Convergence of Probability Measures, 2nd ed. John Wiley & Sons, New York.
[4] Dieudonné, J. (1960) Foundations of Modern Analysis. Academic Press, Boston.
[5] Ferguson, Thomas S. (1996) A Course in Large Sample Theory. Chapman & Hall/CRC.
[6] Fristedt, Bert and Gray, Lawrence (1997) A Modern Approach to Probability Theory. Birkhäuser, Boston, Basel, Berlin.
[7] Hardy, G. H., Littlewood, J. E., and Pólya, G. (1967) Inequalities, 2nd ed. Cambridge University Press, Cambridge, U.K.
[8] Loève, Michel (1963) Probability, 3rd ed. Van Nostrand Company, Princeton, NJ.
[9] Munroe, M. E. (1953) Introduction to Measure and Integration. Addison-Wesley Publishing Company, Reading, MA.
[10] Neveu, J. (1965) Foundations of the Calculus of Probability. Holden-Day, San Francisco.
[11] Parzen, E. (1962) On estimation of a probability density function and mode. Annals of Mathematical Statistics, Vol. 33, pages 1065–1076.
[12] Rao, C. R. (1965) Linear Statistical Inference and Its Applications. John Wiley & Sons, New York.
[13] Rosenblatt, M. (1956) Remarks on some nonparametric estimates of a density function. Annals of Mathematical Statistics, Vol. 27, pages 832–837.
[14] Roussas, George G. (1997) A Course in Mathematical Statistics, 2nd ed. Academic Press, Boston.
[15] Royden, H. L. (1988) Real Analysis, 3rd ed. Macmillan, New York.
[16] Scheffé, Henry (1947) A Useful Convergence Theorem for Probability Distributions. Annals of Mathematical Statistics, Vol. 18, pages 434–438.
[17] Shiryaev, A. N. (1995) Probability, 2nd ed. Springer, New York.
[18] Tallarida, Ronald J. (1999) Pocket Book of Integrals and Mathematical Formulas, 3rd ed. Chapman & Hall/CRC.
[19] Titchmarsh, E. C. (1939) The Theory of Functions, 2nd ed. Oxford University Press, Oxford, U.K.
[20] Vestrup, Eric M. (2003) The Theory of Measure and Integration. John Wiley & Sons, Inc., New York.
Revised Answers Manual to An Introduction to Measure-Theoretic Probability
George G. Roussas
University of California, Davis, United States
Chapter 1 Certain Classes of Sets, Measurability, Pointwise Approximation

1.
(i) $x \in \underline{\lim}_{n\to\infty} A_n$ if and only if $x \in \cup_{n\ge1}\cap_{j\ge n} A_j$, so that $x \in \cap_{j\ge n_0} A_j$ for some $n_0 \ge 1$, and then $x \in A_j$ for all $j \ge n_0$. Hence $x \in \cup_{j\ge n} A_j$ for all $n \ge 1$, so that $x \in \cap_{n\ge1}\cup_{j\ge n} A_j = \overline{\lim}_{n\to\infty} A_n$.
(ii) $(\underline{\lim}_{n\to\infty} A_n)^c = (\cup_{n\ge1}\cap_{j\ge n} A_j)^c = \cap_{n\ge1}\cup_{j\ge n} A_j^c = \overline{\lim}_{n\to\infty} A_n^c$, and $(\overline{\lim}_{n\to\infty} A_n)^c = (\cap_{n\ge1}\cup_{j\ge n} A_j)^c = \cup_{n\ge1}\cap_{j\ge n} A_j^c = \underline{\lim}_{n\to\infty} A_n^c$. Let $\lim_{n\to\infty} A_n = A$. Then $\overline{\lim}_{n\to\infty} A_n^c = (\underline{\lim}_{n\to\infty} A_n)^c = (\lim_{n\to\infty} A_n)^c = A^c$, and $\underline{\lim}_{n\to\infty} A_n^c = (\overline{\lim}_{n\to\infty} A_n)^c = (\lim_{n\to\infty} A_n)^c = A^c$, so that $\lim_{n\to\infty} A_n^c$ exists and is $A^c$.
(iii) To show that $\underline{\lim}_{n\to\infty}(A_n \cap B_n) = (\underline{\lim}_{n\to\infty} A_n) \cap (\underline{\lim}_{n\to\infty} B_n)$. Equivalently,
$$\bigcup_{n=1}^{\infty}\bigcap_{j=n}^{\infty}(A_j \cap B_j) = \Big(\bigcup_{n=1}^{\infty}\bigcap_{j=n}^{\infty} A_j\Big) \cap \Big(\bigcup_{n=1}^{\infty}\bigcap_{j=n}^{\infty} B_j\Big).$$
Indeed, let x belong to the left-hand side. Then $x \in \cap_{j=n_0}^{\infty}(A_j \cap B_j)$ for some $n_0 \ge 1$, hence $x \in A_j \cap B_j$ for all $j \ge n_0$, and then $x \in A_j$ and $x \in B_j$ for all $j \ge n_0$. Hence $x \in \cap_{j=n_0}^{\infty} A_j$ and $x \in \cap_{j=n_0}^{\infty} B_j$, so that $x \in \cup_{n=1}^{\infty}\cap_{j=n}^{\infty} A_j$ and $x \in \cup_{n=1}^{\infty}\cap_{j=n}^{\infty} B_j$; i.e., x belongs to the right-hand side. Next, let x belong to the right-hand side. Then $x \in \cup_{n=1}^{\infty}\cap_{j=n}^{\infty} A_j$ and $x \in \cup_{n=1}^{\infty}\cap_{j=n}^{\infty} B_j$, so that $x \in \cap_{j=n_1}^{\infty} A_j$ and $x \in \cap_{j=n_2}^{\infty} B_j$ for some $n_1, n_2 \ge 1$. Then $x \in \cap_{j=n_0}^{\infty} A_j$ and $x \in \cap_{j=n_0}^{\infty} B_j$, where $n_0 = \max(n_1, n_2)$, and hence $x \in A_j$ and $x \in B_j$ for all $j \ge n_0$. Thus, $x \in A_j \cap B_j$ for all $j \ge n_0$, so that $x \in \cap_{j=n_0}^{\infty}(A_j \cap B_j)$, and hence $x \in \cup_{n=1}^{\infty}\cap_{j=n}^{\infty}(A_j \cap B_j)$; i.e., x belongs to the left-hand side. Next, $\overline{\lim}_{n\to\infty}(A_n \cup B_n) = \overline{\lim}_{n\to\infty}[(A_n^c \cap B_n^c)^c] = [\underline{\lim}_{n\to\infty}(A_n^c \cap B_n^c)]^c$ (by part (ii)), and this equals $[(\underline{\lim}_{n\to\infty} A_n^c) \cap (\underline{\lim}_{n\to\infty} B_n^c)]^c$ (by what we just proved), and this equals $[(\overline{\lim}_{n\to\infty} A_n)^c \cap (\overline{\lim}_{n\to\infty} B_n)^c]^c = (\overline{\lim}_{n\to\infty} A_n) \cup (\overline{\lim}_{n\to\infty} B_n)$, as was to be seen.
(iv) To show that $\overline{\lim}_{n\to\infty}(A_n \cap B_n) \subseteq (\overline{\lim}_{n\to\infty} A_n) \cap (\overline{\lim}_{n\to\infty} B_n)$ and $\underline{\lim}_{n\to\infty}(A_n \cup B_n) \supseteq (\underline{\lim}_{n\to\infty} A_n) \cup (\underline{\lim}_{n\to\infty} B_n)$. It suffices to show that
$$\bigcap_{n=1}^{\infty}\bigcup_{j=n}^{\infty}(A_j \cap B_j) \subseteq \Big(\bigcap_{n=1}^{\infty}\bigcup_{j=n}^{\infty} A_j\Big) \cap \Big(\bigcap_{n=1}^{\infty}\bigcup_{j=n}^{\infty} B_j\Big).$$
Indeed, let x belong to the left-hand side. Then $x \in \cup_{j=n}^{\infty}(A_j \cap B_j)$ for all $n \ge 1$, so that $x \in A_j \cap B_j$ for some $j \ge n$ and all $n \ge 1$. Then $x \in A_j$ and $x \in B_j$ for some $j \ge n$ and all $n \ge 1$, hence $x \in \cup_{j=n}^{\infty} A_j$ and $x \in \cup_{j=n}^{\infty} B_j$ for all $n \ge 1$, so that $x \in \cap_{n=1}^{\infty}\cup_{j=n}^{\infty} A_j$ and $x \in \cap_{n=1}^{\infty}\cup_{j=n}^{\infty} B_j$, and hence $x \in (\cap_{n=1}^{\infty}\cup_{j=n}^{\infty} A_j) \cap (\cap_{n=1}^{\infty}\cup_{j=n}^{\infty} B_j)$; i.e., x belongs to the right-hand side. So the above inclusion is correct.
Also, to show that $(\cup_{n=1}^{\infty}\cap_{j=n}^{\infty} A_j) \cup (\cup_{n=1}^{\infty}\cap_{j=n}^{\infty} B_j) \subseteq \cup_{n=1}^{\infty}\cap_{j=n}^{\infty}(A_j \cup B_j)$. Indeed, let x belong to the left-hand side. Then $x \in \cup_{n=1}^{\infty}\cap_{j=n}^{\infty} A_j$ or $x \in \cup_{n=1}^{\infty}\cap_{j=n}^{\infty} B_j$, or both. Let $x \in \cup_{n=1}^{\infty}\cap_{j=n}^{\infty} A_j$. Then $x \in \cap_{j=n_0}^{\infty} A_j$ for some $n_0 \ge 1$, hence $x \in A_j$ for all $j \ge n_0$, and then $x \in A_j \cup B_j$ for all $j \ge n_0$, so that $x \in \cup_{n=1}^{\infty}\cap_{j=n}^{\infty}(A_j \cup B_j)$; i.e., x belongs to the right-hand side. Similarly if $x \in \cup_{n=1}^{\infty}\cap_{j=n}^{\infty} B_j$. An alternative proof of the second part is as follows:
$$\underline{\lim}(A_n \cup B_n) = \bigcup_{n=1}^{\infty}\bigcap_{k=n}^{\infty}(A_k \cup B_k) = \Big[\bigcap_{n=1}^{\infty}\bigcup_{k=n}^{\infty}(A_k^c \cap B_k^c)\Big]^c = \big[\overline{\lim}(A_n^c \cap B_n^c)\big]^c$$
$$\supseteq \big[(\overline{\lim} A_n^c) \cap (\overline{\lim} B_n^c)\big]^c \quad \text{(by the previous part)}$$
$$= \Big[\Big(\bigcap_{n=1}^{\infty}\bigcup_{k=n}^{\infty} A_k^c\Big) \cap \Big(\bigcap_{n=1}^{\infty}\bigcup_{k=n}^{\infty} B_k^c\Big)\Big]^c = \Big(\bigcup_{n=1}^{\infty}\bigcap_{k=n}^{\infty} A_k\Big) \cup \Big(\bigcup_{n=1}^{\infty}\bigcap_{k=n}^{\infty} B_k\Big) = (\underline{\lim} A_n) \cup (\underline{\lim} B_n).$$
(v) That the inverse inclusions in part (iv) need not hold is demonstrated by the following counterexample. Let $A_{2j-1} = A$, $A_{2j} = A_0$ and $B_{2j-1} = B$, $B_{2j} = B_0$, $j \ge 1$, for some events $A, A_0, B$, and $B_0$. Then $\underline{\lim}_{n\to\infty} A_n = A \cap A_0$, $\overline{\lim}_{n\to\infty} A_n = A \cup A_0$, $\underline{\lim}_{n\to\infty} B_n = B \cap B_0$, $\overline{\lim}_{n\to\infty} B_n = B \cup B_0$, $\overline{\lim}_{n\to\infty}(A_n \cap B_n) = (A \cap B) \cup (A_0 \cap B_0)$, $\underline{\lim}_{n\to\infty}(A_n \cup B_n) = (A \cup B) \cap (A_0 \cup B_0)$. Therefore $(A \cup B) \cap (A_0 \cup B_0)$ need not contain $(A \cup A_0) \cap (B \cup B_0)$, and $(A \cap A_0) \cup (B \cap B_0)$ need not contain $(A \cup B) \cap (A_0 \cup B_0)$. As a concrete example, take $\Omega = \mathbb{R}$, $A = (0, 1]$, $A_0 = [2, 3]$, $B = [1, 2]$, $B_0 = [3, 4]$. Then $(A \cup B) \cap (A_0 \cup B_0) = (0, 2]$, $(A \cup A_0) \cap$
(B ∪ B0 ) = ((0, 1] ∪ [2, 3]) ∩ ([1, 2] ∪ [3, 4]) = {1} ∪ {3} = {1, 3} (0, 2], and (A ∩ A0 ) ∪ (B ∩ B0 ) = ∪ = , (A ∪ B) ∩ (A0 ∪ B0 ) = (0, 2] ∩ [2, 4] = {2} not contained in . (vi) If limn→∞ An = A and limn→∞ Bn = B, then by parts (iii) and (iv): limn→∞ (An ∩ Bn ) ⊆ A ∩ B and limn→∞ (An ∩ Bn ) = A ∩ B. Thus, A ∩ B = limn→∞ (An ∩ Bn ) ⊆ limn→∞ (An ∩ Bn ) ⊆ A ∩ B, so that limn→∞ (An ∩ Bn ) = A ∩ B. Likewise: A ∪ B ⊆ limn→∞ (An ∪ Bn ) ⊆ limn→∞ (An ∪ Bn ) = A ∪ B, so that limn→∞ (An ∪ Bn ) = A ∪ B. (vii) Since An B = (An − B) + (B − An ) = (An ∩ B c ) + (B ∩ Acn ), we have limn→∞ (An ∩ B c ) = (limn→∞ An ) ∩ B c = A ∩ B c by part (vi), and limn→∞ (B ∩ Acn ) = B ∩ (limn→∞ Acn ) = B ∩ Ac by parts (vi) and (ii). Therefore, by part (vi) again, limn→∞ (An B) = limn→∞ [(An ∩ B c ) + (B ∩ Acn )] = limn→∞ (An ∩ B c ) + limn→∞ (B ∩ Acn ) = (A ∩ B c ) + (B ∩ Ac ) = A B. (viii) A2 j−1 = B, A2 j = C, j ≥ 1. Then, as in part (v), limn→∞ An = B∩C and limn→∞ An = B ∪ C. The limn→∞ An exists if and only if B ∩ C = B ∪ C, or B ∪C = (B ∩C c )+(B c ∩C)+(B ∩C) = B ∩C. Then, by the pairwise disjointness of B ∩ C c , B c ∩ C and B ∩ C, we have B ∩ C c = B c ∩ C = . From B ∩ C c = , it follows that B ⊆ C, and from B c ∩ C = , it follows that C ⊆ B. Therefore B = C. Thus, limn→∞ An exists if and only if B = C. # 2.
(i) All three sets A, A, and A (if it exists) are in A, because they are expressed in terms of An , n ≥ 1, by means of countable operations. ∞ ∞ (ii) Let An ↑. Then limn→∞ An = ∪∞ n=1 ∩ j=n A j = ∪n=1 An , and lim n→∞ ∞ ∞ ∞ ∞ ∞ An = ∩n=1 ∪ j=n A j = ∪ j=n A j = ∪ j=1 A j = ∪n=1 An , so that limn→∞ An = ∪∞ n=1 An . c c ∞ ∞ ∞ If An ↓, then Acn ↑ and hence ∩∞ n=1 ∪ j=n A j = ∪n=1 ∩ j=n A j = ∞ ∞ ∞ ∞ c ∪∞ n=1 An , so that, by taking the complements, ∪n=1 ∩ j=n A j = ∩n=1 ∪ j=n ∞ ∞ A j = ∩n=1 An , so that limn→∞ An = ∩n=1 An . #
3.
(i) ∩ j∈I F j = since, e.g., ∈ F j , j ∈ I . Next, if A ∈ ∩ j∈I F j for all j ∈ I , and hence Ac ∈ F j for all j ∈ I , so that Ac ∈ ∩ j∈I F j . Finally, if A, B ∈ ∩ j∈I F j , then A, B ∈ F j for all j ∈ I , and hence A ∪ B ∈ F j for all j ∈ I , so that A ∪ B ∈ ∩ j∈I F j . (ii) If Ai ∈ ∩ j∈I A j , i = 1, 2, . . . , then Ai ∈ A j , i = 1, 2, . . . , for all j ∈ I , ∞ A ∈ A for all j ∈ I , so that ∪∞ A ∈ ∩ and hence ∪i=1 i j j∈I A j . # i=1 i
4. Let = , F = {A ⊆ ; either A or Ac is finite}, and let A j = {1, 2, . . . , j}, / F, j ≥ 1. Then F is a field and A j ∈ F, j ≥ 1, but ∪∞ j=1 A j = {1, 2, . . .} ∈ because neither this set nor its complement is finite. Also, if B j = { j + 1, j + 2, . . .}, then B j ∈ F j since B cj is finite, whereas c c ∞ ∞ ∈ / F, as it has been seen already. # ∩∞ j=1 B j = ∩ j=1 A j = ∪ j=1 A j
e3
e4
Revised Answers Manual to an Introduction
5. Clearly, C is = , every member of C is a countable union of members of P, and C is the smallest σ -field containing P, if indeed, is a σ -field. If B ∈ C, then B = ∪i∈I Ai for some I ⊆ N = {1, 2, . . .}, and then B c = ∪ j∈J A j , where J = N− I , so that B c ∈ C. Finally, if B j ∈ C, j = 1, 2, . . . , then B j = ∪i∈I j A ji , ∞ where I j ⊆ N and Ii ∩ I j = . Then ∪∞ j=1 B j = ∪ j=1 ∪i∈I j A ji , the union of ∞ members of P, so that ∪ j=1 B j belongs in C. # 6. Since C j and C j ⊆ C0 , j = 1, . . . , 8, it follows that σ (C j ) and σ (C j ) ⊆ σ (C0 ) = B, so that it suffices to show that B ⊆ σ (C j ) and B ⊆ σ (C j ), which are implied, respectively, by C0 ⊆ σ (C j ) and C0 ⊆ σ (C j ), j = 1, . . . , 8. As an example, consider the classes mentioned in the hint. So, to show that C0 ⊆ σ (C1 ). In all that follows, all limits are taken as n → ∞. Indeed, for yn ↓ y, we have (x, yn ) ∈ C1 and ∩∞ n=1 (x, yn ) = (x, y] ∈ σ (C1 ). Likewise, for xn ↑ x, we have (xn , y) ∈ C1 and ∩∞ n=1 (x n , y) = [x, y) ∈ σ (C1 ). Next, with xn and yn as above, (xn , yn ) ∈ C1 and ∩∞ n=1 (x n , yn ) = [x, y] ∈ σ (C1 ). Also, for xn ↓ −∞, we have (xn , a) ∈ C1 and ∩∞ n=1 (x n , a) = (−∞, a) ∈ (x , a] = (−∞, a] ∈ σ (C1 ). Finally, σ (C1 ), and likewise (xn , a] ∈ C1 and ∪∞ n=1 n (b, ∞) = (−∞, b]c ∈ σ (C1 ), and [b, ∞) = (−∞, b)c ∈ σ (C1 ). It follows that C0 ⊆ σ (C1 ). That C0 ∈ σ (C1 ) is seen as follows. For (x, y), there exist xn and yn rationals with xn ↓ x and yn ↑ y, so that (x, y) = ∪∞ n=1 ∈ σ (C j ). Also, for yn ↓ y, we have (x, yn ) ∈ σ (C1 ), as was just proved, and then ∩∞ n=1 (x, yn ) = (x, y] ∈ σ (C1 ). Likewise, with xn ↑ x, we have (xn , y) ∈ σ (C1 ) and then ∩∞ n=1 (x n , y) = [x, y) ∈ σ (C1 ). Also, with xn ↑ x and yn ↓ y, we have (xn , yn ) ∈ σ (C1 ), and ∩∞ n=1 (x n , yn ) = [x, y] ∈ σ (C1 ). Likewise, with x n ↓ −∞, we have (x n , a) ∈ ∞ σ (C1 ) and ∪n=1 (xn , a) = (−∞, a) ∈ σ (C1 ), whereas (xn , a] ∈ σ (C1 ), so that c ∪∞ n=1 (x n , a] = (−∞, a] ∈ σ (C1 ). Finally, (b, ∞) = (−∞, b] ∈ σ (C1 ) since c (−∞, b] ∈ σ (C1 ), and [b, ∞) = (−∞, b) ∈ σ (C1 ) since (−∞, b) ∈ σ (C1 ). It follows that C0 ⊆ σ (C1 ). A slightly alternative version of the proof follows. We will show (a) σ (C1 ) = B and (b) σ (C1 ) = B. (a) σ (C1 ) = B. That σ (C1 ) ⊆ B is clear; to show B ⊆ σ (C1 ) it suffices to show that C0 ⊆ σ (C1 ). To this end, we show that (x, y] ∈ σ (C1 ). Indeed, x, y + n1 ∈ C1 , so that ∞ x, y + 1 = (x, y] ∈ σ (C1 ). Next, x − n1 , y ∈ C1 , ∞ n=1 1 n so that n=1 x − n , y = [x, y) ∈ σ (C1 ). Also, x − n1 , y + n1 ∈ C1 , so that ∞ x − n1 , y + n1 = [x, y] ∈ σ (C1 ). Next, (−n, x) ∈ C1 , so
∞ n=1 that n=1 (−n, x) = (−∞, x) ∈ σ (C1 ). Also, −∞, x + n1 ∈ C1 , so
that ∞ −∞, x + n1 = (−∞, x] ∈ σ (C1 ). Likewise, (x, n) ∈ C1 , so n=1 that ∞ (x, n) = (x, ∞) ∈ σ (C1 ); and x − n1 , ∞ ∈ σ (C1 ), so that ∞ n=1 1 n=1 x − n , ∞ = [x, ∞) ∈ σ (C1 ). The proof is complete. (b) σ (C1 ) = B.
Revised Answers Manual to an Introduction
Since, clearly, σ (C1 ) ⊆ σ (C1 ), it suffices to show that σ (C1 ) ⊆ σ (C1 ). For x, y ∈ with x < y, there exist xn ↓ x and yn ↑ y with xn , yn rational numbers and xn < yn for each n. Since (xn , yn ) ∈ C1 , it follows that ∞ n=1 (x n , yn ) = (x, y) ∈ σ (C1 ). So C1 ⊆ σ (C1 ), and hence σ (C1 ) ⊆ σ (C1 ). The proof is complete. # 7.
(i) Let A ∈ C. Then there are the following possible cases: m (a) A = i=1 Ii , Ii = (αi , βi ], i = 1, . . . , m. ( α1
(b)
(c)
(d)
(e)
] β1
( α2
] β2
···
( ] ( αm−1 βm−1 αm
] βm
Then Ac = (−∞, α1 ] + (β1 , α2 ] + . . . + (βm−1 , αm ] + (βm , ∞) and this is in C. A consists only of intervals of the form (−∞, α]. Then there can be only one such interval; i.e., A = (−∞, α] and hence Ac = (α, ∞) which is in C. A consists only of intervals of the form (β, ∞). Then there can only be one such interval; i.e., A = (β, ∞) so that Ac = (−∞, β] which is in C. A consists only of intervals of the form (−∞, α] and (β, ∞). Then A will be as follows: A = (−∞, α] + (β, ∞) (α < β), so that Ac = (α, ∞) ∩ (−∞, β] = (α, β] which is in C. Finally, let A consist of intervals of all forms. Then A is as below: ] −∞ α
( ] ( ] ( ] ( α1 β1 α2 β2 · · · αm−1 βm−1 αm
] βm
( β
∞
Then, clearly, Ac = (α, α1 ] + (β1 , α2 ] + . . . + (βm−1 , αm ] + (βm , β] which is in C. So, C is closed under complementation. It is also closed under the union of two sets A and B in C, because, clearly, the union of two such sets is also a member of C. Thus, C is a field. Next, let C2 = {(α, β]; α, β ∈ , α < β}. Then, by Exercise 6, σ (C2 ) = B. Also, C2 ⊂ C, so that B = σ (C2 ) ⊆ σ (C). Furthermore, C ⊆ σ (C0 ) = B and hence σ (C) ⊆ B. It follows that σ (C) = B. m (ii) If A ∈ C, then A = i=1 Ii , where Ii s are of the forms: (α, β), (α, β], = [α, β), [α, β], (−∞, α), (−∞, α], (β, ∞), [β, ∞). But (α, β)c (−∞, α] + [β, ∞), (α, β]c = (−∞, α] + (β, ∞), [α, β)c = (−∞, α) + (β, ∞), [α, β]c = (−∞, α)+(β, ∞), (−∞, α)c = [α, ∞), (−∞, α]c = (α, ∞), (β, ∞)c = (−∞, β], and [β, ∞)c = (−∞, β). Then, considerc ing all possibilities as in part (i), we conclude that A ∈ C in all cases. Next, for A as above and B = nj=1 J j with J j being from among the above intervals, it follows that A ∪ B is a finite sum of intervals as above,
e5
e6
Revised Answers Manual to an Introduction
and hence A ∪ B ∈ C. Thus, C is a field. Finally, from C0 ⊂ C ⊂ B, it follows that B = σ (C0 ) ⊆ σ (C) ⊆ B, so that σ (C) = B. # 8. Clearly, F A is = since, for example, A = A ∩ and hence A ∈ F A . Next, for B ∈ F A , it follows that B = A ∩ C, C ∈ F, and B cA (=complement of B with respect to A)=A ∩ C c ∈ F A since C c ∈ F. Finally, for B1 , B2 ∈ F A , it follows that Bi = Ai ∩ Ci , Ci ∈ F, i = 1, 2, and then B1 ∪ B2 = A ∩ (C1 ∪ C2 ) ∈ F A , since C1 ∪ C2 ∈ F. # 9. That A A = and that it is closed under complementation is as in Exercise 8. For Bi ∈ A A , i = 1, 2, . . . , it follows that Bi = A ∩ Ci for some Ci ∈ A, i ≥ 1, ∞ B = ∪∞ (A ∩ C ) = A ∩ ∪∞ C ∈ A since ∪∞ C ∈ A. and ∪i=1 i i A i=1 i=1 i i=1 i Thus, A A is a σ -field. Since F ⊆ A, it follows that F A ⊆ A A and hence σ (F A ) ⊆ A A . Since for every F ⊆ Ai , i ∈ I , it follows F A ⊆ Ai,A , i ∈ I , then σ (F A ) ⊆ ∩i∈I Ai,A . Also, σ (F A ) = ∩ j∈J A∗j for all σ -fields of subsets of A with A∗j ⊇ F A . In order to show that σ (F A ) = A A , it must be shown that for every σ field A∗ of subsets of A with A∗ ⊇ F A , we have A∗ ⊇ A A . That this is, indeed, the case is seen as follows. Define the class M by : M = {C ∈ A; A ∩ C ∈ A∗ }. Then, clearly, F ⊆ M ⊆ A and M A (= M ∩ A) ⊆ A∗ . This is so because, for C ∈ F, it follows that C ∩ A ∈ F A and hence C ∩ A ∈ A∗ (⊇ F A ). Also, with M A = {C ⊆ A; C = M ∩ A, M ∈ M}, it follows that M A ⊆ A∗ from the definition of M. We assert that M is a monotone class. Indeed, let Cn ∈ M with Cn ↓. Then, for the case that Cn ↑, A∩(limn→∞ Cn ) = A∩ ∪∞ Cn ↑ or n=1 C n = ∗ ∗ ∪∞ n=1 A ∩ C n ∈ A since A ∩ C n ∈ A , n ≥ 1, so that lim n→∞ C n ∈ M. ∞ ∗ Likewise, for Cn ↓, A ∩ (limn→∞ Cn ) = A ∩ (∩∞ n=1 C n ) = ∩n=1 (A ∩ C n ) ∈ A ∗ since A ∩ Cn ∈ A , n ≥ 1, so that limn→∞ Cn ∈ M. So M is a monotone class ⊇ F, and hence M ⊇ minimal monotone class M0 , say, ⊇ F. Since F is a field, it follows that M0 is a σ -field and indeed M0 = A (by Theorem 6). Finally, A = M0 ⊆ M implies A A = M0,A ⊆ M A ⊆ A∗ , as was to be seen. # c 10. Set F = ∪∞ n=1 An , and let A ∈ F. Then A ∈ An for some n, so that A ∈ An and hence A ∈ F. Next, let A, B ∈ F. Then A ∈ An 1 , B ∈ An 2 for some n 1 and n 2 , and let n 0 = max(n 1 , n 2 ). Then A, B ∈ An 0 , so that A ∪ B ∈ An 0 and A ∪ B ∈ F. Then, Ac ∈ F and A ∪ B ∈ F, so that F is a field. It need not be a σ -field. Counterexample: Let = and let An = {A ⊆ [−n, n]; either A or Ac is countable}, n ≥ 1. Then An is a σ -field (by Example 8) and An ↑. However, F is not a σ -field because, if An = {rationals in [−n, n]}, n ≥ 1, and if we set / F, because otherwise A ∈ An for some n, which cannot A = ∪∞ n=1 An , then A ∈ happen. # 11. Set M ∩ j∈I M j and let An ∈ M, n ≥ 1, where the An s form a monotone sequence. Then An ∈ M j for each j ∈ I and all n ≥ 1, so that limn→∞ An is also in M j . Since this is true for all j ∈ I , it follows that limn→∞ An is in M, and M is a monotone class. # 12. Let = {1, 2, . . .}, M = {, {1, . . . , n}, {n, n + 1, . . .}, n ≥ 1, }. Then M is a monotone class, but not a field, because, e.g., if A = {1, . . . , n} and B = {n − 2, n − 1, . . .} (n ≥ 3), then A, B ∈ M, but A ∩ B = {n − 2, n − 1, n} ∈ / M.
Revised Answers Manual to an Introduction
As another example, let = (0, 1) and M = {(0, 1 − n1 ], n ≥ 1, }. Then M is a monotone class and (0, 21 ] ∈ M, but (0, 21 ]c = ( 21 , 1) ∈ / M. Still as a third example, let = and let M = {, (0, n), (−n, 0), n ≥ 1, (0, ∞), (−∞, 0)}. Then M is a monotone class, but not a field since, for A = (−1, 0) and B = (0, 1), we have A, B, ∈ M, but A ∪ B = (−1, 1) ∈ / M. # c 13. / E = A × B, so that either ω1 ∈ / A (i) For ω = (ω1 , ω2 ) ∈ E , we have ω ∈ / B or both. Let ω1 ∈ / A. Then ω1 ∈ Ac and (ω1 , ω2 ) ∈ Ac × 2 , or ω2 ∈ whether or not ω2 ∈ B. Hence E c ⊆ (A × B c ) + (Ac × 2 ). If ω1 ∈ A, / B, so that (ω1 , ω2 ) ∈ A × B c and E c ⊆ (A × B c ) + (Ac × 2 ). then ω2 ∈ / B, so that (ω1 , ω2 ) ∈ /E Next, if (ω1 , ω2 ) ∈ A×B c , then ω1 ∈ A and ω2 ∈ / A and hence and hence (ω1 , ω2 ) ∈ E c . If (ω1 , ω2 ) ∈ Ac × 2 , then ω1 ∈ / A × B = E whether or not ω2 ∈ B. Thus (ω1 , ω2 ) ∈ E c . (ω1 , ω2 ) ∈ In both cases, (A × B c ) + (Ac × 2 ) ⊇ E c and equality follows. The second equality is entirely symmetric. (ii) Let (ω1 , ω2 ) ∈ E 1 ∩ E 2 , so that (ω1 , ω2 ) ∈ E 1 and (ω1 , ω2 ) ∈ E 2 and hence ω1 ∈ A1 , ω2 ∈ B1 , and ω1 ∈ A2 , ω2 ∈ B2 . It follows that ω1 ∈ A1 ∩ A2 , ω2 ∈ B1 ∩ B2 and hence (ω1 , ω2 ) ∈ (A1 ∩ A2 )×(B1 ∩ B2 ). Next, (ω1 , ω2 ) ∈ (A1 ∩ A2 ) × (B1 ∩ B2 ), so that ω1 ∈ A1 ∩ A2 and ω2 ∈ B1 ∩ B2 . Thus, ω1 ∈ A1 , ω1 ∈ A2 and ω2 ∈ B1 , ω2 ∈ B2 , so that (ω1 , ω2 ) ∈ A1 ∩ B1 and (ω1 , ω2 ) ∈ A2 ∩ B2 , or (ω1 , ω2 ) ∈ E 1 ∩ E 2 , so that equality occurs. The second conclusion is immediate. (iii) Indeed, E 1 ∩ F1 = (A1 ∩ A1 ) × (B1 ∩ B1 ) and E 2 ∩ F2 = (A2 ∩ A2 ) × (B2 ∩B2 ), by part (ii), and the first equality follows. Next, again by part (ii), and replacing E 1 by (A1 ∩ A1 )×(B1 ∩B1 ) and E 2 by (A2 ∩ A2 )×(B2 ∩B2 ), we obtain the second equality. The third equality is immediate. Finally, the last conclusion is immediate. # 14.
(i) Either by the inclusion process or as follows: (A1 × B1 ) − (A2 × B2 ) = (A1 × B1 ) ∩ (A2 × B2 )c = (A1 × B1 ) ∩ [(A2 × B2c ) + (Ac2 × 2 )] (by Lemma 2) = (A1 × B1 ) ∩ (A2 × B2c ) + (A1 × B1 ) ∩ (Ac2 × 2 ) = (A1 ∩ A2 ) × (B1 ∩ B2c ) + (A1 ∩ Ac2 ) × (B1 ∩ 2 ) (clearly) = (A1 ∩ A2 ) × (B1 − B2 ) + (A1 − A2 ) × B1 . (ii) Let A × B = . Then (x, y) ∈ A × B, so that x ∈ A and y ∈ B. Also, (x, y) ∈ and this can happen only if at least one of A or B is = . On the other hand, if at least one of A or B is = , then, clearly, A × B = . (iii) Let A1 × B1 ⊆ A2 × B2 . Then (x, y) ∈ A1 × B1 , so that x ∈ A1 and y ∈ B1 . Also, (x, y) ∈ A2 × B2 implies x ∈ A2 and y ∈ B2 . Thus, A1 ⊆ A2 and B1 ⊆ B2 . Next, let A1 ⊆ A2 and B1 ⊆ B2 . Then A1 × B1 ⊆ A2 × B2 since (x, y) ∈ A1 × B1 if and only if x ∈ A1 and y ∈ B1 . Hence, x ∈ A2 and y ∈ B2 or (x, y) ∈ A2 × B2 .
e7
e8
Revised Answers Manual to an Introduction
(iv) A1 × B1 = and A2 × B2 = . Then A1 × B1 = A2 × B2 or A1 × B1 ⊆ A2 × B2 and then (by (iii)), A1 ⊆ A2 and B1 ⊆ B2 . Also, A2 × B2 = A1 × B1 or A2 × B2 ⊆ A1 × B1 , and then (by (iii) again), A2 ⊆ A1 and B2 ⊆ B1 . So, both A1 ⊆ A2 and A2 ⊆ A1 , and therefore A1 = A2 . Likewise, B1 ⊆ B2 and B2 ⊆ B1 so that B1 = B2 . (v) A × B = (A1 × B1 ) + (A2 × B2 )
(*)
From = (A1 × B1 ) ∩ (A2 × B2 ) = (A1 ∩ A2 ) × (B1 ∩ B2 ) and part (ii), we have that at least one of A1 ∩ A2 , B1 ∩ B2 is . Let A1 ∩ A2 = . Then the claim is that A = A1 + A2 . In fact, (x, y) ∈ A × B implies x ∈ A (and y ∈ B). Also, (x, y) belonging to the right-hand side of (*) implies (x, y) ∈ A1 × B1 or (x, y) ∈ A2 × B2 . Let (x, y) ∈ A1 × B1 . Then x ∈ A1 (and y ∈ B1 ), so that A ⊆ A2 . On the other hand, (x, y) ∈ A2 × B2 implies x ∈ A2 (and y ∈ B2 ), so that A ⊆ A2 . Thus, A ⊆ A1 + A2 . Next, let again (x, y) belong to the right-hand side of (*). Then (x, y) ∈ A1 × B1 or (x, y) ∈ A2 × B2 . Now (x, y) ∈ A1 × B1 implies that x ∈ A1 (and y ∈ B1 ). Also, (x, y) belonging to the left-hand side of (*) implies (x, y) ∈ A × B, so that x ∈ A (and y ∈ B). Hence A1 ⊆ A. Likewise, (x, y) ∈ A2 × B2 implies A2 ⊆ A, so that A1 + A2 ⊆ A, and hence A = A1 + A2 . Next, let A = A1 + A2 . Then A × B = (A1 + A2 ) × B = (A1 × B) + (A2 × B). Also, A × B = (A1 × B1 ) + (A2 × B2 ). Thus, (A1 × B) + (A2 × B) = (A1 × B1 ) + (A2 × B2 ). (x, y) belonging to the left-hand side of (*) implies (x, y) ∈ A1 × B or (x, y) ∈ A2 × B. (x, y) ∈ A1 × B yields y ∈ B (and x ∈ A1 ). Same if (x, y) ∈ A2 × B. Also, (x, y) belonging to the right-hand side of (*) implies (x, y) ∈ A1 × B1 or (x, y) ∈ A2 × B2 . For (x, y) ∈ A1 × B1 , we have y ∈ B1 (and x ∈ A1 ), so that B ⊆ B1 . For (x, y) ∈ A2 × B2 , we have B ⊆ B2 likewise. Next, let again (x, y) belong to the right-hand side of (*). Then (x, y) ∈ A1 × B1 or (x, y) ∈ A2 × B2 . For (x, y) ∈ A1 × B1 , we have y ∈ B1 (and x ∈ A1 ). Thus B1 ⊆ B. For (x, y) ∈ A2 × B2 , we have B2 ⊆ B. It follows that B = B1 = B2 . To summarize: A1 ∩ A2 = implies A = A1 + A2 and B = B1 = B2 . Likewise, B1 ∩ B2 = implies B = B1 + B2 and A = A1 = A2 . Furthermore, A1 ∩ A2 = and B1 ∩ B2 = cannot happen simultaneously. Indeed, A1 ∩ A2 = implies A = A1 + A2 , and B1 ∩ B2 = implies B = B1 + B2 . Then A× B = (A1 + A2 )×(B1 + B2 ) = (A1 × B1 )+(A2 × B2 )+(A1 ×B2 )+(A2 ×B1 ). Also, A×B = (A1 ×B1 )+(A2 ×B2 ), so that : (A1 × B1 )+(A2 × B2 )+(A1 × B2 )+(A2 × B1 ) = (A1 × B1 )+(A2 × B2 ). Then (A1 × B2 ) + (A2 × B1 ) = implies (A1 × B2 ) = (A2 × B1 ) = , so that at least one of A1 , A2 , B1 , B2 = (by part (ii)). However, this is not possible by the fact that A1 × B1 = , A2 × B2 = . #
Revised Answers Manual to an Introduction
15.
(i) If either A or B = , then, clearly, A × B = . Next, if A × B = , and A = and B = , then there exist ω1 ∈ A and ω2 ∈ B, so that (ω1 , ω2 ) ∈ A × B, a contradiction. (ii) Both directions of the first assertion are immediate. Without the assumption E 1 and E 2 = , the result need not be true. Indeed, let 1 = 2 , A1 = , B1 = A2 = B2 = . Then E 1 = E 2 = , but A1 A2 . #
16.
(i) If at least one of A1 , . . . , An is = , then, clearly, A1 × . . . × An = . Next, let E = and suppose that Ai = , i = 1, . . . , n. Then there exists ωi ∈ Ai , i = 1, . . . , n, so that (ω1 , . . . , ωn ) ∈ E, a contradiction. (ii) Let ω = (ω1 , . . . , ωn ) ∈ E ∩ F, or (ω1 , . . . , ωn ) ∈ (A1 × . . . × An ) ∩ (B1 × . . . × Bn ). Then (ω1 , . . . , ωn ) ∈ A1 × . . . × An and (ω1 , . . . , ωn ) ∈ B1 × . . . × Bn . It follows that ωi ∈ Ai and ωi ∈ Bi , i = 1, . . . , n, so that ωi ∈ Ai ∩ Bi , i = 1, . . . , n, and hence (ω1 , . . . , ωn ) ∈ (A1 ∩ B1 ) × . . . × (An ∩ Bn ). Next, let (ω1 , . . . , ωn ) ∈ (A1 ∩ B1 ) × . . . × (An ∩ Bn ). Then ωi ∈ Ai ∩ Bi , i = 1, . . . , n, so that ωi ∈ Ai and ωi ∈ Bi , i = 1, . . . , n. It follows that (ω1 , . . . , ωn ) ∈ A1 × . . . × An and (ω1 , . . . , ωn ) ∈ B1 × . . . × Bn , so that (ω1 , . . . , ωn ) ∈ (A1 × . . . × An ) ∩ (B1 × . . . × Bn ). #
17. We have E = F + G and E, F, G are all = . This implies that Ai , Bi , and Ci , i = 1, . . . , n are all = ; this is so by Exercise 16(i). Furthermore, by Exercise 16(ii): F ∩ G = (B1 × . . . × Bn ) ∩ (C1 × . . . × Cn ) = (B1 ∩ C1 ) × . . . × (Bn ∩ Cn ), whereas F ∩ G = . It follows that B j ∩ C j = for at least one j, 1 ≤ j ≤ n. Without loss of generality, suppose that B1 ∩ C1 = . Then we shall show that A1 = B1 + C1 and Ai = Bi = Ci , i = 2, . . . , n. To this end, let ω j ∈ A j , j = 1, . . . , n. Then (ω1 , . . . , ωn ) ∈ A1 × . . . × An or (ω1 , . . . , ωn ) ∈ E or (ω1 , . . . , ωn ) ∈ (F + G). Hence (ω1 , . . . , ωn ) ∈ F or (ω1 , . . . , ωn ) ∈ G. Let (ω1 , . . . , ωn ) ∈ F. Then (ω1 , . . . , ωn ) ∈ B1 × . . . × Bn and hence ω1 ∈ B1 or ω1 ∈ (B1 ∪ C1 ), so that A1 ⊆ B1 ∪ C1 . Likewise if (ω1 , . . . , ωn ) ∈ G. Next, let ω j ∈ B j , j = 1, . . . , n. Then (ω1 , . . . , ωn ) ∈ B1 ×. . .× Bn or (ω1 , . . . , ωn ) ∈ F or (ω1 , . . . , ωn ) ∈ E or (ω1 , . . . , ωn ) ∈ (A1 × . . . × An ), hence ω1 ∈ A1 , which implies that B1 ⊆ A1 . By taking ω j ∈ C j , j = 1, . . . , n and arguing as before, we conclude that C1 ⊆ A1 . From B1 ⊆ A1 and C1 ⊆ A1 , we obtain B1 ∪ C1 ⊆ A1 . Since also A1 ⊆ B1 ∪ C1 , we get A1 = B1 ∪ C1 . Since B1 ∩ C1 = , we have then A1 = B1 + C1 . It remains for us to show that Ai = Bi = Ci , i = 2, . . . , n. Without loss of generality, it suffices to show that A2 = B2 = C2 , the remaining cases being treated symmetrically. As before, let ω j ∈ A j , j = 1, . . . , n. Then (ω1 , . . . , ωn ) ∈ (A1 ×. . .× An ) or (ω1 , . . . , ωn ) ∈ E or (ω1 , . . . , ωn ) ∈ (F +G). Hence either (ω1 , . . . , ωn ) ∈ F or (ω1 , . . . , ωn ) ∈ G. Let (ω1 , . . . , ωn ) ∈ F. Then (ω1 , . . . , ωn ) ∈ B1 × . . . × Bn and hence ω2 ∈ B2 , so that A2 ⊆ B2 .
e9
e10
Revised Answers Manual to an Introduction
Likewise A2 ⊆ C2 if (ω1 , . . . , ωn ) ∈ G. Next, let (ω1 , . . . , ωn ) ∈ B1 × . . . × Bn or (ω1 , . . . , ωn ) ∈ F or (ω1 , . . . , ωn ) ∈ (F + G) or (ω1 , . . . , ωn ) ∈ E or (ω1 , . . . , ωn ) ∈ (A1 × . . . × An ) and hence ω2 ∈ A2 , so that B2 ⊆ A2 . It follows that A2 = B2 . We arrive at the same conclusion A2 = B2 if we take (ω1 , . . . , ωn ) ∈ G. So, to sum it up, A1 = B1 + C1 , and A2 = B2 = C2 , and by symmetry, Ai = Bi = Ci , i = 3, . . . , n. A variation to the above proof is as follows. Let E = F + G or A1 × . . . × An = (B1 × . . . × Bn ) + (C1 × . . . × Cn ), and let (ω1 , . . . , ωn ) ∈ E. Then (ω1 , . . . , ωn ) ∈ A1 × . . . × An , so that ωi ∈ Ai , i = 1, . . . , n. Then ωi ∈ Bi , i = 1, . . . , n or ωi ∈ Ci , i = 1, . . . , n (but not both). So, Ai = Bi ∪Ci , i = 1, . . . , n and A j = B j +C j for at least one j. Consider the case n = 2, and without loss of generality suppose that A1 = B1 + C1 , A2 = B2 ∪ C2 . Then, clearly: A1 × A2 = (B1 + C1 ) × (B2 ∪ C2 ) = (B1 × B2 ) ∪ (C1 × C2 ) ∪ (B1 × C2 ) ∪ (C1 × B2 ). However, A1 × A2 = (B1 × B2 ) + (C1 × C2 ), and this implies that B1 × C2 ⊆ B1 × B2 and C1 × B2 ⊆ B1 × C2 , hence C2 ⊆ B2 and B2 ⊆ C2 , so that B2 = C2 (= A2 ). Next, assume the assertion to be true for n and consider: A1 × . . . × An × An+1 = (B1 × . . . × Bn × Bn+1 ) + (C1 × . . . × Cn × Cn+1 ), or An × An+1 = (B n × Bn+1 ) = (C n × Cn+1 ), where An = A1 × . . . × An , B n = B1 × . . . × Bn and C n = C1 × . . . × Cn . Apply the reasoning used in the case n = 2 by replacing A1 by An and A2 by An+1 (so that B1 , B2 and C1 , C2 are replaced, respectively, by B n , Bn+1 and C n , Cn+1 ) to get that: An = B n + C n , An+1 = Bn+1 ∪ Cn+1 . The first union is a “+” by the induction hypothesis. The second union may or may not be a “+” as of now. Then: An × An+1 = (B n ∪ C n ) × (Bn+1 ∪ Cn+1 ) = (B n × Bn+1 ) ∪ (C n × Cn+1 ) ∪ (B n × Cn+1 ) ∪ (C n × Bn+1 ). However, An × An+1 = (B n × Bn+1 ) + (C n × Cn+1 ). Therefore B n × Cn+1 ⊆ B n × Bn+1 and C n × Bn+1 ⊆ C n ×Cn+1 , so that Cn+1 ⊆ Bn+1 and Bn+1 ⊆ Cn+1 , and hence Bn+1 = Cn+1 . The proof is completed. # 18. The only properties of the σ -fields A1 and A2 used in the proof of Theorem 7 is that Ai , i = 1, 2 are closed under the intersection of two sets in them and also closed under complementations. Since these properties hold also for the case that Ai , i = 1, 2 are fields, Fi , i = 1, 2, the proof is completed. # 19. C as defined here need not be a σ -field. Here is a Counterexample: 1 = 2 = [0, 1]. For n ≥ 2, let In1 = [0, n1 ], In j = j ( j−1 n , n ], j = 2, . . . , n, and set E n j = In j × In j , j = 1, . . . , n. Also, let
Revised Answers Manual to an Introduction
n Qn = j=1 E n j , n ≥ 2. Then Q n belongs to the field of all finite sums of rectangles. Furthermore, it is clear that ∩∞ n=2 Q n = D, where D is the main diagonal determined by the origin and the point (1,1). (See picture below.) However, D is not in the class of all countable sums of rectangles, since it cannot be written as such. D is written as D = ∪x∈[0,1] (x, x), an uncountable union. 1
··
·· ··
0
··
1
Note: In the picture, the first rectangle E n1 = [0, n1 ] × [0, n1 ], and the subsequent j rectangles E n j are: E n j = ( j−1 n , n ], j = 2, 3, . . . , n. # 20. That C = is obvious. For A ∈ C, there exists A ∈ A such that A = X −1 (A ). Then Ac = [X −1 (A )]c = X −1 [(A )c ] with (A )c ∈ A . Thus Ac ∈ C. Finally, if A j ∈ C, j = 1, 2, . . . , then A j = X −1 (Aj ) with Aj ∈ A , and hence ∪∞ j=1 A j = ∞ ∞ ∞ ∞ −1 −1 ∪ j=1 A j with ∪ j=1 A j ∈ A , so that ∪ j=1 A j ∈ C, and ∪ j=1 X (A j ) = X C is a σ -field. # 21. That C = is obvious. For A ∈ C , there exists A ∈ A such that A = X −1 (A ). Then X −1 [(A )c ] = [X −1 (A )]c = Ac ∈ A, so that (A )c ∈ C . Finally, for Aj ∈ C , j = 1, 2, . . . , there exists A j ∈ A such that A j = X −1 (Aj ) and ∞ ∞ ∞ −1 X −1 ∪∞ j=1 A j = ∪ j=1 X (A j ) = ∪ j=1 A j ∈ A, so that ∪ j=1 A j ∈ C . It
follows that C is a σ -field. # 22. A simple example is the following. Let = {a, b, c, d}, A = {, {a}, {b, c, d}, }, X (a) = X (b) = 1, X (c) = 2, X (d) = 3. Then = {1, 2, 3} and X ({a}) = {1}, X ({b, c, d}) = {1, 2, 3}, so that C = {, {1}, {1, 2, 3}} which is not a σ -field. # n 23. Let X = i=1 αi I Ai and suppose that Ai ∈ A, i = 1, . . . , n. Then for any B ∈ B, X −1 (B) = ∪Ai where the union is taken over those is for which αi ∈ B.
e11
e12
Revised Answers Manual to an Introduction
24.
25.
26.
27.
28.
Since this union is in A, it follows that X is a r.v. Next, let X be a r.v. Then, by assuming without loss of generality that αi = α j , i = j, we have X −1 ({αi }) = A since {αi } ∈ B, i = 1, . . . , n. Clearly, the same reasoning applies when Ai ∈ ∞ αi I Ai . # X = i=1 Let ω belong to the right-hand side. Then X (ω) < r and Y (ω) < x − r for some r ∈ Q, so that X (ω) + Y (ω) < x and hence ω belongs to the left-hand side. Next, let ω belong to the left-hand side, so that X (ω) + Y (ω) < x or X (ω) < x − Y (ω). But then there exists r ∈ Q such that X (ω) < r < x −Y (ω) or X (ω) < r and r < x−Y (ω) or X (ω) < r and Y (ω) < x−r , so that ω belongs to the right-hand side. # If X is a r.v., then so is |X |, because for all x ≥ 0, we have |X |−1 ((−∞, x)) = (|X | < x) = (−x < X < x) ∈ A, since X is a r.v. That the converse is not necessarily true is seen by the following simple example. Take = {a, b, c, d}, A = {, {a, b}, {c, d}, }, and define X by: X (a) = −1, X (b) = 1, X (c) = −2, X (d) = 2. Then = {−2, −1, 1, 2}, and let A = P( ). We have |X |−1 ({1}) = {a, b}, |X |−1 ({2}) = {c, d}, |X |−1 ({−2}) = |X |−1 ({−1}) = , and all these sets are in A, so that |X | is measurable. However, X −1 ({−1}) = {a} and X −1 ({−2}) = {c}, none of which belongs in A, so that X is not measurable. As another example, let B be a non-Borel set in , and define X by: X (ω) = 1, ω ∈ B, and X (ω) = −1, ω ∈ B c . Then X is not B-measurable as X −1 ({1}) = B∈ / B, but |X |−1 ({1}) = ∈ B. # X + Y is measurable by Exercise 24. Next, (−Y ≤ y) = (Y ≥ −y) ∈ A, so that −Y is measurable. Then X + (−Y ) = X − Y is measurable. Now, if Z is mea√ √ surable, then so is Z 2 because, for z ≥ 0, (Z 2 ≤ z) = (− z ≤ Z ≤ z) ∈ A. Thus, if X , Y are measurable, then so are (X + Y )2 and (X − Y )2 , and therefore so is: (X + Y )2 − (X − Y )2 . But (X + Y )2 − (X − Y )2 = 4X Y . Thus, 4X Y is measurable, and then so is, clearly, X Y . Finally, if P(Y = 0) = 1, then, for y = 0, ( Y1 ≤ y) = (Y ≥ 1y ) ∈ A, so that Y1 is measurable. Thus, X and Y are measurable, and P(Y = 0) = 1, so that X and 1 1 X Y are measurable. Then X × Y = Y is measurable. # m Since σ (Tm ) = B , it suffices to show (by Theorem 2) that f −1 (Tm ) ⊆ B m for f to be measurable. By continuity of f , f −1 (Tm ) ⊆ Tn ⊆ B n , since σ (Tn ) = B n . Thus, f is measurable. Then, for B ∈ B m , [ f (X )]−1 = X −1 [ f −1 (B)] ∈ A, since f −1 (B) ∈ B n and X is measurable. # For any r.v. Z , it holds: Z = Z + − Z − and |Z | = Z + + Z − . Hence Z + = 1 1 − 2 (|Z | + Z ), Z = 2 (|Z | − Z ). Applying this to X , Y and X + Y , we get: X+ =
1 1 1 (|X | + X ), Y + = (|Y | + Y ), (X + Y )+ = [|X + Y | + (X + Y )]. 2 2 2
Hence X+ + Y + =
1 1 [(|X | + |Y |) + (X + Y )] ≥ [|X + Y | + (X + Y )] = (X + Y )+ . 2 2
Revised Answers Manual to an Introduction
Likewise, X− =
1 1 1 (|X | − X ), Y − = (|Y | − Y ), (X + Y )− = [|X + Y | − (X + Y )] 2 2 2
and hence X− + Y − =
1 1 [(|X | + |Y |) − (X + Y )] ≥ [|X + Y | − (X + Y )] = (X + Y )− . 2 2
Alternative proof: Let X + Y ≤ 0. Then (X + Y )+ = 0 = 0 + 0 ≤ X + + Y + . Let X + Y > 0. Then (X + Y )+ = X + Y ≤ X + + Y + , because X = X + − X − ≤ X + and Y = Y + − Y − ≤ Y + . Thus, (X + Y )+ ≤ X + + Y + . Again, let X + Y < 0. Then (X + Y )− = −(X + Y ) = −X − Y ≤ X − + Y − , because X = X + − X − or −X = X − − X + ≤ X − and Y = Y + − Y − or −Y = Y − − Y + ≤ Y − . Next, let X + Y ≥ 0. Then (X + Y )− = 0 = 0 + 0 ≤ X − + Y − , so that (X + Y )− ≤ X − + Y − . So, again: (X + Y )+ ≤ X + + Y + and (X + Y )− ≤ X − + Y − . # (i) From the definition of Bm , we have: B1 = A1 , and for m ≥ 2, Bm = 29. Ac1 ∩ . . . ∩ Acm−1 ∩ Am . (ii) For i = j (e.g., i < j), Bi is either A1 (for i = 1) or Bi = Ac1 ∩ . . . ∩ c ∩ Ai , whereas B j = Ac1 ∩ . . . ∩ Acj−1 ∩ A j , and Bi ∩ B j = , Ai−1 c because B i contains Ai and B j contains Ai (since i ≤ j − 1). ∞ (iii) Let ω = m=1 Bm . Then either ω ∈ B1 = A1 , and hence ω ∈ ∪∞ n=1 An , ∞ A . Thus, , i = 1, . . . , n − 1 and ω ∈ A , so that ω ∈ ∪ or ω ∈ / A i n n=1 n ∞ ∞ ∪∞ m=1 Bm ⊆ n=1 An . Next, let ω ∈ ∪n=1 An . Then either ω ∈ A1 = B1 , / Ai , i = 1, . . . , n − 1 and ω ∈ An . Then so that ω ∈ ∞ m , or ω ∈ m=1 B B ω ∈ Bn , so that ω ∈ ∞ m=1 m . # 30.
∞ (i) We have limn→∞ An = ∪∞ n=1 ∩k=n Ak , so that ω ∈ (lim n→∞ An ) or ∞ A , therefore ω ∈ ∩∞ A for some n , and hence ∩ ω ∈ ∪∞ 0 n=1 k=n k k=n 0 k ω ∈ Ak for all k ≥ n 0 . Next, let ω ∈ An for all but finitely many ns; i.e., ∞ ∞ ω ∈ An for all n ≥ n 0 . Then ω ∈ ∩∞ k=n 0 Ak and hence ω ∈ ∪n=1 ∩k=n Ak , which completes the proof. ∞ (ii) Here limn→∞ An = ∩∞ n=1 ∪k=n Ak , and hence ω ∈ (lim n→∞ An ) or ω ∈ ∞ ∞ ∞ ∩n=1 ∪k=n Ak implies that ω ∈ ∪∞ k=n Ak for n ≥ 1. From ω ∈ ∪k=1 Ak , ∞ let k1 be the first k for which ω ∈ Ak1 . Next, consider ∪k=k1 +1 Ak , and from ω ∈ ∪∞ k=k1 +1 Ak , let k2 be the first k (≥ k1 + 1) for which ω ∈ Ak2 . Continuing like this, we get that ω belongs to infinitely many An s. In the other way around, if ω belongs to infinitely many An s, that means that there exist 1 < k1 < k2 < . . . such that ω ∈ Ak j , j = 1, 2, . . . Then ∞ ω ∈ ∪∞ k=k j Ak , j ≥ 1, and hence ω ∈ ∪k=n Ak for 1 ≤ n ≤ k1 and ∞ k j < n < k j+1 , j ≥ 1. Thus, ω ∈ ∩∞ n=1 ∪k=n Ak and the result follows. #
∞ 31. From Ak ⊆ Bk , k ≥ 1, we have ∪∞ k=n Ak ⊆ ∪k=n Bk , n ≥ 1, and hence ∞ ∞ ∞ ∩∞ n=1 ∪k=n Ak ⊆ ∩n=1 ∪k=n Bk or lim n→∞ An ⊆ lim n→∞ Bn or (An i.o.) ⊆ (Bn i.o.) (by Exercise 2). #
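Remark (added illustration, not part of the original text): the set-theoretic limits computed in Exercises 32 and 33 below can also be checked numerically by truncating the infinite unions and intersections. A minimal Python sketch, using a finite grid and the interval sequence of Exercise 33 (assumed here, as in that solution, to be A_{2n−1} = [−1, 1/(2n−1)] and A_{2n} = [0, 1/(2n))):

```python
# Approximate lim inf A_n = U_n ∩_{k>=n} A_k and lim sup A_n = ∩_n U_{k>=n} A_k
# on a finite grid, truncating the infinite operations at K terms.
grid = [i / 100 for i in range(-100, 101)]      # grid points of [-1, 1]

def A(k):
    # A_{2m-1} = [-1, 1/(2m-1)],  A_{2m} = [0, 1/(2m))
    if k % 2 == 1:
        return {x for x in grid if -1.0 <= x <= 1.0 / k}
    return {x for x in grid if 0.0 <= x < 1.0 / k}

K = 400                                          # truncation of the inner tails

def tail_inter(n):
    s = A(n)
    for k in range(n + 1, K + 1):
        s &= A(k)
    return s

def tail_union(n):
    s = A(n)
    for k in range(n + 1, K + 1):
        s |= A(k)
    return s

liminf, limsup = set(), set(grid)
for n in range(1, 60):
    liminf |= tail_inter(n)
    limsup &= tail_union(n)

print(sorted(liminf))            # -> [0.0]                     (lim inf = {0})
print(min(limsup), max(limsup))  # -> -1.0 and a value near 0   (lim sup ≈ [-1, 0])
```

Increasing the truncation tightens the approximation toward {0} and [−1, 0], the values derived analytically below.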
32. We have lim inf_{n→∞} A_n = ∪_{n=1}^∞ ∩_{k=n}^∞ A_k and ∩_{k=n}^∞ A_k = ∩_{k=n}^∞ {r ∈ (1 − 1/(k+1), 1 + 1/k); r ∈ Q} = {1} for all n, so that ∪_{n=1}^∞ ∩_{k=n}^∞ A_k = {1}; i.e., lim inf_{n→∞} A_n = {1}.
Next, lim sup_{n→∞} A_n = ∩_{n=1}^∞ ∪_{k=n}^∞ A_k and ∪_{k=n}^∞ A_k = ∪_{k=n}^∞ {r ∈ (1 − 1/(k+1), 1 + 1/k); r ∈ Q} = {r ∈ (1 − 1/(n+1), 1 + 1/n); r ∈ Q}, so that ∩_{n=1}^∞ ∪_{k=n}^∞ A_k = ∩_{n=1}^∞ {r ∈ (1 − 1/(n+1), 1 + 1/n); r ∈ Q} = {1}. Thus, lim inf_{n→∞} A_n = lim sup_{n→∞} A_n = {1} = lim_{n→∞} A_n. #
33. Here lim inf_{n→∞} A_n = ∪_{n=1}^∞ ∩_{k=n}^∞ A_k, and consider the ∩_{k=n}^∞ A_k for n odd or even. Then
∩_{k=2n−1}^∞ A_k = (∩_{k odd ≥ 2n−1} A_k) ∩ (∩_{k even ≥ 2n} A_k),
and A_{2n−1} ∩ A_{2n+1} ∩ … = [−1, 1/(2n−1)] ∩ [−1, 1/(2n+1)] ∩ … = [−1, 0], A_{2n} ∩ A_{2n+2} ∩ … = [0, 1/(2n)) ∩ [0, 1/(2n+2)) ∩ … = {0}, so that ∩_{k=2n−1}^∞ A_k = [−1, 0] ∩ {0} = {0}. Next,
∩_{k=2n}^∞ A_k = (∩_{k even ≥ 2n} A_k) ∩ (∩_{k odd ≥ 2n+1} A_k),
and A_{2n} ∩ A_{2n+2} ∩ … = [0, 1/(2n)) ∩ [0, 1/(2n+2)) ∩ … = {0}, A_{2n+1} ∩ A_{2n+3} ∩ … = [−1, 1/(2n+1)] ∩ [−1, 1/(2n+3)] ∩ … = [−1, 0], so that ∩_{k=2n}^∞ A_k = {0} ∩ [−1, 0] = {0}. It follows that ∪_{n=1}^∞ ∩_{k=n}^∞ A_k = {0} = lim inf_{n→∞} A_n.
Next, lim sup_{n→∞} A_n = ∩_{n=1}^∞ ∪_{k=n}^∞ A_k, and consider the ∪_{k=n}^∞ A_k for odd and even values of n. We have
∪_{k=2n−1}^∞ A_k = (∪_{k odd ≥ 2n−1} A_k) ∪ (∪_{k even ≥ 2n} A_k),
and A_{2n−1} ∪ A_{2n+1} ∪ … = [−1, 1/(2n−1)] ∪ [−1, 1/(2n+1)] ∪ … = [−1, 1/(2n−1)], A_{2n} ∪ A_{2n+2} ∪ … = [0, 1/(2n)) ∪ [0, 1/(2n+2)) ∪ … = [0, 1/(2n)), so that ∪_{k=2n−1}^∞ A_k = [−1, 1/(2n−1)] ∪ [0, 1/(2n)) = [−1, 1/(2n−1)]. Next,
∪_{k=2n}^∞ A_k = (∪_{k even ≥ 2n} A_k) ∪ (∪_{k odd ≥ 2n+1} A_k),
and A_{2n} ∪ A_{2n+2} ∪ … = [0, 1/(2n)) ∪ [0, 1/(2n+2)) ∪ … = [0, 1/(2n)), A_{2n+1} ∪ A_{2n+3} ∪ … = [−1, 1/(2n+1)] ∪ [−1, 1/(2n+3)] ∪ … = [−1, 1/(2n+1)], so that ∪_{k=2n}^∞ A_k = [0, 1/(2n)) ∪ [−1, 1/(2n+1)] = [−1, 1/(2n)). It follows that
∩_{n=1}^∞ ∪_{k=n}^∞ A_k = [−1, 1] ∩ [−1, 1/2) ∩ [−1, 1/3] ∩ [−1, 1/4) ∩ … = [−1, 0] = lim sup_{n→∞} A_n.
So, lim inf_{n→∞} A_n = {0} and lim sup_{n→∞} A_n = [−1, 0], so that lim_{n→∞} A_n does not exist. #
34.
(i) We have: {[0, 1), [1, 2), . . . , [n − 1, n)} ⊂ {[0, 1), [1, 2), . . . , [n − 1, n), [n, n + 1)}, and hence A_n ⊆ A_{n+1}. That A_n ⊂ A_{n+1} (strictly) follows from the fact that, e.g., [n, n + 1) cannot belong to A_n, since every member of A_n is either contained in [0, n) or contains the set (−∞, 0) ∪ [n, ∞).
(ii) Let A_1 ∈ A_1, A_2 ∈ A_2 but not in A_1, . . . , A_n ∈ A_n but not in A_{n−1}, . . . , and set A = ∪_{i=1}^∞ A_i. Then A ∉ ∪_{n=1}^∞ A_n, because otherwise A ∈ A_n for some n. However, this is not possible since ∪_{i=n+1}^∞ A_i ∉ A_n.
(iii) A_1 = {∅, [0, 1), [0, 1)^c = (−∞, 0) ∪ [1, ∞), ℝ}, A_2 = {∅, [0, 1), [1, 2), (−∞, 0) ∪ [1, ∞), (−∞, 1) ∪ [2, ∞), [0, 2), (−∞, 0) ∪ [2, ∞), ℝ}. #
35.
(i) First, observe that all intersections A1 ∩ . . . ∩ An are pairwise disjoint, so that their unions are, actually, sums. Next, if A and B are in C, it is clear that A ∪ B is a sum of intersections A1 ∩ . . . ∩ An (the sum of those intersections in A and those intersections in B), so that A ∪ B is in C. Now, if A ∈ C, then Ac is the sum of all those intersections A1 ∩ . . . ∩ An which are not part of A. Hence Ac is also in C, and C is a field. (ii) In forming A1 ∩ . . . ∩ An , we have 2 choices at each one of the n steps. Thus, there are 2n sets of the form A1 ∩ . . . ∩ An . Next, in forming their sums, we select k of those members atan time, where k = 0,n 1, . . . , 2n . n 2 2n Therefore the total number of sums is: 0 + 1 + . . . + 22n = 22 . #
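As an added sanity check (not part of the original argument), the class C of part (i) can be enumerated on a small concrete example and its members counted; the universe and the sets A1, A2 below are illustrative choices for which all 2^n basic intersections are nonempty and distinct, so that the count 2^(2^n) of part (ii) is attained (here 16 for n = 2):

```python
from itertools import product, combinations

def generated_class(universe, sets):
    """All unions of the basic intersections A_1' ∩ ... ∩ A_n'
    (each factor taken to be the set itself or its complement)."""
    universe = set(universe)
    atoms = set()
    for choice in product([True, False], repeat=len(sets)):
        atom = set(universe)
        for keep, A in zip(choice, sets):
            atom &= A if keep else universe - A
        atoms.add(frozenset(atom))
    cls = set()
    for r in range(len(atoms) + 1):
        for combo in combinations(atoms, r):
            cls.add(frozenset().union(*combo) if combo else frozenset())
    return cls

universe = range(8)
A1, A2 = {0, 1, 2, 3}, {0, 1, 4, 5}
C = generated_class(universe, [A1, A2])
print(len(C))                                              # 16 = 2**(2**2)
assert all(frozenset(set(universe) - S) in C for S in C)   # closed under complements
```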
36.
(i) If ω ∈ A, then f (ω) ∈ f (A) and ω ∈ f −1 [ f (A)]. For a concrete example, take f : → [0, 1) where f (x) = x 2 , and let A = [0, 1). Then f (A) = f ([0, 1]) = [0, 1), and f −1 ([0, 1)) = (−1, 1). It follows that f −1 [ f (A)] = f −1 ([0, 1)) = (−1, 1) ⊃ [0, 1) = A. (ii) Let ω ∈ f [ f −1 (B)] which implies that there exists ω ∈ f −1 (B) such that f (ω) = ω . Also, ω ∈ f −1 (B) implies that f (ω) ∈ B. Since also f (ω) = ω , it follows that ω ∈ B. Thus f [ f −1 (B)] ⊆ B. For a concrete example, let f : → with f (x) = c. Take B = (c − 1, c + 1), so that f −1 [(c − 1, c + 1)] = and f ( ) = {c} ⊂ (c − 1, c + 1). That is, f [ f −1 (B)] = {c} ⊂ (c − 1, c + 1) = B. #
37.
(i) Since X −1 ({−1}) = A1 , X −1 ({1}) = Ac1 ∩ A2 , and X −1 ({0}) = Ac1 ∩ Ac2 , and A1 , Ac1 ∩ A2 , Ac1 ∩ Ac2 are in A, X is a r.v.
(ii) We have X −1 ({−1}) = {a, b}, X −1 ({1}) = {c}, X −1 ({2}) = {d}, and neither {c} nor {d} are in A. Then X is not A-measurable. (iii) We have X −1 ({−2}) = {−2}, X −1 ({−1}) = {−1}, X −1 ({0}) = {0}, X −1 ({1}) = {1}, X −1 ({2}) = {2}, so that X −1 (B) is the field induced in by the partition: {{−2}, {−1}, {0}, {1}, {2}}. The values taken on by X 2 are 0, 1, 4, and (X 2 )−1 ({0}) = {0}, (X 2 )−1 ({1}) = {−1, 1}, (X 2 )−1 ({4}) = {−2, 2}, so that the field induced by X 2 is the one generated by the sets {0}, {−1, 1}, {−2, 2}, and it is, clearly, strictly contained in the one induced by X . # 38. For a fixed k, let Ak,n = (X k , . . . , X k+n−1 )−1 (B). Then the σ -fields Ak,n , n ≥ 1, form a nondecreasing sequence and therefore Fk = ∪∞ n=1 Ak,n is a field (but it may fail to be a σ -field; see Exercise 10 in this chapter) and Bk = σ (Fk ). Likewise, Bl = σ (Fl ) where Fl = ∪∞ n=1 Al,n . ∞ ∞ ∞ However, ∪∞ n=k An ⊇ ∪n=l An , so that Bk = σ (∪n=k An ) ⊇ σ (∪n=l An ) = Bl . This is so by the way the σ -fields Bk and Bl are generated (see Theorem 2(ii) in this chapter). # 39. Since Sk is a function of the X j s, j = 1, . . . , k, k = 1, . . . , n it follows that σ (Sk ) ⊆ σ (X 1 , . . . , X n ), k = 1, . . . , n. Hence ∪nk=1 σ (Sk ) ⊆ σ (X 1 , . . . , X n ) and then σ (∪nk=1 σ (Sk )) ⊆ σ (X 1 , . . . , X n ) or σ (S1 , . . . , Sn ) ⊆ σ (X 1 , . . . , X n ). Next, X k = Sk − Sk−1 , k = 1, . . . , n (S0 = 0), so that X k is a function of the S j s, k = 1, . . . , n. Then, as above, σ (X 1 , . . . , X n ) ⊆ σ (S1 , . . . , Sn ), and equality follows. # 40. Consider the function f : → defined by y = f (x) = x + c. Then, clearly, f (B) = Bc . The existing inverse of f , f −1 , is given by: x = f −1 (y) = x − c, and it is clear that ( f −1 )(Bc ) = B. By setting g = f −1 , so that g −1 = f , we have that g −1 (B)(= f (B)) = Bc . So, g −1 is continuous and hence measurable, and g −1 (B) = Bc . Since B is measurable then so is Bc . # (i) Clearly, F = . Next, to show that F is closed under complementation. 41. Indeed, if A ∈ F, then n
n
mi
j
A = ∪ Ai = ∪ ∩ Ai i=1
=
i=1 j=1 1 1 1 (A1 ∩ . . . ∩ Am 1 ) ∪ . . . ∪ (An
n ∩ . . . ∩ Am n )
mn 1 1 with all A11 , . . . , Am 1 , . . . , An , . . . , An in F1 , so that 1 mn c 1 Ac = [A11 ∩ . . . ∩ Am 1 ) ∪ . . . ∪ (An ∩ . . . ∩ An )] 1 c mn c 1 c = [(A11 )c ∪ . . . ∪ (Am 1 ) ] ∩ . . . ∩ [(An ) ∪ . . . ∪ (An ) ] m1
mn
i 1 =1
i n =1
= ∪ . . . ∪ [(Ai11 )c ∩ . . . ∩ (Ainn )c ]. The fact that Ai11 , . . . , Ainn are in F1 implies that (Ai11 )c , . . . , ( Ainn )c are also in F1 , as follows from the definition of F1 . So, Ac is a finite union of a finite intersection of members of F1 , and hence Ac ∈ F3 (= F),
by the definition of F3 . Next, let A, B ∈ F. To show that A ∪ B ∈ F. Indeed, A, B ∈ F implies that A = A1 ∪ . . . ∪ Am = (A11 ∩ . . . ∩ Ak11 ) ∪ . . . ∪ (A1m ∩ . . . ∩ Akmm ) with Ai1 , . . . , Aiki in F1 , i = 1, . . . , m, B = B1 ∪ . . . ∪ Bn = (B11 ∩ . . . ∩ B1l1 ) ∪ . . . ∪ (Bn1 ∩ . . . ∩ Bnln ) with l
B 1j , . . . , B j j in F1 , j = 1, . . . , n, so that A ∪ B = [(A11 ∩ . . . ∩ Ak11 ) ∪ . . . ∪ (A1m ∩ . . . ∩ Akmm )] ∪ [(B11 ∩ . . . ∩ B1l1 ) ∪ . . . ∪ (Bn1 ∩ . . . ∩ Bnln )] = (A11 ∩ . . . ∩ Ak11 ) ∪ . . . ∪ (A1m ∩ . . . ∩ Akmm ) ∪ (B11 ∩ . . . ∩ B1l1 ) ∪ . . . ∪ (Bn1 ∩ . . . ∩ Bnln ), which is a finite union of finite intersections of members of F1 . It follows that A ∪ B is in F3 (= F), so that F is a field. (ii) Trivially, C ⊆ F, so that F(C) ⊆ F. To show that F ⊆ F(C). Let A ∈ F. mn 1 1 Then, by part (i), A = (A11 ∩ . . . ∩ Am 1 ) ∪ . . . ∪ (An ∩ . . . ∩ An ) with mn m1 1 1 all A1 , . . . , A1 , . . . , An , . . . , An in F1 . Clearly, F1 ⊆ F(C) by the definition of F1 . Thus, Ai1 , . . . , Aim i are in F(C), for i = 1, . . . , n, and then the intersections Ai1 ∩ . . . ∩ Aim i , i = 1 1, . . . , n are in F(C), and therefore so is their union (A11 ∩ . . . ∩ Am 1 )∪ mn 1 . . . ∪ (An ∩ . . . ∩ An ). Since this union is A, it follows that A ∈ F(C). Thus, F ⊆ F(C), and the proof is completed. # Remark: In Exercise 41, in the proof that A ∈ F implies Ac ∈ F, the following property was used (in a slightly different notation for simplification); namely, (C11 ∪ . . . ∪ C1m 1 ) ∩ . . . ∩ (Cn1 ∪ . . . ∪ Cnm n ) = 1 ∪im1 =1 . . . ∪imn n=1 (C1i1 ∩ . . . ∩ Cnin ). This is justified as follows: Let ω belong to the right-hand side. Then ω belongs to at leats one of the m 1 × . . . × m n members of the union, for i
i
example, ω ∈ (C11 ∩ . . . ∩ Cnn ) for some 1 ≤ i 1 ≤ m 1 , . . . , 1 ≤ i n ≤ m n . But then ω ∈ (C11 ∪ . . . ∪ C1m 1 ), . . . , ω ∈ (Cn1 ∪ . . . ∪ Cnm n ), and therefore ω ∈ [C11 ∪ . . . ∪ C1m 1 ) ∩ . . . ∩ (Cn1 ∪ . . . ∪ Cnm n )], or ω belongs to the left-hand side. Next, let ω belong to the left-hand side. Then ω ∈ i
i
(C11 ∪. . .∪C1m 1 ), . . . , ω ∈ (Cn1 ∪. . .∪Cnm n ), so that ω ∈ C11 , . . . , ω ∈ Cnn i
i
for some 1 ≤ i 1 ≤ m 1 , . . . , 1 ≤ i n ≤ m n . But then ω ∈ (C11 ∩ . . . ∩ Cnn ), i
i
and C11 ∩ . . . ∩ Cnn is one of the m 1 × . . . × m n members of the union on the right-hand side. It follows that ω belongs to the right-hand side, and the justification is completed. # ∞ A , A = A1 ∩ A2 ∩. . . with A1 , A2 , . . . in A , i ≥ 1. 42. Let A ∈ A. Then A = ∪i=1 i i 1 i i i i Then ∞
∞
∞
i=1
i=1
i=1
Ac = ( ∪ Ai )c = ∩ Aic = ∩ (Ai1 ∩ Ai2 ∩ . . .)c
∞
= ∩ [(Ai1 )c ∪ (Ai2 )c ∪ . . .] i=1
= [(A11 )c ∪ (A21 )c ∪ . . .] ∩ [(A12 )c ∪ (A22 )c ∪ . . .] ∩ . . . ∩ [(A1n )c ∪ (A2n )c ∪ . . .] ∩ . . . , and this is equal to ∪[(Ai11 )c (∩Ai22 )c ∩ . . . ∩ (Ainn )c ∩ . . .] with i 1 , i 2 , . . . , i n , . . . integers ≥ 1, and the union extends over all choices of the sets (Ai11 )c , (Ai22 )c , . . . , (Ainn )c , . . . from the respective collections: (Ai1 )c , (Ai2 )c , . . . , (Ain )c , . . . , i = 1, 2, . . . , n, . . . However, these choices produce 0 N0 × N0 × . . . × N0 × . . . = NN 0 = N where N0 and N are the cardinal numbers of a countable set and of the continuum, respectively. Thus, there are uncountable members in the union, and hence the union need not be in A. In other word, Ac need not be in A, so that A need not be a σ -field. Remark: For the justification of the equality, asserted in the derivations related to Ac , refer to the remark following the proof of Exercise 41. #
Chapter 2 Definition and Construction of a Measure and its Basic Properties 1. If is finite, then μ is ≥ 0, μ() = 0 and finitely additive (since there are only finitely many subsets of ). Thus, μ is a measure, and also finite. If is μ ≥ 0, μ() = 0, and if A n , n ≥ 1, denumerable, = {ω1 , ω2 , . . .}, then ∞ ∞ = ∞ and A An = ∞ are = and pairwise disjoint, then μ n=1 n n=1 μ since each term is ≥ 1. Thus, μ is a measure. It is σ -finite, since = ∞ n=1 {ωn } and μ({ωn }) = 1 (finite). # n 2. (i) Let Ai ∈ C, i = 1, . . . , n, Ai ∩ A j = , i = j, and set A = i=1 Ai , so that A ∈ C. Then either A is finite or Ac is finite. If A is finite, then all Ai , i = 1, . . . , n, are finite, and therefore P(A) = 0 = 0 + . . . + 0 = P(A1 ) + . . . + P(An ). If Ac is finite, then A is not finite and hence at least one of A1 , . . . , An is not finite; call Ai0 such an event. We claim that Ai0 is unique. Indeed, if Ai and A j , i = j, are not finite, then Aic and Acj are finite. Since Ai ∩ A j = , it follows that Ai ⊂ Acj and n P(Ai ) = P(Ai0 ) = 1 hence Ai is finite, a contradiction. Then i=1 ) = 0, i = i , as being all finite), and P(A) = 1. Hence (since P(A i 0 n P(Ai ). P(A) = i=1 (ii) Let = {ω1 , ω2 , . . .} and take Ai = {ωi }, so that Ai ∩ A j = , ∞ Ai (= P()) = 1 i = j, and P(Ai ) = 0 for all i. However, P i=1 n ∞ c = finite). Therefore since i=1 Ai ∞ i=1 Ai is infinite(and ∞ = 1 = 0 = A P(A ), and P is not σ -additive. P i i=1 i i=1
(iii) Let An ∈ C, n ≥ 1, Ai ∩ A j = , i = j, and set A = ∞ n=1 An , so that A ∈ C. Then either A is finite or Ac is finite. If A is finite, then all An s are s) and hence finite (indeed, A is only the sum of finitely many of the An P(An ) = 0 for all n, and also P(A) = 0. Thus, P(A) = ∞ n=1 P(An ) (actually, the σ -additivity here degenerates to finite additivity). If Ac is finite, then A is infinite. Since is uncountable, it follows that at least one of the An s is infinite, because otherwise A would be countable (so that A + Ac = is countable, a contradiction) ; call An 0 such an event. We claim that An 0 is unique. Indeed, if Ai and A j , i = j, are infinite, then Aic and Acj are finite. Since Ai ∩ A j = , it follows that Ai ⊂ Acj and hence Ai is finite, a contradiction. Then ∞ n=1 P(An ) = P(An 0 ) = 1 ) = 0, n = n , as being all finite), and P(A) = 1. Hence (since P(A 0 n P(A ). P(A) = ∞ n n=1 Finally, it is clear that P(A) ≥ 0, P() = 0 and P() = 1. These properties along with the σ -additivity just established make P a probability measure. # 3. Clearly, P(A) ≥ 0, P() = 0 and P() = 1 since c = countable. It remains to establish σ -additivity. Let An ∈ C, n ≥ 1, Ai ∩ A j = , i = j, and c set A = ∪∞ n=1 An . Since A ∈ C, it follows that either A is countable or A is countable. If A is countable, then all A n s are countable, and hence P(A) = 0 and c P(An ) = 0, n ≥ 1, so that P(A) = ∞ n=1 P(An ). If A is countable, then A is uncountable, and therefore at least one of the An s is uncountable; call An 0 such an event. We claim that An 0 is unique. Indeed, if Ai and A j , i = j, are uncountable, then Aic and Acj are countable. Since Ai ∩ A j = , it follows that Ai ⊂ Acj and hence Ai is countable, a contradiction. Then ∞ n=1 P(An ) = P(An 0 ) = 1 ) = 0, i = n , as being all countable), and P(A) = 1. Hence (since P(A i 0 P(A) = ∞ n=1 P(An ). # Acn ≤ 4. P(An ) = 1 if and only if P(Acn ) = 0, which implies that P ∪∞ n=1
∞ c ∞ c c = 0 or P ∩∞ = 0, n=1 P(An ) = 0; i.e., P ∪n=1 An n=1 An ∞ and hence P ∩n=1 An = 1. # 5. For each n ≥ 2, there are at most n − 1 events Ai s for which P(Ai ) > n1 , because otherwise, we could choose n events with P(Ai j ) > n1 , so that nj=1 P(Ai j ) > 1. n (by pairwise However, nj=1 Ai j ⊆ and nj=1 P(Ai j ) = P A i j j=1 disjointness), and this is ≤ P() = 1, a contradiction. Thus, if In = {i ∈ I ; P(Ai ) > n1 }, then the cardinality of In is ≤ n−1. Set I0 = {i ∈ I ; P(Ai ) > 0}. Then, clearly, I0 = ∪∞ n=2 In , and since each In is finite, I0 is countable. # 6. Clearly, μ(A) ≥ 0 and μ() = 0. To establish σ -additivity. To this end, let An ∈ A, Ai ∩ A j = , i = j, and set A = ∞ n=1 An . Then: μ(A) =
ωn ∈A
pn =
∞ i=1 ωn ∈Ai
pn =
∞ i=1
μ(Ai ). #
7. Let + = {ωn s; pn > 0}. Then the atoms are those A which are of the form: A = {ωn } ∪ N , where ⊆ N ⊆ −+ . # ∞ ∞ A = μ limn→∞ ∩i=n = limn→∞ 8. μ limn→∞ An = μ ∪∞ i n=1 ∩i=n Ai ∞ ∞ A ⊆ A . Next, μ lim μ ∩i=n Ai ≤ lim μ(A ) since ∩ An = n i n n→∞ n→∞ i=n ∞ A ∞ A ∞ A , provided = μ limn→∞ ∪i=n = limn→∞ μ ∪i=n ∪i=n μ ∩∞ i i i n=1 ∞ A < ∞ for some n, and this is ≥ lim ∞ μ ∪i=n i n→∞ μ(An ) since ∪i=n Ai ⊇ An . # 9. μ0 is an outer measure; i.e., μ0 () = 0, μ0 is ↑, and μ0 is sub-σ -additive, because: μ0 () = I (ω) = 0; A ⊆ B implies I A (ω0 ) ≤I B (ω0 ), so that ∞ ∞ A (ω0 ) ≤ μ0 (A) = I A (ω0 ) ≤ I B (ω0 ) = μ0 (B); clearly, I∪i=1 i=1 I Ai (ω0 ), so i ∞ ∞ ∞ A = I ∞ 0 that μ0 ∪i=1 i ∪i=1 Ai (ω0 ) ≤ i=1 I Ai (ω0 ) = i=1 μ (Ai ). # 10.
[Figure for Exercise 10: Ω is depicted as a rectangular array of points arranged in ten columns C1, . . . , C10.]
That μ0(∅) = 0 and that μ0 is nondecreasing (↑) are obvious. Denote by Ci, i = 1, . . . , 10, the i-th column, and let An ⊆ Ω, n ≥ 1. To show: μ0(∪_{n≥1} An) ≤ Σ_{n≥1} μ0(An). Set A = ∪_{n≥1} An and suppose μ0(A) = k. Then there exist k columns C_{i_1}, . . . , C_{i_k} such that C_{i_j} ∩ A ≠ ∅, j = 1, . . . , k. This implies that there exists at least one x_j ∈ C_{i_j} ∩ A with x_j ∈ C_{i_j} and x_j ∈ A, so that x_j ∈ C_{i_j} and x_j ∈ A_{n_j}, j = 1, . . . , k, where n_1, . . . , n_k are chosen from the set {1, 2, . . .} and need not be distinct. Then μ0(A_{n_j}) ≥ 1, j = 1, . . . , k, and therefore:
k ≤ Σ_{j=1}^k μ0(A_{n_j}) ≤ Σ_{n≥1} μ0(An), or μ0(∪_{n≥1} An) ≤ Σ_{n≥1} μ0(An). #
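Remark (added numerical check, not in the original): with Ω realized as a finite grid of points in ten columns and μ0(A) = number of columns that A meets, the monotonicity and subadditivity just verified can be confirmed mechanically; the grid size and the random sampling below are arbitrary choices:

```python
import random

COLS, ROWS = 10, 10
OMEGA = [(i, j) for i in range(COLS) for j in range(ROWS)]   # (column, row)

def mu0(A):
    return len({i for (i, _) in A})    # number of distinct columns met by A

random.seed(0)
for _ in range(1000):
    A = {p for p in OMEGA if random.random() < 0.2}
    B = {p for p in OMEGA if random.random() < 0.2}
    assert mu0(set()) == 0
    assert mu0(A) <= mu0(A | B)              # monotonicity (A ⊆ A ∪ B)
    assert mu0(A | B) <= mu0(A) + mu0(B)     # subadditivity
print("outer-measure properties hold on all sampled sets")
```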
11.
12.
(i) In the first place, it is clear that μ∗ () = 0 and μ∗ () = 1. Next, let ⊂ A ⊂ . The only covering of A by member of F is , so that μ∗ (A) = 1. Thus, μ∗ (A) = 0 if A = and μ∗ (A) = 1 if A = . (ii) First, and are μ∗ -measurable, and let ⊂ A ⊂ (which implies ⊂ Ac ⊂ ). Then A cannot be μ∗ -measurable. Indeed, in the required equality μ∗ (D) = μ∗ (A ∩ D) + μ∗ (Ac ∩ D), take D = . Then the left-hand side is μ∗ (D) = μ∗ () = 1, and the right-hand side is μ∗ (A ∩ ) + μ∗ (Ac ∩ ) = μ∗ (A) + μ∗ (Ac ) = 1 + 1 = 2, and the equality is violated. Hence A∗ = {, }. # (i) C is not a field because, e.g., {ω1 , ω2 } ∪ {ω1 , ω3 } = {ω1 , ω2 , ω3 } ∈ / C. (ii) Clearly, μ(A) ≥ 0 and μ() = 0. The only two disjoint sets whose sum is also in C are: {ω1 , ω2 } + {ω3 , ω4 } = , {ω1 , ω3 } + {ω2 , ω4 } = , and, by taking measures, we have: 3+3 = 6, 3+3 = 6, so that μ is a measure. (iii) On C: μ1 () = μ2 () = 0, μ1 () = μ2 () = 6, μ1 ({ω1 , ω2 }) = 3 = μ2 ({ω1 , ω2 }), μ1 ({ω1 , ω3 }) = 3 = μ2 ({ω1 , ω2 }), μ1 ({ω2 , ω4 }) = 3 = μ2 ({ω2 , ω4 }), μ1 ({ω3 , ω4 }) = 3 = μ2 ({ω3 , ω4 }), so that μ1 = μ2 on C. (iv) Write out the subsets of and their coverages by unions of members of C with the smallest measures to get: ω1 : {ω1 , ω2 } ω2 : {ω1 , ω2 } ω3 : {ω1 , ω3 } ω4 : {ω2 , ω4 } {ω1 , ω2 } : {ω1 , ω2 } {ω1 , ω3 } : {ω1 , ω3 } {ω1 , ω4 } : {ω1 , ω2 } ∪ {ω2 , ω4 } ∪, {ω1 , ω2 } ∪ {ω3 , ω4 }, {ω1 , ω3 } ∪ {ω2 , ω4 }, {ω1 , ω3 } ∪ {ω3 , ω4 } {ω2 , ω3 } : {ω1 , ω2 } ∪ {ω1 , ω3 } ∪, {ω1 , ω2 } ∪ {ω3 , ω4 }, {ω1 , ω3 } ∪ {ω2 , ω4 }, {ω2 , ω4 } ∪ {ω3 , ω4 } {ω2 , ω4 } : {ω2 , ω4 } {ω3 , ω4 } : {ω3 , ω4 } {ω1 , ω2 , ω3 } : {ω1 , ω2 } ∪ {ω1 , ω3 } {ω1 , ω2 , ω4 } : {ω1 , ω2 } ∪ {ω2 , ω4 } {ω1 , ω3 , ω4 } : {ω1 , ω3 } ∪ {ω2 , ω4 } {ω2 , ω3 , ω4 } : {ω2 , ω4 } ∪ {ω3 , ω4 }. Then:
μ∗ ({ω1 }) = μ∗ ({ω2 }) = μ∗ ({ω3 }) = μ∗ ({ω4 }) = 3, μ∗ ({ω1 , ω2 }) = μ∗ ({ω1 , ω3 }) = μ∗ ({ω2 , ω4 }) = μ∗ ({ω3 , ω4 }) = 3, μ∗ ({ω1 , ω4 }) = μ∗ ({ω2 , ω3 }) = 6, μ∗ ({ω1 , ω2 , ω3 }) = μ∗ ({ω1 , ω2 , ω4 }) = μ∗ ({ω1 , ω3 , ω4 }) = μ∗ ({ω2 , ω3 , ω4 }) = 6. (v) By part (iv), μ∗ = μ1 = μ2 because, e.g., μ1 ({ω1 , ω4 }) = 2, μ2 ({ω1 , ω4 }) = 4 and μ∗ ({ω1 , ω4 }) = 6, all distinct. # 13.
(i) Immediate. (ii) The only partition of with members in C is {A, Ac } and μ(A) = μ(Ac ) = 0 (Ac = {0, 2, 4, . . .}). (iii) On C, μ1 () = μ2 () = 0, and μ1 (A) = μ1 (Ac ) = μ1 () = ∞ = μ2 (A) = μ2 (Ac ) = μ2 (). (iv) Let ⊂ B ⊂ . Then the only possible coverages of B by members of C are: A, Ac , , all of which have μ-measure ∞. Thus, μ∗ (B) = ∞ for every B as above. (v) Let ⊂ B ⊂ . Then if D ⊂ is = , from = (B ∩)+(B c ∩), it follows that 0 = 0, whereas for D = , the relation D = (B ∩ D)+(B c ∩ D) implies that at least one of B ∩ D and B c ∩ D is = . Hence ∞ = ∞ and the equality holds again. Since and are always μ∗ -measurable, it follows that A∗ = P(). #
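Remark (added check, not part of the original): the covering computations of Exercise 12(iv) above can be reproduced by brute force; in the sketch below Ω = {1, 2, 3, 4} stands in for {ω1, ω2, ω3, ω4}, and C with its μ-values is the class given there:

```python
from itertools import combinations

OMEGA = frozenset({1, 2, 3, 4})
C = {frozenset(): 0, frozenset({1, 2}): 3, frozenset({1, 3}): 3,
     frozenset({2, 4}): 3, frozenset({3, 4}): 3, OMEGA: 6}

def mu_star(B):
    """Outer measure: cheapest covering of B by members of C."""
    best = float("inf")
    members = list(C)
    for r in range(1, len(members) + 1):        # finite coverings suffice here
        for cover in combinations(members, r):
            if B <= frozenset().union(*cover):
                best = min(best, sum(C[S] for S in cover))
    return best

print(mu_star(frozenset({1})))      # 3
print(mu_star(frozenset({1, 4})))   # 6
print(mu_star(frozenset({2, 3})))   # 6
print(mu_star(OMEGA))               # 6
```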
15.
(i) To show that A M = (A − N ) ∪ [N ∩ (A M)], where M ⊆ N . We have A M = (A M) ∩ = (A M) ∩ (N ∪ N c ) = [(A M) ∩ N ] ∪ [(A M) ∩ N c ] = [N ∩ (A M)] ∪ {[(A − M) ∪ (M − A)] ∩ N c } = [N ∩ (A M)] ∪ {[(A ∩ M c ) ∪ (Ac ∩ M)] ∩ N c } = [N ∩ (A M)] ∪ (A ∩ M c ∩ N c ) ∪ (Ac ∩ M ∩ N c ) = [N ∩ (A M)] ∪ (A ∩ M c ∩ N c ) (since M ⊆ N implies N c ⊆ M c and hence M ∩ N c = ) = [N ∩ (A M)] ∪ (A ∩ N c ) (since N c ⊆ M c ) = (A − N ) ∪ [N ∩ (A M)]. (ii)
A ∪ M = [(A − N ) ∪ (A ∩ N )] ∪ M = (A − N ) ∪ [(A ∩ N ) ∪ M] = (A − N ) ∪ [(A ∩ N ) ∪ (M ∩ N )] (since M ⊆ N ) = (A − N ) + [(A ∪ M) ∩ N ] = (A − N ) [N ∩ (A ∪ M)] (since for B and C with B ∩ C = , B + C = B C).
(iii) Let B ∈ A∗ . Then B = A M for some A ∈ A and M ⊆ N , N ∈ A with μ(N ) = 0. By part (i), B = A M = (A − N ) ∪ [N ∩ (A M)] with (A − N ) ∈ A and N ∩(A M) ⊆ N . That is, B is of the form A ∪ M with A replaced by A− N (a member of A) and M replaced by N ∩(A M) (which ¯ Next, let is a subset of N with N ∈ A and μ(N ) = 0). It follows that B ∈ A. ¯ B ∈ A. Then B = A ∪ M for some A ∈ A and some M ⊆ N with N ∈ A and μ(N ) = 0. By part (ii), B = A ∪ M = (A − N ) [N ∩ (A ∪ M)] with (A − N ) ∈ A and N ∩ (A ∪ M) ⊆ N . That is, B is of the form A ∪ M with A replaced by A − N (a member of A) and M replaced by N ∩ (A ∪ M) (which is a subset of N ∈ A and μ(N ) = 0). It follows that B ∈ A∗ . ¯ Therefore A∗ = A. Note: Parts (i) and (ii) are also established by showing that each side is contained in the other. This is done as follows. (i) Let ω belong to the left-hand side; i.e., ω ∈ A M, so that ω ∈ A and ω∈ / M. That ω ∈ / M implies that either ω ∈ / N or ω ∈ N . If ω ∈ / N , then ω ∈ (A − N ), so that ω belongs to the right-hand side. If ω ∈ N , then ω ∈ [N ∩ (A M)], so that ω belongs to the right-hand side again. Next, let ω belong to the right-hand side. Then ω ∈ (A − N ) or ω ∈ / N , so that ω ∈ A and [N ∩ (A M)]. If ω ∈ (A − N ), then ω ∈ A and ω ∈ ω∈ / M. It follows that ω belongs to the left-hand side. On the other hand, if ω ∈ [N ∩ (A M)], then ω ∈ (A M), so that ω belongs to the left-hand side again. (ii) Let ω belong to the left-hand side; i.e., ω ∈ A ∪ M, so that ω ∈ A and ω ∈ M or to both. Let ω ∈ A. Also, either ω ∈ N or ω ∈ / N . If ω ∈ N , then ω ∈ [N ∩ (A ∪ M)], so that ω belongs to the right-hand side. If ω ∈ / N, then ω ∈ (A − N ), so that ω belongs to the right-hand side again. Finally, let ω ∈ M. Then ω ∈ N and hence ω ∈ [N ∩ (A ∪ M)], so that ω belongs to the right-hand side. Next, let ω belong to the right-hand side. Then either ω ∈ (A − N ) or ω ∈ [N ∩ (A ∪ M)]. Let ω ∈ (A − N ). Then ω ∈ A and (ω ∈ / N ), so that ω belongs to the left-hand side. If ω ∈ [N ∩ (A ∪ M)], then ω ∈ (A ∪ M), so that ω belongs to the left-hand side. # ¯ Next, ¯ A∗ ) = since, e.g., = ∪ , ∈ A, μ() = 0, so that ∈ A. 16. A(= c ¯ ¯ ¯ for B ∈ A to show that B ∈ A. Now B ∈ A implies B = A ∪ M, A ∈ A, M ⊆ N ∈ A, μ(N ) = 0. Then B c = (A ∪ M)c = Ac ∩ M c = Ac ∩ [M c ∩ (N ∪ N c )] = Ac ∩ [(M c ∩ N ) ∪ (M c ∩ N c )] = Ac ∩ [(M c ∩ N ) ∪ N c ] (since M ⊆ N implies N c ⊆ M c ) = (Ac ∩ N c ) ∪ (N ∩ M c ∩ Ac )
with A^c ∩ N^c ∈ A and N ∩ M^c ∩ A^c ⊆ N. That is, B^c is of the form A ∪ M with A (a member of A) replaced by A^c ∩ N^c and M (⊆ N ∈ A with μ(N) = 0) replaced by N ∩ M^c ∩ A^c. It follows that B^c ∈ Ā. Finally, let B_i ∈ Ā, i = 1, 2, . . . Then B_i = A_i ∪ M_i with A_i ∈ A and M_i ⊆ N_i ∈ A with μ(N_i) = 0, i ≥ 1. Therefore
∪_{i=1}^∞ B_i = ∪_{i=1}^∞ (A_i ∪ M_i) = (∪_{i=1}^∞ A_i) ∪ (∪_{i=1}^∞ M_i) with ∪_{i=1}^∞ A_i ∈ A
∞ M ⊆ ∪∞ N , a member of A with μ(∪∞ N ) = 0. It follows that and ∪i=1 i i=1 i i=1 i ∞ ¯ and A¯ is a σ -field. # ∪i=1 Bi belongs in A,
(i) In the first place, the definition μ∗ (A M) = μ(A) implies μ∗ (A ∪ M) = μ(A). Indeed, A ∪ M = (A − N ) [N ∩ (A ∪ M)] with (A − N ) ∈ A and N ∩ (A ∪ M) ⊆ N ∈ A, μ(N ) = 0. Therefore μ∗ (A ∪ M) = μ(A − N ) = μ(A ∩ N c ) = μ(A ∩ N c ) + μ(A ∩ N ) = μ[(A ∩ N c ) ∪ (A ∩ N )] = μ(A). In the process of the proof, we also have seen that μ(A − N ) = μ(A). (ii) As it was just seen, μ∗ (A ∪ M) = μ(A − N ) = μ(A). We show that μ∗ so defined on A∗ is well-defined. That is, if B = A1 ∪ M1 = A2 ∪ M2 , then μ(A1 ) = μ(A2 ). Indeed, A1 = (A1 ∩ A2 ) + (A1 ∩ Ac2 ) = (A1 ∩ A2 ) (A1 ∩ Ac2 ). / A2 , Next, A1 ∩ Ac2 ⊆ M2 , because x ∈ (A1 ∩ Ac2 ) implies x ∈ A1 and x ∈ / A2 , so that x ∈ B and x ∈ / A2 . This implies hence x ∈ (A1 ∪ M1 ) and x ∈ / A2 , so that x ∈ M2 . Thus, A1 ∩ Ac2 ⊆ M2 ⊆ N2 . that x ∈ (A2 ∪M2 ) and x ∈ From this and the fact that B = (A1 ∩ A2 ) (A1 ∩ Ac2 ), it follows that μ∗ (B) = μ(A1 ∩ A2 ) (= μ(A1 )). Likewise, A2 = (A1 ∩ A2 ) (Ac1 ∩ A2 ) with Ac1 ∩ A2 ⊆ M1 ⊆ N1 , so that μ∗ (B) = μ(A1 ∩ A2 ) (= μ(A2 )). It follows that μ(A1 ) = μ(A2 ) and μ∗ is well-defined. (iii) Clearly, μ∗ () = μ∗ ( ) = μ() = 0, and μ∗ (A ∪ M) = μ(A) (as ¯ i = 1, 2, . . . , Bi ∩ was seen in part (i)) and this is ≥ 0. Finally, let Bi ∈ A, B j = , i = j. Then Bi = Ai ∪ Mi , B j = A j ∪ M j , and = Bi ∩ B j = (Ai ∩ A j ) ∪ (Ai ∩ M j ) ∪ (Mi ∩ A j ) ∪ (Mi ∩ M j ), so that Ai ∩ A j = . Therefore ∗
μ
∞
Bi
∗
i=1
=μ
∗
∪ (Ai ∪ Mi ) = μ
=μ
i=1
∞
∞
∪ Ai
i=1
=μ
∞ i=1
∞
∪ Ai
i=1
Ai
=
∞ i=1
∪
μ Ai
∞
∪ Mi
i=1
=
∞
μ∗ (Ai ∪ Mi )
i=1
=
∞
μ∗ Bi .
i=1
¯ A∗ ). # It follows that μ is a measure on A(= ∗
18.
(i) Let B ∈ Cˆ and suppose that B = A for some A ∈ A. Then B = A with ∈ A and μ() = 0, so that B ∈ A∗ . If B ⊆ N for some N ∈ A with μ(N ) = 0, we have B = B with ∈ A and B ⊆ N ∈ A with μ(N ) = 0, so that B ∈ A∗ . Thus Cˆ ⊆ A∗ . ˆ = Aˆ ⊆ A∗ , so it suffices to show that A∗ ⊆ A. ˆ (ii) Cˆ ⊆ A∗ implies that σ (C) ∗ Let B ∈ A , so that B = AM with A ∈ A and M ⊆ N , N ∈ A, ˆ μ(N ) = 0. Since A = A, it follows that A ∈ Cˆ and hence A ∈ A. ˆ ˆ ˆ Also, M = M, so that M ∈ C and hence M ∈ A. Thus, A, M ∈ A and ˆ # therefore AM ∈ Aˆ or B ∈ A.
19. Let = {ω1 , ω2 , ω3 , ω4 }, and let A = {, {ω1 , ω2 }, {ω3 , ω4 }, {ω1 , ω2 , ω3 , ω4 }}. Then A is, trivially, a σ -field. On A, define μ as follows: μ() = 0 = μ({ω1 , ω2 }), μ({ω3 , ω4 }) = μ({ω1 , ω2 , ω3 , ω4 }) = 1. Then, clearly, μ is a measure on A. / A. # But {ω1 } ⊂ {ω1 , ω2 } ∈ A with μ({ω1 , ω2 }) = 0 whereas {ω1 } ∈ 20. Recall that μ0 is an outer measure on P() if μ0 () = 0, μ0 is ↑ and sub-σ additive. Now, let N ∈ A0 with μ0 (N ) = 0, and let M be an arbitrary subset of N . To show that M ∈ A0 . It suffices to show that μ0 (D) ≥ μ0 (M ∩ D) + μ0 (M c ∩ D) for every D ⊆ . We have: M ⊆ N , hence M ∩ D ⊆ N ∩ D and μ0 (M ∩ D) ≤ μ0 (N ∩ D) = 0, so that μ0 (M ∩ D) = 0. Next, M c ∩ D ⊆ D and μ0 (M c ∩ D) ≤ μ0 (D), so that μ0 (D) ≥ μ0 (M ∩ D) + μ0 (M c ∩ D) for every D ⊆ . # 21. On B, define μ in the following manner: μ(B) = number of integers in B. Then, clearly, μ is a measure satisfying the condition μ(finite interval) < ∞. Next, let xn ↑ −2, so that μ((xn , 0]) = 3 for all sufficiently large n, and hence Fc (xn ) = c−3 for all sufficiently large n. But Fc (−2) = c−μ((−2, 0]) = c−2. Hence Fc is not left-continuous. # 22. Indeed, if μ were additive, then c = μ() = μ( ∪ ) = μ() + μ() = 2c, so that 2 = 1, a contradiction. # 23. For n = 2, let μ1 and μ2 be σ -finite, and let {A11 , A12 , . . .} and {A21 , A22 , . . .} be the associated partitions for which μ1 (Ai1 ) < ∞, μ2 (Ai2 ) < ∞, i ≥ 1. Then {Ai1 ∩ A2j , i, j ≥ 1} is a partition of and μ(Ai1 ∩ A2j ) = μ1 (Ai1 ∩ A2j ) + μ2 (Ai1 ∩ A2j ) < ∞, i, j ≥ 1, so that μ is σ -finite. Next, assume the assertion to be true for n = k and we will establish it for n = k + 1. By setting μ0 = μ1 + . . . + μk , we have that both μ0 and μk+1 are σ -finite, and let {Bi , i ≥ 1} and {Aik+1 , i ≥ 1} be the associated partitions for which μ0 (Bi ) < ∞, μk+1 (Aik+1 ) < ∞, i ≥ 1. Then {Bi ∩ Ak+1 j , i, j ≥ 1}
k+1 is a partition of , and μ0 (Bi ∩ Ak+1 j ) ≤ μ0 (Bi ) < ∞, μk+1 (Bi ∩ A j ) ≤
μk+1 (Ak+1 j ) < ∞, i, j ≥ 1. Thus, k+1 (μ1 + . . . + μk+1 )(Bi ∩ Ak+1 j ) = (μ1 + . . . + μk )(Bi ∩ A j ) +
μk+1 (Bi ∩ Ak+1 j ) < ∞, i, j ≥ 1, so that μ1 + . . . + μk+1 is σ -finite. # c c 24. (i) Clearly, (A ∩ B ) ∪ (A ∩ B) = A B = (A ∪ B) − (A ∩ B). Hence P[(A ∩ B c ) ∪ (Ac ∩ B)] = P[(A ∪ B) − (A ∩ B)] = P(A ∪ B) − P(A ∩ B) (since A ∩ B ⊆ A ∪ B) = P(A) + P(B) − P(A ∩ B) − P(A ∩ B) = P(A) + P(B) − 2P(A ∩ B). (ii) We will use the induction hypothesis. For n = 2, we have: P(A1 ∪ A2 ) = P(A1 ) + P(A2 ) − P(A1 ∩ A2 ), so that P(A1 ∩ A2 ) = P(A1 ) + P(A2 ) − P(A1 ∪ A2 ) ≥ P(A1 ) + P(A2 ) − 1. Next, assume it to be true for n = k and establish it for n = k + 1. Indeed, P(A1 ∩ . . . ∩ Ak+1 ) = P[(A1 ∩ . . . ∩ Ak ) ∩ Ak+1 ] ≥ P(A1 ∩ . . . ∩ Ak ) + P(Ak+1 ) − 1 ≥
k
P(Ai ) − (k − 1) + P(Ak+1 ) − 1
i=1
=
k+1
P(Ai ) − [(k + 1) − 1]. #
i=1 ∞ ∞ ∞ ∞ 25. limn→∞ An = ∪∞ n=1 ∩k=n Ak = ∪n=1 {ω2 } = {ω2 }, lim n→∞ An = ∩n=1 ∪k=n ∞ Ak = ∩n=1 {ω1 , ω2 , ω3 } = {ω1 , ω2 , ω3 }, so that P(limn→∞ An ) = P({ω2 }) = 1 7 3 , P(lim n→∞ An ) = P({ω1 , ω2 , ω3 }) = 10 ; also, P(A2n−1 ) = P({ω1 , ω2 }) = 1 8 1 2 , P(A2n ) = P({ω2 , ω3 }) = 5 , so that lim n→∞ P(An ) = 2 and lim n→∞ 8 P(An ) = 5 . Observe that
P( lim An ) = n→∞
and P( lim An ) = n→∞
1 1 = = lim P(An ), 3 2 n→∞
8 7 = = lim P(An ). # 10 5 n→∞
26.
(i) If {ωi} ∈ A for all ωi, then, clearly, every subset of Ω is in A, so that A = P(Ω). On the other hand, if A = P(Ω), then all subsets of Ω are in A, and in particular, so are the {ωi} for all ωi's.
(ii) It is immediate. #
27.
(i) That μ(A) ≥ 0 and μ() = 0 are immediate. Next, n let A1 , . . . , An be n Ai ) = i=1 μ(A pairwise disjoint. Then to show that μ( i=1 i ). If at least n n Ai is infinite, so that μ( i=1 Ai ) = one of the Ai s is infinite, then i=1 ∞. Also, at least one of the terms on the right-hand side is ∞, so that n μ(A ) = ∞. On the other hand, if all A , . . . , A are finite, then i 1 n n i=1 n i=1 Ai is finite and hence μ( i=1 Ai ) = 0. The right-hand side is also equal to 0 since each -additive, because if all ∞term is 0. Next, μ is not σ ∞ s are finite, then A is infinite, so that μ( A i i i=1 Ai ) = ∞, whereas ∞ i=1 ∞ i=1 μ(Ai ) = i=1 0 = 0. (ii) Clearly, = ∪∞ n=1 An , where An = {ω1 , . . . , ωn }, so that An ⊂ An+1 , n ≥ 1, and μ(An ) = 0 for all n. Since μ(An ) = 0, n ≥ 1, it follows that μ(Acn ) = ∞ for all n. #
28.
(i) We have to prove that μ0 () = 0, μ0 (A) ≤ μ0 (B) for A ⊂ B, and μ0 is a sub-σ -additive. That μ0 () = 0 holds by the definition of μ0 . Next, suppose that A ⊂ B. There are three cases to consider. Let B be finite. a b < b+1 = μ0 (B) since a < b. Let B Then A is finite, and μ0 (A) = a+1 a 0 be infinite but A be finite. Then μ (A) = a+1 < 1 = μ0 (B). Finally, let both A and B be infinite. Then μ0 (A) = 1 ≤ 1 = μ0 (B). Now to establish sub-σ -additivity: ∞
μ0 ( ∪ An ) ≤ n=1
∞
μ0 (An ).
n=1
. Then the union Suppose that at least one of the An s is infinite, e.g., An 0 ∞ 0 (∪∞ A ) = 1, whereas 0 A is infinite, and hence μ ∪∞ n=1 μ (An ) ≥ n=1 n n=1 n 0 0 1, since μ (An 0 ) = 1 and μ (An ) ≥ 0, n ≥ 1. Next, let all An be finite 0 ∞ = 1. As for the and = . Then ∪∞ n=1 An is infinite, so that μ (∪n=1 An ) an 1 0 0 right-hand side, μ (An ) = an +1 ≥ 2 for all n, so that ∞ n=1 μ (An ) = ∞. Finally, suppose that only finitely many of the An s are finite, e.g., An 1 , . . . , An k . Then, clearly, sup(A n 1 ∪ . . . ∪ An k ) ≤ sup An 1 + . . . + ∞ 0 0 sup An k , so that μ0 (∪∞ n=1 μ (An ). Therefore μ is an outer n=1 An ) ≤ measure. (ii) By Remark 6(i), A is μ0 -measurable if μ0 (D) ≥ μ0 (A ∩ D) + μ0 (Ac ∩ D) for every D ⊆ . Also, by Remark 6(ii), and are μ0 -measurable, so to investigate the last inequality for ⊂ A ⊂ . Consider the following possible cases. Let both A and Ac be infinite, and take D = . Then μ0 () = 1, but
μ0 (A ∩ ) + μ0 (Ac ∩ ) = μ0 (A) + μ0 (Ac ) = 1 + 1 = 2, so that the inequality is violated. Let A be infinite but Ac be finite, and take D = . Then μ0 () = 1, but μ0 (A ∩ ) + μ0 (Ac ∩ ) = μ0 (A) + μ0 (Ac ) = c , c = sup Ac . Again, the inequality is violated. Finally, let A be 1 + c+1 finite (so that Ac is infinite), and take D = . Once again, μ0 () = 1, and a + 1, a = sup A. So, μ0 (A ∩ ) + μ0 (Ac ∩ ) = μ0 (A) + μ0 (Ac ) = a+1 the inequality is violated. The conclusion then is that A0 = {, }. # 29. It is immediate since: 1 , and 2 1 P(−X ≥ −m) = P(X ≤ m) ≥ . # 2 P(−X ≤ −m) = P(X ≥ m) ≥
30. By symmetry, we have P(X ≤ x) = P(−X ≤ x) = P(X ≥ −x) = 1 − P(X < −x) ≥ 1 − P(X ≤ −x). For x = 0, this becomes P(X ≤ 0) ≥ 1 − P(x ≤ 0), or P(x ≤ 0) ≥
1 . 2
Again, by symmetry, P(X ≥ x) = P(−X ≥ x) = P(X ≤ −x).
31.
32.
33. 34.
For x = 0, this relation becomes P(X ≥ 0) = P(X ≤ 0). But P(X ≤ 0) ≥ 21 as already shown. Thus, P(X ≥ 0) ≥ 21 , and 0 is a median for X . # From B ⊆ A ∪ B, we get μ0 (B) ≤ μ0 (A ∪ B). However, μ0 (A ∪ B) ≤ μ0 (A) + μ0 (B) = μ0 (B) (by the sub-additivity property of μ0 ). Thus, μ0 (B) ≤ μ0 (A ∪ B) ≤ μ0 (B), so that μ0 (A ∪ B) = μ0 (B). # Let N = ( f = g), and let B ∈ B. Then f −1 (B) ∈ A, by assuming that, e.g., f is measurable. Also, g −1 (B) = {[g −1 (B)] ∩ N } ∪ {[g −1 (B)] ∩ N c } = {[g −1 (B)] ∩ N } ∪ f −1 (B) (since f = g on N c ). But [g −1 (B)] ∩ N ⊆ N with μ(N ) = 0. Thus, [g −1 (B)] ∩ N is in A, and hence g −1 (B) is in A. It follows that g is measurable. # Indeed, B ∈ B, we have f −1 (B) ⊆ A with μ[ f −1 (B)] = 0, so that f −1 (B) ∈ A, and hence f is measurable. # (i) We have to show that μ is nonnegative, μ() = 0, and μ is σ -additive. (A) + μ2 (A) ≥ 0; μ() = μ () + μ2 () = 0; Indeed, ∞ μ(A) = μ1 1∞ ∞ ∞ ∞ μ( i=1 A ) = μ ( A ) + μ ( A ) = μ (A ) + i 1 2 i i=1 i i=1 i i=1 1 i=1 ∞ ∞ μ2 (Ai ) = i=1 (μ1 + μ2 )(Ai ) = i=1 μ(Ai ).
(ii) Suppose that, e.g., μ1 is complete, or more properly, A is complete with respect to μ1 , which means that A contains all subsets of the μ1 -null sets. So, let A ∈ A with μ(A) = 0. Then μ1 (A)(= μ2 (A)) = 0. Thus, for an arbitrary B ⊆ A, we have μ1 (B) ≤ μ1 (A) = 0 and B ∈ A. It follows that μ(B) ≤ μ(A) = 0, so that μ is complete. # 35.
(i) Unions of any two members of C2 produce elements in C2 except for two new elements; namely, (A ∩ B) ∪ (Ac ∩ B c ) and (A ∩ B c ) ∪ (Ac ∩ B). Beyond the obvious results, we have: A ∪ (Ac ∩ B) = A ∪ B, A ∪ (Ac ∩ B c ) = A ∪ B c ; Ac ∪ (A ∩ B) = Ac ∪ B, Ac ∪ (A ∩ B c ) = Ac ∪ B c ; B ∪ (A ∩ B c ) = A ∪ B, B ∪ (Ac ∩ B c ) = Ac ∪ B; B c ∪ (A ∩ B) = A ∪ B c , B c ∪ (Ac ∩ B) = Ac ∪ B c ; (A ∩ B) ∪ (Ac ∩ B c ) new element, (A ∩ B) ∪ (Ac ∪ B c ) = (A ∩ B) ∪ (A ∩ B)c = ; (A ∩ B c ) ∪ (Ac ∩ B) new element, (A ∩ B c ) ∪ (Ac ∪ B) = ; (Ac ∩ B c ) ∪ (A ∪ B) = (A ∪ B)c ∪ (A ∪ B) = . (ii) Closeness under complementation is immediate for all elements except, perhaps, for the last two, each of which is the complement of the other. Indeed, [(A ∩ B) ∪ (Ac ∩ B c )]c = (Ac ∪ B c ) ∩ (A ∪ B) = [(Ac ∪ B c ) ∩ A] ∪ [(Ac ∪ B c ) ∩ B] = (A ∩ B c ) ∪ (Ac ∩ B). In checking closeness under unions, it suffices to restrict ourselves to forming unions of two elements, one taken from each one of the classes: {(A ∩ B) ∪ (Ac ∩ B c ), (A ∩ B c ) ∪ (Ac ∩ B)}, {A, Ac , B, B c , A ∩ B, A ∩ B c , Ac ∩ B, Ac ∩ B c }, as well as any two elements from the second class above. To this end, and except for the obvious results, we have: A ∪ [(A ∩ B) ∪ (Ac ∩ B c )] = A ∪ (Ac ∩ B c ) = A ∪ B c ; A ∪ [(A ∩ B c ) ∪ (Ac ∩ B)] = A ∪ (Ac ∩ B) = A ∪ B; Ac ∪ [(A ∩ B) ∪ (Ac ∩ B c )] = Ac ∪ (A ∩ B) = Ac ∪ B; Ac ∪ [(A ∩ B c ) ∪ (Ac ∩ B)] = Ac ∪ (A ∩ B c ) = Ac ∪ B c ; B ∪ [(A ∩ B) ∪ (Ac ∩ B c )] = B ∪ (Ac ∩ B c ) = Ac ∪ B;
B ∪ [(A ∩ B c ) ∪ (Ac ∩ B)] = B ∪ (A ∩ B c ) = A ∪ B; B c ∪ [(A ∩ B) ∪ (Ac ∩ B c )] = B c ∪ (A ∩ B) = A ∪ B c ; B c ∪ [(A ∩ B c ) ∪ (Ac ∩ B)] = B c ∪ (A ∩ B c ) = B c ; (A ∩ B) ∪ (Ac ∪ B c ) = (A ∩ B) ∪ (A ∩ B)c = , (A ∩ B) ∪ [(A ∩ B c ) ∪ (Ac ∩ B)] = A ∪ (Ac ∩ B) = A ∪ B; (A ∩ B c ) ∪ (Ac ∪ B) = , (A ∩ B c ) ∪ [(A ∩ B) ∪ (Ac ∩ B c )] = A ∪ (Ac ∩ B c ) = A ∪ B c ; (Ac ∩ B) ∪ (A ∪ B c ) = , (Ac ∩ B) ∪ [(A ∩ B) ∪ (Ac ∩ B c )] = Ac ∪ (A ∩ B) = Ac ∪ B; (Ac ∩ B c ) ∪ (A ∪ B) = (A ∪ B)c ∪ (A ∪ B) = , (Ac ∩ B c ) ∪ [(A ∩ B c ) ∪ (Ac ∩ B)] = B c ∪ (Ac ∩ B) = Ac ∪ B c . Again, except for the obvious results, we have: (A ∪ B) ∪ [(A ∩ B) ∪ (Ac ∩ B c )] = (A ∪ B) ∪ (Ac ∩ B c ) = (A ∪ B) ∪ (A ∪ B)c = ; (A ∪ B c ) ∪ [(A ∩ B c ) ∪ (Ac ∩ B)] = (A ∪ B c ) ∪ (Ac ∩ B) = ; (Ac ∪ B) ∪ [(A ∩ B c ) ∪ (Ac ∩ B)] = (Ac ∪ B) ∪ (A ∩ B c ) = ; (Ac ∪ B c ) ∪ [(A ∩ B) ∪ (Ac ∩ B c )] = (Ac ∪ B c ) ∪ (A ∩ B) = (A ∪ B)c ∪ (A ∩ B) = . #
Chapter 3 Some Modes of Convergence of a Sequence of Random Variables and their Relationships 1. Indeed, |X n −X | = (X n −X )+ +(X n −X )− , so that (X n −X )+ ≤ |X n −X |, (X n − X )− ≤ |X n − X |. Hence, for every ε > 0, μ[(X n − X )+ ≥ ε] ≤ μ[|X n − X | ≥ ε] −→ 0, and likewise, μ[(X n − X )− ≥ ε] ≤ μ[|X n − X | ≥ ε] −→ 0. n→∞
n→∞
Next, recall that (Exercise 28, Chapter 1) that for any two r.v.s X and Y , (X + Y )+ ≤ X + + Y + and (X + Y )− ≤ X − + Y − . Hence X n+ = ((X n − X ) + X )+ ≤ (X n − X )+ + X + , X + = ((X − X n ) + X n )+ ≤ (X − X n )+ + X n+ = (X n − X )− + X n+ , because, as is easily seen, (−Z )+ = Z − . Then −(X n − X )− ≤ X n+ − X + ≤ (X n − X )+ , or |X n+ − X + | ≤ (X n − X )+ + (X n − X )− = |X n − X |, and therefore μ(|X n+ − X + | ≥ ε) ≤ μ(|X n − X | ≥ ε) −→ 0, n→∞
so that X_n⁺ → X⁺ in μ-measure as n → ∞. Likewise, X_n⁻ → X⁻ in μ-measure. #
2. (i) Let (, A, μ) = ( , B, λ), λ the Lebesgue measure, and for n ≥ 1, let X n = I(n,∞) and X = 0. Then X n (ω) −→ 0 for every ω ∈ (since if n→∞
n 0 = n 0 (ω) is the smallest positive integer which is ≥ ω, then X n (ω) = 0 a.s. for all n > n 0 ), and in particular X n −→ 0. However, for 0 < ε < 1, n→∞
{ω ∈ ; |X n (ω) − X (ω)| > ε} = {ω ∈ ; X n (ω) = 1} = (n, ∞) and λ λ((n, ∞)) = ∞, so that X n 0. n→∞
As another example, take = {1, 2, . . .}, A = P(), μ the counting measure, and let An = {1, . . . , n}, n ≥ 1. Take X n = I An and X = 1. Then a.e. X n (ω) −→ X (ω), ω ∈ , and hence X n −→ X . However, for 0 < ε < 1, n→∞
n→∞
{ω ∈ ; |X n (ω) − X (ω)| > ε} = {ω ∈ ; |X n (ω) − X (ω)| = 1} = {ω ∈ ; X n (ω) = 0} = Acn = {n + 1, n + 2, . . .}, and μ(Acn ) = ∞. (ii) Let = (0, 1], A = B , and let λ be the Lebesgue measure. For n = k 1, 2, . . . , consider the partition of (0, 1] by the 2n−1 intervals 2k−1 n−1 , 2n−1 , k k = 1, . . . , 2n−1 . Define the r.v.s Ynk by: Ynk (ω) = 1 for ω ∈ 2k−1 n−1 , 2n−1 and Ynk (ω) = 0 otherwise, k = 1, . . . , 2n−1 , n ≥ 1, and set {X 1 , X 2 , . . .} = {Y11 , Y21 , Y22 , Y31 , Y32 , Y33 , Y34 , . . .}. λ
Then X_n → 0 in λ-measure, but {X_n(ω)} does not converge to 0 for even a single ω ∈ Ω. This is so because every ω ∈ (0, 1] belongs to infinitely many of the intervals ((k − 1)/2^{n−1}, k/2^{n−1}] and also fails to belong to infinitely many of them. Consequently, X_n(ω) = 1 for infinitely many n and X_n(ω) = 0 for infinitely many n, so that lim inf_{n→∞} X_n = 0 ≠ 1 = lim sup_{n→∞} X_n, and lim_{n→∞} X_n does not exist. #
3. We have
P(lim sup_{n→∞} A_n) = P(∩_{n=1}^∞ ∪_{k=n}^∞ A_k) = P(lim_{n→∞} ∪_{k=n}^∞ A_k) = lim_{n→∞} P(∪_{k=n}^∞ A_k) ≤ lim_{n→∞} Σ_{k=n}^∞ P(A_k) = 0,
since Σ_{n=1}^∞
P(An ) < ∞. Hence P lim supn→∞ An = 0. # 4. (i) Set An = (|X n | ≥ k1 ), k = 1, 2, . . . By assumption, ∞ n=1 P(An ) < ∞, so that P lim supn→∞ An = 0, by Exercise 3. However, ∞ ∞ ∞ ∞ 1 lim sup An = ∩ ∪ Aν = ∩ ∪ |X ν | ≥ n=1 ν=n n=1 ν=n k n→∞ ∞ ∞ 1 = Bkc , say. Thus = ∩ ∪ |X n+ν | ≥ n=1 ν=1 k An = P(Bkc ), k ≥ 1, so that P(Bk ) = 1, k ≥ 0 = P lim n→∞ sup ∞ 1. Then P ∩∞ k=1 Bk = 1, by Exercise 4 in Chapter 2. But ∩k=1 Bk = since
∞
n=1
1 ∞ ∞ ∩∞ −→ 0, by k=1 ∪n=1 ∩ν=1 |X n+ν | < k , and this is the event for which X n n→∞ a.s. Theorem 3. Thus, P X n −→ 0 = 1 or X n −→ 0. n→∞
n→∞
(ii) Let , A and P be as in the hint, and define X n by X n (ω) = 1, if ω ∈ (0, n1 ), and 0 otherwise, n = 1, 2, . . . Then, for ω ∈ , X n (ω) = 0, provided n ≥ n 0 (ω) = ω1 , so that X n (ω) −→ 0 for every ω ∈ , and in particular, n→∞ ∞ a.s. X n −→ 0. However, for 0 < ε < 1, ∞ n=1 P(|X n | ≥ ε) = n=1 P(X n = n→∞ ∞ 1 1) = n=1 n = ∞. # P
5. If X n −→ X , then for every subsequences {X n } ⊆ {X n } there is a further subsen→∞ quence {X n }
a.s.
⊆ {X n } such that X n −→ X , and P(X = X ) = 0. This is so by n→∞
Theorem 5(ii). Next, suppose that for every subsequence {X n } ⊆ {X n } there is a a.s. further subsequence {X n } ⊆ {X n } such that X n −→ X , and that any two limiting n→∞ r.v.s differ on a set of probability 0. If {X n } does not converge in probability to a r.v. X , then there must exist an ε > 0 such that P(|X n − X | ≥ ε) 0. n→∞
Therefore there exists δ > 0 for which there is no N = N (δ) > 0 integer such that P(|X n − X | ≥ ε) < δ, n ≥ N . In other words, there exists n 1 < n 2 < . . . ↑ ∞ for which P(|X n k − X | ≥ ε) ≥ δ, k ≥ 1. Then, for every subsequence {X n k } ⊆ {X n k }, P(|X n k − X | ≥ ε) ≥ δ, k ≥ 1, so that no subsequence of {X n k } converges a.s. to X (or to another r.v. X differing from X on a P
set of probability 0); a contradiction. Therefore X n −→ X . # n→∞
6. Let (, A, P) = ((0, 1], B(0,1] , λ), where λ is the Lebesgue measure, and define X n by X n = I( 1 ,1] , n ≥ 1. Then, clearly, X n −→ 1 pointwise, but P(X n = 0) = n
n→∞
λ((0, n1 )) = n1 ; i.e., P(X n = 0) > 0 for all n ≥ 1. # 7. It suffices to show that {X n } converges mutually a.s. That is, for every ε > 0 and every ω ∈ N c with P(N ) = 0, there exists N = N (ε, ω) > 0 integer such that |X n+ν (ω) − X n (ω)| < ε, n ≥ N , uniformly in ν = 1, 2, . . . To this end, set An = (|X n+1 − X n | ≥ ε), n ≥ 1. Then, by assumption ∞ ¯ ¯ n=1 P(An ) < ∞. Hence, by Exercise 3 here, P( A) = 0, where A = ∞ A , or P( A c ) = 1 where A c = ∪∞ ∩∞ Ac . ¯ ¯ ∪ lim supn→∞ An = ∩∞ n=1 k=n k n=1 k=n k c ∞ Set N c = A¯ c , so that P(N c ) = 1. Now, for ω ∈ N c or ω ∈ ∪∞ n=1 ∩k=n Ak , c c ∞ we have that ω ∈ ∩k=n 1 Ak for some n 1 = n 1 (ω). Thus, ω ∈ Ak , k ≥ n 1 , so that |X k+1 k∞≥ n 1 , by the definition of Ak . The (ω) − X k (ω)| < εk for all assumption ∞ n=1 εn < ∞ implies that k=n εn −→ 0, so that, for ε > 0 there ∞ n→∞ exists n 2 = n 2 (ε) > 0 integer such that k=n 2 εn < ε. Set N = N (ε, ω) = max{n 1 (ω), n 2 (ε)}. Then, for n ≥ N , we have |X n+1 (ω) − X n (ω)| < εn for ω ∈ N c , and ∞ k=N εn < ε. Therefore, for n ≥ N , and ν = 1, 2, . . . , |X n+ν (ω)− X n (ω)| ≤ |X n+1 (ω)− X n (ω)|+. . .+|X n+ν (ω)− X n+ν−1 (ω)| <
εn + . . . + εn+ν−1 ≤ εn + εn+1 + . . . < ε. That is, for every ε > 0 and each ω ∈ N c (with P(N c ) = 1), there exists N = N (ε, ω) > 0 integer such that, for n ≥ N and ν ≥ 1, |X n+1 (ω) − X n (ω)| < ε. So, {X n } converges mutually a.s., and therefore {X n } converges a.s. to a r.v. # 8. To show that, for every 0 < ε(< 1) and 0 < δ(< 1) there exists N = N (ε, δ) > 0 integer such that n ≥ N implies P[|g(X n ) − g(X )| ≥ ε] < δ. Since for 0 < M ↑ ∞, P(−M ≤ X ≤ M) ↑ P(X ∈ ) = 1, choose M sufficiently large (and >1) so that P(|X | > M) < 2δ . Now g is uniformly continuous in the (closed) interval [−2M, 2M], so that |g(x ) − g(x )| < ε for all x , x ∈ [−2M, 2M] P
with |x − x | < δ. Next, X n −→ X implies that there exists N = N (ε, δ) > 0 n→∞
integer such that P(|X n − X | ≥ δ) < 2δ , n ≥ N . Below work with n ≥ N and set A1 = (|X | ≤ M), A2 (n) = (|X n − X | < δ), A3 (n) = [|g(X n ) − g(X )| < ε]. On A1 ∩ A2 (n), we have −M ≤ X ≤ M and X − δ < X n < X + δ, or −2M ≤ X ≤ 2M and −2M < X n < 2M (since δ < 1 and M > 1). So, on A1 ∩ A2 (n), we have X , X n ∈ [−2M, 2M] and |X n − X | < δ. Hence |g(X n ) − g(X )| < ε which implies that A1 ∩ A2 (n) ⊆ A3 (n). Then Ac3 (n) ⊆ Ac1 ∪ Ac2 (n), so that P[Ac3 (n)] ≤ P(Ac1 ) + P[Ac2 (n)] < 2δ + 2δ = δ. In other words, P[|g(X n ) − g(X )| ≥ ε] < δ for n ≥ N (ε, δ), which completes the proof. Remark: An alternative approach is to use Exercise 5. To this end, set Yn = P
g(X n ), Y = g(Y ) and show that Yn −→ Y . It suffices to show that for every n→∞
subsequence {Yn } ⊆ {Yn } there is a further subsequence {Yn } ⊆ {Yn } such that a.s. Yn −→ Y and P(Y = Y ) = 0. The subsequence {Yn } corresponds to the subn→∞
a.s.
sequence {X n } ⊆ {X n }, and there exists {X n } ⊆ {X n } such that X n −→ X P
(with P(X = X ) = 0) by the fact that X n −→ X . Set Yn = n→∞
a.s.
n→∞ g(X n ). Then
a.s.
{Yn } ⊆ {Yn } and Yn −→ Y = g(X ) by the fact that X n −→ X and g is conn→∞
n→∞
P
tinuous. Also, P(Y = Y ) = 0 since P(X = X ) = 0. It follows that Yn −→ Y n→∞
P
or g(X n ) −→ g(X ). # n→∞
9. Either apply in the plane the arguments used in Exercise 8 for the real line, or use an argument similar to the one employed in the Remark in the previous exercise. # 10. First, suppose that there exists {εn } with 0 < εn → 0 for which ∞ P ∪ (|X k − X | ≥ εk ) −→ 0. n→∞
k=n
For every ε > 0, there exists an integer kε > 0 such that k ≥ kε implies εk < ε. Then, for k ≥ kε , (|X k − X | ≥ εk ) ⊇ (|X k − X | ≥ ε), which for n ≥ kε implies ∞
∞
k=n
k=n
∪ (|X k − X | ≥ εk ) ⊇ ∪ (|X k − X | ≥ ε).
Hence, for n ≥ kε , ∞ ∞ P ∪ (|X k − X | ≥ εk ) ≥ P ∪ (|X k − X | ≥ ε) . k=n
k=n
Then P
∞ ∪ (|X k − X | ≥ εk ) −→ 0 implies P ∪ (|X k − X | ≥ ε) −→ 0.
∞
n→∞
k=n
n→∞
k=n
That is, for every ε > 0, P ∪∞ −→ 0 or P ∪∞ ν=0 k=n (|X k − X | ≥ ε) n→∞
a.s. (|X n+ν − X | ≥ ε) −→ 0, which is equivalent to X n −→ X by Theorem 4 n→∞
n→∞
a.s.
in this chapter. Next, suppose that X n −→ X . Then, by the theorem just cited, n→∞
∞
∪ (|X n+ν − X | ≥ ε) −→ 0 or P
P
ν=0
n→∞
∞
∪ (|X k − X | ≥ ε) −→ 0.
k=n
n→∞
The last convergence implies the existence of an integer n ε > 0 such that n ≥ n ε implies ∞ P ∪ (|X k − X | ≥ ε) < ε. k=n
Apply this by taking ε = m1 , m ≥ 1. Then, for each such m, there exists an integer n m > 0 such that n ≥ m m implies 1 1 |X k − X | ≥ < . k=n m m m
P
∞
∪
Actually, let n m be the smallest > 0 integer for which the above inequality is true, and without loss of generality, assume that n 1 < n 2 < . . . → ∞. For k ≥ n 1 , there exists n m such that n k ≤ k < n m+1 . For this k and the subsequent ones which may lie in [n m , n m+1 ), set εk = m1 . Then m → ∞ implies εk → 0 and ∞ 1 1 < . ∪ (|X k − X | ≥ εk ) = P ∪ |X k − X | ≥ k=n m k=n m m m
P
∞
For n ∈ [n m , n m+1 ), we, clearly, have 1 1 < −→ 0. P ∪ (|X k − X | ≥ εk ) = P ∪ |X k − X | ≥ k=n k=n m m m→∞
∞
∞
Since, clearly, m → ∞ implies n → ∞, we have P ∪∞ k=n (|X k − X | ≥ εk ) −→ 0, as was to be seen. #
n→∞
a.s.
11. Finiteness of μ implies (by Theorem 4 in this chapter) that X n −→ X if and n→∞ 1 only if μ ∪∞ ν=0 |X n+ν − X | ≥ k −→ 0 for every k ≥ 1. Let ε > 0 (to be kept n→∞
fixed throughout) and, for each k ≥ 1, consider ε/2k . Then, for ε/2k ,there exists Nk = N (ε, k) > 0 integer such that ∞ ε 1 < k , k ≥ 1, n ≥ Nk , μ ∪ |X n+ν − X | ≥ ν=0 k 2
or μ
ε 1 |X n − X | ≥ < k, n=Nk k 2 ∞
∪
k ≥ 1, n ≥ Nk .
1 Setting Aε,k = ∪∞ n=Nk (|X n − X | ≥ k ) (where dependence of Aε,k on ε is ε through Nk ), we have μ(Aε,k ) < 2k , k ≥ 1. Let Aε = ∪∞ k=1 Aε,k . Then μ(Aε ) = ∞ ∞ c μ ∪k=1 Aε,k ≤ k=1 μ Aε,k ≤ ε. That is, μ(Aε ) < ε. Consider Aε = c ∞ 1 c ∞ ∞ ∞ ∪k=1 Aε,k = ∩k=1 Aε,k = ∩k=1 ∩n=Nk (|X n − X | < k ). Then, for ω ∈ Acε , we have: For every k ≥ 1, there is Nk (= N (k, ε)) > 0 integer independent of ω, such that |X n (ω) − X (ω)| < k1 for all n ≥ Nk . It follows that X n (ω) −→ X (ω) n→∞
and this convergence is uniform on Acε . Thus, for every ε > 0, there exists Aε with μ(Aε ) < ε such that X n (ω) −→ X (ω) uniformly for ω ∈ Acε . It follows n→∞
a.u.
that X n −→ X . # n→∞
12. Clearly, A = {ω ∈ ; lim inf n→∞ X n (ω) = lim supn→∞ X n (ω)}, so that Ac = {ω ∈ ; lim inf n→∞ X n (ω) < lim supn→∞ X n (ω)}. Thus, we have to show that {ω ∈ ; lim inf X n (ω) < lim sup X n (ω)} = n→∞
n→∞
∪ {ω ∈ ; lim inf X n () ≤ r < s ≤ lim sup X n (ω)}.
r ,s
n→∞
n→∞
But this inequality is immediate since, if ω belongs in the left-hand side then lim inf n→∞ X n (ω) < lim supn→∞ X n (ω), and hence there exists rationals r and s with r < s, so that lim inf n→∞ X n (ω) ≤ r < s ≤ lim supn→∞ X n (ω), and thus ω belongs in the right-hand side. The other implications is immediate. # 13. (i) That X n −→ X means that X n (ω) −→ X (ω), ω ∈ . So, for every n→∞ n→∞ ω ∈ , the limn→∞ X n (ω) exists, and this means that limn→∞ X n (ω) = limn→∞ X n (ω)(= limn→∞ X n (ω)) = X (ω). However, X = limn→∞ X n = inf n≥1 (sup j≥n X j ) = inf n≥1 Yn , where Yn = sup j≥n X j , so that Yn is defined in terms of X n , X n+1 , . . . Next, X = inf n≥1 Yn , so that X is defined in terms of Y1 , Y2 , . . . , and hence X is σ (Y1 , Y2 , . . .)-measurable. However, σ (Y1 , Y2 , . . .) ⊆ σ (X 1 , X 2 , . . .), since, for each n ≥ 1, Yn is a function of X n , X n+1 , . . . Thus, X is σ (X 1 , X 2 , . . .)-measurable. (ii) The σ (X 1 , X 2 , . . .)-measurability of X implies that σ (X ) ⊆ σ (X 1 , X 2 , . . .), so that σ (X 1 , X 2 , . . . , X ) = σ (σ (X 1 , X 2 , . . .) ∪ σ (X )) = σ (X 1 , X 2 , . . .), as was to be seen. #
14.
∞
X n converges if and only if ∞ converges and is true for every n=m X this n ∞ ∞ X (ω)), m ≥ 1. On A (the set of convergence of n n=1 n=m X n = limr →∞ r r X = lim Y , where Y = X . However, {Ym,r }, r ≥ 1, n r →∞ m,r m,r n n=m n=m converges if and only if it converges mutually, and the set of mutual convergence of {Ym,r }, r ≥ 1, is given by the expression ∞ ∞ ∞ 1 . (Ym,r +ν − Ym,r −→ 0) = ∩ ∪ ∩ |X r +ν − X r | < r →∞ k=1 r =m ν=1 k n=1
This is so by Theorem 3. But the set on the right-hand side above is σ (X m , X m+1 , . . .)-measurable, being defined in terms of the r.v.s X m , X m+1 , . . . Thus, the set A of convergence of ∞ n=1 X n is σ (X m , X m+1 , . . .)-measurable for every m ≥ 1. # 15. To show: that limn→∞ |Xnn | ≤ 1 on a set A, say, with P(A) = 1. From the definition of the limn→∞ of a sequence of real numbers {xn }, limn→∞ xn ≤ 1 means that xn ≤ 1 for all but finitely many ns. So, what must be shown is that |X n | ≤ 1 for all but finitely many ns = 1. P n Setting An = ω ∈ ; |Xnn | > 1 , the last relation above becomes P limn→∞ Acn = 1 (by Exercise 30(i) in Chapter 1). However, P limn→∞ Acn = P
∞ ∞
Ack
=P
n=1 k=n
∞ ∞
c Ak
,
n=1 k=n
so that P limn→∞ Acn = 1 if and only if P
∞ ∞
c = 1,
Ak
n=1 k=n
if and only if
P
∞ ∞
Ak
= 0, or P limn→∞ An = 0,
n=1 k=n
which follows by the assumption that chapter). #
∞
n=1
16. First, from the expression ||Zn − Z|| =
P(An ) < ∞ (by Exercise 3 in this
k
2 j=1 (Z n j − Z j )
P
P
n→∞
n→∞
1 2
, we have that
Z n j −→ Z j , j = 1, . . . , k, implies ||Zn − Z|| −→ 0 (by Exercises 8 and 9). 1
Next, for each j = 1, . . . , k, |Z n j − Z j | = [(Z n j − Z j )2 )] 2 ≤ 1 P P k 2 2 = ||Z −Z|| −→ 0, so that Z n j −→ Z j , j = 1, . . . , k. # n i=1 (Z ni − Z i ) ) n→∞
n→∞
17. It suffices to demonstrate it by means of a concrete example. To this effect, for n = 1, 2, . . . , let the r.v.s X n be defined on some probability space (, A, P) as follows: X 2n−1 (ω) = p − n1 , X 2n (ω) = p + n1 for some positive constant p and P
all ω ∈ , Then, clearly, X n −→ p pointwise and hence X n −→ p. Set p = X n→∞ n→∞ and take c = p. Then, for all ω ∈ , δc (X 2n−1 (ω)) = 0 since X 2n−1 (ω) = p −
1 < p (= c), n
1 ≥ p (= c), n and δc (X (ω)) = 1 since X (ω) = p ≥ p (= c). δc (X 2n (ω)) = 1 since X 2n (ω) = p +
Therefore |δc (X 2n−1 (ω)) − δc (X (ω))| = |0 − 1| = 1, |δc (X 2n (ω)) − δc (X (ω))| = |1 − 1| = 0. Thus, for ε > 0, P[|δc (X n ) − δc (X )| > ε] = P[|δc (X 2n−1 ) − δc (X )| = 1] = 1, P
so that δc (X n ) δc (X ). # n→∞
μ
18. The convergence X n −→ X implies the existence of a subsequence {X n } ⊆ {X n } n→∞
a.s.
such that X n −→ X , a r.v. X , with μ(X = X ) = 0 (by Theorem 5(ii)). Theren→∞
fore |X | ≤ M a.e. and this implies that |X | ≤ M a.e. # 19. Let ω0 ∈ , assume that {ω0 } ∈ A, and let μ({ω0 }) = 0. (i) For n ≥ 1, let X n = 0 on {ω0 }c , X n (ω0 ) = 1, and let X = 0 on . a.e.
Then, clearly, X n −→ X , and X n−1 (B) = {, {ω0 }, {ω0 }c , }, X −1 (B) = n→∞
{, }. Since {, } ⊂ {, {ω0 }, {ω0 }c , }, which is the same as the σ (X 1 , X 2 , . . .), it follows that X is σ (X 1 , X 2 , . . .)-measurable. (ii) Here, for n ≥ 1, take X n = 0 on , and X = 0 on {ω0 }c , X = 1 on a.e.
{ω0 }. Then, again X n −→ X , and X n−1 (B) = {, }, X −1 (B) = {, n→∞
{ω0 }, {ω0 }c , }, which is the same as the σ (X 1 , X 2 , . . .). Since X −1 (B) σ (X 1 , X 2 , . . .), it follows that X is not σ (X 1 , X 2 , . . .)-measurable. (iii) Let N be the set over which X n X , so that μ(N c ) = 0. Next, set n→∞
X n = X n on N c and X n = 0 (for example) on N ; also let X = X on N c and X = 0 on N . Then X n −→ X pointwise and X is σ (X 1 , X 2 , . . .) n→∞ measurable. Furthermore, ∞ n=1 μ(X n = X n ) = 0 = μ(X = X ). a.e. (iv) Clearly, X n −→ X because, although lim inf n→∞ X n (ω0 ) = 2 < 3 = n→∞
lim supn→∞ X n (ω0 ), μ({ω0 }) = 0. Setting, for example, X n = X = 0 on
, we have X n −→ X pointwise (actually, X n = X , n ≥ 1, on ), X is n→∞ σ (X 1 , X 2 , . . .)-measurable, and ∞ n=1 μ(X n = X n ) = 0 = μ(X = X ). # 20. Let the events A and B be defined by: A = {ω ∈ ; X m (ω) − X n (ω) −→ 0}, m,n→∞
B = {ω ∈ ; X n+ν (ω) − X n (ω) −→ 0 uniformly in ν ≥ 1}. n→∞
Then A = B. Indeed, in the first place it is clear that X m (ω) − X n (ω) → 0 as m, n → ∞ is equivalent to X n+i (ω) − X n+ j (ω) −→ 0 uniformly in 0 ≤ i < j. n→∞
So, for ω ∈ A, X n+i (ω) − X n+ j (ω) → 0 uniformly in 0 ≤ i < j. By taking i = 0 and setting ν instead of j, we have X n+ν (ω) − X n (ω) −→ 0 uniformly in n→∞ ν ≥ 1, so that ω ∈ B and hence A ⊆ B. Next, let ω ∈ B, so that X n+ν (ω) − X n (ω) −→ 0 uniformly, in ν ≥ 1, and n→∞ observe that |X n+i (ω) − X n+ j (ω)| ≤ |X n+i (ω) − X n (ω)| + |X n+ j (ω) − X n (ω)|. Then, by the fact that X n+i (ω) − X n (ω) −→ 0 uniformly in i ≥ 0 n→∞
and X n+ j (ω) − X n (ω) −→ 0 uniformly in j ≥ 0, n→∞
we get X n+i (ω) − X n+ j (ω) −→ 0 uniformly in 0 ≤ i < j, n→∞
so that ω ∈ A and hence B ⊆ A. It follows that A = B. # 21. In the first place, for every ε > 0, it is clear that μ(|X m − X n | ≥ ε) −→ 0 as m, n → ∞ is equivalent to μ(|X n+i − X n+ j | ≥ ε) −→ 0 uniformly in 0 ≤ i < j. n→∞
Now, assume that μ(|X n+i − X n+ j | ≥ ε) −→ 0 uniformly in 0 ≤ i < j, take i = n→∞
0 and set ν instead of j to obtain μ(|X n+μ − X n | ≥ ε) −→ 0 uniformly in ν ≥ 1. n→∞
Next, assume that μ(|X n+ν − X n | ≥ ε) −→ 0 uniformly in ν ≥ 1, and for n→∞ 0 ≤ i < j, observe that |X n+i − X n+ j | ≤ |X n+i − X n | + |X n+ j − X n |,
so that ε ε ∪ |X n+ j − X n | ≥ . |X n+i − X n+ j | ≥ ε ⊆ |X n+i − X n | ≥ 2 2 However, ε μ |X n+i − X n | ≥ −→ 0 uniformly in i ≥ 0 2 n→∞ and
ε μ |X n+ j − X n | ≥ −→ 0 uniformly in j ≥ 1, 2 n→∞
and hence μ |X n+i − X n+ j | ≥ ε −→ 0 uniformly in 0 ≤ i < j. n→∞
This completes the proof. #
Chapter 4 The Integral of a Random Variable and its Basic Properties 1. Let (, A, P) = ((0, 1], B(0,1] , λ) where λ is the Lebesgue measure, and define X n and X by: X n = n I(0, 1 ] , n = 1, 2, . . . , X = 0. Then for any ω ∈ (0, 1], there n
exists n 0 = n 0 (ω) such that ω > n1 , n ≥ n 0 . Then X n (ω) = 0, n ≥ n 0 , so that, trivially, X n → X pointwise. However, E X n = n × n1 = 1, n ≥ 1, and E X = 0, so that E X n E X . # n→∞
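Remark (added numerical illustration, not in the original text): the sequence X_n = n·I_{(0, 1/n]} just used has E X_n = 1 for every n while X_n(ω) → 0 for each fixed ω; a small Python sketch (the sampled points are an arbitrary choice):

```python
import random

random.seed(1)
omegas = [random.random() for _ in range(10)]   # a few points of (0, 1]

def X(n, w):
    return n if 0 < w <= 1.0 / n else 0.0       # X_n = n on (0, 1/n], 0 elsewhere

for n in (1, 10, 100, 10000):
    exact_mean = n * (1.0 / n)                  # ∫ X_n dλ = n · λ((0, 1/n]) = 1
    print(n, exact_mean, [X(n, w) for w in omegas[:3]])
# For large n, X_n(ω) = 0 at every sampled ω, while E X_n stays equal to 1.
```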
2. Let A = (X > 0) and set An = (X > μ(An ) ↑ μ(A). Then 0= ≥
n = 1, 2, . . . , so that An ↑ A and
X dμ =
=
1 n ),
(X =0)
X dμ +
(X >0)
X dμ =
(X I An )dμ ≥
(X >0)
X dμ
X dμ =
A
1 μ(An ), n
so that μ(An ) = 0, n ≥ 1, and hence μ(A) = 0. #
(X I A )dμ
3. Set p_k = P(X = k), k ≥ 1. Then:
Σ_{k=1}^∞ P(X ≥ k) = lim_{n→∞} Σ_{k=1}^n P(X ≥ k)
= lim_{n→∞} { p_1 + p_2 + … + p_{n−1} + p_n
                  + p_2 + … + p_{n−1} + p_n
                  + ……………………
                              + p_{n−1} + p_n
                                        + p_n }
= lim_{n→∞} (p_1 + 2p_2 + … + np_n) = lim_{n→∞} Σ_{k=1}^n k p_k = Σ_{k=1}^∞ k p_k = E X. #
4.
(i) Clearly, (X I A+B )+ = X + I A+B = X + (I A + I B ) (by the fact that A ∩ B = ) = X + I A + X + I B = (X I A )+ + (X I B )+ . Similarly, (X I A+B )− = X − I A+B = X − I A + X − I B = (X I A )− + (X I B )− . (ii) + + n + n (X Ii=1 Ai ) = X I i=1 Ai = X
n
I Ai (since Ai ∩ A j = , i = j)
i=1
=
n i=1
X + I Ai =
n
(X I Ai )+ , and similarly,
i=1
− − n − n (X Ii=1 Ai ) = X I i=1 Ai = X
n i=1
I Ai =
n
X − I Ai
i=1
n = (X I Ai )− . # i=1
5. The existence of $\int_{\sum_{i=1}^n A_i} X = \int X I_{\sum_{i=1}^n A_i}$ implies that at least one of $\int (X I_{\sum_{i=1}^n A_i})^+$ or $\int (X I_{\sum_{i=1}^n A_i})^-$ is finite. Let $\int (X I_{\sum_{i=1}^n A_i})^+ < \infty$. Since $(X I_{\sum_{i=1}^n A_i})^+ = \sum_{i=1}^n (X I_{A_i})^+$ (by Exercise 4(ii)), it follows that $\int (X I_{\sum_{i=1}^n A_i})^+ = \sum_{i=1}^n \int (X I_{A_i})^+$ by Theorem 4 (see also Remark 2), and similarly $\int (X I_{\sum_{i=1}^n A_i})^- = \sum_{i=1}^n \int (X I_{A_i})^-$. Therefore $\int (X I_{A_i})^+ < \infty$, $i = 1, \ldots, n$, and the integrals $\int X I_{A_i} = \int_{A_i} X$, $i = 1, \ldots, n$, exist. Furthermore,
$$\sum_{i=1}^n \int (X I_{A_i})^+ - \sum_{i=1}^n \int (X I_{A_i})^- = \int (X I_{\sum_{i=1}^n A_i})^+ - \int (X I_{\sum_{i=1}^n A_i})^- = \int X I_{\sum_{i=1}^n A_i} = \int_{\sum_{i=1}^n A_i} X,$$
and
$$\sum_{i=1}^n \int (X I_{A_i})^+ - \sum_{i=1}^n \int (X I_{A_i})^- = \sum_{i=1}^n \Big[\int (X I_{A_i})^+ - \int (X I_{A_i})^-\Big] = \sum_{i=1}^n \int X I_{A_i} = \sum_{i=1}^n \int_{A_i} X.$$
That is, $\int_{\sum_{i=1}^n A_i} X = \sum_{i=1}^n \int_{A_i} X$. Similarly, if we assume that $\int (X I_{\sum_{i=1}^n A_i})^- < \infty$. #

6. Suppose that $\int X = \int Y = \infty$. From $\int X = \sum_{i=1}^m \alpha_i \mu(A_i) = \infty$, it follows that for those $\alpha_i$'s with $\alpha_i < 0$, the corresponding $A_i$'s have $\mu(A_i) < \infty$. Then, since by convention $0 \times \infty = 0$, restrict attention to $\alpha_i$'s $> 0$. Hence there exists at least one $i_0$ such that $\alpha_{i_0} > 0$ and $\mu(A_{i_0}) = \infty$. Next, in summing the terms $(\alpha_i + \beta_j)\mu(A_i \cap B_j)$, restrict ourselves to $A_i \cap B_j \ne \varnothing$. Now $\infty = \mu(A_{i_0}) = \sum_{j=1}^{\infty}\mu(A_{i_0} \cap B_j)$, which implies that $\mu(A_{i_0} \cap B_{j_1}) = \infty$ for at least one $j_1$, which implies $\mu(B_{j_1}) = \infty$. Then, since $\int Y = \infty$, it follows that the corresponding $\beta_{j_1} \ge 0$. So, whenever $\mu(A_i \cap B_j) = \infty$, then $\alpha_i \ge 0$ and $\beta_j \ge 0$ for the respective $\alpha_i$'s and $\beta_j$'s. It follows that $\sum_{i,j}(\alpha_i + \beta_j)\mu(A_i \cap B_j)$ is meaningful, and of course, $\sum_{i,j}(\alpha_i + \beta_j)\mu(A_i \cap B_j) = \sum_i \alpha_i \mu(A_i) + \sum_j \beta_j \mu(B_j)$. So, $\int(X + Y)$ exists and is equal to $\int X + \int Y$. Similarly, if $\int X = \int Y = -\infty$. #

7. Since $\int X\,d(\mu_1 + \mu_2)$ exists, at least one of $\int X^+\,d(\mu_1 + \mu_2)$ and $\int X^-\,d(\mu_1 + \mu_2)$ is finite. Let $\int X^+\,d(\mu_1 + \mu_2) < \infty$, and let $0 \le X_n$ simple $\uparrow X^+$ as $n\to\infty$. Then $\int X_n\,d(\mu_1 + \mu_2) \uparrow \int X^+\,d(\mu_1 + \mu_2)$. Suppose $X_n = \sum_{i=1}^{r_n}\alpha_{ni} I_{A_{ni}}$. Then $\int X_n\,d(\mu_1 + \mu_2) = \sum_{i=1}^{r_n}\alpha_{ni}(\mu_1 + \mu_2)(A_{ni}) = \sum_{i=1}^{r_n}\alpha_{ni}\mu_1(A_{ni}) + \sum_{i=1}^{r_n}\alpha_{ni}\mu_2(A_{ni}) = \int X_n\,d\mu_1 + \int X_n\,d\mu_2$. Thus, $\int X_n\,d\mu_1 + \int X_n\,d\mu_2 \uparrow \int X^+\,d\mu_1 + \int X^+\,d\mu_2 = \lim_{n\to\infty}\int X_n\,d(\mu_1 + \mu_2) = \int X^+\,d(\mu_1 + \mu_2) < \infty$, so that $\int X^+\,d\mu_1$ and $\int X^+\,d\mu_2$ are $< \infty$. Then $\int X\,d\mu_1$ and $\int X\,d\mu_2$ exist, and from above, $\int X^+\,d(\mu_1 + \mu_2) = \int X^+\,d\mu_1 + \int X^+\,d\mu_2$. Likewise,
$$\int X^-\,d(\mu_1 + \mu_2) = \int X^-\,d\mu_1 + \int X^-\,d\mu_2,$$
so that X + d(μ1 + μ2 ) − X − d(μ1 + μ2 ) X d(μ1 + μ2 ) = + − + = ( X dμ1 − X dμ1 ) + ( X dμ2 − X − dμ2 ) = X dμ1 + X dμ2 . # 8. By Theorem 12, the integral (X 1 + X 2 ) exists and (X 1 + X 2 ) = X 1 + X 2 . Also, by Theorems 8(iv), 4 and 10, | (X 1 + X 2 )| ≤ |X 1 + X 2 | ≤ |X 1 | + |X 2 | < ∞, so that X 1 + X 2 is integrable. Next, assume k to be true for k and show k+1 the theorem X i = ( i=1 X i ) + X k+1 , and by the it to be true for k + 1. To this end, i=1 k k k induction hypothesis, i=1 X i . Then X i is integrable and X i = i=1 i=1 k k by the first step, ( i=1 X i ) + X k+1 is integrable and [( i=1 X i ) + X k+1 ] = k k+1 k k+1 Xi . # i=1 X i + X k+1 = i=1 X i + X k+1 , or i=1 X i = i=1 2 X −μ1 Y −μ2 2 Cov(X ,Y ) (i) Clearly, 0 ≤ E = 2− , or Cov(X , Y ) ≤ 9. σ1 − σ2 σ1 σ2 2 Y −μ2 1 = 0, σ1 σ2 , with equality occurring if and only if E X −μ σ1 − σ2 Y −μ2 1 which implies that P X −μ σ1 − σ2 = 0 = 1 (by Exercise 2 here), or P Y = μ2 + σσ21 (X − μ1 ) = 1. 2 Y −μ2 ,Y ) 1 Likewise, 0 ≤ E X −μ = 2 + 2 Cov(X , or Cov(X , Y ) ≥ σ1 + σ2 σ1 σ2 2 Y −μ2 1 + = 0, −σ1 σ2 , with equality occurring if and only if E X −μ σ1 σ2 Y −μ2 1 which implies P X −μ = 1, or P Y = μ2 − σσ21 σ1 + σ2 = 0 × (X − μ1 ) = 1. (ii) The first expression in part (i) becomes 2−2ρ(X , Y ) ≥ 0, or ρ(X , Y ) ≤ 1, and ρ(X , Y ) = 1 if and only if P Y = μ2 + σσ21 (X − μ1 ) = 1; whereas the second expression is 2 + 2ρ(X , Y ) ≥ 0, or ρ(X ,Y ) ≥ −1, and ρ(X , Y ) = −1 if and only if P Y = μ2 − σσ21 (X − μ1 ) = 1. # 10. We have X = −1 × I{−2, −1} + 1 × I{3, 7} , so that A X dμ = (X I A )dμ = [(−I{−2, −1} + I{3, 7} )I{−2, 3, 7} ]dμ = (−I{−2} + I{3, 7} )dμ =− (I{−2} dμ + I{3, 7} )dμ = −μ({−2}) + μ({3, 7}) = −2 + 3 + 7 = 8; i.e., A X dμ = 8. # 11. The r.v. X can be written as follows: X=
$\tfrac{1}{2} I_{(-5,2)} + \tfrac{1}{3} I_{\{2\}} + 1\cdot I_{(2,3]} + 0\cdot I_{(3,5)}$.
Then
$$\int_A X\,d\mu = \int_{[-1,4]} X\,d\mu = \int (X I_{[-1,4]})\,d\lambda = \int\Big(\tfrac{1}{2} I_{(-5,2)\cap[-1,4]} + \tfrac{1}{3} I_{\{2\}\cap[-1,4]} + I_{[-1,4]\cap(2,3]}\Big)d\lambda = \int\Big(\tfrac{1}{2} I_{[-1,2)} + \tfrac{1}{3} I_{\{2\}} + I_{(2,3]}\Big)d\lambda = \tfrac{1}{2}\times 3 + \tfrac{1}{3}\times 0 + 1\times 1 = \tfrac{3}{2} + 1 = \tfrac{5}{2}. \#$$
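A minimal numerical check of the computation in Exercise 11 (not part of the original solution): the step function is integrated over $[-1,4]$ with respect to Lebesgue measure by a simple Riemann sum; the grid size is an arbitrary choice.

```python
import numpy as np

def X(t):
    # the simple r.v. of Exercise 11: 1/2 on (-5, 2), 1/3 at {2}, 1 on (2, 3], 0 on (3, 5)
    return np.where((t > -5) & (t < 2), 0.5,
           np.where((t > 2) & (t <= 3), 1.0, 0.0))    # the single point {2} has Lebesgue measure 0

grid = np.linspace(-1, 4, 2_000_001)                  # uniform grid on [-1, 4], an interval of length 5
approx = X(grid).mean() * 5.0                         # Riemann-sum approximation of the integral over [-1, 4]
print(approx)                                         # approximately 2.5
```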
12. That $\mu(A) \ge 0$ and $\mu(\varnothing) = 0$ are obvious. Next, let $A_i$ be events with $A_i \cap A_j = \varnothing$, $i \ne j$. Then $\mu(A_1 + A_2 + \cdots) =$ the number of nonnegative integers in $A_1 + A_2 + \cdots$, and it is clear that this is equal to $\sum_{i=1}^{\infty}(\#$ of nonnegative integers in $A_i) = \sum_{i=1}^{\infty}\mu(A_i)$, so that $\mu$ is a measure. That $\mu$ is $\sigma$-finite is immediate, since $\Omega = \bigcup_{\omega=0}^{\infty}\{\omega\}$ and $\mu(\{\omega\}) = 1$. #

13. Indeed,
$$E\,g(X) = \int_{(|X|\ge c)} g(X)\,dP + \int_{(|X|<c)} g(X)\,dP \ge \int_{(|X|\ge c)} g(X)\,dP \ge g(c)\int_{(|X|\ge c)} dP = g(c)\,P(|X| \ge c)$$
(since $g$ is nondecreasing in $(0,\infty)$ and symmetric about 0). Hence $P(|X| \ge c) \le E\,g(X)/g(c)$. #

14. Let $g(x) = \frac{|x|}{1+|x|}$, $x \in \Re$. Then $g(-x) = g(x)$ and $g(x)$ is nondecreasing in $(0,\infty)$ (because $g'(x) = \frac{1}{(1+x)^2} > 0$ there). Then Exercise 13 applies and gives:
$$P(|X| \ge c) \le \frac{E\big(\frac{|X|}{1+|X|}\big)}{\frac{c}{1+c}} = \frac{1+c}{c}\,E\Big(\frac{|X|}{1+|X|}\Big). \#$$
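As a sanity check of the Exercise 14 bound $P(|X| \ge c) \le \frac{1+c}{c}\,E\frac{|X|}{1+|X|}$, the sketch below (not part of the original solution) compares the two sides by Monte Carlo for a standard normal $X$; the distribution and sample size are arbitrary illustrative choices, and only NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(1_000_000)        # X ~ N(0, 1), an arbitrary illustrative choice
g = np.abs(x) / (1.0 + np.abs(x))         # g(X) = |X| / (1 + |X|)

for c in (0.5, 1.0, 2.0, 3.0):
    lhs = (np.abs(x) >= c).mean()         # P(|X| >= c)
    rhs = (1.0 + c) / c * g.mean()        # ((1 + c)/c) * E[ |X| / (1 + |X|) ]
    print(c, lhs, rhs, lhs <= rhs)        # the bound should hold (up to Monte Carlo error)
```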
15. Let A j0 be an arbitrary but fixed member of the A j s. There is at least one of the Bi s, call it Bi0 , intersecting A j0 . Then Bi0 = α j0 and, actually, the entire Bi0 lies within A j0 . Thus, if k is the number of Bi s intersecting A j0 , then A j0 is the sum of these Bi s. The same holds true for each one of the remaining A j s. # 16. In the first place,
$$E\big[X I_{(|X|\le c)}\big] = \int X I_{(|X|\le c)}\,dP \quad\text{and}\quad E\big[Y I_{(|Y|\le c)}\big] = \int Y I_{(|Y|\le c)}\,dP$$
exist, by the Corollary to Theorem 5, and are finite, by the fact that $|X| I_{(|X|\le c)} \le |X|$, $|Y| I_{(|Y|\le c)} \le |Y|$ and Theorem 10. Next, with $P_X$ and $P_Y$ standing for the probability distributions of $X$ and $Y$, respectively, we have
$$E\big[X I_{(|X|\le c)}\big] = \int X I_{(|X|\le c)}\,dP = \int x I_{(|x|\le c)}\,dP_X \ \text{(by Theorem 13)} = \int x I_{(|x|\le c)}\,dP_Y \ \text{(since $X = Y$ in distribution)} = \int y I_{(|y|\le c)}\,dP_Y = \int Y I_{(|Y|\le c)}\,dP = E\big[Y I_{(|Y|\le c)}\big]. \#$$
Chapter 5 Standard Convergence Theorems, the Fubini Theorem

1. Indeed,
$$n P(|X| \ge n) = \int\big[n I_{(|X|\ge n)}\big]\,dP \le \int\big[|X| I_{(|X|\ge n)}\big]\,dP.$$
However, $0 \le |X| I_{(|X|\ge n)} \le |X|$, independent of $n$ and integrable, and $|X| I_{(|X|\ge n)} \to 0$ as $n\to\infty$. Then, by part (iii) of the Fatou-Lebesgue Theorem, it follows that $\int\big[|X| I_{(|X|\ge n)}\big]\,dP \to 0$, so that $n P(|X| \ge n) \to 0$ as $n\to\infty$.
For the converse, consider the following example: Let $X$ take on the values $4, 5, \ldots$, and let $x_n = \frac{\log n + 1}{(n\log n)^2}$, $n \ge 4$, so that $\sum_{n=4}^{\infty}\frac{\log n + 1}{(n\log n)^2} = \sum_{n=4}^{\infty}\frac{1}{n^2\log n} + \sum_{n=4}^{\infty}\frac{1}{(n\log n)^2} \le 2\sum_{n=4}^{\infty}\frac{1}{n^2} < \infty$. Set $C = \sum_{n=4}^{\infty} x_n$ and take $p_n = P(X = n) = \frac{1}{C}\,\frac{\log n + 1}{(n\log n)^2}$, $n \ge 4$. We shall show that $E X = \infty$ and $n P(X \ge n) \to 0$ as $n\to\infty$. Indeed,
$$C\,E X = \sum_{n=4}^{\infty}\frac{\log n + 1}{n(\log n)^2} = \sum_{n=4}^{\infty}\frac{1}{n\log n} + \sum_{n=4}^{\infty}\frac{1}{n(\log n)^2},$$
and it suffices to show that one of the terms on the right-hand side above is $\infty$. To this end, set $g(x) = \frac{1}{x\log x}$, $x \ge 3.5$, and show that $\sum_{n=4}^{\infty} g(n) = \infty$. Since $g$ is decreasing, a comparison of the sum with the area under $g$ (illustrated by a figure in the original) gives, with $c = 3.5$,
$$\sum_{n=4}^{\infty} g(n) \ge \int_c^{\infty} g(x)\,dx = \int_c^{\infty}\frac{dx}{x\log x} = \int_c^{\infty}\frac{d\log x}{\log x} = \log\log x\Big|_c^{\infty} = \infty.$$
Thus, $E X = \infty$. Next, set $h(x) = \frac{\log x + 1}{(x\log x)^2} = \frac{1}{x^2\log x} + \frac{1}{(x\log x)^2}$, $x \ge c = 3.5$, and note that $\frac{1}{(x\log x)^2} \le \frac{1}{x^2\log x}$. Then, since $C\,n P(X \ge n) = n\sum_{k=n}^{\infty}\frac{1}{k^2\log k} + n\sum_{k=n}^{\infty}\frac{1}{(k\log k)^2}$, it suffices to show that $n\sum_{k=n}^{\infty}\frac{1}{k^2\log k} \to 0$ as $n\to\infty$. Clearly (again comparing the sum with an area; a second figure in the original),
$$\sum_{k=n}^{\infty}\frac{1}{k^2\log k} \le \int_{n-\frac{1}{2}}^{\infty}\frac{dx}{x^2\log x}.$$
Set $\log x = y$, $x = e^y$, $dx = e^y\,dy$, with $y$ ranging over $(c', \infty)$, $c' = \log(n - \tfrac{1}{2})$. Then the integral becomes $\int_{c'}^{\infty}\frac{e^y\,dy}{e^{2y} y} = \int_{c'}^{\infty}\frac{dy}{y e^y}$. However, $\int_{c'}^{\infty}\frac{dy}{y e^y} = -\int_{c'}^{\infty}\frac{de^{-y}}{y} = -\frac{e^{-y}}{y}\Big|_{c'}^{\infty} - \int_{c'}^{\infty}\frac{e^{-y}}{y^2}\,dy \le \frac{e^{-c'}}{c'} = \frac{1}{(n-\frac{1}{2})\log(n-\frac{1}{2})}$. Hence
$$n\sum_{k=n}^{\infty}\frac{1}{k^2\log k} \le \frac{n}{(n-\frac{1}{2})\log(n-\frac{1}{2})} \underset{n\to\infty}{\longrightarrow} 0.$$
This completes the proof. #

2. The integrability of $X$ implies that both $\int X^+$ and $\int X^-$ are finite. Let $0 \le X_n$ simple $\uparrow X^+$ and $0 \le Y_n$ simple $\uparrow X^-$ as $n\to\infty$. Then $|X_n| \le X^+$, independent of $n$ and integrable, so that $\int|X_n - X^+| \to 0$, by part (iii) of the Fatou-Lebesgue Theorem, and likewise $\int|Y_n - X^-| \to 0$. Then, for every $\varepsilon > 0$, there exists
n 0 = n(ε) > 0 integer such that |X n − X + | < 2ε and |Yn − X − | < 2ε . Set X ε = X n 0 − Yn 0 . Then, clearly, X ε is a simple r.v. and |X − X ε | = |(X + − X − ) − (X n 0 − Yn 0 )| = |(X n 0 − X + ) − (Yn 0 − X − )| + ≤ |X n 0 − X | + |Yn 0 − X − | < 2ε + 2ε = ε. # 3. The assumption that Un −→ U and Vn −→ V with U and V finite n→∞ n→∞ implies that Un and Vn are finite for n ≥n 0 , some n 0 . Since U n ≤ X n ≤ Vn a.e., n ≥ 1, it follows that Un ≤ X n ≤ Vn , n ≥ 1, so that X n are finite for n ≥ n 0 . Next, Un ≤ X n a.e. implies 0 ≤ X n − Un a.e. and therefore (by (X − U ) ≤ lim inf Theorem 2(i)), (0 ≤) lim inf n→∞ n n n→∞ (Xn − Un), or X − U , or X − U ≤ lim inf n→∞ X n − U , (X −U ) ≤ lim inf n→∞ n or X ≤ lim inf n→∞ X n . Also, X n ≤ Vn a.e. implies Xn − Vn ≤ 0 a.e. and therefore (by Theorem 2(ii)) lim supn→∞ (X n − Vn ) ≤ lim sup n→∞(X n − X ), or lim sup − V ≤ (X − V ), or lim sup V n n→∞ n→∞ X n − V ≤ n X ≤ lim inf n→∞ X n ≤ X − V , or lim supn→∞ X n ≤ X . Thus lim supn→∞ X n ≤ X , so that limn→∞ X n exists and is equal to X finite. # 4. (i) The assumption |X n | ≤ Un a.e., n ≥ 1, is equivalent to −Un ≤ X n ≤ Un a.e., n ≥ 1, and all assumptions of Exercise with Un replaced 3 are satisfied by −Un and Vn replaced by Un . Then X n −→ X finite. n→∞ a.e.
a.e.
(ii) From |X n | ≤ Un a.e., n ≥ 1, and |X n | −→ |X | and Un −→ U , we get n→∞ n→∞ |X | ≤ U a.e., so that: 0 ≤ |X n − X | ≤ |X n | + |X | ≤ Un + |X | ≤ Un + U a.e., n ≥ 1. Set Z n = Un + U and Z = 2U . Then: 0 ≤ |X n − X | ≤ Z n a.e., n ≥ 1, and
Z n −→
n→∞
Z finite.
a.e. Since also |X n − X | −→ 0, Exercise 3 applies and gives |X n − X | −→ 0. n→∞ n→∞ (iii) Finally, for any A ∈ A, | A X n − A X | = | A (X n − X )| ≤ A |X n − X | ≤ |X n − X |, independent of A and tending to 0 as n → ∞ by part (ii). The proof is completed. # 5.
(i) By the hint, it suffices to show that for each {m} ⊆ {n} there exists {r } ⊆ {m} μ such that X r −→ X . For {m} ⊆ {n}, look at {X m }. Since X m −→ X , r →∞
a.e.
μ
r →∞
r →∞
m→∞
there exists {r } ⊆ {m} such that X r −→ X . Also, since Ur −→ U , there
a.e.
a.e.
exists {s} ⊆ {r } such that Us −→ U , and, of course, X s −→ X . Thus: s→∞ s→∞ a.e. a.e. |X s | ≤ Us a.s., all s, X s −→ X , Us −→ U and Us −→ U finite. Then, s→∞ s→∞ s→∞ by Exercise 4(i), X s −→ X finite. That is, for each {m} ⊆ {n} there s→∞ exists {s} ⊆ {m} such that X s −→ X finite, so that X n −→ X s→∞ n→∞ finite. (ii) As in part (i), from {n} pass to a subsequence {s} for which |X s | ≤ Us a.e., a.e.
a.e.
all s, |X s | −→ |X |, and Us −→ U . Then work as in part (ii) of Exercise s→∞ s→∞ 4 in order to conclude that |X s − X | −→ 0. In turn, this implies, by the s→∞ hint, that |X n − X | −→ 0. n→∞
(iii) Exactly as in Exercise 4(iii). # 6. Suppose that for any ε > 0, there exists δ = δ(ε) > 0 such that μ(A) < δ implies φ(A) < ε. Then, if μ(A) = 0, it follows that, for every ε > 0 and any δ > 0, we have μ(A) < δ and φ(A) < ε. Letting ε → 0, we get φ(A) = 0. So, μ(A) = 0 implies φ(A) = 0. Next, assume that φ μ, and suppose that the statement “for every ε > 0, there exists δ = δ(ε) > 0 such that μ(A) < δ implies φ(A) < ε” is violated. This would imply that, for some ε > 0, there does not exist a δ = δ(ε) > 0 such that μ(A) < δ implies φ(A) < ε. Equivalently, for this ε > 0 and every δ > 0, there exists at least one A such that μ(A) < δ implies φ(A) ≥ ε. Take δn = 21n . Then there exists An with μ(An ) < 21n and φ(An ) ≥ ε. Set ∞ A = lim supn→∞ An = ∩∞ n=1 ∪k=n Ak . Then: μ(A) = μ
(∩_{n=1}^∞ ∪_{k=n}^∞ A_k) ≤ μ(∪_{k=n}^∞ A_k) ≤ ∑_{k=n}^∞ μ(A_k) = 1/2^{n−1} → 0 as n → ∞.
That is, μ(A) = 0. However,
φ(A) = φ(∩_{n=1}^∞ ∪_{k=n}^∞ A_k) = φ(lim_{n→∞} ∪_{k=n}^∞ A_k) = lim_{n→∞} φ(∪_{k=n}^∞ A_k),
as follows from the σ-additivity of φ (see the proof of Theorem 2 in Chapter 2). Since φ(∪_{k=n}^∞ A_k) ≥ φ(A_n), by taking the limits, we get:
lim_{n→∞} φ(∪_{k=n}^∞ A_k) ≥ lim sup_{n→∞} φ(A_n) ≥ ε.
Thus, μ(A) = 0 whereas φ(A) ≥ ε, which is a contradiction to our assumption that φ μ. # 7. (i) ω2 ∈ E ω1 implies that (ω1 , ω2 ) ∈ E, so that (ω1 , ω2 ) ∈ F and then ω2 ∈ Fω1 . Thus, E ω1 ⊆ Fω1 . Similarly, E ω2 ⊆ Fω2 . (ii) If E ω1 ∩ Fω1 = , there would exist ω2 ∈ E ω1 ∩ Fω1 , so that ω2 ∈ E ω1 and ω2 ∈ Fω1 which imply that (ω1 , ω2 ) ∈ E and (ω1 , ω2 ) ∈ F, a contradiction. Thus, E ω1 ∩ Fω1 = , and similarly, E ω2 ∩ Fω2 = .
∞ (iii) ω2 ∈ ∪∞ n=1 E n ω1 implies (ω1 , ω2 ) ∈ ∪n=1 E n , so that (ω1 , ω2 ) ∈ E n for Then ω2 ∈ E n 0 ,ω1 and hence ω2 ∈ ∪∞ one n=1 E n,ω1 . Thus, n, n 0 , say. at least ∞ ∞ E . Then ω2 ∈ E n,ω1 ∪n=1 E n ω ⊆ ∪n=1 E n,ω1 . Next, let ω2 ∈ ∪∞ n=1 n,ω1 1 for at least one n, n 0 , say, so that (ω , ω ) ∈ E and hence n 1 , ω2) ∈ 1 2 ∞ (ω ∞ E ∞ E E . But then ω ∈ ∪ E , so that ∪ ⊆ ∪ ∪∞ 2 n=1 n n=1 n ω1 n=1 n,ω1 n=1 n ω1 , and the result follows. The special case follows from what it was just established and part (ii). Similarly, for the 1 -sections at ω2 . ∞ (iv) If ω2 ∈ ∩∞ n=1 E n ω1 , then (ω1 , ω2 ) ∈ ∩n=1 E n , so that (ω1 , ω2 ) ∈ E n , n ≥ 1, and then ω2 ∈ E n,ω1 , n ≥ 1, or ω2 ∈ ∩∞ n=1 E n,ω1 . Next, if ω2 ∈ E , then ω ∈ E , n ≥ 1, so that (ω ∩∞ 2 n,ω1 1 , ω2 ) ∈ E n , n ≥ 1, and n=1 n,ω1 ∞ E ) . The asserted equality follows, E or ω ∈ (∩ (ω1 , ω2 ) ∈ ∩∞ 2 n=1 n n=1 n ω1 and similarly for the 1 -sections at ω2 . / E, and this implies (v) If ω2 ∈ (E c )ω1 , then (ω1 , ω2 ) ∈ E c , so that (ω1 , ω2 ) ∈ / E ω1 , so that ω2 ∈ (E ω1 )c . On the other hand, if ω2 ∈ (E ω1 )c , that ω2 ∈ / E ω1 , so that (ω1 , ω2 ) ∈ / E and hence (ω1 , ω2 ) ∈ E c , so that then ω2 ∈ c ω2 ∈ (E )ω1 . The asserted equality follows, and similarly for the 1 sections at ω2 . # 8.
(i) $\mu_0 \ge 0$ and $\mu_0(\varnothing) = 0$. It is also $\sigma$-additive, since
$$\mu_0\Big(\sum_{n=1}^{\infty} A_n\Big) = \mu\Big(\Big(\sum_{n=1}^{\infty} A_n\Big)\cap C\Big) = \mu\Big(\sum_{n=1}^{\infty}(A_n\cap C)\Big) = \sum_{n=1}^{\infty}\mu(A_n\cap C) = \sum_{n=1}^{\infty}\mu_0(A_n).$$
(ii) First, let $X = I_A$, $A \in \mathcal{A}$. Since $\int X\,d\mu$ exists, it follows that $\int_C X\,d\mu$ also exists (Corollary to Theorem 5 in Chapter 4), and
$$\int_C X\,d\mu = \int_C I_A\,d\mu = \mu(A\cap C) = \mu_0(A) = \int I_A\,d\mu_0 = \int X\,d\mu_0.$$
So, $\int X\,d\mu_0$ exists and $\int X\,d\mu_0 = \int_C X\,d\mu$. Next, let $X = \sum_{i=1}^{r}\alpha_i I_{A_i}$, where $\alpha_i \ge 0$ for $i = 1, \ldots, r$. Then
$$\int_C X\,d\mu = \int_C\Big(\sum_{i=1}^{r}\alpha_i I_{A_i}\Big)d\mu = \sum_{i=1}^{r}\alpha_i\int_C I_{A_i}\,d\mu = \sum_{i=1}^{r}\alpha_i\int I_{A_i}\,d\mu_0 \ \text{(by the part just proved)} = \int\Big(\sum_{i=1}^{r}\alpha_i I_{A_i}\Big)d\mu_0 = \int X\,d\mu_0.$$
So, again, X dμ0 exists and equals C X dμ. For X ≥ 0, there exists 0 ≤ X n simple r.v.s ↑ X . Then 0 ≤ X n IC n→∞
simple r.v.s ↑ X IC and n→∞
X n dμ =
(X n IC )dμ −→
n→∞
C
(X IC )dμ,
X n dμ0 −→
n→∞
X dμ0 .
n dμ = X n dμ0 for all n, so that However, bythe previous part, C X X dμ = X dμ . Once again, X dμ exists and equals 0 0 C C X dμ. Finally, forany X , consider X = X + − X− , and suppose that X + dμ < + the ∞. Then C X + dμ is also finite, and C X + dμ = − X dμ0 by previous part. Thus, X dμ0 exists. Since also C X dμ = X − dμ0 (whether finite or not), it follows that + − + − X dμ = X dμ− X dμ = X dμ0 − X dμ0 = X dμ0 . C
C
C
Likewise, if we assume that X − dμ < ∞. # 9. Indeed, since X dμ exists, so does Ai X dμ for all i. By Exercise 8, Ai X dμ = X dμi . At this point, assume first that X ≥ 0. Then ∞ ∞ n X dμ = X dμi = (X I Ai )dμ i=1
i=1
Ai
i=1
∞ = ( X I Ai )dμ (by Corollary 1 to Theorem 1) =
i=1
X dμ (because {Ai , i ≥ 1} is a partition of ).
Next, for anyX for which the integral X dμ exists, write X = X + − X − , and suppose that X + dμ < ∞. By the previous step, ∞ ∞ X + dμ and X − dμ. X + dμi = X − dμi = i=1
i=1
+ The finiteness − implies finiteness of X dμi for each i. Thus, the + of integral X dμi − X dμi exist for all i, and ∞ ∞ X + dμi − X − dμi X dμ = X + dμ − X − dμ = X + dμ
i=1
= =
∞ i=1 ∞ i=1
i=1 +
X dμi −
−
X dμi
X + − X − dμi
=
∞ i=1 ∞
X dμi , or X dμi =
X dμ.
i=1
Likewise, if we suppose that X − dμ < ∞. # n xi I Ei where {E 1 , . . . , E n } is a partition of 1 × 2 . Then 10. Let X = i=1 n X dλ = i=1 xi λ(E i ) and its existence means that the xi s corresponding to E i s with λ(E i ) = ∞ (if such E i s exist) are either all positive or all negative. Next, for ω2 ∈ 2 and ω1 ∈ 1 , consider the respective sections E i,ω2 and E i,ω1 , which (by Theorem 8) are measurable with respect to the appropriate σ -fields. Furthermore, 1 =
n
E i,ω2 , 2 =
i=1
n
E i,ω1 ;
i=1
this is so by Exercise 7(iii). Next, consider the (simple) r.v.s: X (·, ω2 ) = X ω2 (·) =
n
xi I Ei ,ω2 ,
X (ω1 , ·) = X ω1 (·) =
i=1
n
xi I Ei ,ω1 .
i=1
Then the integrals X ω2 (·)dμ1 and X ω1 (·)dμ2 exist, are A2 -measurable and A1 -measurable, respectively, and n n xi μ1 (E i , ω2 ), X ω1 (·)dμ2 = xi μ2 (E i , ω1 ). X ω2 (·)dμ1 =
i=1
i=1
The existence of X ω2 (·)dμ1 follows from the fact that, if λ(E i ) < ∞, then μ1 (E i , ω2 ) < ∞ a.e. [μ2 ], because otherwise μ1 (E i , ω2 )dμ2 = λ(E i ) = ∞, a contradiction. So, μ1 (E i , ω2 ) may be ∞ only if λ(E i ) = ∞. But then the respective xi sare either all positive or all negative, so that X ω2 (·)dμ1 exists. The same for X ω1 (·)dμ2 . Their measurability follows from Theorem 10. Next, n [ X ω2 (·)dμ1 ]dμ2 = xi μ1 (E i,ω2 )dμ2 , i=1
n [ X ω1 (·)dμ2 ]dμ1 = xi μ2 (E i,ω1 )dμ1 ,
i=1
and μ1 (E i,ω2 )dμ2 = μ2 (E i,ω1 )dμ1 = λ(E i ), i = 1, . . . , n, by Theorem 10 and the definition of λ. Thus, n [ X ω2 (·)dμ1 ]dμ2 = xi μ1 (E i , ω2 )dμ2 i=1
=
n
xi
μ2 (E i,ω1 )dμ1 (and the common value of the
i=1
two expressions in the middle is
n
xi λ(E i )
i=1
X (ω1 , ω2 )dλ) = [ X ω1 (·)dμ2 ]dμ1 . #
=
+ − 11. Since that at least one +X dλ exists, it follows +X dλ = X− dλ − X dλ and of X dλ or X dλ is finite. Let X dλ < ∞. Since X + is integrable, it follows, by Theorem 12, that: + + X dμ1 dμ2 = X dμ2 dμ1 = X + dλ, + + X dμ2 dμ1 < ∞. Also, since X − ≥ 0, it foland X dμ1 dμ2 < ∞, lows, by Theorem 12 again, that: X − dμ1 dμ2 = X − dμ2 dμ1 = X − dλ. Combining these results, we get + X dμ1 dμ2 = X dμ1 dμ2 − X − dμ1 dμ2 = X + dλ − X − dλ = X dλ, and
X + dμ2 dμ1 − X − dμ2 dμ1 = X + dλ − X − dλ = X dλ;
X dμ2 dμ1 =
i.e., the result holds. Similarly, if √
12. We have d
μ −→ 0 n→∞
n( X¯ n −μ) d −→ Z σ n→∞ d or X¯ n −→ μ, and n→∞
X − dλ < ∞. #
√σ −→ 0, so that, by Theorem n n→∞ P hence X¯ n −→ μ, by Theorem 6. # n→∞
and
13. The positive and negative parts of f are as follows: 1 1 < x ≤ 2n 2n, 2n+1 f + (x) = 0, otherwise 1 1 < x ≤ 2n+1 2n + 1, 2n+2 f − (x) = 0, otherwise.
7 (ii), X¯ n −
(n = 1, 2, . . .) (n = 0, 1, . . .)
Then (0,1]
f + dλ =
∞
1 1 n=1 ( 2n+1 , 2n ]
(2n)dλ =
∞ 1 1 n=1 ( 2n+1 , 2n ]
(2n)dλ
(by Corollary 1(ii) to Theorem 1) ∞ 1 2n 1 − )= (2n) × ( = 2n 2n + 1 (2n)(2n + 1) ∞ n=1
∞
=
n=1
n=1
1 = ∞. 2n + 1
Also, (0,1]
−
f dλ =
∞
1 1 n=0 ( 2n+2 , 2n+1 ]
∞ = (2n + 1) × ( n=0
=
∞ n=0
(2n + 1)dλ =
∞ 1 1 n=0 ( 2n+2 , 2n+1 ]
(2n + 1)dλ
1 1 − ) 2n + 1 2n + 2 ∞
1 1 2n + 1 = = ∞. 2(n + 1)(2n + 1) 2 n+1 n=0
So, (0,1] f + dλ = (0,1] f − dλ = ∞, and therefore the integral (0,1] f dλ does not exist. Furthermore, since | f | = f + + f − , it follows that (0,1] | f |dλ = ∞. # 14. By Exercise 14 in Chapter 4, 1 P(|X n − X | ≥ ε) ≤ (1 + )E ε Thus, if E
|X n −X | 1+|X n −X |
P |X n −X | −→ 1+|X n −X | n→∞
|X n − X | . 1 + |X n − X |
P
P
n→∞
n→∞
−→ 0, then X n −→ X . Next, let X n −→ X . Then
n→∞
|X n −X | 0 (by Exercise 8 in Chapter 3) and (0 ≤) 1+|X −X | ≤ 1 for all n, n |X n −X | integrable. Then by part (b) of Theorem 3, E 1+|X n −X | −→ 0, as was to be n→∞ seen. # Alternate proof (of one direction) without reference to Exercise 14 in Chapter 4: |X n −X | −→ 0 implies that P(|X n − X | ≥ ε) −→ 0, may be seen as That E 1+|X −X | n n→∞ n→∞ follows without employing Exercise 14 in Chapter 4. Namely, for ε > 0, |X n − X | |X n − X | = dP E 1 + |X n − X | (|X n −X |≥ε) 1 + |X n − X |
|X n − X | dP (|X n −X |<ε) 1 + |X n − X | ε dP + 1+ε (|X n −X |≥ε)
+ ≤
(since, for x > 0, the function g(x) = right-hand side above
x 1+x
= 1−
= P(|X n − X | ≥ ε) +
1 1+x
is increasing), and the
ε . 1+ε
Letting n → ∞, we get 0 = lim E
|X n − X | 1 + |X n − X |
≤ lim sup P(|X n − X | ≥ ε) +
ε . 1+ε
Letting ε → 0, we obtain the desired result. # 15. All limits are taken as n → ∞. Then: lim X 2n+1
1 1 , and lim X 2n+1 = 0 on ,1 . = 1 on 0, 2 2
Also, lim X 2n+2 = 0 on
1 1 , and lim X 2n+2 = 1 on ,1 . 0, 2 2
It follows that lim inf X n = 0, and
lim sup X n = 1,
lim inf X n = 0,
Next, X 2n+1 = 1 × It follows that:
1 2
= 21 ,
X 2n+2 = 1 ×
1 2
= 21 , so that
lim inf
lim sup X n = 1.
X n = lim sup
X n = lim
Xn =
and
1 lim inf X n = 0 < = lim inf X n , 2 1 lim sup X n . # lim sup X n = < 1 = 2
Xn = 1 , 2
1 2
for all n.
16. X is, indeed, a r.v. since X −1 ({1/2ω }) = {ω} is in A, and
X dμ =
∞
ω=0 {ω}
X dμ =
∞ ω=0 {ω}
X dμ (by Corollary 1(ii) to Theorem 1)
∞ ∞ 1 1 1 1 × 1 = = 1 + + 2 + ··· ω ω 2 2 2 2
=
ω=0
ω=0
1
=
1−
= 2. #
1 2
17. This exercise is a reformulation of Exercise 3 in this chapter in reference to the space ( , B k , λk ) rather than an abstract (σ -finite) measure space (, A, μ). # 18. d P (i) The convergences X n −→ X and Yn − X n −→ c imply (by Theorem 7(i)) n→∞
n→∞
d
that Yn −→ X + c. n→∞
d
P
n→∞
n→∞
(ii) Again, X n −→ X and Yn −→ c imply (by Theorem 7(ii)) that d
X n Yn −→ cX . # n→∞
19. Let An ∈ A such that P(An ) −→ 0. Then, as n → ∞, P(I An ≥ ε) ≤ 1ε E I An = n→∞
P
1 ε P(An )
→ 0, and therefore |X |I An → 0. Next, |X |I An ≤ |X | independent of n and integrable. Therefore, by the Dominated Convergence Theorem,
(|X | I An )dP =
|X |dP −→ 0. n→∞
An
So, if P(An ) → 0, then ν(An ) → 0, as n → ∞. Thus, for every ε > 0, there is δ = δ(ε)(> 0) such that P(An ) < δ implies ν(An ) < ε for sufficiently large n. It follows that, if P(A) < δ then ν(A) < ε, as was to be seen. # 20.
(i) Indeed,
EX =
X dP =
∞ ∞
=
0
=
0
But ∞ 0
∞ 0
0 ∞
∞
∞ x
xdP X = 0 I[0,x] (t)dt dP X = 0
0
dt dP X
0 ∞ ∞ 0
I[0,x] (t)dtdP X
I[0,x] (t)d PX dt (since the integrand is ≥ 0).
∞
∞
0
t
I[0,x] (t)dP X = P(X ≥ t)dt.
dP X = P(X ≥ t), so that E X =
∞ ∞ ∞ (ii) For the first case, E X = 0 xλe−λx d x = − 0 xde−λx = −xe−λx 0 + ∞ −λx ∞ ∞ ∞ d x = 0 e−λx d x = − λ1 0 de−λx = − λ1 e−λx = λ1 , 0 e 0 whereas ∞ ∞ ∞ −λx P(X ≥ t) = λe dx = − de−λx = −e−λx = e−λt , t
0
t
∞ ∞ ∞ P(X ≥ t)dt = 0 e−λt dt = λ1 0 e−λx d(λx) = ∞0 − λ1 e−λx = λ1 (= E X ). For the second case, E X = 21 , whereas, for 0 0 ≤ t ≤ 1, so that
1
P(X ≥ t) =
d x = 1 − t,
t
so that
∞
1
P(X ≥ t)dt =
0
0
1 t 2 1 1 (1 − t)dt = t − = (= E X ). # 0 2 0 2
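The identity $E X = \int_0^{\infty} P(X \ge t)\,dt$ from Exercise 20 can also be checked numerically. The sketch below (not part of the original solution) does this for the exponential case of part (ii) by Monte Carlo, approximating the tail integral on a finite grid; the rate, sample size, and grid are arbitrary illustrative choices, and only NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(2)
lam = 1.5                                   # arbitrary rate; E X = 1/lam
x = rng.exponential(scale=1.0 / lam, size=1_000_000)

lhs = x.mean()                              # Monte Carlo estimate of E X

xs = np.sort(x)
t = np.linspace(0.0, 20.0, 40_001)          # grid for the tail integral; 20 is far beyond typical values
tail = 1.0 - np.searchsorted(xs, t, side="left") / xs.size   # P(X >= t) estimated from the sample
rhs = np.sum(tail) * (t[1] - t[0])          # Riemann-sum approximation of the tail integral

print(lhs, rhs, 1.0 / lam)                  # all three should be close (about 0.667)
```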
Chapter 6 Standard Moment and Probability Inequalities, Convergence in the r th Mean and its Applications 1. The inequality has been established for n = 2. Assume it to be true for n = k and 1 establish it for n = k + 1. For r1 , . . . , rk+1 > 0 with r11 + . . . + rk+1 = 1, define r by:
=
1 r
1 r1
1 + . . . + r1k (< 1), so that r1 + rk+1 = 1. Also, set ri =
so that ri > 0, i = 1, . . . , k, and Then:
1 r1
+ . . . + r1 = r ( r11 + . . . + k
1
ri r , i = 1, . . . , k, 1 1 rk ) = r × r = 1. 1
E|X 1 . . . X k+1 | = E|(X 1 . . . X k )X k+1 | ≤ E r |X 1 . . . X k |r E rk+1 |X k+1 |rk+1 1
1
= E r (|X 1 |r . . . |X k |r )E rk+1 |X k+1 |rk+1 1
1
1
1
≤ [E r1 (|X 1 |r )r1 . . . E rk (|X k |r )rk ] r E rk+1 |X k+1 |rk+1 (by the induction hypothesis) 1
1
1
= E r1 |X 1 |r1 . . . E rk |X k |rk E rk+1 |X k+1 |rk+1 (since rri = ri , i = 1, . . . , k). Thus, 1
1
E|X 1 . . . X k+1 | ≤ E r1 |X 1 |r1 . . . E rk+1 |X k+1 |rk+1 . # 2.
(i) For an arbitrary but fixed x0 ∈ I , let x1 , x2 ∈ I be such that x1 < x0 < x2 x0 −x1 0 and set α = xx22 −x −x1 , β = x2 −x1 . Then α, β ≥ 0 and α + β = 1. Also, x0 −x1 0 αx1 + βx2 = x0 . Then by convexity, g(x0 ) ≤ xx22 −x −x1 g(x 1 ) + x2 −x1 g(x 2 ).
By letting x2 ↓ x0 , we get: g(x0 ) ≤ lim x2 ↓x0 g(x2 ). Next, let x1 < x2 < x0 x2 −x1 2 and take α = xx00 −x −x1 , β = x0 −x1 , so that α, β ≥ 0, α + β = 1, and x2 −x1 2 αx1 + βx0 = x2 . Therefore g(x2 ) ≤ xx00 −x −x1 g(x 1 ) + x0 −x1 g(x 0 ), and as x2 ↑ x0 , lim x2 ↑x0 ≤ g(x0 ). So, lim x2 ↑x0 ≤ g(x0 ) ≤ lim x2 ↓x0 g(x0 ), so that lim x2 →x0 g(x2 ) = g(x0 ) and g is continuous at x0 . x0 −x1 0 (ii) In g(x0 ) ≤ xx22 −x −x1 g(x 1 )+ x2 −x1 g(x 2 ) (x 1 < x 2 < x 0 ), replace x 2 by x to get: (x − x1 )g(x0 ) ≤ (x − x0 )g(x1 ) + (x0 − x1 )g(x), or (x − x1 ) × g(x0 ) − (x0 − x1 )g(x0 ) ≤ (x − x0 )g(x1 ) + (x0 − x1 )g(x) − (x0 − x1 ) × g(x0 ), or (x − x0 )g(x0 ) ≤ (x − x0 )g(x1 ) + (x0 − x1 )[g(x) − g(x0 )], or (x0 − x1 )[g(x) − g(x0 )] ≥ (x − x0 )[g(x0 ) − g(x1 )], or g(x) − g(x0 ) ≥ g(x0 )−g(x1 ) (x − x0 ). Take x1 = cx0 , c > 0, so that cx0 ∈ I . Then: x0 −x1 g(x) − g(x0 ) = λ(x0 )(x − x0 ), where λ(x0 ) =
g(x0 )−g(x1 ) . x0 −x1
#
3. The inequality has been established for n = 2. Assume it to be true for n = k and establish it for n = k + 1. Indeed, 1
1
E r |X 1 + . . . + X k+1 |r = E r |(X 1 + . . . + X k ) + X k+1 |r 1
1
≤ E r |X 1 + . . . + X k |r + E r |X k+1 |r 1
1
1
≤ (E r |X 1 |r + . . . + E r |X k |r ) + E r |X k+1 |r (by the induction hypothesis) 1
1
≤ E r |X 1 |r + . . . + E r |X k+1 |r . #
4. By the Markov inequality, P(|X n − X | ≥ ε) ≤ ε−r E|X n − X |r , so that ∞ n=1 ∞ P(|X n − X | ≥ ε) ≤ ε−r n=1 E|X n − X |r < ∞. Then Exercise 4(i) in Chapter 3 applies and gives the result. # (r )
P
a.s.
P
n→∞ P
n→∞
n→∞
n→∞
5. X n −→ Y implies X n −→ Y . Also, X n −→ X implies X n −→ X . So, P
X n −→ X and X n −→ Y , so that P(X = Y ) = 0, by Theorem 1 in n→∞ n→∞ Chapter 3. # 6. Take (, A, P) = ((0, 1], B(0,1] , λ), where λ is the Lebesgue measure, and define X n and X as follows: n on (0, n1 ] , n ≥ 1; X = 0 on (0, 1]. Xn = 0 on ( n1 , 1] P
Then, clearly, X n −→ 0 pointwise and hence X n −→ 0. However, E|X n − X |r = E X nr = nr ×
n→∞ 1 r −1 , n =n
(r )
n→∞
so that X n 0 for any r ≥ 1. # n→∞
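Exercise 6's example (convergence in probability without convergence in the $r$-th mean) can be seen numerically. The sketch below (not part of the original solution) draws $\omega$ uniformly on $(0,1]$ and tabulates $P(X_n \ne 0)$ and $E X_n$ (the case $r = 1$); the sample size is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(3)
omega = rng.uniform(0.0, 1.0, size=1_000_000)   # omega drawn according to Lebesgue measure on (0, 1]

for n in (10, 100, 1000, 10_000):
    xn = np.where(omega <= 1.0 / n, n, 0.0)     # X_n = n on (0, 1/n], 0 otherwise
    print(n, (xn != 0).mean(), xn.mean())       # P(X_n != 0) -> 0, while E X_n stays about 1
```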
7. In reference to Exercise 2(ii) in Chapter 3, Ynk = 1 on a set of probability 1/2n−1 and 0 otherwise. Set X n = Ymk for some m and some k = 1, . . . , 2m−1 . Then, a.s. as was seen in the exercise cited, X n 0 (indeed, it does not converge at any n→∞
r = point), but E|X n − X |r = E X nr = EYmk
1 −→ 0 2n−1 n→∞
for any r > 0. #
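Exercise 7 is the converse phenomenon: the "sliding indicator" sequence converges to 0 in the $r$-th mean but at no point $\omega$. The following is a minimal numerical sketch (not part of the original solution), with the $m$-th block consisting of $2^{m-1}$ indicators of consecutive dyadic subintervals of $(0,1]$; the enumeration used here is one concrete choice consistent with the exercise cited.

```python
import numpy as np

def block_and_interval(n):
    # enumerate the sliding indicators: block m = 1, 2, ... holds 2^(m-1) intervals of length 2^(1-m)
    m = 1
    while n > 2 ** (m - 1):
        n -= 2 ** (m - 1)
        m += 1
    left = (n - 1) * 2.0 ** (1 - m)
    return m, left, left + 2.0 ** (1 - m)        # X_n = indicator of (left, right]

omega = 0.3                                      # a fixed sample point
values, means = [], []
for n in range(1, 64):
    m, a, b = block_and_interval(n)
    values.append(int(a < omega <= b))           # X_n(omega): returns to 1 in every block
    means.append(2.0 ** (1 - m))                 # E X_n^r = probability of the interval

print(values)   # does not converge to 0 (a 1 appears in every block)
print(means)    # tends to 0, so X_n -> 0 in the r-th mean
```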
α α (|X n |α ≥c) |X n | dP = (|X n |≥c α1 ) |X n | dP = (|X n |≥c α1 ) 1 1 1 M |X n |β dP = c(β−α)/α )dP ≤ c(β−α)/α E|X n |β ≤ c(β−α)/α inde|X n |β−α
8. Indeed, for c > 0,
×(|X n |β × pendent of n and −→ 0. # n→∞
9. For M > 0, we have:
(|X n |≥M)
|X n |dP =
(|X n |≥M)
X n dP =
cn × 0
1 n
= c for M ≤ cn for M > cn.
Then sup M>0 (|X n |≥M) |X n |dP = c, so that lim sup M→∞ (|X n |≥M) |X n |dP = c, and this implies that (|X n |≥M) |X n |d 0. Therefore the given r.v.s are not M→∞
uniformly integrable. # 10. Here all limits are taken as P(A) → 0, or we can pass to arbitrary sequences {An } with P(An ) −→ 0. We elect the former. Now P(I A > ε) ≤ EεI A = P(A) ε → 0, n→∞
P
P
so that I A → 0 and hence |X |I A → 0. Also, |X |I A ≤ |X | independent of A and integrable. 3(ii) in Chapter 5 (applied under assumption (b)), Then, by Theorem we have |X |I A → 0, or A |X | → 0. So, A |X | → 0 as P(A) → 0. # P
11. The assumptions X n → X and X n ≥ 0 a.s. imply X ≥ 0 a.s. (by passing to a suba.s.
sequence X m −→ X ). Then the additional assumption that E X n −→ E X finite m→∞ (1) X n −→ n→∞
n→∞
X . This is so by Theorem 14. However, 0 ≤ Yn ≤ X n a.s. implies 0 ≤ Yn I A ≤ X n I A a.s. for any event A, and hence A Yn dP ≤ A X n dP. (1) But X n −→ X implies that X n dP are uniformly continuous (by Theorem 13). n→∞ P Therefore so are Yn dP. This fact along with the assumption Yn −→ Y imply implies that
n→∞
(1)
then that Yn −→ Y (again by Theorem 13), so that E|Yn − Y | −→ 0. # n→∞ n→∞ 12. Indeed, |X |r dμ = (|X |r ≥c) |X |r dμ + (|X |r 0, or P(|X − μ| < c) = 1 for every c > 0. In particular, P(|X −μ| < n1 ) = 1 for every n ≥ 1. Set An = (|X −μ| < n1 ), n ≥ 1, and A0 = (|X − μ| = 0) = (X = μ). Then, clearly, An ↓ A0 , so that, by continuity of a n→∞
probability measure, we have P(An ) −→ P(A0 ). Since P(An ) = 1, n ≥ 1, we obtain P(A0 ) = 1. #
n→∞
15.
P
(i) If |X n | −→ 0, then n→∞
P |X n | −→ 0 1+|X n | n→∞
since the function g(x) =
x 1+x , x
≥ 0,
|X n | is continuous, so that Exercise 8 in Chapter 3 applies. Next, 1+|X −→ 0 n | n→∞ |X n | > ε −→ 0 or P(|X n | > means that for every (1 >)ε > 0, P 1+|X n| P
n→∞
(ii)
P ε −→ 0, so that |X n | −→ 0. 1−ε ) n→∞ n→∞ P P |X n | |X n | −→ 0 implies 1+|X −→ 0, n | n→∞ n→∞
|X n | by part (i), whereas 1+|X ≤ 1 inden| pendent of n and integrable. Then by the Dominated Convergence Theo |X n | rem (Theorem 13, part (b), in Chapter 5), E 1+|X n | −→ 0. On the other n→∞ (1) |X n | |X n | −→ 0 is equivalent to −→ 0, and this implies hand, E 1+|X 1+|X n | n| n→∞
n→∞
|X n | that 1+|X −→ 0, by Theorem 8 in Chapter 6, and hence |X n | −→ 0, by n | n→∞ n→∞ part (i). # (i) The function g(x) = |x|r , r > 1, is convex (because the second derivative is positive). Consider a r.v. X taking on the values x1 , . . . , xn with probability n1 each. Then E X = n1 nj=1 x j , and by Jensen’s inequality, g(E X ) ≤ E g(X ), or | n1 nj=1 x j |r ≤ n1 nj=1 |x j |r . (ii) In part (i), replace x j by the r.v. X j to get P
16.
P
n n 1 r ≤ 1 X |X j |r . j n n j=1
j=1
Taking expectations, we obtain the desired result; i.e., n n 1 r 1 E X j ≤ E|X j |r . # n n j=1
j=1
17. The uniform integrability of X n , n ≥ 1, implies that E|X n | ≤ M(< ∞), n ≥ 1, and that A |X n |dP, n ≥ 1, are uniformly continuous. This is so by Theo P rem 11. So, A |X n |dP, n ≥ 1, are uniformly continuous and X n −→ X . Then n→∞ (1) X n −→ X , by Theorem 13(i). Therefore | A X n dP − A X dP| = | A (X n − n→∞ X )dP| ≤ A |X n − X |dP ≤ |X n − X |dP = E|X n − X | independent of A and −→ 0. The desired result follows. # n→∞
18. Uniform integrability of X n , n ≥ 1, and Yn , n ≥ 1, impliesthat E|X n | ≤ M(< ∞), n ≥ 1, E|Yn | ≤ M, n ≥ 1, and A |X n |dP → 0, A |Yn |dP → 0, as P(A) → 0, uniformly in n ≥ 1. This is so by Theorem 11. Next, E|X n + Yn | ≤ E|X n |+E|Yn | ≤ 2M, n ≥ 1, and A |X n +Yn |dP ≤ A |X n |dP+ A |Yn |dP → 0, as P(A) → 0, uniformly in n ≥ 1. Then X n +Yn , n ≥ 1, are uniformly integrable, by the same Theorem, Theorem 11. #
19. By Theorem 11, | X¯ n |, n ≥ 1, are uniformly integrable if and only if E| X¯ n | ≤ M(< ∞), n ≥ 1, and A | X¯ n |dP → 0, as P(A) → 0, uniformly in n ≥ 1. However, E| X¯ n | ≤
n 1 1 E|X j | = × nE|X 1 | = E|X 1 | < ∞, n ≥ 1, n n j=1
and
n 1 1 ¯ | X n |dP ≤ |X j |dP ≤ × n |X 1 |dP = |X 1 |dP n n A A A A j=1
independent of n, and converging to 0 as P(A) → 0. This is so by Exercise 6 in Chapter 5. # a.s. 20. We have |X n | ≤ Y , n ≥ 1, and X n −→ X . Thus, |X | ≤ Y a.s. or |X |r ≤ Y r n→∞
a.s. and hence
E|X |r
≤
EY r
a.s.
P
n→∞
n→∞
< ∞. Also, X n −→ X implies X n −→ X . Thus, P
we have: |X n | ≤ Y , n ≥ 1, EY r < ∞ and X n −→ X . Then Corollary 3 to n→∞
(r )
Theorem 13 applies and gives X n −→ X . # n→∞
21. For ε > 0, apply the Tchebichev inequality to get: P(|X n − μn | ≥ ε) ≤
σn2 −→ 0, ε2 n→∞
P
so that X n − μn −→ 0. # n→∞
22. For ω ∈ (0, 1), we have that X n (ω) = 0 for n > 1/ω, and the first assertion folnr lows. Next, E|X n |r = (2n )r × n1 = 2n = exp[(r nlog2)n] −→ ∞, which completes n→∞ the proof. # (i) For (1 >)ε > 0, we have that n c > ε, so that 23. P(|X n | > ε) = P(X n = n c ) =
1 −→ 0. n n→∞
P
Thus, X n −→ 0. (ii) Here,
n→∞ E|X n |r =
n cr −1 + n −(cr +1) , and therefore E|X n |r −→ 0 if (0 <
)cr < 1, and E|X n |r −→ ∞ if cr > 1.
n→∞
n→∞
(r )
(iii) That X n −→ 0 for all cr < 1, follows by Theorem 14 on account of parts n→∞ (i) and (ii). # (2)
24. We have E|X n − X |2 −→ 0, which is equivalent to saying that X n −→ X , and n→∞
this implies that E X n2 −→ E X 2 < ∞ (by Theorem 7). n→∞
n→∞
Next, E|X n2 − X 2 | = E|(X n + X )(X n − X )| 1
1
≤ E 2 |X n + X |2 × E 2 |X n − X |2 (by the Cauchy-Schwarz inequality) 1
1
1
≤ (E 2 X n2 + E 2 X 2 )E 2 |X n − X |2 (by the Minkowski inequality) 1
1
≤ 2M 2 E 2 |X n − X |2 −→ 0, n→∞
where M is a bound for E X 2 and E X n2 (for sufficient large n). # 25. In the first place, 1
1
E|X n Yn | ≤ E r |X n |r × E s |Yn |s ≤ M < ∞, for all sufficiently large n. Next, E|X n Yn − X Y | = E|X n (Yn − Y ) + (X n − X )Y | 1
1
1
1
≤ E r |X n |r E s |Yn − Y |s + E r |X n − X |r E s |Y |s 1
1
≤ M(E r |X n − X |r + E s |Yn − Y |s ) (for all sufficiently large n) (r )
(s)
n→∞
n→∞
−→ 0, since X n −→ X , Yn −→ Y . #
n→∞
26. Indeed, for any r > 0, E|X n − X |r = E|X n |r = E X nr =
1 −→ 0. n n→∞
a.s.
By Theorem 4 in Chapter 3, X n −→ 0 if and only if n→∞
1 −→ 0 for k = 1, 2, . . . λ ∪ |X n+ν | ≥ n→∞ ν=1 k
∞
Here
∞ ∞ 1 1 1 = ∪ (X n+ν = 1) = ∪ 0, = 0, , ∪ |X n+ν | ≥ ν=1 ν=1 ν=1 k n+ν n+1 1 1 1 so that λ ∪∞ = n+1 |X = λ 0, | ≥ −→ 0, and therefore n+ν ν=1 k n+1 ∞
n→∞
a.s.
X n −→ 0. # n→∞
(r )
P
27. Indeed, X n −→ X implies X n −→ X , and then (by Theorem 5(ii) in Chapter 3) n→∞ n→∞ there exists a subsequence as described. #
28.
(i) For ε > 0, we have ∞
P(|X n | > ε) =
n=1
∞
∞ 1 < ∞. n2
P(X n = en ) =
n=1
n=1
a.s.
P
Therefore X n −→ 0 (by Exercise 4(i) in Chapter 3), hence X n −→ 0.
(ii)
29.
n→∞ n→∞ P 1 r r n Next, E|X n | = e × n 2 and this tends to ∞ as n → ∞. So, X n −→ 0, and n→∞ a.s. (r ) indeed, X n −→ 0, but E|X n |r 0. So, X n 0, by Theorem 14. # n→∞ n→∞ n→∞
(i) For ε > 0, we have ∞
P(|X n | > ε) =
n=1
∞
P(|X n | = 2 ) = cn
n=1
=
n=1
1 1−
∞
1 2
1 2n−1
= 2(< ∞).
a.s.
Then X n −→ 0 (by Exercise 4(i) in Chapter 3). (ii) Next,
n→∞ E|X n |r =
2 × 2cnr ×
and only if cr < 1 −
1 n
1 2n
=
1 2(n−1)−cnr
and this converges (to 0) if a.s.
(n ≥ 2); i.e., cr < 1. Since X n −→ 0 implies n→∞
P
X n −→ 0, this result together with E|X n |r −→ 0, for cr < 1, imply that n→∞ (r )
n→∞
X n −→ 0. # n→∞
30.
(i) From |X | ≥ cI (|X | ≥ c), we get E|X | ≥ c P(|X | ≥ c), so that P(|X | ≥ c) ≤ (E|X |)/c. The special case is immediate. (ii) For t > 0, P(X ≥ c) = P(t X ≥ tc) = P(et X ≥ etc ) ≤ e−tc Eet X , by the Markov inequality. For t < 0, P(X ≤ c) = P(t X ≥ tc) = P(et X ≥ etc ) ≤ e−tc Eet X as above. (iii) Clearly, |X | = |X |I (|X | > 0) + |X |I (|X | = 0), so that 1
1
E|X | = E[|X |I (|X | > 0)] ≤ (E 2 |X |2 )E 2 [I (|X | > 0)]2 1
1
= (E 2 X 2 )[P(|X | > 0)] 2 , 2 or (E|X |)2 ≤ (E X 2 )[P(|X | > 0)], and P(|X | > 0) ≥ (EE|XX 2|) . The special case is immediate. #
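The Markov-type bounds of Exercise 30 are easy to sanity-check by simulation. The sketch below (not part of the original solution) verifies part (i), $P(|X| \ge c) \le E|X|/c$, and part (iii), $P(|X| > 0) \ge (E|X|)^2/E X^2$, for an exponential $X$; the distribution and sample size are arbitrary illustrative choices, and only NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.exponential(scale=2.0, size=1_000_000)     # a nonnegative X, arbitrary illustrative choice

# Part (i): Markov's inequality
for c in (1.0, 2.0, 5.0):
    print(c, (x >= c).mean(), x.mean() / c)        # P(X >= c) should not exceed E X / c

# Part (iii): P(|X| > 0) >= (E|X|)^2 / E X^2
print((x > 0).mean(), x.mean() ** 2 / (x ** 2).mean())   # here LHS = 1 and RHS is about 1/2
```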
31. Indeed, |X n − μ|2 = |(X n − μn ) + (μn − μ)|2 (by the cr -inequality), ≤ 2 |X n − μn |2 + |μn − μ|2
so that
E|X n − μ|2 ≤ 2 E(X n − μn )2 + (μn − μ)2 = 2σn2 + 2(μn − μ)2 −→ 0. n→∞
So, E(X n
− μ)2
−→ 0, and then by the Tchebichev inequality,
n→∞
P(|X n − μ| ≥ ε) ≤ ε−2 E(X n − μ)2 −→ 0. # n→∞
32.
(i) For a > 0, E[|X n − Yn |I(|X n −Yn |≥a) ] ≤ E[|X n |I(|X n −Yn |≥a) ] + E[|Yn |I(|X n −Yn |≥a) ] = |X n | dP + |Yn | dP. (|X n −Yn |≥a)
However, for c > 0, |X n | dP = (|X n −Yn |≥a)
(|X n −Yn |≥a)∩|X n |≥c)
(|X n −Yn |≥a)
|X n | dP
+ ≤
(|X n −Yn |≥a)∩|X n |
(|X n |≥c)
|X n | dP
|X n | dP + c P(|X n − Yn | ≥ a),
and (|X n |≥c) |X n | dP ≤ 2ε for all n (by the uniform integrability of ε , n ≥ n 1 = n 1 (ε), so that |X n |, n ≥ 1) and P(|X n − Yn | ≥ a) ≤ 2c |X n | dP ≤ ε, n ≥ n 1 . (|X n −Yn |≥a)
Similarly,
(|X n −Yn |≥a)
|Yn | dP ≤ ε, n ≥ n 2 = n 2 (ε).
Thus, for n ≥ n 3 = n 3 (ε) = max{n 1 , n 2 }, we get that |X n − Yn | dP ≤ ε, n ≥ n 3 . (|X n −Yn |≥a)
Increasing a so as to make the n 3 − 1 integrals |X n − Yn | dP ≤ ε, (|X n −Yn |≥a)
n = 1, . . . , n 3 − 1, we have that (|X n −Yn |≥a)
for all n, as was to be proved.
|X n − Yn | dP ≤ ε
(ii) Uniform integrability of |X n −Yn |, n ≥ 1, implies their uniform continuity (by Theorem 11). Then uniform continuity and the assumption that X n − P
Yn −→ 0 imply that E|X n − Yn | −→ 0 (by Theorem 13(ii)). # n→∞
33.
n→∞
(i) The slope at θ is 1
1
S(h) = h −1 [e 2 |x−θ|− 2 |x−(θ+h)| − 1]. Let x < θ . Then for small h (either h > 0, or h < 0), we have 1 S(h) = h −1 e−h/2 − 1 −→ − . h→0 2 For x > θ and small h (either h > 0, or h < 0), we have 1 S(h) = h −1 (eh/2 − 1) −→ . h→0 2 The conclusion follows. (ii) Since Pθ (X = θ ) = 0, θ ∈ , it suffices to restrict attention to the cases that X < θ or X > θ . From part (i), it follows that, for each θ ∈ , ˙ ; θ ) with Pθ -probability 1, and hence S(h) −→ g(X ˙ ; θ ) in S(h) −→ g(X h→0
h→0
q.m.
Pθ -probability. In order to show that S(h) −→ g(X ˙ ; θ ), it suffices to show h→0
2 ˙ ; θ ) < ∞ (see Theorem 14 in this chapthat Eθ S 2 (h) −→ Eθ g(X h→0
ter). By the fact that Pθ (X > θ ) = Pθ (X < θ ) = 21 , it follows that
2 ˙ ; θ ) = 41 . Eθ g(X Next, 1 2 1 1 1 2 = 2 1 − Eθ e 2 |X −θ|− 2 |X −(θ+h)| Eθ h −1 e 2 |X −θ|− 2 |X −(θ+h)| − 1 h (since p(X ; θ + h) |X −θ|−|X −(θ+h)| = Eθ e = Eθ p(x; θ + h)d x = 1), p(X ; θ )
and 1
1
I = Eθ e 2 |x−θ|− 2 |x−(θ+h)| =
1 2
∞ −∞
1
1
e− 2 |x−θ|− 2 |x−(θ+h)| d x.
For h < 0: 1 θ+h − 1 |x−θ|− 1 |x−(θ+h)| 1 θ − 1 |x−θ|− 1 |x−(θ+h)| 2 2 I = e 2 dx + e 2 dx 2 −∞ 2 θ+h ∞ 1 1 1 + e− 2 |x−θ|− 2 |x−(θ+h)| d x 2 θ 1 = (2 − h)eh/2 , 2
and for h > 0: 1 θ+h − 1 |x−θ|− 1 |x−(θ+h)| 1 θ − 1 |x−θ|− 1 |x−(θ+h)| 2 2 e 2 dx + e 2 dx I = 2 −∞ 2 θ ∞ 1 1 1 + e− 2 |x−θ|− 2 |x−(θ+h)| d x 2 θ+h 1 = (2 + h)e−h/2 . 2 Therefore 1 2 1 Eθ h −1 e 2 |X −θ|− 2 |X −(θ+h)| − 1 2
1 h/2 for h < 0 2 1 − 2 (2 − h)e h
= 2 1 −h/2 for h > 0 1 − (2 + h)e 2 2 h1 t t for t < 0 2 (1 − e + te ) = 2t1 −t −t (1 − e − te ) for t > 0. 2t 2 However, 2t12 (1 − et + tet ) → 41 as t → 0, and likewise 2t12 (1 − e−t − te−t ) → 41 as t → 0. It follows that 1 2
2 1 Eθ h −1 e 2 |X −θ|− 2 |X −(θ+h)| − 1 ˙ ; θ) → Eθ g(X as h → 0, and the result follows. # 34.
(i) This follows from Exercise 2 in Chapter 4. (ii) X ≥ Y or X − Y ≥ 0 implies E(X − Y ) ≥ 0 and E(X − Y ) = 0 only if P(X − Y = 0) = 1; equivalently, E X ≥ EY and E X = EY only if P(X = Y ) = 1. (iii) Indeed, X > Y implies X ≥ Y and hence E X ≥ EY , and E X = EY only if P(X = Y ) = 1, which is a contradiction. (iv) Strict convexity of g means that g(Z ) > g(E Z ) + λ(E Z )(Z − E Z ), and hence E g(Z ) > g(E Z ) unless (by part (iii))
P g(Z ) = g(E Z ) + λ(E Z )(Z − E Z )
= g(E Z ) − λ(E Z )(E Z ) + λ(E Z )Z = a + bZ = 1. However, this is not possible if b = 0, because, although the straight line y = a + bz is convex, it is not strictly so. Since b need not be 0, the equality P[g(Z ) = a + bZ ] = 1 may only occur if P(Z = constant) = 1, as was to be seen. #
Chapter 7 The Hahn-Jordan Decomposition Theorem, the Lebesgue Decomposition Theorem, and the Radon-Nikodym Theorem
1. To show An 0 = A + ∞ j=n 0 (A j − A j+1 ). Indeed, if ω belongs to the right-hand side, then either ω ∈ A and hence ω ∈ An 0 , or ω ∈ (A j − A j+1 ) for some / A j+1 ) and therefore ω ∈ An 0 since An ↓. Next, j ≥ n 0 . Hence ω ∈ A j (but ω ∈ let ω ∈ An 0 . Then, either ω ∈ A j for all j ≥ n 0 , so that ω ∈ ∩∞ j=n 0 A j , hence A = A, and thus, ω belongs to the right-hand side. Or there is at least ω ∈ ∩∞ j=1 j / A j . Let j0 be the smallest j > n 0 for which this is true. one j > n 0 such that ω ∈ / A j0 withj0 > n 0 or j0 ≥ n 0 + 1 or j0 − 1 ≥ n 0 . That is, ω ∈ A j0 −1 , but ω ∈ Then ω ∈ (A j0 −1 − A j0 ), so that ω ∈ ∞ j=n 0 (A j − A j+1 ); i.e., ω belongs to the right-hand side.# ∞ n 2. Define μon A by: μ(A) = n=1 μn (A)/2 μn (). Then μ is finite, since ∞ −n μ() = n=1 2 = 1. It is a probability measure, since μ() = 0, μ(A) ≥ 0 ∞ ∞ ∞ ∞ 1 1 for all A ∈ A, and μ i=1 Ai = n=1 2n μn () μn i=1 Ai = n=1 2n μn () ∞ ∞ ∞ μn (Ai ) ∞ ∞ μn (Ai ) i=1 μn (Ai ) = n=1 i=1 2n μn i=1 n=1 2n μn () (because the () = ∞ terms are ≥ 0), and this is equal to i=1 μ(Ai ) (i.e., μ is σ -additive). Finally, μn (Ai ) μn (Ai ) μ(A) = 0 is equivalent to ∞ n=1 2n μn () = 0, which implies 2n μn () = 0 for all n, or μn (A) = 0 for all n. Thus, μn μ, n ≥ 1, and μ is a probability measure. # 3. (i) The first two properties are immediate, and the triangular inequality follows from the fact that |P(A) − Q(A)| ≤ |P(A) − R(A)| + |R(A) − Q(A)| for every A ∈ A. (ii) For any A ∈ A, set B = ( f − g > 0), C = (g − f > 0). Then f dμ − g dμ = ( f − g) dμ |P(A) − Q(A)| = A A A = ( f − g) dμ + ( f − g) dμ A∩C c A∩B = ( f − g) dμ − (g − f ) dμ . A∩B
A∩C
Since f − g > 0 on A ∩ B, and g − f > 0 on A ∩ C, it follows that the last quantity on the right-hand side above is maximized by maximizing A∩B ( f − g)dμ and by minimizing A∩C (g − f )dμ. This happens by taking A = B = ( f − g > 0), in which case A ∩ C = B ∩ C = . Thus, ( f − g)dμ. Similarly, max{|P(A) − max{|P(A) − Q(A)|; A ∈ A} = B Q(A)|; A ∈ A} = C (g − f )dμ. Since | f − g|dμ = ( f − g)dμ + (g − f )dμ,
the conclusion follows. #
B
C
4. Let Pn and Q n be the probability measures corresponding to U − n1 , 1 and U 0, 1 + n1 , and let λ be the Lebesgue measure. Then dPn n+1 = f n (x) = I 1 (x), dλ n [− n ,1]
d Qn n+1 = gn (x) = I 1 (x), dλ n [0,1+ n ]
n for − n1 ≤ x ≤ 0 or 1 ≤ x ≤ 1 + n1 , and 0 so that | f n (x) − gn (x)| = n+1 n 2 otherwise. It follows that | f n (x) − gn (x)|dλ = n+1 × n2 = n+1 −→ 0, and n→∞ the result follows by Exercise 3. # 5. By Exercise 3(ii), P − Q = 2 sup[|P(A) − Q(A)|; A ∈ A] = | f − g|dμ.
Then, with B = ( f − g > 0), we have | f − g|dμ = | f − g|dμ + | f − g|dμ B Bc = ( f − g)dμ + (g − f )dμ Bc
B
= [P(B) − Q(B)] + [Q(B c ) − P(B c )] = 2[P(B) − Q(B)], so that P − Q = 2[P(B) − Q(B)]. Set C = (|Z | > ε). Then P(B) − Q(B) = P(B ∩ C) + P(B ∩ C c ) − Q(B ∩ C) − Q(B ∩ C c ) ≤ P(C) + P(B ∩ C c ) − Q(B ∩ C c ) = P(|Z | > ε) + P(B ∩ C c ) − Q(B ∩ C c ). But
Q(B ∩ C c ) =
g dμ
B∩C c
= B∩C =
c
B∩C c
g f dμ (since on B, f > g and hence f > 0) f exp Z f dμ ≥ e−ε P(B ∩ C c ) (since |Z | ≤ ε on C c ).
Combining results derived so far, we have P − Q = 2[P(B) − Q(B)] ≤ 2P(|Z | > ε) + 2P(B ∩ C c ) − 2e−ε P(B ∩ C c ) = 2P(|Z | > ε) + 2P(B ∩ C c )(1 − e−ε ) ≤ 2(1 − e−ε ) + 2P(|Z | > ε). #
6.
(i) Here ρ=
( f g)
≤
1/2
dμ =
( f 1/2 g 1/2 )dμ
1/2 1/2 f dμ g dμ
(by the Cauchy-Schwartz inequality)
= 1. (ii) Clearly, | f − g| = | f 1/2 − g 1/2 || f 1/2 + g 1/2 |
≥ | f 1/2 − g 1/2 |2 = f + g − 2( f g)1/2 , so that | f − g|dμ ≥ 2 1 − ( f g)1/2 dμ = 2(1 − ρ),
which is the left-hand side of the inequality. Next, | f 1/2 + g 1/2 || f 1/2 − g 1/2 | dμ | f − g|dμ =
≤ =
|f
1/2
+g
1/2 1/2 1/2 1/2 2 | dμ |f − g | dμ
1/2 2
1/2 ( f + g + 2 f 1/2 g 1/2 )dμ
×
(f +g−2f
1/2 1/2
g
1/2 )dμ
= 2[(1 + ρ)(1 − ρ)]1/2 = 2(1 − ρ 2 )1/2 , which is the right-hand side of the inequality. (iii) This is immediate by the double inequality 2(1 − ρn ) ≤ d(Pn , Q n ) ≤ 2(1 − ρn2 )1/2 . # 7. The limits are taken as {n} or subsequences thereof tend to ∞. Let be the d.f. of the N (0, 1) distribution with respective p.d.f. (2π )−1/2 exp(−x 2 /2) (with respect to Lebesgue measure λ). Then, for xn → 0, we have: 2 −1/2 | f n (x + xn )|d(x) = (2π ) | f n (x + xn )|e−x /2 dλ(x)
2 −1/2 (2π ) | f n (y)|e−(y−xn ) /2 dy =
(setting x + xn = y).
Refer to Exercise 17 in Chapter 5, and set h n (y) = (2π )−1/2 | f n (y)|e−(y−xn ) G n (y) = (2π ) G(y) = (2π )
−1/2
−1/2
Me
Me
−(y−xn )2 /2
−y 2 /2
2 /2
, gn (y) = 0,
, h(y) = g(y) = 0,
.
Then the conditions of that exercise are satisfied, and therefore h n (y)dy → 0 or | f n (x + xn )|d(x) → 0.
So, by the Markov inequality, 1 -probability (| f n (x + xn )| > ε) ≤ ε
| f n (x + xn )|d(x) → 0,
so that | f n (x + xn )| → 0 in -measure. Hence there exists a subsequence {xm } ⊆ {xn } such that f m (x + xm ) → 0 a.e. [-measure]. However, it is clear that the measure is mutually absolutely continuous with respect to λ. Thus, f m (x +xm ) → 0 a.e. [λ], as was to be seen. Remark: From the proof, it follows that the result also holds if is replaced by
k , k ≥ 1. # 8. Define gn by gn = − f n− , and observe that the gn ’s satisfy the conditions of the f n ’s in Exercise 7. Therefore, for any xn −→ 0, there exists {xm } ⊆ {xn } such n→∞
that gm(x + xm) −→ 0 as m → ∞. By their definition, fn(x + xn) ≥ gn(x + xn) for all n and x ∈ ℜ. Therefore, as n → ∞ (or m → ∞), lim sup fn(x + xn) ≥ lim sup gn(x + xn) ≥ lim gm(x + xm) = 0 a.e. [λ], as was to be seen. #
Chapter 8 Distribution Functions and their Basic Properties, Helly-Bray Type Results 1. In the first place, 0 ≤ y ≤ 1. Next, for such ys and t ∈ , we show that F −1 (y) ≤ t if and only if y ≤ F(t). Indeed, let F −1 (y) ≤ t. Then F[F −1 (y)] ≤ F(t). Also, from the definition of F −1 (y), there exist xn ∈ {x ∈ ; F(x) ≥ y} such that xn ↓ F −1 (y), as n → ∞. Hence, by the right continuity of F, F(xn ) ↓ F[F −1 (y)], as n → ∞. However, F(xn ) ≥ y for all n, and therefore F[F −1 (y)] ≥ y. Combining this result with the result F[F −1 (y)] ≤ F(t), we obtain y ≤ F(t). In the other way around, let
y ≤ F(t). Then, clearly, t ∈ {x ∈ ; F(x) ≥ y}, and therefore t ≥ F −1 (y) by the definition of F −1 (y). The justification of the assertion is completed. Now, the function x = F −1 (y) is B-measurable, because {y ∈ ; F −1 (y) ≤ c} = {y ∈ ; y ≤ F(c)}, by the assertion already established, and the last set is (−∞, F(c)], which is in B for all c ∈ . Thus, if Y ∼ U (0, 1) and set X = F −1 (Y ), then X is a r.v. Finally, for x ∈ : FX (x) = P(X ≤ x) = P[F −1 (Y ) ≤ x] = P[Y ≤ F(x)] = F(x), as it was to be shown. # 2. (i) The following relations are self-explanatory. For every ε > 0, (X n ≤ x) = (X n ≤ x) ∩ (|X n − X | ≥ ε) + (X n ≤ x) ∩ (|X n − X | < ε) ⊆ (|X n − X | ≥ ε) ∪ (X n ≤ x) ∩ (|X n − X | < ε) = (|X n − X | ≥ ε) ∪ (X n ≤ x) ∩ (X − ε < X n < X + ε) ⊆ (|X n − X | ≥ ε) ∪ (X n ≤ x) ∩ (X − ε < X n ) ⊆ (|X n − X | ≥ ε) ∪ (X − ε ≤ x) = (|X n − X | ≥ ε) ∪ (X ≤ x + ε). Hence FX n (x) = P(X n ≤ x) ≤ P(|X n − X | ≥ ε) + P(X ≤ x + ε) = P(|X n − X | ≥ ε) + FX (x + ε). P
Let n → ∞ and recall that X n → X to obtain lim FX n (x) ≤ FX (x + ε) for every ε > 0. Let x ∈ C(Fx ) and let ε → 0 to obtain lim FX n (x) ≤ FX (x).
n→∞
Next, (X ≤ x − ε) = (X ≤ x − ε) ∩ (|X n − X | ≥ ε) + (X ≤ x − ε) ∩ (|X n − X | < ε) ⊆ (|X n − X | ≥ ε) ∪ (X ≤ x − ε) ∩ (X − ε < X n < X + ε) ⊆ (|X n − X | ≥ ε) ∪ (X ≤ x − ε) ∩ (X n < X + ε) ⊆ (|X n − X | ≥ ε) ∪ (X n ≤ x), so that FX (x − ε) = P(X ≤ x − ε) ≤ P(|X n − X | ≥ ε) + P(X n ≤ x) = P(|X n − X | ≥ ε) + FX n (x).
As n → ∞,
FX (x − ε) ≤ limFX n (x) for every ε > 0.
Taking x ∈ C(FX ) and letting ε → 0, we have FX (x) ≤ limn→∞ FX n (x). Combining lim and lim, we get lim FX n (x) = FX (x). (ii) Let P(X = −1) = P(X = 1) = 21 and let X n = −X , n ≥ 1, so that X n − X = −2X and FX n = FX . However, for 0 < ε < 21 , P(|X n − X | ≥ ε) = P(| − 2X | ≥ ε) = P(|X | ≥ 2ε ) = 1, so that X n X . # P
3. We have V ar × Fn = V ar × F = 1, n ≥ 1, and Fn ⇒ F. Then, by Theorem n→∞
3(iii), Fn (±∞) −→ F(±∞). But Fn (∞) = 1 and Fn (−∞) = 0, n ≥ 1. Thus, n→∞
F(∞) = 1 and F(−∞) = 0, so that F is the d.f. of a r.v. # 4. Observe that the proof of Theorem 5 depends on the following facts: First, that F(x) is bounded (not necessarily by 1), so that, e.g., the sequence {Fn (x1 )} has a convergent subsequence {Fn1 (x1 )}. The same argument is repeated throughout the proof. The second fact is Proposition 1, whose justification, however, uses only property #2 in Definition 1, in order to argue that F has only jumps as its discontinuities, if any. The third fact is Proposition 2, whose justification also holds, if F is defined thus: F : → [0, B]. # 5. Regarding the proof of Theorem 6, the only point where the boundedness F(x) ≤ 1 and Fn (x) ≤ 1 enters the picture is the following: (g − gm )d F ≤ sup |gm (x) − g(x)| [F(β) − F(α)], (α,β]
x∈(α,β]
which would be ≤ Bsupx∈(α,β] |gm (x) − g(x)| rather than ≤ supx∈(α,β] |gm (x) − g(x)|, should F be bounded by B. The same for | (α,β] (g − gm )d Fn |. The proof of Theorem 7 hinges upon the following points: Finiteness of the integrals gd Fn and gd F, which is ensured by the boundedness of Fn and F. Relation (8.4), which holds on account of Theorem 6; relation (8.5), which holds on account of the boundedness of F; and relation (8.6), where now the bound is B × supx∈(α,β] g(x), which is < ε/3 for sufficiently small α and sufficiently large β. Finally, the proof of Theorem 8 depends on the following facts: That the integrals gd Fn and gd F are finite, which is ensured by the boundedness of Fn and F. That the boundedness gd Fn − gd F ≤ M{V ar .Fn − [Fn (β) − Fn (α)]}
holds regardless of the bound of Fn and F. That (8.7) holds, because all quantities figuring there are finite, due to the boundedness of Fn and F. Also, relations (8.8) and (8.9) hold by the finiteness of the integrals involved, and finally, relation (8.10) holds on account of Theorem 6. # 6. In all that follows, n → ∞ unless otherwise explicitly stated. Now |X n | ≤ Y , n ≥ 1, implies |X n |r ≤ Y r , n ≥ 1, and hence E|X n |r ≤ EY r < ∞, n ≥ 1.
Also, P(A) → 0 implies A Y r dP → 0 (Exercise 6, Chapter 5), and hence r A |X n | dP → 0 uniformly in n ≥ 1. Then Theorem 11, Chapter 6, gives r r |X n | dP = |X n | d F(x) ≤ ε, n ≥ 1. (8.1) (|X n |>c)
(|X n |>c)
E|X n |r =
|X n |r dP =
(|x|>c)
By (8.2) and (8.1) here, r 0 ≤ E|X n | −
(|x|≤c)
(|X n |>c)
=
|X n |r dP +
|x|r d Fn (x) +
|x| d Fn (x) =
(|x|≤c)
|X n |r dP
|x|r d Fn (x).
|x| d Fn )x r
(|x|>c)
(8.2)
r
With −c, c ∈ C(F), and by Theorem 6 here, r |x| d Fn (x) −→ (|x|≤c)
(|X n |≤c)
n→∞ (|x|≤c)
≤ ε, n ≥ 1. (8.3)
|x|r d F(x).
Taking the limits in (8.3) as n → ∞, and using (8.4), we get 0 ≤ lim E|X n |r − |x|r d F(x) (|x|≤c)
r |x| I(|x|≤c) d F(x) = lim E|X n |r −
r r ≤ lim E|X n | − |x| I(|x|≤c) d F(x) ≤ ε.
(8.4)
(8.5)
Also,
r
|x| I(|x|≤c) d Fn (x) = E |X n |r I(|X n |≤c) ≤ E|X n |r ≤ EY r < ∞, n ≥ 1.
By (8.6) and (8.4), and with −c, c in C(F),
r |x| I(|x|≤c) d F(x) ≤ EY r .
(8.6)
(8.7)
Since 0 ≤ |x|r I(|x|≤c) , it follows (by the Fatou-Lebesgue Theorem, Theorem 2 in Chapter 5) that, as c ↑ ∞,
r |x| I(|x|≤c) d F(x). lim |x|r I(|x|≤c) d F(x) ≤ lim
However,
lim |x|r I(|x|≤c) d F(x) =
=
and lim
since the integrals
lim |x|r I(|x|≤c) d F(x) |x|r d F(x),
r |x| I(|x|≤c) d F(x) = lim
r
|x| I(|x|≤c) d F(x),
|x|r I(|x|≤c) d F(x) ↑ (as c ↑ ∞). It follows that
r r |x| d F(x) ≤ lim |x| I(|x|≤c) d F(x).
From (8.8) and (8.7), it follows that |x|r d F(x) < ∞.
(8.8)
(8.9)
From |x|r I(|x|≤c) ≤ |x|r independent of c and integrable (by (8.9)), and the fact that |x|r I(|x|≤c) → |x|r (as c ↑ ∞ with −c, c in C(F)), we have (by the Dominated Convergence Theorem) r
|x| I(|x|≤c) d F(x) → |x|r d F(x). (8.10) Finally, taking the limits in (8.5), as c ↑ ∞ (with −c, c in C(F)), and using (8.10), we have r r r 0 ≤ limE|X n | − |x| d F(x) ≤ limE|X n | − |x|r d F(x) ≤ ε. Letting ε → 0, we get that
limE|X n |r = limE|X n |r = lim E|X n |r =
|x|r d F(x)
= E|X |r . # 7. (i) Let xn ↓ 0 as n → ∞, and define the events An = (X ≤ x − xn ), A = (X < x). Then, clearly, as n → ∞, An ↑ and limn→∞ An = ∪∞ n=1 An = A. Then, by the continuity of a probability measure, lim P(An ) = P( lim An ) = P(A),
n→∞
n→∞
or lim P(X ≤ x − xn ) = P(X < x).
n→∞
However, limn→∞ P(X ≤ x − xn ) = F(x−). Thus, P(X < x) = F(x−).
(ii) By part (i), P(X = x) = P(X ≤ x) − P(X < x) = F(x) − F(x−). So, if F is continuous at x, then F(x) = F(x−), and hence P(X = 0). If P(X = x) = 0, then F(x) = F(x−), so that F is continuous at x. # 8. F has got to be nondecreasing, continuous from the right, and F(−∞) = 2 0, F(∞) = 1. For (0 <)x < y, we must have F(x) ≤ F(y) or α + βe−x /2 ≤ 2 2 2 2 2 α +βe−y /2 or βe−x /2 ≤ βe−y /2 . Now x < y is equivalent to e−y /2 < e−x /2 , and therefore the previous inequality holds if β < 0. Next, for x < 0 and x > 0, F(x) is continuous, so it remains to examine the case x = 0. Since F(0) = 0, we must have F(x) → 0 as x ↓ 0, or lim x↓0 (α + 2 2 βe−x /2 ) = α + βlim x↓0 e−x /2 = α + β = 0, so that β = −α. 2 Finally, 1 = F(∞) = lim x→∞ F(x) = lim x→∞ (α + βe−x /2 ) = α. Summarizing the above conclusions, we have then α = 1, β = −1, so that F(x) = 2 1 − e−x /2 , x > 0. # 9. For ε > 0, select a < b, so that F(a) <
ε ε , F(∞) − F(b) < . 3 3
(1)
Next, F being continuous, is uniformly continuous in [a, b]. Then we can select a partition a = x0 < x1 < . . . < xk−1 < xk = b, so that F(x j ) − F(x j−1 ) <
ε , j = 1, . . . , k. 3
(2)
Since Fn (x) −→ F(x) for all x, we have n→∞
−
ε ε < Fn (x j ) − F(x j ) < for n ≥ N independent of j. 3 3 c
(3)
Finally, since Fn −→ F, Theorem 3(iii) implies that Fn (±∞) −→ F(±∞), so n→∞ n→∞ that ε ε − < Fn (∞) − F(∞) < , n ≥ N . (4) 3 3 Next, we proceed as follows. First, let x ∈ [a, b], so that x ∈ [x j−1 , x j ] for exactly one j. Then, for n ≥ N = max{N , N }, Fn (x) − F(x) ≤ Fn (x j ) − F(x j−1 ) ε ≤ F(x j ) + − F(x j−1 ) 3 = [F(x j ) − F(x j−1 )] + <
ε 2ε ε + = <ε 3 3 3
(since x j−1 ≤ x ≤ x j ) (by (3)) ε 3 (by (2)).
e73
e74
Revised Answers Manual to an Introduction
Also, Fn (x) − F(x) ≥ Fn (x j−1 ) − F(x j ) ε ≥ F(x j−1 ) − − F(x j ) 3 = −[F(x j ) − F(x j−1 )] − ε 2ε ε > −ε >− − =− 3 3 3
(since x j−1 ≤ x ≤ x j ) (by (3)) ε 3 (by (2)).
Therefore, for x ∈ [a, b], |Fn (x) − F(x)| < ε, n ≥ N .
(5)
Next, let x < a, then, for n ≥ N , 0 ≤ F(x) ≤ F(a) < 3ε ; i.e., F(x) ≤ 3ε . Also, 0 ≤ Fn (x) ≤ Fn (a) < F(a) + <
ε (by (3)) 3
ε ε 2ε + = < ε (by (2)); i.e., 3 3 3
Fn (x) < 2ε 3 . Therefore Fn (x) − F(x) < 0 − 3ε = − 3ε > −ε, so that |Fn (x) − F(x)| <
2ε 3
−0 =
2ε 3
< ε, and Fn (x) − F(x) >
ε , n ≥ N (x < a). 3
(6)
Finally, let x > b. Then, for n ≥ N , F(∞) −
ε < F(b) (by (1)) 3 ≤ F(x) ≤ F(∞),
ε and, by (1), (3), and (4), F(∞) − 2ε 3 < F(b) − 3 < Fn (b) ≤ Fn (x) ≤ Fn (∞) ≤ F(∞) + 3ε ; i.e., F(∞) − 3ε ≤ F(x) ≤ F(∞) and F(∞) − 2ε 3 ≤ Fn (x) ≤ F(x) + 3ε . Therefore
Fn (x) − F(x) ≤ F(∞) +
ε 2ε ε − F(∞) + = < ε, 3 3 3
(7)
Fn (x) − F(x) ≥ F(∞) −
2ε 2ε − F(∞) = − > −ε. 3 3
(8)
and
Relations (7) and (8) yield, |Fn (x) − F(x)| < ε, n ≥ N (x > b). The result follows from relations (5), (6) and (9). #
(9)
10. (i) In the first place, P(Y = C) = P(X > C) = 1 − FX (C). Next, FY (y) = P(Y ≤ y) = P(Y ≤ y|X ≤ C)P(X ≤ C) + P(Y ≤ y|X > C)P(X > C) = P(X ≤ y|X ≤ C)P(X ≤ C) + P(C ≤ y|X > C)P(X > C) = P(X ≤ y, X ≤ C) + P(C ≤ y, X > C). Then, for y < C, FY (y) = P(X ≤ y) + 0 = FX (y), whereas, for y ≥ C, FY (y) = P(X ≤ C) + P(X > C) = 1. So, FY (y) = FX (y) for y < C, and FY (y) = 1 for y ≥ C. (ii) Define F1 and F2 as follows: FX (x), x ≤ C 0, x ≤C , F2 (x) = . F1 (x) = FX (C), x > C 1 − FX (C), x > C Then, clearly, F1 is a continuous d.f., F2 is a step function, and F1 + F2 = FY . # 11. For n ≥ 1, let X n be r.v.s such that P(X n = −n) = P(X n = n) = 21 , so that the respective d.f.s are given by: ⎧ ⎨ 0, x < −n Fn (x) = 1/2, −n ≤ x < n , n ≥ 1. ⎩ 1, x ≥ n Then, clearly, Fn (x) −→ 21 for every x ∈ . Thus, if F(x) = 21 , x ∈ , then F n→∞ is a d.f., Fn =⇒ F, but F is not the d.f. of a r.v. # n→∞ 12. To show that (|X n |≥a) |X n |dP → 0 uniformly in n as (0 <)a → ∞, or (|x|≥a) |x| d Fn → 0 uniformly in n as a → ∞. For 0 < c < a, we have |x|d Fn ≤ |x|d Fn + |x|d Fn (|x|≥a)
(−∞,−c] ∞
=
−∞
|x|d Fn −
(c,∞)
(−c,c]
|x|d Fn .
Thus, it suffices to show that the right-hand side above → 0 uniformly in n as c → ∞. We have ∞ (0 ≤ ) |x|d Fn − |x|d Fn (−c,c] −∞ ∞ ∞ ≤ |x|d Fn − |x|d F − |x|d Fn − |x|d F (−c,c] (−c,c] −∞ −∞ |x|d F + (|x|≥c) ≤ |E|X n | − E|X || + |x|d F + |x|d Fn − |x|d F . (|x|≥c)
(−c,c]
(−c,c]
e75
e76
Revised Answers Manual to an Introduction
However, for ε > 0, ε |E|X n | − E|X || < , n ≥ n 1 = n 1 (ε), (since E|X n | −→ E|X |), n→∞ 3 ε |x|d F < for sufficiently large c (since E|X | < ∞), 3 (|x|≥c) ε |x|d Fn − |x|d F < and 3 (−c,c] (−c,c] for c as above and also such that c and −c are continuity points of F, n ≥ n 2 = n 2 (ε) (by Theorem 6 here). Therefore, if n ≥ n 3 = max{n 1, n 2 }, and c and −c are as above, the right-hand side above becomes < ε. Hence (|x|≥a) |x|d Fn < ε, n ≥ n 3 . Increasing a to make the n 3 −1 integrals (|x|≥a) |x|d Fn , n = 1, . . . , n 3 − 1 < ε, the result follows. # 13. In the sequel, all limits are taken as n → ∞. Using assumptions (i)-(iv), we have the following chain of relations. Let c, d ∈ with c < d and always continuity points of F. Then: f d F − f d F n n
= f n d Fn + f n d Fn + f n d Fn (−∞,c] (c,d] (d,∞) − f dF + f dF + f d F (−∞,c] (c,d] (d,∞) = f n d Fn + f n d Fn − f dF + f dF (−∞,c] (d,∞) (−∞,c] (d,∞) f n d Fn − f d F + (c,d] (c,d] ≤ | f n |d Fn + | f n |d Fn + | f |d F + | f |d F (−∞,c] (d,∞) (−∞,c] (d,∞) f n d Fn − f d F + (c,d] (c,d] = | f n |d Fn + | f n |d Fn + | f |d F + | f |d F (−∞,c] (d,∞) (−∞,c] (d,∞) f n d Fn − f d Fn + f d Fn − f d F + (c,d] (c,d] (c,d] (c,d] ≤ gd Fn + gd Fn + | f |d F + | f |d F (−∞,c] (d,∞) (−∞,c] (d,∞) | f n − f |d Fn + f d Fn − f d F (by (ii)) + (c,d]
(c,d]
(c,d]
Revised Answers Manual to an Introduction
=
gd Fn + (c,d] | f n − f |d Fn +
gd Fn −
+
(c,d]
(−∞,c]
(c,d]
| f |d F +
f d Fn −
(c,d]
(d,∞)
| f |d F
f d F .
From (ii), (iii), and ∫ g dF < ∞ (by (iv)), it follows that ∫_(−∞,c] |f| dF < ∞ and ∫_(d,∞) |f| dF < ∞. Therefore, for ε > 0, c sufficiently small, d sufficiently large, and always points of C(F), we have
∫_(−∞,c] |f| dF < ε/6,  ∫_(d,∞) |f| dF < ε/6.
Also, ∫ g dFn → ∫ g dF (by (iv)), and ∫_(c,d] g dFn → ∫_(c,d] g dF (by boundedness of g on [c, d], its continuity on (c, d], assumption (i), and Theorem 6), so that
∫ g dFn − ∫_(c,d] g dFn → ∫ g dF − ∫_(c,d] g dF.
It follows that, for all sufficiently large n,
∫ g dFn − ∫_(c,d] g dFn < 2ε/6.
Also, for all sufficiently large n,
∫_(c,d] |fn − f| dFn < ε/6 (by (iii)),
and
|∫_(c,d] f dFn − ∫_(c,d] f dF| < ε/6 (by Theorem 6).
Combining all results obtained above, we have that, for all sufficiently large n, |∫ fn dFn − ∫ f dF| < ε. #
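The convergence ∫ fn dFn → ∫ f dF proved in Exercise 13 can also be illustrated numerically. The sketch below is not part of the original manual; the specific choices of Fn, fn, f, and g named in the comments are assumptions made purely for illustration.

```python
# Monte Carlo sketch (illustration only) of  E f_n(X_n) -> E f(X), i.e. ∫ f_n dF_n -> ∫ f dF,
# for the assumed choices:
#   F_n = d.f. of X_n = Z + 1/n with Z ~ N(0,1)  (so F_n => F = d.f. of N(0,1)),
#   f_n(x) = x^2/(1+x^2) + 1/n -> f(x) = x^2/(1+x^2),  with |f_n| <= g = 2 for n >= 1.
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: x**2 / (1.0 + x**2)

z = rng.standard_normal(2_000_000)
target = f(z).mean()                      # approximates ∫ f dF
for n in (1, 10, 100, 1000):
    xn = z + 1.0 / n                      # a sample from F_n
    approx = (f(xn) + 1.0 / n).mean()     # approximates ∫ f_n dF_n
    print(f"n = {n:5d}   E f_n(X_n) ~ {approx:.4f}   E f(X) ~ {target:.4f}")
```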
Chapter 9 Conditional Expectation and Conditional Probability, and Related Properties and Results

1. Indeed,
∫_(Σ_{i=1}^∞ Ai) X dP = ∫ X I_(Σ_{i=1}^∞ Ai) dP = ∫ X (Σ_{i=1}^∞ I_{Ai}) dP, because I_(Σ_{i=1}^∞ Ai) = Σ_{i=1}^∞ I_{Ai}, as is easily seen,
= ∫ (Σ_{i=1}^∞ X I_{Ai}) dP, because X Σ_{i=1}^∞ I_{Ai} = Σ_{i=1}^∞ X I_{Ai}, as is easily argued,
= ∫ lim_{n→∞} (Σ_{i=1}^n X I_{Ai}) dP
= lim_{n→∞} ∫ (Σ_{i=1}^n X I_{Ai}) dP, because, clearly, |Σ_{i=1}^n X I_{Ai}| ≤ |X|, independent of n and integrable, so that the Dominated Convergence Theorem applies,
= lim_{n→∞} Σ_{i=1}^n ∫ X I_{Ai} dP, by Exercise 8 in Chapter 4,
= Σ_{i=1}^∞ ∫ X I_{Ai} dP = Σ_{i=1}^∞ ∫_{Ai} X dP, as was to be seen. #
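A quick numerical illustration of Exercise 1 (not part of the original manual): for an assumed concrete choice of (Ω, P), X, and disjoint sets Ai, the integral over Σ_i Ai agrees with the sum of the integrals over the Ai.

```python
# Illustration only, for the assumed choice (Ω, P) = ([0,1), Lebesgue), X(ω) = ω²,
# and the disjoint sets A_i = [2^{-i}, 2^{-(i-1)}), i = 1, 2, ...:
# the integral over Σ_i A_i equals the sum of the integrals over the A_i.
def integral_x2(a, b):
    # ∫_a^b x² dx
    return (b**3 - a**3) / 3.0

sum_over_pieces = sum(integral_x2(2.0**-i, 2.0**-(i - 1)) for i in range(1, 60))
whole = integral_x2(0.0, 1.0)   # ∪_i A_i = (0, 1), up to a Lebesgue-null set
print(sum_over_pieces, whole)   # both ~ 1/3
```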
2. By following the standard four steps, we have:
(a) Let X = I_C, C ∈ A. Then ∫ X dP(·|B) = ∫ I_C dP(·|B) = P(C|B) = P(C ∩ B)/P(B) = (1/P(B)) ∫_B I_C dP = (1/P(B)) ∫_B X dP.
(b) Let X = Σ_{i=1}^n αi I_{Ci}. Then ∫ X dP(·|B) = ∫ (Σ_i αi I_{Ci}) dP(·|B) = Σ_i αi ∫ I_{Ci} dP(·|B) = Σ_i αi (1/P(B)) ∫_B I_{Ci} dP (by (a)) = (1/P(B)) ∫_B (Σ_i αi I_{Ci}) dP = (1/P(B)) ∫_B X dP.
(c) For X ≥ 0, there exist 0 ≤ Xn simple ↑ X and 0 ≤ Xn I_B ↑ X I_B as n → ∞. Hence ∫ Xn dP(·|B) = (1/P(B)) ∫_B Xn dP (by (b)) = (1/P(B)) ∫ (Xn I_B) dP → (1/P(B)) ∫ (X I_B) dP (by the Lebesgue Monotone Convergence Theorem) = (1/P(B)) ∫_B X dP; i.e., ∫ Xn dP(·|B) → (1/P(B)) ∫_B X dP as n → ∞. Also, ∫ Xn dP(·|B) → ∫ X dP(·|B) as n → ∞, so that ∫ X dP(·|B) = (1/P(B)) ∫_B X dP.
(d) For any (integrable) X,
∫ X dP(·|B) = ∫ (X⁺ − X⁻) dP(·|B) = ∫ X⁺ dP(·|B) − ∫ X⁻ dP(·|B)
= (1/P(B)) ∫_B X⁺ dP − (1/P(B)) ∫_B X⁻ dP (by (c))
= (1/P(B)) ∫_B (X⁺ − X⁻) dP = (1/P(B)) ∫_B X dP. #

3. Set B1 = [0, 1/4], B2 = (1/4, 2/3], B3 = (2/3, 1]. Then {B1, B2, B3} form a partition of Ω. Therefore, by the “Special Case” right after Definition 1, we have: E(X|F) = (E^{B1}X) I_{B1} + (E^{B2}X) I_{B2} + (E^{B3}X) I_{B3}, where
E^{B1}X = (1/P(B1)) ∫_{B1} X dP = 4 ∫_0^{1/4} x² dx = 1/48,
E^{B2}X = (1/P(B2)) ∫_{B2} X dP = (12/5) ∫_{1/4}^{2/3} x² dx = 97/432,
E^{B3}X = (1/P(B3)) ∫_{B3} X dP = 3 ∫_{2/3}^{1} x² dx = 19/27.
Thus, E(X|F) = (1/48) I_{[0,1/4]} + (97/432) I_{(1/4,2/3]} + (19/27) I_{(2/3,1]}. #
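The three conditional means just computed can be verified exactly with a few lines of code; the following check is an illustration added here, not part of the original solution.

```python
# Exact check (illustration only) of the conditional means in Exercise 3,
# for X(ω) = ω² on ([0,1], Borel, Lebesgue) and the partition B1, B2, B3 above.
from fractions import Fraction as F

def integral_x2(a, b):
    # ∫_a^b x² dx, computed exactly with rational arithmetic
    return (b**3 - a**3) / 3

B = [(F(0), F(1, 4)), (F(1, 4), F(2, 3)), (F(2, 3), F(1))]
for a, b in B:
    print(integral_x2(a, b) / (b - a))   # prints 1/48, 97/432, 19/27
```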
4. ∫_B Y dP ≤ ∫_B X dP or, equivalently, ∫_B (Y − X) dP ≤ 0, or ∫_B Z dP ≤ 0, where Z = Y − X. Thus, it is to be shown that ∫_B Z dP ≤ 0 for every B implies Z ≤ 0 a.s. Let C = (Z ≤ 0), D = (Z > 0) (= C^c). Then it suffices to show that P(D) = 0. However, by taking B = D, ∫_D Z dP ≤ 0, and ∫_D Z dP = ∫ (Z I_D) dP ≥ 0 since Z > 0 on D, so that ∫_D Z dP = 0. Then we shall show that, if for a r.v. Z with D = (Z > 0) it holds that ∫_D Z dP = 0, then P(D) = 0. To this end, we follow the familiar four steps.
(a) Let Z = I_A. Then necessarily D = A. Hence 0 = ∫_A Z dP = ∫_A I_A dP = P(A) = P(D); i.e., P(D) = 0.
(b) Let Z = Σ_{i=1}^n αi I_{Ai}, and let α_{i1}, ..., α_{ik} be > 0 and α_{j1}, ..., α_{jl} ≤ 0 (k + l = n). Then D = (Z > 0) = A_{i1} + ... + A_{ik} and
0 = ∫_D Z dP = ∫ (Z I_D) dP = ∫ (Z I_{Σ_{r=1}^k A_{ir}}) dP = Σ_{r=1}^k ∫ (Z I_{A_{ir}}) dP = Σ_{r=1}^k ∫_{A_{ir}} Z dP = Σ_{r=1}^k ∫_{A_{ir}} α_{ir} dP = Σ_{r=1}^k α_{ir} P(A_{ir}).
Hence α_{ir} P(A_{ir}) = 0 for all r, and since α_{ir} > 0 for all r, it follows that P(A_{ir}) = 0 for all r; consequently, P(D) = 0.
(c) For Z ≥ 0, take 0 ≤ Xn simple ↑ Z, where Xn = Σ_{j=1}^{n2^n} ((j−1)/2^n) I_{Anj} + n I_{An}, with Anj = ((j−1)/2^n < Z ≤ j/2^n), An = (Z > n). For ω ∈ D, we have Z(ω) > 0, so that Xn(ω) > 0 for all sufficiently large n, n ≥ n0 = n0(ω), say. Working with such n's, we have 0 < Xn I_D simple ↑ Z I_D as n → ∞, and ∫ (Z I_D) dP = ∫_D Z dP = 0, so that ∫ (Xn I_D) dP = 0 for all n ≥ n0 (since ∫ (Xn I_D) dP ↑ as n → ∞). Since Xn = Σ_{j=1}^{kn} αj I_{Aj} with αj > 0, we have ∫ (Xn I_D) dP = Σ_j αj P(Aj ∩ D) = 0. Then, as in (b), P(D) = 0.
(d) For any Z, ∫_D Z dP = ∫_D Z⁺ dP − ∫_D Z⁻ dP, where Z⁺ = Z if Z > 0 and Z⁺ = 0 if Z ≤ 0. Then ∫_D Z dP = ∫_D Z⁺ dP = 0, and hence P(D) = 0 by (c). #

5. (i) 0 = ∫_A X dP = ∫_{A∩C} X dP + ∫_{A∩C^c} X dP = ∫_{A∩C} X dP = ∫_D X dP, where D = A ∩ C and C = (X > 0). So, X > 0 on D, and ∫_D X dP = 0 must imply that P(D) = 0; this would mean that X = 0 a.s.
(a) Let X = I_B. Then it must be that B = C, so that B ⊇ D. Then 0 = ∫_D X dP = ∫_D I_B dP = P(B ∩ D) = P(D); i.e., P(D) = 0.
(b) Let X = Σ_{i=1}^n αi I_{Ai} with α_{i1}, ..., α_{ik} > 0 (and α_{j1}, ..., α_{jl} = 0, k + l = n). Then 0 = ∫_D X dP = Σ_{r=1}^k α_{ir} P(A_{ir} ∩ D), so that α_{ir} P(A_{ir} ∩ D) = 0 for all r, and hence P(A_{ir} ∩ D) = 0 for all r. Since A_{i1}, ..., A_{ik} form a partition of D, we have P(D) = 0.
(c) For X ≥ 0, 0 ≤ Xn simple ↑ X as n → ∞, and, as in Exercise 4(c), take Xn > 0, n ≥ n0. Then conclude, as in that part, that P(D) = 0.
(ii) X ≥ 0 implies E^B X ≥ 0 a.s. Also, for B ∈ B, ∫_B X dP = ∫_B E^B X dP_B. In particular, since A ∈ B, ∫_{B∩A} X dP = ∫_{B∩A} E^B X dP_B. But 0 ≤ ∫_{B∩A} X dP ≤ ∫_A X dP = 0 (since X ≥ 0). Thus, ∫_{B∩A} X dP = 0 and hence ∫_{B∩A} E^B X dP_B = 0. However, 0 = ∫_{B∩A} E^B X dP_B = ∫_B (I_A E^B X) dP_B = ∫_B 0 dP_B. So, ∫_B (I_A E^B X) dP_B = ∫_B 0 dP_B for all B ∈ B, and both I_A E^B X and 0 are B-measurable. Hence I_A E^B X = 0 a.s. Since I_A = 1 on A, in order for I_A E^B X = 0 a.s., either P(A) = 0 (which is not the case) or E^B X = 0 a.s. on A. #
6. The function g*(x) = x^r (x ≥ 0, r ≥ 1) is convex (since (d²/dx²) g*(x) = r(r − 1) x^{r−2} ≥ 0) and |x| is symmetric about 0. Thus, g(x) = |x|^r, x ∈ ℝ, is convex. Then, by the Jensen inequality, g(E^B X) ≤ E^B g(X) a.s., or |E^B X|^r ≤ E^B |X|^r a.s. #

7. From EX ∈ ℝ, it follows that EX⁺ and EX⁻ < ∞, and also ∫_B X⁺ dP and ∫_B X⁻ dP < ∞ for every B ∈ B. Since ∫_B X⁺ dP = ∫_B E^B X⁺ dP_B and ∫_B X⁻ dP = ∫_B E^B X⁻ dP_B, it follows that ∫_B E^B X⁺ dP_B and ∫_B E^B X⁻ dP_B < ∞ for every B ∈ B. Let C = {E^B X⁺ = ∞}. Then C = ∩_{n=1}^∞ {E^B X⁺ ≥ n}, so that C ∈ B. Hence ∞ > ∫_C X⁺ dP = ∫_C E^B X⁺ dP_B, and hence P(C) = 0. Likewise, if D = {E^B X⁻ = ∞}, then D ∈ B and P(D) = 0. It follows that E^B X is finite except on the set C + D with P(C + D) = P(C) + P(D) = 0; i.e., E^B X is a.s. finite. #

8. The inequality is true for n = 2 (by definition of convexity). Assume (*) to be true for k and establish it for k + 1. Without loss of generality, assume α_{k+1} < 1. Then:
g(α1 x1 + ... + αk xk + α_{k+1} x_{k+1})
= g((1 − α_{k+1}) [α1/(1 − α_{k+1}) x1 + ... + αk/(1 − α_{k+1}) xk] + α_{k+1} x_{k+1})
≤ (1 − α_{k+1}) g(α1/(1 − α_{k+1}) x1 + ... + αk/(1 − α_{k+1}) xk) + α_{k+1} g(x_{k+1})
(since 1 − α_{k+1}, α_{k+1} ≥ 0 and (1 − α_{k+1}) + α_{k+1} = 1)
≤ (1 − α_{k+1}) [α1/(1 − α_{k+1}) g(x1) + ... + αk/(1 − α_{k+1}) g(xk)] + α_{k+1} g(x_{k+1})
(by the induction hypothesis, since αi/(1 − α_{k+1}) ≥ 0, i = 1, ..., k, and their sum is 1)
= α1 g(x1) + ... + αk g(xk) + α_{k+1} g(x_{k+1}). #

9. As already shown in the proof of Lemma 2, ∫_{B'} g(y) dP_Y = ∫_B g(Y) dP for g = I_{A'}, A' ∈ B_Y. Next, let g = Σ_{i=1}^n αi I_{A'i}, where {A'1, ..., A'n} is a (measurable) partition of ℝ, and let Ai = Y^{−1}(A'i), i = 1, ..., n, so that g(Y) = Σ_{i=1}^n αi I_{Ai}, and let B' ∈ B_Y, B = Y^{−1}(B'). Then, by the linearity of the integral and the previous step,
∫_{B'} g(y) dP_Y = Σ_{i=1}^n αi ∫_{B'} I_{A'i} dP_Y = Σ_{i=1}^n αi ∫_B I_{Ai} dP = ∫_B g(Y) dP.
Now, let g be nonnegative. Then there exist 0 ≤ gn simple ↑ g as n → ∞; i.e., gn = Σ_{i=1}^{rn} α_{ni} I_{A'ni} (α_{ni} ≥ 0, i = 1, ..., rn), which implies that 0 ≤ gn(Y) = Σ_{i=1}^{rn} α_{ni} I_{Ani} simple ↑ g(Y) as n → ∞, where Ani = Y^{−1}(A'ni), i = 1, ..., rn. Then, for every B' ∈ B_Y and B = Y^{−1}(B'), we have
∫_{B'} gn(y) dP_Y → ∫_{B'} g(y) dP_Y and ∫_B gn(Y) dP → ∫_B g(Y) dP as n → ∞,
whereas, by the previous step, ∫_{B'} gn(y) dP_Y = ∫_B gn(Y) dP, n ≥ 1. It follows that ∫_{B'} g(y) dP_Y = ∫_B g(Y) dP. Finally, for any g, write g = g⁺ − g⁻, which implies g(Y) = g⁺(Y) − g⁻(Y). Now, if ∫_B g(Y) dP exists, it follows that either ∫_B g⁺(Y) dP < ∞ or ∫_B g⁻(Y) dP < ∞ or both. Since ∫_{B'} g⁺(y) dP_Y = ∫_B g⁺(Y) dP and ∫_{B'} g⁻(y) dP_Y = ∫_B g⁻(Y) dP, by the previous step, it follows that either ∫_{B'} g⁺(y) dP_Y < ∞ or ∫_{B'} g⁻(y) dP_Y < ∞ or both. Thus, ∫_{B'} g(y) dP_Y exists and
∫_{B'} g(y) dP_Y = ∫_{B'} g⁺(y) dP_Y − ∫_{B'} g⁻(y) dP_Y = ∫_B g⁺(Y) dP − ∫_B g⁻(Y) dP = ∫_B g(Y) dP.
Likewise, the existence of ∫_{B'} g(y) dP_Y implies the existence of ∫_B g(Y) dP and their equality. #

10. Let xi, i ≥ 1, and yj, j ≥ 1, be the values of X and Y, respectively, and without loss of generality assume that P(X = xi) > 0, i ≥ 1, and P(Y = yj) > 0, j ≥ 1. Let D_Y be the discrete σ-field generated by {yj, j ≥ 1}. Since, clearly, {yj} ∈ B_Y (because Y^{−1}({yj}) ∈ A), it follows that D_Y ⊂ B_Y. Next, set f_Y(yj) = P(Y = yj) = pj. Then f_Y^{−1}({pj}) is a member of D_Y containing yj, and hence f_Y^{−1}({pj}) ∈ B_Y. It follows that f_Y(·) is B_Y-measurable. Next, let B_{X,Y} and D_{X,Y} be defined in a way similar to that of B_Y and D_Y, and set f_{X,Y}(xi, yj) = P(X = xi, Y = yj) = pij. Then, as above, f_{X,Y}(·) is D_{X,Y}-measurable (and hence B_{X,Y}-measurable). For any fixed xi, say xi0, f_{X,Y}(xi0, ·) is the {yj, j ≥ 1}-section of f_{X,Y}(·, ·) at xi0, and hence it is D_Y-measurable. Thus, f_{X,Y}(xi0, ·) and f_Y(·) are both B_Y-measurable, and then so is their ratio P(X = xi0 | Y = ·), as was to be seen. #
11. We can always take B = A and consider a sequence {X n } such that X n −→ X , n→∞
some r.v., but X n X . Then E B X n = X n a.s. and E B X = X a.s. with a.s.
n→∞
P a.s. E B X n −→ E B X , but E B X n E B X . n→∞
n→∞ P
a.s.
n→∞
n→∞
As another possibility, let X n −→ c but X n c, and take B = σ (X 1 , X 2 , . . .). Then X n is B-measurable, n ≥ 1 (and so is c), and E B X n = X n a.s., E B c = c P
a.s.
n→∞
n→∞
a.s. Therefore E B X n −→ E B c, but E B X n E B c. #
Revised Answers Manual to an Introduction
12. (i) The first step is immediate, and the second goes like this: y − μ2 x − μ1 y − μ2 x − μ1 1 −ρ = − × ρσ2 σ2 σ1 σ2 σ2 σ1 1 x − μ1 1 y−b y − μ2 + ρσ2 = (y − b) = = . σ2 σ1 σ2 σ2 (ii) Observe that the second factor on the right-hand side is the p.d.f. of a Normal distribution with mean b and variance σ22 (1 − ρ 2 ). Thus, integration with respect to y, leaves us with the first factor which is the p.d.f. of N (μ1 , σ12 ). # 13. The expression p X ,Y (x, y)/ p X (x), where p X is the p.d.f. of X , leaves us with the second expression on the right-hand side of p X ,Y (x, y) in Exercise 12 (ii), which is the p.d.f. of a Normal distribution as described. # 14. (i) We have: E(X Y ) = E[E(X Y |X )] = E[X E(Y |X )] (by Theorem 4) = E(X b) (by Exercise 13) ρσ2 = E{X [μ2 + (X − μ1 )]} (by the expression of b) σ1 ρσ2 (E X 2 − μ21 ) = μ2 μ1 + σ1 ρσ2 × σ12 = μ1 μ2 + ρσ1 σ2 . = μ1 μ2 + σ1 (ii) Cov(X , Y ) = E(X Y ) − (E X )(EY ) = (μ1 μ2 + ρσ1 σ2 ) − μ1 μ2 = ρσ1 σ2 . ,Y ) 1 σ2 = ρσ (iii) ρ(X , Y ) = Cov(X σ1 σ2 σ1 σ2 = ρ. # 15. From the transformations u = (x1 − μ1 )/σ1 , v = (y − μ2 )/σ2 , we get x = μ1 + σ1 u, y = μ2 + σ2 v, so that the Jacobian of the transformation is J = σ1 σ2 . 1√ e−q/2 , where Since p X ,Y (x, y) = 2 2π σ1 σ2
1 q= 1 − ρ2
x − μ1 σ1
1−ρ
2
− 2ρ
x − μ1 σ1
y − μ2 σ2
+
y − μ2 σ2
2 ,
we get: 1 2 2 (u − 2ρuv + v ) × σ1 σ2 exp − pU ,V (u, v) = 2(1 − ρ 2 ) 2π σ1 σ2 1 − ρ 2 1 1 2 2 " (u = exp − − 2ρuv + v ) . 2(1 − ρ 2 ) 2π 1 − ρ 2 1 "
Thus, the r.v.s U and V have the Bivariate Normal distribution with parameters 0,0,1,1, and ρ. #
e83
e84
Revised Answers Manual to an Introduction
16.
(i) We have Var(X |Y ) = E{[X − E(X |Y )]2 |Y } (by definition) = E{X 2 + [E(X |Y )]2 − 2X E(X |Y )|Y } = E(X 2 |Y ) + [E(X |Y )]2 − 2[E(X |Y )]2 a.s. = E(X 2 |Y ) − [E(X |Y )]2 . (ii) From part (i), we have E[Var(X |Y )] = E{E(X 2 |Y ) − [E(X |Y )]2 } = E X 2 − E[E(X |Y )]2 . On the other hand, Var[E(X |Y )] = E[E(X |Y )]2 − {E[E(X |Y )]}2 = E[E(X |Y )]2 − (E X )2 . Adding up the above two expressions, we get E X 2 − (E X )2 = Var(X ) = E[Var(X |Y )] + Var[E(X |Y )]. #
17.
(i) Set B j = (N = j), j = 1, 2, . . . Then it is clear that X = X 1 + . . . + X j on the event B j , j ≥ 1. It is also clear that, for any B ∈ B, (X ∈ B) =
∞ {B ∩ B j ∩ [(X 1 + . . . + X j ) ∈ B]} j=1
= B∩
∞ {B j ∩ [(X 1 + . . . + X j ) ∈ B]}, j=1
whereas the set on the right-hand side above is an event. Then so is (X ∈ B). (ii) On the basis of the Special Case right after Definition 1, we have ∞ 1 (E B j X )I B j , E B j X = X dP. E(X |N ) = P(B j ) B j j=1
However, 1 1 E(X I B j ) = E[(X 1 + . . . + X j )I B j ] P(B j ) P(B j ) 1 [E(X 1 I B j ) + . . . + E(X j I B j )] = P(B j ) 1 = [(E X 1 )P(B j ) + . . . + (E X j )P(B j )] P(B j ) (by independence of N and the X i s) = jμ.
EB j X =
Revised Answers Manual to an Introduction
That is, E B j X = jμ, and therefore E(X |N ) =
∞ ∞ ( jμ)I B j = μ j I B j = μN , j=1
since, clearly, N =
∞
j=1
j=1
j I B j . Therefore
E X = E[E(X |N )] = E(μN ) = μ(E N ). (iii) Next, working as in part (ii), E(X 2 |N ) =
∞
(E B j X 2 )I B j , E B j X 2 =
j=1
1 P(B j )
X 2 dP, Bj
and 1 E[(X 1 + . . . + X j )2 I B j ] P(B j ) ⎡⎛ ⎞ ⎤ j 1 = E ⎣⎝ X i2 + 2 Xk Xl ⎠ IB j ⎦ P(B j )
EB j X 2 =
i=1
1≤k
2 1 P B j jE X 12 + j( j − 1) E X 1 P Bj = j σ 2 + μ2 + j j − 1 μ2 = jσ 2 + j 2 μ2 , =
so that, E(X 2 |N ) =
∞
( jσ 2 + j 2 μ2 )I B j = σ 2 N + μ2 N 2 .
j=1
It follows that Var(X |N ) = E(X 2 |N ) − [E(X |N )]2 = σ 2 N . (iv) Since E(X |N ) = μN and Var(X |N ) = σ 2 N , then part (ii) of Exercise 16 yields: Var(X ) = E[Var(X |N )] + Var[E(X |N )] = E(σ 2 N ) + Var(μN ) = σ 2 (E N ) + μ2 Var(N ). # 18. For every B ∈ B, it follows that B X dPB is either 0 or E X , according to whether P(B) = 0 or P(B) = 1; or B X dPB = B (E X )dP. However, B (E X )dP = (bybeing a constant). Therefore X and B (E X )dPB , since E X is B-measurable E X are both B-measurable and B X dPB = B (E X )dPB for every B ∈ B. Hence X = E X a.s. #
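The identities of Exercise 17 are easy to check by simulation. The following sketch is not part of the original manual; the distributions chosen for N and the Xi's (Poisson and exponential) are assumptions made purely for illustration.

```python
# Simulation sketch (illustration only) of Exercise 17: for X = X_1 + ... + X_N,
# N independent of the X_i's,  E X = μ E N  and  Var X = σ² E N + μ² Var N.
# Assumed here: N ~ Poisson(4), X_i ~ Exponential with mean 1/2, so μ = 1/2, σ² = 1/4.
import numpy as np

rng = np.random.default_rng(1)
lam, mu, sigma2, reps = 4.0, 0.5, 0.25, 100_000

N = rng.poisson(lam, size=reps)
X = np.array([rng.exponential(mu, size=n).sum() for n in N])

print("E X   ~", X.mean(), "   theory:", mu * lam)                      # μ E N
print("Var X ~", X.var(),  "   theory:", sigma2 * lam + mu**2 * lam)    # σ² E N + μ² Var N (E N = Var N = λ)
```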
19. Consider the r.v.s X1, ..., Xn taking on the respective values xi1, ..., xin, where xij ≥ 0, j = 1, ..., n, all i, and xi1 + ··· + xin = t (which imply, of course, that 0 ≤ xij ≤ t, j = 1, ..., n, all i). Now, look at the r.v.s X1, ..., Xn as n distinct cells, and also consider t indistinguishable balls. Then the number of n-tuples (xi1, ..., xin) with 0 ≤ xij ≤ t, j = 1, ..., n, all i, and xi1 + ··· + xin = t is the same as the number of ways that the t (indistinguishable) balls are distributed into the n (distinct) cells, so that the jth cell contains xij balls (that is, Xj = xij, j = 1, ..., n). However, this number is equal to the binomial coefficient (n + t − 1 choose t) (see Theorem 9(iii) of Chapter 2 in Roussas (1997)). #
Chapter 10 Independence 1. If the events A1 , . . . , An are independent, we wish to show that the events A1 , . . . , An are also independent, where Ai is either Ai or Aic , i = 1, . . . , n. For the events A1 , . . . , An to be independent, we have to show that: For any 2 ≤ k ≤ n and any i 1 , . . . , i k with 1 ≤ i 1 < i 2 < . . . < i k ≤ n, we have: (1) P(Ai1 ∩ . . . Aik ) = P(Ai1 ) . . . P(Aik ). Without loss of generality and for easier writing, instead of (1), it suffices to show that: P(A1 ∩ . . . ∩ Ak ) = P(A1 ) . . . P(Ak ). (2) We establish (2) by showing that: (i) The factorization in (2) holds when only one of the Ai s appears as a complement, and without loss of generality and for easier writing, assume it to be Ack . (ii) Assume the factorization in (2) to hold true when only m of the Ai s appear in their complements, and without loss of generality and for easier writing, assume them to be the last k − m; i.e., Ack−m+1 , . . . , Ack . (iii) We prove the factorization in (2) to be true when only m + 1 of the Ai s appear in their complements, and without loss of generality and for easier writing, we assume them to be the last k − (m + 1) = k − m − 1; i.e., Ack−m , . . . , Ack . Proof of (i): We have: P(A1 ∩ . . . ∩ Ak−1 ∩ Ack ) = P[(A1 ∩ . . . ∩ Ak−1 ) − (A1 ∩ . . . ∩ Ak )] (since A ∩ B c = A − (A ∩ B)) = P(A1 ∩ . . . ∩ Ak−1 ) − P(A1 ∩ . . . ∩ Ak )
= P(A1 ) . . . P(Ak−1 ) − P(A1 ) . . . P(Ak ) (by assumption) = P(A1 ) . . . P(Ak−1 )[1 − P(Ak )] = P(A1 ) . . . P(Ak−1 )P(Ack ), which shows (i). By the (induction) hypothesis made in (ii), we have: P(A1 ∩ . . . ∩ Ak−m ∩ Ack−m+1 ∩ . . . ∩ Ack ) = P(A1 ) . . . P(Ak−m )P(Ack−m+1 ) . . . P(Ack ).
(3)
Proof of (iii): We have: P(A1 ∩ . . . ∩ Ak−m−1 ∩ Ack−m ∩ . . . ∩ Ack ) = P[(A1 ∩ . . . ∩ Ak−m−1 ∩ Ack−m+1 ∩ . . . ∩ Ack ) −(A1 ∩ . . . ∩ Ak−m−1 ∩ Ak−m ∩ Ack−m+1 ∩ . . . ∩ Ack )] (since A ∩ B c = A − (A ∩ B) with A = A1 ∩ . . . ∩ Ak−m−1 ∩ Ack−m+1 ∩ . . . ∩ Ack and B = Ak−m ) = P(A1 ∩ . . . ∩ Ak−m−1 ∩ Ack−m+1 ∩ . . . ∩ Ack ) −P(A1 ∩ . . . ∩ Ak−m−1 ∩ Ak−m ∩ Ack−m+1 ∩ . . . ∩ Ack ) = P(A1 ) . . . P(Ak−m−1 )P(Ack−m+1 ) . . . P(Ack ) −P(A1 ) . . . P(Ak−m−1 )P(Ak−m )P(Ack−m+1 ) . . . P(Ack ) (by the (induction) hypothesis in (ii)) = P(A1 ) . . . P(Ak−m−1 )[1 − P(Ak−m )]P(Ack−m+1 ) . . . P(Ack ) = P(A1 ) . . . P(Ak−m−1 )P(Ack−m )P(Ack−m+1 ) . . . P(Ack ). The proof of (iii), and hence of the asserted independence, is completed. # 2. Suppose A and B are of the form A = (A11 ∪ A21 ) ∪ · · · ∪ (A1m ∪ A2m ), B = (B11 ∪B21 )∪· · ·∪(B1n ∪B2n ) with A1i , B1i ∈ F1 , i = 1, . . . , m, A2 j , B2 j ∈ F2 , j = 1, . . . , n. Then A ∪ B is of the same form and hence belongs in F. The same happens if, e.g., A = (A11 ∩ A21 ) ∪ · · · ∪ (A1m ∩ A2m ), B = (B11 ∩ B21 ) ∪ · · · ∪ (B1n ∩ B2n ). If, e.g., A = (A11 ∪ A21 ) ∪ · · · ∪ (A1m ∪ A2m ), B = (B11 ∩ B21 ) ∪ · · · ∪ (B1n ∩ B2n ), then again A ∪ B ∈ F, since A ∪ B is of the form described in the definition of F. Thus F is closed under unions. Next, let A = (A11 ∪ A21 ) ∪ · · · ∪ (A1m ∪ A2m ), so that Ac = (Ac11 ∩ Ac21 ) ∩ · · · ∩ (Ac1m ∩ Ac2m ) = (Ac11 ∩· · ·∩ Ac1m )∩(Ac21 ∩· · ·∩ Ac2m ) with Ac11 ∩· · ·∩ Ac1m ∈ F1 and Ac21 ∩· · ·∩ Ac2m ∈ F2 . Therefore Ac ∈ F. Now, let A = (A11 ∩ A21 )∪(A12 ∩ A22 ), so that Ac = (Ac11 ∪ Ac21 ) ∩ (Ac12 ∪ Ac22 ) = (Ac11 ∩ Ac12 ) ∪ (Ac12 ∩ Ac21 ) ∪ (Ac11 ∩ Ac22 ) ∪ (Ac21 ∩ Ac22 ) and this is of the form of sets lying in F. Thus Ac ∈ F. By induction, it holds that Ac ∈ F if A = Am = (A11 ∩ A21 ) ∪ · · · ∪ (A1m ∩ A2m ). Indeed, for
Am+1 = (A11 ∩ A21 ) ∪ · · · ∪ (A1m ∩ A2m ) ∪ (A1,m+1 ∩ A2,m+1 ), we have Acm+1 = [(A11 ∩ A21 ) ∪ · · · ∪ (A1m ∪ A2m )]c ∩ (Ac1,m+1 ∪ Ac2,m+1 ) (with Ai ∈ F1 , Bi ∈ F2 , by c ∪ Ac2,m+1 ) = ∪(Ai ∩ Bi ) ∩ (Ai,m+1 the induction hypothesis) i +
, c = ∪ (Ai ∩ Ai,m+1 ) ∩ Bi ∪ Ai ∩ (Bi ∩ Ac2,m+1 ) i
c , Ai ∈ F1 , and Bi , Bi ∩ Ac2,m+1 ∈ F2 ), so that Acm+1 ∈ F. (with Ai ∩ Ai,m+1 Finally, suppose that C = A∪B, where, e.g., A = (A11 ∪ A21 )∪· · ·∪(A1m ∪ A2m ), B = (B11 ∩ B21 ) ∪ · · · ∪ (B1n ∩ B2n ). Then
(with D, Ai ∈ F1 and E, Bi ∈ C c = Ac ∩ B c = (D ∩ E) ∩ ∪(Ai ∩ Bi ) F2 , by previous steps) i
= ∪ (D ∩ Ai ) ∩ (E ∩ Bi ) (with D ∩ Ai ∈ F1 , E ∩ Bi ∈ F2 ), i
so that C c ∈ F. It follows that F is a field. # 3. In the first place, e x ≥ 1 + x, x ∈ . (Indeed, with g(x) = e x − x − 1, we have g (x) = e x − 1 = 0, for x = 0, and g (x) = e x > 0. Thus, min x∈ g(x) = g(0) = 0 and hence e x ≥ x + 1.) In the inequality e x ≥ x + 1, replace x by −x and apply . . . , nto obtain e− pi ≥ to get e−x ≥ −x -n+ 1 − -n it for x = pi , i −=m, n n p p i i i=m that i=m e ≥- i=m (1 − pi ), or e ≥ i=m (1 − pi ), or 1 − pi , so n n 1 − exp(− i=m pi ) ≤ 1 − i=m (1 − pi ), which is the left-hand side inequality. As for the right-hand inequality, we have that, for n = m, it becomes: 1 − (1 − pm ) = pm , so that the inequality holds. Next, assume the inequality to be true for n = m + k and establish it for n = m + k + 1. To this end, 1−
m+k+1 .
(1 − pi ) = 1 −
i=m
m+k .
(1 − pi ) × (1 − pm+k+1 )
i=m
= 1−
m+k .
(1 − pi ) + pm+k+1 ×
i=m
= 1−
m+k .
m+k .
(1 − pi ) − pm+k+1 1 −
i=m
(1 − pi )
i=m m+k .
(1 − pi )
i=m
+ pm+k+1
m+k . = 1− (1 − pi ) (1 − pm+k+1 ) + pm+k+1
≤ 1−
i=m m+k . i=m
(1 − pi ) + pm+k+1 (since 1 − pm+k+1 ≤ 1)
Revised Answers Manual to an Introduction
≤
m+k
+ pm+k+1 (by the induction hypothesis)
pi
i=m
=
m+k+1
pi , which is the right-hand side inequality. #
i=m ∞ (i) Set A¯ = lim supm→∞ Am = ∩∞ m=1 ∪i=m Ai . Then, as m and n → ∞, ∞ ∞ ∞ ∞ c c c c ¯ P(( A) ) = P ∪ ∩ Ai = P lim ∩ Ai = lim P ∩ Ai m i=m m m=1 i=m i=m n n = lim P lim ∩ Aic = lim lim P ∩ Aic
4.
m
n i=m
= lim lim m
n
n .
m
n
P(Aic ) = lim lim m
i=m
n
i=m
n .
(1 − pi ) where pi = P(Ai ).
i=m
-n ¯ c ) = limm limn i=m P(Aic ), pi = P(Ai ) = 1 − P(Aic ). That is, P(( A) Apply the left-hand side inequality in Exercise 3 with pi as defined here to get: n n . 1 − exp − pi ≤ 1 − P(Aic ). i=m
i=m
∞ ∞ First, suppose that i=1 pi = ∞ (which implies that i=m pi = ∞ for every m), and take the limits in the inequality above, first as n → ∞ and ¯ c ) on the basis of what has then as m → ∞, to get: 1 − 0 ≤ 1 − P(( A) ¯ ≥ 1, so that P( A) ¯ = 1. been established above. Thus, P( A) ∞ ¯ = ¯ = 1. Then i=1 pi = ∞ because otherwise, P( A) Next, let P( A) A ) = 1 if and only 0 by Exercise 3 in Chapter 3. Thus, P(lim sup n n→∞ P(A ) = ∞. if ∞ n n=1 ¯ = 0. Then ∞ )< (ii) Now, with A¯ as in part (i), suppose that P( A) n=1 P(A n ¯ = 1 by part (i). On the other hand, ∞ ∞ because otherwise, P( A) n=1 ¯ = 0 by Exercise 3 in Chapter 3. Thus, P(An ) < ∞ implies P( A) P(lim supn→∞ An ) = 0 if and only if ∞ n=1 P(An ) < ∞. # 5. Here A¯ = lim supn→∞ An = A since limn→∞ An exists. Then, by Exercise 4, ∞ P(A) = 0 if and only if n=1 P(An ) < ∞, and P(A) = 1 if and only if ∞ P(A ) = ∞. Thus, P(A) is either 0 or 1. # n n=1 pq pq ∞ 1 ¯ ¯ 6. P(| X n − p| ≥ ε) ≤ ε2 × n −→ 0, and ∞ k=1 P(| X k 2 − p| ≥ ε) ≤ ε2 k=1 n→∞
1 k2
a.s. < ∞, so that by Exercise 4(i) and Theorem 4 in Chapter 3, X¯ k 2 −→ p. # k→∞
7. For an arbitrary, but fixed ε > 0, set An = (|X n | ≥ ε). Independence of X n s implies independence of An s, n ≥ 1. Then, by Exercise 4(ii), P(lim supn→∞ An ) = 0 if and only if ∞ n=1 P(An ) < ∞. But ∞
∞
∞
∞
∞
∞
lim sup An = ∩ ∪ Aν = ∩ ∪ (|X ν | ≥ ε) = ∩ ∪ (|X n+ν | ≥ ε), n→∞
n=1 ν=n
n=1 ν=n
n=1 ν=1
e89
e90
Revised Answers Manual to an Introduction
a.s. and X n −→ 0 if and only if P ∩∞ ∪∞ (|X n+ν | ≥ ε) = 0 for every ε > 0; n=1 ν=1 n→∞ ∞ ∞ this is so by Theorem 4 in Chapter 3. Thus, P(lim sup n→∞ An ) = P ∩n=1 ∪ν=1 ∞ (|X n+ν | ≥ ε) = 0 if and only if ∞ n=1 P(An ) = n=1 P(|X n | ≥ ε) < ∞, or ∞ a.s. X n −→ 0 if and only if n=1 P(|X n | ≥ ε) < ∞ for every ε > 0. # n→∞
8. For a concrete example, for n ≥ 1, let X n be independent r.v.s ∼ B(1, 1/n); i.e., 1, 1/n Xn = , n ≥ 1. 0, 1 − 1/n P
a.s.
n→∞
n→∞
Then, clearly, X n −→ 0. However, X n 0. Indeed, for ε > 0, let An = 1 (|X ∞n | ≥ ε) = (X n = 1). Then P(An ) = P(X n = 1) = n , and hence n=1 P(An ) = ∞. Since the events An , n ≥ 1, are independent, Exercise 4(i) implies that P(limn→∞ An ) = 1. However, ∞ ∞ ∞ ∞ P lim An = P ∩ ∪ Ak = P ∩ ∪ X n+k ≥ ε n→∞ n=1 k=n n=1 k=0 ∞ ∞ = P ∩ ∪ |X n+k − 0| ≥ ε . n=1 k=0
a.s.
Since this probability is = 0, it follows that X n 0. Next, let B = n→∞
σ (X 1 , X 2 , . . .). Then E B X n = X n a.s., E B X = E B 0 = 0 a.s. Thus, a.s. EB Xn EB X . # n→∞
9. If the r.v.s are independent, then the desired factorization follows from relation (10.4) with B j = (a j , b j ], a j , b j in and a j < b j , j = 1, . . . , k. Next, suppose that P(a j < X j ≤ b j , j = 1, . . . , k) =
k .
P(a j < X j ≤ b j ),
j=1
and we wish to show that X 1 , . . . , X k are independent. By Theorem 1, it suffices to show that the factorization in relation (10.5) holds. To this end, we show that P(X 1 ≤ x1 , a j < X j ≤ b j , j = 2, . . . , k) = P(X 1 ≤ x1 )
k .
P(a j < X j ≤ b j ).
j=2
Indeed, with (x1 >)ym ↓ −∞ as m → ∞, we have: P(X 1 ≤ x1 , a j < X j ≤ b j , j = 2, . . . , k) = P(X 1 ∈ ∪(ym , x1 ], a j < X j ≤ b j , j = 2, . . . , k) m
= P(∪(X 1 ∈ (ym , x1 ]), a j < X j ≤ b j , j = 2, . . . , k) m
= P(∪(X 1 ∈ (ym , x1 ], a j < X j ≤ b j , j = 2, . . . , k)) m
Revised Answers Manual to an Introduction
= P(lim(ym < X 1 ≤ x1 , a j < X j ≤ b j , j = 2, . . . , k)) m
= lim P(ym < X 1 ≤ x1 , a j < X j ≤ b j , j = 2, . . . , k) m
= lim P(ym < X 1 ≤ x1 )P(a2 < X 2 ≤ b2 ) . . . P(ak < X k ≤ bk ) m
= P(X 1 ≤ x1 )P(a2 < X 2 ≤ b2 ) . . . P(ak < X k ≤ bk ). This process can be repeated, clearly, with each one of the remaining k − 2 r.v.s, upon the completion in relation (10.5). # of which we obtain the factorization 10. The existence of X dP implies the existenceof A X dP for every A ∈ A (by the Corollary to Theorem 5 in Chapter 4), and A X dP = (X I A )dP = E(X I A ). Independence of X and Y means independence of A X and AY , of the σ -fields induced by X and Y . Since A ∈ AY , it follows that A X and {, A, Ac , } are also independent, and hence X and I A are independent. Then, by Lemma 1, E(X I A ) = (E X )(E I A ) = (E X )P(A). # 11. Consider two r.v.s X and Y , and let A X and AY be the σ -fields induced by them. Then X and Y are said to be independent if A X and AY are independent; i.e., P(A1 ∩ A2 ) = P(A1 )P(A2 ) for all A1 ∈ A X and all A2 ∈ AY . According to this, X and X are independent if P(A1 ∩ A2 ) = P(A1 )P(A2 ) for every A1 , A2 in A X . This is also true for A1 = A2 = A, say. That is, P(A ∩ A) = P(A)P(A) or P(A) = [P(A)]2 , which occurs only if P(A) = 0 or P(A) = 1. So, for every B ∈ B, P(X ∈ B) is either 0 or 1; or A X is equivalent to {, }. Next, for sufficiently small a, P(X < a) = 0, because otherwise P(X = −∞) = 1, by letting a → −∞. Likewise, P(X > b) = 0 for sufficiently large b. Thus, P(a ≤ X ≤ b) = 1. Set I0 = [a, b], and let I1 be that half of I0 for which P(X ∈ I1 ) = 1. Next, let I2 be that half of I1 for which P(X ∈ I2 ) = 1, etc. It is clear that, as n → ∞, In ↓ {c}, some c ∈ (a, b), and then P(X ∈ In ) → P(X = c). Since P(X ∈ In ) = 1 for all n, it follows that P(X = c) = 1. # (i) By Lemma 1 in Chapter 9, 12. Cov(X , Y ) = E[(X − E X )(Y − EY )] = E(X Y ) − (E X )(EY ) = (E X )(EY ) − (E X )(EY ) = 0, so that ρ(X , Y ) = 0 also. (ii) By Exercise 14(iii) in Chapter 9, ρ(X , Y ) = ρ, so that if ρ = 0, we have by the expression of p X ,Y (see Exercise 12 in Chapter 9):
1 x − μ1 2 y − μ2 2 exp − − p X ,Y (x, y) = 2π σ1 σ2 σ1 σ2
2 x − μ1 1 exp − = √ σ1 2π σ1
y − μ2 2 1 exp − ×√ σ2 2π σ2 = p X (x) pY (y), so that X and Y are independent. #
e91
e92
Revised Answers Manual to an Introduction
13. Indeed, eit(X 1 +X 2 ) = eit X 1 × eit X 2 = [cos(t X 1 ) + i sin(t X 1 )][cos(t X 2 ) + i sin(t X 2 )] = [cos(t X 1 ) cos(t X 2 ) − sin(t X 1 ) sin(t X 2 )] +i[cos(t X 1 ) sin(t X 2 ) + sin(t X 1 ) cos(t X 2 )], and Eeit(X 1 +X 2 ) = [E cos(t X 1 )][E cos(t X 2 )] − [E sin(t X 1 )][E sin(t X 2 )] + i[E cos(t X 1 )][E sin(t X 2 )] + [E sin(t X 1 )][E cos(t X 2 )] = [E cos(t X 1 ) + iE sin(t X 1 )][E cos(t X 2 ) + iE sin(t X 2 )] = (Eeit X 1 )(Eeit X 2 ), as was to be seen. Next, Eeit(X 1 +...+X k+1 ) = E[eit(X 1 +...+X k ) eit X k+1 ] = Eeit(X 1 +...+X k ) × eit X k+1 (by the previous step) = Eeit X 1 × . . . × Eeit X k × Eeit X k+1 (by the induction hypothesis) 14.
(i) Here p X ,Y
= Eeit X 1 × . . . × Eeit X k+1 . # 2 2 = 2π1σ 2 exp − x 2σ+y2 , and the transformations u =
x + y, v = x − y produce the Jacobian − 21 . Thus, 2 1 u + v2 pU ,V (u, v) = exp − 2π × 2σ 2 4σ 2 1 u2 = √ √ exp − √ 2π (σ 2) 2(σ 2)2 1 v2 ×√ √ exp − √ 2π (σ 2) 2(σ 2)2
15.
and the desired result follows. (ii) The r.v.s X ∗ = X − μ1 and Y ∗ = Y − μ2 are independent, distributed as N (0, σ 2 ), so that, by part (i), U = X ∗ + Y ∗ and V = X ∗ − Y ∗ are independent, distributed as N (0, 2σ 2 ). Equivalently, X + Y − (μ1 + μ2 ) and X − Y − (μ1 − μ2 ) are independent, distributed as N (0, 2σ 2 ), or X + Y and X − Y are independent, distributed as N (μ1 + μ2 , 2σ 2 ) and N (μ1 − μ2 , 2σ 2 ), respectively. # (i) It is clear that X takes on the values 0, 1, . . . , n, and that in order to show that X is a r.v., it suffices to show that X −1 ({k}) ∈ A, k = 0, 1, . . . , n. We have, (X = k) = ∪(Ai1 ∩ . . . ∩ Aik ∩ Acj1 ∩ . . . ∩ Acjl ),
Revised Answers Manual to an Introduction
where the union is taken over all i 1 , . . . , i k with 1 ≤ i 1 < . . . < i k ≤ n and all j1 < . . . < jl distinct from all i 1 , . . . , i k and k + l = n. However, this union is in A, and hence (X = k) ∈ A. (ii) In part (i), the members of the union are nk , and each such member consists of n independent events; this is so by the assumed independence of A1 , . . . , An and Exercise 1. Since P(Ai1 ∩ . . . ∩ Aik ∩ Acj1 ∩ . . . ∩ Acjl ) = P(Ai1 ) . . . P(Aik )P(Acj1 ) . . . P(Acjl ) = p k (1 − p)n−k , we haveP(X = k) =
n k p (1 − p)n−k , k = 0, 1, k
. . . , n, so that X ∼ B(n, p).# 16. Let An = X n−1 (B) = {, An , Acn , }. Then X 1 , X 2 , . . . are independent if A1 , A2 , . . . are independent. Next, A1 , A2 , . . . are independent if any finite member of them is a collection of independent σ -fields. Without loss of generality, consider the collection A1 , . . . , An . Then A1 , . . . , An are independent if and only if A1 , . . . , An are independent. Indeed, if A1 , . . . , An are independent, then A1 , . . . , An are also independent, where Aj is either A j or Acj , j = 1, . . . , n. This is so by Exercise 1. But this implies that A1 , . . . , An are independent, since for every choice B j ∈ A j , we have P(B1 ∩ . . . ∩ Bn ) = P(B1 ) . . . P(Bn ). On the other hand, if A1 , . . . , An are independent, then A1 , . . . , An are independent, since P(A j1 ∩ . . . ∩ A jk ) = P(A j1 ) . . . P(A jk ) for all 2 ≤ k ≤ n and 1 ≤ j1 < . . . < jk ≤ n, and A jr ∈ A jr , r = 1, . . . , k. This is so, because P(B1 ∩ . . . ∩ Bn ) = P(B1 ) . . . P(Bn ), Bi ∈ Ai , i = 1, . . . , n, and we may choose B ji = A ji , i = 1, . . . , k, and B j = for j = j1 , . . . , jk . This completes the proof. # 17. We have X = # of H s, and let Y = # of T s. Then X = Y + r and, of course, X + Y = n, so that Y = n − X . Then: P(X = Y + r ) = P(X = n − X + r )
n +r = P(2X = n + r ) = P X = 2 0 if n + r is odd = n n+r n− n+r 2 if n + r is even. # n+r p 2 (1 − p) 2
18. Consider the quantities Dn , Dn+ , Dn− and yi , i = 1, . . . , n mentioned in the hint, and let y0 = −∞, yn+1 = ∞. Then observe that Dn+ = sup [Fn (x, ·) − F(x)] = max x∈
= max
sup
0≤i≤n yi ≤x
sup
0≤i≤n yi ≤x
i − F(x) n
[Fn (x, ·) − F(x)]
e93
e94
Revised Answers Manual to an Introduction
i = max F(x) − inf yi ≤x
= max
max
0≤i≤n
/ i − F(yi ) , 0 . n
Likewise, Dn− = sup [F(x) − Fn (x, ·)] = x∈
=
max
sup
1≤i≤n+1 yi−1 ≤x
max
F(x) −
sup
1≤i≤n+1 yi−1 ≤x
i −1 n
[F(x) − Fn (x, ·)]
i −1 = max sup F(x) − 1≤i≤n+1 yi−1 ≤x
max
1≤i≤n
F(yi − 0) −
/ i −1 ,0 . n
The quantities Dn+ and Dn− are r.v.s because they are expressed in terms of finitely many r.v.s Furthermore, it is clear that Dn = max{Dn+ , Dn− }. Therefore Dn is a r.v. # 19. For 0 < p < 1, let x p = inf{x ∈ ; F(x) ≥ p}. Then x ≥ x p implies F(x) ≥ p, whereas x < x p implies F(x) < p so that F(x p − 0) ≤ p. With this in mind, proceed as follows. Let / i , i = 0, 1, . . . , k, xki = inf x ∈ ; F(x) ≥ k so that −∞ ≤ xk0 < xk1 and xk,k−1 < xkk ≤ ∞. From the observation at the beginning, we have that x ∈ [ ki , i+1 k ) and i = 0, 1, . . . , k − 1 imply i +1 i ≤ F(xki ) ≤ F(x) ≤ F(xk,i+1 − 0) ≤ , k k
Revised Answers Manual to an Introduction
so that F(xk,i+1 − 0) − F(xki ) ≤
1 . k
(1)
Therefore, for x ∈ [xki , xk,i+1 ), i = 1, . . . , k − 2, Fn (x) − F(x) ≤ Fn (xk,i+1 − 0) − F(xki ) = [Fn (xk,i+1 − 0) − F(xk,i+1 − 0)] + [F(xk,i+1 − 0) − F(xki )] ≤ [Fn (xk,i+1 − 0) − F(xk,i+1 − 0)] +
1 (by (1)). k
(2)
Likewise, Fn (x) − F(x) ≥ Fn (xki ) − F(xk,i+1 − 0) = [Fn (xki ) − F(xki )] − [F(xk,i+1 − 0) − F(xki )] 1 ≥ [Fn (xki ) − F(xki )] − (by (1)). k
(3)
From relations (2) and (3), and for x ∈ [xki , xk,i+1 ), i = 1, . . . , k − 2, it follows that |Fn (x) − F(x)| ≤ max{|Fn (xk,i+1 − 0) − F(xk,i+1 − 0)|, 1 |Fn (xki ) − F(xki )|} + , k so that for x ∈ [xk1 , xk,k−1 ), |Fn (x) − F(x)| ≤ max{|Fn (xk,i+1 − 0) − F(xk,i+1 − 0)|, 1 |Fn (xki ) − F(xki )|; i = 1, . . . , k − 2} + . k
(4)
In particular, for i = 0, relation (1) gives F(xk1 ) ≤ k1 , and therefore (3) and (4) become, for x < xk1 , 1 Fn (x) − F(x) ≤ [Fn (xk1 − 0) − F(xk1 − 0)] + , k 1 Fn (x) − F(x) ≥ [Fn (xk0 ) − 0] − , k so that (4) still holds for i = 0 and x < xk1 . Next, relation (1) gives, for i = k − 1, F(xkk − 0) − F(xk,k−1 ) ≤ k1 , and therefore (3) and (4) become, for x ≥ xk,k−1 , 1 Fn (x) − F(x) ≤ [Fn (xk,k − 0) − F(xk,k − 0)] + , k 1 Fn (x) − F(x) ≥ [Fn (xk,k−1 ) − F(xk,k−1 )] − , k
e95
e96
Revised Answers Manual to an Introduction
so that (4) still holds for i = k − 1 and x ≥ xk,k+1 . In other words, relation (4) holds for i = 0, 1, . . . , k − 1 and x ∈ . Now, from the definition of Fn (x) = Fn (x, ω) in Exercise 4, we have that n 1 1, X i (ω) ≤ x Fn (x) = Yi , where Yi = Yi (x, ω) = 0, X i (ω) > x, n i=1
so that
n
i=1 Yi
a.s.
∼ B(n, F(x)) and by the SLLN, Fn (x) −→ F(x). Likewise, if n→∞
Z i = Z i (x, ω) are defined by Zi = then
n i=1
1, X i (ω) < x 0, X i (ω) ≥ x,
Z i ∼ B(n, F(x − 0)) and Fn (x − 0) = a.s.
1 n
n i=1
Z i . Therefore, again
by the SLLN, Fn (x − 0) −→ F(x − 0). Next, in relation (4) take the supx∈
n→∞
when the left-hand side produces the r.v. supx∈ |Fn (x)− F(x)| (by Exercise (4)), whereas the right-hand side is max{|Fn (xk,i+1 −0)− F(xk,i+1 −0)|, |Fn (xki )− F(xki )|; i = 0, 1, . . . , k−1}. (5) Each term within the square bracket in (5) tends a.s. to 0, and then so does their max. In other words: supx∈ |Fn (x) − F(x)| ≤ k1 a.s. Letting k → 0, we get then the desired result. #
Chapter 11 Topics from the Theory of Characteristic Functions 1. Set z = r eiθ (r ≥ 0, 0 ≤ θ < 2π ) for the representation of g(x) + i h(x) 2π ∞ in polar coordinates, and [g(x) + i h(x)]dμ = 0 0 zdr dθ (or just z) = r , | z| = ρ. Therefore(real)| z| = ρ = ρeiα × e−iα = = ρeiα . Then: |z|−iα −iα = (e z) = (e−iα × r eiθ ) = r ei(θ−α) = r cos(θ − α) (since ( z)e real), and this is ≤ r = |z|; i.e., | z| ≤ |z|. Alternatively, without loss of generality, suppose that [g(x) + i h(x)]dμ = 0 (since the inequality is true if it is = 0), and set z for the complex number z = | [g(x) + i h(x)]dμ|/ [g(x) + i h(x)]dμ. Then / real = [g(x) + i h(x)]dμ = Re z [g(x) + i h(x)]dμ
Re{z[g(x) + i h(x)]}dμ = Re z[g(x) + i h(x)]dμ =
Revised Answers Manual to an Introduction
|Re{z[g(x) + i h(x)]}dμ| ≤ Re{z[g(x) + i h(x)]}dμ ≤
|z[g(x) + i h(x)]| dμ = |z||g(x) + i h(x)| dμ ≤
|g(x) + i h(x)| dμ (since |z| = 1). =
Thus, | [g(x) + i h(x)]dμ| ≤ |g(x) + i h(x)|dμ. For the special case, Z = r eiθ where r and θ are r.v.s with r ≥ 0 and 0 ≤ θ < 2π . Also, let E Z = ρeiα , so that |Z | = r and |E Z | = ρ. Then real= |E Z | = ρ = ρeiα × e−iα = (E Z )e−iα = E(e−iα Z ) = E(e−iα × r eiθ ) = E r ei(θ−α) = E[r cos(θ − α)] (since real) ≤ |E[r cos(θ − α)]| ≤ E|r cos(θ − α)| = Er = E|Z |; i.e., |E Z | ≤ E|Z |. Another approach for the " special case is the following. √ Set Z = X + iY . Then 2 + (EY )2 , |Z | = E Z = E X + iEY , |E Z | = (E X ) X 2 + Y 2 . Consider the " 2 2 function g(x, y) = x + y . Then g(x, y) is convex with respect to one of its variables, the other being kept fixed (this is so because the second order derivatives are ≥ 0). Then look at g(·, EY ) and apply Jensen’s inequality to obtain: g(x, EY )dP X , (1) g(E X , EY ) ≤ E g(X , EY ) =
where PX is the probability distribution of the r.v. X . For each x, look at g(x, ·) and apply Jensen’s inequality again to obtain: g(x, EY ) ≤ E g(x, Y ) = E[g(X , Y )|X = x]. Hence
g(x, EY )dP X ≤
E[g(X , Y )|X = x]dP X
= E{E[g(X , Y )|X ]} = E g(X , Y ) " = E X 2 + Y 2 = E|Z |. (2) " √ Since g(E X , EY ) = (E X )2 + (EY )2 = |E Z | and E|Z | = E X 2 + Y 2 = E g(X , Y ), relations (1) and (2) yield the result. # 2 ∗ 2. In the expansion ei x = 1 + i x − x2 ei x , |x ∗ | bounded, replace x by −ta and −tb ∗
∗
it1 −itb = 1−itb− (tb) eit2 , |t ∗ | successively to obtain: e−ita = 1−ita − (ta) 1 2 e ,e 2 ∗ and |t2 | bounded. Hence 2
e−ita − e−itb = it(b − a) +
t 2 2 it ∗ ∗ (b e 2 − a 2 eit1 ), 2
2
e97
e98
Revised Answers Manual to an Introduction
and therefore ∗
∗
(e−ita − e−itb )/(it) = (b − a) + t(b2 eit2 − a 2 eit1 )/2i. Letting t → 0, we have that the limit is b−a. Since also lim t→0 f (t) = f (0) = 1, it follows that limt→0 g(t) = b − a. # 3. The continuity of g at 0 implies that, for every ε > 0, there exists δ = δ(ε) > 0 such that |g(v) − g(0)| < ε for |v − 0| = |v| < δ. Hence, for 0 < v < t < δ (assuming that t > 0; similarly for t < 0), t 1 1 t ≤ [g(v) − g(0)]dv |g(v) − g(0)|dv t t 0 0 1 t εt = ε, ≤ εdv = t 0 t or t t 1 1 [g(v) − g(0)]dv = g(v)dv − g(0) < ε. t t 0 0 1 t Since ε is arbitrary, it follows that t 0 g(v)dv −→ g(0). t→∞
Alternatively, apply L’Hôpital’s Rule, which states that, if f and h are two functions such that f (x0 ) = h(x0 ) = 0, f and h are differentiable in an open interval (x) f (x) = lim x→x0 hf (x) , I containing x0 , and h (x) = 0 for x ∈ I , then lim x→x0 h(x) t provided this limit exists (or is ∞ or −∞). Here take f (t) = 0 g(v)dv, h(t) = t, and x0 = 0 to obtain: t g(v)dv 1 t g(t) = lim = lim g(t) = g(0). # g(v)dv = lim 0 lim t→0 t 0 t→0 t→0 1 t→0 t " 4. |eit −1| = | cos t +i sin t −1| = |(cos t −1)+i sin t| = (cos t − 1)2 + sin2 t = √ √ √ 0 2 2 1 − cos t = 2 2 sin2 t = 2 sin2 t ≤ 2 2t , since | sin t| ≤ t; i.e., |eit −1| ≤ |t|. That | sin t| ≤ t is seen ∞ as follows: n t 2n+1 n t 2n Recall that: sin t = ∞ n=0 (−1) (2n+1)! , cos t = n=0 (−1) (2n)! . We have: ∞ ∞ 2n+1 2n t n t n t | sin t| = (−1) (−1) = (2n + 1)! 2n + 1 (2n)! n=0 n=0 t = cos t ≤ |t| (since 2n + 1 ≥ 1). # 2n + 1 5. We know that log(1 + z) = z(1 + θ z), |θ | = |θ (z)| ≤ 1, |z| ≤ 21 . Therefore, cn n cn cn cn cn log 1 + = n× 1+θ = cn 1 + θ −→ c, = n log 1 + n n n n n n→∞ and hence cn cn cn n = en log(1+ n ) = ecn (1+θ n ) −→ ec . # 1+ n→∞ n
Revised Answers Manual to an Introduction
6.
∞ xd x ∞ d(1+x 2 ) 1 1 = π1 −∞ 1+x = 2π log(1 + x 2 )|∞ 2 = 2π −∞ 1+x 2 −∞ = ∞ − ∞, so that E X does not exist. ∞ eit x ∞ 1 ∞ cos(t x) 1 ∞ (ii) f X (t) = −∞ eit x p(x)d x = π1 −∞ 1+x 2 d x = π −∞ 1+x 2 d x +i π −∞ ∞ x) x) sin(t x) d x = π1 −∞ cos(t d x (since sin(t is an odd function), and this is 1+x 2 1+x 2 1+x 2 cos(t x) 2 ∞ cos(t x) = π 0 1+x 2 d x (since 1+x 2 is an even function), and this is = π2 × π −|t| = e−|t| (by the reference given). 2e t (iii) f Sn (t) = nj=1 f X j ( nt ) = [ f X 1 ( nt )]n = (e−| n | )n = e−|t| . (i)
∞
−∞ x p(x)d x
n
(iv)
P d Sn −→ 0 is equivalent to Snn −→ 0 which does not n n→∞ n→∞ of Snn is e−|t| and the ch.f. of 0 is eit×0 = e0 = 1. #
happen, since the ch.f.
[(2n)!] [(2n−1)!] (2n) 7. We have [(2n−1)!] = (2n)1/2n ×[(2n−1)!]−1/2n(2n−1) 1/(2n−1) = [(2n−1)!]1/(2n−1) −1/2n(2n−1) 1 = (2n)1/2n × (2n)! . Now log(2n)1/2n = 2n log(2n) −→ 0, so that 2n 1/2n
1/2n
1/2n
n→∞
n! (2n)1/2n → e0 = 1. Next, Stirling’s formula states that: √ −→ 1. 1 2π ×n n+ 2 ×e−n n→∞ −1/2n(2n−1) −1/2n(2n−1) √ 1 (2n)!/ 2π(2n)2n+ 2 e−2n = and Therefore (2n)! √ 1 2n 2n/ 2π (2n)2n+ 2 e−2n √ 1 [(2n)!/ 2π (2n)2n+ 2 e−2n ]−1/2n(2n−1) −→ 1, since the bracket −→ 1 and n→∞ n→∞ −1/2n(2n−1) 1 1/2n(2n − 1) −→ 0. Furthermore, = √ 1 n→∞ 2n/ 2π(2n)2n+ 2 e−2n 1/2n(2n−1) 1/2n(2n−1) 1/(2n−1) (2n)e2n 2n = √ e 1/2n(2n−1) × and the first √ 2n+ 1 2n+ 1 2π (2n)
2
(2n)
2π
2
term −→ 1. Finally, setting An for the second term, we have: log An = n→∞
× log(2n) =
1 −1+ 4n 1 1− 2n
−2n+ 21 2n(2n−1)
× log(2n) 2n and the first term −→ −1 whereas the second term n→∞
−→ 0. Thus, log An −→ 0, so that An −→ 1. #
n→∞
n→∞
n→∞
(n) 1/n de f < ∞. We 8. The series converges (for −ρ < t < ρ) if ρ −1 = limn→∞ |mn! | 1/n (n) 1/n n 1/n 1/n (n)1/n √ 1 μ(n) / 2π n n+ 2 e−n |n μ have: |mn! | = En!X ≤ E |X = = = √ 1 n! n! n!/ 2π n n+ 2 e−n 1/n 1/n μ(n) 1 × and the second term −→ 1 by Stir√ √ n+ 1 n+ 1 2π n
2 e−n
n!/ 2π n
2 e−n
n→∞
ling’s formula and the fact that 1/n −→ 0. The first term is equal to: n→∞
[μ(n) ]1/n n
×
1 1 1 √ 1 × e−1 × n 1/2n and: √ 1 1/n −→ 1, log(n 1/2n ) = 2n log n −→ 0, so that ( 2π )1/n ( 2π ) n→∞ n→∞ (n) 1/n < ∞, then limn→∞ n 1/2n −→ 1. Therefore, if limn→∞ [μ n] (n) n→∞ |m | 1/n n! < ∞. #
e99
e100
Revised Answers Manual to an Introduction
√ 9. Set Yn = Sσn −nμ and show that {Yn } does not converge mutually in proban bility by showing that {Y2n − Yn } does not converge in probability to 0, as −2nμ √ − Sσn −nμ = √1 [S2n − 2nμ − n → ∞. We have Y2n − Yn = S2n√ n σ 2n σ 2n √ 2(Sn − nμ)]. But
⎛ ⎞ 2n n √ √ − 2nμ − 2(Sn − nμ) = X j − 2nμ − 2 ⎝ X j − nμ⎠
S2n
j=1
j=1
⎛ ⎞ ⎛ ⎞ ⎛ ⎞ n 2n n √ =⎝ X j − nμ⎠ + ⎝ X j − nμ⎠ − 2 ⎝ X j − nμ⎠ j=1
= (1 −
j=n+1
⎛
⎞
j=1
⎛
⎞
n 2n √ 2) ⎝ X j − nμ⎠ + ⎝ X j − nμ⎠ , j=1
j=n+1
and hence
Y2n
n 2n √ 1− 2 1 j=1 X j − nμ j=n+1 X j − nμ − Yn = √ +√ × × √ √ σ n σ n 2 2 √ 1− 2 1 = √ Un + √ Vn and Un , Vn are independent. 2 2
Then: f 1−√√2 2
Un
(t) −→ e
√ 2 √ 2 t 2 /2 − 1− 2
n→∞
,
f √1
2
Vn (t) −→ e
so that f Y2n −Yn (t) = f 1−√√2 2
2
2
n→∞
√
Un + √1 Vn
2 − √1 t 2 /2
(t) −→ e−(2− n→∞
2)t 2 /2
,
.
√ Y2n −Yn Thus, the asymptotic distribution of Y2n − Yn is N (0, 2 − 2) or √ √ is 2− 2 Y −Y n 2n asymptotically distributed as N (0, 1) and then P √ √ > ε −→ 2 − 2 2− 2
n→∞
(ε) = 0. # ∞ ∞ 1 1 dx 10. First, E X 1+ (= E X 1− ) = c ∞ n=3 n log n , and n=3 n log n ≥ 2.5 (x+0.5) log(x+0.5) = ∞ dt ∞ d log t ∞ 3 t log t = 3 log t = log log t|3 = ∞ (see also the figure below), so that the E X 1 does not exist.
Revised Answers Manual to an Introduction
0
...
2.5
3
4
5
6
S∗ Next, P Snn = nn = P(Sn = Sn∗ ) ≤ P(X j = X n j for some j = 1, . . . , n) ≤ n n ∞ j=1 P(X j = X n j ) = j=1 P(|X j | ≥ n) = n P(|X 1 | ≥ n) = nc j=n ∞ 1 ∞ 1 ∞ 1 1 nc ∞ 1 ≤ log j=n j 2 , and j=n j 2 ≤ n−1 , because j=n j 2 ≤ n−0.5 n j 2 log j ∞ 1 dx = n−1 dt = − 1t |∞ n−1 = n−1 (see also the figure below), so that (x−0.5)2 t2 Sn∗ Sn nc ∞ nc 1 −→ 0, or Snn − j=n j 2 ≤ (n−1) log n −→ 0, and hence P n = n log n n→∞
Sn∗ P −→ 0. n n→∞
0
...
n−1 n−0.5 n
n→∞
n+1
n+2
n+3
On the other hand, ∗ 1 Sn = ×n X 1 dP = X 1 dP = X 1 dP E n n (|X 1 |
j=3
e101
e102
Revised Answers Manual to an Introduction
⎛
−3
1 ⎠ j log j j=−n+1 j=3 ⎛ ⎞ n−1 n−1 1 1 ⎠ = 0, by setting j = −k. = c ⎝− + k log k j log j = c⎝
1 + j log | j|
⎞
n−1
k=3
j=3
∗ S So, E nn = 0. Finally, ∗ 2 ∗ S S 2 = Var nn = E nn = n12 × nE X n1 as above. However (see also figure below),
0
n−1
...
1 j=3 log j
≤
2.5
n−0.5 2.5
4
3
dx log(x−0.5)
1 n
5
=
(|X 1 |
X 12 dP =
6
···
2c n
n−1
1 j=3 log j
n−1
n−1
Sn∗ dt 2 log t , and we wish to show that Var n x dt suffices to show, that ddx 2 log −→ 0. But t x→∞
−→ 0. By the l’Hôspital rule, it n→∞ x dt d 1 x+h dt d x 2 log t = lim h→∞ h x log t (for h > 0 and similarly for h < 0), and x dt 1 x+h dt 1 1 1 d h x log t ≤ h log x × h = log x , so that lim x→∞ dx 2 log t = lim x→∞ x+h dt Sn∗ 1 −→ 0, and hence limh→∞ h1 x log t ≤ lim x→∞ log x = 0. Therefore Var n Sn∗ P −→ 0. This convergence n n→∞
together with
Sn n
−
Sn∗ P −→ 0 n n→∞
n→∞
complete the proof. #
In all of Exercises 11-15 below, i is treated as a real number subject, of course, to the requirement i 2 = −1. n that x n−x , x = 0, 1, . . . , n, we have: 11. Since p X (x) = x p q f X (t) = Eeit X =
n x=0 n
eit x
= ( pe + q) . # it
n n x n−x n p q ( peit )x q n−x = x x x=0
Revised Answers Manual to an Introduction
12. Here p X (x) = e−λ λx! , x = 0, 1, . . . , so that x
f X (t) = Ee
it X
=
∞
e
it x −λ λ
e
x!
x=0
=e 13.
(i) Since p Z (z) =
−λ λeit
e
2 √1 e−z /2 , 2π
=e
x
λeit −λ
=e
−λ
∞
(λeit )x
x=0
.#
we have:
∞ 1 2 f Z (t) = Eeit Z = eit z √ e−z /2 dz 2π −∞ ∞ ∞ (z−it)2 t2 1 − z2 −2it z 1 2 = dz = √ e √ e− 2 × e− 2 dz 2π 2π −∞ −∞ ∞ 2 2 (z−it) t t2 t2 1 = e− 2 √ e− 2 dz = e− 2 × 1 = e− 2 , 2π −∞ since the integrand can be viewed as the p.d.f. of the Normal distribution with mean it and variance 1. (ii) Here t2
e− 2 = f Z (t) = f X −μ (t) = Eeit(X −μ)/σ = e−iμt/σ Eei(t/σ )X σ itμ t 2 t t −iμt/σ , so that f X =e σ −2 . =e fX σ σ Replacing
t σ
by t, we get: f X (t) = eiμt−
σ 2t2 2
.#
∞ α−1 1 1 it x α−1 e−x/β d x = 0 e (α)β α x (α)β α 0 x ∞ β α−1 y α−1 e−y β 1 × e−(1−iβt)x/β d x = (α)β dy (by setting (1−iβt)x = y)= α 0 β (1−iβt)α−1 1−iβt ∞ α−1 −y (α) 1 1 1 1 e dy = (1−iβt)α × (α) = (1−iβt)α . The special cases (1−iβt)α × (α) 0 y
14. Here f X (t) = Eeit X =
∞
follow immediately. #
15.
(i) For μ1 = μ2 = 0 and σ1 = σ2 = 1, we have: 1 2 2 (x − 2ρx y + y ) 2(1 − ρ 2 ) 2π 1 − ρ 2 2 1 (y − ρx)2 x 1 ×√ " exp − = √ exp − 2 2(1 − ρ 2 ) 2π 2π 1 − ρ 2
p X ,Y (x, y) =
"
1
exp −
(by Exercise 12(ii) in Chapter 9),
e103
e104
Revised Answers Manual to an Introduction
so that ∞ ∞ f X ,Y (t1 , t2 ) = Eeit1 X +it2 Y = eit1 x+it2 y p X ,Y (x, y)d xd y −∞ −∞ ∞ x2 it1 x 1 = e √ e− 2 2π −∞ ∞ 2 1 − (y−ρx)2 it2 y × e √ " e 2(1−ρ ) dy d x. 2π 1 − ρ 2 −∞ However, the second integral on the right-hand side above is the ch.f. of a r.v. distributed as N (ρx, 1 − ρ 2 ), and therefore, by Exercise 13(ii), is equal to: ei(ρx)t2 −
(1−ρ 2 )t22 2
. Therefore ∞
(1−ρ 2 )t22 x2 1 eit1 x √ e− 2 ei(ρx)t2 − 2 d x 2π −∞ ∞ (1−ρ 2 )t22 x2 1 = e− 2 √ eit1 x− 2 +iρt2 x d x 2π −∞ ∞ (1−ρ 2 )t22 x 2 −2i(t1 +ρt2 )x 1 2 = e− 2 d x. √ e− 2π −∞
f X ,Y (t1 , t2 ) =
However, x 2 − 2i(t1 + ρt2 )x = x 2 + [i(t1 + ρt2 )]2 − 2i(t1 + ρt2 )x + (t1 + ρt2 )2 = [x − i(t1 + ρt2 )]2 + (t1 + ρt2 )2 , so that (1−ρ 2 )t22 (t1 +ρt2 )2 − 2 2
∞
1 2 √ e−[x−i(t1 +ρt2 )] /2 d x 2π −∞ / 1 = exp − [(1 − ρ 2 )t22 + (t1 + ρt2 )]2 ] 2 1 2 2 = exp − (t1 + 2ρt1 t2 + t2 ) , 2
f X ,Y (t1 , t2 ) = e−
as was to be seen. (ii) Clearly, EU = E V = 0, Var(U ) = Var(V ) = 1, and ρ(U , V ) = ,Y ) = Cov(U , V ) = E(U V ) = σ11σ2 E[(X − μ1 )(Y − μ2 )] = Cov(X σ1 σ2 ρ(X , Y ) = ρ. Furthermore, U ∼ N (0, 1) and V ∼ N (0, 1). Next,
Revised Answers Manual to an Introduction
X = σ1 U + μ1 , Y = σ2 V + μ2 , so that f X ,Y (t1 , t2 ) = Eet1 X +t2 Y = eit1 μ1 +it2 μ2 Eei(t1 σ1 )U +i(t2 σ2 )V = eit1 μ1 +it2 μ2 fU ,V (t1 σ1 , t2 σ2 ) 1
= eit1 μ1 +it2 μ2 − 2 (t1 σ1 +2ρσ1 σ2 t1 t2 +t2 σ2 ) 1 = ex p iμ1 t1 + iμ2 t2 − (σ12 t12 + 2ρσ1 σ2 t1 t2 + σ22 t22 ) . 2 2 2
2 2
This is so because the joint distribution of U and V is the Bivariate Normal with parameters 0,0,1,1 and ρ (by Exercise 15 in Chapter 9), so that part (i) applies. # 16.
(i) That EU = μ1 + μ2 , E V = μ1 − μ2 , Var(U ) = σ12 + σ22 + 2ρσ1 σ2 , and Var(V ) = σ12 + σ22 − 2ρσ1 σ2 are immediate. Also, Cov(U , V ) = E(U V ) − (EU )(E V ) = E[(X + Y )(X − Y )] − (μ1 + μ2 )(μ1 − μ2 ) = E X 2 − EY 2 − μ21 + μ22 = (E X 2 − μ21 ) − (EY 2 − μ22 ) = σ12 − σ22 . (ii) We have f X ,Y (t1 + t2 , t1 − t2 ) = exp {iμ1 (t1 + t2 ) + iμ2 (t1 − t2 ) / 1 − [σ12 (t1 + t2 )2 + 2ρσ1 σ2 (t1 + t2 )(t1 − t2 ) + σ22 (t1 − t2 )2 ] . 2 Doing the algebra in the exponent, we get: 1 i(μ1 + μ2 )t1 + i(μ1 − μ2 )t2 − [(σ12 + σ22 + 2ρσ1 σ2 )t12 2 +2(σ12 − σ22 )t1 t2 + (σ12 − 2ρσ1 σ2 + σ22 )t22 ], and the result follows. (iii) Immediate from part (ii) and Exercise 15. (iv) Follows from part (iii), since ρ(U , V ) = 0 if and only if σ1 = σ2 . #
17.
(i) By Exercise 13 in Chapter 10, f X (t) = Eeit X = Eeit(X 1 +...+X k ) = Eeit X 1 × . . . × Eeit X k = ( peit + q)n 1 × . . . × ( peit + q)n k = ( peit + q)n , which is the ch.f. of the B(n, p) distribution.
e105
e106
Revised Answers Manual to an Introduction
(ii) As above, f X (t) = Eeit X 1 × . . . × Eeit X k = eλ1 e
it −λ 1
× . . . × eλk e
it −λ k
= eλe
it −λ
,
which is the ch.f. of the P(λ) distribution. (iii) Again, as above, f X (t) = Eeit X 1 × . . . × Eeit X k = eiμ1 t− = eiμt−
σ12 t 2 2
σ 2t2 2
× . . . × eiμk t−
σk2 t 2 2
,
which is the ch.f. of the N (μ, σ 2 ) distribution. Furthermore, independence of X 1 , . . . , X k implies that of c1 X 1 , . . . , ck X k , and so f c1 X 1 +...+ck X k (t) = Eeit(c1 X 1 +...+ck X k ) = Eeit(c1 X 1 ) × . . . × Eeit(ck X k ) = Eei(c1 t)X 1 × . . . × Eei(ck t)X k = f X 1 (c1 t) × . . . × f X k (ck t) = eiμ1 c1 t−
σ12 c12 t 2 2
× . . . × eiμk ck t−
= ei(c1 μ1 +...+ck μk )t−
σk2 ck2 t 2 2
(c12 σ12 +...+ck2 σk2 )t 2 2
which is the ch.f. of the N (c1 μ1 + . . . + tribution. (iv) Here
ck μk , c12 σ12
, + . . . + ck2 σk2 ) dis-
f X (t) = Eeit X 1 × . . . × Eeit X k 1 1 = × ... × α (1 − iβt) 1 (1 − iβt)αk 1 = , (1 − iβt)α which is the ch.f. of the Gamma distribution with parameters α and β. For the special cases, we have: α1 = · · · = αk = 1 (so that α = k), 1 β = 1/λ, so that f X (t) = it k , which is the ch.f. of the Gamma (1− λ )
distribution with parameters α = k, β = 1/λ. For α j = r j /2, β = 2, we 1 2 have: f X (t) = (1−2it) r /2 , which is the ch.f. of the χr distribution. # 18. Suppose the r.v.s X 1 , . . . , X k are independent. Referring to the derivations in Exercise 13 in Chapter 10, it is clear that the expression it(X 1 + X 2 ) can be replaced by i(t1 X 1 + t2 X 2 ). Then we obtain Eei(t1 X 1 +t2 X 2 ) = (Eeit1 X 1 )(Eeit2 X 2 ),
Revised Answers Manual to an Introduction
and by induction, (as in the exercise just cited), Eei(t1 X 1 +...+tk X k ) = (Eeit1 X 1 ) × . . . × (Eeitk X k ), or f X 1 ,...,X k (t1 , . . . , tk ) = f X 1 (t1 ) × . . . × f X k (tk ) for all t1 , . . . , tk in . Next, assume that this factorization, and we wish to show that the r.v.s X 1 , . . . , X k are independent. By the assumed factorization, the expression in Theorem 2 becomes as follows for continuity points a = (a1 , . . . , ak ) and b = (b1 , . . . , bk ) of the joint d.f. F = FX 1 ,...,X k of the r.v.s X 1 , . . . , X k , P(a j < X j ≤ b j , j = 1, . . . , k) T1 −it1 a1 1 e − e−it1 b1 = lim f X 1 (t1 )dt1 × . . . × T1 →∞ 2π −T1 it1 Tk −itk ak 1 e − e−itk bk f X k (tk )dtk lim Tk →∞ 2π −Tk itk = [FX 1 (b1 ) − FX 1 (a1 )] × . . . × [FX k (bk ) − FX k (ak )] (by Theorem 2 as it becomes for continuity points a j and b j of FX j , j = 1, . . . , k) = P(a1 < X 1 ≤ b1 ) × . . . × P(ak < X k ≤ bk ). That is, P(a j < X j ≤ b j , j = 1, . . . , k) =
k .
P(a j < X j ≤ b j )
j=1
for continuity point. Since any point can be approached by a sequence of continuity points, it follows that the last relation above holds for any points (a1 , . . . , ak ) and (b1 , . . . , bk ) with a j < b j , j = 1, . . . , k. Then Exercise 9 in Chapter 10 applies and yields the desired independence of the r.v.s X 1 , . . . , X k . # 19. Indeed, if F is the d.f. corresponding to f , then n n
f (tk − tl )z k z¯l =
k=1 l=1
= =
n n k=1 l=1 n
n
d F(x) z k z¯l
ei(tk −tl )x z k z¯l d F(x)
k=1 l=1
n n
e
i(tk −tl )x
k=1 l=1
zk e
itk x
× z¯l e
−itl x
d F(x)
e107
e108
Revised Answers Manual to an Introduction
=
n
=
k=1
n
z k eitk x
zk e
itk x
n l=1 n
k=1
z¯l e−itl x
d F(x)
zl eitl x
d F(x)
l=1
2 n z k eitk x d F(x) ≥ 0, =
k=1
1
where we recall that |z| = |x + i y| = (x 2 + y 2 ) 2 . # it 20. The ch.f. of X n is : f X n (t) = ene −n . Then f Yn (t) = f X√n −n (t) = e−it n
= e−it =e
√
n(e
it √ n
it √ n
= 1+ t2
− 1) = − 2 −
it √ n
−
t2 2n
it 3
√ eitn . 6 n
n
n
it √ e n
n
−
f Xn
t √
n
−1
e
it √ √ −it n+n e n −1
Use the expansion: ei z = 1 + i z − to obtain: e
√
.
z2 i z3 i z∗ ∗ 2 − 6 e , for some complex number z , √ 3 it√ itn e (|eitn | = 1), and hence −it n + 6n n
It follows that the right-hand side above tends to t2
− 2 as n → ∞, so that f Yn (t) −→ e− 2 , which is the ch.f. of Z ∼ N (0, 1). This n→∞ completes the proof. # t2
21.
(i) Indeed, G(x) =
∞
F1 (x −∞ ∞ x
=
∞
x−y
− y)d F2 (y) = p1 (u)du d F2 (y) −∞ −∞ p1 (v − y)dv d F2 (y) (by setting y + u = v)
−∞ −∞ ∞ x
=
−∞ −∞ x ∞
p1 (v − y)dvd F2 (y)
=
−∞ −∞
p1 (v − y)d F2 (y)dv (by the Fubini Theorem).
∞ Next, observe that the function g(v) = −∞ p1 (v − y)d F2 (y) is nonnegative and ∞ ∞ ∞ g(v)dv = p1 (v − y)d F2 (y)dv −∞
−∞ −∞
Revised Answers Manual to an Introduction
=
∞
∞
p1 (v − y)dvd F2 (y) (by the Fubini Theorem) = p1 (v − y)dv d F2 (y) −∞ −∞ ∞ ∞ = p1 (w)dw d F2 (y) (by setting v − y = w) −∞ −∞ ∞ = d F2 (y) = 1. −∞ −∞ ∞ ∞
−∞
x Since also G(x) = −∞ g(v)dv, it follows that g is the density of G; i.e., ∞ g = p. Furthermore, p(u) = −∞ p1 (v − y)d F2 (y). (ii) This part is immediate, because p(v) =
∞
−∞ ∞
p1 (v − y)d F2 (y)
=
−∞
p1 (v − y) p2 (y)dy
(by the Corollary to the Fubini Theorem). # 22. We have f X −Y (t) = f X (t) f −Y (t) = f X (t) f Y (−t) = f X (t) f Y (t) = f (t) f (t) = | f (t)|2 . # 23. Recall that a r.v. is symmetric about 0 if and only if its ch.f. is real. Next, f X 1 +...+X n (t) = f X 1 (t) . . . f X n (t) (by independence of the X j s) and the righthand side is real since each of f X j (t), j = 1, . . . , n, is real. Thus, f X 1 +...+X n (t) is real and hence X 1 + . . . + X n is symmetric about 0. # 24. All limits are taken as {n} or subsequences thereof tend to ∞. Continuous convergence implies that gn (xn ) → g(x) whenever xn → x. Indeed, if g were not continuous at x, the implication would be that there is a sequence xn → x and some ε > 0 such that |g(xn )−g(x)| > ε. Again, continuous convergence implies that gn (x) → g(x) by taking {xn } = {x}. Applying this for x1 , we have that there exists m 1 such that |gm 1 (x1 ) − g(x1 )| < 2ε . Also, for x2 there exists m 2 > m 1 such that |gm 2 (x2 ) − g(x2 )| < 2ε . Continuing like this, we have that there exists {m n } ⊆ {n} such that |gm n (xn ) − g(xn )| < 2ε . Setting xn = ym n , this inequality becomes |gm n (ym n ) − g(ym n )| < 2ε . Also, |g(ym n ) − g(x)| > ε. Therefore, we have: ε < |g(ym n ) − g(x)| = |[g(ym n ) − gm n (ym n )] + [gm n (ym n ) − g(x)]| ≤ |gm n (ym n ) − g(ym n )| + |gm n (ym n ) − g(x)|. However, |gm n (ym n ) − g(x)| < 2ε . Therefore |gm n (ym n ) − g(ym n )| must be > ε 2 , and this contradicts continuous convergence. Thus, g is continuous at every x ∈ E. #
e109
e110
Revised Answers Manual to an Introduction
25.
(i) Let x, xn ∈ E with xn → x as n → ∞ here and in the remaining of the discussion. Then |gn (xn ) − g(x)| = |[gn (xn ) − g(xn )] + [g(xn ) − g(x)]| ≤ |gn (xn ) − g(xn )| + |g(xn ) − g(x)| → 0, since gn (xn )−g(xn ) → 0 by uniformity, and g(xn ) → g(x) by continuity. (ii) Continuous convergence implies continuity of g (by Exercise 24). If the convergence were not uniform, there would exist {m} ⊆ {n} and xm ∈ E such that |gm (xm ) − g(xm )| > ε for some ε > 0. By the compactness of E, there exists {xr } ⊆ {xm } with xr → x ∈ E, and |gr (xr ) − g(xr )| > ε. However, ε < |gr (xr ) − g(xr )| ≤ |gr (xr ) − g(x)| + |g(xr ) − g(x)|, and |g(xr ) − g(x)| < ε/2 by continuity of g. Therefore |gr (xr ) − g(x)| > ε/2, which contradicts the assumed continuous convergence. #
26. Expand g(X n , Yn ) around (d1 , d2 ) according to the Taylor formula by using terms involving first order derivatives to get g(X n , Yn ) = g(d1 , d2 ) + (X n − d1 Yn − d2 ) g˙ x (Zn ) g˙ y (Zn ) ,
= g(d1 , d2 ) + (X n − d1 )g˙ x (Zn ) + (Yn − d2 )g˙ y (Zn ) , where Zn is a 2-dimensional random vector lying between (d1 , d2 ) and (X n , Yn ) on the line segment determined by them. Then cn [g(X n , Yn ) − g(d1 , d2 )] = cn (X n − d1 )[g˙ x (Zn ) − g˙ x (d1 , d2 )] +cn (Yn − d2 )[g˙ y (Zn ) − g˙ y (d1 , d2 )] +g˙ x (d1 , d2 )cn (X n − d1 ) + g˙ y (d1 , d2 )cn (Yn − d2 ). d
However, as n → ∞, cn (X n − d1 ) → X , g˙ x (Zn ) − g˙ x (d1 , d2 ) → 0, and cn (Yn − d
d2 ) → 0, g˙ y (Zn ) − g˙ y (d1 , d2 ) → 0 (the latter by the assumed continuity of the partial derivatives at (d1 , d2 )), so that P
cn (X n − d1 )[g˙ x (Zn ) − g˙ x (d1 , d2 )] → 0, and P
cn (Yn − d2 )[g˙ y (Zn ) − g˙ y (d1 , d2 )] → 0.
Revised Answers Manual to an Introduction
Next, the fact that d
cn (X n − d1 , Yn − d2 ) → (X , Y ) implies that any linear combination of the two components converges in distribution to the corresponding linear combination of (X , Y ) (by the Cramér-Wold device, Theorem 4 in this chapter). Accordingly, d
g˙ x (d1 , d2 )cn (X n −d1 )+ g˙ y (d1 , d2 )cn (Yn −d2 ) → g˙ x (d1 , d2 )X + g˙ y (d1 , d2 )Y , and hence d
cn [g(X n , Yn ) − g(d1 , d2 )] → g˙ x (d1 , d2 )X + g˙ y (d1 , d2 )Y , as was to be seen. Remark: From the process of the proof, it is quite clear that the result holds if the 2-dimensional random vector (X n , Yn ) is replaced by a k-dimensional random vector. #
Chapter 12 The Central Limit Problem: The Centered Case 1. X j ∼ U (− j, j), j ≥ 1, implies E X j = 0, σ j2 = σ 2 (X j ) = n(n+1)(2n+1) . 18
j2 3 ,
so that sn2 =
Also, p j (x) = for − j ≤ x ≤ j, and 0 otherwise. Next, for n sufficiently large, n < εsn or n 2 < ε2 sn2 , or n 2 < ε2 n(n+1)(2n+1) , 18 1 2j
ε2 18 (n
+ 1)(2 + n1 ) > 1, which is correct. Thus, for j ≤ n sufficiently large, 2 (|x|≥εsn ) x p j (x)d x = 0, because p j (x) = 0 outside [− j, j] ⊂ (−εsn , εsn ). Then, for sufficiently large n,
or
n 1 x 2 p j (x)d x = 0, gn (ε) = 2 sn (|x|≥εsn ) j=1
so that the Lindeberg condition is satisfied. # 2. Here E X j = 0, σ j2 = 2 j 2α × j1β = 2 j 2α−β and sn2 = nj=2 2 j 2α−β . Then, for every ε > 0, gn (ε) = 0 if the set of js with j = 2, . . . , n and | ± j α | ≥ εsn is ; or if for all js with j = 2, . . . , n, it happens that | ± j α | < εsn or j α < εsn , α which is implied by n α < εsn , or nsn < ε, and this is equivalent to saying that nα −→ 0, sn n→∞
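As a numerical illustration (not part of the original solution), one may simulate Sn/sn for the uniform r.v.s of Exercise 1 and compare it with N(0, 1); the sample size and number of replications below are arbitrary choices.

```python
# Illustration only: with X_j ~ U(-j, j) independent, S_n / s_n should be close to N(0,1)
# for large n, as guaranteed by the Lindeberg condition verified above.
import numpy as np

rng = np.random.default_rng(3)
n, reps = 200, 20_000
j = np.arange(1, n + 1)
s_n = np.sqrt((j**2 / 3.0).sum())                    # s_n² = Σ j²/3 = n(n+1)(2n+1)/18

S = rng.uniform(-j, j, size=(reps, n)).sum(axis=1)   # replications of S_n
Z = S / s_n
print("mean ~", Z.mean(), "  var ~", Z.var(), "  P(|Z| <= 1.96) ~", (np.abs(Z) <= 1.96).mean())
```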
or
sn −→ ∞, n α n→∞
or
sn2 −→ ∞. n 2α n→∞
e111
e112
Revised Answers Manual to an Introduction
(1,1)
1
0
1 n
2 n
···
n−1 n
1= n n
However,
$\frac{s_n^2}{n^{2\alpha}} = \sum_{j=2}^n 2\left(\frac{j}{n}\right)^{2\alpha} \times \frac{1}{j^\beta} > \frac{2}{n^\beta}\sum_{j=2}^n \left(\frac{j}{n}\right)^{2\alpha}$, and $\sum_{j=2}^n \left(\frac{j}{n}\right)^{2\alpha} = n \times \sum_{j=2}^n \left(\frac{j}{n}\right)^{2\alpha} \times \frac{1}{n} = n T_n$,
where $T_n = \sum_{j=2}^n \left(\frac{j}{n}\right)^{2\alpha} \times \frac{1}{n} = \sum_{j=1}^n \left(\frac{j}{n}\right)^{2\alpha} \times \frac{1}{n} - \left(\frac{1}{n}\right)^{2\alpha} \times \frac{1}{n}$, and $\sum_{j=1}^n \left(\frac{j}{n}\right)^{2\alpha} \times \frac{1}{n}$ is a partial (Riemann) sum converging to the integral $\int_0^1 x^{2\alpha}\,dx = \frac{1}{2\alpha+1}$. That is, $T_n \underset{n\to\infty}{\longrightarrow} \frac{1}{2\alpha+1}$, so that $T_n = \frac{1}{2\alpha+1} + o(1)$ ($o(1) \to 0$ as $n \to \infty$). Therefore
$\frac{s_n^2}{n^{2\alpha}} > \frac{2}{n^\beta} \times n T_n = 2 n^{1-\beta} T_n = 2 n^{1-\beta}\left(\frac{1}{2\alpha+1} + o(1)\right) \underset{n\to\infty}{\longrightarrow} \infty$,
provided $1 - \beta > 0$ or $\beta < 1$. #
3. Here $E X_j = 0$, $\sigma_j^2 = \sigma^2(X_j) = E X_j^2 = \frac{2 j^{2\alpha}}{6 j^{2(\alpha-1)}} = \frac{1}{3}j^2$, so that $s_n^2 = \frac{n(n+1)(2n+1)}{18}$. Consider $j$'s with $j = 1, \ldots, n$ and $|\pm j^\alpha| < \varepsilon s_n$, or $j^{2\alpha} < \varepsilon^2 s_n^2$. Then $g_n(\varepsilon) = s_n^{-2}\sum_{j=1}^n \int_{(|x| \ge \varepsilon s_n)} x^2\,dF_j(x) = 0$ (since there are no values of $X_j$, other than 0, in the region of integration). Next, the inequality $j^{2\alpha} < \varepsilon^2 s_n^2$ is implied by $n^{2\alpha} < \varepsilon^2 s_n^2$, or $n^{2\alpha} < \varepsilon^2 n(n+1)(2n+1)/18$, or
$18 < \varepsilon^2 \frac{n(n+1)(2n+1)}{n^{2\alpha}} = \varepsilon^2 n^{3-2\alpha}\left(1 + \frac{1}{n}\right)\left(2 + \frac{1}{n}\right)$,
which $\underset{n\to\infty}{\longrightarrow} \infty$, provided $\alpha < \frac{3}{2}$.
So, for $j = 1, \ldots, n$ and $\alpha < \frac{3}{2}$, it follows that $|\pm j^\alpha| < \varepsilon s_n$ (for all sufficiently large $n$) and $g_n(\varepsilon) = 0$. Next, let
$|\pm j^\alpha| \ge \varepsilon s_n$, or $j^\alpha \ge \varepsilon s_n$,  (1)
and set $k = k_n = [(\varepsilon s_n)^{1/\alpha}]$ (the integer part of $(\varepsilon s_n)^{1/\alpha}$), so that $k \to \infty$ (since $s_n \underset{n\to\infty}{\longrightarrow} \infty$) and $k/(\varepsilon s_n)^{1/\alpha} \underset{n\to\infty}{\longrightarrow} 1$, or
$\frac{k^{2\alpha}}{\varepsilon^2 s_n^2} \underset{n\to\infty}{\longrightarrow} 1$.  (2)
For $j$ with $j \ge k$,
$g_n(\varepsilon) \ge \frac{1}{s_n^2}\sum_{j=k}^n \frac{j^2}{3} = \frac{1}{s_n^2}\left(\sum_{j=1}^n \frac{j^2}{3} - \sum_{j=1}^{k-1}\frac{j^2}{3}\right) = \frac{1}{s_n^2}\left(s_n^2 - \frac{k(k-1)(2k-1)}{18}\right) = 1 - \frac{k(k-1)(2k-1)}{18 s_n^2}$
$= 1 - \frac{\varepsilon^2}{18} \times \frac{k(k-1)(2k-1)}{\varepsilon^2 s_n^2} = 1 - \frac{\varepsilon^2}{18}\left(1 - \frac{1}{k}\right)\left(2 - \frac{1}{k}\right)\frac{k^3}{\varepsilon^2 s_n^2} = 1 - \frac{\varepsilon^2}{18}\left(1 - \frac{1}{k}\right)\left(2 - \frac{1}{k}\right)\frac{k^{2\alpha}}{\varepsilon^2 s_n^2} \times k^{3-2\alpha}$.  (3)
Then, by means of (2), we have that, as $n \to \infty$, the expression on the right-hand side of (3) tends to $1 - \frac{\varepsilon^2}{18} \times 1 \times 2 \times 1 \times 1 = 1 - \frac{\varepsilon^2}{9}$ if $\alpha = 3/2$, and to 1 if $\alpha > 3/2$.
In summary then, $g_n(\varepsilon) \underset{n\to\infty}{\longrightarrow} 0$ (indeed, $g_n(\varepsilon) = 0$) if $\alpha < 3/2$ (because of (1)),
and $g_n(\varepsilon) \underset{n\to\infty}{\not\longrightarrow} 0$ for $\alpha \ge 3/2$. The conclusion follows. #
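For Exercise 3 the quantity $g_n(\varepsilon)$ can be computed exactly, since each $X_j$ takes only the values $\pm j^\alpha$ and 0. The sketch below is an illustrative check only; the probabilities $P(X_j = \pm j^\alpha) = \frac{1}{6 j^{2(\alpha-1)}}$ are taken here as the ones consistent with $\sigma_j^2 = j^2/3$ above, and $\varepsilon = 1$ is used for the illustration. The output shows $g_n(1)$ eventually equal to 0 for $\alpha < 3/2$ and bounded away from 0 for $\alpha \ge 3/2$, in agreement with the solution.

    import numpy as np

    def g_n(eps, n, alpha):
        # Exact Lindeberg quantity for X_j taking the values +-j**alpha with
        # probability 1/(6 j**(2(alpha-1))) each (so E X_j**2 = j**2/3), and 0 otherwise.
        j = np.arange(1, n + 1, dtype=float)
        s2 = n * (n + 1) * (2 * n + 1) / 18.0           # s_n^2 = sum of j**2/3
        contrib = (j ** 2 / 3.0) * (j ** alpha >= eps * np.sqrt(s2))
        return contrib.sum() / s2

    for alpha in [1.0, 1.4, 1.5, 2.0]:
        print(alpha, [round(g_n(1.0, n, alpha), 4) for n in [10**2, 10**4, 10**6]])
    # For alpha < 3/2 the entries are eventually 0; for alpha >= 3/2 they stay away from 0.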
4. We have $E X_j = 0$ and $\sigma_j^2 = \sigma^2(X_j) = E X_j^2 = j^4 \times \frac{2}{12 j^2} + j^2 \times \frac{2}{12} = \frac{j^2}{6} + \frac{j^2}{6} = \frac{j^2}{3}$, so that $s_n^2 = \sum_{j=1}^n \frac{j^2}{3} = \frac{n(n+1)(2n+1)}{18}$. Next,
$g_n(\varepsilon) = \frac{1}{s_n^2}\sum_{j=1}^n \int_{(|x| \ge \varepsilon s_n)} x^2\,dF_j(x) = \frac{1}{s_n^2}\sum_{j=1}^n \left(\frac{j^4}{12 j^2} \times 2 \times I_{(j^2 \ge \varepsilon s_n)} + \frac{j^2}{12} \times 2 \times I_{(j \ge \varepsilon s_n)}\right) = \frac{1}{6 s_n^2}\sum_{j=1}^n j^2 I_{(j^2 \ge \varepsilon s_n)} + \frac{1}{6 s_n^2}\sum_{j=1}^n j^2 I_{(j \ge \varepsilon s_n)}$.
However, $\varepsilon s_n = \frac{\varepsilon}{3\sqrt{2}}[n(n+1)(2n+1)]^{1/2} \sim n^{3/2}$ (is of the order of $n^{3/2}$), so that $n < \varepsilon s_n$ (for large $n$) and $I_{(j \ge \varepsilon s_n)} = 0$, $j = 1, \ldots, n$. Thus, $\frac{1}{6 s_n^2}\sum_{j=1}^n j^2 I_{(j \ge \varepsilon s_n)} = 0$ for all sufficiently large $n$, hence $o(1)$. Next, set $k_n = k = [(\varepsilon s_n)^{1/2}]$, so that $k$ is of the order of $n^{3/4}$ ($k \sim n^{3/4}$), and
$\frac{1}{6 s_n^2}\sum_{j=1}^n j^2 I_{(j^2 \ge \varepsilon s_n)} = \frac{1}{6 s_n^2}\sum_{j=1}^n j^2 I_{(j \ge k)} = \frac{1}{6 s_n^2}\left(\sum_{j=1}^n j^2 - \sum_{j=1}^{k-1} j^2\right) = \frac{1}{6 s_n^2}\left(3 s_n^2 - \frac{k(k-1)(2k-1)}{6}\right) = \frac{1}{2} - \frac{1}{36} \times \frac{k(k-1)(2k-1)}{s_n^2}$.
At this point, observe that $k(k-1)(2k-1)$ is of the order of $n^{9/4}$ and $s_n^2$ is of the order of $n^3 = n^{12/4}$. Therefore $\frac{k(k-1)(2k-1)}{s_n^2} = o(1)$ and $g_n(\varepsilon) \underset{n\to\infty}{\longrightarrow} \frac{1}{2}$, not to 0. #
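The limit $\frac{1}{2}$ just obtained can also be checked by direct computation, since the distribution of each $X_j$ is discrete. In the sketch below (an illustrative check only; the probabilities $P(X_j = \pm j^2) = \frac{1}{12 j^2}$ and $P(X_j = \pm j) = \frac{1}{12}$ are the ones consistent with $\sigma_j^2 = j^2/3$ above, with the remaining mass at 0), $g_n(\varepsilon)$ is computed exactly and is seen to approach $\frac{1}{2}$ for two different values of $\varepsilon$.

    import numpy as np

    def g_n(eps, n):
        # Exact Lindeberg quantity for X_j with P(X_j = +-j**2) = 1/(12 j**2),
        # P(X_j = +-j) = 1/12, and P(X_j = 0) carrying the remaining mass.
        j = np.arange(1, n + 1, dtype=float)
        s2 = n * (n + 1) * (2 * n + 1) / 18.0
        s = np.sqrt(s2)
        big = (j ** 2) * (j ** 2 >= eps * s)     # contribution j**4 * 2/(12 j**2) = j**2/6
        small = (j ** 2) * (j >= eps * s)        # contribution j**2 * 2/12 = j**2/6
        return (big.sum() + small.sum()) / (6.0 * s2)

    for n in [10**2, 10**4, 10**6]:
        print(n, round(g_n(0.5, n), 4), round(g_n(2.0, n), 4))
    # Both columns approach 0.5: the values +-j never reach eps*s_n (of order n**1.5),
    # while the values +-j**2 contribute 1/2 in the limit.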
5. From $M_n = o(s_n)$, it follows that, for $\varepsilon > 0$, there exists $n_0 = n(\varepsilon)$ such that $M_n < \varepsilon s_n$, $n > n_0$. For $n > n_0$,
$\frac{S_n - E S_n}{s_n} = \frac{1}{s_n}\sum_{j=1}^{n_0}(X_j - E X_j) + \frac{1}{s_n}\sum_{j=n_0+1}^{n}(X_j - E X_j)$, with $\frac{1}{s_n}\sum_{j=1}^{n_0}(X_j - E X_j) \underset{n\to\infty}{\longrightarrow} 0$ since $s_n \underset{n\to\infty}{\longrightarrow} \infty$.
Next, with $\tau_n^2 = s_n^2 - s_{n_0}^2$,
$\frac{1}{s_n}\sum_{j=n_0+1}^{n}(X_j - E X_j) = \frac{\tau_n}{s_n}\sum_{j=n_0+1}^{n}\frac{X_j - E X_j}{\tau_n}$.
Thus, if we set $Y_{nj} = \frac{X_j - E X_j}{\tau_n}$, $j = n_0+1, \ldots, n$, then $E Y_{nj} = 0$, and
$\sum_{j=n_0+1}^{n}\sigma^2(Y_{nj}) = \frac{s_n^2 - s_{n_0}^2}{\tau_n^2} = \frac{\tau_n^2}{\tau_n^2} = 1$.
Also, for $j > n_0$, $|Y_{nj}| \le \frac{2M_j}{\tau_n} \le \frac{2M_n}{\tau_n} < \frac{2\varepsilon s_n}{\tau_n}$, $j = n_0+1, \ldots, n$. Then
$\sum_{j=n_0+1}^{n} E|Y_{nj}|^3 \le \frac{2\varepsilon s_n}{\tau_n}\sum_{j=n_0+1}^{n} E|Y_{nj}|^2 = \frac{2\varepsilon s_n}{\tau_n}\sum_{j=n_0+1}^{n}\sigma^2(Y_{nj}) = \frac{2\varepsilon s_n}{\tau_n}\times 1 = 2\varepsilon\times\frac{1}{\sqrt{1 - \frac{s_{n_0}^2}{s_n^2}}} \underset{n\to\infty}{\longrightarrow} 2\varepsilon$
(since $\frac{s_{n_0}^2}{s_n^2} \underset{n\to\infty}{\longrightarrow} 0$, because $s_n^2 \underset{n\to\infty}{\longrightarrow} \infty$). So, for every $\varepsilon > 0$, there exists $n_0 = n(\varepsilon)$ such that $\sum_{j=n_0+1}^{n} E|Y_{nj}|^3 <$ a multiple of $\varepsilon$ (for all sufficiently large $n$), and this implies that $\sum_{j=n_0+1}^{n} E|Y_{nj}|^3 \underset{n\to\infty}{\longrightarrow} 0$. Then (by Theorem 3 in this chapter) $\sum_{j=n_0+1}^{n}\frac{X_j - E X_j}{\tau_n} \underset{n\to\infty}{\Rightarrow} N(0, 1)$, and hence $\frac{S_n - E S_n}{s_n} \underset{n\to\infty}{\Rightarrow} N(0, 1)$ (since $\frac{\tau_n}{s_n} \underset{n\to\infty}{\longrightarrow} 1$). #
6. We assume that $S_n \stackrel{d}{\underset{n\to\infty}{\longrightarrow}} N(0, 1)$ and we shall show that $\sum_{j=1}^n X_{nj}^2 \stackrel{P}{\underset{n\to\infty}{\longrightarrow}} 1$. Since, under the assumptions made, $S_n \stackrel{d}{\underset{n\to\infty}{\longrightarrow}} N(0, 1)$ is equivalent to $\sum_{j=1}^n \int_{(|x| \ge \varepsilon)} x^2\,dF_{nj}(x) \underset{n\to\infty}{\longrightarrow} 0$ for every $\varepsilon > 0$, where $F_{nj}$ is the d.f. of $X_{nj}$, it suffices to assume this latter convergence. Our assumption implies:
$\sum_{j=1}^n \sigma^2\big(|X_{nj}| I_{(|X_{nj}| \ge 1)}\big) \underset{n\to\infty}{\longrightarrow} 0$ and $\sum_{j=1}^n P(|X_{nj}| \ge \varepsilon) \underset{n\to\infty}{\longrightarrow} 0$.  (1)
The first of these convergences is immediate since (for $\varepsilon = 1$):
$\sum_{j=1}^n \sigma^2\big(|X_{nj}| I_{(|X_{nj}| \ge 1)}\big) = \sum_{j=1}^n E\big[X_{nj}^2 I_{(|X_{nj}| \ge 1)}\big] - \sum_{j=1}^n \big\{E\big[|X_{nj}| I_{(|X_{nj}| \ge 1)}\big]\big\}^2 \le \sum_{j=1}^n E\big[X_{nj}^2 I_{(|X_{nj}| \ge 1)}\big] = \sum_{j=1}^n \int_{(|x| \ge 1)} x^2\,dF_{nj}(x) \underset{n\to\infty}{\longrightarrow} 0$,
and the second convergence follows thus:
$\sum_{j=1}^n P(|X_{nj}| \ge \varepsilon) = \sum_{j=1}^n \int_{(|x| \ge \varepsilon)} dF_{nj}(x) = \frac{1}{\varepsilon^2}\sum_{j=1}^n \int_{(|x| \ge \varepsilon)} \varepsilon^2\,dF_{nj}(x) \le \frac{1}{\varepsilon^2}\sum_{j=1}^n \int_{(|x| \ge \varepsilon)} x^2\,dF_{nj}(x) \underset{n\to\infty}{\longrightarrow} 0$.
At this point, set:
$Y_{nj} = X_{nj}^2$, $Z_{nj} = Y_{nj} I_{(Y_{nj} < 1)}$, $T_n = \sum_{j=1}^n Y_{nj}$, and $V_n = \sum_{j=1}^n Z_{nj}$.  (2)
Then we show that:
$\sum_{j=1}^n \sigma^2(Z_{nj}) \underset{n\to\infty}{\longrightarrow} 0$.  (3)
Indeed, for $0 < \varepsilon < 1$,
$0 \le \sum_{j=1}^n \sigma^2(Z_{nj}) = \sum_{j=1}^n E Z_{nj}^2 - \sum_{j=1}^n (E Z_{nj})^2 \le \sum_{j=1}^n E Z_{nj}^2 = \sum_{j=1}^n E\big[Y_{nj}^2 I_{(Y_{nj} < 1)}\big] = \sum_{j=1}^n E\big[Y_{nj}^2 I_{(Y_{nj} < \varepsilon)}\big] + \sum_{j=1}^n E\big[Y_{nj}^2 I_{(\varepsilon \le Y_{nj} < 1)}\big]$
$\le \varepsilon\sum_{j=1}^n E\big[Y_{nj} I_{(Y_{nj} < \varepsilon)}\big] + \sum_{j=1}^n E\,I_{(\varepsilon \le Y_{nj} < 1)} \le \varepsilon\sum_{j=1}^n E\big[Y_{nj} I_{(Y_{nj} < \varepsilon)}\big] + \sum_{j=1}^n P(Y_{nj} \ge \varepsilon)$;
i.e.,
$0 \le \sum_{j=1}^n \sigma^2(Z_{nj}) \le \varepsilon\left[1 - \sum_{j=1}^n \int_{(|x| \ge \sqrt{\varepsilon})} x^2\,dF_{nj}(x)\right] + \sum_{j=1}^n P(|X_{nj}| \ge \sqrt{\varepsilon})$.  (4)
Taking the limits in (4) (as $n \to \infty$) and utilizing our assumption and (1), we obtain $\limsup_{n\to\infty}\sum_{j=1}^n \sigma^2(Z_{nj}) \le \varepsilon$; since $0 < \varepsilon < 1$ is arbitrary, we obtain (3).
Next,
$P(|V_n - E V_n| \ge \varepsilon) = P\left(\left|\sum_{j=1}^n (Z_{nj} - E Z_{nj})\right| \ge \varepsilon\right) \le \frac{1}{\varepsilon^2}\sigma^2\left(\sum_{j=1}^n (Z_{nj} - E Z_{nj})\right) = \frac{1}{\varepsilon^2}\sum_{j=1}^n \sigma^2(Z_{nj}) \underset{n\to\infty}{\longrightarrow} 0$ by (3). That is,
$V_n - E V_n \stackrel{P}{\underset{n\to\infty}{\longrightarrow}} 0$.  (5)
However,
$E V_n = \sum_{j=1}^n E Z_{nj} = \sum_{j=1}^n E\big[Y_{nj} I_{(Y_{nj} < 1)}\big] = \sum_{j=1}^n E\big[X_{nj}^2 I_{(|X_{nj}| < 1)}\big] = 1 - \sum_{j=1}^n E\big[X_{nj}^2 I_{(|X_{nj}| \ge 1)}\big] = 1 - \sum_{j=1}^n \int_{(|x| \ge 1)} x^2\,dF_{nj}(x) \underset{n\to\infty}{\longrightarrow} 1 - 0 = 1$.
This result along with (5) then yields
$V_n \stackrel{P}{\underset{n\to\infty}{\longrightarrow}} 1$.  (6)
Finally,
$P(V_n \ne T_n) = P\left(\sum_{j=1}^n Z_{nj} \ne \sum_{j=1}^n Y_{nj}\right) \le P\left(\bigcup_{j=1}^n (Z_{nj} \ne Y_{nj})\right) \le \sum_{j=1}^n P(Z_{nj} \ne Y_{nj}) = \sum_{j=1}^n P\big(Y_{nj} I_{(Y_{nj} < 1)} \ne Y_{nj}\big) = \sum_{j=1}^n P(Y_{nj} \ge 1) = \sum_{j=1}^n P(X_{nj}^2 \ge 1) = \sum_{j=1}^n P(|X_{nj}| \ge 1) \underset{n\to\infty}{\longrightarrow} 0$, by (1).
This implies that, for every $\varepsilon > 0$, $P(|V_n - T_n| \ge \varepsilon) \le P(V_n \ne T_n) \underset{n\to\infty}{\longrightarrow} 0$, so that
$V_n - T_n \stackrel{P}{\underset{n\to\infty}{\longrightarrow}} 0$.  (7)
Relations (6) and (7) complete the proof. #
7. (i) By definition, $\Delta g(u) = g(u+h) - g(u-h)$; call it $g_1(u)$. Then
$\Delta^{(2)} g(u) = \Delta g_1(u) = g_1(u+h) - g_1(u-h) = g(u+2h) - 2g(u) + g(u-2h) = \sum_{r=0}^{2}(-1)^r\binom{2}{r} g(u + (2-2r)h)$,
so that the formula is true for $n = 2$. Next, assume it to be true for $m$; i.e., we assume:
$\Delta^{(m)} g(u) = \sum_{r=0}^{m}(-1)^r\binom{m}{r} g(u + (m-2r)h)$,
and we shall establish it for $m+1$; i.e., we shall show that:
$\Delta^{(m+1)} g(u) = \sum_{r=0}^{m+1}(-1)^r\binom{m+1}{r} g(u + ((m+1)-2r)h)$.
We have:
$\Delta^{(m+1)} g(u) = \Delta\big(\Delta^{(m)} g(u)\big) = \sum_{r=0}^{m}(-1)^r\binom{m}{r} g(u + (m-2r)h)\Big|_{u+h} - \sum_{r=0}^{m}(-1)^r\binom{m}{r} g(u + (m-2r)h)\Big|_{u-h}$
$= \sum_{r=0}^{m}(-1)^r\binom{m}{r} g(u + ((m+1)-2r)h) - \sum_{r=0}^{m}(-1)^r\binom{m}{r} g(u + ((m-1)-2r)h)$
$= g(u + (m+1)h) + \sum_{r=1}^{m}(-1)^r\binom{m}{r} g(u + (m+1-2r)h) - \sum_{r=0}^{m-1}(-1)^r\binom{m}{r} g(u + (m-1-2r)h) - (-1)^m g(u + (-m-1)h)$
$= g(u + (m+1)h) + \sum_{k=0}^{m-1}(-1)^{k+1}\binom{m}{k+1} g(u + (m+1-2k-2)h) - \sum_{r=0}^{m-1}(-1)^r\binom{m}{r} g(u + (m-1-2r)h) + (-1)^{m+1} g(u - (m+1)h)$
$= g(u + (m+1)h) - \sum_{r=0}^{m-1}(-1)^{r}\binom{m}{r+1} g(u + (m-1-2r)h) - \sum_{r=0}^{m-1}(-1)^r\binom{m}{r} g(u + (m-1-2r)h) + (-1)^{m+1} g(u - (m+1)h)$
$= g(u + (m+1)h) - \sum_{r=0}^{m-1}(-1)^{r}\left[\binom{m}{r+1} + \binom{m}{r}\right] g(u + (m-1-2r)h) + (-1)^{m+1} g(u - (m+1)h)$
$= g(u + (m+1)h) - \sum_{r=0}^{m-1}(-1)^{r}\binom{m+1}{r+1} g(u + (m-1-2r)h) + (-1)^{m+1} g(u - (m+1)h)$ $\left(\text{since } \binom{m}{r} + \binom{m}{r+1} = \binom{m+1}{r+1}\right)$
$= g(u + (m+1)h) - \sum_{r=0}^{m-1}(-1)^{r}\binom{m+1}{r+1} g(u + (m+1-2(r+1))h) + (-1)^{m+1} g(u - (m+1)h)$
$= g(u + (m+1)h) - \sum_{k=1}^{m}(-1)^{k-1}\binom{m+1}{k} g(u + (m+1-2k)h) + (-1)^{m+1} g(u - (m+1)h)$
$= g(u + (m+1)h) + \sum_{k=1}^{m}(-1)^{k}\binom{m+1}{k} g(u + (m+1-2k)h) + (-1)^{m+1} g(u - (m+1)h)$
$= \sum_{k=0}^{m+1}(-1)^{k}\binom{m+1}{k} g(u + (m+1-2k)h)$, as was to be seen.
(ii) By part (i),
$\Delta^{(2n)} f(0) = \sum_{r=0}^{2n}(-1)^r\binom{2n}{r} f(2(n-r)h)$, $h \in \Re$.  (1)
Expand $f(u+t)$ around $u$ up to terms of order $2n$ to get:
$f(u+t) = \sum_{k=0}^{2n}\frac{t^k}{k!} f^{(k)}(u) + o(t^{2n})$.  (2)
In (2), take $u = 0$ and replace $t$ by $2(n-r)h$ to obtain:
$f(2(n-r)h) = \sum_{k=0}^{2n}\frac{(2(n-r)h)^k}{k!} f^{(k)}(0) + o\big((2(n-r)h)^{2n}\big)$.  (3)
By means of (3), relation (1) becomes:
$\Delta^{(2n)} f(0) = \sum_{r=0}^{2n}(-1)^r\binom{2n}{r}\left[\sum_{k=0}^{2n}\frac{(2(n-r)h)^k}{k!} f^{(k)}(0) + o\big((2(n-r)h)^{2n}\big)\right] = \sum_{r=0}^{2n}\sum_{k=0}^{2n}(-1)^r\binom{2n}{r}\frac{(2(n-r)h)^k}{k!} f^{(k)}(0) + \sum_{r=0}^{2n}(-1)^r\binom{2n}{r} o\big((2(n-r)h)^{2n}\big)$.  (4)
The first term on the right-hand side in (4) is equal to:
$\sum_{k=0}^{2n}\frac{h^k}{k!} f^{(k)}(0)\sum_{r=0}^{2n}(-1)^r\binom{2n}{r}(2n-2r)^k = \sum_{k=0}^{2n-1}\frac{h^k}{k!} f^{(k)}(0)\sum_{r=0}^{2n}(-1)^r\binom{2n}{r}(2n-2r)^k + \frac{h^{2n}}{(2n)!} f^{(2n)}(0)\sum_{r=0}^{2n}(-1)^r\binom{2n}{r}(2n-2r)^{2n}$
$= 0 + \frac{h^{2n}}{(2n)!} f^{(2n)}(0)\times 2^{2n}(2n)!$ (according to the hint) $= (2h)^{2n} f^{(2n)}(0)$.  (5)
Next, for the second term on the right-hand side of (4), we have
$\sum_{r=0}^{2n}(-1)^r\binom{2n}{r} o\big((2(n-r)h)^{2n}\big) = \sum_{r=0}^{2n}(-1)^r\binom{2n}{r}(2(n-r)h)^{2n} o(1) \le (2h)^{2n}\times o(1)\times\sum_{r=0}^{2n}\binom{2n}{r}(n-r)^{2n} \le (2h)^{2n}\times o(1)\times n^{2n}\sum_{r=0}^{2n}\binom{2n}{r}$
$= (2h)^{2n}\times o(1)\times n^{2n}\times 2^{2n} = (2h)^{2n}\times o(1)\times(2n)^{2n} = o\big((2h)^{2n}\big)$ (since $n$ is constant).  (6)
By way of (5) and (6), relation (4) becomes: $\Delta^{(2n)} f(0) = (2h)^{2n} f^{(2n)}(0) + o\big((2h)^{2n}\big)$, and hence
$\frac{\Delta^{(2n)} f(0)}{(2h)^{2n}} = f^{(2n)}(0) + \frac{o\big((2h)^{2n}\big)}{(2h)^{2n}} = f^{(2n)}(0) + \frac{o(h^{2n})}{h^{2n}}$. #
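The conclusion of part (ii) lends itself to a direct numerical check. The sketch below is an illustration with an assumed test function, $f(t) = \cos t$, for which $f^{(2n)}(0) = (-1)^n$ is known in closed form; it builds $\Delta^{(2n)} f(0)$ from the formula of part (i) and confirms that $\Delta^{(2n)} f(0)/(2h)^{2n}$ approaches $f^{(2n)}(0)$ as $h$ decreases.

    import math

    def delta_2n_at_zero(f, n, h):
        # 2n-th symmetric difference of f at 0, from part (i):
        # sum_{r=0}^{2n} (-1)**r * C(2n, r) * f(2(n - r) h).
        return sum((-1) ** r * math.comb(2 * n, r) * f(2.0 * (n - r) * h)
                   for r in range(2 * n + 1))

    f = math.cos                        # assumed test function; f^{(2n)}(0) = (-1)**n
    for n in [1, 2, 3]:
        for h in [0.1, 0.01]:
            ratio = delta_2n_at_zero(f, n, h) / (2.0 * h) ** (2 * n)
            print(n, h, ratio, (-1) ** n)   # ratio approaches (-1)**n as h -> 0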
8. To show that, for $m = 1, 2, \ldots$,
$\sum_{r=0}^{m}(-1)^r\binom{m}{r}(m-2r)^k = \begin{cases} 0 & \text{for } k < m \\ 2^m \times m! & \text{for } k = m.\end{cases}$  (1)
Relation (1) is easily checked to be true for $m = 1, 2, 3$. Assume (1) to be true for some value of $m$ and establish it for $m+1$; i.e., we shall show that
$\sum_{r=0}^{m+1}(-1)^r\binom{m+1}{r}(m+1-2r)^k = \begin{cases} 0 & \text{for } k < m+1 \\ 2^{m+1}\times(m+1)! & \text{for } k = m+1.\end{cases}$  (2)
The proof of (2) is split into two parts. First, it is shown that
$I = \sum_{r=0}^{m+1}(-1)^r\binom{m+1}{r}(m+1-2r)^k = 0$, $k \le m$,  (3)
and then that
$\sum_{r=0}^{m+1}(-1)^r\binom{m+1}{r}(m+1-2r)^{m+1} = 2^{m+1}\times(m+1)!$  (4)
In establishing (3), we use the familiar identity
$\binom{m+1}{r} = \binom{m}{r} + \binom{m}{r-1}$,
so that
$I = \sum_{r=0}^{m+1}(-1)^r\binom{m}{r}(m+1-2r)^k + \sum_{r=0}^{m+1}(-1)^r\binom{m}{r-1}(m+1-2r)^k = \sum_{r=0}^{m}(-1)^r\binom{m}{r}(m+1-2r)^k + \sum_{r=1}^{m+1}(-1)^r\binom{m}{r-1}(m+1-2r)^k$
$= \sum_{r=0}^{m}(-1)^r\binom{m}{r}(m+1-2r)^k + \sum_{l=0}^{m}(-1)^{l+1}\binom{m}{l}(m-1-2l)^k = \sum_{r=0}^{m}(-1)^r\binom{m}{r}(m+1-2r)^k - \sum_{r=0}^{m}(-1)^{r}\binom{m}{r}(m-1-2r)^k \stackrel{def}{=} I_1 - I_2$.
Now, $(m+1-2r)^k = [(m-2r)+1]^k = \sum_{l=0}^{k}\binom{k}{l}(m-2r)^l$, so that
$I_1 = \sum_{r=0}^{m}(-1)^r\binom{m}{r}\sum_{l=0}^{k}\binom{k}{l}(m-2r)^l = \sum_{l=0}^{k}\binom{k}{l}\sum_{r=0}^{m}(-1)^r\binom{m}{r}(m-2r)^l = \sum_{l=0}^{k-1}\binom{k}{l}\sum_{r=0}^{m}(-1)^r\binom{m}{r}(m-2r)^l + \sum_{r=0}^{m}(-1)^r\binom{m}{r}(m-2r)^k$.  (5)
For $k < m$, both terms in (5) are equal to 0, by (1), whereas for $k = m$, the first term in (5) is equal to 0, and the second term equals $2^m \times m!$, again by (1). Thus,
$I_1 = \begin{cases} 0 & \text{for } k < m \\ 2^m \times m! & \text{for } k = m.\end{cases}$  (6)
Next,
$(m-1-2r)^k = [(m-2r)-1]^k = \sum_{l=0}^{k}\binom{k}{l}(m-2r)^l(-1)^{k-l}$,
so that
$I_2 = \sum_{r=0}^{m}(-1)^r\binom{m}{r}\sum_{l=0}^{k}\binom{k}{l}(m-2r)^l(-1)^{k-l} = \sum_{l=0}^{k}(-1)^{k-l}\binom{k}{l}\sum_{r=0}^{m}(-1)^r\binom{m}{r}(m-2r)^l = \sum_{l=0}^{k-1}(-1)^{k-l}\binom{k}{l}\sum_{r=0}^{m}(-1)^r\binom{m}{r}(m-2r)^l + \sum_{r=0}^{m}(-1)^r\binom{m}{r}(m-2r)^k$.  (7)
As in (5), $\sum_{r=0}^{m}(-1)^r\binom{m}{r}(m-2r)^l = 0$ for $l = 0, 1, \ldots, k-1$, so that the first term in (7) is 0, and the second term is 0 for $k < m$ and $2^m \times m!$ for $k = m$. Therefore
$I_2 = \begin{cases} 0 & \text{for } k < m \\ 2^m \times m! & \text{for } k = m,\end{cases}$  (8)
so that, by (6) and (8), $I = I_1 - I_2 = 0$, $k \le m$, which is (3).
Next, to establish (4). To this end,
$I^* = \sum_{r=0}^{m+1}(-1)^r\binom{m+1}{r}(m+1-2r)^{m+1} = \sum_{r=0}^{m+1}(-1)^r\binom{m+1}{r}(m+1-2r)^{m}[(m+1)-2r]$
$= (m+1)\sum_{r=0}^{m+1}(-1)^r\binom{m+1}{r}(m+1-2r)^{m} - 2\sum_{r=0}^{m+1}(-1)^r r\binom{m+1}{r}(m+1-2r)^{m} \stackrel{def}{=} (m+1)I_3 - 2I_4$.  (9)
However,
$I_4 = \sum_{r=0}^{m+1}(-1)^r r\binom{m+1}{r}(m+1-2r)^{m} = (m+1)\sum_{r=0}^{m+1}(-1)^r\binom{m}{r-1}(m+1-2r)^{m}$ $\left(\text{since } r\binom{m+1}{r} = (m+1)\binom{m}{r-1}\right)$
$= (m+1)\sum_{r=1}^{m+1}(-1)^r\binom{m}{r-1}(m+1-2r)^{m} = (m+1)\sum_{l=0}^{m}(-1)^{l+1}\binom{m}{l}(m-1-2l)^{m} = -(m+1)\sum_{r=0}^{m}(-1)^{r}\binom{m}{r}(m-1-2r)^{m}$
$= -(m+1)\sum_{r=0}^{m}(-1)^{r}\binom{m}{r}[(m-2r)-1]^{m} = -(m+1)\sum_{r=0}^{m}(-1)^{r}\binom{m}{r}\sum_{l=0}^{m}\binom{m}{l}(m-2r)^l(-1)^{m-l} = -(m+1)\sum_{l=0}^{m}(-1)^{m-l}\binom{m}{l}\sum_{r=0}^{m}(-1)^{r}\binom{m}{r}(m-2r)^l$
$= -(m+1)\left[\sum_{l=0}^{m-1}(-1)^{m-l}\binom{m}{l}\sum_{r=0}^{m}(-1)^{r}\binom{m}{r}(m-2r)^l + \sum_{r=0}^{m}(-1)^{r}\binom{m}{r}(m-2r)^m\right] = -(m+1)(2^m \times m!)$,
and therefore $-2I_4 = 2(m+1)(2^m \times m!) = 2^{m+1}\times(m+1)!$  (10)
Next,
$I_3 = \sum_{r=0}^{m+1}(-1)^{r}\binom{m+1}{r}(m+1-2r)^{m} = \sum_{r=0}^{m+1}(-1)^{r}\left[\binom{m}{r} + \binom{m}{r-1}\right](m+1-2r)^{m} = \sum_{r=0}^{m}(-1)^{r}\binom{m}{r}(m+1-2r)^{m} + \sum_{r=1}^{m+1}(-1)^{r}\binom{m}{r-1}(m+1-2r)^{m} \stackrel{def}{=} I_5 + I_6$.
But
$I_5 = \sum_{r=0}^{m}(-1)^{r}\binom{m}{r}[(m-2r)+1]^{m} = \sum_{r=0}^{m}(-1)^{r}\binom{m}{r}\sum_{l=0}^{m}\binom{m}{l}(m-2r)^l = \sum_{l=0}^{m}\binom{m}{l}\sum_{r=0}^{m}(-1)^{r}\binom{m}{r}(m-2r)^l$
$= \sum_{l=0}^{m-1}\binom{m}{l}\sum_{r=0}^{m}(-1)^{r}\binom{m}{r}(m-2r)^l + \sum_{r=0}^{m}(-1)^{r}\binom{m}{r}(m-2r)^m = 2^m \times m!$,
and
$I_6 = \sum_{r=1}^{m+1}(-1)^{r}\binom{m}{r-1}(m+1-2r)^{m} = \sum_{l=0}^{m}(-1)^{l+1}\binom{m}{l}(m-1-2l)^{m} = -\sum_{r=0}^{m}(-1)^{r}\binom{m}{r}(m-1-2r)^{m} = -\sum_{r=0}^{m}(-1)^{r}\binom{m}{r}[(m-2r)-1]^{m}$
$= -\sum_{r=0}^{m}(-1)^{r}\binom{m}{r}\sum_{l=0}^{m}\binom{m}{l}(m-2r)^l(-1)^{m-l} = -\sum_{l=0}^{m}(-1)^{m-l}\binom{m}{l}\sum_{r=0}^{m}(-1)^{r}\binom{m}{r}(m-2r)^l$
$= -\left[\sum_{l=0}^{m-1}(-1)^{m-l}\binom{m}{l}\sum_{r=0}^{m}(-1)^{r}\binom{m}{r}(m-2r)^l + \sum_{r=0}^{m}(-1)^{r}\binom{m}{r}(m-2r)^m\right] = -2^m \times m!$,
so that $I_3 = I_5 + I_6 = 0$. From this result and (10), relation (9) yields $I^* = 2^{m+1}\times(m+1)!$, which is (4). The proof is completed. #
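The identity of Exercise 8 can also be verified for small $m$ by brute force. The following sketch is a plain numerical check, not part of the proof; it evaluates $\sum_{r=0}^{m}(-1)^r\binom{m}{r}(m-2r)^k$ and compares it with 0 for $k < m$ and with $2^m\,m!$ for $k = m$.

    import math

    def S(m, k):
        # sum_{r=0}^{m} (-1)**r * C(m, r) * (m - 2r)**k
        return sum((-1) ** r * math.comb(m, r) * (m - 2 * r) ** k for r in range(m + 1))

    for m in range(1, 7):
        assert all(S(m, k) == 0 for k in range(m))          # k < m gives 0
        assert S(m, m) == 2 ** m * math.factorial(m)        # k = m gives 2**m * m!
    print("identity verified for m = 1, ..., 6")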
9. We have
$P\left(\max_{1\le j\le k_n}|X_{nj}| \ge \varepsilon\right) = 1 - P\left(\max_{1\le j\le k_n}|X_{nj}| < \varepsilon\right) = 1 - P\left[\bigcap_{j=1}^{k_n}(|X_{nj}| < \varepsilon)\right] = 1 - \prod_{j=1}^{k_n} P(|X_{nj}| < \varepsilon)$ (by rowwise independence)
$= 1 - \prod_{j=1}^{k_n}\left[1 - P(|X_{nj}| \ge \varepsilon)\right] = 1 - \prod_{j=1}^{k_n}\left[1 - \int_{(|x|\ge\varepsilon)} dF_{nj}(x)\right]$.
It follows that
$P\left(\max_{1\le j\le k_n}|X_{nj}| \ge \varepsilon\right) \underset{n\to\infty}{\longrightarrow} 0$  (1)
if and only if
$\prod_{j=1}^{k_n}\left[1 - \int_{(|x|\ge\varepsilon)} dF_{nj}(x)\right] \underset{n\to\infty}{\longrightarrow} 1$.  (2)
The left-hand side inequality in Exercise 3 of Chapter 10 is equivalent to
$\prod_{j=1}^{n}(1 - x_j) \le \exp\left(-\sum_{j=1}^{n} x_j\right)$,
whereas the right-hand side inequality in the same exercise is equivalent to
$1 - \sum_{j=1}^{n} x_j \le \prod_{j=1}^{n}(1 - x_j)$.
Combining these two inequalities, we obtain
$1 - \sum_{j=1}^{n} x_j \le \prod_{j=1}^{n}(1 - x_j) \le \exp\left(-\sum_{j=1}^{n} x_j\right)$, $0 \le x_j \le 1$.
Taking $x_j = \int_{(|x|\ge\varepsilon)} dF_{nj}(x)$, $j = 1, \ldots, k_n$, we get
$1 - \sum_{j=1}^{k_n}\int_{(|x|\ge\varepsilon)} dF_{nj}(x) \le \prod_{j=1}^{k_n}\left[1 - \int_{(|x|\ge\varepsilon)} dF_{nj}(x)\right] \le \exp\left[-\sum_{j=1}^{k_n}\int_{(|x|\ge\varepsilon)} dF_{nj}(x)\right]\ (\le 1)$.  (3)
Then, by means of (3), relation (2) is equivalent to (4) below:
$\sum_{j=1}^{k_n}\int_{(|x|\ge\varepsilon)} dF_{nj}(x) \underset{n\to\infty}{\longrightarrow} 0$.  (4)
Therefore (1) is equivalent to (4). Next,
$(0 \le)\ \max_{1\le j\le k_n} P(|X_{nj}| \ge \varepsilon) = \max_{1\le j\le k_n}\int_{(|x|\ge\varepsilon)} dF_{nj}(x) \le \sum_{j=1}^{k_n}\int_{(|x|\ge\varepsilon)} dF_{nj}(x) \underset{n\to\infty}{\longrightarrow} 0$ (by (4)).
This completes the proof. #
10. Indeed, for a complex number $z$, we have $z = |z|e^{i\theta}$ with $-\pi < \theta \le \pi$, so that $z^2 = |z|^2 e^{2i\theta}$ and $|z^2| = |z|^2|e^{2i\theta}| = |z|^2 \times 1 = |z|^2$. #
11. The first difference of $f$ at $u$ is $\Delta^{(1)} f(u) = f(u+h) - f(u-h)$, and its second difference is:
$\Delta^{(2)} f(u) = \Delta^{(1)} f(u+h) - \Delta^{(1)} f(u-h) = f(u+2h) - 2f(u) + f(u-2h)$,
which, for $u = 0$, becomes:
$\Delta^{(2)} f(0) = f(2h) - 2f(0) + f(-2h) = \int e^{i(2h)x}\,dF(x) - 2\int e^{i(0)x}\,dF(x) + \int e^{i(-2h)x}\,dF(x) = \int\big(e^{ihx} - e^{-ihx}\big)^2\,dF(x) = \int[2i\sin(hx)]^2\,dF(x) = -4\int\sin^2(hx)\,dF(x)$.
Hence
$\frac{\Delta^{(2)} f(0)}{(2h)^2} = -\int\left(\frac{\sin(hx)}{hx}\right)^2 x^2\,dF(x)$, and $f''(0) = \lim_{h\to 0}\frac{\Delta^{(2)} f(0)}{(2h)^2} = -\lim_{h\to 0}\int\left(\frac{\sin(hx)}{hx}\right)^2 x^2\,dF(x)$.
However, $0 \le \left(\frac{\sin(hx)}{hx}\right)^2$, and therefore, by the Fatou-Lebesgue Theorem, as $h \to 0$,
$\lim\int\left(\frac{\sin(hx)}{hx}\right)^2 x^2\,dF(x) = \liminf\int\left(\frac{\sin(hx)}{hx}\right)^2 x^2\,dF(x) \ge \int\liminf\left(\frac{\sin(hx)}{hx}\right)^2 x^2\,dF(x) = \int x^2\,dF(x)$,
since $\lim_{h\to 0}\frac{\sin(hx)}{hx} = 1$. Thus,
$E X^2 = \int x^2\,dF(x) \le -f''(0) < \infty$. #
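The bound $E X^2 \le -f''(0)$, obtained here via the second symmetric difference of the characteristic function, can be illustrated numerically. The sketch below uses an assumed example distribution, $X \sim U(-1, 1)$, whose ch.f. $f(t) = \sin t / t$ is known in closed form; the quantity $-\Delta^{(2)} f(0)/(2h)^2$ is computed for decreasing $h$ and approaches $E X^2 = 1/3$.

    import math

    def f(t):
        # ch.f. of the assumed example X ~ U(-1, 1): f(t) = sin(t)/t, with f(0) = 1
        return 1.0 if t == 0 else math.sin(t) / t

    for h in [0.5, 0.1, 0.01, 0.001]:
        second_diff = f(2 * h) - 2 * f(0) + f(-2 * h)      # Delta^(2) f(0)
        print(h, -second_diff / (2 * h) ** 2)              # approaches E X**2 = 1/3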
Chapter 13 The Central Limit Problem: The Noncentered Case
1. We have $\sum_{j=1}^{k_n}\int_{(|x|\ge\varepsilon s_n)} x^2\,d\bar F_{nj}(x) \underset{n\to\infty}{\longrightarrow} 0$ if and only if $\frac{1}{s_n^2}\sum_{j=1}^{k_n}\int_{(|x|\ge\varepsilon s_n)} x^2\,d\bar F_{nj}(x) \underset{n\to\infty}{\longrightarrow} 0$ (since $s_n^2 \underset{n\to\infty}{\longrightarrow} \sigma^2$), and by Theorem 4 in Chapter 12, this happens if and only if $\frac{\bar S_n}{s_n} \underset{n\to\infty}{\Rightarrow} N(0, 1)$ (and $\max_{1\le j\le k_n}\frac{\sigma_{nj}^2}{s_n^2} \underset{n\to\infty}{\longrightarrow} 0$) (since $E\bar X_{nj} = 0$, $\sigma^2(\bar X_{nj}) = \sigma_{nj}^2$). But $\frac{\bar S_n}{s_n} \underset{n\to\infty}{\Rightarrow} N(0, 1)$ implies $\bar S_n \underset{n\to\infty}{\Rightarrow} N(0, \sigma^2)$, and then $\bar S_n + \alpha_n \underset{n\to\infty}{\Rightarrow} N(\mu, \sigma^2)$, where $\alpha_n = \sum_{j=1}^{k_n}\alpha_{nj} \underset{n\to\infty}{\longrightarrow} \mu$. Also, $\bar S_n + \alpha_n = \sum_{j=1}^{k_n} X_{nj} = S_n$. Thus, for every $\varepsilon > 0$, $\sum_{j=1}^{k_n}\int_{(|x|\ge\varepsilon s_n)} x^2\,d\bar F_{nj}(x) \underset{n\to\infty}{\longrightarrow} 0$ implies $S_n \underset{n\to\infty}{\Rightarrow} N(\mu, \sigma^2)$. #
2. See Exercise 9 in Chapter 8. #
Chapter 14 Topics from Sequences of Independent Variables
1. By Exercise 6(i) in Chapter 11, $E X$ does not exist. Hence, by Theorem 9 in this chapter, $\{\frac{S_n}{n}\}$ is unbounded with probability 1. This implies that, as $n \to \infty$, $\{\frac{S_n}{n}\}$ cannot converge in probability to 0. #
2. By Lemma 1, $\sum_{n=1}^{\infty} P(|X| \ge n) \le E|X| \le 1 + \sum_{n=1}^{\infty} P(|X| \ge n)$. For some fixed $c > 0$, consider the r.v. $Y = X/c$. Then
$\sum_{n=1}^{\infty} P(|Y| \ge n) \le E|Y| \le 1 + \sum_{n=1}^{\infty} P(|Y| \ge n)$, or $\sum_{n=1}^{\infty} P(|X| \ge nc) \le \frac{1}{c}E|X| \le 1 + \sum_{n=1}^{\infty} P(|X| \ge nc)$.
Thus, $E|X| < \infty$ if and only if $\sum_{n=1}^{\infty} P(|X| \ge nc) < \infty$. #
3. Since $X$ is $A_n = \sigma(X_n, X_{n+1}, \ldots)$-measurable, it is measurable with respect to the tail $\sigma$-field, $T$, induced by the $X_n$'s. However, by Theorem 10, $T$ is equivalent to $\{\emptyset, \Omega\}$. Hence $X$ is $\{\emptyset, \Omega\}$-measurable and therefore $P(X = c) = 1$ for some (finite) constant $c$. (See also the discussion in Exercise 11 in Chapter 10.) #
4. By setting $Y_n = |X_n|$, we will show that $\sum_{n=1}^{\infty} Y_n$ converges a.s. This is done by means of Theorem 11 (Three Series Criterion). To this end, for some fixed $c > 0$, $P(Y_n \ge c) \le \frac{1}{c}\,\ldots$; define $Y_{nc}$ by: $Y_{nc} = Y_n I_{(Y_n < c)}\,\ldots$
5. $\ldots = \bigcap_{n=1}^{\infty}\bigcup_{\nu=n}^{\infty} A_{\nu} = \limsup_{n\to\infty} A_n$. The result follows. #
6. If the events $A_n$, $n \ge 1$, are independent and $P(\limsup_{n\to\infty} A_n) = 0$, then $\sum_{n=1}^{\infty} P(A_n) < \infty$. This is so because, if $\sum_{n=1}^{\infty} P(A_n) = \infty$, then $P(\limsup_{n\to\infty} A_n) = 1$, by Theorem 4(ii), a contradiction. #
7. (a) Observe that, for any $\omega \in (0, 1)$, $X_n(\omega) = 0$ for all sufficiently large $n$ ($n \ge n(\omega)$, say). Then $X_n \underset{n\to\infty}{\longrightarrow} 0$ pointwise, and hence $X_n \underset{n\to\infty}{\longrightarrow} 0$ a.s. Then, by Exercise 5, $P(\limsup_{n\to\infty} A_n) = 0$, where $A_n = \left(|X_n| \ge \frac{1}{k}\right)$ for each arbitrary but fixed $k = 1, 2, \ldots$. However, $P\left(|X_n| \ge \frac{1}{k}\right) = P\left(X_n \ge \frac{1}{k}\right) = P(X_n = 1) = 1/n$, so that $\sum_{n=1}^{\infty} P(A_n) = \sum_{n=1}^{\infty} 1/n = \infty$.
(b) As in (a), $X_n \underset{n\to\infty}{\longrightarrow} 0$ a.s., and $P(\limsup_{n\to\infty} A_n) = 0$. Also, $P\left(|X_n| \ge \frac{1}{k}\right) = P\left(X_n \ge \frac{1}{k}\right) = 1/n^2$, so that $\sum_{n=1}^{\infty} P(A_n) = \sum_{n=1}^{\infty} 1/n^2 < \infty$.
Remark: It is to be observed that, both in (a) and (b), the r.v.s (and hence the events) are dependent, since they are ordered: $X_n > X_{n+1}$, $n \ge 1$. #
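The point of Exercise 7, namely that $\sum P(A_n) = \infty$ is compatible with $P(\limsup A_n) = 0$ when the events are dependent, can be seen concretely. The sketch below uses one assumed realization consistent with part (a): $\Omega = (0, 1)$ with Lebesgue measure and $X_n = I_{(0, 1/n)}$, so that $P(X_n = 1) = 1/n$. For each sampled $\omega$, only finitely many $A_n$ occur, even though the series of probabilities diverges.

    import random

    random.seed(1)
    # Assumed concrete version of part (a): Omega = (0, 1) with Lebesgue measure and
    # X_n = indicator of (0, 1/n), so P(X_n = 1) = 1/n and X_n(omega) = 0 eventually.
    for _ in range(5):
        omega = random.random()
        last = max(n for n in range(1, 10**5) if omega < 1.0 / n)   # largest n with X_n(omega) = 1
        print(round(omega, 4), "-> A_n occurs only for n <=", last)

    N = 10**5
    print("sum_{n<=N} P(A_n) =", sum(1.0 / n for n in range(1, N + 1)), "(unbounded in N)")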
Chapter 15 Topics from Ergodic Theory
1. For measurability of $T$, it suffices to show that $T^{-1}((u, v]) \in \mathcal{B}_{(0,1]}$ for $0 < u < v \le 1$. This is so, by Exercise 6 in Chapter 1. That this is true is seen as follows: Let $0 < u < v \le \frac{1}{2}$. Then $u < x - \frac{1}{2} \le v$ or $u + \frac{1}{2} < x \le v + \frac{1}{2}$, so that $T^{-1}((u, v]) = (u + \frac{1}{2}, v + \frac{1}{2}]$. Next, let $\frac{1}{2} < u < v \le 1$. Then $u < x + \frac{1}{2} \le v$ or $u - \frac{1}{2} < x \le v - \frac{1}{2}$, so that $T^{-1}((u, v]) = (u - \frac{1}{2}, v - \frac{1}{2}]$. Finally, let $0 < u \le \frac{1}{2} < v \le 1$. Then $(u, v] = (u, \frac{1}{2}] + (\frac{1}{2}, v]$ (assuming, of course, that $v > \frac{1}{2}$), and $u < x - \frac{1}{2} \le \frac{1}{2}$ or $u + \frac{1}{2} < x \le 1$, and $\frac{1}{2} < x + \frac{1}{2} \le v$ or $0 < x \le v - \frac{1}{2}$, so that $T^{-1}((u, v]) = (u + \frac{1}{2}, 1] + (0, v - \frac{1}{2}] = (0, v - \frac{1}{2}] + (u + \frac{1}{2}, 1]$. In all three cases, $T^{-1}((u, v]) \in \mathcal{B}$, so that $T$ is measurable.
Next, to show that $T$ is measure-preserving. Let $\mathcal{C}$ be the field as described in Exercise 7(i) of Chapter 1. Then by Exercise 8 in the same chapter, $\sigma(\mathcal{C}_{(0,1]}) = \mathcal{B}_{(0,1]}$. Thus, by Proposition 5 in this chapter, it suffices to show the measure-preserving property of $T$ for members of $\mathcal{C}_{(0,1]}$, or just for intervals $(u, v]$ with $0 < u < v \le 1$. That $\lambda((u, v]) = v - u = \lambda(T^{-1}((u, v]))$ is immediate, because in all three possible cases considered above, we have: $(v + \frac{1}{2}) - (u + \frac{1}{2}) = v - u$, $(v - \frac{1}{2}) - (u - \frac{1}{2}) = v - u$, and $[(v - \frac{1}{2}) - 0] + [1 - (u + \frac{1}{2})] = v - \frac{1}{2} + 1 - u - \frac{1}{2} = v - u$. #
2. For every $0 < x \le 1$, clearly, $T^{-1}((c^2 x, cx]) = (cx, x]$, and if a probability measure $P$ is measure-preserving, then $P((c^2 x, cx]) = P((cx, x])$. Also, for $n \ge 2$, $T^{-1}((c^n x, c^{n-1}x]) = (c^{n-1}x, c^{n-2}x]$ and $P((c^n x, c^{n-1}x]) = P((c^{n-1}x, c^{n-2}x]) = \ldots = P((cx, x])$. But $(c^n x, c^{n-1}x] \downarrow (0, 0] = \emptyset$, as $n \to \infty$, so that $P((c^n x, c^{n-1}x]) \to 0$, and hence $P((cx, x]) = 0$. However, $\{x\} \subset (cx, x]$, and hence $P(\{x\}) = 0$. Thus, if $P$ is measure-preserving, then $P(\{x\}) = 0$ for every $x$ with $0 < x \le 1$. #
3. By Definition 8, $T$ is ergodic if and only if every invariant set $A$ has $\lambda$-probability either 0 or 1. Consider, e.g., the set $A = A_1 + A_2$, where $A_1 = (\frac{2}{6}, \frac{3}{6}]$ and $A_2 = (\frac{5}{6}, \frac{6}{6}]$. Then $T^{-1}(A_1) = A_2$ and $T^{-1}(A_2) = A_1$, so that $T^{-1}(A) = A$, and hence $A$ is invariant. However, $\lambda(A) = \frac{2}{6} = \frac{1}{3}$, different from 0 or 1. Hence $T$ is not ergodic. #
4. At the start of the proof of the inverse of Proposition 16, it was seen that $X(T\omega) = X(\omega)$, for all $\omega \in \Omega$, when $X = I_A$, $A \in \mathcal{J}$. Next, let $X = \sum_{j=1}^{r}\alpha_j I_{A_j}$, where $\{A_j;\ j = 1, \ldots, r\}$ is a $\mathcal{J}$-measurable partition of $\Omega$. Then $X(T\omega) = \sum_{j=1}^{r}\alpha_j I_{A_j}(T\omega) = \sum_{j=1}^{r}\alpha_j I_{A_j}(\omega) = X(\omega)$, since $I_{A_j}(T\omega) = I_{A_j}(\omega)$, $j = 1, \ldots, r$, by the previous step. Now, let $0 \le X_n$ simple r.v.s $\underset{n\to\infty}{\uparrow} X$. Then $X_n = \sum_{j=1}^{r_n}\alpha_{nj} I_{A_{nj}}$, where, for each $n$, $\{A_{nj},\ j = 1, \ldots, r_n\}$ is a $\mathcal{J}$-measurable partition of $\Omega$, and $0 \le X_n(T\omega) \underset{n\to\infty}{\uparrow} X(T\omega)$. However, $X_n(T\omega) = X_n(\omega)$, $n \ge 1$, by the previous step. Then $X(T\omega) = X(\omega)$. Finally, for any $\mathcal{J}$-measurable r.v. $X$, we have $X(T\omega) = X^+(T\omega) - X^-(T\omega) = X^+(\omega) - X^-(\omega)$, by the previous step, and this is equal to $X(\omega)$. #
5. At the start of the proof of Lemma 2, it was seen that $E X(\omega) = E X(T\omega)$ when $X = I_A$, $A \in \mathcal{A}$. Next, let $X = \sum_{j=1}^{r}\alpha_j I_{A_j}$, where $\{A_j;\ j = 1, \ldots, r\}$ is a (measurable) partition of $\Omega$. Then $E I_{A_j}(T\omega) = E I_{A_j}(\omega)$, $j = 1, \ldots, r$, by the previous step, and $\sum_{j=1}^{r}\alpha_j E I_{A_j}(\omega) = \sum_{j=1}^{r}\alpha_j E I_{A_j}(T\omega)$, or $E X(\omega) = E X(T\omega)$. Now, let $0 \le X_n$ simple r.v.s $\underset{n\to\infty}{\uparrow} X$, a nonnegative r.v. That is, as $n \to \infty$, $X_n(\omega) \to X(\omega)$ and $X_n(T\omega) \to X(T\omega)$. However, $E X_n(\omega) = E X_n(T\omega)$, $n \ge 1$, by the previous step, and hence $E X(\omega) = E X(T\omega)$ (by the Lebesgue Monotone Convergence Theorem). Finally, for any r.v. $X$ for which $E X$ exists, we have $E X(\omega) = E X^+(\omega) - E X^-(\omega) = E X^+(T\omega) - E X^-(T\omega)$, by the previous step, and this is equal to $E X(T\omega)$. #
6. Let $\mathcal{J}$ be the class of sets $A \in \mathcal{A}$ for which $T^{-1}A = A$. Then $T^{-1}\Omega = \Omega$, so that $\Omega \in \mathcal{J}$. Next, for $A \in \mathcal{J}$, we have $T^{-1}A^c = (T^{-1}A)^c = A^c$, so that $A^c \in \mathcal{J}$. Finally, for $A_n \in \mathcal{J}$, $n = 1, 2, \ldots$, we have $T^{-1}(\bigcup_{n=1}^{\infty}A_n) = \bigcup_{n=1}^{\infty}T^{-1}A_n = \bigcup_{n=1}^{\infty}A_n$, so that $\bigcup_{n=1}^{\infty}A_n \in \mathcal{J}$. It follows that $\mathcal{J}$ is, indeed, a $\sigma$-field. #
7. (i) $Z_2$ is a function of $Z_1$; i.e., if $\omega, \omega' \in \Omega$, with $\omega \ne \omega'$ and $Z_1(\omega) = Z_1(\omega')$, then $Z_2(\omega) = Z_2(\omega')$. Set $Z_2(\omega) = \omega_2$, $Z_2(\omega') = \omega_2'$ and suppose that $\omega_2 \ne \omega_2'$. By assumption, $\{\omega_2\}, \{\omega_2'\} \in \mathcal{A}_2$, and let $C = Z_2^{-1}(\{\omega_2\})$, $C' = Z_2^{-1}(\{\omega_2'\})$. Then $C, C' \in \mathcal{A}_2$ and $C \cap C' = \emptyset$ since $\omega_2 \ne \omega_2'$. By assumption, $\mathcal{A}_2 \subseteq \mathcal{A}_1$, so that $C, C' \in \mathcal{A}_1$. Then there exist $B, B'$ such that $C = Z_1^{-1}(B)$ and $C' = Z_1^{-1}(B')$, and $B \cap B' = \emptyset$ since $C \cap C' = \emptyset$. Next, $\omega \in C$, $\omega' \in C'$, $Z_1(\omega) = Z_1(\omega')$, and $Z_1(\omega) \in B$, $Z_1(\omega') \in B'$. Then $B \cap B' \ne \emptyset$, a contradiction. Thus, $Z_2(\omega) = Z_2(\omega')$.
(ii) On $Z_1(\Omega)$, define $Z$ by: $Z(\omega_1) = Z_2(\omega;\ Z_1(\omega) = \omega_1)$, so that $Z_2(\omega) = Z(Z_1(\omega))$ or $Z_2 = Z(Z_1)$. (That is, for $\omega_1 \in Z_1(\Omega)$, consider $B = \{\omega \in \Omega;\ Z_1(\omega) = \omega_1\}$. Since for any $\omega, \omega' \in B$ with $\omega \ne \omega'$ we have $Z_1(\omega) = Z_1(\omega')\ (= \omega_1)$, it follows, by part (i), that $Z_2(\omega) = Z_2(\omega')$, so that $Z$, as defined above, is well defined.) To show that it is unique, let $Z': Z_1(\Omega) \to \Omega_2$ be another mapping such that $Z_2 = Z'(Z_1)$. To show $Z(\omega_1) = Z'(\omega_1)$ for every $\omega_1 \in Z_1(\Omega)$. For $\omega_1 \in Z_1(\Omega)$, we have $Z_2(\omega) = Z(Z_1(\omega)) = Z(\omega_1)$ for some $\omega \in \Omega$; also, $Z_2(\omega) = Z'(Z_1(\omega)) = Z'(\omega_1)$. Hence $Z(\omega_1) = Z'(\omega_1)$, as was to be seen.
(iii) To show that $Z$ of part (ii) is $\mathcal{A}_1 \cap Z_1(\Omega)$-measurable. Let $D \in \mathcal{A}_2$, and set $Z_2^{-1}(D) \stackrel{def}{=} A$, which is in $\mathcal{A}_2$. Also, $Z_2^{-1}(D) = [Z(Z_1)]^{-1}(D) = Z_1^{-1}[Z^{-1}(D)] = Z_1^{-1}(B)$, where we have set $B = Z^{-1}(D)$. So, $A = Z_1^{-1}(B)$ and $B \subseteq Z_1(\Omega)$ (because $Z: Z_1(\Omega) \to \Omega_2$). Since $A \in \mathcal{A}_2$, it follows (by assumption) that $A \in \mathcal{A}_1$. Then there exists $C \in \mathcal{A}_1$ such that $A = Z_1^{-1}(C)$. So, $A = Z_1^{-1}(C)$ with $C \in \mathcal{A}_1$ and $A = Z_1^{-1}(B)$ with $B \subseteq Z_1(\Omega)$. Therefore $B = C \cap Z_1(\Omega)$. That is, $Z$ is $\mathcal{A}_1 \cap Z_1(\Omega)$-measurable. #
8. (i) In the first place, $\mathbf{X}$ is stationary, by Proposition 3. Next, let $A$ be invariant (relative to $\mathbf{X}$) (see Definition 11). Then $A = (X_n, X_{n+1}, \ldots)^{-1}B$ for some $B \in \mathcal{B}^{\infty}$ and all $n \ge 1$. It follows that $A \in \sigma(X_n, X_{n+1}, \ldots)$ for all $n \ge 1$, and hence $A \in \mathcal{T}$, where $\mathcal{T}$ is the tail $\sigma$-field defined on $\mathbf{X}$ (see Definition 1 and the discussion just prior to it in Chapter 14). Thus, $\mathcal{J} \subseteq \mathcal{T}$. For independent r.v.s (whether identically distributed or not), Theorem 10 in Chapter 4 gives that $P(A)$ is either 0 or 1 for all $A \in \mathcal{T}$. It follows that $P(A)$ is either 0 or 1 for all $A \in \mathcal{J}$, and hence $\mathbf{X}$ is ergodic.
(ii) It follows immediately from part (i), and Corollary 2 to Theorem 3. #
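The transformation of Exercises 1 and 3, the shift by 1/2 (mod 1) on (0, 1], lends itself to a quick numerical illustration. The sketch below (illustration only) applies $T$ to a uniform sample to confirm that the image is again approximately uniform (measure preservation), and then computes the time average of $I_A$ along an orbit, with $A = (\frac{2}{6}, \frac{3}{6}] + (\frac{5}{6}, 1]$ as in Exercise 3: since $A$ is invariant, an orbit started inside $A$ never leaves it, so the time average is 1 rather than $\lambda(A) = \frac{1}{3}$, which is exactly the failure of ergodicity.

    import numpy as np

    def T(x):
        # Shift by 1/2 mod 1 on (0, 1]: x + 1/2 if x <= 1/2, else x - 1/2.
        x = np.asarray(x, dtype=float)
        return np.where(x <= 0.5, x + 0.5, x - 0.5)

    def in_A(x):
        # Invariant set of Exercise 3: A = (2/6, 3/6] + (5/6, 6/6].
        x = np.asarray(x, dtype=float)
        return ((2/6 < x) & (x <= 3/6)) | ((5/6 < x) & (x <= 1.0))

    rng = np.random.default_rng(2)
    u = rng.uniform(0.0, 1.0, size=10**6)
    # Measure preservation: the image T(u) of a uniform sample is again (approximately) uniform.
    print(np.histogram(T(u), bins=4, range=(0.0, 1.0))[0] / 10**6)

    # Failure of ergodicity: the orbit of x0 = 0.4 is 0.4, 0.9, 0.4, 0.9, ... and stays in A,
    # so the time average of I_A along it is 1, not lambda(A) = 1/3.
    orbit = [0.4]
    for _ in range(999):
        orbit.append(float(T(orbit[-1])))
    print(np.mean(in_A(orbit)))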
Index Ω1-section, 81–82 Ω2-section, 81–82 λ-absolute continuous, 140 λ-continuous measure, 141 λ-integrable, 88, 90, 141 λ-null set, 141 λ-singular measure, 141 λ-systems, 5 μ-continuous, 122 μ-equivalence, 122, 128 μ*-measurable set, 27, 36 μo-measurable set, 25 μ-null set, 41 μ-singular, 122–123, 125–128 π-systems, 5 σ-additive measure, 21 set function, 74, 117–118, 123, 127 σ-additive set function, 117 σ-field, 2, 5, 8, 26–28, 328, 330, 340–342, 345, 375 Borel, 3 Borel countably infinite-dimensional, 7 discrete, 2 extended Borel, 3 generated, 3, 27–28, 173–174 generated by a class, 3, 17, 19, 174 generated by a countable partition, 155 generated by a partition, 2, 154–155 induced by a partition, 1–2 induced by a r.v., 180 of invariant sets (events), 319, 340–341 product, 6–7 trivial, 2 σ-finite measure, 27, 80–82, 86, 110, 128 measure space, 80, 82, 86, 88 set function, 127 μ unique, 117 [μ] uniqueness, 128
A Ans occur infinitely often (An i.o.), 16 a.e. [μ] finite, 117, 128 absolutely continuous, 105 mutually, 105 uniformly, 122 with respect to a measure, 74, 117, 122, 135
Additive, 35 Additive constant, 206 Additivity of measure, 20–21, 30 All but finitely many, 16 Almost everywhere (a.e.) convergence, 41, 47–48, 50 criterion, 42 subsequence, 47 Almost sure (a.s.) convergence, 41 Almost sure (a.s.) invariant sets, 326–328, 330 Almost sure sup, 99 Almost uniform (a.u.) convergence, 45, 48, 50 Analytic function, 227, 229 Array of r.v.s, 239 Asymptotic negligible, 242 Asymptotic normality, 243 Asymptotically unbiased, 365 Atom, 35 Axiom of choice, 6
B Baire function, 8 Base, 321 countable, 9 for a topology, 9 Basis, 7 Bernoulli distribution, 284 Beta distribution, 129 Binomial distribution approximation by Poisson distribution, 284 Binomial r.v.s, 189 Bivariate normal distribution, 176–177, 190 Bochner, 366 Borel σ-field, 3, 30–31 countably infinite-dimensional, 7 extended, 3 n-dimensional, 6 Borel–Cantelli Lemma, 289, 295 Borel function, 8–9, 387 Borel–Heine Theorem, 32 Borel measure, 19 Borel real line, 3, 320 extended, 3, 11 Borel sets, 3 n-dimensional, 6 countably infinite-dimensional, 7
Borel Zero–One Criterion, 189, 289, 296–297, 311 Bounded, 144–146
C Cantor diagonal method, 145 Carathéodory extension, 27 Carathéodory extension theorem, 27, 321 Cauchy criterion, 41 Cauchy distribution, 233 Cauchy integral, 230 Cauchy principal value, 196 Cauchy–Schwarz inequality, 95, 377 Central limit problem centered case, 380 noncentered case, 271, 380 limiting laws of L(Sn) under conditions (C″), 274 notation, 271 special cases, 278 Central Limit Theorem (CLT), 223–224 classical, 241 conditions for CLT to hold, 252 dominated convergence theorem, 259 Liapounov theorem, 256 convergence to normal law distribution, 240 dominated convergence theorem, 248 limiting laws of L(Sn) under conditions (C), 245 Lindeberg-Feller theorem, 242 normal convergence criterion, 242 r.v.s, 240 triangular array, 241 uniformly asymptotic negligible (u.a.n.), 242 dominated convergence theorem, 259, 263 Fatou–Lebesgue theorem, 261 Liapounov theorem, 256 Paul Lévy continuity theorem, 263 proof of results, 260 dominated convergence theorem, 263 Fatou–Lebesgue theorem, 261 Helly–Bray extended lemma, 266 Paul Lévy continuity theorem, 263 Riemann–Stieltjes integral, 262 weak compactness theorem, 266 results, 260 Riemann–Stieltjes integral, 262 Centered r.v.s, 271 Characteristic function (ch.f.), 193, 273–274, 276–279, 281–283 basic properties, 194
Central Limit Theorem, 224 concepts and results from complex analysis employed, 229 power series, 230 r-analytic, 230 convergence of, 202 convolution of distribution functions, 211 and related issues, 211 Dominated Convergence Theorem, 213 Cramér–Wold device, 210 d.f.s. of d.f., 194 integration, 194 weak (and complete) convergence, 206 definition, 193 Dirichlet integrals, 196 distribution of random variable, 225 inversion formula, 195, 201 bounded uniformly, 196 corollaries, 199 define by continuity, 196 Dominated Convergence Theorem, 198, 200 Fubini Theorem, 198 original integral, 198 Lebesgue measure, 196 Paul Lévy continuity theorem, 202–203 convergence in distribution, 202 Fubini Theorem, 202 integral ch.f. of a d.f., 202 properties, 216 uniformed convergence, 216 properties of, 216 uniformed convergence, 216 theory of, 379 Weak Law of Large Numbers, 223 Characterization of integrability, 104 Chi-square distribution, 234 CLT. See Central Limit Theorem (CLT) Complement, 18 Complete, 51 with respect to a measure, 51 Complete convergence, 51, 135, 206 up to an additive constant, 206 Completeness in the rth mean Theorem, 103 Completion, 37, 330 Completion of J, 330 Complex-valued function, 194 Concept of a characteristic function (ch.f.), 193 integral, 202 Composition of d.f.s, 212 Conditional distribution, 156 Conditional expectation. See also Conditional expectation and conditional probability
and related properties and results, 378 convergence in rth mean for, 164 dominated convergence theorem, 164 Fatou–Lebesgue theorem, 164 inequalities, 162 Jensen inequality for, 166 Lebesgue Monotone convergence theorem, 160 properties, 169 Conditional expectation of X given Y, 154 given a σ-field, 153 Conditional inequalities, 162 Conditional probability, 154 given a σ-field, 153 Conditional probability density function, 153 Conditional expectation and conditional probability. See also Conditional expectation and related properties and results, 378 definition set functions, 153 meaning/usefulness, 156 properties of, 169 Borel real line, 171 equality, 169 theorems, 158 Conditions (C ), 115, 245, 253–254, 271 Conditions (C’), 271, 274 Conditions (C”), 271 limiting laws of L(Sn) under, 274 Construction of measures, 27 Continuity of measure, 20–22 Continuity point, 142, 148, 210 Continuity set, 78 Continuous, 9, 79, 108, 117, 220, 251 Continuous convergence, 219 Continuous d.f., 144, 138, 141 Continuous from above, 20–21, 117 Continuous from below, 20–22, 117 Continuous from the right, 31 Continuous function, 7, 9, 109, 209 at the origin, 205 Continuous mapping, 9 Continuous measure, 122, 127, 141 Continuous measure at ∅, 122, 132 Continuous measure from above, 20–21 Continuous measure from below, 20–22 Convergence a.e. vs. in measure, 41 almost everywhere (a.e.), 41 criterion, 42 subsequence, 45 almost sure (a.s.), 41, 99
almost uniform, 45, 48 of ch.f.s, 202 complete, 51, 135, 206 up to an additive constant, 206 in distribution, 71, 78–80, 193, 202 in the first mean, 344 in measure, 41, 45 mutual, 41, 45 a.e., 41 a.s., 51 almost uniform, 45, 48 in measure, 42, 45, 50 in probability, 41 in the rth mean, 95 in probability, 41, 51–52, 71, 78, 80 preserved by continuity, 51 in quadratic mean, 116 in the rth mean, 103, 106 for conditional expectations, 160 necessary and sufficient conditions, 106 boundedness, 105 Dominated Convergence Theorem, 109 Fatou–Lebesgue Theorem, 107 integrability of r.v., characterization of, 104 introduction, 101 Markov inequality, 102 Scheffé’s Theorem, 110 uniform continuity, 105 uniform integrability, criterion of, 105 Vitali’s Theorem, 110 series of r.v.s, 240 weak, 78, 141, 206, 211 of d.f., 149, 240 of probability measure, 149 up to an additive constant, 206 Convex function, 95–96, 99, 112, 132, 166, 172, 175 Convex, strictly, 116, 132 Convolution of d.f.s, 211 Coordinate process, 320, 325–326, 339–340, 342 Correlation coefficient, 177 Countable, 1, 135 Countable base, 9 Countable covering, 23 Countable intersections, 18 Countable partition, 125, 155 Countable set, 1 Countable sum, 2, 15 Countable unions, 18 Countably infinite, 1–2, 7 Countably infinite-dimensional Borel σ-field, 7 Countably infinite-dimensional Borel set, 7
Countably infinite-dimensional Borel space, 7 Counterexample, 13 Counting measure, 35, 129 Covariance, 68, 177 cr inequality, 95, 98 Cramér–Rao inequality, 77 Cramér–Wold device, 210–211 Cramér–Wold Theorem, 211 Cylinder, 320–321 product, 7
D Decomposition Theorem, 137 Decomposition, unique, 122 Definition of characteristic function of distribution, 193 definition of integral, 55 delta method, 80 DeMorgan’s laws, 13 Dense set, 135 Dense subset, 136 Density function, 116–117, 131, 153, 176, 267 Denumerable, 28 Derivative, 200, 229–230, 236, 250 quadratic mean, 116 Radon–Nikodym, 129, 348 Diagonal sequence, 144 Differentiable function, 229 Differentiable in q.m., 116 Dirichlet integrals, 196 Discontinuities, 137, 140, 143 Discontinuity point, 135, 137, 143 Discrete field, 1 Discrete σ-field, 2 Discrete time parameter, 320 Discrete parameter stochastic process, 319 Distance, 133 Distribution, 34 Distribution function (d.f.), 31, 34, 272–283, 285 basic properties of convergence theorem, 141 decomposition theorem, 137 step function, 137 of a r.v., 34 convolution of, 211 Helly–Bray type theorems description, 148 extended lemma, 147 lemma, 145–146 Riemann–Stieltjes sense, 145
weak convergence and compactness of sequence, 141 Cantor diagonal method, 145 values, variation and, 142 variation, 141 weak compactness theorem, 144 Distribution, limiting, 274, 276, 278 Dominated Convergence Theorem, 71, 90, 93, 96, 104, 109–110, 141, 166, 176, 198, 200, 203, 212, 220, 248, 259, 263, 306, 336, 359, 363 for conditional expectations, 164 Dominates, 122 Dominating measure, 117, 133 Domination by a measure, 117 Double integral, 71, 88
E Egorov’s Theorem, 338–339 Elementary random variable, 11 Empirical d.f., 191 Entire function, 229 Ergodic, 319 stationary process, 324, 326, 339 theory, 319 transformation, 331 Ergodic theorem, 332, 342, 344 maximal, 333 Ergodicity, 319–320, 332, 340 of a transformation, 319 Event(s), 16, 51, 68, 174, 179, 190, 306, 342 Expectation, 59 Extended Borel σ-field, 3 Extended Borel real line, 3, 11 Extended random variable, 11 Extended real line, 3 Extension, 22, 27, 56, 59, 86 need not be unique, 29, 36 Extension of measures, 27 Extension of set function, 22
F F-integrable, 194, 196, 220, 243 Factorization, 181–182, 185 of the joint d.f. of r.v.s, 179 Fatou–Lebesgue theorem, 71, 73, 107 Central Limit Theorem, 261 for conditional expectation, 164 Feller, 239, 242 Field, 1, 20, 23, 25, 28–30 σ- (see σ-field)
discrete, 1 generated, 1–2, 19 induced, 1 minimal, 2 trivial, 1 Field generated by C, 2, 18, 30 Field generated by σ(C ), 3, 7 Finite intersections, 17 Finite measure, 74, 84, 86 Finite partition, 1 Finite sum, 5 of intervals, 28 Finite unions, 18 Finitely additive measure, 20–21, 117 Finitely additive set function, 117 Finitely subadditive measure, 23 Fubini Theorem, 71, 88 Function Baire, 8 Borel, 8 continuous (see continuous function), 7 measurable, 7 nondecreasing right-continuous, 19 of independent r.v.s, 179–180 projection, 9 right-continuous, 7
G Gamma distribution, 234 Gamma function, 235 Generated σ-field, 3, 6 Generated field, 1–2 Glivenko-Cantelli Lemma, 191
H Hahn–Jordan decomposition theorem, 377 continuity, 117 set functions, 119 signed measure, 121 Halmos, 89 Hardy, et al., 97, 166 Helly–Bray extended lemma, 147, 266 Helly–Bray Lemma, 145–146, 217 Helly–Bray Theorem, 148 Helly–Bray type results distribution functions and basic properties, 377 Hölder inequality, 95, 377 hypergeometric distribution, 129
I Improper Riemann–Stieltjes integral, 386 Increasing function, 166 Indefinite integral, 74, 117 Independence elaboration, 153 of σ-fields, 169 of classes of events, 179 of events, 179 of fields, 180 of functions of independent r.v.s, 179 of r.v.s, 153, 179 of subclasses of independent classes, 180 row-wise, 242, 245, 253, 256–257, 268–269, 279, 281–282 Index, 200 Indicator of a set, 11 Induced σ-field, 2 field, 1 measure, 135 outer measure, 24 probability measure, 131 Induction, 26, 181–182 Inequality, 26–27 reverse, 32 Inf, 10 Infinite countably, 1–2, 7 measure, 85 partition, 11 Infinitely many, 16 measurable spaces, 6 Infinitely often (i.o.), 16 Integrable, 59, 63–64, 76 Integral ch.f., 202–204, 206, 208 Integral of random variable basic properties, 376 over a set, 56 with respect to a measure, 55 Interchange of differentiation and integration, 71 of integration and lim inf, lim sup, lim, 76 of integration and limits, 71 of integration and summation, 71 of the order of integration, 71 Invariance, 340, 382 Invariant random variables, 331–332, 341 relative to r.v., 331 relative to transformation, 319 Invariant sets, 319, 326, 340 relative to r.v., 355
Inverse image, 7–8 Inversion formula, 193–195, 201 Irreducible, 69 Iterated integral, 88
J Jensen inequality, 99 conditional expectation, 166 Joint ch.f., 210 Joint d.f., 210 Joint probability density junction, 176 Jump discontinuities, 136 Jumps length, 136 number, 136–137
K k-dimensional d.f., 210 Kernel-estimation approach, 364 Kolmogorov Consistency Theorem, 321 Kolmogorov Inequalities, 289–290, 381 Kolmogorov Strong Law of Large Numbers, 290 Kolmogorov Theorem, 290 Kolmogorov Three Series Criterion, 316 Kolmogorov Zero-One Law, 289, 314 Kronecker Lemma, 289–301
L Lebesgue Decomposition Theorem, 117, 122, 377 Lebesgue measure, 21, 28, 30–31, 69, 72, 85, 92, 110, 114, 135, 348 n-dimension, 28, 129 Lebesgue Monotone Convergence Theorem, 12, 71, 73–74, 160, 172, 174, 188 for conditional expectations, 153 Left-continuous, 37 Left-hand side limit, 150 Length of jumps, 136 Liapounov condition, 268 Liapounov Theorem, 239, 256 Lim, 10 Lim inf, 10, 66 Lim sup, 10, 66 Limit of a sequence, 10 Limiting law, 239, 245 identification under conditions, 245 normal distribution, 239
of L(Sn), 245 Poisson distribution, 278 Lindeberg, 239 Lindeberg condition, 267 Lindeberg-Feller Theorem, 242 Line of support, 99 Loève, 7, 78, 90, 121, 127, 156, 321 Logarithmic function, 228
M m-dimensional statistic, 156 Main diagonal, 15, 85 Mapping, 16 Markov Inequality, 95, 100, 102, 113 Maximal Ergodic Theorem, 319–320, 333 Maximum Likelihood Estimate, 77 Mean Value Theorem, 77 Measurable, 7–8, 12 function, 7, 92 mapping, 1, 8 partition, 91 section, 80 set, 3, 25, 82 space, 1, 3 product of, 5, 7 Measure, 19, 22, 27, 30 defined on a field, 22 induced by, 34 on the Borel σ-field, 30–31 outer, 22 space, 19 Measure in the Borel real line, 19 Measure-determining class, 323 Measure-preserving property, 324, 327 Measure-preserving transformations, 323–328, 330, 332–333, 337 ergodic, 332 Measures, 19, 375 and (point) functions, 30 construction and extension of, 27 definition and construction, 375 basic properties, 375 mutual convergence in, 41, 45 outer, 22 Median of a.r.v., 38 Minimal field, 2–3 Minimal monotone class, 4, 28, 85 Minimal σ-field, 3 Minkowski inequality, 97 Modes of convergence, 41, 80, 84 Moment inequality, 95, 153
Moments of a random variable determine its distribution, 225 Monotone, 3, 166 Monotone class, 1, 3–4, 14, 85, 323 Minimal, 4 Monotone Convergence Theorem (see Lebesgue Monotone Convergence Theorem), 12 monotone field, 3 Mutual convergence, 41 a.e., 41–42, 46, 49, 72 a.s., 80, 289 almost uniform, 45, 48 in measure, 41–42, 45 in probability, 41 in the rth mean, 103, 106 Mutual entropy, 131 Mutual version of convergence, 41 Mutually absolutely continuous, 122, 134
N n-Dimensional Borel sets, 6 n-Dimensional Borel space, 6 n-Dimensional Borel σ-field, 6 n-Dimension Lebesgue measure, 28 n-Dimensional random vector, 8 Natural logarithm, 133 Negative binomial, 129 Negative part, 10 Noncentered, 271 Nondecreasing, 19, 31 function, 31 measure, 22 right-continuous function, 19 Nonnegative, 19 Nonparametric inference kernel method, 364 Normal convergence criterion, 239, 242 normal distribution, 177, 234 null set, 37, 41 number of jumps, 137
P PB-continuous, 154 P-integrable r.v., 155 Pólya’s Lemma, 286 Pairwise disjoint, 1, 15, 22, 26, 62 Parameter, 156, 176 Parameter space, 156, 172 Partition, 11, 14, 19, 55, 69, 91 countable, 1 countably infinite measurable, 1, 7 finite, 1 infinite, 1 Partition points, 125, 213 Partitioning set, 1–2 Paul Lévy Continuity Theorem, 193, 202–203, 205, 263 Point function, 30, 66 Pointwise differentiable, 116 Pointwise limit, 12 Poisson distribution, 278, 284 Positive part, 10 Principal branch of logarithm, 228 Probability density function, 110, 116–117, 267 conditional, 153 joint, 176 Probability distribution, 34, 252 Probability inequality, 95 Probability measure, 19 induced by a density, 131 Probability space, 131 Process, 319–320, 340 Product σ-field, 5, 8 cylinder, 7 measurable space, 7, 9 measure, 80 space, 5–6, 91 Product Measure Theorem, 71, 362 Product of ch.f.s, 206 Projection function, 9
Q Quadric mean derivative, 116
O Open set, 9 Operator, 7 Orbit, 331 Orthogonal, 122, 337 Outer measure, 19, 22, 23, 24, 36
R r-analytic function, 229 r-th absolute moment, 216 radius of convergence, 226
Radon–Nikodym theorem, 128, 377 Random variable, 7, 31, 34, 46–48, 50 elementary, 11, 16 extended, 3, 11 integral of basic properties of, 60 definition of, 55 introduction, 55 probability distributions, 66 invariant. See Invariant random variables simple, 1, 12, 16 Random vector, 8–9 Rectangle, 5–6 Regularity conditions, 193 Replacement of distribution by p.d.f., 130–131 Restriction, 22, 125, 153, 173 Review of chapters, 375 Riemann integral, 220 Riemann-Stieltjes integral, 135, 145, 385 Riemann-Stieltjes sum, 262 Riemann-Stieltjes integrable, 385 Right-continuity point, 143 Right-continuous function, 7, 19, 193 Robustness of the t-test, 80 Row-wise independence, 242, 245, 256, 268–269, 272, 279, 281, 283
Slope, 96 Slutsky’s Theorem, 78, 362 Space, 1 measurable, 1, 3, 9 product, 5–6 topological, 9 Standard convergence theorems, 71, 376 Standard moment and probability inequalities, 95 Stationarity, 319, 321 Stationary process, 319, 322, 326, 340, 344 Statistic, 71, 78 Step function, 137, 140 Stirling’s Formula, 227 Stochastic process, 320 Strict stationarity, 319 Strictly stationary process, 321 Strong Law of Large Numbers, 289–290, 306, 309, 346 for r.v.s with infinite expectation, 309 Sub-σ-additive measure, 117 Subsequences, 206 Sufficient statistic, 172 Sums of sets, 1, 5 Sup, 10 Symmetric, 69 Symmetric difference, 13 symmetric r.v. about zero, 39, 215
S
T
Sample mean, 79, 92–93 Scheffe’s Theorem, 81 Selected references, 364 Separable, 9 Sequences, 12–13, 15–16, 41, 48, 50, 53, 141, 144, 239, 266, 289–290 Set of all real-valued functions, 6 open, 9 Set function, 19, 21–22, 27, 38, 66, 71, 74, 85, 119 σ-additive, 119, 123 Set operator, 7 Shannon–Kolmogorov Information Inequality, 131 Shift transformation, 319, 323, 325, 339 Sides, 321 of a cylinder, 7 of a rectangle, 5 Signed measure, 121, 171 Simple function, 66 Simple r.v., 11–12, 16, 65 Singular with respect to a measure, 55 Singularity, 125
T-measurable, 154–155 t-test, 80 Tail σ-field, 289, 313–314, 346 Tail events, 313 Tail r.v.s, 313 Taylor’s formula, 79, 361 Tchebichev Inequality, 100, 370 Three Series Criterion, 316 Toeplitz Lemma, 300 Topological space, 9 topology, 9 usual, 9 Transformations almost sure invariant sets. See Almost sure invariant sets ergodicity of, 331 invariant random variables relative. See Invariant random variables invariant sets. See Invariant sets ergodic, 331 measure-preserving, 319, 323–324, 327, 331, 337, 339 shift, 319, 339, 382
Triangular arrays of random variable subject to conditions C, 271–272 Trivial field, 1 Trivial σ-field, 2
U Unbiased statistic, 172 Uncorrelated r.v.s, 190 Uncountable, 1 Uniform continuity, 95, 101 Uniform convergence, 41, 45, 50 Uniform distribution, 50 Uniform integrability, 101 Uniformly absolutely continuous, 105 Uniformly asymptotic negligible, 242 Uniformly continuous, 105, 108 Uniformly integrable, 104–107, 113 Unique decomposition, 117, 119, 122
Uniquely defined, 101, 141, 278, 331 Up to an additive constant (uac), 206 Usual topology, 9
V Variation, 141 Vitali’s Theorem, 110
W Wald, 177 Weak Compactness Theorem, 144, 150, 266 Weak convergence, 78, 141, 170, 177, 248, 289 of d.f., 149, 240, 389 of probability measure, 149 up to an additive constant, 206 Weak Law of Large Numbers, 92, 223, 380 Well-defined, 171