Multiparameter Processes: An Introduction to Random Fields
Davar Khoshnevisan
Springer
Springer Monographs in Mathematics
Davar Khoshnevisan, Department of Mathematics, University of Utah, Salt Lake City, UT 84112-0090
[email protected]
With 12 illustrations.

Mathematics Subject Classification (2000): 60Gxx, 60G60

Library of Congress Cataloging-in-Publication Data
Khoshnevisan, Davar.
Multiparameter processes : an introduction to random fields / Davar Khoshnevisan.
p. cm. — (Springer monographs in mathematics)
Includes bibliographical references and index.
ISBN 0-387-95459-7 (alk. paper)
1. Random fields. I. Title. II. Series.
QA274.45 .K58 2002
519.2′3—dc21 2002022927

Printed on acid-free paper.

© 2002 Springer-Verlag New York, Inc. All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer-Verlag New York, Inc., 175 Fifth Avenue, New York, NY 10010, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Manufacturing supervised by Jerome Basma. Camera-ready copy prepared from the author's LaTeX files. Printed and bound by Edwards Brothers, Inc., Ann Arbor, MI. Printed in the United States of America.

9 8 7 6 5 4 3 2 1

ISBN 0-387-95459-7
SPIN 10869448
Springer-Verlag New York Berlin Heidelberg A member of BertelsmannSpringer Science+Business Media GmbH
Preface
This book aims to construct a general framework for the analysis of a large family of random fields, also known as multiparameter processes. The need for such a development was pointed out in Doob (1990, p. 47). Referring to the theory of one-parameter stochastic processes, Doob writes:¹

    Our definition of a stochastic process is historically conditioned and has obvious defects. In the first place there is no mathematical reason for restricting T to be a set of real numbers, and in fact interesting work has already been done in other cases. (Of course, the interpretation of t as time must then be dropped.) In the second place there is no mathematical reason for restricting the value assumed by the x_t's to be numbers.

There are a number of compelling reasons for studying random fields, one of which is that, if and when possible, multiparameter processes are a natural extension of existing one-parameter processes. More exciting still are the various interactions between the theory of multiparameter processes and other disciplines, including probability itself. For example, in this book the reader will learn of various connections to real and functional analysis, a modicum of group theory, and analytic number theory. The multiparameter processes of this book also arise in applied contexts such as mathematical statistics (Pyke 1985), statistical mechanics (Kuroda and Manaka 1998), and brain data imaging (Cao and Worsley 1999).

¹He is referring to a stochastic process of the form (x_t; t ∈ T).
My writing philosophy has been to strike a balance between developing a reasonably general theory and presenting applications and explicit calculations. This approach should set the stage for further analysis and exploration of the subject, and it makes for a livelier presentation.

This book is in two parts. Part I is about the discrete-time theory. It also contains results that allow for the transition from discrete-time processes to continuous-time processes. In particular, it develops abstract random variables, parts of the theory of Gaussian processes, and weak convergence for continuous stochastic processes. Part II contains the general theory of continuous-time processes. Special attention is paid to processes with continuous trajectories, but some discontinuous processes will also be studied. In this part I will also discuss subjects such as potential theory for several Markov processes, the Brownian sheet, and some Gaussian processes. Parts I and II are influenced by the fundamental works of Doob, Cairoli, and Walsh.

My goal has been to keep this book as self-contained as possible, in order to make it accessible to advanced graduate students in probability and analysis. To this I add that a more complete experience can only be gained by solving many of the problems that are scattered throughout the body of the text. At times, these in-text exercises ask the student to check some technical detail. At other times, the student is encouraged to apply a recently introduced idea in a different context. More challenging exercises are offered at the end of each chapter.

Many of the multiparameter results of this book do not seem to exist elsewhere in pedagogic form. There are also a number of new theorems that appear here for the first time. When introducing a better-known subject (e.g., martingales or Markov chains), I have strived to construct the most informative proofs, rather than the shortest.
This book would not exist had it not been for the extensive remarks, corrections, and support of R. Bass, J. Bertoin, K. Burdzy, R. Dalang, S. Ethier, L. Horváth, S. Krone, O. Lévêque, T. Lewis, G. Milton, E. Nualart, T. Mountford, J. Pitman, Z. Shi, J. Walsh, and Y. Xiao. Their efforts have led to a much cleaner product. What errors remain are my own. I have enjoyed a great deal of technical support from P. Bowman, N. Beebe, and the editorial staff of Springer. The National Science Foundation and the North Atlantic Treaty Organization have generously supported my work on random fields over the years. My sincerest gratitude goes to them all.

Finally, I wish to thank my dearest friend, and my source of inspiration, Irina Gushin. This book is dedicated to the memory of Victor Gushin, and to the recent arrival of Adrian V. Kh. Gushin.

Davar Khoshnevisan
Salt Lake City, UT
March 2002
Contents

Preface

List of Figures

General Notation

I Discrete-Parameter Random Fields

1 Discrete-Parameter Martingales
   1 One-Parameter Martingales
      1.1 Definitions
      1.2 The Optional Stopping Theorem
      1.3 A Weak (1,1) Inequality
      1.4 A Strong (p, p) Inequality
      1.5 The Case p = 1
      1.6 Upcrossing Inequalities
      1.7 The Martingale Convergence Theorem
   2 Orthomartingales
      2.1 Definitions and Examples
      2.2 Embedded Submartingales
      2.3 Cairoli's Strong (p, p) Inequality
      2.4 Another Maximal Inequality
      2.5 A Weak Maximal Inequality
      2.6 Orthohistories
      2.7 Convergence Notions
      2.8 Topological Convergence
      2.9 Reversed Orthomartingales
   3 Martingales
      3.1 Definitions
      3.2 Marginal Filtrations
      3.3 A Counterexample
      3.4 Commutation
      3.5 Martingales
      3.6 Conditional Independence
   4 Supplementary Exercises
   5 Notes on Chapter 1

2 Two Applications in Analysis
   1 Haar Systems
      1.1 The 1-Dimensional Haar System
      1.2 The N-Dimensional Haar System
   2 Differentiation
      2.1 Lebesgue's Differentiation Theorem
      2.2 A Uniform Differentiation Theorem
   3 Supplementary Exercises
   4 Notes on Chapter 2

3 Random Walks
   1 One-Parameter Random Walks
      1.1 Transition Operators
      1.2 The Strong Markov Property
      1.3 Recurrence
      1.4 Classification of Recurrence
      1.5 Transience
      1.6 Recurrence of Possible Points
      1.7 Recurrence–Transience Dichotomy
   2 Intersection Probabilities
      2.1 Intersections of Two Walks
      2.2 An Estimate for Two Walks
      2.3 Intersections of Several Walks
      2.4 An Estimate for N Walks
   3 The Simple Random Walk
      3.1 Recurrence
      3.2 Intersections of Two Simple Walks
      3.3 Three Simple Walks
      3.4 Several Simple Walks
   4 Supplementary Exercises
   5 Notes on Chapter 3

4 Multiparameter Walks
   1 The Strong Law of Large Numbers
      1.1 Definitions
      1.2 Commutation
      1.3 A Reversed Orthomartingale
      1.4 Smythe's Law of Large Numbers
   2 The Law of the Iterated Logarithm
      2.1 The One-Parameter Gaussian Case
      2.2 The General LIL
      2.3 Summability
      2.4 Dirichlet's Divisor Lemma
      2.5 Truncation
      2.6 Bernstein's Inequality
      2.7 Maximal Inequalities
      2.8 A Number-Theoretic Estimate
      2.9 Proof of the LIL: The Upper Bound
      2.10 A Moderate Deviations Estimate
      2.11 Proof of the LIL: The Lower Bound
   3 Supplementary Exercises
   4 Notes on Chapter 4

5 Gaussian Random Variables
   1 The Basic Construction
      1.1 Gaussian Random Vectors
      1.2 Gaussian Processes
      1.3 White Noise
      1.4 The Isonormal Process
      1.5 The Brownian Sheet
   2 Regularity Theory
      2.1 Totally Bounded Pseudometric Spaces
      2.2 Modifications and Separability
      2.3 Kolmogorov's Continuity Theorem
      2.4 Chaining
      2.5 Hölder-Continuous Modifications
      2.6 The Entropy Integral
      2.7 Dudley's Theorem
   3 The Standard Brownian Sheet
      3.1 Entropy Estimate
      3.2 Modulus of Continuity
   4 Supplementary Exercises
   5 Notes on Chapter 5

6 Limit Theorems
   1 Random Variables
      1.1 Definitions
      1.2 Distributions
      1.3 Uniqueness
   2 Weak Convergence
      2.1 The Portmanteau Theorem
      2.2 The Continuous Mapping Theorem
      2.3 Weak Convergence in Euclidean Space
      2.4 Tightness
      2.5 Prohorov's Theorem
   3 The Space C
      3.1 Uniform Continuity
      3.2 Finite-Dimensional Distributions
      3.3 Weak Convergence in C
      3.4 Continuous Functionals
      3.5 A Sufficient Condition for Pretightness
   4 Invariance Principles
      4.1 Preliminaries
      4.2 Finite-Dimensional Distributions
      4.3 Pretightness
   5 Supplementary Exercises
   6 Notes on Chapter 6

II Continuous-Parameter Random Fields

7 Continuous-Parameter Martingales
   1 One-Parameter Martingales
      1.1 Filtrations and Stopping Times
      1.2 Entrance Times
      1.3 Smartingales and Inequalities
      1.4 Regularity
      1.5 Measurability of Entrance Times
      1.6 The Optional Stopping Theorem
      1.7 Brownian Motion
      1.8 Poisson Processes
   2 Multiparameter Martingales
      2.1 Filtrations and Commutation
      2.2 Martingales and Histories
      2.3 Cairoli's Maximal Inequalities
      2.4 Another Look at the Brownian Sheet
   3 One-Parameter Stochastic Integration
      3.1 Unbounded Variation
      3.2 Quadratic Variation
      3.3 Local Martingales
      3.4 Elementary Processes
      3.5 Simple Processes
      3.6 Continuous Adapted Processes
      3.7 Two Approximation Theorems
      3.8 Itô's Formula
      3.9 The Burkholder–Davis–Gundy Inequality
   4 Stochastic Partial Differential Equations
      4.1 Stochastic Integration
      4.2 Hyperbolic SPDEs
      4.3 Existence and Uniqueness
   5 Supplementary Exercises
   6 Notes on Chapter 7

8 Constructing Markov Processes
   1 Discrete Markov Chains
      1.1 Preliminaries
      1.2 The Strong Markov Property
      1.3 Killing and Absorbing
      1.4 Transition Operators
      1.5 Resolvents and λ-Potentials
      1.6 Distribution of Entrance Times
   2 Markov Semigroups
      2.1 Bounded Linear Operators
      2.2 Markov Semigroups and Resolvents
      2.3 Transition and Potential Densities
      2.4 Feller Semigroups
   3 Markov Processes
      3.1 Initial Measures
      3.2 Augmentation
      3.3 Shifts
   4 Feller Processes
      4.1 Feller Processes
      4.2 The Strong Markov Property
      4.3 Lévy Processes
   5 Supplementary Exercises
   6 Notes on Chapter 8

9 Generation of Markov Processes
   1 Generation
      1.1 Existence
      1.2 The Hille–Yosida Theorem
      1.3 The Martingale Problem
   2 Explicit Computations
      2.1 Brownian Motion
      2.2 Isotropic Stable Processes
      2.3 The Poisson Process
      2.4 The Linear Uniform Motion
   3 The Feynman–Kac Formula
      3.1 The Feynman–Kac Semigroup
      3.2 The Doob–Meyer Decomposition
   4 Exit Times and Brownian Motion
      4.1 Dimension One
      4.2 Some Fundamental Local Martingales
      4.3 The Distribution of Exit Times
   5 Supplementary Exercises
   6 Notes on Chapter 9

10 Probabilistic Potential Theory
   1 Recurrent Lévy Processes
      1.1 Sojourn Times
      1.2 Recurrence of the Origin
      1.3 Escape Rates
      1.4 Hitting Probabilities
   2 Hitting Probabilities for Feller Processes
      2.1 Strongly Symmetric Feller Processes
      2.2 Balayage
      2.3 Hitting Probabilities and Capacities
      2.4 Proof of Theorem 2.3.1
   3 Explicit Computations
      3.1 Brownian Motion and Capacities
      3.2 Stable Densities and Subordination
      3.3 Asymptotics for Stable Densities
      3.4 Stable Processes and Capacities
      3.5 Relation to Hausdorff Dimension
   4 Supplementary Exercises
   5 Notes on Chapter 10

11 Multiparameter Markov Processes
   1 Definitions
      1.1 Preliminaries
      1.2 Commutation and Semigroups
      1.3 Resolvents
      1.4 Strongly Symmetric Feller Processes
   2 Examples
      2.1 General Notation
      2.2 Product Feller Processes
      2.3 Additive Lévy Processes
      2.4 Product Process
   3 Potential Theory
      3.1 The Main Result
      3.2 Three Technical Estimates
      3.3 Proof of Theorem 3.1.1: First Half
      3.4 Proof of Theorem 3.1.1: Second Half
   4 Applications
      4.1 Additive Stable Processes
      4.2 Intersections of Independent Processes
      4.3 Dvoretzky–Erdős–Kakutani Theorems
      4.4 Intersecting an Additive Stable Process
      4.5 The Range of a Stable Process
      4.6 Extension to Additive Stable Processes
      4.7 Stochastic Codimension
   5 α-Regular Gaussian Random Fields
      5.1 Stationary Gaussian Processes
      5.2 α-Regular Gaussian Fields
      5.3 Proof of Theorem 5.2.1: First Part
      5.4 Proof of Theorem 5.2.1: Second Part
   6 Supplementary Exercises
   7 Notes on Chapter 11

12 The Brownian Sheet and Potential Theory
   1 Polar Sets for the Range of the Brownian Sheet
      1.1 Intersection Probabilities
      1.2 Proof of Theorem 1.1.1: Lower Bound
      1.3 Proof of Lemma 1.2.2
      1.4 Proof of Theorem 1.1.1: Upper Bound
   2 The Codimension of the Level Sets
      2.1 The Main Calculation
      2.2 Proof of Theorem 2.1.1: The Lower Bound
      2.3 Proof of Theorem 2.1.1: The Upper Bound
   3 Local Times as Frostman's Measures
      3.1 Construction
      3.2 Warmup: Linear Brownian Motion
      3.3 A Variance Estimate
      3.4 Proof of Theorem 3.1.1: General Case
   4 Supplementary Exercises
   5 Notes on Chapter 12

III Appendices

A Kolmogorov's Consistency Theorem

B Laplace Transforms
   1 Uniqueness and Convergence Theorems
      1.1 The Uniqueness Theorem
      1.2 The Convergence Theorem
      1.3 Bernstein's Theorem
   2 A Tauberian Theorem

C Hausdorff Dimensions and Measures
   1 Preliminaries
      1.1 Definition
      1.2 Hausdorff Dimension
   2 Frostman's Theorems
      2.1 Frostman's Lemma
      2.2 Bessel–Riesz Capacities
      2.3 Taylor's Theorem
   3 Notes on Appendix C

D Energy and Capacity
   1 Preliminaries
      1.1 General Definitions
      1.2 Physical Interpretations
   2 Choquet Capacities
      2.1 Maximum Principle and Natural Capacities
      2.2 Absolutely Continuous Capacities
      2.3 Proper Gauge Functions and Balayage
   3 Notes on Appendix D

References

Name Index

Subject Index

List of Figures

1.1 Orthohistories
1.2 Orthohistories
1.3 Histories
5.1 Covering by balls
9.1 Gambler's ruin
10.1 Covering balls
11.1 Planar Brownian motion
11.2 Additive Brownian motion
12.1 Planar Brownian sheet (aerial)
12.2 Planar Brownian sheet (portrait)
12.3 Planar Brownian sheet (side)
12.4 The zero set of the Brownian sheet
C.1 Cantor's set
General Notation
While it is generally the case that the notation special to each chapter is independent of that in the remainder of the book, there is much that is common to the entire book. These items are listed below.
Euclidean Spaces, Integers, etc.
The collection of all real (nonnegative real) numbers is denoted by R (R+), positive (nonnegative) integers by N (N0 and/or Z+). Finally, the rationals (nonnegative rationals) are written as Q (Q+). The latter spaces are endowed with their standard Borel topologies and Borel σ-fields, which will also be referred to as Borel fields. Recall that the Borel field on a topological space X is the σ-field generated by all open subsets of X; that is, the smallest σ-field that contains all open subsets of X.

The sequence space ℓ^p is the usual one: For any p > 0, ℓ^p designates the collection of all sequences a = (a_k; k ≥ 1) such that Σ_k |a_k|^p < ∞. As usual, ℓ^∞ stands for all bounded sequences. When ∞ > p ≥ 1, these ℓ^p spaces can be normed by ‖a‖_p = {Σ_k |a_k|^p}^{1/p}. Using these norms in turn, we can norm the Euclidean space R^k in various ways. Throughout, we will use the following two norms: (a) the ℓ^∞ norm, which is |x| = max_{1≤j≤k} |x^(j)|, for x ∈ R^k; and (b) the ℓ^2 norm, which is ‖x‖ = {Σ_{j=1}^k |x^(j)|^2}^{1/2}.
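As a small concrete illustration (a sketch of ours, not part of the book), the two norms on R^k can be computed directly from their definitions; the function names `sup_norm` and `euclidean_norm` are our own labels:

```python
# Illustrative sketch: the two norms on R^k used throughout the book.
# A point x in R^k is represented here as a tuple of floats.

def sup_norm(x):
    """The ell-infinity norm: |x| = max_{1<=j<=k} |x^(j)|."""
    return max(abs(c) for c in x)

def euclidean_norm(x):
    """The ell-2 norm: ||x|| = (sum_{j=1}^k |x^(j)|^2)^(1/2)."""
    return sum(c * c for c in x) ** 0.5

x = (3.0, -4.0)
print(sup_norm(x))        # 4.0
print(euclidean_norm(x))  # 5.0
```

Note that both norms agree on R^1; they differ only in dimension k ≥ 2, where the ℓ^∞ norm is always dominated by the ℓ^2 norm.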
Product Spaces
Throughout, the "dimension" numbers d and N are reserved for the spatial and temporal dimension, respectively. Given any two sets F and T, the set F^T is defined as the collection of all functions f : T → F. When F is a topological space, F^T is often endowed with the product topology; cf. Appendix D. If m ∈ N, F^m is the usual m-fold product space F × ··· × F. This, too, is often endowed with the product topology if and when F is topological. Throughout, the ith coordinate of any point s ∈ R^m is written as s^(i).

We need the following special order structure on R^m: Whenever s, t ∈ R^m, we write s ≼ t (s ≺ t) when for i = 1, ..., m, s^(i) ≤ t^(i) (s^(i) < t^(i)). Occasionally, we may write this as t ≽ s (or t ≻ s). Whenever s, t ∈ R^m, s ∧ t designates the point whose ith coordinate is s^(i) ∧ t^(i) for all i = 1, ..., m. If s ≼ t, then [s, t] = ∏_{j=1}^m [s^(j), t^(j)]. We will refer to [s, t] as a rectangle (or an m-dimensional rectangle). When this rectangle is of the form [s, s + (r, ..., r)] for some r ∈ R^1_+ and s ∈ R^m, then it is a (hyper)cube. If s ≺ t, one can similarly define ]s, t[, [s, t[, and ]s, t]. For instance, ]s, t] = ∏_{j=1}^m ]s^(j), t^(j)]. All subsets of R^m automatically inherit the order structure of R^m. This way, s ≼ t makes the same sense in N^k as it does in R^k, for instance.
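The componentwise order and the resulting rectangles are mechanical enough to sketch in code. The following is our own illustration (the names `leq`, `meet`, and `in_rectangle` are ours, not the book's notation); it also makes visible that for m > 1 the order is only partial, since points such as (0, 1) and (1, 0) are incomparable:

```python
# Sketch of the partial order on R^m: s ≼ t iff every coordinate of s is
# at most the corresponding coordinate of t. Points are tuples.

def leq(s, t):
    """s ≼ t: componentwise comparison in every coordinate."""
    return all(si <= ti for si, ti in zip(s, t))

def meet(s, t):
    """s ∧ t: the point whose ith coordinate is min(s^(i), t^(i))."""
    return tuple(min(si, ti) for si, ti in zip(s, t))

def in_rectangle(x, s, t):
    """x ∈ [s, t] iff s ≼ x ≼ t, i.e. x lies in the product of intervals."""
    return leq(s, x) and leq(x, t)

s, t = (0.0, 1.0), (2.0, 3.0)
assert leq(s, t) and not leq(t, s)
assert meet((1, 5), (2, 3)) == (1, 3)
assert in_rectangle((1.0, 2.0), s, t)

# Unlike the case m = 1, some pairs are incomparable:
assert not leq((0, 1), (1, 0)) and not leq((1, 0), (0, 1))
```

This incomparability is precisely what makes multiparameter "time" different from linear time, and it resurfaces throughout the book (e.g., in the theory of orthomartingales).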
Probability and Measure Theory
Unless it is stated to the contrary, the underlying probability space is nearly always denoted by (Ω, G, P), where Ω is the so-called sample space, G is a σ-field of subsets of Ω, and P is a probability measure on G. Unless it is specifically stated otherwise, the corresponding expectation operator is always denoted by E.

While intersections of σ-fields are themselves σ-fields, their unions are not always. Thus, when F1 and F2 are two σ-fields, we write F1 ∨ F2 for the smallest σ-field that contains F1 ∪ F2. More generally, for any index set A, we write ∨_{α∈A} F_α for the smallest σ-field that contains all of the σ-fields (F_α; α ∈ A).

An important function is the indicator function (called the characteristic function in the analysis literature): In any space, for any set A in that space, 1_A denotes the function x → 1_A(x) that is defined by 1_A(x) = 1 if x ∈ A and 1_A(x) = 0 if x ∉ A. In particular, if A is an event in the probability space, 1_A is the indicator function of the event A.
Throughout, "a.s." is treated synonymously with "almost surely," "P-almost everywhere," "almost sure," or "P-almost sure," depending on which is more applicable. Both "iid" and "i.i.d." stand for "independent, identically distributed."

When µ is a (nonnegative) measure on a measure space (Ω, F), L^p(µ) denotes the collection of all real-valued, p-times µ-integrable functions on Ω. In particular, we use this notation quite often when µ is a probability measure P. In this case, we can interpret L^p(P) as the collection of all random variables whose absolute value has a finite pth moment. (You should recall that L^p spaces are in fact spaces of equivalence classes, where we say that f and g are equivalent when they agree µ-almost everywhere.) A special case is made for the L^p spaces with respect to Lebesgue's measure (on Euclidean spaces): When E ⊂ R^k is Borel (or, more generally, Lebesgue) measurable, L^p(E) (or sometimes L^p E) denotes the collection of all p-times Lebesgue-integrable functions that map E into R. For instance, we write L^p[0, 1] and/or L^p([0, 1]) for the collection of (equivalence classes of) all p-times integrable functions on [0, 1].

Depending on the point that is being made, a stochastic process (X_t; t ∈ T) (where T is some indexing set) is identified with the "randomly chosen function" t → X_t. Throughout, Leb denotes Lebesgue's measure, regardless of the dimension of the underlying Euclidean space.
Part I
Discrete-Parameter Random Fields
1 Discrete-Parameter Martingales
In this chapter we develop the basic theory of multiparameter martingales indexed by a countable subset of R^N_+, usually N^N or N_0^N. As usual, N = {1, 2, ...}, N_0 = {0, 1, 2, ...}, and N denotes a fixed positive integer. We will be assuming that the reader is quite familiar with the standard aspects of the theory of martingales indexed by a discrete one-parameter set. However, we have provided a primer section on this subject to reacquaint the reader with some of the one-parameter techniques and notations; see Section 1 below.

The main thrust of this chapter is its discussion of maximal inequalities, since they are at the very heart of the theory of multiparameter martingales. Even in the simple setting of one-parameter random walks, maximal inequalities are considered deep results to this day. For instance, consider n i.i.d. random variables ξ_1, ..., ξ_n and define the corresponding random walk as S_n = ξ_1 + ··· + ξ_n. Then, Kolmogorov's maximal inequality states that whenever the ξ's have mean 0, for all λ > 0,

   P{ max_{1≤j≤n} |S_j| ≥ λ } ≤ (1/λ) E{|S_n|}.

On the other hand, a straightforward application of Chebyshev's inequality implies that for all λ > 0, P{|S_n| ≥ λ} ≤ (1/λ) E{|S_n|}, and this is sharp, in general. Thus, roughly speaking, Kolmogorov's maximal inequality asserts that the behavior of the entire partial sum process j → S_j (1 ≤ j ≤ n) is controlled by the value of the latter process at time n.

The aforementioned maximal property leads to one of the key steps in the usual proof of Kolmogorov's strong law of large numbers and is by no means a singular property. Indeed, maximal inequalities are precisely the tools required to prove strong convergence theorems in the theory of multiparameter martingales (Cairoli's second convergence theorem, Theorem 2.8.1), as well as Lebesgue's differentiation theorem (Theorem 2.1.1, Chapter 2). They are also an essential
ingredient in the analysis of the intersections of several random walks (Theorem 2.2.1, Chapter 3), the law of the iterated logarithm (Section 2.6, Chapter 4), regularity theory of general stochastic processes (Section 2.4, Chapter 5), and weak convergence of measures and processes in the space of continuous functions (Theorem 3.3.1, Chapter 6). You might have noted that we have chosen a discrete-time example for every chapter in Part I of this book! Suffice it to say that Part II also relies heavily on maximal inequalities, and we now proceed with our treatment of discrete-parameter martingales without further ado. Let (Ω, G, P) be a probability space. In this chapter we will discuss martingales on (Ω, G, P) that are indexed by N_0^N and sometimes N^N. Before proceeding further, the reader should become familiar with the notation described in the preamble to this book.
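The maximal inequality just quoted is easy to probe numerically. The following Python sketch (ours, not part of the text; all names are invented for illustration) estimates both sides of P(max_{1≤j≤n} |S_j| ≥ λ) ≤ (1/λ)E{|S_n|} for a random walk with ±1 steps.

```python
import random

def kolmogorov_check(n=50, trials=20000, lam=10.0, seed=1):
    """Estimate P(max_j |S_j| >= lam) and the bound E|S_n| / lam
    for a random walk with i.i.d. mean-zero +-1 steps."""
    rng = random.Random(seed)
    hits, abs_end = 0, 0.0
    for _ in range(trials):
        s, running_max = 0, 0
        for _ in range(n):
            s += rng.choice((-1, 1))
            running_max = max(running_max, abs(s))
        hits += (running_max >= lam)
        abs_end += abs(s)
    return hits / trials, abs_end / (trials * lam)

p_max, bound = kolmogorov_check()
assert p_max <= bound  # the maximal inequality, with ample Monte Carlo margin
```

With these parameters the estimated probability sits well below the bound, illustrating how the terminal value at time n controls the whole path.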
1 One-Parameter Martingales

This section is not a comprehensive introduction to one-parameter martingales. Rather, it serves to remind the reader of some of the key methods and concepts, as well as to familiarize him or her with the notation. Further details can be found in the Supplementary Exercises.
1.1 Definitions

Suppose F = (F_k; k ≥ 0) is a collection of sub-σ-fields of the underlying σ-field G. We say that F is a (discrete-time, one-parameter) filtration if for all k ≥ 0, F_k ⊂ F_{k+1}. By a stochastic process, or a random process, we mean a collection of random variables that are indexed by a possibly arbitrary set. A stochastic process M = (M_k; k ≥ 0) is adapted to the filtration F if for all k ≥ 0, M_k is F_k-measurable. A stochastic process M is a submartingale (with respect to the filtration F) if

1. M is adapted to F;
2. for all k ≥ 0, E{|M_k|} < ∞; that is, M_k ∈ L1(P) for all k ≥ 0; and
3. for all k ≥ 0, E[M_{k+1} | F_k] ≥ M_k, almost surely.

We say that M is a supermartingale if −M is a submartingale. If M is both a supermartingale and a submartingale, then it is a martingale. We will refer to M as a smartingale if it is either a sub- or a supermartingale. By Jensen's inequality, if M is a nonnegative submartingale, Ψ is convex nondecreasing on [0, ∞[, and if E{|Ψ(M_k)|} < ∞ for all k ≥ 0, then (Ψ(M_k); k ≥ 0) is also a submartingale.
Exercise 1.1.1 (Random Walks) Suppose ξ_1, ξ_2, ... are i.i.d. mean-0 random variables. Let S_n = Σ_{j=1}^n ξ_j (n = 1, 2, ...) denote the corresponding partial sum process, also known as a random walk. If S_0 = 0 and if F_n denotes the σ-field generated by ξ_1, ..., ξ_n (n = 0, 1, 2, ...), show that S = (S_n; n ≥ 0) is a martingale with respect to F = (F_k; k ≥ 0).

Exercise 1.1.2 (Branching Processes) Write X_0 = 1 and define X_{n+1} = Σ_{ℓ=1}^{X_n} ξ_{ℓ,n} (n = 0, 1, ...), where the ξ's are all i.i.d., integrable random variables that take their values in N_0. Let F_m denote the σ-field generated by X_1, ..., X_m (m = 0, 1, ...) and show that (μ^{−n} X_n; n ≥ 0) is a martingale with respect to F = (F_n; n ≥ 0), where μ = E[ξ_{1,1}]. The stochastic process X is a branching process and, quite often, arises in the modeling of biological systems. For instance, suppose each individual gene in a given generation has a chance to give birth to a random number of genes (its offspring in the following generation). If the birth mechanisms are all i.i.d., from individual to individual and from generation to generation, and if the entire population starts out with one individual in generation 0, the above X_n denotes the total number of individuals in generation n, for appropriately chosen birth numbers.

Exercise 1.1.3 (Random Stick-Breaking) Let X_0 = 1 denote the length of a stick at time 0. Let X_1 be a random variable picked uniformly from [0, X_0] and, conditionally on {X_1, ..., X_n}, define X_{n+1} to be picked
uniformly from [0, X_n]. Show that (2^n X_n; n ≥ 0) is a martingale with respect to F = (F_n; n ≥ 0), where F_n denotes the σ-field generated by X_1, ..., X_n. Among other things, this process models breaking the "stick" [0, 1] in successive steps.

Suppose T is a random variable that takes its values in N_0 ∪ {∞}. Then, we say that T is a stopping time (with respect to F) if for all k ≥ 0, (T > k) ∈ F_k. Equivalently, T is a stopping time if for all k ≥ 0, (T = k) ∈ F_k. This is a consequence of the properties of filtrations, together with the decomposition
(T = k) = (T > k − 1) ∩ (T > k)^c.

In order to emphasize the underlying filtration F, sometimes we say that T is an F-stopping time. There are many natural stopping times, as the following shows.

Exercise 1.1.4 Nonrandom constants are stopping times. Also, if T_1, T_2, ... are stopping times, so are
• min(T_1, T_2) = T_1 ∧ T_2;
• max(T_1, T_2) = T_1 ∨ T_2;
• T_1 + T_2;
• inf_n T_n, sup_n T_n, lim inf_n T_n, and lim sup_n T_n.

Finally, if X is an adapted process and A is a Borel subset of R, then T is a stopping time, where T = inf(k ≥ 0 : X_k ∈ A) and inf ∅ = ∞. For any stopping time T, define

F_T = { A ∈ ∨_{n=1}^∞ F_n : A ∩ (T ≤ k) ∈ F_k, for all k ≥ 0 }.
Exercise 1.1.5 Show that (a) this extends the definition of F_k when T ≡ k for some nonrandom k; and (b) an alternative definition of F_T would be

F_T = { A ∈ ∨_{n=1}^∞ F_n : A ∩ (T = k) ∈ F_k, for all k ≥ 0 }.
Also show that, in general, F_T is a σ-field and T is an F_T-measurable random variable. Furthermore, show that when T is an a.s. finite F-stopping time and when M is adapted to F, M_T is an F_T-measurable random variable.1 Finally, demonstrate that whenever T_1 and T_2 are stopping times that satisfy T_1 ≤ T_2, we have F_{T_1} ⊂ F_{T_2}. The choice of the σ-field F_T is made to preserve many of the fundamental properties of martingales. We conclude this subsection with an example of this phenomenon. Recall that 1l_A is the indicator function of the measurable event A.

Theorem 1.1.1 Suppose F = (F_k; k ≥ 1) is a filtration and Y is an integrable random variable. Define M = (M_k; k ≥ 1) by M_k = E[Y | F_k]. Then, M is a martingale with respect to F. Moreover, if T is a stopping time with respect to F, M_T 1l_{(T<∞)} = E[Y | F_T] 1l_{(T<∞)}, a.s. Equivalently, for any j ≥ 1, E[Y | F_j] = E[Y | F_T], on (T = j). It is important to note that there is something that needs to be proved here. This is not a mere consequence of "replacing" F_T by F_j on (T = j).

Proof Clearly, M is adapted. By Jensen's inequality, sup_k E{|M_k|} ≤ E{|Y|} < ∞.
1 Recall that the random variable M_T is defined ω by ω as M_T(ω) = M_{T(ω)}(ω).
Recall the towering property of conditional expectations: If H_1 ⊂ H_2 are two σ-fields, then for any integrable random variable U,

(1)   E[U | H_1] = E[ E{U | H_2} | H_1 ] = E[ E{U | H_1} | H_2 ],   almost surely.

The martingale property of M follows from the towering property of conditional expectations, together with the fact that F_k ⊂ F_{k+1}. For any A ∈ F_T,

E[ E{Y | F_T} 1l_{(T=j)} 1l_A ] = E[ Y 1l_{(T=j)} 1l_A ],

since A ∩ (T = j) ∈ F_T. On the other hand, A ∩ (T = j) ∈ F_j implies that

E[ Y 1l_{(T=j)} 1l_A ] = E[ E{Y | F_j} 1l_{(T=j)} 1l_A ] = E[ M_T 1l_{(T=j)} 1l_A ].

Since M_T 1l_{(T=j)} and E[Y | F_T] 1l_{(T=j)} are both F_T-measurable random variables, the result follows.
1.2 The Optional Stopping Theorem

Perhaps the single most important fact about smartingales is Doob's optional stopping theorem;2 see (Doob 1990; Hunt 1966).

Theorem 1.2.1 (The Optional Stopping Theorem) Suppose M is a submartingale with respect to a filtration F. Then, for any integer k ≥ 0 and all F-stopping times T_1 and T_2 with T_1 ≤ T_2 ≤ k,

E[ M_{T_2} | F_{T_1} ] ≥ M_{T_1},   a.s.
Proof For any A ∈ F_{T_1},

E[ (M_{T_2} − M_{T_1}) 1l_A ] = E[ Σ_{j=T_1}^{T_2 − 1} (M_{j+1} − M_j) 1l_A ]
  = Σ_{j=0}^{k} E[ (M_{j+1} − M_j) 1l_{A ∩ (T_1 ≤ j < T_2)} ].

On the other hand, for all j ≥ 0, A ∩ (T_1 ≤ j) and (T_2 > j) are both in F_j. Hence, so is B = A ∩ (T_1 ≤ j < T_2). By the submartingale property, E[ (M_{j+1} − M_j) 1l_B ] ≥ 0. This shows that for all A ∈ F_{T_1}, E[ M_{T_1} 1l_A ] ≤ E[ M_{T_2} 1l_A ], which is the desired conclusion.
2 In the general theory of processes, stopping times are known as optional times, whence the name.
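The theorem can also be illustrated numerically. The sketch below (our code; the exit levels are arbitrary choices) applies optional stopping to a ±1 walk S, a martingale, with T the first exit time from the interval (−a, b); applying Theorem 1.2.1 to the submartingales ±S forces E[S_{T∧k}] = E[S_0] = 0.

```python
import random

def optional_stopping_demo(k=200, trials=20000, a=5, b=5, seed=2):
    """Estimate E[S_{T ^ k}] for a +-1 random walk S and the stopping
    time T = inf(j >= 0 : S_j <= -a or S_j >= b)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        s = 0
        for _ in range(k):
            if s <= -a or s >= b:
                break  # T has occurred; the walk is frozen at S_T
            s += rng.choice((-1, 1))
        total += s
    return total / trials

est = optional_stopping_demo()
assert abs(est) < 0.15  # E[S_{T^k}] = 0, up to Monte Carlo error
```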
Exercise 1.2.1 (Doob's Decomposition) Suppose X_1, X_2, ... are integrable random variables and let F_k denote the σ-field generated by X_1, ..., X_k (k = 1, 2, ...). Let X_0 = E[X_1] and let F_0 = {∅, Ω} be the trivial σ-field. Show that M = (M_k; k ≥ 1) is a martingale with respect to F = (F_k; k ≥ 1), where

M_k = Σ_{ℓ=1}^{k} ( X_ℓ − E[X_ℓ | F_{ℓ−1}] ),   k ≥ 1.
In particular, show that every submartingale can be written as the sum of a martingale and an increasing process. Exercise 1.2.2 Use Theorem 1.1.1 and Exercise 1.2.1, in conjunction, to give another proof of the optional stopping theorem, Theorem 1.2.1. This proof is due to G. A. Hunt; cf. (Dellacherie and Meyer 1982; Hunt 1966) for this and much more. The following, perhaps more standard, form of the optional stopping theorem is an immediate corollary of the above: Corollary 1.2.1 If M is a submartingale and T is an F-stopping time, then (MT ∧k ; k ≥ 0) is a submartingale. Many of the important results of martingale theory are consequences of clever applications of the above. We list some of them in order.
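Exercise 1.2.1 can be made concrete. For a ±1 random walk S, the submartingale X_k = S_k² satisfies E[X_k − X_{k−1} | F_{k−1}] = 1, so its Doob decomposition is S_k² = M_k + k, with M_k = S_k² − k the martingale part. The Python sketch below (ours, for illustration only) checks E[M_n] = 0 by simulation.

```python
import random

def doob_decomposition_demo(n=30, trials=20000, seed=3):
    """For a +-1 walk, estimate E[S_n^2 - n]; the martingale part of
    the Doob decomposition of S^2 has constant expectation 0."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        s = sum(rng.choice((-1, 1)) for _ in range(n))
        total += s * s - n
    return total / trials

est = doob_decomposition_demo()
assert abs(est) < 1.5  # E[S_n^2 - n] = 0, up to Monte Carlo error
```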
1.3 A Weak (1,1) Inequality

Suppose M is a submartingale. A strong (p, q) maximal inequality is one of the form E[max_{0≤i≤k} |M_i|^p] ≤ C E{|M_k|^q}, where C is a positive finite constant that may depend only on p and q. When p = q = 1, such a result typically does not hold. Instead, one ought to prove a weak (1,1) inequality that states that the tails of the distribution of the maximum of a submartingale are well controlled by the size of the expectation of the submartingale. More precisely, we have the following theorem.

Theorem 1.3.1 (Doob's Maximal Inequality) If M is a nonnegative submartingale, λ > 0 is a real number, and k ≥ 0 is an integer, then

P(M_k* ≥ λ) ≤ (1/λ) E[ M_k 1l_{(M_k* ≥ λ)} ],

where M_k* = max_{0≤i≤k} M_i. Typically, one uses the above as follows: Under the above conditions, for all λ > 0 and k ≥ 0,

P(M_k* ≥ λ) ≤ (1/λ) E[M_k].
Proof Define T = inf(i ≥ 0 : M_i ≥ λ), where inf ∅ = ∞. Then, T is an F-stopping time, and (M_k* ≥ λ) is nothing but the event (T ≤ k). Note that M is nonnegative. Moreover, on (T < ∞), M_T ≥ λ. Consequently, M_{T∧k} 1l_{(T≤k)} ≥ λ · 1l_{(T≤k)}. Since (T ≤ k) ∈ F_{T∧k} and T ∧ k ≤ k, we can apply Theorem 1.2.1 to the stopping times T ∧ k and k to see that

E[ M_{T∧k} 1l_{(T≤k)} ] ≤ E[ M_k 1l_{(T≤k)} ].

This implies the result.
1.4 A Strong (p, p) Inequality

It turns out that Theorem 1.3.1 implies a strong (p, p) inequality if p > 1.

Theorem 1.4.1 (Doob's Strong Lp(P) Inequality) If p > 1, then for any nonnegative submartingale M = (M_k; k ≥ 0) and all integers k ≥ 0,

E[ max_{0≤i≤k} M_i^p ] ≤ (p/(p−1))^p E[M_k^p].

Proof Without loss of generality, we may assume that E[M_k^p] < ∞; otherwise, there is nothing to prove. Since p > 1, Jensen's inequality shows that M^p is also a submartingale. In particular, check that

E[ max_{0≤i≤k} M_i^p ] ≤ Σ_{j=0}^{k} E[M_j^p] ≤ (k + 1) E[M_k^p] < ∞.

We integrate Theorem 1.3.1 by parts and use the notation there to obtain

E{|M_k*|^p} = p ∫_0^∞ λ^{p−1} P(M_k* ≥ λ) dλ ≤ p E[ M_k ∫_0^{M_k*} λ^{p−2} dλ ].

We have used Fubini's theorem. Therefore,

E{|M_k*|^p} ≤ (p/(p−1)) E{ |M_k*|^{p−1} · M_k } ≤ (p/(p−1)) ( E{|M_k*|^p} )^{(p−1)/p} ( E[M_k^p] )^{1/p},

by Hölder's inequality. The result follows readily from this.
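For p = 2 the theorem reads E[max_{0≤i≤k} M_i²] ≤ 4 E[M_k²]. Taking M = |S| for a ±1 random walk S, the following sketch (our code, not the book's) estimates both sides.

```python
import random

def strong_22_check(k=100, trials=10000, seed=4):
    """Estimate E[max_i S_i^2] and 4 E[S_k^2] for a +-1 random walk;
    Doob's strong (2,2) inequality says the first is at most the second."""
    rng = random.Random(seed)
    max_sq, end_sq = 0.0, 0.0
    for _ in range(trials):
        s, running_max = 0, 0
        for _ in range(k):
            s += rng.choice((-1, 1))
            running_max = max(running_max, abs(s))
        max_sq += running_max ** 2
        end_sq += s * s
    return max_sq / trials, 4 * end_sq / trials

lhs, rhs = strong_22_check()
assert lhs <= rhs
```

In simulations of this kind the left side typically sits far from saturating the worst-case constant 4.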
1.5 The Case p = 1

Our proof of Theorem 1.4.1 breaks down when p = 1. However, it does show that

E[ M_k* 1l_{(M_k* ≥ 1)} ] = ∫_1^∞ P(M_k* ≥ λ) dλ ≤ ∫_1^∞ (1/λ) E[ M_k 1l_{(M_k* ≥ λ)} ] dλ
  = E[ M_k 1l_{(M_k* ≥ 1)} ∫_1^{M_k*} (1/λ) dλ ] = E[ M_k ln⁺ M_k* ],
where ln⁺ x = ln(x ∨ 1). Therefore, E[M_k*] ≤ 1 + E[M_k ln⁺ M_k*]. We will have need for the following inequality: For all w ≥ 0, ln w ≤ (1/e)w.3 Apply this with w = y/x to see that whenever 0 ≤ x ≤ y, x ln y ≤ x ln⁺ x + (1/e)y. We have proven the following weak (1, 1) inequality.

Theorem 1.5.1 If M = (M_k; k ≥ 0) is a nonnegative submartingale and k ≥ 0,

E[ max_{0≤i≤k} M_i ] ≤ (e/(e−1)) ( 1 + E[M_k ln⁺ M_k] ).

Exercise 1.5.1 Suppose X and Y are nonnegative random variables such that for all λ > 0,

P(Y > λ) ≤ (1/λ) E[ X 1l_{(Y>λ)} ].

Show that if E[X {ln⁺ X}^k] is finite for some k ≥ 1, so is E[Y {ln⁺ Y}^{k−1}]. This is from Sucheston (1983, Corollary 1.4).

Exercise 1.5.2 Suppose M^1, M^2, ... are nonnegative submartingales, and that for any j ≥ 0, lim_{n→∞} E{Ψ(M_j^n)} = 0, where Ψ(x) = x ln⁺ x. Prove that for any integer k ≥ 0,

lim_{n→∞} max_{j≤k} M_j^n = 0,

where the limit takes place in L1(P).
1.6 Upcrossing Inequalities

Given a submartingale M and two real numbers a < b, let T_0 = inf(ℓ ≥ 0 : M_ℓ ≤ a), and for all j ≥ 1, define

T_j = inf(ℓ > T_{j−1} : M_ℓ ≥ b), if j is odd,
T_j = inf(ℓ > T_{j−1} : M_ℓ ≤ a), if j is even,

with the usual stipulation that inf ∅ = ∞. These are the upcrossing times of the interval [a, b]. Furthermore, for all j ≥ 1, M_{T_j} ≥ b if j is odd and M_{T_j} ≤ a if j is even. For all k ≥ 1, define

U_k[a, b] = sup( j ≥ 1 : T_{2j−1} ≤ k ),

3 To verify this, you need only check that g(w) = ln w − (1/e)w is maximized at w = e and g(e) = 0.
where sup ∅ = 0. In words, U_k[a, b] represents the total number of upcrossings of the interval [a, b] made by the (random) numerical sequence (M_i; 0 ≤ i ≤ k).

Theorem 1.6.1 (Doob's Upcrossing Inequality) If M = (M_k; k ≥ 0) is a submartingale, a < b, and k ≥ 0, then

E{U_k[a, b]} ≤ ( |a| + E{|M_k|} ) / (b − a).
It is also possible to derive downcrossing inequalities for supermartingales; cf. Exercise 1.6.1 below.

Proof By considering the submartingale (M_k − a)^+ instead of M, we can assume without loss of generality that a = 0 and that M is nonnegative. For all j ≥ 0, define τ_j = T_j ∧ k, and note that the τ_j's are nondecreasing, bounded stopping times. Clearly,

(1)   Σ_{j even} [ M_{τ_{j+1}} − M_{τ_j} ] ≥ b U_k[0, b].
On the other hand, by the optional stopping theorem (Theorem 1.2.1), E[M_{τ_j}] is increasing in j. Thus, since we have a finite sum, we can interchange expectations with sums to obtain

E[ Σ_{j odd} ( M_{τ_{j+1}} − M_{τ_j} ) ] = Σ_{j odd} E[ M_{τ_{j+1}} − M_{τ_j} ] ≥ 0.

Adding this to equation (1) above, we obtain the result. (Why? That is, what if for all j, τ_j < k?)

Exercise 1.6.1 Let M = (M_k; k ≥ 0) denote a supermartingale with respect to a filtration F = (F_k; k ≥ 0). Show that whenever 0 ≤ a < b,

E{D_k[a, b]} ≤ ( E[M_0 ∧ b] − E[M_k ∧ b] ) / (b − a),
where D_k[a, b] denotes the number of downcrossings of the interval [a, b] made by the sequence (M_i; 0 ≤ i ≤ k); i.e., D_k[a, b] = U_k[−b, −a], the latter computed for the submartingale −M.

Doob's upcrossing inequality contains a prefatory but still useful formulation of the weak (1,1) inequality. Indeed, note that in the special case where M is a martingale,

P( max_{0≤j≤k} |M_j| ≥ λ ) ≤ P( |M_0| ≥ ½λ ) + P( max_{0≤j≤k} |M_j − M_0| ≥ ½λ ).

But |M_j − M_0| is a nonnegative submartingale. Let U_k(λ) denote the number of upcrossings of [0, λ] made by this submartingale before time k. Then,
by first applying Chebyshev's inequality and then Doob's upcrossing inequality (Theorem 1.6.1), we obtain

P( max_{0≤j≤k} |M_j| ≥ λ ) ≤ P( |M_0| ≥ ½λ ) + P( U_k(½λ) ≥ 1 )
  ≤ (2/λ) E{|M_0|} + E[ U_k(½λ) ]
  ≤ (2/λ) E{|M_0|} + (2/λ) E{|M_k − M_0|}
  ≤ (4/λ) E{|M_0|} + (2/λ) E{|M_k|}.

Since E{|M_0|} ≤ E{|M_k|}, we obtain

P( max_{0≤j≤k} |M_j| ≥ λ ) ≤ (6/λ) E{|M_k|}.

This is essentially the weak (1,1) inequality (Theorem 1.3.1) with a slightly worse constant. (If M_0 = 0, check that in the above, the constant 6 can be improved to the constant 1 of Theorem 1.3.1.)
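The quantity U_k[a, b] is easy to compute along any sample path. A small Python helper (ours, for experimentation only) counts completed upcrossings; one can feed simulated martingale paths to it and compare the average count with the bound of Theorem 1.6.1.

```python
def upcrossings(path, a, b):
    """Count completed upcrossings of [a, b] by the sequence `path`:
    alternately wait for a value <= a, then for a value >= b."""
    count, below = 0, False
    for x in path:
        if not below:
            if x <= a:
                below = True
        elif x >= b:
            below = False
            count += 1
    return count

assert upcrossings([0, 2, 0, 2, 0], 0, 2) == 2
assert upcrossings([1, 1, 1], 0, 2) == 0
```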
1.7 The Martingale Convergence Theorem

Recall that a stochastic process X = (X_k; k ≥ 0) is Lp(P)-bounded (or bounded in Lp(P)) for some p > 0 if sup_{k≥0} E{|X_k|^p} < ∞.

Theorem 1.7.1 (Doob's Convergence Theorem) If M is an L1(P)-bounded submartingale, then M_∞ = lim_{k→∞} M_k exists almost surely, and M_∞ ∈ L1(P). If M is uniformly integrable, then M_k → M_∞ in L1(P) and M_k ≤ E[M_∞ | F_k], a.s. Finally, if M is Lp(P)-bounded for some p > 1, then M_k → M_∞, in Lp(P), as well.

Among other things, Chatterji (1967, 1968) has constructed a proof of Doob's convergence theorem that is based solely on Doob's maximal inequality (Theorem 1.3.1) and the optional stopping theorem (Theorem 1.2.1). This proof was later discovered independently in Lamb (1973) and is developed in Supplementary Exercise 3. The argument described below is closer to the original proof of J. L. Doob and is based on the upcrossing inequality.

Proof Note that k → U_k[a, b] is an increasing map. Therefore, by Lebesgue's monotone convergence theorem, if M is bounded in L1(P), Doob's upcrossing inequality (Theorem 1.6.1) implies that for all a < b,

(1)   E[ sup_{k≥0} U_k[a, b] ] < ∞.
For all a < b, define the measurable event

N_{a,b} = ( lim inf_k M_k ≤ a < b ≤ lim sup_k M_k ).
Then, by equation (1), P(N_{a,b}) = 0 for all a < b, for otherwise, with positive probability, [a, b] is upcrossed infinitely many times, which would contradict equation (1). Consequently,

P( ∪_{a,b∈Q: a<b} N_{a,b} ) = 0,

where, as usual, Q denotes the collection of all rationals. Thus, with probability one, lim sup_k M_k = lim inf_k M_k, which gives the desired limit M_∞. Moreover, this is finite, almost surely. In fact, applying Fatou's lemma, we arrive at E{|M_∞|} < ∞. If M is uniformly integrable, then we immediately have M_k → M_∞ in L1(P). If M is also Lp(P)-bounded for some p > 1, then, by the strong (p, p) inequality (Theorem 1.4.1), sup_k |M_k| is in Lp(P). In particular, |M_k|^p is uniformly integrable. It remains to prove that under uniform integrability, M_k ≤ E{M_∞ | F_k}, a.s. For k ≤ ℓ, and for any λ > 0,

M_k ≤ E{M_ℓ | F_k} = E{ M_ℓ 1l_{(|M_ℓ|≤λ)} | F_k } + E{ M_ℓ 1l_{(|M_ℓ|>λ)} | F_k },   a.s.

By the bounded convergence theorem for conditional expectations, with probability one,

lim_{ℓ→∞} E{ M_ℓ 1l_{(|M_ℓ|≤λ)} | F_k } = E{ M_∞ 1l_{(|M_∞|≤λ)} | F_k }.

On the other hand, M_∞ ∈ L1(P) and the dominated convergence theorem for conditional expectations together reveal that, as λ → ∞, the right-hand side of the above display converges to E{M_∞ | F_k}, a.s. It suffices to show that a.s.,

lim inf_{λ→∞} lim inf_{ℓ→∞} E{ |M_ℓ| 1l_{(|M_ℓ|>λ)} | F_k } = 0.
But this follows from the following formulation of uniform integrability: lim_{λ→∞} sup_{ℓ≥1} E{ |M_ℓ| 1l_{(|M_ℓ|>λ)} } = 0.

Exercise 1.7.1 Consider the branching process of Exercise 1.1.2 and show that when μ < 1, lim_{n→∞} X_n = 0, a.s., and when μ > 1, lim_{n→∞} X_n = +∞ with positive probability. That is, if the mean number of progeny is strictly less (greater) than one, then the population will eventually die out (blow up).

Exercise 1.7.2 Consider the stick-breaking process of Exercise 1.1.3 and show that lim_{n→∞} X_n exists, a.s. Identify this limit.
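A simulation makes the dichotomy of Exercise 1.7.1 visible. In the sketch below (our code; the Bernoulli offspring law is chosen only for illustration), the mean number of progeny is μ = 0.4 < 1, and every simulated population dies out well before generation 60.

```python
import random

def branching_population(offspring, generations, rng):
    """One trajectory of a branching process started from X_0 = 1;
    offspring(rng) draws a single i.i.d. offspring count."""
    x = 1
    for _ in range(generations):
        x = sum(offspring(rng) for _ in range(x))
        if x == 0:
            break  # extinction; the population stays at 0
    return x

rng = random.Random(5)
# Subcritical case: offspring ~ Bernoulli(0.4), so mu = 0.4 < 1.
extinct = sum(
    branching_population(lambda r: int(r.random() < 0.4), 60, rng) == 0
    for _ in range(200)
)
assert extinct == 200
```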
Exercise 1.7.3 Prove that every positive supermartingale converges. (Hint: Try Exercise 1.6.1 first.)

We conclude this section with an important application of the martingale convergence theorem. The tail σ-field T corresponding to the collection of σ-fields X = (X_n; n ≥ 1) is defined as

T = ∩_{m=1}^{∞} ∨_{n=m}^{∞} X_n.
We say that T is the tail σ-field of the random variables X_1, X_2, ... when T is the tail σ-field of σ(X_1), σ(X_2), .... We now present a martingale proof of Kolmogorov's 0-1 law. In words, it states that the tail σ-field corresponding to independent random variables is trivial.

Corollary 1.7.1 (Kolmogorov's 0-1 Law) If T denotes the tail σ-field corresponding to independent random variables X_1, X_2, ..., then for all events Λ ∈ T, P(Λ) is identically 0 or 1.

Proof Let F = (F_k; k ≥ 1) denote the filtration generated by the X_i's; that is, F_k = ∨_{ℓ=1}^k σ(X_ℓ). Since for all m > n, ∨_{k=m}^∞ σ(X_k) is independent of F_n, F_n and T are independent. In particular, for any Λ ∈ T, P(Λ) = P(Λ | F_n), a.s. The latter defines a bounded martingale (indexed by n), which, by Doob's martingale convergence theorem, converges, a.s., to P(Λ | ∨_n F_n). Since Λ is ∨_m F_m-measurable, this almost surely equals 1l_Λ. We have shown that with probability one, P(Λ) = 1l_Λ, which proves our result.

Exercise 1.7.4 It is possible to devise measure-theoretic proofs of Kolmogorov's 0-1 law, as we shall find in this exercise. In the notation of our proof of Kolmogorov's 0-1 law, show that for all Λ ∈ T and for all ε ∈ ]0, 1[, there exists some n ≥ 1 large enough and some event Λ_n ∈ ∨_{i=1}^n σ(X_i) such that P(Λ △ Λ_n) ≤ ε, where △ denotes the symmetric difference operation. Now use this to verify that P(Λ ∩ Λ) = {P(Λ)}². The following difficult exercise should not be neglected.

Exercise 1.7.5 (Hewitt–Savage 0-1 Law) (Hard) Recall that a function f : R^m → R is said to be symmetric if for all permutations π of {1, ..., m} and for all x_1, ..., x_m ∈ R, f(x_1, ..., x_m) = f(x_{π(1)}, ..., x_{π(m)}). Now suppose that X_1, X_2, ... are independent random variables—all on one probability space (Ω, G, P)—and for all m ≥ 1, define E_{−m} to be the σ-field generated by all random variables of the form f(X_1, ..., X_m), where f : R^m → R is measurable and symmetric.
The σ-field E = ∩_m E_{−m} is called the exchangeable σ-field. Our goal is to prove the Hewitt–Savage 0-1 law: "E is trivial."

(i) If S_n = Σ_{j=1}^n X_j, show that the event (S_n > 0, infinitely often) is exchangeable. Also, prove that the tail σ-field of X_1, X_2, ... is
in E, although the converse need not hold. Moreover, check that (E_{−m}; m ≥ 1) is a filtration.

(ii) Consider integers n ≥ m ≥ 1 and a bounded, symmetric function f : R^m → R. Show that for any 1 ≤ i_1, ..., i_m ≤ n, a.s.,

E[ f(X_{i_1}, ..., X_{i_m}) | E_{−n} ] = E[ f(X_1, ..., X_m) | E_{−n} ].

(iii) Given integers n ≥ m ≥ 1 and a bounded, symmetric f : R^m → R, define

U_n = (n choose m)^{−1} Σ_{C(n)} f(X_{i_1}, ..., X_{i_m}),

where C(n) represents summation over all (n choose m) combinations of m distinct elements {i_1, ..., i_m} from {1, ..., n}. Show that U_n is E_{−n}-measurable. Conclude that as n → ∞, U_n almost surely converges to E[f(X_1, ..., X_m) | E].

(iv) Show that lim_n (n choose m)^{−1} Σ_{C(n;1)} f(X_{i_1}, ..., X_{i_m}) = 0, a.s., where C(n;1) represents summation over all combinations of m distinct elements {i_1, ..., i_m} from {1, ..., n} such that one of the i_j's equals 1. Conclude that E[f(X_1, ..., X_m) | E] is independent of X_1. Extend this argument to prove that E[f(X_1, ..., X_m) | E] is independent of (X_1, ..., X_m).
(v) Prove that E is independent of itself. Equivalently, for all Λ ∈ E, P(Λ) ∈ {0, 1}. (Hint: For part (ii), compute E[f (Xi1 , . . . , Xim ) · g(X1 , . . . , Xn )]; for part (iv), start by computing the cardinality of C(n; 1); for part (v), use (iv) to conclude that E is independent of X1 , X2 , . . . .)
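The symmetrized averages U_n of part (iii) can be computed directly for modest n. In the sketch below (ours, with an arbitrary choice of kernel), f(x, y) = xy and the X's are i.i.d. centered Gaussians, so E[f(X_1, X_2)] = 0 and U_n should be close to 0 for large n.

```python
import itertools
import random

def u_statistic(xs, m, f):
    """Symmetrized average of f over all m-subsets of xs
    (the U_n of part (iii), written out directly)."""
    combos = list(itertools.combinations(xs, m))
    return sum(f(*c) for c in combos) / len(combos)

rng = random.Random(6)
xs = [rng.gauss(0.0, 1.0) for _ in range(200)]
u = u_statistic(xs, 2, lambda x, y: x * y)
assert abs(u) < 0.2  # close to E[X1 * X2] = 0
```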
2 Orthomartingales: Aspects of the Cairoli–Walsh Theory

Now that the one-parameter theory is established, we are ready to tackle multiparameter smartingales. There are two sources of difficulty that need to be overcome, and they are both related to the fact that N_0^N—and more generally R_+^N—cannot be well-ordered in a useful way.4

4 The stress here is on the word useful, since the 1904 well-ordering theorem of E. Zermelo states that, under the influence of the axiom of choice, every set can be well-ordered. However, the structure of this well-ordering is not usually known and/or in line with the stochastic structure of the problem at hand.
The first problem is that there is no sensible way to uniquely define multiparameter stopping times. To illustrate, suppose (M_t; t ∈ N_0^N) is a collection of random variables. Then, it is not clear—and in general not true—that there is a uniquely defined first time t ∈ N_0^N such that M_t ≥ 0, say. The second source of difficulty with the multiparameter theory is that there are many different ways to define smartingales indexed by several parameters. In this book we will study two such definitions, both of which are due to R. Cairoli and J. B. Walsh. This material, together with other aspects of the theory of multiparameter martingales, can be found in Cairoli (1969, 1970a, 1970b, 1971, 1979), Cairoli and Gabriel (1979), Cairoli and Walsh (1975, 1978), Ledoux (1981), and Walsh (1979, 1986b). Later on, in Section 3, we will study multiparameter martingales. They are defined in complete analogy to one-parameter martingales, based on the conditional expectation formula

s ≼ t =⇒ E[M_t | F_s] = M_s,

together with the obvious measurability and integrability conditions. As we shall see throughout the remainder of this book, multiparameter smartingales are both natural and useful. Despite this, it is unfortunate that, when considered in absolute generality, multiparameter martingales have no structure to speak of. Nonetheless, it is possible to tame them when a so-called commutation hypothesis holds. To clarify these issues, we begin with a different—seemingly less natural—class of multiparameter smartingales that we call orthosmartingales. These are the second class of multiparameter smartingales encountered in this book and are the subject of this section.
2.1 Definitions and Examples

Let N be a positive integer, and throughout consider N (one-parameter) filtrations F^1, ..., F^N, where F^i = (F^i_k; k ≥ 0) (1 ≤ i ≤ N). A stochastic process M = (M_t; t ∈ N_0^N) is an orthosubmartingale if for each 1 ≤ i ≤ N and all nonnegative integers (t^{(j)}; 1 ≤ j ≤ N, j ≠ i), t^{(i)} → M_t is a one-parameter submartingale with respect to the one-parameter filtration F^i. The stochastic process M is an orthosupermartingale if −M is an orthosubmartingale. If M is both an orthosupermartingale and an orthosubmartingale, it is then an orthomartingale. Finally, we say that M is an orthosmartingale if it is either an orthosubmartingale or an orthosupermartingale.

For example, let us consider the case N = 2 and write the process M as M = (M_{i,j}; i, j ≥ 0). Then, M is an orthosubmartingale if

1. for all i, j ≥ 0, E{|M_{i,j}|} < ∞;

2. for all j ≥ 0, the one-parameter process i → M_{i,j} is adapted to the filtration F^1, while for each i ≥ 0, j → M_{i,j} is adapted to F^2; and
3. for all i, j ≥ 0, E[M_{i+1,j} | F^1_i] ≥ M_{i,j}, a.s., and E[M_{i,j+1} | F^2_j] ≥ M_{i,j}, a.s.

While the above conditions may seem stringent, we will see next that in fact, many such processes exist.

Example 1 Suppose X^ℓ = (X^ℓ_i; i ≥ 0) (1 ≤ ℓ ≤ N) are N stochastic processes that are adapted to independent filtrations M^ℓ = (M^ℓ_i; i ≥ 0) (1 ≤ ℓ ≤ N). We then say that X^1, ..., X^N are independent processes. Given that the X^ℓ's are all (super- or all sub- or all plain) martingales, we define the additive smartingale A as

A_t = Σ_{ℓ=1}^{N} X^ℓ_{t^{(ℓ)}},   t ∈ N_0^N.

We also define the multiplicative smartingale M as

M_t = Π_{ℓ=1}^{N} X^ℓ_{t^{(ℓ)}},   t ∈ N_0^N.

Then, it is possible to see that both M and A are orthosmartingales with respect to the F^ℓ's, where for all 1 ≤ ℓ ≤ N,

F^ℓ_k = M^ℓ_k ∨ ( ∨_{i=1, i≠ℓ}^{N} ∨_{m≥0} M^i_m ),   k ≥ 0.
In words, the filtration F^ℓ_k is the σ-field generated by {X^ℓ_i; 0 ≤ i ≤ k}, as well as all of the variables {X^m_i; m ≠ ℓ, i ≥ 0}. The following should certainly be attempted.

Exercise 2.1.1 Check that for each 1 ≤ ℓ ≤ N, F^ℓ is a one-parameter filtration. Conclude that additive smartingales and multiplicative smartingales are indeed orthosmartingales.

We mention another example, to which we will return in Chapter 4.

Example 2 Suppose X = (X_t; t ∈ N_0^N) are i.i.d. random variables. Define the multiparameter random walk S = (S_t; t ∈ N_0^N) based on these X's as

S_t = Σ_{s ≼ t} X_s,   t ∈ N_0^N.

Define one-parameter filtrations F^1, ..., F^N as follows: For all 1 ≤ i ≤ N,

F^i_k = σ( X_s ; s^{(i)} ≤ k ),   k ≥ 0,
where σ(···) denotes the σ-field generated by the random variables in the parentheses. Multiparameter random walks are related to orthomartingales in the same way that one-parameter random walks are to one-parameter martingales, as the following simple but important exercise reveals.

Exercise 2.1.2 Whenever E[X_t] = 0 for all t ∈ N_0^N, S is an orthomartingale with respect to F^1, ..., F^N.

We will encounter multiparameter martingales many more times in this book. For now, let us note the following consequence of Jensen's inequality. It shows us how to produce more orthosmartingales in the presence of some.

Lemma 2.1.1 Suppose that M = (M_t; t ∈ N_0^N) is a nonnegative orthosubmartingale with respect to one-parameter filtrations F^1, ..., F^N, that Ψ : [0, ∞[ → [0, ∞[ is convex nondecreasing, and that for all t ∈ N_0^N, E[Ψ(M_t)] < ∞. Then, t → Ψ(M_t) is an orthosubmartingale.
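Example 2 is straightforward to realize on a computer. The sketch below (our code, not the book's) builds a 2-parameter random walk by inclusion-exclusion on rectangle sums; fixing one index and varying the other yields one-parameter walks with mean-zero steps, in line with Exercise 2.1.2.

```python
import random

def multiparameter_walk(n, rng):
    """2-parameter random walk: S[i][j] is the sum of i.i.d. +-1
    variables X over the rectangle [1, i] x [1, j]."""
    S = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            x = rng.choice((-1, 1))  # the increment X at (i, j)
            S[i][j] = S[i - 1][j] + S[i][j - 1] - S[i - 1][j - 1] + x
    return S

rng = random.Random(7)
# E[S_t] = 0 for every t, consistent with the orthomartingale property.
est = sum(multiparameter_walk(10, rng)[10][5] for _ in range(4000)) / 4000
assert abs(est) < 0.5
```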
2.2 Embedded Submartingales

We have seen many instances where stopping times play a critical role in the one-parameter theory of stochastic processes. For instance, see the described proof of Doob's weak (1,1) maximal inequality (Theorem 1.3.1). In a multiparameter setting, we can use the following idea to replace the use of stopping times, since stopping times are not well defined in the presence of many parameters.

Proposition 2.2.1 Suppose M = (M_t; t ∈ N_0^N) is an orthosubmartingale with respect to the one-parameter filtrations F^1, ..., F^N. If s, t ∈ N_0^N with s ≼ t, then E[M_s] ≤ E[M_t]. Moreover, for all t^{(2)}, ..., t^{(N)} ≥ 0, the following is a one-parameter submartingale with respect to F^1:

M̃_{s^{(1)}} = max_{0≤s^{(2)}≤t^{(2)}} ··· max_{0≤s^{(N)}≤t^{(N)}} M_s,   s^{(1)} ≥ 0.

In particular, the first part says that, as a function on N_0^N, t → E[M_t] is increasing in the partial order ≼. Furthermore, suppose N = 2 and write M = (M_{i,j}; i, j ≥ 0). Then, Proposition 2.2.1 states that, as a process in i, max_{0≤j≤m} M_{i,j} is a submartingale with respect to F^1. By reversing the roles of i and j, this proposition also implies that, as a process in j, max_{0≤i≤n} M_{i,j} is a submartingale with respect to F^2. Consequently, Proposition 2.2.1 extracts 2 (in general, N) one-parameter submartingales from M.

Proof We prove the first part for N = 2. Write M = (M_{i,j}; i, j ≥ 0) and recall that (i, j) ≼ (n, m) if and only if i ≤ n and j ≤ m. Fix any such pair (i, j) and (n, m). Since i → M_{i,j} is a one-parameter submartingale,
by Jensen's inequality, E[M_{i,j}] ≤ E[M_{n,j}]. On the other hand, j → M_{n,j} is a submartingale, too. Another application of Jensen's inequality shows that E[M_{i,j}] ≤ E[M_{n,m}], as claimed. The general case (i.e., when N ≥ 2 is arbitrary) is proved in Exercise 2.2.1 below.

For the second and main part of the proposition, note that by the definitions, for any s^{(j)} ≤ t^{(j)} (2 ≤ j ≤ N), M_s is measurable with respect to F^1_{s^{(1)}}. Therefore, max_{0≤s^{(2)}≤t^{(2)}} ··· max_{0≤s^{(N)}≤t^{(N)}} M_s is also measurable with respect to F^1_{s^{(1)}}. Moreover,

E[ max_{0≤s^{(2)}≤t^{(2)}} ··· max_{0≤s^{(N)}≤t^{(N)}} |M_s| ] ≤ Σ_{0≤s^{(2)}≤t^{(2)}} ··· Σ_{0≤s^{(N)}≤t^{(N)}} E{|M_s|},

which is finite. It remains to verify the submartingale property. This can be checked directly, as we demonstrate below: Suppose s^{(1)} = k + 1. Then,

E[ max_{0≤s^{(2)}≤t^{(2)}} ··· max_{0≤s^{(N)}≤t^{(N)}} M_s | F^1_k ] ≥ max_{0≤s^{(2)}≤t^{(2)}} ··· max_{0≤s^{(N)}≤t^{(N)}} E[M_s | F^1_k]
  ≥ max_{0≤r^{(2)}≤t^{(2)}} ··· max_{0≤r^{(N)}≤t^{(N)}} M_r,   a.s.,

where r^{(1)} = k. We have used the orthosubmartingale property of M in the last step, and shown that

E[ M̃_{k+1} | F^1_k ] ≥ M̃_k,   a.s.

This is the desired result.
Exercise 2.2.1 Prove Proposition 2.2.1 for a general integer N ≥ 2.

Exercise 2.2.2 Suppose γ : N_0 → N_0^N is a nondecreasing function in the sense that whenever s ≤ t, then γ(s) ≼ γ(t). Prove that whenever M is an N-parameter orthosubmartingale with respect to F^1, ..., F^N, M ∘ γ is a 1-parameter submartingale with respect to F ∘ γ, where (M ∘ γ)_t = M_{γ(t)} and (F ∘ γ)_t = ∩_{i=1}^{N} F^i_{γ^{(i)}(t)} (t ∈ N_0).
2.3 Cairoli's Strong (p, p) Inequality

We can combine Proposition 2.2.1 with Doob's strong (p, p) maximal inequality (Theorem 1.4.1) to obtain Cairoli's strong (p, p) inequality for orthosubmartingales.

Theorem 2.3.1 (Cairoli's Strong (p, p) Inequality) Suppose that M = (M_s; s ∈ N_0^N) is a nonnegative orthosubmartingale with respect to one-parameter filtrations F^1, ..., F^N. Then, for all t ∈ N_0^N and p > 1,

E[ max_{0≼s≼t} M_s^p ] ≤ (p/(p−1))^{Np} E[M_t^p].
!s(1) , where Proof Write max0 s t Ms as max0≤s(1) ≤t(1) M !k = M
max
0≤s(2) ≤t(2)
···
max
0≤s(N ) ≤t(N )
Mk,s(2) ,...,s(N ) .
! is a (one-parameter) submartingale with respect By Proposition 2.2.1, M to the one-parameter filtration F1 . Applying Doob’s strong (p, p)-inequality (Theorem 1.4.1), we see that p E max Ms 0st p p !t(1) |p } E{|M ≤ p−1 p p p = E max · · · max Mt(1) ,s(2) ,...,s(N ) . p−1 0≤s(2) ≤t(2) 0≤s(N ) ≤t(N ) Going through the above argument one more time, we arrive at p E max Ms 0st
≤
p p 2p E max · · · max Mt(1) ,t(2) ,s(3) ,...,s(N ) . p−1 0≤s(3) ≤t(3) 0≤s(N ) ≤t(N )
The result follows from induction on N .
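For a concrete feel for the constant, here is a numerical sanity check (illustrative, not from the text) of Cairoli's inequality with $N = 2$ and $p = 2$, where $(p/(p-1))^{Np} = 16$. The orthomartingale $M_{i,j} = A_i B_j$ below is a product of two independent nonnegative one-parameter martingales built from i.i.d. mean-one factors.

```python
import random

random.seed(2)

# A Monte Carlo sanity check (illustrative, not from the text) of Cairoli's
# strong (p, p) inequality for N = 2 and p = 2:
#   E[ max_{s <= t} M_s^2 ] <= (p/(p-1))^{Np} E[M_t^2] = 16 E[M_t^2],
# for the nonnegative orthomartingale M_{i,j} = A_i * B_j.

def positive_mart(n):
    # A_0 = 1; each step multiplies by 0.5 or 1.5 with equal probability (mean 1).
    path, a = [1.0], 1.0
    for _ in range(n):
        a *= random.choice((0.5, 1.5))
        path.append(a)
    return path

trials, n = 20000, 5
lhs = rhs = 0.0
for _ in range(trials):
    A, B = positive_mart(n), positive_mart(n)
    grid_max = max(A) * max(B)          # max of M over the rectangle [0,n]^2
    lhs += grid_max ** 2 / trials       # estimates E[max_{s <= t} M_s^2]
    rhs += (A[n] * B[n]) ** 2 / trials  # estimates E[M_t^2] at t = (n, n)

print(lhs <= 16 * rhs, lhs >= rhs)  # both comparisons should hold
```

Since the factors are nonnegative, the maximum over the rectangle factors as $(\max_i A_i)(\max_j B_j)$, which keeps the check cheap.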
2.4 Another Maximal Inequality

The method of Section 2.3 can be used in the $p = 1$ case. However, this time, the relevant result is the maximal inequality of Theorem 1.5.1.

Theorem 2.4.1 Suppose $M = (M_t;\, t \in \mathbb N_0^N)$ is a nonnegative orthosubmartingale with respect to one-parameter filtrations $\mathcal F^1, \ldots, \mathcal F^N$. Then, for all $p \ge 0$ and all $t \in \mathbb N_0^N$,
\[
E\Big[\max_{0\preceq s\preceq t} M_s (\ln_+ M_s)^p\Big] \le (p+1)^N \Big(\frac{e}{e-1}\Big)^N \Big( N + E\big[ M_t (\ln_+ M_t)^{p+N} \big] \Big).
\]
Consequently, letting $p = 0$, we obtain a multiparameter extension of Theorem 1.5.1:
\[
E\Big[\max_{0\preceq s\preceq t} M_s\Big] \le \Big(\frac{e}{e-1}\Big)^N \Big( N + E\big[ M_t (\ln_+ M_t)^{N} \big] \Big).
\]

Proof Without loss of generality, we can and will assume that the expectation on the right-hand side of the theorem's display is finite. Define $\Psi_p(x) = x(\ln_+ x)^p$ ($p \ge 1$). Note that $\Psi_p$ is both nondecreasing and convex
on $]0, \infty[$. By Jensen's inequality, $\Psi_p(M) = (\Psi_p(M_t);\, t \in \mathbb N_0^N)$ is an orthosubmartingale.

First, let us suppose that $N = 1$. We can apply Theorem 1.5.1 to see that
\[
E\Big[\max_{0\le i\le k} \Psi_p(M_i)\Big] \le \frac{e}{e-1}\Big( 1 + E\big[ \Psi_p(M_k) \ln_+ \Psi_p(M_k) \big] \Big) \le \frac{e}{e-1}\Big( 1 + E[\Psi_{p+1}(M_k)] + p\, E[\Psi_p(M_k) \ln_+ \ln_+ M_k] \Big).
\]
Since $\ln_+\ln_+ x \le \ln_+ x$ and $p \ge 0$,
\[
E\Big[\max_{0\le i\le k} \Psi_p(M_i)\Big] \le \frac{e}{e-1}\Big( 1 + (p+1) E[\Psi_{p+1}(M_k)] \Big) \le (p+1)\,\frac{e}{e-1}\Big( 1 + E[\Psi_{p+1}(M_k)] \Big).
\]
This is the desired result for $N = 1$. To prove this result in general, we write
\[
\max_{0\preceq s\preceq t} \Psi_p(M_s) = \max_{0\le s^{(1)}\le t^{(1)}} \Psi_p(\widetilde M_{s^{(1)}}),
\]
where $(\widetilde M_k;\, k \ge 0)$ is defined in Section 2.3. By Jensen's inequality and the embedding Proposition 2.2.1, $\Psi_p(\widetilde M)$ is a (one-parameter) submartingale with respect to the (one-parameter) filtration $\mathcal F^1$. The first portion of the proof, and the monotonicity of $\Psi_p$ for all $p \ge 0$, together reveal that
\[
E\Big[\max_{0\preceq s\preceq t} \Psi_p(M_s)\Big] \le (p+1)\,\frac{e}{e-1}\Big( 1 + E\Big[ \max_{0\le s^{(2)}\le t^{(2)}}\cdots\max_{0\le s^{(N)}\le t^{(N)}} \Psi_{p+1}(M_{t^{(1)}, s^{(2)}, \ldots, s^{(N)}}) \Big] \Big).
\]
Once again, using the first part of our proof, together with the embedding Proposition 2.2.1, we arrive at
\[
E\Big[\max_{0\preceq s\preceq t} \Psi_p(M_s)\Big]
\le (p+1)\,\frac{e}{e-1}\Big( 1 + (p+1)\,\frac{e}{e-1}\Big[ 1 + E\Big( \max_{0\le s^{(3)}\le t^{(3)}}\cdots\max_{0\le s^{(N)}\le t^{(N)}} \Psi_{p+2}(M_{t^{(1)}, t^{(2)}, s^{(3)}, \ldots, s^{(N)}}) \Big) \Big] \Big)
\le (p+1)^2 \Big(\frac{e}{e-1}\Big)^2 \Big( 2 + E\Big[ \max_{0\le s^{(3)}\le t^{(3)}}\cdots\max_{0\le s^{(N)}\le t^{(N)}} \Psi_{p+2}(M_{t^{(1)}, t^{(2)}, s^{(3)}, \ldots, s^{(N)}}) \Big] \Big).
\]
The result follows upon induction. $\Box$
2.5 A Weak Maximal Inequality

In Section 1 we showed how to use a maximal probability estimate such as Theorem 1.3.1 to obtain maximal moment estimates such as the results of Sections 1.4 and 1.5. Now we come full circle and obtain a maximal $N$-parameter probability estimate in terms of the $N$-parameter moment estimates of Sections 2.3 and 2.4. In fact, we have the following weak $(1, L \ln^{N-1} L)$ inequality:

Theorem 2.5.1 Suppose $M = (M_t;\, t \in \mathbb N_0^N)$ is a nonnegative orthosubmartingale with respect to filtrations $\mathcal F^1, \ldots, \mathcal F^N$. Then, for all $t \in \mathbb N_0^N$ and all real $\lambda > 0$,
\[
P\Big( \max_{0\preceq s\preceq t} M_s \ge \lambda \Big) \le \frac{1}{\lambda}\Big(\frac{e}{e-1}\Big)^{N-1}\Big( (N-1) + E\big[ M_t (\ln_+ M_t)^{N-1} \big] \Big).
\]

Proof By the embedding Proposition 2.2.1 and by Theorem 1.3.1,
\[
P\Big( \max_{0\preceq s\preceq t} M_s \ge \lambda \Big) \le \frac{1}{\lambda}\, E\Big[ \max_{0\le s^{(2)}\le t^{(2)}}\cdots\max_{0\le s^{(N)}\le t^{(N)}} M_{t^{(1)}, s^{(2)}, \ldots, s^{(N)}} \Big].
\]
The above maximum is over $N - 1$ variables and is the maximum of an $(N-1)$-parameter orthosubmartingale; cf. the embedding Proposition 2.2.1. We obtain the result from Theorem 2.4.1. $\Box$
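The weak inequality, too, can be probed numerically. The sketch below (illustrative, not from the text) estimates both sides for $N = 2$ on the same product orthomartingale used above; the bound is typically far from tight, which is consistent with its role as a crude probability estimate.

```python
import math
import random

random.seed(3)

# A Monte Carlo sketch (illustrative, not from the text) of the N = 2 weak
# inequality:
#   P( max_{s <= t} M_s >= lam ) <= (1/lam)(e/(e-1)) (1 + E[M_t ln_+ M_t]),
# for the nonnegative orthomartingale M_{i,j} = A_i * B_j.

def positive_mart(n):
    path, a = [1.0], 1.0
    for _ in range(n):
        a *= random.choice((0.5, 1.5))
        path.append(a)
    return path

trials, n, lam = 20000, 5, 4.0
hits = moment = 0.0
for _ in range(trials):
    A, B = positive_mart(n), positive_mart(n)
    m = max(A) * max(B)                            # max of M over [0,n]^2
    hits += (m >= lam) / trials                    # estimates P(max M >= lam)
    mt = A[n] * B[n]
    moment += mt * max(math.log(mt), 0.0) / trials # estimates E[M_t ln_+ M_t]

bound = (1 / lam) * (math.e / (math.e - 1)) * (1 + moment)
print(hits, "<=", bound)
```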
2.6 Orthohistories

A stochastic process $M = (M_t;\, t \in \mathbb N_0^N)$ generates $N$ orthohistories $\mathcal H^1, \ldots, \mathcal H^N$, defined in the following manner: For any $1 \le i \le N$ and any $k \ge 0$, $\mathcal H^i_k$ is the smallest σ-field that makes the collection $(M_t;\, 0 \le t^{(i)} \le k)$ measurable. For instance, if $N = 2$ and if we write $M = (M_{i,j};\, i, j \ge 0)$, then $\mathcal H^1_k$ is the σ-field generated by $(M_{i,j};\, 0 \le i \le k,\, 0 \le j)$ and $\mathcal H^2_k$ is the σ-field generated by $(M_{i,j};\, 0 \le i,\, 0 \le j \le k)$. You can (and should) think of $\mathcal H^1_k$ and $\mathcal H^2_k$ as "the information contained in the values of the process $M$, over the sets" shown in Figures 1.1 and 1.2, respectively.

[Figure 1.1: $\mathcal H^1_k$]

The following lemma contains some elementary properties of orthohistories.

Lemma 2.6.1 Let $\mathcal H^1, \ldots, \mathcal H^N$ denote the orthohistories corresponding to the stochastic process $M = (M_t;\, t \in \mathbb N_0^N)$. Then,
1. each of the $\mathcal H^i$'s forms a one-parameter filtration; and
2. $\vee_{k\ge 0} \mathcal H^1_k = \cdots = \vee_{k\ge 0} \mathcal H^N_k$.

Proof Consider $\mathcal H^1$: It is an increasing family of σ-fields, i.e., a filtration. A similar argument can be applied to all the $\mathcal H^i$'s; this proves 1. To prove 2, note that $\vee_{k\ge 0} \mathcal H^1_k$ is the σ-field generated by $(M_t;\, t \in \mathbb N_0^N)$. Since a similar remark holds for $\vee_{k\ge 0} \mathcal H^2_k, \ldots, \vee_{k\ge 0} \mathcal H^N_k$, our proof is complete. $\Box$

An interesting property of orthohistories is that they preserve smartingale properties.

Proposition 2.6.1 Suppose $M$ is an orthosubmartingale with respect to one-parameter filtrations $\mathcal F^1, \ldots, \mathcal F^N$, and let $\mathcal H^1, \ldots, \mathcal H^N$ denote the orthohistories of $M$. Then, $M$ is an orthosubmartingale with respect to $\mathcal H^1, \ldots, \mathcal H^N$.
Proof Recall that for all $1 \le i \le N$ and all $t \in \mathbb N_0^N$, $t^{(i)} \mapsto M_t$ is adapted to $\mathcal F^i_{t^{(i)}}$. Since $\mathcal H^i_{t^{(i)}}$ is the smallest σ-field with respect to which $t^{(i)} \mapsto M_t$ is adapted, we have shown that for all integers $k \ge 0$ and all $1 \le i \le N$, $\mathcal H^i_k \subset \mathcal F^i_k$. To finish our proof of the proposition, we need only check the submartingale property of $M$ with respect to the $\mathcal H^i$'s. Since $\mathcal H^i \subset \mathcal F^i$, this follows from the towering property of conditional expectations; see equation (1) of Section 1.1, Chapter 1. In other words, suppose $t \in \mathbb N_0^N$ and $1 \le i \le N$. Then,
\[
E[M_t \mid \mathcal H^i_{t^{(i)}-1}] = E\big[ E\{ M_t \mid \mathcal F^i_{t^{(i)}-1} \} \,\big|\, \mathcal H^i_{t^{(i)}-1} \big] \ge E[M_s \mid \mathcal H^i_{t^{(i)}-1}], \quad \text{a.s.},
\]
where $s^{(j)} = t^{(j)}$ if $j \ne i$, but $s^{(i)} = t^{(i)} - 1$. As a result, $M_s$ is measurable with respect to $\mathcal H^i_{t^{(i)}-1} = \mathcal H^i_{s^{(i)}}$. Thus,
\[
E[M_t \mid \mathcal H^i_{t^{(i)}-1}] \ge M_s, \quad \text{a.s.},
\]
for the above choice of $s$. This completes our proof. $\Box$

[Figure 1.2: $\mathcal H^2_k$]
Proposition 2.6.1 frees us from sometimes having to worry about the one-parameter filtrations with respect to which M is an orthosmartingale. That is, if no such filtrations are mentioned, it is sometimes safe to assume that they are the orthohistories.
2.7 Convergence Notions

One of the highlights of the one-parameter theory of smartingales is Doob's convergence theorem (Theorem 1.7.1). In order to discuss such a result in the setting of orthosmartingales, we first need to define what we mean by "$\lim_{t\to\infty} X_t$" for a stochastic process $X = (X_t;\, t \in \mathbb N_0^N)$. In this book two natural notions of convergence arise.

The first is a straightforward topological one. Namely, given a finite real number $L$, we say that "$L = \lim_{t\to\infty} X_t$, a.s." whenever the following holds with probability one: As the distance between the point $t \in \mathbb N_0^N$ and the axes of $\mathbb R^N$ grows, $X_t$ converges to $L$. That is,
\[
\lim_{t\to\infty} X_t = \lim_{t^{(1)}, \ldots, t^{(N)} \to\infty} X_t = L.
\]
In this spirit, if $L = +\infty$ (respectively $-\infty$), "$L = \lim_{t\to\infty} X_t$, a.s." means that with probability one, for all $M > 0$, there exists $K > 0$ such that whenever $\min_{1\le i\le N} t^{(i)} \ge K$, then $X_t \ge M$ (respectively $X_t \le -M$).

Sometimes, the almost sure existence of a topological limit is too stringent a condition. We can relax it considerably by defining sectorial limits as follows: Consider a real-valued function $f$ on $\mathbb R^N_+$, and let $\Pi_N$ denote the collection of all permutations of $\{1, \ldots, N\}$. For any $\pi \in \Pi_N$, define
\[
\pi\text{-}\lim_{t\to\infty} f(t) = \lim_{t^{(\pi(1))}\to\infty} \cdots \lim_{t^{(\pi(N))}\to\infty} f(t),
\]
when it exists. Of course, the order in which the limits are taken is, in general, quite important. We say that the function $f$ has sectorial limits (at infinity) if $\pi\text{-}\lim_{t\to\infty} f(t)$ exists for all $\pi \in \Pi_N$. To illustrate, suppose $N = 2$. Then $\Pi_2 = \{\pi_1, \pi_2\}$, where $\pi_1 : (1,2) \to (1,2)$ and $\pi_2 : (1,2) \to (2,1)$. The two sectorial limits are
\[
\pi_1\text{-}\lim_{t\to\infty} f(t) = \lim_{t^{(1)}\to\infty} \lim_{t^{(2)}\to\infty} f(t); \qquad
\pi_2\text{-}\lim_{t\to\infty} f(t) = \lim_{t^{(2)}\to\infty} \lim_{t^{(1)}\to\infty} f(t).
\]
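That the order of the iterated limits matters can be seen concretely. The snippet below (illustrative, not from the text) evaluates $f(t) = t^{(1)}/(t^{(1)} + t^{(2)})$, whose two sectorial limits are 0 and 1.

```python
# An illustrative check (not from the text): for f(t) = t1/(t1 + t2),
#   pi_1-lim = lim_{t1 -> inf} lim_{t2 -> inf} f = 0,
#   pi_2-lim = lim_{t2 -> inf} lim_{t1 -> inf} f = 1,
# so f has sectorial limits, but they disagree, and the topological
# limit at infinity fails to exist.

def f(t1, t2):
    return t1 / (t1 + t2)

BIG = 10**12  # stands in for "-> infinity"

# pi_1: the inner limit sends t2 -> infinity at fixed t1, so t2 dominates.
pi1 = f(BIG, BIG**2)
# pi_2: the inner limit sends t1 -> infinity at fixed t2, so t1 dominates.
pi2 = f(BIG**2, BIG)

print(round(pi1, 6), round(pi2, 6))  # -> 0.0 1.0
```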
Simple examples show that the sectorial limits need not be equal. However, if and when they are, we write $\lim_{t\Rightarrow\infty} f(t)$ for their common value, which we refer to as the sectorial limit of $f$ (at infinity).

Next, let us suppose that $M = (M_t;\, t \in \mathbb N_0^N)$ is an orthosubmartingale that is bounded in $L^1(P)$. That is, $\sup_{t\in\mathbb N_0^N} E\{|M_t|\} < \infty$. We can apply the one-parameter convergence theorem (Theorem 1.7.1) to the first coordinate of $M$, say, to see that for all integers $t^{(2)}, \ldots, t^{(N)} \ge 0$, the following exists, almost surely:
\[
M^1_{t^{(2)}, \ldots, t^{(N)}} = \lim_{t^{(1)}\to\infty} M_t.
\]
The next step in showing the existence of sectorial limits is to let $t^{(2)} \to \infty$ in the above. We could do this if we knew that the $(N-1)$-parameter stochastic process $M^1$ was an $(N-1)$-parameter orthosubmartingale. A little thought shows that for this to hold, we need more stringent conditions than mere boundedness in $L^1(P)$. Indeed, the following suffices:

Theorem 2.7.1 (Cairoli's First Convergence Theorem) If $M$ is an $N$-parameter uniformly integrable orthosubmartingale, the sectorial limit $M_{\infty,\ldots,\infty} = \lim_{t\Rightarrow\infty} M_t$ exists, almost surely and in $L^1(P)$. Moreover, for any $\pi \in \Pi_N$,
\[
M_t \le E\Big[ \cdots E\Big[ E\big\{ M_{\infty,\ldots,\infty} \,\big|\, \mathcal F^{\pi(1)}_{t^{(\pi(1))}} \big\} \,\Big|\, \mathcal F^{\pi(2)}_{t^{(\pi(2))}} \Big] \cdots \,\Big|\, \mathcal F^{\pi(N)}_{t^{(\pi(N))}} \Big], \quad \text{a.s.}
\]

To understand this theorem better, consider the case $N = 2$ and write $M = (M_{i,j};\, i, j \ge 0)$. The above result states that a.s.,
\[
M_{i,j} \le \min\Big( E\big[ E\{ M_{\infty,\infty} \mid \mathcal F^1_i \} \,\big|\, \mathcal F^2_j \big],\; E\big[ E\{ M_{\infty,\infty} \mid \mathcal F^2_j \} \,\big|\, \mathcal F^1_i \big] \Big).
\]
In particular, when $M$ is a 2-parameter orthomartingale, then we have the following commutation property:
\[
M_{i,j} = E\big[ E\{ M_{\infty,\infty} \mid \mathcal F^1_i \} \,\big|\, \mathcal F^2_j \big] = E\big[ E\{ M_{\infty,\infty} \mid \mathcal F^2_j \} \,\big|\, \mathcal F^1_i \big], \quad \text{a.s.}
\]
We will explore an abstraction of this property in greater depth later on.

Proof Let us prove the result for $N = 2$. The general case is contained in Exercise 2.7.1 below. Throughout this proof we will assume that $\mathcal F^1$ and $\mathcal F^2$ are the orthohistories of the two-parameter process $M$; see the discussion of Section 2.6. We will also write $M = (M_{i,j};\, i, j \ge 0)$.

For any $j \ge 0$, $i \mapsto M_{i,j}$ is a one-parameter uniformly integrable submartingale. By the one-parameter smartingale convergence theorem (Theorem 1.7.1), $M_{\infty,j} = \lim_{i\to\infty} M_{i,j}$ exists almost surely and in $L^1(P)$. Moreover, $M_{i,j} \le E[M_{\infty,j} \mid \mathcal F^1_i]$. Next, we need the process properties of $j \mapsto M_{\infty,j}$. By the asserted $L^1(P)$ convergence,
\[
E\big[ M_{\infty,j+1} \,\big|\, \mathcal F^2_j \big] = \lim_{i\to\infty} E\big[ M_{i,j+1} \,\big|\, \mathcal F^2_j \big] \ge \lim_{i\to\infty} M_{i,j} = M_{\infty,j},
\]
where the convergence takes place in $L^1(P)$. Hence, $j \mapsto M_{\infty,j}$ is itself a one-parameter submartingale with respect to $\mathcal F^2$, and $M_{i,j} \le E[M_{\infty,j} \mid \mathcal F^1_i]$.
Appealing to Fatou's lemma, we see that $j \mapsto M_{\infty,j}$ is a uniformly integrable one-parameter submartingale with respect to $\mathcal F^2$. The one-parameter smartingale convergence theorem (Theorem 1.7.1) shows that $M^1_{\infty,\infty} = \lim_{j\to\infty} M_{\infty,j}$ exists almost surely and in $L^1(P)$. It may be useful to point out that in the notation of the paragraph preceding the statement of the theorem,
\[
M^1_{\infty,\infty} = \pi_2\text{-}\lim_{(i,j)\to\infty} M_{i,j} = \lim_{j\to\infty} \lim_{i\to\infty} M_{i,j}, \quad \text{a.s. and in } L^1(P),
\]
where $\pi_2 : (1,2) \to (2,1)$. Our argument, thus far, can be applied equally well to the variables in reverse order. In this way we obtain a uniformly integrable one-parameter submartingale $(M_{i,\infty};\, i \ge 0)$ with respect to $\mathcal F^1$ that almost surely converges to some $M^2_{\infty,\infty}$, as $i \to \infty$. That is, we have extracted $M^1_{\infty,\infty}$ and $M^2_{\infty,\infty}$ as the two possible sectorial limits of $M$. Now we show that with probability one, $M^1_{\infty,\infty} = M^2_{\infty,\infty}$. For all $k \ge 0$,
\[
E\big[ M^1_{\infty,\infty} \,\big|\, \mathcal F^1_k \big] = \lim_{j\to\infty} \lim_{i\to\infty} E\big[ M_{i,j} \,\big|\, \mathcal F^1_k \big] \ge \lim_{j\to\infty} M_{k,j} = M_{k,\infty}, \quad \text{a.s.}
\]
Thus, $\lim_{k\to\infty} E[M^1_{\infty,\infty} \mid \mathcal F^1_k] \ge M^2_{\infty,\infty}$, almost surely. Since $\mathcal F^1$ is a filtration, the one-parameter martingale convergence theorem implies that the mentioned limit is equal to $E[M^1_{\infty,\infty} \mid \vee_{k\ge 0} \mathcal F^1_k]$, almost surely. By Lemma 2.6.1, $\vee_{k\ge 0} \mathcal F^1_k = \vee_{k\ge 0} \mathcal F^2_k$, and $M^1_{\infty,\infty}$ is measurable with respect to the latter. That is, $M^1_{\infty,\infty} \ge M^2_{\infty,\infty}$, almost surely. We now reverse the roles of the indices to deduce that with probability one, $M^1_{\infty,\infty} = M^2_{\infty,\infty} = M_{\infty,\infty}$. We have shown the almost sure, as well as the $L^1(P)$, existence of the sectorial limit $\lim_{(i,j)\Rightarrow\infty} M_{i,j}$. That $M_{i,j}$ can be bounded by a conditional expectation of $M_{\infty,\infty}$ follows from the $L^1(P)$ convergence. $\Box$

Exercise 2.7.1 Prove Theorem 2.7.1 when $N \ge 2$ is arbitrary.
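A simulation can make sectorial convergence concrete. The sketch below is illustrative and not from the text; moreover, this particular $M$ is $L^1$-bounded but not uniformly integrable, so only the almost sure statement is being illustrated — indeed $E[M_t] \equiv 1$ for every $t$, while $M_t \to 0$ a.s., so the convergence fails in $L^1(P)$.

```python
import random

random.seed(7)

# An illustrative simulation (not from the text): M_{i,j} = A_i * B_j, with A
# and B independent nonnegative one-parameter martingales (products of i.i.d.
# mean-one factors in {0.5, 1.5}). Both iterated limits exist and equal 0
# almost surely, since A_i -> 0 and B_j -> 0 a.s.; yet E[M_{i,j}] = 1 always,
# so this M is L^1-bounded but not uniformly integrable.

def mart_path(n):
    path, a = [1.0], 1.0
    for _ in range(n):
        a *= random.choice((0.5, 1.5))
        path.append(a)
    return path

n = 400
A, B = mart_path(n), mart_path(n)

# Far from the axes, M is already tiny, consistent with the a.s. limit 0:
corner = max(A[i] * B[j] for i in range(300, n + 1) for j in range(300, n + 1))
print(corner)  # typically astronomically small
```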
2.8 Topological Convergence

In this section we will show that, at little extra cost, the sectorial convergence of Section 2.7 can be greatly strengthened:

Theorem 2.8.1 (Cairoli's Second Convergence Theorem) Consider any orthomartingale $M = (M_t;\, t \in \mathbb N_0^N)$ that satisfies the following integrability condition:
\[
\sup_{t\in\mathbb N_0^N} E\{ |M_t| (\ln_+ |M_t|)^{N-1} \} < \infty.
\]
Then, with probability one,
\[
\lim_{t^{(1)}\to\infty} \sup_{t^{(2)}, \ldots, t^{(N)} \ge 0} \big| M_t - M^1_{t^{(2)}, \ldots, t^{(N)}} \big| = 0,
\]
where $M^1_{t^{(2)}, \ldots, t^{(N)}} = \lim_{t^{(1)}\to\infty} M_t$, almost surely. In particular, by interchanging the indices we can conclude the following strengthened form of Theorem 2.7.1 for orthomartingales:

Corollary 2.8.1 Suppose $M = (M_t;\, t \in \mathbb N_0^N)$ is an orthomartingale with the property that $\sup_{t\in\mathbb N_0^N} E\{ |M_t| (\ln_+ |M_t|)^{N-1} \} < \infty$. Then, $\lim_{t\to\infty} M_t$ exists almost surely and in $L^1(P)$.

Exercise 2.8.1 Prove Corollary 2.8.1.
Our proof of Theorem 2.8.1 requires the following technical lemma.

Lemma 2.8.1 Suppose $M = (M_t;\, t \in \mathbb N_0^N)$ is a uniformly integrable orthosubmartingale. Suppose $\Psi : [0,\infty[ \to [0,\infty[$ is convex, and that $x \mapsto x^{-1}\Psi(x)$ is finite and nondecreasing on $[0,\infty[$. If $\sup_{t\in\mathbb N_0^N} E[\Psi(|M_t|)] < \infty$, then $M_* = \lim_{t\Rightarrow\infty} M_t$ exists and
\[
\lim_{t\to\infty} E[\Psi(|M_t - M_*|)] = 0.
\]

Proof Since $M$ is uniformly integrable, by Theorem 2.7.1, the sectorial limit $M_* = \lim_{t\Rightarrow\infty} M_t$ exists, a.s. and in $L^1(P)$. Moreover,
Fatou's lemma shows that $E[\Psi(|M_*|)] < \infty$. Observe that $t \mapsto \Psi(|M_t|)$ is itself an $N$-parameter orthosubmartingale and, by Theorem 2.7.1, for all $s \in \mathbb N_0^N$,
\[
|M_s| \le E\Big[ \cdots E\Big[ E\big\{ |M_*| \,\big|\, \mathcal F^1_{s^{(1)}} \big\} \,\Big|\, \mathcal F^2_{s^{(2)}} \Big] \cdots \,\Big|\, \mathcal F^N_{s^{(N)}} \Big],
\]
almost surely. By convexity, almost surely,
\[
\Psi(|M_s|) \le E\Big[ \cdots E\Big[ E\big\{ \Psi(|M_*|) \,\big|\, \mathcal F^1_{s^{(1)}} \big\} \,\Big|\, \mathcal F^2_{s^{(2)}} \Big] \cdots \,\Big|\, \mathcal F^N_{s^{(N)}} \Big].
\]
Consequently, as long as $\Lambda \in \cap_{\ell=1}^N \mathcal F^\ell_{s^{(\ell)}}$, the following inequality holds:
\[
E[\Psi(|M_s|)\mathbf 1_\Lambda] \le E[\Psi(|M_*|)\mathbf 1_\Lambda]. \tag{1}
\]
We let $\Lambda_s = \{\Psi(|M_s|) > \lambda\}$ ($\lambda > 0$, $s \in \mathbb N_0^N$) and apply equation (1) with $\Lambda$ replaced by $\Lambda_s$. Since $P(\Lambda_s) \le \lambda^{-1} \sup_{t\in\mathbb N_0^N} E[\Psi(|M_t|)]$, this probability (i.e., $P(\Lambda_s)$) goes to 0 as $\lambda \to \infty$, uniformly in $s \in \mathbb N_0^N$. By equation (1), we deduce that $t \mapsto \Psi(|M_t|)$ is a uniformly integrable, $N$-parameter orthosubmartingale. Since $\Psi$ is continuous on $[0,\infty[$, by Theorem 2.7.1,
\[
\lim_{t\Rightarrow\infty} E[\Psi(|M_t|)] = E[\Psi(|M_*|)]. \tag{2}
\]
We now show that the sectorial limit "$\lim_{t\Rightarrow\infty}$" is, in fact, a topological one. To this end, note that for all $x, y \in \mathbb R$,
\[
\Psi(|x - y|) \le |\Psi(x) - \Psi(y)|. \tag{3}
\]
To verify this, write $\Psi(x) = x f(x)$ ($x \ge 0$), where $f$ is nondecreasing, and assume without loss of generality that $x \ge y \ge 0$. Then,
\[
\Psi(|x - y|) = (x - y) f(x - y) \le x f(x) - y f(x - y).
\]
Thus, if $x - y \ge y$, we obtain equation (3). If $x - y \le y$,
\[
\Psi(|x - y|) \le (x - y) f(y) = x f(y) - y f(y) \le x f(x) - y f(y).
\]
Now that we have demonstrated equation (3), we can proceed with our proof. By Jensen's inequality, $\Psi(|M_t|)$ is an $L^1(P)$-bounded orthosubmartingale. Since $\Psi$ is nondecreasing, by Proposition 2.2.1, if $s \preceq t$, then $E[\Psi(|M_s|)] \le E[\Psi(|M_t|)]$. A real-variable argument shows that $\lim_{s\to\infty} E[\Psi(|M_s|)]$ exists and, by our assumption on $\Psi$ and $M$, is finite. Thus, by equation (2), $\lim_{t\to\infty} E[\Psi(|M_t|)] = E[\Psi(|M_*|)]$, and the lemma follows from equation (3). $\Box$

We can now prove Theorem 2.8.1.

Proof of Theorem 2.8.1 For the sake of illustration, we first detail our proof in the case $N = 2$ and write $M = (M_{i,j};\, i, j \ge 0)$. Clearly, the function $\Psi(x) = x \ln_+ x$ satisfies the conditions of Lemma 2.8.1: It is convex, and $x^{-1}\Psi(x)$ is nondecreasing. Furthermore, by the assumption of Theorem 2.8.1, $\sup_{i,j\ge 0} E[\Psi(|M_{i,j}|)]$ is finite. Define
\[
M_{i,\infty} = \lim_{j\to\infty} M_{i,j}, \quad i \ge 0; \qquad M_{\infty,j} = \lim_{i\to\infty} M_{i,j}, \quad j \ge 0.
\]
During the course of our proof of Theorem 2.7.1, we already saw that $i \mapsto M_{i,\infty}$ (respectively $j \mapsto M_{\infty,j}$) is a uniformly integrable one-parameter martingale with respect to $\mathcal F^1$ (respectively $\mathcal F^2$). (Why?) Moreover, if $M_{\infty,\infty} = \lim_{t\Rightarrow\infty} M_t$ denotes the sectorial limit, then
\[
M_{\infty,\infty} = \lim_{i\to\infty} M_{i,\infty} = \lim_{j\to\infty} M_{\infty,j}, \quad \text{a.s.}
\]
We proceed with establishing some probability estimates. First, we note that for any fixed $m \ge 0$,
\[
\sup_{j\ge m} \sup_{i\ge 0} |M_{i,j} - M_{i,\infty}| \le 2 \sup_{i\ge 0} \sup_{j\ge m} |M_{i,j} - M_{i,m}|.
\]
Hence, for any $c > 0$ and all $m \ge 0$,
\[
P\Big( \sup_{j\ge m} \sup_{i\ge 0} |M_{i,j} - M_{i,\infty}| \ge \lambda \Big) \le P\Big( \sup_{i\ge 0} \sup_{j\ge m} \big| c(M_{i,j} - M_{i,m}) \big| \ge \tfrac12 c\lambda \Big).
\]
On the other hand, for any $m \ge 0$, $(c|M_{i,j} - M_{i,m}|;\, i \ge 0,\, j \ge m)$ is an orthosubmartingale with respect to its orthohistories, for instance. Therefore, by the weak maximal inequality (Theorem 2.5.1 of Section 2.5),
\[
P\Big( \sup_{j\ge m} \sup_{i\ge 0} |M_{i,j} - M_{i,\infty}| \ge \lambda \Big) \le \frac{2}{c\lambda} \cdot \frac{e}{e-1} \Big( 1 + \sup_{j\ge m} \sup_{i\ge 0} E\big[ \Psi\big( c|M_{i,j} - M_{i,m}| \big) \big] \Big).
\]
By Jensen's inequality (cf. the first part of the embedding Proposition 2.2.1),
\[
P\Big( \sup_{j\ge m} \sup_{i\ge 0} |M_{i,j} - M_{i,\infty}| \ge \lambda \Big) \le \frac{2}{c\lambda} \cdot \frac{e}{e-1} \Big( 1 + E\big[ \Psi\big( c|M_{\infty,\infty} - M_{\infty,m}| \big) \big] \Big).
\]
Now note that for any $m \ge 0$, $i \mapsto M_{i,m}$ is a uniformly integrable one-parameter martingale with $\sup_{i\ge 0} E\{ |M_{i,m}| \ln_+ |M_{i,m}| \} < \infty$. In particular, by Lemma 2.8.1, $m \mapsto M_{\infty,m}$ is a uniformly integrable one-parameter martingale with $\sup_{m\ge 0} E\{ |M_{\infty,m}| \ln_+ |M_{\infty,m}| \} < \infty$. By another appeal to Lemma 2.8.1, for any $c > 0$, $\lim_{m\to\infty} E\big[ \Psi( c|M_{\infty,\infty} - M_{\infty,m}| ) \big] = 0$. Thus, there exists $c_m > 0$ such that $\lim_m c_m = \infty$ and $\sup_{m\ge 0} E\big[ \Psi( c_m |M_{\infty,\infty} - M_{\infty,m}| ) \big] \le 1$. That is,
\[
P\Big( \sup_{j\ge m} \sup_{i\ge 0} |M_{i,j} - M_{i,\infty}| \ge \lambda \Big) \le \frac{4}{c_m \lambda} \cdot \frac{e}{e-1}.
\]
Letting $m \to \infty$ and then letting $\lambda \to 0^+$ along a rational sequence, we see that a.s.,
\[
\lim_{m\to\infty} \sup_{i\ge 0} \sup_{j\ge m} |M_{i,j} - M_{i,\infty}| = 0.
\]
Similarly, $\lim_{n\to\infty} \sup_{j\ge 0} |M_{n,j} - M_{\infty,j}| = 0$, a.s., which proves the result for $N = 2$.

We now proceed with our proof in the general case. Write any $t \in \mathbb N_0^N$ as $t = (t^{(1)}, t')$, where $t' = (t^{(2)}, \ldots, t^{(N)}) \in \mathbb N_0^{N-1}$. The above argument extends naturally to show that for all real $c, \lambda > 0$ and all positive integers $n$,
\[
P\Big( \sup_{t'\in\mathbb N_0^{N-1}} \sup_{t^{(1)}\ge n} |M_t - M^1_{t'}| \ge \lambda \Big) \le \frac{2}{c\lambda} \Big(\frac{e}{e-1}\Big)^{N-1} \Big( (N-1) + E\big[ \Psi'\big( c|M_* - M_{n,\infty,\ldots,\infty}| \big) \big] \Big),
\]
where $M_* = \lim_{t\Rightarrow\infty} M_t$ and $M_{n,\infty,\ldots,\infty} = \lim_{t'\Rightarrow\infty} M_{n,t'}$ are $N$-parameter and $(N-1)$-parameter sectorial limits, respectively, and $\Psi'(x) = x(\ln_+ x)^{N-1}$; see Section 2.7. Hence, we can find $c_n \to \infty$ such that
\[
P\Big( \sup_{t'\in\mathbb N_0^{N-1}} \sup_{t^{(1)}\ge n} |M_t - M^1_{t'}| \ge \lambda \Big) \le \frac{N}{c_n \lambda} \Big(\frac{e}{e-1}\Big)^{N-1}.
\]
(The process $M^1$ is defined in the statement of Theorem 2.8.1, which we are now proving.) The rest of our proof follows as in the $N = 2$ case. $\Box$

Exercise 2.8.2 In the set-up of Theorem 2.8.1, suppose further that there exists $p > 1$ such that $M$ is bounded in $L^p(P)$. That is, $\sup_{t\in\mathbb N_0^N} E[|M_t|^p] < \infty$. Show that $M_t$ has a limit as $t \to \infty$, a.s. and in $L^p(P)$. Prove a uniform version of this in the style of Cairoli's second convergence theorem.
2.9 Reversed Orthomartingales

Roughly speaking, one-parameter reversed martingales are processes that become martingales once we reverse time. In order to be concrete, we need some filtrations. We say that a collection of σ-fields $\mathcal F = (\mathcal F_k;\, k \ge 0)$ is a reversed filtration if for all $k \ge 0$, $\mathcal F_{k+1} \subset \mathcal F_k$. That is, $\mathcal F$ is a collection of σ-fields that becomes a filtration once we reverse time. Now we define a stochastic process $M = (M_k;\, k \ge 0)$ to be a reversed martingale (with respect to the reversed filtration $\mathcal F$) if
1. $M$ is adapted to $\mathcal F$; and
2. for all $\ell \ge k \ge 0$, $E[M_k \mid \mathcal F_\ell] = M_\ell$, a.s.

Exercise 2.9.1 Verify that the above Condition 2 is equivalent to the following: For all $k \ge 0$, $E[M_k \mid \mathcal F_{k+1}] = M_{k+1}$, a.s.

Exercise 2.9.2 Show that every one-parameter reversed martingale converges. In this regard, see also Supplementary Exercise 4. (Hint: You can begin by deriving an upcrossing inequality.)

Our multiparameter discussion begins with $N$ one-parameter reversed filtrations $\mathcal F^i = (\mathcal F^i_k;\, k \ge 0)$, where $1 \le i \le N$. A stochastic process $M = (M_t;\, t \in \mathbb N_0^N)$ is a reversed orthomartingale if for each integer $1 \le i \le N$ and all nonnegative integers $(t^{(j)};\, 1 \le j \le N,\, j \ne i)$, $t^{(i)} \mapsto M_t$ is a reversed martingale with respect to the reversed filtration $\mathcal F^i$. To illustrate, consider $N = 2$ and write $M_{i,j}$ ($\mathcal F_{i,j}$, etc.) for $M_{(i,j)}$ ($\mathcal F_{(i,j)}$, etc.). Then, $M$ is a reversed orthomartingale if for all $i \ge 0$, $(M_{i,k};\, k \ge 0)$ is a reversed martingale with respect to $\mathcal F^2$ and $(M_{k,i};\, k \ge 0)$ is a reversed martingale with respect to $\mathcal F^1$.

Under a mild integrability condition, reversed orthomartingales always converge.

Theorem 2.9.1 Suppose $M = (M_t;\, t \in \mathbb N_0^N)$ is a reversed orthomartingale with respect to one-parameter reversed filtrations $\mathcal F^i = (\mathcal F^i_k;\, k \ge 0)$, $1 \le i \le N$. If $E\{ |M_0| (\ln_+ |M_0|)^{N-1} \} < \infty$, then $\lim_{t\to\infty} M_t$ exists a.s. and in $L^1(P)$.

Exercise 2.9.3 Prove Theorem 2.9.1.
Exercise 2.9.4 Prove the following $L^p(P)$ variant of Theorem 2.9.1. In the same context as the latter theorem, if $E\{|M_0|^p\} < \infty$ for some $p > 1$, then, as $t \to \infty$, $M_t$ converges in $L^p(P)$.

Exercise 2.9.5 Show that whenever $M$ is a uniformly integrable, $N$-parameter reversed orthomartingale, the sectorial limit $\lim_{t\Rightarrow\infty} M_t$ exists and is in $L^1(P)$. Moreover, $E\{ \lim_{t\Rightarrow\infty} |M_t| \} \le E\{|M_0|\}$.
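A classical source of reversed martingales is the running-average process, and products of independent running averages give a natural two-parameter example. The sketch below is illustrative and not from the text; for it, the convergence promised by Theorem 2.9.1 reduces to the strong law of large numbers.

```python
import random

random.seed(1)

# An illustrative sketch (not from the text): k -> (X_1 + ... + X_k)/k is the
# classic example of a one-parameter reversed martingale, and for independent
# samples X and Y, the product
#   M_{i,j} = (mean of X_1..X_i) * (mean of Y_1..Y_j)
# is a two-parameter reversed orthomartingale. Its limit as i, j -> infinity
# is E[X_1] * E[Y_1], by the strong law of large numbers.

def running_means(samples):
    means, total = [], 0.0
    for k, x in enumerate(samples, start=1):
        total += x
        means.append(total / k)
    return means

n = 100_000
xbar = running_means([random.random() for _ in range(n)])  # E[X] = 1/2
ybar = running_means([random.random() for _ in range(n)])  # E[Y] = 1/2

M = xbar[-1] * ybar[-1]  # M at the far parameter point (n, n)
print(M)                 # close to 0.25
```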
3 Martingales In this section we discuss N -parameter martingales, and more generally N -parameter smartingales. In an absolutely general setting, N -parameter smartingales do not possess much structure; cf. Section 3.3 below. On the other hand, we will see, in this section, that in some cases one can use the theory of orthomartingales to efficiently analyze N -parameter martingales.
3.1 Definitions

While the definition of orthomartingales may have seemed contrived, that of $N$-parameter martingales is certainly not, since $N$-parameter martingales are the most natural extension of 1-parameter martingales. It is the deep connections between martingales and orthomartingales that make the latter useful in the study of the former.

Suppose $\mathcal F = (\mathcal F_t;\, t \in \mathbb N_0^N)$ is a collection of sub-σ-fields of $\mathcal G$. We say that $\mathcal F$ is a filtration if $s \preceq t$ implies that $\mathcal F_s \subset \mathcal F_t$. An $N$-parameter stochastic process $M = (M_t;\, t \in \mathbb N_0^N)$ is adapted to the filtration $\mathcal F$ if for all $t \in \mathbb N_0^N$, $M_t$ is $\mathcal F_t$-measurable. The process $M$ is an $N$-parameter submartingale (with respect to $\mathcal F$) if it is adapted (to $\mathcal F$), $E\{|M_t|\} < \infty$ for all $t \in \mathbb N_0^N$, and for all $s \preceq t$, $E[M_t \mid \mathcal F_s] \ge M_s$, a.s. The stochastic process $M$ is a supermartingale if $-M$ is a submartingale. It is a martingale if it is both a supermartingale and a submartingale. If $M$ is either a sub- or a supermartingale, then it is a smartingale.

The following is an immediate consequence of Jensen's inequality:

Lemma 3.1.1 Suppose $M = (M_t;\, t \in \mathbb N_0^N)$ is a nonnegative submartingale with respect to a filtration $\mathcal F = (\mathcal F_t;\, t \in \mathbb N_0^N)$ and $\Psi : [0,\infty[ \to [0,\infty[$ is convex and nondecreasing. If $E[\Psi(M_t)] < \infty$ for all $t \in \mathbb N_0^N$, then $t \mapsto \Psi(M_t)$ is a submartingale with respect to $\mathcal F$.
3.2 Marginal Filtrations

The collection of all multiparameter smartingales contains the class of all orthosmartingales. To be more precise, suppose $M = (M_t;\, t \in \mathbb N_0^N)$ is an
$N$-parameter random process that is adapted to the $N$-parameter filtration $\mathcal F = (\mathcal F_t;\, t \in \mathbb N_0^N)$. For all $1 \le j \le N$, define
\[
\mathcal F^j_k = \bigvee_{t\in\mathbb N_0^N :\, t^{(j)} = k} \mathcal F_t, \quad k \ge 0.
\]
We define $\mathcal F^j = (\mathcal F^j_k;\, k \ge 0)$, $1 \le j \le N$, and refer to the σ-fields $\mathcal F^1, \ldots, \mathcal F^N$ as the marginal filtrations of $\mathcal F$.

Lemma 3.2.1 If $\mathcal F$ is a filtration, then $\mathcal F^1, \ldots, \mathcal F^N$ are one-parameter filtrations, and for all $t \in \mathbb N_0^N$, $\mathcal F_t \subseteq \cap_{\ell=1}^N \mathcal F^\ell_{t^{(\ell)}}$.

Exercise 3.2.1 Prove Lemma 3.2.1.
The marginal filtrations of an $N$-parameter filtration form one of the deep links between $N$-parameter martingales and orthomartingales, as we shall see next.

Proposition 3.2.1 Suppose $M$ is adapted to the $N$-parameter filtration $\mathcal F$. If $M$ is an orthosubmartingale with respect to the marginal filtrations of $\mathcal F$, then $M$ is a submartingale with respect to $\mathcal F$.

Proof We present a proof that uses the towering property of conditional expectations (see equation (1) of Section 1.1), working one parameter at a time. Suppose $s \preceq t$ are both in $\mathbb N_0^N$. Then, with probability one,
\[
E[M_t \mid \mathcal F_s] = E\big[ E[M_t \mid \mathcal F^1_{s^{(1)}}] \,\big|\, \mathcal F_s \big] \ge E[M_{s^{(1)}, t^{(2)}, \ldots, t^{(N)}} \mid \mathcal F_s]
= E\big[ E\{ M_{s^{(1)}, t^{(2)}, \ldots, t^{(N)}} \mid \mathcal F^2_{s^{(2)}} \} \,\big|\, \mathcal F_s \big] \ge E[M_{s^{(1)}, s^{(2)}, t^{(3)}, \ldots, t^{(N)}} \mid \mathcal F_s] \ge \cdots \ge E[M_s \mid \mathcal F_s] = M_s.
\]
The remaining properties are easy to check. $\Box$

To each $N$-parameter process $M$ we can associate a history $\mathcal H$ as follows: For every $t \in \mathbb N_0^N$, let $\mathcal H_t$ denote the σ-field generated by $(M_s;\, s \preceq t)$. When $N = 2$, we can think of $\mathcal H_t$ as the information contained in the values $(M_s;\, s \preceq t)$; this is shown in Figure 1.3. You should check that the corresponding pictures for the two marginal filtrations $\mathcal H^1_k$ and $\mathcal H^2_k$ are given by Figures 1.1 and 1.2 of Section 2.6, respectively. It is helpful to have these pictures in mind when interpreting many of the results and arguments that involve multiparameter filtrations, processes, etc.

[Figure 1.3: A picture of $\mathcal H_t$]
3.3 A Counterexample of Dubins and Pitman

According to Proposition 3.2.1 and Lemma 3.2.1, there is a sense in which the class of all smartingales includes orthosmartingales. Since the latter have rich mathematical properties (Section 2), one may be tempted to think that the same is true of the former. Next, we discuss a two-parameter counterexample of Dubins and Pitman (1980); it shows that, in the absence of further features, general multiparameter smartingales do not have many regularity properties. Related results can be found in the second half of Föllmer (1984a).

Let us begin with two fixed sequences of numbers: $p_1, p_2, \ldots$, all taking values in $]0,1[$, and $m_1, m_2, \ldots$, all positive integers. Next, consider a probability space $(\Omega, \mathcal G, P)$ rich enough so that on it we can construct independent random variables $X_1, X_2, \ldots$ such that for any $k \ge 1$,
\[
P(X_k = 0) = p_k, \quad \text{and} \quad P(X_k = j) = m_k^{-1}(1 - p_k), \quad \text{for all } 1 \le j \le m_k.
\]
For all integers $i, j \ge 0$ and $k \ge 1$, define $\mathcal F^k_{i,j}$ to be the trivial σ-field if $i + j < k$. If $i + j = k$, define $\mathcal F^k_{i,j}$ to be the σ-field generated by the event $(X_k = 0 \text{ or } X_k = i)$. Finally, if $i + j > k$, define $\mathcal F^k_{i,j}$ to be the σ-field generated by all of the $X$'s. We can also define $\mathcal F = (\mathcal F_{i,j};\, i, j \ge 0)$ by
\[
\mathcal F_{i,j} = \bigvee_{k=1}^\infty \mathcal F^k_{i,j}, \quad i, j \ge 0.
\]
We begin with some preliminary facts.

Exercise 3.3.1 Verify the following properties:
1. for each fixed integer $k \ge 1$, $\mathcal F^k = (\mathcal F^k_{i,j};\, i, j \ge 0)$ is a filtration;
2. $\mathcal F^1, \mathcal F^2, \ldots$ are independent; and
3. $\mathcal F$ is a filtration.
Finally, find the marginal filtrations of $\mathcal F$.

Let $A = \cup_{k=1}^\infty (X_k = 0)$ and define
\[
M_{i,j} = P(A \mid \mathcal F_{i,j}), \quad i, j \ge 0.
\]
Clearly, $M = (M_{i,j};\, i, j \ge 0)$ is a bounded two-parameter martingale. In fact, $0 \le M_{i,j} \le 1$ for all $i, j \ge 0$. We plan to show that it need not converge. More precisely, we have the following:

Theorem 3.3.1 Suppose $\sum_{k=1}^\infty p_k < 1$ but $\lim_{k\to\infty} m_k p_k = \infty$. Then, with probability one, $\limsup_{t\to\infty} M_t > \liminf_{t\to\infty} M_t$.
Remarks (a) Once again, we identify $M_{i,j}$ with $M_{(i,j)}$ for any $i, j \ge 0$. Throughout our proof, the same remark applies to $\mathcal F_{i,j}$, $\mathcal F^k_{i,j}$, etc.

(b) The conditions of the theorem hold, for example, if $m_k = 5^k$ and $p_k = 2^{-k}$.

Proof We start with a calculation that is an exercise in conditional expectations:
\[
P(X_k = 0 \mid \mathcal F^k_{i,j}) =
\begin{cases}
p_k, & \text{if } i + j < k, \\
\dfrac{m_k p_k}{m_k p_k + 1 - p_k}\, \mathbf 1_{(X_k = 0 \text{ or } X_k = i)}, & \text{if } i + j = k, \\
\mathbf 1_{(X_k = 0)}, & \text{if } i + j > k.
\end{cases} \tag{1}
\]
Fix any $n, m \ge 0$ and note that for all $k$ large enough (how large depends on $n$ and $m$),
\[
\sup_{(i,j) \succeq (n,m)} P(X_k = 0 \mid \mathcal F^k_{i,j}) \ge \frac{m_k p_k}{m_k p_k + 1 - p_k}.
\]
(This uses the Borel–Cantelli lemma and the fact that $\sum_k p_k < +\infty$, so that with probability one, $X_k \ne 0$ for all but finitely many $k$'s.) By part 2 of Exercise 3.3.1, $X_k$ is independent of the entire collection $(\mathcal F^\ell;\, \ell \ne k)$. Therefore, for all $m, n \ge 0$ and all $k$ large enough,
\[
\sup_{(i,j) \succeq (n,m)} P(X_k = 0 \mid \mathcal F_{i,j}) \ge \frac{m_k p_k}{m_k p_k + 1 - p_k}.
\]
Since $(X_k = 0) \subset A$, we obtain the following: For all $n, m \ge 0$ and all $k$ large enough,
\[
\sup_{(i,j) \succeq (n,m)} M_{i,j} \ge \frac{m_k p_k}{m_k p_k + 1 - p_k}.
\]
The left-hand side is independent of all large $k$ and bounded above by one. We can let $k \to \infty$ and use the conditions $\lim_{k\to\infty} p_k = 0$ and $\lim_{k\to\infty} m_k p_k = +\infty$ to see that with probability one, $\sup_{(i,j)\succeq(n,m)} M_{i,j} = 1$. That is, a.s.,
\[
\limsup_{t\to\infty} M_t = \inf_{t\in\mathbb N_0^2} \sup_{s \succeq t} M_s = 1. \tag{2}
\]
On the other hand, $(M_{j,j};\, j \ge 0)$ is a bounded, one-parameter martingale with respect to the filtration $(\mathcal F_{j,j};\, j \ge 0)$; cf. Exercise 2.2.2. By Doob's convergence theorem, it converges almost surely and in $L^p(P)$ for all $p \ge 1$. That is, with probability one, $\lim_{j\to\infty} M_{j,j} = P(A)$. This shows that with probability one,
\[
\liminf_{t\to\infty} M_t = \sup_{s\in\mathbb N_0^2} \inf_{t \succeq s} M_t \le P(A) \le \sum_{k=1}^\infty P(X_k = 0) = \sum_{k=1}^\infty p_k.
\]
Since $\sum_{k=1}^\infty p_k < 1$, this and (2), together, have the desired effect. $\Box$

Exercise 3.3.2 Verify equation (1).
Exercise 3.3.3 Show that Theorem 3.3.1 continues to hold, even if the condition $\sum_k p_k < 1$ is replaced by $\sum_k p_k < +\infty$. (Hint: The Borel–Cantelli lemma.)

Exercise 3.3.4 Construct a 2-parameter martingale that is in $L^p(P)$ for all $p > 0$, and yet whose maximum is a.s. unbounded. This is due to S. Janson. (Hint: In the set-up of the Dubins–Pitman counterexample, consider $N_{i,j} = \sum_{k=1}^\infty \alpha_k P(X_k = 0 \mid \mathcal F_{i,j})$, for a suitably chosen sequence $\alpha_1, \alpha_2, \ldots$ that goes to infinity in the limit.)
3.4 Commutation

In Section 3.3 we saw that general smartingales can have undesirable properties; cf. Theorem 3.3.1 and Exercise 3.3.4. On the other hand, when the smartingale in question is an orthosmartingale, the theory is quite rich; see Section 2. The question arises, when is a martingale an orthomartingale? This brings us to the so-called commutation hypothesis.

Suppose $\mathcal G_1, \ldots, \mathcal G_N$ are sub-σ-fields of $\mathcal G$. We say that they commute if for all bounded random variables $Y$ and all $\pi, \pi' \in \Pi_N$ (the collection of all permutations of $\{1, \ldots, N\}$),
\[
E\Big[ \cdots E\Big[ E\big\{ Y \mid \mathcal G_{\pi(1)} \big\} \,\Big|\, \mathcal G_{\pi(2)} \Big] \cdots \,\Big|\, \mathcal G_{\pi(N)} \Big]
= E\Big[ \cdots E\Big[ E\big\{ Y \mid \mathcal G_{\pi'(1)} \big\} \,\Big|\, \mathcal G_{\pi'(2)} \Big] \cdots \,\Big|\, \mathcal G_{\pi'(N)} \Big], \quad \text{a.s.}
\]
Two remarks are in order:
1. We have already encountered a weak version of such a property in Section 2.7; see the remarks after Theorem 2.7.1.
2. Commutation of σ-fields is a rather special property; it typically does not hold.

A seemingly different condition is commutation of filtrations. An $N$-parameter filtration $\mathcal F = (\mathcal F_t;\, t \in \mathbb N_0^N)$ is commuting (this is also known as condition (F4)) if for every $s, t \in \mathbb N_0^N$ and for all bounded $\mathcal F_t$-measurable random variables $Y$,
\[
E[Y \mid \mathcal F_s] = E[Y \mid \mathcal F_{s \wedge t}], \quad \text{a.s.},
\]
where $s \wedge t$ denotes the coordinatewise minimum of $s$ and $t$.
Exercise 3.4.1 Construct several (e.g., six or seven) 2-parameter filtrations that do not commute.

The following is the first sign of the connections between commuting filtrations, orthomartingales, and martingales. It states that if the underlying filtration is commuting, many martingales are also orthomartingales. As such, this serves as a converse to Proposition 3.2.1.

Theorem 3.4.1 Suppose $\mathcal F$ is an $N$-parameter commuting filtration. Then, for all bounded random variables $Z$ and all $t \in \mathbb N_0^N$, a.s.,
\[
E\Big[ \cdots E\Big[ E\big( Z \mid \mathcal F^1_{t^{(1)}} \big) \,\Big|\, \mathcal F^2_{t^{(2)}} \Big] \cdots \,\Big|\, \mathcal F^N_{t^{(N)}} \Big] = E[Z \mid \mathcal F_t],
\]
where $\mathcal F^1, \ldots, \mathcal F^N$ denote the marginal filtrations of $\mathcal F$.

Proof To simplify the demonstration, suppose $N = 2$ and write $M_{i,j}$ for $M_{(i,j)}$, etc. By Doob's 1-parameter martingale convergence theorem (Theorem 1.7.1),
\[
E\big[ E\{ Z \mid \mathcal F^1_i \} \,\big|\, \mathcal F^2_j \big]
= E\Big[ \lim_{k\to\infty} E\{ Z \mid \mathcal F_{i,k} \} \,\Big|\, \mathcal F^2_j \Big]
= \lim_{k\to\infty} E\big[ E\{ Z \mid \mathcal F_{i,k} \} \,\big|\, \mathcal F^2_j \big]
= \lim_{k\to\infty} \lim_{\ell\to\infty} E\big[ E\{ Z \mid \mathcal F_{i,k} \} \,\big|\, \mathcal F_{\ell,j} \big],
\]
where all of the convergences are taking place in $L^1(P)$. Recall that $\mathcal F$ is commuting and $Y = E[Z \mid \mathcal F_{i,k}]$ is $\mathcal F_{i,k}$-measurable. This implies that a.s.,
\[
E[Y \mid \mathcal F_{\ell,j}] = E[Y \mid \mathcal F_{(i,k)\wedge(\ell,j)}] = E[Y \mid \mathcal F_{i\wedge\ell,\, k\wedge j}] = E[Z \mid \mathcal F_{i\wedge\ell,\, k\wedge j}].
\]
The ultimate equality follows from the towering property of conditional expectations; cf. equation (1) of Section 1.1, Chapter 1. We have shown that for every $i, j \ge 0$ and for all bounded random variables $Z$, a.s.,
\[
E\big[ E\{ Z \mid \mathcal F^1_i \} \,\big|\, \mathcal F^2_j \big] = \lim_{k\to\infty} \lim_{\ell\to\infty} E[Z \mid \mathcal F_{i\wedge\ell,\, k\wedge j}],
\]
which, almost surely, equals $E\{Z \mid \mathcal F_{i,j}\}$. $\Box$
The following result better explains the term commuting.

Corollary 3.4.1 The marginal filtrations $\mathcal F^1, \ldots, \mathcal F^N$ of an $N$-parameter commuting filtration $\mathcal F$ all commute.

Proof Consider a bounded random variable $Z$ and a $t \in \mathbb N_0^N$. By relabeling the parameter coordinates, we see that for all $\pi \in \Pi_N$ (the collection of all permutations of $\{1, \ldots, N\}$),
\[
E\Big[ \cdots E\Big[ E\big( Z \mid \mathcal F^{\pi(1)}_{t^{(\pi(1))}} \big) \,\Big|\, \mathcal F^{\pi(2)}_{t^{(\pi(2))}} \Big] \cdots \,\Big|\, \mathcal F^{\pi(N)}_{t^{(\pi(N))}} \Big] = E[Z \mid \mathcal F_t], \quad \text{a.s.}
\]
Since the right-hand side is not affected by our choice of $\pi$, this implies the result. $\Box$
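Commutation of σ-fields can be checked by hand on a finite probability space. The following exact computation (an illustration, not from the text) takes three independent fair bits and the σ-fields $\mathcal G_1 = \sigma(b_1, b_3)$ and $\mathcal G_2 = \sigma(b_2, b_3)$; conditioning in either order projects $Z$ onto $\sigma(b_3)$, so the two iterated conditional expectations agree at every sample point.

```python
from itertools import product

# An exact, finite illustration (not from the text): on the uniform space
# {0,1}^3, the sigma-fields G1 = sigma(b1, b3) and G2 = sigma(b2, b3) commute:
# E[ E[Z|G1] | G2 ] = E[ E[Z|G2] | G1 ] pointwise (both equal E[Z | sigma(b3)]).

omega = list(product((0, 1), repeat=3))  # sample points (b1, b2, b3), uniform

def cond_exp(Z, coords):
    # E[Z | sigma(b_i : i in coords)]: average Z over the atom on which
    # the coordinates listed in `coords` are held fixed.
    out = {}
    for w in omega:
        atom = [v for v in omega if all(v[i] == w[i] for i in coords)]
        out[w] = sum(Z[v] for v in atom) / len(atom)
    return out

Z = {w: w[0] + 2 * w[1] * w[2] + 3 * w[2] for w in omega}  # any bounded Z

g1, g2 = (0, 2), (1, 2)                  # G1 = sigma(b1,b3), G2 = sigma(b2,b3)
order12 = cond_exp(cond_exp(Z, g1), g2)  # E[ E[Z|G1] | G2 ]
order21 = cond_exp(cond_exp(Z, g2), g1)  # E[ E[Z|G2] | G1 ]

print(all(abs(order12[w] - order21[w]) < 1e-12 for w in omega))  # -> True
```

Replacing $\mathcal G_2$ by, say, $\sigma(b_1 \oplus b_2)$ breaks this pointwise agreement, in line with the remark that commutation is a rather special property.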
Exercise 3.4.2 Prove that the filtration $\mathcal F$ of the Dubins–Pitman counterexample of Section 3.3 is not commuting.

Exercise 3.4.3 The intention of this exercise is to show that under commutation, Lemma 3.2.1 is sharp. One way to make this precise is as follows: Suppose $\mathcal F^1, \ldots, \mathcal F^N$ are $N$ independent 1-parameter filtrations. That is, for all $t \in \mathbb N_0^N$, $\mathcal F^1_{t^{(1)}}, \ldots, \mathcal F^N_{t^{(N)}}$ are independent σ-fields. For all $t \in \mathbb N_0^N$, define $\mathcal F_t = \vee_{\ell=1}^N \mathcal F^\ell_{t^{(\ell)}}$ and prove that $\mathcal F = (\mathcal F_t;\, t \in \mathbb N_0^N)$ is a commuting $N$-parameter filtration. Compute its marginal filtrations. This is related to Exercise 2.1.1.
3.5 Martingales

In Section 3.2 we showed that orthomartingales are martingales. We now show that if the underlying filtration is commuting, the converse also holds. In particular, under commutation, the powerful machinery of orthosubmartingales becomes available to martingales; see Corollary 3.5.1 below.

Theorem 3.5.1 Suppose that F is an N-parameter commuting filtration and that M = (M_t; t ∈ N_0^N) is adapted to F. Then, the following are equivalent: (i) M is an orthosubmartingale with respect to the marginals of F; and (ii) M is a submartingale with respect to F.

Proof Proposition 3.2.1 shows that (i) ⇒ (ii). We now show that under commutation, (ii) implies (i). Let us suppose (ii) holds. To simplify the exposition, we also suppose N = 2 and write M_{i,j} for M_{(i,j)}, etc. By the one-parameter martingale convergence theorem,

E[M_{i+1,j} | F^1_i] = lim_{k→∞} E[M_{i+1,j} | F_{i,k}], a.s.
Since F is commuting, for all k > j, E[M_{i+1,j} | F_{i,k}] = E[M_{i+1,j} | F_{i,j}]. Since M is a submartingale,

E[M_{i+1,j} | F^1_i] = lim_{k→∞} E[M_{i+1,j} | F_{i,j}] = E[M_{i+1,j} | F_{i,j}] ≥ M_{i,j}.
That is, for any fixed j ≥ 0, (M_{i,j}; i ≥ 0) is a one-parameter submartingale with respect to F^1. Relabeling the order of the parameters proves the result.

As an immediate but important consequence, we obtain the following.

Corollary 3.5.1 Suppose M = (M_t; t ∈ N_0^N) is a submartingale with respect to the N-parameter commuting filtration F. Then, Cairoli's maximal inequalities (Theorems 2.3.1, 2.4.1, and 2.5.1) all hold for M. Moreover, so do Cairoli's convergence theorems (Theorems 2.7.1 and 2.8.1).
1. Discrete-Parameter Martingales
Exercise 3.5.1 Suppose X = (X_t; t ∈ N_0^N) is an N-parameter submartingale with respect to a commuting filtration F = (F_t; t ∈ N_0^N). Show that there exist a martingale M = (M_t; t ∈ N_0^N) and an adapted process A = (A_t; t ∈ N_0^N) such that X_t = M_t + A_t. Moreover, A can be chosen so that it is nondecreasing with respect to the partial order ≼; that is, whenever s ≼ t, A_s ≤ A_t. This is from Cairoli (1971). (Hint: Write M_t = Σ_{s≼t} ξ_s and try to imitate the given proof of Doob's decomposition; cf. Exercise 1.2.2 and Supplementary Exercise 1.)

Exercise 3.5.2 Suppose F = (F_n; n ≥ 0) is a one-parameter filtration and suppose X = (X_t; t ∈ N_0^N) is a sequence of random variables indexed by N parameters. Assume that (i) there exists a (nonrandom, finite) constant K > 0 such that a.s., sup_t |X_t| ≤ K; and (ii) the sectorial limit of X_t exists, which we denote by X_∞. Prove that no matter how we let n, t^(1), ..., t^(N) → ∞,

E[X_t | F_n] → E[X_∞ | F_∞], a.s.,

where F_∞ = ∨_n F_n. In words, show that E[X_∞ | F_∞] is the sectorial limit of E[X_t | F_n], as (n, t) goes to infinity in N_0^{N+1}. This is based on Blackwell and Dubins (1962, Theorem 2).
3.6 Conditional Independence

Suppose G_1, ..., G_{k+1} are sub-σ-fields of G. We say that G_1, ..., G_k are conditionally independent given G_{k+1} if

E[ ∏_{ℓ=1}^k Y_ℓ | G_{k+1} ] = ∏_{ℓ=1}^k E[Y_ℓ | G_{k+1}]

whenever Y_1, ..., Y_k are bounded random variables that are measurable with respect to G_1, ..., G_k, respectively.

Exercise 3.6.1 Consider an ordinary random walk S = (S_n; n ≥ 0) as described in Exercise 1.1.1. Fix some n ≥ 1 and define the following σ-fields: (a) let F_n be the σ-field generated by (S_j; 0 ≤ j ≤ n); and (b) let G_n denote the σ-field generated by (S_{n+j}; j ≥ 0). Show that F_n and G_n are conditionally independent, given σ(S_n), the σ-field generated by S_n.

The following characterization theorem describes an important connection between conditional independence and commutation.

Theorem 3.6.1 For a given filtration F = (F_t; t ∈ N_0^N), the following are equivalent:

(i) F is commuting; and
(ii) for all s, t ∈ N_0^N, F_t and F_s are conditionally independent, given F_{s∧t}.

Proof Suppose that for all t ∈ N_0^N, Y_t is a bounded F_t-measurable random variable. By the towering property of conditional expectations (equation (1), Section 1.1), a.s.,

E[Y_t Y_s | F_{t∧s}] = E[ Y_t E{Y_s | F_t} | F_{t∧s} ] = E[ Y_t E{Y_s | F_{t∧s}} | F_{t∧s} ] = E[Y_t | F_{t∧s}] · E[Y_s | F_{t∧s}].

Thus, (i) ⇒ (ii). Conversely, supposing that (ii) holds,

E[Y_t Y_s] = E[ E{Y_t Y_s | F_{t∧s}} ] = E[ E{Y_t | F_{t∧s}} · E{Y_s | F_{t∧s}} ] = E[ E{Y_t | F_{t∧s}} · Y_s ].

Since F_{t∧s} ⊂ F_s and the above holds for all bounded F_s-measurable random variables Y_s, E[Y_t | F_{t∧s}] = E[Y_t | F_s], almost surely. This shows that (ii) implies (i), and hence (ii) is equivalent to (i).

At this point it would be a good idea to take another look at Figures 1.1, 1.2, and 1.3 and interpret commutation pictorially, at least in the interesting case where the N-parameter filtration in question is the history of a certain stochastic process.

We conclude this section, and in fact this chapter, with a series of indispensable exercises on conditionally independent σ-fields. Henceforth, let G_1, G_2 denote two σ-fields that are conditionally independent, given another σ-field G_3.

Exercise 3.6.2 Prove that G_1 ∨ G_3 and G_2 ∨ G_3 are conditionally independent, given G_3. Use this to prove that the following are equivalent: (i) G_1 and G_2 are conditionally independent, given G_3; and (ii) for all bounded G_1-measurable random variables X_1,

E[X_1 | G_2 ∨ G_3] = E[X_1 | G_3], a.s.
In the general theory of random fields, condition (ii) is referred to as the Markov property for the triple (G_1, G_3, G_2); see Rozanov (1982, Section 1, Chapter 2).

Exercise 3.6.3 Show that G_1 and G_2 are conditionally independent, given G_4, for any other σ-field G_4 ⊃ G_3 such that G_4 ⊂ G_2 ∨ G_3. Construct an example to show that this last condition cannot be dropped. (Hint: Let G_1 and G_2 denote the σ-fields generated by ξ_1 and ξ_2, respectively, where ξ_1, ξ_2 are i.i.d. ±1 with probability 1/2 each. Consider G_3 = {∅, Ω} and let G_4 denote the σ-field generated by ξ_1 + ξ_2.)
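The counterexample hinted at in Exercise 3.6.3 is small enough to check by direct enumeration. The sketch below (Python/NumPy, purely for illustration) verifies both halves: the product rule holds given the trivial σ-field, but fails given σ(ξ_1 + ξ_2), since on the event ξ_1 + ξ_2 = 0 the value of ξ_1 determines ξ_2.

```python
import itertools
import numpy as np

# Exercise 3.6.3's hinted counterexample: xi1, xi2 i.i.d. +/-1 on a
# uniform 4-point sample space.
pts = np.array(list(itertools.product([-1.0, 1.0], repeat=2)))
x1, x2 = pts.T
s = x1 + x2

def cond(values, key):
    """Conditional expectation given sigma(key): average over fibers."""
    out = np.empty_like(values)
    for v in np.unique(key):
        out[key == v] = values[key == v].mean()
    return out

# Given the trivial sigma-field G3, the product rule holds:
assert np.isclose((x1 * x2).mean(), x1.mean() * x2.mean())
# Given G4 = sigma(xi1 + xi2), conditional independence fails:
assert not np.allclose(cond(x1 * x2, s), cond(x1, s) * cond(x2, s))
```

On the fiber {ξ_1 + ξ_2 = 0}, the left-hand side of the product rule is −1 while the right-hand side is 0, which is exactly where the second assertion bites.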
Exercise 3.6.4 Suppose Hn (n ≥ 1) is a decreasing sequence of σ-fields such that for all n ≥ 1, G1 and G2 are conditionally independent, given Hn . Prove that G1 and G2 are conditionally independent, given ∩n≥1 Hn . Exercise 3.6.5 Verify that whenever G1 and G2 are conditionally independent, given G3 , then G1 ∩ G2 ⊂ G3 . (Warning: You will need to assume that the underlying probability space, together with all of the mentioned σ-fields, is complete.)
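Exercise 3.6.1 can likewise be probed numerically on a finite window of the walk. The sketch below (Python/NumPy; the window length 6, the cut point n = 3, and the two bounded functionals are arbitrary illustrative choices) enumerates all ±1 paths and checks the defining product rule after conditioning on S_3.

```python
import itertools
import numpy as np

# Exercise 3.6.1 on a finite window: simple random walk
# S_k = x_1 + ... + x_k with fair +/-1 steps.  Y1 is measurable with
# respect to the past (S_1, S_2, S_3), Y2 with respect to the future
# (S_3, S_4, ...); conditionally on S_3 they should factorize.
steps = np.array(list(itertools.product([-1, 1], repeat=6)))
S = steps.cumsum(axis=1)                        # S[:, k-1] = S_k
Y1 = (S[:, :3].max(axis=1) >= 2).astype(float)  # past-measurable
Y2 = (S[:, 5] - S[:, 2] > 0).astype(float)      # future-measurable

def cond_on_S3(values):
    """Conditional expectation given sigma(S_3), by fiber averages."""
    out = np.empty_like(values)
    for v in np.unique(S[:, 2]):
        out[S[:, 2] == v] = values[S[:, 2] == v].mean()
    return out

assert np.allclose(cond_on_S3(Y1 * Y2), cond_on_S3(Y1) * cond_on_S3(Y2))
```

The identity holds exactly here because, given S_3, the past steps and the future increments are independent—which is precisely the content of the exercise.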
4 Supplementary Exercises

1. Consider a collection of real numbers a = (a_t; t ∈ N_0^N). Show that corresponding to a, there exists a unique sequence δ = (δ_t; t ∈ N_0^N) such that (i) for all t ∈ N_0^N, a_t = Σ_{s≼t} δ_s; and (ii) for all t ∈ N_0^N, δ_t depends only on (a_s; s ≼ t). This is an aspect of the so-called inclusion–exclusion formula of J. H. Poincaré, and the δ's are referred to as the increments of the sequence a.

2. Suppose M = (M_n; n ≥ 0) is a nonnegative uniformly integrable 1-parameter submartingale with respect to a filtration F = (F_n; n ≥ 0). (i) Use Doob's convergence theorem and Fatou's lemma to prove that for any F-stopping time T, E[M_T 1l_(T<∞)] < ∞. (ii) Conclude that whenever T is a finite F-stopping time, (M_{T∧n}; n ≥ 0) is uniformly integrable. (iii) Conclude that for all uniformly integrable submartingales M = (M_n; n ≥ 0) and for all stopping times T,

E{|M_T| 1l_(T<∞)} ≤ sup_n E{|M_n|} < ∞.
Moreover, when M is a uniformly integrable martingale, then for all stopping times T, E[M_T 1l_(T<∞)] = E[M_0].

3. As observed by S. D. Chatterji, Doob's convergence theorem for martingales is also a consequence of Doob's maximal inequalities; cf. Chatterji (1967, 1968) and Lamb (1973). In this exercise you may not appeal to the martingale convergence theorem. Throughout, the underlying probability space is (Ω, F_∞, P) and F = (F_n; n ≥ 0) denotes a filtration of sub-σ-fields of F_∞ with F_∞ = ∨_n F_n. (i) Suppose (Y_n; n ≥ 0) is a uniformly integrable martingale with respect to the filtration F. For all A ∈ F_n, define µ(A) = E[Y_n 1l_A] and verify that µ is a (signed) measure on ∨_n F_n. Furthermore, µ is absolutely continuous with respect to P. Conclude the existence of a random variable Y_∞ ∈ L^1(P) such that for all n ≥ 0, Y_n = E[Y_∞ | F_n], a.s. (ii) Let F denote the collection of all Z ∈ L^1(P) such that for some n = n(Z), Z is F_n-measurable. Prove that F is dense in L^1(P).
(iii) Prove that whenever Y is the uniformly integrable martingale of (i), then as n → ∞, Y_n converges a.s. and in L^1(P) to Y_∞. (iv) Prove that whenever M = (M_n; n ≥ 0) is a uniformly integrable martingale on an arbitrary probability space (Ω, G, P) and with respect to an arbitrary filtration F_1, F_2, ..., there exists M_∞ ∈ L^1(P) such that M_n converges to M_∞ a.s. and in L^1(P). (v) Let M = (M_n; n ≥ 0) denote a martingale that is bounded in L^1(P); that is, sup_n E{|M_n|} < ∞. Prove that there exists M_∞ ∈ L^1(P) such that M_n converges to M_∞, a.s. (vi) Deduce Doob's martingale convergence theorem from (v) and Doob's decomposition, Exercise 1.2.1. (Hint: For part (i), check that µ(A) = lim_{k→∞} E[Y_k 1l_A]. For part (ii), consider Z's of the form Z = 1l_E, where E ∈ F_n. For part (iii), choose Y′_∞ ∈ F such that E{|Y_∞ − Y′_∞|} ≤ ε. Define Y′_n = E[Y′_∞ | F_n] and note that Y′_n → Y′_∞, a.s. Finally, estimate P(sup_n |Y_n − Y′_n| ≥ ε) via Doob's maximal inequality. For part (iv), let T = inf(n ≥ 0 : |M_n| ≥ R) and first prove that n → M_{T∧n} is uniformly integrable. Finish by picking R so large that P(T < ∞) < ε for ε small enough.)

4. Suppose M = (M_n; n ≥ 0) is a reversed martingale with respect to a reversed filtration F = (F_n; n ≥ 0). (i) Verify that lim_{n→∞} E[M_n] exists but may be −∞. (ii) If lim_n E[M_n] > −∞, prove the existence of a positive finite constant A such that sup_n P(|M_n| ≥ λ) ≤ A/λ, for all λ > 0. (iii) Show that whenever lim_n E[M_n] > −∞, (M_k^+; k ≥ 0) is uniformly integrable. (iv) Prove that whenever n > m and λ > 0,

0 ≥ E[M_n 1l_(M_n ≤ −λ)] ≥ E[M_n − M_m] + E[M_m 1l_(M_n < −λ)].

(v) Conclude that lim_n E[M_n] > −∞ is equivalent to the uniform integrability of M. (Hint: For part (ii), check that E[M_n^+ 1l_(M_n^+ ≥ λ)] ≤ E[M_1^+ 1l_(M_n^+ ≥ λ)].)

5. Consider the hypercube I_{0,1} = [0,1]^N. We can subdivide I_{0,1} into 2^N equal-sized subhypercubes of side 1/2 each and denote them by J_{1,1}, ..., J_{1,2^N}, in some order.
For instance, supposing that N = 2, we could take J_{1,1} = [0, 1/2]^2, J_{1,2} = [0, 1/2] × [1/2, 1], J_{1,3} = [1/2, 1] × [0, 1/2], and J_{1,4} = [1/2, 1]^2. Next, we toss 2^N independent p-coins (i.e., probability of heads equals p ∈ ]0,1[)—one coin for every one of the J's—and declare J_{1,j} open if the jth coin landed heads. Let N_1 denote the number of open hypercubes among J_{1,1}, ..., J_{1,2^N} and write these (randomly selected) open hypercubes as I_{1,1}, ..., I_{1,N_1} in some order. We also define K_1 = ∪_{j=1}^{N_1} I_{1,j}, which is a random compact subset of [0,1]^N. Next, we subdivide each of the I_{1,j}'s (1 ≤ j ≤ N_1) into 2^N equal-sized subhypercubes of side 1/4. According to newly introduced independent p-coins, we declare each
of them open or not, as before, to obtain a compact set K_2 ⊂ K_1. We can continue this construction inductively: Having constructed K_n as a random union of hypercubes J_{n,1}, ..., J_{n,N_n}, we subdivide each J_{n,i} into 2^N equal-sized subhypercubes and keep each of them or not according to i.i.d. p-coins, all tossed independently of the previous ones. Define K_{n+1} to be the union of the hypercubes kept in this (n+1)st stage and continue the process indefinitely to obtain a decreasing family of random compact sets K_1 ⊃ K_2 ⊃ ⋯. Let K = ∩_n K_n denote the limiting compact set and prove that there exists p_c ∈ ]0,1[ such that for all p > p_c, K ≠ ∅ with positive probability, whereas for all p < p_c, K = ∅, a.s. Can you compute p_c? This process of creating random Cantor-like sets was coined fractal percolation by Mandelbrot (1982). (Hint: Consider a hidden branching process.)

6. Consider an N-parameter martingale M = (M_t; t ∈ N_0^N) with respect to an N-parameter filtration F = (F_t; t ∈ N_0^N). We do not require F to be commuting here. (i) Verify that whenever E[M_t^2] < ∞ for all t ∈ N_0^N, then for all s ≼ t, both in N_0^N, E[(M_t − M_s)^2] = E[M_t^2] − E[M_s^2]. (ii) In particular, demonstrate that whenever sup_t E[M_t^2] < ∞, t → M_t is a Cauchy sequence in L^2(P). (iii) Deduce the martingale convergence theorem when sup_t E[M_t^2] < ∞. Thus, while we essentially need the commutation of F to prove the existence of a.s. limits, the existence of L^2(P) limits is another matter.

7. Let F = (F_t; t ∈ N_0^N) denote an N-parameter filtration. An N_0^N ∪ {+∞}-valued random variable T is an F-stopping point if for all t ∈ N_0^N, (T ≼ t) ∈ F_t. (i) Let X = (X_t; t ∈ N_0^N) be adapted to F and hold integers t^(2), ..., t^(N) ≥ 0 fixed. Prove, then, that for all Borel sets E ⊂ R, T is an F-stopping point, where T^(j) = t^(j) for all N ≥ j ≥ 2 and T^(1) = inf(t^(1) ≥ 0 : X_t ∈ E). (You need to interpret inf ∅ = +∞ and identify (∞, t^(2), ...
, t^(N)) with the point +∞ in the one-point compactification of N_0^N.) (ii) Prove that whenever M = (M_t; t ∈ N_0^N) is an N-parameter submartingale with respect to F and whenever T ≼ S are bounded stopping points, then E[M_T] ≤ E[M_S]. (iii) If T is an F-stopping point, let F_T denote the collection of all events A such that for all s ∈ N_0^N, A ∩ (T = s) ∈ F_s.
(a) Prove that F_T is a σ-field. (b) Show that T is F_T-measurable and that (T = s) can, equivalently, be replaced by (T ≼ s) in the definition of F_T. (c) If S and T are F-stopping points, S ∨ T is also an F-stopping point. (d) (Hard) Construct an example of a 2-parameter filtration F and two F-stopping points S and T such that S ∧ T is not an F-stopping point.
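Returning to Supplementary Exercise 5, the "hidden branching process" of the hint can be simulated directly: the number of retained hypercubes at stage n is a Galton–Watson process whose offspring distribution is Binomial(2^N, p), so the process is supercritical exactly when 2^N p > 1—suggesting p_c = 2^{−N}. The Monte Carlo sketch below (Python/NumPy; the depth, trial count, and the two values of p are arbitrary illustrative choices) contrasts a subcritical and a supercritical p for N = 2.

```python
import numpy as np

# Branching-process view of fractal percolation: each retained cube
# spawns Binomial(2^N, p) retained sub-cubes, independently.
def survives(p, rng, N=2, depth=25):
    count = 1                                  # start from [0,1]^N itself
    for _ in range(depth):
        count = rng.binomial(count * 2**N, p)  # retained sub-cubes
        if count == 0:
            return False                       # K_n empty => K empty
    return True

rng = np.random.default_rng(1)
freq = {p: np.mean([survives(p, rng) for _ in range(200)])
        for p in (0.15, 0.35)}                 # below / above 2^{-2} = 0.25
print(freq)
```

Below p = 1/4 the retained-cube count dies out almost surely, while above it the survival frequency stabilizes at the branching process's survival probability.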
8. Let F denote an N-parameter filtration. A collection of F-stopping points (γ(n); n ≥ 0) is an optional increasing path if γ(1) ≼ γ(2) ≼ ⋯, almost surely. Extend a part of Exercise 2.2.2 by showing that when M is an N-parameter martingale with respect to F and when γ is an optional increasing path, M ∘ γ is a one-parameter martingale with respect to F ∘ γ, where the ∘ notation is taken from Exercise 2.2.2.

9. Let R denote the collection of all sets of the form [0, t], where t ∈ N_0^N; let (Ω, G, P) designate a probability space; and let F = (F_t; t ∈ N_0^N) denote an N-parameter filtration of sub-σ-fields of G. A stopping domain D is a subset of N_0^N such that (a) D is a (random) denumerable union of elements of R; and (b) for all t ∈ N_0^N, (t ∈ D) ∈ F_t. Its interior D° is defined as the collection of all t ∈ D such that there exists s ∈ D with t ≺ s. Demonstrate the truth of the following: (i) Whenever N = 1, D \ D° is a stopping time with respect to F. (ii) Any nonrandom union of elements of R is a stopping domain. (iii) If D_1 and D_2 are stopping domains, so are D_1 ∩ D_2 and D_1 ∪ D_2. (iv) Suppose X is an adapted stochastic process. Let K be a measurable set and define

D_K = ∪{ [0, t] : t ∈ N_0^N, X_s ∈ K for all s ≼ t }.

Then, D_K is a stopping domain. (v) If D is a stopping domain, the following is a σ-field: F_D = {A ∈ G : A ∩ (t ∈ D°) ∈ F_t for all t ∈ N_0^N}. (vi) If D_1 ⊂ D_2 are stopping domains, F_{D_1} ⊂ F_{D_2}. (vii) If D is a stopping domain, then for all t ∈ N_0^N, (t ∈ D) ∈ F_D. (This is due to J. B. Walsh.)

10. (Hard. Continued from Supplementary Exercise 9.) Let F = (F_t; t ∈ N_0^N) denote an N-parameter filtration on some probability space and let F^1, ..., F^N denote its marginal filtrations. (i) For all t ∈ N_0^N, let F̄_t = ∨_{j=1}^N F^j_{t^(j)} and prove that F̄ = (F̄_t; t ∈ N_0^N) is an N-parameter filtration. (ii) If D is a stopping domain, define F̄_D to be the collection of all events A such that for all t ∈ N_0^N, A ∩ (t ∈ D°) ∈ F̄_t.
Prove that (a) F̄_D is a σ-field; (b) for all t ∈ N_0^N, (t ∈ D) ∈ F̄_D; and (c) for all stopping domains D_1 ⊂ D_2, F̄_{D_1} ⊂ F̄_{D_2}. (iii) An adapted process M = (M_t; t ∈ N_0^N) is a strong martingale if for all s ≺ t, both in N_0^N, M_t ∈ L^1(P) and E[δ_t | F̄_s] = 0, where the δ's denote the increments of M in the sense of Supplementary Exercise 1. Prove that whenever D_1 ⊂ D_2 are stopping domains,

E[ Σ_{t∈D_2} δ_t | F̄_{D_1} ] = Σ_{t∈D_1} δ_t,

almost surely.
This exercise constitutes J. B. Walsh's optional stopping theorem for strong martingales. It opens the door to a rich theory of strong martingales that runs parallel to the one-parameter theory.

11. (Hard) Suppose M denotes an N-parameter orthomartingale. For all i ∈ {1, ..., N}, all t ∈ N_0^N, and for all a < b, define U_t^i[a,b] to be the number of upcrossings of [a,b] made by the numerical sequence (M_s; s^(j) = t^(j) for all j ≠ i, s^(i) ≤ t^(i)). Prove that

sup_{t∈N_0^N} E{ max_{1≤i≤N} U_t^i[a,b] } < ∞,

as long as sup_t E(|M_t| {ln^+ |M_t|}^{N−1}) < ∞. Using this, describe another proof of Cairoli's second convergence theorem (Theorem 2.8.1).

12. (Hard) Let F denote a commuting N-parameter filtration and suppose that M is a submartingale with respect to F. In this exercise we prove Cairoli's second convergence theorem for M, using the ideas of the 1-parameter proof of S. D. Chatterji; cf. Supplementary Exercise 3. (i) If (Y_t; t ∈ N_0^N) is a uniformly integrable martingale, prove that there exists Y ∈ L^1(P) such that for all t ∈ N_0^N, Y_t = E[Y | F_t], a.s.
(ii) Let F denote the collection of all Z ∈ L^1(P) such that for some t = t(Z), Z is F_t-measurable. Prove that F is dense in L^1(P). (iii) Prove that whenever Y is the uniformly integrable martingale of (i), Y_t has an a.s. and L^1(P) limit, as t → ∞. (iv) Returning to the submartingale M, assume that

sup_t E(|M_t| {ln^+ |M_t|}^{N−1}) < ∞,
and show that by (i)–(iii), M converges a.s. as t → ∞.
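The increments of Supplementary Exercise 1, which also drive the strong-martingale exercises above, are easy to compute in practice. The sketch below (Python/NumPy; the 5×5 array of Gaussian numbers is an arbitrary illustrative choice) implements the N = 2 case of Poincaré's inclusion–exclusion formula and verifies that summing the increments over s ≼ t recovers the original array.

```python
import numpy as np

# Increments for N = 2:
#   delta_{i,j} = a_{i,j} - a_{i-1,j} - a_{i,j-1} + a_{i-1,j-1},
# with the convention a = 0 off the lattice N_0^2; then
# a_t = sum_{s <= t} delta_s is the two-fold cumulative sum.
rng = np.random.default_rng(0)
a = rng.normal(size=(5, 5))

pad = np.zeros((6, 6))
pad[1:, 1:] = a                     # zero boundary encodes a = 0 off-lattice
delta = pad[1:, 1:] - pad[:-1, 1:] - pad[1:, :-1] + pad[:-1, :-1]

recovered = delta.cumsum(axis=0).cumsum(axis=1)
assert np.allclose(recovered, a)
```

Uniqueness is visible in the code as well: δ_t is a finite signed sum of the a_s with s ≼ t, exactly as part (ii) of the exercise requires.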
5 Notes on Chapter 1

Section 1 The basic theory of discrete-parameter martingales is covered in most introductory graduate texts in probability theory such as (Billingsley 1995; Chow and Teicher 1997; Chung 1974; Durrett 1991) and the more advanced books (Dudley 1989; Stroock 1993). Many of the one-parameter exercises of this chapter are standard and borrowed from these texts. For more specialized treatments, together with various applications, see Garsia (1970, 1973), Hall and Heyde (1980), and Neveu (1975). The earlier development of the subject, together with a wealth of information, can be found in Doob (1990). A comprehensive account of 1-parameter martingale theory, in both discrete and continuous time, is Dellacherie and Meyer (1982). In the preamble the claim was made that maximal inequalities are critical to the existence of strong convergence theorems. This can be made quite precise; cf. Burkholder (1964).
The inequalities of Sections 1.3–1.5 are all part of one "master inequality" involving Orlicz norms; see Neveu (1975, Appendix). Exercise 1.2.1 is due to J. L. Doob; see Doob (1990) and Krickeberg (1963, 1965). An abstract theory of martingales can be found in Krickeberg and Pauc (1963). It is interesting that a nonadapted variant of Exercise 1.2.2 arises in the analysis of the game of craps; cf. Ethier (1998) for further details. Corollary 1.2.1 is what is most commonly called Doob's stopping time theorem; see Dellacherie and Meyer (1982, Sec. 2, Ch. V). Exercise 1.7.5 is the 0-1 law of Hewitt and Savage (1955). The argument outlined borrows its ideas from Hoeffding (1960) and arose from the fundamental work of W. Hoeffding on the so-called U-statistics; see Serfling (1980) for further details. Exercise 3.5.2 was originally presented for N = 1 in Blackwell and Dubins (1962, Theorem 2) in order to rigorously prove the following intriguing principle: "given infinite information, any two people who agree on all things possible and all things impossible, assign equal probabilities to all possible events." The fact that this exercise also holds for N > 1 is a remark in Föllmer (1984a). Exercise 3.5.2 is also sometimes called Hunt's lemma; cf. Dellacherie and Meyer (1982, Theorem 45, Ch. V).

Section 2 The literature on multiparameter martingales is truly massive. This is partly due to the fact that there are entirely different aspects to this theory. Various facets of the theory of multiparameter martingales not treated in this book, in particular those with applications to ergodic theory and optimization, can be found in (Edgar and Sucheston 1992; Cairoli and Dalang 1996). The astute reader of this section will note that the multiparameter results of Section 2 are proved by several one-parameter projections. See Cairoli and Dalang (1996) and Sucheston (1983).

Section 3 This section, together with Section 2, can be viewed as a systematic exposition of the works of R.
Cairoli and J. B. Walsh on this subject. This material, in a less expanded form and together with many other aspects of the theory of multiparameter processes, already appears in the extensive notes of J. B. Walsh; see Walsh (1986a, 1986b). Section 3.3 is a slightly simplified version of the counterexample of L. E. Dubins and J. W. Pitman. The mentioned counterexample, in its full generality, is presented in Exercise 3.3.3. Exercise 3.3.4 is ascribed to S. Janson in Walsh (1986b). The multiparameter martingale convergence theorems here are part of a metatheorem that was made precise in Sucheston (1983). In words, it states that we can sometimes work one parameter at a time. An alternative approach to Cairoli's convergence theorem that is closer, in spirit, to the discussion of this chapter can be found in Bakry (1981a). One can also prove such theorems by using the rather general theory of amarts or asymptotic martingales; see Edgar and Sucheston (1992) for a pedagogic treatment as well as an extensive bibliography of the relevant literature. As mentioned within it, the main idea behind Supplementary Exercise 3 is borrowed from Chatterji (1967, 1968), where the martingale convergence theorem is proved for some Banach-space-valued martingales. In the context of the original
finite-dimensional convergence theorem, this proof was rediscovered later in Lamb (1973). Supplementary Exercises 11 and 12 are new; see Khoshnevisan (2000) for yet another approach. Exercises 3.6.2–3.6.5 form an integral part in the study of splitting fields and Markov properties of random fields; see Rozanov (1982), for instance. There is a large body of work that extends and studies martingales indexed by partially order sets. A good general reference for those motivated by classical analysis is Walsh (1986b, Ch. 1). Those motivated by the general theory of random fields are studied in (H¨ urzeler 1985; Ivanova and Mertsbakh 1992; Song 1988), among many other references; see also the volume Fouque et al. (1996) for some of the recent developments. A comprehensive treatment of set-indexed martingales can be found in the recent book of Ivanoff and Merzbach (2000). Section 4 Supplementary Exercise 7 is borrowed from Walsh (1986b); see also Mazziotto and Szpirglas (1983, 1982). A self-contained treatment of multiparameter optional stopping theory and related results is Cairoli and Dalang (1996). Supplementary Exercise 8 is part of the general theory of Walsh (1981); see also Cairoli and Dalang (1996).
2 Two Applications in Analysis
One of the unifying themes of this book is its systematic applications of martingales and, in particular, maximal inequalities. While most of our intended applications are in probability theory, in this chapter we sidetrack slightly and show probabilistic/martingale proofs of two fundamental analytic theorems. The first is a theorem that states that Haar functions form an orthonormal basis for L^p[0,1] for any p > 1. That is, there exists a prescribed collection of piecewise-flat functions (H_ℓ; ℓ ≥ 1) such that any function f ∈ L^p[0,1] can be written as f = Σ_ℓ ⟨f, H_ℓ⟩ H_ℓ, where ⟨•,•⟩ denotes the Hilbertian inner product on L^2[0,1], extended to L^p, and the infinite sum converges in L^p[0,1]. The second theorem of interest to us is the celebrated differentiation theorem of H. Lebesgue; it states that indefinite integrals of functions in L^1[0,1] are almost everywhere differentiable. After gathering some preliminary information, this chapter carefully states and proves the mentioned results. Moreover, it describes some multidimensional extensions of these works that will use the methods of multiparameter martingales.
1 Haar Systems

Suppose f : [0,1]^N → R is Lebesgue measurable, and for all p > 0 define the L^p norm of f by

‖f‖_p = ( ∫_{[0,1]^N} |f(x)|^p dx )^{1/p},
where dx = Leb(dx) denotes N-dimensional Lebesgue measure. A probabilistic way to think of this is as follows: Given a random variable U uniformly distributed over [0,1]^N, ‖f‖_p^p = E{|f(U)|^p}. Moreover, for any Borel set A ⊂ R, Leb[f^{−1}(A)] = P(f(U) ∈ A), where f^{−1}(A) = {t ∈ [0,1]^N : f(t) ∈ A}. Let L^p[0,1]^N denote the collection of all such f's for which ‖f‖_p < ∞. When p ≥ 1, the space L^p[0,1]^N is a complete vector space in the metric defined by its norm, if we identify functions that are equal almost everywhere. We can also define L^∞[0,1]^N to be the collection of all bounded measurable functions f : [0,1]^N → R, with suitable identification made for a.e. equivalence. When f ∈ L^∞[0,1]^N, define ‖f‖_∞ to be the essential supremum, ess sup_{t∈[0,1]^N} |f(t)|. Recall that for any measurable g : [0,1]^N → R,

ess sup_{t∈[0,1]^N} g(t) = inf{ r ∈ R : Leb[g^{−1}((r,∞))] = 0 },
where inf ∅ = ∞. What do the elements of Lp [0, 1]N look like, when p ≥ 1? To answer this, let us begin with the simplest case, N = 1.
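The probabilistic reading of the norm lends itself to a quick numerical sanity check. The sketch below (Python/NumPy; the choices N = 2, p = 3, and f(x,y) = x + y are arbitrary illustrations) estimates ‖f‖_p^p = E|f(U)|^p by Monte Carlo and compares it with the exact integral ∫∫(x+y)^3 dx dy = 3/2.

```python
import numpy as np

# ||f||_p^p = E|f(U)|^p for U uniform on [0,1]^N: Monte Carlo check
# with N = 2, p = 3, f(x, y) = x + y.
rng = np.random.default_rng(0)
U = rng.random((1_000_000, 2))        # a million uniform points in [0,1]^2
mc = np.mean(U.sum(axis=1) ** 3)      # estimates int (x+y)^3 dx dy = 3/2
assert abs(mc - 1.5) < 0.01
```

Nothing here is specific to this f; any integrable f on [0,1]^N can be plugged in, which is exactly the identification of L^p norms with moments of f(U).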
1.1 The 1-Dimensional Haar System

Let h_{0,0}(t) = 1 for 0 ≤ t ≤ 1, and for all k ≥ 1, define

h_{k,0}(t) = 2^{(k−1)/2}, if 0 ≤ t ≤ 2^{−k}; −2^{(k−1)/2}, if 2^{−k} < t ≤ 2^{−k+1}; and 0, otherwise.

For any 0 ≤ j ≤ 2^{k−1} − 1, define

h_{k,j}(t) = h_{k,0}(t − j2^{−k+1}), t ≥ 0.

That is, for all 0 ≤ j ≤ 2^{k−1} − 1, h_{k,j} is h_{k,0} shifted to the right by the amount j2^{−k+1}. The collection {h_{k,j}; k ≥ 1, 0 ≤ j ≤ 2^{k−1} − 1} ∪ {h_{0,0}} is the standard Haar system on [0,1], and the h_{k,j}'s are the Haar functions on [0,1]. In order to simplify the forthcoming formulæ, let us define the index set Γ(k) = {0, 1, ..., 2^{(k∨1)−1} − 1}. Thus, the Haar system is compactly represented as {h_{k,j}; k ≥ 0, j ∈ Γ(k)}.
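For concreteness, the Haar functions just defined can be coded up and their orthonormality verified numerically. In the sketch below (Python/NumPy; the particular index pairs tested are arbitrary choices), the midpoint grid is fine enough that the Riemann sums compute inner products of these piecewise-constant functions exactly, up to rounding.

```python
import numpy as np

# The Haar functions h_{k,j} on [0,1], evaluated pointwise.
def haar(k, j, t):
    t = np.asarray(t, dtype=float)
    if k == 0:
        return np.ones_like(t)                # h_{0,0} = 1
    s = t - j * 2.0 ** (1 - k)                # shift h_{k,0} by j * 2^{-k+1}
    up = (0 <= s) & (s <= 2.0 ** (-k))
    down = (2.0 ** (-k) < s) & (s <= 2.0 ** (1 - k))
    return 2.0 ** ((k - 1) / 2) * (up.astype(float) - down.astype(float))

grid = (np.arange(4096) + 0.5) / 4096         # midpoints of a dyadic mesh
system = [(0, 0), (1, 0), (2, 0), (2, 1), (3, 2)]
H = np.array([haar(k, j, grid) for k, j in system])
gram = H @ H.T / grid.size                    # matrix of <h_{k,j}, h_{m,n}>
assert np.allclose(gram, np.eye(len(system)))
```

The identity Gram matrix is exactly the orthonormality property established in the text.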
Note that the Haar functions are in L^p[0,1] for all p > 0. It is also a simple matter to check that we have the following orthonormality property: For all k, m ≥ 0, j ∈ Γ(k), and n ∈ Γ(m),

∫_0^1 h_{k,j}(r) h_{m,n}(r) dr = 1, if (k,j) = (m,n); and 0, otherwise.

When it makes sense, define the inner product ⟨f, g⟩ = ∫_0^1 f(r)g(r) dr. As is usual in Lebesgue's theory, this induces a norm ‖f‖_2 = ⟨f, f⟩^{1/2} and renders L^2[0,1] a real separable Hilbert space. In particular, ⟨h_{k,j}, h_{n,m}⟩ = 0 if (k,j) ≠ (n,m), and otherwise it is equal to ‖h_{k,j}‖_2^2 = 1. In this section we will show that the Haar system spans L^1[0,1], and consequently, it also spans L^p[0,1] for any p ≥ 1. Thus, in light of the orthonormality property mentioned above, our goal is to show that the Haar system is an orthonormal basis for L^p[0,1] for any p ≥ 1. Introduce a random variable U that is uniformly distributed on [0,1], and define the dyadic filtration

F_k = σ( h_{k,j}(U); j ∈ Γ(k) ), k ≥ 0,
where σ{⋯} denotes the σ-field generated by the objects in the braces. The following is an important exercise.

Exercise 1.1.1 Prove that the dyadic filtration is indeed a filtration. Also prove that {sgn h_{k,j}(U); k ≥ 1, j ∈ Γ(k)} is a collection of mean-zero independent ±1-valued random variables, where sgn x equals 1 if x ≥ 0 and it equals −1 if x < 0. (Hint: Consider first the binary representation of U.)

Let us consider an arbitrary f ∈ L^1[0,1]. Note that for all k ≥ 1 and all j ∈ Γ(k),

|⟨f, h_{k,j}⟩| ≤ ‖h_{k,j}‖_∞ · ‖f‖_1 = 2^{(k−1)/2} ‖f‖_1 < ∞.    (1)
Therefore, the following is well-defined:

M_n(f) = Σ_{k=0}^n Σ_{j∈Γ(k)} h_{k,j}(U) ⟨f, h_{k,j}⟩.
(What about the k = 0 term?)

Exercise 1.1.2 Verify that whenever q > k are both integers, M_k(1l_{[0,2^{−q}]}) = 2^{−q+k} 1l_{[0,2^{−k}]}(U), a.s. On the other hand, when q ≤ k, show that M_k(1l_{[0,2^{−q}]}) = 1l_{[0,2^{−q}]}(U), a.s. (Hint: When q > k, check that M_k(1l_{[0,2^{−q}]}) = 2^{−q} β_k, where

β_k = 1 + Σ_{j=1}^k 2^{j−1} ( 1l_{[0,2^{−j}]}(U) − 1l_{[2^{−j},2^{−j+1}]}(U) ),
and use induction.) The key observation of this subsection is the following:
Proposition 1.1.1 If f ∈ L^1[0,1], then (M_n(f); n ≥ 0) is a martingale. In fact, for all n ≥ 0,

M_n(f) = E[f(U) | F_n], a.s.

Proof By Exercise 1.1.1, the h_{k,j}'s are i.i.d. mean-zero random variables. The martingale property is a simple consequence of this. What we are after is the stated representation of M_n(f) as a conditional expectation; this also implies the asserted martingale property of M. Since F_0 is the trivial σ-field, the assertion of the theorem holds for n = 0. We will show that it does for n ≥ 1 as well. Note that f → M_n(f) and f → E[f(U) | F_n] are both linear maps. First, we prove the result when f has the following form:

f(t) = 1l_{[0,2^{−q}]}(t), t ∈ [0,1],

where q ≥ 1 is a fixed integer. By directly computing, we see that for any two integers k ≥ q and j ∈ Γ(k), the following holds almost surely:

E[f(U) | F_k] 1l_{[j2^{−k},(j+1)2^{−k}]}(U)
  = P(U ≤ 2^{−q} | j2^{−k} ≤ U ≤ (j+1)2^{−k}) 1l_{[j2^{−k},(j+1)2^{−k}]}(U)
  = 1l_{[j2^{−k},(j+1)2^{−k}]}(U) × { 0, if j ≥ 2^{−q+k}; 1, if 0 ≤ j ≤ 2^{−q+k} − 1 }.    (2)

On the other hand, if k < q, then for all integers j ∈ Γ(k),

E[f(U) | F_k] 1l_{[j2^{−k},(j+1)2^{−k}]}(U) = { 2^{−q+k} 1l_{[0,2^{−k}]}(U), if j = 0; 0, if j ≥ 1 }.    (3)

It is easy to see directly that when k ≥ q, M_k(f) 1l_{[j2^{−k},(j+1)2^{−k}]}(U) equals the right-hand side of (2), and when k < q, it equals the right-hand side of (3); see Exercise 1.1.2 above. This verifies the result when f is of the form f = 1l_{[0,2^{−q}[} for an integer q ≥ 0. The same argument works to establish the result when f is of the form f = 1l_{[j2^{−q},(j+1)2^{−q}[}, for positive integers j and q. The remainder of our proof is an exercise in measure theory. By the asserted linearity, the result holds when f is simple and dyadic, i.e., when it is of the form f = Σ_{j=1}^m α_j 1l_{[j2^{−q},(j+1)2^{−q}[}, where m and q are positive integers and α_1, ..., α_m are real-valued. Any continuous function can be sandwiched between two simple dyadic functions. Therefore, the result holds for all continuous f. Since continuous functions are dense in L^1[0,1], the proposition follows.
It is an immediate consequence of Proposition 1.1.1 that if f ∈ L^1[0,1], then (M_n(f); n ≥ 1) is a uniformly integrable martingale. Therefore, by the (one-parameter) martingale convergence theorem (Theorem 1.7.1, Chapter 1), it converges almost surely and in L^1(P). In fact,

lim_{n→∞} M_n(f) = E[f(U) | ∨_n F_n] = f(U),

almost surely, and lim_n E{|M_n(f) − f(U)|} = 0. In analytical terms, we have proved the following result:

Theorem 1.1.1 If f ∈ L^1[0,1], the following limit exists for Lebesgue-almost all t ∈ [0,1] and in L^1[0,1]:

f(t) = lim_{n→∞} Σ_{k=0}^n Σ_{j∈Γ(k)} h_{k,j}(t) ⟨f, h_{k,j}⟩.
That is, any function in L1 [0, 1] can be thought of as an infinite linear combination of Haar functions. Thus, we have shown that the Haar system forms a basis for L1 [0, 1]. An analogous result holds in Lp [0, 1] when p ∈ (1, ∞). Exercise 1.1.3 Prove that whenever f ∈ Lp [0, 1] for p ∈ [1, ∞), the convergence in Theorem 1.1.1 holds almost everywhere and in Lp [0, 1].
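Theorem 1.1.1 can be watched in action numerically. The self-contained sketch below (Python/NumPy; the test function f(t) = t², the grid size, and the truncation levels are all arbitrary illustrative choices, with grid sums standing in for the integrals) computes the partial Haar sums and observes the L^1 error shrink as the level n grows.

```python
import numpy as np

# Partial Haar sums of f(t) = t^2, with L^1 error on a fine midpoint grid.
def haar(k, j, t):
    if k == 0:
        return np.ones_like(t)
    s = t - j * 2.0 ** (1 - k)
    up = ((0 <= s) & (s <= 2.0 ** (-k))).astype(float)
    down = ((2.0 ** (-k) < s) & (s <= 2.0 ** (1 - k))).astype(float)
    return 2.0 ** ((k - 1) / 2) * (up - down)

grid = (np.arange(8192) + 0.5) / 8192
f = grid ** 2

errors = []
approx = np.zeros_like(grid)
for k in range(7):                          # truncation levels k = 0,...,6
    n_j = 1 if k == 0 else 2 ** (k - 1)
    for j in range(n_j):
        h = haar(k, j, grid)
        approx += (f * h).mean() * h        # add <f, h_{k,j}> h_{k,j}
    errors.append(np.abs(f - approx).mean())

assert errors[-1] < errors[0]
assert errors[-1] < 0.01
```

The level-n partial sum is the conditional expectation E[f(U) | F_n] of Proposition 1.1.1, i.e., the best piecewise-constant approximation of f on dyadic intervals of length 2^{−n}, which is why the error decays like 2^{−n} for this smooth f.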
1.2 The N-Dimensional Haar System

The Haar system on [0,1] extends nicely to form a basis for L^p[0,1]^N (p ≥ 1). Recall that for any integer k ≥ 0, Γ(k) = {0, ..., 2^{k−1} − 1} if k ≥ 1, and Γ(0) = {0}, and that {h_{k,j}; k ≥ 0, j ∈ Γ(k)} is the Haar system on [0,1] described in the previous section. Let us begin with the N-dimensional analogue of Γ(k). For all k ∈ N_0^N, define

Γ^N(k) = Γ(k^(1)) × ⋯ × Γ(k^(N)).

Suppose k ∈ N_0^N and j ∈ Γ^N(k). For all t ∈ [0,1]^N, we can define

h^N_{k,j}(t) = ∏_{ℓ=1}^N h_{k^(ℓ),j^(ℓ)}(t^(ℓ)), t ∈ [0,1]^N.

The Haar system on [0,1]^N is the collection {h^N_{k,j}; k ∈ N_0^N, j ∈ Γ^N(k)}. Note that h^1 is our old Haar system on [0,1] from Section 1.1.
2. Two Applications in Analysis
Next, take a random vector V that is uniformly distributed on [0,1]^N, and define the dyadic filtration
\[
\mathcal{F}_t = \sigma\Big( h^N_{k,j}(V);\ 0 \preceq k \preceq t,\ j \in \Gamma^N(k) \Big), \qquad t \in \mathbb{N}_0^N.
\]

Exercise 1.2.1 Show that the above dyadic filtration is indeed an N-parameter commuting filtration.

For all f ∈ L^1[0,1]^N and t ∈ N_0^N, we define M_t(f) by
\[
M_t(f) = \sum_{0 \preceq k \preceq t}\ \sum_{j \in \Gamma^N(k)} h^N_{k,j}(V)\, \big\langle f, h^N_{k,j} \big\rangle,
\]
where ⟨f, g⟩ = ∫_{[0,1]^N} f(t) g(t) dt, whenever it makes sense. Note that M_t(f) is well-defined as long as f ∈ L^1[0,1]^N. Indeed, using equation (1) of Section 1.1, for all k ∈ N^N and j ∈ Γ^N(k),
\[
\big| \big\langle f, h^N_{k,j} \big\rangle \big| \le 2^{\frac{1}{2}\left( \sum_{\ell=1}^{N} k^{(\ell)} - N \right)} \, \| f \|_1 .
\]
Our multidimensional analogue of Proposition 1.1.1 is the following.

Proposition 1.2.1 F is a commuting filtration. Moreover, if f ∈ L^1[0,1]^N, then
\[
M_t(f) = \mathrm{E}\big[ f(V) \,\big|\, \mathcal{F}_t \big], \qquad t \in \mathbb{N}_0^N.
\]

Proof That F is a filtration is immediate. Clearly, the coordinates of V are i.i.d. random variables, each uniformly picked from [0,1]. Therefore, by Exercise 1.1.1, {h^N_{k,j}(V); k ∈ N_0^N, j ∈ Γ^N(k)} is a collection of mean-zero independent random variables. Exercise 1.2.1 shows that F is commuting. In fact, we can be more careful and write F in terms of its marginal filtrations:
\[
\mathcal{F}_t = \bigvee_{\ell=1}^{N} \mathcal{F}^{(\ell)}_{t^{(\ell)}}, \qquad t \in \mathbb{N}_0^N, \tag{1}
\]
where the marginal filtrations F^{(1)}, …, F^{(N)} are independent one-dimensional dyadic filtrations. It remains to demonstrate the representation of M_t(f) as the mentioned conditional expectation. By appealing to arguments of measure theory, it suffices to prove the result when f is of the form f(t) = ∏_{ℓ=1}^N f_ℓ(t^{(ℓ)}), t ∈ [0,1]^N, where f_1, …, f_N are bounded, measurable functions from [0,1] to R. For f of this form, equation (1) implies that almost surely, E[f(V) | F_t] = ∏_{ℓ=1}^N E[f_ℓ(V^{(ℓ)}) | F^{(ℓ)}_{t^{(ℓ)}}]. By Proposition 1.1.1,
\[
\mathrm{E}\big[ f(V) \,\big|\, \mathcal{F}_t \big] = \prod_{\ell=1}^{N} M_{t^{(\ell)}}(f_\ell), \qquad \text{a.s.},
\]
where (M_k(f); k ≥ 0) is the one-parameter process defined in Section 1.1. (Note: The notation is being slightly abused. When t ∈ N_0^N and f: [0,1]^N → R, M_t(f) is the stochastic process of this section. When t ∈ N_0^1 and f: [0,1] → R, M_t(f) is the one defined in Section 1.1.) To finish, let us observe the following simple calculation for f of the above form:
\[
M_t(f) = \prod_{\ell=1}^{N} M_{t^{(\ell)}}(f_\ell).
\]
This follows from the product form of the Haar system together with the observation that when f = ∏_ℓ f_ℓ,
\[
\big\langle f, h^N_{k,j} \big\rangle = \prod_{\ell=1}^{N} \big\langle f_\ell, h_{k^{(\ell)},\, j^{(\ell)}} \big\rangle .
\]
A final word of caution: The first inner product above is an integral over [0,1]^N, while the second one is over [0,1]. This completes our proof.

Exercise 1.2.2 If f ∈ L^1[0,1]^N, prove that (M_{t,…,t}(f); t ∈ N_0) is a uniformly integrable one-parameter martingale with respect to the one-parameter filtration (F_{t,…,t}; t ∈ N_0).

We can argue as in the previous section and use Exercise 1.2.2 to deduce the following:

Theorem 1.2.1 Suppose f ∈ L^1[0,1]^N. Then,
\[
f(t) = \lim_{n\to\infty}\ \sum_{0 \preceq k \preceq (n,\dots,n)}\ \sum_{j \in \Gamma^N(k)} h^N_{k,j}(t)\, \big\langle f, h^N_{k,j} \big\rangle,
\]
for Lebesgue almost all t ∈ [0,1]^N and in L^1[0,1]^N.

Exercise 1.2.3 Complete the above proof of Theorem 1.2.1.
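Proposition 1.2.1 has a concrete discrete counterpart that can be verified directly (this sketch is not from the text, and the function and resolutions are illustrative): in two parameters, the rectangular partial sum M_t(f) equals the average of f over dyadic rectangles with side lengths 2^{-t1} and 2^{-t2}, which is the conditional expectation given F_t. The standard Haar normalization is assumed.

```python
import numpy as np

def haar(k, j, n):
    """1-D Haar samples, standard normalization (an assumption; see Section 1.1)."""
    t = (np.arange(n) + 0.5) / n
    if k == 0:
        return np.ones(n)
    left = j * 2.0 ** -(k - 1)
    mid, right = left + 2.0 ** -k, left + 2.0 ** -(k - 1)
    return 2.0 ** ((k - 1) / 2) * (((t >= left) & (t < mid)).astype(float)
                                   - ((t >= mid) & (t < right)).astype(float))

def gamma(k):
    return range(1) if k == 0 else range(2 ** (k - 1))

def rect_partial_sum(F, t1, t2):
    """M_t(f) for t = (t1, t2): the sum of all tensor Haar terms with k below t."""
    n = F.shape[0]
    S = np.zeros_like(F)
    for k1 in range(t1 + 1):
        for j1 in gamma(k1):
            h1 = haar(k1, j1, n)
            for k2 in range(t2 + 1):
                for j2 in gamma(k2):
                    h = np.outer(h1, haar(k2, j2, n))
                    S += np.mean(F * h) * h   # <f, h^2_{k,j}> h^2_{k,j}
    return S

n = 128
x = (np.arange(n) + 0.5) / n
F = np.outer(x, 1.0 - x ** 2)                 # illustrative f(t) = t1 (1 - t2^2)
t1, t2 = 3, 2                                  # different resolutions per coordinate
S = rect_partial_sum(F, t1, t2)
# M_t(f) = E[f(V) | F_t]: averages over dyadic 2^{-t1} by 2^{-t2} rectangles
b1, b2 = n // 2 ** t1, n // 2 ** t2
avg = F.reshape(2 ** t1, b1, 2 ** t2, b2).mean(axis=(1, 3))
M = np.repeat(np.repeat(avg, b1, axis=0), b2, axis=1)
assert np.allclose(S, M)
```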
Note that we have used only the one-parameter theory to obtain the above. Suppose next that f is L(ln_+ L)^{N−1}-bounded. We can use the multiparameter martingale convergence theorem (Corollary 2.8.1 of Chapter 1) and its L^p[0,1]^N extensions (Exercise 1.1.2) to obtain the following sharper form of Theorem 1.2.1.

Theorem 1.2.2 Suppose f: [0,1]^N → R satisfies
\[
\int_{[0,1]^N} |f(t)| \cdot \big[ \ln_+ |f(t)| \big]^{N-1} \, dt < \infty. \tag{2}
\]
Then,
\[
f(t) = \lim_{s\to\infty}\ \sum_{0 \preceq k \preceq s}\ \sum_{j \in \Gamma^N(k)} h^N_{k,j}(t)\, \big\langle f, h^N_{k,j} \big\rangle,
\]
for Lebesgue almost all t ∈ [0,1]^N and in L^1[0,1]^N. Moreover, if f ∈ L^p[0,1]^N for some p ∈ (1, ∞), then the convergence also holds in L^p[0,1]^N.
Exercise 1.2.4 Complete the above proof of Theorem 1.2.2.
Theorem 1.2.1 shows that the Haar system forms a basis for L^1[0,1]^N. On the other hand, Theorem 1.2.2 implies that for p > 1, L^p[0,1]^N is spanned by the Haar system, in a uniform fashion.
2 Differentiation

We will discuss two differentiation theorems of classical analysis:
1. Lebesgue's differentiation theorem; and
2. the Jessen–Zygmund–Marcinkiewicz theorem.
2.1 Lebesgue's Differentiation Theorem

Given a continuous function f: [0,1]^N → R, and any ∆, t_0 ∈ R_+^N,
\[
\Bigg| \frac{1}{\prod_{j=1}^{N} \Delta^{(j)}} \int_{[t_0,\, t_0+\Delta]} f(t)\, dt - f(t_0) \Bigg| \le \sup_{t \in [t_0,\, t_0+\Delta]} \big| f(t) - f(t_0) \big| .
\]
Therefore, for any t_0 ∈ ]0,1[^N,
\[
\lim_{\substack{\Delta \in (0,\infty)^N:\\ \Delta \to 0}} \frac{1}{\prod_{j=1}^{N} \Delta^{(j)}} \int_{[t_0,\, t_0+\Delta]} f(t)\, dt = f(t_0). \tag{1}
\]
There is need for a technical aside here: As we have stated things, f(t) may not be defined for all t ∈ [t_0, t_0+∆]. To be completely careful, we will extend the definition of any f: [0,1]^N → R by defining f(t) to be 0 whenever t ∉ [0,1]^N. It is instructive to check that (1) holds uniformly over all choices of t_0 ∈ ]0,1[^N.

Equation (1) says that, at least for continuous functions, derivatives are anti-integrals. Lebesgue's differentiation theorem shows that, as long as the integrals are well-defined, this is almost always the case. In its simplest form, Lebesgue's differentiation theorem states the following:

Theorem 2.1.1 (Lebesgue's Differentiation Theorem) For any function f ∈ L^1[0,1]^N, and for almost all t ∈ [0,1]^N,
\[
\lim_{\varepsilon \to 0^+} \varepsilon^{-N} \int_{[t,\, t+(\varepsilon,\dots,\varepsilon)]} f(s)\, ds = f(t).
\]

It is important to note that the approximation in Lebesgue's differentiation theorem is not as strong as the one in (1). The latter allows for N-dimensional rectangles whose sides collapse possibly at different rates, while the former is a statement about N-dimensional hypercubes; in particular, the sides are all of equal length.

The key to the above theorem is a maximal inequality. To describe it, define the maximal operator M by
\[
M f(t) = \sup_{\varepsilon > 0} \varepsilon^{-N} \int_{[t,\, t+(\varepsilon,\dots,\varepsilon)]} |f(s)|\, ds, \qquad t \in [0,1]^N,
\]
where f(t) = 0 if t ∉ [0,1]^N, for simplicity. The required maximal inequality is the following:

Proposition 2.1.1 Suppose f ∈ L^1[0,1]^N. Then for all λ > 0,
\[
\mathrm{Leb}\big\{ t \in [0,1]^N : M f(t) \ge \lambda \big\} \le \frac{4^N \| f \|_1}{\lambda}.
\]
Proof Applying Proposition 1.2.1 to |f| and noting the absolute values, we see that for U uniformly distributed on [0,1]^N and F the corresponding dyadic filtration,
\[
M_t(f) = \mathrm{E}\big[ |f(U)| \,\big|\, \mathcal{F}_t \big], \qquad t \in \mathbb{N}_0^N.
\]
See Section 1.2 for definitions. In particular, M(f) = (M_t(f); t ∈ N_0^N) is an N-parameter martingale with respect to F. For all k ∈ N^N and j ∈ Γ^N(k), define I_{k,j} to be the interior of the (necessarily closed) support of the Haar function h^N_{k,j}. That is,
\[
I_{k,j} = \prod_{\ell=1}^{N} \Big]\, j^{(\ell)} 2^{-k^{(\ell)}},\ \big(j^{(\ell)}+1\big) 2^{-k^{(\ell)}} \,\Big[ , \qquad k \in \mathbb{N}^N,\ j \in \Gamma^N(k). \tag{2}
\]
The following is an instructive exercise in conditional expectations: For all t, k ∈ N^N with k ⪯ t and for all j ∈ Γ^N(k),
\[
\mathrm{E}\big[ |f(U)| \,\big|\, \mathcal{F}_t \big] \, \mathbf{1}_{(U \in I_{k,j})} = \frac{1}{\mathrm{Leb}(I_{k,j})} \int_{I_{k,j}} |f(s)|\, ds \cdot \mathbf{1}_{(U \in I_{k,j})}, \qquad \text{a.s.}
\]
Define p: R → R^N by p(x) = (x, …, x), and let
\[
X_k(t) = 2^{Nk} \sum_{j \in \Gamma^N(p(k))} \int_{I_{p(k),j}} |f(s)|\, ds \cdot \mathbf{1}_{I_{p(k),j}}(t), \qquad k \ge 1,\ t \in \mathbb{R}_+^N.
\]
Since Leb(I_{p(k),j}) = 2^{-Nk} and the I_{p(k),j}'s are disjoint, we have shown that for all integers k ≥ 1, M_{p(k)}(f) = X_k(U), a.s. In particular, we have shown that (X_k(U); k ≥ 1) is a nonnegative martingale with respect to the filtration F∘p = (F_{p(k)}; k ≥ 1). Therefore, by the weak (1,1) inequality (Theorem 1.3.1, Chapter 1), for all λ > 0,
\[
\begin{aligned}
\mathrm{P}\Big( \sup_{k \ge 1} X_k(U) \ge \lambda \Big)
&\le \frac{1}{\lambda} \sup_{k \ge 1} \mathrm{E}\big[ X_k(U) \big] \\
&= \frac{1}{\lambda} \sup_{k \ge 1} 2^{Nk} \sum_{j \in \Gamma^N(p(k))} \int_{I_{p(k),j}} |f(s)|\, ds \cdot \mathrm{P}\big( U \in I_{p(k),j} \big) \\
&= \frac{\| f \|_1}{\lambda}, 
\end{aligned} \tag{3}
\]
since the closure of the union over the j's of the I_{p(k),j}'s is all of [0,1]^N.

We finish our proof by using an interpolation argument that relates (3) to the maximal function. For all integers k ≥ 2 and every t ∈ ]0,1[^N, we can uniquely find j ∈ Γ^N(p(k)) such that t ∈ I_{p(k),j}. Therefore, there exists a unique j such that [t, t+p(2^{-k})] is a subset of I_{p(k-1),j}. (This is why we required k ≥ 2 and not 1.) In particular,
\[
2^{Nk} \int_{[t,\, t+p(2^{-k})]} |f(s)|\, ds \le 2^N X_{k-1}(t).
\]
Now take ε ∈ ]0, ¼[. We can find an integer k ≥ 2 such that 2^{-k-1} ≤ ε ≤ 2^{-k}. Consequently,
\[
\varepsilon^{-N} \int_{[t,\, t+p(\varepsilon)]} |f(s)|\, ds \le 2^{N(k+1)} \int_{[t,\, t+p(2^{-k})]} |f(s)|\, ds \le 4^N X_{k-1}(t),
\]
thus establishing the pointwise inequality
\[
\sup_{0 < \varepsilon < \frac{1}{4}} \varepsilon^{-N} \int_{[t,\, t+p(\varepsilon)]} |f(s)|\, ds \le 4^N \sup_{k \ge 1} X_k(t), \qquad t \in\, ]0,1[^N.
\]
On the other hand, it is easy to see that
\[
\sup_{\frac{1}{4} \le \varepsilon} \varepsilon^{-N} \int_{[t,\, t+p(\varepsilon)]} |f(s)|\, ds \le 4^N \| f \|_1 .
\]
Thus, for λ ≥ 4^N ‖f‖_1, whenever M f(t) ≥ λ, we have sup_{k≥1} X_k(t) ≥ 4^{-N}λ. In other words, for all λ ≥ 4^N ‖f‖_1,
\[
\begin{aligned}
\mathrm{Leb}\big\{ t \in [0,1]^N : M f(t) \ge \lambda \big\}
&\le \mathrm{Leb}\Big\{ t \in [0,1]^N : \sup_{k \ge 1} X_k(t) \ge 4^{-N}\lambda \Big\} \\
&= \mathrm{P}\Big( \sup_{k \ge 1} X_k(U) \ge 4^{-N}\lambda \Big) \le \frac{4^N \| f \|_1}{\lambda},
\end{aligned}
\]
by equation (3). (For λ < 4^N ‖f‖_1, the asserted bound exceeds one and is trivially true.) This completes our proof.
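The weak (1,1) bound of Proposition 2.1.1 can be observed numerically. The sketch below is an illustration, not part of the text: it discretizes the case N = 1 (for speed) with a one-sided maximal function over forward windows, using an unbounded but integrable function; the grid size and the function are arbitrary choices.

```python
import numpy as np

n = 2048
t = (np.arange(n) + 0.5) / n
f = 1.0 / np.sqrt(t)                        # integrable but unbounded on [0,1]
csum = np.concatenate([[0.0], np.cumsum(np.abs(f)) / n])   # int_0^{k/n} |f|

# discrete one-sided maximal function: sup over forward windows [t, t + m/n]
M = np.zeros(n)
for i in range(n):
    m = np.arange(1, n - i + 1)
    M[i] = np.max((csum[i + m] - csum[i]) / (m / n))

l1 = np.mean(np.abs(f))                      # ||f||_1 on the grid
for lam in (2.0, 5.0, 20.0):
    measure = np.mean(M >= lam)              # Leb{ t : Mf(t) >= lam }
    assert measure <= 4 * l1 / lam           # Proposition 2.1.1 with N = 1
```

The bound is far from tight here: for a decreasing f the sup is nearly attained by the smallest window, so the level sets of Mf are close to those of |f| itself.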
Exercise 2.1.1 Prove that for all p > 1, there exists a finite, positive constant C such that whenever f ∈ L^p[0,1]^N, ‖Mf‖_p ≤ C‖f‖_p. This is called the Hardy–Littlewood maximal inequality.

We are ready to verify Theorem 2.1.1.

Proof of Theorem 2.1.1 Throughout this proof, for any g ∈ L^1[0,1]^N, and for all ε > 0, we write
\[
g_\varepsilon(t) = \varepsilon^{-N} \int_{[t,\, t+(\varepsilon,\dots,\varepsilon)]} g(s)\, ds, \qquad t \in\, ]0,1[^N,
\]
and extend g_ε to a function on [0,1]^N by continuity. It should be recognized that whenever g is continuous, then lim_{ε→0} g_ε = g, uniformly on compacts. If f, g ∈ L^1[0,1]^N, an application of the triangle inequality yields ‖f_ε − g_ε‖_1 ≤ ‖f − g‖_1. On the other hand, for any f ∈ L^1[0,1]^N, we can always find continuous g^n such that lim_{n→∞} g^n = f in L^1[0,1]^N. Thus, we can use the previous display to establish
\[
\| f_\varepsilon - f \|_1 \le \| f_\varepsilon - g^n_\varepsilon \|_1 + \| g^n_\varepsilon - g^n \|_1 + \| g^n - f \|_1 \le 2 \| f - g^n \|_1 + \| g^n_\varepsilon - g^n \|_1 .
\]
Now, first let ε → 0 to deduce lim sup_{ε→0} ‖f_ε − f‖_1 ≤ 2‖f − g^n‖_1. Then, let n → ∞ to see that f_ε → f in L^1[0,1]^N.

It remains to prove almost everywhere convergence. Recall, once more, that if g is continuous, then lim_{ε→0} g_ε = g, pointwise. In particular, for any continuous function g: [0,1]^N → R,
\[
0 \le \limsup_{\varepsilon \to 0} f_\varepsilon - \liminf_{\varepsilon \to 0} f_\varepsilon \le 2 M\big( |f - g| \big),
\]
pointwise. Consequently, by Proposition 2.1.1, for any λ > 0,
\[
\mathrm{Leb}\Big\{ t \in [0,1]^N : \limsup_{\varepsilon \to 0} f_\varepsilon(t) - \liminf_{\varepsilon \to 0} f_\varepsilon(t) \ge \lambda \Big\} \le \frac{2 \cdot 4^N \| f - g \|_1}{\lambda} .
\]
Since this holds for an arbitrary continuous g, the left-hand side is, in fact, equal to 0, by density. Let λ ↓ 0 along a rational sequence to conclude that lim_{ε→0} f_ε exists almost everywhere. On the other hand, by the already-proved L^1 convergence, there exists a subsequence ε → 0 along which f_ε → f, almost everywhere. Hence, lim_{ε→0} f_ε = f almost everywhere, as desired.

Exercise 2.1.2 Prove an L^p[0,1]^N version of Theorem 2.1.1; cf. Exercise 1.1.2 for the one-parameter version of this.
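The L^1 convergence f_ε → f established above is easy to watch numerically. The following sketch (an illustration, not part of the text) averages a discontinuous f over cubes [t, t+ε]^2 with f extended by 0 off [0,1]^2, exactly as in the proof's conventions; the grid and window sizes are illustrative.

```python
import numpy as np

n = 512
x = (np.arange(n) + 0.5) / n
F = ((x[:, None] < 0.5) & (x[None, :] < 0.5)).astype(float)  # f = 1 on [0, 1/2)^2

def f_eps(F, m):
    """f_eps(t) = eps^{-2} * integral over [t, t+eps]^2, eps = m/n, f := 0 off [0,1]^2."""
    n = F.shape[0]
    P = np.zeros((n + m, n + m))
    P[:n, :n] = F                          # zero-extension beyond [0,1]^2
    c = np.pad(P.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)))
    # window sums of size m x m anchored at each grid point
    w = c[m:m+n, m:m+n] - c[m:m+n, :n] - c[:n, m:m+n] + c[:n, :n]
    return w / m ** 2

err = [np.mean(np.abs(f_eps(F, m) - F)) for m in (64, 16, 4)]
assert err[0] > err[1] > err[2]            # the L^1 error shrinks as eps -> 0
assert err[2] < 0.05
```

The remaining error is concentrated in a band of width roughly ε around the discontinuity set of f, which is exactly the boundary effect that the L^1 convergence controls.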
Exercise 2.1.3 Verify that Lebesgue's differentiation theorem (Theorem 2.1.1) remains true if we replace [0,1]^N by R^N.
2.2 A Uniform Differentiation Theorem

As we saw from its proof, Theorem 2.1.1 is actually a one-parameter, or one-dimensional, result. Roughly speaking, it says that if f ∈ L^1[0,1]^N, then its "distribution function" is Lebesgue almost everywhere differentiable. There is a much stronger result, due to B. Jessen, J. Marcinkiewicz, and A. Zygmund, that holds under the stronger assumption that f is in L(ln_+ L)^{N−1}. To simplify the notation, for the remainder of this section define
\[
\Psi(x) = x \big( \ln_+ x \big)^{N-1}, \qquad x > 0.
\]

Theorem 2.2.1 Suppose f: [0,1]^N → R is a measurable function that satisfies the integrability condition Ψ∘|f| ∈ L^1[0,1]^N. Then, for Lebesgue almost all t ∈ [0,1]^N,
\[
\lim_{\substack{\Delta \in\, ]0,1[^N:\\ \Delta \to 0}} \frac{1}{\prod_{j=1}^{N} \Delta^{(j)}} \int_{[t,\, t+\Delta]} f(s)\, ds = f(t).
\]

The main point of this theorem is that the sides of the rectangles [t, t+∆] need not go to zero at the same rate, as long as
\[
\int_{[0,1]^N} |f(x)| \cdot \big\{ \ln_+ |f(x)| \big\}^{N-1} \, dx < \infty .
\]
It is also worth mentioning that in the above, the domain of definition of the function f is implicitly extended to all of R^N by defining f(t) ≡ 0 if t ∉ [0,1]^N. Just like its one-parameter counterpart (Theorem 2.1.1), Theorem 2.2.1 is a consequence of a maximal inequality. Define the multiparameter maximal operator M̃ by
\[
\widetilde{M} f(t) = \sup_{\Delta \in\, ]0, \frac{1}{4}[^N} \frac{1}{\prod_{j=1}^{N} \Delta^{(j)}} \int_{[t,\, t+\Delta]} |f(s)|\, ds,
\]
where f(t) = 0 if t ∉ [0,1]^N. The N-parameter analogue of Proposition 2.1.1 is the following:

Proposition 2.2.1 Suppose f: [0,1]^N → R_+ is a measurable function such that Ψ∘f ∈ L^1[0,1]^N. Then, for all λ > 0,
\[
\mathrm{Leb}\big\{ t \in [0,1]^N : \widetilde{M} f(t) \ge \lambda \big\} \le \frac{2^N}{\lambda} \Big( \frac{e}{e-1} \Big)^{N-1} \Big\{ (N-1) + \| \Psi \circ f \|_1 \Big\} .
\]
Proof First, note that f ≥ 0, by definition. Having made this remark, we now adapt the presented proof of Proposition 2.1.1 to the present multiparameter setting. Recall equation (2), Section 2.1, for the definition of I_{k,j}, k ∈ N^N, j ∈ Γ^N(k), and define
\[
\widetilde{X}_k(t) = 2^{\sum_{\ell=1}^{N} k^{(\ell)}} \sum_{j \in \Gamma^N(k)} \int_{I_{k,j}} f(s)\, ds \cdot \mathbf{1}_{I_{k,j}}(t), \qquad t \in \mathbb{R}_+^N,\ k \in \mathbb{N}^N.
\]
For any k ∈ N^N and j ∈ Γ^N(k), Leb(I_{k,j}) = 2^{-\sum_{\ell=1}^{N} k^{(\ell)}}. Thus, the argument of the presented proof of Proposition 2.1.1 shows that X̃ = (X̃_k(U); k ∈ N^N) is a nonnegative N-parameter martingale with respect to the dyadic filtration F. Moreover, by Proposition 1.2.1, F is commuting. Consequently (Corollary 3.5.1 of Chapter 1), X̃ is an orthomartingale with respect to the marginal filtrations of F. By Theorem 2.5.1 of Chapter 1, for all λ > 0,
\[
\mathrm{P}\Big( \sup_{k \in \mathbb{N}^N} \widetilde{X}_k(U) \ge \lambda \Big) \le \frac{1}{\lambda} \Big( \frac{e}{e-1} \Big)^{N-1} \Big\{ (N-1) + \sup_{k \in \mathbb{N}^N} \mathrm{E}\big[ \Psi\big( \widetilde{X}_k(U) \big) \big] \Big\} .
\]
On the other hand, since Ψ is convex and nondecreasing, by Jensen's inequality,
\[
\mathrm{E}\big[ \Psi\big( \widetilde{X}_k(U) \big) \big] \le \sum_{j \in \Gamma^N(k)} \int_{I_{k,j}} \Psi\big( f(s) \big)\, ds \le \| \Psi \circ f \|_1 .
\]
(Why?) Thus, we have shown that for all λ > 0,
\[
\mathrm{P}\Big( \sup_{k \in \mathbb{N}^N} \widetilde{X}_k(U) \ge \lambda \Big) \le \frac{1}{\lambda} \Big( \frac{e}{e-1} \Big)^{N-1} \Big\{ (N-1) + \| \Psi \circ f \|_1 \Big\} . \tag{1}
\]
Next, we will relate this to the maximal function. Recall the function p: R → R^N given by p(t) = (t, …, t). For all k ∈ N^N, define α_k to be the N-dimensional vector whose ℓth coordinate is 2^{-k^{(\ell)}}. Next, for any ∆ ∈ ]0, ¼[^N, find k ⪰ p(2) such that α_{k+p(1)} ⪯ ∆ ⪯ α_k. By monotonicity, for all t ∈ ]0,1[^N,
\[
\frac{1}{\prod_{j=1}^{N} \Delta^{(j)}} \int_{[t,\, t+\Delta]} f(s)\, ds \le \frac{1}{\prod_{j=1}^{N} \alpha_{k+p(1)}^{(j)}} \int_{[t,\, t+\alpha_k]} f(s)\, ds = 2^N \widetilde{X}_{k-p(1)}(t).
\]
The result follows from (1).
Exercise 2.2.1 For all p ∈ (1, ∞), prove the existence of a positive, finite constant C that depends on p such that for all f ∈ L^p[0,1]^N, ‖M̃f‖_p ≤ C‖f‖_p.
We are ready for the following.

Proof of Theorem 2.2.1 Throughout, for all integrable functions g: [0,1]^N → R, and for all ∆ ∈ ]0,∞[^N, we write
\[
g_\Delta(t) = \frac{1}{\prod_{j=1}^{N} \Delta^{(j)}} \int_{[t,\, t+\Delta]} g(s)\, ds,
\qquad
\widetilde{\Theta} g(t) = \limsup_{\Delta \to 0} g_\Delta(t) - \liminf_{\Delta \to 0} g_\Delta(t),
\]
for all t ∈ ]0,1[^N. We can extend the domain of g_∆ to all of [0,1]^N by continuity. Given a function f as in the statement of the theorem, we can find continuous functions g^n: [0,1]^N → R such that
\[
\lim_{n\to\infty} \| g^n - f \|_1 = 0 .
\]
On the other hand, Ψ(|x − y|) ≤ |Ψ(x) − Ψ(y)|; cf. equation (3), Section 2.8, Chapter 1. Thus, by uniform integrability, for any fixed δ > 0,
\[
\lim_{n\to\infty} \Big\| \Psi \circ \frac{|g^n - f|}{\delta} \Big\|_1 = 0 .
\]
Now, for each fixed n,
\[
\widetilde{\Theta} f(t) \le \widetilde{\Theta} g^n(t) + 2 \widetilde{M}\big( |f - g^n| \big)(t) = 2\delta\, \widetilde{M}\Big( \frac{|f - g^n|}{\delta} \Big)(t), \qquad t \in\, ]0,1[^N.
\]
The last equality uses the continuity of g^n, which forces Θ̃g^n ≡ 0, together with the homogeneity of M̃. Consequently, we can apply Proposition 2.2.1 to deduce that for any λ > 0,
\[
\mathrm{Leb}\big\{ t \in [0,1]^N : \widetilde{\Theta} f(t) \ge \lambda \big\} \le \frac{2^{N+1}\delta}{\lambda} \Big( \frac{e}{e-1} \Big)^{N-1} \Big\{ (N-1) + \Big\| \Psi \circ \frac{|f - g^n|}{\delta} \Big\|_1 \Big\} .
\]
We can let n → ∞, δ → 0⁺, and λ → 0⁺, in this order, to deduce that Θ̃f = 0, almost everywhere. Equivalently, lim_{∆→0} f_∆ exists almost everywhere. To show that this limit equals f almost everywhere, we can simply apply Lebesgue's differentiation theorem (Theorem 2.1.1; why?).

Exercise 2.2.2 Prove an L^p[0,1]^N version of Theorem 2.2.1.
Exercise 2.2.3 Prove that Theorem 2.2.1 remains true if we replace [0,1]^N by R^N everywhere.
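The content of Theorem 2.2.1, namely that rectangle averages converge even when the sides shrink at different rates, can be illustrated numerically. The sketch below is not from the text; the smooth test function and the particular sequence of rectangle shapes are illustrative choices, and f is extended by 0 off [0,1]^2 as in the theorem's conventions.

```python
import numpy as np

n = 512
x = (np.arange(n) + 0.5) / n
F = np.outer(x, x)                          # illustrative smooth f(t) = t1 * t2

def rect_avg(F, m1, m2):
    """Average of f over [t, t + Delta], Delta = (m1/n, m2/n), f := 0 off [0,1]^2."""
    n = F.shape[0]
    P = np.zeros((n + m1, n + m2))
    P[:n, :n] = F
    c = np.pad(P.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)))
    w = c[m1:m1+n, m2:m2+n] - c[m1:m1+n, :n] - c[:n, m2:m2+n] + c[:n, :n]
    return w / (m1 * m2)

# the two sides shrink at *different* rates, as Theorem 2.2.1 allows
errs = [np.mean(np.abs(rect_avg(F, m1, m2) - F))
        for (m1, m2) in ((128, 32), (32, 8), (8, 2))]
assert errs[0] > errs[1] > errs[2]           # L^1 error decreases
assert errs[2] < 0.1
```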
3 Supplementary Exercises

1. Given a sequence (c_{k,j}; k ≥ 0, j ∈ Γ(k)) of real numbers, define the function
\[
F_n(t) = \sum_{k=0}^{n} \sum_{j \in \Gamma(k)} c_{k,j}\, h_{k,j}(t), \qquad t \in [0,1].
\]
Supposing that \(\sum_{k=0}^{\infty} 2^{\frac{1}{2}(k-1)} \max_{j \in \Gamma(k)} |c_{k,j}| < +\infty\), prove that as n → ∞, F_n converges uniformly and in L^p[0,1] for all p ≥ 1.

2. Show that Theorem 1.1.1 on Haar function expansions implies Lebesgue's differentiation theorem, Theorem 2.1.1.

3. Suppose (I_t; t ≥ 0) is a one-parameter monotonic family of rectangles in [0,1]^N with sides parallel to the axes. That is, (a) for any t ≥ 0, I_t is of the form [a^{(1)}, b^{(1)}] × ⋯ × [a^{(N)}, b^{(N)}] ⊂ [0,1]^N; and (b) if s ≤ t, I_s ⊂ I_t. For all functions f: [0,1]^N → R, define the maximal function Mf by
\[
\mathsf{M} f(x) = \sup_{t \ge 0} \frac{1}{\mathrm{Leb}(I_t)} \int_{I_t} |f(x - y)|\, dy, \qquad x \in [0,1]^N,
\]
where f(z) = 0 for all z ∉ [0,1]^N. Show that whenever f ∈ L^p[0,1]^N for some p > 1, then Mf ∈ L^p[0,1]^N. In fact, show that for all p > 1, there exists a finite constant C_p > 0 such that for all f ∈ L^p[0,1]^N, ‖Mf‖_p ≤ C_p‖f‖_p. This is from Zygmund (1988, Ch. XVII).

4. For all ε > 0 and t ∈ R^N, let R_ε(t) denote the collection of all cubes of side ε that contain the point t and whose sides are parallel to the axes; i.e., R ∈ R_ε(t) if and only if R = [a, a+(ε,…,ε)] for some a ∈ R^N such that a ⪯ t ⪯ a+(ε,…,ε). We can define the enhanced maximal operator 𝕄 as follows: For all f ∈ L^1(R^N),
\[
\mathbb{M} f(t) = \sup_{\varepsilon > 0}\ \sup_{R \in \mathcal{R}_\varepsilon(t)} \varepsilon^{-N} \int_{R} |f(s)|\, ds .
\]
(a) Verify that for all t ∈ R^N, 2^{-N} 𝕄f(t − (ε, …, ε)) ≤ Mf(t) ≤ 𝕄f(t).
(b) Demonstrate the following extension of Proposition 2.1.1: For all f ∈ L^1(R^N) and all λ > 0, Leb{t ∈ R^N : 𝕄f(t) ≥ λ} ≤ (8^N/λ)‖f‖_1.
(c) Conclude the following enhancement of Lebesgue's differentiation theorem: Whenever f ∈ L^1(R^N), for almost all t ∈ R^N,
\[
\lim_{\varepsilon \to 0^+}\ \sup_{R \in \mathcal{R}_\varepsilon(t)} \Big| \varepsilon^{-N} \int_{R} f(s)\, ds - f(t) \Big| = 0 .
\]
(d) Formulate and prove the corresponding enhancement of the uniform differentiation theorem (Theorem 2.2.1).

5. A function f: R^d → R is said to be harmonic if it is continuous and if for all x ∈ R^d and all ε > 0, f(x) = (2ε)^{-d} ∫_{B_∞(x;ε)} f(y) dy, where B_∞(x; r) denotes the open ℓ^∞-ball of radius r about x. Show that the only bounded harmonic functions on R^d are constants. This is Liouville's theorem of classical potential theory.
(Hint: For ε of the form ε = 2^{-n}, interpret the right-hand side as a conditional expectation with respect to the dyadic filtration. An alternative proof can be based on Haar function expansions.)

6. Suppose (Ω, F, P) is a probability space. (i) Show that if Y ∈ L^1(P), then A ↦ Q(A) = E[Y 1l_A] defines a measure on the F-measurable subsets of Ω. (ii) Suppose that for every n ≥ 1, C_n = {C_{1,n}, C_{2,n}, …} is a denumerable collection of sets in F that cover Ω. Let F^n denote the σ-field generated by the elements of C_n, and show that
\[
\mathrm{E}\big[ Y \,\big|\, \mathcal{F}^n \big] = \sum_{k \ge 1:\ \mathrm{P}(C_{k,n}) > 0} \frac{Q(C_{k,n})}{\mathrm{P}(C_{k,n})}\, \mathbf{1}_{C_{k,n}}, \qquad n \ge 1.
\]
(iii) If C_{n+1} ⊃ C_n, conclude that lim_{n→∞} E[Y | F^n] exists a.s. and in L^1(P). (iv) Suppose Y is ∨_n F^n-measurable. Then, show that for P-almost every ω ∈ Ω,
\[
\lim_{n\to\infty} \frac{Q(C_{k(\omega),n})}{\mathrm{P}(C_{k(\omega),n})} = Y(\omega),
\]
where k(ω) is the unique k ≥ 1 such that ω ∈ C_{k(ω),n}. (Since Y = dQ/dP in the sense of the Radon–Nikodým theorem, the above is a probabilistic proof of the differentiation theorem, which is, in fact, equivalent to the Radon–Nikodým theorem.) (v) Obtain the following form of Lebesgue's differentiation theorem on R^1: Every function F: R → R that is of bounded variation is a.e. differentiable in the sense that lim_{ε→0+} ε^{-1}{F(x+ε) − F(x)} exists for almost every x ∈ R. This is due to F. Riesz; see Riesz and Sz.-Nagy (1955, Ch. 1) and also (Stein 1970; Stein and Weiss 1971). (vi) Obtain the following form of Lebesgue's differentiation theorem on a separable metric space (S, d): Given a σ-finite measure μ on the Borel field of S, define for all f ∈ L^1(μ),
\[
T_r f(a) = \frac{1}{\mu\big( B_d(a;r) \big)} \int_{B_d(a;r)} f(y)\, \mu(dy), \qquad a \in S,\ r > 0,
\]
where B_d(a; r) = {x ∈ S : d(a, x) < r} denotes the ball of radius r about a ∈ S. Prove that as r → 0⁺, T_r f → f, μ-almost everywhere.

7. In this exercise we will derive a decomposition of Calderón and Zygmund (1952) that is an important tool in the study of singular integrals. (i) Suppose M = (M_n; n ≥ 0) is a nonnegative one-parameter submartingale with respect to a filtration F = (F_n; n ≥ 0). Suppose M has subexponential growth in the sense that there exists a finite, nonrandom constant C > 0 such that for all n ≥ 0, M_{n+1} ≤ C M_n. For any λ > 0, define T_λ = inf(n ≥ 0 : M_n > λ), where inf ∅ = +∞. Check that: (a) T_λ is a stopping time for each λ > 0.
(b) On (T_λ < ∞), λ < M_{T_λ} ≤ Cλ.
(ii) Suppose F is the dyadic filtration on [0,1]^N and consider the martingale M given by M_n = E[f(V) | F_n], where f: [0,1]^N → R_+ and V is uniformly chosen from [0,1]^N. Use (i) to prove the following: For each λ > 0, there exists a decomposition {Φ, Γ} of [0,1]^N such that: (a) [0,1]^N = Φ ∪ Γ, Φ ∩ Γ = ∅; (b) f(x) ≤ λ for Lebesgue almost every x ∈ Φ; (c) there exists a countable collection of cubes {Γ_k; k ≥ 1} with disjoint interiors such that Γ = ∪_{k≥1} Γ_k; (d) for all k ≥ 1,
\[
\lambda \le \frac{1}{\mathrm{Leb}(\Gamma_k)} \int_{\Gamma_k} f(x)\, dx \le 2^N \lambda .
\]
(Hint: Show that the dyadic martingale of (ii) satisfies (i) with C = 2^N. You may also want to note that (T_λ < ∞) = ∪_k (T_λ = k) and (T_λ = k) ∈ F_k.)
4 Notes on Chapter 2

Section 1 It is safe to say that a probabilist's interpretation of Haar functions is Exercise 1.1.1. This exercise can sometimes even be an exercise for the undergraduate student and is a part of the folklore of stochastic processes. The significance of this exercise was showcased by P. Lévy's construction of Brownian motion, where Supplementary Exercise 1 was essentially introduced and used; cf. (Lévy 1965; Ciesielski 1961). See also Ciesielski (1959), Neveu (1975, III-3), and Ciesielski and Musielak (1959), where you will also find many related works, including proofs for Theorem 1.1.1 and Exercise 1.1.2.

Section 2 The differentiation theorem of Lebesgue, as well as that of Jessen, Zygmund, and Marcinkiewicz, is completely classical. See Zygmund (1988, Ch. XVII) for this and related developments. The constants in Proposition 2.2.1 are better than those in the analysis literature, and its probabilistic proof is new. However, the real content of this proof is classical and has already been utilized by Jessen et al. A proof of Theorem 2.2.1 where ∆ ∈ Q_+^N is dyadic can be found in Walsh (1986b), where Cairoli's convergence theorem is directly applied. It is also possible to use the theory of martingales that are indexed by directed sets in order to prove a more general result than Theorem 2.2.1; see Shieh (1982). An excellent modern source for results relating to this chapter, and much more, is Stein (1993).

Section 3 The probabilistic proof outlined in Supplementary Exercise 7 is from Gundy (1969).
3 Random Walks
Those who cannot remember the past are condemned to repeat it.
—Santayana

Random walks entered mathematics early on through the analysis of gambling and other games of chance. To cite a typical example, let X_0 denote the initial fortune of a certain gambler and let X_n stand for the amount won (if X_n ≥ 0) or lost (if X_n ≤ 0) the nth time that the gambler places a bet. In the simplest gambling situations, the X_n's are i.i.d., and the gambler's fortune at time n is described by the partial sum \(S_n = \sum_{j=0}^{n} X_j\). The stochastic process S = (S_n; n ≥ 0) is called a one-dimensional random walk and lies at the heart of modern, as well as classical, probability theory. This chapter is a study of some properties of systems of such walks. The main problem addressed here is, under what conditions does the random walk return to 0 infinitely often? To see how this may come up, suppose the gambler plays ad infinitum and has an unbounded credit line. We then wish to know under what conditions the gambler can break even, infinitely many times, as he or she plays on. In the language of the theory of Markov chains, we wish to know when the state 0 is recurrent. The analogous problem for systems of random walks is more intricate and is the subject of much of this chapter: Suppose the X_j's are i.i.d. random vectors in d-space. Then, the d-dimensional random walk models the movement of a small particle in a homogeneous medium. Suppose we have N particles, each of which paints every point that it visits. If each individual particle uses a distinct color, under what conditions do the N random trajectories created by the N random particles cross paths infinitely many times? These are some of the main problems that are taken up in this chapter.
1 One-Parameter Random Walks

The stochastic process S = (S_n; n ≥ 1) is a random walk if it has stationary, independent increments. To put it another way, we consider independent, identically distributed random variables X_1, X_2, …, all taking values in R^d, and define the corresponding random walk n ↦ S_n as
\[
S_n = \sum_{i=1}^{n} X_i, \qquad n = 1, 2, \dots .
\]
Clearly, X_1 = S_1, and X_n = S_n − S_{n−1} when n ≥ 2. Thus, we are justified in calling the X_i's the increments of S. This is a review section on one-parameter random walks; we develop the theory with an eye toward the multiparameter extensions that will be developed in the remainder of this chapter.

1.1 Transition Operators

Suppose S = (S_n; n ≥ 1) is a d-dimensional random walk with increments X = (X_n; n ≥ 1). For all n ≥ 1, define F_n to be the σ-field generated by X_1, …, X_n. It is simple to see that F_n is precisely the σ-field generated by S_1, …, S_n. In the notation of Chapter 1, we have shown that F = (F_n; n ≥ 1) is the history of the stochastic process S. It is always the case that the study of the stochastic process S is equivalent to the analysis of probabilities of the form
\[
\mathrm{P}\big( S_{n_1} \in E_1,\ S_{n_2} \in E_2,\ \dots,\ S_{n_k} \in E_k \big),
\]
where k, n_1, …, n_k ≥ 1 are integers and E_1, …, E_k are measurable subsets of R^d. These probabilities are called the finite-dimensional distributions of S. It turns out that the finite-dimensional distributions of the random walk S are completely determined by the collection P(X_1 + x ∈ E), where E ⊂ R^d is measurable and x ∈ R^d. A precise form of such a statement is called the Markov property; we shall come to this later. Bearing this discussion in mind, we define for all measurable functions f: R^d → R, all n ≥ 1, and x ∈ R^d,
\[
T_n f(x) = \mathrm{E}\big[ f(S_n + x) \big].
\]
In particular, note that for all Borel sets E ⊂ R^d, T_1 1l_E(x) = P(X_1 + x ∈ E). Thus, once we know the operator T_n, we know how to compute these probabilities. We begin our study of random walks by first analyzing these operators. Note that T_n is a bounded linear operator: For all bounded measurable f, g: R^d → R, n ≥ 1, x ∈ R^d, and all α, β ∈ R, (i) sup_{x∈R^d} |T_n f(x)| ≤ sup_{x∈R^d} |f(x)|; (ii) T_n(αf + βg)(x) = αT_n f(x) + βT_n g(x); and (iii) x ↦ T_n f(x) is measurable.
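For a concrete feel for the operators T_n, consider the simple walk on Z, whose step takes the values ±1 with probability ½ each. The sketch below (an illustration, not part of the text) computes T_n by convolving step distributions and checks both an explicit n-step probability and the semigroup identity T_{n+k}f = T_n(T_k f) established in Lemma 1.1.1 below.

```python
import numpy as np

# step distribution of the simple walk on Z: X = +1 or -1 with probability 1/2
step = np.array([0.5, 0.0, 0.5])          # masses at the sites -1, 0, +1

def pmf_S(n):
    """Distribution of S_n (masses at -n, ..., n), via n-fold convolution."""
    p = step.copy()
    for _ in range(n - 1):
        p = np.convolve(p, step)
    return p

def T(n, f, x):
    """T_n f(x) = E[f(S_n + x)]."""
    p = pmf_S(n)
    return sum(p[i] * f(x + k) for i, k in enumerate(range(-n, n + 1)))

f = lambda z: float(z == 0)               # f = indicator of {0}
# T_n encodes n-step transition probabilities: T_5 f(1) = P(S_5 = -1) = C(5,2)/32
assert np.isclose(T(5, f, 1), 10 / 32)
# the semigroup identity T_{n+k} f = T_n (T_k f)
assert np.isclose(T(5, f, 1), T(2, lambda y: T(3, f, y), 1))
```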
Next, we interpret T_n in terms of the conditional distributions of S.

Lemma 1.1.1 For all n, k ≥ 1 and all bounded measurable f: R^d → R,
\[
\mathrm{E}\big[ f(S_{k+n}) \,\big|\, \mathcal{F}_k \big] = \mathrm{E}\big[ f(S_{k+n}) \,\big|\, S_k \big] = T_n f(S_k), \qquad \text{a.s.}
\]
In particular, for all x ∈ R^d, n, k ≥ 1, and all bounded measurable f: R^d → R,
\[
T_{n+k} f(x) = T_n(T_k f)(x) = T_k(T_n f)(x).
\]

In functional-analytic language, (T_n; n ≥ 1) is a semigroup of operators. To see what the above lemma means, take f = 1l_E for some Borel set E ⊂ R^d. The above says that if k denotes the current time,
1. given the present position S_k, any future position S_{k+n} is conditionally independent of the past positions S_1, …, S_{k−1}; and
2. T_n 1l_E(S_k) is the conditional probability of making a transition to E in n steps, given F_k.
Motivated by this, we call T_n the n-step transition operator of S.

Proof of Lemma 1.1.1 Note that S_{k+n} − S_k = Σ_{j=k+1}^{k+n} X_j is (a) independent of F_k; and (b) has the same distribution as S_n = Σ_{j=1}^{n} X_j. Thus,
\[
\mathrm{E}\big[ f(S_{k+n}) \,\big|\, \mathcal{F}_k \big] = \mathrm{E}\big[ f(S_{k+n} - S_k + S_k) \,\big|\, \mathcal{F}_k \big] = \int f(x + S_k)\, \mathrm{P}(S_n \in dx) = T_n f(S_k),
\]
almost surely. From this, we also can conclude the equality regarding the conditional expectation E[f(S_{k+n}) | S_k]. Applying the preceding to f(• + x), we obtain E[f(x + S_{k+n}) | F_k] = T_n f(x + S_k), almost surely. Taking expectations, we deduce that T_{k+n} f(x) = T_k(T_n f)(x). The rest follows from reversing the roles of k and n.

Digression If we define S_0 = 0, then for any x ∈ R^d, we can, and should, think of x + S as our random walk started at x. In particular, S itself should be thought of as the random walk started at the origin. The above lemma suggests the following interpretation: Given the position of the process at time k, the future trajectories of our walk are those of a random walk started at S_k. The following is a more precise formulation of this and is a version of the so-called Markov property of S that was alluded to earlier.

Theorem 1.1.1 (The Markov Property) Fix integers k ≥ 1, n ≥ 2 and bounded measurable functions f_1, …, f_n: R^d → R. Then, the following holds with probability one:
\[
\mathrm{E}\Big[ \prod_{\ell=1}^{n} f_\ell\big( S_{k+\ell} - S_k \big) \,\Big|\, \mathcal{F}_k \Big] = \mathrm{E}\Big[ \prod_{\ell=1}^{n} f_\ell(S_\ell) \Big].
\]
In other words, for any k ≥ 1, the process n ↦ S_{n+k} − S_k is (i) independent of F_k; and (ii) has the same finite-dimensional distributions as the original process S. Recalling that we think of n ↦ S_n + x as a random walk with increments X_1, X_2, … that starts at x ∈ R^d, we readily obtain the following useful interpretation of the above.

Corollary 1.1.1 Suppose k ≥ 1 is a fixed integer. Then, conditionally on σ(S_k), (S_{k+n}; n ≥ 0) is a random walk whose increments have the same distribution as X_1. Moreover, the σ-field generated by (S_{k+n}; n ≥ 0) is conditionally independent of F_k, given σ(S_k).

See Section 3.6 of Chapter 1 for information on conditional independence.

Exercise 1.1.1 Carefully prove Corollary 1.1.1.

Proof of Theorem 1.1.1 Since it depends only on X_{k+1}, …, X_{k+n}, the random variable ∏_{ℓ=1}^n f_ℓ(S_{k+ℓ} − S_k) is independent of (X_1, …, X_k) and hence of F_k. (Why?) As a result, with probability one,
\[
\mathrm{E}\Big[ \prod_{\ell=1}^{n} f_\ell\big( S_{k+\ell} - S_k \big) \,\Big|\, \mathcal{F}_k \Big] = \mathrm{E}\Big[ \prod_{\ell=1}^{n} f_\ell\big( S_{k+\ell} - S_k \big) \Big].
\]
On the other hand, the sequence (X_{k+1}, …, X_{k+n}) has the same distribution as the sequence (X_1, …, X_n). After performing a little algebra, we can reinterpret this statement as follows: The distribution of the R^{nd}-valued random vector (S_{k+1} − S_k, …, S_{k+n} − S_k) is the same as that of (S_1, …, S_n). In particular, we have E[∏_{ℓ=1}^n f_ℓ(S_{k+ℓ} − S_k)] = E[∏_{ℓ=1}^n f_ℓ(S_ℓ)], which proves the result.

It is clear that Corollary 1.1.1 extends the conditional independence assertion of Lemma 1.1.1. However, the latter lemma also contains information on the transition operators, to which we now return.

Corollary 1.1.2 The transition operators, in fact T_1, uniquely determine the finite-dimensional distributions, and vice versa.

Proof By the very definition of T_n, if we know all finite-dimensional distributions, we can compute T_n f(x) for all measurable f: R^d → R_+, all n ≥ 1, and all x ∈ R^d. The converse requires an honest proof. Consider the following proposition:

(Π_n) For all measurable f_1, …, f_n: R^d → R_+, E[∏_{ℓ=1}^n f_ℓ(S_ℓ)] can be computed from T_1.

Our goal is to show that (Π_n) holds for all n ≥ 1. We will prove this by using induction on n: Lemma 1.1.1 shows that (Π_1) is true. Thus, we
suppose that (Π_1), …, (Π_{n−1}) hold and venture to prove (Π_n). By Lemma 1.1.1, for all measurable f_1, …, f_n: R^d → R_+,
\[
\mathrm{E}\Big[ \prod_{\ell=1}^{n} f_\ell(S_\ell) \,\Big|\, \mathcal{F}_{n-1} \Big] = \prod_{\ell=1}^{n-1} f_\ell(S_\ell) \cdot T_1 f_n(S_{n-1}) = \prod_{\ell=1}^{n-1} g_\ell(S_\ell),
\]
where g_i = f_i for all 1 ≤ i ≤ n−2 and g_{n−1}(x) = f_{n−1}(x) · T_1 f_n(x). Taking expectations, we see that E[∏_{ℓ=1}^n f_ℓ(S_ℓ)] = E[∏_{i=1}^{n−1} g_i(S_i)]. By (Π_{n−1}), this can be written entirely in terms of T_1, thus proving (Π_n).

Exercise 1.1.2 Find an explicit recursive formula for E[∏_{ℓ=1}^n f_ℓ(S_ℓ)] in terms of T_1.
1.2 The Strong Markov Property

Let S = (S_k; k ≥ 1) denote a d-dimensional random walk with history F = (F_k; k ≥ 1) and increment process X = (X_k; k ≥ 1). The strong Markov property of S states that for any finite stopping time T (with respect to the filtration F), the stochastic process (S_{k+T} − S_T; k ≥ 1) is independent of F_T and has the same finite-dimensional distributions as the process S. Roughly speaking, this means that the process (S_{k+T}; k ≥ 1) is conditionally independent of F_T given S_T and is, in distribution, the random walk S started at S_T.

Theorem 1.2.1 (The Strong Markov Property) Suppose T is a stopping time with respect to F. Given integers n, k ≥ 1 and bounded, measurable f_1, …, f_n: R^d → R,
\[
\mathrm{E}\Big[ \prod_{\ell=1}^{n} f_\ell\big( S_{T+\ell} - S_T \big) \,\Big|\, \mathcal{F}_T \Big] \mathbf{1}_{(T<\infty)} = \mathrm{E}\Big[ \prod_{\ell=1}^{n} f_\ell(S_\ell) \Big] \mathbf{1}_{(T<\infty)}, \qquad \text{a.s.}
\]

Remarks (i) Given the transition operators, the above expression can be computed using Corollaries 1.1.1 and 1.1.2; see Exercise 1.1.2. (ii) It is important to realize that the stopping time condition cannot be removed in general, as the following clearly shows.

Exercise 1.2.1 Consider the simple walk on Z^1. Here, the increments X_1, X_2, … take the values ±1 with probability ½ each. Consider the N_0 ∪ {∞}-valued random variable L = sup(k ≥ 0 : S_k ≤ −½k), where sup ∅ = 0. That is, L designates the last time that the random walk goes below the line y = −½x. (i) Show that with probability one, L < ∞ and that L is not a stopping time with respect to the history of the process S.
3. Random Walks
(ii) Verify that $\mathcal{F}_L$ is a $\sigma$-field and that the process $j \mapsto S_{j+L} - S_L$ is independent of $\mathcal{F}_L$, where $\mathcal{F}_L = \{A \in \vee_n \mathcal{F}_n : A \cap (L \le j) \in \mathcal{F}_j \text{ for all } j \ge 0\}$ is defined as if $L$ were a stopping time. (iii) Show that the stochastic process $j \mapsto S_{L+j} - S_L$ does not have the same finite-dimensional distributions as $S$. This is a part of a deep result of Williams (1970, 1974). (Hint: for part (i), you can use a limit theorem; for part (ii), condition on the value of $L$.)

Proof of Theorem 1.2.1 For all $\ell \ge 1$, $S_{T+\ell} - S_T = \sum_{j=T+1}^{T+\ell} X_j$. Since for all $j \ge 1$, the event $(T = j)$ is $\mathcal{F}_T$-measurable,
$$\begin{aligned}
\mathrm{E}\Bigl[\prod_{\ell=1}^{n} f_\ell(S_{T+\ell}-S_T)\ \Big|\ \mathcal{F}_T\Bigr]\,1\!\mathrm{l}_{(T<\infty)} &= \sum_{j=1}^{\infty} \mathrm{E}\Bigl[\prod_{\ell=1}^{n} f_\ell(S_{T+\ell}-S_T)\ \Big|\ \mathcal{F}_T\Bigr]\,1\!\mathrm{l}_{(T=j)} \\
&= \sum_{j=1}^{\infty} \mathrm{E}\Bigl[\prod_{\ell=1}^{n} f_\ell(S_{j+\ell}-S_j)\ \Big|\ \mathcal{F}_T\Bigr]\,1\!\mathrm{l}_{(T=j)},
\end{aligned}$$

almost surely. Regarding $j \ge 1$ as fixed, define $Y = \prod_{\ell=1}^{n} f_\ell(S_{j+\ell}-S_j)$ and for all $k \ge 1$, let $M_k = \mathrm{E}[Y \mid \mathcal{F}_k]$. By Theorem 1.1.1, Chapter 1, with probability one, $M_j 1\!\mathrm{l}_{(T=j)} = M_T 1\!\mathrm{l}_{(T=j)} = \mathrm{E}[Y\mid\mathcal{F}_T]\, 1\!\mathrm{l}_{(T=j)}$. Thus,

$$\mathrm{E}\Bigl[\prod_{\ell=1}^{n} f_\ell(S_{T+\ell}-S_T)\ \Big|\ \mathcal{F}_T\Bigr]\,1\!\mathrm{l}_{(T<\infty)} = \sum_{j=1}^{\infty} \mathrm{E}\Bigl[\prod_{\ell=1}^{n} f_\ell(S_{j+\ell}-S_j)\ \Big|\ \mathcal{F}_j\Bigr]\,1\!\mathrm{l}_{(T=j)}.$$

By the stationarity and the independence of the increments of $S$, the above equals $\mathrm{E}[\prod_{\ell=1}^{n} f_\ell(S_\ell)]\,1\!\mathrm{l}_{(T<\infty)}$, as desired.
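The strong Markov property lends itself to a quick empirical check (ours, not part of the text): for the simple walk on Z, take the stopping time T at which the walk first hits a fixed level, and compare the law of S_{T+2} − S_T with the law of S_2, namely P(−2) = P(2) = 1/4 and P(0) = 1/2. The function name and all parameters below are ours, for illustration only.

```python
import random

def post_stopping_law(n_paths=2000, level=3, lag=2, horizon=500, seed=7):
    """Empirical law of S_{T+lag} - S_T for the simple walk on Z, where
    T = inf{k >= 1 : S_k = level} is a stopping time.  By the strong
    Markov property this law should match that of S_lag; for lag = 2
    that means P(-2) = P(2) = 1/4 and P(0) = 1/2."""
    rng = random.Random(seed)
    counts, used = {-2: 0, 0: 0, 2: 0}, 0
    for _ in range(n_paths):
        # generate the whole path first, then locate the stopping time in it
        path = [0]
        for _ in range(horizon + lag):
            path.append(path[-1] + rng.choice((-1, 1)))
        T = next((k for k in range(1, horizon) if path[k] == level), None)
        if T is None:
            continue  # level not reached within the horizon; discard the path
        counts[path[T + lag] - path[T]] += 1
        used += 1
    return {v: c / used for v, c in counts.items()}
```

Replacing the stopping time by a "last time" (as in Exercise 1.2.1) would break the agreement, which is the content of remark (ii).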
1.3 Recurrence

Suppose S is a d-dimensional random walk with increment process X and history F. Throughout this section we assume that the X's take values in the d-dimensional integer lattice Zd. A point x ∈ Zd is said to be recurrent if P(Sk = x infinitely often) > 0. When is a point x ∈ Zd recurrent? In this subsection we resolve this when x is the origin of Zd. Since it is the starting position of the random walk, the origin is a very special point; see the Digression in Section 1.1. Recurrence properties of a general point x ∈ Zd are discussed in Section 1.6 below. Recalling that inf ∅ = ∞, let τ1 = inf(j ≥ 1 : Sj = 0); that is, τ1 is the first time the random walk visits 0. Iteratively define τk+1 = inf(j ≥ 1 + τk : Sj = 0) for k ≥ 1. It is easy to see that τ1, τ2, . . . are stopping times. One should think of τ1 (τ2, etc.) as the first (second, etc.) time the random walk visits the origin. Among other things, this sequence of visitation times has the following property.
1 One-Parameter Random Walks
Lemma 1.3.1 Fix $n, j \ge 1$. On $(\tau_n < \infty)$,

$$\mathrm{P}(\tau_{n+1} - \tau_n = j \mid \mathcal{F}_{\tau_n}) = \mathrm{P}(\tau_1 = j), \qquad \text{a.s.}$$
Suppose we knew that with probability one, $\tau_n < \infty$ for all $n \ge 1$. The above lemma asserts that in this case, $\tau_1, \tau_2 - \tau_1, \ldots$ is a sequence of independent, identically distributed random variables (why?). Since $\tau_n = \tau_1 + \sum_{j=2}^{n}(\tau_j - \tau_{j-1})$, $\tau = (\tau_n;\, n \ge 1)$ is then identified as a random walk with nonnegative increments.

Proof This is a consequence of the strong Markov property (see Theorem 1.2.1). In fact, since $S_{\tau_n} = 0$ on $(\tau_n < \infty)$,

$$\begin{aligned}
\mathrm{P}\bigl(\tau_{n+1} - \tau_n = j \mid \mathcal{F}_{\tau_n}\bigr)\,1\!\mathrm{l}_{(\tau_n<\infty)}
&= \mathrm{P}\bigl(S_{\tau_n+\ell} \ne 0 \text{ for all } 1 \le \ell \le j-1,\ S_{\tau_n+j} = 0 \mid \mathcal{F}_{\tau_n}\bigr)\,1\!\mathrm{l}_{(\tau_n<\infty)} \\
&= \mathrm{P}\bigl(S_{\tau_n+\ell} - S_{\tau_n} \ne 0 \text{ for all } 1 \le \ell \le j-1,\ S_{\tau_n+j} - S_{\tau_n} = 0 \mid \mathcal{F}_{\tau_n}\bigr)\,1\!\mathrm{l}_{(\tau_n<\infty)} \\
&= \mathrm{P}\bigl(S_\ell \ne 0 \text{ for all } 1 \le \ell \le j-1,\ S_j = 0\bigr)\,1\!\mathrm{l}_{(\tau_n<\infty)} \\
&= \mathrm{P}(\tau_1 = j)\,1\!\mathrm{l}_{(\tau_n<\infty)}.
\end{aligned}$$

The strong Markov property (Theorem 1.2.1) is used in the penultimate line. This proves the result.

In particular, upon summing Lemma 1.3.1 over all integers $j \ge 1$, we arrive at the following: for all $n \ge 2$,

$$\mathrm{P}(\tau_n < \infty) = \mathrm{P}(\tau_n - \tau_{n-1} < \infty,\ \tau_{n-1} < \infty) = \mathrm{E}\bigl[\mathrm{P}(\tau_n - \tau_{n-1} < \infty \mid \mathcal{F}_{\tau_{n-1}})\,1\!\mathrm{l}_{(\tau_{n-1}<\infty)}\bigr] = \mathrm{P}(\tau_1 < \infty)\cdot\mathrm{P}(\tau_{n-1} < \infty).$$

By induction,

$$\mathrm{P}(\tau_n < \infty) = \bigl[\mathrm{P}(\tau_1 < \infty)\bigr]^n. \tag{1}$$
With the unambiguous understanding that $\infty \le \infty$, we can deduce that the $\tau_n$'s are nondecreasing. Continuity properties of probability measures then imply that

$$\mathrm{P}(0 \text{ is recurrent}) = \lim_{n\to\infty}\mathrm{P}(\tau_n < \infty) = \lim_{n\to\infty}\bigl[\mathrm{P}(\tau_1 < \infty)\bigr]^n.$$

Taking equation (1) into account, we have proven the following:

Proposition 1.3.1 The following are equivalent: (i) 0 is recurrent; (ii) $\mathrm{P}(S_k = 0 \text{ infinitely often}) = 1$; and
(iii) P(τ1 < ∞) = 1. Informally, we are stating that if starting from the origin we are sure of returning to the origin, then we will do so infinitely many times. This is an example of the strong Markov property at its finest.
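A simulation sketch (ours, not the book's) makes the dichotomy of Proposition 1.3.1 tangible: estimate P(τ1 ≤ horizon) for the simple symmetric walk on Z^d. In d = 1 the truncated return probability is already close to 1, while in d = 3 it stays well below 1 (the three-dimensional walk turns out to be transient; cf. Theorem 1.4.1 below). The function and parameter names are hypothetical.

```python
import random

def return_prob_by(horizon, d, n_paths=4000, seed=11):
    """Monte Carlo estimate of P(tau_1 <= horizon) for the simple
    symmetric walk on Z^d: each step changes one uniformly chosen
    coordinate by +1 or -1, and tau_1 is the first return to the origin."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_paths):
        pos = [0] * d
        for _ in range(horizon):
            pos[rng.randrange(d)] += rng.choice((-1, 1))
            if not any(pos):      # back at the origin
                hits += 1
                break
    return hits / n_paths
```

Since P(τ1 < ∞) = 1 in d = 1 but only ≈ 0.34 in d = 3, the two estimates separate clearly even at a modest horizon.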
1.4 Classification of Recurrence

A natural question is, how do the finite-dimensional distributions of a $\mathbb{Z}^d$-valued random walk influence the recurrence of the point 0? For all integers $n \ge 1$, define

$$R_n = 1 + \sum_{k=1}^{n} 1\!\mathrm{l}_{(S_k=0)}.$$

Recalling the Digression of Section 1.1, we think of $S$ as starting from the origin, so that at time 0, $S$ is at 0. Viewed as such, $R_n$ denotes the total number of visits to the origin by time $n$. Note that $R_\infty = \lim_{n\to\infty} R_n$ is a random variable taking values in $\mathbb{N}\cup\{\infty\}$. Proposition 1.3.1 can be restated as follows: $\mathrm{P}(R_\infty = \infty) \in \{0, 1\}$. Moreover, this probability is 1 if and only if 0 is recurrent. The key to our analysis of recurrence turns out to be $\mathrm{E}[R_\infty] = 1 + \sum_{k=1}^{\infty}\mathrm{P}(S_k = 0)$. In fact, we have the following result, due to G. Pólya, K. L. Chung, and W. H. J. Fuchs, which appeared in Chung and Fuchs (1951) in full generality; see (Pólya 1921; Chung and Ornstein 1962) for some related results. Supplementary Exercise 9 contains a complete statement of the above results: the so-called Chung–Fuchs theorem.

Theorem 1.4.1 (The Pólya Criterion) The point 0 is recurrent if and only if $\sum_{k=1}^{\infty}\mathrm{P}(S_k = 0) = \infty$.

Informally, $S$ will hit 0 infinitely often if it is expected to do so. For our proof, we need the following simple and powerful lemma, first found in Paley and Zygmund (1932).

Lemma 1.4.1 (Paley–Zygmund Lemma) Suppose $Z$ is an almost surely nonnegative random variable. Then for all $\varepsilon \in\, ]0, 1[$,

$$\mathrm{P}\bigl(Z \ge \varepsilon\,\mathrm{E}[Z]\bigr) \ge (1-\varepsilon)^2\,\frac{\{\mathrm{E}[Z]\}^2}{\mathrm{E}[Z^2]},$$

provided that all of the mentioned expectations exist.

Exercise 1.4.1 Prove the Paley–Zygmund lemma. (Hint: apply the Cauchy–Schwarz inequality to $\mathrm{E}[Z\,1\!\mathrm{l}_{(Z\ge\varepsilon\mathrm{E}[Z])}]$.)
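The Paley–Zygmund bound is easy to test numerically. The snippet below (ours, purely illustrative) evaluates both sides exactly, in rational arithmetic, for a nonnegative random variable with a finite, hand-picked distribution.

```python
from fractions import Fraction

def paley_zygmund_gap(pmf, eps):
    """For a nonnegative random variable Z with finite pmf
    {value: probability} (all entries Fractions), return (lhs, rhs) of
    the Paley-Zygmund bound
        P(Z >= eps*E[Z]) >= (1-eps)^2 * {E[Z]}^2 / E[Z^2],
    computed exactly."""
    ez = sum(v * p for v, p in pmf.items())          # E[Z]
    ez2 = sum(v * v * p for v, p in pmf.items())     # E[Z^2]
    lhs = sum(p for v, p in pmf.items() if v >= eps * ez)
    rhs = (1 - eps) ** 2 * ez ** 2 / ez2
    return lhs, rhs
```

Running this for several values of ε confirms lhs ≥ rhs, with the gap widening as ε grows.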
Exercise 1.4.2 If $Z$ is a nonnegative random variable that is also in $L^2(\mathrm{P})$, show that $\mathrm{P}(Z = 0) \le \mathrm{Var}(Z)/\{\mathrm{E}[Z]\}^2$, where Var denotes the variance.

Exercise 1.4.3 Suppose $E_1, E_2, \ldots$ are measurable events such that $\sum_j \mathrm{P}(E_j) = +\infty$. Prove that whenever

$$\liminf_{n\to\infty}\ \frac{\sum_{j=1}^{n}\sum_{k=1}^{n}\mathrm{P}(E_j\cap E_k)}{\bigl\{\sum_{j=1}^{n}\mathrm{P}(E_j)\bigr\}^2} < \infty,$$

then $\mathrm{P}(E_n \text{ infinitely often}) > 0$. This is from (Chung and Erdős 1952; Kochen and Stone 1964). (Hint: consider the first two moments of $J_n = \sum_{j=1}^{n} 1\!\mathrm{l}_{E_j}$.)

Proof of Theorem 1.4.1 We have already made the observation that $R_\infty \ge 1$ and $\mathrm{E}[R_\infty - 1] = \sum_{k=1}^{\infty}\mathrm{P}(S_k = 0)$. (Since $R_\infty = \lim_n R_n$, a.s., this is a consequence of the monotone convergence theorem of measure theory.) Thus, $\sum_k \mathrm{P}(S_k = 0) < \infty$ if and only if $\mathrm{E}[R_\infty - 1] < \infty$. Consequently, $\sum_k \mathrm{P}(S_k = 0) < \infty$ certainly implies that $R_\infty < \infty$, a.s.; that is to say that 0 is not recurrent. Next, we suppose that $\sum_k \mathrm{P}(S_k = 0) = \infty$. It is clear that $\mathrm{E}[R_n - 1] = \sum_{k=1}^{n}\mathrm{P}(S_k = 0)$ and that this sequence explodes as $n\to\infty$. We now estimate $\mathrm{E}[(R_n - 1)^2]$, viz.,

$$\mathrm{E}\bigl[(R_n-1)^2\bigr] = \mathrm{E}\Bigl[\sum_{k=1}^{n} 1\!\mathrm{l}_{(S_k=0)}\Bigr] + 2\,\mathrm{E}\Bigl[\sum_{1\le k<\ell\le n} 1\!\mathrm{l}_{(S_k=0)}\,1\!\mathrm{l}_{(S_\ell=0)}\Bigr] = \mathrm{E}[R_n-1] + 2\sum_{1\le k<\ell\le n}\mathrm{P}(S_k=0)\,\mathrm{P}(S_{\ell-k}=0),$$

by the Markov property (Theorem 1.1.1). Relabeling the last summation and possibly adding more nonnegative terms, we arrive at the estimate

$$\mathrm{E}\bigl[(R_n-1)^2\bigr] \le \mathrm{E}[R_n-1] + 2\bigl\{\mathrm{E}[R_n-1]\bigr\}^2.$$

Since $R_n - 1 \in \mathbb{N}_0$, $(R_n - 1 > 0) = (R_n \ge 2)$. Applying Lemma 1.4.1 first, and then the above estimate, in this order, we arrive at the following:

$$\mathrm{P}(\tau_1 \le n) = \mathrm{P}(R_n \ge 2) \ge \frac{\bigl\{\mathrm{E}[R_n-1]\bigr\}^2}{\mathrm{E}[R_n-1] + 2\bigl\{\mathrm{E}[R_n-1]\bigr\}^2},$$

where $\tau_1 = \inf(j \ge 1 : S_j = 0)$. Since $\lim_n \mathrm{E}[R_n] = \infty$, this implies that $\mathrm{P}(\tau_1 < \infty) \ge \frac12$. By Proposition 1.3.1, whenever $\mathrm{P}(\tau_1 < \infty)$ is positive, it is, in fact, 1. This completes our proof.

While it was meant to bring forth a powerful technique, our demonstration of Theorem 1.4.1 is not the fastest method for getting there, as we see next.
Exercise 1.4.4 Let $N$ denote the total number of returns to zero. That is, $N = \sum_{k=0}^{\infty} 1\!\mathrm{l}_{(S_k=0)}$. Show that $N$ is a geometric random variable with mean $(1-p)^{-1}$, where $p = \mathrm{P}(\exists\, k \ge 1 : S_k = 0)$. Use this to verify Pólya's criterion. (Hint: show that for all $k \ge 1$, $\mathrm{P}(N \ge k) = p\,\mathrm{P}(N \ge k-1)$.)
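To see the Pólya criterion in action, consider the walk on Z^d whose increment has d independent ±1 coordinates — a computational stand-in of ours for the nearest-neighbour simple walk, chosen because its return probabilities factor. Then P(S_{2k} = 0) = {C(2k, k) 4^{−k}}^d ≍ (πk)^{−d/2}, so ∑_k P(S_k = 0) diverges for d ≤ 2 and converges for d ≥ 3. The sketch below computes the partial sums:

```python
def polya_partial_sums(n, dims):
    """Partial sums sum_{k=1}^{n} P(S_{2k} = 0) for the Z^d walk whose
    step has d independent +/-1 coordinates.  The one-dimensional return
    probability u_k = C(2k,k)/4^k satisfies u_k = u_{k-1}*(2k-1)/(2k),
    and by coordinate independence P(S_{2k} = 0) = u_k ** d."""
    u, sums = 1.0, {d: 0.0 for d in dims}
    for k in range(1, n + 1):
        u *= (2 * k - 1) / (2 * k)
        for d in dims:
            sums[d] += u ** d
    return sums
```

By the Pólya criterion, the continuing growth of the d = 1, 2 partial sums and the stabilization of the d = 3 partial sum mirror recurrence in low dimensions and transience in dimension three.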
1.5 Transience

When a point $x \in \mathbb{Z}^d$ is not recurrent for the $\mathbb{Z}^d$-valued random walk $S$, we say that it is transient. It is easy to see that $0 \in \mathbb{Z}^d$ is transient if and only if

$$\lim_{n\to\infty}\mathrm{P}(S_k = 0 \text{ for some } k \ge n) = 0.$$

Thus, a natural measure for the strength of the transience of the origin is the rate at which $\mathrm{P}(S_k = 0 \text{ for some } k \ge n)$ goes to 0 as $n$ goes to infinity. The following sheds much light on this rate.

Theorem 1.5.1 If the origin is transient for the $\mathbb{Z}^d$-valued random walk $S$, the following holds for every integer $n \ge 1$:

$$\tfrac12\,T \le \mathrm{P}\bigl(S_k = 0 \text{ for some } k \ge n\bigr) \le 8\,T, \qquad \text{where } T = \frac{\sum_{j=n}^{\infty}\mathrm{P}(S_j = 0)}{1 + \sum_{j=1}^{\infty}\mathrm{P}(S_j = 0)}.$$

This theorem makes the point that as $n \to \infty$, $\mathrm{P}(S_k = 0 \text{ for some } k \ge n)$ goes to zero like a constant multiple of $\sum_{j\ge n}\mathrm{P}(S_j = 0)$.

Remarks
1. This can be sharpened; see Supplementary Exercise 1.
2. Throughout this subsection we implicitly use the notation of Sections 1.3 and 1.4.
3. It can be shown that $\mathrm{P}(\tau_1 = \infty) = \{1 + \sum_{k=1}^{\infty}\mathrm{P}(S_k = 0)\}^{-1}$; see Supplementary Exercise 1. This is the probability of never hitting 0.

Proof By transience and by Theorem 1.4.1, $\sum_{j=1}^{\infty}\mathrm{P}(S_j = 0) < \infty$. For all $n \ge 1$, let

$$Z = \sum_{j=n}^{\infty} 1\!\mathrm{l}_{(S_j=0)} = R_\infty - R_{n-1},$$

where $R_0 = 1$. Clearly, $\mathrm{E}[Z] = \sum_{j=n}^{\infty}\mathrm{P}(S_j = 0)$, which we know is finite. Recall our proof of Theorem 1.4.1; the method used there to estimate
$\mathrm{E}[(R_n-1)^2]$ can be used here to show that

$$\mathrm{E}[Z^2] \le 2\sum_{\ell=n}^{\infty}\mathrm{P}(S_\ell = 0)\cdot\Bigl\{1 + \sum_{j=1}^{\infty}\mathrm{P}(S_j = 0)\Bigr\}. \tag{1}$$

Since $(Z > 0) \subseteq (S_k = 0 \text{ for some } k \ge n)$, we obtain the lower bound from the Paley–Zygmund lemma (Lemma 1.4.1). For the upper bound on the probability, define

$$M_k = \mathrm{E}\Bigl[\sum_{j=n}^{\infty} 1\!\mathrm{l}_{(S_j=0)}\ \Big|\ \mathcal{F}_k\Bigr], \qquad k \ge n.$$

It is not hard to check that $M = (M_k;\, k \ge n)$ is a martingale. Moreover, for all $k \ge n$,

$$M_k \ge \mathrm{E}\Bigl[\sum_{j=k}^{\infty} 1\!\mathrm{l}_{(S_j=0)}\ \Big|\ \mathcal{F}_k\Bigr]\cdot 1\!\mathrm{l}_{(S_k=0)} = \Bigl\{1 + \sum_{j=1}^{\infty}\mathrm{P}(S_{j+k} - S_k = 0 \mid \mathcal{F}_k)\Bigr\}\cdot 1\!\mathrm{l}_{(S_k=0)}.$$

We have used the monotone convergence theorem to write the conditional expectation as the sum of the conditional probabilities. By the Markov property (Corollary 1.1.1), $M_k \ge \{1 + \sum_{j=1}^{\infty}\mathrm{P}(S_j = 0)\}\cdot 1\!\mathrm{l}_{(S_k=0)}$, almost surely. Taking suprema over all $k \ge n$ and squaring, we obtain the following:

$$1\!\mathrm{l}_{(S_k=0 \text{ for some } k\ge n)} \le \Bigl\{1 + \sum_{j=1}^{\infty}\mathrm{P}(S_j = 0)\Bigr\}^{-2}\cdot \sup_{k\ge n} M_k^2. \tag{2}$$

By Doob's strong (2, 2) inequality (Theorem 1.4.1, Chapter 1),

$$\mathrm{E}\Bigl[\sup_{k\ge n} M_k^2\Bigr] \le 4\,\sup_{k\ge n}\mathrm{E}[M_k^2].$$

Therefore, by taking expectations in equation (2), we obtain

$$\mathrm{P}(S_k = 0 \text{ for some } k \ge n) \le 4\,\Bigl\{1 + \sum_{j=1}^{\infty}\mathrm{P}(S_j = 0)\Bigr\}^{-2}\,\sup_{k\ge n}\mathrm{E}[M_k^2].$$

Jensen's inequality shows that for any $k \ge n$, $\mathrm{E}[M_k^2] \le \mathrm{E}[Z^2]$. Consequently, equation (1) implies the result.
1.6 Recurrence of Possible Points We now return to the question of when a general point x ∈ Zd is recurrent. To illustrate the potential complications, consider the following simple example.
Example Suppose d = 1 and X1, X2, . . . are independent, identically distributed random variables taking the values ±2 with probability ½ each. Let S = (Sk; k ≥ 1) denote the random walk whose increments are X1, X2, . . .; i.e., Sn = X1 + · · · + Xn for all n ≥ 1. It should be absolutely clear that the point x = 1 is not recurrent. In fact, odd values can never be visited by S, while even values can. On the other hand, by the central limit theorem, lim supn Sn = − lim infn Sn = +∞, almost surely. A little thought reveals that for any even number x, there are infinitely many n's such that Sn = x.

Exercise 1.6.1 Use the central limit theorem to show that, in the above example, lim supn Sn = − lim infn Sn = +∞, a.s.

In the previous example we constructed a random walk for which all of the even numbers are recurrent, while the odd numbers can never be reached. This property turns out to be typical. To explore this phenomenon in greater depth, suppose S is a Zd-valued random walk. An x ∈ Zd is possible if there exists an integer k ≥ 1 such that P(Sk = x) > 0. If x is not possible, it is deemed impossible. Clearly, impossible points are not, and can never be, visited. Therefore, any discussion of recurrence must be restricted to the possible points. What do the possible points of a random walk look like? Below is a prefatory result that will be elaborated upon in the next section.

Lemma 1.6.1 The collection of all possible points of a Zd-valued random walk is an additive semigroup of Zd.

Proof Suppose the random walk is denoted by S and x1, x2 ∈ Zd are possible for S. By definition, there exist integers k1, k2 ≥ 1 such that pi = P(Sk_i = xi) > 0 for i = 1, 2. Since P(Sk1+k2 − Sk1 = x2) = P(Sk2 = x2) = p2, by the Markov property (Corollary 1.1.1), P(Sk1+k2 = x1 + x2) ≥ P(Sk1 = x1, Sk1+k2 − Sk1 = x2) = p1 p2 > 0. This proves the lemma.
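The possible points of the ±2 example are easy to enumerate by brute force. The sketch below (ours, with hypothetical names) computes all points reachable within n steps for a Z-valued walk with a finite step set, and can be used to check the parity obstruction and the semigroup property of Lemma 1.6.1.

```python
def possible_points(steps, n):
    """Enumerate {x : P(S_k = x) > 0 for some 1 <= k <= n} for the
    Z-valued random walk whose i.i.d. increments take values in `steps`
    (each with positive probability).  Works layer by layer: the points
    reachable in k steps are the (k-1)-step points shifted by one step."""
    possible, layer = set(), {0}
    for _ in range(n):
        layer = {x + s for x in layer for s in steps}
        possible |= layer
    return possible
```

For steps = (−2, 2), only even points ever appear, and sums of possible points remain possible (with a larger time horizon), as the lemma asserts.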
The following is a very important exercise.

Exercise 1.6.2 Let S denote a random walk on Zd whose increment process is X. We say that S is symmetric if X1 and −X1 have the same distribution. Prove that whenever S is a symmetric random walk on Zd, the set of its possible values forms an additive subgroup of Zd. In particular, argue that the origin is always possible.

Lemma 1.6.2 The collection of all recurrent points is an additive subgroup of Zd. In particular, if there are any recurrent points, 0 is one of them.
Proof We will show that whenever $x$ and $y$ are recurrent, so is $x - y$. Let $\tau$ denote the first hitting time of $y$. That is, $\tau = \inf(k \ge 1 : S_k = y)$, where $\inf\varnothing = \infty$. Thanks to the recurrence of $y$, $\tau$ is finite and $S_\tau = y$, a.s.; cf. Proposition 1.3.1. Consequently, the strong Markov property (Theorem 1.2.1) implies the following (why?):

$$\begin{aligned}
\mathrm{P}(S_k = x - y \text{ for infinitely many } k \ge 1) &= \mathrm{P}(S_{k+\tau} - S_\tau = x - y \text{ for infinitely many } k \ge 1)\\
&= \mathrm{P}(S_{k+\tau} = x \text{ for infinitely many } k \ge 1)\\
&= \mathrm{P}(S_k = x \text{ for infinitely many } k \ge 1),
\end{aligned}$$

which is equal to one, thanks to the recurrence of $x$ together with Proposition 1.3.1. This completes our proof.

Theorem 1.6.1 Suppose $S$ is a $\mathbb{Z}^d$-valued random walk. If $x \in \mathbb{Z}^d$ is possible and $y \in \mathbb{Z}^d$ is recurrent, then $x - y$ is recurrent. In particular, the following are equivalent: (i) 0 is recurrent; (ii) all possible points $x$ are recurrent with probability one.

Note that condition (ii) subsumes the assumption that $x$ is possible and that Theorem 1.6.1 extends Lemma 1.6.2.

Proof To begin, let us argue that the first assertion of the theorem implies the equivalence of (i) and (ii). Suppose first that (ii) holds. Then, for any possible point $x$, $0 = x - x$ is recurrent by the first assertion of the theorem, thus proving (i). Conversely, if (i) holds, then by the first assertion of the theorem and by Lemma 1.6.2, for any possible point $x$, $x = x - 0$ is recurrent. We have shown that (i) $\Leftrightarrow$ (ii) and are left to verify that for all possible points $x$ and all recurrent points $y$, $x - y$ is recurrent. Holding such $x$ and $y$ fixed, define $\sigma_1 = \inf(k \ge 1 : S_k = y)$, $\sigma_2 = \inf(k \ge K_0 + \sigma_1 : S_k = y)$, $\ldots$, where $K_0$ is a fixed constant that is to be chosen later on in this proof. (For now, you can think of $K_0 = 1$, in which case $\sigma_j$ denotes the $j$th time the random walk hits $y$.) In general, for all $j \ge 1$, we define $\sigma_{j+1} = \inf(k \ge K_0 + \sigma_j : S_k = y)$, where $\inf\varnothing = \infty$, as usual. Since $y$ is recurrent, $\sigma_j < \infty$ for all $j \ge 1$, with probability one. Now we define the events $E_1, E_2, \ldots$ as

$$E_n = \bigl(S_k = x \text{ for some } \sigma_n < k < \sigma_{n+1}\bigr), \qquad n \ge 1.$$

As $k$ varies between $\sigma_n$ and $\sigma_{n+1}$, the process $S_k$ makes a loop, starting from $y$ and ending at $y$. This loop is called an excursion from $y$, and $E_n$ denotes the event that in the $n$th excursion from $y$, the random walk hits
$x$ at some point. Equivalently,

$$E_n = \bigl(S_{k+\sigma_n} - S_{\sigma_n} = x - y \text{ for some } 1 \le k < \sigma_{n+1} - \sigma_n\bigr).$$

You should check that, as a consequence of the strong Markov property, $E_1, E_2, \ldots$ are independent events all having the same probability $\mathrm{P}(E_1)$; cf. Theorem 1.2.1. Now is the time to choose $K_0$. Since $x$ is possible, by choosing $K_0$ large enough, we can ensure that $\mathrm{P}(E_1) > 0$. (Why?) Thus, by the Borel–Cantelli lemma, $\mathrm{P}(E_n \text{ infinitely often}) = 1$. In particular, $x$ is recurrent and, thanks to Lemma 1.6.2, so is $x - y$, as desired.
1.7 Recurrence–Transience Dichotomy

Let S denote a Zd-valued random walk and P denote the collection of its possible values. According to Theorem 1.6.1, either all x ∈ P are recurrent or they are all transient. This is the recurrence–transience dichotomy. The impossible values, of course, are never visited and have no effect on the structure of the random walk. On the other hand, at least in the presence of some recurrent values, all elements of P are recurrent and P is an additive group (Lemma 1.6.2 and Theorem 1.6.1). Thus, when P ≠ ∅, we can view S as a Markov chain on the group P. A little group theory shows that quite a bit more is true. Indeed, recall that Zd is a free abelian group.¹ Since all subgroups of free abelian groups are free abelian,² Lemma 1.6.1 shows that P is itself a free abelian group. If k ∈ {1, . . . , d} denotes the rank of P, then P is isomorphic to Zk (why an isomorphism and not just a homomorphism?). For us, this means that there exists a k × k invertible matrix A such that AP = Zk. Since Sn ∈ P a.s. for all n ≥ 1, AS = (ASn; n ≥ 1) is a random walk on Zk and all points in Zk are possible for this walk. Since A⁻¹ exists, all statements about the P-valued Markov chain S translate to statements for the Zk-valued random walk AS, and vice versa. Thus, there is no essential loss of generality in assuming that S is itself a Zd-valued random walk for which all points in Zd are possible.

¹ Let G be a class of groups. Consider some G ∈ G whose generator is the set g = {xi; i ∈ I}. Recall that G is freely generated by g (within the class G) if for any group G′ ∈ G that is generated by {yi; i ∈ I}, the map xi → yi extends to a homomorphism (i.e., operation-preserving map) G → G′. The cardinality of I is the rank of G, and G is free within G. A free abelian group is a group that is free within the class of all abelian groups. While general free groups do not have much rank structure in a "dimensional" sense, free abelian groups do.
² This is an immediate consequence of the free abelian group theorem: each subgroup of a free abelian group is itself a free abelian group. (Why is it a consequence?) See Kargapolov and Merzljakov (1979, Theorem 7.1.4, Chapter 3).
Proposition 1.7.1 Suppose $S$ is a $\mathbb{Z}^d$-valued random walk and let $\varphi$ denote the characteristic function of the increments: $\varphi(\xi) = \mathrm{E}[e^{i\xi\cdot X_1}]$, $\xi \in \mathbb{R}^d$. Then,

$$\sum_{k=1}^{\infty}\mathrm{P}(S_k = 0) = (2\pi)^{-d}\lim_{\lambda\uparrow 1}\int_{[-\pi,\pi]^d}\mathrm{Re}\,\frac{\varphi(\xi)}{1-\lambda\varphi(\xi)}\,d\xi.$$

Combining the above with Theorem 1.6.1, we conclude the following.

Corollary 1.7.1 Suppose $S$ is a $\mathbb{Z}^d$-valued random walk for which all points are possible. Let $\varphi$ denote the characteristic function of the increments of $S$. Then, all $x \in \mathbb{Z}^d$ are transient, a.s., unless

$$\lim_{\lambda\uparrow 1}\int_{[-\pi,\pi]^d}\mathrm{Re}\,\bigl\{1-\lambda\varphi(\xi)\bigr\}^{-1}\,d\xi = +\infty,$$

in which case all points are recurrent, a.s.

Bearing in mind the discussion in the beginning of this subsection, what the above states is that for any random walk on $\mathbb{Z}^d$, either all possible points are recurrent, or all possible points are transient (why?). In the former case, we say that the random walk is recurrent; in the latter case, transient. It is important to point out that if $S$ is a transient walk for which all points are possible, then with probability one, $\lim_{n\to\infty}|S_n| = \infty$. The converse also holds, as the following shows.

Exercise 1.7.1 $S$ is transient if and only if $|S_n| \to \infty$, a.s.
Proof of Proposition 1.7.1 By the inversion theorem of Fourier analysis on the torus (or by the inversion theorem for discrete random variables), for all $k \ge 1$, $\mathrm{P}(S_k = 0) = (2\pi)^{-d}\int_{[-\pi,\pi]^d}\{\varphi(\xi)\}^k\,d\xi$. Thus, for all $\lambda \in\, ]0, 1[$,

$$\sum_{k=1}^{\infty}\lambda^k\,\mathrm{P}(S_k = 0) = (2\pi)^{-d}\,\lambda\int_{[-\pi,\pi]^d}\mathrm{Re}\,\frac{\varphi(\xi)}{1-\lambda\varphi(\xi)}\,d\xi,$$

since the left-hand side is real-valued. (Check this calculation!) To finish, simply let $\lambda \uparrow 1$. In fact, the following (surprisingly) subtle fact holds:³

$$\lim_{\lambda\uparrow 1}\int_{[-\pi,\pi]^d}\mathrm{Re}\,\frac{\varphi(\xi)}{1-\lambda\varphi(\xi)}\,d\xi = \int_{[-\pi,\pi]^d}\mathrm{Re}\,\frac{\varphi(\xi)}{1-\varphi(\xi)}\,d\xi.$$

We will not have need for this.

³ Cf. Ornstein (1969) and Stone (1969). For a more complete result, see Port and Stone (1971b, Theorem 16.2).
2 Intersection Probabilities

A collection of $N\ (\ge 2)$ independent $\mathbb{Z}^d$-valued random walks $S^1, S^2, \ldots, S^N$ is said to intersect if there exists $t \in \mathbb{N}^N$ such that $S^1_{t(1)} = \cdots = S^N_{t(N)}$. If we think of $S^i_k$ as the position of particle $i$ at time $k$, then $S^1, \ldots, S^N$ intersect if and only if the particle trajectories cross at some point. It should be recognized that such intersections are different from the collisions of $S^1, \ldots, S^N$. The latter happen when there exists $k \in \mathbb{N}$ such that $S^1_k = S^2_k = \cdots = S^N_k$. In words, $S^1, \ldots, S^N$ intersect if the trajectories of $S^1, \ldots, S^N$ intersect, while they collide if the particles $S^1_k, \ldots, S^N_k$ collide at some time $k$. In light of the development in Section 1, collision problems are simpler to analyze. For instance, two independent random walks $S^1$ and $S^2$ collide infinitely often if and only if 0 is recurrent for the random walk $k \mapsto S^1_k - S^2_k$. In this section we study the more intricate problem of intersections of independent random walks.

Define the multiparameter $\mathbb{Z}^{dN}$-valued process $\mathbf{S} = (\mathbf{S}_t;\, t \in \mathbb{N}^N)$ by

$$\mathbf{S}_t = \bigl(S^1_{t(1)}, \ldots, S^N_{t(N)}\bigr), \qquad t \in \mathbb{N}^N.$$

This means that the first $d$ coordinates of $\mathbf{S}_t$ are those of $S^1_{t(1)}$, the second $d$ coordinates of $\mathbf{S}_t$ are those of $S^2_{t(2)}$, and so on. It is apparent that for any $m \ge 1$ (finite or infinite) the ranges of $S^1, \ldots, S^N$ intersect $m$ times if and only if $\mathbf{S}$ hits the diagonal of $\mathbb{Z}^{Nd}$ $m$ times. If we write any $x \in \mathbb{Z}^{Nd}$ as $x = (x^1, \ldots, x^N)$ with $x^i \in \mathbb{Z}^d$, then the diagonal of $\mathbb{Z}^{Nd}$ is the set $\mathrm{diag}(\mathbb{Z}^{Nd}) = \{x \in \mathbb{Z}^{Nd} : x^1 = \cdots = x^N\}$. In direct product notation, we can write $x \in \mathbb{Z}^{Nd}$ as $x = x^1 \otimes \cdots \otimes x^N$, where $x^i \in \mathbb{Z}^d$. (For example, $(1, 2, 3, 4) = (1, 2) \otimes (3, 4) = 1 \otimes 2 \otimes 3 \otimes 4$.) Since $\mathbf{S}_t = S^1_{t(1)} \otimes \cdots \otimes S^N_{t(N)}$, we sometimes write the stochastic process $\mathbf{S}$ as $\mathbf{S} = S^1 \otimes \cdots \otimes S^N$ and refer to $S^1, \ldots, S^N$ as the coordinate processes of $\mathbf{S}$. To write things more explicitly, consider $N = 2$. Then, $\mathbf{S} = S^1 \otimes S^2$ is a two-parameter process defined by $\mathbf{S}_{(i,j)} = (S^1_i, S^2_j)$, $i, j \ge 1$. This means that the first $d$ coordinates of $\mathbf{S}_{(i,j)}$ are the $d$ coordinates of $S^1_i$, and the next $d$ coordinates of $\mathbf{S}_{(i,j)}$ are those of $S^2_j$. Henceforth, we will assume that all points are possible for $S^1, \ldots, S^N$. See Section 1.7 for a discussion of this assumption and how it can essentially be made without loss of generality.
2.1 Intersections of Two Walks

Let $S^1$ and $S^2$ denote two independent random walks on $\mathbb{Z}^d$ and let $\mathbf{S} = S^1 \otimes S^2$ denote the associated 2-parameter process. We are interested in knowing when $\mathbf{S}$ hits the diagonal of $\mathbb{Z}^{2d}$ finitely often. In other words, we ask, "when is $\sum_{j=n}^{\infty}\sum_{k=m}^{\infty} 1\!\mathrm{l}_{(S^1_j = S^2_k)}$ finite for all choices of $n, m \ge 1$?" At the time of writing this book, this question seems unanswerable for
completely general walks $S^1$ and $S^2$. However, we will give a comprehensive answer when $S^1$ and $S^2$ are both symmetric, i.e., when $S^1_j$ (respectively $S^2_j$) has the same distribution as $-S^1_j$ (respectively $-S^2_j$) for all $j \ge 1$; cf. Exercise 2.1.3 below for a further refinement. According to the recurrence–transience dichotomy (Corollary 1.7.1 and the discussion following it), $S^1$ is either recurrent or transient, a.s. First, we address the easy case where $S^1$ (or equivalently, $S^2$) is recurrent.

Lemma 2.1.1 If either of $S^1$ or $S^2$ is recurrent, then with probability one, there are infinitely many intersections.

Exercise 2.1.1 Prove Lemma 2.1.1.

According to Lemma 2.1.1, in our study of the intersections of $S^1$ and $S^2$ we can confine ourselves to the transient case. Henceforth, $S^1$ and $S^2$ are symmetric walks, and $S^1_0 = S^2_0 = 0$. Consider the function

$$G_\lambda(a,b) = \mathrm{E}\Bigl[\sum_{j=0}^{\infty}\sum_{k=0}^{\infty}\lambda^{j+k}\,1\!\mathrm{l}_{(S^1_j+a=S^2_k+b)}\Bigr], \qquad \lambda \in\, ]0,1[,\ a,b \in \mathbb{Z}^d. \tag{1}$$

Theorem 2.1.1 Suppose $S^1$ and $S^2$ are symmetric, independent, transient random walks in $\mathbb{Z}^d$. Then, the following are equivalent:
(i) $\lim_{\lambda\uparrow 1} G_\lambda(0,0) < +\infty$;
(ii) $\mathrm{P}\bigl(\sum_{j=1}^{\infty}\sum_{k=1}^{\infty} 1\!\mathrm{l}_{(S^1_j=S^2_k)} < \infty\bigr) > 0$;
(iii) $\mathrm{P}\bigl(\sum_{j=1}^{\infty}\sum_{k=1}^{\infty} 1\!\mathrm{l}_{(S^1_j=S^2_k)} < \infty\bigr) = 1$; and
(iv) $\sum_{j=1}^{\infty}\sum_{k=1}^{\infty}\mathrm{P}(S^1_j = S^2_k) < \infty$.

The following technical lemma lies at the heart of Theorem 2.1.1 and seems to require symmetry.

Lemma 2.1.2 Let $\varphi_1$ and $\varphi_2$ denote the characteristic functions of the increments of $S^1$ and $S^2$, respectively. Then, for all $\lambda \in\, ]0, 1[$,

$$\sup_{a,b\in\mathbb{Z}^d} G_\lambda(a,b) = G_\lambda(0,0).$$

Proof By the inversion formula for characteristic functions,

$$\begin{aligned}
\mathrm{P}(S^1_j + a = S^2_k + b) &= (2\pi)^{-d}\int_{[-\pi,\pi]^d} e^{-i\xi\cdot(b-a)}\,\mathrm{E}[e^{i\xi\cdot S^1_j}]\,\mathrm{E}[e^{-i\xi\cdot S^2_k}]\,d\xi \\
&= (2\pi)^{-d}\int_{[-\pi,\pi]^d} e^{-i\xi\cdot(b-a)}\,\{\varphi_1(\xi)\}^j\,\{\varphi_2(-\xi)\}^k\,d\xi \\
&= (2\pi)^{-d}\int_{[-\pi,\pi]^d} e^{-i\xi\cdot(b-a)}\,\{\varphi_1(\xi)\}^j\,\{\varphi_2(\xi)\}^k\,d\xi.
\end{aligned}$$
In the last line, symmetry is used. Therefore,

$$G_\lambda(a,b) = (2\pi)^{-d}\int_{[-\pi,\pi]^d} e^{-i\xi\cdot(b-a)}\,\frac{1}{1-\lambda\varphi_1(\xi)}\cdot\frac{1}{1-\lambda\varphi_2(\xi)}\,d\xi. \tag{2}$$

On the other hand, $-1 \le \varphi_1(\xi), \varphi_2(\xi) \le 1$, which implies that $\{1-\lambda\varphi_1(\xi)\}^{-1}\times\{1-\lambda\varphi_2(\xi)\}^{-1}$ is nonnegative. Since $|e^{-i\xi\cdot(b-a)}| \le 1$, the lemma follows.

It may be helpful to note that the one-parameter version of this lemma always holds:

Exercise 2.1.2 Suppose $S$ is a random walk on $\mathbb{Z}^d$ with $S_0 = 0$, and define for all $\lambda \in\, ]0, 1[$, $G_\lambda(a) = \mathrm{E}[\sum_{k=0}^{\infty}\lambda^k 1\!\mathrm{l}_{(S_k=a)}]$. Prove that even if $S$ is not symmetric, $G_\lambda(a) \le G_\lambda(0)$ for all $a \in \mathbb{Z}^d$ and all $\lambda \in\, ]0, 1[$. (Hint: consider the first hitting time of $a$.)

Proof of Theorem 2.1.1 It is clear that (iii) $\Rightarrow$ (ii). Conversely, it is not hard to check that (ii) $\Rightarrow$ (iii), thanks to the Hewitt–Savage 0–1 law; cf. Exercise 1.7.5, Chapter 1. Since (i) $\Leftrightarrow$ (iv) $\Rightarrow$ (iii) is clear, it remains to prove that if (iv) fails, then so will (iii). Define

$$J_\lambda = \sum_{j=0}^{\infty}\sum_{k=0}^{\infty}\lambda^{j+k}\,1\!\mathrm{l}_{(S^1_j=S^2_k)}, \qquad \lambda\in\,]0,1[.$$
Note that $\mathrm{E}[J_\lambda] = G_\lambda(0,0)$; cf. equation (2). Since (iv) is assumed to fail, $\lim_{\lambda\uparrow 1}\mathrm{E}[J_\lambda] = +\infty$. Our strategy, then, is to show the existence of a nontrivial constant $A_1$ such that

$$\mathrm{E}[J_\lambda^2] \le A_1\bigl\{\mathrm{E}[J_\lambda]\bigr\}^2, \qquad \lambda\in\,]0,1[. \tag{3}$$

Assuming this, we can finish our proof: apply equation (3) and the Paley–Zygmund lemma (Lemma 1.4.1) to see that

$$\mathrm{P}\Bigl(\sup_{\lambda\in\,]0,1[} J_\lambda = +\infty\Bigr) \ge \lim_{\lambda\uparrow 1}\mathrm{P}\bigl(J_\lambda \ge \tfrac12\mathrm{E}[J_\lambda]\bigr) \ge (4A_1)^{-1},$$

which is positive. Thus, it remains to verify equation (3). We can write $\mathrm{E}[J_\lambda^2] \le 2(T_1 + T_2)$, where

$$T_1 = \sum_{i\le i'}\sum_{j\le j'}\lambda^{i+i'+j+j'}\,\mathrm{P}(S^1_i=S^2_j,\ S^1_{i'}=S^2_{j'}), \qquad
T_2 = \sum_{i\le i'}\sum_{j'\le j}\lambda^{i+i'+j+j'}\,\mathrm{P}(S^1_i=S^2_j,\ S^1_{i'}=S^2_{j'}).$$
(Why?) Next, we write $T_1 = T_{11} + T_{12} + T_{13}$, where

$$\begin{aligned}
T_{11} &= \sum_{i<i'}\sum_{j<j'}\lambda^{i+i'+j+j'}\,\mathrm{P}(S^1_i=S^2_j,\ S^1_{i'}=S^2_{j'}),\\
T_{12} &= \sum_{i=0}^{\infty}\sum_{j<j'}\lambda^{2i+j+j'}\,\mathrm{P}(S^1_i=S^2_j=S^2_{j'}),\\
T_{13} &= \sum_{i<i'}\sum_{j=0}^{\infty}\lambda^{i+i'+2j}\,\mathrm{P}(S^1_i=S^2_j=S^1_{i'}).
\end{aligned}$$

Similarly, we write $T_2 = T_{21} + T_{12} + T_{13}$, where

$$T_{21} = \sum_{i<i'}\sum_{j'<j}\lambda^{i+i'+j+j'}\,\mathrm{P}(S^1_i=S^2_j,\ S^1_{i'}=S^2_{j'}).$$

We now estimate the $T_{ij}$'s in turn. By the Markov property,

$$T_{11} = \sum_{i<i'}\sum_{j<j'}\lambda^{i+i'+j+j'}\,\mathrm{P}(S^1_i=S^2_j)\,\mathrm{P}(S^1_{i'-i}=S^2_{j'-j})
= \sum_{i<i'}\sum_{j<j'}\lambda^{2i+2j+(i'-i)+(j'-j)}\,\mathrm{P}(S^1_i=S^2_j)\,\mathrm{P}(S^1_{i'-i}=S^2_{j'-j}) \le \bigl\{\mathrm{E}[J_\lambda]\bigr\}^2,$$

where the final bound uses $\lambda^{2i+2j} \le \lambda^{i+j}$ and the factorization of the resulting double sum. On the other hand,

$$T_{12} \le \sum_{i=0}^{\infty}\sum_{j<j'}\lambda^{i+j+j'}\,\mathrm{P}(S^1_i=S^2_j)\,\mathrm{P}(S^2_{j'-j}=0) \le A_2\,\mathrm{E}[J_\lambda],$$

where $A_2 = \sum_{i=0}^{\infty}\mathrm{P}(S^2_i = 0)$. Of course, since $S^2$ is transient, $A_2 < +\infty$; cf. Theorem 1.4.1. In similar fashion we obtain $T_{13} \le A_3\,\mathrm{E}[J_\lambda]$, where $A_3 = \sum_{i=0}^{\infty}\mathrm{P}(S^1_i=0)$ is finite as well. Since $\mathrm{E}[J_\lambda] \ge 1$ (consider the term $j = k = 0$), our job is complete once we show that there exists a nontrivial constant $A_4$ such that for all $\lambda\in\,]0,1[$,

$$T_{21} \le A_4\bigl\{\mathrm{E}[J_\lambda]\bigr\}^2. \tag{4}$$

Indeed, from this, equation (3), and hence the theorem, follows. We observe that $T_{21}$ equals

$$\sum_{i<i'}\sum_{j'<j}\lambda^{i+i'+j+j'}\,\mathrm{P}\bigl(S^1_i = S^2_{j'} + [S^2_j - S^2_{j'}],\ S^1_i + [S^1_{i'}-S^1_i] = S^2_{j'}\bigr)
= \sum_{i<i'}\sum_{j'<j}\lambda^{i+i'+j+j'}\,\mathrm{P}\bigl(S^1_i = S^2_{j'} + \bar S^2_{j-j'},\ S^1_i + \bar S^1_{i'-i} = S^2_{j'}\bigr),$$
where $(\bar S^1_u, \bar S^2_v)$ is an independent copy of $(S^1_u, S^2_v)$ for any two integers $u, v \ge 1$. This is a consequence of the Markov property; cf. Corollary 1.1.1. Consequently,

$$\begin{aligned}
T_{21} &= \sum_{i<i'}\sum_{j'<j}\lambda^{i+i'+j+j'}\,\mathrm{P}\bigl(S^1_i = S^2_{j'} + \bar S^2_{j-j'},\ \bar S^1_{i'-i} = -\bar S^2_{j-j'}\bigr)\\
&\le \sum_{i=0}^{\infty}\sum_{j=0}^{\infty}\sum_{u=0}^{\infty}\sum_{v=0}^{\infty}\lambda^{i+j+u+v}\,\mathrm{P}\bigl(S^1_i = S^2_j + \bar S^2_v,\ \bar S^1_u = -\bar S^2_v\bigr)\\
&= \sum_{u=0}^{\infty}\sum_{v=0}^{\infty}\lambda^{u+v}\,\mathrm{E}\bigl[G_\lambda(0, \bar S^2_v)\,1\!\mathrm{l}_{(\bar S^1_u = -\bar S^2_v)}\bigr],
\end{aligned}$$

by independence and by equation (1). Thanks to Lemma 2.1.2,

$$T_{21} \le G_\lambda(0,0)\sum_{u=0}^{\infty}\sum_{v=0}^{\infty}\lambda^{u+v}\,\mathrm{P}(S^1_u = -S^2_v) = \mathrm{E}[J_\lambda]\sum_{u=0}^{\infty}\sum_{v=0}^{\infty}\lambda^{u+v}\,\mathrm{P}(S^1_u = -S^2_v) = \bigl\{\mathrm{E}[J_\lambda]\bigr\}^2.$$

To follow up: the first equality follows from the fact that $(\bar S^1_u, \bar S^2_v)$ has the same distribution as $(S^1_u, S^2_v)$, together with $G_\lambda(0,0) = \mathrm{E}[J_\lambda]$; the last equality follows from the symmetry hypothesis of the theorem, which gives $\mathrm{P}(S^1_u = -S^2_v) = \mathrm{P}(S^1_u = S^2_v)$. This verifies equation (4) and completes our task.

Exercise 2.1.3 A characteristic function $\varphi$ on $\mathbb{R}^d$ is said to satisfy the sector condition if there exists a constant $A > 0$ such that $|\mathrm{Im}\,\varphi(\xi)| \le A\,\bigl(1 - \mathrm{Re}\,\varphi(\xi)\bigr)$, $\xi \in \mathbb{R}^d$. Suppose $S^1$ and $S^2$ are independent random walks on $\mathbb{Z}^d$ whose increments have characteristic functions that satisfy the sector condition. Prove that Theorem 2.1.1 remains valid in this setting.⁴

Theorem 2.1.1 states that, under the given conditions, the trajectories of $S^1$ and $S^2$ intersect infinitely many times if and only if $\sum_{j,k\ge 1}\mathrm{P}(S^1_j = S^2_k) = \infty$. By a summability argument (see the proof of Proposition 1.7.1), the latter can be written as follows.

Proposition 2.1.1 We have

$$\sum_{j,k=1}^{\infty}\mathrm{P}(S^1_j = S^2_k) = (2\pi)^{-d}\lim_{\lambda\uparrow 1}\int_{[-\pi,\pi]^d}\frac{\varphi_1(\xi)}{1-\lambda\varphi_1(\xi)}\cdot\frac{\varphi_2(\xi)}{1-\lambda\varphi_2(\xi)}\,d\xi.$$

⁴ Throughout this book, the trajectories of a stochastic process $(X_t;\, t \in T)$ are the realizations of the (random) function $t \mapsto X_t$, for any index set $T$.
Exercise 2.1.4 Verify Proposition 2.1.1.
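As with Proposition 1.7.1, the λ < 1 version of this identity can be sanity-checked numerically. For two independent simple walks on Z (ϕ₁ = ϕ₂ = cos), the sketch below (ours, with hypothetical names) compares ∑_{j,k≥1} λ^{j+k} P(S¹_j = S²_k) with (2π)^{−1} ∫ [λϕ/(1 − λϕ)]² dξ; letting λ ↑ 1 recovers the displayed formula.

```python
from math import comb, cos, pi

def intersection_series(lam, mmax=400):
    """sum_{j,k>=1} lam^(j+k) P(S^1_j = S^2_k) for two independent simple
    walks on Z started at 0.  The difference S^1_j - S^2_k is a sum of
    j+k independent +/-1 steps, so P(S^1_j = S^2_k) = C(m, m//2)/2^m with
    m = j+k even, and there are m-1 pairs (j,k) >= (1,1) with j+k = m."""
    return sum((m - 1) * lam ** m * comb(m, m // 2) / 2 ** m
               for m in range(2, mmax + 1, 2))

def intersection_integral(lam, npts=20000):
    """(2*pi)^{-1} * integral over [-pi, pi] of [lam*phi/(1-lam*phi)]^2
    with phi(xi) = cos(xi), via a midpoint rule."""
    h = 2 * pi / npts
    total = 0.0
    for i in range(npts):
        f = lam * cos(-pi + (i + 0.5) * h)
        total += (f / (1 - f)) ** 2
    return total * h / (2 * pi)
```

The agreement for fixed λ is exact up to truncation and quadrature error; the λ ↑ 1 blow-up of both sides reflects the divergence of ∑ P(S¹_j = S²_k) in one dimension.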
2.2 An Estimate for Two Walks

Let $S^1$ and $S^2$ be two independent $\mathbb{Z}^d$-valued random walks. According to Theorem 2.1.1, $\sum_{j,k\ge 1}\mathrm{P}(S^1_j = S^2_k) < \infty$ is a necessary and sufficient condition for

$$\lim_{n,m\to\infty}\mathrm{P}\bigl(S^1_j = S^2_k \text{ for some } (j,k)\succcurlyeq(n,m)\bigr) = 0,$$

provided that the random walks are symmetric. We now explore the rate at which the above probability tends to 0, under the extra condition that there exists $C_0$ such that whenever $\mathrm{P}(S^1_i = S^2_j) > 0$,

$$\mathrm{P}(S^1_i = S^2_j + a) \le C_0\,\mathrm{P}(S^1_i = S^2_j). \tag{1}$$

This is a unimodality-type condition and is verified, for instance, when $S^1$ and $S^2$ are so-called simple random walks; cf. Section 3.

Theorem 2.2.1 Suppose $S^1$ and $S^2$ are two symmetric and independent $\mathbb{Z}^d$-valued random walks that satisfy condition (1). If $\sum_{j,k\ge 1}\mathrm{P}(S^1_j = S^2_k) < \infty$, there exist nontrivial constants $C_1$ and $C_2$ such that for all $n, m \ge 1$,

$$C_1\sum_{j=n}^{\infty}\sum_{k=m}^{\infty}\mathrm{P}(S^1_j = S^2_k) \le \mathrm{P}\bigl(S^1_j = S^2_k \text{ for some } (j,k)\succcurlyeq(n,m)\bigr) \le C_2\sum_{j=n}^{\infty}\sum_{k=m}^{\infty}\mathrm{P}(S^1_j = S^2_k).$$

Proof Define for all $n, m \ge 1$,

$$J_{n,m} = \sum_{j=n}^{\infty}\sum_{k=m}^{\infty} 1\!\mathrm{l}_{(S^1_j = S^2_k)}.$$

Arguing as we did in Theorem 2.1.1, we can show that there exist nontrivial constants $C_3$ and $C_4$ such that for all $n, m \ge 1$, $\mathrm{E}[J_{n,m}^2] \le C_3\{\mathrm{E}[J_{n,m}]\}^2 + C_4\,\mathrm{E}[J_{n,m}]$; this uses (1), as well as symmetry. Since $\mathrm{E}[J_{n,m}]$ goes to zero as $n, m \to \infty$, we can deduce the existence of a finite positive constant $C_1$ such that

$$\mathrm{E}[J_{n,m}^2] \le \frac{\mathrm{E}[J_{n,m}]}{C_1}. \tag{2}$$

The details are delegated to Supplementary Exercise 6. By the Paley–Zygmund lemma (Lemma 1.4.1),

$$\mathrm{P}(J_{n,m} > 0) \ge C_1\,\mathrm{E}[J_{n,m}],$$
which is the desired probability lower bound. To demonstrate the corresponding upper bound, for all $n, m \ge 1$, let $\mathcal{F}_{n,m}$ denote the $\sigma$-field generated by $((S^1_i, S^2_j);\, 1 \le i \le n,\ 1 \le j \le m)$. By Exercise 3.4.2 of Chapter 1, $\mathcal{F} = (\mathcal{F}_{n,m};\, n, m \ge 1)$ is a commuting filtration in the sense of Chapter 1. Fix $(n, m) \in \mathbb{N}^2$ and define

$$M_{p,q} = \mathrm{E}(J_{n,m} \mid \mathcal{F}_{p,q}), \qquad (p,q)\succcurlyeq(n,m).$$

By the Markov property (Corollary 1.1.1),

$$\begin{aligned}
M_{p,q} &\ge \sum_{i=p}^{\infty}\sum_{j=q}^{\infty}\mathrm{P}(S^1_i = S^2_j \mid \mathcal{F}_{p,q})\,1\!\mathrm{l}_{(S^1_p=S^2_q)}\\
&= \Bigl\{\sum_{i=1}^{\infty}\sum_{j=1}^{\infty}\mathrm{P}\bigl(S^1_{i+p}-S^1_p = S^2_{j+q}-S^2_q \mid \mathcal{F}_{p,q}\bigr) + 1\Bigr\}\,1\!\mathrm{l}_{(S^1_p=S^2_q)}\\
&= \Bigl\{\sum_{i=1}^{\infty}\sum_{j=1}^{\infty}\mathrm{P}(S^1_i = S^2_j) + 1\Bigr\}\,1\!\mathrm{l}_{(S^1_p=S^2_q)}
=: \Bigl(\frac{16}{C_2 C_1}\Bigr)^{1/2}\,1\!\mathrm{l}_{(S^1_p=S^2_q)}.
\end{aligned}$$

(This defines $C_2$.) It is clear that $M = (M_t;\, t \in \mathbb{N}^2)$ is a two-parameter martingale with respect to the (commuting) filtration $\mathcal{F}$. Thus, by Cairoli's strong (2, 2) inequality (Theorem 2.3.1 and Corollary 3.5.1 of Chapter 1),

$$\mathrm{P}\bigl(S^1_p = S^2_q \text{ for some } (p,q)\succcurlyeq(n,m)\bigr) \le \frac{C_2 C_1}{16}\,\mathrm{E}\Bigl[\sup_{(p,q)\succcurlyeq(n,m)} M_{p,q}^2\Bigr] \le C_2 C_1\,\mathrm{E}[J_{n,m}^2].$$

The probability upper bound follows from this and equation (2).
2.3 Intersections of Several Walks

We are ready to consider the general problem of when and how often $N$ independent random walks in $\mathbb{Z}^d$ intersect, where $N \ge 2$ is an arbitrary integer. This will be achieved by extending the two-parameter methods of Section 2.1 to $N$ parameters. Let $S^1, \ldots, S^N$ denote $N$ independent $\mathbb{Z}^d$-valued random walks. The following can be proved in complete analogy to Lemma 2.1.1.

Lemma 2.3.1 If any one of the coordinate processes is recurrent and if the trajectories of the remaining $N - 1$ coordinate processes intersect infinitely many times, then for all $t \in \mathbb{N}^N$,

$$\sum_{s\succcurlyeq t} 1\!\mathrm{l}_{(S^1_{s(1)} = \cdots = S^N_{s(N)})} = \infty, \qquad \text{almost surely.}$$
Exercise 2.3.1 Prove Lemma 2.3.1.
In particular, we need to consider only the case where all of the coordinate processes are transient. By Theorem 1.4.1, this happens precisely when $\sum_{k=1}^{\infty}\mathrm{P}(S^i_k = 0) < \infty$ for all $i = 1, \ldots, N$, a condition that we will assume tacitly from now on. Let $S^1_0 = \cdots = S^N_0 = 0$ and define the $N$-variable version of equation (1) of Section 2.1 as

$$G_\lambda(a_1,\ldots,a_N) = \mathrm{E}\Bigl[\sum_{0\le i_1,\ldots,i_N}\lambda^{i_1+\cdots+i_N}\,1\!\mathrm{l}_{(S^1_{i_1}+a_1 = S^2_{i_2}+a_2 = \cdots = S^N_{i_N}+a_N)}\Bigr], \tag{1}$$

where $a_1, \ldots, a_N \in \mathbb{Z}^d$ and $\lambda \in\, ]0, 1[$. One can prove the following.

Proposition 2.3.1 Suppose $S^1, \ldots, S^N$ are $N$ symmetric and independent $\mathbb{Z}^d$-valued random walks whose increments have characteristic functions $\varphi_1, \ldots, \varphi_N$, respectively. Then, $G_\lambda(a_1, \ldots, a_N) \le G_\lambda(0, \ldots, 0)$ for all $a_1, \ldots, a_N \in \mathbb{Z}^d$. Moreover,

$$G_\lambda(0,\ldots,0) = (2\pi)^{-d(N-1)}\int_{[-\pi,\pi]^{d(N-1)}} F(\xi;\lambda)\,d\xi,$$

where, writing $\xi = (\xi^{(1)},\ldots,\xi^{(N-1)}) \in [-\pi,\pi]^{d(N-1)}$ with each $\xi^{(\ell)}\in[-\pi,\pi]^d$, for all $\lambda \in\, ]0, 1[$,

$$F(\xi;\lambda) = \frac{1}{1-\lambda\varphi_1\bigl(-\sum_{\ell=1}^{N-1}\xi^{(\ell)}\bigr)}\cdot\prod_{j=2}^{N}\frac{1}{1-\lambda\varphi_j(\xi^{(j-1)})}.$$
Exercise 2.3.2 Prove Proposition 2.3.1.
Theorem 2.3.1 Suppose $S^1, \ldots, S^N$ are symmetric, independent, $\mathbb{Z}^d$-valued, transient random walks. Then, the following are equivalent:
(i) With positive probability, $\sum_{t \in \mathbb{N}^N} \mathbf{1}_{\{S^1_{t^{(1)}} = \cdots = S^N_{t^{(N)}}\}} < \infty$;
(ii) With probability one, $\sum_{t \in \mathbb{N}^N} \mathbf{1}_{\{S^1_{t^{(1)}} = \cdots = S^N_{t^{(N)}}\}} < \infty$; and
(iii) $\sum_{t \in \mathbb{N}^N} \mathrm{P}(S^1_{t^{(1)}} = \cdots = S^N_{t^{(N)}}) < \infty$.
We provide only a sketch of the proof.

Sketch of Proof In light of the presented proof of Theorem 2.1.1, (iii) $\Rightarrow$ (ii) $\Leftrightarrow$ (i) follows readily; it remains to show that if (iii) fails, then so does (ii). For all $\lambda \in\ ]0,1[$, define
$$J_\lambda = \sum_{s \in \mathbb{N}_0^N} \lambda^{s^{(1)} + \cdots + s^{(N)}}\, \mathbf{1}_{\{S^1_{s^{(1)}} = \cdots = S^N_{s^{(N)}}\}}.$$
Our goal is to show that if (iii) fails, then $\sup_{\lambda \in ]0,1[} J_\lambda = \infty$ with positive probability. It is this argument that we merely sketch. Since $\lim_{\lambda \uparrow 1} \mathrm{E}[J_\lambda] = +\infty$, it suffices to exhibit a finite $C_1$ such that $\mathrm{E}[J_\lambda^2] \le C_1 (\mathrm{E}[J_\lambda])^2$ for all $\lambda \in\ ]0,1[$. Once this is accomplished, the remainder of our argument follows our proof of Theorem 2.1.1 quite closely. Clearly,
$$\mathrm{E}[J_\lambda^2] = \sum_{i_1, i_1' \ge 0} \cdots \sum_{i_N, i_N' \ge 0} \lambda^{\sum_{\ell=1}^N (i_\ell + i_\ell')}\, \mathrm{P}\big(S^1_{i_1} = \cdots = S^N_{i_N},\ S^1_{i_1'} = \cdots = S^N_{i_N'}\big).$$
Let us consider the contribution to the above when $i_1 = i_1'$:
$$\sum_{i_1=0}^\infty \sum_{i_2, i_2' \ge 0} \cdots \sum_{i_N, i_N' \ge 0} \lambda^{2i_1 + \sum_{\ell=2}^N (i_\ell + i_\ell')}\, \mathrm{P}\big(S^1_{i_1} = S^2_{i_2} = \cdots = S^N_{i_N},\ S^1_{i_1} = S^2_{i_2'} = \cdots = S^N_{i_N'}\big)$$
$$\le (N-1)! \sum_{i_1=0}^\infty \sum_{0 \le i_2 \le i_2'} \cdots \sum_{0 \le i_N \le i_N'} \lambda^{\sum_{\ell=1}^N i_\ell + \sum_{\ell=2}^N (i_\ell' - i_\ell)}\, \mathrm{P}\big(S^1_{i_1} = \cdots = S^N_{i_N}\big) \prod_{\ell=2}^N \mathrm{P}\big(S^\ell_{i_\ell' - i_\ell} = 0\big)$$
$$\le C_2\, \mathrm{E}[J_\lambda],$$
for some finite constant $C_2$ that is independent of $\lambda \in\ ]0,1[$. By symmetry,
$$\mathrm{E}[J_\lambda^2] \le C_3\, \mathrm{E}[J_\lambda] + \sum_{\substack{0 \le i_1, i_1' \\ i_1 \ne i_1'}} \cdots \sum_{\substack{0 \le i_N, i_N' \\ i_N \ne i_N'}} \lambda^{\sum_{\ell=1}^N (i_\ell + i_\ell')}\, Q,$$
where $Q = \mathrm{P}(S^1_{i_1} = \cdots = S^N_{i_N},\ S^1_{i_1'} = \cdots = S^N_{i_N'})$. A little thought shows that, over the range in question,
$$Q = \mathrm{P}\left(\begin{array}{c} S^1_{i_1 \wedge i_1'} + \bar S^1_{i_1 - (i_1 \wedge i_1')} = \cdots = S^N_{i_N \wedge i_N'} + \bar S^N_{i_N - (i_N \wedge i_N')}, \text{ and} \\ S^1_{i_1 \wedge i_1'} + \bar S^1_{i_1' - (i_1 \wedge i_1')} = \cdots = S^N_{i_N \wedge i_N'} + \bar S^N_{i_N' - (i_N \wedge i_N')} \end{array}\right),$$
where $(S^1_{u^{(1)}}, \ldots, S^N_{u^{(N)}})$ and $(\bar S^1_{u^{(1)}}, \ldots, \bar S^N_{u^{(N)}})$ are independent copies of one another for each $u \in \mathbb{N}^N$. Solving, we get
$$Q \le \mathrm{P}\left(\begin{array}{c} S^1_{i_1 \wedge i_1'} + \bar S^1_{i_1 - (i_1 \wedge i_1')} = \cdots = S^N_{i_N \wedge i_N'} + \bar S^N_{i_N - (i_N \wedge i_N')}, \text{ and} \\ |\bar S^1_{|i_1 - i_1'|}| = \cdots = |\bar S^N_{|i_N - i_N'|}| \end{array}\right).$$
The rest of the proof follows from changing variables ($j_\ell = |i_\ell - i_\ell'|$) and follows the $N = 2$ argument very closely, except that we now use Proposition 2.3.1 in place of Lemma 2.1.2. $\square$
2.4 An Estimate for N Walks

We consider the problem of the previous subsection in the case where the number of intersections is finite, almost surely. Under the hypotheses of Theorem 2.3.1, this is to say that $\sum_{t \in \mathbb{N}^N} \mathrm{P}(S^1_{t^{(1)}} = \cdots = S^N_{t^{(N)}}) < \infty$. The question that we address now is, how large is $\mathrm{P}(S^1_{s^{(1)}} = \cdots = S^N_{s^{(N)}}$ for some $s \succcurlyeq t)$ when $t \in \mathbb{N}^N$ is large, coordinatewise? When $N = 2$, this was achieved in Theorem 2.2.1; the general case follows under the following unimodality analogue of equation (1) of Section 2.2: There exists a finite constant $C_0$ such that
$$\sup_{a_1, \ldots, a_N \in \mathbb{Z}^d} \mathrm{P}(S^1_{i_1} + a_1 = \cdots = S^N_{i_N} + a_N) \le C_0\, \mathrm{P}(S^1_{i_1} = \cdots = S^N_{i_N}), \tag{1}$$
as long as the right-hand side is positive.

Theorem 2.4.1 Suppose $S^1, \ldots, S^N$ are independent $\mathbb{Z}^d$-valued random walks, and for all $t \in \mathbb{N}^N$ let $\psi(t) = \mathrm{P}(S^1_{t^{(1)}} = \cdots = S^N_{t^{(N)}})$; assume that these walks satisfy condition (1) above. If $\sum_{t \in \mathbb{N}^N} \psi(t) < \infty$, there exist finite constants $C_1$ and $C_2$ such that for all $t \in \mathbb{N}^N$,
$$C_1 \sum_{s \succcurlyeq t} \psi(s) \le \mathrm{P}\big(S^1_{s^{(1)}} = \cdots = S^N_{s^{(N)}} \text{ for some } s \succcurlyeq t\big) \le C_2 \sum_{s \succcurlyeq t} \psi(s).$$
One can prove this by finding a suitable N -parameter modification of the two-parameter argument used to prove Theorem 2.2.1. Exercise 2.4.1 (Hard) Prove Theorem 2.4.1.
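The coincidence-counting sums studied in this section are easy to tabulate in simulation: rather than looping over all $N$-tuples of time points, one can count visits per site and multiply the per-site counts. The following Python sketch (an illustration, not from the book) does this for three independent simple walks on $\mathbb{Z}$.

```python
import random

def walk_positions(n, rng):
    """Positions S_0, ..., S_n of a simple random walk on Z (S_0 = 0)."""
    pos, out = 0, [0]
    for _ in range(n):
        pos += rng.choice((-1, 1))
        out.append(pos)
    return out

def triple_coincidences(n, seed=0):
    """Count triples (i, j, k), 0 <= i, j, k <= n, with S^1_i = S^2_j = S^3_k.

    Tally how often each site is visited by each walk, then sum the
    products of per-site visit counts over common sites.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(3):
        c = {}
        for x in walk_positions(n, rng):
            c[x] = c.get(x, 0) + 1
        counts.append(c)
    return sum(c1 * counts[1].get(x, 0) * counts[2].get(x, 0)
               for x, c1 in counts[0].items())

print(triple_coincidences(500))
```

Since all three walks start at the origin, the count is always at least 1; in low dimensions it grows quickly with the time horizon.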
3 The Simple Random Walk

Nearest-neighbor random walks on $\mathbb{Z}^d$ are random walks that can move only to a nearest-neighbor point in $\mathbb{Z}^d$. Indeed, let $(e_1, \ldots, e_d)$ denote the usual basis for $\mathbb{R}^d$. That is, for all $i, j \in \{1, \ldots, d\}$, $e_j^{(i)}$ equals $1$ if $i = j$, and it equals $0$ otherwise. Consider a $\mathbb{Z}^d$-valued random walk $S = (S_k;\ k \ge 1)$ with increments $X_1, X_2, \ldots$. We say that $S$ is a nearest-neighbor random walk if, with probability one, $X_1 \in \{\pm e_1, \ldots, \pm e_d\}$. Nearest-neighbor random walks form some of the most common models for the motion of a randomly moving particle. An important member of this family of random walks is the simple random walk. A random walk $S$ is said to be simple if it is truly unbiased in its motion. More precisely, $S$ is a simple random walk if
$$\mathrm{P}(X_1 = e_1) = \mathrm{P}(X_1 = -e_1) = \cdots = \mathrm{P}(X_1 = e_d) = \mathrm{P}(X_1 = -e_d) = (2d)^{-1}.$$
In this section we put the general theory of Section 2 to the test by way of explicit calculations. Let us recall that for all $x \in \mathbb{R}^k$, $|x| = \max_{1 \le \ell \le k} |x^{(\ell)}|$ and $\|x\| = \{\sum_{\ell=1}^k |x^{(\ell)}|^2\}^{1/2}$ denote the $\ell^\infty$ and $\ell^2$ norms of $x$, respectively.
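To make the definition concrete, here is a minimal Python sketch (an illustration, not from the book) that samples a simple random walk in $\mathbb{Z}^d$: choosing a coordinate axis uniformly and a sign uniformly is equivalent to choosing among $\pm e_1, \ldots, \pm e_d$ with probability $(2d)^{-1}$ each.

```python
import random

def simple_walk(n, d, seed=None):
    """Sample S_1, ..., S_n of the simple random walk on Z^d: each increment
    is uniformly one of +-e_1, ..., +-e_d (probability 1/(2d) each)."""
    rng = random.Random(seed)
    pos = [0] * d
    path = []
    for _ in range(n):
        axis = rng.randrange(d)           # which basis vector e_i
        pos[axis] += rng.choice((-1, 1))  # its sign
        path.append(tuple(pos))
    return path

path = simple_walk(1000, 3, seed=42)
print(path[:3])
```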
3.1 Recurrence

We now wish to study the recurrence properties of a simple random walk $S$ in $\mathbb{Z}^d$ with increments $X_1, X_2, \ldots$. The following elementary result is a first step in this direction.

Lemma 3.1.1 All points are possible for a simple random walk.

Exercise 3.1.1 Prove Lemma 3.1.1.
Thus, according to Corollary 1.7.1, $S$ is either recurrent or transient. In order to decide which is the case, we first need a technical lemma.

Lemma 3.1.2 The integral $\int_{[0,1]^d} \|\xi\|^{-\beta}\, d\xi$ is finite if and only if $\beta < d$.
Proof Recall that $\|\xi\| = \{\sum_{j=1}^d (\xi^{(j)})^2\}^{1/2}$, while $|\xi| = \max_{1 \le j \le d} |\xi^{(j)}|$. The traditional approach to this sort of problem is to estimate the integral in polar coordinates; we will do this in probabilistic language. First, note that $|\xi| \le \|\xi\| \le d^{1/2} |\xi|$. Therefore, $\int_{[0,1]^d} \|\xi\|^{-\beta}\, d\xi < \infty$ if and only if $\int_{[0,1]^d} |\xi|^{-\beta}\, d\xi < \infty$. Let $U$ be a random variable that is uniformly picked on $[0,1]^d$. The problem is to decide when $\mathrm{E}\{|U|^{-\beta}\}$ is finite. On the other hand, a direct calculation shows that $\sum_{n \ge 1} \mathrm{P}(|U|^{-\beta} \ge n) = \sum_{n \ge 1} n^{-d/\beta}$, which is finite iff $d > \beta$. $\square$

Theorem 3.1.1 Let $S$ denote the simple random walk in $\mathbb{Z}^d$. Then $S$ is recurrent if $d \le 2$; otherwise, $S$ is transient.

Proof Let $\varphi$ denote the characteristic function of $X_1$. It is easy to check that
$$\varphi(\xi) = \frac{1}{d} \sum_{\ell=1}^d \cos(\xi^{(\ell)}), \qquad \xi \in \mathbb{R}^d. \tag{1}$$
Since $\varphi(\xi) \ge 0$ (and is, of course, real) for all $\xi \in [-1,1]^d$, we can apply the bounded and monotone convergence theorems to Proposition 1.7.1 to see that
$$\sum_{n=1}^\infty \mathrm{P}(S_n = 0) = (2\pi)^{-d} \int_{[-\pi,\pi]^d} \frac{\varphi(\xi)}{1 - \varphi(\xi)}\, d\xi.$$
(Why?) Equivalently, we apply symmetry to deduce
$$1 + \sum_{n=1}^\infty \mathrm{P}(S_n = 0) = \pi^{-d} \int_{[0,\pi]^d} \Big(1 - \frac{1}{d} \sum_{\ell=1}^d \cos(\xi^{(\ell)})\Big)^{-1}\, d\xi.$$
By Theorem 1.4.1, it suffices to show that the above integral is finite if and only if $d \ge 3$. Owing to Taylor's theorem with remainder, for all $y$ there exists a $\lambda$ between $0$ and $y$ such that
$$\cos(y) = 1 - \frac{y^2}{2} + \frac{\lambda^4}{12}.$$
Hence, for all $y \in [0,1]$,
$$1 - \frac{y^2}{2} \le \cos(y) \le 1 - \frac{y^2}{2} + \frac{y^2}{12} = 1 - \frac{5}{12}\, y^2. \tag{2}$$
This, in turn, implies the inequality
$$2d \int_{[0,1]^d} \|\xi\|^{-2}\, d\xi \le \int_{[0,1]^d} \Big(1 - \frac{1}{d} \sum_{\ell=1}^d \cos(\xi^{(\ell)})\Big)^{-1}\, d\xi \le \frac{12d}{5} \int_{[0,1]^d} \|\xi\|^{-2}\, d\xi.$$
Since $d$ is an integer, by Lemma 3.1.2, $\int_{[0,1]^d} (1 - d^{-1} \sum_{\ell=1}^d \cos(\xi^{(\ell)}))^{-1}\, d\xi$ is finite if and only if $d \ge 3$. Our proof is concluded once we show that $\int_K (1 - d^{-1} \sum_{\ell=1}^d \cos(\xi^{(\ell)}))^{-1}\, d\xi < \infty$, where $K = [0,\pi]^d \setminus [0,1]^d$. To observe this, note that whenever $\xi \in K$, there is at least one $\ell \in \{1, \ldots, d\}$ such that $\cos(\xi^{(\ell)}) \le \cos(1)$. For such $\xi$'s, we can conclude that
$$1 - \frac{1}{d} \sum_{\ell=1}^d \cos(\xi^{(\ell)}) \ge d^{-1} [1 - \cos(1)].$$
Since $\cos(1) < 1$,
$$\int_K \Big(1 - \frac{1}{d} \sum_{\ell=1}^d \cos(\xi^{(\ell)})\Big)^{-1}\, d\xi \le \frac{d}{1 - \cos(1)}\, \mathrm{Leb}(K). \tag{3}$$
Clearly, $\mathrm{Leb}(K) \le \pi^d < \infty$, which proves the result. $\square$
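The elementary bound (2), which drives the comparison between $1 - \varphi(\xi)$ and $\|\xi\|^2$, is easy to verify numerically. The following Python sketch (an illustration, not part of the text) checks it on a grid of $[0,1]$.

```python
import math

# Check 1 - y^2/2 <= cos(y) <= 1 - (5/12) y^2 for y in [0, 1].
for k in range(1001):
    y = k / 1000
    lower = 1 - y * y / 2
    upper = 1 - 5 * y * y / 12
    assert lower <= math.cos(y) <= upper, y

print("bound (2) holds on [0, 1]")
```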
Theorem 3.1.1 is deeply related to the following:

Exercise 3.1.2 (Hard) If $S$ denotes the simple walk in $\mathbb{Z}^d$, then there exists a finite constant $C > 1$ such that for all $n \ge 1$,
$$C^{-1} n^{-d/2} \le \mathrm{P}(S_{2n} = 0) \le \sup_{a \in \mathbb{Z}^d} \mathrm{P}(S_{2n} = a) \le C n^{-d/2}.$$
(Hint: Use the inversion theorem for characteristic functions and write $\mathrm{P}(S_{2n} = 0)$ as $(2\pi)^{-d} \int_{[-\pi,\pi]^d} \mathrm{E}[e^{i\xi \cdot S_{2n}}]\, d\xi$. Use the fact that $S$ has i.i.d. increments and expand this integral near $\xi = 0$. Alternatively, look at Durrett (1991) under "local central limit theorem.")

Exercise 3.1.3 Use Exercise 3.1.2, together with Theorem 1.4.1, to construct an alternative proof of Theorem 3.1.1.
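At $d = 1$ the rate in Exercise 3.1.2 can be checked exactly, since for the simple walk on $\mathbb{Z}$ one has the standard closed form $\mathrm{P}(S_{2n} = 0) = \binom{2n}{n} 4^{-n}$ (a textbook formula, not quoted in the text). The Python sketch below compares it with the $n^{-1/2}$ rate:

```python
import math
from fractions import Fraction

def p_return(n):
    """Exact P(S_{2n} = 0) for the simple walk on Z: C(2n, n) / 4^n."""
    return Fraction(math.comb(2 * n, n), 4 ** n)

# P(S_{2n} = 0) ~ 1/sqrt(pi n), consistent with the n^{-d/2} rate at d = 1.
for n in (10, 100, 1000):
    ratio = float(p_return(n)) * math.sqrt(math.pi * n)
    print(n, ratio)  # the ratio approaches 1 as n grows
```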
3.2 Intersections of Two Simple Walks

Given two independent $\mathbb{Z}^d$-valued simple random walks, when do their trajectories intersect infinitely often? In other words, if the random walks are denoted by $S^1$ and $S^2$, when can we conclude that $\sum_{j,k \ge 1} \mathbf{1}_{\{S^1_j = S^2_k\}} = +\infty$?
Theorem 3.2.1 Suppose $S^1$ and $S^2$ are independent simple random walks in $\mathbb{Z}^d$. With probability one, the trajectories of $S^1$ and $S^2$ intersect infinitely often if and only if $d \le 4$.

Proof When $d \le 2$, $S^1$ and $S^2$ are recurrent; cf. Theorem 3.1.1. By Lemma 2.1.1, we can assume with no loss of generality that $d \ge 3$, i.e., that $S^1$ and $S^2$ are transient. Let $\varphi$ denote the characteristic function of the increments of $S^1$ and/or $S^2$, since they have the same distribution. By Corollary 1.7.1, $\lim_{\lambda \uparrow 1} \int_{[-\pi,\pi]^d} \{1 - \lambda\varphi(\xi)\}^{-1}\, d\xi < \infty$. Since $\varphi(\xi) \ge 0$ for all $\xi \in [-1,1]^d$, the bounded and monotone convergence theorems together show us that
$$\int_{[-\pi,\pi]^d} \frac{1}{1 - \varphi(\xi)}\, d\xi < \infty.$$
Once again applying the bounded and monotone convergence theorems, this time via Proposition 2.1.1, we obtain the following:
$$(2\pi)^d \sum_{j,k=1}^\infty \mathrm{P}(S^1_j = S^2_k) = \int_{[-\pi,\pi]^d} \frac{[\varphi(\xi)]^2}{\{1 - \varphi(\xi)\}^2}\, d\xi = \int_{[-\pi,\pi]^d} \{1 - \varphi(\xi)\}^{-2}\, d\xi - \int_{[-\pi,\pi]^d} \{1 + \varphi(\xi)\}\{1 - \varphi(\xi)\}^{-1}\, d\xi.$$
The second integral is finite. In fact, it is positive and bounded above by $2 \int_{[-\pi,\pi]^d} \{1 - \varphi(\xi)\}^{-1}\, d\xi < +\infty$. Thanks to symmetry and by Theorem 2.1.1, it suffices to show that $\int_{[0,\pi]^d} \{1 - \varphi(\xi)\}^{-2}\, d\xi < \infty$ if and only if $d \ge 5$. Following the demonstration of Theorem 3.1.1, we split the integral in two parts: where $\xi \in [0,1]^d$ and where $\xi \in K = [0,\pi]^d \setminus [0,1]^d$. As in the derivation of equation (3) of Section 3.1, $\int_K \{1 - \varphi(\xi)\}^{-2}\, d\xi \le \pi^d d^2 \{1 - \cos(1)\}^{-2}$, which is always finite. It remains to show that $\int_{[0,1]^d} \{1 - \varphi(\xi)\}^{-2}\, d\xi$ is finite if and only if $d \ge 5$. Using equation (2) of Section 3.1,
$$(2d)^2 \int_{[0,1]^d} \|\xi\|^{-4}\, d\xi \le \int_{[0,1]^d} \{1 - \varphi(\xi)\}^{-2}\, d\xi \le \Big(\frac{12d}{5}\Big)^2 \int_{[0,1]^d} \|\xi\|^{-4}\, d\xi.$$
We obtain the result from Lemma 3.1.2. $\square$
The next question that we address is, when do three or more independent simple random walks intersect infinitely many times? When d ≥ 5, the above theorem states that the answer is never, a.s. On the other hand, when d ≤ 2, Theorem 3.1.1 implies that the random walks in question are recurrent; Lemmas 2.1.1 and 2.3.1 together show that any number of
such random walks will intersect infinitely many times, a.s. Thus, the only dimensions of interest are $d = 3$ and $d = 4$. In the next two subsections we will study these in detail.

Exercise 3.2.1 Use Exercise 3.1.3 and Theorem 1.4.1 together to find an alternative proof of Theorem 3.2.1.

Exercise 3.2.2 Show that if $S^1$ and $S^2$ denote two independent simple walks in $\mathbb{Z}^d$ where $d \ge 5$, there exists a finite constant $C > 1$ such that for all $n \ge 1$,
$$C^{-1} n^{-(d-4)/2} \le \mathrm{P}(S^1_i = S^2_j \text{ for some } i, j \ge n) \le C n^{-(d-4)/2}.$$
(Hint: Use Exercise 3.1.3 and Theorem 2.2.1.)
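A quick simulation illustrates the dichotomy of Theorem 3.2.1. The Python sketch below (an illustration, not from the book; single runs are only suggestive) counts the pairs $(j,k)$ with $S^1_j = S^2_k$ for two independent simple walks: in low dimensions the count keeps growing with the time horizon, while for $d \ge 5$ it stabilizes.

```python
import random

def walk(n, d, rng):
    """Simple random walk on Z^d; returns the sites S_0, ..., S_n."""
    pos = [0] * d
    path = [tuple(pos)]
    for _ in range(n):
        axis = rng.randrange(d)
        pos[axis] += rng.choice((-1, 1))
        path.append(tuple(pos))
    return path

def intersection_pairs(n, d, seed=0):
    """Number of pairs (j, k), 0 <= j, k <= n, with S^1_j = S^2_k."""
    rng = random.Random(seed)
    visits1, visits2 = {}, {}
    for x in walk(n, d, rng):
        visits1[x] = visits1.get(x, 0) + 1
    for x in walk(n, d, rng):
        visits2[x] = visits2.get(x, 0) + 1
    return sum(c * visits2.get(x, 0) for x, c in visits1.items())

for d in (1, 2, 4, 5):
    print(d, intersection_pairs(4000, d))
```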
3.3 Three Simple Walks

By Theorem 3.2.1 of the previous subsection, two independent $\mathbb{Z}^4$-valued simple random walks will intersect infinitely many times. We now address the problem for three such walks.

Theorem 3.3.1 Suppose $S^1$, $S^2$, and $S^3$ are independent $\mathbb{Z}^d$-valued simple random walks. The trajectories of $S^1$, $S^2$, and $S^3$ will a.s. intersect infinitely often if and only if $d \le 3$.

Our proof relies on two technical lemmas regarding the function $E^d_\beta : \mathbb{R}^d \to \bar{\mathbb{R}}_+$ that is defined as follows:
$$E^d_\beta(y) = \int_{\xi \in \mathbb{R}^d :\, \|\xi\| \le 1} \|y - \xi\|^{-\beta}\, \|\xi\|^{-\beta}\, d\xi, \qquad y \in \mathbb{R}^d. \tag{1}$$
Lemma 3.3.1 Suppose $\beta < d < 2\beta$. Then, there are two finite and positive constants $C_1$ and $C_2$ that depend only on $\beta$ and $d$ such that for all $y \in \mathbb{R}^d$ with $\|y\| \le 1$,
$$C_1 \|y\|^{d-2\beta} \le E^d_\beta(y) \le C_2 \|y\|^{d-2\beta}.$$

Proof Fix some $y \in \mathbb{R}^d$ with $\|y\| \le 1$. Evidently,
$$E^d_\beta(y) \ge \int_{\|\xi\| \le \|y\|} \|\xi - y\|^{-\beta} \cdot \|\xi\|^{-\beta}\, d\xi.$$
Over the region of integration, $\|\xi - y\| \le \|\xi\| + \|y\| \le 2\|y\|$. Hence,
$$E^d_\beta(y) \ge 2^{-\beta} \|y\|^{-\beta} \int_{\|\xi\| \le \|y\|} \|\xi\|^{-\beta}\, d\xi = 2^{-\beta} \Big(\int_{\|\zeta\| \le 1} \|\zeta\|^{-\beta}\, d\zeta\Big) \|y\|^{d-2\beta},$$
which gives the desired lower bound with $C_1 = 2^{-\beta} \int_{\|\zeta\| \le 1} \|\zeta\|^{-\beta}\, d\zeta$. (By Lemma 3.1.2, $C_1$ is finite and positive.)

Next, we proceed with the upper bound. Write $E^d_\beta(y) = T_1 + T_2 + T_3$, where
$$T_1 = \int_{\substack{\|\xi - y\| \le \frac{1}{2}\|y\| \\ \|\xi\| \le 1}} \|\xi - y\|^{-\beta} \|\xi\|^{-\beta}\, d\xi, \qquad
T_2 = \int_{\substack{\|\xi - y\| > \frac{1}{2}\|y\| \\ \|\xi\| \le 2\|y\| \wedge 1}} \|\xi - y\|^{-\beta} \|\xi\|^{-\beta}\, d\xi, \qquad
T_3 = \int_{\substack{\|\xi - y\| > \frac{1}{2}\|y\| \\ 2\|y\| \le \|\xi\| \le 1}} \|\xi - y\|^{-\beta} \|\xi\|^{-\beta}\, d\xi.$$
We estimate the above in order. When $\|\xi - y\| \le \frac{1}{2}\|y\|$, by the triangle inequality, $\|\xi\| \ge \frac{1}{2}\|y\|$. Thus,
$$T_1 \le 2^\beta \|y\|^{-\beta} \int_{\|\zeta\| \le \frac{1}{2}\|y\|} \|\zeta\|^{-\beta}\, d\zeta = 2^{2\beta - d} \Big(\int_{\|\zeta\| \le 1} \|\zeta\|^{-\beta}\, d\zeta\Big) \|y\|^{d-2\beta}.$$
By Supplementary Exercise 7,
$$T_1 \le \frac{2^{2\beta - d}\, d\omega_d}{d - \beta}\, \|y\|^{d-2\beta}, \tag{2}$$
where $\omega_d$ denotes the $d$-dimensional Lebesgue measure of the ball $\{z \in \mathbb{R}^d : \|z\| \le 1\}$. Similarly,
$$T_2 \le 2^\beta \|y\|^{-\beta} \int_{\|\zeta\| \le 2\|y\|} \|\zeta\|^{-\beta}\, d\zeta = 2^d \Big(\int_{\|\zeta\| \le 1} \|\zeta\|^{-\beta}\, d\zeta\Big) \|y\|^{d-2\beta}.$$
Another application of Supplementary Exercise 7 leads us to the bound
$$T_2 \le \frac{2^d\, d\omega_d}{d - \beta}\, \|y\|^{d-2\beta}. \tag{3}$$
It remains to estimate $T_3$. First, we note that if $\|\xi\| \ge 2\|y\|$, then certainly $\|\xi - y\| \le \|\xi\| + \|y\| \le \frac{3}{2}\|\xi\|$. Thus,
$$T_3 \le \Big(\frac{3}{2}\Big)^\beta \int_{\|\xi - y\| \ge \frac{1}{2}\|y\|} \|\xi - y\|^{-2\beta}\, d\xi \le \Big(\frac{3}{2}\Big)^\beta \int_{\|\zeta\| > \frac{1}{2}\|y\|} \|\zeta\|^{-2\beta}\, d\zeta \le 3^\beta 2^{\beta - d}\, \|y\|^{d-2\beta} \int_{\|\zeta\| > 1} \|\zeta\|^{-2\beta}\, d\zeta.$$
To finish, we are left to show that $\int_{\|\zeta\| > 1} \|\zeta\|^{-2\beta}\, d\zeta < \infty$. This is easy to do: Since $d < 2\beta$, by Supplementary Exercise 7,
$$\int_{\|\zeta\| > 1} \|\zeta\|^{-2\beta}\, d\zeta = d\omega_d \int_1^\infty r^{d-1-2\beta}\, dr = \frac{d\omega_d}{2\beta - d}.$$
To summarize, we have shown that $T_3 \le 3^\beta 2^{\beta - d}\, d\omega_d (2\beta - d)^{-1} \|y\|^{d-2\beta}$. Combining this with (2) and (3), we obtain $E^d_\beta(y) \le C_2 \|y\|^{d-2\beta}$ with
$$C_2 = d\omega_d \left( \frac{2^d + 2^{2\beta - d}}{d - \beta} + \frac{3^\beta\, 2^{\beta - d}}{2\beta - d} \right).$$
Since $C_2$ is clearly finite and positive, this concludes our proof. $\square$
Going over the above argument with some care, we can also decide what happens when $d = 2\beta$.

Lemma 3.3.2 There exists a finite and positive constant $C$ that depends only on $d$ such that for all $y \in \mathbb{R}^d$ with $\|y\| \le 1$, $E^d_{d/2}(y) \le C \ln(4/\|y\|)$.

Proof In the notation of our proof of Lemma 3.3.1, write $E^d_{d/2}(y) = T_1 + T_2 + T_3$. Since they still hold for $d = 2\beta$, equations (2) and (3) together show that $T_1 + T_2 \le C_1 \le C_1 \ln(4/\|y\|)$, with $C_1 = (2^{d+1} + 2)\omega_d$. Still proceeding with our proof of Lemma 3.3.1 and using $\beta = \frac{d}{2}$, we obtain
$$T_3 \le \Big(\frac{3}{2}\Big)^{d/2} \int_{2 \ge \|\xi\| \ge \frac{1}{2}\|y\|} \|\xi\|^{-d}\, d\xi = d\omega_d \Big(\frac{3}{2}\Big)^{d/2} \int_{\frac{1}{2}\|y\|}^{2} r^{-1}\, dr = d\omega_d \Big(\frac{3}{2}\Big)^{d/2} \ln\frac{4}{\|y\|}.$$
We have used Supplementary Exercise 7 once more and obtained the desired result with $C = C_1 + (\frac{3}{2})^{d/2} d\omega_d$. $\square$

Exercise 3.3.1 Prove that Lemma 3.3.2 is sharp, up to a constant. That is, prove that $\liminf_{\|y\| \to 0^+} \{\ln(1/\|y\|)\}^{-1} E^d_{d/2}(y) > 0$.
We are ready for the following.

Proof of Theorem 3.3.1 When $d \le 2$, the simple random walk is recurrent (Theorem 3.1.1). Thus, Lemmas 2.1.1 and 2.3.1 tell us that the trajectories of $S^1$, $S^2$, and $S^3$ intersect infinitely many times. (Why?) On the other hand, if $d \ge 5$, then by Theorem 3.2.1, the trajectories of $S^1$ and $S^2$ intersect only finitely many times. In particular, so do the trajectories of $S^1$, $S^2$, and $S^3$. Thus, it remains to focus our attention on $d \in \{3, 4\}$.

Let $\mathcal{S} = (2\pi)^{2d} \sum_{i,j,k=1}^\infty \mathrm{P}(S^1_i = S^2_j = S^3_k)$. Thanks to Theorem 2.3.1, we need to show that $\mathcal{S} < \infty$ when $d = 4$, while $\mathcal{S} = \infty$ when $d = 3$. In order to do this, we begin with the identity
$$\mathcal{S} = \lim_{\lambda \uparrow 1} \int_{[-\pi,\pi]^{2d}} \frac{\varphi(\xi_1 + \xi_2)}{1 - \lambda\varphi(\xi_1 + \xi_2)} \cdot \frac{\varphi(\xi_1)}{1 - \lambda\varphi(\xi_1)} \cdot \frac{\varphi(\xi_2)}{1 - \lambda\varphi(\xi_2)}\, d\xi_1\, d\xi_2.$$
(We have implicitly used the fact that $\varphi$ is real-valued. Why?) While for every $\xi_1, \xi_2 \in [-1,1]^d$, $\varphi(\xi_1), \varphi(\xi_2) \ge 0$, it is not always true that $\varphi(\xi_1 + \xi_2) \ge 0$. To regain positivity, we split the above integral into two parts: Let $I_1$ denote the above integral taken over $[-\frac{1}{2}, \frac{1}{2}]^{2d}$, and $I_2$ the integral over $K = [-\pi,\pi]^{2d} \setminus [-\frac{1}{2}, \frac{1}{2}]^{2d}$. We estimate $I_2$ first. Since cosines are bounded above by $1$,
$$|I_2| \le \lim_{\lambda \uparrow 1} \int_K \frac{1}{1 - \lambda\varphi(\xi_1 + \xi_2)} \cdot \frac{1}{1 - \lambda\varphi(\xi_1)} \cdot \frac{1}{1 - \lambda\varphi(\xi_2)}\, d\xi_1\, d\xi_2.$$
Note that whenever $(\xi_1, \xi_2) \in K$, then for all $1 \le \ell \le d$:
(a) $\cos(\xi_1^{(\ell)} + \xi_2^{(\ell)}) \le \cos(\frac{1}{2}) < 1$;
(b) $\cos(\xi_1^{(\ell)}) \le \cos(\frac{1}{2}) < 1$; and
(c) $\cos(\xi_2^{(\ell)}) \le \cos(\frac{1}{2}) < 1$.
Hence,
$$|I_2| \le (2\pi)^{2d} \big(1 - \cos(\tfrac{1}{2})\big)^{-3} < \infty.$$
Thus, we need to show that $|I_1|$ is finite when $d = 4$ and is infinite when $d = 3$. This is where positivity comes into play: If $\xi_1, \xi_2 \in [-\frac{1}{2}, \frac{1}{2}]^d$, then $\varphi(\xi_1)$, $\varphi(\xi_2)$, and $\varphi(\xi_1 + \xi_2)$ are all nonnegative. By the monotone convergence theorem,
$$I_1 = \int_{[-\frac{1}{2},\frac{1}{2}]^{2d}} \frac{\varphi(\xi_1 + \xi_2)}{1 - \varphi(\xi_1 + \xi_2)} \cdot \frac{\varphi(\xi_1)}{1 - \varphi(\xi_1)} \cdot \frac{\varphi(\xi_2)}{1 - \varphi(\xi_2)}\, d\xi_1\, d\xi_2.$$
Moreover, if $\xi \in [-\frac{1}{2}, \frac{1}{2}]$, then $0 < \cos(\frac{1}{2}) \le \cos(\xi) \le 1$. We have arrived at the bound $\{\cos(\frac{1}{2})\}^3 I_1' \le I_1 \le I_1'$, where
$$I_1' = \int_{[-\frac{1}{2},\frac{1}{2}]^{2d}} \big(1 - \varphi(\xi_1 + \xi_2)\big)^{-1} \big(1 - \varphi(\xi_1)\big)^{-1} \big(1 - \varphi(\xi_2)\big)^{-1}\, d\xi_1\, d\xi_2.$$
Since $I_1' \ge 0$, we want to show that $I_1'$ is finite if $d = 4$ but is infinite if $d = 3$. By equations (1) and (2) of Section 3.1,
$$(2d)^3 I_1'' \le I_1' \le \Big(\frac{12d}{5}\Big)^3 I_1'', \quad \text{where} \quad I_1'' = \int_{[-\frac{1}{2},\frac{1}{2}]^{2d}} \|\xi_1 + \xi_2\|^{-2}\, \|\xi_1\|^{-2}\, \|\xi_2\|^{-2}\, d\xi_1\, d\xi_2.$$
Our goal now is to show that $I_1''$ is finite if $d = 4$ and is infinite if $d = 3$. By Fubini's theorem and symmetry,
$$I_1'' = \int_{[-\frac{1}{2},\frac{1}{2}]^{2d}} \|\xi_1 - \xi_2\|^{-2}\, \|\xi_1\|^{-2}\, \|\xi_2\|^{-2}\, d\xi_1\, d\xi_2 \le \int_{[-\frac{1}{2},\frac{1}{2}]^d} E^d_2(\xi_1)\, \|\xi_1\|^{-2}\, d\xi_1.$$
If $d = 4$, by Lemma 3.3.2 there exists a finite and positive constant $C_1$ such that
$$I_1'' \le C_1 \int_{[-1,1]^4} \ln\Big(\frac{4}{\|\xi\|}\Big)\, \|\xi\|^{-2}\, d\xi.$$
Since $\ln(4/\|\xi\|) \le 4/\|\xi\|$ for all $\xi \in \mathbb{R}^4$ with $\|\xi\| \le 1$,
$$I_1'' \le 4 C_1 \int_{[-1,1]^4} \|\xi\|^{-3}\, d\xi,$$
which is finite, thanks to Lemma 3.1.2. If $d = 3$, by Lemma 3.3.1 there exists a finite positive constant $C_2$ such that
$$I_1'' \ge C_2 \int_{[-\frac{1}{2},\frac{1}{2}]^3} \|\xi\|^{-3}\, d\xi.$$
Since $d = 3$, Lemma 3.1.2 shows us that $I_1'' = \infty$. This concludes our proof. $\square$
3.4 Several Simple Walks

Throughout, let us fix an integer $N \ge 4$ and consider $N$ independent simple walks, $S^1, \ldots, S^N$, all taking values in $\mathbb{Z}^d$. If $d \le 2$, such random walks are recurrent (Theorem 3.1.1). By Lemma 2.1.1, when $d \le 2$, the trajectories of $S^1, \ldots, S^N$ intersect infinitely often, a.s. Next, suppose $d \ge 4$. In this case, the trajectories of $S^1$, $S^2$, and $S^3$ intersect finitely often, a.s. (Theorem 3.3.1). Therefore, the same holds for $S^1, \ldots, S^N$. The only case that remains to be analyzed is $d = 3$.

Theorem 3.4.1 The trajectories of four or more independent simple walks in $\mathbb{Z}^3$ will almost surely intersect at most finitely many times.

Our proof is an imitation of those in the previous sections but requires one more technical lemma.

Lemma 3.4.1 For all $y \in \mathbb{R}^3$ define
$$F(y) = \int_{\xi \in \mathbb{R}^3 :\, \|\xi\| \le 1} \|\xi - y\|^{-1}\, \|\xi\|^{-2}\, d\xi.$$
Then, for all $y \in \mathbb{R}^3$ with $\|y\| \le 1$, $F(y) \le 20\pi \ln(4/\|y\|)$.
Proof We follow closely the arguments used in the given proofs of Lemmas 3.3.1 and 3.3.2. Write $F(y) = T_1 + T_2 + T_3$, where
$$T_1 = \int_{\substack{\|\xi - y\| \le \frac{1}{2}\|y\| \\ \|\xi\| \le 1}} \|\xi - y\|^{-1} \|\xi\|^{-2}\, d\xi, \qquad
T_2 = \int_{\substack{\|\xi - y\| > \frac{1}{2}\|y\| \\ \|\xi\| \le 2\|y\| \wedge 1}} \|\xi - y\|^{-1} \|\xi\|^{-2}\, d\xi, \qquad
T_3 = \int_{\substack{\|\xi - y\| > \frac{1}{2}\|y\| \\ 2\|y\| \le \|\xi\| \le 1}} \|\xi - y\|^{-1} \|\xi\|^{-2}\, d\xi.$$
We estimate each as in the demonstrations of Lemmas 3.3.1 and 3.3.2. To estimate $T_1$, use $\|\xi\| \ge \|y\|/2$ to obtain
$$T_1 \le 4\|y\|^{-2} \int_{\|\xi - y\| \le \frac{1}{2}\|y\|} \|\xi - y\|^{-1}\, d\xi.$$
By Supplementary Exercise 7, $T_1 \le 2\pi \le 2\pi \ln(4/\|y\|)$. (We have used the elementary fact that $\omega_3 = \frac{4\pi}{3}$.) Likewise, since $\|\xi - y\|^{-1} \le 2\|y\|^{-1}$ on the region of integration of $T_2$,
$$T_2 \le 2\|y\|^{-1} \int_{\|\xi\| \le 2\|y\|} \|\xi\|^{-2}\, d\xi = 4 \int_{\|\xi\| \le 1} \|\xi\|^{-2}\, d\xi = 8\pi.$$
Since $8\pi \le 8\pi \ln(4/\|y\|)$, it remains to show that $T_3 \le 9\pi \ln(4/\|y\|)$. Use $\|\xi - y\| \le \frac{3}{2}\|\xi\|$ to obtain
$$T_3 \le \frac{9}{4} \int_{1 \ge \|\xi - y\| \ge \frac{1}{2}\|y\|} \|\xi - y\|^{-3}\, d\xi \le \frac{9}{4} \int_{2 \ge \|\zeta\| \ge \frac{1}{2}\|y\|} \|\zeta\|^{-3}\, d\zeta.$$
By Exercise 3.4.1 below, this equals $9\pi \ln(4/\|y\|)$, as desired. $\square$

Exercise 3.4.1 For any $\varepsilon \in\ ]0,2[$, compute $\int_{2 \ge \|\zeta\| \ge \varepsilon} \|\zeta\|^{-3}\, d\zeta$.
Exercise 3.4.2 Show that Lemma 3.4.1 is sharp, up to a constant. That is, $\liminf_{\|y\| \to 0^+} F(y)/\ln(1/\|y\|) > 0$.

We are ready to prove the theorem.

Proof of Theorem 3.4.1 It suffices to consider only $N = 4$ and to show that
$$\sum_{i,j,k,\ell=0}^\infty \mathrm{P}(S^1_i = S^2_j = S^3_k = S^4_\ell) < \infty,$$
where $S^1_0 = S^2_0 = S^3_0 = S^4_0 = 0$. However, symmetry and Proposition 2.3.1 together show that this is the same as showing that
$$\lim_{\lambda \uparrow 1} \int_{[-\pi,\pi]^9} \frac{\varphi(\xi_1 + \xi_2 + \xi_3)}{1 - \lambda\varphi(\xi_1 + \xi_2 + \xi_3)} \prod_{j=1}^3 \frac{\varphi(\xi_j)}{1 - \lambda\varphi(\xi_j)}\, d\xi_1\, d\xi_2\, d\xi_3 < \infty.$$
We split the above integral into two parts. Let $I_1$ be the integral over $[-\frac{1}{3}, \frac{1}{3}]^9$ and $I_2$ the integral over $K = [-\pi,\pi]^9 \setminus [-\frac{1}{3}, \frac{1}{3}]^9$. The same argument used to prove Theorem 3.3.1 goes through unhindered to show that
$$|I_2| \le (2\pi)^9 \big(1 - \cos(\tfrac{1}{3})\big)^{-4} < \infty.$$
It suffices to show that $I_1$ is finite. When $\xi_i \in [-\frac{1}{3}, \frac{1}{3}]^3$ ($i = 1, 2, 3$), $\varphi(\xi_i)$ is positive ($i = 1, 2, 3$). Moreover, so is $\varphi(\xi_1 + \xi_2 + \xi_3)$. By the monotone convergence theorem,
$$I_1 = \int_{[-\frac{1}{3},\frac{1}{3}]^9} \frac{\varphi(\xi_1 + \xi_2 + \xi_3)}{1 - \varphi(\xi_1 + \xi_2 + \xi_3)} \prod_{j=1}^3 \frac{\varphi(\xi_j)}{1 - \varphi(\xi_j)}\, d\xi_1\, d\xi_2\, d\xi_3
\le \int_{\|\xi_1\|, \|\xi_2\|, \|\xi_3\| \le 1} \big(1 - \varphi(\xi_1 + \xi_2 + \xi_3)\big)^{-1} \prod_{\ell=1}^3 \big(1 - \varphi(\xi_\ell)\big)^{-1}\, d\xi_1\, d\xi_2\, d\xi_3.$$
Employing equations (1) and (2) of Section 3.1, we deduce that $I_1 \le (\frac{36}{5})^4 J$, where
$$J = \int_{\|\xi_1\|, \|\xi_2\|, \|\xi_3\| \le 1} \|\xi_1 + \xi_2 + \xi_3\|^{-2}\, \|\xi_1\|^{-2}\, \|\xi_2\|^{-2}\, \|\xi_3\|^{-2}\, d\xi_1\, d\xi_2\, d\xi_3.$$
We propose to show that $J < \infty$. Using symmetry and the definition of $E^d_\beta$ (equation (1) of Section 3.3),
$$J \le \int_{\|\xi_1\|, \|\xi_2\| \le 1} E^3_2(\xi_1 + \xi_2)\, \|\xi_1\|^{-2}\, \|\xi_2\|^{-2}\, d\xi_1\, d\xi_2.$$
Lemma 3.3.1 can be applied with $d = 3$ and $\beta = 2$ to show us the existence of a positive and finite constant $C$ such that $J \le C \int_{\|\xi\| \le 1} F(\xi)\, \|\xi\|^{-2}\, d\xi$. By Supplementary Exercise 7, and by Lemma 3.4.1 above,
$$J \le 20\pi C \int_{\|\xi\| \le 1} \ln\Big(\frac{4}{\|\xi\|}\Big)\, \|\xi\|^{-2}\, d\xi,$$
which is finite, by Supplementary Exercise 7. $\square$
4 Supplementary Exercises

1. Show that the inequalities of Theorem 1.5.1 can be sharpened to the following: $\mathrm{P}(S_k = 0 \text{ for some } k \ge n) = Q_n \{1 + Q_1\}^{-1}$, where $Q_n = \sum_{j=n}^\infty \mathrm{P}(S_j = 0)$.
2. Refine an aspect of Exercise 3.1.2 by showing that when $S$ denotes the simple walk on $\mathbb{Z}^d$, $\lim_{n \to \infty} (2n)^{d/2}\, \mathrm{P}(S_{2n} = 0) = 2 (\frac{d}{2\pi})^{d/2}$. This is a part of the local central limit theorem. You should compare this to the classical central limit theorem of A. de Moivre and P.-S. Laplace by looking at the density function of a mean-zero Gaussian random variable with the same variance as $S_{2n}$.

3. Let $S$ denote a transient random walk on $\mathbb{Z}^d$ with $S_0 = 0$ and define $T_x$ to be the first time $S$ hits $x$. That is, $T_x = \inf(k \ge 0 : S_k = x)$. In the notation of Section 1.4, show that $\mathrm{E}[R_{T_x}] = \sum_{k=0}^\infty \{\mathrm{P}(S_k = 0) - \mathrm{P}(S_k = -x)\}$. (Hint: By transience, $R_\infty < \infty$, a.s. Now we can write $R_\infty = \sum_{k=0}^{T_x - 1} \mathbf{1}_{\{S_k = 0\}} + \sum_{k=T_x}^\infty \mathbf{1}_{\{S_k = 0\}}$ and use the strong Markov property.)

4. Show that for any random walk $S$ on $\mathbb{Z}^d$ and for all integers $n, k \ge 1$, $\mathrm{E}[R_n^k] \le k! \{\mathrm{E}[R_n]\}^k$. In particular, obtain the large deviation bound
$$\mathrm{P}\Big(\frac{R_n}{\mathrm{E}[R_n]} \ge \lambda\Big) \le \frac{1}{1 - \delta}\, e^{-\delta\lambda}, \qquad \lambda > 0,$$
where $\delta$ is an arbitrary number strictly between $0$ and $1$.

5. (Mixing) Much of the theory for independent random variables goes through with fewer hypotheses than independence. We explore one such possibility in this exercise. A sequence of random variables $\xi_1, \xi_2, \ldots$ is said to be $\varphi$-mixing if
$$\sup_{i \ge 1}\ \sup_{\substack{E \in \mathcal{F}^{[i+n,\infty[} \\ F \in \mathcal{F}^{[1,i]}}} \big|\mathrm{P}(E \mid F) - \mathrm{P}(E)\big| \le \varphi(n),$$
where $\mathcal{F}^A$ is the $\sigma$-field generated by $\{\xi_i;\ i \in A\}$, and $\lim_{n \to \infty} \varphi(n) = 0$. Note that if the $\xi_i$'s are independent, then they are $\varphi$-mixing for any $\varphi$ that vanishes at infinity.
(i) Prove that the tail $\sigma$-field $\mathcal{T} = \cap_n \mathcal{F}^{[n,\infty[}$ is trivial.
(ii) Show that whenever $\sum_n \varphi(n) < +\infty$,
$$\mathrm{P}(\xi_n = 0 \text{ infinitely often}) = \begin{cases} 0, & \text{if } \sum_{n=1}^\infty \mathrm{P}(\xi_n = 0) < +\infty, \\ 1, & \text{if } \sum_{n=1}^\infty \mathrm{P}(\xi_n = 0) = +\infty. \end{cases}$$

6. Verify equation (1) of Section 2.3.

7. Suppose $U$ is chosen uniformly at random from $D_m = \{\xi \in \mathbb{R}^m : \|\xi\| \le 1\}$.
(i) Show that the density function of $\|U\|$ at $x \in [0,1]$ is $m x^{m-1}$.
(ii) Use the previous part to prove the following integration-by-parts formula: For all integrable functions $f : [0,1] \to [0,1]$,
$$\int_{D_m} f(\|u\|)\, du = m\omega_m \int_0^1 s^{m-1} f(s)\, ds,$$
where $\omega_m$ denotes Lebesgue's ($m$-dimensional) measure of $D_m$.
(iii) Show that
$$\omega_m = \begin{cases} \dfrac{\pi^{m/2}}{(m/2)!}, & \text{if } m \text{ is even}, \\[2mm] \dfrac{2^{(m+1)/2}\, \pi^{(m-1)/2}}{1 \cdot 3 \cdot 5 \cdots m}, & \text{if } m \text{ is odd}. \end{cases}$$
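Supplementary Exercise 7 is easy to sanity-check numerically. For $f(s) = s^2$, the integration-by-parts formula predicts the exact value $\mathrm{E}[\|U\|^2] = m\omega_m \int_0^1 s^{m+1}\, ds / \omega_m = m/(m+2)$, which equals $3/5$ in $\mathbb{R}^3$. The Python sketch below (an illustration, not from the book) estimates this expectation by rejection sampling from the enclosing cube.

```python
import random

def mean_sq_norm_in_ball(n_samples, seed=0):
    """Monte Carlo estimate of E[||U||^2] for U uniform in the unit ball
    of R^3, via rejection sampling from the cube [-1, 1]^3."""
    rng = random.Random(seed)
    total, accepted = 0.0, 0
    while accepted < n_samples:
        x, y, z = (rng.uniform(-1, 1) for _ in range(3))
        r2 = x * x + y * y + z * z
        if r2 <= 1.0:
            total += r2
            accepted += 1
    return total / n_samples

# Exercise 7(ii) with f(s) = s^2 predicts 3/(3+2) = 0.6 exactly, and
# Exercise 7(iii) gives omega_3 = 2^2 * pi / (1 * 3) = 4*pi/3.
est = mean_sq_norm_in_ball(100_000)
print(est)
```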
8. (Hard) Let $S$ denote the simple walk on $\mathbb{Z}^d$ and let $S_0 = 0$.
(i) When $d = 1$, use Supplementary Exercise 2 to deduce that with probability one,
$$\lim_{n \to \infty} \frac{1}{\ln n} \sum_{k=1}^n \frac{\mathbf{1}_{\{S_k = 0\}}}{k^{1/2}} = \frac{1}{\sqrt{2\pi}}.$$
(ii) Prove that when $d = 2$,
$$\lim_{n \to \infty} \frac{1}{\ln \ln n} \sum_{k=2}^n \frac{\mathbf{1}_{\{S_k = 0\}}}{\ln k} = \frac{1}{4\pi}.$$
This is due to Erdős and Taylor (1960a, 1960b). (Hint: For part (i), start by proving that the expected-value version of the limit theorem holds. Then, prove that the variance of the given sum is bounded by $C \ln n$, for some finite constant $C > 0$. Use the Borel–Cantelli lemma to obtain the a.s. convergence along the subsequence $n_k = \exp(k^2)$. To conclude part (i), estimate the sum for $n_k \le n \le n_{k+1}$ by the end values of $n$. Part (ii) is proved similarly, but the variance estimate is now given by a bound of $C \ln \ln n$, and the subsequence should be changed to $n_k = \exp(e^{k^2})$.)
P(|Sj | ≤ 2ε) ≤ 16d
n
P(|Sj | ≤ ε).
j=0
(iv) Show that the following are all equivalent: (a) 0 is recurrent;
(b) for some ε > 0, ∞ j=1 P(|Sj | ≤ ε) = +∞; ∞ (c) for all ε > 0, j=1 P(|Sj | ≤ ε) = +∞. (Hint: For part (iii), cover [−2ε, 2ε]d with 16d cubes of side Markov property.)
1 ε 2
and apply the
102
3. Random Walks
d d 10. Given a transient random walk S on Z with S0 = 0, define for each a ∈ Z , 1 l ]. u(a) = E[ ∞ k=0 (Sk +a=0)
(i) Check that u(0) = E[R∞ ] and show that u(a) is finite for all a ∈ Zd . (ii) Show that m → u(Sm ) is a supermartingale. (Hint: Apply Lemma 1.1.1 to f (x) = 1l{0} (x).)
11. (Hard) Let S denote the simple walk on Zd . (i) In the case d ≥ 3, prove that there are finite positive constants C1 < C2 such that for all n ≥ 1, 1
1
C1 n− 2 (d−2) ≤ P(Si = 0 for some i ≥ n) ≤ C2 n− 2 (d−2) . (ii) Let d = 2 and suppose c1 , c2 , . . . is a nondecreasing sequence such that limn→∞ cn = +∞ and lim supn→∞ cn /n < ∞. Show that when d = 2, there exist finite positive constants C1 < C2 such that for all n ≥ 1, cn cn C1 ≤ P(Si = 0 for some n ≤ i ≤ n + cn ) ≤ C2 . n ln cn n ln cn n+cn (Hint: For the lower bound, consider the first two moments of j=n 1l(Sj =0) . n 1l(Sj =0) , For the upper bound, estimate the conditional expectation of n+2c j=n given F m , where m is between n and n + cn .) In different forms and to various extents, this can be found in Benjamini et al. (1995), Erd˝ os and Taylor (1960a, 1960b), Lawler (1991), and R´ev´esz (1990). 12. Let τ1 , τ2 , . . . denote the first, second, . . . hitting times of 0 by a Zd -valued random walk S. The goal of this exercise is an exact computation of the distribution of τ1 . (i) Show that for all λ > 0 and for all integers n ≥ 1, E[e−λτn ] = (E[e−λτ1 ])n . −λk 1l(Sk =0) . Show (ii) Let S0 = τ0= 0 and for all λ > 0, define Vλ = ∞ k=0 e ∞ −λτn that Vλ = n=0 e and conclude the following identity for the Laplace −λk P(Sk = 0)}−1 . transform of τ1 : E[e−λτ1 ] = 1 − { ∞ k=0 e (iii) Show that when S is the simple walk on Zd , 1
lim λ 2
λ→0+
∞
e−λk P(Sk = 0) =
√
2
k=0
when $d = 1$, and when $d = 2$,
$$\lim_{\lambda \to 0^+} \frac{1}{\ln(1/\lambda)} \sum_{k=0}^\infty e^{-\lambda k}\, \mathrm{P}(S_k = 0) = \frac{1}{2\pi}.$$
(Hint: Consider the distribution function $F(k) = \sum_{j \le k} \mathrm{P}(S_j = 0)$. Apply the Tauberian theorem (Theorem 2.1.1, Appendix B), together with Supplementary Exercise 2.)
Such results are a part of the folklore of random walks; for instance, read Chung and Hunt (1949) with care. In the above forms, they can be found in Khoshnevisan (1994), where you can also find further applications to measure the zero set of random walks.
13. (Continued from Supplementary Exercise 12)
(i) Let $S$ denote the simple walk on $\mathbb{Z}^d$. In the notation of Supplementary Exercise 12, show that when $d = 1$, $\tau_n/n^2$ converges in distribution to a nonnegative random variable $\tau_\infty$ whose Laplace transform is $\mathrm{E}[e^{-\zeta\tau_\infty}] = \exp(-\sqrt{2\zeta})$. (Hint: Use the convergence theorem for Laplace transforms (cf. Theorem 1.2.1, Appendix B). The random variable $\tau_\infty$ is the so-called stable random variable of index $\frac{1}{2}$ and will reappear later in Section 3.2, Chapter 10.)
(ii) Conclude that when $d = 1$, $R_n/\sqrt{n}$ converges in distribution to the absolute value of a standard Gaussian random variable. (Hint: Since $\tau$ is the inverse function to $R$, roughly speaking, $\mathrm{P}(R_n \ge \lambda\sqrt{n}) = \mathrm{P}(\tau_{\lambda\sqrt{n}} \le n)$. You need to make this work by a series of inequalities.)

14. (Hard) Suppose $S^1$ and $S^2$ are two independent simple walks on $\mathbb{Z}^4$. Consider a nondecreasing sequence $c_1, c_2, \ldots$ such that $\lim_{n \to \infty} c_n = +\infty$ and $\limsup_{n \to \infty} c_n/n < \infty$. Show the existence of two positive finite constants $C_1 < C_2$ such that for all $n \ge 1$,
$$C_1 \Big(\frac{c_n}{n \ln c_n}\Big)^2 \le \mathrm{P}(S^1_i = S^2_j \text{ for some } n \le i, j \le n + c_n) \le C_2 \Big(\frac{c_n}{n \ln c_n}\Big)^2.$$
(You should first study Supplementary Exercise 11.)
5 Notes on Chapter 3

Section 1 The references (Ornstein 1969; Spitzer 1964; Révész 1990; Revuz 1984) are excellent resources for the fine and general structure of one-parameter random walks, Markov chains, and their connections to ergodic theory and potential theory. The argument of Section 1.7 that reduces attention to the set of possible points is quite old but often goes unmentioned when $d > 1$, perhaps to avoid discussions relating to free abelian groups. Much of the material of this section, and, in fact, of this chapter, can be extended to random walks on locally compact abelian groups. A comprehensive account of the potential-theoretic aspects of this can be found in Port and Stone (1971a, 1971b). The basic message of the investigations of recurrence for random walks is that a point is recurrent for the walk if and only if the walk is expected to hit that point infinitely often. The number of times the random walk hits a given point is the so-called local time at that point. There are limit theorems associated with such local times; they can be viewed as refinements of the notion of recurrence, among other things; see Bass and Khoshnevisan (1993b, 1993c, 1995), Borodin (1986, 1988), Csáki and Révész (1983), Csörgő and Révész (1984, 1985, 1986), Kesten and Spitzer (1979), Jacod (1998), Khoshnevisan (1992, 1993), Knight (1981), Perkins (1982), and Révész (1981).
Section 2 In the probability literature, the study of the intersections of random walk trajectories goes back at least to Dvoretzky and Erdős (1951), as well as Erdős and Taylor (1960a, 1960b) and Dvoretzky et al. (1950, 1954, 1957, 1958). Related results, together with references to the physics literature, can be found in Madras and Slade (1993) and Lawler (1991). In this section we essentially showed that the intersections are recurrent if and only if the walks are expected to intersect infinitely many times, at least as long as all of the intervening walks are symmetric. At this time it is not known whether Theorem 2.3.1 holds without any symmetry, or sector-type, hypotheses. Further analysis of the number of intersections of random walks leads to a so-called intersection local time, which is the main subject of Le Gall et al. (1989), Le Gall and Rosen (1991), Lawler (1991), Rosen (1993), and Stoll (1987, 1989). Some very general results can be found in Bass and Khoshnevisan (1992a) and Dynkin (1988). Many of the quantitative results of this section are new.

Section 3 The results of this section are all classical and can be found in the pre-1960s references cited under Section 2 above. For further refinements, see Lawler (1991). Many of the presented proofs in this section are new. Further related works, but in a genuinely multiparameter context, can be found in Etemadi (1977). A variant of Exercise 3.2.1 can be found in Lawler (1991, Theorem 3.3.2).

Section 4 A variant of Supplementary Exercise 14 can be found in Lawler (1991, Theorem 3.3.2). Supplementary Exercise 5 seems to be new. However, much is known about sums of mixing random variables. A good starting place for this is Billingsley (1995).
4 Multiparameter Walks
The discussions of Chapter 3 revolved around multiparameter processes that are formed by considering systems of independent one-parameter random walks. In this chapter we consider properties of genuinely multiparameter random walks. For an example of such a process, suppose each "site" $t \in \mathbb{N}^N$ corresponds to an independent particle that is negatively charged with probability $p$ and positively charged with probability $1 - p$. Let $X_t = 1$ if the particle at site $t \in \mathbb{N}^N$ is negatively charged; otherwise, set $X_t = 0$. Then, the total number of negatively charged particles in the rectangle $[0, t]$ is precisely $\sum_{s \preccurlyeq t} X_s$. When the $X$'s are general i.i.d. random variables, this defines a general $N$-parameter random walk. To summarize the main results of this chapter, let us first suppose that the increments of the multiparameter random walk have the same distribution as some random variable $\xi$. Then, a rough summary of the main results of this chapter is as follows:
- The strong law of large numbers holds iff $E\big[\,|\xi|\,(\ln_+|\xi|)^{N-1}\big] < \infty$;
- The law of the iterated logarithm holds iff $E\big[\,|\xi|^2 (\ln_+|\xi|)^{N-1}\big] < \infty$,
with the condition on the law of the iterated logarithm being nearly optimal; cf. Exercise 2.2.1 and Supplementary Exercise 2, as well as the Notes at the end of the chapter. Both of these results are natural extensions of 1-parameter results, with which the reader is expected to be familiar. We prove these results by establishing connections to martingales, maximal inequalities, etc.
1 The Strong Law of Large Numbers

In this section we discuss the strong law of large numbers for mean-zero, finite-variance multiparameter random walks. We first recall the requisite 1-parameter background. If $X_1, X_2, \ldots$ are i.i.d. random variables, the random walk $S = (S_n;\, n \ge 1)$ is defined by $S_n = X_1 + \cdots + X_n$ ($n \ge 1$). Supposing that the $X$'s have mean zero, the weak law of large numbers states that $\frac{1}{n} S_n$ converges to 0 in probability, as $n \to \infty$. A much more interesting fact, the strong law of large numbers, states that this convergence holds almost surely. In like manner, in the multiparameter setting we will define the increments $X_s$ ($s \in \mathbb{N}^N$), together with a walk $S_t = \sum_{s \preceq t} X_s$ ($t \in \mathbb{N}^N$). Note that for any fixed $t \in \mathbb{N}^N$, $S_t$ is a sum of $\langle t\rangle$ i.i.d. random variables, where

$$\langle t\rangle = \prod_{j=1}^{N} t^{(j)}, \qquad t \in \mathbb{N}^N. \tag{1}$$

Thus, as soon as the $X$'s have mean zero, the weak law of large numbers yields

$$\lim_{t \to \infty} \frac{S_t}{\langle t\rangle} = 0, \qquad \text{in probability.}$$
A more interesting mode of convergence is convergence with probability one, where the analogous law of large numbers is much more delicate. For instance, we have already seen, in the preamble to this chapter, that the necessary and sufficient condition for this strong law is the integrability of $X_1 (\ln_+|X_1|)^{N-1}$, and not merely that of $X_1$. Informally, this difference between the 1-parameter and multiparameter theories arises because in the latter case there are many ways in which the time parameter $t$ can go to infinity.
1.1 Definitions

Fix an integer $N \ge 1$ and consider independent, identically distributed $\mathbb{R}^d$-valued random variables indexed by $\mathbb{N}^N$: $X = (X_t;\, t \in \mathbb{N}^N)$. The corresponding multiparameter random walk $S = (S_t;\, t \in \mathbb{N}^N)$ is defined by

$$S_t = \sum_{s \preceq t} X_s, \qquad t \in \mathbb{N}^N.$$
In analogy to the one-parameter theory, the $X$'s are referred to as the increments of the underlying random walk $S$. Corresponding to the multiparameter walk $S$, we define the $N$-parameter filtration $F = (F_t;\, t \in \mathbb{N}^N)$ to be the history of $S$. That is, for all $t \in \mathbb{N}^N$, $F_t$ designates the $\sigma$-field generated by $(X_r;\, r \preceq t)$. If $F^1, \ldots, F^N$ stand for
the marginal filtrations of $F$, then, after recalling Section 2.6 of Chapter 1, it quickly follows that $F^1, \ldots, F^N$ are the orthohistories of $S$.
1.2 Commutation

An important property of the filtration $F$ is that it is commuting. This will be a consequence of the following.

Lemma 1.2.1 (Inclusion–Exclusion Formula) Given a sequence of real numbers $X = (X_t;\, t \in \mathbb{N}^N)$, let $S_t = \sum_{s \preceq t} X_s$, $t \in \mathbb{N}^N$. Then, for all $t \in \mathbb{N}^N$,

$$X_t = \sum_{r \in \{0,1\}^N} (-1)^{\sum_{\ell=1}^N r^{(\ell)}}\, S_{t-r},$$

where $S_t = 0$ if $t \notin \mathbb{N}^N$.

Proof We proceed by induction on $N$. First, suppose $N = 1$. Clearly, for all $t \in \mathbb{N}$,

$$X_t = S_t - S_{t-1} = \sum_{r \in \{0,1\}} (-1)^r S_{t-r}.$$

Thus, the result holds when $N = 1$. Next, we argue by induction on $N$ ($\ge 2$). Assuming that the result holds for $N - 1$, we prove it for $N$. Write any $t \in \mathbb{N}^N$ as $t = (t^{(1)}, t')$, where $t' \in \mathbb{N}^{N-1}$ is given by $t' = (t^{(2)}, \ldots, t^{(N)})$. We will write $X_{t^{(1)}, t'}$ for $X_t = X_{(t^{(1)}, t')}$; the same convention applies to $S$. Define $Y_r(k) = \sum_{j=1}^{k} X_{j,r}$ ($r \in \mathbb{N}^{N-1}$, $k \ge 1$). Clearly, $S_t = \sum_{s' \preceq t'} Y_{s'}(t^{(1)})$ ($t \in \mathbb{N}^N$). By the induction hypothesis,

$$Y_{t'}(t^{(1)}) = \sum_{r \in \{0,1\}^{N-1}} (-1)^{\sum_{\ell=1}^{N-1} r^{(\ell)}}\, S_{t^{(1)},\, t'-r}, \qquad t \in \mathbb{N}^N.$$

On the other hand, $X_t = Y_{t'}(t^{(1)}) - Y_{t'}(t^{(1)} - 1)$, where for any $t' \in \mathbb{N}^{N-1}$, $Y_{t'}(k) = 0$ whenever $k \notin \mathbb{N}$. In summary,

$$X_t = \sum_{r \in \{0,1\}^{N-1}} (-1)^{\sum_{\ell=1}^{N-1} r^{(\ell)}}\, S_{t^{(1)},\, t'-r} \;-\; \sum_{r \in \{0,1\}^{N-1}} (-1)^{\sum_{\ell=1}^{N-1} r^{(\ell)}}\, S_{t^{(1)}-1,\, t'-r},$$

which is a restatement of the lemma. $\square$
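Lemma 1.2.1 is easy to test numerically. The following small sketch is our own illustration for $N = 2$ (1-based indices, as in the text), checking the inclusion–exclusion identity on random data:

```python
import random

random.seed(1)
n = m = 8
X = [[random.uniform(-1, 1) for _ in range(m)] for _ in range(n)]

# S(i, j) = sum of X over {1..i} x {1..j}, with S = 0 off N^2 (1-based).
def S(i, j):
    if i < 1 or j < 1:
        return 0.0
    return sum(X[a][b] for a in range(i) for b in range(j))

# Inclusion-exclusion: X_t = sum over r in {0,1}^2 of (-1)^(r1+r2) * S(t - r).
for (i, j) in [(1, 1), (3, 5), (8, 8)]:
    recovered = sum((-1) ** (r1 + r2) * S(i - r1, j - r2)
                    for r1 in (0, 1) for r2 in (0, 1))
    assert abs(recovered - X[i - 1][j - 1]) < 1e-9
```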
A corollary of the above is the commutation property of $F$.

Proposition 1.2.1 If $F = (F_t;\, t \in \mathbb{N}^N)$ denotes the history of a multiparameter random walk $S = (S_t;\, t \in \mathbb{N}^N)$ with increments $X = (X_t;\, t \in \mathbb{N}^N)$, then: (i) for all $t \in \mathbb{N}^N$, $F_t$ is the $\sigma$-field generated by $(X_r;\, r \preceq t)$; and (ii) $F$ is a commuting filtration.

Proof Temporarily, define $\mathcal{X}_t$ to be the $\sigma$-field generated by $(X_r;\, r \preceq t)$. Since $S_t = \sum_{r \preceq t} X_r$, we have $F_t \subseteq \mathcal{X}_t$ for all $t \in \mathbb{N}^N$. The converse inclusion also holds, and follows from Lemma 1.2.1. This proves (i).

Next, we prove (ii). Applying Theorem 3.6.1 of Chapter 1, we set out to show that for all $s, t \in \mathbb{N}^N$, $F_t$ and $F_s$ are conditionally independent, given $F_{s \wedge t}$. (The mentioned theorem is about stochastic processes indexed by $\mathbb{N}_0^N$, while we have processes indexed by $\mathbb{N}^N$. To overcome this trivial difficulty, simply "shift back" time by $(1, \ldots, 1)$.) We need to show the following: For all $s, t \in \mathbb{N}^N$ and all bounded random variables $Y_s$ and $Y_t$ that are $F_s$- and $F_t$-measurable, respectively,

$$E[Y_t Y_s \mid F_{s \wedge t}] = E[Y_t \mid F_{s \wedge t}] \cdot E[Y_s \mid F_{s \wedge t}], \qquad \text{a.s.}$$

It suffices to do so for $s \ne t$. Henceforth, we fix two different $s, t \in \mathbb{N}^N$. By (i), for all $r \in \mathbb{N}^N$, $F_r$ is the $\sigma$-field generated by $(X_u;\, u \preceq r)$. By a standard argument of measure theory, we need only consider $Y_t$ and $Y_s$ of the form $Y_t = \prod_{n=0}^{k} f_n(X_{u_n})$ and $Y_s = \prod_{m=0}^{j} g_m(X_{v_m})$, where $f_n, g_m : \mathbb{R}^d \to [0,1]$ ($0 \le n \le k$, $0 \le m \le j$) are bounded measurable functions, $v_m \preceq s$, and $u_n \preceq t$ ($0 \le n \le k$, $0 \le m \le j$). By relabeling indices and possibly introducing more variables, we may assume the existence of $0 \le M \le j$ and $0 \le N' \le k$ such that for all $n \le N'$ and $m \le M$, $u_n \preceq s \wedge t$ and $v_m \preceq s \wedge t$, while for $n > N'$ (respectively $m > M$), $u_n \not\preceq s \wedge t$ (respectively $v_m \not\preceq s \wedge t$). Recall that $t \ne s$, $u_n \preceq t$, and $v_m \preceq s$ ($0 \le n \le k$, $0 \le m \le j$). Thus, $(u_n;\, N' < n \le k)$ and $(v_m;\, M < m \le j)$ are necessarily disjoint sequences. Using the independence of the $X$'s, we can conclude the following:

$$E[Y_s Y_t \mid F_{s \wedge t}] = \prod_{n=0}^{N'} f_n(X_{u_n}) \prod_{m=0}^{M} g_m(X_{v_m}) \times E\Big[\prod_{n=N'+1}^{k} f_n(X_{u_n})\Big]\, E\Big[\prod_{m=M+1}^{j} g_m(X_{v_m})\Big] = E[Y_t \mid F_{s \wedge t}]\, E[Y_s \mid F_{s \wedge t}].$$

This proves the result. $\square$
It follows from the above that integrable, mean-zero, $\mathbb{R}$-valued multiparameter random walks are martingales with respect to a commuting filtration. In particular, they enjoy the general properties of martingales with respect to commuting filtrations. Put more precisely:

Lemma 1.2.2 Suppose $S$ is an $\mathbb{R}$-valued, $N$-parameter random walk with increments $X = (X_t;\, t \in \mathbb{N}^N)$. If, in addition, $X_{(1,\ldots,1)}$ has mean 0, then $S$ is an $N$-parameter martingale with respect to a commuting filtration.
Exercise 1.2.1 Complete the proof of Lemma 1.2.2.
Exercise 1.2.2 In the context of Lemma 1.2.2, suppose further that the $X_t$'s have a finite variance $\sigma^2$. Show that $t \mapsto S_t^2 - \sigma^2 \langle t\rangle$ is a mean-zero martingale with respect to the same commuting filtration as $S$.

Exercise 1.2.3 Suppose $S$ is an $N$-parameter random walk with mean-zero increments, each of which has a finite moment generating function. Given $\alpha \in \mathbb{R}$, find a nonrandom function $\psi$ such that $t \mapsto \psi(t)\, e^{\alpha S_t}$ is an $N$-parameter martingale with mean 1.
1.3 A Reversed Orthomartingale

Let $S = (S_t;\, t \in \mathbb{N}^N)$ denote an $N$-parameter random walk that takes values in $\mathbb{R}^d$. Recalling equation (1) in the introductory portion of this section, let $X$ denote the increments process and define the "sample average"

$$A_t = \frac{S_t}{\langle t\rangle}, \qquad t \in \mathbb{N}^N. \tag{1}$$

The main goal of this subsection is to show that $A$ is a reversed $N$-parameter orthomartingale. To do so, we shall need $N$ (one-parameter) reversed filtrations. For $1 \le i \le N$, define $\mathcal{R}^i = (\mathcal{R}^i_k;\, k \ge 1)$ as follows: For all $k \ge 1$, let $\mathcal{R}^i_k$ denote the $\sigma$-field generated by $(S_t;\, t \in \mathbb{N}^N$ such that $t^{(i)} \ge k)$. To illustrate this better, consider $N = 2$. Then, $\mathcal{R}^1_k$ is the $\sigma$-field generated by $(S_{n,m};\, n \ge k,\, m \ge 1)$, and $\mathcal{R}^2_k$ is the $\sigma$-field generated by $(S_{n,m};\, n \ge 1,\, m \ge k)$. Continuing with the discussion for $N = 2$, the inclusion–exclusion formula (Lemma 1.2.1) implies that for any $k \ge 1$, $\mathcal{R}^1_k$ is the $\sigma$-field generated by the following two collections of random variables: (a) $(S_{k,m};\, m \ge 1)$; and (b) $(X_{n,m};\, n \ge k+1,\, m \ge 1)$. Likewise, $\mathcal{R}^2_k$ is the $\sigma$-field generated by (c) $(S_{n,k};\, n \ge 1)$; and (d) $(X_{n,m};\, n \ge 1,\, m \ge k+1)$.

Now let us suppose $d = 1$, so that $X$, $S$, and $A$ are all real-valued stochastic processes. By the stationarity and the independence of the increments, for any $k \ge 1$, $E[X_{1,1} \mid \mathcal{R}^1_k] = \cdots = E[X_{k,1} \mid \mathcal{R}^1_k]$. Since $X_{1,1} + \cdots + X_{k,1} = S_{k,1}$, this implies that

$$E[X_{1,1} \mid \mathcal{R}^1_k] = \frac{S_{k,1}}{k} = A_{k,1}.$$
More generally, this argument shows that for all $k, n \ge 1$,

$$A_{k,n} = \frac{1}{n}\, E\big[X_{1,1} + \cdots + X_{1,n} \mid \mathcal{R}^1_k\big].$$

Similarly,

$$A_{k,n} = \frac{1}{k}\, E\big[X_{1,1} + \cdots + X_{k,1} \mid \mathcal{R}^2_n\big].$$

Hence, when $N = 2$, $A$ is a 2-parameter reversed orthomartingale with respect to the reversed filtrations $\mathcal{R}^1$ and $\mathcal{R}^2$. The same proof works in any number of temporal dimensions ($N \ge 1$). In fact, we have the following result.
Lemma 1.3.1 Suppose $S = (S_t;\, t \in \mathbb{N}^N)$ is a multiparameter walk with $\mathbb{R}$-valued increments $X = (X_t;\, t \in \mathbb{N}^N)$. Let $A = (A_t;\, t \in \mathbb{N}^N)$ be the sample average process given by equation (1). If $X_{(1,\ldots,1)}$ is integrable, then $A$ is a reversed orthomartingale with respect to the reversed filtrations $\mathcal{R}^1, \ldots, \mathcal{R}^N$ defined above. Moreover, for all nondecreasing convex $\Phi : \mathbb{R} \to \mathbb{R}_+$,

$$\sup_{t \in \mathbb{N}^N} E\big[\Phi(A_t)\big] \le E\big[\Phi(X_{(1,\ldots,1)})\big],$$

as long as $E[\Phi(X_{(1,\ldots,1)})] < +\infty$.

Exercise 1.3.1 Prove Lemma 1.3.1 when $N \ge 3$.
1.4 Smythe's Law of Large Numbers

We now state and prove the strong law of large numbers of Smythe (1973) for multiparameter random walks.

Theorem 1.4.1 (Smythe's Law of Large Numbers) Suppose $S$ is an $N$-parameter random walk with real-valued increments $X = (X_t;\, t \in \mathbb{N}^N)$. Let $X_0$ be a random variable with the same distribution as the $X_t$ ($t \in \mathbb{N}^N$), and recall $A$ from equation (1) of Section 1.3. If $E\big[\,|X_0|(\ln_+|X_0|)^{N-1}\big] < \infty$, then almost surely,

$$\lim_{t \to \infty} A_t = E[X_0].$$

Conversely, suppose $P(L = \infty) > 0$, where $L = \limsup_{t} |A_t|$. Then, $E\big[\,|X_0|(\ln_+|X_0|)^{N-1}\big] = \infty$.

To demonstrate this, we develop two preliminary real-variable facts concerning the following function:

$$I_N(x) = \int_{[1,\infty[^N} \mathbf{1}_{[1,x]}(\langle t\rangle)\, \frac{dt}{\langle t\rangle}, \qquad x \ge 1. \tag{1}$$
Lemma 1.4.1 For any random variable $Z$ and any integer $N \ge 1$,

$$\frac{1}{2}\, E\Big[\,|Z|\, I_{N-1}\big(\tfrac{1}{2}|Z|\big)\Big] \;\le\; \int_{[1,\infty[^N} P\big(|Z| \ge \langle t\rangle\big)\, dt \;\le\; E\big[\,|Z|\, I_{N-1}(|Z|)\big].$$

Proof It is evident that

$$\int_{[1,\infty[^N} P\big(|Z| \ge \langle t\rangle\big)\, dt = \int_{[1,\infty[^{N-1}} \int_1^\infty P\big(|Z| \ge s\langle v\rangle\big)\, ds\, dv = \int_{[1,\infty[^{N-1}} \int_{\langle v\rangle}^\infty P\big(|Z| \ge u\big)\, du\, \frac{dv}{\langle v\rangle} = E\left[\int_{[1,\infty[^{N-1}} \mathbf{1}_{[1,|Z|]}(\langle v\rangle)\, \big(|Z| - \langle v\rangle\big)\, \frac{dv}{\langle v\rangle}\right].$$

The inequality $|Z| - \langle v\rangle \le |Z|$ gives the upper bound of the lemma. On the other hand, if $\langle v\rangle \le \frac{1}{2}|Z|$, we have $|Z| - \langle v\rangle \ge \frac{1}{2}|Z|$, thus deriving the converse bound of the lemma. $\square$

Our second preliminary result is an estimate of the function $I_N$.

Lemma 1.4.2 For all $x > e$ and all integers $N \ge 1$,

$$N^{-N} (\ln x)^N \le I_N(x) \le (\ln x)^N.$$

This is proved by means similar to those used to verify Lemma 1.4.1. As such, we leave the argument as an exercise.

Exercise 1.4.1 Prove Lemma 1.4.2.
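For $N = 2$ one can even compute $I_2$ in closed form: the substitution $u_i = \ln t^{(i)}$ turns $I_2(x)$ into the area of the triangle $\{u, v \ge 0 : u + v \le \ln x\}$, namely $(\ln x)^2/2$. The following numeric sketch (our illustration, not part of the text) confirms this and the bounds of Lemma 1.4.2:

```python
import math

def I2_numeric(x, steps=2000):
    """Midpoint approximation of I_2(x) after substituting u = ln t:
    I_2(x) = area of {u, v >= 0 : u + v <= ln x} = (ln x)^2 / 2."""
    L = math.log(x)
    h = L / steps
    # integrate over u; for each u, v ranges over [0, L - u]
    return sum((L - (i + 0.5) * h) * h for i in range(steps))

for x in [10.0, 1000.0]:
    L = math.log(x)
    approx = I2_numeric(x)
    assert abs(approx - L * L / 2) < 1e-3 * L * L   # matches (ln x)^2 / 2
    assert (L ** 2) / 4 <= approx <= L ** 2 + 1e-9  # Lemma 1.4.2 bounds, N = 2
```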
Proof of Theorem 1.4.1 To prove half of the theorem, suppose $|X_0|(\ln_+|X_0|)^{N-1}$ is integrable. Certainly, $X_0$ is also integrable. Thus, Lemma 1.3.1 shows that $A$ is a reversed orthomartingale. Moreover, applying the latter lemma to $\Phi(t) = t(\ln_+ t)^{N-1}$, we can deduce that $\sup_{t \in \mathbb{N}^N} E\big[\,|A_t|(\ln_+|A_t|)^{N-1}\big]$ is finite. By Theorem 2.9.1 of Chapter 1, $A_t$ has a limit, a.s. To finish our proof of the sufficiency, we need to identify this limit as $E[X_0]$. However, by Kolmogorov's (one-parameter) strong law of large numbers, $\lim_{t \uparrow \infty} A_t = E[X_0]$, almost surely. This proves the sufficiency.

To prove the converse, let us suppose that with positive probability, $L = \infty$. Since $(L = \infty)$ is a tail event for independent random variables, we must have $L = \infty$, a.s. (Why?) For all $t \in \mathbb{Z}^N \setminus \mathbb{N}^N$, define $A_t = 0$. With this convention in mind, the inclusion–exclusion formula (Lemma 1.2.1) shows us that for all $t \in \mathbb{N}^N$,

$$X_t = \sum_{r \in \{0,1\}^N} (-1)^{\sum_{j=1}^N r^{(j)}}\, \langle t - r\rangle\, A_{t-r}.$$
Since $P(L = \infty) = 1$, $\limsup_{t \to \infty} \langle t\rangle^{-1} |X_t| = +\infty$, almost surely. By the Borel–Cantelli lemma for independent events, this is equivalent to the following: For all $\lambda > 0$,

$$\sum_{t \in \mathbb{N}^N} P\big(|X_t| \ge \lambda \langle t\rangle\big) = +\infty.$$

Since $X_t$ has the same distribution as $X_0$, the integral test of calculus can be employed to see that

$$\int_{[1,\infty[^N} P\big(|X_0| \ge \langle t\rangle\big)\, dt = \infty.$$

The result follows, after we show that for all $N \ge 1$,

$$\int_{[1,\infty[^N} P\big(|X_0| \ge \langle t\rangle\big)\, dt = +\infty \iff E\big[\,|X_0|(\ln_+|X_0|)^{N-1}\big] = +\infty.$$

But this follows from Lemmas 1.4.1 and 1.4.2. $\square$
Exercise 1.4.2 In the context of Smythe's law of large numbers, suppose further that $X_0 \in L^p(P)$ for some $p > 1$, and show that as $t \to \infty$, $A_t$ converges to $E[X_0]$ in $L^p(P)$ as well.

Exercise 1.4.3 Complete our proof of Smythe's law of large numbers by showing that $L = \infty$ implies that a.s., $\limsup_{t \to \infty} \langle t\rangle^{-1} |X_t| = +\infty$.

Exercise 1.4.4 Refine Lemma 1.4.1 by showing that

$$\int_{[1,\infty[^N} P\big(Z \ge \langle t\rangle\big)\, dt = \sum_{j=1}^{N-1} (-1)^{j+1}\, E\big[Z\, I_{N-j}(Z)\big] + (-1)^{N-1}\, E\big[Z\, \mathbf{1}_{[1,\infty[}(Z)\big],$$

where $Z$ is any almost surely nonnegative random variable, and $N \ge 1$ is an integer.
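Smythe's theorem is also easy to watch in simulation. The following minimal sketch is our own illustration, with $N = 2$ and Bernoulli increments (choices that satisfy the moment condition trivially); the tolerance is a generous multiple of the standard deviation of the sample average:

```python
import random

random.seed(2)
n = 200                      # t = (n, n), so <t> = n * n samples
p = 0.3                      # Bernoulli increments, E[X_0] = p
S = sum(1 for _ in range(n * n) if random.random() < p)
A = S / (n * n)              # the sample average A_t of Section 1.3
# Smythe's law: A_t -> E[X_0] a.s.; with 40,000 samples the deviation
# should be tiny (the standard deviation of A is about 0.0023 here).
assert abs(A - p) < 0.05
```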
2 The Law of the Iterated Logarithm

Suppose $S$ is an $N$-parameter random walk in $\mathbb{R}$ with increments $X = (X_t;\, t \in \mathbb{N}^N)$. Given that $E\big[\,|X_0|(\ln_+|X_0|)^{N-1}\big] < \infty$, the strong law of large numbers (Theorem 1.4.1) shows that as $t \to \infty$, $S_t \approx \langle t\rangle \cdot E[X_0]$, almost surely.
One can refine this by seeking to estimate the size of the difference $S_t - \langle t\rangle E[X_0]$. When the increments have sufficient integrability, this refinement is the so-called law of the iterated logarithm (written LIL). As a warm-up, we first work with the special case where the random walk is a one-parameter Gaussian walk. This exhibits some of the salient features of what makes the LIL happen, in a setting that is free of many of the technical difficulties inherent to such an undertaking. The general law of the iterated logarithm will be stated and proved following the one-parameter Gaussian case, and will take up the remainder of this chapter.
2.1 The One-Parameter Gaussian Case

This subsection proves the following law of the iterated logarithm in the one-parameter Gaussian setting: the simplest of its kind.

Theorem 2.1.1 (LIL: Gaussian Case) Let $S = (S_k;\, k \ge 1)$ denote an $\mathbb{R}$-valued random walk whose increments $X = (X_k;\, k \ge 1)$ are standard Gaussian random variables. Then,

$$\limsup_{k \to \infty} \frac{S_k}{\sqrt{2k \ln\ln k}} = 1, \qquad \text{a.s.}$$

The usual proof of the above LIL relies on three key steps: (i) a maximal inequality; (ii) tail estimates for deviation probabilities; and (iii) a blocking argument. We will address each in order. The first step relies on Lévy's maximal inequality, which holds for all symmetric random walks.¹

Lemma 2.1.1 (Lévy's Maximal Inequality) Suppose $S$ denotes a real-valued, one-parameter symmetric random walk. Then, for all $k \ge 1$ and $\lambda > 0$,

$$P\Big(\max_{1 \le j \le k} |S_j| \ge \lambda\Big) \le 2\, P\big(|S_k| \ge \lambda\big).$$
Proof Fix $k \ge 1$ and $\lambda > 0$, and define $\tau = \inf(j \ge 1 : |S_j| \ge \lambda)$, with the usual convention that $\inf \varnothing = \infty$. Note that $P(\max_{1 \le j \le k} |S_j| \ge \lambda) = P(\tau \le k)$, so that

$$P\Big(\max_{1 \le j \le k} |S_j| \ge \lambda\Big) = P\big(|S_k| \ge \lambda\big) + P\big(|S_k| < \lambda,\ \tau \le k\big) = P\big(|S_k| \ge \lambda\big) + P\big(|S_k - S_\tau + S_\tau| < \lambda,\ \tau \le k\big).$$

¹That is, the increments are symmetric. Recall that a random variable $Z$ is symmetric if $Z$ and $-Z$ have the same distribution.
Since $\tau$ is a stopping time for the history of $S$, by the strong Markov property (Theorem 1.2.1 of Chapter 3), $S_k - S_\tau$ is independent of $S_\tau$ and of the event $(\tau \le k)$. This and the symmetry of the increments together imply that we can replace $S_k - S_\tau$ by $S_\tau - S_k$ and not change the probability. That is,

$$P\Big(\max_{1 \le j \le k} |S_j| \ge \lambda\Big) = P\big(|S_k| \ge \lambda\big) + P\big(|2S_\tau - S_k| < \lambda,\ \tau \le k\big).$$

On the other hand, on $(\tau < \infty)$, $|S_\tau| \ge \lambda$. Thus, thanks to the triangle inequality, on $(\tau < \infty)$,

$$\big(|2S_\tau - S_k| < \lambda\big) \subset \big(|S_k| \ge \lambda\big).$$

This yields the desired result. $\square$
Exercise 2.1.1 Suppose $S$ is an $N$-parameter random walk whose increments $X$ are symmetric random variables. Prove that for all $t \in \mathbb{N}^N$ and for all $\lambda > 0$,

$$P\Big(\max_{s \preceq t} |S_s| \ge \lambda\Big) \le 2^N\, P\big(|S_t| \ge \lambda\big).$$

This is due to P. Lévy when $N = 1$, and to Wichura (1973) for $N > 1$. (Hint: Use the method used in our proof of Lemma 2.1.1, one parameter at a time.)

The second step in our proof of the theorem is a probability tail estimate.

Lemma 2.1.2 Let $S$ denote a real-valued, one-parameter Gaussian random walk. Then, if $(x_k)$ is a sequence that tends to infinity,

$$\lim_{k \to \infty} \frac{1}{x_k^2}\, \ln P\big(S_k > \sqrt{k}\, x_k\big) = -\frac{1}{2}.$$

Proof For any $k \ge 1$, $S_k$ is a Gaussian random variable with mean 0 and variance $k$. Thus, for any $x > 0$,

$$P\big(S_k > x\sqrt{k}\big) = (2\pi)^{-1/2} \int_x^\infty e^{-u^2/2}\, du.$$

On the other hand, it is easy to check from this (via L'Hôpital's rule of calculus) that

$$\lim_{x \to \infty} x\, e^{x^2/2}\, P\big(S_k > x\sqrt{k}\big) = (2\pi)^{-1/2},$$

and that the convergence is independent of the choice of $k \ge 1$. $\square$
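The limit in the proof can be checked against the exact Gaussian tail, which is expressible through the complementary error function as $P(Z > x) = \frac{1}{2}\mathrm{erfc}(x/\sqrt{2})$; a quick sketch of ours:

```python
import math

# P(S_k > x * sqrt(k)) for the Gaussian walk equals the standard normal
# tail P(Z > x) = erfc(x / sqrt(2)) / 2, for every k.  The limit in the
# proof says x * exp(x^2 / 2) * P(Z > x) -> (2*pi)^(-1/2) as x -> infinity.
target = 1 / math.sqrt(2 * math.pi)
for x in [4.0, 8.0]:
    tail = 0.5 * math.erfc(x / math.sqrt(2))
    ratio = x * math.exp(x * x / 2) * tail
    assert abs(ratio - target) < 0.1 * target  # already within 10% here
```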
Exercise 2.1.2 (Hard) Show that Lemma 2.1.2 continues to hold if we replace the Gaussian walk by the simple walk, at least as long as $x_k = \alpha\sqrt{2\ln\ln k}$ for $\alpha > 0$ fixed. (Hint: One way to proceed is by proving an upper and a lower bound for $P\big(S_n \ge \alpha\sqrt{2n\ln\ln n}\big)$. For a method of getting the lower bound, peek ahead to Lemma 2.10.1 below. To obtain an upper bound, you can: (i) compute the moment generating function of $S_n$; and (ii) prove H. Chernoff's inequality: For all $\alpha, x > 0$,

$$P(S_n > x) \le e^{-\alpha x}\, E\big[e^{\alpha S_n}\big].$$
Optimize over the $\alpha$ of part (ii) to get a working upper bound for the probability in question.)

We are ready to derive our Gaussian LIL, using the third step: the blocking argument.

Proof of Theorem 2.1.1 We will first prove that for any $\varepsilon > 0$,

$$\limsup_{k \to \infty} \frac{|S_k|}{\sqrt{2k \ln\ln k}} \le 1 + \varepsilon, \qquad \text{a.s.} \tag{1}$$

Fix any $\varepsilon > 0$, and for any other fixed $1 < b < (1+\varepsilon)^2$, define $b_k = \lfloor b^k \rfloor$ ($k \ge 1$). By Lemma 2.1.1,

$$P\Big(\max_{1 \le j \le b_{k+1}} |S_j| \ge (1+\varepsilon)\sqrt{2 b_k \ln\ln b_k}\Big) \le 2\, P\Big(|S_{b_{k+1}}| \ge \sqrt{b_{k+1}}\, x_{k+1}\Big),$$

where $x_{k+1} = \sqrt{2(1+\varepsilon)^2\, b_k \ln\ln b_k / b_{k+1}}$. To this we can apply Lemma 2.1.2 and deduce

$$P\Big(\max_{1 \le j \le b_{k+1}} |S_j| \ge (1+\varepsilon)\sqrt{2 b_k \ln\ln b_k}\Big) \le \exp\Big(-(1+\delta_k)\, \frac{(1+\varepsilon)^2 \ln k}{b}\Big),$$

where $\delta_k$ is some term that goes to 0 as $k \to \infty$. Since $b < (1+\varepsilon)^2$, the right-hand side is summable over $k$, and we obtain, from the Borel–Cantelli lemma, that with probability one,

$$\max_{1 \le j \le b_{k+1}} |S_j| \le (1+\varepsilon)\sqrt{2 b_k \ln\ln b_k},$$

for all but finitely many $k$. Therefore, for all $m \in [b_k, b_{k+1}]$ sufficiently large,

$$|S_m| \le \max_{1 \le j \le b_{k+1}} |S_j| \le (1+\varepsilon)\sqrt{2 b_k \ln\ln b_k} \le (1+\varepsilon)\sqrt{2m \ln\ln m}.$$
This verifies equation (1). We now work toward the converse. Fixing $b > 1$ and $\varepsilon > 0$, define

$$E_k = \Big(S_{b_{k+1}} - S_{b_k} \ge (1-\varepsilon)\sqrt{2(b_{k+1} - b_k)\ln\ln b_k}\Big).$$

The $E_k$'s are independent events. On the other hand, $S_{b_{k+1}} - S_{b_k}$ and $S_{b_{k+1} - b_k}$ have the same distribution. Thus, by Lemma 2.1.2, applied to $x_k = \sqrt{2(1-\varepsilon)^2 \ln\ln b_k}$,

$$P(E_k) \ge \exp\big(-(1+\eta_k)(1-\varepsilon)^2 \ln k\big),$$

where $\eta_k$ is some term that goes to 0 as $k \to \infty$. The upshot of this is that $\sum_k P(E_k) = +\infty$. Therefore, owing to the independence of the $E_k$'s and by the Borel–Cantelli lemma for independent events, $P(E_k$ infinitely often$) = 1$. Equivalently,

$$\limsup_{k \to \infty} \frac{S_{b_{k+1}} - S_{b_k}}{\sqrt{2 b_{k+1} \ln\ln b_{k+1}}} \ge (1-\varepsilon)\sqrt{1 - \frac{1}{b}}, \qquad \text{a.s.}$$

Since $b > 1$, equation (1) shows that $\limsup_{k \to \infty} (2 b_{k+1} \ln\ln b_{k+1})^{-1/2}\, |S_{b_k}| \le b^{-1/2}$, a.s. Therefore, with probability one,

$$\limsup_{n \to \infty} \frac{S_n}{\sqrt{2n \ln\ln n}} \ge \limsup_{k \to \infty} \frac{S_{b_{k+1}}}{\sqrt{2 b_{k+1} \ln\ln b_{k+1}}} \ge (1-\varepsilon)\sqrt{1 - \frac{1}{b}} - \sqrt{\frac{1}{b}}.$$

Since this holds for all $b > 1$, and since equation (1) holds for all $\varepsilon > 0$, the result follows. $\square$

We conclude this subsection with the following very important exercise.

Exercise 2.1.3 (Khintchine's LIL) Prove that the 1-parameter LIL (Theorem 2.1.1) still holds when Gaussian walks are replaced by simple ones, that is, when the increments are $\pm 1$ with probability $\frac{1}{2}$ each.
2.2 The General LIL

We define the function $\Lambda$ by

$$\Lambda(x) = \begin{cases} \sqrt{2x \ln\ln x}, & \text{if } x \ge 4, \\ 1, & \text{otherwise.} \end{cases} \tag{1}$$

There is nothing special about the 4 in this definition. All that we need is a function that equals $\sqrt{2x \ln\ln x}$ for $x$ large, and $\Lambda$ is one such function.
Theorem 2.2.1 (The Law of the Iterated Logarithm) Let $S$ denote a real-valued, $N$-parameter random walk with increments $X = (X_t;\, t \in \mathbb{N}^N)$. If $E[X_0] = 0$, $E[X_0^2] = 1$, and $E\big[X_0^2 (\ln_+|X_0|)^{N-1}\big] < \infty$, then with probability one,

$$\limsup_{t \to \infty} \frac{S_t}{\Lambda(\langle t\rangle)} = \sqrt{N}, \tag{2}$$

where $\Lambda$ is given by (1) above, and $\langle t\rangle$ is defined in (1) in the introduction of Section 1.

We will prove this result in Sections 2.3 through 2.10. Our proof loosely follows the original 1-parameter proof of A. I. Khintchine for simple walks. However, there are a number of technical difficulties; these are overcome by applying the elegant method of de Acosta (1983). Before proving the direct half of the LIL, it should be pointed out that there is an essential converse.

Exercise 2.2.1 Show that in the setting of the LIL (Theorem 2.2.1),

$$E\left[\frac{X_0^2\, (\ln_+|X_0|)^{N-1}}{\ln_+\ln_+|X_0|}\right] = +\infty \implies \limsup_{t \to \infty} \frac{|S_t|}{\Lambda(\langle t\rangle)} = +\infty.$$

This is due to Wichura (1973). (Hint: Adapt our proof of the "converse" in Smythe's law of large numbers; cf. Theorem 1.4.1.)

To conclude this subsection, we point out a restatement of the above that is obtained from a real-variable argument.

Lemma 2.2.1 Under the conditions of Theorem 2.2.1, equation (2) holds if and only if

$$\limsup_{t \to \infty} \frac{|S_t|}{\Lambda(\langle t\rangle)} = \sqrt{N}, \qquad \text{a.s.}$$

Exercise 2.2.2 Prove Lemma 2.2.1.
2.3 Summability

In this subsection we present two summability lemmas. The first is a multiparameter extension of Kronecker's lemma of classical probability theory. When $N = 1$, a more general version of this is attributed to L. Kronecker; cf. Supplementary Exercise 6.

Lemma 2.3.1 (Part of Kronecker's Lemma) Suppose $(x_t;\, t \in \mathbb{N}^N)$ and $a = (a_t;\, t \in \mathbb{N}^N)$ denote two collections of nonnegative real numbers. Suppose that $a$ is nondecreasing with respect to the partial order $\preceq$, $\lim_{t \uparrow \infty} a_t = \infty$, and $\sum_{t \in \mathbb{N}^N} x_t / a_t < \infty$. Then,

$$\lim_{t \to \infty} \frac{1}{a_t} \sum_{s \preceq t} x_s = 0.$$

Our second result is a quantitative version of summation by parts.

Lemma 2.3.2 (Summation by Parts) Suppose $\{a_i\}$, $\{b_i\}$, $\{c_i\}$, and $\{d_i\}$ are sequences of positive real numbers such that: (i) $b_1, b_2, \ldots$ is a nondecreasing sequence; and (ii) $c_k \le \sum_{j=1}^k a_j \le d_k$, for all $k \ge 1$. Then, for all $k \ge 1$,

$$\sum_{j=1}^{k-1} c_j\, \frac{b_{j+1} - b_j}{b_j b_{j+1}} + \frac{c_k}{b_k} \;\le\; \sum_{j=1}^{k} \frac{a_j}{b_j} \;\le\; \sum_{j=1}^{k-1} d_j\, \frac{b_{j+1} - b_j}{b_j b_{j+1}} + \frac{d_k}{b_k}.$$

Exercise 2.3.1 Prove the above form of Kronecker's lemma (Lemma 2.3.1), as well as the summation-by-parts formula (Lemma 2.3.2).
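Both lemmas are elementary to verify on concrete sequences. Here is a deterministic sketch of ours for the summation-by-parts bounds; the 10% envelopes $c_j$, $d_j$ are arbitrary choices:

```python
# Check of the summation-by-parts bounds (Lemma 2.3.2) on a toy sequence.
k = 50
a = [1.0 / (j + 1) for j in range(k)]            # a_1 .. a_k
b = [float((j + 1) ** 2) for j in range(k + 1)]  # nondecreasing b_1 .. b_{k+1}
A = [sum(a[: j + 1]) for j in range(k)]          # partial sums A_j
c = [0.9 * s for s in A]                         # c_j <= A_j
d = [1.1 * s for s in A]                         # A_j <= d_j

mid = sum(a[j] / b[j] for j in range(k))
lower = (sum(c[j] * (b[j + 1] - b[j]) / (b[j] * b[j + 1]) for j in range(k - 1))
         + c[-1] / b[k - 1])
upper = (sum(d[j] * (b[j + 1] - b[j]) / (b[j] * b[j + 1]) for j in range(k - 1))
         + d[-1] / b[k - 1])
assert lower <= mid <= upper
```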
2.4 Dirichlet's Divisor Lemma

For any positive integer $k$, let $D_k^{(2)}$ denote the number of its divisors. The following result of J. P. G. L. Dirichlet is an elementary fact from analytic number theory:

$$\lim_{n \to \infty} \frac{1}{n \ln n} \sum_{k=1}^{n} D_k^{(2)} = 1.$$

That is, a number that is picked uniformly from $\{1, \ldots, n\}$ has, on average, about $\ln n$ divisors. We will need a quantitative version of this, which makes precise the fact that $\sum_{k=1}^n D_k^{(2)}$ is of the order $n \ln n$. The key to this is the observation that $D_k^{(2)} = \sum_{t \in \mathbb{N}^2} \mathbf{1}_{\{k\}}(t^{(1)} t^{(2)})$. More generally, define

$$D_k^{(N)} = \sum_{t \in \mathbb{N}^N} \mathbf{1}_{\{k\}}(\langle t\rangle), \qquad k \ge 1.$$

Lemma 2.4.1 For any two integers $n, N \ge 1$,

$$\sum_{k=1}^{n} D_k^{(N)} \le 2^N n\, (N \ln 2 + \ln n)^{N-1}. \tag{1}$$
Remark While we need only the above upper bound on $\sum_{k=1}^n D_k^{(N)}$, much more can be done; cf. Supplementary Exercise 1 for a sample, or see Karatsuba (1993, Theorems 3 and 7, Ch. 1) for very detailed estimates when $N = 2$.

Proof For each $t \in \mathbb{N}^N$ define $Q(t) = \{s \in \mathbb{R}^N : s \succeq t,\ |s - t| < 1\}$, and recall that for all $x \in \mathbb{R}^N$, $|x| = \max_{1 \le j \le N} |x^{(j)}|$ denotes the maximum-modulus norm. Thus, $Q(t)$ denotes the cube of side 1 in $\mathbb{R}^N$ whose "lower endpoint" is $t$. Since Lebesgue's measure of $Q(t)$ is 1,

$$\sum_{k=1}^{n} D_k^{(N)} = \sum_{t \in \mathbb{N}^N} \int_{Q(t)} \mathbf{1}_{\{1,\ldots,n\}}(\langle t\rangle)\, ds \le \sum_{t \in \mathbb{N}^N} \int_{Q(t)} \mathbf{1}_{[1,\, 2^N n]}(\langle s\rangle)\, ds = \int_{[1,\infty[^N} \mathbf{1}_{[1,\, 2^N n]}(\langle s\rangle)\, ds.$$

To obtain the inequality above we used the fact that for $t \succeq (1, \ldots, 1)$ and $s \in Q(t)$,

$$\langle s\rangle \le \prod_{j=1}^{N} \big(t^{(j)} + 1\big) \le 2^N \langle t\rangle.$$

In summary, $\sum_{k=1}^n D_k^{(N)} \le J_N(2^N n)$, where for all $x > 1$,

$$J_N(x) = \int_{[1,\infty[^N} \mathbf{1}_{[1,x]}(\langle s\rangle)\, ds.$$

Since $J_N(x) \le x (\ln x)^{N-1}$, this proves the lemma. $\square$

Exercise 2.4.1 Complete the above by showing that $J_N(x) \le x (\ln x)^{N-1}$.
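Lemma 2.4.1 and Dirichlet's asymptotic can be checked directly for $N = 2$ by sieving the divisor function; a small sketch of ours:

```python
import math

# Sieve D_k^{(2)} = #{(t1, t2) in N^2 : t1 * t2 = k}, i.e. the number of
# divisors of k, and compare the cumulative sum with Lemma 2.4.1 (N = 2)
# and with Dirichlet's asymptotic n * ln(n).
n = 5000
D = [0] * (n + 1)
for t1 in range(1, n + 1):
    for k in range(t1, n + 1, t1):
        D[k] += 1
total = sum(D[1:])
assert total <= 4 * n * (2 * math.log(2) + math.log(n))  # Lemma 2.4.1, N = 2
assert 0.9 < total / (n * math.log(n)) < 1.3             # Dirichlet's theorem
```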
2.5 Truncation

Define

$$\beta(x) = \begin{cases} \sqrt{\dfrac{x}{\ln\ln x}}, & \text{if } x \ge 4, \\ 1, & \text{otherwise.} \end{cases} \tag{1}$$

Now, for any $\eta > 0$, we can truncate the increment $X_t$ as

$$X_t(\eta) = X_t\, \mathbf{1}_{(|X_t| \le \eta \beta(\langle t\rangle))}, \qquad t \in \mathbb{N}^N. \tag{2}$$
These are independent, but not identically distributed, random variables, which have the advantage of being bounded. Based on them we can build the mean-zero independent-increments process $S(\eta)$ as

$$S_t(\eta) = \sum_{s \preceq t} \big(X_s(\eta) - E[X_s(\eta)]\big), \qquad t \in \mathbb{N}^N. \tag{3}$$

In this section we make our first move by reducing the LIL to a problem about sums of bounded independent random variables that are not necessarily identically distributed. Namely, we have the following:

Proposition 2.5.1 For any $\eta > 0$,

$$\lim_{t \to \infty} \frac{|S_t - S_t(\eta)|}{\Lambda(\langle t\rangle)} = 0, \qquad \text{a.s.}$$

To prove it we need a supporting lemma.

Lemma 2.5.1 The following are equivalent:
(i) $E\big[X_0^2 (\ln_+|X_0|)^{N-1}\big] < \infty$;
(ii) for all $\eta > 0$,

$$\sum_{t \in \mathbb{N}^N} \frac{E\big[\,|X_t - X_t(\eta)|\big]}{\Lambda(\langle t\rangle)} < \infty.$$
Proof of Lemma 2.5.1 We will prove that (i) ⇒ (ii); the reverse implication is proved similarly. Define the right inverse of $\beta$ as $\beta^{-1}(a) = \sup\{x > 0 : \beta(x) < a\}$. By equation (1) above, there exists a constant $C_1$ such that

$$\beta^{-1}(a) \le C_1\, a^2 \ln_+\ln_+ a, \qquad \forall a \ge 0. \tag{4}$$

(Why?) On the other hand,

$$\sum_{t \in \mathbb{N}^N} \frac{E\big[\,|X_t - X_t(\eta)|\big]}{\Lambda(\langle t\rangle)} = \sum_{t \in \mathbb{N}^N} E\left[\,|X_0| \cdot \frac{\mathbf{1}_{(|X_0| \ge \eta \beta(\langle t\rangle))}}{\Lambda(\langle t\rangle)}\right].$$

To this we apply equation (4) with $a = |X_0|\eta^{-1}$, and deduce

$$\sum_{t \in \mathbb{N}^N} \frac{E\big[\,|X_t - X_t(\eta)|\big]}{\Lambda(\langle t\rangle)} \le E\left[\,|X_0| \cdot \sum_{t \in \mathbb{N}^N} \frac{\mathbf{1}_{(\langle t\rangle \le C_1 X_0^2 \eta^{-2} \ln_+\ln_+(X_0^2 \eta^{-2}))}}{\Lambda(\langle t\rangle)}\right] = E\left[\,|X_0| \cdot \sum_{k=1}^{\infty} \frac{D_k^{(N)}}{\Lambda(k)}\, \mathbf{1}_{(k \le C_1 X_0^2 \eta^{-2} \ln_+\ln_+(X_0^2 \eta^{-2}))}\right], \tag{5}$$
where $D_k^{(N)}$ was defined in equation (1) of Section 2.4. By Dirichlet's divisor lemma (Lemma 2.4.1), there exists a constant $C_2$ such that for all $j \ge 1$, $\sum_{k=1}^{j} D_k^{(N)} \le C_2\, j (\ln_+ j)^{N-1}$. Thus, summation by parts (Lemma 2.3.2) yields a constant $C_3$ such that for any $a \ge 1$,

$$\sum_{1 \le k \le a} \frac{D_k^{(N)}}{\Lambda(k)} \le C_3 \sum_{1 \le k \le a} k (\ln_+ k)^{N-1}\, \frac{\Lambda(k+1) - \Lambda(k)}{\Lambda(k)^2} + C_3\, \frac{a (\ln_+ a)^{N-1}}{\Lambda(a)}.$$

By Taylor's expansion we can find a constant $C_4$ such that for all integers $k \ge 1$, $\Lambda(k+1) - \Lambda(k) \le C_4\, k^{-1/2} (\ln\ln k)^{1/2}$. Using this, and a few more lines of calculation, we deduce the existence of a constant $C_5$ such that for all $a > 0$,

$$\sum_{0 \le k \le a} \frac{D_k^{(N)}}{\Lambda(k)} \le C_5\, \frac{\sqrt{a}\, (\ln_+ a)^{N-1}}{\sqrt{\ln_+\ln_+ a}}.$$

Hence, equation (5) implies that

$$\sum_{t \in \mathbb{N}^N} \frac{E\big[\,|X_t - X_t(\eta)|\big]}{\Lambda(\langle t\rangle)} \le C_5\, E\left[\,|X_0| \cdot \frac{\sqrt{Y}\, (\ln_+ Y)^{N-1}}{\sqrt{\ln_+\ln_+ Y}}\right],$$

where $Y = C_1 X_0^2 \eta^{-2} \ln_+\ln_+(X_0^2 \eta^{-2})$. Since $E\big[X_0^2 (\ln_+|X_0|)^{N-1}\big]$ is finite, so is the display above. $\square$

Using the previous lemma we can finish our proof of Proposition 2.5.1.

Proof of Proposition 2.5.1 Since $E[X_t] = 0$,

$$\sum_{t \in \mathbb{N}^N} \frac{\big|E[X_t(\eta)]\big|}{\Lambda(\langle t\rangle)} = \sum_{t \in \mathbb{N}^N} \frac{\big|E\big[X_t\, \mathbf{1}_{(|X_t| > \eta \beta(\langle t\rangle))}\big]\big|}{\Lambda(\langle t\rangle)} \le \sum_{t \in \mathbb{N}^N} \frac{E\big[\,|X_t - X_t(\eta)|\big]}{\Lambda(\langle t\rangle)},$$

which is finite, thanks to Lemma 2.5.1. Another application of the latter lemma yields

$$\sum_{t \in \mathbb{N}^N} \frac{\big|X_t - \{X_t(\eta) - E[X_t(\eta)]\}\big|}{\Lambda(\langle t\rangle)} < +\infty, \qquad \text{a.s.}$$

Kronecker's lemma (Lemma 2.3.1) now completes the proof. $\square$
Exercise 2.5.1 Verify the converse half of Lemma 2.5.1, i.e., show that (ii) ⇒ (i).
2.6 Bernstein's Inequality

According to Proposition 2.5.1, we can reduce the LIL to proving a law of the iterated logarithm for bounded, independent random variables. In this
section we present a deep inequality of S. Bernstein, which states that sums of mean-zero, bounded, independent random variables have Gaussian-like tails.

Proposition 2.6.1 (Bernstein's Inequality) Let $V_1, V_2, \ldots, V_n$ be independent random variables with zero means and with $\max_{1 \le i \le n} |V_i| \le \mu$, for some deterministic constant $\mu > 0$. Then, for all choices of $\lambda > 0$,

$$P\left(\Big|\sum_{i=1}^{n} V_i\Big| \ge \lambda\right) \le 2 \exp\left(-\frac{\lambda^2}{2\big(\sum_{i=1}^{n} \mathrm{Var}(V_i) + \mu\lambda\big)}\right).$$

Proof By Taylor's expansion, for any $\zeta > 0$,

$$e^{\zeta V_i} = 1 + \zeta V_i + \frac{\zeta^2 V_i^2}{2} + \frac{\zeta^3 (V_i')^3}{6}\, e^{\zeta V_i'},$$

where $|V_i'| \le |V_i|$ and $V_i V_i' \ge 0$. Consequently,

$$e^{\zeta V_i} \le 1 + \zeta V_i + \frac{\zeta^2 V_i^2}{2}\left(1 + \frac{\zeta\mu e^{\zeta\mu}}{3}\right), \qquad \text{a.s.}$$

We take expectations to see that

$$E\big[e^{\zeta V_i}\big] \le 1 + \frac{1}{2}\zeta^2\, \mathrm{Var}(V_i)\left(1 + \frac{1}{3}\zeta\mu e^{\zeta\mu}\right) \le \exp\left(\frac{1}{2}\zeta^2\, \mathrm{Var}(V_i)\Big(1 + \frac{1}{3}\zeta\mu e^{\zeta\mu}\Big)\right),$$

since $1 + x \le e^x$ for $x \ge 0$. In particular, the independence of the $V_i$'s leads to

$$E\big[e^{\zeta \sum_{i=1}^n V_i}\big] \le \exp\left(\frac{1}{2}\zeta^2 \sigma_n^2 \Big(1 + \frac{1}{3}\zeta\mu e^{\zeta\mu}\Big)\right),$$

where we write $\sigma_n^2 = \sum_{i=1}^n \mathrm{Var}(V_i)$, for brevity. Thus, by Chebyshev's inequality, for all $\zeta, \lambda > 0$,

$$P\left(\sum_{i=1}^{n} V_i \ge \lambda\right) \le \exp\left(\frac{\zeta^2 \sigma_n^2}{2}\Big(1 + \frac{1}{3}\zeta\mu e^{\zeta\mu}\Big) - \zeta\lambda\right). \tag{1}$$

Now we choose

$$\zeta = \frac{\lambda}{\sigma_n^2 + \lambda\mu}.$$

Since $\zeta\mu \le 1$, we have $\frac{1}{3} e^{\zeta\mu} \le \frac{e}{3} \le 1$, whence

$$P\left(\sum_{i=1}^{n} V_i \ge \lambda\right) \le \exp\left(\frac{\zeta^2 \sigma_n^2}{2}(1 + \zeta\mu) - \zeta\lambda\right) = \exp\left(\frac{\lambda^2}{2(\sigma_n^2 + \lambda\mu)} \cdot \frac{\sigma_n^2}{\sigma_n^2 + \lambda\mu}\Big(1 + \frac{\lambda\mu}{\sigma_n^2 + \lambda\mu}\Big) - \frac{\lambda^2}{\sigma_n^2 + \lambda\mu}\right) = \exp\left(\frac{\lambda^2}{2(\sigma_n^2 + \lambda\mu)} \cdot \frac{1}{1+\eta} \cdot \frac{1 + 2\eta}{1+\eta} - \frac{\lambda^2}{\sigma_n^2 + \lambda\mu}\right),$$

where $\eta = \lambda\mu / \sigma_n^2$. Consequently,

$$P\left(\sum_{i=1}^{n} V_i \ge \lambda\right) \le \exp\left(-\frac{\lambda^2}{2(\sigma_n^2 + \lambda\mu)} \left[2 - \frac{1}{1+\eta} \cdot \frac{1+2\eta}{1+\eta}\right]\right).$$

It remains to prove that

$$2 - \frac{1}{1+\eta} \cdot \frac{1+2\eta}{1+\eta} \ge 1.$$

But this is an elementary fact. Applying the same one-sided bound to $-V_1, \ldots, -V_n$ and adding the two estimates completes the proof. $\square$
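Bernstein's inequality can be checked exactly against a symmetric Bernoulli walk, whose tails are computable binomial sums; a sketch of ours:

```python
import math

# Exact check of Proposition 2.6.1 for V_i = +/-1, so Var(V_i) = 1, mu = 1:
# P(|V_1 + ... + V_n| >= lam) is a binomial tail we can compute exactly.
n, lam = 100, 30
# V_1 + ... + V_n >= 30 iff the number of +1's is at least 65
upper_tail = sum(math.comb(n, h) for h in range(65, n + 1)) / 2 ** n
exact = 2 * upper_tail  # symmetric walk: two-sided tail
bernstein = 2 * math.exp(-lam ** 2 / (2 * (n + lam)))
assert exact <= bernstein
```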
One can use a different choice of $\zeta$ in (1) to get other inequalities. For instance, see the following exercise.

Exercise 2.6.1 Prove that in the setting of Proposition 2.6.1 there exists a constant $C$, independent of the distribution of the $V_i$'s, such that for all $\lambda > 0$,

$$P\left(\sum_{i=1}^{n} V_i \ge \lambda\right) \le C \exp\left(-\frac{\lambda}{2 \max\big\{\big(\sum_{j=1}^{n} \mathrm{Var}\, V_j\big)^{1/2},\ \mu\big\}}\right).$$

In its essence, this is due to A. N. Kolmogorov.
Exercise 2.6.2 Improve Proposition 2.6.1 by finding a constant $\kappa \in\, ]0,1[$ such that

$$P\left(\Big|\sum_{i=1}^{n} V_i\Big| \ge \lambda\right) \le 2 \exp\left(-\frac{\lambda^2}{2\big(\sum_{j=1}^{n} \mathrm{Var}\, V_j + \kappa\mu\lambda\big)}\right).$$

Usually, one refers to Bernstein's inequality as this with $\kappa = \frac{1}{3}$.
2.7 Maximal Inequalities

In this subsection we prove two maximal inequalities. When used together with Bernstein's inequality (Proposition 2.6.1), their implications are analogous to those of Lemma 2.1.1.

Lemma 2.7.1 Suppose $Z = (Z_t;\, t \in \mathbb{N}^N)$ is a collection of $\mathbb{R}$-valued independent random variables with mean zero and $\sigma^2 = \sup_{t \in \mathbb{N}^N} E[Z_t^2] < \infty$. Then, whenever $\lambda \ge 2^{N+1}\sigma$,

$$\sup_{t \in \mathbb{N}^N} P\left(\max_{s \preceq t} \Big|\sum_{r \preceq s} Z_r\Big| \ge \lambda \sqrt{\langle t\rangle}\right) \le \frac{1}{2}.$$
Our next maximal inequality is a variant of classical inequalities of P. Lévy and G. Ottaviani; cf. Pyke (1973), Shorack and Smythe (1976), and Wichura (1973) for other variants. When the random walk in question is symmetric, an even better inequality is found in Exercise 2.1.1 above.
Lemma 2.7.2 In the notation of Lemma 2.7.1 above, for any $x, y > 0$ and for all $t \in \mathbb{N}^N$,

$$P\left(\max_{s \preceq t} \Big|\sum_{r \preceq s} Z_r\Big| \ge x + Ny\right) \cdot \left[P\left(\max_{s \preceq t} \Big|\sum_{r \preceq s} Z_r\Big| \le y\right)\right]^N \le P\left(\Big|\sum_{s \preceq t} Z_s\Big| \ge x\right).$$
Proof of Lemma 2.7.1 Throughout, we write Tt = s t Zs . For any t ∈ NN , define Ft to be the σ-field generated by (Tr ; r t). Our proof of Proposition 1.2.1 can be mimicked to show that F = (Ft ; t ∈ NN ) is a commuting filtration, since in the latter proposition the identical distribution of the increments was not needed. Moreover, the arguments used in Lemma 1.2.2 go through (verbatim) to show that t → Tt is a martingale with respect to F. By Corollary 3.5.1 and Theorem 2.3.1—both of Chapter 1—for all t ∈ NN , E[Zs2 ] ≤ 4N σ 2 · t. E max Ts2 ≤ 4N E[Tt2 ] = 4N st
st
In the last equality we used the fact that the increments of Tt are meanzero independent random variables. The result follows from the above and Chebyshev’s inequality. Proof of Lemma 2.7.2 We will prove this when N = 1 and N = 2. The general case is deferred to Exercise 2.7.1 below. First, suppose N = 1, and consider the stopping time τ = inf(k ≥ 1 : |Zk | ≥ x + y), where, as usual, inf ∅ = +∞. Clearly, k
Zi ≥ x + y = P(τ ≤ n). P max k≤n
i=1
We decompose the above as follows, all the time writing Tn = brevity: P(τ ≤ n) ≤ P(|Tn | ≥ x) + P(τ ≤ n , |Tn | ≤ x).
n
i=1
Zi for
On the other hand, on (τ ≤ n), we can write |Tn | = |Tn − Tτ + Tτ | ≥ |Tτ | − max |Ti+τ − Tτ | ≥ x + y − max |Ti+τ − Tτ |. i≤n
i≤n
Thus,
P(τ ≤ n) ≤ P(|Tn | ≥ x) + P τ ≤ n , max |Ti+τ − Tτ | ≥ y . i≤n
2 The Law of the Iterated Logarithm
125
The above, together with the strong Markov property (Theorem 1.2.1 of Chapter 3), yields
$$P(\tau \le n) \le P\big(|T_n| \ge x\big) + P(\tau\le n)\,P\Big(\max_{i\le n}|T_i| \ge y\Big).$$
Solving, we obtain
$$P\Big(\max_{i\le n}|T_i| \ge x+y\Big)\cdot P\Big(\max_{i\le n}|T_i| \le y\Big) \le P\big(|T_n| \ge x\big), \tag{1}$$
which is the desired result when N = 1. Note that equation (1) holds even if the increments of T are d-dimensional; in that case, |···| denotes, as usual, the $\ell^\infty$-norm on $\mathbf{R}^d$.

We now proceed with our proof for N = 2. Here, we write $T_{n,m} = \sum_{i\le n}\sum_{j\le m} Z_{i,j}$, where $Z_{i,j}$ is written in place of $Z_{(i,j)}$, etc. For any fixed integer m ≥ 1, $i \mapsto T_i = (T_{i,1}, \ldots, T_{i,m})$ is an m-dimensional, one-parameter random walk. Since equation (1) applies in this case, we have
$$P\Big(\max_{i\le n}|T_i| \ge x+2y\Big)\cdot P\Big(\max_{i\le n}|T_i| \le y\Big) \le P\big(|T_n| \ge x+y\big).$$
Equivalently, in terms of our two-parameter walk, this states that
$$P\Big(\max_{(i,j)\preceq(n,m)}|T_{i,j}| \ge x+2y\Big)\cdot P\Big(\max_{(i,j)\preceq(n,m)}|T_{i,j}| \le y\Big) \le P\Big(\max_{j\le m}|T_{n,j}| \ge x+y\Big).$$
The last term is a maximal inequality involving the one-parameter, one-dimensional walk $j\mapsto T_{n,j}$. Equation (1) implies
$$P\Big(\max_{j\le m}|T_{n,j}| \ge x+y\Big)\cdot P\Big(\max_{(i,j)\preceq(n,m)}|T_{i,j}| \le y\Big) \le P\Big(\max_{j\le m}|T_{n,j}| \ge x+y\Big)\cdot P\Big(\max_{j\le m}|T_{n,j}| \le y\Big) \le P\big(|T_{n,m}| \ge x\big).$$
This and the preceding display together demonstrate the result for the case N = 2.

Exercise 2.7.1 Prove the above for general N > 1.
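As an illustration (not from the text), inequality (1) can be checked exactly for the simple ±1 random walk by enumerating all $2^n$ equally likely paths; the probabilities below are computed with exact rational arithmetic.

```python
from fractions import Fraction
from itertools import product

def inequality_1_holds(n, x, y):
    """Exact check of inequality (1) for the simple +/-1 walk:
    P(max|T_i| >= x+y) * P(max|T_i| <= y) <= P(|T_n| >= x)."""
    p_path = Fraction(1, 2 ** n)
    p_max_big = p_max_small = p_end_big = Fraction(0)
    for steps in product((-1, 1), repeat=n):
        partial, m = 0, 0
        for z in steps:
            partial += z
            m = max(m, abs(partial))   # running maximum of |T_i|
        if m >= x + y:
            p_max_big += p_path
        if m <= y:
            p_max_small += p_path
        if abs(partial) >= x:
            p_end_big += p_path
    return p_max_big * p_max_small <= p_end_big

# The inequality holds for every choice of integer levels tried here.
assert all(inequality_1_holds(8, x, y) for x in (1, 2, 3) for y in (1, 2, 3))
```

The enumeration is exhaustive, so this verifies the inequality (for these small parameters) without any sampling error.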
2.8 A Number-Theoretic Estimate

In Section 2.4 we saw Dirichlet's divisor numbers $D_k^{(N)}$ (k ≥ 1). Now we introduce a variant of these.
126
4. Multiparameter Walks
For any k ≥ 1, define
$$R_k^{(N)} = \sum_{t\in\mathbf{N}^N} \mathbf{1}_{\{k\}}\Big(\sum_{j=1}^N t^{(j)}\Big). \tag{1}$$
Note that $R_k^{(N)} = 0$, unless k ≥ N. We proceed to find a quantitative estimate for the cumulative sums of the function $k \mapsto R_k^{(N)}$.

Lemma 2.8.1 For any k ≥ 1,
$$\sum_{\ell=1}^{k} R_\ell^{(N)} \le k^N/N!.$$
Proof Define $R_\ell^{(N)} = 0$ for all negative integers ℓ, and note that for all k ≥ 1,
$$R_k^{(N)} = \sum_{t\in\mathbf{N}^{N-1}} \sum_{\ell=1}^{\infty} \mathbf{1}_{\{k-\ell\}}\Big(\sum_{j=1}^{N-1} t^{(j)}\Big) = \sum_{\ell=1}^{\infty} R_{k-\ell}^{(N-1)} = \sum_{\ell=1}^{k-1} R_\ell^{(N-1)}. \tag{2}$$
Therefore, for all k ≥ 1,
$$\sum_{j=1}^{k} R_j^{(N)} = \sum_{j=1}^{k}\sum_{\ell=1}^{j-1} R_\ell^{(N-1)}. \tag{3}$$
We can now perform mathematical induction on N. Clearly, the bound holds for N = 1 and, in fact, $\sum_{\ell=1}^k R_\ell^{(1)} = k$. Assuming that the result holds for N − 1, we proceed to show that it does for N. By (3) and by the induction hypothesis,
$$\sum_{j=1}^{k} R_j^{(N)} \le \sum_{j=1}^{k} \frac{(j-1)^{N-1}}{(N-1)!} \le \frac{1}{(N-1)!}\int_0^k x^{N-1}\,dx = \frac{k^N}{N!},$$
which is the desired upper bound.

Lemma 2.8.1 is sharp, asymptotically, as k → ∞:

Exercise 2.8.1 If $x_+ = \max(x,0)$, show that
$$\sum_{\ell=1}^{k} R_\ell^{(N)} \ge \frac{(k-N+1)_+^N}{N!}.$$
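For a concrete check (an illustration, not part of the text), note that definition (1) makes $R_k^{(N)}$ the number of ways to write k as an ordered sum of N positive integers, i.e., the composition count $\binom{k-1}{N-1}$; the bounds of Lemma 2.8.1 and Exercise 2.8.1 can then be verified directly:

```python
from math import comb, factorial

def R(N, k):
    """R_k^(N) of equation (1): the number of t in N^N with
    t^(1) + ... + t^(N) = k, i.e. compositions of k into N positive parts."""
    return comb(k - 1, N - 1) if k >= N else 0

def cumulative_R(N, k):
    return sum(R(N, l) for l in range(1, k + 1))

for N in (1, 2, 3, 4):
    for k in range(1, 40):
        s = cumulative_R(N, k)
        assert s <= k ** N / factorial(N)                  # Lemma 2.8.1
        assert s >= max(k - N + 1, 0) ** N / factorial(N)  # Exercise 2.8.1
```

Since the cumulative sum equals $\binom{k}{N}$ exactly, both inequalities reduce to elementary bounds on binomial coefficients.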
Historically, Dirichlet's divisor lemma (Lemma 2.4.1) and Lemma 2.8.1 arise in analytic number theory through the theory of integer points. Indeed, consider the case N = 2 and observe that $\sum_{\ell=1}^k R_\ell^{(2)}$ denotes the number of points of integral coordinates in a large triangle.

Exercise 2.8.2 Verify that as k → ∞, $k^{-2}\sum_{\ell=1}^k R_\ell^{(2)} \to \frac12$.

Exercise 2.8.3 Continuing with Exercise 2.8.2, let $G_k$ denote the number of points of integral coordinates in the centered disk of radius k in the plane. Show that as k → ∞, $k^{-2}G_k \to \pi$. This calculation, together with some of its refinements, is due to C.-F. Gauss. See Karatsuba (1993, Theorems 2 and 6, Ch. 1) for this and more.

For our applications to the LIL, integer points come in via summability criteria such as the following important exercise.

Exercise 2.8.4 Show that $\sum_{k=1}^{\infty} k^{-\gamma} R_k^{(N)} < +\infty \iff \gamma > N$.
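Exercise 2.8.3 is easy to confirm numerically; the sketch below (an illustration, not from the text) counts lattice points in the disk of radius k and compares the count with $\pi k^2$.

```python
import math

def G(k):
    """Number of integer points (i, j) with i*i + j*j <= k*k."""
    count = 0
    for i in range(-k, k + 1):
        # For fixed i, the admissible j satisfy |j| <= sqrt(k^2 - i^2).
        j_max = math.isqrt(k * k - i * i)
        count += 2 * j_max + 1
    return count

# Gauss: G(k) = pi * k^2 + O(k), so G(k)/k^2 -> pi.
for k in (50, 200, 800):
    assert abs(G(k) / k ** 2 - math.pi) < 10 / k
```

The error tolerance 10/k reflects the elementary O(k) boundary bound; the true error is in fact much smaller.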
2.9 Proof of the LIL: The Upper Bound

In light of Proposition 2.5.1, it suffices to show that for any η > 0, almost surely,
$$\limsup_{t\to\infty} \frac{S_t(\eta)}{\Lambda(t)} \le \sqrt{\tfrac14 N^2\eta^2 + N} + \tfrac12 N\eta, \tag{1}$$
where S(η) is the truncation of the walk S; cf. equations (1)–(3) of Section 2.5. The upper bound of the LIL follows upon appealing to Proposition 2.5.1, and letting η ↓ 0 along a rational sequence, in the above display.

We work toward establishing tail estimates for the walk S(η). Since $E\{|X_t(\eta)|^2\} \le E\{|X_t|^2\} = 1$, Lemma 2.7.1 shows us that
$$\inf_{t\in\mathbf{N}^N} P\Big(\max_{s\preceq t} |S_s(\eta)| \le 2^{N+2}\sqrt{\langle t\rangle}\Big) \ge \frac12.$$
Therefore, by Lemma 2.7.2, for all $t\in\mathbf{N}^N$, and all α > 0,
$$P\Big(\max_{s\preceq t} |S_s(\eta)| \ge \alpha\Lambda(t) + 2^{N+2}\sqrt{\langle t\rangle}\Big) \le 2^N\,P\big(|S_t(\eta)| \ge \alpha\Lambda(t)\big).$$
The right-hand side can be estimated by Bernstein's inequality (Proposition 2.6.1). Indeed, since $\mathrm{Var}\,S_t(\eta) \le \mathrm{Var}\,S_t = \langle t\rangle$, Bernstein's inequality and the previous display together show us that for all $t\in\mathbf{N}^N$, and all α > 0,
$$P\Big(\max_{s\preceq t} |S_s(\eta)| \ge \alpha\Lambda(t) + 2^{N+2}\sqrt{\langle t\rangle}\Big) \le 2^{N+1}\exp\Big(-\frac{\alpha^2\,\langle t\rangle \ln_+\ln_+\langle t\rangle}{\langle t\rangle + \alpha\eta\Lambda(t)\beta(t)}\Big),$$
where β is defined in equations (1) of Section 2.5. In particular,
$$P\Big(\max_{s\preceq t} |S_s(\eta)| \ge \alpha\Lambda(t) + 2^{N+2}\sqrt{\langle t\rangle}\Big) \le 2^{N+1}\exp\Big(-\frac{\alpha^2}{1+\alpha\eta}\,\ln_+\ln_+\langle t\rangle\Big) \le 2^{N+1}\big(\ln_+\langle t\rangle\big)^{-\frac{\alpha^2}{1+\alpha\eta}}, \tag{2}$$
and the last inequality is an equality if $\ln_+\ln_+\langle t\rangle = \ln\ln\langle t\rangle$. This is the desired probability estimate; we will use it via a blocking argument.

To define the blocks, let us fix some δ > 0, and for all $t\in\mathbf{N}^N$, let $r_t\in\mathbf{N}^N$ be the point
$$r_t^{(j)} = \big\lceil \exp(\delta t^{(j)}) \big\rceil, \qquad j = 1, \ldots, N. \tag{3}$$
Note that $\ln\langle r_t\rangle \ge \delta\sum_{j=1}^N t^{(j)} - \delta$. In light of (2), we deduce the existence of a constant C such that
$$\sum_{t\in\mathbf{N}^N} P\Big(\max_{s\preceq r_t} |S_s(\eta)| \ge \alpha\Lambda(r_t) + 2^{N+2}\sqrt{\langle r_t\rangle}\Big) \le C\sum_{t\in\mathbf{N}^N}\Big(\sum_{j=1}^N t^{(j)}\Big)^{-\frac{\alpha^2}{1+\alpha\eta}} = C\sum_{k=1}^{\infty} k^{-\frac{\alpha^2}{1+\alpha\eta}}\, R_k^{(N)},$$
where $R_k^{(N)}$ was defined in equation (1), Section 2.8. Thanks to Exercise 2.8.4, these sums converge if
$$\frac{\alpha^2}{1+\alpha\eta} > N. \tag{4}$$
We choose α, η such that they satisfy equation (4) and deduce, from the Borel–Cantelli lemma, that almost surely,
$$\max_{s\preceq r_t} |S_s(\eta)| \le \alpha\Lambda(r_t) + 2^{N+2}\sqrt{\langle r_t\rangle}, \qquad\text{for all but finitely many } t\in\mathbf{N}^N.$$
Now, to blocking: For any $\nu\in\mathbf{N}^N$, we can find $t\in\mathbf{N}^N$ such that $r_t \preceq \nu \preceq r_{t+1}$, where $(t+1)^{(j)} = t^{(j)} + 1$. Thus, with probability one, for all but a finite number of ν's in $\mathbf{N}^N$,
$$S_\nu(\eta) \le \max_{s\preceq r_{t+1}} |S_s(\eta)| \le \alpha\Lambda(r_{t+1}) + 2^{N+2}\sqrt{\langle r_{t+1}\rangle} \le \alpha e^{\delta N/2}(1 + o_\nu)\,\Lambda(\nu),$$
where $o_\nu \to 0$ as ν → ∞. In other words, assuming that condition (4) holds, $\limsup_{\nu\to\infty} |\Lambda(\nu)|^{-1} S_\nu(\eta) \le \alpha e^{\delta N/2}$, a.s. Since δ is arbitrary, we can let δ → 0 along a rational sequence and see that this lim sup is bounded above by α. We can even let α converge to the positive solution of $\alpha^2 = N(1+\alpha\eta)$ and not change the validity of this bound. This verifies equation (1) and completes our proof of the upper bound in the LIL.
2.10 A Moderate Deviations Estimate

To prove the lower half of the LIL, we need the following one-parameter result, which states that at least half of Lemma 2.1.2 holds under the conditions of the central limit theorem alone. This is, in general, sharp; see Supplementary Exercise 5 below.
Lemma 2.10.1 Let $S = (S_k;\ k\ge 1)$ denote an R-valued random walk with increments $X = (X_k;\ k\ge 1)$. If $E[X_1] = 0$ and $E[X_1^2] = 1$, then for all α > 0,
$$\liminf_{k\to\infty} \frac{1}{\ln\ln k}\,\ln P\big(S_k \ge \alpha\sqrt{2k\ln\ln k}\big) \ge -\alpha^2.$$
Proof To clarify the essential ideas for this proof, we first present a rough outline. This discussion will be made rigorous subsequently. Fix some t > 0 and, working informally, we can write $S_k = Y_1 + \cdots + Y_{t\ln\ln k}$, where $Y_1 = S_{k/(t\ln\ln k)}$, and for all $j\in\{2,\ldots,t\ln\ln k\}$, $Y_j = S_{jk/(t\ln\ln k)} - Y_{j-1} - \cdots - Y_1$. Note that $Y_1, Y_2, \ldots, Y_{t\ln\ln k}$ are independent and all have the same distribution as $S_{k/(t\ln\ln k)}$. (This is where our proof is rough; it is rigorous only if $t\ln\ln k$ and $k/(t\ln\ln k)$ are both integers. Nonetheless, this idea of "blocking" the sum into independent blocks is correct in spirit.)

Clearly, if all the $Y_j$'s are larger than $\frac{\alpha}{t}\sqrt{2k/\ln\ln k}$, then $S_k \ge \alpha\sqrt{2k\ln\ln k}$. Thus, using the i.i.d. structure of the Y's, we can estimate probabilities as follows:
$$P\big(S_k \ge \alpha\sqrt{2k\ln\ln k}\big) \ge \Big[P\Big(Y_1 \ge \frac{\alpha}{t}\sqrt{\frac{2k}{\ln\ln k}}\Big)\Big]^{t\ln\ln k} = \Big[P\Big(S_{k/(t\ln\ln k)} \ge \frac{\alpha}{t}\sqrt{\frac{2k}{\ln\ln k}}\Big)\Big]^{t\ln\ln k}.$$
Let $\mathsf{N}$ denote a standard Gaussian random variable. By the central limit theorem, for all ε > 0 there exists $k_0$ such that for all $k\ge k_0$,
$$P\Big(S_{k/(t\ln\ln k)} \ge \frac{\alpha}{t}\sqrt{\frac{2k}{\ln\ln k}}\Big) \ge P\Big(\mathsf{N} \ge \alpha\sqrt{\frac{2}{t}}\Big) - \varepsilon.$$
Consequently, for all ε > 0,
$$\liminf_{k\to\infty} \frac{1}{\ln\ln k}\,\ln P\big(S_k \ge \alpha\sqrt{2k\ln\ln k}\big) \ge t\,\ln\Big[P\Big(\mathsf{N} \ge \alpha\sqrt{\frac{2}{t}}\Big) - \varepsilon\Big].$$
We can let ε → 0, then t → 0, and appeal to standard tail estimates, e.g., Supplementary Exercise 11, to finish.

As we have already seen, the above is a complete proof, except that $t\ln\ln k$ and/or $k/(t\ln\ln k)$ need not be integer-valued for some choices of t and k. To get around this, fix t > 0 and define for all k ≥ 4,
$$a_k = \Big\lfloor \frac{k}{t\ln\ln k} \Big\rfloor, \qquad b_k = \lfloor t\ln\ln k \rfloor.$$
Since $a_k b_k \le k$ for all k large enough, we can then write $S_k = Y_1 + \cdots + Y_{b_k} + \big(S_k - (Y_1+\cdots+Y_{b_k})\big)$, where $Y_1 = S_{a_k}$, and for all $2\le j\le b_k$, $Y_j = S_{ja_k} - Y_{j-1} - \cdots - Y_1$.
For all δ > 0,
$$P\Big(S_k \ge \frac{\alpha-\delta}{t}\sqrt{\frac{2k}{\ln\ln k}}\,b_k\Big) \ge \Big[P\Big(Y_1 \ge \frac{\alpha}{t}\sqrt{\frac{2k}{\ln\ln k}}\Big)\Big]^{b_k}\,P\Big(\big|S_k - (Y_1+\cdots+Y_{b_k})\big| \le \delta\sqrt{2k\ln\ln k}\Big).$$
On the other hand, $S_k - (Y_1+\cdots+Y_{b_k})$ has the same distribution as $S_{k-a_kb_k}$, which has mean zero and variance $k - a_kb_k$. By Chebyshev's inequality,
$$P\Big(\big|S_k - (Y_1+\cdots+Y_{b_k})\big| \le \delta\sqrt{2k\ln\ln k}\Big) \ge 1 - \frac{k - a_kb_k}{2\delta^2 k\ln\ln k}.$$
Since $a_k \ge k(t\ln\ln k)^{-1} - 1$ and $b_k \ge t\ln\ln k - 1$,
$$\frac{k - a_kb_k}{k\ln\ln k} \le \frac{1}{t(\ln\ln k)^2} + \frac{t}{k}.$$
That is, $P(|S_k - (Y_1+\cdots+Y_{b_k})| \le \delta\sqrt{2k\ln\ln k})$ goes to 1 as k → ∞. By changing α to α + δ, we obtain
$$\liminf_{k\to\infty} \frac{1}{b_k}\,\ln P\Big(S_k \ge \frac{\alpha}{t}\sqrt{\frac{2k}{\ln\ln k}}\,b_k\Big) \ge \liminf_{k\to\infty} \ln P\Big(S_{a_k} \ge \frac{\alpha+\delta}{t}\sqrt{\frac{2k}{\ln\ln k}}\Big) = \ln P\Big(\mathsf{N} \ge (\alpha+\delta)\sqrt{\frac{2}{t}}\Big),$$
thanks to the central limit theorem and the fact that $\lim_{k\to\infty}(a_k\ln\ln k/k) = t^{-1}$. Since $\lim_{k\to\infty}(b_k/\ln\ln k) = t$, for all $\alpha' < \alpha$ and all t > 0,
$$\liminf_{k\to\infty} \frac{1}{\ln\ln k}\,\ln P\big(S_k \ge \alpha'\sqrt{2k\ln\ln k}\big) \ge t\,\ln P\Big(\mathsf{N} \ge (\alpha+\delta)\sqrt{\frac{2}{t}}\Big).$$
As t → 0 we arrive at the following: For all $0 < \alpha' < \alpha$ and all δ > 0,
$$\liminf_{k\to\infty} \frac{1}{\ln\ln k}\,\ln P\big(S_k \ge \alpha'\sqrt{2k\ln\ln k}\big) \ge -(\alpha+\delta)^2.$$
Letting δ ↓ 0 and α ↓ α′, this proves the result with α′ replacing α.
2.11 Proof of the LIL: The Lower Bound

To this end, let us fix some δ > 2 and for all $r\in\mathbf{N}^N$, define
$$B(r) = \Big\{ t\in\mathbf{N}^N : \text{for all } 1\le j\le N,\ e^{\delta r^{(j)}} \le t^{(j)} < e^{\delta(r^{(j)}+1)} \Big\}. \tag{1}$$
Note the appearance of $r_t$ in the above, where $r_t$ were defined in equation (3) in our proof of the upper bound of the LIL (Section 2.9). Whenever $r, t\in\mathbf{N}^N$ satisfy $t\in B(r)$, then two things must happen:

1. $\exp\big(\delta\sum_{j=1}^N r^{(j)}\big) \le \langle t\rangle \le \exp\big(\delta N + \delta\sum_{j=1}^N r^{(j)}\big)$; and
2. $t \preceq L(r)$, where L(r) is defined to be the point whose jth coordinate is $L^{(j)}(r) = \lfloor \exp(\delta + \delta r^{(j)}) \rfloor$.

In words, L(r) is the unique largest element of B(r) with respect to the partial order ≼. It is important also to recognize that
$$\#B(r) = \big(1+o(1)\big)\big(1 - e^{-\delta}\big)^{N}\,\langle L(r)\rangle \qquad\Big(\textstyle\sum_{j=1}^N r^{(j)}\to\infty\Big), \tag{2}$$
where #B(r) denotes the cardinality of B(r).

Fix η ∈ ]0,1[ and define α > 0 by
$$\alpha = \sqrt{\frac{N}{1+\eta}}. \tag{3}$$
By Lemma 2.10.1, there exists an integer $k_0$ (which may depend on η and δ) such that whenever $\#B(r)\ge k_0$,
$$P\Big(\sum_{t\in B(r)} X_t \ge \alpha\Lambda\big(\#B(r)\big)\Big) \ge \big[\ln \#B(r)\big]^{-(1+\eta)\alpha^2}.$$
Also, by equation (2), $\ln\#B(r) = (1+o_r)\,\delta\sum_{j=1}^N r^{(j)}$, where $o_r\to 0$ as $\sum_{j=1}^N r^{(j)}\to\infty$. Therefore, there exists a constant $C_0$ such that for all but finitely many r,
$$P\Big(\sum_{t\in B(r)} X_t \ge \alpha\Lambda\big(\#B(r)\big)\Big) \ge C_0\Big(\sum_{j=1}^N r^{(j)}\Big)^{-(1+\eta)\alpha^2} = C_0\Big(\sum_{j=1}^N r^{(j)}\Big)^{-N},$$
thanks to equations (2) and (3). Consequently, there exist constants $C_1$ and $C_2$ such that
$$\sum_{r\in\mathbf{N}^N} P\Big(\sum_{t\in B(r)} X_t \ge \alpha\Lambda\big(\#B(r)\big)\Big) \ge C_1 \sum_{k\ge C_2} k^{-N} R_k^{(N)},$$
where $R_k^{(N)}$ is defined by equation (1) of Section 2.8. By Exercise 2.8.4, $\sum_{r\in\mathbf{N}^N} P(E_r) = +\infty$, where $E_r$ is the event that $\sum_{t\in B(r)} X_t \ge \alpha\Lambda(\#B(r))$. Since the $E_r$'s are independent events, the Borel–Cantelli lemma tells us that infinitely many of the $E_r$'s occur with probability one. To recapitulate, for all δ > 2 and all α, η > 0 satisfying equation (3),
$$\limsup_{\#B(s)\to\infty} \frac{\sum_{t\in B(s)} X_t}{\Lambda\big(\#B(s)\big)} \ge \alpha, \qquad\text{a.s.}$$
By equation (2), almost surely,
$$\limsup_{L(s)\to\infty} \frac{\sum_{t\in B(s)} X_t}{\Lambda\big(\langle L(s)\rangle\big)} \ge \alpha\big(1 - e^{-\delta}\big)^{N/2}. \tag{4}$$
Pick any $s\in\mathbf{N}^N$ such that s is very large, and note that
$$S_{L(s)} - \sum_{t\in B(s)} X_t \le {\sum_r}^{(s)}\, \big|S_{L(r)}\big|, \tag{5}$$
where the sum $\sum_r^{(s)}$ is taken over all $r\in\mathbf{N}^N$ such that for all but one $1\le j\le N$, $r^{(j)} = s^{(j)}$; for the one exceptional j, $r^{(j)} = s^{(j)} - 1$. (There are N summands in this sum.) By the already-proven upper bound to the LIL (Section 2.9),
$${\sum_r}^{(s)}\, \big|S_{L(r)}\big| \le (1+\theta_s)\,N^{3/2}\,\Lambda\big(\langle L(r)\rangle\big) \le (1+\kappa_s)\,N^{3/2}e^{-N\delta/2}\,\Lambda\big(\langle L(s)\rangle\big),$$
where $\theta_s, \kappa_s \to 0$ as s → ∞. By equations (3)–(5), the following holds, a.s.:
$$\limsup_{t\to\infty} \frac{S_t}{\Lambda(t)} \ge \limsup_{s\to\infty} \frac{S_{L(s)}}{\Lambda\big(\langle L(s)\rangle\big)} \ge \alpha\big(1-e^{-\delta}\big)^{N/2} - N^{3/2}e^{-N\delta/2} = \sqrt{\frac{N}{1+\eta}}\,\big(1-e^{-\delta}\big)^{N/2} - N^{3/2}e^{-N\delta/2}.$$
Moreover, this holds a.s., simultaneously for all rational numbers δ > 2 and η > 0. The desired lower bound to the LIL follows upon letting δ → ∞ and η → 0, in this order, and along rational sequences.
3 Supplementary Exercises

1. Prove that as n → ∞, $\sum_{k=1}^n D_k^{(N)}\big/\big(n(\ln n)^{N-1}\big)$ converges to 1.

2. Suppose $X_1, X_2, \ldots$ are i.i.d. random variables such that with probability one, $\limsup_{n\to\infty}(n\ln\ln n)^{-1/2}\big|\sum_{j=1}^n X_j\big| < \infty$. We wish to show that this implies $E[X_1^2] < \infty$. That is, in the one-parameter setting, the LIL is equivalent to the existence of two finite moments.
(i) Prove there exists a nonrandom M > 0 such that the above lim sup is bounded above by M.
(ii) Prove that it suffices to show this when the X's are symmetric.
(iii) Suppose the X's are symmetric, fix c > 0, and define $\bar X_i = X_i \mathbf{1}_{(|X_i|\le c)} - X_i \mathbf{1}_{(|X_i|>c)}$. Show that $(\bar X_i;\ i\ge 1)$ is a copy of $(X_i;\ i\ge 1)$, i.e., that $\bar X$ and X have the same finite-dimensional distributions.
(iv) Prove that almost surely, $\limsup_{n\to\infty}(n\ln\ln n)^{-1/2}\big|\sum_{j=1}^n X_j \mathbf{1}_{(|X_j|\le c)}\big| \le 2M$.
(v) Use the LIL to show that $E[X_1^2 \mathbf{1}_{(|X_1|\le c)}] \le 4M^2$ and conclude that $E[X_1^2] < \infty$.
This result is due to Strassen (1966), while the above-sketched proof is from Feller (1969). (Hint: For part (ii), introduce independent copies $X_i'$ of $X_i$ and consider $X_i - X_i'$ instead. For part (iv), note that $2X\mathbf{1}_{(|X|\le c)} = X + \bar X$.)

3. Let $S = (S_k;\ k\ge 0)$ denote the simple walk on Z, starting from 0 ∈ Z. That is, the increments of S are ±1 with probability ½ each. Prove that for any n ≥ 1, $\max_{1\le k\le n} S_k$ has the same distribution as $|S_n|$. This is the reflection principle of D. André; see Feller (1968). (Hint: Study the described proof of Lemma 2.7.2 carefully.)

4. Suppose $(X_j;\ j\ge 0)$ are i.i.d. random variables with mean 0 and variance 1. If $S_n = \sum_{j=1}^n X_j$, prove the following moderate small-deviations bound: For each λ > 0,
$$\limsup_{n\to\infty} \frac{1}{\ln\ln n}\,\ln P\Big(\max_{1\le k\le n}|S_k| \le \lambda\sqrt{\frac{n}{\ln\ln n}}\Big) \le \ln P(|\mathsf{N}| \le 2\lambda),$$
where $\mathsf{N}$ is a standard Gaussian random variable. Conclude that with probability one, $\liminf_{n\to\infty}\sqrt{\ln\ln n/n}\,\max_{1\le k\le n}|S_k| > 0$. This is a part of the statement of Chung (1948).

5. (Hard) Our goal, here, is to show the following moderate deviations principle: If $V, V_1, V_2, \ldots$ are i.i.d. random variables with mean zero and variance one, and if they have a finite moment generating function in a neighborhood of the origin, then for any sequence $x_n$ tending to infinity such that $\lim_{n\to\infty} n^{-1/2}x_n = 0$,
$$\lim_{n\to\infty} \frac{1}{x_n^2}\,\ln P\big(V_1 + \cdots + V_n \ge \sqrt{n}\,x_n\big) = -\frac12.$$
(i) First, show that even without the assumption on the local existence of a moment generating function, $\liminf_{n\to\infty} x_n^{-2}\ln P(V_1+\cdots+V_n \ge \sqrt{n}\,x_n) \ge -\frac12$.
(ii) Use a blocking argument to show that under the stated conditions of the exercise, the corresponding lim sup is ≤ −½.
(Hint: For (i), mimic the derivation of Lemma 2.10.1. For part (ii), start, as in the first part, by blocking $\sum_{j\le n} V_j = \sum_{j\le a_n} Y_{j,n}$, where $Y_{1,n}, \ldots, Y_{a_n,n}$ are i.i.d. random variables, and $a_n$ is chosen carefully. Then, write
$$P\Big(\sum_{j\le n} V_j \ge \sqrt{n}\,x_n\Big) = P\Big(\sum_{j\le a_n} Z_{j,n} \ge a_n\Big),$$
where $Z_{j,n} = a_n(nx_n^2)^{-1/2}\, Y_{j,n}$. Use the exponential form of Chebyshev's inequality, as was done for Bernstein's inequality (Proposition 2.6.1), together with the central limit theorem.)

6. (Hard) Prove that Kronecker's lemma (Lemma 2.3.1) remains true even if the $x_s$'s are no longer assumed to be nonnegative. When N = 1, this is the classical and standard form of Kronecker's lemma. When N ≥ 2, this is due to Cairoli and Dalang (1996, Lemma 2, Ch. 1). (Hint: First, do this for N = 1. Then, proceed by induction on N, using the inclusion–exclusion formula, Lemma 1.2.1.)

7. Suppose $\varepsilon_1, \varepsilon_2, \ldots$ are i.i.d. Rademacher random variables. That is, $P(\varepsilon_1 = 1) = P(\varepsilon_1 = -1) = \frac12$.
(i) If $a_1, a_2, \ldots$ is an arbitrary sequence of numbers, prove that with probability one, $\limsup_{n\to\infty}(2\upsilon_n\ln\ln \upsilon_n)^{-1/2}\sum_{j=1}^n \varepsilon_j a_j = 1$, where $\upsilon_n = \sum_{j=1}^n a_j^2$.
(ii) Suppose $X_1, X_2, \ldots$ are i.i.d. and are totally independent of the sequence of ε's. Conclude that $\limsup_{n\to\infty}(2V_n\ln\ln V_n)^{-1/2}\sum_{j=1}^n \varepsilon_j X_j = 1$, where $V_n = \sum_{j=1}^n X_j^2$.
(iii) Prove that if the $X_i$'s are symmetric, $\limsup_{n\to\infty}(2V_n\ln\ln V_n)^{-1/2}\sum_{j=1}^n X_j = 1$, almost surely.
(iv) Show that even without symmetry, $\limsup_{n\to\infty}(V_n\ln\ln V_n)^{-1/2}\sum_{j=1}^n X_j > 0$, almost surely. (Hint: Symmetrize.)
(v) Verify that when the $X_i$'s are symmetric and have variance 1, the LIL follows.
Derive an N-parameter version of this result. This is an example of a "self-normalized LIL" and is essentially due to Griffin and Kuelbs (1991). Roughly speaking, it states that in the symmetric case, the LIL is a consequence of the strong law.

8. In the context of Smythe's law of large numbers, show that whenever $E[|X_0|^p] < \infty$ for some p > 1, $A_t$ converges to $E[X_0]$ in $L^p(P)$.

9. In the context of Smythe's law of large numbers, show that for any p > 1, $E[\sup_t |A_t|^p] < \infty$ if and only if $E[|X_0|^p] < \infty$, whereas $E[\sup_t |A_t|] < \infty$ if and only if $E[|X_0|\{\ln_+ |X_0|\}^N] < \infty$. (Note the power of N and not N − 1.) This is, in various forms, due to (Burkholder 1962; Gabriel 1977; Gut 1979); see also Cairoli and Dalang (1996, Theorem 2.4.2, Chapter 2).

10. The moment conditions of the strong law and the LIL are crucial, as the following shows. Recall that a random variable Y has a standard Cauchy distribution if it has the probability density function (with respect to Lebesgue's measure) $f(x) = \frac{1}{\pi}(1+x^2)^{-1}$, x ∈ R. Given i.i.d. (standard) Cauchy random variables $X_1, X_2, \ldots$, define $S_n = \sum_{j=1}^n X_j$ to be the corresponding random walk.
(i) Verify that for each n ≥ 1, $\frac1n S_n$ has a standard Cauchy distribution. (Hint: Consider characteristic functions.)
(ii) (Hard) Suppose $a_1, a_2, \ldots$ is a nondecreasing sequence of positive reals such that $\lim_n a_n = +\infty$. Prove that the following dichotomy holds:
$$P(S_n > na_n \text{ infinitely often}) = \begin{cases} 0, & \text{if } \sum_n a_{2^n}^{-1} < \infty, \\ 1, & \text{if } \sum_n a_{2^n}^{-1} = +\infty. \end{cases}$$
(iii) Conclude that for any nondecreasing function h, $\limsup_{n\to\infty} S_n/\{nh(n)\}$ is 0 or infinity (a.s.).
(iv) Derive an N-parameter version of this "summability test."

11. If $\mathsf{N}$ is standard Gaussian, prove that for all λ > 0, $P(|\mathsf{N}| \ge \lambda) \le 2e^{-\lambda^2/2}$. Moreover, prove that $\lim_{\lambda\to\infty} \lambda e^{\lambda^2/2}\, P(|\mathsf{N}| \ge \lambda) = \sqrt{2/\pi}$.
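Exercise 11's bounds are easy to confirm numerically; the sketch below (an illustration, not part of the text) uses the identity $P(|\mathsf{N}| \ge \lambda) = \mathrm{erfc}(\lambda/\sqrt{2})$:

```python
import math

def two_sided_tail(lam):
    # P(|N| >= lam) for a standard Gaussian N, via the complementary error function
    return math.erfc(lam / math.sqrt(2))

# The bound P(|N| >= lam) <= 2 exp(-lam^2/2) for several lambdas:
for lam in (0.5, 1.0, 2.0, 4.0, 8.0):
    assert two_sided_tail(lam) <= 2 * math.exp(-lam ** 2 / 2)

# The limit lam * e^{lam^2/2} * P(|N| >= lam) -> sqrt(2/pi):
lam = 20.0
ratio = lam * math.exp(lam ** 2 / 2) * two_sided_tail(lam)
assert abs(ratio - math.sqrt(2 / math.pi)) < 1e-2
```

At λ = 20 the ratio already agrees with $\sqrt{2/\pi} \approx 0.7979$ to two decimal places, consistent with the $1 - O(\lambda^{-2})$ rate of convergence.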
4 Notes on Chapter 4

Section 1 The proof given here of Smythe's law of large numbers is close to those in (Edgar and Sucheston 1992; Smythe 1973; Walsh 1986b). For a survey of some of these results (five of them, to be exact), see Smythe (1974a). To see further related results, refer to (Bass and Pyke 1984c; Kwon 1994; Pyke 1973; Shorack and Smythe 1976; Smythe 1974b). The method of reduction to the one-parameter setting is, in fact, a metatheorem; cf. Sucheston (1983). Alternative derivations of Smythe's law can be found in (Cairoli and Dalang 1996; Khoshnevisan 2000). There is a rich multiparameter ergodic theory that essentially runs parallel to the limit theory of this chapter. See, for example, (Krengel and Pyke 1987; Frangos and Sucheston 1986; Sucheston 1983; Stein 1961; Ricci and Stein 1992).

Section 2 The one-parameter law of the iterated logarithm has a long and distinguished history. The literature is so large that it would be impossible to give an extensive bibliography here. Ledoux and Talagrand (1991) and Stout (1974) are good sources for further reading in different directions. For two particularly beautiful works about this general topic, see (Erdős 1942; Rogers and Taylor 1962). There are many LILs that are related to the material of this chapter and, in fact, the entire book. As a sampler, we mention (Bass 1985; Bass and Pyke 1984b; Li and Wu 1989; Paranjape and Park 1973; Park 1975; Wichura 1973). We have already seen that when N = 1, the LIL holds if and only if the increments of the random walk have a finite second moment. When N ≥ 2, the condition $L^2\{\ln L\}^{N-1}$ can be improved to the following necessary and sufficient condition:
$$E\bigg[\,|X|^2\,\frac{(\ln_+|X|)^{N-1}}{\ln_+\ln_+|X|}\,\bigg] < +\infty.$$
See Wichura (1973); for an infinite-dimensional extension, see Li and Wu (1989). It is intriguing that while the strong law always holds iff the increments are in $L(\ln_+ L)^{N-1}$, the necessary and sufficient condition for the existence of the LIL depends on whether or not the number of parameters, N, is greater than or equal to 2!
The proof of the LIL given in this section is the most elegant one that I know, and is a multiparameter adaptation of de Acosta (1983). Lemma 2.10.1 is borrowed from de Acosta (1983) and de Acosta and Kuelbs (1981). Moreover, there are related inequalities to those that appear here that you can learn from (Etemadi 1991; Pyke 1973; Shorack and Smythe 1976). Section 3 Supplementary Exercise 5 is the starting point for a number of interesting directions of work on random walks. You can explore related themes under the general headings of the law of the iterated logarithm and moderate (and sometimes large) deviations in (Dembo and Zeitouni 1998; Petrov 1995), for instance.
5 Gaussian Random Variables
Classical probability theory has shown us that Rd -valued Gaussian random variables appear naturally as limits of random walks that take their values in Rd . Later on, in Chapter 6, we shall see that such limit theorems are part of an elegant abstract theory. At the heart of such a theory is an understanding of Gaussian variables that take values in a rather general state space. This chapter is concerned with the development of a theory of Gaussian variables that is sufficiently general for our later needs.
1 The Basic Construction In this section we will construct a Gaussian random variable that takes values in the space of continuous linear functionals on a certain separable Hilbert space. This construction is sufficiently general to yield many interesting objects such as “Brownian motion,” “Brownian sheet,” and “white noise.” We begin our discussion with preliminaries on finite-dimensional Gaussian random variables.
1.1 Gaussian Random Vectors

An $\mathbf{R}^d$-valued random variable g is said to be a Gaussian random variable (or equivalently a Gaussian random vector) if there are a vector $\mu\in\mathbf{R}^d$ and a matrix $\Sigma\in\mathbf{R}^{d\times d}$ such that for all $t\in\mathbf{R}^d$, $g\cdot t = \sum_{j=1}^d t^{(j)} g^{(j)}$ is an R-valued Gaussian random variable with mean
$\mu\cdot t$ and variance $t'\Sigma t$; here and throughout, we view t as a column vector, and t′ denotes the transpose of t. Equivalently, g is a Gaussian random variable if for all $t\in\mathbf{R}^d$ and $\xi\in\mathbf{R}$,
$$E\big[\exp\{i\xi(g\cdot t)\}\big] = \exp\Big(i\xi\sum_{j=1}^d t^{(j)}\mu^{(j)} - \frac{\xi^2}{2}\sum_{j=1}^d\sum_{k=1}^d t^{(j)}t^{(k)}\Sigma^{(j,k)}\Big).$$
The vector µ is called the mean vector, and Σ is called the covariance matrix of g. Since for all $t\in\mathbf{R}^d$, $0 \le \mathrm{Var}(g\cdot t) = t'\Sigma t$, Σ is always a nonnegative definite matrix. Throughout this book we will consistently write $g \sim N_d(\mu, \Sigma)$ to mean that g is an $\mathbf{R}^d$-valued Gaussian random vector with mean vector µ and covariance matrix Σ. In particular, $N_1(0,1)$ corresponds to the standard Gaussian distribution on the real line.

Exercise 1.1.1 If Σ is nonsingular, show that the distribution of g is absolutely continuous with respect to Lebesgue's measure and its density at $x\in\mathbf{R}^d$ is
$$\frac{1}{(2\pi)^{d/2}\sqrt{\det\Sigma}}\,\exp\Big(-\frac12 (x-\mu)'\Sigma^{-1}(x-\mu)\Big),$$
where $\Sigma^{-1}$ (respectively det Σ) denotes the inverse (respectively determinant) of Σ.

Exercise 1.1.2 (Mill's Ratios) Suppose that d = 1 and that the mean and variance of g are 0 and 1, respectively. Use the explicit form of the density of g (Exercise 1.1.1) and L'Hôpital's rule to deduce that as λ → ∞,
$$\lambda e^{\frac12\lambda^2}\, P(g > \lambda) \to (2\pi)^{-\frac12}.$$
Conclude the existence of a finite constant C > 1 such that for all λ > 1, $C^{-1}\lambda^{-1}\exp(-\frac12\lambda^2) \le P(g > \lambda) \le C\lambda^{-1}\exp(-\frac12\lambda^2)$. In this regard, see also Supplementary Exercise 11, Chapter 4.

In order to better understand the roles of µ and Σ, fix some $i\in\{1,\ldots,d\}$ and choose $t\in\mathbf{R}^d$ according to
$$t^{(j)} = \begin{cases} 1, & \text{if } j = i, \\ 0, & \text{otherwise}, \end{cases} \qquad j = 1, \ldots, d.$$
For this choice of t we have $g\cdot t = g^{(i)} \sim N_1(\mu^{(i)}, \Sigma^{(i,i)})$. In other words, µ is the vector of the means of g, viewed coordinatewise, and the diagonal of Σ is the vector of the variances of g, also viewed coordinatewise. To identify the off-diagonal elements of Σ, fix distinct $i, j\in\{1,\ldots,d\}$, and choose $t\in\mathbf{R}^d$ as
$$t^{(\ell)} = \begin{cases} 1, & \text{if } \ell = i, \\ -1, & \text{if } \ell = j, \\ 0, & \text{otherwise}, \end{cases} \qquad \ell = 1, \ldots, d.$$
We see that $g\cdot t = g^{(i)} - g^{(j)}$ is an R-valued Gaussian random variable with variance $\Sigma^{(i,i)} + \Sigma^{(j,j)} - \Sigma^{(i,j)} - \Sigma^{(j,i)}$. Furthermore, thanks to the preceding paragraph, $\Sigma^{(\ell,\ell)} = \mathrm{Var}[g^{(\ell)}]$. Thus,
$$\mathrm{Var}[g^{(i)} - g^{(j)}] = \mathrm{Var}[g^{(i)}] + \mathrm{Var}[g^{(j)}] - \{\Sigma^{(i,j)} + \Sigma^{(j,i)}\}.$$
Equivalently, $\mathrm{Cov}[g^{(i)}, g^{(j)}] = \bar\Sigma^{(i,j)}$, where $\bar\Sigma = \frac12\{\Sigma + \Sigma'\}$ is the symmetrization of Σ. An important property of matrix symmetrization is that it preserves quadratic forms; i.e.,
$$t'\bar\Sigma t = t'\Sigma t, \qquad t\in\mathbf{R}^d.$$
(Check!) In other words, whenever $g\sim N_d(\mu,\Sigma)$, then $g\sim N_d(\mu,\bar\Sigma)$. Moreover, one can determine $\bar\Sigma$ uniquely as the symmetric matrix of the covariances of g. From now on, whenever we write $g\sim N_d(\mu,\Sigma)$, µ and Σ designate the mean vector and matrix of the covariances of g, respectively. This justifies the use of the terms "mean vector" and "covariance matrix." According to the above discussion, we can assume, without loss of generality, that Σ is a symmetric matrix, and we have also seen that Σ is positive definite. It turns out that these properties of Σ together completely characterize $\mathbf{R}^d$-valued Gaussian random variables in the following distributional sense.

Theorem 1.1.1 Given any vector $\mu\in\mathbf{R}^d$ and any symmetric, positive definite matrix $\Sigma\in\mathbf{R}^{d\times d}$, one can construct (on some probability space) an $\mathbf{R}^d$-valued random variable $g\sim N_d(\mu,\Sigma)$. Conversely, if $g\sim N_d(\mu,\Sigma)$, then Σ can be taken to be symmetric and is always real and positive definite.

Proof Since the converse part was proved in the previous paragraph, we proceed by deriving the direct half. Suppose $\mu\in\mathbf{R}^d$ and Σ is a symmetric, positive definite (d×d) matrix. By elementary matrix algebra, $\Sigma = P'\Lambda P$, where P is a real, orthogonal (d×d) matrix and Λ is a diagonal (d×d) matrix of eigenvalues with $\Lambda^{(i,i)}\ge 0$ for all $1\le i\le d$. Let $A = \Lambda^{1/2}P$, where $\Lambda^{1/2}$ denotes the (d×d) diagonal matrix whose (i,i)th coordinate is $\sqrt{\Lambda^{(i,i)}}$, and note that A is a real matrix that satisfies $\Sigma = A'A$. Let Z be an $\mathbf{R}^d$-valued vector of independent standard Gaussian random variables constructed on some probability space and define $g = \mu + A'Z$. By directly computing the characteristic function of $t\cdot g$ for $t\in\mathbf{R}^d$, we can see that $g\sim N_d(\mu,\Sigma)$, as desired.

The following is an important consequence of the explicit form of the characteristic function of $g\cdot t$:

Corollary 1.1.1 Suppose $g\sim N_d(\mu,\Sigma)$, where $\Sigma^{(i,j)} = 0$ for $i\ne j$. Then $g^{(1)},\ldots,g^{(d)}$ are independent, R-valued, Gaussian random variables.
Exercise 1.1.3 Prove Corollary 1.1.1.
Exercise 1.1.4 The family of Gaussian distributions is closed under convergence in distribution. More precisely, suppose $g_1, g_2, \ldots$ are mean-zero Gaussian random variables in R that converge in distribution to some g. Prove that g is a Gaussian random variable.
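The construction in the proof of Theorem 1.1.1 — find a real matrix A with $\Sigma = A'A$ and set $g = \mu + A'Z$ — is easy to try numerically. The sketch below (an illustration, not from the text) uses a Cholesky factorization in place of the spectral decomposition $P'\Lambda P$; any real A with $\Sigma = A'A$ serves equally well.

```python
import math
import random

def cholesky_lower(sigma):
    """Lower-triangular L with sigma = L L' (sigma symmetric, positive definite)."""
    d = len(sigma)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sigma[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def sample_gaussian(mu, L, rng):
    # g = mu + L z with z a vector of independent N(0,1)'s; then
    # Cov(g) = L L' = sigma, exactly as in the proof of Theorem 1.1.1.
    z = [rng.gauss(0.0, 1.0) for _ in mu]
    return [mu[i] + sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(len(mu))]

rng = random.Random(0)
mu, sigma = [1.0, -2.0], [[2.0, 0.8], [0.8, 1.0]]
L = cholesky_lower(sigma)
draws = [sample_gaussian(mu, L, rng) for _ in range(100_000)]
cov01 = sum((g[0] - mu[0]) * (g[1] - mu[1]) for g in draws) / len(draws)
assert abs(cov01 - sigma[0][1]) < 0.03  # empirical covariance matches Sigma
```

The particular means, covariance matrix, and sample size above are arbitrary choices for illustration.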
1.2 Gaussian Processes

Given an abstract index set T, a process $X = (X_t;\ t\in T)$ is a real-valued Gaussian process if for all finite choices of $t_1,\ldots,t_k\in T$, $(X_{t_1},\ldots,X_{t_k})$ is an $\mathbf{R}^k$-valued Gaussian random variable. To each such Gaussian process we can associate a mean function $\mu(t) = E[X_t]$ (t ∈ T) and a covariance function $\Sigma(s,t) = E[X_sX_t]$ (s, t ∈ T). When T is a measurable subset of $\mathbf{R}^N$, we may also refer to X as a Gaussian random field to conform to the historical use of this term.

The intention of this subsection is to decide when a Gaussian random process exists. Since Gaussian random vectors are, in a trivial sense, random processes, this description should be viewed as a natural extension of Theorem 1.1.1.

We say that a function $f : T\times T\to\mathbf{R}$ is symmetric if for all s, t ∈ T, f(s,t) = f(t,s). It is positive definite if for all $s_1,\ldots,s_n\in T$ and all $\xi_1,\ldots,\xi_n\in\mathbf{R}$,
$$\sum_{i=1}^n\sum_{j=1}^n \xi_i f(s_i, s_j)\xi_j \ge 0. \tag{1}$$
We first state two elementary results.

Lemma 1.2.1 Any symmetric, positive definite function f is nonnegative on the diagonal in the sense that f(x,x) ≥ 0 for all x.

Exercise 1.2.1 Prove Lemma 1.2.1.

Exercise 1.2.2 Suppose T is a measurable space, and $f : T\times T\to\mathbf{R}$ is measurable. If µ is a measure on the measurable subsets of T, and if f is positive definite, show that $\int\!\!\int f(s,t)\,\mu(ds)\,\mu(dt) \ge 0$, provided that the integral is well-defined.
Lemma 1.2.2 The covariance function of a Gaussian process is symmetric and positive definite.
Proof Let $X = (X_t;\ t\in T)$, and let Σ denote our Gaussian process and its covariance function, respectively. To show symmetry, we need only note that for all s, t ∈ T, $E[X_sX_t] = E[X_tX_s]$. To show that Σ is positive definite, fix $s_1,\ldots,s_n\in T$ and $\xi_1,\ldots,\xi_n\in\mathbf{R}$ and observe that
$$E\Big[\Big(\sum_{j=1}^n \xi_j X_{s_j}\Big)^2\Big] = \sum_{i=1}^n\sum_{j=1}^n \xi_i\,\Sigma(s_i,s_j)\,\xi_j.$$
Since the left-hand side is nonnegative, so is the right-hand side.

In fact, the properties of Lemma 1.2.2 characterize Gaussian processes in the following sense.

Theorem 1.2.1 Given an abstract set T, an arbitrary function $\mu : T\to\mathbf{R}$, and a symmetric, positive definite function $\Sigma : T\times T\to\mathbf{R}$, there exists a Gaussian process $X = (X_t;\ t\in T)$ with mean and covariance functions µ and Σ, respectively.

The theorem asserts only that such a process exists on some probability space; the underlying space cannot be completely arbitrary, as the following exercise shows.

Exercise 1.2.3 Let Ω be a denumerable set and let $\mathcal{F}$ denote a σ-field on the subsets of Ω. Show that one cannot define a real-valued Gaussian random variable on the measure space $(\Omega, \mathcal{F})$.

Proof of Theorem 1.2.1 Suppose we could prove the result for the mean-zero case; i.e., suppose we can construct a Gaussian process $Y = (Y_t;\ t\in T)$ with mean and covariance functions 0 and Σ, respectively. Then $X_t = Y_t + \mu(t)$ is the Gaussian process mentioned in the statement of the theorem. Thus, we can assume, without loss of generality, that µ(t) = 0, for all t ∈ T.

Whenever $F = \{t_1,\ldots,t_k\}$ is a finite subset of T, define the (k×k) matrix $\Sigma_F$ by
$$\Sigma_F^{(i,j)} = \Sigma(t_i, t_j), \qquad 1\le i,j\le k.$$
Clearly, $\Sigma_F$ is a symmetric, positive definite (k×k) matrix. By Theorem 1.1.1, on some probability space we can construct an $\mathbf{R}^k$-valued random variable $Z_F\sim N_k(0,\Sigma_F)$ whose distribution is, for all measurable $A\subseteq\mathbf{R}^k$, defined as $\mu_F(A) = P(Z_F\in A)$. Suppose k > 1 and let $F_1 = \{t_1,\ldots,t_{k-1}\}$. In particular, $F_1\subset F$ and $Z_{F_1}$ has the same distribution as the first k − 1 coordinates of $Z_F$. Thus, for all measurable $A_1,\ldots,A_{k-1}\subseteq\mathbf{R}$,
$$\mu_{F_1}(A_1\times\cdots\times A_{k-1}) = \mu_F(A_1\times\cdots\times A_{k-1}\times\mathbf{R}).$$
In fact, if $F_0$ is any subset of F of cardinality m (≤ k),
$$\mu_{F_0}(A_1\times\cdots\times A_m) = \mu_F(B_1\times\cdots\times B_k),$$
where $B_j = A_j$ for all j such that $t_j\in F_0$, and $B_j = \mathbf{R}$ otherwise. In this way we have created probability measures $\mu_F$ on $\mathbf{R}^F$ such that the family $(\mu_F;\ F\subset T \text{ finite})$ is a consistent family. By Kolmogorov's existence theorem (Theorem 2, Appendix A), on an appropriate probability space we can construct a real-valued process $X = (X_t;\ t\in T)$ such that for all finite $F\subset T$, the distribution of $(X_t;\ t\in F)$ is $\mu_F$. This is our desired stochastic process.
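As a quick sanity check on Lemma 1.2.2 and Theorem 1.2.1 (an illustration, not part of the text), take $\Sigma(s,t) = \min(s,t)$, the covariance function of Brownian motion: it is symmetric, and the quadratic form in (1) can be evaluated directly for many random vectors ξ.

```python
import random

def quad_form(Sigma, points, xi):
    """The double sum in (1): sum_i sum_j xi_i Sigma(s_i, s_j) xi_j."""
    n = len(points)
    return sum(xi[i] * Sigma(points[i], points[j]) * xi[j]
               for i in range(n) for j in range(n))

rng = random.Random(1)
points = [0.5, 1.0, 2.0, 3.5, 7.0]
for _ in range(1000):
    xi = [rng.uniform(-1.0, 1.0) for _ in points]
    # min(s,t) is positive definite: the form equals Var(sum_i xi_i B_{s_i}) >= 0
    assert quad_form(min, points, xi) >= -1e-12
```

Of course this only spot-checks positive definiteness; the clean proof writes $\min(s,t) = \int \mathbf{1}_{[0,s]}(u)\mathbf{1}_{[0,t]}(u)\,du$, in the spirit of Lemma 1.3.1 below.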
1.3 White Noise

Our first nontrivial example of a Gaussian process will be one-dimensional white noise on $\mathbf{R}^N$. This will be a Gaussian process indexed by the Borel field on $\mathbf{R}^N$, which we write as T. (We could also let T denote the collection of all Lebesgue measurable sets in $\mathbf{R}^N$.) Let us begin with the form of the covariance function.

Lemma 1.3.1 For all A, B ∈ T, define $\Sigma(A,B) = \mathrm{Leb}(A\cap B)$. Then $\Sigma : T\times T\to\mathbf{R}$ is symmetric and positive definite.

Proof It suffices to show that Σ is positive definite. Since $\Sigma(A,B) = \int_{\mathbf{R}^N} \mathbf{1}_A(u)\,\mathbf{1}_B(u)\,du$, Fubini's theorem shows that for all $A_i\in T$ and all $\xi_i\in\mathbf{R}$ (1 ≤ i ≤ n),
$$\sum_{i,j=1}^n \xi_i\,\Sigma(A_i,A_j)\,\xi_j = \int_{\mathbf{R}^N}\Big(\sum_{i=1}^n \mathbf{1}_{A_i}(u)\,\xi_i\Big)^2\,du \ge 0.$$
This proves the lemma.
(One-dimensional) white noise on $\mathbf{R}^N$ is defined to be a mean-zero Gaussian process $W = (W(A);\ A\in T)$ whose covariance function is $\Sigma(A,B) = \mathrm{Leb}(A\cap B)$ (A, B ∈ T). By Theorem 1.2.1 and by Lemma 1.3.1 above, white noise exists as a well-defined process. The following captures some of its salient features.

Theorem 1.3.1 Let $W = (W(A);\ A\in T)$ be white noise on $\mathbf{R}^N$.
(a) For all disjoint A, B ∈ T, W(A) and W(B) are independent.
(b) For all A, B ∈ T, $W(A\cup B) = W(A) + W(B) - W(A\cap B)$, a.s.
(c) If $A_1, A_2, \ldots\in T$ are disjoint and $\sum_{i=1}^\infty \mathrm{Leb}(A_i) < \infty$, then a.s.,
$$W\Big(\bigcup_{i=1}^\infty A_i\Big) = \sum_{i=1}^\infty W(A_i).$$
Remark It is tempting to think that W is almost surely a σ-finite, signed measure on the Borel field of $\mathbf{R}^N$. This is not the case (Supplementary Exercise 2), since the null sets in (a), (b), and (c) above depend on the choice of the sets $A, B, A_1, A_2, \ldots$, and there are uncountably many such null sets. However, W is a σ-finite, signed, $L^2(P)$-valued measure; cf. Exercise 1.3.1 below.

Proof If A, B ∈ T are disjoint, $E[W(A)W(B)] = \mathrm{Leb}(A\cap B) = 0$. Corollary 1.1.1 (i.e., "uncorrelated ⇒ independent") implies part (a).

Suppose A, B ∈ T are disjoint and define $D = W(A\cup B) - W(A) - W(B)$. By squaring D and taking expectations, and since A ∩ B = ∅,
$$\begin{aligned} E[D^2] &= E\{|W(A\cup B)|^2\} - 2E[W(B)W(A\cup B)] - 2E[W(A)W(A\cup B)] \\ &\qquad + E\{|W(A)|^2\} + 2E[W(A)W(B)] + E\{|W(B)|^2\} \\ &= \mathrm{Leb}(A\cup B) - 2\,\mathrm{Leb}(B) - 2\,\mathrm{Leb}(A) + \mathrm{Leb}(A) + 2\,\mathrm{Leb}(A\cap B) + \mathrm{Leb}(B) \\ &= 0. \end{aligned}$$
This verifies (b) when A and B are disjoint. In fact, we can use this, together with induction, to deduce that whenever $A_1,\ldots,A_k\in T$ are disjoint, $W(\cup_{i=1}^k A_i) = \sum_{i=1}^k W(A_i)$.

Now we prove (b) in general. Write $A\cup B = A_1\cup A_2\cup A_3$, where $A_1 = A\cap B^c$, $A_2 = B\cap A^c$, and $A_3 = A\cap B$. Since $A_1$, $A_2$, and $A_3$ are disjoint,
$$W(A\cup B) = W(A_1) + W(A_2) + W(A_3), \qquad\text{a.s.} \tag{1}$$
Similarly,
$$W(A) = W(A_1) + W(A_3), \qquad W(B) = W(A_2) + W(A_3). \tag{2}$$
Part (b) follows from (1) and (2), and it remains to prove (c).

For all n ≥ 1, define $M_n = \sum_{i=1}^n W(A_i)$ and $\varepsilon_n = W(\cup_{i=1}^\infty A_i) - M_n$. By (b), $M_n = W(\cup_{i=1}^n A_i)$ and $\varepsilon_n = W(\cup_{i=n+1}^\infty A_i)$, a.s. On the other hand,
$$E[\varepsilon_n^2] = \mathrm{Leb}\Big(\bigcup_{i=n+1}^\infty A_i\Big) = \sum_{i=n+1}^\infty \mathrm{Leb}(A_i),$$
since the $A_i$'s are disjoint. By the summability assumption on Lebesgue's measure of the $A_i$'s, $\varepsilon_n$ goes to 0 in $L^2(P)$ as n → ∞. Equivalently, $M_n\to W(\cup_{i=1}^\infty A_i)$ in $L^2(P)$.

To show a.s. convergence, note that $M = (M_n;\ n\ge 1)$ is a martingale, since $M_n$ is a sum of n independent mean-zero random variables. Moreover,
$$E[M_n^2] = \sum_{i=1}^n \mathrm{Leb}(A_i),$$
144
5. Gaussian Random Variables
which is bounded. By the martingale convergence theorem (Theorem 1.7.1, Chapter 1), Mn converges in L2 (P) and almost surely. Since we have already identified the L2 (P) limit, the result follows. Exercise 1.3.1 Show that for all Borel sets A1 ⊃ A2 ⊃ · · · that satisfy ∩n An = ∅, limn→∞ E{|W(An )|2 } = 0. Conclude that W can be viewed as a σ-finite, L2 (P)-valued, signed measure. Exercise 1.3.2 Improve Theorem 1.3.1(a) by checking that whenever Leb(A ∩ B) = 0, W(A) is independent of W(B).
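The content of Theorem 1.3.1 is easy to see in a finite-dimensional simulation. The following sketch is an illustration, not part of the text's construction; the grid size, the sets A and B, and the replication count are arbitrary choices. It discretizes [0,1]² into cells, assigns each cell an independent N(0, Leb(cell)) mass, and checks that E[W(A)W(B)] ≈ Leb(A ∩ B) and that W is finitely additive over cells.

```python
import numpy as np

rng = np.random.default_rng(0)

# Discretize [0,1]^2 into n*n cells.  White noise assigns each cell an
# independent N(0, Leb(cell)) mass; for a union of cells A, W(A) is the
# sum of the masses of the cells making up A.
n = 8
cell_area = 1.0 / n**2
reps = 50_000
# increments[r, i, j] = white-noise mass of cell (i, j) in replication r
increments = rng.normal(0.0, np.sqrt(cell_area), size=(reps, n, n))

def W(mask):
    """Evaluate the white-noise functional on the union of cells in `mask`."""
    return increments[:, mask].sum(axis=1)

ii, _ = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
A = ii < n // 2                          # A = [0, 1/2] x [0, 1]
B = (ii >= n // 4) & (ii < 3 * n // 4)   # B = [1/4, 3/4] x [0, 1]

WA, WB = W(A), W(B)
print(np.mean(WA * WB))   # ≈ Leb(A ∩ B) = 1/4
print(np.var(WA))         # ≈ Leb(A) = 1/2
# Part (b) of Theorem 1.3.1 holds exactly, cell by cell:
print(np.allclose(W(A | B), WA + WB - W(A & B)))
```

Part (a) is built in as well: cell masses are independent, so sums over disjoint families of cells are independent.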
1.4 The Isonormal Process

Let T designate the Borel field on R^N, and W = (W(A); A ∈ T) be white noise on R^N. While it is not a measure almost surely (Remark of Section 1.3), Exercise 1.3.1 shows that W is a random L^2(P)-valued, signed measure. This alone makes it possible to define a reasonable integral W(h) = ∫ h(s) W(ds), which is sometimes called the Wiener integral of h. The map h → W(h) is the isonormal process whose construction is taken up in this subsection.

A function f : R^N → R is an elementary function if for some E ∈ T with Leb(E) < ∞, f = 1l_E. Finite linear combinations of elementary functions are simple functions.

Recall that L^2(R^N) denotes the collection of all measurable functions f : R^N → R such that ∫_{R^N} |f(s)|^2 ds < ∞. For all f, g ∈ L^2(R^N), let ⟨f, g⟩ = ∫_{R^N} f(s)g(s) ds and ‖f‖ = ⟨f, f⟩^{1/2}. If we identify the elements of L^2(R^N) that are almost everywhere equal, then L^2(R^N) is a complete, separable metric space in the metric d(f, g) = ‖f − g‖.

Lemma 1.4.1 Simple functions are dense in L^2(R^N).

Proof Fix an arbitrary ε > 0 and f ∈ L^2(R^N). We intend to show that there exists a simple function S such that ‖S − f‖ ≤ ε. By the dominated convergence theorem, there exists an R > 0 such that

‖f 1l_{R^N \ [−R,R]^N}‖ = ( ∫_{|s|≥R} |f(s)|^2 ds )^{1/2} ≤ ε/2.  (1)

Fix such an R and define

S(s) = ∑_{j=−⌊R/η⌋−1}^{⌊R/η⌋+1} jη 1l_{[jη,(j+1)η]}(f(s)), if |s| ≤ R;  S(s) = 0, otherwise,

where η = ½(2R)^{−N} ε, and ⌊•⌋ denotes the greatest integer function. (What does S look like?) We claim that S is indeed a simple function. Let E_j = f^{−1}([jη, (j+1)η]) and note that E_j ∈ T, since f is measurable. Once we demonstrate that Leb(E_j) < ∞, this verifies that S is a simple function. On the other hand, by Chebyshev's inequality,

Leb(E_j) ≤ Leb(s ∈ R^N : |f(s)| ≥ jη) ≤ ‖f‖^2 / (j^2 η^2) < ∞.

Consequently, S is a simple function. Moreover, for all s ∈ [−R, R]^N, f(s) − η ≤ S(s) ≤ f(s). In particular,

‖(S − f) 1l_{[−R,R]^N}‖ ≤ (2R)^N η = ε/2.

Combining this with equation (1) and using the triangle inequality, we see that ‖S − f‖ ≤ ε, which is the desired result.

Suppose h : R^N → R is an elementary function. If we write h = 1l_E for E ∈ T, we can define W(h) = ∫ h(s) W(ds) = W(E). For any two elementary functions f and g, let W(f + g) = W(f) + W(g). This is well-defined, as the following shows.

Lemma 1.4.2 Suppose f + g = p + q, where f, g, p, q are elementary functions. Then, W(f) + W(g) = W(p) + W(q), almost surely.

Proof By symmetry, it suffices to consider the case where g ≡ 0. Suppose f = p + q and f = 1l_F, p = 1l_P, and q = 1l_Q, where F, P, Q ∈ T. We seek to show that a.s., D = 0, where D = W(F) − W(P) − W(Q). Since 1l_F = 1l_P + 1l_Q, P and Q must be disjoint. Thus, writing F = (F ∩ P) ∪ (F ∩ Q) leads to

E[D^2] = E[{W(F ∩ P) − W(P) + W(F ∩ Q) − W(Q)}^2]
= E[{W(F ∩ P) − W(P)}^2] + E[{W(F ∩ Q) − W(Q)}^2].

We have used Theorem 1.3.1 in this calculation. The same theorem can be applied again to show that
• E[{W(F ∩ P) − W(P)}^2] = Leb(Fᶜ ∩ P);
• E[{W(F ∩ Q) − W(Q)}^2] = Leb(Fᶜ ∩ Q).
Thus, E[D^2] = Leb{Fᶜ ∩ (P ∪ Q)}, which is zero, since f = p + q.

Finally, if α, β ∈ R and f, g are elementary functions, define W(αf + βg) = αW(f) + βW(g). By Lemma 1.4.2, this definition is well-defined.
Moreover, we have now defined W(h) for all simple functions h. We want to use Lemma 1.4.1 to define W(h) for all h ∈ L^2(R^N). The key to this is the following; it will show that W : L^2(R^N) → L^2(P) is an isometry.

Lemma 1.4.3 If h : R^N → R is a simple function, then E{|W(h)|^2} = ‖h‖^2.

Proof We can write h as h = ∑_{j=1}^{n} c_j 1l_{E_j}, where the c_j's are real numbers and E_j ∈ T are disjoint and have finite Lebesgue measure. By Theorem 1.3.1,

E{|W(h)|^2} = ∑_{j=1}^{n} c_j^2 Leb(E_j).

Since the E_j's are disjoint, this equals ‖h‖^2.

By Lemma 1.4.1, simple functions are dense in L^2(R^N). Thus, for any f ∈ L^2(R^N), there are simple functions s_n such that lim_{n→∞} ‖s_n − f‖ = 0. In particular, (s_n; n ≥ 1) is a Cauchy sequence in L^2(R^N). By Lemma 1.4.3, (W(s_n); n ≥ 1) is a Cauchy sequence in L^2(P). Since the latter is complete, lim_{n→∞} W(s_n) exists in L^2(P). We denote this limit by W(f). The stochastic process (W(h); h ∈ L^2(R^N)) is called the isonormal process.

Theorem 1.4.1 The isonormal process W = (W(h); h ∈ L^2(R^N)) is a mean-zero Gaussian process indexed by L^2(R^N) such that for all h_1, h_2 ∈ L^2(R^N), E[W(h_1)W(h_2)] = ⟨h_1, h_2⟩. Moreover, for all α, β ∈ R and for every f, g ∈ L^2(R^N),

W(αf + βg) = αW(f) + βW(g),  a.s.

Proof The asserted linearity follows from the construction of W. We need to show that for all t ∈ R^k and all h_1, …, h_k ∈ L^2(R^N),

∑_{i=1}^{k} t^{(i)} W(h_i) ∼ N_1(0, σ^2), where σ^2 = ∑_{i=1}^{k} ∑_{j=1}^{k} t^{(i)} t^{(j)} ⟨h_i, h_j⟩.

However, by the asserted linearity of this result, ∑_{i=1}^{k} t^{(i)} W(h_i) = W(h), a.s., where h = ∑_{i=1}^{k} t^{(i)} h_i ∈ L^2(R^N). Therefore, it suffices to show that for all h ∈ L^2(R^N), W(h) is a Gaussian random variable with mean 0 and variance ‖h‖^2. If h is simple, this follows from Lemma 1.4.3 and the fact that white noise W is a mean-zero Gaussian process. Since L^2(P)-limits of Gaussian random variables are Gaussian, the result follows; cf. Exercise 1.1.4.

An alternative approach to the above construction is to directly define the isonormal process as a Gaussian process indexed by L^2(R^N) whose mean and covariance functions are 0 and Σ(f, g) = ⟨f, g⟩, respectively. Then, one can define white noise by W(A) = W(1l_A); see Supplementary
Exercise 3 for details.

We conclude this subsection by introducing more notation. Let W denote the isonormal process and W the white noise of this subsection (the argument—a function or a set—indicates which is meant). Given a Borel set A ⊆ R^N and a measurable function f : R^N → R, we write ∫_A f(u) W(du) for W(1l_A f) = ∫ 1l_A(u) f(u) W(du). We say that f ∈ L^2_loc(R^N) if for all compact sets K ⊂ R^N, 1l_K f ∈ L^2(R^N). If f ∈ L^2_loc(R^N), we can define the process W(f) = (W_t(f); t ∈ R^N_+) by W_t(f) = ∫_{[0,t]} f(u) W(du) (t ∈ R^N_+). The following is an immediate corollary of Theorem 1.4.1.

Corollary 1.4.1 If f ∈ L^2_loc(R^N) is fixed, W(f) = (W_t(f); t ∈ R^N_+) is a Gaussian process with mean function 0 and covariance function Σ(s, t) = ∫_{[0, s∧t]} |f(u)|^2 du (s ∧ t denoting the componentwise minimum), where W_t(f) = ∫_{[0,t]} f(u) W(du).

The following is well worth trying at this point. The connection to martingales will be further elaborated upon later on.

Exercise 1.4.1 Prove that in Corollary 1.4.1, t → W_t(f) is a mean-zero, N-parameter martingale. Use this to define W_t(f), and hence W(f), for any f ∈ L^p(R^N) (p > 1).
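A minimal numerical sketch of the construction just given (the mesh, the test functions, and the sample sizes are arbitrary illustrative choices): approximate f by a simple function on a grid and set W(f) ≈ ∑_i f(s_i) W(cell_i); Lemma 1.4.3 and Theorem 1.4.1 then predict E[W(f)W(g)] ≈ ⟨f, g⟩ and E|W(f)|² ≈ ‖f‖².

```python
import numpy as np

rng = np.random.default_rng(1)

# Wiener integral on [0,1] (N = 1): approximate f by a simple function
# on a mesh of size ds and set W(f) ~ sum_i f(s_i) * W(cell_i), where
# the cell masses W(cell_i) are independent N(0, ds).
n, reps = 200, 40_000
ds = 1.0 / n
s = (np.arange(n) + 0.5) * ds                  # cell midpoints
dW = rng.normal(0.0, np.sqrt(ds), size=(reps, n))

f = np.sin(2 * np.pi * s)
g = np.cos(2 * np.pi * s) + 1.0

Wf = dW @ f                                    # reps samples of W(f)
Wg = dW @ g

print(np.mean(Wf * Wg), ds * np.dot(f, g))     # both ≈ <f, g>
print(np.var(Wf), ds * np.dot(f, f))           # isometry: E|W(f)|^2 = ||f||^2
```

The same grid approximation, restricted to cells in [0, t], gives samples of the process W_t(f) of Corollary 1.4.1.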
1.5 The Brownian Sheet

The one-dimensional Brownian sheet indexed by R^N_+ is a Gaussian process B = (B_t; t ∈ R^N_+) with mean 0 and covariance function

Σ(s, t) = ∏_{ℓ=1}^{N} ( s^{(ℓ)} ∧ t^{(ℓ)} ),  s, t ∈ R^N_+.

For any integer d ≥ 1, the d-dimensional Brownian sheet indexed by R^N_+ is the process B = (B_t; t ∈ R^N_+), where B^{(i)} = (B_r^{(i)}; r ∈ R^N_+) (1 ≤ i ≤ d) are independent, 1-dimensional, N-parameter Brownian sheets.¹ Recall that when N = 1, the one-dimensional Brownian sheet is more commonly called Brownian motion. Likewise, the d-dimensional, 1-parameter Brownian sheet is more commonly known as the d-dimensional Brownian motion. For our current purposes, it suffices to study the 1-dimensional, N-parameter Brownian sheet, which is what we now concentrate on.

Since Σ(s, t) = Leb([0, s] ∩ [0, t]), Lemma 1.3.1 shows that Σ is indeed positive definite and such a process is well-defined and exists on some probability space. In fact, by checking covariances, we can immediately deduce the following useful representation of Čentsov (1956).

¹ The processes X^1, X^2, …, indexed by some set T, are independent if for all finite sets F_1, F_2, … ⊂ T, (X^j_t; t ∈ F_j) are independent random vectors (j = 1, 2, …).
Theorem 1.5.1 (Čentsov's Representation) Consider a white noise on R^N denoted by W = (W(A); A ⊂ R^N, Borel). Then, B = (B_t; t ∈ R^N_+) is a Brownian sheet, where B_t = W([0, t]).

Exercise 1.5.1 Verify the details of the proof of Theorem 1.5.1.
If W denotes the isonormal process obtained from W, then W(A) = W(1l_A) for all Borel sets A ⊂ R^N that have finite Lebesgue's measure. In particular, the Brownian sheet of the theorem has the representation B_t = W(1l_{[0,t]}). Recall from Section 1.4 that we may also write this as B_t = ∫_{[0,t]} W(ds). Therefore, very loosely speaking, white noise is the mixed derivative of the Brownian sheet:

W = ∂^N B_t / (∂t^{(1)} ⋯ ∂t^{(N)}).

While one can make sense of this derivative as a so-called generalized function, it does not exist in the usual way: B turns out to be a.s. nowhere differentiable. However, this observation motivates the following convenient notation: For all f ∈ L^2(R^N_+), define

∫ f(s) dB_s = ∫ f(s) W(ds) = W(f).

Similarly, for all f ∈ L^2_loc(R^N) and all t ∈ R^N_+,

∫_{[0,t]} f(s) dB_s = ∫_{[0,t]} f(s) W(ds) = W(1l_{[0,t]} f).

We will use the above notations interchangeably. Let us conclude with a few exercises on the basic structure of the Brownian sheet.

Exercise 1.5.2 If B denotes a 2-parameter Brownian sheet, show that for any fixed t > 0, s → t^{−1/2} B_{s,t} and s → t^{−1/2} B_{t,s} are each Brownian motions. Find an N-parameter analogue when N > 2.

Exercise 1.5.3 If B is a 2-parameter Brownian sheet, so are (s, t) → sB_{1/s, t}, (s, t) → tB_{s, 1/t}, and (s, t) → stB_{1/s, 1/t}. These are examples of time inversion. Formulate and verify the general N-parameter results. Note that, in general, there are 2^N − 1 different ways of inverting time.
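Čentsov's representation translates directly into a simulation recipe (the mesh and the check points below are arbitrary illustrative choices): on a grid, B_t = W([0, t]) is just the two-dimensional cumulative sum of i.i.d. N(0, cell-area) white-noise increments.

```python
import numpy as np

rng = np.random.default_rng(2)

# Centsov's representation on a mesh-1/n grid of [0,1]^2: since
# B_t = W([0, t]), a discrete Brownian sheet is the 2-dimensional
# cumulative sum of i.i.d. N(0, 1/n^2) white-noise cell masses.
n, reps = 30, 10_000
dW = rng.normal(0.0, 1.0 / n, size=(reps, n, n))    # std = sqrt(cell area)
B = dW.cumsum(axis=1).cumsum(axis=2)                # B[r, i, j] ~ B_{(i+1)/n,(j+1)/n}

def sheet(t1, t2):
    i, j = round(t1 * n) - 1, round(t2 * n) - 1
    return B[:, i, j]

# Covariance check: E[B_s B_t] = (s1 ∧ t1)(s2 ∧ t2).
print(np.mean(sheet(0.4, 0.8) * sheet(0.6, 0.5)))   # ≈ 0.4 * 0.5 = 0.2
print(np.var(sheet(1.0, 1.0)))                      # ≈ 1
```

The rescalings of Exercises 1.5.2 and 1.5.3 can be checked against the same simulated array.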
2 Regularity Theory

Suppose X = (X_t; t ∈ T) is a stochastic process that is indexed by some metric space T. In this section we derive general conditions under which the random function t → X_t is continuous, a.s.
A useful way of establishing continuity is to verify that on every compact subset of T , t → Xt is, with probability one, uniformly continuous. Since T is a general index set, we begin by reexamining the compactness, or, more generally, total boundedness, of T .
2.1 Totally Bounded Pseudometric Spaces

Given an abstract set T, a function d : T × T → R_+ is said to be a pseudometric if it satisfies the following conditions: (i) (symmetry) for all s, t ∈ T, d(s, t) = d(t, s); (ii) (triangle inequality) for all r, s, t ∈ T, d(s, t) ≤ d(s, r) + d(r, t); and (iii) for all s ∈ T, d(s, s) = 0. In particular, the difference between a pseudometric and a proper metric is that the former need not separate points, i.e., d(s, t) = 0 need not imply s = t. For example, if we do not identify functions up to almost everywhere equality, the usual metric on any given L^p space is, in fact, only a pseudometric. As a less trivial example, consider the space C[0, 1] of real-valued, continuous functions on [0, 1], and define

d(f, g) = inf_{x,y∈[0,1]} |f(x) − g(y)|,  f, g ∈ C[0, 1].

It is an easy matter to check that this is a pseudometric on C[0, 1]. The typical probabilistic example of a pseudometric is given by the following.

Exercise 2.1.1 Given a stochastic process (X_t; t ∈ T), and given p ≥ 1, d(s, t) = [E{|X_s − X_t|^p}]^{1/p} defines a pseudometric on T, provided that X_t ∈ L^p(P) for all t ∈ T. Also prove that, quite generally, d(s, t) = E{|X_s − X_t| ∧ 1} defines a pseudometric on T.

We say that (T, d) is a pseudometric space in order to emphasize the fact that d is the pseudometric on T. The (open) ball of radius r > 0 about t ∈ T is denoted by B_d(t; r) and defined by B_d(t; r) = {s ∈ T : d(s, t) < r}. We say that (T, d) is totally bounded if for all ε > 0, there exists an integer m and there are distinct t_1, …, t_m ∈ T such that B_d(t_1; ε), …, B_d(t_m; ε) cover T. Moreover, any such collection (t_1, …, t_m) is an ε-net for T. Any ε-net (t_1, …, t_m) has the important property that for all t ∈ T, there exists 1 ≤ i ≤ m such that d(t, t_i) ≤ ε. We will appeal to this interpretation later on.

For complete metric spaces, total boundedness is easily characterized, as the following standard fact from general topology shows. We will not
need it very much in the sequel, and refer to Munkres (1975, Theorem 3.1, Chapter 7) for a proof.

Theorem 2.1.1 A metric space (T, d) is compact if and only if it is complete and totally bounded.

Example 1 Suppose T = [−1, 1]^N is endowed with the metric d(s, t) = |s − t| (s, t ∈ R^N); recall that this is the ℓ^∞-metric. In this metric, the "ball" B_d(t; r) is an open hypercube of side 2r centered at t ∈ R^N. We now seek to find a good ε-net in T. For any ε > 0, let T_ε denote the collection of all points s ∈ T such that s/(2ε) ∈ Z^N. This is a natural ε-net for T, and a little thought shows that the cardinality of T_ε satisfies

lim_{ε→0} ε^N · #T_ε = 1;

see Supplementary Exercise 4 for details.

It should be intuitively clear that the T_ε in the above example leads to the best possible ε-cover for T = [−1, 1]^N. As such, #T_ε ≈ ε^{−N} ought to be a good measure of the size of T. We define the metric entropy D(•; T, d) of (T, d) as follows:² For each ε > 0, D(ε; T, d) denotes the minimum number of balls of radius ε > 0 required to cover all of T. It is clear that ε → D(ε; T, d) is nonincreasing, and unless T is a finite set, lim_{ε→0} D(ε; T, d) = +∞. Moreover, the rate at which the metric entropy of (T, d) blows up near ε = 0 is, in fact, a gauge for the size of T, and in many cases is estimable. For instance, consider T = [−1, 1]^N with d(s, t) = |s − t|. Then, according to Example 1, D(ε; [−1, 1]^N, d) ≈ ε^{−N} when ε ≈ 0.

Another measure of the size of (T, d) is the Kolmogorov capacitance K(•; T, d):³ For all ε > 0, K(ε; T, d) denotes the maximum number m of points t_1, …, t_m ∈ T such that for all i ≠ j, d(t_i, t_j) ≥ ε. The functions D and K are related by the following inequalities; see Dudley (1984, Theorem 6.0.1).

Lemma 2.1.1 Given a totally bounded metric space (T, d), for each ε > 0, D(ε; T, d) ≤ K(ε; T, d) ≤ D(½ε; T, d).

Proof Fix ε > 0 and let D = D(ε; T, d), D′ = D(½ε; T, d), and K = K(ε; T, d). We can find a maximal collection t_1, …, t_K ∈ T such that for all i ≠ j, d(t_i, t_j) ≥ ε. The maximality of this collection, together with

² The full power of metric entropy was brought to light in the work of R. M. Dudley; hence, the letter D. See Dudley (1973, 1984), where D is replaced by N throughout.
³ Capacitance was motivated by the groundbreaking work of Shannon (1948) in information theory; for a discussion of Kolmogorov's contribution, see Tihomirov (1963).
an application of the triangle inequality, shows that for any t ∈ T there exists 1 ≤ i ≤ K such that d(t, t_i) < ε. In other words, the balls B_d(t_i; ε) (1 ≤ i ≤ K) cover T, and since D is the cardinality of the minimal such covering, D ≤ K. Conversely, we can find σ_1, …, σ_{D′} ∈ T such that B_d(σ_i; ½ε) (1 ≤ i ≤ D′) cover T. Let B_i = B_d(σ_i; ½ε) and note that whenever s_1, s_2 ∈ T satisfy d(s_1, s_2) ≥ ε, then s_1 and s_2 cannot be in the same B_j for any j ≤ D′. Thus, we have that K ≤ D′, as desired.

As the following examples show, Lemma 2.1.1 provides a useful way to estimate the metric entropy and/or the Kolmogorov capacitance of a totally bounded set T.

Example 2 Recall that a function f : [0, 1] → R is a contraction if it is measurable and

|f(s) − f(t)| ≤ |s − t|,  s, t ∈ [0, 1].

Let T denote the collection of all contractions f : [0, 1] → R such that f(0) = 0, and endow T with the maximum modulus metric

d(f, g) = sup_{0≤t≤1} |f(t) − g(t)|,  f, g ∈ T.
Next, we introduce A. N. Kolmogorov’s estimates for D(ε; T, d) and K(ε; T, d). We shall see from these estimates that D(ε; T, d) and K(ε; T, d) are finite, and the total boundedness of (T, d) follows readily. To begin with, note that f maps [0, 1] into [0, 1]. Indeed, if f ∈ T , then for all s ∈ [0, 1], |f (s)| = |f (s) − f (0)| ≤ s ≤ 1. Now for all k > 0, let Tk denote the collection of all piecewise linear functions f ∈ T such that for all integers 0 ≤ j ≤ k, kf (j/k) ∈ N0 . Since the elements of Tk are contractions, #(Tk ) = 3k+1 .4 Now if f, g ∈ Tk are distinct, d(f, g) ≥ k −1 ; this leads to the inequality K(k −1 ; T, d) ≥ 3k . Next, we use a monotonicity argument to estimate K(ε; T, d) for any ε: Whenever ε ∈ ]0, 1[, then, ε ∈ [(k + 1)−1 , k −1 ], for some integer k ≥ 1. Since ε → K(ε; T, d) and k → 3−k are nonincreasing, K(ε; T, d) ≥ 31/ε ,
0 < ε < 1.
(1)
On the other hand, for any f ∈ T we can define πk f to be the piecewise linear function in Tk such that πk f (jk −1 ) = kf (jk −1 )!, for all 0 ≤ j ≤ k. 4 To verify this, think of #(T ) as the total number of ways to construct piecewise k linear contractions f : [0, 1] → [0, 1] such that f (0) = 0 and kf (jk −1 ) ∈ N0 , for all j = 0, . . . , k. Once f (k −1 ), . . . , f (jk −1 ) have been constructed, there are only three possibilities for constructing f ((j + 1)k −1 ): f (jk −1 ) or f (jk −1 ) ± k −1 .
The fact that f is a contraction implies that d(f, π_k f) ≤ k^{−1}. Since π_k f ∈ T_k for all contractions f, we can conclude that the balls (B_d(f; k^{−1}); f ∈ T_k) have radius at most k^{−1} and cover T. Consequently, D(k^{−1}; T, d) ≤ #(T_k) = 3^k. By another monotonicity argument,

D(ε; T, d) ≤ 3^{1+(1/ε)},  0 < ε < 1.  (2)
Upon combining (1) and (2) with Lemma 2.1.1, we obtain the following:

½ ln 3 ≤ liminf_{ε→0} ε ln D(ε; T, d) ≤ limsup_{ε→0} ε ln D(ε; T, d) ≤ ln 3,
ln 3 ≤ liminf_{ε→0} ε ln K(ε; T, d) ≤ limsup_{ε→0} ε ln K(ε; T, d) ≤ 2 ln 3,
the point being that, very roughly speaking, both K and D behave like exp(cε^{−1}) in the present infinite-dimensional setting. This should be compared to the polynomial growth of the entropy as outlined in Example 1 above.

Example 3 For any fixed c > 0, let T_c denote the class of all differentiable functions f : [0, 1] → R such that f(0) = 0 and sup_{0≤s≤1} |f′(s)| ≤ c, and endow T_c with the metric d(f, g) = sup_{0≤t≤1} |f(t) − g(t)|. It is easy to see that (T_c, d) is a totally bounded metric space. Indeed, (c^{−1} f; f ∈ T_c) is a subset of the collection of all contractions on [0, 1], which, by Example 2 above, is totally bounded in the same metric d.

We conclude this subsection with the following corollary of Lemma 2.1.1.

Corollary 2.1.1 Suppose T ⊂ R^N_+ is measurable and d is a pseudometric on T. If there exist C, α > 0 such that for all s, t ∈ T, d(s, t) ≤ C|s − t|^α, then there exists r_0 > 0 such that for all r ∈ ]0, r_0], D(r; T, d) ≤ C^{N/α} Leb(T) r^{−N/α}, where Leb(T) is Lebesgue's measure of T.

Proof Let ∂ denote the metric ∂(s, t) = |s − t|, and define ε_0 = ½ sup{∂(s, t) : s, t ∈ T}. Whenever 0 < ε < ε_0 and t_1, …, t_m ∈ T are such that ∂(t_i, t_j) > ε (i ≠ j), then the balls B_∂(t_i; ½ε) are disjoint and ∪_{i=1}^{m} B_∂(t_i; ½ε) ⊂ T. Hence,

m ε^N = Leb( ∪_{i=1}^{m} B_∂(t_i; ½ε) ) ≤ Leb(T).

Equivalently, m ≤ Leb(T) ε^{−N}. Since m is arbitrary, we can maximize over all such possible m's to see that for all 0 < ε < ε_0, K(ε; T, ∂) ≤ Leb(T) ε^{−N}. By Lemma 2.1.1, for all 0 < ε < ε_0, D(ε; T, ∂) ≤ Leb(T) ε^{−N}. On the other hand, d(s, t) ≤ C{∂(s, t)}^α. Thus, any ε-net with respect to ∂ is a Cε^α-net with respect to d. That is, for all 0 < ε < ε_0, D(Cε^α; T, d) ≤ D(ε; T, ∂) ≤ Leb(T) ε^{−N}. The result follows with r_0 = Cε_0^α.
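The packing/covering mechanism of Lemma 2.1.1 and Corollary 2.1.1 is easy to exercise numerically. In the sketch below (the grid, mesh, and ε values are arbitrary illustrative choices), a greedily built maximal ε-separated subset of a grid T ≈ [0,1]² yields both K(ε; T, d) ≥ m and, by maximality, an ε-net, so D(ε; T, d) ≤ m; the count grows like Leb(T) ε^{−N} = ε^{−2}.

```python
# Greedy maximal packing in the l-infinity metric.  The selected centers
# are pairwise >= eps apart (so K(eps; T, d) >= m) and, by maximality,
# every point of T lies within eps of a center (so the centers form an
# eps-net and D(eps; T, d) <= m).
def greedy_net(points, eps):
    centers = []
    for (x, y) in points:
        if all(max(abs(x - cx), abs(y - cy)) >= eps for (cx, cy) in centers):
            centers.append((x, y))
    return centers

# T = a mesh-1/50 grid approximating [0,1]^2 (so Leb(T) = 1 and N = 2);
# coordinates are kept as integers in {0, ..., 50} to avoid
# floating-point ties in the distance comparisons.
n = 50
T = [(i, j) for i in range(n + 1) for j in range(n + 1)]

for k in (10, 5, 2):          # eps = k/n = 0.2, 0.1, 0.04 in [0,1]^2 units
    m = len(greedy_net(T, k))
    eps = k / n
    # Corollary 2.1.1 predicts growth like Leb(T) * eps^{-2}:
    print(eps, m, round(m * eps**2, 4))
```

Here the greedy centers land exactly on the sublattice of spacing ε, so m·ε² tends to 1 as ε shrinks, matching the Leb(T) ε^{−N} bound up to a constant.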
Exercise 2.1.2 Consider the collection of continuous functions f : [0, 1] → R such that for all s, t ∈ [0, 1], |f(s) − f(t)| ≤ C|t − s|^α for some α > 0 and some finite C > 0.
(a) Show that α is necessarily in [0, 1].
(b) (Hard) Compute the best possible upper bound for the metric entropy of this collection. (Hint: To show that your upper bound is best possible, consider global helices: A function f is a global helix of order α on [0, 1] if there exists a finite constant C > 1 such that for all open intervals I ⊂ [0, 1] of length ℓ, sup_{s,t∈I} |f(s) − f(t)| ≥ C^{−1} ℓ^α. Compute a lower bound for the metric entropy of global helices on [0, 1]. You may assume, without proof, the existence of global helices for any α ∈ ]0, 1[.)
2.2 Modifications and Separability

The transition from discrete-parameter to continuous-parameter processes could potentially pose a great number of technical problems. However, a good number of such issues would vanish if the process t → X_t (t ∈ T) were continuous. Of course, this makes sense only if the parameter set T is topological. We begin this subsection with the following simple but instructive example, which implies that one cannot be too ambitious in setting out such goals.

Example Let B = (B_t; t ≥ 0) denote a Brownian motion. Recall from Section 1.5 that B is a Gaussian process indexed by T = [0, ∞[ with mean and covariance functions µ(t) = 0 and Σ(s, t) = s ∧ t, respectively (s, t ∈ T). Suppose U is picked at random, uniformly in [0, 1], and that U is independent of B. Of course, we may need to enlarge the underlying probability space to construct such a U. On this possibly enlarged space we can define a stochastic process Y = (Y_t; t ≥ 0) as follows:

Y_t = B_t, if U ≠ t;  Y_t = ½, if U = t  (t ≥ 0).

Note that for all t_1, …, t_k ≥ 0,

P(Y_{t_i} ≠ B_{t_i} for some 1 ≤ i ≤ k) = P(U ∈ {t_1, …, t_k}) = 0.

This shows that Y has the same finite-dimensional distributions as Brownian motion; hence, Y is a Brownian motion. Supposing that B is continuous, we now show that Y is a.s. not continuous. Indeed, by the independence of U and B, and using the uniformity
of U in [0, 1], we obtain

P(B_U = ½) = ∫_0^1 P(B_t = ½) dt = 0.

In summary, we have shown that any continuous Brownian motion B can be modified to produce a Brownian motion Y that is not continuous.

The above example suggests that it is virtually impossible to show that the process X itself is continuous, since a priori we know only its finite-dimensional distributions. However, it may still be possible to modify X, without altering its finite-dimensional distributions, so as to produce a continuous process. For instance, in the example above, if B were continuous, then it could be viewed as such a modification of Y. In this subsection we explore some of the fundamental properties of modifications of a given stochastic process.

Given an arbitrary index set T, two real-valued stochastic processes X = (X_t; t ∈ T) and Y = (Y_t; t ∈ T) are said to be modifications of each other if for all t ∈ T, P(X_t = Y_t) = 1. Following the special definition of Section 1.1, Chapter 3, we will define the finite-dimensional distributions of a real-valued stochastic process X as the collection of all the distributions of the R^k-valued random variables (X_{t_1}, …, X_{t_k}), as we vary k ≥ 1 and t_1, …, t_k ∈ T, and begin with the remark that modifying a stochastic process does not alter its finite-dimensional distributions.

Lemma 2.2.1 If X = (X_t; t ∈ T) is a modification of Y = (Y_t; t ∈ T), then X and Y have the same finite-dimensional distributions.

Exercise 2.2.1 Verify Lemma 2.2.1.
But which modification of X, if any, is useful? To answer this, we first observe that unless T is at most denumerable, some very natural objects such as sup_{t∈T} X_t can be nonmeasurable. For instance, if λ ∈ R,

{ω : sup_{t∈T} X_t(ω) > λ} = ∪_{t∈T} {ω : X_t(ω) > λ}

need not be measurable when T is uncountable. In the remainder of this section we will see that, under some technical conditions on T, one can construct a continuous modification Y = (Y_t; t ∈ T) of X. By Lemma 2.2.1, Y and X have the same finite-dimensional distributions, and under mild conditions such as separability of T, sup_{t∈T} Y_t is indeed measurable in such cases. However, even when continuous modifications do not exist, one can often obtain a modification with nice measurability properties, e.g., such that sup_{t∈T} Y_t is measurable.
We will close this subsection with the more general, but weaker, property of separability for real-valued multiparameter processes; it is good enough to handle many such measurability issues, even when continuous modifications do not exist. A stochastic process X = (X_t; t ∈ R^N_+) is said to be separable if there exists an at most countable collection T ⊂ R^N_+ and a null set Λ such that for all closed sets A ⊂ R and all open sets I ⊂ R^N_+ of the form I = ∏_{ℓ=1}^{N} ]α^{(ℓ)}, β^{(ℓ)}[, where α^{(ℓ)} ≤ β^{(ℓ)} (1 ≤ ℓ ≤ N) are rational or infinite,

{ω : X_s(ω) ∈ A for all s ∈ I ∩ T} \ {ω : X_s(ω) ∈ A for all s ∈ I} ⊂ Λ.

It is important to note the order of the quantifiers: The choice of Λ and T can be made independently of that of A and I. Recall that a probability space is complete if all subsets of null sets are themselves null sets. Also, recall that one can always complete the underlying probability space at no cost. Consequently, we have the following.

Lemma 2.2.2 Suppose X = (X_t; t ∈ R^N_+) is a separable stochastic process and suppose the underlying probability space is complete. Then for any open rectangle I ⊂ R^N_+ and any τ ∈ R^N_+, the following are random variables: sup_{t∈I} X_t, inf_{t∈I} X_t, limsup_{t→τ} X_t, and liminf_{t→τ} X_t.

Exercise 2.2.2 Verify Lemma 2.2.2.
Thus far, we have seen that it is advantageous to study a separable modification Y of a process X, if and when it exists, since Y has the same finite-dimensional distributions as X but has fewer measure-theoretic problems. When can we find such modifications? The following remarkable fact, due to J. L. Doob, says that the answer is always; see Doob (1990, Chapter II) for extensions and other variants.

Theorem 2.2.1 (Doob's Separability Theorem) Any stochastic process X = (X_t; t ∈ R^N_+) has a separable modification.

The proof of Doob's separability theorem is long and is divided into two steps, which are stated below as technical lemmas. Throughout, R denotes the collection of all open rectangles in R^N_+ with rational or infinite endpoints, I denotes the collection of all closed intervals in R with rational or infinite endpoints, and C denotes the collection of all closed subsets of R.

Lemma 2.2.3 There exists a countable set S ⊂ R^N_+ such that for any fixed t ∈ R^N_+, the following is a null set:

N_t = ∪_{A∈I} {ω : X_s(ω) ∈ A for all s ∈ S, X_t(ω) ∉ A}.
Proof Temporarily, fix some Borel set A ⊂ R and let t_1 = 0 ∈ R^N_+ and

ε_1 = sup_{t∈R^N_+} P(X_{t_1} ∈ A, X_t ∉ A).

Having constructed distinct t_1, …, t_k ∈ R^N_+ and ε_1 ≥ ε_2 ≥ ⋯ ≥ ε_k, define

ε_{k+1} = sup_{t∈R^N_+} P(X_{t_j} ∈ A for all 1 ≤ j ≤ k, X_t ∉ A).

Clearly, ε_1 ≥ ε_2 ≥ ⋯ ≥ ε_{k+1}. Moreover, we can always choose some t_{k+1} ∈ R^N_+ \ {t_1, …, t_k} such that

P(X_{t_j} ∈ A for all 1 ≤ j ≤ k, X_{t_{k+1}} ∉ A) ≥ ε_{k+1}/2.

Thus,

∑_{k=2}^{∞} ε_k ≤ 2 ∑_{k=2}^{∞} P(X_{t_j} ∈ A for all 1 ≤ j ≤ k − 1, X_{t_k} ∉ A) ≤ 2 P(X_{t_k} ∉ A for some k ≥ 2) < +∞,

since the events in the middle sum are disjoint. In particular, lim_{k→∞} ε_k = 0. In other words, we have shown that for any Borel set A ⊂ R, there exists a countable set T_A such that

sup_{t∈R^N_+} P(X_s ∈ A for all s ∈ T_A, X_t ∉ A) = 0.

To finish, define S = ∪_{A∈I} T_A, which is clearly countable, since I is.
Lemma 2.2.4 For each t ∈ R^N_+,

∪_{A∈C} {ω : X_s(ω) ∈ A for all s ∈ S, X_t(ω) ∉ A} ⊂ N_t.
Proof Note that any A ∈ C can be written as A = ∩_{n=1}^{∞} A_n, where A_n ∈ I. By Lemma 2.2.3, for any such A ∈ C and A_1, A_2, … ∈ I,

{ω : X_s(ω) ∈ A for all s ∈ S, X_t(ω) ∉ A_n}
 ⊂ {ω : X_s(ω) ∈ A_n for all s ∈ S, X_t(ω) ∉ A_n}
 ⊂ ∪_{E∈I} {ω : X_s(ω) ∈ E for all s ∈ S, X_t(ω) ∉ E}
 ⊂ N_t.

Since A = ∩_{n≥1} A_n, the result follows.
Proof of Theorem 2.2.1⁵ Let I ∈ R and apply Lemma 2.2.4 to the stochastic process (X_t; t ∈ I) to conclude the existence of null sets N_t(I) (t ∈ R^N_+) and a countable set S_I ⊂ I such that

∪_{A∈C} {ω : X_s(ω) ∈ A for all s ∈ S_I, X_t(ω) ∉ A} ⊂ N_t(I).

Since R is countable, S = ∪_{I∈R} S_I is countable. Similarly, for all t ∈ R^N_+, Λ_t = ∪_{I∈R} N_t(I) is a null set.

Define R_I(ω) = closure{X_s(ω); s ∈ I ∩ S}. In words, R_I is the closure of the image of I ∩ S under the random function X. Note that R_I may include the values ±∞. Moreover, R_I is closed and nonempty in R ∪ {±∞}. For any t ∈ R^N_+, define the random set

R_t = ∩_{I∈R: t∈I} R_I.

Clearly, R_t ⊂ R ∪ {±∞} is closed and nonempty, for all ω ∈ Ω. Moreover, if t ∈ R^N_+ and ω ∉ Λ_t, then

X_t(ω) ∈ R_t(ω).  (1)

We are ready to construct the desired modification of X. For all ω ∈ Ω and all t ∈ S, define X̃_t(ω) = X_t(ω). If t ∉ S and ω ∉ Λ_t, also define X̃_t(ω) = X_t(ω). Finally, whenever t ∉ S and ω ∈ Λ_t, define X̃_t(ω) to be some designated element of R_t(ω). Since P(Λ_t) = 0 for each t ∈ R^N_+, P(X̃_t = X_t) = 1, which is to say that X̃ = (X̃_t; t ∈ R^N_+) is a modification of X.

It remains to show that X̃ is separable. Fix A ∈ C and I ∈ R, and suppose ω satisfies the following:

X̃_s(ω) ∈ A for all s ∈ I ∩ S.

Since A is closed and X̃_s = X_s for all s ∈ S, this implies that R_I(ω) ⊂ A. If s ∈ I \ S and ω ∈ Λ_s, then X̃_s(ω) ∈ R_s(ω) ⊂ R_I(ω) ⊂ A, by construction. Similarly, if s ∈ I \ S and ω ∉ Λ_s, then X̃_s(ω) = X_s(ω) ∈ R_s(ω) ⊂ R_I(ω) ⊂ A, by equation (1). Define

Λ = ∪_{s∈S} Λ_s.

Since S is countable, Λ is a null set; it is also chosen independently of all A ∈ C and I ∈ R. Thus, we have shown that

{ω : X̃_s(ω) ∈ A for all s ∈ I ∩ S} \ {ω : X̃_s(ω) ∈ A for all s ∈ I} ⊂ Λ.  (2)

Finally, any open rectangle J ⊂ R^N_+ can be written as J = ∪_{n=1}^{∞} I_n, where I_n ∈ R. Since Λ is chosen independently of all I ∈ R and all closed sets A, we conclude that for all open rectangles J ⊂ R^N_+ and all closed sets A ⊂ R,

{ω : X̃_s(ω) ∈ A for all s ∈ J ∩ S} \ {ω : X̃_s(ω) ∈ A for all s ∈ J} ⊂ Λ.

This shows the separability of X̃.

⁵ This can be skipped at first reading.
2.3 Kolmogorov's Continuity Theorem

Given a totally bounded pseudometric space (T, d), we seek to find conditions under which a stochastic process X = (X_t; t ∈ T) has a continuous modification.⁶ If X has a continuous modification, then it is clearly continuous in probability in the sense that for any ε > 0 and any t ∈ T,

lim_{s→t} P(|X_s − X_t| ≥ ε) = 0.

By Chebyshev's inequality, this automatically holds if there exists p > 0 such that for all t ∈ T,

lim_{s→t} E{|X_s − X_t|^p} = 0.

Our next theorem, known as Kolmogorov's continuity theorem, states that if E{|X_s − X_t|^p} → 0 quickly enough, then X has a continuous modification.

Theorem 2.3.1 (Kolmogorov's Continuity Theorem) Consider a process X = (X_t; t ∈ T), where (T, d) is a totally bounded pseudometric space, and suppose there exist p > 0 and a nondecreasing, continuous function Ψ : R_+ → R_+ such that Ψ(0) = 0 and for all s, t ∈ T,

E{|X_s − X_t|^p} ≤ Ψ(d(s, t)).

Then, X has a continuous modification, provided that there exists a nondecreasing, measurable function f : R_+ → R_+ such that:

⁶ Detailed historical background and related results, together with extensions from metric entropy to majorizing measures, can be found in Adler (1990), Dudley (1973, 1984, 1989), Ledoux (1996), Ledoux and Talagrand (1991), and their combined references.
2 Regularity Theory
(i) ∫_0^1 r^{−1} f(r) dr < ∞; and

(ii) ∫_0^1 D(r; T, d) Ψ(2r) {f(r)}^{−p} r^{−1} dr < +∞.
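To make condition (i) concrete, consider the choice f(r) = {ln(1/r)}^{−θ} with θ > 1, which is used repeatedly below. On (0, 1/e] the substitution u = ln(1/r) gives ∫ r^{−1} f(r) dr = ∫_1^∞ u^{−θ} du = 1/(θ − 1) < ∞. The following sketch (our numerical illustration, not part of the text) checks this identity by midpoint quadrature:

```python
import math

def f(r, theta):
    # f(r) = (ln(1/r))^(-theta), for 0 < r < 1
    return math.log(1.0 / r) ** (-theta)

def integral_condition_i(theta, n=200000, u_max=60.0):
    # integrate f(r)/r over [e^(-u_max), 1/e] with the substitution r = e^(-u);
    # u_max plays the role of infinity, since the integrand decays like u^(-theta)
    h = (u_max - 1.0) / n
    total = 0.0
    for i in range(n):
        u = 1.0 + (i + 0.5) * h     # midpoint rule in the variable u
        r = math.exp(-u)
        total += (f(r, theta) / r) * (r * h)   # dr = r du along the substitution
    return total

theta = 2.5
approx = integral_condition_i(theta)
exact = 1.0 / (theta - 1.0)   # closed form of the u-integral
```

For θ ≤ 1 the same integral diverges, which is why the proofs below always take θ > 1.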
Remark If Y = (Y_t; t ∈ T) denotes any modification of X, it is not a priori clear that the event {t → Y_t is continuous} is measurable. This is a part of the assertion of Theorem 2.3.1.

We will prove Theorem 2.3.1 in the next subsection. However, for now, let us state and prove a useful consequence.

Corollary 2.3.1 Let X = (X_t; t ∈ R^N_+) be a stochastic process that satisfies the following for some C, p > 0 and γ > 1 + p:

E{|X_s − X_t|^p} ≤ C |t − s|^N { ln_+ (1/|t − s|) }^{−γ},   t, s ∈ R^N_+.

Then X has a continuous modification.

Proof We first show that for all τ ∈ N^N, (X_t; t ∈ [0, τ]) has a continuous modification. Fix such a τ, let T = [0, τ], and define the pseudometric d by d(s, t) = |s − t|, s, t ∈ T. In fact, d is a metric on T, and (T, d) is totally bounded. To show total boundedness, we bound the metric entropy of (T, d). For any integer n ≥ 1, let T_n denote the collection of all points t ∈ T ∩ n^{−1}Z^N. Since τ ∈ N^N, the cardinality of T_n is at most C_τ n^N, where C_τ = ∏_{ℓ=1}^N (τ^(ℓ) + 1). On the other hand, T_n is easily seen to be a maximal (1/n)-net for T. Recalling Kolmogorov’s capacitance K from Section 2.1, it follows that K(1/n; T, d) ≤ C_τ n^N. By a monotonicity argument, for all n ≥ 2 and all n^{−1} ≤ r ≤ (n − 1)^{−1},

D(r; T, d) ≤ K(r; T, d) ≤ K(1/n; T, d) ≤ C_τ n^N ≤ C_τ (1/r + 1)^N.

We have used Lemma 2.1.1 to compare entropy to capacitance. The above discussion implies that for all 0 < r < 1,

D(r; T, d) ≤ 2^N C_τ r^{−N}.   (1)

By the assumption of Corollary 2.3.1, for all s, t ∈ T, E{|X_s − X_t|^p} ≤ Ψ(d(s, t)), where Ψ(x) = C x^N {ln_+(1/x)}^{−γ}. Let f(r) = {ln_+(1/r)}^{−θ}, where 1 < θ < p^{−1}(γ − 1) is a fixed number. By equation (1),

∫_0^1 D(r; T, d) Ψ(2r) / ( r {f(r)}^p ) dr ≤ 4^N C C_τ ∫_0^1 r^{−1} { ln_+ (1/2r) }^{θp−γ} dr,

which is finite, since θp − γ < −1. Since ∫_0^1 r^{−1} f(r) dr < ∞, Kolmogorov’s continuity theorem implies that for all τ ∈ N^N, the process (X_t; t ∈ [0, τ]) has a continuous modification.
We conclude this demonstration by proving the result for (X_t; t ∈ R^N_+); this is done by appealing to a patching argument. Let Y^τ = (Y^τ_t; t ∈ [0, τ]) denote the mentioned modification of (X_t; t ∈ [0, τ]). For any τ ≺ σ, both in N^N, Y^τ and Y^σ agree on [0, τ]; i.e., for each t ∈ [0, τ], P(Y^τ_t = Y^σ_t) = 1. Therefore, by continuity, P(Y^τ_t = Y^σ_t for all t ∈ [0, τ]) = 1. Using the ortholimit notation of Chapter 1, Section 2.7, define Y_t = lim_{σ↑∞} Y^σ_t, t ∈ R^N_+. It is easily seen that this limit exists, is continuous, and agrees with each Y^τ on [0, τ]. As such, Y is the desired modification of X. (This is called a patching argument, since Y is constructed in patches Y^σ, σ ∈ N^N.)
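The entropy count in the proof of Corollary 2.3.1 is easy to check by brute force: for T = [0, τ] with τ ∈ N^N, the grid T ∩ n^{−1}Z^N has ∏_ℓ (τ^(ℓ)n + 1) points, which is at most C_τ n^N with C_τ = ∏_ℓ (τ^(ℓ) + 1). A small sketch (our illustration; the names are ours):

```python
from itertools import product

def grid_points(tau, n):
    # all points of [0, tau] whose coordinates are integer multiples of 1/n
    axes = [[k / n for k in range(t * n + 1)] for t in tau]
    return list(product(*axes))

tau = (2, 3)     # tau in N^N with N = 2
n = 4
pts = grid_points(tau, n)
exact = (tau[0] * n + 1) * (tau[1] * n + 1)   # product formula for the count
C_tau = (tau[0] + 1) * (tau[1] + 1)
bound = C_tau * n ** len(tau)                 # the bound C_tau * n^N used in the proof
```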
2.4 Chaining

In this subsection we prove Kolmogorov’s continuity theorem (Theorem 2.3.1). Throughout, the assumptions of Theorem 2.3.1 are in force.

If D_n = D(2^{−n}; T, d) denotes the metric entropy of T evaluated at 2^{−n}, we can find D_n balls of radius 2^{−n} that cover T; let B_n denote the collection of these balls. For every t ∈ T, there exists a well-defined element of B_n that contains t; we denote this ball by B_n(t). Note that we can always choose this consistently, in the sense that whenever s ∈ B_n(t), then B_n(t) = B_n(s). Since B_n(t) ∩ B_{n+1}(t) is open and contains t, in an arbitrary but fixed way we choose some point c_n(t) ∈ B_n(t) ∩ B_{n+1}(t) and let C_n = (c_n(t); t ∈ T) designate the totality of these points. (We can think of c_n(t) as the “center” of B_n(t) and C_n as the collection of these centers. However, strictly speaking, this interpretation need not be correct.) It is important to note that the cardinality of C_n is at most that of B_n, which is D_n. In particular, C_n is a finite set.

[Figure 5.1: Covering by balls.]
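The families B_n of covering balls can be produced by a greedy construction: scan the points and keep any point farther than the target radius from all previously kept centers; the balls around the kept centers then cover T. A sketch on a discretized interval with d(s, t) = |s − t| (our illustration, not the book’s construction):

```python
def greedy_cover(points, radius):
    # greedily choose centers so that every point lies within `radius` of a center
    centers = []
    for p in sorted(points):
        if all(abs(p - c) > radius for c in centers):
            centers.append(p)
    return centers

T = [i / 100 for i in range(101)]   # a fine grid approximating [0, 1]
n = 3
r = 2.0 ** (-n)                     # covering radius 2^(-n) = 1/8
centers = greedy_cover(T, r)
# verify the covering property: every point of T is within r of some center
covered = all(any(abs(p - c) <= r for c in centers) for p in T)
```

The selected centers are pairwise more than r apart, so their number is also bounded by the packing number of T at scale r.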
We mention three more properties of the elements of C_n, the last one of which we just stated explicitly:

(P1) since c_n(t) ∈ B_n(t), d(c_n(t), t) ≤ 2^{−n} for all t ∈ T;

(P2) for all t ∈ T and all n ≥ 1, B_{n+1}(c_n(t)) = B_{n+1}(t), which implies that c_{n+1}(t) = c_{n+1}(c_n(t));

(P3) #C_n ≤ D_n, where #K denotes the cardinality of any set K.

For each integer n ≥ 1, define the stochastic processes X^n = (X^n_t; t ∈ T) and Y = (Y_t; t ∈ T) as follows:

X^n_t = X_{c_n(t)}   and   Y_t = lim sup_{n→∞} X_{c_n(t)},   t ∈ T.
The process Y will turn out to be an almost surely continuous modification of X, while X^n is a discretization of it. We will remodify Y later, in order to obtain a continuous modification of X. First, we show that we can even discuss the continuity of Y in a measurable way.

Lemma 2.4.1 The event {t → Y_t is continuous} is measurable.

Proof The event in question can be written as follows:

∩_{ε∈Q_+} ∪_{δ∈Q_+} ∩_{s,t ∈ ∪_{n=1}^∞ C_n : d(s,t)≤δ} { |Y_s − Y_t| ≤ ε },

which is measurable, since ∪_{n=1}^∞ C_n is at most countable.
The key estimate is the following; it shows that for large m, Y is very close to X^m, uniformly over all t ∈ T.

Lemma 2.4.2 There exists an integer m_0 ≥ 1 such that for all λ > 0 and all integers m ≥ m_0,

P( sup_{k≥m} sup_{t∈T} |Y_t − X^k_t| ≥ λ ) ≤ 2 λ^{−p} ∫_0^{2^{−m−1}} D(r; T, d) Ψ(2r) / ( r {f(r)}^p ) dr.

In particular, with probability one, lim_{k→∞} sup_{t∈T} |Y_t − X^k_t| = 0.

Proof By the first assertion, for all λ > 0,

lim_{m→∞} P( sup_{k≥m} sup_{t∈T} |Y_t − X^k_t| ≥ λ ) = 0.

Since the event inside the probability is decreasing in m, the second assertion follows. Thus, it suffices to demonstrate the asserted probability estimate.
For any two integers n > m ≥ 1, sup_{t∈T} |X^n_t − X^m_t| is a maximum over a finite set. As such, it is a random variable. Having checked measurability, we use what is called a chaining argument. Note that

max_{m≤k≤n} |X^k_t − X^m_t| ≤ ∑_{j=m+1}^n |X_{c_j(t)} − X_{c_{j−1}(t)}|.

In words, we have chained X^n_t to X^m_t by observing X over appropriate elements of C_m, …, C_n.

For all integers j ≥ 1, define w_j = f(2^{−j}). Since f is nondecreasing,

∑_{j=1}^∞ w_j = ∑_{j=1}^∞ ∫_{2^{−j}}^{2^{−j+1}} 2^j f(2^{−j}) dr ≤ 2 ∫_0^1 r^{−1} f(r) dr < ∞.

Thus, m_0 = inf( k ≥ 1 : ∑_{j=k}^∞ w_j ≤ 1 ) is well defined and is a finite integer. For all λ > 0 and all m ≥ m_0,

P( sup_{t∈T} max_{m≤k≤n} |X^k_t − X^m_t| ≥ λ )
  ≤ P( ∑_{j=m+1}^n |X_{c_j(t)} − X_{c_{j−1}(t)}| ≥ λ for some t ∈ T )
  ≤ ∑_{j=m+1}^n P( |X_{c_j(α)} − X_α| ≥ w_j λ for some α ∈ C_{j−1} )   (1)
  ≤ ∑_{j=m+1}^n ∑_{α∈C_{j−1}} P( |X_{c_j(α)} − X_α| ≥ w_j λ )
  ≤ ∑_{j=m+1}^n D_j max_{α∈C_{j−1}} P( |X_{c_j(α)} − X_α| ≥ w_j λ ).   (2)

Equations (1) and (2) follow from (P2) and (P3), respectively. By Chebyshev’s inequality and the assumptions of Kolmogorov’s continuity theorem (Theorem 2.3.1), for all λ > 0 and all m ≥ m_0,

P( sup_{t∈T} max_{m≤k≤n} |X^k_t − X^m_t| ≥ λ )
  ≤ λ^{−p} ∑_{j=m+1}^n D_j w_j^{−p} max_{α∈C_{j−1}} E{ |X_{c_j(α)} − X_α|^p }
  ≤ λ^{−p} ∑_{j=m+1}^n D_j {f(2^{−j})}^{−p} max_{α∈C_{j−1}} Ψ(d(c_j(α), α)).
By the monotonicity of Ψ and by (P1), Ψ(d(c_j(α), α)) ≤ Ψ(2^{−j}). Hence, for all λ > 0 and all m ≥ m_0,

P( sup_{t∈T} max_{m≤k≤n} |X^k_t − X^m_t| ≥ λ ) ≤ λ^{−p} ∑_{j=m+1}^n D_j {f(2^{−j})}^{−p} Ψ(2^{−j}).

Thus, for all m ≥ m_0,

P( sup_{t∈T} sup_{k≥m} |X^k_t − X^m_t| ≥ λ )
  ≤ λ^{−p} ∑_{j=m+1}^∞ D_j {f(2^{−j})}^{−p} Ψ(2^{−j})
  = λ^{−p} ∑_{j=m+1}^∞ ∫_{2^{−j−1}}^{2^{−j}} D(2^{−j}; T, d) Ψ(2^{−j}) 2^{j+1} / {f(2^{−j})}^p dr
  ≤ 2 λ^{−p} ∑_{j=m+1}^∞ ∫_{2^{−j−1}}^{2^{−j}} D(r; T, d) Ψ(2r) / ( r {f(r)}^p ) dr
  = 2 λ^{−p} ∫_0^{2^{−m−1}} D(r; T, d) Ψ(2r) / ( r {f(r)}^p ) dr.   (3)

Since Y_t = lim sup_{k→∞} X^k_t, for any k ≥ m ≥ 1 and all t ∈ T,

|Y_t − X^k_t| ≤ sup_{j≥m} |X^j_t − X^m_t|.

The lemma is a consequence of this fact, together with equation (3).
Lemma 2.4.3 Y is a modification of X.

Proof To start with, X is continuous in probability. In fact, by Chebyshev’s inequality, for all ε > 0, P(|X_s − X_t| ≥ ε) ≤ ε^{−p} Ψ(d(s, t)). Since Ψ is continuous and Ψ(0) = 0, it follows that as s → t, X_s converges to X_t in probability. In particular, for each t ∈ T, as n → ∞, X^n_t = X_{c_n(t)} → X_t in probability. By equation (3), there exists a finite m_0 such that for all m ≥ m_0 and all λ > 0,

P( |X^m_t − X_t| ≥ λ ) = lim_{n→∞} P( |X^m_t − X^n_t| ≥ λ )
  ≤ P( sup_{t∈T} sup_{k≥m} |X^k_t − X^m_t| ≥ λ )
  ≤ 2 λ^{−p} ∫_0^{2^{−m−1}} D(r; T, d) Ψ(2r) / ( r {f(r)}^p ) dr,

which goes to zero as m → ∞. This and Lemma 2.4.2 together show that for each t ∈ T, P(Y_t = X_t) = 1, proving the result.

Now that we know that Y is a modification of X, we use Lemma 2.4.2 to provide an estimate for the modulus of continuity of Y.
Lemma 2.4.4 Recall the integer m_0 of Lemma 2.4.2. For all 0 < δ < 2^{−m_0} and all λ > 0,

P( sup_{s,t∈T : d(s,t)≤δ} |Y_s − Y_t| ≥ λ ) ≤ 6^p λ^{−p} ∫_0^δ D(r; T, d) Ψ(2r) / ( r {f(r)}^p ) dr.

Proof For all s, t ∈ T, whenever s ∈ B_m(t), then c_m(s) = c_m(t). In particular, if s ∈ B_m(t), then X^m_s = X^m_t. Hence, for any m ≥ 1,

sup_{t∈T} sup_{s∈B_m(t)} |Y_s − Y_t| ≤ sup_{t∈T} |Y_t − X^m_t| + sup_{s∈T} |Y_s − X^m_s| = 2 sup_{t∈T} |Y_t − X^m_t|.

Applying this, we can deduce that for every integer m ≥ 1 and for all s, t ∈ T with d(s, t) ≤ 2^{−m},

|Y_s − Y_t| ≤ |Y_t − Y_{c_m(t)}| + |Y_{c_m(t)} − Y_{c_m(s)}| + |Y_s − Y_{c_m(s)}|
  ≤ 3 sup_{t∈T} sup_{s∈B_m(t)} |Y_t − Y_s|
  ≤ 6 sup_{t∈T} |Y_t − X^m_t|.

We have used the fact that whenever d(s, t) ≤ 2^{−m}, then c_m(t) ∈ B_m(c_m(s)); see (P2). Lemma 2.4.2 implies that for all m ≥ m_0,

P( sup_{s,t∈T : d(s,t)≤2^{−m}} |Y_s − Y_t| ≥ λ ) ≤ 6^p λ^{−p} ∫_0^{2^{−m−1}} D(r; T, d) Ψ(2r) / ( r {f(r)}^p ) dr.   (4)
Now, if δ < 2^{−m_0}, we can find an integer m ≥ m_0 such that 2^{−m−1} ≤ δ ≤ 2^{−m}. Hence,

sup{ |Y_s − Y_t| : s, t ∈ T, d(s, t) ≤ δ } ≤ sup{ |Y_s − Y_t| : s, t ∈ T, d(s, t) ≤ 2^{−m} }.

Similarly, ∫_0^{2^{−m−1}} {···} dr ≤ ∫_0^δ {···} dr, where {···} represents the integrand in equation (4). Equation (4) itself now implies the lemma.

We are now ready to prove Kolmogorov’s continuity theorem.

Proof of Theorem 2.3.1 By Lemma 2.4.4 and by the continuity properties of P, for all λ > 0,

P( lim_{δ→0+} sup_{s,t∈T : d(s,t)≤δ} |Y_s − Y_t| ≥ λ ) = lim_{δ→0+} P( sup_{s,t∈T : d(s,t)≤δ} |Y_s − Y_t| ≥ λ ) = 0.
Since λ > 0 is arbitrary, we see that Y is an a.s. continuous modification of X. To finish the proof, we remodify Y as follows. Consider the event Ω_0 defined to be the collection of all ω ∈ Ω such that t → Y_t(ω) is continuous. By Lemma 2.4.1, Ω_0 is measurable, and we have just shown that P(Ω_0) = 1. For all ω ∈ Ω_0, define the process Ỹ_t(ω) = Y_t(ω) for all t ∈ T, and for all ω ∉ Ω_0, let Ỹ_t(ω) ≡ 0 for all t ∈ T. Clearly, Ỹ = (Ỹ_t; t ∈ T) is continuous. Moreover, it is not hard to see that Ỹ is a modification of X, since it is a modification of Y. Theorem 2.3.1 follows for this modification.
2.5 Hölder-Continuous Modifications

Roughly speaking, Kolmogorov’s continuity theorem (Theorem 2.3.1) states that whenever X_s and X_t are sufficiently close uniformly in L^p(P), then X has a continuous modification. In this subsection we further investigate this principle by studying an important special case.

Given a totally bounded pseudometric space (T, d) and a continuous function y : T → R, define the modulus of continuity⁷ ω_{y,T} : R_+ → R_+ of y by

ω_{y,T}(δ) = sup_{s,t∈T : d(s,t)≤δ} |y(s) − y(t)|,   δ ≥ 0.

We say that y is Hölder continuous of order q > 0 if

lim sup_{δ→0+} δ^{−q} ω_{y,T}(δ) < +∞.

Stated in terms of the function y itself, y is Hölder continuous of order q > 0 if there are constants C, δ_0 > 0 such that whenever 0 < δ < δ_0, |y(s) − y(t)| ≤ C d(s, t)^q for all s, t ∈ T satisfying d(s, t) ≤ δ; see Supplementary Exercise 5 for details. When T is not totally bounded, we say that y is Hölder continuous of order q > 0 if for all totally bounded S ⊂ T, lim sup_{δ→0+} δ^{−q} ω_{y,S}(δ) < +∞.

In Section 2.4 we showed that under some conditions on the process X, there exists a modification Y such that lim_{δ→0+} ω_{Y,T}(δ) = 0. That is, we showed that Y is continuous. Our next theorem is an N-parameter refinement of Corollary 2.3.1. It shows that under enough smoothness in L^p(P), X has a Hölder-continuous modification.

Theorem 2.5.1 Let X = (X_t; t ∈ R^N_+) denote a stochastic process that satisfies the following for some C, p > 0 and γ > N:

E{|X_s − X_t|^p} ≤ C |s − t|^γ,   s, t ∈ R^N_+.

⁷This notion is due to H. Lebesgue.
Then there exists a modification Y = (Y_t; t ∈ R^N_+) of X that is Hölder continuous of any order q ∈ [0, p^{−1}(γ − N)[.

Proof For all s, t ∈ R^N_+, define d(s, t) = |s − t|. While (R^N_+, d) is not totally bounded, ([a, b], d) is, as long as a ≺ b are both in R^N_+. Fixing such a, b ∈ R^N_+, we aim to show that (X_t; t ∈ [a, b]) has a modification Y = (Y_t; t ∈ [a, b]) that is Hölder continuous of any order q < p^{−1}(γ − N). The patching argument used to complete the proof of Corollary 2.3.1 can be invoked to finish our proof.

The process Y = (Y_t; t ∈ [a, b]) is the one provided to us by Section 2.4. By the proof of Corollary 2.3.1, there exists a constant C_{a,b} such that for all 0 < r < 1, D(r; [a, b], d) ≤ 2^N C_{a,b} r^{−N}. Fix θ > 0 and define f(r) = {ln_+(1/r)}^{−θ}. Since ∫_0^1 r^{−1} f(r) dr < +∞, we can apply Lemma 2.4.4 with Ψ(x) = C x^γ to see that there exists a constant 0 < δ_0 < 1 such that for all λ > 0 and all 0 < δ < δ_0,

P( ω_{Y,[a,b]}(δ) ≥ λ ) ≤ 2^{N+γ} C_{a,b} C 6^p λ^{−p} ∫_0^δ r^{−N+γ−1} { ln(1/r) }^{pθ} dr.

Fix two arbitrary constants q and Q that satisfy 0 < q < Q < p^{−1}(γ − N). Since δ_0 < 1, for all λ > 0 and 0 < δ < δ_0,

P( ω_{Y,[a,b]}(δ) ≥ λ ) ≤ Γ λ^{−p} δ^{Qp},   (1)

where Γ = 2^{N+γ} C_{a,b} C 6^p ∫_0^1 r^{γ−N−Qp−1} { ln(1/r) }^{pθ} dr < ∞. We can apply equation (1) with δ = e^{−n} and λ = e^{−(n+1)q} to deduce that for all integers n > ln(1/δ_0),

P( ω_{Y,[a,b]}(e^{−n}) ≥ e^{−(n+1)q} ) ≤ e^{pq} Γ e^{−np(Q−q)}.

Since Q > q, the summation of the above over n is finite. By the Borel–Cantelli lemma, there exists a finite random variable n_0 such that a.s., for all n ≥ n_0, ω_{Y,[a,b]}(e^{−n}) ≤ e^{−(n+1)q}. We now use a monotonicity argument: For all δ < e^{−n_0}, there exists some integer n_δ ≥ n_0 with e^{−(n_δ+1)} ≤ δ ≤ e^{−n_δ}. Thus, outside one set of P-measure 0, for all 0 < δ < e^{−n_0},

ω_{Y,[a,b]}(δ) ≤ ω_{Y,[a,b]}(e^{−n_δ}) ≤ e^{−(n_δ+1)q} ≤ δ^q.

In particular, lim sup_{δ→0+} δ^{−q} ω_{Y,[a,b]}(δ) ≤ 1 almost surely. We have shown the existence of a modification of X that is a.s. Hölder continuous. We can remodify this, as we did in the proof of Kolmogorov’s continuity theorem, to obtain a Hölder-continuous modification and finish our derivation.
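For illustration (ours, not the text’s): when N = 1 and X is Brownian motion, E{|X_s − X_t|^p} = k_p |s − t|^{p/2} (see Supplementary Exercise 6), so Theorem 2.5.1 applies with γ = p/2 for any p > 2 and yields Hölder continuity of every order q < p^{−1}(γ − 1) = 1/2 − 1/p. Letting p → ∞ recovers every order below 1/2:

```python
def holder_bound(p, N=1):
    # the order bound q < (gamma - N)/p from Theorem 2.5.1, with gamma = p/2
    gamma = p / 2.0
    assert gamma > N, "the theorem requires gamma > N"
    return (gamma - N) / p

# the bound improves toward the critical exponent 1/2 as p grows
bounds = [holder_bound(p) for p in (4, 8, 16, 1024)]
```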
Exercise 2.5.1 (Hard) In the setting of Theorem 2.5.1, show that for all τ ∈ R^N_+, 0 < q < p, and all Q ∈ ]0, p^{−1}(γ − N)[, there exists a finite constant C (which depends on p, q, Q, γ, and τ) such that for all δ ∈ [0, 1[,

E{ sup_{s,t∈[0,τ] : |s−t|≤δ} |X_s − X_t|^q } ≤ C δ^{Qq}.

(Hint: Use integration by parts, in conjunction with equation (1).)

Exercise 2.5.2 In the setting of Theorem 2.5.1, show that for all 0 < q < p and all τ ∈ R^N_+, E{ sup_{t∈[0,τ]} |X_t|^q } < ∞.

Exercise 2.5.3 There are other, more relaxed, notions of modulus of continuity. For instance, consider the integral modulus of continuity Ω_f(ε) of an integrable function f, defined by

Ω_f(ε) = sup_{0≤y≤ε} ∫ |f(x + y) − f(x)| dx.

Prove that for every integrable function f, lim_{ε→0} Ω_f(ε) = 0. (Hint: First do this for simple functions f, and then appeal to density.)
2.6 The Entropy Integral

If (T, d) denotes a totally bounded pseudometric space, Theorem 2.5.1 shows one instance of the following general principle: If the stochastic process X = (X_t; t ∈ T) is very smooth in distribution, then it has a modification with some nice continuity properties. In this subsection we illustrate another class of processes for which the above principle holds true.

We say that the pseudometric d is a modulus of continuity for X in probability if there exists a strictly decreasing function Φ : R_+ → [0, 1] such that lim_{λ→∞} Φ(λ) = 0 and for all λ > 0,

sup_{s,t∈T} P{ |X_s − X_t| > d(s, t) λ } ≤ Φ(λ).

In order to emphasize the dependence on the function Φ, we will say that (d, Φ) is a modulus of continuity for X in probability. This definition is motivated by the following simple result.

Lemma 2.6.1 If (d, Φ) is a modulus of continuity for X in probability, then X is continuous in probability. Conversely, if X is continuous in probability and if (T, d) is a compact topological space, then X has a modulus of continuity in probability.

Exercise 2.6.1 Prove Lemma 2.6.1.
Throughout the remainder of this subsection we will suppose that (d, Φ) is a modulus of continuity for X in probability with a corresponding function Φ. Let Φ^{−1} denote the inverse function to Φ; this exists, since Φ is strictly decreasing. For any measurable function f : R_+ → R_+, we define

Θ_f(δ) = ∫_0^δ Φ^{−1}( f(r) / D(½r; T, d) ) dr,   δ > 0.

Next, we show that whenever Θ_f(1) < ∞ for a suitable function f, then X has a continuous modification whose modulus of continuity is Θ_f; see Section 2.5 for the latter notion.

Theorem 2.6.1 Consider a process X = (X_t; t ∈ T) indexed by a totally bounded pseudometric space (T, d). Suppose (d, Φ) is a modulus of continuity for X in probability, and that there exists a nondecreasing function f : R_+ → R_+ such that:

(a) ∫_0^{1/2} r^{−1} f(r) dr < ∞; and

(b) Θ_f(½) < ∞.

Then X has a continuous modification Y = (Y_t; t ∈ T). Finally, suppose that condition (a) is replaced by the stronger condition

(c) ∫_0^{1/2} r^{−1} ln(1/r) f(r) dr < ∞.

Then,

lim sup_{δ→0+} ω_{Y,T}(δ) / Θ_f(δ) ≤ 12.
Proof⁸ We follow the arguments of Section 2.4 very closely, but now we pick the weights w_1, w_2, … with more care. We will delineate the important steps of the proof, all the while using much of the notation of Section 2.4. Define

w_j = 2^{−j} Φ^{−1}( f(2^{−j}) / D(2^{−j}; T, d) ),   j ≥ 1.

An important property of the w_j’s is that for all integers n > m ≥ 1,

∑_{j=m+1}^n w_j = 2 ∑_{j=m+1}^n ∫_{2^{−j−1}}^{2^{−j}} Φ^{−1}( f(2^{−j}) / D(2^{−j}; T, d) ) dr ≤ 2 Θ_f(2^{−m−1}).   (1)

We have used the monotonicity properties of metric entropy, Φ^{−1}, and f. In analogy to Section 2.4, define

m_0 = inf( k ≥ 1 : ∑_{j=k}^∞ w_j ≤ 1 ).

⁸This can be skipped at first reading.
The above implies that m_0 is a well-defined finite integer. Arguing as in equations (1) and (2) of Section 2.4, we see that for all integers n > m ≥ m_0,

P( sup_{t∈T} max_{m≤k≤n} |X^k_t − X^m_t| ≥ 2Θ_f(2^{−m−1}) )
  ≤ ∑_{j=m+1}^n ∑_{α∈C_{j−1}} P( |X_{c_j(α)} − X_α| ≥ w_j )
  ≤ ∑_{j=m+1}^n ∑_{α∈C_{j−1}} Φ( w_j / d(c_j(α), α) ).

We have implicitly used Lemma 2.6.1, together with the convention that Φ(t/0) = 0 for all t ≥ 0. By (P1) of Section 2.4, d(c_j(α), α) ≤ 2^{−j}. Since Φ is decreasing, for all n > m ≥ m_0,

P( sup_{t∈T} max_{m≤k≤n} |X^k_t − X^m_t| ≥ 2Θ_f(2^{−m−1}) ) ≤ ∑_{j=m+1}^n D(2^{−j}; T, d) Φ(2^j w_j) = ∑_{j=m+1}^n f(2^{−j}).

The last line follows from (P3) of Section 2.4. Thus, for all n > m ≥ m_0,

P( sup_{t∈T} max_{m≤k≤n} |X^k_t − X^m_t| ≥ 2Θ_f(2^{−m−1}) ) ≤ ∑_{j=m+1}^n ∫_{2^{−j}}^{2^{−j+1}} 2^j f(2^{−j}) dr ≤ 2 ∫_0^{2^{−m}} r^{−1} f(r) dr.

Now we use the same modification provided us by Lemma 2.4.3. The proof of Lemma 2.4.4 shows that for all m ≥ m_0,

P( sup_{s,t∈T : d(s,t)≤2^{−m}} |Y_s − Y_t| ≥ 12 Θ_f(2^{−m−1}) ) ≤ 2 ∫_0^{2^{−m}} r^{−1} f(r) dr.   (2)

Since ∫_0^{1/2} r^{−1} f(r) dr < +∞, the right-hand side of equation (2) goes to 0 as m → ∞; similarly, lim_{m→∞} Θ_f(2^{−m−1}) = 0. Consequently, for all ε > 0, there exists some m_1(ε) ≥ m_0 such that for all m ≥ m_1(ε), Θ_f(2^{−m−1}) ≤ ε/12. By equation (2), for all ε > 0 and all m ≥ m_1(ε),

P( sup_{s,t∈T : d(s,t)≤2^{−m}} |Y_s − Y_t| ≥ ε ) ≤ 2 ∫_0^{2^{−m}} r^{−1} f(r) dr.
By the continuity properties of P, for all ε > 0,

lim_{m→∞} P( sup_{s,t∈T : d(s,t)≤2^{−m}} |Y_s − Y_t| ≥ ε ) = P( lim sup_{m→∞} sup_{s,t∈T : d(s,t)≤2^{−m}} |Y_s − Y_t| ≥ ε ) = 0.
In particular, lim_{m→∞} ω_{Y,T}(2^{−m}) = 0 a.s. We can now apply a monotonicity argument to complete the proof of a.s. continuity: For all 2^{−m} ≤ δ ≤ 2^{−m+1},

ω_{Y,T}(2^{−m}) ≤ ω_{Y,T}(δ) ≤ ω_{Y,T}(2^{−m+1}).

This shows that lim_{δ→0+} ω_{Y,T}(δ) = 0. In fact, a little more holds true, provided that ∫_0^{1/2} r^{−1} ln(1/r) f(r) dr < ∞. To see this, we use equation (2) to see that

∑_{m=m_0}^∞ P( ω_{Y,T}(2^{−m}) ≥ 12 Θ_f(2^{−m−1}) ) ≤ 2 ∑_{m=1}^∞ ∫_0^{2^{−m}} r^{−1} f(r) dr
  = 2 ∫_0^{1/2} ∑_{m=1}^∞ 1_{[0, 2^{−m}]}(r) r^{−1} f(r) dr
  ≤ (2/ln 2) ∫_0^{1/2} r^{−1} ln(1/r) f(r) dr,

which is finite. By the Borel–Cantelli lemma, there exists a random variable n_0 such that a.s., for all m ≥ n_0, ω_{Y,T}(2^{−m}) ≤ 12 Θ_f(2^{−m−1}). By a monotonicity argument we have established the existence of an a.s. continuous modification of X with an a.s. modulus of continuity of Θ_f. To remove the a.s., we remodify the process further, as in the proof of Kolmogorov’s continuity theorem.
2.7 Dudley’s Theorem

Let X = (X_t; t ∈ T) be a mean-zero Gaussian process, indexed by some set T. That is, for all t ∈ T, E[X_t] = 0. Let Σ denote the covariance function of X:

Σ(s, t) = E[X_s X_t],   s, t ∈ T.

We wish to find conditions under which X has a continuous modification.⁹ This question is always well posed when T is a pseudometric space. On the other hand, the process X itself induces a natural pseudometric on T as follows:

d(s, t) = √( E{|X_s − X_t|²} ),   s, t ∈ T.   (1)

According to Supplementary Exercise 1, (T, d) is a pseudometric space, and it turns out that, as far as the Gaussian process X is concerned, d is the correct pseudometric on T.¹⁰

⁹The general nonzero mean-µ case can be obtained from this by considering X_t + µ(t).
¹⁰This is a part of the general theory of Gaussian processes. For example, see (Adler 1990; Ledoux 1996; Ledoux and Talagrand 1991).
Exercise 2.7.1 The pseudometric d can be computed from the covariance function as follows:

d(s, t) = √( Σ(s, s) + Σ(t, t) − 2Σ(s, t) ),   s, t ∈ T.

Define the function Φ : R_+ → [0, 1] by

Φ(λ) = e^{−λ²/2},   λ > 0.

Clearly, Φ is strictly decreasing, and its inverse function is

Φ^{−1}(t) = √( 2 ln(1/t) ),   t ∈ ]0, 1].

Lemma 2.7.1 (d, Φ) is a modulus of continuity for X in probability.

Exercise 2.7.2 Prove Lemma 2.7.1.
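Lemma 2.7.1 reduces to a standard Gaussian tail bound: (X_s − X_t)/d(s, t) is standard normal, so P{|X_s − X_t| > d(s, t)λ} = erfc(λ/√2), which is dominated by Φ(λ) = e^{−λ²/2}. A quick numerical check of this domination and of the inverse pair (our sketch, not from the text):

```python
import math

def Phi(lam):
    # Phi(lambda) = exp(-lambda^2 / 2)
    return math.exp(-0.5 * lam * lam)

def Phi_inv(t):
    # inverse of Phi on ]0, 1]
    return math.sqrt(2.0 * math.log(1.0 / t))

def gauss_tail(lam):
    # P(|g| > lambda) for a standard Gaussian g
    return math.erfc(lam / math.sqrt(2.0))

lams = [0.01 * k for k in range(1, 501)]   # lambda on a grid in (0, 5]
dominated = all(gauss_tail(l) <= Phi(l) for l in lams)
roundtrip = max(abs(Phi(Phi_inv(t)) - t) for t in (0.9, 0.5, 0.1, 1e-6))
```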
Dudley’s theorem is the following sufficient condition for the continuity of a Gaussian process.

Theorem 2.7.1 (Dudley’s Theorem) Suppose X = (X_t; t ∈ T) is a mean-zero Gaussian process indexed by the pseudometric space (T, d). If (T, d) is totally bounded and if ∫_0^1 √( ln D(r; T, d) ) dr < ∞, then X has a continuous modification Y = (Y_t; t ∈ T). Moreover, there exists a universal constant C > 0 such that

lim sup_{δ→0+} ω_{Y,T}(δ) / ( ∫_0^δ √( ln D(½r; T, d) ) dr + Cδ √( ln ln(1/δ) ) ) ≤ 24.

This theorem relies on the following real-variable lemma.

Lemma 2.7.2 There exists a constant c > 0 such that for all 0 < δ < e^{−2},

∫_0^δ √( ln ln(1/r) ) dr ≤ c δ √( ln ln(1/δ) ).

Exercise 2.7.3 Prove Lemma 2.7.2. (Hint: You can try L’Hôpital’s rule, for instance.)

Proof of Theorem 2.7.1 Fix some θ > 2 and let

f(r) = { ln_+ (1/r) }^{−θ},   r > 0.

It is easy to see that condition (a) of Theorem 2.6.1 holds and, using the notation of Section 2.6,

Θ_f(δ) = ∫_0^δ √( 2 ln D(½r; T, d) + 2θ ln ln(1/r) ) dr,   δ > 0.

The result follows from the inequality √(a + b) ≤ √a + √b (a, b ≥ 0), and from Lemma 2.7.2.
Exercise 2.7.4 Prove that, in Dudley’s theorem, C = √2 works.
3 The Standard Brownian Sheet

This section is concerned with the study of continuity properties of the Brownian sheet of Section 1.5. Recall that B = (B_t; t ∈ R^N_+) is a Brownian sheet if it is a mean-zero Gaussian process indexed by R^N_+ that has the following covariance function:

Σ(s, t) = E[B_s B_t] = ∏_{ℓ=1}^N ( s^(ℓ) ∧ t^(ℓ) ),   s, t ∈ R^N_+.
3.1 Entropy Estimate

Let B = (B_t; t ∈ R^N_+) denote the N-parameter Brownian sheet. Since it is a Gaussian process, by the discussion of Section 2.7 there is a natural metric that B defines on its parameter space. This is given by

d(s, t) = √( E{|B_s − B_t|²} ) = √( E{(B_s)² + (B_t)² − 2 B_s B_t} )
  = ( ∏_{i=1}^N s^(i) + ∏_{j=1}^N t^(j) − 2 ∏_{k=1}^N ( s^(k) ∧ t^(k) ) )^{1/2}.

Recall that when N = 1, B is called Brownian motion. In this case,

d(s, t) = √( s + t − 2(s ∧ t) ) = √|s − t|.

However, such nice expressions do not exist when N > 1. On the other hand, as far as the continuity properties of B are concerned, all that matters is the behavior of d(s, t) when s and t are close. In other words, we need to know how fast d(s, t) goes to 0 as s → t. In order to do this, define the symmetric difference A △ B of any two sets A, B ⊂ R^N by
A △ B = ( A ∩ B^c ) ∪ ( A^c ∩ B ).

Lemma 3.1.1 For all s, t ∈ R^N_+, d²(s, t) = Leb([0, t] △ [0, s]). Moreover,

√( A |s − t| ) ≤ d(s, t) ≤ √( Ā |s − t| ),

where

A = max_{1≤k≤N} ∏_{j=1, j≠k}^{N} ( t^(j) ∧ s^(j) )   and   Ā = max_{1≤k≤N} ∏_{j=1, j≠k}^{N} ( t^(j) ∨ s^(j) ).
Proof One can use the explicit expression for d(s, t) and messy calculations to arrive at this estimate. However, there is a conceptual method that avoids many of the algebraic pitfalls. Recall from Section 1.4 the isonormal process W = (W(h); h ∈ L²(R^N)). By Čentsov’s representation (Theorem 1.5.1), B can be written as B_t = W(1_{[0,t]}); thus, B_t − B_s = W(1_{[0,t]}) − W(1_{[0,s]}). On the other hand, W enjoys linearity properties (Theorem 1.4.1). Hence, B_t − B_s = W(1_{[0,t]} − 1_{[0,s]}), a.s. A little thought shows that (1_{[0,t]} − 1_{[0,s]})² = 1_{[0,t]△[0,s]}. Therefore, B_t − B_s = W(1_{[0,t]△[0,s]}), a.s. (why?), and Theorem 1.4.1 implies E{|B_t − B_s|²} = Leb([0, t] △ [0, s]), as desired.

To estimate the above, note that r ∈ [0, t] △ [0, s] if and only if there exists some integer 1 ≤ k ≤ N such that s^(k) ∧ t^(k) < r^(k) < s^(k) ∨ t^(k). In particular,

[0, t] △ [0, s] ⊂ ∪_{k=1}^N { r ∈ [0, s ⊔ t] : s^(k) ∧ t^(k) < r^(k) < s^(k) ∨ t^(k) },

where s ⊔ t ∈ R^N is defined by (s ⊔ t)^(j) = s^(j) ∨ t^(j), 1 ≤ j ≤ N. The upper bound on d(s, t) readily follows from this. Similarly, for any 1 ≤ k ≤ N,

[0, t] △ [0, s] ⊃ { r ∈ [0, s ⊓ t] : s^(k) ∧ t^(k) < r^(k) < s^(k) ∨ t^(k) },

where (s ⊓ t)^(j) = s^(j) ∧ t^(j), which proves the corresponding lower bound.
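The identity d²(s, t) = Leb([0, t] △ [0, s]) can be checked numerically: the covariance formula gives d²(s, t) = ∏ s^(ℓ) + ∏ t^(ℓ) − 2∏(s^(ℓ) ∧ t^(ℓ)), while for N = 2 the measure of the symmetric difference can be computed by brute force on a grid. A sketch (our illustration):

```python
from math import prod

def d_squared(s, t):
    # E|B_s - B_t|^2 from the covariance Sigma(s, t) = prod_l (s_l ^ t_l)
    return prod(s) + prod(t) - 2 * prod(min(a, b) for a, b in zip(s, t))

def leb_sym_diff(s, t, m=400):
    # brute-force Lebesgue measure of [0,s] symdiff [0,t] on an m-by-m grid (N = 2)
    hx = max(s[0], t[0]) / m
    hy = max(s[1], t[1]) / m
    area = 0.0
    for i in range(m):
        for j in range(m):
            x = (i + 0.5) * hx          # cell midpoint
            y = (j + 0.5) * hy
            in_s = x <= s[0] and y <= s[1]
            in_t = x <= t[0] and y <= t[1]
            if in_s != in_t:            # point lies in exactly one rectangle
                area += hx * hy
    return area

s, t = (1.0, 2.0), (1.5, 1.25)
```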
Combining the above with Corollary 2.1.1, we obtain the following estimate on metric entropy after some algebraic manipulations.

Corollary 3.1.1 For any a ≺ b, both in R^N_+, there exist finite positive constants C, r_0 > 0 such that for all 0 < r < r_0,

D(r; [a, b], d) ≤ C r^{−2N}.

Exercise 3.1.1 Prove Corollary 3.1.1.
3.2 Modulus of Continuity

Armed with the entropy estimate of Section 3.1 (see Corollary 3.1.1), we now estimate the modulus of continuity of the Brownian sheet.
Theorem 3.2.1 An N-parameter Brownian sheet B = (B_t; t ∈ R^N_+) has a continuous modification β = (β_t; t ∈ R^N_+). Moreover, β can be constructed so that for any a ≺ b, both in R^N_+, the following holds:

lim sup_{δ→0+} sup_{s,t∈[a,b] : |s−t|≤δ} |β_s − β_t| / √( δ ln(1/δ) ) ≤ 24 N |b|^{N/2}.

In particular, β is Hölder continuous of any order γ < ½. From now on, any continuous Brownian sheet is referred to as the standard Brownian sheet, for clarity. Since a 1-parameter Brownian sheet is called Brownian motion, by a standard Brownian motion we mean a continuous Brownian motion.

Proof The asserted Hölder continuity is an immediate consequence of the bound on the modulus of continuity of β that we prove next. By a patching argument, it suffices to show that for all a ≺ b, both in R^N_+, (B_t; t ∈ [a, b]) has a continuous modification β = (β_t; t ∈ [a, b]); see the proof of Corollary 2.3.1 for the details of a patching argument.

Now apply Corollary 3.1.1 to deduce the existence of constants c > 0 and r_0 > 0 such that for all 0 < r < r_0, D(r; [a, b], d) ≤ c r^{−2N}. Without loss of any generality, we can assume that r_0 < 1. Since metric entropy is nonincreasing,

∫_0^1 √( ln D(r; [a, b], d) ) dr ≤ ∫_0^{r_0} √( ln c + 2N ln(1/r) ) dr + ∫_{r_0}^1 √( ln D(r_0; [a, b], d) ) dr,

which is finite. Dudley’s theorem (Theorem 2.7.1) establishes the existence of a continuous modification β. To obtain the stated lim sup bound, we use Corollary 3.1.1 together with L’Hôpital’s rule of elementary calculus to see that

lim sup_{δ→0+} ( ∫_0^δ √( ln D(½r; [a, b], d) ) dr + Cδ √( ln ln(1/δ) ) ) / ( δ √( 2N ln(1/δ) ) ) ≤ 1,

where C is the constant in Dudley’s theorem. Thus, the latter result implies that

lim sup_{δ→0+} sup_{s,t∈[a,b] : d(s,t)≤δ} |β_s − β_t| / ( δ √( ln(1/δ) ) ) ≤ 24 √(2N).

By Lemma 3.1.1, whenever |s − t| ≤ ε = N^{−1} |b|^{−N} δ², then d(s, t) ≤ δ. Thus,

sup_{s,t∈[a,b] : |s−t|≤ε} |β_s − β_t| ≤ sup_{s,t∈[a,b] : d(s,t)≤δ} |β_s − β_t|.
The upper bound on the lim sup follows.
The modulus of continuity of the Brownian sheet (Theorem 3.2.1) is sharp up to a constant:

Exercise 3.2.1 Suppose N = 1 and consider a standard Brownian motion B. Show that for any 0 ≤ a ≤ b, a.s.,

lim inf_{δ→0+} sup_{s,t∈[a,b] : |s−t|≤δ} |B_s − B_t| / √( 2δ ln(1/δ) ) ≥ 1.

This is due to P. Lévy. (Hint: Fix ε > 0, define E_j = { |B_{(j+1)/n} − B_{j/n}|² ≤ (2 + ε)(1/n) ln(n) }, and show that lim_{n→∞} P(∩_{j=0}^n E_j) = 0.)
Exercise 3.2.2 Let B denote the standard N-parameter Brownian sheet. Prove the following converse to Theorem 3.2.1: Almost surely,

lim inf_{δ→0+} sup_{s,t∈[a,b] : |s−t|≤δ} |B_s − B_t| / √( 2δ ln(1/δ) ) ≥ 1.
4 Supplementary Exercises

1. In the context of equation (1) of Section 2.7, verify that (T, d) is a pseudometric space.

2. Let D^n denote the collection of all dyadic cubes of side 2^{−n} in [0, 1]^N. That is, I ∈ D^n if and only if I is of the form I = ∏_{ℓ=1}^N [ j^(ℓ) 2^{−n}, (j^(ℓ) + 1) 2^{−n} ], where j ∈ N^N satisfies 0 ≤ j^(ℓ) < 2^n.

(i) If µ is any finite signed measure on [0, 1]^N, prove that lim_{n→∞} ∑_{I∈D^n} {µ(I)}² = 0.

(ii) Let W denote a white noise on R^N_+. Prove that a.s.,

lim_{n→∞} ∑_{I∈D^n} {W(I)}² = 1.

Use the above to prove that white noise is a.s. not a σ-finite signed measure. This is essentially due to P. Lévy. (Hint: Compute the mean and variance.)
3. For this exercise you need to know some elementary functional analysis. Let H denote a Hilbert space with inner product ⟨•, •⟩. We can then define the isonormal process W = (W(h); h ∈ H) as a Gaussian process with mean function 0 and covariance function given as follows: For all h, g ∈ H, E[W(h)W(g)] = ⟨h, g⟩.

(i) Use Kolmogorov’s existence theorem (Theorem 2, Appendix A) to prove directly that W exists.

(ii) Given two linear subspaces H_1 and H_2 of H, check that (W(h); h ∈ H_1) and (W(h); h ∈ H_2) are independent if and only if H_1 and H_2 are orthogonal; i.e., for all h_1 ∈ H_1 and h_2 ∈ H_2, ⟨h_1, h_2⟩ = 0.

(iii) Suppose H has a countable orthonormal basis; i.e., there exist ψ_1, ψ_2, … ∈ H such that every f ∈ H has the representation f = ∑_{i=1}^∞ ψ_i ⟨f, ψ_i⟩, where the convergence takes place in H. Then there exist independent standard Gaussian variables g_1, g_2, … such that for any h ∈ H, W(h) = ∑_{i=1}^∞ g_i ⟨h, ψ_i⟩, where the convergence holds in L²(P).

4. If d(s, t) = |s − t| for s, t ∈ [−1, 1]^N, verify that lim_{ε→0} ε^N D(ε; [−1, 1]^N, d) = 1.

5. Consider a totally bounded metric space (T, d). Prove that a function f : T → R is Hölder continuous of order q > 0 if and only if there exist two finite positive constants C and ε_0 such that for all s, t ∈ T with d(s, t) ≤ ε_0, |f(s) − f(t)| ≤ C d(s, t)^q. Prove that, unless f is a constant, q ≤ 1.

6. Suppose X = (X_t; t ∈ T) is a Gaussian process. Prove that for all p > 0 and all s ∈ T, E{|X_s|^p} = k_p { E[|X_s|²] }^{p/2}, where k_p = 2^{p/2} π^{−1/2} Γ(½(p + 1)). Conclude that for all s, t ∈ T,

E{|X_s − X_t|^p} = k_p [ E{|X_s − X_t|²} ]^{p/2}.

This is an L^p(P) variant of Exercise 2.7.1.

7. Demonstrate the following variant of Exercise 2.5.1: Under the conditions of Exercise 2.5.1, for all τ ≻ 0 in R^N_+, all 0 < q < p, and for every Q ∈ ]0, p^{−1}(γ − N)[,

E{ sup_{s,t∈[0,τ]} |X_s − X_t|^q / |s − t|^{Qq} } < +∞.

8. (Hard) Let X and Y be two d-dimensional Gaussian random vectors such that for all 1 ≤ i, j ≤ d, E[X^(i)] = E[Y^(j)] = 0, E[(X^(i))²] = E[(Y^(i))²], and E[X^(i) X^(j)] ≤ E[Y^(i) Y^(j)]. We intend to demonstrate the following comparison inequality of Slepian (1962) for all λ > 0:

P( max_{1≤i≤d} X^(i) ≥ λ ) ≥ P( max_{1≤i≤d} Y^(i) ≥ λ ).   (1)

This will be done in a number of steps.
(i) Let [f ] denote the Fourier transform of f . That is, for all f : Rd → R, ξ ∈ Rd , [f ](ξ) = eiξ·x f (x) dx, if, say, f ∈ L1 (Rd ). Show that whenever f is continuously differentiable and vanishes outside a compact set, ∂f (ξ) = −iξ (i) [f ](ξ), ∂x(i) where ξ ∈ Rd and 1 ≤ i ≤ d. Conclude that if f is twice continuously differentiable and vanishes outside a compact set, ∂2f (ξ) = −ξ (i) ξ (j) [f ](ξ). ∂x(i) ∂x(j) √ √ (ii) For all t ∈ [0, 1], define Zt = 1 − t X + t Y. Verify that Z = (Zt ; t ∈ [0, 1]) d is an R -valued Gaussian process indexed by [0, 1]. That is, for all finite T ⊂ [0, 1], (Zt ; t ∈ T ) is a Gaussian random vector. Moreover, αi,j (t) = (i) (j) E[Zt Zt ] satisfies the differential inequality αi,j (t) ≥ 0. (iii) Let ϕt denote the probability density of Zt (with respect to Lebesgue’s measure). Show that [ϕt ](ξ) = exp
−
d d 1 (i) (j) ξ ξ αi,j (t) . 2 i=1 j=1
(iv) Use the inversion theorem for Fourier transforms, combined with (iii), to deduce that d d 1 dϕt ∂ 2 ϕt (x) = α (t) (x). dt 2 i=1 j=1 i,j ∂x(i) ∂x(j) (v) Prove that
    −[∂²ϕ_t/{∂x^{(i)}}²](ξ) = |ξ^{(i)}|² [ϕ_t](ξ) ≥ 0.

Conclude that

    dϕ_t/dt (x) ≥ ½ Σ_{i,j=1; i≠j}^d α′_{i,j}(t) ∂²ϕ_t/(∂x^{(i)}∂x^{(j)}) (x).
(vi) Prove that

    d/dt P( max_{1≤i≤d} Z_t^{(i)} ≤ λ ) = ∫_{−∞}^{λ} ··· ∫_{−∞}^{λ} dϕ_t/dt (x) dx
        ≥ ½ Σ_{i,j=1; i≠j}^d α′_{i,j}(t) ∫_{−∞}^{λ} ··· ∫_{−∞}^{λ} ∂²ϕ_t/(∂x^{(i)}∂x^{(j)}) (x) dx.

Moreover, show that the above is nonnegative and derive equation (1) above. (Hint: In (vi), to show the final positivity, show that the multidimensional integral is a probability.)
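A quick Monte Carlo illustration of Slepian's inequality (1) in the case d = 2 (an illustration only; the exercise asks for a proof). Here X has independent N(0,1) coordinates, while Y has the same variances but covariance ρ = 0.9 > 0 = E[X^{(1)}X^{(2)}], so (1) predicts that the maximum of X exceeds any level at least as often as the maximum of Y:

```python
import numpy as np

# Slepian comparison, d = 2: smaller covariances => larger maximum.
rng = np.random.default_rng(0)
n, lam, rho = 200_000, 1.0, 0.9

X = rng.standard_normal((n, 2))                 # independent coordinates
Z = rng.standard_normal((n, 2))
# Correlated pair built from independent normals (covariance rho).
Y = np.column_stack([Z[:, 0], rho * Z[:, 0] + np.sqrt(1.0 - rho**2) * Z[:, 1]])

p_X = float(np.mean(X.max(axis=1) >= lam))
p_Y = float(np.mean(Y.max(axis=1) >= lam))
print(p_X, p_Y)   # p_X ~ 1 - Phi(1)^2 ~ 0.292, and p_X exceeds p_Y
```

The estimated probability for the independent pair is visibly larger, as (1) asserts.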
5. Gaussian Random Variables
9. Apply Slepian's inequality (Supplementary Exercise 8) to derive the following improvement of Exercise 3.2.2: If B denotes the N-parameter standard Brownian sheet, then with probability one,

    lim inf_{δ→0+} sup_{s,t∈[a,b]: |s−t|≤δ} |B_s − B_t| / √(2Nδ ln(1/δ)) ≥ 1.

This is a part of Orey and Pruitt (1973, Theorem 2.4); see also Esquível (1996). (Hint: The increments of B have positive correlations. Compare them to a Gaussian process whose increments are independent.)
10. Let (ξ_{i,j}; i, j ≥ 0) denote i.i.d. standard Gaussian random variables and recall from Chapter 2 the N-dimensional Haar functions (h^N_{k,j}; k ≥ 0, j ∈ Γ^N(k)).
(i) Show that if g1, g2, . . . are i.i.d. standard Gaussian random variables, there exists a finite C > 0 such that for all n ≥ 1, E{max_{j≤n} |g_j|} ≤ C √(ln_+ n).
(ii) Show that with probability one, Σ_{k=0}^∞ Σ_{j∈Γ^N(k)} ξ_{k,j} h^N_{k,j}(t) converges uniformly in t ∈ [0, 1]^N.
(iii) Check that t ↦ B_t is a Brownian sheet with t restricted to [0, 1]^N, where B_t = Σ_{k=0}^∞ Σ_{j∈Γ^N(k)} ξ_{k,j} h^N_{k,j}(t).
When N = 1, this series expansion of the Brownian sheet is due to P. Lévy and simplifies an older one, due to N. Wiener, who used it to prove the existence of Brownian motion; this and related results can be found in Adler (1990, Section 3.3, Chapter 3) and Itô and McKean (1974, Section 1.5, Chapter 1). (Hint: For part (i), integrate the inequality P(max_{j≤n} |g_j| ≥ λ) ≤ n P(|g_1| ≥ λ). For part (ii), have a look at Supplementary Exercise 1, Chapter 2.)
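A sketch of the N = 1 case of Exercise 10: Lévy's midpoint refinement computes the partial sums of the Schauder series of Brownian motion on [0,1] level by level. The endpoints come first; then each dyadic midpoint is filled in with the Brownian-bridge conditional mean plus an independent Gaussian correction (whose variance is L/4 over an interval of length L):

```python
import numpy as np

# Levy's midpoint construction of Brownian motion on [0, 1].
rng = np.random.default_rng(2)

def levy_bm(levels, n_paths, rng):
    m = 2 ** levels                           # final grid: t_i = i/m
    B = np.zeros((n_paths, m + 1))
    B[:, -1] = rng.standard_normal(n_paths)   # B_1 ~ N(0, 1)
    step = m
    while step > 1:
        half = step // 2
        idx = np.arange(half, m, step)        # midpoints at this level
        sd = np.sqrt(step / (4.0 * m))        # bridge std at a midpoint
        B[:, idx] = 0.5 * (B[:, idx - half] + B[:, idx + half]) \
            + sd * rng.standard_normal((n_paths, idx.size))
        step = half
    return B

B = levy_bm(levels=8, n_paths=20_000, rng=rng)
s, t = 64, 192                                # s = 0.25, t = 0.75 on the grid
cov_est = float(np.mean(B[:, s] * B[:, t]))
print(cov_est)                                # E[B_s B_t] = min(s, t) = 0.25
```

The empirical covariance matches E[B_s B_t] = min(s, t), consistent with part (iii) in one parameter.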
5 Notes on Chapter 5

Section 1
This section's construction of Gaussian processes and variables only scratches the surface of this theory. See Janson (1997) for a very rich, as well as modern, account. The isonormal process makes rigorous objects (such as white noise) that have been lurking in the physics and engineering literature for a long time, and was developed in Segal (1954); an excellent resource for this and its relation to abstract Gaussian measures is Dudley (1973). The Brownian sheet and related multiparameter processes were first discovered in the theory of mathematical statistics; see Kitagawa (1951). Modern accounts can now be found in (Adler 1990; Dudley 1984; Ledoux and Talagrand 1991; Gänssler 1983; Pollard 1984); see also Adler and Pyke (1997) for a related set of results. An interesting recent appearance of the Brownian sheet in statistical mechanics is presented in Kuroda and Manaka (1987, 1988, 1998).

Section 2
In its present form, the notion of modifications and separability is due to J. L. Doob. The classic text Doob (1990, Chapter II) contains a very thorough account as well as a wealth of references to older literature.
As mentioned within the text, metric entropy is due to R. M. Dudley; what we call Kolmogorov entropy is sometimes also called ε-entropy. This was motivated by a related notion of a λ-capacity from information theory. The latter was discovered in Shannon (1948); see also Shannon and Weaver (1949). Example 2 of this section is a special case of more general estimates of Kolmogorov; see Dudley (1984, Theorem 7.1.1). Kolmogorov's infinite-dimensional calculations were preceded by the finite-dimensional computations of Muroga (1949). The continuity theorems of this chapter are essentially borrowed from Dudley (1973); they are the sharp form of classical results of A. N. Kolmogorov, as well as those of Garsia et al. (1971). The metric entropy integral condition has since been replaced by a "majorizing measure" condition due to X. Fernique and, later, to M. Talagrand. This level of refinement is not used in this book, although it is well within reach of the methods here; see (Adler 1990; Ledoux 1996; Ledoux and Talagrand 1991) for more detailed information.

Section 3
Theorem 3.2.1 can be sharpened. In fact, the lim sup there equals the lim inf, and the lower bound in Supplementary Exercise 9 is sharp. See Orey and Pruitt (1973, Theorem 2.4) and Csörgő and Révész (1978, 1981) for related results, as well as further refinements.
6 Limit Theorems
Suppose X1, X2, . . . are independent, identically distributed R^d-valued random variables and consider, as usual, the random walk k ↦ S_k = X1 + ··· + X_k. If E[X1] = 0 and E[|X1|²] < +∞, the classical central limit theory on R^d implies that as n → ∞, the random vector n^{−1/2} S_n converges in distribution to an R^d-valued Gaussian random vector. It turns out that much more is true; namely, as n → ∞, the distribution of the process t ↦ n^{−1/2} S_{nt} starts to approximate that of a suitable Gaussian process. This approximation is good enough to show that various functionals of the random walk path converge in distribution to those of the limiting Gaussian process. To do any of this, we first view the process S as a random element of a certain space of functions. Thus, we begin with general facts about random variables that take their values in a topological space.
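A minimal illustration of the functional convergence just described, assuming ±1 coin-flip steps: the path functional max_k S_k/√n of the rescaled walk converges in distribution to the maximum of Brownian motion over [0,1], whose law is given by the reflection principle, P(max B ≥ λ) = 2 P(N(0,1) ≥ λ):

```python
import numpy as np
from math import erf, sqrt

# Maximum of the rescaled random walk vs. the reflection-principle limit.
rng = np.random.default_rng(3)
n, paths, lam = 400, 10_000, 1.0

steps = rng.choice([-1.0, 1.0], size=(paths, n))
walk_max = np.cumsum(steps, axis=1).max(axis=1) / sqrt(n)
est = float(np.mean(walk_max >= lam))
limit = 1.0 - erf(lam / sqrt(2.0))   # = 2 P(N(0,1) >= lam) ~ 0.317
print(est, limit)
```

The empirical probability for n = 400 already sits close to the Brownian limit, which is precisely the kind of statement the weak convergence theory of this chapter makes rigorous.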
1 Random Variables

This chapter starts with some general results on random variables that take values in a topological space T. Throughout, B(T) will denote the Borel σ-field on T, which is the σ-field generated by all open subsets of T. Oftentimes, we assume further that T is a metric space; i.e., we are really assuming the existence of a metric d on T that is compatible with the topology of T. The astute reader will notice that much of the material of this section is in parallel with the theory of R^d-valued random variables.
1.1 Definitions

Let T be a topological space, endowed with its Borel field B(T). A T-valued random variable X on the probability space (Ω, G, P) is just a measurable map X : Ω → T. In other words, X is a T-valued random variable if for all E ∈ B(T), (ω ∈ Ω : X(ω) ∈ E) ∈ G. The latter is an event and will be written as (X ∈ E) for the sake of brevity. For example, using this notation, we need only write P(X ∈ E) instead of the more cumbersome P(ω ∈ Ω : X(ω) ∈ E).
The calculus of T-valued random variables follows the same rules as that of R^d-valued random variables, and is derived by similar means. Therefore, many of the results of this section are left as exercises that the first-time reader is encouraged to attempt.

Theorem 1.1.1 Let T1 and T2 be topological spaces with Borel fields B(T1) and B(T2), respectively. If X is a T1-valued random variable and f : T1 → T2 is measurable, then f(X) is a T2-valued random variable.

Exercise 1.1.1 Prove Theorem 1.1.1.
The above has many important consequences, an example of which is the following. Corollary 1.1.1 Suppose T1 , T2 , and T3 are topological spaces. If Xi is a Ti -valued random variable (i = 1, 2), and if f : T1 × T2 → T3 is product measurable, then f (X1 , X2 ) is a T3 -valued random variable. Exercise 1.1.2 Prove Corollary 1.1.1.
In particular, when X1, . . . , Xn are T-valued random variables, and given any function f : T^n → T, Y = f(X1, . . . , Xn) is a random variable. As an example, consider a linear topological space T over R. If X1, . . . , Xn are T-valued random variables, then so is S_n = Σ_{j=1}^n X_j. This is a familiar object and motivates the need for the following theorem.

Theorem 1.1.2 Suppose T is a complete metric space and X1, X2, . . . is a.s. a Cauchy sequence of T-valued random variables. Then, there exists a T-valued random variable X such that X = lim_{n→∞} Xn, a.s.

Proof Since T is complete and (Xn; n ≥ 1) is a.s. a Cauchy sequence, for P-almost all ω ∈ Ω, lim_{n→∞} Xn(ω) exists. Thus, there exists an event Ω0 ∈ G such that P(Ω0) = 1 and for all ω ∈ Ω0, X(ω) = lim_{n→∞} Xn(ω) exists. Fix some arbitrary t ∈ T and define X(ω) = t for all ω ∈ Ω \ Ω0, so that X(ω) is now defined for all ω ∈ Ω. Suppose that G ⊂ T is open and that for some ω ∈ Ω0, X(ω) ∈ G. Then, there exists n0 ≥ 1 such that for
all n ≥ n0, Xn(ω) ∈ G. Similarly, the converse also holds. In summary,

    (X ∈ G) ∩ Ω0 = ⋃_{n0=1}^∞ ⋂_{n=n0}^∞ (Xn ∈ G) ∩ Ω0.   (1)

If t ∈ G, then (X ∈ G) = (Ω \ Ω0) ∪ [(X ∈ G) ∩ Ω0]. Otherwise, (X ∈ G) = (X ∈ G) ∩ Ω0. In any case, equation (1) shows that (X ∈ G) ∈ G, for all open sets G ⊂ T. Since B(T) is generated by open sets, X is a T-valued random variable.
1.2 Distributions

Suppose T is a topological space with its Borel field B(T). To every T-valued random variable X we can associate a distribution P ∘ X^{−1} defined by

    P ∘ X^{−1}(E) = P(X ∈ E),   E ∈ B(T).

The following characterizes an important property of the distribution of X.

Lemma 1.2.1 If X is a T-valued random variable, then P ∘ X^{−1} is a probability measure on (T, B(T)).

Exercise 1.2.1 Prove Lemma 1.2.1.
Thus, for every T-valued random variable X, there exists a probability measure P ∘ X^{−1}; the converse is also true.

Theorem 1.2.1 Let µ be a probability measure on a topological space T. There exists a probability space (Ω, G, P) on which there is a T-valued random variable X whose distribution, P ∘ X^{−1}, is µ.

Proof Let (Ω, G, P) = (T, B(T), µ), and define X to be the coordinate function on T; that is, X(ω) = ω, for all ω ∈ T. It follows that for all E ∈ B(T), P ∘ X^{−1}(E) = P(X ∈ E) = µ(E), as desired.

Let us conclude this subsection with a change of variables formula.

Theorem 1.2.2 Suppose X is a T-valued random variable, where T is a topological space. Then, for all bounded, continuous functions f : T → R,

    E[f(X)] = ∫_T f(ω) P ∘ X^{−1}(dω).
Exercise 1.2.2 Prove Theorem 1.2.2. (Hint: Consider, first, functions of the form f(x) = 1_A(x).)
1.3 Uniqueness

Suppose T = R and let X be an R-valued random variable. In order to "know" the measure P ∘ X^{−1}, all we need are the probabilities P ∘ X^{−1}(E) for an appropriately large class of Borel sets E. For example, it is sufficient to know P ∘ X^{−1}(E) for all E of the form ]−∞, x], where x ∈ R. In this case, P ∘ X^{−1}(]−∞, x]) = P(X ≤ x) is the familiar cumulative distribution function of X. More generally, when T = R^d, we only need to know the "distribution function" x ↦ P(X ≼ x), where ≼ denotes the coordinatewise partial order (why?).
When T is a general topological space, such notions do not easily extend themselves. In order to generalize to a topological T, first notice that when x ∈ T = R^d, P(X ≼ x) = E[1_{]−∞,x]}(X)], where ]−∞, x] = ∏_{ℓ=1}^d ]−∞, x^{(ℓ)}]. Since one can approximate 1_{]−∞,x]} by a bounded, continuous function arbitrarily well, it follows that when T = R^d, knowing the collection {E[f(X)]; f : R^d → R bounded, continuous} amounts to knowing the entire distribution of X. One attractive feature of this formulation is that since continuity is a topological phenomenon, one can just as easily work on more general topological spaces. One way to state this is as follows: Let C_b(T) denote the collection of all bounded, continuous functions f : T → R.

Theorem 1.3.1 Consider probability measures P1 and P2 on (T, B(T)), where T is a metric space. If ∫_T f(ω) P1(dω) = ∫_T f(ω) P2(dω) for all f ∈ C_b(T), then P1 = P2.

Before proving Theorem 1.3.1, we note the following important result.

Corollary 1.3.1 Let T be a topological space, and consider any T-valued random variable X. Then, the collection {E[f(X)]; f ∈ C_b(T)} uniquely determines the distribution of X. In other words, as f varies over C_b(T), f ↦ E[f(X)] plays the role of the cumulative distribution function of X.
The key technical step in the proof of Theorem 1.3.1 is Urysohn’s lemma of general topology, which we state here in the context of metric spaces; see Munkres (1975, Chapter 4, Section 4-3) for this, and for further extensions. Lemma 1.3.1 (Urysohn’s Lemma) If A and B are two disjoint closed subsets of a metric space T , there exists a continuous function f : T → [0, 1] such that for all ω ∈ A, f (ω) = 1, and for all ω ∈ B, f (ω) = 0. In words, disjoint closed subsets of a metric space can be separated by continuous functions.
Proof of Theorem 1.3.1 It suffices to show that for all closed sets F ⊂ T,

    ∫_T 1_F(ω) P1(dω) = ∫_T 1_F(ω) P2(dω).

Had 1_F been continuous, we would be done. Since this is clearly not the case, we approximate 1_F by a bounded continuous function. For any ε > 0, define Fε = {x ∈ T : d({x}, F) < ε}, where the distance d(A, B) between any two sets A and B is defined as

    d(A, B) = inf{d(x, y) : x ∈ A, y ∈ B}.   (1)

Since F is closed, x ↦ d({x}, F) is continuous; thus, Fε^∁ is closed. By Lemma 1.3.1, we can find a continuous function f : T → [0, 1] such that f = 1 on F and f = 0 on Fε^∁. Since 0 ≤ f ≤ 1, we have shown that 1_F ≤ f ≤ 1_{Fε}. Thus,

    P1(Fε) = ∫_T 1_{Fε}(ω) P1(dω) ≥ ∫_T f(ω) P1(dω) ≥ P2(F).

On the other hand, if 0 < ε1 ≤ ε2, then Fε1 ⊂ Fε2 and ⋂_{ε>0} Fε = F. Thus,

    P1(F) = lim_{ε→0} P1(Fε) ≥ P2(F).

Reversing the roles of P1 and P2, we obtain the result.
2 Weak Convergence

Let (µn; 1 ≤ n ≤ ∞) denote a sequence of probability measures on a topological space T. We say that µn converges weakly to µ∞ if for all bounded, continuous functions f : T → R,

    lim_{n→∞} ∫_T f(ω) µn(dω) = ∫_T f(ω) µ∞(dω),

and write this as µn =⇒ µ∞. Sometimes, when µn is the distribution of a T-valued random variable Xn (1 ≤ n ≤ ∞), we may say, instead, that Xn converges weakly to X∞, and write this as Xn =⇒ X∞. Note that we do not require the Xn's to be defined on the same probability space. Suppose that for all 1 ≤ n ≤ ∞, Xn is defined on a probability space (Ωn, Fn, Pn). By Theorem 1.2.2, Xn =⇒ X∞ if and only if for all continuous, bounded functions f : T → R, lim_{n→∞} En[f(Xn)] = E∞[f(X∞)], where En denotes the expectation operator corresponding to Pn (1 ≤ n ≤ ∞). Since this is distracting, we often abuse the notation by writing lim_{n→∞} E[f(Xn)] = E[f(X∞)]; it should be clear which expectation applies to what random variable.
2.1 The Portmanteau Theorem

The following result characterizes weak convergence of probability measures.

Theorem 2.1.1 (The Portmanteau Theorem) Suppose (µn; 1 ≤ n ≤ ∞) denotes a collection of probability measures on a topological space T. The following are equivalent:
(i) µn =⇒ µ∞;
(ii) for all closed sets F ⊂ T, lim sup_{n→∞} µn(F) ≤ µ∞(F);
(iii) for all open sets G ⊂ T, lim inf_{n→∞} µn(G) ≥ µ∞(G); and
(iv) for all measurable A ⊂ T such that µ∞(∂A) = 0, lim_{n→∞} µn(A) = µ∞(A).

Remarks (a) ∂A is the topological boundary of A; it is defined as ∂A = Ā \ A°, where Ā and A° are the closure and interior of A, respectively.
(b) At least when T = R^d, Theorem 2.1.1 should be a familiar result.

Proof If F is closed, F^∁ is open and vice versa. Since µ(U) = 1 − µ(U^∁) for all U ∈ B(T), the equivalence of (ii) and (iii) follows.
Now we show that (ii) and (iii) together imply (iv). Using the notation of the above remark (a), µn(A°) ≤ µn(A) ≤ µn(Ā). Since A° is open and Ā is closed, (ii) and (iii) together show that

    µ∞(A°) ≤ lim inf_{n→∞} µn(A) ≤ lim sup_{n→∞} µn(A) ≤ µ∞(Ā).

Since µ∞(Ā) = µ∞(A°) + µ∞(∂A) = µ∞(A°), property (iv) follows.
Next, we show that (iv) ⇒ (ii). In the notation of equation (1) of Section 1.3, for any F ⊂ T and for all ε > 0, define

    Fε = {y ∈ T : d({y}, F) < ε}.   (1)

This is an open set for any ε > 0; cf. the proof of Theorem 1.3.1. Since µ1, µ2, . . . , µ∞ can have no more than a denumerable number of atoms, we can choose a sequence εk → 0 such that for all 1 ≤ n ≤ ∞ and all k ≥ 1, µn(∂Fεk) = 0. By (iv), lim_{n→∞} µn(Fεk) = µ∞(Fεk). Thus, (iv) implies that for any measurable F ⊂ T,

    lim sup_{n→∞} µn(F) ≤ lim_{n→∞} µn(Fεk) = µ∞(Fεk).

If, in addition, F is closed, then ⋂_k Fεk = F, and we obtain (iv) ⇒ (ii) from the above. Finally, we show that (i) ⇒ (ii).
If F is a closed subset of T, by Urysohn's lemma (Lemma 1.3.1), we can find a continuous function f : T → [0, 1] such that f = 1 on F and f = 0 on Fε^∁, where Fε is defined in equation (1) above. Since f ≥ 1_F, µn(F) ≤ ∫_T f(ω) µn(dω). By (i), as n → ∞, this converges to ∫_T f(ω) µ∞(dω) ≤ µ∞(Fε), thanks to the inequality f ≤ 1_{Fε}. This proves (i) ⇒ (ii), and it remains to verify the converse.
Fix any continuous function f : T → R such that for some integer k > 0, −k ≤ f(ω) ≤ k, for all ω ∈ T. For all real numbers m > 1,

    ∫_T f(ω) µn(dω) ≤ Σ_{j∈Z: |j|≤km} (j/m) µn(ω ∈ T : (j−1)/m ≤ f(ω) ≤ j/m).

On the other hand, the collection µ1 ∘ f^{−1}, µ2 ∘ f^{−1}, . . . , µ∞ ∘ f^{−1} has at most a denumerable number of atoms. Thus, we can find mk → ∞ such that

    µn ∘ f^{−1}({j/mk}) = µn(ω ∈ T : f(ω) = j/mk) = 0,

for all j, k ≥ 1 and all n = 1, 2, . . . , ∞. Since (ω ∈ T : (j−1)/m ≤ f(ω) ≤ j/m) is closed, by (ii), and after applying the above display,

    lim sup_{n→∞} ∫_T f(ω) µn(dω) ≤ Σ_{j∈Z: |j|≤kmk} (j/mk) µ∞(ω ∈ T : (j−1)/mk ≤ f(ω) ≤ j/mk)
        ≤ ∫_T f(ω) µ∞(dω) + 1/mk.

Sending mk → ∞ demonstrates (ii) ⇒ (i) and completes this proof.
Alternatively, one can write the portmanteau theorem in terms of random variables.

Theorem 2.1.2 Suppose (Xn; 1 ≤ n ≤ ∞) is a sequence of random variables that take their values in a topological space T. The following are equivalent:
(i) Xn =⇒ X∞;
(ii) for all closed sets F ⊂ T, lim sup_{n→∞} P(Xn ∈ F) ≤ P(X∞ ∈ F);
(iii) for all open sets G ⊂ T, lim inf_{n→∞} P(Xn ∈ G) ≥ P(X∞ ∈ G); and
(iv) for any measurable A ⊂ T such that P(X∞ ∈ ∂A) = 0, lim_{n→∞} P(Xn ∈ A) = P(X∞ ∈ A).
Henceforth, any reference to the portmanteau theorem will be to either Theorem 2.1.1 or 2.1.2, whichever naturally applies.1 Exercise 2.1.1 Check Theorem 2.1.2.
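A tiny numerical sketch of why condition (iv) of the portmanteau theorem insists that the limit measure not charge ∂A: the point masses µn = δ_{1/n} converge weakly to δ_0 (since f(1/n) → f(0) for every bounded continuous f), yet for the open set A = ]0, ∞[ one has µn(A) = 1 for every n while δ_0(A) = 0, because δ_0 charges the boundary {0}:

```python
import math

def integral(f, atom):
    # integration of f against the point mass delta_atom
    return f(atom)

f = math.atan                     # a bounded continuous test function
gaps = [abs(integral(f, 1.0 / n) - f(0.0)) for n in range(1, 100)]
# weak convergence: integrals against delta_{1/n} approach f(0)
assert all(a >= b for a, b in zip(gaps, gaps[1:]))

# ...but masses of the set A = ]0, oo[ do not converge to delta_0(A) = 0
in_A = [1.0 if 1.0 / n > 0 else 0.0 for n in range(1, 100)]
print(set(in_A), 0.0 > 0)         # {1.0} False
```

The same example shows that in Theorem 2.1.2(ii) and (iii) the inequalities can be strict.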
Exercise 2.1.2 Given a collection of probability measures (µn; 1 ≤ n ≤ ∞) on a topological space T with µn =⇒ µ∞, extend the portmanteau theorem by showing that for all bounded upper semicontinuous functions f : T → R,

    lim sup_{n→∞} ∫_T f(ω) µn(dω) ≤ ∫_T f(ω) µ∞(dω).

You may recall that f is upper semicontinuous if for all real α, (ω : f(ω) < α) is an open set in T.
2.2 The Continuous Mapping Theorem

If T1 and T2 are two topological spaces, it is not difficult to see that whenever (xn; 1 ≤ n ≤ ∞) are elements of T1 such that lim_{n→∞} xn = x∞, then for any continuous function f : T1 → T2, lim_{n→∞} f(xn) = f(x∞). The following is the weak convergence analogue of such a fact.

Theorem 2.2.1 (The Continuous Mapping Theorem) If Xn =⇒ X∞ and f : T1 → T2 is continuous, then f(Xn) =⇒ f(X∞).

Exercise 2.2.1 Prove the continuous mapping theorem. (Hint: Consider Corollary 1.1.1.)
2.3 Weak Convergence in Euclidean Space

We now specialize to the case T = R^d, with which the reader is assumed to be familiar. In concordance with the rest of the book, R^d is metrized by d(x, y) = |x − y| (x, y ∈ R^d), which provides us with the usual Euclidean topology on R^d and the corresponding Borel field. Any probability measure µ on R^d gives rise to a cumulative distribution function F defined by

    F(t) = µ(r ∈ R^d : r ≼ t),   t ∈ R^d,

where ≼ denotes the coordinatewise partial order. The following two results are elementary.

Theorem 2.3.1 Two probability measures on R^d are equal if and only if their cumulative distribution functions are.

¹ We reiterate that since the Xn's need not be defined on the same probability space, the notational remarks of Section 2 preceding Theorem 2.1.1 apply also to Theorem 2.1.2.
Theorem 2.3.2 Suppose µ1 , . . . , µ∞ is a possibly infinite sequence of probability measures on Rd whose cumulative distribution functions are F1 , . . . , F∞ , respectively. Then, µn =⇒ µ∞ if and only if whenever F∞ is continuous at a point t ∈ Rd , we have limn→∞ Fn (t) = F∞ (t). Exercise 2.3.1 Prove Theorems 2.3.1 and 2.3.2.
In other words, we may state the following corollary: Corollary 2.3.1 Weak convergence in Rd is the same as convergence in distribution in the classical sense.
2.4 Tightness

When does a family of probability measures converge weakly? To answer this, we first need the notion of tightness. Suppose T is a topological space and (Pα; α ∈ A) is a collection of probability measures defined on B(T), where A is some indexing set. We say that (Pα; α ∈ A) is tight if for all ε ∈ ]0, 1[, there exists a compact set Γε ⊂ T such that

    sup_{α∈A} Pα(Γε^∁) ≤ ε.

Sometimes, when Pα denotes the distribution of a T-valued random variable Xα (α ∈ A), we may refer to the family (Xα; α ∈ A) as tight when (Pα; α ∈ A) is tight. Finally, we say that P is tight if the singleton {P} is tight.

Exercise 2.4.1 Suppose (Pα; α ∈ A) is a collection of probability measures on R. Let Fα denote the cumulative distribution function of Pα (α ∈ A), and prove that (Pα; α ∈ A) is tight if and only if lim_{x→∞} Fα(x) = 1 and lim_{x→−∞} Fα(x) = 0, uniformly for all α ∈ A.

Exercise 2.4.2 Show that if Pα is tight for each α ∈ A, and if A is a finite set, then (Pα; α ∈ A) is tight.

Exercise 2.4.3 Show that if T is σ-compact, i.e., a countable union of compact sets, any finite number of probability measures on T are tight.

Theorem 2.4.1 If T is a complete separable metric space, any finite number of probability measures on T are tight.

Proof By Exercise 2.4.2, we need only show that a single probability measure P on T is tight. Since T is separable, for any integer n ≥ 1, we can find a countable sequence of points x1, x2, . . . ∈ T such that ⋃_{i=1}^∞ B_{i,n} covers T, where B_{i,n} denotes the ball (in T) of radius 1/n about x_i. Thus, for any probability measure P on T, lim_{m→∞} P(⋃_{i=1}^m B_{i,n}) = 1. In particular, for
any ε ∈ ]0, 1[, there exists m(n) so large that P(⋃_{i=1}^{m(n)} B_{i,n}) ≥ 1 − ε2^{−n}, for all n ≥ 1. An immediate consequence of this is that

    P( [⋂_{n=1}^∞ ⋃_{i=1}^{m(n)} B_{i,n}]^∁ ) ≤ Σ_{n=1}^∞ P( [⋃_{i=1}^{m(n)} B_{i,n}]^∁ ) ≤ ε Σ_{n=1}^∞ 2^{−n} ≤ ε.

Note that Γε = ⋂_{n=1}^∞ ⋃_{i=1}^{m(n)} B_{i,n} is totally bounded. (Why? See Section 2.1 of Chapter 5.) By Theorem 2.1.1 of Chapter 5, Γ̄ε is compact, and our proof is complete.

Exercise 2.4.4 Let (Xn; n ≥ 1) denote a collection of R^k-valued random vectors that are bounded in L^p(P) for some p > 0; i.e., sup_n E{|Xn|^p} < +∞. Prove that (Xn; n ≥ 1) is a tight family.

The following gives the first indication of deep relationships between tightness and weak convergence.

Proposition 2.4.1 Suppose (Pn; 1 ≤ n ≤ ∞) is a collection of probability measures on a complete, separable metric space T such that Pn =⇒ P∞. Then, (Pn; n ≥ 1) is a tight family.

Proof We will show the slightly stronger fact that (Pn; 1 ≤ n ≤ ∞) is a tight family. For any ε ∈ ]0, 1[, we can choose a compact set Γε such that P∞(Γε^∁) ≤ ε; cf. Theorem 2.4.1. Since Γε is closed, by the portmanteau theorem (Theorem 2.1.1) there exists n0(ε) such that for all n ≥ n0(ε), Pn(Γε^∁) ≤ 2ε. On the other hand, the finite collection {Pn; 1 ≤ n ≤ n0(ε)} is tight; cf. Exercise 2.4.2. Therefore, we can find another compact set Kε such that for all 1 ≤ n ≤ n0(ε), Pn(Kε^∁) ≤ 2ε. Concerning the compact set Λε = Kε ∪ Γε, we have shown that for all 1 ≤ n ≤ ∞, Pn(Λε^∁) ≤ 2ε; this proves tightness.
2.5 Prohorov’s Theorem Proposition 2.4.1 shows that on a complete metric space, weak convergence implies tightness. The converse also holds, but in general only along subsequences. This deep fact, discovered by Yu. V. Prohorov, is the subject of this subsection. Theorem 2.5.1 (Prohorov’s Theorem) Suppose (Pn ; n ≥ 1) is a tight collection of probability measures on a complete, separable metric space (T, d). Then, there exists a subsequence n and a probability measure P on T such that Pn =⇒ P. Remarks The following remarks are elaborated upon in standard references on weak convergence; cf. Billingsley (1968), for example.
(a) According to Supplementary Exercise 2, weak convergence can be topologized. As such, the above can be restated as follows: On a complete, separable metric space, a tight collection of probability measures is sequentially compact. (b) It can be shown that separability is not needed in the above. (c) As it is stated, this is only half of Prohorov’s theorem. The other half asserts that on a complete, separable metric space (T, d), any sequentially compact family (Pn ; n ≥ 1) of probability measures on T is tight. Exercise 2.5.1 Verify the following extension of Prohorov’s theorem: For any subsequence (Pn ) of the tight family (Pn ), there is a further subsequence (Pn ) that converges weakly to a probability measure. Proof of Theorem 2.5.1 in the Compact Case We merely prove Prohorov’s theorem when T is compact. Supplementary Exercise 4 provides guidelines for extending this to the general case. Note that when T is compact, (Pn ) is always a tight family. In this case, Theorem 2.5.1 is a reformulation of the Banach–Alaoglu theorem of functional analysis; cf. Rudin (1973, Theorem 3.15), for instance. Our proof is divided into four easy steps, all the time assuming that T is compact. Step 1. (Reduction to T ⊂ R∞ ) In this first step we argue that without loss of generality, T is a compact subset of R∞ , where the latter is, as usual, endowed with the product topology. A key role is played by the following variant of Urysohn’s metrization theorem; cf. Munkres (1975, Theorem 4.1, Chapter 4), for instance. Urysohn’s Metrization Theorem Any separable metric space T is homeomorphic to a subset of R∞ . That is, there exists a one-to-one continuous function h : T → R∞ whose inverse function h−1 : h(T ) → T is also continuous. Let Xn (respectively X) be a T -valued random variable with distribution Pn (respectively P) We wish to show the existence of a subsequence n such that for all bounded continuous functions f : T → R, limn →∞ E[f (Xn )] = E[f (X)]. 
By the continuous mapping theorem (Theorem 2.2.1), this is equivalent to showing that for all homeomorphisms h : T → R∞ and all bounded continuous functions ψ : R∞ → R, limn →∞ E[ψ(h(Xn ))] = E[ψ(h(X))]. Since h(Xn ) and h(X) are R∞ -valued random variables, this reduces our proof of Prohorov’s theorem to the case T ⊂ R∞ . Step 2. (Separability of the Space of Continuous Functions) Henceforth, T is a compact subset of R∞ , and C(T ) denotes the collection
of all continuous functions f : T → R, metrized by the supremum norm. In this second step of our proof we propose to show that C(T) is separable. To do so, we will need to recall the Stone–Weierstrass theorem. Recall that A ⊂ C(T) is an algebra if f, g ∈ A implies that the product fg is in A. Also recall that A ⊂ C(T) is said to separate points if whenever x, y ∈ T are distinct, we can find f ∈ A such that f(x) ≠ f(y).

The Stone–Weierstrass Theorem Suppose T is compact and A ⊂ C(T) is an algebra that (i) separates points and (ii) contains all constants. Then, A is dense in C(T).

For a proof, see Royden (1968, Theorem 28, Chapter 9, Section 7), for instance. Since T ⊂ R^∞, we can unambiguously define A0 ⊂ C(T) as the collection of all polynomials with rational coefficients, i.e., all functions of the type

    f(x) = α + β ∏_{j=1}^n [x^{(j)}]^{γ_j},

where x ∈ T, α, β range over the rationals, and n, γ1, . . . , γn range over the nonnegative integers. Let A denote the smallest algebra that contains A0 and apply the Stone–Weierstrass theorem to conclude that A is a countable dense subset of C(T), as desired.

Exercise 2.5.2 Prove the following refinement: If Γ is a compact metric space, C(Γ) is separable.

Step 3. (Existence of Subsequential Limits) We now show the existence of a subsequence (n′) such that for every f ∈ C(T), Λ(f) = lim_{n′→∞} ∫ f(ω) P_{n′}(dω) exists.
Since T is compact, any f ∈ C(T) is bounded. Thus, for each f ∈ C(T), ∫ f dP1, ∫ f dP2, . . . is a bounded sequence in R. In particular, for each f ∈ C(T), there exists a subsequence (nm) such that lim_{nm→∞} ∫ f dP_{nm} exists. On the other hand, by Step 2, C(T) has a countable dense subset A. Applying Cantor's diagonalization argument, we can extract one subsequence (n′) such that for all f ∈ A,

    Λ(f) = lim_{n′→∞} ∫ f(ω) P_{n′}(dω)   (1)

exists. We complete Step 3 by showing that the above holds for all f ∈ C(T), along the same subsequence (n′). We do this by first continuously extending the domain of Λ from A to all of C(T). Indeed, note that for all f, g ∈ A, |Λ(f) − Λ(g)| ≤ sup_{x∈T} |f(x) − g(x)|. By density, for any f ∈ C(T), we can find fm ∈ A such that lim_{m→∞} fm = f in C(T), i.e., in the supremum norm. Since f1, f2, . . . is a Cauchy sequence, this shows that Λ(f) = lim_m Λ(fm) exists, and |Λ(f) − Λ(g)| ≤ sup_{x∈T} |f(x) − g(x)| for all f, g ∈ C(T). Once more, for all f ∈ C(T), find fm ∈ A such that fm → f in C(T). This leads to the following sequence of bounds:
    lim sup_{n′→∞} | ∫ f(ω) P_{n′}(dω) − Λ(f) |
        ≤ |Λ(f) − Λ(fm)| + lim sup_{n′→∞} | ∫ f(ω) P_{n′}(dω) − ∫ fm(ω) P_{n′}(dω) |
        ≤ 2 sup_{x∈T} |f(x) − fm(x)|.
Let m → ∞ to see that equation (1) holds for all f ∈ C(T), as desired.

Step 4. (The Conclusion) Prohorov's theorem readily follows upon combining Step 3 with the representation theorem of F. Riesz; see Rudin (1974, Theorem 2.14) for a proof.

The Riesz Representation Theorem Let T be a compact metric space and let Λ denote a positive and continuous linear functional on C(T). Then, there exists a measure P on the Borel subsets of T such that for all f ∈ C(T), Λ(f) = ∫ f(ω) P(dω).

Exercise 2.5.3 Complete the details of the proof of Prohorov's theorem.
3 The Space C

In Chapter 5 we encountered processes that are continuous or have a continuous modification. Among other things, there we adopted the viewpoint that such processes are, in fact, random variables that take values in some space of continuous functions. We now turn to weak convergence on such spaces.
3.1 Uniform Continuity

Define C = C([0, 1]^N, R^d) to be the collection of all continuous functions f : [0, 1]^N → R^d. It is metrized by

    d_C(f, g) = sup_{t∈[0,1]^N} |f(t) − g(t)|,   f, g ∈ C.   (1)

Thus, the space C([0, 1]^N) of Section 2.5 is none other than C with d = 1. According to Prohorov's theorem (Theorem 2.5.1), in order to study weak convergence on C, we need first to understand the structure of its compact subsets. This is described by the following result, whose proof can be found, for example, in Munkres (1975, Theorem 6.1, Chapter 7).
The Arzelà–Ascoli Theorem A subset C′ of C has compact closure if and only if (a) it is equicontinuous and (b) for each x ∈ [0, 1]^N, the closure of C′_x = {f(x) : f ∈ C′} is compact in R^d.

The modulus of continuity of f ∈ C is the function

    ω_f(ε) = sup_{s,t∈[0,1]^N: |s−t|≤ε} |f(s) − f(t)|,   ε > 0.   (2)

(Compare to the analogous notion defined in Section 2.5 of Chapter 5.) Clearly, a function f ∈ C is uniformly continuous if lim_{ε→0+} ω_f(ε) = 0.

Exercise 3.1.1 Prove that a subset C′ of C is equicontinuous if and only if lim_{ε→0+} sup_{f∈C′} ω_f(ε) = 0.

This exercise leads to the following reformulation of the Arzelà–Ascoli theorem.

Theorem 3.1.1 (The Arzelà–Ascoli Theorem) A subset C′ of C has compact closure if and only if for all x ∈ [0, 1]^N, the closure of C′_x = {f(x) : f ∈ C′} is compact in R^d and lim_{ε→0+} sup_{f∈C′} ω_f(ε) = 0.

Theorem 3.1.1 will be used to characterize tightness on C. But what do C-valued random variables look like? In fact, are there any interesting ones? The following example shows that there are indeed many natural C-valued random variables.

Example Let g = (g_t; t ∈ R^N_+) be a real-valued Gaussian process with a continuous mean function µ and some covariance function Σ. According to Chapter 5, when µ is continuous, under some technical conditions on Σ, g has a continuous modification X = (X_t; t ∈ R^N_+); see Sections 2.3, 2.5, and 2.6 of Chapter 5. According to Lemma 2.2.1 of Chapter 5, X and g have the same finite-dimensional distributions. Since Gaussian processes are solely described by their finite-dimensional distributions, X is itself a Gaussian process with mean function µ and covariance function Σ. For all ω ∈ Ω and all t ∈ [0, 1]^N, let X(t)(ω) = X_t(ω). That is, we think of X as the random function t ↦ X_t restricted to [0, 1]^N. Continuity of X implies that P(X ∈ C) = 1, where C = C([0, 1]^N, R). Equivalently, X is a C-valued random variable. A concrete example of a C-valued random variable is the standard Brownian sheet; cf. Section 3.2 of Chapter 5 for the requisite continuity results.

We conclude this subsection with the following technical result, which will tacitly be used from now on.

Lemma 3.1.1 Suppose X is a C-valued random variable. Then, for any ε > 0, ω_X(ε) is an R₊-valued random variable.
3 The Space C
Exercise 3.1.2 Prove Lemma 3.1.1.
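The modulus of continuity in (2) is easy to approximate numerically by restricting the supremum to a finite grid. The sketch below does this for N = 1; the grid size, tolerance, and test function are illustrative choices, not from the text.

```python
# Grid approximation of the modulus of continuity of Section 3.1:
#   omega_f(eps) = sup over |s - t| <= eps of |f(s) - f(t)|,
# with the supremum restricted to the grid {0, 1/m, ..., 1}.

def modulus_of_continuity(f, eps, m=200):
    """Estimate omega_f(eps) for f on [0, 1] using an (m+1)-point grid."""
    grid = [i / m for i in range(m + 1)]
    best = 0.0
    for i, s in enumerate(grid):
        for t in grid[i:]:
            if t - s > eps + 1e-12:     # tolerance guards float round-off
                break
            best = max(best, abs(f(s) - f(t)))
    return best

f = lambda x: 2.0 * x                   # Lipschitz with constant 2
w1 = modulus_of_continuity(f, 0.1)      # close to 2 * 0.1 = 0.2
w2 = modulus_of_continuity(f, 0.2)      # close to 2 * 0.2 = 0.4
```

For a uniformly continuous f, the estimates decrease to 0 as ε → 0+, in line with Exercise 3.1.1.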
3.2 Finite-Dimensional Distributions

Given that C = C([0,1]^N, R^d), let d_C be defined by equation (1) of Section 3.1, and suppose that µ1 and µ2 are two probability measures on the Borel subsets of C. Then µ1 = µ2 if and only if ∫ h dµ1 = ∫ h dµ2 for all bounded, continuous functions h : C → R. Let X1 and X2 denote random variables whose distributions are µ1 and µ2, respectively; cf. Section 1.2. The preceding discussion says that X1 and X2 have the same distribution if and only if for all bounded, continuous functions h : C → R, E[h(X1)] = E[h(X2)]. The following is often more useful.

Theorem 3.2.1 Let X1 and X2 be two C-valued random variables. Then, X1 and X2 have the same distribution if and only if for all t1, . . . , tk ∈ [0,1]^N, (X1(t1), . . . , X1(tk)) and (X2(t1), . . . , X2(tk)) have the same distribution.

Remarks (a) Since Xi ∈ C, it is a continuous R^d-valued random function. In particular, we point out that (Xi(t1), . . . , Xi(tk)) can be viewed as an R^{dk}-valued random vector; see Theorem 1.1.1.

(b) In the notation of Chapter 2, the above theorem says that X1 and X2 have the same distribution if and only if they have the same finite-dimensional distributions.

Proof We have already mentioned that X1 and X2 have the same distribution if and only if for all bounded, continuous h : C → R, E[h(X1)] = E[h(X2)]. Suppose, first, that X1 and X2 have the same distribution. Fix t1, . . . , tk ∈ [0,1]^N and let h(x) = f(x(t1), . . . , x(tk)) (x ∈ C), where f : R^{dk} → R is bounded and continuous. Since h : C → R is bounded and continuous, this implies that E[h(X1)] = E[h(X2)], which implies the equality of the finite-dimensional distributions. Conversely, suppose X1 and X2 have the same finite-dimensional distributions. It suffices to show that for all closed sets F ⊂ C,

P(X1 ∈ F) ≤ P(X2 ∈ F).   (1)
(Why?) Fix ε > 0 and let t1, . . . , tk be the points of a partition of [0,1]^N of mesh ε. Define the projection operator π_{t1,...,tk} by π_{t1,...,tk} f = (f(t1), . . . , f(tk)), for all f ∈ C. Since X1 and X2 have the same finite-dimensional distributions, in the notation of Sections 1.1 and 1.2,

P1 ∘ π^{−1}_{t1,...,tk} = P2 ∘ π^{−1}_{t1,...,tk},   (2)
6. Limit Theorems
where Pi denotes the distribution of Xi (i = 1, 2). For any closed F ⊂ C, let π_{t1,...,tk} F = {π_{t1,...,tk} f : f ∈ F}. Of course, if f ∈ F, then π_{t1,...,tk} f ∈ π_{t1,...,tk} F. On the other hand, if f ∈ C satisfies ω_f(ε) ≤ η for some η > 0 and if π_{t1,...,tk} f ∈ π_{t1,...,tk} F, then f ∈ F^η = {g ∈ C : d_C({g}, F) ≤ η}, where d_C({g}, F) = inf_{h∈F} d_C(h, g). We combine these observations as follows: For all ε, η > 0,

P(X1 ∈ F) ≤ P(π_{t1,...,tk} X1 ∈ π_{t1,...,tk} F)
          = P1 ∘ π^{−1}_{t1,...,tk}(π_{t1,...,tk} F)
          = P2 ∘ π^{−1}_{t1,...,tk}(π_{t1,...,tk} F)   (by (2))
          = P(π_{t1,...,tk} X2 ∈ π_{t1,...,tk} F)
          ≤ P(X2 ∈ F^η) + P(ω_{X2}(ε) > η).

Let ε → 0+ and use the Arzelà–Ascoli theorem (Theorem 3.1.1) to see that P(X1 ∈ F) ≤ P(X2 ∈ F^η). On the other hand, η > 0 is arbitrary and η → F^η is set-theoretically decreasing with ∩_{η∈Q+} F^η = F. Equation (1) follows.
3.3 Weak Convergence in C

Continuing with our discussion of Section 3.2, we let (µn; 1 ≤ n ≤ ∞) denote a collection of probability measures on C and seek conditions that ensure that µn =⇒ µ∞. Bearing the discussion of Section 1.2 in mind, we need to ask: if (Xn; 1 ≤ n ≤ ∞) is a sequence of C-valued random variables, when does Xn =⇒ X∞? Given the development of Sections 3.1 and 3.2, we may be tempted to conjecture that Xn =⇒ X∞ if and only if all finite-dimensional distributions of Xn converge to those of X∞. This is not so, as the following example shows.

Example Let N = 1 and S = R. Thus, C is the collection of all continuous functions f : [0,1] → R, and dC is the usual supremum norm on C. We wish to construct a sequence (Xn; n ≥ 1) of C-valued random variables such that the finite-dimensional distributions of Xn converge to those of the function 0, and yet Xn does not converge weakly to 0. Here is one such construction: On an appropriate probability space, construct independent random variables U1, U2, . . . , all of which are uniformly distributed on the
interval [1/2, 3/4] (say). Define

Xn(t) = n²t − n²Un,          if 0 ≤ Un ≤ t ≤ Un + 1/n,
        −n²t + n²Un + 2n,    if Un + 1/n ≤ t ≤ Un + 2/n ≤ 1,
        0,                   otherwise.
Recall that almost surely, Un ∈ [1/2, 3/4]. Therefore, after directly plotting the function Xn, we see that for all n ≥ 8, Xn is a continuous, piecewise linear function on [0,1] such that Xn(0) = Xn(Un) = 0, Xn(Un + 1/n) = n, Xn(Un + 2/n) = Xn(1) = 0, and between these values, Xn is obtained by linear interpolation. In particular, P(Xn ∈ C) = 1. Fix any t1, . . . , tk ∈ [0,1] and consider the event that (Xn(ti) ≠ 0, for some 1 ≤ i ≤ k), which is the same as (ti ∈ [Un, Un + 2/n], for some 1 ≤ i ≤ k). Since the latter has probability at most 4k/n,

P(Xn(t1) = · · · = Xn(tk) = 0) ≥ 1 − 4k/n,

which converges to 1 as n → ∞. That is, the finite-dimensional distributions of Xn converge to those of the function 0.

Next, we will verify that Xn does not converge weakly to 0. To show this, for all x ∈ C define f1(x) = sup_{0≤t≤1} |x(t)|. The function f1 has the following two properties: (a) f1 : C → R_+; and (b) for all x, y ∈ C, |f1(x) − f1(y)| ≤ sup_{0≤t≤1} |x(t) − y(t)| = dC(x, y). In particular, f1 is a continuous function from C to R_+. Define f2 : R_+ → [0,1] by f2(t) = t ∧ 1 and let f = f2 ∘ f1. Apparently, f : C → [0,1] is bounded and continuous, although for all n ≥ 1, f(Xn) ≡ 1 while f(0) = 0. Hence, E[f(Xn)] does not converge to E[f(0)].

The above example shows that convergence of the finite-dimensional distributions is not sufficiently strong to guarantee weak convergence in C. Indeed, the missing ingredient is tightness, as the following shows.

Proposition 3.3.1 Suppose (Xn; 1 ≤ n ≤ ∞) are C-valued random variables. Then, Xn =⇒ X∞, provided that: (i) the finite-dimensional distributions of Xn converge to those of X∞; and (ii) (Xn) is a tight sequence.

Proof Let Pn denote the distribution of Xn (1 ≤ n ≤ ∞). By Exercise 2.5.1, for any subsequence (n′) there exists a further subsequence (n′′)
and a probability measure Q∞ such that for all bounded continuous functions f : C → R, ∫ f dP_{n′′} → ∫ f dQ∞. It suffices to show that no matter which subsequence we choose, Q∞ = P∞ (why?). The convergence of finite-dimensional distributions implies that the finite-dimensional distributions of Q∞ and P∞ agree. Consequently, the proposition follows from Theorem 3.2.1.

In the example above, we may see that the trouble comes in via the wild oscillations of the functions Xn. In other words, the family (Xn; n ≥ 1) is not equicontinuous, i.e., not tight. The following theorem shows that this lack of tightness is indeed the source of the difficulty.

Theorem 3.3.1 Let (Xn; 1 ≤ n ≤ ∞) denote a collection of C-valued random variables. Then Xn =⇒ X∞ provided that: (i) the finite-dimensional distributions of Xn converge to those of X∞; and (ii) for all ε > 0, lim_{δ→0} lim sup_{n→∞} P(ω_{Xn}(δ) ≥ ε) = 0.

Proof In light of Proposition 3.3.1, it suffices to prove that (Xn) is tight. That is, given any ε ∈ ]0,1[, we need to produce a compact set Γε such that sup_n P(Xn ∉ Γε) ≤ ε. Owing to Theorem 2.4.1, condition (ii) of the theorem is equivalent to the following: For all ε > 0, lim_{δ→0} sup_n P(ω_{Xn}(δ) ≥ ε) = 0. (Why?) Thus, for any ε ∈ ]0,1[, we can find a sequence δ1 > δ2 > · · · such that lim_k δk = 0 and sup_n P(ω_{Xn}(δk) ≥ 1/k) ≤ ε2^{−k−1}. That is, if we define

Ak = { f ∈ C : ω_f(δk) ≤ 1/k },   k ≥ 1,

we have sup_n P(Xn ∉ ∩_{k≥1} Ak) ≤ Σ_{k≥1} ε2^{−k−1} = ε/2. Next, define for all λ > 0, A0(λ) = {f ∈ C : |f(0)| ≤ λ}. By the convergence of finite-dimensional distributions and by the portmanteau theorem (Theorem 2.1.1), for all λ > 0,

lim sup_{n→∞} P{Xn ∉ A0(λ)} ≤ P{X∞ ∉ A0(λ)}.

Thus, there exists λ large such that sup_n P{Xn ∉ A0(λ)} ≤ ε/2. If we let Γε be the closure of A0(λ) ∩ (∩_{k≥1} Ak), we have sup_n P(Xn ∉ Γε) ≤ ε, and Γε is compact, thanks to the Arzelà–Ascoli theorem (Theorem 3.1.1; why?). This completes our proof.
Exercise 3.3.1 Suppose (µn ) is a collection of probability measures on C. Prove that (µn ) is tight if and only if: (i) limλ→∞ lim supn→∞ µn (f ∈ C : |f (0)| ≥ λ) = 0; and (ii) for all ε > 0, lim supδ→0 lim supn→∞ µn (f ∈ C : ωf (δ) ≥ ε) = 0. Is there anything special about the condition |f (0)| ≥ λ in (i)? For instance, can it be replaced by |f (a)| ≥ λ, where a ∈ [0, 1]N is fixed?
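Returning to the tent-function example of Section 3.3, its mechanism is easy to exhibit numerically: a sampled path vanishes at any few fixed times (with high probability) while its supremum is n. The sketch below is illustrative; the grid sizes, seed, and evaluation times are arbitrary choices.

```python
# The tent function of the Section 3.3 example: X_n vanishes off
# [U_n, U_n + 2/n] and rises linearly to the peak value n at U_n + 1/n.
import random

def tent(t, n, u):
    """X_n(t) given U_n = u."""
    if u <= t <= u + 1.0 / n:
        return n * n * (t - u)
    if u + 1.0 / n < t <= u + 2.0 / n:
        return -n * n * (t - u) + 2.0 * n
    return 0.0

random.seed(1)
n = 1000
u = random.uniform(0.5, 0.75)           # U_n is uniform on [1/2, 3/4]

peak = tent(u + 1.0 / n, n, u)          # the peak height is n
fixed = [tent(t, n, u) for t in (0.1, 0.25, 0.9)]   # fixed times miss the tent
sup_norm = max(tent(i / 5000.0, n, u) for i in range(5001))
```

The finite-dimensional values at (0.1, 0.25, 0.9) are all zero, yet the (grid) supremum stays of order n, which is exactly why f(Xn) ≡ 1 in the text.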
3.4 Continuous Functionals

Suppose C = C([0,1]^N, R^d) is defined as before. If (Xn; 1 ≤ n ≤ ∞) are C-valued random variables, we now explore some of the consequences of the statement "Xn =⇒ X∞." Recall that a functional is a real-valued function on C, and a functional Λ is continuous if for all f, fn ∈ C, lim_{n→∞} dC(fn, f) = 0 implies lim_{n→∞} Λ(fn) = Λ(f). (Recall that dC is the distance that metrizes C; see (1) of Section 3.1.) As the following two examples show, interesting continuous functionals on C abound.

Example 1 Define the functional Λ by Λ(f) = sup_{t∈[0,1]^N} |f(t)|. Then, Λ is a continuous functional. In fact, Λ(f) = dC(f, 0), where 0 denotes the zero function. Moreover, for all p > 0, Λp defines a continuous functional, where

Λp(f) = ( ∫_{[0,1]^N} |f(t)|^p dt )^{1/p}.

This follows from the trivial inequality |Λp(f) − Λp(g)| ≤ dC(f, g). As another example, consider d = 1 and let Λ+(f) = sup_{t∈[0,1]^N} f(t). Then Λ+ is a continuous functional.

Example 2 Fix an integer k ≥ 1 and a continuous function θ : S^k → R. For any fixed t1, . . . , tk ∈ [0,1]^N and all f ∈ C, define Λ(f) = θ(f(t1), . . . , f(tk)). It is not hard to see that Λ is a continuous functional. For instance, when S = R, Λ(f) = Σ_{j=1}^k ξ^{(j)} f(tj) is a continuous functional, where ξ ∈ R^k is fixed.

The following is an immediate consequence of the continuous mapping theorem (Theorem 2.2.1).

Theorem 3.4.1 Suppose that Xn =⇒ X∞, as elements of C. Then for any continuous functional Λ on C, Λ(Xn) converges in distribution to Λ(X∞).

To better understand Theorem 3.4.1, try the following almost sure version.
Exercise 3.4.1 Suppose X1, . . . , X∞ are all C = C([0,1]^N, R^d)-valued random variables such that for each continuous functional Λ on C, lim_{n→∞} Λ(Xn) = Λ(X∞), a.s. Then, show that with probability one, lim_{n→∞} Xn = X∞, where the convergence takes place in C.

Thus, Theorem 3.4.1 is one way to state that Xn =⇒ X∞ in C if and only if the "distribution" of the entire function t → Xn(t) converges to that of t → X∞(t), all viewed as functions from [0,1]^N to R^d.
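On a grid, the functionals of Example 1 can be evaluated directly, and the Lipschitz bound |Λp(f) − Λp(g)| ≤ dC(f, g) can be checked numerically. The grid size and test functions below are illustrative choices.

```python
# Discrete versions of the functionals of Example 1 (N = 1, d = 1):
#   Lambda(f) = sup |f|,   Lambda_p(f) = (average of |f|^p)^(1/p).
# The integral is replaced by an average over a uniform grid, so the
# discrete Lambda_p is still 1-Lipschitz with respect to d_C.
import math

M = 1000
GRID = [i / M for i in range(M + 1)]

def sup_functional(f):                       # Lambda(f) = d_C(f, 0)
    return max(abs(f(t)) for t in GRID)

def lp_functional(f, p):                     # discrete Lambda_p(f)
    return (sum(abs(f(t)) ** p for t in GRID) / (M + 1)) ** (1.0 / p)

def d_C(f, g):                               # supremum distance on the grid
    return max(abs(f(t) - g(t)) for t in GRID)

f = math.sin
g = lambda t: t * t

lhs = abs(lp_functional(f, 2) - lp_functional(g, 2))
rhs = d_C(f, g)                              # lhs <= rhs, as in Example 1
```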
3.5 A Sufficient Condition for Pretightness

If X1, X2, . . . is a sequence of C = C([0,1]^N, R^d)-valued random variables, we can inspect the contents of Proposition 3.3.1, Theorem 3.3.1, and Exercise 3.3.1 to see that verifying the tightness of (Xn) often boils down to finding a sufficient condition for the following: For all ε > 0,

lim_{δ→0} lim sup_{n→∞} P(ω_{Xn}(δ) ≥ ε) = 0.

When this holds, we say that (Xn) is pretight. In this subsection we find a technical condition for the pretightness of (Xn). This condition will be used in Section 5 in its consideration of multiparameter random walks. The main result is the following; see Billingsley (1968, Theorem 8.3), and see Lachout (1988) for related results.

Theorem 3.5.1 A collection X1, X2, . . . of C([0,1]^N, R^d)-valued random variables is pretight if for every ε > 0 and every ℓ = 1, . . . , N,

lim_{δ→0} lim sup_{n→∞} sup_{0≤t^{(ℓ)}≤1} (1/δ) P( sup |Xn(s) − Xn(t)| ≥ ε ) = 0,

where the supremum inside the probability is taken over all s ∈ [0,1]^N and all t^{(k)} ∈ [0,1] (k ≠ ℓ) such that t^{(ℓ)} ≤ s^{(ℓ)} ≤ t^{(ℓ)} + δ and, for all k ≠ ℓ, s^{(k)} = t^{(k)}.

It is important to note that the supremum inside the probability is taken only over the values of s^{(1)}, . . . , s^{(N)} and t^{(k)}, where k ≠ ℓ; the supremum over the t^{(ℓ)} is outside the probability, and this alone makes such a result useful.

Proof We will prove this for N = 1 only; when N > 1, the proof is similar but should be attempted by the reader for better understanding. Thus, throughout this proof we let N = 1, and set out to prove that the following implies the pretightness of (Xn): For all ε > 0,

lim_{δ→0} lim sup_{n→∞} sup_{0≤t≤1} (1/δ) P( sup_{s∈[0,1]: t≤s≤t+δ} |Xn(s) − Xn(t)| ≥ ε ) = 0.   (1)
To prove this, we will use a simplified version of the chaining argument of Chapter 5. Fix some δ > 0 and define Γδ = {0, δ, 2δ, . . . , ⌊1/δ⌋δ}. In particular, note that for all s ∈ [0,1] there exists a unique γs ∈ Γδ such that γs ≤ s < γs + δ. By the triangle inequality, for all s, t ∈ [0,1] with |s − t| ≤ δ,

|Xn(s) − Xn(t)| ≤ |Xn(s) − Xn(γs)| + |Xn(t) − Xn(γt)| + |Xn(γs) − Xn(γt)|
               ≤ 3 sup_{r∈[0,1]} |Xn(r) − Xn(γr)|.

(The last inequality uses the continuity of Xn. Why?) Since the cardinality of Γδ is no more than 1 + 1/δ,

P(ω_{Xn}(δ) ≥ ε) ≤ (1 + 1/δ) max_{γ∈Γδ} P( sup_{s∈[0,1]: γ≤s≤γ+δ} |Xn(s) − Xn(γ)| ≥ ε/3 ).

The theorem follows readily from this and from equation (1).
Exercise 3.5.1 Prove Theorem 3.5.1 for all N ≥ 1.
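The pathwise inequality behind the proof — ω_f(δ) ≤ 3 max_{γ∈Γδ} sup_{γ≤s≤γ+δ} |f(s) − f(γ)| — can be checked numerically on a sampled path. The scaled random-walk path, grid sizes, and seed below are illustrative choices.

```python
# Check, on a sampled path, the chaining bound used to prove Theorem 3.5.1:
#   omega_f(delta) <= 3 * max_{gamma in Gamma_delta}
#                         sup_{gamma <= s <= gamma + delta} |f(s) - f(gamma)|.
import random

random.seed(7)
m = 1000                         # path sampled at {0, 1/m, ..., 1}
delta = 0.05
step = int(delta * m)            # delta in grid units (here 50)

path = [0.0]
for _ in range(m):
    path.append(path[-1] + random.choice((-1.0, 1.0)) / m ** 0.5)

# omega_f(delta), restricted to grid points at distance <= delta
omega = max(abs(path[i] - path[j])
            for i in range(m + 1)
            for j in range(i, min(i + step, m) + 1))

# the chaining majorant over the coarse grid Gamma_delta = {0, delta, ...}
chain = max(max(abs(path[g + k] - path[g])
                for k in range(min(step, m - g) + 1))
            for g in range(0, m + 1, step))
```

The point of the chaining step is that the majorant involves only about 1/δ coarse-grid windows, instead of all pairs (s, t).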
4 Invariance Principles

Let ξ1, ξ2, . . . denote independent, identically distributed random variables that have mean µ and variance σ², and let S = (Sn; n ≥ 1) denote the associated random walk, i.e., Sn = Σ_{j=1}^n ξj (n ≥ 1). The classical central limit theorem states that as n → ∞, n^{−1/2}σ^{−1}{Sn − nµ} converges in distribution to a standard Gaussian random variable. Combining the notation of Section 2.3 with Section 1.1 of Chapter 5, we can write this as

(Sn − nµ)/√n =⇒ N1(0, σ²),

where =⇒ denotes weak convergence in R; see Chapter 5 for the notation on Gaussian distributions.

Exercise 4.0.2 Show that for any choice of 0 ≤ t1 < · · · < tk ≤ 1,

( (S_{⌊nt1⌋} − ⌊nt1⌋µ)/(σ√n), . . . , (S_{⌊ntk⌋} − ⌊ntk⌋µ)/(σ√n) ) =⇒ (Z_{t1}, . . . , Z_{tk}),
where =⇒ denotes weak convergence in Rk and (Zt1 , . . . , Ztk ) is an Rk valued Gaussian random vector with mean vector 0 and covariance matrix Σ, where Σ(i,j) = ti ∧ tj (1 ≤ i, j ≤ k).
In the above exercise, the limiting probability measure (or the random vector Z) is reminiscent of a Brownian motion sampled at the fixed times t1, . . . , tk. That is, if Z is a Brownian motion, then (Z_{t1}, . . . , Z_{tk}) has the same distribution as the limiting vector in Exercise 4.0.2 above, for any choice of 0 ≤ t1 < · · · < tk ≤ 1. It is natural to ask whether the stochastic process t → n^{−1/2}σ^{−1}(S_{⌊nt⌋} − nµt) (t ∈ [0,1]) converges weakly to Brownian motion in C([0,1], R), as n → ∞. Unfortunately, the random function fn(t) = S_{⌊nt⌋} (0 ≤ t ≤ 1) does not belong to C([0,1], R). However, we are interested only in the random walk values fn(k/n), where 0 ≤ k ≤ n. Thus, we can alternatively study the stochastic process Sn = (Sn(t); t ∈ [0,1]) defined by

Sn(t) = ( S_{⌊nt⌋} + (nt − ⌊nt⌋) ξ_{⌊nt⌋+1} − ntµ ) / (σ√n),   0 ≤ t ≤ 1.   (1)

That is, Sn is a random continuous function on [0,1] such that for all 0 ≤ k ≤ n,

Sn(k/n) = (Sk − kµ) / (σ√n),

and between these values, Sn is defined by linear interpolation. (To see things clearly, set µ = 0 and σ = 1.) In particular, P{Sn ∈ C([0,1], R)} = 1. Donsker's theorem states that the C-valued process Sn converges weakly to standard Brownian motion on [0,1]. Assuming this result for the moment, we can combine the first example of Section 3.4 with Theorem 3.4.1 to prove results such as the following: For all λ ≥ 0,

lim_{n→∞} P( max_{1≤k≤n} (Sk − kµ) ≥ σλ√n ) = P( sup_{0≤t≤1} Bt ≥ λ ),   (2)

where B = (Bt; t ≥ 0) denotes a standard Brownian motion. This is an example of an invariance principle: the limiting distribution is independent of the distribution of the ξ's, as long as the latter are i.i.d. with mean µ and variance σ². Sometimes, such invariance principles can themselves be used to compute the limiting distribution; for examples of this technique see Supplementary Exercises 5, 6, and 7. In this section we will prove the aforementioned theorem of M. Donsker, and a multiparameter extension due to Bickel and Wichura (1971) and Pyke (1973). Donsker's theorem, in a slightly different setting, can be found in Donsker (1952). For general theory, and for detailed historical accounts, see Billingsley (1968), Dudley (1989), and Ethier and Kurtz (1986).
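For the simple ±1 walk (µ = 0, σ = 1), the left side of (2) can be computed exactly — the reflection principle for the simple walk gives P(max_{k≤n} S_k ≥ a) = 2P(S_n > a) + P(S_n = a), a standard identity — and compared with the Gaussian limit P(sup_{t≤1} Bt ≥ λ) = 2P(N(0,1) ≥ λ). The walk length and level λ below are illustrative choices.

```python
# Numerical check of the invariance principle (2) for the simple walk.
# For the +/-1 walk, the reflection principle yields the exact identity
#   P(max_{1<=k<=n} S_k >= a) = 2 P(S_n > a) + P(S_n = a)   (a >= 1),
# whose right side is a binomial sum; the limit is 2 P(N(0,1) >= lambda).
import math

def prob_max_ge(n, a):
    """Exact P(max_{1<=k<=n} S_k >= a) for the simple walk, integer a >= 1."""
    gt = sum(math.comb(n, (n + k) // 2)
             for k in range(a + 1, n + 1) if (n + k) % 2 == 0)
    eq = math.comb(n, (n + a) // 2) if (n + a) % 2 == 0 else 0
    return (2 * gt + eq) / 2 ** n        # exact big-integer arithmetic

lam = 1.0
gauss_tail = 0.5 * (1.0 - math.erf(lam / math.sqrt(2.0)))   # P(N >= lam)
limit = 2.0 * gauss_tail                                     # ~ 0.3173

n = 4000
a = math.ceil(lam * math.sqrt(n))
approx = prob_max_ge(n, a)       # close to the limit for large n
```

Note that nothing in the limit depends on the increment distribution — that is the invariance in "invariance principle."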
4.1 Preliminaries

Let ξ = (ξt; t ∈ N^N) denote a collection of independent, identically distributed random variables with mean 0 and variance 1. As in Chapter 5,
we associate to ξ a random walk S = (St; t ∈ N^N) defined by

St = Σ_{r ⪯ t} ξr,   t ∈ N^N.

In the case N = 1, there is a clear way to linearly interpolate and define a continuous function Sn(t) that agrees with n^{−1/2} S_{⌊nt⌋} for the values t = 0, 1/n, 2/n, . . . , 1; cf. the discussion preceding Section 4.1. When N > 1, "linear interpolation" is more arduous but still possible, as we shall see next. For all t ∈ R^N_+, let

Ξ(t) = Leb([[t], t]) ξ_{[t]+(1,...,1)},

where [t] = (⌊t^{(1)}⌋, . . . , ⌊t^{(N)}⌋), and define the process Sn = (Sn(t); t ∈ [0,1]^N) by the following Stieltjes integral:

Sn(t) = n^{−N/2} ∫_{[0,nt]} Ξ(ds),   n ≥ 1, t ∈ [0,1]^N.
Proposition 4.1.1 Whenever n ≥ 1:

(i) for all t ∈ [0,1]^N, Sn([nt]/n) = n^{−N/2} S_{[nt]}, a.s.;

(ii) P{Sn ∈ C([0,1]^N, R)} = 1; and

(iii) with probability one, Sn is a linear function, coordinatewise.

In particular, when N = 1, Sn is the same as equation (1) of the beginning of this section, with µ = 0 and σ = 1.

Proof We will perform the explicit computation for (i). Assertions (ii) and (iii) are proved analogously. For all s ∈ N^N, define Q(s) = {t ∈ R^N_+ : s ⪯ t ≺ s + (1,...,1)}. Then,

Sn([nt]/n) = n^{−N/2} Σ_{0 ⪯ s ≺ [nt]} ∫_{Q(s)} Ξ(dr).

If r ∈ Q(s), then Ξ(r) = Leb([s, r]) ξ_{s+(1,...,1)}. Therefore, on Q(s), the (random) signed measure Ξ is absolutely continuous with respect to Lebesgue measure, and Ξ(dr)/dr = ξ_{s+(1,...,1)}. Thus,

Sn([nt]/n) = n^{−N/2} Σ_{0 ⪯ s ≺ [nt]} ξ_{s+(1,...,1)},

which proves (i).
Exercise 4.1.1 Complete the proof of Proposition 4.1.1.
The goal of this section is to prove the following result.

Theorem 4.1.1 As n → ∞, Sn converges weakly in C([0,1]^N, R) to a standard Brownian sheet.

The above is an immediate consequence of Theorem 3.3.1, once we show that (a) the finite-dimensional distributions of Sn converge to those of a standard Brownian sheet; and (b) (Sn) is pretight; cf. also Section 3.5 above. We shall verify (a) and (b) in Sections 4.2 and 4.3, respectively.
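For N = 2, the Stieltjes-integral definition of Sn reduces, within each lattice cell, to bilinear interpolation of the normalized partial sums (a short computation with the piecewise-constant density Ξ(dr)/dr shows this). The sketch below — with an illustrative grid size and ±1 noise — verifies the lattice-point identity of Proposition 4.1.1(i) numerically.

```python
# S_n for N = 2 as bilinear interpolation of the partial-sum field,
# checked against Proposition 4.1.1(i): S_n([nt]/n) = n^{-N/2} S_[nt].
import random

random.seed(3)
n = 8
# i.i.d. mean-0, variance-1 variables xi_{(i,j)}, 1 <= i, j <= n
xi = [[random.choice((-1.0, 1.0)) for _ in range(n + 1)] for _ in range(n + 1)]

# partial sums S[a][b] = sum of xi over 1 <= i <= a, 1 <= j <= b
S = [[0.0] * (n + 1) for _ in range(n + 1)]
for a in range(1, n + 1):
    for b in range(1, n + 1):
        S[a][b] = xi[a][b] + S[a - 1][b] + S[a][b - 1] - S[a - 1][b - 1]

def S_n(t1, t2):
    """Bilinear interpolation of n^{-1} S at (n t1, n t2); n^{-N/2} = 1/n."""
    x, y = n * t1, n * t2
    a, b = min(int(x), n - 1), min(int(y), n - 1)
    u, v = x - a, y - b
    val = ((1 - u) * (1 - v) * S[a][b] + u * (1 - v) * S[a + 1][b]
           + (1 - u) * v * S[a][b + 1] + u * v * S[a + 1][b + 1])
    return val / n

# at lattice points t = (i/n, j/n) the interpolant returns n^{-1} S_{(i,j)}
err = max(abs(S_n(i / n, j / n) - S[i][j] / n)
          for i in range(n + 1) for j in range(n + 1))
```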
4.2 Finite-Dimensional Distributions

In this subsection we continue with the discussion of Section 4.1 and prove the following:

Proposition 4.2.1 The finite-dimensional distributions of Sn converge to those of a standard Brownian sheet.

Our proof requires three technical lemmas. Throughout, B = (Bt; t ∈ R^N_+) denotes a standard Brownian sheet.

Lemma 4.2.1 Let X be a random variable with mean 0 and variance ν². Then,

| E[e^{iX}] − 1 + ν²/2 | ≤ (7/2) E[X²(1 ∧ |X|)].

Proof By Taylor's theorem, for any real number a, e^{ia} = 1 + ia − a²/2 − (i/6)(a∗)³, where |a∗| ≤ |a|. Thus,

| e^{ia} − 1 − ia + a²/2 | ≤ (1/6)|a|³ ≤ (7/2)|a|³.   (1)

The above is useful only if |a| ≤ 1. If |a| > 1, we can get a better estimate by simply using the triangle inequality, viz.,

| e^{ia} − 1 − ia + a²/2 | ≤ 2 + |a| + a²/2 ≤ (7/2)a².

We obtain the result by combining this with equation (1), plugging in a = X, and taking expectations.

Exercise 4.2.1 Verify Lemma 4.2.1, with 7/2 reduced to 1. This can be improved further still.
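Lemma 4.2.1 is easy to sanity-check on a discrete distribution; the two-point law below (mean 0, variance 2) is an illustrative choice.

```python
# Numerical check of Lemma 4.2.1:
#   |E[e^{iX}] - 1 + nu^2/2| <= (7/2) E[X^2 (1 /\ |X|)]
# for the two-point law P(X = -2) = 1/3, P(X = 1) = 2/3.
import cmath

law = [(-2.0, 1.0 / 3.0), (1.0, 2.0 / 3.0)]

mean = sum(x * p for x, p in law)                   # = 0
nu2 = sum(x * x * p for x, p in law)                # variance = 2

char = sum(p * cmath.exp(1j * x) for x, p in law)   # E[e^{iX}]
lhs = abs(char - 1.0 + nu2 / 2.0)
rhs = 3.5 * sum(x * x * min(1.0, abs(x)) * p for x, p in law)
```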
Lemma 4.2.2 Given n ≥ 1, consider functions fn : N^N_+ → R such that:

(i) sup_{n≥1} sup_{s∈N^N_+} |fn(s)| < ∞; and

(ii) for some finite p > 0, lim_{n→∞} n^{−N} Σ_{s ⪯ (n,...,n)} fn²(s) = p.

Then, Yn = n^{−N/2} Σ_{s ⪯ (n,...,n)} fn(s) ξs converges in distribution to a Gaussian random variable with mean 0 and variance p.

Proof By the independence of the ξ's and by Lemma 4.2.1, for any θ ∈ R,

E[e^{iθYn}] = Π_{s∈N^N_+: |s|≤n} E[ e^{iθ n^{−N/2} ξs fn(s)} ]
            = Π_{s∈N^N_+: |s|≤n} [ 1 − (θ²/2) n^{−N} fn²(s) (1 + εn(s)) ],

where εn : N^N_+ → [−1, 1], lim_{n→∞} εn(s) = 0 for all s ∈ N^N_+, and |s| = max_{1≤j≤N} |s^{(j)}|, as always. Thus, by Taylor's expansion,

E[e^{iθYn}] = exp( Σ_{s∈N^N_+: |s|≤n} ln[ 1 − (θ²/2) n^{−N} fn²(s)(1 + εn(s)) ] )
            = exp( −(θ²/(2n^N)) Σ_{s∈N^N_+: |s|≤n} fn²(s)(1 + δn(s)) ),

where δ1, δ2, . . . : N^N_+ → R is a bounded sequence of functions that satisfies lim_{n→∞} δn(s) = 0 for all s ∈ N^N_+. By the dominated convergence theorem, lim_{n→∞} E[e^{iθYn}] = e^{−θ²p/2}, and the result follows from the convergence theorem for characteristic functions.

Lemma 4.2.3 For all t1, . . . , tk ∈ [0,1]^N,

( Sn([nt1]/n), . . . , Sn([ntk]/n) ) =⇒ (B_{t1}, . . . , B_{tk}),

where =⇒ denotes weak convergence in R^k and B is the standard Brownian sheet. You may recall that for s ∈ R^N_+, [s] ∈ R^N_+ denotes the point whose ith coordinate is ⌊s^{(i)}⌋.
Proof Owing to Proposition 4.1.1, we need to show that for all t1, . . . , tk ∈ [0,1]^N,

n^{−N/2} (S_{[nt1]}, . . . , S_{[ntk]}) =⇒ (B_{t1}, . . . , B_{tk}),

where =⇒ denotes weak convergence in R^k. By Exercise 4.2.2 below, it suffices to show that for all α1, . . . , αk ∈ R,

n^{−N/2} Σ_{i=1}^k αi S_{[nti]} =⇒ Σ_{i=1}^k αi B_{ti},

where =⇒ now denotes weak convergence in R. Recall from Chapter 5, Section 1.1, that Z = Σ_{i=1}^k αi B_{ti} is an R-valued Gaussian random variable. Moreover, it has mean zero, and a direct computation reveals that the variance of Z is

σ² = Σ_{i=1}^k Σ_{j=1}^k αi αj Π_{ℓ=1}^N ( t_i^{(ℓ)} ∧ t_j^{(ℓ)} ).

Thus, our goal is to show that Yn = n^{−N/2} Σ_{i=1}^k αi S_{[nti]} converges in distribution to N1(0, σ²). On the other hand,

Yn = n^{−N/2} Σ_{i=1}^k αi Σ_{s ⪯ [nti]} ξs = n^{−N/2} Σ_{s∈N^N: |s|≤n} fn(s) ξs,

where fn(s) = Σ_{i=1}^k αi 1l_{[0,[nti]]}(s). Since

n^{−N} Σ_{s ⪯ (n,...,n)} fn²(s) = n^{−N} Σ_{i=1}^k Σ_{j=1}^k αi αj Σ_{s ⪯ (n,...,n)} 1l_{[0, [nti] ∧ [ntj]]}(s),
Riemann sum approximations show that n^{−N} Σ_{s ⪯ (n,...,n)} fn²(s) converges to σ², and the result follows from Lemma 4.2.2.

Exercise 4.2.2 If (Xn; 1 ≤ n ≤ ∞) are R^d-valued random variables, then Xn converges weakly to X∞ if and only if for all α ∈ R^d, the random variable α · Xn converges in distribution to α · X∞. This is called the Cramér–Wold device.

We are ready for our proof of Proposition 4.2.1.

Proof of Proposition 4.2.1 In light of Lemma 4.2.3, we need only show that for each t ∈ [0,1]^N, as n → ∞, Sn(t) − Sn([nt]/n) → 0 in L²(P). (Why?) If ∆n = ∂N^N ∩ [0, [nt] + (1,...,1)], one can check directly that

Sn(t) − Sn([nt]/n) = n^{−N/2} Σ_{s∈∆n} L_{s,t} ξs,

where L_{s,t} is the Lebesgue measure of a certain subrectangle of the cube {r ∈ R^N_+ : s − (1,...,1) ⪯ r ⪯ s} (plot a picture!). All that we need to know of the L's is the simple fact that |L_{s,t}| ≤ 1. In particular, since the ξ's are i.i.d. with mean 0 and variance 1,

E( |Sn(t) − Sn([nt]/n)|² ) = n^{−N} Σ_{s∈∆n} L²_{s,t} ≤ n^{−N} #∆n,

where #A denotes the cardinality of the set A. One can check that ∆n ⊂ [0, [nt] + (1,...,1)] \ [0, [nt]]. Therefore, for all t ∈ [0,1]^N,

E( |Sn(t) − Sn([nt]/n)|² ) ≤ n^{−N} [ Π_{ℓ=1}^N (⌊nt^{(ℓ)}⌋ + 1) − Π_{ℓ=1}^N ⌊nt^{(ℓ)}⌋ ]
  = Π_{ℓ=1}^N (⌊nt^{(ℓ)}⌋/n) · [ Π_{ℓ=1}^N ( (⌊nt^{(ℓ)}⌋ + 1)/⌊nt^{(ℓ)}⌋ ) − 1 ]
  ≤ Π_{ℓ=1}^N ( (nt^{(ℓ)} + 1)/(nt^{(ℓ)} − 1) ) − 1
  ≤ ( (n+1)/(n−1) )^N − 1,

since the expression in the penultimate line is increasing in nt^{(ℓ)} ≥ 1 for all ℓ. As n → ∞, this bound goes to 0, and our result follows.
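The limiting variance σ² = Σi Σj αi αj Πℓ (ti^(ℓ) ∧ tj^(ℓ)) in the proof of Lemma 4.2.3 can be probed by simulation. The parameters below (N = 2, k = 2, lattice-aligned ti, replication count, seed) are illustrative choices.

```python
# Monte Carlo check that Var( n^{-N/2} sum_i alpha_i S_{[n t_i]} ) matches
#   sigma^2 = sum_{i,j} alpha_i alpha_j prod_l ( t_i^(l) /\ t_j^(l) ),
# here with N = 2, t1 = (1/2, 1/2), t2 = (1, 1), alpha = (1, -1).
import random

random.seed(11)
n, reps = 10, 4000
t = [(0.5, 0.5), (1.0, 1.0)]
alpha = [1.0, -1.0]

sigma2 = sum(alpha[i] * alpha[j]
             * min(t[i][0], t[j][0]) * min(t[i][1], t[j][1])
             for i in range(2) for j in range(2))        # = 0.75

samples = []
for _ in range(reps):
    xi = [[random.choice((-1.0, 1.0)) for _ in range(n)] for _ in range(n)]
    def S(a, b):                    # partial sum over the first a rows, b cols
        return sum(xi[i][j] for i in range(a) for j in range(b))
    y = sum(alpha[i] * S(int(n * t[i][0]), int(n * t[i][1]))
            for i in range(2)) / n  # n^{-N/2} = 1/n for N = 2
    samples.append(y)

var_hat = sum(y * y for y in samples) / reps   # the mean of y is 0
```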
4.3 Pretightness

Having verified the convergence of the finite-dimensional distributions, we now prove the following:

Proposition 4.3.1 S1, S2, . . . is pretight.

Proof We will prove this only for N = 2. The case N = 1 is much easier, and N ≥ 3 is proved by similar means as N = 2, but the notation is more cumbersome. To prove pretightness, we will verify the condition of Theorem 3.5.1. A little thought shows that it suffices to verify that condition with ℓ = 1; the ℓ = 2 case is similar. Thus, we need only show that for all ε > 0,

lim_{δ→0} lim sup_{n→∞} sup_{0≤t≤1} (1/δ) P( sup_{s∈[0,1]: t≤s≤t+δ} sup_{u∈[0,1]} |Sn(s,u) − Sn(t,u)| ≥ ε ) = 0.   (1)

Of course, in the above, Sn(s,u) = Sn(p), where p = (s,u) ∈ [0,1]². For notational simplicity, define for all t ∈ [0,1] and δ > 0,

Qn(t,δ) = sup_{s∈[0,1]: t≤s≤t+δ} sup_{u∈[0,1]} |Sn(s,u) − Sn(t,u)|.   (2)

By Proposition 4.1.1, u → Sn(s,u) is piecewise linear on each of the intervals u ∈ [k/n, (k+1)/n] (0 ≤ k ≤ n−1). This leads to the following bound:

Qn(t,δ) = sup_{s∈[0,1]: t≤s≤t+δ} sup_{u∈[0,1]} | Sn(s, ⌊nu⌋/n) − Sn(t, ⌊nu⌋/n) |.
Hence,

Qn(t,δ) ≤ sup_{s∈[0,1]: t≤s≤t+δ} sup_{u∈[0,1]} | Sn(⌊ns⌋/n, ⌊nu⌋/n) − Sn(⌊nt⌋/n, ⌊nu⌋/n) |
        + sup_{s∈[0,1]: t≤s≤t+δ} sup_{u∈[0,1]} | Sn(⌊ns⌋/n, ⌊nu⌋/n) − Sn(s, ⌊nu⌋/n) |
        + sup_{s∈[0,1]: t≤s≤t+δ} sup_{u∈[0,1]} | Sn(⌊nt⌋/n, ⌊nu⌋/n) − Sn(t, ⌊nu⌋/n) |.

Now we can use linearity in t (and s) to deduce that

Qn(t,δ) ≤ sup_{s∈[0,1]: t≤s≤t+δ} sup_{u∈[0,1]} | Sn(⌊ns⌋/n, ⌊nu⌋/n) − Sn(⌊nt⌋/n, ⌊nu⌋/n) |
        + sup_{s∈[0,1]: t≤s≤t+δ} sup_{u∈[0,1]} | Sn(⌊ns⌋/n, ⌊nu⌋/n) − Sn((⌊ns⌋+1)/n, ⌊nu⌋/n) |
        + sup_{s∈[0,1]: t≤s≤t+δ} sup_{u∈[0,1]} | Sn(⌊nt⌋/n, ⌊nu⌋/n) − Sn((⌊nt⌋+1)/n, ⌊nu⌋/n) |.
Using Proposition 4.1.1 once more, we can relate this to the multiparameter random walk, all the time remembering that N = 2:

sup_{u∈[0,1]} |Sn(s,u) − Sn(t,u)| ≤ (1/n) sup_{u∈[0,1]} | S_{⌊ns⌋,⌊nu⌋} − S_{⌊nt⌋,⌊nu⌋} |
  + (1/n) sup_{u∈[0,1]} | S_{⌊ns⌋,⌊nu⌋} − S_{⌊ns⌋+1,⌊nu⌋} |
  + (1/n) sup_{u∈[0,1]} | S_{⌊nt⌋,⌊nu⌋} − S_{⌊nt⌋+1,⌊nu⌋} |.

Thus, for any δ > 0 and any t ∈ [0,1],

Qn(t,δ) ≤ (1/n) max_{1≤k≤nδ+1} max_{u∈N: u≤n} | S_{⌊nt⌋+k,u} − S_{⌊nt⌋,u} |
        + (2/n) sup_{t≤s≤t+δ} max_{u∈N: u≤n} | S_{⌊ns⌋,u} − S_{⌊ns⌋+1,u} |.

By the triangle inequality, for the above values of s and u we have

| S_{⌊ns⌋,u} − S_{⌊ns⌋+1,u} | ≤ | S_{⌊ns⌋,u} − S_{⌊nt⌋,u} | + | S_{⌊nt⌋,u} − S_{⌊ns⌋+1,u} |
                             ≤ 2 max_{1≤k≤nδ+2} | S_{k+⌊nt⌋,u} − S_{⌊nt⌋,u} |.
In particular, for any t ∈ [0,1], n ≥ 1, and δ > 0,

Qn(t,δ) ≤ 5n^{−1} max_{1≤k≤nδ+2, 1≤j≤n} | S_{k+⌊nt⌋,j} − S_{⌊nt⌋,j} |.

By the stationarity of the increments of S, for all t ∈ [0,1], δ > 0, n ≥ 1, and λ > 0,

P(Qn(t,δ) ≥ λ) ≤ P( max_{1≤k≤nδ+2, 1≤j≤n} |S_{k,j}| ≥ λn/5 ),

which is independent of t ∈ [0,1]! Thus, we can apply Lemma 2.7.2 of Chapter 4 to see that for all α, ε > 0,

P(Qn(t,δ) ≥ α + 2ε) ≤ P( |S_{⌊nδ⌋+2,n}| ≥ αn/5 ) × [ P( max_{1≤k≤nδ+2, 1≤j≤n} |S_{k,j}| ≤ εn/5 ) ]^{−2}.   (3)

Note that E( S²_{⌊nδ⌋+2,n} ) = n(⌊nδ⌋ + 2) ≤ 2δn², as long as n ≥ δ^{−1}. Thus, we can apply Lemma 2.7.1 of Chapter 4 with N = 2, σ = 1, and Zt = Xt to see that for all λ > 8 and for all n ≥ δ^{−1},

P( max_{1≤k≤nδ+2, 1≤j≤n} |S_{k,j}| ≤ √(2δ) λn ) ≥ 1/2.

Plugging this into equation (3) with λ = 10, we see that for all ε, α > 0, with 0 < δ < e^{−200}ε², and for all n ≥ δ^{−1},

P(Qn(t,δ) ≥ α + 2ε) ≤ 4 P( |S_{⌊nδ⌋+2,n}| ≥ αn/5 ).

By Proposition 4.1.1, n^{−1} S_{⌊nδ⌋+2,n} = Sn( (⌊nδ⌋+2)/n, 1 ). In addition, Proposition 4.2.1 and its proof together show that as n → ∞, this converges in distribution to B_{δ,1}, where B is the standard Brownian sheet. Therefore, for all ε, α > 0 such that 0 < δ < e^{−200}ε²,

lim sup_{n→∞} sup_{t∈[0,1]} P(Qn(t,δ) ≥ α + 2ε) ≤ 4 P( |B_{δ,1}| ≥ α/5 ).

Since δ^{−1/2} B_{δ,1} is Gaussian with mean 0 and variance 1, Supplementary Exercise 11 of Chapter 4 implies that if 0 < δ < e^{−200}ε²,

lim sup_{n→∞} sup_{t∈[0,1]} P(Qn(t,δ) ≥ α + 2ε) ≤ 8 exp( −α²/(50δ) ).
In particular, for all α, ε > 0,

lim_{δ→0} lim sup_{n→∞} (1/δ) sup_{t∈[0,1]} P(Qn(t,δ) ≥ α + 2ε) = 0.

In light of equation (2), we obtain equation (1), and hence the result when N = 2.

Exercise 4.3.1 Verify Proposition 4.3.1 when N ≥ 3.
5 Supplementary Exercises

1. Suppose X1, X2, . . . , X∞ are S-valued random variables, where (S, d) is a separable metric space, and that for all ε > 0, lim_{n→∞} P{d(Xn, X∞) ≥ ε} = 0. Prove that Xn =⇒ X∞. That is, show that convergence in probability implies weak convergence.

2. For any probability measure P on a separable metric space T, consider sets of the form

U_{f,ε}(P) = { Q ∈ P(T) : | ∫_T f(ω) Q(dω) − ∫_T f(ω) P(dω) | < ε },

where f : T → R is bounded and continuous, P(T) denotes the collection of all probability measures on T, and ε > 0. Show that the topology generated by such sets U_{f,ε}(P) topologizes weak convergence.

3. In the notation of Section 3.3, suppose (Xn; 1 ≤ n ≤ ∞) is a sequence of C-valued random variables such that as n → ∞, Xn converges weakly in C to X∞. Prove that: (i) for all t1, . . . , tk ∈ [0,1]^N, (Xn(t1), . . . , Xn(tk)) converges weakly in R^k to (X∞(t1), . . . , X∞(tk)); and (ii) for all ε > 0, lim_{δ→0+} lim sup_{n→∞} P(ω_{Xn}(δ) ≥ ε) = 0.

4. The intention of this exercise is to complete the proof of Prohorov's theorem in the noncompact case; cf. Theorem 2.5.1.

(i) Show that the tightness of (Pn) implies the existence of compact sets K1 ⊂ K2 ⊂ · · ·, all subsets of T, such that for all m ≥ 1, sup_n Pn(K_m^c) ≤ (2m)^{−1}.

(ii) For all n, m ≥ 1, define Q^m_n(•) = Pn(• ∩ Km)/Pn(Km). Prove that there exists a subsequence (n′) and, for each m, a probability measure Q^m_∞ on Km such that as n′ → ∞, Q^m_{n′} =⇒ Q^m_∞, for each m ≥ 1.

(iii) Prove that for all m ≥ 1 and all Borel sets E ⊂ Km, Q^m_∞(E) = Q^{m+1}_∞(E). That is, the probability measures Q^m_∞ are nested.

(iv) Use (iii) to complete the proof of Theorem 2.5.1.
5. Suppose S = (Sn; n ≥ 1) denotes the simple walk on Z¹. Prove that as n → ∞, n^{−1/2} max_{k≤n} Sk converges weakly in R to |N|, where N denotes a standard Gaussian random variable. Use this to conclude the following: Given arbitrary i.i.d. random variables X1, X2, . . . with mean 0 and variance 1, let Sn = Σ_{j=1}^n Xj to see that for all λ > 0,

lim_{n→∞} P( max_{1≤k≤n} Sk ≤ λ√n ) = √(2/π) ∫_0^λ e^{−x²/2} dx.

Relate this to equation (2) of the preamble to Section 4. (Hint: Start with Supplementary Exercise 3, Chapter 4.)

6. Let (Sn; n ≥ 1) be a mean-zero, variance-one random walk (1-parameter) and let Sn be the associated process that was defined in Section 4.1 (once more, N = 1).

(i) Apply Theorem 4.1.1 to verify that as n → ∞, the random variables ∫_0^1 Sn(t) dt converge in distribution to ∫_0^1 Bs ds.

(ii) Use the above to prove that as n → ∞, n^{−3/2} Σ_{k=1}^n Sk converges in distribution (in R) to ∫_0^1 Bs ds.

(iii) Find the latter distribution in terms of its probability density function. Refine this by proving that if S is an N-parameter random walk with mean 0 and variance 1, there exists αn → 0 such that αn Σ_{t ⪯ (n,...,n)} St has a distributional limit. Find αn and identify this limit. (Hint: For part (iii), start by proving that ∫_0^t Bs ds is a Gaussian random variable. In fact, any bounded linear functional of the random function t → Bt is Gaussian.)

7. Suppose that S is an N-parameter random walk with mean 0 and variance 1, and that f : R → R is a continuous function. Find αn → 0 such that as n → ∞, αn Σ_{t ⪯ (n,...,n)} f(St) converges in distribution. Identify the limiting distribution.

8. Let U1, U2, . . . denote i.i.d. random variables, all chosen uniformly from [0,1]. Define

αn(t) = (1/n) Σ_{j=1}^n 1l_{[0,t]}(Uj),   (n ≥ 1, t ∈ [0,1]).

The random distribution function αn is called the empirical distribution function for the "data" {U1, . . . , Un}.

(i) Verify that the finite-dimensional distributions of the process t → √n (αn(t) − t) (t ∈ [0,1]) converge, as n → ∞, to those of t → Bt − tB1 (t ∈ [0,1]), where B denotes standard Brownian motion. (Hint: Use a multidimensional central limit theorem for multinomials.)

(ii) We would like to state that as n → ∞, t → √n (αn(t) − t) (t ∈ [0,1]) converges in C to t → Bt − tB1 (t ∈ [0,1]). Unfortunately, αn ∉ C([0,1]). To get around this, show that there are random functions An, Bn ∈ C([0,1]) such that (a) An(t) ≤ αn(t) ≤ Bn(t) for all n ≥ 1 and all t ∈ [0,1]; (b) sup_{t∈[0,1]} |An(t) − Bn(t)| → 0; and (c) An and Bn both converge weakly in C([0,1]), as n → ∞, to t → Bt − tB1 (t ∈ [0,1]). (Hint: Approximate the random step function t → αn(t) by random piecewise linear functions.)
(iii) Prove that as n → ∞, this limit.
√ n supt∈[0,1] |αn (t) − t| converges weakly. Identify
(iv) Prove that as n → ∞, αn (t) → t, uniformly in t ∈ [0, 1], a.s. (Hint: Convergence in probability follows from (iii). For the a.s. convergence, work from first principles.) This, the Glivenko–Cantelli theorem, is one of the fundamental results of empirical processes. See (Billingsley 1968; Dudley 1984; G¨ anssler 1983; Pollard 1984) for various refinements, and for related discussions. The Gaussian process (Bt − tB1 ; t ∈ [0, 1]) is called the Brownian bridge on [0, 1]. 9. Let B = (Bt ; t ∈ RN + ) denote the standard N -parameter Brownian sheet and define a stochastic process B ◦ = (Bt◦ ; t ∈ [0, 1]N ), indexed by [0, 1]N , as follows: (j) N t B Bt◦ = Bt − N (1,...,1) , (t ∈ [0, 1] ). This is the “Brownian sheet pinned to j=1 zero at time (1, . . . , 1)” (or the N -parameter Brownian bridge.) (i) Check that B ◦ is a Gaussian process, and compute its mean and covariance functions. (ii) Check that the entire process B ◦ is independent of B(1,...,1) . (iii) Define measures (Pε ; ε ∈ ]0, 1[) on Borel subsets of C([0, 1]N ) by Pε (•) = P(B ∈ • | 0 ≤ B(1,...,1) ≤ ε). Show that Pε is a probability measure on C([0, 1]N ) and prove that for all closed sets F ⊂ C([0, 1]N ), limε→0+ Pε (F ) ≤ P(B ◦ ∈ F η ), where F η is the closed η-enlargement of F . Conclude that Pε converges in C([0, 1]N ) to the distribution of B ◦ . Intuitively speaking, this states that the pinned process B ◦ is the process B, conditioned to be 0 at time t = (1, . . . , 1). (iv) What can you say about the asymptotic behavior of the measures Qε defined by Qε (A) = P(B ∈ A | |B(1,...,1) | ≤ ε)? (v) Let B = (B(s,t) ; s, t ≥ 0) denote the 2-parameter Brownian sheet and | consider the 2-parameter process B | = (B(s,t) ; s, t ∈ [0, 1]) defined as |
B^|_(s,t) = B_(s,t) − sB_(1,t), 0 ≤ s, t ≤ 1.
Provide a weak convergence justification for the statement that B^| is B conditioned to be 0 on the line {(1, t) : t ∈ [0, 1]}. (Hint: First show that the 2-parameter process B^| is independent of the process (B_(1,t); t ∈ [0, 1]).)
10. Let B = (Bt; t ≥ 0) denote the standard 1-dimensional Brownian motion.
(i) For all n ≥ 1, define

Vn(t) = Σ_{0≤j<2^n t} (B_{(j+1)2^{-n}} − B_{j2^{-n}})²,

whenever t ≥ 0 is of the form t = j2^{-n}, and for other t ≥ 0, define the random function Vn by linearly interpolating these values between t's of the form j2^{-n}, j = 0, 1, . . . . Prove that with probability one, as n → ∞, Vn(t) converges to t, uniformly on t-compacta.
(ii) Find constants µ, α, and σ such that as n → ∞, the process t → σ^{-1} n^α {Vn(t) − µt} converges weakly to Brownian motion in C([0, 1]).
Does this have an N -parameter extension? You should also consult Supplementary Exercise 2, Chapter 5. (Hint: For part (i), first compute the mean and the variance. Next, show that for any finite set F ⊂ [0, 1] and all ε > 0, P(maxt∈F |Vn (t) − t| ≥ ε) is small. Interpolate between the points in a sufficiently well chosen F = Fn .)
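Exercise 10(i) is easy to probe numerically. The following sketch (an added illustration, not part of the text; it assumes NumPy is available) simulates one Brownian path on a dyadic grid and computes Vn at the grid points; the uniform error sup_{t≤1} |Vn(t) − t| should shrink as n grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def quadratic_variation(n, t_max=1.0):
    """Return V_n at the dyadic points t = 2**-n, 2*2**-n, ..., t_max,
    computed from one simulated Brownian path of mesh 2**-n."""
    mesh = 2.0 ** (-n)
    steps = int(round(t_max / mesh))
    # Brownian increments B_{(j+1)2^-n} - B_{j2^-n} are N(0, 2^-n).
    increments = rng.normal(0.0, np.sqrt(mesh), size=steps)
    return np.cumsum(increments ** 2)

# sup_{t <= 1} |V_n(t) - t| should tend to 0 as n grows (Exercise 10(i)).
for n in (4, 8, 12):
    t_grid = np.arange(1, 2 ** n + 1) * 2.0 ** (-n)
    err = np.max(np.abs(quadratic_variation(n) - t_grid))
    print(n, err)
```

The n^α in part (ii) of the exercise governs exactly how fast this error shrinks.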
6 Notes on Chapter 6
Sections 1–3 Our construction of abstract random variables is completely standard. The material on weak convergence is modeled after Billingsley (1968) and is a subset of the rich development there. As a warning to the functional analysis aficionado, we should mention that in the probability literature, “weak convergence” translates to the “weak-∗ convergence” of analysis (and not “weak convergence”!) Theorem 2.4.1 is from Oxtoby and Ulam (1939). The term “tightness” is due to Lucien LeCam; cf. LeCam (1957). Our proof of Prohorov's theorem in the compact case is philosophically different from the usual ones that can be found, for instance, in (Billingsley 1968; Rudin 1973), although there are some similarities.
Section 5 Supplementary Exercise 9 is modeled after Billingsley (1968, equation (11.31)), whereas Supplementary Exercise 10 is classical, and a part of the folklore of the subject. For a variant on random sceneries, see Khoshnevisan and Lewis (1998).
Part II
Continuous-Parameter Random Fields
7 Continuous-Parameter Martingales
The second part of this book starts with a continuous-parameter extension of the discrete-parameter theory of Chapter 1. Our use of the term “extension” is quite misleading. Indeed, we will quickly find that in order to carry out these “extensions,” one needs a good understanding of the regularity of the sample functions of multiparameter stochastic processes; this will require a great effort. However, we will be rewarded for our hard work, since it will lead to a successful continuous-parameter theory that, in many ways, probes much more deeply than its discrete-parameter counterpart. Moreover, this theory lies at the foundations of nearly all of the random fields that arise throughout the rest of this book and a great deal more. Viewed as such, this chapter is simply indispensable for those who wish to read on. We will also discuss elements of hyperbolic stochastic partial differential equations as an interesting area with ready applications.
1 One-Parameter Martingales
In continuous time, and in informal terms, we can introduce an R_+^N-indexed martingale just as we defined an N_0^N-indexed martingale, but replacing N_0^N-valued parameters by R_+^N-valued ones. While this is simple enough to understand, attempts at developing a viable continuous-time theory will quickly encounter a number of technical and conceptual problems. These difficulties are easier to isolate in the simpler one-parameter setting, which will be our starting point. This presentation will be extended to the multiparameter setting afterwards.
1.1 Filtrations and Stopping Times
Motivated by Chapter 1, we say that a collection F = (Ft; t ≥ 0) of sub-σ-fields of the underlying σ-field G is a (one-parameter) filtration if for all 0 ≤ s ≤ t, Fs ⊂ Ft. We emphasize that, here, the variables s and t are R+-valued. Suppose X = (Xt; t ≥ 0) is an S-valued stochastic process, where (S, d) is a metric space. We say that X is adapted to the filtration F if for all t ≥ 0, Xt is Ft-measurable. We also say that a [0, ∞]-valued random variable T is a stopping time if

(T ≤ t) ∈ Ft, for all t ≥ 0.
In order to highlight the dependence of T on the filtration F, we may sometimes refer to T as an F-stopping time. The following exercise shows that some of the important properties of stopping times are preserved in the transition from discrete to continuous time.
Exercise 1.1.1 Prove that whenever T1, T2, . . . are stopping times, so are inf_{n≥1} Tn, sup_{n≥1} Tn, and Σ_{i=1}^∞ Ti. Moreover, show that all nonrandom times are stopping times.
For any stopping time T, we define
FT = {A ∈ ∨_{t≥0} Ft : A ∩ (T ≤ t) ∈ Ft, for all t ≥ 0}.
The collection FT is, in some sense, similar to its discrete-time counterpart. For example: (i) FT is a σ-field; (ii) if T is nonrandom, say T = k, then FT = Fk; (iii) T is an FT-measurable random variable; and (iv) if T ≤ S are both F-stopping times, then FT ⊂ FS.
Exercise 1.1.2 Verify that the above properties hold true.
Letting XT(ω) = X_{T(ω)}(ω), we would also like to know that whenever X is adapted to F, and if T is an F-stopping time, then XT is FT-measurable. Here, we meet our first stumbling block, since XT is, in general, not FT-measurable.
Example Let T be a strictly positive random variable with an absolutely continuous distribution, and define

Xt = 0, if 0 ≤ t < T;
Xt = ξ, if t = T;
Xt = 1, if t > T,
where ξ > 1 and is independent of T. (We can arrange it so that T(ω) > 0 and ξ(ω) > 1 for all ω ∈ Ω. Thus, there is no need for a.s. statements here.) Define the filtration F = (Ft; t ≥ 0) as follows: For all t ≥ 0, Ft denotes the σ-field generated by the collection of random variables (Xr; 0 ≤ r ≤ t). By its very definition, X is adapted to F, and we note also that T = inf(s > 0 : Xs > 0). Therefore:
(P1) T(ω) > t if and only if for all 0 ≤ s ≤ t, Xs(ω) = 0.
(P2) Conversely, T(ω) < t if and only if for all s ≥ t, Xs(ω) = 1.
This shows that T is a stopping time, since for any t ≥ 0, (T > t) = (Xt = 0) ∈ Ft. On the other hand, not only is XT not FT-measurable, but in fact, XT is independent of FT. To see this, fix some t > 0 and consider bounded, measurable functions g, f1, . . . , fn : R+ → R and a collection of points s1, . . . , sn such that for some k ≥ 1, 0 ≤ s1 < · · · < sk ≤ t < sk+1 < · · · < sn. Since T has an absolutely continuous distribution,

E[∏_{i=1}^n fi(X_{si}) g(ξ) 1l(T>t)] = Σ_{ℓ=k+1}^n E[∏_{i=1}^n fi(X_{si}) g(ξ) 1l(sℓ<T<sℓ+1)] + E[∏_{i=1}^n fi(X_{si}) g(ξ) 1l(t<T<sk+1)],

where sn+1 = ∞, for notational convenience. Using (P1), (P2), and the independence of ξ from T, we can deduce that

E[∏_{i=1}^n fi(X_{si}) g(ξ) 1l(T>t)]
= Σ_{ℓ=k+1}^n ∏_{i=1}^ℓ fi(0) · ∏_{j=ℓ+1}^n fj(1) · E[g(ξ) 1l(sℓ<T<sℓ+1)] + ∏_{i=1}^k fi(0) · ∏_{j=k+1}^n fj(1) · E[g(ξ) 1l(t<T<sk+1)]
= E[g(ξ)] · E[∏_{i=1}^n fi(X_{si}) 1l(T>t)].
By a monotone class argument, for any bounded, ∨t≥0 Ft -measurable random variable Z, E[Zg(ξ)1l(T >t) ] = E[g(ξ)] · E[Z1l(T >t) ].
Take Z = 1lA, where A ∈ ∨t≥0 Ft, to see that XT = ξ is independent of FT (why?). In fact, we have shown the surprising fact that XT is independent of ∨t≥0 Ft. That is, XT is independent of the entire process X.
This example shows one of the peculiarities of the theory of continuous-time stochastic processes: If t → Xt is ill-behaved, XT need not be FT-measurable. (In the example, t → Xt has a discontinuity of the second kind at the random point t = T.) The following result demonstrates a kind of converse to this.
Theorem 1.1.1 Suppose X is a right-continuous, S-valued stochastic process that is adapted to a filtration F. Then, for all measurable functions f : S → R and for all F-stopping times T, f(XT)1l(T<∞) is an FT-measurable, real-valued random variable.
A word of caution: f(XT)1l(T<∞) is defined to be 0 on the event (T = ∞). Our proof of this theorem relies on the following approximation result.
Lemma 1.1.1 Let F = (Ft; t ≥ 0) be a filtration and T an F-stopping time. For all n ≥ 1, define

T^n = Σ_{k=0}^∞ (k + 1)2^{-n} 1l(k2^{-n} ≤ T < (k+1)2^{-n}), if T < ∞, and T^n = +∞, otherwise.

Then: (i) for each n ≥ 1, T^n is an F-stopping time; and (ii) T^n ↓ T, as n → ∞.
Exercise 1.1.3 Prove Lemma 1.1.1.
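The discretization of Lemma 1.1.1 is concrete enough to compute. The snippet below (an added illustration, not from the text; plain standard-library Python) evaluates T^n for one fixed sample value of T and exhibits T^n ↓ T from strictly above.

```python
import math

def dyadic_upper(T, n):
    """T^n of Lemma 1.1.1: if k2^-n <= T < (k+1)2^-n then T^n = (k+1)2^-n,
    and T^n = +infinity when T is.  Note that T < T^n <= T + 2^-n."""
    if math.isinf(T):
        return math.inf
    k = math.floor(T * 2 ** n)  # the unique k with k2^-n <= T < (k+1)2^-n
    return (k + 1) * 2.0 ** (-n)

T = 0.7309  # a sample value of the stopping time on one omega
approx = [dyadic_upper(T, n) for n in range(1, 11)]
print(approx)  # non-increasing, each term > T, within 2^-n of T
```

Each T^n takes countably many values, which is what reduces the proof of Theorem 1.1.1 to a discrete computation.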
Proof of Theorem 1.1.1 Note that XT is defined only on (T < ∞), while (T = ∞) need not be empty. Therefore, we need to consider two classes of ω's: those for which T(ω) < ∞ and those for which T(ω) = ∞. To get around this, introduce a point δ ∉ S, called the cemetery state. Let S′ = S ∪ {δ} and topologize S′ by declaring E ∪ {δ} open whenever E ⊂ S is open. Endow S′ with the induced Borel field and extend any f : S → R to a function on S′ by f(δ) = 0. It is easy to see that if f : S → R is measurable, its extension to S′ (still denoted by f) is a measurable function from S′ into R. We also extend the process X by defining X∞(ω) = δ for all ω ∈ Ω. With this extension in mind, we need to show that for all Borel sets E ⊂ R and all t ≥ 0, (f(XT) ∈ E) ∩ (T ≤ t) ∈ Ft. In fact, upon writing (T ≤ t) as (T = t) ∪ (T < t), we need only show that for all Borel sets E ⊂ R and all t > 0, (f(XT) ∈ E) ∩ (T < t) ∈ Ft.
On the other hand, since f^{-1}(E) = (s ∈ S′ : f(s) ∈ E) is a measurable subset of S′, it suffices to show that for all Borel sets F ⊂ S′,

(XT ∈ F) ∩ (T < t) ∈ Ft, t ≥ 0. (1)

The collection of all sets F that satisfy the above is a monotone class. Therefore, we need only verify (1) for all open sets F. Let T^n be as in Lemma 1.1.1 and note that for any open F ⊂ S′,

(XT ∈ F) ∩ (T < t) = ∪_{m=1}^∞ ∩_{n=m}^∞ (X_{T^n} ∈ F) ∩ (T^n < t).

(Why?) This equals

∪_{m=1}^∞ ∩_{n=m}^∞ ∪_{k∈N: k2^{-n} < t} (X_{k2^{-n}} ∈ F) ∩ (T^n = k2^{-n}),

which is clearly in Ft.
1.2 Entrance Times
Suppose X = (Xt; t ≥ 0) is a random process that takes values in a metric space (S, d), and is adapted to a filtration F = (Ft; t ≥ 0). The entrance (or hitting) time of any set E ⊂ S is defined as TE = inf(s ≥ 0 : Xs ∈ E). We now show that in many cases of interest, TE is a stopping time.
Theorem 1.2.1 Suppose X is a right-continuous, S-valued stochastic process that is adapted to a filtration F. Then, for all open sets E ⊂ S, TE is a stopping time. If t → Xt is continuous, then for all closed sets E ⊂ S, TE is a stopping time.
Proof Since E is open and X is right-continuous, whenever ω ∈ (TE ≤ t), either Xt(ω) ∈ E or there exists a rational r < t such that Xr(ω) ∈ E. That is,

(TE ≤ t) = ∪_{r∈Q+∩[0,t[} (Xr ∈ E) ∪ (Xt ∈ E),

which is in Ft and shows that TE is a stopping time. For the second assertion, we suppose that t → Xt is continuous and note that for all t ≥ 0,

(TE ≤ t) = ∩_{ε∈Q+} ∪_{s∈[0,t]∩Q+} (d({Xs}, E) ≤ ε),

where d(F, E) = inf{d(x, y); x ∈ F, y ∈ E}. Since {x ∈ S : d({x}, E) ≤ ε} is closed, it is a measurable subset of S. Thus, (TE ≤ t) ∈ Ft, as desired.
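The proof of Theorem 1.2.1 reduces the entrance time of an open set to countably many time points; in the rough numerical sketch below (an added illustration, assuming NumPy; the grid approximation of TE is our own device, not the book's construction), a fine grid of simulation times plays the role of the rationals.

```python
import numpy as np

rng = np.random.default_rng(1)

# One Brownian path sampled on a fine grid; the grid points stand in for
# the rationals used in the proof of Theorem 1.2.1.
dt = 1e-3
times = np.arange(0.0, 10.0, dt)
path = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), len(times) - 1))])

def entrance_time(path, times, level):
    """Discretized T_E for the open set E = ]level, infinity[: the first
    grid time at which the path lies in E (infinity if it never does)."""
    hits = np.nonzero(path > level)[0]
    return times[hits[0]] if hits.size else np.inf

# E shrinks as `level` grows, so the entrance time can only increase.
t_half = entrance_time(path, times, 0.5)
t_one = entrance_time(path, times, 1.0)
print(t_half, t_one)
```

The discretized time over-estimates the true TE by at most one grid step plus whatever the path does between grid points, which is exactly the gap that right continuity closes in the proof.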
1.3 Smartingales and Inequalities
A stochastic process M is said to be a submartingale with respect to a filtration F if: (i) M is adapted to F; (ii) for all t ≥ 0, E{|Mt|} < +∞; and (iii) for all t, s ≥ 0, E{Mt+s | Fs} ≥ Ms, a.s. It is said to be a supermartingale if −M is a submartingale, and it is a martingale if it is both a sub- and a supermartingale. If M is either a sub- or a supermartingale, we refer to it as a smartingale.
At first glance, it may seem that the theory of continuous-time smartingales is the same as its discrete-time relative. However, the example of Section 1.1 shows that unless t → Mt is well-behaved, one cannot possibly hope for an extensive theory. On the other hand, if M is right-continuous (say), then it has many nice properties. We list some of them below.
Theorem 1.3.1 If M is a nonnegative, right-continuous submartingale:
(i) (Weak (1,1) inequality) for all t, λ > 0,

P(sup_{0≤s≤t} Ms ≥ λ) ≤ (1/λ) E[Mt 1l(sup_{0≤s≤t} Ms ≥ λ)];

(ii) (Strong (p,p) inequality) for all p > 1 and all t ≥ 0,

E[sup_{0≤s≤t} Ms^p] ≤ (p/(p − 1))^p E[Mt^p]; and

(iii) (The L ln L inequality) for all t ≥ 0,

E[sup_{0≤s≤t} Ms] ≤ (e/(e − 1)) (1 + E[Mt ln+ Mt]),

where ln+ x = ln(x ∨ 1).
Proof Consider finite sets Fk ⊂ [0, t] ∩ Q+ (k ≥ 1) such that as k → ∞, Fk increases to [0, t] ∩ Q+. By right continuity, as k → ∞, sup_{s∈Fk} Ms converges upwards to sup_{0≤s≤t} Ms. To prove (i), apply Theorem 1.3.1 of Chapter 1 to sup_{s∈Fk} Ms and use the monotone convergence theorem to take limits. Parts (ii) and (iii) follow from applying a similar argument, and using Theorems 1.4.1 and 1.5.1 of Chapter 1, respectively.
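A Monte Carlo sanity check of part (ii) with p = 2 is straightforward (an added sketch, assuming NumPy; Ms = |Bs| is a nonnegative right-continuous submartingale by Jensen's inequality):

```python
import numpy as np

rng = np.random.default_rng(2)

# Estimate both sides of the strong (2,2) inequality
#   E[ sup_{0<=s<=1} M_s^2 ] <= (2/(2-1))^2 E[ M_1^2 ]
# for the nonnegative submartingale M_s = |B_s|.
n_paths, n_steps = 4000, 512
dt = 1.0 / n_steps
increments = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
M = np.abs(np.cumsum(increments, axis=1))   # |B| at s = dt, 2dt, ..., 1

lhs = np.mean(np.max(M, axis=1) ** 2)       # E[(sup_s M_s)^2], discretized
rhs = 4.0 * np.mean(M[:, -1] ** 2)          # (p/(p-1))^p E[M_1^p], p = 2
print(lhs, rhs)
```

Discretizing the supremum over a grid can only shrink the left side, so the inequality should hold comfortably in the simulation.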
1.4 Regularity
Section 1.3 shows that right-continuous smartingales in continuous time have many of the desirable properties of discrete-parameter martingales. In this subsection we seek conditions that guarantee that our smartingale has a right-continuous modification that is itself a smartingale. (Recall Section 2.2 of Chapter 5 for the definition of modifications.) Throughout, we assume that the underlying probability space is complete. If not, we can always complete it without changing anything critical.
Given a filtration F, we can define a filtration F∗, in two stages, as follows: First, define F′t as the P-completion of Ft for all t ≥ 0. That is, F′t is the σ-field generated by the collection of all sets of the form A ∪ B, where A ∈ Ft and B is a subset of a P-null set. Clearly, F′ is a filtration. Now we can define

F∗t = ∩_{s>t} F′s, t ≥ 0.

In words, F∗t is defined as the “right-continuous extension” of F′t for each t ≥ 0. It should not be hard to check that F∗ is also a filtration. The main result of this section is the following:
Theorem 1.4.1 Suppose M is a submartingale with respect to the filtration F∗. If t → E[Mt] is right-continuous, then M has a right-continuous modification.
Remarks (i) The said modification is necessarily a right-continuous submartingale with respect to F∗; cf. Exercise 1.4.1 below.
(ii) If Y ∈ L1(P), then Mt = E[Y | F∗t] is a martingale for any version of this conditional expectation. Theorem 1.4.1 implies that there exists a version of this conditional expectation such that t → E[Y | F∗t] is right-continuous; see Exercise 1.4.1.
(iii) F∗ is complete and right-continuous; i.e., for all t ≥ 0, F∗t = ∩_{s>t} F∗s.
Exercise 1.4.1 Verify the claims of the above remarks.
Theorem 1.4.1 is proved in two stages that are stated as the following lemmas.
Lemma 1.4.1 With probability one, lim_{r↓t: r∈Q+} Mr exists, simultaneously for all t ≥ 0.
Note the order of the quantifiers, namely, that there exists one null set Θ such that for all ω ∉ Θ, lim_{r↓t: r∈Q+} Mr(ω) exists for all t ≥ 0.
Proof The measurability of the event in question is part of the statement of the lemma, and will be proved shortly. Note that this assertion is not obvious, since the event in question involves an uncountable number of t's. For any two real numbers a < b, and for any finite set F ⊂ [0, ∞[, define UF[a, b] to be the number of upcrossings of the interval [a, b] made by the sequence (Mr; r ∈ F). Whenever G ⊂ [0, ∞[, let UG[a, b] = sup_{F⊂G, F finite} UF[a, b]. Recall the upcrossing inequality (Theorem 1.6.1, Chapter 1), and observe that by taking finite sets Fk ↑ Q+ ∩ [0, n], we deduce for all integers n ≥ 1,

E{U_{[0,n]}[a, b]} ≤ (|a| + E{|Mn|}) / (b − a).

Let Ω0 denote the collection of all ω's such that for some pair of rational numbers a < b, and for some integer n ≥ 1, U_{[0,n]}[a, b] = ∞. The above shows that P(Ω0) = 0. For all a < b, define

Na,b = ∪_{t≥0} (lim inf_{r↓t: r∈Q+} Mr ≤ a < b ≤ lim sup_{r↓t: r∈Q+} Mr).

Since the above is an uncountable union, Na,b need not be measurable. However, inspecting the event ω by ω, we can deduce that ∪_{a<b: a,b∈Q+} Na,b ⊂ Ω0.
Proof Fix a t ≥ 0 and let ρ1, ρ2, . . . be an enumeration of the positive rationals in [t, t + 1] such that as n → ∞, ρn ↓ t. Clearly, (Mρn; n ≥ 1) is a reversed submartingale, and supn E{|Mρn|} ≤ E{|Mt+1|} < +∞. By Supplementary Exercise 4 of Chapter 1, limn→∞ Mρn exists, almost surely and in L1(P); this implies (i). To prove (ii), we merely note that there exists one null set, outside of which for all n ≥ 1, Mt ≤ E[Mρn | Ft], a.s. Assertion (ii) follows readily from this. To prove (iii), first note that by convergence in L1(P), E[Mt+] = limn→∞ E[Mρn]. The assumed right continuity of s → E[Ms] implies that E[Mt+ − Mt] = 0. Since Mt+ − Mt ≥ 0, a.s. (part ii), and has mean 0, by Supplementary Exercise 3, P(Mt+ = Mt) = 1. To prove (iv), we begin by fixing an s and a t such that 0 ≤ s < t. Note that almost surely, Mp ≤ E[Mq | Fp], simultaneously for all rationals p < q. By Exercise 1.4.1, ∩_{p>s: p∈Q+} F∗p = F∗s. Consequently, we can take p ↓ s (p ∈ Q+) and use the discrete-parameter convergence theorem for reversed martingales (Supplementary Exercise 4, Chapter 1) to see that with probability one, Ms+ ≤ E[Mq | F∗s], simultaneously over all rational q > s. Let q ↓ t (q ∈ Q+) and use L1(P) convergence to see that M+ is a submartingale. Part (v) follows from (iv), by considering both M and −M.
We are ready for our proof of Theorem 1.4.1.
Proof of Theorem 1.4.1 Let Ω1 be the collection of all ω's such that for some t ≥ 0, lim_{r↓t: r∈Q+} Mr(ω) does not exist. By Lemma 1.4.1, Ω1 is a P-null set. For all t ≥ 0 and all ω, define

Nt(ω) = Mt+(ω), if ω ∉ Ω1; Nt(ω) = 0, if ω ∈ Ω1.
By Lemmas 1.4.1 and 1.4.2, the process N = (Nt; t ≥ 0) is a right-continuous modification of M that is also a submartingale.
Thus far, we have shown that if M is a smartingale with respect to the larger filtration F∗, then it has a right-continuous modification. Since F∗ is larger than F, it may appear that the class of right-continuous F∗-smartingales is substantially more restrictive than the collection of right-continuous F-smartingales. The following shows that this is not so.
Lemma 1.4.3 If M is a right-continuous submartingale with respect to a filtration F, then M is a right-continuous submartingale with respect to F∗.
Exercise 1.4.2 Prove Lemma 1.4.3.
From now on, we say that the filtration F satisfies the usual conditions if for all t ≥ 0, Ft = F∗t. Theorem 1.4.1 states that if M is a smartingale with
respect to a filtration that satisfies the usual conditions, and if t → E[Mt] is right-continuous, then M has a right-continuous modification that is itself a smartingale. Note that when M is a martingale, t → E[Mt] is automatically constant, and right-continuous modifications exist automatically.
1.5 Measurability of Entrance Times
In this subsection we state the following deep theorem, due, in various degrees of generality, to G. A. Hunt and C. Dellacherie. Its proof relies on several results from measure theory that would take too long to develop. Since we have only one use for this theory (namely, the following result), we omit a proof. However, a self-contained derivation can be found in Bass (1995, Theorem 2.8, Ch. II), and in Dellacherie and Meyer (1978, Theorem 50, Ch. IV).
Theorem 1.5.1 Suppose F is a filtration that satisfies the usual conditions, and X is a right-continuous, adapted, S-valued stochastic process, where S is a complete, separable metric space. Then, for all Borel sets E ⊂ S, TE = inf(s ≥ 0 : Xs ∈ E) is a stopping time.
When X is continuous and E is closed, we do not need F to satisfy the usual conditions, and S need not be complete and separable; cf. Theorem 1.2.1.
1.6 The Optional Stopping Theorem
We are in a position to state and prove our first main result for continuous-time martingales. It is the natural continuous-time extension of the discrete-time result of Section 1.2, Chapter 1, and will be called the optional stopping theorem.
Theorem 1.6.1 (The Optional Stopping Theorem) Suppose M is a right-continuous submartingale with respect to a given filtration F. Whenever T1 and T2 are a.s. bounded F-stopping times with T1 ≤ T2, a.s., then MT1 ≤ E[MT2 | FT1], a.s.
Proof Since T1 and T2 are almost surely bounded, there exists a nonrandom integer K > 0 such that with probability one, T1, T2 ≤ K. Let T1^n and T2^n be the stopping times of Lemma 1.1.1, so that with probability one, for all n ≥ 1, T1^n, T2^n ∈ {k2^{-n}: 1 ≤ k ≤ K2^n}, and as n → ∞, T1^n ↓ T1 and T2^n ↓ T2. From their construction, it is clear that for all n ≥ 1, T1^n ≤ T2^n; cf. Lemma 1.1.1. Note that (1) (M_{k2^{-n}}; 1 ≤ k ≤ K2^n) is a discrete-parameter submartingale with respect to the filtration Fn = (F_{k2^{-n}}; 1 ≤ k ≤ K2^n); and (2) thanks to their particular construction, T1^n and T2^n are both Fn-stopping times. The optional stopping theorem (Theorem 1.6.1, Chapter 1)
implies that with probability one, M_{T1^n} ≤ E[M_{T2^n} | F_{T1^n}]. Since T2^n decreases as n increases, for all integers n ≥ m ≥ 1,

M_{T1^n} ≤ E[M_{T2^m} | F_{T1^n}], a.s.

Let n → ∞ and use Lemma 1.1.1, together with right continuity of M, to see that M_{T1} ≤ lim sup_{n→∞} E[M_{T2^m} | F_{T1^n}], a.s. On the other hand, since T1^n is decreasing in n and since T2^m is bounded above by K + 2^{-m}, (E[M_{T2^m} | F_{T1^n}]; n ≥ 1) is a uniformly integrable, reversed submartingale. In particular, it converges a.s. and in L1(P); see Supplementary Exercise 4, Chapter 1. That is,

M_{T1} ≤ E[M_{T2^m} | ∩_{n=1}^∞ F_{T1^n}].
By Theorem 1.1.1, M_{T1} is F_{T1}-measurable. Since F_{T1} ⊂ ∩_{n=1}^∞ F_{T1^n}, we can take conditional expectations of the above inequality, given F_{T1}, to obtain

M_{T1} ≤ E[M_{T2^m} | F_{T1}], a.s. (1)
On the other hand, (M_{T2^m}; m ≥ 1) is a uniformly integrable, reversed submartingale. Therefore, as m → ∞, M_{T2^m} converges a.s. and in L1(P). Since T2^m ↓ T2 and M is right-continuous, lim_{m→∞} M_{T2^m} = M_{T2}, a.s. and in L1(P). In particular, equation (1) implies the theorem.
Part of the assertion of the previous theorem is that E{|MT|} < +∞ for all bounded stopping times T. This is implicit in the proof of Theorem 1.6.1. Next is our first application of the optional stopping theorem.
Corollary 1.6.1 Suppose T1 and T2 are a.s. finite stopping times. Then, for all Y ∈ L1(P),

E[E[Y | F_{T1}] | F_{T2}] = E[Y | F_{T1∧T2}], a.s.

Exercise 1.6.1 Verify Corollary 1.6.1.
This is an example of commutation of filtrations in continuous time. We will return to this concept in greater depth later on. For the time being, we will be satisfied with the following reformulation.
Corollary 1.6.2 If M is a right-continuous submartingale with respect to a filtration F, then for any stopping time T, (M_{T∧t}; t ≥ 0) is a right-continuous submartingale with respect to F.
Proof We need the following two facts, which are proved in Exercise 1.6.2 below.
(a) for any integrable random variable Y and for all s ≥ 0,

E[Y | Fs]1l(T>s) = E[Y | F_{T∧s}]1l(T>s), a.s.; and

(b) MT 1l(T≤s) is Fs-measurable.
Since (T > s), (T ≤ s) ∈ Fs, for all s ≥ 0, we conclude that for all t ≥ s ≥ 0, almost surely,

E[M_{T∧t} | Fs] = E[M_{T∧t} 1l(T≤s) | Fs] + E[M_{T∧t} 1l(T>s) | Fs]
= E[MT 1l(T≤s) | Fs] + E[M_{T∧t} | Fs]1l(T>s)
= MT 1l(T≤s) + 1l(T>s) E[M_{T∧t} | F_{T∧s}] (by (a) and (b))
= M_{T∧s} 1l(T≤s) + 1l(T>s) E[M_{T∧t} | F_{T∧s}]
≥ M_{T∧s}.

We have used the optional stopping theorem in the last line; cf. Theorem 1.6.1.
Exercise 1.6.2 Prove claims (a) and (b) of the proof of Corollary 1.6.2.
1.7 Brownian Motion
Recall from Chapter 5 that an R-valued process B = (Bt; t ≥ 0) is (standard) Brownian motion if it is a continuous Gaussian process with mean 0 and covariance function given by E[BsBt] = s ∧ t (s, t ≥ 0). Note that for all t ≥ s ≥ r ≥ 0, E[(Bt − Bs)Br] = 0. By Corollary 1.1.1, Chapter 5, for all 0 ≤ r1, . . . , rk ≤ s, Bt − Bs is independent of (Br1, . . . , Brk). Hence, Bt − Bs is independent of Hs, which is defined as the σ-field generated by (Br; 0 ≤ r ≤ s). In fact, one can do a little better. By relabeling the indices, we see that for all t ≥ s and all ε > η > 0, Bt+ε − Bs+ε is independent of Hs+η. Since the choice of η < ε is arbitrary, Bt+ε − Bs+ε is independent of Hs+ (recall the notation from Section 1.4). Equivalently, for all A ∈ Hs+, and for all bounded, continuous functions f : R → R,

E[f(Bt+ε − Bs+ε)1lA] = P(A) · E[f(Bt+ε − Bs+ε)]. (1)

We can replace A by A ∪ Λ with no change, where A ∈ Hs+ and Λ is a P-null set. Hence, we see that Bt+ε − Bs+ε is independent of Fs, where F = (Ft; t ≥ 0) denotes the smallest filtration that (1) contains H; and (2) satisfies the usual conditions. According to the notation of Section 1.4, Ft = H∗t, for all t ≥ 0, and we refer to F as the history of B. By the a.s. (right) continuity of B, we can let ε → 0 to see that (1) holds for all t > s ≥ 0, all A ∈ Fs, and with ε = 0. This can be stated in other words as follows.
Theorem 1.7.1 (Stationary, Independent Increments Property) If B denotes a Brownian motion with history F, then for all t ≥ s ≥ 0, Bt − Bs is independent of Fs, and Bt − Bs ∼ N1(0, t − s).
As a corollary, we obtain the following.
Corollary 1.7.1 If B denotes a Brownian motion, the following are martingales with respect to the history F of B: (i) t → Bt; (ii) t → Bt² − t; and (iii) t → exp(αBt − ½α²t), where α ∈ R is fixed.
Exercise 1.7.1 Prove Corollary 1.7.1.
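Each of these three martingales has constant expectation (0, 0, and 1, respectively), which is easy to check by simulation at a single time point (an added numerical sketch, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(3)

# At a fixed time t, each martingale of Corollary 1.7.1 has the same mean
# as at time 0: namely 0, 0, and 1, respectively.
t, alpha, n = 2.0, 0.5, 200_000
Bt = rng.normal(0.0, np.sqrt(t), size=n)    # B_t ~ N(0, t)

m1 = np.mean(Bt)                                          # ~ 0
m2 = np.mean(Bt ** 2 - t)                                 # ~ 0
m3 = np.mean(np.exp(alpha * Bt - 0.5 * alpha ** 2 * t))   # ~ 1
print(m1, m2, m3)
```

Constant expectation is of course only a necessary condition for the martingale property; the full statement requires the conditional versions in Corollary 1.7.1.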
Next, we mention a second corollary of Theorem 1.7.1.
Corollary 1.7.2 With probability one,

lim sup_{n→∞} Bn = +∞ and lim inf_{n→∞} Bn = −∞.
Exercise 1.7.2 Prove Corollary 1.7.2.
Once we have identified enough martingales, we can use the optional stopping theorem to make certain computations possible. To produce a class of interesting examples, let Ta,b = T{a,b} = inf(s ≥ 0 : Bs ∈ {a, b}) be the entrance time of {a, b}, where a < 0 < b. Since t → Bt is continuous, Corollary 1.7.2 shows that for all a < 0 < b, P(Ta,b < ∞) = 1. Furthermore, Ta,b is an F-stopping time (Theorem 1.2.1), and (B_{Ta,b∧t}; t ≥ 0) is a continuous martingale with respect to F (Corollary 1.6.2). In fact, since sup_{t≥0} |B_{Ta,b∧t}| ≤ |a| ∨ |b|, t → B_{Ta,b∧t} is a bounded continuous martingale. By the bounded convergence theorem, E[B_{Ta,b}] = 0. Since B_{Ta,b} ∈ {a, b}, almost surely, the latter expectation can be written as

0 = aP(B_{Ta,b} = a) + b(1 − P(B_{Ta,b} = a)).

Upon solving the above algebraic equation, we obtain the following:
b , b−a
P(BTa,b = b) =
−a . b−a
Corollary 1.7.3 is the solution to the gambler’s ruin problem: Suppose the gambler’s fortune at time t is Bt , where negative fortune means loss. If b is the house limit and −a is all that the gambler owns, P(BTa,b = a) is
the probability of ruin for the gambler and P(B_{Ta,b} = b) is the probability of ruin for the house. Compare to Supplementary Exercise 2.
We conclude this subsection with a final computation involving Brownian motion. Let Ta = inf(s ≥ 0 : Bs = a), and apply Theorem 1.1.1 and Corollary 1.7.2 to deduce that Ta is an almost surely finite stopping time for all a ∈ R. In this language, Corollary 1.7.3 shows that for all a < 0 < b, P(Ta < Tb) = b/(b − a). Arguing as in Corollary 1.7.3, we can apply Corollary 1.7.1(iii) with α = √(2λ) to compute the Laplace transform of Ta.

Corollary 1.7.4 For all a ∈ R and all λ > 0, E[e^{−λTa}] = e^{−|a|√(2λ)}.

Exercise 1.7.3 Prove Corollary 1.7.4. In fact, it is possible to check directly that the probability density of Ta is

P(Ta ∈ dt) = (|a| e^{−a²/(2t)} / √(2π t³)) dt,

where t > 0, and a ∈ R.
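The gambler's ruin probabilities of Corollary 1.7.3 are easy to check by simulation (an added sketch, assuming NumPy; the discretized path is a crude Euler scheme of our own, not the book's construction, and it slightly overshoots the boundaries):

```python
import numpy as np

rng = np.random.default_rng(4)

def exits_at_a(a, b, dt=1e-2):
    """Run one discretized Brownian path started at 0 until it leaves ]a, b[;
    report whether it exits at the lower boundary a."""
    x = 0.0
    while a < x < b:
        x += rng.normal(0.0, np.sqrt(dt))  # increment ~ N(0, dt)
    return x <= a

a, b, n = -1.0, 2.0, 2000
freq = sum(exits_at_a(a, b) for _ in range(n)) / n
print(freq)  # Corollary 1.7.3 predicts P(B_{T_{a,b}} = a) = b/(b - a) = 2/3
```

Shrinking dt reduces the overshoot bias at the cost of longer runs; the Monte Carlo error itself decays like n^{-1/2}.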
1.8 Poisson Processes
A real-valued stochastic process (Xt; t ≥ 0) is a (time-homogeneous) Poisson process with rate λ > 0 if:
(i) for each s, t ≥ 0, Xt+s − Xt is independent of Ft = H∗t, where Ht denotes the σ-field generated by (Xr; 0 ≤ r ≤ t) (cf. Section 1.4); and
(ii) for all s, t ≥ 0, Xt+s − Xt has a Poisson distribution with mean λs; i.e.,

P(Xt+s − Xt = k) = (1/k!) e^{−λs}(λs)^k, for all k = 0, 1, . . . .

We will always, and implicitly, use a separable modification of X, whose existence is guaranteed by Theorem 2.2.1 of Chapter 5. In words, a Poisson process with rate λ is a stochastic process with stationary, independent increments that are themselves Poisson random variables. In light of Theorem 1.7.1, Poisson processes are closely related to Brownian motion. However, there are also obvious differences. For example, by (ii) above, t → Xt is almost surely increasing. Let τ0 = 0 and for all k ≥ 1, define

τk = inf(s > τk−1 : Xs − Xτk−1 ≥ 1), k ≥ 1.
Exercise 1.8.1 Prove that the τi's are F-stopping times.
Since X is almost surely increasing, the τi's are the times at which X increases. The following is an important first step in the analysis of Poisson processes.
Lemma 1.8.1 The random variables (τk − τk−1; k ≥ 1) are i.i.d. exponential random variables with mean λ^{-1} each.
Proof Since t → Xt is a.s. increasing, for all t ≥ 0, P(τ1 > t) = P(Xt = 0) = e^{−λt}. That is, τ1 has an exponential distribution with mean λ^{-1}. We now proceed with induction. Supposing the result is true for some k ≥ 1, it suffices to show that it is true for k + 1. Since τk has an absolutely continuous distribution, for all Borel sets A ⊂ R,

P(τk+1 − τk > t, τk ∈ A) = ∫_A P(τk+1 − τk > t | τk = s) P(τk ∈ ds). (1)

On the other hand, on (τk = s), τk+1 − τk > t if and only if Xt+s − Xs = 0. By (i) of the definition of Poisson processes, this latter event is independent of Fs and hence of (τk = s), since τk is a stopping time. Thus, by (ii),

P(τk+1 − τk > t | τk = s) = P(Xt+s − Xs = 0) = e^{−λt}.

The result follows from (1).
Now consider a right-continuous modification of X that we continue to write as X. This process is itself a Poisson process with rate λ. The following exercise shows that, once it is suitably ‘compensated’, X becomes a martingale. Exercise 1.8.2 Prove that the stochastic process (Xt − λt; t ≥ 0) is a right-continuous martingale. Since X0 = 0, by the optional stopping theorem (Theorem 1.6.1), for all n ≥ 1, E[Xτ1 ∧n − λ(τ1 ∧ n)] = 0. On the other hand, |Xτ1 ∧n | ≤ 1, almost surely. Thus, by the monotone and the dominated convergence theorems, E[Xτ1 ] = λE[τ1 ], which equals 1, thanks to Lemma 1.8.1 above. Since Xτ1 ≥ 1 (right continuity), this shows that Xτ1 = 1, almost surely; cf. Supplementary Exercise 3 for details. In summary, we have shown that with probability one, at the first jump time, the process always jumps to 1. This can be generalized as follows. Lemma 1.8.2 With probability one, for all n ≥ 1, Xτn = n.
Exercise 1.8.3 Check that the above extension is valid.
Now we can combine Lemmas 1.8.1 and 1.8.2 to see that Poisson processes of rate λ > 0 exist. Moreover, they have a simple construction:
(a) Since X0 ≥ 0 and E[X0] = 0, X0 = 0; i.e., the process starts at 0.
(b) For all 0 ≤ s < τ1, Xs = 0, where τ1 has an exponential distribution with mean λ^{-1}.
(c) For all τ1 ≤ s < τ2, Xs = 1, where τ2 − τ1 is independent of τ1 and has an exponential distribution with mean λ^{-1}. More generally, for all τk ≤ s < τk+1, Xs = k and τk+1 − τk is an exponentially distributed random variable with mean λ^{-1} that is independent of τ1, . . . , τk.
In other words, we can always construct a right-continuous Poisson process with rate λ > 0 as follows.
Proposition 1.8.1 Let ξ1, ξ2, . . . be independent exponential random variables with mean λ^{-1}. If γn = Σ_{i=1}^n ξi (n ≥ 1), the process Y = (Yt; t ≥ 0) is a Poisson process with rate λ, where

Yt = Σ_{n=1}^∞ 1l(γn ≤ t), t ≥ 0.
Exercise 1.8.4 Suppose U1, U2, . . . are i.i.d. random variables, all uniformly picked from the interval [0, 1]. Consider the empirical distribution function Fn, described by

Fn(t) = Σ_{j=1}^n 1l(Uj ≤ t), t ∈ [0, 1], n ≥ 1.

This is a random distribution function on [0, 1] for every n ≥ 1. Let Nn be an independent Poisson random variable with mean n, and define

Yn(t) = F_{Nn}(t), t ∈ [0, 1], n ≥ 1.
(i) Check that conditional on (Nn = n), Yn has the same finite-dimensional distributions as Fn . (ii) Prove that Yn is a Poisson process (indexed by [0, 1]) of rate n. This is from Kac (1949).
2 Multiparameter Martingales
In the previous section we discussed the general construction and regularity theory of one-parameter martingales in continuous time. We are now in a position to construct multiparameter martingales in continuous time. We do this by first introducing some concepts whose discrete-time counterparts appeared in Chapter 1.
2.1 Filtrations and Commutation

Recall that (Ω, G, P) is our underlying probability space. A collection F = (F_t; t ∈ R_+^N) is said to be an (N-parameter) filtration if F is a collection of sub-σ-fields of G with the property that whenever s ≼ t, both in R_+^N, then F_s ⊂ F_t. To each such N-parameter filtration F we ascribe N one-parameter filtrations (in the sense of Section 1.1) F^1, ..., F^N, defined by the following: For every i ∈ {1, ..., N},

F_r^i = ⋁_{t ∈ R_+^N: t^(i) = r} F_t,  r ≥ 0.

Exercise 2.1.1 Prove that

F_r^i = ⋁_{t ∈ R_+^N: t^(i) ≤ r} F_t,  r ≥ 0,

for every 1 ≤ i ≤ N.

The filtrations F^1, ..., F^N are called the marginal filtrations of F. We say that the N-parameter filtration F is commuting if for all s, t ∈ R_+^N and all bounded F_t-measurable random variables Y,

E[Y | F_s] = E[Y | F_{s∧t}],  a.s.,

where s ∧ t denotes the coordinatewise minimum of s and t.
The following characterizes two of the fundamental properties of commuting filtrations.

Theorem 2.1.1 Suppose F = (F_t; t ∈ R_+^N) is a commuting filtration. Then:
(i) for all bounded random variables Z, and for all t ∈ R_+^N,

E[Z | F_t] = E[ ⋯ E[ E[Z | F^1_{t^(1)}] | F^2_{t^(2)}] ⋯ | F^N_{t^(N)}],  a.s.; and

(ii) the commutation property of F is equivalent to the following: For all s, t ∈ R_+^N, F_s and F_t are conditionally independent, given F_{s∧t}.

Exercise 2.1.2 Prove Theorem 2.1.1. (Hint: Consult Theorems 3.4.1 and 3.6.1 of Chapter 1.)
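On a finite probability space the commuting property can be verified by brute force. The following toy example (my own construction, not from the text) takes N = 2, two independent fair coins, and lets F_{(t1,t2)} be generated by the coins "revealed" by time t: coin i is visible when t_i ≥ 1. Conditional expectations are atom-wise averages, and E[Y | F_s] = E[Y | F_{s∧t}] is checked for a bounded F_t-measurable Y and every pair s, t on a small grid.

```python
import itertools

# Sample space: two independent fair coins; N = 2 parameters.
outcomes = list(itertools.product([0, 1], repeat=2))

def atom_key(w, t):
    # F_t reveals coin i exactly when t_i >= 1.
    return tuple(w[i] if t[i] >= 1 else None for i in range(2))

def cond_exp(Y, t):
    # E[Y | F_t] as a function on outcomes: average Y over each atom of F_t.
    out = {}
    for w in outcomes:
        atom = [v for v in outcomes if atom_key(v, t) == atom_key(w, t)]
        out[w] = sum(Y[v] for v in atom) / len(atom)
    return out

grid = list(itertools.product([0, 1], repeat=2))
checks, max_diff = 0, 0.0
for s in grid:
    for t in grid:
        st = tuple(min(a, b) for a, b in zip(s, t))
        # A bounded F_t-measurable Y: a function of the coins visible at t.
        Y = {w: sum((w[i] + 1) * (i + 2) for i in range(2) if t[i] >= 1)
             for w in outcomes}
        lhs, rhs = cond_exp(Y, s), cond_exp(Y, st)
        max_diff = max(max_diff, max(abs(lhs[w] - rhs[w]) for w in outcomes))
        checks += 1
print(checks, max_diff)
```

The identity holds here because the filtration is built from independent coordinate-wise information; filtrations without such product structure need not commute (cf. Exercise 2.2.2 below).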
2.2 Martingales and Histories

Henceforth, we will always make the following assumption.

Assumption The underlying probability space (Ω, G, P) is complete.

Recall that one can always complete the probability space at no cost. The advantage is a gain in regularity for stochastic processes; cf. Theorem 2.2.1 of Chapter 5. Suppose F = (F_t; t ∈ R_+^N) is a filtration of sub-σ-fields of G. By completing them, we can, and will, assume that F is complete in the sense that F_t is a complete σ-field for each t ∈ R_+^N.

A real-valued stochastic process M = (M_t; t ∈ R_+^N) is a submartingale (with respect to F) if:
(i) M is adapted to F. That is, for all t ∈ R_+^N, M_t is F_t-measurable.
(ii) M is integrable. That is, for all t ∈ R_+^N, M_t ∈ L^1(P).
(iii) For all s ≼ t, both in R_+^N, E[M_t | F_s] ≥ M_s, a.s.

A stochastic process M is a supermartingale if −M is a submartingale. It is a martingale if it is both a sub- and a supermartingale.

Many of the properties of discrete-time multiparameter super- or submartingales carry through with few changes. For example, if M is a nonnegative submartingale and Ψ : R_+ → R_+ is convex and nondecreasing, then (Ψ(M_t); t ∈ R_+^N) is a nonnegative submartingale, as long as Ψ(M_t) ∈ L^1(P) for all t ∈ R_+^N.

What about maximal inequalities? As we have already seen in Chapter 5, sup_{s≼t} M_s need not even be a random variable. In order to circumvent this difficulty, we can use Theorem 2.2.1 of Chapter 5 to construct a separable modification M̃ = (M̃_t; t ∈ R_+^N) of M. The following shows that there is no harm in doing this.

Lemma 2.2.1 If M is a submartingale (supermartingale), so is M̃.

Exercise 2.2.1 Prove Lemma 2.2.1.
From now on, we will always choose and work with such a separable modification. Let us conclude this subsection with a brief discussion of histories. In complete analogy to the discrete-time theory, we say that H = (H_t; t ∈ R_+^N) is the history of M = (M_t; t ∈ R_+^N) if for all t ∈ R_+^N, H_t is the σ-field generated by (M_r; r ≼ t). It is possible to show that whenever M is a supermartingale (respectively submartingale) with respect to a filtration F, it is also a supermartingale (respectively submartingale) with respect to its history. As a result, when we say that M is a super- or a submartingale
with no reference to the corresponding filtration, we are safe in assuming that the underlying filtration is the history of M . Exercise 2.2.2 Verify the above claims. That is, suppose M is a supermartingale with respect to a filtration F. Show that M is a supermartingale with respect to its history. Construct an example where this history is not commuting.
2.3 Cairoli's Maximal Inequalities

We are ready to state and prove Cairoli's maximal inequalities for super- and submartingales that are indexed by R_+^N.

Theorem 2.3.1 (Cairoli's Weak L ln^{N−1} L-Inequality) If M is a separable, nonnegative N-parameter submartingale with respect to a commuting filtration, then for any t ∈ R_+^N and for all λ > 0,

P{ sup_{s≺t} M_s ≥ λ } ≤ (1/λ) (e/(e−1))^{N−1} [ (N−1) + E( M_t (ln^+ M_t)^{N−1} ) ].

Theorem 2.3.2 (Cairoli's Strong (p, p) Inequality) If M = (M_t; t ∈ R_+^N) is a separable, nonnegative submartingale with respect to a commuting filtration, then for any t ∈ R_+^N and all p > 1,

E[ sup_{s≺t} M_s^p ] ≤ (p/(p−1))^{Np} E[M_t^p].

We will prove Theorem 2.3.1; Theorem 2.3.2 is proved similarly.

Proof of Theorem 2.3.1 To begin, note that by Lemma 2.2.1 of Chapter 4, sup_{s≺t} M_s is a random variable. By separability and countable additivity,

P{ sup_{s≺t} M_s ≥ λ } = sup_{F(t)} P{ max_{s∈F(t)} M_s ≥ λ },

where sup_{F(t)} denotes the supremum over all finite sets F(t) ⊂ [0, t[. On the other hand, since (M_s; s ∈ F(t)) is a discrete-parameter submartingale, by Cairoli's weak maximal inequality (Theorem 2.5.1, Chapter 1),

P{ sup_{s≺t} M_s ≥ λ } ≤ sup_{F(t)} max_{s∈F(t)} (1/λ) (e/(e−1))^{N−1} [ (N−1) + E( M_s (ln^+ M_s)^{N−1} ) ]
 = (1/λ) (e/(e−1))^{N−1} [ (N−1) + E( M_t (ln^+ M_t)^{N−1} ) ].

We have applied Jensen's inequality to obtain the last line. This completes our proof.
Exercise 2.3.1 Prove Theorem 2.3.2.
It is not hard to extend the above to include estimates for sup_{s≼t} M_s (as opposed to sup_{s≺t} M_s). We will supply the following analogue, which is reminiscent of some of the results of Section 1.

Corollary 2.3.1 Suppose M is a separable, nonnegative submartingale with respect to a commuting filtration.
1. If t → E[M_t (ln^+ M_t)^{N−1}] is right-continuous, then for any t ∈ R_+^N and for all λ > 0,

P{ sup_{s≼t} M_s ≥ λ } ≤ (1/λ) (e/(e−1))^{N−1} [ (N−1) + E( M_t (ln^+ M_t)^{N−1} ) ].

2. If p > 1 and t → E[M_t^p] is right-continuous, then for all t ∈ R_+^N,

E[ sup_{s≼t} M_s^p ] ≤ (p/(p−1))^{Np} E[M_t^p].

We have used the following definition implicitly: A function f : R_+^N → R is right-continuous (with respect to the partial order ≼) if for all t ∈ R_+^N,

lim_{s≽t: s→t} f(s) = f(t).
Exercise 2.3.2 Prove Corollary 2.3.1.
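Cairoli's strong (p, p) inequality can be observed numerically. The sketch below (my own illustration; grid size, repetitions, seed, and tolerances are arbitrary) simulates a two-parameter Brownian sheet on [0,1]², takes the nonnegative submartingale M = |B|, and checks that the empirical E[sup_{s≺t} M_s²] lies below the bound (p/(p−1))^{Np} E[M_t²] = 16 E[B_t²] for p = N = 2 and t = (1,1).

```python
import random

rng = random.Random(99)

def sheet_stats(m, rng):
    """One Brownian sheet on an m x m grid over [0,1]^2 built from 2-D partial
    sums of i.i.d. N(0, 1/m^2) cell increments; returns (sup B^2, B_(1,1)^2)."""
    sd = 1.0 / m
    B = [0.0] * (m + 1)   # running row of sheet values B(i/m, j/m)
    sup_sq = 0.0
    for _ in range(m):
        run = 0.0
        for j in range(1, m + 1):
            run += rng.gauss(0.0, sd)
            B[j] += run
            sup_sq = max(sup_sq, B[j] * B[j])
    return sup_sq, B[m] * B[m]

reps, m, p, N = 300, 25, 2.0, 2
sup_mean = corner_mean = 0.0
for _ in range(reps):
    s, c = sheet_stats(m, rng)
    sup_mean += s / reps
    corner_mean += c / reps

bound = (p / (p - 1)) ** (N * p) * corner_mean   # (p/(p-1))^{Np} E[M_t^p]
print(sup_mean, bound)
```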
2.4 Another Look at the Brownian Sheet

Continuous N-parameter martingales abound. Indeed, consider a real-valued N-parameter Brownian sheet B. It is possible, then, to produce a multiparameter analogue of Corollary 1.7.1.

Lemma 2.4.1 The following are N-parameter martingales:
(a) t → B_t;
(b) t → B_t² − ∏_{ℓ=1}^N t^(ℓ); and
(c) t → exp(αB_t − ½α² ∏_{ℓ=1}^N t^(ℓ)), where α ∈ R is fixed.
Exercise 2.4.1 Verify Lemma 2.4.1.
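Since B_t is centered Gaussian with variance ∏_ℓ t^(ℓ), each martingale in Lemma 2.4.1 can be spot-checked at a single time point by Monte Carlo (a sketch; the time point, α, seed, sample size, and tolerances are arbitrary): E[B_t] = 0, E[B_t² − t^(1)t^(2)] = 0, and E[exp(αB_t − ½α² t^(1)t^(2))] = 1.

```python
import math
import random

rng = random.Random(11)
t1, t2 = 1.5, 0.8          # a fixed time point t = (t1, t2), N = 2
var = t1 * t2              # Var(B_t) for the Brownian sheet
alpha = 0.7
reps = 200000

s0 = s1 = s2 = 0.0
for _ in range(reps):
    b = rng.gauss(0.0, math.sqrt(var))      # B_t ~ N(0, t1 * t2)
    s0 += b
    s1 += b * b - var                        # martingale (b) at time t
    s2 += math.exp(alpha * b - 0.5 * alpha * alpha * var)  # martingale (c)

m0, m1, m2 = s0 / reps, s1 / reps, s2 / reps
# Expected values: 0, 0, and 1, respectively.
print(m0, m1, m2)
```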
In order to use the above (and other) martingales effectively, we need to have access to maximal inequalities. In light of Cairoli's inequalities (Theorems 2.3.1 and 2.3.2 and Corollary 2.3.1), the following important result of R. Cairoli and J. B. Walsh shows that all continuous smartingales with respect to the Brownian sheet filtration satisfy maximal inequalities; see Walsh (1986b).

Theorem 2.4.1 (The Cairoli–Walsh Commutation Theorem) The Brownian sheet's history is a commuting filtration.

Proof Let F denote the history of the Brownian sheet. We are to show that for all s, t ∈ R_+^N, and for all bounded, F_t-measurable random variables Y, E[Y | F_s] = E[Y | F_{s∧t}], almost surely. Equivalently, it suffices to show that for all bounded, continuous functions f_1, ..., f_m : R → R and all r_1, ..., r_m ≼ t (all in R_+^N),

E[ ∏_{j=1}^m f_j(B_{r_j}) | F_s ] = E[ ∏_{j=1}^m f_j(B_{r_j}) | F_{s∧t} ],  a.s.  (1)
(Why?) For each integer n ≥ 1, we define a filtration F^n = (F^n_t; t ∈ R_+^N) as follows: Given t ∈ R_+^N, define F^n_t as the σ-field generated by the finite collection (B_s; 2^n s ∈ Z_+^N, s ≼ t). Note that F^n_t ⊂ F^{n+1}_t for all t ∈ R_+^N. Suppose that n > k ≥ 1 are integers, and that r_1, ..., r_m ≼ t satisfy 2^k r_j ∈ Z_+^N for all 1 ≤ j ≤ m. It is sufficient to prove that

E[ ∏_{j=1}^m f_j(B_{r_j}) | F^n_s ] = E[ ∏_{j=1}^m f_j(B_{r_j}) | F^n_{s∧t} ],  a.s.  (2)
Once this is established, we can let n → ∞ and use Doob's martingale convergence theorem (Theorem 1.7.1, Chapter 1) to see that

E[ ∏_{j=1}^m f_j(B_{r_j}) | ⋁_{n=1}^∞ F^n_s ] = E[ ∏_{j=1}^m f_j(B_{r_j}) | ⋁_{n=1}^∞ F^n_{s∧t} ],  a.s.

By the a.s. continuity of t → B_t, F_t and F_{s∧t} are the completions of ⋁_n F^n_t and ⋁_n F^n_{s∧t}, respectively. Thus, equation (2) would imply that for all bounded, continuous f_1, ..., f_m : R → R and all r_1, ..., r_m ≼ t such that 2^k r_j ∈ Z_+^N,

E[ ∏_{j=1}^m f_j(B_{r_j}) | F_s ] = E[ ∏_{j=1}^m f_j(B_{r_j}) | F_{s∧t} ],  a.s.

By the continuity of t → B_t and the dominated convergence theorem, the above holds simultaneously for every r_1, ..., r_m ≼ t, all in R_+^N. That is,
we have argued that equation (2) implies equation (1). Thus, in order to conclude our proof, we are left to verify equation (2). On the other hand, (2) is equivalent to the following: For all s, t such that 2^n t, 2^n s ∈ Z_+^N and for all bounded, F^n_t-measurable random variables Y,

E[Y | F^n_s] = E[Y | F^n_{s∧t}],  a.s.  (3)

Thus, we are to show that (F^n_t; 2^n t ∈ Z_+^N) is a commuting filtration for each n ≥ 1. Let D_n denote the collection of all dyadic cubes of side 2^{−n}. That is, A ∈ D_n if and only if there exists r ∈ Z_+^N such that

A = ∏_{j=1}^N [r^(j) 2^{−n}, (r^(j) + 1) 2^{−n}].

Note that for all t with 2^n t ∈ Z_+^N, 1l_{[0,t]} = Σ_{A∈D_n: A⊂[0,t]} 1l_A. By its very definition, B_t = W(1l_{[0,t]}), a.s., where W denotes the isonormal process on R_+^N; cf. Section 1.4 of Chapter 5. Therefore, for all t with 2^n t ∈ Z_+^N,

B_t = Σ_{A∈D_n: A⊂[0,t]} W(1l_A),  (4)

almost surely. This follows from Theorem 1.3.1 and the fact that whenever A ∈ D_n, W(1l_{∂A}) = 0, almost surely. On the other hand, {A ∈ D_n: A ⊂ [0, t]} is a collection of cubes whose interiors are disjoint. This disjointness implies that for any two distinct A_1, A_2 ∈ D_n, we always have E[W(1l_{A_1})W(1l_{A_2})] = 0. Consequently, Corollary 1.1.1 of Chapter 5 shows us that (W(1l_A): A ∈ D_n, A ⊂ [0, t]) is an independent collection of random variables. Let m(A) denote the unique largest element of A (with respect to the partial order ≼). We can rewrite equation (4) as follows: For all A ∈ D_n,

B_{m(A)} = Σ_{A'∈D_n: m(A') ≼ m(A)} W(1l_{A'}),

almost surely, which means that B_m = (B_{m(A)}; A ∈ D_n) is a multiparameter random walk. By the inclusion–exclusion lemma (Lemma 1.2.1, Chapter 4), (F^n_{m(A)}; A ∈ D_n) is the history of the multiparameter random walk B_m, and Proposition 1.2.1 of Chapter 4 implies the commutation of F^n. This, in turn, proves equation (3) and completes our proof of Theorem 2.4.1.

The following variant is an extremely important exercise.

Exercise 2.4.2 Let F denote the history of the Brownian sheet B and define

F_{t+} = ⋂_{s≻t} F_s,  t ∈ R_+^N.
Prove that F^+ = (F_{t+}; t ∈ R_+^N) is an N-parameter filtration. Is it commuting? This filtration arises in J. B. Walsh's theory of strong martingales; cf. Dozzi (1989, 1991), Imkeller (1988), and Walsh (1979, 1986b). The filtration F that appears in Theorem 2.4.1 is the complete augmented history of B.
3 One-Parameter Stochastic Integration

Stochastic integral processes are at the heart of the theory of (local) martingales, as well as its multiparameter extensions. Roughly speaking, a stochastic integral process is a process of the form t → ∫_0^t X_s dM_s, where M—the integrator—is a continuous, one-parameter local martingale and X—the integrand—is a stochastic process. We shall soon see that M typically has unbounded variation. Thus, any reasonable definition of ∫X dM will not be a classical integral and is, rather, a genuine "stochastic integral." Moreover, when X is nonrandom and M is Brownian motion (a very nice continuous martingale, indeed), we would expect the definition of ∫X dM to agree with that of the stochastic integral of X against white noise, as defined in Chapter 5. That is, when X is a nonrandom function and M is Brownian motion, we should expect ∫X dM to equal M(X), where M is the isonormal process on R; cf. Section 1.4 of Chapter 5.

Our construction of stochastic integrals tries to mimic that of the isonormal process. While doing this, we are forced to address a number of fundamental issues. First and foremost, we need to understand the class of processes X for which ∫X dM can be defined. Once this class is identified, our construction is not too different from that of classical integrals in spirit, and can be made in a few steps. This section goes through just such a program.¹ We begin by showing that, viewed as random functions, continuous martingales have rougher paths than many "nice" functions.
3.1 Unbounded Variation The following shows that aside from trivial cases, continuous martingales cannot be too smooth, where here, smoothness is gauged by having bounded variation.
¹This is only the tip of an iceberg; see (Bass 1995; Chung and Williams 1990; Dellacherie and Meyer 1982; Karatzas and Shreve 1991; Revuz and Yor 1994; Rogers and Williams 1987) for aspects of the general theory of stochastic integration. You should also study (Bass 1998; Dellacherie and Meyer 1988; Fukushima, Ōshima, and Takeda 1994; Hunt 1966; Sharpe 1988) for applications and connections to Markov processes.
Theorem 3.1.1 If M = (M_t; t ≥ 0) is an almost surely continuous martingale of bounded variation, then P(M_t = M_0 for all t ≥ 0) = 1.

Continuity is an indispensable condition for this unbounded variation property to hold. For instance, see the martingale of Exercise 1.8.2.

Proof. There exists a measurable set Λ ⊂ Ω such that P(Λ) = 0 and for all ω ∉ Λ, t → M_t(ω) is continuous. We will show that for each t ≥ 0,

P(M_t = M_0) = 1.  (1)

Let Λ' denote the measurable collection of all ω ∈ Ω such that for some t ∈ Q_+, M_t(ω) ≠ M_0(ω). By countable additivity and by equation (1), P(Λ') = 0. Consequently, P(Λ ∪ Λ') = 0. On the other hand, if two continuous functions from R_+ into R agree on all positive rationals, then they are equal everywhere. Thus, P(M_t ≠ M_0 for some t ≥ 0) = 0. Thus, it suffices to derive equation (1).

In fact, it is enough to do this when M is a bounded martingale. To show this, we use a localization argument: According to Theorem 1.2.1, τ_k is a stopping time, where τ_k = inf(s ≥ 0: |M_s| ≥ k), and inf ∅ = ∞. By Corollary 1.6.2, (M_{τ_k∧t}; t ≥ 0) is a martingale that is bounded in magnitude by k. If we derive equation (1) for all bounded martingales, we can then deduce that P(M_{t∧τ_k} = M_0) = 1. By path continuity, with probability one, lim_{k→∞} M_{t∧τ_k} = M_t, for all t ≤ sup_k τ_k, which implies (1) in general.

We are left to prove equation (1) for any almost surely continuous martingale M such that sup_{t≥0} |M_t| ≤ k for some nonrandom k ≥ 0. Note that for any t ≥ s ≥ 0,

E[(M_t − M_s)² | F_s] = E[M_t² − M_s² | F_s] − 2E[M_s(M_t − M_s) | F_s] = E[M_t² − M_s² | F_s],  (2)
thanks to the martingale property. Thus,

E[(M_t − M_0)²] = E[M_t² − M_0²] = Σ_{j=0}^{n−1} E[ M²_{(j+1)t/n} − M²_{jt/n} ] = E[ Σ_{j=0}^{n−1} ( M_{(j+1)t/n} − M_{jt/n} )² ].  (3)

Let ω_n = sup_{0≤u,v≤t: |u−v|≤t/n} |M_u − M_v|, and V_n = Σ_{j=0}^{n−1} |M_{(j+1)t/n} − M_{jt/n}|. Since M is a.s. continuous, it is uniformly continuous on [0, t],
almost surely. In particular, with probability one, lim_{n→∞} ω_n = 0. Furthermore, since M a.s. has bounded variation, with probability one, sup_{n≥1} V_n < ∞. Consequently, since the sum in question is at most ω_n V_n,

lim_{n→∞} Σ_{j=0}^{n−1} ( M_{(j+1)t/n} − M_{jt/n} )² = 0,  a.s.  (4)
In light of equation (3), we wish to show that the above a.s. convergence also holds in L¹(P). By equation (4) and uniform integrability, it suffices to show that the sequence under study in equation (4) is bounded in L²(P). Now,

E[ ( Σ_{j=0}^{n−1} ( M_{(j+1)t/n} − M_{jt/n} )² )² ] = T¹_n + 2T²_n,

where

T¹_n = Σ_{j=0}^{n−1} E[ ( M_{(j+1)t/n} − M_{jt/n} )⁴ ],  (5)

T²_n = Σ_{i=0}^{n−2} Σ_{j=i+1}^{n−1} E[ ( M_{(j+1)t/n} − M_{jt/n} )² ( M_{(i+1)t/n} − M_{it/n} )² ].  (6)
We will show that T¹_n and T²_n are both bounded in n. This ensures the requisite uniform integrability, and implies that equation (4) also holds in L¹(P). Thanks to (3), equation (1) follows, and so does the theorem.

Since M is bounded by k, and keeping equation (5) in mind,

T¹_n ≤ 4k² E[ Σ_{j=0}^{n−1} ( M_{(j+1)t/n} − M_{jt/n} )² ] = 4k² E[ Σ_{j=0}^{n−1} ( M²_{(j+1)t/n} − M²_{jt/n} ) ] = 4k² E[M_t² − M_0²] ≤ 4k⁴.

The second line uses equation (2), and this shows that the term in equation (5) is bounded in n. Next, we show that the term in equation (6) is bounded, and conclude our argument. Utilizing equation (2) once more,

Σ_{j=i+1}^{n−1} E[ ( M_{(j+1)t/n} − M_{jt/n} )² | F_{(i+1)t/n} ] = Σ_{j=i+1}^{n−1} E[ M²_{(j+1)t/n} − M²_{jt/n} | F_{(i+1)t/n} ]
 = E[ M_t² − M²_{(i+1)t/n} | F_{(i+1)t/n} ] ≤ k².

Once more using equation (2), we conclude that for any n, T²_n is bounded above by k² E[M_t² − M_0²] ≤ k⁴, which is the desired result.
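The phenomenon behind Theorem 3.1.1 is easy to see numerically for Brownian motion, the basic example of a continuous martingale: along refining partitions of [0, 1], the total variation Σ_j |ΔM| grows without bound (like √n), while the sum of squares Σ_j (ΔM)² stays near 1. A sketch (standard library only; grid sizes, seed, and tolerances are arbitrary):

```python
import math
import random

rng = random.Random(5)

n_fine = 2 ** 14
# One Brownian path on the grid j / n_fine via i.i.d. N(0, 1/n_fine) increments.
incs = [rng.gauss(0.0, 1.0 / math.sqrt(n_fine)) for _ in range(n_fine)]

results = []
for level in (2 ** 6, 2 ** 10, 2 ** 14):
    step = n_fine // level
    # Increments of M over a partition of [0, 1] into `level` pieces.
    deltas = [sum(incs[i * step:(i + 1) * step]) for i in range(level)]
    total_var = sum(abs(d) for d in deltas)
    quad_sum = sum(d * d for d in deltas)
    results.append((level, total_var, quad_sum))

for level, tv, qv in results:
    print(level, round(tv, 3), round(qv, 3))
```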
3.2 Quadratic Variation

Consider, for the time being, a function g : R_+ → R that is of bounded variation. Very roughly speaking, this bounded variation property states that for "typical values" of s, t ≥ 0 with s ≈ t, |g(s) − g(t)| ≈ |s − t|. In fact, if this property holds for all s, t sufficiently close, then g is also differentiable; cf. Chapter 2 for ways of making this rigorous. Theorem 3.1.1 asserts that when M is a continuous martingale, e.g., M is a Brownian motion (Corollary 1.7.1), then |M_t − M_s| is much larger than |s − t| for typical values of s and t. We now refine this by showing a precise formulation of the statement that for most s, t ≥ 0, |M_t − M_s| ≈ |s − t|^{1/2}. That is, while they have unbounded variation, continuous martingales have finite quadratic variation. More precisely, we state the following theorem.

Theorem 3.2.1 Suppose M is a continuous martingale with respect to a filtration F that satisfies the usual conditions. If, in addition, M_t ∈ L²(P) for all t ≥ 0, then there exists a unique nondecreasing, continuous adapted process [M] = ([M]_t; t ≥ 0) such that:
(i) [M]_0 = 0, a.s.; and
(ii) t → M_t² − [M]_t is a martingale.
Moreover, if r_{j,n} = j2^{−n} (j, n ≥ 0), then

[M]_t = lim_{n→∞} Σ_{1≤j≤2ⁿt} ( M_{r_{j,n}} − M_{r_{j−1,n}} )²,

where the convergence holds in probability.

The process [M] is the quadratic variation of M and uniquely determines M among all continuous L²(P)-martingales. For instance, in light of Corollary 1.7.1, if B denotes standard Brownian motion, then [B]_t = t, which has the added—and very special—property of being nonrandom.

We will prove Theorem 3.2.1 in three steps. In the first step we verify uniqueness, while the following two steps are concerned with existence.

Step 1. (Uniqueness) Suppose there exists another continuous adapted process I = (I_t; t ≥ 0) such that (M_t² − I_t; t ≥ 0) is a martingale and I_0 = 0. Since the difference between two martingales is itself a martingale, t → [M]_t − I_t is a continuous martingale of bounded variation that is 0 at t = 0. By Theorem 3.1.1, with probability one, [M]_t = I_t for all t ≥ 0, which proves the uniqueness of [M].

Step 2. (Localization) At this stage we show that it suffices to prove Theorem 3.2.1 for continuous bounded martingales. This is done by appealing to localization, an argument that has appeared earlier in our proof of Theorem 3.1.1.
Suppose we have verified Theorem 3.2.1 for all bounded martingales, and let M denote a general continuous L²(P)-martingale. For all k ≥ 1, define τ(k) = inf(s ≥ 0: |M_s| > k); this is a stopping time for each k ≥ 1, and since M is continuous, M^k = (M_{t∧τ(k)}; t ≥ 0) is a bounded martingale (Corollary 1.6.2). The bounded portion of Theorem 3.2.1 assures us of the existence of a nondecreasing, continuous adapted process I^k = (I^k_t; t ≥ 0) that is the quadratic variation of M^k. By uniqueness, for all t ∈ [0, τ(k)], I^k_t = I^{k+1}_t. Thus, there exists a nondecreasing, continuous adapted process I = (I_t; t ≥ 0) such that for each integer k ≥ 1, I_t = I^k_t for all t ∈ [0, τ(k)]. The process I is the quadratic variation of M, and the remaining details are checked directly.

Exercise 3.2.1 Complete the proof of Step 2 by verifying that the process I constructed therein has the asserted properties.

Step 3. (Proof in the Bounded Case) By Step 2, we can assume that there exists some nonrandom constant κ > 0 such that sup_{t≥0} |M_t| ≤ κ. Throughout this proof we shall choose an arbitrary constant T > 0 that is held fixed. Define

Q_n(t) = Σ_{1≤j≤2ⁿt} ( M_{r_{j,n}} − M_{r_{j−1,n}} )²,  n ≥ 1, t ≥ 0.

Note that t → Q_n(t) is bounded uniformly for all t ∈ [0, T]. We will show that it is Cauchy in L²(P), also uniformly in t ∈ [0, T]. If m > n ≥ 1, then Q_m(t) − Q_n(t) equals
Note that t → Qn (t) is bounded uniformly for all t ∈ [0, T ]. We will show that it is Cauchy in L2 (P), also uniformly in t ∈ [0, T ]. If m > n ≥ 1, then Qm (t) − Qn (t) equals (Mj2−m − M(j−1)2−m )2 − (M2−n − M(−1)2−n )2 . 1≤j≤2m t
1≤≤2n t
Since dyadic partitions are nested inside one another, the summands (without the squares) in the second sum can be written as M2−n − M(−1)2−n = (Mj2−m − M(j−1)2−m ). 2m−n (−1)<j≤2m−n
That is, when m > n ≥ 1, Qm (t) − Qn (t) = 1≤≤2n t
−
(Mj2−m − M(j−1)2−m )2
2m−n (−1)<j≤2m−n
1≤≤2n t
= −2
2 (Mj2−m − M(j−1)2−m )
2m−n (−1)<j≤2m−n
(Mj2−m − M(j−1)2−m )
1≤≤2n t 2m−n (−1)<j
×(Mk2−m − M(k−1)2−m ).
(1)
Next, we wish to square the above and take expectations. By the martingale property, the off-diagonal terms vanish and we obtain the following (why?):

E[ {Q_m(t) − Q_n(t)}² ] = 4 Σ_{1≤ℓ≤2ⁿt} Σ_{2^{m−n}(ℓ−1) < j < k ≤ 2^{m−n}ℓ} E[ ( M_{j2^{−m}} − M_{(j−1)2^{−m}} )² ( M_{k2^{−m}} − M_{(k−1)2^{−m}} )² ].

By equation (2) of Section 3.1, whenever k > j,

E[ ( M_{k2^{−m}} − M_{(k−1)2^{−m}} )² | F_{(j−1)2^{−m}} ] = E[ M²_{k2^{−m}} − M²_{(k−1)2^{−m}} | F_{(j−1)2^{−m}} ].

Thus, we can use this, and telescope the sum over all k's, to see that

E[ {Q_m(t) − Q_n(t)}² ] = 4 Σ_{1≤ℓ≤2ⁿt} Σ_{2^{m−n}(ℓ−1) < j ≤ 2^{m−n}ℓ} E[ ( M_{j2^{−m}} − M_{(j−1)2^{−m}} )² ( M_{ℓ2^{−n}} − M_{(j−1)2^{−m}} )² ].

For the ℓ and j in the above range and for all t ∈ [0, T],

( M_{ℓ2^{−n}} − M_{(j−1)2^{−m}} )² ≤ sup_{0≤u,v≤T: |u−v|≤2^{−n+1}} |M_u − M_v|² = γ_n.

Note that γ_n ≤ 4κ², so that (γ_n) is a bounded sequence of random variables. Moreover, by the continuity of t → M_t, lim_{n→∞} γ_n = 0. Finally,

E[ {Q_m(t) − Q_n(t)}² ] ≤ 4 Σ_{1≤ℓ≤2ⁿt} Σ_{2^{m−n}(ℓ−1) < j ≤ 2^{m−n}ℓ} E[ γ_n · ( M_{j2^{−m}} − M_{(j−1)2^{−m}} )² ].

As m > n → ∞, the above goes to 0, thanks to Lebesgue's dominated convergence theorem. This uses the mentioned properties of γ_n, together with the following consequence of equation (1) of the previous subsection (cf. Section 3.1):

E[ Σ_{1≤ℓ≤2ⁿt} Σ_{2^{m−n}(ℓ−1) < j ≤ 2^{m−n}ℓ} ( M_{j2^{−m}} − M_{(j−1)2^{−m}} )² ] = Σ_{1≤ℓ≤2ⁿt} Σ_{2^{m−n}(ℓ−1) < j ≤ 2^{m−n}ℓ} E[ M²_{j2^{−m}} − M²_{(j−1)2^{−m}} ]
 = E[ M²_{2^{−n}⌊2ⁿt⌋} − M_0² ] ≤ E[ M_t² − M_0² ],

by Jensen's inequality, where ⌊•⌋ denotes the greatest-integer function. We have shown that for each t ∈ [0, T],

lim_{n,m→∞} E[ {Q_m(t) − Q_n(t)}² ] = 0.
On the other hand, equation (1), and the martingale property, together show that Q_n − Q_m is a martingale (why?). By Doob's strong (2, 2) inequality,

lim_{n,m→∞} E[ sup_{0≤t≤T} {Q_m(t) − Q_n(t)}² ] = 0;

see Theorem 1.3.1. However, L² spaces are complete, Q_n(0) = 0, and Q_n is adapted and nondecreasing. Thus, there exists an adapted, nondecreasing process [M] such that [M]_0 = 0 and

lim_{n→∞} E[ sup_{0≤t≤T} {Q_n(t) − [M]_t}² ] = 0.

Next, we show that t → [M]_t is continuous. While Q_n is not continuous, the following is (check!):

Q_n(t) + { M_t − M_{2^{−n}⌊2ⁿt⌋} }²,  t ≥ 0.

By the boundedness and continuity of t → M_t,

lim_{n→∞} sup_{0≤t≤T} | M_t − M_{2^{−n}⌊2ⁿt⌋} | = 0,

almost surely and in L²(P). Hence, t → [M]_t is also continuous. To finish, note that whenever t > s,

E[ Q_n(t) | F_s ] = Q_n(s) + Σ_{2ⁿs < j ≤ 2ⁿt} E[ M²_{r_{j,n}} − M²_{r_{j−1,n}} | F_s ]
 = Q_n(s) + E[ M²_{2^{−n}⌊2ⁿt⌋} − M²_{2^{−n}⌊2ⁿs⌋} | F_s ],

almost surely. We have used equation (2) of Section 3.1. Using boundedness and continuity, we can deduce that for t > s,

E[ [M]_t | F_s ] = [M]_s + E[ M_t² − M_s² | F_s ],

almost surely. This demonstrates the martingale assertion and completes our proof of Theorem 3.2.1.
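To illustrate Theorem 3.2.1 beyond Brownian motion itself (for which [B]_t = t), take M_t = B_t² − t, a continuous L²(P)-martingale; Itô's theory gives [M]_t = ∫_0^t 4B_s² ds. The Monte Carlo sketch below (path count, grid, seed, and tolerances are all ad hoc assumptions) compares the dyadic sums Q_n(1) with a Riemann sum for ∫_0^1 4B_s² ds, averaging over independent paths.

```python
import math
import random

rng = random.Random(21)

paths, n = 100, 2 ** 12    # paths of B on [0, 1], 2^12 dyadic steps each
dt = 1.0 / n
sum_Q = sum_R = 0.0
for _ in range(paths):
    B, M_prev = 0.0, 0.0   # M_0 = B_0^2 - 0 = 0
    Q = R = 0.0
    for j in range(1, n + 1):
        R += 4.0 * B * B * dt            # Riemann sum for the integral of 4 B_s^2
        B += rng.gauss(0.0, math.sqrt(dt))
        M = B * B - j * dt               # the martingale M_t = B_t^2 - t
        Q += (M - M_prev) ** 2           # dyadic sum from Theorem 3.2.1
        M_prev = M
    sum_Q += Q
    sum_R += R

mean_Q, mean_R = sum_Q / paths, sum_R / paths
# Both averages should be near the expected quadratic variation at t = 1, namely 2.
print(mean_Q, mean_R)
```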
3.3 Local Martingales

As far as quadratic variation is concerned, local martingales are the most natural extension of the class of L²(P)-martingales. Here, M = (M_t; t ≥ 0) is a local martingale if there exist stopping times τ_1, τ_2, ... such that for each k ≥ 1, M^k = (M_{t∧τ_k}; t ≥ 0) is a bounded martingale.²

²Thus, very loosely speaking, local martingales are martingales minus the integrability hypothesis. However, this naively understates the role of local martingales, since there are some very important local martingales that are not martingales; cf. Supplementary Exercise 7, Chapter 9.

Any
such collection of τ's is said to be a localizing sequence, but when M is continuous, there is always a natural localizing sequence given by the following lemma.

Lemma 3.3.1 (Localization Lemma) If M is a continuous local martingale, then τ_1, τ_2, ... is a localizing sequence for M, where

τ_k = inf(s ≥ 0: |M_s| > k),  k ≥ 1.
Exercise 3.3.1 Prove the localization lemma.
Continuous local martingales are uniquely described by their quadratic variation, as the following theorem asserts.

Theorem 3.3.1 If M is a continuous local martingale, there exists a unique continuous, adapted, nondecreasing process [M] such that [M]_0 = 0 and t → M_t² − [M]_t is a local martingale. Moreover, for each T > 0, as n → ∞,

sup_{0≤t≤T} | Σ_{1≤j≤2ⁿt} ( M_{j2^{−n}} − M_{(j−1)2^{−n}} )² − [M]_t | → 0,

in probability.

Exercise 3.3.2 Prove Theorem 3.3.1.
3.4 Stochastic Integration of Elementary Processes

Given a process M that is a continuous L²(P)-martingale with respect to a filtration F, we now wish to define ∫X dM for a class of nice processes X. As usual, we tacitly assume that F satisfies the usual conditions. We say that a stochastic process X is an elementary process if there exist nonrandom constants β ≥ α > 0 and a bounded F_α-measurable random variable Θ such that for all s ≥ 0, X_s = Θ1l_{]α,β]}(s). Note that elementary processes are necessarily adapted. For such an "elementary integrand," we define the stochastic integral process M(X) = (M(X)_t; t ≥ 0) as

M(X)_t = Θ · ( M_{β∧t} − M_{α∧t} ),  t ≥ 0.

A more suggestive notation for this is

M(X)_t = ∫_0^t X_s dM_s,  t ≥ 0,

which we also adopt. The following can be checked by direct means.
Lemma 3.4.1 Suppose M is a continuous L²(P)-martingale. If X is an elementary process, then M(X) is a continuous L²(P)-martingale with M(X)_0 = 0 and with quadratic variation

[M(X)]_t = ∫_0^t X_s² d[M]_s,  t ≥ 0.
Exercise 3.4.1 Prove Lemma 3.4.1.
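For an elementary integrand the stochastic integral is explicit, so Lemma 3.4.1 can be spot-checked by simulation. The toy check below (my own choices of α, β, t, Θ, seed, and tolerances) takes M = B, a Brownian motion, and Θ = sign(B_α), which is bounded and F_α-measurable; then E[M(X)_t] = 0 and, since Θ² = 1 and [B]_s = s, E[M(X)_t²] = E[∫_0^t X_s² d[B]_s] = β∧t − α∧t.

```python
import math
import random

rng = random.Random(3)

alpha, beta, t = 0.5, 1.5, 2.0
reps = 100000
s1 = s2 = 0.0
for _ in range(reps):
    # Sample (B_alpha, B_{beta∧t}) via independent Brownian increments.
    B_alpha = rng.gauss(0.0, math.sqrt(alpha))
    B_beta = B_alpha + rng.gauss(0.0, math.sqrt(min(beta, t) - alpha))
    theta = 1.0 if B_alpha >= 0 else -1.0   # bounded, F_alpha-measurable
    MX = theta * (B_beta - B_alpha)          # M(X)_t = Theta (M_{beta∧t} - M_{alpha∧t})
    s1 += MX
    s2 += MX * MX

mean, second = s1 / reps, s2 / reps
# Expect mean near 0 and second moment near (beta ∧ t) − (alpha ∧ t) = 1.0.
print(mean, second)
```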
Since t → [M]_t is nondecreasing and t → M_t is continuous, the integral ∫_0^t X_s² d[M]_s is a (random) Stieltjes integral in the classical sense. We close this subsection with another important exercise.

Lemma 3.4.2 (Polarization) Consider a continuous L²(P)-martingale M, and suppose X and Y are elementary processes. Then, the following is a continuous martingale:

t → M(X)_t M(Y)_t − ∫_0^t X_s Y_s d[M]_s.
Exercise 3.4.2 Verify the above.
3.5 Stochastic Integration of Simple Processes

Suppose M is a continuous L²(P)-martingale with respect to a filtration F that satisfies the usual conditions. We now extend our definition of ∫X dM to include a larger class of integrands than elementary ones. We say that a process X is a simple process if there are a finite number of elementary processes X^1, ..., X^m such that for all s ≥ 0, X_s = X_s^1 + ⋯ + X_s^m. Simple processes are bounded and adapted, and for such integrands, we define

M(X)_t = ∫_0^t X_s dM_s = Σ_{j=1}^m M(X^j)_t = Σ_{j=1}^m ∫_0^t X_s^j dM_s,  t ≥ 0.
The following can be checked directly.

Lemma 3.5.1 Suppose M is a continuous L²(P)-martingale.
(a) If X is a simple process, M(X) is properly defined. That is, the definition of M(X) does not depend on the particular elementary-process representation of X.
(b) Lemmas 3.4.1 and 3.4.2 continue to hold for simple processes in place of elementary ones.
Exercise 3.5.1 Prove Lemma 3.5.1. Moreover, show that if X and Y are simple processes, so is X +Y , and that a.s., M (X +Y )t = M (X)t +M (Y )t , for all t ≥ 0. (Hint: You may need Lemmas 3.4.1 and 3.4.2.)
3.6 Integrating Continuous Adapted Processes

Not surprisingly, when we say that a stochastic process X is continuous adapted (with respect to a filtration F that satisfies the usual conditions), we mean:
1. s → X_s is continuous; and
2. X is adapted to F.
Our next lemma states that continuous adapted processes can be well approximated by simple ones.

Lemma 3.6.1 If X is a bounded, continuous adapted process, there exists a sequence of simple processes X^1, X^2, ... such that for all t ≥ 0,

lim_{n→∞} sup_{0≤s≤t} |X_s^n − X_s| = 0,

almost surely and in L^p(P) for all p > 0.

Proof Here is one such candidate: For all n ≥ 1, define

X_s^n = Σ_{1≤j≤nt} X_{j/n} · 1l_{]j/n, (j+1)/n]}(s),  s ≥ 0.

Since X is adapted and bounded, X^n is a simple process for each n. The other assertion of the lemma follows from the boundedness and continuity of t → X_t, as well as from Lebesgue's dominated convergence theorem.

We are in a position to define M(X), where X is a bounded, continuous and adapted process. Let X^1, X^2, ... denote the approximating simple processes of Lemma 3.6.1 above. Since the difference of two simple processes is itself a simple process, Lemma 3.5.1 shows us that for all n, m ≥ 0 and all t ≥ 0,

E[ ∫_0^t (X_s^n − X_s^m)² d[M]_s ] = E[ ( M(X^n)_t − M(X^m)_t )² ].

Of course, M(X^n) − M(X^m) is a martingale. Therefore, we can use Doob's strong (2, 2) inequality (Theorem 1.3.1) to see that for all n, m ≥ 1 and all t ≥ 0,

E[ sup_{0≤s≤t} { M(X^n)_s − M(X^m)_s }² ] ≤ 4 E[ ∫_0^t (X_s^n − X_s^m)² d[M]_s ].
By Lemma 3.6.1, as n, m → ∞, the above goes to zero. Consequently, there exists a process M(X) such that for all t ≥ 0,

lim_{n→∞} E[ sup_{0≤s≤t} { M(X)_s − M(X^n)_s }² ] = 0.

(Why?) We shall write this process as M(X)_t = ∫_0^t X_s dM_s, and readily deduce that t → M(X)_t is a continuous L²(P)-martingale.

Exercise 3.6.1 Demonstrate Lemmas 3.4.1 and 3.4.2 for all bounded continuous adapted processes in place of simple ones. Moreover, show that whenever X and Y are bounded continuous adapted processes, so is X + Y, and with probability one, M(X + Y)_t = M(X)_t + M(Y)_t for all t ≥ 0.

Our next theorem follows from the above and the "localization" methods used in Step 2 of Theorem 3.2.1.

Theorem 3.6.1 Given an L²(P)-martingale M and a continuous adapted process X, there exists a continuous local martingale M(X) with quadratic variation

[M(X)]_t = ∫_0^t X_s² d[M]_s,  t ≥ 0,
as long as the above is almost surely finite for each t ≥ 0. Exercise 3.6.2 Prove Theorem 3.6.1.
Remarks (i) We shall refer to M(X)_t as a stochastic integral and also write it as ∫_0^t X_s dM_s. The term "integral" is justified, since M(X) has many of the usual properties of integrals whose integrand is X; cf. Exercise 3.6.1 above, for example.
(ii) Since M(X)_0 = 0, the definition of quadratic variation used in conjunction with the above theorem implies that

E[ ( ∫_0^t X_s dM_s )² ] = E[ ∫_0^t X_s² d[M]_s ],  t ≥ 0.

Recall that the condition that a.s., ∫_0^t X_s² d[M]_s is finite for all t ≥ 0 is also written as: X is a.s. locally square integrable with respect to [M].

Theorem 3.6.2 Consider a continuous L²(P)-martingale M. If X and Y are continuous adapted processes that are a.s. locally square integrable with respect to [M], the following is a continuous martingale:

t → M(X)_t M(Y)_t − ∫_0^t X_s Y_s d[M]_s.
Exercise 3.6.3 Prove Theorem 3.6.2.
Remark If M is a local martingale, [M] is still well-defined, and as long as ∫_0^t X_s² d[M]_s is a.s. finite, one can still define t → M(X)_t as a local martingale. See Supplementary Exercises 4 and 5 for details.
3.7 Two Approximation Theorems

We continue our discussion of this section by showing that stochastic integrals are approximable by a kind of left-point rule, in a similar way that Riemann sums approximate ordinary Riemann integrals. We shall also verify an analogous fact for the quadratic variations of the stochastic integrals under study.³ Throughout this subsection M denotes a continuous L²(P)-martingale and r_{j,n} = j2^{−n} (n = 1, 2, ...; j = 0, 1, ...).

Theorem 3.7.1 Suppose X is a bounded and continuous adapted process that is a.s. locally square integrable with respect to [M]. For all t ≥ 0,

lim_{n→∞} sup_{0≤s≤t} | Σ_{1≤j≤2ⁿs} X_{r_{j−1,n}} ( M_{r_{j,n}} − M_{r_{j−1,n}} ) − ∫_0^s X_r dM_r | = 0,

where the convergence holds in L²(P).

The above is not true if the left-point rule is replaced by the midpoint rule; cf. Supplementary Exercise 7 for details. We recall that the midpoint rule is obtained by replacing X_{r_{j−1,n}} (in the summand) by ½(X_{r_{j,n}} + X_{r_{j−1,n}}).

Proof Since X is bounded, it is not hard to check directly that

Σ_{1≤j≤2ⁿs} X_{r_{j−1,n}} ( M_{r_{j,n}} − M_{r_{j−1,n}} ) = ∫_0^s X_r^n dM_r,

where X^n is the simple process defined by

X_s^n = Σ_{1≤j≤2ⁿs} X_{r_{j−1,n}} 1l_{](j−1)2^{−n}, j2^{−n}]}(s),  s ≥ 0.
Let γn = sup0≤r≤t |Xrn − Xr |2 , so that γn2 is the greatest error in the approximation X by X n . By Doob’s strong (2, 2) inequality (Theorem 1.3.1), 3 To various degrees of generality, these were discovered by K. Itˆ o, and in subsequent works by E. Wong and M. Zakai, as well as by D. L. Fisk.
and Theorem 3.6.1, we can deduce the following:
\[
\mathrm{E}\Big[\sup_{0\le s\le t}\Big|\int_0^s X_r^n\,dM_r - \int_0^s X_r\,dM_r\Big|^2\Big] \le 4\,\mathrm{E}\Big[\Big|\int_0^t (X_r^n - X_r)\,dM_r\Big|^2\Big] = 4\,\mathrm{E}\Big[\int_0^t (X_r^n - X_r)^2\,d[M]_r\Big] \le 4\,\mathrm{E}\{\gamma_n \cdot [M]_t\}.
\]
By Lebesgue's dominated convergence theorem, used together with the boundedness and continuity of $t \mapsto X_t$, the above goes to 0 as $n \to \infty$.

The second result of this subsection is the analogue of Theorem 3.7.1 for quadratic variations of stochastic integrals.

Theorem 3.7.2 Suppose $X$ is a bounded and continuous adapted process that is a.s. locally square integrable with respect to $[M]$. For all $t \ge 0$,
\[
\lim_{n\to\infty} \sup_{0\le s\le t}\Big| \sum_{1\le j\le 2^n s} X_{r_{j-1,n}}\big(M_{r_{j,n}} - M_{r_{j-1,n}}\big)^2 - \int_0^s X_r\,d[M]_r \Big| = 0,
\]
where the convergence holds in probability.

Proof When $X$ is an elementary process, this follows readily from Theorem 3.2.1. Subsequently, it also holds when $X$ is a simple process. The general result follows by approximating bounded and continuous adapted processes by simple ones; cf. Lemma 3.6.1 above.
3.8 Itô's Formula: Stochastic Integration by Parts

A key result of classical integration theory states that a function $x : \mathbf{R}_+ \to \mathbf{R}$ that has bounded variation satisfies the following integration by parts formula: For all continuously differentiable functions $f : \mathbf{R} \to \mathbf{R}$,
\[
f(x(t)) = f(x(0)) + \int_0^t f'(x(s))\,dx(s), \qquad t \ge 0.
\]
A remarkable result of K. Itô states that when $x$ is a continuous $L^2(\mathrm{P})$-martingale, the above continues to hold a.s., but with an extra term.⁴

Theorem 3.8.1 (Itô's Formula) Suppose that $M$ is a continuous $L^2(\mathrm{P})$-martingale. Then, for all twice continuously differentiable functions $f : \mathbf{R} \to \mathbf{R}$,
\[
f(M_t) = f(M_0) + \int_0^t f'(M_s)\,dM_s + \frac12 \int_0^t f''(M_s)\,d[M]_s, \qquad t \ge 0,
\]

⁴See (Itô 1944; Kunita and Watanabe 1967).
almost surely.

Proof Clearly, $t \mapsto f'(M_t)$ and $t \mapsto f''(M_t)$ are both continuous adapted processes. Therefore, both integrals are well-defined (why?). It suffices to show that for each $t \ge 0$, Itô's formula holds almost surely (why?). This is what we will show.

By a localization argument, we can assume that there exists a constant $k \ge 0$ such that $\|f\|_\infty + \|f'\|_\infty + \|f''\|_\infty \le k$. (Why?) Now recall $r_{j,n} = j2^{-n}$, and write
\[
f(M_t) - f(M_0) = \sum_{1\le j\le 2^n t} \big[ f(M_{r_{j,n}}) - f(M_{r_{j-1,n}}) \big].
\]
By Taylor's expansion, $f(x) - f(y) = f'(y)\cdot(x-y) + \int_y^x f''(v)\cdot(x-v)\,dv$. Thus,
\[
f(M_t) - f(M_0) = \sum_{1\le j\le 2^n t} f'(M_{r_{j-1,n}})\cdot\big(M_{r_{j,n}} - M_{r_{j-1,n}}\big) + \sum_{1\le j\le 2^n t} \int_{M_{r_{j-1,n}}}^{M_{r_{j,n}}} f''(v)\cdot\big(M_{r_{j,n}} - v\big)\,dv = T_1 + T_2.
\]
By Theorem 3.7.1, as $n \to \infty$, $T_1$ converges to $\int_0^t f'(M_s)\,dM_s$ in probability. Therefore, it suffices to show that $T_2$ converges in probability to $\frac12\int_0^t f''(M_s)\,d[M]_s$.

Let $I_{j,n}$ denote the interval whose two endpoints are the maximum and the minimum of $M_{r_{j,n}}$ and $M_{r_{j-1,n}}$. Since $t \mapsto f''(M_t)$ is continuous,
\[
\lim_{n\to\infty} \max_{1\le j\le 2^n t}\, \sup_{v\in I_{j,n}} \big| f''(v) - f''(M_{r_{j-1,n}}) \big| = 0,
\]
almost surely. Moreover, the above is bounded, since $f''$ is. Hence,
\[
\lim_{n\to\infty} \Big| T_2 - \frac12 \sum_{1\le j\le 2^n t} f''(M_{r_{j-1,n}})\cdot\big(M_{r_{j,n}} - M_{r_{j-1,n}}\big)^2 \Big| = 0,
\]
almost surely. Itô's formula now follows from Theorem 3.7.2.
To see the power of this formula, we apply it to Brownian motion, which is a continuous $L^2(\mathrm{P})$-martingale (Corollary 1.7.1).

Example Let $B$ denote Brownian motion and recall that $[B]_t = t$; cf. Corollary 1.7.1(ii) and the uniqueness part of Theorem 3.2.1. Itô's formula states that for all twice continuously differentiable functions $f : \mathbf{R} \to \mathbf{R}$,
\[
f(B_t) = f(0) + \int_0^t f'(B_s)\,dB_s + \frac12 \int_0^t f''(B_s)\,ds.
\]
We apply this to the function $f(x) = x^2$ to obtain the almost sure identity
\[
B_t^2 - t = 2\int_0^t B_s\,dB_s.
\]
This gives the explicit form for the martingale term in Corollary 1.7.1(ii). One can also obtain an integral representation for the martingale of part (iii) of the mentioned corollary. Indeed, it turns out that for any $\alpha \in \mathbf{R}$,
\[
e^{\alpha B_t - \frac12 t\alpha^2} = 1 + \alpha \int_0^t \exp\Big(\alpha B_s - \frac{\alpha^2 s}{2}\Big)\,dB_s, \qquad t \ge 0.
\]
However, in order to prove this, one needs an “extended form” of Itˆ o’s formula that can be found in Supplementary Exercises 8 and 10.
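These identities are exact statements about stochastic integrals, but the left-point approximation of Theorem 3.7.1 makes them easy to probe numerically. The following sketch (the step count and the seed are arbitrary choices, and a uniform grid plays the role of the dyadic partition) simulates a Brownian path and measures the gap between $B_t^2 - t$ and twice the left-point sum; by the Taylor computation behind Itô's formula, this gap is exactly $\sum_j (\Delta B_j)^2 - t$, which is small on fine grids.

```python
import math
import random

def ito_identity_gap(n_steps=200_000, t=1.0, seed=0):
    """Compare B_t^2 - t with the left-point sum for 2 * int_0^t B dB.

    Returns |LHS - RHS|; by the Taylor computation in the proof of
    Theorem 3.8.1 this equals |sum_j (dB_j)^2 - t|.
    """
    rng = random.Random(seed)
    dt = t / n_steps
    sigma = math.sqrt(dt)
    B = 0.0          # running value of the Brownian path
    stoch_int = 0.0  # left-point Riemann sum (Theorem 3.7.1)
    for _ in range(n_steps):
        dB = rng.gauss(0.0, sigma)
        stoch_int += B * dB   # X_{r_{j-1}} * (M_{r_j} - M_{r_{j-1}})
        B += dB
    return abs((B * B - t) - 2.0 * stoch_int)

print(ito_identity_gap())  # typically of order n_steps ** -0.5
```

Replacing the left-point evaluation by the midpoint average would instead approximate the Stratonovich integral of Supplementary Exercise 7, for which the $-t$ correction disappears.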
3.9 The Burkholder–Davis–Gundy Inequality

We have just seen that for any continuous $L^2(\mathrm{P})$-martingale $M$, $\mathrm{E}[M_t^2] = \mathrm{E}\{[M]_t\}$ ($t \ge 0$). In particular, we can apply Doob's strong (2, 2) inequality (Theorem 1.3.1) to conclude that
\[
\mathrm{E}\Big[\sup_{t\ge 0} M_t^2\Big] \le 4 \sup_{t\ge 0} \mathrm{E}\{[M]_t\}.
\]
In rough terms, we have here an $L^2$ maximal inequality that relates the "maximal $L^2$-norm" of $M$ to the $L^1(\mathrm{P})$-norm of the quadratic variation $[M]$. The main result of this section is more or less half of an inequality of D. L. Burkholder, B. Davis, and R. Gundy that states an $L^p$ analogue of the above estimate. See Burkholder, Davis, and Gundy (1972), as well as the related works of Burkholder and Gundy (1972) and Burkholder (1973, 1975).

Theorem 3.9.1 (The Burkholder–Davis–Gundy Inequality) If $M$ denotes a continuous $L^2(\mathrm{P})$-martingale with $M_0 = 0$, then for all $p \ge 1$,
\[
\mathrm{E}\Big[\sup_{t\ge 0} M_t^{2p}\Big] \le c(p) \cdot \sup_{t\ge 0} \mathrm{E}\{[M]_t^p\},
\]
where
\[
c(p) = \Big(\frac{2p}{2p-1}\Big)^{2p^2} p^p (2p-1)^p.
\]
Used in conjunction with the optional stopping theorem (Theorem 1.6.1), this has the following important corollary.
Corollary 3.9.1 If $M$ is a continuous $L^2(\mathrm{P})$-martingale with $M_0 = 0$, then for all $p \ge 1$ and for all stopping times $T$,
\[
\mathrm{E}\Big[\sup_{t\le T} M_t^{2p}\, \mathbf{1}_{(T<\infty)}\Big] \le c(p) \cdot \mathrm{E}\big\{[M]_T^p\, \mathbf{1}_{(T<\infty)}\big\}.
\]
In particular, we have the following.

Corollary 3.9.2 Let $B$ denote standard Brownian motion. For all finite stopping times $T$ and for all $p \ge 1$,
\[
\mathrm{E}\Big[\sup_{0\le t\le T} B_t^{2p}\Big] \le c(p) \cdot \mathrm{E}[T^p].
\]
Remarks (i) If we consider only nonrandom times $T$, we can dramatically improve the constant $c(p)$ of Corollary 3.9.2; see Supplementary Exercise 12. (ii) Theorem 3.9.1 is sharp up to a constant; see Exercise 3.9.2 below.

We defer the proofs of Corollaries 3.9.1 and 3.9.2 to Exercise 3.9.1, and demonstrate Theorem 3.9.1 next.

Proof of Theorem 3.9.1 We can apply localization to reduce the result to the case where $M_\infty$ and $\sup_{t\ge 0} M_t^2$ are both in $L^p(\mathrm{P})$ (why?). Henceforth, we will assume this $L^p$ condition without loss of generality. By the monotone convergence theorem, $\sup_{t\ge 0} \mathrm{E}\{[M]_t^p\} = \mathrm{E}\{[M]_\infty^p\}$, where $[M]_\infty = \lim_{t\to\infty} [M]_t$ exists a.s., thanks to the monotonicity of quadratic variation. Apply Itô's formula (Theorem 3.8.1) to the function $f(x) = x^{2p}$ to see that for all $t \ge 0$,
\[
M_t^{2p} = 2p \int_0^t M_s^{2p-1}\,dM_s + p(2p-1) \int_0^t M_s^{2p-2}\,d[M]_s, \qquad \text{a.s.}
\]
We can take expectations of both sides to obtain
\[
\mathrm{E}[M_t^{2p}] = p(2p-1)\,\mathrm{E}\Big[\int_0^t M_s^{2p-2}\,d[M]_s\Big] \le p(2p-1)\,\mathrm{E}\Big[\sup_{0\le s\le t} M_s^{2p-2} \cdot [M]_t\Big] \le p(2p-1)\, \Big(\mathrm{E}\Big[\sup_{0\le s\le t} M_s^{2p}\Big]\Big)^{1-\frac1p} \big(\mathrm{E}\{[M]_\infty^p\}\big)^{\frac1p},
\]
thanks to Hölder's inequality. By Doob's strong (2p, 2p) inequality (Theorem 1.3.1),
\[
\mathrm{E}\Big[\sup_{t\ge 0} M_t^{2p}\Big] \le \Big(\frac{2p}{2p-1}\Big)^{2p} \sup_{t\ge 0} \mathrm{E}[M_t^{2p}],
\]
which implies the result, after a few lines of direct calculation.
Exercise 3.9.1 Prove Corollaries 3.9.1 and 3.9.2.
Exercise 3.9.2 (Hard) If $M$ is a continuous $L^2(\mathrm{P})$-martingale, prove that for all $p \ge 2$, there exists a positive and finite constant $\gamma(p)$ such that
\[
\sup_{t\ge 0} \mathrm{E}\{[M]_t^p\} \le \gamma(p)\, \mathrm{E}\Big[\sup_{t\ge 0} M_t^{2p}\Big].
\]
(Hint: Use $M_t^2 = 2\int_0^t M_s\,dM_s + [M]_t$ together with localization and the inequality $|x+y|^p \le 2^p\{|x|^p + |y|^p\}$. This approach can be found in Revuz and Yor (1994, Proposition 4.4, Ch. VI).)
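For nonrandom times, the content of Corollary 3.9.2 is easy to see in simulation. The sketch below (the path count, step count, and seed are arbitrary choices) estimates $\mathrm{E}[\sup_{s\le 1} B_s^2]$ on a discrete grid; with $p = 1$ and $T = 1$, the constant computed in the proof gives $c(1) = 4$, so the estimate should fall below 4 — in fact well below it, in line with Remark (i) above.

```python
import math
import random

def mc_sup_square(n_paths=2000, n_steps=400, t=1.0, seed=1):
    """Monte Carlo estimate of E[sup_{s<=t} B_s^2] on a discrete grid."""
    rng = random.Random(seed)
    sigma = math.sqrt(t / n_steps)
    total = 0.0
    for _ in range(n_paths):
        B, peak = 0.0, 0.0
        for _ in range(n_steps):
            B += rng.gauss(0.0, sigma)
            if B * B > peak:
                peak = B * B
        total += peak
    return total / n_paths

est = mc_sup_square()
# Corollary 3.9.2 with p = 1 and nonrandom T = 1 gives the bound c(1) * E[T] = 4.
print(est, est <= 4.0)
```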
4 An Introduction to Stochastic PDEs

Having defined one-parameter stochastic integrals, we can now define multiparameter stochastic integrals with respect to the Brownian sheet. This will allow us to study elements of a class of so-called hyperbolic stochastic partial differential equations (henceforth hyperbolic SPDEs). As in nonstochastic settings, SPDEs in a one-parameter setting are called stochastic differential equations, and we have seen at least one of them so far. To wit, let $B$ denote standard Brownian motion and define $E_t = \exp(B_t - \frac t2)$, $t \ge 0$. According to the example of Section 3.8,
\[
E_t = 1 + \int_0^t E_s\,dB_s, \qquad t \ge 0.
\]
See also Supplementary Exercise 10. Thus, we can think of the process $E$ as the solution to the stochastic differential equation $dE = E\,dB$, subject to $E_0 = 1$.⁵

This section is concerned with the definition, as well as the existence and uniqueness, of solutions to a class of hyperbolic SPDEs. Before embarking on this journey, we need to construct and study the basic properties of multiparameter stochastic integrals. We will do this in Section 4.1 below. Sections 4.2 and 4.3 form a very brief introduction to hyperbolic SPDEs. Throughout this section, $B = (B_t;\, t \in \mathbf{R}_+^N)$ will unwaveringly denote a real-valued, $N$-parameter Brownian sheet, and $F = (F_t;\, t \in \mathbf{R}_+^N)$ is the complete natural history of the process $B$.

⁵The process $E$ plays a critical role in the detailed analysis of continuous martingales. Motivated by this, and by the fact that it solves the stochastic differential equation $dE = E\,dB$, H. P. McKean has dubbed it the exponential martingale.
4.1 Stochastic Integration Against the Brownian Sheet

Our construction of stochastic integrals against the Brownian sheet follows that of one-parameter stochastic integrals. We say that a process $[M]$ is the quadratic variation of an $N$-parameter martingale $M$ if $t \mapsto M_t^2 - [M]_t$ is an $N$-parameter martingale. With this in mind, we have the following result.

Theorem 4.1.1 Suppose $X$ is a continuous process such that (i) it is adapted to $F$; and (ii) for all $t \in \mathbf{R}_+^N$, $\mathrm{E}[\int_{[0,t]} X_s^2\,ds] < +\infty$. Then there exists a continuous $L^2(\mathrm{P})$-martingale $B(X) = (B(X)_t;\, t \in \mathbf{R}_+^N)$ such that:

(a) $B(X)_0 = 0$;
(b) $[B(X)]_t = \int_{[0,t]} X_s^2\,ds$ for all $t \in \mathbf{R}_+^N$; and
(c) $[B(X)]$ is the a.s. unique adapted process of bounded variation that is zero when $t = 0$.

Furthermore, whenever $X$ and $Y$ are continuous adapted processes such that $\mathrm{E}[\int_{[0,t]} (X_s^2 + Y_s^2)\,ds] < \infty$: (i) with probability one, for all $t \in \mathbf{R}_+^N$, $B(X+Y)_t = B(X)_t + B(Y)_t$; and (ii) as a process indexed by $t \in \mathbf{R}_+^N$, the following is a continuous martingale:
\[
B(X)_t\, B(Y)_t - \int_{[0,t]} X_s Y_s\,ds, \qquad t \in \mathbf{R}_+^N.
\]
We shall interchangeably write $\int_{[0,t]} X_s\,dB_s$ for $B(X)_t$. An attractive feature of this construction is that when $X$ is nonrandom, $B(X)_t$ agrees with $\int_{[0,t]} X_s\,dB_s$ of Chapter 5, Section 1.5 (Supplementary Exercise 13). Theorem 4.1.1 is proved along the same lines as Theorems 3.6.1 and 3.6.2; see Supplementary Exercise 11 for details.

In order to obtain a multiparameter extension of the Burkholder–Davis–Gundy inequality, let us first note that if $M$ is an $N$-parameter martingale with respect to $F$, then for every fixed $t^{(2)}, \ldots, t^{(N)} \ge 0$, $t^{(1)} \mapsto M_t$ is a 1-parameter martingale with respect to the first marginal filtration $F^1$ of $F$. In a discrete setting, this follows from Proposition 2.2.1 of Chapter 1; its continuous extension is proved similarly (check!).

Now let us consider the $N$-parameter martingale $B(X)$, where $X$ is any continuous adapted process such that for all $t \in \mathbf{R}_+^N$, $\mathrm{E}[\int_{[0,t]} X_s^2\,ds] < +\infty$. Let us fix $t^{(2)}, \ldots, t^{(N)} \ge 0$ and note that $t^{(1)} \mapsto B(X)_t$ and $t^{(1)} \mapsto \{B(X)_t\}^2 - [B(X)]_t$ are 1-parameter martingales; see the previous paragraph. Since Theorem 4.1.1 shows us the explicit form of the quadratic
variation $[B(X)]$, we can apply the 1-parameter Burkholder–Davis–Gundy inequality (Theorem 3.9.1) to see that for any $t \in \mathbf{R}_+^N$ and for all $p \ge 1$,
\[
\mathrm{E}\Big[\Big|\int_{[0,t]} X_s\,dB_s\Big|^{2p}\Big] \le c(p)\, \mathrm{E}\Big[\Big(\int_{[0,t]} X_s^2\,ds\Big)^p\Big].
\]
Subsequently, we can combine the Cairoli–Walsh commutation theorem (Theorem 2.4.1) together with Cairoli's strong (2p, 2p) inequality (Theorem 2.3.2) and obtain a multiparameter extension of the Burkholder–Davis–Gundy inequality.

Theorem 4.1.2 (The Burkholder–Davis–Gundy Inequality) Suppose $X$ is continuous and adapted, and for all $t \in \mathbf{R}_+^N$, $\mathrm{E}[\int_{[0,t]} X_s^2\,ds] < +\infty$. Then, for all $t \in \mathbf{R}_+^N$ and all $p \ge 1$,
\[
\mathrm{E}\Big[\sup_{s\in[0,t]} \Big|\int_{[0,s]} X_r\,dB_r\Big|^{2p}\Big] \le c(p)\,4^{pN}\, \mathrm{E}\Big[\Big(\int_{[0,t]} X_r^2\,dr\Big)^p\Big].
\]
That is, whenever the right-hand side is finite, so is the left-hand side, and the above bound holds.

Exercise 4.1.1 Fill in the gaps to prove Theorem 4.1.2.
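It may help to see how the integrator $B$ itself is produced in practice. On a finite grid, a two-parameter Brownian sheet is just the double cumulative sum of i.i.d. centered Gaussians whose variance equals the area of a grid cell. The sketch below (grid size, sample count, and seed are arbitrary choices) builds such sheets by inclusion-exclusion and checks the defining property $\mathrm{Var}(B_t) = t^{(1)} t^{(2)}$ at $t = (\frac12, \frac12)$.

```python
import random

def brownian_sheet(n=20, seed=0):
    """2-parameter Brownian sheet on an (n+1) x (n+1) grid over [0,1]^2.

    sheet[i][j] approximates B at (i/n, j/n): the sum of the i.i.d.
    N(0, cell-area) increments over the rectangle [0, i/n] x [0, j/n].
    """
    rng = random.Random(seed)
    sd = 1.0 / n  # square root of the cell area 1/n^2
    sheet = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            inc = rng.gauss(0.0, sd)
            # inclusion-exclusion turns cell increments into rectangle sums
            sheet[i][j] = (inc + sheet[i - 1][j] + sheet[i][j - 1]
                           - sheet[i - 1][j - 1])
    return sheet

# Var(B_t) = t1 * t2: check at t = (1/2, 1/2), where it should be 1/4.
vals = [brownian_sheet(seed=k)[10][10] for k in range(4000)]
var = sum(v * v for v in vals) / len(vals)  # the mean is 0
print(var)  # close to 0.25
```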
4.2 Hyperbolic SPDEs: Some Physical Motivation

We begin our discussion with a heuristic, though compelling, description of a physical model that naturally leads to a large class of hyperbolic stochastic partial differential equations (written as SPDEs). Consider three smooth functions $\alpha, \beta, b : \mathbf{R}^N \to \mathbf{R}$. The following nonlinear hyperbolic partial differential equation seeks a "solution" $f : \mathbf{R}^N \to \mathbf{R}$ that, in some reasonable sense, satisfies
\[
\frac{\partial^N f}{\partial t^{(1)} \cdots \partial t^{(N)}}(t) = \alpha(f(t))\, \frac{\partial^N b}{\partial t^{(1)} \cdots \partial t^{(N)}}(t) + \beta(f(t)), \qquad t \in \mathbf{R}^N. \tag{1}
\]
To see what this equation means in its simplest nontrivial setting, suppose $N = 2$, $b(t) \equiv 0$, and $\beta(u) \equiv 0$, the zero function. Let us relabel the variables $s = t^{(1)}$ and $y = t^{(2)}$ to get
\[
\frac{\partial^2 f}{\partial s\,\partial y}(s, y) = 0. \tag{2}
\]
At this stage we can use D'Alembert's method of characteristics to relate the above to the well-known wave equation of mathematical physics. We change variables as $t = s + y$ and $x = s - y$, so that the $(t,x)$-plane is a rotation of the $(s,y)$-plane by $45°$. Apparently,
\[
\frac{\partial f}{\partial s} = \frac{\partial f}{\partial t} + \frac{\partial f}{\partial x}.
\]
Another round of differentiation, this time with respect to $y$, yields
\[
\frac{\partial^2 f}{\partial s\,\partial y} = \frac{\partial^2 f}{\partial t^2} - \frac{\partial^2 f}{\partial x^2},
\]
and we have transformed equation (2) into $\partial^2 f/\partial t^2 = \partial^2 f/\partial x^2$. Consequently, it can easily be checked that for any fixed $c > 0$, the function $\psi(t,x) = f(t, x/c)$ solves the one-dimensional wave equation of mathematical physics, viz.,
\[
\frac{1}{c^2}\frac{\partial^2 \psi}{\partial t^2} = \frac{\partial^2 \psi}{\partial x^2}.
\]
In a few words, the solution $\psi$ represents the displacement, or position, of a flexible vibrating string at "time" $t$ in "position" $x$. The above equation is in terms of some constant $c$ that has a physical interpretation as well. Indeed, one can write $c^2 = T/\rho$, where $T$ is the tension in the string and $\rho$ is the linear string density. If, in addition, we apply a possibly time-dependent external force of amount $F(t,x)$ per unit length to this string, the equation of the vibrating string changes to
\[
\frac{\rho}{T}\frac{\partial^2 \psi}{\partial t^2} = \frac{\partial^2 \psi}{\partial x^2} - \frac{1}{T}F.
\]
Conversely, if $b$ satisfies $\partial^2 b/\partial t^2 = \partial^2 b/\partial x^2 + F$, we would expect that the vibrating string problem with external force can be transformed into equation (1) with $\alpha(u) \equiv 1/T$, $\beta(u) \equiv 0$, and $N = 2$.

Still working with the vibrating string problem in the $N = 2$ case, suppose the external force $F(t,x)$ is "truly random." Say, we have a string, and $F$ describes the quantum effect of the surrounding particles. Since $F$ is obtained by "averaging out" the effect of many i.i.d. particles, it stands to reason that $F$ is a centered Gaussian process. On the other hand, in the "truly random" case, we would expect $(F(t,x);\, x \in \mathbf{R},\, t \ge 0)$ to be an i.i.d. collection of random variables. We also might as well scale things so that $\mathrm{E}[\{F(t,x)\}^2] = 1$ for all $x \in \mathbf{R}$ and all $t \ge 0$. Thus, our model for the external force is formally described by $F(t,x) = -W(\delta_{t,x})$, where $W$ is the isonormal process on $\mathbf{R}^2$ and $\delta_p$ denotes Dirac's delta function at the point $p$ (why?). Since delta functions are not really functions, one needs to work to justify the above model properly.
Nonetheless, arguing formally still, the random process $b$ is modeled to satisfy $\partial^2 b/\partial t^2 - \partial^2 b/\partial x^2 = W(\delta_{t,x})$. Reverting to the $(s,y)$-plane, $b$ must satisfy $\partial^2 b/\partial s\,\partial y = W(\delta_{s,y})$. Integrating (still purely formally), and recalling that $W$ is a random linear operator, we would want $b(s,y) = W\big(\int_0^s \int_0^y \delta_{u,v}\,du\,dv\big)$. On the other hand, as "functions" of $(p,q)$,⁶
\[
\int_0^s \int_0^y \delta_{u,v}(p,q)\,du\,dv = \mathbf{1}_{[0,s]\times[0,y]}(p,q).
\]
Let $t = (s,y)$ to see that we want $b(t) = W(\mathbf{1}_{[0,t]})$, whose modification is a Brownian sheet. Thus, another equally formal interpretation of equation (1), when the external force is "truly random," is given by the following equation for $N = 2$:
\[
df(t) = \alpha(f(t))\,dB_t + \beta(f(t))\,dt, \qquad t \in \mathbf{R}_+^N,
\]
where $B$ denotes the $N$-parameter Brownian sheet. We are now in a position to write our model in a sensible way: Find a continuous adapted process $X = (X_t;\, t \in \mathbf{R}_+^N)$ such that $dX_t = \alpha(X_t)\,dB_t + \beta(X_t)\,dt$. We can interpret this further as the following stochastic integral equation:
\[
X_t - X_0 = \int_{[0,t]} \alpha(X_s)\,dB_s + \int_{[0,t]} \beta(X_s)\,ds, \qquad t \in \mathbf{R}_+^N. \tag{3}
\]
This formulation of our hyperbolic SPDE is at least in terms of objects each of which is, in principle, perfectly well defined. Moreover, when $\alpha(u) \equiv 1$ and $\beta(u) \equiv 0$, the solution with $X_0 \equiv 0$ is $X_t = B_t$. That is, when $N = 2$, the Brownian sheet has the physical interpretation of being the solution to the stochastic vibrating string problem in the transformed space, when $\alpha \equiv 1$ and $\beta \equiv 0$. In the next subsection we show that if $\alpha$ and $\beta$ have nice regularity features, the general hyperbolic SPDE of equation (3) has a solution that is unique, given the "boundary" value $X_0$.

A notable example is the following special case. Suppose $X_0 = 0$, $N = 2$, $\alpha(u) \equiv T^{-1}$, and $\beta(u) \equiv 0$. Then this equation, once transformed, solves the following vibrating string problem: "A flat string is subject to a 'truly random' external force that forces it to vibrate. Then, the displacement of the string at time $t$ at position $x$ is $X_{t,x/c}$, where $c = \sqrt{T/\rho}$ and $X$ solves equation (3)." By allowing $\beta(u)$ to be a constant other than 0, we can allow for damped strings, and by allowing a general $\beta$, we get the full stochastic vibrating string equation with nonlinear damping.

⁶Strictly speaking, this is nonsense and needs to be interpreted in the sense of distributions. However, we will not have need for a rigorous interpretation of the above, since we plan to transform things into a rigorously definable equation.
4.3 Hyperbolic SPDEs: Existence and Uniqueness Issues

Let $B$ denote a real-valued, $N$-parameter Brownian sheet, and consider the following hyperbolic SPDE:
\[
X_t = x_0 + \int_{[0,t]} \alpha(X_s)\,dB_s + \int_{[0,t]} \beta(X_s)\,ds, \qquad t \in \mathbf{R}_+^N, \tag{1}
\]
where $\alpha$ and $\beta$ are suitably chosen functions. We now prove the existence and uniqueness of a solution to the hyperbolic SPDE (1), under some regularity conditions on $\alpha$ and $\beta$.

We say that a function $f : \mathbf{R} \to \mathbf{R}$ is globally Lipschitz if there exists a constant $\Gamma$ such that for all $x, y \in \mathbf{R}$, $|f(x) - f(y)| \le \Gamma |x - y|$. The smallest such $\Gamma$ is called the Lipschitz norm of $f$ and is denoted by $\|f\|_L$; i.e.,
\[
\|f\|_L = \sup_{\substack{x,y\in\mathbf{R}\\ x\ne y}} \frac{|f(x)-f(y)|}{|x-y|}.
\]
Note that $\|f\|_L \le \|f'\|_\infty$ if $f$ happens to be differentiable. Moreover, this inequality is not improvable; see Supplementary Exercise 6 in this connection.

Theorem 4.3.1 Suppose $\alpha$ and $\beta$ are bounded and globally Lipschitz functions. Then, for every $x_0 \in \mathbf{R}$, the hyperbolic SPDE (1) has a continuous adapted solution that is a.s. unique.

Almost sure uniqueness of a continuous adapted solution $X$ means that any other continuous adapted solution $\widetilde X$ to (1) is necessarily a modification of $X$. Before proving this result, we shall state a useful analytical estimate.

Lemma 4.3.1 (Gronwall's Lemma) Suppose $\varphi_1, \varphi_2, \ldots : \mathbf{R}_+^N \to \mathbf{R}_+$ are measurable and nondecreasing in each of their $N$ coordinate variables. Suppose, further, that there exist finite constants $C, T > 0$ such that for all $t \in [0,T]$ and all $n \ge 1$,
\[
\varphi_{n+1}(t) \le C \int_{[0,t]} \varphi_n(s)\,ds.
\]
Then, for all $n \ge 1$ and all $t \in [0,T]$,
\[
\varphi_{n+1}(t) \le \frac{C^n \big\{\prod_{\ell=1}^N t^{(\ell)}\big\}^n}{(n!)^N}\, \varphi_1(t).
\]

Exercise 4.3.1 Prove Gronwall's lemma.
Proof of Theorem 4.3.1 We use Picard's iteration scheme from classical ODEs. That is, we show that there is always a fixed-point solution to equation (1), and that it is unique. Define the process $X_t^0 \equiv x_0$, and iteratively let
\[
X_t^{n+1} = x_0 + \int_{[0,t]} \alpha(X_s^n)\,dB_s + \int_{[0,t]} \beta(X_s^n)\,ds.
\]
By induction, $X^n$ is continuous, and both integrals are always well-defined, since $\alpha$ and $\beta$ are bounded and continuous. We proceed by showing that, in a suitable sense, $n \mapsto X_t^n$ is Cauchy, uniformly over $t$-compacts. To this end, let us fix a nonrandom $T \in \mathbf{R}_+^N$ and consider, for $m, n \ge 0$ and all $t \in [0,T]$,
\[
X_t^{m+1} - X_t^{n+1} = \int_{[0,t]} \big[\alpha(X_s^m) - \alpha(X_s^n)\big]\,dB_s + \int_{[0,t]} \big[\beta(X_s^m) - \beta(X_s^n)\big]\,ds.
\]
Clearly,
\[
\sup_{t\in[0,T]} \Big|\int_{[0,t]} \big[\beta(X_s^m) - \beta(X_s^n)\big]\,ds\Big| \le \|\beta\|_L \cdot \int_{[0,T]} |X_s^n - X_s^m|\,ds.
\]
Thus, by the Cauchy–Schwarz inequality, for all $p \ge 1$,
\[
\mathrm{E}\Big[\sup_{t\in[0,T]} \Big|\int_{[0,t]} \big[\beta(X_s^m) - \beta(X_s^n)\big]\,ds\Big|^{2p}\Big] \le A_1 \int_{[0,T]} \mathrm{E}\{|X_s^n - X_s^m|^{2p}\}\,ds,
\]
where $A_1 = \|\beta\|_L^{2p} \cdot \big\{\prod_{\ell=1}^N T^{(\ell)}\big\}^{2p-1}$. Likewise, by the Burkholder–Davis–Gundy inequality (Theorem 4.1.2),
\[
\mathrm{E}\Big[\sup_{t\in[0,T]} \Big|\int_{[0,t]} \big[\alpha(X_s^m) - \alpha(X_s^n)\big]\,dB_s\Big|^{2p}\Big] \le c(p)\,4^{pN}\, \mathrm{E}\Big[\Big(\int_{[0,T]} \big|\alpha(X_s^m) - \alpha(X_s^n)\big|^2\,ds\Big)^p\Big] \le c(p)\,4^{pN}\,\|\alpha\|_L^{2p} \cdot \mathrm{E}\Big[\Big(\int_{[0,T]} |X_s^m - X_s^n|^2\,ds\Big)^p\Big] \le A_2 \int_{[0,T]} \mathrm{E}\{|X_s^n - X_s^m|^{2p}\}\,ds,
\]
where $A_2 = c(p)\,4^{pN}\,\|\alpha\|_L^{2p} \cdot \big\{\prod_{\ell=1}^N T^{(\ell)}\big\}^{p-1}$. Combining terms, we have verified the following bound:
\[
\mathrm{E}\Big[\sup_{t\in[0,T]} |X_t^{m+1} - X_t^{n+1}|^{2p}\Big] \le 4^p (A_1 + A_2) \int_{[0,T]} \mathrm{E}\Big[\sup_{s\in[0,t]} |X_s^m - X_s^n|^{2p}\Big]\,dt.
\]
We have invoked the elementary inequality $(a+b)^{2p} \le 4^p (a^{2p} + b^{2p})$. Use the above with $m = n+1$ and apply Gronwall's lemma (Lemma 4.3.1) to $\varphi_{n+1}(t) = \mathrm{E}\{\sup_{s\in[0,t]} |X_s^{n+1} - X_s^n|^{2p}\}$ and $C = 4^p(A_1 + A_2)$ to see that
\[
\mathrm{E}\Big[\sup_{t\in[0,T]} |X_t^{n+1} - X_t^n|^{2p}\Big] \le \frac{C^n \big\{\prod_{\ell=1}^N T^{(\ell)}\big\}^n}{(n!)^N}\, \mathrm{E}\Big[\sup_{t\in[0,T]} |X_t^1 - X_t^0|^{2p}\Big].
\]
On the other hand, $X_t^1 - X_t^0 = \alpha(x_0) B_t + \beta(x_0) \prod_{\ell=1}^N t^{(\ell)}$. Thus,
\[
\mathrm{E}\Big[\sup_{t\in[0,T]} |X_t^1 - X_t^0|^{2p}\Big] \le 4^p \Big[ |\alpha(x_0)|^{2p} \cdot \mathrm{E}\Big(\sup_{t\in[0,T]} B_t^{2p}\Big) + |\beta(x_0)|^{2p} \cdot \prod_{\ell=1}^N |T^{(\ell)}|^{2p} \Big] = C(p),
\]
which is finite. (Why?) This shows that
\[
\sum_{n=1}^\infty \mathrm{E}\Big[\sup_{t\in[0,T]} |X_t^{n+1} - X_t^n|^{2p}\Big] < +\infty.
\]
In particular, by the completeness of $L^p$ spaces and by the Borel–Cantelli lemma, there exists a process $X$ such that for all $T \in \mathbf{R}_+^N$,
\[
\lim_{n\to\infty} \sup_{t\in[0,T]} |X_t^n - X_t| = 0,
\]
almost surely and in $L^p(\mathrm{P})$ for all $p \ge 1$. The process $X$ is clearly continuous and adapted, since the $X^n$'s are. Moreover, it is our solution to the hyperbolic SPDE (1); this fact is relegated to Exercise 4.3.2 below.

We now show uniqueness. Suppose $Y$ is another continuous adapted solution. Then the same argument as above shows that for all $T \in \mathbf{R}_+^N$ and $p \ge 1$, there exists a finite constant $A_p$ such that for all $t \in [0,T]$,
\[
\varphi(t) \le A_p \int_{[0,t]} \varphi(s)\,ds,
\]
where $\varphi(t) = \mathrm{E}\{\sup_{s\in[0,t]} |X_s - Y_s|^{2p}\}$. Since $\alpha$ and $\beta$ are bounded, Exercise 4.3.3 shows that $\varphi(t)$ is finite for all $t \in \mathbf{R}_+^N$. In particular, by Gronwall's lemma applied with $C = A_p$ and $\varphi_n = \varphi$ for all $n$ and all $t \in [0,T]$,
\[
\varphi(t) \le \frac{D^n}{(n!)^N}\, \varphi(t),
\]
where $D$ is a finite constant. Let $n \to \infty$ to see that $\varphi(t) \equiv 0$. In particular, for all $T \in \mathbf{R}_+^N$, $\mathrm{P}(X_t = Y_t \text{ for all } t \in [0,T]) = 1$. Let $T \uparrow \infty$, i.e., coordinatewise, to prove the asserted uniqueness.
Exercise 4.3.2 Show that in the proof of Theorem 4.3.1, $X = \lim_{n\to\infty} X^n$ solves equation (1).

Exercise 4.3.3 Prove that in the proof of Theorem 4.3.1, the boundedness of $\alpha$ and $\beta$ guarantees that $\varphi(t) < +\infty$ for all $t \in \mathbf{R}_+^N$. (Hint: Gronwall's lemma.)

Exercise 4.3.4 Prove that the solution to the hyperbolic SPDE (1) is, in fact, Hölder continuous. (Hint: Theorem 2.3.1, Chapter 5.)
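The Picard scheme of the proof can be run directly. The sketch below is only illustrative: it specializes to one parameter ($N = 1$), discretizes both integrals by the left-point rule on a fixed simulated Brownian path, and uses $\alpha = \sin$, $\beta = \cos$ as stand-ins for bounded, globally Lipschitz coefficients.

```python
import math
import random

def picard_iterates(n_steps=200, n_iter=205, x0=0.0, seed=3):
    """Left-point Picard scheme for X = x0 + int a(X) dB + int b(X) ds, N = 1.

    a = sin and b = cos are stand-in bounded Lipschitz coefficients; the
    Brownian increments are drawn once and then held fixed.  Returns the
    sup-differences max_j |X^{n+1}_j - X^n_j| across iterations.
    """
    rng = random.Random(seed)
    dt = 1.0 / n_steps
    dB = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n_steps)]
    X = [x0] * (n_steps + 1)      # X^0 is the constant path x0
    gaps = []
    for _ in range(n_iter):
        X_new, acc = [x0], x0
        for j in range(n_steps):
            acc += math.sin(X[j]) * dB[j] + math.cos(X[j]) * dt
            X_new.append(acc)
        gaps.append(max(abs(u - v) for u, v in zip(X, X_new)))
        X = X_new
    return gaps

gaps = picard_iterates()
print(gaps[0], gaps[-1])  # the final gap is exactly 0 once n_iter > n_steps
```

A pleasant artifact of the discretization is that coordinate $j$ of the iterate freezes after $j$ rounds, so the discrete scheme reaches its fixed point exactly; in the continuum, the proof instead yields convergence in $L^p$ at factorial speed, via Gronwall's lemma.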
5 Supplementary Exercises

1. Suppose that for each $n \ge 1$, $M^n$ is an $N$-parameter martingale with respect to one fixed $N$-parameter filtration $F$. Suppose further that for every $t \in \mathbf{R}_+^N$, $M_t^n$ converges in $L^1(\mathrm{P})$ to some $M_t$ as $n \to \infty$. Prove that $M$ is an $N$-parameter martingale.

2. Suppose $S = (S_n;\, n \ge 0)$ denotes the simple symmetric random walk on $\mathbf{Z}$; cf. Section 3 of Chapter 3. For every $a \in \mathbf{Z}$, define $T_a = \inf(k \ge 0 : S_k = a)$ to be the entrance time to $\{a\}$, and show that for all $a, b \ge 0$, $\mathrm{P}(T_{-a} < T_b) = b/(a+b)$. Here is a gambling interpretation: Suppose you gamble on a fair game independently one time after another. Suppose further that on every trial you either lose or win a dollar. Then $S_n$ denotes your net profit by the $n$th trial, and the above computes the probability that you go bankrupt before reaching the house limit.

3. If $\mathrm{P}(Z \ge 0) = 1$ and $\mathrm{E}[Z] = 0$, show that $\mathrm{P}(Z = 0) = 1$.

4. If $M$ denotes a continuous one-parameter local martingale, show that there exists a unique nondecreasing, continuous adapted process $[M]$ such that with probability one, $[M]_0 = 0$ and $t \mapsto M_t^2 - [M]_t$ is a continuous local martingale.

5. Suppose $M$ is a continuous one-parameter local martingale, and $X$ is a one-parameter continuous adapted process such that almost surely, $\int_0^t X_s^2\,d[M]_s < \infty$ for all $t \ge 0$. Construct a stochastic integral process $t \mapsto M(X)_t$ such that $M(X)$ is a continuous local martingale with quadratic variation $t \mapsto \int_0^t X_s^2\,d[M]_s$. Moreover, show that for all continuous, adapted processes $X$ and $Y$ such that $\int_0^t (X_s^2 + Y_s^2)\,d[M]_s < +\infty$ a.s. for all $t \ge 0$, with probability one, $M(X+Y)_t = M(X)_t + M(Y)_t$ for all $t \ge 0$.

6. Prove that whenever $f : \mathbf{R} \to \mathbf{R}$ is globally Lipschitz, it has a derivative at almost every point. However, find an example to show that such an $f$ need not have a derivative at every point. (Hint: For the interval $I = (x,y)$, define $\mu(I) = f(y) - f(x)$, and show that $\mu$ extends to an absolutely continuous measure on the Borel subsets of $\mathbf{R}$.)
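Exercise 2 can be checked by direct simulation. For the fair walk, $\mathrm{P}(T_{-a} < T_b) = b/(a+b)$; the sketch below (the levels $a = 3$, $b = 7$, the trial count, and the seed are arbitrary choices) estimates this hitting probability, which should come out near $0.7$.

```python
import random

def prob_ruin_first(a=3, b=7, n_trials=20_000, seed=0):
    """Monte Carlo estimate of P(T_{-a} < T_b) for the simple random walk."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        s = 0
        while -a < s < b:                  # walk until one barrier is hit
            s += 1 if rng.random() < 0.5 else -1
        hits += (s == -a)
    return hits / n_trials

print(prob_ruin_first())  # should be near b/(a+b) = 0.7
```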
7. Suppose $X$ ($M$) is a one-parameter continuous adapted (martingale) process. Let $r_{j,n} = j2^{-n}$, and prove the existence of
\[
\int_0^t X_s \circ dM_s = \lim_{n\to\infty} \sum_{1\le j\le 2^n t} \frac{X_{r_{j-1,n}} + X_{r_{j,n}}}{2} \cdot \big(M_{r_{j,n}} - M_{r_{j-1,n}}\big).
\]
This stochastic integral is called the Stratonovich integral, as found in Stratonovich (1966). Identify $\int_0^t X_s \circ dM_s$ in terms of $\int_0^t X_s\,dM_s$ explicitly, and show that for all twice continuously differentiable functions $f : \mathbf{R} \to \mathbf{R}$, $f(M_t) = f(M_0) + \int_0^t f'(M_s) \circ dM_s$, for all $t \ge 0$, a.s. That is, the Stratonovich calculus follows the rules of the ordinary calculus of functions, while the Itô calculus does not. Finally, show that whenever $X$ has zero quadratic variation, the two integrals agree. That is, for all $t \ge 0$, $\int_0^t X_s\,dM_s = \int_0^t X_s \circ dM_s$, a.s. (Hint: Imitate the proof of Itô's lemma, Theorem 3.8.1.)

8. A $d$-dimensional Brownian motion $B$ is defined by $B_t = (B_t^{(1)}, \ldots, B_t^{(d)})$, where $B^{(1)}, \ldots, B^{(d)}$ are $d$ independent standard Brownian motions. Prove that for all twice continuously differentiable functions $f : \mathbf{R}^d \to \mathbf{R}$, the following holds a.s. for all $t \ge 0$:
\[
f(B_t) = f(0) + \sum_{i=1}^d \int_0^t \frac{\partial f}{\partial x^{(i)}}(B_s)\,dB_s^{(i)} + \frac12 \int_0^t \Delta f(B_s)\,ds,
\]
where $\Delta f(x) = \sum_{j=1}^d \frac{\partial^2 f}{\partial (x^{(j)})^2}(x)$ is the Laplacian of $f$.
9. Let $M$ denote a one-parameter continuous $L^2(\mathrm{P})$-martingale with respect to a filtration $F$ that satisfies the usual conditions. Given an adapted process $V$ of bounded variation, show that for all $f : \mathbf{R} \times \mathbf{R} \ni (x,v) \mapsto f(x,v) \in \mathbf{R}$ that are twice continuously differentiable in $x$ and continuously differentiable in $v$,
\[
f(M_t, V_t) = f(M_0, V_0) + \int_0^t \frac{\partial f}{\partial x}(M_s, V_s)\,dM_s + \int_0^t \frac{\partial f}{\partial v}(M_s, V_s)\,dV_s + \frac12 \int_0^t \frac{\partial^2 f}{\partial x^2}(M_s, V_s)\,d[M]_s,
\]
where the $dV$ integral is defined ($\omega$ by $\omega$) as a Stieltjes integral.

10. Given a $d$-dimensional Brownian motion $B$, an adapted process $V$ of bounded variation, and a sufficiently smooth function $f : \mathbf{R}^d \times \mathbf{R} \to \mathbf{R}$, show that
\[
f(B_t, V_t) = f(0, V_0) + \sum_{j=1}^d \int_0^t \frac{\partial f}{\partial x^{(j)}}(B_s, V_s)\,dB_s^{(j)} + \int_0^t \frac{\partial f}{\partial v}(B_s, V_s)\,dV_s + \frac12 \int_0^t \Delta f(B_s, V_s)\,ds,
\]
where $\Delta$ is the Laplacian applied to the $x$ variable. Apply this to show that when $d = 1$, $E_t = \exp(B_t - \frac12 t)$ is a continuous martingale that solves the stochastic differential equation $E_t = 1 + \int_0^t E_s\,dB_s$. (Hint: See Exercise 8.)
11. We wish to prove Theorem 4.1.1. Throughout, $F = (F_t;\, t \in \mathbf{R}_+^N)$ denotes the complete augmented history of the Brownian sheet. (i) We say that an $N$-parameter process $X$ is elementary if it is adapted to $F$ and if there are $\alpha \prec \beta$ in $\mathbf{R}_+^N$ and a bounded $F_\beta$-measurable random variable $\Theta$ such that $X_s = \Theta\, \mathbf{1}_{]\alpha,\beta]}(s)$ ($s \in \mathbf{R}_+^N$). Prove that Theorem 4.1.1 holds for elementary processes in place of continuous, adapted processes. (ii) A simple process is defined to be a finite linear combination of elementary processes. Extend the previous argument to include integrands that are simple processes. (iii) Conclude the proof of Theorem 4.1.1 by proving a multiparameter version of Lemma 3.6.1.

12. If $B$ denotes the standard Brownian motion: (i) Show that for all $\lambda > 0$, $\mathrm{P}(\sup_{0\le s\le 1} B_s \ge \lambda) = \mathrm{P}(|B_1| \ge \lambda)$. (ii) Deduce that for all $p, t > 0$, $\mathrm{E}[\sup_{0\le s\le t} B_s^{2p}] = \mathrm{E}[B_t^{2p}]$. Compute this. (iii) Improve the constant $c(p)$ of Corollary 3.9.2 to 1 when only nonrandom times $T$ are considered. (Hint: Part (i) is the reflection principle of D. André; see Supplementary Exercise 3 of Chapter 4 for the discrete case. For part (ii), the expectation $\mathrm{E}[B_1^{2p}]$ can be computed directly using the properties of Gaussian densities. However, it can also be computed using Itô's formula. You may wish to try this neat approach!)

13. Let $B$ denote the $N$-parameter Brownian sheet. Show that when $X$ is a continuous nonrandom function, the stochastic integral $B(X)$ of this chapter is a.s. the same as the stochastic integral of Chapter 5.

14. (Hard) Recall that a function $f : \mathbf{R}^d \to \mathbf{R}$ is locally Lipschitz if for all $M > 0$, there exists a finite constant $\Gamma_M > 0$ such that for all $x, y \in [-M,M]^d$, $|f(x) - f(y)| \le \Gamma_M |x - y|$. Suppose (i) $\alpha$ and $\beta$ are locally Lipschitz; and (ii) there exists a finite constant $C > 0$ such that for all $x \in \mathbf{R}$, $|\alpha(x)| + |\beta(x)| \le C\{1 + |x|\}$. Show that the conclusion of Theorem 4.3.1 still holds true.
(Hint: Using the notation of our proof of Theorem 4.3.1, start by showing that for any $t \in \mathbf{R}_+^N$, $\mathrm{E}\{\sup_{r\in[0,t]} |X_r^n|^{2p}\}$ is bounded in $n$. You may need to derive the following variant of Gronwall's lemma: If $\varphi_{n+1}(t) \le C\{1 + \int_{[0,t]} \varphi_n(s)\,ds\}$, then $\sup_n \varphi_n$ is locally bounded.)

15. Suppose $\alpha$ and $\beta$ are globally Lipschitz and bounded functions, and consider the SPDE of equation (1) of Section 4.3. Let $X = (X_{x_0,t};\, t \in \mathbf{R}_+^N,\, x_0 \in \mathbf{R})$ denote the solution, viewed as a random function of $t$ as well as of the "starting point" $x_0$. Show that there exists a Hölder continuous modification of $(x_0,t) \mapsto X_{x_0,t}$. This process is a stochastic flow, viewed as a function-valued function of $x_0$.
6 Notes on Chapter 7

Sections 1, 3 The material of these two sections is standard fare in the theory of continuous martingales. See Chung and Williams (1990), Karatzas and Shreve (1991), and Revuz and Yor (1994) for three different pedagogic accounts. For an encyclopedic treatment, see Dellacherie and Meyer (1982). Our proof of Lemma 1.8.1 is motivated by ideas from Kunita and Watanabe (1967), which is excellent reading to this day. Our proof of the Burkholder–Davis–Gundy inequality (Theorem 3.9.1) does not give sharp constants, but has the advantage of being brief. Theorem 3.9.1 holds for all $p > 0$; see Bass (1987) for a proof, and much more. The fact that Theorem 3.9.1 holds for all $p > 0$ also follows from the fact that it holds for all $p \ge 1$, used in conjunction with the multiplier theorem of J. Marcinkiewicz; see Stein (1993).

Section 2 Walsh (1986b, 1986a) are excellent places to start learning more about the general theory of multiparameter martingales. Since the general literature on this subject is massive, we merely refer the reader to the following references, which are more or less directly related to this chapter: Bakry (1979, 1981b, 1982), Imkeller (1985, 1988), Mazziotto and Merzbach (1985), Mazziotto and Szpirglas (1981, 1982), and Nualart (1985). See Merzbach and Nualart (1985) for a survey of various aspects of the general theory, and Körezlioğlu et al. (1981) for some of the more recent activity.

There is a powerful regularity theorem, due to D. Bakry, that proves the multiparameter analogue of Theorem 1.4.1. The basic message is that all $L(\ln^+ L)^{N-1}$-bounded, $N$-parameter martingales (with respect to commuting filtrations) have a right-continuous modification. Note that for this modification, Corollary 2.3.1 always holds; cf. Theorem 2.3.2. Bakry's regularity theorem is worked out in detail in Bakry (1979) in the 2-parameter case, using Cairoli's inequalities and the general theory of 2-parameter processes, in particular a section theorem found by E.
Merzbach. This general theory is well described in Dozzi (1989, 1991) and Imkeller (1988).

Section 4 The heuristic discussion of Section 4.2 is modeled after aspects of the presentations of Cabaña (1991) and Walsh (1986a, 1986b). This section presents stochastic integration against the Brownian sheet by a multiparameter extension of Itô integrals. Since such integrals are all that we develop in this book, this approach is sufficient for our needs. However, stochastic integration against more general multiparameter martingales is truly a more complicated story. Much of this is developed in (Imkeller 1988; Nualart 1985; Walsh 1986a) and their combined references. Further extensions of these notions can be found in Nualart (1995).
8 Constructing Markov Processes
Markov processes will provide us with a large class of processes that can be used as building blocks for useful and interesting random fields. This chapter and the next are a brief introduction to Markov processes. Although we have already encountered the Markov property in some specialized settings, e.g., in Chapters 3 and 7, the general theory of Markov processes is substantially more complicated; thus, we restrict attention to a nice class of Markov processes known as Feller processes. To fix the basic ideas, we begin with the simplest case, which is that of discrete Markov chains. Our treatment of the continuous-time theory will follow, starting with Section 2.
1 Discrete Markov Chains

We start our development of the theory of Markov processes in its simplest setting: discrete Markov chains. These are discrete-time, discrete-space processes that possess the Markov property. Once some of the key concepts and methods are isolated, we proceed with our development of the more complicated continuous-time theory.
1.1 Preliminaries

If S denotes a denumerable set, we say that a stochastic process X = (X_n; n ≥ 0) is a (discrete) Markov chain with state space S if there exists a filtration F = (F_n; n ≥ 0) such that:
1. X is an S-valued process that is adapted to F;

2. for all n ≥ 0 and all a ∈ S, P(X_{n+1} = a | F_n) = P(X_{n+1} = a | X_n), a.s.¹

¹ Recall that for any random variable Y and event E, P(E | Y) is shorthand for P{E | σ(Y)}, where σ(Y) is the σ-field generated by Y. A similar remark holds for conditional expectations.

Property 2 above is called the Markov property of X. The following is an important exercise.

Exercise 1.1.1 If H = (H_n; n ≥ 0) denotes the history of a Markov chain X, then for all a ∈ S and all n ≥ 0, P(X_{n+1} = a | H_n) = P(X_{n+1} = a | X_n), a.s.

Thus, as a consequence of Exercise 1.1.1, unless a specific filtration is mentioned, the underlying filtration is tacitly assumed to be the history of X. For all n, k ≥ 0 and a, b ∈ S, define
\[
p_{n,n+k}(a,b) = \begin{cases} P(X_{n+k} = b \mid X_n = a), & \text{if } P(X_n = a) > 0,\\ 0, & \text{otherwise.}\end{cases}
\]
In words, p_{n,n+k}(a, b) denotes the probability that our chain goes from a to b, starting at time n, and ending at time n + k. Recall that on (X_n = a), p_{n,n+1}(a, b) = P(X_{n+1} = b | X_n). Thus, the Markov property can be restated as follows: For all n ≥ 0 and all a ∈ S, p_{n,n+1}(a, b) = P(X_{n+1} = b | F_n), on (X_n = a). Equivalently, we have the Markov property if and only if p_{n,n+1}(X_n, b) = P(X_{n+1} = b | F_n), a.s.

We say that X is a time-homogeneous Markov chain if for all n, k ≥ 0, p_{n,n+k} = p_{0,k}. One can usually reduce attention to time-homogeneous Markov chains, as the following shows.

Lemma 1.1.1 Let X be an S-valued Markov chain. Define Y = (Y_n; n ≥ 0) by Y_n = (n, X_n), n ≥ 0. Then Y is an N_0 × S-valued, time-homogeneous Markov chain.

Proof Let F denote the history of X and note that F is also the history of Y. Clearly, P(Y_{n+k} = y | F_n) = P(Y_{n+k} = y | Y_n), a.s.
That is, we have the Markov property. It suffices to prove time-homogeneity. We can write any x ∈ N_0 × S as x = (x^{(1)}, x^{(2)}), where x^{(1)} is a nonnegative integer and x^{(2)} ∈ S. With the above in mind, note that
\[
P(Y_{n+k} = y \mid Y_n = x) = \begin{cases} P\big(X_{x^{(1)}+k} = y^{(2)} \mid X_{x^{(1)}} = x^{(2)}\big), & \text{if } y^{(1)} = x^{(1)} + k,\\ 0, & \text{otherwise.}\end{cases}
\]
Since this is independent of n, our proof is complete.
Since the map X → Y is a tractable one, it is often sufficient to study time-homogeneous Markov chains. This motivates the following simplification.

Convention Unless we state otherwise, Markov chains are to be assumed time-homogeneous.

The transition probabilities of a Markov chain X are the collection of the probabilities p_{0,n}(x, y), n ≥ 0, x, y ∈ S. We will write p_n(x, y) for p_{0,n}(x, y) for brevity. Thus, in this time-homogeneous setting, the Markov property can be recast as follows: For all n ≥ 0 and all y ∈ S,
\[
P(X_{n+1} = y \mid F_n) = p_1(X_n, y), \quad \text{a.s.} \tag{1}
\]
The function (x, y) → p_k(x, y) is the k-step transition function of X. In words, p_k(x, y) denotes the probability that in k time steps, the process moves from state x ∈ S to state y ∈ S. The 1-step transition function p_1 is called simply the transition function of X, and thanks to (1), it determines all of the finite-dimensional distributions of X. Indeed, for any a_0, ..., a_m ∈ S,
\[
P(X_0 = a_0, \ldots, X_m = a_m) = p_1(a_{m-1}, a_m)\, P(X_0 = a_0, \ldots, X_{m-1} = a_{m-1}).
\]
By iterating this, we see that
\[
P(X_0 = a_0, \ldots, X_m = a_m) = P(X_0 = a_0) \prod_{\ell=0}^{m-1} p_1(a_\ell, a_{\ell+1}). \tag{2}
\]
Consequently, if we know the transition function, as well as the distribution of X_0, we can, in principle, compute the probability of any interesting event pertaining to the process X. From now on we refer to the distribution of X_0 as the initial distribution of the Markov chain X. Let ν denote the initial distribution of X, and for any event E, define
\[
P_\nu(E) = \sum_{a \in S} \nu(\{a\})\, P(E \mid X_0 = a).
\]
Clearly, P_ν is a probability measure on S endowed with its Borel sets (in the counting topology). Moreover, suppose we define probability measures (P_a; a ∈ S) by P_a(•) = P(• | X_0 = a). Then, the above shows that P_ν(•) = ∫ P_a(•) ν(da). Thus, we should think of P_ν as the natural probability measure for the Markov chain X, given that X_0 has distribution ν. Moreover, it is easy to see that the above definitions are consistent. That is, if we denote the point mass at x ∈ S by δ_x, then P_x = P_{δ_x}. Henceforth, we will write E_ν and E_x for the expectation operators corresponding to P_ν and P_x, respectively. Equation (2) can now be recast in the following way.

Theorem 1.1.1 (The Chapman–Kolmogorov Equation) For any integer m ≥ 0 and all bounded functions f : S^{m+1} → R,
\[
E_\nu\big[f(X_0, \ldots, X_m)\big] = \sum_{a_0, \ldots, a_m \in S} f(a_0, \ldots, a_m)\, \nu(\{a_0\}) \prod_{\ell=0}^{m-1} p_1(a_\ell, a_{\ell+1}).
\]
Example (Random Walks) Suppose ξ_1, ξ_2, ... are i.i.d. Z^d-valued random vectors, and recall from Chapter 3 that the corresponding random walk X = (X_n; n ≥ 1) is defined by X_n = \sum_{j=1}^{n} ξ_j. Now we introduce an extra Z^d-valued random vector ξ_0, independently of ξ_1, ξ_2, ..., and with distribution ν. Let us then redefine
\[
X_n = \sum_{j=0}^{n} \xi_j, \qquad n \ge 0.
\]
In this way we see that the process X = (X_n; n ≥ 0) is a Markov chain on S = Z^d with initial measure ν; cf. Theorem 1.1.1 of Chapter 3. Moreover, the transition function of X is given by
\[
p_1(x, y) = P(\xi_1 = y - x), \qquad x, y \in S.
\]
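The spatial homogeneity of the walk's transition function can be sketched concretely; the simple ±1 step distribution below is an illustrative assumption:

```python
# Hypothetical simple walk on Z with step distribution P(xi = -1) = P(xi = +1) = 1/2.
step = {-1: 0.5, 1: 0.5}

def p1(x, y):
    # Transition function of the walk: p1(x, y) = P(xi_1 = y - x).
    return step.get(y - x, 0.0)

def p2(x, y):
    # Two-step transition obtained by summing over the intermediate state,
    # in the spirit of the Chapman-Kolmogorov equation.
    return sum(p1(x, x + s) * p1(x + s, y) for s in step)
```

Note that `p1(x, y)` depends on (x, y) only through the increment y − x, which is exactly the spatial homogeneity of random walks.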
As such, Theorem 1.1.1 should be viewed as an extension of Corollary 1.1.2, Chapter 3.

Exercise 1.1.2 Suppose X = (X_n; n ≥ 0) denotes a symmetric random walk on Z^d. Recall that this means that −X and X have the same finite-dimensional distributions. If γ denotes the counting (or uniform) measure on Z^d, show that for all m ≥ 0 and all f_1, ..., f_m : Z^d → R_+,
\[
E_\gamma\Big[\prod_{j=1}^{m} f_j(X_j)\Big] = E_\gamma\Big[\prod_{j=1}^{m} f_j(X_{m-j})\Big],
\]
where E_γ[•] = ∫ E_x[•] γ(dx) is well-defined, although γ is not a probability measure. In brief, conclude that if the initial (nonprobability) measure is γ, the process (X_n; 0 ≤ n ≤ m) has the same finite-dimensional distributions as (X_{m−n}; 0 ≤ n ≤ m). This is an example of time-reversal. If P_γ(•) = E_γ[1l_•] is the corresponding "probability," what is P_γ(X_n ∈ A)?

We conclude this subsection with the following variant of Theorem 1.1.1.

Theorem 1.1.2 Suppose m is a positive integer and f : S^m → R is a bounded function. Then, whenever ν denotes the initial measure of the Markov chain X, for any integer n ≥ 0,
\[
E_\nu\big[f(X_{n+1}, \ldots, X_{n+m}) \mid F_n\big] = E_{X_n}\big[f(X_1, \ldots, X_m)\big], \qquad P_\nu\text{-a.s.}
\]
Proof We first check this for m = 1. Indeed, by equation (1),
\[
E_\nu\big[f(X_{n+1}) \mid F_n\big] = \sum_{y \in S} f(y)\, p_1(X_n, y) = E_{X_n}\big[f(X_1)\big], \qquad P_\nu\text{-a.s.},
\]
since for all x ∈ S, E_x[f(X_1)] = \sum_{y \in S} f(y) p_1(x, y). To conclude, let us assume that Theorem 1.1.2 holds for m = j. We will show that it also holds for m = j + 1. In fact, it suffices to prove that for all y_1, ..., y_{j+1} ∈ S, the following holds P_ν-a.s.:
\[
P_\nu(X_{n+1} = y_1, \ldots, X_{n+j+1} = y_{j+1} \mid F_n) = P_{X_n}(X_1 = y_1, \ldots, X_{j+1} = y_{j+1}). \tag{3}
\]
By equation (1) (or the induction hypothesis), we have the following P_ν-a.s.:
\[
\begin{aligned}
P_\nu(X_{n+1} = y_1, \ldots, X_{n+j+1} = y_{j+1} \mid F_n) &= E_\nu\Big[P_\nu(X_{n+j+1} = y_{j+1} \mid F_{n+j}) \prod_{\ell=1}^{j} \mathbf{1}_{(X_{n+\ell} = y_\ell)} \,\Big|\, F_n\Big] \\
&= E_\nu\Big[p_1(X_{n+j}, y_{j+1}) \prod_{\ell=1}^{j} \mathbf{1}_{(X_{n+\ell} = y_\ell)} \,\Big|\, F_n\Big].
\end{aligned}
\]
Using the induction hypothesis (possibly one more time), we see that the following holds P_ν-a.s.:
\[
P_\nu(X_{n+1} = y_1, \ldots, X_{n+j+1} = y_{j+1} \mid F_n) = E_{X_n}\Big[p_1(X_j, y_{j+1}) \prod_{\ell=1}^{j} \mathbf{1}_{(X_\ell = y_\ell)}\Big].
\]
Another application of equation (1) proves equation (3) and hence the result.
1.2 The Strong Markov Property

Comparing the example of Section 1.1 to the discussion of Section 1.1, Chapter 3, we should expect that all Markov chains on the state space S should satisfy a stronger form of the Markov property. To this end, suppose X = (X_n; n ≥ 0) is a Markov chain on a denumerable state space S with respect to a filtration F = (F_n; n ≥ 0), and let ν denote the initial measure of X. The following is an extension of the Markov property for random walks (Theorem 1.2.1, Chapter 3) and, in a much more general context, is due to Blumenthal (1957). Precursors to this result can be found in (Hunt 1956b; Kinney 1953).

Theorem 1.2.1 (The Strong Markov Property) For any stopping time T, for all integers m ≥ 1, and for all bounded functions f : S^m → R, the following holds P_ν-almost surely:
\[
E_\nu\big[f(X_{T+1}, X_{T+2}, \ldots, X_{T+m}) \mid F_T\big] \mathbf{1}_{(T<\infty)} = E_{X_T}\big[f(X_1, \ldots, X_m)\big] \mathbf{1}_{(T<\infty)}.
\]
Proof If j denotes any positive integer, we have, P_ν-a.s. on (T = j),
\[
E_\nu\big[f(X_{T+1}, \ldots, X_{T+m}) \mid F_T\big] = E_\nu\big[f(X_{j+1}, \ldots, X_{j+m}) \mid F_T\big] = E_\nu\big[f(X_{j+1}, \ldots, X_{j+m}) \mid F_j\big].
\]
The last step uses Theorem 1.1.1 of Chapter 1. By Theorem 1.1.2,
\[
E_\nu\big[f(X_{T+1}, \ldots, X_{T+m}) \mid F_T\big] \mathbf{1}_{(T=j)} = E_{X_j}\big[f(X_1, \ldots, X_m)\big] \mathbf{1}_{(T=j)}, \qquad P_\nu\text{-a.s.}
\]
Summing this over all j ≥ 0, we can conclude the proof.
1.3 Killing and Absorbing

Let X = (X_n; n ≥ 0) denote a Markov chain with a denumerable state space S and transition function p_1 : S × S → R_+. In this subsection we show how one can perform certain path surgeries on X to obtain new Markov chains. The idea behind killing is as follows: Consider a given set A ⊂ S. We can then construct a new Markov process Y = (Y_k; k ≥ 0) by letting Y equal X until the first time the latter enters the set A. After that, we send Y to a so-called cemetery state. We now proceed with more caution.

Let ∆ ∉ S be a fixed point, called the cemetery or the coffin state, and define the enlarged state space S_∆ = S ∪ {∆}. Recall the entrance times T_A = inf(j ≥ 0 : X_j ∈ A), where inf ∅ = +∞, as usual. Finally, we can define the killed process Y as
\[
Y_n = \begin{cases} X_n, & \text{if } n < T_A,\\ \Delta, & \text{if } n \ge T_A.\end{cases}
\]
The process Y is said to be X killed upon entering A. Intuitively speaking, Y equals X until the latter process first enters A. At that time, we "kill it and send it to the cemetery ∆."

Theorem 1.3.1 The process Y is a Markov chain with state space S_∆ \ A, whose transition function q_1 is given by the following formula:
\[
q_1(x, y) = \begin{cases} p_1(x, y), & \text{if } x, y \in S \setminus A,\\ \sum_{a \in A} p_1(x, a), & \text{if } x \in S \setminus A,\ y = \Delta,\\ \mathbf{1}_{\{\Delta\}}(y), & \text{if } x = \Delta.\end{cases}
\]
Remark Once the transition function is found, we can then compute any or all of the k-step transition functions by Theorem 1.1.1; cf. Supplementary Exercise 2.

Proof Let ν denote the initial measure of X. Since (T_A > n) is F_n-measurable, for any y ∈ S \ A, P_ν-a.s.,
\[
P_\nu(Y_{n+1} = y \mid F_n)\mathbf{1}_{(T_A > n)} = P_\nu(X_{n+1} = y \mid F_n)\mathbf{1}_{(T_A > n)} = p_1(X_n, y)\mathbf{1}_{(T_A > n)},
\]
by (1). Thus, for any y ∈ S \ A, P_ν-a.s.,
\[
P_\nu(Y_{n+1} = y \mid F_n)\mathbf{1}_{(T_A > n)} = p_1(Y_n, y)\mathbf{1}_{(T_A > n)}.
\]
Similarly, we can show that P_ν-a.s.,
\[
P_\nu(Y_{n+1} = \Delta \mid F_n)\mathbf{1}_{(T_A > n)} = \sum_{a \in A} p_1(Y_n, a)\mathbf{1}_{(T_A > n)},
\]
\[
P_\nu(Y_{n+1} = y \mid F_n)\mathbf{1}_{(T_A \le n)} = \mathbf{1}_{\{\Delta\}}(y)\mathbf{1}_{(T_A \le n)}.
\]
The result follows readily from the above together with the fact that (T_A > n) = (Y_n ∈ S \ A).

There is another way to kill the chain X. For any given λ ∈ ]0, 1[, let g_λ denote a geometric random variable with parameter λ that is totally independent of the process X. Recall that the distribution of g_λ is given by P(g_λ = k) = (1 − λ)λ^{k−1}, k = 1, 2, .... We can define the process X killed at rate λ to be given by X^λ = (X^λ_n; n ≥ 0), where
\[
X^\lambda_n = \begin{cases} X_n, & \text{if } g_\lambda > n,\\ \Delta, & \text{if } g_\lambda \le n.\end{cases}
\]
Theorem 1.3.2 The process X^λ is a Markov chain on S_∆ whose transition function p^λ_1 is given by
\[
p^\lambda_1(x, y) = \begin{cases} \lambda p_1(x, y), & \text{if } x, y \in S,\\ 1 - \lambda, & \text{if } x \in S,\ y = \Delta,\\ \mathbf{1}_{\{\Delta\}}(y), & \text{if } x = \Delta.\end{cases}
\]
It is worthwhile to provide an intuitive interpretation for the above procedure. Given that X is at x at time n, in the next step it goes to (some place) y ∈ S with probability p_1(x, y). On the other hand, given that X^λ_n = x, we toss an independent λ-coin;² if the coin lands heads, we then send X^λ to (some place) y ∈ S with probability p_1(x, y). If the coin lands tails, we kill X^λ once and for all.

Proof Clearly, the history of X^λ is F^λ = (F^λ_n; n ≥ 0), where F^λ_n = F_{n∧g_λ}. We can extend the domain of definition of any function f : S → R to S_∆ by defining f(∆) = 0. Now consider arbitrary bounded functions f_1, ..., f_{n+1} : S → R. Since X^λ_0 = X_0, for any initial measure ν (for X_0),
\[
E_\nu\Big[\prod_{j=1}^{n+1} f_j(X^\lambda_j)\Big] = E_\nu\Big[\prod_{j=1}^{n+1} f_j(X_j)\,\mathbf{1}_{(g_\lambda > n+1)}\Big] = \lambda^{n+1}\, E_\nu\Big[\prod_{j=1}^{n+1} f_j(X_j)\Big].
\]
We have used the simple fact that P_ν(g_λ > n + 1) = λ^{n+1}. Let f_{n+1}(x) = \mathbf{1}_{\{y\}}(x), where y ∈ S is fixed. By a monotone class argument, for any initial measure ν, the following holds P_ν-a.s.:
\[
P_\nu(X^\lambda_{n+1} = y \mid F^\lambda_n) = \lambda P_\nu(X_{n+1} = y \mid F_n) = \lambda p_1(X_n, y).
\]
(Why?) The last step uses equation (1) and completes the proof.
A related concept is that of absorption at some fixed state a ∈ S. The idea here is to construct a process Z that equals X until it reaches the set {a}. After that, Z remains in {a}. Recalling that we write T_a for T_{\{a\}}, we can define this more precisely as
\[
Z_n = \begin{cases} X_n, & \text{if } n < T_a,\\ a, & \text{if } n \ge T_a.\end{cases}
\]
The process Z = (Z_k; k ≥ 0) is X absorbed at state a. We then have the following:

Theorem 1.3.3 Let Z be X absorbed at some state a ∈ S. Then, Z is a Markov chain on S whose transition function γ_1 is given by the following:
\[
\gamma_1(x, y) = \begin{cases} p_1(x, y), & \text{if } x \ne a,\\ \mathbf{1}_{\{a\}}(y), & \text{if } x = a.\end{cases}
\]
Exercise 1.3.1 Prove Theorem 1.3.3.

² A λ-coin is one that tosses heads with probability λ.
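The three path surgeries of this subsection can be sketched together on a small chain; the 3-state transition function, the set A, the rate λ, and the absorbing state below are all hypothetical, illustrative choices (with the string 'D' standing in for the coffin state ∆):

```python
# Hedged sketch of Theorems 1.3.1-1.3.3 on a hypothetical 3-state chain.
S = [0, 1, 2]
p1 = {0: {0: 0.5, 1: 0.25, 2: 0.25},
      1: {0: 0.1, 1: 0.6, 2: 0.3},
      2: {0: 0.3, 1: 0.3, 2: 0.4}}
A = {2}          # kill upon entering A
lam = 0.8        # killing rate lambda
a_abs = 1        # absorbing state
D = 'D'          # stands in for the coffin state Delta

def killed_q1(x, y):
    """q1 of Theorem 1.3.1: X killed upon entering A."""
    if x == D:
        return 1.0 if y == D else 0.0
    if y == D:
        return sum(p1[x][a] for a in A)
    return 0.0 if y in A else p1[x][y]

def rate_killed_p1(x, y):
    """p1^lam of Theorem 1.3.2: X killed at rate lam."""
    if x == D:
        return 1.0 if y == D else 0.0
    if y == D:
        return 1.0 - lam
    return lam * p1[x][y]

def absorbed_g1(x, y):
    """gamma1 of Theorem 1.3.3: X absorbed at a_abs."""
    if x == a_abs:
        return 1.0 if y == a_abs else 0.0
    return p1[x][y]

# Each surgery again yields a stochastic transition function: rows sum to one.
killed_states = [s for s in S if s not in A] + [D]
killed_rows = {x: sum(killed_q1(x, y) for y in killed_states)
               for x in killed_states}
```

The row sums confirm that mass lost to A (respectively, to the tails of the λ-coin) is exactly the mass deposited at ∆.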
1.4 Transition Operators

Recalling from the example of Section 1.1 that random walks are Markov chains, we now wish to extend the notion of transition operators developed in Chapter 3, Section 1.1. Let X = (X_n; n ≥ 0) denote a Markov chain on a denumerable state space S. Throughout, we define F = (F_n; n ≥ 0) to be any filtration with respect to which X is a Markov chain. (It may help to think of F as the history of X.) We define the transition operators (T_n; n ≥ 0) as follows: For any bounded function f : S → R,
\[
T_n f(x) = E_x[f(X_n)], \qquad n \ge 0.
\]
In particular, T_n 1l_A(x) denotes the conditional probability that X_n is in A, given that X_0 = x. It is clear that T_n is a linear operator for any n ≥ 0; cf. Section 1.1 of Chapter 3. Moreover, if f : S → R is bounded,
\[
T_n T_m f(x) = E_x\big[T_m f(X_n)\big] = E_x\big[E_{X_n}[f(X_m)]\big] = E_x\big[f(X_{n+m})\big],
\]
by Theorem 1.1.2. We have shown that T_{n+m} = T_n T_m = T_m T_n. This is the semigroup property of the transition operators and should be familiar from Lemma 1.1.1 of Chapter 3. There are three further properties of (T_n; n ≥ 0) that are elementary and yet deserve to be mentioned at this stage. Namely:

(i) For every bounded function f : S → R and for all x ∈ S, T_0 f(x) = f(x).

(ii) For all n ≥ 0 and all x ∈ S, T_n 1(x) = 1, where 1(y) = 1 is the function identically equal to 1.

(iii) For all n ≥ 0, T_n is a nonnegative operator. That is, whenever f(x) ≥ 0 for all x ∈ S, then T_n f(x) ≥ 0 for all x ∈ S.

Exercise 1.4.1 Verify the above conditions (i)–(iii).
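On a finite state space the transition operators are just matrices acting on vectors, and the semigroup property is matrix multiplication; a sketch with an illustrative, hypothetical 3 × 3 transition matrix:

```python
import numpy as np

# Hypothetical transition matrix P on a 3-point state space (rows sum to 1).
P = np.array([[0.2, 0.5, 0.3],
              [0.1, 0.1, 0.8],
              [0.6, 0.2, 0.2]])

def T(n, f):
    """T_n f(x) = E_x[f(X_n)], i.e., (P^n f)(x) for f given as a vector."""
    return np.linalg.matrix_power(P, n) @ f

f = np.array([1.0, -2.0, 0.5])
lhs = T(3 + 2, f)            # T_{n+m} f
rhs = T(3, T(2, f))          # T_n (T_m f): the semigroup property
ones_out = T(4, np.ones(3))  # property (ii): T_n 1 = 1
```

Properties (i)–(iii) correspond, respectively, to P⁰ being the identity, the rows of P summing to one, and the entries of P being nonnegative.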
If T = (Tn ; n ≥ 0) is any semigroup of linear operators on the space of bounded functions f : S → R for which (i)–(iii) all hold, we say that T is a Markov semigroup. The following is the main result of this subsection. Theorem 1.4.1 Suppose T = (Tn ; n ≥ 0) is a collection of linear operators on the space of bounded functions f : S → R. Then, T is the transition operator of a Markov chain on S if and only if T is a Markov semigroup. Proof If T is the transition operator of a Markov chain on S, then we have already shown that it is a Markov semigroup. It suffices to prove the converse. Throughout, we will assume that T is a Markov semigroup. For every x, y ∈ S and for every integer n ≥ 0, define pn (x, y) = Tn 1l{y} (x).
It is easy to see that y → p_n(x, y) is a probability function on S. In fact, p_n will end up being the n-step transition probability of the Markov chain that we are establishing. With this in mind, we first define probability measures (Q^n_x; x ∈ S) on S^n as follows: For all A ⊂ S^n and all a_0 ∈ S,
\[
Q^n_{a_0}(A) = \sum_{a \in S^n} \mathbf{1}_A(a) \prod_{\ell=0}^{n-1} p_1\big(a^{(\ell)}, a^{(\ell+1)}\big),
\]
where a = (a^{(1)}, \ldots, a^{(n)}) and a^{(0)} = a_0. This should be compared to the Chapman–Kolmogorov equation (Theorem 1.1.1). It is not hard to see that for each x ∈ S, (Q^n_x; n ≥ 1) is a consistent family of probability measures on S^n. By Kolmogorov's existence theorem (Theorem 1, Appendix A), we can construct probability measures (P_x; x ∈ S) on S^∞ such that the restriction of P_x to S^n is exactly Q^n_x for each x ∈ S. Now we construct our Markov chain. For any ω ∈ S^∞, define
\[
X_n(\omega) = \omega^{(n)}, \qquad n \ge 0.
\]
It should be recognized that, under the measure P_x, the joint distribution of X_1, ..., X_m is given by the measure Q^m_x. Since X satisfies the Chapman–Kolmogorov property, this is clearly equivalent to the Markov property.

We close this subsection with three examples.

Example 1 Suppose we kill the Markov chain X when it enters a set A ⊂ S. Letting ∆ denote the coffin state, we extend the domain of any function f : S → R to S_∆ = S ∪ {∆} by letting f(∆) ≡ 0. Following the notation of Section 1.3, we will denote this killed process by Y = (Y_n; n ≥ 0). By Theorem 1.3.1, Y is a Markov chain. If T = (T_n; n ≥ 0) designates the transition semigroup of X, that of Y is given by T̄ = (T̄_n; n ≥ 0), where for all bounded functions f : S → R,
\[
\bar T_n f(x) = E_x\big[f(X_n)\mathbf{1}_{(T_A > n)}\big].
\]
It is a good exercise to check directly that this is indeed a Markov semigroup; see Supplementary Exercise 8.

Example 2 Suppose we kill our Markov chain X at rate λ ∈ ]0, 1[ and call this killed process X^λ = (X^λ_n; n ≥ 0); cf. Section 1.3 for this notation. By Theorem 1.3.2, X^λ is a Markov chain on S_∆. It is easy to see that its transition operators (T^λ_n; n ≥ 0) are given by T^λ_n f(x) = λ^n T_n f(x), where f : S → R is any bounded function. One can directly check that this is a Markov semigroup; cf. Supplementary Exercise 8.
Example 3 Our final example for this subsection relates to absorbed Markov chains. Let Z be our Markov chain X, absorbed at {a}. (The notation is that of Section 1.3.) The transition operators of Z are T^a = (T^a_n; n ≥ 0), where
\[
T^a_n f(x) = E_x\big[f(X_n)\mathbf{1}_{(T_a > n)}\big] + f(a)\,P_x(T_a \le n),
\]
for any bounded function f : S → R. These form a Markov semigroup; cf. Supplementary Exercise 8.
1.5 Resolvents and λ-Potentials

Given a Markov chain X with transition operators T = (T_n; n ≥ 0) on a denumerable state space S, we define a family of linear operators R = (R_λ; λ ∈ ]0, 1[) by the prescription
\[
R_\lambda f(x) = \sum_{n=0}^{\infty} \lambda^n T_n f(x),
\]
for all bounded functions f : S → R and all x ∈ S. (Clearly, the above sum converges absolutely.) We call R_λ the λ-potential operator of X, the function R_λ f is the λ-potential of f, and the resolvent of X is the entire collection R. Note that λ → R_λ f(x) is none other than the classical generating function for n → T_n f(x). By the dominated convergence theorem, we can reinterpret the above as
\[
R_\lambda f(x) = E_x\Big[\sum_{n=0}^{\infty} \lambda^n f(X_n)\Big]. \tag{1}
\]
In particular, R_λ 1l_A(x) is the expected number of visits to A ⊂ S, discounted at rate λ, conditional on (X_0 = x). Example 2 of Section 1.4 provides us with yet another interpretation of these λ-resolvent operators: If X^λ = (X^λ_n; n ≥ 0) is the process X killed at rate λ, then
\[
R_\lambda f(x) = E_x\Big[\sum_{n=0}^{\infty} f(X^\lambda_n)\Big].
\]
In particular, R_λ 1l_A(x) is the expected number of visits to A ⊂ S for the killed process, conditional on (X^λ_0 = x). The generating function interpretation of λ-potential operators, together with the classical uniqueness theorem for generating functions, shows us that if R_λ f(x) = R_λ g(x) for all λ ∈ ]0, 1[ and all x ∈ S, then T_n f = T_n g for all n ≥ 0. This uniqueness assertion can be refined.
Theorem 1.5.1 (The Resolvent Equation) The resolvent of a Markov chain satisfies the following: For all λ, γ ∈ ]0, 1[,
\[
\lambda R_\lambda - \gamma R_\gamma = (\lambda - \gamma) R_\lambda R_\gamma.
\]
In particular, suppose there exists a λ ∈ ]0, 1[ such that R_λ f(x) = 0 for all x ∈ S. Then, for all γ ∈ ]0, 1[ and y ∈ S, R_γ f(y) = 0.

The second statement shows that if R_λ f = R_λ g for some λ, then T_n f = T_n g for all n ≥ 0. This is the improvement over the mentioned uniqueness assertion via generating functions.

Proof By Theorem 1.2.1, for all bounded functions f : S → R and all integers j, n ≥ 0,
\[
T_j f(X_n) = E_x\big[f(X_{j+n}) \mid F_n\big], \tag{2}
\]
P_x-a.s. for all x ∈ S. Thus, we can use equation (1) to deduce that with P_x-probability one for any x ∈ S,
\[
R_\gamma f(X_n) = E_{X_n}\Big[\sum_{j=0}^{\infty} \gamma^j f(X_j)\Big] = \sum_{j=0}^{\infty} \gamma^j T_j f(X_n) = \sum_{j=0}^{\infty} \gamma^j E_x\big[f(X_{j+n}) \mid F_n\big] = \gamma^{-n}\, E_x\Big[\sum_{j=n}^{\infty} \gamma^j f(X_j) \,\Big|\, F_n\Big]. \tag{3}
\]
Consequently, for all x ∈ S,
\[
R_\lambda R_\gamma f(x) = E_x\Big[\sum_{n=0}^{\infty} \lambda^n R_\gamma f(X_n)\Big] = \sum_{n=0}^{\infty} \Big(\frac{\lambda}{\gamma}\Big)^n E_x\Big[\sum_{j=n}^{\infty} \gamma^j f(X_j)\Big] = \frac{\gamma}{\gamma - \lambda} \sum_{j=0}^{\infty} \Big[1 - \Big(\frac{\lambda}{\gamma}\Big)^{j+1}\Big] \gamma^j\, E_x[f(X_j)] = \frac{1}{\lambda - \gamma} \sum_{j=0}^{\infty} \big(\lambda^{j+1} - \gamma^{j+1}\big)\, T_j f(x).
\]
The result follows readily from this.
Equation (3) of the above proof allows us to deduce that for any λ ∈ ]0, 1[ and all x ∈ S,
\[
E_x\Big[\sum_{j=0}^{\infty} \lambda^j f(X_j) \,\Big|\, F_n\Big] = \lambda^n R_\lambda f(X_n) + \sum_{j=0}^{n-1} \lambda^j f(X_j), \qquad P_x\text{-a.s.} \tag{4}
\]
Corollary 1.5.1 (Doob–Meyer Decomposition) For any λ ∈ ]0, 1[ and any bounded nonnegative f : S → R_+, (λ^n R_λ f(X_n); n ≥ 0) is a supermartingale that satisfies the decomposition (4) with respect to P_x for any x ∈ S.
1.6 Distribution of Entrance Times

There are deep connections between entrance times and potentials of functions. We will now elaborate on one such connection. Let X = (X_n; n ≥ 0) denote a Markov chain on a denumerable space S. Let F = (F_n; n ≥ 0) designate the underlying filtration with respect to which X is Markov, and as in the previous subsection, we define R = (R_λ; λ ∈ ]0, 1[) to be the resolvent of X. The following computes the generating function of the entrance time as the potential of a function.

Theorem 1.6.1 Let T_E = inf(k ≥ 0 : X_k ∈ E) denote the entrance time to E ⊂ S. There exists a function f_E : S → [0, 1] such that for all x ∈ S and all λ ∈ ]0, 1[,
\[
E_x\big[\lambda^{T_E}\big] = R_\lambda f_E(x).
\]
Proof Recall that g_λ is an independent geometric random variable with parameter λ and that P(g_λ > n) = λ^n. Let L^λ_E denote the last hitting time of E before g_λ. That is, L^λ_E = sup(0 ≤ k < g_λ : X_k ∈ E), where sup ∅ = −1. The advertised function f_E is given by
\[
f_E(x) = P_x\big(L^\lambda_E = 0\big), \qquad x \in S. \tag{1}
\]
We now compute directly:
\[
\begin{aligned}
R_\lambda f_E(x) &= E_x\Big[\sum_{n=0}^{\infty} \lambda^n\, P_{X_n}\big(L^\lambda_E = 0\big)\Big] \\
&= E_x\Big[\sum_{n=0}^{\infty} \lambda^n\, P_{X_n}\big(X_0 \in E,\ X_\ell \notin E \text{ for all } 1 \le \ell < g_\lambda\big)\Big] \\
&= (1 - \lambda) \sum_{k=1}^{\infty} \lambda^{k-1}\, E_x\Big[\sum_{n=0}^{\infty} \lambda^n\, P_{X_n}\big(X_0 \in E,\ X_\ell \notin E \text{ for all } 1 \le \ell < k\big)\Big],
\end{aligned}
\]
by independence. Employing Theorem 1.2.1 leads to
\[
\begin{aligned}
R_\lambda f_E(x) &= (1 - \lambda) \sum_{k=1}^{\infty} \lambda^{k-1}\, E_x\Big[\sum_{n=0}^{\infty} \lambda^n\, P_x\big(X_n \in E,\ X_{n+\ell} \notin E \text{ for all } 1 \le \ell < k \,\big|\, F_n\big)\Big] \\
&= E_x\Big[\sum_{n=0}^{\infty} P_x\big(L^\lambda_E = n \,\big|\, F_n\big)\Big] = E_x\Big[\sum_{n=0}^{\infty} \mathbf{1}_{(L^\lambda_E = n)}\Big],
\end{aligned}
\]
since λ^n (1 − λ)λ^{k−1} = P(g_λ = n + k). Thus,
\[
R_\lambda f_E(x) = E_x\Big[\sum_{n=0}^{g_\lambda - 1} \mathbf{1}_{(L^\lambda_E = n)}\Big] = P_x(T_E < g_\lambda) = E_x\big[\lambda^{T_E}\big].
\]
We have used the independence of X and g_λ several times. This completes the proof.

As a consequence of the proof of the above, we mention the following.

Corollary 1.6.1 The function f_E above satisfies: (i) for all x ∉ E, f_E(x) = 0; and (ii) for all x ∈ E, R_λ f_E(x) = 1.

Exercise 1.6.1 Complete the proof of Corollary 1.6.1.
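Theorem 1.6.1 can be checked in finite linear algebra: on a hypothetical 3-state chain (all numbers below are illustrative), E_x[λ^{T_E}] is computed by first-step analysis, f_E is evaluated directly from its definition (1) by summing over the value of g_λ, and the two sides are compared.

```python
import numpy as np

# Hedged sketch on a hypothetical 3-state chain with E = {2}.
P = np.array([[0.3, 0.6, 0.1],
              [0.5, 0.2, 0.3],
              [0.2, 0.4, 0.4]])
lam = 0.5
E, Ec = [2], [0, 1]

# h(x) = E_x[lam**T_E] by first-step analysis: h = 1 on E, and
# h(x) = lam * sum_y p1(x, y) h(y) for x off E.
h = np.empty(3)
h[E] = 1.0
A = np.eye(len(Ec)) - lam * P[np.ix_(Ec, Ec)]
h[Ec] = np.linalg.solve(A, lam * P[np.ix_(Ec, E)] @ h[E])

# f_E(x) = P_x(L_E^lam = 0): zero off E; on E it is the probability of
# avoiding E at times 1, ..., g_lam - 1.  Summing over the value of g_lam:
#   f_E(x) = (1 - lam) * (1 + lam * P[x, Ec] @ (I - lam*Q)^{-1} @ 1),
# where Q is p1 restricted to the complement of E.
Q = P[np.ix_(Ec, Ec)]
fE = np.zeros(3)
avoid = np.linalg.solve(np.eye(len(Ec)) - lam * Q, np.ones(len(Ec)))
for x in E:
    fE[x] = (1.0 - lam) * (1.0 + lam * P[x, Ec] @ avoid)

# Theorem 1.6.1: E_x[lam**T_E] = R_lam f_E(x), with R_lam = (I - lam*P)^{-1}.
R_fE = np.linalg.solve(np.eye(3) - lam * P, fE)
```

The comparison also exhibits Corollary 1.6.1: f_E vanishes off E, and R_λ f_E equals 1 on E.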
The function f_E is called the equilibrium measure on E, and R_λ f_E is called the equilibrium (λ)-potential of E and has the following interpretation.

Corollary 1.6.2 For all x ∈ S and E ⊂ S,
\[
P_x(T_E < \infty) = \lim_{\lambda \to 1^-} R_\lambda f_E(x).
\]
Exercise 1.6.2 Prove Corollary 1.6.2.
The equilibrium potentials of Theorem 1.6.1 satisfy the following variational problem.

Theorem 1.6.2 For any x ∈ S, all λ ∈ ]0, 1[, and every E ⊂ S,
\[
R_\lambda f_E(x) = \inf R_\lambda f(x),
\]
where the infimum is taken over all bounded functions f : S → [0, 1] such that f = 0 off of E and R_λ f ≥ 1 on E.

Proof Fix some bounded function f : S → R such that f = 0 off of E, and R_λ f ≥ 1 on E. Fix x ∈ S and define M^f = (M^f_n; n ≥ 0) by
\[
M^f_n = E_x\Big[\sum_{j=0}^{\infty} \lambda^j f(X_j) \,\Big|\, F_n\Big].
\]
Clearly, M^f is a bounded martingale with respect to P_x. Moreover, by the Doob–Meyer decomposition (Corollary 1.5.1), with P_x-probability one,
\[
M^f_{T_E} \mathbf{1}_{(T_E < \infty)} = \lambda^{T_E} R_\lambda f(X_{T_E}) \mathbf{1}_{(T_E < \infty)}.
\]
Therefore (why?), by the optional stopping theorem (Theorem 1.2.1 of Chapter 1),
\[
E_x[M^f_0] = \lim_{k \to \infty} E_x[M^f_{T_E \wedge k}] = E_x\big[M^f_{T_E} \mathbf{1}_{(T_E < \infty)}\big] = E_x\big[\lambda^{T_E} R_\lambda f(X_{T_E}) \mathbf{1}_{(T_E < \infty)}\big] \ge E_x\big[\lambda^{T_E} \mathbf{1}_{(T_E < \infty)}\big],
\]
which equals E_x[λ^{T_E}], since λ ∈ ]0, 1[. On the other hand, E_x[M^f_0] = R_λ f(x). This and Theorem 1.6.1 together complete the proof.

There are other variational representations of equilibrium potentials. Here is a sample.

Exercise 1.6.3 Prove that for all x ∈ S, all λ ∈ ]0, 1[, and for every E ⊂ S,
\[
R_\lambda f_E(x) = \sup R_\lambda f(x),
\]
where the supremum is taken over all f : S → R_+ such that f ≡ 0 off of E and R_λ f(x) ≤ 1 for all x ∈ S. We will return to this topic more fully in Chapter 10.
2 Markov Semigroups

At first sight, Markov processes can be thought of as "obvious" extensions of Markov chains. However, upon closer examination, one finds a number of hazardous technical pitfalls in actually carrying out such extensions. To avoid them, one typically focuses on a class of "nice" Markov processes. In this book we will concern ourselves with one such class of processes, known as Feller processes. In order to construct such processes, we need some functional-analytic machinery that will be developed in this section. The following section uses this material to construct and analyze Feller processes. The reader should be well familiar with the material of Section 1 before proceeding any further. We begin our discussion with some elementary notions from functional analysis.
2.1 Bounded Linear Operators

This subsection is a brief review of some relevant facts about bounded (or continuous) linear operators and establishes some useful notation. Recall that a set X is a linear space (or a real vector space) if for every α_1, α_2 ∈ R and all f_1, f_2 ∈ X, α_1 f_1 + α_2 f_2 ∈ X. The mapping ∥·∥_X : X → R_+ ∪ {+∞} defines a norm on X if:

1. for any f ∈ X, ∥f∥_X = 0 if and only if f = 0—the zero element of X;
2. (The Triangle Inequality) for all f_1, f_2 ∈ X, ∥f_1 + f_2∥_X ≤ ∥f_1∥_X + ∥f_2∥_X; and

3. for all α ∈ R and f ∈ X, ∥αf∥_X = |α| · ∥f∥_X.

Note that any normed linear space X can be metrized: We can simply define the distance between f, g ∈ X as ∥f − g∥_X. If X and Y are linear spaces, recall that a map L : X → Y is a linear operator (from X into Y) if for all α_1, α_2 ∈ R and all f_1, f_2 ∈ X, L(α_1 f_1 + α_2 f_2) = α_1 L(f_1) + α_2 L(f_2). If X and Y are in fact normed linear spaces (with norms ∥·∥_X and ∥·∥_Y, respectively), we can define
\[
\|L\|_{\mathrm{op}} = \sup_{f \in X:\ \|f\|_X = 1} \|L(f)\|_Y.
\]
We say that the operator L is a bounded linear operator if ∥L∥_op < ∞. The collection of all bounded linear operators from X into Y is denoted by B(X, Y).

Lemma 2.1.1 Suppose X and Y are normed linear spaces. Then, endowed with ∥·∥_op, B(X, Y) is a normed linear space. Moreover,
\[
\|L\|_{\mathrm{op}} = \sup_{f \in X} \frac{\|L(f)\|_Y}{\|f\|_X}, \tag{1}
\]
where 0 ÷ 0 := 0. The norm ∥L∥_op is called the operator norm of L.

Corollary 2.1.1 Suppose X and Y are normed linear spaces with norms ∥·∥_X and ∥·∥_Y, respectively. For any f_1, f_2 ∈ X, and for all L ∈ B(X, Y),
\[
\|L(f_1) - L(f_2)\|_Y \le \|L\|_{\mathrm{op}}\, \|f_1 - f_2\|_X.
\]
Exercise 2.1.1 Verify Lemma 2.1.1 and Corollary 2.1.1.
2.2 Markov Semigroups and Resolvents

From now on, let S denote a separable, locally compact space.³ We are primarily interested in two linear spaces associated with S. The first is L^∞(S), the collection of all bounded, measurable functions f : S → R.⁴

³ The space S is locally compact at x ∈ S if for any open neighborhood U of x, we can find an open neighborhood V of x such that the closure of V is a compact subset of U. We say that S is locally compact if it is locally compact at all points x ∈ S.
⁴ This is a slight abuse of notation, since, typically, we refer to an L^∞-space as one with a measure structure. This should not cause any confusion, however.
The second linear space of interest is C_0(S). This is the collection of all continuous functions f : S → R that vanish at infinity. While this definition is clear when S = R^d, it deserves a line of explanation to cover the general case. We say that f : S → R vanishes at infinity if for all ε > 0, there exists a compact set K_ε ⊂ S such that sup_{x ∉ K_ε} |f(x)| ≤ ε. You should check the following.

Exercise 2.2.1 Any function f ∈ C_0(S) is uniformly continuous.

It can (and should) also be easily checked that C_0(S) ⊂ L^∞(S), and that L^∞(S) and C_0(S) are both normed linear spaces when endowed with the norm
\[
\|f\|_\infty = \sup_{x \in S} |f(x)|, \qquad f \in L^\infty(S).
\]
A collection T = (T_t; t ≥ 0) is said to be a Markov semigroup on S if:

1. For all t ≥ 0, T_t ∈ B(L^∞(S), L^∞(S)).
2. T_0 is the identity operator. That is, for all f ∈ L^∞(S), T_0(f) = f.
3. (Semigroup Property) For all t, s ≥ 0, T_{t+s} = T_t T_s.
4. (Nonnegativity-Preserving) If f ∈ L^∞(S) is nonnegative, then for all t ≥ 0 and all x ∈ S, T_t f(x) ≥ 0.
5. (Conservativeness) For all t ≥ 0, T_t 1 = 1, where 1 denotes the function that is identically equal to one.

Lemma 2.2.1 If T is a Markov semigroup on S, then for all t ≥ 0, ∥T_t∥_op = 1.

Proof For all x ∈ S, f(x) − ∥f∥_∞ ≤ 0; consequently, T_t f(x) ≤ ∥f∥_∞ T_t 1(x) = ∥f∥_∞, pointwise. This shows that ∥T_t∥_op ≤ 1. On the other hand, T_t 1(x) = 1 for all t ≥ 0 and all x ∈ S, and our proof is complete.

To any Markov semigroup we can associate a resolvent R = (R_λ; λ > 0), defined as follows: For all f ∈ L^∞(S), all λ > 0, and all x ∈ S,
\[
R_\lambda f(x) = \int_0^\infty e^{-\lambda s}\, T_s f(x)\, ds.
\]
The function R_λ f is said to be the λ-potential of f ∈ L^∞(S). Note that the above resolvent (in continuous time) is not exactly the same as the discrete-time resolvent of Section 1.5.

Lemma 2.2.2 For all λ > 0, R_λ ∈ B(L^∞(S), L^∞(S)) and ∥R_λ∥_op = λ^{−1}.

Exercise 2.2.2 Prove Lemma 2.2.2.
The following is the continuous-time analogue of the resolvent equation in discrete time (Theorem 1.5.1) and bears the same name. Theorem 2.2.1 (The Resolvent Equation) For all λ, γ > 0, Rγ − Rλ = (λ − γ)Rλ Rγ = (λ − γ)Rγ Rλ . Exercise 2.2.3 Prove Theorem 2.2.1. (Hint: Model the proof after that of Theorem 1.5.1.)
Corollary 2.2.1 Given f ∈ L∞ (S), suppose there exists λ > 0 such that Rλ f ≡ 0. Then, for all γ > 0, Rγ f ≡ 0. Exercise 2.2.4 Prove Corollary 2.2.1.
2.3 Transition and Potential Densities

We now discuss a useful method of producing Markov semigroups. Suppose µ is a σ-finite measure on S and p = (p_t; t ≥ 0) is a family of functions p_t : S × S → R_+ (t ≥ 0) such that (t; x, y) → p_t(x, y) is measurable and:

1. for all t ≥ 0 and x ∈ S, ∫ p_t(x, y) µ(dy) = 1;
2. (The Chapman–Kolmogorov Property) for every t, s ≥ 0 and for all x, y ∈ S, p_{t+s}(x, y) = ∫ p_t(x, z) p_s(z, y) µ(dz).

The functions p_t are then said to be transition densities (with respect to the measure µ). For any f ∈ L^∞(S), for all t ≥ 0, and every x ∈ S, we may define
\[
T_t f(x) = \int p_t(x, y) f(y)\, \mu(dy).
\]
Proposition 2.3.1 The collection T = (T_t; t ≥ 0) is a Markov semigroup on S.

Exercise 2.3.1 Verify Proposition 2.3.1.
Next, let us look at the resolvent R = (R_λ; λ > 0) of T. For any λ > 0, define r_λ : S × S → R by
\[
r_\lambda(x, y) = \int_0^\infty e^{-\lambda t}\, p_t(x, y)\, dt.
\]
By the monotone convergence theorem, this integral always exists, although it may be infinite. When it is finite µ × µ-almost everywhere, we say that r_λ is the λ-potential density of T. This definition is motivated by the
following calculation, which is a consequence of Fubini’s theorem: For all nonnegative f ∈ L∞(S),

  Rλ f(x) = ∫ rλ(x, y) f(y) µ(dy).

That is, the measure A → Rλ 1l_A(x) is absolutely continuous with respect to µ, and the Radon–Nikodým density (at y ∈ S) is rλ(x, y).

Example Consider S = R, endowed with its Euclidean topology. Define pt(x, y) = qt(y − x), where

  qt(a) = (2πt)^{−1/2} e^{−a²/(2t)},  a ∈ R, t > 0.
Clearly, (t; x, y) → pt(x, y) is measurable and ∫_{−∞}^∞ pt(x, y) dy = 1, for all t ≥ 0 and x ∈ R. We will next show that the collection of pt’s satisfies the Chapman–Kolmogorov condition 2 above, with µ being one-dimensional Lebesgue measure. In other words, we will check that for all t, s ≥ 0, q_{t+s} = qt ∗ qs, where ∗ denotes convolution. Let q̂t denote the Fourier transform of a → qt(a); that is,

  q̂t(ξ) = ∫_{−∞}^∞ e^{iξa} qt(a) da.

From a direct computation (cf. Exercise 2.3.2 below), we find that for any t ≥ 0 and all ξ ∈ R, q̂t(ξ) = e^{−tξ²/2}. In particular, for all t, s ≥ 0, q̂_{t+s} = q̂t · q̂s. Since (f ∗ g)^ = f̂ · ĝ, we have shown that q_{t+s} = qt ∗ qs, as desired. Define T = (Tt; t ≥ 0) by

  Tt f(x) = ∫_{−∞}^∞ pt(x, y) f(y) dy,  t ≥ 0, x ∈ R, f ∈ L∞(R).
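The convolution identity q_{t+s} = qt ∗ qs can also be checked numerically. The sketch below evaluates the convolution integral by a Riemann sum; the grid, time points, and tolerance are arbitrary choices.

```python
import numpy as np

def q(t, a):
    # The Gaussian kernel q_t(a) of the example.
    return np.exp(-a**2 / (2 * t)) / np.sqrt(2 * np.pi * t)

t, s, a = 0.3, 0.7, 0.9
# Riemann sum for (q_t * q_s)(a) = \int q_t(z) q_s(a - z) dz.
z = np.linspace(-12, 12, 200001)
dz = z[1] - z[0]
conv = np.sum(q(t, z) * q(s, a - z)) * dz
# Chapman-Kolmogorov: the convolution agrees with q_{t+s}(a).
assert abs(conv - q(t + s, a)) < 1e-6
```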
By Proposition 2.3.1, T is a semigroup on R; it is called the heat semigroup on R. Next, we compute the resolvent of the heat semigroup by finding an expression for the λ-potential density rλ. It is clear that rλ(x, y) = uλ(y − x), where

  uλ(a) = ∫_0^∞ e^{−λr} qr(a) dr,  λ > 0, a ∈ R.  (1)

By the inversion theorem for characteristic functions,

  qr(a) = (1/2π) ∫_{−∞}^∞ q̂r(ξ) e^{−iξa} dξ = (1/2π) ∫_{−∞}^∞ e^{−rξ²/2 − iξa} dξ = (1/π) ∫_0^∞ e^{−rξ²/2} cos(ξa) dξ.
Plugging this into equation (1) and using Fubini’s theorem, we arrive at the following:

  uλ(a) = (2/π) ∫_0^∞ cos(ξa)/(ξ² + 2λ) dξ.

The modified Bessel function K_{1/2} of the third kind is defined by

  K_{1/2}(y) = √(2/(yπ)) ∫_0^∞ cos(yξ)/(ξ² + 1) dξ,  y > 0;

see Watson (1995). Thus, the λ-potential density rλ(x, y) is equal to uλ(y − x), where

  uλ(a) = (2a²/(π²λ))^{1/4} K_{1/2}(|a|√(2λ)),  λ > 0, a ∈ R.
This completes the computation of the resolvent of the heat semigroup on R.

Exercise 2.3.2 Verify that, in the above example, q̂t(ξ) = e^{−tξ²/2}, as asserted. Also check that uλ is a bounded continuous function.

Exercise 2.3.3 The heat semigroup on R^d is defined analogously to the d = 1 case as Tt f(x) = ∫_{R^d} pt(x, y) f(y) dy, where

  pt(x, y) = (2πt)^{−d/2} exp( −‖x − y‖²/(2t) ).
Show that for all λ > 0, rλ(x, y) = uλ(y − x), where

  uλ(a) = (2/(2π)^d) ∫_{R^d} cos(ξ · a)/(2λ + ‖ξ‖²) dξ,

for all a ∈ R^d. Moreover, prove that when d ≥ 3, even r0 makes sense and is given by r0(x, y) = C‖x − y‖^{2−d} for some finite positive constant C. Compute C explicitly.

Exercise 2.3.4 For the example of this section, prove that

  uλ(a) = C e^{−|a|√(2λ)},

and compute C. Can you compute, in terms of Bessel functions, the analogous uλ when d ≥ 2? (Hint: Try to show that uλ, suitably truncated, solves u″ = 2λu.)
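For d = 1 the λ-potential density of Brownian motion has the well-known closed form uλ(a) = (2λ)^{−1/2} e^{−|a|√(2λ)} (fair warning: this gives away the constant asked for in the exercise above). A numerical sanity check, with grid parameters chosen arbitrarily:

```python
import numpy as np

lam, a = 1.5, 0.8

# u_lam(a) = \int_0^infty e^{-lam r} q_r(a) dr, by a Riemann sum.
r = np.linspace(1e-8, 40.0, 2_000_001)
dr = r[1] - r[0]
integrand = np.exp(-lam * r) * np.exp(-a**2 / (2 * r)) / np.sqrt(2 * np.pi * r)
u_numeric = np.sum(integrand) * dr

# Standard closed form for the 1-d Brownian resolvent density.
u_closed = np.exp(-abs(a) * np.sqrt(2 * lam)) / np.sqrt(2 * lam)
assert abs(u_numeric - u_closed) / u_closed < 1e-3
```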
2.4 Feller Semigroups

Let S denote a separable, locally compact space, and suppose T = (Tt; t ≥ 0) is a Markov semigroup on S. We say that T is a Feller semigroup (Feller, briefly) if:
1. for all t ≥ 0, Tt : C0(S) → C0(S); and
2. (The Feller Property) for all f ∈ C0(S), lim_{t→0+} ‖Tt f − f‖∞ = 0.
Thus, the Feller property states that the operator-valued map t → Tt is right-continuous at 0. In fact, t → Tt is then uniformly right-continuous at all points t ≥ 0, as the following shows.

Lemma 2.4.1 The Feller property is equivalent to the following: For all f ∈ C0(S),

  lim_{t→0+} sup_{s≥0} ‖T_{s+t} f − Ts f‖∞ = 0.

Moreover, it implies that for all f ∈ C0(S),

  lim_{λ→∞} ‖λRλ f − f‖∞ = 0.

Exercise 2.4.1 Prove Lemma 2.4.1.
Suppose T has transition densities p = (pt; t ≥ 0) with respect to some σ-finite measure µ on S; cf. Section 2.3. It is not hard to find a useful sufficient condition on the transition densities that guarantees that T is Feller.

Proposition 2.4.1 Suppose T is a Markov semigroup on a locally compact, separable metric space S, and suppose further that T has transition densities p with respect to a σ-finite measure µ on S. Then T is a Feller semigroup, provided that for all δ > 0,

  lim_{t→0+} sup_{x∈S} ∫_{y∈S: d(x,y)≥δ} pt(x, y) µ(dy) = 0,

where d(•, •) denotes the metric on S.

Proof For all f ∈ C0(S) and for all t ≥ 0,

  Tt f(x) − f(x) = ∫ pt(x, y) [f(y) − f(x)] µ(dy),  x ∈ S.

We can divide the above integral into two pieces: one where d(x, y) ≤ δ, and one where d(x, y) > δ. Since pt(x, y) ≥ 0 and ∫ pt(x, z) µ(dz) = 1 for all t ≥ 0 and x, y ∈ S,

  ‖Tt f − f‖∞ ≤ sup_{x,y∈S: d(x,y)≤δ} |f(x) − f(y)| + 2‖f‖∞ sup_{x∈S} ∫_{y∈S: d(x,y)≥δ} pt(x, y) µ(dy).
Letting t → 0+ and then δ → 0+, we see that

  lim sup_{t→0+} ‖Tt f − f‖∞ ≤ lim_{δ→0+} sup_{x,y∈S: d(x,y)≤δ} |f(x) − f(y)|,

which is 0 (why?). This completes our proof.
Example We proceed to use Proposition 2.4.1 to show that the heat semigroup of the example of Section 2.3 is indeed Feller. In the notation of the latter example, a few lines of calculation show that for any x ∈ R and all δ > 0,

  ∫_{y∈R: |x−y|≥δ} pt(x, y) dy = √(2/π) ∫_{δt^{−1/2}}^∞ e^{−z²/2} dz.

The Feller property of the heat semigroup follows from Proposition 2.4.1.
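The Gaussian tail mass in this example can be expressed through the complementary error function, and it is independent of x, so the supremum over x in Proposition 2.4.1 comes for free. A quick check with illustrative values of δ and t:

```python
import math

def tail_mass(t, delta):
    # \int_{|x-y| >= delta} p_t(x, y) dy
    #   = sqrt(2/pi) \int_{delta/sqrt(t)}^infty e^{-z^2/2} dz
    #   = erfc(delta / sqrt(2 t)),
    # which does not depend on x.
    return math.erfc(delta / math.sqrt(2.0 * t))

delta = 0.5
masses = [tail_mass(t, delta) for t in (1.0, 0.1, 0.01, 0.001)]
# The tail mass decreases to 0 as t -> 0+: the condition of Proposition 2.4.1.
assert all(m2 < m1 for m1, m2 in zip(masses, masses[1:]))
assert masses[-1] < 1e-20
```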
3 Markov Processes

We are now in a position to begin our construction of Markov processes in continuous time. Ignoring the technical difficulties for the moment, a Markov process should be defined as a stochastic process that is conditionally independent of its past, given its present value; see Section 1 on Markov chains for a more precise description in discrete time. In order to carry out such a construction in continuous time, we need additional regularity properties. In this book we content ourselves with a uniform continuity condition in probability which, in more precise terms, translates to the so-called Feller property. Our approach is based on more traditional semigroup methods and is closest to the treatments of Dynkin (1965), Hunt (1956a, 1957, 1958), Itô (1984), and Knight (1981).
3.1 Initial Measures

Suppose T is a Markov semigroup on a separable, locally compact space S. An S-valued stochastic process X = (Xt; t ≥ 0) is said to be a Markov process with initial measure ν, filtration X = (Xt; t ≥ 0), and transition operators T if there exists a probability measure Pν on our probability space such that:
1. X is adapted to X;
2. for all s, t ≥ 0 and all f ∈ C0(S),

  Eν[f(X_{t+s}) | Xs] = Tt f(Xs),
Pν-a.s., where Eν denotes the expectation operator corresponding to Pν; and
3. for all Borel sets A ⊂ S, Pν(X0 ∈ A) = ν(A).

An important choice for an initial measure is the point mass at x ∈ S, which we denote by δx. In this case, we write Px and Ex in place of the more cumbersome Pδx and Eδx, respectively. We say that X is a Markov process with transition operators T and filtration X if it is a Markov process for every initial measure δx, x ∈ S. We can associate a collection of probability measures to a Markov process X as follows:

  Tt(x, A) = Tt 1l_A(x),  x ∈ S, A ⊂ S Borel.  (1)

It is easy to see that for each t ≥ 0 and x ∈ S, Tt(x, •) is a probability measure. In fact, by properties 2 and 3 of Markov processes, Tt(x, A) = Px(Xt ∈ A). That is, Tt(x, •) is the distribution of Xt under the measure Px. A little thought shows that an equivalent formulation is that Tt(x, •) is the conditional distribution of Xt, given that X0 = x. However, the somewhat circuitous route that we have taken avoids our having to worry about a suitable choice of regular conditional probabilities.5 The measures (Tt(x, •); x ∈ S) are called the transition functions of X. According to equation (1), we can find the transition functions of a Markov process from its transition operators. On the other hand, by a monotone class argument,

  Tt f(x) = Ex[f(Xt)] = ∫ f(y) Tt(x, dy),  x ∈ S, f ∈ C0(S).  (2)

(Why?) That is, we know the transition operators if and only if we know the transition functions. From now on, we will make no distinction between transition operators and transition functions; they are identified with each other via equations (1) and (2).

Lemma 3.1.1 (The Chapman–Kolmogorov Equation) Suppose X is a Markov process with filtration X, transition operators T, and initial measure ν. For all ϕ0, …, ϕk ∈ L∞(S) and all 0 = t0 < t1 < ⋯ < tk,

  Eν[ ∏_{j=0}^k ϕj(X_{tj}) ] = ∫⋯∫ ∏_{j=0}^k ϕj(aj) ν(da0) T_{t1}(a0, da1) T_{t2−t1}(a1, da2) ⋯ T_{tk−t_{k−1}}(a_{k−1}, dak),
5 It is equally possible to use regular conditional probabilities to define such Markov processes. To learn more about regular conditional probabilities, see (Blackwell and Dubins 1975; Walsh 1972).
where Eν[Y] = ∫_Ω Y(ω) Pν(dω) denotes the expectation operator corresponding to Pν.
Exercise 3.1.1 Prove Lemma 3.1.1 by imitating its discrete analogue, Theorem 1.1.1.

Lemma 3.1.1 has two important implications. First, it shows that the transition functions completely determine the finite-dimensional distributions of a Markov process. The second implication is the following, which is obtained from Lemma 3.1.1 and a monotone class argument.

Corollary 3.1.1 Suppose X is a Markov process with transition operators T and filtration X. Let X∞ = ∨_{t≥0} Xt denote the smallest σ-field generated by the entire process X. For any probability measure ν on (S, X∞) and all bounded, X∞-measurable random variables Y,

  Eν[Y] = ∫_S Ex[Y] ν(dx),  (3)

where Ex = Eδx, and δx represents the point mass at x ∈ S.

Let us conclude this subsection with two remarks.
• For any X∞-measurable random variable Y ∈ ∪_{x∈S} L1(Px), x → Ex[Y] is measurable as a function from S into R. This holds thanks to Lemma 3.1.1 and the measurability conditions on Markov semigroups.
• By equation (3), for any A ∈ X∞, Pν(A) = ∫ Px(A) ν(dx).

Exercise 3.1.2 Prove that X is a Markov process if and only if it is a Markov process for all choices of the initial measure ν.
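For a finite state space, the transition functions are stochastic matrices T_t = e^{tQ}, and the iterated integral of Lemma 3.1.1 becomes a matrix product. The sketch below (with a hypothetical generator and initial measure) checks the Chapman–Kolmogorov consistency of these finite-dimensional formulas: marginalizing out an intermediate time reproduces the one-time marginal.

```python
import numpy as np
from scipy.linalg import expm  # matrix exponential

# Hypothetical 3-state generator; transition functions are T_t = e^{tQ}.
Q = np.array([[-0.8, 0.5, 0.3],
              [0.1, -0.4, 0.3],
              [0.6, 0.2, -0.8]])
nu = np.array([0.2, 0.5, 0.3])   # an arbitrary initial measure
t1, t2 = 0.4, 1.1

# Lemma 3.1.1: integrating nu(da0) T_{t1}(a0, da1) T_{t2-t1}(a1, da2)
# over the time-t1 coordinate must give the marginal nu T_{t2}.
via_t1 = nu @ expm(t1 * Q) @ expm((t2 - t1) * Q)
direct = nu @ expm(t2 * Q)
assert np.allclose(via_t1, direct)
```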
3.2 Augmentation

Aside from the basic definitions, in the previous subsection we studied the elementary properties of a Markov process X with filtration X and transition functions T as we considered various initial measures. We now turn our attention to the underlying filtration. For any probability measure µ on (S, X∞), define the filtration Xµ = (Xµt; t ≥ 0) as follows: For all t ≥ 0, Xµt is the completion of Xt with respect to the measure µ.6 The complete filtration X̄ = (X̄t; t ≥ 0) is defined by

  X̄t = ∩µ Xµt,  t ≥ 0,

6 Recall that this means that Xµt is obtained by adding to Xt all subsets of µ-null sets.
where the intersection is taken over all probability measures µ on (S, X∞). We can extend a probability measure µ on (S, X∞) to a probability measure µ̄ on (S, X̄∞) as usual, by defining:
1. µ̄(E) ≡ 0 for all E ∈ X̄∞ such that E is a subset of an m-null set for every probability measure m on (S, X∞); and
2. µ̄ = µ on X∞.
As is customary, we will abuse notation by writing µ in place of µ̄. This should not cause any confusion. It turns out that completing the underlying filtration does not alter the Markovian structure of the process.

Lemma 3.2.1 Suppose X is a Markov process with transition operators T and filtration X. Then X is also a Markov process with the same transition functions and with the complete filtration X̄.

Proof We will show that for all x ∈ S, all f ∈ C0(S), and all s, t ≥ 0,

  Ex[f(X_{t+s}) | X̄s] = Tt f(Xs),  Px-a.s.  (1)

By the remarks following Corollary 3.1.1, this shows that for any initial measure ν, (1) holds Pν-a.s., which proves the desired result; see also Exercise 3.1.2. Let us choose some s, t ≥ 0 and hold them fixed. For any Λ ∈ X̄s, there exists Λ′ ∈ Xs such that for all probability measures µ on S, µ(Λ△Λ′) = 0. In particular, for all x ∈ S, Px(Λ△Λ′) = 0. Hence, for all f ∈ C0(S),

  Ex[f(X_{s+t}) 1lΛ] = Ex[f(X_{s+t}) 1lΛ′] = Ex[ Ex{f(X_{s+t}) | Xs} 1lΛ′ ].

By property 2 of Markov processes (cf. Section 3.1),

  Ex[f(X_{s+t}) 1lΛ] = Ex[Tt f(Xs) 1lΛ′] = Ex[Tt f(Xs) 1lΛ].

This validates (1) and completes the proof.
The next step in our program is fairly natural: In order to fully utilize the martingale theory of Chapter 7, we need to extend the Markov property from the complete filtration of Lemma 3.2.1 to one that holds for the “right-continuous” extension of that complete filtration. That is, suppose X is a Markov process with transition functions T and complete filtration X̄. Define the filtration X̄⁺ = (X̄⁺t; t ≥ 0) as in Section 1.4 of Chapter 7, by

  X̄⁺t = ∩_{s>t} X̄s,  t ≥ 0,
and call it the complete augmented filtration of X. Our hope is to show that if X is Markov with respect to a filtration F, then it is also Markov with respect to its complete augmented filtration. Unfortunately, at this level of generality this is not so!7 However, if the transition operators of X are sufficiently well behaved, it is. We will return to this issue in Section 4, where we discuss a class of nice Feller processes.
3.3 Shifts

A collection θ = (θt; t ≥ 0) is called a collection of shift operators (shifts, briefly) for a Markov process X if:
1. for each t ≥ 0, θt : Ω → Ω is measurable; and
2. for all s, t ≥ 0 and all ω ∈ Ω, Xs ◦ θt(ω) = Xs(θt(ω)) = X_{t+s}(ω).
The following is the basic existence result in the theory.

Theorem 3.3.1 Suppose T is a Markov semigroup on a locally compact, separable metric space S. On an appropriate probability space one can construct a Markov process X with transition functions T, together with shift operators θ.

Proof For any probability measure ν on S, and for all 0 = t0 < t1 < ⋯ < tk, we define a probability measure P_{t0,…,tk} on S^{k+1} as follows: For all Borel sets A0, …, Ak ⊂ S, P_{t0,…,tk}(A0, …, Ak) equals

  ∫⋯∫ ∏_{j=0}^k 1l_{Aj}(aj) ν(da0) T_{t1}(a0, da1) T_{t2−t1}(a1, da2) ⋯ T_{tk−t_{k−1}}(a_{k−1}, dak);

see the Chapman–Kolmogorov equation, Lemma 3.1.1. It is clear that the above measures form a consistent family of probability measures on S^{R+} (endowed with the product topology). By Kolmogorov’s consistency theorem (Theorem 1, Appendix A), there exists a measure Pν on S^{R+} whose cylindrical projections are the P_{t0,…,tk}’s above. More precisely, for all ω ∈ S^{R+}, define Xt(ω) as the tth coordinate of ω; i.e., Xt(ω) = ω(t). Then, for all Borel sets A0, …, Ak ⊂ S,

  Pν(X_{ti} ∈ Ai for all 0 ≤ i ≤ k) = P_{t0,…,tk}(A0, …, Ak).

One now checks directly that X = (Xt; t ≥ 0) is the desired Markov process. Moreover, we can define a family of shift operators for X as follows:

7 In fact, the existence of such extended Markov properties is closely tied with the strong Markov property. For a glimpse of what is involved, see Supplementary Exercises 11, 12, and 13 below.
For all t ≥ 0 and ω ∈ S^{R+}, define the function θt(ω) according to the formula

  θt(ω)(s) = ω(t + s),  s ≥ 0.

Our proof is now complete.
When shift operators exist, we can formulate the Markov property in the following elegant manner.

Theorem 3.3.2 (The Markov Property) Suppose X is a Markov process with respect to a filtration F, with transition functions T and shifts θ. Then, for any bounded, ∨t Ft-measurable random variable Y, all t ≥ 0, and all x ∈ S,

  Ex[Y ◦ θt | Ft] = E_{Xt}[Y],  Px-a.s.

Proof Suppose f : S^k → R is a bounded, measurable function and hold 0 ≤ t1 < ⋯ < tk fixed. Then, for all t ≥ 0, the following holds Px-a.s., for all x ∈ S:

  Ex[f(X_{t1+t}, …, X_{tk+t}) | Ft] = E_{Xt}[f(X_{t1}, …, X_{tk})].  (1)

The proof of this assertion is very similar to that of Theorem 1.1.2; see also the Chapman–Kolmogorov equation (Lemma 3.1.1) and Exercise 3.3.1 below for details. Alternatively, we can write the above display as

  Ex[f(X_{t1}, …, X_{tk}) ◦ θt | Ft] = E_{Xt}[f(X_{t1}, …, X_{tk})].  (2)

The general result follows from a straightforward monotone class argument.

Exercise 3.3.1 First verify equation (1), and then complete the proof of Theorem 3.3.2.

Exercise 3.3.2 Show that if X is a Markov process with respect to some filtration F, it is also a Markov process with respect to its history H. It may help to recall that Ht denotes the smallest σ-field that makes the random variables (Xr; 0 ≤ r ≤ t) measurable.
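On the canonical space Ω = S^{R+}, the shift θt is nothing but re-indexing of paths. The toy sketch below (paths represented as Python functions, an illustrative device only) verifies the defining property Xs ◦ θt = X_{t+s}:

```python
# Paths omega are functions R+ -> S; the coordinate process is X_s(omega) = omega(s).
def X(s):
    return lambda omega: omega(s)

def theta(t):
    # Shift operator: (theta_t omega)(s) = omega(t + s).
    return lambda omega: (lambda s: omega(t + s))

omega = lambda u: u**2   # a hypothetical path, with S = R
s, t = 1.5, 2.0
# Defining property 2 of shifts: X_s(theta_t(omega)) = X_{t+s}(omega).
assert X(s)(theta(t)(omega)) == X(t + s)(omega)
```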
4 Feller Processes

Throughout the remainder of this section we consider a locally compact, separable metric space (S, d).8 In the previous section we constructed S-valued Markov processes that corresponded to Markov semigroups on S.

8 By Urysohn’s metrization theorem, this is the same as assuming S to be either a locally compact, separable Hausdorff space or a second countable, locally compact space.
While we have established the existence of such stochastic processes in Theorem 3.3.1, the S-valued random function t → Xt may well be quite badly behaved. In this section we propose to show that if, in addition, T is a Feller semigroup, then there exists an appropriate modification X̃ of X that is not only a Markov process with transition functions T, but also has right-continuous paths; i.e., t → X̃t is right-continuous. Moreover, we will show that this modification is Markov, even with respect to the complete augmented filtration; see the discussion of Section 3.2. In light of Section 1.1 of Chapter 7, we could then use the theory of stopping times to better understand Feller processes.

On the other hand, modifications of stochastic processes often require us to enlarge the state space. For instance, in Section 2.2 of Chapter 5 we showed that R-valued stochastic processes have separable modifications; an inspection of our proofs shows that such separable modifications are quite possibly R ∪ {±∞}-valued. In order to obtain suitable modifications for Markov processes, we now need to enlarge the state space so that it is compact. Recall that S is a separable, locally compact space. If S is already compact, there is no need to compactify it further, and we define S∆ = S. Otherwise, choose some state ∆ ∉ S and let S∆ = S ∪ {∆}. (We have already encountered such extensions in our study of Markov chains in Section 1.3.) We declare A ⊂ S∆ open (in S∆) if either:
1. A ⊂ S is open (in the topology of S); or
2. there exists a compact K ⊂ S (compact in the topology of S) such that A = S∆ \ K.
This defines a topology on S∆ that renders it compact; cf. Supplementary Exercise 1. The space S∆ is the so-called one-point compactification of S. Henceforth, any function f from S into R is extended to a function from S∆ into R by setting f(∆) = 0.
4.1 Feller Processes

Suppose T is a Markov semigroup on a locally compact, separable metric space (S, d), and let X denote a Markov process with transition functions T. We say that X is a Feller process (Feller, briefly) if T is Feller. The following is the first main result of this subsection.

Theorem 4.1.1 Suppose X is a Feller process on S with transition functions T. Then X has a right-continuous, S∆-valued modification X̃ that is itself a Feller process with the same transition functions as X. Moreover, on a suitable probability space we can construct a right-continuous Feller process Y, together with a family of shifts, such that Y has the same transition functions as X.
Our proof of Theorem 4.1.1 is long and is divided into several steps. We begin with two lemmas, the first of which is the continuous-time analogue of Corollary 1.5.1. Throughout, X denotes our Feller process, with transition functions T and resolvent R.

Lemma 4.1.1 For any nonnegative f ∈ C0(S) and all λ > 0, the stochastic process (e^{−λt} Rλ f(Xt); t ≥ 0) is a supermartingale under the measure Px, for any x ∈ S.

Proof Suppose f ∈ C0(S). Since r → Tr f(·) is uniformly right-continuous (Lemma 2.4.1), we can apply Fubini’s theorem to see that Px-a.s. for all x ∈ S,

  Ex[Rλ f(X_{t+s}) | Fs] = Ex[ ∫_0^∞ e^{−λr} Tr f(X_{t+s}) dr | Fs ] = ∫_0^∞ e^{−λr} Ex[Tr f(X_{t+s}) | Fs] dr.

On the other hand, for all r, t, s ≥ 0, Ex[Tr f(X_{t+s}) | Fs] = T_{r+t} f(Xs), Px-a.s., for all x ∈ S. Using Fubini’s theorem once more, we deduce from Theorem 3.3.2 (and equation (2) of Section 3.3 if shifts do not exist) that Px-a.s. for any x ∈ S,

  Ex[Rλ f(X_{t+s}) | Fs] = ∫_0^∞ e^{−λr} E_{Xs}[Tr f(Xt)] dr = ∫_0^∞ e^{−λr} T_{r+t} f(Xs) dr = e^{λt} ∫_t^∞ e^{−λr} Tr f(Xs) dr.

In other words, for all x ∈ S, the following holds Px-a.s.:

  Ex[e^{−λt} Rλ f(X_{t+s}) | Fs] = Rλ f(Xs) − ∫_0^t e^{−λr} Tr f(Xs) dr.  (1)

If f(a) ≥ 0 for all a ∈ S, it follows that for any x ∈ S, Px-a.s.,

  Ex[e^{−λ(t+s)} Rλ f(X_{t+s}) | Fs] ≤ e^{−λs} Rλ f(Xs).

This proves the result.
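The conclusion of Lemma 4.1.1 can be phrased at the level of operators: for f ≥ 0, e^{−λt} T_t R_λ f = ∫_t^∞ e^{−λr} T_r f dr ≤ R_λ f pointwise. For a finite-state chain (hypothetical generator below) this becomes an entrywise matrix inequality that is easy to check:

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical generator, so that T_t = e^{tQ} and R_lam = (lam I - Q)^{-1};
# f >= 0 is an arbitrary nonnegative test function.
Q = np.array([[-1.0, 0.6, 0.4],
              [0.3, -0.7, 0.4],
              [0.5, 0.5, -1.0]])
lam = 1.3
R = np.linalg.inv(lam * np.eye(3) - Q)
f = np.array([0.2, 1.0, 0.4])
Rf = R @ f

for t in (0.1, 0.5, 2.0, 10.0):
    # Supermartingale bound of Lemma 4.1.1: e^{-lam t} T_t R_lam f <= R_lam f.
    lhs = np.exp(-lam * t) * (expm(t * Q) @ Rf)
    assert np.all(lhs <= Rf + 1e-12)
```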
The second lemma of this subsection is a technical real-variable result. In order to introduce it properly, we need some notation. Let d denote the metric on S and define

  B̄d(a; ε) = {b ∈ S : d(a, b) ≤ ε}
to be the closed ball of radius ε about a. This is the closure of the open ball Bd(a; ε), which appeared earlier in Section 2.1 of Chapter 5. By Urysohn’s lemma (Lemma 1.3.1, Chapter 6), for any a ∈ S and ε > 0, we can find a function ψ_{a,ε} ∈ C0(S) such that:
• for all y ∈ S, 0 ≤ ψ_{a,ε}(y) ≤ 1;
• for all y ∈ B̄d(a; ε), ψ_{a,ε}(y) = 1; and
• for all y ∉ B̄d(a; 2ε), ψ_{a,ε}(y) = 0.
Since S is separable, it contains a countable dense subset S′. Define

  Ψ = {ψ_{a,ε} : a ∈ S′, ε ∈ Q+}.

The announced technical lemma is the following. It is here that we use the compactness of S∆ in an essential way.

Lemma 4.1.2 Suppose the function x : R+ → S∆ satisfies the following: For all f ∈ Ψ and all t ≥ 0, lim_{s↓t: s∈Q+} f(x(s)) exists. Then, for all t ≥ 0, lim_{s↓t: s∈Q+} x(s) exists.

Proof Suppose, to the contrary, that the mentioned limit does not always exist. By compactness, we can find two distinct points a, a′ ∈ S∆, t ∈ [0, 1[, and s1, s2, …, s′1, s′2, … ∈ Q+ such that as n → ∞, sn, s′n ↓ t and a = lim_{n→∞} x(sn), while a′ = lim_{n→∞} x(s′n). Suppose a, a′ ∈ S. We can then find b ∈ S′ and ε ∈ Q+ such that d(a, b) ≤ ε/2 and d(b, a′) > 4ε. Thus, there exists n′ such that for all n ≥ n′, d(x(sn), b) ≤ ε while d(x(s′n), b) > 2ε. In other words, for all n ≥ n′, ψ_{b,ε}(x(sn)) = 1 and ψ_{b,ε}(x(s′n)) = 0. In particular,
  lim sup_{s↓t: s∈Q+} ψ_{b,ε}(x(s)) ≥ 1 > 0 ≥ lim inf_{s↓t: s∈Q+} ψ_{b,ε}(x(s)).

The above still holds true if a ∈ S but a′ = ∆, or vice versa. We have arrived at a contradiction, and the lemma follows.

We are prepared to demonstrate Theorem 4.1.1.

Proof of Theorem 4.1.1 For all f ∈ C0(S) and for every λ > 0, define Λ_{f,λ} to be the collection of all ω ∈ Ω such that

  lim_{s↓t: s∈Q+} Rλ f(Xs)(ω) exists for all t ≥ 0.

Lemma 4.1.1 and Lemma 1.4.1 of Chapter 7 can be combined to show that for all x ∈ S, Px(Λᶜ_{f,λ}) = 0. If

  Λ = ∩_{f∈Ψ, λ∈Q+} Λ_{f,λ},
then for all x ∈ S, Px(Λᶜ) = 0. By Lemma 2.4.1, for any ω ∈ Λ,

  lim_{s↓t: s∈Q+} f(Xs)(ω) exists for all t ≥ 0 and all f ∈ Ψ.
Given ω ∈ Λ, we can apply Lemma 4.1.2 with x(s) = Xs(ω) to conclude that for all ω ∈ Λ and all t ≥ 0, lim_{s↓t: s∈Q+} Xs(ω) exists. Fix some a ∈ S and define X̃ = (X̃t; t ≥ 0) by

  X̃t(ω) = lim_{s↓t: s∈Q+} Xs(ω) if ω ∈ Λ, and X̃t(ω) = a if ω ∉ Λ.

It is clear that X̃ is right-continuous. We will show that it is a modification of X. Given any ϕ1, ϕ2 ∈ L∞(S)9 and any x ∈ S, for all t ≥ 0,

  Ex[ϕ1(X̃t) ϕ2(Xt)] = lim_{s↓t: s∈Q+} Ex[ϕ1(Xs) ϕ2(Xt)]
    = lim_{s↓t: s∈Q+} Ex[ Ex{ϕ1(Xs) | Ft} ϕ2(Xt) ]
    = lim_{s↓t: s∈Q+} Ex[ T_{s−t} ϕ1(Xt) ϕ2(Xt) ]
    = Ex[ϕ1(Xt) ϕ2(Xt)].

The first equality follows from the dominated convergence theorem, the third from the definition of Markov processes, and the last from the Feller property together with the dominated convergence theorem. A monotone class argument now shows that for all bounded, continuous ϕ : S × S → R, all x ∈ S, and every t ≥ 0,

  Ex[ϕ(X̃t, Xt)] = Ex[ϕ(Xt, Xt)].

Applying this to ϕ(a, b) = d(a, b) ∧ 1 (a, b ∈ S), we conclude that for all x ∈ S, X̃ is a Px-modification of X. The fact that X̃ is a Markov process with transition functions T follows immediately from the fact that it is a modification of X.

To conclude, we address the issue of the existence of shifts. The desired probability space is the one appearing in the proof of Theorem 3.3.1, namely Ω = S^{R+}. For all ω ∈ S^{R+} and all s, t ≥ 0, let Ỹt(ω) = ω(t) and θt(ω)(s) = ω(t + s). We recall from the proof of Theorem 3.3.1 that the measures Px render Ỹ = (Ỹt; t ≥ 0) a Feller process with shift operators θ and transition functions T. We can define Λ0 in the same way as we defined Λ above, except that we replace X by Ỹ everywhere. Define

  Yt(ω) = lim_{s↓t: s∈Q+} Ỹs(ω) if ω ∈ Λ0, and Yt(ω) = ω(0) if ω ∉ Λ0.

Finally, let

  θ̃t(ω)(s) = lim_{r↓t+s: r∈Q+} Ỹr(ω) if ω ∈ Λ0, and θ̃t(ω)(s) = ω(0) if ω ∉ Λ0.

9 Recall that ϕi(∆) = 0.
Then t → Yt(ω) is right-continuous for all ω ∈ S^{R+}, and Y is a Feller process with transition functions T. Finally, a few more lines of calculation show that θ̃ = (θ̃t; t ≥ 0) are shifts for Y.

To state the second, and final, result of this subsection, we return to a question that came up at the end of Section 3.2 above. Namely, we now show that if X is Markov with respect to a filtration F, then X is also Markov with respect to the complete augmented filtration, at least if X is Feller.

Theorem 4.1.2 Suppose X is a right-continuous, S∆-valued Feller process with respect to a filtration F. Then X is also a Feller process with respect to the complete augmented filtration F⁺, where F⁺s = ∩_{ε>0} F̄_{s+ε}.

Proof In light of Lemma 3.2.1, we can assume, without loss of generality, that F is complete. We then seek to prove the following: For all f ∈ L∞(S), all x ∈ S, and all s, t > 0,

  Ex[f(X_{t+s}) | F⁺s] = Ex[f(X_{t+s}) | Fs],  Px-a.s.

But by the Markov property, for all small ε ∈ (0, t),

  Ex[f(X_{t+s}) | F_{s+ε}] = T_{t−ε} f(X_{s+ε}),  Px-a.s.;

cf. Lemma 3.2.1. Let ε ↓ 0 along a rational sequence. The right-hand side converges to Tt f(Xs), by the Feller property and the right continuity of X. Furthermore, the left-hand side converges to Ex[f(X_{t+s}) | F⁺s], by Doob’s martingale convergence theorem for discrete-parameter martingales; cf. Theorem 1.7.1 of Chapter 1. The result follows.

Convention Recall that the history F = (Ft; t ≥ 0) of the process X is the filtration given by the following: For all t ≥ 0, Ft is the smallest σ-field that makes (Xr; 0 ≤ r ≤ t) measurable. In light of Theorems 4.1.1 and 4.1.2 and Exercise 3.3.2, when we discuss a Feller process X we can, and always will, assume without loss of generality that X is right-continuous and that the underlying filtration is the complete augmented history of X.
4.2 The Strong Markov Property

It is a good time for us to prove that Feller processes satisfy a strong Markov property that is the natural continuous-time extension of the strong Markov property we encountered in the discrete setting; cf. Section 1.2 for the latter.
Suppose (S, d) is a locally compact, separable metric space, T is a Markov semigroup on S, and X is a Markov process on S with transition functions T. Throughout this subsection, F = (Ft; t ≥ 0) designates the complete augmented history of X. Using our one-point compactification S∆, we define X∞(ω) = ∆ and call X a strong Markov process if:
1. for all F-stopping times T, ω → X_T(ω) is measurable (as a function from Ω to S∆); and
2. (The Strong Markov Property) for all F-stopping times T, all f ∈ L∞(S), and every t ≥ 0,

  Ex[f(X_{t+T}) | F_T] 1l_{(T<∞)} = E_{X_T}[f(Xt)],  Px-a.s., for every x ∈ S.

It is clear that every strong Markov process is a Markov process. However, the converse need not be true, as the following shows.

Exercise 4.2.1 Let T be an exponential random variable with mean 1, and define Xt = 0 for all t ≠ T, whereas X_T = 1. (i) Check that T = inf(s ≥ 0 : Xs = 1) and that T is a finite stopping time. (ii) Prove that X is a Markov process that is not strong Markov. (Hint: You can start by showing that a modification of a Markov process is itself a Markov process.)

Since g(∆) = 0 for all g : S → R, condition 2 holds if and only if for all x ∈ S, the following holds Px-a.s.:

  Ex[f(X_{t+T}) | F_T] 1l_{(T<∞)} = E_{X_T}[f(Xt)] 1l_{(T<∞)}.

The following provides us with an easy-to-use condition for verifying the strong Markov property.

Lemma 4.2.1 Given condition 1 above, condition 2 is equivalent to the following:
3. for all F-stopping times T, f ∈ L∞(S), x ∈ S, and every t ≥ 0,

  Ex[f(X_{t+T})] = Ex[ E_{X_T}[f(Xt)] ].

Proof If condition 2 holds, then we obtain condition 3 by integration with respect to Px. We will now show that condition 3 implies condition 2. For any A ∈ F_T, we define the R+ ∪ {∞}-valued random variable T_A by

  T_A(ω) = T(ω) if ω ∈ A, and T_A(ω) = +∞ if ω ∉ A,  ω ∈ Ω.
For any t ≥ 0, (T_A ≤ t) = (T ≤ t) ∩ A ∈ Ft. Thus, T_A is an F-stopping time. By condition 3, for all x ∈ S and all t ≥ 0,

  Ex[f(X_{t+T_A})] = Ex[ E_{X_{T_A}}[f(Xt)] ].

The right-hand side equals Ex[ E_{X_T}[f(Xt)] 1l_A ], while the left-hand side equals Ex[f(X_{t+T}) 1l_A]. This proves the lemma.

Next, we prove the strong version of Theorem 3.3.2 for strong Markov processes.

Theorem 4.2.1 Suppose X is a strong Markov process with shift operators θ = (θt; t ≥ 0). For any F-stopping time T and all bounded, ∨_{t≥0} Ft-measurable random variables Y,

  Ex[Y ◦ θ_T | F_T] 1l_{(T<∞)} = E_{X_T}[Y],  (1)

Px-a.s., for all x ∈ S.

Proof Since t → θt(ω) is a stochastic process, θ_T(ω) = θ_{T(ω)}(ω). To begin with, we need to show that θ_T : Ω → Ω is measurable. Let A1, …, Am be measurable subsets of S and fix 0 ≤ t1 < t2 < ⋯ < tm. Note that A = ∩_{j=1}^m (X_{tj} ∈ Aj) ∈ ∨_{t≥0} Ft. Moreover,

  θ_T^{−1}(A) = {ω : θ_T(ω) ∈ A} = ∩_{j=1}^m {ω : X_{T+tj}(ω) ∈ Aj},

which is in F_{T+tm}; here we have used the simple fact that T + tm is a stopping time. By a monotone class theorem, θ_T^{−1}(A) is a measurable subset of Ω for all A ∈ ∨_{t≥0} Ft. This shows that θ_T : Ω → Ω is indeed measurable.

Now we proceed to verify the claim of the theorem. By a monotone class theorem, it suffices to prove the result for Y’s of the form ∏_{j=1}^k ϕj(X_{tj}), where ϕ1, …, ϕk ∈ L∞(S) and 0 ≤ t1 < ⋯ < tk. Let Φk denote the collection of all random variables Y of this form. When Y ∈ Φ1, (1) follows from the definition of the strong Markov property. We proceed by induction on k. Suppose (1) holds true for all Y ∈ Φ_{k−1}; we will show that it also holds for all Y ∈ Φk. Indeed, if Y ∈ Φk, we can find ϕ1, …, ϕk ∈ L∞(S) and 0 ≤ t1 < ⋯ < tk such that Y = ∏_{j=1}^k ϕj(X_{tj}). Now, for any x ∈ S, the following holds Px-a.s.:

  Ex[Y ◦ θ_T | F_{T+t_{k−1}}] = ∏_{j=1}^{k−1} ϕj(X_{tj+T}) · Ex[ϕk(X_{T+tk}) | F_{t_{k−1}+T}]
    = ∏_{j=1}^{k−1} ϕj(X_{tj+T}) · Ex[ϕk(X_{tk−t_{k−1}}) ◦ θ_{t_{k−1}+T} | F_{t_{k−1}+T}]
    = ∏_{j=1}^{k−1} ϕj(X_{tj+T}) · E_{X_{t_{k−1}+T}}[ϕk(X_{tk−t_{k−1}})],

by the induction hypothesis (on Φ1). Define

  ψ(x) = ϕ_{k−1}(x) Ex[ϕk(X_{tk−t_{k−1}})],  x ∈ S.

This is a measurable function from S into R, and we have shown that Px-a.s., for any x ∈ S,

  Ex[Y ◦ θ_T | F_{T+t_{k−1}}] = ∏_{j=1}^{k−2} ϕj(X_{tj+T}) · ψ(X_{t_{k−1}+T}) = ( ψ(X_{t_{k−1}}) ∏_{j=1}^{k−2} ϕj(X_{tj}) ) ◦ θ_T.

By the induction hypothesis (on Φ_{k−1}), Px-a.s. for all x ∈ S,

  Ex[Y ◦ θ_T | F_T] = Ex[ Ex[Y ◦ θ_T | F_{T+t_{k−1}}] | F_T ]
    = E_{X_T}[ ψ(X_{t_{k−1}}) ∏_{j=1}^{k−2} ϕj(X_{tj}) ]
    = E_{X_T}[ ∏_{j=1}^{k−1} ϕj(X_{tj}) E_{X_{t_{k−1}}}[ϕk(X_{tk−t_{k−1}})] ]
    = E_{X_T}[ ∏_{j=1}^{k} ϕj(X_{tj}) ].

The last line follows from the Markov property of X; cf. Theorem 3.3.2. This completes our proof.

The main result of this section is the following.

Theorem 4.2.2 Any right-continuous Feller process on S is a strong Markov process.

Proof This proof is long, and is divided into three steps.

Step 1. Suppose there exists α ∈ R+ such that for all x ∈ S, Px(T ∈ αN0 ∪ {∞}) = 1. We will show that condition 3 of the strong Markov property holds for such a stopping time T. Indeed, for all f ∈ L∞(S), all x ∈ S, and all t ≥ 0,

  Ex[f(X_{t+T})] = Σ_{n=0}^∞ Ex[f(X_{t+αn}) 1l_{(T=αn)}].
Let F denote the complete augmented history of X. Of course, for all t ≥ 0, (T = t) ∈ Ft. Thus,

  Ex[f(X_{t+T})] = Σ_{n=0}^∞ Ex[ Ex[f(X_{t+αn}) | F_{αn}] 1l_{(T=αn)} ]
    = Σ_{n=0}^∞ Ex[ E_{X_{αn}}[f(Xt)] 1l_{(T=αn)} ]
    = Ex[ E_{X_T}[f(Xt)] ].

The second equality follows from the Markov property; see Theorems 3.3.2 and 4.1.2. This completes Step 1 of our proof.

Step 2. If T is an F-stopping time, we can apply Lemma 1.1.1 of Chapter 7 to construct F-stopping times T¹, T², … such that:
• for any n ≥ 1, Tⁿ < ∞ if and only if T < ∞;
• for any n ≥ 1, Tⁿ ≥ Tⁿ⁺¹;
• if T(ω) < ∞, then Tⁿ(ω) ∈ 2⁻ⁿN; and
• lim_{n→∞} Tⁿ = T.
Applying Step 1 to the Tⁿ’s, we see that for all f ∈ C0(S), all x ∈ S, every t ≥ 0, and every n ≥ 1,

  Ex[f(X_{t+Tⁿ})] = Ex[ E_{X_{Tⁿ}}[f(Xt)] ].

Letting n → ∞ and using the right continuity of t → Xt, we obtain the following from the dominated convergence theorem: For all f ∈ C0(S), all x ∈ S, and every t ≥ 0,

  Ex[f(X_{t+T})] = Ex[ E_{X_T}[f(Xt)] ].

That is, condition 3 holds for all f ∈ C0(S) and all F-stopping times T.

Step 3. Let G ⊂ S be an open set. There exist a sequence of open sets G1, G2, … and a sequence of closed sets F1, F2, … such that for all n ≥ 1,

  Gn ⊂ Gn+1 ⊂ ⋯ ⊂ G ⊂ ⋯ ⊂ Fn+1 ⊂ Fn,

and such that ∩n Fn = ∪n Gn = G. To be concrete, we can consider Gn = {y ∈ G : d(y, Gᶜ) > n⁻¹} and Fn = {y ∈ S : d(y, G) ≤ n⁻¹}. By Urysohn’s lemma (Lemma 1.3.1, Chapter 6), for any n ≥ 1 we can find functions ϕn, ϕ′n ∈ C0(S) such that for all x ∈ S and all n ≥ 1, 0 ≤ ϕn(x), ϕ′n(x) ≤ 1, and

  ϕn(x) = 1 if x ∈ Gn, ϕn(x) = 0 if x ∉ G;  ϕ′n(x) = 1 if x ∈ G, ϕ′n(x) = 0 if x ∉ Fn.
4 Feller Processes
A little thought shows that for all n ≥ 1,
$$ \varphi_n \le \varphi_{n+1} \le \mathbb{1}_G \le \bar\varphi_{n+1} \le \bar\varphi_n. \tag{2} $$
Thus, for all x ∈ S, all F-stopping times T, and all t ≥ 0,
$$ E_x[\varphi_n(X_{t+T})] \le P_x(X_{t+T} \in G) \le E_x[\bar\varphi_n(X_{t+T})]. \tag{3} $$
By Step 2, the two sides of the above can be identified as Ex{E_{X_T}[ϕn(Xt)]} and Ex{E_{X_T}[ϕ̄n(Xt)]}, respectively. Consequently,
$$ E_x\bigl\{E_{X_T}[\varphi_n(X_t)]\bigr\} \le P_x(X_{t+T} \in G) \le E_x\bigl\{E_{X_T}[\bar\varphi_n(X_t)]\bigr\}. \tag{4} $$
By equation (2) and the monotone convergence theorem, both sides have finite limits as n → ∞. We now claim that these two limits are one and the same, and we wish to identify the said limit. For all x ∈ S, 0 ≤ ϕ̄n(x) − ϕn(x) ≤ 1l_{Fn∖Gn}(x). As n → ∞, the latter converges down to 0. Hence, we can apply the monotone convergence theorem to see that
$$ \lim_{n\to\infty} E_x\bigl|\bar\varphi_n(X_{t+T}) - \varphi_n(X_{t+T})\bigr| = 0. $$
By Step 2, equations (2) and (4), and the monotone convergence theorem,
$$ P_x(X_{t+T} \in G) = \lim_{n\to\infty} E_x\bigl\{E_{X_T}[\varphi_n(X_t)]\bigr\} = E_x\bigl\{P_{X_T}(X_t \in G)\bigr\}. $$
The two extreme sides of the above are probability measures as set functions of G. Since the above holds for all open sets G ⊂ S, by a monotone class argument it holds for all measurable G ⊂ S. Another monotone class argument shows that for all f ∈ L∞(S), all x ∈ S, and every t ≥ 0, Ex[f(X_{t+T})] = Ex{E_{X_T}[f(X_t)]}. An appeal to Lemma 4.2.1 finishes our proof.
4.3 Lévy Processes

It is time for us to see some examples of Feller processes. The examples of this subsection are, in a sense, not much more than continuous-time random walks. With this random-walk model in mind, we say that an Rd-valued stochastic process X = (Xt; t ≥ 0) is a Lévy process (on Rd) if:
1. P(X0 = 0) = 1.
2. (Independence of the Increments) For all 0 = t0 ≤ t1 ≤ ··· ≤ tm, (X_{t_{i+1}} − X_{t_i}; 0 ≤ i ≤ m − 1) is a collection of independent random vectors in Rd.
3. (Stationarity of the Increments) For all 0 ≤ t1 ≤ t2, the distribution of X_{t2} − X_{t1} is the same as that of X_{t2−t1}.
4. The random function t → Xt is continuous in probability. That is, for all t ≥ 0 and all ε > 0,
$$ \lim_{s\to t} P(|X_s - X_t| \ge \varepsilon) = 0. $$

Conditions 1–3 ensure that X is a random walk in continuous time, while condition 4 guarantees us some minimal regularity. Barring issues of existence, let us define a family of linear operators T = (Tt; t ≥ 0) by
$$ T_t f(x) = E[f(X_t + x)], \qquad t \ge 0,\ x \in \mathbb{R}^d,\ f \in L^\infty(\mathbb{R}^d). $$
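The random-walk picture behind conditions 1–3 lends itself to simulation. The following sketch (not from the text; plain Python, with Brownian motion as the driving example) builds a path from i.i.d. increments, so conditions 1–3 hold by construction, and then runs a Monte Carlo check of stationarity through the variance Var(X_t) ≈ t. All numerical parameters are illustrative choices.

```python
import math
import random

def simulate_levy_path(n_steps, dt, increment_sampler):
    """Simulate a Levy process on the grid {0, dt, 2*dt, ...} by summing
    i.i.d. increments; conditions 1-3 hold by construction."""
    x, path = 0.0, [0.0]
    for _ in range(n_steps):
        x += increment_sampler(dt)
        path.append(x)
    return path

random.seed(0)
# Brownian motion: the increment over an interval of length dt is N(0, dt).
gauss_inc = lambda dt: random.gauss(0.0, math.sqrt(dt))

# Stationary independent increments force Var(X_t) = t for Brownian motion.
n_paths, t = 4000, 1.0
samples = [simulate_levy_path(10, t / 10, gauss_inc)[-1] for _ in range(n_paths)]
var = sum(x * x for x in samples) / n_paths
assert abs(var - t) < 0.1
```

Any other infinitely divisible increment law (e.g., Poisson) can be swapped in for `gauss_inc` without changing the construction.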
Lemma 4.3.1 If the Lévy process X exists, then T, as defined above, is a Feller semigroup on Rd.

Proof For all f ∈ L∞(Rd), x ∈ Rd, and t ≥ 0,
$$ T_t T_s f(x) = E[T_s f(x + X_t)] = E\Bigl[\int_{\mathbb{R}^d} f(x + y + X_t)\,P(X_s \in dy)\Bigr] = E\Bigl[\int_{\mathbb{R}^d} f(x + y + X_t)\,P(X_{t+s} - X_t \in dy)\Bigr], $$
by condition 3. Consequently,
$$ T_t T_s f(x) = E\bigl\{E[f(x + X_{t+s} - X_t + X_t) \mid X_t]\bigr\} = E\bigl\{E[f(x + X_{t+s}) \mid X_t]\bigr\} = E[f(x + X_{t+s})] = T_{t+s} f(x). $$
That is, T is a semigroup on Rd. The other properties of Markov semigroups follow readily. To conclude this proof, note that any f ∈ C0(Rd) is uniformly continuous; cf. Exercise 2.2.1. Moreover,
$$ \sup_{x\in\mathbb{R}^d} |T_t f(x) - f(x)| \le E\Bigl[\sup_{x\in\mathbb{R}^d} |f(x + X_t) - f(x)|\Bigr]. $$
The bounded convergence theorem implies that it is sufficient to show that as t → 0+, Xt → 0 in probability; this is an immediate consequence of conditions 1 and 4 of Lévy processes.

Exercise 4.3.1 Show that whenever µ is an infinitely divisible distribution on Rd, there exists an Rd-valued Lévy process X = (Xt; t ≥ 0) such that the distribution of X1 is µ. That is, P(X1 ∈ A) = µ(A). Conversely, show that whenever X is a Lévy process, X1 is an infinitely divisible random variable. (Hint: If X is a Lévy process, (Xt; t ∈ F) can be identified with a random walk, as long as F is finite.)
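The correspondence of Exercise 4.3.1 can be illustrated at the level of characteristic functions. A minimal sketch, with the Poisson law chosen for concreteness (the text itself treats the general case abstractly): a Poisson(λ) distribution is infinitely divisible because its characteristic function is an exact n-th power for every n, matching the decomposition of X₁ into the n increments X_{k/n} − X_{(k−1)/n} of the would-be Lévy process.

```python
import cmath

def poisson_cf(xi, lam):
    """Characteristic function of a Poisson(lam) random variable."""
    return cmath.exp(lam * (cmath.exp(1j * xi) - 1.0))

# Infinite divisibility: X_1 ~ Poisson(lam) is the sum of n i.i.d.
# Poisson(lam/n) variables, so its characteristic function is an exact
# n-th power of the increment's characteristic function.
lam, n = 3.0, 7
for xi in (0.3, 1.1, 2.5):
    whole = poisson_cf(xi, lam)
    pieces = poisson_cf(xi, lam / n) ** n
    assert abs(whole - pieces) < 1e-12
```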
Based on the transition operators T, we can construct transition functions T by the assignment Tt(x, A) = Tt1l_A(x). Note that A → Tt(x, A) is a probability measure on Rd for each x ∈ Rd and for every t ≥ 0. The following is a prefatory version of the Chapman–Kolmogorov equation and has the very same proof; cf. Lemma 3.1.1.

Lemma 4.3.2 If the Lévy process X exists, then for all 0 ≤ t1 ≤ ··· ≤ tm and all ϕ1, ..., ϕm ∈ L∞(Rd),
$$ E\Bigl[\prod_{j=1}^{m} \varphi_j(X_{t_j})\Bigr] = \int\cdots\int \prod_{j=1}^{m} \varphi_j(a_j)\, T_{t_1}(0, da_1)\, T_{t_2-t_1}(a_1, da_2)\cdots T_{t_m-t_{m-1}}(a_{m-1}, da_m). $$
If X is a Lévy process, we can view t → Xt as a random variable that takes its values in the space S = (Rd)^{R+}, endowed with the product topology and the corresponding Borel field S.

Remark While S is not a metric space, one defines such random variables in the same manner as those of Chapter 5.

For each x ∈ Rd, we can define the following probability measure on (S, S): Px = P ∘ (X + x)⁻¹, where (X + x)_t = X_t + x. It then quickly follows that for any initial measure ν, X is a Feller process on Rd with respect to the measure Pν(•) = ∫ Px(•) ν(dx). As such, we can combine Lemma 4.3.1 with Theorems 4.1.1 and 4.2.2 to obtain the following.

Proposition 4.3.1 After a suitable modification, any Lévy process on Rd is a right-continuous strong Markov process.

Example 1 (Multidimensional Brownian Motion) We say that the Rd-valued stochastic process B = (Bt; t ≥ 0) is (standard) Brownian motion in Rd if its coordinate processes B(1), ..., B(d) are independent standard Brownian motions, where B(i) = (Bt(i); t ≥ 0), 1 ≤ i ≤ d. By Theorem 1.7.1 of Chapter 7, B is a Lévy process, and is, in fact, continuous. Note that for any t ≥ 0, Bt ∼ Nd(0, tId), where Id denotes the identity matrix in Rd × Rd. For all t ≥ 0 and y ∈ Rd, define
$$ q_t(y) = q(t; y) = (2\pi t)^{-d/2} \exp\Bigl(-\frac{\|y\|^2}{2t}\Bigr), $$
where ‖y‖² = Σ_{j=1}^d |y^{(j)}|² denotes the square of the Euclidean ℓ² norm, as usual. It is then clear that for all t ≥ 0, x ∈ Rd, and f ∈ L∞(Rd),
$$ T_t f(x) = E[f(B_t + x)] = \int_{\mathbb{R}^d} f(y)\, q(t; y - x)\, dy. $$
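The Chapman–Kolmogorov relation of Lemma 4.3.2 specializes, for the Gaussian kernels of Example 1, to the convolution identity q_t ∗ q_s = q_{t+s}. A quick numerical sketch in one dimension (the quadrature parameters are illustrative choices, not part of the text):

```python
import math

def q(t, a):
    """1-d Gaussian transition density q_t(a) = (2*pi*t)^(-1/2) exp(-a^2/(2t))."""
    return math.exp(-a * a / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)

def convolve(t, s, z, lo=-10.0, hi=10.0, n=40_000):
    """(q_t * q_s)(z) by the midpoint rule; the Gaussian tails beyond
    [lo, hi] are negligible for the parameters used here."""
    h = (hi - lo) / n
    return sum(q(t, lo + (k + 0.5) * h) * q(s, z - (lo + (k + 0.5) * h))
               for k in range(n)) * h

# Chapman-Kolmogorov for the heat kernels: q_t * q_s = q_{t+s}.
t, s, z = 0.3, 0.5, 0.7
assert abs(convolve(t, s, z) - q(t + s, z)) < 1e-6
```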
In other words, the transition functions of B are given by the following: For all t ≥ 0, x ∈ Rd, and f ∈ L∞(Rd), Tt f(x) = f ∗ qt(x), where ∗ denotes convolution. Following the discussion of Section 2.3, we conclude that T has transition densities with respect to Lebesgue's measure, and they are pt(x, y) = qt(y − x), t ≥ 0, x, y ∈ Rd.

Example 2 (The Generalized Multidimensional Poisson Process) Let P = (Pt; t ≥ 0) denote an Rd-valued stochastic process whose ith coordinate process P(i) = (Pt(i); t ≥ 0) is a Poisson process with rate λ(i) > 0, and suppose that the coordinate processes are independent. The process P is called the generalized Poisson process on N0^d with rate vector λ. According to Section 1.8 of Chapter 7, P is a Lévy process on Rd, and it is not hard to see that it has the following transition densities with respect to the counting measure on N0^d:
$$ p_t(x, y) = \begin{cases} \displaystyle\prod_{j=1}^{d} e^{-\lambda^{(j)} t}\,\frac{(\lambda^{(j)} t)^{y^{(j)} - x^{(j)}}}{(y^{(j)} - x^{(j)})!}, & \text{if } y - x \in \mathbb{N}_0^d,\\ 0, & \text{otherwise.} \end{cases} $$

Example 3 (Isotropic Stable Processes) An Rd-valued stochastic process X = (Xt; t ≥ 0) is an isotropic stable process of index α ∈ ]0, 2] if it is a Lévy process and if for all t ≥ 0, the characteristic function of Xt is given by the following formula:
$$ E[e^{i\xi\cdot X_t}] = e^{-\frac{1}{2} t \|\xi\|^{\alpha}}, \qquad \xi \in \mathbb{R}^d. $$
As usual, ‖ξ‖ = (Σ_{j=1}^d |ξ^{(j)}|²)^{1/2} stands for the Euclidean ℓ² norm of ξ ∈ Rd.¹⁰ When α = 2, this is Brownian motion, and existence of such a process has already been verified. When α = d = 1, this is called the (symmetric) Cauchy process, since the above characteristic function implies that the probability density function of Xt is, in this case, given by
$$ P(X_t \in A) = \frac{1}{\pi}\int_A \frac{2t}{t^2 + 4x^2}\, dx, $$
for all Borel sets A ⊆ R. (Why?) That is, in this case, t⁻¹Xt has a standard Cauchy distribution.

¹⁰ If α ∉ ]0, 2], recall that ξ → exp{−½‖ξ‖^α} is not a characteristic function.
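The Cauchy density quoted above can be recovered numerically from the characteristic function via the inversion formula, anticipating the Fourier-analytic discussion that follows. A one-dimensional sketch (the grid size and frequency cutoff are ad hoc choices):

```python
import math

def stable_density_1d(a, t, alpha, xi_max=400.0, n=200_000):
    """Numerically invert the characteristic function exp(-t*|xi|^alpha / 2)
    to get the density q_t(a) of a 1-d isotropic stable process:
    q_t(a) = (1/pi) * Integral_0^inf cos(xi*a) exp(-t*xi^alpha/2) d(xi)."""
    h = xi_max / n
    total = 0.0
    for k in range(n):
        xi = (k + 0.5) * h  # midpoint rule
        total += math.cos(xi * a) * math.exp(-0.5 * t * xi ** alpha)
    return total * h / math.pi

# alpha = 1 (Cauchy): closed form 2t / (pi * (t^2 + 4a^2)).
t, a = 1.0, 0.7
numeric = stable_density_1d(a, t, alpha=1.0)
exact = 2 * t / (math.pi * (t * t + 4 * a * a))
assert abs(numeric - exact) < 1e-3
```

For other α ∈ ]0, 2[ the same routine produces the densities that, as the text notes, have no elementary closed form.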
In the general case where α ∈ ]0, 2], such processes always exist. This is worked out in Supplementary Exercise 5, by using the Lévy–Khintchine formula. Moreover, as we will shortly see, their distribution has a density with respect to Lebesgue's measure. But these densities are not explicitly known except when α = 2 or α = d = 1. Nonetheless, one can obtain much information on the existence and structure of these densities by Fourier-analytic methods. Indeed, by the inversion formula for characteristic functions, X has transition densities with respect to the Lebesgue measure, and they are of the form pt(x, y) = qt(y − x), where
$$ q_t(a) = (2\pi)^{-d}\int_{\mathbb{R}^d} e^{-i\xi\cdot a}\, e^{-\frac{1}{2} t \|\xi\|^{\alpha}}\, d\xi. $$
(Check!) We can now compute the transition functions in complete analogy to Example 1: For all f ∈ L∞(Rd), Tt f(x) = f ∗ qt(x).

Example 4 (Uniform Motion Along a Ray) Fix β ∈ Rd and define for all ω ∈ Ω and all t ≥ 0, Xt(ω) = tβ. This is a nonrandom stochastic process. However, it is possible to show that it is a Lévy process on Rd with transition functions Tt f(x) = f(x + tβ), t ≥ 0, x ∈ Rd, f ∈ L∞(Rd).

Exercise 4.3.2 Complete the computations of Example 4 above.
5 Supplementary Exercises

1. Prove that the one-point compactification of a locally compact, separable metric space defines a topology that renders S∆ compact.

2. Let X = (Xn; n ≥ 0) denote a Markov chain on a denumerable state space S whose k-step transition function is denoted by pk. Find an explicit formula for pk in terms of pk−1 and p1, and so obtain a recursion. (Hint: pk(x, y) is the probability of going from x to y in k steps. This means that in (k − 1) steps, the chain goes from x to some z ∈ S, and in one last step it goes from z to y.)

3. Let X = (Xn; n ≥ 0) denote the simple random walk on Zd. View this as a Markov chain with transition functions pk.
(i) Prove that for all x, y ∈ Zd and all n ≥ 0,
$$ p_n(x, y) = (2\pi)^{-d}\int_{[-\pi,\pi]^d} e^{-i\xi\cdot(y-x)}\,\Bigl(\frac{1}{d}\sum_{j=1}^{d}\cos(\xi^{(j)})\Bigr)^{n} d\xi. $$
(ii) Show that for all positive functions f, for all x ∈ Zd, and for all λ ∈ ]0, 1[, Rλ f(x) = Σ_{y∈Zd} f(y) uλ(y − x), where
$$ u_\lambda(a) = (2\pi)^{-d}\int_{[-\pi,\pi]^d} e^{-i\xi\cdot a}\,\Bigl(1 - \frac{\lambda}{d}\sum_{j=1}^{d}\cos(\xi^{(j)})\Bigr)^{-1} d\xi. $$
When is lim_{λ↑1} uλ(a) finite? This is related to Section 3.1, Chapter 3. (Hint: For part (i), use the inversion formula for characteristic functions.)

4. Given a Markov chain X = (Xn; n ≥ 0) on a denumerable space S, we say that x0 ∈ S is recurrent if, starting at x0, X visits x0 infinitely often with positive probability. Check that x0 ∈ S is recurrent if and only if Σ_{n=0}^∞ pn(x0, x0) = +∞, where pn designates the n-step transition function of X. Moreover, whenever Σ_{n=0}^∞ pn(x0, x0) = +∞, then Px0-a.s., Σ_{n=0}^∞ 1l_{(Xn=x0)} = +∞. (Hint: See Proposition 1.3.1, Chapter 3.)

5. Let µ be an infinitely divisible distribution on Rd whose characteristic function is denoted by ϕ. Recall the Lévy–Khintchine formula¹¹ ϕ(ξ) = exp{−Ψ(ξ)}, where Ψ has the following representation: There exist a ∈ Rd, a symmetric, positive definite matrix Σ ∈ Rd × Rd, and a measure L on Rd with L({0}) = 0 and ∫{1 ∧ |τ|²} L(dτ) < ∞ such that for all ξ ∈ Rd,
$$ \Psi(\xi) = ia\cdot\xi + \frac{1}{2}\,\xi'\Sigma\xi + \int_{\mathbb{R}^d}\Bigl(1 - e^{i\xi\cdot\tau} + \frac{i\xi\cdot\tau}{1 + \|\tau\|^2}\Bigr) L(d\tau). $$
(i) Use this to show that for any α ∈ ]0, 2[, there exists an infinitely divisible Rd-valued random variable Y such that E[e^{iξ·Y}] = exp{−½‖ξ‖^α}.
(ii) Conclude that isotropic stable processes exist.
(Hint: For the second part, employ Exercise 4.3.1; it may help to know this fact in R¹.)

6. A strong Markov process X = (Xt; t ≥ 0) on a separable metric space (S, d) is a diffusion if it has a continuous modification. In this exercise we will find a sufficient condition for X to be a diffusion. As usual, S∆ denotes the one-point compactification of S.
(i) Suppose f : [0, 1] → S∆ is a right-continuous function. Show that f is discontinuous if and only if
$$ \liminf_{n\to\infty}\ \max_{0\le k\le n-1} d\Bigl(f\bigl(\tfrac{k}{n}\bigr), f\bigl(\tfrac{k+1}{n}\bigr)\Bigr) > 0. $$
(ii) Use the above to show that X is a diffusion (on S) if for all compact sets K ⊂ S and for all ε > 0,
$$ \lim_{t\to 0+}\ \frac{1}{t}\sup_{x\in K} P_x\bigl(X_t \notin B(x;\varepsilon)\bigr) = 0, $$
where B(x; ε) = {y ∈ S : d(x, y) < ε}.

¹¹ See (Bertoin 1996; Feller 1971; Itô 1984; Sato 1999) and their references for earlier works. The 1-dimensional version appears in most textbooks on probability.
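For the simple random walk of Supplementary Exercise 3, the integral defining u_λ(a) can be evaluated numerically; in d = 1 with a = 0 it has the closed form 1/√(1 − λ²), which diverges as λ ↑ 1 — consistent, via Exercise 4, with recurrence of the walk on Z. A sketch (midpoint quadrature; parameters are illustrative):

```python
import math

def u_lambda_1d(lam, a=0, n=20_000):
    """u_lambda(a) = (1/2pi) Int_{-pi}^{pi} cos(xi*a) / (1 - lam*cos(xi)) d(xi)
    for the simple random walk on Z (d = 1), via the midpoint rule."""
    h = 2.0 * math.pi / n
    total = 0.0
    for k in range(n):
        xi = -math.pi + (k + 0.5) * h
        total += math.cos(xi * a) / (1.0 - lam * math.cos(xi))
    return total * h / (2.0 * math.pi)

# For a = 0 the integral equals 1/sqrt(1 - lam^2), which blows up as
# lam -> 1: Sum_n p_n(0, 0) = +infinity, i.e., the walk on Z is recurrent.
lam = 0.9
assert abs(u_lambda_1d(lam) - 1.0 / math.sqrt(1.0 - lam * lam)) < 1e-6
```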
(iii) Provide an alternative proof that Brownian motion has a continuous modification.

7. (Hard. Continued from Supplementary Exercise 6) Let X = (Xt; t ≥ 0) denote an isotropic stable Lévy process of index α ∈ ]0, 2[. (Note that α ≠ 2 in this problem!) We intend to prove that X is not a diffusion.
(i) Check that for each fixed t > 0, Xt has the same distribution as t^{1/α}X1. This property is called scaling.
(ii) Let qt(x) denote the density function of Xt. We will need the following estimate, which follows from Proposition 3.3.1 of Chapter 10: There exists a finite constant C > 1 such that for all a ∈ Rd with ‖a‖ > 2, C‖a‖^{−(d+α)} ≥ q1(a) ≥ C⁻¹‖a‖^{−(d+α)}. Use this, without proof, to deduce the existence of a finite constant A > 1 such that for all y > 1,
$$ A y^{-\alpha} \ge P(\|X_1\| > y) \ge A^{-1} y^{-\alpha}. $$
(iii) Show that, with positive probability, X is not a diffusion. Then use a 0-1 law to demonstrate that X is a.s. not a diffusion. (See Supplementary Exercises 10 and 11.)
(iv) Check, explicitly, that the condition of Supplementary Exercise 6 fails to hold in this case.
(Hint: For part (iii), apply the Paley–Zygmund lemma; cf. Lemma 1.4.1, Chapter 3.)

8. Prove directly that the operators of Examples 1, 2, and 3 of Section 1.4 form Markov semigroups.

9. (Hard) Suppose T = (Tn; n ≥ 0) and T̃ = (T̃n; n ≥ 0) are two transition functions on a denumerable state space S. If ν is a probability measure on S, we say that T and T̃ are in duality with respect to ν if for all f, g : S → R+,
$$ \int f(x)\cdot \tilde{T}_1 g(x)\,\nu(dx) = \int g(x)\cdot T_1 f(x)\,\nu(dx). $$
Let X = (Xn; n ≥ 0) denote the S-valued Markov chain whose transition functions are T, and define F_n and G_n to be the σ-fields generated by (X0, ..., Xn) and (Xn, Xn+1, ...), respectively.
(i) Prove that for all n ≥ 0 and all bounded functions h : S → R+, Eν[h(X_{n+1}) | F_1] = Tn h(X1),
Pν -a.s.
(ii) Prove that for all n ≥ 0 and all bounded functions h : S → R+, Eν[h(X1) | G_{n+1}] = T̃n h(X_{n+1}),
Pν -a.s.
(iii) Prove that for all (measurable) functions f : S → R+ and for all p > 1,
$$ \int_S \Bigl|\sup_n T_n \tilde{T}_n f(x)\Bigr|^p\,\nu(dx) \le \Bigl(\frac{p}{p-1}\Bigr)^p \int_S |f(x)|^p\,\nu(dx). $$
(iv) Show that when p = 1, the above has the following analogue: For all functions f : S → R+,
$$ \int_S \Bigl|\sup_n T_n \tilde{T}_n f(x)\Bigr|\,\nu(dx) \le \frac{e}{e-1}\Bigl(1 + \int_S f(x)\ln^+ f(x)\,\nu(dx)\Bigr). $$
Moreover, if ∫_S f(x) ln⁺ f(x) ν(dx) < ∞, show that limₙ Tn T̃n f exists ν-a.s. What is this limit? (Hint: For part (ii), have a look at Exercise 1.1.2.)

10. Suppose X = (Xt; t ≥ 0) is a Feller process on Rd. For all t ≥ 0, let Tt denote the σ-field generated by the collection of random variables (Xs; s ≥ t). The tail σ-field of X, T, is defined as
$$ \mathcal{T} = \bigcap_{t\ge 0} \mathcal{T}_t. $$
(i) Prove that the tail σ-field of any Lévy process is trivial, in that whenever A ∈ T, then Px(A) ∈ {0, 1} for all x ∈ Rd.
(ii) Construct Rd-valued Markov processes whose tail σ-fields are not trivial.
The first portion is a variant of A. N. Kolmogorov's 0-1 law. (Hint: If X is a Lévy process, (Xs; s ∈ Q+) can be approximated by a random walk.)

11. Let X = (Xt; t ≥ 0) denote a Feller process on a compact separable metric space (S, d). Suppose F = (F_t; t ≥ 0) denotes the complete augmented history of X. Prove that F_0 is trivial, in that any A ∈ F_0 satisfies Px(A) ∈ {0, 1} for all x ∈ S. This is the celebrated Blumenthal's 0-1 law, discovered by Blumenthal (1957). (Hint: First consider sets A that are measurable with respect to the σ-field generated by X0. The latter is not quite the same σ-field as F_0. Why?)

12. (Hard. Blumenthal's 0-1 law, continued) Suppose X is a Markov process on a compact metric space (S, d) and with respect to a complete filtration F = (F_t; t ≥ 0). Define the right-continuous filtration F⁺ = (F_{t+}; t ≥ 0) by F_{t+} = ∩_{s>t} F_s. We say that X satisfies Blumenthal's 0-1 law if for all Λ ∈ F_{0+} and all x ∈ S, Px(Λ) ∈ {0, 1}; cf. also Supplementary Exercise 11. Prove that X satisfies Blumenthal's 0-1 law if and only if the following holds: For all t ≥ 0, all x ∈ S, and all F_{t+}-measurable events Λ, there exists an F_t-measurable event Λx such that Px(Λ △ Λx) = 0. In words, Blumenthal's 0-1 law holds if and only if F_{t+} is essentially the same as F_t. Use this and Theorem 4.1.2 to give an alternative proof of the fact that if X is Feller, it satisfies Blumenthal's 0-1 law; cf. Supplementary Exercise 11. (Hint: Without loss of generality, assume the existence of shifts θ; by considering the finite-dimensional distributions, we can reduce the problem to showing that for all bounded random variables Y, Ex[Y ∘ θt | F_{t+}] = E_{Xt}[Y], Px-a.s.)
13. (Hard) In this exercise we construct a Markov process X, with respect to a complete filtration F, such that X is not Markov with respect to its complete augmented history; it also is not a strong Markov process. Let S1 = [0, 1] × {0} be the interval [0, 1] viewed as a subset of R²; also, let S2 = {1} × R, and define S = S1 ∪ S2. Viewed as a subset of R², S is a nice metric space and will be the state space of our Markov process. The dynamics of the process X, in words, are as follows: For any a ∈ [0, 1[, on the event that X0 = (a, 0), X moves at unit speed to the right along S1; when it reaches (1, 0), it tosses an independent fair coin and goes up or down along S2 at unit speed, with probability ½ each. If X0 = (1, a) for a > 0, X moves up along S2 at unit speed, whereas if X0 = (1, −a), it moves down along S2 at unit speed. Finally, if X0 = (1, 0), it tosses an independent fair coin and moves up or down along S2 at unit speed, with probability ½ each.
(i) Construct X rigorously and prove that it is a Markov process with respect to its complete history F.
(ii) Show that F₁ is trivial, whereas F₁₊ = ∩_{s>1} F_s is not. Conclude that X is not Markov with respect to its complete augmented history.
(iii) Prove that X is not a strong Markov process. (Hint: Consider the first time that the second coordinate of X is greater than zero.)
(iv) Check directly that X is not a Feller process.
This is due to J. B. Walsh.

14. Suppose X denotes a Feller process on the one-point compactification of a locally compact metric space (S, d). If K ⊂ S is compact, then T_K = inf{s ≥ 0 : X_s ∉ K} is a stopping time. Prove that t → X_{t∧T_K} is a Feller process on K. Identify its transition functions in terms of those of X. (Hint: See Theorem 1.3.3 of Chapter 8 for a related result.)
6 Notes on Chapter 8

Section 1 The material on denumerable Markov chains is classical and shows only the surface of a rich theory; see Revuz (1984) for a complete treatment, as well as an extensive bibliography of this subject. In relation to Theorem 1.6.1, and in the context of random walks, we also mention Itô and McKean (1960).

Section 2 Some of the many in-depth introductions to operator theory and functional analysis are (Hille and Phillips 1957; Riesz and Sz.-Nagy 1955; Yosida 1995).

Section 3 While our approach is an economical one, the theory of Markov processes is far from being a consequence of semigroup considerations. A more general theory can be found in the combined references Blumenthal and Getoor (1968), Dellacherie and Meyer (1988), Ethier and Kurtz (1986), Fukushima et al. (1994), Getoor (1975, 1990), Rogers and Williams (1987, 1994), and Sharpe (1988). They
contain various aspects of a very rich and general theory that is based on the resolvents, rather than on the associated semigroups.

Section 4 Two pedagogical accounts of Lévy processes are (Bertoin 1996; Sato 1999). Even more is known in the case that our Lévy process is Brownian motion. You can start reading on this in (Bass 1995; Föllmer 1984b; Le Gall 1992; Revuz and Yor 1994), as well as Yor (1992, 1997).

Section 5 Supplementary Exercise 7 is quite classical and is a natural introduction to Lévy systems. Revuz and Yor (1994) is an excellent textbook account that is also equipped with a detailed bibliography. The original form of Supplementary Exercise 9 for p > 1 is from Rota (1962); it extends an older result of Burkholder and Chow (1961). When p = 1, this was announced in Rota (1969); see Burkholder (1962, footnotes). The proof outlined here is borrowed from Doob (1963). For a multiparameter extension of Rota's theorem, together with a number of results in multiparameter ergodic theory, see Sucheston (1983, Theorem 2.3). Supplementary Exercise 9 and Exercise 1.1.2 are only the beginnings of the theory of time reversal for Markov processes. Much more on this can be found in (Chung and Walsh 1969; Getoor 1979; Millar 1978; Nagasawa 1964); see also Bertoin (1996) for the special case of Lévy processes. Among many other things, these references will show you that Blumenthal's 0-1 law (Supplementary Exercise 11) holds for all strong Markov processes; this is a deep fact and lies at the heart of much of the general theory.
9 Generation of Markov Processes
In this chapter we briefly discuss a few of the many interactions between Markov processes and integro-differential equations. In particular, we will concentrate on connections involving the so-called infinitesimal generator of the Markov process. Suppose T denotes the transition operators of the process in question. Then, the semigroup property T_{t+s} = Tt Ts suggests the existence of an operator A such that Tt = e^{tA}, once this operator identity is suitably interpreted. If so, knowing the one operator A is equivalent to knowing all of the Tt's, and one expects A to be the right derivative of t → Tt at t = 0. The operator A is the so-called generator of the process, and is the subject of this chapter. The reader who is strictly interested in multiparameter processes may safely skip the material of Section 1.4 and beyond.
1 Generation

In Chapter 8 we saw that nice Markov processes are naturally described by two classes of linear operators: (1) their semigroups; and (2) their resolvents. It is a remarkable fact that there is one linear operator that completely describes the process, together with its semigroup and resolvent. This so-called generator often turns out to be an unbounded operator. In this section we show the existence of generators for Feller processes, while in the next section we shall explicitly compute them for two classes of Feller processes: isotropic Lévy processes such as Brownian motion, and generalized Poisson processes. Supplementary Exercises 1–5 contain further computations. These examples, together with the material of Sections 3 and
4 below, present a glimpse at some of the many deep connections between Markov processes and integro-differential equations.
1.1 Existence

Suppose T is a Feller semigroup on a locally compact, separable metric space S. Let R denote the resolvent of T. According to Theorems 3.3.1, 4.1.1, and 4.2.2 of Chapter 8, there exists an associated Feller process X with shift operators θ that is right-continuous and whose transition functions and resolvents are T and R, respectively. We begin with two technical lemmas.

Lemma 1.1.1 For all λ > 0, Rλ : C0(S) → Rλ(C0(S)) is a bijection. Thus, its inverse R_λ⁻¹ : Rλ(C0(S)) → C0(S) is a linear operator.

It is important to point out that R_λ⁻¹ need not be a bounded operator.

Proof If f ∈ C0(S) is such that for some λ > 0, Rλf ≡ 0, we can apply Corollary 2.2.1 of Chapter 8 to the resolvent equation to see that for all γ > 0, Rγf ≡ 0. Applying Lemma 2.4.1 of Chapter 8, we see that f = lim_{γ→+∞} γRγf = 0. In particular, if f, g ∈ C0(S) satisfy Rλf ≡ Rλg for some λ > 0, then f(x) = g(x) for all x ∈ S. In other words, Rλ maps C0(S) bijectively onto its range Rλ(C0(S)). Thus, the inverse R_λ⁻¹ : Rλ(C0(S)) → C0(S) exists, and for all f, g ∈ C0(S) and all α, β ∈ R,
$$ R_\lambda^{-1} R_\lambda(\alpha f + \beta g) = \alpha R_\lambda^{-1} R_\lambda f + \beta R_\lambda^{-1} R_\lambda g, $$
as can be seen by applying Rλ to both sides (why is this enough?). This concludes our proof.

Our second technical lemma is the following consequence of the resolvent equation (Theorem 2.2.1 of Chapter 8).

Lemma 1.1.2 For any λ, γ > 0, Rλ(C0(S)) = Rγ(C0(S)).

Exercise 1.1.1 Prove Lemma 1.1.2.
Given distinct λ, γ > 0 and f ∈ Rλ(C0(S)), define g = R_γ⁻¹f. By Lemma 1.1.2, this is sensible, and g ∈ C0(S). Thus, we can use the resolvent equation (Theorem 2.2.1 of Chapter 8) to conclude that
$$ R_\gamma^{-1} f - R_\lambda^{-1} f = (\gamma - \lambda) f. \tag{1} $$
(Why?) Equation (1) can be rewritten as λ − R_λ⁻¹ = γ − R_γ⁻¹. Thus, we can define a linear operator A : Rλ(C0(S)) → C0(S) by setting A = λ − R_λ⁻¹ for any λ > 0. To be more precise,
$$ A f(x) = \lambda f(x) - R_\lambda^{-1} f(x), \qquad x \in S,\ f \in \mathcal{D}(A), \tag{2} $$
where D(A) = Rλ (C0 (S)) is independent of the choice of λ. The operator A is interchangeably called the generator of T, R, and the process X. The space D(A) is the domain of the operator A. In other words, A : D(A) → C0 (S). We re-iterate that neither A nor its domain depend on λ > 0.
1.2 Identifying the Domain: The Hille–Yosida Theorem

It is usually very difficult to find the domain D(A) of the generator A of a Markov process. Typically, one aims to find a sufficiently large collection C ⊂ D(A). The following characterization of D(A) is sometimes useful; it was found, independently and at around the same time, in (Hille 1958; Yosida 1958). Two nice pedagogical treatments, plus a thorough combined bibliography, can be found in (Hille and Phillips 1957; Yosida 1995).

Theorem 1.2.1 (The Hille–Yosida Theorem) If A denotes the generator of a Feller semigroup T on a locally compact, separable metric space S, then
$$ \mathcal{D}(A) = \Bigl\{\varphi \in C_0(S) : \lim_{t\to 0+} \frac{T_t\varphi - \varphi}{t} \text{ exists in } C_0(S)\Bigr\}. $$
Moreover, if ϕ ∈ D(A), then
$$ A\varphi = \lim_{t\to 0+} \frac{T_t\varphi - \varphi}{t}, \tag{1} $$
where the limit takes place in C0(S).

Of course, this last statement means that for all ϕ ∈ D(A),
$$ \lim_{t\to 0+}\Bigl\| A\varphi - \frac{1}{t}\{T_t\varphi - \varphi\}\Bigr\|_\infty = 0. $$
In particular, Aϕ(x) is the derivative, from the right, of the function t → Ttϕ(x) at t = 0. Informally, T_{t+s} = Ts Tt is an exponential-type property; one may think of Tt as e^{tA}. However, this intuitive picture is not necessary for the discussion of this chapter. The usual treatments of this theorem contain a second half that can be found in Exercise 1.2.1 below.

Proof Throughout this argument we will use the following fact several times: Whenever ϕ ∈ D(A), there exist λ > 0 and f ∈ C0(S) such that ϕ = Rλf. In fact, for any λ > 0, such an f can be found. A direct computation reveals that for all f ∈ L∞(S), all λ > 0, and all t ≥ 0,
$$ T_t R_\lambda f = e^{\lambda t}\int_t^\infty e^{-\lambda s}\, T_s f\, ds. \tag{2} $$
See the given proof of the resolvent equation (Theorem 2.2.1, Chapter 8). In particular,
$$ T_t R_\lambda f - R_\lambda f = e^{-\lambda t}\,T_t R_\lambda f - R_\lambda f + (1 - e^{-\lambda t})\,T_t R_\lambda f = -\int_0^t e^{-\lambda s}\, T_s f\, ds + (1 - e^{-\lambda t})\,T_t R_\lambda f, $$
by equation (2). Thus,
$$ \frac{T_t(R_\lambda f) - R_\lambda f}{t} = -\frac{1}{t}\int_0^t e^{-\lambda s}\, T_s f\, ds + \frac{1 - e^{-\lambda t}}{t}\, T_t(R_\lambda f). $$
By the Feller property,
$$ \lim_{t\to 0+} \frac{T_t(R_\lambda f) - R_\lambda f}{t} = -f + \lambda R_\lambda f = A(R_\lambda f), $$
where the convergence takes place in C0(S). Now suppose ϕ ∈ D(A). Then ϕ = Rλf for some f ∈ C0(S) and λ > 0. This proves equation (1), and that
$$ \mathcal{D}(A) \subset \mathcal{D}' = \Bigl\{\varphi \in C_0(S) : \lim_{t\to 0+} \frac{T_t\varphi - \varphi}{t} \text{ exists in } C_0(S)\Bigr\}. \tag{3} $$
Conversely, suppose ϕ ∈ D′ and define A′ϕ = lim_{t→0+}{Ttϕ − ϕ}/t, where the limit takes place in C0(S). It follows that A′ is a linear operator with domain D(A′) = D′. Moreover, by equation (1), for any λ > 0,
$$ R_\lambda(A'\varphi) = \lim_{t\to 0+} \frac{T_t(R_\lambda\varphi) - R_\lambda\varphi}{t} = A(R_\lambda\varphi), $$
where the limit takes place in C0(S). We have used the continuity of f → Rλf; cf. Corollary 2.1.1 and Lemma 2.2.2 of Chapter 8. Recalling the definition of A, i.e., equation (2) of Section 1.1, we can rewrite the above as follows: For all ϕ ∈ D′, A(Rλϕ) = Rλ(A′ϕ) = λRλϕ − ϕ. Since ψ = λϕ − A′ϕ ∈ C0(S), ϕ = Rλψ ∈ D(A). We have shown that D′ ⊂ D(A) and completed the proof.

Theorem 1.2.1 shows that if we know the semigroup, we can, in principle, find the generator. The converse also holds. In particular, the single operator A completely determines the entire semigroup T. A more precise statement follows.

Theorem 1.2.2 Suppose T and T′ are two Feller semigroups on S. Denote their generators by A and A′ and their resolvents by R and R′, respectively. If D(A) = D(A′) = D and if Af = A′f on D, then T and T′ are one and the same.
Proof For all f ∈ D, Af − A′f = 0. Recalling (2) of Section 1.1 and applying Rλ to the latter equation, we see that for all λ > 0,
$$ 0 = R_\lambda(Af - A'f) = R_\lambda (R'_\lambda)^{-1} f - f. $$
By Lemma 1.1.2, for all ϕ ∈ C0(S) and all λ > 0, there exists f ∈ D such that R′λϕ = f. Thus, we have shown that for all ϕ ∈ C0(S) and all λ > 0, Rλϕ = R′λϕ. The present theorem now follows from the uniqueness of Laplace transforms; cf. Theorem 1.1.1 of Appendix B. In fact, the cycle A → T → R can be completed by showing that the resolvent R completely determines the generator A.

Exercise 1.2.1 (Theorem 1.2.1, Continued) Suppose A denotes the generator of a Feller semigroup T on a locally compact, separable metric space S. Prove that
$$ \mathcal{D}(A) = \Bigl\{\varphi \in C_0(S) : \lim_{\lambda\to\infty} \lambda\{\lambda R_\lambda\varphi - \varphi\} \text{ exists in } C_0(S)\Bigr\} $$
and Aϕ = lim_{λ→∞} λ{λRλϕ − ϕ}, where the limit takes place in C0(S).
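The Hille–Yosida characterization can be probed numerically. The sketch below assumes the standard fact, developed for Brownian motion in Section 2.1, that the Brownian generator acts as ½ d²/dx² on smooth functions; it checks that the difference quotient (T_tϕ − ϕ)/t of Theorem 1.2.1 approaches ½ϕ″ for a Gaussian test function. All numerical parameters are ad hoc choices.

```python
import math

def heat_semigroup(phi, x, t, half_width=0.5, n=50_000):
    """T_t phi(x) = Integral phi(y) q_t(y - x) dy for the 1-d heat
    (Brownian) semigroup, via the midpoint rule on [x - w, x + w]."""
    h = 2.0 * half_width / n
    norm = 1.0 / math.sqrt(2.0 * math.pi * t)
    total = 0.0
    for k in range(n):
        y = x - half_width + (k + 0.5) * h
        total += phi(y) * norm * math.exp(-(y - x) ** 2 / (2.0 * t))
    return total * h

phi = lambda x: math.exp(-x * x)
phi_dd = lambda x: (4.0 * x * x - 2.0) * math.exp(-x * x)  # phi''

# Hille-Yosida difference quotient: (T_t phi - phi)/t -> (1/2) phi'' as t -> 0+.
x, t = 0.5, 1e-3
approx = (heat_semigroup(phi, x, t) - phi(x)) / t
assert abs(approx - 0.5 * phi_dd(x)) < 1e-2
```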
1.3 The Martingale Problem

There are deep connections between generators and martingales. One such connection is the following.

Theorem 1.3.1 (The Martingale Problem: Sufficiency) Suppose A is the generator of a right-continuous Feller process X on a locally compact, separable metric space S. For any f ∈ D(A), define the adapted process M^f = (M_t^f; t ≥ 0) by
$$ M_t^f = f(X_t) - f(X_0) - \int_0^t A f(X_s)\, ds. $$
If F = (Ft ; t ≥ 0) denotes the complete augmented history of X, then M f is a mean-zero martingale with respect to the filtration F, under any of the measures Px , x ∈ S. We say that C ⊂ D(A) is an essential core for A if A(C) ⊂ C0 (S). The previous theorem is essentially sharp as long as the domain of the generator has a sufficiently large essential core, as the following reveals. Theorem 1.3.2 (The Martingale Problem: Necessity) Suppose A is the generator of a Feller process X on a locally compact, separable metric space S. Let C denote an essential core of A. Suppose further that
there exists a linear operator B : C → C0(S) such that for all f ∈ C, N^f = (N_t^f; t ≥ 0) is an F-martingale Px-a.s., for all x ∈ S, where
$$ N_t^f = f(X_t) - f(X_0) - \int_0^t B f(X_s)\, ds, \qquad t \ge 0. $$
Then, B = A on C.

Roughly speaking, the two parts of the martingale problem together assert that if the generator A of a Feller process X has a sufficiently large essential core, then A is determined by knowing all of the martingales adapted to the filtration of X. The main tool for verifying the sufficiency of the martingale problem is the very important Doob–Meyer decomposition for potentials in continuous time. In the setting of discrete Markov chains, its analogue was shown to be true in Corollary 1.5.1 of Chapter 8.

Theorem 1.3.3 (Doob–Meyer Decomposition of Potentials) For any x ∈ S, all λ > 0, and any ϕ ∈ C0(S), Px-a.s.,
$$ e^{-\lambda t} R_\lambda\varphi(X_t) + \int_0^t e^{-\lambda s}\varphi(X_s)\, ds = E_x\Bigl[\int_0^\infty e^{-\lambda s}\varphi(X_s)\, ds \Bigm| \mathcal{F}_t\Bigr]. $$
Proof By Theorem 4.1.1 of Chapter 8, on an appropriate probability space we can construct a Feller process X′ with the same finite-dimensional distributions as X (under Px, for any x ∈ S) such that X′ also has shift operators. Since the theorem can be reduced to statements about the finite-dimensional distributions of X, it suffices to prove the theorem for X′ replacing X. That is, we can assume, without loss of generality, that X possesses shift operators θ. For any x ∈ S, t ≥ 0, and λ > 0, the following holds Px-a.s.:
$$ R_\lambda\varphi(X_t) = \int_0^\infty e^{-\lambda r}\, T_r\varphi(X_t)\, dr = \int_0^\infty e^{-\lambda r}\, E_x[\varphi(X_{r+t}) \mid \mathcal{F}_t]\, dr = e^{\lambda t}\int_t^\infty e^{-\lambda s}\, E_x[\varphi(X_s) \mid \mathcal{F}_t]\, ds. $$
In fact, we have already seen this calculation in the course of the proof of Lemma 4.1.1 of Chapter 8. What is new, however, is the right continuity of t → Xt(ω) for every ω. In particular, (ω, t) → Xt(ω) is measurable. By Fubini's theorem, Px-a.s. for any x ∈ S,
$$ e^{-\lambda t} R_\lambda\varphi(X_t) = E_x\Bigl[\int_t^\infty e^{-\lambda s}\varphi(X_s)\, ds \Bigm| \mathcal{F}_t\Bigr], $$
from which the theorem ensues.
In particular, if ϕ is also assumed to be nonnegative, then t → e^{−λt}Rλϕ(Xt) is a supermartingale. Thus, the Doob–Meyer decomposition of potentials can be viewed as an improvement of Lemma 4.1.1 of Chapter 8. We can now demonstrate Theorem 1.3.1.

Proof of Theorem 1.3.1 Suppose f ∈ D(A). By its very definition, this implies that for all λ > 0, there exists a function ϕ ∈ C0(S) such that f = Rλϕ. Equivalently, Af = λRλϕ − ϕ. Applying the Doob–Meyer decomposition (Theorem 1.3.3), we obtain the following for all t ≥ 0: For all x ∈ S, Px-a.s.,
$$ e^{-\lambda t} f(X_t) - \int_0^t e^{-\lambda s} Af(X_s)\,ds + \lambda\int_0^t e^{-\lambda s} f(X_s)\,ds = \mathrm{E}_x\Big[ \int_0^\infty e^{-\lambda s}\varphi(X_s)\,ds \,\Big|\, \mathcal{F}_t \Big]. $$
In other words, Z^λ = (Z_t^λ; t ≥ 0) is a martingale with respect to F, where
$$ Z_t^\lambda = e^{-\lambda t} f(X_t) - \int_0^t e^{-\lambda s} Af(X_s)\,ds + \lambda\int_0^t e^{-\lambda s} f(X_s)\,ds. $$
To address integrability issues, we merely note that thanks to Lemma 2.2.2 of Chapter 8, ‖Af‖∞ ≤ λ‖Rλϕ‖∞ + ‖ϕ‖∞ ≤ 2‖ϕ‖∞. Moreover, by the dominated convergence theorem,
$$ \lim_{\lambda\downarrow 0} Z_t^\lambda = f(X_t) - \int_0^t Af(X_s)\,ds, $$
Px-a.s. and in L¹(Px) for all x ∈ S. By Supplementary Exercise 1 of Chapter 7, t → f(Xt) − ∫₀ᵗ Af(Xs) ds is a martingale, since it is the L¹(Px) limit of martingales. The result follows, since M₀^f = 0.

Proof of Theorem 1.3.2 Let f ∈ C be a fixed function and define D = (Dt; t ≥ 0) by
$$ D_t = \int_0^t Af(X_s)\,ds - \int_0^t Bf(X_s)\,ds, \qquad t \ge 0. $$
By our assumptions, D is an L²(Px) continuous martingale of bounded variation, for any x ∈ S. Since D₀ = 0, Theorem 3.1.1 of Chapter 7 implies that for any x ∈ S, Px(Dt = 0 for all t ≥ 0) = 1. On the other hand, since t → Xt is right-continuous and Af and Bf are bounded and continuous, Px-a.s. for all x ∈ S,
$$ Af(x) = Af(X_0) = \lim_{t\to 0^+}\frac{1}{t}\int_0^t Af(X_s)\,ds = \lim_{t\to 0^+}\frac{1}{t}\int_0^t Bf(X_s)\,ds = Bf(X_0) = Bf(x). $$
9. Generation of Markov Processes
This completes our derivation.
2 Explicit Computations

We now study the form of the generator of the Lévy processes that appeared in Examples 1 through 4 of Section 4.3, Chapter 8. The processes of interest are Brownian motion in Rd, isotropic stable processes in Rd, the Poisson process on R, and uniform motion in R.
2.1 Brownian Motion

We have encountered Rd-valued Brownian motion B = (Bt; t ≥ 0) in Example 1 of Section 4.3, Chapter 8. Recall that it is a continuous Feller process on Rd whose transition densities (with respect to Lebesgue's measure on Rd) are given by pt(x, y) = qt(y − x) = q(t; y − x), where
$$ q_t(a) = (2\pi t)^{-d/2}\exp\Big( -\frac{\|a\|^2}{2t} \Big), \qquad t > 0,\ a \in \mathbb{R}^d. $$
As usual, $\|a\|^2 = \sum_{j=1}^d |a^{(j)}|^2$ denotes the square of the ℓ²-norm of a ∈ Rd. The transition functions of B are determined as follows: For all f ∈ L∞(Rd),
$$ T_t f(x) = q_t * f(x) = \int_{\mathbb{R}^d} q_t(y-x)\, f(y)\,dy. $$
It is easy to see that q satisfies the classical heat equation
$$ \dot q(t; x) = \tfrac{1}{2}\Delta q(t; x), \qquad t > 0,\ x \in \mathbb{R}^d, \tag{1} $$
where $\dot q$ denotes differentiation in the time variable t, and Δ is the usual Laplace operator¹ (Laplacian, for short), acting on the spatial variable x. That is, for all twice differentiable functions g : Rd → R,
$$ \Delta g(x) = \sum_{j=1}^d \frac{\partial^2 g}{\partial (x^{(j)})^2}(x), \qquad x \in \mathbb{R}^d. $$
In particular, if f ∈ L∞(Rd) and τf(t; x) = Ttf(x), then we obtain the following from the dominated convergence theorem:
$$ \dot\tau_f(t; x) = \tfrac{1}{2}\Delta\tau_f(t; x), \qquad t > 0,\ x \in \mathbb{R}^d. \tag{2} $$
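As a quick sanity check, equation (1) can be tested numerically. The following Python sketch (the sample point and the step size h are arbitrary choices, not from the text) compares a centered finite-difference approximation of q̇ with ½q″ for the one-dimensional Gaussian kernel:

```python
import math

def q(t, x):
    # 1-dimensional Gaussian heat kernel q_t(x) = (2πt)^{-1/2} exp(-x²/(2t))
    return math.exp(-x * x / (2 * t)) / math.sqrt(2 * math.pi * t)

t, x, h = 1.0, 0.7, 1e-4
dq_dt = (q(t + h, x) - q(t - h, x)) / (2 * h)                 # time derivative
d2q_dx2 = (q(t, x + h) - 2 * q(t, x) + q(t, x - h)) / h**2    # spatial Laplacian
assert abs(dq_dt - 0.5 * d2q_dx2) < 1e-5                      # heat equation (1)
```

The same check works at any (t, x) with t > 0, since the kernel solves the heat equation identically.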
1 In the mathematical physics literature ∆ is often written as ∇ · ∇ or ∇2 ; in any case, ∆ is the trace of the (d × d) Hessian matrix H = (Di Dj ).
2 Explicit Computations
321
Now let Cc²(Rd) denote the collection of all twice continuously differentiable functions from Rd into R that vanish outside a compact set. Suppose now that f ∈ Cc²(Rd). Clearly, Δf ∈ C0(Rd) and $\tau_f(t; x) = \int_{\mathbb{R}^d} f(x-y)\, q_t(y)\,dy$. Consequently, equation (2) can be written as follows:
$$ \frac{\partial}{\partial t} T_t f(x) = T_t\big( \tfrac{1}{2}\Delta f \big)(x), \qquad t \ge 0,\ x \in \mathbb{R}^d. $$
We can use the identity T₀f = f, and write the above in integrated form:
$$ \frac{T_t f(x) - f(x)}{t} = \frac{1}{t}\int_0^t T_s\big( \tfrac{1}{2}\Delta f \big)(x)\,ds, \qquad t > 0,\ x \in \mathbb{R}^d. \tag{3} $$
Since B is a Feller process, we can use the fact that Δf ∈ C0(Rd) once more to observe that $\lim_{s\to 0^+} T_s(\tfrac{1}{2}\Delta f) = \tfrac{1}{2}\Delta f$ in C0(Rd). That is, for all ε > 0, there exists s₀ > 0 such that whenever s ∈ ]0, s₀[,
$$ \big\| T_s\big( \tfrac{1}{2}\Delta f \big) - \tfrac{1}{2}\Delta f \big\|_\infty \le \varepsilon. $$
Theorem 1.2.1 together with equation (3) yields the following.

Proposition 2.1.1 If A denotes the generator of Brownian motion on Rd, then Cc²(Rd) is an essential core for D(A). Moreover, on Cc²(Rd), A = ½Δ.

For all intents and purposes this is an adequate result on the generator of Brownian motion. Indeed, by Proposition 2.1.1, Cc²(Rd) ⊂ D(A) ⊂ C0(Rd), while it is a standard exercise in analysis to show that Cc²(Rd) is dense in C0(Rd). However, using ideas from the theory of distributions, one can do a little more in this special setting. If F, G : Rd → R are two measurable functions such that FG ∈ L¹(Rd), define $\langle F, G\rangle = \int_{\mathbb{R}^d} F(x)\,G(x)\,dx$. We have already seen ⟨•, •⟩ as an inner product on L²(Rd). However, ⟨F, G⟩ makes sense, for example, if F ∈ C0(Rd) and if G vanishes outside a compact set. Recall equation (2): For all f ∈ L∞(Rd),
$$ \frac{\partial}{\partial t} T_t f(x) = \tfrac{1}{2}\Delta T_t f(x), \qquad t \ge 0,\ x \in \mathbb{R}^d. \tag{4} $$
Let Cc∞(Rd) denote the collection of all infinitely differentiable functions from Rd into R that vanish outside a compact set. For ψ ∈ Cc∞(Rd), we can multiply both sides of (4) by ψ(x) and integrate (dx) to obtain the following for all t ≥ 0:
$$ \Big\langle \psi, \frac{\partial}{\partial t} T_t f \Big\rangle = \Big\langle \psi, \tfrac{1}{2}\Delta T_t f \Big\rangle. \tag{5} $$
Thanks to the dominated convergence theorem, the left-hand side is equal to (∂/∂t)⟨ψ, Ttf⟩. On the other hand, we can integrate the right-hand side of equation (5) by parts, and use the fact that ψ is zero outside a compact set, to see that ⟨ψ, ΔTtf⟩ = ⟨Δψ, Ttf⟩. In other words, we have shown that for all t ≥ 0, all f ∈ L∞(Rd), and all ψ ∈ Cc∞(Rd),
$$ \frac{\partial}{\partial t}\langle \psi, T_t f\rangle = \big\langle \tfrac{1}{2}\Delta\psi,\, T_t f \big\rangle. $$
Since T is Feller, we can integrate the above (in t) from 0 to r and divide by r to see that for all r > 0, all ψ ∈ Cc∞(Rd), and all f ∈ C0(Rd),
$$ \Big\langle \psi, \frac{T_r f - f}{r} \Big\rangle = \Big\langle \tfrac{1}{2}\Delta\psi,\, \frac{1}{r}\int_0^r T_t f\,dt \Big\rangle. \tag{6} $$
Now let us restrict f ∈ D(A). By the dominated convergence theorem, as r → 0+, the left-hand side converges to ⟨ψ, Af⟩, while the right-hand side converges to ⟨½Δψ, f⟩. That is,
$$ \langle \psi, Af\rangle = \big\langle \tfrac{1}{2}\Delta\psi,\, f \big\rangle, \qquad \psi \in C_c^\infty(\mathbb{R}^d),\ f \in \mathcal{D}(A). $$
To summarize our efforts thus far, we have established the following:
$$ \mathcal{D}(A) \subset \big\{ f \in C_0(\mathbb{R}^d) : \tfrac{1}{2}\Delta f \text{ exists in the sense of distributions} \big\}, $$
and on its domain, A = ½Δ, in the sense of distributions. On the other hand, Proposition 2.1.1 shows that for all f ∈ C0(Rd) and all ψ ∈ Cc∞(Rd), ⟨Aψ, f⟩ = ⟨½Δψ, f⟩. Consequently, we have demonstrated the following theorem.

Theorem 2.1.1 The domain of the generator of Brownian motion on Rd is precisely the class of all f ∈ C0(Rd) for which Δf exists, in the sense of distributions. Moreover, for all f ∈ D(A), Af = ½Δf, in the sense of distributions.
2.2 Isotropic Stable Processes

We now compute the form of the generator of the isotropic stable processes of Section 4.3, Chapter 8. Recall that the Rd-valued X = (Xt; t ≥ 0) is an isotropic stable process with index α ∈ ]0, 2] if it is a Lévy process on Rd with
$$ \mathrm{E}\big[ e^{i\xi\cdot X_t} \big] = \exp\big( -\tfrac{1}{2}\, t\, \|\xi\|^\alpha \big), \qquad \xi \in \mathbb{R}^d, $$
where ‖•‖ denotes the ℓ² Euclidean norm, as usual. Recall, further, that when α = 2, this is none other than d-dimensional Brownian motion.
We begin by recalling some elementary facts from harmonic analysis. Let C denote the complex plane. For any integrable f : Rd → C, f̂ denotes its Fourier transform in the following sense:
$$ \hat f(\xi) = \int_{\mathbb{R}^d} e^{i\xi\cdot x}\, f(x)\,dx, \qquad \xi \in \mathbb{R}^d. $$
We say that f ∈ L¹(Rd) if Re f and Im f are both in L¹(Rd). The same remark applies to all other (real) function spaces that we have seen thus far; this includes Lp(Rd), Cc∞(Rd), etc. In particular, by Fourier inversion, if f̂ ∈ L¹(Rd),
$$ f(x) = (2\pi)^{-d}\int_{\mathbb{R}^d} e^{-ix\cdot\xi}\, \hat f(\xi)\,d\xi, \qquad x \in \mathbb{R}^d. $$
Finally, let us recall Parseval's identity: Whenever f, g ∈ L¹(Rd) and fg ∈ L¹(Rd), then $\hat f\,\overline{\hat g} \in L^1(\mathbb{R}^d)$, and moreover,
$$ \int_{\mathbb{R}^d} f(x)\,\overline{g(x)}\,dx = (2\pi)^{-d}\int_{\mathbb{R}^d} \hat f(\xi)\,\overline{\hat g(\xi)}\,d\xi. $$
We now begin to compute the form of the generator of X. Let us define
$$ q_t(a) = (2\pi)^{-d}\int_{\mathbb{R}^d} e^{-i\xi\cdot a}\, e^{-\frac{1}{2} t \|\xi\|^\alpha}\,d\xi, \qquad t > 0,\ a \in \mathbb{R}^d. $$
The transition densities for X can then be written as pt(x, y) = qt(y − x); cf. Example 3, Section 4.3 of Chapter 8. Moreover, the following symmetry relation holds:
$$ q_t(a) = q_t(-a), \qquad t > 0,\ a \in \mathbb{R}^d. $$
Thus, for all f ∈ L∞(Rd),
$$ T_t f(x) = \int_{\mathbb{R}^d} q_t(y-x) f(y)\,dy = \int_{\mathbb{R}^d} q_t(x-y) f(y)\,dy = (2\pi)^{-d}\int_{\mathbb{R}^d}\int_{\mathbb{R}^d} e^{-i\xi\cdot(x-y)}\, f(y)\, e^{-\frac{1}{2} t \|\xi\|^\alpha}\,dy\,d\xi = (2\pi)^{-d}\int_{\mathbb{R}^d} \hat f(\xi)\, e^{-i\xi\cdot x}\, e^{-\frac{1}{2} t \|\xi\|^\alpha}\,d\xi, $$
by Fubini's theorem; also consult Example 3, Section 4.3 of Chapter 8. If f ∈ L∞(Rd) and f̂ ∈ L¹(Rd), then by Fourier's inversion theorem,
$$ f(x) = (2\pi)^{-d}\int_{\mathbb{R}^d} \hat f(\xi)\, e^{-i\xi\cdot x}\,d\xi. $$
Combining the last two displays, we conclude that whenever f ∈ L∞(Rd) and f̂ ∈ L¹(Rd),
$$ \frac{T_t f(x) - f(x)}{t} = (2\pi)^{-d}\int_{\mathbb{R}^d} \hat f(\xi)\, e^{-i\xi\cdot x}\, \frac{e^{-\frac{1}{2} t \|\xi\|^\alpha} - 1}{t}\,d\xi. \tag{1} $$
Using Taylor's expansion, it is easy to see that for all θ ≥ 0, all ε₀ > 0, and all t ∈ [0, ε₀],
$$ 0 \le \frac{1 - e^{-t\theta}}{\theta} \le \varepsilon_0\, e^{\theta\varepsilon_0}. \tag{2} $$
Define
$$ \mathbf{S}_\alpha = \Big\{ f \in L^\infty(\mathbb{R}^d)\ \Big|\ \exists\,\varepsilon_0 > 0 : \int_{\mathbb{R}^d} \big|\hat f(\xi)\big| \cdot \|\xi\|^\alpha\, e^{\frac{1}{2}\varepsilon_0 \|\xi\|^\alpha}\,d\xi < \infty \Big\}, \tag{3} $$
where the existence of f̂ implicitly implies that f ∈ L¹(Rd) as well. Combining (1) and (2), we can conclude from the dominated convergence theorem that whenever f ∈ Sα, then as t → 0+,
$$ \frac{T_t f(x) - f(x)}{t} \to -\frac{1}{2}(2\pi)^{-d}\int_{\mathbb{R}^d} \hat f(\xi)\cdot\|\xi\|^\alpha\, e^{-i\xi\cdot x}\,d\xi, $$
uniformly in x ∈ Rd. That is, we have shown that Sα ⊂ D(A) and
$$ Af(x) = -\frac{1}{2}(2\pi)^{-d}\int_{\mathbb{R}^d} \hat f(\xi)\cdot\|\xi\|^\alpha\, e^{-i\xi\cdot x}\,d\xi, \qquad x \in \mathbb{R}^d,\ f \in \mathbf{S}_\alpha. \tag{4} $$
Using the dominated convergence theorem once more, it is clear from (1) and (2) that Sα is an essential core for D(A). Thus, we have obtained the following.

Proposition 2.2.1 Let A denote the generator of an Rd-valued isotropic stable Lévy process of index α ∈ ]0, 2]. Then, Sα is an essential core for the domain of A, and on Sα, A is given by (4).

In fact, the above description captures the essence of the form of the generator, as the following shows.

Proposition 2.2.2 Let C denote any essential core of D(A). Then, for all f ∈ C such that Af ∈ L¹(Rd), Af is given by (4).

Proof Suppose f is in the domain of A. For all ϕ ∈ Cc∞(Rd),
$$ \Big\langle \varphi, \frac{T_t f - f}{t} \Big\rangle = (2\pi)^{-d}\int_{\mathbb{R}^d} \hat f(\xi)\,\overline{\hat\varphi(\xi)}\, \frac{e^{-\frac{1}{2} t \|\xi\|^\alpha} - 1}{t}\,d\xi, $$
where z̄ denotes the complex conjugate of z ∈ C. By the dominated convergence theorem, as t → 0+, the left-hand side converges to ⟨ϕ, Af⟩. Thus, if C denotes an essential core of A,
$$ \langle \varphi, Af\rangle = -\frac{1}{2}(2\pi)^{-d}\int_{\mathbb{R}^d} \hat f(\xi)\,\overline{\hat\varphi(\xi)}\,\|\xi\|^\alpha\,d\xi, \qquad \forall f \in \mathbf{C},\ \varphi \in \mathbf{S}_\alpha. $$
By Supplementary Exercise 5, Sα is dense in C0(Rd). Therefore, by Parseval's identity,
$$ \widehat{Af}(\xi) = -\tfrac{1}{2}\hat f(\xi)\cdot\|\xi\|^\alpha, \qquad \xi \in \mathbb{R}^d,\ f \in \mathbf{C}. \tag{5} $$
The proposition now follows from Fourier's inversion theorem.

When α = 2, X is d-dimensional Brownian motion and A = ½Δ, in the sense of distributions; cf. Theorem 2.1.1. In this case, direct computations reveal that for all f ∈ Cc²(Rd),
$$ \widehat{Af}(\xi) = -\tfrac{1}{2}\hat f(\xi)\cdot\|\xi\|^2, \qquad \xi \in \mathbb{R}^d. $$
Informally, this is written as $\widehat{-\Delta}(\xi) = \|\xi\|^2$. In view of equation (5), when A denotes the generator of an Rd-valued isotropic stable Lévy process of index α ∈ ]0, 2], then $\widehat{-2A}(\xi) = \|\xi\|^\alpha$ (again, informally), and therefore 2A is called the fractional Laplacian of power α/2. This operator is sometimes written, suggestively, as Δ^{α/2}.

Exercise 2.2.1 Verify by direct computations that in the above sense, $\widehat{-\Delta}(\xi) = \|\xi\|^2$ for all ξ ∈ Rd.

Exercise 2.2.2 Suppose X and Y are independent Lévy processes with generators A^X and A^Y, respectively. Let D(A^X) and D(A^Y) designate the respective generators' domains, and let Z denote the Lévy process given by the assignment Zt = Xt + Yt. (i) Prove that the domain of the generator of Z equals D(A^X) ∩ D(A^Y). (ii) Verify that the formal generator of Z is A^X + A^Y.
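The Fourier description of the generator lends itself to numerical experiment. The following sketch (assuming NumPy is available; the grid size and window are arbitrary choices of ours) applies the multiplier −½‖ξ‖^α of equation (4) on a grid via the FFT, and checks that for α = 2 it reproduces ½f″ for a Gaussian test function, as Theorem 2.1.1 predicts:

```python
import numpy as np

N, L = 2048, 40.0                         # grid size and window (arbitrary)
x = (np.arange(N) - N // 2) * (L / N)     # uniform grid centered at 0
f = np.exp(-x**2)                         # Gaussian test function, fast decay
xi = 2 * np.pi * np.fft.fftfreq(N, d=L / N)
fhat = np.fft.fft(np.fft.ifftshift(f))
alpha = 2.0
# apply the Fourier multiplier -(1/2)|ξ|^α and invert the transform
Af = np.fft.fftshift(np.fft.ifft(-0.5 * np.abs(xi)**alpha * fhat)).real
half_lap = 0.5 * (4 * x**2 - 2) * np.exp(-x**2)   # (1/2) f'' computed by hand
assert np.max(np.abs(Af - half_lap)) < 1e-6
```

Replacing `alpha = 2.0` by a value in ]0, 2[ computes the fractional Laplacian ½Δ^{α/2}f on the same grid.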
2.3 The Poisson Process

Let P = (Pt; t ≥ 0) denote an Rd-valued Poisson process with rate vector λ ∈ R₊^d. (We are following the notation of Example 2, Section 4.3 of Chapter 8.) We can now identify the generator of P, together with its domain.

Proposition 2.3.1 Let A denote the generator of P. Then, D(A) = C0(Rd) and
$$ Af(x) = \sum_{j=1}^d \big[ f(x + e_j) - f(x) \big]\,\lambda^{(j)}, \qquad f \in C_0(\mathbb{R}^d),\ x \in \mathbb{R}^d, $$
where e₁, …, e_d designate the standard basis vectors of Rd. That is, $e_j^{(i)} = \mathbb{1}_{\{j\}}(i)$, 1 ≤ i, j ≤ d.
Exercise 2.3.1 Prove Proposition 2.3.1.
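Proposition 2.3.1 can be checked numerically in dimension one, where Ex[f(Pt)] is an explicit Poisson mixture. The following Python sketch (the rate, the test function, and the step t are arbitrary choices of ours) compares the difference quotient (Ttf(x) − f(x))/t with Af(x) = λ(f(x+1) − f(x)):

```python
import math

lam, x = 1.5, 2.0
f = lambda y: math.exp(-y * y / 10.0)     # arbitrary smooth test function

def Ttf(t, x, terms=40):
    # E_x[f(P_t)] = Σ_k f(x + k) e^{-λt} (λt)^k / k!  (1-d Poisson mixture)
    return sum(f(x + k) * math.exp(-lam * t) * (lam * t) ** k / math.factorial(k)
               for k in range(terms))

Af = lam * (f(x + 1) - f(x))              # generator of Proposition 2.3.1
t = 1e-5
assert abs((Ttf(t, x) - f(x)) / t - Af) < 1e-3
```

The error of the difference quotient is of order t, so shrinking t tightens the agreement.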
2.4 The Linear Uniform Motion

We conclude this section by presenting the domain of the generator of uniform motion in the 1-dimensional case.

Proposition 2.4.1 Let Xt = X₀ + t for all t ≥ 0. Then, X = (Xt; t ≥ 0) is a Feller process on R whose generator A is given by Af(x) = f′(x), for all f ∈ C0(R) whose derivatives exist and are uniformly continuous. The latter collection of f's is precisely the domain of A.

Exercise 2.4.1 Prove Proposition 2.4.1.
3 The Feynman–Kac Formula

We now begin in earnest to discuss a few of the connections between Markov processes and some equations of analysis and/or mathematical physics. This section deals with the celebrated Feynman–Kac formula.²

Throughout this section let S denote a locally compact, separable metric space with one-point compactification S∆. Furthermore, X = (Xt; t ≥ 0) denotes a right-continuous Feller process on S∆ whose transition functions and generator are T = (Tt; t ≥ 0) and A, respectively. For the sake of convenience, we will also assume the existence of shift operators θ = (θt; t ≥ 0); cf. Section 3.3 of Chapter 8. Finally, we will hold fixed a nonnegative function υ ∈ L∞(S) and define
$$ V_t = \int_0^t \upsilon(X_s)\,ds, \qquad t \ge 0. $$
3.1 The Feynman–Kac Semigroup

We begin with a technical lemma.

Lemma 3.1.1 Suppose that f, υ ∈ L∞(S) and that υ satisfies
$$ \lim_{t\to 0^+} \| T_t\upsilon - \upsilon \|_\infty = 0. \tag{1} $$
2 This was found to various degrees of rigor and generality in (Feynman 1948; Kac 1951). A modern account, together with some applications, can be found in Fitzsimmons and Pitman (1999).
Then, uniformly over all x ∈ S,
$$ \lim_{t\to 0^+} \mathrm{E}_x\Big[ f(X_t)\cdot\frac{1 - e^{-V_t}}{t} \Big] = \upsilon(x)\, f(x). $$

Remark By the Feller property, any υ ∈ C0(S) satisfies equation (1).

Proof By Taylor's expansion, for all x ∈ R, $|1 - e^{-x} - x| \le \sum_{j=2}^\infty |x|^j/j!$. Apply this with x ≡ Vt to obtain the following: For all 0 ≤ t ≤ 1,
$$ \sup_{x\in S}\Big| \mathrm{E}_x\Big[ f(X_t)\cdot\frac{1 - e^{-V_t}}{t} \Big] - \mathrm{E}_x\Big[ f(X_t)\,\frac{V_t}{t} \Big] \Big| \le A\,t, \tag{2} $$
where A = ‖f‖∞ · ‖υ‖∞² e^{‖υ‖∞} (why?). Moreover,
$$ \sup_{x\in S}\Big| \mathrm{E}_x\Big[ f(X_t)\,\frac{V_t}{t} \Big] - \upsilon(x)\,T_t f(x) \Big| = \sup_{x\in S}\Big| \mathrm{E}_x\Big[ f(X_t)\,\frac{V_t}{t} \Big] - \mathrm{E}_x\big[ f(X_t)\,\upsilon(x) \big] \Big| \le \|f\|_\infty\cdot\frac{1}{t}\int_0^t \| T_s\upsilon - \upsilon \|_\infty\,ds. $$
The lemma follows from this, equation (2), and assumption (1).

Next, we define a collection of linear operators T^υ = (Tt^υ; t ≥ 0) as
$$ T_t^\upsilon f(x) = \mathrm{E}_x\big[ f(X_t)\, e^{-V_t} \big], \qquad f \in C_0(S),\ t \ge 0. \tag{3} $$
Our next result is the first indication that T^υ is an interesting collection of linear operators.

Lemma 3.1.2 Under the assumption (1), T^υ is a Feller semigroup on S.

This semigroup is sometimes known as the Feynman–Kac semigroup.³

Proof Clearly, T₀^υf = f, and Tt^υ is a positive linear operator. Next, we verify the semigroup property. By the Markov property (Theorem 3.3.2, Chapter 8), for all f ∈ L∞(S) and all x ∈ S, the following holds Px-a.s.:
$$ T_s^\upsilon f(X_t) = \mathrm{E}_{X_t}\Big[ f(X_s)\, e^{-\int_0^s \upsilon(X_r)\,dr} \Big] = \mathrm{E}_x\Big[ f(X_{s+t})\, e^{-\int_t^{t+s} \upsilon(X_r)\,dr} \,\Big|\, \mathcal{F}_t \Big], $$
where F = (Ft; t ≥ 0) denotes the complete augmented history of X. On the other hand, for all f ∈ L∞(S) and all x ∈ S,
$$ T_t^\upsilon T_s^\upsilon f(x) = \mathrm{E}_x\big[ T_s^\upsilon f(X_t)\, e^{-V_t} \big]. $$
³Oftentimes, this is said to correspond to the so-called potential υ. However, we will not use this terminology, since we are already using the term potential for something else.
Whenever f ∈ C0(S), Ttf ∈ C0(S) for all t ≥ 0. By the dominated convergence theorem, Tt^υf ∈ C0(S), also. This implies that Tt^υ(C0(S)) ⊂ C0(S). We obtain the semigroup property promptly. Let us conclude this argument by verifying the Feller property. For all f ∈ C0(S),
$$ \| T_t^\upsilon f - f \|_\infty = \sup_{x\in S}\big| \mathrm{E}_x\big[ f(X_t)\, e^{-V_t} \big] - f(x) \big| \le \sup_{x\in S}\big| \mathrm{E}_x\big[ f(X_t)\,( e^{-V_t} - 1 ) \big] \big| + \| T_t f - f \|_\infty. $$
Since X is Feller, Lemma 3.1.1 implies that T υ is Feller.
Let A^υ denote the generator of T^υ. Our main and final goal in this subsection is to identify A^υ and its domain in terms of A and its domain.

Theorem 3.1.1 (The Feynman–Kac Formula) Under the assumption (1), the domains of the generators A^υ and A are one and the same. Moreover, for all f ∈ D(A),
$$ A^\upsilon f(x) = Af(x) - \upsilon(x)\, f(x), \qquad x \in S. $$

Proof Clearly, for all f ∈ L∞(S) and all x ∈ S,
$$ \frac{T_t^\upsilon f(x) - f(x)}{t} = \frac{T_t f(x) - f(x)}{t} - \mathrm{E}_x\Big[ f(X_t)\,\frac{1 - e^{-V_t}}{t} \Big]. $$
By Lemma 3.1.1, for all f ∈ L∞(S), in particular for all f ∈ C0(S),
$$ \lim_{t\to 0^+} \sup_{x\in S}\Big| \frac{T_t^\upsilon f(x) - f(x)}{t} - \frac{T_t f(x) - f(x)}{t} + \upsilon(x)\, f(x) \Big| = 0. $$
The theorem follows readily.
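When υ ≡ c is constant, Vt = ct exactly, so Tt^υf = e^{−ct}Ttf, and Theorem 3.1.1 can be tested directly. The following sketch takes, for concreteness, X to be a rate-λ Poisson process on R (so Af(x) = λ(f(x+1) − f(x)) by Proposition 2.3.1); the rate, killing constant, and test function are our own arbitrary choices:

```python
import math

lam, c, x = 2.0, 0.7, 1.0
f = lambda y: 1.0 / (1.0 + y * y)         # arbitrary C0 test function

def Ttf(t, x, terms=40):
    # E_x[f(X_t)] for a rate-λ Poisson process, as an explicit Poisson mixture
    return sum(f(x + k) * math.exp(-lam * t) * (lam * t) ** k / math.factorial(k)
               for k in range(terms))

t = 1e-5
lhs = (math.exp(-c * t) * Ttf(t, x) - f(x)) / t    # (T_t^υ f - f)/t, υ ≡ c
rhs = lam * (f(x + 1) - f(x)) - c * f(x)           # Af - c f (Theorem 3.1.1)
assert abs(lhs - rhs) < 1e-3
```

The agreement improves linearly as t decreases, consistent with the difference-quotient definition of the generator.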
3.2 The Doob–Meyer Decomposition

We now prove a Doob–Meyer decomposition that corresponds to the Feynman–Kac semigroup. We will later apply this decomposition to make explicit computations for multidimensional Brownian motion; see the following section.

Theorem 3.2.1 (The Doob–Meyer Decomposition) Suppose that υ ∈ L∞(S) satisfies equation (1) of Section 3.1. For any f ∈ D(A), define M = (Mt; t ≥ 0) by
$$ M_t = e^{-V_t} f(X_t) - f(X_0) - \int_0^t e^{-V_s}\, A^\upsilon f(X_s)\,ds, \qquad t \ge 0. $$
Then, M is a mean-zero martingale with respect to the complete augmented filtration of X, and under any of the measures Px , x ∈ S.
Proof Recall that we have assumed the existence of shifts θ = (θt; t ≥ 0). If they do not exist, the proof only becomes notationally more cumbersome. For any s, t ≥ 0,
$$ V_t \circ \theta_s = V_{t+s} - V_s. \tag{1} $$
We now follow the proof of Theorem 1.3.3 closely, only making a few necessary adjustments. Let R^υ = (Rλ^υ; λ > 0) denote the resolvent corresponding to the generator A^υ. For any λ > 0, all ϕ ∈ L∞(S), and for all t ≥ 0, we can use the Markov property (Theorem 3.3.2 of Chapter 8) to see that Px-a.s. for all x ∈ S,
$$ R_\lambda^\upsilon\varphi(X_t) = \mathrm{E}_x\Big[ \int_0^\infty e^{-\lambda s - V_s\circ\theta_t}\,\varphi(X_{s+t})\,ds \,\Big|\, \mathcal{F}_t \Big]. $$
Since Vt is Ft-measurable, by equation (1), Px-a.s. for all x ∈ S,
$$ e^{-V_t} R_\lambda^\upsilon\varphi(X_t) = e^{\lambda t}\,\mathrm{E}_x\Big[ \int_0^\infty e^{-\lambda(s+t) - V_{s+t}}\,\varphi(X_{s+t})\,ds \,\Big|\, \mathcal{F}_t \Big] = e^{\lambda t}\,\mathrm{E}_x\Big[ \int_t^\infty e^{-\lambda r - V_r}\,\varphi(X_r)\,dr \,\Big|\, \mathcal{F}_t \Big]. $$
Given f ∈ D(A), we can use the Feynman–Kac formula (Theorem 3.1.1) to deduce that for any λ > 0, we can find ϕ ∈ C0(S) such that f = Rλ^υϕ. Equivalently, A^υf = λf − ϕ. Thus, we can apply the previous computations to this specific choice of ϕ to deduce that Px-a.s. for all x ∈ S,
$$ e^{-\lambda t - V_t} f(X_t) = \mathrm{E}_x\Big[ \int_0^\infty e^{-\lambda r - V_r}\varphi(X_r)\,dr \,\Big|\, \mathcal{F}_t \Big] - \int_0^t e^{-\lambda r - V_r}\varphi(X_r)\,dr = \mathrm{E}_x\Big[ \int_0^\infty e^{-\lambda r - V_r}\varphi(X_r)\,dr \,\Big|\, \mathcal{F}_t \Big] - \int_0^t e^{-\lambda r - V_r}\big( \lambda f(X_r) - A^\upsilon f(X_r) \big)\,dr. $$
Define
$$ Z_t^\lambda = e^{-\lambda t - V_t} f(X_t) - \int_0^t e^{-\lambda r - V_r}\, A^\upsilon f(X_r)\,dr + \lambda\int_0^t e^{-\lambda r - V_r}\, f(X_r)\,dr, \qquad t \ge 0. $$
Then, we have shown that Z^λ = (Zt^λ; t ≥ 0) is a martingale with respect to F, and under any of the measures Px, x ∈ S. The result follows, by the dominated convergence theorem, upon letting λ → 0+.
4 Exit Times and Brownian Motion

Throughout this section we let B = (Bt; t ≥ 0) denote a d-dimensional Brownian motion with complete augmented filtration F = (Ft; t ≥ 0).
Recall from Chapter 8 that Px is the P-distribution of x + B, for any x ∈ Rd. We will write P for P0 and use the fact that t → Bt is continuous. Let us define B(a; r) as the open ℓ²-ball of radius r > 0 around a ∈ Rd. That is,
$$ B(a; r) = \{ b \in \mathbb{R}^d : \|a - b\| < r \}, \qquad a \in \mathbb{R}^d,\ r > 0. $$
Then, the exit time ζ(a; r) from the ball B(a; r) is defined as
$$ \zeta(a; r) = \inf\{ s > 0 : B_s \notin B(a; r) \}. $$
Clearly, ζ(a; r) is an F-stopping time. Among other things, in this section we shall compute the distribution of ζ(a; r). This will be done by establishing connections between Brownian motion and partial differential equations. The material of this and the remaining sections of this chapter is independent of the rest of the book, and can be skipped at first reading.

In order to simplify our exposition, let τ(a; r) = T_{∂B(a;r)} denote the first time to hit the boundary of the ball B(a; r). If our Brownian motion B starts outside the closure of B(a; r), then ζ(a; r) = 0 (why?). On the other hand, if it starts in the interior of B(a; r), the continuity of t → Bt ensures that ζ(a; r) = τ(a; r). Thus, we will concentrate on computing the distribution of τ(a; r).⁴
4.1 Dimension One

When d = 1, it is possible to find the distribution of τ(a; r) in a simple and elegant way. We will now make this computation without further ado. The remaining calculations, i.e., those for d ≥ 2, are more delicate, and rely on further analysis that we will perform in the subsequent subsections.

Theorem 4.1.1 (P. Lévy) If d = 1, then for any x ∈ R and all α > 0,
$$ \mathrm{E}_x\big[ e^{-\alpha\tau(x;r)} \big] = \operatorname{sech}\big( r\sqrt{2\alpha} \big), $$
where sech denotes the hyperbolic secant.

Proof Under Px, B − x has the same distribution as B under P. Thus, Ex[e^{−ατ(x;r)}] = E[e^{−ατ(0;r)}]. We should recall that E = E0. That is, we have reduced the problem for general x to that for x = 0. From now on, we hold r > 0 fixed and write τ = τ(0; r). The remainder of the proof is a sign of what is to come in higher dimensions: We strive to find a good martingale, and apply the optional stopping theorem.

⁴It turns out that if B starts on the boundary of B(a; r), then ζ(a; r) is still zero. In this regard, see Supplementary Exercise 6.
Recall from Corollary 1.7.1(iii), Chapter 7, that for any λ > 0, M = (Mt; t ≥ 0) is a mean-one F-martingale, where
$$ M_t = \exp\Big( \lambda B_t - \frac{\lambda^2}{2}\, t \Big), \qquad t \ge 0. $$
Since t → Bt is continuous, Corollary 1.7.2 of Chapter 7 shows that τ is a.s. finite. Thus, we can apply the optional stopping theorem (Theorem 1.7.1, Chapter 7) to deduce that for all t ≥ 0,
$$ 1 = \mathrm{E}[M_{t\wedge\tau}] = \mathrm{E}\Big[ e^{\lambda B_{t\wedge\tau} - \frac{1}{2}\lambda^2(t\wedge\tau)} \Big]. $$
On the other hand, by path continuity, sup_{s<τ} |Bs| ≤ r, since τ < ∞, P-a.s. Thus, we can apply the dominated convergence theorem to legitimately let t → ∞ and conclude that
$$ 1 = \mathrm{E}\Big[ e^{\lambda B_\tau - \frac{\lambda^2}{2}\tau} \Big]. $$
Since P(Bτ = ±r) = 1,
$$ 1 = e^{-\lambda r}\,\mathrm{E}\Big[ e^{-\frac{\lambda^2}{2}\tau}\,\mathbb{1}_{(B_\tau = -r)} \Big] + e^{\lambda r}\,\mathrm{E}\Big[ e^{-\frac{\lambda^2}{2}\tau}\,\mathbb{1}_{(B_\tau = r)} \Big]. \tag{1} $$
The processes B and −B have the same finite-dimensional distributions. Therefore, the random vectors (Bτ, τ) and (−Bτ, τ) have the same distributions (why?). As a result, the two expectations in equation (1) are equal; i.e.,
$$ \mathrm{E}\Big[ e^{-\frac{\lambda^2}{2}\tau}\,\mathbb{1}_{(B_\tau = r)} \Big] = \mathrm{E}\Big[ e^{-\frac{\lambda^2}{2}\tau}\,\mathbb{1}_{(B_\tau = -r)} \Big] = \frac{1}{2}\,\mathrm{E}\Big[ e^{-\frac{\lambda^2}{2}\tau} \Big]. $$
Plugging this into equation (1), we can see that for all λ > 0,
$$ \mathrm{E}\Big[ e^{-\frac{\lambda^2}{2}\tau} \Big] = \frac{2}{e^{-\lambda r} + e^{\lambda r}} = \operatorname{sech}(\lambda r). $$
The theorem follows upon letting λ = √(2α).
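Theorem 4.1.1 invites a Monte Carlo sanity check. The sketch below (assuming NumPy; the Euler step, seed, and sample size are arbitrary choices of ours, and the tolerance is deliberately loose because discretizing the path biases the exit time upward) estimates E[e^{−ατ}] by simulation:

```python
import numpy as np

rng = np.random.default_rng(1)
r, alpha, dt = 1.0, 0.5, 2e-3
n_paths, n_steps = 1000, 10000          # horizon T = 20 >> E[τ] = r²
vals = np.empty(n_paths)
for i in range(n_paths):
    path = np.cumsum(rng.normal(scale=np.sqrt(dt), size=n_steps))
    exits = np.nonzero(np.abs(path) >= r)[0]
    # approximate exit time of [-r, r]; the fallback is essentially never used
    tau = (exits[0] + 1) * dt if exits.size else n_steps * dt
    vals[i] = np.exp(-alpha * tau)
exact = 1.0 / np.cosh(r * np.sqrt(2 * alpha))   # sech(r√(2α)), Theorem 4.1.1
assert abs(vals.mean() - exact) < 0.06
```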
4.2 Some Fundamental Local Martingales

Recall from Section 3.3 of Chapter 7 that a real-valued stochastic process M = (Mt; t ≥ 0) is a local martingale (with respect to F, under all measures Px, x ∈ Rd) if: (a) for all t ≥ 0, Mt is adapted to Ft; (b) there exist possibly infinite F-stopping times τ₁, τ₂, … such that (b₁) for all x ∈ Rd, Px(lim_{k→∞} τk = +∞) = 1; and
(b₂) for all x ∈ Rd and all k ≥ 1, t → M_{t∧τk} is a martingale under the measure Px.

Lemma 3.3.1 of Chapter 7 shows that if we truncate local martingales, we obtain genuine martingales. Conversely, if M is a martingale, so is t → M_{t∧τk} for any k ≥ 1, where τk is the sequence of stopping times given in Lemma 3.3.1, Chapter 7; see the optional stopping theorem (Theorem 1.6.1, Chapter 7). In this section we will find some useful local martingales. Define
$$ u(x) = \begin{cases} -\ln\|x\|, & \text{if } d = 2, \\ \|x\|^{2-d}, & \text{if } d \ge 3, \end{cases} \tag{1} $$
where we interpret −ln 0 = 1/0 = +∞. The function u is said to be the fundamental harmonic function with pole at the origin. That u has a pole, or singularity, at the origin is clear; the word harmonic comes from the following.

Lemma 4.2.1 Let d ≥ 2. The function u given by (1) solves the PDE
$$ \Delta u(x) = 0, \qquad x \in \mathbb{R}^d\setminus\{0\}, $$
where $\Delta = \sum_{j=1}^d \partial^2/\partial(x^{(j)})^2$ denotes the Laplacian.
Exercise 4.2.1 Verify Lemma 4.2.1 directly.
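Exercise 4.2.1 can also be checked numerically. The following sketch (the sample point and the step size h are arbitrary choices of ours) approximates Δu by centered second differences for d = 3, where u(x) = ‖x‖^{−1}:

```python
import math

def u(x, y, z):
    # fundamental harmonic function with pole at the origin, d = 3
    return 1.0 / math.sqrt(x * x + y * y + z * z)

p, h = (0.5, -0.3, 0.8), 1e-3
# sum of centered second differences in each coordinate ≈ Δu(p)
lap = sum(
    (u(*[c + h * (i == j) for j, c in enumerate(p)])
     - 2 * u(*p)
     + u(*[c - h * (i == j) for j, c in enumerate(p)])) / h**2
    for i in range(3)
)
assert abs(lap) < 1e-4   # u is harmonic away from the origin
```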
The partial differential equation Δu = 0, given above, is the so-called Dirichlet problem of mathematical physics; its solutions are called harmonic functions. The general radial solution to this PDE turns out to be c₁u(x) + c₂, where c₁ and c₂ are arbitrary constants (why?). This explains why u is said to be the fundamental harmonic function with pole at 0 ∈ Rd. The following shows one of the many deep connections between PDEs and Markov processes.

Lemma 4.2.2 Let d ≥ 2 and let u denote the fundamental harmonic function with pole at the origin. Then, u(B) = (u(Bt); t ≥ 0) is a local martingale under the measure Px, x ∈ Rd \ {0}.

Proof Recall from Theorem 2.1.1 that ½Δ is the generator of B. Moreover, note that u is not in the domain of this generator, since it is not in C0(Rd). Before giving an honest proof, let us skirt this issue and formally argue why the result holds true. We will then proceed with a precise demonstration. By Lemma 4.2.1, Δu ≡ 0. Therefore, one would expect that (by the martingale problem, cf. Theorem 1.3.1) t → u(Bt) − u(x) is a mean-zero martingale under the measure Px, for any x ∈ Rd. This would imply a strong form of the theorem, i.e., one that asserts a martingale property,
rather than a local martingale property. However, as it turns out, u(B) is not a martingale; cf. Supplementary Exercise 7. Therefore, we need to proceed with more caution.

Now we proceed with our derivation. Let us fix a constant a > 0 and define U(a) = {x ∈ Rd : |u(x)| < a}. Clearly, this is an open set. Define
$$ \sigma(a) = \inf\{ s \ge 0 : B_s \notin U(a) \}, \qquad a > 0, $$
where, as usual, inf ∅ = ∞. Note that σ(a) is a stopping time (Theorem 1.2.1, Chapter 7). Moreover, it is not always the same as inf{s > 0 : Bs ∉ U(a)}. By Supplementary Exercise 14 of Chapter 8, the absorbed process B^a = (B_{t∧σ(a)}; t ≥ 0) is a Feller process on the compact space U(a). The domain of its generator is described in Supplementary Exercise 8; this domain includes the collection of all continuous functions f : U(a) → R for which Δf exists in the sense of distributions. Define u_a(x) = u(x)1l_{U(a)}(x). Then, u_a is in the domain of the generator of B^a. Furthermore, in the sense of distributions, Δu_a ≡ 0, since Δu ≡ 0. Therefore, we can apply the martingale problem (Theorem 1.3.1) to the process B^a, and conclude that t → u_a(B_t^a) is a martingale under any of the measures Px, x ∈ U(a). On the other hand, for all t ∈ ]0, σ(a)], u_a(B_t^a) = u(Bt). By the optional stopping theorem, t → u(B_{t∧σ(a)}) is a martingale under Px, for any x ∈ U(a). Thus, (σ(k); k ≥ 1) is a localizing sequence, and u(B) is a local martingale under Px, x ∈ U(a). Since a > 0 is arbitrary, we are done.

Before proceeding with our next lemma, we first need access to an important class of martingales.

Exercise 4.2.2 If B denotes Rd-valued Brownian motion, then t → ‖Bt‖² − dt is a mean-zero martingale under the measure P. Consequently, t → ‖Bt − x‖² − dt is a mean-zero martingale under the measure Px, for any x ∈ Rd.

Lemma 4.2.3 For all a ∈ Rd, all r > 0, and all x ∈ B(a; r), Px(τ(a; r) < ∞) = 1.

Proof Combining Exercise 4.2.2 with the optional stopping theorem (Theorem 1.6.1, Chapter 7) yields
$$ \mathrm{E}_x\big[ \| B_{t\wedge\tau(a;r)} - x \|^2 \big] = d\,\mathrm{E}_x[ t\wedge\tau(a;r) ]. \tag{2} $$
On the other hand, Px-a.s.,
$$ \sup_t \| B_{t\wedge\tau(a;r)} - x \| \le \|x\| + \|a\| + r \le 2(\|a\| + r), $$
which is finite. Thus, we can take t → ∞ in (2) to obtain
$$ \mathrm{E}_x\big[ \| B_{\tau(a;r)} - x \|^2 \big] = d\,\mathrm{E}_x[ \tau(a;r) ]. $$
You need to invoke the dominated convergence theorem for the left-hand side of (2) and the monotone convergence theorem for the corresponding right-hand side. Since ‖B_{τ(a;r)} − x‖ ≤ ‖a‖ + r + ‖x‖ ≤ 2(‖a‖ + r), Px-a.s., Ex[τ(a; r)] ≤ 4(‖a‖ + r)²/d < ∞. The lemma follows readily from this.
We can now apply the above lemmas to solve a "gambler's ruin problem"; it may be helpful to recall that we have already encountered the 1-dimensional gambler's ruin problem in Corollary 1.7.3 of Chapter 7.

Theorem 4.2.1 (Gambler's Ruin) If d ≥ 2, then for 0 < r < R and x ∈ Rd with r ≤ ‖x‖ ≤ R,
$$ \mathrm{P}_x\big( \tau(0; r) < \tau(0; R) \big) = \begin{cases} \dfrac{\ln(R/\|x\|)}{\ln(R/r)}, & \text{if } d = 2, \\[2ex] \dfrac{\|x\|^{2-d} - R^{2-d}}{r^{2-d} - R^{2-d}}, & \text{if } d \ge 3. \end{cases} $$

Figure 9.1 shows the event that the two-dimensional Brownian path starting at x ∈ Rd with ‖x‖ = ρ satisfies τ(0; R) < τ(0; r). Let us now prove the theorem.

Proof We shall fix x, r, and R as given by the statement of the theorem. Next, let us define U : ]0, ∞[ → R by
$$ U(y) = \begin{cases} -\ln y, & \text{if } d = 2, \\ y^{2-d}, & \text{if } d \ge 3, \end{cases} \qquad y > 0. $$
Clearly, u(x) = U(‖x‖), for all x ∈ Rd \ {0}. We will show that
$$ \mathrm{P}_x\big( \tau(0; r) < \tau(0; R) \big) = \frac{U(\|x\|) - U(R)}{U(r) - U(R)}, \tag{3} $$
which is the same as the assertion of our theorem. By Lemma 4.2.2 above, (U(‖Bt‖); t ≥ 0) is a local martingale under Px. On the other hand, with probability one,
$$ \sup_{t\ge 0}\Big| U\big( \| B_{t\wedge\tau(0;r)\wedge\tau(0;R)} \| \big) \Big| \le |U(R)| \vee |U(r)|, \tag{4} $$
which is finite. Hence, by the optional stopping theorem (Theorem 1.6.1, Chapter 7), t → U(‖B_{t∧τ(0;r)∧τ(0;R)}‖) is a martingale. In particular, we can take expectations to deduce
$$ U(\|x\|) = \mathrm{E}_x\Big[ U\big( \| B_{t\wedge\tau(0;r)\wedge\tau(0;R)} \| \big) \Big]. $$
Figure 9.1: Gambler's ruin for 2-dimensional Brownian motion

By combining Lemma 4.2.3, equation (4), and the dominated convergence theorem, we can let t → ∞ to see that
$$ U(\|x\|) = \mathrm{E}_x\Big[ U\big( \| B_{\tau(0;r)\wedge\tau(0;R)} \| \big) \Big] = U(r)\,\mathrm{P}_x\{ \tau(0;r) < \tau(0;R) \} + U(R)\,\mathrm{P}_x\{ \tau(0;r) > \tau(0;R) \} = \big( U(r) - U(R) \big)\,\mathrm{P}_x\{ \tau(0;r) < \tau(0;R) \} + U(R). $$
Equation (3) follows.
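The gambler's-ruin probability is easy to test by simulation in d = 3. The sketch below (assuming NumPy; the step size, seed, and path count are our own choices, and the tolerance is loose on account of discretization bias and Monte Carlo error) estimates Px(τ(0; 1) < τ(0; 2)) for ‖x‖ = 1.5:

```python
import numpy as np

rng = np.random.default_rng(2)
r, R, dt = 1.0, 2.0, 2e-3
n_paths, n_steps = 600, 15000              # horizon T = 30, ample for this annulus
start = np.array([1.5, 0.0, 0.0])          # so ‖x‖ = 1.5, between r and R
hits_inner = 0
for _ in range(n_paths):
    incr = rng.normal(scale=np.sqrt(dt), size=(n_steps, 3))
    norms = np.linalg.norm(np.cumsum(incr, axis=0) + start, axis=1)
    exited = np.nonzero((norms <= r) | (norms >= R))[0]
    if exited.size:
        hits_inner += norms[exited[0]] <= r      # hit the inner sphere first?
exact = (1 / 1.5 - 1 / R) / (1 / r - 1 / R)      # = 1/3 by Theorem 4.2.1, d = 3
assert abs(hits_inner / n_paths - exact) < 0.08
```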
4.3 The Distribution of Exit Times

The main result of this subsection is the following computation of the Laplace transform of the distribution of τ(a; r). At this level of generality, this computation is from Ciesielski and Taylor (1962); see Lévy (1965) for several related results. We follow the elegant exposition of Knight (1981).

Theorem 4.3.1 (The Ciesielski–Taylor Theorem) For all a ∈ Rd, r > 0, every x ∈ B(a; r), and all α > 0,
$$ \mathrm{E}_x\big[ e^{-\alpha\tau(a;r)} \big] = \frac{\|x-a\|^{1-\frac{d}{2}}\, I_{\frac{d}{2}-1}\big( \sqrt{2\alpha}\,\|x-a\| \big)}{r^{1-\frac{d}{2}}\, I_{\frac{d}{2}-1}\big( \sqrt{2\alpha}\, r \big)}, $$
where Iν is the modified Bessel function of index ν.
Recall that when ν > −½, the modified Bessel functions Iν and Kν are given by⁵
$$ I_\nu(z) = \frac{(z/2)^\nu}{\Gamma(\nu + \frac{1}{2})\sqrt{\pi}} \int_{-1}^{1} e^{-zt}\,(1 - t^2)^{\nu - \frac{1}{2}}\,dt; \tag{1} $$
$$ K_\nu(z) = \frac{\sqrt{\pi}\, z^\nu}{2^\nu\,\Gamma(\nu + \frac{1}{2})} \int_1^\infty e^{-zt}\,(t^2 - 1)^{\nu - \frac{1}{2}}\,dt. \tag{2} $$
Moreover, when ν ≤ −½, they are defined by the identities Iν = I−ν, Kν = K−ν. (We have already encountered the function K½ in Section 2.3 of Chapter 8.) Now, consider the following Bessel's equation:
$$ G''(s) + \frac{1}{s}\, G'(s) - \Big( 1 + \frac{\nu^2}{s^2} \Big) G(s) = 0, \qquad s > 0. \tag{3} $$
The general solution to (3) is G(s) = c₁Iν(s) + c₂Kν(s), where c₁ and c₂ are arbitrary constants.

Lemma 4.3.1 Consider the following form of Bessel's equation:
$$ G''(s) + \frac{\theta}{s}\, G'(s) - \kappa\, G(s) = 0, \qquad s > 0, $$
where θ ∈ R and κ > 0. Then, the general solution to the above is of the form
$$ G(s) = c_1\, s^{\frac{1}{2}(1-\theta)}\, I_{\frac{1}{2}(\theta-1)}\big( \sqrt{\kappa}\, s \big) + c_2\, s^{\frac{1}{2}(1-\theta)}\, K_{\frac{1}{2}(\theta-1)}\big( \sqrt{\kappa}\, s \big), $$
where c₁ and c₂ are arbitrary constants.

Exercise 4.3.1 Check the validity of Lemma 4.3.1.
Lemma 4.3.2 For any r, α > 0, x → Ex[e^{−ατ(0;r)}] is a radial function.

Proof For a fixed r > 0, we define v(x) = Ex[e^{−ατ(0;r)}], x ∈ Rd, and will show that for any (unitary) rotation matrix O and all x ∈ Rd, v(Ox) = v(x). It is easy to see that for any (d × d) (unitary) rotation matrix O, OB = (OBt; t ≥ 0) is d-dimensional Brownian motion. Indeed, since OB is a continuous Gaussian process, one does this by simply checking the covariances (check this!). Let τ′(0; r) denote the hitting time of ∂B(0; r) for the process OB. Since O preserves the norm, ‖OBt‖ = ‖Bt‖, so that τ′(0; r) = τ(0; r) pathwise; moreover, the distribution of τ′(0; r) under Px is the same as that of τ(0; r) under POx. The result follows readily from this observation.

Lemma 4.3.3 For any r, α > 0,
$$ \lim_{x\in B(0;r):\ \|x\|\to r} \mathrm{E}_x\big[ e^{-\alpha\tau(0;r)} \big] = 1. $$

⁵For a thorough treatment of Bessel functions, see Watson (1995).
Proof Since this result has to do with distributional properties, we can assume, without loss of generality, that there are shifts θ = (θt; t ≥ 0) (why?). Let us now hold fixed a y ∈ B(0; r) and a u ∈ ]‖y‖, r[. Since t → Bt is continuous, starting from y, Brownian motion must first exit B(0; u) before exiting B(0; r). In other words, by inspecting the paths,
$$ \tau(0; r) = \tau(0; u) + \tau(0; r)\circ\theta_{\tau(0;u)}, \qquad \mathrm{P}_y\text{-a.s.} $$
On the other hand, B is a strong Markov process (Example 1 of Section 4.3, Chapter 8). By the strong Markov property (Theorem 4.2.1, Chapter 8),
$$ \mathrm{E}_y\big[ e^{-\alpha\tau(0;r)} \big] = \mathrm{E}_y\Big[ e^{-\alpha\tau(0;u)}\, \mathrm{E}_{B_{\tau(0;u)}}\big\{ e^{-\alpha\tau(0;r)} \big\} \Big]. \tag{4} $$
We have tacitly used Lemma 4.2.3 of the previous subsection. Since t → Bt is continuous, B_{τ(0;u)} ∈ ∂B(0; u). Hence, Lemma 4.3.2 shows that for any x ∈ ∂B(0; u),
$$ \mathrm{E}_{B_{\tau(0;u)}}\big[ e^{-\alpha\tau(0;r)} \big] = \mathrm{E}_x\big[ e^{-\alpha\tau(0;r)} \big]. $$
Consequently, equation (4) has the following reformulation: For all x, y ∈ B(0; r) with ‖x‖ > ‖y‖,
$$ \mathrm{E}_y\big[ e^{-\alpha\tau(0;r)} \big] = \mathrm{E}_y\big[ e^{-\alpha\tau(0;\|x\|)} \big]\, \mathrm{E}_x\big[ e^{-\alpha\tau(0;r)} \big]. $$
By the continuity of t → Bt, as ‖x‖ → r−, τ(0; ‖x‖) → τ(0; r), Py-a.s. The lemma follows immediately from the previous display.

We can now prove Theorem 4.3.1.

Proof of Theorem 4.3.1 Recall from Theorem 2.1.1 that the generator of B is ½Δ, where Δ is the Laplacian in the sense of distributions. Moreover, its domain is the collection of all f ∈ C0(Rd) for which Δf exists. Suppose that we could find a radial function f in this domain such that
$$ \tfrac{1}{2}\Delta f = \alpha f, \tag{5} $$
where α > 0 is a fixed real number. Let υ(x) ≡ α for all x ∈ Rd, and note that by the Feynman–Kac formula (Theorem 3.1.1), A^υf = ½Δf − αf. That is, equation (5) is equivalent to the condition
$$ A^\upsilon f(x) = 0, \qquad \forall x \in \mathbb{R}^d. $$
By the Doob–Meyer decomposition (Theorem 3.2.1), for all x ∈ B(0; r), γ = (γt; t ≥ 0) is a continuous, mean-zero martingale under the measure Px, where
$$ \gamma_t = e^{-\alpha t} f(B_t) - f(x), \qquad t \ge 0. $$
By the optional stopping theorem (Theorem 1.6.1, Chapter 7), for all t ≥ 0 and all x ∈ B(0; r), Ex[γ_{t∧τ(0;r)}] = 0. Equivalently,
$$ f(x) = \mathrm{E}_x\big[ e^{-\alpha(t\wedge\tau(0;r))}\, f(B_{t\wedge\tau(0;r)}) \big], \qquad x \in B(0; r). $$
Since f ∈ C0(Rd), we can use the dominated convergence theorem to let t → ∞, and see that
$$ f(x) = \mathrm{E}_x\big[ e^{-\alpha\tau(0;r)}\, f(B_{\tau(0;r)}) \big], \qquad x \in B(0; r). $$
But B_{τ(0;r)} ∈ ∂B(0; r), and f is radial. Therefore,
$$ f(x) = f(0, \ldots, 0, r)\times \mathrm{E}_x\big[ e^{-\alpha\tau(0;r)} \big], \qquad x \in B(0; r). \tag{6} $$
That is, in summary, if there is a radial solution to the PDE (5), it must provide the general form of the Laplace transform of the exit time of B(0; r). Consequently, it suffices to solve equation (5) by a radial function. Henceforth, let f denote a radial C0 (Rd ) solution to equation (5). We will write f (x) = F (x), for all x ∈ Rd , where F is a function on R+ . It is easy to check directly from equation (5) that F must satisfy the differential equation d−1
F (s) − 2αF (s) = 0, s ≥ 0. F
(s) + s According to Lemma 4.3.1, √ √
d d s ≥ 0, F (s) = c1 s1− 2 I d −1 2αs + c2 s1− 2 K d −1 2αs , 2
2
for some constants c1 and c2 . On the other hand, f (x) = F (x) is bounded. Since z → z −ν Kν (z) is not bounded near 0 (equation (2)), the constant c2 above must be 0. That is, √
d f (x) = c1 x1− 2 I d −1 2αx . 2
Therefore, equation (6) and the above together yield a constant C such that for all x ∈ B(0; r), √ d Ex e−ατ (0;r) = Cx1− 2 I d −1 ( 2αx). 2
We can let ‖x‖ → r−, and apply Lemma 4.3.3 to see that

    C = 1 / (r^{1−d/2} I_{d/2−1}(√(2α) r)).

In other words, we have proven the theorem when a = 0. To prove the general result, note that the distribution of τ(a; r) under the measure P_x is the same as the distribution of τ(0; r) under the measure P_{x−a}. This follows quickly from the definition of P_x as the distribution of x + B.
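When d = 1, the Bessel expression in Theorem 4.3.1 simplifies via I_{−1/2}(z) = √(2/(πz)) cosh z, so the theorem reads E_x[e^{−ατ(0;r)}] = cosh(√(2α)‖x‖)/cosh(√(2α)r). The Monte Carlo sketch below checks this numerically; the parameter choices, the crude Euler discretization, and the fixed seed are our own illustrative assumptions, not part of the text, so the estimate carries a small discretization bias.

```python
import math
import random

def mc_laplace_exit(x=0.0, r=1.0, alpha=1.0, n_paths=2000, dt=2e-3, seed=7):
    """Estimate E_x[exp(-alpha * tau(0; r))] for 1-d Brownian motion by
    simulating Euler paths until they leave ]-r, r[.  Discrete monitoring
    detects the exit slightly late, so the estimate is mildly biased."""
    rng = random.Random(seed)
    sd = math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        b, t = x, 0.0
        while abs(b) < r:
            b += sd * rng.gauss(0.0, 1.0)
            t += dt
        total += math.exp(-alpha * t)
    return total / n_paths

# Closed form from the theorem with d = 1: cosh(sqrt(2*alpha)*|x|)/cosh(sqrt(2*alpha)*r).
exact = math.cosh(math.sqrt(2.0) * 0.0) / math.cosh(math.sqrt(2.0) * 1.0)
est = mc_laplace_exit()
```

With x = 0, r = 1, α = 1 the closed form is 1/cosh(√2) ≈ 0.459, and the simulated value should agree within the combined Monte Carlo and discretization error.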
5 Supplementary Exercises

1. Use Itô's formula (Theorem 3.8.1, Chapter 7) to give another proof of Proposition 2.1.1 in the case d = 1. To complete the proof of Proposition 2.1.1, use the multidimensional Itô formula of Supplementary Exercise 8, Chapter 7.

2. Suppose A denotes the generator of a Feller process X on a locally compact, separable metric space S. If A has an essential core C such that A(C) is dense in C_0(S), prove that t → (t, X_t) is a Feller process on [0, ∞[×S whose generator, A′, has the form

    A′φ(t, x) = ∂φ/∂t (t, x) + Aφ(t, x).

Find an essential core C′ for D(A′) such that A′(C′) is dense in C_0(S). The process t → (t, X_t) is called the space–time process corresponding to X, and its generator A′ is the parabolic operator corresponding to the operator A.

3. Let B denote d-dimensional Brownian motion and define the d-dimensional Ornstein–Uhlenbeck process O = (O_t; t ≥ 0) by O_t = e^{−t/2} B_{e^t} (t ≥ 0). Prove that O is a Feller process on R^d and find the form of its generator. Can you identify the domain of the generator?

4. Use Itô's formula (Supplementary Exercise 8, Chapter 7) to show that when B is d-dimensional Brownian motion and d ≥ 2, t → ‖B_t‖ and t → ‖B_t‖² are Feller processes. Identify the forms of their respective generators. These processes are the d-dimensional Bessel process and the d-dimensional squared Bessel process, respectively.

5. Show that when 0 < α < 2, the collection S_α, defined by equation (3) of Section 2.2, is dense in C_0(R^d). (Hint: It may be easier to prove more: Define C to be the collection of all f ∈ L^∞(R^d) ∩ L^1(R^d) whose Fourier transform vanishes outside a compact set, and prove that C is dense in C_0(R^d).)

6. Use Theorem 4.2.1 to show that in the notation of the preamble to Section 4, for all r > 0 and all x ∈ ∂B(a; r), P_x(ζ(a; r) = 0) = 1.
7. Given a 3-dimensional Brownian motion B, prove that when x ≠ 0, t → ‖B_t‖^{−1} is a continuous local martingale under P_x but is not a continuous martingale. What if d ≥ 4 and B is d-dimensional Brownian motion? (Hint: Check that lim_{t→∞} E_x[‖B_t‖^{−1}] = 0 ≠ ‖x‖^{−1}.)

8. Suppose X is a Feller process on a locally compact, separable metric space (S, d). Fix a compact set K ⊂ S and let T_K = inf(s ≥ 0 : X_s ∉ K). Define Y to be X "killed upon leaving K," i.e., Y_t = X_{t∧T_K}, t ≥ 0. Recall from Supplementary Exercise 14 of Chapter 8 that Y is a Feller process on K. If the generators of X and Y are denoted by A^X and A^Y, respectively, show that for all φ ∈ D(A^X) that vanish outside K, A^Y φ = A^X φ.

9. Suppose X is a Markov process on S_∆ with generator A. (i) Show that whenever f ∈ D(A) solves Af = 0, t → f(X_t) is a local martingale. (ii) Suppose υ : [0, ∞[×S → R is such that for all t ≥ 0, x → υ(t, x) is in D(A) and that ∂υ/∂t = −Aυ. Prove that t → υ(t, X_t) is a local martingale.

10. Suppose D ⊂ R^d is open and has compact closure. The Dirichlet problem on D with boundary function g is the following PDE: Find a function υ such that ∆υ(x) = 0 for all x ∈ D, and υ(x) = g(x) for all x on the boundary of D. Prove that whenever g is continuous, and when the Dirichlet problem has a bounded solution υ, it has the probabilistic representation υ(x) = E_x[g(B_S)], for all x ∈ D, where B denotes d-dimensional Brownian motion and S = inf(s ≥ 0 : B_s ∉ D) is the exit time from D.

11. (Continued from Supplementary Exercise 10) The Poisson equation on D (a bounded, open set in R^d) with boundary function g (a function on ∂D) and potential V is the PDE ∆υ(x) = V(x) for all x ∈ D, and υ(x) = g(x) for all x ∈ ∂D. Find a probabilistic representation for the unique bounded solution to this PDE, if one exists.
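The Dirichlet representation of Exercise 10 can be sanity-checked by simulation. In the sketch below (an illustrative setup of our own, not taken from the text) D is the open unit disk in R², and the boundary function g(x, y) = x² − y² is harmonic, so the bounded solution is υ(x, y) = x² − y² itself; a discretized Brownian path approximates B up to its exit time S.

```python
import math
import random

def dirichlet_mc(x0, y0, n_paths=2000, dt=1e-3, seed=11):
    """Estimate u(x0, y0) = E_{(x0,y0)}[g(B_S)] for 2-d Brownian motion
    stopped on exiting the unit disk, with g(x, y) = x**2 - y**2.
    Euler steps detect the exit slightly late (small bias)."""
    rng = random.Random(seed)
    sd = math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        x, y = x0, y0
        while x * x + y * y < 1.0:
            x += sd * rng.gauss(0.0, 1.0)
            y += sd * rng.gauss(0.0, 1.0)
        total += x * x - y * y   # g at the (approximate) exit point
    return total / n_paths

est = dirichlet_mc(0.3, 0.0)
exact = 0.3 ** 2 - 0.0 ** 2   # the harmonic extension of g, evaluated at (0.3, 0)
```

Since g is harmonic, υ(0.3, 0) = 0.09, and the Monte Carlo average should match it up to sampling and discretization error.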
6 Notes on Chapter 9

Section 1 Our discussion of generators is quite slim. You can learn more about this subject from (Ethier and Kurtz 1986; Fukushima et al. 1994; Yosida 1995). What we call the martingale problem is only half of the martingale problem of D. Stroock and S. R. S. Varadhan; cf. Ethier and Kurtz (1986) for a textbook introduction to this topic. The Doob–Meyer decomposition has other probabilistic implications, as well as analytical ones; see Getoor and Glover (1984) for a sample.
Section 2 To learn more about the material of this section, you can start with (Bertoin 1996; Knight 1981).

Section 4 In the mathematical literature the connections between Markov processes and partial differential equations date at least as far back as Kakutani (1944b, 1945) and Dynkin (1965). The Ciesielski–Taylor theorem (Theorem 4.3.1) was first found, in part, by Paul Lévy and then completely in Ciesielski and Taylor (1962); see the references in the latter paper. Another proof, which essentially forms the basis for the arguments presented here, can be found in Knight (1981, Theorem 4.2.20).

Section 5 Supplementary Exercises 10 and 11 are a starting point in the probabilistic approach to second-order PDEs, and explicitly appear in Kac (1951). To learn more about probabilistic solutions to PDEs and many of their applications, see (Bass 1998; Karatzas and Shreve 1991; Revuz and Yor 1994).
10 Probabilistic Potential Theory

Consider a random subset K of R^d. A basic problem in probabilistic potential theory is the following: For what nonrandom sets E is P(K ∩ E ≠ ∅) positive? The archetypal example of such a set K is the range of a random field. Let X = (X_t; t ∈ R^N_+) denote an N-parameter stochastic process that takes its values in R^d and consider the random set K = {X_s : s ∈ R^N_+}.¹ For this particular random set K, the above question translates to the following: When does the random function X ever enter a given nonrandom set E with positive probability? Even though we will study a large class of random fields in the next chapter, the solution to the above problem is sufficiently involved that it is best to start with the easiest one-parameter case, which is the subject of the present chapter.

Even in this simpler one-parameter setting, it is not clear, a priori, why such problems are interesting. Thus, our starting point will be the analysis of recurrence phenomena for one-parameter Markov processes that have nice properties. To illustrate the key ideas without having to deal with too many technical issues, our discussion of recurrence concentrates on Lévy processes. The astute reader may recognize Section 1 below as the continuous-time analogue of the results of the first section of Chapter 3.

¹ The set {X_s : s ∈ R^N_+} must not be confused with the process (X_s; s ∈ R^N_+). Since this notation for sets is the typical one in most of mathematics, we shall adopt it with no further comment.
1 Recurrent Lévy Processes

In Section 4.3 of Chapter 8 we saw that Lévy processes are the continuous-time analogues of the random walks of Part I. We now address the issues of transience and recurrence, in analogy to the discrete theory of Section 1, Chapter 3.

Throughout this section we let X = (X_t; t ≥ 0) be an R^d-valued Lévy process. Recall from Section 4.3, Chapter 8, that X is a Feller process and the measures P_x (x ∈ R^d) are none other than the distributions of the (R^d)^{R_+}-valued random variable X + x. (Heuristically speaking, under the measure P_x, X is a Lévy process conditioned on (X_0 = x).) We will write P for P_0, at no great risk of ambiguity. Finally, F = (F_t; t ≥ 0) denotes the complete augmented history of the process X.

A point x ∈ R^d is said to be recurrent (for X) if for all ε > 0, the set

    {t ≥ 0 : |X_t − x| ≤ ε}

is unbounded, P-almost surely. In order to make this measure-theoretically rigorous, we let τ_0 = 0 and iteratively define

    τ_{k+1} = inf(t > 1 + τ_k : |X_t − x| ≤ ε),

with the usual stipulation that inf ∅ = ∞. Then, we say that x is recurrent (for X) if

    P(τ_k < ∞) = 1,    ∀k ≥ 1.

If x is not recurrent, it is said to be transient. If every x ∈ R^d is transient (respectively recurrent), then X is said to be transient (respectively recurrent). Our immediate goal is to find a useful condition for when 0 is recurrent or transient for a Lévy process X. To this end, we present the methods of Khoshnevisan (1997a).
1.1 Sojourn Times

Recall that |x| denotes the ℓ^∞ norm of any (Euclidean) vector x. For all a ∈ R^d and all r > 0, define the closed ball

    B(a; r) = {b ∈ R^d : |a − b| ≤ r}.    (1)

(This is, geometrically speaking, a cube.) We will use the above notation for the remainder of this section. Our discussion of recurrence begins with the following technical lemma.

Lemma 1.1.1 For any t, ε > 0,

    ∫_0^t P(|X_s| ≤ 2ε) ds ≤ 4^d ∫_0^t P(|X_s| ≤ ε) ds.
[Figure 10.1: Covering B(0; 2ε) with 16 disjoint balls of radius ½ε; the bullets denote the positions of a₁, . . . , a₁₆.]

Proof Since this is a distributional result, we can assume, without loss of generality, that X has shifts θ = (θ_t; t ≥ 0) (why?). It is simple to see that there exist a₁, . . . , a_{4^d} ∈ B(0; 2ε) such that:

• for all i ≠ j, both in {1, . . . , 4^d}, the interiors of B(a_i; ½ε) and B(a_j; ½ε) are disjoint; and

• ∪_{i=1}^{4^d} B(a_i; ½ε) = B(0; 2ε).

Figure 10.1 shows this for d = 2. By Fubini's theorem,

    ∫_0^t P(|X_s| ≤ 2ε) ds ≤ Σ_{j=1}^{4^d} E[∫_0^t 1_{(|X_s − a_j| ≤ ½ε)} ds].    (2)

Define σ_j = inf(s > 0 : |X_s − a_j| ≤ ½ε); by Theorem 1.5.1, Chapter 7, the σ_j's are F-stopping times. Moreover,

    ∫_0^t 1_{(|X_s − a_j| ≤ ½ε)} ds = [∫_{σ_j}^t 1_{(|X_s − a_j| ≤ ½ε)} ds] 1_{(σ_j ≤ t)}
        = [∫_0^{t−σ_j} 1_{(|X_s − a_j| ≤ ½ε)} ds ∘ θ_{σ_j}] 1_{(σ_j ≤ t)}
        ≤ [∫_0^t 1_{(|X_s − a_j| ≤ ½ε)} ds ∘ θ_{σ_j}] 1_{(σ_j ≤ t)}.

By the strong Markov property (Theorem 4.2.1, Chapter 8),

    E[∫_0^t 1_{(|X_s − a_j| ≤ ½ε)} ds] ≤ E[E(∫_0^t 1_{(|X_s − a_j| ≤ ½ε)} ds | F_{σ_j}) 1_{(σ_j ≤ t)}]
        = E[E_{X_{σ_j}}(∫_0^t 1_{(|X_s − a_j| ≤ ½ε)} ds) 1_{(σ_j ≤ t)}].
However, the distribution of X_s − a_j under P_x is the same as the distribution of X_s under the measure P_{x−a_j}. Thus,

    E[∫_0^t 1_{(|X_s − a_j| ≤ ½ε)} ds] ≤ E[E_{X_{σ_j} − a_j}(∫_0^t 1_{(|X_s| ≤ ½ε)} ds) 1_{(σ_j ≤ t)}].

By the right continuity of t → X_t, if σ_j < ∞, then |X_{σ_j} − a_j| ≤ ½ε. Therefore,

    E[∫_0^t 1_{(|X_s − a_j| ≤ ½ε)} ds] ≤ E[sup_{x: |x| ≤ ½ε} E_x(∫_0^t 1_{(|X_s| ≤ ½ε)} ds) 1_{(σ_j ≤ t)}]
        = E[sup_{x: |x| ≤ ½ε} E(∫_0^t 1_{(|X_s + x| ≤ ½ε)} ds) 1_{(σ_j ≤ t)}]
        ≤ E[E(∫_0^t 1_{(|X_s| ≤ ε)} ds) 1_{(σ_j ≤ t)}]
        ≤ ∫_0^t P(|X_s| ≤ ε) ds.

This, together with equation (2), proves the lemma.

The following is another useful technical estimate.

Lemma 1.1.2 For any b > a > 0 and for all ε > 0,

    E[(∫_a^b 1_{(|X_s| ≤ ε)} ds)²] ≤ 2 · 4^d ∫_a^b P(|X_s| ≤ ε) ds · ∫_0^b P(|X_s| ≤ ε) ds.
Proof Once more, as in our proof of Lemma 1.1.1, we may assume the existence of shifts θ = (θ_t; t ≥ 0). By Fubini's theorem,

    E[(∫_a^b 1_{(|X_s| ≤ ε)} ds)²] = 2 ∫_a^b ∫_s^b P(|X_s| ≤ ε, |X_t| ≤ ε) dt ds.

By the definition of Lévy processes, whenever t ≥ s ≥ 0, the random vector X_t − X_s is independent of X_s and has the same distribution as X_{t−s}. Thus,

    E[(∫_a^b 1_{(|X_s| ≤ ε)} ds)²] = 2 ∫_a^b ∫_s^b P(|X_s| ≤ ε, |X_t − X_s + X_s| ≤ ε) dt ds
        ≤ 2 ∫_a^b ∫_s^b P(|X_s| ≤ ε) P(|X_t − X_s| ≤ 2ε) dt ds
        = 2 ∫_a^b ∫_s^b P(|X_s| ≤ ε) P(|X_{t−s}| ≤ 2ε) dt ds
        ≤ 2 ∫_a^b P(|X_s| ≤ ε) ds · ∫_0^b P(|X_t| ≤ 2ε) dt.
Our lemma now follows from Lemma 1.1.1.

For all a, ε > 0, define

    A_a(ε) = ∫_0^a 1_{(|X_s| ≤ ε)} ds.    (3)

The process A(ε) = (A_a(ε); a > 0) is the so-called sojourn time process on B(0; ε), for X.

Proposition 1.1.1 The following are equivalent:

(i) there exists an ε > 0 such that E{A_∞(ε)} = ∞;
(ii) for all ε > 0, E{A_∞(ε)} = ∞;
(iii) there exists an ε > 0 such that P{A_∞(ε) = ∞} > 0;
(iv) for all ε > 0, P{A_∞(ε) = ∞} > 0; and
(v) for all ε > 0, P{A_∞(ε) = ∞} = 1.

Proof The equivalence of (i) and (ii) follows from Lemma 1.1.1. Next, we show that (i) ⇒ (iii). We apply Lemma 1.1.2 to obtain the following: For all a, ε > 0,

    E[{A_a(ε)}²] ≤ 2 · 4^d · (E[A_a(ε)])².

Thus, we can combine this with the Paley–Zygmund lemma (Lemma 1.4.1, Chapter 3) to deduce that for all a, ε > 0,

    P(A_∞(ε) ≥ ½E[A_a(ε)]) ≥ P(A_a(ε) ≥ ½E[A_a(ε)]) ≥ 1/(2 · 4^{1+d}).

Let a → ∞ in the leftmost term to prove that both (i) ⇒ (iii) and (ii) ⇒ (iv). Since (iv) ⇒ (iii) ⇒ (i), and since (i) ⇒ (ii) has already been established, we can deduce the equivalence of (i) through (iv). It suffices to prove that (iv) ⇒ (v) to finish the demonstration. On the other hand, the event (A_∞(ε) = ∞) is a tail event for X. By Supplementary Exercise 10 of Chapter 8, the tail σ-field of X is trivial. Therefore, P{A_∞(ε) = ∞} is 0 or 1, and our task is finished.
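The Paley–Zygmund lemma invoked above states that for a nonnegative random variable Z with E[Z²] < ∞ and θ ∈ ]0, 1[, P(Z ≥ θE[Z]) ≥ (1 − θ)²(E[Z])²/E[Z²]. A toy numerical check of the θ = ½ case used in the proof; the discrete distribution below is an arbitrary choice of ours, purely for illustration.

```python
# Paley-Zygmund with theta = 1/2:
#   P(Z >= E[Z]/2) >= (1/4) * (E[Z])**2 / E[Z**2].
# Check on a small discrete distribution (values/weights are arbitrary).
values  = [0.0, 1.0, 3.0, 10.0]
weights = [0.4, 0.3, 0.2, 0.1]   # probabilities, summing to 1

ez  = sum(v * w for v, w in zip(values, weights))       # E[Z]   = 1.9
ez2 = sum(v * v * w for v, w in zip(values, weights))   # E[Z^2] = 12.1
lhs = sum(w for v, w in zip(values, weights) if v >= 0.5 * ez)
rhs = 0.25 * ez * ez / ez2
```

Here the left-hand side is 0.6 while the Paley–Zygmund lower bound is about 0.075, so the inequality holds with room to spare, as is typical.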
1.2 Recurrence of the Origin

We are now in a position to decide when the origin 0 ∈ R^d is recurrent for X. The problem of deciding when other points x ∈ R^d are recurrent is deferred to Supplementary Exercise 8.

Theorem 1.2.1 The following are equivalent:
(i) there exists ε > 0 such that ∫_0^∞ P(|X_s| ≤ ε) ds = ∞;
(ii) for all ε > 0, ∫_0^∞ P(|X_s| ≤ ε) ds = ∞;
(iii) for all ε > 0, ∫_0^∞ 1_{(|X_s| ≤ ε)} ds = ∞, a.s.; and
(iv) 0 ∈ R^d is recurrent for X.

Our proof of Theorem 1.2.1 rests on one technical lemma.

Lemma 1.2.1 For all ε > 0 and all b > a > 0,

    [∫_a^{(a+b)/2} P(|X_s| ≤ ε) ds] / [2 · 4^d ∫_0^{(a+b)/2} P(|X_s| ≤ ε) ds]
        ≤ P(inf_{a≤s≤(a+b)/2} |X_s| ≤ ε)
        ≤ [∫_a^b P(|X_s| ≤ 2ε) ds] / [∫_0^{(b−a)/2} P(|X_s| ≤ ε) ds].    (1)
Proof Consider the right-continuous bounded martingale M = (M_t; t ≥ 0) defined by

    M_t = E[∫_a^b 1_{(|X_s| ≤ ε)} ds | F_t],    t ≥ 0.

For all 0 ≤ t ≤ (a+b)/2,

    M_t ≥ E[∫_t^b 1_{(|X_s| ≤ ε)} ds | F_t] 1_{(|X_t| ≤ ½ε)}
        ≥ E[∫_t^b 1_{(|X_s − X_t| ≤ ½ε)} ds | F_t] 1_{(|X_t| ≤ ½ε)}
        = ∫_0^{b−t} P(|X_s| ≤ ½ε) ds · 1_{(|X_t| ≤ ½ε)}.

We have used the stationarity and the independence of the increments of X in the last line. Since t ≤ (a+b)/2, we obtain

    M_t ≥ ∫_0^{(b−a)/2} P(|X_s| ≤ ½ε) ds · 1_{(|X_t| ≤ ½ε)},    P-a.s.    (2)

The above holds P-a.s. for each t ≥ 0. By the σ-additivity of probability measures, (2) holds P-a.s., simultaneously for all rational t ≥ 0. On the other hand, M is right-continuous, and so is X; see the last paragraph of Section 1.4, Chapter 7. Thus, equation (2) holds P-a.s., for all t ≥ 0. (Why?)

Now let σ = inf(t ∈ [a, (a+b)/2] : |X_t| ≤ ½ε), where inf ∅ = ∞. By Theorem 1.5.1 of Chapter 7, σ is an F-stopping time. Thus,

    E[M_σ 1_{(σ<∞)}] ≥ ∫_0^{(b−a)/2} P(|X_s| ≤ ½ε) ds · P(inf_{a≤t≤(a+b)/2} |X_t| ≤ ½ε).
Since M is bounded and nonnegative, the bounded convergence theorem and the optional stopping theorem (Theorem 1.6.1 of Chapter 7) give

    E[M_σ 1_{(σ<∞)}] ≤ lim_{n→∞} E[M_{σ∧n}] = E[M_0] = ∫_a^b P(|X_s| ≤ ε) ds.

Replacing ε by 2ε, we obtain the upper bound in equation (1). To prove the lower bound in (1), note that

    ∫_a^{(a+b)/2} 1_{(|X_s| ≤ ε)} ds > 0 ⟹ inf_{a≤t≤(a+b)/2} |X_t| ≤ ε.

Thus,

    P(inf_{a≤t≤(a+b)/2} |X_t| ≤ ε) ≥ P(∫_a^{(a+b)/2} 1_{(|X_s| ≤ ε)} ds > 0)
        ≥ (E[∫_a^{(a+b)/2} 1_{(|X_s| ≤ ε)} ds])² / E[(∫_a^{(a+b)/2} 1_{(|X_s| ≤ ε)} ds)²].

In the last line we have used the Paley–Zygmund lemma; see Lemma 1.4.1 of Chapter 3. Equation (1) follows from this and Lemma 1.1.2.

Proof of Theorem 1.2.1 Proposition 1.1.1 shows that (i), (ii), and (iii) are equivalent. Next, let us suppose that (ii) fails to hold. By Lemma 1.2.1, for all a, ε > 0,

    P(inf_{t≥a} |X_t| ≤ ε) ≤ [∫_a^∞ P(|X_s| ≤ 2ε) ds] / [∫_0^∞ P(|X_s| ≤ ε) ds].

Consequently, by Lemma 1.1.1,

    lim_{a→∞} P(inf_{t≥a} |X_t| ≤ ε) = 0.

In particular, 0 cannot be recurrent (why?). That is, we have verified (iv) ⇒ (ii). Finally, let us suppose that (iv) fails to hold. This means that for some ε > 0, P(L_ε < ∞) = 1, where L_ε = sup(t ≥ 0 : |X_t| ≤ ε). (Why?) Since, a.s.,

    ∫_0^∞ 1_{(|X_s| ≤ ε)} ds ≤ L_ε < ∞,

(iii) ⇒ (iv) follows.
Exercise 1.2.1 If 0 is transient, show that

    [∫_t^∞ P(|X_s| ≤ ε) ds] / [2 · 4^d ∫_0^∞ P(|X_s| ≤ ε) ds]
        ≤ P(|X_s| ≤ ε for some s ≥ t)
        ≤ [4^d ∫_t^∞ P(|X_s| ≤ ε) ds] / [∫_0^∞ P(|X_s| ≤ ε) ds],

for all choices of t, ε > 0.

To put this subsection in the general context of this chapter, define the random set K = {X_s : a ≤ s ≤ b}. Lemma 1.2.1 estimates P(K ∩ E ≠ ∅), where E denotes the closed ball of radius ε about the origin. Recalling the first paragraph of this chapter, we have seen that issues of recurrence can be boiled down to questions about whether K intersects a certain set (here a closed ball) with positive probability.
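For Brownian motion (a Lévy process with α = 2) the criterion of Theorem 1.2.1 becomes concrete: with |·| the ℓ^∞ norm, the coordinates of X_s are independent N(0, s) variables, so P(|X_s| ≤ ε) factorizes, and the integral test reduces to the convergence or divergence of ∫ s^{−d/2} ds. The sketch below (our own illustration, not from the text) computes the partial integrals numerically for d = 1 (recurrent) and d = 3 (transient).

```python
import math

def p_in_ball(s, eps, d):
    """P(|X_s| <= eps) for d-dim Brownian motion with |.| the l-infinity
    norm: the coordinates are i.i.d. N(0, s), so the probability factorizes."""
    p1 = math.erf(eps / math.sqrt(2.0 * s))   # P(|N(0, s)| <= eps)
    return p1 ** d

def tail_integral(T, eps, d, n=4000):
    """Midpoint-rule approximation of int_1^T P(|X_s| <= eps) ds."""
    h = (T - 1.0) / n
    return h * sum(p_in_ball(1.0 + (k + 0.5) * h, eps, d) for k in range(n))

# d = 1: integrand ~ c/sqrt(s), so the integral diverges (recurrence);
# d = 3: integrand ~ c/s**1.5, so the integral converges (transience).
grow_1d = [tail_integral(T, 1.0, 1) for T in (100.0, 400.0, 1600.0)]
grow_3d = [tail_integral(T, 1.0, 3) for T in (100.0, 400.0, 1600.0)]
```

Quadrupling T roughly doubles the d = 1 partial integral, while the d = 3 values visibly flatten out toward a finite limit.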
1.3 Escape Rates for Isotropic Stable Processes

In some specific cases, the method used in our characterization of the recurrence of 0 (for a Lévy process X) has some delicate and intriguing consequences. We now describe one such fact for a special class of Lévy processes.

Let X = (X_t; t ≥ 0) designate an isotropic stable Lévy process on R^d with index α ∈ ]0, 2]; cf. Example 3, Section 4.3 of Chapter 8. Recall also that when α = 2, X is d-dimensional Brownian motion. Define the function U_α : R_+ → R_+ by

    U_α(r) = 1,                  if α > d,
             (ln_+(1/r))^{−1},   if α = d,    (1)
             r^{d−α},            if α < d.

The following is the main result of this and the next subsection. When α = 2 (i.e., Brownian motion), it was found by Dvoretzky and Erdős (1951) when d ≥ 3 and by Spitzer (1958) when d = 2. At the level of generality described below, the theorem was found in Takeuchi (1964a, 1964b) and Takeuchi and Watanabe (1964); see also Hendricks (1972, 1974, 1979), Sato (1999), and Bertoin (1996, Theorem 6, Chapter VIII.2), and their references for further elaborations. Our proofs will follow the general method of Khoshnevisan (1997a) closely.

Theorem 1.3.1 Let φ : R_+ → R_+ be nonincreasing.

(i) If d < α, then with probability one, the level set X^{−1}{0} = {t ≥ 0 : X_t = 0} is unbounded.

(ii) If d ≥ α, then with probability one,

    lim inf_{t→∞} |X_t| / (t^{1/α} φ(t)) = ∞, if J(φ) < ∞,
                                         = 0, if J(φ) = ∞,

where

    J(φ) = ∫_1^∞ U_α(φ(t)) dt/t.
Theorem 1.3.1 will be demonstrated in this and the next subsection. However, we first examine its content.

Examples (a) If d < α, we see that X is recurrent. Of course, since α ∈ ]0, 2], the only possible value for d in this regime is d = 1. That is, we conclude that whenever d = 1 and α > 1, X is recurrent. In fact, one can show the stronger property that when α > 1, for any a ∈ R, there is an a.s. unbounded set of times t such that X_t = a; cf. Supplementary Exercise 7. This property is called point recurrence. That is, not only does X come within ε of any point a ∈ R infinitely often in the long run, but it in fact hits that point infinitely often in the long run.

(b) If d > α, then U_α(x) = x^{d−α}. Choose φ(t) = (ln_+ t)^{−β} for some β > 0 and note that for this choice of φ,

    J(φ) < ∞ ⟺ β > 1/(d − α).

In particular, with probability one,

    lim inf_{t→∞} |X_t| / (t^{1/α} (ln_+ t)^{−β}) = ∞, if β > (d − α)^{−1},
                                                = 0, if β ≤ (d − α)^{−1}.

An immediate consequence of this is that lim_{t→∞} |X_t| = ∞, a.s. That is, the process X is transient when d > α (why?).

(c) We now turn to the "critical case" d = α. Since α ∈ ]0, 2], this means that either (d, α) = (2, 2) or (d, α) = (1, 1). In the former case, X is 2-dimensional Brownian motion, while in the latter case it is called the 1-dimensional Cauchy process; cf. Example 3, Section 4.3 of Chapter 8, for an explanation of this terminology. In either case, U_α(x) = {ln_+(1/x)}^{−1}. For any β > 0 fixed, let us choose φ(x) = exp(−{ln_+ x}^β). Then, it is easy to see that

    J(φ) < ∞ ⟺ β > 1.
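The integral test in example (b) is easy to probe numerically. Taking d − α = 1 for concreteness (say d = 3, α = 2, a choice of ours) and φ(t) = (ln_+ t)^{−β}, we get U_α(φ(t)) = (ln t)^{−β}, and the substitution u = ln t turns the integral (taken over [e, ∞[ to stay clear of t = 1) into ∫_1^∞ u^{−β} du, which is finite exactly when β > 1.

```python
import math

def partial_J(T, beta, n=20000):
    """Midpoint approximation of int_e^T (ln t)**(-beta) dt/t,
    which equals int_1^{ln T} u**(-beta) du after u = ln t."""
    lo, hi = 1.0, math.log(T)
    h = (hi - lo) / n
    return h * sum((lo + (k + 0.5) * h) ** (-beta) for k in range(n))

# beta = 1.5 > 1: partial integrals increase toward the limit 1/(beta-1) = 2.
conv = [partial_J(T, 1.5) for T in (1e4, 1e16, 1e64)]
# beta = 0.5 < 1: partial integrals keep growing like 2*sqrt(ln T), so J = infinity.
div = [partial_J(T, 0.5) for T in (1e4, 1e16, 1e64)]
```

The convergent case creeps up toward 2 (slowly, since the cutoff enters through ln T), while the divergent case roughly doubles each time ln T quadruples.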
In other words, with probability one,

    lim inf_{t→∞} |X_t| / (t^{1/α} exp(−(ln t)^β)) = ∞, if β > 1,
                                                 = 0, if β ≤ 1.

From this we can conclude that lim inf_{t→∞} |X_t| = 0, a.s. In other words, in the case d = α, 0 is recurrent. Using Supplementary Exercise 8, this can be strengthened to show that when d = α, X is a recurrent process that is not point recurrent.

We conclude this subsection by starting on our proof of Theorem 1.3.1; the following results are elementary distributional properties of the isotropic stable process X.

Lemma 1.3.1 (Scaling Lemma) If c > 0 is fixed, then the finite-dimensional distributions of (X_{ct}; t ≥ 0) and (c^{1/α} X_t; t ≥ 0) are the same.

Proof We will prove this lemma by directly examining the characteristic function of X at various times. Suppose 0 = t_0 < t_1 < · · · < t_m are fixed. We need to show that the distributions of (X_{ct_1}, . . . , X_{ct_m}) and c^{1/α}(X_{t_1}, . . . , X_{t_m}) are one and the same. That is, we will show that for all ξ_1, . . . , ξ_m ∈ R^d,

    E[exp(i Σ_{j=1}^m ξ_j · X_{ct_j})] = E[exp(i c^{1/α} Σ_{j=1}^m ξ_j · X_{t_j})].

By considering linear combinations, it suffices to show that for all choices of ζ_1, . . . , ζ_m ∈ R^d,

    E[exp(i Σ_{j=1}^m ζ_j · (X_{ct_j} − X_{ct_{j−1}}))] = E[exp(i c^{1/α} Σ_{j=1}^m ζ_j · (X_{t_j} − X_{t_{j−1}}))].

(Why?) On the other hand, by the stationary, independent increments property of Lévy processes,

    E[exp(i Σ_{j=1}^m ζ_j · (X_{ct_j} − X_{ct_{j−1}}))] = Π_{j=1}^m E[exp(i ζ_j · X_{c(t_j − t_{j−1})})],    (2)

while

    E[exp(i c^{1/α} Σ_{j=1}^m ζ_j · (X_{t_j} − X_{t_{j−1}}))] = Π_{j=1}^m E[exp(i c^{1/α} ζ_j · X_{t_j − t_{j−1}})].    (3)

Recall from Example 3 of Section 4.3, Chapter 8, that the characteristic function of X_t is

    E[e^{iξ·X_t}] = e^{−t‖ξ‖^α/2},    ξ ∈ R^d.
By the form of this characteristic function, the terms in equations (2) and (3) are both equal to

    exp(−Σ_{j=1}^m c(t_j − t_{j−1}) ‖ζ_j‖^α / 2),

which proves our lemma.

The following is immediate from the inversion formula for characteristic functions.

Lemma 1.3.2 For any t ≥ 0, the random vector X_t has the following (isotropic) probability density function:

    q_t(x) = (2π)^{−d} ∫_{R^d} e^{−iξ·x} e^{−t‖ξ‖^α/2} dξ,    x ∈ R^d.

This formula has some important consequences, which we list below:

Corollary 1.3.1
(i) For all t > 0, q_t(0) = sup_{x∈R^d} q_t(x) > 0.
(ii) t → q_t(0) is nonincreasing. In fact, q_t(0) = A t^{−d/α}, where A = (2π)^{−d} ∫_{R^d} e^{−‖ζ‖^α/2} dζ.
(iii) The function (t, x) → q_t(x) is uniformly continuous on [ε, ε^{−1}] × R^d for any ε > 0.
(iv) For all ε ∈ ]0, 1[, there exist a finite constant η > 0 and an open set G ∋ 0 in R^d such that for all ε < t < ε^{−1} and for all x ∈ G, q_t(x) ≥ η.

Exercise 1.3.1 Verify Corollary 1.3.1.

Exercise 1.3.2 Prove that the above density function q_t is isotropic, i.e., depends on x only through ‖x‖, and satisfies the following scaling relation: For all t > 0 and all x ∈ R^d, q_t(x) = t^{−d/α} q_1(x t^{−1/α}).
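When α = 2 and d = 1 (Brownian motion), the inversion integral of Lemma 1.3.2 can be checked against the closed-form Gaussian density: q_t(0) = (2π)^{−1} ∫ e^{−tξ²/2} dξ = (2πt)^{−1/2}, and Corollary 1.3.1(ii) reads q_t(0) = t^{−1/2} q_1(0). A quadrature sketch (the truncation length and step count below are our own choices):

```python
import math

def q_t_zero(t, L=40.0, n=200000):
    """Numerically evaluate q_t(0) = (2*pi)**(-1) * int e^{-t*xi^2/2} dxi
    for d = 1, alpha = 2, truncating the integral to [-L, L] (midpoint rule)."""
    h = 2.0 * L / n
    s = sum(math.exp(-0.5 * t * (-L + (k + 0.5) * h) ** 2) for k in range(n))
    return h * s / (2.0 * math.pi)

t = 0.5
num = q_t_zero(t)
closed = 1.0 / math.sqrt(2.0 * math.pi * t)   # N(0, t) density at 0
scaling = t ** (-0.5) * q_t_zero(1.0)          # Corollary 1.3.1(ii) with d/alpha = 1/2
```

Both the closed form and the scaling relation should match the quadrature to well within single-precision tolerance.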
1.4 Hitting Probabilities

The following is the main step in our proof of Theorem 1.3.1; it computes, up to a multiplicative constant, the probability that the process X hits the closed ball B(0; ε) at some time between a and b. Recall the function U_α from equation (1) of Section 1.3.

Proposition 1.4.1 For all b > a > 0, there exist finite constants ε₀ ∈ ]0, 1[ and A > 1 such that for all ε ∈ ]0, ε₀[,

    (1/A) U_α(ε) ≤ P(inf_{a≤s≤b} |X_s| ≤ ε) ≤ A U_α(ε).
The above is proved in two steps, which we present as lemmas. For the sake of notational convenience, throughout our proof of Proposition 1.4.1, A, A₁, A₂, . . . will denote unimportant constants whose values may change from one lemma to another.

Lemma 1.4.1 For every b > a > 0, there exist ε₀ > 0 and a finite A > 1 such that for all ε ∈ ]0, ε₀[ and all s ∈ [a, b],

    (1/A) ε^d ≤ P(|X_s| ≤ ε) ≤ A ε^d.

Proof By Lemma 1.3.2 and its corollary (Corollary 1.3.1), for any s, ε > 0, P(|X_s| ≤ ε) ≤ q_s(0)(2ε)^d. (Recall that |x| is the ℓ^∞ norm of x.) Thus, for all a ≤ s ≤ b,

    P(|X_s| ≤ ε) ≤ (2π)^{−d} ∫_{R^d} e^{−s‖ξ‖^α/2} dξ · (2ε)^d ≤ A₁ ε^d,

where A₁ = π^{−d} ∫_{R^d} e^{−a‖ξ‖^α/2} dξ. On the other hand,

    P(|X_s| ≤ ε) ≥ inf_{x:|x|≤ε} q_s(x) · (2ε)^d.

By another application of Lemma 1.3.2 and Corollary 1.3.1, there exist ε₀ > 0 small enough and a constant A₂ > 1 such that

    1/A₂ = 2^d inf_{s∈[a,b]} inf_{x:|x|≤ε₀} q_s(x) > 0.

The lemma follows with A = A₁ ∨ A₂ ∨ 1.

Lemma 1.4.2 For any c > 0, there exist finite constants A > 1 and ε₀ ∈ ]0, 1[ such that for all ε ∈ ]0, ε₀[,

    ε^d / (A U_α(ε)) ≤ ∫_0^c P(|X_s| ≤ ε) ds ≤ A ε^d / U_α(ε).
Proof We will verify the upper bound on the integral; the lower bound is derived similarly. By the scaling lemma (Lemma 1.3.1),

    ∫_0^c P(|X_s| ≤ ε) ds = ∫_0^c P(|X_1| ≤ ε s^{−1/α}) ds.

By Lemma 1.3.2, for all r > 0,

    P(|X_1| ≤ r) ≤ {q_1(0)(2r)^d} ∧ 1 ≤ A₁ {r^d ∧ 1},

where A₁ = 2^d q_1(0) ∨ 1. Thus, for all ε ∈ ]0, c^{1/α}[,

    ∫_0^c P(|X_s| ≤ ε) ds ≤ A₁ ∫_0^c {ε^d s^{−d/α} ∧ 1} ds = A₁ (ε^α + ε^d ∫_{ε^α}^c s^{−d/α} ds).

Computing directly, we see that

    lim sup_{ε→0+} (U_α(ε)/ε^d) ∫_0^c P(|X_s| ≤ ε) ds < ∞.

The announced upper bound on the integral follows suit.
Exercise 1.4.1 Verify the lower bound in Lemma 1.4.2.

Proposition 1.4.1 is now easily seen to follow from Lemmas 1.2.1, 1.4.1, and 1.4.2.

Exercise 1.4.2 When d > α, prove that there exist two finite positive constants A₁ and A₂ such that for all T, R > 0,

    A₁ (T R^{−α})^{1−d/α} ≤ P(|X_s| ≤ R for some s ≥ T) ≤ A₂ (T R^{−α})^{1−d/α},

as long as T R^{−α} > 1. (Hint: See Exercise 1.2.1.)
For technical reasons, we will later need the following improvement over the probability upper bound in Proposition 1.4.1.

Lemma 1.4.3 For all b > a > 0, there exist finite constants ε₀ ∈ ]0, 1[ and A > 1 such that for all ε ∈ ]0, ε₀[,

    sup_{x∈R^d} P_x(inf_{a≤s≤b} |X_s| ≤ ε) ≤ A U_α(ε).

Proof Our proof is analogous to that of Lemma 1.2.1. Define

    M_t = E_x[∫_a^{2b−a} 1_{(|X_s| ≤ ε)} ds | F_t],    t ≥ 0.

The process M = (M_t; t ≥ 0) is a martingale under P_x. Moreover, P_x-a.s.,

    M_t ≥ ∫_0^{b−a} P(|X_s| ≤ ½ε) ds · 1_{(|X_t| ≤ ½ε)},    0 ≤ t ≤ b;

see the derivation of (2) in Section 1.2 above. Define the F-stopping time σ = inf(s ∈ [a, b] : |X_s| ≤ ½ε) to see that for any x ∈ R^d, P_x-a.s.,

    M_σ 1_{(σ<∞)} ≥ ∫_0^{b−a} P(|X_s| ≤ ½ε) ds · 1_{(σ<∞)}.
As in the proof of Lemma 1.2.1, we can take expectations (this time E_x-expectations), use the optional stopping theorem, and conclude that

    E_x[M_0] ≥ ∫_0^{b−a} P(|X_s| ≤ ½ε) ds · P_x(inf_{a≤t≤b} |X_t| ≤ ½ε).    (1)

It is very important to note that the right-hand side has one probability term involving P = P_0 and one involving P_x. But

    E_x[M_0] = ∫_a^{2b−a} P_x(|X_s| ≤ ε) ds = ∫_a^{2b−a} P(|X_s + x| ≤ ε) ds
        = ∫_a^{2b−a} ∫_{B(0;ε)} q_s(y − x) dy ds ≤ 2 q_a(0)(b − a)(2ε)^d.    (2)

In the last line we have used Corollary 1.3.1. By Lemma 1.1.1,

    ∫_0^{b−a} P(|X_s| ≤ ½ε) ds ≥ 4^{−d} ∫_0^{b−a} P(|X_s| ≤ ε) ds.

Applying Lemma 1.4.2, we obtain finite constants ε₀ > 0 and A₁ > 1 such that for all ε ∈ ]0, ε₀[,

    ∫_0^{b−a} P(|X_s| ≤ ½ε) ds ≥ ε^d / (A₁ U_α(ε)).

Combining this with equations (1) and (2), we deduce the lemma for an appropriate choice of the constant A.

We will also need the following "converse" to the Borel–Cantelli lemma that is found in (Chung and Erdős 1952; Kochen and Stone 1964); see also Exercise 1.4.3, Chapter 3.

Lemma 1.4.4 (Kochen–Stone Lemma) Suppose E₁, E₂, . . . are measurable events. Let N_n = Σ_{j=1}^n 1_{E_j} and assume that: (i) E[N_∞] = ∞; and (ii) lim inf_{n→∞} E[(N_n)²]/{E[N_n]}² < ∞. Then, P(E_n infinitely often) > 0.

We now have accumulated the tools necessary for the following.

Proof of Theorem 1.3.1: Part (i) By Proposition 1.4.1, we can find finite constants ε₀ > 0 and A > 1 such that for all ε ∈ ]0, ε₀[,

    P(inf_{1≤t≤e} |X_t| ≤ ε) ≥ A^{−1}.
(You should recall the definition of U_α from equation (1) of Section 1.3.) Letting ε → 0, continuity of probability measures guarantees us that

    P(X^{−1}{0} ∩ [1, e] ≠ ∅) = P(inf_{1≤t≤e} |X_t| = 0) ≥ 1/A > 0.

On the other hand, by scaling (Lemma 1.3.1), inf_{1≤t≤e} |X_t| has the same distribution as e^{−j/α} inf_{e^j≤t≤e^{j+1}} |X_t|. Thus, for all j ≥ 1,

    P(X^{−1}{0} ∩ [e^j, e^{j+1}] ≠ ∅) ≥ 1/A.

Since A is independent of j ≥ 1,

    P(X^{−1}{0} ∩ [e^j, e^{j+1}] ≠ ∅ infinitely often) ≥ 1/A > 0.

On the other hand, the tail σ-field of X is trivial (Supplementary Exercise 10, Chapter 8), and the above event is measurable with respect to it. Thus,

    P(X^{−1}{0} ∩ [e^j, e^{j+1}] ≠ ∅ infinitely often) = 1.

This completes our proof of the first part.
Proof of Theorem 1.3.1: Part (ii) Since the result holds for all such φ, we can assume, without loss of generality, that lim_{t→∞} φ(t) = 0. (Why?) This assumption holds tacitly from now on. For all j ≥ 1 and all λ > 0, define the measurable events

    E_j = {inf_{e^j≤t≤e^{j+1}} |X_t| ≤ λ e^{j/α} φ(e^{j+1})};
    F_j = {inf_{e^j≤t≤e^{j+1}} |X_t| / (t^{1/α} φ(t)) ≤ λ};    (3)
    G_j = {inf_{e^j≤t≤e^{j+1}} |X_t| ≤ λ e^{(j+1)/α} φ(e^j)}.

By monotonicity, for all j ≥ 1,

    E_j ⊂ F_j ⊂ G_j.    (4)

By scaling (Lemma 1.3.1),

    P(E_j) = P(inf_{1≤t≤e} |X_t| ≤ λ φ(e^{j+1})).

Since λ > 0 is held fixed, by Proposition 1.4.1 and by the form of the function U_α there exist finite constants J₁ > 1 and A₁ ∈ ]0, 1[ such that for all j ≥ J₁,

    P(E_j) ≥ A₁ U_α(φ(e^{j+1})).    (5)
(See equation (1) of Section 1.3 for the form of U_α.) Similarly, there exist finite constants A₂, J₂ > 1 such that for all j ≥ J₂,

    P(G_j) ≤ A₂ U_α(φ(e^j)).    (6)

Combining the above two estimates with equation (4), we can deduce that

    Σ_j P(F_j) < ∞ ⟺ Σ_j U_α(φ(e^j)) < ∞.

On the other hand, by monotonicity,

    Σ_{j=0}^∞ U_α(φ(e^{j+1})) = (e/(e − 1)) Σ_{j=0}^∞ ∫_{e^j}^{e^{j+1}} (U_α(φ(e^{j+1}))/e^{j+1}) dt
        ≤ (e/(e − 1)) Σ_{j=0}^∞ ∫_{e^j}^{e^{j+1}} (U_α(φ(t))/t) dt
        = (e/(e − 1)) ∫_1^∞ (U_α(φ(t))/t) dt = (e/(e − 1)) J(φ),

and, similarly,

    J(φ) = ∫_1^∞ (U_α(φ(t))/t) dt ≤ Σ_{j=0}^∞ ∫_{e^j}^{e^{j+1}} (U_α(φ(e^j))/e^j) dt = (e − 1) Σ_{j=0}^∞ U_α(φ(e^j)).

Thus far, we have shown that

    Σ_j P(F_j) < ∞ ⟺ J(φ) < ∞.    (7)
Now we proceed with our proof. Suppose that J(φ) < ∞. By equation (7) and by the Borel–Cantelli lemma, P(F_j infinitely often) = 0. Recalling equation (3), this means that for any λ > 0, the following holds a.s.:

    inf_{e^j≤t≤e^{j+1}} |X_t| / (t^{1/α} φ(t)) ≥ λ,    eventually.

The above holds a.s., simultaneously over all rational λ > 0. Thus, we can let λ → ∞ along a rational sequence to see that

    lim inf_{j→∞} inf_{e^j≤t≤e^{j+1}} |X_t| / (t^{1/α} φ(t)) = ∞,    a.s.

This proves the theorem in the case J(φ) < ∞.

Conversely, suppose J(φ) = ∞. By equation (7), Σ_j P(F_j) = ∞. We claim that

    lim inf_{n→∞} [Σ_{i=1}^n Σ_{j=1}^n P(F_i ∩ F_j)] / {Σ_{i=1}^n P(F_i)}² < ∞.    (8)
Having this, we employ the Kochen–Stone lemma (Lemma 1.4.4) to see that P(Fj infinitely often) > 0. The latter event is a tail event, and the tail σ-field for X is trivial; cf. Supplementary Exercise 10 of Chapter 8. Therefore, P(Fj infinitely often) = 1. Now, we can reverse the argument used above (in the summability portion) to argue that if J(ϕ) = ∞, then lim inf
|Xt |
inf
j→∞ ej ≤t≤ej+1
1
t α ϕ(t)
= 0,
a.s.
Since this would finish our proof of Theorem 1.3.1, we are left to verify the validity of equation (8). Using equation (4), for all i, j ≥ 1, P(F_i ∩ F_j) ≤ P(G_i ∩ G_j). With (8) in mind, we set out to estimate the latter probability. Since G_i is F_{e^{i+1}}-measurable, for all j ≥ i + 2,
$$P\big(G_j \mid \mathcal{F}_{e^{i+1}}\big) = P^{X_{e^{i+1}}}\Big(\inf_{e^j - e^{i+1} \le t \le e^{j+1} - e^{i+1}} |X_t| \le \lambda e^{(j+1)/\alpha}\varphi(e^j)\Big) \le \sup_{x\in\mathbf{R}^d} P^x\Big(\inf_{e^j - e^{i+1} \le t \le e^{j+1} - e^{i+1}} |X_t| \le \lambda e^{(j+1)/\alpha}\varphi(e^j)\Big).$$
We have used the Markov property in the first line; cf. Theorem 3.3.2 of Chapter 8. By the Scaling Lemma 1.3.1,
$$P\big(G_j \mid \mathcal{F}_{e^{i+1}}\big) \le \sup_{x\in\mathbf{R}^d} P^x\Big(\inf_{1 - e^{i-j+1} \le t \le e - e^{i-j+1}} |X_t| \le \lambda e^{1/\alpha}\varphi(e^j)\Big) \le \sup_{x\in\mathbf{R}^d} P^x\Big(\inf_{\frac12 \le t \le e} |X_t| \le \lambda e^{1/\alpha}\varphi(e^j)\Big),$$
since j ≥ i + 2. Now we can apply Lemma 1.4.3 and use the form of the function U_α to deduce the existence of finite constants A_3, J_3 > 1 such that for all j ≥ J_3 and all i ≤ j − 2, P(G_j | F_{e^{i+1}}) ≤ A_3 U_α(φ(e^j)). Together with (6), the above shows that for all j − 2 ≥ i ≥ J_4 = J_3 ∨ J_2,
$$P(G_i\cap G_j)\le A_2A_3\,U_\alpha(\varphi(e^i))\cdot U_\alpha(\varphi(e^j)).$$
Together with equations (5) and (4), this implies that for all j − 2 ≥ i ≥ J_5 = J_4 ∨ J_1,
$$P(G_i\cap G_j)\le A_1A_2A_3\,P(F_i)\cdot P(F_j).$$
By symmetry and by equation (4),
$$\sum_{\substack{1\le i,j\le n\\ |j-i|\ge J_5+2}} P(F_i\cap F_j)\le A_1A_2A_3\Big(\sum_{i=1}^n P(F_i)\Big)^{2}. \tag{9}$$
360
10. Probabilistic Potential Theory
On the other hand,
$$\sum_{\substack{1\le i,j\le n\\ |j-i| < J_5+2}} P(F_i \cap F_j) \le 2(J_5+2)\sum_{i=1}^n P(F_i), \tag{10}$$
using P(E ∩ F) ≤ P(E). Now we can finish our proof. Recall that we have J(φ) = ∞, which, thanks to equation (7), implies ∑_j P(F_j) = ∞. Moreover, (9) and (10) together yield the following:
$$\sum_{i=1}^n\sum_{j=1}^n P(F_i \cap F_j) \le A_4\Big[\Big(\sum_{j=1}^n P(F_j)\Big)^2 + \sum_{j=1}^n P(F_j)\Big],$$
where A_4 = max{A_1A_2A_3, 2(J_5 + 2)}. In particular, since ∑_j P(F_j) = ∞,
$$\limsup_{n\to\infty} \frac{\sum_{i=1}^n\sum_{j=1}^n P(F_i\cap F_j)}{\{\sum_{i=1}^n P(F_i)\}^2} \le A_4,$$
which amply verifies equation (8), and hence the theorem.
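The mechanics of the second-moment argument above can be mimicked in a toy numerical experiment: take hypothetical "probabilities" p_i = 1/i (so that their sum diverges), assume exact quasi-independence off a diagonal band as in (9), and bound near-diagonal terms by P(E ∩ F) ≤ P(E) as in (10). The Kochen–Stone ratio then stays bounded, as claimed in (8). All concrete numbers here are illustrative assumptions, not part of the proof:

```python
def ratio(n, k=3):
    # Kochen-Stone ratio for toy "events" with P(F_i) = 1/i
    p = [1.0 / i for i in range(1, n + 1)]
    S = sum(p)
    num = 0.0
    for i in range(n):
        for j in range(n):
            if abs(i - j) >= k:
                num += p[i] * p[j]       # assumed quasi-independence, as in (9)
            else:
                num += min(p[i], p[j])   # crude bound P(E & F) <= P(E), as in (10)
    return num / (S * S)

r = [ratio(n) for n in (100, 300, 900)]
```

The ratio decreases toward 1 as n grows, since the near-diagonal band contributes only on the order of ∑p_i, which is negligible against (∑p_i)².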
Exercise 1.4.3 Suppose X is a d-dimensional isotropic stable process of index α ∈ ]0, 2], where d ≥ α. If ψ : R_+ → R_+ is monotone, find a necessary and sufficient condition for lim inf_{t→0^+} t^{−1/α} ψ(t)|X_t| = +∞. (Hint: Mimic the demonstration of Theorem 1.3.1 carefully, but instead of using the sequence e^j, use e^{−j}.)
2 Hitting Probabilities for Feller Processes

The previous section has introduced us to the notion that some of the key properties of Markov processes are based on the determination of “hitting probabilities.” (Roughly speaking, the hitting probability of a Borel set E is the probability that the Markov process ever enters the set E.) We now reconsider this problem in the more general setting of some Feller processes on R^d. As it turns out, for many Feller processes of interest, the hitting probability of a compact set E can be described by a certain capacity of the set E. At this point the reader is strongly urged to acquaint him- or herself with the material and terminology of Appendices C and D on capacities before proceeding further.
2.1 Strongly Symmetric Feller Processes

Henceforth, we will focus our attention on a strongly symmetric Feller process X = (X_t; t ≥ 0) on R^d. This is an R^d-valued Feller process that satisfies the following conditions:
1. the transition functions of X have transition densities p_t(x, y) with respect to some σ-finite measure ν on the Borel subsets of R^d;
2. for each t > 0, (x, y) → p_t(x, y) is symmetric, in that p_t(x, y) = p_t(y, x) for all x, y ∈ R^d; and
3. for every λ > 0, the corresponding λ-resolvent density r_λ is a gauge function on R^d × R^d, where
$$r_\lambda(x,y) = \int_0^\infty e^{-\lambda s}\,p_s(x,y)\,ds, \qquad x,y\in\mathbf{R}^d,\ \lambda>0.$$
The measure ν is called the reference measure for the process X. Recall that condition 1 is equivalent to the following: For all bounded measurable f : R^d → R,
$$E_x[f(X_t)] = \int p_t(x,y)\,f(y)\,\nu(dy), \qquad x\in\mathbf{R}^d,\ t\ge 0;$$
see Section 2.3 of Chapter 8. Moreover, the transition functions of X are given by the relation T_t f(x) = E_x[f(X_t)], and the corresponding resolvent is similarly described by R_λ f(x) = ∫ r_λ(x, y) f(y) ν(dy) (x ∈ R^d, λ > 0). We also recall from Appendix D that a function f is a gauge function on R^d × R^d if for all η > 0, f is continuous on O_η = {(x, y) : x, y ∈ R^d, |x − y| > η}, and if f > 0 on O_η when η is small enough. In words, gauge functions on R^d × R^d are nonnegative functions that are continuous everywhere, except possibly on the “diagonal” of R^d × R^d, near which they are strictly positive.
Exercise 2.1.1 Any gauge function on R^d × R^d is lower semicontinuous. In fact, if g is a gauge function, so is g ∧ M, for any M > 0 sufficiently large. Moreover, if r_λ denotes the λ-resolvent density of a strongly symmetric Feller process, then r_λ is a symmetric function. You may need to recall that f is lower semicontinuous if there are continuous functions f_1, f_2, ... such that as n → ∞, f_n(x) ↑ f(x), for all x; see also Exercise 1.1.3, Appendix D.
Exercise 2.1.2 Consider an isotropic stable Lévy process X of index α ∈ ]0, 2]. Verify that X has symmetric, bounded, and continuous transition densities with respect to ν, when ν denotes Lebesgue’s measure on the Borel subsets of R^d. In fact, isotropic stable processes are strongly symmetric. See Section 3 below. (Hint: Examine Section 4.3 of Chapter 8.)
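For a concrete instance of conditions 1–3, one can take d = 1 Brownian motion, for which the λ-resolvent density has the classical closed form r_λ(x, y) = (2λ)^{−1/2} exp(−√(2λ)|x − y|). The sketch below checks this form (assumed here, and derived in Section 3.1) against a direct numerical evaluation of ∫₀^∞ e^{−λs} p_s(x, y) ds:

```python
import math

def q(t, a):
    # 1-d heat kernel q_t(a)
    return math.exp(-a * a / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)

def r_numeric(lam, a, n=200_000, lo=-30.0, hi=8.0):
    # int_0^inf exp(-lam t) q_t(a) dt, via the substitution t = e^u (midpoint rule)
    du = (hi - lo) / n
    tot = 0.0
    for k in range(n):
        t = math.exp(lo + (k + 0.5) * du)
        tot += math.exp(-lam * t) * q(t, a) * t * du
    return tot

def r_closed(lam, a):
    # classical 1-d formula; an assumption at this point in the text
    s = math.sqrt(2.0 * lam)
    return math.exp(-s * abs(a)) / s
```

The two evaluations agree to high accuracy, and the closed form is manifestly symmetric in (x, y), as condition 2 requires.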
Exercise 2.1.3 Show that for all continuous functions f, g : R^d → R_+ and for all λ > 0, ∫ f(x) R_λ g(x) ν(dx) = ∫ g(x) R_λ f(x) ν(dx). In words, the symmetry of the transition densities p_t implies that the resolvent operator is self-adjoint as an operator on L²(ν).
The existence of the λ-potential density r_λ and of the transition densities p_t gives rise to the definitions of potentials and transition operators of measures. Indeed, suppose φ is a finite measure on R^d. We can define the λ-potential of φ, R_λφ, as
$$R_\lambda\phi(x) = \int r_\lambda(x,y)\,\phi(dy), \qquad x\in\mathbf{R}^d,\ \lambda>0.$$
Likewise, we can define
$$T_t\phi(x) = \int p_t(x,y)\,\phi(dy), \qquad t>0,\ x\in\mathbf{R}^d.$$
The above make sense due to positivity, even though they may be infinite for some, or even for all, values of x.
Exercise 2.1.4 Check that the above extend our earlier definitions of R_λ f and T_t f, respectively. That is, prove that when µ(dx) = f(x)ν(dx), then for all t, λ > 0, T_t µ = T_t f and R_λ µ = R_λ f.
Exercise 2.1.5 Check that for any two finite measures ξ and ζ on R^d and for all λ > 0, ∫ R_λ ζ(x) ξ(dx) = ∫ R_λ ξ(x) ζ(dx). In particular, show that for all probability measures µ on R^d, ∫ R_λ µ(x) ν(dx) = λ^{−1}. This is the reciprocity theorem of classical potential theory, and by Exercise 2.1.4, it extends Exercise 2.1.3 above. See also Exercise 1.1.4, Appendix D.
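The reciprocity of Exercise 2.1.5 is transparent in a finite toy model in which r_λ is replaced by an arbitrary symmetric matrix and measures by nonnegative weight vectors; all data below are made up for illustration:

```python
# Discrete stand-in: state space {0,...,4}, symmetric "resolvent" matrix R,
# measures represented by nonnegative weight vectors.
n = 5
R = [[1.0 / (1.0 + abs(i - j)) for j in range(n)] for i in range(n)]

xi   = [0.1, 0.0, 0.4, 0.3, 0.2]
zeta = [0.5, 0.2, 0.0, 0.1, 0.2]

def potential(R, mu):
    # (R mu)(x) = sum_y R[x][y] mu[y], the discrete analogue of R_lambda mu
    return [sum(R[x][y] * mu[y] for y in range(len(mu))) for x in range(len(mu))]

lhs = sum(potential(R, zeta)[x] * xi[x] for x in range(n))
rhs = sum(potential(R, xi)[x] * zeta[x] for x in range(n))
```

Both sides compute the same double sum ∑_{x,y} R[x][y] ξ[x] ζ[y]; symmetry of R is exactly what makes the exchange of ξ and ζ legitimate.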
2.2 Balayage

Balayage (French for “sweeping out”) is a method of approximating potentials of measures by potentials of functions.² In the context of probabilistic potential theory, balayage addresses the following technical question: “Given a probability measure µ on R^d, can we approximate R_λµ by the λ-potential of some function?” In light of Exercise 2.1.4, an equivalent goal is to approximate R_λµ by the λ-potential of an absolutely continuous measure. We shall see, shortly, that one can give an affirmative answer to this question. This answer will turn out to depend on the following unusual-looking result, which is the main theorem of this subsection.

Theorem 2.2.1 (The Balayage Theorem) Consider a strongly symmetric Feller process X = (X_t; t ≥ 0) on R^d with reference measure ν and resolvent R = (R_λ; λ > 0). Given a compact set E ⊂ R^d, a bounded open set F ⊂ R^d with E ⊂ F, and a µ ∈ P(E),³ we can find a collection of measures (ζ_t; t > 0) such that:
(i) for every t > 0, ζ_t is absolutely continuous with respect to ν;
(ii) for all t, λ > 0 and all x ∈ R^d, R_λζ_t(x) ≤ R_λµ(x);
(iii) for every t > 0, ζ_t is supported on F; and
(iv) as t → 0+, ζ_t converges weakly to µ.

Recall that the measures ζ_t converge weakly to µ in the same way that probability measures do: if and only if for all bounded continuous functions ψ : R^d → R, lim_{t→0+} ∫ ψ dζ_t = ∫ ψ dµ.
Before proving the balayage theorem, we use it to analyze the λ-potential density of a strongly symmetric Feller process X on R^d, whose λ-resolvent density is denoted by r_λ. First, we recall from Appendix D that the gauge function r_λ is proper if for any compact set E ⊂ R^d and all probability measures µ on E, there exist bounded open sets E_1, E_2, ... and measures µ_1, µ_2, ... on E_1, E_2, ..., respectively, that are absolutely continuous with respect to ν, and such that:
1. E_1 ⊃ E_2 ⊃ · · ·;
2. ∩_n E_n = E; and
3. for all ε > 0, there exists N_0 such that for all n ≥ N_0,
(a) µ_n(E_n) ≥ 1 − ε; and
(b) R_λµ_n(x) ≤ R_λµ(x), for all x ∈ R^d.

The following is a corollary to Theorem 2.2.1, and captures the salient features of the λ-potential densities of strongly symmetric Feller processes. This is a very important result, and its sole message is that r_λ is proper:

² The roots of this method go back to the early work of H. Poincaré on the diffusion equation; see the historical discussions of Kellogg (1967) and Wermer (1981) on this matter.
³ Recall that P(E) denotes the collection of all probability measures on E.
Corollary 2.2.1 Suppose X is a strongly symmetric Feller process on R^d with reference measure ν and λ-potential density r_λ. Then, r_λ is a proper, symmetric gauge function on R^d × R^d.
Proof By Exercise 2.1.1, r_λ is symmetric, and it is a gauge function by definition. To argue that r_λ is proper for every λ, consider a compact set E ⊂ R^d, and let E^n denote its 1/n-enlargement. That is,
$$E^n = \Big\{x\in\mathbf{R}^d : \operatorname{dist}\{x;E\} < \tfrac1n\Big\}, \qquad n\ge 1.$$
Fix n ≥ 1 and apply the balayage theorem (Theorem 2.2.1) to F = E^n. This way, we construct measures (ζ_t; t ≥ 0) as described. By picking t = t(n) sufficiently small, we can ensure that ζ_{t(n)}(E^n) ≥ 1 − ε for all n ≥ 1, and our corollary follows.
The remainder of this section is concerned with proving the balayage theorem (Theorem 2.2.1). This proof is not too difficult, but unless the reader is well familiar with time-reversal, it is somewhat mysterious. Furthermore, our proof relies on facts that are needed only for this argument. Therefore, the rest of this subsection can be safely omitted by the newly initiated student of this subject.
Henceforth, X is a strongly symmetric Feller process on R^d, E ⊂ R^d is compact, and F ⊂ R^d is bounded, open, and F ⊃ E. We introduce a sequence of families of bounded linear operators (T_t^n; t ≥ 0), n = 1, 2, ..., by
$$T_t^n f(x) = E_x\Big[f(X_t)\prod_{j=0}^{2^n} \mathbf{1}_{(X_{j2^{-n}t}\in F)}\Big]. \tag{1}$$
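The monotonicity in n of the operators defined by (1) is pathwise: refining the dyadic mesh only adds constraints. A small simulation sketch for Brownian motion illustrates this, using common random paths; the choices of F, f, t, and the path discretization below are our own toy assumptions:

```python
import math, random

random.seed(42)
t, F_lo, F_hi = 1.0, -1.0, 1.0   # toy choices: F = (-1, 1), horizon t = 1
m = 3                            # finest dyadic level simulated
steps = 2 ** m
paths = 2000

def Tn_estimates(x0):
    # Monte Carlo estimates of T_t^n f(x0), n = 0..m, with f(x) = exp(-x^2),
    # evaluated on the SAME Brownian paths for every n.
    est = [0.0] * (m + 1)
    for _ in range(paths):
        X = [x0]
        for _ in range(steps):
            X.append(X[-1] + random.gauss(0.0, math.sqrt(t / steps)))
        for n in range(m + 1):
            stride = 2 ** (m - n)
            inside = all(F_lo < X[j * stride] < F_hi for j in range(2 ** n + 1))
            if inside:
                est[n] += math.exp(-X[-1] ** 2)
    return [e / paths for e in est]

est = Tn_estimates(0.0)
```

Because the level-n dyadic times are a subset of the level-(n+1) ones, the indicator product can only shrink as n grows, so the estimates are nonincreasing path by path, not merely on average.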
Lemma 2.2.1 Fix n ≥ 1 and t > 0. Then:
(i) whenever f : R^d → R_+ is measurable, T_t f(x) ≥ T_t^n f(x) ≥ T_t^{n+1} f(x), for all x ∈ R^d;
(ii) there exists a nonnegative, symmetric function p_t^n, supported on F × F, such that for all measurable f : R^d → R_+, and for all x ∈ R^d, T_t^n f(x) = ∫ p_t^n(x, y) f(y) ν(dy);
(iii) for all measurable f : R^d → R_+, and for all x ∉ F, T_t^n f(x) = 0; and
(iv) the map A ↦ T_t^n 1_A(x) is a subprobability measure on F, for any x ∈ R^d.
Proof Part (i) follows from equation (1) and the elementary observation that dyadic numbers are nested. Part (iii) follows from (ii), since by (ii), p_t^n(x, y) = 0 unless x and y are both in F. Finally, (iv) follows from equation (1). Thus, it remains to prove (ii); this follows from the Chapman–Kolmogorov equation (Lemma 3.1.1, Chapter 8). In fact, we can choose p_t^n(x, y) by the formula
$$p_t^n(x,y) = \int\cdots\int p_{t2^{-n}}(x,x_1)\,\prod_{i=1}^{2^n-2} p_{t2^{-n}}(x_i,x_{i+1})\cdot p_{t2^{-n}}(x_{2^n-1},y)\ \nu(dx_1)\cdots\nu(dx_{2^n-1})$$
if x, y ∈ F, and p_t^n(x, y) = 0 otherwise (check!). The remaining properties follow from this and the symmetry of (a, b) → p_{t2^{-n}}(a, b).
Exercise 2.2.1 Check that for all nonnegative f ∈ L^∞(R^d),
$$\lim_{n\to\infty} T_t^n f(x) = E_x\big[f(X_t)\mathbf{1}_{(\tau_F>t)}\big], \qquad \forall x\in\mathbf{R}^d,\ t\ge 0,$$
where τ_F = inf(s ≥ 0 : X_s ∉ F) denotes the first exit time from F. This exercise is a starting point for investigating deep connections between balayage and killed processes. For some further information, see Supplementary Exercise 6.
Now that we have a density p_t^n, we can extend the definition of the operator T_t^n as we did for T_t. For all measures φ on R^d, define
$$T_t^n\phi(x) = \int p_t^n(x,y)\,\phi(dy), \qquad x\in\mathbf{R}^d,\ t\ge 0,\ n\ge 1.$$
By Lemma 2.2.1,
$$T_t^n\phi = 0, \quad \text{off of } F. \tag{2}$$
As in Exercise 2.1.4, this definition of T_t^n is consistent. Namely, if φ(dx) = f(x)ν(dx), then T_t^n φ = T_t^n f.
Lemma 2.2.2 For all λ > 0, t ≥ 0, n ≥ 1, x ∈ Rd , and for all measures φ on Rd , Rλ Ttn φ(x) ≤ eλt Rλ φ(x). Proof Since Rλ Ttn φ(x) =
rλ (x, y)pnt (y, z) φ(dz) ν(dy),
(3)
we apply the symmetry of rλ (Exercise 2.1.1) and pnt (Lemma 2.2.1(ii)) to obtain rλ (y, x)pnt (z, y) ν(dy) φ(dz) = Ttn hx (z) φ(dz), Rλ Ttn φ(x) =
366
10. Probabilistic Potential Theory
where hx (y) = rλ (y, x). By Lemma 2.2.1(i), n Rλ Tt φ(x) ≤ Tt hx (z) φ(dz) = Rλ Tt φ(x). In the last equality we have utilized the fact that equation (3) also holds without the superscript of n. Now we finish by computing as follows: ∞ Rλ Ttn φ(x) ≤ Rλ Tt φ(x) = e−λs Ts+t φ(x) ds 0 ∞ λt =e e−λs Ts φ(x) ds, t
from which the assertion follows. We are ready to proceed with our proof of Theorem 2.2.1.
Proof of Theorem 2.2.1 define a sequence of measures ζtn , all on Rd , by ζtn (dx) = e−λt Ttn µ(x) ν(dx), where µ is given in the statement of Theorem 2.2.1. We list some of the key properties of these measures next: (P1) each ζtn is absolutely continuous with respect to ν; (P2) thanks to equation (2), each ζtn is a measure on F ; (P3) by Lemma 2.2.2,
Rλ ζtn (x) ≤ Rλ µ(x).
We claim that for all bounded continuous functions ψ : Rd → R, lim lim ψ(x) ζtn (dx) = ψ(x) µ(dx). t↓0 n→∞
(4)
Since we can then relabel the ζ’s and finish our proof, it suffices to prove (4). But, thanks to Lemma 2.2.1, ψ(x) ζtn (dx) = e−λt ψ(x)pnt (x, y) µ(dy) ν(dx) −λt ψ(x)pnt (y, x) µ(dy) ν(dx). =e Reversing the order of the two integrals yields ψ(x) ζtn (dx) = e−λt Ttn ψ(y) µ(dy) =e
−λt
Ey ψ(Xt )
2n j=0
1l(X
j2−n t ∈F )
µ(dy),
2 Hitting Probabilities for Feller Processes
367
owing to (1). By Exercise 2.2.1, and by the bounded convergence theorem, lim ψ(x) ζtn (dx) = e−λt Ey ψ(Xt )1l(τF >t) µ(dy), n→∞
where τF = inf(s ≥ 0 : Xs ∈ F ). By right continuity, limt↓0 Xt = y, Py -a.s. Thus, another application of the bounded convergence theorem yields n ψ(x) ζt (dx) = ψ(y)Py (τF > 0) µ(dy), lim+ lim t→0
n→∞
& which equals ψ dµ. This uses the facts that F is open, E is compact, E ⊂ F , t → Xt is right-continuous, and µ ∈ P(E). We have demonstrated equation (4), and hence, the balayage theorem. Exercise 2.2.2 Show, in complete detail, that in the above proof, Py (τF > 0) = 1,
for µ-almost all y.
2.3 Hitting Probabilities and Capacities Throughout this subsection X = (Xt ; t ≥ 0) denotes a strongly symmetric Feller process on Rd whose reference measure is ν. We shall write T, R, pt , and rλ for the transition functions, resolvent, transition densities, and λ-potential densities of X, respectively. The main result of this section is an estimate for the Laplace transform of the hitting (or entrance) time of a compact set E ⊂ Rd in terms of capacities whose gauge function is rλ . Recall from Appendix D that the energy (with respect to the gauge function rλ ) of a measure µ is defined as Erλ (µ) = rλ (x, y) µ(dy) µ(dy). The capacity of a Borel set E ⊂ Rd is then given by the principle of minimum energy, viz., Crλ (E) =
inf
µ∈P(E)
−1 Erλ (µ) ,
where P(E) denotes the collection of all probability measures on E. See Appendix D for further details. Having recalled the definitions of capacity and energy, we are ready to state the main result of this section. In the case that X is Brownian motion, a sharper version appears in Benjamini, Pemantle, and Peres (1995).
368
10. Probabilistic Potential Theory
Theorem 2.3.1 Suppose X is a strongly symmetric Feller process as described above. Let E ⊂ Rd be a compact set and let TE = inf(s ≥ 0 : Xs ∈ E) denote its entrance time. Then, under the above assumptions, for all x ∈ E and all λ > 0, I2 · Crλ (E) ≤ Ex [e−λTE ] ≤ S · Crλ (E), 2S where I = inf y∈E rλ (x, y) and S = supy∈E rλ (x, y). As a consequence of Theorem 2.3.1, one can show that, typically, Px (TE < ∞) > 0 if and only if there exists a probability measure µ on E that has finite energy with gauge function rλ . The following states this more precisely. Corollary 2.3.1 Suppose E ⊂ Rd is compact and let TE denote its hitting time. If for all x, y ∈ Rd , r1 (x, y) > 0, then for any x ∈ E, Px (TE < ∞) > 0 ⇐⇒ Cr1 (E) > 0. In other words, this states that our Markov process X can hit E, starting from x ∈ Rd , only if E is a sufficiently large set, in the sense that it carries a probability measure of finite energy with respect to the gauge function rλ . We will prove Theorem 2.3.1 in the next subsection.
2.4 Proof of Theorem 2.3.1 Throughout this proof let T = TE , for simplicity. We will prove the upper bound and the lower bound for Ex [e−λT ] separately and in this order. The upper bound is much more difficult to prove. Fortunately, we have already built up all of the necessary ingredients, with one notable exception: Lemma 2.4.1 Given a measurable function ϕ : Rd → R+ , Rλ ϕ(x) ≥ Ex [e−λT Rλ ϕ(XT )], for all λ > 0 and every x ∈ Rd . Proof Recall that our Feller process lives on the one-point compactification Rd∆ of Rd , where ∆ is a point outside of Rd . Also recall that X∞ = ∆ and that the domain of definition for all functions f : Rd → R is extended to Rd∆ upon defining f (∆) = 0. Thus, Ex [e−λT Rλ ϕ(XT )] is always well-defined, even if T ≡ ∞. Now we proceed with a proof. By considering ϕ ∧ k and letting k → ∞, we can and will assume that ϕ is a bounded measurable function. By the
2 Hitting Probabilities for Feller Processes
369
Doob–Meyer decomposition (Theorem 1.3.3 of Chapter 9), for all t ≥ 0 and all x ∈ Rd , Mt = e
−λt
Rλ ϕ(Xt ) +
where Mt = Ex
t
0
∞
0
e−λs ϕ(Xs ) ds,
Px -a.s.,
(1)
e−λs ϕ(Xs ) ds Ft .
Since t → Mt and t → Xt are both right-continuous, equation (1) holds Px -a.s., simultaneously for all t ≥ 0. In particular, we are allowed to plug the random time T for t to see that Px -a.s. for all x ∈ Rd , MT 1l(T <∞) = e
−λT
Rλ ϕ(XT ) +
0
T
e−λs ϕ(Xs ) ds ≥ e−λT Rλ ϕ(XT ),
since ϕ(y) ≥ 0 for all y ∈ Rd . Since ϕ∈ L∞ (Rd ), by Lebesgue’s dominated convergence theorem, Ex MT 1l(T <∞) = limn→∞ Ex MT ∧n . We now apply the optional stopping theorem (Theorem 1.6.1, Chapter 7) to deduce that for all x ∈ Rd , Ex [M0 ] ≥ Ex e−λT Rλ ϕ(XT ) . The result follows from the towering property of conditional expectations (equation (1), Section 1.1 of Chapter 1), together with the definition of M . Exercise 2.4.1 Suppose that µ is a finite measure that is supported on a compact set E. Check that for all x ∈ Rd and all λ > 0, Rλ µ(x) = Ex e−λTE Rλ µ(XTE ) .
Thus, Lemma 2.4.1 is best possible.
We now enlarge the probability space, if need be, to construct a random variable e that (i) is independent of Ft for all t ≥ 0; and (ii) has a mean 1 exponential distribution. Define e(λ) =
e , λ
λ > 0.
Then, for all λ > 0: • e(λ) is independent of the entire history of X; and • e(λ) has an exponential distribution with parameter λ. That is, P{e(λ) > y} = e−λy ,
for all y ≥ 0.
370
10. Probabilistic Potential Theory
Combining the above two facts, it follows that for all x ∈ Rd and all λ > 0, Px {e(λ) ≥ y} = e−λy , y ≥ 0. In particular, we have the following interpretation of the Laplace transform of the entrance time of E; it follows immediately from the above independence assertion, together with Fubini’s theorem. Lemma 2.4.2 For all x ∈ Rd and all λ > 0, Ex [e−λT ] = Px (T < e(λ)). Proof of the Upper Bound in Theorem 2.3.1 We may assume without loss of generality that Ex [e−λT ] = Px (T < e(λ)) > 0. For any measurable set G ⊂ Rd , define µ0 (G) = Px (XT ∈ G | T < e(λ)). It is easy to check that µ0 ∈ P(E), since E is compact and since t → Xt is right-continuous. We shall pick a bounded open set F ⊂ Rd that contains E, and define (ζt ; t > 0) to be the collection of measures given by the balayage theorem for µ0 in place of µ; cf. Theorem 2.2.1. The balayage theorem shows that for all t, λ > 0 and all x ∈ Rd , sup rλ (x, y) ≥ Rλ µ0 (x) ≥ Rλ ζt (x) ≥ Ex [e−λT Rλ ζt (XT )].
y∈E
We have applied Lemma 2.4.1 to the absolutely continuous measure ζt . On the other hand, e−λT is the conditional probability of (e(λ) > T ), given the entire process X. Thus, sup rλ (x, y) ≥ Ex [e−λT ] · Ex [Rλ ζt (XT ) | T < e(λ)] t∈E −λT = Ex [e ] · Rλ ζt (x) µ0 (dx), by the definition of µ0 . By the balayage theorem, as t → 0+ , ζt converges weakly to µ0 . Lemma 2.1.1 of Appendix D then asserts that lim inf t→0+ Rλ ζt (x) ≥ Rλ µ0 (x), for all x ∈ Rd . By Fatou’s lemma, −λT sup rλ (x, y) ≥ Ex [e ] · Rλ µ0 (x) µ0 (dx), y∈E
which equals Ex [e−λT ]Erλ (µ0 ); see Exercise 1.1.2, Appendix D. Consequently, sup rλ (x, y) ≥ Ex [e−λT ] · Erλ (µ0 ). y∈E
2 Hitting Probabilities for Feller Processes
371
The fact that rλ is a gauge function and x ∈ E implies that the left-hand side is finite. Since Ex [e−λT ] is assumed to be (strictly) positive, the above energy is finite, and we obtain Ex [e−λT ] ≤
supy∈E rλ (x, y) . Erλ (µ0 )
This amply proves the upper bound of Theorem 2.3.1.
To prove the lower bound in Theorem 2.3.1, we need one more technical lemma. Lemma 2.4.3 For all ϕ ∈ L∞ (Rd ), all λ > 0, and all x ∈ Rd , Ex
e(λ)
ϕ(Xs ) ds
0
2
≤2
Rλ ϕ(y)rλ (z, y) ϕ(y) ν(dz) ν(dy).
In particular, if ϕ = 0 off of a compact set E, then Ex
e(λ)
ϕ(Xs ) ds
0
2
≤ 2 sup rλ (x, y)Erλ (φ), y∈E
where φ(dz) = ϕ(z)ν(dz). Proof We expand the square to see that Ex
e(λ)
0
ϕ(Xs ) ds
2
= 2Ex
∞
0
= 2Ex
∞
0
s ∞ s
∞
1l(e(λ)>t) ϕ(Xs )ϕ(Xt ) dt ds e−λt ϕ(Xs )ϕ(Xt ) dt ds .
Let T denote the transition functions of X. By the Markov property, whenever s < t, Ex [ϕ(Xs )ϕ(Xt )] = Ex ϕ(Xs )EXs ϕ(Xt−s ) = Ex ϕ(Xs )Tt−s ϕ(Xs ) ; cf. Theorem 3.3.2, Chapter 8. Applying Fubini’s theorem twice, we obtain Ex
0
e(λ)
ϕ(Xs ) ds
2
= 2Ex = 2Ex
∞ 0
0
ϕ(Xs ) ∞
∞ s
e−λt Tt−s ϕ(Xs ) dt ds
e−λs ϕ(Xs )Rλ ϕ(Xs ) ds .
The lemma follows immediately from this and a few more lines of direct calculations.
372
10. Probabilistic Potential Theory
Exercise 2.4.2 The basic idea behind the estimate of Lemma 2.4.3 can be Eν [Z] = & used to show that rλ is nonnegative definite. Indeed, recall that Ex [Z] ν(dx), and check that for any bounded function f : Rd → R, and for all λ > 0, e(λ) 2 2 Eν f (Xs ) ds = Erλ (φ), λ 0 where φ(x) = f (x)ν(dx). Conclude that rλ is nonnegative definite for any λ > 0. Also, define as in Appendix D (Section 1.1) the mutual energy µ, νrλ for any two signed measures µ and ν on Rd . Use the above to show that •, •rλ defines an inner product on the space of signed measures whose corresponding norm is Erλ . This improves on Exercise 1.1.1, Appendix D. We conclude this subsection by proving the remainder of Theorem 2.3.1. Proof of the Lower Bound in Theorem 2.3.1 For every η > 0, let E η denote the η-enlargement of E described by η > 0. E η = x ∈ Rd : dist{x; E} < η , Each E η is a bounded open set that contains E. Let T η = inf(s ≥ 0 : Xs ∈ E η ) denote the entrance time to E η . This is a stopping time for every η > 0 (Theorem 1.2.1, Chapter 7) and Px -a.s. for all x ∈ Rd , T η ↓ T , as η → 0+ ; see Supplementary Exercise 5. We shall first obtain a lower bound η for Ex [e−λT ]. Consider any absolutely continuous probability measure ζ on E η and let ϕ(x) = ζ(dx)/ν(dx) denote its Radon–Nikod´ ym derivative. Note that if & e(λ) and when 0 ϕ(Xs ) ds > 0, then certainly, T η < e(λ). That is, e(λ)
η Ex e−λT = Px T η < e(λ) ≥ Px ϕ(Xs ) ds > 0 ; 0
cf. Lemma 2.4.2 for the first equality. By the Paley–Zygmund lemma (Lemma 1.4.1, Chapter 3), & e(λ) 2 −λT η Ex 0 ϕ(Xs ) ds ≥ Ex e & e(λ) 2 . Ex ϕ(X ) ds s 0 The numerator is the square of ∞ Ex e−λs ϕ(Xs ) ds = Rλ ϕ(x) ≥ inf rλ (x, y). 0
y∈E η
By Lemma 2.4.3, the denominator is bounded above by 2 supy∈E η rλ (x, y) times Erλ (ζ). Thus, 2 inf y∈E η rλ (x, y) −1 −λT Erλ (ζ) ]≥ . Ex [e 2 supy∈E η rλ (x, y)
3 Explicit Computations
373
Since ζ is an arbitrary absolutely continuous probability measure on E η , we can take the supremum over all such ζ’s to obtain the absolutely continuous capacity of E η on the right-hand side; see Appendix D. That is, 2 2 inf y∈E η rλ (x, y) inf y∈E η rλ (x, y) −λT η ac η · Crλ (E ) ≥ · Cac Ex [e ]≥ rλ (E). 2 supy∈E η rλ (x, y) 2 supy∈E η rλ (x, y) We now wish to let η ↓ 0+ . By Corollary 2.2.1, rλ is proper. Together with Theorem 2.3.1 of Appendix D, this implies that Cac rλ (E) = Crλ (E). On the other hand, as η → 0+ , inf y∈E η rλ (x, y) (respectively supy∈E η rλ (x, y)) converges to inf y∈E rλ (x, y) (respectively supy∈E η rλ (x, y)), since rλ is a gauge function and since x ∈ E. The remainder of this proof follows from the already observed fact that T η ↓ T , Px -a.s.
3 Explicit Computations In order to better understand the general results of the previous section, we now specialize our Markov processes to isotropic stable processes and make some rather explicit computations and/or estimations. While Brownian motion is one of the isotropic stable processes, its analysis is much simpler. Thus, we start with Brownian motion.
3.1 Brownian Motion and Bessel–Riesz Capacities
Let B = Bt ; t ≥ 0 denote d-dimensional Brownian motion. Our immediate goal is to compute its λ-potential density rλ . When d = 1, this was carried out in the example of Section 2.3, Chapter 8, but in disguise. To begin with, note that for all functions f ∈ L∞ (Rd ), all x ∈ Rd , and all t ≥ 0, f (x + y)qt (y) dy, Tt f (x) = Ex [f (Bt )] = E[f (x + Bt )] = Rd
where T denotes the semigroup corresponding to B and a2 d d , t ≥ 0, a ∈ Rd . qt (a) = (2π)− 2 t− 2 exp − 2t See Exercise 1.1.1 of Chapter 5. The semigroup T is called the heat semigroup (on Rd ), and its corresponding transition density is the so-called heat kernel. The latter exists. In fact, the discussion of Section 2.3 of Chapter 8 shows that Brownian motion B has transition densities pt (x, y) (with respect to Lebesgue’s measure) that are given by pt (x, y) = qt (y − x),
t ≥ 0, x, y ∈ Rd .
374
10. Probabilistic Potential Theory
Now the λ-potential density rλ can be computed as follows (why?): λ > 0, x, y ∈ Rd ,
rλ (x, y) = uλ (y − x),
where uλ (a) =
∞
0
e−λt qt (a) dt,
λ > 0, a ∈ Rd .
Of course, λ → uλ (a) is the Laplace transform of the function (not distribution function) t → qt (a). You can find some information on Laplace transforms in Appendix B. In any case, we can assemble the above information in one package to obtain the following formula for the 1-potential density of Brownian motion: For all x, y ∈ Rd , ∞ y − x2 d d dt. (1) e−t t− 2 exp − r1 (x, y) = (2π)− 2 2t 0 Exercise 3.1.1 Compute rλ for all λ > 0 and verify that it is indeed a gauge function on Rd × Rd . (i) Conclude that d-dimensional Brownian motion is a strongly symmetric Feller process. (ii) Check that for all x, y ∈ Rd and for all λ > 0, rλ (x, y) > 0. While rλ is a gauge function on Rd × Rd , it may (and in this case, it does when d ≥ 2) have singularities on the diagonal of Rd × Rd . A little thought shows that the nature of such singularities is precisely what makes some sets have positive capacity and others not. Our next two lemmas estimate the nature of this singularity, first in the case d ≥ 3, and then in the case d = 2. See the paragraph below Lemma 3.1.2 for the case d = 1. Lemma 3.1.1 Given d ≥ 3 and a finite constant ε > 0, there exist two finite positive constants A1 ≤ A2 such that for all x, y ∈ Rd with x − y ≤ ε, A2 A1 ≤ r1 (x, y) ≤ . y − xd−2 y − xd−2 Proof Recalling that r1 (x, y) = u1 (y − x), we use the formula of equation (1) and change variables to see that ∞ 2 d 1 −d 2−d 2 u1 (a) = (2π) a e−s a s− 2 e− 2s ds. (2) 0
Note that ∞ 0
2
d
1
e−s a s− 2 e− 2s ds ≤
0
∞
d
1
d
s− 2 e− 2s ds = 2−1+ 2 Γ
d − 2 . 2
3 Explicit Computations
375
Of course, we need the condition d ≥ 3 here to have finite integrals. The upper bound on r1 follows with A2 =
1 2π
d 2
Γ
d − 2 2
.
Similarly, the lower bound holds with −d 2
A1 = (2π)
∞
2
d
1
e−ε s s− 2 e− 2s ds,
0
which is positive and finite.
When d = 2, the estimation is only slightly more delicate, as the following shows. Lemma 3.1.2 Given d = 2 and a positive finite ε, there exist two finite positive constants A1 ≤ A2 such that for all x, y ∈ R2 with x − y < e−ε , A1 ln
1 1 ≤ r1 (x, y) ≤ A2 ln . x − y x − y
Proof By equation (2), for all a ∈ Rd , u1 (a) = (2π)−1
∞
2
1
e−s a s−1 e− 2s ds.
0
Thus, for all a ∈ Rd with a < e−ε , −1 − 32
u1 (a) ≥ (2π)
a −2
e
3
s−1 ds = π −1 e− 2 ln
1
1 . a
Thus, we obtain the lower bound on r1 with A1 = π −1 e−3/2 . To arrive at the upper bound, we split the integral as follows: For all a ∈ Rd with a < e−ε , u1 (a) = (2π)−1 T1 + T2 + T3 , (3) where T1 =
T3 =
2
1
e−s a s−1 e− 2s ds,
0
T2 =
1
a −2
1 ∞
a −2
2
1
e−s a s−1 e− 2s ds, 2
1
e−s a s−1 e− 2s ds.
376
10. Probabilistic Potential Theory
Clearly, T1 ≤
e
T3 ≤
1 − 2s −1
0
T2 ≤
1
s
∞
ds =
e
− 2t −1
t
1
dt ≤
∞
t
e− 2 dt = 2,
0
1 , s−1 ds = 2 ln a2 ∞ 2 e−s a s−1 ds = e−r r−1 dr ≤
a −2
1 ∞
a −2
1
∞
e−r dr = 1.
0
By equation (3), whenever a < e−ε , 1 1 ≤ (2π)−1 3ε−1 + 2 ln , u1 (a) ≤ (2π)−1 3 + 2 ln a a proving the result with A2 = (2π)−1 2ε−1 + 2 .
The only unresolved case is d = 1. Fortunately, this has been dealt with in the example of Section 2.3, Chapter 8, but in disguise. Namely, when d = 1, 2|x − y|2 14 √
r1 (x, y) = K 12 |x − y| 2 , 3 π where K 12 is the modified Bessel function of index 12 . Exercise 3.1.2 Show that when d = 1, r1 is continuous on all of R × R. Show, moreover, that r1 (x, y) > 0 for all x, y ∈ R. Next, we recall the definitions of Bessel–Riesz capacities; cf. Section 2.2 of Appendix C for more details. For any probability measure µ on Borel subsets of Rd , and for any β > 0, we define Energyβ (µ) to be the energy of µ with respect to the gauge function x → x−β . When β = 0, the gauge function is changed to x → ln+ (1/x). Finally, when β < 0, we define Energyβ (µ) to be identically equal to 1. For any Borel set E ⊂ Rd , we define the Bessel–Riesz capacity of E of index β ≥ 0 as Capβ (E) =
inf
µ∈P(E)
−1 Energyβ (µ) .
Exercise 3.1.1 states that X is a strongly symmetric Feller process with a strictly positive λ-potential density. Therefore, we can combine Lemmas 3.1.1 and 3.1.2 with Theorem 2.3.1 in order to obtain the following wellknown theorem of S. Kakutani; see Kakutani (1944b, 1945) for the original analysis in two dimensions. The higher-dimensional result can be found in Dvoretzky et al. (1950, Lemma 2).
3 Explicit Computations
377
Theorem 3.1.1 (Kakutani’s Theorem) Suppose B = (Bt ; t ≥ 0) denotes d-dimensional Brownian motion. For any compact set E ⊂ Rd and for all x ∈ E, Px (TE < ∞) > 0 ⇐⇒ Capd−2 (E) > 0. Exercise 3.1.3 Complete the described proof of Kakutani’s theorem. Remarks (i) When d = 1, Px (TE < ∞) > 0 for all compact sets E ⊂ R; (ii) When d = 2, Brownian motion does not hit points in the sense that for all x = y, Px T{y} < ∞ = 0. To see this, note that any probability measure on {y} is necessarily of the form cδy , where δy denotes a point mass at the point y and c > 0 is a finite constant. On the other hand,
Energy0 (cδy ) is clearly infinite. That is, Cap0 {y} = 0, for all y ∈ R2 . The same property holds when d ≥ 3.
3.2 Stable Densities and Bochner’s Subordination It turns out that it is both possible and interesting to try to extend Kakutani’s theorem (Theorem 3.1.1) from Brownian motion to isotropic stable processes. Unfortunately, transition densities of stable processes are not explicitly computable outside a few cases; cf. Example 3, Section 4.3 of Chapter 8. Thus, we need to take a different, more subtle, route. Let X = (Xt ; t ≥ 0) denote an isotropic stable L´evy process in Rd , whose index4 is α ∈ ]0, 2[ and whose transition functions are denoted by T. Recall from Example 3, Section 4.3 of Chapter 8, that X has transition densities (with respect to Lebesgue’s measure on Rd ) that can be written as follows: For all x ∈ Rd , all t ≥ 0, and all f ∈ L∞ (Rd ), f (y)pt (x, y) dy, Tt f (x) = Ex [f (Xt )] = Rd
with
t ≥ 0, x, y ∈ Rd ,
pt (x, y) = qt (y − x), where −d
qt (a) = (2π)
Rd
1
α
e−iξ·a e− 2 ξ dξ,
t ≥ 0, a ∈ Rd .
See Section 1.3 above for further details. While we have studied some of the elementary properties of the function qt in Lemmas 1.3.1 and 1.3.2 and 4 Since we have already studied the connections between Brownian motion and Bessel– Riesz capacity in the previous section, we can and will restrict attention to the stable processes other than Brownian motion; that is, α < 2.
378
10. Probabilistic Potential Theory
in Corollary 1.3.1, they are not sufficient for our present needs. In this and the next subsection we will study the function $q_t$ in much greater depth. In fact, we will study the behavior of $a \mapsto q_t(a)$ as $\|a\| \to \infty$. By scaling, we can reduce this asymptotic question to one about the behavior of $a \mapsto q_1(a)$ as $\|a\| \to \infty$; cf. Lemma 1.3.1. In order to begin our asymptotic analysis, we relate the function $q_1$ to the transition densities of Brownian motion. The mechanism for doing so involves a little Laplace transform theory and is called subordination; it is a special case of a general method that was proposed in Bochner (1955). Throughout, for any $\beta > 0$, we define the function $g_\beta : \mathbb{R}_+ \to \mathbb{R}_+$ by
$$g_\beta(\lambda) = \exp\{-(2\lambda)^\beta\}, \qquad \lambda \geq 0.$$
The first important result of this subsection is the following.

Theorem 3.2.1 If $\beta \in\, ]0,1[$, then $g_\beta$ is the Laplace transform of some probability measure $\sigma_\beta$ on $\mathbb{R}_+$.

In order to prove this, we need the following lemma. Recall from Appendix B that a function $f : \mathbb{R}_+ \to \mathbb{R}_+$ is completely monotone if $f$ is infinitely differentiable and satisfies $f \geq 0$, $f' \leq 0$, $f'' \geq 0$, $f''' \leq 0, \ldots$.

Lemma 3.2.1 Consider two infinitely differentiable functions $f : \mathbb{R}_+ \to \mathbb{R}_+$ and $h : \mathbb{R}_+ \to \mathbb{R}_+$. If $f$ is completely monotone and the derivative $h'$ is completely monotone, then the composition $f \circ h$ is completely monotone.

Exercise 3.2.1 Prove Lemma 3.2.1.
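Complete monotonicity of $g_\beta$ can be sanity-checked numerically (an illustration only, not a proof): central finite differences of $g_\beta$ should alternate in sign, order by order. The choice $\beta = 0.4$ and the evaluation point $\lambda = 1$ below are arbitrary.

```python
import math

def g(lam, beta=0.4):
    # the candidate Laplace transform of Theorem 3.2.1
    return math.exp(-((2.0 * lam) ** beta))

def central_diff(f, x, k, h=1e-2):
    """k-th order central difference; approximately h^k * f^(k)(x)."""
    return sum((-1) ** j * math.comb(k, j) * f(x + (k / 2.0 - j) * h)
               for j in range(k + 1))

if __name__ == "__main__":
    # complete monotonicity predicts the sign pattern -, +, -, + for k = 1..4
    print([math.copysign(1.0, central_diff(g, 1.0, k)) for k in range(1, 5)])
```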
Proof of Theorem 3.2.1 Let $f(x) = e^{-x}$ and $h(x) = (2x)^\beta$, and check directly that $f$ is completely monotone and that $h'$ is completely monotone (here $0 < \beta < 1$). The theorem follows from Lemma 3.2.1, used in conjunction with Bernstein's theorem; cf. Theorem 1.3.1 of Appendix B.

When $0 < \beta < 1$, the probability measure $\sigma_\beta$ given to us by Theorem 3.2.1 corresponds to the (completely asymmetric) stable distribution on $[0,\infty[$ with index $\beta$. For the remainder of this subsection we refer to this $\sigma_\beta$ as such, and write $F_\beta$ for its distribution function. It is now easy to explain S. Bochner's idea of subordination in our present context. Recalling that $0 < \alpha < 2$, we let $\beta = \alpha/2$ and notice that $0 < \beta < 1$. In particular, we can construct an a.s. positive random variable $\tau_\beta = \tau_{\alpha/2}$ with distribution $\sigma_{\alpha/2}$, and an independent $\mathbb{R}^d$-valued random vector $Z \sim N_d(0, I)$, where $I$ denotes the identity matrix of dimension $d \times d$. Recall that this latter notation merely means that $Z = (Z^{(1)}, \ldots, Z^{(d)})$, where the $Z^{(i)}$'s are i.i.d. standard normal random variables. In our present setting, Bochner's subordination can be phrased as follows.
3 Explicit Computations
379
Theorem 3.2.2 (Bochner's Subordination) Let $\tau_{\alpha/2}$ be a completely asymmetric stable random variable of index $\alpha/2$, with distribution $\sigma_{\alpha/2}$, taking its values in $\mathbb{R}_+$, a.s. Let $Z \sim N_d(0, I)$ be constructed on the same probability space, totally independently of $\tau_{\alpha/2}$. Then the $\mathbb{R}^d$-valued random vector $\sqrt{\tau_{\alpha/2}}\, Z$ has the same distribution as $X_1$.

Proof We check the characteristic functions. By independence and by the form of the characteristic function of Gaussian random vectors (Section 1.1, Chapter 5), for all $\xi \in \mathbb{R}^d$,
$$E\big[\exp\{i\xi\cdot\sqrt{\tau_{\alpha/2}}\,Z\}\big] = E\big[\exp\{-\tfrac12 \tau_{\alpha/2}\|\xi\|^2\}\big] = g_{\alpha/2}\big(\tfrac12\|\xi\|^2\big) = \exp\{-\|\xi\|^\alpha\},$$
which equals $E[e^{i\xi\cdot X_1}]$. The theorem follows from the uniqueness theorem for characteristic functions.

Let $Z$ and $\tau_{\alpha/2}$ be given by the above theorem. Conditional on the value of $\tau_{\alpha/2}$, $\sqrt{\tau_{\alpha/2}}\, Z \sim N_d(0, \tau_{\alpha/2} I)$. Thus, using Bochner's subordination (Theorem 3.2.2), the form of the probability density of Gaussian random vectors, and Fubini's theorem, we see that for all Borel sets $A \subset \mathbb{R}^d$,
$$P(X_1 \in A) = (2\pi)^{-\frac d2} \int_A E\Big[\tau_{\alpha/2}^{-\frac d2} \exp\Big(-\frac{\|x\|^2}{2\tau_{\alpha/2}}\Big)\Big]\, dx = (2\pi)^{-\frac d2} \int_A \int_0^\infty t^{-\frac d2} \exp\Big(-\frac{\|x\|^2}{2t}\Big)\, \sigma_{\alpha/2}(dt)\, dx,$$
where $\sigma_{\alpha/2}$ is the measure given by Theorem 3.2.1. But $q_1(a)$ is the density function of $X_1$ at $a$ (with respect to Lebesgue's measure). Therefore, we have the following. (Why?)

Corollary 3.2.1 For any $a \in \mathbb{R}^d$,
$$q_1(a) = (2\pi)^{-\frac d2} \int_0^\infty t^{-\frac d2} \exp\Big(-\frac{\|a\|^2}{2t}\Big)\, \sigma_{\alpha/2}(dt).$$
In particular, $q_1(a) > 0$ for all $a \in \mathbb{R}^d$.

This corollary translates statements about $a \mapsto q_1(a)$ (and its asymptotics) into statements about $x \mapsto F_{\alpha/2}(x)$ (and its asymptotics). We conclude this subsection with the asymptotics of $x \mapsto F_{\alpha/2}(x)$.

Theorem 3.2.3 For any $\alpha \in\, ]0,2[$,
$$\lim_{x\to\infty} x^{\alpha/2}\, P(\tau_{\alpha/2} > x) = \lim_{x\to\infty} x^{\alpha/2}\,\{1 - F_{\alpha/2}(x)\} = 2^{\alpha/2}\,\big[\Gamma\big(1 - \tfrac12\alpha\big)\big]^{-1}.$$

Proof Using Corollary 2.1.2 of Appendix B with $\theta = 1 - \frac\alpha2$, it suffices to show that
$$\lim_{\lambda\to 0^+} \lambda^{-\alpha/2}\,\{1 - \widehat F_{\alpha/2}(\lambda)\} = 2^{\alpha/2},$$
where $\widehat f$ denotes the Laplace transform of the function $f$; cf. Appendix B. On the other hand, $\widehat F_{\alpha/2}(\lambda) = g_{\alpha/2}(\lambda) = \exp\{-(2\lambda)^{\alpha/2}\}$, so that $1 - \widehat F_{\alpha/2}(\lambda) \sim (2\lambda)^{\alpha/2}$ as $\lambda \to 0^+$, from which the theorem follows immediately.
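Theorems 3.2.2 and 3.2.3 can be probed by simulation. The sketch below (ours; function names are ad hoc) samples $\tau_{\alpha/2}$ using Kanter's representation of the positive stable law — a standard device from the random-variate-generation literature, not developed in this book: if $U$ is uniform on $]0,\pi[$ and $E$ is standard exponential, then $S = (A(U)/E)^{(1-\beta)/\beta}$ with $A(u) = \sin(\beta u)^{\beta/(1-\beta)}\sin((1-\beta)u)/\sin(u)^{1/(1-\beta)}$ satisfies $E[e^{-\lambda S}] = e^{-\lambda^\beta}$, so $\tau = 2S$ has Laplace transform $g_\beta$. Under the normalization $E[e^{i\xi\cdot X_1}] = e^{-\|\xi\|^\alpha}$ used above, the empirical characteristic function of $\sqrt{\tau}\,Z$ (in $d = 1$) and the tail of $\tau$ can then be compared with the theory.

```python
import math, random

def sample_tau(alpha, rng):
    """tau_{alpha/2} with Laplace transform exp{-(2*lam)^(alpha/2)},
    via Kanter's representation with beta = alpha/2, then tau = 2*S."""
    beta = alpha / 2.0
    u = rng.uniform(0.0, math.pi)
    e = rng.expovariate(1.0)
    a_u = (math.sin(beta * u) ** (beta / (1.0 - beta))
           * math.sin((1.0 - beta) * u)
           / math.sin(u) ** (1.0 / (1.0 - beta)))
    return 2.0 * (a_u / e) ** ((1.0 - beta) / beta)

def subordinated_char(alpha, xi, n=200_000, seed=1):
    """Monte Carlo E[cos(xi * sqrt(tau) * Z)]; Theorem 3.2.2 predicts exp(-|xi|^alpha)."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n):
        tau = sample_tau(alpha, rng)
        acc += math.cos(xi * math.sqrt(tau) * rng.gauss(0.0, 1.0))
    return acc / n

if __name__ == "__main__":
    alpha = 1.2
    for xi in (0.5, 1.0, 2.0):
        # columns: xi, Monte Carlo estimate, exp(-|xi|^alpha)
        print(xi, subordinated_char(alpha, xi), math.exp(-abs(xi) ** alpha))
```

The tail of Theorem 3.2.3 can be checked the same way: the empirical frequency of $\{\tau > x\}$ should be close to $2^{\alpha/2}[\Gamma(1-\frac\alpha2)]^{-1}x^{-\alpha/2}$ for large $x$.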
3.3 Asymptotics for Stable Densities

Continuing with the setup of the previous subsection, we are now ready to prove the following technical estimate.

Proposition 3.3.1 There exists a finite constant $A > 1$ such that for all $a \in \mathbb{R}^d$ with $\|a\| \geq 2$,
$$\frac1A\,\|a\|^{-(d+\alpha)} \leq q_1(a) \leq A\,\|a\|^{-(d+\alpha)}.$$
A slightly better result is possible; cf. Supplementary Exercise 9.

Proof Our proof is performed in several steps. We start by writing
$$q_1(a) = (2\pi)^{-\frac d2}\,\big[T_1 + T_2\big], \tag{1}$$
where
$$T_1 = \int_0^{\|a\|^2} t^{-\frac d2} \exp\Big(-\frac{\|a\|^2}{2t}\Big)\,\sigma_{\alpha/2}(dt), \qquad T_2 = \int_{\|a\|^2}^\infty t^{-\frac d2} \exp\Big(-\frac{\|a\|^2}{2t}\Big)\,\sigma_{\alpha/2}(dt).$$
This is justified by the formula of Corollary 3.2.1. We now obtain upper and lower bounds for $T_1$ and $T_2$ in stages. The simplest is the upper bound for $T_2$. Indeed,
$$T_2 \leq \int_{\|a\|^2}^\infty t^{-\frac d2}\,\sigma_{\alpha/2}(dt) = \sum_{j=0}^\infty \int_{e^j\|a\|^2}^{e^{j+1}\|a\|^2} t^{-\frac d2}\,\sigma_{\alpha/2}(dt) \leq \|a\|^{-d} \sum_{j=0}^\infty e^{-\frac d2 j}\,\sigma_{\alpha/2}\big(]e^j\|a\|^2, \infty[\big) = \|a\|^{-d} \sum_{j=0}^\infty e^{-\frac d2 j}\, P\big(\tau_{\alpha/2} > e^j\|a\|^2\big).$$
By Theorem 3.2.3, we can find a finite constant $A_1 > 1$ such that for all $j \geq 0$ and all $a \neq 0$,
$$P\big(\tau_{\alpha/2} > e^j\|a\|^2\big) \leq A_1\,\|a\|^{-\alpha} e^{-\frac\alpha2 j},$$
whence
$$T_2 \leq A_2\,\|a\|^{-(d+\alpha)}, \tag{2}$$
where $A_2 = A_1 \sum_{j=0}^\infty e^{-\frac12(d+\alpha)j}$. The estimation of $T_1$ starts along similar lines:
$$T_1 = \sum_{j=1}^\infty \int_{e^{-j}\|a\|^2}^{e^{-j+1}\|a\|^2} t^{-\frac d2} \exp\Big(-\frac{\|a\|^2}{2t}\Big)\,\sigma_{\alpha/2}(dt) \leq \|a\|^{-d} \sum_{j=1}^\infty \exp\Big(\frac{jd}2 - \frac{e^j}{2e}\Big)\, P\big(\tau_{\alpha/2} > e^{-j}\|a\|^2\big).$$
Whenever $e^{-j}\|a\|^2 > 1$, we can find a finite constant $A_3 > 1$ such that $P(\tau_{\alpha/2} > e^{-j}\|a\|^2) \leq A_3\, e^{\frac\alpha2 j}\|a\|^{-\alpha}$. On the other hand, if $e^{-j}\|a\|^2 \leq 1$, an appropriate bound for this probability is $1$. Thus,
$$T_1 \leq A_3\,\|a\|^{-(d+\alpha)} \sum_{1\le j\le 2\ln\|a\|} \exp\Big(\frac{j(d+\alpha)}2 - \frac{e^j}{2e}\Big) + \|a\|^{-d} \sum_{j > 2\ln\|a\|} \exp\Big(\frac{jd}2 - \frac{e^j}{2e}\Big) \leq A_4\,\|a\|^{-(d+\alpha)} + \|a\|^{-d} \exp\Big(-\frac{\|a\|^2}{4e}\Big) \sum_{j=1}^\infty \exp\Big(\frac{jd}2 - \frac{e^j}{4e}\Big),$$
where $A_4 = A_3 \sum_{j=1}^\infty \exp\big(\frac{j(d+\alpha)}2 - \frac{e^j}{2e}\big)$, and where we have used the fact that $e^j > \|a\|^2$ whenever $j > 2\ln\|a\|$. Since $\|a\| \geq 2$, we see that there exists a finite constant $A_5 > 1$ such that $T_1 \leq A_5\,\|a\|^{-(d+\alpha)}$. In light of equation (1), we can combine this with equation (2) to deduce the existence of a constant $A_6 = (2\pi)^{-\frac d2}(A_5 + A_2) > 1$ such that for all $\|a\| \geq 2$,
$$q_1(a) \leq A_6\,\|a\|^{-(d+\alpha)}. \tag{3}$$
It remains to obtain lower bounds. Using the decomposition of equation (1) one more time, together with the fact that $\exp(-\|a\|^2/(2t)) \geq e^{-\frac12}$ whenever $t \geq \|a\|^2$,
$$q_1(a) \geq (2\pi)^{-\frac d2}\, T_2 \geq (2\pi)^{-\frac d2} e^{-\frac12} \int_{\|a\|^2}^\infty t^{-\frac d2}\,\sigma_{\alpha/2}(dt) = (2\pi)^{-\frac d2} e^{-\frac12} \sum_{j=0}^\infty \int_{e^j\|a\|^2}^{e^{j+1}\|a\|^2} t^{-\frac d2}\,\sigma_{\alpha/2}(dt) \geq (2\pi)^{-\frac d2} e^{-\frac12}\,\|a\|^{-d} \sum_{j=0}^\infty e^{-\frac d2(j+1)}\, P\big(e^{j+1}\|a\|^2 \geq \tau_{\alpha/2} > e^j\|a\|^2\big).$$
By Theorem 3.2.3, for all $\varepsilon > 0$ there exists $M > 2$ such that for all $a \in \mathbb{R}^d$ with $\|a\| \geq M$ and for all $j \geq 0$, writing $K = 2^{\alpha/2}[\Gamma(1-\frac\alpha2)]^{-1}$ for the limiting constant of Theorem 3.2.3,
$$P\big(e^{j+1}\|a\|^2 \geq \tau_{\alpha/2} > e^j\|a\|^2\big) = P\big(\tau_{\alpha/2} > e^j\|a\|^2\big) - P\big(\tau_{\alpha/2} > e^{j+1}\|a\|^2\big) \geq (1-\varepsilon)K e^{-\frac\alpha2 j}\|a\|^{-\alpha} - (1+\varepsilon)K e^{-\frac\alpha2(j+1)}\|a\|^{-\alpha} = K\|a\|^{-\alpha} e^{-\frac\alpha2 j}\big[(1-\varepsilon) - (1+\varepsilon)e^{-\frac\alpha2}\big].$$
Pick $\varepsilon = \frac12(1 - e^{-\alpha/2})(1 + e^{-\alpha/2})^{-1}$ to see that whenever $\|a\| \geq M$,
$$P\big(e^{j+1}\|a\|^2 \geq \tau_{\alpha/2} > e^j\|a\|^2\big) \geq K\|a\|^{-\alpha} e^{-\frac\alpha2 j}\,\frac{1 - e^{-\alpha/2}}2 = A_7\,\|a\|^{-\alpha} e^{-\frac\alpha2 j}.$$
In particular, whenever $\|a\| \geq M$, $q_1(a) \geq A_8\,\|a\|^{-(d+\alpha)}$, where
$$A_8 = (2\pi)^{-\frac d2} e^{-\frac12} A_7 \sum_{j=0}^\infty e^{-\frac d2(j+1) - \frac\alpha2 j}.$$
By the positivity and continuity assertions of Corollary 3.2.1, whenever $2 \leq \|a\| < M$, the above lower bound for $q_1(a)$ holds with $A_8$ replaced by a possibly smaller constant $A_9$. In light of this and equation (3), we have proved the proposition with an appropriately large choice of $A$.
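In $d = 1$ the density $q_1$ can also be computed directly by Fourier inversion, $q_1(a) = \pi^{-1}\int_0^\infty \cos(a\xi)\, e^{-\xi^\alpha}\, d\xi$ under the normalization of this section, which gives an independent numerical check of Proposition 3.3.1: $\|a\|^{1+\alpha} q_1(a)$ flattens out for large $\|a\|$. For $\alpha = 1$ the integral is the Cauchy density $1/(\pi(1+a^2))$. A rough sketch (ours; the quadrature parameters are ad hoc):

```python
import math

def q1(a, alpha, cutoff=40.0, n=80_000):
    """q_1(a) = (1/pi) * Integral_0^infinity cos(a*x) exp(-x^alpha) dx (trapezoid rule)."""
    h = cutoff / n
    total = 0.5 * (1.0 + math.cos(a * cutoff) * math.exp(-(cutoff ** alpha)))
    for k in range(1, n):
        x = k * h
        total += math.cos(a * x) * math.exp(-(x ** alpha))
    return h * total / math.pi

if __name__ == "__main__":
    # alpha = 1 is the Cauchy case: q_1(a) = 1 / (pi * (1 + a^2))
    print(q1(3.0, 1.0), 1.0 / (10.0 * math.pi))
    # Proposition 3.3.1: a^(1+alpha) * q_1(a) should flatten for large a
    for a in (4.0, 8.0, 16.0):
        print(a, a ** 2.2 * q1(a, 1.2))
```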
3.4 Stable Processes and Bessel–Riesz Capacity

We are now in a position to estimate the hitting probabilities of the process $X$ of this section. Recall that the transition density (with respect to Lebesgue's measure) of $X$ is $p_t(x,y) = q_t(y-x)$, where $q_t(a)$ denotes the density (under $P$) of the random vector $X_t$, evaluated at $a \in \mathbb{R}^d$. Thus, (why?) $X$ has a $\lambda$-potential density $r_\lambda(x,y)$ that is given by
$$r_\lambda(x,y) = u_\lambda(y - x), \qquad \lambda > 0,\ x, y \in \mathbb{R}^d,$$
where
$$u_\lambda(a) = \int_0^\infty e^{-\lambda s}\, q_s(a)\, ds.$$
Note that $u_\lambda(a) \leq u_\lambda(0)$. However, the latter is finite if and only if $d > \alpha$.

Exercise 3.4.1 Verify that $u_\lambda(0) < \infty$ if and only if $d > \alpha$.
Some analysis shows that whenever $d > \alpha$, $r_\lambda$ is a gauge function on $\mathbb{R}^d \times \mathbb{R}^d$ and $X$ is a strongly symmetric Feller process with Lebesgue's measure as its reference measure. In fact, this is true even if $d \leq \alpha$, but it is harder to prove. The following addresses such technical issues and provides the analogue of Lemmas 3.1.1 and 3.1.2.

Lemma 3.4.1 Suppose $d > \alpha$ and $\lambda > 0$ are fixed. Then there exist two finite positive constants $A_1 \leq A_2$ such that for all $x, y \in \mathbb{R}^d$ with $\|x - y\| \leq 1$,
$$\frac{A_1}{\|x-y\|^{d-\alpha}} \leq r_\lambda(x,y) \leq \frac{A_2}{\|x-y\|^{d-\alpha}}.$$
In particular, $r_\lambda$ is a gauge function on $\mathbb{R}^d \times \mathbb{R}^d$ and $X$ is a strongly symmetric Feller process whose reference measure is Lebesgue's measure on $\mathbb{R}^d$.

Proof We will prove the asserted inequalities for the $\lambda$-potential density $r_\lambda$. The remaining assertions follow from this and are covered in Exercise 3.4.2 below. By the scaling lemma (Lemma 1.3.1) and by elementary manipulations,
$$q_s(a) = s^{-\frac d\alpha}\, q_1\big(a\, s^{-\frac1\alpha}\big), \qquad s > 0,\ a \in \mathbb{R}^d;$$
see Exercise 1.3.2. Thus,
$$u_\lambda(a) = \int_0^\infty s^{-\frac d\alpha}\, e^{-\lambda s}\, q_1\big(a\, s^{-\frac1\alpha}\big)\, ds.$$
We will split this integral into two parts:
$$u_\lambda(a) = T_1(a) + T_2(a), \tag{1}$$
where
$$T_1(a) = \int_0^{2^{-\alpha}\|a\|^\alpha} s^{-\frac d\alpha}\, e^{-\lambda s}\, q_1\big(a\, s^{-\frac1\alpha}\big)\, ds, \qquad T_2(a) = \int_{2^{-\alpha}\|a\|^\alpha}^\infty s^{-\frac d\alpha}\, e^{-\lambda s}\, q_1\big(a\, s^{-\frac1\alpha}\big)\, ds.$$
By Proposition 3.3.1, there exists a finite constant $A_3 > 1$ such that $q_1(b) \leq A_3\|b\|^{-(d+\alpha)}$ for all $b \in \mathbb{R}^d$ with $\|b\| \geq 2$. Since $\|a\| s^{-1/\alpha} \geq 2$ whenever $s \leq 2^{-\alpha}\|a\|^\alpha$,
$$T_1(a) \leq A_3\,\|a\|^{-(d+\alpha)} \int_0^{2^{-\alpha}\|a\|^\alpha} s\, ds \leq 2^{1-2\alpha} A_3\,\|a\|^{\alpha-d}. \tag{2}$$
On the other hand, by Corollary 1.3.1, $q_1(a) \leq q_1(0) < \infty$ for all $a \in \mathbb{R}^d$. Consequently,
$$T_2(a) \leq q_1(0) \int_{2^{-\alpha}\|a\|^\alpha}^\infty s^{-\frac d\alpha}\, ds = A_4\,\|a\|^{\alpha-d},$$
where $A_4 = q_1(0) \int_{2^{-\alpha}}^\infty t^{-\frac d\alpha}\, dt$. Together with equations (1) and (2), this proves the upper bound with $A_2 = 2^{1-2\alpha}A_3 + A_4$. To obtain the lower bound, note that whenever $\|a\| \leq 1$,
$$u_\lambda(a) \geq T_1(a) \geq \exp\big(-\lambda 2^{-\alpha}\big) \int_0^{2^{-\alpha}\|a\|^\alpha} s^{-\frac d\alpha}\, q_1\big(a\, s^{-\frac1\alpha}\big)\, ds.$$
Now we can use the lower bound of Proposition 3.3.1 and argue as above to finish the proof.

Exercise 3.4.2 Complete the derivation of Lemma 3.4.1 by showing that $r_\lambda$ is a gauge function on $\mathbb{R}^d \times \mathbb{R}^d$ and that $X$ is a strongly symmetric Feller process whose reference measure is Lebesgue's measure on $\mathbb{R}^d$.

The "critical case" is when $d = \alpha$. In this section $\alpha \in\, ]0,2[$, which means that the critical case is $d = \alpha = 1$. That case is handled by the following estimate.

Lemma 3.4.2 Suppose $d = \alpha = 1$ and $\lambda > 0$ are fixed. Then there exist two finite positive constants $A_1 \leq A_2$ such that for all $x, y \in \mathbb{R}$ with $|x - y| \leq \frac12$,
$$A_1 \ln\Big(\frac1{|x-y|}\Big) \leq r_\lambda(x,y) \leq A_2 \ln\Big(\frac1{|x-y|}\Big).$$
In particular, $r_\lambda$ is a gauge function on $\mathbb{R} \times \mathbb{R}$ and $X$ is a strongly symmetric Feller process with Lebesgue's measure on $\mathbb{R}$ as its reference measure.

Exercise 3.4.3 Prove Lemma 3.4.2. (Hint: Borrow ideas from Lemmas 3.1.2 and 3.4.1.)
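The critical case can be explored numerically. When $d = \alpha = 1$ and $E[e^{i\xi X_s}] = e^{-s|\xi|}$ (the normalization of this section), the transition density is the Cauchy kernel $q_s(a) = s/(\pi(s^2 + a^2))$, and $u_\lambda(a)$ should grow like $\pi^{-1}\ln(1/|a|)$ as $a \to 0$. A sketch (ours; the quadrature grid is ad hoc):

```python
import math

def u_lambda(a, lam=1.0, lo=-25.0, hi=5.0, n=6000):
    """u_lambda(a) = Integral_0^infinity exp(-lam*s) * s / (pi*(s^2 + a^2)) ds,
    computed with the substitution s = e^u (trapezoid rule on a log grid)."""
    h = (hi - lo) / n
    def f(u):
        s = math.exp(u)
        return math.exp(-lam * s) * s * s / (math.pi * (s * s + a * a))
    total = 0.5 * (f(lo) + f(hi)) + sum(f(lo + k * h) for k in range(1, n))
    return h * total

if __name__ == "__main__":
    for a in (1e-2, 1e-3, 1e-4):
        # columns: a, u_lambda(a), pi * u_lambda(a) / ln(1/a)  (last column -> 1)
        print(a, u_lambda(a), u_lambda(a) * math.pi / math.log(1.0 / a))
```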
Exercise 3.4.4 Show that in any case, whenever $X$ is a $d$-dimensional isotropic stable Lévy process of index $\alpha \in\, ]0,2]$, $r_\lambda(x,y) > 0$ for all $x, y \in \mathbb{R}^d$.

We are ready to state the following extension of Kakutani's theorem (Theorem 3.1.1).

Theorem 3.4.1 Suppose $X = (X_t;\ t \ge 0)$ denotes a $d$-dimensional isotropic stable Lévy process of index $\alpha \in\, ]0,2]$. For any compact set $E \subset \mathbb{R}^d$ and for all $x \notin E$,
$$P_x(T_E < \infty) > 0 \iff \mathrm{Cap}_{d-\alpha}(E) > 0,$$
where $\mathrm{Cap}_\beta$ denotes the index-$\beta$ Bessel–Riesz capacity defined in Section 3.1.

Exercise 3.4.5 Carefully verify Theorem 3.4.1.
3.5 Relation to Hausdorff Dimension

Let $X = (X_t;\ t \ge 0)$ denote an $\mathbb{R}^d$-valued isotropic stable Lévy process with index $\alpha \in\, ]0,2]$. Given a compact set $E \subset \mathbb{R}^d$, let $T_E = \inf\{s > 0 : X_s \in E\}$ and note that $T_E < \infty$ if and only if $X(\mathbb{R}_+) \cap E \neq \varnothing$, where $X(T)$ is the image of $T$ under the random map $s \mapsto X_s$; that is, $X(T) = \{X_s : s \in T\}$. In particular, $X(\mathbb{R}_+)$ is the range of $X$. According to Theorem 3.4.1, for all $x \notin E$,
$$P_x\big(X(\mathbb{R}_+) \cap E \neq \varnothing\big) > 0 \iff \mathrm{Cap}_{d-\alpha}(E) > 0.$$
We can now combine the above with Frostman's theorem (Theorem 2.2.1, Appendix C) to conclude the following. For a detailed historical account of the development of this result and its kin, see Taylor (1986).

Theorem 3.5.1 For any compact set $E \subset \mathbb{R}^d$ and for all $x \notin E$,
$$\dim(E) > d - \alpha \implies P_x\big(X(\mathbb{R}_+) \cap E \neq \varnothing\big) > 0, \qquad \dim(E) < d - \alpha \implies P_x\big(X(\mathbb{R}_+) \cap E \neq \varnothing\big) = 0.$$

Remarks (1) The above states that the Hausdorff dimension of a set $E$ essentially determines whether or not $E$ is ever hit by the range of an isotropic stable Lévy process. In the critical case where $\dim(E) = d - \alpha$, knowing the Hausdorff dimension alone does not give us enough information on the positivity of the probability that the range and the set $E$ intersect; see Carleson (1983, Theorems 4 and 5, Chapter IV), for example.
(2) If the starting point $x$ is inside $E$, the estimation of hitting probabilities can become either trivial or much more complicated, depending on one's interpretation of hitting probabilities and on how complicated the structure of the set $E$ is. For instance, if $x \in E$, it is trivial that for $E \subset \mathbb{R}^d$ compact, $P_x(X(\mathbb{R}_+) \cap E \neq \varnothing) = 1$, since $X_0 = x \in E$, $P_x$-a.s. However, one can ask when $\liminf_{x \to E,\, x \notin E} P_x(X(\mathbb{R}_+) \cap E \neq \varnothing) > 0$. This is another matter entirely and has to do with the development of a theory of so-called regular points; we will not address such issues in this book. However, you may wish to consult Itô and McKean (1974, equation (9), Section 8.1, Chapter 7) under the general heading of Wiener's test for electrostatic (Bessel–)Riesz capacity to appreciate some of the many subtleties involved.
4 Supplementary Exercises

1. Consider an $\mathbb{R}^d$-valued Lévy process $X = (X_t;\ t \ge 0)$. Prove that the following are equivalent: (i) $X$ is transient; (ii) the origin is transient; and (iii) $\lim_{t\to\infty} |X_t| = +\infty$, almost surely.

2. Suppose $\mathsf{R}$ and $\mathsf{T}$ denote the resolvent and transitions of a strongly symmetric Feller process $X$ on $\mathbb{R}^d$, with reference measure $\nu$. (i) If $\phi$ is a finite measure on $\mathbb{R}^d$, show that as $t \downarrow 0$, $e^{-\lambda t} T_t R_\lambda \phi(x) \uparrow R_\lambda \phi(x)$ for each $x \in \mathbb{R}^d$. (ii) Given that there are two constants $\lambda, \gamma > 0$ and two probability measures $\mu_1$ and $\mu_2$ such that $R_\lambda \mu_1(x) = R_\gamma \mu_2(x)$ for $\nu$-almost all $x \in \mathbb{R}^d$, prove that $R_\lambda \mu_1(x) = R_\gamma \mu_2(x)$ for every $x \in \mathbb{R}^d$. Conclude that $\mu_1$ equals $\mu_2$.

3. (Hard) Let $X = (X_t;\ t \ge 0)$ denote an $\mathbb{R}^d$-valued isotropic stable Lévy process with index $\alpha \in\, ]0,2[$. Fix some $R, \varepsilon > 0$ and define the sausage $X_R^\varepsilon$ to be the random compact set
$$X_R^\varepsilon = \big\{x \in [-R,R]^d : \mathrm{dist}[\{x\};\, X([1,2])] \leq \varepsilon\big\}.$$
(i) Prove that when $d > \alpha$, there exists a finite constant $A > 0$ that depends on $d$, $\alpha$, and $R$, such that the expected value of Lebesgue's measure of $X_R^\varepsilon$ is bounded above by $A\varepsilon^{d-\alpha}$. (ii) Prove that when $d = \alpha$, there exists a finite constant $A$ that depends on $d$, $\alpha$, and $R$, such that the expected value of Lebesgue's measure of $X_R^\varepsilon$ is bounded above by $A/\ln_+(1/\varepsilon)$. (iii) Deduce from this that when $d \geq \alpha$, the $d$-dimensional Lebesgue's measure of the random set $X(\mathbb{R}_+)$ is a.s. zero. In particular, this is the case when $X$ is two-dimensional Brownian motion. This is essentially due to P. Lévy. (Hint: Use Lemma 1.4.3.)

4. Recall from Chapter 8 that when $X$ is a Feller process on $\mathbb{R}^d$, as $\lambda \to 0^+$, $\lambda R_\lambda f$ converges uniformly to $f$ for all continuous functions $f$ that vanish at infinity. Suppose further that $X$ is strongly symmetric. Prove that for all finite measures $\mu$ on $\mathbb{R}^d$, the measures $\lambda R_\lambda \mu(x)\,\nu(dx)$ converge weakly to $\mu$ as $\lambda \to 0^+$.

5. Let $E$ denote a compact set in $\mathbb{R}^d$ and let $E^\eta$ denote its $\eta$-enlargement. That is, $x \in E^\eta$ if and only if the Euclidean distance between $x$ and $E$ is (strictly) less than $\eta$. Let $T^\eta = \inf\{s \geq 0 : X_s \in E^\eta\}$ denote the entrance time to $E^\eta$, and prove that for any $x \in \mathbb{R}^d$, as $\eta \to 0^+$, $T^\eta \uparrow T$, $P_x$-a.s. (Hint: Use the right continuity of $t \mapsto X_t$.)
6. (Hard) Suppose $E \subset \mathbb{R}^d$ is compact, $F \subset \mathbb{R}^d$ is open, and $E \subset F$. If $X$ is a strongly symmetric Feller process, define the operator
$$T_t^F f(x) = E_x\big[f(X_t)\,\mathbf{1}_{(S_F > t)}\big], \qquad \text{where } S_F = \inf\{s \geq 0 : X_s \notin F\}.$$
(i) Prove that $\mathsf{T}^F = (T_t^F;\ t \geq 0)$ is a semigroup. (ii) Show that $\mathsf{T}^F$ defines a Markov semigroup on the one-point compactification $F_\Delta$ of $F$. Informally, $\mathsf{T}^F$ corresponds to the Markov process $X$, killed upon leaving the open set $F$. In this regard, see also Section 1.3, Chapter 8. (iii) Show that for each $t > 0$, there exists a nonnegative function $p_t^F : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}_+$, where:
(a) for each $x \in \mathbb{R}^d$, $y \mapsto p_t^F(x,y)$ is measurable;
(b) unless $x$ and $y$ are both in $F$, $p_t^F(x,y) = 0$;
(c) as operators, $T_t^F \varphi(x) = \int p_t^F(x,y)\,\varphi(y)\,\nu(dy)$.
(v) Show that for all $\varphi, \psi \in L^\infty(\mathbb{R}^d)$,
$$\int \varphi(x)\, T_t^F \psi(x)\,\nu(dx) = \int \psi(x)\, T_t^F \varphi(x)\,\nu(dx).$$
Use this to prove that for each $t > 0$, we can choose a version of the density $p_t^F$ with the following further property: there exists a $\nu$-null set $N_t$ such that for all $x \notin N_t$ and for all $y \in \mathbb{R}^d$, $p_t^F(x,y) = p_t^F(y,x)$. (vi) One can extend the definition of $\mathsf{T}^F$ to act on a measure $\phi$ by $T_t^F \phi(x) = \int p_t^F(x,y)\,\phi(dy)$. Show that for all $\lambda, t > 0$ and all $x \in \mathbb{R}^d$,
$$R_\lambda T_t^F \phi(x) \leq e^{\lambda t}\, R_\lambda T_t \phi(x),$$
where $\mathsf{T}$ and $\mathsf{R}$ denote the transitions and the resolvent of $X$, respectively.

7. Let $X = (X_t;\ t \ge 0)$ denote an isotropic stable Lévy process on $\mathbb{R}^d$ with index $\alpha \in\, ]0,d[$. Prove that for any given $a \in \mathbb{R}^d$, $\{t \geq 0 : X_t = a\}$ is a.s. unbounded.

8. Let $X = (X_t;\ t \ge 0)$ denote a Lévy process on $\mathbb{R}^d$ and let $\mathcal{R}$ denote the collection of all points in $\mathbb{R}^d$ that are recurrent for $X$. (i) Prove that $\mathcal{R}$ is a free abelian subgroup of $\mathbb{R}^d$. In particular, deduce from this that $\mathcal{R} \neq \varnothing$ if and only if the origin is recurrent. (ii) A point $x \in \mathbb{R}^d$ is said to be (nearly) possible if there exists $t \geq 0$ such that for all $\varepsilon > 0$, $P(|X_t - x| \leq \varepsilon) > 0$. Prove that when $\mathcal{R} \neq \varnothing$, all possible points are recurrent. (iii) Prove that when $\mathcal{R} \neq \varnothing$, one can identify $X$ with a recurrent Lévy process on a free abelian group $G$.
9. Refine Proposition 3.3.1 by showing that for any $\alpha \in\, ]0,2[$,
$$\lim_{\|x\|\to\infty} \|x\|^{d+\alpha}\, q_1(x) = \alpha\, 2^{\alpha-1}\, \pi^{-\frac d2 - 1}\, \sin\Big(\frac{\alpha\pi}2\Big)\, \Gamma\Big(\frac{d+\alpha}2\Big)\, \Gamma\Big(\frac\alpha2\Big),$$
where $\Gamma$ denotes the gamma function. When $d = 1$, this is due to Pólya (1923); the general case is due to Blumenthal and Getoor (1960b); see also (Bendikov 1994; Bergström 1952). (Hint: Follow the described derivation of Proposition 3.3.1, but pay closer attention to estimating the errors involved. You may need the identity $\Gamma(1+\beta)\Gamma(1-\beta) = \pi\beta/\sin(\pi\beta)$.)

10. Suppose $X$ has transition densities $p_t(x,y)$. Prove that for any finite measure $\mu$, $t \mapsto e^{-\lambda t} R_\lambda \mu(X_t)$ is a supermartingale. Use this to derive another proof of Lemma 2.4.2.

11. Consider a $d$-dimensional Brownian motion $B = (B_t;\ t \ge 0)$ and define the process $O = (O_t;\ t \ge 0)$ by $O_t = e^{-t/2} B_{e^t}$, $t \geq 0$. This is the Ornstein–Uhlenbeck process, also defined earlier in Supplementary Exercise 3, Chapter 9. (i) Prove that $O$ is a strongly symmetric Feller process with reference measure $\nu(dx) = e^{-\frac12\|x\|^2}\,dx$. (ii) Show that for all compact sets $E \subset \mathbb{R}^d$ and for all $x \notin E$, $P_x\{O_t \in E \text{ for some } t \in [0, e(\lambda)]\} > 0$ if and only if $E$ has positive $(d-2)$-dimensional Bessel–Riesz capacity.

12. (Hard) Suppose $X$ is a strongly symmetric Feller process on $\mathbb{R}^d$, with reference measure $\nu$, whose $\lambda$-potential density is $r_\lambda$ for every $\lambda > 0$. Prove that $r_\lambda$ satisfies the maximum principle of Appendix D. In other words, show that for all compactly supported probability measures $\mu$ with support $E$,
$$\sup_{x\in\mathbb{R}^d} R_\lambda \mu(x) = \sup_{x\in E} R_\lambda \mu(x).$$
(Hint: First consider the supremum of $R_\lambda f$, where $f$ is a function; for this you may need the strong Markov property. Then apply balayage, together with lower semicontinuity.)
5 Notes on Chapter 10 Section 1 The fact that recurrence/transience and hitting probabilities are quite intertwined has been in the folklore of the subject for a very long time. It seems to have been made explicit in Kakutani (1944a). Section 2 Our proof of Lemma 2.2.2 involves time-reversal. We come back to this method time and again, since time-reversal is intrinsic to the structure of one-parameter Markov processes. See the Notes in Chapter 8 for some references on time-reversal.
To the generalist, the material on potential theory may seem a little specialized. However, this is mainly due to the nature of the exposition rather than the power of the presented methods. Here are two broad remarks: To go beyond the stated symmetry assumptions, one needs a little bit more on time-reversal and initial measures; to extend the theory to processes on separable, locally compact metric spaces, one needs the general form of Prohorov's theorem. See Section 2.5 of Chapter 6 for the latter and see Chapter 8 for the former. Other variants of Theorem 2.3.1 are also possible; see (Fitzsimmons and Salisbury 1989; Salisbury 1996) for the starting point of many of the works in the subject, as well as (Bauer 1994; Benjamini et al. 1995; Mazziotto 1988; Ren 1990), and the fine series by Hirsch and Song (1994, 1995, 1995b, 1995c, 1995d). There is a rich theory of one-parameter Markov processes that requires only a notion of duality, and goes beyond assumptions of symmetry; see Getoor (1990) and its bibliography. The passing references made to time-reversal have to do with the fact that, in the construction of $p_t^n$ in Lemma 2.2.1, we tacitly reversed time, in that we considered both vectors $(X_0, X_{2^{-n}t}, \ldots, X_{(2^n-1)2^{-n}t}, X_t)$ and $(X_t, X_{(2^n-1)2^{-n}t}, \ldots, X_{2^{-n}t}, X_0)$. Time-reversal was systematically explored in Nagasawa (1964), and leads to a rich theory that would take too long to develop here; see also Millar (1978). Section 3 Potential theory of Lévy processes is well delineated in (Bertoin 1996; Sato 1999) and their combined bibliography. See Janke (1985), Kanda (1982, 1983), Kanda and Uehara (1981), Hawkes (1979), Orey (1967), Rao (1987), and their references for some of the earlier work. Fukushima et al. (1994) contains a detailed treatment of the potential theory of symmetric Markov processes in the context of the Brelot–Beurling–Cartan–Deny theory.
It is not possible to extend the potential density estimates of Lemmas 3.1.1, 3.1.2, 3.4.1, and 3.4.2 to completely general stable processes; cf. Pruitt and Taylor (1969) for a precise statement. Theorem 3.5.1 is a starting point for connections between stochastic processes and Hausdorff dimension. We will elaborate on this at greater length in Chapter 11. In the meantime, have a look at the survey article Taylor (1986) for an extensive bibliography.
11 Multiparameter Markov Processes
We can informally interpret Chapter 8's definition of a Markov process $X = (X_t;\ t \ge 0)$ as a (one-parameter) process whose "future" values $X_{t+s}$ depend on the past only through the current value $X_t$. While this is intuitively clear (thanks to the linear ordering of the "time axis"), it is far less clear what a multiparameter Markov process should be. In this chapter we introduce and study a class of random fields called multiparameter Markov processes. The definitions, given early on, are motivated by the potential theory that is developed later in this chapter. We will also see how this multiparameter theory can be used to study intersections of ordinary one-parameter processes.
1 Definitions Throughout, we let (S, d) denote a separable, locally compact metric space. As in Chapter 8, we may need to compactify it via a one-point compactification and topologize it with its one-point compactification topology. Also as in Chapter 8, we always denote the latter one-point compactification by S∆ . In agreement with Chapter 8, all measurable functions f : S → R are extended to functions from S∆ into R via the assignment f (∆) = 0. This section provides us with a general definition of an N -parameter Markov process that takes its values in the space S∆ .
1.1 Preliminaries

An $N$-parameter, $S_\Delta$-valued stochastic process $X = (X_t;\ t \in \mathbb{R}_+^N)$ is said to be a multiparameter Markov process if there exist an $N$-parameter filtration $\mathcal{F} = (\mathcal{F}_t;\ t \in \mathbb{R}_+^N)$ and a family of operators $\mathsf{T} = (T_t;\ t \in \mathbb{R}_+^N)$ such that for all $x \in S$ there exists a probability measure $P_x$ for which the following conditions are met:¹
(i) $X$ is adapted to $\mathcal{F}$;
(ii) $t \mapsto X_t$ is right-continuous $P_x$-a.s. for all $x \in S$;
(iii) for all $t \in \mathbb{R}_+^N$, $\mathcal{F}_t$ is $P_x$-complete for all $x \in S$; moreover, $\mathcal{F}$ is a commuting filtration with respect to all measures $P_x$, $x \in S$;
(iv) for all $s, t \in \mathbb{R}_+^N$ and all $f \in C_0(S)$, the following holds $P_x$-a.s. for all $x \in S$: $E_x[f(X_{t+s}) \mid \mathcal{F}_s] = T_t f(X_s)$; and
(v) for all $x \in S$, $P_x(X_0 = x) = 1$.
Remarks (1) Part of the assertion of (ii) is that the event $\{t \mapsto X_t \text{ is right-continuous}\}$ is measurable.
(2) From now on, we assume the existence of such a process. Later on, we shall see, via diverse examples, that such processes often exist. In a one-parameter setting we have seen that such an assumption can be proved to hold under Feller's condition.
(3) Recall that a function $f: \mathbb{R}_+^N \to S$ is right-continuous if for any $s \downarrow t$ (with respect to the partial order on $\mathbb{R}_+^N$), $f(s) \to f(t)$.
(4) Suppose $\nu$ is a measure on the Borel subsets of $S$. In complete analogy with the one-parameter theory, we will write $P_\nu$ for the measure $\int P_x(\cdots)\,\nu(dx)$. Likewise, $E_\nu$ denotes the operator $\int E_x(\cdots)\,\nu(dx)$.² Intuitively speaking, $P_\nu$ denotes the underlying probability measure, given that $X_0$ has distribution $\nu$. However, this makes honest sense only when $\nu$ is itself a probability measure. Nonetheless, both are well defined, irrespective of the total mass of $\nu$; we will call $\nu$ the initial measure, and call $P_\nu$ and $E_\nu$ the distribution of the process $X$ and its expectation operator, respectively, with initial measure $\nu$.
¹ There are other, equally interesting, notions of the Markov property in several dimensions; see Rozanov (1982).
² Of course, this discussion makes sense only if $x \mapsto P_x(\cdots)$ is measurable, a fact that will hold automatically for the multiparameter Markov processes of the remainder of this chapter.
The $T_t$'s are called the transition operators of the process $X$. As in Chapter 8, we may identify the transition operator $T_t$ with the transition function $T_t(x,A) = T_t \mathbf{1}_A(x)$; cf. equation (1), Section 3.1 of Chapter 8. In particular, we are justified in also referring to the $T_t$'s as the transition functions of $X$. It is extremely important to stress that the underlying filtration must be commuting for the forthcoming theory to work. Finally, we say that $X$ is an $N$-parameter, $S_\Delta$-valued Feller process (or simply Feller) if:
(i) for all $t \in \mathbb{R}_+^N$, $T_t: C_0(S) \to C_0(S)$; that is, for all $f \in C_0(S)$, $T_t f \in C_0(S)$;
(ii) for each $f \in C_0(S)$,
$$\lim_{t\to 0}\, \|T_t f - f\|_\infty = 0.$$
Given an $N$-parameter Markov process with transition functions $\mathsf{T}$ and initial measure $\nu$, we can compute the "one-dimensional marginals" of the process $X$, i.e., the $P_\nu$-distribution of $X_t$ for any $t \in \mathbb{R}_+^N$, as follows:
$$P_\nu(X_t \in A) = \int_S T_t(x,A)\,\nu(dx), \qquad \text{for all measurable } A \subset S,\ t \in \mathbb{R}_+^N.$$
This follows from the fact that $P_x(X_t \in A) = T_t(x,A)$. One can also compute "two-dimensional" marginals, although the expression is slightly more cumbersome.

Lemma 1.1.1 For all $\varphi_1, \varphi_2 \in L^\infty(S)$, every $x \in S$, and all $s, t \in \mathbb{R}_+^N$,
$$E_x\big[\varphi_1(X_s)\,\varphi_2(X_t)\big] = T_{s\wedge t}\big[T_{s-(s\wedge t)}\varphi_1 \cdot T_{t-(s\wedge t)}\varphi_2\big](x),$$
where $s \wedge t$ denotes the coordinatewise minimum of $s$ and $t$.

Proof Clearly, $\varphi_1(X_s)$ (respectively $\varphi_2(X_t)$) is measurable with respect to $\mathcal{F}_s$ (respectively $\mathcal{F}_t$). Thus, commutation allows us to write
$$E_x\big[\varphi_1(X_s)\,\varphi_2(X_t)\big] = E_x\Big[E_x\big[\varphi_1(X_s)\ \big|\ \mathcal{F}_{s\wedge t}\big] \cdot E_x\big[\varphi_2(X_t)\ \big|\ \mathcal{F}_{s\wedge t}\big]\Big].$$
By the multiparameter Markov property,
$$E_x\big[\varphi_1(X_s)\,\varphi_2(X_t)\big] = E_x\Big[T_{s-(s\wedge t)}\varphi_1(X_{s\wedge t}) \cdot T_{t-(s\wedge t)}\varphi_2(X_{s\wedge t})\Big],$$
which has the desired effect.
Exercise 1.1.1 Suppose $\gamma: [0,1] \to \mathbb{R}_+^N$ is nondecreasing in the sense that whenever $s \leq t$, then $\gamma(s) \preceq \gamma(t)$ (coordinatewise). Given any such $\gamma$, show that the finite-dimensional distributions of the one-parameter process $t \mapsto X_{\gamma(t)}$ are entirely determined by the transition functions $\mathsf{T}$. At the time of writing this book, it is not known whether $\mathsf{T}$ determines the finite-dimensional distributions of the entire process $X$.
(Hint: You can try showing that if $\widetilde X$ is another $N$-parameter Markov process with transition functions $\mathsf{T}$, then $t \mapsto \widetilde X_{\gamma(t)}$ has the same finite-dimensional distributions as $t \mapsto X_{\gamma(t)}$.)

It is time to discuss concrete examples of multiparameter Markov processes.

Example 1 Consider two independent $d$-dimensional Brownian motions $B^1$ and $B^2$, and define the 2-parameter, $d$-dimensional additive Brownian motion $X = (X_t;\ t \in \mathbb{R}_+^2)$ by
$$X_t = B^1_{t^{(1)}} + B^2_{t^{(2)}}, \qquad t \in \mathbb{R}_+^2.$$
For all $t \in \mathbb{R}_+^2$, all $x \in \mathbb{R}^d$, and all measurable functions $f: \mathbb{R}^d \to \mathbb{R}_+$, define the bounded linear operator
$$T_t f(x) = \int q_t(y - x)\, f(y)\, dy,$$
where, for $t \in \mathbb{R}_+^2$ with $t^{(1)} + t^{(2)} > 0$ and $x \in \mathbb{R}^d$,
$$q_t(a) = \big(2\pi\{t^{(1)} + t^{(2)}\}\big)^{-\frac d2} \exp\Big(-\frac{\|a\|^2}{2\{t^{(1)} + t^{(2)}\}}\Big), \qquad a \in \mathbb{R}^d.$$
As usual, $\|a\|$ denotes the $\ell^2$-Euclidean norm of $a \in \mathbb{R}^d$. By the elementary properties of Gaussian processes (Exercise 1.1.1, Chapter 5), $T_t f(x) = E[f(x + X_t)]$. We shall see later on, in Section 3 (or you can try this at this point), that $X$ is a 2-parameter Feller process on $\mathbb{R}^d$ whose transition operators are precisely $\mathsf{T} = (T_t;\ t \in \mathbb{R}_+^2)$.

Example 2 Continuing with the basic setup of Example 1, define the 2-parameter process $Y = (Y_t;\ t \in \mathbb{R}_+^2)$ by
$$Y_t = B^1_{t^{(1)}} \otimes B^2_{t^{(2)}}, \qquad t \in \mathbb{R}_+^2.$$
Note that $B^1$ and $B^2$ are $\mathbb{R}^d$-valued, while $Y$ is $\mathbb{R}^d \times \mathbb{R}^d = \mathbb{R}^{2d}$-valued. In particular, if $d = 1$, $Y_t = (B^1_{t^{(1)}}, B^2_{t^{(2)}})$. Calculations similar to those in Example 1 can be made to show that $Y$ is a 2-parameter Feller process whose transition operators are
$$T_t f(x) = (2\pi)^{-d}\,\{t^{(1)} t^{(2)}\}^{-\frac d2} \int_{\mathbb{R}^d\times\mathbb{R}^d} f\big(x + \{u \otimes v\}\big)\, \exp\Big(-\frac{\|u\|^2}{2t^{(1)}} - \frac{\|v\|^2}{2t^{(2)}}\Big)\, du\, dv,$$
where $t \succ 0$ is in $\mathbb{R}_+^2$, $x \in \mathbb{R}^{2d}$, and $f: \mathbb{R}^{2d} \to \mathbb{R}_+$ is measurable. (Why $t \succ 0$ and not $t \succeq 0$?) It is a good idea either to verify directly that $Y$ is a 2-parameter Feller process or to peek ahead to Section 3 and peruse the general discussion there. The two-parameter process $Y$ is called bi-Brownian motion and is related to multiply harmonic functions; cf. Cairoli and Walsh (1977a, 1977b, 1977c), as well as Walsh (1986b), together with their combined references.

Exercise 1.1.2 The discussion of Example 2 describes the operator $T_t$ when $t \succ 0$. What does this particular operator look like when $t \in \mathbb{R}_+^2$ but $t \not\succ 0$?

Exercise 1.1.3 Prove that the standard $N$-parameter Brownian sheet is not an $N$-parameter Markov process.
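Example 1 is easy to simulate. The sketch below (ours; $d = 1$, helper names are ad hoc) builds a sample of $(X_s, X_t)$ from shared and independent Gaussian increments of the two underlying Brownian motions, and checks by Monte Carlo that $\mathrm{Cov}(X_s, X_t) = s^{(1)}\wedge t^{(1)} + s^{(2)}\wedge t^{(2)}$ — the covariance implicit in the transition functions $T_t$.

```python
import math, random

def sample_pair(s, t, rng):
    """One sample of (X_s, X_t), d = 1, X_t = B^1_{t(1)} + B^2_{t(2)}:
    per coordinate, split the Brownian motion at min(s_i, t_i) into a
    shared increment plus an independent extra increment."""
    xs = xt = 0.0
    for si, ti in zip(s, t):
        shared = rng.gauss(0.0, math.sqrt(min(si, ti))) if min(si, ti) > 0 else 0.0
        extra = rng.gauss(0.0, math.sqrt(abs(si - ti))) if si != ti else 0.0
        xs += shared + (extra if si > ti else 0.0)
        xt += shared + (extra if ti > si else 0.0)
    return xs, xt

def mc_cov(s, t, n=200_000, seed=3):
    rng = random.Random(seed)
    sx = sy = sxy = 0.0
    for _ in range(n):
        x, y = sample_pair(s, t, rng)
        sx += x; sy += y; sxy += x * y
    return sxy / n - (sx / n) * (sy / n)

if __name__ == "__main__":
    s, t = (1.0, 3.0), (2.0, 0.5)
    print("Monte Carlo:", mc_cov(s, t))                      # should be near 1.5
    print("theory:    ", min(s[0], t[0]) + min(s[1], t[1]))  # = 1.5
```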
1.2 Commutation and Semigroups

Consider an $N$-parameter, $S_\Delta$-valued Markov process $X = (X_t;\ t \in \mathbb{R}_+^N)$ and let $\mathsf{T} = (T_t;\ t \in \mathbb{R}_+^N)$ denote its corresponding collection of transition operators. In this subsection we study some of the analytical properties of the family $\mathsf{T}$.

To start, let us note that for all $t \in \mathbb{R}_+^N$, $f \in L^\infty(S)$, and $x \in S$,
$$T_t f(x) = E_x[T_t f(X_0)] = E_x\big[E_x\{f(X_t) \mid \mathcal{F}_0\}\big] = E_x[f(X_t)].$$
Moreover, whenever $s$ is also in $\mathbb{R}_+^N$,
$$T_t T_s f(x) = E_x[T_s f(X_t)] = E_x\big[E_x[f(X_{t+s}) \mid \mathcal{F}_t]\big] = E_x[f(X_{t+s})] = T_{t+s} f(x).$$
That is, $\mathsf{T}$ is a semigroup of operators on $S$. In fact, we have the following result.

Proposition 1.2.1 The transition functions of an $N$-parameter, $S_\Delta$-valued Markov process form an $N$-parameter semigroup of bounded linear operators on $S$.

Exercise 1.2.1 Complete the proof of Proposition 1.2.1.
The operators $T_t$ are interesting in and of themselves, since they describe the local dynamics of the process $X$. Next, we will show a representation of $\mathsf{T}$ in terms of $N$ one-parameter families of operators. For any integer $1 \leq j \leq N$, we temporarily define $\sigma_j: \mathbb{R}_+ \to \mathbb{R}_+^N$ as follows:
$$\big(\sigma_j(r)\big)^{(\ell)} = \begin{cases} r, & \text{if } \ell = j,\\ 0, & \text{otherwise,}\end{cases} \qquad 1 \leq \ell \leq N.$$
Stated in plain terms, for all $r \geq 0$,
$$\sigma_1(r) = (r, 0, 0, \ldots, 0), \quad \sigma_2(r) = (0, r, 0, \ldots, 0), \quad \ldots, \quad \sigma_N(r) = (0, 0, \ldots, 0, r).$$
We can now define $N$ one-parameter bounded linear operators $T^1, \ldots, T^N$, all on $S$, by the following prescription: for every $j \in \{1, \ldots, N\}$,
$$T^j_r f(x) = T_{\sigma_j(r)} f(x), \qquad f \in L^\infty(S),\ x \in S,\ r \geq 0.$$
The operators $T^1, \ldots, T^N$ are called the marginal semigroups (equivalently, marginal transition functions or marginal transition operators) of $X$. The reason for this terminology is given by the following result.

Proposition 1.2.2 If $\mathsf{T}$ denotes the transition functions of an $N$-parameter Markov process, then for all $t \in \mathbb{R}_+^N$,
$$T_t = T^1_{t^{(1)}} \cdots T^N_{t^{(N)}},$$
where the $T^i$'s denote the marginal transition functions of $\mathsf{T}$. Furthermore, each $T^i$ is itself a one-parameter Markov semigroup. Finally, the $T^i$'s commute in the following sense: for all $i, j \in \{1, \ldots, N\}$ and all $s, t \geq 0$, $T^i_s T^j_t = T^j_t T^i_s$.

Exercise 1.2.2 Verify Proposition 1.2.2.
A Notational Remark If $A$ and $B$ denote any two bounded linear operators on $S$, then $AB$ (itself a bounded linear operator) denotes the composition of the two operators $A$ and $B$, in this order. Motivated by this, we sometimes write $A_1 \circ A_2 \circ \cdots \circ A_m = A_1 \cdots A_m$. We will also sometimes write this as $\prod_{j=1}^m A_j$, all the time noting that $\prod_{j=1}^m A_j$ need not equal $\prod_{j=m}^1 A_j$, since the $A_j$'s need not commute. However, this is not an issue for the transition operators of a multiparameter Markov process. Indeed, as the above proposition shows, for each $t \in \mathbb{R}^N_+$, $T_t$ can be written compactly as
\[ T_t = \prod_{j=1}^N T^{\pi(j)}_{t^{(\pi(j))}}, \tag{1} \]
where $\{\pi(1), \ldots, \pi(N)\}$ denotes an arbitrary permutation of $\{1, \ldots, N\}$.

Let us close this section with a property of Feller processes.
Proposition 1.2.3 Suppose $X$ is an $N$-parameter Markov process with transition functions $T$ and marginal semigroups $T^1, \ldots, T^N$. Then $X$ is Feller if and only if $T^j$ is a one-parameter Feller semigroup for every $j \in \{1, \ldots, N\}$.

Exercise 1.2.3 Prove Proposition 1.2.3.
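The commutation and factorization properties above can be checked numerically in the simplest nontrivial case. The sketch below is ours, not the book's: it takes $N = 2$ with marginal semigroups given by one-dimensional Brownian motion, discretizes the Gaussian transition kernel on a grid, and verifies both that the two marginal heat semigroups commute and that their composition agrees with direct smoothing by a Gaussian of variance $s + t$ (the two-parameter transition operator of additive Brownian motion). The grid bounds and test function are arbitrary choices.

```python
import numpy as np

# Grid discretization of the one-parameter Brownian semigroup
# (T_r f)(x) = integral of f(y) p_r(x, y) dy, with Gaussian kernel p_r.
x = np.linspace(-10.0, 10.0, 401)
dx = x[1] - x[0]

def heat_operator(r):
    # Matrix acting on grid values of f; row i approximates the integral
    # against the Gaussian density centered at x[i] with variance r.
    d = x[:, None] - x[None, :]
    return np.exp(-d**2 / (2.0 * r)) / np.sqrt(2.0 * np.pi * r) * dx

f = np.exp(-x**2)                    # a test function vanishing at infinity
s, t = 0.3, 0.7
T1, T2 = heat_operator(s), heat_operator(t)

lhs = T1 @ (T2 @ f)                  # T^1_s T^2_t f
rhs = T2 @ (T1 @ f)                  # T^2_t T^1_s f: equal, by commutation
direct = heat_operator(s + t) @ f    # T_{(s,t)} f: Gaussian of variance s + t

commute_err = np.max(np.abs(lhs - rhs))
factor_err = np.max(np.abs(lhs - direct))
```

Both discrepancies are tiny (discretization and truncation error only), illustrating Proposition 1.2.2 for this concrete two-parameter process.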
1.3 Resolvents

Corresponding to the $X$ and $T$ of the previous section, we now define the resolvent as the $N$-parameter family $R = (R_\lambda;\ \lambda \in \mathbb{R}^N_+$ such that $\lambda \succ 0)$, where
\[ R_\lambda = \int_{\mathbb{R}^N_+} e^{-\lambda \cdot s}\, T_s\, ds, \qquad \lambda \succ 0 \text{ in } \mathbb{R}^N_+, \]
as bounded linear operators.³ This is shorthand for the following: for all measurable functions $f : S \to \mathbb{R}_+$,
\[ R_\lambda f(x) = \int_{\mathbb{R}^N_+} e^{-\lambda \cdot s}\, T_s f(x)\, ds, \qquad x \in S,\ \lambda \succ 0 \text{ in } \mathbb{R}^N_+. \]
It is often more convenient to use the former operator notation, as we shall see next. Note that as linear operators, for all $\lambda, s \succ 0$ (both in $\mathbb{R}^N_+$),
\[ e^{-\lambda \cdot s}\, T_s = \prod_{j=1}^N e^{-\lambda^{(j)} s^{(j)}}\, T^j_{s^{(j)}}. \]
Thus, viewed once again as bounded linear operators,
\[ \int_{\mathbb{R}^N_+} e^{-\lambda \cdot s}\, T_s\, ds = \prod_{j=1}^N \int_0^\infty e^{-\lambda^{(j)} r}\, T^j_r\, dr. \]
But the $j$th factor on the right-hand side is $R^j_{\lambda^{(j)}}$, where $R^j = (R^j_\gamma;\ \gamma > 0)$ denotes the resolvent of the Markov semigroup $T^j$. That is, we have proved the following:

Proposition 1.3.1 Consider an $N$-parameter Markov process $X$ whose transition functions are given by $T$. If $T^1, \ldots, T^N$ and $R^1, \ldots, R^N$ denote the associated marginal semigroups and their respective resolvents, then
\[ R_\lambda = \prod_{j=1}^N R^j_{\lambda^{(j)}}, \qquad \lambda \succ 0 \text{ (in } \mathbb{R}^N_+\text{)}. \]

³ As in the earlier chapters, $\lambda \cdot s = \sum_{j=1}^N \lambda^{(j)} s^{(j)}$ denotes the Euclidean inner product between $s$ and $\lambda$, both of which are in $\mathbb{R}^N_+$. Also, note that in order for $R_\lambda$ to be a bounded linear operator, in general, we need to assume that $\lambda \succ 0$ (in $\mathbb{R}^N_+$), since there is no a priori reason for $x \mapsto \int_{\mathbb{R}^N_+} T_s f(x)\, ds$ to be bounded when $f$ is. See Supplementary Exercise 13.
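Proposition 1.3.1 can be sanity-checked on a finite state space, where every operator is a matrix. The sketch below is our construction, not the book's: it builds a 2-parameter product chain from two independent copies of a symmetric 2-state chain (generators acting on separate coordinates via Kronecker products), computes each marginal resolvent both by quadrature of $\int_0^\infty e^{-\gamma r} e^{rQ}\,dr$ and in closed form as $(\gamma I - Q)^{-1}$, and confirms that the $2$-parameter resolvent factors into commuting marginal resolvents.

```python
import numpy as np

# Symmetric generator of a 2-state chain; two independent copies act on
# the first and second coordinates of the 4-point product space.
Q = np.array([[-1.0, 1.0], [1.0, -1.0]])
I2 = np.eye(2)
Q1 = np.kron(Q, I2)          # generator of the first coordinate process
Q2 = np.kron(I2, Q)          # generator of the second coordinate process

def expm_sym(A, r):
    # e^{rA} for symmetric A, via eigendecomposition
    w, V = np.linalg.eigh(A)
    return (V * np.exp(r * w)) @ V.T

def resolvent_quad(A, gamma, T=40.0, n=4000):
    # midpoint Riemann sum for R_gamma = integral_0^infty e^{-gamma r} e^{rA} dr
    dr = T / n
    r = (np.arange(n) + 0.5) * dr
    return sum(np.exp(-gamma * ri) * expm_sym(A, ri) for ri in r) * dr

lam1, lam2 = 0.7, 1.1
R1 = resolvent_quad(Q1, lam1)                      # marginal resolvent R^1
R2 = resolvent_quad(Q2, lam2)                      # marginal resolvent R^2
R1_exact = np.linalg.inv(lam1 * np.eye(4) - Q1)    # (lam1 I - Q1)^{-1}
R2_exact = np.linalg.inv(lam2 * np.eye(4) - Q2)    # (lam2 I - Q2)^{-1}
R_product = R1 @ R2                                # R_lambda = R^1 R^2 = R^2 R^1
```

The Kronecker structure makes $R^1$ and $R^2$ commute exactly, mirroring the commutation of the marginal semigroups.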
When $X$ is a Feller process, we have the following $N$-parameter analogue of Lemma 2.4.1 of Chapter 8.

Lemma 1.3.1 Suppose $X$ is an $N$-parameter Feller process with transition functions $T$ and resolvent $R$, respectively. Then, for all $f \in C_0(S)$,
\[ \lim_{t \to 0}\ \sup_{s \in \mathbb{R}^N_+} \bigl\| T_{t+s} f - T_s f \bigr\|_\infty = 0, \qquad \lim_{\lambda \to \infty} \Bigl\| \prod_{j=1}^N \lambda^{(j)} \cdot R_\lambda f - f \Bigr\|_\infty = 0, \]
where $\lambda \to \infty$ means that $\min_{1 \le j \le N} \lambda^{(j)} \to \infty$.

Exercise 1.3.1 Prove Lemma 1.3.1.

Exercise 1.3.2 Suppose $f, g : S \to \mathbb{R}$ are bounded measurable functions such that $R_\gamma f = R_\gamma g$ for some $\gamma \succ 0$ in $\mathbb{R}^N_+$. Prove that $R_\lambda f = R_\lambda g$ for all $\lambda \succ 0$ in $\mathbb{R}^N_+$.
1.4 Strongly Symmetric Feller Processes

We now focus our attention on a class of $N$-parameter Markov processes on the Euclidean space $\mathbb{R}^d$.

Let $X = (X_t;\ t \in \mathbb{R}^N_+)$ denote an $N$-parameter, $\mathbb{R}^d$-valued Markov process with transition operators $T = (T_t;\ t \in \mathbb{R}^N_+)$. A measure $\nu$ on the Borel subsets of $S$ is a reference measure for $X$ if:

1. there exists a measurable function $p$ from $]0, \infty[^N \times \mathbb{R}^d \times \mathbb{R}^d$ into $\mathbb{R}_+$ such that for all measurable $f : S \to \mathbb{R}_+$, all $t \succ 0$ in $\mathbb{R}^N_+$, and all $x \in \mathbb{R}^d$,
\[ T_t f(x) = \int f(y)\, p_t(x, y)\, \nu(dy); \quad\text{and} \]
2. for all open sets $G \subset \mathbb{R}^d$, $\nu(G) > 0$.

The second condition is assumed for simplicity, and does not deeply affect the potential theory to be developed; cf. Exercise 3.4.1 below. In particular, by choosing $f = 1\!\!1_A$, the above implies that we can write probabilities such as $\mathrm{P}_x(X_t \in A)$ in terms of the density function $p_t$. In general, since $f \ge 0$, the above integrals are always well defined, but they may be infinite; part of this definition is that either they are both finite, or they are both infinite. The functions $p_t$ are the transition densities of $X$ with respect to $\nu$, while the corresponding $\lambda$-resolvent density $r_\lambda$ is defined by the following formula:
\[ r_\lambda(x, y) = \int_{\mathbb{R}^N_+} e^{-\lambda \cdot s}\, p_s(x, y)\, ds, \]
where $\lambda \succ 0$ is in $\mathbb{R}^N_+$ and $x, y \in \mathbb{R}^d$. As in Chapter 10, we can extend the definition of the operators $T_t$ and $R_\lambda$ as follows: for all $\sigma$-finite measures $\mu$ on the Borel subsets of $\mathbb{R}^d$,
\[ T_t \mu(x) = \int p_t(x, y)\, \mu(dy) \quad\text{and}\quad R_\lambda \mu(x) = \int r_\lambda(x, y)\, \mu(dy), \]
where $x \in \mathbb{R}^d$ and $\lambda, t \succ 0$ are both in $\mathbb{R}^N_+$.
Exercise 1.4.1 Prove that the above definitions are consistent with the earlier ones. That is, suppose $\mu(dx) = f(x)\, \nu(dx)$ is a $\sigma$-finite measure on $\mathbb{R}^d$. Prove that for all $t, \lambda \succ 0$, both in $\mathbb{R}^N_+$, $R_\lambda \mu = R_\lambda f$ and $T_t \mu = T_t f$. Moreover, check that $r_\lambda$ is symmetric if $p_t$ is, i.e., if $p_t(x, y) = p_t(y, x)$.

We can consistently define $\mathrm{P}_\nu$, even when $\nu$ is not a probability measure, by
\[ \mathrm{P}_\nu(\bullet) = \int \mathrm{P}_x(\bullet)\, \nu(dx), \]
and, in analogy, $\mathrm{E}_\nu(\cdots) = \int \mathrm{E}_x(\cdots)\, \nu(dx)$. Recall that when $\nu$ is a probability measure, we think of $\mathrm{P}_\nu$ as the distribution of the random function $X$ when the distribution of $X_0$ is $\nu$. When $\nu$ is not a probability measure, this interpretation breaks down unless we are willing to think of the distribution of $X_0$ as a (possibly infinite) $\sigma$-finite measure $\nu$.

We say that $X$ is a strongly symmetric Feller process if:

(i) for every $t \succ 0$ in $\mathbb{R}^N_+$, $p_t$ is a symmetric function on $\mathbb{R}^d \times \mathbb{R}^d$ (that is, $p_t(x, y) = p_t(y, x)$ for all $x, y \in \mathbb{R}^d$); and

(ii) for each $\lambda \succ 0$ in $\mathbb{R}^N_+$, $r_\lambda$ is a proper gauge function on $\mathbb{R}^d \times \mathbb{R}^d$.
A technical word of caution is in order here. When $N = 1$, we needed to assume only that $r_\lambda$ is a gauge function; from this and symmetry, it followed that $r_\lambda$ is proper; cf. the balayage theorem (Theorem 2.2.1, Chapter 10). Our $N$-parameter processes are sufficiently general, however, that we need to assume that $r_\lambda$ is proper in order for a nice theory to follow. Nevertheless, we shall see, via a number of examples, that this assumption is harmless in practice.

Exercise 1.4.2 Suppose that $X$ is strongly symmetric.

(i) Prove that for all $\sigma$-finite measures $\xi$ and $\zeta$ on $\mathbb{R}^d$,
\[ \int T_t \zeta(x)\, \xi(dx) = \int T_t \xi(x)\, \zeta(dx), \qquad t \in \mathbb{R}^N_+. \]
In other words, show that $T_t$ is self-adjoint on $L^2(\nu)$.

(ii) Prove that for all $\sigma$-finite measures $\xi$ and $\zeta$ on $\mathbb{R}^d$,
\[ \int R_\lambda \zeta(x)\, \xi(dx) = \int R_\lambda \xi(x)\, \zeta(dx), \]
whenever $\lambda \succ 0$ is in $\mathbb{R}^N_+$. This is a multiparameter extension of Exercise 2.1.5, Chapter 10.
An important property of strongly symmetric Markov processes is that if we take their initial measure to be their reference measure, then the distribution of $X_t$ becomes independent of $t$. This property is often referred to as the (weak) stationarity of $X$. More precisely, we state the following result.

Lemma 1.4.1 Suppose $X$ is a strongly symmetric, $N$-parameter, $\mathbb{R}^d$-valued Markov process whose reference measure is $\nu$. Then, for all measurable $f : \mathbb{R}^d \to \mathbb{R}_+$,
\[ \mathrm{E}_\nu[f(X_t)] = \int f(x)\, \nu(dx) = \int T_t f(x)\, \nu(dx). \]

Exercise 1.4.3 Prove Lemma 1.4.1.
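On a finite state space, Lemma 1.4.1 becomes a statement about doubly stochastic matrices: if the generator $Q$ is symmetric, then $p_t = e^{tQ}$ is a symmetric transition density with respect to counting measure, and $\int T_t f\, d\nu = \int f\, d\nu$ exactly. A minimal sketch (our hypothetical 3-state generator, not from the text):

```python
import numpy as np

# Hypothetical symmetric generator (off-diagonal rates symmetric, rows sum
# to zero). Counting measure nu on {0, 1, 2} is the reference measure, and
# p_t(x, y) = exp(tQ)[x, y] is a symmetric transition density.
Q = np.array([[-2.0,  1.0,  1.0],
              [ 1.0, -1.5,  0.5],
              [ 1.0,  0.5, -1.5]])

def transition(t):
    # e^{tQ} via eigendecomposition; Q is symmetric, so eigh applies
    w, V = np.linalg.eigh(Q)
    return (V * np.exp(t * w)) @ V.T

f = np.array([0.3, 1.7, 0.9])
lhs = np.sum(transition(0.8) @ f)   # E_nu[f(X_t)] = sum over x of (T_t f)(x)
rhs = np.sum(f)                     # integral of f against nu
```

The two sums agree because $e^{tQ}$ is symmetric and stochastic, hence doubly stochastic.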
In fact, not only does symmetry make it easier to compute the distribution of $X_t$ for a fixed $t \in \mathbb{R}^N_+$, it also often allows for more-or-less simple formulæ for the finite-dimensional distributions of $X$, as the following extension of Lemma 1.4.1 reveals in a special case.

Lemma 1.4.2 Let $s, t \in \mathbb{R}^N_+$ and suppose $\varphi_1, \varphi_2 : \mathbb{R}^d \to \mathbb{R}_+$ are measurable functions. Under the hypotheses of Lemma 1.4.1,
\[ \mathrm{E}_\nu[\varphi_1(X_s)\, \varphi_2(X_t)] = \int \varphi_1(x) \cdot T_{s + t - 2(s \wedge t)}\, \varphi_2(x)\, \nu(dx). \]
Proof of Lemma 1.4.2 Let $\mathbf{1}$ denote the function that is identically equal to $1$. Since $T_t \mathbf{1}(x) = 1$, we can apply Exercise 1.4.2, as well as Lemmas 1.1.1 and 1.4.1, to deduce the following:
\[ \mathrm{E}_\nu[\varphi_1(X_s)\varphi_2(X_t)] = \int T_{s \wedge t}\bigl( T_{s - (s \wedge t)} \varphi_1 \cdot T_{t - (s \wedge t)} \varphi_2 \bigr)(x)\, \nu(dx) = \int T_{s - (s \wedge t)} \varphi_1(x) \cdot T_{t - (s \wedge t)} \varphi_2(x)\, \nu(dx). \]
Another application of Exercise 1.4.2 implies the result. □
To recapitulate, suppose that $X$ is strongly symmetric and that its initial measure is its reference measure $\nu$.⁴ Then the distribution of $X_t$ is independent of $t$ (Lemma 1.4.1), and the joint distribution of $X_s$ and $X_t$ depends only on $s + t - 2(s \wedge t)$ (Lemma 1.4.2). Consequently, given a fixed $r \in \mathbb{R}^N_+$, the distribution of $(X_s, X_t)$ is the same as that of $(X_{s+r}, X_{t+r})$. This property is sometimes called second-order, or $L^2$, stationarity.

So far, the results of this subsection have relied only on the symmetry of $p_t$. However, when $X$ is strongly symmetric, we also know that $r_\lambda$ is a

⁴ Heuristically, this means that $X_0$ is a random variable with "distribution" $\nu$, although this interpretation is sensible only when $\nu$ is a probability measure.
proper gauge function on $\mathbb{R}^d \times \mathbb{R}^d$; cf. Sections 1.1 and 2.3 of Appendix D for definitions. These properties will be used later on to connect $N$-parameter processes to potentials and energy.
2 Examples

We now turn our attention to some examples of multiparameter Markov processes that we shall study in the following two sections. More specifically, we will introduce three large classes of multiparameter Markov processes: product Feller processes, additive Lévy processes, and multiparameter product processes. The first two families share the property that they are both built from $N$ independent one-parameter processes. Earlier in the book we have seen other such constructions in the setting of discrete-parameter theory; cf. Section 2.1, Chapter 1, and Section 2 of Chapter 3.
2.1 General Notation

In this section we build examples of multiparameter Markov processes by suitably combining one-parameter Markov processes. The processes that we shall construct have a rich structure, as we shall see in Section 4 below. On the other hand, some cumbersome notation is required in order to carry out our constructions. Thus, before working out the details, it is worth our while to spend a few paragraphs establishing some basic notation.

Given $N$ probability spaces $(\Omega_1, \mathcal{G}_1, Q_1), \ldots, (\Omega_N, \mathcal{G}_N, Q_N)$, we define the product space $\Omega$, the product $\sigma$-field $\mathcal{G}$, and the product probability measure $Q$ in the usual way:
\[ \Omega = \Omega_1 \times \cdots \times \Omega_N, \qquad \mathcal{G} = \mathcal{G}_1 \times \cdots \times \mathcal{G}_N, \qquad Q = Q_1 \times \cdots \times Q_N. \]
Throughout much of this section we will be considering $N$ independent one-parameter Feller processes $X^1, \ldots, X^N$, all the time remembering that $X^j = (X^j_r;\ r \ge 0)$ is a one-parameter process defined on the probability space $(\Omega_j, \mathcal{G}_j, Q_j)$, for $j \in \{1, \ldots, N\}$. Furthermore, for each $j \in \{1, \ldots, N\}$, $X^j$ is $S_j$-valued, where $S_1, \ldots, S_N$ are compact spaces; if they are not compact, we will always compactify them, using the one-point compactification.⁵

Define $\mathrm{P}^i_x$ ($x \in S_i$) to be the probability measures in the definition of the Feller process $X^i$, where $i \in \{1, \ldots, N\}$. As always, $\mathrm{E}^i_x$ ($x \in S_i$) denotes

⁵ It may help you to consider the special case $S_1 = S_2 = \cdots = S_N$.
the corresponding expectation operator; we also let $T^i = (T^i_r;\ r \ge 0)$ and $R^i = (R^i_\gamma;\ \gamma > 0)$ denote the transition functions and the resolvent of $X^i$, respectively. Fixing each $i \in \{1, \ldots, N\}$, we also need some notation for the complete augmented filtration of $X^i$, which will be written as $\mathcal{F}^i = (\mathcal{F}^i_r;\ r \ge 0)$. Of course, the phrase "complete augmented" refers to the measures $\mathrm{P}^i_x$, $x \in S_i$. In particular, note that for each $i \in \{1, \ldots, N\}$, all bounded measurable functions $f : S_i \to \mathbb{R}$, and all $u, v \ge 0$,
\[ \mathrm{E}^i_x\bigl[ f(X^i_{u+v}) \,\big|\, \mathcal{F}^i_u \bigr] = T^i_v f(X^i_u), \qquad \mathrm{P}^i_x\text{-a.s. for all } x \in S_i. \]
In order to make statements about $X^1, \ldots, X^N$ simultaneously, we let
\[ \mathrm{P}_x = \mathrm{P}^1_{x_1} \times \cdots \times \mathrm{P}^N_{x_N}, \qquad x \in S, \]
as a product measure on $(\Omega, \mathcal{G})$. Here we have written $x \in S$ in the direct product notation $x = x_1 \otimes \cdots \otimes x_N$, where $x_i \in S_i$, $1 \le i \le N$. To put it another way, whenever $x_i \in S_i$, $1 \le i \le N$, $x_1 \otimes \cdots \otimes x_N$ is the element of $S$ with $x^{(i)} = x_i$ for all $1 \le i \le N$.⁶ As usual, $\mathrm{E}_x$ ($x \in S$) denotes the corresponding expectation operator. We complete the introduction of our notation by defining the $N$-parameter filtration $\mathcal{F} = (\mathcal{F}_t;\ t \in \mathbb{R}^N_+)$ as
\[ \mathcal{F}_t = \bigotimes_{i=1}^N \mathcal{F}^i_{t^{(i)}}, \qquad t \in \mathbb{R}^N_+. \]
Observe that $\mathcal{F}$ is the minimal $N$-parameter filtration whose marginal filtrations are given by $\mathcal{F}^1, \ldots, \mathcal{F}^N$, respectively.
2.2 Product Feller Processes

Let $S = S_1 \times \cdots \times S_N$ denote the corresponding product space, endowed with the product topology and the corresponding Borel $\sigma$-field. Since $S$ is already compact, we need not one-point compactify it; consequently, in the notation of Chapter 8, $S_\Delta = S$. Define the $N$-parameter stochastic process $X = (X_t;\ t \in \mathbb{R}^N_+)$ by setting, in direct product notation,
\[ X_t(\omega) = X^1_{t^{(1)}}(\omega_1) \otimes \cdots \otimes X^N_{t^{(N)}}(\omega_N). \]
Let us recall what this means: whenever $x_i \in S_i$ ($1 \le i \le N$), then $x = x_1 \otimes \cdots \otimes x_N$ is the "$N$-dimensional"⁷ point in $S$ whose first coordinate is $x_1$, whose second is $x_2$, and so on. (Similarly, $\omega \in \Omega$ is written as $\omega_1 \otimes \cdots \otimes \omega_N$, where $\omega_j \in \Omega_j$, $1 \le j \le N$.) The process $X$ is called the product Feller process with coordinate processes $X^1, \ldots, X^N$.

We shall next argue that $X$ is an $N$-parameter, $S$-valued Feller process. This is achieved by a direct calculation of $\mathrm{E}_x[f(X_t)]$ for all bounded, measurable functions $f : S \to \mathbb{R}$, all $t \in \mathbb{R}^N_+$, and all $x \in S$. Suppose $f : S \to \mathbb{R}_+$ is measurable and of the form $f = f_1 \otimes \cdots \otimes f_N$, where $f_i : S_i \to \mathbb{R}_+$. That is, we suppose
\[ f(x) = \prod_{j=1}^N f_j(x_j), \qquad x = x_1 \otimes \cdots \otimes x_N \in S,\ x_i \in S_i. \]

⁶ Hence, as a concrete example, consider $(1, 2, 3, 4, 5, 6) = 1 \otimes (2, 3) \otimes (4, 5, 6)$.
⁷ Since $S$ is the product of the $N$ spaces $S_1, \ldots, S_N$, this may justify a relaxed reference to $x$ as being $N$-dimensional.
Then, for all $x \in S$ of the form $x = x_1 \otimes \cdots \otimes x_N$ with $x_i \in S_i$ for all $1 \le i \le N$,
\[ \mathrm{E}_x[f(X_t)] = \prod_{i=1}^N \mathrm{E}^i_{x_i}\bigl[ f_i(X^i_{t^{(i)}}) \bigr] = \prod_{j=1}^N T^j_{t^{(j)}} f_j(x_j), \qquad t \in \mathbb{R}^N_+. \tag{1} \]
(Recall that $T^j$ denotes the transition functions of $X^j$.) Motivated by this calculation, define for all measurable functions $f : S \to \mathbb{R}_+$ of the form $f = f_1 \otimes \cdots \otimes f_N$ ($f_i : S_i \to \mathbb{R}_+$),
\[ \mathsf{T}^j_r f(x) = \prod_{i \ne j} f_i(x_i) \cdot T^j_r f_j(x_j), \qquad r \ge 0,\ x \in S, \]
where $x \in S$ is (again) written in direct product notation as $x_1 \otimes \cdots \otimes x_N$, $x_i \in S_i$. By the Feller property, we can extend the domain of definition of $f \mapsto \mathsf{T}^j_r f$ to all measurable functions $f : S \to \mathbb{R}_+$ in a unique manner:

Exercise 2.2.1 Use a monotone class argument to show that there exists a unique extension of the bounded linear operator $f \mapsto \mathsf{T}^j_r f$ to the collection of all bounded measurable functions $f : S \to \mathbb{R}_+$. Moreover, show that $\mathsf{T}^j$ is a one-parameter semigroup on $S$ (and not always on $S_j$!) and that $T_t = \prod_{j=1}^N \mathsf{T}^j_{t^{(j)}}$, for all $t \in \mathbb{R}^N_+$.

Having come this far, it should not be too difficult to check the following:

Theorem 2.2.1 The product Feller process $X$ is an $N$-parameter Feller process whose transition functions are given by $T = (T_t;\ t \in \mathbb{R}^N_+)$.

Exercise 2.2.2 Verify Theorem 2.2.1. (Hint: Carefully study the proof of Proposition 2.2.1 below.)
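For tensor-product test functions, equation (1) reduces the $N$-parameter expectation to a product of one-parameter computations. A Monte Carlo sketch of this (ours, not the book's): it takes $X^1, X^2$ to be standard Brownian motions, for which $T_r \cos(\cdot)(x) = e^{-r/2}\cos(x)$, and compares the simulated two-parameter expectation against the product of marginal semigroup values.

```python
import numpy as np

rng = np.random.default_rng(2)
x1, x2 = 0.4, -0.9        # starting point x = x1 (x) x2
t1, t2 = 0.3, 0.7         # time point t = (t1, t2)
n = 400_000

# Sample X_t = X^1_{t1} (x) X^2_{t2} for two independent Brownian motions.
X1 = x1 + np.sqrt(t1) * rng.standard_normal(n)
X2 = x2 + np.sqrt(t2) * rng.standard_normal(n)

# E_x[f1(X^1_{t1}) f2(X^2_{t2})] versus T^1_{t1} f1 (x1) * T^2_{t2} f2 (x2),
# with f1 = f2 = cos, for which the heat semigroup acts by e^{-r/2}.
emp = np.mean(np.cos(X1) * np.cos(X2))
theory = (np.exp(-t1 / 2) * np.cos(x1)) * (np.exp(-t2 / 2) * np.cos(x2))
```

The empirical mean matches the factorized value up to Monte Carlo error.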
We also mention the following calculation for the resolvent of $X$ in terms of the resolvents $R^j$ of the $X^j$'s: for all measurable $f : S \to \mathbb{R}_+$ of the form $f = f_1 \otimes \cdots \otimes f_N$, $f_j : S_j \to \mathbb{R}_+$,
\[ R_\lambda f(x) = \prod_{j=1}^N R^j_{\lambda^{(j)}} f_j(x_j), \tag{2} \]
where $\lambda \succ 0$ is in $\mathbb{R}^N_+$ and $x = x_1 \otimes \cdots \otimes x_N$, $x_j \in S_j$.

Suppose $X^1, \ldots, X^N$ are strongly symmetric one-parameter Feller processes with reference measures $\nu_1, \ldots, \nu_N$, respectively. Suppose further that each $X^j$ possesses transition densities $(p^j_r(x, y);\ r > 0,\ x, y \in S_j)$ with respect to $\nu_j$. That is, for all bounded measurable functions $f : S_j \to \mathbb{R}$,
\[ \mathrm{E}^j_x[f(X^j_r)] = \int_{S_j} f(y)\, p^j_r(x, y)\, \nu_j(dy). \]
We then have the following.

Proposition 2.2.1 Suppose that $X^1, \ldots, X^N$ are strongly symmetric Feller processes with reference measures $\nu_1, \ldots, \nu_N$, respectively. Then the product Feller process $X$ is a strongly symmetric $N$-parameter Feller process with reference measure $\nu = \nu_1 \times \cdots \times \nu_N$.

Proof Suppose $f = f_1 \otimes \cdots \otimes f_N$, where the $f_i : S_i \to \mathbb{R}$ are bounded and measurable ($i = 1, \ldots, N$). Then, for all $x = x_1 \otimes \cdots \otimes x_N$ with $x_i \in S_i$, and for all $t \in \mathbb{R}^N_+$,
\[ T_t f(x) = \mathrm{E}_x[f(X_t)] = \prod_{j=1}^N \int_{S_j} p^j_{t^{(j)}}(x_j, y)\, f_j(y)\, \nu_j(dy). \]
By a monotone class argument, we can choose the following version of the transition densities for $X$ (why?):
\[ p_t(x, y) = \prod_{j=1}^N p^j_{t^{(j)}}(x_j, y_j), \]
where $t \succ 0$ is in $\mathbb{R}^N_+$ and where $x = x_1 \otimes \cdots \otimes x_N$ and $y = y_1 \otimes \cdots \otimes y_N$ are both in $S$. Note that $p_t$ is symmetric for all $t \succ 0$ in $\mathbb{R}^N_+$, as it should be. Moreover, $R_\lambda$ has the following density with respect to the reference measure $\nu = \nu_1 \times \cdots \times \nu_N$:
\[ r_\lambda(x, y) = \prod_{j=1}^N r^j_{\lambda^{(j)}}(x_j, y_j), \tag{3} \]
where $x = x_1 \otimes \cdots \otimes x_N$, $y = y_1 \otimes \cdots \otimes y_N$ ($x_i, y_i \in S_i$), $\lambda \succ 0$ is in $\mathbb{R}^N_+$, and $r^j_\gamma(a, b) = \int_0^\infty e^{-\gamma r}\, p^j_r(a, b)\, dr$ for all $a, b \in S_j$ and all $\gamma > 0$. That is, $r^j_\gamma$ denotes the $\gamma$-potential density of $X^j$ with respect to the measure $\nu_j$. Finally, notice that when each $r^j_\gamma$ is a proper gauge function on $S_j \times S_j$, $r_\lambda$ is automatically a gauge function on $S$. (To be consistent with our definitions of gauge functions, we need to assume that the $S_j$'s are Euclidean. It is a good idea to define proper gauge functions on metric spaces by analogy and to check that the above remains true in general.) The proof follows readily from this. □
2.3 Additive Lévy Processes

The second class of multiparameter processes that we are interested in is the family of so-called additive Lévy processes. Suppose the $X^1, \ldots, X^N$ of Section 2.1 are in fact (one-parameter) Lévy processes with $S_1 = S_2 = \cdots = S_N = \mathbb{R}^d$. The $N$-parameter, $\mathbb{R}^d$-valued additive Lévy process $X = (X_t;\ t \in \mathbb{R}^N_+)$ is defined as follows:
\[ X_t(\omega) = \sum_{j=1}^N X^j_{t^{(j)}}(\omega_j), \qquad \omega \in \Omega,\ t \in \mathbb{R}^N_+. \]
We sometimes write the process $X$ in direct sum notation as $X = \bigoplus_{j=1}^N X^j$. For all bounded measurable functions $f : \mathbb{R}^d \to \mathbb{R}$, all $x \in \mathbb{R}^d$, and all $t \in \mathbb{R}^N_+$, define $T_t f(x) = Q[f(X_t + x)]$. We recall that $Q$ denotes both the underlying probability measure and its expectation.

Theorem 2.3.1 The process $X$ is an $N$-parameter Feller process whose transition functions are given by $T = (T_t;\ t \in \mathbb{R}^N_+)$.

Exercise 2.3.1 Prove Theorem 2.3.1.
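The simplest additive Lévy process is additive Brownian motion, $X_{(t_1, t_2)} = B^1_{t_1} + B^2_{t_2}$, whose covariance is $\mathrm{Cov}(X_s, X_t) = s^{(1)} \wedge t^{(1)} + s^{(2)} \wedge t^{(2)}$. A simulation sketch of this process (our construction; grid sizes and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

def additive_bm(tgrid, n_paths):
    # X_{(t1, t2)} = B^1_{t1} + B^2_{t2}, sampled on a rectangular grid by
    # cumulating independent Gaussian increments in each time direction.
    dt = np.diff(tgrid, prepend=0.0)
    B1 = np.cumsum(rng.normal(0.0, np.sqrt(dt), (n_paths, len(tgrid))), axis=1)
    B2 = np.cumsum(rng.normal(0.0, np.sqrt(dt), (n_paths, len(tgrid))), axis=1)
    return B1[:, :, None] + B2[:, None, :]

tgrid = np.linspace(0.25, 1.0, 4)
X = additive_bm(tgrid, 200_000)

# Empirical vs. theoretical covariance at s = (0.5, 1.0), t = (0.75, 0.25).
si, sj, ti, tj = 1, 3, 2, 0
emp = np.mean(X[:, si, sj] * X[:, ti, tj])
theory = min(tgrid[si], tgrid[ti]) + min(tgrid[sj], tgrid[tj])
```

The empirical product moment matches $s^{(1)} \wedge t^{(1)} + s^{(2)} \wedge t^{(2)}$ up to Monte Carlo error, reflecting the direct-sum structure of $X$.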
Let us now turn our attention to the resolvent of $X$. Suppose $f : \mathbb{R}^d \to \mathbb{R}$ is a bounded, measurable function, $x \in \mathbb{R}^d$, and $\lambda \succ 0$ is in $\mathbb{R}^N_+$. We also fix $N$ points $u_1, \ldots, u_N \in \mathbb{R}^d$ such that $\sum_{j=1}^N u_j = x$; otherwise, the choice of $u_1, \ldots, u_N$ is completely arbitrary. After several applications of Fubini's theorem, we obtain
\[ R_\lambda f(x) = \int_{\mathbb{R}^N_+} e^{-\lambda \cdot t}\, Q[f(X_t + x)]\, dt = Q\Bigl[ \int_{\mathbb{R}^N_+} e^{-\lambda \cdot t}\, f(X_t + x)\, dt \Bigr] = \mathrm{E}^1_{u_1} \mathrm{E}^2_{u_2} \cdots \mathrm{E}^N_{u_N} \Bigl[ \int_{\mathbb{R}^N_+} e^{-\lambda \cdot t}\, f(X_t)\, dt \Bigr] = \Bigl( \prod_{j=1}^N \mathrm{E}^j_{u_j} \Bigr) \Bigl[ \int_{\mathbb{R}^N_+} e^{-\lambda \cdot t}\, f(X_t)\, dt \Bigr], \]
in operator notation. We can evaluate the above by "peeling the indices back." To be more precise, note that for any $\omega_1 \in \Omega_1, \ldots, \omega_{N-1} \in \Omega_{N-1}$,
\[ \mathrm{E}^N_{u_N} \Bigl[ \int_{\mathbb{R}^N_+} e^{-\lambda \cdot t}\, f(X_t(\omega))\, dt \Bigr] = \int_{\Omega_N} \int_{\mathbb{R}^N_+} e^{-\lambda \cdot t}\, f\bigl( X_t(\omega) \bigr)\, dt\, \mathrm{P}^N_{u_N}(d\omega_N) = \int_{\mathbb{R}^{N-1}_+} e^{-\sum_{j=1}^{N-1} \lambda^{(j)} t^{(j)}}\, R^N_{\lambda^{(N)}} f\Bigl( \sum_{j=1}^{N-1} X^j_{t^{(j)}}(\omega') \Bigr)\, dt^{(N-1)} \cdots dt^{(1)}, \]
where $\omega' = \omega_1 \otimes \cdots \otimes \omega_{N-1}$. By induction (on $N$),
\[ R_\lambda f(x) = \prod_{j=1}^N R^j_{\lambda^{(j)}} f(x). \tag{1} \]
We can specialize further by assuming that $X^1, \ldots, X^N$ possess $\gamma$-potential densities $r^1_\gamma, \ldots, r^N_\gamma$ with respect to Lebesgue's measure on $\mathbb{R}^d$. That is, for all $\gamma > 0$, all $x \in \mathbb{R}^d$, and all bounded, measurable functions $f : \mathbb{R}^d \to \mathbb{R}$,
\[ R^i_\gamma f(x) = \int f(y)\, r^i_\gamma(x, y)\, dy. \]
By Example 3, Section 4.3 of Chapter 8, we can always write $r^i_\gamma(x, y) = u^i_\gamma(y - x)$ (why?). Consequently,
\[ R^i_\gamma f(x) = f * u^i_\gamma(x), \tag{2} \]
where $*$ denotes ordinary convolution on $\mathbb{R}^d$. For all $\lambda \succ 0$ in $\mathbb{R}^N_+$ and all $a \in \mathbb{R}^d$, define
\[ u_\lambda(a) = u^1_{\lambda^{(1)}} * \cdots * u^N_{\lambda^{(N)}}(a), \tag{3} \]
and let $r_\lambda(x, y) = u_\lambda(y - x)$, where $\lambda \succ 0$ is in $\mathbb{R}^N_+$ and $x, y \in \mathbb{R}^d$. By equation (1), $r_\lambda$ is the density of $R_\lambda$ with respect to Lebesgue's measure on $\mathbb{R}^d$, in the sense that for all bounded, measurable functions $f : \mathbb{R}^d \to \mathbb{R}$,
\[ R_\lambda f(x) = \int f(y)\, r_\lambda(x, y)\, dy. \tag{4} \]
We now address the issue of strong symmetry.

Theorem 2.3.2 Suppose $X^1, \ldots, X^N$ are independent, strongly symmetric, $\mathbb{R}^d$-valued Lévy processes whose reference measure is Lebesgue's measure on $\mathbb{R}^d$. Then $X = \bigoplus_{j=1}^N X^j$ is a strongly symmetric, $\mathbb{R}^d$-valued, $N$-parameter Feller process whose reference measure is Lebesgue's measure on $\mathbb{R}^d$.

Proof We begin by reiterating the computations made prior to the statement of the theorem. Let $r^1_\gamma, \ldots, r^N_\gamma$ ($\gamma > 0$) denote the $\gamma$-potential densities of $X^1, \ldots, X^N$, respectively. Then $r^j_\gamma(x, y) = u^j_\gamma(y - x)$ for $\gamma > 0$ and $x, y \in \mathbb{R}^d$, and $r_\lambda(x, y) = u_\lambda(y - x)$, where $u_\lambda$ is given by equation (3). Since the $X^j$'s are strongly symmetric, the $r^j$'s are symmetric functions (Exercise 2.1.1, Chapter 10); equivalently, $u^j_\gamma(a) = u^j_\gamma(-a)$. Thus $u_\lambda(a) = u_\lambda(-a)$, and this means that the $\lambda$-potential density $r_\lambda$ of the $N$-parameter process $X$ is symmetric. Moreover, since continuity is preserved under convolutions, $r_\lambda$ is a gauge function (why?). That $p_t$ exists and is symmetric is shown by similar arguments. Thus, to finish, we need to demonstrate that $r_\lambda$ is proper. For any $\sigma$-finite measure $\mu$ on $\mathbb{R}^d$, all $\lambda \succ 0$ in $\mathbb{R}^N_+$, and every $x \in \mathbb{R}^d$, $R_\lambda \mu(x) = R^N_{\lambda^{(N)}} \circ \cdots \circ R^1_{\lambda^{(1)}} \mu(x)$. Since $r^1_{\lambda^{(1)}}$ is proper by assumption, a little thought shows that so is $r_\lambda$. This completes the proof. □
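For one-dimensional Brownian motion, the $\gamma$-potential density in equation (2) is explicit: $u_\gamma(a) = e^{-\sqrt{2\gamma}\,|a|}/\sqrt{2\gamma}$. A quadrature sketch (ours, not the book's) checking this closed form against the defining integral $\int_0^\infty e^{-\gamma s}\, p_s(a)\, ds$, with $p_s$ the Gaussian density:

```python
import numpy as np

gamma, a = 0.9, 1.3

# Midpoint Riemann sum for the Laplace transform of the Gaussian kernel:
# u_gamma(a) = integral_0^infty e^{-gamma s} exp(-a^2/(2s)) / sqrt(2 pi s) ds.
n, T = 1_000_000, 80.0
ds = T / n
s = (np.arange(n) + 0.5) * ds
integrand = np.exp(-gamma * s - a**2 / (2.0 * s)) / np.sqrt(2.0 * np.pi * s)
quad = np.sum(integrand) * ds

# Known closed form for the Brownian gamma-potential density.
closed_form = np.exp(-np.sqrt(2.0 * gamma) * abs(a)) / np.sqrt(2.0 * gamma)
```

By equation (3), convolving such kernels (one per parameter) yields the $N$-parameter resolvent density $u_\lambda$ of additive Brownian motion.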
2.4 More General Product Processes

Let $N$ be a positive integer and consider $N$ independent multiparameter Markov processes $X^1, \ldots, X^N$, where $X^i = (X^i_t;\ t \in \mathbb{R}^{M_i}_+)$ and $M_1, \ldots, M_N$ are positive integers. For the sake of concreteness, we suppose that for each $i \in \{1, \ldots, N\}$, $X^i$ is defined on some probability space $(\Omega_i, \mathcal{G}_i, Q_i)$ and is $S_i$-valued, where $S_i$ is a locally compact metric space. (Recall that for our purposes we can assume, without loss of generality, that each $S_i$ is compact.) To keep the notation from becoming overwhelming, note at this point that each $X^i$ is an $S_i$-valued, $M_i$-parameter stochastic process. As in Section 2.1, we define $(\Omega, \mathcal{G}, Q)$ to be the product probability space, let $S = S_1 \times \cdots \times S_N$ and $M = M_1 + \cdots + M_N$, and define the $S$-valued, $M$-parameter process $X = X^1 \otimes \cdots \otimes X^N$ by
\[ X_t(\omega) = X^1_{t_1}(\omega_1) \otimes \cdots \otimes X^N_{t_N}(\omega_N), \qquad t \in \mathbb{R}^M_+, \]
where $\omega = \omega_1 \otimes \cdots \otimes \omega_N \in \Omega$ with $\omega_i \in \Omega_i$, and $t = t_1 \otimes \cdots \otimes t_N \in \mathbb{R}^M_+$ with $t_i \in \mathbb{R}^{M_i}_+$. The $S$-valued, $M$-parameter process $X$ is the product process built from the $N$ processes $X^1, \ldots, X^N$, the $i$th one of which is $S_i$-valued and indexed by $M_i$ parameters. Of course, the underlying probability space is built up as in Section 2.1.

Theorem 2.4.1 If $X^i$ is an $M_i$-parameter, $S_i$-valued Feller process for each $1 \le i \le N$, then $X$ is an $S$-valued, $M$-parameter Feller process. Moreover, suppose $R^1, \ldots, R^N$ and $R$ denote the resolvents of $X^1, \ldots, X^N$ and $X$, respectively, and define $f = f_1 \otimes \cdots \otimes f_N$, where $f_i : S_i \to \mathbb{R}_+$ is measurable for each $1 \le i \le N$. Then, for all $\lambda \in \mathbb{R}^M_+$ with $\lambda = \lambda_1 \otimes \cdots \otimes \lambda_N$, $\lambda_i \in \mathbb{R}^{M_i}_+$, and all $x \in S$ with $x = x_1 \otimes \cdots \otimes x_N$, $x_i \in S_i$,
\[ R_\lambda f(x) = \prod_{i=1}^N R^i_{\lambda_i} f_i(x_i). \]
Finally, if $X^1, \ldots, X^N$ are all strongly symmetric and if $\nu_1, \ldots, \nu_N$ denote their respective reference measures, then $X$ is also strongly symmetric, and its reference measure is $\nu_1 \times \cdots \times \nu_N$.

Exercise 2.4.1 Prove Theorem 2.4.1. (Warning: We have defined strong symmetry only for processes taking values in a Euclidean space. Theorem 2.4.1, however, remains true for the obvious general definition of strong symmetry. Try some of these details.)

In particular, the above implies that we can combine the examples of Sections 2.2 and 2.3 in nontrivial ways to construct other interesting multiparameter Markov processes.
3 Potential Theory

We now return to the basic question raised (and answered in a one-parameter setting) in Chapter 10: When does the range of a strongly symmetric multiparameter Markov process intersect a given set?
3.1 The Main Result

Let $X = (X_t;\ t \in \mathbb{R}^N_+)$ denote a strongly symmetric, $\mathbb{R}^d$-valued Markov process with transition functions $T = (T_t;\ t \in \mathbb{R}^N_+)$, resolvent $R = (R_\lambda;\ \lambda \succ 0$ in $\mathbb{R}^N_+)$, and reference measure $\nu$. The main result of this section is the following characterization of the positivity of intersection (or hitting) probabilities in terms of capacities. It unifies the results of Hirsch (1995), Hirsch and Song (1995a, 1994, 1995c, 1995d), Fitzsimmons and Salisbury (1989), Ren (1990), and Salisbury (1996) in the setting of multiparameter strongly symmetric Feller processes on Euclidean spaces.

Theorem 3.1.1 Suppose $X$ is a strongly symmetric, $\mathbb{R}^d$-valued, $N$-parameter Markov process with resolvent $R$ and reference measure $\nu$. Then, for all compact sets $E \subset \mathbb{R}^d$ and all $\lambda \succ 0$ in $\mathbb{R}^N_+$,
\[ A_1\, \mathrm{C}_{r_\lambda}(E) \le \int_{\mathbb{R}^N_+} e^{-\lambda \cdot t}\, \mathrm{P}_\nu\bigl\{ X([0, t]) \cap E \ne \varnothing \bigr\}\, dt \le A_2\, \mathrm{C}_{r_{\lambda/2}}(E), \]
where $A_1 = 2^{-N} \bigl\{ \prod_{j=1}^N \lambda^{(j)} \bigr\}^{-1}$ and $A_2 = 8^N \bigl\{ \prod_{j=1}^N \lambda^{(j)} \bigr\}^{-1}$.

Remarks
(i) Essentially using the notation of the previous chapter, for any Borel set $G \subset \mathbb{R}^N_+$ we let $X(G)$ denote the image of $G$ under $t \mapsto X_t$; that is, $X(G) = \{ x \in \mathbb{R}^d : \exists\, t \in G \text{ such that } X_t = x \}$.
(ii) Recall that $\mathrm{C}_g(E)$ denotes the capacity of $E$ with respect to any gauge function $g : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}_+ \cup \{\infty\}$; cf. Appendix D.
(iii) Recall that $\nu$ need not be a probability measure. See the convention about $\mathrm{P}_\nu$ stated after Exercise 1.4.1.

An important consequence of the above theorem is the following, which we leave as an exercise.

Exercise 3.1.1 Prove that under the conditions of Theorem 3.1.1, for any compact set $E \subset \mathbb{R}^d$ and for any $\lambda \succ 0$ in $\mathbb{R}^N_+$,
\[ \mathrm{P}_\nu\bigl\{ X(\mathbb{R}^N_+) \cap E \ne \varnothing \bigr\} > 0 \iff \mathrm{C}_{r_\lambda}(E) > 0. \]
In words, show that the closed range of the process $X$ hits a compact set $E$ (with positive $\mathrm{P}_\nu$-"probability") if and only if $E$ has positive capacity. (Hint: One can view $r_\lambda$ as a Laplace transform; cf. also Appendix B.)

We will prove Theorem 3.1.1 in the next three subsections; let us close this subsection with an important consequence. Note that unless $\nu$ is a probability (respectively, finite) measure, $\mathrm{P}_\nu$ is not a probability (respectively, finite) measure. This makes the present form of Theorem 3.1.1 both awkward and difficult to use. Next, we show that under mild conditions, $\mathrm{P}_\nu$ can be replaced with the probability measure $\mathrm{P}_x$. This result is akin to Corollary 2.3.1 of Chapter 10, and its proof is motivated by a positivity condition first mentioned explicitly in Evans (1987a); see Evans (1987b) for related results.

Corollary 3.1.1 Consider a strongly symmetric, $N$-parameter, $\mathbb{R}^d$-valued Markov process, and assume that there exist $\lambda \succ 0$ in $\mathbb{R}^N_+$ and $x_0 \in \mathbb{R}^d$ such that $r_\lambda(x_0, y) > 0$ for all $y \in \mathbb{R}^d$. Then, for all compact sets $E \subset \mathbb{R}^d$ with $x_0 \notin E$,
\[ \mathrm{P}_{x_0}\bigl\{ X(\mathbb{R}^N_+) \cap E \ne \varnothing \bigr\} > 0 \iff \mathrm{C}_{r_\lambda}(E) > 0. \]

Proof of Corollary 3.1.1 Let $x_0$ and $\lambda$ be as in the statement of the corollary, and recall that for all $y \in \mathbb{R}^d$,
\[ r_\lambda(x_0, y) = \int_{\mathbb{R}^N_+} e^{-\lambda \cdot s}\, p_s(x_0, y)\, ds. \]
Thus, the condition $r_\lambda(x_0, y) > 0$ is equivalent to the strict positivity of $p_s(x_0, y)$ for Lebesgue-almost all $s \in \mathbb{R}^N_+$. In particular,
\[ \mathrm{P}_\nu\bigl\{ X(\mathbb{R}^N_+) \cap E \ne \varnothing \bigr\} > 0 \iff \int \mathrm{P}_y\bigl\{ X(\mathbb{R}^N_+) \cap E \ne \varnothing \bigr\}\, p_s(x_0, y)\, \nu(dy) > 0 \ \text{ for almost all } s \in \mathbb{R}^N_+. \]
Now,
\[ \int \mathrm{P}_y\bigl\{ X(\mathbb{R}^N_+) \cap E \ne \varnothing \bigr\}\, p_s(x_0, y)\, \nu(dy) = \mathrm{E}_{x_0}\Bigl[ \mathrm{P}_{X_s}\bigl\{ X(\mathbb{R}^N_+) \cap E \ne \varnothing \bigr\} \Bigr] = \mathrm{P}_{x_0}\bigl\{ X([s, \infty[) \cap E \ne \varnothing \bigr\}, \]
where $[s, \infty[\; = \prod_{\ell=1}^N [s^{(\ell)}, \infty[$. In the last step we have used part (iv) of the definition of the Markov property in Section 1.1 (why?). In particular, we can combine the above observation with Remark (iii) after Theorem 3.1.1 to deduce that
\[ \mathrm{C}_{r_\lambda}(E) > 0 \iff \mathrm{P}_{x_0}\bigl\{ X([s, \infty[) \cap E \ne \varnothing \bigr\} > 0 \ \text{ for almost all } s \in \mathbb{R}^N_+. \]
To finish, note that by the right continuity of $t \mapsto X_t$,
\[ \lim_{s \to 0} \mathrm{P}_{x_0}\bigl\{ X([0, s]) \cap E \ne \varnothing \bigr\} = 0. \]
Since $s \mapsto \mathrm{P}_{x_0}\{ X([s, \infty[) \cap E \ne \varnothing \}$ is nonincreasing with respect to the partial order $\preceq$, this implies the corollary. □
3.2 Three Technical Estimates

Let $e_1, \ldots, e_N$ denote $N$ independent exponentially distributed random variables, each with mean $1$. By further enlarging the underlying probability space, we may, and will, assume without loss of generality that the collection $(e_1, \ldots, e_N)$ is totally independent of the process $X$. For any $\lambda \succ 0$ in $\mathbb{R}^N_+$, define $e(\lambda)$ to be the random vector whose $j$th coordinate is $e_j / \lambda^{(j)}$. Throughout this subsection we are interested in obtaining three distributional estimates for $J_\lambda(\varphi)$, where $\varphi : \mathbb{R}^d \to \mathbb{R}_+$ is a measurable function and
\[ J_\lambda(\varphi) = \int_{[0, e(\lambda)]} \varphi(X_s)\, ds. \tag{1} \]
We shall often use the self-evident fact that for any $x \in \mathbb{R}^d$ and all $s \in \mathbb{R}^N_+$,
\[ \mathrm{P}_x\{ s \preceq e(\lambda) \} = e^{-\lambda \cdot s}. \tag{2} \]

Lemma 3.2.1 For all $\lambda \succ 0$ in $\mathbb{R}^N_+$ and every measurable function $\varphi : \mathbb{R}^d \to \mathbb{R}_+$,
\[ \mathrm{E}_\nu[J_\lambda(\varphi)] = \frac{1}{\prod_{j=1}^N \lambda^{(j)}} \int \varphi(x)\, \nu(dx). \]

Proof By the monotone convergence theorem, we may assume that both sides are finite (why?). Note that for all $x \in \mathbb{R}^d$, $\mathrm{E}_x[J_\lambda(\varphi)] = R_\lambda \varphi(x)$. Thus,
\[ \mathrm{E}_\nu[J_\lambda(\varphi)] = \int_{\mathbb{R}^d} R_\lambda \varphi(x)\, \nu(dx) = \int_{\mathbb{R}^d} \varphi(x) \cdot R_\lambda \mathbf{1}(x)\, \nu(dx), \]
where $\mathbf{1}(x) \equiv 1$ for all $x \in \mathbb{R}^d$; we have used Exercise 1.4.2 in this last step. Since $R_\lambda \mathbf{1}(x) = \int_{\mathbb{R}^N_+} e^{-\lambda \cdot s}\, ds = \bigl\{ \prod_{j=1}^N \lambda^{(j)} \bigr\}^{-1}$, the lemma follows. □
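Formula (2) is a quick Monte Carlo exercise: the coordinates of $e(\lambda)$ are independent exponentials with rates $\lambda^{(j)}$, so the event $\{s \preceq e(\lambda)\}$ factors across coordinates. A sketch (ours; the particular $\lambda$ and $s$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
lam = np.array([0.6, 1.4])
s = np.array([0.8, 0.5])

# e(lambda): j-th coordinate is e_j / lambda_j with e_j standard exponential.
e_lam = rng.exponential(1.0, size=(500_000, 2)) / lam

emp = np.mean(np.all(s <= e_lam, axis=1))   # P{ s <= e(lambda) coordinatewise }
theory = np.exp(-lam @ s)                   # e^{-lambda . s}
```

The empirical frequency matches $e^{-\lambda \cdot s}$ up to Monte Carlo error.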
Lemma 3.2.2 For all $\lambda \succ 0$ in $\mathbb{R}^N_+$ and all measurable functions $\varphi : \mathbb{R}^d \to \mathbb{R}_+$,
\[ \mathrm{E}_\nu\bigl[ \{J_\lambda(\varphi)\}^2 \bigr] = \frac{2^N}{\prod_{j=1}^N \lambda^{(j)}} \int_{\mathbb{R}^d} \varphi(a)\, R_\lambda \varphi(a)\, \nu(da). \]
In particular,
\[ \mathrm{E}_\nu\bigl[ \{J_\lambda(\varphi)\}^2 \bigr] = \frac{2^N}{\prod_{j=1}^N \lambda^{(j)}} \iint r_\lambda(x, y)\, \varphi(x)\, \varphi(y)\, \nu(dx)\, \nu(dy). \]

Proof By the monotone convergence theorem, we can assume that all integrals in question are finite (why?). Directly computing, we obtain the following: for all $a \in \mathbb{R}^d$,
\[ \mathrm{E}_a\bigl[ \{J_\lambda(\varphi)\}^2 \bigr] = \int_{\mathbb{R}^N_+} \int_{\mathbb{R}^N_+} e^{-\lambda \cdot (s \vee t)}\, \mathrm{E}_a\bigl[ \varphi(X_s)\, \varphi(X_t) \bigr]\, ds\, dt. \]
Thus, we can integrate with respect to $\nu(da)$ to deduce that
\[ \mathrm{E}_\nu\bigl[ \{J_\lambda(\varphi)\}^2 \bigr] = \int_{\mathbb{R}^N_+} \int_{\mathbb{R}^N_+} e^{-\lambda \cdot (s \vee t)}\, \mathrm{E}_\nu\bigl[ \varphi(X_s)\, \varphi(X_t) \bigr]\, ds\, dt. \]
We now use symmetry in an essential way: by Lemma 1.4.2 and by Fubini's theorem,
\[ \mathrm{E}_\nu\bigl[ \{J_\lambda(\varphi)\}^2 \bigr] = \int_{\mathbb{R}^d} \int_{\mathbb{R}^N_+} \int_{\mathbb{R}^N_+} e^{-\lambda \cdot (s \vee t)}\, \varphi(a)\, T_{s + t - 2(s \wedge t)}\, \varphi(a)\, ds\, dt\, \nu(da). \]
Clearly, for all $u, v \in \mathbb{R}^N_+$, $u + v - 2(u \wedge v) = (u \vee v) - (u \wedge v)$. Thus,
\[ \mathrm{E}_\nu\bigl[ \{J_\lambda(\varphi)\}^2 \bigr] = \int_{\mathbb{R}^d} \Bigl[ \int_{\mathbb{R}^N_+} \int_{\mathbb{R}^N_+} e^{-\lambda \cdot (s \vee t)}\, T_{(s \vee t) - (s \wedge t)}\, \varphi(a)\, ds\, dt \Bigr]\, \varphi(a)\, \nu(da). \]
On the other hand, as operators,
\[ \int_{\mathbb{R}^N_+} \int_{\mathbb{R}^N_+} e^{-\lambda \cdot (s \vee t)}\, T_{(s \vee t) - (s \wedge t)}\, ds\, dt = \prod_{j=1}^N \int_0^\infty \int_0^\infty e^{-\lambda^{(j)} (s^{(j)} \vee t^{(j)})}\, T^j_{s^{(j)} \vee t^{(j)} - s^{(j)} \wedge t^{(j)}}\, ds^{(j)}\, dt^{(j)} = 2^N \prod_{j=1}^N \int_0^\infty \int_{s^{(j)}}^\infty e^{-\lambda^{(j)} t^{(j)}}\, T^j_{t^{(j)} - s^{(j)}}\, dt^{(j)}\, ds^{(j)} = 2^N \prod_{j=1}^N \int_0^\infty e^{-\lambda^{(j)} s^{(j)}}\, R^j_{\lambda^{(j)}}\, ds^{(j)} = \frac{2^N}{\prod_{j=1}^N \lambda^{(j)}} \prod_{j=1}^N R^j_{\lambda^{(j)}} = \frac{2^N}{\prod_{j=1}^N \lambda^{(j)}}\, R_\lambda. \]
We have used Proposition 1.3.1 in the last line. This immediately proves the first assertion of the lemma; the second assertion follows from the first and from Fubini's theorem. □
In order to describe the next technical lemma of this subsection, we finally need to address the case where ν is potentially not a probability measure. While Pν makes sense in any case, the martingale theory that has been developed in this book can be invoked for Pν only if it is a probability measure, i.e., when ν is a probability measure. Unfortunately, in many of our intended applications ν will be Lebesgue’s measure, and this means that we need to approximate ν with probability measures. The precise way to do this is immaterial in that any reasonable approximation serves our purposes. We describe one such approximation next. For all k ≥ 1, we define the probability measure νk as follows: For all Borel sets G ⊂ Rd ,
ν G ∩ [−k, k]d
, k ≥ 1, (3) νk (G) = ν [−k, k]d where 1 ÷ 0 = ∞. Our definition of reference measures implies that ν([−k, k]d ) is strictly positive, for all k ≥ 1. Thus, νk is a probability measure on Rd , for all k ≥ 1, and correspondingly, Pνk is a probability measure (on our underlying probability space) for any and all k large enough. k If ϕ : Rd → R+ is measurable and λ 0 is in RN + , we define M = (Mtk ; t ∈ RN ) by + (4) Mtk = Eνk [Jλ (ϕ) | Ft ], where Jλ (ϕ) is defined by equation (1) above. Note that when ϕ is bounded, for instance, M k is an N -parameter martingale, which we can always take to be separable. (Otherwise, consider instead the modification of M k provided by Doob’s separability theorem, Theorem 2.2.1 of Chapter 5. Once again, note that we need the probability space to be complete in order to do this.) Lemma 3.2.3 Consider a fixed λ 0 in RN + and a fixed, measurable function ϕ : Rd → R+ . Then, for all t ∈ RN + and all k ≥ 1, Mtk ≥ e−λ·t Rλ ϕ(Xt ),
Pνk -a.s.,
where M^k is defined by (4).

Proof First, we suppose that ϕ is also bounded. For each fixed t ∈ R_+^N and for every k ≥ 1,

    M_t^k = E_{ν_k}[J_λ(ϕ) | F_t]
          ≥ E_{ν_k}[ ∫_{R_+^N} 1_{(t ≼ s ≼ e(λ))} ϕ(X_s) ds | F_t ]
          = ∫_{s ≽ t} e^{−λ·s} E_{ν_k}[ϕ(X_s) | F_t] ds
          = ∫_{s ≽ t} e^{−λ·s} T_{s−t} ϕ(X_t) ds
          = e^{−λ·t} R_λ ϕ(X_t),   P_{ν_k}-a.s.

This completes the proof when ϕ is bounded. The general result follows from Lebesgue's monotone convergence theorem, since we can consider the function ϕ ∧ n instead of ϕ and then let n ↑ ∞, monotonically.
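Definition (3) is concrete enough to compute with. The following Python sketch is purely illustrative and not part of the text: it takes ν to be one-dimensional Lebesgue measure and G an interval (both stand-ins), checks that each ν_k has total mass one, and checks the fact used in Lemma 3.3.1 below, namely that ν([−k, k]^d) · ν_k(G) increases to ν(G) as k ↑ ∞.

```python
# Illustrative check of the normalization nu_k in equation (3), with
# nu = one-dimensional Lebesgue measure and G an interval (stand-ins).

def length(lo, hi):
    """Lebesgue measure of the interval [lo, hi]."""
    return max(0.0, hi - lo)

def nu_k(G, k):
    """nu_k(G) = nu(G ∩ [-k, k]) / nu([-k, k]) for an interval G = (lo, hi)."""
    lo, hi = G
    return length(max(lo, -k), min(hi, k)) / length(-k, k)

G = (-3.0, 10.0)                          # nu(G) = 13
for k in (1, 2, 5, 20, 100):
    mass = nu_k((-k, k), k)               # total mass of nu_k: always 1
    approx = length(-k, k) * nu_k(G, k)   # nu([-k,k]) * nu_k(G), increases to nu(G)
    print(k, mass, approx)
```

The same monotone approximation works verbatim for any reference measure that is finite and positive on the cubes [−k, k]^d.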
3.3 Proof of Theorem 3.1.1: First Half

In this subsection we prove the upper bound of Theorem 3.1.1. Recalling the definition of e(λ), λ ≻ 0 in R_+^N, this is equivalent to proving that

    P_ν{ X([0, e(2λ)]) ∩ E ≠ ∅ } ≤ A_2 C_{r_λ}(E).   (1)

(Why?) This is a natural time to ponder over measurability issues. Define E_n to be the open 1/n-enlargement of E. That is,

    E_n = { x ∈ R^d : dist(x; E) < 1/n },   n ≥ 1.

Of course, n need not be an integer for the above to make sense. Clearly, the closures Ē_1, Ē_2, … are all compact subsets of R^d. Moreover, for any s ∈ R_+^N and for all ω ∈ Ω (the underlying sample space),

    X([0, s])(ω) ∩ E ≠ ∅ ⇐⇒ ∀ n ≥ 1 : X([0, s])(ω) ∩ E_n ≠ ∅.

We have used the right continuity of t → X_t. Since E_n is open, this right continuity implies that the collection of all ω ∈ Ω such that X([0, s])(ω) ∩ E_n ≠ ∅ is measurable (why?). With measurability settled, we proceed with the main body of our proof. Let ϕ : R^d → R_+ denote an arbitrary measurable function and recall J_λ(ϕ) from equation (1) of Section 3.2. Also recall the probability measures ν_1, ν_2, … and the associated N-parameter processes M^1, M^2, … from equations (3) and (4) of Section 3.2, respectively. Note that when ϕ is bounded (say), the M^k's are N-parameter martingales. (In general, integrability need not hold.) We shall use these processes in the main step in the proof of equation (1) and show that for every n ≥ 1 and all k ≥ 1 sufficiently large,
    P_{ν_k}{ X([0, e(2λ)]) ∩ E_n ≠ ∅ } ≤ A_2 C_{r_λ}(E_n) / ν([−k, k]^d).   (2)

This is the first reduction.

Lemma 3.3.1 Equation (2) implies equation (1).
Proof Clearly, for each Borel set G, ν([−k, k]^d) · P_{ν_k}(G) ↑ P_ν(G) as k ↑ ∞. Thus, equation (2) implies

    P_ν{ X([0, e(2λ)]) ∩ E_n ≠ ∅ } ≤ A_2 C_{r_λ}(E_n).

Since E_n ↓ E as n → ∞, by the countable additivity of P_ν,

    lim_{n→∞} P_ν{ X([0, e(2λ)]) ∩ E_n ≠ ∅ } = P_ν{ X([0, e(2λ)]) ∩ E ≠ ∅ },

by compactness. It remains to show that lim_{n→∞} C_{r_λ}(E_n) = C_{r_λ}(E). But this follows from the outer regularity of capacities; cf. Lemma 2.1.2, Appendix D.

Before proving equation (2) in earnest, we make a few observations. First, we note that since E_n is open and since ν is a reference measure, ν(E_n) > 0. Using this, together with Lemma 1.4.1, we deduce that for all t ∈ R_+^N,

    P_ν(X_t ∈ E_n) = P_ν(X_0 ∈ E_n) = ν(E_n) > 0.   (3)
Our second observation is a mere application of Supplementary Exercise 11. Namely, we can find a (Q_+^N ∪ {∞})-valued random vector T such that

    T(ω) ∈ Q_+^N ∩ [0, e(2λ)] ⇐⇒ ∃ t ∈ Q_+^N ∩ [0, e(2λ)] : X_t ∈ E_n.   (4)

It should be noted that by its very definition, T = ∞ if and only if T ∉ Q_+^N. For this random vector T, we have the following statement.

Lemma 3.3.2 For all k ≥ 1, the following L²(P_{ν_k}) estimate holds:

    E_{ν_k}[ sup_{t∈Q_+^N} |M_t^k|² ] ≥ ( E_{ν_k}[ R_λ ϕ(X_T) | T ≼ e(2λ) ] )² · P_{ν_k}{ T ≼ e(2λ) }.

Proof Applying (4), Lemma 3.2.3, and the countable additivity of P_{ν_k}, all in conjunction, we obtain

    sup_{t∈Q_+^N} M_t^k ≥ e^{−λ·T} R_λ ϕ(X_T) 1_{(T ∈ Q_+^N)},   P_{ν_k}-a.s.

(Why?) We can square both sides of the above and take E_{ν_k} expectations to see that

    E_{ν_k}[ sup_{t∈Q_+^N} |M_t^k|² ] ≥ E_{ν_k}[ e^{−2λ·T} {R_λ ϕ(X_T)}² 1_{(T ≠ ∞)} ]
        = E_{ν_k}[ 1_{(T ≼ e(2λ))} {R_λ ϕ(X_T)}² 1_{(T ≠ ∞)} ]
        = E_{ν_k}[ 1_{(T ≼ e(2λ))} {R_λ ϕ(X_T)}² ]
        = E_{ν_k}[ {R_λ ϕ(X_T)}² | T ≼ e(2λ) ] · P_{ν_k}{ T ≼ e(2λ) }.
We have used the independence of e from the entire process X, together with equation (2) of Section 3.2. By the Cauchy–Schwarz inequality, for all nonnegative random variables Z,

    E_{ν_k}[ Z² | T ≼ e(2λ) ] ≥ ( E_{ν_k}[ Z | T ≼ e(2λ) ] )².

The result follows from this upon letting Z = R_λ ϕ(X_T).

For all k ≥ 1 and all Borel sets F ⊂ R^d, define

    µ_k(F) = P_{ν_k}{ X_T ∈ F | T ≼ e(2λ) }.   (5)
Lemma 3.3.3 There exists k_0 ≥ 1 such that for every k ≥ k_0, the measure µ_k is absolutely continuous with respect to ν_k, and hence with respect to ν. Moreover, µ_k ∈ P(Ē_n).

Proof We can see, from the right continuity of t → X_t and from equation (3), that µ_k ∈ P(Ē_n) for all k ≥ 1. Furthermore, we can appeal to equation (4) to see that on {T ≠ ∞}, X_T ∈ {X_t : t ∈ Q_+^N}. Now suppose there exists a Borel set G ⊂ R^d such that ν_k(G) = 0. By invoking equation (4), we obtain

    µ_k(G) ≤ Σ_{t∈Q_+^N} P_{ν_k}(X_t ∈ G) / P_{ν_k}{T ≼ e(2λ)} = Σ_{t∈Q_+^N} ν_k(G) / P_{ν_k}{T ≼ e(2λ)} = 0,

as long as P_{ν_k}{T ≼ e(2λ)} > 0. However,

    lim_{k→∞} ν([−k, k]^d) · P_{ν_k}{T ≼ e(2λ)} = P_ν{T ≼ e(2λ)} > 0,

by equation (3). This proves the absolute continuity of µ_k for all k sufficiently large.

Lemma 3.3.4 Let µ_k be defined by (5) and let k_0 be the constant given in Lemma 3.3.3. Then, for all k ≥ k_0, for any measurable ϕ : R^d → R_+, and for all λ ≻ 0 in R_+^N,

    (8^N / Π_{j=1}^N λ^(j)) ∫∫ r_λ(x, y) ϕ(x) ϕ(y) ν(dx) ν(dy)
        ≥ [ ∫_{R^d} R_λ ϕ(y) µ_k(dy) ]² · P_{ν_k}{T ≼ e(2λ)} · ν([−k, k]^d).
Proof Truncate ϕ and apply Lebesgue’s monotone convergence theorem to see that, without loss of generality, ϕ can be assumed to be bounded. In particular, for each k ≥ k0 , M k is now an N -parameter martingale!
Combine Lemma 3.3.2 and the definition of µ_k to see that

    E_{ν_k}[ sup_{t∈Q_+^N} {M_t^k}² ] ≥ ( E_{ν_k}[ R_λ ϕ(X_T) | T ≼ e(2λ) ] )² · P_{ν_k}{T ≼ e(2λ)}
        = [ ∫_{R^d} R_λ ϕ(y) µ_k(dy) ]² · P_{ν_k}{T ≼ e(2λ)}.

We now employ Cairoli's second inequality (Theorem 2.3.2, Chapter 7) and deduce that

    4^N sup_{t∈R_+^N} E_{ν_k}[ |M_t^k|² ] ≥ [ ∫_{R^d} R_λ ϕ(y) µ_k(dy) ]² · P_{ν_k}{T ≼ e(2λ)}.

On the other hand, by Lemma 3.2.2,

    sup_{t∈R_+^N} E_{ν_k}[ {M_t^k}² ] ≤ E_{ν_k}[ {J_λ(ϕ)}² ]
        ≤ (1 / ν([−k, k]^d)) · E_ν[ |J_λ(ϕ)|² ]
        = (1 / ν([−k, k]^d)) · (2^N / Π_{j=1}^N λ^(j)) ∫∫ r_λ(x, y) ϕ(x) ϕ(y) ν(dx) ν(dy).

The lemma follows readily from this.

We are ready to conclude the proof of the first half of Theorem 3.1.1.
Proof of Equation (2) A little thought shows that equation (2) is equivalent to the following: For all k ≥ k_0,

    P_{ν_k}{ T ≼ e(2λ) } ≤ A_2 C_{r_λ}(E_n) / ν([−k, k]^d).   (6)

This follows from equation (4) above, used in conjunction with equation (2) of Section 3.2 and the independence of T and e(2λ). Thus, we may (and will) assume without loss of generality that the above probability is strictly positive, for otherwise there is nothing to prove. The previous lemma holds for any measurable function ϕ. We now choose a "good" ϕ in order to finish. Henceforth, let us fix large finite constants ℓ > 0 and k ≥ k_0 and define

    ϕ(x) = (dµ_k/dν_k)(x) · 1_{Θ(ℓ)}(x),   x ∈ R^d,

where

    Θ(ℓ) = { x ∈ Ē_n : (dµ_k/dν)(x) ≤ ℓ }.
It is very important to note that by picking k_0 large enough (chosen independently of our choice of ℓ), we can ensure that

    ϕ(x) = ν([−k, k]^d) · (dµ_k/dν)(x) · 1_{Θ(ℓ)}(x),

since for all n ≥ 1, Ē_n ⊂ Ē_1 is bounded. By Lemma 3.3.3, this definition is well posed. For this choice of ϕ, the expression in Lemma 3.3.4 can be efficiently estimated, viz.,

    ∫ R_λ ϕ(x) µ_k(dx) = ν([−k, k]^d) ∫∫ r_λ(x, y) 1_{Θ(ℓ)}(y) µ_k(dy) µ_k(dx)
        ≥ ν([−k, k]^d) ∫_{Θ(ℓ)} ∫_{Θ(ℓ)} r_λ(x, y) µ_k(dx) µ_k(dy)
        = ν([−k, k]^d) · E_{r_λ}(µ_k|_{Θ(ℓ)}),

where µ_k|_F(Γ) = µ_k(F ∩ Γ), for any measurable set Γ, denotes the restriction of µ_k to F. Hence, for the above choice of ϕ, Lemma 3.3.4 becomes

    P_{ν_k}{T ≼ e(2λ)} · [E_{r_λ}(µ_k|_{Θ(ℓ)})]² · ν([−k, k]^d) ≤ (8^N / Π_{j=1}^N λ^(j)) · E_{r_λ}(µ_k|_{Θ(ℓ)}).   (7)

However, by Exercise 1.4.2,

    E_{r_λ}(µ_k|_{Θ(ℓ)}) = ∫ ϕ(x) R_λ ϕ(x) ν_k(dx)
        ≤ ℓ ν([−k, k]^d) ∫ R_λ ϕ(x) ν_k(dx)
        ≤ ℓ ∫ ϕ(x) R_λ 1(x) ν(dx)
        = ℓ ν([−k, k]^d) Π_{j=1}^N (λ^(j))^{−1} · µ_k(Θ(ℓ)),

where 1(x) = 1 for all x ∈ R^d, and where we have used the symmetry of r_λ together with the following calculation, which we also encountered earlier:

    R_λ 1(x) = ∫_{R_+^N} e^{−λ·s} ds = Π_{j=1}^N (λ^(j))^{−1}.
Since µ_k(Θ(ℓ)) ≤ µ_k(R^d) = 1, we can conclude that E_{r_λ}(µ_k|_{Θ(ℓ)}) < ∞. We claim that this expression is also strictly positive, as long as ℓ is large enough. Indeed, by Fubini's theorem and the monotone convergence theorem, µ_k|_{Θ(ℓ)} ↑ µ_k setwise as ℓ ↑ ∞, and hence

    lim inf_{ℓ→∞} E_{r_λ}(µ_k|_{Θ(ℓ)}) = E_{r_λ}(µ_k) = ∫∫ r_λ(x, y) µ_k(dx) µ_k(dy) > 0,

since µ_k(R^d) = 1. What all this means is that we can divide both sides of equation (7) by the square of E_{r_λ}(µ_k|_{Θ(ℓ)}) if ℓ is sufficiently large. In other words, for all ℓ large,

    ν([−k, k]^d) · P_{ν_k}{T ≼ e(2λ)} ≤ (8^N / Π_{j=1}^N λ^(j)) C^{ac}_{r_λ}(Θ(ℓ)) ≤ (8^N / Π_{j=1}^N λ^(j)) C^{ac}_{r_λ}(Ē_n),

where C^{ac}_{r_λ}(G) is the absolutely continuous capacity of G with respect to the gauge function r_λ; cf. Section 2.3 of Appendix D. Since r_λ is a symmetric and proper gauge function, Theorem 2.3.1 of Appendix D reveals that on compact sets, C^{ac}_{r_λ} = C_{r_λ}. This verifies equation (6) and hence equation (2). The first half of the proof of Theorem 3.1.1 is complete.
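The product formula R_λ1 = Π_{j=1}^N (λ^(j))^{−1}, used twice above, is nothing but N one-dimensional exponential integrals multiplied together. The following sketch is a numeric sanity check only; the particular N and λ are arbitrary choices, not values from the text.

```python
# Numeric sanity check of R_lambda 1 = prod_j 1/lambda^(j), i.e. the
# factorization of int_{R_+^N} exp(-lambda . s) ds into 1-d integrals.
import math

def one_dim_integral(lam, T=60.0, steps=200_000):
    """Midpoint-rule approximation of int_0^T exp(-lam * s) ds."""
    h = T / steps
    return sum(math.exp(-lam * (i + 0.5) * h) for i in range(steps)) * h

lams = [0.5, 1.0, 2.5]            # an arbitrary lambda with N = 3
numeric = 1.0
for lam in lams:
    numeric *= one_dim_integral(lam)

exact = 1.0
for lam in lams:
    exact /= lam                  # prod_j (lambda^(j))^{-1}

print(numeric, exact)             # both close to 1/(0.5 * 1.0 * 2.5) = 0.8
```

The truncation at T = 60 is harmless here because e^{−λT} is negligible for every λ^(j) used.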
3.4 Proof of Theorem 3.1.1: Second Half

From Section 3.3 we recall the open sets E_n ↓ E, as n → ∞, where E_n denotes the open 1/n-enlargement of E. Suppose that n ≥ 1 is a fixed integer and that ϕ : R^d → R_+ is a nontrivial measurable function such that ϕ(x) = 0 for all x ∉ E_n. By equation (3) of Section 3.2, for any λ ≻ 0 in R_+^N,

    P_ν( J_λ(ϕ) > 0 ) ≤ P_ν{ X([0, e(λ)]) ∩ E_n ≠ ∅ }.

Therefore, we can use our proof⁸ of the Paley–Zygmund lemma (Lemma 1.4.1, Chapter 3) to obtain

    P_ν{ X([0, e(λ)]) ∩ E_n ≠ ∅ } ≥ ( E_ν[J_λ(ϕ)] )² / E_ν[ {J_λ(ϕ)}² ].

Lemmas 3.2.1 and 3.2.2 together imply that

    P_ν{ X([0, e(λ)]) ∩ E_n ≠ ∅ } ≥ (1 / (2^N Π_{j=1}^N λ^(j))) · [ ∫ ϕ(x) ν(dx) ]² · [ ∫∫ r_λ(x, y) ϕ(x) ϕ(y) ν(dx) ν(dy) ]^{−1}.
8 While we can use the proof of the Paley–Zygmund lemma, we cannot invoke it as it appears in Chapter 3, since Pν is not necessarily a probability measure. Alternatively, we can replace ν by νk and work with Pνk first. You may want to try this on your own.
We should recognize that this holds for all measurable ϕ : R^d → R_+ such that ϕ(x) = 0 off of E_n. Equivalently, suppose ζ is any probability measure on the closure of E_n that is absolutely continuous with respect to ν. Then,

    P_ν{ X([0, e(λ)]) ∩ E_n ≠ ∅ } ≥ (1 / (2^N Π_{j=1}^N λ^(j))) · [ E_{r_λ}(ζ) ]^{−1}.

Optimizing over such ζ's, we can conclude that

    P_ν{ X([0, e(λ)]) ∩ E_n ≠ ∅ } ≥ (1 / (2^N Π_{j=1}^N λ^(j))) · C^{ac}_{r_λ}(Ē_n),

where C^{ac}_{r_λ} denotes the absolutely continuous capacity with respect to the gauge function r_λ; cf. Section 2.3 of Appendix D. Since r_λ is a symmetric, proper gauge function on R^d × R^d, Theorem 2.3.1 of Appendix D ensures us that absolutely continuous capacities agree with capacities on compact sets. The proof of the lower bound of Theorem 3.1.1 follows once we observe that C_{r_λ}(E_n) ≥ C_{r_λ}(E) and then let n → ∞.

Exercise 3.4.1 Recall that in the N-parameter theory, we defined ν to be a reference measure if ν(G) > 0 for all open sets G ⊂ R^d. This was not needed to establish the 1-parameter potential theory of Chapter 10. Show that it is not needed in the N-parameter case, either. To be more precise, show that whenever ν(G) = 0 for an open set G ⊂ R^d, the P_ν-probability that the image of X ever intersects G is zero. Conclude that Theorem 3.1.1 holds without this positivity condition on ν. (Hint: Use the right continuity of the N-parameter process t → X_t.)
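The second-moment step in this subsection is the classical Paley–Zygmund bound: for a nonnegative random variable J, P(J > 0) ≥ (E J)² / E(J²). The sketch below illustrates it on a hand-picked discrete random variable; the values and probabilities are arbitrary, not drawn from the text.

```python
# Paley-Zygmund: P(J > 0) >= (E J)^2 / E(J^2) for a nonnegative J.
# Discrete example: J takes values 0, 1, 3 with probabilities 1/2, 1/4, 1/4.
values = [0.0, 1.0, 3.0]
probs = [0.5, 0.25, 0.25]

mean = sum(v * p for v, p in zip(values, probs))          # E J = 1.0
second = sum(v * v * p for v, p in zip(values, probs))    # E J^2 = 2.5
pz_bound = mean ** 2 / second                             # = 0.4
p_positive = sum(p for v, p in zip(values, probs) if v > 0)  # = 0.5

print(pz_bound, p_positive)    # 0.4 <= 0.5, as the lemma predicts
```

In the proof above, J = J_λ(ϕ), and the two moments are supplied by Lemmas 3.2.1 and 3.2.2.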
4 Applications

Having demonstrated Theorem 3.1.1, we are in a position to apply it to analyze the fractal structure of a large class of interesting stochastic processes. In this section we present some such applications of Theorem 3.1.1.
4.1 Additive Stable Processes

Consider N independent, R^d-valued, isotropic stable Lévy processes X^1, …, X^N of index α ∈ ]0, 2], and define the additive stable process of index α, X = (X_t ; t ∈ R_+^N), as X = X^1 ⊕ ⋯ ⊕ X^N. That is,

    X_t = Σ_{j=1}^N X^j_{t^(j)},   t ∈ R_+^N.
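When α = 2 each X^j is a Brownian motion, and the sum above is easy to simulate coordinatewise. The sketch below is a simulation aid, not part of the text; the grid-free sampling scheme and all parameters are arbitrary. It samples one real-valued coordinate of X_t for a fixed two-parameter time t and checks that its variance is t^(1) + ⋯ + t^(N), as the independent-sum structure dictates.

```python
# Simulating one coordinate of an additive Brownian motion (alpha = 2):
# X_t = sum_j X^j_{t^(j)} with the X^j independent Brownian motions, so for
# fixed t each coordinate is Gaussian with variance sum_j t^(j).
import random

random.seed(0)
t = (1.0, 2.0)            # a fixed two-parameter time point (N = 2)
n_samples = 200_000

def sample_X(t):
    # X^j_{t^(j)} ~ N(0, t^(j)); sum the N independent one-parameter parts.
    return sum(random.gauss(0.0, tj ** 0.5) for tj in t)

samples = [sample_X(t) for _ in range(n_samples)]
var = sum(x * x for x in samples) / n_samples
print(var)                # close to t[0] + t[1] = 3.0
```

For general α ∈ ]0, 2[ the same additive structure holds, but the one-parameter marginals would have to be sampled from a stable law instead of a Gaussian.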
Since X is an additive Lévy process, the measure-theoretic details of such a construction have already been worked out in Section 2.3 above. Recall from Section 3.4 that X^1, …, X^N have (the same) γ-potential densities (r^1_γ(x, y); γ > 0, x, y ∈ R^d), given by r^1_γ(x, y) = u^1_γ(y − x), where

    u^1_γ(a) = ∫_0^∞ e^{−γs} q_s(a) ds,   a ∈ R^d, γ > 0,

where q_s(a) denotes the density function of X^1_s at a, with respect to Lebesgue's measure on R^d. By equation (3) of Section 2.3, X has λ-potential densities (r_λ(x, y); λ ≻ 0 in R_+^N, x, y ∈ R^d) with respect to Lebesgue's measure on R^d. Moreover, r_λ(x, y) = u_λ(y − x), where

    u_λ(a) = u^1_{λ^(1)} ∗ ⋯ ∗ u^1_{λ^(N)}(a),   a ∈ R^d, λ ≻ 0 in R_+^N,   (1)

where ∗ denotes convolution. We now estimate this λ-potential density.

Proposition 4.1.1 Suppose d > Nα. Then, the λ-potential density r_λ is a symmetric, proper gauge function on R^d × R^d. In particular, X is a strongly symmetric Markov process whose reference measure is Lebesgue's measure on R^d. Moreover, for every λ ≻ 0 in R_+^N, there exists a finite constant A > 1 such that for all x, y ∈ R^d with ‖x − y‖ ≤ 1/2,

    (1/A) ‖x − y‖^{−d+Nα} ≤ r_λ(x, y) ≤ A ‖x − y‖^{−d+Nα}.

Proof Throughout, we assume that d > Nα and λ ≻ 0 (in R_+^N) are fixed. The proof of this proposition is divided into two general parts: an upper bound on r_λ and a lower bound on r_λ. The verification of the remaining assertions is deferred to Exercise 4.1.2 below. For the upper bound, we propose to prove more. Namely, we will show that there exists a finite constant A_1 > 1 such that for all a ∈ R^d,

    u_λ(a) ≤ A_1 ‖a‖^{−d+Nα}.   (2)
When N = 1 and ‖a‖ ≤ 1, this follows from Lemma 3.4.1 of Chapter 10. We demonstrate equation (2) by induction on N. Define U^1(a) = u^1_{λ^(1)}(a) and then, iteratively for all n ∈ {2, …, N},

    U^n(a) = U^{n−1} ∗ u^1_{λ^(n)}(a),   a ∈ R^d.

It is important to note that U^N(a) = u_λ(a). Equation (2) clearly follows once we show that for every n ∈ {1, …, N}, there exists a finite constant A_2 > 1 such that

    U^n(a) ≤ A_2 ‖a‖^{−d+nα},   a ∈ R^d.   (3)
We have already observed that when n = 1, this holds by Lemma 3.4.1 of Chapter 10. Supposing the above holds with n replaced by any k ≤ n − 1, we will verify it for n. The proof (and, in fact, the inequality (3) itself) is very close to that of Lemma 3.3.1 of Chapter 3. Henceforth, we assume that equation (3) holds with n replaced by n − 1, where n ∈ {1, …, N} and d > Nα ≥ nα. By the definition of a convolution and by the induction hypothesis, for all a ∈ R^d,

    U^n(a) = ∫_{R^d} U^{n−1}(b − a) u^1_{λ^(n)}(b) db
           ≤ A_3 ∫_{R^d} ‖b − a‖^{−d+(n−1)α} · ‖b‖^{−d+α} db,

for some finite constant A_3 > 1. We split this integral into three pieces, U^n(a) ≤ A_3 (T_1 + T_2 + T_3), where

    T_1 = ∫_{b∈R^d: ‖b‖>2‖a‖} ‖b − a‖^{−d+(n−1)α} · ‖b‖^{−d+α} db,
    T_2 = ∫_{b∈R^d: ‖b‖<2‖a‖, ‖b−a‖≤‖a‖/2} ‖b − a‖^{−d+(n−1)α} · ‖b‖^{−d+α} db,
    T_3 = ∫_{b∈R^d: ‖b‖<2‖a‖, ‖b−a‖≥‖a‖/2} ‖b − a‖^{−d+(n−1)α} · ‖b‖^{−d+α} db,

and estimate each term in turn. For the b's in the integral of T_1, ‖b‖ ≤ ‖b − a‖ + ‖a‖ ≤ ‖b − a‖ + ½‖b‖. In other words, ‖b − a‖ ≥ ½‖b‖. Thus,

    T_1 ≤ 2^{d−(n−1)α} ∫_{b∈R^d: ‖b‖>2‖a‖} ‖b‖^{−2d+nα} db
        = 2^{d−(n−1)α} ‖a‖^{−d+nα} ∫_{ξ∈R^d: ‖ξ‖>2} ‖ξ‖^{−2d+nα} dξ.   (4)

By polar-coordinates calculations (Supplementary Exercise 7, Chapter 3), the above integral equals a constant times ∫_2^∞ x^{−d+nα−1} dx, which is finite, since d > nα for all n ≤ N; this estimates T_1. To estimate T_2, note that for all b ∈ R^d with ‖b‖ ≤ 2‖a‖ and ‖b − a‖ ≤ ½‖a‖, we have ‖b‖ ≥ ‖a‖ − ‖b − a‖ ≥ ½‖a‖. Thus,

    T_2 ≤ 2^{d−α} ‖a‖^{−d+α} ∫_{b∈R^d: ‖b−a‖≤‖a‖/2} ‖b − a‖^{−d+(n−1)α} db
        = 2^{d−α} ‖a‖^{−d+α} ∫_{b∈R^d: ‖b‖≤‖a‖/2} ‖b‖^{−d+(n−1)α} db
        = 2^{d−α} ‖a‖^{−d+nα} ∫_{ξ∈R^d: ‖ξ‖≤1/2} ‖ξ‖^{−d+(n−1)α} dξ,   (5)
and the latter integral is finite, as can be seen by once again calculating in polar coordinates; cf. also Lemma 3.1.2 of Chapter 3. Finally, we have

    T_3 ≤ 2^{d−(n−1)α} ‖a‖^{−d+(n−1)α} ∫_{b∈R^d: ‖b‖<2‖a‖} ‖b‖^{−d+α} db
        = 2^{d−(n−1)α} ‖a‖^{−d+nα} ∫_{b∈R^d: ‖b‖<2} ‖b‖^{−d+α} db.

Since the above integral is finite, we have deduced the existence of a finite constant A_4 > 1 such that for all a ∈ R^d, U^n(a) ≤ A_4 ‖a‖^{−d+nα}. This proves (3), and the asserted upper bound (2) follows by induction.

The last part of our proof is concerned with verifying the lower bound on the λ-potential density. We will show that there exists a finite constant A_5 > 1 such that for all a ∈ R^d with ‖a‖ ≤ 1/2,

    u_λ(a) ≥ (1/A_5) ‖a‖^{−d+Nα}.

The proposition follows upon letting A = max{A_1, A_5}. We will show that for all n ∈ {1, …, N}, there exists a finite constant A_6 > 1 such that for every a ∈ R^d with ‖a‖ ≤ 1/2,

    U^n(a) ≥ (1/A_6) ‖a‖^{−d+nα}.   (6)

This would finish the proof. When n = 1, (6) follows from the lower bound in Lemma 3.4.1, Chapter 10. Supposing (6) holds with n replaced by any k ≤ n − 1, we will show it holds for n. By the induction hypothesis, there exists a finite constant A_7 > 1 such that whenever ‖a‖ ≤ 1/2,

    U^n(a) = ∫_{R^d} U^{n−1}(b − a) u^1_{λ^(n)}(b) db
           ≥ ∫_{b∈R^d: ‖b−a‖≤‖a‖/4, ‖b‖<1} U^{n−1}(b − a) u^1_{λ^(n)}(b) db
           ≥ (1/A_7) ∫_{b∈R^d: ‖b−a‖≤‖a‖/4, ‖b‖<1} ‖b − a‖^{−d+(n−1)α} · ‖b‖^{−d+α} db.

Equation (6), and hence the proposition, follows from arguments similar to those used in estimating T_2 above; cf. Exercise 4.1.1 for details.

Exercise 4.1.1 Complete the verification of equation (6).
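The polar-coordinate reductions above all hinge on radial tail integrals of the form ∫_2^∞ r^{nα−d−1} dr, which converge precisely because d > nα. The sketch below checks one such integral numerically; the particular choice d = 3, n = 1, α = 1 is arbitrary and only for illustration.

```python
# The tail integral int_2^inf r^(n*alpha - d - 1) dr converges iff d > n*alpha.
# For d = 3, n = 1, alpha = 1 the exponent is -3 and the exact value is
# 2^(n*alpha - d) / (d - n*alpha) = (1/4) / 2 = 1/8.
d, n, alpha = 3, 1, 1.0
expo = n * alpha - d - 1              # = -3

def tail_integral(T, steps=400_000):
    """Midpoint-rule approximation of int_2^T r^expo dr."""
    h = (T - 2.0) / steps
    return sum((2.0 + (i + 0.5) * h) ** expo for i in range(steps)) * h

print(tail_integral(50.0), tail_integral(200.0))   # both near 1/8 = 0.125
```

Increasing the truncation point T only adds the vanishing tail, so the approximations stabilize near the exact value, in contrast to the divergent regime d ≤ nα.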
Exercise 4.1.2 Prove the remaining assertions of Proposition 4.1.1 by showing that X is a strongly symmetric Markov process whose reference measure is Lebesgue's measure on R^d.

Exercise 4.1.3 Prove that in the above setting, r_λ(x, y) > 0 for all λ ≻ 0 in R_+^N and all x, y ∈ R^d.

In order to present our first application of Theorem 3.1.1, let ν denote Lebesgue's measure on R^d. By Theorem 2.3.2, X is a strongly symmetric Lévy process with reference measure ν. By Exercise 4.1.3, all of the conditions of Corollary 3.1.1 are met. Therefore, we can deduce the following:

Theorem 4.1.1 Suppose X is an R^d-valued, N-parameter additive stable process of index α ∈ ]0, 2] with d > Nα. If E ⊂ R^d is compact, then for all x ∉ E,

    P_x{ X(R_+^N) ∩ E ≠ ∅ } > 0 ⇐⇒ Cap_{d−Nα}(E) > 0.

In particular, by invoking Frostman's theorem (Theorem 2.2.1, Appendix C), we can deduce the following multiparameter analogue of Theorem 3.5.1, Chapter 10.

Corollary 4.1.1 If E ⊂ R^d is compact, then for all x ∉ E,

    dim(E) > d − Nα ⟹ P_x{ X(R_+^N) ∩ E ≠ ∅ } > 0,
    dim(E) < d − Nα ⟹ P_x{ X(R_+^N) ∩ E ≠ ∅ } = 0,

where dim denotes Hausdorff dimension.

The proof of Theorem 4.1.1 has another corollary, which we state as the following important exercise.

Exercise 4.1.4 (Hard) For all β, M > 0, there exist constants A_1 and A_2 such that for all compact sets E ⊂ [−M, M]^d,

    A_1 Cap_β(E) ≤ Cap^{ac}_β(E) ≤ A_2 Cap_β(E),

where Cap_β denotes the β-dimensional Bessel–Riesz capacity and Cap^{ac}_β designates the absolutely continuous capacity with respect to the gauge function x ↦ ‖x‖^{−β}, where absolute continuity is understood to hold with respect to Lebesgue's measure; see Section 2.2, Appendix D, for the definition of absolutely continuous capacities. (In fact, using Fourier analysis, one can show that A_1 = A_2 = 1.) Prove also that the above has an analogue when β = 0. (Hint: Prove the following refinement of Proposition 4.1.1: For all M > 0, there are positive and finite constants A_1 and A_2 such that

    A_1 ( 1 ∨ ‖x − y‖^{−d+Nα} ) ≤ r_λ(x, y) ≤ A_2 ( 1 ∨ ‖x − y‖^{−d+Nα} ),
for all x, y ∈ [−M, M]^d and all λ ≻ 0 in R_+^N.)
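Capacity statements such as Theorem 4.1.1 become concrete on standard fractals. For the middle-thirds Cantor set (Hausdorff dimension log 2 / log 3 ≈ 0.63), the natural coin-tossing measure has finite β-energy for β below the dimension, and the energy estimates grow as β increases toward it. The following Monte Carlo sketch is purely illustrative (seeded; sample sizes arbitrary) and is not a computation from the text.

```python
# Monte Carlo estimate of the beta-dimensional Riesz energy
#   Energy_beta(mu) = E |U - V|^(-beta), U, V independent with law mu,
# for mu the natural measure on the middle-thirds Cantor set.
import random

random.seed(1)

def cantor_point(depth=40):
    """A mu-distributed point: ternary digits drawn uniformly from {0, 2}."""
    x, scale = 0.0, 1.0
    for _ in range(depth):
        scale /= 3.0
        x += random.choice((0, 2)) * scale
    return x

pts = [cantor_point() for _ in range(4000)]

def energies(betas, sample_pairs=100_000):
    """Estimate Energy_beta(mu) for several betas over the same random pairs."""
    sums = [0.0] * len(betas)
    for _ in range(sample_pairs):
        u, v = random.choice(pts), random.choice(pts)
        if u == v:
            continue
        dist = abs(u - v)
        for i, beta in enumerate(betas):
            sums[i] += dist ** (-beta)
    return [s / sample_pairs for s in sums]

e_small, e_large = energies([0.3, 0.6])
print(e_small, e_large)    # the estimate grows as beta increases
```

Since both betas are evaluated on the same sampled pairs and the points lie in [0, 1], the estimate for β = 0.3 never exceeds the one for β = 0.6, mirroring the monotonicity of Riesz energies in β on sets of diameter at most one.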
4.2 Intersections of Independent Processes

To present another application of Corollary 3.1.1 of Theorem 3.1.1, we investigate the problem of when the trajectories of N independent, strongly symmetric multiparameter Feller processes intersect. To this end, we suppose that X^1, …, X^N are N independent, R^d-valued, strongly symmetric multiparameter Feller processes. To be concrete, we suppose that for each 1 ≤ i ≤ N, X^i has M_i parameters, as well as a λ-potential density r^i_λ for every λ ≻ 0 in R_+^{M_i}. We shall also denote the reference measure of X^i by ν_i. As in Section 2.4, we define X = X^1 ⊗ ⋯ ⊗ X^N to be the M-parameter, R^{Nd}-valued stochastic process, where M = M_1 + ⋯ + M_N. By Theorem 2.4.1, X is a strongly symmetric, R^{Nd}-valued, M-parameter Feller process. Moreover, for all λ ≻ 0 (in R_+^M), X has a λ-potential density r_λ that is a symmetric, proper gauge function on R^{Nd} × R^{Nd} and is described by the following:

    r_λ(x, y) = Π_{j=1}^N r^j_{λ_j}(x_j, y_j),   x, y ∈ R^{Nd}, λ ≻ 0 in R_+^M,

where any x ∈ R^{Nd} is written in direct product notation as x_1 ⊗ ⋯ ⊗ x_N, where x_i ∈ R^d for all 1 ≤ i ≤ N; similarly, λ ≻ 0 (in R_+^M) is written as λ_1 ⊗ ⋯ ⊗ λ_N, where λ_i ≻ 0 (in R_+^{M_i}) for all 1 ≤ i ≤ N. By invoking Theorem 3.1.1, we can conclude that for all compact sets F ⊂ R^{Nd},

    P_ν{ X(R_+^M) ∩ F ≠ ∅ } > 0 ⇐⇒ C_{r_λ}(F) > 0.

Equivalently, P_ν{X(R_+^M) ∩ F ≠ ∅} > 0 if and only if there exists a probability measure σ ∈ P(F) such that E_{r_λ}(σ) < ∞ (why?). Now we turn to the question of intersections. Note that for any compact set E ⊂ R^d,

    ∩_{i=1}^N X^i(R_+^{M_i}) ∩ E ≠ ∅ ⇐⇒ X(R_+^M) ∩ D_E ≠ ∅,

where D_E = {x ⊗ ⋯ ⊗ x : x ∈ E}. In words, the (closed) ranges of the processes X^1, …, X^N intersect in some set E ⊂ R^d if and only if the M-parameter process X hits D_E. Hence, we can conclude that ∩_{i=1}^N X^i(R_+^{M_i}) ∩ E is nonvoid with positive probability if and only if there exists a probability measure σ ∈ P(D_E) such that E_{r_λ}(σ) < ∞. On the other hand, the set D_E is special; any probability measure σ on D_E must be of the form

    σ(dx_1 ⊗ ⋯ ⊗ dx_N) = µ(dx_1) δ_{x_1}(dx_2) ⋯ δ_{x_1}(dx_N),   x_i ∈ R^d, 1 ≤ i ≤ N,
where δ_a denotes a point mass at a ∈ R^d and µ ∈ P(E) (why?). It is now easy to check that when σ is of the above form,

    E_{r_λ}(σ) = ∫∫_{E×E} Π_{j=1}^N r^j_{λ_j}(x, y) µ(dx) µ(dy),

where λ = λ_1 ⊗ ⋯ ⊗ λ_N for λ_i ≻ 0 (in R_+^{M_i}), 1 ≤ i ≤ N. We have proven the following result; see Benjamini et al. (1995), Fitzsimmons and Salisbury (1989), Hirsch and Song (1994, 1995a, 1995c, 1995d), Hirsch (1995), Khoshnevisan (1997b, 1999), Khoshnevisan and Shi (1999), Pemantle et al. (1996), Peres (1996a), Ren (1990), and Salisbury (1992, 1996) for a collection of related results.
Theorem 4.2.1 Consider N independent, strongly symmetric, R^d-valued Feller processes X^1, …, X^N with potential densities r^1, …, r^N. If X^i has M_i parameters and reference measure ν_i (1 ≤ i ≤ N), then for any compact set E ⊂ R^d,

    P_ν{ ∩_{i=1}^N X^i(R_+^{M_i}) ∩ E ≠ ∅ } > 0 ⇐⇒ C_g(E) > 0,

where ν = ν_1 × ⋯ × ν_N and g(x, y) = Π_{j=1}^N r^j_{λ_j}(x, y) for any λ_i ≻ 0 (in R_+^{M_i}, 1 ≤ i ≤ N) and for all x, y ∈ R^d.

The following important corollary is proved in similar fashion to Corollary 3.1.1.

Corollary 4.2.1 Consider the processes of Theorem 4.2.1, and fix some compact set E ⊂ R^d. Suppose, in addition, that there exists some λ ≻ 0 in Π_{i=1}^N R_+^{M_i} and some x_0 ∉ E such that for all y ∈ R^d, r_λ(x_0, y) > 0. Then,

    P_{x_0}{ ∩_{i=1}^N X^i(R_+^{M_i}) ∩ E ≠ ∅ } > 0 ⇐⇒ C_g(E) > 0.

Here, g(x, y) = Π_{j=1}^N r^j_{λ_j}(x, y), where λ = λ_1 ⊗ ⋯ ⊗ λ_N with λ_j ≻ 0 in R_+^{M_j}, and x, y ∈ R^d.

Exercise 4.2.1 Prove Corollary 4.2.1. (Hint: This is an exercise in Laplace transforms; cf. also Appendix B.)

In the next subsection we will try to understand this result via a number of examples that are presented in various degrees of generality.
4.3 Dvoretzky–Erdős–Kakutani Theorems

Consider two independent R^d-valued, isotropic stable Lévy processes X^1 and X^2 with indices α_1 and α_2, respectively. We shall assume that X^1 and X^2 are both transient. That is, we assume that

    d > α_1 ∨ α_2;   (1)

cf. Theorem 1.3.1, Chapter 10. Let p^1 and p^2 denote their respective transition densities (with respect to Lebesgue's measure on R^d). That is, for each ℓ = 1, 2,

    p^ℓ_t(x, y) = (2π)^{−d} ∫_{R^d} exp( −iξ·(y − x) − t‖ξ‖^{α_ℓ}/2 ) dξ,   x, y ∈ R^d, t ≥ 0;

cf. Example 3, Section 4.3 of Chapter 10. By replacing ξ by −ξ in the above integral(s), we see that for every t ≥ 0, for all x, y ∈ R^d, and for ℓ = 1 or 2, p^ℓ_t(x, y) = p^ℓ_t(y, x). Consequently, we can appeal to the example of Section 1.4 to see that X^1 and X^2 are strongly symmetric, multiparameter Feller processes and that they both have Lebesgue's measure for their reference measures. Let X = X^1 ⊗ X^2 be the corresponding product process in the sense of Section 2.2. By Proposition 2.2.1, X is a two-parameter Feller process whose reference measure is Lebesgue's measure on R^{2d}. Combining these observations with Corollary 4.2.1 above and with Lemma 3.4.1 of Chapter 10, we can conclude the following:

Theorem 4.3.1 Let X^1 and X^2 denote two independent, R^d-valued isotropic stable Lévy processes with indices α_1 and α_2, respectively. Under the transience condition (1), for any compact set E ⊂ R^d and for all x = x_1 ⊗ x_2 with distinct x_1, x_2 ∈ R^d,

    P_x{ X^1(R_+) ∩ X^2(R_+) ∩ E ≠ ∅ } > 0 ⇐⇒ Cap_{2d−α_1−α_2}(E) > 0.
As was mentioned in Chapter 8, P_x should be thought of as the conditional measure of the process X, given that X^1_0 = x_1 ∈ R^d and X^2_0 = x_2 ∈ R^d, where x = x_1 ⊗ x_2, in direct product notation. In words, what we have shown thus far is that the (2d − α_1 − α_2)-dimensional Bessel–Riesz capacity of E is positive if and only if, with positive probability, the (closed) ranges of X^1 and X^2 intersect each other within E, as long as X starts outside E. We apply Frostman's theorem (Theorem 2.2.1, Appendix C) to Theorem 4.3.1 to deduce the following:
Corollary 4.3.1 In the setup of Theorem 4.3.1,

    dim(E) > 2d − α_1 − α_2 ⟹ P_x{ X^1(R_+) ∩ X^2(R_+) ∩ E ≠ ∅ } > 0,
    dim(E) < 2d − α_1 − α_2 ⟹ P_x{ X^1(R_+) ∩ X^2(R_+) ∩ E ≠ ∅ } = 0.

A particularly important case is obtained if we let E approximate R^d. In this case, we conclude that if x = x_1 ⊗ x_2 with distinct x_1, x_2 ∈ R^d,

    d < α_1 + α_2 ⟹ P_x{ X^1(R_+) ∩ X^2(R_+) ≠ ∅ } > 0,
    d > α_1 + α_2 ⟹ P_x{ X^1(R_+) ∩ X^2(R_+) ≠ ∅ } = 0,

with the added proviso that d > α_1 ∨ α_2; cf. equation (1). In particular, we can take α_1 = α_2 = 2 to find that the trajectories of two independent Brownian motions in R^d intersect if d = 3, but not if d ≥ 5, and this holds whenever their starting points are distinct. This characterization leaves out the cases d = 2 and d = 4. It also omits d = 1, which is the simplest case; cf. Exercise 4.3.3 below for the treatment of the one-dimensional case. We concentrate on d = 2, 4 next.

Let us first fix d = 4 and use the notation of Theorem 4.3.1. The argument that led to Theorem 4.3.1 shows that for all compact sets E ⊂ R^4 and all x = x_1 ⊗ x_2 with distinct x_1, x_2 ∈ R^4,

    P_x{ X^1(R_+) ∩ X^2(R_+) ∩ E ≠ ∅ } > 0 ⇐⇒ Cap_4(E) > 0.

Let E approximate R^4 arbitrarily well, and recall from Taylor's theorem that Cap_4(R^4) = 0; cf. Corollary 2.3.1 of Appendix C. This shows that if they have distinct starting points, the trajectories of two independent Brownian motions in R^4 do not intersect.

To conclude our discussion of trajectorial intersections of two independent Brownian motions, we need to address the case d = 2. Let X^1 and X^2 be two independent 2-dimensional Brownian motions. Lemma 3.1.2 of Chapter 10 can be used to show that for any compact set E ⊂ R^2, the following are equivalent:

• For all x = x_1 ⊗ x_2 with distinct x_1, x_2 ∈ R^2, P_x{ X^1(R_+) ∩ X^2(R_+) ∩ E ≠ ∅ } > 0.

• There exists a probability measure µ ∈ P(E) such that

    ∫∫ | ln_+(1/‖a − b‖) |² µ(da) µ(db) < ∞.
Exercise 4.3.2 Prove, in detail, that the above are equivalent.
We can apply this equivalence to E = R² (why? this is not completely obvious, since R² is not compact). We note that there exists ε_0 ∈ ]0, 1[ such that for all ζ ∈ [0, ε_0], |ln_+(1/ζ)|² ≤ ζ^{−1}. Thus, for all µ ∈ P(R²),

    ∫∫ | ln_+(1/‖a − b‖) |² µ(da) µ(db)
        ≤ 1 + ∫∫_{a,b∈R²: ‖a−b‖≤ε_0} ‖a − b‖^{−1} µ(da) µ(db)
        ≤ 1 + Energy_1(µ).

On the other hand, since H²(R²) = +∞, by Frostman's theorem, Cap_1(R²) > 0; cf. Theorem 2.2.1 of Appendix C. In particular, there must exist a probability measure µ on R² such that Energy_1(µ) < ∞. By the above estimate, ∫∫ |ln_+(1/‖a − b‖)|² µ(da) µ(db) < ∞ for such a µ ∈ P(R²). That is, whenever they have distinct starting points, the trajectories of two independent Brownian motions in R² intersect with positive probability. Combining the above discussions, we obtain the following result of Dvoretzky et al. (1954).

Theorem 4.3.2 (The Dvoretzky–Erdős–Kakutani Theorem) If X^1 and X^2 denote two independent d-dimensional Brownian motions with distinct starting points x_1 and x_2, respectively, then

    P_{x_1⊗x_2}{ X^1(R_+) ∩ X^2(R_+) ≠ ∅ } > 0 ⇐⇒ d ≤ 3.

We have not discussed the easiest case in the above theorem, which is when d = 1. This is worked out in the following exercise.

Exercise 4.3.3 Verify that the trajectories of two real-valued, independent Brownian motions intersect with positive probability. Supplementary Exercise 3 discusses other related results of Dvoretzky et al. (1954).
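The elementary bound |ln_+(1/ζ)|² ≤ ζ^{−1} invoked above in fact holds on all of ]0, 1]: a one-line calculus computation shows that ζ (ln(1/ζ))² is maximized at ζ = e^{−2}, with value 4e^{−2} < 1. The following sketch verifies this numerically (the grid is an arbitrary choice).

```python
# Verify zeta * (ln(1/zeta))^2 <= 1 on ]0, 1], i.e. |ln(1/zeta)|^2 <= 1/zeta.
# Calculus gives the maximum at zeta = e^{-2}, where the value is 4/e^2 ~ 0.54.
import math

grid = [i / 100_000 for i in range(1, 100_001)]     # ]0, 1] sampled finely
values = [z * math.log(1.0 / z) ** 2 for z in grid]
worst = max(values)
argmax = grid[values.index(worst)]
print(worst, argmax)     # worst near 4/e^2 ~ 0.541, argmax near e^-2 ~ 0.135
```

So in the display above one could even take ε_0 arbitrarily close to 1; any fixed ε_0 ∈ ]0, 1[ suffices for the argument.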
4.4 Intersecting an Additive Stable Process

Let X^1 denote an R^d-valued, isotropic stable Lévy process of index α_1 ∈ ]0, 2]. Consider an independent process X^2, which denotes an R^d-valued, N-parameter additive stable process of index α_2 ∈ ]0, 2]; cf. Section 4.1 for the latter. Throughout this subsection we will assume the following:

    d > Nα_2.   (1)

Theorem 4.4.1 Suppose X^1 is an index-α_1 isotropic stable Lévy process on R^d, and X^2 is an independent, N-parameter, index-α_2 additive stable process. If equation (1) holds, then for all compact sets E ⊂ R^d and all distinct x, y ∈ R^d with x ∉ E,

    P_{x⊗y}{ X^1(R_+) ∩ X^2(R_+^N) ∩ E ≠ ∅ } > 0 ⇐⇒ C_κ(E) > 0,

where

    κ(u, v) = ln_+(1/‖u − v‖) · ‖u − v‖^{−d+Nα_2},   if d = α_1,
    κ(u, v) = ‖u − v‖^{−2d+α_1+Nα_2},                 if d > α_1.

The easier case d < α_1 is handled in the following exercise.

Exercise 4.4.1 Show that when d < α_1, the above intersection probability is always positive.

Proof of Theorem 4.4.1 Recall Lemmas 3.4.1 and 3.4.2, both from Chapter 10. In particular, recall that X^1 and X^2 are strongly symmetric and their reference measure is Lebesgue's measure on R^d. Let r^1 and r^2 denote the resolvent densities of X^1 and X^2, respectively. By applying Lemmas 3.1.1 and 3.1.2 of Chapter 10, when d ≥ α_1, we can find a finite constant A > 1 such that whenever x, y ∈ R^d satisfy ‖x − y‖ ≤ 1/2 (say), then for any λ_1 ∈ R_+ and for all λ_2 ≻ 0 in R_+^N,

    (1/A) κ(x, y) ≤ r^1_{λ_1}(x, y) r^2_{λ_2}(x, y) ≤ A κ(x, y).

Consequently, C_{r^1_{λ_1} r^2_{λ_2}}(E) > 0 ⇐⇒ C_κ(E) > 0. Thus, Corollary 4.2.1 implies the result.
4.5 Hausdorff Dimension of the Range of a Stable Process Let X = (Xt ; t ≥ 0) denote an Rd -valued isotropic stable L´evy process of index α ∈ ]0, 2]. Our present goal is to study the “size” of the range X(R+ ) of the process, viewed as a d-dimensional “random” set.9 Let (Ω, G, Q) denote the underlying probability space and let Px (x ∈ Rd ) denote the distribution of X + x, as before. We now introduce an auxilliary probability space (Ω , G , Q ) large enough to support the construction of an Rd -valued,
N -parameter additive stable process X = (Xt ; t ∈ RN + ) of index α ∈ ]0, 2].
d
Let Px (x ∈ R ) denote the distribution of X + x, as before. Now, we combine things to make X and X independent. That is, define (Ω0 , G0 , Q0 ) to be the product probability space given by Ω0 = Ω × Ω , G0 = G × G and Q0 = Q × Q . For all x, y ∈ Rd , we can define a measure P0x⊗y on G0 by P0x⊗y = Px × P y (as a product measure). We shall extend the definition of 9 For
a rigorous definition of random sets, see Section 4.7.
430
11. Multiparameter Markov Processes
the processes X and X to the probability space (Ω0 , G0 , Q0 ) in the natural way. That is, for all ω 0 ∈ Ω0 of the form ω 0 = ω ⊗ ω , where ω ∈ Ω and ω ∈ Ω , Xs (ω 0 ) = Xs (ω), Xt (ω 0 ) = Xt (ω ),
s ≥ 0, t ∈ RN +.
The only thing that the above heavy-handed notation does is to set up some rigorous machinery for the statement that X and X are two independent additive stable processes on Rd with 1 and N parameters, respectively, that have indices α and α , respectively. Moreover, for all x, x ∈ Rd , X and X
start at x and x , respectively, under the measure P0x⊗x . Having established the requisite notation, we can employ Theorem 4.4.1 to see that for all compact sets E ⊂ Rd and for all distinct x, x ∈ Rd with x ∈ E, P0x⊗x X(R+ ) ∩ X (RN + ) ∩ E = ∅ > 0 ⇐⇒ Cap2d−α−N α (E) > 0, as long as d > α ∧ N α . Equivalently, by Fubini’s theorem, we can first condition on the entire process X to see that P x X (RN + ) ∩ X(R+ ) ∩ E = ∅ X > 0, Px -a.s. (1) ⇐⇒ Cap2d−α−N α (E) > 0. On the other hand, ) ∩ X(R ) ∩ E =
∅ P x X (RN X = lim P x X (RN + + + ) ∩ Ft = ∅ X , t→∞
Px -a.s., where Ft = E ∩ X([0, t]) is a compact set in Rd . Furthermore, by Theorem 4.1.1 above, P x X (RN + )∩Ft = ∅ X > 0, Px -a.s. ⇐⇒ Ex Capd−N α (Ft ) > 0. (2) We have used the assumption that x ∈ E. Finally, note that as t ↑ ∞, Ft ↑ E ∩ X(R+ ). Thus, by Exercise 1.1.5 of Appendix D, the following holds for all ω ∈ Ω: sup Capd−N α (Ft ) > 0 ⇐⇒ Capd−N α {E ∩ X(RN + )} > 0. t
(Why?) Consequently, we can combine equation (1) with equation (2) to deduce the following: Ex Capd−N α {E ∩ X(R+ )} > 0 ⇐⇒ Cap2d−α−N α (E) > 0. (Why is the expression under Ex measurable?) The neat feature of this formula is the complete absence of the process X . In summary, equation
4 Applications
holds for all positive integers N and for all choices of α′ ∈ ]0, 2]. Note that in this regime, β = d − Nα′ is an arbitrary real number in ]0, d[. In other words, we have shown that whenever d > α and x ∉ E, for any β ∈ ]0, d[,
E_x[Cap_β{E ∩ X(R+)}] > 0 ⇐⇒ Cap_{d−α+β}(E) > 0.   (3)
This suggests the validity of the following computation of the size of the range of an isotropic stable process. It was discovered in McKean (1955b).
Theorem 4.5.1 (McKean's Theorem) Let X denote an Rd-valued, isotropic stable Lévy process of index α ∈ ]0, 2]. For any x ∈ Rd,
P_x{dim(X(R+)) = α ∧ d} = 1.
When applied to the special case α = d = 2, Theorem 4.5.1 states that the range of 2-dimensional Brownian motion is a (random) set whose Hausdorff dimension is 2, a.s. On the other hand, by Supplementary Exercise 3, Chapter 10, the range of 2-dimensional Brownian motion has zero 2-dimensional Lebesgue measure and, equivalently, zero 2-dimensional Hausdorff measure; cf. Lemma 1.1.1 of Appendix C. In summary, we have shown that the Hs-measure of the range of 2-dimensional Brownian motion is 0 when s = 2, while it is positive when s < 2. In fact, it is possible to show a little more:
Exercise 4.5.1 Show that for all choices of s < 2, Hs{B(R+)} = +∞, almost surely.
Figure 11.1 shows a realization of a portion of the range of 2-dimensional Brownian motion. [Figure 11.1: B([0, 1]), where B is 2-dimensional Brownian motion.]
Proof We will fix some x ∈ Rd and first prove the theorem for d > α. For all m ≥ 1 and all x ∈ Rd, let Γm(x) denote the following compact set in Rd:
Γm(x) = {y ∈ Rd : m⁻¹ ≤ |x − y| ≤ m}.
Note that x ∉ Γm(x), no matter what the value of m ≥ 1 is. Recall that in integer dimensions, Hausdorff's measure and Lebesgue's measure agree (Theorem 1.1.1, Appendix C). Thus, we can apply Frostman's theorem (Theorem 2.2.1, Appendix C) to see that for all β ∈ ]α, d[
and for all m > 0,
Cap_{d−α+β}(Γm(x)) = 0.
By equation (3),
E_x[Cap_β{Γm(x) ∩ X(R+)}] = 0,  ∀β ∈ ]α, d[.
By Exercise 1.1.5 of Appendix D, and by Lebesgue's monotone convergence theorem, we can let m ↑ ∞ to conclude that
E_x[Cap_β{X(R+)}] = 0,  ∀β ∈ ]α, d[.
Consequently,
P_x{Cap_β{X(R+)} = 0 for all β ∈ ]α, d[ ∩ Q} = 1,
from which we get
P_x{dim X(R+) ≤ α} = 1.   (4)
A similar analysis shows that whenever β ∈ ]0, α[,
E_x[Cap_β{X(R+)}] > 0.
That is, for any β ∈ ]0, α[, there necessarily exists some t > 0 such that
E_x[Cap_β{X([t, ∞[)}] > 0.
(Why?) By scaling (Lemma 1.3.1, Chapter 10), Cap_β{X([t, ∞[)} has the same distribution as Cap_β{t^{1/α}X([1, ∞[)}, which, thanks to Exercise 2.2.2 of Appendix C, equals t^{−β/α}Cap_β{X([1, ∞[)}. The above discussion shows that (with positive P_x-probability) Cap_β{X([t, ∞[)} > 0 for some t > 0 if and only if Cap_β{X([t, ∞[)} > 0 for all t > 0. Moreover, the P_x-probability that Cap_β{X([t, ∞[)} > 0 is independent of the choice of t > 0. In particular, we see that for all β ∈ ]0, α[,
lim_{t→∞} P_x{Cap_β{X([t, ∞[)} > 0} = P_x{Cap_β{X([t, ∞[)} > 0 for all t > 0} > 0,
by monotonicity. The event inside the second probability above is measurable with respect to the tail σ-field of X. Thus, by Kolmogorov's 0-1 law, the above probability is 1; cf. Supplementary Exercise 10 of Chapter 8. In particular, P_x-a.s.,
Cap_β{X(R+)} > 0,  ∀β ∈ ]0, α[ ∩ Q.
By Frostman's theorem, dim{X(R+)} ≥ α, P_x-a.s., which, in light of equation (4), proves the assertion of the theorem when d > α.
Next, we prove the theorem when d ≤ α. If d = α then, since d is an integer, X is either 2-dimensional Brownian motion (α = d = 2) or a 1-dimensional Cauchy process (α = d = 1); cf. Example (c), Section 1.3 of Chapter 10. We will concentrate on the harder case d = α. When d < α, the theorem follows from the fact that X hits any singleton (proved in Supplementary Exercise 7 of Chapter 10). When d = α, our proof of equation (3) can be used to show that for all β ∈ ]0, d[ and all z ∉ E,
E_z[Cap_β(E ∩ X(R+))] > 0 ⇐⇒ Cap_κ(E) > 0,
where
κ(x, y) = |x − y|^{−β} ln+(1/|x − y|),  x, y ∈ Rd.
The remainder of the proof when d = α is carried out in similar fashion to the one in the case d > α.
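McKean's theorem can be probed numerically, although the sketch below is not part of the text's argument: for a discretized planar Brownian path one can estimate the box-counting dimension, which is an upper proxy for Hausdorff dimension, and compare it with α ∧ d = 2. The step count, box scales, and random seed are arbitrary illustrative choices; NumPy is assumed.

```python
import numpy as np

def boxes_hit(path, eps):
    """Count distinct boxes of side eps visited by a discretized path."""
    return len({tuple(p) for p in np.floor(path / eps).astype(int)})

rng = np.random.default_rng(0)
n = 200_000
# Discretized planar Brownian motion on [0, 1]: cumulative sums of
# independent N(0, 1/n) increments in each coordinate.
steps = rng.normal(scale=1.0 / np.sqrt(n), size=(n, 2))
path = np.cumsum(steps, axis=0)

# The slope of log N(eps) against log(1/eps) between two scales
# estimates the box-counting dimension of the sampled range.
n_coarse = boxes_hit(path, 1 / 8)
n_fine = boxes_hit(path, 1 / 64)
dim_est = np.log(n_fine / n_coarse) / np.log(8)
print(f"estimated dimension of B([0,1]) in R^2: {dim_est:.2f}")
```

The slope cannot exceed 2, since each coarse box splits into exactly 64 fine boxes; estimates near 2 are consistent with Theorem 4.5.1.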
4.6 Extension to Additive Stable Processes
We now extend the results of Section 4.5 to additive stable processes. Let X = (X_t; t ∈ RN+) denote an N-parameter, Rd-valued additive stable process with index α ∈ ]0, 2]. When α = 2, X is said to be additive Brownian motion. Analogously to the proof of (3), we can prove the following: If d ≥ Nα, then for any β ∈ ]0, d[, for all compact sets E ⊂ Rd, and for all d-dimensional vectors x ∉ E,
E_x[Cap_β{E ∩ X(RN+)}] > 0 ⇐⇒ Cap_{d−Nα+β}(E) > 0.   (1)
Exercise 4.6.1 Fill in the details of the proof that when d ≥ Nα, equation (1) holds.
Exercise 4.6.2 Prove that when d < Nα, the E_x-expectation of Lebesgue's measure of the range of X is positive. From this, deduce that in this case, Leb{X(RN+)} = +∞, almost surely.
Consequently, we can deduce the following N-parameter extension of Theorem 4.5.1:
Theorem 4.6.1 Let X denote an Rd-valued, N-parameter additive stable process of index α ∈ ]0, 2]. Then, for any x ∈ Rd, with P_x-probability one,
dim X(RN+) = Nα ∧ d.
Exercise 4.6.3 Prove Theorem 4.6.1.
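When α = 2, additive stable processes are easy to simulate, since additive Brownian motion is just a sum of independent one-parameter Brownian motions. The following minimal sketch (the grid size, replication count, and test points are illustrative choices, and NumPy is assumed) checks the covariance E[X_s X_t] = min(s1, t1) + min(s2, t2) in the case N = 2, d = 1:

```python
import numpy as np

def additive_bm(rng, m, T=1.0):
    """One sample of additive Brownian motion X_s = B1(s1) + B2(s2)
    on an m x m grid over (0, T]^2 (N = 2, d = 1)."""
    dt = T / m
    b1 = np.cumsum(rng.normal(scale=np.sqrt(dt), size=m))
    b2 = np.cumsum(rng.normal(scale=np.sqrt(dt), size=m))
    return b1[:, None] + b2[None, :]   # X[i, j] = B1(s_i) + B2(s_j)

rng = np.random.default_rng(1)
m, reps = 20, 20_000
samples = np.array([additive_bm(rng, m) for _ in range(reps)])
# Grid point with index i corresponds to time (i + 1) / m.
i, j = 4, 14   # s = (0.25, 0.75)
k, l = 9, 9    # t = (0.50, 0.50)
# Theory: Cov = min(0.25, 0.5) + min(0.75, 0.5) = 0.75.
cov = np.mean(samples[:, i, j] * samples[:, k, l])
print(f"empirical covariance: {cov:.3f} (theory: 0.75)")
```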
In particular, we can let α ≡ 2 to see that whenever d ≤ 2N, the Hausdorff dimension of the range of additive Brownian motion is a.s. equal to d. Figure 11.2 shows a part of the range of 2-parameter, R-valued additive Brownian motion. To be more precise, it shows the result of a simulation of [0, 1]² ∋ s ↦ X_s ∈ R.
Figure 11.2: additive Brownian motion (N = 2, d = 1) plotted against time
Exercise 4.6.2 above used Theorem 4.1.1 to conclude that whenever d < 2N, then Leb{X(RN+)} = +∞, a.s. That is, when d < 2N, the range has full d-dimensional Lebesgue measure (and hence, full d-dimensional Hausdorff measure, by Lemma 1.1.1 of Appendix C), whereas we just showed that it has Hausdorff dimension d. This discussion leaves out the "critical case," i.e., when d = 2N. In this case, the range has zero Lebesgue measure (cf. Supplementary Exercise 12); that is, Hd{X(RN+)} = 0, a.s., while for any s ∈ ]0, d[, Hs{X(RN+)} = +∞, a.s. We mention another interesting application of (1).
Theorem 4.6.2 Let X denote an N-parameter, Rd-valued additive stable process of index α ∈ ]0, 2]. Given a compact set E ⊂ Rd and a d-dimensional vector x ∉ E,
P_x{dim(X(RN+) ∩ E) ≤ (dim(E) − d + Nα)⁺ ∧ d} = 1.
On the other hand,
P_x{dim(X(RN+) ∩ E) ≥ (dim(E) − d + Nα)⁺ ∧ d} > 0.
Exercise 4.6.4 Prove Theorem 4.6.2.
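The positive-measure phenomenon of Exercise 4.6.2 can be probed in the simplest setting N = 2, α = 2, d = 1, where d < Nα: since the path is continuous, the range over a square is an interval, so its Lebesgue measure is simply max X − min X, which is positive and grows with the time horizon. A rough numerical sketch (the resolution, horizons, and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)

def range_length(m, T):
    """Lebesgue measure of the range of 2-parameter additive BM with
    d = 1 over (0, T]^2: by continuity the range is the interval
    [min X, max X], so its measure is max - min."""
    dt = T / m
    b1 = np.cumsum(rng.normal(scale=np.sqrt(dt), size=m))
    b2 = np.cumsum(rng.normal(scale=np.sqrt(dt), size=m))
    x = b1[:, None] + b2[None, :]
    return x.max() - x.min()

# Here d = 1 < N*alpha = 4, so Leb{X(R^2_+)} is a.s. infinite; on
# finite squares the measure is positive and grows with T.
lengths = [range_length(400, T) for T in (1.0, 4.0, 16.0)]
print([round(l, 2) for l in lengths])
```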
4.7 Stochastic Codimension
The potential theoretic results of this chapter also apply to random fields other than Markovian ones. In the next two subsections we show how such connections can be established. A set-valued function K mapping Ω into the collection of all subsets of Rd is said to be a random set if for all nonrandom compact (or σ-compact) sets E ⊂ Rd, 1l_{K∩E≠∅} is a random variable. We have already seen many examples of random sets: The range of an N-parameter, Rd-valued Markov process X = (X_t; t ∈ RN+) is a random set in this sense. Somewhat more generally, suppose X = (X_t; t ∈ RN+) denotes a right-continuous, N-parameter, Rd-valued stochastic process. If I ⊂ RN+ is Borel measurable and if X(I) = {X_s; s ∈ I}, then the closure of X(I) in Rd is a random set. In fact, if t → X_t is continuous, then X(I) is itself a random set (why?). Throughout this subsection K denotes a random set in Rd. We next define two nonrandom indices for K that we call the upper and the lower (stochastic) codimensions of K, respectively. The upper (stochastic) codimension of K (written as codim(K)) is the smallest real number β ∈ [0, d] such that
P{K ∩ G ≠ ∅} > 0 for all compact sets G ⊂ Rd with dim(G) > β.
If such a β does not exist, we define the upper codimension of K as d. It should be reemphasized that the upper codimension of a random set is always deterministic; i.e., it is not a random number. There is a symmetric definition for the lower (stochastic) codimension of a random set: codim(K) is defined as the greatest real number β ∈ [0, d] such that
P{K ∩ G ≠ ∅} = 0 for all compact sets G ⊂ Rd with dim(G) < β.
If such a β does not exist, then we define the lower codimension of K to be 0. In case the two codimensions agree, we write codim(K) for the common value and call it the (stochastic) codimension of K. It may happen that there are several probability measures P_x, x ∈ Rd, on our probability space.
In such a case, we refer to the upper codimension, the lower codimension, and the codimension of K with respect to the measure P_x when the P in the definitions of codimensions is replaced by P_x. Let us summarize our efforts thus far.
Lemma 4.7.1 Suppose K is a random set in Rd. If G ⊂ Rd is compact, then
P{K ∩ G ≠ ∅} > 0, whenever dim(G) exceeds the upper codimension of K;
P{K ∩ G ≠ ∅} = 0, whenever dim(G) is less than the lower codimension of K.
Example 1 Suppose X denotes a d-dimensional, 1-parameter, isotropic stable Lévy process of index α ∈ ]0, 2]. Theorem 3.5.1 of Chapter 10 can be
recast as follows: With respect to any of the measures Px , where x ∈ Rd , codim X(R+ ) = (d − α)+ , where a+ = max{a, 0}, as usual.
Example 2 The previous example can be generalized to a multiparameter setting in various ways. Here is one such method: Let X = (Xt ; t ∈ RN +) denote an Rd -valued, N -parameter additive stable process of index α ∈ ]0, 2] and let Px denote the distribution of X + x. Then, Corollary 4.1.1 is equivalent to the statement that + codim X(RN + ) = (d − N α) , under any Px , where x ∈ Rd .
Example 3 Let X¹ and X² denote independent, Rd-valued, isotropic stable Lévy processes of indices α1 and α2, respectively. If x, y ∈ Rd are distinct, Corollary 4.3.1 can be recast as follows: P_{x⊗y}-a.s.,
codim{X¹(R+) ∩ X²(R+)} = (2d − α1 − α2)⁺.
In particular, if we choose X¹ and X² to both be Brownian motions whose starting points are different, then we see that the codimension of the intersection of the ranges of two independent Brownian motions in Rd is 2(d − 2)⁺.
The main result of this subsection is the following theorem, which also justifies the use of the term "codimension." In what follows, the assumption that the codimension is strictly between 0 and d is indispensable.
Theorem 4.7.1 Given a random set K in Rd whose codimension is strictly between 0 and d,
dim(K) + codim(K) = d,  P-a.s.
The main step in our proof is the following result due to Y. Peres; see Peres (1996a, 1996b).
Lemma 4.7.2 (Peres's Lemma) On a suitable probability space, one can construct a random set Λβ,d ⊂ Rd for any β ∈ ]0, d[ such that for all σ-compact sets E ⊂ Rd,
P{dim(Λβ,d ∩ E) = dim(E) − β} = 1,
where dim(A) < 0 means that A = ∅, for any Borel set A ⊂ Rd. Finally, for any Borel set E ⊂ Rd and all β ∈ ]0, d[,
P{Λβ,d ∩ E ≠ ∅} ∈ {0, 1}.
Proof By enlarging the probability space if necessary, we can assume the existence of independent, identically distributed additive stable processes X¹, X², . . . and choose distinct vectors x1, x2, . . . ∈ Rd as their starting points. (That is, the underlying probability measure is taken to be the product measure P = P_{x1⊗x2⊗···} = ⊗_{j=1}^∞ P^j_{xj}, where P^j_{xj} denotes the distribution of X^j + xj.) We shall further assume that these processes have N parameters, are Rd-valued, and have the same index α, where the parameters in question satisfy d − Nα = β. Define the random set
Λβ,d(ω) = ∪_{i=1}^∞ X^i(RN+)(ω).
(Why is this a random set?) By Exercise 1.2.2 of Appendix C, for any σ-compact set E ⊂ Rd,
dim{Λβ,d ∩ E} = sup_{i≥1} dim{X^i(RN+) ∩ E}.
Applying the Borel–Cantelli lemma together with Theorem 4.6.2, we conclude that
P{dim(Λβ,d ∩ E) = dim(E) − β} = 1.
This is the first half of the desired result. The second half, i.e., the 0-1 law, follows from the Borel–Cantelli lemma and the i.i.d. structure of the X^j's.
Proof of Theorem 4.7.1 Let Λβ,d be the random sets of Peres's lemma (Lemma 4.7.2). By further enlarging the probability space, we may assume that all the Λβ,d's are independent of the random set K. For any γ ∈ ]0, d[, Λγ,d and K are independent. Thus, by Peres's lemma, dim(Λγ,d ∩ K) = dim(K) − γ, almost surely. In particular,
P{Λγ,d ∩ K ≠ ∅} = 1, if dim(K) > γ;  P{Λγ,d ∩ K ≠ ∅} = 0, if dim(K) < γ.
Now, suppose there exists γ ∈ ]0, d[ such that P(dim{K} > γ) > 0. The above, together with the 0-1 law of Peres's lemma, implies that P(Λγ,d ∩ K ≠ ∅) = 1. By Lemma 4.7.1,
P{dim(K) > γ} > 0 ⇐⇒ P{dim(Λγ,d) ≥ codim(K)} > 0.
On the other hand, Peres's lemma states that a.s., dim(Λγ,d) = d − γ. Hence,
P{dim(K) > γ} > 0 ⇐⇒ d − γ ≥ codim(K).
Equivalently, for all γ > d − codim(K), P(dim{K} > γ) = 0. This proves P{dim(K) ≤ d − codim(K)} = 1.
Conversely, if P(dim{K} < γ) > 0, then d − γ ≤ codim(K), which shows that P{dim(K) ≥ d − codim(K)} = 1.
Our proof is complete.
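The codimension calculus can also be illustrated numerically; the sketch below is outside the text's argument. For 3-dimensional Brownian motion (Example 1 above with α = 2, d = 3), codim X(R+) = 1, so the range hits a piece of the plane {x : x^(1) = 1/2} — a set of dimension 2 > 1 — with positive probability, while it misses points (dimension 0 < 1). Path counts, step counts, and the particular plane are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(3)
n_paths, n_steps = 4000, 1000
dt = 1.0 / n_steps
# First coordinate of 3-d Brownian motion started at the origin; the
# process meets the plane {x : x^(1) = 1/2} iff this coordinate
# reaches the level 1/2.
b1 = np.cumsum(rng.normal(scale=np.sqrt(dt), size=(n_paths, n_steps)), axis=1)
hit = np.mean(b1.max(axis=1) >= 0.5)
# Reflection principle: P{max_{[0,1]} B^(1) >= 1/2} = 2 P{B^(1)_1 >= 1/2},
# about 0.617; discretization lowers the estimate slightly.
print(f"fraction of paths meeting the plane by time 1: {hit:.3f}")
```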
5 α-Regular Gaussian Random Fields
The ideas and results introduced in this chapter can be applied, to various degrees, to the study of random fields other than Markovian ones. We close this chapter with a look at a broad class of Gaussian random fields that (1) are non-Markovian in every reasonable sense; and (2) arise in applications of probability theory to other areas. Throughout this section X = (X_t; t ∈ RN) denotes an Rd-valued Gaussian process that is indexed by all of RN, and not just RN+. We will also assume that for all t ∈ RN and for all 1 ≤ i ≤ d, E[X^(i)_t] = 0. In order to simplify the exposition, we begin our study with R-valued, stationary Gaussian random fields.
5.1 One-Dimensional Stationary Gaussian Processes An N -parameter, real-valued Gaussian process X = (Xt ; t ∈ RN ) is said to be stationary if (s, t) → E[Xs Xt ] is a function of s − t. It is said to be centered if for all t ∈ RN , E[Xt ] = 0. Recall the covariance function of X from Chapter 5, viz., Σ(s, t) = E[Xs Xt ],
s, t ∈ RN .
Note that X is stationary if and only if Σ(s, t) = R(s − t) for some function R : RN → R. In this case, we can refer to R as the correlation function of X. It is abundantly clear that for all t ∈ RN , R(t) = E[Xt X0 ]. Moreover, from the form of the characteristic function of linear combinations of the Xt ’s, as t varies, we obtain the following: Lemma 5.1.1 If X is a centered, stationary, R-valued, N -parameter Gaussian random field, then for all s ∈ RN , (Xs+t ; t ∈ RN ) has the same finite-dimensional distributions as (Xt ; t ∈ RN ). Moreover, for all t ∈ RN , Xt ∼ N1 (0, R(0)). Exercise 5.1.1 Verify Lemma 5.1.1 by checking covariances.
In particular, we may assume, without loss of much generality, that R(0) = 1.
(1)
5 α-Regular Gaussian Random Fields
439
Indeed, if R(0) ≠ 1, the results of this section can be applied to the process [R(0)]^{−1/2}X and then suitably translated. By Theorem 1.2.1, Chapter 5, Σ is positive definite in the sense that for all ξ1, . . . , ξn ∈ R and for all s1, . . . , sn ∈ RN,
∑_{j=1}^{n} ∑_{i=1}^{n} ξi Σ(si, sj) ξj ≥ 0.
In particular, R is a positive definite function in the classical sense: For all ξ1, . . . , ξn ∈ R and for all s1, . . . , sn ∈ RN,
∑_{j=1}^{n} ∑_{i=1}^{n} ξi R(si − sj) ξj ≥ 0.
Among its other notable features, we mention that the function R is symmetric in the sense that R(t) = R(−t), for all t ∈ RN. Moreover, by the Cauchy–Schwarz inequality, for all t ∈ RN,
|R(t)| ≤ √(E[(X_t)²] · E[(X_0)²]) = 1.
Let us recall the following form of the so-called spectral theorem.
Theorem 5.1.1 (The Spectral Theorem) Given a symmetric, positive definite function f : RN → R such that sup_{t∈RN} |f(t)| = 1, there exists a probability measure µf, defined on Borel subsets of RN, such that for all t ∈ RN,
f(t) = ∫_{RN} e^{iξ·t} µf(dξ).
The probability measure µf is the so-called spectral measure associated with the function f. We now combine the above ingredients to obtain the following:
Lemma 5.1.2 Recalling (1), there exists a probability measure µ, on Borel subsets of RN, such that for all t ∈ RN,
R(t) = ∫_{RN} e^{iξ·t} µ(dξ).
Remarks (1) In fact, since R is real-valued, we can write R as the following cosine transform:
R(t) = ∫_{RN} cos(ξ · t) µ(dξ).
(2) Conversely, whenever R is the characteristic function of a probability measure on RN, then R generates a mean-function-zero, stationary, real-valued, N-parameter Gaussian process by the prescription Σ(s, t) = R(s − t), for all s, t ∈ RN. Clearly, Σ is positive definite in the sense
of Chapter 5. Theorem 1.2.1 of Chapter 5 now constructs the mentioned Gaussian process.
(3) Remark (2) above is often useful in modeling. For instance, it shows that for any α ∈ ]0, 2], we can construct a centered, stationary, N-parameter, real-valued Gaussian random field whose correlation function is given by the "shape" R(t) = exp(−|t|^α) (why?). It is also possible to write the natural pseudometric generated by X in terms of R, viz.,
E[(X_s − X_t)²] = 2{1 − R(s − t)},  s, t ∈ RN.   (2)
We can often use the above, in conjunction with the continuity theorems of Chapter 5, to see that X has a continuous modification if t → R(t) is sufficiently smooth near the origin.
Example By (2) and by Exercise 2.5.1 of Chapter 5, for all p > 0, we can find a finite, positive constant c(p) such that
E[|X_s − X_t|^p] = c(p){E[|X_s − X_t|²]}^{p/2} = 2^{p/2} c(p){1 − R(s − t)}^{p/2}.
Suppose there exist finite constants C > 1, β ∈ ]0, 2], and ε > 0 such that for all h ∈ RN with |h| ≤ ε,
1 − R(h) ≤ C|h|^β.
(The correlation function R(t) = exp(−|t|^β) of Remark (2) above is an example of such a correlation function.) Then, for all p > 0, we can deduce the existence of a finite, positive constant Ap such that for all s, t ∈ RN,
E[|X_s − X_t|^p] ≤ Ap |s − t|^{βp/2}.
Choose p so large that βp > 2N and use Theorem 2.5.1 of Chapter 5 to conclude that there exists a Hölder continuous modification of X, which we continue to denote by X. One can refine this, since the same theorem shows that X is Hölder continuous of any order r < β/2. In fact, we can do even more than this: By Exercise 2.5.1 of Chapter 5, for all a ≺ b, both in RN, all 0 < µ < β, and all p > 0, there exists a finite constant Bp such that for all ε ∈ ]0, 1[,
E[sup_{s,t∈[a,b]: |s−t|≤ε} |X_s − X_t|^p] ≤ Bp ε^{µp/2}.   (3)
See Supplementary Exercise 9 for an extension.
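The spectral representation of Lemma 5.1.2 suggests a simulation recipe (a sketch only, not part of the text): if ξ has law µ and φ is uniform on [0, 2π), then a normalized sum of random cosines has covariance exactly R(s − t) for every number of harmonics, and becomes Gaussian as that number grows. Below R(t) = exp(−|t|) is used, whose spectral measure for N = 1 is the standard Cauchy law; the harmonic and replication counts are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
n_harm, reps = 64, 40_000
# Spectral measure of R(t) = exp(-|t|) in one parameter: standard Cauchy.
xi = rng.standard_cauchy(size=(reps, n_harm))
phi = rng.uniform(0.0, 2.0 * np.pi, size=(reps, n_harm))

def field_at(t):
    # Random-harmonics representation: E[X_s X_t] = E cos(xi (s - t))
    # = R(s - t) exactly, by independence of xi and phi.
    return np.sqrt(2.0 / n_harm) * np.cos(xi * t + phi).sum(axis=1)

s, t = 0.0, 0.7
cov = np.mean(field_at(s) * field_at(t))
print(f"empirical E[X_0 X_0.7] = {cov:.3f}, exp(-0.7) = {np.exp(-0.7):.3f}")
```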
5.2 α-Regular Gaussian Fields
We say that the N-parameter, Rd-valued process X is α-regular if α ∈ ]0, 1[ and if X is a centered, Rd-valued, N-parameter, stationary Gaussian process with i.i.d. coordinate processes whose correlation function R satisfies the following: There exist finite, positive constants c1 ≤ c2 and ε0 such that for all h ∈ RN with |h| ≤ ε0,
c1|h|^{2α} ≤ 1 − R(h) ≤ c2|h|^{2α}.   (1)
N + codim X(RN ) = d − . α
In particular, we deduce the following important corollary; it measures the size of the range of the process X by way of Hausdorff dimension. Corollary 5.2.1 In Theorem 5.2.1,
N dim X(RN ) = ∧ d, α
a.s.
Variants of the above, together with further refinements, can be found in (Adler 1981; Testard 1985; Weber 1983). Let us conclude this subsection by proving Corollary 5.2.1. The proof of Theorem 5.2.1 is deferred to the next two subsections. Proof of Corollary 5.2.1 If N < αd, Theorem 5.2.1 implies that the range of X has codimension d − N α , which is strictly between 0 and d. Since K = X(RN ) is a random set in the sense of Section 4.7 (why?), we can apply Theorem 4.7.1 to conclude that with probability one, the Hausdorff dimension of the range of X is d minus the codimension; i.e., the Hausdorff dimension is a.s. equal to N α . It remains to prove the corollary when N ≥ αd. Indeed, it suffices to show that dim(X([1, 1 + ε0 ]N )) ≥ d,
a.s.,
442
11. Multiparameter Markov Processes
where ε0 is given in the preamble to equation (1) above. (Why?) We shall demonstrate this by bare-hands methods. Define the random measure µ on Borel subsets of Rd by assigning to every Borel set A ⊂ Rd ,
µ(A) = 1lA Xs ds. [1,1+ε0 ]N
This is the so-called occupation measure for the stochastic process (Xs ; s ∈ [1, 1 + ε0 ]N ). It should also be clear that µ0 = ε−N 0 µ is a (random) probability measure that is supported on the (random, closed) set X([1, 1 + ε0 ]N ). Moreover, after a change of variables, we can see that for any β ∈ R, Energyβ (µ0 ) = ε−2N Xs − Xt −β ds dt. 0 [1,1+ε0 ]N
[1,1+ε0 ]N
In particular, if β ≥ 0, E Energyβ (µ0 ) = ε−2N 0 (i)
(i)
Since Xs − Xt
[1,1+ε0 ]N
[1,1+ε0 ]N
E[Xs − Xt −β ] ds dt.
are i.i.d. (as i varies from 1 to d), for all s, t ∈ [1, 1 + ε0 ]N ,
(1) E Xs − Xt 2 = dE |Xs(1) − Xt |2 = 2d 1 − R(s − t) , by equation (2) of Section 5.1. By the definition of α-regularity, for all s, t ∈ [1, 1 + ε0 ]N , 2dc1 |s − t|2α ≤ E Xs − Xt 2 ≤ 2dc2 |s − t|2α . (2) For any β ∈ R, − β2 E Xs − Xt −β = E Xs − Xt 2 E[Z−β ], where Z is an Rd -valued, Gaussian random variable with mean vector 0 whose covariance matrix is the (d×d) identity matrix (why?). Furthermore, E Z−β < ∞ ⇐⇒ β < d. (Why? See Exercise 1.1.1 of Chapter 5 for the form of the density.) Now we put things together: Suppose 0 < β < d and observe by equation (3) that E Energyβ (µ0 ) ≤ Aβ |s − t|−βα ds dt, [1,1+ε0 ]N
[1,1+ε0 ]N
5 α-Regular Gaussian Random Fields
where
443
−β −2N Aβ = c−β ε0 Cβ E Z−β < ∞. 2 (2d)
Since β < d ≤ N α , we can deduce that αβ < N , and in particular, upon changing variable (w = s − t), we obtain the following simple bound: |s − t|−βα ds dt ≤ εN |w|−βα dw, 0 [1,1+ε0 ]N
[1,1+ε0 ]N
[−1,1]N
which, thanks to Lemma 3.1.2 of Chapter 3, is finite. In summary, we have shown that there exists a probability measure µ0 on X([1, 1 + ε0 ]N ) such that for any β < d, Energyβ (µ0 ) < ∞, a.s. By Frostman’s theorem (Theorem 2.2.1, Appendix C), for any β < d, dim(X([1, 1 + ε0 ]N )) ≥ β,
a.s.
Letting β ↑ d along a rational sequence, we deduce that the Hausdorff dimension of X([1, 1 + ε0 ]N ) is a.s. at least d. This proves the corollary.
5.3 Proof of Theorem 5.2.1: First Part The difficulty with handling the processes of this section is that, generally, they are not Markovian. Thus, in our first lemma we are led to examine methods by which one can introduce some conditional independence for Gaussian processes. Throughout, X = (Xt ; t ∈ RN ) denotes an N parameter, Rd -valued, α-regular process, where α ∈ ]0, 1[. Lemma 5.3.1 For any fixed s ∈ RN , Xs is independent of the entire process (Xt − R(s − t)Xs ; t ∈ RN ). Moreover, the latter is a centered, N parameter, Rd -valued Gaussian process with i.i.d. coordinates all of which have mean zero and whose covariance is given by E {Xu(1) −R(u−s)Xs(1) }{Xv(1) −R(v−s)Xs(1) } = R(u−v)−R(s−u)R(s−v), for all u, v ∈ RN . Exercise 5.3.1 Verify Lemma 5.3.1 by computing covariances.
In particular, note that the variance of each coordinate of Xt −R(s−t)Xs equals 1 − {R(s − t)}2 . Our next technical result is a key step in proving the first half of Theorem 5.2.1. Lemma 5.3.2 Suppose d > N α and let M > 1 be fixed. Then, for all ζ ∈ ]0, 1[, there exists a finite constant A > 0 such that for all ε > 0,
N P inf |Xt − x| ≤ ε ≤ Aε(1−ζ)d− α . sup x∈[−M,M]d
t∈[0,1]N
444
11. Multiparameter Markov Processes
Before commencing with a proof for this lemma, we introduce some notation. Define Γn to be the equipartition of [0, 1]N of mesh n1 , viz., k Γn = t ∈ [0, 1]N : ∀1 ≤ ≤ N, t() = for some 0 ≤ k ≤ n . n
Throughout this proof the generic element of Γn is distinguished by the letter γ, where the generic elements of [0, 1]N are written as s, t, . . ., as usual. For each γ ∈ Γn , let Rn (γ) denote the “right-hand box” of side n1 about γ. This is defined, more precisely, as 1 , n ≥ 1, γ ∈ Γn . Rn (γ) = t ∈ [0, 1]N : t γ, |t − γ| ≤ n (Recall that |t| = max1≤≤N |t() | denotes the ∞ norm of t ∈ RN .) Clearly, for every integer n ≥ 1, N 1 Rn (γ) = 0, 1 + . n γ∈Γn
Consequently,
; γ∈Γn
Rn (γ) ⊃ [0, 1]N .
Proof Let n ≥ 1 be a fixed integer such that n ≥ ε−1 0 , where ε0 is given by the definition of α-regularity. We shall also hold some point x ∈ [−M, M ]d fixed, where M > 1 is a fixed constant, as in the statement of the lemma. −α there must exist Note that whenever inf t∈[0,1]N Xt − x ≤ n , then−α some γ ∈ Γn such that for some t ∈ Rn (γ), Xt − x ≤ n . Consequently, inf Xt − x ≤ n−α =⇒ ∃γ ∈ Γn : Xγ − x ≤ n−α + sup Xs − Xγ . t∈[0,1]N
s∈Rn (γ)
We now utilize Lemma 5.3.1 in order to create some independence. Namely, we note that for any γ ∈ Γn and for all s ∈ Rn (γ), |Xs − Xγ | ≤ |Xs − R(s − γ)Xγ | + |Xγ | {1 − R(s − γ)} ≤ |Xs − R(s − γ)Xγ | + |Xγ − x| {1 − R(s − γ) } + |x| {1 − R(s − γ)}. On the other hand, if s ∈ Rn (γ), then |s − γ| ≤ n−1 , which is less than or equal to ε0 . Thus, 1 − R(s − γ) ≤ c2 n−2α . Consequently, for all γ ∈ Γn and all s ∈ Rn (γ), |Xs − Xγ | ≤ |Xs − R(s − γ)Xγ | + c2 n−2α |Xγ − x| + c2 n−2α M. Thus, whenever inf t∈[0,1]N |Xt − x| ≤ n−α , then ∃γ ∈ Γn : (1−c2 n−2α )|Xγ −x| ≤ n−α +c2 n−2α M + sup |Xs −R(s−γ)Xγ |. s∈Rn (γ)
5 α-Regular Gaussian Random Fields
445
The above holds for all integers n > ε−1 0 . If n is even larger, we can simplify 1 1 α α this further. Indeed, suppose n > J, where J = max{ε−1 0 , (c2 M ) , (2c2 ) }. Then, 1 − c2 n−2α ≤ 12 and c2 n−2α M ≤ n−α . Thus, inf
t∈[0,1]N
|Xt − x| ≤ n−α 1 |Xγ − x| ≤ 2n−α + sup |Xs − R(s − γ)Xγ |. 2 s∈Rn (γ)
=⇒ ∃γ ∈ Γn :
Consequently,
P inf |Xt − x| ≤ n−α t∈[0,1]N
≤ P |Xγ − x| ≤ 4n−α + 2 sup |Xs − R(s − γ)Xγ | . s∈Rn (γ)
γ∈Γn
For any γ ∈ Γn , Xγ is a vector of d independent Gaussian random variables, each with mean 0 and variance 1. Thus, the density of Xγ (with respect to Lebesgue’s measure on Rd ) is bounded above by 1. In particular, for all z ∈ Rd and r > 0,
P |Xγ − z| ≤ r ≤ Leb([−r, r]d ) = (2r)d . By Lemma 5.3.1, |Xγ −x| and sups∈Rn (γ) |Xs −R(s−γ)Xγ | are independent random variables. Thus, by first conditioning on the latter random variable, we obtain
P inf |Xt − x| ≤ n−α t∈[0,1]N
≤ 2d
d E 4n−α + 2 sup |Xs − R(s − γ)Xγ | s∈Rn (γ)
γ∈Γn
≤ 4d
4n
−α d
d , + 2d E sup Xs − R(s − γ)Xγ
(1)
s∈Rn (γ)
γ∈Γn
since for all p, q ≥ 0, (p + q)d ≤ 2d (pd + q d ). The same inequality shows that E sup |Xs − R(s − γ)Xγ |d s∈Rn (γ)
≤ 2d E sup |Xs − Xγ |d
(2)
s∈Rn (γ)
+ 2d E |Xγ |d sup {1 − R(s − γ)}d . s∈Rn (γ)
By equation (3) of the example of Section 5.1, for any ζ ∈ ]0, 1[, there exists a positive finite constant B1 such that E sup |Xs − Xγ |d ≤ B1 n−α(1−ζ)d . (3) s∈Rn (γ)
446
11. Multiparameter Markov Processes
Moreover, by α-regularity, sups∈Rn (γ) {1 − R(s − γ)}d ≤ cd2 n−2αd , which is less than n−α(1−ζ)d for all n > J. Combining this with equations (1), (2), and (3), we can conclude that for all n > J,
P inf |Xt − x| ≤ n−α t∈[0,1]N 4d n−αd + 4d B1 n−α(1−ζ)d + 4d E |Xγ |d n−α(1−ζ)d ≤ 4d γ∈Γn d −α(1−ζ)d
≤ 16 n
1 + B1 + B2 #(Γn ),
where #(Γn ) = (n + 1)N ≤ 2N nN is a cardinality bound for Γn , and for any and all γ ∈ Γn , 2 d 1 |u|d e− 2 u du < ∞. B2 = E[|Xγ |d ] = (2π)− 2 Rd
We have shown that for all n > J,
P inf |Xt − x| ≤ n−α ≤ B3 n−α(1−ζ)d+N , t∈[0,1]N
where B3 = 16d 2N {1 + B1 + B2 }. Now, supposing 0 < ε < (2J)−α , we can find an integer n > J such that (n + 1)−α ≤ ε ≤ n−α . Consequently,
P inf |Xt − x| ≤ ε ≤ P inf |Xt − x| ≤ n−α ≤ B3 n−α(1−ζ)d+N . t∈[0,1]N
t∈[0,1]N
Using the fact that ε ≥ (n + 1)−α ≥ (2n)−α , we deduce that for all 0 < ε < (2J)−α ,
N P inf |Xt − x| ≤ ε ≤ B4 ε(1−ζ)d− α , t∈[0,1]N
where B4 = B3 2(1−ζ)dα−N . On the other hand, whenever ε ≥ (2J)−α ,
N P inf |Xt − x| ≤ ε ≤ 1 ≤ (2J)α(1−ζ)d−N ε(1−ζ)d− α . t∈[0,1]N
Thus, the lemma follows with A = B4 ∨ (2J)α(1−ζ)d−N .
We are ready to prove the first half of Theorem 5.2.1. Proof of Theorem 5.2.1: First Half In the first half of the proof, we show that
N + . codim X(RN ) ≥ d − α We can assume, without loss of generality, that d > nothing to prove.
N α.
Otherwise, there is
5 α-Regular Gaussian Random Fields
447
Suppose E ⊂ Rd is a compact set with 0 < dim(E) < d − N α . We are to N show that P(X(R ) ∩ E = ∅) = 0. By countable additivity, it suffices to show that for all a b, both in RN ,
P X([a, b]) ∩ E = ∅ = 0. We will do this for [a, b] = [0, 1]N . The somewhat more general case is handled in Exercise 5.3.2 below. By the definition of Hausdorff dimension (used in conjunction with Lemma 1.1.3(i) of Appendix C), for all s > dim(E) and for all δ ∈ ]0, 1[, there exist cubes B1 , B2 , . . . of sides r1 , r2 , . . . such that (1) supj rj ≤ δ; (2) E ⊂ ∪j Bj ; and (3) ∞ (2rj )s ≤ δ. (4) j=1
Henceforth, will choose a fixed ζ ∈ ]0, 1[ so small that s = (1 − ζ)d − N α > dim(E). Since E is compact, there exists some M > 1 such that Bj ⊂ [−M, M ]d for all j ≥ 1. Hence, by Lemma 5.3.2 above, we can find a finite constant A > 0 such that for all j ≥ 1,
N (1−ζ)d− N α P X([0, 1]N ) ∩ Bj = ∅ ≤ Arj = A1 (2rj )(1−ζ)d− α , where A1 = 2−(1−ζ)d+(N/α)A. By (4), ∞
P X([0, 1]N ) ∩ E = ∅ ≤ P X([0, 1]N ) ∩ Bj = ∅ j=1
≤ A1
∞
N
(2rj )(1−ζ)d− α ≤ A1 δ.
j=1
Since δ > 0 and ζ ∈ ]0, 1[ are arbitrary, we have shown that whenever N 0 < dim(E) < d − N α , then P{X([0, 1] ) ∩ E = ∅} = 0. This completes our proof of the first half. Exercise 5.3.2 Extend the above proof to cover the general case where E ⊂ [a, b]. Exercise 5.3.3 (Hard) Refine a part of Theorem 5.2.1 by showing that for all M > 0 and all a b, both in RN + , there exists a positive finite constant C such that for all compact sets E ⊂ [−M, M ]d , P{X([a, b]) ∩ E = ∅} ≤ CHd− N (E), α
where Hs is the s-dimensional Hausdorff measure of Appendix C.
5.4 Proof of Theorem 5.2.1: Second Part
The second half of the proof of Theorem 5.2.1 relies on the following two technical lemmas.
Lemma 5.4.1 For all M > 0, there exists a finite constant A > 0 such that for all measurable functions f : Rd → R+ that are zero outside [−M, M]^d,
E[(∫_{[1,1+ε0]^N} f(X_s) ds)²] ≤ A ∫∫ κ(|x − y|) f(x)f(y) dx dy,
where ε0 is given by the definition of α-regularity and κ : R+ → R+ ∪ {∞} is defined by
κ(r) = 1, if d < N/α;  κ(r) = ln+(1/r), if d = N/α;  κ(r) = r^{−d+(N/α)}, if d > N/α.
Proof Let I denote the (d × d) identity matrix and recall that by Lemma 5.3.1, for any s, t ∈ RN:
• X_s ∼ Nd(0, I);
• X_t − R(s − t)X_s ∼ Nd(0, {1 − R²(s − t)}I); and
• X_s and X_t − R(s − t)X_s are independent.
In particular,
E[f(X_s)f(X_t)] = (2π)^{−d/2} ∫∫ e^{−|x|²/2} f(x) p_{s,t}(y) f(R(s − t)x + y) dx dy,
where
p_{s,t}(y) = (2π)^{−d/2}{1 − R²(s − t)}^{−d/2} exp(−|y|²/[2{1 − R²(s − t)}]).
(Why?) If we further assume that |s − t| ≤ ε0, then by α-regularity,
p_{s,t}(y) ≤ c1^{−d/2}|s − t|^{−αd} exp(−|y|²/[2c2|s − t|^{2α}]).
Thus, whenever |s − t| ≤ ε0, E[f(X_s)f(X_t)] is bounded above by
c1^{−d/2}|s − t|^{−αd} ∫_{[−M,M]^d}∫ f(x) f(R(s − t)x + y) e^{−|y|²/(2c2|s−t|^{2α})} dx dy
= c1^{−d/2}|s − t|^{−αd} ∫_{[−M,M]^d}∫_{[−M,M]^d} f(x) f(z) e^{−|z−R(s−t)x|²/(2c2|s−t|^{2α})} dx dz.
We have also used the fact that f is supported on [−M, M]^d. Since |R(h)| ≤ 1 for all h ∈ RN, by the triangle inequality and by α-regularity, for all x, z ∈ [−M, M]^d and all s, t ∈ RN with |s − t| ≤ ε0,
|z − x|² ≤ 2|z − R(s − t)x|² + 2M²{1 − R(s − t)}² ≤ 2|z − R(s − t)x|² + 2M²c2²ε0^{2α}|s − t|^{2α}.
Thus, whenever |s − t| ≤ ε0, E[f(X_s)f(X_t)] is bounded above by
A1 |s − t|^{−αd} ∫∫ f(x) f(z) exp(−|z − x|²/[4c2|s − t|^{2α}]) dx dz,
where A1 = c1^{−d/2} exp(½M²c2²ε0^{2α}). The remainder of this lemma follows from direct calculations.
Exercise 5.4.1 Complete the proof of Lemma 5.4.1.
We are ready to complete our derivation of Theorem 5.2.1.
Proof of Theorem 5.2.1: Second Half We will prove the following stronger result: If E ⊂ Rd is compact, then
Cap_{d−N/α}(E) > 0 =⇒ P{X([1, 1 + ε0]^N) ∩ E ≠ ∅} > 0.   (1)
An application of Frostman's theorem completes the proof; see Theorem 2.2.1, Appendix C. It remains to demonstrate equation (1). From now on, we assume that E ⊂ [−M, M]^d for some fixed finite constant M > 0. Suppose µ ∈ P(E) is absolutely continuous with respect to Lebesgue's measure, and let f denote the probability density function dµ/dx. Define
I(µ) = ∫_{[1,1+ε0]^N} f(X_s) ds.
Since each X_s ∼ Nd(0, I),
E[I(µ)] = ε0^N (2π)^{−d/2} ∫_{[−M,M]^d} f(x) e^{−|x|²/2} dx ≥ ε0^N (2π)^{−d/2} e^{−M²/2}.   (2)
On the other hand, by Lemma 5.4.1, there exists a finite constant A > 0 such that E {I(µ)}2 ≤ A · Energyd− N (µ). α
Combining this and equation (2), together with the Paley–Zygmund inequality (Lemma 1.4.1, Chapter 3), we obtain the following: 2
E[I(µ)] N P X([1, 1 + ε0 ] ) ∩ E = ∅ ≥ P{I(µ) > 0} ≥ E[{I(µ)}2 −1 ≥ A1 Energyd− N (µ) , α
450
11. Multiparameter Markov Processes 2
−d −M where A1 = ε2N e /A. Since the left-hand side and the choice of 0 (2π) A1 are both independent of the choice of µ, we can deduce that −1
, P X([1, 1 + ε0 ]N ) ∩ E = ∅ ≥ A1 inf Energyd− N (µ) µ
α
where the infimum is taken over all probability measures µ ∈ P(E) that are absolutely continuous with respect to Lebesgue’s measure. Equation (1) now follows from Exercise 4.1.4 above.
6 Supplementary Exercises

1. (Hard) Suppose $X=(X_t;\,t\in\mathbf{R}^N_+)$ is an $N$-parameter, $S_\Delta$-valued Markov process with transition operators $T$ and resolvent $R$, where $S$ is a separable metric space (say $S=\mathbf{R}^d$) and $S_\Delta$ is its one-point compactification. For every $\lambda\succ0$ in $\mathbf{R}^N_+$, let $e(\lambda)$ denote the random vector defined in Section 3.2. Recall that $e(\lambda)$ is independent of the entire $X$ process.
(i) Define a new process $X^\lambda=(X^\lambda_t;\,t\in\mathbf{R}^N_+)$ by
\[
X^\lambda_t=\begin{cases}X_t,&\text{if } t\prec e(\lambda),\\ \Delta,&\text{otherwise.}\end{cases}
\]
Prove that $X^\lambda$ is an $N$-parameter Markov process. Compute its transition operators and its resolvents. (See also Theorem 1.3.2, Chapter 8.)
(ii) Show that whenever $X$ is a strongly symmetric Markov process on $\mathbf{R}^d$, $X^\lambda$ is a strongly symmetric Markov process on $\mathbf{R}^d_\Delta$. (What does this mean?)
(iii) By using $X^\lambda$, show that the upper bound in Theorem 3.1.1 can be improved to $A_2\,\mathrm{C}^\lambda_r(E)$. (Hint: Check that the integral in the latter theorem is equal to $\mathrm{P}_\nu\{X^\lambda(\mathbf{R}^N_+)\cap E\neq\varnothing\}$.)

2. Given an $N$-parameter Markov process on $\mathbf{R}^d$ and a compact set $K\subset\mathbf{R}^d$ with nonempty interior, we can define the set
\[
H_K=\bigl\{t\in\mathbf{R}^N_+:\ X_s\in K \text{ for all } s\preceq t\bigr\}.
\]
(i) Prove that the event $(t\in H_K)$ is $\mathcal{F}_t$-measurable, where $\mathcal{F}=(\mathcal{F}_t;\,t\in\mathbf{R}^N_+)$ denotes the complete augmented history of $X$.
(ii) Define the linear operators $T^K=(T^K_t;\,t\in\mathbf{R}^N_+)$ as follows: For all $f\in L^\infty(\mathbf{R}^d)$, all $t\in\mathbf{R}^N_+$, and all $x\in\mathbf{R}^d$,
\[
T^K_t f(x)=\mathrm{E}_x\bigl[f(X_t)\,\mathbf{1}_{H_K}(t)\bigr].
\]
Prove that for any nonnegative function $f$, all $t,s\in\mathbf{R}^N_+$, and all $x$, $T^K_{t+s}f(x)\le T^K_tT^K_sf(x)$. That is, $T^K$ is a subsemigroup. Prove that when $N=1$, $T^K$ is a proper Markov semigroup.
3. Consider $k\ge3$ independent, $d$-dimensional Brownian motions, all starting from distinct points of $\mathbf{R}^d$. Show that their ranges intersect with positive probability if and only if $d=1$ or $d=2$. Find an extension of this when the $X$'s are $\mathbf{R}^d$-valued, isotropic stable Lévy processes, all with index $\alpha\in\,]0,2]$. The first part is due to Dvoretzky et al. (1954).

4. Let $X$ denote an $N$-parameter Markov process on a compact, separable metric space $S$. If $R$ denotes its resolvent, prove that for all $\lambda\succ0$ in $\mathbf{R}^N_+$ and all $\varphi\in L^\infty(S)$, $t\mapsto e^{-\lambda\cdot t}R_\lambda\varphi(X_t)$ is an $N$-parameter supermartingale.

5. (Hard) Consider a one-parameter, isotropic stable process $X=(X_t;\,t\ge0)$ of index $\alpha\in\,]0,2]$, and show that given a compact set $E\subset\mathbf{R}_+$,
\[
\mathrm{P}\bigl(0\in X(E)\bigr)>0\iff \mathrm{Cap}_{d/\alpha}(E)>0.
\]
Conclude that no matter the value of $\alpha$, whenever $d\ge2$, $\mathrm{Leb}\{X(E)\}=0$, P-a.s. This is from Hawkes (1977b, Theorem 4) and is based on Hawkes (1970, Theorem 3).

6. (Hard) Suppose $X$ is an $N$-parameter, $S$-valued stochastic process, where $S$ is a compact separable metric space. Suppose $X$ satisfies all of the conditions for being an $N$-parameter Feller process, except that $t\mapsto X_t$ need not be right continuous. Prove that it has a modification that is a proper $N$-parameter Feller process. (Hint: Use Supplementary Exercise 4 and follow the proof of Theorem 4.1.1, Chapter 8.)

7. Suppose $X$ is an $N$-parameter Markov process, taking its values in some compact separable metric space $S$. Suppose further that there exist a measure $\nu$ on $S$ and a measurable function $p$ such that
\[
T_t f(x)=\int p_t(x,y)f(y)\,\nu(dy),
\]
for all $t\succ0$ in $\mathbf{R}^N_+$, all $x\in S$, and all $f\in L^\infty(S)$. Supposing that $\int_{\mathbf{R}^N_+}p_t(x,y)\,dt<\infty$, show that for all $\varphi\in L^\infty(S)$, $t\mapsto R_0\varphi(X_t)$ is a supermartingale, where $R_0\varphi(x)=\int_{\mathbf{R}^N_+}T_t\varphi(x)\,dt$.

8. (Hard) Consider two independent $d$-dimensional Brownian motions, $B^1$ and $B^2$. Let $P(\varepsilon)$ denote the probability that $B^1([1,2])$ is within $\varepsilon$ of $B^2([1,2])$. Prove the existence of positive and finite constants $A_1$ and $A_2$ such that for all $\varepsilon\in\,]0,1[$,
\[
A_1U(\varepsilon)\ \le\ P(\varepsilon)\ \le\ A_2U(\varepsilon),
\]
where
\[
U(\varepsilon)=\begin{cases}1,&\text{if } 1\le d\le3,\\ \{\ln_+(1/\varepsilon)\}^{-1},&\text{if } d=4,\\ \varepsilon^{d-4},&\text{if } d>4.\end{cases}
\]
(i) Show that the above estimate implies Theorem 4.3.2. (ii) What if $B^1$ and $B^2$ were independent $d$-dimensional isotropic stable Lévy processes, each with the same index $\alpha\in\,]0,2[$?
This is due to Aizenman (1985). (Hint: The lower bound on $P(\varepsilon)$ uses the Paley–Zygmund lemma. For the upper bound, consider the 2-parameter martingale
\[
M_t=\mathrm{E}\Bigl[\int_{[1,3]^2}\mathbf{1}_{(|B^1_s-B^2_u|\le\varepsilon)}\,ds\,du\ \Big|\ \mathcal{F}_t\Bigr],\qquad t\in[1,2]^2.
\]
Use the trivial inequality that for all $t=(t^1,t^2)\in[1,2]^2$, $M_t\ge M_t\mathbf{1}_{(|B^1_{t^1}-B^2_{t^2}|\le\varepsilon/2)}$, and apply Cairoli's strong $(2,2)$ inequality. This exercise is an extension of Proposition 1.4.1, Chapter 10.)

9. Show that a modification of the process $X$ of the example of Section 5.1 exists such that for all $a\prec b$, both in $\mathbf{R}^N_+$,
\[
\limsup_{h\to0^+}\ \sup_{\substack{s,t\in[a,b]:\\ |s-t|\le h}}\frac{|X_t-X_s|}{\{h^\beta\ln_+(1/h)\}^{1/2}}<\infty.
\]
(Hint: Consult Dudley's theorem; see Theorem 2.7.1, Chapter 5.)

10. Consider a $d$-dimensional Brownian motion $B=(B_t;\,t\ge0)$. If $E\subset\mathbf{R}_+$ is a compact set, show that with probability one,
\[
\dim\{B(E)\}=2\dim(E)\wedge d.
\]
This is due to McKean (1955a). (Hint: The upper bound on the dimension of $B(E)$ follows readily from Example 2, Section 1.2 of Appendix C. For the lower bound, take a probability measure $\mu$ on $E$ and consider the $\beta$-dimensional Bessel–Riesz energy of $\mu\circ B^{-1}\in\mathcal{P}(B(E))$. Start your analysis of this half by showing that for any $\beta>0$,
\[
\mathrm{Energy}_\beta(\mu\circ B^{-1})=\iint\|B_s-B_t\|^{-\beta}\,\mu(ds)\,\mu(dt).
\]
Finish by computing the expectation of the latter.)

11. Suppose $Y=(Y_t;\,t\in\mathbf{R}^N_+)$ is an $\mathbf{R}^d$-valued stochastic process with right-continuous trajectories, and let $E\subset\mathbf{R}^d$ be an open set. Prove that there is a measurable $T\in\mathbf{Q}^N_+\cup\{\infty\}$ ($\mathbf{R}^N_+\cup\{\infty\}$ denotes the one-point compactification of $\mathbf{R}^N_+$) such that there exists $t\in[0,1]^N$ with $Y_t(\omega)\in E$ if and only if $T(\omega)\in\mathbf{Q}^N_+\cap[0,1]^N$. Prove that equation (4) of Section 3.3 follows from this. (Hint: When $N=1$, you can define $T_n=\inf\{j2^{-n}:\ 0\le j\le 2^n,\ Y_{j2^{-n}}\in E\}$ and define $T=\inf_n T_n$. As usual, $\inf\varnothing=+\infty$. When $N>1$, work one parameter at a time.)
12. (Hard) Let $X$ denote an $N$-parameter, $\mathbf{R}^d$-valued additive stable process of index $\alpha\in\,]0,2]$. Suppose $d\ge2N$ and fix $M>1$. Prove that
\[
\lim_{\varepsilon\to0^+}\ \sup_{a\in[-M,M]^d}\mathrm{P}\bigl\{X([1,2]^N)\cap B(a;\varepsilon)\neq\varnothing\bigr\}=0.
\]
Conclude that a.s., the closure of $X([1,2]^N)$ has zero Lebesgue measure in $\mathbf{R}^d$. (Hint: Prove the appropriate variant of Supplementary Exercise 8.)

13. Let $X$ denote $N$-parameter additive Brownian motion in $\mathbf{R}^d$, where $d\le2N$. Prove that if $K\subset\mathbf{R}^d$ is compact,
\[
\mathrm{E}_0\Bigl[\int_{\mathbf{R}^N_+}e^{-\lambda\cdot s}\mathbf{1}_K(X_s)\,ds\Bigr]=+\infty,
\]
whenever $\lambda\in\mathbf{R}^N_+$ has a coordinate that is 0. Conclude that the potential operator $R_\lambda$ is a bounded linear operator only if $\lambda\succ0$ (in $\mathbf{R}^N_+$).
7 Notes on Chapter 11

Section 1 In the literature at large there does not seem to be an agreement on what a multiparameter Markov process should be. The development here is new and is designed to (i) handle the potential theory of many of the interesting processes; and (ii) be sufficiently general without being overburdened with technical details. However, earlier parts of the development of this theory can be found in Cairoli (1966, 1970a), as well as in Hirsch and Song (1995b), Mazziotto (1988), and Wong (1989). Bass and Pyke (1985, 1984a) contain existence and regularity results for a very large class of random fields that include the additive Lévy processes of this chapter. In considering general multiparameter Markov processes, similar questions are largely unanswered at the moment, and much more remains to be done on this topic.

Section 3 The original motivation for the development of Theorem 3.1.1 and its ilk was to decide when the trajectories of k independent Lévy processes intersect. This was the so-called Hendricks–Taylor problem, and it was solved in the pioneering work of Fitzsimmons and Salisbury (1989). In the background lurks an ingenious result of Fitzsimmons and Maisonneuve (1986); it essentially states that any one-parameter Markov process can be reversed, as long as one starts the process appropriately (that time-reversal again!). In this connection, see also the expository paper Salisbury (1996). The statement of the Hendricks–Taylor problem, as well as related matters, can be found in Hendricks (1972, 1974, 1979) and Taylor (1986, Problem 6). Hirsch and Song (1994, 1995a, 1995b, 1995c) extend the Fitzsimmons–Salisbury theory, by other methods, to exhibit a very general and elegant potential theory for multiparameter Markov processes. For related works, see Bass et al. (1994), Bass and Khoshnevisan (1993a), Bauer (1994), Dynkin (1980, 1981a, 1981b, 1983, 1984a, 1984b, 1985, 1986, 1987), Geman et al. (1984), Hawkes (1977a, 1978, 1979), Le Gall and Rosen (1991), Le Gall (1987), Peres (1996a, 1996b), Ren (1990), Rosen (1983, 1984), and Werner (1993). Some of the earlier works on the Hendricks–Taylor problem can be found in Evans (1987a, 1987b), Le Gall et al. (1989), and Rogers (1989). The methods of this section are an adaptation of those of Khoshnevisan and Shi (1999), developed to study the Brownian sheet (which is, unfortunately, not a multiparameter Markov process as we have defined such processes here).10

10 As a counterpoint to this last remark, see the Notes in Chapter 12.

Section 4 The material of this section is largely motivated by that of Fitzsimmons and Salisbury (1989), Hirsch (1995), and Khoshnevisan and Shi (1999). Technically speaking, Theorem 4.6.1 is new. However, you should read the one-parameter argument of McKean (1955b) with care to see that the necessary ideas are all there. Theorem 4.6.2 is a multiparameter extension of some of the results of Hawkes (1971a, 1971b). In fact, much more can be proved. For instance, Blumenthal and Getoor (1960a) compute dim{X(E)} for a compact set E, where X is a one-parameter stable Lévy process on $\mathbf{R}^d$. See Blumenthal and Getoor (1962),
Hawkes (1971a), Kahane (1985), Khoshnevisan and Xiao (2002), Le Gall (1992), Pruitt (1970, 1975), and Taylor (1966) for related results and many other references. In other directions, extensions can be found in Taylor (1953, 1955, 1973, 1986). The last two references are survey articles that point to the older literature. In one dimension, stochastic codimension was first formally defined in Khoshnevisan and Shi (2000). However, the real essence of such a notion appears much earlier in the literature. For example, see the proof of Taylor (1966, Theorem 4); see also Barlow and Perkins (1984), Lyons (1990), and Peres (1996b). Lemma 4.7.2 was first found in Peres (1996b, Lemma 2.2), but with a different proof. The bibliography mentioned so far misses three very interesting papers related to the material of this section by way of conditioned processes and time-reversal; see Davis and Salisbury (1988) and Salisbury (1988, 1992).

Section 5 The material from this section follows from the more general treatment of Weber (1983) and Testard (1985, 1986). The proofs presented here, as well as Exercise 5.3.3, were worked out jointly with Z. Shi; they are included here with his permission. Theorem 5.1.1 is completely classical. In the present probabilistic setting you can find it in Adler (1981, Section 7.2) and in Rozanov (1982, Section 2, Chapter 3).
12 The Brownian Sheet and Potential Theory
The results and discussions of Chapter 11 do not readily extend to cover a multiparameter process such as the Brownian sheet, since the latter is not a multiparameter Markov process according to the definitions of Chapter 11. In this regard, see Exercise 1.1.3, Chapter 11. In the first section of this chapter we extend and refine the arguments of Chapter 11 to estimate intersection probabilities for the range of the Brownian sheet. As in the development of Chapter 11, these estimates yield geometric information about the range of the Brownian sheet by way of Hausdorff dimension calculations. Section 2 of this chapter is concerned with the evaluation of the Hausdorff dimension of the zero set of the Brownian sheet; here, the zero set of an $N$-parameter process $X$ is the set of "times" $t\in\mathbf{R}^N_+$ at which $X_t=0$. In order to perform this calculation, we need a way of estimating the probability that the zero set intersects a small set (yet another application of intersection probabilities). The third, and final, section is a brief introduction to the theory of local times. Roughly speaking, the local time of a process $X$ at zero is the most natural random measure that one can construct on the zero set. The arguments of Section 2 rely on an approximate version of this local time, and this connection will be developed further in Section 3. Throughout this chapter, $B=(B_t;\,t\in\mathbf{R}^N_+)$ will denote an $N$-parameter, $d$-dimensional Brownian sheet.
1 Polar Sets for the Range of the Brownian Sheet

A (nonrandom) set $E\subset\mathbf{R}^d$ is said to be polar for a (random) set $K\subset\mathbf{R}^d$ if $\mathrm{P}(K\cap E\neq\varnothing)=0$; otherwise, $E$ is nonpolar for $K$. When $K$ is the range of
a suitable multiparameter Markov process in Rd , the results of Chapter 11 characterize the collection of all compact, and hence σ-compact, sets in Rd that are polar for the range of the aforementioned multiparameter Markov process. The primary goal of this chapter is to determine the collection of all polar sets for the range of the Brownian sheet.
1.1 Intersection Probabilities

Recalling the definition of the Bessel–Riesz capacities $\mathrm{Cap}_\beta$ from Appendix C, the main result of this chapter is that the range of $B$ intersects a compact set $E\subset\mathbf{R}^d$ with positive probability if and only if $\mathrm{Cap}_{d-2N}(E)>0$. This agrees with Kakutani's theorem when $N=1$; cf. Theorem 3.1.1, Chapter 10. Before we attempt detailed calculations, note that no matter how small or how thin $E$ is, $\mathrm{P}(B(\mathbf{R}^N_+)\cap E\neq\varnothing)=1$ as long as $E$ contains the origin. This is due to the fact that $B_t=0$ for all $t$ on the boundary of the parameter space $\mathbf{R}^N_+$. However, this is merely a superficial problem, and there are two ways to avoid it. To explain the first method, note that the portion of $E$ that is away from the origin is polar for the range of $B$ if and only if $E\cap[n^{-1},n]^d$ is polar for all integers $n\ge1$. (Why?) Consequently, we need only characterize all compact polar subsets of $\mathbf{R}^d\setminus\{0\}$. Alternatively, we can slightly redefine our notion of polarity by declaring $E$ polar (for the range of $B$) if and only if for all $0\prec a\prec b$, $\mathrm{P}(B([a,b])\cap E\neq\varnothing)=0$. Following Khoshnevisan and Shi (1999), we take the latter approach, due to its technical simplicity. The former approach is developed in Supplementary Exercise 4.

Theorem 1.1.1 Let $E$ denote a compact subset of $\mathbf{R}^d$. For any $0\prec a\prec b$, both in $\mathbf{R}^N_+$, there exists a finite constant $A\ge1$ such that
\[
\frac1A\,\mathrm{Cap}_{d-2N}(E)\ \le\ \mathrm{P}\bigl\{B([a,b])\cap E\neq\varnothing\bigr\}\ \le\ A\,\mathrm{Cap}_{d-2N}(E).
\]
The following is an immediate, but important, corollary of this theorem.

Corollary 1.1.1 Suppose $E$ is a compact subset of $\mathbf{R}^d\setminus\{0\}$. Then $E$ is nonpolar for the range of the $N$-parameter Brownian sheet if and only if $\mathrm{Cap}_{d-2N}(E)>0$. In particular, Frostman's theorem implies that the range of the Brownian sheet intersects $E$ with positive probability if $\dim(E)>d-2N$, while there are no such intersections if $\dim(E)<d-2N$; cf. Theorem 2.1.1 of Appendix C.

Recalling stochastic codimension from Section 4.7, Chapter 11, we deduce the following as a by-product.

Corollary 1.1.2 For any $0\prec a\prec b$, both in $\mathbf{R}^N_+$,
\[
\mathrm{codim}\,B([a,b])=\mathrm{codim}\,B(\mathbf{R}^N_+)=(d-2N)^+.
\]
Exercise 1.1.1 Complete the derivation of Corollary 1.1.2.
According to Theorem 4.7.1 of Chapter 11, we can now deduce the following computation of the Hausdorff dimension of the image of a rectangle $[a,b]\subset\mathbf{R}^N_+$ under the random map $B$.

Corollary 1.1.3 For any $0\prec a\prec b$, both in $\mathbf{R}^N_+$,
\[
\dim B([a,b])=\dim\{B(\mathbf{R}^N_+)\}=2N\wedge d,\qquad\text{a.s.}
\]
Exercise 1.1.2 Complete the derivation of Corollary 1.1.3.
In particular, when $2N<d$, the image of $[0,1]^N$ under the random map $B$ has zero $d$-dimensional Lebesgue measure, while it has Hausdorff dimension $2N$. That is, when $2N<d$, $B([0,1]^N)$ has an irregular, sometimes called fractal, structure, in that it has a Hausdorff dimension (here, $2N$) that is different from its Euclidean dimension (here, $d$). Perhaps this fractal-like behavior can be detected in the (multiparameter) random walk simulations/approximations of Figures 12.1, 12.2, and 12.3. All three figures show a simulation of the same 2-parameter, 1-dimensional Brownian sheet. The figures are plotted so that the $(x,y)$-plane forms the 2-parameter time space, while the $z$-axis shows the values of the process. That is, in all three figures, $(x,y,z)=(t^{(1)},t^{(2)},B_t)$.
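The random-walk approximation behind these figures is straightforward to reproduce. The following minimal sketch (ours, not the book's; the function name and grid parameters are invented for illustration) builds a 2-parameter Brownian sheet on a grid by assigning each grid cell an independent centered Gaussian mass whose variance equals the cell's area, and then taking a double cumulative sum:

```python
import numpy as np

def brownian_sheet(n, T=1.0, rng=None):
    """Random-walk approximation of a 2-parameter, R-valued Brownian
    sheet on [0, T]^2, returned as an (n+1) x (n+1) array whose (i, j)
    entry approximates B at time (i*T/n, j*T/n)."""
    rng = np.random.default_rng(rng)
    dt = T / n
    # One independent N(0, dt^2) "white-noise" mass per grid cell;
    # dt^2 is the cell's area.
    dW = rng.standard_normal((n, n)) * dt
    B = np.zeros((n + 1, n + 1))
    # B_t is the total mass of the rectangle [0, t]: a double cumulative
    # sum.  The sheet vanishes on the axes of the parameter space, so
    # the first row and column stay identically zero.
    B[1:, 1:] = dW.cumsum(axis=0).cumsum(axis=1)
    return B

sheet = brownian_sheet(200, rng=0)
print(sheet.shape)  # (201, 201)
```

Plotting `sheet` as a surface over the grid of time points produces pictures like Figures 12.1–12.3. Note that the approximation has the exact covariance at the grid points: the variance of the entry at $(i,j)$ is $(i\,T/n)(j\,T/n)$, matching $\mathrm{Var}(B_{(s,t)})=st$.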
1.2 Proof of Theorem 1.1.1: Lower Bound

Let $E\subset\mathbf{R}^d$ be a fixed compact set. Throughout, $M$ denotes the outer radius of $E$; i.e.,
\[
M=\sup\{\|x\|:\ x\in E\}. \tag{1}
\]
We start our proof of Theorem 1.1.1 by defining the following: For all $a\prec b$, both in $\mathbf{R}^N_+$, and for all measurable functions $f:\mathbf{R}^d\to\mathbf{R}_+$ that are supported on $[-M,M]^d$,
\[
J_{a,b}(f)=\int_{[a,b]}f(B_s)\,ds. \tag{2}
\]
Using the explicit form of the density function of $B_s$ (Exercise 1.1.1, Chapter 5), we can write
\[
\mathrm{E}[J_{a,b}(f)]=\int_{[a,b]}\int_{[-M,M]^d}f(y)\,p_s(y)\,dy\,ds,
\]
where for all $s\in\mathbf{R}^N_+$ and all $y\in\mathbf{R}^d$,
\[
p_s(y)=(2\pi)^{-\frac d2}\Bigl(\prod_{\ell=1}^N s^{(\ell)}\Bigr)^{-\frac d2}\exp\Bigl(-\frac{\|y\|^2}{2\prod_{\ell=1}^N s^{(\ell)}}\Bigr). \tag{3}
\]
Figure 12.1: Aerial view of an R-valued, 2-parameter Brownian sheet
Figure 12.2: Another view of the same Brownian sheet
Figure 12.3: A side view of the same Brownian sheet
Note that whenever $s\in[a,b]$ and $y\in[-M,M]^d$,
\[
p_s(y)\ \ge\ (2\pi)^{-\frac d2}\Bigl(\prod_{\ell=1}^N b^{(\ell)}\Bigr)^{-\frac d2}\exp\Bigl(-\frac{dM^2}{2\prod_{\ell=1}^N a^{(\ell)}}\Bigr)\ =\ A_1.
\]
Thus, we obtain the following "first moment estimate."

Lemma 1.2.1 For all $0\prec a\prec b$, both in $\mathbf{R}^N_+$, there exists a finite constant $A_1>0$ such that for all measurable functions $f:\mathbf{R}^d\to\mathbf{R}_+$ that are supported in $[-M,M]^d$,
\[
\mathrm{E}[J_{a,b}(f)]\ \ge\ A_1\cdot\int f(y)\,dy.
\]
We shall prove the following "second moment estimate" for $J_{a,b}(f)$ in the next subsection.

Lemma 1.2.2 For all $0\prec a\prec b$, both in $\mathbf{R}^N_+$, and all $m>0$, there exists a finite constant $A_2>0$ such that for all measurable functions $f:\mathbf{R}^d\to\mathbf{R}_+$ that are supported in $[-m,m]^d$,
\[
\mathrm{E}\bigl[\{J_{a,b}(f)\}^2\bigr]\ \le\ A_2\cdot\mathrm{Energy}_{d-2N}(f),
\]
where $\mathrm{Energy}_\beta(f)$ denotes the $\beta$-dimensional Bessel–Riesz energy of the measure $\mu(dx)=f(x)\,dx$; cf. Appendix C.

Armed with this, we can readily demonstrate the lower bound for the probability of Theorem 1.1.1.

Proof of Theorem 1.1.1: Lower Bound Recall the outer radius $M$ of $E$ from equation (1), and hold fixed a probability density function $f:\mathbf{R}^d\to\mathbf{R}_+$ that is supported in $E^\varepsilon$, where $\varepsilon\in\,]0,1[$ is arbitrary and $E^\varepsilon$ denotes the $\varepsilon$-enlargement of $E$; that is,
\[
E^\varepsilon=\bigl\{y\in\mathbf{R}^d:\ \mathrm{dist}(y;E)<\varepsilon\bigr\}.
\]
We note that whenever $J_{a,b}(f)$ is strictly positive, there must exist some $t\in[a,b]$ such that $B_t\in E^\varepsilon$. Equivalently,
\[
\bigl(J_{a,b}(f)>0\bigr)\ \subseteq\ \bigl(B([a,b])\cap E^\varepsilon\neq\varnothing\bigr).
\]
In particular,
\[
\mathrm{P}\bigl\{B([a,b])\cap E^\varepsilon\neq\varnothing\bigr\}\ \ge\ \mathrm{P}\bigl(J_{a,b}(f)>0\bigr)\ \ge\ \frac{\bigl(\mathrm{E}[J_{a,b}(f)]\bigr)^2}{\mathrm{E}\bigl[\{J_{a,b}(f)\}^2\bigr]}.
\]
We have used the Paley–Zygmund lemma in the last line; cf. Lemma 1.4.1 of Chapter 3. Since $\varepsilon\in\,]0,1[$, $f$ is supported in $[-m,m]^d$ with $m=M+1$.
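For the record, the Paley–Zygmund step used in the last display is a one-line consequence of the Cauchy–Schwarz inequality (our sketch, with $J=J_{a,b}(f)\ge0$):

```latex
\mathrm{E}[J]=\mathrm{E}\bigl[J\,\mathbf{1}_{(J>0)}\bigr]
  \le \bigl(\mathrm{E}[J^2]\bigr)^{1/2}\bigl(\mathrm{P}\{J>0\}\bigr)^{1/2},
\qquad\text{so}\qquad
\mathrm{P}\{J>0\}\ \ge\ \frac{(\mathrm{E}[J])^2}{\mathrm{E}[J^2]}.
```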
Hence, we can apply Lemmas 1.2.1 and 1.2.2 and deduce the existence of constants $A_1$ and $A_2$ such that, independently of the choice of $f$ and $\varepsilon$,
\[
\mathrm{P}\bigl\{B([a,b])\cap E^\varepsilon\neq\varnothing\bigr\}\ \ge\ \frac{A_1^2}{A_2\,\mathrm{Energy}_{d-2N}(f)},
\]
where $1\div\infty=0$. Optimizing over the choice of all probability density functions $f$ that are supported on $E^\varepsilon$, we see that for all $\varepsilon>0$,
\[
\mathrm{P}\bigl\{B([a,b])\cap E^\varepsilon\neq\varnothing\bigr\}\ \ge\ \frac{A_1^2}{A_2}\cdot\mathrm{Cap}^{\mathrm{ac}}_{d-2N}(E^\varepsilon),
\]
where $\mathrm{Cap}^{\mathrm{ac}}_\beta$ denotes the absolutely continuous capacity corresponding to the gauge function $x\mapsto\|x\|^{-\beta}$, for any $\beta>0$. By Exercise 4.1.4, Chapter 11, there exists a finite constant $K>1$ such that
\[
K^{-1}\,\mathrm{Cap}_{d-2N}(E^\varepsilon)\ \le\ \mathrm{Cap}^{\mathrm{ac}}_{d-2N}(E^\varepsilon)\ \le\ K\,\mathrm{Cap}_{d-2N}(E^\varepsilon).
\]
It should be recognized that $K$ depends only on the distance between $E$ and the axes of $\mathbf{R}^N_+$, the outer radius of $E$, and the parameters $N$ and $d$. As $\varepsilon\to0^+$, $E^\varepsilon\downarrow E$, which is compact. Since $t\mapsto B_t$ is continuous, Lemma 2.1.2 of Appendix D implies that
\[
\mathrm{P}\bigl\{B([a,b])\cap E\neq\varnothing\bigr\}\ \ge\ \frac{A_1^2}{A_2K}\cdot\mathrm{Cap}_{d-2N}(E).
\]
(Why?) This concludes our proof of the lower bound of Theorem 1.1.1, where $A$ is any constant chosen to be larger than the maximum of 1 and $A_2K/A_1^2$. □
1.3 Proof of Lemma 1.2.2

Our proof of Lemma 1.2.2 is divided into several steps. First, we need a technical real-variable result that estimates the following function: For all $s,t\in\mathbf{R}^N_+$,
\[
\sigma^2(s,t)=\frac{\prod_{\ell=1}^N s^{(\ell)}t^{(\ell)}-\prod_{\ell=1}^N\bigl(s^{(\ell)}\wedge t^{(\ell)}\bigr)^2}{\prod_{\ell=1}^N s^{(\ell)}t^{(\ell)}}. \tag{1}
\]
Throughout this section we will choose and fix two points $a,b\in\mathbf{R}^N_+$ with $0\prec a\prec b$. These are the same constants $a$ and $b$ of Lemma 1.2.2 of the previous subsection.

Lemma 1.3.1 There exists a finite constant $A_1>1$ such that for all $s,t\in[a,b]$,
\[
\frac1{A_1}\,\|s-t\|\ \le\ \sigma^2(s,t)\ \le\ A_1\,\|s-t\|.
\]
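Lemma 1.3.1 is easy to test numerically. The sketch below is ours, not the book's; it uses the equivalent form $\sigma^2(s,t)=1-\prod_\ell(s^{(\ell)}\wedge t^{(\ell)})/\prod_\ell(s^{(\ell)}\vee t^{(\ell)})$, which follows from definition (1) because $s^{(\ell)}t^{(\ell)}=(s^{(\ell)}\wedge t^{(\ell)})(s^{(\ell)}\vee t^{(\ell)})$ coordinatewise, and checks that the ratio $\sigma^2(s,t)/\|s-t\|$ stays bounded away from 0 and $\infty$ on a rectangle $[a,b]$:

```python
import numpy as np

def sigma2(s, t):
    # sigma^2(s, t) = 1 - prod(s ^ t) / prod(s v t); equivalent to (1)
    # because s*t = (s ^ t)(s v t) in each coordinate.
    return 1.0 - np.minimum(s, t).prod() / np.maximum(s, t).prod()

rng = np.random.default_rng(0)
N, a, b = 2, 1.0, 2.0          # a sample rectangle [a, b]^N with 0 < a < b
ratios = []
for _ in range(10_000):
    s, t = rng.uniform(a, b, N), rng.uniform(a, b, N)
    dist = np.linalg.norm(s - t)
    if dist > 1e-8:
        ratios.append(sigma2(s, t) / dist)
ratios = np.array(ratios)
# Lemma 1.3.1 asserts the ratio is sandwiched between 1/A_1 and A_1.
print(round(ratios.min(), 3), round(ratios.max(), 3))
```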
That is, as $s$ tends to $t$, $\sigma^2(s,t)$ goes to 0, roughly at the rate $\|s-t\|=\{\sum_{\ell=1}^N|s^{(\ell)}-t^{(\ell)}|^2\}^{1/2}$. Moreover, this approximation holds uniformly over all $s,t\in[a,b]$. The significance of the function $\sigma$ will be shown later. Right now, we set out to prove the technical Lemma 1.3.1.

Proof Throughout this proof we define
\[
C=\frac{\max_{1\le j\le N}b^{(j)}}{\min_{1\le j\le N}a^{(j)}}.
\]
For all $s,t\in[a,b]$,
\[
\prod_{j=1}^N\frac{s^{(j)}\vee t^{(j)}}{s^{(j)}\wedge t^{(j)}}
=\exp\Bigl\{\sum_{j=1}^N\ln\Bigl(\frac{s^{(j)}\vee t^{(j)}}{s^{(j)}\wedge t^{(j)}}\Bigr)\Bigr\}
=\exp\Bigl\{\sum_{j=1}^N\ln\Bigl(1+\frac{|s^{(j)}-t^{(j)}|}{s^{(j)}\wedge t^{(j)}}\Bigr)\Bigr\}.
\]
On the other hand, since $\ln(1+r)=\int_1^{1+r}u^{-1}\,du$,
\[
\frac{r}{1+r}\ \le\ \ln(1+r)\ \le\ r,\qquad r>0.
\]
Consequently, for all $0\le r\le C$,
\[
\frac{r}{1+C}\ \le\ \ln(1+r)\ \le\ r.
\]
We apply this estimate to $r=|s^{(j)}-t^{(j)}|/(s^{(j)}\wedge t^{(j)})$ and deduce the following:
\[
\exp\Bigl\{\sum_{j=1}^N\frac{|s^{(j)}-t^{(j)}|}{(1+C)(s^{(j)}\wedge t^{(j)})}\Bigr\}
\ \le\ \prod_{j=1}^N\frac{s^{(j)}\vee t^{(j)}}{s^{(j)}\wedge t^{(j)}}
\ \le\ \exp\Bigl\{\sum_{j=1}^N\frac{|s^{(j)}-t^{(j)}|}{s^{(j)}\wedge t^{(j)}}\Bigr\}.
\]
In the above we have used the simple fact that $a^{(j)}\le s^{(j)}\wedge t^{(j)}\le b^{(j)}$ to ensure that our $r$ is indeed in $[0,C]$. Using this simple fact once more, we see that
\[
e^{C_1|s-t|}\ \le\ \prod_{j=1}^N\frac{s^{(j)}\vee t^{(j)}}{s^{(j)}\wedge t^{(j)}}\ \le\ e^{C_2|s-t|}, \tag{2}
\]
where
\[
C_1=\frac{1}{(1+C)\max_{1\le j\le N}b^{(j)}},\qquad C_2=\frac{N}{\min_{1\le j\le N}a^{(j)}}.
\]
Now we estimate the above exponentials. Note that for all $x\ge0$,
\[
1+x\ \le\ e^x\ \le\ 1+xe^x.
\]
This follows immediately from $e^x=1+\int_0^x e^u\,du$. Applying equation (2), together with the above estimates, we see that
\[
1+C_1|s-t|\ \le\ \prod_{j=1}^N\frac{s^{(j)}\vee t^{(j)}}{s^{(j)}\wedge t^{(j)}}\ \le\ 1+C_2|s-t|e^{C_2|s-t|}.
\]
Since $|s-t|\le\max_{1\le j\le N}b^{(j)}$, $C_2|s-t|\le NC$, and this gives
\[
1+C_1|s-t|\ \le\ \prod_{j=1}^N\frac{s^{(j)}\vee t^{(j)}}{s^{(j)}\wedge t^{(j)}}\ \le\ 1+C_3|s-t|, \tag{3}
\]
with $C_3=C_2e^{NC}$. Now we finish verifying the lemma. Clearly,
\[
\sigma^2(s,t)=\prod_{\ell=1}^N\frac{s^{(\ell)}\wedge t^{(\ell)}}{s^{(\ell)}\vee t^{(\ell)}}\cdot\Bigl\{\prod_{j=1}^N\frac{s^{(j)}\vee t^{(j)}}{s^{(j)}\wedge t^{(j)}}-1\Bigr\}.
\]
Equally clearly,
\[
C^{-N}\ \le\ \prod_{\ell=1}^N\frac{s^{(\ell)}\wedge t^{(\ell)}}{s^{(\ell)}\vee t^{(\ell)}}\ \le\ C^N.
\]
Thus, equation (3) implies
\[
C^{-N}C_1\cdot|s-t|\ \le\ \sigma^2(s,t)\ \le\ C^NC_3\cdot|s-t|.
\]
By the Cauchy–Schwarz inequality, $N^{-1}\|s-t\|\le|s-t|\le\|s-t\|$. Consequently, the lemma follows once we let $A_1$ be the maximum of 1, $C^NC_3$, and $NC^NC_1^{-1}$. □

For all $s,t\in[a,b]$, define
\[
R(s,t)=\frac{\prod_{\ell=1}^N\bigl(s^{(\ell)}\wedge t^{(\ell)}\bigr)}{\prod_{\ell=1}^N t^{(\ell)}}. \tag{4}
\]
The following is the Brownian sheet analogue of Lemma 5.3.1 of Chapter 11. It also shows the role of the function $(s,t)\mapsto\sigma^2(s,t)$ via the function $(s,t)\mapsto R(s,t)$.

Lemma 1.3.2 For any two points $s,t\in[a,b]$, $B_s-R(s,t)B_t$ is independent of $B_t$. Moreover, $B_s-R(s,t)B_t\sim\mathcal{N}_d\bigl(0,\sigma^2(s,t)\,\mathbf{I}_d\bigr)$, where $\mathbf{I}_d$ denotes the $(d\times d)$ identity matrix.
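The independence claim in Lemma 1.3.2 reduces to a single covariance computation, which we sketch here in our own notation (it is the first step of Exercise 1.3.1 below): each coordinate of $B$ satisfies $\mathrm{E}[B^{(i)}_sB^{(i)}_t]=\prod_{\ell=1}^N(s^{(\ell)}\wedge t^{(\ell)})$, and the definition (4) of $R(s,t)$ is exactly what makes the relevant covariance vanish:

```latex
\mathrm{Cov}\bigl(B^{(i)}_s-R(s,t)B^{(i)}_t,\ B^{(i)}_t\bigr)
  =\prod_{\ell=1}^N\bigl(s^{(\ell)}\wedge t^{(\ell)}\bigr)
   -R(s,t)\prod_{\ell=1}^N t^{(\ell)}
  =0,\qquad\text{by (4).}
```

Since the variables in question are jointly Gaussian, zero covariance implies independence.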
Exercise 1.3.1 Prove Lemma 1.3.2.
Next, we need to estimate the size of $1-R(s,t)$ when $s\approx t$. The proof is based on arguments similar to those used to prove Lemma 1.3.1.

Lemma 1.3.3 There exists a finite constant $A_2\ge1$ such that for all $s,t\in[a,b]$,
\[
\frac1{A_2}\,\|s-t\|\ \le\ 1-R(s,t)\ \le\ A_2\,\|s-t\|.
\]
Proof Later on, we will need only the upper bound on $1-R(s,t)$. Therefore, we will verify only this part; the lower bound is proved in Exercise 1.3.2 below. Note that
\[
R(s,t)=\exp\Bigl\{\sum_{\ell=1}^N\ln\Bigl(1-\frac{t^{(\ell)}-s^{(\ell)}\wedge t^{(\ell)}}{t^{(\ell)}}\Bigr)\Bigr\}.
\]
For all $r\in[0,1[$, $\ln(1-r)=-\int_{1-r}^1u^{-1}\,du\ge-r/(1-r)$. Applied to $r=(t^{(\ell)}-s^{(\ell)}\wedge t^{(\ell)})/t^{(\ell)}$, for which $r/(1-r)=(t^{(\ell)}-s^{(\ell)}\wedge t^{(\ell)})/(s^{(\ell)}\wedge t^{(\ell)})$, this yields
\[
\begin{aligned}
R(s,t)&\ \ge\ \exp\Bigl(-\sum_{\ell=1}^N\frac{|t^{(\ell)}-s^{(\ell)}\wedge t^{(\ell)}|}{s^{(\ell)}\wedge t^{(\ell)}}\Bigr)\\
&\ \ge\ \exp\Bigl(-\frac{1}{\min_{1\le j\le N}a^{(j)}}\cdot\sum_{\ell=1}^N|t^{(\ell)}-s^{(\ell)}\wedge t^{(\ell)}|\Bigr)\\
&\ \ge\ \exp\Bigl(-\frac{1}{\min_{1\le j\le N}a^{(j)}}\cdot\sum_{\ell=1}^N|t^{(\ell)}-s^{(\ell)}|\Bigr)\\
&\ \ge\ \exp\Bigl(-\frac{\sqrt N}{\min_{1\le j\le N}a^{(j)}}\,\|s-t\|\Bigr).
\end{aligned}
\]
We have used the Cauchy–Schwarz inequality in the last line. The upper bound of the lemma now follows from the inequality $1-x\le e^{-x}$, $x\ge0$. □

Exercise 1.3.2 Verify the lower bound of Lemma 1.3.3.
We will also need a joint probability estimate.

Lemma 1.3.4 For all $m>0$, there exists a finite constant $A_3>0$ such that for all $s,t\in[a,b]$ and all measurable functions $f,g:\mathbf{R}^d\to\mathbf{R}_+$ that are supported on $[-m,m]^d$,
\[
\mathrm{E}\bigl[f(B_s)g(B_t)\bigr]\ \le\ A_3\,\|s-t\|^{-\frac d2}\cdot\iint f(w)g(y)\exp\Bigl(-\frac{\|w-y\|^2}{4A_1\|s-t\|}\Bigr)\,dw\,dy,
\]
where $A_1$ is the constant of Lemma 1.3.1.
Proof Clearly,
\[
\begin{aligned}
\mathrm{E}\bigl[f(B_s)g(B_t)\bigr]&=\mathrm{E}\bigl[f\bigl(B_s-R(s,t)B_t+R(s,t)B_t\bigr)g(B_t)\bigr]\\
&=\int g(y)\,p_t(y)\,\mathrm{E}\bigl[f\bigl(B_s-R(s,t)B_t+R(s,t)y\bigr)\bigr]\,dy,
\end{aligned}
\]
where $p_t(y)$ is given by equation (3) of Section 1.2. Since $g$ is supported on $[-m,m]^d$, it suffices to consider $p_t(y)$ for $t\in[a,b]$ and $y\in[-m,m]^d$. In this case, it follows immediately from (3) of Section 1.2 that $p_t(y)\le A_4$, where
\[
A_4=(2\pi)^{-\frac d2}\Bigl(\prod_{\ell=1}^N a^{(\ell)}\Bigr)^{-\frac d2}.
\]
Thus,
\[
\mathrm{E}\bigl[f(B_s)g(B_t)\bigr]\ \le\ A_4\cdot\int g(y)\,\mathrm{E}\bigl[f\bigl(B_s-R(s,t)B_t+R(s,t)y\bigr)\bigr]\,dy. \tag{5}
\]
Now let us use Lemma 1.3.2 and the explicit form of the Gaussian densities to deduce that for all $s,t\in[a,b]$ and all $y\in[-m,m]^d$,
\[
\mathrm{E}\bigl[f\bigl(B_s-R(s,t)B_t+R(s,t)y\bigr)\bigr]=(2\pi)^{-\frac d2}\sigma(s,t)^{-d}\int f\bigl(z+R(s,t)y\bigr)\exp\Bigl(-\frac{\|z\|^2}{2\sigma^2(s,t)}\Bigr)\,dz.
\]
We can use Lemma 1.3.1 and the fact that $2\pi\ge1$ (to simplify the constants) and deduce that for all $s,t\in[a,b]$ and for all $y\in[-m,m]^d$,
\[
\begin{aligned}
\mathrm{E}\bigl[f\bigl(B_s-R(s,t)B_t+R(s,t)y\bigr)\bigr]&\ \le\ A_1^{\frac d2}\|s-t\|^{-\frac d2}\int f\bigl(z+R(s,t)y\bigr)\exp\Bigl(-\frac{\|z\|^2}{2A_1\|s-t\|}\Bigr)\,dz\\
&\ =\ A_1^{\frac d2}\|s-t\|^{-\frac d2}\int f(w)\exp\Bigl(-\frac{\|w-R(s,t)y\|^2}{2A_1\|s-t\|}\Bigr)\,dw,
\end{aligned}
\tag{6}
\]
upon changing variables $w=z+R(s,t)y$. On the other hand, it is simple to see that
\[
\|w-y\|^2\ \le\ 2\|w-R(s,t)y\|^2+2\|y\|^2\cdot\bigl(1-R(s,t)\bigr)^2.
\]
Thus, an application of Lemma 1.3.3 shows us that for all $s,t\in[a,b]$, all $w\in\mathbf{R}^d$, and all $y\in[-m,m]^d$,
\[
\|w-R(s,t)y\|^2\ \ge\ \tfrac12\|w-y\|^2-m^2A_2^2\|s-t\|^2\ \ge\ \tfrac12\|w-y\|^2-2m^2A_2^2N^{\frac12}|b|\cdot\|s-t\|.
\]
We have used the elementary bound $\|s-t\|\le N^{\frac12}|b|$. Plugging this into equation (6), we see that for all $s,t\in[a,b]$ and for all $y\in[-m,m]^d$,
\[
\mathrm{E}\bigl[f\bigl(B_s-R(s,t)B_t+R(s,t)y\bigr)\bigr]\ \le\ A_1^{\frac d2}e^{m^2A_2^2N^{1/2}|b|}\,\|s-t\|^{-\frac d2}\int f(w)\exp\Bigl(-\frac{\|w-y\|^2}{4A_1\|s-t\|}\Bigr)\,dw.
\]
This and equation (5) together imply the lemma, where the constant $A_3$ is defined by $A_1^{d/2}A_4\exp(m^2A_2^2N^{1/2}|b|)$. □

Next, we present a final real-variable lemma. For all $\beta>0$, define
\[
\Phi_\beta(x)=\int_{s\in\mathbf{R}^N:\ \|s\|\le x}\|s\|^{-\beta}e^{-1/\|s\|}\,ds,\qquad x\ge0. \tag{7}
\]
Lemma 1.3.5 For all $m,\beta>0$, there exists a finite constant $A_5\ge1$ such that for all $x\in[m,\infty[$,
\[
\frac1{A_5}\,U_\beta(x)\ \le\ \Phi_\beta(x)\ \le\ A_5\,U_\beta(x),
\]
where
\[
U_\beta(x)=\begin{cases}1,&\text{if }\beta>N,\\ \ln_+(x),&\text{if }\beta=N,\\ x^{-\beta+N},&\text{if }\beta<N.\end{cases}
\]
Remark The function $U_\beta$ and its variants have already made several appearances in this book. For example, noting the slight change in notation, see equation (1) from Section 1.3 of Chapter 10.

Proof We will prove the upper bound for $\Phi_\beta$; the lower bound is proved similarly and can be found in Exercise 1.3.3 below. We can assume, without loss of generality, that $m<1$. By calculating in polar coordinates (Supplementary Exercise 7, Chapter 3), we can find a finite constant $C_1>0$ such that
\[
\Phi_\beta(x)=C_1\int_0^x\lambda^{-\beta+N-1}e^{-1/\lambda}\,d\lambda.
\]
Define $C_2=C_1\int_0^1\lambda^{-\beta+N-1}e^{-1/\lambda}\,d\lambda$ and note that it is finite. Moreover, for all $m\le x\le1$,
\[
\Phi_\beta(x)\ \le\ C_2\ \le\ \frac{C_2}{\inf_{m\le w\le1}U_\beta(w)}\,U_\beta(x).
\]
Of course, the form of the function $U_\beta$ makes it apparent that $\inf_{m\le w\le1}U_\beta(w)>0$.
Thus, it remains to prove the upper bound of the lemma for $x>1$. For the remainder of this proof, we assume $x>1$. When $\beta>N$,
\[
\Phi_\beta(x)\ \le\ C_2+C_1\int_1^\infty\lambda^{-\beta+N-1}\,d\lambda\ =\ C_3.
\]
(Note that $C_3<\infty$.) When $\beta=N$,
\[
\Phi_\beta(x)\ \le\ C_2+C_1\int_1^x\lambda^{-1}\,d\lambda\ \le\ (C_2+C_1)\ln_+(x).
\]
Finally, when $\beta<N$,
\[
\Phi_\beta(x)\ \le\ C_2+C_1\int_1^x\lambda^{-\beta+N-1}\,d\lambda\ \le\ C_2+C_1(N-\beta)^{-1}x^{-\beta+N}\ \le\ \bigl\{C_2+C_1(N-\beta)^{-1}\bigr\}x^{-\beta+N}.
\]
The upper bound of the lemma follows for any choice of $A_5$ that is greater than the maximum of 1, $C_3$, $C_2+C_1(N-\beta)^{-1}$, and $C_2/\inf_{m\le w\le1}U_\beta(w)$. □

Exercise 1.3.3 Verify the lower bound for $\Phi_\beta$ in Lemma 1.3.5.
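The growth rates in Lemma 1.3.5 can also be checked numerically from the radial form used in the proof. The sketch below is ours, not the book's; it drops the unknown constant $C_1$, approximates the radial integral by a Riemann sum, and verifies that $\Phi_\beta/U_\beta$ stays bounded in each of the three regimes (the bounded-below version of $\ln_+$ is our simplifying assumption):

```python
import numpy as np

N = 2  # the number of parameters

def phi_radial(beta, x, n=200_000):
    # Riemann-sum approximation of int_0^x lambda^{N-beta-1} e^{-1/lambda} dlambda,
    # which equals Phi_beta(x) / C_1 by the polar-coordinate computation above.
    lam = np.linspace(1e-6, x, n)
    return np.sum(lam ** (N - beta - 1) * np.exp(-1.0 / lam)) * (lam[1] - lam[0])

def U(beta, x):
    if beta > N:
        return 1.0
    if beta == N:
        return max(np.log(x), 1.0)  # a version of ln_+ bounded below by 1
    return x ** (N - beta)

# In each regime (beta < N, beta = N, beta > N), the ratio Phi/U stays
# bounded above and below as x grows, as Lemma 1.3.5 asserts.
for beta in (1.0, 2.0, 3.0):
    r = [phi_radial(beta, x) / U(beta, x) for x in (5.0, 20.0, 80.0)]
    print(beta, round(min(r), 3), round(max(r), 3))
```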
We are ready to prove Lemma 1.2.2.

Proof of Lemma 1.2.2 By equation (2) of Section 1.2, Fubini's theorem, and Lemma 1.3.4,
\[
\begin{aligned}
\mathrm{E}\bigl[\{J_{a,b}(f)\}^2\bigr]&=\int_{[a,b]}\int_{[a,b]}\mathrm{E}\bigl[f(B_s)f(B_t)\bigr]\,dt\,ds\\
&\le A_3\cdot\iint f(w)f(y)\int_{[a,b]}\int_{[a,b]}\|s-t\|^{-\frac d2}\exp\Bigl(-\frac{\|w-y\|^2}{4A_1\|s-t\|}\Bigr)\,dt\,ds\,dw\,dy.
\end{aligned}
\tag{8}
\]
Let us concentrate on the $ds\times dt$ integrals first. Since $f$ is supported by $[-m,m]^d$, we restrict attention to $y,w\in[-m,m]^d$. Note that whenever $s,t\in[a,b]$, then certainly $r=s-t\in[-b,b]\subset\{v\in\mathbf{R}^N:\ \|v\|\le N^{\frac12}|b|\}$. Consequently, for all $y,w\in[-m,m]^d$,
\[
\begin{aligned}
\int_{[a,b]}\int_{[a,b]}\|s-t\|^{-\frac d2}\exp\Bigl(-\frac{\|w-y\|^2}{4A_1\|s-t\|}\Bigr)\,dt\,ds
&\le\mathrm{Leb}\bigl([a,b]\bigr)\cdot\int_{r\in\mathbf{R}^N:\ \|r\|\le N^{1/2}|b|}\|r\|^{-\frac d2}\exp\Bigl(-\frac{\|w-y\|^2}{4A_1\|r\|}\Bigr)\,dr\\
&\le(4A_1)^{-(\frac d2)+N}\,\mathrm{Leb}\bigl([a,b]\bigr)\cdot\|w-y\|^{-d+2N}\,\Phi_{\frac d2}\Bigl(\frac{4\sqrt N|b|A_1}{\|w-y\|^2}\Bigr).
\end{aligned}
\]
In the last line we used a change of variables, $s=4A_1r/\|w-y\|^2$, and the definition of $\Phi_{d/2}$; see (7). By equation (8),
\[
\mathrm{E}\bigl[\{J_{a,b}(f)\}^2\bigr]\ \le\ A'\cdot\iint f(w)f(y)\,\|w-y\|^{-d+2N}\,\Phi_{\frac d2}\Bigl(\frac{4\sqrt N|b|A_1}{\|w-y\|^2}\Bigr)\,dy\,dw,
\]
with $A'=A_3(4A_1)^{-(d/2)+N}\mathrm{Leb}([a,b])$. By Lemma 1.3.5, when $d>2N$, we can find some finite positive constant $A_5$ such that $\sup_x\Phi_{d/2}(x)\le A_5$. Thus, when $d>2N$,
\[
\mathrm{E}\bigl[\{J_{a,b}(f)\}^2\bigr]\ \le\ A'A_5\cdot\iint f(w)f(y)\,\|w-y\|^{-d+2N}\,dw\,dy\ =\ A'A_5\cdot\mathrm{Energy}_{d-2N}(f). \tag{9}
\]
Similarly, when $d<2N$,
\[
\mathrm{E}\bigl[\{J_{a,b}(f)\}^2\bigr]\ \le\ A''\cdot\iint f(w)f(y)\,dw\,dy\ =\ A''\cdot\mathrm{Energy}_{d-2N}(f), \tag{10}
\]
where $A''=A'A_5(4\sqrt N|b|A_1)^{(d/2)-N}$. Finally, a similar analysis shows that when $d=2N$,
\[
\mathrm{E}\bigl[\{J_{a,b}(f)\}^2\bigr]\ \le\ A'A_5\cdot\iint f(w)f(y)\ln_+\Bigl(\frac{4\sqrt N|b|A_1}{\|w-y\|^2}\Bigr)\,dw\,dy.
\]
Note that for all $w,y\in[-m,m]^d$, $\|y-w\|\le d^{\frac12}M$. Thus,
\[
\ln_+\Bigl(\frac{4\sqrt N|b|A_1}{\|w-y\|^2}\Bigr)
=2\ln_+\Bigl(\frac1{\|w-y\|}\Bigr)\Bigl\{1+\frac{\ln(2\sqrt N|b|A_1)}{\ln_+(1/\|w-y\|)}\Bigr\}
\le2\ln_+\Bigl(\frac1{\|w-y\|}\Bigr)\Bigl\{1+\frac{\ln(2\sqrt N|b|A_1)}{\ln_+\bigl(1/(d^{1/2}M)\bigr)}\Bigr\}
=A'''\,\ln_+\Bigl(\frac1{\|w-y\|}\Bigr).
\]
Thus,
\[
\mathrm{E}\bigl[\{J_{a,b}(f)\}^2\bigr]\ \le\ A'A_5A'''\cdot\mathrm{Energy}_0(f).
\]
Combining this with (9) and (10), we can deduce Lemma 1.2.2, with the constant $A_2$ there defined to be the maximum of $A'A_5$, $A''$, and $A'A_5A'''$. □
1.4 Proof of Theorem 1.1.1: Upper Bound

Throughout, we let $\mathcal{F}=(\mathcal{F}_t;\,t\in\mathbf{R}^N_+)$ denote the complete augmented history of the process $B$. We will also hold fixed $0\prec a\prec b$, both in $\mathbf{R}^N_+$, and a compact set $E\subset\mathbf{R}^d$ whose outer radius is denoted by $M$; cf. equation (1) of Section 1.2. With the above in mind, for any measurable $f:\mathbf{R}^d\to\mathbf{R}_+$, we can define a multiparameter martingale $M(f)=(M_t(f);\,t\in\mathbf{R}^N_+)$ as
\[
M_t(f)=\mathrm{E}\Bigl[\int_{[a,2b-a]}f(B_s)\,ds\ \Big|\ \mathcal{F}_t\Bigr],\qquad t\in\mathbf{R}^N_+. \tag{1}
\]
We will assume, without loss of generality, that the underlying probability space is complete, and hence $M(f)$ can be taken to be separable; cf. Theorem 2.2.1 of Chapter 5. In light of equation (2) of Section 1.2, we note that
\[
M_t(f)=\mathrm{E}\bigl[J_{a,2b-a}(f)\ \big|\ \mathcal{F}_t\bigr]. \tag{2}
\]
Exercise 1.4.1 Demonstrate that for all $s,t\in\mathbf{R}^N_+$, $B_{t+s}-B_t$ is independent of $\mathcal{F}_t$, and compute the mean and covariance matrix of $B_{t+s}-B_t$.

Lemma 1.4.1 There exist finite positive constants $K_1$ and $K_2$ that depend only on $N$ and $d$ such that for all measurable functions $f:\mathbf{R}^d\to\mathbf{R}_+$ and all $t\in[a,b]$,
\[
M_t(f)\ \ge\ K_1\cdot\int f(y+B_t)\,\|y\|^{-d+2N}\,\Phi_{\frac d2}\Bigl(\frac{K_2}{\|y\|^2}\Bigr)\,dy,\qquad\text{a.s.},
\]
where $\Phi_{d/2}$ is defined in (7) of Section 1.3.
Proof Since $t\in[a,b]$,
\[
M_t(f)\ \ge\ \mathrm{E}\Bigl[\int_{[t,2b-a]}f(B_s)\,ds\ \Big|\ \mathcal{F}_t\Bigr]\ \ge\ \mathrm{E}\Bigl[\int_{[0,b]}f(B_{s+t})\,ds\ \Big|\ \mathcal{F}_t\Bigr]\ =\ \int_{[0,b]}\mathrm{E}\bigl[f(B_{s+t}-B_t+B_t)\ \big|\ \mathcal{F}_t\bigr]\,ds. \tag{3}
\]
We have used Fubini's theorem in the last line. By Exercise 1.4.1 above, $B_{t+s}-B_t$ is independent of $\mathcal{F}_t$ and $B_{t+s}-B_t\sim\mathcal{N}_d(0,\tau^2\mathbf{I}_d)$, where $\tau^2=\mathrm{Leb}([0,t+s])-\mathrm{Leb}([0,t])$ and $\mathbf{I}_d$ denotes the $(d\times d)$ identity matrix. By the form of Gaussian densities (Exercise 1.1.1, Chapter 5),
\[
\mathrm{E}\bigl[f(B_{t+s}-B_t+B_t)\ \big|\ \mathcal{F}_t\bigr]=(2\pi\tau^2)^{-\frac d2}\int f(y+B_t)\exp\Bigl(-\frac{\|y\|^2}{2\tau^2}\Bigr)\,dy.
\]
By Lemma 3.1.1 of Chapter 5, we can find two finite positive constants $c_1$ and $c_2$ such that for all $s,t\in[a,b]$, $c_1|s|\le\tau^2\le c_2|s|$. Since
N^{−1/2} ‖s‖ ≤ |s| ≤ ‖s‖, we have shown the existence of a finite constant c3 ≥ 1 such that for all s, t ∈ [a, b], c3^{−1} ‖s‖ ≤ τ² ≤ c3 ‖s‖. Consequently, we obtain

    E[ f (Bt+s − Bt + Bt ) | Ft ] ≥ (2πc3 )^{−d/2} ‖s‖^{−d/2} · ∫ f (y + Bt ) exp( −c3 ‖y‖²/(2‖s‖) ) dy.

Plugging this into equation (3), we deduce the following:

    Mt (f ) ≥ (2πc3 )^{−d/2} · ∫ f (y + Bt ) ∫[0,b] ‖s‖^{−d/2} exp( −c3 ‖y‖²/(2‖s‖) ) ds dy
           ≥ K1 · ∫ f (y + Bt ) ‖y‖^{−d+2N} Φ_{d/2}( c3 ‖y‖² / (2 min_{1≤ℓ≤N} b^{(ℓ)}) ) dy,

after the change of variables λ = 2‖s‖/(c3 ‖y‖²); here K1 depends only on N , d, and c3 . The lemma follows readily.
The next ingredient in our derivation of the upper bound in Theorem 1.1.1 is based on Lemma 1.4.1 and Cairoli's maximal inequality (Theorem 2.3.2, Chapter 7): For any measurable function f : Rd → R+ ,

    E[ sup_{t∈[a,b]} {Mt (f )}² ] ≤ 4^N sup_t E[ {Mt (f )}² ] ≤ 4^N E[ {Ja,2b−a (f )}² ].

The second inequality follows from Jensen's inequality for conditional expectations and from equation (2). (An important ingredient in Cairoli's maximal inequality was the commutation of the filtration F; cf. Theorem 2.3.2 of Chapter 7.) Applying Lemma 1.2.2, we have the following result.

Lemma 1.4.2 For all m > 0, there exists a finite constant C1 that depends only on N , m, and d such that for any measurable function f : Rd → R+ that is supported in [−m, m]^d ,

    E[ sup_{t∈[a,b]} {Mt (f )}² ] ≤ C1 · Energy_{d−2N} (f ).
For any ε ∈ ]0, 1[, let E^ε denote the ε-enlargement of E; that is,

    E^ε = { x ∈ Rd : dist(x; E) < ε }.

Note that E^ε is open. As such,

    P{ B([a, b]) ∩ E^ε ≠ ∅ } ≥ P{ B1 ∈ E^ε } > 0,   (4)

thanks to the form of the Gaussian density. In particular, by Supplementary Exercise 11 of Chapter 11, we can find a Q^N_+ ∪ {+∞}-valued random variable T^ε such that T^ε ≠ +∞ if and only if there exists t ∈ Q^N_+ such that Bt ∈ E^ε. Since t → Bt is continuous and E^ε is open,

    T^ε ≠ +∞ ⟺ B([a, b]) ∩ E^ε ≠ ∅.   (5)
Now we can define a set function µ^ε as follows: For all Borel sets F ⊂ Rd ,

    µ^ε (F ) = P{ B_{T^ε} ∈ F | T^ε ≠ +∞ }.

Lemma 1.4.3 For all ε > 0, µ^ε is an absolutely continuous probability measure on E^ε.

Proof By (4), for all Borel sets F ⊂ Rd , µ^ε (F ) is a properly defined conditional probability; consequently, µ^ε is a probability measure. By the continuity of t → Bt , and since E^ε is open, conditionally on (T^ε ≠ +∞) we have B_{T^ε} ∈ E^ε. Thus, µ^ε ∈ P(E^ε ), as desired. To conclude, suppose G ⊂ Rd has zero Lebesgue measure; we wish to show that µ^ε (G) = 0. To argue this, recall that, conditionally on (T^ε ≠ +∞), T^ε ∈ Q^N_+. In particular,

    µ^ε (G) ≤ Σ_{t∈Q^N_+} P{ Bt ∈ G | T^ε ≠ +∞ },

which is 0, thanks to the fact that Leb(G) = 0 and to equation (4). This concludes our proof.

We are ready to prove Theorem 1.1.1.

Proof of Theorem 1.1.1 Owing to Lemma 1.4.3, we can define the probability density

    f^ε (x) = µ^ε (dx)/dx,   x ∈ Rd .

In fact, by Lemma 1.4.3, f^ε is a probability density on E^ε. In particular, for any choice of ε ∈ ]0, 1[, f^ε is supported in [−m, m]^d , where m = M + 1. By Lemma 1.4.2, for all ε ∈ ]0, 1[, there exists a finite constant C2 > 0 such that

    E[ sup_{t∈[a,b]} {Mt (f^ε )}² ] ≤ C2 · Energy_{d−2N} (f^ε ).   (6)
On the other hand, we can apply Lemma 1.4.1 with f replaced by f^ε and deduce the existence of finite constants C3 and C4 such that for all ε ∈ ]0, 1[,

    sup_{t∈[a,b]} {Mt (f^ε )}² ≥ C3 [ ∫ f^ε (y + B_{T^ε}) ‖y‖^{−d+2N} Φ_{d/2}( ‖y‖²/C4 ) dy ]² 1l(T^ε ≠ +∞).   (7)

The expectation of the right-hand side of (7) equals

    P{ T^ε ≠ +∞ } · ∫ [ ∫ f^ε (y + w) ‖y‖^{−d+2N} Φ_{d/2}( ‖y‖²/C4 ) dy ]² f^ε (w) dw
      ≥ P{ T^ε ≠ +∞ } · [ ∫∫ f^ε (y + w) ‖y‖^{−d+2N} Φ_{d/2}( ‖y‖²/C4 ) dy f^ε (w) dw ]².
The last line follows from the Cauchy–Schwarz inequality. Lemma 1.3.5 and a few lines of calculation reveal the existence of a finite constant C5 > 0 such that the above is bounded below by

    C5 · P{ T^ε ≠ +∞ } · [ Energy_{d−2N} (f^ε ) ]².

This is a good lower bound for the expectation of the right-hand side of (7). By Lemma 1.4.2, the expectation of the left-hand side of (7) is bounded above by C1 · Energy_{d−2N} (f^ε ), for some constant C1 that is independent of the choice of ε ∈ ]0, 1[. If we knew that Energy_{d−2N} (f^ε ) were finite, this development would show us that

    P{ T^ε ≠ +∞ } ≤ C1 / ( C5 · Energy_{d−2N} (µ^ε ) ),   (8)

which is a key inequality. (Recall that the energy Energy_{d−2N} (f^ε ) of the function f^ε was defined to be the energy Energy_{d−2N} (µ^ε ) of the measure µ^ε ; cf. equation (5) and the subsequent display.) Unfortunately, we do not know a priori that f^ε has finite energy. To get around this difficulty, we will use a truncation argument to prove (8). For all q > 0 and all ε ∈ ]0, 1[, we define the function f^ε_q by

    f^ε_q (x) = f^ε (x) 1l_{[0,q]}( f^ε (x) ),   x ∈ Rd .
Since f^ε is supported on E^ε, so is f^ε_q . Moreover, the latter is a subprobability density function that is bounded above by q. Note that Energy_{d−2N} (f^ε_q ) < ∞ (why?). Exactly the same argument that led to (7) shows us that

    sup_{t∈[a,b]} {Mt (f^ε_q )}² ≥ C3 [ ∫ f^ε_q (y + B_{T^ε}) ‖y‖^{−d+2N} Φ_{d/2}( ‖y‖²/C4 ) dy ]² 1l(T^ε ≠ +∞).

Taking expectations, we can see that

    E[ sup_{t∈[a,b]} {Mt (f^ε_q )}² ]
      ≥ C3 ∫ [ ∫ f^ε_q (y + w) ‖y‖^{−d+2N} Φ_{d/2}( ‖y‖²/C4 ) dy ]² f^ε (w) dw × P{ T^ε ≠ +∞ }
      ≥ C3 [ ∫∫ f^ε_q (y + w) ‖y‖^{−d+2N} Φ_{d/2}( ‖y‖²/C4 ) dy f^ε (w) dw ]² × P{ T^ε ≠ +∞ },

by the Cauchy–Schwarz inequality. By Lemma 1.4.2, the left-hand side is bounded above by C1 · Energy_{d−2N} (f^ε_q ). We have already pointed out that
this is finite. Thus, by Lemma 1.3.5 and a few lines of calculation, there exists a finite constant C5 > 0 such that

    E[ sup_{t∈[a,b]} {Mt (f^ε_q )}² ] ≥ C5 [ ∫∫ f^ε_q (y + w) f^ε (w) κ(y) dw dy ]² · P{ T^ε ≠ +∞ }   (9)
                                     ≥ C5 [ Energy_{d−2N} (f^ε_q ) ]² · P{ T^ε ≠ +∞ },

where

    κ(y) = 1,               if d < 2N ,
           ln₊(1/‖y‖),      if d = 2N ,
           ‖y‖^{−d+2N},     if d > 2N .

(Since it depends only on M , d, and N , the above is the same constant C5 that we found in equation (8).) By Lemma 1.4.2, the left-hand side of (9) is bounded above by C1 · Energy_{d−2N} (f^ε_q ). On the other hand, we have already observed that this energy is finite. Thus,

    P{ T^ε ≠ +∞ } ≤ C1 / ( C5 · Energy_{d−2N} (f^ε_q ) ).
Now we let q ↑ +∞ and use the monotone convergence theorem to obtain (8). The remainder of our proof is simple. By equations (8) and (5),

    P{ B([a, b]) ∩ E^ε ≠ ∅ } ≤ C1 / ( C5 · Energy_{d−2N} (µ^ε ) ),

and recall that µ^ε ∈ P(E^ε ). In particular,

    P{ B([a, b]) ∩ E^ε ≠ ∅ } ≤ (C1 /C5 ) · Cap_{d−2N} (E^ε ).

The upper bound in Theorem 1.1.1 now follows from the continuity of t → Bt and from Lemma 2.1.2 of Appendix D.
2 The Codimension of the Level Sets Theorem 1.1.1 of Section 1 and its corollaries have shown us that the range of the Brownian sheet has quite an intricate structure. See also Figures 12.1, 12.2, and 12.3 of Section 1 for some simulations. In this section we study the level sets (or contours) of the Brownian sheet; we shall see that they, too, have interesting geometric and/or measure-theoretic properties.
2.1 The Main Calculation

As before, we let B = (Bt ; t ∈ RN+) denote an N -parameter, Rd -valued Brownian sheet. For any a ∈ Rd , the level set B^{−1}(a) of B at a is defined as the following N -dimensional "temporal" set:

    B^{−1}(a) = { t ∈ RN+ : Bt = a }.
In this section we shall concern ourselves with the analysis of the zero set of B, which is B^{−1}(0). This is the level set of B at 0 ∈ Rd , and the other level sets can be similarly studied; cf. Supplementary Exercise 1. Figure 12.4 shows a simulation of the portion of the zero set of B that lies in [0, 1]²; here, d = 1 and N = 2.

Figure 12.4: The zero set of the R-valued, 2-parameter Brownian sheet.

In order to analyze the zero set of B, we need to first ask when B^{−1}(0) is nonempty. To begin, note that Bt = 0 for all t on the axes of the parameter space RN+ (why?). Thus, we always have ∂RN+ ⊂ B^{−1}(0). We say that B^{−1}(0) is trivial if B^{−1}(0) = ∂RN+. That is, the zero set of B is trivial if and only if the only t's for which Bt = 0 are the ones on the axes. By Theorem 1.1.1, B^{−1}(0) is nontrivial if and only if Cap_{d−2N} ({0}) > 0. (Why?) This is equivalent to the following, more useful, result.

Corollary 2.1.1 The zero set of the N -parameter, Rd -valued Brownian sheet is nontrivial if and only if d < 2N .

Exercise 2.1.1 Prove Corollary 2.1.1.
Thus, in all of our subsequent discussions of the zero set of B, we are naturally led to assume that d < 2N . The main result of this section is the following computation of the codimension of the zero set.

Theorem 2.1.1 When d < 2N ,

    codim( B^{−1}(0) \ ∂RN+ ) = d/2.

Together with Theorem 4.7.1 of Chapter 11, Theorem 2.1.1 implies the following computation of the size of the zero set of the Brownian sheet.
Corollary 2.1.2 When d < 2N ,

    dim( B^{−1}(0) \ ∂RN+ ) = N − d/2,   a.s.

Exercise 2.1.2 Complete this proof of Corollary 2.1.2.

Figure 12.4 shows B^{−1}(0) for N = 2 and d = 1; this is a (random) set in R²+ whose Hausdorff dimension is (almost surely) equal to 3/2.
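Simulations such as the one in Figure 12.4 are straightforward to produce on a grid. The sketch below (Python, assuming only numpy; the grid size, threshold, and random seed are arbitrary illustrative choices, not part of the text) simulates a 2-parameter, R-valued Brownian sheet as a double cumulative sum of independent Gaussian increments and records the grid points where the sheet nearly vanishes:

```python
import numpy as np

def brownian_sheet(n, rng):
    """Simulate a 2-parameter Brownian sheet on an (n+1)-by-(n+1) grid over [0,1]^2.

    B(s, t) is the double cumulative sum of i.i.d. N(0, ds*dt) cell increments,
    which reproduces Cov(B(s,t), B(s',t')) = min(s,s') * min(t,t') on the grid.
    """
    dx = 1.0 / n
    increments = rng.normal(scale=dx, size=(n, n))  # sd = sqrt(dx * dx) = dx
    sheet = np.cumsum(np.cumsum(increments, axis=0), axis=1)
    # Pad with a row and column of zeros: B must vanish on the axes.
    return np.pad(sheet, ((1, 0), (1, 0)))

rng = np.random.default_rng(0)
B = brownian_sheet(200, rng)
# Approximate zero set: grid points where |B| falls below a small threshold.
zero_set = np.argwhere(np.abs(B) < 0.01)
```

Plotting the rows of `zero_set` as points in [0, 1]² gives the qualitative picture of Figure 12.4; note that the axes always belong to the approximate zero set, in agreement with ∂R²+ ⊂ B^{−1}(0).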
2.2 Proof of Theorem 2.1.1: The Lower Bound

Proving the "lower bound" in Theorem 2.1.1 consists in showing that the lower stochastic codimension of B^{−1}(0) is at least d/2. That is, we wish to verify that whenever K ⊂ RN+ is a compact set whose Hausdorff dimension is strictly less than d/2, then B^{−1}(0) ∩ K is empty, almost surely; cf. Lemma 4.7.1 of Chapter 11. We do this by appealing to a counting principle, as we did in our proof of Theorem 5.2.1 of Chapter 11. Namely, we derive an estimate for the probability that B^{−1}(0) intersects a small ball, as follows:

Lemma 2.2.1 Suppose d < 2N , M > 1, and η ∈ ]0, d/2[ are fixed. Then there exists a finite constant A > 0 such that for all a ∈ [M^{−1}, M ]^N and all ε ∈ ]0, 1[,

    P{ B^{−1}(0) ∩ [a, a + p(ε)] ≠ ∅ } ≤ A ε^η,

where p(ε) denotes the point in RN+ all of whose coordinates are equal to ε.

The above is a useful estimate for small values of ε and can be shown to be essentially sharp, as the following shows.

Exercise 2.2.1 Suppose M > 1 is fixed. Then there exists a finite constant A′ > 0 such that for all a ∈ [M^{−1}, M ]^N and all ε ∈ ]0, 1[,

    P{ B^{−1}(0) ∩ [a, a + p(ε)] ≠ ∅ } ≥ A′ ε^{d/2},

where p(ε) is defined in Lemma 2.2.1.
Proof We begin with an aside, which is a simple calculation involving Gaussian densities: For any a ∈ [M^{−1}, M ]^N and for all z > 0,

    P{ |Ba | ≤ z } = ( 2π Π_{ℓ=1}^N a^{(ℓ)} )^{−d/2} ∫_{[−z,z]^d} exp( −‖w‖² / (2 Π_{k=1}^N a^{(k)}) ) dw
                   ≤ (2/π)^{d/2} M^{Nd/2} z^d.   (1)

We have merely used the fact that e^{−u} ≤ 1 for u ≥ 0.
From here on, the proof of this lemma is similar in spirit to that of Lemma 5.3.2, Chapter 11. Note that B^{−1}(0) ∩ [a, a + p(ε)] is nonempty if and only if there exists a point τ ∈ [a, a + p(ε)] such that B_τ = 0. In particular, for this choice of τ,

    |Ba | = |Ba − B_τ | ≤ sup_{a ≼ t ≼ a+p(ε)} |Ba − Bt | = sup_{0 ≼ t ≼ p(ε)} |Ba+t − Ba |.

Therefore,

    P{ B^{−1}(0) ∩ [a, a + p(ε)] ≠ ∅ } ≤ P{ |Ba | ≤ sup_{0 ≼ t ≼ p(ε)} |Ba+t − Ba | }.
On the other hand, Ba is independent of the entire process (Ba+t − Ba ; t ∈ RN+) (why?). In particular, Ba is independent of sup_{0 ≼ t ≼ p(ε)} |Ba+t − Ba |, and equation (1) and Fubini's theorem together imply

    P{ B^{−1}(0) ∩ [a, a + p(ε)] ≠ ∅ } ≤ A1 · E[ sup_{0 ≼ t ≼ p(ε)} |Ba+t − Ba |^d ],

where A1 = (2M^N/π)^{d/2}. On the other hand, metric entropy considerations imply that for any fixed η ∈ ]0, d/2[, there exists A2 > 0 such that for all ε ∈ ]0, 1[,

    E[ sup_{0 ≼ t ≼ p(ε)} |Ba+t − Ba |^d ] ≤ A2 ε^η;   (2)

see Kolmogorov's continuity theorem (in particular, Exercise 2.5.1 of Chapter 5). This proves our lemma.

Exercise 2.2.2 Verify (2) in detail. (Hint: Consult Lemma 3.1.1, Chapter 5.)
A more ambitious exercise is to refine Lemma 2.2.1 as follows.

Exercise 2.2.3 Prove that for all fixed a ∈ ]0, ∞[^N , there exists C1 > 0 such that for all ε ∈ ]0, 1[,

    E[ sup_{0 ≼ t ≼ p(ε)} |Ba+t − Ba |^d ] ≤ C1 ε^{d/2}.

Conclude that there exists C2 > 0 such that for all ε ∈ ]0, 1[,

    P{ B^{−1}(0) ∩ [a, a + p(ε)] ≠ ∅ } ≤ C2 ε^{d/2},

thus showing that Exercise 2.2.1 is sharp, up to a multiplicative constant.
We conclude this subsection by proving half of Theorem 2.1.1.

Proof of the Lower Bound in Theorem 2.1.1 We are to show that for all compact sets K ⊂ RN+, if dim(K) < d/2, then B^{−1}(0) ∩ K is a.s. empty. In fact, it suffices to assume that K ⊂ [M^{−1}, M ]^N , where M > 1 is chosen arbitrarily large (why?). Throughout, we hold fixed some η ∈ ]dim(K), d/2[. By Lemma 1.1.3 of Appendix C, for any n > 0, we can find cubes C_{j,n} of side r_{j,n} such that ∪_{j=1}^∞ C_{j,n} ⊃ K and

    Σ_{j=1}^∞ (2 r_{j,n})^η ≤ 1/n.

Clearly, whenever B^{−1}(0) ∩ K is nonempty, there must exist some j ≥ 1 such that B^{−1}(0) ∩ C_{j,n} is nonempty. Consequently, by Lemma 2.2.1 above, there exists a finite constant A > 0 such that

    P{ B^{−1}(0) ∩ K ≠ ∅ } ≤ Σ_{j=1}^∞ P{ B^{−1}(0) ∩ C_{j,n} ≠ ∅ } ≤ A · Σ_{j=1}^∞ (2 r_{j,n})^η ≤ A/n.

Since n ≥ 1 is arbitrary, this shows that B^{−1}(0) ∩ K = ∅, a.s., and completes our proof.
2.3 Proof of Theorem 2.1.1: The Upper Bound

To complete our derivation of Theorem 2.1.1, it suffices to show that for all M > 1 and all compact sets K ⊂ ]0, M ]^N with dim(K) > d/2, we have P{ B^{−1}(0) ∩ K ≠ ∅ } > 0 (why?). Frostman's theorem implies the existence of a probability measure µ on K such that Energy_{d/2} (µ) < ∞; cf. Theorem 2.2.1, Appendix C. Fix such a µ and consider

    J_ε = ε^{−d} · ∫ 1l(|Bs | ≤ ε) µ(ds).

By Fatou's lemma and by the form of the Gaussian density function,

    lim inf_{ε→0+} E[J_ε ] ≥ (2π)^{−d/2} ∫ ( Π_{j=1}^N s^{(j)} )^{−d/2} µ(ds) ≥ (2π)^{−d/2} M^{−Nd/2}.   (1)
On the other hand, by Lemma 1.3.4, we can find a finite constant A such that for all s, t in the support of µ and all ε > 0,

    P{ |Bs | ≤ ε, |Bt | ≤ ε } ≤ A ‖s − t‖^{−d/2} ∫_{|y|≤ε} ∫_{|w|≤ε} dw dy ≤ 4^d A · ‖s − t‖^{−d/2} ε^{2d}.
Thus, for all ε > 0,

    E[J_ε²] ≤ 4^d A · Energy_{d/2} (µ).   (2)
By the Paley–Zygmund lemma (Lemma 1.4.1, Chapter 3), used in conjunction with equations (1) and (2),

    lim inf_{ε→0+} P{ K ∩ {s ∈ RN+ : |Bs | ≤ ε} ≠ ∅ } ≥ lim inf_{ε→0+} {E[J_ε ]}² / E[J_ε²] ≥ A′ · [ Energy_{d/2} (µ) ]^{−1},

where A′ = (8π)^{−d} A^{−1} M^{−Nd}. We should recognize that the above is strictly positive. By the compactness of K and by the continuity of t → Bt , the left-hand side equals the probability that B^{−1}(0) intersects K. This completes our proof.

In fact, the above proof implies the following.

Corollary 2.3.1 Suppose E ⊂ RN+ is compact and Cap_{d/2} (E) > 0. Then, with positive probability, B^{−1}(0) ∩ E ≠ ∅.

One can prove the following refinement, using similar arguments.

Corollary 2.3.2 Suppose E ⊂ RN+ is compact and Cap_{d/2} (E) > 0. Then, for any x ∈ Rd , with positive probability, B^{−1}(x) ∩ E ≠ ∅.

Exercise 2.3.1 Verify Corollaries 2.3.1 and 2.3.2.
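The probabilistic engine of the preceding proof is the Paley–Zygmund lemma (Lemma 1.4.1, Chapter 3): for any nonnegative random variable J with E[J²] < ∞, one has P{J > 0} ≥ (E[J])²/E[J²]. The following minimal sketch (Python with numpy; the discrete distribution is an arbitrary illustrative choice) verifies the inequality numerically:

```python
import numpy as np

# Paley-Zygmund: for J >= 0 with finite second moment,
#   P{J > 0} >= (E[J])^2 / E[J^2].
# (It follows from Cauchy-Schwarz: E[J] = E[J 1(J>0)] <= sqrt(E[J^2] P{J>0}).)
# Check it on an arbitrary nonnegative discrete distribution.
values = np.array([0.0, 0.5, 2.0, 7.0])
probs = np.array([0.6, 0.2, 0.15, 0.05])

mean = np.dot(probs, values)            # E[J]
second = np.dot(probs, values ** 2)     # E[J^2]
p_positive = probs[values > 0].sum()    # P{J > 0}

assert p_positive >= mean ** 2 / second
```

Here P{J > 0} = 0.4, while the Paley–Zygmund lower bound is roughly 0.18; the inequality is far from tight in general, but in the proof above only its positivity matters.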
3 Local Times as Frostman's Measures

We conclude this chapter with a glimpse of further refinements in the analysis of the level sets of random fields. For concreteness, we concentrate on the zero set B^{−1}(0) of an N -parameter, Rd -valued Brownian sheet B, which has already been the subject of our studies in Sections 1 and 2 above. Thus far, we have seen that B^{−1}(0) is nontrivial if and only if d < 2N (Corollary 2.1.1). Moreover, in the nontrivial case, the Hausdorff dimension of B^{−1}(0) is a.s. equal to N − d/2; see Corollary 2.1.2. Before we use this observation, note that for any compact set K ⊂ RN+, B^{−1}(0) ∩ K is compact. Accordingly, the aforementioned dimension result, together with Frostman's lemma (Theorem 2.1.1, Appendix C), guarantees the existence of a (random) measure L on B^{−1}(0) such that a.s., for all 0 < s < N − d/2,

    lim sup_{r→0+} r^{−s} sup_{x∈K} L( B(x; r) ) < ∞.   (1)
Furthermore, with probability one, for all s > N − d/2,

    lim sup_{r→0+} r^{−s} sup_{x∈K} L( B(x; r) ) = ∞.   (2)

(Why?) In this section we show how one can explicitly construct this random measure L — called the local time of B at 0. Of course, the choice of L is not unique: replace the above L by 2L, for instance. It, too, is a measure on B^{−1}(0), and it, too, satisfies the smoothness properties given by equations (1) and (2). Thus, we will be concerned with finding one "natural" construction. It turns out that our "natural" construction has attractive "uniqueness" properties, which will be explored further in Supplementary Exercise 3 below.
3.1 Construction

What does Frostman's measure of B^{−1}(0) look like? Inspecting our proof of Frostman's theorem, we see that, to a great degree, Frostman's measure of a possibly irregular, fractal-like set is the flattest, most uniform measure that one can construct on the given set. Now let us consider the zero set of a discrete-time random field Z = (Zt ; t ∈ N^N_0). Intuitively speaking, if Z looks the same everywhere, then the flattest measure that one can construct on Z^{−1}(0) is the following: For all A ⊂ N^N_0,

    µ(A) = Σ_{t∈A} 1l(Zt = 0).

In the continuous-time setting of the Brownian sheet, we should be trying to define a flat measure L on B^{−1}(0) as

    L(A) = ∫_A δ_0 (Bt ) dt,

for all Borel sets A ⊂ RN+, where δ_0 denotes the delta function, or point mass, at {0}. Of course, δ_0 is not a function but a measure, and the above definition is an improper one. However, it does suggest that if we take a sequence of proper functions ϕ_ε that converge in some reasonable sense to δ_0 as ε → 0, then the measures ∫_• ϕ_ε (Bs ) ds should converge to L(•). Moreover, one can try to define L this way. As it turns out, the choice of ϕ_ε is more or less immaterial, as long as ϕ_ε looks like δ_0 for small ε. An effective method for constructing such ϕ's is as follows: Consider a probability density function ϕ_1 : Rd → R+ and let

    ϕ_ε (x) = ε^{−d} ϕ_1 (x/ε),   x ∈ Rd .

Such a family is called an approximation to the identity (or a collection of mollifiers), and it is easy to check that ϕ_ε is a probability density
function for every ε > 0. To see what happens as ε → 0+, let us temporarily concentrate on the specific ϕ_1 given by ϕ_1 (x) = 2^{−d} 1l_{[−1,1]^d}(x), x ∈ Rd . Then, for all x ∈ Rd , ϕ_ε (x) = (2ε)^{−d} 1l_{[−ε,ε]^d}(x). In particular, for small values of ε, ϕ_ε (x) = 0 for essentially all x except x ≈ 0; in the latter case, ϕ_ε (0) ≈ ∞. That is, for small values of ε, the function ϕ_ε looks like the delta function δ_0, as planned. Henceforth, we shall use a different approximation to the identity that is easier to work with. Namely, for all x ∈ Rd and for all ε > 0, define

    ϕ_ε (x) = (2πε)^{−d/2} exp( −‖x‖²/(2ε) ).   (1)

It should be recognized that the above is also an approximation to the identity. Based on the preceding, we now define approximate local times L_ε as follows: For all Borel sets A ⊂ RN+,

    L_ε (A) = ∫_A ϕ_ε (Bs ) ds.   (2)

That is, L_ε (•) = ∫_• ϕ_ε (Bs ) ds, where ϕ_1 (x) = (2π)^{−d/2} exp{−½‖x‖²}. It is very important to recognize L_ε (•) as a variant of the J_ε of Section 2.3, and ϕ_1 as the heat kernel. As such, we have already used such objects to calculate the Hausdorff dimension of B^{−1}(0). In brief, we are now taking a second, deeper look at the J_ε of Section 2.3, together with its ilk. Based on the preliminary discussions above, we should expect the following result, which is the main result of this section.

Theorem 3.1.1 Suppose d < 2N . Then there exists a random measure L on B^{−1}(0) such that: (i) equations (1) and (2) of the preamble to this section hold for this L; and (ii) for any Borel set A ⊂ RN+,

    lim_{ε→0+} L_ε (A) = L(A),

where the convergence takes place in L^p(P) for every p > 0.

The measure L is sometimes called the local time of the Brownian sheet (at 0). Once more, we emphasize that the local time (at 0) can be viewed as an explicit construction of Frostman's measure of the (random) level set of B (at 0). We will derive Theorem 3.1.1 in detail in Sections 3.3 and 3.4 below. However, in order to understand some of the essential features of local times, it is best (and by far the easiest) to start with the one-parameter case of Brownian motion. Recall that in this case N = 1, and the condition d < 2N forces us to assume d = 1.
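Before turning to the proofs, it may help to see the approximation L_ε at work in this one-parameter warmup case N = d = 1. The sketch below (Python, assuming numpy; the path resolution, time horizon, random seed, and the sequence of ε's are all illustrative choices) discretizes a linear Brownian motion and evaluates L_ε([0, 1]) with the Gaussian mollifier of equation (1):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate one path of linear Brownian motion on [0, 1].
n = 200_000
dt = 1.0 / n
path = np.concatenate([[0.0], np.cumsum(rng.normal(scale=np.sqrt(dt), size=n))])

def approximate_local_time(eps):
    """L_eps([0,1]) = integral of phi_eps(B_s) ds over [0,1], where
    phi_eps(x) = (2*pi*eps)^(-1/2) * exp(-x^2 / (2*eps)),
    evaluated by a Riemann sum along the simulated path."""
    phi = np.exp(-path ** 2 / (2.0 * eps)) / np.sqrt(2.0 * np.pi * eps)
    return phi.sum() * dt

# The approximations should stabilize as eps decreases toward 0.
estimates = [approximate_local_time(eps) for eps in (1e-1, 1e-2, 1e-3, 1e-4)]
```

As ε decreases, the estimates settle around a limiting value: the approximate local time of the simulated path at 0. The smallest usable ε is limited by the grid resolution, since the Riemann sum must resolve the mollifier's width √ε.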
3.2 Warmup: Linear Brownian Motion

Throughout this subsection, let B = (Bt ; t ≥ 0) denote linear (i.e., one-dimensional) Brownian motion. We begin our investigation of the local times of B with the following analytical estimate:

Proposition 3.2.1 For all f ∈ L¹(R), all t, h ≥ 0, and all integers k ≥ 1,

    E[ ( ∫_t^{t+h} f (Bs ) ds )^k ] ≤ (2/π)^{k/2} k! ‖f‖_1^k h^{k/2}.

The constant factors can be slightly improved; cf. Supplementary Exercise 2. We obtain the following as a consequence.

Corollary 3.2.1 Given f ∈ L¹(R) and t ≥ 0, E[ e^{λ ∫_0^t |f (Bs )| ds} ] < ∞ for all 0 < λ < (π/(2t))^{1/2} ‖f‖_1^{−1}.
Exercise 3.2.1 Prove Corollary 3.2.1.
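Proposition 3.2.1 lends itself to a quick Monte Carlo sanity check. In the sketch below (Python with numpy; the choice f = 1l_{[−1,1]}, the horizon h, and the discretization are illustrative, and the simulation merely probes the inequality rather than proving it), we estimate the first few moments of ∫_0^h f (Bs ) ds and compare them with the stated bound (2/π)^{k/2} k! ‖f‖_1^k h^{k/2}:

```python
import math

import numpy as np

rng = np.random.default_rng(2)

# f = indicator of [-1, 1], so ||f||_1 = 2; integrate over [0, h] with t = 0.
h, n_steps, n_paths = 0.25, 500, 2000
dt = h / n_steps
increments = rng.normal(scale=np.sqrt(dt), size=(n_paths, n_steps))
paths = np.cumsum(increments, axis=1)

# Riemann-sum approximation of the integral of 1{|B_s| <= 1} over [0, h],
# one value per simulated path.
integrals = (np.abs(paths) <= 1.0).sum(axis=1) * dt

f_l1 = 2.0
for k in (1, 2, 3):
    moment = np.mean(integrals ** k)
    bound = (2.0 / np.pi) ** (k / 2.0) * math.factorial(k) * f_l1 ** k * h ** (k / 2.0)
    assert moment <= bound  # the Monte Carlo moments respect the bound
```

For this particular f, the integral is trivially at most h, so the check passes with a wide margin; the interesting content of Proposition 3.2.1 is the k! growth rate in k, which drives Corollary 3.2.1.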
Proof of Proposition 3.2.1 It suffices to prove the result for nonnegative functions f . Otherwise, replace f by |f | everywhere. Let T denote the heat semigroup that forms the transition operators of B; cf. Example 1, Section 4.3 of Chapter 8. We will shortly reduce the study of the kth moment of ∫_t^{t+h} f (Bs ) ds to the study of integrals of T_s f . First, let us note that

    E[ ( ∫_t^{t+h} f (Bs ) ds )^k ] = E[ ∫_t^{t+h} ··· ∫_t^{t+h} f (B_{s_1}) ··· f (B_{s_k}) ds_1 ··· ds_k ]
                                  = k! ∫···∫_{t≤s_1≤···≤s_k≤t+h} E[ Π_{ℓ=1}^k f (B_{s_ℓ}) ] ds_1 ··· ds_k.   (1)

Fix any t = s_0 ≤ s_1 ≤ ··· ≤ s_k ≤ t + h and consider E[f (B_{s_k}) | F_{s_{k−1}}], where F = (F_s ; s ≥ 0) denotes the natural filtration of B. Clearly,

    E[ f (B_{s_k}) | F_{s_{k−1}} ] = E[ f (B_{s_k} − B_{s_{k−1}} + B_{s_{k−1}}) | F_{s_{k−1}} ].

Since the increments of B are stationary and independent, B_{s_k} − B_{s_{k−1}} is independent of F_{s_{k−1}}; it also has the same distribution as B_{s_k−s_{k−1}}. Thus, recalling that f ≥ 0,

    E[ f (B_{s_k}) | F_{s_{k−1}} ] ≤ sup_{a∈R} E[ f (B_{s_k−s_{k−1}} + a) ] = ‖T_{s_k−s_{k−1}} f‖_∞.
Iterating this and remembering that s_0 = t, we see that equation (1) yields

    E[ ∫_t^{t+h} ··· ∫_t^{t+h} f (B_{s_1}) ··· f (B_{s_k}) ds_1 ··· ds_k ]
      ≤ k! ∫···∫_{t≤s_1≤···≤s_k≤t+h} Π_{ℓ=1}^k ‖T_{s_ℓ−s_{ℓ−1}} f‖_∞ ds_1 ··· ds_k.   (2)

On the other hand, for any r ≥ 0 and for all x ∈ R,

    T_r f (x) = (2πr)^{−1/2} ∫_{−∞}^{∞} f (y) e^{−|x−y|²/(2r)} dy ≤ (2πr)^{−1/2} ‖f‖_1.

Thus, (2) implies that the kth moment of ∫_t^{t+h} f (Bs ) ds is bounded above by

    k! (2π)^{−k/2} ‖f‖_1^k ∫···∫_{t≤s_1≤···≤s_k≤t+h} Π_{ℓ=1}^k (s_ℓ − s_{ℓ−1})^{−1/2} ds_1 ··· ds_k.

Changing variables, r_ℓ = s_ℓ − s_{ℓ−1}, it is easy to see that

    ∫···∫_{t≤s_1≤···≤s_k≤t+h} Π_{ℓ=1}^k (s_ℓ − s_{ℓ−1})^{−1/2} ds_1 ··· ds_k
      ≤ ( ∫_0^h r^{−1/2} dr )^{k−1} · ∫_t^{t+h} (s_1 − t)^{−1/2} ds_1 ≤ 2^k h^{k/2}.
The proposition follows from this.
Recalling equation (2) of Section 3.1, we begin by showing that, for each Borel set A ⊂ R+, {L_ε (A); ε > 0} is Cauchy in L²(P).

Lemma 3.2.1 For each T > 0,

    lim_{|δ−ε|→0} sup_{A⊂[0,T ], Borel} E[ |L_ε (A) − L_δ (A)|² ] = 0.

In particular, L(A) = lim_{ε→0+} L_ε (A) exists, where the limit holds in L²(P).

In fact, it is possible to prove the following extension; see Exercise 3.2.2 below.

Corollary 3.2.2 For all T > 0, there exists a finite C > 0 such that for all Borel sets A ⊂ [0, T ] and all ε, δ > 0,

    E[ |L_ε (A) − L_δ (A)|² ] ≤ C |δ − ε|.
Proof Once we show that the expectation goes to 0, the existence of the limit follows from the completeness of L²(P). By (2) and by the inversion theorem for characteristic functions,

    L_ε (A) − L_δ (A) = ∫_A [ϕ_ε (Bs ) − ϕ_δ (Bs )] ds
                      = (1/2π) ∫_A ∫_{−∞}^{∞} e^{−iξBs} [ e^{−εξ²/2} − e^{−δξ²/2} ] dξ ds.

Since the left-hand side is real-valued, its square is the same as its square modulus in the sense of complex numbers. That is,

    E[ {L_ε (A) − L_δ (A)}² ]
      = (1/4π²) ∫_A ∫_A ∫_{−∞}^{∞} ∫_{−∞}^{∞} [e^{−εξ²/2} − e^{−δξ²/2}] [e^{−εζ²/2} − e^{−δζ²/2}] E[ e^{−iξBs + iζBr} ] dξ dζ ds dr.

Suppose δ ≥ ε > 0 (say). Then,

    e^{−εξ²/2} − e^{−δξ²/2} = e^{−εξ²/2} ( 1 − e^{−(δ−ε)ξ²/2} ) ≤ ((δ−ε)/2) ξ² ∧ 1.

We have used the inequality 1 − x ≤ e^{−x} for x ≥ 0, together with the trivial bound 1 − e^{−x} ≤ 1. Furthermore, since B is a Gaussian process, E[ e^{−iξBs + iζBr} ] = e^{−½ E[{ξBs − ζBr}²]}, which is nonnegative. Thus, for all Borel sets A ⊂ [0, T ],

    E[ {L_ε (A) − L_δ (A)}² ]
      ≤ (1/4π²) ∫_{[0,T]} ∫_{[0,T]} ∫∫ ( ((δ−ε)/2) ξ² ∧ 1 )( ((δ−ε)/2) ζ² ∧ 1 ) e^{−½E[{ξBs−ζBr}²]} dξ dζ ds dr
      = (1/2π²) ∫∫_{0≤s≤r≤T} ∫∫ ( ((δ−ε)/2) ξ² ∧ 1 )( ((δ−ε)/2) ζ² ∧ 1 ) e^{−½E[{ξBs−ζBr}²]} dξ dζ ds dr.

If 0 ≤ s ≤ r ≤ T , then E[{ξBs − ζBr}²] = (r − s)ζ² + s|ξ − ζ|². (Why?) Accordingly,

    E[ {L_ε (A) − L_δ (A)}² ]
      ≤ (1/2π²) ∫∫_{0≤s≤r≤T} ∫∫ ( ((δ−ε)/2)² ξ²ζ² ∧ 1 ) e^{−½(r−s)ζ² − ½s|ξ−ζ|²} dξ dζ ds dr.

Clearly, ∫_s^T e^{−½(r−s)ζ²} dr ≤ 2(T ∧ ζ^{−2}). Thus (why?),

    ∫∫_{0≤s≤r≤T} e^{−½(r−s)ζ² − ½s|ξ−ζ|²} dr ds ≤ 4 (T ∧ ζ^{−2}) (T ∧ |ξ−ζ|^{−2}).

In particular,

    E[ {L_ε (A) − L_δ (A)}² ] ≤ (2/π²) ∫∫ ( ((δ−ε)/2)² ξ²ζ² ∧ 1 ) (T ∧ ζ^{−2}) (T ∧ |ξ−ζ|^{−2}) dξ dζ
                              ≤ (2/π²) ∫∫ ( ((δ−ε)/2)² ξ²|ξ+ζ|² ∧ 1 ) (T ∧ ξ^{−2}) (T ∧ ζ^{−2}) dξ dζ,

after a change of variables. Since ∫∫_{R²} (T ∧ ξ^{−2})(T ∧ ζ^{−2}) dξ dζ is finite, the result follows from Lebesgue's dominated convergence theorem.

Exercise 3.2.2 Prove Corollary 3.2.2.
We are ready to prove Theorem 3.1.1 in the case N = d = 1.

Proof of Theorem 3.1.1 for Linear Brownian Motion By Proposition 3.2.1, for all k ≥ 1 and all intervals A ⊂ R+,

    E[ {L_ε (A)}^k ] ≤ (2/π)^{k/2} k! {Leb(A)}^{k/2},   (3)

where Leb denotes Lebesgue's measure. In fact, a monotone class argument reveals that (3) holds for all Borel sets A ⊂ R+. In particular, for each Borel set A ⊂ R+, {L_ε (A); ε > 0} is bounded in L^k(P) for all k ≥ 1. By Lemma 3.2.1 and by uniform integrability, we see that for each Borel set A ⊂ R+, L(A) = lim_{ε→0+} L_ε (A) in L^k(P) for all k ≥ 1. Consequently, we can take ε → 0 in equation (3) and deduce that for all k ≥ 1 and for all Borel sets A ⊂ R+,

    E[ {L(A)}^k ] ≤ (2/π)^{k/2} k! {Leb(A)}^{k/2}.   (4)

A priori we do not know that this L is a measure on B^{−1}(0), since for each A our construction of L(A) comes with a number of null sets, and as A varies, so do these uncountably many null sets. To circumvent this problem, we will construct a suitable modification L̃ of L and prove Theorem 3.1.1 for this modification instead. Define ℓ_t = L([0, t]) and think of ℓ as the distribution function of L. Equation (4) implies that for all t, h ≥ 0 and for all integers k ≥ 1,

    E[ |ℓ_{t+h} − ℓ_t |^k ] ≤ (2/π)^{k/2} k! h^{k/2}.
By Kolmogorov's continuity theorem, ℓ = (ℓ_t ; t ≥ 0) has a Hölder modification of any order 0 < q < 1/2; cf. Theorem 2.5.1, Chapter 5. Let ℓ̃ = (ℓ̃_t ; t ≥ 0) denote this modification. By Exercise 3.2.4 below, t → ℓ̃_t is continuous and increasing on R+ and satisfies ℓ̃_0 = 0. Thus, ℓ̃ is a random distribution function. Let L̃ denote the corresponding measure. We shall now argue that this L̃ is the L of Theorem 3.1.1.

Since ℓ and ℓ̃ are modifications of one another, so are L and L̃. In particular, for each Borel set A ⊂ R+, L̃(A) = lim_{ε→0+} L_ε (A), where the convergence takes place in L^k(P) for any k ≥ 1. Thus (why?), for all η > 0,

    L̃(A) 1l(∃s∈A: |Bs |>η) = lim_{ε→0+} L_ε (A) 1l(∃s∈A: |Bs |>η),

where the convergence takes place in L^k(P) for any k ≥ 1. However, by Exercise 3.2.5 below, the above is 0. Equivalently, for all η > 0, L̃(A) 1l(∃s∈A: |Bs |>η) = 0, almost surely. In particular, the following holds with probability one:

    ∀ rational η > 0, ∀ intervals A ⊂ R+ with rational endpoints:  L̃(A) 1l(∃s∈A: |Bs |>η) = 0.

Since L̃ is a measure and since B^{−1}(0) = ∩_{η>0} { s : |Bs | ≤ η }, thanks to the continuity of t → Bt , the above implies that L̃ is in fact a measure on B^{−1}(0). Since t → ℓ̃_t is Hölder continuous of any order strictly less than 1/2, for all s < 1/2 and all T > 0,

    lim sup_{h→0+} sup_{0≤t≤T} |ℓ̃_{t+h} − ℓ̃_t | / h^s < +∞.

Equivalently, for all s < 1/2 and all T > 0,

    lim sup_{h→0+} sup_{0≤t≤T} L̃( B(t; h) ) / h^s < ∞.

Thus, equation (1) of the preamble of this section holds for L̃. To finish this proof, it suffices to show that equation (2) of the preamble of this section holds for L̃. This follows from Frostman's theorem; cf. the following Exercise 3.2.3.

Exercise 3.2.3 Demonstrate equation (2) of the preamble to this section for the local time constructed above. (Hint: You will need to show that L̃ is not a trivial measure; i.e., a.s., L̃(R+) = +∞.)
Exercise 3.2.4 Prove that, in our proof of Theorem 3.1.1, t → ℓ̃_t is indeed continuous and increasing on R+.

Exercise 3.2.5 Complete the present derivation of Theorem 3.1.1 by showing that, with probability one, for all η > 0 there exists ε_0 > 0 such that for all ε ∈ ]0, ε_0 [, L_ε (A) 1l(∃s∈A: |Bs |>η) = 0.
3.3 A Variance Estimate

In order to carry out our analysis of the Brownian sheet's local times in the more interesting case N > 1, we shall need a technical variance bound, which is the focus of this subsection. Roughly speaking, what we need is a careful estimate for the variance of Σ_{j=1}^m ξ_j · B_{t_j}, where B denotes the N -parameter, d-dimensional Brownian sheet, t_1, …, t_m ∈ RN+, and ξ_1, …, ξ_m ∈ Rd . The reason for needing this estimate has already arisen within our proof of Lemma 3.2.1, where we needed the quantity E[e^{−iξBs +iζBr}] (in the notation of Section 3.2). A similar but more intricate problem arises in the multiparameter setting. Once again, we start with one-dimensional Brownian motion; that is, with d = N = 1.

Lemma 3.3.1 Suppose Z = (Z_t ; t ≥ 0) denotes standard Brownian motion. For all 0 ≤ r_1 ≤ ··· ≤ r_m and for all ζ_1, …, ζ_m ∈ R,

    E[ ( Σ_{j=1}^m ζ_j Z_{r_j} )² ] = Σ_{k=1}^m ( Σ_{j=k}^m ζ_j )² (r_k − r_{k−1}),

where r_0 = 0.

Proof To better understand this calculation, note that

    E[ ( Σ_{j=1}^m ζ_j Z_{r_j} )² ] = Σ_{j=1}^m Σ_{i=1}^m ζ_j ζ_i (r_i ∧ r_j),

since E[Z_u Z_v ] = u ∧ v. Our lemma gives us what turns out to be a useful reordering of this sum. Let us now proceed with the proof in earnest. Note that

    Σ_{j=1}^m ζ_j Z_{r_j} = Σ_{j=1}^m ζ_j Σ_{k=1}^j ( Z_{r_k} − Z_{r_{k−1}} ) = Σ_{k=1}^m ( Σ_{j=k}^m ζ_j ) ( Z_{r_k} − Z_{r_{k−1}} ),

all the time remembering that r_0 = Z_{r_0} = 0. On the other hand, for all integers 1 ≤ k, k′ ≤ m,

    E[ ( Z_{r_k} − Z_{r_{k−1}} )( Z_{r_{k′}} − Z_{r_{k′−1}} ) ] = r_k − r_{k−1} if k = k′, and = 0 otherwise.
That is, the differences Z_{r_k} − Z_{r_{k−1}} are uncorrelated and, in fact, independent. The result follows readily from this.

Next, we describe a comparison result that relates the variance just mentioned, for the Brownian sheet, to a similar object for additive Brownian motion. Recall that X = (X_t ; t ∈ RN+) is N -parameter, d-dimensional additive Brownian motion if X_t = Σ_{j=1}^N X^j_{t^{(j)}} for all t ∈ RN+, where X^1, …, X^N are independent, d-dimensional Brownian motions.

Lemma 3.3.2 Suppose a ∈ RN+ satisfies the strict inequality 0 ≺ a. Then, for all t_1, …, t_m ∈ RN+ and for all ξ_1, …, ξ_m ∈ Rd ,

    E[ ( Σ_{j=1}^m ξ_j · B_{t_j+a} )² ] ≥ η^{N−1} E[ ( Σ_{j=1}^m ξ_j · X_{t_j} )² ],

where η = min_{1≤j≤N} a^{(j)} and X = (X_t ; t ∈ RN+) denotes N -parameter, d-dimensional additive Brownian motion.

Proof We begin by defining N one-parameter, d-dimensional Brownian motions β^1, …, β^N:

    β^i_r = B_{q^i_r} − B_a ,   r ≥ 0,

where q^i_r is the vector in RN+ all of whose coordinates agree with the corresponding coordinates of a, except the ith, where (q^i_r)^{(i)} equals a^{(i)} + r. For instance, when N = 2, β^1_r = B_{(a^{(1)}+r, a^{(2)})} − B_a and β^2_r = B_{(a^{(1)}, a^{(2)}+r)} − B_a. For all t ∈ RN+ such that t ≽ a, we can write

    B_t = C_t + Σ_{j=1}^N β^j_{t^{(j)}−a^{(j)}},   t ≽ a.

Applying Čentsov's representation (Theorem 1.5.1, Chapter 5), one can check that the processes C = (C_t ; t ≽ a), (β^1_{t^{(1)}−a^{(1)}} ; t^{(1)} ≥ a^{(1)}), …, and (β^N_{t^{(N)}−a^{(N)}} ; t^{(N)} ≥ a^{(N)}) are all independent, centered, and Gaussian; cf. Exercise 3.3.1 below. Moreover, for all j = 1, …, N, Z^j = (Z^j_r ; r ≥ 0) is a d-dimensional Brownian motion, where

    Z^j_r = ( Π_{1≤ℓ≤N, ℓ≠j} a^{(ℓ)} )^{−1/2} β^j_r ,   r ≥ 0.

Finally, the coordinates of C_t and of Z^j_r are nonnegatively correlated for all 1 ≤ j ≤ N, t ≽ a, and r ≥ 0. Hence,

    E[ ( Σ_{j=1}^m ξ_j · B_{t_j} )² ] ≥ E[ ( Σ_{j=1}^m Σ_{ℓ=1}^N ξ_j · β^ℓ_{t_j^{(ℓ)}−a^{(ℓ)}} )² ]
      = E[ ( Σ_{j=1}^m Σ_{i=1}^N ( Π_{1≤ℓ≤N, ℓ≠i} a^{(ℓ)} )^{1/2} ξ_j · Z^i_{t_j^{(i)}−a^{(i)}} )² ]
      ≥ η^{N−1} E[ ( Σ_{j=1}^m Σ_{i=1}^N ξ_j · Z^i_{t_j^{(i)}−a^{(i)}} )² ].
(Why?) The result follows, since additive Brownian motion.
Σ_{i=1}^N Z^i_• is a d-dimensional, N -parameter additive Brownian motion.
Now we combine Lemmas 3.3.1 and 3.3.2 to obtain our desired variance estimate, by way of a lower bound. To describe it, we need some notation. Suppose t_1, …, t_m, a ∈ RN+ are held fixed and, for some η > 0, t_j^{(k)} ≥ a^{(k)} ≥ η for all k = 1, …, N and j = 1, …, m. For each ℓ = 1, …, N, let π_ℓ(1), …, π_ℓ(m) denote the indices that order t_1^{(ℓ)}, …, t_m^{(ℓ)}. More precisely, π_ℓ is the m-vector defined by

    t^{(ℓ)}_{π_ℓ(1)} ≤ ··· ≤ t^{(ℓ)}_{π_ℓ(m)}.

We will also define π_ℓ(0) ≡ 0. Then, by Lemma 3.3.2 above, for all ξ_1, …, ξ_m ∈ Rd ,

    E[ ( Σ_{j=1}^m ξ_j · B_{t_j} )² ] ≥ η^{N−1} E[ ( Σ_{j=1}^m ξ_j · X_{t_j−a} )² ],
where X is a d-dimensional, N -parameter additive Brownian motion. Note that we can write X_t = X^1_{t^{(1)}} + ··· + X^N_{t^{(N)}}, where X^1, …, X^N are independent Brownian motions. Thus,

    E[ ( Σ_{j=1}^m ξ_j · B_{t_j} )² ] ≥ η^{N−1} Σ_{ℓ=1}^N Σ_{p=1}^d E[ ( Σ_{j=1}^m ξ^{(p)}_{π_ℓ(j)} X^{ℓ,(p)}_{t^{(ℓ)}_{π_ℓ(j)}−a^{(ℓ)}} )² ].
To simplify the notation somewhat, let Z denote a standard (onedimensional) Brownian motion and note that we have m m d N 2 (p) E ≥ η N −1 ξj · Btj E ξπ (j) Ztπ j=1
=1 p=1
j=1
−a() (j)
2 .
According to Lemma 3.3.1, we have proved the following: Proposition 3.3.1 Suppose a, t1 , . . . , tm are in RN + , and satisfy tj a, and min1≤j≤N a(j) = η > 0. For all = 1, . . . , N , let π (1), . . . , π (m) denote () () the indices that order t1 , . . . , tm . Then, m m d N m 2
2
(p) () () ≥ η N −1 ξj · Btj ξπ (j) · tπ (k) − tπ (k−1) , E j=1
=1 p=1 k=1
j=k
488
12. The Brownian Sheet and Potential Theory
where π (0) = 0 for all = 1, . . . , N . Exercise 3.3.1 Prove that in our proof of Lemma 3.3.2, (Ct ; t a), (βt1(1) −a(1) ; t(1) ≥ a(1) ), . . . , (βtN(N ) −a(N ) ; t(N ) ≥ a(N ) ) are independent Gaussian processes. Compute their mean and covariance functions.
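Proposition 3.3.1 lends itself to a quick numerical sanity check. The sketch below is purely illustrative (it is not part of the proof): it takes d = 1 and N = 2, uses the convention t_{π_ℓ(0)} = a from the derivation, and computes the left-hand side from the covariance of the two-parameter Brownian sheet, E[B_s B_t] = min(s^{(1)}, t^{(1)}) · min(s^{(2)}, t^{(2)}).

```python
# Illustrative check of Proposition 3.3.1 with d = 1, N = 2.
def sheet_var(ts, xs):
    # E[(sum_j x_j B_{t_j})^2] for the 2-parameter Brownian sheet:
    # Cov(B_s, B_t) = min(s1, t1) * min(s2, t2).
    return sum(xs[j] * xs[k] *
               min(ts[j][0], ts[k][0]) * min(ts[j][1], ts[k][1])
               for j in range(len(ts)) for k in range(len(ts)))

def lower_bound(ts, xs, a):
    # eta^(N-1) * sum over coordinates of the ordered-increment quadratic form,
    # with the convention t_{pi(0)} = a.
    eta = min(a)
    total = 0.0
    for ell in range(2):
        order = sorted(range(len(ts)), key=lambda j: ts[j][ell])
        prev = a[ell]
        for k, jk in enumerate(order):
            tail = sum(xs[j] for j in order[k:])
            total += tail ** 2 * (ts[jk][ell] - prev)
            prev = ts[jk][ell]
    return eta * total          # eta^(N-1) with N = 2

a = (1.0, 2.0)
ts = [(1.5, 2.5), (2.0, 3.0), (3.0, 2.2)]   # all t_j >= a coordinatewise
xs = [1.0, -2.0, 0.5]
assert sheet_var(ts, xs) >= lower_bound(ts, xs, a)
```

The helper names (`sheet_var`, `lower_bound`) are of course ad hoc; any choice of t_j ≽ a and real ξ_j should exhibit the same inequality.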
3.4 Proof of Theorem 3.1.1: General Case

The following technical proposition is the key step in our proof of Theorem 3.1.1; it is the general-N analogue of Proposition 3.2.1. With this underway, Theorem 3.1.1 follows by refining and extending the one-parameter method of Section 3.2. We defer the remainder of this derivation of Theorem 3.1.1 to Exercise 3.4.1 below, and state and prove the hardest portion of the argument, which is the following.

Proposition 3.4.1 Suppose η, h > 0 are fixed and that d < 2N. For all integers m ≥ 1, there exists a finite constant Γ > 0 that depends only on η, N, and d such that for every Borel set A ⊂ [η, ∞[^N of diameter bounded above by h, and for all functions f ∈ L^1(R^d),
$$\mathrm{E}\bigg[\frac{1}{(m!)^N}\Big(\int_A f(B_s)\,ds\Big)^m\bigg] \le \Gamma^m\,\|f\|_1^m\, h^{(N-\frac{d}{2})m}.$$

Proof Without loss of generality, we can assume that f(x) ≥ 0 for all x ∈ R^d. Clearly,
$$\mathrm{E}\bigg[\Big(\int_A f(B_s)\,ds\Big)^m\bigg] = \mathrm{E}\bigg[\int_A\cdots\int_A f(B_{s_1})\cdots f(B_{s_m})\,ds_1\cdots ds_m\bigg].$$
We can denote this mth moment by M_m and observe that, by Fubini's theorem,
$$M_m = \int_A\cdots\int_A \mathrm{E}\Big[\prod_{j=1}^m f(B_{s_j})\Big]\,ds_1\cdots ds_m.$$
On the other hand, by the inversion theorem for characteristic functions,
$$\mathrm{E}\Big[\prod_{j=1}^m f(B_{s_j})\Big] = (2\pi)^{-dm} \int_{\mathbf{R}^d}\!\!\cdots\!\int_{\mathbf{R}^d}\int_{\mathbf{R}^d}\!\!\cdots\!\int_{\mathbf{R}^d} \prod_{j=1}^m e^{-i\xi_j\cdot\lambda_j} f(\lambda_j)\; \mathrm{E}\big[e^{i\sum_{j=1}^m \xi_j\cdot B_{s_j}}\big]\; d\lambda_1\cdots d\lambda_m\; d\xi_1\cdots d\xi_m,$$
which equals
$$(2\pi)^{-dm} \int_{\mathbf{R}^d}\!\!\cdots\!\int_{\mathbf{R}^d}\int_{\mathbf{R}^d}\!\!\cdots\!\int_{\mathbf{R}^d} \prod_{j=1}^m e^{-i\xi_j\cdot\lambda_j} f(\lambda_j)\; e^{-\frac{1}{2}\mathrm{E}[\{\sum_{j=1}^m \xi_j\cdot B_{s_j}\}^2]}\; d\lambda_1\cdots d\lambda_m\; d\xi_1\cdots d\xi_m,$$
since B is a Gaussian process. Moreover, we can use the inequality |e^{iθ}| ≤ 1 and the fact that all else is nonnegative to see that $\mathrm{E}[\prod_{j=1}^m f(B_{s_j})]$ is bounded above by
$$(2\pi)^{-dm}\,\|f\|_1^m \int_{\mathbf{R}^d}\!\!\cdots\!\int_{\mathbf{R}^d} e^{-\frac{1}{2}\mathrm{E}[\{\sum_{j=1}^m \xi_j\cdot B_{s_j}\}^2]}\, d\xi_1\cdots d\xi_m.$$
On the other hand, the above exponential can be bounded by Proposition 3.3.1. This way, we can bound M_m from above by
$$(2\pi)^{-dm}\,\|f\|_1^m \int_A\cdots\int_A \int_{\mathbf{R}^d}\!\!\cdots\!\int_{\mathbf{R}^d} \exp\bigg\{-\frac{\eta^{N-1}}{2} \sum_{\ell=1}^N\sum_{p=1}^d\sum_{k=1}^m \Big|\sum_{j=k}^m \xi^{(p)}_{\pi_\ell(j)}\Big|^2 \cdot \big(s^{(\ell)}_{\pi_\ell(k)} - s^{(\ell)}_{\pi_\ell(k-1)}\big)\bigg\}\, d\xi_1\cdots d\xi_m\; ds_1\cdots ds_m,$$
where π_ℓ(•) orders s_1^{(ℓ)}, …, s_m^{(ℓ)} in the sense of the previous subsection. This is, of course, a slight abuse of notation, since π depends on s. Our subsequent changes of variables need to be made with care, all the time keeping this in mind. Nonetheless, it is fortunate that there are many symmetries in the above manyfold integral. For instance, we can change variables: For all 1 ≤ k ≤ m and 1 ≤ p ≤ d, let $\zeta^{(p)}_k = \sum_{j=k}^m \xi^{(p)}_{\pi(j)}$. Since the absolute value of the Jacobian of this map is 1, $\prod_{j=1}^m d\xi_j = \prod_{j=1}^m d\zeta_j$. Thus, M_m is bounded above by
$$(2\pi)^{-dm}\,\|f\|_1^m \int_A\cdots\int_A \int_{\mathbf{R}^d}\!\!\cdots\!\int_{\mathbf{R}^d} \exp\bigg\{-\frac{\eta^{N-1}}{2} \sum_{\ell=1}^N\sum_{p=1}^d\sum_{k=1}^m \big|\zeta^{(p)}_k\big|^2 \cdot \big(s^{(\ell)}_{\pi_\ell(k)} - s^{(\ell)}_{\pi_\ell(k-1)}\big)\bigg\}\, d\zeta_1\cdots d\zeta_m\; ds_1\cdots ds_m,$$
which can be simplified to
$$(2\pi)^{-dm}\,\|f\|_1^m \int_A\cdots\int_A \int_{\mathbf{R}^d}\!\!\cdots\!\int_{\mathbf{R}^d} \exp\bigg\{-\frac{\eta^{N-1}}{2} \sum_{\ell=1}^N\sum_{k=1}^m \|\zeta_k\|^2 \cdot \big(s^{(\ell)}_{\pi_\ell(k)} - s^{(\ell)}_{\pi_\ell(k-1)}\big)\bigg\}\, d\zeta_1\cdots d\zeta_m\; ds_1\cdots ds_m.$$
At this juncture we choose to integrate over the ds_1 ⋯ ds_m integral first. Note that $\int_A\cdots\int_A \exp\{\cdots\}\,ds_1\cdots ds_m$ equals
$$(m!)^N \int\cdots\int_{\substack{s_1,\ldots,s_m\in A:\\ \forall 1\le\ell\le N:\ s_1^{(\ell)}\le\cdots\le s_m^{(\ell)}}} e^{-\frac{1}{2}\eta^{N-1}\sum_{\ell=1}^N\sum_{k=1}^m \|\zeta_k\|^2\cdot(s_k^{(\ell)}-s_{k-1}^{(\ell)})}\; ds_1\cdots ds_m,$$
where s_0 = 0. Since the diameter of A is bounded above by h > 0, we can deduce the following estimate:
$$\int_A\cdots\int_A \exp\{\cdots\}\,ds_1\cdots ds_m \le (m!)^N \int\cdots\int_{[0,h]^{Nm}} e^{-\frac{1}{2}\eta^{N-1}\sum_{\ell=1}^N\sum_{k=1}^m \|\zeta_k\|^2\cdot r_k^{(\ell)}}\, dr_1\cdots dr_m = (m!)^N \prod_{k=1}^m \int_{[0,h]^N} e^{-\frac{1}{2}\eta^{N-1}\sum_{\ell=1}^N r^{(\ell)}\cdot\|\zeta_k\|^2}\, dr.$$
In particular, M_m is bounded above by
$$(2\pi)^{-dm}\,\|f\|_1^m\,(m!)^N \int_{\mathbf{R}^d}\!\!\cdots\!\int_{\mathbf{R}^d} \prod_{k=1}^m \int_{[0,h]^N} e^{-\frac{1}{2}\eta^{N-1}\sum_{\ell=1}^N r^{(\ell)}\cdot\|\zeta_k\|^2}\, dr\; d\zeta_1\cdots d\zeta_m.$$
Once more, we choose to change the order of integration. Note that
$$\int_{\mathbf{R}^d} e^{-\frac{1}{2}\eta^{N-1}\sum_{\ell=1}^N r^{(\ell)}\cdot\|\zeta_k\|^2}\, d\zeta_k = 2^{\frac{d}{2}}\,\eta^{-\frac{1}{2}d(N-1)} \Big(\sum_{\ell=1}^N r^{(\ell)}\Big)^{-\frac{d}{2}} \int_{\mathbf{R}^d} e^{-\|\zeta\|^2}\, d\zeta \le C_1\,\|r\|^{-\frac{d}{2}},$$
for some finite constant C_1 that depends only on N, η, and d. Thus, M_m is bounded above by
$$(2\pi)^{-dm}\,\|f\|_1^m\,(m!)^N \Big(C_1 \int_{[0,h]^N} \|r\|^{-\frac{d}{2}}\, dr\Big)^m \le C_2^m\,\|f\|_1^m\,(m!)^N \Big(\int_0^h r^{-\frac{d}{2}+N-1}\, dr\Big)^m,$$
for some finite constant C_2 that depends only on N, η, and d. This follows from integration in polar coordinates; cf. Supplementary Exercise 7, Chapter 3. The proposition follows from this upon letting Γ = C_2.

Exercise 3.4.1 (Hard) Complete the presented proof of Theorem 3.1.1. This is from Ehm (1981). (Hint: Imitate the argument given in the Brownian case, but use Propositions 3.3.1 and 3.4.1 in place of their one-parameter counterparts.)
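The final step of the proof hinges on the finiteness of $\int_0^h r^{-\frac{d}{2}+N-1}\,dr$ when d < 2N; its closed form $h^{N-d/2}/(N-\frac{d}{2})$ is what produces the h-exponent in Proposition 3.4.1. A small illustrative check of that closed form follows (the quadrature routine and parameter values are ad hoc choices, not part of the proof):

```python
import math

def radial_integral(h, N, d, n=100000):
    # Midpoint-rule approximation of the (improper but convergent, since
    # d < 2N) integral of r^(N - 1 - d/2) over (0, h).
    dr = h / n
    return sum(((k + 0.5) * dr) ** (N - 1 - d / 2.0) * dr for k in range(n))

N, d, h = 2, 3, 0.5                       # note d < 2N
exact = h ** (N - d / 2.0) / (N - d / 2.0)
assert abs(radial_integral(h, N, d) - exact) < 5e-3 * exact
```

Raising this quantity to the mth power, as in the last display of the proof, yields the factor $h^{(N-\frac{d}{2})m}$ of the proposition.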
4 Supplementary Exercises

1. Let $B^{-1}(a) = \{s \in \mathbf{R}^N_+ : B_s = a\}$ denote the level set at a ∈ R^d of the N-parameter, R^d-valued Brownian sheet B. Prove that irrespective of the choice of a ∈ R^d, B^{-1}(a) is nontrivial if and only if d < 2N. In the case d < 2N, show that it has codimension d/2 and Hausdorff dimension N − (d/2).

2. Improve Proposition 3.2.1 by showing the existence of a finite constant C > 0 such that
$$\mathrm{E}\bigg[\Big(\int_t^{t+h} f(B_s)\,ds\Big)^{2k}\bigg] \le C^k\, \frac{(2k)!}{k!}\, \|f\|_1^{2k}\, h^k.$$
(Hint: Reconsider the k-dimensional multiple integral that arose in the course of our proof of the mentioned proposition.)

3. Let B denote the N-parameter, R^d-valued Brownian sheet and consider the occupation measure µ of (B_t; t ∈ [1,2]^N): For all Borel sets A ⊂ R^d,
$$\mu(A) = \int_{[1,2]^N} \mathbf{1}_A(B_s)\,ds.$$
Let µ̂ denote the Fourier transform of µ. That is, for all ξ ∈ R^d, $\widehat{\mu}(\xi) = \int e^{i\xi\cdot a}\,\mu(da)$.
(i) Prove that when d < 2N, $\mathrm{E}[\int_{\mathbf{R}^d} |\widehat{\mu}(\xi)|^2\,d\xi] < \infty$.
(ii) Conclude that when d < 2N, there exists a stochastic process (L(x); x ∈ R^d) such that: (a) L(x) ≥ 0, a.s. for all x ∈ R^d, and $\mathrm{E}[\int_{\mathbf{R}^d} \{L(x)\}^2\,dx] < \infty$; and (b) for all f ∈ L^∞(R^d), $\int_{[1,2]^N} f(B_s)\,ds = \int f(x)\,L(x)\,dx$, a.s.
(iii) It can be shown that x ↦ L(x) has a continuous modification that can be denoted by L. Using this, show that L(0) = L([1,2]^N), a.s., where L on the right-hand side denotes the local times of Section 3.2.
Various parts of this are due to P. Lévy, H. Trotter, S. M. Berman, and W. Ehm. (Hint: First, prove Plancherel's theorem in the following form: If ζ is a finite measure on R^d whose Fourier transform is in L^2(R^d), then ζ is absolutely continuous with respect to Lebesgue's measure, and the Radon–Nikodým derivative is in L^2(R^d).)

4. (Hard) For any compact set E ⊂ R^d, define ρ_i(E) and ρ_o(E) to be the inner and outer radii of E, respectively. More precisely, ρ_i(E) = inf{|x| : x ∈ E} and ρ_o(E) = sup{|x| : x ∈ E}. Prove that if B denotes the N-parameter Brownian sheet and if η ∈ ]0,1[ is fixed, there exists a finite constant K_η > 1 such that for all compact sets E with ρ_i(E) > η and ρ_o(E) < η^{-1},
$$K_\eta^{-1}\,\mathrm{Cap}_{d-2N}(E) \le \mathrm{P}\big(B([0,1]^N)\cap E \neq \varnothing\big) \le K_\eta\,\mathrm{Cap}_{d-2N}(E).$$
(Hint: Adapt the given proof of Theorem 1.1.1.)

5. Suppose X = (X_t; t ∈ R^N_+) denotes N-parameter, R^d-valued additive Brownian motion. Prove that for all M > 1, there exists a finite constant K_M > 1 such that for every compact set E whose outer radius is no more than M,
$$K_M^{-1}\,\mathrm{Cap}_{d-2N}(E) \le \mathrm{P}\big(X([1,2]^N)\cap E \neq \varnothing\big) \le K_M\,\mathrm{Cap}_{d-2N}(E).$$
Improve this by replacing [1,2]^N by [a,b] for any 0 ≺ a ≺ b, both in R^N_+. (The constant K will then depend on a and b as well.)

6. Suppose X = (X_t; t ∈ R^N_+) denotes N-parameter, R^d-valued additive Brownian motion. In the case d < 2N, construct local times for X and prove that Theorem 3.1.1 holds true for the process X. In a slightly different form, this is due to E. B. Dynkin. (When N = 2, the 2-parameter process $\mathbf{R}^2_+ \ni (s,t) \mapsto B^1_s - B^2_t$ is an additive Brownian motion (why?). The corresponding local times are traditionally called the intersection local times between the two d-dimensional Brownian motions B^1 and B^2.)

7. (Hard) Let B^1, …, B^k denote k independent R^d-valued Brownian sheets with N^1, …, N^k parameters, respectively. Find a necessary and sufficient condition on N^1, …, N^k and d for $B^1([1,2]^{N^1}) \cap \cdots \cap B^k([1,2]^{N^k})$ to be nonempty. That is, when do the trajectories of k independent Brownian sheets intersect?

8. Suppose B denotes an R^d-valued, N-parameter Brownian sheet.
(i) Prove that for all a ≺ b, both in R^N_+, and for all M > 1, there exist positive and finite constants A_1 and A_2 such that for all ε ∈ ]0,1[ and for all x ∈ [−M,M]^d,
$$A_1\, U_{d-2N}(\varepsilon) \le \mathrm{P}\big(B([a,b])\cap B(x;\varepsilon) \neq \varnothing\big) \le A_2\, U_{d-2N}(\varepsilon),$$
where U_β(x) = 1 if β < 0, U_β(x) = {ln_+(1/x)}^{-1} if β = 0, and U_β(x) = x^β if β > 0. Moreover, B(x;ε) = {z ∈ R^d : |z − x| < ε}.
(ii) Conclude that when d ≥ 2N, with probability one, Leb{B(R^N_+)} = 0.

9. (Hard) Consider a standard (one-dimensional) Brownian motion B = (B_t; t ≥ 0).
(i) Prove that for all x ∈ R and all t > 0, $\lim_{\varepsilon\to 0^+} \int_0^t \varphi_\varepsilon(B_s - x)\,ds$ exists in L^2(P), where φ_ε is defined by equation (1) of Section 3.1. Let L^x_t denote this limit.
(ii) Show that for all T > 0 and all integers k ≥ 1, there exists a finite constant C_k > 0 such that for all x ∈ R and all s, t ∈ [0,T],
$$\mathrm{E}\{|L^x_t - L^x_s|^{2k}\} \le C_k\, |t-s|^k.$$
(iii) Show that for all T > 0 and all integers k ≥ 1, there exists a finite constant D_k > 0 such that for all x, y ∈ R and all t ∈ [0,T],
$$\mathrm{E}\{|L^x_t - L^y_t|^{2k}\} \le D_k\, |x-y|^k.$$
(iv) Conclude that there exists a continuous modification of $\mathbf{R}\times[0,\infty[\ \ni (x,t) \mapsto L^x_t$. Let us continue to write this modification as L^x_t.
(v) Prove that there exists one null set outside which the following holds for all f ∈ L^∞(R) and all t ≥ 0:
$$\int_0^t f(B_s)\,ds = \int_{-\infty}^{\infty} f(x)\,L^x_t\,dx.$$
Parts (i) and (iv) are essentially due to P. Lévy; part (iii) is essentially due to H. Trotter and, in part, to Berman (1983). Part (v) is called the occupation density formula; it represents the local times L^x_t as the density of the occupation measure of B with respect to Lebesgue's measure. (Hint: For part (iii), use the fact that $|e^{i\xi x} - e^{i\xi y}| \le |\xi|\cdot|x-y| \wedge 1$.)
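The occupation density formula of Exercise 9 has an elementary discrete counterpart that may help fix ideas: along any path, a sum of f over time can be regrouped through the visit counts, which play the role of a discrete "local time." The following sketch is purely illustrative (the random-walk path and the choice of f are arbitrary):

```python
# Discrete analogue of the occupation density formula:
# sum over time of f(X_s) equals sum over space of f(x) * visits(x).
from collections import Counter
import random

random.seed(7)
pos, path = 0, [0]
for _ in range(1000):
    pos += random.choice((-1, 1))
    path.append(pos)

visits = Counter(path)              # visits[x] = discrete "local time" at x
f = lambda x: x * x % 5             # an arbitrary bounded function
lhs = sum(f(x) for x in path)       # time-side of the formula
rhs = sum(f(x) * n for x, n in visits.items())   # space-side
assert lhs == rhs
```

In the continuous setting, the visit counts must be replaced by the density L^x_t, whose existence is exactly the content of parts (i) and (iv).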
5 Notes on Chapter 12

Preamble That the Brownian sheet is not a multiparameter Markov process in the sense of Chapter 11 should be interpreted with care, since the Brownian sheet does have a number of Markovian properties. For instance, consider an R^d-valued, N-parameter Brownian sheet B. If we view t^{(i)} ↦ B_t as a one-parameter process taking its values in the space of continuous functions from [0,1]^{N−1} into R^d, we are observing an infinite-dimensional Feller process. Other Markovian properties, far more intricate than the one mentioned here, can be found in Dalang and Walsh (1992), Dorea (1982, 1983), and Zhang (1985).

Section 1 Theorem 1.1.1 is from Khoshnevisan and Shi (1999) and resolves an old problem that was partially settled in Orey and Pruitt (1973). Corollary 1.1.3 is a consequence of the general theory of Adler (1977, 1981) and Weber (1983); see also Chen (1997). Recent applications, as well as nontrivial extensions of Theorem 1.1.1 to the potential theory of stochastic partial differential equations, can be found in Nualart (2001a, 2001b).

Section 2 In the context of random fields, the codimension approach that we have taken is new. However, see the Notes on Chapter 11 for earlier works on codimension. Theorem 2.1.1 is a consequence of the general theory of Adler (1981); see also Kahane (1985, Chapters 17 and 18). When N = 2, a deeper understanding of Theorem 2.1.1 is available; cf. Khoshnevisan (1999). Khoshnevisan and Xiao (2002) study the level sets of N-parameter additive Lévy processes via capacities. It is a strange fact that the level sets of various random processes are intimately connected to the size of the range of those very processes! This observation is at the heart of the extension of Theorem 2.1.1, given in Khoshnevisan (1999), and appears earlier on in the literature in Fitzsimmons and Port (1990), Kahane (1982, 1983), and Port (1988). For a pedagogical discussion, see Kahane (1985, Chapters 17 and 18).
Section 3 The introduction of local times as the natural measures that live on the level sets of processes is not new and goes back to Paul Lévy; see Itô and McKean (1974, Section 2.8, Chapter 1). Proposition 3.2.1 is a consequence of M. Kac's moment formula; cf. Fitzsimmons and Pitman (1999). Our proof of Theorem 3.1.1 is essentially a refinement of Ehm (1981, equation 1.6) for m = 2, using the notation of the mentioned paper. The literature on local times is monstrously large, and it is simply impossible to provide a comprehensive list of all of the appropriate references here. Suffice it to say that three good starting points are the following, together with their combined references:
• There are deep connections between local times and the theory of Markov processes. To learn more about these, you can start by reading (Dellacherie and Meyer 1988; Blumenthal and Getoor 1968; Fukushima et al. 1994; Sharpe 1988). Depending on which reference you look at, you may need to look at "continuous additive functionals" and/or "homogeneous random measures" first.
• Local times of nice processes can also be studied by appealing to function-theoretic and Fourier-analytic methods. This aspect is very well documented in Geman and Horowitz (1980); and
• There are still deeper connections to stochastic calculus, for which you can start by reading (Chung and Williams 1990; Karatzas and Shreve 1991; Revuz and Yor 1994; Rogers and Williams 1994).
As a representative bibliography on local times of processes related to the Brownian sheet, we mention only Adler (1977, 1980), Dozzi (1988), Imkeller (1984, 1986), Lacey (1990), and Vares (1983). The combined bibliography of (Adler 1981; Geman and Horowitz 1980) contains further references to this subject.
Section 4 Supplementary Exercise 9 started as a problem of P. Lévy that was subsequently solved by H. Trotter, who proved that Brownian local times are continuous. The problem of when a Lévy process possesses (jointly) continuous local times, however, remained open until recently. It was finally settled in Barlow (1988), using earlier metric entropy ideas from Barlow and Hawkes (1985). See Bass and Khoshnevisan (1992b) and Bertoin (1996, Section 3, Chapter V) for simpler proofs of the sufficiency half of Barlow's theorem. This has, in turn, generated further recent interest. In particular, see the impressive results of Marcus and Rosen (1992). Interestingly enough, the local times of one-parameter processes are related to the range of the Brownian sheet and other random fields; see Csáki et al. (1988, 1989, 1992), Eisenbaum (1995, 1997), Rosen (1991), Weinryb (1986), Weinryb and Yor (1988, 1993), and Yor (1983). Epstein (1989) extends the connections between the Brownian sheet and local times, and more general additive functionals, in new and innovative directions. In the past decade our understanding of the geometry of the Brownian sheet has been rapidly developing. It is a shame that this book has to end before I can describe some of these exciting results. You can start off, where this book ends, by reading Dalang and Mountford (1996, 1997, 2000, 2001), Dalang and Walsh (1992,
1993, 1996), Kendall (1980), Khoshnevisan (1995), Mountford (1993, 2002), and Walsh (1982).
Part III
Appendices
Appendix A Kolmogorov’s Consistency Theorem
The Daniell–Kolmogorov existence theorem, or Kolmogorov's consistency theorem, describes precisely when a given stochastic process exists.

Given an arbitrary set T, we can define R^T as the collection of all functions f : T → R. For any finite set F ⊂ T, let π_F : R^T → R^F denote the projection onto R^F. Of course, R^F is the same thing as the finite-dimensional Euclidean space R^{#F}, up to an identification of terms. Recall that the product topology on R^T is the smallest topology that makes π_F continuous for all finite F ⊂ T. Suppose for each finite F we are given a probability measure µ_F that lives on the Borel subsets of the finite-dimensional Euclidean space R^F. Then we can use µ_F to define a set function µ_F ∘ π_F on the Borel subsets of R^T (with the product topology); this means that for all Borel sets A ⊂ R^T, (µ_F ∘ π_F)(A) = µ_F{π_F(A)}. For instance, suppose T = {1, 2} and F = {1}. Then R^T = R², and π_F is the projection π_F(x) = x^{(1)} for all x ∈ R². As another, somewhat more interesting, example, consider T = N and F = {1, 2}. Then, for all x ∈ R^T, π_F(x) is the vector (x^{(1)}, x^{(2)}). In this latter case, for any finite F ⊂ N of the form F = {t_1, …, t_k}, π_F maps the vector x = (x^{(1)}, x^{(2)}, …) ∈ R^N to π_F(x), which is the vector (x^{(t_1)}, …, x^{(t_k)}) ∈ R^{#F} = R^F.

Let us first consider a measure µ on R^N, which is usually written as R^ω = R × R × ⋯. What do probability measures on R^ω look like? Recall that R^ω is endowed with the product topology. Moreover, consider sets A ⊂ R^ω of the form A = A_1 × A_2 × ⋯, where each A_i is a Borel subset of R. Such a set is a cylinder set if all but a finite number of the A_j's are equal to R. We can define a
very natural partial order ⊒ on all cylinder sets. Suppose A = A_1 × A_2 × ⋯ and B = B_1 × B_2 × ⋯ are cylinder sets. Let 𝒜 denote {i ≥ 1 : A_i ≠ R}, and similarly define ℬ for B. Then A ⊒ B if and only if 𝒜 ⊂ ℬ. That is, whenever a coordinate of B is R, the corresponding coordinate of A is also R. Note that any probability measure µ on the Borel subsets of R^ω is defined consistently on all cylinder sets: if A is a cylinder set, then µ(A) = µ_𝒜(π_𝒜(A)), where µ_F := µ ∘ π_F^{-1} denotes the corresponding finite-dimensional marginal. Each µ_F is a probability measure on the Borel subsets of R^F, and consistency is equivalent to the following: If F ⊂ G are both finite, then µ_G ∘ (π_F^G)^{-1} = µ_F, where π_F^G : R^G → R^F denotes the natural projection. (Why?)

The above motivates the following definition. Suppose that for each finite F ⊂ T, µ_F is a probability measure on R^F. We say that this collection of µ_F's is consistent if, for all finite F ⊂ G ⊂ T, µ_F = µ_G ∘ (π_F^G)^{-1}. You should check that when T = N, this is the same as the previous notion of consistency. Kolmogorov's consistency theorem asserts that any collection of consistent probability measures actually comes from one measure on R^T.

Theorem 1 (Kolmogorov's Consistency Theorem, I) Given a consistent collection (µ_F : F ⊂ T, finite) of probability measures on (R^F; F ⊂ T, finite), respectively, there exists a unique probability measure P on the Borel subsets of R^T such that P ∘ π_F^{-1} = µ_F for all finite F ⊂ T. The measure P is called Kolmogorov's extension of the µ_F's.

The above is equivalent to the following.

Theorem 2 (Kolmogorov's Consistency Theorem, II) Given a consistent collection (µ_F; F ⊂ T, finite) of probability measures on (R^F; F ⊂ T, finite), respectively, let P denote the extension of the µ_F's to all of R^T. Then there exists a probability space (Ω, F, P) on which we can construct a stochastic process X = (X_t; t ∈ T) whose finite-dimensional distributions are the µ_F's. That is, for all finite F ⊂ T of the form F = {t_1, …, t_k}, and for all Borel sets A_1, …, A_k ⊂ R,
$$\mathrm{P}(X_{t_1}\in A_1, \ldots, X_{t_k}\in A_k) = \mu_F(A_1\times\cdots\times A_k).$$
Theorems 1 and 2 are proved in detail in (Bass 1995; Billingsley 1968; Dudley 1989), for instance.
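Consistency is easy to visualize for Gaussian finite-dimensional distributions. In the sketch below (illustrative only; the covariance min(s,t) is that of one-parameter Brownian motion), each µ_F is the centered Gaussian law with covariance matrix (min(s,t))_{s,t∈F}, and projecting µ_G onto a subset F of coordinates simply selects the corresponding rows and columns, so µ_G ∘ (π_F^G)^{-1} = µ_F holds by construction:

```python
# Consistency of Gaussian finite-dimensional distributions.
def cov_matrix(times):
    # Covariance matrix of (X_t; t in times) with Cov(X_s, X_t) = min(s, t).
    return [[min(s, t) for t in times] for s in times]

def project(cov, G, F):
    # Covariance matrix of mu_G composed with the projection onto F:
    # keep only the rows/columns indexed by the elements of F.
    idx = [G.index(t) for t in F]
    return [[cov[i][j] for j in idx] for i in idx]

G = [0.5, 1.0, 2.0]
F = [0.5, 2.0]
assert project(cov_matrix(G), G, F) == cov_matrix(F)   # consistency
```

Kolmogorov's theorem then guarantees a single measure on R^{[0,∞[} (here, the Wiener measure of Brownian motion) having all of these marginals.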
Appendix B Laplace Transforms
In this appendix we collect some important facts about Laplace transforms.¹ Throughout, µ denotes a σ-finite measure on R_+ (endowed with its Borel sets). The distribution function F of the measure µ is defined as
$$F(s) = \mu\{[0,s]\}, \qquad s \ge 0.$$
Suppose h : R_+ → R_+ is Borel measurable. Then,
$$\int_0^\infty h(s)\,dF(s) = \int_0^\infty h(s)\,\mu(ds).$$
That is, the abstract Lebesgue integral with respect to µ can be completely identified with the Stieltjes integral with respect to F. With this in mind, we can define the Laplace transform µ̂ of µ by
$$\widehat{\mu}(\lambda) = \widehat{F}(\lambda) = \int_0^\infty e^{-\lambda s}\,\mu(ds) = \int_0^\infty e^{-\lambda s}\,dF(s), \qquad \lambda \ge 0.$$
1 Uniqueness and Convergence Theorems

The first two important results in this theory are the uniqueness and continuity theorems.

¹The material of this chapter, and much more, can be found in Feller (1971).
1.1 The Uniqueness Theorem

Roughly speaking, the uniqueness theorem for Laplace transforms states that whenever two measures (or, equivalently, two distribution functions) have the same Laplace transform, they are indeed one and the same. This is reminiscent of the uniqueness theorem for characteristic functions. One main difference is that in order to obtain uniqueness, it is sufficient that the two Laplace transforms agree on an infinite half-line. On the other hand, there are distinct measures whose Fourier transforms agree on an infinite half-line.

Theorem 1.1.1 (The Uniqueness Theorem) Suppose µ and ν are two σ-finite measures on R_+ such that for some λ_0 > 0, µ̂(λ_0) and ν̂(λ_0) are both finite. Then the following are equivalent:
(i) µ = ν;
(ii) for all λ > λ_0, µ̂(λ) = ν̂(λ).

The above is an immediate consequence of the following inversion formula:

Theorem 1.1.2 (Widder's Inversion Formula) Let µ denote a measure on R_+ such that for some λ_0 > 0, µ̂(λ_0) < ∞. Then, for all s > 0 that are points of continuity of F,
$$\lim_{\lambda\to\infty} \sum_{j\le\lambda s} \frac{(-\lambda)^j}{j!}\, \frac{\partial^j \widehat{\mu}}{\partial\lambda^j}(\lambda) = F(s).$$
Proof To begin with, note that for all λ > λ_0, µ̂(λ) < ∞, and by the dominated convergence theorem, the following derivatives exist and are finite:
$$\frac{\partial^j \widehat{\mu}}{\partial\lambda^j}(\lambda) = (-1)^j \int_0^\infty e^{-\lambda s}\, s^j\,\mu(ds), \qquad j \ge 0,\ \lambda > \lambda_0.$$
Consequently, by Fubini's theorem, we see that for all s > 0,
$$\sum_{j\le\lambda s} \frac{(-\lambda)^j}{j!}\, \frac{\partial^j \widehat{\mu}}{\partial\lambda^j}(\lambda) = \int_0^\infty \mathrm{P}(X_{\lambda t} \le \lambda s)\,\mu(dt),$$
where X_α denotes a Poisson random variable with mean α, for any α ≥ 0.² By the (weak) law of large numbers, for all t ≥ 0 and all s > 0,
$$\lim_{\lambda\to\infty} \mathrm{P}(X_{\lambda t} \le \lambda s) = \begin{cases} 0, & \text{if } s < t,\\ 1, & \text{if } s > t.\end{cases} \tag{1}$$
On the other hand, for all ε, s > 0 and all λ > λ_0,
$$\sum_{j\le\lambda s} \frac{(-\lambda)^j}{j!}\, \frac{\partial^j \widehat{\mu}}{\partial\lambda^j}(\lambda) = \int_0^{(1-\varepsilon)s} \mathrm{P}(X_{\lambda t}\le\lambda s)\,\mu(dt) + \int_{(1+\varepsilon)s}^\infty \mathrm{P}(X_{\lambda t}\le\lambda s)\,\mu(dt) + \int_{(1-\varepsilon)s}^{(1+\varepsilon)s} \mathrm{P}(X_{\lambda t}\le\lambda s)\,\mu(dt) = T_1(\lambda) + T_2(\lambda) + T_3(\lambda).$$
By equation (1) and by the dominated convergence theorem,
$$\lim_{\lambda\to\infty} T_1(\lambda) = \mu\{[0,(1-\varepsilon)s]\} = F\{(1-\varepsilon)s\}, \qquad \lim_{\lambda\to\infty} T_2(\lambda) = 0, \qquad \sup_{\lambda\ge 0} T_3(\lambda) \le F\{(1+\varepsilon)s\} - F\{(1-\varepsilon)s\}.$$
²A Poisson random variable with mean 0 is identically zero.
Whenever s is a continuity point for F , limε→0+ supλ≥0 T3 (λ) = 0 and limε→0+ F {(1 − ε)s} = F (s). This proves the theorem.
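Widder's formula can be tried numerically. The sketch below is purely illustrative: it takes µ to be the standard exponential distribution, for which µ̂(λ) = 1/(1+λ) and the jth derivative is (−1)^j j!/(1+λ)^{j+1}, so the jth summand collapses to (λ/(1+λ))^j/(1+λ) and no factorials need be computed. Already at λ = 400 the partial sum is close to F(1) = 1 − e^{−1}.

```python
import math

def widder_partial_sum(s, lam, summand):
    # F(s) ~ sum over j <= lam*s of summand(j, lam), where summand(j, lam)
    # equals (-lam)^j / j! times the j-th derivative of the Laplace transform.
    return sum(summand(j, lam) for j in range(int(lam * s) + 1))

def exp_summand(j, lam):
    # For mu = Exp(1): the j-th summand simplifies to (lam/(1+lam))^j / (1+lam).
    return (lam / (1.0 + lam)) ** j / (1.0 + lam)

approx = widder_partial_sum(1.0, 400.0, exp_summand)
exact = 1.0 - math.exp(-1.0)          # F(1) for the Exp(1) distribution
```

Here the partial sum is the geometric series $1 - (\lambda/(1+\lambda))^{\lfloor\lambda s\rfloor+1}$, which visibly converges to $1 - e^{-s}$ as λ → ∞.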
1.2 The Convergence Theorem

The convergence theorem of Laplace transforms parallels the convergence theorem of Fourier transforms, but is simpler.

Theorem 1.2.1 (The Convergence Theorem) Consider a collection of measures (µ_n; 1 ≤ n ≤ ∞), all on R_+.
(i) Suppose that µ_n converges weakly to µ_∞ and that sup_n µ̂_n(λ) < ∞ for all λ greater than some λ_0 ≥ 0. Then lim_{n→∞} µ̂_n(λ) exists.
(ii) Conversely, suppose there exists λ_0 ≥ 0 such that for each and every λ ≥ λ_0, L(λ) = lim_{n→∞} µ̂_n(λ) exists. Then L is the Laplace transform of a measure ν on R_+, and µ_n converges weakly to this measure ν.
In particular, by combining (i) and (ii), we see that if µ_n converges weakly to some µ_∞, then under the boundedness condition, lim_n µ̂_n(λ) = µ̂_∞(λ) for all λ > λ_0.

Proof We first prove the theorem in the case that µ_1, µ_2, … are all probability measures. Of course, in this case we always have sup_n µ̂_n(λ) ≤ 1 for all λ > 0.
(i) Since µ_n converges weakly to µ_∞, µ_∞ is necessarily a subprobability measure, and this part follows from the definition of weak convergence.
(ii) Recall Helly's selection theorem: Any subsequence (n_k)_{k≥1} has a further subsequence (n_{k_j})_{j≥1} such that, as j → ∞, µ_{n_{k_j}} converges weakly to some subprobability measure σ. By the part of the theorem that we verified earlier, for all λ > 0, lim_{j→∞} µ̂_{n_{k_j}}(λ) = σ̂(λ). That is, L = σ̂. Since this is independent of the choice of the subsequence (n_k), the theorem follows in the case that µ_1, µ_2, … are all probability measures.

We now proceed with the general case.
(i) The first part is similar to the probability case discussed above. Namely, note that by the boundedness assumption and by the dominated convergence theorem,
$$\lim_{T\to\infty} \sup_{n\ge 1} \int_T^\infty e^{-\lambda t}\,\mu_n(dt) = 0, \qquad \lambda > \lambda_0.$$
Equivalently, for any ε > 0 and for all λ > λ_0, there exists T_0 > 0 such that
$$\sup_{n\ge 1} \int_{T_0}^\infty e^{-\lambda t}\,\mu_n(dt) \le \varepsilon.$$
On the other hand, by properties of weak convergence, for all λ > λ_0,
$$\lim_{n\to\infty} \int_0^{T_0} e^{-\lambda s}\,\mu_n(ds) = \int_0^{T_0} e^{-\lambda s}\,\mu_\infty(ds).$$
In particular, for all λ > λ_0,
$$\lim_{n,m\to\infty} \bigg|\int_0^\infty e^{-\lambda s}\,\mu_n(ds) - \int_0^\infty e^{-\lambda s}\,\mu_m(ds)\bigg| \le 2\varepsilon.$$
Since ε > 0 is arbitrary, we see that lim_{n→∞} µ̂_n(λ) exists for all λ > λ_0.
(ii) Let us fix some λ_1 > λ_0 and define
$$\nu_n(dt) = \frac{e^{-\lambda_1 t}\,\mu_n(dt)}{\widehat{\mu}_n(\lambda_1)}, \qquad t \ge 0.$$
(Why can we assume the strict positivity of µ̂_n(λ_1)?) It follows immediately that ν_1, ν_2, … are probability measures on R_+. Moreover, for all λ ≥ 0,
$$\widehat{\nu}_n(\lambda) = \frac{\widehat{\mu}_n(\lambda+\lambda_1)}{\widehat{\mu}_n(\lambda_1)}.$$
In particular,
$$\lim_{n\to\infty} \widehat{\nu}_n(\lambda) = \frac{L(\lambda+\lambda_1)}{L(\lambda_1)}, \qquad \lambda \ge 0. \tag{1}$$
By what we have shown about probability measures, there must exist a subprobability measure ν_∞ on R_+ such that ν_n converges weakly to ν_∞ and
$$\widehat{\nu}_\infty(\lambda) = \frac{L(\lambda+\lambda_1)}{L(\lambda_1)}, \qquad \lambda \ge 0.$$
Define the measure
$$\sigma(dt) = L(\lambda_1)\, e^{\lambda_1 t}\, \nu_\infty(dt), \qquad t \ge 0.$$
Since lim_{n→∞} µ̂_n(λ_1) = L(λ_1), the definition of ν_n shows that µ_n converges weakly to σ. To finish, note that for all λ ≥ λ_1,
$$\widehat{\sigma}(\lambda) = L(\lambda_1) \int_0^\infty e^{-(\lambda-\lambda_1)t}\,\nu_\infty(dt) = L(\lambda_1) \lim_{n\to\infty} \widehat{\nu}_n(\lambda-\lambda_1) = L(\lambda).$$
The last step uses (1). Since λ_1 > λ_0 is arbitrary, the result follows.
1.3 Bernstein's Theorem

Suppose f : R_+ → R_+ is measurable. S. N. Bernstein's theorem addresses the following interesting question: When is f the Laplace transform of a measure? That is, we seek conditions under which $f(\lambda) = \int_0^\infty e^{-\lambda t}\,\mu(dt)$ for all λ ≥ 0 and some measure µ on R_+. The answer to this question revolves around the notion of complete monotonicity. A function f : R_+ → R_+ is said to be completely monotone if it is infinitely differentiable and
$$(-1)^n \frac{d^n f}{d\lambda^n}(\lambda) \ge 0, \qquad \lambda > 0.$$

Theorem 1.3.1 (Bernstein's Theorem) A function f : R_+ → R_+ is the Laplace transform of a measure if and only if it is completely monotone. It is the Laplace transform of a probability measure if and only if it is completely monotone and f(0^+) = 1.

Proof Suppose f = µ̂ for some measure µ. By the dominated convergence theorem, f is completely monotone. In fact,
$$(-1)^n \frac{d^n f}{d\lambda^n}(\lambda) = \int_0^\infty s^n e^{-\lambda s}\,\mu(ds) \ge 0.$$
Conversely, let us suppose that f is completely monotone. We fix some number r > 0 and define F : [0,1[ → R_+ by
$$F(s) = f(r - rs), \qquad 0 \le s < 1.$$
Computing directly, we see that for all n ≥ 0,
$$\frac{d^n F}{ds^n}(s) = (-r)^n \frac{d^n f}{d\lambda^n}(r - rs), \qquad 0 \le s < 1.$$
Since this is nonnegative, we can apply Taylor's expansion to see that
$$F(s) = \sum_{n=0}^\infty \frac{(-r)^n s^n}{n!}\, \frac{d^n f}{d\lambda^n}(r), \qquad 0 \le s < 1.$$
In particular, we apply the above to $s = e^{-\frac{\lambda}{r}}$, where λ > 0, and see that
$$f\big(r - re^{-\frac{\lambda}{r}}\big) = \sum_{n=0}^\infty \frac{(-r)^n}{n!}\, e^{-\frac{n\lambda}{r}}\, \frac{d^n f}{d\lambda^n}(r).$$
That is, for any fixed r > 0, $f(r - re^{-\lambda/r}) = \widehat{\mu}_r(\lambda)$, where µ_r is the purely atomic nonnegative measure that assigns nonnegative mass (−r)^n f^{(n)}(r)/n! to every point of the form n/r, n = 0, 1, …. (Temporarily, we have written f^{(n)} for the nth derivative of f.) Since lim_{r→∞}(r − re^{−λ/r}) = λ,
$$f(\lambda) = \lim_{r\to\infty} \widehat{\mu}_r(\lambda), \qquad \lambda > 0.$$
By the convergence theorem (Theorem 1.2.1), f is the Laplace transform of some measure. At this point it should be clear that f is also the Laplace transform of a probability measure if and only if f (0+) = 1.
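The atomic measures µ_r of the proof can be computed explicitly in examples. In the sketch below (illustrative only), f(λ) = 1/(1+λ) is completely monotone, the mass of µ_r at the atom n/r simplifies to (r/(1+r))^n/(1+r), and the Laplace transform of µ_r is already close to f for moderately large r:

```python
import math

def f(x):
    # f(x) = 1/(1+x) is completely monotone on (0, infinity):
    # its n-th derivative is (-1)^n * n! / (1+x)^(n+1).
    return 1.0 / (1.0 + x)

def bernstein_approx(lam, r, n_terms=2000):
    # mu_r puts mass (-r)^n f^{(n)}(r)/n! at the atom n/r; for this f the
    # mass simplifies to (r/(1+r))^n / (1+r).  Return mu_r-hat at lam.
    total = 0.0
    for n in range(n_terms):
        mass = (r / (1.0 + r)) ** n / (1.0 + r)
        total += mass * math.exp(-lam * n / r)
    return total
```

For instance, `bernstein_approx(1.0, 200.0)` agrees with f(1) = 1/2 to within about 10^{-3}, in keeping with the identity $\widehat{\mu}_r(\lambda) = f(r - re^{-\lambda/r})$ and the limit r → ∞.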
2 A Tauberian Theorem

A Tauberian theorem is one that states that the asymptotic behavior of the distribution function F(t), as t → ∞, can be read off from the behavior of F̂(λ), as λ → 0^+. An Abelian theorem is one that states the converse. Following Feller (1971), we use the term Tauberian theorem loosely to stand for both kinds of results.

Theorem 2.1.1 (Feller's Tauberian Theorem) Consider a σ-finite measure µ with distribution function F such that µ̂(λ) < ∞ for all λ > 0. The following are equivalent: There exists θ > −1 such that:
(i) as t → ∞, $\dfrac{F(tx)}{F(t)} \to x^\theta$, for all x > 0; and
(ii) as s → 0^+, $\dfrac{\widehat{\mu}(s\lambda)}{\widehat{\mu}(s)} \to \lambda^{-\theta}$, for all λ > 0.
Moreover, either of the above two conditions implies
$$\lim_{t\to\infty} \frac{\widehat{\mu}(\frac{1}{t})}{F(t)} = \Gamma(\theta+1). \tag{1}$$
Moreover, either of the above two conditions above implies µ '( 1t ) = Γ(θ + 1). t→∞ F (t) lim
(1)
In particular, we obtain the following very important corollary.

Corollary 2.1.1 Suppose µ is given by Theorem 2.1.1. If there exist two finite constants C > 0 and θ > −1 such that, as s → 0^+, s^θ µ̂(s) → C, then, as t → ∞, t^{−θ} F(t) → D, where D = C/Γ(θ+1). The converse also holds.

Proof of Theorem 2.1.1 First, we show that (i) ⇒ (1). For all x ≥ 0, define G(x) = x^θ/Γ(θ+1), where 1 ÷ 0 = ∞. Viewing G as a distribution function, we immediately see that
$$\widehat{G}(\lambda) = \frac{1}{\Gamma(\theta+1)} \int_0^\infty e^{-\lambda x}\, \theta x^{\theta-1}\, dx = \lambda^{-\theta}. \tag{2}$$
For each t > 0, define the distribution function
$$H_t(x) = \frac{F(tx)}{F(t)}, \qquad x \ge 0.$$
It is easy to check that for all t ≥ 0,
$$\widehat{H}_t(\lambda) = \frac{\widehat{\mu}(\frac{\lambda}{t})}{F(t)}, \qquad \lambda > 0.$$
Assertion (i) states that, as t → ∞, H_t converges weakly to the distribution function Γ(θ+1)G. Suppose we could show that for all λ > 0,
$$\sup_t \widehat{H}_t(\lambda) < \infty. \tag{3}$$
Then, by equation (2) and by the convergence theorem for Laplace transforms (Theorem 1.2.1), we would have
$$\lim_{t\to\infty} \frac{\widehat{\mu}(\frac{\lambda}{t})}{F(t)} = \lim_{t\to\infty} \widehat{H}_t(\lambda) = \Gamma(\theta+1)\widehat{G}(\lambda) = \Gamma(\theta+1)\lambda^{-\theta}.$$
Applying this with λ = 1, we obtain (1). It remains to demonstrate the validity of (3). By (i), there exists t_0 > 0 such that for all t ≥ t_0, F(te) ≤ e^{1+θ} F(t). By iterating this, we see that for all t ≥ t_0,
$$F(te^k) \le e^{k(1+\theta)}\, F(t), \qquad k \ge 0. \tag{4}$$
Thus, by monotonicity, for all t ≥ t_0,
$$\widehat{\mu}\Big(\frac{1}{t}\Big) = \int_0^t e^{-\frac{s}{t}}\,\mu(ds) + \sum_{j=0}^\infty \int_{te^j}^{te^{j+1}} e^{-\frac{s}{t}}\,\mu(ds) \le F(t) + \sum_{j=0}^\infty \exp(-e^j)\, F(te^{j+1}) \le F(t)\Big(1 + \sum_{j=0}^\infty \exp\{-e^j + (j+1)(1+\theta)\}\Big).$$
We have used equation (4) in the last line. This proves (3), and that (i) ⇒ (1). Combining (1) with the assertion of (i) itself, we obtain (ii) readily.

Next, we prove that (ii) ⇒ (1). For each t > 0, define the distribution function
$$F_t(x) = \frac{F(tx)}{\widehat{\mu}(\frac{1}{t})}, \qquad x \ge 0.$$
It is easy to check that
$$\widehat{F}_t(\lambda) = \frac{\widehat{\mu}(\frac{\lambda}{t})}{\widehat{\mu}(\frac{1}{t})}, \qquad \lambda > 0.$$
In light of equation (2), assertion (ii) of the theorem can be recast as follows:
$$\lim_{t\to\infty} \widehat{F}_t(\lambda) = \widehat{G}(\lambda), \qquad \lambda > 0.$$
By the convergence theorem (Theorem 1.2.1), as t → ∞, F_t converges weakly to G. Since G is continuous,
$$\lim_{t\to\infty} F_t(x) = G(x), \qquad \forall x \ge 0.$$
In particular, lim_{t→∞} F_t(1) = G(1), which is (1) written in shorthand. That is, we have shown that (ii) ⇒ (1). Together with (ii) itself, (1) implies (i).

We conclude this appendix with an application to the computation of the asymptotic rate of probability distribution functions. Now let F denote a probability distribution function on R_+. Of course, lim_{x→∞} F(x) = 1. Here is an instructive exercise.

Exercise 2.1.1 Show that
$$\frac{1}{\lambda}\,\{1 - \widehat{F}(\lambda)\} = \int_0^\infty e^{-\lambda x}\,\{1 - F(x)\}\,dx,$$
whenever λ > 0.
509
Let µ0 (dx) = {1 − F (x)} dx. Its distribution function is, of course, F0 , where x F0 (x) = 1 − F (y) dy, x ≥ 0. 0
!0 (λ) = λ−1 {1 − F'(λ)}, λ > 0. Applying Corollary 2.1.1 to µ0 Moreover, F and F0 , we can deduce that for some positive finite C and some θ > −1, x−θ F0 (x) → C (as x → ∞) if and only if λθ−1 {1− F' (λ)} → D (as λ → 0+ ), where D = C/Γ(1+θ). By applying L’Hˆ opital’s rule of elementary calculus, we obtain the following: Corollary 2.1.2 Suppose F is a probability distribution function on R+ . The following are equivalent: For some θ > −1 and some constant C > 0: (i) as x → ∞, x1−θ 1 − F (x) → C; and (ii) as λ → 0+ , λθ−1 1 − F'(λ) → D, where D = C Γ(θ + 1). Clearly, this is useful and sensible only when θ ∈ ]−1, 1[.
Appendix C Hausdorff Dimensions and Measures
Hausdorff measures can be thought of as extensions of Lebesgue’s measure. While a rather abstract treatment is possible, we restrict our attention to such measures on Euclidean spaces.
1 Preliminaries

Let (S, d) denote a metric space, and recall that a set function µ, defined on all subsets of S and taking values in [0, ∞], is a Carathéodory outer measure (or outer measure, for brevity) if:
(i) µ(∅) = 0;
(ii) [σ-subadditivity] for all E₁, E₂, ... ⊂ S, $\mu\bigl(\bigcup_{i=1}^\infty E_i\bigr) \le \sum_{i=1}^\infty \mu(E_i)$; and
(iii) [monotonicity] if E₁ ⊂ E₂ are subsets of S, then µ(E₁) ≤ µ(E₂).
The essential difference between outer measures and ordinary measures is countable additivity.
1.1 Definition Throughout, we restrict attention to S = Rd , endowed with its Euclidean metric topology and its Borel σ-field.
Given fixed numbers s ≥ 0 and ε > 0, and given a set E ⊂ Rᵈ, we define
$$H^s_\varepsilon(E) = \inf\Bigl\{\sum_{j=1}^\infty (2r_j)^s : E \subset \bigcup_{j=1}^\infty B(x_j;r_j),\ \sup_j r_j \le \varepsilon\Bigr\},$$
where B(x; r) denotes, as usual, the open ℓ∞-ball of radius r > 0 about x ∈ Rᵈ, and $\bar E$ the (Euclidean) closure of E ⊂ Rᵈ. It is not hard to deduce the following.

Exercise 1.1.1 Show that H^s_ε is an outer measure on the subsets of Rᵈ. Moreover, ε ↦ H^s_ε(E) is nonincreasing. Therefore, we can unambiguously define
$$H^s(E) = \lim_{\varepsilon\to 0^+} H^s_\varepsilon(E), \qquad E \subset \mathbb R^d.$$
It immediately follows that H^s is an outer measure. In fact, it is much more than that. Recall that a set E ⊂ Rᵈ is measurable for an outer measure µ if
$$\mu(B) = \mu(B\cap E) + \mu(B\setminus E), \qquad \forall B \subset \mathbb R^d.$$
Recall, too, that the collection of all measurable sets of µ is a σ-field.

Theorem 1.1.1 For any s ≥ 0, H^s is a measure on its measurable sets. Moreover, all Borel sets are measurable for any and all of the outer measures H^s. We can, therefore, view H^s as a measure on the Borel σ-field of Rᵈ. It is called the s-dimensional Hausdorff measure on Rᵈ.

Proof Throughout, s ≥ 0 is held fixed. In light of Exercise 1.1.1, that H^s is a measure on its measurable sets is a consequence of Carathéodory's extension theorem of measure theory. We finish the proof by showing that Borel sets are measurable for H^s. That is, we need to show that for any Borel set E ⊂ Rᵈ,
$$H^s(B) = H^s(B\cap E) + H^s(B\setminus E), \qquad \forall B \subset \mathbb R^d.$$
Since the collection of all measurable subsets of Rᵈ is a σ-field, it suffices to prove that the above holds for all closed sets E ⊂ Rᵈ (why?). Thanks to Exercise 1.1.1, it suffices to show that for all closed sets E ⊂ Rᵈ and for all B ⊂ Rᵈ,
$$H^s(B) \ge H^s(B\cap E) + H^s(B\setminus E). \tag{1}$$
With this in mind, let us hold fixed a closed set E ⊂ Rd and an arbitrary set B ⊂ Rd . If either E ⊂ B or B ⊂ E, then (1) holds trivially. Therefore,
1 Preliminaries
513
we may assume that E △ B ≠ ∅.¹ We may also assume, without loss of any generality, that H^s(B) < ∞, for otherwise there is nothing left to prove. For any integer j ≥ 1, let Bⱼ denote the collection of all points x in B such that the distance between x and E is at least j⁻¹. Of course, Bⱼ and B ∩ E are disjoint. Thus, by the definition of Hausdorff measure, and by covering B ∩ E and Bⱼ separately,
$$H^s(B\cap E) + H^s(B_j) \le H^s(B). \tag{2}$$
(Why?) There are two cases to consider at this point. First, suppose there exists j such that Bⱼ = B. In this case, equation (1) follows immediately from equation (2). Next, let us suppose that Bⱼ is a proper subset of B for all j ≥ 1. This means that for all j ≥ 1, Bⱼ is a subset of Bⱼ₊₁ (why?). Consequently, we may write B \ E as the following disjoint union:
$$B\setminus E = B_j \cup \bigcup_{k=j}^\infty (B_{k+1}\setminus B_k).$$
By the first part of the theorem,
$$H^s(B\setminus E) = H^s(B_j) + \sum_{k=j}^\infty H^s(B_{k+1}\setminus B_k).$$
In particular, lim_{j→∞} H^s(Bⱼ) = H^s(B \ E). Equation (1) now follows from equation (2), and our proof is complete.

In order to better understand these measures, let us first consider Hausdorff measures of integral dimensions.

Lemma 1.1.1 H⁰ is counting measure on Borel subsets of Rᵈ, and H^d is d-dimensional Lebesgue's measure on the subsets of Rᵈ.

Proof The remark about H⁰ is immediate. To prove the remainder of the lemma, write Leb, H_ε, and H for Leb_d, H^d_ε, and H^d, respectively. Clearly, for any Borel set E ⊂ Rᵈ,
$$H_\varepsilon(E) = \inf\Bigl\{\sum_{j=1}^\infty \mathrm{Leb}\bigl(B(x_j;r_j)\bigr) : E \subset \bigcup_{j=1}^\infty B(x_j;r_j),\ \sup_j r_j \le \varepsilon\Bigr\}.$$
As ε → 0+, the right-hand side converges to Leb(E), by the definition of Lebesgue's measure, and the left-hand side to H(E), by the definition of Hausdorff's measure. The result follows.
¹Recall that E △ B = (E \ B) ∪ (B \ E).
Exercise 1.1.2 Suppose we defined H^s based on ℓᵖ-balls rather than ℓ∞-balls, where 1 ≤ p < ∞. Show that when 1 ≤ s ≤ d is integral, for all compact E, H^s(E) = c_p Leb(E), where c_p is a constant. Compute c_p explicitly.

In fact, for nonintegral values of s, we always have the following.

Lemma 1.1.2 For all Borel sets E ⊂ Rᵈ and for all real numbers s ≥ 0, H^s(E) ≥ Leb_s(E).

Proof Let us fix ε > 0 and choose arbitrary closed ℓ∞-balls B₁, B₂, ... whose radii r₁, r₂, ... are all less than or equal to ε. Whenever E ⊂ ∪ᵢ Bᵢ, then
$$\mathrm{Leb}_s(E) \le \sum_j \mathrm{Leb}_s(B_j) = \sum_j (2r_j)^s.$$
Taking the infimum over all such balls, we conclude that Leb_s(E) ≤ H^s_ε(E). Let ε → 0+ to obtain the lemma.

What if s > d? In this case, the following identifies H^s as the trivial measure.

Lemma 1.1.3 Given a Borel set E ⊂ Rᵈ:
(i) whenever s < t and H^s(E) < ∞, then H^t(E) = 0; and
(ii) whenever s > t and H^s(E) > 0, then H^t(E) = ∞.

Exercise 1.1.3 Prove Lemma 1.1.3.
Next, we state an invariance property of Hausdorff measures. In light of Lemma 1.1.1, the following extends the usual invariance properties of Lebesgue's measure on Rᵈ under the action of Poincaré motions.²

Lemma 1.1.4 Given numbers r, s ≥ 0, a point x ∈ Rᵈ, and a Borel set E ⊂ Rᵈ,
$$H^s(rE + x) = r^s H^s(E),$$
where 0⁰ = 0 and rE + x = {y ∈ Rᵈ : y = rz + x for some z ∈ E}. Moreover, for any (d × d) unitary rotation matrix O, H^s(OE) = H^s(E).

Exercise 1.1.4 Prove Lemma 1.1.4.
²A function f : Rᵈ → Rᵈ is a Poincaré motion if there exists a (d × d) unitary rotation matrix O and a vector β ∈ Rᵈ such that for all x ∈ Rᵈ, f(x) = Ox + β.
Finally, we mention that there is a relationship between Hausdorff measures and the metric entropy of Section 2.1, Chapter 5. Let d denote both the dimension of Rᵈ and the metric induced by the ℓ∞-norm on Rᵈ (while this is abusing our notation, no confusion should arise). Recall that for any totally bounded set E ⊂ Rᵈ, D(ε; E) = D(ε; E, d) denotes the minimum number of balls of radius at most ε required to cover E. This immediately gives the following.

Lemma 1.1.5 For any totally bounded E ⊂ Rᵈ and for all s ≥ 0,
$$H^s(E) \le \liminf_{\varepsilon\to 0^+}\,(2\varepsilon)^s\,D(\varepsilon;E).$$
1.2 Hausdorff Dimension

To any Borel set E ⊂ Rᵈ we associate a number dim(E) as follows:
$$\dim(E) = \sup\{s > 0 : H^s(E) = \infty\},$$
where sup ∅ = 0. This is the Hausdorff dimension of the set E. By Lemma 1.1.3, the above is well-defined. Furthermore (why?),
$$\dim(E) = \inf\{s > 0 : H^s(E) = 0\}.$$
A simple but important property of Hausdorff dimension is monotonicity.

Lemma 1.2.1 (Monotonicity) If E ⊂ B are Borel subsets of Rᵈ, then dim(E) ≤ dim(B).

Exercise 1.2.1 Prove the monotonicity lemma, Lemma 1.2.1.
In light of Lemma 1.1.5, it is easy to obtain upper bounds for the Hausdorff dimension of a Borel set E ⊂ Rᵈ:

Lemma 1.2.2 Suppose E ⊂ Rᵈ is a Borel set with the following property: there exists a sequence ε_k ↓ 0 for which there are N_k closed ℓ∞-balls of radius ε_k > 0 that cover E. Then
$$\dim(E) \le \inf\bigl\{s > 0 : \liminf_{k\to\infty} \varepsilon_k^s N_k = 0\bigr\}.$$
We apply this to two examples.

Example 1 [Cantor's Tertiary Set] Let C₀ = [0, 1], C₁ = [0, 1/3] ∪ [2/3, 1], C₂ = [0, 1/9] ∪ [2/9, 1/3] ∪ [2/3, 7/9] ∪ [8/9, 1], etc. In general, we obtain C_{k+1} by removing the middle third of every interval subset of C_k; cf. Figure C.1. Cantor's tertiary set is simply defined by C = ∩_{k=0}^∞ C_k. This is clearly a nonempty compact set. On the other hand, for any k ≥ 0, C_k is made up of 2^k disjoint intervals of length 3^{-k}.
Figure C.1: The first 3 stages in the construction of Cantor's tertiary set

Thus, Leb(C_k) = (2/3)^k, which goes to 0 as k → ∞. Since C_k ↓ C, this implies that Leb(C) = 0. Let ε_k = 3^{-k} and note that C_k is made up of N_k = 2^k ℓ∞-balls of radius ε_k. Since C ⊂ C_k, by Lemma 1.2.2,
$$\dim(C) \le \inf\bigl\{s > 0 : \liminf_{k\to\infty} 2^k 3^{-ks} = 0\bigr\} = \frac{\ln 2}{\ln 3}.$$
This turns out to be the correct value of the Hausdorff dimension of the tertiary Cantor set, as we shall see later on in Exercise 2.1.1.

Example 2 Recall from Section 2.5, Chapter 5, that a function f : [0, 1]ᵈ → Rᵏ is said to be Hölder continuous of order α > 0 whenever there exists a finite positive constant Γ such that for all s, t ∈ [0, 1]ᵈ, |f(t) − f(s)| ≤ Γ|t − s|^α. By Taylor's expansion, the only cases of interest are α ∈ ]0, 1]. If f is such a function, then for all Borel sets E ⊂ [0, 1]ᵈ,
$$\dim\{f(E)\} \le \frac{1}{\alpha}\dim(E) \wedge k. \tag{1}$$
Indeed, let us hold fixed a constant γ > α⁻¹ dim(E). By Lemma 1.1.3, we can find a collection of closed ℓ∞-balls B_{1,ε}, B_{2,ε}, ... of radii r_{1,ε}, r_{2,ε}, ... such that:
• sup_j r_{j,ε} ≤ ε;
• ∪_{j=1}^∞ B_{j,ε} ⊃ E; and
• lim sup_{ε→0+} Σ_{j=1}^∞ (2r_{j,ε})^{αγ} < ∞.
By the Hölder condition on f, each f(B_{j,ε}) is inside a closed ℓ∞-ball X_{j,ε} of radius Γr_{j,ε}^α ≤ δ = Γε^α. Since f(E) ⊂ ∪_{j=1}^∞ X_{j,ε},
$$H^\gamma_\delta\{f(E)\} \le \sum_{j=1}^\infty \bigl(2\Gamma r_{j,\varepsilon}^\alpha\bigr)^\gamma,$$
which remains bounded as δ → 0+. Thus, H^γ{f(E)} < ∞, which implies that dim{f(E)} ≤ γ. Since this holds for every γ > α⁻¹ dim(E), we can deduce that dim{f(E)} ≤ α⁻¹ dim(E). On the other hand, since f(E) ⊂ Rᵏ,
by the monotonicity lemma (Lemma 1.2.1), dim{f(E)} ≤ k. The claim (1) follows.

Exercise 1.2.2 Show that for all compact sets E₁, E₂, ... in Rᵈ,
$$\dim\Bigl(\bigcup_{i=1}^\infty E_i\Bigr) = \sup_i \dim(E_i).$$
This is the inner regularity of Hausdorff dimensions.
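The covering bound of Lemma 1.2.2 and Example 1 can be checked mechanically. The following sketch is our own illustration (the function name is ours): it generates the intervals that make up C_k and confirms that the covering exponent at every stage equals ln 2/ln 3.

```python
import math

def cantor_intervals(k):
    """The 2^k closed intervals whose union is C_k."""
    ivs = [(0.0, 1.0)]
    for _ in range(k):
        nxt = []
        for a, b in ivs:
            third = (b - a) / 3.0
            nxt += [(a, a + third), (b - third, b)]   # drop the middle third
        ivs = nxt
    return ivs

log2_over_log3 = math.log(2.0) / math.log(3.0)
for k in range(1, 12):
    ivs = cantor_intervals(k)
    assert len(ivs) == 2 ** k                 # N_k = 2^k balls ...
    eps_k = 3.0 ** (-k)                       # ... of radius eps_k = 3^{-k}
    slope = math.log(len(ivs)) / math.log(1.0 / eps_k)
    # the covering exponent is exactly ln 2 / ln 3 at every stage
    assert abs(slope - log2_over_log3) < 1e-12
```

The bound of Lemma 1.2.2 is therefore ln 2/ln 3, matching the computation in Example 1.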
2 Frostman's Theorems

In order to compute Hausdorff dimensions, we need a way of obtaining good lower bounds. This is the gist of two theorems due to O. Frostman.
2.1 Frostman's Lemma

The following is the first theorem of Frostman, which, for historical reasons, is called Frostman's lemma.

Theorem 2.1.1 (Frostman's Lemma) For a compact set E ⊂ Rᵈ and for s ≥ 0, the following are equivalent:
(i) H^s(E) > 0;
(ii) there exists a probability measure µ on E such that
$$\limsup_{r\to 0^+}\ r^{-s}\sup_{x\in\mathbb R^d} \mu\bigl(B(x;r)\bigr) < \infty.$$

Proof Interestingly enough, the more useful half of this theorem is the easier one to prove. We begin with this case, which is (ii) ⇒ (i). Since µ ∈ P(E), (ii) is equivalent to the following:
$$A = \sup_{r>0}\,\sup_{x\in\mathbb R^d} \frac{\mu\{B(x;r)\}}{r^s} < \infty.$$
Now let B₁, B₂, ... denote any sequence of closed ℓ∞-balls of radii r₁, r₂, ..., all bounded above by ε. Then,
$$\sum_{j=1}^\infty (2r_j)^s \ge \frac{1}{2^sA}\sum_{j=1}^\infty \mu\{B(x_j;r_j)\} \ge \frac{1}{2^sA}\,\mu\Bigl(\bigcup_{j=1}^\infty B(x_j;r_j)\Bigr).$$
If, in addition, B₁, B₂, ... cover E, then the right-hand side is greater than or equal to 2⁻ˢA⁻¹µ(E) = 2⁻ˢA⁻¹. That is, we have shown that for all
ε > 0, H^s_ε(E) ≥ 2⁻ˢA⁻¹. Let ε → 0 to obtain (i). Note that in this part of the proof we did not need compactness.

Suppose (i) holds; we are to verify the truth of (ii). Since E is compact, there exists a finite constant M > 0 such that E ⊂ [−M, M]ᵈ; equivalently, (2M)⁻¹E + (1/2, ..., 1/2) ⊂ [0, 1]ᵈ. By the invariance properties of Hausdorff measures (Lemma 1.1.4), we may and will assume that E ⊂ [0, 1]ᵈ; otherwise, apply the result for compact subsets of [0, 1]ᵈ to this translated and scaled copy of E. From now on, suppose E ⊂ [0, 1]ᵈ is compact and H^s(E) > 0. The latter positivity condition implies that i := inf_{ε>0} H^s_ε(E) > 0 (why?). That is, for any collection of closed ℓ∞-balls B₁, B₂, ... whose radii are r₁, r₂, ..., respectively,
$$E \subset \bigcup_{j=1}^\infty B_j \ \Longrightarrow\ \sum_{j=1}^\infty (2r_j)^s \ge i. \tag{1}$$
Let D_n denote the collection of all closed dyadic ℓ∞-balls of side 2⁻ⁿ:
$$I \in \mathcal D_n \iff \exists k \in \mathbb Z^d : I = \prod_{j=1}^d \bigl[k^{(j)}2^{-n},\ (k^{(j)}+1)2^{-n}\bigr].$$
Since E ⊂ [0, 1]ᵈ, there must exist a maximal $\bar n \ge 0$ and some $I \in \mathcal D_{\bar n}$ such that E ⊂ I. We shall need this $\bar n$ throughout the proof. For all integers $n > \bar n$, we can define a measure µ_{n,n} on Rᵈ as follows: for each I ∈ D_n, if I ∩ E ≠ ∅, then µ_{n,n} uniformly assigns mass 2⁻ⁿˢ to I; otherwise, µ_{n,n}(I) ≡ 0. More precisely, if µ|_F denotes the restriction of the measure µ to the set F, for every integer $n > \bar n$ and all I ∈ D_n, we define
$$\mu_{n,n}\big|_I = \begin{cases} 2^{-ns}\,\dfrac{\mathrm{Leb}|_I}{\mathrm{Leb}(I)}, & \text{if } E\cap I \ne \emptyset,\\[4pt] 0, & \text{if } I\cap E = \emptyset,\end{cases}$$
where Leb denotes Lebesgue's measure on Rᵈ. We would like to let n → ∞ and obtain a limiting measure. In order to ensure that µ_{n,n} remains nontrivial, it needs to be slightly modified. For all I ∈ D_{n−1}, define
$$\mu_{n,n-1}\big|_I = \begin{cases} \mu_{n,n}|_I, & \text{if } \mu_{n,n}(I) \le 2^{-(n-1)s},\\[6pt] \dfrac{2^{-(n-1)s}}{\mu_{n,n}(I)}\,\mu_{n,n}|_I, & \text{if } \mu_{n,n}(I) > 2^{-(n-1)s}.\end{cases}$$
We continue in this way to define a measure µ_{n,n−j−1} from µ_{n,n−j}, for all $0 \le j < n - \bar n$. That is, for each such j and for all I ∈ D_{n−j−1},
$$\mu_{n,n-j-1}\big|_I = \begin{cases} \mu_{n,n-j}|_I, & \text{if } \mu_{n,n-j}(I) \le 2^{-(n-j-1)s},\\[6pt] \dfrac{2^{-(n-j-1)s}}{\mu_{n,n-j}(I)}\,\mu_{n,n-j}|_I, & \text{if } \mu_{n,n-j}(I) > 2^{-(n-j-1)s}.\end{cases}$$
At the end of this construction you should verify that we have a measure $\mu_{n,\bar n}$ with the following properties:
P1 $\mu_{n,\bar n}$ is a nonatomic measure on E.
P2 For any j ∈ {0, 1, ..., n} and all I ∈ D_{n−j}, $\mu_{n,\bar n}(I) \le 2^{-(n-j)s}$.
P3 For each x ∈ E, there exist an integer j ∈ {0, ..., n} and some I ∈ D_{n−j} such that x ∈ I and $\mu_{n,\bar n}(I) = 2^{-(n-j)s}$. Equivalently, $\mu_{n,\bar n}(I) = \{2\,\mathrm{rad}(I)\}^s$, where rad(I) denotes the ℓ∞-radius (half the side) of the box I.

By P3, we can find closed ℓ∞-balls I₁, ..., I_m such that: (i) E ⊂ ∪_{j=1}^m I_j; (ii) the interiors of I₁, ..., I_m are disjoint; and (iii) $\mu_{n,\bar n}(I_j) = \{2\,\mathrm{rad}(I_j)\}^s$. Since $\mu_{n,\bar n}$ is nonatomic (P1), for each $n > \bar n$,
$$\mu_{n,\bar n}(E) = \sum_{j=1}^m \mu_{n,\bar n}(I_j) = \sum_{j=1}^m \{2\,\mathrm{rad}(I_j)\}^s.$$
By (1), for all $n > \bar n$,
$$\mu_{n,\bar n}(E) \ge i. \tag{2}$$
Define, for each $n > \bar n$,
$$\mu_n = \frac{\mu_{n,\bar n}}{\mu_{n,\bar n}(E)}.$$
Thus, $\{\mu_n;\ n > \bar n\}$ is a collection of probability measures on E. We can apply (2) and P2 to deduce that for every I ∈ D_{n−j} ($0 \le j < n - \bar n$),
$$\mu_n(I) \le \frac{1}{i}\{2\,\mathrm{rad}(I)\}^s.$$
For any closed ℓ∞-ball B of radius $2^{-(n-j)}$ ($0 \le j < n - \bar n$), we can find (not necessarily distinct) $J_1, \ldots, J_{2^d} \in \mathcal D_{n-j}$ such that $B \subset \bigcup_{\ell=1}^{2^d} J_\ell$; cf. Figure 10.1 of Chapter 10. Thus,
$$\mu_n(B) \le \sum_{\ell=1}^{2^d} \mu_n(J_\ell) \le \frac{2^d}{i}\{2\,\mathrm{rad}(B)\}^s.$$
For any ℓ∞-ball B of radius at most $2^{-\bar n}$, there exists a ball B′ ⊃ B such that $\mathrm{rad}(B') = 2^{-(n-j)} \le 2\,\mathrm{rad}(B)$, for some $0 \le j < n - \bar n$. Thus, for any closed ℓ∞-ball B of radius at most $2^{-\bar n}$, $\mu_n(B) \le 2^{d+1+2s}\,i^{-1}\{\mathrm{rad}(B)\}^s$. On the other hand, if $\mathrm{rad}(B) > 2^{-\bar n}$, then $\mu_n(B) \le 1 \le 2^{\bar n s}\{\mathrm{rad}(B)\}^s$. Thus, for all closed ℓ∞-balls B,
$$\mu_n(B) \le A\,\{\mathrm{rad}(B)\}^s, \tag{3}$$
where $A = \max\{2^{d+1+2s}\,i^{-1},\ 2^{\bar n s}\}$. Thus, we have constructed a sequence of probability measures $\{\mu_n;\ n > \bar n\}$, all on the compact set E, all of
which satisfy (3). Let µ denote any one of its subsequential limits. Since E is compact, the µ_n's form a tight sequence; consequently, µ is a probability measure on E. By (3), this is the desired probability measure. Indeed, for this measure µ,
$$\limsup_{r\to 0^+}\ \sup_{x\in\mathbb R^d} \frac{\mu\bigl(B(x;r)\bigr)}{r^s} \le A.$$
This completes the proof.
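The coarsening scheme in the proof above is entirely algorithmic, and it may help to see it run. The following one-dimensional sketch is our own illustration (all names, and the choice of E as a triadic approximation to the Cantor set, are ours): it takes s = ln 2/ln 3, starts from mass 2^{-ns} on each level-n dyadic interval meeting E, caps the masses one dyadic level at a time as in the definition of the measures µ_{n,n−j−1}, and then checks property P2.

```python
import math

s = math.log(2.0) / math.log(3.0)

def meets_cantor(a, b, depth=40):
    """Does the interval [a,b] (inside [0,1]) intersect Cantor's tertiary set?"""
    if depth == 0:
        return True
    if b < 1.0 / 3.0:
        return meets_cantor(3.0 * a, 3.0 * b, depth - 1)
    if a > 2.0 / 3.0:
        return meets_cantor(3.0 * a - 2.0, 3.0 * b - 2.0, depth - 1)
    if a > 1.0 / 3.0 and b < 2.0 / 3.0:
        return False             # [a,b] sits inside a removed middle third
    return True                  # [a,b] contains 1/3 or 2/3, both in C

n = 10
N = 2 ** n
# start: mass 2^{-ns} on every level-n dyadic interval that meets E
mass = [2.0 ** (-n * s) if meets_cantor(i / N, (i + 1) / N) else 0.0
        for i in range(N)]
# coarsen one dyadic level at a time, capping each interval's total mass
for level in range(n - 1, -1, -1):
    width = 2 ** (n - level)     # number of level-n children per interval
    cap = 2.0 ** (-level * s)
    for i in range(2 ** level):
        lo, hi = i * width, (i + 1) * width
        tot = sum(mass[lo:hi])
        if tot > cap:            # scale the whole block down to the cap
            f = cap / tot
            for j in range(lo, hi):
                mass[j] *= f
# property P2: every dyadic interval at level m carries mass <= 2^{-ms}
for m in range(n + 1):
    w = 2 ** (n - m)
    for i in range(2 ** m):
        assert sum(mass[i * w:(i + 1) * w]) <= 2.0 ** (-m * s) + 1e-12
assert sum(mass) > 0.0           # the capped measure stays nontrivial
```

The final positivity check is the numerical counterpart of the lower bound (2).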
Frostman's lemma can be used to complete the calculation of Example 1, as the following shows.

Exercise 2.1.1 Use Frostman's lemma to show that the Hausdorff dimension of Cantor's tertiary set is equal to ln 2/ln 3. (Hint: In the notation of Example 1 of Section 1.2, construct a measure on C_n by putting equal mass on all the intervals that form C_n. (a) If this measure is denoted by µ_n, estimate the µ_n-measure of a small ball, uniformly in n. (b) Show that the µ_n's have a subsequential weak limit µ∞ that is necessarily a probability measure on Cantor's tertiary set. (c) Estimate the µ∞-measure of a small ball.)

Corollary 2.1.1 Let C denote Cantor's tertiary set in [0, 1] and define s = ln 2/ln 3. Then, H^s(C) > 0. In particular, dim(C) = ln 2/ln 3.
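The limit measure µ∞ of the hint is the Stieltjes measure of the Cantor ("devil's staircase") function, so the ball estimate can be probed numerically. The sketch below is our own illustration (the bound 4rˢ and all names are ours; the constant follows from a short covering argument): it checks that µ∞(B(x; r)) ≤ 4rˢ with s = ln 2/ln 3, which is exactly condition (ii) of Frostman's lemma.

```python
import math

s = math.log(2.0) / math.log(3.0)

def cantor_function(x, depth=45):
    """Cantor ('devil's staircase') function, computed from ternary digits."""
    x = min(max(x, 0.0), 1.0)
    total, scale = 0.0, 0.5
    for _ in range(depth):
        x *= 3.0
        d = min(int(x), 2)
        x -= d
        if d == 1:               # x lies in a removed middle third: flat part
            return total + scale
        if d == 2:
            total += scale
        scale *= 0.5
    return total

def ball_mass(x, r):
    """mu(B(x;r)) for the Cantor measure mu (Stieltjes measure of the above)."""
    return cantor_function(x + r) - cantor_function(x - r)

# check mu(B(x;r)) <= 4 r^s over a grid of centers and a range of radii
worst = 0.0
for k in range(1, 25):
    r = 1.5 * 3.0 ** (-k)
    for i in range(81):
        x = i / 80.0
        worst = max(worst, ball_mass(x, r) / r ** s)
assert worst <= 4.001
```

Any uniform bound of this kind, combined with Frostman's lemma, yields H^s(C) > 0 and hence dim(C) = ln 2/ln 3.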
2.2 Frostman's Theorem and Bessel–Riesz Capacities

Given a Borel set E ⊂ Rᵈ, we write P(E) for the collection of all probability measures on E. That is, µ ∈ P(E) if and only if µ(U) = 0 for all open sets U in the complement of E. When E is compact, P(E) is exactly the collection of all probability measures whose closed support is contained in E. This is an important property and makes the measure theory of compact sets particularly nice. Much of the time we shall concentrate on compacta for this very reason.

For any β > 0 and any probability measure µ on Rᵈ, we define the β-dimensional (Bessel–Riesz) energy of µ as
$$\mathrm{Energy}_\beta(\mu) = \iint \|x-y\|^{-\beta}\,\mu(dy)\,\mu(dx).$$
The 0-dimensional Bessel–Riesz energy of µ is given by
$$\mathrm{Energy}_0(\mu) = \iint \ln_+\Bigl(\frac{1}{\|x-y\|}\Bigr)\,\mu(dy)\,\mu(dx).$$
Finally, when β < 0, we define Energy_β(µ) = 1 for any probability measure µ.
For all β ∈ R, the β-dimensional Bessel–Riesz capacity of a Borel set E ⊂ Rᵈ can now be defined by the principle of minimum energy as follows:
$$\mathrm{Cap}_\beta(E) = \Bigl[\inf_{\mu\in\mathcal P(E)} \mathrm{Energy}_\beta(\mu)\Bigr]^{-1}.$$
Frostman's second theorem (henceforth Frostman's theorem) concerns the following beautiful connection between these capacities and measure-theoretic objects such as Hausdorff measures.

Theorem 2.2.1 (Frostman's Theorem) Let E ⊂ Rᵈ be a compact set. Whenever s > β > 0,
$$\mathrm{Cap}_s(E) > 0 \implies H^s(E) > 0 \implies \mathrm{Cap}_\beta(E) > 0.$$
In particular,
$$\dim(E) = \sup\bigl\{\beta > 0 : \mathrm{Cap}_\beta(E) > 0\bigr\} = \inf\bigl\{\beta > 0 : \mathrm{Cap}_\beta(E) = 0\bigr\}.$$
The proof of this result requires a simple lemma that is stated as an exercise.

Exercise 2.2.1 Show that for all compact E ⊂ Rᵈ,
$$\sup\bigl\{\beta > 0 : \mathrm{Cap}_\beta(E) > 0\bigr\} = \inf\bigl\{\beta > 0 : \mathrm{Cap}_\beta(E) = 0\bigr\}.$$
Thus, Frostman's theorem identifies this number with the Hausdorff dimension of E.

Proof Suppose there exists an s > 0 such that H^s(E) > 0. By Lemmas 1.1.1 and 1.1.3, this means implicitly that 0 < s ≤ d. By Frostman's lemma (Theorem 2.1.1) we can find a probability measure µ on E and a finite constant A > 0 such that
$$\sup_{x\in\mathbb R^d} \int_{y:|x-y|\le r} \mu(dy) \le Ar^s, \qquad \forall r > 0.$$
Since ‖x − y‖ ≥ |x − y|,
$$\sup_{x\in\mathbb R^d} \int_{y:\|x-y\|\le r} \mu(dy) \le Ar^s, \qquad \forall r > 0. \tag{1}$$
Let D denote any finite number greater than the diameter of E. Then, for all 0 < β < s,
$$\begin{aligned} \mathrm{Energy}_\beta(\mu) &= \iint \|x-y\|^{-\beta}\,\mu(dx)\,\mu(dy)\\ &= \sum_{j=0}^\infty \iint_{2^{-j-1}D\le\|x-y\|<2^{-j}D} \|x-y\|^{-\beta}\,\mu(dx)\,\mu(dy)\\ &\le \sum_{j=0}^\infty 2^{(j+1)\beta} D^{-\beta} \sup_{x\in\mathbb R^d} \int_{y:\|x-y\|\le 2^{-j}D} \mu(dy).\end{aligned}$$
We have used the fact that µ(E) = 1. Equation (1) can now be invoked to show that
$$\mathrm{Energy}_\beta(\mu) \le AD^{s-\beta}\sum_{j=0}^\infty 2^{-js+(j+1)\beta},$$
which is finite. That is, we have shown that
$$\exists s > 0 : H^s(E) > 0 \implies \forall \beta \in\,]0,s[\,:\ \mathrm{Cap}_\beta(E) > 0.$$
In particular, we obtain the following immediately:
$$\sup\bigl\{\beta > 0 : \mathrm{Cap}_\beta(E) > 0\bigr\} \ge \dim(E).$$
This proves one half of the result.

Next, let us suppose there exists s ∈ ]0, d] such that Cap_s(E) > 0. There must exist µ ∈ P(E) such that Energy_s(µ) < ∞. The key estimate in this half is the following maximal inequality: for all λ > 0,
$$\mu\Bigl\{x \in E : \sup_{r>0} \frac{\mu\{B(x;r)\}}{r^s} \ge \lambda\Bigr\} \le \frac{2^{sd/2}}{\lambda}\,\mathrm{Energy}_s(\mu). \tag{2}$$
Let us prove this first. Since ‖x − y‖ ≤ 2^{d/2}|x − y|, for any x ∈ Rᵈ and for all r > 0,
$$\mu\{B(x;r)\} \le \int_{y:\|y-x\|\le 2^{d/2}r} \mu(dy) \le 2^{sd/2}\,r^s \int \|x-y\|^{-s}\,\mu(dy).$$
Equation (2) follows immediately from this and Chebyshev's inequality.

Now we conclude the proof. Let λ = 2^{1+sd/2} Energy_s(µ) and
$$F = \Bigl\{x \in E : \sup_{r>0} \frac{\mu\{B(x;r)\}}{r^s} \le \lambda\Bigr\}.$$
Thus, using (2), we can conclude that F ⊂ E and µ(F) ≥ 1/2. Let us cover F with closed ℓ∞-balls B₁, B₂, ... with radii r₁, r₂, ..., respectively. We can always arrange things so that for all k ≥ 1, B_k ∩ F ≠ ∅; the centers of the B_k's lie in F; and sup_i r_i ≤ ε, where ε > 0 is a preassigned, arbitrary number. Note that for our current choice, µ(B_j) ≤ λr_j^s for all j ≥ 1. Thus,
$$\tfrac12 \le \mu(F) \le \sum_{j=1}^\infty \mu(B_j) \le 2^{-s}\lambda \sum_{j=1}^\infty (2r_j)^s.$$
We have used (2) in the last inequality; in particular, we have used the fact that the centers of the B_j's lie in F. Taking the infimum over all such choices of B₁, B₂, ..., we can deduce that 1/2 ≤ 2⁻ˢλH^s_ε(F). Let ε → 0+ and
invoke monotonicity (Lemma 1.2.1) to conclude that whenever Cap_s(E) > 0, H^s(E) ≥ 2^{s-1}λ⁻¹ > 0. In particular,
$$\dim(E) \ge \sup\bigl\{s > 0 : \mathrm{Cap}_s(E) > 0\bigr\}.$$
By Exercise 2.2.1, sup{s > 0 : Cap_s(E) > 0} = inf{s > 0 : Cap_s(E) = 0}. This completes our proof.
Exercise 2.2.2 Supposing that β > 0 and E ⊂ Rᵈ is compact, verify that for all finite constants c > 0, Cap_β(cE) = c^β Cap_β(E), where cE = {cy : y ∈ E}.
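To make the capacity criterion concrete, here is a numerical sketch of our own (not from the text). For µ the uniform measure on [0, 1], a direct calculation gives Energy_β(µ) = 2/((1 − β)(2 − β)) when 0 < β < 1, while the energy integral diverges for β ≥ 1; by Frostman's theorem this recovers dim[0, 1] = 1. A Monte Carlo estimate matches the closed form:

```python
import math
import random

random.seed(0)

def riesz_energy_mc(beta, n=100000):
    """Monte Carlo estimate of Energy_beta(mu) for mu = uniform on [0,1]."""
    acc = 0.0
    for _ in range(n):
        x, y = random.random(), random.random()
        acc += abs(x - y) ** (-beta)
    return acc / n

beta = 0.4
exact = 2.0 / ((1.0 - beta) * (2.0 - beta))   # closed form, valid for beta < 1
est = riesz_energy_mc(beta)
assert abs(est - exact) < 0.05
```

For β ≥ 1 the same estimator keeps growing as n increases, reflecting Cap_β([0, 1]) = 0 in that range.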
2.3 Taylor's Theorem

The first half of Frostman's theorem (Theorem 2.2.1) states that if a compact set E ⊂ Rᵈ has positive s-dimensional Bessel–Riesz capacity for some s > 0, then it has positive s-dimensional Hausdorff measure. In fact, one can prove slightly more. While the following is only a consequence of a theorem of S. J. Taylor, we will refer to it as Taylor's theorem, nonetheless.

Theorem 2.3.1 (Taylor's Theorem) Let E ⊂ Rᵈ be a compact set such that Cap_s(E) > 0 for some s > 0. Then, H^s(E) = +∞.

We first need a technical lemma.

Lemma 2.3.1 Suppose µ is a probability measure on a compact set E ⊂ Rᵈ and s > 0. Define D to be the outradius of E; that is, D = sup{‖x‖ : x ∈ E}. Then, D⁻ˢS ≤ Energy_s(µ) ≤ 2ˢD⁻ˢS, where
$$S = \sum_{j=0}^\infty 2^{js}\,\mu\times\mu\bigl\{(a,b)\in E\times E : 2^{-j-1}D < \|a-b\| \le 2^{-j}D\bigr\}.$$

Exercise 2.3.1 Prove Lemma 2.3.1.
Our proof of Taylor's theorem will also require an elementary technical result about real numbers. It, too, is stated as an exercise.

Exercise 2.3.2 Suppose a₁, a₂, ... is a summable sequence of real numbers. Then there exists a sequence b₁, b₂, ... such that lim_{n→∞} b_n = +∞ and yet Σ_n a_n b_n < ∞.
Exercise 2.3.3 Suppose µ is a finite, nonatomic measure on some locally compact, separable metric space S. Let D denote the diagonal of S × S (endowed with the product topology); that is, D = {(x, x) : x ∈ S}. Prove that µ × µ(D) = 0. What if µ is σ-finite?

Proof of Theorem 2.3.1 If Cap_s(E) > 0, there must exist some probability measure µ ∈ P(E) that has finite s-dimensional Bessel–Riesz energy. By Lemma 2.3.1, this is equivalent to the condition
$$\sum_{j=0}^\infty 2^{js}\,\mu\times\mu\bigl\{(a,b)\in E\times E : 2^{-j-1}D < \|a-b\| \le 2^{-j}D\bigr\} < \infty,$$
where D denotes the outradius of E. By Exercise 2.3.2, we can find a nonincreasing function h : R₊ → R₊ ∪ {+∞} such that lim_{r→0+} h(r) = +∞ and
$$\sum_{j=0}^\infty 2^{js}\,h(2^{-j-1})\,\mu\times\mu\bigl\{(a,b)\in E\times E : 2^{-j-1}D < \|a-b\| \le 2^{-j}D\bigr\} < \infty.$$
Define κ(x) = xˢ/h(x) for all x ∈ R₊. Clearly, lim_{x→0+} κ(x)/xˢ = 0. Equivalently,
$$\forall M > 0,\ \exists \varepsilon_0 > 0 :\ \forall x \in\,]0,\varepsilon_0],\quad x^s \ge M\kappa(x). \tag{1}$$
Moreover, the method of proof of Lemma 2.3.1 can be used to see that the finiteness of the above double sum is equivalent to the condition
$$\iint \frac{\mu(da)\,\mu(db)}{\kappa(\|a-b\|)} < +\infty.$$
An argument similar to the proof of equation (2) can now be used to verify the existence of a finite constant A > 1 such that for all λ > 0,
$$\mu\Bigl\{x \in E : \sup_{r>0} \frac{\mu\{B(x;r)\}}{\kappa(r)} \ge \lambda\Bigr\} \le \frac{A}{\lambda}\iint \frac{\mu(da)\,\mu(db)}{\kappa(\|a-b\|)}.$$
We now continue our application of the proof of Frostman's theorem (Theorem 2.2.1) to conclude that H^κ(E) > 0, where H^κ is defined exactly like H^s(E), except that the function x ↦ xˢ is replaced by x ↦ κ(x) there. To be more precise, we can define H^κ(E) = lim_{ε→0+} H^κ_ε(E), where
$$H^\kappa_\varepsilon(E) = \inf\Bigl\{\sum_{j=1}^\infty \kappa(2r_j) : E \subset \bigcup_{j=1}^\infty B(x_j;r_j),\ \sup_j r_j \le \varepsilon\Bigr\}.$$
Since H^κ(E) > 0, there exists δ > 0 such that for all ε > 0, H^κ_ε(E) ≥ δ. By equation (1), for all M > 0 there exists ε₀ > 0 such that for all ε ∈ ]0, ε₀], H^s_ε(E) ≥ M H^κ_ε(E) ≥ Mδ. Let ε → 0+ and M → ∞, in this order, to deduce Theorem 2.3.1.

Theorem 2.3.1 has the following important consequence.
Corollary 2.3.1 For all integers d ≥ 1, Cap_d(Rᵈ) = 0.

Proof Suppose, to the contrary, that for some d ≥ 1, Cap_d(Rᵈ) > 0. Equivalently, there exists a probability measure µ ∈ P(Rᵈ) such that Energy_d(µ) < ∞. We will derive a contradiction from this.

We first claim that there must exist a compact set E ⊂ Rᵈ such that Cap_d(E) > 0. For all Borel sets G ⊂ Rᵈ and for all compact sets E ⊂ Rᵈ, define µ_E(G) = µ(E ∩ G)/µ(E), where 0 ÷ 0 = 0. Since µ ∈ P(Rᵈ), one can always find a compact set E ⊂ Rᵈ such that µ(E) > 0, in which case µ_E ∈ P(E). Moreover,
$$\mathrm{Energy}_d(\mu_E) = \Bigl(\frac{1}{\mu(E)}\Bigr)^2 \int_E\int_E \|a-b\|^{-d}\,\mu(da)\,\mu(db) \le \Bigl(\frac{1}{\mu(E)}\Bigr)^2\,\mathrm{Energy}_d(\mu) < \infty,$$
which implies that Cap_d(E) > 0. By Taylor's theorem (Theorem 2.3.1), H^d(E) = +∞. Lemma 1.1.1 now implies that Leb(E) = +∞, where Leb denotes Lebesgue's measure on Rᵈ. On the other hand, the Lebesgue measure of a compact set must be finite. This provides us with the desired contradiction.
3 Notes on Appendix C

Section 1 This is standard material and can be found, in excellent textbook form, in (Kahane 1985; Mattila 1995); see also Taylor (1986).

Section 2 Taylor's theorem can be found within (Carleson 1958; Taylor 1961).
Appendix D Energy and Capacity
Appendix C contains a discussion of Hausdorff measure and dimension that naturally leads to Bessel–Riesz energy and capacity. We now study some of the fundamental properties of rather general energy forms and capacities. This appendix also includes a brief physical discussion of the role of energy and capacity in classical electrostatics.
1 Preliminaries

In this section we aim to introduce some important notions such as energy, potentials, and capacity (Section 1.1). Deeper properties will be studied in the remainder of this appendix. Our definitions of energy and capacity can be viewed as abstractions of the Bessel–Riesz capacities introduced in Appendix C. There is also a heuristic discussion of classical electrostatics, where many of these objects first arose (Section 1.2). At the very least, Section 1.2 will partly explain the usage of the terms capacity, energy, etc. Section 1.2 can be skipped at first reading.
1.1 General Definitions

We say that a function g : Rᵈ × Rᵈ → R₊ is a gauge function (on Rᵈ × Rᵈ) if:
• for all η > 0, (x, y) ↦ g(x, y) is continuous (and finite) on O_η, where
$$O_\eta = \bigl\{(x,y) \in \mathbb R^d\times\mathbb R^d : |x-y| > \eta\bigr\}, \tag{1}$$
and where |•| denotes the ℓ∞-norm of a Euclidean vector; and
• there exists η > 0 such that g > 0 on O_η.
Recall that a gauge function g is symmetric if g(x, y) = g(y, x).

To every gauge function g on Rᵈ × Rᵈ we associate a bilinear form, called mutual energy, defined next. Given two measures µ and ν, both on Rᵈ, the mutual energy between µ and ν, corresponding to the gauge function g, is defined as
$$\langle\mu,\nu\rangle_g = \frac12\Bigl[\iint g(x,y)\,\mu(dx)\,\nu(dy) + \iint g(x,y)\,\nu(dx)\,\mu(dy)\Bigr].$$
This is always defined, since g is nonnegative; it may, however, be infinite. The energy of a finite positive measure µ with respect to the gauge function g is defined by the relation E_g(µ) = ⟨µ, µ⟩_g. Equivalently,
$$E_g(\mu) = \iint g(x,y)\,\mu(dx)\,\mu(dy).$$
Note that as long as the total variation measure |µ| has finite energy, E_g(µ) can be defined for all signed measures µ as well.

Exercise 1.1.1 Show that mutual energy can be used to define an inner product on the space of probability measures. Moreover, the seminorm corresponding to this inner product is energy. In particular, conclude that for any two probability measures ν and µ, both of finite energy,
$$\langle\mu,\nu\rangle_g^2 \le E_g(\mu)\cdot E_g(\nu).$$
There is a dramatic improvement to this result when g is nonnegative definite; consult Exercise 2.4.2 of Chapter 10, which is the beginning of the Beurling–Brelot–Cartan–Deny theory.

Finally, the corresponding capacity C_g(A) of a Borel set A ⊂ Rᵈ is defined by the principle of minimum energy of Appendix C. Namely,
$$C_g(A) = \Bigl[\inf_{\mu\in\mathcal P(A)} E_g(\mu)\Bigr]^{-1};$$
we recall that P(E) designates the collection of all probability measures on E. In this book we will be interested only in capacities of compact sets, and we may refer to E_g(µ) and C_g(E) as the g-energy of µ and the g-capacity of E, respectively.
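For a nonnegative-definite gauge, the inequality of Exercise 1.1.1 is a genuine Cauchy–Schwarz inequality. The following sketch is our own illustration (names and the particular gauge are ours): it takes g(x, y) = e^{−|x−y|}, a bounded, symmetric, nonnegative-definite gauge on R × R, and checks the inequality for randomly generated discrete probability measures.

```python
import math
import random

random.seed(2)

def g(x, y):
    # e^{-|x-y|} is a symmetric, continuous, nonnegative-definite gauge on R
    return math.exp(-abs(x - y))

def mutual_energy(mu, nu):
    """<mu,nu>_g for discrete measures given as {point: mass} dictionaries.
    (For a symmetric gauge the two double integrals in the definition agree.)"""
    return sum(mx * ny * g(x, y)
               for x, mx in mu.items() for y, ny in nu.items())

def random_prob_measure(k=6):
    pts = [random.random() for _ in range(k)]
    weights = [random.random() for _ in range(k)]
    tot = sum(weights)
    return {p: w / tot for p, w in zip(pts, weights)}

# Cauchy-Schwarz: <mu,nu>_g^2 <= E_g(mu) E_g(nu)
for _ in range(50):
    mu, nu = random_prob_measure(), random_prob_measure()
    lhs = mutual_energy(mu, nu) ** 2
    rhs = mutual_energy(mu, mu) * mutual_energy(nu, nu)
    assert lhs <= rhs + 1e-12
```

For gauges that are not nonnegative definite, the bilinear form need not be positive semidefinite, which is why Exercise 1.1.1 is only the starting point of the theory cited there.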
Remarks (a) The function g(x, y) = ‖x − y‖^{−β} is the gauge function that corresponds to the Bessel–Riesz energy and capacity of Appendix C.
(b) It is important to observe that E has positive g-capacity if and only if there exists a probability measure µ, on E, that has finite g-energy. Depending on the "shape" of the kernel g, this gives some information on how large, or thick, the set in question is. For instance, the singleton {y} has positive g-capacity if and only if g(y, y) < ∞. Thus, if g(x, y) is proportional to |x − y|^{−β} or ln₊(1/|x − y|), singletons are thin sets in gauge g. The latter kernels are discussed in some depth in Appendix C.

Given a measure µ on Rᵈ, we can define the potential of µ (with respect to g) as the function Gµ, where
$$G\mu(x) = \int g(x,y)\,\mu(dy), \qquad x \in \mathbb R^d.$$
See Section 1.2 below for a heuristic physical interpretation of this for the gauge function g(x, y) = c‖x − y‖⁻¹. The following exercises show important connections among energies, capacities, and potentials.

Exercise 1.1.2 Verify that for any gauge function g on Rᵈ × Rᵈ, and for all measures µ and ν on Rᵈ,
$$\langle\mu,\nu\rangle_g = \frac12\Bigl[\int G\mu(x)\,\nu(dx) + \int G\nu(x)\,\mu(dx)\Bigr].$$
In particular, check that E_g(µ) = ∫ Gµ(x) µ(dx).

Exercise 1.1.3 Suppose g is a gauge function on Rᵈ, and µ is a measure on Rᵈ of finite g-energy.
(i) Check that Gµ < +∞, µ-a.e.
(ii) Prove that x ↦ Gµ(x) is lower semicontinuous. (Hint: Estimate the µ-measure of the set {x ∈ Rᵈ : Gµ(x) ≥ λ}.)
We say that a gauge function g is symmetric if g(x, y) = g(y, x).

Exercise 1.1.4 Verify that for any symmetric gauge function g on Rᵈ × Rᵈ,
$$\langle\mu,\nu\rangle_g = \int G\mu(x)\,\nu(dx) = \int G\nu(x)\,\mu(dx),$$
where µ and ν are two arbitrary (nonnegative) measures on the Borel subsets of Rᵈ. (This is the reciprocity theorem of classical potential theory.)
Exercise 1.1.5 As the preceding Remark (a) implies, one often wants to know whether or not a given set has positive g-capacity. Along these lines, suppose g is a gauge function on Rᵈ, and show that for any compact sets Λ₁, Λ₂, ...,
$$\sup_{i\ge 1} C_g(\Lambda_i) = 0 \iff C_g\Bigl(\bigcup_{i\ge 1}\Lambda_i\Bigr) = 0.$$
Show that there are no obvious extensions to uncountably many Λ's. More precisely, construct a gauge g on Rᵈ and uncountably many sets (Λ_α; α ∈ A), all of which have zero g-capacity, such that ∪_{α∈A} Λ_α has positive g-capacity.
1.2 Physical Interpretations

It has long been known that a positively charged particle in R³ is subject to a certain amount of force if there is a charged body in the vicinity. Moreover, this force depends on the position of the particle and is proportional to the amount of its charge. Thus, we can define the force as qF(x), if q > 0 denotes the charge of the particle and if the particle is at position x ∈ R³. The function F : R³ → R³ is called the electrical field and is due to the effect of the charged body that is in the vicinity. It is also known that electrical fields are additive in the following sense: given n positively charged particles at x₁, x₂, ..., x_n ∈ R³, with charges q₁, q₂, ..., q_n and with electrical fields F₁, ..., F_n, respectively, the force on the ith particle is q_i · Σ_{ℓ=1}^n F_ℓ(x_i).

In order to simplify things, we now suppose that the charged body is concentrated at some point x₀ and has charge −q₀. In this case, Coulomb's law of electrostatics states that the electrical field of the ith particle (which is F_i(x_i)) is a three-dimensional vector that points from x_i to x₀ and whose magnitude is proportional to the inverse square of the distance between x₀ and x_i. In fact, the constant of proportionality is precisely the charge of the body, and thus ‖F_i(x_i)‖ = q₀/‖x₀ − x_i‖². That is,
$$F_i(x_i) = \frac{q_0}{\|x_i-x_0\|^2}\cdot\frac{x_i-x_0}{\|x_i-x_0\|} = q_0\cdot\frac{x_i-x_0}{\|x_i-x_0\|^3}.$$
Therefore, if we put a charge of q(y) at some point y ∈ R³, the induced electrical field at x ∈ R³ is F(x) = q(y)(x − y)/‖x − y‖³. By the additivity property of electrical fields, if we have n charges of amounts q(y₁), ..., q(y_n) at y₁, ..., y_n ∈ R³, respectively, then for any x ∈ R³ the force exerted at point x is precisely
$$F(x) = \sum_{\ell=1}^n \frac{q(y_\ell)}{\|x-y_\ell\|^3}\,(x-y_\ell).$$
Somewhat more generally, suppose µ is a measure on R3 that describes the distribution of charge in space. For example, in the previous paragraph, µ puts weight q1 , . . . , qn at x1 , . . . , xn , respectively. Then, the amount & of force that this charged field exerts at x ∈ R3 should be F (x) = (x − y)x − y−3 µ(dy). We can formally interchange the order of differentiation and integration, and make a line or two of calculations, to deduce that1 F solves the partial differential equation F = −∇Rµ, where ∇ denotes the gradient operator whose ith coordinate is (∂/∂x(i) ) (i = 1, 2, 3) and Rµ(x) = x − y−1 µ(dy). The function Rµ is the potential functions of the force field F . In this three-dimensional setting, this potential function is called the Coulomb potential of µ. Suppose, once more, that we place n positively charged particles in space. Let q1 , . . . , qn denote the respective charges and x1 , . . . , xn the positions. We have already seen that the nth particle is subject toa certain amount −1 of force. In fact, this force is qn ∇pn (xn ), where pn (x) = n−1 =1 q x − x is the potential function for the nth particle. Consequently, particle n will be indefinitely pushed away from the other n − 1 particles. This holds simultaneously for all particles. We now focus our attention on the nth particle. Let τ : [0, ∞] → R3 denote a parametrization of the trajectory of this nth particle such that τ (0) = xn and τ (∞) = +∞. The amount of work done by the nth particle is the sum total of all of the forces on it along its path. That is, ∞
Wn = ∫ F(τ) · dτ = −qn ∫_0^∞ ∇pn(τ(s)) · τ′(s) ds = −qn ∫_0^∞ (d/ds) pn(τ(s)) ds
   = −qn [pn(∞) − pn(xn)] = qn pn(xn) (since the potential vanishes at infinity)
   = Σ_{ℓ=1}^{n−1} qn qℓ / ‖xn − xℓ‖,
where, in the second equality, “·” denotes the Euclidean inner product. Particle n is pushed away from the others, and Wn is the total work performed by it. Now remove the nth particle; we are left with n − 1 particles, and we view the (n − 1)st as it is being repelled by the remaining n − 2. The total amount of work performed by the (n − 1)st particle as it is being pushed away is

W_{n−1} = Σ_{k=1}^{n−2} q_{n−1} q_k / ‖x_{n−1} − x_k‖.
1 This was first observed by J.-L. Lagrange. See Wermer (1981), where the discussion of this section is carried through with more care and in much greater detail.
Appendix D. Energy and Capacity
Now repeat this procedure to compute the work Wi done by the ith particle as it is being pushed away. The principle of conservation of work shows that the total amount of work done by this system of n particles is W = Σ_{ℓ=1}^{n} Wℓ, which we have just computed as

W = Σ_{ℓ=1}^{n} Σ_{k=1}^{ℓ−1} qℓ qk / ‖xℓ − xk‖.
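The double sum over ordered pairs counts each unordered pair {k, ℓ} twice, so summing over k < ℓ gives exactly half of the full off-diagonal sum. A quick numerical sketch, with invented charges and positions, confirms the bookkeeping:

```python
def dist(a, b):
    """Euclidean distance between two points of the same dimension."""
    return sum((a[i] - b[i]) ** 2 for i in range(len(a))) ** 0.5

def work_over_pairs(q, x):
    """W = sum over unordered pairs k < l of q_l q_k / ||x_l - x_k||."""
    n = len(q)
    return sum(q[l] * q[k] / dist(x[l], x[k])
               for l in range(n) for k in range(l))

def full_double_sum(q, x):
    """Sum over all ordered pairs k != l of q_k q_l / ||x_k - x_l||."""
    n = len(q)
    return sum(q[k] * q[l] / dist(x[k], x[l])
               for k in range(n) for l in range(n) if k != l)
```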
This is one-half of Σ_{k≠ℓ} qk qℓ/‖xk − xℓ‖. More generally, if we spread a charge distribution µ in space, the total amount of work expended by this (possibly infinite) system of particles is

W = ½ ∫∫ ‖x − y‖⁻¹ µ(dx) µ(dy),

provided that we can ignore the effect of the diagonal terms in the above double integral. In other words, total work equals one-half of the 1-dimensional Bessel–Riesz energy Energy₁(µ) of the charge distribution; this is more commonly known as the Coulomb energy of µ in this three-dimensional setting.

The physical interpretation of the corresponding capacity Cap₁(E) of a set E ⊂ R³ follows from the construction of a so-called capacitor.² Imagine a conductor C that is surrounded by a grounded body G. To be more concrete, let us think of C in the shape of the unit ball in R³ and of G in the shape of the boundary of the concentric ball of radius r > 1, also in R³; in symbols, G = ∂(rC). If we put q units of charge in C, the charge will redistribute itself on ∂C and reach an equilibrium state. Moreover, in this equilibrium state, the charge distribution is as uniform as possible. This, in turn, produces an electric field F on R³ \ C. Since G is grounded, the only region that is subject to this electric field is {y ∈ R³ : 1 ≤ ‖y‖ < r}. Next, suppose the electric field F comes from a potential function of the form ∇f, where f is a reasonable function. Since ∇f = 0 on ∂C, f equals some constant K on ∂C. One can show that this observation alone implies that K is necessarily proportional to q. The capacity of the “condenser” (or capacitor) created by adjoining C and G is precisely this constant of proportionality. In other words, in physical terms, the total amount of charge in the capacitor equals the potential difference across the capacitor multiplied by the capacity of the capacitor. C.-F. Gauss showed that capacity can also be obtained by minimum-energy considerations.
² When E is a three-dimensional set, Cap₁(E) is commonly referred to as the Newtonian capacity of E.

This is a cornerstone of axiomatic potential theory and partial differential equations, and is developed in Theorem 2.1.2, as well as in Exercise 2.1.3 below. However, for the purposes of our
present physical discussion, the principle of requiring minimum energy at equilibrium should seem a physically sound one.
2 Choquet Capacities

Let Ω denote a topological space. A nonnegative set function C, defined on compact subsets of Ω, is said to be a Choquet capacity (or a natural capacity) if it satisfies the following criteria:³

(a) (Outer Regularity) Consider K1 ⊃ K2 ⊃ · · ·, all compact subsets of Ω, and the compact set K = ∩n Kn. Then, lim_{n→∞} C(Kn) = C(K).

(b) (Subadditivity) For all compact sets E and F, C(E ∪ F) ≤ C(E) + C(F).

We plan to show, among other things, that if g is a symmetric gauge function on R^d × R^d, then Cg is a Choquet capacity on compact subsets of R^d. In the meantime, we note that any probability measure is a Choquet capacity.
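The last remark can be illustrated on a finite space, where outer regularity along a finite decreasing chain reduces to evaluating the measure of the smallest set. A toy sketch, with arbitrary weights:

```python
weights = {"a": 0.1, "b": 0.2, "c": 0.3, "d": 0.4}

def P(E):
    """A probability measure on subsets of {a, b, c, d}."""
    return sum(weights[x] for x in E)

def is_outer_regular(chain):
    """For K1 >= K2 >= ..., lim P(Kn) must equal P of the intersection;
    for a finite decreasing chain this is the last (smallest) value."""
    K = set.intersection(*chain)
    return abs(min(P(Kn) for Kn in chain) - P(K)) < 1e-12

def is_subadditive(E, F):
    """P(E u F) <= P(E) + P(F), with a small float tolerance."""
    return P(E | F) <= P(E) + P(F) + 1e-12
```

In the finite setting both axioms are immediate; the point of the section is that they survive, nontrivially, for the energy-based capacities Cg.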
2.1 Maximum Principle and Natural Capacities

We say that a gauge function g satisfies a maximum principle if for all probability measures µ with compact support E ⊂ R^d,

sup_{x∈R^d} Gµ(x) = sup_{x∈E} Gµ(x),

where Gµ denotes the g-potential of µ. Our next theorem is the main result of this subsection.

Theorem 2.1.1 If g is a gauge function on R^d, then Cg is outer regular. Furthermore, if g is a symmetric gauge function that satisfies the maximum principle, then Cg is a Choquet capacity on compact subsets of R^d.

The first portion of this theorem, i.e., the outer regularity claim, is the only part that is used in this book. However, the second portion is essential if one wants to go from capacities on compacts to more general sets, such as measurable, or even analytic, sets. Some of this development can be found in the first chapter of Dellacherie and Meyer (1978). It is easiest to prove Theorem 2.1.1 in a few steps, each of which is stated separately. Throughout, the notation of Theorem 2.1.1 is used and enforced.

³ In axiomatic potential theory this is actually a Choquet outer capacity on compacts, and is then proved to agree with another object called a Choquet capacity. We do not make such distinctions here.
Lemma 2.1.1 (Lower Semicontinuity of Potentials) Consider probability measures µn (1 ≤ n ≤ ∞), all defined on Borel subsets of R^d. If µn converges weakly to µ∞, then

lim inf_{n→∞} Eg(µn) ≥ Eg(µ∞).
Proof Properties of gauge functions dictate that g ∧ λ is a continuous function for all λ sufficiently large (why?). For such λ > 0,

Eg(µn) ≥ E_{g∧λ}(µn) → E_{g∧λ}(µ∞), as n → ∞.

We have applied the continuous mapping theorem of weak convergence to the probability measures µn × µn; cf. Theorem 2.2.1 of Chapter 6, for example. By Lebesgue's monotone convergence theorem, lim_{λ→∞} E_{g∧λ}(µ∞) = Eg(µ∞), which proves the result.

Lemma 2.1.2 (Outer Regularity) If K1 ⊃ K2 ⊃ · · · are compact, and if K = ∩n Kn, then, as n → ∞, Cg(Kn) ↓ Cg(K).

Proof The set function Cg is obviously monotone; that is, whenever E ⊂ F, Cg(E) ≤ Cg(F). Hence, it suffices to prove that

lim sup_{n→∞} Cg(Kn) ≤ Cg(K).   (1)
Unless Cg(Kn) > 0 for all but finitely many n's, this is a trivial matter. Hence, without loss of generality, we may assume that for all n large, the g-capacity of Kn is positive. This means that for all n large, we can find µn ∈ P(Kn) such that Eg(µn) < ∞. Moreover, we can arrange things so that for all n large and for these very µn's,

Eg(µn) ≤ [Cg(Kn)]⁻¹ + 1/n.   (2)
Since K1 includes all the Kn's, as well as K, and since K1 is compact, there exist a probability measure µ∞ and a subsequence nj going to ∞ such that, as j → ∞, µnj converges weakly to µ∞. Moreover, Kn ↓ K shows that µ∞ ∈ P(K) (why?). By Lemma 2.1.1,

lim inf_{j→∞} Eg(µnj) ≥ Eg(µ∞) ≥ inf_{µ∈P(K)} Eg(µ).

Take reciprocals of this inequality and apply (2) to complete this proof.

The above proves the first assertion of Theorem 2.1.1. Its proof can be used to complete the following important exercise.
Exercise 2.1.1 Lemma 2.1.1 has the following compactness consequence: If Cg(E) > 0 for some compact set E, then there exists µ ∈ P(E) such that Cg(E) = 1/Eg(µ). In particular, any compact F ⊂ E of zero g-capacity is µ-null.

Exercise 2.1.2 If µ is a measure of finite energy, prove that for all λ > 0,
Cg({x : Gµ(x) ≥ λ}) ≤ λ⁻² Eg(µ).

In particular, conclude that outside a set of zero g-capacity, Gµ < +∞. Show that this, together with Exercise 2.1.1, improves on Exercise 1.1.3. (Hint: Exercise 1.1.1.)

In order to prove Theorem 2.1.1, we will need to better understand the measure µ of minimum energy, as supplied to us by Exercise 2.1.1. The discussion of Section 1.2 suggests that, in some cases, the potential of µ should equal a constant on E. In general, this is almost the case, as the following shows.

Lemma 2.1.3 Suppose E ⊂ R^d is compact with positive g-capacity, and let µ ∈ P(E) be a minimum-energy measure, as supplied by Exercise 2.1.1. Then, outside a set of zero g-capacity, Gµ ≥ Eg(µ), while for all x in the support of µ, Gµ(x) ≤ Eg(µ).

Proof Define, for all α > 0,

Λα = {x ∈ E : Gµ(x) ≤ α Eg(µ)}.

By lower semicontinuity, Λα is compact for all α > 0; cf. Exercise 1.1.3. We claim that whenever α ∈ ]0, 1[, Cg(Λα) = 0. Indeed, if this were not the case, one could find α ∈ ]0, 1[ and ν ∈ P(Λα) such that Eg(ν) < +∞. For any η ∈ ]0, 1[, define ζη = ην + (1 − η)µ, and note that ζη ∈ P(E). We can write ζη = µ + η(ν − µ), and compute its g-energy, to find that

Eg(ζη) = ⟨ζη, ζη⟩g = Eg(µ) + η² Eg(ν − µ) + 2η⟨µ, ν⟩g − 2η Eg(µ).

(Check!) On the other hand, µ is a probability measure on E that has minimum energy. In particular, Eg(µ) ≤ Eg(ζη) for all η ∈ ]0, 1[. Using the displayed expression for Eg(ζη) leads to the inequality

Eg(µ) ≤ (η/2) Eg(ν − µ) + ⟨µ, ν⟩g.

We can let η ↓ 0 to deduce that Eg(µ) ≤ ⟨µ, ν⟩g.
Since g is symmetric, ⟨µ, ν⟩g = ∫ Gµ dν. Moreover, since ν ∈ P(Λα),

⟨µ, ν⟩g ≤ α Eg(µ).

(Why? This is not entirely trivial when µ is not purely atomic.) This implies the contradiction Eg(µ) ≤ α Eg(µ). Thus, as asserted, Λα has zero g-capacity. By Exercise 1.1.5, Cg(Λ) = 0, where Λ = {x ∈ E : Gµ(x) < Eg(µ)}. This is best seen by noticing that Λ is the countable union of the Λα, as α ranges over ]0, 1[ ∩ Q. We have shown that outside a set of zero capacity, Gµ ≥ Eg(µ); this proves the first half of the lemma. Moreover, by Exercise 2.1.1, if a set in E has zero g-capacity, it has zero µ-measure; i.e.,

Gµ ≥ Eg(µ), µ-almost everywhere.   (3)
For the second half, define, for all ε ∈ ]0, 1[,

Aε = {x ∈ E : Gµ(x) > (1 + ε) Eg(µ)}.

If there were some x0 in the support of µ such that Gµ(x0) > Eg(µ), then by lower semicontinuity (Exercise 1.1.3) we could find ε ∈ ]0, 1[ such that Aε contains a relatively open neighborhood (in E) of x0; this neighborhood has positive µ-measure by the definition of the support of a measure. In particular, there would exist ε ∈ ]0, 1[ such that µ(Aε) > 0. However, according to equation (3),

Eg(µ) = ∫_{Aεᶜ} Gµ(x) µ(dx) + ∫_{Aε} Gµ(x) µ(dx)
      ≥ Eg(µ) µ(Aεᶜ) + (1 + ε) µ(Aε) Eg(µ)
      = Eg(µ){1 + ε µ(Aε)},

which is the desired contradiction.
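A finite analogue of this lemma can be computed exactly. In the sketch below the gauge is replaced by the bounded symmetric kernel g(x, y) = exp(−|x − y|) on the three-point set {0, 1, 2} (a true gauge function blows up on the diagonal, so this is only an illustration): an interior minimizer µ of the energy µᵀGµ over the probability simplex solves Gµ = c·1, so its potential is constant and equal to Eg(µ) on its support, in agreement with the two inequalities of the lemma.

```python
import math

E = [0.0, 1.0, 2.0]
G = [[math.exp(-abs(x - y)) for y in E] for x in E]  # bounded stand-in kernel

def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    A = [row[:] for row in A]
    b = b[:]
    for i in range(3):
        p = max(range(i, 3), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        b[i], b[p] = b[p], b[i]
        for r in range(i + 1, 3):
            f = A[r][i] / A[i][i]
            for c in range(i, 3):
                A[r][c] -= f * A[i][c]
            b[r] -= f * b[i]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        x[i] = (b[i] - sum(A[i][c] * x[c] for c in range(i + 1, 3))) / A[i][i]
    return x

w = solve3(G, [1.0, 1.0, 1.0])           # G w = 1
mu = [wi / sum(w) for wi in w]           # equilibrium measure (sums to 1)
potential = [sum(G[i][j] * mu[j] for j in range(3)) for i in range(3)]  # G mu
energy = sum(potential[i] * mu[i] for i in range(3))                    # E_g(mu)
```

Here the weights of µ come out strictly positive, and the potential Gµ is (numerically) constant and equal to the energy; there are no exceptional sets of zero capacity in this finite setting.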
In light of Lemma 2.1.2, Theorem 2.1.1 follows once we show that Cg is subadditive. We prove the latter property by first deriving an alternative characterization of capacities.

Theorem 2.1.2 Suppose g is a symmetric gauge function and E ⊂ R^d is a compact set of positive g-capacity. Then, Cg(E) = inf σ(E), where the infimum is taken over all measures σ of finite energy on E such that {x ∈ E : Gσ(x) < 1} has zero g-capacity.

Proof By Exercise 2.1.1, we can find a probability measure µ ∈ P(E) that satisfies Cg(E) = 1/Eg(µ). We also know, from Lemma 2.1.3, that
µ has a potential Gµ that is bounded below by Eg(µ), off a set of zero g-capacity. Since g is a gauge function, Eg(µ) > 0 (why?). Hence, we can define

σ0(•) = µ(•)/Eg(µ),

and check that the following hold:

• Eg(σ0) = Cg(E);
• σ0(E) = Cg(E); and
• Cg({x ∈ E : Gσ0(x) < 1}) = µ({x ∈ E : Gσ0(x) < 1}) = 0.

The gauge function properties of g imply that Cg(E) < +∞. Thus, we have shown that Cg(E) ≥ inf σ(E), where the infimum is taken as in the statement of the theorem.

For the converse, suppose σ is a measure on E such that (a) it has finite energy; and (b) outside a µ-null set, Gσ ≥ 1. In this way, we obtain

1 ≤ ∫ Gσ(x) µ(dx) = ∫ Gµ(x) σ(dx) ≤ sup_{x∈R^d} Gµ(x) · σ(E).

We have used the reciprocity theorem (Exercise 1.1.4). On the other hand, Lemma 2.1.3 and the maximum principle together imply that sup_{x∈R^d} Gµ(x) ≤ Eg(µ). Thus, 1 ≤ Eg(µ) σ(E). Divide by Eg(µ) to deduce that Cg(E) ≤ inf σ(E), where the infimum is as in the statement of the theorem.

Exercise 2.1.3 If E is compact and has positive g-capacity, where g is a symmetric gauge function on R^d, show that Cg(E) = sup σ(E), where the supremum is taken over all measures σ on E of finite energy with sup_{x∈E} Gσ(x) ≤ 1.

With Theorem 2.1.2 in hand, it is possible to prove Theorem 2.1.1 and conclude this subsection.

Exercise 2.1.4 Complete the proof of Theorem 2.1.1.
2.2 Absolutely Continuous Capacities

Let g denote a gauge function on R^d, and suppose we are interested in minimizing the energy

Eg(µ) = ∫∫ g(x, y) µ(dx) µ(dy),

as µ varies over all probability measures on some set E. Equivalently, suppose we are interested in computing the reciprocal of the g-capacity of E. In
principle, there are many such probability measures µ, which tends to make the above optimization problem difficult. However, sometimes one can reduce this problem to the case where µ(dx) is approximable by f(x) dx for some reasonable nonnegative function f on R^d. In such a case, the energy of µ has the following simpler form:

Eg(µ) = ∫∫ g(x, y) f(x) f(y) dx dy.

We now study the problem of when it is enough to study such energies, in the more general setting where “dx” is replaced by a general measure. Throughout, ν denotes a fixed measure on measurable subsets of R^d, and we will say that a measure is absolutely continuous if it is absolutely continuous with respect to this measure ν.

Consider a gauge function g on R^d × R^d, together with its induced capacity Cg. We can modify this capacity a little by defining a set function C^0_g on measurable subsets of R^d as follows:

C^0_g(E) = [inf { Eg(µ) : µ ∈ P(E) is absolutely continuous }]⁻¹.
As usual, the infimum over the empty set is defined to be +∞. On the one hand, C^0_g is a nicer set function than Cg in the sense that the measures involved are absolutely continuous. On the other hand, C^0_g has some undesirable properties, one of which is that ν(E) = 0 implies C^0_g(E) = 0. The second, even more serious, problem is that, in general, C^0_g is not outer regular on compacts. We try to address both of these issues by using, instead, a regularization C^ac_g of C^0_g. Namely, for all bounded Borel sets E ⊂ R^d, we define

C^ac_g(E) = inf { C^0_g(F) : F ⊃ E is bounded and open }.

We can extend this definition to any Borel set E ⊂ R^d by

C^ac_g(E) = sup { C^ac_g(G) : G ⊂ E is a bounded Borel set }.

However, since we are interested in C^ac_g(E) only when E is compact, the above extension is superfluous (for such E's), as the following shows.

Exercise 2.2.1 Show that C^ac_g is outer regular on compacts. Moreover, show that C^ac_g is a Choquet capacity whenever Cg is. Finally, check that for all compact E, Cg(E) ≥ C^ac_g(E).

In words, C^ac_g is the “best” possible construction of a g-based capacity that solely uses absolutely continuous measures in its definition. As such, we are justified in calling it the absolutely continuous capacity based on the gauge g.
2.3 Proper Gauge Functions and Balayage

Consider a gauge function g on R^d × R^d. A question of paramount importance to us is: when do the capacities and the absolutely continuous capacities based on g agree on compact sets? Here, absolute continuity is meant with respect to some underlying measure ν on R^d. Equivalently, when is it the case that for all compact sets E ⊂ R^d, Cg(E) = C^ac_g(E)? In general, a satisfactory answer to this question does not seem to exist. However, we will see below that there is a complete answer when g is “nice.”

We say that the gauge function g is proper if for all compact sets E ⊂ R^d and all µ ∈ P(E), there exist bounded open sets E1, E2, . . . such that:

1. E1 ⊃ E2 ⊃ · · ·;
2. ∩n En = E; and
3. for all n ≥ 1 large, there exist absolutely continuous measures µn on En such that for all ε > 0, there exists N0 such that for all n ≥ N0,
(a) µn(En) ≥ 1 − ε; and
(b) Gµn(x) ≤ Gµ(x) for all x ∈ R^d.

Roughly speaking, g is a proper gauge function if the potential of a probability measure on a compact set E can be approximated from below by the potential of an absolutely continuous measure on a fattening of E. Moreover, the latter measure can be taken to be close to a probability measure. Following J.-H. Poincaré, we can think of Gµn as a smoothing out of the potential Gµ, and we refer to Gµn as a balayage for Gµ.

Theorem 2.3.1 Suppose ν is a measure on R^d and suppose that g is a proper, symmetric gauge function on R^d × R^d. Then, for all compact sets E ⊂ R^d, Cg(E) = C^ac_g(E).

Proof According to Exercise 2.2.1, it suffices to show that for all compact sets E ⊂ R^d, Cg(E) ≤ C^ac_g(E). For any measure µ ∈ P(E), we can construct bounded open sets En such that En ↓ E as n → ∞. We can also find absolutely continuous measures µn on En such that for all n large, µn(En) ≥ 1 − ε and Gµn ≤ Gµ, pointwise.
We can apply Exercise 1.1.2 and the reciprocity theorem (Exercise 1.1.4) as follows:

Eg(µ) = ∫ Gµ(x) µ(dx) ≥ ∫ Gµn(x) µ(dx) = ∫ Gµ(x) µn(dx) ≥ ∫ Gµn(x) µn(dx) = Eg(µn).

Define µ̃n(•) = µn(•)/µn(En), and note that µ̃n ∈ P(En). Also note that

Eg(µ̃n) = [µn(En)]⁻² Eg(µn),
which is less than or equal to (1 − ε)⁻² Eg(µn) for all n large. Thus, we have shown that for all n large,

Eg(µ) ≥ (1 − ε)² inf_φ Eg(φ),

where the infimum is taken over all absolutely continuous φ ∈ P(En). Since the above holds for any µ ∈ P(E),

inf_{µ∈P(E)} Eg(µ) ≥ (1 − ε)² inf_φ Eg(φ).

We can now invert this and see that for all n large, Cg(E) is bounded above by (1 − ε)⁻² C^ac_g(En). Consequently, Exercise 2.2.1 implies that Cg(E) ≤ (1 − ε)⁻² C^ac_g(E), which proves the theorem, since ε > 0 is arbitrary.
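The mechanism behind properness, that replacing a point charge by a spread-out measure can only lower the potential, is classical in the Newtonian setting (Newton's theorem; this example is not from the text): the potential of the uniform unit charge on a sphere of radius r equals min(1/s, 1/r) at distance s from the center, and so never exceeds the point-charge potential 1/s. A numerical sketch:

```python
import math

def sphere_potential(s, r, n=2000):
    """Newtonian potential, at distance s from the center, of the uniform
    unit charge on the sphere of radius r. Computed by Simpson's rule over
    u = cos(theta):  (1/2) * integral_{-1}^{1} du / sqrt(s^2 + r^2 - 2 s r u)."""
    f = lambda u: 0.5 / math.sqrt(s * s + r * r - 2.0 * s * r * u)
    h = 2.0 / n
    total = f(-1.0) + f(1.0)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(-1.0 + k * h)
    return total * h / 3.0
```

Outside the sphere the two potentials agree; inside, the spread-out charge has the strictly smaller constant potential 1/r, which is the sense in which the balayage Gµn sits below Gµ.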
3 Notes on Appendix D

Section 1 Two useful references are (Kellogg 1967; Wermer 1981); these treatments are not only rigorous and thorough, but they also contain lively discussions of the physical aspects of classical potential theory. While the brief discussions of this appendix form a minute portion of the history of the subject, it would be a shame to leave it out altogether. For two wonderful general references, see (Helms 1975; Landkof 1972).

Section 2 Capacities can be treated more generally and with less focus on compact sets. In particular, we have omitted the capacitability theorem of Gustav Choquet. In the present context it states that when g is a symmetric gauge function, Cg is also inner regular. That is, for all measurable E ⊂ R^d,

Cg(E) = sup{Cg(K) : K ⊂ E is compact};

see (Bass 1995; Dellacherie and Meyer 1978). Some of the essential ideas behind this chapter's extremal characterizations of capacity (Theorem 2.1.2 and Exercise 2.1.3) date back to C.-F. Gauss. See Carleson (1983, Theorem 6, Section III, Chapter 1) and Itô and McKean (1974, Section 7.9, Chapter 7) for special cases, together with references to the older literature.

While maximum principles are powerful tools in probability and analysis, we have discussed them only briefly. The reason for this is that our sole intended application of maximum principles is in proving Theorem 2.1.1, which itself is never used in this book. Carleson (1983) contains a delightful introduction to a class of energies and capacities, with applications to diverse areas in analysis, such as the boundary theory for differential equations, nontriviality of certain H^p spaces, and singular points for harmonic functions.
I have learned the terminology gauge functions from Yuval Peres. Since capacities are introduced via mutual energy in this book, the notion of balayage becomes a somewhat different matter and naturally leads to the introduction of proper gauge functions and absolutely continuous capacities. A probabilist may find some appeal in this approach, due to its measure-theoretic (as opposed to potential-theoretic) nature.
References
Adler, R. J. (1977). Hausdorff dimension and Gaussian fields. Ann. Probability 5 (1), 145–151.
Adler, R. J. (1980). A Hölder condition for the local time of the Brownian sheet. Indiana Univ. Math. J. 29 (5), 793–798.
Adler, R. J. (1981). The Geometry of Random Fields. Chichester: John Wiley & Sons Ltd. Wiley Series in Probability and Mathematical Statistics.
Adler, R. J. (1990). An Introduction to Continuity, Extrema, and Related Topics for General Gaussian Processes. Hayward, CA: Institute of Mathematical Statistics.
Adler, R. J. and R. Pyke (1997). Scanning Brownian processes. Adv. in Appl. Probab. 29 (2), 295–326.
Aizenman, M. (1985). The intersection of Brownian paths as a case study of a renormalization group method for quantum field theory. Comm. Math. Phys. 97 (1-2), 91–110.
Bakry, D. (1979). Sur la régularité des trajectoires des martingales à deux indices. Z. Wahrsch. Verw. Gebiete 50 (2), 149–157.
Bakry, D. (1981a). Limites “quadrantales” des martingales. In Two-index random processes (Paris, 1980), pp. 40–49. Berlin: Springer.
Bakry, D. (1981b). Théorèmes de section et de projection pour les processus à deux indices. Z. Wahrsch. Verw. Gebiete 55 (1), 55–71.
Bakry, D. (1982). Semimartingales à deux indices. Ann. Sci. Univ. Clermont-Ferrand II Math. (20), 53–54.
Barlow, M. T. (1988). Necessary and sufficient conditions for the continuity of local time of Lévy processes. Ann. Probab. 16 (4), 1389–1427.
Barlow, M. T. and J. Hawkes (1985). Application de l'entropie métrique à la continuité des temps locaux des processus de Lévy. C. R. Acad. Sci. Paris Sér. I Math. 301 (5), 237–239.
Barlow, M. T. and E. Perkins (1984). Levels at which every Brownian excursion is exceptional. In Seminar on probability, XVIII, pp. 1–28. Berlin: Springer.
Bass, R. (1987). Lp inequalities for functionals of Brownian motion. In Séminaire de Probabilités, XXI, pp. 206–217. Berlin: Springer.
Bass, R. F. (1985). Law of the iterated logarithm for set-indexed partial sum processes with finite variance. Z. Wahrsch. Verw. Gebiete 70 (4), 591–608.
Bass, R. F. (1995). Probabilistic Techniques in Analysis. New York: Springer-Verlag.
Bass, R. F. (1998). Diffusions and Elliptic Operators. New York: Springer-Verlag.
Bass, R. F., K. Burdzy, and D. Khoshnevisan (1994). Intersection local time for points of infinite multiplicity. Ann. Probab. 22 (2), 566–625.
Bass, R. F. and D. Khoshnevisan (1992a). Local times on curves and uniform invariance principles. Probab. Theory Related Fields 92 (4), 465–492.
Bass, R. F. and D. Khoshnevisan (1992b). Stochastic calculus and the continuity of local times of Lévy processes. In Séminaire de Probabilités, XXVI, pp. 1–10. Berlin: Springer.
Bass, R. F. and D. Khoshnevisan (1993a). Intersection local times and Tanaka formulas. Ann. Inst. H. Poincaré Probab. Statist. 29 (3), 419–451.
Bass, R. F. and D. Khoshnevisan (1993b). Rates of convergence to Brownian local time. Stochastic Process. Appl. 47 (2), 197–213.
Bass, R. F. and D. Khoshnevisan (1993c). Strong approximations to Brownian local time. In Seminar on Stochastic Processes, 1992 (Seattle, WA, 1992), pp. 43–65. Boston, MA: Birkhäuser Boston.
Bass, R. F. and D. Khoshnevisan (1995). Laws of the iterated logarithm for local times of the empirical process. Ann. Probab. 23 (1), 388–399.
Bass, R. F. and R. Pyke (1984a). The existence of set-indexed Lévy processes. Z. Wahrsch. Verw. Gebiete 66 (2), 157–172.
Bass, R. F. and R. Pyke (1984b). Functional law of the iterated logarithm and uniform central limit theorem for partial-sum processes indexed by sets. Ann. Probab. 12 (1), 13–34.
Bass, R. F. and R. Pyke (1984c). A strong law of large numbers for partial-sum processes indexed by sets. Ann. Probab. 12 (1), 268–271.
Bass, R. F. and R. Pyke (1985). The space D(A) and weak convergence for set-indexed processes. Ann. Probab. 13 (3), 860–884.
Bauer, J. (1994). Multiparameter processes associated with Ornstein–Uhlenbeck semigroups. In Classical and Modern Potential Theory and Applications (Chateau de Bonas, 1993), pp. 41–55. Dordrecht: Kluwer Acad. Publ.
Bendikov, A. (1994). Asymptotic formulas for symmetric stable semigroups. Exposition. Math. 12 (4), 381–384.
Benjamini, I., R. Pemantle, and Y. Peres (1995). Martin capacity for Markov chains. Ann. Probab. 23 (3), 1332–1346.
Bergström, H. (1952). On some expansions of stable distribution functions. Ark. Mat. 2, 375–378.
Berman, S. M. (1983). Local nondeterminism and local times of general stochastic processes. Ann. Inst. H. Poincaré Sect. B (N.S.) 19 (2), 189–207.
Bertoin, J. (1996). Lévy Processes. Cambridge: Cambridge University Press.
Bickel, P. J. and M. J. Wichura (1971). Convergence criteria for multiparameter stochastic processes and some applications. Ann. Math. Statist. 42, 1656–1670.
Billingsley, P. (1968). Convergence of Probability Measures. New York: John Wiley & Sons Inc.
Billingsley, P. (1995). Probability and Measure (Third ed.). New York: John Wiley & Sons Inc. A Wiley-Interscience Publication.
Blackwell, D. and L. Dubins (1962). Merging of opinions with increasing information. Ann. Math. Statist. 33, 882–886.
Blackwell, D. and L. E. Dubins (1975). On existence and non-existence of proper, regular, conditional distributions. Ann. Probability 3 (5), 741–752.
Blumenthal, R. M. (1957). An extended Markov property. Trans. Amer. Math. Soc. 85, 52–72.
Blumenthal, R. M. and R. K. Getoor (1960a). A dimension theorem for sample functions of stable processes. Illinois J. Math. 4, 370–375.
Blumenthal, R. M. and R. K. Getoor (1960b). Some theorems on stable processes. Trans. Amer. Math. Soc. 95, 263–273.
Blumenthal, R. M. and R. K. Getoor (1962). The dimension of the set of zeros and the graph of a symmetric stable process. Illinois J. Math. 6, 308–316.
Blumenthal, R. M. and R. K. Getoor (1968). Markov Processes and Potential Theory. New York: Academic Press. Pure and Applied Mathematics, Vol. 29.
Bochner, S. (1955). Harmonic Analysis and the Theory of Probability. Berkeley and Los Angeles: University of California Press.
Borodin, A. N. (1986). On the character of convergence to Brownian local time. I, II. Probab. Theory Relat. Fields 72 (2), 231–250, 251–277.
Borodin, A. N. (1988). On the weak convergence to Brownian local time. In Probability theory and mathematical statistics (Kyoto, 1986), pp. 55–63. Berlin: Springer.
Burkholder, D. L. (1962). Successive conditional expectations of an integrable function. Ann. Math. Statist. 33, 887–893.
Burkholder, D. L. (1964). Maximal inequalities as necessary conditions for almost everywhere convergence. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 3, 75–88.
Burkholder, D. L. (1973). Distribution function inequalities for martingales. Ann. Probability 1, 19–42.
Burkholder, D. L. (1975). One-sided maximal functions and H^p. J. Functional Analysis 18, 429–454.
Burkholder, D. L. and Y. S. Chow (1961). Iterates of conditional expectation operators. Proc. Amer. Math. Soc. 12, 490–495.
Burkholder, D. L., B. J. Davis, and R. F. Gundy (1972). Integral inequalities for convex functions of operators on martingales. In Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability (Univ. California, Berkeley, Calif., 1970/1971), Vol. II: Probability theory, pp. 223–240. Berkeley, Calif.: Univ. California Press.
Burkholder, D. L. and R. F. Gundy (1972). Distribution function inequalities for the area integral. Studia Math. 44, 527–544. Collection of articles honoring the completion by Antoni Zygmund of 50 years of scientific activity, VI.
Cabaña, E. (1991). The Markov property of the Brownian sheet associated with its wave components. In Mathématiques appliquées aux sciences de l'ingénieur (Santiago, 1989), pp. 103–120. Toulouse: Cépaduès.
Cairoli, R. (1966). Produits de semi-groupes de transition et produits de processus. Publ. Inst. Statist. Univ. Paris 15, 311–384.
Cairoli, R. (1969). Un théorème de convergence pour martingales à indices multiples. C. R. Acad. Sci. Paris Sér. A-B 269, A587–A589.
Cairoli, R. (1970a). Processus croissant naturel associé à une classe de processus à indices doubles. C. R. Acad. Sci. Paris Sér. A-B 270, A1604–A1606.
Cairoli, R. (1970b). Une inégalité pour martingales à indices multiples et ses applications. In Séminaire de Probabilités, IV (Univ. Strasbourg, 1968/1969), pp. 1–27. Lecture Notes in Mathematics, Vol. 124. Berlin: Springer.
Cairoli, R. (1971). Décomposition de processus à indices doubles. In Séminaire de Probabilités, V (Univ. Strasbourg, année universitaire 1969-1970), pp. 37–57. Lecture Notes in Math., Vol. 191. Berlin: Springer.
Cairoli, R. (1979). Sur la convergence des martingales indexées par N × N. In Séminaire de Probabilités, XIII (Univ. Strasbourg, Strasbourg, 1977/1978), pp. 162–173. Berlin: Springer.
Cairoli, R. and R. C. Dalang (1996). Sequential Stochastic Optimization. New York: John Wiley & Sons Inc. A Wiley-Interscience Publication.
Cairoli, R. and J.-P. Gabriel (1979). Arrêt de certaines suites multiples de variables aléatoires indépendantes. In Séminaire de Probabilités, XIII (Univ. Strasbourg, Strasbourg, 1977/1978), pp. 174–198. Berlin: Springer.
Cairoli, R. and J. B. Walsh (1975). Stochastic integrals in the plane. Acta Math. 134, 111–183.
Cairoli, R. and J. B. Walsh (1977a). Martingale representations and holomorphic processes. Ann. Probability 5 (4), 511–521.
Cairoli, R. and J. B. Walsh (1977b). Prolongement de processus holomorphes. Cas “carré intégrable”. In Séminaire de Probabilités, XI (Univ. Strasbourg, Strasbourg, 1975/1976), pp. 327–339. Lecture Notes in Math., Vol. 581. Berlin: Springer.
Cairoli, R. and J. B. Walsh (1977c). Some examples of holomorphic processes. In Séminaire de Probabilités, XI (Univ. Strasbourg, Strasbourg, 1975/1976), pp. 340–348. Lecture Notes in Math., Vol. 581. Berlin: Springer.
Cairoli, R. and J. B. Walsh (1978). Régions d'arrêt, localisations et prolongements de martingales. Z. Wahrsch. Verw. Gebiete 44 (4), 279–306.
Calderón, A. P. and A. Zygmund (1952). On the existence of certain singular integrals. Acta Math. 88, 85–139.
Cao, J. and K. Worsley (1999). The geometry of correlation fields with an application to functional connectivity of the brain. Ann. Appl. Probab. 9 (4), 1021–1057.
Carleson, L. (1958). On the connection between Hausdorff measures and capacity. Ark. Mat. 3, 403–406.
Carleson, L. (1983). Selected Problems on Exceptional Sets. Belmont, CA: Wadsworth. Selected reprints.
Čentsov, N. N. (1956). Wiener random fields depending on several parameters. Dokl. Akad. Nauk. S.S.S.R. (NS) 106, 607–609.
Chatterji, S. D. (1967). Comments on the martingale convergence theorem. In Symposium on Probability Methods in Analysis (Loutraki, 1966), pp. 55–61. Berlin: Springer.
Chatterji, S. D. (1968). Martingale convergence and the Radon-Nikodym theorem in Banach spaces. Math. Scand. 22, 21–41.
Chen, Z. L. (1997). Properties of the polar sets of Brownian sheets. J. Math. (Wuhan) 17 (3), 373–378.
Chow, Y. S. and H. Teicher (1997). Probability Theory (Third ed.). New York: Springer-Verlag. Independence, Interchangeability, Martingales.
Chung, K. L. (1948). On the maximum partial sums of sequences of independent random variables. Trans. Amer. Math. Soc. 64, 205–233.
Chung, K. L. (1974). A Course in Probability Theory (Second ed.). Academic Press, New York-London. Probability and Mathematical Statistics, Vol. 21.
Chung, K. L. and P. Erdős (1952). On the application of the Borel-Cantelli lemma. Trans. Amer. Math. Soc. 72, 179–186.
Chung, K. L. and W. H. J. Fuchs (1951). On the distribution of values of sums of random variables. Mem. Amer. Math. Soc. 1951 (6), 12.
Chung, K. L. and G. A. Hunt (1949). On the zeros of Σ_1^n ±1. Ann. of Math. (2) 50, 385–400.
Chung, K. L. and D. Ornstein (1962). On the recurrence of sums of random variables. Bull. Amer. Math. Soc. 68, 30–32.
Chung, K. L. and J. B. Walsh (1969). To reverse a Markov process. Acta Math. 123, 225–251.
Chung, K. L. and R. J. Williams (1990). Introduction to Stochastic Integration (Second ed.). Boston, MA: Birkhäuser Boston Inc.
Ciesielski, Z. (1959). On Haar functions and on the Schauder basis of the space C⟨0, 1⟩. Bull. Acad. Polon. Sci. Sér. Sci. Math. Astronom. Phys. 7, 227–232.
548
References
Ciesielski, Z. (1961). Hölder conditions for realizations of Gaussian processes. Trans. Amer. Math. Soc. 99, 403–413. Ciesielski, Z. and J. Musielak (1959). On absolute convergence of Haar series. Colloq. Math. 7, 61–65. Ciesielski, Z. and S. J. Taylor (1962). First passage times and sojourn times for Brownian motion in space and the exact Hausdorff measure of the sample path. Trans. Amer. Math. Soc. 103, 434–450. Csáki, E., M. Csörgő, A. Földes, and P. Révész (1989). Brownian local time approximated by a Wiener sheet. Ann. Probab. 17 (2), 516–537. Csáki, E., M. Csörgő, A. Földes, and P. Révész (1992). Strong approximation of additive functionals. J. Theoret. Probab. 5 (4), 679–706. Csáki, E., A. Földes, and Y. Kasahara (1988). Around Yor's theorem on the Brownian sheet and local time. J. Math. Kyoto Univ. 28 (2), 373–381. Csáki, E. and P. Révész (1983). Strong invariance for local times. Z. Wahrsch. Verw. Gebiete 62 (2), 263–278. Csörgő, M. and P. Révész (1978). How big are the increments of a multiparameter Wiener process? Z. Wahrsch. Verw. Gebiete 42 (1), 1–12. Csörgő, M. and P. Révész (1981). Strong Approximations in Probability and Statistics. New York: Academic Press Inc. [Harcourt Brace Jovanovich Publishers]. Csörgő, M. and P. Révész (1984). Three strong approximations of the local time of a Wiener process and their applications to invariance. In Limit Theorems in Probability and Statistics, Vol. I, II (Veszprém, 1982), pp. 223–254. Amsterdam: North-Holland. Csörgő, M. and P. Révész (1985). On the stability of the local time of a symmetric random walk. Acta Sci. Math. (Szeged) 48 (1-4), 85–96. Csörgő, M. and P. Révész (1986). Mesure du voisinage and occupation density. Probab. Theory Relat. Fields 73 (2), 211–226. Dalang, R. C. and T. Mountford (1996). Nondifferentiability of curves on the Brownian sheet. Ann. Probab. 24 (1), 182–195. Dalang, R. C. and T. Mountford (1997).
Points of increase of the Brownian sheet. Probab. Theory Related Fields 108 (1), 1–27. Dalang, R. C. and T. Mountford (2001). Jordan curves in the level sets of additive Brownian motion. Trans. Amer. Math. Soc. 353 (9), 3531–3545 (electronic). Dalang, R. C. and T. S. Mountford (2000). Level sets, bubbles and excursions of a Brownian sheet. In Infinite dimensional stochastic analysis (Amsterdam, 1999), pp. 117–128. R. Neth. Acad. Arts Sci., Amsterdam. Dalang, R. C. and J. B. Walsh (1992). The sharp Markov property of the Brownian sheet and related processes. Acta Math. 168 (3-4), 153–218. Dalang, R. C. and J. B. Walsh (1993). Geography of the level sets of the Brownian sheet. Probab. Theory Related Fields 96 (2), 153–176.
Dalang, R. C. and J. B. Walsh (1996). Local structure of level sets of the Brownian sheet. In Stochastic Analysis: Random Fields and Measure-Valued Processes (Ramat Gan, 1993/1995), pp. 57–64. Ramat Gan: Bar-Ilan Univ. Davis, B. and T. S. Salisbury (1988). Connecting Brownian paths. Ann. Probab. 16 (4), 1428–1457. de Acosta, A. (1983). A new proof of the Hartman–Wintner law of the iterated logarithm. Ann. Probab. 11 (2), 270–276. de Acosta, A. and J. Kuelbs (1981). Some new results on the cluster set C({Sn/an}) and the LIL. Ann. Probab. 11, 102–122. Dellacherie, C. and P.-A. Meyer (1978). Probabilities and Potential. Amsterdam: North-Holland Publishing Co. Dellacherie, C. and P.-A. Meyer (1982). Probabilities and Potential. B. Amsterdam: North-Holland Publishing Co. Theory of martingales, translated from the French by J. P. Wilson. Dellacherie, C. and P.-A. Meyer (1988). Probabilities and Potential. C. Amsterdam: North-Holland Publishing Co. Potential theory for discrete and continuous semigroups, translated from the French by J. Norris. Dembo, A. and O. Zeitouni (1998). Large Deviations Techniques and Applications (Second ed.). Berlin: Springer. Donsker, M. D. (1952). Justification and extension of Doob's heuristic approach to the Kolmogorov-Smirnov theorems. Ann. Math. Statistics 23, 277–281. Doob, J. L. (1962/1963). A ratio operator limit theorem. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 1, 288–294. Doob, J. L. (1990). Stochastic Processes. New York: John Wiley & Sons Inc. Reprint of the 1953 original, A Wiley-Interscience Publication. Dorea, C. C. Y. (1982). A characterization of the multiparameter Wiener process and an application. Proc. Amer. Math. Soc. 85 (2), 267–271. Dorea, C. C. Y. (1983). A semigroup characterization of the multiparameter Wiener process. Semigroup Forum 26 (3-4), 287–293. Dozzi, M. (1988). On the local time of the multiparameter Wiener process and the asymptotic behaviour of an associated integral. Stochastics 25 (3), 155–169.
Dozzi, M. (1989). Stochastic Processes with a Multidimensional Parameter. Harlow: Longman Scientific & Technical. Dozzi, M. (1991). Two-parameter stochastic processes. In Stochastic Processes and Related Topics (Georgenthal, 1990), pp. 17–43. Berlin: Akademie-Verlag. Dubins, L. E. and J. Pitman (1980). A divergent, two-parameter, bounded martingale. Proc. Amer. Math. Soc. 78 (3), 414–416. Dudley, R. M. (1973). Sample functions of the Gaussian process. Ann. Probability 1 (1), 66–103. Dudley, R. M. (1984). A Course on Empirical Processes. In École d'été de probabilités de Saint-Flour, XII—1982, pp. 1–142. Berlin: Springer.
Dudley, R. M. (1989). Real Analysis and Probability. Pacific Grove, CA: Wadsworth & Brooks/Cole Advanced Books & Software. Durrett, R. (1991). Probability. Pacific Grove, CA: Wadsworth & Brooks/Cole Advanced Books & Software. Theory and Examples. Dvoretzky, A. and P. Erdős (1951). Some problems on random walk in space. In Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, 1950, Berkeley and Los Angeles, pp. 353–367. University of California Press. Dvoretzky, A., P. Erdős, and S. Kakutani (1950). Double points of paths of Brownian motion in n-space. Acta Sci. Math. Szeged 12 (Leopoldo Fejer et Frederico Riesz LXX annos natis dedicatus, Pars B), 75–81. Dvoretzky, A., P. Erdős, and S. Kakutani (1954). Multiple points of paths of Brownian motion in the plane. Bull. Res. Council Israel 3, 364–371. Dvoretzky, A., P. Erdős, and S. Kakutani (1958). Points of multiplicity c of plane Brownian paths. Bull. Res. Council Israel Sect. F 7F, 175–180. Dvoretzky, A., P. Erdős, S. Kakutani, and S. J. Taylor (1957). Triple points of Brownian paths in 3-space. Proc. Cambridge Philos. Soc. 53, 856–862. Dynkin, E. B. (1965). Markov Processes. Vols. I, II. New York: Academic Press Inc. Translated with the authorization and assistance of the author by J. Fabius, V. Greenberg, A. Maitra, G. Majone. Die Grundlehren der Mathematischen Wissenschaften, Bände 121, 122. Dynkin, E. B. (1980). Markov processes and random fields. Bull. Amer. Math. Soc. (N.S.) 3 (3), 975–999. Dynkin, E. B. (1981a). Additive functionals of several time-reversible Markov processes. J. Funct. Anal. 42 (1), 64–101. Dynkin, E. B. (1981b). Harmonic functions associated with several Markov processes. Adv. in Appl. Math. 2 (3), 260–283. Dynkin, E. B. (1983). Markov processes as a tool in field theory. J. Funct. Anal. 50 (2), 167–187. Dynkin, E. B. (1984a). Local times and quantum fields.
In Seminar on Stochastic Processes, 1983 (Gainesville, Fla., 1983), pp. 69–83. Boston, Mass.: Birkhäuser Boston. Dynkin, E. B. (1984b). Polynomials of the occupation field and related random fields. J. Funct. Anal. 58 (1), 20–52. Dynkin, E. B. (1985). Random fields associated with multiple points of the Brownian motion. J. Funct. Anal. 62 (3), 397–434. Dynkin, E. B. (1986). Generalized random fields related to self-intersections of the Brownian motion. Proc. Nat. Acad. Sci. U.S.A. 83 (11), 3575–3576. Dynkin, E. B. (1987). Self-intersection local times, occupation fields, and stochastic integrals. Adv. in Math. 65 (3), 254–271. Dynkin, E. B. (1988). Self-intersection gauge for random walks and for Brownian motion. Ann. Probab. 16 (1), 1–57.
Edgar, G. A. and L. Sucheston (1992). Stopping Times and Directed Processes. Cambridge: Cambridge University Press. Ehm, W. (1981). Sample function properties of multiparameter stable processes. Z. Wahrsch. Verw. Gebiete 56 (2), 195–228. Eisenbaum, N. (1995). Une version sans conditionnement du théorème d'isomorphisme de Dynkin. In Séminaire de Probabilités, XXIX, pp. 266–289. Berlin: Springer. Eisenbaum, N. (1997). Théorèmes limites pour les temps locaux d'un processus stable symétrique. In Séminaire de Probabilités, XXXI, pp. 216–224. Berlin: Springer. Epstein, R. (1989). Some limit theorems for functionals of the Brownian sheet. Ann. Probab. 17 (2), 538–558. Erdős, P. (1942). On the law of the iterated logarithm. Ann. Math. 43, 419–436. Erdős, P. and S. J. Taylor (1960a). Some intersection properties of random walk paths. Acta Math. Acad. Sci. Hungar. 11, 231–248. Erdős, P. and S. J. Taylor (1960b). Some problems concerning the structure of random walk paths. Acta Math. Acad. Sci. Hungar. 11, 137–162 (unbound insert). Esquível, M. L. (1996). Points of rapid oscillation for the Brownian sheet via Fourier-Schauder series representation. In Interaction between Functional Analysis, Harmonic Analysis, and Probability (Columbia, MO, 1994), pp. 153–162. New York: Dekker. Etemadi, N. (1977). Collision problems of random walks in two-dimensional time. J. Multivariate Anal. 7 (2), 249–264. Etemadi, N. (1991). Maximal inequalities for partial sums of independent random vectors with multi-dimensional time parameters. Comm. Statist. Theory Methods 20 (12), 3909–3923. Ethier, S. N. (1998). An optional stopping theorem for nonadapted martingales. Statist. Probab. Lett. 39 (3), 283–288. Ethier, S. N. and T. G. Kurtz (1986). Markov Processes. Characterization and Convergence. New York: John Wiley & Sons Inc. Evans, S. N. (1987a). Multiple points in the sample paths of a Lévy process. Probab. Theory Related Fields 76 (3), 359–367. Evans, S. N. (1987b).
Potential theory for a family of several Markov processes. Ann. Inst. H. Poincaré Probab. Statist. 23 (3), 499–530. Feller, W. (1968). An Introduction to Probability Theory and Its Applications. Vol. I (Third ed.). New York: John Wiley & Sons Inc. Feller, W. (1968/1969). An extension of the law of the iterated logarithm to variables without variance. J. Math. Mech. 18, 343–355. Feller, W. (1971). An Introduction to Probability Theory and Its Applications. Vol. II (Second ed.). New York: John Wiley & Sons Inc.
Feynman, R. P. (1948). Space-time approach to non-relativistic quantum mechanics. Rev. Mod. Phys. 20, 367–387. Fitzsimmons, P. J. and B. Maisonneuve (1986). Excessive measures and Markov processes with random birth and death. Probab. Theory Relat. Fields 72 (3), 319–336. Fitzsimmons, P. J. and J. Pitman (1999). Kac's moment formula and the Feynman-Kac formula for additive functionals of a Markov process. Stochastic Process. Appl. 79 (1), 117–134. Fitzsimmons, P. J. and S. C. Port (1990). Local times, occupation times, and the Lebesgue measure of the range of a Lévy process. In Seminar on Stochastic Processes, 1989 (San Diego, CA, 1989), pp. 59–73. Boston, MA: Birkhäuser Boston. Fitzsimmons, P. J. and T. S. Salisbury (1989). Capacity and energy for multiparameter Markov processes. Ann. Inst. H. Poincaré Probab. Statist. 25 (3), 325–350. Föllmer, H. (1984a). Almost sure convergence of multiparameter martingales for Markov random fields. Ann. Probab. 12 (1), 133–140. Föllmer, H. (1984b). Von der Brownschen Bewegung zum Brownschen Blatt: einige neuere Richtungen in der Theorie der stochastischen Prozesse. In Perspectives in Mathematics, pp. 159–190. Basel: Birkhäuser. Fouque, J.-P., K. J. Hochberg, and E. Merzbach (Eds.) (1996). Stochastic Analysis: Random Fields and Measure-Valued Processes. Ramat Gan: Bar-Ilan University Gelbart Research Institute for Mathematical Sciences. Papers from the Binational France-Israel Symposium on the Brownian Sheet, held September 1993, and the Conference on Measure-Valued Branching and Superprocesses, held May 1995, at Bar-Ilan University, Ramat Gan. Frangos, N. E. and L. Sucheston (1986). On multiparameter ergodic and martingale theorems in infinite measure spaces. Probab. Theory Relat. Fields 71 (4), 477–490. Fukushima, M., Y. Ōshima, and M. Takeda (1994). Dirichlet Forms and Symmetric Markov Processes. Berlin: Walter de Gruyter & Co. Gabriel, J.-P. (1977). Martingales with a countable filtering index set. Ann.
Probability 5 (6), 888–898. Gänssler, P. (1983). Empirical Processes. Hayward, Calif.: Institute of Mathematical Statistics. Garsia, A. M. (1970). Topics in Almost Everywhere Convergence. Markham Publishing Co., Chicago, Ill. Lectures in Advanced Mathematics, 4. Garsia, A. M. (1973). Martingale Inequalities: Seminar Notes on Recent Progress. W. A. Benjamin, Inc., Reading, Mass.–London–Amsterdam. Mathematics Lecture Notes Series. Garsia, A. M., E. Rodemich, and H. Rumsey, Jr. (1970/1971). A real variable lemma and the continuity of paths of some Gaussian processes. Indiana Univ. Math. J. 20, 565–578.
Geman, D. and J. Horowitz (1980). Occupation densities. Ann. Probab. 8 (1), 1–67. Geman, D., J. Horowitz, and J. Rosen (1984). A local time analysis of intersections of Brownian paths in the plane. Ann. Probab. 12 (1), 86–107. Getoor, R. K. (1975). Markov Processes: Ray Processes and Right Processes. Berlin: Springer-Verlag. Lecture Notes in Mathematics, Vol. 440. Getoor, R. K. (1979). Splitting times and shift functionals. Z. Wahrsch. Verw. Gebiete 47 (1), 69–81. Getoor, R. K. (1990). Excessive Measures. Boston, MA: Birkhäuser Boston, Inc. Getoor, R. K. and J. Glover (1984). Riesz decompositions in Markov process theory. Trans. Amer. Math. Soc. 285 (1), 107–132. Griffin, P. and J. Kuelbs (1991). Some extensions of the LIL via self-normalizations. Ann. Probab. 19 (1), 380–395. Gundy, R. F. (1969). On the class L log L, martingales, and singular integrals. Studia Math. 33, 109–118. Gut, A. (1978/1979). Moments of the maximum of normed partial sums of random variables with multidimensional indices. Z. Wahrsch. Verw. Gebiete 46 (2), 205–220. Hall, P. and C. C. Heyde (1980). Martingale Limit Theory and Its Application. New York: Academic Press Inc. [Harcourt Brace Jovanovich Publishers]. Probability and Mathematical Statistics. Hawkes, J. (1970). Polar sets, regular points and recurrent sets for the symmetric and increasing stable processes. Bull. London Math. Soc. 2, 53–59. Hawkes, J. (1970/1971b). Some dimension theorems for the sample functions of stable processes. Indiana Univ. Math. J. 20, 733–738. Hawkes, J. (1971a). On the Hausdorff dimension of the intersection of the range of a stable process with a Borel set. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 19, 90–102. Hawkes, J. (1976/1977a). Intersections of Markov random sets. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 37 (3), 243–251. Hawkes, J. (1977b). Local properties of some Gaussian processes. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 40 (4), 309–315. Hawkes, J. (1978). Multiple points for symmetric Lévy processes. Math. Proc. Cambridge Philos. Soc. 83 (1), 83–90. Hawkes, J. (1979). Potential theory of Lévy processes. Proc. London Math. Soc. (3) 38 (2), 335–352. Helms, L. L. (1975). Introduction to Potential Theory. Robert E. Krieger Publishing Co., Huntington, N.Y. Reprint of the 1969 edition, Pure and Applied Mathematics, Vol. XXII. Hendricks, W. J. (1972). Hausdorff dimension in a process with stable components—an interesting counterexample. Ann. Math. Statist. 43 (2), 690–694.
Hendricks, W. J. (1973/1974). Multiple points for a process in R² with stable components. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 28, 113–128. Hendricks, W. J. (1979). Multiple points for transient symmetric Lévy processes in Rᵈ. Z. Wahrsch. Verw. Gebiete 49 (1), 13–21. Hewitt, E. and L. J. Savage (1955). Symmetric measures on Cartesian products. Trans. Amer. Math. Soc. 80, 470–501. Hille, E. (1958). On roots and logarithms of elements of a complex Banach algebra. Math. Ann. 136, 46–57. Hille, E. and R. S. Phillips (1957). Functional Analysis and Semi-Groups. Providence, R.I.: American Mathematical Society. Rev. ed., American Mathematical Society Colloquium Publications, Vol. 31. Hirsch, F. (1995). Potential theory related to some multiparameter processes. Potential Anal. 4 (3), 245–267. Hirsch, F. and S. Song (1995a). Markov properties of multiparameter processes and capacities. Probab. Theory Related Fields 103 (1), 45–71. Hirsch, F. and S. Song (1995b). Symmetric Skorohod topology on n-variable functions and hierarchical Markov properties of n-parameter processes. Probab. Theory Related Fields 103 (1), 25–43. Hirsch, F. and S. Song (1995c). Une inégalité maximale pour certains processus de Markov à plusieurs paramètres. I. C. R. Acad. Sci. Paris Sér. I Math. 320 (6), 719–722. Hirsch, F. and S. Song (1995d). Une inégalité maximale pour certains processus de Markov à plusieurs paramètres. II. C. R. Acad. Sci. Paris Sér. I Math. 320 (7), 867–870. Hirsch, F. and S. Q. Song (1994). Propriétés de Markov des processus à plusieurs paramètres et capacités. C. R. Acad. Sci. Paris Sér. I Math. 319 (5), 483–488. Hoeffding, W. (1960). The strong law of large numbers for U-statistics. University of North Carolina Institute of Statistics, Mimeo Series. Hunt, G. A. (1956a). Markoff processes and potentials. Proc. Nat. Acad. Sci. U.S.A. 42, 414–418. Hunt, G. A. (1956b). Semi-groups of measures on Lie groups. Trans. Amer. Math. Soc. 81, 264–293.
Hunt, G. A. (1957). Markoff processes and potentials. I, II. Illinois J. Math. 1, 44–93, 316–369. Hunt, G. A. (1958). Markoff processes and potentials. III. Illinois J. Math. 2, 151–213. Hunt, G. A. (1966). Martingales et Processus de Markov. Paris: Dunod. Monographies de la Société Mathématique de France, No. 1. Hürzeler, H. E. (1985). The optional sampling theorem for processes indexed by a partially ordered set. Ann. Probab. 13 (4), 1224–1235. Imkeller, P. (1984). Local times for a class of multiparameter processes. Stochastics 12 (2), 143–157.
Imkeller, P. (1985). A stochastic calculus for continuous N-parameter strong martingales. Stochastic Process. Appl. 20 (1), 1–40. Imkeller, P. (1986). Local times of continuous N-parameter strong martingales. J. Multivariate Anal. 19 (2), 348–365. Imkeller, P. (1988). Two-Parameter Martingales and Their Quadratic Variation. Berlin: Springer-Verlag. Itô, K. (1944). Stochastic integral. Proc. Imp. Acad. Tokyo 20, 519–524. Itô, K. (1984). Lectures on Stochastic Processes (Second ed.). Distributed for the Tata Institute of Fundamental Research, Bombay. Notes by K. Muralidhara Rao. Itô, K. and H. P. McKean, Jr. (1960). Potentials and the random walk. Illinois J. Math. 4, 119–132. Itô, K. and H. P. McKean, Jr. (1974). Diffusion Processes and Their Sample Paths. Berlin: Springer-Verlag. Second printing, corrected, Die Grundlehren der mathematischen Wissenschaften, Band 125. Ivanoff, G. and E. Merzbach (2000). Set-Indexed Martingales. Chapman & Hall/CRC, Boca Raton, FL. Ivanova, B. G. and E. Mertsbakh (1992). Set-indexed stochastic processes and predictability. Teor. Veroyatnost. i Primenen. 37 (1), 57–63. Jacod, J. (1998). Rates of convergence to the local time of a diffusion. Ann. Inst. H. Poincaré Probab. Statist. 34 (4), 505–544. Janke, S. J. (1985). Recurrent sets for transient Lévy processes with bounded kernels. Ann. Probab. 13 (4), 1204–1218. Janson, S. (1997). Gaussian Hilbert Spaces. Cambridge: Cambridge University Press. Kac, M. (1949). On deviations between theoretical and empirical distributions. Proceedings of the National Academy of Sciences, U.S.A. 35, 252–257. Kac, M. (1951). On some connections between probability theory and differential and integral equations. In Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, 1950, Berkeley and Los Angeles, pp. 189–215. University of California Press. Kahane, J.-P. (1982).
Points multiples du mouvement brownien et des processus de Lévy symétriques, restreints à un ensemble compact de valeurs du temps. C. R. Acad. Sci. Paris Sér. I Math. 295 (9), 531–534. Kahane, J.-P. (1983). Points multiples des processus de Lévy symétriques stables restreints à un ensemble de valeurs du temps. In Seminar on Harmonic Analysis, 1981–1982, pp. 74–105. Orsay: Univ. Paris XI. Kahane, J.-P. (1985). Some Random Series of Functions (Second ed.). Cambridge: Cambridge University Press. Kakutani, S. (1944a). On Brownian motions in n-space. Proc. Imp. Acad. Tokyo 20, 648–652. Kakutani, S. (1944b). Two-dimensional Brownian motion and harmonic functions. Proc. Imp. Acad. Tokyo 20, 706–714.
Kakutani, S. (1945). Markoff process and the Dirichlet problem. Proc. Japan Acad. 21, 227–233 (1949). Kanda, M. (1982). Notes on polar sets for Lévy processes on the line. In Functional Analysis in Markov Processes (Katata/Kyoto, 1981), pp. 227–234. Berlin: Springer. Kanda, M. (1983). On the class of polar sets for a certain class of Lévy processes on the line. J. Math. Soc. Japan 35 (2), 221–242. Kanda, M. and M. Uehara (1981). On the class of polar sets for symmetric Lévy processes on the line. Z. Wahrsch. Verw. Gebiete 58 (1), 55–67. Karatsuba, A. A. (1993). Basic Analytic Number Theory. Berlin: Springer-Verlag. Translated from the second (1983) Russian edition and with a preface by Melvyn B. Nathanson. Karatzas, I. and S. E. Shreve (1991). Brownian Motion and Stochastic Calculus (Second ed.). New York: Springer-Verlag. Kargapolov, M. I. and J. I. Merzljakov (1979). Fundamentals of the Theory of Groups. New York: Springer-Verlag. Translated from the second Russian edition by Robert G. Burns. Kellogg, O. D. (1967). Foundations of Potential Theory. Berlin: Springer-Verlag. Reprint from the first edition of 1929. Die Grundlehren der Mathematischen Wissenschaften, Band 31. Kendall, W. S. (1980). Contours of Brownian processes with several-dimensional times. Z. Wahrsch. Verw. Gebiete 52 (3), 267–276. Kesten, H. and F. Spitzer (1979). A limit theorem related to a new class of self-similar processes. Z. Wahrsch. Verw. Gebiete 50 (1), 5–25. Khoshnevisan, D. (1992). Level crossings of the empirical process. Stochastic Process. Appl. 43 (2), 331–343. Khoshnevisan, D. (1993). An embedding of compensated compound Poisson processes with applications to local times. Ann. Probab. 21 (1), 340–361. Khoshnevisan, D. (1994). A discrete fractal in Z¹₊. Proc. Amer. Math. Soc. 120 (2), 577–584. Khoshnevisan, D. (1995). On the distribution of bubbles of the Brownian sheet. Ann. Probab. 23 (2), 786–805. Khoshnevisan, D. (1997a). Escape rates for Lévy processes. Studia Sci. Math. Hungar. 33 (1-3), 177–183. Khoshnevisan, D. (1997b). Some polar sets for the Brownian sheet. In Séminaire de Probabilités, XXXI, pp. 190–197. Berlin: Springer.
Khoshnevisan, D. (1999). Brownian sheet images and Bessel–Riesz capacity. Trans. Amer. Math. Soc. 351 (7), 2607–2622. Khoshnevisan, D. (2000). On sums of i.i.d. random variables indexed by N parameters. In S´eminaire de Probabilit´es, XXXIV, pp. 151–156. Berlin: Springer. Khoshnevisan, D. and T. M. Lewis (1998). A law of the iterated logarithm for stable processes in random scenery. Stochastic Process. Appl. 74 (1), 89–121.
Khoshnevisan, D. and Z. Shi (1999). Brownian sheet and capacity. Ann. Probab. 27 (3), 1135–1159. Khoshnevisan, D. and Z. Shi (2000). Fast sets and points for fractional Brownian motion. In Séminaire de Probabilités, XXXIV, pp. 393–416. Berlin: Springer. Khoshnevisan, D. and Y. Xiao (2002). Level sets of additive Lévy processes. Ann. Probab. (To appear). Kinney, J. R. (1953). Continuity properties of sample functions of Markov processes. Trans. Amer. Math. Soc. 74, 280–302. Kitagawa, T. (1951). Analysis of variance applied to function spaces. Mem. Fac. Sci. Kyūsyū Univ. A. 6, 41–53. Knight, F. B. (1981). Essentials of Brownian Motion and Diffusion. Providence, R.I.: American Mathematical Society. Kochen, S. and C. Stone (1964). A note on the Borel–Cantelli lemma. Illinois J. Math. 8, 248–251. Körezlioğlu, H., G. Mazziotto, and J. Szpirglas (Eds.) (1981). Processus Aléatoires à Deux Indices. Berlin: Springer. Papers from the E.N.S.T.-C.N.E.T. Colloquium held in Paris, June 30–July 1, 1980. Krengel, U. and R. Pyke (1987). Uniform pointwise ergodic theorems for classes of averaging sets and multiparameter subadditive processes. Stochastic Process. Appl. 26 (2), 289–296. Krickeberg, K. (1963). Wahrscheinlichkeitstheorie. B. G. Teubner Verlagsgesellschaft, Stuttgart. Krickeberg, K. (1965). Probability Theory. Addison-Wesley Publishing Co., Inc., Reading, Mass.–London. Krickeberg, K. and C. Pauc (1963). Martingales et dérivation. Bull. Soc. Math. France 91, 455–543. Kunita, H. and S. Watanabe (1967). On square integrable martingales. Nagoya Math. J. 30, 209–245. Kuroda, K. and H. Manaka (1987). The interface of the Ising model and the Brownian sheet. In Proceedings of the Symposium on Statistical Mechanics of Phase Transitions—Mathematical and Physical Aspects (Trebon, 1986), Volume 47, pp. 979–984. Kuroda, K. and H. Manaka (1998). Limit theorem related to an interface of three-dimensional Ising model. Kobe J. Math. 15 (1), 17–39. Kuroda, K. and H.
Tanemura (1988). Interacting particle system and Brownian sheet. Keio Sci. Tech. Rep. 41 (1), 1–16. Kwon, J. S. (1994). The law of large numbers for product partial sum processes indexed by sets. J. Multivariate Anal. 49 (1), 76–86. Lacey, M. T. (1990). Limit laws for local times of the Brownian sheet. Probab. Theory Related Fields 86 (1), 63–85. Lachout, P. (1988). Billingsley-type tightness criteria for multiparameter stochastic processes. Kybernetika (Prague) 24 (5), 363–371.
Lamb, C. W. (1973). A short proof of the martingale convergence theorem. Proc. Amer. Math. Soc. 38, 215–217. Landkof, N. S. (1972). Foundations of Modern Potential Theory. New York: Springer-Verlag. Translated from the Russian by A. P. Doohovskoy, Die Grundlehren der mathematischen Wissenschaften, Band 180. Lawler, G. F. (1991). Intersections of Random Walks. Boston, MA: Birkhäuser Boston, Inc. Le Gall, J.-F. (1987). Temps locaux d'intersection et points multiples des processus de Lévy. In Séminaire de Probabilités, XXI, pp. 341–374. Berlin: Springer. Le Gall, J.-F. (1992). Some Properties of Planar Brownian Motion. In École d'Été de Probabilités de Saint-Flour XX—1990, pp. 111–235. Berlin: Springer. Le Gall, J.-F. and J. Rosen (1991). The range of stable random walks. Ann. Probab. 19 (2), 650–705. Le Gall, J.-F., J. S. Rosen, and N.-R. Shieh (1989). Multiple points of Lévy processes. Ann. Probab. 17 (2), 503–515. LeCam, L. (1957). Convergence in distribution of stochastic processes. University of California Pub. Stat. 2 (2). Ledoux, M. (1981). Classe L log L et martingales fortes à paramètre bidimensionnel. Ann. Inst. H. Poincaré Sect. B (N.S.) 17 (3), 275–280. Ledoux, M. (1996). Lectures on Probability Theory and Statistics. Berlin: Springer-Verlag. Lectures from the 24th Saint-Flour Summer School held July 7–23, 1994, edited by P. Bernard (with Dobrushin, R. and Groeneboom, P.). Ledoux, M. and M. Talagrand (1991). Probability in Banach Spaces. Berlin: Springer-Verlag. Lévy, P. (1965). Processus stochastiques et mouvement brownien. Gauthier-Villars & Cie, Paris. Suivi d'une note de M. Loève. Deuxième édition revue et augmentée. Li, D. L. and Z. Q. Wu (1989). The law of the iterated logarithm for B-valued random variables with multidimensional indices. Ann. Probab. 17 (2), 760–774. Lyons, R. (1990). Random walks and percolation on trees. Ann. Probab. 18 (3), 931–958. Madras, N. and G. Slade (1993). The Self-Avoiding Walk. Boston, MA: Birkhäuser Boston Inc. Mandelbrot, B. B. (1982). The Fractal Geometry of Nature. San Francisco, Calif.: W. H. Freeman and Co. Schriftenreihe für den Referenten. [Series for the Referee]. Marcus, M. B. and J. Rosen (1992). Sample path properties of the local times of strongly symmetric Markov processes via Gaussian processes. Ann. Probab. 20 (4), 1603–1684. Mattila, P. (1995). Geometry of Sets and Measures in Euclidean Spaces. Cambridge: Cambridge University Press.
Mazziotto, G. (1988). Two-parameter Hunt processes and a potential theory. Ann. Probab. 16 (2), 600–619. Mazziotto, G. and E. Merzbach (1985). Regularity and decomposition of two-parameter supermartingales. J. Multivariate Anal. 17 (1), 38–55. Mazziotto, G. and J. Szpirglas (1981). Un exemple de processus à deux indices sans l'hypothèse F4. In Seminar on Probability, XV (Univ. Strasbourg, Strasbourg, 1979/1980) (French), pp. 673–688. Berlin: Springer. Mazziotto, G. and J. Szpirglas (1982). Optimal stopping for two-parameter processes. In Advances in Filtering and Optimal Stochastic Control (Cocoyoc, 1982), pp. 239–245. Berlin: Springer. Mazziotto, G. and J. Szpirglas (1983). Arrêt optimal sur le plan. Z. Wahrsch. Verw. Gebiete 62 (2), 215–233. McKean, H. P., Jr. (1955a). Hausdorff–Besicovitch dimension of Brownian motion paths. Duke Math. J. 22, 229–234. McKean, H. P., Jr. (1955b). Sample functions of stable processes. Ann. of Math. (2) 61, 564–579. Merzbach, E. and D. Nualart (1985). Different kinds of two-parameter martingales. Israel J. Math. 52 (3), 193–208. Millar, P. W. (1978). A path decomposition for Markov processes. Ann. Probability 6 (2), 345–348. Mountford, T. S. (1993). Estimates of the Hausdorff dimension of the boundary of positive Brownian sheet components. In Séminaire de Probabilités, XXVII, pp. 233–255. Berlin: Springer. Mountford, T. S. (2002). Brownian bubbles and the local time of the Brownian sheet. (In progress). Munkres, J. R. (1975). Topology: A First Course. Englewood Cliffs, N.J.: Prentice-Hall Inc. Muroga, S. (1949). On the capacity of a discrete channel. J. Phys. Soc. Japan 8, 484–494. Nagasawa, M. (1964). Time reversions of Markov processes. Nagoya Math. J. 24, 177–204. Neveu, J. (1975). Discrete-Parameter Martingales (Revised ed.). Amsterdam: North-Holland Publishing Co. Translated from the French by T. P. Speed, North-Holland Mathematical Library, Vol. 10. Nualart, D. (1985). Variations quadratiques et inégalités pour les martingales à deux indices. Stochastics 15 (1), 51–63. Nualart, D. (1995). The Malliavin Calculus and Related Topics. New York: Springer-Verlag. Nualart, E. (2001a). Ph.D. thesis (in preparation). École Polytechnique Fédérale de Lausanne. Nualart, E. (2001b). Potential theory for hyperbolic SPDEs. Preprint.
Orey, S. (1967). Polar sets for processes with stationary independent increments. In Markov Processes and Potential Theory, pp. 117–126. New York: Wiley. Proc. Sympos. Math. Res. Center, Madison, Wis., 1967. Orey, S. and W. E. Pruitt (1973). Sample functions of the N-parameter Wiener process. Ann. Probability 1 (1), 138–163. Ornstein, D. S. (1969). Random walks. I, II. Trans. Amer. Math. Soc. 138 (1969), 1–43; ibid. 138, 45–60. Oxtoby, J. C. and S. Ulam (1939). On the existence of a measure invariant under a transformation. Ann. Math. 40 (2), 560–566. Paley, R. E. A. C. and A. Zygmund (1932). A note on analytic functions in the unit circle. Proc. Camb. Phil. Soc. 28, 266–272. Paranjape, S. R. and C. Park (1973). Laws of iterated logarithm of multiparameter Wiener processes. J. Multivariate Anal. 3, 132–136. Park, W. J. (1975). The law of the iterated logarithm for Brownian sheets. J. Appl. Probability 12 (4), 840–844. Pemantle, R., Y. Peres, and J. W. Shapiro (1996). The trace of spatial Brownian motion is capacity-equivalent to the unit square. Probab. Theory Related Fields 106 (3), 379–399. Peres, Y. (1996a). Intersection-equivalence of Brownian paths and certain branching processes. Comm. Math. Phys. 177 (2), 417–434. Peres, Y. (1996b). Remarks on intersection-equivalence and capacity-equivalence. Ann. Inst. H. Poincaré Phys. Théor. 64 (3), 339–347. Perkins, E. (1982). Weak invariance principles for local time. Z. Wahrsch. Verw. Gebiete 60 (4), 437–451. Petrov, V. V. (1995). Limit Theorems of Probability. Oxford: Oxford Univ. Press. Pollard, D. (1984). Convergence of Stochastic Processes. New York: Springer-Verlag. Pólya, G. (1921). Über eine Aufgabe der Wahrscheinlichkeitsrechnung betreffend die Irrfahrt im Straßennetz. Math. Annalen 84, 149–160. Pólya, G. (1923). On the zeros of an integral function represented by Fourier's integral. Mess. of Math. 52, 185–188. Port, S. C. (1988).
Occupation time and the Lebesgue measure of the range for a Lévy process. Proc. Amer. Math. Soc. 103 (4), 1241–1248. Port, S. C. and C. J. Stone (1971a). Infinitely divisible processes and their potential theory, I. Ann. Inst. Fourier (Grenoble) 21 (2), 157–275. Port, S. C. and C. J. Stone (1971b). Infinitely divisible processes and their potential theory, II. Ann. Inst. Fourier (Grenoble) 21 (4), 179–265. Pruitt, W. E. (1969/1970). The Hausdorff dimension of the range of a process with stationary independent increments. J. Math. Mech. 19, 371–378.
Pruitt, W. E. (1975). Some dimension results for processes with independent increments. In Stochastic processes and related topics (Proc. Summer Res. Inst. on Statist. Inference for Stochastic Processes, Indiana Univ., Bloomington, Ind., 1974, Vol. 1; dedicated to Jerzy Neyman), New York, pp. 133–165. Academic Press. Pruitt, W. E. and S. J. Taylor (1969). The potential kernel and hitting probabilities for the general stable process in RN. Trans. Amer. Math. Soc. 146, 299–321. Pyke, R. (1973). Partial sums of matrix arrays, and Brownian sheets. In Stochastic analysis (a tribute to the memory of Rollo Davidson), pp. 331–348. London: Wiley. Pyke, R. (1985). Opportunities for set-indexed empirical and quantile processes in inference. In Proceedings of the 45th session of the International Statistical Institute, Vol. 4 (Amsterdam, 1985), Volume 51, pp. No. 25.2, 11. Rao, M. (1987). On polar sets for Lévy processes. J. London Math. Soc. (2) 35 (3), 569–576. Ren, J. G. (1990). Topologie p-fine sur l'espace de Wiener et théorème des fonctions implicites. Bull. Sci. Math. (2) 114 (2), 99–114. Révész, P. (1981). A strong invariance principle of the local time of RVs with continuous distribution. Studia Sci. Math. Hungar. 16 (1-2), 219–228. Révész, P. (1990). Random Walk in Random and Nonrandom Environments. Teaneck, NJ: World Scientific Publishing Co. Inc. Revuz, D. (1984). Markov Chains (Second ed.). Amsterdam: North-Holland Publishing Co. Revuz, D. and M. Yor (1994). Continuous Martingales and Brownian Motion (Second ed.). Berlin: Springer-Verlag. Ricci, F. and E. M. Stein (1992). Multiparameter singular integrals and maximal functions. Ann. Inst. Fourier (Grenoble) 42 (3), 637–670. Riesz, F. and B. Sz.-Nagy (1955). Functional Analysis. New York: Frederick Ungar Publishing Company. Seventh Printing, 1978. Translated from the second French edition by Leo F. Boron. Rogers, C. A. and S. J. Taylor (1962). On the law of the iterated logarithm. J. London Math. Soc.
37, 145–151. Rogers, L. C. G. (1989). Multiple points of Markov processes in a complete metric space. In Séminaire de Probabilités, XXIII, pp. 186–197. Berlin: Springer. Rogers, L. C. G. and D. Williams (1987). Diffusions, Markov Processes, and Martingales. Vol. 2. New York: John Wiley & Sons Inc. Itô calculus. Rogers, L. C. G. and D. Williams (1994). Diffusions, Markov Processes, and Martingales. Vol. 1 (Second ed.). Chichester: John Wiley & Sons Ltd. Foundations. Rosen, J. (1983). A local time approach to the self-intersections of Brownian paths in space. Comm. Math. Phys. 88 (3), 327–338.
Rosen, J. (1984). Self-intersections of random fields. Ann. Probab. 12 (1), 108–119. Rosen, J. (1991). Second order limit laws for the local times of stable processes. In Séminaire de Probabilités, XXV, pp. 407–424. Berlin: Springer. Rosen, J. (1993). Uniform invariance principles for intersection local times. In Seminar on Stochastic Processes, 1992 (Seattle, WA, 1992), pp. 241–247. Boston, MA: Birkhäuser Boston. Rota, G.-C. (1962). An “Alternierende Verfahren” for general positive operators. Bull. Amer. Math. Soc. 68, 95–102. Rota, G.-C. (1969). Symposium on Ergodic Theory, Tulane University. Presented in October 1969. Royden, H. L. (1968). Real Analysis (Second ed.). New York: Macmillan Publishing Company. Rozanov, Yu. A. (1982). Markov Random Fields. New York: Springer-Verlag. Translated from the Russian by Constance M. Elson. Rudin, W. (1973). Functional Analysis (First ed.). New York: McGraw-Hill Inc. Rudin, W. (1974). Real and Complex Analysis (Third ed.). New York: McGraw-Hill Book Co. Salisbury, T. S. (1988). Brownian bitransforms. In Seminar on Stochastic Processes, 1987 (Princeton, NJ, 1987), pp. 249–263. Boston, MA: Birkhäuser Boston. Salisbury, T. S. (1992). A low intensity maximum principle for bi-Brownian motion. Illinois J. Math. 36 (1), 1–14. Salisbury, T. S. (1996). Energy, and intersections of Markov chains. In Random discrete structures (Minneapolis, MN, 1993), pp. 213–225. New York: Springer. Sato, K. (1999). Lévy Processes and Infinitely Divisible Distributions. Cambridge: Cambridge University Press. Segal, I. E. (1954). Abstract probability spaces and a theorem of Kolmogoroff. Amer. J. Math. 76, 721–732. Serfling, R. J. (1980). Approximation Theorems of Mathematical Statistics. New York: John Wiley & Sons Inc. Wiley Series in Probability and Mathematical Statistics. Shannon, C. E. (1948). A mathematical theory of communication. Bell System Tech. J. 27, 379–423, 623–656. Shannon, C. E. and W. Weaver (1949).
A Mathematical Theory of Communication. Urbana, Ill.: University of Illinois Press. Sharpe, M. (1988). General Theory of Markov Processes. Boston, MA: Academic Press Inc. Shieh, N. R. (1982). Strong differentiation and martingales in product spaces. Math. Rep. Toyama Univ. 5, 29–36. Shorack, G. R. and R. T. Smythe (1976). Inequalities for max |Sk|/bk where k ∈ Nr. Proc. Amer. Math. Soc. 54, 331–336.
Slepian, D. (1962). The one-sided barrier problem for Gaussian noise. Bell System Tech. J. 41, 463–501. Smythe, R. T. (1973). Strong laws of large numbers for r-dimensional arrays of random variables. Ann. Probability 1 (1), 164–170. Smythe, R. T. (1974a). Convergence de sommes de variables aléatoires indicées par des ensembles partiellement ordonnés. Ann. Sci. Univ. Clermont 51 (9), 43–46. Colloque Consacré au Calcul des Probabilités (Univ. Clermont, Clermont-Ferrand, 1973). Smythe, R. T. (1974b). Sums of independent random variables on partially ordered sets. Ann. Probability 2, 906–917. Song, R. G. (1988). Optimal stopping for general stochastic processes indexed by a lattice-ordered set. Acta Math. Sci. (English Ed.) 8 (3), 293–306. Spitzer, F. (1958). Some theorems concerning 2-dimensional Brownian motion. Trans. Amer. Math. Soc. 87, 187–197. Spitzer, F. (1964). Principles of Random Walk. D. Van Nostrand Co., Inc., Princeton, N.J.–Toronto–London. The University Series in Higher Mathematics. Stein, E. M. (1961). On the maximal ergodic theorem. Proc. Nat. Acad. Sci. U.S.A. 47, 1894–1897. Stein, E. M. (1970). Singular Integrals and Differentiability Properties of Functions. Princeton, N.J.: Princeton University Press. Princeton Mathematical Series, No. 30. Stein, E. M. (1993). Harmonic Analysis: Real-Variable Methods, Orthogonality, and Oscillatory Integrals. Princeton, NJ: Princeton University Press. With the assistance of Timothy S. Murphy, Monographs in Harmonic Analysis, III. Stein, E. M. and G. Weiss (1971). Introduction to Fourier Analysis on Euclidean Spaces. Princeton, N.J.: Princeton University Press. Princeton Mathematical Series, No. 32. Stoll, A. (1987). Self-repellent random walks and polymer measures in two dimensions. In Stochastic processes—mathematics and physics, II (Bielefeld, 1985), pp. 298–318. Berlin: Springer. Stoll, A. (1989). Invariance principles for Brownian intersection local time and polymer measures. Math. Scand.
64 (1), 133–160. Stone, C. J. (1969). On the potential operator for one-dimensional recurrent random walks. Trans. Amer. Math. Soc. 136, 413–426. Stout, W. F. (1974). Almost Sure Convergence. Academic Press [a subsidiary of Harcourt Brace Jovanovich, Publishers], New York-London. Probability and Mathematical Statistics, Vol. 24. Strassen, V. (1965/1966). A converse to the law of the iterated logarithm. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 4, 265–268. Stratonovich, R. L. (1966). A new representation for stochastic integrals and equations. SIAM J. of Control 4, 362–371. Stroock, D. W. (1993). Probability Theory, an Analytic View. Cambridge: Cambridge University Press.
Sucheston, L. (1983). On one-parameter proofs of almost sure convergence of multiparameter processes. Z. Wahrsch. Verw. Gebiete 63 (1), 43–49. Takeuchi, J. (1964a). A local asymptotic law for the transient stable process. Proc. Japan Acad. 40, 141–144. Takeuchi, J. (1964b). On the sample paths of the symmetric stable processes in spaces. J. Math. Soc. Japan 16, 109–127. Takeuchi, J. and S. Watanabe (1964). Spitzer's test for the Cauchy process on the line. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 3, 204–210. Taylor, S. J. (1953). The Hausdorff α-dimensional measure of Brownian paths in n-space. Proc. Cambridge Philos. Soc. 49, 31–39. Taylor, S. J. (1955). The α-dimensional measure of the graph and set of zeros of a Brownian path. Proc. Cambridge Philos. Soc. 51, 265–274. Taylor, S. J. (1961). On the connexion between Hausdorff measures and generalized capacity. Proc. Cambridge Philos. Soc. 57, 524–531. Taylor, S. J. (1966). Multiple points for the sample paths of the symmetric stable process. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 5, 247–264. Taylor, S. J. (1973). Sample path properties of processes with stationary independent increments. In Stochastic analysis (a tribute to the memory of Rollo Davidson), pp. 387–414. London: Wiley. Taylor, S. J. (1986). The measure theory of random fractals. Math. Proc. Cambridge Philos. Soc. 100 (3), 383–406. Testard, F. (1985). Quelques propriétés géométriques de certains processus gaussiens. C. R. Acad. Sci. Paris Sér. I Math. 300 (14), 497–500. Testard, F. (1986). Dimension asymétrique et ensembles doublement non polaires. C. R. Acad. Sci. Paris Sér. I Math. 303 (12), 579–581. Tihomirov, V. M. (1963). The works of A. N. Kolmogorov on ε-entropy of function classes and superpositions of functions. Uspehi Mat. Nauk 18 (5 (113)), 55–92. Vares, M. E. (1983). Local times for two-parameter Lévy processes. Stochastic Process. Appl. 15 (1), 59–82. Walsh, J. B. (1972).
Transition functions of Markov processes. In Séminaire de Probabilités, VI (Univ. Strasbourg, année universitaire 1970–1971), pp. 215–232. Lecture Notes in Math., Vol. 258. Berlin: Springer. Walsh, J. B. (1978/1979). Convergence and regularity of multiparameter strong martingales. Z. Wahrsch. Verw. Gebiete 46 (2), 177–192. Walsh, J. B. (1981). Optional increasing paths. In Two-Index Random Processes (Paris, 1980), pp. 172–201. Berlin: Springer. Walsh, J. B. (1982). Propagation of singularities in the Brownian sheet. Ann. Probab. 10 (2), 279–288. Walsh, J. B. (1986a). An introduction to stochastic partial differential equations. In École d'été de probabilités de Saint-Flour, XIV—1984, pp. 265–439. Berlin: Springer.
Walsh, J. B. (1986b). Martingales with a multidimensional parameter and stochastic integrals in the plane. In Lectures in probability and statistics (Santiago de Chile, 1986), pp. 329–491. Berlin: Springer. Watson, G. N. (1995). A Treatise on the Theory of Bessel Functions. Cambridge: Cambridge University Press. Reprint of the second (1944) edition. Weber, M. (1983). Polar sets of some Gaussian processes. In Probability in Banach Spaces, IV (Oberwolfach, 1982), pp. 204–214. Berlin: Springer. Weinryb, S. (1986). Étude asymptotique par des mesures de R3 de saucisses de Wiener localisées. Probab. Theory Relat. Fields 73 (1), 135–148. Weinryb, S. and M. Yor (1988). Le mouvement brownien de Lévy indexé par R3 comme limite centrale de temps locaux d'intersection. In Séminaire de Probabilités, XXII, pp. 225–248. Berlin: Springer. Weinryb, S. and M. Yor (1993). Théorème central limite pour l'intersection de deux saucisses de Wiener indépendantes. Probab. Theory Related Fields 97 (3), 383–401. Wermer, J. (1981). Potential Theory (Second ed.). Berlin: Springer. Werner, W. (1993). Sur les singularités des temps locaux d'intersection du mouvement brownien plan. Ann. Inst. H. Poincaré Probab. Statist. 29 (3), 391–418. Wichura, M. J. (1973). Some Strassen-type laws of the iterated logarithm for multiparameter stochastic processes with independent increments. Ann. Probab. 1, 272–296. Williams, D. (1970). Decomposing the Brownian path. Bull. Amer. Math. Soc. 76, 871–873. Williams, D. (1974). Path decomposition and continuity of local time for one-dimensional diffusions. I. Proc. London Math. Soc. (3) 28, 738–768. Wong, E. (1989). Multiparameter martingale and Markov process. In Stochastic Differential Systems (Bad Honnef, 1988), pp. 329–336. Berlin: Springer. Yor, M. (1983). Le drap brownien comme limite en loi de temps locaux linéaires. In Seminar on probability, XVII, pp. 89–105. Berlin: Springer. Yor, M. (1992). Some Aspects of Brownian Motion. Part I.
Basel: Birkhäuser Verlag. Some special functionals. Yor, M. (1997). Some Aspects of Brownian Motion. Part II. Basel: Birkhäuser Verlag. Some recent martingale problems. Yosida, K. (1958). On the differentiability of semigroups of linear operators. Proc. Japan Acad. 34, 337–340. Yosida, K. (1995). Functional Analysis. Berlin: Springer-Verlag. Reprint of the sixth (1980) edition. Zhang, R. C. (1985). Markov properties of the generalized Brownian sheet and extended OUP2. Sci. Sinica Ser. A 28 (8), 814–825. Zygmund, A. (1988). Trigonometric Series. Vol. I, II. Cambridge: Cambridge University Press. Reprint of the 1979 edition.
Name Index
A: Abel, N. H., 103, 387, 506 Adler, R. J., 158, 170, 178, 179, 441, 454, 493, 494 Aizenman, M., 452 Alaoglu, L., 191 André, D., 133 Arzelà, C., 194, 196, 198 Ascoli, R., 194, 196, 198
B: Bakry, D., 45, 266 Banach, S., 191 Barlow, M. T., 454, 494 Bass, R. F., iv, 103, 135, 226, 239, 266, 312, 341, 453, 494, 500, 540 Bauer, J., 389, 453 Beebe, N. H. F., iv Bendikov, A., 388 Benjamini, I., 102, 367, 389, 425 Bergström, H., 388 Berman, S. M., 491, 493 Bernstein, F., 122, 127 Bernstein, S. N., 505 Bertoin, J., iv, 308, 312, 341, 350, 389, 494 Bessel, F. W., 286, 339, 376, 377, 384, 385, 423, 426, 452, 456, 459, 520, 523, 524, 527, 529, 532 Beurling, A. C.-A., 389, 528 Bickel, P. J., 202 Billingsley, P., 44, 104, 200, 202, 212, 213, 500 Blackwell, D., 38, 45, 289 Blumenthal, R. M., 272, 310–312, 388, 453, 454 Bochner, S., 378, 379 Borel, É., 6, 35, 42, 48, 62, 66, 78, 101, 112, 115, 116, 128, 131, 143, 144, 147, 148, 166, 170, 181, 182, 188, 195, 210, 220, 221, 226, 262, 263, 270, 289, 305, 306, 356, 358, 361, 367, 376, 379, 392, 398, 399, 402, 408, 412, 415, 435–437, 442, 470, 478, 479, 481–484, 488, 491, 499, 501, 511–516, 520, 521, 525, 528, 533 Borodin, A. N., 103 Bowman, P., iv Brelot, M., 389, 528 Burdzy, K., iv, 453 Burkholder, D. L., 44, 134, 253, 256, 261, 266, 312
C: Cabaña, E., 266 Cairoli, R., iv, 16, 19, 26, 30, 38, 44–46, 63, 86, 134, 135, 237, 266, 395, 416, 452, 453, 469 Calderón, A. P., 62 Cantelli, F. P., 35, 78, 101, 112, 115, 116, 128, 131, 166, 170, 212, 262, 356, 358, 437 Cantor, G. F. L. P., 192, 515 Carathéodory, C., 511 Carleson, L., 385, 525, 540 Cartan, E. J., 389, 528 Cauchy, A.-L., 72, 134, 146, 182, 192, 261, 306, 351, 415, 432, 439, 462, 463, 471, 481 Čentsov, N. N., 147, 173 Chapman, S., 270, 276, 284, 285, 289, 292, 293, 305, 365 Chatterji, S. D., 12, 40, 44, 45 Chebyshev, P., 3, 12, 122, 124, 130, 145, 158, 162, 163, 522 Chen, Z. L., 493 Chernoff, H., 115 Choquet, G., 533, 538, 540 Chow, Y. S., 44 Chung, K. L., 72, 73, 102, 133, 239, 266, 312, 356, 494 Ciesielski, Z., 63, 335, 341 Coulomb, C. A., de, 530 Cramér, H., 206 Csáki, E., 103, 494 Csörgő, M., 103, 179, 494
D: D'Alembert, J. L., 257
Dalang, R. C., iv, 45, 46, 134, 135, 493, 495 Daniell, P. J., 499 Davis, B. J., 253, 256, 261, 266, 454 de Acosta, A., 117, 136 de Moivre, A. P., 100 Dellacherie, C., 8, 44, 45, 226, 239, 266, 311, 533, 540 Dembo, A., 136 Deny, J., 528 Dirichlet, J. P. G. L., 118, 121, 125, 127, 332, 340 Donsker, M. D., 202 Doob, J. L., 7–9, 11, 12, 18–20, 24, 34, 36, 38, 40, 41, 44, 45, 75, 155, 178, 237, 245, 248, 250, 253, 254, 278, 280, 298, 312, 318, 319, 328, 338, 340, 368, 412 Doob, K. L., iii, iv Dorea, C. C. Y., 493 Dozzi, M., 239, 266, 494 Dubins, L. E., 33, 35, 37, 38, 45, 289 Dudley, R. M., 44, 150, 151, 158, 171, 172, 174, 178, 179, 202, 212, 452, 500 Durrett, R., 44, 91 Dvoretzky, A., 104, 350, 376, 428, 451 Dynkin, E. B., 104, 288, 341, 453 E: Edgar, G. A., 45, 135 Ehm, W., 491, 494 Eisenbaum, N., 494 Epstein, R., 494 Erdős, P., 73, 101, 102, 104, 135, 350, 356, 376, 428, 451 Esquível, M. L., 178 Etemadi, N., 104, 136 Ethier, S. N., iv, 45, 202, 311, 340 Euclid, 285, 305, 322, 344, 398, 404, 407, 408, 457, 511, 528, 531 Evans, S. N., 409, 453 F: Fatou, P., 13, 26
Feller, W., 133, 267, 281, 287, 288, 292, 294, 295, 297, 301, 303, 305, 308, 310, 311, 313–318, 320–322, 326–328, 339, 340, 344, 363, 367, 368, 383, 384, 386–388, 392, 394–396, 398, 399, 401, 403–407, 424–426, 451, 493, 501, 506 Fernique, X., 179 Feynman, R. J., 326–329, 338 Fisk, D. L., 250 Fitzsimmons, P. J., 326, 389, 408, 425, 453, 493, 494 Földes, A., 494 Föllmer, H., 33, 45, 312 Fouque, J.-P., 46 Fourier, J. B. J., 177, 285, 307, 323, 325, 423, 491, 502, 503 Frangos, N. E., 135 Frostman, O., 385, 423, 426, 431, 432, 449, 456, 517, 520, 521, 523, 524 Fubini, G., 97, 285, 286, 295, 323, 345, 346, 370, 371, 379, 405, 411, 417, 430, 468, 502 Fuchs, W. H. J., 72 Fukushima, M., 239, 311, 340, 389 G: Gabriel, J.-P., 16, 134 Gänssler, P., 212 Garsia, A. M., 44, 179 Gauss, C.-F., 100, 103, 115, 116, 127, 129, 135, 137, 139, 141, 142, 146, 147, 170–172, 176, 178, 181, 194, 201, 206, 209, 211, 212, 265, 379, 438, 439, 442, 443, 445, 464, 469, 532, 540 Geman, D. N., 453, 494 Getoor, R. K., 311, 312, 340, 388, 389, 453, 454 Glivenko, V. I., 212 Glover, J., 340 Griffin, P., 134
Gundy, R. F., 63, 253, 256, 261, 266 Gut, A., 134 H: Haar, A., 47–49, 51, 53–55, 61–63 Hall, P., 44 Hardy, G. H., 57 Hausdorff, F., 293, 385, 389, 423, 431, 433, 434, 441, 443, 447, 455, 457, 512, 515, 517, 520, 521, 523, 527 Hawkes, J., 389, 451, 453, 454, 494 Helly, E., 504 Helms, L. L., 540 Hendricks, W. J., 350, 453 Hesse, L. O., 320 Hewitt, E., 14, 82 Heyde, C. C., 44 Hilbert, D., 137, 176 Hille, E., 311, 315 Hirsch, F., 389, 408, 425, 453 Hochberg, K. J., 46 Hoeffding, W., 45 Hölder, O. L., 165, 166, 174, 176, 254, 263, 265, 440, 484, 516 Horowitz, J., 453, 494 Horváth, L., iv Hunt, G. A., 7, 8, 45, 226, 239, 272, 288 Hürzeler, H. E., 46 I: Imkeller, P., 239, 266, 494 Itô, K., 178, 251, 252, 254, 264–266, 288, 308, 311, 339, 385, 494, 540 Ivanoff, B. G., 46 Ivanova, B. G., see Ivanoff J: Jacod, J., 103 Janke, S. J., 389 Janson, S., 35, 45, 178 Jensen, J., 4, 6, 9, 18, 19, 21, 28, 29, 31, 59, 75, 235, 244 Jessen, B., 54, 58 K: Kac, M., 232, 326–329, 338, 494
Kahane, J.-P., 454, 493, 525 Kakutani, S., 104, 341, 376, 377, 388, 428, 451, 456 Kanda, M., 389 Karatsuba, A. A., 119, 127 Karatzas, I., 239, 266, 341, 494 Kargapolov, M. I., 78 Kasahara, Y., 494 Kellogg, O. D., 362, 540 Kendall, W. S., 495 Kesten, H., 103 Khintchine, A. I., 116, 117, 307 Khoshnevisan, D., 46, 102–104, 135, 213, 344, 350, 425, 453, 454, 493–495 Kinney, J. R., 272 Kitagawa, T., 178 Knight, F. B., 103, 288, 335, 341 Kochen, S., 73, 356, 359 Kolmogorov, A. N., 3, 14, 111, 123, 142, 150, 151, 159, 160, 162, 164–166, 170, 176, 179, 270, 276, 284, 285, 289, 292, 293, 305, 310, 365, 432, 475, 484, 499, 500 Krengel, U., 135 Krickeberg, K., 45 Krone, S., iv Kronecker, L., 117, 121, 134 Kuelbs, J., 134, 136 Kunita, H., 251, 266 Kuroda, K., 178 Kurtz, T. G., 202, 311, 340 Kwon, J. S., 135 Körezlioğlu, H., 266 L: L'Hôpital, G. F. A. M., de, 114, 138, 171, 174, 509 Lévêque, O., iv Lacey, M. T., 494 Lachout, P., 200 Lagrange, J.-L., 531 Lamb, C. W., 12, 40, 46 Landkof, N. S., 540 Laplace, P.-S., 100, 102, 103, 230, 264, 317, 320, 332, 337,
338, 367, 370, 374, 378, 380, 409, 425, 501–507 Lawler, G., 102, 104 Le Gall, J.-F., 104, 312, 453, 454 Lebesgue, H., 12, 47, 49, 54, 60, 61, 100, 134, 138, 142, 143, 146, 148, 152, 165, 177, 203, 218, 244, 251, 285, 306, 307, 320, 361, 369, 373, 377, 379, 382–384, 386, 406, 409, 412, 413, 415, 420, 423, 426, 431–434, 445, 449, 450, 452, 457, 483, 491, 501, 525, 534 LeCam, L., 213 Ledoux, M., 16, 135, 158, 170, 178, 179 Lévy, P., 63, 113, 114, 123, 175, 178, 303–305, 307, 309, 310, 312, 313, 320, 322, 325, 330, 335, 341, 343, 344, 346, 350, 352, 377, 385–387, 389, 405, 423, 428, 435, 451, 453, 491, 493, 494 Lewis, T. M., iv, 213 Li, D., 135 Liouville, J., 62 Lipschitz, R. O. S., 260, 263, 265 Littlewood, J. E., 57 Lyons, R., 454 M: Madras, N., 104 Manaka, H., 178 Mandelbrot, B., 42 Marcinkiewicz, J., 54, 58, 266 Marcus, M. B., 494 Markov, A. A., 66, 67, 69, 73, 75, 76, 78, 83, 84, 86, 101, 114, 239, 267–313, 315, 327, 332, 341, 343, 345, 359, 360, 371, 373, 387–389, 391, 393, 395–398, 400, 401, 407–409, 420, 423, 435, 438, 443, 450, 451, 453, 455, 456, 493, 494
Mattila, P., 525 Mazziotto, G., 46, 266, 389, 453 McKean, H. P., 178, 255, 385, 431, 452, 453, 494 Mertsbakh, E., see Merzbach Merzbach, E., 46, 266 Merzljakov, I. J., 78 Meyer, P.-A., 8, 44, 45, 226, 239, 278, 280, 311, 318, 319, 328, 338, 340, 368, 533, 540 Millar, P. W., 312, 389 Milton, G. W., iv Mountford, T. S., iv, 495 Munkres, J. R., 150, 184, 191, 193 Muroga, S., 179 N: Nagasawa, M., 312, 389 Neveu, J., 44, 45, 63 Newton, I., 532 Nikodým, O. M., 62, 285, 372, 491 Nualart, D., 266 Nualart, E., iv, 493 O: Orey, S., 178, 179, 389, 493 Orlicz, W., 45 Ornstein, D. S., 79, 103, 339, 388 Ōshima, Y., 239, 311, 340, 389 Ottaviani, G., 123 Oxtoby, J. C., 213 P: Paley, R. E. A. C., 72, 75, 82, 85, 309, 347, 349, 372, 418, 449, 452, 459 Paranjape, S. R., 135 Park, C., 135 Parseval, M.-A., des Chênes, 323, 325 Pauc, C., 45 Pemantle, R., 102, 367, 389, 425 Peres, Y., 102, 367, 389, 425, 436, 453, 454, 541 Perkins, E., 103, 454 Petrov, V. V., 136 Phillips, R. S., 311, 315 Picard, C. É., 261
Pitman, J. W., iv, 33, 35, 37, 45, 326, 494 Plancherel, M., 491 Pólya, G., 72, 74, 388 Poincaré, J. H., 40, 362, 539 Poisson, S. D., 340, 502 Pollard, D., 178, 212 Port, S. C., 79, 103, 493 Prohorov, Yu. V., 190, 193, 210, 213, 389 Pruitt, W. E., 178, 389, 454, 493 Pyke, R., 123, 135, 453 R: Rademacher, H., 134 Radon, J., 62, 285, 372, 491 Rao, K. M., 389 Révész, P., 102, 103, 179, 494 Ren, J. G., 389, 408, 425, 453 Revuz, D., 103, 239, 255, 266, 311, 312, 341, 494 Ricci, F., 135 Riemann, G. F. B., 206 Riesz, F., 62, 193, 311, 376, 377, 384, 385, 423, 426, 452, 456, 459, 520, 523, 524, 527, 529, 532 Rogers, C. A., 135 Rogers, L. C. G., 239, 311, 453, 494 Rosen, J. S., 104, 453, 494 Rota, G.-C., 312 Royden, H. L., 192 Rozanov, Yu. A., 39, 46, 392, 454 Rudin, W., 193, 213 S: Salisbury, T. S., 389, 408, 425, 453, 454 Sato, K., 308, 312, 350, 389 Savage, L. J., 14, 82 Schwarz, K. H. A., 72, 261, 415, 439, 462, 463, 471 Segal, I. E., 178 Serfling, R. J., 45 Shannon, C. E., 150, 179 Sharpe, M., 239, 311 Shi, Z., iv, 425, 453, 454 Shieh, N.-R., 63, 104 Shorack, G. R., 123, 135
Shreve, S. E., 239, 266, 341, 494 Slade, G., 104 Slepian, D., 176, 178 Smythe, R. T., 110, 112, 117, 134, 135, 178 Song, R. G., 46 Song, S., 389, 408, 425, 453 Spitzer, F., 103, 350 Stein, E. M., 62, 63, 135, 266 Stieltjes, T. J., 203, 264, 501 Stoll, A., 104 Stone, C. J., 73, 79, 103, 356, 359 Stone, M. H., 192 Stout, W. F., 135 Strassen, V., 133 Stratonovich, R. L., 264 Stroock, D. W., 44, 340 Sucheston, L., 10, 45, 135, 312 Sz.-Nagy, B., 62, 311 Szpirglas, J., 46, 266
Walsh, J. B., iv, 16, 43–46, 63, 135, 237, 239, 266, 289, 311, 312, 395, 493, 495 Watanabe, S., 251, 266, 350 Watson, G. N., 286 Weaver, W., 179 Weber, M., 441, 454, 493 Weierstrass, K. T. W., 192 Weinryb, S., 494 Weiss, G., 62 Wermer, J., 362, 531, 540 Werner, W., 453 Wichura, M. J., 114, 117, 123, 135, 202 Widder, D. V., 502 Wiener, N., 144, 178, 385 Williams, D., 70, 239, 311, 453, 494 Williams, R. J., 239, 266, 494 Wong, E., 250, 453 Wu, Z., 135
T: Takeda, M., 239, 311, 340, 389 Takeuchi, J., 350 Talagrand, M., 135, 158, 170, 178, 179 Tanemura, H., 178 Tauber, A., 102, 506 Taylor, B., 90, 121, 122, 204, 205, 252, 324, 327, 506, 516 Taylor, S. J., 101, 102, 104, 135, 335, 341, 385, 389, 427, 453, 454, 523, 525 Teicher, H., 44 Testard, F., 441, 454 Tihomirov, V. M., 150 Trotter, H., 491, 493, 494
X: Xiao, Y., iv, 454, 493
U: Uhlenbeck, G. E., 339, 388 Ulam, S., 213 Urysohn, P. S., 184, 186, 191, 293, 296, 302 V: Varadhan, S. R. S., 340 Vares, M. E., 494 W: Wald, A., 206
Y: Yor, M., 239, 255, 266, 312, 341, 494 Yosida, K., 311, 315, 340 Z: Zakai, M., 250 Zeitouni, O., 136 Zermelo, E., 15 Zhang, R. C., 493 Zygmund, A., 54, 58, 61–63, 72, 75, 82, 85, 309, 347, 349, 372, 418, 449, 452, 459
Subject Index
Symbols: ε-enlargement, 372, 413
A: Abelian theorems, see Tauberian theorems adapted, see stochastic process additive Brownian motion, 394, see also Lévy processes, see also additive stable processes, 492 hitting probabilities of, 492 additive stable process Hausdorff dimension of the range, 433 Lebesgue's measure of the range, 433 additive stable processes, 419, see also Lévy process, additive codimension of the range, 423 hitting probabilities of, 423 Lebesgue measure of range, 452 potential density of, 420 Arzelà–Ascoli theorem, 194, 194
B: B(a; r) Euclidean ball/cube, 344 Bd(t; r) metric balls, 149 Bakry's regularity theorem, 266 balayage, 362, 539 balayage theorem for strongly symmetric Feller processes, 363 Bernstein's theorem, 505 Bessel functions, 286 modified, see Kν, 338 Bessel processes, 339 Bessel's equation, 338 bi-Brownian motion, 395 Borel–Cantelli lemma, 35, 78, 101, 112, 116, 131, 166, 170, 262, 356, 358, 437 branching process, 5, 13, 42 Brownian bridge, 212 Brownian motion, 63 additive, 433, see additive Brownian motion and the heat semigroup, 373 as a Lévy process, 229
Hausdorff dimension of images of, 452 Hausdorff measure of the range, 431 intersection of 3, 451 martingales related to, 229, see also harmonic functions and Brownian martingales multidimensional, 147, 305 as a Lévy process, 305 one-dimensional, 147 potential density of, 374 estimates, 374, 375 standard, 174 and martingales, 228 Brownian sheet, see also stochastic integrals, 455, 462 Čentsov's representation, 148 and commutation, see The Cairoli–Walsh commutation theorem and its natural pseudometric, 172 as an infinite-dimensional Feller process, 493 as mixed derivative of white noise, 148 SPDE interpretation, 259 codimension of the range, 456 Hölder continuity of, 174, 175 Hausdorff dimension of range, 457 hitting probabilities of, 456, 491 intersections of, 492 Lebesgue measure of the range, 492 martingales related to, 236 mean and covariance functions, 468 modulus of continuity of, 178 multidimensional, 147 one-dimensional, 147 pinned, 212, see also Brownian bridge polar sets for, 456 represented by a random series, 178
standard, 174 Burkholder–Davis–Gundy inequality, see inequalities C: C0(S), 283 Cb(S), 184 Cairoli's convergence theorems, see orthosmartingales, convergence theorem inequality, see maximal inequalities maximal inequality, see discrete, multiparameter maximal inequalities, see also maximal inequalities, multiparameter, continuous Cantor's ternary set, 515 Hausdorff dimension of, 516, 520 Hausdorff measure of, 520 capacitance, see Kolmogorov capacitance capacities, see also Hausdorff dimension, 528 absolutely continuous, 460, 538 and Markov processes, see Markov processes, multiparameter, hitting probabilities, see Markov processes and intersections Bessel–Riesz, 376, 377, see Markov processes and intersections and Lévy processes, 456, 523, 527, 529 β-dimensional, 520 connection to Hausdorff measures, see also Frostman's theorem, 521 vs. absolutely continuous, 423 Choquet, 533
Newtonian, see capacities, Bessel–Riesz, 532 outer Choquet, 533 Carathéodory outer measure, see outer measure Cauchy process, 306, 351, 432 Cauchy random walks, 134 cemetery, 272, see also Markov chains, killed and Markov processes, killed Čentsov's representation, 173 Chapman–Kolmogorov equation, 270, 276, 292, 293 for Markov processes, 289 for Markov semigroups, 284 characteristic functions, 134, 352 and spectral theorem, 439, see also spectral theorem and subordination, 379 inversion theorem, 91, 285, 307, 308 codimension, 435, 493 and α-regular Gaussian processes, 441, see also Gaussian random variables and Hausdorff dimension, 436 coffin state, see cemetery commutation, 35, 35, 37, 42, 233 and conditional independence, 39, 233 and marginal filtrations, 36 and multiparameter random walks, 107 and orthosmartingales, 37, 38 completely monotone functions, 378, 505 conditional independence, see independence consistency, 500 continuity in probability, 158 continuous mapping theorem, the, 188 convergence theorems for multiparameter martingales, 38 coordinate processes, 80 correlation function, 438
and pseudometrics, 440 Coulomb’s law of electrostatics, 530 Cram´er–Wald device, 206 cumulative distribution function, 188 and uniqueness of measures, 188 and weak convergence in Rd , 189 cylinder sets, 499 D: D domain of a generator, 315 differentiation theorems, see also Lebesgue of Jessen, Marcinkiewicz and Zygmund, 63 of Jessen, Marcinkiewicz, and Zygmund, 58 diffusion, 308 direct products, 80, 402, 403 Dirichlet problem, 332 and Brownian motion, 340 Dirichlet’s divisor lemma, 118, 118, 121, 125, 127 Donsker’s theorem, 202, see also random walks, multiparameter, weak convergence Doob’s decomposition, 8, 38, 41, 45 inequality, see maximal inequalities martingale convergence theorem, see martingales separability theorem, 155 Doob–Meyer decomposition, 280, 340, 368 for the Feynman–Kac semigroup, 328, 338 of potentials for Feller processes, 318, 319 for Markov chains, 278 downcrossings, see upcrossings duality, 309 Dudley’s theorem, 171, 172, 174
Dvoretzky, Erdős, Kakutani theorem, 428 dyadic filtrations, see also Haar systems and binary expansions, see filtrations E: elementary functions, 144 elementary process multiparameter, 265 elementary processes, 246 embedding proposition, 18, 21 energy, 528 Bessel–Riesz, 529, 532 β-dimensional, 459, 520 0-dimensional, 520 Coulomb, 532 logarithmic, see energy, 0-dimensional Bessel–Riesz mutual, 528 Newtonian, see energy, β-dimensional Bessel–Riesz enlargement of a set, see ε-enlargement entrance times, 221 measurability of, 221, 226 of Markov chains distribution of, 279 equilibrium measure, 280 equilibrium potential, see equilibrium measure essential core, 317 essential supremum, 48 exchangeability, see zero–one laws, Hewitt–Savage exchangeable σ-field, 14 excursions, 78 F: F4, see commutation Feller processes, 294 characterization via resolvents, 386 infinite-dimensional, 493 multiparameter, 392 product, 403 reference measures for, 361
resolvents of, 361 right continuity of, 294 strong Markov property of, 301 strongly symmetric, 360, 399 Feller semigroup, 294 Feynman–Kac formula, the, 328, 329, 338 Feynman–Kac semigroup, 327 filtrations augmented, 290 complete, 290 dyadic, 55, 59 commutation of, 52 martingales associated to, 52 multidimensional, 52 one-dimensional, 49 marginal, 32 continuous, 233 multiparameter, 31 continuous, 233 one-parameter continuous, 218 continuous, anomalous behavior, 218–220 discrete, 4 reversed, discrete, 30 right-continuous extension of, 223 finite-dimensional distributions, 66, 68, 69, 154, 194 and modifications, 154 not determining weak convergence in C, 197 flow stochastic, see stochastic flow Fourier transform and the generator of stable processes, 323 Fourier transforms, 177, 285, 491, see also characteristic functions, 502, 503 inversion theorem, 177, 323, 325 fractal percolation, 41 fractals, see Hausdorff measures and Hausdorff dimensions
fractional Laplacian, 325, see also stable processes, isotropic, generator of Frostman’s lemma, 517, 520, 521 Frostman’s theorem, 385, 456, 521, 523, 524 Fubini’s theorem, 468 G: gambler’s ruin problem for Brownian motion, 229 for simple walks, 263 gauge function, 361 gauge functions, 460, see also energy and capacities, 527, 533 proper, 363, 539 Gaussian distributions and independence, 139 characterization of, 139 covariance, 138 density function of, 114, 138, 265, 464, 468, 469 mean, 138 tails, 135 Gaussian processes, see Gaussian random variables Gaussian random variables, 137, 140 and α-regular process, 441 and martingales, 143 and stationarity, 438 and their natural pseudometric, 170 centered, 438 covariance, 140 existence of, 141 Hausdorff dimension of the range, 441 mean, 140 generators, 315 and integro-differential equations, 340 existence, see Hille–Yosida theorem global helix, 153 Gronwall’s lemma, 260, 262, 263 groups free abelian, 78, 103
and Lévy processes, 387 and random walks, 78 the free abelian group theorem, 78 H: Hölder continuity, 165, 516 Haar functions, see also Haar systems martingales associated to, 50 Haar systems and binary expansions, 49 and dyadic filtrations, see Haar systems and binary expansions in Lebesgue's differentiation theorem, 55, 61 multidimensional, 51 as a basis for L^1[0, 1]^N, 53 as a basis for L^p[0, 1]^N, 53 one-dimensional, 48 as a basis for L^1[0, 1], 51 as a basis for L^p[0, 1], 51 orthonormality of, 49 harmonic functions, 62 and Brownian martingales, 332 Hausdorff dimension, 455, 457, 515, 517 and capacity, 521 and metric entropy, 515 outer regularity, 517 Hausdorff measures, 512, 521, see also capacity and Frostman's theorem, 523 and metric entropy, 515 as extensions of Lebesgue's measure, 513 as outer measures, 512 invariance properties, 514 heat equation and the transition density of Brownian motion, 320 heat kernel, 373 heat semigroup, 285 Feller property of, 288 potential density of, 285 transition density of, 285 Helly's selection theorem, 504
Hessian, 320 Hewitt–Savage 0–1 law, see zero–one laws Hille–Yosida theorem, 315 history, see orthohistories, 69, 228, 234, 298 hitting times, see entrance times Hölder continuity, 516 Hölder continuous, 166, 176 Gaussian processes via correlations, 440 Hunt's lemma, 38, 45 continuous case, 227 I: image, 385 inclusion–exclusion formula, 38, 40, 107, 134, 238 increments of, 106 independence conditional, 38, see also commutation and Markov property, 38, 39, 69 inequalities, see also maximal inequalities Bernstein, 122 Burkholder–Davis–Gundy, 253, 261, 266 multiparameter, 257 Cauchy–Schwarz, 72, 261, 415, 439, 462, 463, 471 Chebyshev, 3, 12, 124, 130, 145, 158, 162, 163, 522 Hölder, 254 Jensen, 4, 6, 9, 18, 19, 21, 28, 29, 31, 59, 75, 235, 244 Kolmogorov, 3 maximal, see maximal inequalities Paley–Zygmund, 72, 75, 82, 85, 309, 347, 349, 372, 449 for σ-finite measures, 418 Slepian, 176, 178 intersection local times, see local times invariance principle, 202
isonormal process, 146, 148, 176, 238 existence, 176 properties, 146 Itô's formula, 251, 252, 254, 264, 265 and the generator of a Bessel process, 339 and the generator of Brownian motion, 339 Itô's lemma, see Itô's formula K: K_ν(x), 286 Kakutani's theorem, 376, 456 Kochen–Stone lemma, see also inequalities, Paley–Zygmund, 356, 359 Kolmogorov capacitance, see also metric entropy, 150 Kolmogorov's consistency theorem, see Kolmogorov's existence theorem Kolmogorov's continuity theorem, 158, 162, 166, 170 Kolmogorov's existence theorem, 142, 276, 292, 499, 500 Kronecker's lemma, 117 multiparameter, 117 L: L^2(R^N), 144 L^2_loc(R^N), 147 L^∞(S), 282 L^p bounded martingales, see martingales L^p[0, 1]^N, 48 Laplace operator, see Laplacian Laplace transforms, 102, 103, 230, 338, 370, 378, 425, 501 and completely asymmetric stable distributions, 378 and resolvents, 317 convergence theorem, 503 inversion theorem, 502 of hitting times, 368
resolvents and transition densities, 374 uniqueness theorem, 502 Laplacian, 264, 320, 332, 337, see also fractional Laplacian law of large numbers Kolmogorov's, 3, 111 moment conditions, 134 Smythe's, 110, 112, 117, 134, 178 L^p version, 134 weak, 502 law of the iterated logarithm, 113 Chung's, 133 converse to, 133 for one-parameter Gaussian walks, 113 Khintchine's, 116 moment conditions, 134 multiparameter, 117 self-normalized, 134 Lebesgue differentiation theorem of, 47, 54, 58, 62 enhanced, 61 monotone convergence theorem of, 12 Lévy processes, see also Brownian motion, 303 additive, 405 and codimension of intersection, stable case, 436 and codimension of range, stable case, 436 as Feller processes, 405 reference measure, 406 and codimension of range, stable case, 436 and Hausdorff dimension, stable case, 431 as Feller processes, 304 intersections, see also Dvoretzky, Erdős, Kakutani theorem intersections of, 426 intersections of, stable case, 428
Liouville's theorem, 62 Lipschitz condition, 265 local central limit theorem, 91, 100 local times, 455, 494 approximate, 479 for additive Brownian motion, 492 for Brownian motion, 491, 493 intersection, 104 inverse of, 103 localization, 240, 242, 246, see also localizing sequence, 249, 254, 255 localizing sequence, 246 locally compact space, 282 lower stochastic codimension, see codimension M: marginal filtrations and commutation, 36 Markov chains, 78, 267 k-step transition functions of, 269 absorbed, 274 transition operators of, 277 homogeneous, associated to inhomogeneous Markov chains, 268 initial distribution, 269 killed, 273 transition operators of, 276 Markov property, 86, see also Chapman–Kolmogorov equation recurrent, 308 strong Markov property, 272 time-homogeneous, 268 Markov processes, 288 capacities associated to, 367 generator of, see generators initial measures of, 288 intersections of, 425 killed generator of, 340 multiparameter, 392, 455, 456, 493 hitting probabilities, 408 initial measures, 392
reference measures, 398 product, 407 Feller property of, 407 stationary, 400 Markov property, 75, 83, 268, 293, see also Markov processes and random walks, 67, 73, 76 strong, 69, 78, 114, see also Feller processes, 299 and Lévy's maximal inequality, 114 and random walks, 69, 71 Markov semigroups, 275, 283 and Feller semigroups, 287 martingales, see also Gaussian random variables L^p bounded, 12 additive, 17 and quadratic variation, see quadratic variation continuous and unbounded variation, 239 convergence theorem L^2(P) case, 42 multiparameter, lack of, 33, 35, 37 one-parameter, 36, 37, 41, 51, 144 one-parameter, discrete, 12 one-parameter, reversed case, 41 convergence theorem, discrete one-parameter, 237 convergence theorem, one-parameter, 12, 34 indexed by directed sets, 63 local, 245, 331 multiparameter, see also orthomartingales, 31 and quadratic variation, 256 continuous, 234 strong, 43, 44 multiplicative, 17 one-parameter continuous, 222 discrete, 4
existence of a regular modification, 223 reversed, discrete, 30, 227 one-parameter convergence theorem, discrete, 41 that are local but not martingales, 340 the martingale problem, 317 upcrossings, see upcrossings maximal inequalities, 522 and multiparameter walks, 123, 124 application of upcrossing inequalities, 11 for continuous smartingales, 222, 245, 248, 250, 253, 254, 266 strong L^p inequality, 235, 469 weak L{ln+ L}^{N−1} inequality, 235 for multiparameter martingales, 38 in differentiation theory, 55, 59 in potential theory, 416 in the LIL, 113 Kolmogorov, see inequalities Lévy's, 113, 114 multiparameter, see also maximal inequalities for continuous smartingales martingales, lack of, 33 of Hardy and Littlewood type, 57, 59 one-parameter Doob's inequality, discrete, 8 strong (p, q) inequality, discrete, 8 strong L ln+ L inequality, discrete, 9, 10 strong L^p inequality, continuous, see maximal inequalities for continuous smartingales and maximal inequalities, multiparameter, continuous
strong L^p inequality, discrete, 9, 12, 40, 75 weak (1, 1) inequality, discrete, 8, 18 strong (p, p) inequality for orthosubmartingales discrete, 20, 86 strong L{ln+ L}^p inequality for orthosubmartingales discrete, 20 weak (1, L{ln+ L}^{N−1}) inequality for orthosubmartingales discrete, 22 maximal operator, 55, 58 maximum principle, 388, 533 McKean's theorem, see Lévy processes, and Hausdorff dimension method of characteristics, 257 metric entropy, 150, 515, see also chaining for global helices, 153 of contractions on [0, 1], 151 relation to Kolmogorov capacitance, 150 Mill's ratios, 138 mixing, 100 modification of a stochastic process, 154 modulus of continuity, 163, 165, 194 in probability, 167 monotonicity argument, 166, 170 multiparameter martingales, see martingales
N: Normal distributions, see Gaussian distributions
O: occupation density, see local times formula, the, 493 occupation measure, 442 one-point compactification, 42, 294, 391, 401 optional increasing paths, 43 optional stopping theorem
multiparameter and stopping domains, 43, 44 and stopping points, 42 for strong martingales, 44 one-parameter continuous, 226, 253, 331, 332, 338, 349, 356, 369 discrete, 7, 8, 12, 40, 226, 281 Orlicz norms, 45 Ornstein–Uhlenbeck process, 339 Feller property of, 388 orthohistories discrete, 22 orthomartingales, see also martingales and commutation, 36 discrete, 16 reversed convergence theorem, discrete, 30 discrete, 30 orthosmartingales, see also orthomartingale convergence theorem discrete, 26, 44, 53 Lamb's method, 44 discrete, 16 orthosubmartingales discrete, see also orthomartingale, 16 orthosupermartingales, see also orthomartingale, 16 outer measure, 511 outer radius, 457, 459, 460
P: P(E), 520 Paley–Zygmund lemma, see inequalities, Paley–Zygmund, 459 parabolic operator, 339 Parseval's identity, 323, 325 patching argument, 160, 166, 174 Peres's lemma, 436 ϕ-mixing, 100 Plancherel's theorem, 491 Pólya's criterion, 72
point recurrence, 351 Poisson processes, 230 a construction of, 232 generator of, 325 Poisson's equation, 340 polar coordinates, 90, 100, 465 polar set, 455 polarization, 247 Portmanteau theorem, 186 portmanteau theorem, the, 187 possible points, 76 and recurrence, 77 as a semigroup, 76 as a subgroup, 76 potential density for Markov semigroups, 284 potential functions, 283, 531 potentials, see also entrance times, see also balayage of a measure, 529 pretightness, 200 Prohorov's theorem, 190, 210 pseudometric, 149 pseudometric spaces, 149 totally bounded, 149 Q: quadratic variation, 213, 242 R: Rademacher random variables, 134 Radon–Nikodým theorem, 62, 491 random processes, see stochastic processes random set, 435 random variables, 182, 194 distributions of, 183, 195, see also finite-dimensional distributions existence of, 183, see also Urysohn's lemma random walks, 5, 65, 66, 69 and martingales, 110 increments of, 40, 66, see also inclusion–exclusion formula intersection of two independent, 85 intersection probabilities, 93
intersections of, 80, see also random walks, simple intersections of several, 87, 89 Markov property of, 270 multiparameter, 17, 106 and martingales, 108 and simulation of the Brownian sheet, 457 weak convergence to the Brownian sheet, 204 nearest-neighborhood, 89 simple, 69, 89 and weak convergence, 211 as Markov chains, 308 intersection of four or more, 97 intersection of three, 93 intersections of two, 90, 91 symmetric, 76, 113 range, 385 reciprocity theorem, 529 recurrence for Lévy processes probabilistic criterion, 347 for Lévy processes, 344 for random walks, 70, 71, see also Pólya's criterion and possible points, 77 recurrence–transience dichotomy, 78, 79 recurrent points as a subgroup, 76 reference measure, see Markov processes, multiparameter reflection principle André's, 133, 265 regular conditional probabilities, 289 resolvent density for stopped random walk, 100 multiparameter, 397 resolvent density, see potential density resolvent equation for Markov chains, 278 for Markov semigroups, 284
resolvents, see also resolvent equation and supermartingales, 295 corresponding to Markov semigroups, 283 density for multiparameter Markov processes, 398 of Markov chains, 277 of Markov processes and supermartingales, 388, 451 reversed martingales, see martingales reversed orthomartingales, see orthomartingales right continuity multiparameter, 236 S: S_∆, 391 sector condition, 84 sectorial limits discrete, 24 semigroups, 67, see also Markov semigroups, see also transition operators and Markov chains, 275 and random walks, 67 Feller, 287 marginal associated to multiparameter semigroups, 396 multiparameter, 395 one-parameter representation of multiparameter, 403 separability, see stochastic processes shift operators (or shifts), 292 simple functions, 144 simple process multiparameter, 265 simple processes, 247 smartingales, see martingales sojourn times, 347, see also occupation measure space–time process, 339
SPDEs hyperbolic and Picard's iteration, 261 Hölder continuity of solutions, 263 motivation, 259, 266 spectral theorem, 439 splitting fields, 46 stable processes, see Lévy processes completely asymmetric, 378 asymptotics of the distribution, 379 isotropic, 306, see also Bessel–Riesz capacities and capacities, 384 and Hausdorff dimension, 385 are not diffusions, 309 asymptotics of the density function, 380 escape rates for, 350 existence of, 308 generator of, 324 Lebesgue's measure of range, 386 reference measure of, 384 resolvent density, 382 resolvent density estimates, 384 scaling properties of, 352 strong symmetry of, 361 transition density, 382 stationarity, see Markov processes, stationary, and Gaussian random variables stick breaking, 5, 13 stochastic codimension, see codimension stochastic flow, 265 stochastic integrals against martingales continuous adapted integrands, 248 elementary integrands, 246 quadratic variation, 249 simple integrands, 247 against the Brownian sheet, 148 of Stratonovich, 264
stochastic partial differential equation, see SPDEs stochastic process adapted, 234 stochastic processes adapted, 4, 218 existence, see Kolmogorov's existence theorem modification continuous, 168 modification, continuous, see Kolmogorov's continuity theorem and metric entropy and Dudley's theorem one-parameter discrete, 4 separable, see also Doob's separability theorem, 155 stopping domains, 43 stopping points, 42 stopping times continuous, 218 approximation by discrete, 220 discrete, 5, 69 strong law of large numbers, see law of large numbers strong Markov property, see Markov property, see also Markov chains strong martingales, see martingales strong symmetry, see Feller processes submartingales multiparameter, see also multiparameter martingales, 31 continuous, 234 one-parameter continuous, 222 discrete, 4 subordination, 378 summation by parts, 118 supermartingales multiparameter, 31 continuous, 234 one-parameter continuous, 222
discrete, 4 symmetry, see also random walks and Gaussian random variables, 140 and positive definiteness, 140 and resolvent densities, 399 and the law of the iterated logarithm, 134 for random variables, 113 of gauge functions, 528, 529 strong, see Feller processes T: tail σ-field, 310 Tauberian theorems, 102, 506 Feller's, 506 Taylor's theorem, 523, 525 The Cairoli–Walsh commutation theorem, 237 tightness, 189 in C, 200 time-reversal, 271, 388, 453, 454 total boundedness, 149 relation to compactness and completeness, 150 towering property of conditional expectations, 7, 23, 32, 36, 39, 369 trajectories, 67, 84 transience, see also recurrence for Lévy processes, see recurrence for Lévy processes for random walks, 74 transition densities of Markov semigroups, 284 transition functions, see transition operators transition operators, 68, 275 and Markov semigroups, 275 for Markov processes, 288 for multiparameter Markov processes, 393 for random walks, 67 marginal, 396 U: U-Statistics, 45
upcrossing times, discrete, 10, see also upcrossings upcrossings inequality application to continuous-time, 224 multiparameter inequality, 44 one-parameter inequality, discrete, 11, 12 upper stochastic codimension, see codimension Urysohn's lemma, 184, 296 and existence of random variables, 186 Urysohn's metrization theorem, 293 usual conditions, 225, 248 V: vibrating string problem, see wave equation W: wave equation, 258 weak convergence, 185, see also Portmanteau theorem in C, see also Donsker's theorem weak convergence in C, 198 white noise, 142, 148 is not a measure, 175 properties, 142 Wiener's test, 385 Williams' path decomposition, 70 Z: zero set, 455 of random walks, 102 zero–one laws Blumenthal, 310 for tail fields, see tail σ-field Hewitt–Savage, 14, 82 Kolmogorov, 14, 310, 432